
A key underlying assumption of OmicsAnalyst is that discrete clusters are present in your omics data.
OmicsAnalyst is designed to detect, visualize and analyze these clusters. A direct consequence of this approach is that
OmicsAnalyst will partition your data into clusters, regardless of whether biologically meaningful groups are present.
Although this approach may not suit all omics data, whether clusters exist is rarely known a priori.
Therefore, we strongly recommend that users evaluate their omics data in an unbiased, data-driven manner to
complement mainstream differential analysis and supervised methods.
In addition, users can visualize and analyze patterns or groups with regard to the metadata they provide.
This function is independent of the clusters detected. For instance, users can directly visualize and compare any
predefined groups within our 3D visualization system.

OmicsAnalyst was designed to provide an intuitive means for clinicians and bench scientists to work directly with big omics data.
It achieves this by integrating dimensionality reduction, density-based clustering, and 3D visual analytics in a user-friendly, web-based platform
that lets users interact with and discover patterns within their large datasets from their personal computer.

Omics Abundance Tables
OmicsAnalyst accepts one or multiple omics abundance tables generated from high-throughput platforms, such as
metabolomics (targeted and untargeted), transcriptomics (bulk and single-cell), and sequencing-based microbiome data (16S rRNA and shotgun
metagenomics). Gene and metabolite annotation across 25 different species is supported. Features must be in rows and samples in columns (example below).
Files must be in .txt, .csv, or .zip format.
Example Abundance File
Feature Patient1 Patient2 Patient3
89.10761 26.0996200732 23.1460903921 25.7902022326
108.04453 23.9072161208 30.2066140474 15.2227715523
112.05049 25.2105089519 26.4889060662 22.4397627169
114.06625 25.7236861033 24.5242553536 18.8258771889
116.07045 25.4935276049 26.6822051326 19.4841708816
126.06567 28.3976291914 31.7650213292 27.6698396171
127.12259 25.7866080331 21.3600835035 21.4387426779
140.08137 21.2081862174 21.2651236842 19.6536824974
151.09598 24.756381796 28.6858438887 25.2851696948
153.06519 26.4146136809 25.7091294726 24.3941898518
153.04049 21.1820994064 20.828480842 27.6147928377
......
Metadata Table
OmicsAnalyst accepts a single metadata file containing metadata information for all samples.
In this table, sample names should be in the first column, followed by the different metadata in subsequent columns (example below).
Files must be in .txt or .csv format.
Example Metadata
Samples TissueType Age
Sample1 Liver 71
Sample2 Skin 68
Sample3 Liver 90
Sample4 Skin 61
Sample5 Liver 74
Sample6 Skin 73
Example data from a multi-omics (shotgun metagenomics + untargeted metabolomics) study of Ulcerative Colitis (UC)
Notes about formatting your data files:

Sample and feature names must be unique and consist only of common English letters, underscores and numbers. Latin/Greek letters are not supported.

Sample and feature names must be consistent across all files (i.e. omics abundance tables and metadata file).

Data values (read counts or proportions) should contain only numeric and positive values. Empty cells or cells with NA values will be replaced with zero.

Metadata is not permitted in the abundance tables.
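
The formatting notes above can be checked before upload. The following is a minimal sketch, not part of OmicsAnalyst itself: the function names are hypothetical, the input is assumed to be comma-separated text, and the decimal point is allowed in names as an assumption so that numeric feature names like those in the example abundance file pass.

```python
import csv
import io
import re

# Letters, digits and underscores; "." is also allowed here as an assumption,
# because the example abundance file uses numeric feature names such as 89.10761.
NAME_PATTERN = re.compile(r"^[A-Za-z0-9_.]+$")


def read_table(text, delimiter=","):
    """Parse delimited text into a header row and a list of data rows."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    return rows[0], rows[1:]


def check_names(names, label):
    """Return problems with uniqueness or unsupported characters."""
    problems = []
    if len(set(names)) != len(names):
        problems.append(f"{label} names are not unique")
    for name in names:
        if not NAME_PATTERN.match(name):
            problems.append(f"{label} name {name!r} has unsupported characters")
    return problems


def validate(abundance_text, metadata_text):
    """Collect formatting problems for one abundance table and one metadata table."""
    header, rows = read_table(abundance_text)
    samples = header[1:]  # the first column holds the feature names
    problems = check_names(samples, "sample")
    problems += check_names([row[0] for row in rows], "feature")
    for row in rows:
        for value in row[1:]:
            if value in ("", "NA"):  # empty and NA cells are replaced with zero
                continue
            try:
                ok = float(value) >= 0
            except ValueError:
                ok = False
            if not ok:
                problems.append(f"non-numeric or negative value {value!r} in {row[0]}")
    _, meta_rows = read_table(metadata_text)
    meta_samples = [row[0] for row in meta_rows]  # sample names are in the first column
    if set(meta_samples) != set(samples):
        problems.append("sample names differ between abundance and metadata tables")
    return problems


if __name__ == "__main__":
    abundance = "Feature,Sample1,Sample2\nf_1,10.5,NA\nf_2,3,-1\n"
    metadata = "Samples,TissueType\nSample1,Liver\nSample2,Skin\n"
    for problem in validate(abundance, metadata):
        print(problem)
```

An empty list means none of the checked rules were violated; the sketch does not cover every rule (for example, it does not detect metadata rows placed inside an abundance table).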

The 3D visualization system was developed using the Web Graphics Library (WebGL).
WebGL is the standard 3D graphics API for the web. It allows developers to harness the full power of the computer’s 3D rendering hardware
from within the browser using JavaScript. Before WebGL, developers had to rely on plugins or native applications and ask their users to
download and install custom software in order to deliver a hardware-accelerated 3D experience.
WebGL is supported by most major modern browsers that support HTML5. We have tested OmicsAnalyst in several major browsers (see below).
Our empirical testing has shown that Google Chrome usually gives the best performance on the same computer:
Name              Version   Note
Google Chrome     50+       ★★★★★
Mozilla Firefox   47+       ★★★★☆
Apple Safari      10.1+     ★★★☆☆
Microsoft Edge    12+       ★★★☆☆


Chrome
First, enable hardware acceleration:
 Go to chrome://settings
 Click the + Show advanced settings button
 In the System section, ensure the Use hardware acceleration when available checkbox is checked (you'll need to relaunch Chrome for any changes to take effect)
Then enable WebGL:
For more information, see: Chrome Help: WebGL and 3D graphics.
Firefox
First, enable WebGL:
 Type about:config in the browser address bar and press Enter
 Search for webgl.disabled
 Ensure that its value is false (any changes take effect immediately without relaunching Firefox)
Then inspect the status of WebGL:
 Go to about:support
 Inspect the WebGL Renderer row in the Graphics table:
If your graphics card/drivers are blacklisted, you can override the blacklist.
Warning: this is not recommended! (see blacklists note below). To override the blacklist:
 Go to about:config
 Search for webgl.force-enabled
 Set it to true
Safari
 Go to Safari's Preferences
 Select the Security tab
 Make sure to check the Allow WebGL checkbox
Source: https://superuser.com/questions/836832/howcanienablewebglinmybrowser

Algorithm    Full Name                                       Note
PCA          Principal Component Analysis
PLS          Partial Least Squares
PCoA         Principal Coordinate Analysis
NMDS         Non-metric Multidimensional Scaling
UMAP         Uniform Manifold Approximation and Projection
O2PLS        Two-way orthogonal PLS
sCCA         Sparse Canonical Correlation Analysis
Procrustes   Procrustes analysis



OmicsAnalyst is very flexible and can be used to answer many different questions in omics and multi-omics data analysis.
Below are some common questions that OmicsAnalyst can address.

Whether two omics are correlated and the significance of the correlation

OmicsAnalyst offers Robust Maximum Association Between Data Sets using the high-performance R package ccaPP.
The package estimates maximum association using several different measures, including Pearson, Spearman and Kendall correlation.
The significance of the maximum association estimates can be assessed via permutation tests.
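
To illustrate how a permutation test assigns significance to an association measure, here is a simplified sketch. It does not reproduce the ccaPP maximum association algorithm: it tests a plain Spearman correlation between two single variables, and all function names are hypothetical.

```python
import random


def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation (undefined if either input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def permutation_pvalue(x, y, stat=spearman, n_perm=999, seed=0):
    """Shuffle y to break the pairing and count permuted statistics at least as extreme."""
    rng = random.Random(seed)
    observed = abs(stat(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(stat(x, shuffled)) >= observed:
            hits += 1
    # The +1 counts the observed pairing itself as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    y = [2.1, 3.9, 6.2, 8.1, 9.7, 12.5, 13.9, 16.2, 18.0, 19.8]
    print(permutation_pvalue(x, y))
```

With 999 permutations, the smallest p-value this procedure can report is 0.001.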

PCA searches for patterns in the variables, while PCoA searches for similarities between samples.
Unlike PCA and PLS, which use the raw data, PCoA takes a (dis)similarity matrix as input and assigns each item a location in
low-dimensional space. Distance-based ordinations such as PCoA are recommended over PCA when there are many missing values, as PCA would result
in all samples clustering near the origin.
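
As an illustration of how PCoA turns a distance matrix into sample coordinates, here is a minimal sketch of classical PCoA using NumPy. It is a textbook formulation, not necessarily the implementation used by OmicsAnalyst, and the function name is hypothetical.

```python
import numpy as np


def pcoa(dist, n_components=3):
    """Classical PCoA: double-centre the squared distances, then eigendecompose."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    # Keep the axes with the largest eigenvalues.
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Negative eigenvalues (possible for non-Euclidean distances) are clipped to zero.
    coords = eigvecs * np.sqrt(np.clip(eigvals, 0, None))
    return coords, eigvals


if __name__ == "__main__":
    points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    coords, eigvals = pcoa(dist, n_components=2)
    print(coords)
```

Only the pairwise distances enter the calculation, so any (dis)similarity measure appropriate for the data can be supplied in place of the Euclidean distances used in this toy example.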

Both PCoA and NMDS take a distance matrix as input. PCoA maximizes the linear correlation between samples, whereas
NMDS maximizes the rank-order correlation between samples. Users should use PCoA if distances between samples are so
close that a linear transformation would suffice. NMDS is suggested if users wish to highlight the gradient structure within
their data.

Stress scores: less than 5 is excellent (rare), 5-10 is good, 10-20 is fair (usable, but could be misleading),
and scores greater than 20 should be interpreted with caution.
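
The thresholds above can be encoded in a small helper. This is a sketch under two assumptions: stress is expressed on a percentage scale (some software reports it as a proportion between 0 and 1, which would need multiplying by 100), and the boundary values 5, 10 and 20 are assigned to the "good", "fair" and "fair" bands respectively. The function names are hypothetical.

```python
def kruskal_stress(disparities, distances):
    """Kruskal stress-1 as a percentage.

    distances are the pairwise distances in the ordination and disparities are
    the fitted (monotonically transformed) dissimilarities, in the same order.
    """
    num = sum((d - h) ** 2 for d, h in zip(distances, disparities))
    den = sum(d ** 2 for d in distances)
    return 100 * (num / den) ** 0.5


def interpret_stress(stress_percent):
    """Map a stress score (percentage scale) to the qualitative bands."""
    if stress_percent < 5:
        return "excellent"
    if stress_percent < 10:
        return "good"
    if stress_percent <= 20:
        return "fair"
    return "interpret with caution"


if __name__ == "__main__":
    stress = kruskal_stress([1.0, 2.0, 3.0], [1.1, 1.9, 3.2])
    print(round(stress, 2), interpret_stress(stress))
```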

UMAP and t-SNE are both popular dimensionality reduction methods widely used in single-cell transcriptomics. However,
t-SNE suffers from some limitations, namely slow computation and an inability to capture global structure.
Meanwhile, UMAP preserves more local and global data structure than t-SNE with a shorter computation time.
This means that for t-SNE, only within-cluster distances are meaningful, while inter-cluster relations may be
more informative in UMAP than in t-SNE.

Samples that are clustered together are closely related. However, the sizes of clusters relative to each other are meaningless.
Likewise, while the global positions of clusters are broadly preserved, the distances between clusters are likely not meaningful.

The UMAP algorithm is stochastic, meaning that different runs with the same parameters may yield different results.

Since the embedding is potentially a highly non-linear transformation, t-SNE/UMAP offer no direct feature-importance measures.
To identify the features underlying the clusters, users can directly compare the clusters using several well-established differential analysis methods
(such as univariate tests, limma, DESeq2, edgeR).
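
As a rough illustration of the univariate route, the sketch below ranks features by Welch's t statistic between two clusters. It is a simplified stand-in, not limma, DESeq2 or edgeR, it reports no p-values, and the function names and the dictionary-based table layout are assumptions made for the example.

```python
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two groups (each needs at least two values and some variance)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se


def rank_features(table, labels, cluster_a, cluster_b):
    """Rank features by |t| between two clusters.

    table maps each feature name to a list of values, one per sample,
    in the same order as the cluster labels.
    """
    scores = {}
    for feature, values in table.items():
        a = [v for v, lab in zip(values, labels) if lab == cluster_a]
        b = [v for v, lab in zip(values, labels) if lab == cluster_b]
        scores[feature] = welch_t(a, b)
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    table = {
        "f1": [1, 2, 3, 10, 11, 12],
        "f2": [5, 6, 5, 6, 5, 6],
    }
    labels = [1, 1, 1, 2, 2, 2]
    for feature, t in rank_features(table, labels, 1, 2):
        print(feature, round(t, 2))
```

Features at the top of the ranking differ most between the two clusters relative to their within-cluster variability; a dedicated differential analysis method would additionally provide p-values and multiple-testing correction.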

Procrustes analysis is the analysis of shapes. It takes as input two ordination matrices with corresponding points,
and transforms one ordination by rotating, reflecting, scaling, and translating it to minimize the distances
between corresponding points in the other ordination (maximizing fit between corresponding observations).
In OmicsAnalyst, raw omics data is transformed into ordinations with
PCA, which are then configured to minimize the sum of squared deviations between corresponding points (samples).

The PROcrustean randomization TEST (PROTEST) is a permutation test that assesses the significance of the goodness of fit (m^{2}) between two datasets.
The null hypothesis of PROTEST is that the two datasets do not exhibit greater concordance than expected by chance. Using
the permutation approach, variables from one dataset are randomly reordered while keeping the covariance structure, and the fit
between the datasets is recalculated. The original fit is then compared to the fit obtained from the randomized data.
This is repeated many times to count how often the original fit was smaller than or equal to the fit obtained from the randomized data.
By default,
the PROTEST implementation in OmicsAnalyst runs 999 permutations and outputs values
for the "sum of squares", "correlation", and "p-value". For the sum of squares (m_{12}^{2}), values vary from 0 to 1, with low values
indicating greater concordance.
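
The following NumPy sketch shows one common way to compute the three reported quantities. It is an illustration under stated assumptions, not the OmicsAnalyst implementation: it uses the symmetric Procrustes statistic, it shuffles the sample order of one ordination for each permutation, and the function names are hypothetical.

```python
import numpy as np


def procrustes_m2(x, y):
    """Symmetric Procrustes sum of squares (0 to 1) for two matched ordinations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Remove translation and scale so only rotation/reflection remains to be fitted.
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    x = x / np.linalg.norm(x)
    y = y / np.linalg.norm(y)
    # The optimal rotation/reflection gives a fit equal to the sum of singular values.
    singular_values = np.linalg.svd(x.T @ y, compute_uv=False)
    return 1.0 - singular_values.sum() ** 2


def protest(x, y, n_perm=999, seed=0):
    """Return (sum of squares, correlation, p-value) from a permutation test."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    observed = procrustes_m2(x, y)
    hits = 0
    for _ in range(n_perm):
        permuted = y[rng.permutation(len(y))]  # break the sample pairing
        if procrustes_m2(x, permuted) <= observed:
            hits += 1
    correlation = float(np.sqrt(max(0.0, 1.0 - observed)))
    p_value = (hits + 1) / (n_perm + 1)
    return observed, correlation, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=(20, 2))
    y = x + rng.normal(scale=0.3, size=(20, 2))  # a noisy copy of x
    print(protest(x, y))
```

In this formulation the correlation is the square root of one minus the sum of squares, so a low sum of squares and a high correlation both describe the same good match.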

The Procrustes plot provides a visual indication of the match between two ordinations. Spheres represent samples and belong
to either omics 1 or omics 2 depending on the color of the line connected to the sphere.
The line between two spheres connects the position of a sample in the second ordination
to its position in the target ordination. Long lines between the two spheres indicate a poor match,
while short lines indicate good agreement between datasets.

The MCIA plot shows the projection of two omics datasets into the same dimensional space. Shapes represent samples and
identical samples are connected by a line to the center point, which represents the reference structure that
maximizes the covariance derived from the MCIA synthetic analysis.
The shorter the line, the better the correlation between samples obtained by different omics.

The visualization is limited by the performance of users' computers and screen resolutions.
Too many data points will result in greater latency when manipulating the plot.
Based on empirical tests and practical utility, we recommend keeping the total number of data points
below 5000. It is rare for the sample size to exceed this number.
For very large data, please make sure you have a capable computer equipped with a high-performance graphics card.