




Analysis Match


 
Summary: A Core Analysis tab that displays related analyses (among thousands of curated analyses) based on shared patterns of predicted "entities" such as Upstream Regulators and Canonical Pathways.

Automatically discover other IPA Core Analyses with similar (or opposite) biological results as compared to yours, to help confirm your interpretation of the results or to provide unexpected insights into underlying shared biological mechanisms.

Analysis Match automatically compares your analysis against other analyses you have created (in your Project Manager) as well as tens of thousands of other human and mouse expression analyses curated from public sources. This “analysis-to-analysis” matching is based on shared patterns of Canonical Pathways, Upstream Regulators, Causal Networks, and Diseases and Functions.

 

Source of matching analyses:

The analyses included in Analysis Match were generated in IPA from more than 100,000 highly curated and quality-controlled human, mouse, and rat disease and oncology datasets re-processed from SRA, GEO, ArrayExpress, TCGA (by mutational status), LINCS, GTEx, the ENCODE Consortium, and more. These datasets were produced by OmicSoft (acquired by QIAGEN) and are the “comparisons” found in DiseaseLand, OncoLand, SingleCellLand, and Normal Cells and Tissues. They represent various contrasts such as disease vs. normal, treatment vs. non-treatment, and many more. Matches against your own analyses, analyses shared with you, and IPA's Example Analyses are also returned in Analysis Match.


Analysis Match results

Analysis Match results are presented in a tab at the right side of the IPA Core Analysis window. If you have licensed the feature, it will be populated with a table that ranks other analyses you have access to versus the one you opened.

By default, the analyses are ranked from most similar to least similar to your analysis, based on their overall similarity scores. The analyses are matched using a set of signatures created for each analysis, namely for Canonical Pathways, Upstream Regulators, Causal Networks, and Diseases and Functions. Each signature is used independently to match against other analyses.
See the section "How signatures are created and compared" below for more detail on how the signatures are created and scored.

The Analysis Match tab is shown below for an analysis of the transcriptome of kidney tissue from mice treated with the NRF2 (NFE2L2)-activating chemical CDDO-Me, expressed as a ratio to DMSO-treated controls (PMID 26422507).


The results above have been filtered to show only the most strongly similar and dissimilar analyses based on the overall z-score percentage. Each of the first four colored columns shows the percentage similarity of one type of signature to the analysis you opened. Fuchsia indicates similarity and cyan indicates dissimilarity. The first scoring column (“CP”) is the match for the Canonical Pathway signature, the second (“UR”) is for Upstream Regulators, the third (“CN”) is for Causal Networks, and the fourth (“DE”) is for Downstream Effects (i.e., Diseases and Functions). The last of the fuchsia and cyan columns is the average of those four signature matches. The white and purple columns to the right of the z-score columns display the right-tailed Fisher's Exact Test p-value for each of the signature matches.


Filtering your results

You can filter the results by any of the columns by clicking on the funnel icon at the top of the column and entering numbers or text. In the case of the z-score columns, the cutoff value you enter is treated as an absolute value. For example, if you enter a value of 50, the results will be filtered to those with a score that is >50 or <-50.
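The absolute-value behavior of the z-score filter can be illustrated with a minimal Python sketch. This is not IPA's implementation; the function name, the row layout, and the "overall" column name are hypothetical and chosen only for illustration.

```python
# Minimal sketch (assumed names, not IPA code): a cutoff on a z-score column
# is applied to the absolute value, so a cutoff of 50 keeps scores that are
# > 50 or < -50.

def filter_by_abs_score(rows, column, cutoff):
    """Return the rows whose value in `column` is > cutoff or < -cutoff."""
    return [
        row for row in rows
        if row.get(column) is not None and abs(row[column]) > cutoff
    ]

# Hypothetical result rows; "overall" stands in for the overall z-score percentage.
rows = [
    {"analysis": "A", "overall": 72.5},
    {"analysis": "B", "overall": -61.0},
    {"analysis": "C", "overall": 12.3},
    {"analysis": "D", "overall": 50.0},
]

kept = filter_by_abs_score(rows, "overall", 50)
print([row["analysis"] for row in kept])  # ['A', 'B']
```

Analysis D is dropped because its score is exactly 50, which is neither >50 nor <-50.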

You can limit the results to certain (or all) OmicSoft "Lands" and/or any of your own projects.
Click on the Project filter funnel, then click on one or more Lands to select a subset, or click the OmicSoft icon to select all Lands at once.

You can switch to wild card searches using the radio button.


[Figure: Filter for OmicSoft repositories in Analysis Match]

You can also filter on metadata that has been captured about each analysis. Each analysis has been annotated with information such as the species used in an experiment, the type of sample comparison (Disease vs. Normal, Treatment vs. Control, etc.), and much more. The full list of metadata columns that can be added / removed using the Customize Table menu is attached to this article.

In the example below, the Analysis Match tab is shown with columns added and removed and filtered on the "sampledatamode" field to limit results to only RNA-seq analyses.





(Note that, unlike this example, some analyses will not have Canonical Pathway signature matches, because a number of Canonical Pathways do not currently have z-scores.)

Evaluate Metadata

Select a set of analyses of interest that are from the Analysis Match repository and use the Evaluate Metadata button. IPA will then automatically analyze metadata from over 90 curated metadata fields to detect if there are commonalities among the selected analyses.


 

Notice that although the "query" analysis is from kidneys of mice treated with the NRF2 activator CDDO-Me, the analyses that match it are enriched for analyses derived from mouse lungs exposed to smoke. In this example, the 30 most similar analyses were selected, eight of which came from "3R4F smoking exposure". Only 59 of the 90,000 analyses in the repository have the same treatment, so finding eight of them among the top 30 results is highly significant (p-value = 2.17E-13).
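As a rough illustration of how a right-tailed Fisher's Exact Test p-value of this kind can be computed, the sketch below sums the upper tail of the hypergeometric distribution. It is an assumption-laden example, not IPA's code: the function and parameter names are hypothetical, and it uses the rounded counts quoted above (8 of 30 selected, 59 of 90,000 in the repository). IPA bases the test on the analyses that have any value in the metadata field, and those counts are not given here, so the result should not be expected to reproduce the reported p-value.

```python
from fractions import Fraction
from math import comb

def right_tailed_p(selected_with_term, selected_total, repo_with_term, repo_total):
    """P(X >= selected_with_term) when `selected_total` analyses are drawn
    without replacement from `repo_total`, of which `repo_with_term` carry
    the term (upper tail of the hypergeometric distribution)."""
    upper = min(selected_total, repo_with_term)
    favorable = sum(
        comb(repo_with_term, i)
        * comb(repo_total - repo_with_term, selected_total - i)
        for i in range(selected_with_term, upper + 1)
    )
    # Exact integer arithmetic, converted to float only at the end.
    return float(Fraction(favorable, comb(repo_total, selected_total)))

# Illustrative counts taken from the text above (the universe size is an assumption).
p = right_tailed_p(8, 30, 59, 90_000)
print(f"{p:.3g}")
```

Using exact integer binomials avoids floating-point underflow for very small tail probabilities; a statistics library could be substituted where one is available.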


Definitions of the columns:

  • Metadata field: The name of the OmicSoft metadata field where a significant term (keyword or phrase) was found.
  • Significant term: The actual term that was found to be significantly enriched in the set of analyses you selected.
  • p-value: The right-tailed Fisher's Exact Test p-value computed for a particular term in the selected analyses.
  • Selected analyses with term: The number of analyses among the set of selected analyses that have the significant term.
  • Total analyses with term: The number of analyses in the entire Analysis Match repository that have the significant term.
  • Selected analyses with any value in field: The number of analyses that could have had a relevant term in the set of analyses that you selected.
  • Total analyses with any value in field: The universe of analyses that have any term in that metadata field.




The next step is to view the underlying details of the matches with a heatmap. Please see Related-analyses-heatmap for details.
 

The OmicSoft Dataset and Analysis Repository in IPA

This section describes what datasets are in the Analysis Match repository in IPA.
 

Scope of the repository

The OmicSoft repository is organized into several project folders in IPA:

DiseaseLand
  • HumanDisease
  • MouseDisease
  • RatDisease
  • LINCS
OncoLand
  • Hematology
  • Metastatic Cancer
  • OncoHuman (Formerly OncoGEO)
  • Pediatrics
  • TCGA
  • OncoMouse
  • ENCODE RNA Binding
SingleCellLand
  • SingleCellHuman
  • SingleCellHumanUmi
  • SingleCellHumanUmiLite
  • SingleCellHumanHCL
  • SingleCellMouse
  • SingleCellMouseUmi
  • SingleCellMouseUmiLite
Normal Cells and Tissues
  • Human Tissues (GTEx)
  
Overview of >118,000 datasets available in Analysis Match (as of July 2022)

[Figure: Number of OmicSoft datasets, July 2022]

How OmicSoft datasets were analyzed in IPA

OmicSoft completely re-processes, normalizes, quality-checks, and annotates data from public repositories. The resulting datasets derive from a number of different experimental designs, cell types, tissues, array platforms, and RNA-seq technologies. Because of the diversity of experiments in the repository, it is impossible to use one or even a small set of standard cutoffs to analyze them in IPA. Therefore, the following strategy is used to obtain a fairly uniform set of analysis-ready genes for each dataset:

User-added image

Each gene that passes these cutoffs is given an Expr Other value of 1 during the data upload process to "mark" it as an analysis-ready gene. This value appears as an up arrow in IPA when viewing the dataset, but IPA does not treat the gene as up-regulated.

The repository contains over 100,000 datasets, with the majority containing ~1,000 analysis-ready genes. A subset has fewer than 1,000 genes passing the p-value <0.01* cutoff. In each case, the reference set was assigned to the complete dataset, meaning both analysis-ready and all other genes in the dataset. The repository is typically updated quarterly, both with new datasets and with a re-run of all existing analyses to use the latest content.

*Note that for LINCS, the p-value threshold was set to 0.05 rather than 0.01.
 

How signatures are created and compared

After the analysis is created, IPA creates a set of up to four signatures for the analysis, consisting of what is shown in the parentheses.
  • Canonical Pathways (up to 20 pathways)
  • Upstream Regulators (up to 100 regulators)
  • Causal Networks (up to 100 master regulators)
  • Diseases & Functions (up to 100 diseases or functions)
Each signature was created as described in the illustration below:

User-added image

Not every analysis has enough significant entities of each type to form a full signature for each. For example, there may only be six Canonical Pathways with significant z-scores for a particular analysis, and so for that analysis, the Canonical Pathway Signature would only contain six entities (i.e., six pathways). 


Scoring of signature against other analyses 

IPA computes a z-score for the match of the "query" signature against the signatures of all other analyses as shown:

User-added image

That "raw" z-score is available to view in a hidden-by-default column in the Analysis Match tab. To make the score more useful, IPA normalizes the score by computing the maximum possible z-score for the query signature. This is the best match a signature could possibly have -- a match to itself:

User-added image

Then the actual match (the raw z-score) is calculated as a percentage of the max; i.e., a very strongly matching z-score might be 80% of the maximum, whereas a weakly matching signature might have a raw z-score that is 20% of the max.
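The normalization step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name is hypothetical, and the raw and maximum z-scores are taken as given inputs, since how IPA computes them is shown only in the illustrations.

```python
def percent_of_max(raw_z, max_z):
    """Express a raw match z-score as a percentage of the maximum possible
    z-score for the query signature (its match to itself)."""
    if max_z <= 0:
        raise ValueError("max_z must be positive")
    return 100.0 * raw_z / max_z

# Hypothetical values: a strong match, a weak match, and a dissimilar analysis.
print(percent_of_max(8.0, 10.0))   # 80.0
print(percent_of_max(2.0, 10.0))   # 20.0
print(percent_of_max(-6.0, 10.0))  # -60.0
```

A negative raw z-score yields a negative percentage, which corresponds to a dissimilar (opposite) analysis.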
 


You can also use the repository without your own analysis, just by searching for available analyses of interest.


The OmicSoft datasets and analyses are stored in the IPA Library, and you can use Dataset and Analysis Search to quickly find analyses of interest. Note that they are read-only and cannot be downloaded from IPA.
image

The image below shows a search for human asthma analyses but excludes those involving albuterol. From search results like these, you can double click to open an analysis or select up to 20 for visualization in a full Comparison Analysis. 

image


You can also use the Overlay > Analyses, Datasets, & Lists feature to search for OmicSoft analyses by metadata keywords and overlay them onto a pathway or network of interest to see which molecules are up- or down-regulated in that particular condition.


If Analysis Match is not active on your license, consult your local QIAGEN customer solutions manager or email ts-bioinformatics@qiagen.com for details on how to get access.
 
 
 
 
 
 
 
 