Nina Riddell: microarray

Friday, 12 August 2016

Validating RNA-seq results using previously published data

Large-scale omics studies usually require some form of validation following the initial analysis. For RNA-seq or microarray studies, this validation frequently takes the form of qPCR. However, qPCR isn't always possible and, as pointed out here and here, is not without limitations.

Comparing single-gene results against a previously published dataset of similar design offers an alternative (or complementary) approach to traditional qPCR validation. The 2010 paper by Suarez-Farinas et al. provides a nice example of how to do this using GSEA. The authors' approach rests on the reasoning that the genes classified as significantly up- and down-regulated in a given study should be enriched at the top and bottom, respectively, of the ranked gene list from a microarray or RNA-seq dataset testing the same treatment effect. This method is potentially more meaningful than choosing a handful of results for qPCR replication, as it allows a large number of differentially-expressed genes to be tested (e.g. 15-1000). Moreover, it can validate genes that respond robustly to the treatment effect of interest under the varied conditions encompassed by the two datasets, whilst ruling out the influence of the many small methodological idiosyncrasies that a within-lab qPCR validation would likely reproduce.
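To make the reasoning concrete, here is a minimal Python sketch of the running-sum enrichment score that underlies this kind of comparison. It is an illustration only, not the authors' code or the Broad implementation: it uses the simple unweighted (Kolmogorov-Smirnov-style) form of the statistic, and the function name `enrichment_score` and the toy gene labels are invented for the example. A real analysis would also need a permutation step to judge significance.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score (illustrative sketch).

    ranked_genes: unique gene IDs from the new dataset, ordered from most
                  up-regulated to most down-regulated.
    gene_set:     genes reported as significant in the published study.
    Returns the running-sum value furthest from zero: positive when the set
    clusters near the top of the list, negative when it clusters near the bottom.
    """
    hits = set(gene_set) & set(ranked_genes)
    n_hits = len(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list without covering it")

    running = 0.0
    best = 0.0
    for gene in ranked_genes:
        # Step up on a gene-set member, step down otherwise.
        if gene in hits:
            running += 1.0 / n_hits
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy example (hypothetical gene labels): the new dataset ranked by fold change,
# and the up/down gene lists taken from a previously published study.
ranked = ["A", "B", "C", "D", "E", "F", "G", "H"]
published_up = {"A", "B", "C"}
published_down = {"F", "G", "H"}

es_up = enrichment_score(ranked, published_up)
es_down = enrichment_score(ranked, published_down)
print(round(es_up, 2), round(es_down, 2))
```

If the two studies agree, the published up-regulated set should give a clearly positive score and the down-regulated set a clearly negative one, as in the toy data here.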

If you’re interested, check out the paper by Suarez-Farinas et al. (2010) and our recently published RNA-seq study.

Tuesday, 2 June 2015

Analyzing pathway expression using GSEA

For my first study I profiled gene expression in the retina/RPE/choroid using microarray and analyzed the resulting datasets using Gene Set Enrichment Analysis (GSEA). GSEA evaluates genome-wide expression profiles to determine whether classes of genes (gene sets) are over-represented among the most up- or down-regulated genes. These gene sets are based on a priori knowledge, such as KEGG pathways. GSEA's strength lies in its ability to identify subtle changes distributed across a transcript network that may be missed by more traditional single-gene analysis approaches. This approach can also unify results across related but seemingly disparate datasets, which is valuable in a research area like mine where relatively few similarities have been identified across the transcriptome-wide studies conducted to date. On a practical level, the results of GSEA are also more interpretable than large lists of individual differentially-expressed genes, as they are grounded in an established biological framework.

When I first started using GSEA to evaluate my dataset, our lab had a subscription to Pathway Studio. The Pathway Studio implementation was easy to use and had great graphics; unfortunately, our licence expired before I was finished. I switched to the Broad Institute’s GSEA software, using the graphical Java interface. I’m now glad I was forced to move to a freeware platform, as I think the Broad software gave me greater control of the analysis and the ability to explore the results in more depth.

The Broad GSEA wiki provides a useful overview of how to use the application. I found that leading edge analysis was a particularly important tool when interpreting results. This analysis identifies the core genes responsible for the enrichment of a gene set, and the overlap in these genes across enriched pathways.
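For readers who want to see what the leading edge is in practice, the following Python sketch extracts it from a ranked gene list and then counts how often each gene recurs across gene sets. It is a simplified illustration rather than the Broad software's code: it uses an unweighted running sum, and the function name `leading_edge`, the pathway names and the gene labels are all invented for the example.

```python
from collections import Counter


def leading_edge(ranked_genes, gene_set):
    """Return the leading-edge genes of gene_set (illustrative sketch).

    For a positive enrichment score these are the gene-set members at or
    before the peak of the running sum; for a negative score, the members
    after the trough.
    """
    hits = set(gene_set) & set(ranked_genes)
    n_hits = len(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list without covering it")

    running, best, peak = 0.0, 0.0, 0
    for i, gene in enumerate(ranked_genes):
        running += 1.0 / n_hits if gene in hits else -1.0 / n_miss
        if abs(running) > abs(best):
            best, peak = running, i

    # Positive score: genes up to and including the peak.
    # Negative score: genes after the trough (the trough itself is a non-member).
    window = ranked_genes[: peak + 1] if best >= 0 else ranked_genes[peak + 1:]
    return [g for g in window if g in hits]


# Toy ranked list and two hypothetical pathways.
ranked = list("ABCDEFGHIJ")
pathways = {
    "pathway_1": {"A", "B", "D", "J"},
    "pathway_2": {"B", "C", "I"},
}

edges = {name: leading_edge(ranked, genes) for name, genes in pathways.items()}

# Genes that appear in the leading edge of more than one pathway are
# candidates for the shared "core" signal.
overlap = Counter(g for edge in edges.values() for g in edge)
print(edges)
print(overlap.most_common())
```

Genes with a count above one sit in the leading edge of several enriched sets, which is the kind of overlap the leading edge analysis is designed to surface.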