Showing posts with label data analysis. Show all posts

Wednesday, 1 July 2015

Mapping a gene list onto KEGG pathways

In a previous post I discussed using GSEA to identify enriched KEGG pathways in my microarray data-set. After running the GSEA leading edge analysis I had a list of the genes that contributed most to the enrichment score for each pathway. I then wanted to see where these genes fell within each KEGG pathway. The ‘user data mapping’ function in KEGG Mapper is a nice tool for doing this quickly.

User data mapping

Select the ‘user data mapping’ option when viewing the reference KEGG pathway. In the pop-up window, enter the gene symbols, each followed by the background and foreground colors as hexadecimal values (if no color is specified, the default is red).
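For a long leading edge list, typing the entries by hand gets tedious, so here is a minimal Python sketch that builds the text to paste into the pop-up window. It assumes each line is a gene symbol, a space, and then `bgcolor,fgcolor` in hexadecimal; check the KEGG Mapper help text for the exact layout it accepts. The function name `kegg_mapper_lines`, the gene symbols, and the colors are all placeholders, not anything taken from KEGG or from my data.

```python
def kegg_mapper_lines(genes, bgcolor="#ff0000", fgcolor="#ffffff"):
    """Return one 'symbol bgcolor,fgcolor' line per unique gene symbol.

    Assumed layout: identifier, a space, then background and foreground
    colors as hexadecimal values separated by a comma.
    """
    seen = set()
    lines = []
    for gene in genes:
        symbol = gene.strip()
        # Skip blank entries and duplicates so each gene is listed once.
        if not symbol or symbol in seen:
            continue
        seen.add(symbol)
        lines.append(f"{symbol} {bgcolor},{fgcolor}")
    return lines


# Placeholder symbols standing in for a real leading edge gene list.
leading_edge = ["GENE1", "GENE2", "GENE1"]
print("\n".join(kegg_mapper_lines(leading_edge)))
```

Calling the function once per group of genes with different colors (for example, up-regulated and down-regulated genes) and joining the results gives a single block of text to paste.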

Clicking ‘pathway mapping’ updates the reference pathway to highlight the entered gene list. In this case it appears that my leading edge includes a subset of genes involved in methylation of histone lysine residues (in addition to the main pathway functions).

Tuesday, 2 June 2015

Analyzing pathway expression using GSEA

For my first study I profiled gene expression in the retina/RPE/choroid using microarrays and analyzed the resulting data-sets using Gene Set Enrichment Analysis (GSEA). GSEA evaluates genome-wide expression profiles to determine whether predefined classes of genes (gene sets) are enriched, that is, concentrated toward the top or bottom of a ranked gene list. These gene sets are based on a priori knowledge, such as KEGG pathways. GSEA's strength lies in its ability to identify subtle changes distributed across a transcript network that may be missed by more traditional single-gene analysis approaches. This approach can unify results across seemingly disparate but related data-sets, which is valuable in a research area like mine where relatively few similarities have been identified across the transcriptome-wide studies conducted to date. On a practical level, the results of GSEA are also more interpretable than large lists of individual differentially-expressed genes, as they are grounded in an established biological framework.
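To make the idea of a gene set being concentrated at one end of a ranked list more concrete, here is a rough Python sketch of the running-sum enrichment score that GSEA is built around. It is a simplified illustration, not the Broad implementation: it skips the permutation testing and normalization that the real software performs, and the function name `enrichment_score`, the exponent `p`, and the toy genes and scores are assumptions made for this example.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Walk down a ranked gene list and return the largest deviation
    of the running sum from zero.

    ranked_genes: gene names sorted by their ranking metric (descending).
    scores: the ranking metric for each gene, in the same order.
    gene_set: collection of genes belonging to the pathway of interest.
    p: exponent used to weight hits by the size of their score.
    """
    gene_set = set(gene_set)
    in_set = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(in_set)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("need genes both inside and outside the gene set")

    # Each hit steps the running sum up in proportion to |score|**p.
    hit_weights = [abs(s) ** p if hit else 0.0 for s, hit in zip(scores, in_set)]
    total_hit_weight = sum(hit_weights)
    if total_hit_weight == 0:
        raise ValueError("all gene-set members have a score of zero")

    running = 0.0
    best = 0.0
    for weight, hit in zip(hit_weights, in_set):
        if hit:
            running += weight / total_hit_weight
        else:
            # Each miss steps the running sum down by the same fixed amount.
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy example: the two set members sit at the top of the ranked list.
genes = ["A", "B", "C", "D"]
metric = [2.0, 1.0, -1.0, -2.0]
print(enrichment_score(genes, metric, {"A", "B"}))
```

A score near +1 means the set members cluster at the top of the list, a score near -1 means they cluster at the bottom, and a score near zero means they are scattered throughout.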

When I first started using GSEA to evaluate my dataset, our lab had a subscription to Pathway Studio. The Pathway Studio implementation was easy to use and had great graphics; unfortunately our licence expired before I was finished. I switched to the Broad Institute’s GSEA software using the graphical Java interface. I’m now glad I was forced to move to a freeware platform, as I think the Broad software gave me greater control of the analysis and the ability to explore the results in more depth.

The Broad GSEA wiki provides a useful overview of how to use the application. I found that leading edge analysis was a particularly important tool when interpreting results. This analysis identifies the core genes responsible for the enrichment of a gene set, and the overlap in these genes across enriched pathways.
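The overlap part of this can be sketched in a few lines of Python once the leading edge genes have been exported for each pathway. This is only an illustration of the idea, not the Broad tool itself: the function name `leading_edge_overlap`, the pathway names, and the gene symbols are hypothetical, and the Jaccard index (shared genes divided by total distinct genes) is used here as one reasonable way to score overlap.

```python
from itertools import combinations


def leading_edge_overlap(leading_edges):
    """Compare the leading edge gene lists of every pair of pathways.

    leading_edges: dict mapping a pathway name to its leading edge genes.
    Returns a dict keyed by (pathway_a, pathway_b) holding the sorted
    shared genes and the Jaccard index for that pair.
    """
    results = {}
    pairs = combinations(sorted(leading_edges.items()), 2)
    for (name_a, genes_a), (name_b, genes_b) in pairs:
        set_a, set_b = set(genes_a), set(genes_b)
        shared = set_a & set_b
        union = set_a | set_b
        # Jaccard index: size of the intersection over size of the union.
        jaccard = len(shared) / len(union) if union else 0.0
        results[(name_a, name_b)] = (sorted(shared), jaccard)
    return results


# Placeholder pathways and genes, not real GSEA output.
example = {
    "PATHWAY_A": ["G1", "G2", "G3"],
    "PATHWAY_B": ["G2", "G3", "G4"],
}
for pair, (shared, jaccard) in leading_edge_overlap(example).items():
    print(pair, shared, round(jaccard, 2))
```

Pairs with a high index are largely driven by the same core genes, which is a hint that two enriched pathways may be reporting the same underlying signal.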