The difference between expression pattern and expression cluster data in WormBase

Chris Grove explains how expression pattern data differs from expression cluster data in WormBase:

It’s important to understand that we have two avenues of gene expression curation at WormBase. The first covers the evaluation of individual gene expression patterns (e.g. using expression reporters like GFP, or by performing in situ hybridization of mRNA); these are captured in our “Expression Pattern” database objects (the “Expr_pattern” class). The second covers larger-scale (e.g. genome-wide) analyses of gene expression (e.g. RNA-Seq or microarray analyses) under certain conditions, in certain cells, or in certain life stages; these are captured in our “Expression Cluster” database objects (the “Expression_cluster” class).

In “Expression Pattern” annotations, genes may be associated with an anatomy term via a qualifier which can be one of three terms: Certain, Uncertain, or Partial. If authors declare that a gene is clearly and specifically expressed in a particular anatomical object, we flag it as “Certain”. If authors state that a gene might be expressed in a particular anatomical object, we flag it as “Uncertain”. If authors state that a gene is expressed in part of (or in a subset of) a particular anatomical object, we flag it as “Partial”.

In “Expression Cluster” annotations, genes may be associated with an anatomy term via a qualifier which can be one of three terms: Expressed, Depleted, or Enriched. When large-scale studies like RNA-Seq are performed in a particular neuron, for example, expression can be detected even for very lowly expressed genes, and the resulting set can sometimes include 10,000-15,000 genes (almost the entire set of protein-coding genes). All of these are considered “Expressed”, even though they may be expressed only at a very low level, or at a moderate level that is not really higher than in any other tissue (e.g. housekeeping genes). Therefore, genes in these Expression Cluster sets are typically evaluated as to whether they are “Enriched” in a particular anatomical object (i.e. statistically much more likely to be expressed there than in other tissues) or “Depleted” in a particular anatomical object (i.e. statistically much less likely to be expressed there than in other tissues). For our expression summary files, such as the “anatomy_association” file on the WormBase FTP site, we only include gene-anatomy associations from the “Expression Cluster” data class when the association is considered “Enriched” (which is why you don’t see the qualifiers “Expressed” or “Depleted” in the file).
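
To make the inclusion rule concrete, here is a minimal Python sketch of how the two qualifier vocabularies and the “Enriched”-only filter could be modelled. It is only an illustration, not WormBase’s actual pipeline or file format: the `Association` record, the `include_in_summary` function, and the gene and anatomy labels are all hypothetical. It also assumes that “Expression Pattern” associations are kept whatever their qualifier, which the description above implies but does not state outright.

```python
from dataclasses import dataclass

# Qualifier vocabularies as described in the text above.
EXPR_PATTERN_QUALIFIERS = {"Certain", "Uncertain", "Partial"}
EXPR_CLUSTER_QUALIFIERS = {"Expressed", "Enriched", "Depleted"}


@dataclass(frozen=True)
class Association:
    """Hypothetical gene-anatomy association record (not a WormBase schema)."""
    gene: str       # placeholder gene label
    anatomy: str    # placeholder anatomy term label
    source: str     # "Expr_pattern" or "Expression_cluster"
    qualifier: str


def include_in_summary(assoc: Association) -> bool:
    """Decide whether an association would appear in a summary file.

    Expression Cluster associations are kept only when "Enriched".
    Assumption: Expression Pattern associations are kept with any of
    their three qualifiers.
    """
    if assoc.source == "Expression_cluster":
        if assoc.qualifier not in EXPR_CLUSTER_QUALIFIERS:
            raise ValueError(f"unknown cluster qualifier: {assoc.qualifier}")
        return assoc.qualifier == "Enriched"
    if assoc.source == "Expr_pattern":
        if assoc.qualifier not in EXPR_PATTERN_QUALIFIERS:
            raise ValueError(f"unknown pattern qualifier: {assoc.qualifier}")
        return True
    raise ValueError(f"unknown source class: {assoc.source}")


# Made-up example records, purely for illustration.
records = [
    Association("gene-A", "neuron-X", "Expr_pattern", "Certain"),
    Association("gene-B", "neuron-X", "Expr_pattern", "Partial"),
    Association("gene-C", "neuron-X", "Expression_cluster", "Expressed"),
    Association("gene-D", "neuron-X", "Expression_cluster", "Enriched"),
    Association("gene-E", "neuron-X", "Expression_cluster", "Depleted"),
]

kept = [a for a in records if include_in_summary(a)]
```

In this sketch, `kept` holds the two “Expression Pattern” records plus the single “Enriched” cluster record; the “Expressed” and “Depleted” cluster records are filtered out, mirroring why those qualifiers do not appear in the summary file.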
