The difference between expression pattern and expression cluster data in WormBase

Chris Grove explains how expression pattern data differs from expression cluster data in WormBase:

It’s important to understand that we have two avenues of gene expression curation at WormBase. The first covers the evaluation of individual gene expression patterns (e.g. using expression reporters like GFP, or performing an in situ hybridization of mRNA); these data are captured in our “Expression Pattern” database objects (the “Expr_pattern” class). The second covers larger-scale (e.g. genome-wide) analyses of gene expression (e.g. RNA-Seq or microarray analyses) under certain conditions, in certain cells, or at certain life stages; these data are captured in our “Expression Cluster” database objects (the “Expression_cluster” class).

In “Expression Pattern” annotations, genes may be associated with an anatomy term via a qualifier which can be one of three terms: Certain, Uncertain, or Partial. If authors declare that a gene is clearly and specifically expressed in a particular anatomical object, we flag it as “Certain”. If authors state that a gene might be expressed in a particular anatomical object, we flag it as “Uncertain”. If authors state that a gene is expressed in part of (or in a subset of) a particular anatomical object, we flag it as “Partial”.

In “Expression Cluster” annotations, genes may be associated with an anatomy term via a qualifier which can be one of three terms: Expressed, Depleted, or Enriched. When large-scale studies like RNA-Seq are performed in a particular neuron, for example, expression can be detected even for very lowly expressed genes, and the detected set can sometimes include 10,000-15,000 genes (almost the entire set of protein-coding genes). These genes are considered “Expressed”, although some may be expressed only at very low levels, or at a moderate level but not really more so than in any other tissue (e.g. housekeeping genes). Therefore, genes in these Expression Cluster sets are typically evaluated as to whether they are “Enriched” in a particular anatomical object (i.e. statistically much more likely to be expressed there than in other tissues) or “Depleted” in a particular anatomical object (i.e. statistically much less likely to be expressed there than in other tissues). For our expression summary files, such as the “anatomy_association” file on the WormBase FTP site, we only include gene-anatomy associations from the “Expression Cluster” data class when the association is considered “Enriched”, which is why you don’t see the qualifiers “Expressed” or “Depleted” in the file.
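To make the two qualifier vocabularies and the “Enriched only” rule for cluster data concrete, here is a minimal Python sketch. The Association record, its field names, and the placeholder genes and anatomy terms are hypothetical; this does not reflect the actual layout of the anatomy_association file or the code WormBase uses to build it. Because the rule above only concerns Expression Cluster data, the sketch passes Expr_pattern associations through unchanged.

```python
from dataclasses import dataclass
from enum import Enum


class PatternQualifier(Enum):
    """Qualifiers used in Expression Pattern (Expr_pattern) annotations."""
    CERTAIN = "Certain"      # clearly and specifically expressed
    UNCERTAIN = "Uncertain"  # might be expressed
    PARTIAL = "Partial"      # expressed in part of (or a subset of) the structure


class ClusterQualifier(Enum):
    """Qualifiers used in Expression Cluster (Expression_cluster) annotations."""
    EXPRESSED = "Expressed"  # detected, possibly at a very low level
    ENRICHED = "Enriched"    # statistically more likely to be expressed here
    DEPLETED = "Depleted"    # statistically less likely to be expressed here


# Hypothetical mapping from data class to its allowed qualifier values.
ALLOWED = {
    "Expr_pattern": {q.value for q in PatternQualifier},
    "Expression_cluster": {q.value for q in ClusterQualifier},
}


@dataclass(frozen=True)
class Association:
    """Hypothetical in-memory record of one gene-anatomy association."""
    gene: str
    anatomy_term: str
    source_class: str  # "Expr_pattern" or "Expression_cluster"
    qualifier: str

    def __post_init__(self):
        # Reject qualifiers that do not belong to the record's data class.
        if self.qualifier not in ALLOWED.get(self.source_class, set()):
            raise ValueError(
                f"{self.qualifier!r} is not a valid qualifier "
                f"for {self.source_class!r}"
            )


def summary_associations(associations):
    """Keep Expression_cluster associations only when they are Enriched.

    Expr_pattern associations pass through unchanged; any other filtering
    the real export may apply is not modelled here.
    """
    kept = []
    for assoc in associations:
        if (assoc.source_class == "Expression_cluster"
                and assoc.qualifier != ClusterQualifier.ENRICHED.value):
            continue  # drop "Expressed" and "Depleted" cluster associations
        kept.append(assoc)
    return kept


records = [
    Association("gene-a", "neuron-x", "Expr_pattern", "Partial"),
    Association("gene-b", "neuron-x", "Expression_cluster", "Expressed"),
    Association("gene-c", "neuron-x", "Expression_cluster", "Enriched"),
    Association("gene-d", "neuron-x", "Expression_cluster", "Depleted"),
]
print([a.gene for a in summary_associations(records)])  # ['gene-a', 'gene-c']
```

Keeping the two vocabularies as separate enums mirrors the point of the post: “Certain” and “Enriched” answer different questions, so a qualifier from one class is rejected on a record from the other.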

Textpresso searches now available on WormBase pages

Textpresso is a powerful full-text search engine for several bodies of biomedical literature. Textpresso searches are now available in the References widget on many types of WormBase pages, such as gene pages, variation/allele pages, and molecule/chemical pages. To use it, first open the ‘References’ widget by clicking on it in the sidebar of a gene, allele, or molecule page. Then click the ‘Textpresso’ link in ‘Find references identified by Textpresso’ at the bottom of the references list to search for that entity with the Textpresso search engine. For example, clicking this link in the ‘References’ widget on the lin-10 page triggers a full-text Textpresso search for lin-10 in the C. elegans literature and returns a page displaying the search results.

WormBase version WS274 released

Please note that WormBase version WS274 has been released and is now live on the website. The C. elegans reference genome in this release is WBcel235, the version used since WS235.

Highlights of this release include:

  1. Isoseq data presented at the PacBio North America User Group meeting. We applied the PacBio post-processing pipeline as described here. The resulting sequences are included in the transcript alignments and will be visible in the genome browser. They will also be included in our automatic transcript generation pipeline in a future release of WormBase.
  2. Improved code to generate non-redundant transcripts. Each CDS should now have only unique transcripts.
  3. Improved interaction data display in the Interactions widget: a new Venn diagram for the different types of interactions (physical, genetic, and regulatory).

For a complete and more detailed list of changes for this release please see:

https://wormbase.org//about/wormbase_release_WS274#0–10