WormBase Version WS262

As you may have noticed, WormBase version WS262 has been out for several days. The version number at the top of the WormBase home page now links to the Release Notes, a concise summary listing the various data types and their counts. Check it out for a quick view of the breadth and depth of the data in WormBase and to see what has changed since the last release. The list of data files available for this release can also be found on our FTP site.

Data explained: gene descriptions

WormBase writes short summaries of genes and displays them in the ‘Overview’ widget at the very top of gene pages. When we realized we could not keep up with both updating existing gene descriptions and writing new ones, we developed an automated gene descriptions pipeline. It uses primary data from the most recent WormBase release to write the gene descriptions for the next release (e.g., the gene descriptions in WS262 are based on WS261 data). A gene description currently includes: orthology to human genes (for C. elegans genes) or to C. elegans genes (for genes of other species, such as C. briggsae); biological process, molecular function and cellular localization, based on Gene Ontology (GO) annotations; and tissue expression data. For poorly studied genes with no functional data, we include summaries of expression and regulation data from large-scale experiments such as microarray, tiling array and RNA sequencing studies. For every new release, scripts add any data in these categories that were curated since the previous release.

We currently have over 140,000 gene descriptions for nine species. Descriptions for genes of other species, such as C. briggsae and C. japonica, appear in the ‘Overview’ widget on their respective gene pages. We also provide a file with all the gene descriptions for a given species and release for download from our FTP site; for C. elegans, for example, this is the ‘c_elegans.PRJNA13758.WS262.functional_descriptions.txt’ file. Files for other species can be found by following a similar directory structure in the WS262 release directory.
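The release lag and the file naming described above can be captured in a couple of small helpers. This is a minimal sketch for illustration only: the function names are hypothetical, and it assumes (based only on the single C. elegans example) that other species' files follow the same species.BioProject.release.functional_descriptions.txt pattern.

```python
import re


def source_release(release: str) -> str:
    """Return the release whose data the gene descriptions for `release`
    are built from: one release earlier (e.g. WS262 -> WS261)."""
    match = re.fullmatch(r"WS(\d+)", release)
    if match is None:
        raise ValueError(f"not a WormBase release name: {release!r}")
    return f"WS{int(match.group(1)) - 1}"


def descriptions_filename(species: str, bioproject: str, release: str) -> str:
    """Build a gene descriptions file name, ASSUMING every species follows
    the pattern of the C. elegans example file."""
    return f"{species}.{bioproject}.{release}.functional_descriptions.txt"


print(source_release("WS262"))  # WS261
print(descriptions_filename("c_elegans", "PRJNA13758", "WS262"))
# c_elegans.PRJNA13758.WS262.functional_descriptions.txt
```

For species other than C. elegans, check the actual names in the release directory rather than relying on the assumed pattern.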

Specify genome locations with 30 nucleotides of flanking sequence!

Many of us have tried to reconstruct what someone else has done and been frustrated trying to find the exact sequence. Relative coordinates do not last. Gene models often change, so “Leu234” in a protein may no longer exist. Our knowledge of the genome sequence also changes (or we are working with a different strain), so the EcoRI site 5’ of your favorite gene may not be there. There is an easy solution: always specify a location by sequence. Thirty nucleotides are sufficient in essentially all cases to locate the site uniquely. Taking the small effort to specify genome locations by sequence when you write a paper will make your experiments easier to reproduce and will help WormBase curate such studies.
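To see why a sequence-based location is easy to recover, here is a minimal sketch that finds every occurrence of a 30-nt sequence on both strands of a genome. It assumes the genome is already loaded as a single string (in practice it would be read from a FASTA file). The function names and the toy sequence are made up for illustration, and the plain string scan is meant to show the idea, not to serve as an efficient genome search tool.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(complement)[::-1]


def find_site(genome: str, query: str) -> list[tuple[int, str]]:
    """Return every (0-based start, strand) where `query` occurs in `genome`.

    The minus strand is searched by looking for the reverse complement of
    the query on the plus strand; overlapping matches are included.
    """
    genome = genome.upper()
    query = query.upper()
    hits = []
    for strand, probe in (("+", query), ("-", reverse_complement(query))):
        start = genome.find(probe)
        while start != -1:
            hits.append((start, strand))
            start = genome.find(probe, start + 1)
    return sorted(hits)


# Made-up toy "genome" containing one copy of a 30-nt site.
site = "ACGTTGCAAGGCTTACGATCCGATGCATGA"
genome = "TTTT" + site + "GGGG"
print(find_site(genome, site))  # [(4, '+')]
```

A location is uniquely specified when the returned list has exactly one entry. With 30 nucleotides, this is the case for essentially every site in a real genome.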