Please specify allele and strain names in publications

WormBase curates data from published papers and attaches different types of data, such as phenotype, overview, expression and human disease model data, to genetic entities such as genes, alleles, strains or transgenes. These entities are bona fide ‘objects’ in our database, which is what allows us to attach data to them. If we cannot find these named genetic entities in your paper, it becomes extremely difficult for us to curate the paper. It is not enough to specify only the amino acid or nucleic acid change of a mutation; we need either the strain name or the allele name to curate the paper.

WormBase Version WS262

As you may have noticed, version WS262 has been out for several days. The version number at the top of the WormBase home page is now a link to the Release Notes, a concise summary listing the various data types and their counts. Check it out if you want a quick view of the breadth and depth of the data in WormBase and to see what has changed since the last release. The list of data files available for this release can also be found on our FTP site.

Data explained: gene descriptions

WormBase writes and displays short summaries about genes in the ‘Overview’ widget at the very top of gene pages. When we realized we couldn’t keep up with both updating existing gene descriptions and writing new ones, we developed an automated gene descriptions pipeline that reads primary data from the most recent WormBase release in order to write gene descriptions for the next release (e.g., the gene descriptions for the WS262 release of WormBase are based on the WS261 data release). The data we currently include in a gene description are: orthology to human for C. elegans genes, and orthology to C. elegans for genes of non-elegans species (such as C. briggsae); biological process, molecular function and cellular localization (based on Gene Ontology (GO) annotations); and tissue expression data. For poorly studied genes with no functional data, we include summaries of expression and regulation data from large-scale experiments such as microarray, tiling array and RNA sequencing. For every new release, scripts add to the gene descriptions any new data in the above categories that has been curated between releases. We currently have over 140,000 gene descriptions for nine species. The descriptions for genes of non-elegans species such as C. briggsae and C. japonica can be found in the ‘Overview’ widget on their respective gene pages. In addition, for each release we make available on our FTP site a downloadable file with all the gene descriptions for a given species; for example, the file for C. elegans is ‘c_elegans.PRJNA13758.WS262.functional_descriptions.txt’. Files for other species can be found by going down a similar directory structure in the WS262 release directory.
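The release relationship and file naming described above can be sketched in a few lines of Python. This is only an illustration, not part of the WormBase pipeline: the function names `source_release` and `descriptions_filename` are hypothetical, and the filename pattern is inferred from the single C. elegans example given above, so it is an assumption that other species follow it exactly.

```python
import re


def source_release(target_release: str) -> str:
    """Return the release whose data feeds the descriptions for target_release.

    Assumes release names have the form 'WS<number>' and that the source
    release is always the one immediately before the target.
    """
    match = re.fullmatch(r"WS(\d+)", target_release)
    if match is None:
        raise ValueError(f"not a WormBase release name: {target_release!r}")
    return f"WS{int(match.group(1)) - 1}"


def descriptions_filename(species: str, bioproject: str, release: str) -> str:
    """Build a descriptions filename following the C. elegans example.

    The pattern is an assumption generalized from one known filename.
    """
    return f"{species}.{bioproject}.{release}.functional_descriptions.txt"


if __name__ == "__main__":
    # Descriptions shipped in WS262 are generated from WS261 data.
    print(source_release("WS262"))
    print(descriptions_filename("c_elegans", "PRJNA13758", "WS262"))
```

The species prefix and BioProject identifier for any other species would need to be looked up in the release directory rather than guessed.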