As you may have noticed, WormBase release WS262 has been out for several days. The version number at the top of the WormBase home page now links to the Release Notes, a concise summary of the various data types and their counts. Check it out for a quick view of the breadth and depth of the data in WormBase and to see what has changed since the last release. The list of data files available for this release can also be found on our FTP site.
Check out the new chapter in WormBook (Genetics): “Repressive Chromatin in Caenorhabditis elegans: Establishment, Composition and Function,” by Julie Ahringer and Susan M. Gasser.
WormBase writes and displays short gene summaries in the ‘Overview’ widget at the very top of gene pages. When we realized we couldn’t keep up with both updating existing gene descriptions and writing new ones, we developed an automated gene descriptions pipeline that uses primary data from the most recent WormBase release to write gene descriptions for the next release (e.g., the gene descriptions for the WS262 release of WormBase are based on the WS261 data release). A gene description currently includes: orthology to human for C. elegans genes, and orthology to C. elegans for genes of non-elegans species (such as C. briggsae); biological process, molecular function and cellular localization (based on Gene Ontology (GO) annotations); and tissue expression data. For poorly studied genes with no functional data, we include summaries of expression and regulation data from large-scale experiments such as microarray, tiling array and RNA sequencing. For every new release, scripts add to the gene descriptions any data in these categories that were curated since the previous release. We currently have over 140,000 gene descriptions for nine species. The descriptions for non-elegans genes (C. briggsae, C. japonica, etc.) can be found in the ‘Overview’ widget on their respective gene pages. In addition, we make available on our FTP site a downloadable file with all the gene descriptions for a given species and release; for example, for C. elegans, the ‘c_elegans.PRJNA13758.WS262.functional_descriptions.txt’ file is available here. Files for other species can be found by following a similar directory structure in the WS262 release directory.
Many of us have tried to reconstruct what someone else has done and been frustrated trying to find the exact sequence. Relative coordinates do not last: gene models often change, so that “Leu234” in a protein is no longer there, and our knowledge of the genome sequence changes (or we are working with a different strain), so the EcoRI site 5’ to your favorite gene is not there. There is an easy solution: always specify a location by sequence. Thirty nucleotides are sufficient in essentially all cases to uniquely locate the site. The simple effort of specifying a genome location by sequence when you write a paper will make experiments easily reproducible and will help WormBase curate such studies.
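To illustrate why a sequence tag is robust, here is a minimal Python sketch of how a reader might map a 30-nucleotide tag back onto a genome and check that it hits exactly one place. The function names and the toy sequence are hypothetical and are not part of any WormBase tool. The sketch assumes the chromosome sequences are already loaded into a dictionary of plain strings; a real analysis would read them from a FASTA file and would probably use an aligner that tolerates mismatches.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def find_all(haystack, needle):
    """Return every 0-based start position of needle, overlaps included."""
    hits = []
    start = haystack.find(needle)
    while start != -1:
        hits.append(start)
        start = haystack.find(needle, start + 1)
    return hits


def locate_tag(genome, tag):
    """Find exact matches of a sequence tag on both strands.

    genome: dict mapping a sequence name to its DNA string (assumed input).
    Returns (name, position, strand) tuples, where position is the 0-based
    leftmost coordinate of the match on the forward strand.
    """
    tag = tag.upper()
    rc = reverse_complement(tag)
    hits = []
    for name, seq in genome.items():
        seq = seq.upper()
        hits += [(name, pos, "+") for pos in find_all(seq, tag)]
        if rc != tag:  # skip the second search for palindromic tags
            hits += [(name, pos, "-") for pos in find_all(seq, rc)]
    return hits


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    tag = "ATGGCTAGCTAGGATCCGATTACAGGCTTA"  # 30 nucleotides
    genome = {"toy": "AAAA" + tag + "CCCC"}
    hits = locate_tag(genome, tag)
    print(hits)  # [('toy', 4, '+')]
    print("unique" if len(hits) == 1 else "not unique: %d hits" % len(hits))
```

A tag that returns exactly one hit identifies the site unambiguously, regardless of how the gene model or the coordinate system has changed; zero or multiple hits signal that a longer or different tag is needed for that genome version or strain.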
WormBase requests that authors provide complete information about genetic entities such as strains, alleles and transgenes in published papers. Including in the paper a clear list of the genetic entities used, along with complete information about them, makes curating your paper easier and quicker, and saves the time and effort that curators spend searching for such information and/or writing to authors. For example, if you use strains, please provide the complete list of strain names and their genotypes, including transgenes, markers and any additional components. Papers with incomplete information about genetic entities and reagents can be only partially curated, or not curated at all, making valuable data about worm models of biology and disease unavailable to both the worm and biomedical research communities.