New and Improved Author First Pass Pipeline!

Community curation is a valuable part of WormBase. For the past ten years, C. elegans researchers have participated in our ‘Author First Pass’ pipeline using a web-based form to alert WormBase curators to entities and data types in their newly published papers.

We are very excited to announce a new and improved ‘Author First Pass’ pipeline that uses the Textpresso Central text mining system to automatically extract entities and identify data types in new C. elegans papers.

As before, corresponding authors will receive an email (Help WormBase curate your paper!) shortly after publication of their paper with a link to the new form.

But now, instead of having to enter all information de novo, authors simply need to verify the text mining results and, if need be, modify them using simple check boxes and autocomplete menus of WormBase entities.

Links from the ‘Author First Pass’ form to selected data entry forms allow for more detailed community curation, if desired.

We look forward to your participation and welcome any feedback you might have on your user experience!

Many thanks to Hannes Buelow, Simon Harvey, Hang Lu, Judith Kimble, Dayong M Wang, and Kunitoshi Yamanaka for helping to test our new form!

Data explained: gene descriptions

WormBase writes and displays short summaries about genes in the ‘Overview’ widget at the top of gene pages. When we realized we couldn’t keep up with both updating existing gene descriptions and writing new ones, we developed an automated gene descriptions pipeline that uses primary data from the most recent WormBase release to write gene descriptions for the next release (e.g., the gene descriptions for the WS262 release of WormBase are based on the WS261 data release).

The data we currently include in a gene description are: orthology to human for C. elegans genes, and orthology to C. elegans for genes of non-elegans species (such as C. briggsae); biological process, molecular function and cellular localization (based on Gene Ontology (GO) annotations); and tissue expression data. For poorly studied genes with no functional data, we include expression and regulation data summaries from large-scale experiments such as microarray, tiling array and RNA sequencing. For every new release, scripts add to the gene descriptions any data in the above categories that were curated between releases.

We currently have over 140,000 gene descriptions for nine species. The descriptions for genes of non-elegans species, such as C. briggsae and C. japonica, can be found in the ‘Overview’ widget on their respective gene pages. In addition, for each release we make a file with all the gene descriptions for a given species available for download on our FTP site; e.g., for C. elegans, the ‘c_elegans.PRJNA13758.WS262.functional_descriptions.txt’ file is available here. Files for other species can be found by following a similar directory structure in the WS262 release directory.
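
For readers who script against these files, here is a minimal Python sketch of the two conventions mentioned above: descriptions for a release are built from the previous release's data, and the file name combines species, project ID and release. The function names are hypothetical. The name pattern is inferred from the single C. elegans example above, so it is an assumption that other species follow it, and their project IDs are not listed here.

```python
import re


def source_release(target_release: str) -> str:
    """Return the release whose data feed the descriptions of `target_release`.

    Assumes releases are named 'WS' followed by an integer and that the
    source is always the immediately preceding release (e.g. WS262 <- WS261).
    """
    match = re.fullmatch(r"WS(\d+)", target_release)
    if match is None:
        raise ValueError(f"unexpected release name: {target_release!r}")
    number = int(match.group(1))
    if number < 2:
        raise ValueError("no earlier release to draw data from")
    return f"WS{number - 1}"


def descriptions_filename(species: str, project_id: str, release: str) -> str:
    """Build a descriptions file name following the C. elegans example.

    The pattern is inferred from one file name; treat it as an assumption
    for any other species.
    """
    return f"{species}.{project_id}.{release}.functional_descriptions.txt"


if __name__ == "__main__":
    print(source_release("WS262"))
    print(descriptions_filename("c_elegans", "PRJNA13758", "WS262"))
```

The sketch only builds names. It does not download anything, because the FTP directory layout is not spelled out here.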