WormBase updates in WS277 – new protein schematic images!

We have released the 277th version of WormBase! As always, for a detailed report, please see the WS277 release notes.

New features: We’ve updated the protein schematic image in the homology widget on protein pages (see, for example, the UNC-2 isoform a page). This image displays protein domains and exons mapped to amino acid coordinates, making it easy to see which regions of a transcript correspond to specific features. Because the image is driven by JBrowse, the genome browser at WormBase, users can click through it to an interactive view in amino acid coordinates. From that familiar interface, one can scroll, export sequence, get additional information on specific features, and zoom in to single-residue resolution, with residues color-coded by chemical property. The previous static protein schematic image remains available via the “Legacy Protein Schematics” link, but will be removed after the WS278 release.
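To illustrate what “exons mapped to amino acid coordinates” means, here is a minimal Python sketch. It assumes you already know how many coding (CDS) nucleotides each exon contributes, in transcript order; it is not the code behind the WormBase/JBrowse schematic, and the function name `exon_aa_spans` is hypothetical.

```python
from typing import List, Tuple


def exon_aa_spans(cds_exon_lengths: List[int]) -> List[Tuple[int, int]]:
    """Return the first and last residue (1-based) overlapped by each coding exon.

    cds_exon_lengths: coding nucleotides contributed by each exon, in transcript order
    (an assumed input; real data would come from a transcript's CDS coordinates).
    An exon boundary that falls inside a codon makes that residue appear in both
    neighboring spans, which is how split codons show up on a schematic.
    """
    spans = []
    nt_offset = 0  # coding nucleotides in all preceding exons
    for length in cds_exon_lengths:
        if length <= 0:
            raise ValueError("exon CDS lengths must be positive")
        first_nt = nt_offset + 1       # 1-based position of the exon's first coding nt
        last_nt = nt_offset + length   # 1-based position of its last coding nt
        # Nucleotide p (1-based) lies in codon (p - 1) // 3 + 1.
        spans.append(((first_nt - 1) // 3 + 1, (last_nt - 1) // 3 + 1))
        nt_offset += length
    return spans


print(exon_aa_spans([10, 20, 30]))  # [(1, 4), (4, 10), (11, 20)]
```

In this toy example, residue 4 appears in both the first and second spans because the first exon ends partway through that codon.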

New data sets:

C. elegans: Additional Nanopore transcript data has been added.

C. elegans VC2010: The VC2010 strain data includes gene annotation that has been manually curated to improve the coding gene structures lifted over from the N2 strain annotation. This process is substantially complete, with some work still to be done on chromosome X. So far, the number of coding genes that do not map correctly has been reduced from around 400 to 50 genes that cannot yet be located, plus 10 that appear to be pseudogenes in VC2010. Twenty genes appear to be duplicated in N2 and are absent from VC2010. In addition, 39 novel coding genes have been created.


WormBase is updated to WS260

The 260th version of WormBase has been released! The release notes published with each new release give you various statistics about the data in the current release, as well as a quick peek at the breadth and depth of the data in WormBase. They also list all the data files available for download at our FTP site. Check all of these out and let us know if you have any suggestions or questions!

Gene Transfer Format (GTF) files now available

WormBase now provides the canonical gene set for each species in Gene Transfer Format (GTF, http://mblab.wustl.edu/GTF22.html). These files can be used directly by a number of popular sequence analysis tools (e.g., Cufflinks).

The GTF files are available from the WormBase FTP site; for example, the GTF file for C. elegans is c_elegans.PRJNA13758.WS253.canonical_geneset.gtf.gz.
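For quick checks without a full analysis tool, a minimal Python sketch can summarize one of these files. It assumes the GTF 2.2 layout described in the specification linked above: nine tab-separated columns, with attributes written as `key "value";` pairs that include `gene_id`. It does not assume which feature types the WormBase files contain, and the function names `summarize_gtf` and `parse_attributes` are hypothetical.

```python
import gzip
import re
from collections import Counter
from typing import Dict, Tuple

# GTF 2.2 attribute pairs look like: gene_id "GENE1"; transcript_id "TX1.1";
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def parse_attributes(field: str) -> Dict[str, str]:
    """Turn the 9th GTF column into a dict (only quoted values are captured)."""
    return dict(ATTR_RE.findall(field))


def summarize_gtf(path: str) -> Tuple[int, Counter]:
    """Count distinct gene_id values and rows per feature type in a GTF file."""
    feature_counts: Counter = Counter()
    gene_ids = set()
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip comments and blank lines
            cols = line.rstrip("\n").split("\t")
            if len(cols) != 9:
                continue  # skip lines that are not 9-column GTF records
            feature_counts[cols[2]] += 1  # column 3 is the feature type
            attrs = parse_attributes(cols[8])
            if "gene_id" in attrs:
                gene_ids.add(attrs["gene_id"])
    return len(gene_ids), feature_counts
```

Usage, after downloading the file: `n_genes, counts = summarize_gtf("c_elegans.PRJNA13758.WS253.canonical_geneset.gtf.gz")`. Reading the gzipped file directly avoids unpacking it first; for anything beyond simple counts, a dedicated tool such as Cufflinks is the better choice.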

 

WS247: C. briggsae genes have descriptions!

In the previous WS246 release, we introduced automated gene descriptions for C. elegans genes that lacked a manually written one. These gene descriptions include information related to orthology, process, function and subcellular localization (where these data types have been curated in the WormBase database), giving the user a quick overview of the gene. The current WS247 release includes automated descriptions for over 18,000 C. briggsae genes. Check out the C. briggsae gene pages to view these descriptions under ‘Overview’! In future releases, we will add genes from many more species! WormBase is also working on user-friendly forms that you can use to edit and improve these descriptions.