WS267 release of WormBase

Please note that the WS267 release of WormBase is now live.  The complete release notes for WS267 can be viewed here.  Some of the highlights include:

Trichuris muris update
Trinity-assembled RNASeq reads from publicly available short-read data at SRA have been added to Trichuris muris as an additional track and alignments. In addition, IsoSeq data from long-range PacBio data (provided by the Berriman lab), corrected by genome alignment, have been used as an additional source to build transcript models.

In addition, the Trichuris muris ncRNA gene set has increased from 26 to 759 following the integration of data produced by the WormBase Parasite ncRNA prediction pipeline. These transcripts have been fully integrated with stable IDs and associated naming and metadata.

Gene descriptions for T. muris will be coming in the WS268 release of WormBase.

Brugia malayi update
New gene models provided by the Beech lab for Parasitology at McGill University have been merged into the official gene set.

WS266 release

Please note that the WS266 version of WormBase has been released! The release notes for this release describe the data types and their numbers. A list of all files available on our FTP site can also be viewed.  Changes in this release include the following:

Physical Interaction data curation

We have added over 5000 manually curated physical interactions which include binary protein-protein interactions as well as protein interactions that occur in a protein complex. Protein-protein interaction data can be found as a part of physical interaction data in the Interactions widget on the gene page. The Interactions widget provides different types of interaction data related to the gene of interest, such as physical, genetic, regulatory, and predicted interactions.

 

Protein Identifiers

We have made a change to our internal identifiers for nematode protein sequences. Previously, we prefixed each identifier with *two* prefixes to denote which species the protein is from, e.g. WP:CE00001. We have removed the first prefix (the “WP:”) from these identifiers.

Since these prefixes are almost entirely invisible on the site and have never been used by external resources hosting worm data (e.g. UniProt), this change should not affect most users.
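For anyone who does hold identifiers in the old form (for example in a local file), the change can be sketched as below. This is only an illustration, not WormBase code: the function name strip_wp_prefix is hypothetical, and CE00002 is a made-up identifier used to show that IDs without the prefix are left untouched.

```python
def strip_wp_prefix(identifier: str) -> str:
    """Drop a leading 'WP:' from a protein identifier, if present.

    Illustrative helper only (hypothetical name, not part of any
    WormBase tool). Identifiers without the prefix are returned as-is.
    """
    prefix = "WP:"
    if identifier.startswith(prefix):
        return identifier[len(prefix):]
    return identifier


# "WP:CE00001" is the example from the text; "CE00002" is made up.
old_ids = ["WP:CE00001", "CE00002"]
new_ids = [strip_wp_prefix(i) for i in old_ids]
print(new_ids)
```

Applying the function twice gives the same result as applying it once, so it is safe to run over a mixture of old-style and new-style identifiers.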

WS265 release

Loci with two different protein products

There are a small number of loci which code for two very different protein products. These include dicistronic mRNA operons and loci which have a few small exons in common, followed by alternative splicing that leads to many different exons being used.

The details of some of these can be found in the WormBook chapter “Operon and non-operon gene clusters in the C. elegans genome”.

Previously, these were curated as isoforms of a single gene. This caused problems because the description of gene function would be based on one isoform, leaving the other isoform undescribed or described incorrectly, based on the first isoform.
There are 42 C. elegans and 47 C. briggsae known loci that have now been split into separate genes with different Gene IDs.

Guest Blog: New Nematodes in WormBase

 

This WS264 release of WormBase includes two new genome assemblies: one from a free-living Caenorhabditis species (C. nigoni) and one from a whipworm parasite of mice (Trichuris muris).

The C. nigoni genome was assembled from both long-read (Pacific Biosciences) and short-read (Illumina) data, and then further scaffolded by genome-wide alignment with its very close relative, C. briggsae.

Although C. nigoni and C. briggsae are closely enough related to produce partially fertile offspring, their lifestyles and genomes are quite different.  C. briggsae, like C. elegans, is primarily a self-fertilizing hermaphrodite with roughly 1% males.  C. nigoni, in contrast, is like most animal species (including humans) and has 50% males and 50% females.  At the molecular level, C. nigoni's genome is larger than that of C. briggsae (130 Mb versus 108 Mb) and encodes 7,000 more genes, which appear to have been lost in C. briggsae after it evolved hermaphroditism, and which disproportionately encode small proteins with male-biased expression.

The T. muris genome was assembled from long-read (Pacific Biosciences) and short-read (Illumina) data, with the help of an optical map.

T. muris infects the caecum region of the mouse large intestine, and is very closely related to the human whipworm parasite T. trichiura, for which T. muris is a laboratory model.  Adult whipworms have a highly unusual body shape for nematodes: their heads and front bodies have a whip-like shape that can be inserted into intestinal cells like a flexible needle, and that is easily mistaken for a “tail” rather than the worm’s head.  Whipworm heads have a specific ultrastructure called a “stichosome” that allows them both to suck nutrients out of intestinal cells and to export immunosuppressive molecules into their hosts.  This strategy is unfortunately effective: over 700 million human beings are currently infected by T. trichiura.  Having a high-quality genome assembly for T. muris raises the hope of rational interventions against this worldwide parasite.

Guest authors: Faye Rogers (1) and Erich Schwarz (2)

(1) Wellcome Sanger Institute

(2) Cornell University

 

Related Links:

WS264 release

C. elegans sORFs

sORFs.org is a public repository of small open reading frames (sORFs) identified by ribosome profiling (RIBO-seq).

It contains predicted sORF regions for several species, including C. elegans.

We have annotated 118 predicted sORF regions as coding (CDS) isoforms of existing genes. It is likely that in the next release, where these isoforms do not overlap with existing isoforms, these sORF regions will be changed to individual genes rather than isoforms.

52 of these annotated sORF regions do not start with the canonical Methionine AUG initiation codon. It is possible that they use a non-canonical initiation codon. Some of these are not the expected non-canonical initiation codons coding for Isoleucine, but instead code for residues such as Valine.
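The distinction between canonical and non-canonical starts can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used by sORFs.org or WormBase: the function name classify_start and the toy sequences are made up, and the lookup table covers only a handful of codons from the standard genetic code.

```python
CANONICAL_START = "ATG"  # AUG written in DNA letters

# A few entries from the standard genetic code; deliberately incomplete,
# just enough for this sketch.
CODON_TO_RESIDUE = {
    "ATG": "Met",
    "ATT": "Ile", "ATC": "Ile", "ATA": "Ile",
    "GTT": "Val", "GTC": "Val", "GTA": "Val", "GTG": "Val",
}


def classify_start(cds: str):
    """Return (first codon, residue it codes for, is_canonical).

    Accepts DNA or RNA letters in either case. Codons missing from the
    small table above are reported with the residue "other".
    """
    codon = cds[:3].upper().replace("U", "T")
    residue = CODON_TO_RESIDUE.get(codon, "other")
    return codon, residue, codon == CANONICAL_START


# Toy sequences, invented for illustration only.
for seq in ["augaaauaa", "ATTGGCTAA", "GTGAAATAA"]:
    print(seq, classify_start(seq))
```

Run over a set of CDS sequences, a check like this would separate the Methionine starts from those beginning with an Isoleucine or Valine codon, which are the cases described above.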

Trichuris muris

This release sees the integration of the Edinburgh strain of Trichuris muris, version TMUE3.1. This species has been fully integrated as a core species, meaning there are stable IDs and tracking, with inclusion in all additional pipelines and analyses.
The genome assembly and gene annotation have been taken directly from the Pathogen Genomics group at the WTSI. Additional mapping of gene mergers and splits, and transfer of IDs from TMUE2.2, has been done to allow users to identify their genes of interest.

Caenorhabditis nigoni

This release includes the Caenorhabditis nigoni genome assembly and gene set described in “Rapid genome shrinkage in a self-fertile nematode reveals sperm competition proteins” by Da Yin et al. (Science 359, 55-61, 2018) as a non-core species set.
This species should be of special interest, due to its phylogenetic closeness to C. briggsae and its differences in sexual reproduction.

The data is available as files on the WormBase FTP site, as well as in the JBrowse genome browser.