New VC2010 and improved CB4856 genomes published

The VC2010 genome assembly has just been published in Genome Research as an open-access article. This assembly is largely derived from a completely isogenic VC2010-derived strain, PD1074, that is now available from the CGC.

In related news, Junho Lee and his lab have simultaneously published an open-access article on their substantially improved genome assembly for Hawaiian C. elegans (CB4856).

The Caenorhabditis Genomes Project (CGP)

The sequencing of the genome of the nematode Caenorhabditis elegans [1] remains one of the milestones of modern biology, and this genome sequence is the essential backdrop to a vast body of work on this key model organism. As Dobzhansky said, “Nothing in biology makes sense except in the light of evolution”, and it is clear that complete understanding of C. elegans will only be achieved when it is placed in an evolutionary context. We have initiated a project to build that context, aiming to generate draft genomes for all the remaining unsequenced species.


Genomics of Caenorhabditis

C. elegans is but one nematode, an “anecdotal” instance of how a genomic system generates a complex organism. But how did this system come to be? Which parts are historical accident and which are the result of selection? What competing forces are at work in shaping the genome – its composition, size, synteny and linkage dynamics, repeat content, mobile element diversity, gene structure, gene birth and death, sequence diversity, … ? To deliver answers to these questions (and many more) we contend that genome sequence information from as many related species as possible within the genus Caenorhabditis will form an essential backdrop to specific research programmes.

The time is now ripe for a programme to sequence the diversity of Caenorhabditis. In the last 10 years, there has been a remarkable global effort of discovery of new species, sparked by the Félix lab’s discovery of the likely “true” ecology of Caenorhabditis in rotting fruits and other plant material [2]. The number of species in culture now exceeds 40 (and is growing) and their relationships have been robustly inferred using multi-locus analyses by the Kiontke lab [3]. Several genomes are already available. After the success of the C. elegans genome project, the genome of C. briggsae [4] was sequenced; the NHGRI sponsored the sequencing of C. nigoni, C. brenneri and C. tropicalis at WUGSC [5]; the Sternberg lab has sequenced C. angaria [6]; the Phillips lab has sequenced C. remanei; and the Blaxter lab has sequenced C. wallacei (aka sp. 16), sp. 5 and sp. 1.

A step change in sequencing technologies, and in assembly algorithms, now means that good-enough genomes can be generated quickly, efficiently and cheaply. We have therefore embarked on a project to “complete” the sequencing of all Caenorhabditis species currently available in culture, a Caenorhabditis Genomes Project (CGP). The project will be funded largely from generous application of intramural support from Edinburgh Genomics, and led by the Blaxter laboratory in Edinburgh, but we invite all interested researchers to join us in an open collaboration. Additional funding will be sought to improve the genome assemblies, and any support available in the community will significantly improve what can be done. We expect that additional species will be discovered, and would hope to add them to the project as they are defined.

The strategy

The current roster of genomes, and their status, is available at
We intend that the CGP will be an open collaboration and will be making data available for free download under the “usual” agreements – basically that anyone carrying out whole genome analyses contacts us before proceeding to publication (and preferably much earlier) so that we can all coordinate efforts. There is so much to be done that collaboration will be essential.

Data generation: Our strategy is to ask researchers with live cultures, preferably inbred strains, to make DNA and RNA and to ship these to Edinburgh for sequencing. We are not demanding that inbred lines be generated, as this process often takes many months, and can generate very sick nematodes that are unlikely to be good representatives for their species. Advances in assembly routines mean that we are much better able to deal with heterozygosity issues during assembly. We are currently generating a standard dataset for each species (125 base paired-end data from two short-insert genomic libraries at 350 and 550 bases [~80 M read pairs, or ~100x coverage], and stranded RNASeq data [~25 M read pairs]) using Illumina HiSeq2500v4 instruments. For selected species we may also produce Illumina mate pair libraries and / or PacBio data (and would encourage colleagues with special interest in a species to “sponsor” the generation of these additional scaffolding data).
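As a rough guide to how read counts translate into sequencing depth, mean coverage can be estimated as total sequenced bases divided by genome size. The sketch below is only an illustration: it assumes the ~80 M read pairs figure is the total across both genomic libraries, and the genome sizes are hypothetical placeholders, since genome size differs between species and is not stated here.

```python
def expected_coverage(read_pairs: float, read_length: int, genome_size: float) -> float:
    """Mean depth = total sequenced bases / genome size (both mates of a pair counted)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    total_bases = read_pairs * 2 * read_length
    return total_bases / genome_size


# Figures quoted in the text; treating 80 M pairs as the combined total is an assumption.
READ_PAIRS = 80e6
READ_LENGTH = 125

# Hypothetical genome sizes in megabases, chosen only to show how depth scales.
for genome_mb in (100, 150, 200):
    depth = expected_coverage(READ_PAIRS, READ_LENGTH, genome_mb * 1e6)
    print(f"{genome_mb} Mb genome: ~{depth:.0f}x")
```

Under these assumptions the same dataset gives deeper coverage for smaller genomes, so the quoted ~100x is best read as an approximate target rather than a fixed value for every species.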

Primary analyses: Raw data will be posted on the project website as they are generated and pass QC (and also uploaded to SRA). Colleagues are free to download and analyse the raw data. We will be building best-effort assemblies for each genome, possibly by having collaboratively competitive mini-assemblathons for each set of species as they come off the sequencers. Assemblies will be posted along with explicit recipes describing how they were generated and core quality metrics.
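The text does not say which quality metrics will accompany each assembly. As an illustration only, the sketch below computes a few that are commonly reported (sequence count, total span, longest sequence and N50) from a list of contig or scaffold lengths. The function names and the example lengths are hypothetical.

```python
def n50(lengths):
    """Length L such that sequences of length >= L contain at least half of all bases."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def assembly_summary(lengths):
    """Collect basic contiguity statistics for one assembly."""
    return {
        "sequences": len(lengths),
        "span": sum(lengths),
        "longest": max(lengths),
        "n50": n50(lengths),
    }


# Made-up contig lengths, for demonstration only.
example_lengths = [100, 80, 50, 30, 20, 10, 10]
print(assembly_summary(example_lengths))
```

Contiguity statistics like these say nothing about correctness or completeness, so in practice they would sit alongside other checks when comparing candidate assemblies.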

Annotation: We will perform best-practice gene finding on each species using the stranded RNASeq and comparative data from other species, and decorate the genomes with annotation (sequence similarity, domains, expression values). The genome annotation files (and a description of the protocols used) will be posted for download. A combination of skills and approaches will give the best results and we will coordinate “annotatathons”, perhaps using collaborative platforms such as WebApollo. In particular, we propose to perform bulk reannotation of all species, following the same protocols for each, periodically (for example when we hit 15 or 20, or all species).

Genome databasing and publication: Genome sequences, genes and annotations will be made available through a local genome explorer (a BADGER instance [7]). The BADGER “versions” of the genomes will not act as “databases of record” – we are not intending to replicate WormBase – but rather as interim homes for the data to spur research and cooperation. When a genome reaches a stable annotation status, we will deposit it in INSDC (ENA/GenBank/DDBJ) and WormBase [8]. We will aim to promote peer-reviewed publication of the genomes and analyses, and will also publish data papers so that the genomes can be sensibly used and cited as early as is possible.

Project timing, oversight, staffing

We have started the project already. In addition to the three species sequenced by the Blaxter lab in collaboration with Asher Cutter and Marie-Anne Félix already, the Félix lab has provided genomic DNA and RNA from eight new species, and data has been generated for four of these (as of 01 Nov 2014). For many other species DNA and RNA are being generated, and the Rockman, Phillips, Fierst and Wang labs are sequencing additional taxa (and strains). We hope to complete the sequencing in Edinburgh by late Spring 2015, and have assemblies by late Summer 2015. Obviously as data is to be released as we generate it, there will be incremental updates as we approach completion.

We will maintain a project blog, announcing upcoming data, and also an annotation/interest roster where individuals and groups can express interests in species or analysis topics. An open google group will be used to foster discussion and data sharing.

Management and oversight will be light. We propose that an oversight group (composed of – minimally – Mark Blaxter, Marie-Anne Félix, Karin Kiontke, Erich Schwarz, and a WormBase representative) will coordinate data release announcements and assure quality through open conference calls. To promote the project, we would like to have a “Genomics of the genus Caenorhabditis” workshop at the 2015 International C. elegans Meeting.

1. The C. elegans Genome Sequencing Consortium (1998) Genome sequence of the nematode C. elegans: a platform for investigating biology. Science 282: 2012-2018.
2. Barriere A, Felix MA (2006) Isolation of C. elegans and related nematodes. WormBook: 1-9.
3. Kiontke KC, Felix MA, Ailion M, Rockman MV, Braendle C, et al. (2011) A phylogeny and molecular barcodes for Caenorhabditis, with numerous new species from rotting fruits. BMC evolutionary biology 11: 339.
4. Stein LD, Bao Z, Blasiar D, Blumenthal T, Brent MR, et al. (2003) The Genome Sequence of Caenorhabditis briggsae: A Platform for Comparative Genomics. PLoS Biol 1: E45.
5. Sternberg PW, Waterston RH, Speith J, Eddy S, Wilson RK (2003) Genome Sequence of Additional Caenorhabditis species: Enhancing the Utility of C. elegans as a Model Organism.
6. Mortazavi A, Schwarz EM, Williams B, Schaeffer L, Antoshechkin I, et al. (2010) Scaffolding a Caenorhabditis nematode genome with RNA-seq. Genome Res.
7. Elsworth B, Jones M, Blaxter M (2013) Badger–an accessible genome exploration environment. Bioinformatics.
8. Howe K, Davis P, Paulini M, Tuli MA, Williams G, et al. (2012) WormBase: Annotating many nematode genomes. Worm 1: 15-21.

Trichuris suis

A new genome of a zoonotic whip worm species, Trichuris suis, has been made publicly available by the Gasser lab of Melbourne University.

As part of the research into the unique properties of the genome, a male and a female worm have been sequenced, assembled and annotated as described in “Genome and transcriptome of the porcine whipworm Trichuris suis”. Jex AR, Nejsum P, Schwarz EM, Hu L, Young ND, Hall RS, Korhonen PK, Liao S, Thamsborg S, Xia J, Xu P, Wang S, Scheerlinck JP, Hofmann A, Sternberg PW, Wang J, Gasser RB. Nat Genet. 2014 Jul;46(7):701-6. doi: 10.1038/ng.3012. Epub 2014 Jun 15.

It has been included as part of the WS243 release of WormBase and is shown on a Genome Browser, as well as on orthology sections of genes. Flatfiles of the raw data are also available on

Ancylostoma ceylanicum

The parasitic nematode Ancylostoma ceylanicum is a hookworm, closely related to the hookworms Ancylostoma duodenale and Necator americanus.

These three species collectively infect over 500 million human beings, typically by burrowing into the skin as dauer-like L3 larvae, passing through the bloodstream and lungs, being swallowed along with mucus cleaning the lungs, and becoming permanently established as blood-drinking adults in the small intestine.

Despite the great difference in their life cycles from that of C. elegans, hookworms (and related parasites such as Haemonchus contortus) are actually more closely related to C. elegans than is the free-living nematode Pristionchus pacificus.

The bulk of hookworm infections are by A. duodenale and N. americanus; however, these two species do not generally infect other mammals, making them difficult to study experimentally. In contrast, A. ceylanicum competently infects humans, dogs, cats, and golden hamsters, making it an experimentally tractable human hookworm as well as an emerging zoonotic parasite. Researchers at Cornell, Caltech, and UCSD have therefore sequenced the genome and transcriptome of A. ceylanicum in order to determine possible new targets for drugs and vaccines.

Its genome has been included as part of the WS243 release of WormBase and is shown on a Genome Browser, as well as on orthology sections of genes. Flatfiles of the raw data are also available on

New Pristionchus species in WS243

Thanks to the contribution of new sequencing data on Pristionchus spp. populations as described in “Characterization of genetic diversity in the nematode Pristionchus pacificus from population-scale resequencing data” by Christian Roedelsperger et al., published in Genetics, the next WormBase release (WS243) will include Pristionchus exspectatus as a new species.

The new data will be available through the FTP site, a Genome Browser and as orthologs from already existing gene pages.