WormBase has been updated to the WS214 release of the database. New in this release: data from the modENCODE Project, available as tracks on the C. elegans Genome Browser.
mining.wormbase.org: now online
We’re happy to report that the new data mining server is now online. aceserver.cshl.edu will be retired at the end of the first week of June 2010.
If you use aceserver.cshl.edu for programmatic access to WormBase, please update your scripts now with the following information:
host: mining.wormbase.org
port: 2005 # for acedb queries
port: 3306 # MySQL queries of sequence feature databases via Bio::DB::GFF/Bio::DB::SeqFeature
Here’s an example script using Ace.pm that lists all of the genes in the Unc gene class:
#!/usr/bin/perl
use Ace;
use strict;
my $db = Ace->connect(-host => 'mining.wormbase.org', -port => 2005)
    or die "Can't connect to the server: ", Ace->error;
# Get all genes in the Unc gene_class
my $gene_class = $db->fetch(Gene_class=>'unc');
my @genes = $gene_class->Genes;
foreach (@genes) {
    print join("\t", $_, $_->Public_name), "\n";
}
And here’s a script mining sequence features using Bio::DB::GFF. It fetches all coding exons and prints their sequences in FASTA format. Please note that access to the MySQL databases is pending firewall reconfiguration, which should be complete within the next week.
#!/usr/bin/perl
use Bio::DB::GFF;
use strict;
my $dsn = 'c_elegans:mining.wormbase.org';
my $feature = 'coding_exon';
my $db = Bio::DB::GFF->new(-dsn => 'dbi:mysql:' . $dsn,
-user => 'remote-user',
-pass => '',)
|| die "Couldn't establish a connection to $dsn";
my $iterator = $db->get_seq_stream(-type => $feature);
# Iterate over all of the requested features
while (my $feature = $iterator->next_seq) {
    # Create a more informative header
    my $name = $feature->name;
    my $type = $feature->type;
    my $start = $feature->start;
    my $stop = $feature->stop;
    my $strand = $feature->strand;
    my $refseq = $feature->sourceseq; # This is the name of the chromosome
    my $header = ">$name ($type; strand: $strand; $refseq: $start..$stop)";
    # Fetch the sequence of the feature and convert it to FASTA
    my $seq = to_fasta($feature->dna);
    # $header already starts with ">", so print it as-is
    print "$header\n", $seq, "\n";
}
# This subroutine wraps a DNA string for FASTA output
sub to_fasta {
    my $sequence = shift;
    # Return the input unchanged if it is already in FASTA format.
    return $sequence if ($sequence =~ /^>(.+)$/m);
    # This is the business part of the subroutine.
    # Place a newline after every 80 characters
    $sequence =~ s/(.{80})/$1\n/g;
    return $sequence;
}
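One edge case in the wrapping step above: when a sequence is an exact multiple of 80 characters long, the substitution leaves a trailing newline, and the newline added by the caller then produces a blank line between records. Here is a minimal sketch of one way around that, runnable offline. The helper name wrap_fasta, its optional width argument, and the sample sequence are all hypothetical and are not part of BioPerl or the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): wrap a raw sequence so that
# every output line, including the last, ends in exactly one newline.
sub wrap_fasta {
    my ($sequence, $width) = @_;
    $width ||= 80;                          # default line width
    $sequence =~ s/\s+//g;                  # drop any existing whitespace
    $sequence =~ s/(.{1,$width})/$1\n/g;    # newline after each chunk of up to $width characters
    return $sequence;
}

# Made-up 200-base sequence, for illustration only
my $dna = 'ACGT' x 50;
print ">example_sequence\n", wrap_fasta($dna);
```

Because the returned string always ends in a newline, the caller prints it directly after the header line and does not append another newline.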
Questions? Hit me up at [email protected].
Colocation problems resolved; service restored
This morning, a configuration error by our colocation facility blocked access to WormBase. This problem is now resolved. We apologize for the service disruption.
WormBase Release: WS213
WormBase has been updated to the WS213 release of the database. Release notes are available on the WormBase site.
Sanger Institute WormBase Project Manager position
The successful applicant will be responsible for managing a group of four computer biologists involved in database production and annotation. Additionally, the team undertakes a wide range of tasks including detailed curation, genome-wide data analysis and automatic annotation pipelines.
For further details, see the Sanger Institute Vacancies page. Informal inquiries can be directed to Anthony Rogers ([email protected]).