WormMart is under redevelopment

We know that several of our users have had problems using WormMart. WormMart is currently under redevelopment: some datasets are unavailable, and the tool has known bugs that we are actively working on. The developers aim to build a new release before the end of the year. We apologize for the inconvenience.

Did you know that WormBase provides useful data files for download?

WormBase maintains a public FTP site where you can find many commonly requested files and datasets, the WormBase software, and prepackaged databases. DNA sequence data for the genomes of C. elegans, C. briggsae, C. remanei, and other species are available in FASTA format, as are protein data. Microarray data, such as the up-to-date mapping of microarray probes to WormBase genes for Affymetrix, Agilent, Washington University Genome Sequencing Center and Stanford Microarray Database (SMD) chips, are also available. For C. elegans, the files downloadable from the FTP site include confirmed_genes, which lists curated C. elegans genes that have been confirmed by transcriptional data, and wormpep, a set of FASTA-format files containing predicted and confirmed protein translations, among many others.

Take a look at our FTP site at ftp://ftp.wormbase.org/pub/wormbase/.  Be sure to look at the README file in each directory for a listing of the contents of that directory.
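
Once a FASTA file such as wormpep has been downloaded, it can be read with a few lines of Perl. The following is only a minimal sketch: the read_fasta subroutine and the two example records are hypothetical stand-ins, not real WormBase data, and the script reads from an in-memory string so it runs without any download.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read FASTA records from a filehandle; return a list of [id, sequence] pairs.
sub read_fasta {
    my ($fh) = @_;
    my (@records, $id, $seq);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>(\S+)/) {
            # A new header line: store the previous record, if any
            push @records, [$id, $seq] if defined $id;
            ($id, $seq) = ($1, '');
        }
        elsif (defined $id) {
            # A sequence line: append it to the current record
            $seq .= $line;
        }
    }
    push @records, [$id, $seq] if defined $id;
    return @records;
}

# Hypothetical two-record input standing in for a downloaded file
my $example = ">protein_A example header\nMSTNQ\nLLKV\n>protein_B\nMKK\n";
open my $fh, '<', \$example or die "Cannot open in-memory file: $!";

# Print each identifier with the length of its sequence
for my $record (read_fasta($fh)) {
    my ($id, $seq) = @$record;
    print "$id\t", length($seq), "\n";
}
close $fh;
```

To use it on a real file, open the downloaded file instead of the in-memory string and pass that filehandle to read_fasta.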

Remote access to relational sequence feature databases

Power users: you can now remotely access our sequence feature databases.

Host : mining.wormbase.org
Port : 3306
User: remote-user
Pass: (none; leave the password empty)
[tharris@unkar: ~]> mysql -h mining.wormbase.org -u remote-user
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 14
Server version: 5.1.45-1-log (Debian)

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| b_malayi           |
| c_brenneri         |
| c_briggsae         |
| c_elegans          |
| c_elegans_gmap     |
| c_elegans_pmap     |
| c_japonica         |
| c_remanei          |
| clustal            |
| h_bacteriophora    |
| m_hapla            |
| m_incognita        |
| p_pacificus        |
| test               |
+--------------------+
15 rows in set (0.00 sec)
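
The same connection can also be made from a script. What follows is a rough sketch using DBI, assuming that DBD::mysql is installed and that the server accepts the credentials listed above. The build_dsn and list_tables subroutines and the --connect switch are names made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a DBI data source string for the MySQL driver.
sub build_dsn {
    my (%args) = @_;
    return sprintf 'dbi:mysql:database=%s;host=%s;port=%d',
        $args{database}, $args{host}, $args{port};
}

# Connect and return the names of all tables in the chosen database.
# DBI is loaded lazily so build_dsn can be used without it installed.
sub list_tables {
    my ($dsn, $user, $pass) = @_;
    require DBI;
    my $dbh = DBI->connect($dsn, $user, $pass, { RaiseError => 1 });
    my $tables = $dbh->selectcol_arrayref('SHOW TABLES');
    $dbh->disconnect;
    return @$tables;
}

my $dsn = build_dsn(
    database => 'c_elegans',
    host     => 'mining.wormbase.org',
    port     => 3306,
);
print "DSN: $dsn\n";

# Only contact the server when asked to (hypothetical --connect switch)
if (@ARGV && $ARGV[0] eq '--connect') {
    print "$_\n" for list_tables($dsn, 'remote-user', '');
}
```

Run without arguments, the script only prints the data source string. Run with --connect, it attempts the remote query.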

Here’s an example script written in Perl using Bio::DB::GFF.

#!/usr/bin/perl

use Bio::DB::GFF;
use strict;

my $db = Bio::DB::GFF->new(-dsn  => 'dbi:mysql:c_elegans:mining.wormbase.org',
                           -user => 'remote-user',
                           -pass => '',)
  || die "Couldn't establish a connection to remote data mining server: $!";

my $iterator = $db->get_seq_stream(-type => ['coding:exon'] );

# Iterate over all of the requested features
while (my $feature = $iterator->next_seq) {

    # Create a more informative header
    my $name   = $feature->name;
    my $type   = $feature->type;
    my $start  = $feature->start;
    my $stop   = $feature->stop;
    my $strand = $feature->strand;
    my $refseq = $feature->sourceseq; # This is the name of the chromosome
    my $header = ">$name ($type; strand: $strand; $refseq: $start..$stop)";

    # Fetch the sequence of the feature and print it in FASTA format.
    # $header already starts with ">", so it is printed as-is.
    my $seq = to_fasta($feature->dna);
    print "$header\n", $seq, "\n";
}

# This subroutine converts a dna string into fasta format
sub to_fasta {
  my $sequence = shift;

  # Return the input unchanged if it is already in fasta format.
  return $sequence if ($sequence =~ /^>(.+)$/m);

  # This is the business part of the subroutine.
  # Place a newline after every 80 characters
  $sequence =~ s/(.{80})/$1\n/g;
  return $sequence;
}
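
The line-wrapping step can be tried offline before pointing a script at the server. This is a small sketch only: wrap_sequence is a hypothetical variant of the subroutine above that takes the line width as a parameter, and the 200-base test sequence is invented, not real WormBase data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Wrap a raw sequence string at a fixed width (80 columns by default)
# and make sure the result ends with a newline.
sub wrap_sequence {
    my ($sequence, $width) = @_;
    $width ||= 80;
    $sequence =~ s/(.{$width})/$1\n/g;
    $sequence .= "\n" unless $sequence =~ /\n\z/;
    return $sequence;
}

# Hypothetical 200-base test sequence
my $dna = 'acgt' x 50;
print ">example_feature (hypothetical)\n", wrap_sequence($dna);
```

Because wrap_sequence always terminates the record itself, the caller does not need to print an extra newline after the sequence.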

Questions? Hit me up at todd@wormbase.org. And remember, please play nice: this is a shared resource. Egregious use that significantly disrupts other users may be curtailed without warning.

mining.wormbase.org: now online

We’re happy to report that the new data mining server is now online. aceserver.cshl.edu will be retired at the end of the first week of June 2010.

If you use aceserver.cshl.edu for programmatic access to WormBase, please update your scripts now with the following information:

host : mining.wormbase.org
port : 2005  # for acedb queries
port : 3306  # MySQL queries of sequence feature databases via Bio::DB::GFF/Bio::DB::SeqFeature

Here’s an example script using Ace.pm that lists all of the genes in the Unc gene class:


#!/usr/bin/perl
use Ace;
use strict;

my $db = Ace->connect(-host => 'mining.wormbase.org', -port => 2005)
    or die "Can't connect to the server: ", Ace->error;

# Get all genes in the Unc gene_class
my $gene_class = $db->fetch(Gene_class => 'unc');
my @genes = $gene_class->Genes;
foreach (@genes) {
    # Print the gene ID and its public name, separated by a tab
    print join("\t", $_, $_->Public_name), "\n";
}

And here’s a script mining sequence features using Bio::DB::GFF. It fetches all coding exons and prints their sequences in FASTA format. Please note that access to the MySQL databases is pending firewall reconfiguration, which should be complete within the next week.


#!/usr/bin/perl

use Bio::DB::GFF;
use strict;

my $dsn          = 'c_elegans:mining.wormbase.org';
my $feature_type = 'coding_exon';
my $db = Bio::DB::GFF->new(-dsn  => 'dbi:mysql:' . $dsn,
                           -user => 'remote-user',
                           -pass => '',)
  || die "Couldn't establish a connection to $dsn";

my $iterator = $db->get_seq_stream(-type => $feature_type);

# Iterate over all of the requested features
while (my $feature = $iterator->next_seq) {

    # Create a more informative header
    my $name   = $feature->name;
    my $type   = $feature->type;
    my $start  = $feature->start;
    my $stop   = $feature->stop;
    my $strand = $feature->strand;
    my $refseq = $feature->sourceseq; # This is the name of the chromosome
    my $header = ">$name ($type; strand: $strand; $refseq: $start..$stop)";

    # Fetch the sequence of the feature and print it in FASTA format.
    # $header already starts with ">", so it is printed as-is.
    my $seq = to_fasta($feature->dna);
    print "$header\n", $seq, "\n";
}

# This subroutine converts a dna string into fasta format
sub to_fasta {
  my $sequence = shift;

  # Return the input unchanged if it is already in fasta format.
  return $sequence if ($sequence =~ /^>(.+)$/m);

  # This is the business part of the subroutine.
  # Place a newline after every 80 characters
  $sequence =~ s/(.{80})/$1\n/g;
  return $sequence;
}

Questions? Hit me up at todd@wormbase.org.