Power users: you can now remotely access our sequence feature databases.
Host: mining.wormbase.org
Port: 3306
User: remote-user
Pass: none
[tharris@unkar: ~]> mysql -h mining.wormbase.org -u remote-user
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 14
Server version: 5.1.45-1-log (Debian)

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| b_malayi           |
| c_brenneri         |
| c_briggsae         |
| c_elegans          |
| c_elegans_gmap     |
| c_elegans_pmap     |
| c_japonica         |
| c_remanei          |
| clustal            |
| h_bacteriophora    |
| m_hapla            |
| m_incognita        |
| p_pacificus        |
| test               |
+--------------------+
15 rows in set (0.00 sec)
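For raw SQL access from a script rather than the mysql client, plain DBI should work too. The following is only a minimal sketch: it assumes DBI and DBD::mysql are installed and that the server accepts connections with the details given above. The helper names dsn_for and list_tables are made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Connection details as given in the post (empty password).
my %conf = (
    host => 'mining.wormbase.org',
    port => 3306,
    user => 'remote-user',
    pass => '',
);

# Hypothetical helper: build a DBD::mysql data source name for one database.
sub dsn_for {
    my ($database) = @_;
    return "dbi:mysql:database=$database;host=$conf{host};port=$conf{port}";
}

# Hypothetical helper: connect, run SHOW TABLES, and return the table names.
sub list_tables {
    my ($database) = @_;
    # RaiseError makes DBI die with a message if the connection or query fails.
    my $dbh = DBI->connect(dsn_for($database), $conf{user}, $conf{pass},
                           { RaiseError => 1, PrintError => 0 });
    my $tables = $dbh->selectcol_arrayref('SHOW TABLES');
    $dbh->disconnect;
    return @$tables;
}

# Only contact the server when a database name is given on the command line.
if (my $database = shift @ARGV) {
    print "$_\n" for list_tables($database);
}
```

Saved under a name of your choosing and run with a database name from the list above as its argument (for example c_elegans), it prints one table name per line. Without an argument it does nothing, so nobody hits the shared server by accident.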
Here’s an example script written in Perl using Bio::DB::GFF.
#!/usr/bin/perl
use Bio::DB::GFF;
use strict;
my $db = Bio::DB::GFF->new(-dsn  => 'dbi:mysql:c_elegans:mining.wormbase.org',
                           -user => 'remote-user',
                           -pass => '',)
    || die "Couldn't establish a connection to remote data mining server: $!";
my $iterator = $db->get_seq_stream(-type => ['coding:exon'] );
# Iterate over all of the requested features
while (my $feature = $iterator->next_seq) {
    # Create a more informative header
    my $name   = $feature->name;
    my $type   = $feature->type;
    my $start  = $feature->start;
    my $stop   = $feature->stop;
    my $strand = $feature->strand;
    my $refseq = $feature->sourceseq; # This is the name of the chromosome
    # The leading ">" of the fasta header is added by the print statement
    my $header = "$name ($type; strand: $strand; $refseq: $start..$stop)";
    # Fetch the sequence of the feature and convert it to fasta
    my $seq = to_fasta($feature->dna);
    print ">$header\n", $seq, "\n";
}
# This subroutine converts a dna string into fasta format
sub to_fasta {
    my $sequence = shift;
    # Return the input unchanged if it is already in fasta format.
    return $sequence if ($sequence =~ /^>(.+)$/m);
    # This is the business part of the subroutine.
    # Place a newline after every 80 characters
    $sequence =~ s/(.{80})/$1\n/g;
    return $sequence;
}
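To check the line wrapping without touching the server, the subroutine can be run on its own. This is a stand-alone sketch: the 200-base string is invented for the test, not real WormBase data, and returning already-formatted input unchanged is my assumption about the intended behaviour.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-alone copy of the wrapping logic: insert a newline after every
# 80 characters, leaving input that already has a fasta header untouched.
sub to_fasta {
    my $sequence = shift;
    return $sequence if $sequence =~ /^>/m;
    $sequence =~ s/(.{80})/$1\n/g;
    return $sequence;
}

# Invented 200-base sequence, used only to exercise the wrapping.
my $dna   = 'acgt' x 50;
my @lines = split /\n/, to_fasta($dna);

# Prints: 3 lines, lengths: 80,80,40
printf "%d lines, lengths: %s\n", scalar @lines, join(',', map { length } @lines);
```

A sequence whose length is an exact multiple of 80 ends up with a trailing newline, so the main script's own "\n" then leaves a blank line after that record.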
Questions? Hit me up at [email protected]. And remember, please play nice: this is a shared resource. Egregious use that significantly disrupts other users may be curtailed without warning.
I guess, the colon in the feature type should be an underscore? … and the greater sign in the print statement.
There may be encoded characters (although I’m not seeing the examples you point out). Feature type should be “coding:exon”, and the greater sign in the print statement is just for generating a fasta header.