WormBase refines method to map RNAi targets to the genome

When curating RNA interference (RNAi) data, WormBase routinely maps the targets of each RNAi experiment to the genome, based on the information in the paper that describes the experiment. WormBase has recently refined this process and addressed inconsistencies in target determination. Previously, we did not filter out highly fragmented hits. That is, when many very short alignments fell close together on the genome, our mapping script concatenated them, much as it does when it skips over introns. These hits caused errant primary and secondary targets to be displayed. Most targets for RNAi experiments remain unchanged, but the errant hits have been removed from WormBase.
The criteria for primary and secondary target determination (these descriptions also appear on the individual RNAi report pages) are as follows:
Primary targets: These are targets that have sequence identity to the RNAi probe of at least 95% over a stretch of at least 100 nucleotides, identified using a combination of the BLAST and BLAT algorithms. These are usually the intended target genes of an RNAi experiment.
Secondary targets: These are targets that have between 80 and 94.99% sequence identity to the RNAi probe over a stretch of at least 200 nucleotides. Targets (and overlapping genes) that satisfy these criteria may or may not be susceptible to an RNAi effect with the given probe, and they represent secondary (unintended) genomic targets of an RNAi experiment.
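
To make the two thresholds concrete, here is a minimal sketch in Perl that classifies a single alignment by percent identity and aligned length. It is not the WormBase mapping script: the classify_hit subroutine and the example hits are hypothetical, it ignores the fragmented-hit filtering described above, and it assumes that any identity of at least 80% but below 95% counts as secondary.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of WormBase's pipeline): classify one
# alignment using the thresholds quoted in the criteria above.
#   $identity : percent identity between probe and genome (0-100)
#   $length   : length of the aligned stretch in nucleotides
sub classify_hit {
    my ($identity, $length) = @_;

    # Primary: at least 95% identity over at least 100 nt
    return 'primary' if $identity >= 95 && $length >= 100;

    # Secondary: 80% up to (but not including) 95% identity over at least 200 nt
    return 'secondary' if $identity >= 80 && $identity < 95 && $length >= 200;

    # Anything else is not reported as a target
    return 'none';
}

# Made-up example hits: [percent identity, aligned length]
for my $hit ([99.2, 450], [88.0, 320], [97.0, 60], [85.0, 150]) {
    my ($identity, $length) = @$hit;
    printf "%.2f%% over %d nt: %s\n",
        $identity, $length, classify_hit($identity, $length);
}
```

Note that a short, near-perfect hit (97% over 60 nt) fails both tests in this sketch: it is too short to be primary and too similar to fall in the secondary identity range.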

Remote access to relational sequence feature databases

Power users: you can now remotely access our sequence feature databases.

Host: mining.wormbase.org
Port: 3306
User: remote-user
Pass: (none; leave the password blank)
[tharris@unkar: ~]> mysql -h mining.wormbase.org -u remote-user
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 14
Server version: 5.1.45-1-log (Debian)

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| b_malayi           |
| c_brenneri         |
| c_briggsae         |
| c_elegans          |
| c_elegans_gmap     |
| c_elegans_pmap     |
| c_japonica         |
| c_remanei          |
| clustal            |
| h_bacteriophora    |
| m_hapla            |
| m_incognita        |
| p_pacificus        |
| test               |
+--------------------+
15 rows in set (0.00 sec)

Here’s an example script written in Perl using Bio::DB::GFF.

#!/usr/bin/perl

use strict;
use warnings;
use Bio::DB::GFF;

# Connect to the public c_elegans GFF database (no password required)
my $db = Bio::DB::GFF->new(-dsn  => 'dbi:mysql:c_elegans:mining.wormbase.org',
                           -user => 'remote-user',
                           -pass => '')
  or die "Couldn't establish a connection to remote data mining server: $!";

my $iterator = $db->get_seq_stream(-type => ['coding:exon'] );

# Iterate over all of the requested features
while (my $feature = $iterator->next_seq) {

    # Create a more informative header
    my $name   = $feature->name;
    my $type   = $feature->type;
    my $start  = $feature->start;
    my $stop   = $feature->stop;
    my $strand = $feature->strand;
    my $refseq = $feature->sourceseq; # This is the name of the chromosome
    my $header = ">$name ($type; strand: $strand; $refseq: $start..$stop)";

    # Fetch the sequence of the feature and wrap it for FASTA output.
    # $header already begins with ">", so it is printed as-is.
    my $seq = to_fasta($feature->dna);
    print "$header\n", $seq, "\n";
}

# This subroutine wraps a DNA string at 80 characters per line for FASTA output
sub to_fasta {
  my $sequence = shift;

  # Return the input unchanged if it already carries a FASTA header.
  return $sequence if $sequence =~ /^>/m;

  # This is the business part of the subroutine.
  # Insert a newline after every 80 characters, except at the very end
  # (the caller adds the final newline).
  $sequence =~ s/(.{80})(?=.)/$1\n/g;
  return $sequence;
}

Questions? Hit me up at [email protected]. And remember, please play nice: this is a shared resource. Egregious use that significantly disrupts other users may be curtailed without warning.