WormBase refines method to map RNAi targets to the genome

July 13, 2010 by Ranjana Kishore

While curating RNA interference (RNAi) data, WormBase routinely maps the targets of each RNAi experiment to the genome, based on information in the paper that describes the experiment. WormBase has recently refined this process and addressed inconsistencies in target determination. Previously, we did not filter out highly fragmented hits. That is, when many very short alignments occurred close together on the genome, our mapping script concatenated them, much as it does when it skips over introns. These hits caused errant primary and secondary targets to be displayed. Most targets for RNAi experiments remain unchanged, but the errant hits have been removed from WormBase.
The criteria for primary and secondary target determination (these descriptions are also on the individual RNAi report pages) are as follows:
Primary targets: These are targets that have sequence identity to the RNAi probe of at least 95% over a stretch of at least 100 nucleotides, identified using a combination of BLAST and BLAT algorithms. These are usually the intended target genes of an RNAi experiment.
Secondary targets: These are targets that have between 80 and 94.99% sequence identity to the RNAi probe over a stretch of at least 200 nucleotides. Targets (and overlapping genes) that satisfy these criteria may or may not be susceptible to an RNAi effect with the given probe and represent secondary (unintended) genomic targets of an RNAi experiment.
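
The two thresholds above can be read as a simple decision rule. The following is a minimal sketch of that rule in Perl, not the actual WormBase mapping script: the function name classify_target and the sample hits are hypothetical, and it assumes that each hit has already been reduced to a percent identity and an alignment length in nucleotides. It also treats any identity from 80% up to (but not including) 95% as secondary, which is an interpretation of the "80 to 94.99%" range.

```perl
use strict;
use warnings;

# classify_target is a hypothetical helper, not part of any WormBase code.
# Arguments: percent identity to the RNAi probe, alignment length in nucleotides.
sub classify_target {
    my ($identity, $length) = @_;

    # Primary: at least 95% identity over at least 100 nucleotides
    return 'primary' if $identity >= 95 && $length >= 100;

    # Secondary: 80% up to (assumed) just under 95% identity over at least 200 nucleotides
    return 'secondary' if $identity >= 80 && $identity < 95 && $length >= 200;

    # Anything else is not reported as a target
    return 'none';
}

# Made-up hits for illustration: [percent identity, length]
for my $hit ([99.2, 450], [87.5, 310], [87.5, 150], [96.0, 60]) {
    my ($identity, $length) = @$hit;
    printf "%.2f%% over %d nt: %s\n",
        $identity, $length, classify_target($identity, $length);
}
```

Note that under this rule a short, near-perfect match (for example 96% over 60 nucleotides) is not reported at all, because it fails the length requirement for primary targets and the identity range for secondary targets.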

New release of WormBase: WS215

July 2, 2010 by Todd Harris

WormBase has been updated to the WS215 release of the database.

modENCODE data now available at WormBase

June 24, 2010 by Todd Harris

The WormBase genome browser now contains over 100 tracks of functional data from the modENCODE Project. Please visit modENCODE to search and download the underlying data sets.

Biological Curator position with WormBase at Caltech

June 23, 2010 by Todd Harris

A Biological Curator position is now available at WormBase-Caltech. Please contact Paul Sternberg ([email protected]) for more information.

Remote access to relational sequence feature databases

June 19, 2010 by Todd Harris

Power users: you can now remotely access our sequence feature databases.

Host : mining.wormbase.org
Port : 3306
User: remote-user
Pass: none

[tharris@unkar: ~]> mysql -h mining.wormbase.org -u remote-user
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 14
Server version: 5.1.45-1-log (Debian)

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| b_malayi           |
| c_brenneri         |
| c_briggsae         |
| c_elegans          |
| c_elegans_gmap     |
| c_elegans_pmap     |
| c_japonica         |
| c_remanei          |
| clustal            |
| h_bacteriophora    |
| m_hapla            |
| m_incognita        |
| p_pacificus        |
| test               |
+--------------------+
15 rows in set (0.00 sec)
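
The same connection settings can also be used from a script through Perl's generic DBI layer. The following is a minimal sketch, assuming the DBI and DBD::mysql modules are installed and the server is reachable. The helpers build_dsn and list_tables are hypothetical names introduced here for illustration. The table names inside each database are not listed in this post, so the sketch only asks the server for them with SHOW TABLES.

```perl
use strict;
use warnings;

# Connection settings as given above (empty password).
my %SETTINGS = (
    host => 'mining.wormbase.org',
    port => 3306,
    user => 'remote-user',
    pass => '',
);

# build_dsn (hypothetical helper) assembles a DBD::mysql data source name
# for one of the databases shown by "show databases".
sub build_dsn {
    my ($database) = @_;
    return "dbi:mysql:database=$database;host=$SETTINGS{host};port=$SETTINGS{port}";
}

# list_tables (hypothetical helper) connects and returns the table names.
# DBI is loaded lazily so the rest of the script runs without it.
sub list_tables {
    my ($database) = @_;
    require DBI;
    my $dbh = DBI->connect(build_dsn($database), $SETTINGS{user}, $SETTINGS{pass},
                           { RaiseError => 1 });
    my $tables = $dbh->selectcol_arrayref('SHOW TABLES');
    $dbh->disconnect;
    return @$tables;
}

print build_dsn('c_elegans'), "\n";
```

Calling list_tables('c_elegans') would open a live connection, so it is left out of the script body above. Disconnecting as soon as a query finishes helps keep the shared server responsive.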

Here’s an example script written in Perl using Bio::DB::GFF.

#!/usr/bin/perl

use strict;
use warnings;
use Bio::DB::GFF;

# Connect to the remote c_elegans feature database (no password required)
my $db = Bio::DB::GFF->new(-dsn  => 'dbi:mysql:c_elegans:mining.wormbase.org',
                           -user => 'remote-user',
                           -pass => '')
  or die "Couldn't establish a connection to remote data mining server: $!";

my $iterator = $db->get_seq_stream(-type => ['coding:exon']);

# Iterate over all of the requested features
while (my $feature = $iterator->next_seq) {

    # Create a more informative header
    my $name   = $feature->name;
    my $type   = $feature->type;
    my $start  = $feature->start;
    my $stop   = $feature->stop;
    my $strand = $feature->strand;
    my $refseq = $feature->sourceseq; # This is the name of the chromosome
    my $header = ">$name ($type; strand: $strand; $refseq: $start..$stop)";

    # Fetch the sequence of the feature and convert it to fasta.
    # $header already starts with ">", so it is printed as-is.
    my $seq = to_fasta($feature->dna);
    print "$header\n", $seq, "\n";
}

# This subroutine converts a dna string into fasta format
sub to_fasta {
  my $sequence = shift;

  # Return the sequence unchanged if it is already in fasta format.
  return $sequence if ($sequence =~ /^>(.+)$/m);

  # This is the business part of the subroutine.
  # Place a newline after every 80 characters
  $sequence =~ s/(.{80})/$1\n/g;
  return $sequence;
}

Questions? Hit me up at [email protected]. And remember, please play nice: this is a shared resource. Egregious use that significantly disrupts other users may be curtailed without warning.


WormBase -- The Blog

The Official Blog of WormBase
