RNASeq File Loader - Overview

The Count and RPKM file format is a tab-delimited format for loading HTS (High Throughput Sequencing) data into MeV for analysis. We currently provide RNASeq analysis algorithms only for these data sets.

As desktop software, MeV cannot provide functionality such as base calling and sequence alignment, which are computationally intensive processes and should be done on high-end server clusters.

HTS data should enter MeV in a summarized form. That is, the raw sequence data has already been base-called and aligned, and tag counts have been assembled, summarized, and mapped to a reference genome at the transcript/gene level.

This loader supports Counts, RPKM/FPKM, and a combination of both, as described in the Data Type section below. Currently only Human and Mouse data are supported, with annotation provided from RefSeq or Ensembl.

Both Count and Expression (RPKM/FPKM) values are maintained by this loader. The user has the option of loading either or both kinds of information for their data set. If either Count or RPKM is left out, it is calculated based on the publication described in the section Count to RPKM and vice versa.

Dialog Selections

Library Size File

This should be a tab-delimited file without a header. Each row should have two columns: sample name and library size. Comment lines are allowed and should start with "#". An example:

# This is a comment
Sample_1 5454545
Sample_2 694545
Sample_n 3245443
# This is the End
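
The format above can be read with a few lines of code. The following Python sketch is for illustration only and is not MeV's own parser. The function name parse_library_sizes is hypothetical, and it assumes that blank lines may be skipped and that library sizes are whole numbers.

```python
def parse_library_sizes(lines):
    """Parse library-size lines into a {sample_name: library_size} dict.

    Hypothetical helper, not part of MeV. Assumes tab-delimited rows with
    exactly two columns, "#" comment lines, and integer library sizes.
    """
    sizes = {}
    for line_number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        # Skip comment lines and (by assumption) blank lines.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(
                f"line {line_number}: expected 2 tab-separated columns, "
                f"found {len(fields)}"
            )
        name, size = fields[0].strip(), fields[1].strip()
        sizes[name] = int(size)  # raises ValueError if not a whole number
    return sizes


def read_library_size_file(path):
    """Read a library size file from disk using the parser above."""
    with open(path, encoding="utf-8") as handle:
        return parse_library_sizes(handle)
```

Calling read_library_size_file on a file with the rows shown above would give a dictionary mapping Sample_1, Sample_2 and Sample_n to their library sizes.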

Count to RPKM and vice versa

When either RPKM or Count information is provided, MeV calculates the other based on the publication by Mortazavi et al., Nature Methods 5, 621-628 (2008). The supplementary section of that paper describes the approach in detail. Here is the basic formula used:

RPKM = Count / Library Size / Transcript Length * 1e+9
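
The conversion in both directions can be sketched in Python as follows. This is a minimal illustration of the formula, not MeV's implementation. The function names are hypothetical, and it assumes that library size is a number of reads and that transcript length is in bases, which is what the 1e+9 factor (per kilobase, per million reads) implies.

```python
def count_to_rpkm(count, library_size, transcript_length):
    """RPKM = Count / Library Size / Transcript Length * 1e+9.

    Hypothetical helper. Assumes library_size is in reads and
    transcript_length is in bases; both must be positive.
    """
    if library_size <= 0 or transcript_length <= 0:
        raise ValueError("library_size and transcript_length must be positive")
    return count / library_size / transcript_length * 1e9


def rpkm_to_count(rpkm, library_size, transcript_length):
    """Invert the formula: Count = RPKM * Library Size * Transcript Length / 1e+9."""
    if library_size <= 0 or transcript_length <= 0:
        raise ValueError("library_size and transcript_length must be positive")
    return rpkm * library_size * transcript_length / 1e9
```

For example, 100 reads on a 2,000-base transcript in a library of 5,000,000 reads works out to about 10 RPKM, and passing that value back through rpkm_to_count recovers a count of about 100. The result is a floating-point number, so a recovered count may need rounding before it is used as an integer.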

Rules and requirements