- Large-scale sequencing has been revolutionized
by the introduction of several next-generation sequencing
(NGS) technologies.
- RNA sequencing (RNA-seq) is the use of NGS for
- Because it can generate an unlimited dynamic range and
provide greater sensitivity than microarrays, RNA-seq
has been hailed as the future of transcriptome research.
- RNA-seq is the first sequencing-based method that
allows the entire transcriptome to be surveyed in a
high-throughput and quantitative manner.
A typical RNA-seq experiment
- Long RNAs are first converted into a library of cDNA
fragments through either RNA fragmentation or DNA
fragmentation.
- Sequencing adaptors (blue) are subsequently added to
each cDNA fragment and a short sequence is obtained from
each cDNA using high-throughput sequencing technology.
- The resulting sequence reads are aligned with the
reference genome or transcriptome, and classified as three
types: exonic reads, junction reads and poly(A) end-reads.
- These three types are used to generate a
base-resolution expression profile for each gene, as
illustrated at the bottom; a yeast ORF with one intron is
shown (Fig. 1.).
Fig. 1. A typical RNA-Seq experiment
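The read classification described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `AlignedRead` container, the `min_poly_a` threshold and the toy coordinates are hypothetical and do not come from any particular aligner or from the figure.

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates on the reference


@dataclass
class AlignedRead:
    """Hypothetical container for one aligned read (not a real library type)."""
    blocks: List[Interval]    # aligned segments; more than one means a spliced alignment
    unaligned_tail: str = ""  # 3' bases that did not match the reference


def inside_an_exon(block: Interval, exons: List[Interval]) -> bool:
    """True if the block lies entirely within one annotated exon."""
    return any(start <= block[0] and block[1] <= end for start, end in exons)


def classify_read(read: AlignedRead, exons: List[Interval], min_poly_a: int = 5) -> str:
    tail = read.unaligned_tail.upper()
    # An untemplated run of A's at the 3' end suggests a poly(A) end-read.
    if len(tail) >= min_poly_a and set(tail) == {"A"}:
        return "poly(A) end-read"
    if not read.blocks or not all(inside_an_exon(b, exons) for b in read.blocks):
        return "unclassified"
    # Several exonic blocks mean the read spans at least one splice junction.
    return "junction read" if len(read.blocks) > 1 else "exonic read"


def coverage_profile(reads: List[AlignedRead], length: int) -> List[int]:
    """Count, for each base in [0, length), how many reads cover it."""
    profile = [0] * length
    for read in reads:
        for start, end in read.blocks:
            for pos in range(max(start, 0), min(end, length)):
                profile[pos] += 1
    return profile


# Toy gene with two exons separated by one intron (coordinates are made up).
exons = [(0, 100), (150, 300)]
reads = [
    AlignedRead(blocks=[(10, 46)]),
    AlignedRead(blocks=[(80, 100), (150, 166)]),
    AlignedRead(blocks=[(280, 300)], unaligned_tail="AAAAAAAA"),
]
labels = [classify_read(r, exons) for r in reads]
profile = coverage_profile(reads, 300)
```

Here `labels` holds one class per read and `profile` is a simple per-base read count, a stand-in for the base-resolution expression profile; intronic positions stay at zero because spliced reads contribute only their aligned blocks.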
General themes of RNA-seq
Computational protocol for
RNA-seq (Fig. 2.).
- Obtain raw data
- Align (with reference genome)/assemble (without
reference genome) reads
- Process alignment with a tool specific to the
- Post-process
- Summarize and visualize
Fig. 2. RNA-Seq computational
RNA-Seq analysis: practical session using
the Galaxy main server (5)
The dataset: Genome-wide analysis of allelic expression
imbalance in human primary cells by high-throughput
1. Opening a session in Galaxy
Galaxy is an open, web-based platform for
data-intensive biomedical research.
2. Obtaining the data
3. Quality control of high throughput
FastQC aims to provide a simple way
to do some quality control checks on raw sequence data
from high-throughput sequencing pipelines.
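One of the simplest checks of this kind is the mean base quality at each read position. The sketch below is not FastQC's implementation; it is a minimal illustration that assumes four-line FASTQ records and Phred+33 quality encoding, and the function names are made up for this example.

```python
import io
from typing import Iterator, List, TextIO


def read_quality_strings(handle: TextIO) -> Iterator[str]:
    """Yield the quality line of each four-line FASTQ record."""
    while True:
        header = handle.readline()
        if not header:          # end of file
            return
        if not header.strip():  # tolerate blank lines between records
            continue
        handle.readline()       # sequence line (unused here)
        handle.readline()       # '+' separator line
        yield handle.readline().rstrip("\n")


def mean_quality_per_position(handle: TextIO, offset: int = 33) -> List[float]:
    """Average Phred score at each read position (assumes Phred+33 by default)."""
    totals: List[int] = []
    counts: List[int] = []
    for quality in read_quality_strings(handle):
        for i, char in enumerate(quality):
            if i == len(totals):  # first time a read reaches this position
                totals.append(0)
                counts.append(0)
            totals[i] += ord(char) - offset
            counts[i] += 1
    return [total / count for total, count in zip(totals, counts)]


# Two made-up reads: 'I' encodes Phred 40, '5' Phred 20 and '!' Phred 0.
example_fastq = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nACGT\n+\nII5!\n"
)
means = mean_quality_per_position(example_fastq)
print(means)  # [40.0, 40.0, 30.0, 20.0]
```

A drop in mean quality toward the 3' end of the reads, as in the toy data, is the kind of pattern that would normally prompt trimming before mapping.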
4. Loading the FASTQ file onto the Galaxy server
5. Mapping reads with TopHat
TopHat is a fast splice junction mapper
for RNA-Seq reads. It aligns RNA-Seq reads to
mammalian-sized genomes using the ultra high-throughput
short read aligner Bowtie, and then analyzes the mapping
results to identify splice junctions between exons.
6. Viewing the results with Integrated Genome
7. Computing FPKM with Cufflinks
Cufflinks performs transcript
assembly and estimates FPKM (RPKM) for RNA-Seq data. One
important parameter of
Cufflinks is the reference annotation,
which tells Cufflinks the locations of the genes for
which we want to compute
expression. This option appears as the Use
Reference Annotation parameter in Galaxy.
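For orientation, FPKM (fragments per kilobase of transcript per million mapped fragments) can be computed from raw counts as shown below. This is the plain textbook definition, not Cufflinks' own estimator, which fits a statistical model to assign fragments to transcripts; the function name and the example numbers are made up for illustration.

```python
def fpkm(fragment_count: float, transcript_length_bp: int,
         total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total fragment count must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million)
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Made-up example: 500 fragments on a 2,000 bp transcript,
# in a library with 20 million mapped fragments.
print(fpkm(500, 2000, 20_000_000))  # 12.5
```

Normalizing by both transcript length and library size is what makes values comparable between genes of different lengths and between runs of different depth.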
Advantages of RNA-seq
compared with microarrays
- RNA-seq does not need a
reference sequence for the genes/genome being assayed
- More sensitive for less abundant transcripts
- Large dynamic range (10^5 vs. 10^2 for microarrays)
- Allows the detection of nucleotide variation in
the transcribed regions (SNP)
- Quantitation of splicing
- Can survey novel genes if genome model still
- Reanalysis of data can become more valuable as
genome annotation improves
- High technical reproducibility
Table 1. Several advantages of
RNA-seq compared with microarrays (6)
3. Quantifying expression levels: RNA-Seq and microarray
To enhance the scientific
community's understanding of the advantages and challenges
of RNA-Seq, the performance of an RNA-Seq approach
(Illumina Genome Analyzer II) and a microarray-based
approach (Affymetrix Rat Genome 230 2.0 arrays) for
detecting differentially expressed genes (DEGs) in the
kidneys of rats was compared.
The results indicated that RNA-Seq was more
sensitive in detecting genes with low expression levels,
while similar gene expression patterns were observed for
both platforms. Moreover, although the overlap of the DEGs
was only 40-50%, the biological interpretation was largely
consistent between the RNA-Seq and microarray data (3).
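An overlap figure of this kind can be computed by comparing the two DEG lists as sets. The sketch below is only an illustration: the study's exact definition of overlap is not given here, so the overlap is reported relative to each platform's list separately, and the gene identifiers are placeholders rather than data from the study.

```python
from typing import Dict, Set


def deg_overlap(rnaseq_degs: Set[str], microarray_degs: Set[str]) -> Dict[str, float]:
    """Shared DEG count and its fraction of each platform's DEG list."""
    if not rnaseq_degs or not microarray_degs:
        raise ValueError("both DEG lists must be non-empty")
    shared = rnaseq_degs & microarray_degs
    return {
        "shared": len(shared),
        "fraction_of_rnaseq": len(shared) / len(rnaseq_degs),
        "fraction_of_microarray": len(shared) / len(microarray_degs),
    }


# Placeholder gene identifiers, for illustration only.
rnaseq = {"geneA", "geneB", "geneC", "geneD"}
microarray = {"geneC", "geneD", "geneE", "geneF", "geneG"}
result = deg_overlap(rnaseq, microarray)
print(result)
```

Reporting both fractions matters because the two platforms usually call different numbers of DEGs, so a single percentage depends on which list is used as the denominator.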