rnaSPAdes manual

1. About rnaSPAdes
2. rnaSPAdes specifics
    2.1. Running rnaSPAdes
    2.2. rnaSPAdes-specific options
    2.3. Assembling strand-specific RNA-Seq
    2.4. rnaSPAdes output
3. Assembly evaluation
4. Citation
5. Feedback and bug reports

1 About rnaSPAdes

rnaSPAdes is a tool for de novo transcriptome assembly from RNA-Seq data and is suitable for all kinds of organisms. rnaSPAdes has been part of the SPAdes package since version 3.9. Information about SPAdes download, requirements, installation and basic options can be found in the SPAdes manual. This manual describes the differences between SPAdes and rnaSPAdes.

2 rnaSPAdes specifics

2.1 Running rnaSPAdes

To run rnaSPAdes use

    rnaspades.py [options] -o <output_dir>

or

    spades.py --rna [options] -o <output_dir>

Note that we assume the SPAdes installation directory has been added to the PATH variable; otherwise, provide the full path to the rnaSPAdes executable: <rnaspades installation dir>/rnaspades.py.

Here are several notes regarding rnaSPAdes options:

rnaSPAdes accepts only paired-end and single-end libraries as input.
rnaSPAdes does not support the --careful and --cov-cutoff options.
rnaSPAdes is not compatible with other pipeline options such as --meta, --sc and --plasmid. To assemble metatranscriptomic data, run rnaSPAdes as is.
By default, rnaSPAdes uses two k-mer sizes, which are selected automatically based on read length (approximately one third and one half of the maximal read length). We recommend not changing this parameter, because smaller k-mer sizes typically result in multiple chimeric (misassembled) transcripts. If you have any doubts about your run, do not hesitate to contact us at the e-mail address given below.
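
To get a feel for the default k-mer sizes described above, the following Python sketch computes two values close to one third and one half of the maximal read length. It is an illustration only: the function name and the rounding rule (round down to the nearest odd number, cap at 127 because SPAdes k-mer sizes must be odd and below 128) are assumptions, and the values rnaSPAdes actually selects may differ.

```python
def approximate_kmer_sizes(max_read_length, k_max=127):
    """Return two odd k-mer sizes near 1/3 and 1/2 of the maximal read length.

    Hypothetical helper: the rounding rule is an assumption, not the
    exact procedure used inside rnaSPAdes.
    """

    def nearest_odd_at_most(value):
        # Truncate to an integer, then step down by one if the result is even.
        k = int(value)
        if k % 2 == 0:
            k -= 1
        # Cap at the assumed upper bound (k_max itself is odd).
        return min(k, k_max)

    return (
        nearest_odd_at_most(max_read_length / 3),
        nearest_odd_at_most(max_read_length / 2),
    )


if __name__ == "__main__":
    # For 100 bp reads this prints (33, 49).
    print(approximate_kmer_sizes(100))
```

For example, with 150 bp reads the sketch yields 49 and 75, which matches the "one third and one half" rule of thumb.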

2.2 rnaSPAdes-specific options

--fast
Uses only a single k-mer size (detected automatically) and removes short, low-coverage isolated edges from the graph. Note that very short and low-expressed transcripts may be missing when this option is used.

2.3 Assembling strand-specific RNA-Seq

rnaSPAdes supports strand-specific RNA-Seq datasets. You can indicate that a dataset is strand-specific using one of the following options:

--ss-fr
The dataset is strand-specific and the first read in each pair corresponds to the actual gene strand.

--ss-rf
The dataset is strand-specific and the first read in each pair corresponds to the reverse gene strand (antisense).

Note that strand-specificity is not related to, and should not be confused with, the FR and RF orientation of paired reads. RNA-Seq paired-end reads typically have forward-reverse orientation (--> <--), which is assumed by default, so no additional options are needed (see the main manual for details).

If the dataset is single-end, use the --ss-fr option when the reads correspond to the gene strand and --ss-rf otherwise.
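
The options above can be combined into a command line programmatically. The Python sketch below only assembles an argument list and does not run anything. The function name and the file names in the usage example are hypothetical; the -1, -2 and -s input options are the standard ones described in the main SPAdes manual, while --ss-fr, --ss-rf and --fast are the options described in this manual.

```python
import shlex


def build_rnaspades_command(output_dir, left=None, right=None, single=None,
                            strand=None, fast=False):
    """Build an rnaspades.py argument list (hypothetical helper).

    strand is None for non-strand-specific data, "fr" if the first read
    (or the single read) matches the gene strand, and "rf" otherwise.
    """
    if strand not in (None, "fr", "rf"):
        raise ValueError("strand must be None, 'fr' or 'rf'")
    if (left is None) != (right is None):
        raise ValueError("left and right reads must be given together")
    if (left is None) == (single is None):
        raise ValueError("give either a read pair or single-end reads")

    cmd = ["rnaspades.py"]
    if left is not None:
        cmd += ["-1", left, "-2", right]   # paired-end library
    else:
        cmd += ["-s", single]              # single-end library
    if strand is not None:
        cmd.append("--ss-" + strand)       # --ss-fr or --ss-rf
    if fast:
        cmd.append("--fast")
    cmd += ["-o", output_dir]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, used for illustration only.
    cmd = build_rnaspades_command("rna_out", left="reads_1.fastq",
                                  right="reads_2.fastq", strand="rf")
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the command as a list makes it straightforward to pass to subprocess.run later without worrying about shell quoting.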

2.4 rnaSPAdes output

rnaSPAdes outputs one main FASTA file named transcripts.fasta. The corresponding file with paths in the assembly_graph.fastg is transcripts.paths.

In addition, rnaSPAdes writes transcripts with different levels of filtering to <output_dir>/:

hard_filtered_transcripts.fasta – includes only long and reliable transcripts with rather high expression.
soft_filtered_transcripts.fasta – includes short and low-expressed transcripts, likely to contain junk sequences.

We recommend using the main transcripts.fasta file unless your project has specific needs. Do not hesitate to contact us at the e-mail address given below.

Contigs/scaffolds names in rnaSPAdes output FASTA files have the following format:
>NODE_97_length_6237_cov_11.9819_g8_i2
Similarly to SPAdes, 97 is the number of the transcript, 6237 is its sequence length in nucleotides and 11.9819 is the k-mer coverage. Note that the k-mer coverage is always lower than the read (per-base) coverage. g8_i2 corresponds to gene number 8 and isoform number 2 within this gene. Transcripts with the same gene number presumably originate from the same or somewhat similar (e.g. paralogous) genes. Note that this prediction is based on the presence of shared sequences in the transcripts and is very approximate.
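
The name format described above is easy to parse in downstream scripts. The following Python sketch extracts the fields from a transcript name and groups transcripts by gene number. The function names and the regular expression are illustrative assumptions derived from the example name above, not part of rnaSPAdes.

```python
import re
from collections import defaultdict

# Pattern derived from the example name; the leading ">" is optional.
HEADER_RE = re.compile(
    r"^>?NODE_(?P<number>\d+)_length_(?P<length>\d+)"
    r"_cov_(?P<coverage>\d+(?:\.\d+)?)"
    r"_g(?P<gene>\d+)_i(?P<isoform>\d+)$"
)


def parse_transcript_name(name):
    """Split a transcript name into its numeric fields (hypothetical helper)."""
    match = HEADER_RE.match(name.strip())
    if match is None:
        raise ValueError("unexpected transcript name: %r" % name)
    fields = match.groupdict()
    return {
        "number": int(fields["number"]),
        "length": int(fields["length"]),
        "coverage": float(fields["coverage"]),   # k-mer coverage
        "gene": int(fields["gene"]),
        "isoform": int(fields["isoform"]),
    }


def group_by_gene(names):
    """Map each gene number to the list of parsed transcripts assigned to it."""
    genes = defaultdict(list)
    for name in names:
        record = parse_transcript_name(name)
        genes[record["gene"]].append(record)
    return dict(genes)


if __name__ == "__main__":
    print(parse_transcript_name(">NODE_97_length_6237_cov_11.9819_g8_i2"))
```

Because the gene grouping reported by rnaSPAdes is approximate, grouped transcripts should be treated as candidates for the same gene rather than as confirmed isoforms.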

Address for communications: spades.support@cab.spbu.ru.