RNAseq aligners – The Next Generation

There was a time when Tophat/Cufflinks was the only game in town. That has changed. By a lot. Aligners worth knowing about now include:

  • STAR. Searching an uncompressed suffix array for the longest matching seeds makes this a very fast aligner. Whereas Tophat/Cufflinks requires 8+ hours, STAR will require <2. The downside: the index requires a lot of RAM. I think I’ve heard 32 GB of RAM works, but I run it on our 256 GB machine.
  • GMAP/GSNAP – actually used since the EST days. I haven’t used it yet but according to bake-off comparisons like this, it is a relatively fast aligner.
  • Tophat/Cufflinks. Probably the most cited of all aligners. Straightforward to use, but it seems to align fewer reads than the newer software does.
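
To get a feel for why an index lookup is fast (and why the index eats RAM), here is a toy suffix array in Python. STAR's index is an uncompressed suffix array, but everything else here is my own simplification: the function names are made up, the construction is naive, and it only finds exact matches. It is not how STAR handles mismatches or splicing.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text, in sorted order."""
    # Naive construction: fine for a toy string, hopeless for a real genome.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return sorted start positions where pattern occurs exactly in text."""
    m = len(pattern)

    # Lower bound: first suffix whose first m characters are >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose first m characters are > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix between the two bounds starts with the pattern.
    return sorted(sa[start:lo])


if __name__ == "__main__":
    genome = "ACGTACGTTACG"  # made-up sequence
    sa = build_suffix_array(genome)
    print(find_occurrences(genome, sa, "ACG"))
```

Each lookup is a pair of binary searches, so it stays cheap as the reference grows. The cost is that the array stores one position per base of the genome, which is where the memory goes.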

Some of the new tools in town are specialized for certain tasks, particularly aspects of splicing:

  • LeafCutter – variation in intron splicing
  • MapSplice2 – splice junction discovery
  • SpliceTrap – differential splice usage
  • DEXSeq – differential exon usage

New wave differential gene expression can now be done in minutes using these tools! For one thing, they report “transcripts per million” (TPM) instead of the confusing FPKM/RPKM that no one seems to understand.

  • Kallisto (Pachter lab). From the good folks who brought us Tophat/Cufflinks and eXpress. I like the interactive UI for playing with your data through Sleuth.
  • Salmon/Sailfish. Runs extremely fast by working out which transcripts each read is compatible with instead of doing a full base-by-base alignment, which is closer in spirit to Kallisto than to a traditional aligner
  • RSEM – haven’t looked at it yet
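
Since TPM versus FPKM trips people up, here is a minimal sketch of the arithmetic in Python. The function names and the toy counts are mine, and I use plain transcript length where the real tools use an effective length, so treat it as an illustration rather than what Kallisto or Salmon actually compute.

```python
def fpkm(counts, lengths_bp):
    """Fragments per kilobase of transcript per million mapped fragments."""
    total = sum(counts)
    # count / (length in kb) / (total fragments in millions)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then rescale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    return [r / total_rate * 1e6 for r in rates]


if __name__ == "__main__":
    counts = [10, 20, 30]            # made-up fragment counts per transcript
    lengths_bp = [1000, 1000, 3000]  # made-up transcript lengths in bases
    print("FPKM:", fpkm(counts, lengths_bp))
    print("TPM: ", tpm(counts, lengths_bp))
```

The practical difference: TPM values always sum to one million within a sample, so a transcript's TPM reads directly as its share of the pool. FPKM values sum to something different in every sample, which is much of what makes them confusing to compare.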

Workflows. There are a few workflows that seem popular for analysis:

  • Tophat/Cufflinks. You do the splice-aware alignment, I’ll generate a ton of output for you
  • subread/edgeR/limma(voom)/Glimma. Align/count, normalize, DEG analysis and visualize. All thanks to Bioconductor
  • your favorite aligner + DESeq2/edgeR/EBSeq

Of course, there’s not a lot of consensus on what works best or is “right” so there’s still room to add to this list.