Author Archive

Working with Pacbio IsoSeq data

Overview: With its long single-molecule reads, Pacbio IsoSeq offers the tantalizing possibility of reading entire full-length RNAs. The library prep is not for the faint of heart, and a recent provider asked me for 5 µg of RNA to start, which is not an easy task! Moreover, a thorough interrogation of a transcriptome would require at…

Favorite atom packages

hydrogen, atom-beautify, autocomplete-python, busy-signal, goto-definition, highlight-selected, linter-flake8, markdown-writer, minimap, python-autopep8, symbols-tree-view, script, ftp-remote-edit

Assembling reads with velvet

Velvet is a hash-based de Bruijn graph assembler. I occasionally use it to build contigs from short-read RNAseq data, although it can be used for long reads, metagenomics, etc. To install, just download, extract and make. There are two parts to the program:
1. velveth – prepare the sequences
2. velvetg – build the graph and…
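
The two-step workflow can be sketched as a pair of command lines. This is a minimal Python sketch, assuming single-end short reads in FASTQ format; the file name `reads.fastq`, the output directory `velvet_out`, and the hash length of 31 are hypothetical placeholders, not values from the original post.

```python
import shlex


def velvet_commands(reads, out_dir="velvet_out", hash_length=31):
    """Build the velveth and velvetg command lines for single-end short reads.

    All default values are illustrative placeholders, not recommendations.
    """
    # Step 1: velveth hashes the reads and writes its files into out_dir.
    velveth = ["velveth", out_dir, str(hash_length), "-fastq", "-short", reads]
    # Step 2: velvetg builds the graph from that same directory
    # and writes the contigs there.
    velvetg = ["velvetg", out_dir]
    return [velveth, velvetg]


if __name__ == "__main__":
    for cmd in velvet_commands("reads.fastq"):
        # Print instead of running, so the sketch works without Velvet installed.
        print(shlex.join(cmd))
```

To actually run the assembly, each list could be passed to `subprocess.run(cmd, check=True)` on a machine where Velvet is on the `PATH`.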

GMAP for long read RNA

GMAP is an old (circa 2006) program for long-read alignment. Its use case is mapping RNA reads back to a genome. It has found new life in the world of long-read RNAseq, such as with Pacbio reads. Perhaps because of its age and architecture, it has some quirks and dependencies that seem…

Serving BAM files from S3

It's handy to have a webserver where you can put up BAM files for sharing with collaborators. But if you have tons of files or no available webserver, you can use Amazon S3 to serve up your files. While S3 is pretty simple, setting it up for serving BAM (and index) files has a few tricky…

Packaging your data in R

The goal of this is to make distribution of your data easy and consistent. The steps are not hard if you are using RStudio; the New Project wizard will do some of the work for you. Create a directory for the R project (myproject). Create a subdirectory (myproject/data-raw). In the…

Organizing bioinformatics projects

I struggle with this all the time. What’s the best way to organize projects? Here’s one suggestion for organizing R from Software Carpentry. This is a directory structure I like from an older PLOS paper. For a typical R project, it looks something like this. The source code in R is often dependent on previous…

Rstudio-server over ssh

Rstudio is a great tool, but suppose you need to access your work computer to do some work from home. Use Rstudio-server, which basically allows you to connect to a remote machine and do your development there. I can’t find a definitive answer on whether the free version of Rstudio-server encrypts the login information. Their website…
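
One common way to avoid sending the login in the clear is to forward the server's port through ssh, so the browser session travels inside the encrypted tunnel. The following is a minimal Python sketch of that idea, assuming the server listens on its default port 8787; the user `me` and host `work.example.org` are hypothetical, and the full post may take a different approach.

```python
import shlex


def tunnel_command(user, host, local_port=8787, remote_port=8787):
    """Build an ssh command that forwards a local port to the remote server.

    The user and host are supplied by the caller; nothing here is specific
    to any real machine.
    """
    return [
        "ssh",
        "-N",  # open the tunnel only, do not run a remote command
        "-L", f"{local_port}:localhost:{remote_port}",  # local port forwarding
        f"{user}@{host}",
    ]


if __name__ == "__main__":
    # Hypothetical user and host, for illustration only.
    print(shlex.join(tunnel_command("me", "work.example.org")))
```

With the tunnel open, the browser would point at `http://localhost:8787` instead of the remote address.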

RNAseq aligners – The Next Generation

There was a time when Tophat/Cufflinks was the only game in town. That has changed. By a lot. Some of the newest aligners include STAR. Its use of a suffix array and the idea of compatible reads makes it a very fast aligner. Whereas Cufflinks requires 8+ hours, STAR will require <2. The downside: the index requires…

fix space in column name in data.tables

Spaces in column names are changed to the standard R-friendly format, with a “.” in place of each space.
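
The renaming amounts to a simple substitution. Here is a minimal Python sketch of the idea (not the R code from the post), using hypothetical column names. It handles only spaces, whereas R's `make.names` also rewrites other invalid characters.

```python
def dot_names(names):
    """Return column names with every space replaced by a dot."""
    return [name.replace(" ", ".") for name in names]


if __name__ == "__main__":
    # Hypothetical column names, for illustration only.
    print(dot_names(["sample id", "read count", "gene"]))
```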

 
