Author Archive

Finding alternatives to Elsevier

The University of California is currently boycotting Elsevier journals because of their onerous charges. In my field, that includes several journals, such as: AJHG, Cell, Cell Reports, Cell Stem Cell, Gene, Genomics, Lancet, European Journal of Medical Genetics, Neuromuscular Disorders, Molecular Therapy, and Trends in Genetics. Here’s a list from 2012 of Elsevier journals under boycott…

Read more »

Getting gene attributes from BioMart

BioMart is an Ensembl service for querying and retrieving data. Here I’ll use R/Bioconductor to get gene positions for a list of genes. We need a mart object, which encapsulates the species, host, and database information. Here I’m using the US west coast mirror. This defaults to the current build (GRCh38) of the human…

Read more »

Bioconductor and Docker

Dockerizing your Bioconductor code is one way to ensure a reproducible workflow. It also makes it easier to deploy to cloud environments (e.g. AWS Batch) if you want to scale your analysis. The official Bioconductor Docker containers are hosted on Docker Hub. They are built with RStudio Server installed so that you can run an interactive analysis…

Read more »

Working with PacBio IsoSeq data

Overview

With its long single-molecule reads, PacBio IsoSeq offers the tantalizing possibility of reading entire full-length RNAs. The library prep is not for the faint of heart, and a recent provider asked me for 5 µg of RNA to start – not an easy task! Moreover, a thorough interrogation of a transcriptome would require at…

Read more »

Favorite Atom packages

hydrogen, atom-beautify, autocomplete-python, busy-signal, goto-definition, highlight-selected, linter-flake8, markdown-writer, minimap, python-autopep8, symbols-tree-view, script, ftp-remote-edit

Read more »

Assembling reads with velvet

Velvet is a hash-based de Bruijn graph assembler. I occasionally use it to build contigs from short-read RNA-seq data, although it can also be used for long reads, metagenomics, etc. To install, just download, extract, and run make. There are two parts to the program:
1. velveth – prepare the sequences
2. velvetg – build the graph and…

Read more »

GMAP for long read RNA

GMAP is an older (circa 2006) program for long-read alignment. Its use case is mapping RNA reads back to a genome. It has found new life in the world of long-read RNA-seq, such as with PacBio reads. Perhaps because of its age and architecture, it has some quirks and dependencies that seem…

Read more »

Serving BAM files from S3

It’s handy to have a webserver where you can put up BAM files to share with collaborators. But if you have tons of files or no available webserver, you can use Amazon S3 to serve your files. While S3 is pretty simple, setting it up for serving BAM (and index) files has a few tricky…

Read more »

Packaging your data in R

The goal is to make distribution of your data easy and consistent. The steps are not hard if you are using RStudio; the New Project wizard will do some of the work for you.
Create a directory for the R project (myproject)
Create a subdirectory (myproject/data-raw)
In the…

Read more »

Organizing bioinformatics projects

I struggle with this all the time. What’s the best way to organize projects? Here’s one suggestion for organizing R from Software Carpentry. This is a directory structure I like from an older PLOS paper. For a typical R project, it looks something like this. The source code in R is often dependent on previous…

Read more »