Archive For The “Uncategorized” Category

Favorite atom packages

* hydrogen
* atom-beautify
* autocomplete-python
* busy-signal
* goto-definition
* highlight-selected
* linter-flake8
* markdown-writer
* minimap
* python-autopep8
* symbols-tree-view
* script
* ftp-remote-edit

Packaging your data in R

The goal of this is to make distribution of your data easy and consistent. The steps are not hard if you are using RStudio. The New Project wizard will do some of the work for you.

* Create a directory for the R project (myproject)
* Create a subdirectory (myproject/data-raw)
* In the…

Organizing bioinformatics projects

I struggle with this all the time. What’s the best way to organize projects? Here’s one suggestion for organizing R from Software Carpentry. This is a directory structure I like from an older PLOS paper. For a typical R project, it looks something like this. The source code in R is often dependent on previous…

nodejs for genomics

Some interesting sites to peruse later…

http://www.bionode.io/
http://biojs.io/
http://devpost.com/software/genesis-computational-genomic-sequence-analysis-z47x9t
http://thejackalofjavascript.com/dna-analysis-node-js/
https://www.npmjs.com/package/ntseq

Amazon CLI for S3

Some useful commands

EC2

aws ec2 describe-instances
aws ec2 describe-instances --region us-east-1
aws ec2 describe-instances --region us-east-1 --output table

Any aws cli command can use the following flags:

* --profile: name of a profile to use, or “default” to use the default profile.
* --region: AWS region to call.
* --output: output format.
* --endpoint-url: …

getting lists of rRNA, miRNA in R

Recently I had to remove all small RNAs from some cufflinks data. There are lots of ways to do this, but this was relatively painless (aside from figuring it out). The HUGO website has subcategories of genes annotated and we can grab the data from there. It’s available as either JSON or text. I found…

first scratch at spark

This follows from the previous post where I tried out Hadoop. This uses Spark on the same Comet cluster. To run interactively, the best thing to do first off is to add this to my .bashrc

Then in the folder with the spark code (get it here), run this

Basically, it sleeps for 4…

First scratch at hadoop

Trying out Comet at SDSC, thanks to the XSEDE folks. I’m not familiar with SLURM, but it was fun, especially since I’ve been interested in Hadoop and Spark for a while. This is a run-through of Hadoop MapReduce using a Java program that looks for anagrams in a list of words. (link to data/code…
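
For a rough feel of how an anagram search fits the map/reduce pattern, here is a minimal single-machine sketch in Python. It is not the Java program from the post and does not touch Hadoop; the function names (map_word, reduce_anagrams) and the sample word list are made up for illustration. The map step emits each word under a key made of its sorted letters, and the reduce step collects words that share a key.

```python
from collections import defaultdict


def map_word(word):
    """Map step: emit (key, word), where the key is the word's letters sorted."""
    return "".join(sorted(word.lower())), word


def reduce_anagrams(pairs):
    """Reduce step: group words by key and keep keys with more than one word."""
    groups = defaultdict(list)
    for key, word in pairs:
        groups[key].append(word)
    return {key: group for key, group in groups.items() if len(group) > 1}


# Hypothetical input; the post's real word list is not reproduced here.
words = ["listen", "silent", "enlist", "google", "elgoog", "cat"]
anagrams = reduce_anagrams(map_word(w) for w in words)

for key, group in sorted(anagrams.items()):
    print(key, group)
```

On a real cluster the grouping by key happens in the shuffle phase between the mappers and reducers; here a dictionary stands in for it.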

Exome sequencing workflow

This is a high-level overview of the general workflow for processing and analyzing whole exome data [thanks Kevin]. QSEQ files are raw reads off the Illumina sequencer. If reads are paired, then there are two files; otherwise there is one file. A Phred score of ~40 is the max (high quality). Demultiplex samples: if…
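
To put the Phred score above in perspective, here is a small Python sketch of the standard Phred definition, Q = -10 * log10(P), where P is the probability that a base call is wrong. The function names are illustrative and not from any sequencing toolkit, and the sketch works on numeric scores only; it does not parse QSEQ files.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality score to the probability of a wrong base call."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability back to a Phred quality score."""
    return -10 * math.log10(p)


# Print the error probability for a few scores, up to the ~40 maximum.
for q in (10, 20, 30, 40):
    print(q, phred_to_error_prob(q))
```

A score of 40 therefore corresponds to roughly one wrong call in 10,000 bases, and each drop of 10 points makes an error ten times more likely.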
