The LiftOver tool is essentially a way for you to take a location or set of locations in one build of a genome and map it to a different build or genome. For instance, say you have a locus chr17:1000000-1000100 in human genome build hg17 and you want to know where it resides in the hg18 build. You can use LiftOver for this.
If you have a relatively small number of locations you can use the web version of LiftOver (http://genome.ucsc.edu/cgi-bin/hgLiftOver LiftOver).
Here we’ll use the command line tool to lift over coordinates.
Grab the executable you need from http://hgdownload.cse.ucsc.edu/admin/exe/, browser over to your operating system (for instance linux.x86_64) and download the file liftOver to your local machine. This file is an executable, so you don’t need to compile and it should just work. You’ll probably need to make it executable by changing the permissions
chmod 744 liftOver
The basic synatax is this:
liftOver fileOfOldPositions map.chain fileOfNewPositions fileOfUnmappedPositions
The map.chain files are the ones that allow you to get from one build (or species) to another. These are found on the UCSC sequence and annotation downloads page (http://hgdownload.cse.ucsc.edu/downloads.html#terms) for the organism and genome build of interest. For each species and build, there’s a “LiftOver files” link. Click that to get to the chain files.
The chain files are named as explained below:
The file names reflect the assembly conversion data contained within
in the format
To .over.chain.gz. For example, a file named
hg15Tohg16.zip file contains the liftOver data needed to convert hg15
(Human Build 33) coordinates to hg16 (Human Build 34). If no file is
available for the assembly in which you’re interested, please send a
request to the genome mailing list (firstname.lastname@example.org) and we will
attempt to provide you with one.
So if you want to liftOver hg17 positions to hg18 positions in human, you’ll want a file named hg17ToHg18.over.chain.gz. Chain files are basically the result of whole genome-to-genome alignments.
Unzip the chain file
You’ll also need your list of positions in BED format, which is basically a tab separated file of positions like so:
chrN startpos endpos
Then all you have to do is run liftOver. Not every position will be found in the newer build because merges, changes or revisions of the build so unmapped locations will be written out to the unmapped file. You can also use gff/gtf files as input. The only option I’ve really messed with is the minMatch parameter. For human to human, the recommended setting is 0.95 (95% of bases in your given interval match). For cross-species, say human to mouse, I commonly used 0.10 (10%) which is the recommended setting.