Gene expression analysis using TopHat and Cufflinks

  1. Install the latest Bowtie2. See here.
    $ bowtie2 --version
    /opt/bi/bowtie2-2.2.6/bowtie2-align-s version 2.2.6
    Built on localhost.localdomain
    Wed Jul 22 16:18:32 EDT 2015
    Compiler: gcc version 4.1.2 20080704 (Red Hat 4.1.2-54)
    Options: -O3 -m64 -msse2  -funroll-loops -g3 -DPOPCNT_CAPABILITY
    Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}
  2. Install the latest TopHat. See here.
    $ tophat2 -v
    TopHat v2.1.0
  3. Install the latest Cufflinks. See here.
    $ cufflinks
    cufflinks v2.2.1
    linked against Boost version 104700
  4. Prepare the reference genome. See here.
  5. Check your data
  6. Build the transcriptome index. See here.
  7. Protect your reference data.
  8. Prepare your working directory.
  9. Download data (lung).
  10. Download data (stomach).
  11. Check your data.
  12. Read alignment with TopHat.
    Map the reads for each sample to the reference genome:
  13. Quantification with Cuffquant.
    Compute the gene expression profiles which are used subsequently by Cuffdiff:
  14. Optional:
    1. Delete data.
    2. Check your data.
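
Steps 12 and 13 can be sketched roughly as follows. This is a minimal sketch, not the exact commands used for this walkthrough: the paired-end FASTQ names (lung_1.fastq and so on), the output directory names, and the thread count are assumptions, and the reference paths are the ones used in the transcriptome index section. The script only prints the commands so they can be reviewed before running.

```shell
#!/bin/sh
# Sketch only: sample file names, output directories and thread count are hypothetical.
REF=/var/data/bi/reference/prebuild/Homo_sapiens/Ensembl/GRCh37
GENOME_INDEX="$REF/Sequence/Bowtie2Index/genome"
GENES_GTF="$REF/Annotation/Genes/genes.gtf"
TRANSCRIPTOME_INDEX="$REF/Annotation/Genes/transciptome_index/genes"

# Print the TopHat command that maps one paired-end sample to the reference genome,
# reusing the prebuilt transcriptome index.
tophat_cmd() {
    sample="$1"
    echo "tophat2 -p 4 --transcriptome-index $TRANSCRIPTOME_INDEX -o ${sample}_tophat $GENOME_INDEX ${sample}_1.fastq ${sample}_2.fastq"
}

# Print the Cuffquant command that quantifies the alignments written by TopHat
# (accepted_hits.bam in the TopHat output directory).
cuffquant_cmd() {
    sample="$1"
    echo "cuffquant -p 4 -o ${sample}_cuffquant $GENES_GTF ${sample}_tophat/accepted_hits.bam"
}

for sample in lung stomach; do
    tophat_cmd "$sample"
    cuffquant_cmd "$sample"
done
```

Once the printed commands look right, the output can be piped to sh to execute them. Cuffquant writes an abundances.cxb file into each output directory; these files are what Cuffdiff takes as input.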

TopHat transcriptome index

When TopHat is run with the -G option, it builds a Bowtie index from the supplied reference annotation on every run. While doing so, it prints the following lines:
[..] Building transcriptome data files tophat_out/tmp/genes
[..] Building Bowtie index from genes.fa
Building the index can take a long time. You can instead build it once and reuse it in every later TopHat run. To do so, invoke TopHat with only the -G and --transcriptome-index options. TopHat then builds the index at the given location.
$ tophat -G /var/data/bi/reference/prebuild/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.gtf --transcriptome-index /var/data/bi/reference/prebuild/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/transciptome_index/genes /var/data/bi/reference/prebuild/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome
Note that the <bowtie_index> argument is still required. See the TopHat manual for details. Output:
Building transcriptome files with TopHat v2.1.0
Checking for Bowtie
Bowtie version:
Checking for Bowtie index files (genome)..
Checking for reference FASTA file
Building transcriptome data files /var/data/bi/reference/prebuild/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/transciptome_index/genes
Building Bowtie index from genes.fa
Transcriptome files prepared. This was the only task requested.
Files created:
$ ls -lh /var/data/bi/reference/prebuild/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/transciptome_index/
total 1,7G
-rw-rw-r-- 1 root root 125M Jan  3 17:08 genes.1.bt2
-rw-rw-r-- 1 root root  69M Jan  3 17:08 genes.2.bt2
-rw-rw-r-- 1 root root 1,7M Jan  3 16:56 genes.3.bt2
-rw-rw-r-- 1 root root  69M Jan  3 16:56 genes.4.bt2
-rw-rw-r-- 1 root root 305M Jan  3 16:56 genes.fa
-rw-rw-r-- 1 root root  26M Jan  3 16:56 genes.fa.tlst
-rwxrwxr-x 1 root root 874M Jan  3 16:55 genes.gff
-rw-rw-r-- 1 root root 125M Jan  3 17:19 genes.rev.1.bt2
-rw-rw-r-- 1 root root  69M Jan  3 17:19 genes.rev.2.bt2
-rw-rw-r-- 1 root root   24 Jan  3 16:56 genes.ver
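
Before relying on the prebuilt index in later runs, it can help to confirm that the files listed above are all present. The helper below is a hypothetical sketch (the function name and the choice of which files to check are assumptions, not part of TopHat). It only tests that each file exists and is non-empty.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the index files shown in the listing above
# exist under the given prefix and are non-empty.
transcriptome_index_ready() {
    prefix="$1"
    for suffix in 1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2 rev.2.bt2 fa gff; do
        [ -s "${prefix}.${suffix}" ] || return 1
    done
    return 0
}

INDEX=/var/data/bi/reference/prebuild/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/transciptome_index/genes
if transcriptome_index_ready "$INDEX"; then
    echo "index ready: pass --transcriptome-index $INDEX (no -G needed)"
else
    echo "index missing or incomplete: rebuild with -G and --transcriptome-index"
fi
```

According to the TopHat manual, later runs that pass the same --transcriptome-index value reuse the prebuilt files, so -G can be omitted after the first run.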

Install latest TopHat

See here.