fastq format

The fastq format, or fastq.gz when gzip-compressed, is used to store NGS (next-generation sequencing) data. A fastq file contains at least one record, and each record consists of four lines:
  1. ID, starts with @
  2. sequence
  3. Separator marking the end of the sequence, starts with +
  4. Sequencing quality information: one ASCII-encoded quality score per base
A record’s sequence is called a read. Example record:
@ERR315326.7031172/1
TGGCACCACACCCCTCTAAGACGCAGCAAT
+
BBBFFFFFFFFFFIIIIIIIIIIIIIIIII
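The four-line layout above can be read with a few lines of Python. This is a minimal sketch, not a full parser: the function name read_fastq is hypothetical, and it assumes strict four-line records with no wrapped sequence lines and no trailing blank lines.

```python
from itertools import islice


def read_fastq(lines):
    """Yield (read_id, sequence, quality) tuples from an iterable of FASTQ lines.

    Assumption: every record is exactly four lines (no wrapped sequences).
    """
    it = iter(lines)
    while True:
        record = [line.rstrip("\r\n") for line in islice(it, 4)]
        if not record:
            return  # end of input
        if len(record) != 4:
            raise ValueError("truncated record: expected 4 lines")
        header, seq, plus, qual = record
        if not header.startswith("@"):
            raise ValueError("ID line must start with '@'")
        if not plus.startswith("+"):
            raise ValueError("separator line must start with '+'")
        if len(seq) != len(qual):
            raise ValueError("need one quality character per base")
        yield header[1:], seq, qual
```

An open file handle can be passed directly, for example read_fastq(open("input_1.fastq")). For fastq.gz, gzip.open(path, "rt") from the standard library yields text lines in the same way.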
Quality scores can be represented using three different encodings, each of which uses a different range of ASCII characters:
Name                        ASCII character range
Sanger, Illumina >= v1.8    33-126
Solexa, Illumina < v1.3     59-126
Illumina v1.3 - v1.7        64-126
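To make the table concrete, here is a small sketch that turns a quality string into numeric scores and lists the encodings whose character range could have produced it. The function names are hypothetical. The offsets (33 for Sanger and Illumina >= v1.8, 64 for the other two) are not stated in the table; they are the commonly documented values and are assumed here.

```python
# Character ranges copied from the table above.
RANGES = {
    "Sanger, Illumina >= v1.8": (33, 126),
    "Solexa, Illumina < v1.3": (59, 126),
    "Illumina v1.3 - v1.7": (64, 126),
}


def decode_quality(qual, offset=33):
    """Convert a quality string to integer scores.

    offset is an assumption: 33 for Sanger / Illumina >= v1.8, 64 otherwise.
    """
    return [ord(char) - offset for char in qual]


def compatible_encodings(qual):
    """Return the names of all encodings whose ASCII range covers qual."""
    codes = [ord(char) for char in qual]
    lowest, highest = min(codes), max(codes)
    return [
        name
        for name, (start, end) in RANGES.items()
        if start <= lowest and highest <= end
    ]
```

A string that only uses high characters, such as the B, F and I of the example record, fits all three ranges, so the encoding cannot be decided from that read alone. A single character below 59 (for example #) rules out everything except the Sanger range.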


bowtie2 paired-end alignment

Bowtie 2 version 2.2.6 by Ben Langmead (langmea@cs.jhu.edu, www.cs.jhu.edu/~langmea)
Usage: 
  bowtie2 [options]* -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r>} [-S <sam>]
time bowtie2 -p 8 -x reference/human/hg19 -1 input_1.fastq -2 input_2.fastq > output.sam
48755614 reads; of these:
  48755614 (100.00%) were paired; of these:
    24164921 (49.56%) aligned concordantly 0 times
    14767863 (30.29%) aligned concordantly exactly 1 time
    9822830 (20.15%) aligned concordantly >1 times
    ----
    24164921 pairs aligned concordantly 0 times; of these:
      4355288 (18.02%) aligned discordantly 1 time
    ----
    19809633 pairs aligned 0 times concordantly or discordantly; of these:
      39619266 mates make up the pairs; of these:
        27256949 (68.80%) aligned 0 times
        8927823 (22.53%) aligned exactly 1 time
        3434494 (8.67%) aligned >1 times
72.05% overall alignment rate

real	44m49.082s
product: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
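The overall alignment rate in the summary above can be reproduced from the other counts. The sketch below assumes the rate is the number of aligned mates divided by the total number of mates (two per pair); the figures in the output are consistent with that reading. The variable names are chosen for illustration only.

```python
# Counts copied from the bowtie2 summary above.
total_pairs = 48755614
concordant_once = 14767863
concordant_multi = 9822830
discordant_once = 4355288
mates_once = 8927823
mates_multi = 3434494

# Each aligned pair contributes two mates; the remaining mates are counted singly.
aligned_pairs = concordant_once + concordant_multi + discordant_once
aligned_mates = 2 * aligned_pairs + mates_once + mates_multi

overall_rate = aligned_mates / (2 * total_pairs)
print(f"{overall_rate:.2%}")  # 72.05%
```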

create bowtie2 index for reference genome

Usage: bowtie2-build [options]* <reference_in> <bt2_index_base>
bowtie2-build does not support reading from standard input, so a compressed reference file has to be decompressed first.
product: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
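The decompress-then-build step could be scripted as follows. This is a sketch under assumptions: the function names are hypothetical, the reference is assumed to be gzip-compressed, and only the two positional arguments from the usage line are passed to bowtie2-build.

```python
import gzip
import shutil
from pathlib import Path


def decompress_reference(gz_path, out_path):
    """Write the decompressed contents of gz_path to out_path and return it."""
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return Path(out_path)


def build_index_command(fasta_path, index_base):
    """Return the bowtie2-build argument list: reference input, then index base name."""
    return ["bowtie2-build", str(fasta_path), str(index_base)]
```

The returned list can be handed to subprocess.run(..., check=True) to start the build. The index base name given here is the value later passed to bowtie2 with -x.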