How much memory is required to store the human genome?

  1. The genome is the complete set of DNA, which is encoded by a sequence of four letters, A, T, G and C.
  2. To encode a four-letter alphabet, you need 2 bits per letter.
  3. The genome consists of ~3 billion letters.
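  4. 2 bits times 3 billion letters = ~771 megabytes of data (2 bits times 3,234,830,000 = 6,469,660,000 bits = 808,707,500 bytes = 771.24357 MB, counting 1 MB as 1,048,576 bytes; in decimal units, where 1 MB is 1,000,000 bytes, that is about 809 MB).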
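
The arithmetic in step 4 can be checked with a few lines of Python. This is only a sketch: the letter count is the figure quoted above, and the function name `genome_size_bytes` is made up for illustration.

```python
# Figures taken from the list above; the helper name is hypothetical.
GENOME_LETTERS = 3_234_830_000
BITS_PER_LETTER = 2


def genome_size_bytes(letters: int, bits_per_letter: int) -> int:
    """Return the number of bytes needed, rounding up to whole bytes."""
    total_bits = letters * bits_per_letter
    return (total_bits + 7) // 8


size_bytes = genome_size_bytes(GENOME_LETTERS, BITS_PER_LETTER)
size_mb_decimal = size_bytes / 10**6   # 1 MB  = 1,000,000 bytes
size_mb_binary = size_bytes / 2**20    # 1 MiB = 1,048,576 bytes

print(f"{size_bytes:,} bytes")
print(f"{size_mb_decimal:.2f} MB (decimal)")
print(f"{size_mb_binary:.2f} MB (binary)")
```

The two megabyte figures differ only in the definition of "mega": the ~771 MB quoted above uses the binary definition.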

Why 2 bits?

Number of bits    Distinct patterns
1                 0, 1
2                 00, 01, 10, 11
3                 000, 001, 010, 011, 100, 101, 110, 111
4                 0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111
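In general, each additional bit doubles the number of patterns (n bits -> 2^n patterns), so 2 bits are the minimum needed to distinguish four letters. Note that sequence data is usually stored as ASCII text. ASCII is a 7-bit encoding, but in practice each character occupies a full 8-bit byte, so a plain-text genome file is roughly four times larger than the 2-bit figure.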
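
To make the difference between the two encodings concrete, here is a minimal sketch of packing four letters into each byte. The mapping of letters to bit patterns (`CODES`) and the function names `pack` and `unpack` are assumptions chosen for illustration; any assignment of the four 2-bit patterns would work equally well.

```python
# Hypothetical mapping of letters to 2-bit patterns.
CODES = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
LETTERS = "ACGT"  # index i holds the letter whose code is i


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, four letters per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        shift = 6 - 2 * (i % 4)  # first letter goes in the top two bits
        out[i // 4] |= CODES[base] << shift
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the original string; length is needed because of padding."""
    return "".join(
        LETTERS[(data[i // 4] >> (6 - 2 * (i % 4))) & 0b11]
        for i in range(length)
    )


seq = "GATTACA"
packed = pack(seq)
ascii_size = len(seq.encode("ascii"))

print(ascii_size, "bytes as ASCII,", len(packed), "bytes packed")
```

Note that the packed form has to store the sequence length separately, since the last byte may contain unused padding bits.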