fasta format

fasta, or fasta.gz when gzip-compressed, is a generic text format for storing sequence data. A fasta file contains at least one record, and each record consists of at least two lines:
  1. ID, starts with >
  2. Sequence, typically wrapped over multiple lines at a fixed maximum line width.
>gi|30212|emb|X56692.1| H.sapiens mRNA for C-reactive protein
GGACTTCTAGCCCCTGAACTTTCAGCCGAATACATCTTTTCCAAAGGAGTGAATTCAGGCCCTTGT
CTGGCAGCAGGACGTGACC
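
The record layout above can be read with a few lines of Python. This is a minimal sketch, not a reference parser: the function name `read_fasta` is hypothetical, and it assumes that every line starting with > opens a new record and that all following lines up to the next > belong to that record's sequence.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (id_line, sequence) pairs from the lines of a fasta file."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new ID line closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Wrapped sequence lines are joined back into one string.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


example = """\
>gi|30212|emb|X56692.1| H.sapiens mRNA for C-reactive protein
GGACTTCTAGCCCCTGAACTTTCAGCCGAATACATCTTTTCCAAAGGAGTGAATTCAGGCCCTTGT
CTGGCAGCAGGACGTGACC
"""
records = list(read_fasta(example.splitlines()))
```

Because the function only needs an iterable of text lines, a fasta.gz file can be passed in as `gzip.open(path, "rt")` from the standard library, and an uncompressed file as `open(path)`.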

References:
  1. Galaxy Wiki

sam/bam format

sam, or bam (its compressed binary version), is used to store NGS alignment data. Each alignment line consists of 11 mandatory tab-separated fields:

QNAME   the sequence/read name
FLAG    bitwise flag
RNAME   the reference sequence name
POS     the position in the reference sequence (1-based)
MAPQ    the mapping quality
CIGAR   CIGAR string
RNEXT   reference name of the mate/next read
PNEXT   position of the mate/next read
TLEN    the template length for paired-end reads
SEQ     the read sequence
QUAL    the base call qualities of the read sequence (same as in FASTQ format)
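
To illustrate the fields listed above, the following sketch splits one sam alignment line into a dictionary. The helper `parse_sam_line` and the alignment line itself are made up for illustration. The sketch assumes the fields are tab-separated and appear in the order listed, and it uses two FLAG bits from the SAM specification: 0x1 (read is paired) and 0x10 (read maps to the reverse strand).

```python
from typing import Dict, List, Union

SAM_FIELDS = ("QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL")
INT_FIELDS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}


def parse_sam_line(line: str) -> Dict[str, Union[str, int, List[str]]]:
    """Split one alignment line into its named fields."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) < len(SAM_FIELDS):
        raise ValueError("expected at least 11 tab-separated fields")
    record: Dict[str, Union[str, int, List[str]]] = {}
    for name, value in zip(SAM_FIELDS, columns):
        record[name] = int(value) if name in INT_FIELDS else value
    # Anything after the 11th column is an optional tag such as NM:i:0.
    record["TAGS"] = columns[len(SAM_FIELDS):]
    return record


# Made-up alignment line, for illustration only.
line = ("read1\t99\tchr1\t100\t60\t30M\t=\t250\t180\t"
        "TGGCACCACACCCCTCTAAGACGCAGCAAT\tBBBFFFFFFFFFFIIIIIIIIIIIIIIIII")
rec = parse_sam_line(line)

is_paired = bool(rec["FLAG"] & 0x1)    # bit 0x1: read is paired
is_reverse = bool(rec["FLAG"] & 0x10)  # bit 0x10: read on reverse strand
```

Header lines, which start with @ in a sam file, are not alignment lines and would need to be skipped before calling this helper. bam files are binary and cannot be read this way; a dedicated library is the usual route.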

References:
  1. SAMv1 specs
  2. Galaxy Wiki

fastq format

fastq, or fastq.gz when gzip-compressed, is used to store NGS data. A fastq file contains at least one record, and each record consists of four lines:
  1. ID, starts with @
  2. sequence
  3. Separator marking the end of the sequence, starts with +
  4. Sequencing quality information. One ASCII encoded quality score per base.
A record’s sequence is called a read.
@ERR315326.7031172/1
TGGCACCACACCCCTCTAAGACGCAGCAAT
+
BBBFFFFFFFFFFIIIIIIIIIIIIIIIII
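
The four-line record layout can be read as follows. This is a minimal sketch with a hypothetical function name, `read_fastq`; it assumes that sequence and quality are never wrapped over several lines, so every record occupies exactly four lines.

```python
from typing import Iterable, Iterator, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) for each four-line record."""
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # tolerate blank lines between records
        try:
            sequence = next(it).strip()
            separator = next(it).strip()
            quality = next(it).strip()
        except StopIteration:
            raise ValueError("truncated fastq record") from None
        # Line 1 must start with @ and line 3 with +.
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed fastq record near {header!r}")
        yield header[1:], sequence, quality


example = """\
@ERR315326.7031172/1
TGGCACCACACCCCTCTAAGACGCAGCAAT
+
BBBFFFFFFFFFFIIIIIIIIIIIIIIIII
"""
records = list(read_fastq(example.splitlines()))
read_id, sequence, quality = records[0]
```

A stricter reader could additionally check that the sequence and quality strings have the same length, since there is one quality character per base.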
Quality scores can be represented using three different encodings, each of which uses a different range of ASCII characters:

Name                      ASCII character range
Sanger, Illumina >= v1.8  33-126
Solexa, Illumina < v1.3   59-126
Illumina v1.3 - v1.7      64-126
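
Decoding a quality string means subtracting an offset from each character's ASCII code. The sketch below is illustrative only: the offsets 33 (Sanger, Illumina >= v1.8) and 64 (Illumina v1.3 - v1.7) are not stated in the table above and are filled in here as an assumption based on the lowest character of each range, and the function names are hypothetical. Solexa scores also use offset 64 but follow a different score definition, so they are left out.

```python
from typing import List

PHRED33 = 33  # assumed offset for Sanger, Illumina >= v1.8
PHRED64 = 64  # assumed offset for Illumina v1.3 - v1.7


def decode_quality(quality: str, offset: int = PHRED33) -> List[int]:
    """Convert an ASCII quality string into a list of integer scores."""
    scores = [ord(char) - offset for char in quality]
    if any(score < 0 for score in scores):
        raise ValueError("character below the chosen offset; wrong encoding?")
    return scores


def error_probability(score: int) -> float:
    """Phred definition: a score Q corresponds to an error rate of 10^(-Q/10)."""
    return 10 ** (-score / 10)


sanger_scores = decode_quality("BBBFFI")             # offset 33
illumina_scores = decode_quality("hhh", PHRED64)     # offset 64
```

With offset 33, the characters B, F and I decode to 33, 37 and 40. A quality string containing characters below ASCII 59 can only be in the first encoding, which gives a simple way to rule out the other two.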

References:
  1. Wikipedia
  2. Bioinformatics Data Skills
  3. Galaxy Wiki