DNA
DNA is used for biological information storage. The DNA backbone is sturdy and reliable, making DNA an excellent medium for storing information safely.
Ploidy is the number of complete sets of chromosomes in a cell. In humans, somatic cells are diploid (two homologous copies of each chromosome) while gametes (reproductive cells) are haploid (one copy of each chromosome). Some organisms carry many sets of chromosomes, such as octoploid strawberries.
Nuclear DNA
Mitochondrial DNA
Mitochondria are the power factories of cells. Some cells can contain over 200 mitochondria, and each mitochondrion contains its own DNA. This mitochondrial DNA, or mtDNA, is inherited maternally. Nuclear DNA and mtDNA are generally considered to have separate evolutionary origins: mitochondria were probably once free-living bacteria that were engulfed by an ancestor of eukaryotic cells. There is no gene recombination in mtDNA, so the same mtDNA is passed from mother to progeny.
In genomics we often work heavily with the mitochondrial genome. Mitochondrial DNA has a faster mutation rate than nuclear DNA, is short, and is easy to amplify and sequence, which makes it a very important genome for our lab work.
The human mitochondrial genome was first published in 1981 by Anderson et al.:
Anderson S., A.T. Bankier, B. G. Barrell, M. H. L. de Bruijn, A. R. Coulson, J. Drouin, I. C. Eperon, D. P. Nierlich, B. A. Roe, F. Sanger, P. H. Schreier, A. J. H. Smith, R. Staden, and I. G. Young. 1981. Sequence and organization of the human mitochondrial genome. Nature 290:457–465.
The one described here is the one GenBank uses as an example (NC_001807). The mitochondrial genome is circular, nonrecombining, and maternally inherited.
It comes from an African (Yoruba) individual and was published in:
Ingman, M., H. Kaessmann, S. Paabo, and U. Gyllensten. 2000. Mitochondrial genome variation and the origin of modern humans. Nature 408(6813):708-713.
Size: 16571 Nucleotides
Content: 37 genes
13 protein coding genes: Range in size 207 to 1812 nucleotides
2 rRNA genes: Sizes are 954 and 1558 nucleotides
22 tRNA genes: Range in size is 59 to 75 nucleotides, average size is 69 nucleotides
Two replication origins: OH and OL for the heavy and light strands
There is one tRNA gene for each of 18 of the 20 amino acids, and two tRNA genes each for leucine and serine.
The two strands differ in guanine (G) content, a property called strand bias. The light strand is G deficient, typically containing 11-14% G among vertebrates; the complementary strand is termed the heavy strand. This pattern does not hold outside of vertebrates.
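The strand bias described above can be measured from a single strand, because every C on one strand pairs with a G on the other. The following is a minimal sketch; the function names and the toy sequence are illustrative assumptions, not real mtDNA data.

```python
def base_fraction(seq: str, base: str) -> float:
    """Fraction of positions in seq that equal base (case-insensitive)."""
    seq = seq.upper()
    return seq.count(base.upper()) / len(seq) if seq else 0.0


def strand_g_content(strand: str) -> tuple[float, float]:
    """Return (G fraction of this strand, G fraction of its complement).

    The complement's G fraction equals this strand's C fraction,
    since each C here pairs with a G on the other strand.
    """
    return base_fraction(strand, "G"), base_fraction(strand, "C")


# Made-up 20-base fragment used only for illustration
toy = "CCCACCTAACCCATACCCGT"
g_here, g_complement = strand_g_content(toy)
print(f"G on this strand: {g_here:.0%}, G on the complement: {g_complement:.0%}")
```

In this toy fragment the given strand is the G-poor one, so it plays the role of the light strand.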
Most genes are encoded on the heavy strand. Of the protein coding genes only ND6 is light strand encoded. Both rRNA genes are heavy strand transcribed. Of the 22 tRNA genes only 8 are light strand transcribed.
The two DNA strands of the mitochondrial genome are transcribed into RNA individually, each strand as a single transcript that is subsequently cleaved. Genes encoded or transcribed on the same strand cannot overlap, with the following exception.
Among protein coding regions, the two ATPase (ATP6 and ATP8) genes are bicistronically encoded, as well as the ND4 and ND4L subunits of NADH dehydrogenase. These genes when transcribed into RNA from the same strand do have overlapping regions.
This results in a unique situation for the stop codons of protein-coding genes among vertebrates. Typical full stop codons are “TAA”, “AGA” and “AGG”. When tRNA genes are transcribed on the opposite strand from other tRNA genes or protein-coding genes, they often overlap. When tRNA genes are transcribed on the same strand as a protein-coding gene, they cannot overlap, and tRNA genes often punctuate the “stop” of a protein transcript following cleavage, as first described by Ojala et al. in 1981:
Ojala D, J. Montoya, and G. Attardi. 1981. tRNA punctuation model of RNA processing in human mitochondria. Nature 290:470–474.
RNAs are in general polyadenylated following cleavage, receiving a string of “A” bases at the end. When the primary whole-strand mitochondrial transcript is processed into smaller RNAs, some protein-coding transcripts end in an incomplete codon. These endings are termed partial stop codons, and polyadenylation completes them. Hence:
T = TAA
TA = TAA
AG = AGA
If a protein-coding gene ends in a “T” and the adjacent tRNA transcribed on the same strand starts with “AA”, the gene does not have a full “TAA” stop within its protein-coding segment: the “AA” belongs to the tRNA. I outline this in detail because it is the number one problem in gene annotation (gene boundaries) among mitochondrial files in GenBank.
In addition, it has been noted, at least among vertebrates (Boore et al., 2005), that mitochondrial protein-coding genes can start with either “NTG” or “ATN” codons (N = G, A, T, or C), which are probably post-transcriptionally modified or read as fMet. This gives six alternative start codons in addition to the typical “ATG” start.
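The partial stop codon rule can be sketched in a few lines. This is a minimal illustration, assuming the coding sequence is given as a plain string in reading frame; the function names are hypothetical helpers and not part of any annotation tool.

```python
def partial_stop(cds: str) -> str:
    """Return the 1-2 bases left over after the last full codon, or '' if none."""
    extra = len(cds) % 3
    return cds[-extra:].upper() if extra else ""


def complete_stop(tail: str) -> str:
    """Pad a partial stop codon with A's, mimicking polyadenylation."""
    tail = tail.upper()
    if not 1 <= len(tail) <= 3:
        raise ValueError("a stop codon fragment has 1 to 3 bases")
    return tail + "A" * (3 - len(tail))


def mature_stop_codon(cds: str) -> str:
    """Stop codon as it would read in the processed, polyadenylated transcript."""
    tail = partial_stop(cds)
    return complete_stop(tail) if tail else cds[-3:].upper()


# Toy coding sequences (made up): one ends in a lone T, one in a full TAA
print(mature_stop_codon("ATGCCCT"))    # lone T completed by the poly-A tail
print(mature_stop_codon("ATGCCCTAA"))  # full stop already encoded
```

A quick check on gene boundaries follows from this: a protein-coding gene whose annotated length is not a multiple of three (for example the 1042-nucleotide ND2 entry in the table) is consistent with a partial stop codon rather than a stop borrowed from the neighbouring tRNA.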
Boore, J. L., J. R. Macey, and M. Medina. 2005. Whole mitochondrial genome sequencing and gene order comparisons of animals. In Molecular Evolution: Producing the Biochemical Data, Part B, E. A. Zimmer and E. Roalson (eds.), Methods in Enzymology 395:311-348.
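The count of six alternative start codons can be checked by expanding the “NTG” and “ATN” patterns. The snippet below is a small sketch; the constant and function names are illustrative assumptions.

```python
BASES = "ACGT"

# Expand the two degenerate patterns, where N stands for any base
NTG = {n + "TG" for n in BASES}
ATN = {"AT" + n for n in BASES}

START_CODONS = NTG | ATN                       # ATG appears in both patterns
ALTERNATIVE_STARTS = sorted(START_CODONS - {"ATG"})


def has_plausible_start(cds: str) -> bool:
    """True if the first codon matches NTG or ATN."""
    return cds[:3].upper() in START_CODONS


print(ALTERNATIVE_STARTS)
```

The two patterns each expand to four codons, but they share “ATG”, which leaves seven distinct codons and therefore six alternatives.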
| Gene/Replication Origin | Type | Size (nucleotides) | Strand Encoded | Notes |
| --- | --- | --- | --- | --- |
| Heavy strand replication origin (OH) | replication origin | 579 | noncoding | |
| tRNA-Phe | tRNA | 71 | heavy strand | |
| 12S rRNA | rRNA | 954 | heavy strand | small subunit ribosomal RNA |
| tRNA-Val | tRNA | 69 | heavy strand | |
| 16S rRNA | rRNA | 1558 | heavy strand | large subunit ribosomal RNA |
| tRNA-Leu | tRNA | 75 | heavy strand | codons recognized: UUR |
| ND1 | protein | 957 | heavy strand | NADH dehydrogenase subunit 1 |
| tRNA-Ile | tRNA | 69 | heavy strand | |
| tRNA-Gln | tRNA | 72 | light strand | |
| tRNA-Met | tRNA | 68 | heavy strand | |
| ND2 | protein | 1042 | heavy strand | NADH dehydrogenase subunit 2 |
| tRNA-Trp | tRNA | 68 | heavy strand | |
| tRNA-Ala | tRNA | 69 | light strand | |
| tRNA-Asn | tRNA | 73 | light strand | |
| Light strand replication origin (OL) | replication origin | 34 | noncoding | |
| tRNA-Cys | tRNA | 66 | light strand | |
| tRNA-Tyr | tRNA | 66 | light strand | |
| COI | protein | 1542 | heavy strand | cytochrome c oxidase subunit I |
| tRNA-Ser | tRNA | 72 | light strand | codons recognized: UCN |
| tRNA-Asp | tRNA | 68 | heavy strand | |
| COII | protein | 684 | heavy strand | cytochrome c oxidase subunit II |
| tRNA-Lys | tRNA | 70 | heavy strand | |
| ATP8 | protein | 207 | heavy strand | ATP synthase F0 subunit 8 |
| ATP6 | protein | 681 | heavy strand | ATP synthase F0 subunit 6 |
| COIII | protein | 781 | heavy strand | cytochrome c oxidase subunit III |
| tRNA-Gly | tRNA | 68 | heavy strand | |
| ND3 | protein | 346 | heavy strand | NADH dehydrogenase subunit 3 |
| tRNA-Arg | tRNA | 65 | heavy strand | |
| ND4L | protein | 297 | heavy strand | NADH dehydrogenase subunit 4L |
| ND4 | protein | 1378 | heavy strand | NADH dehydrogenase subunit 4 |
| tRNA-His | tRNA | 69 | heavy strand | |
| tRNA-Ser | tRNA | 59 | heavy strand | codons recognized: AGY |
| tRNA-Leu | tRNA | 71 | heavy strand | codons recognized: CUN |
| ND5 | protein | 1812 | heavy strand | NADH dehydrogenase subunit 5 |
| ND6 | protein | 525 | light strand | NADH dehydrogenase subunit 6 |
| tRNA-Glu | tRNA | 69 | light strand | |
| Cytb | protein | 1135 | heavy strand | cytochrome b |
| tRNA-Thr | tRNA | 66 | heavy strand | |
| tRNA-Pro | tRNA | 69 | light strand | |
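The tRNA figures quoted in the summary (22 genes, 59 to 75 nucleotides, average 69, 8 on the light strand) can be recomputed from the table. The sketch below copies the tRNA rows by hand; the variable names are illustrative, and the codon labels in parentheses are only there to tell the duplicate Leu and Ser genes apart.

```python
from statistics import mean

# (name, size in nucleotides, strand) copied from the tRNA rows of the table
TRNA_GENES = [
    ("tRNA-Phe", 71, "heavy"), ("tRNA-Val", 69, "heavy"),
    ("tRNA-Leu (UUR)", 75, "heavy"), ("tRNA-Ile", 69, "heavy"),
    ("tRNA-Gln", 72, "light"), ("tRNA-Met", 68, "heavy"),
    ("tRNA-Trp", 68, "heavy"), ("tRNA-Ala", 69, "light"),
    ("tRNA-Asn", 73, "light"), ("tRNA-Cys", 66, "light"),
    ("tRNA-Tyr", 66, "light"), ("tRNA-Ser (UCN)", 72, "light"),
    ("tRNA-Asp", 68, "heavy"), ("tRNA-Lys", 70, "heavy"),
    ("tRNA-Gly", 68, "heavy"), ("tRNA-Arg", 65, "heavy"),
    ("tRNA-His", 69, "heavy"), ("tRNA-Ser (AGY)", 59, "heavy"),
    ("tRNA-Leu (CUN)", 71, "heavy"), ("tRNA-Glu", 69, "light"),
    ("tRNA-Thr", 66, "heavy"), ("tRNA-Pro", 69, "light"),
]

sizes = [size for _, size, _ in TRNA_GENES]
summary = {
    "count": len(sizes),
    "min": min(sizes),
    "max": max(sizes),
    "mean": round(mean(sizes)),  # rounded to the nearest nucleotide
    "light_strand": sum(1 for _, _, strand in TRNA_GENES if strand == "light"),
}
print(summary)
```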
Genes
RNA
The difference between DNA and RNA is
biologically active RNA
Non-coding RNA
Non-coding RNAs, or ncRNAs
mRNA
Messenger RNA
tRNA
Transfer RNA
tmRNA
Transfer-messenger RNA
rRNA
Ribosomal RNA
snRNA
Additional ncRNA
Short bacterial ncRNA, or sRNA
A DNA strand
DNA strands are made of a sugar-phosphate backbone. The DNA strand is ordered in a 5' → 3' (five prime to three prime) direction, with the number derived from which carbon on the sugar is connected to the chain. The sugar is then connected to a base.
The base sequence is the unique identification we use in genomics. The base can be A, C, G or T in DNA. In RNA, the T is replaced by U.
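As a small illustration of the two alphabets, the sketch below validates a DNA string and rewrites it in the RNA alphabet. The function name is an assumption made for this example.

```python
def dna_to_rna(seq: str) -> str:
    """Rewrite a DNA sequence in the RNA alphabet by replacing T with U."""
    seq = seq.upper()
    invalid = set(seq) - set("ACGT")
    if invalid:
        raise ValueError(f"not DNA bases: {sorted(invalid)}")
    return seq.replace("T", "U")


print(dna_to_rna("GATTACA"))
```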
The DNA double helix
The two strands of a double helix are antiparallel: one strand runs in the 5' → 3' direction, and its complementary strand is flipped around and runs the opposite way. Together with the chirality, or handedness, of the backbone, this arrangement gives DNA its double helical structure.
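Because the complementary strand runs in the opposite direction, reading it 5' → 3' means complementing each base and then reversing the order. A minimal sketch, with an illustrative function name:

```python
# Translation table mapping each DNA base to its pairing partner
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written in its own 5' -> 3' direction."""
    return seq.upper().translate(COMPLEMENT)[::-1]


print(reverse_complement("GATTACA"))
```

Applying the function twice returns the original sequence, which is a handy sanity check.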
Hydrogen bonding
When a hydrogen atom is connected to an electronegative atom (O, N, F), the electronegative atom will not share the electrons evenly. This leaves hydrogen as an exposed proton, creating a δ+ on the hydrogen and a δ- on the electronegative element. This dipole allows hydrogen bonding as the attractive force between δ+ and δ-.
Hydrogen bonding is strong, but it is not as strong as a covalent bond. In a covalent bond electrons are shared between two atoms. In a hydrogen bond there is only attraction between positive and negative molecular dipoles.
In DNA, hydrogen bonding is used to ensure proper pairing. A, T and U only have two hydrogen bonding sites, and can only pair with each other as A-T (in DNA) or A-U (RNA). C and G have three hydrogen bonding sites so can only make pairs as C-G.
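Using the counts given above (two hydrogen bonds for A-T or A-U pairs, three for C-G pairs), the total number of hydrogen bonds holding a fully paired duplex together can be tallied from one strand. This is a rough sketch with hypothetical names; it assumes every base is paired with its standard partner.

```python
# Hydrogen bonds formed by each base with its standard partner
H_BONDS = {"A": 2, "T": 2, "U": 2, "C": 3, "G": 3}


def duplex_hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds in a perfectly paired duplex, given one strand."""
    return sum(H_BONDS[base] for base in strand.upper())


print(duplex_hydrogen_bonds("GATTACA"))
```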
Purines and Pyrimidines
Purines
Adenine, Guanine
Pyrimidines
Cytosine, Thymine, Uracil
Pairing
Purines pair with pyrimidines: G pairs with C, and A pairs with T (in DNA) or U (in RNA). DNA uses ACGT, RNA uses ACGU.
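The pairing rule can be written down as a lookup and checked against the purine and pyrimidine classes. The names below are illustrative, and the sketch covers only the standard pairs described here.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}

# Standard pairs: A-T (DNA), A-U (RNA), and C-G (both)
STANDARD_PAIRS = {frozenset("AT"), frozenset("AU"), frozenset("CG")}


def is_standard_pair(x: str, y: str) -> bool:
    """True if bases x and y form one of the standard pairs."""
    return frozenset((x.upper(), y.upper())) in STANDARD_PAIRS


# Every standard pair joins exactly one purine with one pyrimidine
for pair in STANDARD_PAIRS:
    assert len(pair & PURINES) == 1 and len(pair & PYRIMIDINES) == 1

print(is_standard_pair("A", "T"), is_standard_pair("A", "G"))
```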