Introduction to Genomics

This is a high-level overview of what we do in a cutting-edge Genomics lab. It is the story of how we go from tissue sample to sequenced genome.

The Genomics lab contains a wide range of instruments. This walkthrough was created to help you become familiar with the lab's capabilities.

So how do we go from living organism to a DNA sequence we can actually read? DNA is found inside the nucleus of certain cells. Genomics is the study of an organism's complete set of DNA (its genome), but it is not so simple to get to that point: removing the DNA from the nucleus will not yield perfect, intact strands. Instead the DNA is cut up, so we wind up with fragments. These must be sorted and read individually, then reassembled using computer software.

A: DNA Extraction

The first step is to collect tissue samples. Zoological specimens must be preserved properly, which means freezing the tissue in liquid nitrogen as soon as it is collected. Freezing tissue very quickly rather than slowly makes the ice form smaller crystals, which do less damage to the cells.

The tissues must be stored in a -80°C freezer, of which we have 11. Such a low temperature is required to prevent enzymes from breaking down the DNA, as enzymes only function within certain temperature ranges.


If you need to efficiently extract high molecular weight DNA, RNA or protein from fresh tissue samples, with high recovery and high yield, we have a Covaris CryoPrep™.

This device pulverizes snap-frozen samples held in aerospace-designed bags to recover high molecular weight DNA and other biological products. Samples that are traditionally difficult to extract by other methods but have been successfully extracted with the CryoPrep™ include plant material with high cellulose content, including seeds. CryoPrep™ cryogenic tissue pulverization also works on samples with high pigment content. Other application areas include dry pulverization and hard tissues, as well as tumor, bone, cartilage, cardiac and skin samples.

B: Duplication and Storage
PCR


In the lab we can amplify (make many copies of) specific genes using PCR.

PCR consists of three steps: the breaking apart of DNA double helices (denaturation), the binding of primers to specific parts of a DNA strand (annealing), and the adding of nucleotides to the primer by polymerase (extension). Together these double the DNA at a specific location (in our case, a specific gene) in one thermal cycle. By repeating the thermal cycle, you can amplify specific genes or segments of DNA, creating copies exponentially.
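The doubling per cycle can be put into numbers. The sketch below assumes idealized conditions in which every template is copied in every cycle. The function name and the optional efficiency parameter are illustrative assumptions, not part of any lab protocol.

```python
def pcr_copies(start_copies, cycles, efficiency=1.0):
    """Estimate the number of target copies after some thermal cycles.

    efficiency is the fraction of templates copied in each cycle
    (1.0 means perfect doubling, which is an idealized assumption).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return start_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # Show how quickly the copy number grows from a single molecule.
    for n in (1, 10, 20, 30):
        print(n, "cycles:", f"{pcr_copies(1, n):,.0f}", "copies")
```

Starting from a single molecule, 30 ideal cycles give 2^30, a little over one billion copies. Real reactions fall short of perfect doubling, which is what the efficiency parameter is meant to approximate.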

Cloning and Colony Picking

We can use bacteria to store and copy DNA. Bacteria routinely exchange fragments of DNA through horizontal gene transfer, for example by passing DNA to each other (conjugation) or by taking up DNA from their surroundings (transformation). Cloning takes advantage of this ability.

Bacteria are great for sorting and copying our DNA. They contain plasmids, small circles of DNA that are copied along with the cell. By using an engineered plasmid called pUC18, we can take advantage of two important traits: the lacZ gene, which produces a blue color in the presence of X-gal (a sugar analog of lactose), and an ampicillin resistance gene. The blunt-ended DNA fragments can be inserted into the lacZ gene, disrupting the blue color.

To get bacteria to take up plasmids more efficiently, we can use an electroporator. It uses an electric pulse to create tiny holes in the bacteria for the plasmids to enter. Bacteria which take in the DNA are then referred to as "transformed." Only one in 1000 E. coli incorporate our DNA.

By using an agar plate with X-gal and ampicillin, we can see which colonies carry inserted DNA. The ampicillin will kill any bacteria which did not take up a plasmid. Colonies with an intact lacZ gene will turn blue, indicating they obtained a pUC18 plasmid that does not contain our DNA fragment. We can then use a robotic colony picker to create new plates from the white colonies, as white colonies represent bacteria containing our fragments.
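The blue/white selection logic described above can be written as a small decision function. This is only a sketch of the reasoning, with hypothetical function names and made-up cell labels. It is not the software that drives the colony picker.

```python
def colony_outcome(has_plasmid, has_insert):
    """Predict what one cell does on an agar plate with X-gal and ampicillin."""
    if not has_plasmid:
        return "no colony"     # no ampicillin resistance gene, so the cell dies
    if has_insert:
        return "white colony"  # the insert disrupts lacZ, so no blue color
    return "blue colony"       # intact lacZ: plasmid without our fragment


def colonies_to_pick(cells):
    """Return the labels of the cells whose colonies should be transferred.

    cells is a list of (label, has_plasmid, has_insert) tuples.
    """
    return [label for label, has_plasmid, has_insert in cells
            if colony_outcome(has_plasmid, has_insert) == "white colony"]


if __name__ == "__main__":
    # Hypothetical cells, for illustration only.
    plate = [("a", False, False), ("b", True, False), ("c", True, True)]
    print(colonies_to_pick(plate))
```

Only cells that both took up a plasmid and carry an insert end up as white colonies, which is why white is the signal the picker looks for.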

After letting the bacteria multiply for a day, we transfer them into a new 384-well plate and fill each well with TE buffer. The TE buffer protects the DNA while we heat the plate to 95°C, which bursts the cells and releases the plasmids.

The plasmids can then be amplified using rolling circle amplification. TempliPhi (an enzyme, free nucleotides and hexamers) is added to the wells.

Storage (BAC/Genomic Libraries)
C: Quantity and Quality of DNA
Bioanalyzing

Quantification, sizing and quality control of DNA

D: Shearing and Size Selection
Shearing

The Covaris LE220R™ and our two Hydroshear™ instruments allow us to cut DNA into random fragments. These random fragments will overlap each other. When sequenced, the overlaps can be used for alignment, allowing us to reassemble the many small DNA segments into one long sequence.
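To illustrate how overlaps allow reassembly, here is a toy greedy merger. It assumes error-free fragments that all come from the same strand, and it ignores repeats, so it is far simpler than real assembly software. The function names and the example fragments are invented for illustration.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(fragments, min_overlap=3):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    pieces = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(pieces) > 1:
        best = (0, None, None)
        for i, left in enumerate(pieces):
            for j, right in enumerate(pieces):
                if i != j:
                    size = overlap_length(left, right, min_overlap)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            break  # nothing overlaps any more; leave the pieces separate
        merged = pieces[i] + pieces[j][size:]
        pieces = [p for k, p in enumerate(pieces) if k not in (i, j)] + [merged]
    return pieces


if __name__ == "__main__":
    # Three made-up overlapping fragments of one short sequence.
    print(greedy_assemble(["ATGGCGTA", "GCGTACGTT", "ACGTTAGC"]))
```

The minimum overlap guards against joining fragments that share only a base or two by chance. Real genomes need much longer overlaps and more careful handling than this sketch provides.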

Covaris AFA works on the principle of cavitation: it creates microscopic bubbles in the sample. When the bubbles collapse, water rushes in and the resulting force breaks the DNA. This is the same idea as the depth charges used to sink submarines: water rushing in from a collapsing bubble hits the hull and causes the damage.

Hydroshears work by passing DNA samples through a tube, where they are forced through a small aperture in a ruby. This stretches the DNA until it breaks, cutting it into fragments of consistent size. The ends of the fragments will have overhangs and must be repaired with enzymes before the fragments can be ligated.

Gel Electrophoresis
Size Selection

Faster DNA size selection; NGS Library Construction with the Sage Science Pippin Prep™.

Following DNA shearing, we offer advanced fragment size selection using the Sage Science Pippin Prep™. Individual sample channels eliminate sample cross-contamination, and reproducible extractions provide more consistent results. The Pippin Prep™ collects targets between 50bp and 1.5kb, providing narrow fragment size distributions for paired-end sequencing. Minimal low molecular weight contamination reduces wasted reads and ambiguous indel calling. Together, these properties improve short-read DNA assemblies.
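In software terms, size selection is a filter on fragment length. The sketch below shows the idea using made-up fragment lengths and a hypothetical 400 to 500 bp target window. It models the concept only, not how the Pippin Prep™ itself works.

```python
def select_by_size(fragment_lengths_bp, min_bp, max_bp):
    """Keep only the fragment lengths (in base pairs) inside the target window."""
    if min_bp > max_bp:
        raise ValueError("min_bp must not exceed max_bp")
    return [n for n in fragment_lengths_bp if min_bp <= n <= max_bp]


if __name__ == "__main__":
    # Made-up lengths of sheared fragments, in base pairs.
    sheared = [80, 320, 410, 450, 495, 530, 900, 2400]
    print(select_by_size(sheared, 400, 500))
```

A narrow window like this means every selected fragment has nearly the same length, which is what makes the distance between paired-end reads predictable.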

More information: http://www.sagescience.com/products/pippin-prep/

E: DNA Sequencing

Sequencing is any of several methods used to read the order of the A, C, G and T bases along a strand of DNA. The DNA is sequenced in tiny pieces (50bp-10kb) which are then reassembled on a computer, providing a picture of the entire genome sequence.

First Generation

We use an ABI 3730xl, a type of Sanger sequencer.

Sanger Sequencing, or capillary sequencing, is a way of reading the nucleotides by using chain termination.

dNTPs (deoxyribonucleotide triphosphates) are the free A, C, G and T nucleotides: dATP, dCTP, dGTP and dTTP. In chain termination we use dNTPs together with ddNTPs (dideoxyribonucleotide triphosphates), which are free A, C, G and T nucleotides that lack the hydroxyl group on the third carbon of the sugar. The polymerase enzyme will incorporate a ddNTP like a normal dNTP, but without that hydroxyl group no further nucleotide can be attached, so the strand stops growing at that location.

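In the classic method, the dNTPs are added to four separate containers with a radioactively labeled primer, and each container receives only one type of ddNTP, so every fragment in a given container ends in a known nucleotide. Capillary sequencers instead label each of the four ddNTPs with a different fluorescent dye. This allows us to see only one color at the end of each fragment and infer the nucleotide from it.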

The ddNTPs will be added in random places along each strand, and by stopping the polymerase action randomly we can run the various DNA fragments out on gels and visualize the chain-terminations to determine the DNA sequence.
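The read-out step can be sketched in a few lines. The example assumes an idealized reaction in which termination happens once at every position of the newly synthesized strand, so there is exactly one fragment per length. The function names are hypothetical.

```python
def terminated_fragments(new_strand):
    """List each fragment chain termination could produce, as (length, last base).

    Idealized assumption: a ddNTP is incorporated at every position in some
    copy, so there is one fragment per position, ending in the terminating base.
    """
    return [(pos + 1, base) for pos, base in enumerate(new_strand)]


def read_from_gel(fragments):
    """Order fragments shortest first, as a gel does, and read off the end bases."""
    return "".join(base for _, base in sorted(fragments))


if __name__ == "__main__":
    fragments = terminated_fragments("GATTACA")
    fragments.reverse()  # the reaction tube holds the fragments in no useful order
    print(read_from_gel(fragments))
```

Sorting by length is what the gel or capillary does physically. Once the fragments are in order, the terminating base of each one spells out the sequence.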

Chain termination sequencing is useful as it can read long sequences of DNA (over 800bp) and is very accurate. However, this method requires a lot of work and a large amount of reagent, which makes it expensive per base sequenced: nearly 50,000 times more expensive per base than Illumina sequencing.

For more information on chain termination sequencing, there are YouTube videos which illustrate it in more depth, as well as good information at JGI.

http://www.youtube.com/watch?v=eJ3WvQsPLUk http://www.youtube.com/watch?v=VhH-SJKGAPo

Also see the 3-part series on shotgun sequencing: part I, part II and part III.

Next-Gen

Illumina (150bp) performs sequencing by synthesizing complementary DNA strands. It is another form of chain termination, except that the terminators are reversible, so synthesis pauses after each base and then resumes. DNA which has been sheared into small fragments is separated on a gel by fragment size. Fragments of a chosen size are selected and amplified through PCR. Bridge amplification is then used to synthesize copies of the DNA on the surface of a flow cell. As each base is added, a camera records the fluorescent signal emitted to determine the sequence. This is done in parallel, with 150 million DNA fragments per flow cell. Illumina is currently the most widely employed technology in DNA sequencing.

PacBio (2-10kb) uses single molecule real time sequencing. A polymerase adds phospholinked nucleotides, which carry a fluorescent label on the terminal phosphate that is cleaved away during replication. This fluorescence is recorded and the DNA sequence is determined from it. PacBio hopes to read an entire genome in under an hour for $100.

454 (700bp) by Roche uses pyrosequencing. Available since the mid-2000s, it is similar to Sanger sequencing but is done in parallel and requires far less preparation. Using emulsion PCR amplification, it creates many copies of a DNA strand attached to a bead. Bases are then added one type at a time, and the amount of light generated during strand synthesis reveals which nucleotide was added.
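The link between light and sequence can be shown with a toy model. The sketch assumes a perfectly clean signal that equals the number of bases incorporated during each flow, and it assumes a repeating flow order of T, A, C, G. Both are simplifications chosen for illustration, and the function names are hypothetical.

```python
def flowgram(sequence, flow_order="TACG"):
    """Simulate idealized light signals for a strand being synthesized.

    Each flow offers one type of base. The signal is the number of bases of
    that type incorporated in a row (0 if the next base is a different one).
    """
    if set(sequence) - set(flow_order):
        raise ValueError("sequence contains a base missing from flow_order")
    signals = []
    pos = 0
    while pos < len(sequence):
        for base in flow_order:
            run = 0
            while pos < len(sequence) and sequence[pos] == base:
                run += 1
                pos += 1
            signals.append((base, run))
    return signals


def decode(signals):
    """Turn (base, signal) pairs back into the synthesized sequence."""
    return "".join(base * run for base, run in signals)


if __name__ == "__main__":
    flows = flowgram("TTAGC")
    print(flows)
    print(decode(flows))
```

In this idealized model a run of identical bases shows up as one stronger flash rather than several separate ones, so the read depends on measuring how much light was produced, not just whether there was any.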

Ion Torrent (200bp) by LifeTech is much like a very precise pH meter. It detects changes in the concentration of H+, released while polymerase synthesizes a complementary strand of DNA.

SOLiD (<100bp), also by LifeTech, uses emulsion PCR like Roche, but on beads of varying size. The fluorescence emitted when fragments of DNA are added onto a strand is recorded. SOLiD is very useful for detecting SNPs, insertions and deletions.

Next-Next Gen

Nabsys (100kb) is poised to use positional sequencing. With current sequencing we have to reassemble a giant puzzle. Positional sequencing can read a length of base pairs and keep track of its location within the genome, greatly simplifying assembly. Instead of using light to detect the base pairs added, Nabsys uses electronic DNA sequencing. By attaching probes to the DNA and pulling it through a nanopore, they can analyze current versus time, creating a genome-length probe map of the DNA. Done in parallel, this allows the genomic sequence to be determined.

Oxford Nanopore detects ionic currents as molecules move through a nanopore. Measuring the current allows them to identify the molecule moving through, including A, C, G and T on a strand of DNA.

NobleGen

PicoSeq

F: DNA Expression Analysis
G: Bioinformatics
H: Development of Student Efforts