DNA, or deoxyribonucleic acid, contains the genetic instructions used in the development and functioning of all known living organisms and many viruses. DNA is often compared to a set of blueprints, a recipe or a code, since it contains the instructions needed to construct other components of cells, such as proteins and RNA molecules. The DNA segments that carry genetic information are called genes.
DNA is made up of a long chain of individual units called nucleotides. Each nucleotide consists of a sugar, a phosphate group and one of four nitrogen-containing nucleobases: cytosine (C), guanine (G), adenine (A) or thymine (T). The order of these bases determines the information available for building and maintaining an organism. Bases on opposite strands pair up with each other, A with T and C with G, to form units called base pairs. Nucleotides are arranged in two long strands that form a spiral called a double helix. The double helix looks like a twisted ladder, with the base pairs as the rungs and the sugar-phosphate backbones as the rails.
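The pairing rules above mean that one strand fully determines the other. The following is a minimal Python sketch of that idea, not a reference implementation. The function names are illustrative, and the reversal step relies on a detail not stated above: the two strands run in opposite directions, so the partner strand is conventionally written reversed.

```python
# Watson-Crick pairing rules from the text: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """Return the base that pairs with each position of `strand`."""
    return "".join(PAIRS[base] for base in strand.upper())

def reverse_complement(strand: str) -> str:
    """Return the partner strand, written in its own 5'-to-3' direction."""
    return complement(strand)[::-1]

print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
```

Applying reverse_complement twice returns the original sequence, which is a quick sanity check on the pairing table.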
What information is stored in DNA?
DNA contains the instructions needed for an organism to develop, survive and reproduce. The important kinds of information stored in DNA include:
Instructions for making proteins
Proteins perform a wide array of functions within organisms. They include structural proteins, enzymes, hormones, antibodies, transporters and more. The instructions for making a protein are encoded in DNA within genes. In the simplest case, each gene contains the recipe for a single protein. This recipe lists the sequence of amino acids that make up the protein. The building blocks of proteins, amino acids, are brought to the growing protein chain by transfer RNA according to the DNA instructions. There are 20 standard amino acids used by organisms on Earth. The sequence and number of amino acids determine the protein’s shape and function.
Instructions for making RNA
In addition to encoding the information for making proteins, DNA also contains the instructions for producing RNA molecules. RNA plays an essential role in converting the genetic information from DNA into functional proteins. The three main types of RNA molecules are messenger RNA (mRNA), transfer RNA (tRNA) and ribosomal RNA (rRNA). mRNA carries the message from DNA to be used to build proteins. tRNA brings the amino acids to the growing protein chain according to the mRNA blueprint. rRNA combines with proteins to form ribosomes that link amino acids together.
Regulatory instructions
DNA does not just encode direct recipes for making proteins and RNA. It also contains regulatory instructions telling the cell when to switch genes on and off. This controls when and where proteins are made in the body. Regulatory DNA regions bind proteins that control the copying of DNA into RNA and the expression of genes. This regulation allows cells with the same DNA to differentiate into the many cell types needed in a multicellular organism. It also allows organisms to adapt to changes in their environment.
Instructions for DNA replication and repair
When cells divide, they need to accurately replicate their DNA so the next generation of cells contains the same genetic information. DNA therefore encodes the proteins and RNAs involved in DNA replication and proofreading. It also encodes the repair enzymes that act when the DNA is damaged. These enzymes fix different kinds of damage, from breaks in the DNA strands to chemically changed bases. Accurate DNA replication and repair are critical for preserving the information in the genome.
DNA packaging instructions
To fit inside cells, DNA needs to be tightly packaged into structures called chromosomes. A human cell contains about 2 meters of DNA. This is packaged down into 46 chromosomes ranging from around 50 million to 250 million base pairs each. DNA encodes histone proteins that allow DNA strands to wrap around them for compact storage. The DNA is further organized and condensed by looping and coiling to form higher order chromosome structures.
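The figure of about 2 meters can be checked with simple arithmetic. The sketch below assumes two values that are not given in the text: a haploid genome of roughly 3.1 billion base pairs and a spacing of roughly 0.34 nanometers between neighboring base pairs. Both are approximate, so the result is only an order-of-magnitude check.

```python
# Assumed values (not from the text): about 3.1 billion base pairs per
# haploid genome and about 0.34 nanometers of helix length per base pair.
HAPLOID_BASE_PAIRS = 3.1e9
COPIES_PER_CELL = 2            # one chromosome set from each parent
RISE_PER_BASE_PAIR_NM = 0.34

def dna_length_meters(base_pairs: float, rise_nm: float = RISE_PER_BASE_PAIR_NM) -> float:
    """Length of a fully stretched double helix, in meters."""
    return base_pairs * rise_nm * 1e-9

total = dna_length_meters(HAPLOID_BASE_PAIRS * COPIES_PER_CELL)
print(f"{total:.1f} m")  # 2.1 m
```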
How is the information in DNA read?
For the coded messages in DNA to be used by the cell, the information first needs to be converted into a usable form. This involves copying the genetic instructions from DNA into RNA through a process called transcription. The steps of transcription include:
1. Binding of RNA polymerase
The enzyme RNA polymerase binds to the DNA at the promoter, a sequence that marks the start of the gene to be transcribed.
2. Unwinding of the DNA double helix
RNA polymerase, assisted by helper proteins, unwinds and unzips a short stretch of the DNA double helix. This exposes the bases in each strand, and the polymerase positions itself on the strand to be copied, known as the template strand.
3. Synthesis of pre-mRNA
RNA polymerase moves step-by-step along the DNA template strand, pairing RNA bases with complementary DNA bases. RNA uses uracil (U) in place of thymine, so adenine in the template pairs with uracil in the RNA. This forms a strand of pre-messenger RNA (pre-mRNA) that is complementary to the DNA template strand.
4. Termination
At a stop signal in the DNA, the RNA polymerase releases the newly made pre-mRNA and detaches from the DNA. This completes transcription.
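The base-pairing performed during the synthesis step above can be sketched in a few lines of Python. This is a simplified illustration only: it ignores promoters, stop signals and the enzymes involved, and the function name and the convention of writing the template 3'-to-5' are assumptions made for the example.

```python
# DNA template base -> RNA base that RNA polymerase pairs with it.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}

def transcribe(template_3_to_5: str) -> str:
    """Build the RNA complementary to a DNA template strand.

    The template is written 3'-to-5', so the RNA comes out 5'-to-3'.
    """
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())

print(transcribe("TACGGCATT"))  # AUGCCGUAA
```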
The pre-mRNA then undergoes splicing to remove non-coding regions. The mature mRNA travels out of the nucleus to a ribosome, where translation occurs. The mRNA nucleotide sequence is read in sets of three bases called codons. Of the 64 possible codons, 61 specify an amino acid and three signal the end of the protein. The amino acids are strung together in the order dictated by the mRNA to form a protein. In this way, the genetic code stored in DNA is ultimately converted into functional proteins.
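Reading an mRNA codon by codon can be illustrated with a small lookup table. The sketch below builds the standard genetic code from a compact 64-letter string (one-letter amino acid codes, with "*" for a stop codon) and translates from the first AUG, the usual start codon, which is a detail not covered in the text. It is a toy model rather than a description of how the ribosome works.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order for each codon position; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon or the end of the sequence."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("GGAUGCCGUGGUAAUU"))  # MPW
```

In the example, the codons AUG, CCG and UGG give methionine (M), proline (P) and tryptophan (W), and UAA ends the chain.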
What parts of DNA are genes?
Protein-coding sequences make up only about 1-2% of the human genome. Together with the genes for functional RNAs, they encode all the proteins and RNAs needed by human cells. The rest of the DNA includes regulatory regions that control gene activity, as well as non-coding regions whose purpose is still being investigated.
Most human genes consist of alternating regions called exons, which are kept in the final RNA, and intervening non-coding regions called introns, which are removed. When the pre-mRNA is processed, the introns are cut out and the exons are spliced together to generate the final mRNA product.
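Splicing can be pictured as keeping the exon segments and joining them in order. The sketch below assumes the exon coordinates are already known and uses a made-up transcript, with exons in upper case and introns in lower case purely for readability. Real splicing is carried out by cellular machinery that recognizes signals in the RNA, which this toy example does not model.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments, given as (start, end) with end exclusive, in order."""
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))

# Hypothetical transcript: three exons separated by two introns.
pre_mrna = "AUGGCC" + "guaaguuuucag" + "UUUGAA" + "guaugacag" + "UAA"
exons = [(0, 6), (18, 24), (33, 36)]

print(splice(pre_mrna, exons))  # AUGGCCUUUGAAUAA
```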
Human genes vary enormously in size due to differences in the amount of DNA used to code for proteins and RNAs and the sizes of introns. For example, the dystrophin gene that is mutated in muscular dystrophy is the largest known human gene, containing around 2.2 million base pairs of DNA. In contrast, genes coding for tiny RNA molecules may be only 70-100 base pairs long.
The number of genes contained in an organism’s DNA can also vary greatly. Scientists initially predicted over 100,000 genes in the human genome based on estimates of the complexity of humans compared with simpler organisms. However, the initial sequencing of the human genome revealed only around 20,000-25,000 protein-coding genes. Thousands of additional genes code for functional RNAs rather than proteins. While humans certainly appear more complex than species with fewer genes, this highlights that it is not just gene number that determines complexity. Elements like gene regulation and alternative splicing of transcripts likely contribute greatly to human complexity.
What types of DNA variations are there?
No two people have exactly the same DNA sequence. Even identical twins have some differences in their genomes accumulated over their lifetimes. Variations in DNA among members of a species are referred to as genetic polymorphisms. They provide the genetic basis for diversity and influence traits like appearance and disease susceptibility. The types of natural DNA sequence variations include:
Single nucleotide polymorphisms (SNPs)
SNPs involve variation in a single base pair. These are the most common type of polymorphism in human DNA. Millions of SNPs have been catalogued in the human genome. While many have no effect, some influence risk of diseases like cancer.
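As a rough illustration, single-base differences between two sequences can be listed by comparing them position by position. The sketch below assumes the sequences are already aligned and of equal length. The function name and the example sequences are invented for illustration, and real variant calling works from many sequencing reads rather than two strings.

```python
def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (position, reference base, sample base) for each single-base difference.

    Assumes the two sequences are already aligned and the same length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, ref, alt)
        for i, (ref, alt) in enumerate(zip(reference, sample))
        if ref != alt
    ]

print(find_snps("GATTACAGT", "GACTACAGA"))  # [(2, 'T', 'C'), (8, 'T', 'A')]
```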
Short tandem repeats (STRs)
STRs, also called microsatellites, are repeating sequences of 2-5 base pairs. The number of times the core sequence repeats at a location can differ between individuals. STRs are commonly used in forensic DNA fingerprinting.
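Counting how many times a core sequence repeats is the basic measurement behind STR comparison. Here is a minimal sketch using Python's re module. The motif and the two alleles are hypothetical, and forensic typing in practice measures fragment lengths or sequencing reads at many loci rather than searching a known string.

```python
import re

def longest_repeat_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `sequence`."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)

# Two hypothetical alleles at the same location, differing only in repeat count.
allele_a = "CCTA" + "GATA" * 7 + "TTCG"
allele_b = "CCTA" + "GATA" * 11 + "TTCG"

print(longest_repeat_run(allele_a, "GATA"))  # 7
print(longest_repeat_run(allele_b, "GATA"))  # 11
```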
Insertions and deletions
Segments of DNA ranging from a single base pair to thousands can be inserted or deleted in the genome. Deletions may remove important genes, while insertions can disrupt and inactivate genes. Small insertions and deletions within genes shift the reading frame when their length is not a multiple of three, dramatically altering the resulting protein.
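The effect of a frameshift is easiest to see by splitting a sequence into codons before and after an insertion. In the sketch below, the sequences are invented for illustration. A one-base insertion changes every codon downstream of it, while a three-base insertion adds one codon and leaves the rest intact.

```python
def codons(sequence: str) -> list[str]:
    """Split a coding sequence into consecutive three-base codons."""
    return [sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3)]

original = "ATGGCCAAATTTTGA"
inserted = original[:4] + "T" + original[4:]     # one extra base shifts the frame
in_frame = original[:3] + "GGG" + original[3:]   # three extra bases keep the frame

print(codons(original))  # ['ATG', 'GCC', 'AAA', 'TTT', 'TGA']
print(codons(inserted))  # ['ATG', 'GTC', 'CAA', 'ATT', 'TTG']
print(codons(in_frame))  # ['ATG', 'GGG', 'GCC', 'AAA', 'TTT', 'TGA']
```

Note that the shifted version also loses the final TGA stop codon, so translation would continue past the normal end of the protein.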
Copy number variations
Large segments of the genome can also be duplicated or deleted during evolution or within a population. This leads to variations in the copy number of the affected genes between different people. Copy number variations have been linked to increased risk of neurological disorders like autism.
Inversions and translocations
DNA sequences can also be flipped around or moved to new positions in the genome. Inversions rearrange the orientation of a DNA segment, while translocations shift a segment to a new chromosomal location. These variations can disrupt genes and their regulation.
Together, all these types of genetic variations contribute to the differences we see between individuals in traits and disease susceptibility. Studying them helps scientists piece together how changes in DNA sequence influence human phenotypes.
What types of DNA exist besides the human genome?
In addition to the DNA within the nucleus, humans also have DNA located in mitochondria. Mitochondria are structures that supply energy to the cell. They contain a small circular chromosome made up of about 16,500 base pairs. The mitochondrial DNA (mtDNA) has 37 genes needed for mitochondrial function. MtDNA is inherited almost exclusively from the mother. Research on mtDNA variants helps trace maternal ancestry.
Humans also contain a vast community of symbiotic microbes that harbor their own DNA. The human microbiome includes bacteria, viruses, fungi and other microorganisms living in the gut, on the skin and on other body surfaces. Studies estimate that the tens of trillions of microbial cells in the body contain millions of unique genes that influence metabolism, immunity and other processes. The Human Microbiome Project was launched to analyze the role of this “second genome” in human health and disease.
Beyond humans, the natural world contains a remarkable diversity of genomic information. Millions of species have DNA adapted to their unique requirements. Even strains of bacteria have DNA variations that influence traits like virulence and antibiotic resistance. Environmental shotgun sequencing continues to uncover new genomes in everything from soil microbes to ocean plankton. Cataloging and comparing DNA across the kingdoms of life provides insight into evolution and biology.
How is DNA sequenced?
Determining the order of bases within a strand of DNA is called DNA sequencing. There are a number of methods scientists use to sequence DNA:
Sanger sequencing
Developed by Frederick Sanger and colleagues in 1977, this was the first widely adopted method for determining the sequence of DNA. It uses DNA polymerase to synthesize new DNA from a single-stranded template, incorporating chain-terminating dideoxynucleotides (ddNTPs) that stop elongation at different points. The fragments produced are separated by size, and the labeled final base of each fragment allows the sequence to be read. Sanger sequencing decoded the first complete genome of a free-living organism, the bacterium Haemophilus influenzae, in 1995.
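The logic of chain termination can be mimicked with a small simulation. The toy model below makes many copies of a strand, stops each copy at a random position, and records only the length and final base of each fragment. Sorting by length then recovers the sequence. The parameters copies, p_stop and seed are arbitrary choices for the example, and the model ignores the chemistry and the readout entirely.

```python
import random

def chain_terminated_fragments(new_strand: str, copies: int = 2000,
                               p_stop: float = 0.1, seed: int = 0) -> set[tuple[int, str]]:
    """Simulate many synthesis reactions that stop at random positions.

    Each fragment is recorded as (length, final base), which is what size
    separation and the labeled terminator reveal in this toy model.
    """
    rng = random.Random(seed)
    fragments = set()
    for _ in range(copies):
        for length, base in enumerate(new_strand, start=1):
            if rng.random() < p_stop:   # a chain terminator was incorporated here
                fragments.add((length, base))
                break
    return fragments

def read_sequence(fragments: set[tuple[int, str]]) -> str:
    """Order fragments by size and read off their final bases."""
    return "".join(base for _, base in sorted(fragments))

fragments = chain_terminated_fragments("GATTACAGGC")
print(read_sequence(fragments))
```

With enough copies, every fragment length is represented, so the printed read matches the strand that was synthesized.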
Next-generation sequencing
New high-throughput methods emerged in the mid-2000s that dramatically sped up DNA sequencing. These include technologies like Illumina’s sequencing by synthesis, Roche’s 454 pyrosequencing, and Life Technologies’ sequencing by ligation. They allow massively parallel sequencing of millions of small fragments, enabling fast and cheap genome sequencing.
Long-read sequencing
Methods like PacBio’s single-molecule real-time (SMRT) sequencing and Oxford Nanopore’s nanopore sequencing provide much longer read lengths. This facilitates assembly of complex repetitive regions. It also enables sequencing of full-length RNAs and targeted sequencing of long DNA fragments.
DNA mapping
Techniques like chromosome conformation capture (3C), often combined with high-throughput sequencing, probe the 3D structure and spatial arrangement of chromatin in the nucleus. This reveals interactions between DNA elements that are important for genome organization and gene regulation.
Continued advances in DNA sequencing speed and scale have opened up exciting possibilities for research and clinical testing. Human genome sequencing is now widely accessible for around $1,000, enabling personalized medicine applications. DNA sequencing also facilitates fundamental biological discovery across the tree of life.
What are some important discoveries made by analyzing DNA sequences?
Many revolutionary insights have emerged from analyzing DNA sequences. Examples include:
Confirming the universal genetic code
Early DNA sequencing experiments confirmed that nearly all species use the same genetic code to translate nucleic acid base triplets into amino acids. Minor variations were later discovered in mitochondrial and protozoan DNA.
Understanding human evolution
Comparing DNA sequences from humans and other apes showed that humans and chimpanzees share over 98% of their DNA. More recent genome projects exploring human mutation rates and migration patterns continue to reveal our origins and history.
Developing phylogenetic trees
DNA sequence comparisons provide a record of evolutionary relationships by indicating how recently species diverged from a common ancestor. Phylogenetic trees built using DNA create an organizational framework for classifying life’s diversity.
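A simple way to see how sequence comparisons hint at relatedness is to compute the fraction of differing positions for every pair of sequences. The sketch below uses invented aligned sequences with placeholder labels. The p-distance shown is only one of many distance measures, and building an actual tree requires further steps that are not shown here.

```python
from itertools import combinations

def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

# Hypothetical aligned sequences; the species labels are placeholders only.
sequences = {
    "species_A": "ACGTACGTAC",
    "species_B": "ACGTACGTAT",
    "species_C": "ACGAACGTTT",
    "species_D": "TCGAACCTTT",
}

distances = {
    (s, t): p_distance(sequences[s], sequences[t])
    for s, t in combinations(sequences, 2)
}
closest = min(distances, key=distances.get)
print(closest, distances[closest])  # ('species_A', 'species_B') 0.1
```

Under the assumption that fewer differences mean a more recent common ancestor, the closest pair would be grouped together first when building a tree.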
Identifying disease-causing mutations
The search for genetic mutations linked to cancers and thousands of inherited conditions relies heavily on DNA tests. These pinpoint disease-related variants from the billions of base pairs in the human genome.
Enabling DNA forensics
Analyses of STRs, SNPs and mitochondrial DNA sequences allow forensic scientists to identify crime suspects from traces of biological evidence or to identify human remains. DNA fingerprinting comparisons have also exonerated wrongfully convicted individuals.
Tracing human migrations and ancestry
Patterns in mitochondrial DNA and Y chromosome sequences chart ancient human travels. Companies like 23andMe connect customers to their geographical roots by comparing small differences in DNA inherited from common ancestors.
Cataloging biodiversity
Environmental DNA sequencing is uncovering a huge range of novel organisms and expanding the tree of life. Comparing genes shared across species assembles evolutionary connections while highlighting unique adaptations.
What are some future applications of DNA sequencing?
DNA sequencing will continue fueling discovery across the life sciences and beyond. Future applications may include:
Precision medicine
Rather than “one size fits all”, treatments will be tailored to a patient’s genetics. Tumor DNA sequencing guides cancer therapy. Germline DNA testing helps prevent disease. Sequencing of the microbiome, viruses and other pathogens enables more exact diagnoses.
Gene editing
Powerful CRISPR gene editing relies on DNA sequence recognition between guide RNAs and target sites. Improvements in efficiency and accuracy will expand editing possibilities for research and medicine.
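Sequence recognition of this kind can be illustrated by scanning DNA for candidate target sites. The sketch below assumes the convention used by the commonly studied Cas9 enzyme from Streptococcus pyogenes, where a 20-base target sits immediately before a short "NGG" motif (N being any base). That convention is not stated in the text, and the sketch scans one strand only and does not assess off-target matches.

```python
import re

def find_target_sites(dna: str, guide_length: int = 20) -> list[tuple[int, str, str]]:
    """Return (start, target, motif) for every site followed by an NGG motif.

    Assumes a `guide_length`-base target immediately before NGG and scans
    the given strand only.
    """
    dna = dna.upper()
    # A lookahead lets overlapping candidate sites all be reported.
    pattern = re.compile(rf"(?=([ACGT]{{{guide_length}}})([ACGT]GG))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]

# Hypothetical sequence containing exactly one candidate site.
dna = "AT" + "GCTA" * 5 + "TGG" + "CA"
print(find_target_sites(dna))  # [(2, 'GCTAGCTAGCTAGCTAGCTA', 'TGG')]
```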
DNA data storage
With exponential growth in data, DNA is an attractive medium for high-density information storage, and potentially encryption, with digital data encoded as sequences of bases. Stored under suitable conditions, DNA can last centuries without maintenance.
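Because there are four bases, each base can in principle carry two bits of information. The sketch below shows one hypothetical mapping of bit pairs to bases, together with its inverse. It is only an illustration of the idea: practical schemes add error correction and avoid sequences that are hard to synthesize or read, such as long runs of a single base.

```python
# Hypothetical two-bits-per-base mapping chosen for this example.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

def encode(data: bytes) -> str:
    """Encode bytes as a DNA string, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def decode(dna: str) -> bytes:
    """Invert `encode`, turning four bases back into each byte."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

strand = encode(b"DNA")
print(strand)          # CACACATGCAAC
print(decode(strand))  # b'DNA'
```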
Forensic technologies
Expanded DNA databases, along with new sequencing and analysis methods, will continue modernizing forensic investigations by increasing the identifying information that can be recovered from minute samples.
Bioengineering
Designing novel enzymes, metabolic pathways and organisms relies on manipulating DNA sequences. Synthetic biology and DNA programming applications range from manufacturing to environmental remediation.
Agriculture
Breeding crops and livestock with favourable traits will be accelerated by genomic selection methods. DNA sequencing will help verify the genetic changes made in GMOs as part of safety assessment. Soil metagenomics may help improve crop growth.
Conservation
DNA barcoding catalogues biodiversity and monitors endangered species. Ancient DNA from fossils and subfossils helps reconstruct the genomes of extinct species. Environmental DNA provides early warning of ecosystem shifts.
Conclusion
The instructions for life are written in the four-letter code of DNA bases. Thanks to advances in sequencing technology, scientists are unlocking the myriad ways organisms encode information in the sequence and structure of DNA molecules. Reading and writing DNA offers unprecedented control over biology, driving advances towards a future of personalized medicine, enhanced sustainability, and much more we cannot yet imagine. Nearly seventy years after Watson and Crick revealed the double helix model, DNA retains its mystique and promise as the blueprint of life.