Introduction to DNA

Under scientific review

This introductory page is under scientific review. Use it for orientation, not clinical decision-making.

Deoxyribonucleic acid (DNA) is the molecule that carries genetic information for the development, functioning, growth, and reproduction of all known organisms and many viruses. Sources: [1]

Note: Information reflects 2025 standards. Verify references and database URLs periodically.

DNA Structure

DNA is a double-helix molecule: two antiparallel strands coil around each other and are held together by hydrogen bonds between paired bases. Sources: [2]

Nucleotide Composition

Each nucleotide is composed of three parts:

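Phosphate Group: Links the sugars of adjacent nucleotides through phosphodiester bonds
Sugar (Deoxyribose): Forms the backbone of the chain together with the phosphate groups
Nitrogenous Base: Encodes the genetic information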

Nitrogenous Bases

DNA contains four nitrogenous bases, each with a fixed pairing partner. Sources: [3]

Adenine (A): Pairs with Thymine (T)
Guanine (G): Pairs with Cytosine (C)
Cytosine (C): Pairs with Guanine (G)
Thymine (T): Pairs with Adenine (A)
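
The pairing rules above can be sketched in a few lines of Python. This is a minimal illustration, assuming uppercase input limited to A, C, G, and T; the names COMPLEMENT, complement, and reverse_complement are hypothetical and do not come from any library. Because the two strands of the helix run in opposite directions, the partner strand is conventionally written as the reverse complement.

```python
# Illustrative sketch of base pairing; all names here are hypothetical.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(sequence: str) -> str:
    """Return the base-by-base complement of an uppercase DNA sequence."""
    try:
        return "".join(COMPLEMENT[base] for base in sequence)
    except KeyError as err:
        # Anything outside A, C, G, T (for example N) is rejected in this sketch.
        raise ValueError(f"Unexpected base: {err.args[0]}") from None


def reverse_complement(sequence: str) -> str:
    """Return the complement read in the opposite direction."""
    return complement(sequence)[::-1]


print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
```

A plain dictionary keeps the pairing table visible; production code would more likely rely on an established library for this.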

DNA and Cancer

DNA damage can lead to mutations, which are changes in the nucleotide sequence. When mutations occur in genes that control the cell cycle, they can lead to cancer development. Sources: [4]

Mutations in Regulatory Genes: Disrupt control of the cell cycle
Accumulation of Mutations: Drives malignant transformation
Repair Mechanisms: Protect against DNA damage
Genomic Instability: A characteristic of cancer

Computational Applications

Sequence Analysis

Install Biopython once in your environment (the example below uses gc_fraction, which is available in Biopython 1.80 and later):

bash

pip install biopython

python

# NOTE: Educational example only; not a validated analysis pipeline.
from __future__ import annotations  # lets the `str | Path` hint work on Python < 3.10

from collections import Counter
from pathlib import Path

from Bio import SeqIO
from Bio.SeqUtils import gc_fraction

EXAMPLE_FASTA = Path("gene.fasta")

def write_example_fasta(path: Path = EXAMPLE_FASTA) -> None:
    """Create a tiny FASTA file so the example is runnable end-to-end."""
    path.write_text(
        ">gene_example\n"
        "ATGCGTACGTACGTACGTAGCTAGCTAGCTACGTAGCTACGTAGCATGCTAGCTAGCTAGC\n"
        "TAGCTAGCTAGCTAGCTAGCTAGCTACGTAGCTAGCTAGCTAGCTAGCTACGTAGCTAGCTAGC\n",
        encoding="utf-8",
    )

def analyze_dna_sequence(sequence_file: str | Path) -> dict:
    """Analyze a FASTA DNA sequence for basic educational statistics."""
    # SeqIO.read expects exactly one record and raises ValueError otherwise.
    record = SeqIO.read(str(sequence_file), "fasta")
    dna_seq = str(record.seq).upper()

    allowed_bases = {"A", "C", "G", "T", "N"}
    invalid_bases = sorted(set(dna_seq) - allowed_bases)
    if invalid_bases:
        raise ValueError(f"Unexpected FASTA bases: {', '.join(invalid_bases)}")

    base_counts = Counter(dna_seq)
    canonical_length = sum(base_counts[base] for base in "ACGT")
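    # gc_fraction returns a value between 0 and 1; its default mode ignores
    # ambiguous bases such as N. Multiply by 100 to report a percentage.
    gc_content = gc_fraction(dna_seq) * 100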
    at_ratio = base_counts["A"] / base_counts["T"] if base_counts["T"] else None
    gc_ratio = base_counts["G"] / base_counts["C"] if base_counts["C"] else None

    return {
        "record_id": record.id,
        "length": len(dna_seq),
        "canonical_base_count": canonical_length,
        "gc_content": gc_content,
        "base_counts": dict(base_counts),
        "at_ratio": at_ratio,
        "gc_ratio": gc_ratio,
    }

write_example_fasta()
results = analyze_dna_sequence(EXAMPLE_FASTA)

print(f"Record: {results['record_id']}")
print(f"Length: {results['length']} bp")
print(f"GC content: {results['gc_content']:.1f}%")
print(
    f"A/T ratio: {results['at_ratio']:.2f}"
    if results["at_ratio"] is not None
    else "A/T ratio: n/a"
)

Mutation Detection

python

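# NOTE: Educational example only; does not handle indels or alignment.
CANONICAL_BASES = {'A', 'C', 'G', 'T'}
PURINES = {'A', 'G'}
PYRIMIDINES = {'C', 'T'}


def classify_mutation(ref_base, sample_base):
    """Classifies a single-base substitution as a transition or transversion."""
    # Ambiguity codes such as N cannot be assigned to either class.
    if ref_base not in CANONICAL_BASES or sample_base not in CANONICAL_BASES:
        return "Unknown"

    # Transitions stay within a class (purine ↔ purine, pyrimidine ↔ pyrimidine);
    # transversions switch between the two classes.
    if (ref_base in PURINES) == (sample_base in PURINES):
        return "Transition"
    return "Transversion"


def detect_mutations(reference_seq, sample_seq):
    """Detects substitutions between two equal-length, pre-aligned sequences."""
    # zip() would silently truncate to the shorter input and hide differences.
    if len(reference_seq) != len(sample_seq):
        raise ValueError("Sequences must have the same length (no indel support).")

    mutations = []

    # Compare position by position, ignoring case
    for i, (ref_base, sample_base) in enumerate(
        zip(reference_seq.upper(), sample_seq.upper())
    ):
        if ref_base != sample_base:
            mutations.append({
                'position': i + 1,  # 1-based coordinate
                'reference': ref_base,
                'sample': sample_base,
                'type': classify_mutation(ref_base, sample_base),
            })

    return mutations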

# Example
ref = "ATGCGATCGATCG"
sample = "ATGCGCTCGATCG"
mutations = detect_mutations(ref, sample)

for mut in mutations:
    print(f"Position {mut['position']}: {mut['reference']} → {mut['sample']} ({mut['type']})")
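
As a follow-up to the position-by-position comparison above, the substitutions can be tallied by class. The sketch below is self-contained and assumes equal-length, uppercase sequences containing only A, C, G, and T; substitution_class and summarize_substitutions are hypothetical names chosen for illustration, and the sample sequence is made up for the example.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(ref_base: str, alt_base: str) -> str:
    """Classify a substitution between two different canonical bases."""
    pair = {ref_base, alt_base}
    # A transition keeps the base within its class; a transversion switches class.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "Transition"
    return "Transversion"


def summarize_substitutions(reference: str, sample: str) -> Counter:
    """Count transitions and transversions between equal-length sequences."""
    if len(reference) != len(sample):
        raise ValueError("Sequences must have the same length")
    return Counter(
        substitution_class(r, s)
        for r, s in zip(reference, sample)
        if r != s
    )


counts = summarize_substitutions("ATGCGATCGATCG", "GTGCGCTCGATCA")
print(counts["Transition"], counts["Transversion"])  # 2 1
```

A Counter returns 0 for a class that never occurs, so the summary can be read without checking for missing keys.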

Analysis Techniques

Sequencing: Determines the order of bases
PCR: Amplifies specific regions
Southern Blot: Detects specific DNA sequences. Sources: [5]
Hybridization: Identifies complementary sequences
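
The idea behind hybridization, a probe pairing with a complementary stretch of a target strand, can be mimicked in code as a string search. The sketch below is a deliberate simplification: it assumes uppercase A, C, G, T input and perfect matches only, whereas real hybridization tolerates some mismatches and depends on experimental conditions. The function names and the probe sequence are hypothetical.

```python
# Translation table mapping each base to its pairing partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def find_probe_sites(target: str, probe: str) -> list[int]:
    """Return 1-based start positions where the probe would pair perfectly."""
    # The probe pairs with the target wherever the target contains
    # the probe's reverse complement.
    site = reverse_complement(probe)
    positions = []
    start = target.find(site)
    while start != -1:
        positions.append(start + 1)
        # Resume one base later so overlapping sites are also reported.
        start = target.find(site, start + 1)
    return positions


target = "ATGCGATCGATCG"
probe = "CGAT"
print(find_probe_sites(target, probe))  # [6, 10]
```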

Learning Resources

Books: Watson et al., Molecular Biology of the Gene
Courses: Khan Academy: DNA structure and replication, Coursera Bioinformatics Specialization
Tools: NCBI, Ensembl, DDBJ


References

[1] DNA definition and role: Wikipedia DNA entry; NCBI Molecular Biology textbook; Visible Body overview.
[2] DNA double helix model: Watson–Crick and modern summaries (NCBI textbook; Nature Scitable overview).
[3] Nucleotide components and base pairing rules: Genome.gov glossary; Wikipedia base pair; LibreTexts DNA structure.
[4] Mutations, proto‑oncogenes/tumor suppressors, and cancer: LibreTexts; PNAS; Cancer.ca overview; PMC reviews.
[5] Southern blot concept and protocols: ConductScience; LibreTexts; Sigma‑Aldrich protocol.
