Introduction to DNA

Deoxyribonucleic acid (DNA) is the molecule that carries genetic information for the development, functioning, growth, and reproduction of all known organisms and many viruses [1].

Note: Information reflects 2025 standards. Verify references and database URLs periodically.

DNA Structure

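DNA is a double-helix molecule composed of two strands that coil around each other [2]. The strands run in opposite directions (they are antiparallel) and are held together by hydrogen bonds between complementary bases.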

Nucleotide Composition

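DNA is a polymer of nucleotides. Each nucleotide has three parts:

  • Phosphate Group: Links adjacent nucleotides through phosphodiester bonds
  • Sugar (Deoxyribose): Alternates with phosphate groups to form the strand's sugar-phosphate backbone
  • Nitrogenous Base: Carries the genetic information; the order of bases along the strand encodes it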

Nitrogenous Bases

DNA contains four nitrogenous bases, which pair in a fixed way [3]:

  • Adenine (A): Pairs with Thymine (T)
  • Guanine (G): Pairs with Cytosine (C)
  • Cytosine (C): Pairs with Guanine (G)
  • Thymine (T): Pairs with Adenine (A)
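Because each base has exactly one partner, either strand fully determines the other. The short sketch below applies the pairing rules above using only the Python standard library. The helper names `complement` and `reverse_complement` are illustrative and do not come from a particular library. The reversal step reflects the antiparallel strands: by convention, the partner strand is written 5'→3'.

```python
# Pairing rules from the list above: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-by-base complement, position for position."""
    # Raises KeyError for any character outside A/C/G/T.
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand read 5'->3' (the strands are antiparallel)."""
    return complement(strand)[::-1]


seq = "ATGCGT"
print(complement(seq))          # TACGCA
print(reverse_complement(seq))  # ACGCAT
```

In this sketch, characters such as N raise a `KeyError`. A fuller version would need an explicit policy for ambiguous bases.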

DNA and Cancer

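DNA damage can lead to mutations, which are changes in the nucleotide sequence [4]. Several factors link mutations to cancer:

  • Mutations in Regulatory Genes: Changes in proto-oncogenes and tumor-suppressor genes, which control the cell cycle, can allow uncontrolled cell division
  • Accumulation of Mutations: Cancer usually develops as several such mutations build up over time, driving malignant transformation
  • Repair Mechanisms: DNA repair pathways correct most damage; defects in these pathways raise the mutation rate
  • Genomic Instability: An elevated rate of mutations and chromosomal changes, a characteristic feature of cancer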

Computational Applications

Sequence Analysis

Install Biopython once in your environment:

```bash
pip install biopython
```

```python
# NOTE: Educational example only; not a validated analysis pipeline.
from collections import Counter
from pathlib import Path

from Bio import SeqIO
from Bio.SeqUtils import gc_fraction

EXAMPLE_FASTA = Path("gene.fasta")

def write_example_fasta(path: Path = EXAMPLE_FASTA) -> None:
    """Create a tiny FASTA file so the example is runnable end-to-end."""
    path.write_text(
        ">gene_example\n"
        "ATGCGTACGTACGTACGTAGCTAGCTAGCTACGTAGCTACGTAGCATGCTAGCTAGCTAGC\n"
        "TAGCTAGCTAGCTAGCTAGCTAGCTACGTAGCTAGCTAGCTAGCTAGCTACGTAGCTAGCTAGC\n",
        encoding="utf-8",
    )

def analyze_dna_sequence(sequence_file: str | Path) -> dict:
    """Analyze a FASTA DNA sequence for basic educational statistics."""
    record = SeqIO.read(str(sequence_file), "fasta")
    dna_seq = str(record.seq).upper()

    allowed_bases = {"A", "C", "G", "T", "N"}
    invalid_bases = sorted(set(dna_seq) - allowed_bases)
    if invalid_bases:
        raise ValueError(f"Unexpected FASTA bases: {', '.join(invalid_bases)}")

    base_counts = Counter(dna_seq)
    canonical_length = sum(base_counts[base] for base in "ACGT")
    gc_content = gc_fraction(dna_seq) * 100
    at_ratio = base_counts["A"] / base_counts["T"] if base_counts["T"] else None
    gc_ratio = base_counts["G"] / base_counts["C"] if base_counts["C"] else None

    return {
        "record_id": record.id,
        "length": len(dna_seq),
        "canonical_base_count": canonical_length,
        "gc_content": gc_content,
        "base_counts": dict(base_counts),
        "at_ratio": at_ratio,
        "gc_ratio": gc_ratio,
    }

write_example_fasta()
results = analyze_dna_sequence(EXAMPLE_FASTA)

print(f"Record: {results['record_id']}")
print(f"Length: {results['length']} bp")
print(f"GC content: {results['gc_content']:.1f}%")
print(
    f"A/T ratio: {results['at_ratio']:.2f}"
    if results["at_ratio"] is not None
    else "A/T ratio: n/a"
)
```
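The GC content printed above is the share of guanine and cytosine among the counted bases: GC% = 100 × (G + C) / (A + C + G + T). Below is a dependency-free sketch of that formula. It assumes ambiguous `N` positions are excluded from the denominator. As we understand it, this mirrors the default `ambiguous="remove"` mode of Biopython's `gc_fraction` for input limited to A, C, G, T, and N. `gc_percent` is a hypothetical helper name.

```python
from collections import Counter


def gc_percent(sequence: str) -> float:
    """Return GC content as a percentage of unambiguous (A/C/G/T) bases.

    N positions are left out of the denominator (assumed to mirror
    gc_fraction's default behaviour for A/C/G/T/N input).
    """
    counts = Counter(sequence.upper())
    acgt_total = sum(counts[base] for base in "ACGT")
    if acgt_total == 0:
        return 0.0  # nothing to measure, e.g. an all-N sequence
    return 100 * (counts["G"] + counts["C"]) / acgt_total


print(gc_percent("ATGC"))               # 50.0
print(f"{gc_percent('GGCCNNAT'):.1f}")  # 66.7
```

Keeping the counts explicit makes it easy to report the denominator alongside the percentage. That matters for sequences with many N positions.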

Mutation Detection

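```python
# NOTE: Educational example only; does not handle indels or alignment.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def detect_mutations(reference_seq, sample_seq):
    """Detect single-base substitutions between two equal-length sequences."""
    reference_seq = reference_seq.upper()
    sample_seq = sample_seq.upper()

    # zip() would silently truncate to the shorter sequence, so require
    # equal lengths (this position-by-position comparison assumes no indels).
    if len(reference_seq) != len(sample_seq):
        raise ValueError("Sequences must have the same length (indels not supported)")

    mutations = []

    # Compare position by position; positions are reported 1-based
    for i, (ref_base, sample_base) in enumerate(zip(reference_seq, sample_seq)):
        if ref_base != sample_base:
            mutations.append({
                'position': i + 1,
                'reference': ref_base,
                'sample': sample_base,
                'type': classify_mutation(ref_base, sample_base),
            })

    return mutations


def classify_mutation(ref_base, sample_base):
    """Classify a substitution as a transition or transversion."""
    # Ambiguous bases such as N cannot be classified
    if {ref_base, sample_base} - (PURINES | PYRIMIDINES):
        return "Unclassified"

    # Transitions (purine ↔ purine, pyrimidine ↔ pyrimidine);
    # transversions swap a purine for a pyrimidine or vice versa
    if (ref_base in PURINES) == (sample_base in PURINES):
        return "Transition"
    return "Transversion"


# Example
ref = "ATGCGATCGATCG"
sample = "ATGCGCTCGATCG"
mutations = detect_mutations(ref, sample)

for mut in mutations:
    print(f"Position {mut['position']}: {mut['reference']}>{mut['sample']} ({mut['type']})")
# Position 6: A>C (Transversion)
```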
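Classified substitutions are often summarized as a transition/transversion (Ts/Tv) ratio. Transitions typically outnumber transversions in real data, so an unexpectedly low ratio can hint at noisy calls. The sketch below is self-contained and takes `(reference, sample)` base pairs. `ts_tv_ratio` is a hypothetical helper, and skipping pairs that involve non-ACGT bases is our own simplifying choice.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
CANONICAL = PURINES | PYRIMIDINES


def ts_tv_ratio(substitutions):
    """Transition/transversion ratio for (ref, alt) single-base pairs.

    Identical pairs and pairs with non-ACGT bases are skipped.
    Returns None when there are no transversions (ratio undefined).
    """
    transitions = transversions = 0
    for ref, alt in substitutions:
        ref, alt = ref.upper(), alt.upper()
        if ref == alt or ref not in CANONICAL or alt not in CANONICAL:
            continue
        # Same chemical class (both purines or both pyrimidines) = transition
        if (ref in PURINES) == (alt in PURINES):
            transitions += 1
        else:
            transversions += 1
    return transitions / transversions if transversions else None


# A>G and C>T are transitions; A>C is a transversion
print(ts_tv_ratio([("A", "G"), ("C", "T"), ("A", "C")]))  # 2.0
```

You can build the input from the `mutations` list returned by `detect_mutations` with `[(m['reference'], m['sample']) for m in mutations]`.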

Analysis Techniques

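  • Sequencing: Determination of the order of bases in a DNA molecule
  • PCR (Polymerase Chain Reaction): Amplification of a specific DNA region into many copies
  • Southern Blot: Detection of a specific sequence among DNA fragments that have been separated by gel electrophoresis, transferred to a membrane, and probed [5]
  • Hybridization: Identification of sequences through base pairing between a probe and its complementary strand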
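Hybridization-based methods such as the Southern blot detect a sequence because a probe pairs with its complementary stretch of DNA. A rough in-silico analogue is to scan one strand for the probe's reverse complement. This sketch assumes perfect matches only; real hybridization tolerates some mismatches and depends on temperature and salt conditions. `find_probe_sites` is a hypothetical name used for illustration.

```python
import re

# Base-pairing table for str.translate: A<->T, C<->G
PAIRS = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse (the strands are antiparallel)."""
    return seq.upper().translate(PAIRS)[::-1]


def find_probe_sites(target: str, probe: str) -> list[int]:
    """Return 1-based start positions in `target` where `probe` could anneal.

    The probe binds where the target strand reads as the probe's reverse
    complement. Only perfect matches are considered (a simplifying
    assumption); overlapping sites are all reported.
    """
    site = reverse_complement(probe)
    # A zero-width lookahead lets re.finditer report overlapping matches
    pattern = f"(?={re.escape(site)})"
    return [m.start() + 1 for m in re.finditer(pattern, target.upper())]


target = "GGATCCAAGGATCC"
print(find_probe_sites(target, "TTGG"))  # [5]
```

A plain `str.find` loop that restarts one position after each hit would serve equally well. The lookahead simply keeps the overlap handling compact.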

Whichever technique reveals them, mutations in genes that control the cell cycle are of particular concern because they can contribute to cancer development [4].

References

  1. DNA definition and role: Wikipedia DNA entry; NCBI Molecular Biology textbook; Visible Body overview.
  2. DNA double helix model: Watson–Crick and modern summaries (NCBI textbook; Nature Scitable overview).
  3. Nucleotide components and base-pairing rules: Genome.gov glossary; Wikipedia base pair entry; LibreTexts DNA structure.
  4. Mutations, proto‑oncogenes/tumor suppressors, and cancer: LibreTexts; PNAS; Cancer.ca overview; PMC reviews.
  5. Southern blot concept and protocols: ConductScience; LibreTexts; Sigma‑Aldrich protocol.

Early public release. Content evolves through continuous review. CC BY 4.0 where applicable.