Introduction to DNA
Deoxyribonucleic acid (DNA) is the molecule that carries genetic information for the development, functioning, growth, and reproduction of all known organisms and many viruses. Sources: [1]
Note: Information reflects 2025 standards. Verify references and database URLs periodically.
DNA Structure
DNA is a double helix: two polynucleotide strands that run in opposite directions (antiparallel), coil around each other, and are held together by hydrogen bonds between paired bases. Sources: [2]
Nucleotide Composition
Each nucleotide is composed of three parts:
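- Phosphate Group: Links the sugars of adjacent nucleotides through phosphodiester bonds
- Sugar (Deoxyribose): Forms the backbone of the chain together with the phosphate groups
- Nitrogenous Base: Encodes the genetic information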
Nitrogenous Bases
There are four types of nitrogenous bases in DNA: Sources: [3]
- Adenine (A): Pairs with Thymine (T)
- Guanine (G): Pairs with Cytosine (C)
- Cytosine (C): Pairs with Guanine (G)
- Thymine (T): Pairs with Adenine (A)
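The pairing rules above are enough to derive one strand from the other. The following is a minimal sketch, assuming sequences contain only the four canonical bases; the names `COMPLEMENT`, `complement`, and `reverse_complement` are hypothetical helpers written for this example, not part of any library.

```python
# Illustrative sketch: helper names are hypothetical, not a library API.
# Translation table encoding the pairing rules A<->T and G<->C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    strand = strand.upper()
    unexpected = set(strand) - set("ACGT")
    if unexpected:
        raise ValueError(f"Unexpected bases: {sorted(unexpected)}")
    return strand.translate(COMPLEMENT)


def reverse_complement(strand: str) -> str:
    """Return the complement read in the opposite direction.

    The two strands of the helix run in opposite directions, so the
    partner strand is conventionally written reversed.
    """
    return complement(strand)[::-1]


print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
```

Applying `reverse_complement` twice returns the original sequence, which is a quick sanity check on the pairing table.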
DNA and Cancer
DNA damage can lead to mutations, which are changes in the nucleotide sequence. Several aspects link mutations to cancer: Sources: [4]
- Mutations in Regulatory Genes: Disrupt control of the cell cycle
- Accumulation of Mutations: Drives malignant transformation
- Repair Mechanisms: Protect against DNA damage
- Genomic Instability: A characteristic of cancer cells
Computational Applications
Sequence Analysis
Install Biopython once in your environment:
```bash
pip install biopython
```

```python
# NOTE: Educational example only; not a validated analysis pipeline.
# Assumes Python 3.10+ (for the `str | Path` annotation) and a Biopython
# release that provides Bio.SeqUtils.gc_fraction.
from collections import Counter
from pathlib import Path

from Bio import SeqIO
from Bio.SeqUtils import gc_fraction

EXAMPLE_FASTA = Path("gene.fasta")


def write_example_fasta(path: Path = EXAMPLE_FASTA) -> None:
    """Create a tiny FASTA file so the example is runnable end-to-end."""
    path.write_text(
        ">gene_example\n"
        "ATGCGTACGTACGTACGTAGCTAGCTAGCTACGTAGCTACGTAGCATGCTAGCTAGCTAGC\n"
        "TAGCTAGCTAGCTAGCTAGCTAGCTACGTAGCTAGCTAGCTAGCTAGCTACGTAGCTAGCTAGC\n",
        encoding="utf-8",
    )


def analyze_dna_sequence(sequence_file: str | Path) -> dict:
    """Analyze a FASTA DNA sequence for basic educational statistics."""
    record = SeqIO.read(str(sequence_file), "fasta")
    dna_seq = str(record.seq).upper()

    # Reject anything other than the four bases and the ambiguity code N.
    allowed_bases = {"A", "C", "G", "T", "N"}
    invalid_bases = sorted(set(dna_seq) - allowed_bases)
    if invalid_bases:
        raise ValueError(f"Unexpected FASTA bases: {', '.join(invalid_bases)}")

    base_counts = Counter(dna_seq)
    canonical_length = sum(base_counts[base] for base in "ACGT")
    gc_content = gc_fraction(dna_seq) * 100

    # Ratios are None when the denominator base is absent.
    at_ratio = base_counts["A"] / base_counts["T"] if base_counts["T"] else None
    gc_ratio = base_counts["G"] / base_counts["C"] if base_counts["C"] else None

    return {
        "record_id": record.id,
        "length": len(dna_seq),
        "canonical_base_count": canonical_length,
        "gc_content": gc_content,
        "base_counts": dict(base_counts),
        "at_ratio": at_ratio,
        "gc_ratio": gc_ratio,
    }


write_example_fasta()
results = analyze_dna_sequence(EXAMPLE_FASTA)
print(f"Record: {results['record_id']}")
print(f"Length: {results['length']} bp")
print(f"GC content: {results['gc_content']:.1f}%")
print(
    f"A/T ratio: {results['at_ratio']:.2f}"
    if results["at_ratio"] is not None
    else "A/T ratio: n/a"
)
```

Mutation Detection
```python
# NOTE: Educational example only; does not handle indels or alignment.
def detect_mutations(reference_seq, sample_seq):
    """Detect single-base substitutions between two equal-length sequences."""
    # zip() would silently truncate the longer sequence, so check lengths first.
    if len(reference_seq) != len(sample_seq):
        raise ValueError("Sequences must have the same length (no indel support).")

    mutations = []
    # Compare position by position
    for i, (ref_base, sample_base) in enumerate(zip(reference_seq, sample_seq)):
        if ref_base != sample_base:
            mutation = {
                'position': i + 1,  # 1-based position
                'reference': ref_base,
                'sample': sample_base,
                'type': classify_mutation(ref_base, sample_base)
            }
            mutations.append(mutation)
    return mutations


def classify_mutation(ref_base, sample_base):
    """Classify a substitution as a transition or a transversion."""
    purines = {'A', 'G'}
    pyrimidines = {'C', 'T'}

    # Bases outside A/C/G/T (for example N) cannot be classified.
    if not {ref_base, sample_base} <= (purines | pyrimidines):
        return "Unknown"

    # Transitions (purine ↔ purine, pyrimidine ↔ pyrimidine)
    if (ref_base in purines and sample_base in purines) or (
        ref_base in pyrimidines and sample_base in pyrimidines
    ):
        return "Transition"
    # Otherwise a purine was exchanged for a pyrimidine or vice versa.
    return "Transversion"


# Example
ref = "ATGCGATCGATCG"
sample = "ATGCGCTCGATCG"
mutations = detect_mutations(ref, sample)
for mut in mutations:
    print(f"Position {mut['position']}: {mut['reference']} → {mut['sample']} ({mut['type']})")
```

Analysis Techniques
- Sequencing: Determination of base order
- PCR: Amplification of specific regions
- Southern Blot: Detection of specific sequences Sources: [5]
- Hybridization: Identification of complementary sequences
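Hybridization relies on a probe pairing with a complementary stretch of the target. The sketch below illustrates only that idea, assuming perfect matches and sequences made of the four canonical bases; `find_probe_sites` and the example probe are hypothetical and do not model mismatches, temperature, or salt conditions.

```python
# Illustrative sketch only: perfect-match search, no thermodynamics.
# The function name and the example probe are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_probe_sites(target: str, probe: str) -> list[int]:
    """Return 1-based positions on `target` where `probe` can pair perfectly.

    A probe pairs with the stretch of the target that equals the probe's
    reverse complement, so that string is what we search for.
    """
    if not probe:
        raise ValueError("Probe must not be empty.")
    target = target.upper()
    site = probe.upper().translate(COMPLEMENT)[::-1]

    positions = []
    start = target.find(site)
    while start != -1:
        positions.append(start + 1)
        start = target.find(site, start + 1)  # step by one to allow overlaps
    return positions


target = "ATGCGATCGATCG"
probe = "CGAT"  # reverse complement is ATCG
print(find_probe_sites(target, probe))  # [6, 10]
```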
Learning Resources
- Books: Watson et al., Molecular Biology of the Gene
- Courses: Khan Academy (DNA structure and replication); Coursera Bioinformatics Specialization
- Tools: NCBI, Ensembl, DDBJ
Mutations that occur in genes controlling the cell cycle can lead to cancer development.
References
- [1] DNA definition and role: Wikipedia DNA entry; NCBI Molecular Biology textbook; Visible Body overview.
- [2] DNA double helix model: Watson–Crick and modern summaries (NCBI textbook; Nature Scitable overview).
- [3] Nucleotide components and base pairing rules: Genome.gov glossary; Wikipedia base pair; LibreTexts DNA structure.
- [4] Mutations, proto‑oncogenes/tumor suppressors, and cancer: LibreTexts; PNAS; Cancer.ca overview; PMC reviews.
- [5] Southern blot concept and protocols: ConductScience; LibreTexts; Sigma‑Aldrich protocol.