What is Cancer?
Cancer is fundamentally a disease of the genome. At its core, cancer occurs when normal cells acquire genetic alterations that allow them to grow and divide uncontrollably. Sources: [1]
Note: Information reflects 2025 standards. Numbers and code examples are illustrative; actual values vary by cancer type and study.
The Simple Definition
Cancer is a group of diseases characterized by uncontrolled cell growth and the ability to invade and spread to other parts of the body. Sources: [2]
Developers and data scientists, however, need a more precise, technical understanding.
The Technical Definition
As a teaching mnemonic for developers, cancer can be sketched as:
Normal cell + somatic/genetic alterations + selection over time + tissue environment -> malignant clone
This is not a formal mathematical model. It is a simplified pointer toward clonal evolution and multistep carcinogenesis: cells acquire heritable changes, variants with growth/survival advantages expand, and the tissue microenvironment shapes which clones persist. Sources: [3]
Key Characteristics
1. Genomic Instability
Cancer cells have unstable genomes, meaning they accumulate mutations faster than normal cells. Sources: [4]
```python
# Conceptual representation (illustrative; varies widely by tumor type)
normal_mutation_rate = 1e-10  # per base per cell division (approx.)
cancer_mutation_rate = 1e-8   # illustrative; two orders of magnitude higher in this example
```
Sources: [5]
2. Clonal Evolution
Cancer develops through a process similar to Darwinian evolution:
- Mutation occurs in a single cell
- Selection favors cells with growth advantages
- Expansion of the fittest clone
- Repeat the process
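The loop above can be sketched as a toy simulation. This is a minimal illustration, not a calibrated model: the function name `simulate_clonal_evolution`, the parameters `mutation_prob` and `advantage`, and the fixed population size are all assumptions made for this example.

```python
import random

def simulate_clonal_evolution(generations=200, population=10_000,
                              mutation_prob=0.05, advantage=0.1, seed=0):
    """Toy mutation -> selection -> expansion loop (hypothetical parameters).

    Returns {clone_id: [cell_count, fitness]}; clone 0 is the normal founder.
    """
    rng = random.Random(seed)
    clones = {0: [float(population), 1.0]}
    next_id = 1
    for _ in range(generations):
        # Mutation: occasionally one cell, drawn in proportion to clone size,
        # founds a new clone with a slightly higher fitness.
        if rng.random() < mutation_prob:
            ids = list(clones)
            sizes = [clones[i][0] for i in ids]
            parent = rng.choices(ids, weights=sizes)[0]
            if clones[parent][0] >= 1.0:
                clones[parent][0] -= 1.0
                clones[next_id] = [1.0, clones[parent][1] * (1 + advantage)]
                next_id += 1
        # Selection and expansion: each clone grows in proportion to its fitness,
        # then counts are rescaled to a fixed total (a crude stand-in for
        # the limits imposed by the tissue environment).
        for clone in clones.values():
            clone[0] *= clone[1]
        total = sum(clone[0] for clone in clones.values())
        for clone in clones.values():
            clone[0] *= population / total
    return clones

clones = simulate_clonal_evolution()
dominant = max(clones, key=lambda cid: clones[cid][0])
print(f"{len(clones)} clones; dominant clone id: {dominant}")
```

Because the total is held constant, a clone can only expand at the expense of others, which is the sense in which selection "favors" fitter cells here.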
3. Heterogeneity
Cancer is not a single disease but hundreds of different diseases, each with unique:
- Genetic profiles
- Behavior patterns
- Treatment responses
Types of Cancer
By Tissue of Origin
| Type | Origin | Examples |
|---|---|---|
| Carcinomas | Epithelial tissues | Breast, lung, colon cancer |
| Sarcomas | Connective tissues | Bone, muscle cancers |
| Leukemias | Blood cells | ALL, AML, CLL |
| Lymphomas | Lymphatic system | Hodgkin's, non-Hodgkin's |
By Genetic Profile
Modern cancer classification increasingly relies on molecular characteristics: Sources: [6]
Traditional: Lung Cancer
Modern: EGFR-mutant Lung Adenocarcinoma
The Data Perspective
For data scientists, cancer presents several interesting characteristics:
High-Dimensional Data
- Genomic: about 20,000 protein-coding genes, plus many non-coding RNAs, pseudogenes, regulatory elements, and structural features Sources: [7]
- Transcriptomic: Expression levels of all genes
- Proteomic: Protein abundance
- Clinical: Demographics, treatments, outcomes
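One way to keep these four layers together in code is a single record per sample. The sketch below is a hypothetical schema, not a standard format: the class name `TumorSample`, its field names, and the toy values are assumptions for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class TumorSample:
    """One tumor sample with the four data layers listed above (hypothetical schema)."""
    sample_id: str
    cancer_type: str
    mutated_genes: set = field(default_factory=set)        # genomic
    expression: dict = field(default_factory=dict)         # transcriptomic: gene -> level
    protein_abundance: dict = field(default_factory=dict)  # proteomic: protein -> abundance
    clinical: dict = field(default_factory=dict)           # demographics, treatments, outcomes

    def layers_present(self):
        """Return the names of data layers that contain at least one value."""
        layers = {
            "genomic": self.mutated_genes,
            "transcriptomic": self.expression,
            "proteomic": self.protein_abundance,
            "clinical": self.clinical,
        }
        return [name for name, data in layers.items() if data]

# Toy values, invented for this example
sample = TumorSample(
    sample_id="S001",
    cancer_type="lung_adenocarcinoma",
    mutated_genes={"EGFR"},
    expression={"EGFR": 8.4},
    clinical={"age": 61},
)
print(sample.layers_present())  # ['genomic', 'transcriptomic', 'clinical']
```

Note that this sketch treats an empty layer as absent. A real schema would need to distinguish "measured, nothing found" from "not measured".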
Sparse and Noisy
- Most genes are not mutated in any given cancer
- Measurement noise is significant
- Missing data is common
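To make the sparsity concrete, the sketch below stores mutation calls as a set of genes per sample and measures how much of the implied sample-by-gene matrix is non-zero. The sample names and calls are toy data invented for this example, and `matrix_density` and `gene_frequencies` are hypothetical helper names.

```python
from collections import Counter

# Toy data: sample -> set of genes with a called mutation
mutations = {
    "sample_A": {"TP53", "KRAS"},
    "sample_B": {"TP53"},
    "sample_C": {"EGFR"},
    "sample_D": set(),  # no calls: truly unmutated, or simply not assessed?
}
N_GENES = 20_000  # approximate protein-coding gene count used in the text

def matrix_density(calls, n_genes):
    """Fraction of non-zero entries in the implied sample x gene matrix."""
    n_nonzero = sum(len(genes) for genes in calls.values())
    return n_nonzero / (len(calls) * n_genes)

def gene_frequencies(calls):
    """Fraction of samples in which each gene is mutated."""
    counts = Counter(gene for genes in calls.values() for gene in genes)
    return {gene: n / len(calls) for gene, n in counts.items()}

print(f"density: {matrix_density(mutations, N_GENES):.6f}")
print(gene_frequencies(mutations))
```

Storing only the mutated genes avoids materializing a matrix that is almost entirely zeros, but it also hides the difference between "no mutation" and "no data", which matters when missing data is common.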
Heterogeneous
- Different cancer types have different patterns
- Even within types, significant variation exists
- Batch effects are common in genomic data
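As a minimal illustration of what a batch effect looks like and the simplest possible adjustment, the sketch below centers one gene's expression values within each batch. The values and batch labels are invented, and `center_by_batch` is a hypothetical helper; per-batch mean centering is a deliberately naive technique shown for intuition, not a recommended correction method.

```python
from collections import defaultdict
from statistics import mean

def center_by_batch(values, batches):
    """Subtract each batch's mean from its samples (one gene at a time)."""
    by_batch = defaultdict(list)
    for value, batch in zip(values, batches):
        by_batch[batch].append(value)
    batch_mean = {batch: mean(vals) for batch, vals in by_batch.items()}
    return [value - batch_mean[batch] for value, batch in zip(values, batches)]

# Toy expression values for one gene, measured in two sequencing runs
expression = [5.0, 5.2, 4.8, 7.1, 6.9, 7.0]
batch = ["run1", "run1", "run1", "run2", "run2", "run2"]

centered = center_by_batch(expression, batch)
print([round(x, 2) for x in centered])
```

If batch is confounded with biology (for example, all samples of one cancer type were processed in one run), centering like this also removes the real biological difference, so the study design needs checking before any adjustment.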
Why This Matters for Your Code
Understanding cancer biology helps you:
- Choose appropriate features for machine learning models
- Interpret results in biological context
- Validate findings against known biology
- Communicate effectively with domain experts
Example: Mutation Burden
Different cancer types have vastly different mutation burdens. The numbers below are rough, illustrative central tendencies, not fixed values for every tumor; within one cancer type, TMB can span orders of magnitude. Sources: [8]
```python
mutation_burden = {
    'melanoma': '~10-20+',       # mutations per megabase; often high but highly variable
    'lung_cancer': '~5-15+',
    'breast_cancer': '~1-3',
    'pediatric_cancer': '<1-2',
}
```
This biological knowledge helps explain why:
- Immunotherapy often works better in high‑mutation cancers Sources: [9]
- Different sequencing depths are needed
- Statistical power varies by cancer type
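For reference, tumor mutation burden (TMB) is a simple ratio: somatic mutations divided by the megabases of sequence that were assessed. The sketch below shows that arithmetic. The function names and the example counts are assumptions, and the cutoff of 10 mutations/Mb is used only as an illustrative default; appropriate thresholds depend on the assay and the cancer type.

```python
def tumor_mutation_burden(n_somatic_mutations, covered_bases):
    """Mutations per megabase of sequenced territory."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    return n_somatic_mutations / (covered_bases / 1_000_000)

def is_high_tmb(tmb, cutoff=10.0):
    """Flag a sample as high-TMB; the default cutoff is an assumption."""
    return tmb >= cutoff

# Hypothetical example: 450 somatic mutations across 30 Mb of covered sequence
tmb = tumor_mutation_burden(450, 30_000_000)
print(f"TMB = {tmb:.1f} mut/Mb, high: {is_high_tmb(tmb)}")
```

The denominator matters: the same tumor sequenced on a small panel yields far fewer counted mutations than on a whole exome, so the estimate is noisier, which is one reason statistical power differs between assays and cancer types.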
Next Steps
Now that you understand what cancer is, let's explore:
- The Genetic Basis of Cancer - How DNA changes drive cancer
- Cancer Hallmarks - The key characteristics of cancer cells
- Tumor Progression - How cancer develops over time
Key Takeaways
- Cancer is a genetic disease caused by accumulated mutations Sources: [1]
- It's not one disease but hundreds of different diseases
- Understanding the biology is crucial for effective data analysis
- The genomic perspective provides a framework for computational approaches
References
- [1] Hanahan, D., & Weinberg, R.A. (2011). Hallmarks of cancer: The next generation. Cell, 144(5), 646–674. doi:10.1016/j.cell.2011.02.013.
- [2] National Cancer Institute / WHO patient materials definitions of cancer.
- [3] Nowell PC. The clonal evolution of tumor cell populations. Science. 1976;194(4260):23-28. doi:10.1126/science.959840.
- [4] Negrini, S., Gorgoulis, V.G., & Halazonetis, T.D. (2010). Genomic instability—an evolving hallmark of cancer. Nature Reviews Molecular Cell Biology, 11, 220–228. https://www.nature.com/articles/nrm2858
- [5] Martincorena I, Raine KM, Gerstung M, et al. Universal patterns of selection in cancer and somatic tissues. Cell. 2017;171(5):1029-1041.e21. doi:10.1016/j.cell.2017.09.042; Lynch M. Rate, molecular spectrum, and consequences of human mutation. PNAS. 2010;107(3):961-968. https://pmc.ncbi.nlm.nih.gov/articles/PMC2914198/
- [6] EGFR‑mutant lung adenocarcinoma as a precision oncology subtype (TLCR 2022): https://tlcr.amegroups.com/article/view/83462/html.
- [7] Harrow J, Frankish A, Gonzalez JM, et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Research. 2012;22(9):1760-1774. doi:10.1101/gr.135350.111.
- [8] Frontiers in Immunology (2023) high‑TMB distributions across TCGA: https://pmc.ncbi.nlm.nih.gov/articles/PMC9998480/; Lawrence, M.S., et al. (2013/2016) TCGA mutation burden patterns: https://pmc.ncbi.nlm.nih.gov/articles/PMC4930685/
- [9] Journal for ImmunoTherapy of Cancer (2025) 13:e010311 (8440 pts) https://jitc.bmj.com/content/13/2/e010311; Nature Medicine (2023) https://www.nature.com/articles/s41591-022-02163-w; Meta‑analysis (2022) https://pmc.ncbi.nlm.nih.gov/articles/PMC8956924/