
What is Cancer?

Cancer is fundamentally a disease of the genome. At its core, cancer occurs when normal cells acquire genetic alterations that allow them to grow and divide uncontrollably. Sources: [1]

Note: Information reflects 2025 standards. Numbers and code examples are illustrative; actual values vary by cancer type and study.

The Simple Definition

Cancer is a group of diseases characterized by uncontrolled cell growth and the ability to invade and spread to other parts of the body. Sources: [2]

But for developers and data scientists, we need a more precise, technical understanding.

The Technical Definition

As a teaching mnemonic for developers, cancer can be sketched as:

Normal cell + somatic/genetic alterations + selection over time + tissue environment -> malignant clone

This is not a formal mathematical model. It is a simplified pointer toward clonal evolution and multistep carcinogenesis: cells acquire heritable changes, variants with growth/survival advantages expand, and the tissue microenvironment shapes which clones persist. Sources: [3]

Key Characteristics

1. Genomic Instability

Many cancer cells have unstable genomes, meaning they acquire mutations and chromosomal changes faster than normal cells. Sources: [4]

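```python
# Conceptual representation (illustrative; rates vary widely by tumor type). Sources: [5]
normal_mutation_rate = 1e-10  # per base pair per cell division (approximate)
cancer_mutation_rate = 1e-8   # illustrative elevated rate: 100x (two orders of magnitude) higher
fold_increase = cancer_mutation_rate / normal_mutation_rate  # ~100
```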
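The next sketch shows what per-base rates imply for a whole genome. It multiplies each illustrative rate above by the number of bases copied per division. It assumes a haploid human genome of roughly 3.1 billion base pairs (a diploid cell copies about twice that) and ignores DNA repair, selection, and variation across the genome. `expected_mutations` is a hypothetical helper, not a library function.

```python
# Hypothetical helper: expected new point mutations = rate x bases copied x divisions.
# Assumption: ~3.1e9 bp haploid human genome; a diploid cell copies about twice this.
GENOME_SIZE_BP = 3.1e9

def expected_mutations(rate_per_bp: float, divisions: int = 1,
                       genome_bp: float = GENOME_SIZE_BP) -> float:
    """Expected (mean) number of new point mutations; no randomness is modeled."""
    return rate_per_bp * genome_bp * divisions

normal_mutation_rate = 1e-10  # illustrative rates from the block above
cancer_mutation_rate = 1e-8

for label, rate in [("normal", normal_mutation_rate), ("cancer", cancer_mutation_rate)]:
    n = expected_mutations(rate, divisions=100)
    print(f"{label}: ~{n:.0f} new mutations over 100 divisions")
```

Under these assumptions, 100 divisions give about 31 new mutations at the normal rate and about 3,100 at the elevated rate. The hundredfold rate difference carries straight through to the counts.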

2. Clonal Evolution

Cancer develops through a process similar to Darwinian evolution:

  1. Mutation occurs in a single cell
  2. Selection favors cells with growth advantages
  3. Expansion of the fittest clone
  4. Repeat the process
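The four steps above can be sketched as a toy, deterministic simulation. Everything here is an assumption for illustration:

  • The `Clone` class and `evolve` function are hypothetical.
  • Advantageous mutations arrive on a fixed schedule rather than at random.
  • The tissue holds a fixed number of cells.

Real tumors also accumulate neutral and harmful changes and face spatial and immune constraints. The code requires Python 3.9+ for the `list[Clone]` annotation.

```python
from dataclasses import dataclass

@dataclass
class Clone:
    fitness: float  # relative growth rate (1.0 = normal cells)
    cells: float    # clone size, kept as a float in this deterministic sketch

def evolve(generations: int, capacity: float = 1_000_000,
           mutation_every: int = 10, fitness_gain: float = 1.2) -> list[Clone]:
    """Toy clonal-evolution model with hypothetical parameters."""
    clones = [Clone(fitness=1.0, cells=capacity)]
    for gen in range(1, generations + 1):
        # 1. Mutation: periodically, one cell of the fittest clone gains an advantage.
        if gen % mutation_every == 0:
            parent = max(clones, key=lambda c: c.fitness)
            if parent.cells >= 1:
                parent.cells -= 1
                clones.append(Clone(fitness=parent.fitness * fitness_gain, cells=1.0))
        # 2. Selection: each clone grows in proportion to its fitness.
        weighted = [c.cells * c.fitness for c in clones]
        total = sum(weighted)
        # 3. Expansion under fixed tissue capacity: rescale so the total stays constant.
        for clone, w in zip(clones, weighted):
            clone.cells = w / total * capacity
    # 4. Repeat: handled by the loop.
    return clones

clones = evolve(generations=150)
for c in clones:
    print(f"fitness {c.fitness:.2f}: {c.cells / 1_000_000:.1%} of cells")
```

With these settings, the original fitness-1.0 clone's share falls to near zero as fitter descendants expand. Changing `fitness_gain` or `mutation_every` changes how quickly each new clone takes over.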

3. Heterogeneity

Cancer is not a single disease but hundreds of different diseases, each with unique:

  • Genetic profiles
  • Behavior patterns
  • Treatment responses

Types of Cancer

By Tissue of Origin

| Type | Tissue of origin | Examples |
|---|---|---|
| Carcinomas | Epithelial tissues | Breast, lung, colon cancer |
| Sarcomas | Connective tissues | Bone, muscle cancers |
| Leukemias | Blood-forming cells (bone marrow) | ALL, AML, CLL |
| Lymphomas | Lymphatic system | Hodgkin, non-Hodgkin lymphoma |

By Genetic Profile

Modern cancer classification increasingly relies on molecular characteristics: Sources: [6]

Traditional: Lung Cancer
Modern: EGFR-mutant Lung Adenocarcinoma
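In code, the shift from organ-based to molecular labels often means storing driver alterations alongside histology instead of a single free-text string. The `TumorLabel` class below is a hypothetical schema for illustration. It is not a standard such as the WHO classification or any existing library's data model.

```python
from dataclasses import dataclass, field

@dataclass
class TumorLabel:
    organ: str      # traditional, site-based label, e.g. "Lung"
    histology: str  # e.g. "Lung Adenocarcinoma"
    drivers: dict[str, str] = field(default_factory=dict)  # gene -> alteration class

    def traditional(self) -> str:
        return f"{self.organ} Cancer"

    def molecular(self) -> str:
        # Prefix histology with sorted driver alterations, e.g. "EGFR-mutant".
        prefix = " ".join(f"{gene}-{status}" for gene, status in sorted(self.drivers.items()))
        return f"{prefix} {self.histology}".strip()

tumor = TumorLabel(organ="Lung", histology="Lung Adenocarcinoma", drivers={"EGFR": "mutant"})
print(tumor.traditional())  # Lung Cancer
print(tumor.molecular())    # EGFR-mutant Lung Adenocarcinoma
```

Keeping drivers as structured fields lets you filter or stratify cohorts by alteration without parsing label strings.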

The Data Perspective

For data scientists, cancer presents several interesting characteristics:

High-Dimensional Data

  • Genomic: about 20,000 protein-coding genes, plus many non-coding RNAs, pseudogenes, regulatory elements, and structural features Sources: [7]
  • Transcriptomic: Expression levels of all genes
  • Proteomic: Protein abundance
  • Clinical: Demographics, treatments, outcomes
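Because of the gene count alone, most cancer datasets have far more features than samples (p >> n). This sketch uses simulated random values in place of real expression data, and the cohort and gene counts are illustrative assumptions. It applies variance-based filtering, which is one common first step for reducing dimensionality, though not the only or always the best choice.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Simulated placeholder data (not real measurements): n samples << p genes.
n_samples, n_genes = 200, 20_000
expression = rng.lognormal(mean=2.0, sigma=1.0, size=(n_samples, n_genes))

# Log-transform, then keep the most variable genes to shrink the feature space.
log_expr = np.log2(expression + 1.0)
top_k = 2_000
top_idx = np.argsort(log_expr.var(axis=0))[::-1][:top_k]
reduced = log_expr[:, top_idx]

print(expression.shape, "->", reduced.shape)  # (200, 20000) -> (200, 2000)
```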

Sparse and Noisy

  • Most genes are not mutated in any given cancer
  • Measurement noise is significant
  • Missing data is common
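Sparsity and missing values both show up directly in array code. The gene names and values below are hypothetical toy data. Mean imputation is used only as a simple baseline, not a recommendation for real analyses.

```python
import numpy as np

# Hypothetical binary mutation matrix: rows = tumors, columns = genes (1 = mutated).
genes = ["TP53", "KRAS", "EGFR", "BRAF", "PIK3CA"]
mutations = np.array([
    [1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
])
density = mutations.mean()  # fraction of tumor-gene pairs that are mutated
print(f"{density:.0%} of entries are non-zero")  # 20% of entries are non-zero

# Hypothetical expression values with missing measurements (NaN).
expr = np.array([
    [5.1, np.nan, 2.0],
    [4.8, 3.3, np.nan],
    [np.nan, 3.1, 2.4],
])
col_means = np.nanmean(expr, axis=0)  # NaN-aware means ignore missing entries
imputed = np.where(np.isnan(expr), col_means, expr)  # baseline mean imputation
```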

Heterogeneous

  • Different cancer types have different patterns
  • Even within types, significant variation exists
  • Batch effects are common in genomic data
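A batch effect is a systematic shift shared by samples processed together. The sketch below uses the simplest possible adjustment, subtracting each batch's per-feature mean. The function name and the data are hypothetical. More established methods such as ComBat model batch effects more carefully. Naive centering also removes real biology whenever biology and batch are confounded, for example when all tumors were run in one batch and all normals in another.

```python
import numpy as np

def center_by_batch(values: np.ndarray, batches: np.ndarray) -> np.ndarray:
    """Subtract each batch's per-feature mean (a naive batch-effect adjustment).

    values: samples x features; batches: one batch label per sample.
    """
    corrected = values.astype(float).copy()
    for batch in np.unique(batches):
        mask = batches == batch
        corrected[mask] -= corrected[mask].mean(axis=0)
    return corrected

# Hypothetical data: batch "B" reads systematically 2 units higher than batch "A".
values = np.array([[1.0], [2.0], [3.0], [3.0], [4.0], [5.0]])
batches = np.array(["A", "A", "A", "B", "B", "B"])
print(center_by_batch(values, batches).ravel())
```

After centering, both batches are centered at zero and the within-batch differences are preserved.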

Why This Matters for Your Code

Understanding cancer biology helps you:

  1. Choose appropriate features for machine learning models
  2. Interpret results in biological context
  3. Validate findings against known biology
  4. Communicate effectively with domain experts

Example: Mutation Burden

Different cancer types have vastly different mutation burdens. The numbers below are rough, illustrative central tendencies, not fixed values for every tumor; within one cancer type, TMB can span orders of magnitude. Sources: [8]

```python
# Approximate tumor mutational burden (TMB), in mutations per megabase (mut/Mb)
mutation_burden = {
    'melanoma': '~10-20+',      # often high but highly variable
    'lung_cancer': '~5-15+',
    'breast_cancer': '~1-3',
    'pediatric_cancer': '<1-2',
}
```
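TMB is typically reported as somatic mutations per megabase of sequenced territory. This sketch shows only that arithmetic, using a hypothetical mutation count and covered size. Real pipelines differ in several ways:

  • which variants they count (often non-synonymous coding variants only)
  • how they remove germline variants
  • how large the panel or exome footprint is

The 10 mut/Mb cut-off mirrors the threshold in the 2020 FDA tumor-agnostic approval of pembrolizumab for TMB-high solid tumors. In practice, thresholds are assay-specific.

```python
def tumor_mutational_burden(somatic_mutations: int, covered_megabases: float) -> float:
    """TMB in mutations per megabase (mut/Mb)."""
    if covered_megabases <= 0:
        raise ValueError("covered_megabases must be positive")
    return somatic_mutations / covered_megabases

TMB_HIGH_CUTOFF = 10.0  # mut/Mb; assay-specific in practice

# Hypothetical sample: 350 qualifying somatic mutations over a 35 Mb exome footprint.
tmb = tumor_mutational_burden(350, 35.0)
print(f"TMB = {tmb:.1f} mut/Mb, TMB-high: {tmb >= TMB_HIGH_CUTOFF}")  # TMB = 10.0 mut/Mb, TMB-high: True
```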

This biological knowledge helps explain why:

  • Immunotherapy often works better in high‑mutation cancers Sources: [9]
  • Sequencing designs (depth, panel size) differ in how precisely they can measure low versus high burdens
  • Statistical power varies by cancer type
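Depth matters because detection sensitivity depends on both coverage and the variant allele fraction (VAF). VAF is low for subclonal variants in heterogeneous or impure tumors. The sketch below uses a simple binomial model of read sampling. The function, the 5% VAF, and the five-read calling rule are hypothetical choices, and the model ignores sequencing errors and coverage variability.

```python
from math import comb

def detection_probability(depth: int, vaf: float, min_alt_reads: int) -> float:
    """P(at least `min_alt_reads` variant-supporting reads) under a binomial model.

    Assumes each read independently carries the variant with probability `vaf`;
    ignores sequencing errors, mapping bias, and uneven coverage.
    """
    p_below = sum(comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
                  for k in range(min_alt_reads))
    return 1.0 - p_below

# Hypothetical subclonal variant in 5% of reads, called with >= 5 supporting reads.
for depth in (30, 100, 500):
    print(depth, round(detection_probability(depth, vaf=0.05, min_alt_reads=5), 3))
```

Under this model, the variant is usually missed at 30x and almost always detected at 500x.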

Next Steps

Now that you understand what cancer is, let's explore:

Key Takeaways

  • Cancer is a genetic disease driven by accumulated genetic alterations, shaped by selection and the tissue environment Sources: [1]
  • It's not one disease but hundreds of different diseases
  • Understanding the biology is crucial for effective data analysis
  • The genomic perspective provides a framework for computational approaches

References

  1. Hanahan D, Weinberg RA. Hallmarks of cancer: the next generation. Cell. 2011;144(5):646-674. doi:10.1016/j.cell.2011.02.013.
  2. National Cancer Institute / WHO patient materials definitions of cancer.
  3. Nowell PC. The clonal evolution of tumor cell populations. Science. 1976;194(4260):23-28. doi:10.1126/science.959840.
  4. Negrini S, Gorgoulis VG, Halazonetis TD. Genomic instability--an evolving hallmark of cancer. Nature Reviews Molecular Cell Biology. 2010;11:220-228. https://www.nature.com/articles/nrm2858
  5. Martincorena I, Raine KM, Gerstung M, et al. Universal patterns of selection in cancer and somatic tissues. Cell. 2017;171(5):1029-1041.e21. doi:10.1016/j.cell.2017.09.042; Lynch M. Rate, molecular spectrum, and consequences of human mutation. PNAS. 2010;107(3):961-968. https://pmc.ncbi.nlm.nih.gov/articles/PMC2914198/
  6. EGFR‑mutant lung adenocarcinoma as a precision oncology subtype (TLCR 2022): https://tlcr.amegroups.com/article/view/83462/html.
  7. Harrow J, Frankish A, Gonzalez JM, et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Research. 2012;22(9):1760-1774. doi:10.1101/gr.135350.111.
  8. Frontiers in Immunology (2023) high‑TMB distributions across TCGA: https://pmc.ncbi.nlm.nih.gov/articles/PMC9998480/; Lawrence, M.S., et al. (2013/2016) TCGA mutation burden patterns: https://pmc.ncbi.nlm.nih.gov/articles/PMC4930685/
  9. Journal for ImmunoTherapy of Cancer (2025) 13:e010311 (8440 pts) https://jitc.bmj.com/content/13/2/e010311; Nature Medicine (2023) https://www.nature.com/articles/s41591-022-02163-w; Meta‑analysis (2022) https://pmc.ncbi.nlm.nih.gov/articles/PMC8956924/

Early public release. Content evolves through continuous review. CC BY 4.0 where applicable.