AI & Machine Learning in Oncology
Note: This page is educational and reflects the state of the literature in 2025. It does not replace medical advice.
TL;DR
Machine learning is now embedded across oncology — screening (lung LDCT, mammography), digital pathology, radiomics, genomic interpretation, drug discovery, trial matching, and outcome prediction. The technical advances are real, but most production deployments still face the same hard problems: distribution shift, label quality, calibration in subgroups, regulatory framing, and the gap between AUC on a held-out test set and improving outcomes for a real patient. This page is the orientation map; see ML pitfalls in oncology for the failure modes you must internalize.
1. Where ML actually has measurable impact today
| Domain | Examples (2024–2025) | Status |
|---|---|---|
| Lung cancer screening on LDCT | AI-driven nodule detection, malignancy risk scoring, end-to-end risk prediction[1] | Several FDA-cleared tools; integrated in clinical workflow at scale |
| Mammography triage | Worklist prioritization, missed-cancer reduction | Multiple cleared tools; deployed in screening programs |
| Digital pathology | Prostate Gleason assist, breast HER2/Ki-67 quantification, MSI prediction from H&E | Cleared tools; integration with LIS expanding |
| PD-L1 / TMB / TME prediction | ML scoring of immunotherapy benefit signals from images and omics[2] | Mostly research-stage; some IVD validation in progress |
| Genomic variant prediction | Pathogenicity scoring, splice impact, structural variant classification | Used widely in clinical interpretation pipelines |
| Drug discovery | Protein structure (AlphaFold), generative chemistry, property prediction | Reshaped early discovery; clinical translation lags |
| Trial matching | EHR/NLP → eligibility, ClinicalTrials.gov linkage | Several platforms in production |
| Outcome / risk prediction | Survival, treatment response, toxicity | Many models; few well-validated for deployment |
| Operations | Schedule optimization, no-show prediction, sepsis alerts | Common in academic centers |
For depth on the screening and immunotherapy threads, see refs [1] and [2].
2. Data modalities, briefly
- Genomics / multi-omics — VCF, expression matrices, methylation, variant interpretation. See From FASTQ to variants and Multi-omics.
- Imaging — radiology (CT, MRI, PET, US, mammography), pathology (WSI), endoscopy, dermatology.
- Free text — pathology reports, radiology reports, oncology notes, discharge summaries.
- Structured EHR — labs, vitals, medications, ICD codes, procedures.
- Trial / outcome registries — pre-registered endpoints, AE reports.
- Patient-reported outcomes — symptom diaries, ePROs, wearable signals.
Each modality has its own quirks: imaging needs preprocessing pipelines and acquisition-protocol awareness; genomics needs versioned reference and variant annotation; text needs strong de-identification and ontology grounding.
3. Model classes commonly used
| Model class | Where it shines |
|---|---|
| Gradient-boosted trees (XGBoost, LightGBM) | Tabular EHR data, structured features; strong baseline |
| CNNs | Medical imaging (still the dominant class in production) |
| Transformers / vision transformers | Whole-slide pathology, multi-modal fusion |
| Foundation models for imaging | RETFound, BiomedCLIP, pathology FMs — emerging |
| Foundation models for text | BioGPT, GatorTron, Med-PaLM — early operational uses |
| Graph neural networks | Drug discovery, network biology, patient similarity |
| Survival models | Cox-PH, DeepSurv, time-aware transformers |
| Diffusion models | Synthetic data, augmentation, generative chemistry |
| Reinforcement learning | Adaptive trial design, dose optimization (research) |
The model class often matters less than how the data is split, what the labels really mean, and how the model is evaluated in clinical workflow.
4. Honest evaluation
Three layers of evaluation, in increasing rigor:
- Discrimination — AUC, sensitivity/specificity at operating points.
- Calibration — does the model's predicted probability match observed frequency? Far more important than AUC for clinical use; far less reported.
- Clinical utility — does the model change a decision that improves an outcome? Decision-curve analysis, prospective deployment, randomized trials of AI vs. no AI.
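A calibration check is straightforward to compute and rarely reported. The sketch below bins predictions, compares mean predicted probability to observed event frequency per bin, and computes the Brier score. It is a minimal illustration with synthetic predictions, not a substitute for a proper reliability analysis on real validation data.

```python
# Minimal sketch: reliability (calibration) check by equal-width binning.
# The probabilities and labels below are synthetic, for illustration only.

def calibration_bins(probs, labels, n_bins=5):
    """Group (prob, label) pairs into equal-width bins; return per-bin
    (mean predicted probability, observed event frequency, count)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        i = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into last bin
        bins[i].append((p, y))
    out = []
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            obs = sum(y for _, y in b) / len(b)
            out.append((round(mean_p, 3), round(obs, 3), len(b)))
    return out

def brier_score(probs, labels):
    """Mean squared error between predicted probability and outcome (0/1)."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)

probs  = [0.1, 0.2, 0.15, 0.8, 0.7, 0.9, 0.4, 0.5, 0.3, 0.6]
labels = [0,   0,   0,    1,   1,   1,   0,   1,   0,   1]
print(calibration_bins(probs, labels))
print(round(brier_score(probs, labels), 3))
```

In a well-calibrated model, mean predicted probability and observed frequency track each other across bins; a large gap in any bin is exactly the kind of subgroup miscalibration the section warns about.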
Common reporting failures:
- AUC reported on the same site/scanner used for training (no external validation).
- Test set leakage (multiple slices from the same patient split across train/test).
- Class imbalance ignored (rare cancers always look "high accuracy" if you predict "no cancer").
- No subgroup analysis (sex, age, race/ethnicity, scanner manufacturer, geography).
- Threshold optimized post-hoc to maximize a single metric.
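The second failure above (patient-level leakage) is mechanical to prevent: assign whole patients, never rows, to train or test. A minimal sketch with synthetic patient IDs:

```python
# Minimal sketch: split by patient, not by row, so multiple slices from
# the same patient never straddle train and test. IDs are synthetic.
import random

def split_by_patient(rows, test_frac=0.3, seed=0):
    """rows: list of dicts with a 'patient_id' key. Assigns whole
    patients to train or test, then partitions rows accordingly."""
    patients = sorted({r["patient_id"] for r in rows})
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_test = max(1, int(len(patients) * test_frac))
    test_ids = set(patients[:n_test])
    train = [r for r in rows if r["patient_id"] not in test_ids]
    test = [r for r in rows if r["patient_id"] in test_ids]
    return train, test

# Five synthetic patients, four slices each:
rows = [{"patient_id": pid, "slice": s} for pid in "ABCDE" for s in range(4)]
train, test = split_by_patient(rows)
overlap = {r["patient_id"] for r in train} & {r["patient_id"] for r in test}
print(overlap)  # set() — no patient appears on both sides
```

The same grouping idea extends to splitting by site and by time period, which catches scanner and era effects that a random row split hides.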
Reference standards: TRIPOD-AI, CONSORT-AI, SPIRIT-AI for clinical AI reporting and trial design.
5. Regulatory and deployment context
- FDA (US) — SaMD framework; pre-cert pilot; Predetermined Change Control Plan (PCCP) for AI/ML lifecycle management.
- EU — MDR + AI Act (2024) — high-risk medical AI requires conformity assessment, transparency, human oversight, post-market surveillance.
- Brazil — ANVISA RDC 657/2022 and RDC 751/2022 cover SaMD; AI-specific guidance evolving; LGPD for data protection.
Deployment-time obligations beyond accuracy:
- Versioning and reproducibility — exact model + preprocessing reproducible from a tag.
- Monitoring for drift — input distribution and outcome calibration tracked over time.
- Safety logging — actionable alerts when performance degrades.
- User interface and decision support — show the model's confidence and uncertainty meaningfully.
- Human oversight — mandatory for high-risk recommendations.
- Recall and rollback — ability to disable or revert a model fast when problems are found.
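Drift monitoring from the list above can start very simply, e.g. with a Population Stability Index (PSI) on each model input against bin edges frozen at deployment time. This is a sketch; the 0.1 ("watch") and 0.25 ("act") thresholds are common rules of thumb, not standards.

```python
# Minimal sketch: Population Stability Index (PSI) for input drift,
# computed on bin edges frozen at deployment time.
import math

def psi(expected, actual, edges):
    """PSI between a baseline sample and a live sample, on fixed bins."""
    def frac(sample, lo, hi):
        n = sum(1 for x in sample if lo <= x < hi)
        return max(n / len(sample), 1e-6)  # floor to avoid log(0)
    score = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        e, a = frac(expected, lo, hi), frac(actual, lo, hi)
        score += (a - e) * math.log(a / e)
    return score

baseline = [i / 100 for i in range(100)]            # uniform on [0, 1)
shifted  = [min(x * 1.5, 0.999) for x in baseline]  # drifted live inputs
edges = [0.0, 0.25, 0.5, 0.75, 1.0]
print(round(psi(baseline, baseline, edges), 3))  # 0.0 — no drift
print(round(psi(baseline, shifted, edges), 3))   # above the 0.25 "act" level
```

Tracking outcome calibration over time requires outcome labels, which arrive with delay; input-distribution checks like this one are the early-warning layer.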
For the regulatory framing in detail, see Regulatory & ethics.
6. Fairness, equity, and the data-shift problem
Models trained on US/European populations frequently underperform on Brazilian, African, Asian, or Indigenous populations [2]. Causes:
- Distribution shift — different acquisition equipment, patient demographics, comorbidity patterns.
- Label bias — historical care disparities encoded as ground truth.
- Sampling bias — minority and low-resource populations are under-represented in training data.
Mitigations:
- Test on local data before deployment, period.
- Monitor subgroup performance, not just overall metrics.
- Recalibrate models for local populations when feasible.
- Build local datasets — Brazilian initiatives (e.g., A.C. Camargo, Albert Einstein, USP, INCA, ABRACE) are filling part of the gap.
- Federated learning where centralization isn't possible.
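Local recalibration, the third mitigation above, often needs nothing more than a Platt-style logistic rescaling of the existing model's scores fitted on a local validation set. The sketch below fits it with plain gradient descent on synthetic scores and labels; in practice you would use held-out local data and a library optimizer.

```python
# Minimal sketch: Platt-style recalibration of an existing model's raw
# scores against local outcomes. Scores and labels are synthetic.
import math

def fit_platt(scores, labels, lr=0.1, steps=2000):
    """Fit p = sigmoid(a*s + b) to local outcomes by minimizing log loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(steps):
        ga = gb = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            ga += (p - y) * s / n  # gradient of log loss w.r.t. a
            gb += (p - y) / n      # gradient of log loss w.r.t. b
        a -= lr * ga
        b -= lr * gb
    return a, b

def recalibrate(score, a, b):
    return 1.0 / (1.0 + math.exp(-(a * score + b)))

# A model that is systematically overconfident on the local population:
scores = [-2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5, 2, 2.5]
labels = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
a, b = fit_platt(scores, labels)
print(round(recalibrate(0.0, a, b), 3))  # pulled below 0.5 by the local fit
```

Recalibration changes the probability mapping without retraining the underlying model, which keeps the regulatory footprint of the change smaller than a full retrain.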
7. Best practices for technologists building oncology ML
- Read the protocol before writing code. Domain framing is the highest-leverage decision.
- Get a clinician on the team. Not as a stakeholder — as a co-developer.
- Prefer simple, well-calibrated baselines before reaching for foundation models.
- Hold out by patient, by site, by time — not by row.
- External validation is non-negotiable for clinical use.
- Calibration > AUC for clinical decision support.
- Plan for monitoring before you plan for deployment.
- Document everything the way a regulator would expect to read it (pipelines, data lineage, evaluation, change history).
- Measure clinical utility, not just statistical performance. A trial of "AI vs. no AI" is the gold standard.
- Read ML pitfalls in oncology before starting.
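Several of the practices above (subgroup monitoring, honest evaluation) reduce to the same habit: report operating-point metrics per subgroup, never only pooled. A minimal sketch with synthetic records, grouped here by site (any attribute — scanner, sex, age band — works the same way):

```python
# Minimal sketch: report sensitivity per subgroup instead of one pooled
# number. Records and the 'site' attribute are synthetic.
from collections import defaultdict

def sensitivity_by_group(records, threshold=0.5, key="site"):
    """records: dicts with 'prob', 'label', and a grouping attribute.
    Returns {group: sensitivity}, computed over positive cases only."""
    tp = defaultdict(int)
    pos = defaultdict(int)
    for r in records:
        if r["label"] == 1:
            pos[r[key]] += 1
            if r["prob"] >= threshold:
                tp[r[key]] += 1
    return {g: tp[g] / pos[g] for g in pos}

records = [
    {"site": "A", "prob": 0.9, "label": 1},
    {"site": "A", "prob": 0.7, "label": 1},
    {"site": "A", "prob": 0.2, "label": 0},
    {"site": "B", "prob": 0.4, "label": 1},  # missed at threshold 0.5
    {"site": "B", "prob": 0.6, "label": 1},
    {"site": "B", "prob": 0.1, "label": 0},
]
print(sensitivity_by_group(records))  # {'A': 1.0, 'B': 0.5}
```

A pooled sensitivity of 0.75 here would hide the fact that site B misses half its cancers at the chosen threshold — exactly the subgroup failure the checklist warns about.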
8. Common myths to push back on
- "Bigger model = better." Not for clinical use. Smaller, well-calibrated, locally validated models often outperform.
- "We can replace radiologists / pathologists." Augmentation is the realistic 5-year story; replacement is mostly not the goal nor possible.
- "Privacy isn't an issue if we de-identify." Genomic data are inherently re-identifiable; treat them like PHI.
- "Higher AUC always helps patients." Often not: calibration, threshold, and workflow integration matter more.
- "If it works at one site, it works everywhere." Almost never true.
See also
- Multi-omics
- Network biology
- Precision medicine
- ML pitfalls in oncology
- Biomarkers & companion diagnostics
- Regulatory & ethics
- Data governance & LGPD
References
1. Adams SJ, Mikhael P, Wohlwend J, et al. Artificial Intelligence and Machine Learning in Lung Cancer Screening. Thorac Surg Clin 2023;33:401-409. PMID 37806742. https://doi.org/10.1016/j.thorsurg.2023.03.001
2. Gao Q, Yang L, Lu M, Jin R, Ye H, Ma T. The artificial intelligence and machine learning in lung cancer immunotherapy. J Hematol Oncol 2023;16:55. PMID 37226190. https://doi.org/10.1186/s13045-023-01456-y
3. Meyer ML, Fitzgerald BG, Paz-Ares L, et al. New promises and challenges in the treatment of advanced non-small-cell lung cancer. Lancet 2024;404:803-822. PMID 39121882. https://doi.org/10.1016/S0140-6736(24)01029-8
4. U.S. National Cancer Institute. https://www.cancer.gov/about-cancer/understanding/what-is-cancer
5. American Cancer Society. https://www.cancer.org/cancer.html
6. Cleveland Clinic. Cancer (overview). https://my.clevelandclinic.org/health/diseases/12194-cancer
7. A.C. Camargo Cancer Center. https://accamargo.org.br
8. Fundação do Câncer (Brasil). https://www.cancer.org.br/
9. Ministério da Saúde / BVS. ABC do câncer. https://bvsms.saude.gov.br/bvs/publicacoes/abc_do_cancer.pdf
10. ANVISA — Agência Nacional de Vigilância Sanitária. https://www.gov.br/anvisa/pt-br