
AI & Machine Learning in Oncology

Note: This page is educational and reflects the state of the literature in 2025. It does not replace medical advice.

TL;DR

Machine learning is now embedded across oncology — screening (lung LDCT, mammography), digital pathology, radiomics, genomic interpretation, drug discovery, trial matching, and outcome prediction. The technical advances are real, but most production deployments still face the same hard problems: distribution shift, label quality, calibration in subgroups, regulatory framing, and the gap between AUC on a held-out test set and improving outcomes for a real patient. This page is the orientation map; see ML pitfalls in oncology for the failure modes you must internalize.


1. Where ML actually has measurable impact today

| Domain | Examples (2024–2025) | Status |
| --- | --- | --- |
| Lung cancer screening on LDCT | AI-driven nodule detection, malignancy risk scoring, end-to-end risk prediction [1] | Several FDA-cleared tools; integrated in clinical workflows at scale |
| Mammography triage | Worklist prioritization, missed-cancer reduction | Multiple cleared tools; deployed in screening programs |
| Digital pathology | Prostate Gleason assist, breast HER2/Ki-67 quantification, MSI prediction from H&E | Cleared tools; integration with LIS expanding |
| PD-L1 / TMB / TME prediction | ML scoring of immunotherapy benefit signals from images and omics [2] | Mostly research-stage; some IVD validation in progress |
| Genomic variant prediction | Pathogenicity scoring, splice impact, structural variant classification | Used widely in clinical interpretation pipelines |
| Drug discovery | Protein structure (AlphaFold), generative chemistry, property prediction | Reshaped early discovery; clinical translation lags |
| Trial matching | EHR/NLP → eligibility, ClinicalTrials.gov linkage | Several platforms in production |
| Outcome / risk prediction | Survival, treatment response, toxicity | Many models; few well validated for deployment |
| Operations | Schedule optimization, no-show prediction, sepsis alerts | Common in academic centers |

For depth on the screening and immunotherapy threads, see refs [1] and [2].


2. Data modalities, briefly

  • Genomics / multi-omics — VCF, expression matrices, methylation, variant interpretation. See From FASTQ to variants and Multi-omics.
  • Imaging — radiology (CT, MRI, PET, US, mammography), pathology (WSI), endoscopy, dermatology.
  • Free text — pathology reports, radiology reports, oncology notes, discharge summaries.
  • Structured EHR — labs, vitals, medications, ICD codes, procedures.
  • Trial / outcome registries — pre-registered endpoints, AE reports.
  • Patient-reported outcomes — symptom diaries, ePROs, wearable signals.

Each modality has its own quirks: imaging needs preprocessing pipelines and acquisition-protocol awareness; genomics needs versioned reference genomes and variant annotation; text needs strong de-identification and ontology grounding.


3. Model classes commonly used

| Model class | Where it shines |
| --- | --- |
| Gradient-boosted trees (XGBoost, LightGBM) | Tabular EHR data, structured features; strong baseline |
| CNNs | Medical imaging (still the dominant class in production) |
| Transformers / vision transformers | Whole-slide pathology, multi-modal fusion |
| Foundation models for imaging | RETFound, BiomedCLIP, pathology FMs — emerging |
| Foundation models for text | BioGPT, GatorTron, Med-PaLM — early operational uses |
| Graph neural networks | Drug discovery, network biology, patient similarity |
| Survival models | Cox-PH, DeepSurv, time-aware transformers |
| Diffusion models | Synthetic data, augmentation, generative chemistry |
| Reinforcement learning | Adaptive trial design, dose optimization (research) |

The model class often matters less than how the data is split, what the labels really mean, and how the model is evaluated in clinical workflow.


4. Honest evaluation

Three layers of evaluation, in increasing rigor:

  1. Discrimination — AUC, sensitivity/specificity at operating points.
  2. Calibration — does the model's predicted probability match observed frequency? Far more important than AUC for clinical use; far less reported.
  3. Clinical utility — does the model change a decision that improves an outcome? Decision-curve analysis, prospective deployment, randomized trials of AI vs. no AI.
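The calibration layer is easy to check but rarely reported. A reliability curve compares predicted probabilities with observed frequencies in probability bins; a well-calibrated model sits on the diagonal. A sketch on synthetic predictions (the data here is simulated to be perfectly calibrated, purely for illustration):

```python
# Sketch: reliability-curve check of calibration on synthetic predictions.
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)
p_pred = rng.uniform(0, 1, 5000)            # model's predicted risks
y_obs = rng.uniform(0, 1, 5000) < p_pred    # outcomes drawn at those risks

# frac_pos: observed event rate per bin; mean_pred: mean prediction per bin.
frac_pos, mean_pred = calibration_curve(y_obs, p_pred, n_bins=10)
max_gap = np.max(np.abs(frac_pos - mean_pred))
print(f"largest bin-wise calibration gap: {max_gap:.3f}")
```

On a miscalibrated model the per-bin gaps grow large even when AUC is unchanged, which is exactly why calibration needs to be reported separately.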

Common reporting failures:

  • AUC reported on the same site/scanner used for training (no external validation).
  • Test set leakage (multiple slices from the same patient split across train/test).
  • Class imbalance ignored (rare cancers always look "high accuracy" if you predict "no cancer").
  • No subgroup analysis (sex, age, race/ethnicity, scanner manufacturer, geography).
  • Threshold optimized post-hoc to maximize a single metric.
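The leakage failure above has a simple mechanical fix: split by patient, not by row. A sketch using scikit-learn's `GroupShuffleSplit` with synthetic patient IDs (the "slices per patient" setup is an illustrative assumption):

```python
# Sketch: patient-level hold-out. GroupShuffleSplit keeps every row from a
# given patient on the same side of the split, preventing slice-level leakage.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
n_patients, slices_per_patient = 100, 5
patient_id = np.repeat(np.arange(n_patients), slices_per_patient)
X = rng.normal(size=(len(patient_id), 8))        # synthetic per-slice features
y = rng.integers(0, 2, size=len(patient_id))

splitter = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=patient_id))

overlap = set(patient_id[train_idx]) & set(patient_id[test_idx])
print(f"patients shared across train/test: {len(overlap)}")
```

The same `groups=` mechanism extends to site-wise and time-wise splits, which section 7 recommends.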

Reference standards: TRIPOD-AI, CONSORT-AI, SPIRIT-AI for clinical AI reporting and trial design.


5. Regulatory and deployment context

  • FDA (US) — SaMD framework; pre-cert pilot; Predetermined Change Control Plan (PCCP) for AI/ML lifecycle management.
  • EU — MDR + AI Act (2024) — high-risk medical AI requires conformity assessment, transparency, human oversight, post-market surveillance.
  • Brazil — ANVISA RDC 657/2022 and RDC 751/2022 cover SaMD; AI-specific guidance evolving; LGPD for data protection.

Deployment-time obligations beyond accuracy:

  • Versioning and reproducibility — exact model + preprocessing reproducible from a tag.
  • Monitoring for drift — input distribution and outcome calibration tracked over time.
  • Safety logging — actionable alerts when performance degrades.
  • User interface and decision support — show the model's confidence and uncertainty meaningfully.
  • Human oversight — mandatory for high-risk recommendations.
  • Recall and rollback — ability to disable or revert a model fast when problems are found.
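For the drift-monitoring obligation, one common and cheap statistic is the Population Stability Index (PSI) per input feature. A minimal sketch on synthetic data; the usual alerting thresholds (around 0.1–0.2) are industry conventions, not regulatory rules:

```python
# Sketch: input-drift monitoring with the Population Stability Index (PSI).
import numpy as np

def psi(expected, actual, bins=10):
    """PSI between a reference sample and a live sample of one feature."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf        # catch out-of-range values
    e = np.histogram(expected, edges)[0] / len(expected)
    a = np.histogram(actual, edges)[0] / len(actual)
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(0)
reference = rng.normal(0, 1, 10_000)    # training-time feature distribution
shifted = rng.normal(0.5, 1, 10_000)    # e.g. new scanner or new population
psi_same = psi(reference, rng.normal(0, 1, 10_000))
psi_shift = psi(reference, shifted)
print(f"PSI, no shift:   {psi_same:.3f}")
print(f"PSI, mean shift: {psi_shift:.3f}")
```

A scheduled job computing this per feature, plus a rolling calibration check on outcomes, covers the first two monitoring bullets above.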

For the regulatory framing in detail, see Regulatory & ethics.


6. Fairness, equity, and the data-shift problem

Models trained on US/European populations frequently underperform on Brazilian, African, Asian, or Indigenous populations [2]. Causes:

  • Distribution shift — different acquisition equipment, patient demographics, comorbidity patterns.
  • Label bias — historical care disparities encoded as ground truth.
  • Sampling bias — some populations appear far too rarely in the training data to be modeled well.

Mitigations:

  • Test on local data before deployment, period.
  • Monitor subgroup performance, not just overall metrics.
  • Recalibrate models for local populations when feasible.
  • Build local datasets — Brazilian initiatives (e.g., A.C. Camargo, Albert Einstein, USP, INCA, ABRACE) are filling part of the gap.
  • Federated learning where centralization isn't possible.
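The recalibration bullet can often be done without retraining the model at all: fit a small Platt-style logistic recalibration of the imported model's scores on a local validation set. A sketch on synthetic data, where the imported model ranks patients correctly but is systematically miscalibrated for the local population (the "imported model" here is simulated):

```python
# Sketch: Platt-style local recalibration of an imported model's logits.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 3000
true_logit = rng.normal(0, 2, n)
y = rng.uniform(size=n) < 1 / (1 + np.exp(-true_logit))   # local outcomes

# Simulated imported model: correct ranking, wrong scale and offset locally.
raw_logit = 2.0 * true_logit + 1.0
p_raw = 1 / (1 + np.exp(-raw_logit))

# One-feature logistic regression on the raw logit (effectively unregularized).
recal = LogisticRegression(C=1e6).fit(raw_logit.reshape(-1, 1), y)
p_cal = recal.predict_proba(raw_logit.reshape(-1, 1))[:, 1]
print(f"mean raw prob {p_raw.mean():.2f} vs observed rate {y.mean():.2f}")
print(f"mean recalibrated prob {p_cal.mean():.2f}")
```

Discrimination (AUC) is untouched by this monotone transform; only the probabilities move toward the local outcome rates.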

7. Best practices for technologists building oncology ML

  1. Read the protocol before writing code. Domain framing is the highest-leverage decision.
  2. Get a clinician on the team. Not as a stakeholder — as a co-developer.
  3. Prefer simple, well-calibrated baselines before reaching for foundation models.
  4. Hold out by patient, by site, by time — not by row.
  5. External validation is non-negotiable for clinical use.
  6. Calibration > AUC for clinical decision support.
  7. Plan for monitoring before you plan for deployment.
  8. Document everything the way a regulator would expect to read it (pipelines, data lineage, evaluation, change history).
  9. Measure clinical utility, not just statistical performance. A trial of "AI vs. no AI" is the gold standard.
  10. Read ML pitfalls in oncology before starting.
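Point 9's decision-curve analysis has a simple core: the net benefit of acting on the model at threshold t, defined as TP/n − (FP/n)·t/(1−t), compared against "treat all" and "treat none". A sketch on synthetic data with a well-calibrated toy model:

```python
# Sketch: decision-curve analysis via net benefit at a chosen threshold.
import numpy as np

def net_benefit(y, p, t):
    """Net benefit of treating patients with predicted risk >= t."""
    act = p >= t
    tp = np.sum(act & (y == 1))
    fp = np.sum(act & (y == 0))
    n = len(y)
    return tp / n - (fp / n) * t / (1 - t)

rng = np.random.default_rng(0)
n = 5000
p = rng.uniform(size=n)                       # toy model, perfectly calibrated
y = (rng.uniform(size=n) < p).astype(int)

t = 0.3                                       # clinically chosen threshold
nb_model = net_benefit(y, p, t)
nb_all = net_benefit(y, np.ones(n), t)        # "treat everyone" strategy
print(f"net benefit at t={t}: model {nb_model:.3f}, treat-all {nb_all:.3f}")
```

A useful model must beat both default strategies over the range of thresholds clinicians would actually use; this is a far stronger claim than a higher AUC.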

8. Common myths to push back on

  • "Bigger model = better." Not for clinical use. Smaller, well-calibrated, locally validated models often outperform.
  • "We can replace radiologists / pathologists." Augmentation is the realistic 5-year story; replacement is neither the goal nor, at present, feasible.
  • "Privacy isn't an issue if we de-identify." Genomic data are inherently re-identifiable; treat them like PHI.
  • "Higher AUC always helps patients." Often not: calibration, threshold, and workflow integration matter more.
  • "If it works at one site, it works everywhere." Almost never true.


References

  1. Adams SJ, Mikhael P, Wohlwend J, et al. Artificial Intelligence and Machine Learning in Lung Cancer Screening. Thorac Surg Clin 2023;33:401-409. PMID 37806742. https://doi.org/10.1016/j.thorsurg.2023.03.001
  2. Gao Q, Yang L, Lu M, Jin R, Ye H, Ma T. The artificial intelligence and machine learning in lung cancer immunotherapy. J Hematol Oncol 2023;16:55. PMID 37226190. https://doi.org/10.1186/s13045-023-01456-y
  3. Meyer ML, Fitzgerald BG, Paz-Ares L, et al. New promises and challenges in the treatment of advanced non-small-cell lung cancer. Lancet 2024;404:803-822. PMID 39121882. https://doi.org/10.1016/S0140-6736(24)01029-8
  4. U.S. National Cancer Institute. https://www.cancer.gov/about-cancer/understanding/what-is-cancer
  5. American Cancer Society. https://www.cancer.org/cancer.html
  6. Cleveland Clinic. Cancer (overview). https://my.clevelandclinic.org/health/diseases/12194-cancer
  7. A.C. Camargo Cancer Center. https://accamargo.org.br
  8. Fundação do Câncer (Brasil). https://www.cancer.org.br/
  9. Ministério da Saúde / BVS. ABC do câncer. https://bvsms.saude.gov.br/bvs/publicacoes/abc_do_cancer.pdf
  10. ANVISA — Agência Nacional de Vigilância Sanitária. https://www.gov.br/anvisa/pt-br

Early public release. Content evolves through continuous review. Questions: [email protected] · CC BY 4.0 where applicable.