Datasets & Benchmarks

Curated cancer datasets and benchmarking resources.

TCGA Datasets

TCGA-BRCA

  • Description: Breast Invasive Carcinoma
  • Samples: 1,098 primary tumors
  • Data Types: WGS, WXS, RNA-seq, miRNA, Methylation
  • Download: GDC Portal

TCGA-LUAD

  • Description: Lung Adenocarcinoma
  • Samples: 585 cases (GDC count)
  • Data Types: WGS, WXS, RNA-seq, miRNA, Methylation
  • Download: GDC Portal

TCGA-COAD

  • Description: Colon Adenocarcinoma
  • Samples: 521 primary tumors
  • Data Types: WGS, WXS, RNA-seq, miRNA, Methylation
  • Download: GDC Portal
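
The TCGA projects above can also be located programmatically through the GDC API rather than the portal UI. The following is a minimal sketch, not an official client: it only builds the query for the GDC `/files` endpoint and sends nothing over the network. The helper name `build_gdc_query_params`, the chosen `fields`, and the example data type are assumptions made for illustration.

```python
import json
from urllib.parse import urlencode

# Public GDC files endpoint; the helper below is hypothetical, not part of any GDC library.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_gdc_query_params(project_id, data_type, size=10):
    """Build query parameters for a GDC /files search (no request is sent)."""
    # GDC filters are nested JSON: an "and" over per-field "in" clauses.
    filters = {
        "op": "and",
        "content": [
            {"op": "in",
             "content": {"field": "cases.project.project_id", "value": [project_id]}},
            {"op": "in",
             "content": {"field": "data_type", "value": [data_type]}},
        ],
    }
    return {
        "filters": json.dumps(filters),           # filters travel as a JSON string
        "fields": "file_id,file_name,data_type",  # assumed subset of fields
        "format": "JSON",
        "size": str(size),                        # maximum number of hits to return
    }


params = build_gdc_query_params("TCGA-BRCA", "Gene Expression Quantification")
query_url = GDC_FILES_ENDPOINT + "?" + urlencode(params)
```

Passing `query_url` to any HTTP client should return file metadata only. Controlled-access data (for example aligned reads) additionally requires a GDC authentication token, which this sketch does not handle.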

Benchmark Datasets

MSK-IMPACT

  • Description: Targeted sequencing panel
  • Samples: 10,000+ patients
  • Data Types: Targeted sequencing
  • Download: cBioPortal

GENIE

  • Description: AACR Project GENIE
  • Samples: 100,000+ patients
  • Data Types: Clinical + genomic
  • Download: GENIE Portal

Single Cell Datasets

Human Cell Atlas

  • Description: Single-cell reference atlas
  • Samples: 1M+ cells
  • Data Types: scRNA-seq, scATAC-seq
  • Download: HCA Portal

Imaging Datasets

TCIA Collections

  • Description: Medical imaging collections
  • Samples: 100,000+ images
  • Data Types: CT, MRI, PET, Pathology
  • Download: TCIA Portal

Data Standards

  • DICOM - Medical imaging
  • BAM/SAM - Sequence alignment
  • VCF - Variant calls
  • MAF - Mutation annotation
  • GFF/GTF - Genome annotation
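
To make one of the formats listed above concrete, here is a minimal sketch of reading a single VCF data line using only the Python standard library. It assumes the eight fixed, tab-separated VCF columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and ignores header lines and per-sample genotype columns. The `VcfRecord` type, the `parse_vcf_line` helper, and the example line are invented for illustration. For real work a dedicated parser is likely the better choice.

```python
from typing import NamedTuple


class VcfRecord(NamedTuple):
    """Subset of one VCF data line (hypothetical container for this sketch)."""
    chrom: str
    pos: int    # 1-based position, as stored in VCF
    ref: str
    alt: list   # one entry per alternate allele
    info: dict  # INFO key -> value string, or True for flag-only keys


def parse_vcf_line(line):
    """Parse the eight fixed columns of a tab-separated VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, _qual, _filter, info = fields[:8]
    info_dict = {}
    for entry in info.split(";"):
        if not entry or entry == ".":  # "." marks an empty INFO column
            continue
        key, sep, value = entry.partition("=")
        info_dict[key] = value if sep else True
    return VcfRecord(chrom, int(pos), ref, alt.split(","), info_dict)


# Illustrative line only, not taken from any dataset listed here.
example = "chr17\t7675088\t.\tC\tT,A\t50\tPASS\tDP=120;SOMATIC"
record = parse_vcf_line(example)
```

In this sketch, INFO entries without an `=` (such as `SOMATIC`) are treated as boolean flags, and all other values are kept as strings rather than converted to numbers.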

Early public release. Content evolves through continuous review. CC BY 4.0 where applicable.