
Reproducible Pipelines

Pre-built analysis pipelines for cancer research.

GATK Variant Calling Pipeline

Overview

Complete pipeline for germline and somatic variant calling, following the GATK Best Practices workflows.

Components

  • BWA-MEM - Read alignment
  • GATK - Variant calling
  • VEP - Variant annotation
  • QC - Quality control metrics

Usage

# Run the pipeline
nextflow run hack-cancer/gatk-variant-calling \
  --input samples.csv \
  --reference hg38.fa \
  --output results/
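
The column layout of `samples.csv` is not documented on this page, so the following is only a minimal sketch of a pre-flight check that makes no assumption about column names: it verifies that every row has as many comma-separated fields as the header. `check_csv_shape` is a hypothetical helper, not part of the pipeline, and it does not handle commas inside quoted fields.

```bash
# Hypothetical helper (not part of the pipeline): succeed only if every
# non-empty row has the same number of comma-separated fields as the header.
# Limitation: commas inside quoted fields are counted as separators.
check_csv_shape() {
  awk -F',' '
    NR == 1 { n = NF; next }          # remember the header width
    NF == 0 { next }                  # ignore blank lines
    NF != n {
      printf "line %d: expected %d fields, found %d\n", NR, n, NF > "/dev/stderr"
      bad = 1
    }
    END { exit bad }                  # non-zero exit if any row was malformed
  ' "$1"
}
```

Running `check_csv_shape samples.csv` before `nextflow run` would catch truncated rows early, before any alignment work starts.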

Outputs

  • VCF - Variant calls
  • BAM - Aligned reads
  • QC - Quality metrics
  • Reports - HTML reports

RNA-seq Analysis Pipeline

Overview

Differential gene expression analysis for cancer samples.

Components

  • STAR - Read alignment
  • featureCounts - Read counting
  • DESeq2 - Differential expression
  • GSEA - Pathway analysis

Usage

# Run the pipeline
nextflow run hack-cancer/rnaseq-analysis \
  --input samples.csv \
  --reference hg38.fa \
  --gtf genes.gtf \
  --output results/
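
Because this pipeline takes both a reference FASTA and a GTF annotation, it can be worth confirming that the two use the same sequence names (for example `chr1` versus `1`) before launching; mismatched names commonly show up later as reads that are never assigned to genes. The sketch below is an assumption about how one might check this, not a step of the pipeline itself, and `gtf_contigs_missing_from_fasta` is a hypothetical helper name.

```bash
# Hypothetical helper (not part of the pipeline): print sequence names that
# appear in the GTF but not in the FASTA. No output means the names agree.
gtf_contigs_missing_from_fasta() {
  local fasta="$1" gtf="$2"
  local fa_names gtf_names
  fa_names=$(mktemp)
  gtf_names=$(mktemp)

  # FASTA headers: drop the leading ">" and anything after the first whitespace.
  grep '^>' "$fasta" | sed -e 's/^>//' -e 's/[[:space:]].*$//' | sort -u > "$fa_names"

  # GTF: first tab-separated column of lines that are neither comments nor blank.
  grep -v -e '^#' -e '^$' "$gtf" | cut -f1 | sort -u > "$gtf_names"

  # comm -13 keeps only the lines unique to the second (GTF) list.
  comm -13 "$fa_names" "$gtf_names"

  rm -f "$fa_names" "$gtf_names"
}
```

For the files in the usage example, this would be called as `gtf_contigs_missing_from_fasta hg38.fa genes.gtf`; any name it prints is an annotation contig the reference does not contain.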

Single Cell Analysis Pipeline

Overview

Complete single-cell RNA-seq analysis workflow.

Components

  • Cell Ranger - Read processing
  • Seurat - Quality control
  • Monocle3 - Trajectory analysis
  • CellChat - Cell communication

Usage

# Run the pipeline
nextflow run hack-cancer/single-cell-analysis \
  --input samples.csv \
  --reference hg38.fa \
  --output results/

Cloud Deployment

Google Cloud

  • Terraform - Infrastructure as code
  • Kubernetes - Container orchestration
  • Cloud Storage - Data storage

AWS

  • CloudFormation - Infrastructure
  • ECS - Container service
  • S3 - Object storage
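
How results reach Cloud Storage or S3 is not described here. As a rough sketch, assuming the `results/` directory is copied after a run with the standard `gsutil` or `aws` command-line tools, the helper below picks the upload command from the destination URI scheme. It only prints the command and executes nothing; `sync_cmd` and the bucket names in the usage note are hypothetical.

```bash
# Hypothetical helper (not part of the pipelines): print the upload command
# for a results directory, chosen from the destination URI scheme.
# Nothing is executed; the caller decides whether to run the printed command.
sync_cmd() {
  local src="$1" dest="$2"
  case "$dest" in
    s3://*) printf 'aws s3 sync %s %s\n' "$src" "$dest" ;;          # AWS S3
    gs://*) printf 'gsutil -m rsync -r %s %s\n' "$src" "$dest" ;;   # Google Cloud Storage
    *)      echo "unsupported destination: $dest" >&2; return 1 ;;
  esac
}
```

For example, `sync_cmd results/ s3://example-bucket/run1/` prints an `aws s3 sync` command, while a `gs://` destination yields the `gsutil -m rsync -r` form. Printing first makes it easy to review the target before any data moves.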

Reproducibility

  • Docker - Containerization
  • Conda - Environment management
  • Nextflow - Workflow orchestration
  • Git - Version control
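
One way these tools can support a reproducible run is by recording which versions were actually used. The sketch below is an illustration under stated assumptions, not something the pipelines are documented to do: it writes a small provenance file containing the UTC date, the Git commit of the current directory, and the reported versions of Nextflow, Docker, and Conda when they are installed. `tool_version` and `write_provenance` are hypothetical helper names.

```bash
# Hypothetical helpers (not part of the pipelines).

# Print the first non-blank line a tool reports for its version,
# or "not found" if the tool is not on PATH.
tool_version() {
  if command -v "$1" >/dev/null 2>&1; then
    "$@" 2>/dev/null | grep -v '^[[:space:]]*$' | head -n 1
  else
    echo "not found"
  fi
}

# Write a provenance file to the path given as the first argument.
write_provenance() {
  local out="$1"
  {
    printf 'date: %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    printf 'git_commit: %s\n' "$(git rev-parse HEAD 2>/dev/null || echo unknown)"
    printf 'nextflow: %s\n' "$(tool_version nextflow -v)"
    printf 'docker: %s\n' "$(tool_version docker --version)"
    printf 'conda: %s\n' "$(tool_version conda --version)"
  } > "$out"
}
```

Calling `write_provenance results/provenance.txt` after a run keeps this record alongside the outputs. Separately, Nextflow's `-r` option to `nextflow run` pins a pipeline to a specific tag, branch, or commit; whether these repositories publish tagged releases is not stated here.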

Early public release. Content evolves through continuous review. Questions: [email protected] · CC BY 4.0 where applicable.