Introduction to Bioinformatics: Decoding Biological Big Data

Bioinformatics is an interdisciplinary field that develops methods and software tools for understanding biological data. It combines biology, computer science, informatics, mathematics, and statistics to analyze and interpret that data, and it is essential for analyzing the large genomic and proteomic datasets generated in cancer research. Sources: [1]

Note: Information reflects 2025 standards. Verify versions, URLs, and dataset access policies periodically.

Bioinformatics Tools and Methods

Sequence Alignment

Comparison of DNA, RNA, or protein sequences to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences.

Cancer applications:

  • Identification of somatic mutations Sources: [2]
  • Comparison between primary tumors and metastases
  • Analysis of genetic variants
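
To make the idea of scoring sequence similarity concrete, here is a minimal sketch of a global alignment score in the style of Needleman-Wunsch. The scoring values (match +1, mismatch -1, linear gap -2) and the function name are illustrative assumptions, not settings taken from any particular tool; production aligners use more elaborate scoring schemes and heuristics.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the best global alignment score of sequences a and b.

    Uses dynamic programming with a linear gap penalty.
    The default scores are illustrative assumptions.
    """
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            gap_in_b = prev[j] + gap       # a[i-1] aligned to a gap
            gap_in_a = curr[j - 1] + gap   # b[j-1] aligned to a gap
            curr.append(max(diagonal, gap_in_b, gap_in_a))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # A single-base deletion costs one gap: six matches minus one gap penalty.
    print(global_alignment_score("GATTACA", "GATTCA"))
```

Only the score is computed here, and only two rows of the table are kept in memory; recovering the alignment itself would require storing the full table and tracing back through it.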

Gene Expression Analysis

Study of the expression of different genes under different conditions. This can help identify genes that are differentially expressed in cancer cells compared to normal cells.

Main techniques:

  • RNA-Seq (RNA sequencing) Sources: [3]
  • DNA microarrays Sources: [4]
  • Single-cell analysis
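
The comparison between cancer and normal cells can be sketched with a log2 fold change per gene. This is a simplified illustration, not a statistical test: the gene names, the expression values, the pseudocount of 1, and the threshold of 1.0 are all assumptions chosen for the example, and real analyses work on normalized counts with dedicated statistical models.

```python
import math


def log2_fold_change(tumor, normal, pseudocount=1.0):
    """Log2 ratio of mean tumor expression to mean normal expression.

    The pseudocount (an assumption here) avoids division by zero
    for genes with no detected expression.
    """
    mean_tumor = sum(tumor) / len(tumor)
    mean_normal = sum(normal) / len(normal)
    return math.log2((mean_tumor + pseudocount) / (mean_normal + pseudocount))


def flag_changed_genes(expression, threshold=1.0):
    """Return {gene: log2FC} for genes whose absolute log2FC meets the threshold.

    `expression` maps a gene name to a (tumor_values, normal_values) pair.
    """
    changed = {}
    for gene, (tumor, normal) in expression.items():
        lfc = log2_fold_change(tumor, normal)
        if abs(lfc) >= threshold:
            changed[gene] = lfc
    return changed


if __name__ == "__main__":
    # Hypothetical genes and made-up expression values.
    toy = {
        "GENE_A": ([7, 7, 7], [1, 1, 1]),
        "GENE_B": ([3, 3], [3, 3]),
        "GENE_C": ([0, 0], [3, 3]),
    }
    print(flag_changed_genes(toy))
```

A positive value indicates higher expression in the tumor samples and a negative value indicates lower expression; a fold change alone says nothing about statistical significance.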

Structural Genomics

Prediction of the three-dimensional structure of proteins from their amino acid sequence.

Relevance to cancer:

  • Targeted drug design Sources: [5]
  • Understanding pathogenic mutations
  • Prediction of protein-protein interactions

Metabolic Pathway Analysis

Study of the networks of chemical reactions that occur within cells. This can help identify pathways that are altered in cancer and that may be targets for therapy.
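
One common way to ask whether a pathway is altered is an over-representation test: given a set of altered genes, is the overlap with a pathway larger than chance would suggest? The sketch below computes a one-sided hypergeometric p-value using only the standard library. The function name and the toy numbers are assumptions for illustration, and a real analysis would also correct for testing many pathways.

```python
from math import comb


def enrichment_p_value(universe, pathway, altered, overlap):
    """Probability of seeing at least `overlap` pathway genes among the altered genes.

    universe: number of genes considered in total
    pathway:  number of those genes that belong to the pathway
    altered:  number of altered genes drawn from the universe
    overlap:  number of altered genes that fall in the pathway
    """
    total = comb(universe, altered)
    upper = min(pathway, altered)
    tail = sum(
        comb(pathway, k) * comb(universe - pathway, altered - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers: 10 genes, 5 in the pathway, 2 altered, both in the pathway.
    print(enrichment_p_value(universe=10, pathway=5, altered=2, overlap=2))
```

A small p-value suggests the pathway contains more altered genes than expected at random, which is a prompt for follow-up rather than proof that the pathway is a therapeutic target.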

Bioinformatics and Cancer

Bioinformatics plays a crucial role in cancer research, enabling the analysis of large amounts of data to identify:

  • Driver mutations vs. passenger mutations
  • Oncogenes and tumor suppressor genes
  • Altered signaling pathways
  • Biomarkers for diagnosis and prognosis
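
As a rough illustration of the first item, one simple first-pass signal is recurrence: a mutation seen in many independent samples is a more plausible driver candidate than one seen once. The sketch below only counts recurrence. The sample IDs, gene names, and variant labels are hypothetical placeholders, and the `min_samples` cutoff is an assumption; dedicated driver-detection methods also account for factors such as background mutation rate and gene length.

```python
from collections import Counter


def recurrent_mutations(sample_mutations, min_samples=2):
    """Return {mutation: sample_count} for mutations seen in at least min_samples samples.

    `sample_mutations` maps a sample ID to a list of (gene, variant) tuples.
    Each mutation is counted at most once per sample.
    """
    counts = Counter()
    for mutations in sample_mutations.values():
        counts.update(set(mutations))  # de-duplicate within a sample
    return {mutation: n for mutation, n in counts.items() if n >= min_samples}


if __name__ == "__main__":
    # Hypothetical samples and placeholder variant labels.
    cohort = {
        "S1": [("GENE_A", "p.X10Y"), ("GENE_B", "p.Z5W")],
        "S2": [("GENE_A", "p.X10Y")],
        "S3": [("GENE_A", "p.X10Y"), ("GENE_C", "p.Q1R")],
    }
    print(recurrent_mutations(cohort))
```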

Key Tools

cBioPortal

Web platform for exploration, visualization, and analysis of multidimensional cancer data. Sources: [6]

TCGA (The Cancer Genome Atlas)

Program that characterized more than 20,000 primary cancer and matched normal samples across 33 cancer types. Sources: [7]

GDC (Genomic Data Commons)

Platform for sharing genomic and clinical data; primary portal for TCGA and related datasets. Sources: [8]
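
For programmatic access, the GDC also exposes a REST API. The sketch below only builds the query parameters for a file search and does not send any request. The endpoint URL, the filter field names (`cases.project.project_id`, `data_type`), and the example values are given as assumptions based on the public GDC API and should be verified against the current GDC documentation before use; the helper name is hypothetical.

```python
import json
from urllib.parse import urlencode

# Assumed public endpoint; confirm against the current GDC API documentation.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_gdc_file_query(project_id, data_type, size=10):
    """Build query parameters for a GDC file search (no network access)."""
    filters = {
        "op": "and",
        "content": [
            {"op": "in",
             "content": {"field": "cases.project.project_id", "value": [project_id]}},
            {"op": "in",
             "content": {"field": "data_type", "value": [data_type]}},
        ],
    }
    return {
        "filters": json.dumps(filters),  # filters travel as a JSON string
        "fields": "file_id,file_name,data_type",
        "format": "JSON",
        "size": str(size),
    }


if __name__ == "__main__":
    params = build_gdc_file_query("TCGA-BRCA", "Gene Expression Quantification", size=5)
    # The resulting URL could be fetched with any HTTP client.
    print(f"{GDC_FILES_ENDPOINT}?{urlencode(params)}")
```

Keeping query construction separate from the HTTP call makes the filter easy to inspect and test before any data is downloaded. Controlled-access data requires authorization that this sketch does not cover.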

Required Technical Skills

Programming Languages

  • Python: BioPython, pandas, numpy
  • R: Bioconductor, ggplot2
  • Bash: Processing genomic files

Analysis Tools

  • Galaxy: Web interface for genomic analysis
  • IGV: Genome viewer
  • BLAST: Sequence similarity search

Next Steps

  1. Install basic tools (Python, R, Bioconductor)
  2. Explore public datasets (TCGA, GEO)
  3. Learn basic workflows for genomic analysis
  4. Practice with real cancer data
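
As a possible first exercise for step 3, the sketch below reads FASTA-formatted text and computes GC content using only the Python standard library. The function names and the toy sequences are assumptions for illustration; for real work, an established parser such as the one in BioPython handles many more edge cases.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into {record_id: sequence}.

    The record ID is the first whitespace-delimited token after '>'.
    Sequence lines are upper-cased and concatenated.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            # Fall back to a generated ID if the header is empty.
            name = fields[0] if fields else f"record_{len(records) + 1}"
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {record_id: "".join(parts) for record_id, parts in records.items()}


def gc_content(sequence):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


if __name__ == "__main__":
    toy = ">seq1 toy example\nACGT\nacgt\n>seq2\nGGCC\n"
    for record_id, seq in parse_fasta(toy).items():
        print(record_id, len(seq), gc_content(seq))
```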

This article is part of the fundamentals series for developers and data scientists interested in cancer research.


References

  1. IUBMB definition and Wikipedia overview of bioinformatics (IUBMB 2011; Wikipedia accessed 2025-10).
  2. Pan-cancer discovery of somatic mutations from RNA-seq (Nat Commun 2024) and related TCGA studies.
  3. Reviews of clinical gene expression assays and RNA‑seq in oncology (e.g., PMC6036716; Thermo Fisher overview).
  4. Gene expression profiling by DNA microarrays and applications in cancer (early foundational reviews and tutorials).
  5. Structural genomics in drug discovery (e.g., protein structure prediction in structure‑based design; recent 2023 reviews).
  6. cBioPortal platform descriptions (Science Signaling 2013; NAR/Methods/PMC articles 2013–2018; cBioPortal docs).
  7. NCI TCGA overview — >20,000 primary cancer and matched normal samples, 33 cancer types; TCGA → GDC.
  8. Genomic Data Commons — official NCI portal for genomic and clinical datasets, including TCGA.

Early public release. Content evolves through continuous review. Questions: [email protected] · CC BY 4.0 where applicable.