Whole Genome Sequencing (WGS): Introduction, Workflow, Pipelines, Applications


DNA sequencing determines the order of the DNA building blocks (nucleotides) in an individual’s genetic code and can be used to test for genetic abnormalities. Whole genome sequencing is increasingly employed in both healthcare and research to identify genetic differences; both rely on modern technologies that allow rapid sequencing of vast amounts of DNA. These methods are referred to as next-generation sequencing.

Whole genome sequencing (WGS) is a comprehensive method that detects nearly all DNA alterations by sequencing both the protein-coding and non-coding components of the genome.

As a result of insufficient genetic testing, millions of patients are living with misdiagnosed or undetected genetic disorders. Although single gene testing, panel testing, or microarrays can determine the origin of a disease in some circumstances, these analyses are limited in scope and may fail to reveal the full genetic explanation. WGS overcomes such limitations and is the only test capable of detecting almost all forms of disease-causing genomic variation in a single test.

Whole genome resequencing can detect, with great accuracy, DNA biomarkers such as single nucleotide polymorphisms (SNPs), insertions and deletions (indels), structural variations (SVs), copy number variations (CNVs), and other genetic alterations in the sequenced species. It also makes it possible to identify polymorphic variation across a population, which helps reveal the fundamental processes underpinning species formation, development, expansion, and evolution.
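
The variant classes named above can be told apart, to a first approximation, by comparing the lengths of the reference and alternate alleles. The following is a minimal Python sketch, not a production classifier. The function name `classify_variant` and the 50 bp cut-off between indels and structural variants are assumptions made for illustration (the cut-off is a common convention, not something this article specifies). Copy number variations cannot be recognised from a single allele pair and are left out.

```python
# Assumption: events of 50 bp or more are treated as structural variants.
SV_MIN_LENGTH = 50


def classify_variant(ref: str, alt: str) -> str:
    """Classify a simple variant from its reference and alternate alleles."""
    if not ref or not alt:
        raise ValueError("ref and alt must be non-empty allele strings")
    if ref == alt:
        raise ValueError("ref and alt are identical, so there is no variant")

    # Same length: one or more bases substituted in place.
    if len(ref) == len(alt):
        return "SNP" if len(ref) == 1 else "multi-base substitution"

    # Different length: bases were gained or lost.
    size = abs(len(ref) - len(alt))
    kind = "insertion" if len(alt) > len(ref) else "deletion"
    if size >= SV_MIN_LENGTH:
        return f"SV ({kind})"
    return f"indel ({kind})"


if __name__ == "__main__":
    examples = [("A", "G"), ("A", "ATG"), ("ATG", "A"), ("A", "A" + "T" * 60)]
    for ref, alt in examples:
        print(ref[:10], alt[:10], classify_variant(ref, alt))
```

Real variant callers also use read depth, split reads, and paired-end information, which this length-based sketch ignores.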


The procedure for whole genome sequencing is a complicated one that calls for specific knowledge and tools. However, developments in sequencing technology and data processing tools have increased the accessibility of this potent tool for researchers working in a variety of sectors, including environmental science, agriculture, and medicine.

Depending on the technology employed and the complexity of the genome being sequenced, the complete WGS procedure may take several weeks or even months. Even so, recent developments in sequencing technologies and data analysis techniques have made WGS a powerful means of understanding the genetic underpinnings of disease and other biological processes.

Fig: Workflow of whole exome sequencing (Source: Google)

| Workflow step | Description | Key techniques and tools |
|---|---|---|
| Sample collection, DNA extraction and sample preparation | Obtain a sample from the organism of interest, such as blood, saliva, or tissue. DNA is then extracted from the sample using a variety of methods and prepared for sequencing. | DNA extraction kits, quality control metrics (e.g., Qubit, TapeStation), library preparation kits |
| Library preparation and sequencing | Prepare the DNA for sequencing by fragmenting it and adding adapters, then generate raw sequence data. | Library preparation kits; Illumina, PacBio, Oxford Nanopore, or other sequencing platforms; sequencing reagents and consumables |
| Read processing | Clean, trim, and filter raw sequence data to improve accuracy. | FastQC, Trimmomatic, BBDuk, Cutadapt |
| Genome assembly | Reconstruct the genome from the sequence data. | SPAdes, ABySS, Canu, Flye |
| Genome annotation | Identify and annotate genes, regulatory regions, repeats, and other genomic features. | MAKER, BRAKER, NCBI Prokaryotic Genome Annotation Pipeline |
| Comparative genomics | Compare the sequenced genome to reference genomes or other sequenced genomes to identify variations. | MUMmer, QUAST, BUSCO, Roary |
| Functional analysis | Interpret the biological function of genes and other genomic features. | InterProScan, Blast2GO, EggNOG, KEGG |
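
To make the read processing step more concrete, the sketch below trims low-quality bases from the 3' end of a single read. It is a simplified illustration and not how Trimmomatic, BBDuk, or Cutadapt work internally. The function names, the Phred+33 quality encoding, and the thresholds (`min_quality=20`, `min_length=5`) are assumptions chosen for the example.

```python
def phred_scores(quality_line: str, offset: int = 33) -> list:
    """Decode a FASTQ quality string into integer Phred scores.

    Assumption: Phred+33 encoding, where '!' is 0 and 'I' is 40.
    """
    return [ord(ch) - offset for ch in quality_line]


def trim_3prime(seq: str, qual: str, min_quality: int = 20) -> tuple:
    """Remove trailing bases whose quality is below min_quality."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")
    scores = phred_scores(qual)
    end = len(seq)
    # Walk back from the 3' end until a base meets the quality threshold.
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return seq[:end], qual[:end]


def keep_read(seq: str, min_length: int = 5) -> bool:
    """Discard reads that became too short after trimming."""
    return len(seq) >= min_length


if __name__ == "__main__":
    seq, qual = "ACGTACGT", "IIIIII#!"  # last two bases have low quality
    trimmed_seq, trimmed_qual = trim_3prime(seq, qual)
    print(trimmed_seq, trimmed_qual, keep_read(trimmed_seq))
```

Dedicated trimming tools typically add adapter removal and window-based quality checks on top of this basic idea.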

A typical computational pipeline for whole genome sequencing:

The pipeline outlined below is a simplified version; the actual steps and software used will vary with the sequencing technology, the quality and complexity of the genome, and the specific research or clinical question being addressed. The pipeline’s overarching purpose, however, is to extract accurate, complete, and interpretable genomic information from raw sequencing data.

| Pipeline step | Description | Tools and resources |
|---|---|---|
| Quality control | Raw sequencing data is checked for quality control metrics such as read length, base quality scores, and adapter contamination. Low-quality or adapter-containing reads are removed to ensure accuracy in downstream analysis. | FastQC, Trimmomatic, Cutadapt |
| Read alignment | Quality-controlled reads are aligned to a reference genome, or assembled de novo, using read aligner or assembler software. The goal is to identify the genomic location of each read and to account for sequencing errors, indels, and other variations. | BWA, Bowtie, SOAPdenovo |
| Variant calling | Variant calling software identifies variants such as single nucleotide polymorphisms (SNPs), insertions, deletions, and structural variations by comparing the aligned reads to the reference genome. Variants are filtered based on various quality control metrics. | GATK, FreeBayes, Samtools |
| Variant filtering | Variants are filtered based on quality control metrics such as depth of coverage, genotype quality, and allele frequency. | VCFtools, bcftools, GATK |
| Annotation | The variants are annotated to predict their functional impact, such as effects on protein structure, regulatory regions, or splicing. The annotation can also provide information on population frequencies, conservation, and disease associations. | ANNOVAR, SnpEff, VEP |
| Interpretation | The annotated variants are interpreted to determine their clinical or biological significance, such as their association with diseases, drug responses, or phenotypic traits. This may involve integration with other datasets, such as gene expression or functional assays. | ClinVar, dbSNP, COSMIC, InterVar |
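
The variant filtering step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the behaviour of VCFtools, bcftools, or GATK. The `Variant` record, the function names, and the thresholds are all hypothetical, and "allele frequency" is interpreted here as population allele frequency, with common variants removed as one might do in a rare-disease search.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical, simplified variant record (not a full VCF entry)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    depth: int               # number of reads covering the site
    genotype_quality: int    # Phred-scaled confidence in the genotype
    allele_frequency: float  # assumed: frequency in a reference population


def passes_filters(v: Variant, min_depth: int = 10, min_gq: int = 20,
                   max_af: float = 0.01) -> bool:
    """Return True if the variant meets all (illustrative) thresholds."""
    return (v.depth >= min_depth
            and v.genotype_quality >= min_gq
            and v.allele_frequency <= max_af)


def filter_variants(variants, **thresholds):
    """Keep only the variants that pass every filter."""
    return [v for v in variants if passes_filters(v, **thresholds)]


if __name__ == "__main__":
    calls = [
        Variant("chr1", 100, "A", "G", depth=35, genotype_quality=99, allele_frequency=0.001),
        Variant("chr1", 200, "C", "T", depth=4, genotype_quality=99, allele_frequency=0.001),
        Variant("chr2", 300, "G", "A", depth=40, genotype_quality=12, allele_frequency=0.001),
        Variant("chr2", 400, "T", "C", depth=40, genotype_quality=60, allele_frequency=0.35),
    ]
    for v in filter_variants(calls):
        print(v.chrom, v.pos, v.ref, v.alt)
```

Appropriate thresholds depend on sequencing depth, the caller used, and the question being asked, so the defaults here should be read as placeholders.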


Applications of Whole-Genome Sequencing:

| Application | Description |
|---|---|
| Identification of novel disease genes | WGS can be used to identify new genes associated with diseases, expanding our understanding of disease mechanisms and potential therapeutic targets. |
| Population genetics and evolutionary studies | WGS can be used to study genetic variation within and between populations, providing insights into human evolution and migration patterns. |
| Functional genomics | WGS can be used to investigate the functional consequences of genetic mutations on protein structure and function, informing the development of new therapies and drug targets. |
| Epigenetic research | WGS can be used in combination with epigenetic profiling to investigate the role of epigenetic modifications in gene expression and disease development. |
| Developmental biology | WGS can be used to study the genetic basis of developmental disorders and birth defects, providing insights into the molecular mechanisms underlying normal and abnormal development. |
| Cancer diagnosis and treatment | WGS can identify somatic mutations in cancer cells, providing a more accurate diagnosis and guiding personalized treatment plans, including targeted therapies. |
| Pharmacogenomics | WGS can identify genetic variations that affect drug metabolism, efficacy, and toxicity, helping doctors to prescribe the most effective and safest drugs for each patient. |

Advantages of Whole-Genome Sequencing:

  • Rapid identification and characterisation of microorganisms, providing information on strain relatedness, where they originated, and how they evolved
  • Identifying critical virulence factors – unique characteristics that help the pathogen cause illness
  • Antimicrobial resistance: WGS can identify which antibiotics microbes are resistant to much more quickly than standard culture approaches.
  • Provides a high-resolution, base-by-base representation of the genome.
  • Captures both large and small variants that targeted techniques could miss.
  • Identifies possible causal variations for more research into gene expression and regulatory mechanisms.
  • Delivers immense quantities of data in a short period of time to aid in the assembly of new genomes.
  • Outbreak detection, mapping, and analysis
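
Strain relatedness and outbreak detection, mentioned in the list above, are often assessed by counting the nucleotide differences between isolates. The sketch below computes pairwise SNP distances between sequences that are assumed to be already aligned and of equal length. The isolate names and sequences are made up for illustration, and positions with ambiguous bases or gaps are simply skipped, which is one possible convention rather than a standard.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def snp_distance(a: str, b: str) -> int:
    """Count positions where two aligned sequences carry different bases.

    Positions where either sequence has a non-ACGT character
    (for example 'N' or '-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x in VALID_BASES and y in VALID_BASES and x != y
    )


def pairwise_distances(sequences: dict) -> dict:
    """Return the SNP distance for every pair of named sequences."""
    return {
        (n1, n2): snp_distance(sequences[n1], sequences[n2])
        for n1, n2 in combinations(sorted(sequences), 2)
    }


if __name__ == "__main__":
    # Hypothetical aligned fragments from three isolates.
    isolates = {
        "isolate_A": "ACGTACGTAC",
        "isolate_B": "ACGTACGTAT",
        "isolate_C": "TCGAACGGAT",
    }
    for pair, dist in pairwise_distances(isolates).items():
        print(pair, dist)
```

In this toy data, isolates A and B differ at a single position and would be considered more closely related to each other than either is to isolate C.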


