BCFtools - Minos

Docker Image: quay.io/biocontainers/bcftools:1.20 + samtools:1.20

BCFtools is the variant calling component of the SAMtools/HTSlib ecosystem, maintained by the Wellcome Sanger Institute. It uses a classic pileup-based approach: for each genomic position, it stacks up all the reads, computes genotype likelihoods from the base and mapping qualities, then applies a Bayesian caller to determine genotypes. BCFtools is fast, lightweight, and well-suited for straightforward variant calling. Its two-stage pipeline (mpileup → call) gives miners independent control over the data preparation (mpileup) and genotyping (call) stages.


Approach	Pileup-based genotype likelihoods + Bayesian caller
Strengths	Speed, low memory, independent pipeline control
Parameters	22 across 7 categories
Compute	Low

How It Works

The pipeline runs three stages:

1. bcftools mpileup

Reads the BAM file and constructs a pileup at each genomic position. For each position, it computes:

Base Alignment Quality (BAQ): Probabilistic realignment that adjusts base qualities near indels to reduce false SNP calls caused by misalignment
Genotype likelihoods: Based on base qualities, mapping qualities, and an indel error model
Indel candidates: Using a gap-aware model with configurable open/extension penalties

The mpileup stage outputs genotype likelihoods in BCF format.

2. bcftools call

Takes genotype likelihoods from mpileup and applies a caller model:

Multiallelic caller (-m): Uses a constrained likelihood framework that considers all alleles simultaneously. Controlled by the prior parameter (mutation rate). The BCFtools documentation recommends this as the standard caller.
Consensus caller (-c): Uses a simple consensus approach controlled by pval_threshold.

3. bcftools norm

Normalizes variants: left-aligns indels, splits multi-allelic sites, and ensures VCF consistency.

Hyperparameters

Base/Mapping Quality (mpileup)

Parameter	Flag	Range	Default	Description
`min_MQ`	`-q`	0-60	0	Minimum mapping quality for a read to be included. The default of 0 includes all reads, even those mapping to multiple locations. Higher values filter ambiguously mapped reads.
`min_BQ`	`-Q`	0-50	13	Minimum base quality for a base to be considered. The default of 13 corresponds to roughly a 5% error rate. Higher values increase precision. Lower values increase sensitivity at the cost of more noise.
`max_BQ`	`--max-BQ`	1-90	60	Cap base qualities at this maximum value. Prevents overly optimistic quality scores from dominating the likelihood calculation. Useful when instrument base quality calibration is poor.
`delta_BQ`	`--delta-BQ`	0-99	30	Smoothing parameter: if a base’s quality exceeds its neighbor’s quality + delta_BQ, it’s reduced. Prevents isolated high-quality bases from having outsized influence. Lower values apply more aggressive smoothing. 99 effectively disables this filter.
`adjust_MQ`	`-C`	0-100	50	Coefficient for downward-adjusting mapping quality. Reduces the effective mapping quality, making the caller slightly more conservative. 0 disables the adjustment. If your BAMs are BWA-aligned, 50 is appropriate.

Depth Limits (mpileup)

Parameter	Flag	Range	Default	Description
`max_depth`	`-d`	0-10000	250	Maximum read depth per file. Positions exceeding this depth are downsampled randomly.
`max_idepth`	`-L`	1-10000	250	Maximum depth for indel calling specifically. If depth exceeds this, indel calling is skipped at that position. Must be proportional to expected coverage.

BAQ — Base Alignment Quality (mpileup)

BAQ is a probabilistic realignment technique that adjusts base qualities near indels. It’s one of BCFtools’ key innovations for reducing false SNPs caused by indel-related misalignment.

Parameter	Flag	Values	Default	Description
`no_BAQ`	`-B`	true/false	false	Disable BAQ entirely. BAQ reduces false positive SNPs near indels but can also suppress real SNPs near real indels.
`full_BAQ`	`-D`	true/false	false	Apply BAQ on all reads, not just those in problematic regions. The default uses heuristics to decide when BAQ is needed. Enabling gives more thorough quality adjustment at the cost of some compute time.
`redo_BAQ`	`-E`	true/false	false	Recalculate BAQ from scratch, ignoring any existing BQ tags in the BAM. Enable if the BAM was processed by a pipeline that pre-computed BQ tags with different settings.

Indel Calling (mpileup)

These parameters control the indel error model in mpileup and directly affect indel calling behavior.

Parameter	Flag	Range	Default	Description
`open_prob`	`-o`	1-60	40	Phred-scaled gap open probability. The value represents how unlikely the caller believes a gap opening (start of an indel) is due to sequencing error. Lower values mean the caller believes gap opens are more likely real (not errors), so it calls more indels. Higher values are more conservative.
`ext_prob`	`-e`	1-60	20	Phred-scaled gap extension probability. Controls how unlikely extending an existing gap is. Lower values make the caller more willing to call longer indels. Higher values penalize longer gaps.
`gap_frac`	`-F`	0.0-1.0	0.002	Minimum fraction of reads that must be gapped at a position for an indel candidate to be generated. Lower values increase indel sensitivity. Higher values reduce false positive indels.
`tandem_qual`	`-h`	0-1000	500	Coefficient for homopolymer error modeling. Given an l-long homopolymer run, the sequencing error of an indel of size s is modeled as `tandem_qual * s / l`. Higher values tell the caller that indels in homopolymers are more likely real (more calls). Lower values are more skeptical of homopolymer indels. The change from 100 (BCFtools 1.12) to 500 (1.20) significantly increased indel sensitivity.
`indel_bias`	`--indel-bias`	0.1-5.0	1.0	Global scaling factor for indel scores. Values below 1.0 reduce indel calling (favor precision). Values above 1.0 increase indel calling (favor recall).
`del_bias`	`--del-bias`	0.1-2.0	1.0	Relative likelihood bias between deletions and insertions. Values below 1.0 make the caller believe deletions are less likely to be real (more likely artifacts). Values above 1.0 favor deletions. The default of 1.0 treats them equally. PacBio CCS profiles use 0.4 because PacBio reads have systematic deletion artifacts. For Illumina data, 1.0 is typical.
`min_ireads`	`--min-ireads`	1-100	1	Minimum number of gapped reads to generate an indel candidate. Higher values require more evidence before considering an indel, reducing false positives.
`score_vs_ref`	`--score-vs-ref`	0.0-1.0	0.0	Controls whether indel quality is scored against the reference allele (1.0) or the next-best alternative (0.0). Values between 0 and 1 interpolate. Higher values make indel calling more stringent (must clearly beat the reference). PacBio CCS profiles use 0.7.

Read Filtering (mpileup)

Parameter	Flag	Values	Default	Description
`count_orphans`	`-A`	true/false	false	Include anomalous read pairs — reads that are paired but not “properly paired” (unexpected insert size or orientation). By default, these are discarded. Enabling adds more data but may introduce mapping artifacts.

Caller Mode (call)

Parameter	Flag	Values	Default	Description
`multiallelic_caller`	`-m`	true/false	true	Use the multiallelic caller. When disabled, falls back to the consensus caller (`-c`). The multiallelic caller simultaneously evaluates all possible alleles at a site.
`variants_only`	`-v`	true/false	true	Only output variant sites. When disabled, all sites (including reference) are output.

Genotyping (call)

Parameter	Flag	Range	Default	Description
`ploidy`	`--ploidy`	GRCh37 / GRCh38 / X / Y / 1 / 2	GRCh38	Predefined ploidy model. `GRCh38` applies diploid calling with appropriate sex chromosome handling.
`prior`	`-P`	0.0-1.0	0.0011	Expected substitution (mutation) rate. Only used with the multiallelic caller (`-m`). This is the Bayesian prior probability of observing a variant at any given position. Higher values increase sensitivity — the caller expects more variants. Lower values increase specificity.
`pval_threshold`	`-p`	0.0-1.0	0.5	P-value threshold for the consensus caller (`-c`) only. This parameter has no effect when using the multiallelic caller (`-m`).

The pval_threshold parameter only applies when using the consensus caller (-c). It has no effect when using the multiallelic caller (-m), which is the default and recommended mode.

Documentation Index

​How It Works

​1. bcftools mpileup

​2. bcftools call

​3. bcftools norm

​Hyperparameters

​Base/Mapping Quality (mpileup)

​Depth Limits (mpileup)

​BAQ — Base Alignment Quality (mpileup)

​Indel Calling (mpileup)

​Read Filtering (mpileup)

​Caller Mode (call)

​Genotyping (call)

How It Works

1. bcftools mpileup

2. bcftools call

3. bcftools norm

Hyperparameters

Base/Mapping Quality (mpileup)

Depth Limits (mpileup)

BAQ — Base Alignment Quality (mpileup)

Indel Calling (mpileup)

Read Filtering (mpileup)

Caller Mode (call)

Genotyping (call)