Docker Image: quay.io/biocontainers/bcftools:1.20 + samtools:1.20
BCFtools is the variant calling component of the SAMtools/HTSlib ecosystem, maintained by the Wellcome Sanger Institute. It uses a classic pileup-based approach: for each genomic position, it stacks up all the reads, computes genotype likelihoods from the base and mapping qualities, then applies a Bayesian caller to determine genotypes.
BCFtools is fast, lightweight, and well-suited for straightforward variant calling. Its two-stage pipeline (mpileup → call) gives miners independent control over the data preparation (mpileup) and genotyping (call) stages.
| |
|---|
| Approach | Pileup-based genotype likelihoods + Bayesian caller |
| Strengths | Speed, low memory, independent pipeline control |
| Parameters | 22 across 7 categories |
| Compute | Low |
How It Works
The pipeline runs three stages:
Reads the BAM file and constructs a pileup at each genomic position. For each position, it computes:
- Base Alignment Quality (BAQ): Probabilistic realignment that adjusts base qualities near indels to reduce false SNP calls caused by misalignment
- Genotype likelihoods: Based on base qualities, mapping qualities, and an indel error model
- Indel candidates: Using a gap-aware model with configurable open/extension penalties
The mpileup stage outputs genotype likelihoods in BCF format.
Takes genotype likelihoods from mpileup and applies a caller model:
- Multiallelic caller (
-m): Uses a constrained likelihood framework that considers all alleles simultaneously. Controlled by the prior parameter (mutation rate). The BCFtools documentation recommends this as the standard caller.
- Consensus caller (
-c): Uses a simple consensus approach controlled by pval_threshold.
Normalizes variants: left-aligns indels, splits multi-allelic sites, and ensures VCF consistency.
Hyperparameters
Base/Mapping Quality (mpileup)
| Parameter | Flag | Range | Default | Description |
|---|
min_MQ | -q | 0-60 | 0 | Minimum mapping quality for a read to be included. The default of 0 includes all reads, even those mapping to multiple locations. Higher values filter ambiguously mapped reads. |
min_BQ | -Q | 0-50 | 13 | Minimum base quality for a base to be considered. The default of 13 corresponds to roughly a 5% error rate. Higher values increase precision. Lower values increase sensitivity at the cost of more noise. |
max_BQ | --max-BQ | 1-90 | 60 | Cap base qualities at this maximum value. Prevents overly optimistic quality scores from dominating the likelihood calculation. Useful when instrument base quality calibration is poor. |
delta_BQ | --delta-BQ | 0-99 | 30 | Smoothing parameter: if a base’s quality exceeds its neighbor’s quality + delta_BQ, it’s reduced. Prevents isolated high-quality bases from having outsized influence. Lower values apply more aggressive smoothing. 99 effectively disables this filter. |
adjust_MQ | -C | 0-100 | 50 | Coefficient for downward-adjusting mapping quality. Reduces the effective mapping quality, making the caller slightly more conservative. 0 disables the adjustment. If your BAMs are BWA-aligned, 50 is appropriate. |
Depth Limits (mpileup)
| Parameter | Flag | Range | Default | Description |
|---|
max_depth | -d | 0-10000 | 250 | Maximum read depth per file. Positions exceeding this depth are downsampled randomly. |
max_idepth | -L | 1-10000 | 250 | Maximum depth for indel calling specifically. If depth exceeds this, indel calling is skipped at that position. Must be proportional to expected coverage. |
BAQ — Base Alignment Quality (mpileup)
BAQ is a probabilistic realignment technique that adjusts base qualities near indels. It’s one of BCFtools’ key innovations for reducing false SNPs caused by indel-related misalignment.
| Parameter | Flag | Values | Default | Description |
|---|
no_BAQ | -B | true/false | false | Disable BAQ entirely. BAQ reduces false positive SNPs near indels but can also suppress real SNPs near real indels. |
full_BAQ | -D | true/false | false | Apply BAQ on all reads, not just those in problematic regions. The default uses heuristics to decide when BAQ is needed. Enabling gives more thorough quality adjustment at the cost of some compute time. |
redo_BAQ | -E | true/false | false | Recalculate BAQ from scratch, ignoring any existing BQ tags in the BAM. Enable if the BAM was processed by a pipeline that pre-computed BQ tags with different settings. |
Indel Calling (mpileup)
These parameters control the indel error model in mpileup and directly affect indel calling behavior.
| Parameter | Flag | Range | Default | Description |
|---|
open_prob | -o | 1-60 | 40 | Phred-scaled gap open probability. The value represents how unlikely the caller believes a gap opening (start of an indel) is due to sequencing error. Lower values mean the caller believes gap opens are more likely real (not errors), so it calls more indels. Higher values are more conservative. |
ext_prob | -e | 1-60 | 20 | Phred-scaled gap extension probability. Controls how unlikely extending an existing gap is. Lower values make the caller more willing to call longer indels. Higher values penalize longer gaps. |
gap_frac | -F | 0.0-1.0 | 0.002 | Minimum fraction of reads that must be gapped at a position for an indel candidate to be generated. Lower values increase indel sensitivity. Higher values reduce false positive indels. |
tandem_qual | -h | 0-1000 | 500 | Coefficient for homopolymer error modeling. Given an l-long homopolymer run, the sequencing error of an indel of size s is modeled as tandem_qual * s / l. Higher values tell the caller that indels in homopolymers are more likely real (more calls). Lower values are more skeptical of homopolymer indels. The change from 100 (BCFtools 1.12) to 500 (1.20) significantly increased indel sensitivity. |
indel_bias | --indel-bias | 0.1-5.0 | 1.0 | Global scaling factor for indel scores. Values below 1.0 reduce indel calling (favor precision). Values above 1.0 increase indel calling (favor recall). |
del_bias | --del-bias | 0.1-2.0 | 1.0 | Relative likelihood bias between deletions and insertions. Values below 1.0 make the caller believe deletions are less likely to be real (more likely artifacts). Values above 1.0 favor deletions. The default of 1.0 treats them equally. PacBio CCS profiles use 0.4 because PacBio reads have systematic deletion artifacts. For Illumina data, 1.0 is typical. |
min_ireads | --min-ireads | 1-100 | 1 | Minimum number of gapped reads to generate an indel candidate. Higher values require more evidence before considering an indel, reducing false positives. |
score_vs_ref | --score-vs-ref | 0.0-1.0 | 0.0 | Controls whether indel quality is scored against the reference allele (1.0) or the next-best alternative (0.0). Values between 0 and 1 interpolate. Higher values make indel calling more stringent (must clearly beat the reference). PacBio CCS profiles use 0.7. |
Read Filtering (mpileup)
| Parameter | Flag | Values | Default | Description |
|---|
count_orphans | -A | true/false | false | Include anomalous read pairs — reads that are paired but not “properly paired” (unexpected insert size or orientation). By default, these are discarded. Enabling adds more data but may introduce mapping artifacts. |
Caller Mode (call)
| Parameter | Flag | Values | Default | Description |
|---|
multiallelic_caller | -m | true/false | true | Use the multiallelic caller. When disabled, falls back to the consensus caller (-c). The multiallelic caller simultaneously evaluates all possible alleles at a site. |
variants_only | -v | true/false | true | Only output variant sites. When disabled, all sites (including reference) are output. |
Genotyping (call)
| Parameter | Flag | Range | Default | Description |
|---|
ploidy | --ploidy | GRCh37 / GRCh38 / X / Y / 1 / 2 | GRCh38 | Predefined ploidy model. GRCh38 applies diploid calling with appropriate sex chromosome handling. |
prior | -P | 0.0-1.0 | 0.0011 | Expected substitution (mutation) rate. Only used with the multiallelic caller (-m). This is the Bayesian prior probability of observing a variant at any given position. Higher values increase sensitivity — the caller expects more variants. Lower values increase specificity. |
pval_threshold | -p | 0.0-1.0 | 0.5 | P-value threshold for the consensus caller (-c) only. This parameter has no effect when using the multiallelic caller (-m). |
The pval_threshold parameter only applies when using the consensus caller (-c). It has no effect when using the multiallelic caller (-m), which is the default and recommended mode.