Docker Image: staphb/freebayes:1.3.7
FreeBayes is a Bayesian haplotype-based variant caller developed by Erik Garrison. FreeBayes uses a direct haplotype evaluation method — it considers all possible alleles at a locus simultaneously and computes Bayesian posterior probabilities for each genotype. It was designed to be fast, simple, and capable of calling SNPs, indels, MNPs (multi-nucleotide polymorphisms), and complex events in a single pass. FreeBayes is notable for its rich set of Bayesian priors (population genetics, Hardy-Weinberg, allele balance) that can be individually toggled. This makes it the tool with the most “knobs” for controlling the statistical model.
Approach: Bayesian haplotype evaluation with toggleable priors
Strengths: Fine-grained prior control, MNPs, complex events
Parameters: 24 across 10 categories
Compute: Medium

How It Works

1. Candidate Identification

FreeBayes scans the BAM file in the target region and identifies positions where reads differ from the reference. It applies coverage and allele fraction filters to determine which sites are worth evaluating.
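The filtering described above can be sketched in a few lines. This is a simplified single-sample illustration, not FreeBayes' actual code: the function name `candidate_alleles` and the list-of-bases pileup are hypothetical, and only min_coverage, min_alternate_count, and min_alternate_fraction are modeled.

```python
from collections import Counter

def candidate_alleles(ref_base, observed_bases, min_coverage=0,
                      min_alternate_count=2, min_alternate_fraction=0.05):
    """Return the alt alleles at one site that pass depth and support filters.

    Hypothetical helper: observed_bases is a list with one base per read.
    """
    depth = len(observed_bases)
    if depth == 0 or depth < min_coverage:
        return []
    alt_counts = Counter(b for b in observed_bases if b != ref_base)
    return sorted(
        allele for allele, n in alt_counts.items()
        if n >= min_alternate_count and n / depth >= min_alternate_fraction
    )

# 3 of 40 reads (7.5%) support G: passes both default thresholds
print(candidate_alleles("A", list("A" * 37 + "G" * 3)))
# A single alt read fails min_alternate_count=2
print(candidate_alleles("A", list("A" * 39 + "G")))
```

Raising min_alternate_fraction to 0.1 would drop the first site as well, since 7.5% falls below the threshold.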

2. Haplotype Construction

At each candidate site, FreeBayes considers all observed alleles and constructs candidate haplotypes within a configurable window (max_complex_gap). Unlike GATK's HaplotypeCaller, it does not perform full local assembly; instead, it directly evaluates the alleles observed in reads.
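To illustrate how a gap limit groups nearby variants into one complex allele, here is a minimal sketch. It assumes the gap is measured as the number of matching reference bases between two neighboring variant positions; the function name is hypothetical and FreeBayes' internal windowing is more involved.

```python
def group_variant_positions(positions, max_complex_gap=3):
    """Cluster positions whose neighbors are separated by at most
    max_complex_gap matching bases (assumed definition of the gap)."""
    groups = []
    for pos in sorted(positions):
        # Bases strictly between this variant and the previous one
        if groups and pos - groups[-1][-1] - 1 <= max_complex_gap:
            groups[-1].append(pos)
        else:
            groups.append([pos])
    return groups

print(group_variant_positions([100, 102, 110, 111, 150]))
# [[100, 102], [110, 111], [150]]
```

With max_complex_gap=0, only directly adjacent positions (110 and 111 here) would still be grouped, which corresponds to a plain MNP.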

3. Bayesian Genotyping

For each candidate site, FreeBayes calculates:
  • Data likelihood: Probability of observing the reads given each possible genotype, using base quality and, when use_mapping_quality is enabled, mapping quality
  • Prior probability: Based on the Ewens Sampling Formula (controlled by theta), Hardy-Weinberg Equilibrium, allele balance expectations, and a binomial observation model
  • Posterior probability: Proportional to likelihood × prior, normalized over all candidate genotypes and used to call the genotype
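The likelihood × prior combination can be made concrete with a toy diploid, biallelic example. This is a deliberately simplified sketch, not FreeBayes' model: it assumes a single fixed per-base error rate instead of per-read qualities, and it replaces the full set of priors with a common theta-based approximation (het ≈ theta, hom-alt ≈ theta/2).

```python
import math

def genotype_posteriors(ref_count, alt_count, error_rate=0.01, theta=0.001):
    """Posterior over genotypes 0/0, 0/1, 1/1 at one biallelic site.

    Assumptions (not from FreeBayes): fixed error_rate, simplified prior.
    """
    # Probability that a single read shows the alt allele under each genotype
    p_alt = {"0/0": error_rate, "0/1": 0.5, "1/1": 1.0 - error_rate}
    prior = {"0/0": 1.0 - 1.5 * theta, "0/1": theta, "1/1": theta / 2.0}
    # log(prior) + log(likelihood); the binomial coefficient is the same
    # for every genotype, so it cancels and is omitted
    log_joint = {
        g: math.log(prior[g])
        + alt_count * math.log(p_alt[g])
        + ref_count * math.log(1.0 - p_alt[g])
        for g in p_alt
    }
    # Normalize in log space to avoid underflow at high depth
    peak = max(log_joint.values())
    unnorm = {g: math.exp(v - peak) for g, v in log_joint.items()}
    total = sum(unnorm.values())
    return {g: v / total for g, v in unnorm.items()}

post = genotype_posteriors(ref_count=12, alt_count=10)
print(max(post, key=post.get))  # 0/1
```

With 19 reference reads and a single alt read, the same function favors 0/0: one mismatch is cheaper to explain as a sequencing error than as a heterozygous site under a prior of 0.001.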

4. Output

Sites passing the posterior probability threshold (pvar) and quality filters are emitted as variant calls.

Hyperparameters

Quality Filtering

| Parameter | Range | Default | Description |
| --- | --- | --- | --- |
| min_mapping_quality | 0-60 | 1 | Minimum mapping quality for a read to be used. The default of 1 is permissive: it keeps almost everything except reads that map equally well to more than one location (mapQ=0). Higher values filter ambiguously mapped reads and can reduce false positives in repetitive regions. |
| min_base_quality | 0-50 | 1 | Minimum base quality for an allele observation. The default of 1 is extremely permissive. Higher values filter low-quality bases that contribute noise. |
| base_quality_cap | 0-60 | 0 | Cap all base qualities at this value. 0 means disabled (no cap). Some instruments report overly optimistic quality scores. Setting a cap prevents any single base from having outsized influence. |
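The effect of a quality floor and cap is easiest to see on the Phred scale, where a quality Q corresponds to an error probability of 10^(-Q/10). The helper names below are hypothetical; the sketch only shows the arithmetic, not how FreeBayes applies it internally.

```python
def phred_to_error(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)

def effective_quality(q, min_base_quality=1, base_quality_cap=0):
    """Return the quality used for an observation, or None if it is filtered.

    A cap of 0 means no cap, matching the parameter description above.
    """
    if q < min_base_quality:
        return None
    if base_quality_cap > 0:
        q = min(q, base_quality_cap)
    return q

# A reported Q60 base (error 1e-6) capped at Q40 is treated as error 1e-4
capped = effective_quality(60, base_quality_cap=40)
print(capped, phred_to_error(capped))
```

Capping matters because a single Q60 observation carries as much weight in the likelihood as several Q20 observations combined.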

Allele Detection Thresholds

| Parameter | Range | Default | Description |
| --- | --- | --- | --- |
| min_alternate_fraction | 0.0-1.0 | 0.05 | Minimum fraction of reads at a site supporting the alternate allele. Lower values increase sensitivity to mosaic or low-VAF variants. Higher values reduce noise. This is a critical sensitivity control. |
| min_alternate_count | 1-100 | 2 | Minimum absolute number of reads supporting the alternate allele. The hard floor for alt support. Lower values maximize sensitivity. Higher values provide stronger evidence requirements. |
| min_alternate_qsum | 0-10000 | 0 | Minimum sum of base qualities across all reads supporting the alternate allele. Acts as a quality-weighted version of min_alternate_count. Provides a smoother filter than a hard count threshold. |
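A short sketch shows why a quality sum behaves differently from a read count. The function name and the example qualities are illustrative assumptions.

```python
def passes_alt_support(alt_base_qualities, min_alternate_count=2,
                       min_alternate_qsum=0):
    """Check count and summed-quality support for one alt allele.

    alt_base_qualities: Phred qualities of the bases supporting the allele.
    """
    return (len(alt_base_qualities) >= min_alternate_count
            and sum(alt_base_qualities) >= min_alternate_qsum)

low_quality_support = [10, 10, 10]   # three reads, qsum = 30
high_quality_support = [35, 35]      # two reads, qsum = 70

print(passes_alt_support(low_quality_support, min_alternate_qsum=50))   # False
print(passes_alt_support(high_quality_support, min_alternate_qsum=50))  # True
```

Both alleles pass a pure count threshold of 2, but only the one backed by high-quality bases passes a qsum threshold of 50.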

Coverage

| Parameter | Range | Default | Description |
| --- | --- | --- | --- |
| min_coverage | 0-1000 | 0 | Minimum total read depth to process a site. 0 means process everything. Higher values skip extremely low-coverage sites where calling is unreliable. |

Read Filtering

| Parameter | Range | Default | Description |
| --- | --- | --- | --- |
| mismatch_base_quality_threshold | 0-60 | 10 | Base quality threshold for counting mismatches in read-level filters. Only mismatches at bases with quality >= this value count toward read_max_mismatch_fraction. Lower values count more mismatches (stricter filtering). Higher values only count high-confidence mismatches. |
| read_max_mismatch_fraction | 0.0-1.0 | 1.0 | Maximum fraction of read bases that can be mismatches before excluding the read. The default of 1.0 disables this filter entirely. Lower values remove reads with a high proportion of mismatches, which strongly suggests misalignment or contamination. |
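The interaction between the two read-level parameters can be sketched as follows. This assumes a gapless alignment where the read and reference strings have equal length; the function name is hypothetical, and real reads would need CIGAR-aware comparison.

```python
def keep_read(read_seq, ref_seq, base_quals,
              mismatch_base_quality_threshold=10,
              read_max_mismatch_fraction=1.0):
    """Return True if the read survives the mismatch-fraction filter.

    Only mismatches at bases with quality >= the threshold are counted.
    """
    if not read_seq:
        return False
    counted = sum(
        1 for r, f, q in zip(read_seq, ref_seq, base_quals)
        if r != f and q >= mismatch_base_quality_threshold
    )
    return counted / len(read_seq) <= read_max_mismatch_fraction

ref = "ACGTACGTAC"
read = "ACGAACGTTC"          # mismatches at index 3 and index 8
quals = [30] * 10

print(keep_read(read, ref, quals))                                  # True
print(keep_read(read, ref, quals, read_max_mismatch_fraction=0.1))  # False
```

If the two mismatching bases had quality 5 instead of 30, they would fall below the threshold of 10, would not be counted, and the read would be kept even at a 0.1 limit.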

Genotype Likelihood / Priors

These parameters control the Bayesian statistical model and affect how FreeBayes weighs evidence and applies priors.

| Parameter | Range | Default | Description |
| --- | --- | --- | --- |
| theta | 0.0-0.1 | 0.001 | Population-scaled mutation rate, used as the parameter for the Ewens Sampling Formula prior. Higher values make the caller believe variants are more common, increasing sensitivity (more calls). Lower values make it more conservative. The default of 0.001 matches typical human nucleotide diversity. |
| read_dependence_factor | 0.0-1.0 | 0.9 | Scaling factor for successive observations from the same position/strand. Models the non-independence of reads (for example, due to PCR amplification). A value of 1.0 treats all reads as independent. Lower values discount redundant evidence more aggressively, reducing false positives from PCR bias. Values closer to 1.0 weight successive reads more nearly equally. Strongly affects the sensitivity/specificity tradeoff. |
| pvar | 0.0-1.0 | 0.0 | Minimum posterior probability to report a variant. At 0.0, all sites passing filters are reported. Higher values act as a Bayesian quality gate, only reporting variants the model is confident about. |
| use_mapping_quality | true/false | false | Incorporate mapping quality into data likelihood calculations. When enabled, reads with lower mapping quality contribute less to the genotype likelihood. Enabling can improve accuracy in regions with ambiguous mappings. |
| harmonic_indel_quality | true/false | false | Use the harmonic mean of flanking base qualities for indels instead of the minimum. The harmonic mean is more nuanced than the minimum and can provide better indel quality estimates. |
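The difference between the two indel-quality summaries is simple arithmetic. In the sketch below, which bases count as "flanking" and the function name are assumptions made for illustration.

```python
from statistics import harmonic_mean

def indel_quality(flanking_quals, harmonic=False):
    """Summarize flanking base qualities as one indel quality.

    harmonic=False mimics the minimum rule, harmonic=True the harmonic mean.
    """
    if harmonic:
        return harmonic_mean(flanking_quals)
    return min(flanking_quals)

flanks = [38, 12]
print(indel_quality(flanks))                 # 12
print(indel_quality(flanks, harmonic=True))  # about 18.24
```

The minimum lets one weak flanking base determine the whole estimate, while the harmonic mean is still pulled toward the low value but retains some credit for the high-quality neighbor.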

Prior Model Toggles

FreeBayes applies several Bayesian priors by default. Turning them off removes assumptions about population genetics.

| Parameter | Values | Default | Description |
| --- | --- | --- | --- |
| hwe_priors_off | true/false | false | Disable Hardy-Weinberg Equilibrium prior. HWE priors favor genotypes consistent with expected population frequencies (e.g., if alt allele frequency is 0.3, the prior favors het over hom-alt). |
| binomial_obs_priors_off | true/false | false | Disable binomial observation priors. These model the expected distribution of allele observations given a genotype (e.g., a het should show ~50% alt reads). Useful when observation distributions are systematically skewed. |
| allele_balance_priors_off | true/false | false | Disable allele balance probability prior. This prior penalizes genotypes where the observed allele balance doesn't match expectations. Similar to binomial priors but operates at the aggregate level. |
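The Hardy-Weinberg example in the table (alt allele frequency 0.3) can be worked through directly. This sketch only computes the textbook diploid HWE genotype frequencies; how FreeBayes folds them into its prior is not shown.

```python
def hwe_genotype_frequencies(alt_freq):
    """Expected diploid genotype frequencies under Hardy-Weinberg Equilibrium."""
    p = 1.0 - alt_freq  # reference allele frequency
    q = alt_freq        # alternate allele frequency
    return {"0/0": p * p, "0/1": 2 * p * q, "1/1": q * q}

for genotype, freq in hwe_genotype_frequencies(0.3).items():
    print(genotype, round(freq, 2))
# 0/0 0.49
# 0/1 0.42
# 1/1 0.09
```

At an alt frequency of 0.3, a heterozygous genotype is expected more than four times as often as a homozygous alt genotype, which is the preference the HWE prior encodes.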

Contamination

| Parameter | Range | Default | Description |
| --- | --- | --- | --- |
| prob_contamination | 0.0-1.0 | 0.0 | Prior probability that a read comes from contaminating DNA. Higher values raise the bar for calling heterozygous variants, as low-frequency alleles might be attributed to contamination. |

Population Genetics

| Parameter | Range | Default | Description |
| --- | --- | --- | --- |
| ploidy | 1-10 | 2 | Assumed ploidy. 2 for diploid human calling on autosomes. Changing this fundamentally alters the genotyping model. |
| use_best_n_alleles | 0-20 | 0 | Limit evaluation to the N best SNP alleles. 0 means evaluate all observed alleles. Lower values can speed up calling at multi-allelic sites without losing accuracy for typical biallelic variants. |
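Ploidy and the number of alleles together determine how many genotypes must be scored, which is why limiting alleles speeds up multi-allelic sites. The sketch below enumerates unordered genotypes; the function name is hypothetical.

```python
from itertools import combinations_with_replacement
from math import comb

def enumerate_genotypes(alleles, ploidy=2):
    """List every unordered genotype for the given alleles and ploidy."""
    return list(combinations_with_replacement(alleles, ploidy))

alleles = ["A", "G", "T"]
diploid = enumerate_genotypes(alleles, ploidy=2)
tetraploid = enumerate_genotypes(alleles, ploidy=4)

print(len(diploid), len(tetraploid))  # 6 15
# The count follows the multiset formula C(alleles + ploidy - 1, ploidy)
print(comb(len(alleles) + 4 - 1, 4))  # 15
```

Going from three to two alleles at ploidy 2 halves the genotype space from 6 to 3, and the savings grow quickly at higher ploidy.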

Haplotype / Complex Variants

| Parameter | Range | Default | Description |
| --- | --- | --- | --- |
| max_complex_gap | 0-100 | 3 | Maximum distance (in bp) between variants that can be grouped into a single complex allele (MNP or complex event). Higher values allow FreeBayes to call complex events spanning more bases. Lower values force variants to be called individually. |
| min_repeat_entropy | 0-4 | 1 | Shannon entropy threshold (in bits per bp) used for repeat-aware calling. To detect interrupted repeats, FreeBayes extends the haplotype window across sequence until its entropy exceeds this value. 0 turns the behavior off. Higher values extend windows further through low-complexity sequence; lower values stop extending sooner. |
| min_repeat_size | 1-100 | 5 | Minimum total length (in bp) of a short tandem repeat region to trigger repeat-aware calling. Lower values apply repeat handling to shorter repeats. Higher values only activate for longer repeat tracts. Affects accuracy in homopolymer and STR regions. |
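Since min_repeat_entropy is expressed in bits, it helps to see what Shannon entropy looks like for a few sequences. This sketch assumes entropy is computed over the base composition of a sequence window; the exact window FreeBayes evaluates is not modeled here.

```python
import math
from collections import Counter

def shannon_entropy(seq):
    """Shannon entropy (bits per base) of the base composition of seq."""
    if not seq:
        return 0.0
    n = len(seq)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(seq).values())

print(shannon_entropy("AAAAAAAA"))  # 0.0, homopolymer
print(shannon_entropy("ACACACAC"))  # 1.0, dinucleotide repeat
print(shannon_entropy("ACGTACGT"))  # 2.0, all four bases equally common
```

Homopolymers and simple repeats score low, so an entropy threshold gives a compact way to tell low-complexity sequence from ordinary sequence.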

Algorithm

| Parameter | Range | Default | Description |
| --- | --- | --- | --- |
| genotyping_max_banddepth | 1-20 | 7 | Maximum depth of the banded genotype likelihood calculation. Controls how many alternative genotypes are evaluated per sample. Higher values allow more thorough exploration of the genotype space at multi-allelic sites. Lower values are faster but may miss the correct genotype at complex sites. |
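The parameter names above resemble FreeBayes' long command-line options with underscores in place of hyphens. The sketch below builds an argument list on that assumption; the mapping is not confirmed by this page, so check `freebayes --help` inside the staphb/freebayes:1.3.7 image before relying on it. The file names and the function name are placeholders.

```python
def build_freebayes_args(reference, bam, params):
    """Turn a parameter dict into a freebayes argument list.

    Assumes each parameter maps to a long option of the same name with
    underscores replaced by hyphens, and that booleans are bare switches.
    """
    args = ["freebayes", "-f", reference]
    for name, value in params.items():
        flag = "--" + name.replace("_", "-")
        if isinstance(value, bool):
            # Boolean parameters become a switch only when enabled
            if value:
                args.append(flag)
        else:
            args.extend([flag, str(value)])
    args.append(bam)
    return args

params = {
    "min_mapping_quality": 20,
    "min_alternate_fraction": 0.1,
    "hwe_priors_off": True,
    "use_mapping_quality": False,
}
print(" ".join(build_freebayes_args("ref.fa", "sample.bam", params)))
```

Building a list rather than a single string keeps the result safe to pass to `subprocess.run` without shell quoting.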