We present full-genome genotype imputations for 88 classical laboratory mouse strains, obtained with a novel method, together with sequence-derived genotypes for 12 additional strains. Using genotypes at 549,683 SNP loci from the Mouse Diversity Array, we partitioned the genomes of all 100 strains into 40,647 intervals that show no evidence of historical recombination, and for each interval we inferred a local phylogenetic tree. We combined these trees with 12 million variant loci recently discovered by whole-genome sequencing of 12 classical laboratory strains that are also among the 100. For each tree we identified the strains that share a leaf node with one or more sequenced strains, and from these we imputed high- and medium-confidence genotypes for each of the 88 nonsequenced genomes. Among inbred strains, we imputed 92% of SNPs genome-wide, with 71% in high-confidence regions. The method produced 977 million new genotypes with an estimated per-SNP error rate of 0.083% in high-confidence regions and 0.37% genome-wide.
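The abstract does not spell out how recombination-free intervals are detected or how alleles are transferred, so the Python sketch below illustrates both steps under simplifying assumptions. Interval detection uses the four-gamete test, a standard criterion assumed here rather than taken from the text: among homozygous inbred strains, two biallelic SNPs that show all four allele combinations imply a historical recombination (or recurrent mutation) between them. Imputation copies a sequenced strain's allele to the nonsequenced strains in the same leaf of a local tree, and only when all sequenced strains in that leaf agree. The function names and strain names are hypothetical, the greedy left-to-right split is not necessarily how the 40,647 intervals were built, and confidence levels are not modeled.

```python
def four_gamete_compatible(snp_a, snp_b):
    """True if two SNP columns show fewer than four distinct allele pairs.

    Each column holds one allele per strain, in the same strain order.
    Assumes homozygous (inbred) strains with no missing calls.
    """
    return len(set(zip(snp_a, snp_b))) < 4


def partition_intervals(snps):
    """Greedily split ordered SNP columns into intervals free of four-gamete violations.

    Returns inclusive (first_index, last_index) pairs.
    """
    intervals = []
    start = 0
    for j in range(1, len(snps)):
        # Close the current interval if SNP j conflicts with any SNP already in it.
        if any(not four_gamete_compatible(snps[i], snps[j]) for i in range(start, j)):
            intervals.append((start, j - 1))
            start = j
    if snps:
        intervals.append((start, len(snps) - 1))
    return intervals


def impute_from_leaves(leaves, sequenced_alleles):
    """Copy alleles from sequenced strains to nonsequenced strains in the same leaf.

    leaves: sets of strain names that share a leaf of one interval's local tree.
    sequenced_alleles: {strain: allele} at one SNP, for sequenced strains only.
    A leaf is skipped if it has no sequenced strain or its sequenced strains disagree.
    """
    imputed = {}
    for leaf in leaves:
        alleles = {sequenced_alleles[s] for s in leaf if s in sequenced_alleles}
        if len(alleles) == 1:
            (allele,) = alleles
            for strain in leaf:
                if strain not in sequenced_alleles:
                    imputed[strain] = allele
    return imputed


# Four hypothetical strains; each string is one SNP column of 0/1 alleles.
print(partition_intervals(["0011", "0001", "0101", "0100"]))  # [(0, 1), (2, 3)]

leaves = [{"S1", "S2"}, {"S3", "S4", "S5"}, {"S6"}]
print(sorted(impute_from_leaves(leaves, {"S1": "A", "S3": "G", "S4": "G"}).items()))
# [('S2', 'A'), ('S5', 'G')]
```

Requiring agreement among the sequenced strains in a leaf is a conservative choice for the sketch; strain S6 stays unimputed because no sequenced strain shares its leaf.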

The download is a zipped (.zip) archive of 20 comma-separated (.csv) files, one for each autosome (1 to 19) and one for the X chromosome. Positions are relative to NCBI Build 37. Each strain's allele is followed by its confidence score, which can be 0 (low), 1 (medium), or 2 (high); alleles for the sequenced Sanger strains carry a confidence value of 3.

In the files annotated with SNP effects, the two columns immediately after the position (columns 2 and 3) give the alternate allele, listed only when an effect is listed, and the consequence of the SNP as predicted by the Ensembl 64 Variant Effect Predictor (VEP) v2.2. These consequences are taken directly from the Sanger annotated SNP files.

Each file lists one SNP per line in the following format:

Position        Strain 1  Confidence  Strain 2  Confidence  ...  Strain n  Confidence
SNP 1 position  allele    conf.       allele    conf.       ...  allele    conf.
SNP 2 position  allele    conf.       allele    conf.       ...  allele    conf.
...             ...       ...         ...       ...         ...  ...       ...
SNP m position  allele    conf.       allele    conf.       ...  allele    conf.
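Because each strain contributes an allele column followed by a confidence column, a row can be unpacked pairwise. The Python sketch below reads one chromosome file with the standard `csv` module. It assumes the header row carries the strain names above the allele columns, as in the layout shown, with one confidence column after each; the exact header text in the download has not been checked, and `parse_snp_file` is a hypothetical helper. For the SNP-effect files, `n_leading=3` skips the position, alternate-allele, and consequence columns.

```python
import csv
import io

# Confidence codes as described above; 3 marks alleles from the sequenced Sanger strains.
CONFIDENCE = {0: "low", 1: "medium", 2: "high", 3: "sequenced"}


def parse_snp_file(handle, n_leading=1):
    """Yield (position, {strain: (allele, confidence)}) for each SNP row.

    n_leading is the number of columns before the first strain's allele:
    1 for position only, 3 if the alternate-allele and consequence columns follow it.
    """
    reader = csv.reader(handle)
    header = next(reader)
    strains = header[n_leading::2]  # strain names; the columns between them are confidences
    for row in reader:
        if not row:  # skip blank lines
            continue
        alleles = row[n_leading::2]
        confidences = row[n_leading + 1::2]
        calls = {s: (a, int(c)) for s, a, c in zip(strains, alleles, confidences)}
        yield int(row[0]), calls


# Hypothetical two-strain excerpt in the plain (no SNP effects) layout.
sample = "Position,S1,Confidence,S2,Confidence\n3000000,A,2,G,0\n3000150,T,1,T,3\n"
for position, calls in parse_snp_file(io.StringIO(sample)):
    print(position, {s: (a, CONFIDENCE[c]) for s, (a, c) in calls.items()})
```

In practice the handle would come from opening one of the extracted .csv files (or from `zipfile.ZipFile.open` wrapped in `io.TextIOWrapper`) instead of `io.StringIO`.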

File                                                   Strains  Date        Size
Imputed with confidence scores                         88       10/13/2011  139.4 MB
Imputed with confidence scores + Sanger                100      2/8/2012    156.6 MB
Imputed with confidence scores + Sanger + SNP effects  100      2/13/2012   162.4 MB

These tools let you filter the full data set down to a chosen set of strains and a region of interest. Select a chromosome, start and end positions (NCBI Build 37), and a subset of strains, then click 'Update' to view the distribution of imputation confidence for the selected strains, averaged over the region. Once you are satisfied with the selection, click 'Filter and Download' to download the filtered data.
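For offline use, the same region-and-strain filter can be sketched in Python. This assumes each chromosome file has already been parsed into `(position, {strain: (allele, confidence)})` pairs; choosing a chromosome then amounts to choosing which of the 20 files to read. `filter_region` and `write_filtered_csv` are hypothetical helpers, and the inclusive bounds are an assumption, since the text does not say how the web tool treats the endpoints.

```python
import csv
import io


def filter_region(rows, start, end, strains):
    """Keep SNPs with start <= position <= end, restricted to the requested strains.

    rows: iterable of (position, {strain: (allele, confidence)}) pairs (NCBI Build 37).
    """
    for position, calls in rows:
        if start <= position <= end:
            yield position, {s: calls[s] for s in strains if s in calls}


def write_filtered_csv(rows, strains, handle):
    """Write rows as Position, strain, Confidence, ... columns, one SNP per line."""
    writer = csv.writer(handle, lineterminator="\n")
    header = ["Position"]
    for s in strains:
        header += [s, "Confidence"]
    writer.writerow(header)
    for position, calls in rows:
        out = [position]
        for s in strains:
            allele, conf = calls.get(s, ("", ""))  # blank cells if a strain is absent
            out += [allele, conf]
        writer.writerow(out)


# Hypothetical parsed rows for three strains.
rows = [
    (2999000, {"S1": ("A", 2), "S2": ("G", 0), "S3": ("C", 1)}),
    (3000150, {"S1": ("T", 1), "S2": ("T", 3), "S3": ("A", 2)}),
    (3500000, {"S1": ("G", 2), "S2": ("G", 2), "S3": ("G", 2)}),
]
out = io.StringIO()
write_filtered_csv(filter_region(rows, 3000000, 3400000, ["S1", "S3"]), ["S1", "S3"], out)
print(out.getvalue(), end="")
# Position,S1,Confidence,S3,Confidence
# 3000150,T,1,A,2
```

Using generators keeps memory use flat, which matters for whole-chromosome files.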


Positions can be entered in any of these equivalent forms: 3000000 = 3,000,000 = 3m = 3M = 3000k = 3KK.
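The equivalent position forms above suggest a simple parser: strip commas, then scale by a case-insensitive suffix (k for thousands, m or kk for millions). The Python sketch below is a hypothetical helper, not the site's actual input handling; accepting decimals such as 3.5m is an extra assumption, and values that do not resolve to a whole base pair are rejected.

```python
from decimal import Decimal

# Suffix multipliers inferred from the listed equivalences; "kk" is checked before "k".
_SUFFIXES = [("kk", 10**6), ("m", 10**6), ("k", 10**3)]


def parse_position(text):
    """Convert '3,000,000', '3m', '3000k', '3KK', etc. to an integer base-pair position."""
    s = text.strip().replace(",", "").lower()
    multiplier = 1
    for suffix, factor in _SUFFIXES:
        if s.endswith(suffix):
            s = s[: -len(suffix)]
            multiplier = factor
            break
    # Decimal avoids float rounding for inputs such as "3.5m".
    value = Decimal(s) * multiplier
    if value != value.to_integral_value():
        raise ValueError(f"position is not a whole base pair: {text!r}")
    return int(value)


for form in ["3000000", "3,000,000", "3m", "3M", "3000k", "3KK"]:
    print(form, parse_position(form))  # each prints 3000000
```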

Type a strain name to search. Multiple strains can be selected.

Confidence plot

Imputation confidence is shown as a stacked histogram where green represents high confidence, brown is medium confidence, and red is low confidence.
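The numbers behind the stacked histogram can be sketched as per-strain fractions of SNPs at each confidence level. This Python sketch assumes rows parsed as `(position, {strain: (allele, confidence)})` pairs and reads "averaged over your region of interest" as a simple count over SNP rows, which may differ from how the web tool aggregates; `confidence_fractions` is a hypothetical helper. Sequenced strains (confidence 3) get their own class, and plotting the fractions as green, brown, and red stacked bars is left out to keep the sketch dependency-free.

```python
from collections import Counter

# Confidence codes from the file description; 3 marks sequenced (Sanger) strains.
LABELS = {0: "low", 1: "medium", 2: "high", 3: "sequenced"}


def confidence_fractions(rows, strains):
    """Return {strain: {label: fraction of SNPs}} for the selected strains.

    rows: iterable of (position, {strain: (allele, confidence)}) pairs for one region.
    """
    counts = {s: Counter() for s in strains}
    for _position, calls in rows:
        for s in strains:
            if s in calls:
                counts[s][calls[s][1]] += 1
    result = {}
    for s, counter in counts.items():
        total = sum(counter.values())
        result[s] = {label: (counter[code] / total if total else 0.0)
                     for code, label in LABELS.items()}
    return result


# Hypothetical region with four SNPs and two strains.
rows = [
    (1000, {"S1": ("A", 2), "S2": ("G", 0)}),
    (2000, {"S1": ("T", 2), "S2": ("T", 1)}),
    (3000, {"S1": ("C", 1), "S2": ("C", 2)}),
    (4000, {"S1": ("G", 0), "S2": ("G", 2)}),
]
print(confidence_fractions(rows, ["S1"]))
# {'S1': {'low': 0.25, 'medium': 0.25, 'high': 0.5, 'sequenced': 0.0}}
```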

View a single strain

Type a strain name to search.

Users of these data should cite the following reference:


Jeremy R. Wang, Fernando Pardo-Manuel de Villena, Heather A. Lawson, James M. Cheverud, Gary A. Churchill, and Leonard McMillan. Imputation of SNPs in inbred mice using local phylogeny. Genetics, 2012.