We present full-genome genotype imputations for 88 classical laboratory mouse strains, using a novel method, and genotypes for 12 strains determined by sequencing. Using genotypes at 549,683 SNP loci obtained with the Mouse Diversity Array, we partitioned the genome of 100 mouse strains into 40,647 intervals that exhibit no evidence of historical recombination. For each of these intervals we inferred a local phylogenetic tree. We combined these data with 12 million loci with sequence variations recently discovered by whole-genome sequencing in a common subset of 12 classical laboratory strains. For each phylogenetic tree we identified strains sharing a leaf node with one or more of the sequenced strains. We then imputed high- and medium-confidence genotypes for each of 88 nonsequenced genomes. Among inbred strains, we imputed 92% of SNPs genome-wide, with 71% in high-confidence regions. Our method produced 977 million new genotypes with an estimated per-SNP error rate of 0.083% in high-confidence regions and 0.37% genome-wide.
Further details are explained in our paper (see reference).
Each download is a zipped (.zip) archive of 20 comma-separated (.csv) files, one for each of the 19 autosomes plus the X chromosome. Positions are relative to NCBI Build 37. Each strain's allele is followed by its confidence value, which can be 0 (low), 1 (medium), or 2 (high). Alleles for the Sanger-sequenced strains are marked with a confidence value of 3. In the files annotated with SNP effects, columns 2 and 3 (immediately after the position) give the alternate allele (only when an effect is listed) and the consequence of the SNP as reported by the Ensembl 64 VEP v2.2. SNP consequences are taken directly from the Sanger annotated SNP files. Each file lists one SNP per line in the following format:
|Position||Strain 1||Confidence||Strain 2||Confidence||...||Strain n||Confidence|
|Position of SNP 1||Strain 1 allele||Strain 1 confidence||Strain 2 allele||Strain 2 confidence||...||Strain n allele||Strain n confidence|
|Position of SNP 2||Strain 1 allele||Strain 1 confidence||Strain 2 allele||Strain 2 confidence||...||Strain n allele||Strain n confidence|
|Position of SNP n||Strain 1 allele||Strain 1 confidence||Strain 2 allele||Strain 2 confidence||...||Strain n allele||Strain n confidence|
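The layout above can be read with a standard CSV parser. The following is a minimal sketch, assuming each file begins with a header row whose strain names sit in every second column; the function name `parse_genotypes` and the strain names and positions in the sample are hypothetical, and the extra alternate-allele and consequence columns of the SNP-effects files are not handled.

```python
import csv
import io

# Confidence codes as described in the file-format notes.
CONFIDENCE_LABELS = {0: "low", 1: "medium", 2: "high", 3: "sequenced (Sanger)"}


def parse_genotypes(handle):
    """Yield (position, {strain: (allele, confidence)}) for each SNP row.

    Assumes the first row is a header of the form
    Position,<strain 1>,Confidence,<strain 2>,Confidence,...
    """
    reader = csv.reader(handle)
    header = next(reader)
    # Strain names occupy columns 1, 3, 5, ...; each is followed by its confidence column.
    strains = header[1::2]
    for row in reader:
        if not row:
            continue  # skip blank lines
        position = int(row[0])
        alleles = row[1::2]
        confidences = [int(value) for value in row[2::2]]
        yield position, dict(zip(strains, zip(alleles, confidences)))


# Made-up sample data in the assumed layout (not taken from the real files).
sample = io.StringIO(
    "Position,StrainA,Confidence,StrainB,Confidence\n"
    "3000185,G,2,T,1\n"
    "3000234,A,3,A,0\n"
)
records = list(parse_genotypes(sample))
for position, calls in records:
    for strain, (allele, confidence) in calls.items():
        print(position, strain, allele, CONFIDENCE_LABELS[confidence])
```

Yielding one record per SNP keeps memory use low, which matters for files of this size; to read a real file, replace the sample with an open file handle.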
|File||# of strains||Date||Size|
|Imputed w/confidence scores||88||10/13/2011||139.4 MB|
|Imputed w/confidence scores + Sanger||100||2/08/2012||156.6 MB|
|Imputed w/confidence scores + Sanger + SNP effects||100||2/13/2012||162.4 MB|
Publications that use these data should cite the reference below.
These tools allow you to filter the full data set down to a particular set of strains and a region of interest. Select a chromosome, start and end positions (NCBI Build 37), and a subset of strains. Click 'Update' to view the distribution of imputation confidence for the selected strains, averaged over your region of interest. After choosing the subset you want, click 'Filter and Download' to download the filtered data.
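The same kind of filtering can be approximated offline on a downloaded file. The sketch below is illustrative only and is not the code behind the web tool: it assumes records have already been parsed into `(position, {strain: (allele, confidence)})` pairs, treats the start and end positions as inclusive, and uses a hypothetical function name, `confidence_distribution`, with made-up data.

```python
from collections import Counter


def confidence_distribution(records, start, end, strains):
    """Return {strain: {confidence: fraction}} over SNPs with start <= position <= end."""
    counts = {strain: Counter() for strain in strains}
    for position, calls in records:
        if start <= position <= end:  # inclusive bounds are an assumption
            for strain in strains:
                _allele, confidence = calls[strain]
                counts[strain][confidence] += 1
    distribution = {}
    for strain, counter in counts.items():
        total = sum(counter.values())
        # Leave the entry empty when no SNP falls inside the region.
        distribution[strain] = (
            {conf: n / total for conf, n in sorted(counter.items())} if total else {}
        )
    return distribution


# Made-up records for two hypothetical strains.
records = [
    (100, {"StrainA": ("G", 2), "StrainB": ("T", 1)}),
    (200, {"StrainA": ("A", 2), "StrainB": ("A", 0)}),
    (300, {"StrainA": ("C", 1), "StrainB": ("C", 2)}),
    (900, {"StrainA": ("T", 0), "StrainB": ("T", 0)}),
]
dist = confidence_distribution(records, 100, 300, ["StrainA"])
print(dist)
```

Each strain's fractions sum to 1 over the selected region, which corresponds to the proportions a stacked histogram of confidence levels would display.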
Imputation confidence is shown as a stacked histogram where green represents
Jeremy R. Wang, Fernando Pardo-Manuel de Villena, Heather A. Lawson, James M. Cheverud, Gary A. Churchill, and Leonard McMillan. Imputation of SNPs in inbred mice using local phylogeny. Genetics, 2012.