Imputed Mouse SNP Resource

High-quality genotypes for 88 classical inbred strains

Description


We present full-genome genotype imputations for 88 classical laboratory mouse strains, produced with a novel method, together with sequencing-derived genotypes for 12 further strains. Using genotypes at 549,683 SNP loci from the Mouse Diversity Array, we partitioned the genomes of 100 mouse strains into 40,647 intervals that show no evidence of historical recombination, and for each interval we inferred a local phylogenetic tree. We combined these trees with 12 million variant loci recently discovered by whole-genome sequencing of a common subset of 12 classical laboratory strains. For each tree we identified the strains that share a leaf node with one or more of the sequenced strains, and from these we imputed high- and medium-confidence genotypes for each of the 88 nonsequenced genomes. Among inbred strains, we imputed 92% of SNPs genome-wide, with 71% in high-confidence regions. The method produced 977 million new genotypes, with an estimated per-SNP error rate of 0.083% in high-confidence regions and 0.37% genome-wide.
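The central idea of the imputation step, that a nonsequenced strain sharing a leaf with a sequenced strain inherits its alleles, can be illustrated with a small sketch. It is not the published implementation: it assumes each interval's local tree has already been reduced to its leaf groups (sets of strains indistinguishable in that interval), copies alleles only when every sequenced strain in the leaf agrees, and does not model the high/medium/low confidence levels. The function name `impute_interval` and the strain labels are hypothetical.

```python
from typing import Dict, List, Set

# Alleles per strain: {strain: {position: allele}}.
Genotypes = Dict[str, Dict[int, str]]


def impute_interval(leaves: List[Set[str]], sequenced: Genotypes) -> Genotypes:
    """Impute nonsequenced strains in one interval from leaf-mates.

    Assumption: a leaf is a set of strains sharing a leaf node in the
    interval's local phylogeny; confidence scoring is not modeled.
    """
    imputed: Genotypes = {}
    for leaf in leaves:
        donors = [s for s in leaf if s in sequenced]
        if not donors:
            continue  # no sequenced strain in this leaf: leave unimputed
        targets = [s for s in leaf if s not in sequenced]
        positions = set().union(*(sequenced[d].keys() for d in donors))
        for pos in sorted(positions):
            alleles = {sequenced[d].get(pos) for d in donors}
            if len(alleles) != 1 or None in alleles:
                continue  # donors disagree or one lacks a call: skip this SNP
            allele = alleles.pop()
            for target in targets:
                imputed.setdefault(target, {})[pos] = allele
    return imputed


# Toy interval with hypothetical strains: seqA..seqC are sequenced.
leaves = [{"seqA", "x1"}, {"seqB", "seqC", "x2"}, {"x3"}]
sequenced = {
    "seqA": {100: "A", 250: "G"},
    "seqB": {100: "T", 250: "G"},
    "seqC": {100: "T", 250: "C"},
}
print(impute_interval(leaves, sequenced))
```

Here `x1` receives both of `seqA`'s alleles, `x2` receives only position 100 because its two sequenced leaf-mates disagree at 250, and `x3`, whose leaf holds no sequenced strain, stays unimputed.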

Further details are explained in our paper (see reference).

Download


The download is a zipped (.zip) archive containing 20 comma-separated (.csv) files, one for each of the 19 autosomes plus X. Positions are relative to NCBI Build 37. For each strain, the allele is followed by a confidence value of 0 (low), 1 (medium), or 2 (high); alleles for the sequenced Sanger strains carry a confidence value of 3. In the files annotated with SNP effects, the two columns immediately after the position (columns 2 and 3) give the alternate allele (present only when an effect is listed) and the SNP consequence as predicted by the Ensembl 64 Variant Effect Predictor (VEP) v2.2. These consequences are taken directly from the Sanger annotated SNP files. Each file lists one SNP per line in the following format:


Position, Strain 1, Confidence, Strain 2, Confidence, ..., Strain n, Confidence
Position of SNP 1, Strain 1 allele, Strain 1 conf., Strain 2 allele, Strain 2 conf., ..., Strain n allele, Strain n conf.
Position of SNP 2, Strain 1 allele, Strain 1 conf., Strain 2 allele, Strain 2 conf., ..., Strain n allele, Strain n conf.
...
Position of SNP m, Strain 1 allele, Strain 1 conf., Strain 2 allele, Strain 2 conf., ..., Strain n allele, Strain n conf.
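A minimal Python reader for this layout follows. It assumes a header row in which each strain name labels that strain's allele column and is followed by its confidence column, and, for the SNP-effects files, exactly two extra columns (alternate allele and consequence) between the position and the first strain. The function name `parse_snp_file` and the column names in the example are hypothetical.

```python
import csv
import io
from typing import Dict, Iterator, Tuple

# Confidence codes as documented for the download files.
CONFIDENCE = {0: "low", 1: "medium", 2: "high", 3: "Sanger (sequenced)"}

Calls = Dict[str, Tuple[str, int]]


def parse_snp_file(
    text: str, with_effects: bool = False
) -> Iterator[Tuple[int, Tuple[str, ...], Calls]]:
    """Yield (position, effect_columns, {strain: (allele, confidence)}) per SNP.

    Assumes each strain name in the header labels its allele column and is
    followed by that strain's confidence column.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # Strain columns start after the position, plus the two effect columns
    # (alternate allele, consequence) in the SNP-effects files.
    first = 3 if with_effects else 1
    strains = header[first::2]
    for row in reader:
        if not row:
            continue  # skip blank lines
        effects = tuple(row[1:first])
        calls: Calls = {}
        for i, strain in enumerate(strains):
            allele = row[first + 2 * i]
            conf = int(row[first + 2 * i + 1])
            calls[strain] = (allele, conf)
        yield int(row[0]), effects, calls


example = "Position,StrainX,Confidence,StrainY,Confidence\n3000123,A,2,G,3\n"
for position, effects, calls in parse_snp_file(example):
    print(position, {s: (a, CONFIDENCE[c]) for s, (a, c) in calls.items()})
```

For a downloaded chromosome file, pass the text read with `open(path, newline="").read()`; for very large files, adapting the function to take the open file object avoids holding everything in memory.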

File | Strains | Date | Size
Imputed with confidence scores | 88 | 10/13/2011 | 139.4 MB
Imputed with confidence scores + Sanger | 100 | 2/08/2012 | 156.6 MB
Imputed with confidence scores + Sanger + SNP effects | 100 | 2/13/2012 | 162.4 MB

Publications using these data should cite the reference below.

Tools


These tools filter the full data set down to a chosen set of strains and a region of interest. Select a chromosome, start and end positions (NCBI Build 37), and a subset of strains. Click 'Update' to view the distribution of imputation confidence for the selected strains, averaged over the region. Once the selection is what you want, click 'Filter and Download' to download the filtered data.
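Offline, the same filtering amounts to a range test on position plus a column selection. This sketch works on records already parsed into `(position, {strain: (allele, confidence)})` pairs; the inclusive bounds and the name `filter_region` are assumptions, not a description of how the web tool is implemented.

```python
from typing import Dict, Iterable, List, Tuple

# One parsed SNP: (position, {strain: (allele, confidence)}).
Record = Tuple[int, Dict[str, Tuple[str, int]]]


def filter_region(
    records: Iterable[Record], start: int, end: int, strains: List[str]
) -> List[Record]:
    """Keep SNPs with start <= position <= end, restricted to the given strains."""
    if start > end:
        raise ValueError("start must not exceed end")
    wanted = set(strains)
    kept: List[Record] = []
    for pos, calls in records:
        if start <= pos <= end:  # inclusive bounds (assumption)
            kept.append((pos, {s: c for s, c in calls.items() if s in wanted}))
    return kept


records = [
    (100, {"X": ("A", 2), "Y": ("G", 1)}),
    (200, {"X": ("T", 0), "Y": ("T", 2)}),
    (300, {"X": ("C", 2), "Y": ("C", 2)}),
]
print(filter_region(records, 150, 300, ["X"]))
```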

Filter

Positions can be entered in any of these equivalent forms: 3000000 = 3,000,000 = 3m = 3M = 3000k = 3KK
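One way to accept all of these forms is to drop commas and then scale by the suffix, case-insensitively: each `k` multiplies by 1,000 (so `KK` is 1,000,000) and a single `m` multiplies by 1,000,000. This rule is inferred from the examples above; support for decimals such as `2.5M` is an assumption, and `parse_position` is a hypothetical name.

```python
import re
from decimal import Decimal

# Suffix multipliers inferred from the examples (assumption).
_MULT = {"k": 1_000, "m": 1_000_000}


def parse_position(text: str) -> int:
    """Parse '3000000', '3,000,000', '3m', '3M', '3000k' or '3KK' into bases."""
    s = text.strip().replace(",", "").lower()
    # A number followed by either one 'm' or any run of 'k's.
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(m|k*)", s)
    if match is None:
        raise ValueError(f"unrecognized position: {text!r}")
    value = Decimal(match.group(1))
    for suffix in match.group(2):
        value *= _MULT[suffix]
    if value != value.to_integral_value():
        raise ValueError(f"position is not a whole base: {text!r}")
    return int(value)


for form in ["3000000", "3,000,000", "3m", "3M", "3000k", "3KK"]:
    print(form, parse_position(form))
```

Using `Decimal` rather than `float` keeps inputs like `2.5M` exact before the whole-base check.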

Type a strain name to search. Multiple strains can be selected.


View filter data
Confidence plot

Imputation confidence is shown as a stacked histogram: green represents high confidence, brown medium confidence, and red low confidence.
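The bar heights in such a plot can be computed as the fraction of each strain's SNPs in the region at each confidence level. This sketch assumes the tool counts SNPs (rather than, for example, weighting by base pairs) and skips confidence 3 (sequenced Sanger strains), which has no color in the plot; `confidence_fractions` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# One parsed SNP: (position, {strain: (allele, confidence)}).
Record = Tuple[int, Dict[str, Tuple[str, int]]]

# Plot colors from the description: high = green, medium = brown, low = red.
LEVELS = {2: "high", 1: "medium", 0: "low"}


def confidence_fractions(
    records: Iterable[Record], strains: List[str]
) -> Dict[str, Dict[str, float]]:
    """Per strain, the fraction of imputed SNPs at each confidence level."""
    counts = {s: Counter() for s in strains}
    for _pos, calls in records:
        for s in strains:
            # Ignores code 3 (sequenced strains) and strains without a call.
            if s in calls and calls[s][1] in LEVELS:
                counts[s][LEVELS[calls[s][1]]] += 1
    fractions: Dict[str, Dict[str, float]] = {}
    for s, c in counts.items():
        total = sum(c.values())
        fractions[s] = {
            name: (c[name] / total if total else 0.0) for name in LEVELS.values()
        }
    return fractions


records = [
    (100, {"X": ("A", 2)}),
    (200, {"X": ("T", 0)}),
    (300, {"X": ("C", 2)}),
    (400, {"X": ("G", 1)}),
]
print(confidence_fractions(records, ["X"]))
```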


View a single strain

Type a strain name to search.


Download
Filter and Download

This may take several minutes.

Reference


Jeremy R. Wang, Fernando Pardo-Manuel de Villena, Heather A. Lawson, James M. Cheverud, Gary A. Churchill, and Leonard McMillan. Imputation of SNPs in inbred mice using local phylogeny. Genetics, 2012.