UNC Systems Genetics

Collaborative Cross Genomes

Here we provide genomic sequences for the Collaborative Cross (CC) mouse strains and the eight CC founder strains in the form of FASTA files for the 19 autosomes, sex chromosomes (X and Y), and mitochondria (M). These sequences can be used as reference sequences for high-throughput short-read alignments, or for any other comparative genomic analyses.

Each genome comes with a companion MOD file, which can be used to remap coordinates from the FASTA sequences back to reference coordinates. This is necessary since, in general, all gene and genomic annotations are specified relative to the reference. MOD files are genome and version specific, and therefore should always be downloaded together as a set with their associated FASTA sequence.
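To illustrate why a companion MOD file is needed for remapping, here is a minimal Python sketch that converts a position in a strain sequence back to a reference coordinate, given a list of indels. The tuple representation of indels and the names `build_blocks` and `to_reference` are assumptions made for this example only. They are not the actual MOD file format or the modtools API.

```python
# Hypothetical indel records, 0-based reference coordinates:
#   ("d", p, n): reference bases p .. p+n-1 are absent from the strain
#   ("i", p, n): n extra bases sit in the strain just before reference base p
# Indels are assumed not to overlap.

def build_blocks(ref_len, indels):
    """Return (ref_start, strain_start, length) blocks of collinear sequence."""
    blocks = []
    ref = 0      # next unprocessed reference position
    strain = 0   # matching position in the strain sequence
    for kind, pos, n in sorted(indels, key=lambda rec: rec[1]):
        if pos > ref:
            blocks.append((ref, strain, pos - ref))
            strain += pos - ref
            ref = pos
        if kind == "d":
            ref += n        # skip reference bases missing from the strain
        elif kind == "i":
            strain += n     # skip strain bases missing from the reference
        else:
            raise ValueError("unknown indel type: %r" % (kind,))
    if ref < ref_len:
        blocks.append((ref, strain, ref_len - ref))
    return blocks


def to_reference(blocks, strain_pos):
    """Map a strain position to the reference, or None inside an insertion."""
    for ref_start, strain_start, length in blocks:
        if strain_start <= strain_pos < strain_start + length:
            return ref_start + (strain_pos - strain_start)
    return None


blocks = build_blocks(20, [("d", 5, 2), ("i", 12, 3)])
print(blocks)                    # [(0, 0, 5), (7, 5, 5), (12, 13, 8)]
print(to_reference(blocks, 5))   # 7: shifted right by the 2-base deletion
print(to_reference(blocks, 10))  # None: inside the 3-base insertion
```

Positions that fall inside inserted sequence have no reference counterpart, which is why the lookup returns `None` for them in this sketch.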

We supply two types of genomes: sequenced and imputed. Sequenced genomes result from direct DNA sequencing at a minimum of 30x coverage combined with an iterative alignment process. Imputed genomes are derived from genotype data: we first construct a haplotype mosaic using MegaMUGA genotypes and then assemble an imputed genome using segments of DNA sequence from the inferred founders.
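The idea of assembling an imputed genome from a haplotype mosaic can be sketched in a few lines of Python. This is a simplified illustration, not the pipeline used to build the genomes on this page: it assumes the mosaic is given as half-open intervals that tile one chromosome, and that all founder sequences share the same coordinates (no indels). The function name `assemble_imputed` and the toy sequences are hypothetical.

```python
def assemble_imputed(mosaic, founder_seqs):
    """Concatenate founder segments according to a haplotype mosaic.

    mosaic: list of (start, end, founder) half-open intervals, in order,
            tiling the chromosome from position 0 without gaps.
    founder_seqs: dict mapping founder name to its chromosome sequence.
    """
    pieces = []
    expected_start = 0
    for start, end, founder in mosaic:
        if start != expected_start:
            raise ValueError("mosaic intervals must tile the chromosome")
        pieces.append(founder_seqs[founder][start:end])
        expected_start = end
    return "".join(pieces)


# Toy 10-base "chromosomes" for three of the eight founders.
founders = {
    "A/J": "AAAAAAAAAA",
    "CAST/EiJ": "CCCCCCCCCC",
    "WSB/EiJ": "GGGGGGGGGG",
}
mosaic = [(0, 4, "A/J"), (4, 7, "CAST/EiJ"), (7, 10, "WSB/EiJ")]
imputed = assemble_imputed(mosaic, founders)
print(imputed)  # AAAACCCGGG
```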

For the previous version of this page, please click here.


Variants and Standard Reference Sequences

The variants included in the MOD files were extracted from the latest Variant Call Format (VCF) files from Sanger's recent sequencing efforts of common and important Mus musculus strains.

The standard Mus musculus reference sequences can be downloaded as follows.


CC Founders

The sequences and MOD files in the table below are from the eight founder strains of the CC and DO genetic reference panels.

Strain MOD (Build 37) FASTA (Build 37) MOD (Build 38) FASTA (Build 38)
A/J MOD File(29 MB) Fasta File(734 MB) MOD File(29 MB) Fasta File(759 MB)
C57BL/6J MOD File(0 MB) Fasta File(734 MB) MOD File(0 MB) Fasta File(760 MB)
129S1/SvImJ MOD File(31 MB) Fasta File(734 MB) MOD File(31 MB) Fasta File(759 MB)
NOD/ShiLtJ MOD File(30 MB) Fasta File(734 MB) MOD File(30 MB) Fasta File(759 MB)
NZO/HlLtJ MOD File(32 MB) Fasta File(734 MB) MOD File(32 MB) Fasta File(759 MB)
CAST/EiJ MOD File(113 MB) Fasta File(734 MB) MOD File(112 MB) Fasta File(759 MB)
PWK/PhJ MOD File(111 MB) Fasta File(734 MB) MOD File(111 MB) Fasta File(759 MB)
WSB/EiJ MOD File(43 MB) Fasta File(734 MB) MOD File(43 MB) Fasta File(759 MB)

CC Strains

Name Strain MOD (Build 37) FASTA (Build 37)
CC001/Unc OR13140 MOD File(37 MB) FASTA File(734 MB)
CC002/Unc OR15156 MOD File(41 MB) FASTA File(734 MB)
CC003/Unc OR13067 MOD File(42 MB) FASTA File(734 MB)
CC004/TauUnc IL16188 MOD File(40 MB) FASTA File(734 MB)
CC005/TauUnc IL16211 MOD File(36 MB) FASTA File(734 MB)
CC006/TauUnc IL16750 MOD File(37 MB) FASTA File(734 MB)
CC007/Unc OR13421 MOD File(35 MB) FASTA File(734 MB)
CC008/GeniUnc AU8036 MOD File(41 MB) FASTA File(734 MB)
CC009/Unc OR5489 MOD File(45 MB) FASTA File(734 MB)
CC010/GeniUnc AU8018 MOD File(36 MB) FASTA File(734 MB)
CC011/Unc OR3252 MOD File(37 MB) FASTA File(734 MB)
CC012/GeniUnc AU8005 MOD File(39 MB) FASTA File(734 MB)
CC013/GeniUnc AU8010 MOD File(32 MB) FASTA File(734 MB)
CC015/Unc OR3154 MOD File(42 MB) FASTA File(734 MB)
CC016/GeniUnc AU8024 MOD File(31 MB) FASTA File(734 MB)
CC017/Unc OR3032 MOD File(49 MB) FASTA File(734 MB)
CC018/Unc OR3609 MOD File(38 MB) FASTA File(734 MB)
CC019/TauUnc IL16513 MOD File(33 MB) FASTA File(734 MB)
CC020/GeniUnc AU8054 MOD File(35 MB) FASTA File(734 MB)
CC021/Unc OR1566 MOD File(49 MB) FASTA File(734 MB)
CC022/GeniUnc AU8046 MOD File(48 MB) FASTA File(734 MB)
CC023/GeniUnc AU8043 MOD File(43 MB) FASTA File(734 MB)
CC024/GeniUnc AU8004 MOD File(44 MB) FASTA File(734 MB)
CC025/GeniUnc AU8008 MOD File(49 MB) FASTA File(734 MB)
CC026/GeniUnc AU8026 MOD File(40 MB) FASTA File(734 MB)
CC027/GeniUnc AU8027 MOD File(36 MB) FASTA File(734 MB)
CC028/GeniUnc AU8016 MOD File(44 MB) FASTA File(734 MB)
CC029/Unc OR3564 MOD File(44 MB) FASTA File(734 MB)
CC030/GeniUnc AU8034 MOD File(45 MB) FASTA File(734 MB)
CC031/GeniUnc AU8031 MOD File(33 MB) FASTA File(734 MB)
CC032/GeniUnc AU8002 MOD File(40 MB) FASTA File(734 MB)
CC033/GeniUnc AU8033 MOD File(39 MB) FASTA File(734 MB)
CC034/Unc OR5080 MOD File(38 MB) FASTA File(734 MB)
CC035/Unc OR5035 MOD File(46 MB) FASTA File(734 MB)
CC036/Unc OR477 MOD File(38 MB) FASTA File(734 MB)
CC037/TauUnc IL16072 MOD File(28 MB) FASTA File(734 MB)
CC038/GeniUnc AU8049 MOD File(45 MB) FASTA File(734 MB)
CC039/Unc OR15155 MOD File(49 MB) FASTA File(734 MB)
CC040/TauUnc IL16557 MOD File(42 MB) FASTA File(734 MB)
CC041/TauUnc IL16441 MOD File(37 MB) FASTA File(734 MB)
CC042/GeniUnc AU18042 MOD File(39 MB) FASTA File(734 MB)
CC043/GeniUnc AU8021 MOD File(43 MB) FASTA File(734 MB)
CC044/Unc OR4410 MOD File(50 MB) FASTA File(734 MB)
CC045/GeniUnc AU8045 MOD File(42 MB) FASTA File(734 MB)
CC046/Unc OR5346 MOD File(42 MB) FASTA File(734 MB)
CC047/Unc OR5612 MOD File(41 MB) FASTA File(734 MB)
CC048/Unc OR5248 MOD File(53 MB) FASTA File(734 MB)
CC049/TauUnc IL16296 MOD File(39 MB) FASTA File(734 MB)
CC050/Unc OR867 MOD File(46 MB) FASTA File(734 MB)
CC051/TauUnc IL16912 MOD File(23 MB) FASTA File(734 MB)
CC052/GeniUnc AU8052 MOD File(46 MB) FASTA File(734 MB)
CC053/Unc OR773 MOD File(46 MB) FASTA File(734 MB)
CC054/GeniUnc AU8050 MOD File(50 MB) FASTA File(734 MB)
CC055/TauUnc IL16680 MOD File(46 MB) FASTA File(734 MB)
CC056/GeniUnc AU8056 MOD File(32 MB) FASTA File(734 MB)
CC057/Unc OR3393 MOD File(39 MB) FASTA File(734 MB)
CC058/Unc OR5358 MOD File(53 MB) FASTA File(734 MB)
CC059/TauUnc IL16012 MOD File(23 MB) FASTA File(734 MB)
CC060/Unc OR3460 MOD File(39 MB) FASTA File(734 MB)
CC061/GeniUnc AU8048 MOD File(38 MB) FASTA File(734 MB)
CC062/Unc OR5306 MOD File(44 MB) FASTA File(734 MB)
CC063/Unc OR5391 MOD File(46 MB) FASTA File(734 MB)
CC065/Unc OR5119 MOD File(40 MB) FASTA File(734 MB)
CC068/TauUnc IL16768 MOD File(45 MB) FASTA File(734 MB)
CC070/TauUnc IL16034 MOD File(45 MB) FASTA File(734 MB)
CC071/TauUnc IL16785 MOD File(0 MB) FASTA File(0 MB)
CC072/TauUnc IL16521 MOD File(25 MB) FASTA File(734 MB)
CC073/Unc OR1515 MOD File(45 MB) FASTA File(734 MB)
CC074/Unc OR3015 MOD File(40 MB) FASTA File(734 MB)
CC075/Unc OR3260 MOD File(0 MB) FASTA File(0 MB)
CC076/Unc OR5343 MOD File(0 MB) FASTA File(0 MB)
CC078/TauUnc IL6126 MOD File(0 MB) FASTA File(0 MB)
CC079/TauUnc IL6411 MOD File(0 MB) FASTA File(0 MB)
CC080/TauUnc IL6573 MOD File(0 MB) FASTA File(0 MB)
CC081/Unc OR3269 MOD File(0 MB) FASTA File(0 MB)

Sanger Strains

Strain MOD (Build 37) FASTA (Build 37) MOD (Build 38) FASTA (Build 38)
129P2/OlaHsd - - MOD File(30 MB) Fasta File(759 MB)
129S5SvEvBrd - - MOD File(26 MB) Fasta File(759 MB)
AKR/J MOD File(29 MB) Fasta File(734 MB) MOD File(29 MB) Fasta File(759 MB)
BALB/cJ MOD File(28 MB) Fasta File(734 MB) MOD File(28 MB) Fasta File(759 MB)
C3H/HeJ MOD File(31 MB) Fasta File(734 MB) MOD File(31 MB) Fasta File(759 MB)
C57BL/6NJ MOD File(0 MB) Fasta File(734 MB) MOD File(0 MB) Fasta File(759 MB)
CBA/J MOD File(32 MB) Fasta File(734 MB) MOD File(31 MB) Fasta File(759 MB)
DBA/2J MOD File(31 MB) Fasta File(734 MB) MOD File(30 MB) Fasta File(759 MB)
FVB/NJ MOD File(30 MB) Fasta File(734 MB) MOD File(29 MB) Fasta File(759 MB)
LP/J MOD File(32 MB) Fasta File(734 MB) MOD File(32 MB) Fasta File(759 MB)
SPRET/EiJ MOD File(212 MB) Fasta File(734 MB) MOD File(212 MB) Fasta File(759 MB)


Pseudogenome Tools

We provide a suite of tools that simplify the incorporation of our pseudogenomes into standard analysis and HiSeq pipelines.

The latest versions of all tools are available on PyPI. We highly recommend installing them with the following commands.

easy_install modtools
easy_install lapels
easy_install suspenders

 

[Modtools]

Modtools is used to generate standard reference genome and pseudogenome sequences.

The code is hosted at https://pypi.python.org/pypi/modtools.

For the usage of vcf2mod, please refer to the example in http://csbio.unc.edu/~sphuang/vcf2mod/ .

 

[Lapels]

Lapels is used to remap pseudogenome alignments, in the form of a BAM file, back to the reference sequence. This entails the removal of all indels (via the cigar string modifications, the underlying sequence is unaltered) and adjustments to the fragment and its mate's starting positions. Lapels also annotates the number and types (SNPs, insertions, and deletions) of sequence variants seen in each read.
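The per-read variant annotation described above can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not Lapels' implementation: variants are given as hypothetical `(type, reference_position)` pairs, a read is reduced to the half-open reference interval it covers, and the function name `count_variants` is made up for this example. The actual tag names that Lapels writes to the BAM file are not shown here.

```python
from collections import Counter


def count_variants(read_start, read_end, variants):
    """Count variants by type within the interval [read_start, read_end).

    variants: iterable of (kind, pos) pairs, where kind is "snp",
              "insertion", or "deletion" and pos is a reference position.
    """
    counts = Counter({"snp": 0, "insertion": 0, "deletion": 0})
    for kind, pos in variants:
        if read_start <= pos < read_end:
            counts[kind] += 1
    return dict(counts)


variants = [("snp", 105), ("insertion", 120), ("snp", 130), ("deletion", 180)]
annotation = count_variants(100, 150, variants)
print(annotation)  # {'snp': 2, 'insertion': 1, 'deletion': 0}
```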

The input consists of the BAM file of the pseudogenome alignment and the MOD file associated with the FASTA sequence used in the alignment. (Please download each MOD file together with its FASTA file.)

The output is a BAM file with corrected read positions, CIGAR strings, and annotation tags. It has been tested for compatibility with downstream tools such as IGV (using the reference genome) and Cufflinks (using any reference-based transcript library).

Lapels is written in Python. It requires the pysam library and the argparse library.

The code is hosted at https://pypi.python.org/pypi/lapels.

 

[Suspenders]

Suspenders merges the results of multiple alignments (BAM files) applied to the same set of reads. It is used when working with F1 and RIX crosses, where we suggest performing separate alignments to each parental genome. Suspenders then effectively merges and annotates these separate BAM files into a single consensus BAM file.

When reads map to the same genomic location in both alignments, only one read is output. Where there are differences in either mapping positions or multiplicity of reads, Suspenders determines the most likely alignment and source genome for the read, which is sent to the output BAM file. When there is no significant difference in the alignments all multiple mappings are output.
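The merging rules above can be sketched as a small decision function in Python. This is an illustration only: the criterion Suspenders actually uses to pick the "most likely" alignment is not described on this page, so the sketch assumes a simple comparison of mismatch counts. The dictionary layout and the name `merge_alignments` are likewise hypothetical.

```python
def merge_alignments(aln_a, aln_b):
    """Merge one read's alignments to parental genomes A and B.

    Each alignment is a dict with "chrom", "pos" (reference coordinates)
    and "mismatches", or None if the read did not align to that genome.
    Returns a list of (alignment, source) pairs to write to the output.
    """
    if aln_a is None and aln_b is None:
        return []
    if aln_b is None:
        return [(aln_a, "A")]
    if aln_a is None:
        return [(aln_b, "B")]

    same_place = (aln_a["chrom"], aln_a["pos"]) == (aln_b["chrom"], aln_b["pos"])
    if same_place:
        return [(aln_a, "both")]          # identical location: output once
    if aln_a["mismatches"] < aln_b["mismatches"]:
        return [(aln_a, "A")]             # assumed criterion: fewer mismatches
    if aln_b["mismatches"] < aln_a["mismatches"]:
        return [(aln_b, "B")]
    return [(aln_a, "A"), (aln_b, "B")]   # no clear winner: keep both


a = {"chrom": "1", "pos": 3000100, "mismatches": 0}
b = {"chrom": "1", "pos": 3000500, "mismatches": 2}
print(merge_alignments(a, b))  # the genome A alignment wins in this toy case
```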

Suspenders is written in Python. It requires the pysam and argparse libraries.

The code for Suspenders is available at https://pypi.python.org/pypi/suspenders/.

Usage

For general use cases, please refer to the Suspenders wiki page or contact James Holt (holtjma@cs.unc.edu).

Publications

S. Huang, C.-Y. Kao, L. McMillan, and W. Wang. Transforming genomes using mod files with applications. In Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM, 2013. [link]

J. Holt, S. Huang, L. McMillan, and W. Wang. Read annotation pipeline for high-throughput sequencing data. In Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM, 2013. [link]




