Instructions:

The source code and documentation of GeneScissors can be downloaded with the following git command:

 git clone git://github.com/zzj/jeweler.git

Currently, GeneScissors has only been tested on Linux. On your Linux machine, please make sure the following software and libraries are installed correctly: Python (>=2.7.2), cmake, gcc (>=4.4), Cufflinks, and TopHat. For Python, please make sure that pip and the scikit-learn package are also installed.
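A quick way to confirm that the command-line prerequisites are visible on your PATH is a small check like the one below. This is only a sketch and not part of GeneScissors: the executable names in REQUIRED_TOOLS are assumptions (your installation may name them differently), it does not check version numbers or the scikit-learn package, and it needs Python 3.3 or later because it relies on shutil.which.

```python
import shutil

# Executable names are assumptions; adjust them to match your installation.
REQUIRED_TOOLS = ["cmake", "gcc", "cufflinks", "tophat", "pip"]


def find_missing_tools(tools, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = find_missing_tools(REQUIRED_TOOLS)
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("All listed tools were found on PATH.")
```

The `which` parameter exists only so the lookup can be replaced in a test; in normal use the default is sufficient.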

After cloning the repository, enter the folder named jeweler and run the following command:

./bootstrap.sh

This command automatically downloads the libraries required to compile GeneScissors. If you see an error at this step, consult the manual of the corresponding library to find out why it failed to compile. Please do not proceed to the next step until all errors are resolved. Once bootstrap.sh finishes without errors, run:

make

This compiles GeneScissors. At the end of the build, the make command runs all GeneScissors tests. If all tests pass, GeneScissors has been compiled correctly.

Next, prepare a reference information file. Each row describes one strain and contains three tab-separated columns: the first column is a single letter, the second is the strain name, and the third is the path to the reference genome file (FASTA) for that strain. An example is here.
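Before running the pipeline, it can help to check that the reference information file really has three tab-separated columns per row. The following sketch parses such a table; it is not part of GeneScissors, the function name read_reference_table is hypothetical, and the two rows in the example are invented placeholders rather than real strain data.

```python
import csv
import io


def read_reference_table(handle):
    """Parse a tab-separated reference table into {letter: (strain, fasta_path)}."""
    table = {}
    for line_no, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row or all(not cell.strip() for cell in row):
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError("line %d: expected 3 columns, got %d" % (line_no, len(row)))
        letter, strain, fasta_path = (cell.strip() for cell in row)
        if len(letter) != 1:
            raise ValueError("line %d: first column must be a single letter" % line_no)
        if letter in table:
            raise ValueError("line %d: letter %r is used twice" % (line_no, letter))
        table[letter] = (strain, fasta_path)
    return table


# Hypothetical table contents, for illustration only.
example = io.StringIO(
    "A\tstrainA\t/path/to/strainA.fa\n"
    "B\tstrainB\t/path/to/strainB.fa\n"
)
reference = read_reference_table(example)
print(reference["A"])
```

For a real file, pass an open file object instead of the in-memory example, e.g. `read_reference_table(open("reference.table"))`.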

Then create a second file that lists all BAM files produced by the aligner. The naming convention is that the first two letters of each file name indicate the paternal and maternal reference genomes, using the letters defined in the reference information file; the remaining characters can be anything you like. An example is here. In the example, the first filename is HF_0130_M_merged.bam; based on the reference information file, it represents an F1 mouse whose mother is WSB and whose father is CAST.
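The naming convention can be checked mechanically before any analysis starts. The sketch below extracts the two-letter prefix from each BAM file name and reports files whose letters are not defined in the reference information file. It is an illustration only: the function names are hypothetical, and it returns the two letters in file-name order without deciding which one is paternal and which is maternal.

```python
import os


def parent_letters(bam_path):
    """Return the first two characters of the BAM file name as (first, second)."""
    name = os.path.basename(bam_path)
    if len(name) < 2:
        raise ValueError("file name too short for a two-letter prefix: %r" % name)
    return name[0], name[1]


def check_bam_list(bam_paths, known_letters):
    """Return the paths whose two-letter prefix uses a letter not in known_letters."""
    bad = []
    for path in bam_paths:
        first, second = parent_letters(path)
        if first not in known_letters or second not in known_letters:
            bad.append(path)
    return bad


print(parent_letters("HF_0130_M_merged.bam"))
```

Here `known_letters` would be the set of single letters from the first column of the reference information file.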

You also need to set the reference annotation in factory/cuffcompare_worker.py: change reffile to the path of your GTF reference file.

Finally, run the following command to generate the commands for gene_scissors:

 python factory/manager.py --filelist bam_file_list --reftable reference.table

The command generates a list of new commands, one for each BAM file in the alignment file list. All generated commands must be executed (sequentially or in parallel) to complete the analysis, e.g. by piping them to a shell (as in the following command) or by submitting them to a computing cluster of your choice.

 python factory/manager.py --filelist bam_file_list  --reftable reference.table | bash
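Piping to bash runs the generated commands one after another. If you would rather run several at once on a single machine, a small wrapper such as the following could be used. This is a sketch under two assumptions that the instructions do not confirm: that each generated command occupies exactly one line, and that the commands are independent of each other. The function names and the default of four workers are illustrative choices.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_command(command):
    """Run one shell command and return (command, exit_code)."""
    return command, subprocess.call(command, shell=True)


def run_all(commands, workers=4):
    """Run non-empty command lines concurrently; return the ones that failed."""
    commands = [line.strip() for line in commands if line.strip()]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run_command, commands))
    return [(command, code) for command, code in results if code != 0]
```

Saved in a script that calls `run_all(sys.stdin)`, this could take the place of bash at the end of the pipe. Threads are sufficient here because each worker only waits for an external process to finish.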

GeneScissors also runs Cufflinks and Cuffcompare during the process. The results are located under result/[bam_file_list name] (result/bam_file_list for the command above). Specifically, the file result/[bam_file_list name]/cufflinks/[individual id]/new_transcripts.gtf contains the filtered result for [individual id], and all suspicious genes are written to suspicious_transcripts.gtf in the same folder.
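When post-processing many individuals, it may be convenient to build the output paths programmatically. The helper below simply assembles the layout described above; the function name and dictionary keys are hypothetical, and the individual id used in the example is a placeholder, since how GeneScissors derives the id is not stated here.

```python
import os


def result_paths(bam_list_name, individual_id, result_root="result"):
    """Build the expected output paths for one individual."""
    base = os.path.join(result_root, bam_list_name, "cufflinks", individual_id)
    return {
        "filtered": os.path.join(base, "new_transcripts.gtf"),
        "suspicious": os.path.join(base, "suspicious_transcripts.gtf"),
    }


# "some_individual" is a placeholder id, for illustration only.
paths = result_paths("bam_file_list", "some_individual")
print(paths["filtered"])
print(paths["suspicious"])
```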

To maximize the accuracy of the GeneScissors pipeline, using the modified versions of Cufflinks and TopHat is recommended but not required. Here are the two patches for the modified Cufflinks and modified TopHat used in the paper.

Download the patch file for modified Cufflinks (1.3.0).

Download the patch file for modified TopHat (1.4.1).