Logged in as: guest Log in
Problem Set #1 mcmillan / Version 13

Comp 555: BioAlgorithms -- Spring 2015

Problem Set #1

Issued: 2/2/2015      Due: In class 2/23/2015

 


Homework Information: Some of the problems are probably too long to attempt the night before the due date, so plan accordingly. No late homework will accepted. However, your lowest homework will be dropped. Feel free to work with others, but the work you hand in should be your own.

Genetic Fingerprints

Recall that a resrtiction enzyme cuts DNA in to fragments at sites with specific sequence patterns, which are generally reverse palindromes that typically have lengths of 4, 6, or 8 bases. These patterns are called recognition sites and the intervals between these sites are called digestion fragments. A single mammalian chromosome is so long that a given recognition site will occur frequently, and the digestion fragments lengths will vary significantly. Moreover, due to mutations the number and positions of recognition sites can vary from individual to individual.DNA Fingerprint

Genetic fingerprinting is the term applied to the general process of forming a limited picture of a person's genetic makeup (which is cheaper than actual sequencing). Applications of genetic fingerprinting include forensics and paternity tests. These methods rely on a process called partial digestion where a sample of DNA is replicated artificially, and then treated with a given restriction enzyme, which cuts the DNA at nearly every possible combination of restriction sites. A second process called gel electrophoresis is then used to separate the digestion fragments based on their molecular weights (i.e. size in bases), with larger pieces forming bands at one end and smaller pieces tending toward the other. The pattern of bands formed can be considered as a sort of fingerprint, which can be used to distingish how related indiviuals are. Closely related individuals tend to share patterns of bins moreso that would two people chosen at random.

The sizes of the digestion fragments can be thought of simply as the collection of distances between restriction sites in the genome. It is possible to cut out bands from a gel electrophoresis experiment, and use them to construct Bacterial Artifical Chromosomes (BACs) which can serve as starting points for sequencing. While knowledge of fragment sizes provides very little information about the DNA sequence, it can provide information about how fragments are arranged. Thus providing a scaffold for mapping the sequenced BACs into a more complete picture of haow a chromosome is organized. These inferred chromosome fragement organizations are called restriction maps. The application of forming a restriction map from cleaved restriction fragments motivates this problem set.

Problem 1: How many possible reverse palindromic recognition sites of length 6 exist? Write a code fragment to generate all of them. What fraction of 2N-mers are reverese palindromic as a function of N?

Problem 2:  How many time does each of the recognition sites from problem 1 appear in the genome sequence for human chromosome 1? Your result should be in the form of a python dictonary using the sequence as a key and its count as a value. Can you identify any pattern that impacts the frequency of a recognition site's occurances?

Problem 3: Plot a histogram of the fragment lengths generated by a complete digestion (the sequence is cleaved at every recognition site) of human chromosome 1 by the ShpI restriction enzyme with the following recognition site and cutting pattern, CGTAC/G.

Problem 4: Double Digest Mapping is a restriction mapping method that uses two different restriction enzymes. In this approach DNA is completely digested such that only fragments between consecutive cut sites remain. Three preparations are used in a double digest. In the first, one restriction enzyme is used to digest the DNA and a gel is run. In the second preparation the second restriction enzyme is used for a complete digest a gel is run. In the third preparation, both restriction enzymes are run and a gel is run. By analyzing the gel a set of DNA length lengths can be determined for each gel.

As an example suppose that, for a particular DNA sequence, the restriction enzyme HapII produces fragment lengths of:

[68, 114, 133, 162, 387, 557, 649, 813, 1737, 2012, 2325, 7342]

And for the same DNA, the restriction enzyme BstUI produces fragment lengths of:

[235, 316, 1403, 1455, 1562, 1589, 1633, 1666, 6440]

When the DNA is digested by both HapII and BstUI produces fragment lengths of:

[9, 68, 99, 114, 133, 136, 162, 183, 387, 513, 557, 754, 813, 1032, 1455, 1562, 1633, 1666, 2012, 3011]

Devise a branch-and-bound algorithm to assemble the the restriction map from this data set. Make your algorithm such that all distinct solutions are generated. Note that solutions come in pairs where the fragments are arranged in the same order, but one is arranged from left-to-right and the second from right to left. You solution should prefer the solution with the shorter left-most fragment in its double digestion.


Human chromosome 1 in compressed FASTA format

 

Submit your solutions here.



Posted by andrewwg on 2015-02-24 19:47:50 (references Version 12)

How to submit?




Site built using pyWeb version 1.10
© 2010 Leonard McMillan, Alex Jackson and UNC Computational Genetics