Announcements
- May 9: The final exam can be downloaded anytime during the examination period (12pm-3pm). It must be submitted online before the exam period ends. It is designed to take only 120 minutes, but you can use the entire time allotted.
- May 1: I will hold a course review session on Thursday, May 4 from 5pm to 7pm in SN011.
- April 18: Problem Set #4 has been revised. The changes are to Problems #4 and #5. Please download the revised version and transfer your answers to it.
- April 13: Problem Set #4 is now available and it is due on Monday 4/24.
- March 29: A study session for Problem Set #3 will be held on Friday, March 31 in SN325 from 2:00 to 3:30pm.
- March 22: Problem Set #3 is now available and it is due on Monday 4/3.
- March 20: Grades and solutions to the Midterm and Problem Set #1 are now online. You must log in using your onyen to access them. You'll need to attend class to find out your password.
- March 8: The Midterm can be downloaded and must be submitted online. You will have 75 minutes to complete it. I recommend that you submit partial versions as you go, because the submission system will be automatically disabled at 12:30.
- February 28: A study session for Problem Set #2 will be held on Friday, March 3 in SN011 from 2:00 to 3:30pm.
- February 27: The original test dataset, PS2contigs.fa, for problem 5 of PS#2 had issues which have now been fixed. Please download it again.
- February 27: A revised version of Problem Set #2 has been posted. It corrects an error in problem #3 and revises the constraints on problem #5.
- February 25: A second test dataset, PS2contigs.fa, for problem 5 of PS#2 is now online. Also, note that the lengths of the contigs exceed 100bp and their overlap can be less than 50%, contrary to the specification given in the problem's description. I will revise the problem set and repost it soon.
- February 20: Problem Set #2 is now available and it is due on Monday 3/6.
- February 8: I will hold a study session to answer questions related to Problem Set #1 on Friday 2/10 from 2:00 to 3:30pm in SN325.
- February 7: Jupyter notebooks for each of the lectures are now online. The code and discussion should be intact; however, many image and web links may not work. Where possible, I will try to fix these problems over time. Therefore, be prepared to download them again.
- February 3: All of the fasta files needed for Problem Set #1 should now be online.
- January 31: Problem Set #1 is now available and it is due on Monday 2/13. A link is included in the problem set for submitting it online.
- January 23: I will hold a tutorial session on Friday 1/27 in SN011 from 1:30-3:00 covering installing Jupyter, getting started with Python, and Rosalind.
- January 11: First class meeting in SN011. See you there.
Course Description
Computational methods are fueling a revolution in the biological sciences. Computers are already nearly as indispensable as microscopes for analyzing and interpreting biological data. As a result, two new multidisciplinary fields, bioinformatics and computational biology, have emerged. This course will explore the computational methods and algorithmic principles driving this revolution. It will cover basic topics in molecular biology, genetics, and proteomics. The course also addresses basic computational theory and algorithms including asymptotic notation, recursion, divide-and-conquer approaches, graph algorithms, dynamic programming, and greedy algorithms. These fundamental concepts from computer science will be taught within the context of motivating problems drawn from contemporary biology. Example biological topics include sequence alignment, motif finding, gene rearrangement, DNA sequencing, protein peptide sequencing, phylogeny, and gene expression analysis.
This course is suitable for both computer science and biology students at both undergraduate and graduate levels. Students who wish to take this course should have some programming experience in a modern programming language. Knowledge of data structures, algorithm design, and biology is helpful but not required. There will be five problem sets, each with short programming assignments, a midterm, and a final exam.
A syllabus for this offering of Comp555 can be downloaded from here.
Book, Course Information, and Prerequisites
Here is the book, which I will be supplementing with new materials:
Bioinformatics Algorithms: An Active Learning Approach, by Phillip Compeau and Pavel Pevzner. Active Learning Publishers, © 2014, ISBN: 978-0-9903746-0-2.
Credit Hours: 3
Location: SN011
Time: MW 11:15-12:30
URL: http://www.csbio.unc.edu/mcmillan/?run=Courses.Comp555S17
Prerequisites: COMP 410, Math 381, or equivalents
Course Instructors
Instructor: Leonard McMillan
Office: SN316
email: mcmillan@cs.unc.edu
Office Hours: Tuesdays 11am-12pm, 3pm-4pm
Schedule
Date | Topic | Homework
January 11 | Introduction (slides, notebook) |
January 16 | No Class (MLK Holiday) |
January 18 | Exploring a Genome (slides, notebook) | (Video Parts 1 & 2) Reading: Chapter 1 pp 3-14
January 23 | Exploring a Genome (Continued) (slides, notebook) | (Video Parts 3, 4, & 5) Reading: Chapter 1 pp 14-45
January 25 | Finding Patterns in DNA (slides, notebook) | (Video Parts 1 & 2) Reading: Chapter 3 pp 83-92
January 27 | Crash course in Jupyter, Python, and Rosalind (notebook) | Link: Rosalind
January 30 | Searching for Motifs (slides, notebook) | (Video Parts 3, 4, 5 & 6) Reading: Chapter 3 pp 93-127; Problem Set #1 Assigned
February 1 | Protein Sequences and Antibiotics (slides, notebook) | (Video Parts 1, 2, & 3) Reading: Chapter 2 pp 47-58
February 6 | Inferring Protein Sequences from Fragments (slides, notebook) | (Video Parts 4, 5, & 6) Reading: Chapter 2 pp 58-66
February 8 | Scaling up Peptide Sequencing (slides, notebook) | (Video Parts 7, 8, 9, & 10) Reading: Chapter 2 pp 66-80
February 10 | Problem set #1 study session in SN325 from 2:00pm-3:30pm | Due on 2/13
February 13 | Assembling a Genome (slides, notebook) | (Video Parts 1, 2, 3 & 4) Reading: Chapter 4 pp 129-152
February 15 | Path Finding in Graphs (slides, notebook) | (Video Parts 4, 5, 6, 7, & 8) Reading: Chapter 4 pp 153-187
February 20 | The Realities of Genome Assembly (slides, notebook) | (Video Parts 9, 10, 11, & 12) Problem Set #2 Assigned
February 22 | Comparing Sequences (slides, notebook) | (Video Parts 1, 2, 3, & 4) Reading: Chapter 5 pp 189-199
February 27 | Sequence Alignment (slides, notebook) | (Video Parts 5, 6, 7, & 8) Reading: Chapter 5 pp 200-229
March 1 | Advanced Sequence Alignment (slides, notebook) | (Video Parts 9, 10, & 11) Reading: Chapter 5 pp 230-258
March 3 | Problem set #2 study session in SN325 from 2:00pm-3:30pm | Due on 3/6
March 6 | Divide and Conquer Algorithms (slides, notebook) | Problem Set #2 due
March 8 | Midterm, Open book, open notes, online (covers up to Lecture 13, 2/27) |
March 13 | No Class (Spring Break) |
March 15 | |
March 20 | Go over Midterm |
March 22 | Greedy Algorithms (slides, notebook) | (Video Parts 1-4) Problem Set #3 issued
March 27 | Genome Rearrangements (slides) | (Video Parts 5-9)
March 29 | Clustering and Evolution (slides) |
March 31 | Problem set #3 study session in SN325 from 2:00pm-3:30pm | Due on 4/3
April 3 | Imperfect Tree Construction (slides) | Not in book; Problem Set #3 Due
April 5 | Perfect Phylogeny (slides) | Not in book
April 10 | Combinatorial Pattern Matching (slides) | (Video Parts 1-3)
April 12 | Suffix Trees and BWTs (slides) | (Video Parts 4-9) Problem Set #4 Assigned
April 17 | Multi-String BWTs (slides) | Not in book
April 18 | Problem Set #4 Study session |
April 24 | Hidden Markov Models (slides) | Not in book
April 26 | Finding Founder Origins using HMMs (slides) | Not in book
May 9 | Final Exam 12:00pm-3:00pm, Open book, open notes, online (covers Lectures 14-25) |
Resources
- PS#2 study session on Friday (SN011 2pm-3:30pm)