
Announcements

[Lecture Zoom Link] [Office Hours Zoom Link]

  • Final Corrections: The penalty value for an INDEL on Problem #1 should be -3.
    If the emission probabilities for the HMM given in Problem #7 do not sum to 1.0, you can
    assume that the remaining probability is for outputting an underscore "_".

  • May 11: The final exam can be downloaded from here when it is available. You must be logged in to your Comp 555 account to access your personalized exam. You will have from 8:00am to 11:00am to take the exam. Upload your ultimate version before the 11am deadline. You can upload multiple times; only the last one submitted will be graded. Please check that the version of the Final that you submit is, in fact, the one with your answers and not the empty one that you downloaded. I will be online using the Lecture Zoom to answer questions. To ask a question, raise your hand in the Zoom and I will assign you to a breakout meeting.


  • May 11: Problem Set #6 is now graded.
  • May 4: I will hold a Final study session on May 6 from 9:30-11:00am. We will meet using the Lecture Zoom Link above. 
  • April 29: The last part of Problem 5 in PS6 is poorly worded. Rather than:

"Find the approximation ratio for the improvedBreakpointReversalSort() algorithm over all permutations of length 10 and an example permutation that achieves it."

    it should read:

"Find the approximation ratio for the improvedBreakpointReversalSort() algorithm over all permutations of length 10 and an example permutation where improvedBreakpointReversalSort() achieves the optimal number of reversals."

  • April 22: Problem Set #6 is now on-line and due on April 29.
  • April 21 (Boo): No office hours today, dose 2 blues.
  • April 6 (Boo): Office hours pushed back to 3pm
  • April 1: Problem Set #4 is now on-line and due on April 15.
  • March 30: Midterm and PS#3 grades are posted.
  • March 23 (Boo):  Shortened office hours today, vaccine appointment.
  • March 23: Problem Set #2 grades are posted.
  • March 18: The midterm can be downloaded from here when it is available. You must be logged in to your Comp 555 account to access your personalized exam. You will have from 9:30am to 11:00am to take the exam. Upload your ultimate version before the 11am deadline. You can upload multiple times; only the last one submitted will be graded. Please check that the version of the Midterm that you submit is, in fact, the one with your answers and not the empty one that you downloaded. I will be online using the Lecture Zoom to answer questions. To ask a question, raise your hand in the Zoom and I will assign you to a breakout meeting.
  • March 16: I will be holding a Midterm study session from 5pm-6pm tonight. During this session I will show an example Midterm from a previous course offering that covers roughly the same material. We can talk through basic strategies for approaching each problem, but I do not intend to provide answers to questions. Instead, the goal of the session is to provide some sense of the types of questions that you might be asked. For those who could not attend, here is the video.
  • March 11: A new version (v1.1) of Problem Set #3 has been posted. It includes corrected links to the datasets and a clarification of how intervals are defined for specifying ranges and substring slices (for example, the substring of Chromo.fa referred to in Problem 4 should have 50,000 characters, not 50,001, prior to appending the '$'). This version does not change the answer to any problem. I suggest that you download the new version and copy over any answers that you have completed thus far before submitting.
  • March 9: The grades for Problem Set #1 have been posted. To see them, log in to your Comp555 account and go to the [Setup] page. You should see a button with your grade. If you press the button, you will see your graded problem set.
  • March 3: The data set links in Problem Set #3 are incorrect. I plan to post a revised version, but for now you can use the following links instead: reads.fa and Chromo.fa. There is also a bug in the render() method of the Graph class. You can either wait for the revised problem set, or replace self.vertex.iteritems() with self.vertex.items() (this method is not needed for any problem).
  • March 2: Problem Set #3 is now on-line and due on March 16.
  • February 18: Class is cancelled due to inclement weather. In case you are experiencing an internet outage, I will extend the due date for PS #1 by 24 hours, until midnight on 2/19. Note that PS #2 will still be due on 2/25.
  • February 17: (LM) Just a reminder... I hope to focus my office hours today on PS #2. I will only entertain brief clarifications WRT PS #1. Boo will provide PS #1 last-minute hints/heroics if needed today and tomorrow.
  • February 16: You should now be able to submit problem sets. Make sure that you select the correct one from the pull down before submitting.
  • February 16 (Boo): Office hours closed at 1400, email with questions and I'll be happy to sign back in.
  • February 16 (Boo): Will be holding normal office hours today 1pm - 4pm
  • February 11: Problem Set #2 is online and due on February 25.
  • February 11 (Boo): Office hours ending at 3pm. Feel free to email any questions or to schedule office hours on Mon/Tue.
  • February 9: Tonight from 5:00pm-6:30pm I will offer an optional Python Tutorial. Use this Zoom Link. Have a Jupyter notebook up and running.
  • February 4: The links to datasets in Problem Set #1 will not work. I plan to upload an updated version of the Notebook this weekend with fixed links and additional clarifications. However, this should not keep you from getting started! Also, there is yet another version of the Human genome that you can use in this zip archive. It includes only the 25 needed ".seq" files. These will save you the step of extracting them from the "official" FASTA file.
  • February 3: I have created a zip archive of the human genome (GCA_000001405.15_GRCh38_genomic.fna) used in Lecture 3 and Problem Set 1. After downloading, you should "unzip" it and the expected fasta file should appear.
  • February 2: Problem Set #1 is online and due on February 18. No late submissions will be accepted.
    If you are still having problems logging into your COMP555 course account, please send me an email
    with "COMP555S21" in the subject line. In the body of the email give me your ONYEN and PID.
  • February 1: Videos of past lectures have been posted online.
  • January 28: I have not yet posted Problem Set #1. I hope to post it before the weekend. Check back here frequently.
  • January 19: First class meeting. See you there
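
As a rough illustration of the interval convention clarified in the March 11 announcement, the sketch below assumes half-open ranges in the style of Python slices, which is consistent with a 50,000-character substring. The sequence, the coordinates, and the helper name substring_with_terminator are hypothetical and are not taken from Problem Set #3.

```python
def substring_with_terminator(sequence, start, end):
    """Return the half-open slice sequence[start:end] with '$' appended."""
    return sequence[start:end] + "$"


# Synthetic stand-in for a chromosome sequence (not the real Chromo.fa data).
sequence = "ACGT" * 25_000            # 100,000 characters
start, end = 10_000, 60_000           # hypothetical coordinates

text = substring_with_terminator(sequence, start, end)
closed = sequence[start:end + 1]      # what a closed interval [start, end] would select

print(len(text) - 1)                  # characters before the '$'
print(len(closed))                    # one character more than the half-open slice
```

With a half-open range the length is simply end - start, so no "+1" adjustment is needed when checking the size of a slice.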

Course Description


Computational methods are fueling a revolution in the biological sciences. Computers are already nearly as indispensable as microscopes for analyzing and interpreting biological data. As a result, two new multidisciplinary fields, bioinformatics and computational biology, have emerged. This course will explore the computational methods and algorithmic principles driving this revolution. It will cover basic topics in molecular biology, genetics, and proteomics. The course also addresses basic computational theory and algorithms including asymptotic notation, recursion, divide-and-conquer approaches, graph algorithms, dynamic programming, and greedy algorithms. These fundamental concepts from computer science will be taught within the context of motivating problems drawn from contemporary biology. Example biological topics include sequence alignment, motif finding, gene rearrangement, DNA sequencing, protein peptide sequencing, phylogeny, and gene expression analysis.

This course is suitable for both computer science and biology students at both undergraduate and graduate levels. Students who wish to take this course should have some programming experience in a modern programming language. Knowledge of data structures, algorithm design, and biology is helpful but not required. There will be 6 problem sets, each with short programming assignments. No late problem sets will be accepted; however, I will drop the lowest problem set score when calculating the course grade. The grade will be computed as follows: the best 5 of 6 problem sets (each worth 8%, 40% in total), a midterm (worth 25%), a final exam (worth 25%), and many unannounced in-class exercises (in total worth 10%, with the lowest 2 dropped).
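
The weighting above can be sketched as a small calculation. This is only an illustration under stated assumptions: every score is a percentage from 0 to 100, and the in-class exercises are weighted equally and averaged after the lowest 2 are dropped. The function name course_grade and the sample scores are hypothetical.

```python
def course_grade(problem_sets, midterm, final, exercises):
    """Combine scores (each a percentage, 0-100) into a course grade."""
    # Best 5 of 6 problem sets, each worth 8%.
    best_five = sorted(problem_sets, reverse=True)[:5]
    ps_part = sum(0.08 * score for score in best_five)

    # In-class exercises: drop the lowest 2, then average (assumed equal weights).
    kept = sorted(exercises, reverse=True)[:max(len(exercises) - 2, 0)]
    exercise_avg = sum(kept) / len(kept) if kept else 0.0

    return ps_part + 0.25 * midterm + 0.25 * final + 0.10 * exercise_avg


grade = course_grade(
    problem_sets=[90, 80, 100, 70, 85, 95],   # the 70 is dropped
    midterm=80,
    final=90,
    exercises=[100, 100, 0, 50, 100, 0],      # the two 0s are dropped
)
print(round(grade, 2))  # 87.25
```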

A syllabus for this offering of Comp555 can be downloaded from here.

Book, Course Information, and Prerequisites


This semester I will not be using a book. I will be teaching from my notes and I plan to add at least two modules of new material.

Credit Hours: 3
Location: SN014
Time: TTh 9:30-10:45
URL: http://www.csbio.unc.edu/mcmillan/?run=Courses.Comp555S21
Prerequisites: COMP 410, Math 381, or equivalents

Course Instructor


 

Instructor:  Leonard McMillan
Office:  SN316
email:  mcmillan@cs.unc.edu
Office Hours:  Wednesdays 2pm-4pm

 

RA:  Boo Fullwood
Office:  SN325
email:  iamboo@cs.unc.edu
Office Hours:  MW 10am-1pm, TTh 1pm-4pm [Zoom]

 

Schedule


Date Topic Homework
January 19 Lecture 1. Introduction (slides) (video)  
January 21 Lecture 2. Jumping into Genomes (slides) (notebook) (SARS-CoV-2.fa) (video)  
January 26 Lecture 3. Finding patterns in DNA (slides) (notebook) (genome in zip file) (video)  
January 28 Lecture 4. Finding hidden patterns in DNA (slides) (notebook) (LTR14A.fa) (video)  
February 2 Lecture 5. Searching for Shared Patterns (slides) (notebook) (video) PS #1
February 4 Lecture 6. Finding Motifs in our Lifetime (slides) (notebook) (video)  
February 9 Lecture 7. Assembling a Genome (slides) (video)  
February 9 Optional Python Tutorial 5:00pm-6:30pm, Use this Zoom Link (notebook)
February 11 Lecture 8. Finding Paths in Graphs (slides) (notebook) (video) PS #2
February 16 Wellness day (No class)
February 18 CLASS CANCELLED DUE TO WEATHER. PS #1 Due
February 23 Lecture 9. Realities of Genome Assembly (slides) (notebook) (video)
February 25 Lecture 10. Combinatorial Pattern Matching (slides) (notebook) (video) PS #3, PS #2 Due
March 2 Lecture 11. Suffix Arrays and BWTs (slides) (notebook) (video)
March 4 Lecture 11. Suffix Arrays and BWTs continued (slides) (notebook) (video)
March 9 Lecture 12. Multi-string BWTs (slides) (notebook) (video)  
March 11 Wellness day (No class)
March 16 Lecture 13. Adventures in Dynamic Programming (slides) (notebook) (video) PS #3 Due
March 18 Midterm Exam covering Lectures 1-14 (open notes, open internet)
March 23 Lecture 15. Comparing Sequences (slides) (notebook) (video)  
March 25 Lecture 16. Sequence Alignment (slides) (notebook) (video)  
March 30 Lecture 17. Advanced Sequence Alignment (slides) (notebook) (video)  
April 1 Lecture 18. Divide and Conquer (slides) (video) PS #4
April 6 Lecture 19. Determining a Peptide's Sequence (slides) (notebook) (video)  
April 8 Lecture 20. Scaling Up Peptide Sequencing (slides) (notebook) (video) PS #5
April 13 Lecture 21. Hidden Markov Models (slides) (notebook) (video)  
April 15 Lecture 22. Inferring Ancestry using HMMs (slides) (notebook) (CCGenotypes.csv) (video) PS #4 Due
April 20 Lecture 22. Inferring Ancestry using HMMs (continued) (video)  
April 22 Lecture 23. Genome Rearrangements (slides) (notebook) (video) PS #6, PS #5 Due
April 27 Lecture 24. Genome Rearrangements (cont) (slides) (notebook) (video)  
April 29 Lecture 25. Randomized Algorithms (slides) (notebook) (video) PS #6 Due
May 4 Lecture 26. Nonsense Mutations (slides) (video)  
May 11 (Tues)  Final Exam (zoom) 8:00am-11:00am

 

Resources

Jupyter

All coding examples, problem sets, and exams will use Jupyter Notebooks and Python3. If you do not already have a Jupyter Notebook environment set up, I recommend that you use Anaconda as follows:

  • Go to https://docs.anaconda.com/anaconda/install/
  • Follow the installation instructions for your operating system
  • Open the Navigator
  • Select Launch under Jupyter Notebook
  • A screen like the Jupyterhub should appear in your browser
  • Create a folder for the class
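
Once the steps above are done, a quick sanity check can be run in the first cell of a new notebook. This is a minimal sketch, not part of the course materials: the helper names is_python3 and has_module are hypothetical, and it assumes the classic Jupyter Notebook is provided by a package named "notebook".

```python
import importlib.util
import platform
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given version tuple is a Python 3 release."""
    return version_info[0] == 3


def has_module(name):
    """Return True when a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


print("Python version:", platform.python_version())
print("Running Python 3:", is_python3())
print("notebook package found:", has_module("notebook"))
```

If the last line reports False while the notebook is clearly running, the kernel may be using a different Python installation than the one Anaconda set up.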


© 2010 Leonard McMillan, Alex Jackson and UNC Computational Genetics