
Announcements

  • May 1: The Final exam is online. I suggest that you log out of your Comp555 account and log back in before downloading it to ensure that your session does not time out near the end of the exam period.
  • May 1: The final exam will be given in the following Zoom meeting (Meeting ID is: 711-921-541) from 8am-11am.
  • April 30: I think that, finally, finally, every video is visible without a password.
  • April 29: I think that, finally, the videos are visible without passwords. Also, Problem set #5 is now graded. 
  • April 28: I updated the study session video links so that no password is required. 
  • April 27: Today's Final Exam study session will use the following Zoom meeting (Meeting ID is: 711-921-541), the one we used for class. The video is here... but you missed the groundhog!
  • April 27: The grades for Problem Sets 1-4 are now posted.
  • April 23: A Final Exam study session will be held on Monday, April 27 from 4-6pm. A video of the session will be posted for those who are unable to attend.
  • April 22: There will be no office hours for me this Thursday (4/23). If you have a grading question, please send me an email. -Daniel
  • April 8: Problem Set #5 is now online and due on 4/21.
  • March 29: My office hours will be from 1 pm - 3 pm Tuesday, Thursday. Please use this Zoom link for my OH: https://zoom.us/j/8254884544 . -Ziwei 
  • March 28: If you are experiencing issues getting to the course Jupyterhub, please cc Boo Fullwood on problem emails. 
  • March 26: My office hours today will be from 11am-12pm (returning to 11am-1pm next Tuesday). Please use this Zoom link for this and all future OH: https://unc.zoom.us/j/262978511  -Daniel
  • March 25: My office hours (McMillan W 2pm-4pm) will resume this week via my Zoom personal meeting room at https://unc.zoom.us/my/leonardmcmillan. Be prepared to share your desktop!
  • March 24: To those having trouble accessing the Zoom meeting, you need to sign in with your Onyen here before joining: https://software.sites.unc.edu/zoom/. Lectures will be recorded and posted here.
  • March 24: Problem Set #4 is now online and due on 4/7.
  • March 23: Class will resume tomorrow 3/24 on Zoom (Meeting ID is: 711-921-541) at the regular time. The password for class, if you need it, will be "COVID-19". My experience is that you are required to use the UNC VPN, and, if you are prompted to authenticate, select the [SSO] option, which will take you to a screen that asks for a "company" domain in a textbox, where you should enter "UNC". You might be asked to enter your ONYEN password at some point as well.
  • March 21: I had originally planned to attend a conference on the last day of class (4/23), but things have changed and I will use that class to make up for the two classes lost during the extended Spring Break.
  • March 14: Please continue to monitor this website for changes as Comp 555 prepares to move online. 
  • March 5: Note about grading: If you have a 0/20 with no comments for any question on a problem set, it's very likely that you either overwrote your PS1 or have something wrong with your cell metadata. Come see me about that. If I took off points for not having code for #4 on PS1, I will regrade those for accuracy and have that updated this weekend. -Daniel
  • March 5: I will be holding usual office hours today in SN317 SN341 to address any grading issues. -Daniel
  • March 5: You can download the Midterm here.
  • March 3: In lieu of office hours tomorrow, I will hold a midterm study session from 2pm-3:15pm in SN014 (our normal classroom).
  • March 2: I will be cancelling office hours again on Tuesday as I still try to recover from the flu. I will be on email for any questions regarding the problem set or grading. I'm still trying to catch up with my emails so I apologize if I miss yours. -Daniel
  • March 2: A system update over the weekend had the unexpected consequence of making it impossible to log in to or submit problem sets via the course website. However, downloading was not impacted. The problem is now fixed. On Problem Set #3, disregard the k-mer sizes mentioned in the setup of problem 2 (19 and 25). It should refer to the 15-mer dataset accessible from the links in the problem set.
  • February 27: I will move my office hour to Tuesday 1 pm - 5 pm starting next week.  -Ziwei Chen
  • February 27: I won't be able to host OH today because of illness. Stay tuned for a possible makeup OH. -Daniel
  • February 25: Ziwei's office hour will be held in SN317.
  • February 25: Office hours will be held in my office today (SN341) -Daniel.
  • February 19: Office hours will be held in SN317 today. 
  • February 19: Problem Set #3 is online and is due on 3/3/2020.
  • February 11: Note: The "transcription binding factor motifs" link to access motifs.fa at PS2 is broken. There's another link at number 1 which should work. Here is another link to motifs.fa. -Daniel
  • February 11: Office hours today will be in SN317 (three doors to the left from SN325).
  • February 06: Helpful hint for PS2 #1: if you're getting an empty result from MedianStringMotifSearch, try changing your generated motifs to uppercase. -Daniel
  • February 06: I'm moving my OH for today to SN341 because of a double booking. -Daniel
  • February 06: Problem Set #2 is now online and is due on 2/18/2020.
  • February 03: The Jupyterhub will temporarily be available only from campus networks. Please see the bottom of the page for VPN instructions. (now fixed)
  • January 28: Ziwei Chen will be unable to hold office hours today from 3:30pm-5:30pm due to illness.
  • January 28: I will be offering an optional "Crash course on Python" on 1/29 from 5:00pm-6:30pm in SN011.
  • January 21: Problem Set #1 is now online and is due on 2/4/2020.
  • January 16: Daniel and Ziwei will be holding their office hours in SN321.
  • January 9: First class meeting in SN014. See you there.

Course Description


Computational methods are fueling a revolution in the biological sciences. Computers are already nearly as indispensable as microscopes for analyzing and interpreting biological data. As a result, two new multidisciplinary fields, bioinformatics and computational biology, have emerged. This course will explore the computational methods and algorithmic principles driving this revolution. It will cover basic topics in molecular biology, genetics, and proteomics. The course also addresses basic computational theory and algorithms including asymptotic notation, recursion, divide-and-conquer approaches, graph algorithms, dynamic programming, and greedy algorithms. These fundamental concepts from computer science will be taught within the context of motivating problems drawn from contemporary biology. Example biological topics include sequence alignment, motif finding, gene rearrangement, DNA sequencing, protein peptide sequencing, phylogeny, and gene expression analysis.

This course is suitable for both computer science and biology students at both undergraduate and graduate levels. Students who wish to take this course should have some programming experience in a modern programming language. Knowledge of data structures, algorithm design, and biology is helpful but not required. There will be 5 problem sets each with short programming assignments (each worth 8%), a midterm (worth 25%), a final exam (worth 25%), and many unannounced in-class exercises (in total worth 10% with the lowest 2 dropped).
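
As a rough sketch of how the weights above combine (5 × 8% + 25% + 25% + 10% = 100%), the following Python function computes a course percentage. The function name, the 0-100 scale for every component, and the equal weighting of the in-class exercises that remain after the lowest two are dropped are assumptions made for illustration; the syllabus is the authoritative source.

```python
def course_grade(problem_sets, midterm, final, exercises):
    """Combine component scores (each assumed to be on a 0-100 scale)."""
    if len(problem_sets) != 5:
        raise ValueError("expected 5 problem set scores")
    # Drop the two lowest in-class exercise scores, then average the rest.
    # Assumption: with two or fewer exercises, nothing is dropped.
    kept = sorted(exercises)[2:] if len(exercises) > 2 else list(exercises)
    exercise_avg = sum(kept) / len(kept) if kept else 0.0
    return (
        0.08 * sum(problem_sets)  # 5 problem sets at 8% each = 40%
        + 0.25 * midterm          # midterm = 25%
        + 0.25 * final            # final exam = 25%
        + 0.10 * exercise_avg     # in-class exercises = 10%
    )
```

For example, `course_grade([80] * 5, 70, 90, [50, 60, 100, 100])` drops the 50 and the 60 before averaging the exercises.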

A syllabus for this offering of Comp555 can be downloaded from here.

Book, Course Information, and Prerequisites


This semester I will not be using a book. In the past, I have used the following textbook, but I plan to deviate from it significantly in this offering. Nonetheless, you may find it useful as a supplement.

Bioinformatics Algorithms: An Active Learning Approach, Vol 1
by Phillip Compeau and Pavel Pevzner
Active Learning Publishers © 2014, ISBN: 0990374610.

Credit Hours: 3
Location: SN014
Time: TTh 9:30-10:45
URL: http://www.csbio.unc.edu/mcmillan/?run=Courses.Comp555S20
Prerequisites: COMP 410, Math 381, or equivalents

Course Instructors




Instructor:  Leonard McMillan
Office:  SN316
email:  mcmillan@cs.unc.edu
Office Hours:  Wednesdays 2pm-4pm



TA:  Daniel Su
Office:  SN317
email:  sudan@live.unc.edu
Office Hours:  Tuesday, Thursday 11am-1pm



LA: Ziwei Chen
Office: https://zoom.us/j/8254884544
email:  ziwei75@live.unc.edu
Office Hours:  Tuesday, Thursday 1pm - 3pm

Schedule


Date Topic Homework
January 9 Lecture 1. Introduction (slides)  
January 14 Lecture 2. Jumping into Genomes (slides) (notebook, genome)  
January 16 Lecture 3. Finding patterns in DNA (slides) (notebook, genome)  
January 21 Lecture 4. Finding hidden patterns in DNA (slides) (notebook) PS #1 T.Petrophila GenomeA GenomeB
January 23 Lecture 5. Finding Motifs in our Lifetime (slides) (notebook)  
January 28 Lecture 6. Assembling a Genome (slides)  
January 30 Lecture 7. Finding Paths in Graphs (slides) (notebook) Tutorial Notebook
February 4 Lecture 8. Finding Eulerian Paths (slides) (notebook) PS #2, PS #1 Due
February 6 Lecture 9. Realities of Genome Assembly (slides) (notebook)
February 11 Lecture 10. Combinatorial Pattern Matching (slides) (notebook)  
February 13 Lecture 11. Suffix Arrays and BWTs (slides) (notebook)
February 18 Lecture 12. Multi-string BWTs (slides) (notebook) PS #3, PS #2 Due
February 20 Lecture 13. Multi-string BWTs (cont.) (slides) (notebook)  
February 25 Lecture 14. Adventures in Dynamic Programming (slides) (notebook)
February 27 Lecture 15. Comparing Sequences (slides) (notebook)  
March 3 Lecture 16. Sequence Alignment (slides) (notebook) PS #3 Due
March 5 Midterm Exam covering Lectures 1-14 (open notes, open internet)
March 10 No Class (Spring Break, extended due to Coronavirus)
March 12 No Class
March 17 No Class
March 19 No Class
March 24 Lecture 17. Advanced Sequence Alignment (slides) (notebook) PS #4 (video)
March 26 Lecture 18. Protein Sequences and Antibiotics (slides) (notebook) (BacillusBrevis.fa)  (video)
March 31 Lecture 19. Determining a Peptide's Sequence (slides) (notebook)  (no video)
April 2 Lecture 20. Scaling Up Peptide Sequencing (slides) (notebook) (video)
April 7 Lecture 21. Divide and Conquer Algorithms (slides) PS #5, PS #4 Due (video)
April 9 Lecture 22. Hidden Markov Models (slides) (notebook) (video)
April 14 Lecture 23. Inferring Ancestry with HMMs (slides) (notebook) (data) (video)
April 16 Lecture 24. Genome Rearrangements (slides) (notebook) (video)
April 21 Lecture 25. Genome Rearrangements Continued (slides) (notebook) PS #5 Due (video)
April 23 Lecture 26. Randomized Algorithms (slides) (notebook) (video)
Friday, May 1 Final Exam (on Zoom, not in SN014) 8:00am-11:00am

 

Resources

UNC Campus VPN

Please follow the instructions here to install and use a VPN to access campus-only resources.

Individual Jupyter Notebooks

It is recommended that, as an alternative to the class Jupyterhub, you have access to a Jupyter environment either locally or on another cloud service. The recommended path is to use Google Colab or Azure Notebooks, but you can install Anaconda locally and use that if you wish.

Google Colab

  • Go to colab.research.google.com
  • Sign in with a Google Account.
  • Upload notebooks to Google Drive.
  • Select a notebook to open it in a Python environment.

Azure Notebooks

  • Go to notebooks.azure.com
  • Sign in at the top right
  • Use your onyen@ad.unc.edu login with your onyen password
  • Select 'My Projects' in the top navigation bar
  • Select '+ New Project'
  • Enter a name and hit 'Create'
  • Select your new project and hit 'Run on Free Cloud'
  • You should now be at a notebook homepage. This can now be used just like the hub.

Local with Anaconda

This completely avoids cloud environments and gives you some other useful Python tools, but is somewhat more fiddly.

  • Go to https://docs.anaconda.com/anaconda/install/
  • Follow the installation instructions for your operating system
  • Open the Navigator
  • Select Launch under Jupyter Notebook
  • A screen like the Jupyterhub should appear in your browser
  • Create a folder for the class


Site built using pyWeb version 1.10
© 2010 Leonard McMillan, Alex Jackson and UNC Computational Genetics