- September 16: I have made a few small changes and clarifications to Problem Set #2. You should download it again and transfer your answers from the old version 1.0 to the new one 1.1. Also, Kat and Boo will be leading a special recitation on 9/17 from 6pm-7pm for students from section 1. They plan to go over the slides and highlight all siginificant differences in the first 6 lectures prior to the section merge. [Recitation Zoom Link]
- September 15: I have a few major annoucnements. First, I am delaying the due and issue date for problem sets #2 and #3 respectively to 9/22 to give folks from section 001 more time, and to allow me to incorporate more scalabled appraches into grading infrastructure. Second, I am redistributing my office hours over two days W, Th from 11am-noon rather a single block on W from 10am-noon. This will allow me to hold an organizational meeting with the TAs, and I hope it provides more access. I still have not yet garnered control of all the course infrastructure used in section 001. I hope to do so this week.
- September 9: I messed up and lost track of my office hours today. Actually, it was a side effect of being triple booked. I plan to mke up for these missed hours early next week. Stay tuned for updates.
- September 8: Welcome everyone attending from Comp521-001. We will discuss what is going on, and various senarios during today's class.
- September 3: Problem Set #2 is now online.
- September 2: There are two clarifications and fixes for Problem Set #1. On Problem #8, you should also ignore cases where the number of beds in a hosptial is reported as "0" when finding the cases-to-bed ratios. BTW, the zero beds are not usually true, it often means that the hospital did not report a number, as this data was scraped from sources where the hospitals elected to provide information. On Problem 10 there should be a fourth outcome, "confirmed" in addition to the 3 given (death, hospitalized, and recovered). The point of Problem 10 is to imagine that the primary COVID-19 data was provided in a different form. You should address how you would format this new data in a table, AND how you would provide backward compatibility to the old tables developed in Problems 1-5. Lastly, you should not assume that the new "case-by-case" records mentioned in Problem 10 are identifiable (i.e. don't assume a primary key like a social security number). Instead, you should assume that information about each case reports only the county, sex, race, and date and type (confirmed, death, hospitalized, or recovered). Note: that on a given day there can be many cases from the same county and demographic group, thus, it may not be possible to create a primary key for this relation.
- August 28: Just like in the real world, there are misspellings and other inconsistencies in the data files you were given for Problem Set #1. You might need to consider this in your answers. The correct answer is one where the data has been cleaned.
- August 27: I am changing the due date for Problem Set #1 to 9/8 to allow the new cohort from section 001 to catch up.
- August 26: Here are a few clarifications with regard to Problem Set #1. Notice that the case counts in the files "NCCOVIDConfirmed.csv" and "NCCOVIDDeaths.csv" are cumulative, and my expectation is that your COVID19 table in Problem 3 will include daily rather than cumulative counts. This has no real impact on the problem, but it better clarifies the database's design. You will need to consider the cumulative nature of the raw data, however, in problems 8 (makes it a little tougher) and 9 (makes it simpler). As a reminder, I will be holding office hours today from 10am-noon focusing on getting the transferred student's course environments set up.
- August 25: I have created accounts for all of the students who recently transferred from section 001. You should now be able to follow the instructions and exercises from Lecture 3. I will also be holding office hours tomorrow at this Zoom Link. In this meeting I will be available to help (and give priority to) anyone who has a problem accessing their course account and Jupyter hub. Office hours will be use a drop-in and group paradigm. If you need to schedule a one-on-one meeting with the instructor you can schedule one during office hours or after lecture.
- August 22: The Final Exam schedule is now available. Our exam will take place on November 19, 2020 from 3pm-6pm.
- August 21: The university has called off classes on 8/25, thus we will next meet via Zoom on 8/27. I plan to make adjustments to the schedule to accommodate this change. Problem Set #1 is now posted. Due to the closing of UNC on 8/24-8/25, I may make adjustments to the due date. I am also still working to add the students who transferred from section 001 to section 002. Thus, these new students cannot yet login to the course website or the Jupyter Hub. They can, however, download a copy of all lecture pdfs and the problem set. Stay tuned.
- August 17: Due to recent outbreaks tomorrow's (8/18) class will be offered online only. Use the given Zoom Link.
- August 13: Here's the link to the in-class exercise
- August 12: The website links for Lecture 1 and Zoom are now fixed (lm).
- August 11: First day of class and the course syllabus.
COVID-19 Considerations

- All course meetings will be simulcast via Zoom using a link provided at the top of this website. You are expected to use Zoom if you are exhibiting any COVID-19 symptom, including an elevated temperature, a cough, shortness of breath, fatigue, head or body aches, loss of taste or smell, sore throat, congestion, or runny nose.
- If the instructor exhibits any symptoms, that day's lecture will be presented online rather than in person. If this happens, students will be able to watch the live broadcast projected in the classroom if they wish.
- ALL students must adhere to the university's health safety standards in effect at the time of each lecture. Currently, this requires a mask AND social distancing of at least 6 feet while in the classroom.
- NO students are allowed to sit in the front row of the classroom nor enter through the east door located at the front of the classroom.
- Exams, problem sets, and live exercises will be online. Access will only be provided during prescribed intervals, which implies that exercises and exams cannot be taken asynchronously.
- Students will be responsible for cleaning and removing all materials from their desk areas at the end of each class period.
- The class may at any time revert to being entirely online.
Course Description
Databases are an indispensable tool for managing information, and a course on the principles and practice of database systems is now an integral part of any computer science curricula. This course covers the fundamentals of modern database management systems, in particular relational database systems.
The material covered in Comp 521 can be broken into three areas of emphasis. The first area includes database foundation material such as the relational model, and normal forms. The second area of emphasis is database application programming and includes among others the topics of Structured Query Language (SQL), integrating databases into programs, and web-based database usage. The third area of emphasis is the systems side of databases, which includes database indexing, efficient query evaluation, the transaction-based model, and concurrency.
This course is suitable for computer science majors at both undergraduate and graduate levels. Students who wish to take this course should have some programming experience in a modern language and knowledge of data structures. Course grades will be based on in-class exercises, five problem sets, a mid-terms, and a final exam.
Book, Course Information, and Prerequisites
|
 |
Textbook (optional): |
Database Management Systems, Third edition by Raghu Ramakrishnan and Johannes Gehrke McGraw-Hill Higher Education © 2003, ISBN: 0072465638. |
Credit Hours: |
3 |
Location: |
Sitterson 014 |
Time: |
TTh 3:00pm-4:15PM |
URL: |
http://csbio.unc.edu/mcmillan/?run=Courses.Comp521F20 |
Prerequisites: |
COMP 401, Comp 410, Comp 411 or equivalents
|
Course Instructors
 |
Instructor: |
Leonard McMillan |
Office: |
SN 316 |
email: |
mcmillan@cs.unc.edu |

|
RA: |
Boo Fullwood
|
Office: |
SN 325 |
email: |
iamboo@cs.unc.edu |

|
RA: |
Zhongrui Chen
|
Office: |
SN ??? |
email: |
jcpwfloi@cs.unc.edu |

|
RA: |
Juan Garcia
|
Office: |
SN ??? |
email: |
jjgarcia@cs.unc.edu |

|
RA: |
Larry He
|
Office: |
SN ??? |
email: |
lahe@cs.unc.edu |

|
RA: |
Kat Kirchoff
|
Office: |
SN ??? |
email: |
kat@cs.unc.edu |
Office Hours:
Leonard McMillan: Wednesday and Thursday, 11am-12pm
Boo: M 11am-6:30pm, W 11am-6:30pm ZhongRui: M 4-5pm, Th 5:15pm-6:45pm, F 9am-noon Juan: M noon-2pm, T 8am-9am, T 10am-3pm, Th 8am-3pm Larry: M 8am-2pm, F 8am-2pm Kat: T 11am-1pm, W 8am-10am, W 11am-1pm, W 3pm-4pm, F 9am-noon
Schedule
Week 1: |
T 8/11 |
Lecture 1: Introduction and Overview (pdf) (video) |
|
Th 8/13 |
Lecture 2: Entities and Relations (pdf) (video) |
|
Week 2: |
T 8/18 |
Lecture 3: The Trouble with Files (pdf) (video) NCDemographics.csv |
|
Th 8/20 |
Lecture 4: The Relational Model (pdf) (video) |
PS1 |
Week 3: |
T 8/25 |
Lecture 5: SQL: Basic Queries (pdf) Class cancelled to empty campus |
|
Th 8/27 |
Lecture 6: SQL: Basic Queries (pdf) (video) |
|
Week 4: |
T 9/1 |
Lecture 7: SQL:Advanced Queries (pdf) (video) |
|
Th 9/3 |
Lecture 8: SQL: Joins, Updates, and Transactions (pdf) (video) |
PS2 |
Week 5: |
T 9/8 |
Lecture 9: Exploring a Database (pdf) (video) |
(PS1 due) |
Th 9/10 |
Lecture 10: Database Application Development (pdf) (video) |
|
Week 6: |
T 9/15 |
Lecture 11: Overview of Storage and Indexing (pdf) (video) |
|
Th 9/17 |
Lecture 12: Storing and Buffering Data (pdf) (video) |
|
Week 7: |
T 9/22 |
Lecture 13: Tree-Structured Indexes (pdf)(video) |
PS3 (PS2 due) |
Th 9/24 |
Lecture 14: Hash-Based Indexes (pdf) (video) |
|
Week 8: |
T 9/29 |
Lecture 15: Overview of Query Evaluation (pdf) (video) |
|
Th 10/1 |
Midterm Review |
|
Week 9: |
T 10/6 |
Midterm (Covers Lectures 1-14 and will be given in-class, online) |
Th 10/8 |
Lecture 16: External Sorting (pdf) (video) |
|
Week 10: |
T 10/13 |
Lecture 17: Schema Refinement and Normal Forms (pdf) (video) |
(PS3 due) |
Th 10/15 |
Schema Refinement continued (pdf) (video) |
|
Week 11: |
T 10/20 |
Lecture 18: More on Query Evaluation (pdf) (video) |
PS4 |
Th 10/22 |
Lecture 19: Transaction Scheduling and Preemption (pdf) (video) |
|
Week 12: |
T 10/27 |
Lecture 20: Database Crash Recovery (pdf)(video) |
|
Th 10/29 |
Lecture 21: Intro to NoSQL databases (pdf))(video) |
PS5 |
Week 13: |
T 11/3 |
Lecture 22: MapReduce for Big Data (pdf) (video) |
(PS4 due)
|
Th 11/5 |
Lecture 23: Hadoop Pig and Hive (pdf) (video) |
|
Week 14: |
T 11/10 |
Lecture 24: NoSQL Document Databases (pdf) (video) |
|
Th 11/12 |
Lecture 24: NoSQL Document Databases (continued) (video) |
|
Week 15: |
T 11/17 |
Lecture 25: NoSQL Graph Databases (pdf) |
(PS5 due) |
Th 11/19 |
Final Exam: noon-3pm |
Course Resources
- A simple interactive SQL interpreter that can be embedded in a Jupyter Notebook. You will need to rename the downloaded file to "iSQL.py"
- The small Sailor database used for examples in class.
- AN even smaller, tiny version of the Sailor database used to demonstrate JOINS.
Individual Jupyter Notebooks
It is recommended that, as an alternative to the class Jupyterhub, you have access to a Jupyter environment either locally or on another cloud service. The recommended path is to use the Azure Cloud Notebooks, but you can install anaconda locally and use that if you wish.
Azure Notebooks
This is the recommended alternative to the Jupyterhub. You will get a personal VM with 4GB of memory which should be plenty for this course.
- Go to notebooks.azure.com
- Sign in in the top right
- Use your onyen@ad.unc.edu login with your onyen password
- Select 'My Projects' in the top navigation bar
- Select '+ New Project'
- Select a name and hit create
- Select your new project and hit 'Run on Free Cloud'
- You should now be at a notebook homepage. This can now be used just like the hub.
Local with Anaconda
This completely avoids cloud environments, and gives some other useful python tools, but is somewhat more fiddly
- Go to https://docs.anaconda.com/anaconda/install/
- Follow the installation instructions for your operating system
- Open the Navigator
- Select Launch under Jupyter Notebook
- A screen like the Jupyterhub should appear in your browser
- Create a folder for the class
|