Comp 521: Files and Databases -- Fall 2010 Problem Set #4 Issued: 10/27/2010 Due: In class 11/10/2010
Homework Information: Some of the problems are probably too long to attempt the night before the due date, so plan accordingly. Late homework will penalized by a factor of 70.71% for each late class period. Feel free to work with others, but the work you hand in should be your own.
Question 1.
Consider the following tree
A) Suppose this is an ISAM tree. Show the tree after inserting the following elements: (16, 17, 18) and then deleting the following elements: (11, 12, 16).
B) Suppose this is a B+ tree. Show the tree after inserting the following elements (16. 17, 18)
C) Now show tree from part B after deleting the following elements: (11, 12, 16). Assume the deletion algorithm tries to merge/redistribute with the right sibling if one exists.
Question 2.
Suppose we are bulk-loading a B-Tree. Pages are 4096 bytes, with 88 bytes of each used for headers. A key value takes 8 bytes, and a pointer to a tree node or row takes 8 bytes.
A) What is the degree of the B-tree?
B) Each leaf node is to be loaded to a maximum occupancy of 0.8, and there are 1,000,000 rows. How many leaf nodes will there be?
C) How high will the tree be?
Question 3.
Suppose we are building a extensible hash index on a table of 1,000,000 rows. Key values are 8 bytes, a pointer to a row is 8 bytes, and a page is 4096 bytes. Each bucket needs 8 bytes at the end reserved for an overflow bucket pointer. Assume all keys are distinct.
A) What is the (lowest possible) global depth?
B) What is the average occupancy of a bucket, assuming all buckets have a local depth equal to the global depth from part (A)?
Question 4.
Consider the Extensible Hash structure shown on page 377 (Fig 11.6).
A) Show the structure after removing the record with hash value 10.
B) Show the structure from part A after adding two records with hash value 27.
C) Show the structure from part B after adding two records with hash value 28.
Question 5.
Draw query trees for the following relational algebra expressions based on the "movies.db" database's schema:
A. πfirst,last(σmovieId=2643 ^ sex ='F'(Customers Rentals))
B. πfirst,last(σsex='F'(Customers) σmovieId=2643(Rentals))
C. πfirst,last(σmovieId=2643(σsex='F'(Customers) Rentals))
D. πfirst,last(σsex='F'(Customers σmovieId=2643(Rentals)))
Question 6.
Use the following simple hashing scheme, key = CardNo % 4096, to build indexes for the "Rentals" and "Customers" tables of the movies.db database from problem sets #2 and #3 using SQLite and Python. Use your index to answer the following questions.
A. What is the occupancy of (number of records in) the largest, smallest, and average buckets?
B. Using your indices estimate the number of tests saved when joining Rentals and Customers in a query such as the following:
SELECT C.first, C.last, R.date
FROM Customers, Rentals
WHERE C.CardNo=R.CardNo
when compared to a heap-file organization of both relations? (Note: Assume that key matches are found on average after searching through half of either a heap-file or a hash bucket).
C. How many records must be searched in the hash index to find the total number of Rentals made by the customer named "LEONARD LEONARD"?
Turn in your code to generate the indexes, compute, and print out each of the three answers. Also include a print out of your result.