Database Homework #2, Spring 2014

For each question below (most anyway), you will write an SQL query. Additionally, for each question with a star (*), provide the SQLite output (the table that is printed).

Each of these questions should be answered with a single SQL query that would hypothetically work for any logical database instance. In other words, you shouldn't "hard-code" anything in a query that will make the query return incorrect information if the any tuples in the database change.

  1. (Easy SQL) Using SQLite, load the netflix1.db database. It has a table called "recommendation" with records of Netflix customers and how they rated movies. A rating is from 1-5 (1 for "I hate it!" to 5 for "I love it!").
    1. *Return all movies with a rating of 5 from at least one reviewer.
    2. *Return all the reviewers who rated Fight Club.

  2. (Creating tables, importing data, and easy SQL)
    1. Using SQLite, load the movie_ratings.csv file into a database. (This file has three columns, an ID for a customer, a movie name, and a rating from 1-5.) Consult the SQLite documentation for how to do this. You will need to write a CREATE TABLE statement and then import the data. Provide your CREATE TABLE statement for this problem. Don't worry about constraints like primary keys; just choose appropriate data types.
    2. Find the number of reviews where the rating is 5.
    3. Find the number of reviewers who gave a rating of 1 to Gone with the wind.

  3. (Medium SQL) Consider the following relational design used at a Memphis pizza restaurant:

    The Customer table maintains the personal information for people who order pizzas, such as a unique ID, name, phone number, credit card number (ccn), and neighborhood (e.g., Midtown, East Memphis, Evergreen). It is possible that two different people have the same name.

    In the Pizza table, every pizza has an ID, name (e.g., "the works"), particular size (e.g., 7 inches) and price. Note that different pizzas may have the same name, but different sizes.

    The Order table includes the records about which customer ordered which pizza, quantity of pizzas, orderyear (e.g., 2014), ordermonth (e.g., 12), orderday (e.g., 27), and the order time (e.g., "6:13pm"). Note that an order can contain more than one pizza; there will just be multiple tuples with the same cust_id, date, and time.

    The Supplies table includes the information of the various groceries used by the store: the name, unit price, and the amount left in the store (e.g. the store might have 3 lbs of mozzarella left with a unit price of $5 per lb).

    The Ingredient table keeps the records about the amount of ingredients used by each pizza (so "The works 7 inch" might use only 0.1 lbs of mozzarella, while "Four cheese 12 inch" might use 0.4 lbs).

    Write SQL queries to answer parts b through f. Use only the operators and SQL statements we have learnt in class. In addition, avoid the operators IN, ALL, ANY, and EXISTS.

    1. What are good primary keys for the five tables?
    2. Find the ids of the customers with name "Bob."
    3. List pizza ids of the pizzas ordered by all the customers. Remove duplicates.
    4. Find pizza ids of all the pizzas with the price more than 15 dollars.
    5. List ids and names of the customers who ordered at least one pizza and live in Midtown.
    6. Find all the pizzas (pizza id, pizza name, and size) where the store makes a profit of at least 10 dollars per whole pizza. (Here, we assume Profit = Pizza Price minus Ingredient Cost)

  4. (Advanced SQL) Using the potter2.db SQLite database (note: not the original potter.db database!), write SQL queries to answer the following questions.
    1. Find a listing of all courses (specific CRNs) with how many students are enrolled in each CRN.
    2. *Find the largest number of students in any one CRN. In other words, find the size of the biggest class.
    3. *Find the courses (specific CRNs) with the largest number of students. Return the CRN, course name, year, and number of students. Hint: there are four of these courses.
    4. *Find, for each of the four houses, the most popular course title (not CRN) within that house. In other words, you want to find the course title, that across all CRNs with that course title, have the largest student enrollment for each house. Return a table consisting of the house, the course title, and the number of students enrolled in that course title from that house.

      Hint: this is hard. Plan your query out in a text editor so you can move the pieces around and rewrite it as you construct it. Read the book section on correlated subqueries. This may be helpful as well.

      Hint 2: As it turns out, the four houses all have the same most popular course title.