IEOR 4720 SQL and Databases
Part 1 (10 points)
This first part is about riders (clients) signup.
Given the below subset of Uber’s schema (the 3 tables below), write executable SQL queries to answer the questions. Please answer in a single query script for each question and you can use at most 1 temporary table (a CTE with a WITH statement or a nested query).
A. (5 pts) For each of the cities 'Qarth' and 'Meereen', calculate the average difference between Actual and Predicted ETA (estimated time of arrival) for all completed trips within the last 90 days.
B. (5 pts) A rider’s signup is defined as an event labeled ‘sign_up_success’ within the events table. Restrict the attention to signups in the first 7 days of 2024. For each city (‘Qarth’ and ‘Meereen’) and for each day in which there was a signup, determine the percentage of signups that resulted in a completed trip within 168 hours of the signup.
Data is not provided for this exercise, but you can generate fake data to test your queries.
Submission instructions: You will copy/paste your queries into Courseworks.
Table name: trips
Column name: | Datatype: |
id | integer |
client_id | Integer (Foreign keyed to events.rider_id) |
driver_id | integer |
city_id | Integer (Foreign keyed to cities.city_id) |
client_rating | integer |
driver_rating | integer |
request_at | Timestamp |
predicted_eta | Integer (in minutes) |
actual_eta | Integer (in minutes) |
status | String (can be ‘completed’, canceled’) |
Table name: cities
Column name: | Datatype: |
city_id | integer |
city_name | string |
Table name: events
Column name: | Datatype: |
device_id | integer |
rider_id | integer |
city_id | Integer |
event_name | String (can be: ‘sign_up_success’, ‘attempted_sign_up’, ‘sign_up_failure’) |
_ts | Timestamp of the event |
Part 2 (20 points)
This second part is not related to the first part. In this second part we will study drivers signup.
Uber’s goal is to make sure that the drivers that start the signup process end up taking their first trip on the platform.
A snapshot of data from the driver_signup table is available in the driver_signup.csv file (on Courseworks).
An Uber product manager needs help answering a few (intentionally ambiguous) questions:
1. Are there any issues with the data? Feel free to make assumptions to perform your analysis.
2. What fraction of the drivers that sign up also take a first trip? How long does it take them to take a first trip?
3. What makes a driver more likely to start driving?
4. How would you define the most important performance metric that our cross-functional team should track using this table? What are one or two additional performance metrics that we should track?
5. Do you have any ideas for our app to improve the performance metrics that you picked?
Submission instructions:
Please report the results of your analysis in a Google Slides deck of at most 12 pages (title page included). Optional: you can add additional pages or code in an appendix section of your deck (they won’t be graded). When done working on your slides deck, you will click on Share (top right) and change the General Access setting to LionMail (or Public or columbia.edu). You will then copy the url link to your deck and paste it in the Courseworks submission field.
Your analysis will be evaluated based on its content and the criteria for building an effective presentation, discussed throughout the course. The analysis can be a simple exploratory analysis (fancy statistical models are not necessary).
Table name: driver_signup
Column name: | Datatype: |
id | integer |
city_id | Integer (where did the driver sign up) |
signup_os | String (can be: “android”, “ios”, “website”, …) |
signup_channel | string (Can be: “offline”, “paid”, “organic”, “referral”) |
signup_timestamp | Timestamp (timestamp of account creation) |
bgc_date | Date (of background consent) |
vehicle_added_date | Date when driver uploaded vehicle info |
first_trip_date | Date of first trip as a driver |
vehicle_make | String (make of vehicle uploaded) |
vehicle_model | String (model of vehicle uploaded) |
vehicle_year | Year that the car was made |