Originally published on StrataScratch.
Behind the user friendliness of the smartphone app, Lyft puts a lot of effort into collecting and analyzing data about its users and drivers. This helps them plan and optimize their resources to provide the most efficient inner city transportation considering the price, time, and level of comfort.
Interviewers at Lyft try to ask the right questions to hire capable data scientists. In this article, we will walk you through one of the common Lyft data scientist interview questions, where candidates have to calculate driver churn rate based on available data.
Basic to Intermediate Concepts Tested in Lyft Data Scientist Interview
Lyft is looking for data scientists who can help the company by writing efficient and readable SQL queries. To do that, candidates need to thoroughly understand SQL concepts.
Let’s look at some essential SQL concepts which can help you get the data scientist job at Lyft.
COUNT()
When going into a data science interview, it’s important to know how to use all aggregate functions. This includes their syntax, possible arguments, and output values. Reading “The Ultimate Guide to SQL Aggregate Functions” can help you better understand this SQL feature.
To answer this particular question, a candidate needs to understand the COUNT() aggregate function. You should know that the COUNT() function returns the number of rows that have a value in the specified column. If there is a NULL value, the COUNT() function will not add that row to the total.
Other useful operations that a candidate might need to perform include: getting the number of rows with unique values in a specific column, getting the number of total records, getting the number of records with NULL values in a specific column, giving an alias to the result, and so on.
It’s also useful to know how to use the GROUP BY statement to summarize the results of aggregate functions such as COUNT().
In the best-case scenario, you should be familiar with some of the examples of how COUNT() can be used in practice. To arrive at the final answer to this question, we are going to use COUNT() to find the number of rows with a NULL value in the specified column. Then we can find the ratio between this number and the total number of rows in the table.
Another important detail is that the COUNT() aggregate function returns an integer value. In our solution, we divide one output of a COUNT() function by another to get a decimal ratio. To get the desired result, we will need to convert one of the values to a float.
Converting Value Types
When solving a question, it’s important to know what type of values you’re dealing with. If needed, you should be able to convert one type of value into another to perform arithmetic operations and get accurate results.
A candidate should know how to convert a date value to a string, an integer to a float, a string to an integer, or any other combination. One of the most important functions to know is CAST(), but there is also a shorthand syntax for changing the type of value.
In this particular question, we are dealing with the differences between integer division and float division. Finding an answer requires a thorough understanding of the difference between two arithmetic operations. You should also know when they occur and why divisions are of one type or another.
For example, we might need to divide the result of one COUNT() function by another. Both of these are integer values. If we want to get a float value as an answer, we have to perform a float division. For that, we have to convert one of them to a float value.
Read full article on StrataScratch.