Data Engineering Interview Questions and Tutorial

March 20, 2020

Passing a coding interview is a different skill from writing code. Even a CS degree holder who has failed 8+ coding interviews may just need more practice and the right opportunity!

I recently did a coding interview for a Data Engineer position at one of the top tech companies in the Bay Area (one of the FANG companies). By sharing the exact Python questions and my "solutions", I hope to give some insight into how to solve coding problems on the fly. Give them a try to test your knowledge, or keep reading to see how I tried to solve them.

Disclaimer: I received three Python questions during my 1-hour coding interview, which was conducted on CoderPad. Each question required me to write a function that had to "pass" all of the listed test cases, which were checked with assert statements. The solutions I provide here are the ones I came up with during the interview. They are not the only solutions, nor the "fastest".

Question 1

Return the count of a given char that exists in a string

"""
Example: 
s = "mississippi"
char = "s"
Expected Output = 4

Assumptions: Length of char is always 1 
"""

# Try solving this function!

def countChar(s, char):
    # your code here
    return num

assert countChar("mississippi", "s") == 4

Question 2

Return a list of mismatched words between two strings

"""
Example: 
s1 = "i think dogs are cute"
s2 = "dogs like to eat"
Expected Output = ['i', 'think', 'are', 'cute', 'like', 'to', 'eat']

Assumptions: List of words can be returned in any order
"""

# Try solving this function!

def findMismatch(s1, s2):
    # your code here
    return output_list

assert findMismatch("i think dogs are cute", "dogs like to eat") == ['i', 'think', 'are', 'cute', 'like', 'to', 'eat']

Question 3

Return a list that replaces any 'None' element with the previous non-None element.

"""
Example:
input_list = [1, 1, 8, None] 
Expected Output = [1, 1, 8, 8]
"""

# Try solving this function!
def fix_list(input_list):
    # your code here
    return output_list

# Also consider the following test cases
assert fix_list(None) is None
assert fix_list([None,5,4,None]) == [None,5,4,4]

Give yourself about 7-10 minutes to solve each question. If one takes you more than 15 minutes, it may be worth reviewing Python's core data structures (strings, lists, dictionaries, and sets).

Solving these questions

The goal of writing each function is to get all of the provided test cases "passing" within a reasonable amount of time. Keep practicing!
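For readers new to this setup: assert is a Python statement, not a function. A passing assert does nothing, while a failing one raises AssertionError (and assert statements are skipped entirely when Python runs with the -O flag). The helper below, run_test, is a hypothetical name used only for illustration; it wraps an assert so both outcomes can be seen without stopping the script.

```python
def run_test(actual, expected):
    """Hypothetical helper: return True if the assert passes, False if it fails."""
    try:
        # a passing assert is silent and execution simply continues
        assert actual == expected, f"expected {expected!r}, got {actual!r}"
        return True
    except AssertionError as err:
        # a failing assert raises AssertionError carrying the optional message
        print("AssertionError:", err)
        return False


print(run_test("mississippi".count("s"), 4))  # True
print(run_test("mississippi".count("s"), 5))  # prints "AssertionError: expected 5, got 4", then False
```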

“I have not failed. I've just found 10,000 ways that won't work.” - Thomas Edison

Question 1: Return the count of a given char that exists in a string.

First, we need somewhere to store the count of the target letter ("char"). One option is a dictionary, which we can initialize using dict() or {}. I tend to prefer the former, because it makes the code easier to read.
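Strictly speaking, a dictionary is not required when only one character is being counted; a plain integer does the same job. A minimal sketch of that variant (the name count_char_simple is hypothetical, not from the interview):

```python
def count_char_simple(s, char):
    # a single integer counter is enough for one target character
    count = 0
    for letter in s:
        if letter == char:
            count += 1
    return count


print(count_char_simple("mississippi", "s"))  # 4
```

The dictionary version walked through next is still worth following, since the same key-value pattern extends naturally to counting many different characters at once.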

"""
Example: 
s = "mississippi"
char = "s"
Expected Output = 4

Assumptions: Length of char is always 1 
"""

def countChar(s, char):
    # initialize a dictionary 
    output_count = dict()

    return num

assert countChar("mississippi", "s") == 4

Using the dictionary, we can create a new counter starting at 0 associated with the target letter. We can do this by creating a key-value pair with the key as "char" and value as 0.

"""
Example: 
s = "mississippi"
char = "s"
Expected Output = 4

Assumptions: Length of char is always 1 
"""

def countChar(s, char):
    # initialize a dictionary 
    output_count = dict()
    
    # create a key-value pair with the target letter and counter at 0
    output_count[char] = 0

    return num

assert countChar("mississippi", "s") == 4

Now that the dictionary is set up, we can iterate over each letter of the string with a for loop. Inside the loop, we add a condition for when a letter matches the target letter, "char". This works because a Python string is a sequence of characters: a for loop visits them one at a time (e.g., "hello" yields "h", "e", "l", "l", "o"), and the in operator checks whether one string appears inside another. Since char is always a single character, letter in char is True exactly when letter == char.
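A quick illustration of both behaviors: iterating a string yields single characters, and in between two strings is a substring test rather than an element test. That second point is why the "length of char is always 1" assumption matters here.

```python
word = "hello"

# iterating over a string yields one character at a time
print([letter for letter in word])  # ['h', 'e', 'l', 'l', 'o']
print(list(word))                   # ['h', 'e', 'l', 'l', 'o']

# `in` between two strings checks for a substring
print("l" in word)    # True
print("ll" in word)   # True, "ll" appears as a substring
print("lo!" in word)  # False

# with a multi-character target, `letter in char` would no longer mean "equal"
print("s" in "ss")    # True, even though "s" != "ss"
```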

"""
Example: 
s = "mississippi"
char = "s"
Expected Output = 4

Assumptions: Length of char is always 1 
"""

def countChar(s, char):
    # initialize a dictionary 
    output_count = dict()
    
    # create a key-value pair with the target letter and counter at 0
    output_count[char] = 0

    # create a for loop to iterate each letter of the given string
    for letter in s:

        # set up an IF condition for when the letter matches char
        if letter in char:
            pass  # placeholder: the counter is incremented in the next step

    return num

assert countChar("mississippi", "s") == 4

We want to increment the count every time the target char appears in the given string. Python's shorthand for this is count += 1, which is equivalent to count = count + 1.

To add to the count stored in our dictionary, we can write output_count[char] += 1, which is equivalent to output_count[char] = output_count[char] + 1.
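The reason for creating output_count[char] = 0 first is that += on a dictionary key reads the current value before writing the new one, so the key has to exist already. A small sketch showing what happens otherwise:

```python
counts = {}

# reading a missing key raises KeyError, so += fails on an uninitialized key
try:
    counts["s"] += 1
except KeyError as err:
    print("KeyError:", err)  # KeyError: 's'

# after initializing the key, += works as expected
counts["s"] = 0
counts["s"] += 1  # same as counts["s"] = counts["s"] + 1
print(counts)     # {'s': 1}
```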

Finally, we want the function to return the total count of the target char, so we assign the value stored under that key, output_count[char], to the num variable and return it.

"""
Example: 
s = "mississippi"
char = "s"
Expected Output = 4

Assumptions: Length of char is always 1 
"""

def countChar(s, char):
    # initialize a dictionary 
    output_count = dict()
    
    # create a key-value pair with the target letter and counter at 0
    output_count[char] = 0

    # create a for loop to iterate each letter of the given string
    for letter in s:

        # set up an IF condition for when the letter matches char
        if letter in char:

            # incremental counter 
            output_count[char] +=1

    # assign the num variable with the count of the key-value pair
    num = output_count[char]

    return num

assert countChar("mississippi", "s") == 4

Let's try running this code! If the assert statement does not raise an AssertionError, the function passes!
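As the disclaimer at the top notes, this loop is not the only solution. Python's standard library offers shorter alternatives; the function names below are hypothetical, but str.count and collections.Counter are real built-ins. Whether they are acceptable in an interview depends on what the interviewer wants to assess.

```python
from collections import Counter


def count_char_builtin(s, char):
    # str.count returns the number of non-overlapping occurrences of a substring
    return s.count(char)


def count_char_counter(s, char):
    # Counter tallies every character; a missing key returns 0 instead of raising KeyError
    return Counter(s)[char]


print(count_char_builtin("mississippi", "s"))  # 4
print(count_char_counter("mississippi", "s"))  # 4
print(count_char_counter("mississippi", "z"))  # 0
```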


Question 2: Return a list of mismatched words between two strings.

First, let's create a list called output_list to store any mismatched words we find.

Second, let's convert each string into a list of words using the split() method. By default, split() separates the string wherever there is whitespace (spaces, tabs, or newlines).

For instance, "hello world".split() returns the list ["hello", "world"].
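A few quick checks of split()'s default behavior, including one contrast case: passing an explicit separator such as " " does not collapse repeated spaces, which can leave empty strings in the result.

```python
# default split(): any run of whitespace is one separator, and leading/trailing whitespace is dropped
print("hello world".split())         # ['hello', 'world']
print("  hello   world \n".split())  # ['hello', 'world']

# explicit separator: each single space splits, so two spaces produce an empty string
print("hello  world".split(" "))     # ['hello', '', 'world']
```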

"""
Example: 
s1 = "i think dogs are cute"
s2 = "dogs like to eat"
Expected Output = ['i', 'think', 'are', 'cute', 'like', 'to', 'eat']

Assumptions: List of words can be returned in any order
"""

def findMismatch(s1, s2):
    # create new list to store mismatched words
    output_list = []
    
    # create a list for each string 
    split_s1 = s1.split()
    split_s2 = s2.split()

    return output_list

assert findMismatch("i think dogs are cute", "dogs like to eat") == ['i', 'think', 'are', 'cute', 'like', 'to', 'eat']

Next, we write a for loop over each list of words and use an if statement to check whether each word is missing from the other list.
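On a list, in and not in compare whole elements, which is exactly why the strings are split first: on the raw string, in would match partial words. A small illustration using the second test string:

```python
split_s2 = "dogs like to eat".split()

# on a list, `in` / `not in` compare against whole elements
print("cute" not in split_s2)      # True
print("dogs" not in split_s2)      # False
print("do" in split_s2)            # False, no element equals "do"

# on the unsplit string, `in` is a substring test and matches part of "dogs"
print("do" in "dogs like to eat")  # True
```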

"""
Example: 
s1 = "i think dogs are cute"
s2 = "dogs like to eat"
Expected Output = ['i', 'think', 'are', 'cute', 'like', 'to', 'eat']

Assumptions: List of words can be returned in any order
"""

def findMismatch(s1, s2):
    # create new list to store mismatched words
    output_list = []
    
    # create a list for each string 
    split_s1 = s1.split()
    split_s2 = s2.split()

    # for loop to iterate every word from first list 
    for word in split_s1:

        # conditional to check if the word does not exist in second list
        if word not in split_s2:
            pass  # placeholder: the append step is added next


    # for loop to iterate every word from second list
    for word in split_s2:

        # conditional to check if the word does not exist in first list
        if word not in split_s1:
            pass  # placeholder: the append step is added next

    return output_list

assert findMismatch("i think dogs are cute", "dogs like to eat") == ['i', 'think', 'are', 'cute', 'like', 'to', 'eat']

When the word does not exist in the other list, we add it to output_list using the append() method. Finally, the function returns the list of mismatched words (the problem allows any order).

"""
Example: 
s1 = "i think dogs are cute"
s2 = "dogs like to eat"
Expected Output = ['i', 'think', 'are', 'cute', 'like', 'to', 'eat']

Assumptions: List of words can be returned in any order
"""

def findMismatch(s1, s2):
    # create new list to store mismatched words
    output_list = []
    
    # create a list for each string 
    split_s1 = s1.split()
    split_s2 = s2.split()

    # for loop to iterate every word from first list 
    for word in split_s1:

        # conditional to check if the word does not exist in second list
        if word not in split_s2:

            # add word to output_list
            output_list.append(word)


    # for loop to iterate every word from second list
    for word in split_s2:

        # conditional to check if the word does not exist in first list
        if word not in split_s1:

            # add word to output_list
            output_list.append(word)

    return output_list

assert findMismatch("i think dogs are cute", "dogs like to eat") == ['i', 'think', 'are', 'cute', 'like', 'to', 'eat']

You can check that this function returns the list of mismatched words for the test case by running the assert statement.
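Another option, sketched here under the "any order" assumption (find_mismatch_sets is a hypothetical name), is the set symmetric difference operator ^, which keeps elements that appear in exactly one of two sets. It is shorter, and set lookups avoid scanning a list for every word, but it does not preserve word order and it collapses duplicates, so its result should be compared with sorted() rather than against a fixed list.

```python
def find_mismatch_sets(s1, s2):
    # ^ keeps the words that appear in exactly one of the two sets
    return list(set(s1.split()) ^ set(s2.split()))


result = find_mismatch_sets("i think dogs are cute", "dogs like to eat")
print(sorted(result))  # ['are', 'cute', 'eat', 'i', 'like', 'think', 'to']

# duplicates collapse: the list-based loop would return ['a', 'a', 'b'] here
print(sorted(find_mismatch_sets("a a", "b")))  # ['a', 'b']
```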


Question 3: Return a list that replaces any 'None' element with the previous non-None element.

Note: this was a fairly challenging question as there are multiple test cases to consider.

First, let's create a new list called output_list for the function to return.

Consider the first test case. If the input list is None, the function should return None. Let's write that condition using an if else statement.

"""
Example:
input_list = [1, 1, 8, None] 
Expected Output = [1, 1, 8, 8]
"""

def fix_list(input_list):
    
    # create new list to store output
    output_list = []

    # condition to satisfy the first test case
    if input_list is None:
        return None

    else: 
        pass  # placeholder: the for loop is added in the next step

    return output_list

# Also consider the following test cases
assert fix_list(None) is None
assert fix_list([None,5,4,None]) == [None,5,4,4]

Next, we want to be able to iterate each item of input_list using a for loop and check whether the item is None with an if else statement. If the item is not None, we can add the item to the output_list using the append() function.

"""
Example:
input_list = [1, 1, 8, None] 
Expected Output = [1, 1, 8, 8]
"""

def fix_list(input_list):
    
    # create new list to store output
    output_list = []

    # condition to satisfy the first test case
    if input_list is None:
        return None

    else:   
        # iterate each element of the input_list
        for element in input_list:

            # conditional if the element is not None
            if element is not None:
                output_list.append(element)

            else:
                pass  # placeholder: the None case is handled in the next step

    return output_list

# Also consider the following test cases
assert fix_list(None) is None
assert fix_list([None,5,4,None]) == [None,5,4,4]

The tricky part is figuring out how to add the previous element into the output_list if the current element is None.

One strategy is to store the previous element in a variable, e.g., previous_element = element whenever the element is not None. The variable must be initialized before the for loop, so that it already exists the first time the loop reaches a None element.

What should previous_element start as? An empty string? Recall that in the second test case, the first element of the list is None and has no previous element, so the expected output keeps None in that position. We can therefore initialize previous_element as None.
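To see why initializing previous_element before the loop matters, here is a hypothetical variant, fill_without_init, that only assigns it inside the loop. It happens to work when a non-None value comes first, but a leading None makes Python read the variable before it has ever been assigned, which raises UnboundLocalError (a subclass of NameError).

```python
def fill_without_init(input_list):
    # hypothetical broken variant: previous_element is never set before the loop
    output_list = []
    for element in input_list:
        if element is not None:
            previous_element = element
            output_list.append(element)
        else:
            output_list.append(previous_element)
    return output_list


print(fill_without_init([1, 1, 8, None]))  # [1, 1, 8, 8]

try:
    fill_without_init([None, 5, 4, None])
except UnboundLocalError as err:
    # the first element is None, so previous_element is read before any assignment
    print("UnboundLocalError:", err)
```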

"""
Example:
input_list = [1, 1, 8, None] 
Expected Output = [1, 1, 8, 8]
"""

def fix_list(input_list):
    
    # create new list to store output
    output_list = []

    # create a new variable as None
    previous_element = None

    # condition to satisfy the first test case
    if input_list is None:
        return None

    else:
        # iterate each element of the input_list
        for element in input_list:

            # conditional if the element is not None
            if element is not None:
                output_list.append(element)

                # also store the element 
                previous_element = element

            else:
                # add the previous element when the element is None
                output_list.append(previous_element)

    return output_list

# Also consider the following test cases
assert fix_list(None) is None
assert fix_list([None,5,4,None]) == [None,5,4,4]

Finally, the function returns a list that satisfies both test cases listed here. Try running the function against them with the assert statements!
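For comparison, the same "carry the last non-None value forward" logic can be written with itertools.accumulate, which passes the previous result and the current item to a function at each step. This is an alternative sketch (fix_list_accumulate is a hypothetical name), not the solution from the interview:

```python
from itertools import accumulate


def fix_list_accumulate(input_list):
    # keep the None-input behavior from the test cases
    if input_list is None:
        return None
    # accumulate emits the first item unchanged, then func(previous_result, current_item)
    return list(
        accumulate(
            input_list,
            lambda previous, current: previous if current is None else current,
        )
    )


print(fix_list_accumulate([1, 1, 8, None]))     # [1, 1, 8, 8]
print(fix_list_accumulate([None, 5, 4, None]))  # [None, 5, 4, 4]
```

Because accumulate emits the first item unchanged, a leading None stays None, matching the second test case.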


Concluding Thoughts

It's a rough process to "crack" the coding interview. Even if you don't do well, treat every coding interview as a learning experience. No matter how many times you have failed, think about how to prepare better for the next one! Good resources include the discussion forums on LeetCode and the interview questions aggregated on Glassdoor.