Learning How (And When) To Use List Comprehensions For Data Cleaning Using Python

“When all you have is a hammer, everything looks like a nail.” — Abraham Maslow

What is a List Comprehension?

# Creating variables for three sandwiches, and storing the ingredients as a list
sandwich_one = ['bread', 'vegan provolone cheese', 'tomato', 'spinach', 'red onion', 'black bean patty', 'yellow mustard']
sandwich_two = ['barbecue sauce', 'vegan provolone cheese', 'portobello mushroom', 'bread', 'pulled jack fruit']
sandwich_three = ['red onion', 'tomato', 'vegan mayonnaise', 'bread', 'spicy fake chikn patty', 'vegan mozzarella cheese', 'spinach']
# Creating a for loop to check inside sandwich two for ingredients.
for i in sandwich_two:
    ingredients = i
    print(ingredients)
# Storing the for loop inside of a function for more efficiency
def check_sandwich(x):
    for i in x:
        ingredients = i
        print(ingredients)
# Typing the function anytime, and inputting the variable, or sandwich, that you're interested in.
check_sandwich(sandwich_three)
# The list comprehension below replaces the for loop.
# We go from 3 lines of code to 1 line of code.
def check_sandwich(x):
    ingredients = [i for i in x]
    print(ingredients)
# Running the function to check on sandwich three
check_sandwich(sandwich_three)
new_list = [expression(item) for item in old_list if conditional(item)]
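To make the template concrete, here is a small sketch. The list `sandwich`, the use of `item.title()` as the expression, and the "contains a space" test as the conditional are all illustrative assumptions, chosen only to show where each piece of the template goes.

```python
# Hypothetical ingredient list, used only for illustration.
sandwich = ['bread', 'tomato', 'spinach', 'red onion', 'yellow mustard']

# expression(item)  -> item.title()   (what goes into the new list)
# conditional(item) -> ' ' in item    (which items are kept)
two_word_ingredients = [item.title() for item in sandwich if ' ' in item]

print(two_word_ingredients)  # ['Red Onion', 'Yellow Mustard']
```

The conditional is optional: leave off the `if` part and every item is transformed and kept.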

A step-by-step process to go from “for loop” to list comprehension

old_list = ['cat','car','keep','corp','key','cool','kobe','coo']
new_list = []
for i in old_list:
    if len(i) > 3:
        new_list.append(i)
print(new_list)
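# Step 1: start with the name of the new list
new_list =
# Step 2: add square brackets containing the item you want to keep
new_list = [i]
# Step 3: add the for loop that walks through the old list
new_list = [i for i in old_list]
# Step 4: add the condition that filters the items
new_list = [i for i in old_list if len(i) > 3]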
new_list = []
for i in old_list:
    if len(i) > 3:
        new_list.append(i)
print(new_list)
new_list = [i for i in old_list if len(i) > 3]
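One way to convince yourself that the two versions really are equivalent is to wrap each in a function and compare the results. The function names `long_words_loop` and `long_words_comprehension` are made up for this sketch.

```python
old_list = ['cat', 'car', 'keep', 'corp', 'key', 'cool', 'kobe', 'coo']

def long_words_loop(words):
    """For loop version: append each word longer than 3 characters."""
    result = []
    for w in words:
        if len(w) > 3:
            result.append(w)
    return result

def long_words_comprehension(words):
    """List comprehension version of the same filter."""
    return [w for w in words if len(w) > 3]

# Both approaches build the same list, in the same order.
assert long_words_loop(old_list) == long_words_comprehension(old_list)
print(long_words_comprehension(old_list))  # ['keep', 'corp', 'cool', 'kobe']
```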

Every list comprehension can be rewritten as a for loop, but not every for loop can be rewritten as a list comprehension.
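As a rough illustration of the second half of that statement, here is a loop that does not translate cleanly. It stops early with `break` and keeps a running total, and a plain list comprehension offers neither. The word list and the 10-character budget are assumptions made up for this sketch.

```python
words = ['cat', 'car', 'keep', 'corp', 'key', 'cool']

collected = []
total_chars = 0
for w in words:
    # Stop entirely once the next word would push us past the budget.
    if total_chars + len(w) > 10:
        break
    collected.append(w)
    total_chars += len(w)

print(collected)    # ['cat', 'car', 'keep']
print(total_chars)  # 10
```

When a loop needs to stop partway through, update other variables, or do work besides building a list, the for loop is usually the clearer choice.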

Do I want / need to create a new list from an existing list?

Using a list comprehension for the “dataframe nested dictionary” problem

A DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects. It is generally the most commonly used pandas object.

btc_list = [i['name'] for i in practice_df['belongs_to_collection'] if i is not None]
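The sketch below shows the same pattern on a plain Python list, so it runs without pandas. The list `belongs_to_collection` is a made-up stand-in for the `practice_df['belongs_to_collection']` column, and its contents are invented for illustration. Iterating over a pandas column yields its values one at a time, so the comprehension has the same shape.

```python
# Hypothetical stand-in for practice_df['belongs_to_collection']:
# each entry is either a dictionary or a missing value.
belongs_to_collection = [
    {'id': 1, 'name': 'Collection A'},
    None,
    {'id': 2, 'name': 'Collection B'},
    float('nan'),  # pandas often stores missing values as NaN instead of None
]

# Keeping only real dictionaries skips both None and NaN entries.
btc_list = [i['name'] for i in belongs_to_collection if isinstance(i, dict)]

print(btc_list)  # ['Collection A', 'Collection B']
```

Checking `isinstance(i, dict)` is a slightly more defensive choice than comparing against `None`: a NaN entry would pass a `None` check and then fail on `i['name']`.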

