What is list comprehension?
Python is an object oriented programming language. Almost everything in them is treated consistently as an object. Python also features functional programming which is very similar to mathematical way of approaching problem where you assign inputs in a function and you get the same output with same input value. Given a functionf(x) = x2
, f(x) will always return the same result with the same x value. The function has no "side-effect" which means an operation has no effect on a variable/object that is outside the intended usage. "Side-effect" refers to leaks in your code which can modify a mutable data structure or variable.
Functional programming is also good for parallel computing as there is no shared data or access to the same variable.
List comprehension is a part of functional programming which provides a crisp way to create lists without writing a for loop.In the image above, the
for clause
iterates through each item of list. if clause
filters list and returns only those items where filter condition meets. if clause is optional so you can ignore it if you don't have conditional statement.
[i**3 for i in [1,2,3,4] if i>2]
means take item one by one from list [1,2,3,4] iteratively and then check if it is greater than 2. If yes, it takes cube of it. Otherwise ignore the value if it is less than or equal to 2. Later it creates a list of cube of values 3 and 4.
Output : [27, 64]
List Comprehension vs. For Loop vs. Lambda + map()
All these three have different programming styles of iterating through each element of list but they serve the same purpose or return the same output. There are some differences between them as shown below.[i**2 for i in range(2,10)]
sqr = [] for i in range(2,10): sqr.append(i**2) sqr
list(map(lambda i: i**2, range(2, 10)))
Output [4, 9, 16, 25, 36, 49, 64, 81]List comprehension is performing a loop operation and then combines items to a list in just a single line of code. It is more understandable and clearer than for loop and lambda.
range(2,10)
returns 2 through 9 (excluding 10).
**2
refers to square (number raised to power of 2). sqr = []
creates empty list. append( )
function stores output of each repetition of sequence (i.e. square value) in for loop.
map( )
applies the lambda function to each item of iterable (list). Wrap it in list( )
to generate list as output
If statement
if x%2 == 0
checks whether a number is even or not. For example, 5 % 2
returns 1 which is remainder. If remainder is 0, it means it is an even number.
l1= [x**2 for x in range(1, 10**7) if x % 2 == 0] # Processing Time : 3.96 seconds
sqr = [] for x in range(1, 10**7): if x%2 == 0: sqr.append(x**2) # Processing Time : 5.46 seconds
l0 = list(map(lambda x: x**2, filter(lambda x: x%2 == 0, range(1, 10**7)))) # Processing Time : 5.32 seconds
filter(lambda x: x%2 == 0, range(1, 10**7))
returns even numbers from 1 through (10 raised to power 7) as filter() function is used to subset items from the list. Roughly you can think of filter() as WHERE
clause of SQL.
List Comprehension : IF-ELSE
Here we are telling python to convert text of each item of list to uppercase letters if length of string is greater than 4. Otherwise, convert text to lowercase.upper( )
converts string to uppercase. Similarly, you can use lower( )
function for transforming string to lowercase.
mylist = ['Dave', 'Micheal', 'Deeps'] [x.upper() if len(x)>4 else x.lower() for x in mylist]
k = [] for x in mylist: if len(x) > 4: k.append(x.upper()) else: k.append(x.lower()) k
Filtering Dictionary using List Comprehension
Suppose you have a dictionary and you want to select particular keys and specific values. In short you want to apply conditional statementIF
to subset or filter dictionary.
d = {'a': [1,2,1], 'b': [3,4,1], 'c': [5,6,2]}
[x for x in d['b'] if x >2] Output [(3, 4)]
[(x,y) for x, y in zip(d['a'],d['b']) if x == 1 and y > 1] Output [(1, 3)]In the above program, x refers to d['a'] and y refers to d['b'].
all( )
checks condition(s) for all values in the list. It returns True/False.
k
refers to keys and v
refers to values of dictionary.
[(k,v) for k,v in d.items() if all(x > 1 for x in v) ] Output [('c', [5, 6, 2])]Only key
c
contains values which all are greater than 1.
Similarly we have any( )
function which returns True if any of the value meets condition.
[(k,v) for k,v in d.items() if any(x > 2 for x in v) ] Output [('b', [3, 4, 1]), ('c', [5, 6, 2])]
Using List Comprehension on Pandas DataFrame
In real-world, we generally have data stored in either CSV or relational databases. We generally convert it to pandas dataframe and then we do data cleaning and manipulation. Hence it is important to learn how to use list comprehension on dataframe. Suppose you have a dataframe for employees having their names and current and previous salary. Let's create a pandas dataframe for illustration.import pandas as pd df = pd.DataFrame({'name': ['Sandy', 'Sam', 'Wright', 'Atul'], 'prevsalary': [71, 65, 64, 90], 'nextsalary': [75, 80, 61, 89]}) df
name prevsalary nextsalary 0 Sandy 71 75 1 Sam 65 80 2 Wright 64 61 3 Atul 90 89We need to create a column called 'Flag' which will have value either 'High Bracket' or 'Low Bracket'. If values of column
prevsalary
is greater than 70, it should be labelled as 'High Bracket'. Else assign it as 'Low Bracket'.
df['Flag'] = ["High Bracket" if x > 70 else "Low Bracket" for x in df['prevsalary']]The above line of code produced a new column in the df dataframe.
name prevsalary nextsalary Flag 0 Sandy 71 75 High Bracket 1 Sam 65 80 Low Bracket 2 Wright 64 61 Low Bracket 3 Atul 90 89 High Bracket
How to perform operation on multiple columns?
Suppose you want to compare two variables (columns)prevsalary
and nextsalary
. If prevsalary > nextsalary, categorize it as "Increase". Else populate "Decrease" value.
df['Flag2'] = ["Increase" if x > y else "Decrease" for (x,y) in zip(df['nextsalary'],df['prevsalary'])]
name prevsalary nextsalary Flag Flag2 0 Sandy 71 75 High Bracket Increase 1 Sam 65 80 Low Bracket Increase 2 Wright 64 61 Low Bracket Decrease 3 Atul 90 89 High Bracket DecreaseHere we are comparing two variables x and y in list comprehension
x = df['nextsalary'], y = df['prevsalary']
Convert character variable to integer
Suppose you have a column which contains numeric values but the column is defined as a character variable (object column type). You need to convert it to integer. It is a common problem when you read data from some online portal or some columns of table are stored in char or varchar format in relational database. Let's create a dataframe for example.import pandas as pd df = pd.DataFrame({"col1" : ['1', '2', '3']})
df['col2'] = [int(x) for x in df['col1']]See the output below.
col2
has column type integer.
col1 object col2 int64This can be solved without list comprehension as pandas has built-in function for the same
df['col2'] = df['col1'].astype(int)
Nested List Comprehension
It is equivalent to multiple for-loop. The standard syntax of nested list comprehension is shown below
Suppose you have a list of lists and you want to select only the odd numbers. Any number not divisible by 2 is an odd number.mat = [[1,2], [3,4], [5,6]] [x for row in mat for x in row if x%2==1]
b = [] for row in mat: for x in row: if x%2 == 1: b.append(x) b
Output [1, 3, 5]
List Comprehension on dictionaries within list
Suppose you have a list which contains multiple dictionaries and you want to apply filtering on values and selecting specific keys.mylist = [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}, {'a': 5, 'b': 6}]In the following code, we are selecting only key
a
and all the values against this key where it is greater than 1.
[i['a'] for i in mylist if 'a' in i if i['a'] > 1] Output [3, 5]
How to create tuples from lists
Idea is to prepare all the possible combinations by applying multiple loops with list comprehension.l1 = ['a','b'] l2 = ['c','d'] [(x,y) for x in l1 for y in l2]
Output [('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd')]
Split Sentences into words with list comprehension
In text mining, it is one of the initial data cleaning step to break sentences into words. Before splitting, make sure all words are converted into either lower or uppercase to avoid multiple words for same content.text = ["Life is beautiful", "No need to overthink", "Meditation help in overcoming depression"]
[word for sentence in text for word in sentence.lower().split(' ')]
['life', 'is', 'beautiful', 'no', 'need', 'to', 'overthink', 'meditation', 'help', 'in', 'overcoming', 'depression']
for sentence in text
means looping through each sentence of listtext
text[0].lower().split(' ')
converts to lowercase and then separate sentence into words and returns['life', 'is', 'beautiful']
1. Remove these words is, in, to, no
from text
list. Desired output should be as follows -
'life', 'beautiful', 'need', 'overthink', 'meditation', 'help', 'overcoming', 'depression'
2. Find matching numbers in the lists.
x = [51, 24, 32, 41] y = [42, 32, 41, 50]
Output should be [32, 41]
3. If element of list is between 30 and 45, make it 1 else 0
x = [51, 24, 32, 41]Output : [0, 0, 1, 1]
Nice post, thanks..
ReplyDeleteIn the example;
[i['a'] for i in mylist if 'a' in i if i['a'] > 1 ]
would be better not to get 'key' error
It is working fine at my end. What error you are getting?
Delete