String Functions in Python with Examples

Deepanshu Bhalla 2 Comments
Python String Functions

This tutorial explains various string (character) functions used in Python, with examples. To manipulate strings and character values, python has several built-in functions.

List of Frequently Used String Functions in Python

The table below shows many common string functions in Python along with their descriptions and their equivalent functions in MS Excel.

Function Description MS EXCEL FUNCTION
mystring[:N] Extract N number of characters from start of string. LEFT( )
mystring[-N:] Extract N number of characters from end of string RIGHT( )
mystring[X:Y] Extract characters from middle of string, starting from X position and ends with Y MID( )
str.split(sep=' ') Split Strings -
str.replace(old_substring, new_substring) Replace a part of text with different sub-string REPLACE( )
str.lower() Convert characters to lowercase LOWER( )
str.upper() Convert characters to uppercase UPPER( )
str.contains('pattern', case=False) Check if pattern matches  (Pandas Function) SQL LIKE Operator
str.extract(regular_expression) Return matched values (Pandas Function) -
str.count('sub_string') Count occurence of pattern in string -
str.find( ) Return position of sub-string or pattern FIND( )
str.isalnum() Check whether string consists of only alphanumeric characters -
str.islower() Check whether characters are all lower case -
str.isupper() Check whether characters are all upper case -
str.isnumeric() Check whether string consists of only numeric characters -
str.isspace() Check whether string consists of only whitespace characters -
len( ) Calculate length of string LEN( )
cat( ) Concatenate Strings (Pandas Function) CONCATENATE( )
separator.join(str) Concatenate Strings CONCATENATE( )

LEFT, RIGHT and MID Functions

If you are intermediate MS Excel users, you must have used LEFT, RIGHT and MID Functions. These functions are used to extract N number of characters or letters from string.

1. Extract First N Characters
mystring = "Hey buddy, wassup?"
mystring[:2]
Output
'He'
  1. string[start:stop:step] means item start from 0 (default) through (stop-1), step by 1 (default).
  2. mystring[:2] is equivalent to mystring[0:2]
  3. mystring[:2] tells Python to pull first 2 characters from mystring string object.
  4. Indexing starts from zero so it includes first, second element and excluding third.
2. Find last two characters of string
mystring[-2:]

The above command returns p?.The -2 starts the range from second last position through maximum length of string.

3. Find characters from middle of string
mystring[1:3]
Output
'ey'

mystring[1:3] returns second and third characters. 1 refers to second character as index begins with 0.

4. How to reverse string?
mystring[::-1]
Output
'?pussaw ,yddub yeH'

-1 tells Python to start it from end and increment it by 1 from right to left.

5. How to Extract Characters from Character Variable in Pandas DataFrame?

Let's create a fake data frame for illustration. In the code below, we are creating a dataframe named df containing only 1 variable called var1

import pandas as pd
df = pd.DataFrame({"var1": ["A_2", "B_1", "C_2", "A_2"]})
Output
  var1
0  A_2
1  B_1
2  C_2
3  A_2

To deal text data in Python Pandas Dataframe, we can use str attribute. It can be used for slicing character values.

df['var1'].str[0]

In this case, we are fetching first character from var1 variable. See the output shown below.

Output
0    A
1    B
2    C
3    A

Extract Words from String

Suppose you need to take out word(s) instead of characters from string. Generally we consider one blank space as delimiter to find words from string.

1. Find first word of string
mystring.split()[0]
Output
'Hey'
How it works?
  1. split() function breaks string using space as a default separator
  2. mystring.split() returns ['Hey', 'buddy,', 'wassup?']
  3. 0 returns first item or word Hey
2. Comma as separator for words
mystring.split(',')[0]
Output
Out[1]: 'Hey buddy'
3. How to extract last word
mystring.split()[-1]
Output
Out[1]: 'wassup?'
4. How to extract word in DataFrame

Let's build a dummy data frame consisting of customer names and call it variable custname

mydf = pd.DataFrame({"custname": ["Priya_Sehgal", "David_Stevart", "Kasia_Woja", "Sandy_Dave"]})
Output
        custname
0   Priya_Sehgal
1  David_Stevart
2     Kasia_Woja
3     Sandy_Dave
#First Word
mydf['fname'] = mydf['custname'].str.split('_').str[0]

#Last Word
mydf['lname'] = mydf['custname'].str.split('_').str[1]
Output
        custname  fname    lname
0   Priya_Sehgal  Priya   Sehgal
1  David_Stevart  David  Stevart
2     Kasia_Woja  Kasia     Woja
3     Sandy_Dave  Sandy     Dave
Detailed Explanation
  1. str.split( ) is similar to split( ). It is used to activate split function in pandas data frame in Python.
  2. In the code above, we created two new columns named fname and lname storing first and last name.

SQL LIKE Operator in Pandas DataFrame

In SQL, LIKE Statement is used to find out if a character string matches or contains a pattern. We can implement similar functionality in python using str.contains( ) function.

df2 = pd.DataFrame({"var1": ["AA_2", "B_1", "C_2", "a_2"],
                    "var2": ["X_2", "Y_1", "Z_2", "X2"]})
Output
   var1 var2
0  AA_2  X_2
1   B_1  Y_1
2   C_2  Z_2
3   a_2   X2
How to find rows having specific characters in variable?

The following code tells Python to find rows containing either 'A' or 'B' in variable 'var1'.

df2['var1'].str.contains('A|B')

str.contains(pattern) is used to match pattern in Pandas Dataframe.

Output
0     True
1     True
2    False
3    False
The above command returns FALSE against fourth row as the function is case-sensitive. To ignore case-sensitivity, we can use case=False parameter. See the working example below.
df2['var1'].str.contains('A|B', case=False)
How to filter rows containing a particular pattern?

In the following program, we are asking Python to subset data with condition - contain character values either A or B. It is equivalent to WHERE keyword in SQL.

df2[df2['var1'].str.contains('A|B', case=False)]
Output
 var1 var2
0  AA_2  X_2
1   B_1  Y_1
3   a_2   X2

Suppose you want only those values that have alphabet followed by '_'

df2[df2['var1'].str.contains('^[A-Z]_', case=False)]

^ is a token of regular expression which means begin with a particular item.

Output
  var1 var2
1  B_1  Y_1
2  C_2  Z_2
3  a_2   X2

Find position of a particular character or keyword

str.find(pattern) is used to find position of sub-string. In this case, sub-string is '_'.

df2['var1'].str.find('_')
Output
0    2
1    1
2    1
3    1

Replace substring

str.replace(old_text,new_text,case=False) is used to replace a particular character(s) or pattern with some new value or pattern. In the code below, we are replacing _ with -- in variable var1.

df2['var1'].str.replace('_', '--', case=False)
Output
0    AA--2
1     B--1
2     C--2
3     A--2

We can also complex patterns like the following program. + means item occurs one or more times. In this case, alphabet occurring 1 or more times.

df2['var1'].str.replace('[A-Z]+_', 'X', case=False)
Output
0    X2
1    X1
2    X2
3    X2

Find length of string

len(string) is used to calculate length of string. In pandas data frame, you can apply str.len() for the same.

df2['var1'].str.len()
Output
0    4
1    3
2    3
3    3

To find count of occurrence of a particular character (let's say, how many time 'A' appears in each row), you can use str.count(pattern) function.

df2['var1'].str.count('A')

Convert to lowercase and uppercase

str.lower() and str.upper() functions are used to convert string to lower and uppercase values.

#Convert to lower case
mydf['custname'].str.lower()

#Convert to upper case
mydf['custname'].str.upper()

Remove Leading and Trailing Spaces

  1. str.strip() removes both leading and trailing spaces.
  2. str.lstrip() removes leading spaces (at beginning).
  3. str.rstrip() removes trailing spaces (at end).
df1 = pd.DataFrame({'y1': [' jack', 'jill ', ' jesse ', 'frank ']})
df1['both']=df1['y1'].str.strip()
df1['left']=df1['y1'].str.lstrip()
df1['right']=df1['y1'].str.rstrip()
Output
        y1   both    left   right
0     jack   jack    jack    jack
1    jill    jill   jill     jill
2   jesse   jesse  jesse    jesse
3   frank   frank  frank    frank

Convert Numeric to String

With the use of str( ) function, you can convert numeric value to string.

myvariable = 4
mystr = str(myvariable)

Concatenate or Join Strings

By simply using +, you can join two string values.

x = "Deepanshu"
y ="Bhalla"
x+y
Output
DeepanshuBhalla

In case you want to add a space between two strings, you can use this - x+' '+y returns Deepanshu Bhalla

Suppose you have a list containing multiple string values and you want to combine them. You can use join( ) function.

string0 = ['Ram', 'Kumar', 'Singh']
' '.join(string0)
Output
'Ram Kumar Singh'

Suppose you want to combine or concatenate two columns of pandas dataframe.

mydf['fullname'] = mydf['fname'] + ' ' + mydf['lname']
OR
mydf['fullname'] = mydf[['fname', 'lname']].apply(lambda x: ' '.join(x), axis=1)
Output
        custname  fname    lname       fullname
0   Priya_Sehgal  Priya   Sehgal   Priya Sehgal
1  David_Stevart  David  Stevart  David Stevart
2     Kasia_Woja  Kasia     Woja     Kasia Woja
3     Sandy_Dave  Sandy     Dave     Sandy Dave

SQL IN Operator in Pandas

We can use isin(list) function to include multiple values in our filtering or subsetting criteria.

mydata = pd.DataFrame({'product': ['A', 'B', 'B', 'C','C','D','A']})
mydata[mydata['product'].isin(['A', 'B'])]
Output
  product
0       A
1       B
2       B
6       A
How to apply NOT criteria while selecting multiple values?

We can use sign ~ to tell python to negate the condition.

mydata[~mydata['product'].isin(['A', 'B'])]

Extract a particular pattern from string

str.extract(r'regex-pattern') is used for this task.

df2['var1'].str.extract(r'(^[A-Z]_)')

r'(^[A-Z]_)' means starts with A-Z and then followed by '_'

Output
0    NaN
1     B_
2     C_
3    NaN

To remove missing values, we can use dropna( ) function.

df2['var1'].str.extract(r'(^[A-Z]_)').dropna()
Related Posts
Spread the Word!
Share
About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.

Post Comment 2 Responses to "String Functions in Python with Examples"
  1. Thanks to Listendata for an excellent article. I love Listendata.

    ReplyDelete
  2. Nice article, it was very helpful to me. Thanks!

    ReplyDelete
Next → ← Prev