Loading CSV data in Python using pandas

This tutorial explains how to read a CSV file in Python with pandas, working through several examples. Pandas is a widely used package for data manipulation, and it includes functions to import data from many formats. In this post, we will see how to load comma-separated files in several common use cases.

Load Package

First, load the required package, pandas. Run the following command to import it.
import pandas as pd
Create Sample Data for Import

The program below creates a sample data frame that is used in the examples that follow.
dt = {'ID': [11, 12, 13, 14, 15],
            'first_name': ['David', 'Jamie', 'Steve', 'Stevart', 'John'],
            'company': ['Aon', 'TCS', 'Google', 'RBS', '.'],
            'salary': [74, 76, 96, 71, 78]}
mydt = pd.DataFrame(dt, columns = ['ID', 'first_name', 'company', 'salary'])
The sample data has five rows and four columns: ID, first_name, company and salary.
Save data as CSV in the working directory

The following command writes the data frame to a CSV file in the working directory. Setting index=False leaves the row index out of the file.
mydt.to_csv('workingfile.csv', index=False)
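To see why index=False matters, here is a minimal sketch that writes a small data frame both ways and reads it back. It uses in-memory io.StringIO buffers instead of files on disk, and the two-column frame is a made-up stand-in for the sample data.

```python
import io
import pandas as pd

df = pd.DataFrame({"ID": [11, 12], "salary": [74, 76]})

# Default (index=True): the row index is written as an unnamed first column.
buf_with_index = io.StringIO()
df.to_csv(buf_with_index)
buf_with_index.seek(0)
with_index = pd.read_csv(buf_with_index)

# index=False: only the real columns are written.
buf_without_index = io.StringIO()
df.to_csv(buf_without_index, index=False)
buf_without_index.seek(0)
without_index = pd.read_csv(buf_without_index)

print(list(with_index.columns))     # contains an extra 'Unnamed: 0' column
print(list(without_index.columns))
```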
Example 1 : Read CSV file with header row

This is the basic syntax of the read_csv() function. You only need to pass the filename; by default, the first row is used as the header.
mydata  = pd.read_csv("workingfile.csv")
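After loading, it is usually worth checking what pandas actually read. The sketch below does that with shape, dtypes and head(). So that it runs on its own, it reads the same content as workingfile.csv from an in-memory string (csv_text is a stand-in, not part of the original example).

```python
import io
import pandas as pd

# Stand-in for workingfile.csv so the snippet is self-contained.
csv_text = """ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
13,Steve,Google,96
14,Stevart,RBS,71
15,John,.,78
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)    # (number of rows, number of columns)
print(df.dtypes)   # type pandas inferred for each column
print(df.head(3))  # first three rows
```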
Example 2 : Read CSV file without header row
mydata0  = pd.read_csv("workingfile.csv", header = None)
If you specify header=None, pandas does not treat the first row as a header. It labels the columns with integers from 0 to (number of columns - 1), and the original header row of workingfile.csv becomes the first row of data.
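header=None is most useful when the file really has no header row. A small sketch, assuming a two-line headerless file held in a string: the first call shows the default integer labels, and the second supplies column names through the names= parameter.

```python
import io
import pandas as pd

# A file with data only, no header row (made-up content for illustration).
csv_text = "11,David,Aon,74\n12,Jamie,TCS,76\n"

# Columns are labelled 0, 1, 2, 3.
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

# Same data, with column names supplied by the caller.
named = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=["ID", "first_name", "company", "salary"],
)

print(list(no_header.columns))
print(list(named.columns))
```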
Example 3 : Specify missing values

The na_values= option tells pandas which additional values to treat as missing. Here, the '.' in the company column is read as NaN.
mydata00  = pd.read_csv("workingfile.csv", na_values=['.'])
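A quick way to confirm the effect is to count missing values per column after loading. The sketch below reads the sample data from a string (a stand-in for workingfile.csv) and also shows the dictionary form of na_values, which limits the rule to a single column.

```python
import io
import pandas as pd

csv_text = """ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
13,Steve,Google,96
14,Stevart,RBS,71
15,John,.,78
"""

# Treat '.' as missing in every column.
df = pd.read_csv(io.StringIO(csv_text), na_values=["."])

# Treat '.' as missing only in the company column.
df_company_only = pd.read_csv(io.StringIO(csv_text), na_values={"company": ["."]})

# Number of missing values in each column.
print(df.isna().sum())
```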

Example 4 : Set Index Column to ID
mydata01  = pd.read_csv("workingfile.csv", index_col ='ID')
The ID column is now the index of the data frame rather than a regular column.
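Once ID is the index, rows can be looked up by ID with .loc. A minimal sketch, again reading the sample data from a string rather than from workingfile.csv:

```python
import io
import pandas as pd

csv_text = """ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
13,Steve,Google,96
14,Stevart,RBS,71
15,John,.,78
"""

df = pd.read_csv(io.StringIO(csv_text), index_col="ID")

# Look up a whole row, or a single value, by its ID.
row_13 = df.loc[13]
name_13 = df.loc[13, "first_name"]

print(row_13)
print(name_13)
```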

Example 5 : Read CSV File from URL

You can read a CSV file directly from a URL by passing the web address in place of a filename.
mydata02  = pd.read_csv("http://winterolympicsmedals.com/medals.csv")

Example 6 : Skip First 5 Rows While Importing CSV
mydata03  = pd.read_csv("http://winterolympicsmedals.com/medals.csv", skiprows=5)
skiprows=5 skips the first 5 lines of the file. Reading starts at the 6th line, which is used as the header row.
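The behaviour is easiest to see on a small file. The sketch below assumes a made-up file with five lines of notes above the real header. It also shows the list form of skiprows, which skips specific zero-based line numbers and so can drop a data row while keeping the header.

```python
import io
import pandas as pd

# Five lines of notes, then the real header and data (made-up content).
csv_text = """note 1
note 2
note 3
note 4
note 5
ID,first_name
11,David
12,Jamie
"""

# An integer skips that many lines from the top; line 6 becomes the header.
df = pd.read_csv(io.StringIO(csv_text), skiprows=5)

# A list skips specific zero-based line numbers; line 0 (the header) is kept.
clean_text = "ID,first_name\n11,David\n12,Jamie\n"
df_without_first_row = pd.read_csv(io.StringIO(clean_text), skiprows=[1])

print(df)
print(df_without_first_row)
```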

Example 7 : Skip Last 5 Rows While Importing CSV
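mydata04 = pd.read_csv("http://winterolympicsmedals.com/medals.csv", skipfooter=5, engine="python")
It excludes the last 5 rows. The parameter is named skipfooter (the older spelling skip_footer is no longer accepted by current pandas versions), and it is supported only by the Python parsing engine, so engine="python" is passed explicitly.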
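A typical use is dropping a summary line at the end of an export. The sketch below assumes a made-up file whose last line is a "Total rows" footer and removes it with skipfooter=1.

```python
import io
import pandas as pd

# The last line is a footer, not data (made-up content for illustration).
csv_text = "ID,first_name\n11,David\n12,Jamie\n13,Steve\nTotal rows,3\n"

# skipfooter needs the Python parsing engine.
df = pd.read_csv(io.StringIO(csv_text), skipfooter=1, engine="python")

print(df)
```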

Example 8 : Read only first 5 rows
mydata05  = pd.read_csv("http://winterolympicsmedals.com/medals.csv", nrows=5)
Example 9 : Interpreting "," as thousands separator
mydata06 = pd.read_csv("http://winterolympicsmedals.com/medals.csv", thousands=",")
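The thousands= option matters when numbers are stored as text such as "1,234,567". A minimal sketch, using a made-up two-row file with quoted numbers: without the option the column is read as text, and with it the column is read as integers.

```python
import io
import pandas as pd

# Numbers contain commas, so they are quoted in the file (made-up data).
csv_text = 'city,population\nSpringfield,"1,234,567"\nShelbyville,"45,000"\n'

# Without thousands=, the population column is read as text.
plain = pd.read_csv(io.StringIO(csv_text))

# With thousands=",", the commas are stripped and the column becomes numeric.
parsed = pd.read_csv(io.StringIO(csv_text), thousands=",")

print(plain["population"].tolist())
print(parsed["population"].tolist())
```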
Example 10 : Read only specific columns
mydata07 = pd.read_csv("http://winterolympicsmedals.com/medals.csv", usecols=(1,5,7))
The above code reads only the columns at positions 1, 5 and 7. Positions are zero-based, so these are the second, sixth and eighth columns of the file.
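usecols also accepts column names, which is often easier to read than positions. The sketch below shows both forms on the sample data, loaded from a string so that it runs on its own.

```python
import io
import pandas as pd

csv_text = """ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
13,Steve,Google,96
14,Stevart,RBS,71
15,John,.,78
"""

# By zero-based position: 0 is ID, 3 is salary.
by_position = pd.read_csv(io.StringIO(csv_text), usecols=[0, 3])

# By name.
by_name = pd.read_csv(io.StringIO(csv_text), usecols=["first_name", "salary"])

print(list(by_position.columns))
print(list(by_name.columns))
```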

Example 11 : Read some rows and columns
mydata08 = pd.read_csv("http://winterolympicsmedals.com/medals.csv", usecols=(1,5,7),nrows=5)
In the above command, we have combined the usecols= and nrows= options. It reads only the first 5 rows of the selected columns.
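The same combination can be tried on the sample data. This short sketch assumes the file content is held in a string and reads two named columns from the first two data rows.

```python
import io
import pandas as pd

csv_text = """ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
13,Steve,Google,96
14,Stevart,RBS,71
15,John,.,78
"""

# Two columns, first two data rows only.
subset = pd.read_csv(io.StringIO(csv_text), usecols=["ID", "salary"], nrows=2)

print(subset)
```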

About Author:

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has close to 7 years of experience in data science and predictive modeling. During his tenure, he has worked with global clients in various domains like retail and commercial banking, Telecom, HR and Automotive.
