Matplotlib Tutorial : Learn by Examples

Deepanshu Bhalla
This tutorial outlines how to perform plotting and data visualization in Python using the Matplotlib library. The objective of this post is to get you familiar with both the basic and the advanced plotting functions of the library. It contains several examples which will give you hands-on experience in generating plots in Python.

What is Matplotlib?

Matplotlib is a powerful Python library for creating charts and other graphics. It covers both basic and advanced plotting requirements in Python. Its pyplot module took inspiration from MATLAB and provides a similar MATLAB-like interface for graphics. The beauty of this library is that it integrates well with the pandas package, which is used for data manipulation. With the combination of these two libraries, you can easily perform data wrangling along with visualization and get valuable insights out of data. Matplotlib plays a role in Python similar to the one ggplot2 plays in R: it is one of the most widely used charting libraries in Python, although unlike ggplot2 it is not built around the grammar of graphics.

Basics of Matplotlib

The first step is to install and load the matplotlib library. It should already be installed if you used Anaconda to set up your Python environment.
Install library
If matplotlib is not already installed, you can install it by using the command
pip install matplotlib
Import / Load Library
We will import Matplotlib's pyplot module and use the alias (short form) plt.
from matplotlib import pyplot as plt
Elements of Graph
The different elements or parts of a standard graph are described below -
Figure
You can think of the figure as a big graph consisting of one or more sub-plots. In the graphics world, it is called the 'canvas'.
Axes
An axes is a single plot inside the figure, with its own x-axis and y-axis. You can call them 'sub-plots'.
Axis
It's the same thing (x-axis or y-axis) which you studied in school or college. A standard graph shows marks on each axis. In matplotlib, these marks are called ticks, and the text or value shown at each tick is called a ticklabel.
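To make the figure / axes / axis distinction concrete, here is a minimal sketch. The variable names and the tick labels are made up for illustration. It creates one figure with two axes, then sets the ticks and ticklabels on one x-axis.

```python
import matplotlib.pyplot as plt

# One Figure (the canvas) holding two Axes (sub-plots) side by side
fig, (left, right) = plt.subplots(1, 2)
left.plot([1, 2, 3], [2, 4, 6])
right.plot([1, 2, 3], [6, 4, 2])

# Each Axes has its own x-axis and y-axis (Axis objects)
x_axis = left.xaxis

# Ticks are the marks on an axis; ticklabels are the text shown at them
left.set_xticks([1, 2, 3])
tick_labels = left.set_xticklabels(["one", "two", "three"])

n_axes = len(fig.axes)                      # number of sub-plots on the figure
tick_positions = list(left.get_xticks())    # where the ticks sit
tick_texts = [label.get_text() for label in tick_labels]
print(n_axes, tick_positions, tick_texts)
```

Call plt.show() afterwards if you are not in a notebook with %matplotlib inline.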
Basic Plot
x = [1, 2, 3, 4, 5]
y = [5, 7, 3, 8, 4]
plt.bar(x,y)
plt.show()
If you are using Jupyter Notebook, you can run the command %matplotlib inline once to display plots automatically, without having to call plt.show() after each plot.

Functions used for different types of plots

The following table lists different types of graphs along with the pyplot functions that create them.
Type of Plot             Function
line plot (default)      plt.plot(  )
vertical bar plots       plt.bar(  )
horizontal bar plots     plt.barh(  )
histogram                plt.hist(  )
boxplot                  plt.boxplot(  )
area plots               plt.stackplot(  ) or plt.fill_between(  )
scatter plots            plt.scatter(  )
pie plots                plt.pie(  )
hexagonal bin plots      plt.hexbin(  )
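As a quick tour of a few of these functions, here is a small sketch with made-up numbers. Note that in current matplotlib a box plot is drawn with boxplot() and a stacked area plot with stackplot(); each pyplot function also exists as a method on an axes object, which is what the sketch uses.

```python
import matplotlib.pyplot as plt

values = [5, 7, 3, 8, 4, 6, 7, 5]   # illustrative data only

fig, axes = plt.subplots(2, 2, figsize=(8, 6))

axes[0, 0].hist(values, bins=4)                  # histogram
axes[0, 0].set_title("hist")

box = axes[0, 1].boxplot(values)                 # box plot (returns a dict of artists)
axes[0, 1].set_title("boxplot")

areas = axes[1, 0].stackplot([1, 2, 3, 4],       # stacked area plot of two series
                             [1, 2, 3, 4],
                             [2, 2, 1, 3])
axes[1, 0].set_title("stackplot")

axes[1, 1].scatter(range(len(values)), values)   # scatter plot
axes[1, 1].set_title("scatter")

fig.tight_layout()
```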

1. Bar Graph

A bar graph is used to compare different categories or groups. For example, you might compare cities in terms of average annual income. Let's start with a basic bar chart of students' math scores, reusing the x and y lists defined above.
plt.title("Simple Bar graph") # Name title of the graph
plt.xlabel('Students') # Assign the name of the x axis
plt.ylabel("Math Score") # Assign the name of the y axis
plt.bar(x, y, color='red') # Change bar color
plt.show()
You can style your graph using the following functions -
  • plt.title( ) for specifying title of your plot.
  • plt.xlabel( ) for labeling x-axis.
  • plt.ylabel( ) for labeling y-axis.
  • color = option in plt.bar( ) for defining color of bars.

How to show values or labels on top of bar

Showing values on a bar graph was not straightforward in older versions of matplotlib, as there was no built-in function for this task (matplotlib 3.4 added a bar_label() method for it). Here we write our own code to accomplish this task, which works in any version.
barplot = plt.bar(x, y)
for bar in barplot:
    yval = bar.get_height()
    # ha / va: horizontal / vertical alignment of the text relative to the (x, y) position
    plt.text(bar.get_x() + bar.get_width()/2.0, yval, int(yval), ha='center', va='bottom')
    
plt.title("Simple Bar graph")
plt.xlabel('Students')
plt.ylabel("Math Score")
How does this code work?
To show a label on top of each bar, we first need to find the position where the label goes. get_height() returns the height of each bar's rectangle, which is the bar's value on the y-axis. For the x position, get_x() returns the left edge of the bar and get_width() its width, so get_x() + get_width()/2 is the horizontal center of the bar. plt.text() is then used to place text on the graph. Syntax of plt.text( )
plt.text(x position where you want to show text, y position, text, other_options)
We used int(yval) instead of yval because the height may be returned as a floating-point number, which would otherwise be shown with decimals.
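If you are on matplotlib 3.4 or later, the same result can be sketched with the bar_label() method of the axes object instead of a manual loop. This assumes a recent matplotlib; on older versions the loop above is the way to go.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [5, 7, 3, 8, 4]

fig, ax = plt.subplots()
bars = ax.bar(x, y)

# bar_label() places one text label per bar; fmt controls the number format
labels = ax.bar_label(bars, fmt="%d")

ax.set_title("Simple Bar graph")
ax.set_xlabel("Students")
ax.set_ylabel("Math Score")

label_texts = [label.get_text() for label in labels]
print(label_texts)
```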

How to hide axis

We often hide the y-axis ticks and labels to give the bar graph a cleaner look. To do this in matplotlib, we can pass [ ], an empty list, to the plt.yticks( ) function so that no ticks are drawn.
plt.yticks([])
How to show string values in Bar Graph
This is straightforward if you are using matplotlib version 2.1 or later. To check the installed version of a Python package, you can run !pip show matplotlib in a Jupyter notebook. To upgrade it, run !pip install --upgrade matplotlib
x = ['A', 'B', 'C', 'D', 'E']
y = [5, 7, 3, 8, 4]
plt.bar(x, y)
If you are using a version prior to matplotlib 2.1, bar graphs do not accept string values on the x-axis, so we need a workaround. The solution is to draw the bars at the numeric positions range(len(x)), which are 0 to 4 here, and show the string values as tick labels at those positions. plt.xticks can be used for this task. It follows the syntax plt.xticks(location of ticks, labels, size='small')
plt.bar(range(len(x)), y)
plt.xticks(range(len(x)), x, size='small')

Horizontal Bar Chart

Nowadays many analysts prefer a horizontal bar chart to a column (vertical) bar chart for comparisons, as it can look cleaner and leaves more room for category labels. Both types of chart serve the same purpose. plt.barh(x,y) is used for generating a horizontal bar graph.
plt.barh(x,y)
plt.title("Simple Horizontal Bar graph")
plt.xlabel("Math Score")
plt.ylabel('Students')

How to sort or order bars

To arrange bars by value rather than alphabetically, we combine the two lists, sort the pairs by the value from list y, and then split them back. The zip( ) function combines the items of lists x and y into pairs, and sorted( ) sorts the pairs by y because y comes first in each pair. zip(*...) then splits the sorted pairs, and we store the results back in y and x.
y,x = zip(*sorted(zip(y,x)))
plt.barh(x,y)
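To see what the zip / sorted one-liner does step by step, here is a plain-Python sketch using the same x and y lists. The intermediate variable names are mine, not part of the original code.

```python
x = ['A', 'B', 'C', 'D', 'E']
y = [5, 7, 3, 8, 4]

# Step 1: pair each value with its label, value first so sorting uses the value
pairs = list(zip(y, x))

# Step 2: sort the pairs in ascending order of value
sorted_pairs = sorted(pairs)

# Step 3: split the sorted pairs back into two tuples
y_sorted, x_sorted = zip(*sorted_pairs)

print(sorted_pairs)   # [(3, 'C'), (4, 'E'), (5, 'A'), (7, 'B'), (8, 'D')]
print(x_sorted)       # ('C', 'E', 'A', 'B', 'D')
print(y_sorted)       # (3, 4, 5, 7, 8)
```

Passing reverse=True to sorted( ) would give descending order instead.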
Reverse order of bars
plt.gca() returns the current axes, and its invert_yaxis() method then reverses the order of the bars.
plt.barh(x,y)
ax = plt.gca()  # get the current axes
ax.invert_yaxis()

Use Professional Themes / Styles for Graphs

Many built-in themes (styles) are available through plt.style. Print the list of built-in themes which you can use to make your graph more polished.
print(plt.style.available)
['bmh', 'classic', 'dark_background', 'fivethirtyeight', 'ggplot', 'grayscale', 'seaborn-bright', 'seaborn-colorblind', 'seaborn-dark-palette', 'seaborn-dark', 'seaborn-darkgrid', 'seaborn-deep', 'seaborn-muted', 'seaborn-notebook', 'seaborn-paper', 'seaborn-pastel', 'seaborn-poster', 'seaborn-talk', 'seaborn-ticks', 'seaborn-white', 'seaborn-whitegrid', 'seaborn', '_classic_test']
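The list above comes from an older matplotlib release; the names depend on your version (for example, newer releases prefix the seaborn styles with seaborn-v0_8), so check plt.style.available on your installation. I'm going to use the fivethirtyeight theme.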
plt.style.use('fivethirtyeight')
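plt.style.use() changes the style for every plot that follows. If you only want a theme for a single graph, one option is the plt.style.context() context manager, sketched below with the ggplot style and made-up data.

```python
import matplotlib.pyplot as plt

# Remember one setting so we can confirm it is restored afterwards
grid_before = plt.rcParams["axes.grid"]

# The style applies only inside the with-block
with plt.style.context("ggplot"):
    fig, ax = plt.subplots()
    ax.bar([1, 2, 3, 4, 5], [5, 7, 3, 8, 4])
    ax.set_title("Styled with ggplot")

# Outside the block, the previous settings are back
grid_after = plt.rcParams["axes.grid"]
print(grid_before == grid_after)
```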

Line Graph

Line graphs are used to show how a value changes over time. For example, you may need to show the pass percentage of students of government schools in the last 5 years, or how sales have changed in the past five years. Let's create a pandas DataFrame for the sales example.
import pandas as pd

df = pd.DataFrame({"Year" : [2014,2015,2016,2017,2018], 
                  "Sales" : [2000, 3000, 4000, 3500, 6000]})
The plt.plot( ) function is used for line graphs. A line is the default graph type.
# plot line chart
plt.plot(df["Year"], df["Sales"])
plt.title("Simple Line Plot")
plt.xlabel('Year')
plt.ylabel('Sales')
Pandas can also make graphs by calling plot directly on the DataFrame. The plot type is set with the kind= option. Syntax of plot in pandas, followed by the values accepted by kind=
df.plot(x="Year", y="Sales", kind="line")
- 'line' for line plot (Default)
- 'bar'  for vertical bar plots
- 'barh' for horizontal bar plots
- 'hist' for histogram
- 'pie' for pie plots
- 'box' for boxplot
- 'kde' for density plots
- 'area' for area plots
- 'scatter' for scatter plots
- 'hexbin' for hexagonal bin plots
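Each kind= value also has a matching method on df.plot, so df.plot(kind="bar") and df.plot.bar() should draw the same chart. A small sketch using the sales data from above:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({"Year": [2014, 2015, 2016, 2017, 2018],
                   "Sales": [2000, 3000, 4000, 3500, 6000]})

# Two spellings of the same vertical bar chart
ax1 = df.plot(x="Year", y="Sales", kind="bar", legend=False)
ax2 = df.plot.bar(x="Year", y="Sales", legend=False)

# Each bar is a rectangle patch on the axes; compare their heights
heights1 = [patch.get_height() for patch in ax1.patches]
heights2 = [patch.get_height() for patch in ax2.patches]
print(heights1 == heights2)
```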

Add Marker in Line Plot

With the style= option, you can set the color, marker and line style of the line. The style 'r--' used here draws a red dashed line; a style such as 'ro--' would add circle markers to it.
ax = df.plot(x="Year", y="Sales", kind="line", title ="Simple Line Plot", legend=False, style = 'r--')
ax.set(ylabel='Sales', xlabel = 'Year', xticks =df["Year"])
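You can also use ro, ro--, r+ or rD-. as style values. The first letter is the color, followed by an optional marker (o for circles, + for plus signs, D for diamonds) and an optional line style (-- for a dashed line, -. for a dash-dotted line). You can use any color code ('g' for green, 'b' for blue, 'k' for black etc.)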

The .set method is used to add x and y axis labels, limits and ticks. It can also be written like the code below -

ax.set_ylabel('Sales')
ax.set_xlabel('Year')
ax.set_xticks(df["Year"])
The ax.set_ family of methods is equivalent to the plt. styling functions. See a comparison of some of them -
ax method          plt function
ax.set_ylabel()    plt.ylabel()
ax.set_xlabel()    plt.xlabel()
ax.set_xticks()    plt.xticks()
legend=False tells pandas to turn off the legend.
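To check that the two spellings really do the same thing, here is a small sketch that applies ax.set(...) to one axes and the individual set_ methods to another, then reads the settings back. The two-axes layout is just for the comparison.

```python
import matplotlib.pyplot as plt

years = [2014, 2015, 2016, 2017, 2018]
sales = [2000, 3000, 4000, 3500, 6000]

fig, (ax_a, ax_b) = plt.subplots(1, 2)
ax_a.plot(years, sales)
ax_b.plot(years, sales)

# Everything in one call
ax_a.set(xlabel="Year", ylabel="Sales", xticks=years)

# The same settings, one method at a time
ax_b.set_xlabel("Year")
ax_b.set_ylabel("Sales")
ax_b.set_xticks(years)

same_labels = (ax_a.get_xlabel(), ax_a.get_ylabel()) == (ax_b.get_xlabel(), ax_b.get_ylabel())
same_ticks = list(ax_a.get_xticks()) == list(ax_b.get_xticks())
print(same_labels, same_ticks)
```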

Add Multiple Lines in Line Graph Pandas Way

In the code below, we create a pandas DataFrame containing the sales of two products, A and B, along with the time period (Year). The idea is to compare how the sales of the two products performed over the last 5 years.
import pandas as pd

product = pd.DataFrame({"Year" : [2014,2015,2016,2017,2018], 
                  "ProdASales" : [2000, 3000, 4000, 3500, 6000],
                  "ProdBSales" : [3000, 4000, 3500, 3500, 5500]})


# Multi line plot
ax = product.plot("Year", "ProdASales", kind="line", label = 'Product A Sales')
product.plot("Year", "ProdBSales", ax= ax , kind="line", label = 'Product B Sales', title= 'MultiLine Plot') #ax : axes object

# Set axes (ticks taken from the product DataFrame being plotted)
ax.set(ylabel='Sales', xlabel = 'Year', xticks =product["Year"])
We added the second line by passing the axes object returned by the first plot call to the ax= option. Both lines are then drawn on the same axes of the same figure.
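As an alternative sketch, pandas also accepts a list of columns in y=, which draws both lines in a single call. The legend labels are then set afterwards with ax.legend().

```python
import pandas as pd
import matplotlib.pyplot as plt

product = pd.DataFrame({"Year": [2014, 2015, 2016, 2017, 2018],
                        "ProdASales": [2000, 3000, 4000, 3500, 6000],
                        "ProdBSales": [3000, 4000, 3500, 3500, 5500]})

# One call, two lines on the same axes
ax = product.plot(x="Year", y=["ProdASales", "ProdBSales"],
                  kind="line", title="MultiLine Plot")
ax.set(ylabel="Sales", xlabel="Year", xticks=product["Year"])

# Replace the default column-name labels in the legend
ax.legend(["Product A Sales", "Product B Sales"])

line_count = len(ax.lines)
legend_texts = [text.get_text() for text in ax.get_legend().get_texts()]
print(line_count, legend_texts)
```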

How to change interval on x axis?

Suppose you want to show the years 2014 to 2018 on the x-axis in increments of one year. By default, matplotlib may place ticks at increments of 0.5 instead of 1, which makes no sense for years, so that is an issue you should fix.
# How to show years with same interval as it is defined in column
ax = df.plot(x="Year", y="Sales", kind="line", title ="Simple Line Plot", legend=False)
ax.set(ylabel='Sales', xlabel = 'Year', xticks =df["Year"])
xticks= sets the positions of the ticks on the x-axis, so only the values you pass are shown.
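If you do not want to list the ticks yourself, another option (not used in this post, shown here only as a sketch) is matplotlib's MaxNLocator with integer=True, which asks matplotlib to choose whole-number tick positions.

```python
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator

df = pd.DataFrame({"Year": [2014, 2015, 2016, 2017, 2018],
                   "Sales": [2000, 3000, 4000, 3500, 6000]})

ax = df.plot(x="Year", y="Sales", kind="line", legend=False)

# Let matplotlib pick the ticks, but only at integer positions
ax.xaxis.set_major_locator(MaxNLocator(integer=True))

ticks = ax.get_xticks()
all_integer = all(float(t).is_integer() for t in ticks)
print(all_integer)
```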

Scatter Plot

A scatter plot is mainly used to show the relationship between two continuous variables, for example height and weight. Like a line graph, it can also be used to show a trend over time. Each observation is plotted as a point on the x and y axes.

kind = 'scatter' is used for creating scatter diagram.

ax = product.plot("Year", "ProdASales", kind='scatter', color = 'red', title = 'Year by ProductA Sales')
ax.set(ylabel='ProductA Sales', xlabel = 'Year', xticks =product["Year"])
plt.show()
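The same kind of chart can be sketched with matplotlib alone using scatter(). Here the sales of the two products from the earlier example are plotted against each other to look at the relationship between them.

```python
import matplotlib.pyplot as plt

prod_a_sales = [2000, 3000, 4000, 3500, 6000]
prod_b_sales = [3000, 4000, 3500, 3500, 5500]

fig, ax = plt.subplots()

# One point per year: x is Product A sales, y is Product B sales
points = ax.scatter(prod_a_sales, prod_b_sales, color="red")
ax.set(xlabel="ProductA Sales", ylabel="ProductB Sales",
       title="ProductA vs ProductB Sales")

n_points = len(points.get_offsets())
print(n_points)
```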

Pie Chart

A pie chart is a circular graph which splits data into slices to show the numerical proportion of each category. If you are showing percentages, they should add up to 100%.
share = [20, 12, 11, 4, 3]
companies = ['Google', 'Facebook', 'Apple', 'Microsoft', 'IBM', ]
comp = pd.DataFrame({"share" : share, "companies" : companies})
ax = comp.plot(y="share", kind="pie", labels = comp["companies"], autopct = '%1.0f%%', legend=False, title='Market Share')

# Hide y-axis label
ax.set(ylabel='')

Customize Pie Chart

The default startangle is 0. Setting startangle = 90 rotates everything counter-clockwise by 90 degrees. With the explode = option, you can pull specific slices away from the center. In the following program, we explode the first three categories.
# Customize Pie Chart
ax = comp.plot(y="share", kind="pie", labels = comp["companies"], startangle = 90, shadow = True, 
        explode = (0.1, 0.1, 0.1, 0, 0), autopct = '%1.0f%%', legend=False, title='Market Share')
ax.set(ylabel='')
plt.show()
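For reference, here is a sketch of the same pie chart written with matplotlib's pie() directly. The share values in this example add up to 50 rather than 100; pie() scales the values by their total, so the percentages printed by autopct are each value's share of that total.

```python
import matplotlib.pyplot as plt

share = [20, 12, 11, 4, 3]
companies = ['Google', 'Facebook', 'Apple', 'Microsoft', 'IBM']

fig, ax = plt.subplots()

# With autopct set, pie() returns the wedges, the labels and the percentage texts
wedges, texts, autotexts = ax.pie(share, labels=companies, autopct='%1.0f%%',
                                  startangle=90, explode=(0.1, 0.1, 0.1, 0, 0))
ax.set_title('Market Share')

percent_labels = [t.get_text() for t in autotexts]
print(percent_labels)
```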

Histogram

A histogram is used to show the frequency distribution of a continuous variable. Suppose you want to see the distribution of marks obtained by students.
# Creating random data
import numpy as np
np.random.seed(1)
mydf = pd.DataFrame({"Marks" : np.random.randint(low=20, high=100, size=50)})

# Histogram
ax = mydf.plot(bins= 5, kind="hist", rwidth = 0.7, title = 'Distribution - Marks', legend=False)
ax.set(xlabel="Bins")
plt.show()
bins is the number of intervals (bars) you want to show on the x-axis. rwidth is the relative width of the bars as a fraction of the bin width.
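To see the numbers behind the bars, you can compute the bins yourself with numpy's histogram function. The sketch below uses the same seed and random data as above; with bins=5 it returns 5 counts and 6 bin edges.

```python
import numpy as np

np.random.seed(1)
marks = np.random.randint(low=20, high=100, size=50)

# counts[i] is the number of values between edges[i] and edges[i + 1]
counts, edges = np.histogram(marks, bins=5)

for i, count in enumerate(counts):
    print(f"{edges[i]:.1f} to {edges[i + 1]:.1f}: {count}")
```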

How to add multiple sub-plots

With matplotlib, we can place multiple sub-plots in the same figure. Matplotlib provides two ways to do this - plt.subplots( ) and plt.figure() with add_subplot(). The logic is similar in both - we have a figure and we add multiple axes (sub-plots) to it.

I created a dummy DataFrame for illustration. In this example, we have data for cities with cost of living scores (fake data!) for the years 2017 and 2018.

labels = ['Delhi', 'Mumbai', 'Bangalore', 'Chennai']
x1 = [45, 30, 15, 10]
x2 = [25, 20, 25, 50]

finaldf = pd.DataFrame({"2017_Score":x1, "2018_Score" : x2, "cities" : labels})
.add_subplot() follows the syntax rule -
add_subplot(number of rows, number of columns, plot number)
.add_subplot(121) means 1 row, 2 columns and 1st plot
.add_subplot(122) means 1 row, 2 columns and 2nd plot

.subplots() takes only the number of rows and columns and creates all the sub-plots at once, so .subplots(1, 2) means 1 row and 2 columns
Both of the following methods return the same result. They generate two sub-plots and place them next to each other (horizontally).
Method I
fig = plt.figure()

ax1 = fig.add_subplot(121)
ax = finaldf.plot(x="cities",  y="2017_Score", ax=ax1, kind="barh", legend = False, title = "2017 Score")
ax.invert_yaxis()

ax2 = fig.add_subplot(122)
ax = finaldf.plot(x="cities",  y="2018_Score", ax=ax2, kind="barh", legend = False, title = "2018 Score")
ax.invert_yaxis()
ax.set(ylabel='')
Method II
fig, (ax0, ax01) = plt.subplots(1, 2)

ax = finaldf.plot(x="cities",  y="2017_Score", ax=ax0, kind="barh", legend = False, title = "2017 Score")
ax.invert_yaxis()

ax = finaldf.plot(x="cities",  y="2018_Score", ax=ax01, kind="barh", legend = False, title = "2018 Score")
ax.invert_yaxis()
ax.set(ylabel='')
Detailed Explanation
The 3 lines of code below return 2 blank (empty) sub-plots. 121 refers to the left-hand plot and 122 refers to the right-hand plot. By passing these axes to the ax= option, we control which sub-plot each graph is drawn in.
fig = plt.figure()
fig.add_subplot(121)
fig.add_subplot(122)
How to show sub-plots vertically
In this section, we will demonstrate how to put one sub-plot on top of another. Here we are using 211 and 212 in the add_subplot( ) function. 211 means 2 rows, 1 column and 1st plot; 212 means 2 rows, 1 column and 2nd plot.
fig = plt.figure()
ax1 = fig.add_subplot(211)
ax = finaldf.plot(x="cities",  y="2017_Score", ax=ax1, kind="barh", legend = False, title = "2017 vs 2018 Score")
ax.invert_yaxis()
plt.xticks(range(0,60,10))
ax.set(ylabel='')

ax2 = fig.add_subplot(212)
ax = finaldf.plot(x="cities",  y="2018_Score", ax=ax2, kind="barh", legend = False)
ax.invert_yaxis()
ax.set(ylabel='')
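The same vertical layout can be sketched with plt.subplots(2, 1). The sharex=True option is my addition here: it makes both sub-plots use the same x-axis scale, which makes the 2017 and 2018 bars directly comparable.

```python
import pandas as pd
import matplotlib.pyplot as plt

finaldf = pd.DataFrame({"2017_Score": [45, 30, 15, 10],
                        "2018_Score": [25, 20, 25, 50],
                        "cities": ['Delhi', 'Mumbai', 'Bangalore', 'Chennai']})

# 2 rows, 1 column; both sub-plots share one x-axis scale
fig, (top, bottom) = plt.subplots(2, 1, sharex=True)

finaldf.plot(x="cities", y="2017_Score", ax=top, kind="barh",
             legend=False, title="2017 vs 2018 Score")
finaldf.plot(x="cities", y="2018_Score", ax=bottom, kind="barh", legend=False)

for ax in (top, bottom):
    ax.invert_yaxis()    # first city at the top
    ax.set(ylabel='')    # hide the y-axis label

fig.tight_layout()
same_xlim = top.get_xlim() == bottom.get_xlim()
print(same_xlim)
```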

Useful Tips

How to save plot
# save plot
x = ['A','B','C', 'D']
y = [1,2,3,4]
fig = plt.figure()
plt.bar(x, y)
fig.savefig('C:/My Files/Blog/firstimg.png')
plt.close(fig)
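savefig() also accepts options that control the output. The sketch below uses dpi= for resolution and bbox_inches="tight" to trim extra whitespace, and it writes to a temporary folder (my choice, so the example runs on any machine) instead of a fixed path.

```python
import os
import tempfile
import matplotlib.pyplot as plt

x = ['A', 'B', 'C', 'D']
y = [1, 2, 3, 4]

fig, ax = plt.subplots()
ax.bar(x, y)

# Build an output path inside a temporary folder
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "firstimg.png")

# dpi sets the resolution; bbox_inches="tight" trims surrounding whitespace
fig.savefig(out_path, dpi=150, bbox_inches="tight")
plt.close(fig)

saved = os.path.exists(out_path)
print(out_path, saved)
```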
How to set different limits in axis
Suppose you have a bar graph and you want to set the limits of the x and y axes manually. plt.gca() returns the current axes. In set_ylim( ), you pass the smallest and largest values of the y-axis. Similarly, set_xlim( ) sets the limits of the x-axis.
x = ['A','B','C', 'D']
y = [100,119,800,900]
plt.bar(x, y)
ax = plt.gca()  # get the current axes
ax.set_ylim(0,1000)
plt.show()
How to show multiple entries in legend
plt.legend(["First","Second"])
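Passing a list to plt.legend() relies on the order in which the lines were drawn. A sketch of an alternative is to give each line its own label= and then call legend() without a list. The sales numbers are reused from the line graph section.

```python
import matplotlib.pyplot as plt

years = [2014, 2015, 2016, 2017, 2018]
first = [2000, 3000, 4000, 3500, 6000]
second = [3000, 4000, 3500, 3500, 5500]

fig, ax = plt.subplots()

# Attach the legend entry to each line when it is drawn
ax.plot(years, first, label="First")
ax.plot(years, second, label="Second")

# legend() collects the labels; loc controls where the box is placed
legend = ax.legend(loc="upper left")

legend_texts = [text.get_text() for text in legend.get_texts()]
print(legend_texts)
```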
How to change font size, weight and color in graph
plt.bar(x, y)
plt.title("Cost of Living", fontsize=18, fontweight='bold', color='blue')
plt.xlabel("Cities", fontsize=16)
plt.ylabel("Score", fontsize=16)
Exercise
Get your hands dirty with this exercise. Prepare the graph shown below using dataframe finaldf. Give attention to each part of the graph and try to replicate it. Post your solution in the comment box below.
EndNote
I hope this article gave you a fair idea of the matplotlib library and how to generate customized plots in Python. Your next step should be practice, practice and practice with some sample or publicly available datasets (available on the UCI Machine Learning Repository and Kaggle websites).
About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.

16 Responses to "Matplotlib Tutorial : Learn by Examples"
  1. Excellent !! Basics explained clearly.

  2. Excellent. Clearly explained. I use jupyter. Will help me for sure.

    Replies
    1. Cool. Let me know if you face any issue(s) in doing it in Jupyter!

  3. Good one. Very useful. Thank you for posting

  4. Correct me if there is a simpler way.
    Thank you.

    ax=finaldf.plot(x='cities',kind='barh')
    plt.xlabel('score',color='black')
    plt.ylabel('cities',color='black')
    ax.invert_yaxis()
    legend = ax.legend([2017,2018],frameon=True)

    plt.title('Cost of Living Index',fontweight='bold')
    plt.show()

  5. Useful, I learned a lot.
    ax = finaldf.plot.barh(x='Cities', rot=0)
    plt.title("Cost of Living Index", fontsize=18, fontweight='bold')
    plt.xlabel('Score', fontsize=16)
    plt.ylabel("Cities", fontsize=16)
    ax.set_xlim(0,55)
    ax.invert_yaxis()

    Replies
    1. Awesome. You didn't change legend. Just saying - It seems you changed the name of column "cities" (as it is in small case in finaldf. In your code, it starts with capital 'C'). It does not matter at all (you can call it by any column name you wish. your code is perfect!)

  6. Thank you for stopping by my blog. Cheers!

  7. This tutorial is awesome!
    Thank you!

  8. Wow...
    This is awesome 🙌. You just gave me another reason to learn Python cos I was never a Python fan; I prefer R to Python.
    I'm wondering if you have written about the seaborn library.

  9. This is very good, we appreciate you bro!
