This document discusses data visualization and Matplotlib. It begins with an introduction to data visualization and its importance. It then covers basic visualization rules like labeling axes and adding titles. It discusses what Matplotlib is and how to install it. It provides examples of common plot types in Matplotlib like sine waves, scatter plots, bar charts, and pie charts. It also discusses working with data science and Pandas, including how to create Pandas Series and DataFrames from various data sources.
Data Visualization with Matplotlib in Python
1. Chapter – 4
Data Science with Python
Intro to Data Visualization with Matplotlib
Prof. Maulik Borsaniya
Maulik Borsaniya - Gardividyapith
2. Data Visualization
Data visualization is a very important part of data
analysis. You can use it to explore your data. If you
understand your data well, you’ll have a better chance
to find some insights. Finally, when you find any
insights, you can use visualizations again to be able to
share your findings with other people.
However, the idea here is to learn the fundamentals of
Data Visualization and Matplotlib. So, our plots will be
much simpler than that example.
3. Basic Visualization Rules
Before we look at some kinds of plots, we’ll introduce
some basic rules. Those rules help us make nice and
informative plots instead of confusing ones.
Steps
i. The first step is to choose the appropriate plot type. If there
are several options, we can compare them and choose the one
that fits our data the best.
ii. Second, once we have chosen the type of plot, one of the most
important things is to label the axes. If we don't do this,
the plot is not informative enough.
iii. Third, we can add a title to make our plot more informative.
4. Steps (continued)
iv. Fourth, add labels for different categories when needed.
v. Fifth, optionally we can add text or an arrow
at interesting data points.
vi. Sixth, in some cases we can use the sizes and colors of
the data points to make the plot more informative.
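The rules above can be sketched in one small plot. This is only an illustration: the sales figures, product names and output file name are invented for the example. The Agg backend is selected so the sketch runs without a display; remove that line and call plt.show() to get an interactive window.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove for interactive use
import matplotlib.pyplot as plt

# Hypothetical data, invented for illustration only
months = [1, 2, 3, 4, 5, 6]
sales_a = [10, 12, 15, 14, 18, 21]
sales_b = [8, 9, 11, 15, 16, 17]

fig, ax = plt.subplots()

# Rule 1: a line plot suits values that change over time
# Rule 6: different colors separate the two categories
ax.plot(months, sales_a, color="tab:blue", label="Product A")
ax.plot(months, sales_b, color="tab:orange", label="Product B")

# Rule 2: label the axes
ax.set_xlabel("Month")
ax.set_ylabel("Units sold")

# Rule 3: add a title
ax.set_title("Monthly sales")

# Rule 4: add labels for the categories (a legend)
ax.legend()

# Rule 5: point an arrow at an interesting data point
peak = max(sales_a)
peak_month = months[sales_a.index(peak)]
ax.annotate("highest value", xy=(peak_month, peak), xytext=(3.5, 20),
            arrowprops=dict(arrowstyle="->"))

fig.savefig("rules_demo.png")  # hypothetical output file name
```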
5. What is Matplotlib?
Matplotlib is a Python library used to create 2D graphs and plots from
Python scripts. It has a module named pyplot which makes plotting
easy by providing features to control line styles, font properties,
axis formatting, etc. It supports a very wide variety of graphs and
plots, namely histograms, bar charts, power spectra, error charts, etc.
It is used along with NumPy to provide an environment that is an effective
open-source alternative to MATLAB.
Pyplot is a Matplotlib module which provides a MATLAB-like interface.
Matplotlib is designed to be as usable as MATLAB, with the ability to use
Python, and the advantage of being free and open-source.
6. How to install Matplotlib?
First of all, you need to download Python from python.org. Use the
latest version.
To install on Windows, type the following commands in CMD.
python -m pip install -U pip
python -m pip install -U matplotlib
For Ubuntu.
sudo apt-get install python3-matplotlib
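After installing, a quick check like the following can confirm that Python finds the library. It is a minimal sketch that only prints the installed versions; it assumes NumPy was installed alongside Matplotlib, which pip normally does because Matplotlib depends on it.

```python
import matplotlib
import numpy

# If either import fails, the installation step above did not succeed
print("matplotlib", matplotlib.__version__)
print("numpy", numpy.__version__)
```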
7. Simple Example of Plotting (Sine Waveform)
import numpy as np
import matplotlib.pyplot as plt
# Compute the x and y coordinates for points on a sine curve
x = np.arange(0, 3 * np.pi, 0.1)
y = np.sin(x)
plt.title("sine wave form")
# Plot the points using matplotlib
plt.plot(x, y)
plt.show()
8. numpy.arange(start, stop, step, dtype)
The function takes the following parameters:
Sr.No.  Parameter  Description
1       start      The start of the interval. If omitted, defaults to 0.
2       stop       The end of the interval (not including this number).
3       step       Spacing between values. The default is 1.
4       dtype      Data type of the resulting ndarray. If not given, it is inferred from the other arguments.
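A few calls may make these parameters concrete. This is a small sketch; the variable names are chosen only for the example.

```python
import numpy as np

a = np.arange(5)               # only stop given: start defaults to 0, step to 1
b = np.arange(2, 10, 2)        # the stop value (10) is not included
c = np.arange(0, 1, 0.25)      # a float step produces a float array
d = np.arange(3, dtype=float)  # dtype sets the element type explicitly

print(a)  # [0 1 2 3 4]
print(b)  # [2 4 6 8]
print(c)  # [0.   0.25 0.5  0.75]
print(d)  # [0. 1. 2.]

# The x values used for the sine wave: 0.0, 0.1, ... up to but not including 3*pi
x = np.arange(0, 3 * np.pi, 0.1)
print(len(x))  # 95
```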
9. Scatter Plot
This type of plot shows all individual data points. Here, they aren't
connected with lines. Each data point has one value on the x-axis
and one value on the y-axis. This type of plot can be
used to display trends or correlations.
In data science, it shows how 2 variables compare.
To make a scatter plot with Matplotlib, we can use
the plt.scatter() function. Again, the first argument is used for the
data on the horizontal axis, and the second for the vertical axis.
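A minimal scatter plot might look like this. The study-hours data and the output file name are invented for the example; the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove for interactive use
import matplotlib.pyplot as plt

# Hypothetical data, invented for illustration only
hours_studied = [1, 2, 3, 4, 5, 6, 7, 8]
exam_score = [52, 55, 61, 64, 70, 74, 79, 85]

plt.figure()
# First argument: horizontal axis, second argument: vertical axis
points = plt.scatter(hours_studied, exam_score)
plt.xlabel("Hours studied")
plt.ylabel("Exam score")
plt.title("Exam score vs. hours studied")
plt.savefig("scatter_demo.png")  # hypothetical output file name
```

With data like this, the points rise from left to right, which is the kind of trend a scatter plot is meant to reveal.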
11. Bar chart
A bar chart represents categorical data with rectangular bars. Each
bar has a height that corresponds to the value it represents. It's
useful when we want to compare a given numeric value across
different categories. It can also be used with 2 data series.
To make a bar chart with Matplotlib, we'll need
the plt.bar() function.
12. Bar Chart Example
import matplotlib.pyplot as plt
# Our data
labels = ["JavaScript", "Java", "Python", "C#"]
usage = [69.8, 45.3, 38.8, 34.4]
# Generating the x positions of the bars. Later, we'll replace them with the labels.
x_positions = range(len(labels))
# Creating our bar plot
plt.bar(x_positions, usage)
# Showing the language names instead of the numeric positions
plt.xticks(x_positions, labels)
plt.ylabel("Usage (%)")
plt.title("Programming language usage")
plt.show()
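The remark that a bar chart can also hold 2 data series can be sketched by drawing two sets of bars side by side. The first series reuses the usage values above; the second series, the series names and the output file name are invented for the example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np

labels = ["JavaScript", "Java", "Python", "C#"]
usage_a = [69.8, 45.3, 38.8, 34.4]  # values from the example above
usage_b = [67.0, 41.0, 44.0, 31.0]  # hypothetical second series

x = np.arange(len(labels))  # one position per category
width = 0.4                 # width of each bar

plt.figure()
# Shift each series half a bar width so the bars sit next to each other
bars_a = plt.bar(x - width / 2, usage_a, width=width, label="Series A")
bars_b = plt.bar(x + width / 2, usage_b, width=width, label="Series B (hypothetical)")
plt.xticks(x, labels)
plt.ylabel("Usage (%)")
plt.title("Programming language usage, two series")
plt.legend()
plt.savefig("grouped_bar_demo.png")  # hypothetical output file name
```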
13. Pie chart
A pie chart is a circular plot, divided into slices to show numerical
proportion. Pie charts are widely used in the business world.
However, many experts recommend avoiding them. The main
reason is that it's difficult to compare the sections of a given pie
chart. Also, it's difficult to compare data across multiple pie
charts.
In many cases, they can be replaced by a bar chart.
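To illustrate the comparison problem, the sketch below draws the same numbers once as a pie chart and once as a bar chart. The shares, team names and output file name are invented for the example and were chosen to be close together.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove for interactive use
import matplotlib.pyplot as plt

# Hypothetical shares, deliberately close to each other
labels = ["Team A", "Team B", "Team C", "Team D"]
shares = [23, 26, 24, 27]

fig, (ax_pie, ax_bar) = plt.subplots(1, 2, figsize=(9, 4))

# Left: slices of similar size are hard to rank by eye
ax_pie.pie(shares, labels=labels)
ax_pie.set_title("Pie chart")

# Right: bar heights on a common baseline are easier to compare
ax_bar.bar(labels, shares)
ax_bar.set_ylabel("Share (%)")
ax_bar.set_title("Bar chart")

fig.savefig("pie_vs_bar_demo.png")  # hypothetical output file name
```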
14. Pie Chart Example
import matplotlib.pyplot as plt
sizes = [25, 20, 45, 10]
labels = ["Cats", "Dogs", "Tigers", "Goats"]
# autopct prints each slice's percentage as a float with two decimals
plt.pie(sizes, labels=labels, autopct="%.2f%%")
# Equal aspect ratio so the pie is drawn as a circle
plt.axis("equal")
plt.show()
15. Working With Data Science And Pandas
Pandas is an open-source Python library used for high-performance
data manipulation and data analysis using its powerful data
structures. Python with Pandas is in use in a variety of academic
and commercial domains, including finance, economics, statistics,
advertising, web analytics, and more.
Using Pandas, we can accomplish five typical steps in the
processing and analysis of data, regardless of the origin of the data:
load, organize, manipulate, model, and analyze.
Below are some of the important features of Pandas that are
used specifically for data processing and data analysis work.
16. If you want to work with data and sheets, you need
to install Pandas first.
Installation steps
In Windows
-> CMD -> Go to the directory where Python is installed.
Type the following command there and wait for it to finish.
C:\> python -m pip install pandas
For Ubuntu
-> Terminal
Type the following command.
python3 -m pip install pandas
17. Pandas handles data mainly through two structures, Series and DataFrame
(the older Panel structure was removed in pandas 0.25). We will see
some examples of each of these.
Pandas Series
A Series is a one-dimensional labeled array capable of holding data of any type
(integer, string, float, Python objects, etc.). The axis labels are collectively called
the index. A pandas Series can be created using the following constructor
Syntax : pandas.Series(data, index, dtype, copy)
# import the pandas library and alias it as pd
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
# no index given, so pandas labels the values 0, 1, 2, 3
s = pd.Series(data)
print(s)
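The index and dtype arguments of the constructor can be sketched as follows. The values and labels are invented for the example.

```python
import pandas as pd

# Custom labels instead of the default 0, 1, 2 and an explicit element type
s = pd.Series([10, 20, 30], index=["a", "b", "c"], dtype=float)
print(s["b"])            # look a value up by its label: 20.0
print(s.index.tolist())  # ['a', 'b', 'c']

# A dictionary also works: its keys become the index
t = pd.Series({"x": 1, "y": 2})
print(t["y"])            # 2
```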
18. Pandas Data Frame
A Data frame is a two-dimensional data structure, i.e., data is
aligned in a tabular fashion in rows and columns. A pandas Data
Frame can be created using the following constructor
Syntax : pandas.DataFrame( data, index, columns, dtype, copy)
Eg.
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data, index=['rank1','rank2','rank3','rank4'])
print(df)
19. What is Data Frame ?
A Data frame is a two-dimensional data structure, i.e.,
data is aligned in a tabular fashion in rows and columns.
Features of Data Frame
Columns can potentially be of different types
Size is mutable
Labeled axes (rows and columns)
Can perform arithmetic operations on rows and columns
Structure
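The structure and the features listed above can be tried out on a small Data Frame. This is a sketch only: the Salary values and the AgeNextYear column are invented for the example.

```python
import pandas as pd

# Labeled axes: row labels come from index, column labels from the dictionary keys
df = pd.DataFrame(
    {"Name": ["Tom", "Jack", "Steve"], "Age": [28, 34, 29]},
    index=["rank1", "rank2", "rank3"],
)

# Columns of different types: Name holds strings, Age holds integers
print(df.dtypes)

# Size is mutable: a new column can be added after creation
df["Salary"] = [3000.0, 4200.0, 3900.0]  # hypothetical values

# Arithmetic on a whole column at once
df["AgeNextYear"] = df["Age"] + 1

# Access one cell by row label and column label
print(df.loc["rank2", "Age"])  # 34
print(df.shape)                # (3, 4)
```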
21. Data frame from list
import pandas as pd
data = [1,2,3,4,5]
df = pd.DataFrame(data)
print(df)
E.g.2
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
print(df)
E.g.3
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
# Convert only the Age column to float; the names cannot be converted
df = df.astype({'Age': float})
print(df)
22. Creating Data Frame from Dictionary
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data)
print(df)
23. Reading Data From CSV / Excel
import pandas as pd
data = pd.read_csv('C:Python34/sheet1.csv')
print (data)
Reading Specific Rows – E.g.2
import pandas as pd
data = pd.read_csv('C:Python34/sheet1.csv')
# Slice the first 5 rows, then select the 'salary' column
print(data[0:5]['salary'])
# For Excel files you can use pd.read_excel() in the same way
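To try the same steps without a file on disk, the CSV text can be supplied from memory. This is a sketch under stated assumptions: the column names and values are invented and do not come from sheet1.csv.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real file
csv_text = """name,salary
Alex,1000
Bob,1200
Clarke,1300
Dan,900
Eve,1500
Fay,1100
"""

# read_csv accepts a file path or any file-like object
data = pd.read_csv(io.StringIO(csv_text))
print(data.shape)  # (6, 2)

# Slice the first 5 rows, then select the 'salary' column
first_five = data[0:5]["salary"]
print(first_five.tolist())  # [1000, 1200, 1300, 900, 1500]
```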