Presenter:
Date:
TOPIC: AI and DS
Private and Confidential www.futureconnect.net 1
AGENDA
Unit name: 1. DATA SCIENCE
Topics:
1. DATA SCIENCE LIBRARIES
2. NUMPY
3. PANDAS
4. MATPLOTLIB
5. DATA EXPLORATION
Hours count: 2
Session: 2
OBJECTIVES
• Gain knowledge of data science libraries
• Understand data manipulation packages for data science
• See a demo of data exploration using these packages
Data Mining
Scrapy
• One of the most popular Python data science libraries, Scrapy helps to build crawling programs
(spider bots) that can retrieve structured data from the web – for example, URLs or contact info.
• It's a great tool for scraping data used in, for example, Python machine learning models.
• Developers use it for gathering data from APIs.
BeautifulSoup
• BeautifulSoup is another really popular library for parsing web pages and scraping data.
• If you want to collect data that’s available on some website but not via a proper CSV or API,
BeautifulSoup can help you scrape it and arrange it into the format you need.
Data Processing and Modeling
NumPy
• NumPy (Numerical Python) is a core tool for scientific computing and for performing basic and
advanced array operations.
• The library offers many handy features for performing operations on n-dimensional arrays and
matrices in Python.
SciPy
• This useful library includes modules for linear algebra, integration, optimization, and statistics.
• Its main functionality was built upon NumPy, so its arrays make use of this library.
• SciPy works great for all kinds of scientific programming projects (science, mathematics, and
engineering).
Pandas
• Pandas is a library created to help developers work with "labeled" and "relational" data intuitively.
• It's based on two main data structures: "Series" (one-dimensional, like a list of items) and
"DataFrame" (two-dimensional, like a table with multiple columns).
Keras
• Keras is a high-level library for building and training neural networks.
• It's very straightforward to use and provides developers with a good degree of extensibility. The
library runs on top of other packages (Theano or TensorFlow) that serve as its backends.
SciKit-Learn
• This is an industry standard for data science projects based in Python.
• Scikits are a group of packages in the SciPy Stack that were created for specific functionalities,
for example, image processing. Scikit-learn uses the math operations of SciPy to expose a
concise interface to the most common machine learning algorithms.
PyTorch
• PyTorch is a framework well suited to data scientists who want to perform deep learning tasks
easily.
• The tool allows performing tensor computations with GPU acceleration. It's also used for other
tasks – for example, for creating dynamic computational graphs and calculating gradients
automatically.
TensorFlow
• TensorFlow is a popular Python framework for machine learning and deep learning, which was
developed at Google Brain.
• It is widely used for tasks like object identification, speech recognition, and many others.
• It helps in working with artificial neural networks that need to handle multiple data sets.
XGBoost
• This library is used to implement machine learning algorithms under the Gradient Boosting
framework.
• XGBoost is portable, flexible, and efficient.
• It offers parallel tree boosting that helps teams to resolve many data science problems. Another
advantage is that developers can run the same code on major distributed environments such as
Hadoop, SGE, and MPI.
Data Visualization
Matplotlib
• This is a standard data science library that helps to generate data visualizations such as
two-dimensional diagrams and graphs (histograms, scatterplots, non-Cartesian coordinate graphs).
• Matplotlib is one of those plotting libraries that are really useful in data science projects:
it provides an object-oriented API for embedding plots into applications.
• Developers need to write more code than usual while using this library for generating advanced
visualizations.
Seaborn
• Seaborn is based on Matplotlib and serves as a useful tool for statistical visualization:
heatmaps and other types of plots that summarize data and depict the overall distributions.
• When using this library, you get to benefit from an extensive gallery of visualizations (including
complex ones like time series, joint plots, and violin plots).
Bokeh
• This library is a great tool for creating interactive and scalable visualizations inside browsers using
JavaScript widgets. Bokeh is fully independent of Matplotlib.
• It focuses on interactivity and presents visualizations through modern browsers, similarly to
Data-Driven Documents (d3.js). It offers a set of graphs, interaction abilities (like linking plots or adding
JavaScript widgets), and styling.
Plotly
• This is a web-based tool for data visualization that offers many useful out-of-the-box graphics; you can
find them on the Plot.ly website.
• The library works very well in interactive web applications.
pydot
• This library helps to generate directed and undirected graphs.
• Written in pure Python, it serves as an interface to Graphviz. The graphs created come in handy
when you're developing algorithms based on neural networks and decision trees.
Python Libraries for Data Science
• Pandas: Used for structured data operations
• NumPy: Creating Arrays
• Matplotlib: Data Visualization
• Scikit-learn: Machine Learning Operations
• SciPy: Perform Scientific operations
• TensorFlow: Symbolic math library
• BeautifulSoup: Parsing HTML and XML pages
Three of these Python libraries (NumPy, Pandas, and Matplotlib) are covered in the following slides.
NumPy
• NumPy = Numerical Python
• Created in 2005 by Travis Oliphant.
• Consists of array objects and routines for array processing.
• NumPy arrays are faster than traditional Python lists because their elements are stored in one
contiguous block of memory.
• The array object in NumPy is called ndarray.
Top four benefits that NumPy can bring to your code:
1. More speed: NumPy uses algorithms written in C, so array operations typically run far faster
than equivalent pure-Python loops.
2. Fewer loops: NumPy helps you to reduce loops and keep from getting tangled up in iteration
indices.
3. Clearer code: Without loops, your code will look more like the equations you’re trying to
calculate.
4. Better quality: There are thousands of contributors working to keep NumPy fast, friendly, and
bug free.
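The "fewer loops" and "clearer code" points can be illustrated with a minimal sketch. The list of 1,000 integers and the variable names are assumptions made for this example; it compares an explicit Python loop with a single vectorized NumPy expression that computes the same squares.

```python
import numpy as np

# Hypothetical input data for the illustration.
values = list(range(1_000))

# Pure-Python approach: an explicit loop over a list.
squares_loop = []
for v in values:
    squares_loop.append(v * v)

# NumPy approach: one vectorized expression, no explicit loop or index.
arr = np.array(values)
squares_vec = arr * arr

print(squares_vec[:5])                             # first five squares
print(np.array_equal(squares_vec, squares_loop))   # both approaches agree
```

The vectorized line reads like the formula it computes, which is the point of benefit 3; actual speed gains depend on array size and the operation.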
NumPy Installation and Importing
Pre-requirements: Python and the Python package installer (pip)
Installation: pip install numpy
Import: After installation, import the package with the “import” keyword.
import numpy
If the import runs without an error, the NumPy package is properly installed and ready to use.
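NumPy ndarray Object
• It defines a collection of items that all belong to the same type.
• Every element of an ndarray has the same data type, described by a data-type object (dtype).
• Basic ndarray creation: numpy.array(object)
OR, with all parameters:
numpy.array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)
  object: any object exposing the array interface, such as a list or nested list
  dtype: desired data type of the array
  copy: whether the object is copied
  order: memory layout, row major (C) or column major (F)
  subok: if False, the returned array is forced to be a base-class array
  ndmin: minimum number of dimensions of the result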
Sample Input-Output
Code:
import numpy as np
a = np.array([1, 2, 3])            # 1D array
b = np.array([[1, 2], [3, 4]])     # 2D array
print(a)
print(b)
Output:
[1 2 3]
[[1 2]
 [3 4]]
NumPy arrays can be multi-dimensional too.
np.array([[1,2,3,4],[5,6,7,8]])
array([[1, 2, 3, 4],
[5, 6, 7, 8]])
• Here, we created a 2-dimensional array of values.
• Note: A matrix is just a rectangular array of numbers with shape N x M where N is
the number of rows and M is the number of columns in the matrix. The one you
just saw above is a 2 x 4 matrix.
Types of NumPy arrays
• Array of zeros
• Array of ones
• Random numbers in ndarrays
• Identity matrix in NumPy
• Evenly spaced ndarray
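Each array type listed above can be created with a single NumPy call. The following is a minimal sketch; the shapes, the seed, and the variable names are arbitrary choices made for illustration.

```python
import numpy as np

zeros = np.zeros((2, 3))              # 2 x 3 array filled with 0.0
ones = np.ones(4, dtype=int)          # 1D array of four integer ones

rng = np.random.default_rng(0)        # seeded generator (seed 0 is arbitrary)
randoms = rng.random((2, 2))          # uniform random floats in [0, 1)

identity = np.eye(3)                  # 3 x 3 identity matrix

evenly_step = np.arange(0, 10, 2)     # evenly spaced by step: 0, 2, 4, 6, 8
evenly_count = np.linspace(0, 1, 5)   # five evenly spaced points from 0 to 1

for name, array in [("zeros", zeros), ("ones", ones), ("randoms", randoms),
                    ("identity", identity), ("arange", evenly_step),
                    ("linspace", evenly_count)]:
    print(name, array.shape)
```

np.arange() fixes the step between values, while np.linspace() fixes the number of points; which one fits depends on whether the spacing or the count is known in advance.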
NumPy - Array Indexing and Slicing
• Indexing is used to access array elements by their index.
• The indexes in NumPy arrays start with 0.
arr = np.array([1, 2, 3, 4])
arr[0] Accessing the first element of the array. Hence, the value is 1.
arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])
arr[0,1] Accessing the second element of the first row of the 2D array. Hence, the value is 2.
Slicing: Taking elements of an array from a start index up to, but not including, an end index: [start:end] or [start:end:step]
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[1:5]) Ans: [2 3 4 5]
Dimensions of NumPy arrays
You can easily determine the number of dimensions or axes of a NumPy array using the ndim attribute:
# number of axes
a = np.array([[5,10,15],[20,25,20]])
print('Array :','\n',a)
print('Dimensions :','\n',a.ndim)
Array :
[[ 5 10 15]
[20 25 20]]
Dimensions :
2
This array has two dimensions: 2 rows and 3 columns.
NumPy - Array Shape and Reshape
• The shape of an array is the number of elements along each of its dimensions.
• Every array has an attribute called shape that returns this as a tuple.
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(arr.shape)
Output: (2, 4)
• Reshaping changes the shape of an array without changing its data; the new shape must hold the same number of elements.
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(4, 3)
print(newarr)
Output: [[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
Flattening a NumPy array
Sometimes when you have a multidimensional array and want to collapse it to a single-dimensional
array, you can either use the flatten() method or the ravel() method:
Syntax:
• flatten()
• ravel()
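The two methods differ in one practical way, shown in this minimal sketch with an assumed 2 x 3 example array: flatten() always returns a copy, while ravel() returns a view of the original data whenever that is possible.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

flat = a.flatten()   # always a new, independent copy
rav = a.ravel()      # a view of the same data here, because a is contiguous

flat[0] = 100        # does not change a
rav[1] = 200         # writes through to a, since rav shares a's memory

print(flat)
print(rav)
print(a)
```

Prefer flatten() when the result will be modified independently, and ravel() when avoiding a copy matters more.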
Transpose of a NumPy array
Another useful reshaping function in NumPy is transpose(). It takes the input array and swaps its
rows and columns, so the first row becomes the first column, and so on:
Syntax: numpy.transpose(arr), or the shorthand arr.T
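A minimal sketch of transposing, using an assumed 2 x 3 example array:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)

t = np.transpose(a)                    # shape (3, 2)

print(t)
print(a.shape, "->", t.shape)
print(np.array_equal(t, a.T))          # the .T attribute gives the same result
```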
Expanding and Squeezing a NumPy array
Expanding a NumPy array
• You can add a new axis to an array using the expand_dims() function by providing the array and the
axis along which to expand.
Squeezing a NumPy array
• On the other hand, if you instead want to remove an axis from the array, use the squeeze() function.
• It removes every axis that has a single entry. This means that if you have created a 2 x 2 x 1 array,
squeeze() will remove the third dimension, leaving a 2 x 2 array.
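Both operations only change the shape, not the data. A minimal sketch, with example arrays chosen for illustration:

```python
import numpy as np

a = np.array([1, 2, 3])              # shape (3,)

row = np.expand_dims(a, axis=0)      # new leading axis  -> shape (1, 3)
col = np.expand_dims(a, axis=1)      # new trailing axis -> shape (3, 1)

b = np.zeros((2, 2, 1))              # the 2 x 2 x 1 case described above
squeezed = np.squeeze(b)             # single-entry axis removed -> shape (2, 2)

print(row.shape, col.shape, squeezed.shape)
```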
NumPy - Arrays Join and Split
• Joining means merging two or more arrays.
• We use the concatenate() function to join arrays.
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.concatenate((arr1, arr2))
print(arr)
Output: [1 2 3 4 5 6]
• Splitting means breaking one array into many.
arr = np.array([1, 2, 3, 4, 5, 6])
newarr = np.array_split(arr, 3)
print(newarr)
Output: [array([1, 2]), array([3, 4]), array([5, 6])]
Pandas
• Data Analysis Tool
• Used for exploring, manipulating, analyzing data.
• The source code for Pandas is found in this GitHub repository:
https://github.com/pandas-dev/pandas
• Pandas helps convert messy data into a readable format suitable for analysis.
Pandas Installation and Importing
Pre-requirements: Python and the Python package installer (pip)
Installation: pip install pandas
Import: After installation, import the package with the “import” keyword.
import pandas
If the import runs without an error, the Pandas package is properly installed and ready to use.
Pandas - Series and DataFrames
• A Series is a one-dimensional labeled array holding data of one type
import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a)
print(myvar)
Output:
0    1
1    7
2    2
dtype: int64
• A DataFrame is a two-dimensional table with labeled rows and columns
Loading data into a DataFrame:
import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
df = pd.DataFrame(data)
print(df)
Output:
   calories  duration
0       420        50
1       380        40
2       390        45
Pandas: Read CSV
• It is used to read CSV (comma-separated values) files.
• The pd.read_csv() function is used.
import pandas as pd
df = pd.read_csv('data.csv')
Input file: data.csv
The file is read and stored as a DataFrame in the df variable.
When we print a large df, pandas shows only the first 5 rows and the last 5 rows by default.
df.head(10): Print the first 10 rows.
df.tail(10): Print the last 10 rows.
df.info(): Print summary information about the data.
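The following self-contained sketch shows the same calls without needing a file on disk. The CSV text and its column names (Duration, Pulse, Calories) are made up for illustration, and io.StringIO stands in for data.csv; with a real file, pass its path to pd.read_csv() instead.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for data.csv.
csv_text = """Duration,Pulse,Calories
60,110,409.1
60,117,479.0
60,103,340.0
45,109,282.4
45,117,406.0
60,102,300.0
"""

df = pd.read_csv(io.StringIO(csv_text))   # the header row becomes the column names

print(df.head(3))   # first 3 rows
print(df.tail(2))   # last 2 rows
df.info()           # column names, non-null counts, and dtypes
```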
Python Matplotlib
• Graph Plotting Library
• Created by John D. Hunter
• The source code for Matplotlib is located at this github repository
https://github.com/matplotlib/matplotlib
• It makes use of NumPy, the numerical mathematics extension of Python
• Version 2.2.0 was the current stable release in early 2018; newer versions have been released since.
Matplotlib Installation and Importing
Pre-requirements: Python and the Python package installer (pip)
Installation: pip install matplotlib
Import: After installation, import the package with the “import” keyword.
import matplotlib
If the import runs without an error, the Matplotlib package is properly installed and ready to use.
Matplotlib Pyplot
• Most Matplotlib utilities live in the pyplot submodule, which is usually imported under the alias plt as shown below:
import matplotlib.pyplot as plt
Now, pyplot can be referred to as plt
• plot() function is used to draw lines from points
• show() function is used to display the graph
import matplotlib.pyplot as plt
import numpy as np
xpoints = np.array([0, 6])
ypoints = np.array([0, 250])
plt.plot(xpoints, ypoints)
plt.show()
Matplotlib Functions
• The xlabel() and ylabel() functions are used to add axis labels
• The subplots() function is used to draw multiple plots in one figure
• The scatter() function is used to construct scatter plots
• The bar() function is used to draw bar graphs
[Figures: Scatter Plot, Bar Plot]
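A minimal sketch combining these functions is shown here. The data points and the output file name are made up for illustration. Because subplots() returns Axes objects, the sketch uses their set_xlabel() and set_ylabel() methods, which are the per-axes counterparts of plt.xlabel() and plt.ylabel(). The Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data for the illustration.
x = np.array([1, 2, 3, 4])
y = np.array([10, 20, 15, 30])

# One figure holding two plots side by side.
fig, (ax_scatter, ax_bar) = plt.subplots(1, 2, figsize=(8, 3))

ax_scatter.scatter(x, y)
ax_scatter.set_xlabel("x")
ax_scatter.set_ylabel("y")
ax_scatter.set_title("Scatter Plot")

ax_bar.bar(x, y)
ax_bar.set_xlabel("x")
ax_bar.set_ylabel("y")
ax_bar.set_title("Bar Plot")

fig.tight_layout()
fig.savefig("matplotlib_functions.png")  # hypothetical output file name
```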
DATA EXPLORATION: load data file(s)
DATA EXPLORATION: convert a variable to a different data type
DATA EXPLORATION: Transpose a data set or DataFrame
DATA EXPLORATION: Sort a Pandas DataFrame
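The three steps named above (type conversion, transposing, and sorting) can be sketched with standard Pandas calls. The original demo screenshots are not available, so the DataFrame, its column names, and its values here are hypothetical.

```python
import pandas as pd

# Hypothetical data; "age" is deliberately stored as strings.
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "age": ["35", "28", "41"],
    "score": [7.5, 6.0, 9.0],
})

# Convert a variable to a different data type.
df["age"] = df["age"].astype(int)

# Transpose: rows become columns and columns become rows.
transposed = df.T

# Sort by one column, ascending by default or descending on request.
by_age = df.sort_values("age")
by_score_desc = df.sort_values("score", ascending=False)

print(df.dtypes)
print(transposed)
print(by_age)
print(by_score_desc)
```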
DATA EXPLORATION: Histogram Plot
DATA EXPLORATION: Scatter Plot
DATA EXPLORATION: Box Plot
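The histogram, scatter plot, and box plot can each be drawn with one Matplotlib call. This is a minimal sketch on a hypothetical DataFrame with "age" and "income" columns; the bin count and the output file name are also assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data for the illustration.
df = pd.DataFrame({
    "age": [22, 25, 31, 35, 38, 41, 45, 52],
    "income": [20, 24, 30, 41, 39, 50, 48, 61],
})

fig, axes = plt.subplots(1, 3, figsize=(12, 3))

# Histogram: distribution of a single numeric variable.
counts, edges, _ = axes[0].hist(df["age"], bins=5)
axes[0].set_title("Histogram Plot")

# Scatter plot: relationship between two numeric variables.
axes[1].scatter(df["age"], df["income"])
axes[1].set_title("Scatter Plot")

# Box plot: median, quartiles, and possible outliers of a variable.
axes[2].boxplot(df["income"].to_numpy())
axes[2].set_title("Box Plot")

fig.tight_layout()
fig.savefig("exploration_plots.png")  # hypothetical output file name
```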
DATA EXPLORATION: Generate frequency tables
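Frequency tables can be sketched with value_counts() for one variable and pd.crosstab() for two. The DataFrame and its column names here are hypothetical.

```python
import pandas as pd

# Hypothetical categorical data.
df = pd.DataFrame({
    "gender": ["F", "M", "F", "F", "M"],
    "city": ["X", "X", "Y", "Y", "Y"],
})

# One-way frequency table: how often each value occurs.
freq = df["gender"].value_counts()

# Two-way frequency table: counts for each gender/city combination.
cross = pd.crosstab(df["gender"], df["city"])

print(freq)
print(cross)
```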
DATA EXPLORATION: Sample Dataset
DATA EXPLORATION: Remove duplicate values
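Duplicates can be removed with drop_duplicates(), either across whole rows or based on selected columns. A minimal sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical data: one exact duplicate row and one repeated id.
df = pd.DataFrame({
    "id": [1, 2, 2, 3, 3],
    "value": [10, 20, 20, 30, 35],
})

# Drop rows that are exact duplicates of an earlier row.
no_dupes = df.drop_duplicates()

# Keep only the first row for each id, regardless of the other columns.
unique_ids = df.drop_duplicates(subset="id", keep="first")

print(no_dupes)
print(unique_ids)
```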
DATA EXPLORATION: Group variables
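Grouping is typically done with groupby() followed by an aggregation. A minimal sketch, assuming a hypothetical DataFrame of teams and points:

```python
import pandas as pd

# Hypothetical data for the illustration.
df = pd.DataFrame({
    "team": ["A", "A", "B", "B", "B"],
    "points": [10, 20, 5, 15, 10],
})

# Mean of points within each team.
mean_points = df.groupby("team")["points"].mean()

# Several aggregations at once for each team.
summary = df.groupby("team")["points"].agg(["count", "sum"])

print(mean_points)
print(summary)
```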
DATA EXPLORATION: Treat missing values
TREATMENT:
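The original treatment steps are not preserved here, so the following is only a sketch of common options, assuming a small hypothetical DataFrame: count the missing values, drop incomplete rows, or fill the gaps with a substitute such as the column mean or a placeholder label.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value per column.
df = pd.DataFrame({
    "age": [25, np.nan, 35],
    "city": ["X", None, "Y"],
})

# Detect: number of missing values in each column.
missing_per_column = df.isnull().sum()

# Option 1: drop every row that contains a missing value.
dropped = df.dropna()

# Option 2: fill numeric gaps with the column mean and text gaps with a label.
filled = df.copy()
filled["age"] = filled["age"].fillna(filled["age"].mean())
filled["city"] = filled["city"].fillna("unknown")

print(missing_per_column)
print(dropped)
print(filled)
```

Dropping rows is simplest but discards data; filling keeps every row at the cost of introducing estimated values, so the right choice depends on how much data is missing and why.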
Private and Confidential www.futureconnect.net 48

More Related Content

What's hot

Data Structures for Statistical Computing in Python
Data Structures for Statistical Computing in PythonData Structures for Statistical Computing in Python
Data Structures for Statistical Computing in Python
Wes McKinney
 
Scipy 2011 Time Series Analysis in Python
Scipy 2011 Time Series Analysis in PythonScipy 2011 Time Series Analysis in Python
Scipy 2011 Time Series Analysis in Python
Wes McKinney
 

What's hot (19)

Scaling Python to CPUs and GPUs
Scaling Python to CPUs and GPUsScaling Python to CPUs and GPUs
Scaling Python to CPUs and GPUs
 
Data Wrangling and Visualization Using Python
Data Wrangling and Visualization Using PythonData Wrangling and Visualization Using Python
Data Wrangling and Visualization Using Python
 
Standardizing arrays -- Microsoft Presentation
Standardizing arrays -- Microsoft PresentationStandardizing arrays -- Microsoft Presentation
Standardizing arrays -- Microsoft Presentation
 
Data Structures for Statistical Computing in Python
Data Structures for Statistical Computing in PythonData Structures for Statistical Computing in Python
Data Structures for Statistical Computing in Python
 
PyCon Estonia 2019
PyCon Estonia 2019PyCon Estonia 2019
PyCon Estonia 2019
 
PyData Barcelona Keynote
PyData Barcelona KeynotePyData Barcelona Keynote
PyData Barcelona Keynote
 
Data visualization in Python
Data visualization in PythonData visualization in Python
Data visualization in Python
 
Big data analysis in python @ PyCon.tw 2013
Big data analysis in python @ PyCon.tw 2013Big data analysis in python @ PyCon.tw 2013
Big data analysis in python @ PyCon.tw 2013
 
Array computing and the evolution of SciPy, NumPy, and PyData
Array computing and the evolution of SciPy, NumPy, and PyDataArray computing and the evolution of SciPy, NumPy, and PyData
Array computing and the evolution of SciPy, NumPy, and PyData
 
Python for Computer Vision - Revision 2nd Edition
Python for Computer Vision - Revision 2nd EditionPython for Computer Vision - Revision 2nd Edition
Python for Computer Vision - Revision 2nd Edition
 
Data Analytics Webinar for Aspirants
Data Analytics Webinar for AspirantsData Analytics Webinar for Aspirants
Data Analytics Webinar for Aspirants
 
Keynote at Converge 2019
Keynote at Converge 2019Keynote at Converge 2019
Keynote at Converge 2019
 
Scipy 2011 Time Series Analysis in Python
Scipy 2011 Time Series Analysis in PythonScipy 2011 Time Series Analysis in Python
Scipy 2011 Time Series Analysis in Python
 
SciPy Latin America 2019
SciPy Latin America 2019SciPy Latin America 2019
SciPy Latin America 2019
 
Data science in ruby is it possible? is it fast? should we use it?
Data science in ruby is it possible? is it fast? should we use it?Data science in ruby is it possible? is it fast? should we use it?
Data science in ruby is it possible? is it fast? should we use it?
 
Numba lightning
Numba lightningNumba lightning
Numba lightning
 
Analyzing Data With Python
Analyzing Data With PythonAnalyzing Data With Python
Analyzing Data With Python
 
Python for Data Science with Anaconda
Python for Data Science with AnacondaPython for Data Science with Anaconda
Python for Data Science with Anaconda
 
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data PlatformsCassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
 

Similar to Session 2

Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...
Simplilearn
 

Similar to Session 2 (20)

Python for Data Science: A Comprehensive Guide
Python for Data Science: A Comprehensive GuidePython for Data Science: A Comprehensive Guide
Python for Data Science: A Comprehensive Guide
 
Basic of python for data analysis
Basic of python for data analysisBasic of python for data analysis
Basic of python for data analysis
 
Py tables
Py tablesPy tables
Py tables
 
Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...
 
Abhishek Training PPT.pptx
Abhishek Training PPT.pptxAbhishek Training PPT.pptx
Abhishek Training PPT.pptx
 
PyTables
PyTablesPyTables
PyTables
 
Fast and Scalable Python
Fast and Scalable PythonFast and Scalable Python
Fast and Scalable Python
 
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
 
Data science in ruby, is it possible? is it fast? should we use it?
Data science in ruby, is it possible? is it fast? should we use it?Data science in ruby, is it possible? is it fast? should we use it?
Data science in ruby, is it possible? is it fast? should we use it?
 
Adarsh_Masekar(2GP19CS003).pptx
Adarsh_Masekar(2GP19CS003).pptxAdarsh_Masekar(2GP19CS003).pptx
Adarsh_Masekar(2GP19CS003).pptx
 
04 open source_tools
04 open source_tools04 open source_tools
04 open source_tools
 
ANN-Lecture2-Python Startup.pptx
ANN-Lecture2-Python Startup.pptxANN-Lecture2-Python Startup.pptx
ANN-Lecture2-Python Startup.pptx
 
DS LAB MANUAL.pdf
DS LAB MANUAL.pdfDS LAB MANUAL.pdf
DS LAB MANUAL.pdf
 
3 python packages
3 python packages3 python packages
3 python packages
 
A Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.pptA Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.ppt
 
PyData Boston 2013
PyData Boston 2013PyData Boston 2013
PyData Boston 2013
 
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-on
 
Data Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps ApproachData Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps Approach
 
Python for ML
Python for MLPython for ML
Python for ML
 
What is Python? An overview of Python for science.
What is Python? An overview of Python for science.What is Python? An overview of Python for science.
What is Python? An overview of Python for science.
 

Recently uploaded

Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 

Recently uploaded (20)

FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Spatium Project Simulation student brief
Spatium Project Simulation student briefSpatium Project Simulation student brief
Spatium Project Simulation student brief
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptx
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptx
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 

Session 2

  • 1. Presenter: Date: TOPIC: AI and DS Private and Confidential www.futureconnect.net 1
  • 2. Private and Confidential www.futureconnect.net 2 AGENDA UNIT NAME TOPICS Hours Count Session 1.DATA SCIENCE 1. DATA SCIENCE LIBARIES 2. NUMPY 3. PANDAS 4. MATPLOTLIB 5. DATA EXPLORATION 2 2
  • 3. OBJECTIVES • Gain knowledge of Data Science Libraries • To understand Data Science Manipulation Packages • Demo for Data Exploration using Package 3 Private and Confidential www.futureconnect.net 3
  • 4. Data Mining Scrapy • One of the most popular Python data science libraries, Scrapy helps to build crawling programs (spider bots) that can retrieve structured data from the web – for example, URLs or contact info. • It's a great tool for scraping data used in, for example, Python machine learning models. • Developers use it for gathering data from APIs. BeautifulSoup • BeautifulSoup is another really popular library for web crawling and data scraping. • If you want to collect data that’s available on some website but not via a proper CSV or API, BeautifulSoup can help you scrape it and arrange it into the format you need. 4 Private and Confidential www.futureconnect.net 4
  • 5. Data Processing and Modeling NumPy • NumPy (Numerical Python) is a perfect tool for scientific computing and performing basic and advanced array operations. • The library offers many handy features performing operations on n-arrays and matrices in Python. SciPy • This useful library includes modules for linear algebra, integration, optimization, and statistics. • Its main functionality was built upon NumPy, so its arrays make use of this library. • SciPy works great for all kinds of scientific programming projects (science, mathematics, and engineering 5 Private and Confidential www.futureconnect.net 5
  • 6. Data Processing and Modeling Pandas • Pandas is a library created to help developers work with "labeled" and "relational" data intuitively. • It's based on two main data structures: "Series" (one-dimensional, like a list of items) and "Data Frames" (two-dimensional, like a table with multiple columns). Keras • Keras is a great library for building neural networks and modeling. • It's very straightforward to use and provides developers with a good degree of extensibility. The library takes advantage of other packages, (Theano or TensorFlow) as its backends. 6 Private and Confidential www.futureconnect.net 6
  • 7. Data Processing and Modeling SciKit-Learn • This is an industry-standard for data science projects based in Python. • Scikits is a group of packages in the SciPy Stack that were created for specific functionalities – for example, image processing. Scikit-learn uses the math operations of SciPy to expose a concise interface to the most common machine learning algorithms. PyTorch • PyTorch is a framework that is perfect for data scientists who want to perform deep learning tasks easily. • The tool allows performing tensor computations with GPU acceleration. It's also used for other tasks – for example, for creating dynamic computational graphs and calculating gradients automatically. 7 Private and Confidential www.futureconnect.net 7
  • 8. Data Processing and Modeling TensorFlow • TensorFlow is a popular Python framework for machine learning and deep learning, which was developed at Google Brain. • It's the best tool for tasks like object identification, speech recognition, and many others. • It helps in working with artificial neural networks that need to handle multiple data sets. XGBoost • This library is used to implement machine learning algorithms under the Gradient Boosting framework. • XGBoost is portable, flexible, and efficient. • It offers parallel tree boosting that helps teams to resolve many data science problems. Another advantage is that developers can run the same code on major distributed environments such as Hadoop, SGE, and MPI. 8 Private and Confidential www.futureconnect.net 8
  • 9. Data Visualization Matplotlib • This is a standard data science library that helps to generate data visualizations such as two- dimensional diagrams and graphs (histograms, scatterplots, non-Cartesian coordinates graphs). • Matplotlib is one of those plotting libraries that are really useful in data science projects — it provides an object-oriented API for embedding plots into applications. • Developers need to write more code than usual while using this library for generating advanced visualizations. Seaborn • Seaborn is based on Matplotlib and serves as a useful Python machine learning tool for visualizing statistical models – heatmaps and other types of visualizations that summarize data and depict the overall distributions. • When using this library, you get to benefit from an extensive gallery of visualizations (including complex ones like time series, joint plots, and violin diagrams). 9 Private and Confidential www.futureconnect.net 9
  • 10. Data Visualization
    Bokeh
    • This library is a great tool for creating interactive and scalable visualizations inside browsers using JavaScript widgets. Bokeh is fully independent of Matplotlib.
    • It focuses on interactivity and presents visualizations through modern browsers, similarly to Data-Driven Documents (d3.js). It offers a set of graphs, interaction abilities (like linking plots or adding JavaScript widgets), and styling.
    Plotly
    • This web-based data visualization tool offers many useful out-of-the-box graphics; you can find them on the Plot.ly website.
    • The library works very well in interactive web applications.
    pydot
    • This library helps to generate directed and undirected graphs.
    • Written in pure Python, it serves as an interface to Graphviz. The graphs it creates come in handy when you are developing algorithms based on neural networks and decision trees.
  • 11. Python Libraries for Data Science
    • Pandas: Structured data operations
    • NumPy: Creating arrays
    • Matplotlib: Data visualization
    • Scikit-learn: Machine learning operations
    • SciPy: Scientific operations
    • TensorFlow: Symbolic math library
    • BeautifulSoup: Parsing HTML and XML pages
    Three of these Python libraries (NumPy, Pandas, and Matplotlib) are covered in the following slides.
  • 12. NumPy
    • NumPy = Numerical Python
    • Created in 2005 by Travis Oliphant.
    • Provides array objects and routines for array processing.
    • NumPy arrays are faster than traditional Python lists because their elements are stored in one contiguous block of memory.
    • The array object in NumPy is called ndarray.
  • 13. Top four benefits that NumPy can bring to your code:
    1. More speed: NumPy uses algorithms written in C that are typically far faster than equivalent pure-Python loops.
    2. Fewer loops: NumPy helps you to reduce loops and keep from getting tangled up in iteration indices.
    3. Clearer code: Without loops, your code will look more like the equations you're trying to calculate.
    4. Better quality: There are thousands of contributors working to keep NumPy fast, friendly, and bug free.
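The "fewer loops" and "clearer code" points can be sketched with a small comparison. The temperature values and variable names below are made up for illustration; they are not from the slides.

```python
import numpy as np

# Hypothetical data: temperatures in Celsius (illustrative values only).
celsius = [0.0, 12.5, 25.0, 37.0, 100.0]

# Pure-Python version: an explicit loop over every element.
fahrenheit_loop = []
for c in celsius:
    fahrenheit_loop.append(c * 9 / 5 + 32)

# NumPy version: the formula is applied to the whole array at once,
# so the code reads like the equation itself.
celsius_arr = np.array(celsius)
fahrenheit_vec = celsius_arr * 9 / 5 + 32

print(fahrenheit_loop)
print(fahrenheit_vec)
```

Both versions compute the same values; the NumPy version simply moves the loop into compiled code.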
  • 14. NumPy Installation and Importing
    Prerequisites: Python and the Python package installer (pip)
    Installation:
        pip install numpy
    Import: After installation, import the package with the "import" keyword:
        import numpy
    If the import succeeds without an error, the NumPy package is properly installed and ready to use.
  • 15. NumPy ndarray Object
    • An ndarray is a collection of items that all belong to the same type.
    • Each element in an ndarray is described by a data-type object: dtype.
    • Basic ndarray creation: numpy.array(object), or in full:
        numpy.array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)
      object: any object exposing the array interface, or a (nested) sequence
      dtype: desired data type
      copy: whether the object is copied
      order: memory layout (row-major or column-major)
      subok: whether subclasses are passed through; if False, the result is a base-class array
      ndmin: minimum number of dimensions of the resulting array
  • 16. Sample Input-Output
    Code:
        import numpy as np
        a = np.array([1, 2, 3])           # 1D array
        b = np.array([[1, 2], [3, 4]])    # 2D array
        print(a)
        print(b)
    Output:
        [1 2 3]
        [[1 2]
         [3 4]]
  • 17. NumPy arrays can be multi-dimensional too.
        np.array([[1,2,3,4],[5,6,7,8]])
    Result:
        array([[1, 2, 3, 4],
               [5, 6, 7, 8]])
    • Here, we created a 2-dimensional array of values.
    • Note: A matrix is just a rectangular array of numbers with shape N x M, where N is the number of rows and M is the number of columns in the matrix. The one you just saw above is a 2 x 4 matrix.
  • 18. Types of NumPy arrays
    • Array of zeros
    • Array of ones
    • Random numbers in ndarrays
    • Identity matrix in NumPy
    • Evenly spaced ndarray
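The array types listed above can be created as in the following minimal sketch. The slide does not name the functions, so the choice of np.zeros, np.ones, a seeded random generator, np.eye, np.linspace, and np.arange is an assumption about what was demonstrated; the shapes and seed are arbitrary.

```python
import numpy as np

zeros = np.zeros((2, 3))               # 2 x 3 array filled with 0.0
ones = np.ones((2, 3))                 # 2 x 3 array filled with 1.0

rng = np.random.default_rng(seed=0)    # seeded generator for reproducibility
random_vals = rng.random((2, 3))       # uniform floats in [0, 1)

identity = np.eye(3)                   # 3 x 3 identity matrix

evenly_spaced = np.linspace(0, 1, 5)   # 5 points from 0 to 1, endpoints included
stepped = np.arange(0, 10, 2)          # values from 0 up to (not including) 10, step 2

print(zeros, ones, random_vals, identity, evenly_spaced, stepped, sep="\n")
```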
  • 19. NumPy Array Indexing and Slicing
    • Indexing accesses array elements by their index.
    • Indexes in NumPy arrays start at 0.
        arr = np.array([1, 2, 3, 4])
        arr[0]
      This accesses the first element of the array, so the value is 1.
        arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])
        arr[0,1]
      This accesses the second element of the first row of the 2D array, so the value is 2.
    Slicing: taking the elements of an array from a start index up to, but not including, an end index: [start:end] or [start:end:step]
        arr = np.array([1, 2, 3, 4, 5, 6, 7])
        print(arr[1:5])
      Output: [2 3 4 5]
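A few more slicing forms, sketched on small arrays. The variable name grid is introduced here for illustration only.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

print(arr[1:5])     # indices 1 to 4
print(arr[1:6:2])   # every second element from index 1 up to (not including) 6
print(arr[-3:])     # the last three elements

# Slicing also works per axis on a 2D array.
grid = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print(grid[1, 1:4])  # second row, columns 1 to 3
```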
  • 20. Dimensions of NumPy arrays
    You can easily determine the number of dimensions, or axes, of a NumPy array using the ndim attribute:
        # number of axes
        a = np.array([[5,10,15],[20,25,20]])
        print('Array :','\n',a)
        print('Dimensions :','\n',a.ndim)
    Output:
        Array :
         [[ 5 10 15]
         [20 25 20]]
        Dimensions :
         2
    This array has two dimensions: 2 rows and 3 columns.
  • 21. NumPy Array Shape and Reshape
    • The shape of an array is the number of elements along each of its dimensions.
    • Every array has an attribute called shape that returns this as a tuple.
        arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
        print(arr.shape)
      Output: (2, 4)
    • Reshaping changes the shape of an array without changing its data.
        arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
        newarr = arr.reshape(4, 3)
        print(newarr)
      Output:
        [[ 1  2  3]
         [ 4  5  6]
         [ 7  8  9]
         [10 11 12]]
  • 22. Flattening a NumPy array
    When you have a multidimensional array and want to collapse it into a one-dimensional array, you can use either the flatten() method or the ravel() method.
    Syntax:
    • flatten()
    • ravel()
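A minimal sketch of the difference between the two methods, using a made-up 2 x 3 array: flatten() always returns a copy, while ravel() returns a view of the original data when it can.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

flat_copy = a.flatten()  # always a new, independent array
flat_view = a.ravel()    # a view here, because a is stored contiguously

flat_copy[0] = 100       # does not affect a
flat_view[1] = 200       # also changes a, since the memory is shared

print(a)
print(flat_copy)
print(flat_view)
```

Use flatten() when the result must be independent of the original, and ravel() when avoiding a copy matters more.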
  • 23. Transpose of a NumPy array
    Another very useful reshaping method of NumPy is the transpose() method. It takes the input array and swaps its rows and columns.
    Syntax: numpy.transpose()
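A short sketch of transposing a 2 x 3 array (the values are illustrative). The .T attribute is shown as an equivalent shorthand.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)

t = np.transpose(a)                    # shape (3, 2); rows become columns
same = a.T                             # shorthand for the same operation

print(t)
print(t.shape)
```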
  • 24. Expanding and Squeezing a NumPy array
    Expanding a NumPy array
    • You can add a new axis to an array using the expand_dims() function by providing the array and the axis along which to expand.
    Squeezing a NumPy array
    • If you instead want to reduce the number of axes of the array, use the squeeze() function.
    • It removes every axis that has a single entry. This means that if you have created a 2 x 2 x 1 array, squeeze() will remove the third dimension from it.
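Both operations can be sketched as follows; the array contents are arbitrary and only the shapes matter.

```python
import numpy as np

a = np.array([1, 2, 3])              # shape (3,)

row = np.expand_dims(a, axis=0)      # shape (1, 3): new axis in front
col = np.expand_dims(a, axis=1)      # shape (3, 1): new axis at the end

b = np.zeros((2, 2, 1))              # the 2 x 2 x 1 case from the slide
squeezed = np.squeeze(b)             # shape (2, 2): length-1 axis removed

print(row.shape, col.shape, squeezed.shape)
```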
  • 25. NumPy Arrays: Join and Split
    • Joining means merging two or more arrays.
    • We use the concatenate() function to join arrays.
        arr1 = np.array([1, 2, 3])
        arr2 = np.array([4, 5, 6])
        arr = np.concatenate((arr1, arr2))
        print(arr)
      Output: [1 2 3 4 5 6]
    • Splitting means breaking one array into many.
        arr = np.array([1, 2, 3, 4, 5, 6])
        newarr = np.array_split(arr, 3)
        print(newarr)
      Output: [array([1, 2]), array([3, 4]), array([5, 6])]
  • 26. Pandas
    • Data analysis tool
    • Used for exploring, manipulating, and analyzing data.
    • The source code for Pandas is found in this GitHub repository: https://github.com/pandas-dev/pandas
    • Pandas converts messy data into the readable format required for analysis.
  • 27. Pandas Installation and Importing
    Prerequisites: Python and the Python package installer (pip)
    Installation:
        pip install pandas
    Import: After installation, import the package with the "import" keyword:
        import pandas
    If the import succeeds without an error, the Pandas package is properly installed and ready to use.
  • 28. Pandas: Series and DataFrames
    • A Series is a 1D array containing one type of data.
        import pandas as pd
        a = [1, 7, 2]
        myvar = pd.Series(a)
        print(myvar)
      Output:
        0    1
        1    7
        2    2
        dtype: int64
    • A DataFrame is a 2D array containing rows and columns.
        import pandas as pd
        data = {
            "calories": [420, 380, 390],
            "duration": [50, 40, 45]
        }
        df = pd.DataFrame(data)   # loading data into a DataFrame
        print(df)
      Output:
           calories  duration
        0       420        50
        1       380        40
        2       390        45
  • 29. Pandas: Read CSV
    • CSV stands for comma-separated values; a CSV file stores tabular data as plain text.
    • The pd.read_csv() function is used to read a CSV file.
        import pandas as pd
        df = pd.read_csv('data.csv')
      The input file data.csv is read and stored as a DataFrame in the variable df.
    • Printing df shows the whole DataFrame; for large DataFrames, pandas by default shows only the first 5 and last 5 rows.
    • df.head(10): returns the first 10 rows
    • df.tail(10): returns the last 10 rows
    • df.info(): prints summary information about the data
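The calls above can be tried without a file on disk. In this sketch, an in-memory string stands in for data.csv; its columns and values are invented for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for 'data.csv'.
csv_text = """name,calories,duration
a,420,50
b,380,40
c,390,45
d,410,60
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head(2))   # first 2 rows
print(df.tail(2))   # last 2 rows
df.info()           # column names, non-null counts, and dtypes
```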
  • 30. Python Matplotlib
    • Graph plotting library
    • Created by John D. Hunter
    • The source code for Matplotlib is located in this GitHub repository: https://github.com/matplotlib/matplotlib
    • It makes use of NumPy, the numerical mathematics extension of Python.
    • Version 2.2.0 was released in 2018; later releases are available, so check the project site for the current stable version.
  • 31. Matplotlib Installation and Importing
    Prerequisites: Python and the Python package installer (pip)
    Installation:
        pip install matplotlib
    Import: After installation, import the package with the "import" keyword:
        import matplotlib
    If the import succeeds without an error, the Matplotlib package is properly installed and ready to use.
  • 32. Matplotlib Pyplot
    • Most Matplotlib utilities live in the pyplot submodule, which is usually imported under the alias plt:
        import matplotlib.pyplot as plt
      Now pyplot can be referred to as plt.
    • The plot() function is used to draw lines between points.
    • The show() function is used to display the graph.
        import matplotlib.pyplot as plt
        import numpy as np
        xpoints = np.array([0, 6])
        ypoints = np.array([0, 250])
        plt.plot(xpoints, ypoints)
        plt.show()
  • 33. Matplotlib Functions
    • The xlabel() and ylabel() functions are used to add axis labels.
    • The subplots() function is used to draw multiple plots in one figure.
    • The scatter() function is used to construct scatter plots.
    • The bar() function is used to draw bar graphs.
    [Figures: example scatter plot and bar plot]
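The functions listed above can be combined in one figure, as in this minimal sketch. The data points are invented, and the non-interactive "Agg" backend and in-memory PNG are assumptions made so the script runs without a display; the Axes methods set_xlabel() and set_ylabel() are the per-plot counterparts of plt.xlabel() and plt.ylabel().

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; assumption so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data points.
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 1, 5, 3])

# subplots() creates one figure holding two plots side by side.
fig, (ax_scatter, ax_bar) = plt.subplots(1, 2, figsize=(8, 3))

ax_scatter.scatter(x, y)
ax_scatter.set_xlabel("x")
ax_scatter.set_ylabel("y")
ax_scatter.set_title("Scatter Plot")

ax_bar.bar(x, y)
ax_bar.set_xlabel("x")
ax_bar.set_ylabel("y")
ax_bar.set_title("Bar Plot")

fig.tight_layout()

# Render to an in-memory PNG; pass a file name instead to save to disk.
buf = io.BytesIO()
fig.savefig(buf, format="png")
```

In an interactive session, the backend line can be dropped and plt.show() used in place of savefig().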
  • 34. DATA EXPLORATION: Load data file(s)
  • 35. DATA EXPLORATION: Load data file(s)
  • 36. DATA EXPLORATION: Load data file(s)
  • 37. DATA EXPLORATION: Convert a variable to a different data type
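The slide's own example is not available in this transcript, so the following is only a sketch of one common way to convert column types in pandas. The DataFrame and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical data: ages were read in as strings, scores as integers.
df = pd.DataFrame({"age": ["25", "31", "47"], "score": [88, 92, 79]})

df["age"] = df["age"].astype(int)        # string -> integer
df["score"] = df["score"].astype(float)  # integer -> float

# to_numeric with errors="coerce" turns unparseable entries into NaN
# instead of raising an error.
mixed = pd.to_numeric(pd.Series(["1", "2", "x"]), errors="coerce")

print(df.dtypes)
print(mixed)
```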
  • 38. DATA EXPLORATION: Transpose a data set or DataFrame
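A minimal sketch of transposing a DataFrame, reusing the calories/duration example from the Pandas slides; the original slide's code is not available here.

```python
import pandas as pd

df = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]})

# .T swaps rows and columns: column names become the row index.
transposed = df.T

print(transposed)
```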
  • 39. DATA EXPLORATION: Sort a Pandas DataFrame
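Sorting can be sketched with sort_values(); the data and column names below are hypothetical, since the slide's example is not available.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "calories": [420, 380, 390, 380],
    "duration": [50, 40, 45, 60],
})

# Sort by a single column, ascending by default.
by_calories = df.sort_values("calories")

# Sort by calories ascending, breaking ties by duration descending.
by_both = df.sort_values(["calories", "duration"], ascending=[True, False])

print(by_calories)
print(by_both)
```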
  • 40. DATA EXPLORATION: Histogram Plot
  • 41. DATA EXPLORATION: Histogram Plot
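A histogram can be sketched as follows. The "age" data is randomly generated for illustration, and the "Agg" backend is an assumption so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; assumption so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: 200 ages drawn from a normal distribution.
rng = np.random.default_rng(seed=0)
ages = pd.Series(rng.normal(loc=35, scale=10, size=200), name="age")

fig, ax = plt.subplots()
# hist() returns the count per bin and the bin edges.
counts, bin_edges, _ = ax.hist(ages, bins=10)
ax.set_xlabel("Age")
ax.set_ylabel("Count")
ax.set_title("Histogram")

print(counts)
```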
  • 42. DATA EXPLORATION: Scatter Plot
  • 43. DATA EXPLORATION: Box Plot
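A box plot comparing two groups can be sketched like this. The group names and values are made up, and the "Agg" backend is an assumption so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; assumption so no display is needed
import matplotlib.pyplot as plt

# Hypothetical measurements for two groups.
group_a = [12, 15, 14, 10, 18, 20, 13]
group_b = [22, 19, 24, 30, 21, 23, 40]

fig, ax = plt.subplots()
# boxplot() draws one box per data set and returns the drawn artists.
result = ax.boxplot([group_a, group_b])
ax.set_xticks([1, 2])
ax.set_xticklabels(["Group A", "Group B"])
ax.set_ylabel("Value")
ax.set_title("Box Plot")
```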
  • 44. DATA EXPLORATION: Generate frequency tables
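Frequency tables can be sketched with value_counts() for one variable and pd.crosstab() for two. The data and column names are hypothetical; the slide's own example is not available.

```python
import pandas as pd

df = pd.DataFrame({
    "gender": ["F", "M", "F", "F", "M"],
    "smoker": ["no", "yes", "no", "yes", "no"],
})

# One-way frequency table: how often each value occurs.
freq = df["gender"].value_counts()

# Two-way frequency table: counts for each combination of values.
cross = pd.crosstab(df["gender"], df["smoker"])

print(freq)
print(cross)
```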
  • 45. DATA EXPLORATION: Sample Dataset
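Assuming this slide refers to drawing a random sample of rows from a data set (the slide image is not available), one way to do it in pandas is DataFrame.sample(). The data below is invented.

```python
import pandas as pd

df = pd.DataFrame({"id": range(1, 11), "value": [v * 10 for v in range(1, 11)]})

# Draw a fixed number of rows; random_state makes the draw repeatable.
sample_n = df.sample(n=3, random_state=42)

# Draw a fraction of the rows instead.
sample_frac = df.sample(frac=0.5, random_state=42)

print(sample_n)
print(sample_frac)
```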
  • 46. DATA EXPLORATION: Remove duplicate values
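Duplicate removal can be sketched with drop_duplicates(); the data and column names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "b", "a", "c", "b"],
    "city": ["X", "Y", "X", "Z", "W"],
})

# Drop rows that are identical in every column.
dedup_rows = df.drop_duplicates()

# Drop rows that repeat a value in one column, keeping the first occurrence.
dedup_name = df.drop_duplicates(subset="name", keep="first")

print(dedup_rows)
print(dedup_name)
```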
  • 47. DATA EXPLORATION: Group variables
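Grouping can be sketched with groupby(); the department and salary data is invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "dept": ["A", "B", "A", "B", "A"],
    "salary": [10, 20, 30, 60, 50],
})

# Mean salary per department.
mean_by_dept = df.groupby("dept")["salary"].mean()

# Several summary statistics per department at once.
summary = df.groupby("dept")["salary"].agg(["count", "sum", "mean"])

print(mean_by_dept)
print(summary)
```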
  • 48. DATA EXPLORATION: Treat missing values
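The treatment shown on the original slide is not available in this transcript. As a sketch of two common options, rows with missing values can be dropped, or the gaps can be filled with the column mean (numeric) or the most frequent value (categorical). The data is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25, np.nan, 40, 35],
    "city": ["X", None, "Y", "X"],
})

# Count missing values per column.
missing_counts = df.isna().sum()

# Option 1: drop every row that contains a missing value.
dropped = df.dropna()

# Option 2: fill the gaps instead of dropping rows.
filled = df.copy()
filled["age"] = filled["age"].fillna(filled["age"].mean())        # column mean
filled["city"] = filled["city"].fillna(filled["city"].mode()[0])  # most frequent value

print(missing_counts)
print(dropped)
print(filled)
```

Which option is appropriate depends on how much data is missing and why; neither is a default.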