Programming for Data Analysis
Week 3
Dr. Ferdin Joe John Joseph
Faculty of Information Technology
Thai – Nichi Institute of Technology, Bangkok
Today’s lesson
• Pivoting
• Binning
• Replacing and Renaming
• Laboratory
Pivoting in pandas
pandas.DataFrame.pivot_table
Syntax:
DataFrame.pivot_table(values=None, index=None, columns=None,
aggfunc='mean', fill_value=None, margins=False, dropna=True,
margins_name='All')
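A minimal sketch of the call above, assuming a small made-up table of sensor readings (the column names month, site and temperature are hypothetical and not taken from the course data):

```python
import pandas as pd

# Hypothetical readings; names and values are illustrative only.
df = pd.DataFrame({
    "month": ["Jan", "Jan", "Jan", "Feb", "Feb", "Feb"],
    "site": ["A", "A", "B", "A", "B", "B"],
    "temperature": [20.0, 22.0, 25.0, 24.0, 26.0, 28.0],
})

# One row per month, one column per site; each cell holds the mean
# of all temperature readings for that (month, site) pair.
table = df.pivot_table(values="temperature", index="month",
                       columns="site", aggfunc="mean")
print(table)
```

Passing a different aggfunc (for example "max" or "count") changes how the readings in each cell are combined.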
Parameters
Output
A pivot table, returned as a DataFrame
Binning
• When dealing with continuous numeric data, it is often helpful to bin
the data into multiple buckets for further analysis.
• There are several different terms for binning including bucketing,
discrete binning, discretization or quantization.
• Pandas supports these approaches using the cut and qcut functions.
• A histogram is the most common way to visualize binned data.
Binning
• Pandas to process data
• Numpy to calculate arrays
• Seaborn to visualize histogram
Qcut
• qcut divides the data into quantile-based bins, so that each bin holds
(as nearly as possible) the same number of records. Asking for four bins
gives quartiles.
• When you ask for quintiles with qcut, the bins are chosen so that
you have the same number of records in each bin. With 30 records, each
of the five bins should hold 6 (the breakpoints themselves depend on
the random draw).
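The quintile case described above can be sketched as follows. The data is a random draw, and the seed is an arbitrary choice made only so the run is repeatable:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)            # arbitrary seed, for repeatability
values = pd.Series(rng.normal(size=30))   # 30 random records

# q=5 asks for quintiles: five bins with equal numbers of records.
quintiles = pd.qcut(values, q=5)

# Count the records per bin, listed in bin order.
counts = quintiles.value_counts().sort_index()
print(counts)
```

Each of the five bins holds 6 records. A different seed moves the breakpoints but not the counts.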
Cut
• cut chooses bins that are evenly spaced across the range of the values
themselves, regardless of how many values fall into each bin.
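As a rough illustration, the sketch below applies cut to a made-up series with one very large value. The bins have the same width, so almost every record lands in the first one:

```python
import numpy as np
import pandas as pd

# Made-up values: nine small numbers and one large one.
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

# bins=4 asks for four equal-width intervals over the range 1..100.
# retbins=True also returns the bin edges that pandas chose.
binned, edges = pd.cut(values, bins=4, retbins=True)

counts = binned.value_counts(sort=False)
print(edges)    # pandas lowers the first edge slightly so the minimum is included
print(counts)
```

Nine of the ten records fall in the lowest bin, which is the opposite of what qcut would produce on the same data.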
Binning – Read Data
Binning quantized in variable
Naming Bins
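One way to name bins is the labels argument of cut (qcut accepts it as well). The sketch below assumes made-up humidity readings and arbitrary thresholds of 50 and 70. Neither comes from the course data:

```python
import pandas as pd

# Made-up humidity readings in percent.
humidity = pd.Series([35, 48, 52, 61, 77, 90])

# Three intervals, (0, 50], (50, 70] and (70, 100], each given a name.
# The thresholds are illustrative assumptions.
levels = pd.cut(humidity, bins=[0, 50, 70, 100],
                labels=["low", "medium", "high"])
print(levels)
```

The number of labels must equal the number of bins, that is, one fewer than the number of edges.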
Binning – Other applications
• Image histograms
Statistical Data Binning
• Statistical data binning is a way to group numbers of more or less
continuous values into a smaller number of "bins".
• For example, if you have data about a group of people, you might
want to arrange their ages into a smaller number of age intervals (for
example, grouping every five years together).
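The age example can be sketched with explicit bin edges every five years. The ages below are invented for illustration:

```python
import pandas as pd

# Invented ages for a small group of people.
ages = pd.Series([3, 7, 12, 18, 21, 24, 33, 34])

# Edges at 0, 5, 10, ..., 40 give five-year intervals.
edges = list(range(0, 41, 5))

# right=False makes each interval closed on the left: [0, 5), [5, 10), ...
groups = pd.cut(ages, bins=edges, right=False)

per_group = groups.value_counts(sort=False)
print(per_group)
```

Here the ages 21 and 24 share the interval [20, 25), and 33 and 34 share [30, 35).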
Methods to divide Bins
• Equal frequency binning
• Equal width binning
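To make the difference between the two methods concrete, the sketch below computes both sets of bin edges directly with NumPy on a small, made-up, skewed sample. It is an illustration under those assumptions, not the code from the slides:

```python
import numpy as np

# Made-up skewed sample: many small values, a few large ones.
data = np.array([1, 2, 2, 3, 3, 3, 4, 5, 8, 13, 21, 34])
k = 4  # number of bins

# Equal width: edges evenly spaced between the minimum and the maximum.
width_edges = np.linspace(data.min(), data.max(), k + 1)

# Equal frequency: edges placed at the 0%, 25%, 50%, 75% and 100% quantiles.
freq_edges = np.quantile(data, np.linspace(0, 1, k + 1))

width_counts, _ = np.histogram(data, bins=width_edges)
freq_counts, _ = np.histogram(data, bins=freq_edges)

print(width_counts)  # → [9 1 1 1]
print(freq_counts)   # → [3 3 3 3]
```

Equal-width bins are easy to read but can be nearly empty on skewed data. Equal-frequency bins stay balanced, but their widths vary.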
Equal frequency binning
• Bins have equal frequency
Equal Width Binning
Code
Advantages
• Binning allows easy identification of outliers, invalid values and
missing values in numerical variables.
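One way this plays out in pandas: when the bin edges cover only the valid range, cut leaves missing and out-of-range values unbinned (NaN), so they are easy to pull out. The readings and the 0 to 100 range below are assumptions made for illustration:

```python
import numpy as np
import pandas as pd

# Made-up humidity readings: one missing value and two impossible ones.
humidity = pd.Series([45.0, 52.0, np.nan, 61.0, -5.0, 140.0])

# Edges cover only the assumed valid range of 0 to 100 percent.
valid_edges = [0, 25, 50, 75, 100]
binned = pd.cut(humidity, bins=valid_edges, include_lowest=True)

# Rows that received no bin are missing or outside the valid range.
unbinned = humidity[binned.isna()]
print(unbinned)
```

The three flagged rows are the missing reading, -5.0 and 140.0.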
DSA 207 - Binning
• Create a pivot table to find the month-wise average of internal and external
temperature, humidity and carbon monoxide levels in the fish data.
• Visualize the binning of humidity levels in the fish data at a particular
time of day in a month, using each of the following:
• 1. Qcut
• 2. Cut
• 3. Naming Bins
Faculty of Information Technology, Thai - Nichi Institute of
Technology, Bangkok
33

Contenu connexe

Tendances

Blockchain Technology - Week 2 - Blockchain Terminologies
Blockchain Technology - Week 2 - Blockchain TerminologiesBlockchain Technology - Week 2 - Blockchain Terminologies
Blockchain Technology - Week 2 - Blockchain TerminologiesFerdin Joe John Joseph PhD
 
Blockchain Technology - Week 11 - Thai-Nichi Institute of Technology
Blockchain Technology - Week 11 - Thai-Nichi Institute of TechnologyBlockchain Technology - Week 11 - Thai-Nichi Institute of Technology
Blockchain Technology - Week 11 - Thai-Nichi Institute of TechnologyFerdin Joe John Joseph PhD
 
Blockchain Technology - Week 4 - Hyperledger and Smart Contracts
Blockchain Technology - Week 4 - Hyperledger and Smart ContractsBlockchain Technology - Week 4 - Hyperledger and Smart Contracts
Blockchain Technology - Week 4 - Hyperledger and Smart ContractsFerdin Joe John Joseph PhD
 
Blockchain Technology - Week 10 - CAP Teorem, Byzantines General Problem
Blockchain Technology - Week 10 - CAP Teorem, Byzantines General ProblemBlockchain Technology - Week 10 - CAP Teorem, Byzantines General Problem
Blockchain Technology - Week 10 - CAP Teorem, Byzantines General ProblemFerdin Joe John Joseph PhD
 
Blockchain Technology - Week 1 - Introduction to Blockchain
Blockchain Technology - Week 1 - Introduction to BlockchainBlockchain Technology - Week 1 - Introduction to Blockchain
Blockchain Technology - Week 1 - Introduction to BlockchainFerdin Joe John Joseph PhD
 
Week 9: Relational Database Service Alibaba Cloud- DSA 441 Cloud Computing
Week 9: Relational Database Service Alibaba Cloud- DSA 441 Cloud ComputingWeek 9: Relational Database Service Alibaba Cloud- DSA 441 Cloud Computing
Week 9: Relational Database Service Alibaba Cloud- DSA 441 Cloud ComputingFerdin Joe John Joseph PhD
 
Week 7: Object Storage Service Alibaba Cloud- DSA 441 Cloud Computing
Week 7: Object Storage Service Alibaba Cloud- DSA 441 Cloud ComputingWeek 7: Object Storage Service Alibaba Cloud- DSA 441 Cloud Computing
Week 7: Object Storage Service Alibaba Cloud- DSA 441 Cloud ComputingFerdin Joe John Joseph PhD
 
Week 1: Introduction to Cloud Computing - DSA 441 Cloud Computing
Week 1: Introduction to Cloud Computing - DSA 441 Cloud ComputingWeek 1: Introduction to Cloud Computing - DSA 441 Cloud Computing
Week 1: Introduction to Cloud Computing - DSA 441 Cloud ComputingFerdin Joe John Joseph PhD
 
BaoCaoFreeRTOS.pptx
BaoCaoFreeRTOS.pptxBaoCaoFreeRTOS.pptx
BaoCaoFreeRTOS.pptxHuynhPyN
 
Caso jgb
Caso jgbCaso jgb
Caso jgbdediaz
 
32 đề thi vào lớp 10 dh khtn ha noi 1989 2005 truonghocso.com
32 đề thi vào lớp 10 dh khtn ha noi 1989 2005   truonghocso.com32 đề thi vào lớp 10 dh khtn ha noi 1989 2005   truonghocso.com
32 đề thi vào lớp 10 dh khtn ha noi 1989 2005 truonghocso.comThế Giới Tinh Hoa
 
TUYỂN TẬP 11 CHUYÊN ĐỀ LUYỆN THI VIOLYMPIC TOÁN LỚP 2
TUYỂN TẬP 11 CHUYÊN ĐỀ LUYỆN THI VIOLYMPIC TOÁN LỚP 2TUYỂN TẬP 11 CHUYÊN ĐỀ LUYỆN THI VIOLYMPIC TOÁN LỚP 2
TUYỂN TẬP 11 CHUYÊN ĐỀ LUYỆN THI VIOLYMPIC TOÁN LỚP 2Bồi Dưỡng HSG Toán Lớp 3
 

Tendances (20)

Blockchain Technology - Week 2 - Blockchain Terminologies
Blockchain Technology - Week 2 - Blockchain TerminologiesBlockchain Technology - Week 2 - Blockchain Terminologies
Blockchain Technology - Week 2 - Blockchain Terminologies
 
Week 9: Programming for Data Analysis
Week 9: Programming for Data AnalysisWeek 9: Programming for Data Analysis
Week 9: Programming for Data Analysis
 
Blockchain Technology - Week 11 - Thai-Nichi Institute of Technology
Blockchain Technology - Week 11 - Thai-Nichi Institute of TechnologyBlockchain Technology - Week 11 - Thai-Nichi Institute of Technology
Blockchain Technology - Week 11 - Thai-Nichi Institute of Technology
 
Blockchain Technology - Week 9 - Blockciphers
Blockchain Technology - Week 9 - BlockciphersBlockchain Technology - Week 9 - Blockciphers
Blockchain Technology - Week 9 - Blockciphers
 
Blockchain Technology - Week 4 - Hyperledger and Smart Contracts
Blockchain Technology - Week 4 - Hyperledger and Smart ContractsBlockchain Technology - Week 4 - Hyperledger and Smart Contracts
Blockchain Technology - Week 4 - Hyperledger and Smart Contracts
 
Blockchain Technology - Week 10 - CAP Teorem, Byzantines General Problem
Blockchain Technology - Week 10 - CAP Teorem, Byzantines General ProblemBlockchain Technology - Week 10 - CAP Teorem, Byzantines General Problem
Blockchain Technology - Week 10 - CAP Teorem, Byzantines General Problem
 
Data wrangling week 10
Data wrangling week 10Data wrangling week 10
Data wrangling week 10
 
Blockchain Technology - Week 1 - Introduction to Blockchain
Blockchain Technology - Week 1 - Introduction to BlockchainBlockchain Technology - Week 1 - Introduction to Blockchain
Blockchain Technology - Week 1 - Introduction to Blockchain
 
Data Wrangling Week 4
Data Wrangling Week 4Data Wrangling Week 4
Data Wrangling Week 4
 
Data wrangling week 6
Data wrangling week 6Data wrangling week 6
Data wrangling week 6
 
Data wrangling week3
Data wrangling week3Data wrangling week3
Data wrangling week3
 
Data wrangling week2
Data wrangling week2Data wrangling week2
Data wrangling week2
 
Week 9: Relational Database Service Alibaba Cloud- DSA 441 Cloud Computing
Week 9: Relational Database Service Alibaba Cloud- DSA 441 Cloud ComputingWeek 9: Relational Database Service Alibaba Cloud- DSA 441 Cloud Computing
Week 9: Relational Database Service Alibaba Cloud- DSA 441 Cloud Computing
 
Week 7: Object Storage Service Alibaba Cloud- DSA 441 Cloud Computing
Week 7: Object Storage Service Alibaba Cloud- DSA 441 Cloud ComputingWeek 7: Object Storage Service Alibaba Cloud- DSA 441 Cloud Computing
Week 7: Object Storage Service Alibaba Cloud- DSA 441 Cloud Computing
 
Week 1: Introduction to Cloud Computing - DSA 441 Cloud Computing
Week 1: Introduction to Cloud Computing - DSA 441 Cloud ComputingWeek 1: Introduction to Cloud Computing - DSA 441 Cloud Computing
Week 1: Introduction to Cloud Computing - DSA 441 Cloud Computing
 
BaoCaoFreeRTOS.pptx
BaoCaoFreeRTOS.pptxBaoCaoFreeRTOS.pptx
BaoCaoFreeRTOS.pptx
 
Caso jgb
Caso jgbCaso jgb
Caso jgb
 
32 đề thi vào lớp 10 dh khtn ha noi 1989 2005 truonghocso.com
32 đề thi vào lớp 10 dh khtn ha noi 1989 2005   truonghocso.com32 đề thi vào lớp 10 dh khtn ha noi 1989 2005   truonghocso.com
32 đề thi vào lớp 10 dh khtn ha noi 1989 2005 truonghocso.com
 
Dãy số vmo2009
Dãy số vmo2009Dãy số vmo2009
Dãy số vmo2009
 
TUYỂN TẬP 11 CHUYÊN ĐỀ LUYỆN THI VIOLYMPIC TOÁN LỚP 2
TUYỂN TẬP 11 CHUYÊN ĐỀ LUYỆN THI VIOLYMPIC TOÁN LỚP 2TUYỂN TẬP 11 CHUYÊN ĐỀ LUYỆN THI VIOLYMPIC TOÁN LỚP 2
TUYỂN TẬP 11 CHUYÊN ĐỀ LUYỆN THI VIOLYMPIC TOÁN LỚP 2
 

Similaire à Programming for Data Analysis: Week 3

2019 DSA 105 Introduction to Data Science Week 3
2019 DSA 105 Introduction to Data Science Week 32019 DSA 105 Introduction to Data Science Week 3
2019 DSA 105 Introduction to Data Science Week 3Ferdin Joe John Joseph PhD
 
Data Structures and Algorithm - Week 4 - Trees, Binary Trees
Data Structures and Algorithm - Week 4 - Trees, Binary TreesData Structures and Algorithm - Week 4 - Trees, Binary Trees
Data Structures and Algorithm - Week 4 - Trees, Binary TreesFerdin Joe John Joseph PhD
 
Data Structures and Algorithm - Week 6 - Red Black Trees
Data Structures and Algorithm - Week 6 - Red Black TreesData Structures and Algorithm - Week 6 - Red Black Trees
Data Structures and Algorithm - Week 6 - Red Black TreesFerdin Joe John Joseph PhD
 
Ifla203 archambault
Ifla203 archambaultIfla203 archambault
Ifla203 archambaultsusangar
 
Effective STEM Education Solutions Webinar
Effective STEM Education Solutions WebinarEffective STEM Education Solutions Webinar
Effective STEM Education Solutions WebinarStudica
 
Data Science Curriculum at Indiana University
Data Science Curriculum at Indiana UniversityData Science Curriculum at Indiana University
Data Science Curriculum at Indiana UniversityGeoffrey Fox
 
The systemic challenges in data science initiatives (and some solutions)
The systemic challenges in data science initiatives (and some solutions)The systemic challenges in data science initiatives (and some solutions)
The systemic challenges in data science initiatives (and some solutions)Balázs Kégl
 
A seminar on neo4 j
A seminar on neo4 jA seminar on neo4 j
A seminar on neo4 jRishikese MR
 
UnSupervised Learning Clustering
UnSupervised Learning ClusteringUnSupervised Learning Clustering
UnSupervised Learning ClusteringFEG
 
Smart SE: Recurrent Education Program of IoT and AI for Business
Smart SE: Recurrent Education Program of IoT and AI for BusinessSmart SE: Recurrent Education Program of IoT and AI for Business
Smart SE: Recurrent Education Program of IoT and AI for BusinessHironori Washizaki
 
1. Intro DS.pptx
1. Intro DS.pptx1. Intro DS.pptx
1. Intro DS.pptxAnusuya123
 

Similaire à Programming for Data Analysis: Week 3 (20)

Data Wrangling Week 7
Data Wrangling Week 7Data Wrangling Week 7
Data Wrangling Week 7
 
Data wrangling week 5
Data wrangling week 5Data wrangling week 5
Data wrangling week 5
 
2019 DSA 105 Introduction to Data Science Week 3
2019 DSA 105 Introduction to Data Science Week 32019 DSA 105 Introduction to Data Science Week 3
2019 DSA 105 Introduction to Data Science Week 3
 
Data Structures and Algorithm - Week 4 - Trees, Binary Trees
Data Structures and Algorithm - Week 4 - Trees, Binary TreesData Structures and Algorithm - Week 4 - Trees, Binary Trees
Data Structures and Algorithm - Week 4 - Trees, Binary Trees
 
Data Structures and Algorithm - Week 6 - Red Black Trees
Data Structures and Algorithm - Week 6 - Red Black TreesData Structures and Algorithm - Week 6 - Red Black Trees
Data Structures and Algorithm - Week 6 - Red Black Trees
 
Webinar: Blockchain Beyond Cryptocurrencies
Webinar: Blockchain Beyond CryptocurrenciesWebinar: Blockchain Beyond Cryptocurrencies
Webinar: Blockchain Beyond Cryptocurrencies
 
Ifla203 archambault
Ifla203 archambaultIfla203 archambault
Ifla203 archambault
 
FDS_dept_ppt.pptx
FDS_dept_ppt.pptxFDS_dept_ppt.pptx
FDS_dept_ppt.pptx
 
Data wrangling week 9
Data wrangling week 9Data wrangling week 9
Data wrangling week 9
 
Effective STEM Education Solutions Webinar
Effective STEM Education Solutions WebinarEffective STEM Education Solutions Webinar
Effective STEM Education Solutions Webinar
 
Data Science Curriculum at Indiana University
Data Science Curriculum at Indiana UniversityData Science Curriculum at Indiana University
Data Science Curriculum at Indiana University
 
Research Methodology
Research MethodologyResearch Methodology
Research Methodology
 
The systemic challenges in data science initiatives (and some solutions)
The systemic challenges in data science initiatives (and some solutions)The systemic challenges in data science initiatives (and some solutions)
The systemic challenges in data science initiatives (and some solutions)
 
A seminar on neo4 j
A seminar on neo4 jA seminar on neo4 j
A seminar on neo4 j
 
Deep Learning and CNN Architectures
Deep Learning and CNN ArchitecturesDeep Learning and CNN Architectures
Deep Learning and CNN Architectures
 
Data wrangling week1
Data wrangling week1Data wrangling week1
Data wrangling week1
 
UnSupervised Learning Clustering
UnSupervised Learning ClusteringUnSupervised Learning Clustering
UnSupervised Learning Clustering
 
Smart SE: Recurrent Education Program of IoT and AI for Business
Smart SE: Recurrent Education Program of IoT and AI for BusinessSmart SE: Recurrent Education Program of IoT and AI for Business
Smart SE: Recurrent Education Program of IoT and AI for Business
 
1. Intro DS.pptx
1. Intro DS.pptx1. Intro DS.pptx
1. Intro DS.pptx
 
2015 03-28-eb-final
2015 03-28-eb-final2015 03-28-eb-final
2015 03-28-eb-final
 

Plus de Ferdin Joe John Joseph PhD

Week 11: Cloud Native- DSA 441 Cloud Computing
Week 11: Cloud Native- DSA 441 Cloud ComputingWeek 11: Cloud Native- DSA 441 Cloud Computing
Week 11: Cloud Native- DSA 441 Cloud ComputingFerdin Joe John Joseph PhD
 
Week 10: Cloud Security- DSA 441 Cloud Computing
Week 10: Cloud Security- DSA 441 Cloud ComputingWeek 10: Cloud Security- DSA 441 Cloud Computing
Week 10: Cloud Security- DSA 441 Cloud ComputingFerdin Joe John Joseph PhD
 
Week 6: Server Load Balancer and Auto Scaling Alibaba Cloud- DSA 441 Cloud Co...
Week 6: Server Load Balancer and Auto Scaling Alibaba Cloud- DSA 441 Cloud Co...Week 6: Server Load Balancer and Auto Scaling Alibaba Cloud- DSA 441 Cloud Co...
Week 6: Server Load Balancer and Auto Scaling Alibaba Cloud- DSA 441 Cloud Co...Ferdin Joe John Joseph PhD
 
Week 5: Elastic Compute Service (ECS) with Alibaba Cloud- DSA 441 Cloud Compu...
Week 5: Elastic Compute Service (ECS) with Alibaba Cloud- DSA 441 Cloud Compu...Week 5: Elastic Compute Service (ECS) with Alibaba Cloud- DSA 441 Cloud Compu...
Week 5: Elastic Compute Service (ECS) with Alibaba Cloud- DSA 441 Cloud Compu...Ferdin Joe John Joseph PhD
 
Week 4: Big Data and Hadoop in Alibaba Cloud - DSA 441 Cloud Computing
Week 4: Big Data and Hadoop in Alibaba Cloud - DSA 441 Cloud ComputingWeek 4: Big Data and Hadoop in Alibaba Cloud - DSA 441 Cloud Computing
Week 4: Big Data and Hadoop in Alibaba Cloud - DSA 441 Cloud ComputingFerdin Joe John Joseph PhD
 
Week 3: Virtual Private Cloud, On Premise, IaaS, PaaS, SaaS - DSA 441 Cloud C...
Week 3: Virtual Private Cloud, On Premise, IaaS, PaaS, SaaS - DSA 441 Cloud C...Week 3: Virtual Private Cloud, On Premise, IaaS, PaaS, SaaS - DSA 441 Cloud C...
Week 3: Virtual Private Cloud, On Premise, IaaS, PaaS, SaaS - DSA 441 Cloud C...Ferdin Joe John Joseph PhD
 
Week 2: Virtualization and VM Ware - DSA 441 Cloud Computing
Week 2: Virtualization and VM Ware - DSA 441 Cloud ComputingWeek 2: Virtualization and VM Ware - DSA 441 Cloud Computing
Week 2: Virtualization and VM Ware - DSA 441 Cloud ComputingFerdin Joe John Joseph PhD
 
Sept 6 2021 BTech Artificial Intelligence and Data Science curriculum
Sept 6 2021 BTech Artificial Intelligence and Data Science curriculumSept 6 2021 BTech Artificial Intelligence and Data Science curriculum
Sept 6 2021 BTech Artificial Intelligence and Data Science curriculumFerdin Joe John Joseph PhD
 
Transforming deep into transformers – a computer vision approach
Transforming deep into transformers – a computer vision approachTransforming deep into transformers – a computer vision approach
Transforming deep into transformers – a computer vision approachFerdin Joe John Joseph PhD
 

Plus de Ferdin Joe John Joseph PhD (15)

Invited Talk DGTiCon 2022
Invited Talk DGTiCon 2022Invited Talk DGTiCon 2022
Invited Talk DGTiCon 2022
 
Week 12: Cloud AI- DSA 441 Cloud Computing
Week 12: Cloud AI- DSA 441 Cloud ComputingWeek 12: Cloud AI- DSA 441 Cloud Computing
Week 12: Cloud AI- DSA 441 Cloud Computing
 
Week 11: Cloud Native- DSA 441 Cloud Computing
Week 11: Cloud Native- DSA 441 Cloud ComputingWeek 11: Cloud Native- DSA 441 Cloud Computing
Week 11: Cloud Native- DSA 441 Cloud Computing
 
Week 10: Cloud Security- DSA 441 Cloud Computing
Week 10: Cloud Security- DSA 441 Cloud ComputingWeek 10: Cloud Security- DSA 441 Cloud Computing
Week 10: Cloud Security- DSA 441 Cloud Computing
 
Week 6: Server Load Balancer and Auto Scaling Alibaba Cloud- DSA 441 Cloud Co...
Week 6: Server Load Balancer and Auto Scaling Alibaba Cloud- DSA 441 Cloud Co...Week 6: Server Load Balancer and Auto Scaling Alibaba Cloud- DSA 441 Cloud Co...
Week 6: Server Load Balancer and Auto Scaling Alibaba Cloud- DSA 441 Cloud Co...
 
Week 5: Elastic Compute Service (ECS) with Alibaba Cloud- DSA 441 Cloud Compu...
Week 5: Elastic Compute Service (ECS) with Alibaba Cloud- DSA 441 Cloud Compu...Week 5: Elastic Compute Service (ECS) with Alibaba Cloud- DSA 441 Cloud Compu...
Week 5: Elastic Compute Service (ECS) with Alibaba Cloud- DSA 441 Cloud Compu...
 
Week 4: Big Data and Hadoop in Alibaba Cloud - DSA 441 Cloud Computing
Week 4: Big Data and Hadoop in Alibaba Cloud - DSA 441 Cloud ComputingWeek 4: Big Data and Hadoop in Alibaba Cloud - DSA 441 Cloud Computing
Week 4: Big Data and Hadoop in Alibaba Cloud - DSA 441 Cloud Computing
 
Week 3: Virtual Private Cloud, On Premise, IaaS, PaaS, SaaS - DSA 441 Cloud C...
Week 3: Virtual Private Cloud, On Premise, IaaS, PaaS, SaaS - DSA 441 Cloud C...Week 3: Virtual Private Cloud, On Premise, IaaS, PaaS, SaaS - DSA 441 Cloud C...
Week 3: Virtual Private Cloud, On Premise, IaaS, PaaS, SaaS - DSA 441 Cloud C...
 
Week 2: Virtualization and VM Ware - DSA 441 Cloud Computing
Week 2: Virtualization and VM Ware - DSA 441 Cloud ComputingWeek 2: Virtualization and VM Ware - DSA 441 Cloud Computing
Week 2: Virtualization and VM Ware - DSA 441 Cloud Computing
 
Sept 6 2021 BTech Artificial Intelligence and Data Science curriculum
Sept 6 2021 BTech Artificial Intelligence and Data Science curriculumSept 6 2021 BTech Artificial Intelligence and Data Science curriculum
Sept 6 2021 BTech Artificial Intelligence and Data Science curriculum
 
Hadoop in Alibaba Cloud
Hadoop in Alibaba CloudHadoop in Alibaba Cloud
Hadoop in Alibaba Cloud
 
Cloud Computing Essentials in Alibaba Cloud
Cloud Computing Essentials in Alibaba CloudCloud Computing Essentials in Alibaba Cloud
Cloud Computing Essentials in Alibaba Cloud
 
Transforming deep into transformers – a computer vision approach
Transforming deep into transformers – a computer vision approachTransforming deep into transformers – a computer vision approach
Transforming deep into transformers – a computer vision approach
 
Deep learning - Introduction
Deep learning - IntroductionDeep learning - Introduction
Deep learning - Introduction
 
Data wrangling week 11
Data wrangling week 11Data wrangling week 11
Data wrangling week 11
 

Dernier

FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxHimangsuNath
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfblazblazml
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataTecnoIncentive
 
Rithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfRithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfrahulyadav957181
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...Dr Arash Najmaei ( Phd., MBA, BSc)
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Boston Institute of Analytics
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectBoston Institute of Analytics
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxHaritikaChhatwal1
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxSimranPal17
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...Jack Cole
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelBoston Institute of Analytics
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 

Dernier (20)

FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptx
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
 
Rithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfRithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdf
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Programming for Data Analysis: Week 3

  • 1. Programming for Data Analysis Week 3 Dr. Ferdin Joe John Joseph Faculty of Information Technology Thai – Nichi Institute of Technology, Bangkok
  • 2. Today’s lesson • Pivoting • Binning • Replacing and Renaming • Laboratory
  • 3. Pivoting in pandas: pandas.DataFrame.pivot_table. Syntax: DataFrame.pivot_table(values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All')
  • 4. Parameters
  • 5. Output: a pivoted table in the form of a DataFrame
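As a minimal sketch of the call above, the following pivots a small made-up DataFrame with the default mean aggregation. The column names `month`, `sensor` and `temperature` and all values are illustrative assumptions, not taken from the slides.

```python
import pandas as pd

# Hypothetical readings; column names and values are invented for illustration.
readings = pd.DataFrame({
    "month": ["Jan", "Jan", "Jan", "Feb", "Feb", "Feb"],
    "sensor": ["internal", "internal", "external", "internal", "external", "external"],
    "temperature": [20.0, 22.0, 10.0, 24.0, 12.0, 14.0],
})

# One row per month, one column per sensor, each cell holding the mean temperature.
pivoted = readings.pivot_table(
    values="temperature",
    index="month",
    columns="sensor",
    aggfunc="mean",
)
print(pivoted)
```

The result is itself a DataFrame, so it can be indexed with `pivoted.loc["Jan", "internal"]` or passed on to further analysis.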
  • 10. Binning • When dealing with continuous numeric data, it is often helpful to bin the data into multiple buckets for further analysis. • Binning goes by several other names, including bucketing, discrete binning, discretization and quantization. • Pandas supports binning through the cut and qcut functions. • A histogram is the usual way to visualize binned data.
  • 11. Binning
  • 12. Binning • Pandas to process data • NumPy to work with arrays • Seaborn to visualize the histogram
  • 13. Qcut • qcut divides the data into quantile-based bins that each hold the same number of records, for example four quartiles. • When you ask for quintiles with qcut, the bins are chosen so that each bin holds the same number of records. With 30 records, each of the five bins should hold 6 (the breakpoints will differ between runs because the data comes from a random draw).
  • 14. Cut • cut chooses bins that are evenly spaced according to the values themselves, not the frequency of those values.
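The difference between the two functions can be sketched on 30 random records, as in the quintile example above. The seed and the exponential distribution are arbitrary assumptions made for this sketch; any continuous data would do.

```python
import numpy as np
import pandas as pd

# 30 random records; the seed and distribution are arbitrary choices for this sketch.
rng = np.random.default_rng(0)
values = pd.Series(rng.exponential(scale=10.0, size=30))

# qcut with q=5: quantile-based bins, so each bin receives the same number of records.
by_frequency = pd.qcut(values, q=5)
print(by_frequency.value_counts().sort_index())

# cut with bins=5: five intervals of equal width, so the counts per bin can differ.
by_width = pd.cut(values, bins=5)
print(by_width.value_counts().sort_index())
```

With skewed data such as this, the equal-width bins from `cut` tend to be crowded at the low end and sparse at the high end, while `qcut` keeps 6 records in every bin at the cost of unequal interval widths.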
  • 15. Binning – Read Data
  • 16. Binning
  • 17. Binning
  • 18. Binning
  • 22. Binning quantized in variable
  • 25. Naming Bins
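One way to name bins in pandas is the `labels` argument of `cut` (it is also accepted by `qcut`). The humidity values, the bin edges and the label names below are assumptions chosen only for illustration.

```python
import pandas as pd

# Made-up humidity readings in percent; values chosen only to illustrate labelling.
humidity = pd.Series([12, 35, 48, 55, 67, 71, 88, 93])

# Three explicit intervals: (0, 40], (40, 70], (70, 100], each given a readable name.
levels = pd.cut(
    humidity,
    bins=[0, 40, 70, 100],
    labels=["low", "medium", "high"],
)
print(levels.value_counts().sort_index())
```

The number of labels must be one less than the number of bin edges, since each label names the interval between two neighbouring edges.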
  • 26. Binning – Other applications • Image histograms
  • 27. Statistical Data Binning • Statistical data binning is a way to group a number of more or less continuous values into a smaller number of "bins". • For example, if you have data about a group of people, you might want to arrange their ages into a smaller number of age intervals (for example, grouping every five years together).
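The age example can be sketched with explicit five-year bin edges passed to `cut`. The ages and the 15 to 45 range are invented for illustration.

```python
import pandas as pd

# Invented ages for illustration.
ages = pd.Series([18, 21, 22, 24, 25, 29, 31, 34, 40])

# Bin edges every five years: [15, 20), [20, 25), ..., [40, 45).
edges = list(range(15, 50, 5))
age_groups = pd.cut(ages, bins=edges, right=False)

# Count how many people fall into each five-year interval.
print(age_groups.value_counts().sort_index())
```

Passing `right=False` makes each interval include its lower edge, so an age of 25 lands in [25, 30) rather than in the group below it.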
  • 28. Methods to divide Bins • Equal frequency binning • Equal width binning
  • 29. Equal frequency binning • Each bin holds the same number of records
  • 30. Equal Width Binning
  • 31. Code
  • 32. Advantages • Binning allows easy identification of outliers and of invalid and missing values of numerical variables.
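One way this advantage shows up in pandas: when `cut` is given explicit bin edges, values outside those edges and missing values both receive no bin (NaN), so they can be isolated in one step. The readings and the assumed valid range of 0 to 100 percent are made up for this sketch.

```python
import numpy as np
import pandas as pd

# Made-up humidity readings (percent) with one missing and two implausible values.
humidity = pd.Series([45.0, 52.0, np.nan, 61.0, -5.0, 130.0, 58.0])

# Assumed valid range is [0, 100]; anything else gets no bin and becomes NaN.
binned = pd.cut(humidity, bins=[0, 25, 50, 75, 100], include_lowest=True)

# Rows without a bin are either missing or outside the valid range.
suspect = humidity[binned.isna()]
print(suspect)
```

This flags only values outside the chosen edges; detecting outliers inside the valid range would need a separate rule.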
  • 33. DSA 207 - Binning • Create a pivot table to find the month-wise average of internal and external temperature, humidity and carbon monoxide levels in the fish data. • Visualize the binning of humidity levels in the fish data over a particular time of day in a month, using each of the following: 1. Qcut 2. Cut 3. Naming Bins