EDA Visualization
Orozco Hsu
2023-10-31
About me
• Education
• NCU (MIS), NCCU (CS)
• Work Experience
• Telecom big data Innovation
• AI projects
• Retail marketing technology
• User Group
• TW Spark User Group
• TW Hadoop User Group
• Taiwan Data Engineer Association Director
• Research
• Big Data / ML / AIoT / AI columnist
Tutorial Content
• Data Science Process
• Exploratory Data Analysis and Visualization
• Homework
Code
• Download materials:
• https://drive.google.com/drive/folders/1KPC3K19_vJgRb5Op5M9bqQYJEh2zvrnr?usp=sharing
Get your Orange 3 ready
• Version: 3.36.1
The Pyramid of Data Needs (and why it matters for your career) | by Hugh Williams | Medium
Static chart
• There are generally three steps in drawing a chart:
• Observe the data, determine the relationship, and select the chart.
• Identify what type of data it is and what content you want to express.
• Category
• Numeric
• Text
• Datetime
• After clarifying the content to be expressed, choose the chart that
expresses it best.
Pie chart
• You must have some kind of whole
amount that is divided into a number
of distinct parts.
• Your primary objective in a pie chart
should be to compare each group’s
contribution to the whole.
Line chart
• Line charts provide the clearest
graphical representation of time-
related variables and are the
preferred mode for representing
trends or variables over time.
Histogram chart
• It is used to summarize discrete
or continuous data that are
measured on an interval scale.
• It is often used to illustrate the
major features of the distribution
of the data in a convenient form.
Bar chart
• It shows data values as bars so
that multiple categories or
data sets can be compared
side by side.
Differences between histogram and bar chart
Comparison terms | Bar chart | Histogram
Usage | To compare different categories of data. | To display the distribution of a variable.
Type of variable | Categorical variables | Numeric variables
Rendering | Each data point is rendered as a separate bar. | The data points are grouped and rendered based on the bin value. The entire range of data values is divided into a series of non-overlapping intervals.
Space between bars | Can have space. | No space.
Reordering bars | Can be reordered. | Cannot be reordered.
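The difference in rendering can be sketched in a few lines of Python. This is a minimal illustration, not part of the original tutorial: the helper names `category_counts` and `histogram_counts` are hypothetical, the sample values are made up, and the bin width of 10 is an arbitrary choice.

```python
from collections import Counter


def category_counts(values):
    """Bar-chart view: one bar (count) per distinct category."""
    return dict(Counter(values))


def histogram_counts(values, bin_width, start=0.0):
    """Histogram view: group numeric values into non-overlapping bins.

    Each bin is the half-open interval [low, low + bin_width).
    Returns {low_edge: count} sorted by low edge; empty bins are omitted.
    """
    counts = Counter()
    for v in values:
        index = int((v - start) // bin_width)
        counts[start + index * bin_width] += 1
    return dict(sorted(counts.items()))


# Made-up sample values, for illustration only.
embarked = ["S", "C", "S", "Q", "S", "C"]
ages = [2, 15, 22, 24, 31, 38, 39, 54]

bars = category_counts(embarked)             # categories can be reordered freely
bins = histogram_counts(ages, bin_width=10)  # bins keep their numeric order
```

The category counts can be sorted in any order without changing their meaning, whereas the bins only make sense in ascending order of their edges.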
Scatter Plot
• It uses dots to
represent values for
two different numeric
variables so that the
relationship between
them can be observed.
Box plot
• Q1: The first quartile (25th percentile).
• Q3: The third quartile (75th percentile).
• Interquartile range (IQR): Q3 − Q1.
• Lower and upper whiskers: These extend
up to 1.5 × IQR beyond Q1 and Q3 and mark
the boundaries for outliers.
• Outliers: Observations that fall below
Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR.
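As a worked example, the quartiles, fences, and outliers can be computed with Python's standard `statistics` module. This is a sketch under stated assumptions: the function name `box_plot_stats` is hypothetical, the sample is made up, and the "inclusive" quantile method is one of several conventions, so other tools may report slightly different quartiles.

```python
import statistics


def box_plot_stats(values):
    """Return the quartiles, IQR, 1.5*IQR fences, and outliers of `values`."""
    # Assumption: the "inclusive" method; quartile conventions differ by tool.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
    }


# Made-up sample with one obviously extreme value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
stats = box_plot_stats(sample)
```

For this sample, Q1 is 3.25 and Q3 is 7.75, so the IQR is 4.5 and the fences sit at −3.5 and 14.5; only the value 100 falls outside them.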
Dataset description
• This dataset is used to
predict whether a passenger
survived the Titanic
disaster
Data Summary
• Load titanic.csv
• Data description
• Names, Types, Role, Values
• Change the Columns
Data Summary
• Missing values
• Use the Feature Statistics
widget
• What is the ratio of missing
values in each column?
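Outside Orange, the same check can be sketched in plain Python. The function name `missing_ratios`, the column names, and the rows below are all made up for illustration; they are not taken from titanic.csv, and treating empty strings as missing is an assumption about how the file encodes gaps.

```python
def missing_ratios(rows):
    """Return {column: fraction of rows whose value is missing}.

    Assumption: a value counts as missing when it is "" or None.
    """
    if not rows:
        return {}
    total = len(rows)
    return {
        column: sum(1 for row in rows if row[column] in ("", None)) / total
        for column in rows[0]
    }


# Made-up rows, for illustration only.
rows = [
    {"Age": "22", "Cabin": "", "Embarked": "S"},
    {"Age": "", "Cabin": "", "Embarked": "C"},
    {"Age": "35", "Cabin": "C123", "Embarked": "S"},
    {"Age": "54", "Cabin": "", "Embarked": ""},
]
ratios = missing_ratios(rows)
```

A column with a high ratio (here the hypothetical `Cabin`, at 0.75) is a candidate for removal, while a column with a low ratio is a candidate for imputation.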
Preprocess (Remove or Impute columns)
• Remove columns
Preprocess (Remove or Impute columns)
• Impute columns
• Set a default method
• Override it for individual columns
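The idea of a default method with per-column overrides can be sketched as follows. This is an illustration only: the function names, the method names "mean" and "most_frequent", and the sample table are assumptions and do not reproduce how Orange implements its Impute widget.

```python
from collections import Counter
from statistics import mean


def impute_column(values, method):
    """Fill None entries using 'mean' (numeric) or 'most_frequent'."""
    present = [v for v in values if v is not None]
    if method == "mean":
        fill = mean(present)
    elif method == "most_frequent":
        fill = Counter(present).most_common(1)[0][0]
    else:
        raise ValueError(f"unknown method: {method}")
    return [fill if v is None else v for v in values]


def impute(table, default_method, per_column=None):
    """Apply default_method to every column unless per_column overrides it."""
    per_column = per_column or {}
    return {
        name: impute_column(values, per_column.get(name, default_method))
        for name, values in table.items()
    }


# Made-up table (column name -> values), for illustration only.
table = {
    "Age": [22, None, 38, 30],
    "Embarked": ["S", "C", None, "S"],
}
filled = impute(
    table,
    default_method="mean",
    per_column={"Embarked": "most_frequent"},
)
```

Here the missing age is replaced by the mean of the known ages, and the missing port by the most common port.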
Pie chart
• Orange 3 has deprecated the
Pie chart widget
• Use a Python script instead
• Find the output file
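A minimal sketch of such a script is shown below. It is not the script distributed with the tutorial materials: the function names, the class counts, and the output file name `pie_chart.png` are made up for illustration, and `save_pie` assumes matplotlib is installed.

```python
def shares(counts):
    """Convert {label: count} into {label: percentage of the whole}."""
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}


def save_pie(counts, path):
    """Draw a pie chart of `counts` and write it to `path`.

    Assumption: matplotlib is available; it is imported here so the
    rest of the script still works without it.
    """
    import matplotlib
    matplotlib.use("Agg")  # render straight to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.pie(list(counts.values()), labels=list(counts.keys()), autopct="%1.1f%%")
    ax.set_title("Passengers by class")
    fig.savefig(path)
    plt.close(fig)


# Made-up counts, for illustration only.
class_counts = {"First": 2, "Second": 3, "Third": 5}
percentages = shares(class_counts)
# save_pie(class_counts, "pie_chart.png")  # uncomment to write the image file
```

After `save_pie` runs, the image is written relative to the script's working directory, which is where to look for the output file.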
Line chart
• Typically, trend analysis
charts are presented
together with time-based
data
Distribution chart
• Used to present how frequently
each value occurs
• In Orange 3, both numeric
and categorical data can be
presented here
• The bar chart widget is used
less often than the others
Scatter plot
• It is used to observe the degree
of correlation between
features
• positive correlation
• negative correlation
• no correlation
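The degree of correlation seen in a scatter plot can also be quantified. The sketch below computes the Pearson correlation coefficient by hand; the function names, the sample values, and the 0.1 threshold used to label a result are assumptions chosen for illustration, not values from the tutorial.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def describe(r, threshold=0.1):
    """Label r; the threshold is an arbitrary illustrative cut-off."""
    if r > threshold:
        return "positive correlation"
    if r < -threshold:
        return "negative correlation"
    return "no correlation"


# Made-up samples, for illustration only.
xs = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]
falling = [10, 8, 6, 4, 2]
scattered = [2, 1, 3, 1, 2]

labels = [describe(pearson_r(xs, ys)) for ys in (rising, falling, scattered)]
```

A coefficient near +1 or −1 corresponds to points lying close to a rising or falling line, and a coefficient near 0 to a cloud with no linear trend.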
Box plot
• Comparing multiple
features with each other
Pivot Table
• It summarizes the data
of a more extensive
table into a table of
statistics.
• The statistics can include
sums, averages, counts,
etc.
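What a pivot table computes can be sketched with a small grouping function. The name `pivot`, its parameters, the column names, and the rows are all hypothetical and made up for illustration; they are not read from titanic.csv.

```python
from collections import defaultdict
from statistics import mean


def pivot(rows, index, columns, values, aggregate):
    """Summarise `values` for each (index, columns) pair with `aggregate`."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row[index], row[columns])].append(row[values])
    return {key: aggregate(vals) for key, vals in groups.items()}


# Made-up rows, for illustration only.
rows = [
    {"Sex": "female", "Pclass": 1, "Survived": 1},
    {"Sex": "female", "Pclass": 3, "Survived": 0},
    {"Sex": "female", "Pclass": 3, "Survived": 1},
    {"Sex": "male", "Pclass": 1, "Survived": 0},
    {"Sex": "male", "Pclass": 3, "Survived": 0},
    {"Sex": "male", "Pclass": 3, "Survived": 0},
]

counts = pivot(rows, "Sex", "Pclass", "Survived", len)     # passengers per cell
survivors = pivot(rows, "Sex", "Pclass", "Survived", sum)  # survivors per cell
rates = pivot(rows, "Sex", "Pclass", "Survived", mean)     # survival rate per cell
```

Swapping the aggregate function is all it takes to move between counts, sums, and averages, which is the same choice the pivot table offers.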
1. Show me the top 10 data rows
• Hint: Use Data Sampler widget
2. Show me the dataset info
• How many Rows?
• How many Features?
• All information like this!
3. Get a count of the number of survivors
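Exercises 1 to 3 can be cross-checked outside Orange with Python's standard `csv` module. The sketch below uses a made-up, three-row stand-in for titanic.csv; the column names (including `Survived`) and the encoding of survivors as "1" are assumptions about the file.

```python
import csv
import io
from itertools import islice

# Made-up stand-in for titanic.csv; column names are assumptions.
SAMPLE_CSV = """PassengerId,Survived,Pclass,Sex,Age
1,0,3,male,22
2,1,1,female,38
3,1,3,female,26
"""


def load_rows(text_file):
    """Read a CSV file object into a list of {column: value} dicts."""
    return list(csv.DictReader(text_file))


rows = load_rows(io.StringIO(SAMPLE_CSV))

top_rows = list(islice(rows, 10))         # first 10 rows (fewer if the file is short)
n_rows = len(rows)                        # how many rows?
n_features = len(rows[0]) if rows else 0  # how many columns?
survivors = sum(1 for row in rows if row["Survived"] == "1")
```

For a real file, `load_rows` can be called on `open("titanic.csv", newline="")` instead of the in-memory sample.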
4. Survival Conclusion
• For the features SEX, PCLASS, SIBSP,
PARCH, and EMBARKED:
• Women had a higher chance of survival
than men.
• First-class passengers had a higher
chance of survival.
• Passengers with siblings or spouses aboard
had a higher chance of survival.
• Passengers with children or parents aboard
had a higher chance of survival.
• Passengers who embarked at port S
(Southampton) tended to be in a lower
cabin class and had lower chances of survival.
5. Show me the survival rate by sex
6. Look at survival rate by SEX and PCLASS
• Women in first class had a survival rate as high as 96.8%. In contrast,
men in third (economy) class had only a 13.54% chance of survival.
7. Look at survival rate by SEX, AGE and
PCLASS
• In this dataset, women in first or
second (business) class had a 90%
chance of survival regardless of age.
• On the other hand, a man in third
(economy) class who was older than 18
had only a 13.36% chance of survival.
• To summarize, girls and women had a
higher chance of survival than boys
and men.
• Additionally, the higher the class (such
as first class), the higher the chances
of survival.
8. The price paid for each class
• Try plotting Pclass against Fare
to visualize the data
• Every class had someone who boarded
for free, while others spent over
500 pounds for a first-class
ticket. It's quite an interesting
observation!
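Before plotting, the lowest and highest fare per class can be checked with a short script. This is a sketch under stated assumptions: the function name `fare_range_by_class` is hypothetical, the column names `Pclass` and `Fare` are assumed, and the rows are made up, so the numbers below are not the real Titanic fares.

```python
from collections import defaultdict


def fare_range_by_class(rows):
    """Return {pclass: (lowest fare, highest fare)}, ordered by class."""
    fares = defaultdict(list)
    for row in rows:
        fares[row["Pclass"]].append(row["Fare"])
    return {pclass: (min(v), max(v)) for pclass, v in sorted(fares.items())}


# Made-up rows, for illustration only.
rows = [
    {"Pclass": 1, "Fare": 0.0},
    {"Pclass": 1, "Fare": 80.0},
    {"Pclass": 2, "Fare": 13.0},
    {"Pclass": 2, "Fare": 0.0},
    {"Pclass": 3, "Fare": 7.25},
    {"Pclass": 3, "Fare": 0.0},
]
ranges = fare_range_by_class(rows)
```

A minimum of 0 in a class is the kind of result that points to passengers who boarded without paying a fare.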