2. About me
• Education
• NCU (MIS), NCCU (CS)
• Work Experience
• Telecom big data Innovation
• AI projects
• Retail marketing technology
• User Group
• TW Spark User Group
• TW Hadoop User Group
• Taiwan Data Engineer Association Director
• Research
• Big Data / ML / AIoT / AI Columnist
6. 6
The Pyramid of Data Needs (and why it matters for your career) | by Hugh Williams | Medium
8. Static chart
• There are generally THREE STEPS in drawing a chart:
• Observe the data, determine the relationship, and select the chart.
• Identify what type of data it is and what content you want to express.
• Category
• Numeric
• Text
• Datetime
• After clarifying the content to be expressed, you can choose which chart to
use to express it.
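The first step, identifying what type of data a column holds, can be sketched in plain Python. This is only an illustration: the helper name `infer_kind` and the rule that separates a category from free text (few distinct labels relative to the number of rows) are assumptions, not something Orange 3 or the slides define.

```python
from datetime import date

def infer_kind(values):
    """Classify a column as 'datetime', 'numeric', 'category', or 'text'."""
    if all(isinstance(v, date) for v in values):
        return "datetime"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "numeric"
    # Assumption: a string column whose labels repeat often is a category;
    # a column of mostly unique strings is treated as free text.
    if len(set(values)) <= max(1, len(values) // 2):
        return "category"
    return "text"

# Toy columns, made up for illustration only.
kinds = {
    "Fare": infer_kind([7.25, 71.28, 8.05]),
    "Embarked": infer_kind(["S", "C", "S", "S"]),
    "Name": infer_kind(["Name one", "Name two", "Name three"]),
    "Date": infer_kind([date(2024, 1, 1), date(2024, 1, 2)]),
}
```

Once each column's kind is known, the chart types on the following slides can be matched to it.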
9. Pie chart
• You must have some kind of whole
amount that is divided into a number
of distinct parts.
• Your primary objective in a pie chart
should be to compare each group’s
contribution to the whole.
10. Line chart
• Line charts provide the clearest
graphical representation of time-
related variables and are the
preferred mode for representing
trends or variables over time.
11. Histogram chart
• It is used to summarize discrete
or continuous data that are
measured on an interval scale.
• It is often used to illustrate the
major features of the distribution
of the data in a convenient form.
12. Bar chart
• It shows data values as bars so that multiple categories or data sets can be compared side by side.
13. Differences between histogram and bar chart
Comparison terms | Bar chart | Histogram
Usage | To compare different categories of data. | To display the distribution of a variable.
Type of variable | Categorical variables | Numeric variables
Rendering | Each data point is rendered as a separate bar. | The entire range of data values is divided into a series of non-overlapping intervals (bins); the data points are grouped and rendered based on the bin value.
Space between bars | Can have space. | No space.
Reordering bars | Can be reordered. | Cannot be reordered.
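The rendering difference in the comparison can be made concrete with a small sketch that prepares the counts each chart would draw. The function names, the toy values, and the choice of a fixed bin width are assumptions made for illustration.

```python
from collections import Counter

def bar_counts(labels):
    """Bar chart input: one bar per distinct category."""
    return dict(Counter(labels))

def histogram_counts(values, bin_width, start=0):
    """Histogram input: counts per non-overlapping interval [lo, lo + bin_width)."""
    counts = {}
    for v in values:
        index = int((v - start) // bin_width)
        lo = start + index * bin_width
        counts[lo] = counts.get(lo, 0) + 1
    # Bins are ordered by their lower edge; unlike bars they cannot be reordered.
    return dict(sorted(counts.items()))

embarked = ["S", "C", "S", "Q", "S"]   # toy categorical values
ages = [4, 22, 25, 31, 38, 39, 54]     # toy numeric values

bars = bar_counts(embarked)
bins = histogram_counts(ages, bin_width=10)
```

Here `bars` has one entry per category, while `bins` is keyed by the lower edge of each interval, so its order carries meaning.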
14. Scatter Plot
• It uses dots to represent the values of two different numeric variables so that relationships between the variables can be observed.
15. Box plot
• Q1: The first quartile (25%) position.
• Q3: The third quartile (75%) position.
• Interquartile range (IQR): Q3 − Q1.
• Lower and upper 1.5*IQR whiskers: These mark the boundaries beyond which observations are treated as outliers.
• Outliers: Defined as observations that
fall below Q1 − 1.5 IQR or above Q3 +
1.5 IQR.
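The box plot quantities above can be computed with Python's standard library. This is a minimal sketch: the helper name `box_plot_stats` is hypothetical, and the quartiles use the "inclusive" interpolation method, which is only one of several common conventions, so other tools may report slightly different Q1 and Q3 values.

```python
from statistics import quantiles

def box_plot_stats(values):
    """Return Q1, Q3, IQR, the 1.5*IQR limits, and the outliers."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr   # observations below this are outliers
    upper = q3 + 1.5 * iqr   # observations above this are outliers
    outliers = [v for v in values if v < lower or v > upper]
    return {"q1": q1, "q3": q3, "iqr": iqr,
            "lower": lower, "upper": upper, "outliers": outliers}

# Toy values chosen so that one observation is clearly extreme.
stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])
```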
20. Preprocess (Remove or Impute columns)
• Impute columns
• For Default Method
• For each column
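The two imputation options, one default method plus a per-column choice, can be sketched outside Orange as follows. The function names and toy rows are hypothetical, and the default used here (mean for numeric columns, most frequent value otherwise) is an assumption about a typical setting rather than a statement of what the Orange 3 widget does.

```python
from statistics import mean, median, mode

def default_fill(values):
    """Assumed default: mean for numeric columns, most frequent value otherwise."""
    if all(isinstance(v, (int, float)) for v in values):
        return mean(values)
    return mode(values)

def impute(rows, overrides=None):
    """Fill None values column by column; overrides maps a column to its own method."""
    overrides = overrides or {}
    filled = [dict(row) for row in rows]          # leave the input untouched
    for col in rows[0].keys():
        present = [row[col] for row in rows if row[col] is not None]
        fill_value = overrides.get(col, default_fill)(present)
        for row in filled:
            if row[col] is None:
                row[col] = fill_value
    return filled

# Toy rows, made up for illustration only.
toy_rows = [
    {"Age": 20, "Embarked": "S"},
    {"Age": None, "Embarked": "S"},
    {"Age": 30, "Embarked": None},
    {"Age": 70, "Embarked": "C"},
]
with_default = impute(toy_rows)
with_override = impute(toy_rows, overrides={"Age": median})
```

With the default method the missing age becomes the mean; with the per-column override it becomes the median instead.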
21. Pie chart
• Orange 3 has deprecated the Pie chart widget
• Use a Python script instead
• Find the output file
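A script of this kind might look like the sketch below. It assumes matplotlib is installed; the function names, the output file name `pie_chart.png`, and the sample labels are all hypothetical. Inside Orange 3 the labels would come from the script's input data rather than from a hard-coded list.

```python
from collections import Counter

def category_counts(labels):
    """Count how often each category occurs; these become the pie slices."""
    return Counter(labels)

def save_pie(labels, path="pie_chart.png"):
    """Draw the slices with matplotlib and write them to an image file."""
    import matplotlib
    matplotlib.use("Agg")              # render off-screen, no window needed
    import matplotlib.pyplot as plt

    counts = category_counts(labels)
    fig, ax = plt.subplots()
    ax.pie(list(counts.values()), labels=list(counts.keys()), autopct="%1.1f%%")
    ax.set_title("Share of each category")
    fig.savefig(path)
    plt.close(fig)
    return path                        # the output file to look for afterwards

# Hypothetical labels, for illustration only.
sample_labels = ["A", "B", "A", "C", "A", "B"]
slices = category_counts(sample_labels)
```

Calling `save_pie(sample_labels)` writes the chart to the current working directory, which is where to look for the output file.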
22. Line chart
• Typically, trend analysis
charts are presented
together with time-based
data
23. Distribution chart
• Used to present values sorted by frequency
• In Orange 3, both numeric and categorical data can be presented here
• The Bar chart widget is used less often than the others
24. Scatter plot
• It is used to observe the degree of correlation between features
• Positive correlation
• Negative correlation
• No correlation
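The degree of correlation that a scatter plot shows visually can also be expressed as a number. The sketch below computes the Pearson correlation coefficient by hand. The `describe` helper and its threshold of 0.1 are arbitrary assumptions for illustration, not a standard cut-off.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def describe(r, threshold=0.1):
    """Map a coefficient to the three cases; the threshold is an assumption."""
    if r > threshold:
        return "positive correlation"
    if r < -threshold:
        return "negative correlation"
    return "no correlation"

xs = [1, 2, 3, 4, 5]                       # toy feature values
rising = pearson(xs, [2, 4, 6, 8, 10])     # points climb together
falling = pearson(xs, [10, 8, 6, 4, 2])    # one rises as the other falls
flat = pearson(xs, [2, 1, 3, 1, 2])        # no linear pattern
```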
26. Pivot Table
• It summarizes the data
of a more extensive
table into a table of
statistics.
• The statistics can include
sums, averages, counts,
etc.
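As a rough illustration of what a pivot table computes, the sketch below groups rows by two columns and applies a statistic to a third. The `pivot` function and the toy rows are hypothetical; the column names merely echo the Titanic-style features used in the exercises.

```python
from collections import defaultdict
from statistics import mean

def pivot(rows, index, column, value, aggregate):
    """Summarize rows into table[index_value][column_value] = aggregate(values)."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row[index], row[column])].append(row[value])
    table = defaultdict(dict)
    for (row_key, col_key), values in groups.items():
        table[row_key][col_key] = aggregate(values)
    return {row_key: dict(cols) for row_key, cols in table.items()}

# Toy rows, made up for illustration only.
toy_rows = [
    {"Sex": "female", "Pclass": 1, "Fare": 80.0},
    {"Sex": "female", "Pclass": 3, "Fare": 8.0},
    {"Sex": "male", "Pclass": 1, "Fare": 50.0},
    {"Sex": "male", "Pclass": 3, "Fare": 7.0},
    {"Sex": "male", "Pclass": 3, "Fare": 9.0},
]
counts = pivot(toy_rows, "Sex", "Pclass", "Fare", len)      # counts
averages = pivot(toy_rows, "Sex", "Pclass", "Fare", mean)   # averages
totals = pivot(toy_rows, "Sex", "Pclass", "Fare", sum)      # sums
```

Swapping the `aggregate` argument is all it takes to move between sums, averages, and counts.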
27. 1. Show me top 10 data rows
• Hint: Use Data Sampler widget
28. 2. Show me dataset info
• How many Rows?
• How many Features?
• All information like this!
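Outside Orange, the same questions (first rows, number of rows, number of features) can be answered with a few lines of standard-library Python. The inline CSV and the `dataset_info` helper are made up for illustration. Note that this sketch counts every column as a feature, including identifiers and the target, which a data-mining tool may report separately.

```python
import csv
import io

# Toy CSV text, for illustration only.
TOY_CSV = """PassengerId,Survived,Pclass,Sex,Age
1,0,3,male,22
2,1,1,female,38
3,1,3,female,26
"""

def dataset_info(csv_text, preview=10):
    """Return row count, column count, column names, and the first rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return {
        "rows": len(rows),
        "features": len(reader.fieldnames),
        "feature_names": list(reader.fieldnames),
        "head": rows[:preview],        # at most `preview` rows from the top
    }

info = dataset_info(TOY_CSV)
```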
30. 4. Survival Conclusion
• For features, SEX, PCLASS, SIBSP,
PARCH, EMBARKED
• Women had a higher chance of survival
than men.
• First-class passengers had a higher
chance of survival.
• Passengers with siblings or spouses aboard had a higher chance of survival.
• Passengers with children or parents aboard had a higher chance of survival.
• Embarking at port S (Southampton) may be associated with a lower cabin class and lower chances of survival.
32. 6. Look at survival rate by SEX and PCLASS
• Women in first class had a survival rate as high as 96.8%. In contrast,
men in economy class only had a 13.54% chance of survival
33. 7. Look at survival rate by SEX, AGE and
PCLASS
• In the event of a disaster, women in
first class or business class have a 90%
chance of survival regardless of age.
• On the other hand, if a man is in
economy class and older than 18, the
chance of survival is only 13.36%.
• To summarize, in a disaster scenario,
girls and women have a higher chance
of survival compared to boys and men.
• Additionally, the higher the class (such
as first class), the higher the chances
of survival.
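A survival rate broken down by sex, age group, and class is a grouped average of the survival flag. The sketch below shows the calculation on a handful of invented rows, so its numbers are not the percentages quoted above. The function names are hypothetical, and the age split at 18 mirrors the threshold mentioned in the slide. Rows with a missing age would need to be imputed or skipped first.

```python
from collections import defaultdict

def survival_rates(rows, key):
    """Share of survivors within each group returned by key(row)."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for row in rows:
        group = key(row)
        totals[group] += 1
        survivors[group] += row["Survived"]   # 1 = survived, 0 = did not
    return {group: survivors[group] / totals[group] for group in totals}

def sex_age_class(row):
    """Group by sex, adult/child (older than 18 or not), and passenger class."""
    age_group = "adult" if row["Age"] > 18 else "child"
    return (row["Sex"], age_group, row["Pclass"])

# Invented rows, for illustration only; not the real passenger data.
toy_rows = [
    {"Sex": "female", "Age": 30, "Pclass": 1, "Survived": 1},
    {"Sex": "female", "Age": 8, "Pclass": 1, "Survived": 1},
    {"Sex": "male", "Age": 40, "Pclass": 3, "Survived": 0},
    {"Sex": "male", "Age": 25, "Pclass": 3, "Survived": 0},
    {"Sex": "male", "Age": 33, "Pclass": 3, "Survived": 1},
    {"Sex": "male", "Age": 5, "Pclass": 3, "Survived": 1},
]
rates = survival_rates(toy_rows, sex_age_class)
```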
34. 8. The price paid in each class
• Try plotting a chart of Pclass against Fare to visualize the data
• In every class someone boarded for free, while others spent over 500 pounds for a first-class ticket. It's quite an interesting observation!