1. Programming for Data Analysis
Week 3
Dr. Ferdin Joe John Joseph
Faculty of Information Technology
Thai – Nichi Institute of Technology, Bangkok
2. Today’s lesson
Faculty of Information Technology, Thai - Nichi Institute of
Technology, Bangkok
• Pivoting
• Binning
• Replacing and Renaming
• Laboratory
10. Binning
• When dealing with continuous numeric data, it is often helpful to bin
the data into multiple buckets for further analysis.
• There are several different terms for binning including bucketing,
discrete binning, discretization or quantization.
• Pandas supports these approaches using the cut and qcut functions.
• A histogram is the chart most commonly used to visualize binned data.
12. Binning
• Pandas to process data
• NumPy for array calculations
• Seaborn to visualize histograms
13. Qcut
• qcut divides data into quantile-based bins, so that each bin holds (roughly) the same number of records; with q=4, for example, the bins are quartiles.
• When you ask qcut for quintiles (q=5), the bin edges are chosen so that every bin has the same number of records. With 30 records, each bin should hold 6. The breakpoints themselves depend on the data, so with randomly drawn values they differ from run to run.
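A minimal sketch of the quintile example above, assuming pandas and NumPy are installed. The 30 values are randomly generated stand-ins, not course data, so only the counts (not the breakpoints) are meaningful.

```python
import numpy as np
import pandas as pd

# 30 randomly drawn values (synthetic data for illustration only)
rng = np.random.default_rng(0)
values = pd.Series(rng.normal(loc=50, scale=10, size=30))

# q=5 asks for quintiles: edges sit at the 0%, 20%, ..., 100% quantiles
quintiles = pd.qcut(values, q=5)

# Count records per bin, keeping the bins in interval order
counts = quintiles.value_counts(sort=False)
print(counts)  # each of the 5 bins holds 30 / 5 = 6 records
```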
14. Cut
• cut chooses evenly spaced bin edges based on the range of the values themselves, not on how often those values occur, so the bins can contain very different numbers of records.
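A small sketch of equal-width binning with cut, using a made-up, skewed list of numbers. Because the edges follow the value range rather than the frequencies, most records pile into the first bin.

```python
import pandas as pd

# Hypothetical skewed data: most values are small, two are large
values = pd.Series([1, 2, 2, 3, 3, 3, 4, 5, 40, 100])

# bins=4 splits the range 1..100 into 4 intervals of equal width
equal_width = pd.cut(values, bins=4)

# Count records per bin in interval order (empty bins are kept)
counts = equal_width.value_counts(sort=False)
print(counts)
```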
15. Binning – Read Data
25. Naming Bins
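Bins can be given readable names through the labels parameter of pd.cut (pd.qcut accepts it too). A minimal sketch, assuming hypothetical humidity readings in percent and illustrative cut-off points of 40 and 70 that are not taken from the course data:

```python
import pandas as pd

humidity = pd.Series([35, 48, 55, 62, 71, 88])  # hypothetical readings in %

# Explicit edges give three bins: (0, 40], (40, 70], (70, 100]
# labels= needs exactly one name per bin, i.e. len(edges) - 1 names
levels = pd.cut(humidity, bins=[0, 40, 70, 100], labels=["low", "medium", "high"])
print(levels.tolist())
```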
26. Binning – Other applications
• Image histograms
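An image histogram is the same idea applied to pixels: it counts how many pixels fall into each intensity bin. A rough sketch with NumPy, assuming an 8-bit grayscale image; the image here is random noise, used only as a stand-in.

```python
import numpy as np

# Synthetic 64x64 8-bit grayscale "image" (random pixels, for illustration only)
rng = np.random.default_rng(1)
image = rng.integers(0, 256, size=(64, 64)).astype(np.uint8)

# One bin per intensity level 0..255; range=(0, 256) makes each bin 1 unit wide
hist, edges = np.histogram(image, bins=256, range=(0, 256))

print(hist.sum())  # every pixel lands in exactly one bin
```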
27. Statistical Data Binning
• Statistical data binning is a way to group numbers of more or less
continuous values into a smaller number of "bins".
• For example, if you have data about a group of people, you might
want to arrange their ages into a smaller number of age intervals (for
example, grouping every five years together).
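The age example can be sketched with pd.cut and explicit edges every five years. The ages below are hypothetical; right=False is a choice made here so that an age of exactly 20 falls in [20, 25) rather than (15, 20].

```python
import pandas as pd

ages = pd.Series([3, 17, 22, 24, 38, 41, 45, 67])  # hypothetical ages

# Edges 0, 5, 10, ..., 100; right=False gives intervals [0, 5), [5, 10), ...
edges = list(range(0, 105, 5))
age_groups = pd.cut(ages, bins=edges, right=False)
print(age_groups)
```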
28. Methods for dividing data into bins
• Equal frequency binning
• Equal width binning
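The two methods differ only in how the bin edges are placed. A minimal NumPy sketch, assuming k = 4 bins and a small made-up data set: equal-width edges split the range evenly, while equal-frequency edges sit at evenly spaced quantiles.

```python
import numpy as np

data = np.array([1, 2, 2, 3, 3, 3, 4, 5, 40, 100], dtype=float)  # hypothetical
k = 4  # number of bins

# Equal width: split [min, max] into k intervals of the same length
width_edges = np.linspace(data.min(), data.max(), k + 1)

# Equal frequency: place edges at the 0, 1/k, 2/k, ..., 1 quantiles
freq_edges = np.quantile(data, np.linspace(0, 1, k + 1))

print(width_edges)
print(freq_edges)
```

With skewed data like this, the equal-frequency edges crowd together where the values are dense, while the equal-width edges ignore where the values actually are.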
29. Equal frequency binning
• Each bin contains (approximately) the same number of observations
32. Advantages
• Binning makes it easy to identify outliers, invalid values and missing values of numerical variables.
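One way this shows up in practice: with explicit edges, pd.cut leaves any value outside every bin (and any missing value) as NaN, so those rows are easy to pull out. A sketch assuming hypothetical temperature readings where -999 stands for an invalid sensor code and the edges 15 to 35 °C are an illustrative valid range.

```python
import numpy as np
import pandas as pd

# Hypothetical readings in °C: -999 is an invalid code, NaN is missing, 95 is an outlier
temps = pd.Series([21.5, 23.0, 24.8, -999.0, 26.1, np.nan, 95.0])

# Values outside the explicit edges, and NaN, are not assigned to any bin
temp_bins = pd.cut(temps, bins=[15, 20, 25, 30, 35])
print(temp_bins.value_counts(dropna=False, sort=False))

# Rows without a bin: candidates for outliers, invalid or missing values
flagged = temps[temp_bins.isna()]
print(flagged)
```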
33. DSA 207 - Binning
• Create a pivot table to find the month-wise average of internal and external temperature, humidity and carbon monoxide levels in the fish data
• Visualize the binning of humidity levels in the fish data at a particular time of day within a month. Do it with each of the following:
• 1. Qcut
• 2. Cut
• 3. Naming Bins
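A possible starting point for the pivot-table task, sketched under heavy assumptions: the fish data's actual file and column names are not given here, so `timestamp`, `temp_internal`, `temp_external`, `humidity` and `co` are hypothetical, and the four rows are made up.

```python
import pandas as pd

# Tiny synthetic stand-in for the fish data; all column names are hypothetical
df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2021-01-05 08:00", "2021-01-20 14:00",
                                 "2021-02-03 08:00", "2021-02-17 14:00"]),
    "temp_internal": [26.0, 27.0, 25.0, 29.0],
    "temp_external": [30.0, 32.0, 31.0, 33.0],
    "humidity": [60.0, 70.0, 55.0, 65.0],
    "co": [1.0, 3.0, 2.0, 4.0],
})

# Extract the month number so it can serve as the pivot index
df["month"] = df["timestamp"].dt.month

# Month-wise averages: one row per month, one column per measurement
monthly = df.pivot_table(index="month",
                         values=["temp_internal", "temp_external", "humidity", "co"],
                         aggfunc="mean")
print(monthly)
```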