SlideShare une entreprise Scribd logo
1  sur  20
Télécharger pour lire hors ligne
Feature Engineering for IoT
Darryl Ng
#ISSLearningFest
Rise of IoT
#ISSLearningFest
https://www.statista.com/statistics/1183457/iot-connected-devices-worldwide/
IoT Reference Architecture
#ISSLearningFest
https://docs.microsoft.com/en-us/azure/architecture/reference-architectures/iot
Sense Connect Collect Process Act
Devices generate events
•Through platform to
application
Insights based on data
•Derived by evaluating
incoming device events
Actions based on insights
•Execute processes and
workflows in the application
DATA
IoT and Cloud Providers
1. Capabilities added to the devices
a. Device side processing
• Real-time analytics, edge ML capabilities
2. Gateway to communicate with downstream, heterogeneous devices
3. Cloud services
a. Device management capabilities, i.e. device shadowing, provisioning,
OTA updates, security
a. Stream processing
b. Big data stack
• Analytics and visualization
#ISSLearningFest
Cloud-centric Device/Gateway-
centric
Handling Data
#ISSLearningFest
Volume
Velocity
Variety
Veracity
Value
Data
reduction
Data
transformation
Data
integration
Data cleaning
Data
discretization
Data Collection
• Data collection can be a significant effort
in machine learning
• Types of Data
• Historical Data (e.g. past weather)
• Generated data (e.g. weather from sensors)
• Manually collected (e.g. observe or visual
inputs at different times of the day)
• Collect data to infer its probability
distribution
• Generate more data from the probability
distribution
#ISSLearningFest
Feature Engineering
• Extracting features out of data and transforming them into something
that can be used as a learning model in machine learning algorithm
• Accuracy of machine learning model depends on the quality of data
used for learning
• Good Features => Model learns quickly
• Bad Features => Model doesn’t learn
#ISSLearningFest
Features, Samples and Label
date precipitation temp_max temp_min wind weather
1/1/2012 0 12.8 5 4.7drizzle
1/2/2012 10.9 10.6 2.8 4.5rain
1/3/2012 0.8 11.7 7.2 2.3rain
1/4/2012 20.3 12.2 5.6 4.7rain
1/5/2012 1.3 8.9 2.8 6.1rain
1/6/2012 2.5 4.4 2.2 2.2rain
1/7/2012 0 7.2 2.8 2.3rain
1/8/2012 0 10 2.8 2sun
1/9/2012 4.3 9.4 5 3.4rain
1/10/2012 1 6.1 0.6 3.4rain
1/11/2012 0 6.1 -1.1 5.1sun
1/12/2012 0 6.1 -1.7 1.9sun
1/13/2012 0 5 -2.8 1.3sun
1/14/2012 0 16.1 1.7 4.3sun
1/15/2012 0 21.1 7.2 4.1sun
1/16/2012 0 20 6.1 2.1sun
1/17/2012 0 14.4 3.9 3sun
1/18/2012 0 18.3 4.4 4.3sun
1/19/2012 0 25.6 12.8 2.2drizzle
1/20/2012 0 18.9 13.9 2.8drizzle
1/21/2012 0 22.2 13.3 1.7drizzle
#ISSLearningFest
sample
features label
Imputation
Categories of missing data:
1. Missing at Random (MAR)
• More data available on a different sample.
2. Missing Completely at Random
• No relationship exists between missing values and
other observations.
3. Missing Not at Random
• There’s a reason why the values are missing and
records should be flagged.
• Numerical
• Categorical
#ISSLearningFest
date precipitation temp_max temp_min wind
1/1/2012 0 12.8 5 4.7
1/2/2012 10.9 10.6 2.8 4.5
1/3/2012 0.8 11.7 7.2 2.3
1/4/2012 20.3 12.2 5.6 4.7
1/5/2012 1.3 8.9 2.8 6.1
1/6/2012 2.5 4.4 2.2 2.2
1/7/2012 0 7.2 2.8 2.3
1/8/2012 0 10 2.8 2
1/9/2012 4.3 9.4 5 3.4
1/10/2012 1 6.1 0.6 3.4
1/11/2012 0 6.1 -1.1 5.1
1/12/2012 0 6.1 -1.7 1.9
1/13/2012 0 5 -2.8 1.3
1/14/2012 0 16.1 1.7 4.3
1/15/2012 0 21.1 7.2 4.1
1/16/2012 20 6.1 2.1
1/17/2012 14.4 3.9 3
1/18/2012 18.3 4.4 4.3
1/19/2012 0 25.6 12.8 2.2
1/20/2012 0 18.9 13.9 2.8
1/21/2012 0 22.2 13.3 1.7
date precipitation temp_max temp_min wind
1/1/2012 0 12.8 5 4.7
1/2/2012 10.9 10.6 2.8 4.5
1/3/2012 0.8 11.7 7.2 2.3
1/4/2012 20.3 12.2 5.6 4.7
1/5/2012 1.3 8.9 2.8 6.1
1/6/2012 2.5 4.4 2.2 2.2
1/7/2012 0 7.2 2.8 2.3
1/8/2012 0 10 2.8 2
1/9/2012 4.3 9.4 5 3.4
1/10/2012 1 6.1 0.6 3.4
1/11/2012 0 6.1 -1.1 5.1
1/12/2012 0 6.1 -1.7 1.9
1/13/2012 0 5 -2.8 1.3
1/14/2012 0 16.1 1.7 4.3
1/15/2012 0 21.1 7.2 4.1
1/16/2012 0 20 6.1 2.1
1/17/2012 0 14.4 3.9 3
1/18/2012 0 18.3 4.4 4.3
1/19/2012 0 25.6 12.8 2.2
1/20/2012 0 18.9 13.9 2.8
1/21/2012 0 22.2 13.3 1.7
Handling Outliers
• Removal
• Replacing values
• Capping
• Discretization
• Binning
#ISSLearningFest
date precipitation temp_max temp_min wind
1/1/2012 0 12.8 5 4.7
1/2/2012 10.9 10.6 2.8 4.5
1/3/2012 0.8 11.7 7.2 2.3
1/4/2012 20.3 12.2 5.6 4.7
1/5/2012 1.3 8.9 2.8 6.1
1/6/2012 2.5 4.4 2.2 2.2
1/7/2012 0 7.2 2.8 2.3
1/8/2012 0 10 2.8 2
1/9/2012 4.3 9.4 5 3.4
1/10/2012 1 6.1 0.6 3.4
1/11/2012 0 6.1 -1.1 5.1
1/12/2012 0 6.1 -1.7 1.9
1/13/2012 0 5 -2.8 1.3
1/14/2012 0 16.1 1.7 4.3
1/15/2012 0 21.1 7.2 4.1
1/16/2012 0 20 6.1 2.1
1/17/2012 0 14.4 3.9 3
1/18/2012 0 18.3 4.4 4.3
1/19/2012 0 25.6 12.8 2.2
1/20/2012 0 18.9 13.9 2.8
1/21/2012 0 22.2 13.3 1.7
date precipitation temp_max temp_min wind
1/1/2012 0 12.8 5 4.7
1/2/2012 10.9 10.6 2.8 4.5
1/3/2012 0.8 11.7 7.2 2.3
1/4/2012 20.3 12.2 5.6 4.7
1/5/2012 1.3 8.9 2.8 6.1
1/6/2012 2.5 4.4 2.2 2.2
1/7/2012 0 7.2 2.8 2.3
1/8/2012 0 10 2.8 2
1/9/2012 4.3 9.4 5 3.4
1/10/2012 1 6.1 0.6 3.4
1/11/2012 0 6.1 -1.1 5.1
1/12/2012 0 6.1 -1.7 1.9
1/13/2012 0 5 -2.8 1.3
1/14/2012 0 16.1 1.7 4.3
1/15/2012 0 21.1 7.2 4.1
1/16/2012 0 20 6.1 2.1
1/17/2012 0 14.4 3.9 3
1/18/2012 0 18.3 4.4 4.3
1/19/2012 0 25.6 12.8 2.2
1/20/2012 0 18.9 13.9 2.8
1/21/2012 0 22.2 13.3 1.7
Feature Selection
• Select features that are highly correlated
to target
• Pick the most representative features from
existing features
• For selected features, look for sets of
features that are highly correlated with
each other
• In each set, select feature with highest
correlation to target
• Use final selected features to train the
model
#ISSLearningFest
date precipitation temp_max temp_min wind weather
1/1/2012 0 12.8 5 4.7 drizzle
1/2/2012 10.9 10.6 2.8 4.5 rain
1/3/2012 0.8 11.7 7.2 2.3 rain
1/4/2012 20.3 12.2 5.6 4.7 rain
1/5/2012 1.3 8.9 2.8 6.1 rain
1/6/2012 2.5 4.4 2.2 2.2 rain
1/7/2012 0 7.2 2.8 2.3 rain
1/8/2012 0 10 2.8 2 sun
1/9/2012 4.3 9.4 5 3.4 rain
1/10/2012 1 6.1 0.6 3.4 rain
1/11/2012 0 6.1 -1.1 5.1 sun
1/12/2012 0 6.1 -1.7 1.9 sun
1/13/2012 0 5 -2.8 1.3 sun
1/14/2012 0 16.1 1.7 4.3 sun
1/15/2012 0 21.1 7.2 4.1 sun
1/16/2012 0 20 6.1 2.1 sun
1/17/2012 0 14.4 3.9 3 sun
1/18/2012 0 18.3 4.4 4.3 sun
1/19/2012 0 25.6 12.8 2.2 drizzle
1/20/2012 0 18.9 13.9 2.8 drizzle
1/21/2012 0 22.2 13.3 1.7 drizzle
Selected features implies state
Pearson Correlation
• Measure of the extend to which two random variables change in
tandem
• Value between -1 to +1
• -1 indicates strong negative linear correlation
• 0 indicates no correlation
• +1 indicates strong positive correlation
#ISSLearningFest
Correlation between variables
#ISSLearningFest
Feature Extraction
• Analyse existing features to generate new features
• Dimension Reduction
• Reducing a 4D/3D space  2D space
#ISSLearningFest
date precipitation temp_max temp_min wind weather
1/1/2012 0 12.8 5 4.7 drizzle
1/2/2012 10.9 10.6 2.8 4.5 rain
1/3/2012 0.8 11.7 7.2 2.3 rain
1/4/2012 20.3 12.2 5.6 4.7 rain
1/5/2012 1.3 8.9 2.8 6.1 rain
1/6/2012 2.5 4.4 2.2 2.2 rain
1/7/2012 0 7.2 2.8 2.3 rain
1/8/2012 0 10 2.8 2 sun
1/9/2012 4.3 9.4 5 3.4 rain
1/10/2012 1 6.1 0.6 3.4 rain
1/11/2012 0 6.1 -1.1 5.1 sun
1/12/2012 0 6.1 -1.7 1.9 sun
1/13/2012 0 5 -2.8 1.3 sun
1/14/2012 0 16.1 1.7 4.3 sun
1/15/2012 0 21.1 7.2 4.1 sun
1/16/2012 0 20 6.1 2.1 sun
1/17/2012 0 14.4 3.9 3 sun
1/18/2012 0 18.3 4.4 4.3 sun
1/19/2012 0 25.6 12.8 2.2 drizzle
1/20/2012 0 18.9 13.9 2.8 drizzle
1/21/2012 0 22.2 13.3 1.7 drizzle
PCA Analysis
precipitation temp_max weather
0 12.8 drizzle
10.9 10.6 rain
0.8 11.7 rain
20.3 12.2 rain
1.3 8.9 rain
2.5 4.4 rain
0 7.2 rain
0 10 sun
4.3 9.4 rain
1 6.1 rain
0 6.1 sun
0 6.1 sun
0 5 sun
0 16.1 sun
0 21.1 sun
0 20 sun
0 14.4 sun
0 18.3 sun
0 25.6 drizzle
0 18.9 drizzle
0 22.2 drizzle
Feature Scaling
• Different scales in our dataset
• Different techniques
• Normalization: min-max scaling
• Values in column bounded between fixed range 0 and 1
• Standardization: Z-score normalization
• Values in column rescale to Gaussian distribution, i.e. show
mean and variance
• Standardization
• Reduces each feature to similar scale for ease of
comparison
• Performed within each feature, not across features
• Shift dataset to origin allows learning models to learn
faster and better
#ISSLearningFest
date precipitation temp_max temp_min wind weather
1/1/2012 0 12.8 5 4.7 drizzle
1/2/2012 10.9 10.6 2.8 4.5 rain
1/3/2012 0.8 11.7 7.2 2.3 rain
1/4/2012 20.3 12.2 5.6 4.7 rain
1/5/2012 1.3 8.9 2.8 6.1 rain
1/6/2012 2.5 4.4 2.2 2.2 rain
1/7/2012 0 7.2 2.8 2.3 rain
1/8/2012 0 10 2.8 2 sun
1/9/2012 4.3 9.4 5 3.4 rain
1/10/2012 1 6.1 0.6 3.4 rain
1/11/2012 0 6.1 -1.1 5.1 sun
1/12/2012 0 6.1 -1.7 1.9 sun
1/13/2012 0 5 -2.8 1.3 sun
1/14/2012 0 16.1 1.7 4.3 sun
1/15/2012 0 21.1 7.2 4.1 sun
1/16/2012 0 20 6.1 2.1 sun
1/17/2012 0 14.4 3.9 3 sun
1/18/2012 0 18.3 4.4 4.3 sun
1/19/2012 0 25.6 12.8 2.2 drizzle
1/20/2012 0 18.9 13.9 2.8 drizzle
1/21/2012 0 22.2 13.3 1.7 drizzle
Small scale
Implementing ML algorithm for IoT solution
• Sampling
• Split dataset into training dataset
(80%) and test dataset (20%)
• Build ML model
• Put training dataset to ML algorithm for
training
• Output: Trained model/Predictor
generated
• Test ML model
• Use test dataset passed to
predictor/model
• Evaluate model
• determine the accuracy of our model
#ISSLearningFest
Summary
• Data Cleaning
• Impute missing values
• Encode categorical features
• Data Transformation
• Transform and scale numerical variables
• Feature Extraction
• Perform discretization
• Remove outliers
• Feature selection
• Perform feature extraction from date and
time
• Create new features from existing ones
• Feature Iteration
• Pump to ML algorithm to produce trained
model
#ISSLearningFest
Give Us Your Feedback
#ISSLearningFest
Day 2 Programme
Question & Answer
#ISSLearningFest
Thank You!
darrylng@nus.edu.sg
#ISSLearningFest

Contenu connexe

Similaire à Feature Engineering for IoT

EduWeb - Building a Responsive Website for the Presidential Debate
EduWeb - Building a Responsive Website for the Presidential DebateEduWeb - Building a Responsive Website for the Presidential Debate
EduWeb - Building a Responsive Website for the Presidential DebateJon Liu
 
Mark Dzwonczyk at the Common Ground Alliance 2012
Mark Dzwonczyk at the Common Ground Alliance 2012Mark Dzwonczyk at the Common Ground Alliance 2012
Mark Dzwonczyk at the Common Ground Alliance 2012nicholville
 
Web Page Test - Beyond the Basics
Web Page Test - Beyond the BasicsWeb Page Test - Beyond the Basics
Web Page Test - Beyond the BasicsAndy Davies
 
CRC/SWAC DLIO workshop 2013
CRC/SWAC DLIO workshop 2013CRC/SWAC DLIO workshop 2013
CRC/SWAC DLIO workshop 2013FAO
 
CRC/SWAC DLIO workshop 2013
CRC/SWAC DLIO workshop 2013CRC/SWAC DLIO workshop 2013
CRC/SWAC DLIO workshop 2013FAOLocust
 
TERENCE automated reasoning and natural language processing for generating ed...
TERENCE automated reasoning and natural language processing for generating ed...TERENCE automated reasoning and natural language processing for generating ed...
TERENCE automated reasoning and natural language processing for generating ed...Rosella Gennari
 
Mobile Software Engineering Crash Course - C01 Intro
Mobile Software Engineering Crash Course - C01 IntroMobile Software Engineering Crash Course - C01 Intro
Mobile Software Engineering Crash Course - C01 IntroMohammad Shaker
 
Software engineering paradigm applied
Software engineering paradigm appliedSoftware engineering paradigm applied
Software engineering paradigm appliedbhuygv
 
Graphical Analysis of PV Plant Data
Graphical Analysis of PV Plant DataGraphical Analysis of PV Plant Data
Graphical Analysis of PV Plant DataCupertinoElectric
 
A website's structured data success story
A website's structured data success storyA website's structured data success story
A website's structured data success storyJarno van Driel
 
Front-End Performance Starts On the Server
Front-End Performance Starts On the ServerFront-End Performance Starts On the Server
Front-End Performance Starts On the ServerJon Arne Sæterås
 
Come for the software, stay for the community - How Drupal improves and evolves
Come for the software, stay for the community - How Drupal improves and evolvesCome for the software, stay for the community - How Drupal improves and evolves
Come for the software, stay for the community - How Drupal improves and evolvesGábor Hojtsy
 
Creating a Big data Strategy with Tactics for Quick Implementation
Creating a Big data Strategy with Tactics for Quick ImplementationCreating a Big data Strategy with Tactics for Quick Implementation
Creating a Big data Strategy with Tactics for Quick ImplementationLewandog, Inc,
 
De6.1 report
De6.1 reportDe6.1 report
De6.1 reportjrgcolin
 
Rating system for determining whether to accept or reject objection raised by...
Rating system for determining whether to accept or reject objection raised by...Rating system for determining whether to accept or reject objection raised by...
Rating system for determining whether to accept or reject objection raised by...Tal Lavian Ph.D.
 
IRJET- School in the Cloud
IRJET- School in the CloudIRJET- School in the Cloud
IRJET- School in the CloudIRJET Journal
 
Dragging old data forward: finding yourself an RDA Helper
Dragging old data forward:  finding yourself an RDA HelperDragging old data forward:  finding yourself an RDA Helper
Dragging old data forward: finding yourself an RDA HelperTerry Reese
 

Similaire à Feature Engineering for IoT (20)

EduWeb - Building a Responsive Website for the Presidential Debate
EduWeb - Building a Responsive Website for the Presidential DebateEduWeb - Building a Responsive Website for the Presidential Debate
EduWeb - Building a Responsive Website for the Presidential Debate
 
Mark Dzwonczyk at the Common Ground Alliance 2012
Mark Dzwonczyk at the Common Ground Alliance 2012Mark Dzwonczyk at the Common Ground Alliance 2012
Mark Dzwonczyk at the Common Ground Alliance 2012
 
Web Page Test - Beyond the Basics
Web Page Test - Beyond the BasicsWeb Page Test - Beyond the Basics
Web Page Test - Beyond the Basics
 
Jabed technologies rev12_jf
Jabed technologies rev12_jfJabed technologies rev12_jf
Jabed technologies rev12_jf
 
CRC/SWAC DLIO workshop 2013
CRC/SWAC DLIO workshop 2013CRC/SWAC DLIO workshop 2013
CRC/SWAC DLIO workshop 2013
 
CRC/SWAC DLIO workshop 2013
CRC/SWAC DLIO workshop 2013CRC/SWAC DLIO workshop 2013
CRC/SWAC DLIO workshop 2013
 
TERENCE automated reasoning and natural language processing for generating ed...
TERENCE automated reasoning and natural language processing for generating ed...TERENCE automated reasoning and natural language processing for generating ed...
TERENCE automated reasoning and natural language processing for generating ed...
 
Mobile Software Engineering Crash Course - C01 Intro
Mobile Software Engineering Crash Course - C01 IntroMobile Software Engineering Crash Course - C01 Intro
Mobile Software Engineering Crash Course - C01 Intro
 
Software engineering paradigm applied
Software engineering paradigm appliedSoftware engineering paradigm applied
Software engineering paradigm applied
 
Graphical Analysis of PV Plant Data
Graphical Analysis of PV Plant DataGraphical Analysis of PV Plant Data
Graphical Analysis of PV Plant Data
 
Open Geospatial Consortium (OGC) - Water/Hydro related activities
Open Geospatial Consortium (OGC) - Water/Hydro related activitiesOpen Geospatial Consortium (OGC) - Water/Hydro related activities
Open Geospatial Consortium (OGC) - Water/Hydro related activities
 
A website's structured data success story
A website's structured data success storyA website's structured data success story
A website's structured data success story
 
Front-End Performance Starts On the Server
Front-End Performance Starts On the ServerFront-End Performance Starts On the Server
Front-End Performance Starts On the Server
 
Energy Saving Calculations for Recommissioning and Design
Energy Saving Calculations for Recommissioning and DesignEnergy Saving Calculations for Recommissioning and Design
Energy Saving Calculations for Recommissioning and Design
 
Come for the software, stay for the community - How Drupal improves and evolves
Come for the software, stay for the community - How Drupal improves and evolvesCome for the software, stay for the community - How Drupal improves and evolves
Come for the software, stay for the community - How Drupal improves and evolves
 
Creating a Big data Strategy with Tactics for Quick Implementation
Creating a Big data Strategy with Tactics for Quick ImplementationCreating a Big data Strategy with Tactics for Quick Implementation
Creating a Big data Strategy with Tactics for Quick Implementation
 
De6.1 report
De6.1 reportDe6.1 report
De6.1 report
 
Rating system for determining whether to accept or reject objection raised by...
Rating system for determining whether to accept or reject objection raised by...Rating system for determining whether to accept or reject objection raised by...
Rating system for determining whether to accept or reject objection raised by...
 
IRJET- School in the Cloud
IRJET- School in the CloudIRJET- School in the Cloud
IRJET- School in the Cloud
 
Dragging old data forward: finding yourself an RDA Helper
Dragging old data forward:  finding yourself an RDA HelperDragging old data forward:  finding yourself an RDA Helper
Dragging old data forward: finding yourself an RDA Helper
 

Plus de NUS-ISS

Designing Impactful Services and User Experience - Lim Wee Khee
Designing Impactful Services and User Experience - Lim Wee KheeDesigning Impactful Services and User Experience - Lim Wee Khee
Designing Impactful Services and User Experience - Lim Wee KheeNUS-ISS
 
Upskilling the Evolving Workforce with Digital Fluency for Tomorrow's Challen...
Upskilling the Evolving Workforce with Digital Fluency for Tomorrow's Challen...Upskilling the Evolving Workforce with Digital Fluency for Tomorrow's Challen...
Upskilling the Evolving Workforce with Digital Fluency for Tomorrow's Challen...NUS-ISS
 
How the World's Leading Independent Automotive Distributor is Reinventing Its...
How the World's Leading Independent Automotive Distributor is Reinventing Its...How the World's Leading Independent Automotive Distributor is Reinventing Its...
How the World's Leading Independent Automotive Distributor is Reinventing Its...NUS-ISS
 
The Importance of Cybersecurity for Digital Transformation
The Importance of Cybersecurity for Digital TransformationThe Importance of Cybersecurity for Digital Transformation
The Importance of Cybersecurity for Digital TransformationNUS-ISS
 
Architecting CX Measurement Frameworks and Ensuring CX Metrics are fit for Pu...
Architecting CX Measurement Frameworks and Ensuring CX Metrics are fit for Pu...Architecting CX Measurement Frameworks and Ensuring CX Metrics are fit for Pu...
Architecting CX Measurement Frameworks and Ensuring CX Metrics are fit for Pu...NUS-ISS
 
Understanding GenAI/LLM and What is Google Offering - Felix Goh
Understanding GenAI/LLM and What is Google Offering - Felix GohUnderstanding GenAI/LLM and What is Google Offering - Felix Goh
Understanding GenAI/LLM and What is Google Offering - Felix GohNUS-ISS
 
Digital Product-Centric Enterprise and Enterprise Architecture - Tan Eng Tsze
Digital Product-Centric Enterprise and Enterprise Architecture - Tan Eng TszeDigital Product-Centric Enterprise and Enterprise Architecture - Tan Eng Tsze
Digital Product-Centric Enterprise and Enterprise Architecture - Tan Eng TszeNUS-ISS
 
Emerging & Future Technology - How to Prepare for the Next 10 Years of Radica...
Emerging & Future Technology - How to Prepare for the Next 10 Years of Radica...Emerging & Future Technology - How to Prepare for the Next 10 Years of Radica...
Emerging & Future Technology - How to Prepare for the Next 10 Years of Radica...NUS-ISS
 
Beyond the Hype: What Generative AI Means for the Future of Work - Damien Cum...
Beyond the Hype: What Generative AI Means for the Future of Work - Damien Cum...Beyond the Hype: What Generative AI Means for the Future of Work - Damien Cum...
Beyond the Hype: What Generative AI Means for the Future of Work - Damien Cum...NUS-ISS
 
Supply Chain Security for Containerised Workloads - Lee Chuk Munn
Supply Chain Security for Containerised Workloads - Lee Chuk MunnSupply Chain Security for Containerised Workloads - Lee Chuk Munn
Supply Chain Security for Containerised Workloads - Lee Chuk MunnNUS-ISS
 
Future of Learning - Yap Aye Wee.pdf
Future of Learning - Yap Aye Wee.pdfFuture of Learning - Yap Aye Wee.pdf
Future of Learning - Yap Aye Wee.pdfNUS-ISS
 
Future of Learning - Khoong Chan Meng
Future of Learning - Khoong Chan MengFuture of Learning - Khoong Chan Meng
Future of Learning - Khoong Chan MengNUS-ISS
 
Site Reliability Engineer (SRE), We Keep The Lights On 24/7
Site Reliability Engineer (SRE), We Keep The Lights On 24/7Site Reliability Engineer (SRE), We Keep The Lights On 24/7
Site Reliability Engineer (SRE), We Keep The Lights On 24/7NUS-ISS
 
Product Management in The Trenches for a Cloud Service
Product Management in The Trenches for a Cloud ServiceProduct Management in The Trenches for a Cloud Service
Product Management in The Trenches for a Cloud ServiceNUS-ISS
 
Overview of Data and Analytics Essentials and Foundations
Overview of Data and Analytics Essentials and FoundationsOverview of Data and Analytics Essentials and Foundations
Overview of Data and Analytics Essentials and FoundationsNUS-ISS
 
Predictive Analytics
Predictive AnalyticsPredictive Analytics
Predictive AnalyticsNUS-ISS
 
Master of Technology in Software Engineering
Master of Technology in Software EngineeringMaster of Technology in Software Engineering
Master of Technology in Software EngineeringNUS-ISS
 
Master of Technology in Enterprise Business Analytics
Master of Technology in Enterprise Business AnalyticsMaster of Technology in Enterprise Business Analytics
Master of Technology in Enterprise Business AnalyticsNUS-ISS
 
Diagnosing Complex Problems Using System Archetypes
Diagnosing Complex Problems Using System ArchetypesDiagnosing Complex Problems Using System Archetypes
Diagnosing Complex Problems Using System ArchetypesNUS-ISS
 
Satisfying the ‘-ilities’ of an Enterprise Cloud Service
Satisfying the ‘-ilities’ of an Enterprise Cloud ServiceSatisfying the ‘-ilities’ of an Enterprise Cloud Service
Satisfying the ‘-ilities’ of an Enterprise Cloud ServiceNUS-ISS
 

Plus de NUS-ISS (20)

Designing Impactful Services and User Experience - Lim Wee Khee
Designing Impactful Services and User Experience - Lim Wee KheeDesigning Impactful Services and User Experience - Lim Wee Khee
Designing Impactful Services and User Experience - Lim Wee Khee
 
Upskilling the Evolving Workforce with Digital Fluency for Tomorrow's Challen...
Upskilling the Evolving Workforce with Digital Fluency for Tomorrow's Challen...Upskilling the Evolving Workforce with Digital Fluency for Tomorrow's Challen...
Upskilling the Evolving Workforce with Digital Fluency for Tomorrow's Challen...
 
How the World's Leading Independent Automotive Distributor is Reinventing Its...
How the World's Leading Independent Automotive Distributor is Reinventing Its...How the World's Leading Independent Automotive Distributor is Reinventing Its...
How the World's Leading Independent Automotive Distributor is Reinventing Its...
 
The Importance of Cybersecurity for Digital Transformation
The Importance of Cybersecurity for Digital TransformationThe Importance of Cybersecurity for Digital Transformation
The Importance of Cybersecurity for Digital Transformation
 
Architecting CX Measurement Frameworks and Ensuring CX Metrics are fit for Pu...
Architecting CX Measurement Frameworks and Ensuring CX Metrics are fit for Pu...Architecting CX Measurement Frameworks and Ensuring CX Metrics are fit for Pu...
Architecting CX Measurement Frameworks and Ensuring CX Metrics are fit for Pu...
 
Understanding GenAI/LLM and What is Google Offering - Felix Goh
Understanding GenAI/LLM and What is Google Offering - Felix GohUnderstanding GenAI/LLM and What is Google Offering - Felix Goh
Understanding GenAI/LLM and What is Google Offering - Felix Goh
 
Digital Product-Centric Enterprise and Enterprise Architecture - Tan Eng Tsze
Digital Product-Centric Enterprise and Enterprise Architecture - Tan Eng TszeDigital Product-Centric Enterprise and Enterprise Architecture - Tan Eng Tsze
Digital Product-Centric Enterprise and Enterprise Architecture - Tan Eng Tsze
 
Emerging & Future Technology - How to Prepare for the Next 10 Years of Radica...
Emerging & Future Technology - How to Prepare for the Next 10 Years of Radica...Emerging & Future Technology - How to Prepare for the Next 10 Years of Radica...
Emerging & Future Technology - How to Prepare for the Next 10 Years of Radica...
 
Beyond the Hype: What Generative AI Means for the Future of Work - Damien Cum...
Beyond the Hype: What Generative AI Means for the Future of Work - Damien Cum...Beyond the Hype: What Generative AI Means for the Future of Work - Damien Cum...
Beyond the Hype: What Generative AI Means for the Future of Work - Damien Cum...
 
Supply Chain Security for Containerised Workloads - Lee Chuk Munn
Supply Chain Security for Containerised Workloads - Lee Chuk MunnSupply Chain Security for Containerised Workloads - Lee Chuk Munn
Supply Chain Security for Containerised Workloads - Lee Chuk Munn
 
Future of Learning - Yap Aye Wee.pdf
Future of Learning - Yap Aye Wee.pdfFuture of Learning - Yap Aye Wee.pdf
Future of Learning - Yap Aye Wee.pdf
 
Future of Learning - Khoong Chan Meng
Future of Learning - Khoong Chan MengFuture of Learning - Khoong Chan Meng
Future of Learning - Khoong Chan Meng
 
Site Reliability Engineer (SRE), We Keep The Lights On 24/7
Site Reliability Engineer (SRE), We Keep The Lights On 24/7Site Reliability Engineer (SRE), We Keep The Lights On 24/7
Site Reliability Engineer (SRE), We Keep The Lights On 24/7
 
Product Management in The Trenches for a Cloud Service
Product Management in The Trenches for a Cloud ServiceProduct Management in The Trenches for a Cloud Service
Product Management in The Trenches for a Cloud Service
 
Overview of Data and Analytics Essentials and Foundations
Overview of Data and Analytics Essentials and FoundationsOverview of Data and Analytics Essentials and Foundations
Overview of Data and Analytics Essentials and Foundations
 
Predictive Analytics
Predictive AnalyticsPredictive Analytics
Predictive Analytics
 
Master of Technology in Software Engineering
Master of Technology in Software EngineeringMaster of Technology in Software Engineering
Master of Technology in Software Engineering
 
Master of Technology in Enterprise Business Analytics
Master of Technology in Enterprise Business AnalyticsMaster of Technology in Enterprise Business Analytics
Master of Technology in Enterprise Business Analytics
 
Diagnosing Complex Problems Using System Archetypes
Diagnosing Complex Problems Using System ArchetypesDiagnosing Complex Problems Using System Archetypes
Diagnosing Complex Problems Using System Archetypes
 
Satisfying the ‘-ilities’ of an Enterprise Cloud Service
Satisfying the ‘-ilities’ of an Enterprise Cloud ServiceSatisfying the ‘-ilities’ of an Enterprise Cloud Service
Satisfying the ‘-ilities’ of an Enterprise Cloud Service
 

Dernier

presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024The Digital Insurer
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 

Dernier (20)

presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 

Feature Engineering for IoT

  • 1. Feature Engineering for IoT Darryl Ng #ISSLearningFest
  • 3. IoT Reference Architecture #ISSLearningFest https://docs.microsoft.com/en-us/azure/architecture/reference-architectures/iot Sense Connect Collect Process Act Devices generate events •Through platform to application Insights based on data •Derived by evaluating incoming device events Actions based on insights •Execute processes and workflows in the application DATA
  • 4. IoT and Cloud Providers 1. Capabilities added to the devices a. Device side processing • Real-time analytics, edge ML capabilities 2. Gateway to communicate with downstream, heterogeneous devices 3. Cloud services a. Device management capabilities, i.e. device shadowing, provisioning, OTA updates, security a. Stream processing b. Big data stack • Analytics and visualization #ISSLearningFest Cloud-centric Device/Gateway- centric
  • 6. Data Collection • Data collection can be a significant effort in machine learning • Types of Data • Historical Data (e.g. past weather) • Generated data (e.g. weather from sensors) • Manually collected (e.g. observe or visual inputs at different times of the day) • Collect data to infer its probability distribution • Generate more data from the probability distribution #ISSLearningFest
  • 7. Feature Engineering • Extracting features out of data and transforming them into something that can be used as a learning model in machine learning algorithm • Accuracy of machine learning model depends on the quality of data used for learning • Good Features => Model learns quickly • Bad Features => Model doesn’t learn #ISSLearningFest
  • 8. Features, Samples and Label date precipitation temp_max temp_min wind weather 1/1/2012 0 12.8 5 4.7drizzle 1/2/2012 10.9 10.6 2.8 4.5rain 1/3/2012 0.8 11.7 7.2 2.3rain 1/4/2012 20.3 12.2 5.6 4.7rain 1/5/2012 1.3 8.9 2.8 6.1rain 1/6/2012 2.5 4.4 2.2 2.2rain 1/7/2012 0 7.2 2.8 2.3rain 1/8/2012 0 10 2.8 2sun 1/9/2012 4.3 9.4 5 3.4rain 1/10/2012 1 6.1 0.6 3.4rain 1/11/2012 0 6.1 -1.1 5.1sun 1/12/2012 0 6.1 -1.7 1.9sun 1/13/2012 0 5 -2.8 1.3sun 1/14/2012 0 16.1 1.7 4.3sun 1/15/2012 0 21.1 7.2 4.1sun 1/16/2012 0 20 6.1 2.1sun 1/17/2012 0 14.4 3.9 3sun 1/18/2012 0 18.3 4.4 4.3sun 1/19/2012 0 25.6 12.8 2.2drizzle 1/20/2012 0 18.9 13.9 2.8drizzle 1/21/2012 0 22.2 13.3 1.7drizzle #ISSLearningFest sample features label
  • 9. Imputation Categories of missing data: 1. Missing at Random (MAR) • More data available on a different sample. 2. Missing Completely at Random • No relationship exists between missing values and other observations. 3. Missing Not at Random • There’s a reason why the values are missing and records should be flagged. • Numerical • Categorical #ISSLearningFest date precipitation temp_max temp_min wind 1/1/2012 0 12.8 5 4.7 1/2/2012 10.9 10.6 2.8 4.5 1/3/2012 0.8 11.7 7.2 2.3 1/4/2012 20.3 12.2 5.6 4.7 1/5/2012 1.3 8.9 2.8 6.1 1/6/2012 2.5 4.4 2.2 2.2 1/7/2012 0 7.2 2.8 2.3 1/8/2012 0 10 2.8 2 1/9/2012 4.3 9.4 5 3.4 1/10/2012 1 6.1 0.6 3.4 1/11/2012 0 6.1 -1.1 5.1 1/12/2012 0 6.1 -1.7 1.9 1/13/2012 0 5 -2.8 1.3 1/14/2012 0 16.1 1.7 4.3 1/15/2012 0 21.1 7.2 4.1 1/16/2012 20 6.1 2.1 1/17/2012 14.4 3.9 3 1/18/2012 18.3 4.4 4.3 1/19/2012 0 25.6 12.8 2.2 1/20/2012 0 18.9 13.9 2.8 1/21/2012 0 22.2 13.3 1.7 date precipitation temp_max temp_min wind 1/1/2012 0 12.8 5 4.7 1/2/2012 10.9 10.6 2.8 4.5 1/3/2012 0.8 11.7 7.2 2.3 1/4/2012 20.3 12.2 5.6 4.7 1/5/2012 1.3 8.9 2.8 6.1 1/6/2012 2.5 4.4 2.2 2.2 1/7/2012 0 7.2 2.8 2.3 1/8/2012 0 10 2.8 2 1/9/2012 4.3 9.4 5 3.4 1/10/2012 1 6.1 0.6 3.4 1/11/2012 0 6.1 -1.1 5.1 1/12/2012 0 6.1 -1.7 1.9 1/13/2012 0 5 -2.8 1.3 1/14/2012 0 16.1 1.7 4.3 1/15/2012 0 21.1 7.2 4.1 1/16/2012 0 20 6.1 2.1 1/17/2012 0 14.4 3.9 3 1/18/2012 0 18.3 4.4 4.3 1/19/2012 0 25.6 12.8 2.2 1/20/2012 0 18.9 13.9 2.8 1/21/2012 0 22.2 13.3 1.7
  • 10. Handling Outliers • Removal • Replacing values • Capping • Discretization • Binning #ISSLearningFest date precipitation temp_max temp_min wind 1/1/2012 0 12.8 5 4.7 1/2/2012 10.9 10.6 2.8 4.5 1/3/2012 0.8 11.7 7.2 2.3 1/4/2012 20.3 12.2 5.6 4.7 1/5/2012 1.3 8.9 2.8 6.1 1/6/2012 2.5 4.4 2.2 2.2 1/7/2012 0 7.2 2.8 2.3 1/8/2012 0 10 2.8 2 1/9/2012 4.3 9.4 5 3.4 1/10/2012 1 6.1 0.6 3.4 1/11/2012 0 6.1 -1.1 5.1 1/12/2012 0 6.1 -1.7 1.9 1/13/2012 0 5 -2.8 1.3 1/14/2012 0 16.1 1.7 4.3 1/15/2012 0 21.1 7.2 4.1 1/16/2012 0 20 6.1 2.1 1/17/2012 0 14.4 3.9 3 1/18/2012 0 18.3 4.4 4.3 1/19/2012 0 25.6 12.8 2.2 1/20/2012 0 18.9 13.9 2.8 1/21/2012 0 22.2 13.3 1.7 date precipitation temp_max temp_min wind 1/1/2012 0 12.8 5 4.7 1/2/2012 10.9 10.6 2.8 4.5 1/3/2012 0.8 11.7 7.2 2.3 1/4/2012 20.3 12.2 5.6 4.7 1/5/2012 1.3 8.9 2.8 6.1 1/6/2012 2.5 4.4 2.2 2.2 1/7/2012 0 7.2 2.8 2.3 1/8/2012 0 10 2.8 2 1/9/2012 4.3 9.4 5 3.4 1/10/2012 1 6.1 0.6 3.4 1/11/2012 0 6.1 -1.1 5.1 1/12/2012 0 6.1 -1.7 1.9 1/13/2012 0 5 -2.8 1.3 1/14/2012 0 16.1 1.7 4.3 1/15/2012 0 21.1 7.2 4.1 1/16/2012 0 20 6.1 2.1 1/17/2012 0 14.4 3.9 3 1/18/2012 0 18.3 4.4 4.3 1/19/2012 0 25.6 12.8 2.2 1/20/2012 0 18.9 13.9 2.8 1/21/2012 0 22.2 13.3 1.7
  • 11. Feature Selection • Select features that are highly correlated to target • Pick the most representative features from existing features • For selected features, look for sets of features that are highly correlated with each other • In each set, select feature with highest correlation to target • Use final selected features to train the model #ISSLearningFest date precipitation temp_max temp_min wind weather 1/1/2012 0 12.8 5 4.7 drizzle 1/2/2012 10.9 10.6 2.8 4.5 rain 1/3/2012 0.8 11.7 7.2 2.3 rain 1/4/2012 20.3 12.2 5.6 4.7 rain 1/5/2012 1.3 8.9 2.8 6.1 rain 1/6/2012 2.5 4.4 2.2 2.2 rain 1/7/2012 0 7.2 2.8 2.3 rain 1/8/2012 0 10 2.8 2 sun 1/9/2012 4.3 9.4 5 3.4 rain 1/10/2012 1 6.1 0.6 3.4 rain 1/11/2012 0 6.1 -1.1 5.1 sun 1/12/2012 0 6.1 -1.7 1.9 sun 1/13/2012 0 5 -2.8 1.3 sun 1/14/2012 0 16.1 1.7 4.3 sun 1/15/2012 0 21.1 7.2 4.1 sun 1/16/2012 0 20 6.1 2.1 sun 1/17/2012 0 14.4 3.9 3 sun 1/18/2012 0 18.3 4.4 4.3 sun 1/19/2012 0 25.6 12.8 2.2 drizzle 1/20/2012 0 18.9 13.9 2.8 drizzle 1/21/2012 0 22.2 13.3 1.7 drizzle Selected features implies state
  • 12. Pearson Correlation • Measure of the extend to which two random variables change in tandem • Value between -1 to +1 • -1 indicates strong negative linear correlation • 0 indicates no correlation • +1 indicates strong positive correlation #ISSLearningFest
  • 14. Feature Extraction • Analyse existing features to generate new features • Dimension Reduction • Reducing a 4D/3D space  2D space #ISSLearningFest date precipitation temp_max temp_min wind weather 1/1/2012 0 12.8 5 4.7 drizzle 1/2/2012 10.9 10.6 2.8 4.5 rain 1/3/2012 0.8 11.7 7.2 2.3 rain 1/4/2012 20.3 12.2 5.6 4.7 rain 1/5/2012 1.3 8.9 2.8 6.1 rain 1/6/2012 2.5 4.4 2.2 2.2 rain 1/7/2012 0 7.2 2.8 2.3 rain 1/8/2012 0 10 2.8 2 sun 1/9/2012 4.3 9.4 5 3.4 rain 1/10/2012 1 6.1 0.6 3.4 rain 1/11/2012 0 6.1 -1.1 5.1 sun 1/12/2012 0 6.1 -1.7 1.9 sun 1/13/2012 0 5 -2.8 1.3 sun 1/14/2012 0 16.1 1.7 4.3 sun 1/15/2012 0 21.1 7.2 4.1 sun 1/16/2012 0 20 6.1 2.1 sun 1/17/2012 0 14.4 3.9 3 sun 1/18/2012 0 18.3 4.4 4.3 sun 1/19/2012 0 25.6 12.8 2.2 drizzle 1/20/2012 0 18.9 13.9 2.8 drizzle 1/21/2012 0 22.2 13.3 1.7 drizzle PCA Analysis precipitation temp_max weather 0 12.8 drizzle 10.9 10.6 rain 0.8 11.7 rain 20.3 12.2 rain 1.3 8.9 rain 2.5 4.4 rain 0 7.2 rain 0 10 sun 4.3 9.4 rain 1 6.1 rain 0 6.1 sun 0 6.1 sun 0 5 sun 0 16.1 sun 0 21.1 sun 0 20 sun 0 14.4 sun 0 18.3 sun 0 25.6 drizzle 0 18.9 drizzle 0 22.2 drizzle
  • 15. Feature Scaling • Different scales in our dataset • Different techniques • Normalization: min-max scaling • Values in column bounded between fixed range 0 and 1 • Standardization: Z-score normalization • Values in column rescale to Gaussian distribution, i.e. show mean and variance • Standardization • Reduces each feature to similar scale for ease of comparison • Performed within each feature, not across features • Shift dataset to origin allows learning models to learn faster and better #ISSLearningFest date precipitation temp_max temp_min wind weather 1/1/2012 0 12.8 5 4.7 drizzle 1/2/2012 10.9 10.6 2.8 4.5 rain 1/3/2012 0.8 11.7 7.2 2.3 rain 1/4/2012 20.3 12.2 5.6 4.7 rain 1/5/2012 1.3 8.9 2.8 6.1 rain 1/6/2012 2.5 4.4 2.2 2.2 rain 1/7/2012 0 7.2 2.8 2.3 rain 1/8/2012 0 10 2.8 2 sun 1/9/2012 4.3 9.4 5 3.4 rain 1/10/2012 1 6.1 0.6 3.4 rain 1/11/2012 0 6.1 -1.1 5.1 sun 1/12/2012 0 6.1 -1.7 1.9 sun 1/13/2012 0 5 -2.8 1.3 sun 1/14/2012 0 16.1 1.7 4.3 sun 1/15/2012 0 21.1 7.2 4.1 sun 1/16/2012 0 20 6.1 2.1 sun 1/17/2012 0 14.4 3.9 3 sun 1/18/2012 0 18.3 4.4 4.3 sun 1/19/2012 0 25.6 12.8 2.2 drizzle 1/20/2012 0 18.9 13.9 2.8 drizzle 1/21/2012 0 22.2 13.3 1.7 drizzle Small scale
  • 16. Implementing ML algorithm for IoT solution • Sampling • Split dataset into training dataset (80%) and test dataset (20%) • Build ML model • Put training dataset to ML algorithm for training • Output: Trained model/Predictor generated • Test ML model • Use test dataset passed to predictor/model • Evaluate model • determine the accuracy of our model #ISSLearningFest
  • 17. Summary • Data Cleaning • Impute missing values • Encode categorical features • Data Transformation • Transform and scale numerical variables • Feature Extraction • Perform discretization • Remove outliers • Feature selection • Perform feature extraction from date and time • Create new features from existing ones • Feature Iteration • Pump to ML algorithm to produce trained model #ISSLearningFest
  • 18. Give Us Your Feedback #ISSLearningFest Day 2 Programme