SlideShare une entreprise Scribd logo
1  sur  11
Télécharger pour lire hors ligne
Machine Learning, Key to Your Formulation
Challenges
Marc Borowczak, PRC Consulting LLC (http://www.prcconsulting.net)
February 17, 2016
Formulation Challenges are Everywhere…
Step1: Retrieve Existing Data
Step 2: Normalize Data
Step 3: Train and Test a Model
Step 4: Evaluate Model Performance
Step 5: Improving Model Performance with 5 Hidden Neurons
Step 6: Improving Model using Random Forest Algorithm
Step 7: Testing Further with Resampling
Step 8: Actual Display of a Random Forest Tree Solution
Conclusions
References
Formulation Challenges are Everywhere…
You develop pharmaceutical, cosmetic, food, industrial or civil engineered products, and are often confronted
with the challenge of blending and formulating to meet process or performance properties. While traditional
Research and Development does approach the problem with experimentation, it generally involves designs,
time and resource constraints, and can be considered slow, expensive and often times redundant, fast forgotten
or perhaps obsolete.
Consider the alternative Machine Learning tools offers today. We will show this is not only quick, efficient and
ultimately the only way Front End of Innovation should proceed, and how it is particularly suited for formulation
and classification.
Today, we will explain how Machine Learning can shed new light on this generic and very persistent formulation
challenge. We will discuss the other important aspect of classification and clustering often associated with these
formulations challenges in a forthcoming communication.
Step1: Retrieve Existing Data
To illustrate the approach, we selected a formulation dataset hosted on UCI Machine Learning Repository
(http://archive.ics.uci.edu/ml/datasets.html), to predict the compressive strength (http://archive.ics.uci.edu
/ml/datasets/Concrete+Compressive+Strength) performance dependency on the formulation ingredients. This is
the well-known formulation composition - property relationship scientists, engineers and business professionals
must address daily and any established R&D would certainly have similar and sometimes hidden knowledge in
its archives…
We will use R to demonstrate quickly the approach on this dataset (http://archive.ics.uci.edu/ml/machine-
learning-databases/concrete/compressive/Concrete_Data.xls), and also demonstrate how reproducibility of the
analysis is enforced. The analysis tool and platform are documented, all libraries clearly listed, while data is
retrieved programmatically and date stamped from the repository.
Machine Learning, Key to Your Formulation Challenges file:///P:/MachineLearningExamples/Machine_Learning_Formulation.html
1 of 11 2/18/2016 5:07 PM
Sys.info()[1:5]
## sysname release version nodename machine
## "Windows" "7 x64" "build 9200" "STALLION" "x86-64"
sessionInfo()
## R version 3.2.2 (2015-08-14)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 8 x64 (build 9200)
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] grid stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] reshape_0.8.5 scales_0.3.0 devtools_1.8.0
## [4] rpart.plot_1.5.3 rpart_4.1-10 randomForest_4.6-10
## [7] neuralnet_1.32 MASS_7.3-44 caret_6.0-52
## [10] ggplot2_1.0.1 lattice_0.20-33 stringr_1.0.0
## [13] xlsx_0.5.7 xlsxjars_0.6.1 rJava_0.9-7
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.0 git2r_0.11.0 formatR_1.2
## [4] nloptr_1.0.4 plyr_1.8.3 iterators_1.0.7
## [7] tools_3.2.2 digest_0.6.8 lme4_1.1-9
## [10] memoise_0.2.1 evaluate_0.7.2 gtable_0.1.2
## [13] nlme_3.1-121 mgcv_1.8-9 Matrix_1.2-2
## [16] foreach_1.4.2 curl_0.9.3 parallel_3.2.2
## [19] yaml_2.1.13 SparseM_1.7 brglm_0.5-9
## [22] proto_0.3-10 xml2_0.1.1 BradleyTerry2_1.0-6
## [25] knitr_1.11 rversions_1.0.2 MatrixModels_0.4-1
## [28] gtools_3.5.0 stats4_3.2.2 nnet_7.3-11
## [31] rmarkdown_0.9.2 minqa_1.2.4 reshape2_1.4.1
## [34] car_2.0-26 magrittr_1.5 codetools_0.2-14
## [37] htmltools_0.2.6 splines_3.2.2 pbkrtest_0.4-2
## [40] colorspace_1.2-6 quantreg_5.18 stringi_0.5-5
## [43] munsell_0.4.2
Machine Learning, Key to Your Formulation Challenges file:///P:/MachineLearningExamples/Machine_Learning_Formulation.html
2 of 11 2/18/2016 5:07 PM
library(xlsx)
library(stringr)
library(caret)
library(neuralnet)
library(devtools)
library(rpart)
library(rpart.plot)
userdir <- getwd()
datadir <- "./data"
if (!file.exists("data")){dir.create("data")}
fileUrl <- "http://archive.ics.uci.edu/ml/machine-learning-databases/concrete/compress
ive/Concrete_Data.xls?accessType=DOWNLOAD"
download.file(fileUrl,destfile="./data/Concrete_Data.xls",method="curl")
dateDownloaded <- date()
concrete <- read.xlsx("./data/Concrete_Data.xls",sheetName="Sheet1")
str(concrete)
## 'data.frame': 1030 obs. of 9 variables:
## $ Cement..component.1..kg.in.a.m.3.mixture. : num 540 540 332 332 199
...
## $ Blast.Furnace.Slag..component.2..kg.in.a.m.3.mixture.: num 0 0 142 142 132 ...
## $ Fly.Ash..component.3..kg.in.a.m.3.mixture. : num 0 0 0 0 0 0 0 0 0 0
...
## $ Water...component.4..kg.in.a.m.3.mixture. : num 162 162 228 228 192
228 228 228 228 228 ...
## $ Superplasticizer..component.5..kg.in.a.m.3.mixture. : num 2.5 2.5 0 0 0 0 0 0
0 0 ...
## $ Coarse.Aggregate...component.6..kg.in.a.m.3.mixture. : num 1040 1055 932 932 97
8 ...
## $ Fine.Aggregate..component.7..kg.in.a.m.3.mixture. : num 676 676 594 594 826
...
## $ Age..day. : num 28 28 270 365 360 90
365 28 28 28 ...
## $ Concrete.compressive.strength.MPa..megapascals.. : num 80 61.9 40.3 41.1 44
.3 ...
Step 2: Normalize Data
The dataset information reveals 1030 observations with 9 variables: 8 inputs, from which 7 are ingredients and
1 is a process attribute (Age) and 1 output, the strength property. There are no missing values in this set. We’ll
easily truncate the variable names and normalize the data, displaying the normalized strength.
normalize <- function(x) {return((x - min(x)) / (max(x) - min(x)))}
names(concrete)<-gsub("."," ",names(concrete))
names(concrete)<-word(names(concrete),1)
names(concrete)[9]<-"Strength"
concrete_norm <- as.data.frame(lapply(concrete, normalize))
Machine Learning, Key to Your Formulation Challenges file:///P:/MachineLearningExamples/Machine_Learning_Formulation.html
3 of 11 2/18/2016 5:07 PM
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.2663 0.4000 0.4172 0.5457 1.0000
These transformations should be typical of a generic formulation where ingredients and process variables are
independent or input variables, and property is a dependent or output variable.
Step 3: Train and Test a Model
The method we’ll follow now is a standard approach where we randomly split the data set in a train and test set.
The caret (https://cran.r-project.org/web/packages/caret/caret.pdf) package implements this task well. We’ll use
75% of the data to train and the remainder to test the model. To make the analysis reproducible, we’ll use the
set.seed() function.
set.seed(12121)
inTrain<-createDataPartition(y=concrete_norm$Strength,p=0.75,list=FALSE)
concrete_train <- concrete_norm[inTrain, ]
concrete_test <- concrete_norm[-inTrain, ]
concrete_model <- neuralnet(formula = Strength ~ Cement + Blast + Fly + Water + Superp
lasticizer + Coarse + Fine + Age, data = concrete_train)
The network topology is then easily visualized. See for details the excellent NeuralNetTool page
(https://beckmw.wordpress.com/tag/neural-network/). Suffise to say that even this simple first attempt highlights
the main dependencies and higher impacts are highligted with thicker links in the typical neural net
representation. here the I’s represent inputs, O is the output, H and B are Hidden and Bias nodes as defined in
the theory. Note a single bias node is added for each input and hidden layer to accomodate input features equal
to 0.
Machine Learning, Key to Your Formulation Challenges file:///P:/MachineLearningExamples/Machine_Learning_Formulation.html
4 of 11 2/18/2016 5:07 PM
Step 4: Evaluate Model Performance
We will now compute predictions and compare them to actual strength and examine the correlation between
predicted and actual strength values.
model_results <- compute(concrete_model, concrete_test[1:8])
predicted_strength <- model_results$net.result
cor(predicted_strength, concrete_test$Strength)[1,1]
## [1] 0.8336352308
The default neural net exhibits a correlation of 0.8336352. We can certainly try to improve it by including a few
hidden neurons…
Step 5: Improving Model Performance with 5
Hidden Neurons
concrete_model2 <- neuralnet(formula = Strength ~ Cement + Blast + Fly + Water + Super
plasticizer + Coarse + Fine + Age, data = concrete_train, hidden=5)
plot.nnet(concrete_model2,cex.val=0.75)
Machine Learning, Key to Your Formulation Challenges file:///P:/MachineLearningExamples/Machine_Learning_Formulation.html
5 of 11 2/18/2016 5:07 PM
We observe age remains a key contributor, but the effects of Water, Superplasticizer, Cement and Blast are also
visibly ranked. 4 out of the 5 Hidden nodes are about evenly contributing…
model_results2 <- compute(concrete_model2, concrete_test[1:8])
predicted_strength2 <- model_results2$net.result
cor(predicted_strength2, concrete_test$Strength)[1,1]
## [1] 0.9138705528
p <- plot(concrete_test$Strength,predicted_strength2)
Machine Learning, Key to Your Formulation Challenges file:///P:/MachineLearningExamples/Machine_Learning_Formulation.html
6 of 11 2/18/2016 5:07 PM
Step 6: Improving Model using Random Forest
Algorithm
We now will rely on the Random Forest algorithm and attempt a model improvement.
model_result3 <- train(Strength ~ ., data = concrete_train,method='rf',prox=TRUE)
model_result3
Machine Learning, Key to Your Formulation Challenges file:///P:/MachineLearningExamples/Machine_Learning_Formulation.html
7 of 11 2/18/2016 5:07 PM
## Random Forest
##
## 774 samples
## 8 predictor
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 774, 774, 774, 774, 774, 774, ...
## Resampling results across tuning parameters:
##
## mtry RMSE Rsquared RMSE SD Rsquared SD
## 2 0.07833903729 0.8774837294 0.004515965257 0.01607292866
## 5 0.07159729020 0.8869237640 0.004510365293 0.01395064039
## 8 0.07365915696 0.8777421774 0.005197103383 0.01691711063
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 5.
predicted_strength3 <- predict(model_result3,concrete_test)
cor(predicted_strength3, concrete_test$Strength)
## [1] 0.943384811
The default Random Forest algorithm helped improve our prediction and exhibits a correlation of 0.9433848.
Again, we can certainly try to improve by introducing resampling… The caret package offers multiple methods
to try out. We’ll just try one to give an idea…
Step 7: Testing Further with Resampling
model_result4 <- train(Strength ~ ., method='rf',data = concrete_train,verbose=FALSE,t
rControl = trainControl(method="cv"))
model_result4
Machine Learning, Key to Your Formulation Challenges file:///P:/MachineLearningExamples/Machine_Learning_Formulation.html
8 of 11 2/18/2016 5:07 PM
## Random Forest
##
## 774 samples
## 8 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 696, 696, 695, 697, 698, 696, ...
## Resampling results across tuning parameters:
##
## mtry RMSE Rsquared RMSE SD Rsquared SD
## 2 0.06910599 0.9068129 0.01101706 0.04013275
## 5 0.06227778 0.9121046 0.01232558 0.03945448
## 8 0.06287753 0.9089120 0.01208088 0.03835530
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 5.
predicted_strength4 <- predict(model_result4,concrete_test)
cor(predicted_strength4, concrete_test$Strength)
## [1] 0.9428245
We observe the prediction is practically unchanged in this case, with a correlation of 0.9428245. Still not bad for
a quick analysis performed on existing data. Regardless of our property target, we already derived key areas to
investigate deeper… and can clearly see some key ingredients (Cement, Blast, Fly, Superplasticizer, Water…)
and the Age process as determining factors to produce strength performance. So naturally, one may want to
display this model.
Step 8: Actual Display of a Random Forest Tree
Solution
It turns out that so-called blackbox models are – well – meant to stay in their box! However, the rpart
(https://cran.r-project.org/web/packages/rpart/rpart.pdf) and rpart.plot (https://cran.r-project.org/web/packages
/rpart.plot/rpart.plot.pdf) packages make it easy to visualize even complex trees.
strength.tree <- rpart(Strength ~ .,data=concrete_train, control=rpart.control(minspli
t=20,cp=0.002))
prp(strength.tree,compress=TRUE)
Machine Learning, Key to Your Formulation Challenges file:///P:/MachineLearningExamples/Machine_Learning_Formulation.html
9 of 11 2/18/2016 5:07 PM
In the network, normalized strength is indicated in the oval leaves, and are ranked from low to high from left to
right branches.
Conclusions
We hope this typical example demonstrates that Machine Learning algorithms are well positioned to help
resolve formulation challenges, offering a fast, efficient and economical alternative to tedious experimentation. It
is easy to imagine how similar questions can be resolved in all types of R&D, in materials, cosmetics, food or
any scientific area.
Rubber formulations to minimize rolling resistance and emissions, or modern composites to build renewable
energy sources or lighweight transportation vehicles and next-generation public transit, as well as innovative
UV-shield oinments and tasty snacks and drinks…, all present similar challenges where only the nature of
inputs and outputs vary. Therefore, the method can and should be applied broadly!
Next time, we’ll review how to address another common challenge: classification and clustering. Till then, we
hope this approach has triggered interest.
Why not try and implement Machine Learning in your scientific or technical expert area? Remember, PRC
Consulting, LLC (http://www.prcconsulting.net) is dedicated to boosting innovation thru improved Analytics, one
customer at the time!
References
Machine Learning, Key to Your Formulation Challenges file:///P:/MachineLearningExamples/Machine_Learning_Formulation.html
10 of 11 2/18/2016 5:07 PM
The following sources are referenced as they provided significant help and information to develop this Machine
Learning analysis applied to formulations:
UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/datasets.html)1.
caret (https://cran.r-project.org/web/packages/caret/caret.pdf)2.
NeuralNetTool (https://beckmw.wordpress.com/tag/neural-network/)3.
rpart (https://cran.r-project.org/web/packages/rpart/rpart.pdf)4.
rpart.plot (https://cran.r-project.org/web/packages/rpart.plot/rpart.plot.pdf)5.
RStudio (https://www.rstudio.com)6.
Machine Learning, Key to Your Formulation Challenges file:///P:/MachineLearningExamples/Machine_Learning_Formulation.html
11 of 11 2/18/2016 5:07 PM

Contenu connexe

En vedette (13)

Shyrley n°15 5-c
Shyrley n°15 5-cShyrley n°15 5-c
Shyrley n°15 5-c
 
Inflation
InflationInflation
Inflation
 
Presentacion multimedia claudia hilares_5_c
Presentacion multimedia claudia hilares_5_cPresentacion multimedia claudia hilares_5_c
Presentacion multimedia claudia hilares_5_c
 
Deus de promessas
Deus de promessasDeus de promessas
Deus de promessas
 
Influencia del inetrnet en la sociedad silvia cardenas 5_c
Influencia del inetrnet en la sociedad silvia cardenas 5_cInfluencia del inetrnet en la sociedad silvia cardenas 5_c
Influencia del inetrnet en la sociedad silvia cardenas 5_c
 
Blog
BlogBlog
Blog
 
Azizi fonctions des icones sous spss11.5
Azizi fonctions des icones sous spss11.5Azizi fonctions des icones sous spss11.5
Azizi fonctions des icones sous spss11.5
 
Tax Incidence Webinar slides
Tax Incidence Webinar slidesTax Incidence Webinar slides
Tax Incidence Webinar slides
 
Historical Foundations of Management
Historical Foundations of ManagementHistorical Foundations of Management
Historical Foundations of Management
 
Qué es un dominio de Internet
Qué es un dominio de InternetQué es un dominio de Internet
Qué es un dominio de Internet
 
Incidence of tax
Incidence of taxIncidence of tax
Incidence of tax
 
Master degree Safety Resume
Master degree Safety ResumeMaster degree Safety Resume
Master degree Safety Resume
 
101 lecture 6
101 lecture 6101 lecture 6
101 lecture 6
 

Similaire à Machine learning key to your formulation challenges

Computer Tools for Academic Research
Computer Tools for Academic ResearchComputer Tools for Academic Research
Computer Tools for Academic ResearchMiklos Koren
 
B2 2005 introduction_load_testing_blackboard_primer_draft
B2 2005 introduction_load_testing_blackboard_primer_draftB2 2005 introduction_load_testing_blackboard_primer_draft
B2 2005 introduction_load_testing_blackboard_primer_draftSteve Feldman
 
Open Source 101 2022 - MySQL Indexes and Histograms
Open Source 101 2022 - MySQL Indexes and HistogramsOpen Source 101 2022 - MySQL Indexes and Histograms
Open Source 101 2022 - MySQL Indexes and HistogramsFrederic Descamps
 
Machine Learning for .NET Developers - ADC21
Machine Learning for .NET Developers - ADC21Machine Learning for .NET Developers - ADC21
Machine Learning for .NET Developers - ADC21Gülden Bilgütay
 
IRJET- Deep Learning Model to Predict Hardware Performance
IRJET- Deep Learning Model to Predict Hardware PerformanceIRJET- Deep Learning Model to Predict Hardware Performance
IRJET- Deep Learning Model to Predict Hardware PerformanceIRJET Journal
 
IRJET- Analysis of PV Fed Vector Controlled Induction Motor Drive
IRJET- Analysis of PV Fed Vector Controlled Induction Motor DriveIRJET- Analysis of PV Fed Vector Controlled Induction Motor Drive
IRJET- Analysis of PV Fed Vector Controlled Induction Motor DriveIRJET Journal
 
RivieraJUG - MySQL Indexes and Histograms
RivieraJUG - MySQL Indexes and HistogramsRivieraJUG - MySQL Indexes and Histograms
RivieraJUG - MySQL Indexes and HistogramsFrederic Descamps
 
Webinar: Performance Tuning + Optimization
Webinar: Performance Tuning + OptimizationWebinar: Performance Tuning + Optimization
Webinar: Performance Tuning + OptimizationMongoDB
 
Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...
Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...
Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...Joachim Schlosser
 
[DSC Europe 22] Smart approach in development and deployment process for vari...
[DSC Europe 22] Smart approach in development and deployment process for vari...[DSC Europe 22] Smart approach in development and deployment process for vari...
[DSC Europe 22] Smart approach in development and deployment process for vari...DataScienceConferenc1
 
Performance schema in_my_sql_5.6_pluk2013
Performance schema in_my_sql_5.6_pluk2013Performance schema in_my_sql_5.6_pluk2013
Performance schema in_my_sql_5.6_pluk2013Valeriy Kravchuk
 
Start machine learning in 5 simple steps
Start machine learning in 5 simple stepsStart machine learning in 5 simple steps
Start machine learning in 5 simple stepsRenjith M P
 
Cloudera Data Science Challenge 3 Solution by Doug Needham
Cloudera Data Science Challenge 3 Solution by Doug NeedhamCloudera Data Science Challenge 3 Solution by Doug Needham
Cloudera Data Science Challenge 3 Solution by Doug NeedhamDoug Needham
 
10 Ways To Improve Your Code( Neal Ford)
10  Ways To  Improve  Your  Code( Neal  Ford)10  Ways To  Improve  Your  Code( Neal  Ford)
10 Ways To Improve Your Code( Neal Ford)guestebde
 
Testing Experience - Evolution of Test Automation Frameworks
Testing Experience - Evolution of Test Automation FrameworksTesting Experience - Evolution of Test Automation Frameworks
Testing Experience - Evolution of Test Automation FrameworksŁukasz Morawski
 
Workshop: Your first machine learning project
Workshop: Your first machine learning projectWorkshop: Your first machine learning project
Workshop: Your first machine learning projectAlex Austin
 

Similaire à Machine learning key to your formulation challenges (20)

Computer Tools for Academic Research
Computer Tools for Academic ResearchComputer Tools for Academic Research
Computer Tools for Academic Research
 
B2 2005 introduction_load_testing_blackboard_primer_draft
B2 2005 introduction_load_testing_blackboard_primer_draftB2 2005 introduction_load_testing_blackboard_primer_draft
B2 2005 introduction_load_testing_blackboard_primer_draft
 
Open Source 101 2022 - MySQL Indexes and Histograms
Open Source 101 2022 - MySQL Indexes and HistogramsOpen Source 101 2022 - MySQL Indexes and Histograms
Open Source 101 2022 - MySQL Indexes and Histograms
 
10 Ways To Improve Your Code
10 Ways To Improve Your Code10 Ways To Improve Your Code
10 Ways To Improve Your Code
 
Machine Learning for .NET Developers - ADC21
Machine Learning for .NET Developers - ADC21Machine Learning for .NET Developers - ADC21
Machine Learning for .NET Developers - ADC21
 
IRJET- Deep Learning Model to Predict Hardware Performance
IRJET- Deep Learning Model to Predict Hardware PerformanceIRJET- Deep Learning Model to Predict Hardware Performance
IRJET- Deep Learning Model to Predict Hardware Performance
 
IRJET- Analysis of PV Fed Vector Controlled Induction Motor Drive
IRJET- Analysis of PV Fed Vector Controlled Induction Motor DriveIRJET- Analysis of PV Fed Vector Controlled Induction Motor Drive
IRJET- Analysis of PV Fed Vector Controlled Induction Motor Drive
 
RivieraJUG - MySQL Indexes and Histograms
RivieraJUG - MySQL Indexes and HistogramsRivieraJUG - MySQL Indexes and Histograms
RivieraJUG - MySQL Indexes and Histograms
 
Webinar: Performance Tuning + Optimization
Webinar: Performance Tuning + OptimizationWebinar: Performance Tuning + Optimization
Webinar: Performance Tuning + Optimization
 
Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...
Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...
Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...
 
[DSC Europe 22] Smart approach in development and deployment process for vari...
[DSC Europe 22] Smart approach in development and deployment process for vari...[DSC Europe 22] Smart approach in development and deployment process for vari...
[DSC Europe 22] Smart approach in development and deployment process for vari...
 
Performance schema in_my_sql_5.6_pluk2013
Performance schema in_my_sql_5.6_pluk2013Performance schema in_my_sql_5.6_pluk2013
Performance schema in_my_sql_5.6_pluk2013
 
Start machine learning in 5 simple steps
Start machine learning in 5 simple stepsStart machine learning in 5 simple steps
Start machine learning in 5 simple steps
 
Ember
EmberEmber
Ember
 
Cloudera Data Science Challenge 3 Solution by Doug Needham
Cloudera Data Science Challenge 3 Solution by Doug NeedhamCloudera Data Science Challenge 3 Solution by Doug Needham
Cloudera Data Science Challenge 3 Solution by Doug Needham
 
Lecture-6-7.pptx
Lecture-6-7.pptxLecture-6-7.pptx
Lecture-6-7.pptx
 
Benchmarking_ML_Tools
Benchmarking_ML_ToolsBenchmarking_ML_Tools
Benchmarking_ML_Tools
 
10 Ways To Improve Your Code( Neal Ford)
10  Ways To  Improve  Your  Code( Neal  Ford)10  Ways To  Improve  Your  Code( Neal  Ford)
10 Ways To Improve Your Code( Neal Ford)
 
Testing Experience - Evolution of Test Automation Frameworks
Testing Experience - Evolution of Test Automation FrameworksTesting Experience - Evolution of Test Automation Frameworks
Testing Experience - Evolution of Test Automation Frameworks
 
Workshop: Your first machine learning project
Workshop: Your first machine learning projectWorkshop: Your first machine learning project
Workshop: Your first machine learning project
 

Dernier

April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsJoseMangaJr1
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...amitlee9823
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...amitlee9823
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 

Dernier (20)

April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 

Machine learning key to your formulation challenges

  • 1. Machine Learning, Key to Your Formulation Challenges Marc Borowczak, PRC Consulting LLC (http://www.prcconsulting.net) February 17, 2016 Formulation Challenges are Everywhere… Step1: Retrieve Existing Data Step 2: Normalize Data Step 3: Train and Test a Model Step 4: Evaluate Model Performance Step 5: Improving Model Performance with 5 Hidden Neurons Step 6: Improving Model using Random Forest Algorithm Step 7: Testing Further with Resampling Step 8: Actual Display of a Random Forest Tree Solution Conclusions References Formulation Challenges are Everywhere… You develop pharmaceutical, cosmetic, food, industrial or civil engineered products, and are often confronted with the challenge of blending and formulating to meet process or performance properties. While traditional Research and Development does approach the problem with experimentation, it generally involves designs, time and resource constraints, and can be considered slow, expensive and often times redundant, fast forgotten or perhaps obsolete. Consider the alternative Machine Learning tools offers today. We will show this is not only quick, efficient and ultimately the only way Front End of Innovation should proceed, and how it is particularly suited for formulation and classification. Today, we will explain how Machine Learning can shed new light on this generic and very persistent formulation challenge. We will discuss the other important aspect of classification and clustering often associated with these formulations challenges in a forthcoming communication. Step1: Retrieve Existing Data To illustrate the approach, we selected a formulation dataset hosted on UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/datasets.html), to predict the compressive strength (http://archive.ics.uci.edu /ml/datasets/Concrete+Compressive+Strength) performance dependency on the formulation ingredients. This is the well-known formulation composition - property relationship scientists, engineers and business professionals must address daily and any established R&D would certainly have similar and sometimes hidden knowledge in its archives… We will use R to demonstrate quickly the approach on this dataset (http://archive.ics.uci.edu/ml/machine- learning-databases/concrete/compressive/Concrete_Data.xls), and also demonstrate how reproducibility of the analysis is enforced. The analysis tool and platform are documented, all libraries clearly listed, while data is retrieved programmatically and date stamped from the repository. Machine Learning, Key to Your Formulation Challenges file:///P:/MachineLearningExamples/Machine_Learning_Formulation.html 1 of 11 2/18/2016 5:07 PM
  • 2. Sys.info()[1:5] ## sysname release version nodename machine ## "Windows" "7 x64" "build 9200" "STALLION" "x86-64" sessionInfo() ## R version 3.2.2 (2015-08-14) ## Platform: x86_64-w64-mingw32/x64 (64-bit) ## Running under: Windows 8 x64 (build 9200) ## ## locale: ## [1] LC_COLLATE=English_United States.1252 ## [2] LC_CTYPE=English_United States.1252 ## [3] LC_MONETARY=English_United States.1252 ## [4] LC_NUMERIC=C ## [5] LC_TIME=English_United States.1252 ## ## attached base packages: ## [1] grid stats graphics grDevices utils datasets methods ## [8] base ## ## other attached packages: ## [1] reshape_0.8.5 scales_0.3.0 devtools_1.8.0 ## [4] rpart.plot_1.5.3 rpart_4.1-10 randomForest_4.6-10 ## [7] neuralnet_1.32 MASS_7.3-44 caret_6.0-52 ## [10] ggplot2_1.0.1 lattice_0.20-33 stringr_1.0.0 ## [13] xlsx_0.5.7 xlsxjars_0.6.1 rJava_0.9-7 ## ## loaded via a namespace (and not attached): ## [1] Rcpp_0.12.0 git2r_0.11.0 formatR_1.2 ## [4] nloptr_1.0.4 plyr_1.8.3 iterators_1.0.7 ## [7] tools_3.2.2 digest_0.6.8 lme4_1.1-9 ## [10] memoise_0.2.1 evaluate_0.7.2 gtable_0.1.2 ## [13] nlme_3.1-121 mgcv_1.8-9 Matrix_1.2-2 ## [16] foreach_1.4.2 curl_0.9.3 parallel_3.2.2 ## [19] yaml_2.1.13 SparseM_1.7 brglm_0.5-9 ## [22] proto_0.3-10 xml2_0.1.1 BradleyTerry2_1.0-6 ## [25] knitr_1.11 rversions_1.0.2 MatrixModels_0.4-1 ## [28] gtools_3.5.0 stats4_3.2.2 nnet_7.3-11 ## [31] rmarkdown_0.9.2 minqa_1.2.4 reshape2_1.4.1 ## [34] car_2.0-26 magrittr_1.5 codetools_0.2-14 ## [37] htmltools_0.2.6 splines_3.2.2 pbkrtest_0.4-2 ## [40] colorspace_1.2-6 quantreg_5.18 stringi_0.5-5 ## [43] munsell_0.4.2 Machine Learning, Key to Your Formulation Challenges file:///P:/MachineLearningExamples/Machine_Learning_Formulation.html 2 of 11 2/18/2016 5:07 PM
  • 3. library(xlsx) library(stringr) library(caret) library(neuralnet) library(devtools) library(rpart) library(rpart.plot) userdir <- getwd() datadir <- "./data" if (!file.exists("data")){dir.create("data")} fileUrl <- "http://archive.ics.uci.edu/ml/machine-learning-databases/concrete/compress ive/Concrete_Data.xls?accessType=DOWNLOAD" download.file(fileUrl,destfile="./data/Concrete_Data.xls",method="curl") dateDownloaded <- date() concrete <- read.xlsx("./data/Concrete_Data.xls",sheetName="Sheet1") str(concrete) ## 'data.frame': 1030 obs. of 9 variables: ## $ Cement..component.1..kg.in.a.m.3.mixture. : num 540 540 332 332 199 ... ## $ Blast.Furnace.Slag..component.2..kg.in.a.m.3.mixture.: num 0 0 142 142 132 ... ## $ Fly.Ash..component.3..kg.in.a.m.3.mixture. : num 0 0 0 0 0 0 0 0 0 0 ... ## $ Water...component.4..kg.in.a.m.3.mixture. : num 162 162 228 228 192 228 228 228 228 228 ... ## $ Superplasticizer..component.5..kg.in.a.m.3.mixture. : num 2.5 2.5 0 0 0 0 0 0 0 0 ... ## $ Coarse.Aggregate...component.6..kg.in.a.m.3.mixture. : num 1040 1055 932 932 97 8 ... ## $ Fine.Aggregate..component.7..kg.in.a.m.3.mixture. : num 676 676 594 594 826 ... ## $ Age..day. : num 28 28 270 365 360 90 365 28 28 28 ... ## $ Concrete.compressive.strength.MPa..megapascals.. : num 80 61.9 40.3 41.1 44 .3 ... Step 2: Normalize Data The dataset information reveals 1030 observations with 9 variables: 8 inputs, from which 7 are ingredients and 1 is a process attribute (Age) and 1 output, the strength property. There are no missing values in this set. We’ll easily truncate the variable names and normalize the data, displaying the normalized strength. normalize <- function(x) {return((x - min(x)) / (max(x) - min(x)))} names(concrete)<-gsub("."," ",names(concrete)) names(concrete)<-word(names(concrete),1) names(concrete)[9]<-"Strength" concrete_norm <- as.data.frame(lapply(concrete, normalize)) Machine Learning, Key to Your Formulation Challenges file:///P:/MachineLearningExamples/Machine_Learning_Formulation.html 3 of 11 2/18/2016 5:07 PM
  • 4. ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0.0000 0.2663 0.4000 0.4172 0.5457 1.0000 These transformations should be typical of a generic formulation where ingredients and process variables are independent or input variables, and property is a dependent or output variable. Step 3: Train and Test a Model The method we’ll follow now is a standard approach where we randomly split the data set in a train and test set. The caret (https://cran.r-project.org/web/packages/caret/caret.pdf) package implements this task well. We’ll use 75% of the data to train and the remainder to test the model. To make the analysis reproducible, we’ll use the set.seed() function. set.seed(12121) inTrain<-createDataPartition(y=concrete_norm$Strength,p=0.75,list=FALSE) concrete_train <- concrete_norm[inTrain, ] concrete_test <- concrete_norm[-inTrain, ] concrete_model <- neuralnet(formula = Strength ~ Cement + Blast + Fly + Water + Superp lasticizer + Coarse + Fine + Age, data = concrete_train) The network topology is then easily visualized. See for details the excellent NeuralNetTool page (https://beckmw.wordpress.com/tag/neural-network/). Suffise to say that even this simple first attempt highlights the main dependencies and higher impacts are highligted with thicker links in the typical neural net representation. here the I’s represent inputs, O is the output, H and B are Hidden and Bias nodes as defined in the theory. Note a single bias node is added for each input and hidden layer to accomodate input features equal to 0. Machine Learning, Key to Your Formulation Challenges file:///P:/MachineLearningExamples/Machine_Learning_Formulation.html 4 of 11 2/18/2016 5:07 PM
  • 5. Step 4: Evaluate Model Performance We will now compute predictions and compare them to actual strength and examine the correlation between predicted and actual strength values. model_results <- compute(concrete_model, concrete_test[1:8]) predicted_strength <- model_results$net.result cor(predicted_strength, concrete_test$Strength)[1,1] ## [1] 0.8336352308 The default neural net exhibits a correlation of 0.8336352. We can certainly try to improve it by including a few hidden neurons… Step 5: Improving Model Performance with 5 Hidden Neurons concrete_model2 <- neuralnet(formula = Strength ~ Cement + Blast + Fly + Water + Super plasticizer + Coarse + Fine + Age, data = concrete_train, hidden=5) plot.nnet(concrete_model2,cex.val=0.75) Machine Learning, Key to Your Formulation Challenges file:///P:/MachineLearningExamples/Machine_Learning_Formulation.html 5 of 11 2/18/2016 5:07 PM
  • 6. We observe age remains a key contributor, but the effects of Water, Superplasticizer, Cement and Blast are also visibly ranked. 4 out of the 5 Hidden nodes are about evenly contributing… model_results2 <- compute(concrete_model2, concrete_test[1:8]) predicted_strength2 <- model_results2$net.result cor(predicted_strength2, concrete_test$Strength)[1,1] ## [1] 0.9138705528 p <- plot(concrete_test$Strength,predicted_strength2) Machine Learning, Key to Your Formulation Challenges file:///P:/MachineLearningExamples/Machine_Learning_Formulation.html 6 of 11 2/18/2016 5:07 PM
  • 7. Step 6: Improving Model using Random Forest Algorithm We now will rely on the Random Forest algorithm and attempt a model improvement. model_result3 <- train(Strength ~ ., data = concrete_train,method='rf',prox=TRUE) model_result3 Machine Learning, Key to Your Formulation Challenges file:///P:/MachineLearningExamples/Machine_Learning_Formulation.html 7 of 11 2/18/2016 5:07 PM
  • 8. ## Random Forest ## ## 774 samples ## 8 predictor ## ## No pre-processing ## Resampling: Bootstrapped (25 reps) ## Summary of sample sizes: 774, 774, 774, 774, 774, 774, ... ## Resampling results across tuning parameters: ## ## mtry RMSE Rsquared RMSE SD Rsquared SD ## 2 0.07833903729 0.8774837294 0.004515965257 0.01607292866 ## 5 0.07159729020 0.8869237640 0.004510365293 0.01395064039 ## 8 0.07365915696 0.8777421774 0.005197103383 0.01691711063 ## ## RMSE was used to select the optimal model using the smallest value. ## The final value used for the model was mtry = 5. predicted_strength3 <- predict(model_result3,concrete_test) cor(predicted_strength3, concrete_test$Strength) ## [1] 0.943384811 The default Random Forest algorithm helped improve our prediction and exhibits a correlation of 0.9433848. Again, we can certainly try to improve by introducing resampling… The caret package offers multiple methods to try out. We’ll just try one to give an idea… Step 7: Testing Further with Resampling model_result4 <- train(Strength ~ ., method='rf',data = concrete_train,verbose=FALSE,t rControl = trainControl(method="cv")) model_result4 Machine Learning, Key to Your Formulation Challenges file:///P:/MachineLearningExamples/Machine_Learning_Formulation.html 8 of 11 2/18/2016 5:07 PM
  • 9. ## Random Forest ## ## 774 samples ## 8 predictor ## ## No pre-processing ## Resampling: Cross-Validated (10 fold) ## Summary of sample sizes: 696, 696, 695, 697, 698, 696, ... ## Resampling results across tuning parameters: ## ## mtry RMSE Rsquared RMSE SD Rsquared SD ## 2 0.06910599 0.9068129 0.01101706 0.04013275 ## 5 0.06227778 0.9121046 0.01232558 0.03945448 ## 8 0.06287753 0.9089120 0.01208088 0.03835530 ## ## RMSE was used to select the optimal model using the smallest value. ## The final value used for the model was mtry = 5. predicted_strength4 <- predict(model_result4,concrete_test) cor(predicted_strength4, concrete_test$Strength) ## [1] 0.9428245 We observe the prediction is practically unchanged in this case, with a correlation of 0.9428245. Still not bad for a quick analysis performed on existing data. Regardless of our property target, we already derived key areas to investigate deeper… and can clearly see some key ingredients (Cement, Blast, Fly, Superplasticizer, Water…) and the Age process as determining factors to produce strength performance. So naturally, one may want to display this model. Step 8: Actual Display of a Random Forest Tree Solution It turns out that so-called blackbox models are – well – meant to stay in their box! However, the rpart (https://cran.r-project.org/web/packages/rpart/rpart.pdf) and rpart.plot (https://cran.r-project.org/web/packages /rpart.plot/rpart.plot.pdf) packages make it easy to visualize even complex trees. strength.tree <- rpart(Strength ~ .,data=concrete_train, control=rpart.control(minspli t=20,cp=0.002)) prp(strength.tree,compress=TRUE) Machine Learning, Key to Your Formulation Challenges file:///P:/MachineLearningExamples/Machine_Learning_Formulation.html 9 of 11 2/18/2016 5:07 PM
  • 10. In the network, normalized strength is indicated in the oval leaves, and are ranked from low to high from left to right branches. Conclusions We hope this typical example demonstrates that Machine Learning algorithms are well positioned to help resolve formulation challenges, offering a fast, efficient and economical alternative to tedious experimentation. It is easy to imagine how similar questions can be resolved in all types of R&D, in materials, cosmetics, food or any scientific area. Rubber formulations to minimize rolling resistance and emissions, or modern composites to build renewable energy sources or lighweight transportation vehicles and next-generation public transit, as well as innovative UV-shield oinments and tasty snacks and drinks…, all present similar challenges where only the nature of inputs and outputs vary. Therefore, the method can and should be applied broadly! Next time, we’ll review how to address another common challenge: classification and clustering. Till then, we hope this approach has triggered interest. Why not try and implement Machine Learning in your scientific or technical expert area? Remember, PRC Consulting, LLC (http://www.prcconsulting.net) is dedicated to boosting innovation thru improved Analytics, one customer at the time! References Machine Learning, Key to Your Formulation Challenges file:///P:/MachineLearningExamples/Machine_Learning_Formulation.html 10 of 11 2/18/2016 5:07 PM
  • 11. The following sources are referenced as they provided significant help and information to develop this Machine Learning analysis applied to formulations: UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/datasets.html)1. caret (https://cran.r-project.org/web/packages/caret/caret.pdf)2. NeuralNetTool (https://beckmw.wordpress.com/tag/neural-network/)3. rpart (https://cran.r-project.org/web/packages/rpart/rpart.pdf)4. rpart.plot (https://cran.r-project.org/web/packages/rpart.plot/rpart.plot.pdf)5. RStudio (https://www.rstudio.com)6. Machine Learning, Key to Your Formulation Challenges file:///P:/MachineLearningExamples/Machine_Learning_Formulation.html 11 of 11 2/18/2016 5:07 PM