
Microsoft R - ScaleR Overview


Microsoft R enables enterprise-wide, scalable experimental data science and operational machine learning by providing a collection of servers and tools that extend the capabilities of open-source R. These slides give a quick introduction to the Microsoft R Server architecture and a comprehensive overview of ScaleR, the core library of Microsoft R, which enables parallel execution and the use of external data frames (xdfs). The presentation is tutorial-like and covers how to: 1) set up the environment, 2) read data, 3) process and transform, 4) analyse, summarize, and visualize, 5) learn and predict, and finally 6) deploy and consume (using mrsdeploy).

Published in: Data & Analytics


  1. Microsoft R - ScaleR Overview with a Quick Tutorial
      Khalid M. Salama, Ph.D., Business Insights & Analytics, Hitachi Consulting UK
      We Make it Happen. Better.
      © Copyright 2015 Hitachi Consulting
  2. Outline
      - Experimental Data Science vs Operational Machine Learning
      - Microsoft R Server
      - Overview of ScaleR
      - How to: Set Up the Environment
      - How to: Get Data
      - How to: Process & Transform
      - How to: Summarize, Analyse, and Visualize
      - How to: Learn & Predict
      - How to: Deploy and Consume (mrsdeploy)
      - Overview of MicrosoftML package functionality
  3. Experimental Data Science vs Operational Machine Learning
  4. Data Science Activities - Experimentation vs. Operationalization
      Diagram labels: Exploratory Data Analysis (Collect Data, Blend, Visualize, Prepare); ML Experiment (Algorithm Selection, Parameter Tuning, Training & Testing); Model; Learning Dataset; Report of Visuals & Findings; Decision!
      Data Analysis & Experimentation:
      - Interactive
      - Easy to perform
      - Rich visualizations
  5. Data Science Activities - Experimentation vs. Operationalization
      Diagram labels: Online Apps; Automated ML Pipeline (Data Ingestion, Data Processing, Model Training, Scoring); Model; Deploy; Web APIs; Predict; Train; Export; Batch; Real-time
      Operational ML Pipelines:
      - Pipelined (ETL integration)
      - Scalable
      - Apps integration
  6. Microsoft R Server
  7. Microsoft R Server - R in Microsoft World
      Microsoft R Open (MRO)
      - Based on the latest open-source R (3.2.2); built, tested, and distributed by Microsoft
      - More efficient, multi-threaded computation
      - Enhanced by the Intel Math Kernel Library (MKL) to speed up linear algebra functions
      - Compatible with all R-related software
  8. Microsoft R Server - Comparison

      |               | CRAN                               | MRO                                | MRS                                                                     |
      |---------------|------------------------------------|------------------------------------|-------------------------------------------------------------------------|
      | Data size     | In-memory                          | In-memory                          | In-memory & disk                                                        |
      | Efficiency    | Single-threaded                    | Multi-threaded                     | Multi-threaded, parallel processing on 1:N servers                      |
      | Support       | Community                          | Community                          | Community + commercial                                                  |
      | Functionality | 7500+ innovative analytic packages | 7500+ innovative analytic packages | 7500+ innovative packages + commercial parallel high-speed functions    |
      | Licence       | Open source                        | Open source                        | Commercial licence                                                      |
  9. Microsoft R Server - Components & Compute Contexts
      Diagram labels: Microsoft R Server (CRAN & MS R Open, ScaleR, DistributedR, ConnectR, MicrosoftML package, Operationalization (mrsdeploy)); RStudio | RTVS; MS R Client; Scale & Deploy; Different Compute Contexts
      - Installed on Windows or Linux.
      - ScaleR - optimized for parallel execution on big data, to eliminate memory limitations.
      - ConnectR - provides access to local file systems, HDFS, Hive, SQL Server, Teradata, etc.
      - DistributedR - an adaptable parallel execution framework that enables running on different (distributed) compute contexts.
      - Operationalization (mrsdeploy) - deploys the model as a web API.
      https://msdn.microsoft.com/en-us/microsoft-r/microsoft-r-getting-started
  10. Microsoft R ScaleR Summary Map
      Setup
      - 1- Get Information: Revo.home(), Revo.version, rxGetComputeContext(), rxGetFileSystem(), rxOptions()
      - 2- Set Properties:
          rxSetComputeContext(): RxLocalSeq, RxLocalParallel, RxInSqlServer, RxInTeradata, RxHadoopMR, RxSpark
          rxSetFileSystem(): RxNativeFileSystem, RxHdfsFileSystem
          rxOptions(option = value)
      Import Data
      - 1- Reference a Data Source: RxTextData(), RxSqlServerData(), RxOdbcData(), RxTeradata(), RxSasData(), RxSpssData(), RxHiveData(), RxParquetData()
      - 2- Import Data to XDF: rxImport()
      - 3- Reference XDF: RxXdfData()
      - 4- View Data Information: rxGetInfo()
      Process & Transform
      - rxDataStep(): inData (reference to a data source), outFile (xdf), overwrite (the outFile, if it exists), varsToKeep (column selection), rowSelection (filter), transformObjects (objects needed by your processing logic), transformPackages (packages needed by your processing logic), transformFunc (function with your processing logic)
      - rxMerge(): inData1, inData2, outFile, matchVars, type
      - Others: rxSplit(), rxSort(), rxFactors()
      Summarize
      - (formula, data): rxSummary(), rxQuantile(), rxCrossTabs(), rxCube()
      - (crossTabs): rxMarginals(), as.xtabs()
      Analyse
      - (formula, data): rxCovCor(), rxCor(), rxSSCP()
      - (xtab): rxChiSquaredTest(), rxFisherTest(), rxKendallCor(), rxRiskRatio(), rxOddsRatio()
      Visualize
      - rxHistogram(), rxLinePlot(), rxRocCurve()
      Learn & Predict
      - Classification (formula, data): rxDTree(), rxBTrees(), rxDForest(), rxNaiveBayes(), rxLogit()
      - Regression (formula, data): rxLinMod(), rxGlm(), rxDTree(), rxBTrees()
      - Clustering (formula, data): rxKMeans()
      - Predict: rxPredict(model, data), rxRoc()
      Deploy (mrsdeploy)
      - remoteLogin(), listServices(), getService(), publishService(), api$consume()
  11. Microsoft R - ScaleR: Get Information
      - Revo.version - query the version of the current ScaleR.
      - Revo.home() - get the path of the R installation currently in use. Make sure it is Microsoft R (Client or Server), not open-source R.
      - rxGetComputeContext() - get the current compute context. You can set the compute context to many different options, as shown on the next slide.
      - rxGetFileSystem() - get the default file system in use. You can change the file system from "native" to "hdfs", as shown on the next slide.
      - rxOptions() - list all the ScaleR configuration options and their current values. You can get the value of a specific option using rxGetOption("optionName").
  12. Microsoft R - ScaleR: Set Information
      - rxSetComputeContext(computeContext) - sets the compute context. The options below are computeContext objects, and each needs different parameters to construct:
          RxLocalSeq(), RxLocalParallel(), RxInSqlServer(), RxInTeradata(), RxHadoopMR(), RxSpark()
      - rxSetFileSystem(fileSystem) - the fileSystem object can be one of the following two options:
          RxNativeFileSystem(), RxHdfsFileSystem()
      - rxOptions(option = value) - sets an option. Note that these are global defaults: they are used when nothing is specified in an operation, and you can override them in each operation.
  13. Microsoft R - ScaleR: Get Data
      1. Reference a Data Source - use the following functions to reference various data sources:
          RxTextData(), RxOdbcData(), RxSqlServerData(), RxTeradata(), RxSasData(), RxSpssData(), RxHiveData(), RxParquetData()
      2. Import the data to an eXternal Data Frame (xdf) - note that you can query the data in the data source, but you need to import it to xdf to be able to process it in your compute context.
          rxImport(inData = dataSource, outFile = xdfFile.xdf)
          - overwrite = Boolean flag: replace an existing xdf file or not
          - append = use "rows" to append to the same .xdf file
      3. Read the imported xdf data
          RxXdfData(file = xdfFile.xdf)
          - createCompositeSet = set to TRUE if you point to a directory that contains multiple .xdf files, to treat them as one dataset
  14. Microsoft R - ScaleR: Reference a Data Source
          file_path = file.path(data_directory, "iris.csv")
          txtDataSource = RxTextData(file = file_path)
      OR
          connection_string = "Driver=SQL Server; Server=.; Database=dbdemo; Trusted_Connection=True;"
          sql_query = "SELECT * FROM iris;"
          sqlDataSource = RxSqlServerData(connectionString = connection_string, sqlQuery = sql_query)
      Note that this is only a reference to the data source; nothing is done with the data until you query it, e.g. head(txtDataSource).
  15. Microsoft R - ScaleR: Import to xdf
          xdf_file_path = file.path(data_directory, "iris.xdf")
          # dataSource is any data source reference, e.g. txtDataSource from the previous slide
          iris_xdata = rxImport(inData = dataSource,
                                outFile = xdf_file_path,
                                overwrite = TRUE,
                                append = "none")
      - inData = any "Rx" data source, or a file path
      - outFile = file in which to store the .xdf dataset
      - overwrite = Boolean flag: replace an existing xdf file or not
      - append = use "rows" to append to the same .xdf file
      This creates the iris.xdf file in your file system and returns iris_xdata, a reference for working with the dataset. You can read the .xdf file later:
          iris_xdata = RxXdfData(file = xdf_file_path)
          class(iris_xdata)
  16. Microsoft R - ScaleR: Describing xdf
          rxGetInfo(data = iris_xdata, getVarInfo = TRUE, numRows = 2)
          rxSummary(formula = ~., data = iris_xdata)
  17. Microsoft R - ScaleR: Read a subset of xdf to a data frame
          iris_subset = rxReadXdf(file = iris_xdata, startRow = 10, numRows = 5)
      - iris_subset = in-memory data frame
      - file = the xdf file path or RxXdfData data source to read from
      - startRow = first row to retrieve
      - numRows = number of rows to retrieve
      It is sometimes useful to read a (small) subset of the xdf into a data frame, to test a processing function on it before applying it to the big data (xdf).
  18. Microsoft R - ScaleR: Process & Transform
      Remember that your compute context can be a distributed processing cluster: HPC, Spark, Hadoop, etc. In that case, each node of the compute cluster processes a subset of your xdf, since the data is itself split across HDFS. Your data processing operation needs to take this into account, i.e., all the required objects and packages must be available on the local node for it to process its portion of the data.
      The rxDataStep() function is used to process and transform an xdf dataset, and can be used to:
      - Filter rows
      - Select columns
      - Add computed columns
      - Convert column types (e.g. discretize to factors)
      - Update existing columns (handling missing values, scaling & normalizing, etc.)
      rxDataStep(...)
      - inData = xdf to process
      - outFile = can be the same as the input xdf. If omitted, the function returns a data frame
      - overwrite = set to TRUE if inData = outFile
      - rowSelection = (col1 > 50) & ...
      - varsToKeep = character vector of columns to select
      - transformFunc = a function that holds the processing logic
      - transformObjects = list of objects used in the function
      - transformPackages = list of packages used in the function
  19. Microsoft R - ScaleR: Process & Transform
      Extract means and standard deviations (used later to normalize some columns):
          rxsummary = rxSummary(~., iris_xdata)
          str(rxsummary$sDataFrame)
          means = rxsummary$sDataFrame$Mean
          stdvs = rxsummary$sDataFrame$StdDev
      Extract quantiles for Sepal.Length (used later to discretize it):
          cut_points = rxQuantile(varName = "Sepal.Length", data = iris_xdata)
          cut_points
  20. Microsoft R - ScaleR: Process & Transform
      Create the data processing function:
          process_data = function(data_frame) {
            # discretize; include.lowest = TRUE keeps the minimum value inside the first bin
            data_frame$Sepal.Length_Disc = cut(data_frame$Sepal.Length, breaks = cut_points, include.lowest = TRUE)
            # normalize (means[3]/stdvs[3] belong to Petal.Length, means[4]/stdvs[4] to Petal.Width)
            data_frame$Petal.Length_norm = (data_frame$Petal.Length - means[3]) / stdvs[3]
            data_frame$Petal.Width_norm = (data_frame$Petal.Width - means[4]) / stdvs[4]
            return(data_frame)
          }
      Note the following:
      - The function receives one chunk of the xdf dataset at a time, on whichever compute node is processing it. ScaleR passes the chunk as a named list of column vectors, which supports the same $ access as a data frame.
      - cut_points, means, and stdvs are variables that become available in the scope of this function when passed via the transformObjects argument of rxDataStep().
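Before running a transform on the full xdf, it can help to try it on an in-memory data frame. The sketch below is a base R analogue only: it uses the built-in iris data frame, computes stand-ins for the rxSummary() and rxQuantile() results with colMeans(), sd(), and quantile(), and imitates chunk-wise processing with split() and lapply(). The chunk size of 50 rows is an arbitrary assumption, and none of this calls ScaleR itself.

```r
# Base R stand-ins for the values the slides obtain from rxSummary() and rxQuantile()
num_cols   <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
means      <- unname(colMeans(iris[num_cols]))
stdvs      <- unname(apply(iris[num_cols], 2, sd))
cut_points <- quantile(iris$Sepal.Length, probs = seq(0, 1, 0.25))

# Same logic as the slide's process_data function
process_data <- function(data_frame) {
  # discretize Sepal.Length into quartile bins
  data_frame$Sepal.Length_Disc <- cut(data_frame$Sepal.Length,
                                      breaks = cut_points, include.lowest = TRUE)
  # z-score normalization of the two petal columns
  data_frame$Petal.Length_norm <- (data_frame$Petal.Length - means[3]) / stdvs[3]
  data_frame$Petal.Width_norm  <- (data_frame$Petal.Width  - means[4]) / stdvs[4]
  data_frame
}

# Imitate chunk-wise execution: three chunks of 50 rows (assumed size), then recombine
chunks    <- split(iris, rep(1:3, each = 50))
processed <- do.call(rbind, lapply(chunks, process_data))

str(processed)
```

Because the breaks, means, and standard deviations are computed once and shared, every chunk produces the same factor levels and the same scaling, which is the property a chunk-wise transform needs.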
  21. Microsoft R - ScaleR: Process & Transform
      Execute the process_data function on iris_xdata:
          rxDataStep(inData = iris_xdata,
                     outFile = iris_xdata,
                     overwrite = TRUE,
                     rowSelection = !is.na(Species),
                     transformFunc = process_data,
                     transformObjects = list("cut_points" = cut_points,
                                             "means" = means,
                                             "stdvs" = stdvs))
  22. Microsoft R - ScaleR: Summarize & Analyse
      Understand variable dependencies & correlations:
          formula = ~ Species + Sepal.Length + Sepal.Width + Petal.Length + Petal.Width
          rxCovCor(formula, data = iris_xdata, type = "Cor")
      - "Cor" = correlation
      - "Cov" = covariance
      - "SSCP" = sum of squares / cross-products
  23. Microsoft R - ScaleR: Summarize & Analyse
      Summarize data (generate sums, means, and counts) using cross tabs:
          formula = Sepal.Width ~ Sepal.Length_Disc:Species
          ctabs = rxCrossTabs(formula, data = iris_xdata, means = TRUE)
          ctabs$sums
          ctabs$means
          ctabs$counts
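To make the three outputs of the cross tab concrete, here is a small base R analogue on the built-in iris data frame. It is a sketch of the same idea with tapply(), not the rxCrossTabs() implementation; the quartile binning of Sepal.Length is an assumption that mirrors the discretization used earlier in the slides.

```r
# Discretize Sepal.Length into quartile bins (assumed to mirror Sepal.Length_Disc)
disc <- cut(iris$Sepal.Length,
            breaks = quantile(iris$Sepal.Length, probs = seq(0, 1, 0.25)),
            include.lowest = TRUE)

# One cell per (bin, species) pair; empty cells come back as NA
groups <- list(Sepal.Length_Disc = disc, Species = iris$Species)
sums   <- tapply(iris$Sepal.Width, groups, sum)
counts <- tapply(iris$Sepal.Width, groups, length)
means  <- sums / counts

print(sums)
print(counts)
print(means)
```

Each matrix has one row per Sepal.Length bin and one column per species, which is the layout the formula `Sepal.Width ~ Sepal.Length_Disc:Species` describes.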
  24. Microsoft R - ScaleR: Summarize & Analyse
      Summarize cross tab results:
          summary(ctabs, output = "means")
      Get margins:
          rxMarginals(ctabs, output = "sums")
      Perform statistical dependency test
  25. Microsoft R - ScaleR: Summarize & Analyse
      Summarize using rxCube (to produce a long-format table):
          formula = Petal.Width ~ F(Petal.Length)
          rxCube(formula, data = iris_xdata)
      - F(variable) converts the variable into a factor, on the fly, using the distinct rounded values of this variable
  26. Microsoft R - ScaleR: Visualize
          rxHistogram(~Sepal.Length|Species, data = iris_xdata)
  27. Microsoft R - ScaleR: Learn & Predict
      Classification algorithms
      - rxDTree() - decision trees for classification and regression. Can be converted to rpart tree models
      - rxBTrees() - gradient boosted trees
      - rxDForest() - random forests
      - rxNaiveBayes()
      - rxLogit() - logistic regression models
      Regression algorithms
      - rxLinMod() - linear regression models
      - rxGlm() - generalized linear models
      - rxDTree()
      - rxBTrees()
      Clustering algorithm
      - rxKMeans()
      All the algorithms accept the following parameters:
      - formula: response ~ input1 + input2:input3
      - data: learning set
      - other parameters, depending on the algorithm
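The formula notation is standard R, so it can be inspected with base R before it is handed to any of the algorithms. The short sketch below uses the placeholder names from the slide (response, input1, input2, input3 are not real columns) and shows that `+` adds a term while `:` denotes an interaction between two inputs.

```r
# Placeholder formula from the slide; the variable names are illustrative only
f <- response ~ input1 + input2:input3

# All variable names referenced by the formula (the response comes first)
variables <- all.vars(f)

# The model terms on the right-hand side: one main effect and one interaction
term_labels <- attr(terms(f), "term.labels")

print(variables)
print(term_labels)
```

Here `input2` and `input3` enter the model only through their interaction, not as separate main effects.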
  28. Microsoft R - ScaleR: Learn & Predict - Decision Trees Example
      rxDTree() is used to train classification trees (target variable is categorical) and regression trees (target variable is numeric). The output is similar to an rpart tree model. The key parameters are:
      - formula: response ~ input1 + input2:input3
      - data: training set
      - xVal: number of cross-validation folds for pruning
      - maxDepth: maximum number of tree levels (to control complexity)
      - minBucket: minimum number of examples that must be in a leaf node (to control complexity)
          formula = Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width
          models.dtree = rxDTree(formula, data = iris_xdata)
          models.dtree
  29. Microsoft R - ScaleR: Learn & Predict - Decision Trees Example
          # get predictions, in the form of probabilities
          predictions = rxPredict(models.dtree, data = iris_xdata, type = c("prob"))
          # select only the actual and predicted columns (as a data frame)
          predictions = rxDataStep(predictions,
                                   varsToKeep = c("Species", "setosa_Pred", "versicolor_Pred", "virginica_Pred"),
                                   transforms = list(
                                     setosa_actual = as.numeric(Species == 'setosa'),
                                     versicolor_actual = as.numeric(Species == 'versicolor'),
                                     virginica_actual = as.numeric(Species == 'virginica')))
          # display the prediction results
          rxGetInfo(predictions, getVarInfo = TRUE, numRows = 5)
          # plot the ROC curve (with respect to versicolor predictions)
          rxRocCurve(actualVarName = "versicolor_actual", predVarNames = c("versicolor_Pred"), data = predictions)
  30. Microsoft R - ScaleR: Learn & Predict - Decision Trees Example
          # compute accuracy: the fraction of rows where the predicted class equals the actual class
          predictions = rxPredict(models.dtree, data = iris_xdata, type = c("class"))
          predictions = rxReadXdf(predictions, varsToKeep = c("Species", "Species_Pred"))
          accuracy = sum(predictions$Species == predictions$Species_Pred) / nrow(predictions)
          print(accuracy)
          # use Revo Tree View to show the tree
          tree = RevoTreeView::createTreeView(models.dtree)
          plot(tree)
          # convert to an rpart tree model
          rpart_tree = as.rpart(models.dtree)
          class(rpart_tree)
          # export to PMML format
          library(pmml)
          pmml(rpart_tree)
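Accuracy alone hides which classes are confused with each other, so a confusion matrix is a useful companion. The base R sketch below works on the built-in iris data frame and uses a hand-written threshold rule on Petal.Length as a stand-in classifier; that rule and its cut-offs (2.5 and 4.8) are assumptions for illustration, not the tree learned by rxDTree().

```r
# Stand-in classifier (assumed thresholds), producing labels with the same levels as Species
predicted <- factor(
  ifelse(iris$Petal.Length < 2.5, "setosa",
         ifelse(iris$Petal.Length < 4.8, "versicolor", "virginica")),
  levels = levels(iris$Species))

# Rows are actual classes, columns are predicted classes
confusion <- table(actual = iris$Species, predicted = predicted)

# Accuracy is the share of observations on the diagonal
accuracy <- sum(diag(confusion)) / sum(confusion)

print(confusion)
print(accuracy)
```

The same two lines for `confusion` and `accuracy` apply unchanged to a data frame holding the Species and Species_Pred columns read from the prediction results.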
  31. Microsoft R - ScaleR: Parallel Processing on Partitioned Data
      In some cases, instead of building one "big" model using all your "big" data, you build "many" models using "small" subsets of the data. Examples include building many time-series models, one for each product line, for demand forecasting, or several regression models, one for each geographic area, for fraud detection. This is also called a mixture of local models.
      In this case, your data is partitioned into (smaller) subsets by certain criteria, and then local models are built, one for each data subset. Such a process can be performed in parallel using the rxExecBy() function, which takes the following parameters:
      - inData = xdf dataset to be partitioned
      - keys = character vector of the names of the dataset columns by which the data will be partitioned. These columns should be of type factor
      - func = the function that will be applied to each data partition (i.e., learning a local model)
      rxExecBy() returns a list containing the constructed model of each partition.
      Diagram: Dataset -> Partition -> Subset 1, Subset 2, Subset 3 -> Learn (in parallel) -> Local Model 1, Local Model 2, Local Model 3
  32. Microsoft R - ScaleR: Parallel Processing on Partitioned Data
      For example, using the iris dataset, let's build a regression model that estimates Sepal.Length based on Sepal.Width, for each Species type. In other words, we partition the iris dataset into 3 subsets, one for each Species type (setosa, versicolor, virginica), and build a local model for each partition, in parallel.
          xdf = RxTextData(file = file.path(data_directory, "iris.csv"))
          buildLocalModels = function(keys, data) {
            # import this partition, then fit the local model on the imported data
            local_xdf = rxImport(inData = data)
            local_model = rxLinMod(formula = Sepal.Length ~ Sepal.Width, data = local_xdf)
            return(local_model)
          }
          local_models = rxExecBy(inData = xdf, keys = c("Species"), func = buildLocalModels)
          local_models[[1]]$result
          local_models[[2]]$result
          local_models[[3]]$result
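The partition-then-learn pattern can be tried in plain R on the built-in iris data frame before moving to rxExecBy(). The sketch below is a sequential base R analogue using split() and lapply() with lm(); it does not run in parallel and is not how rxExecBy() is implemented, but it produces one local model per Species in the same way.

```r
# Partition the dataset by the key column: one data frame per species
partitions <- split(iris, iris$Species)

# Learn one local regression model per partition (sequentially here)
local_models <- lapply(partitions, function(part) {
  lm(Sepal.Length ~ Sepal.Width, data = part)
})

# Collect intercepts and slopes side by side: one column per species
coefs <- sapply(local_models, coef)
print(coefs)
```

Since each partition is processed independently, the lapply() call is the step that a parallel backend can distribute across workers or cluster nodes.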
  33. Microsoft R - mrsdeploy: Deploy & Consume
      In order to deploy an R model as a web API, you need to configure an MS R Server for operationalization by running the R-Server-Admin-Util, as described at this link: https://msdn.microsoft.com/en-us/microsoft-r/operationalize/about
  34. Microsoft R - mrsdeploy: Deploy & Consume
          library(mrsdeploy)
          # generate data
          x = 1:100
          y = 2*x + rnorm(n = length(x), mean = 0, sd = 5)
          # build a linear model
          reg_model = lm(y ~ x)
          # create a prediction function: takes an input and uses the lm to estimate the output
          estimate_output = function(input) {
            newdata = data.frame(x = input)
            estimates = predict(reg_model, newdata = newdata, type = "response")
            return(estimates)
          }
          # connect to the R Server to deploy to
          remoteLogin("http://localhost:12800", username = "admin", password = <password>)
          # paste0 avoids a space in the service name
          serviceName <- paste0("estimate_output_", round(as.numeric(Sys.time()), 0))
          # publish the prediction function
          api = publishService(serviceName,
                               code = estimate_output,
                               model = reg_model,  # model to be used in the function
                               inputs = list(input = "numeric"),
                               outputs = list(output = "numeric"),
                               v = "v1.0.0")
          # query the published API
          api
          # list the deployed APIs
          mrsdeploy::listServices()
          # consume the API
          result = api$estimate_output(120)
          result$output("output")
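It is worth checking the prediction function locally before publishing it, since errors are easier to diagnose in the local session than through a web API. The sketch below uses base R only and makes no mrsdeploy calls; set.seed(42) and the unname() call are additions for reproducibility and a clean numeric result, not part of the slide's code.

```r
# Reproducible version of the slide's synthetic data (the seed is an added assumption)
set.seed(42)
x <- 1:100
y <- 2 * x + rnorm(n = length(x), mean = 0, sd = 5)

# Same model and prediction function as in the slide
reg_model <- lm(y ~ x)
estimate_output <- function(input) {
  newdata <- data.frame(x = input)
  # unname() drops the row label so a plain numeric vector is returned
  unname(predict(reg_model, newdata = newdata, type = "response"))
}

# Call the function exactly as the published service would be called
local_result <- estimate_output(120)
print(local_result)
```

Because the data is generated as y = 2x plus noise, the estimate for an input of 120 should land close to 240.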
  35. Microsoft R - MicrosoftML: MicrosoftML Overview
      Machine learning algorithms
      - rxFastLinear() - binary classification & regression
      - rxOneClassSvm() - anomaly detection (unsupervised)
      - rxFastTrees() - classification & regression
      - rxFastForest() - classification & regression
      - rxNeuralNet() - classification & regression
      - rxLogisticRegression() - classification
      - rxEnsemble() - combines a number of models of various kinds
      Text processing
      - featurizeText() - TF, IDF, TF-IDF
      - getSentiment() - using a pretrained model
      Image processing
      - featurizeImage() - using a pretrained model
      - loadImage()
      - resizeImage()
      - extractPixels() - extracts the pixel values from an image
      Other processing
      - selectFeatures() - using minCount or mutualInfo
      - categorical() - converts a categorical variable to indicator columns
      - categoricalHash() - converts a categorical variable to indicator columns using hashing (used for variables with many values)
      https://msdn.microsoft.com/en-us/microsoft-r/microsoftml-get-started
  36. My Background
      Applying Computational Intelligence in Data Mining
      - Honorary Research Fellow, School of Computing, University of Kent.
      - Ph.D. Computer Science, University of Kent, Canterbury, UK.
      - 28+ published journal and conference papers in the fields of AI and ML
      https://www.researchgate.net/profile/Khalid_Salama
      https://www.linkedin.com/in/khalid-salama-24403144/
      https://github.com/khalid-m-salama/sqlbits-2017
  37. Thanks!
