Valencian Summer School in Machine Learning
4th edition
September 13–14, 2018
Basic WhizzML
Mercè Martín
Outline
1 Server-side workflows: WhizzML
2 Example Workflow: Model or Ensemble?
3 Closing the cycle: WhizzML and Feature engineering
#VSSML18 Basic WhizzML September 13–14, 2018 3 / 36
Client-side Machine Learning Automation
Problems of client-side solutions
Complexity: Lots of details outside the problem domain
Extensibility: BigMLer hides complexity at the cost of flexibility
• We need to explicitly control the resource management flows and cope with errors
• Alternatively, we use assistants that do it for us, but for a limited subset of workflows
Scalability: Client-side workflows are hard to optimize
Reuse: No inter-language compatibility
• We need to deal with the number of parallel tasks and the available shared resources
• The same workflows need to be reprogrammed in many languages
Not enough abstraction: we've managed to abstract the logic of the ML algorithms, but not the workflow logic
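To make the client-side burden concrete, here is a minimal Python sketch of the kind of polling and error handling a client has to write for every resource it creates. Nothing here is a real BigML call: `get_status`, the status names and `ResourceFailed` are assumptions made up for the illustration.

```python
import time


class ResourceFailed(Exception):
    """Raised when the (hypothetical) service reports a failed resource."""


def wait_until_finished(get_status, resource_id, retries=10, delay=0.0):
    """Poll get_status(resource_id) until it reports 'finished'.

    get_status stands in for a remote call. The status names used here
    ('queued', 'in-progress', 'finished', 'faulty') are assumptions.
    """
    for _ in range(retries):
        status = get_status(resource_id)
        if status == "finished":
            return resource_id
        if status == "faulty":
            raise ResourceFailed(resource_id)
        time.sleep(delay)
    raise TimeoutError(f"{resource_id} not finished after {retries} polls")


def fake_status_factory(answers):
    """Build a fake status function that replays a fixed list of statuses."""
    replay = iter(answers)
    return lambda resource_id: next(replay)


status = fake_status_factory(["queued", "in-progress", "finished"])
print(wait_until_finished(status, "dataset/1"))
```

None of this code is about the ML problem itself, which is the point the slides make about client-side automation.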
Higher-level Machine Learning
Server-side Machine Learning
Solution (scalability, reuse): Back to the server
Basic workflows: automatic generation
Server-side Machine Learning Automation
Solution (complexity, extensibility): Domain-specific languages
abstracting plus naming and full language flexibility
WhizzML in a Nutshell
• Domain-specific language for ML workflow automation
  - High-level problem and solution specification
• Framework for scalable, remote execution of ML workflows
  - Sophisticated server-side optimization
  - Out-of-the-box scalability
  - Client-server brittleness removed
  - Infrastructure for creating and sharing ML scripts and libraries
WhizzML REST Resources
Library: A reusable building block, that is, a collection of WhizzML definitions that can be imported by other libraries or scripts.
Script: Executable code that describes an actual workflow.
• Imports: List of libraries with code used by the script.
• Inputs: List of input values that parameterize the workflow.
• Outputs: List of values computed by the script and returned to the user.
Execution: Given a script and a complete set of inputs, the workflow can be executed and its outputs generated.
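The relationship between a script's declared inputs and an execution can be sketched in Python. This is only an illustrative model of the three resource types described above, not the BigML API: the `Script` class and the `missing_inputs` helper are hypothetical, and the input/output declarations reuse the shape shown in the deck's own script metadata.

```python
from dataclasses import dataclass, field


@dataclass
class Script:
    """Hypothetical in-memory model of a WhizzML script resource."""
    source_code: str
    inputs: list = field(default_factory=list)    # [{"name": ..., "type": ...}]
    outputs: list = field(default_factory=list)   # [{"name": ..., "type": ...}]
    imports: list = field(default_factory=list)   # ids of imported libraries


def missing_inputs(script, provided):
    """Return the names of declared inputs that `provided` does not cover."""
    return [spec["name"] for spec in script.inputs
            if spec["name"] not in provided]


script = Script(
    source_code="(define result (model-or-ensemble input-source-id))",
    inputs=[{"name": "input-source-id", "type": "source-id"}],
    outputs=[{"name": "result", "type": "resource-id"}],
)

# An execution only makes sense once every declared input has a value.
print(missing_inputs(script, {}))
print(missing_inputs(script, {"input-source-id": "source/5643d345f43a234ff2310a3e"}))
```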
Use the REPL
Defining global variables
(define text "Hello BigMLers")
Defining local variables
(let (local-text "Hello BigMLers")
  (log-info local-text))
Defining procedures
(define (print-hello name)
  (log-info "Hello " name))
;; use it!
(print-hello "BigMLers")
Every expression returns a value, and variables are immutable.
How to create WhizzML Scripts/Libraries
• GitHub
• Script editor
• Gallery
• Other scripts
• Scriptify
Basic workflow in WhizzML
(let (dataset (create-dataset source)
      cluster (create-cluster dataset))
  (create-batchcentroid dataset
                        cluster
                        {"output_dataset" true
                         "all_fields" true}))
Abstraction at a higher level
Scripts in WhizzML: Usable by any binding
from bigml.api import BigML
api = BigML()
# choose workflow
script = 'script/567b4b5be3f2a123a690ff56'
# define parameters: the bindings expect [name, value] pairs under "inputs"
inputs = {'inputs': [['source', 'source/5643d345f43a234ff2310a3e']]}
# execute and wait for the execution to finish
api.ok(api.create_execution(script, inputs))
Scripts in WhizzML: Trivial parallelization
What else do we need?
The standard functions
• Numeric and relational operators (+, *, <, =, ...)
• Mathematical functions (cos, sinh, floor ...)
• Strings and regular expressions (str, matches?, replace, ...)
• Flatline generation
• Collections: list traversal, sorting, map manipulation
• BigML resources manipulation
  - Creation: create-source, create-and-wait-dataset, etc.
  - Retrieval: fetch, list-anomalies, etc.
  - Update: update
  - Deletion: delete
• Machine Learning Algorithms (SMACDown, Boosting, etc.)
Model or Ensemble?
• Split a dataset into test and training parts
• Create a model and an ensemble with the training dataset
• Evaluate both with the test dataset
• Choose the one with better evaluation (f-measure)
https://github.com/whizzml/examples/tree/master/model-or-ensemble
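As a reminder of what is being compared in the last step, the sketch below computes an F-measure from raw counts and applies the selection rule. It assumes the usual F1 definition (the harmonic mean of precision and recall). The evaluation field used later in the deck is an average over classes, so this single-class version is only illustrative, and the names `f_measure` and `pick_best` are made up here.

```python
def f_measure(true_pos, false_pos, false_neg):
    """F1 score: harmonic mean of precision and recall (0.0 if undefined)."""
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def pick_best(model_f, ensemble_f):
    """Keep the model only if it is strictly better; otherwise the ensemble."""
    return "model" if model_f > ensemble_f else "ensemble"


print(pick_best(f_measure(8, 2, 2), f_measure(9, 1, 1)))
```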
Model or Ensemble?
;; Function encapsulating the full workflow
(define (model-or-ensemble src-id)
  (let (ds-id (create-dataset src-id)
        [train-id test-id] (create-random-dataset-split ds-id 0.8)
        m-id (create-model train-id)
        e-id (create-ensemble train-id {"number_of_models" 15})
        m-f (f-measure (create-evaluation m-id test-id))
        e-f (f-measure (create-evaluation e-id test-id)))
    (if (> m-f e-f) m-id e-id)))
We only need one new function, f-measure, to extract the f-measure from an evaluation.
Model or Ensemble?
;; Function to extract the f-measure from an
;; evaluation, given its id.
(define (f-measure ev-id)
  (let (ev-id (wait ev-id) ;; because fetch doesn't wait
        evaluation (fetch ev-id))
    (evaluation ["result" "model" "average_f_measure"])))
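The last line of f-measure applies the fetched evaluation map to a path of keys. A rough Python equivalent of that nested lookup is sketched below. The helper name `get_in` and the sample numbers are invented for the example, and only the key path comes from the slide.

```python
from functools import reduce


def get_in(data, path):
    """Follow a list of keys through nested dictionaries."""
    return reduce(lambda node, key: node[key], path, data)


# Made-up evaluation document; only the key path mirrors the slide.
evaluation = {"result": {"model": {"average_f_measure": 0.87}}}

print(get_in(evaluation, ["result", "model", "average_f_measure"]))
```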
Model or Ensemble?
;; Function encapsulating the full workflow
(define (model-or-ensemble src-id)
  (let (ds-id (create-dataset src-id)
        [train-id test-id] (create-random-dataset-split ds-id 0.8)
        m-id (create-model train-id)
        e-id (create-ensemble train-id {"number_of_models" 15})
        m-f (f-measure (create-evaluation m-id test-id))
        e-f (f-measure (create-evaluation e-id test-id)))
    (if (> m-f e-f) m-id e-id)))
;; Compute the result of the script execution
;; - Inputs: [{"name": "input-source-id", "type": "source-id"}]
;; - Outputs: [{"name": "result", "type": "resource-id"}]
(define result (model-or-ensemble input-source-id))
Data Transformations
Feature engineering is an unavoidable part of any Machine Learning DSL. WhizzML covers it through a transformation language: Flatline.
Transforming item counts to features
basket           milk  eggs  flour  salt  chocolate  caviar
milk,eggs        Y     Y     N      N     N          N
milk,flour       Y     N     Y      N     N          N
milk,flour,eggs  Y     Y     Y      N     N          N
chocolate        N     N     N      N     Y          N
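The transformation in the table can be sketched in plain Python: each basket string becomes one Y/N column per known item. This is a client-side illustration only, and the function name and the comma separator are assumptions. The deck performs the same transformation server-side with Flatline.

```python
def items_to_features(baskets, items, yes="Y", no="N", separator=","):
    """Turn each basket string into a dict with one yes/no entry per item."""
    rows = []
    for basket in baskets:
        present = set(basket.split(separator))
        rows.append({item: (yes if item in present else no) for item in items})
    return rows


baskets = ["milk,eggs", "milk,flour", "milk,flour,eggs", "chocolate"]
items = ["milk", "eggs", "flour", "salt", "chocolate", "caviar"]

for basket, row in zip(baskets, items_to_features(baskets, items)):
    print(basket, row)
```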
Adding a new feature with Flatline
We need to create a new dataset by adding a new field whose binary value depends on the content of the basket field.
The WhizzML expression will look like
(create-dataset ds-id
                {"new_fields" [{"name" new-field-name
                                "field" new-field-value}]})
where the field value should be computed using a Flatline expression
(if (contains-items? "basket" "milk") "Y" "N")
Item counts to features with Flatline
One new field per category
(if (contains-items? "basket" "milk") "Y" "N")
(if (contains-items? "basket" "eggs") "Y" "N")
(if (contains-items? "basket" "flour") "Y" "N")
(if (contains-items? "basket" "salt") "Y" "N")
(if (contains-items? "basket" "chocolate") "Y" "N")
(if (contains-items? "basket" "caviar") "Y" "N")
Parameterized code generation
• Field name
• Item values
• Y/N category names
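The idea of parameterized generation can be illustrated with a few lines of Python that build the six expressions above from a field name, an item list and the two category labels. The helper name `field_flatline` mirrors the deck's naming, but this Python version is only a sketch of the string templating. It uses `json.dumps` to produce double-quoted literals, which is an assumption that suits simple ASCII names.

```python
import json


def field_flatline(field, item, yes="Y", no="N"):
    """Build the Flatline expression text for one item of an items field."""
    quote = json.dumps  # renders a Python string as a double-quoted literal
    return (f"(if (contains-items? {quote(field)} {quote(item)}) "
            f"{quote(yes)} {quote(no)})")


items = ["milk", "eggs", "flour", "salt", "chocolate", "caviar"]
for item in items:
    print(field_flatline("basket", item))
```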
Flatline code generation with WhizzML
The WhizzML code should generate one string per category
"(if (contains-items? \"basket\" \"milk\") \"Y\" \"N\")"
Let's extract the parameters in the expression
(let (field "basket"
      item "milk"
      yes "Y"
      no "N")
  (flatline "(if (contains-items? {{field}} {{item}})"
            "{{yes}}"
            "{{no}})"))
Finally, let's create a procedure
(define (field-flatline field item yes no)
  (flatline "(if (contains-items? {{field}} {{item}})"
            "{{yes}}"
            "{{no}})"))
Flatline code generation with WhizzML
(define (field-flatline field item yes no)
  (flatline "(if (contains-items? {{field}} {{item}})"
            "{{yes}}"
            "{{no}})"))
(define (item-fields field items yes no)
  (for (item items)
    {"field" (field-flatline field item yes no)}))
(define (dataset-item-fields ds-id field)
  (let (ds (fetch ds-id)
        item-dist (ds ["fields" field "summary" "items"])
        items (map head item-dist))
    (item-fields field items "Y" "N")))
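The data flow of dataset-item-fields can be mirrored in Python: read the item distribution from the dataset summary, keep the first element of each entry, and emit one new-field specification per item. The sample dataset below is invented, and its layout is an assumption: it copies the key path used in the WhizzML code and treats each summary entry as an [item, count] pair.

```python
def dataset_item_fields(dataset, field_id, yes="Y", no="N"):
    """Return one {"field": <flatline text>} spec per item of an items field."""
    item_dist = dataset["fields"][field_id]["summary"]["items"]
    items = [entry[0] for entry in item_dist]  # same role as (map head item-dist)
    return [
        {"field": f'(if (contains-items? "{field_id}" "{item}") "{yes}" "{no}")'}
        for item in items
    ]


# Made-up dataset document with [item, count] pairs for field "000000".
dataset = {
    "fields": {
        "000000": {
            "summary": {
                "items": [["milk", 3], ["eggs", 2], ["flour", 2], ["chocolate", 1]]
            }
        }
    }
}

new_fields = {"new_fields": dataset_item_fields(dataset, "000000")}
print(new_fields["new_fields"][0])
```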
Flatline code generation with WhizzML
(define output-dataset
  (let (fs {"new_fields" (dataset-item-fields input-dataset
                                              field)})
    (create-dataset input-dataset fs)))
{"inputs": [{"name": "input-dataset",
             "type": "dataset-id",
             "description": "The input dataset"},
            {"name": "field",
             "type": "string",
             "description": "Id of the items field"}],
 "outputs": [{"name": "output-dataset",
              "type": "dataset-id",
              "description": "The id of the generated dataset"}]}
More information
Resources
• Home: https://bigml.com/whizzml
• Documentation: https://bigml.com/whizzml#documentation
• Examples: https://github.com/whizzml/examples
Questions?

Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Vastral Call Girls Book Now 7737669865 Top Class Escort Service Available
Vastral Call Girls Book Now 7737669865 Top Class Escort Service AvailableVastral Call Girls Book Now 7737669865 Top Class Escort Service Available
Vastral Call Girls Book Now 7737669865 Top Class Escort Service Available
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...
 
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 

VSSML18. Introduction to WhizzML

  • 1. Valencian Summer School in Machine Learning 4th edition September 13–14, 2018
  • 3. Outline 1 Server-side workflows: WhizzML 2 Example Workflow: Model or Ensemble? 3 Closing the cycle: WhizzML and Feature engineering #VSSML18 Basic WhizzML September 13–14, 2018 3 / 36
  • 4. Outline 1 Server-side workflows: WhizzML 2 Example Workflow: Model or Ensemble? 3 Closing the cycle: WhizzML and Feature engineering
  • 5. Client-side Machine Learning Automation
      Problems of client-side solutions
        Complexity: Lots of details outside the problem domain
        Extensibility: Bigmler hides complexity at the cost of flexibility
      • We need to explicitly control the resource management flows and cope with errors
      • Alternatively, we use assistants that do it for us, but for a limited subset of workflows
  • 6. Client-side Machine Learning Automation
      Problems of client-side solutions
        Scalability: Client-side workflows are hard to optimize
        Reuse: No inter-language compatibility
      • We need to deal with the number of parallel tasks and the available shared resources
      • The same workflows need to be reprogrammed in many languages
      We’ve managed to abstract the logic of the ML algorithms, but not the workflow logic
      Not enough abstraction
  • 7. Higher-level Machine Learning
  • 8. Server-side Machine Learning. Solution (scalability, reuse): Back to the server
  • 9. Basic workflows: automatic generation
  • 10. Server-side Machine Learning Automation. Solution (complexity, extensibility): Domain-specific languages abstracting plus naming and full language flexibility
  • 11. WhizzML in a Nutshell
      • Domain-specific language for ML workflow automation
          High-level problem and solution specification
      • Framework for scalable, remote execution of ML workflows
          Sophisticated server-side optimization
          Out-of-the-box scalability
          Client-server brittleness removed
          Infrastructure for creating and sharing ML scripts and libraries
  • 12. WhizzML REST Resources
      Library: Reusable building block: a collection of WhizzML definitions that can be imported by other libraries or scripts.
      Script: Executable code that describes an actual workflow.
        • Imports: List of libraries with code used by the script.
        • Inputs: List of input values that parameterize the workflow.
        • Outputs: List of values computed by the script and returned to the user.
      Execution: Given a script and a complete set of inputs, the workflow can be executed and its outputs generated.
  • 13. Use the REPL
      Defining global variables
        (define text "Hello BigMLers")
      Defining local variables
        (let (local-text "Hello BigMLers")
          (log-info local-text))
      Defining procedures
        (define (print-hello name)
          (log-info "Hello " name))
        ;; use it!
        (print-hello "BigMLers")
      Every expression returns a value, and variables are immutable.
  • 14. How to create WhizzML Scripts/Libraries: GitHub, Script editor, Gallery, Other scripts, Scriptify
  • 15. Higher-level Machine Learning
  • 16. Basic workflow in WhizzML
        (let (dataset (create-dataset source)
              cluster (create-cluster dataset))
          (create-batchcentroid dataset
                                cluster
                                {"output_dataset" true
                                 "all_fields" true}))
  • 17. Abstraction at a higher level
  • 18. Scripts in WhizzML: Usable by any binding
        from bigml.api import BigML
        api = BigML()
        # choose workflow
        script = 'script/567b4b5be3f2a123a690ff56'
        # define parameters
        inputs = {'source': 'source/5643d345f43a234ff2310a3e'}
        # execute
        api.ok(api.create_execution(script, inputs))
  • 19. Scripts in WhizzML: Trivial parallelization
  • 20. Scripts in WhizzML: Trivial parallelization
  • 21. What else do we need? The standard functions
      • Numeric and relational operators (+, *, <, =, ...)
      • Mathematical functions (cos, sinh, floor, ...)
      • Strings and regular expressions (str, matches?, replace, ...)
      • Flatline generation
      • Collections: list traversal, sorting, map manipulation
      • BigML resources manipulation
          Creation: create-source, create-and-wait-dataset, etc.
          Retrieval: fetch, list-anomalies, etc.
          Update: update
          Deletion: delete
      • Machine Learning Algorithms (SMACDown, Boosting, etc.)
  • 22. Outline 1 Server-side workflows: WhizzML 2 Example Workflow: Model or Ensemble? 3 Closing the cycle: WhizzML and Feature engineering
  • 23. Model or Ensemble?
      • Split a dataset into test and training parts
      • Create a model and an ensemble with the training dataset
      • Evaluate both with the test dataset
      • Choose the one with the better evaluation (f-measure)
      https://github.com/whizzml/examples/tree/master/model-or-ensemble
  • 24. Model or Ensemble?
        ;; Function encapsulating the full workflow
        (define (model-or-ensemble src-id)
          (let (ds-id (create-dataset src-id)
                [train-id test-id] (create-random-dataset-split ds-id 0.8)
                m-id (create-model train-id)
                e-id (create-ensemble train-id {"number_of_models" 15})
                m-f (f-measure (create-evaluation m-id test-id))
                e-f (f-measure (create-evaluation e-id test-id)))
            (if (> m-f e-f) m-id e-id)))
      We only need a new function, f-measure, to evaluate the f-measure of a model.
  • 25. Model or Ensemble?
        ;; Function to extract the f-measure from an
        ;; evaluation, given its id.
        (define (f-measure ev-id)
          (let (ev-id (wait ev-id) ;; because fetch doesn't wait
                evaluation (fetch ev-id))
            (evaluation ["result" "model" "average_f_measure"])))
  • 26. Model or Ensemble?
        ;; Function encapsulating the full workflow
        (define (model-or-ensemble src-id)
          (let (ds-id (create-dataset src-id)
                [train-id test-id] (create-random-dataset-split ds-id 0.8)
                m-id (create-model train-id)
                e-id (create-ensemble train-id {"number_of_models" 15})
                m-f (f-measure (create-evaluation m-id test-id))
                e-f (f-measure (create-evaluation e-id test-id)))
            (if (> m-f e-f) m-id e-id)))

        ;; Compute the result of the script execution
        ;; - Inputs: [{"name": "input-source-id", "type": "source-id"}]
        ;; - Outputs: [{"name": "result", "type": "resource-id"}]
        (define result (model-or-ensemble input-source-id))
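
The selection step in slides 24 to 26 can be mirrored outside the platform. The following is a minimal Python sketch, not BigML code: the evaluation dictionaries and resource ids are invented for illustration, and `get_in`, `f_measure` and `pick_better` are hypothetical helper names. It only shows the two ideas the WhizzML code relies on: following a key path into a nested evaluation, and returning the model id only when its f-measure is strictly greater.

```python
from functools import reduce


def get_in(resource, path):
    """Follow a list of keys into nested dictionaries."""
    return reduce(lambda node, key: node[key], path, resource)


def f_measure(evaluation):
    """Read the average f-measure from an evaluation-like dictionary."""
    return get_in(evaluation, ["result", "model", "average_f_measure"])


def pick_better(model_id, model_eval, ensemble_id, ensemble_eval):
    """Return the model id only if its f-measure is strictly greater."""
    if f_measure(model_eval) > f_measure(ensemble_eval):
        return model_id
    return ensemble_id


# Hypothetical evaluation payloads and ids, invented for illustration only.
model_eval = {"result": {"model": {"average_f_measure": 0.81}}}
ensemble_eval = {"result": {"model": {"average_f_measure": 0.86}}}

chosen = pick_better("model/hypothetical-1", model_eval,
                     "ensemble/hypothetical-2", ensemble_eval)
print(chosen)
```

As in the WhizzML version, a tie goes to the ensemble, because the comparison is a strict "greater than".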
  • 27. Outline 1 Server-side workflows: WhizzML 2 Example Workflow: Model or Ensemble? 3 Closing the cycle: WhizzML and Feature engineering
  • 28. Data Transformations: Feature engineering is an unavoidable part of any Machine Learning DSL. WhizzML covers that thanks to a transformations language: Flatline.
  • 29. Transforming item counts to features
        basket           milk  eggs  flour  salt  chocolate  caviar
        milk,eggs        Y     Y     N      N     N          N
        milk,flour       Y     N     Y      N     N          N
        milk,flour,eggs  Y     Y     Y      N     N          N
        chocolate        N     N     N      N     Y          N
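
The table on slide 29 can be reproduced with a few lines of plain Python. This is only an illustrative sketch of the transformation itself, assuming that baskets are comma-separated strings; `items_to_flags` is a hypothetical helper name and nothing here calls BigML.

```python
def items_to_flags(basket, items, yes="Y", no="N", separator=","):
    """Map each known item to yes/no depending on whether the basket contains it."""
    present = set(basket.split(separator))
    return {item: (yes if item in present else no) for item in items}


# Column order and rows taken from the slide's table.
items = ["milk", "eggs", "flour", "salt", "chocolate", "caviar"]
baskets = ["milk,eggs", "milk,flour", "milk,flour,eggs", "chocolate"]

for basket in baskets:
    flags = items_to_flags(basket, items)
    print(basket, " ".join(flags[item] for item in items))
```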
  • 30. Adding a new feature with Flatline
      We need to create a new dataset by adding a new field whose binary content depends on the value of the basket field.
      The WhizzML expression will look like
        (create-dataset ds-id
                        {"new_fields" [{"name" new-field-name
                                        "field" new-field-value}]})
      where the field value should be computed using a Flatline expression
        (if (contains-items? "basket" "milk") "Y" "N")
  • 31. Item counts to features with Flatline
      One new field per category
        (if (contains-items? "basket" "milk") "Y" "N")
        (if (contains-items? "basket" "eggs") "Y" "N")
        (if (contains-items? "basket" "flour") "Y" "N")
        (if (contains-items? "basket" "salt") "Y" "N")
        (if (contains-items? "basket" "chocolate") "Y" "N")
        (if (contains-items? "basket" "caviar") "Y" "N")
      Parameterized code generation
        Field name
        Item values
        Y/N category names
  • 34. Flatline code generation with WhizzML
      The WhizzML code should generate a string per category
        "(if (contains-items? "basket" "milk") "Y" "N")"
      Let’s extract the parameters in the expression
        (let (field "basket"
              item "milk"
              yes "Y"
              no "N")
          (flatline "(if (contains-items? {{field}} {{item}})"
                    "{{yes}}"
                    "{{no}})"))
      Finally, let’s create a procedure
        (define (field-flatline field item yes no)
          (flatline "(if (contains-items? {{field}} {{item}})"
                    "{{yes}}"
                    "{{no}})"))
  • 35. Flatline code generation with WhizzML
        (define (field-flatline field item yes no)
          (flatline "(if (contains-items? {{field}} {{item}})"
                    "{{yes}}"
                    "{{no}})"))

        (define (item-fields field items yes no)
          (for (item items)
            {"field" (field-flatline field item yes no)}))

        (define (dataset-item-fields ds-id field)
          (let (ds (fetch ds-id)
                item-dist (ds ["fields" field "summary" "items"])
                items (map head item-dist))
            (item-fields field items "Y" "N")))
  • 36. Flatline code generation with WhizzML
        (define output-dataset
          (let (fs {"new_fields" (dataset-item-fields input-dataset field)})
            (create-dataset input-dataset fs)))

        {"inputs": [{"name": "input-dataset",
                     "type": "dataset-id",
                     "description": "The input dataset"},
                    {"name": "field",
                     "type": "string",
                     "description": "Id of the items field"}],
         "outputs": [{"name": "output-dataset",
                      "type": "dataset-id",
                      "description": "The id of the generated dataset"}]}
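
To make the code-generation idea in slides 31 to 36 concrete, here is a minimal Python sketch that builds the same Flatline strings locally. It is an analogue of the WhizzML procedures, not a replacement for them. Assumptions: `json.dumps` stands in for the quoting that the `flatline` template performs on `{{...}}` placeholders, the dataset dictionary is a hand-written stand-in for a fetched dataset, and the field is keyed as "basket" here rather than by a real field id.

```python
import json


def field_flatline(field, item, yes, no):
    """Build one Flatline expression testing whether `field` contains `item`."""
    quote = json.dumps  # assumption: double-quoted strings, as in the slides
    return "(if (contains-items? {} {}) {} {})".format(
        quote(field), quote(item), quote(yes), quote(no))


def item_fields(field, items, yes, no):
    """One new-field description per item, mirroring item-fields."""
    return [{"field": field_flatline(field, item, yes, no)} for item in items]


def dataset_item_fields(dataset, field):
    """Read the items of `field` from its summary, mirroring dataset-item-fields."""
    item_dist = dataset["fields"][field]["summary"]["items"]
    items = [pair[0] for pair in item_dist]  # same role as (map head item-dist)
    return item_fields(field, items, "Y", "N")


# Hypothetical dataset structure: [item, count] pairs counted from slide 29.
dataset = {"fields": {"basket": {"summary": {"items": [
    ["milk", 3], ["eggs", 2], ["flour", 2], ["chocolate", 1]]}}}}

new_fields = dataset_item_fields(dataset, "basket")
for entry in new_fields:
    print(entry["field"])
```

In the WhizzML script the resulting list is passed as the "new_fields" argument of create-dataset. The sketch stops at generating the expressions.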
  • 37. More information
      Resources
      • Home: https://bigml.com/whizzml
      • Documentation: https://bigml.com/whizzml#documentation
      • Examples: https://github.com/whizzml/examples
  • 38. Questions?