3. Outline
1 Server-side workflows: WhizzML
2 Example Workflow: Model or Ensemble?
3 Closing the cycle: WhizzML and Feature engineering
#VSSML18 Basic WhizzML September 13–14, 2018 3 / 36
5. Client-side Machine Learning Automation
Problems of client-side solutions
Complexity: Lots of details outside the problem domain
Extensibility: BigMLer hides complexity at the cost of flexibility
• We need to explicitly control the resource management flows and cope
with errors
• Alternatively, we use assistants that do it for us, but for a limited subset
of workflows
6. Client-side Machine Learning Automation
Problems of client-side solutions
Scalability: Client-side workflows are hard to optimize
Reuse: No inter-language compatibility
• We need to deal with the number of parallel tasks and the available shared
resources
• The same workflows need to be reprogrammed in many languages
Not enough abstraction: we've managed to abstract the logic of the ML
algorithms, but not the workflow logic
10. Server-side Machine Learning Automation
Solution (complexity, extensibility): Domain-specific languages
abstracting plus naming and full language flexibility
11. WhizzML in a Nutshell
• Domain-specific language for ML workflow automation
  • High-level problem and solution specification
• Framework for scalable, remote execution of ML workflows
  • Sophisticated server-side optimization
  • Out-of-the-box scalability
  • Client-server brittleness removed
• Infrastructure for creating and sharing ML scripts and libraries
12. WhizzML REST Resources
Library: Reusable building block. A collection of WhizzML definitions that can be imported by other libraries or scripts.
Script: Executable code that describes an actual workflow.
• Imports: List of libraries with code used by the script.
• Inputs: List of input values that parameterize the workflow.
• Outputs: List of values computed by the script and returned to the user.
Execution: Given a script and a complete set of inputs, the workflow can be executed and its outputs generated.
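The "complete set of inputs" requirement can be illustrated with a small client-side check. This is only a sketch under assumptions: the helper `missing_inputs` is hypothetical (it is not part of any BigML binding), the dataset id is a placeholder, and the sketch ignores any default values an input declaration might carry.

```python
def missing_inputs(declared, provided):
    """Return the names declared by the script that have no provided value."""
    return [d["name"] for d in declared if d["name"] not in provided]


# Input declarations in the same shape as the script metadata shown in this deck
declared = [
    {"name": "input-dataset", "type": "dataset-id"},
    {"name": "field", "type": "string"},
]

# Placeholder values for illustration only
provided = {"input-dataset": "dataset/000000000000000000000000"}

missing = missing_inputs(declared, provided)
print(missing)
```

An empty result would mean every declared input has a value, so the execution could be requested.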
13. Use the REPL
Defining global variables
(define text "Hello BigMLers")
Defining local variables
(let (local-text "Hello BigMLers")
  (log-info local-text))
Defining procedures
(define (print-hello name)
  (log-info "Hello " name))
;; use it!
(print-hello "BigMLers")
Every expression returns a value, and variables are immutable.
14. How to create WhizzML Scripts/Libraries
• GitHub
• Script editor
• Gallery
• Other scripts
• Scriptify
17. Abstraction at a higher level
18. Scripts in WhizzML: Usable by any binding
from bigml.api import BigML
api = BigML()
# choose workflow
script = 'script/567b4b5be3f2a123a690ff56'
# define parameters
inputs = {'source': 'source/5643d345f43a234ff2310a3e'}
# execute
api.ok(api.create_execution(script, inputs))
19. Scripts in WhizzML: Trivial parallelization
21. What else do we need?
The standard functions
• Numeric and relational operators (+, *, <, =, ...)
• Mathematical functions (cos, sinh, floor ...)
• Strings and regular expressions (str, matches?, replace, ...)
• Flatline generation
• Collections: list traversal, sorting, map manipulation
• BigML resources manipulation
  Creation: create-source, create-and-wait-dataset, etc.
  Retrieval: fetch, list-anomalies, etc.
  Update: update
  Deletion: delete
• Machine Learning Algorithms (SMACDown, Boosting, etc.)
22. Outline
1 Server-side workflows: WhizzML
2 Example Workflow: Model or Ensemble?
3 Closing the cycle: WhizzML and Feature engineering
23. Model or Ensemble?
• Split a dataset into training and test parts
• Create a model and an ensemble with the training dataset
• Evaluate both with the test dataset
• Choose the one with better evaluation (f-measure)
https://github.com/whizzml/examples/tree/master/model-or-ensemble
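The final selection step can be sketched in plain Python. This is an illustration under assumptions, not BigML's implementation: the helpers `f_measure` and `pick_better` and the confusion counts are hypothetical, and the sketch computes F1 for a single positive class rather than reading an averaged value from an evaluation resource.

```python
def f_measure(tp, fp, fn):
    """F1 score from true positives, false positives and false negatives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def pick_better(model_counts, ensemble_counts):
    """Return "model" only when its F1 is strictly higher, else "ensemble"."""
    m_f = f_measure(*model_counts)
    e_f = f_measure(*ensemble_counts)
    return "model" if m_f > e_f else "ensemble"


# Hypothetical (tp, fp, fn) counts measured on the test dataset
choice = pick_better((80, 20, 20), (90, 10, 10))
print(choice)
```

On a tie the sketch returns the ensemble, which mirrors a strict "greater than" comparison between the two scores.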
24. Model or Ensemble?
;; Function encapsulating the full workflow
(define (model-or-ensemble src-id)
  (let (ds-id (create-dataset src-id)
        [train-id test-id] (create-random-dataset-split ds-id 0.8)
        m-id (create-model train-id)
        e-id (create-ensemble train-id {"number_of_models" 15})
        m-f (f-measure (create-evaluation m-id test-id))
        e-f (f-measure (create-evaluation e-id test-id)))
    (if (> m-f e-f) m-id e-id)))
We only need one new function: f-measure, which extracts the f-measure from
an evaluation
25. Model or Ensemble?
;; Function to extract the f-measure from an
;; evaluation, given its id.
(define (f-measure ev-id)
  (let (ev-id (wait ev-id) ;; because fetch doesn't wait
        evaluation (fetch ev-id))
    (evaluation ["result" "model" "average_f_measure"])))
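The path lookup in the last line of this function can be mirrored in Python. A minimal sketch, assuming the fetched evaluation is available as nested dictionaries: `get_in` is a hypothetical helper, and the evaluation shown is a heavily trimmed, made-up example that keeps only the keys named on the slide.

```python
def get_in(data, path, default=None):
    """Follow a list of keys through nested dicts; return default if one is missing."""
    current = data
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Made-up, trimmed evaluation resource for illustration
evaluation = {"result": {"model": {"average_f_measure": 0.87}}}

f = get_in(evaluation, ["result", "model", "average_f_measure"])
print(f)
```

Returning a default instead of raising keeps the sketch usable on evaluations that lack the metric, for example when the evaluation has not finished.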
26. Model or Ensemble?
;; Function encapsulating the full workflow
(define (model-or-ensemble src-id)
  (let (ds-id (create-dataset src-id)
        [train-id test-id] (create-random-dataset-split ds-id 0.8)
        m-id (create-model train-id)
        e-id (create-ensemble train-id {"number_of_models" 15})
        m-f (f-measure (create-evaluation m-id test-id))
        e-f (f-measure (create-evaluation e-id test-id)))
    (if (> m-f e-f) m-id e-id)))
;; Compute the result of the script execution
;; - Inputs: [{"name": "input-source-id", "type": "source-id"}]
;; - Outputs: [{"name": "result", "type": "resource-id"}]
(define result (model-or-ensemble input-source-id))
27. Outline
1 Server-side workflows: WhizzML
2 Example Workflow: Model or Ensemble?
3 Closing the cycle: WhizzML and Feature engineering
28. Data Transformations
Feature engineering is an unavoidable part of any Machine Learning DSL.
WhizzML covers it through a transformation language: Flatline.
29. Transforming item counts to features
basket           milk  eggs  flour  salt  chocolate  caviar
milk,eggs        Y     Y     N      N     N          N
milk,flour       Y     N     Y      N     N          N
milk,flour,eggs  Y     Y     Y      N     N          N
chocolate        N     N     N      N     Y          N
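The transformation in this table can be sketched in plain Python. This is only an illustration of the target output, not how BigML computes it: the helper `basket_to_features` is hypothetical, and it assumes items are separated by commas as in the table.

```python
def basket_to_features(basket, items, yes="Y", no="N"):
    """Map a comma-separated basket to one yes/no value per known item."""
    present = {item.strip() for item in basket.split(",")}
    return {item: (yes if item in present else no) for item in items}


items = ["milk", "eggs", "flour", "salt", "chocolate", "caviar"]
baskets = ["milk,eggs", "milk,flour", "milk,flour,eggs", "chocolate"]

rows = [basket_to_features(b, items) for b in baskets]
for basket, row in zip(baskets, rows):
    print(basket, row)
```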
30. Adding a new feature with Flatline
We need to...
Create a new dataset by adding a new field whose binary content depends on
the value of the basket field.
The WhizzML expression will look like
(create-dataset ds-id
                {"new_fields" [{"name" new-field-name
                                "field" new-field-value}]})
where the field value is computed using a Flatline expression
(if (contains-items? "basket" "milk") "Y" "N")
31. Item counts to features with Flatline
One new field per category
(if (contains-items? "basket" "milk") "Y" "N")
(if (contains-items? "basket" "eggs") "Y" "N")
(if (contains-items? "basket" "flour") "Y" "N")
(if (contains-items? "basket" "salt") "Y" "N")
(if (contains-items? "basket" "chocolate") "Y" "N")
(if (contains-items? "basket" "caviar") "Y" "N")
Parameterized code generation
Field name
Item values
Y/N category names
34. Flatline code generation with WhizzML
The WhizzML code should generate one string per category
"(if (contains-items? \"basket\" \"milk\") \"Y\" \"N\")"
Let’s extract the parameters in the expression
(let (field "basket"
      item "milk"
      yes "Y"
      no "N")
  (flatline "(if (contains-items? {{field}} {{item}})"
            "{{yes}}"
            "{{no}})"))
Finally, let’s create a procedure
(define (field-flatline field item yes no)
  (flatline "(if (contains-items? {{field}} {{item}})"
            "{{yes}}"
            "{{no}})"))
35. Flatline code generation with WhizzML
(define (field-flatline field item yes no)
  (flatline "(if (contains-items? {{field}} {{item}})"
            "{{yes}}"
            "{{no}})"))
(define (item-fields field items yes no)
  (for (item items)
    {"field" (field-flatline field item yes no)}))
(define (dataset-item-fields ds-id field)
  (let (ds (fetch ds-id)
        item-dist (ds ["fields" field "summary" "items"])
        items (map head item-dist))
    (item-fields field items "Y" "N")))
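The same code-generation idea can be sketched from a client in Python, building one Flatline string per item. This is a sketch under assumptions: `field_flatline` and `item_fields` here are hypothetical Python counterparts of the procedures above, and the quoting is done explicitly with `json.dumps`, which is a choice of this sketch rather than a description of how WhizzML's `flatline` form interpolates values.

```python
import json


def field_flatline(field, item, yes="Y", no="N"):
    """Build one Flatline expression as a string, quoting each parameter."""
    q = json.dumps  # wraps a Python string in double quotes and escapes it
    return f"(if (contains-items? {q(field)} {q(item)}) {q(yes)} {q(no)})"


def item_fields(field, items, yes="Y", no="N"):
    """One new-field description per item, each carrying its Flatline code."""
    return [{"field": field_flatline(field, item, yes, no)} for item in items]


new_fields = item_fields("basket", ["milk", "eggs"])
for spec in new_fields:
    print(spec["field"])
```

Generating the expressions from a list keeps the field name, the item values and the Y/N labels as parameters instead of repeating the expression by hand for every category.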
36. Flatline code generation with WhizzML
(define output-dataset
  (let (fs {"new_fields" (dataset-item-fields input-dataset field)})
    (create-dataset input-dataset fs)))
{"inputs": [{"name": "input-dataset",
             "type": "dataset-id",
             "description": "The input dataset"},
            {"name": "field",
             "type": "string",
             "description": "Id of the items field"}],
 "outputs": [{"name": "output-dataset",
              "type": "dataset-id",
              "description": "The id of the generated dataset"}]}
37. More information
Resources
• Home: https://bigml.com/whizzml
• Documentation: https://bigml.com/whizzml#documentation
• Examples: https://github.com/whizzml/examples