WhizzML is a domain-specific language for automating Machine Learning workflows, implementing high-level Machine Learning algorithms, and easily sharing them with others. WhizzML offers out-of-the-box scalability, abstracts away the complexity of the underlying infrastructure, and helps analysts, developers, and scientists reduce the burden of repetitive and time-consuming analytics tasks.
2. Outline
1 What is WhizzML?
2 WhizzML Server-side Resources
3 WhizzML Language Basics
4 Standard Library Overview
5 Tutorial Walkthrough: Model or Ensemble?
The BigML Team Basic WhizzML Workflows May 2016 2 / 24
4. WhizzML in a Nutshell
• Domain-specific language for ML workflow automation
  ◦ High-level problem and solution specification
• Framework for scalable, remote execution of ML workflows
  ◦ Sophisticated server-side optimization
  ◦ Out-of-the-box scalability
  ◦ Client-server brittleness removed
• Infrastructure for creating and sharing ML scripts and libraries
6. WhizzML REST Resources
Library: Reusable building block: a collection of WhizzML definitions that can be imported by other libraries or scripts.
Script: Executable code that describes an actual workflow.
  • Imports: List of libraries with code used by the script.
  • Inputs: List of input values that parameterize the workflow.
  • Outputs: List of values computed by the script and returned to the user.
Execution: Given a script and a complete set of inputs, the workflow can be executed and its outputs generated.
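To make the relation between a script's code and its inputs and outputs concrete, here is a minimal sketch of a script source. The input and output declarations are written as comments in the same notation this deck uses for its final example; the names `dataset-id`, `size`, and `ensemble-id`, and the `"number"` type name, are assumptions made for illustration.

```whizzml
;; Hypothetical script source.
;; Assumed script metadata:
;; - Inputs:  [{"name": "dataset-id", "type": "dataset-id"},
;;             {"name": "size", "type": "number"}]
;; - Outputs: [{"name": "ensemble-id", "type": "resource-id"}]

;; The inputs appear in the code as free variables; an execution
;; supplies their values. The output is a top-level definition
;; whose value is returned to the user.
(define ensemble-id
  (create-and-wait-ensemble {"dataset" dataset-id
                             "number_of_models" size}))
```

An execution of this script would need to provide values for both `dataset-id` and `size` before the workflow could run.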
8. Basic Syntax
Atomic constants
"a string value"
23, -10, -1.23E11, 1.42342
true, false
Fully parenthesized prefix notation
(list-sources) ;; Function call without arguments
(log-info "Hello World!")
(* 2 (+ 2 3)) ;; Evaluates to 2 * (2 + 3)
(atan (tan 3)) ;; Nested function calls
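A few more expressions in the same style may help when reading prefix notation. This is only a sketch, and it assumes that `+` accepts more than two arguments, as arithmetic operators usually do in Lisp-style languages.

```whizzml
;; Average of three numbers: (1 + 2 + 6) / 3
(/ (+ 1 2 6) 3)

;; Relational operators are ordinary prefix calls that yield true or false
(< 1 2)

;; Inner calls are evaluated first; their results become arguments
;; of the enclosing call
(log-info "2 * (2 + 3) = " (* 2 (+ 2 3)))
```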
12. Functions
Defining a function
(define (function-name arg1 arg2 ...)
  body)
Examples
(define (add-numbers x y)
  (+ x y))
(define (create-model-and-ensemble dataset-id)
  (create-model {"dataset" dataset-id})
  (create-ensemble {"dataset" dataset-id
                    "number_of_models" 10}))
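A function body may contain several expressions. As in other Lisp-style languages, the value of the last one is taken as the function's result, so `create-model-and-ensemble` hands back only the ensemble's ID. The following sketch shows one way to return both IDs; the name `create-model-and-ensemble-ids` is hypothetical, and the dataset ID is the sample one used elsewhere in these slides.

```whizzml
;; Hypothetical variant: collect both resource IDs in a list
(define (create-model-and-ensemble-ids dataset-id)
  (list (create-model {"dataset" dataset-id})
        (create-ensemble {"dataset" dataset-id
                          "number_of_models" 10})))

;; Usage: unpack the two IDs by position
(define ids
  (create-model-and-ensemble-ids "dataset/570861ecb85eee0472000016"))
(define model-id (nth ids 0))
(define ensemble-id (nth ids 1))
```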
13. Local variables
Let bindings
(let (name-1 val-1
      name-2 val-2
      ...)
  body)
Example:
(define no-of-models 10)
(let (msg "I am creating "
      id "dataset/570861ecb85eee0472000016")
  ;; here msg, id and no-of-models are bound
  (log-info msg no-of-models)
  (create-ensemble {"dataset" id
                    "number_of_models" no-of-models}))
;;; here msg and id are *not* bound
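Bindings in a `let` are established in order, so a later binding can use the names introduced before it; the full workflow at the end of this deck relies on this. A small sketch, with illustrative names and values:

```whizzml
;; Each binding can refer to the ones before it
(let (rate 0.8
      complement (- 1 rate)                            ;; uses rate
      msg (str "train " rate " / test " complement))   ;; uses both
  (log-info msg)
  ;; the value of the let is the last expression in its body
  complement)
```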
14. Conditionals
if
(if (> x 0)                ;; condition
    "x is positive"        ;; consequent
    "x is not positive")   ;; alternative
when
(when (positive? n)
  (log-info "Creating a few models...")
  (create-lots-of-models n))
15. Conditionals
cond
;; Nested conditionals
(if (> x 3)
    "big"
    (if (< x 1)
        "small"
        "standard"))
;; are better with cond:
(cond (> x 3) "big"
      (< x 1) "small"
      "standard")
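Wrapping the `cond` in a function makes it easy to try on several values. In this sketch, `size-label` is a hypothetical helper name, and `map` is assumed to take the function first and the list second.

```whizzml
;; Hypothetical helper wrapping the cond shown above
(define (size-label x)
  (cond (> x 3) "big"
        (< x 1) "small"
        "standard"))

;; Apply it to a few sample values
(map size-label [0 2 7])   ;; labels: "small", "standard", "big"
```

The clauses are tested in order and the trailing expression acts as the default, which is why 2 falls through to "standard".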
16. Error handling
Signaling errors
(raise {"message" "Division by zero" "code" -10})
Catching errors
(try (/ 42 x)
     (catch e
       (log-warn "I've got an error with message: "
                 (get e "message")
                 " and code "
                 (get e "code"))))
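Signaling and catching can be combined so that a workflow recovers with a fallback value. The sketch below assumes that a `try` form evaluates to the value of its `catch` body when an error is caught; `check-rate`, the error code -20, and the default 0.8 are all hypothetical.

```whizzml
;; Hypothetical helper: signal an error for an invalid sample rate
(define (check-rate rate)
  (if (and (> rate 0) (< rate 1))
      rate
      (raise {"message" (str "Invalid sample rate: " rate)
              "code" -20})))

;; Fall back to a default rate when the check fails
(define rate
  (try (check-rate 1.5)
       (catch e
         (let (msg (get e "message"))
           (log-warn "Using default rate. " msg)
           0.8))))
```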
17. Demo: a simple script
Create a dataset and return its number of rows
(define (make-dataset id name)
  (let (ds-id (create-and-wait-dataset {"source" id
                                        "name" name}))
    (fetch ds-id)))
(define dataset (make-dataset source-id source-name))
(define dataset-id (get dataset "resource"))
(define rows (get dataset "rows"))
https://gist.github.com/whizzmler/917a05cf6c173381116e3cc02da70e42
19. Standard functions
• Numeric and relational operators (+, *, <, =, ...)
• Mathematical functions (cos, sinh, floor, ...)
• Strings and regular expressions (str, matches?, replace, ...)
• Flatline generation
• Collections: list traversal, sorting, map manipulation
• BigML resources manipulation
  ◦ Creation: create-source, create-and-wait-dataset, etc.
  ◦ Retrieval: fetch, list-anomalies, etc.
  ◦ Update: update
  ◦ Deletion: delete
• Machine Learning Algorithms (SMACDown, Boosting, etc.)
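As an illustration of the collection functions listed above, the following sketch traverses a list and manipulates a map. It assumes the standard-library functions `map`, `filter`, `reduce`, `get`, and `assoc` with the argument order shown; the data values are made up.

```whizzml
(define sizes [5 10 15 20])

;; map: apply a function to every element
(define doubled (map (lambda (n) (* 2 n)) sizes))           ;; [10 20 30 40]

;; filter: keep the elements that satisfy a predicate
(define large (filter (lambda (n) (> n 10)) sizes))         ;; [15 20]

;; reduce: fold the list into a single value, starting from 0
(define total (reduce (lambda (acc n) (+ acc n)) 0 sizes))  ;; 50

;; Maps: read with get (optionally with a default), extend with assoc
(define params {"number_of_models" 10})
(get params "number_of_models")          ;; 10
(get params "seed" "no seed given")      ;; the default, since "seed" is absent
(assoc params "seed" "whizzml-example")  ;; a map with both keys
```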
21. Model or Ensemble?
• Split a dataset into training and test parts
• Create a model and an ensemble with the training dataset
• Evaluate both with the test dataset
• Choose the one with the better evaluation (f-measure)
https://github.com/whizzml/examples/tree/master/model-or-ensemble
22. Model or Ensemble?
;; Functions for creating the two dataset parts
;; and the model and ensemble from the training set.
(define (sample-dataset ds-id rate oob)
  (create-and-wait-dataset {"sample_rate" rate
                            "origin_dataset" ds-id
                            "out_of_bag" oob
                            "seed" "whizzml-example"}))
(define (split-dataset ds-id rate)
  (list (sample-dataset ds-id rate false)
        (sample-dataset ds-id rate true)))
(define (make-model ds-id)
  (create-and-wait-model {"dataset" ds-id}))
(define (make-ensemble ds-id size)
  (create-and-wait-ensemble {"dataset" ds-id
                             "number_of_models" size}))
23. Model or Ensemble?
;; Functions for evaluating the model and the ensemble
;; using the test set, and for extracting the f-measure
;; from the evaluation results
(define (evaluate-model model-id ds-id)
  (create-and-wait-evaluation {"model" model-id
                               "dataset" ds-id}))
(define (evaluate-ensemble ensemble-id ds-id)
  (create-and-wait-evaluation {"ensemble" ensemble-id
                               "dataset" ds-id}))
(define (f-measure ev-id)
  (get-in (fetch ev-id) ["result" "model" "average_f_measure"]))
24. Model or Ensemble?
;; Function encapsulating the full workflow
(define (model-or-ensemble src-id)
  (let (ds-id (create-and-wait-dataset {"source" src-id})
        ;; ^ full dataset
        ids (split-dataset ds-id 0.8)     ;; split it 80/20
        train-id (nth ids 0)              ;; the 80% for training
        test-id (nth ids 1)               ;; and 20% for evaluations
        m-id (make-model train-id)        ;; create a model
        e-id (make-ensemble train-id 15)  ;; and an ensemble
        m-f (f-measure (evaluate-model m-id test-id)) ;; evaluate
        e-f (f-measure (evaluate-ensemble e-id test-id)))
    (log-info "model f " m-f " / ensemble f " e-f)
    (if (> m-f e-f) m-id e-id)))
;; Compute the result of the script execution
;; - Inputs: [{"name": "input-source-id", "type": "source-id"}]
;; - Outputs: [{"name": "result", "type": "resource-id"}]
(define result (model-or-ensemble input-source-id))