At H2O.ai we see a world where all software will incorporate AI, and we’re focused on bringing AI to business through software. H2O.ai is the maker of H2O, the leading open source machine learning and deep learning platform for smarter applications and data products. H2O operationalizes data science by developing and deploying algorithms and models for R, Python and the Sparkling Water API for Spark.
In this webinar, you will learn about the scalable H2O core platform and the distributed algorithms it supports. H2O integrates seamlessly with the R and Python environments. We will show you how to leverage the power of H2O algorithms in R, Python and the H2O Flow interface. Come with an open mind and some high-level knowledge of machine learning, and you will take away a stream of knowledge for your next ML/DL project.
Amy Wang is a math hacker at H2O, as well as the Sales Engineering Lead. She graduated from Hunter College in NYC with a Master’s in Applied Mathematics and Statistics with a heavy concentration on numerical analysis and financial mathematics.
Her interest in applicable math eventually led her to big data and finding the appropriate mediums for data analysis.
Desmond is a Senior Director of Marketing at H2O.ai. In his 15+ years in enterprise software, Desmond has worked on distributed systems, storage, virtualization, MPP databases, streaming analytics platforms, and most recently machine learning. He holds a Master’s degree in Computer Science from Stanford University and an MBA from the Haas School of Business at UC Berkeley.
3. Agenda for H2O Introduction Webinar
▪ Company Introduction (5 mins)
▪ H2O Introduction and Demo (35 mins)
– Installation of H2O
– Flight delay prediction use case
• Use case description
• Data set description
• Data munging
• Model creation
▪ Q&A (10 mins)
4. H2O AI Platform
▪ In-Memory, Distributed Machine Learning with Visual Intelligence
▪ H2O AI in Spark with Data Prep and ML Pipelines
▪ Operationalize Model Building and Deployment Governance
▪ Deep Water: Best-of-breed GPU Deep Learning with easy API and AutoML (TensorFlow, MXNet or Caffe and H2O)
▪ AI For Business Transformation
▪ Insights on Text, Images, Transactions, Speech
▪ Best Machine Learning Algorithms on Spark
▪ Platform to Build and Scale Data Products
▪ Dual licensing (AGPL and Commercial)
H2O is the #1 Platform for Open Source AI
5. Open Source Drives Community Adoption
Companies Using H2O.ai
▪ 2014: 400
▪ 2015: 3810
▪ 2016: 6427
▪ 2017: 9173
H2O.ai Users
▪ 2014: 1000
▪ 2015: 38257
▪ 2016: 54163
▪ 2017: 83108
* Data from July of every year, except for 2017 when data from Feb 21st are used.
7. H2O.ai Strongly Positioned in Key Analyst Reports and Press
▪ H2O.ai is a Visionary in the Gartner Magic Quadrant for Data Science Platforms
– “Overall customer satisfaction is very high.”
– “H2O is especially suited to IoT edge and device scenarios.”
– “H2O had the highest reference customer analytics support score of all the vendors.”
▪ H2O.ai is a Strong Performer in the Forrester Predictive Analytics & Machine Learning
– “H2O.ai has significant adoption by large enterprises such as Macy’s, Comcast, and Capital One.”
– “H2O.ai is best known for developing open source, cluster-distributed ML algorithms at a time (2011) when big data demanded them, but no one else had them.”
▪ H2O.ai is among the Top 10 Hot Artificial Intelligence (AI) Technologies on Forbes
– H2O.ai is named alongside Nvidia, Google, IBM, Intel, Microsoft, SAS, et al. in the Top 10 Hot Artificial Intelligence (AI) Technologies on Forbes, contributed by Gil Press
8. H2O Use Cases – Videos and Talks
▪ Progressive: Auto Insurance, UBI Telematics
– Pawan Divarkarla, Chief Data Officer
– “H2O is an enabler in how people are thinking about data.”
▪ Zurich: Commercial Insurance, Risk Analytics
– Conor Jensen, Analytics Director
– “Advanced analytics was one of the key investments we decided to make.”
▪ Capital One: Financial Services, Customer Insights
– Brendan Herger, Data Scientist
– “H2O is the best solution to iterate very quickly on large datasets and produce meaningful models.”
▪ Nielsen Catalina: Digital Marketing, Consumer Behavior
– Satya Satyamoorthy, Director, Software Dev
– “I am a big fan of open source. H2O is the best fit in terms of cost as well as ease of use and scalability and usability.”
11. Supervised Learning
H2O Algorithms
Statistical Analysis
• Generalized Linear Models: Binomial, Gaussian, Gamma, Poisson, and Tweedie
• Naive Bayes: Binary Text Classification
Ensembles
• Distributed Random Forest: Classification or Regression Models
• Gradient Boosting Machine: Ensembles of shallow decision trees with increasingly refined approximations
Deep Neural Networks
• Deep Learning: Create multi-layer feed forward neural networks starting with an input layer followed by multiple layers of nonlinear transformations
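The Gradient Boosting Machine bullet can be made concrete with a small sketch. This is not H2O's implementation (which is distributed and far more capable); it is a plain-Python illustration, assuming a single numeric feature, squared-error loss, and one-split trees ("stumps"). The function names `fit_stump`, `fit_gbm`, `predict` and `training_mse`, and the toy data, are made up for this example.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Find the single split on x that best fits the residuals (least squares)."""
    best = None
    # Skip the largest value so the right-hand side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_val, right_val = mean(left), mean(right)
        sse = (sum((r - left_val) ** 2 for r in left)
               + sum((r - right_val) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_val, right_val)
    return best[1:]


def fit_gbm(xs, ys, n_trees=50, learning_rate=0.3):
    """Boosting loop: each new stump is fitted to the current residuals."""
    base = mean(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_val, right_val = fit_stump(xs, residuals)
        stumps.append((threshold, left_val, right_val))
        preds = [p + learning_rate * (left_val if x <= threshold else right_val)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + sum(learning_rate * (lv if x <= t else rv) for t, lv, rv in stumps)


def training_mse(model, xs, ys):
    return mean((y - predict(model, x)) ** 2 for x, y in zip(xs, ys))


# Toy data, invented for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 5.2, 5.1, 4.9]
for n in (0, 1, 10, 50):
    print(n, "trees -> training MSE", training_mse(fit_gbm(xs, ys, n_trees=n), xs, ys))
```

Each added stump corrects part of what the previous ones got wrong, which is the sense in which the approximations become "increasingly refined". The learning rate scales down each correction so that no single tree dominates.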
12. Unsupervised Learning
Clustering
• K-means: Partition observations into k clusters of the same spatial size. Categorical features are one-hot encoded.
• Archetypes [GLRM]: Partition observations into k archetypes.
Dimensionality Reduction
• Principal Component Analysis: Linearly transforms correlated variables into uncorrelated components
• Generalized Low Rank Model: Approximates data set as a product of two low dimensional factors. Extends PCA to handle sparse data, categorical data, and adds regularization.
Anomaly Detection
• Autoencoders [Deep Learning]: Create multi-layer feed forward neural networks starting with an input layer followed by multiple layers of nonlinear transformations
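To illustrate the K-means bullet, including the one-hot encoding of categorical features, here is a minimal plain-Python sketch. It is not how H2O implements K-means (H2O's version is distributed and differs in details such as initialization). The helper names `one_hot_encode`, `assign` and `kmeans`, the naive "first k points" initialization, and the toy rows are all assumptions made for this example.

```python
def one_hot_encode(rows, cat_index):
    """Replace the categorical column at cat_index with 0/1 indicator columns."""
    levels = sorted({row[cat_index] for row in rows})
    encoded = []
    for row in rows:
        numeric = [float(v) for i, v in enumerate(row) if i != cat_index]
        indicators = [1.0 if row[cat_index] == level else 0.0 for level in levels]
        encoded.append(numeric + indicators)
    return encoded, levels


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: squared_distance(p, centers[c]))
            for p in points]


def kmeans(points, k, n_iter=10):
    # Naive initialization (an assumption of this sketch): use the first k points.
    centers = [list(p) for p in points[:k]]
    for _ in range(n_iter):
        labels = assign(points, centers)
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign(points, centers), centers


# Toy rows (numeric value, categorical color), invented for illustration.
rows = [(1.0, "red"), (1.2, "red"), (0.9, "red"),
        (8.0, "blue"), (8.3, "blue"), (7.9, "blue")]
points, levels = one_hot_encode(rows, cat_index=1)
labels, centers = kmeans(points, k=2)
print(levels)
print(labels)
```

Once the categorical column is expanded into indicator columns, every observation is a purely numeric vector, so the usual Euclidean distance used by K-means applies unchanged.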
13. Accuracy with Speed and Scale
▪ Data sources: HDFS, S3, SQL, NoSQL
▪ Fast Modeling Engine: In-Memory Map Reduce/Fork Join, Columnar Compression
▪ Algorithms: Deep Learning; PCA, GLM, Cox; Random Forest / GBM; Ensembles
▪ Tasks: Classification, Regression, Feature Engineering, Matrix Factorization, Clustering, Munging
▪ Scoring: Streaming, Nano Fast Java Scoring Engines
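The "In-Memory Map Reduce/Fork Join" label on this slide refers to a general pattern: split a column into chunks, compute a partial result per chunk in parallel, then combine the partial results. The sketch below shows that pattern for a column mean using only the Python standard library. It is an illustration of the idea, not H2O's engine; the function names and the chunk size are assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def chunked(values, chunk_size):
    """Split a column into fixed-size chunks (the last one may be shorter)."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def map_chunk(chunk):
    """Map step: reduce one chunk to a small partial result (sum, count)."""
    return (sum(chunk), len(chunk))


def combine(a, b):
    """Reduce step: merge two partial results."""
    return (a[0] + b[0], a[1] + b[1])


def chunked_mean(values, chunk_size=4):
    """Mean of a non-empty column, computed chunk by chunk."""
    chunks = chunked(values, chunk_size)
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_chunk, chunks))
    total, count = reduce(combine, partials, (0.0, 0))
    return total / count


print(chunked_mean(list(range(1, 11))))
```

Because each chunk is summarized independently and the combine step is associative, the chunks could just as well live on different machines, which is what makes the pattern suitable for distributed data.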
14. Reading Data into H2O with R
STEP 1
R user
h2o_df = h2o.importFile("../data/allyears2k.csv")
15. Reading Data from HDFS into H2O with R
STEP 2
2.1 R function call: h2o.importFile()
2.2 HTTP REST API request to H2O has HDFS path
2.3 Initiate distributed ingest (H2O Cluster)
2.4 Request data from HDFS (data.csv)
16. Reading Data from HDFS into H2O with R
STEP 3
3.1 HDFS provides data (data.csv)
3.2 Distributed H2O Frame in DKV (H2O Cluster)
3.3 Return pointer to data in REST API JSON Response
3.4 h2o_df object created in R (Cluster IP, Cluster Port, Pointer to Data)
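The key point of steps 2 and 3 is that the data never travels to the client: the client sends a path, the cluster ingests the data into its key-value store, and the client receives only a small handle (cluster IP, cluster port, pointer to data). The sketch below simulates that flow in plain Python. It does not call the real H2O REST API; `FakeCluster`, `FrameHandle`, `import_file`, the JSON field names, and the example path are all hypothetical stand-ins.

```python
import json
from dataclasses import dataclass


class FakeCluster:
    """Hypothetical stand-in for an H2O cluster; not the real H2O server."""

    def __init__(self, ip, port, storage):
        self.ip = ip
        self.port = port
        self.storage = storage  # path -> rows; plays the role of HDFS
        self.dkv = {}           # key -> rows; plays the role of the DKV

    def handle_import(self, request_json):
        # The request carries only the path to the data, not the data itself.
        path = json.loads(request_json)["path"]
        # The cluster requests the data from storage and ingests it.
        rows = self.storage[path]
        key = f"frame_{len(self.dkv)}"
        self.dkv[key] = rows
        # The JSON response carries only a pointer (key) to the stored frame.
        return json.dumps({"key": key})


@dataclass(frozen=True)
class FrameHandle:
    """Client-side object: where the cluster is and which frame to ask for."""
    cluster_ip: str
    cluster_port: int
    key: str


def import_file(cluster, path):
    """Client-side function call that triggers the cluster-side ingest."""
    response = cluster.handle_import(json.dumps({"path": path}))
    key = json.loads(response)["key"]
    return FrameHandle(cluster.ip, cluster.port, key)


# Hypothetical path and data, invented for illustration.
cluster = FakeCluster("127.0.0.1", 54321, {"hdfs://example/data.csv": [[1, 2], [3, 4]]})
h2o_df = import_file(cluster, "hdfs://example/data.csv")
print(h2o_df)
```

Keeping only a handle on the client side is what lets a laptop-sized R or Python session drive computations on data that is far larger than the client's own memory.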