On Jan 10, 2014, we presented this to a full-house audience, members of the Silicon Valley Machine Learning group, at the Hacker Dojo in Mountain View, CA. The session was enjoyable with a very interactive, well-informed audience.
What's New in Teams Calling, Meetings and Devices March 2024
Data Intelligence for All: Interactive Analytics for Big Data
1. Data Intelligence for All
Visual, Interactive, Predictive Analytics for Big Data
!
Christopher Nguyen, PhD
Co-Founder & CEO
Presented on January 10, 2014
2. Big-Data Compute Engines, Google Apps
Engineering Director, Google Founders’ Award,
HKUST Prof, 2 successful enterprise exits,
Stanford PhD
Deep engineering &
business experience from
Google, Yahoo et al.
PhD’s in DM & ML from
UIUC, Georgia Tech,
Stanford, Berkeley, ...
Hadoop distributed/streaming analytics,Yahoo
Hadoop Eng, UIUC PhD
Machine learning & machine vision, US Army
Research Lab, Johns Hopkins PhD
http://adatao.com
5. 2
0
1
3
BIG DATA + BIG COMPUTE =
OPPORTUNITIES
BIG DA
TA=
PROBL
EMS
automatic customer segmentation
Machine Learning
Predictive Analytics
Natural Language
http://adatao.com
19. for Business Users
Predictive Decision Making
A Beautiful New Way to Create & Share
Visual Narratives of Your Analysis
Perform Ad Hoc Queries in Plain English
Publish Streaming, Interactive Dashboards
Collaborate With Others In Real Time
Query Terabytes in Seconds.
http://adatao.com
20. 001 001
0 1 1 00 1 1 0
1 1 1 01 1 1 0
1 0 0 01 0 0 0
0 0 0 10 0 0 1
0 0 0 10 0 0 1
0 1 1 00 1 1 0
1 1 1 01 1 1 0
for Data Scientists & Engineers
Big Data Mining & Machine Learning
Powerful In-Memory Data Mining & Machine
Learning—Model Terabytes in Seconds
Interactive, Cluster-Scale Data Munging &
Modeling with Native R, R-Studio, Python,
SQL, and Java Front-ends
Real-Time Scoring Directly From Trained Models
Share reproducible, live data analysis documents
Hadoop, Cassandra, RDBMS, Streaming Data
http://adatao.com
27. Algorithm
Run time (sec) for
50GB (800M rows)
Per-Core Throughput
(MB/sec-core)
Per-Machine Throughput
(MB/s)
adatao.lm (ridge)
3.04
130
1,040
adatao.lm
4.05
102
816
adatao.lm.gd
12.2
32
256
adatao.glm.gd
24.5
16
128
adatao.glm
36.1
11
88
adatao.kmeans
335
1.2
9.6
pAnalytics performance on building machine learning models with cluster
Adatao16 (m3.2xlarge) on a 50GB data set of 5 features and 800 million rows.
(Gradient descent algorithms are over 5 iterations)
http://adatao.com
28. Algorithm
Run time (sec) for
1.1 TB dataset (1.6B rows)
Per-Core Throughput
(MB/sec-core)
Per-Machine Throughput
(MB/s)
adatao.lm (ridge)
70.9
130
1,040
adatao.lm
74.9
123
984
adatao.lm.gd
127
72.8
582
adatao.glm.gd
145
63.6
509
pAnalytics performance on building machine learning models with cluster
Adatao40 (m3.2xlarge) on a 1.1 TB data set of 40 features and 1.6 billion rows.
(Gradient descent algorithms are over 5 iterations)
http://adatao.com