The Briefing Room with Dr. Robin Bloor, Trifacta and Zoomdata
Live Webcast March 10, 2015
Watch the Archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=dd9fed3c7c476ae3a0f881ae6b53dcc5
Square pegs and round holes don't get along, which is one reason why traditional data management approaches simply won't work for Big Data. The variety and velocity of data types flying at us today require a new strategy for identifying, streamlining and utilizing information assets and processes. Decades-old technology won’t cut it – a combination of new tools and techniques must be used to enable effective discovery of insights in a timely fashion.
Register for this episode of The Briefing Room to hear veteran Analyst Dr. Robin Bloor explain why today's data landscape calls for a much different data management approach. He'll be briefed by Trifacta and Zoomdata, who will show how their technologies use a range of functionality – including machine learning – to help companies "wrangle" their data. They'll also demonstrate the optimal step-by-step process of working with new data types.
Visit InsideAnalysis.com for more information.
3. Twitter Tag: #briefr The Briefing Room
Welcome
Host: Eric Kavanagh
eric.kavanagh@bloorgroup.com
@eric_kavanagh
4. Mission
Reveal the essential characteristics of enterprise software, good and bad
Provide a forum for detailed analysis of today's innovative technologies
Give vendors a chance to explain their product to savvy analysts
Allow audience members to pose serious questions... and get answers!
5. Topics
March: BI/ANALYTICS
April: BIG DATA
May: CLOUD
6. Should I Bring My Tools?
• Hammers aren’t good for plumbing!
• Big Data requires a new set of tools
• Preparing and Exploring are very different
• Don’t throw out your old tool box!
7. Analyst: Robin Bloor
Robin Bloor is Chief Analyst at The Bloor Group
robin.bloor@bloorgroup.com
@robinbloor
8. Trifacta and Zoomdata
Trifacta offers a platform for data transformation and preparation
The interface is rich in visualization and provides previews and recommendations
The platform also includes a learning layer which employs machine learning algorithms to facilitate automation and self-learning
Zoomdata is a Big Data exploration, visualization and analytics platform
The platform offers a wide range of analytics and BI tools, such as dashboards, stream processing and IoT analytics
Its pre-built connectors allow the Zoomdata server to connect directly to data sources
9. Guests:
Russ Cosentino is Vice President of Marketing & Business
Development at Zoomdata. Throughout his career he has focused on
developing solutions that leverage technology to solve business
problems. His experience includes application development for
mission critical systems for the DoD, automated recruitment
programs for the intelligence community and the application of text
analytics for commercial VOC programs.
Dr. Joe Hellerstein is Trifacta’s Chief Strategy Officer and a
Professor of Computer Science at Berkeley. His career in research
and industry has focused on data-centric systems and the way they
drive computing. In 2010, Fortune Magazine included him in their
list of 50 smartest people in technology, and MIT Technology Review
magazine included his Bloom language for cloud computing on their
TR10 list of the 10 technologies “most likely to change our world.”
11. DATA WRANGLING AND THE ART OF BIG DATA DISCOVERY
Dr. Joe Hellerstein
Professor, EECS Computer Science Division, UC Berkeley
Co-founder & Chief Strategy Officer, Trifacta
Russ Cosentino
Vice President, Marketing & Business Development, Zoomdata
12. Founded in 2012, from Berkeley/Stanford research roots
dp = data to the people
“facilitating interactions between people and data throughout the analytic lifecycle”
Stanford Visualization Group’s “Data Wrangler”
Elegant solutions for a messy world:
The 80% problem of preparing data for exploratory analytics
13. TRADITIONAL APPROACH TO DATA MANAGEMENT
[Diagram labels: Data Sources; ETL; Structured Ingest; Storage #1, 2, N; ELT; Store & Process; EDW; Archive; Access Data; Analyze Data; Search; Statistical; Machine Learning; SQL; Serve; Optimize; Implement; Custom Application; Point Solution; Enterprise Data Warehouse]
14. MANY PEOPLE INVOLVED IN THE PROCESS
DATA ARCHITECT
DATABASE ADMINISTRATOR
SYSTEM ADMINISTRATOR
BUSINESS ANALYST
BI ADMINISTRATOR
SYSTEM ADMINISTRATOR
15. IT COULD BE SIMPLER
DATABASE ADMINISTRATOR
BUSINESS ANALYST
16. MODERN DATA AND VISUALIZATION ENVIRONMENT
[Diagram labels: Data Sources; Structured Ingest; Unstructured Ingest; Store & Process; Data Preparation; Serve; Visualization]
17. REAL BENEFITS OF A SELF-SERVICE APPROACH
+15% Cash Increase
+26% Pipeline Growth
-67% Cost Reduction
Real-Time
23. Perceptions & Questions
Analyst: Robin Bloor
24. I am not a number!
To Round-Up & Wrangle
Robin Bloor, PhD
25. The Flow of Data
The movement of data from ACQUISITION through PREPARATION to ANALYSIS is not necessarily simple…
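As a rough illustration of the three stages named on this slide, the sketch below pushes a tiny, deliberately messy data set through acquisition, preparation and analysis. The sample data and the function names (`acquire`, `prepare`, `analyze`) are assumptions made for this example; they are not taken from Trifacta, Zoomdata or the webcast.

```python
import csv
import io
from statistics import mean

# Hypothetical raw feed: inconsistent casing, stray whitespace, a missing value.
RAW = """region,amount
 East ,100
west,250
EAST,
West,50
"""

def acquire(text):
    """ACQUISITION: read raw rows exactly as they arrive."""
    return list(csv.DictReader(io.StringIO(text)))

def prepare(rows):
    """PREPARATION: normalize region names and drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = (row["amount"] or "").strip()
        if not amount:
            continue  # a row without a value cannot be analyzed
        cleaned.append({"region": row["region"].strip().lower(),
                        "amount": float(amount)})
    return cleaned

def analyze(rows):
    """ANALYSIS: average amount per region."""
    by_region = {}
    for row in rows:
        by_region.setdefault(row["region"], []).append(row["amount"])
    return {region: mean(values) for region, values in by_region.items()}

result = analyze(prepare(acquire(RAW)))
print(result)
```

Even in this toy case most of the code sits in the preparation step, which is the part the slide warns is not necessarily simple.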
26. The General Picture
[Diagram labels: Data Sources; Data Streams; ETL; Staging Area (Hadoop); ROUND-UP; WRANGLING; MetaData Discovery; MetaData Mgt; Data Cleansing; Data Lineage; MDM; Service Mgt; Life Cycle Mgt; Analytics; Data Warehouse or other location]
27. Immediate Analytics & the Rest
IMMEDIATE ANALYTICS
• Metadata discovery
• Metadata management
• Data cleansing
• Data lineage
DOWNSTREAM
• MDM
• Service mgt
• Lifecycle mgt
• ETL
[Diagram: repeats the labels of the General Picture on slide 26]
28. The Analytics Business Process
• The main point to note is that it is iterative
• It has morphed, because of:
  - Data availability
  - Parallel technology
  - Scalable software
  - Open source tools
  - Machine learning
[Diagram labels: Data Access; Data Prep; Model; Analyze; Deploy; Execute]
29. Analytical Latencies
1. Data access
2. Data preparation
3. Model development
4. Execution
5. Implementation
6. Model audit & update
This is where the rubber meets the road:
Speed = Value
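One simple way to see where these latencies accumulate is to time each stage separately. The sketch below is only an illustration: the helper `timed_pipeline` and the three stand-in stages are hypothetical, and real data access, preparation and execution steps would replace the lambdas.

```python
import time

def timed_pipeline(stages, data):
    """Run stages in order and record how long each one takes, in seconds."""
    latencies = {}
    for name, func in stages:
        start = time.perf_counter()
        data = func(data)
        latencies[name] = time.perf_counter() - start
    return data, latencies

# Hypothetical stand-ins for three of the six latencies listed above.
stages = [
    ("data access", lambda _: list(range(100_000))),
    ("data preparation", lambda rows: [r for r in rows if r % 2 == 0]),
    ("execution", lambda rows: sum(rows) / len(rows)),
]

output, latencies = timed_pipeline(stages, None)
slowest = max(latencies, key=latencies.get)
print(f"result={output}, slowest stage: {slowest}")
```

The per-stage numbers will vary from run to run; the point is only that measuring each stage shows which latency is worth attacking first.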
30. The Impending Reality
Technology is speeding up analytics by TWO ORDERS OF MAGNITUDE (on the IT side)
This is changing analytics
31.
• Is your capability only relevant to analytics or does it have broader areas of application?
• Technically, what makes it fast?
• Please comment on analytical workloads:
  - What do you see as the natural IT bottlenecks?
  - What do you see as the natural business bottlenecks?
• Do we want business analysts to become ersatz data scientists?
32.
• With respect to scale, what is your largest implementation by data volume, and what was the industry sector/problem space?
• Who do you partner with?
• What do you see as the largest barrier to adoption of Trifacta?
34. Upcoming Topics
www.insideanalysis.com
March: BI/ANALYTICS
April: BIG DATA
May: CLOUD
35. THANK YOU for your ATTENTION!
Some images provided courtesy of
Wikimedia Commons and Wikipedia, including:
"Multiple pliers" by Ed Stevenhagen from nl. Licensed under CC BY-SA 3.0 via Wikimedia Commons -
http://commons.wikimedia.org/wiki/File:Multiple_pliers.jpg#mediaviewer/File:Multiple_pliers.jpg