SlideShare une entreprise Scribd logo
1  sur  35
Télécharger pour lire hors ligne
Grab some
coffee and
enjoy the
pre-show
banter before
the top of the
hour!
The Briefing Room
Data Wrangling and the Art of Big Data Discovery
Twitter Tag: #briefr The Briefing Room
Welcome
Host:
Eric Kavanagh
eric.kavanagh@bloorgroup.com
@eric_kavanagh
Twitter Tag: #briefr The Briefing Room
  Reveal the essential characteristics of enterprise
software, good and bad
  Provide a forum for detailed analysis of today s innovative
technologies
  Give vendors a chance to explain their product to savvy
analysts
  Allow audience members to pose serious questions... and
get answers!
Mission
Twitter Tag: #briefr The Briefing Room
Topics
March: BI/ANALYTICS
April: BIG DATA
May: CLOUD
Twitter Tag: #briefr The Briefing Room
Should I Bring My Tools?
Ø  Hammers aren’t good for
plumbing!
Ø  Big Data requires a new set
of tools
Ø  Preparing and Exploring are
very different
Ø  Don’t throw out your old
tool box!
Twitter Tag: #briefr The Briefing Room
Analyst: Robin Bloor
Robin Bloor is
Chief Analyst at
The Bloor Group
robin.bloor@bloorgroup.com
@robinbloor
Twitter Tag: #briefr The Briefing Room
Trifacta and Zoomdata
Trifacta offers a platform for
data transformation and
preparation
  The interface is rich in
visualization and provides
previews and
recommendations
  The platform also includes a
learning layer which employs
machine learning algorithms to
facilitate automation and self-
learning
Zoomdata is a Big Data
exploration, visualization and
analytics platform
  The platform offers a wide
range of analytics and BI tools,
such as dashboards, stream
processing and IoT analytics
  Its pre-built connectors allow
the Zoomdata server to
connect directly to data
sources
Twitter Tag: #briefr The Briefing Room
Guests:
Russ Cosentino is Vice President of Marketing & Business
Development at Zoomdata. Throughout his career he has focused on
developing solutions that leverage technology to solve business
problems. His experience includes application development for
mission critical systems for the DoD, automated recruitment
programs for the intelligence community and the application of text
analytics for commercial VOC programs.
Dr. Joe Hellerstein is Trifacta’s Chief Strategy Officer and a
Professor of Computer Science at Berkeley. His career in research
and industry has focused on data-centric systems and the way they
drive computing. In 2010, Fortune Magazine included him in their
list of 50 smartest people in technology, and MIT Technology Review
magazine included his Bloom language for cloud computing on their
TR10 list of the 10 technologies “most likely to change our world.”
Data Wrangling and the Art
of Big Data Discovery
Dr. Joe Hellerstein
Professor, EECS Computer Science Division, UC Berkeley
Co-founder & Chief Strategy Officer, Trifacta
DATA WRANGLING
AND THE ART OF BIG DATA DISCOVERY
Russ Cosentino
Vice President
Marketing & Business Development, Zoomdata
Founded in 2012, from Berkeley/Stanford research roots
dp = data to the people
“facilitating interactions between people and data
throughout the analytic lifecycle”
Stanford Visualization Group’s “Data Wrangler”
Elegant solutions for a messy world:
The 80% problem of preparing data for exploratory analytics
TRADITIONAL APPROACH TO DATA MANAGEMENT
Enterprise	
  Data	
  Warehouse	
  
Implement	
  Data	
  Sources	
  
ETL	
  
Structured	
  
Ingest
Storage	
  #1,	
  2,	
  N	
  
ELT	
  
Store	
  &	
  Process	
  
EDW	
  
Archive	
  
ETL	
  
Access	
  Data	
  
Analyze	
  Data	
  
Search
Statistical
Machine
Learning
SQL
Serve
Serve
Optimize
Implement
Custom
Application
Point Solution
ELT	
  
ELT	
  
MANY PEOPLE INVOLVED IN THE PROCESS
DATA
ARCHITECT
DATABASE
ADMINISTRATOR
SYSTEM
ADMINISTRATOR
BUSINESS
ANALYST
BI
ADMINISTRATOR
SYSTEM
ADMINISTRATOR
IT COULD BE SIMPLER
DATABASE ADMINISTRATOR BUSINESS ANALYST
MODERN DATA AND VISUALIZATION ENVIRONMENT
Visualiza8on	
  Data	
  Sources	
  
Structured	
  
Ingest
Store	
  &	
  Process	
   Data	
  Prepara8on	
  
Serve
Unstructured	
  
Ingest
Serve
REAL BENEFITS OF A SELF-SERVICE APPROACH
+15%
Cash Increase
+26%
Pipeline Growth
-67%
Cost Reduction
Real-Time
+15%
Cash Increase
+26%
Pipeline Growth
-67%
Cost Reduction
+48%
Speed of Delivery
+42%
Self-Service Access
+40%
Decision Quality
Real-Time Big Data
REAL BENEFITS OF A SELF-SERVICE APPROACH
+15%
Cash Increase
+26%
Pipeline Growth
-67%
Cost Reduction
+70%
Collaboration
+64%
Decision Speed
+61%
User Adoption
+48%
Speed of Delivery
+42%
Self-Service Access
+40%
Decision Quality
Real-Time InteractiveBig Data
REAL BENEFITS OF A SELF-SERVICE APPROACH
DEMONSTRATION
MODERN DATA ARCHITECTURE FOR SELF-SERVICE
INTELLIGENCE
Twitter Tag: #briefr The Briefing Room
Perceptions & Questions
Analyst:
Robin Bloor
I am not a
number!
To Round-Up & Wrangle
Robin Bloor, PhD
The Flow of Data
The movement of data:
from ACQUISITION
through PREPARATION
to ANALYSIS
Is not necessarily simple…
The General Picture
Data Sources
Analytics
Service
Mgt
Life Cycle
Mgt
MetaData
Discovery
MDM
MetaData
Mgt
Data
Cleansing
Data
Lineage
R
O
U
N
D
|
U
P
W
R
A
N
G
L
I
N
G
Staging Area
(Hadoop)
Data Warehouse
or other location
Data Streams
ETL
ETL
Immediate Analytics & the Rest
§  Metadata discovery
§  Metadata management
§  Data cleansing
§  Data lineage
IMMEDIATE ANALYTICS Data Sources
Analytics
Service
Mgt
Life Cycle
Mgt
MetaData
Discovery
MDM
MetaData
Mgt
Data
Cleansing
Data
Lineage
R
O
U
N
D
|
U
P
W
R
A
N
G
L
I
N
G
Staging Area
(Hadoop)
Data Warehouse
or other location
Data Streams
ETL
ETL
§  MDM
§  Service mgt
§  Lifecycle mgt
§  ETL
DOWNSTREAM
The Analytics Business Process
§  The main point to note
is that it is iterative
§  It has morphed, because
of:
o  Data availability
o  Parallel technology
o  Scalable software
o  Open source tools
o  M/C learning
Data
Access
Data
Prep
Model
Analyze
Deploy
Execute
Analytical Latencies
1.  Data access
2.  Data preparation
3.  Model development
4.  Execution
5.  Implementation
6.  Model audit & update
This is where the
rubber meets the road:
Speed = Value
The Impending Reality
Technology is speeding up analytics
by TWO ORDERS OF MAGNITUDE
(on the IT side)
This is changing analytics
u  Is your capability only relevant to analytics or
does it have broader areas of application?
u  Technically, what makes it fast?
u  Please comment on analytical workloads:
- What do you see as the natural IT bottlenecks?
- What do you see as the natural business
bottlenecks?
u  Do we want business analysts to become ersatz
data scientists?
u  In respect to scale, what is your largest
implementation by data volume, and what was
the industry sector/problem space?
u  Who do you partner with?
u  What do you see as the largest barrier to
adoption of Trifacta?
Twitter Tag: #briefr The Briefing Room
Twitter Tag: #briefr The Briefing Room
Upcoming Topics
www.insideanalysis.com
March: BI/ANALYTICS
April: BIG DATA
May: CLOUD
Twitter Tag: #briefr The Briefing Room
THANK YOU
for your
ATTENTION!
Some images provided courtesy of
Wikimedia Commons and Wikipedia, including:
"Multiple pliers" by Ed Stevenhagen from nl. Licensed under CC BY-SA 3.0 via Wikimedia Commons -
http://commons.wikimedia.org/wiki/File:Multiple_pliers.jpg#mediaviewer/File:Multiple_pliers.jpg

Contenu connexe

Tendances

Lessons from building a stream-first metadata platform | Shirshanka Das, Stealth
Lessons from building a stream-first metadata platform | Shirshanka Das, StealthLessons from building a stream-first metadata platform | Shirshanka Das, Stealth
Lessons from building a stream-first metadata platform | Shirshanka Das, StealthHostedbyConfluent
 
Dataiku, Pitch at Data-Driven NYC, New York City, September 17th 2013
Dataiku, Pitch at Data-Driven NYC, New York City, September 17th 2013Dataiku, Pitch at Data-Driven NYC, New York City, September 17th 2013
Dataiku, Pitch at Data-Driven NYC, New York City, September 17th 2013Dataiku
 
frog IoT Big Design IoT World Congress 2015
frog IoT Big Design IoT World Congress 2015frog IoT Big Design IoT World Congress 2015
frog IoT Big Design IoT World Congress 2015Patrick Kalaher
 
Total Data Industry Report
Total Data Industry ReportTotal Data Industry Report
Total Data Industry ReportRan Zhang
 
Beyond the Data Lake - Matthias Korn, Technical Consultant at Data Virtuality
Beyond the Data Lake - Matthias Korn, Technical Consultant at Data VirtualityBeyond the Data Lake - Matthias Korn, Technical Consultant at Data Virtuality
Beyond the Data Lake - Matthias Korn, Technical Consultant at Data VirtualityDataconomy Media
 
Big Data Analytics in Government
Big Data Analytics in GovernmentBig Data Analytics in Government
Big Data Analytics in GovernmentDeepak Ramanathan
 
Big data analytics
Big data analyticsBig data analytics
Big data analyticsRavi Teja
 
YHORG Presentation 23 February 2016
YHORG Presentation 23 February 2016YHORG Presentation 23 February 2016
YHORG Presentation 23 February 2016Richard Vidgen
 
Big Data Scotland 2017
Big Data Scotland 2017Big Data Scotland 2017
Big Data Scotland 2017Ray Bugg
 
Building Data Science Teams, Abbreviated
Building Data Science Teams, AbbreviatedBuilding Data Science Teams, Abbreviated
Building Data Science Teams, AbbreviatedAllen Day, PhD
 
Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...
Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...
Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...Connected Data World
 
Data Science Application in Business Portfolio & Risk Management
Data Science Application in Business Portfolio & Risk ManagementData Science Application in Business Portfolio & Risk Management
Data Science Application in Business Portfolio & Risk ManagementData Science Thailand
 
Full-Stack Data Science: How to be a One-person Data Team
Full-Stack Data Science: How to be a One-person Data TeamFull-Stack Data Science: How to be a One-person Data Team
Full-Stack Data Science: How to be a One-person Data TeamGreg Goltsov
 
Introduction to Deep Learning and AI at Scale for Managers
Introduction to Deep Learning and AI at Scale for ManagersIntroduction to Deep Learning and AI at Scale for Managers
Introduction to Deep Learning and AI at Scale for ManagersDataWorks Summit
 
Big Data and the Art of Data Science
Big Data and the Art of Data ScienceBig Data and the Art of Data Science
Big Data and the Art of Data ScienceAndrew Gardner
 
Using A Distributed Graph Database To Make Sense Of Disparate Data Stores
Using A Distributed Graph Database To Make Sense Of Disparate Data StoresUsing A Distributed Graph Database To Make Sense Of Disparate Data Stores
Using A Distributed Graph Database To Make Sense Of Disparate Data StoresInfiniteGraph
 
Building Data Science Teams
Building Data Science TeamsBuilding Data Science Teams
Building Data Science TeamsEMC
 

Tendances (20)

Lessons from building a stream-first metadata platform | Shirshanka Das, Stealth
Lessons from building a stream-first metadata platform | Shirshanka Das, StealthLessons from building a stream-first metadata platform | Shirshanka Das, Stealth
Lessons from building a stream-first metadata platform | Shirshanka Das, Stealth
 
Dataiku, Pitch at Data-Driven NYC, New York City, September 17th 2013
Dataiku, Pitch at Data-Driven NYC, New York City, September 17th 2013Dataiku, Pitch at Data-Driven NYC, New York City, September 17th 2013
Dataiku, Pitch at Data-Driven NYC, New York City, September 17th 2013
 
frog IoT Big Design IoT World Congress 2015
frog IoT Big Design IoT World Congress 2015frog IoT Big Design IoT World Congress 2015
frog IoT Big Design IoT World Congress 2015
 
Total Data Industry Report
Total Data Industry ReportTotal Data Industry Report
Total Data Industry Report
 
Beyond the Data Lake - Matthias Korn, Technical Consultant at Data Virtuality
Beyond the Data Lake - Matthias Korn, Technical Consultant at Data VirtualityBeyond the Data Lake - Matthias Korn, Technical Consultant at Data Virtuality
Beyond the Data Lake - Matthias Korn, Technical Consultant at Data Virtuality
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
Big Data Analytics in Government
Big Data Analytics in GovernmentBig Data Analytics in Government
Big Data Analytics in Government
 
Big data analytics
Big data analyticsBig data analytics
Big data analytics
 
YHORG Presentation 23 February 2016
YHORG Presentation 23 February 2016YHORG Presentation 23 February 2016
YHORG Presentation 23 February 2016
 
Big Data Scotland 2017
Big Data Scotland 2017Big Data Scotland 2017
Big Data Scotland 2017
 
Building Data Science Teams, Abbreviated
Building Data Science Teams, AbbreviatedBuilding Data Science Teams, Abbreviated
Building Data Science Teams, Abbreviated
 
Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...
Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...
Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
Data Science Application in Business Portfolio & Risk Management
Data Science Application in Business Portfolio & Risk ManagementData Science Application in Business Portfolio & Risk Management
Data Science Application in Business Portfolio & Risk Management
 
Full-Stack Data Science: How to be a One-person Data Team
Full-Stack Data Science: How to be a One-person Data TeamFull-Stack Data Science: How to be a One-person Data Team
Full-Stack Data Science: How to be a One-person Data Team
 
Introduction to Deep Learning and AI at Scale for Managers
Introduction to Deep Learning and AI at Scale for ManagersIntroduction to Deep Learning and AI at Scale for Managers
Introduction to Deep Learning and AI at Scale for Managers
 
Big Data and the Art of Data Science
Big Data and the Art of Data ScienceBig Data and the Art of Data Science
Big Data and the Art of Data Science
 
Big data analysis
Big data analysisBig data analysis
Big data analysis
 
Using A Distributed Graph Database To Make Sense Of Disparate Data Stores
Using A Distributed Graph Database To Make Sense Of Disparate Data StoresUsing A Distributed Graph Database To Make Sense Of Disparate Data Stores
Using A Distributed Graph Database To Make Sense Of Disparate Data Stores
 
Building Data Science Teams
Building Data Science TeamsBuilding Data Science Teams
Building Data Science Teams
 

En vedette

Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and HadoopGoogle Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoophuguk
 
Webinar - Risky Business: How to Balance Innovation & Risk in Big Data
Webinar - Risky Business: How to Balance Innovation & Risk in Big DataWebinar - Risky Business: How to Balance Innovation & Risk in Big Data
Webinar - Risky Business: How to Balance Innovation & Risk in Big DataZaloni
 
Impact of health education on tuberculosis drug adherence
Impact of health education on tuberculosis drug adherenceImpact of health education on tuberculosis drug adherence
Impact of health education on tuberculosis drug adherenceSkillet Tony
 
OUR GOAL AND FOCUS FOR "OPEN FOG CONSORTIUM"
OUR GOAL AND FOCUS FOR "OPEN FOG CONSORTIUM"OUR GOAL AND FOCUS FOR "OPEN FOG CONSORTIUM"
OUR GOAL AND FOCUS FOR "OPEN FOG CONSORTIUM"Naoto MATSUMOTO
 
DataMeet 4: Data cleaning & census data
DataMeet 4: Data cleaning & census dataDataMeet 4: Data cleaning & census data
DataMeet 4: Data cleaning & census dataRitvvij Parrikh
 
Data Wrangling and Oracle Connectors for Hadoop
Data Wrangling and Oracle Connectors for HadoopData Wrangling and Oracle Connectors for Hadoop
Data Wrangling and Oracle Connectors for HadoopGwen (Chen) Shapira
 
Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, TrifactaData Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, Trifactahuguk
 
Real-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache SparkReal-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache SparkGuido Schmutz
 
Using Big Data to Drive Customer 360
Using Big Data to Drive Customer 360Using Big Data to Drive Customer 360
Using Big Data to Drive Customer 360Cloudera, Inc.
 
Data Mining Concepts
Data Mining ConceptsData Mining Concepts
Data Mining ConceptsDung Nguyen
 
Data Collection-Primary & Secondary
Data Collection-Primary & SecondaryData Collection-Primary & Secondary
Data Collection-Primary & SecondaryPrathamesh Parab
 

En vedette (16)

Data Wrangling
Data WranglingData Wrangling
Data Wrangling
 
Real time analytics in Big Data
Real time analytics in Big DataReal time analytics in Big Data
Real time analytics in Big Data
 
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and HadoopGoogle Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
 
Webinar - Risky Business: How to Balance Innovation & Risk in Big Data
Webinar - Risky Business: How to Balance Innovation & Risk in Big DataWebinar - Risky Business: How to Balance Innovation & Risk in Big Data
Webinar - Risky Business: How to Balance Innovation & Risk in Big Data
 
Impact of health education on tuberculosis drug adherence
Impact of health education on tuberculosis drug adherenceImpact of health education on tuberculosis drug adherence
Impact of health education on tuberculosis drug adherence
 
OUR GOAL AND FOCUS FOR "OPEN FOG CONSORTIUM"
OUR GOAL AND FOCUS FOR "OPEN FOG CONSORTIUM"OUR GOAL AND FOCUS FOR "OPEN FOG CONSORTIUM"
OUR GOAL AND FOCUS FOR "OPEN FOG CONSORTIUM"
 
DataMeet 4: Data cleaning & census data
DataMeet 4: Data cleaning & census dataDataMeet 4: Data cleaning & census data
DataMeet 4: Data cleaning & census data
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Data Wrangling and Oracle Connectors for Hadoop
Data Wrangling and Oracle Connectors for HadoopData Wrangling and Oracle Connectors for Hadoop
Data Wrangling and Oracle Connectors for Hadoop
 
Data Mining Overview
Data Mining OverviewData Mining Overview
Data Mining Overview
 
Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, TrifactaData Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta
 
Real-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache SparkReal-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache Spark
 
Using Big Data to Drive Customer 360
Using Big Data to Drive Customer 360Using Big Data to Drive Customer 360
Using Big Data to Drive Customer 360
 
Data Mining Concepts
Data Mining ConceptsData Mining Concepts
Data Mining Concepts
 
Chapter 10-DATA ANALYSIS & PRESENTATION
Chapter 10-DATA ANALYSIS & PRESENTATIONChapter 10-DATA ANALYSIS & PRESENTATION
Chapter 10-DATA ANALYSIS & PRESENTATION
 
Data Collection-Primary & Secondary
Data Collection-Primary & SecondaryData Collection-Primary & Secondary
Data Collection-Primary & Secondary
 

Similaire à Data Wrangling and the Art of Big Data Discovery

Data Culture Series - Keynote - 24th feb
Data Culture Series - Keynote - 24th febData Culture Series - Keynote - 24th feb
Data Culture Series - Keynote - 24th febJonathan Woodward
 
Business in the Driver’s Seat – An Improved Model for Integration
Business in the Driver’s Seat – An Improved Model for IntegrationBusiness in the Driver’s Seat – An Improved Model for Integration
Business in the Driver’s Seat – An Improved Model for IntegrationInside Analysis
 
Understanding What’s Possible: Getting Business Value from Big Data Quickly
Understanding What’s Possible: Getting Business Value from Big Data QuicklyUnderstanding What’s Possible: Getting Business Value from Big Data Quickly
Understanding What’s Possible: Getting Business Value from Big Data QuicklyInside Analysis
 
The Maturity Model: Taking the Growing Pains Out of Hadoop
The Maturity Model: Taking the Growing Pains Out of HadoopThe Maturity Model: Taking the Growing Pains Out of Hadoop
The Maturity Model: Taking the Growing Pains Out of HadoopInside Analysis
 
The Great Lakes: How to Approach a Big Data Implementation
The Great Lakes: How to Approach a Big Data ImplementationThe Great Lakes: How to Approach a Big Data Implementation
The Great Lakes: How to Approach a Big Data ImplementationInside Analysis
 
1.-DE-LECTURE-1-INTRO-TO-DATA-ENGG.pptx
1.-DE-LECTURE-1-INTRO-TO-DATA-ENGG.pptx1.-DE-LECTURE-1-INTRO-TO-DATA-ENGG.pptx
1.-DE-LECTURE-1-INTRO-TO-DATA-ENGG.pptxarpit206900
 
Thinkful DC - Intro to Data Science
Thinkful DC - Intro to Data Science Thinkful DC - Intro to Data Science
Thinkful DC - Intro to Data Science TJ Stalcup
 
The New Simple: Predictive Analytics for the Mainstream
The New Simple: Predictive Analytics for the Mainstream The New Simple: Predictive Analytics for the Mainstream
The New Simple: Predictive Analytics for the Mainstream Inside Analysis
 
Data analytics course 3
Data analytics course 3Data analytics course 3
Data analytics course 3nakshatraL
 
Fire in the Hole: How a Spark-Powered Platform Charges Analytics
Fire in the Hole: How a Spark-Powered Platform Charges Analytics Fire in the Hole: How a Spark-Powered Platform Charges Analytics
Fire in the Hole: How a Spark-Powered Platform Charges Analytics Inside Analysis
 
1° Sessione Oracle CRUI: Analytics Data Lab, the power of Big Data Investiga...
1° Sessione Oracle CRUI: Analytics Data Lab,  the power of Big Data Investiga...1° Sessione Oracle CRUI: Analytics Data Lab,  the power of Big Data Investiga...
1° Sessione Oracle CRUI: Analytics Data Lab, the power of Big Data Investiga...Jürgen Ambrosi
 
Next generation of data scientist
Next generation of data scientistNext generation of data scientist
Next generation of data scientistTanujaSomvanshi1
 
How Can Analytics Improve Business?
How Can Analytics Improve Business?How Can Analytics Improve Business?
How Can Analytics Improve Business?Inside Analysis
 
Architecting a Platform for Enterprise Use - Strata London 2018
Architecting a Platform for Enterprise Use - Strata London 2018Architecting a Platform for Enterprise Use - Strata London 2018
Architecting a Platform for Enterprise Use - Strata London 2018mark madsen
 
Drive It Home: A Roadmap for Today's Data-Driven Culture
Drive It Home: A Roadmap for Today's Data-Driven CultureDrive It Home: A Roadmap for Today's Data-Driven Culture
Drive It Home: A Roadmap for Today's Data-Driven CultureInside Analysis
 
The Simple 5-Step Process for Creating a Winning Data Pipeline.pdf
The Simple 5-Step Process for Creating a Winning Data Pipeline.pdfThe Simple 5-Step Process for Creating a Winning Data Pipeline.pdf
The Simple 5-Step Process for Creating a Winning Data Pipeline.pdfData Science Council of America
 
Come diventare data scientist - Paolo Pellegrini
Come diventare data scientist - Paolo PellegriniCome diventare data scientist - Paolo Pellegrini
Come diventare data scientist - Paolo PellegriniDonatella Cambosu
 
Deeper Questions: How Interactive Visualization Empowers Analysts
Deeper Questions: How Interactive Visualization Empowers AnalystsDeeper Questions: How Interactive Visualization Empowers Analysts
Deeper Questions: How Interactive Visualization Empowers AnalystsInside Analysis
 
A Connected Data Landscape: Virtualization and the Internet of Things
A Connected Data Landscape: Virtualization and the Internet of ThingsA Connected Data Landscape: Virtualization and the Internet of Things
A Connected Data Landscape: Virtualization and the Internet of ThingsInside Analysis
 

Similaire à Data Wrangling and the Art of Big Data Discovery (20)

Data Culture Series - Keynote - 24th feb
Data Culture Series - Keynote - 24th febData Culture Series - Keynote - 24th feb
Data Culture Series - Keynote - 24th feb
 
Business in the Driver’s Seat – An Improved Model for Integration
Business in the Driver’s Seat – An Improved Model for IntegrationBusiness in the Driver’s Seat – An Improved Model for Integration
Business in the Driver’s Seat – An Improved Model for Integration
 
Understanding What’s Possible: Getting Business Value from Big Data Quickly
Understanding What’s Possible: Getting Business Value from Big Data QuicklyUnderstanding What’s Possible: Getting Business Value from Big Data Quickly
Understanding What’s Possible: Getting Business Value from Big Data Quickly
 
The Maturity Model: Taking the Growing Pains Out of Hadoop
The Maturity Model: Taking the Growing Pains Out of HadoopThe Maturity Model: Taking the Growing Pains Out of Hadoop
The Maturity Model: Taking the Growing Pains Out of Hadoop
 
Joe C
Joe CJoe C
Joe C
 
The Great Lakes: How to Approach a Big Data Implementation
The Great Lakes: How to Approach a Big Data ImplementationThe Great Lakes: How to Approach a Big Data Implementation
The Great Lakes: How to Approach a Big Data Implementation
 
1.-DE-LECTURE-1-INTRO-TO-DATA-ENGG.pptx
1.-DE-LECTURE-1-INTRO-TO-DATA-ENGG.pptx1.-DE-LECTURE-1-INTRO-TO-DATA-ENGG.pptx
1.-DE-LECTURE-1-INTRO-TO-DATA-ENGG.pptx
 
Thinkful DC - Intro to Data Science
Thinkful DC - Intro to Data Science Thinkful DC - Intro to Data Science
Thinkful DC - Intro to Data Science
 
The New Simple: Predictive Analytics for the Mainstream
The New Simple: Predictive Analytics for the Mainstream The New Simple: Predictive Analytics for the Mainstream
The New Simple: Predictive Analytics for the Mainstream
 
Data analytics course 3
Data analytics course 3Data analytics course 3
Data analytics course 3
 
Fire in the Hole: How a Spark-Powered Platform Charges Analytics
Fire in the Hole: How a Spark-Powered Platform Charges Analytics Fire in the Hole: How a Spark-Powered Platform Charges Analytics
Fire in the Hole: How a Spark-Powered Platform Charges Analytics
 
1° Sessione Oracle CRUI: Analytics Data Lab, the power of Big Data Investiga...
1° Sessione Oracle CRUI: Analytics Data Lab,  the power of Big Data Investiga...1° Sessione Oracle CRUI: Analytics Data Lab,  the power of Big Data Investiga...
1° Sessione Oracle CRUI: Analytics Data Lab, the power of Big Data Investiga...
 
Next generation of data scientist
Next generation of data scientistNext generation of data scientist
Next generation of data scientist
 
How Can Analytics Improve Business?
How Can Analytics Improve Business?How Can Analytics Improve Business?
How Can Analytics Improve Business?
 
Architecting a Platform for Enterprise Use - Strata London 2018
Architecting a Platform for Enterprise Use - Strata London 2018Architecting a Platform for Enterprise Use - Strata London 2018
Architecting a Platform for Enterprise Use - Strata London 2018
 
Drive It Home: A Roadmap for Today's Data-Driven Culture
Drive It Home: A Roadmap for Today's Data-Driven CultureDrive It Home: A Roadmap for Today's Data-Driven Culture
Drive It Home: A Roadmap for Today's Data-Driven Culture
 
The Simple 5-Step Process for Creating a Winning Data Pipeline.pdf
The Simple 5-Step Process for Creating a Winning Data Pipeline.pdfThe Simple 5-Step Process for Creating a Winning Data Pipeline.pdf
The Simple 5-Step Process for Creating a Winning Data Pipeline.pdf
 
Come diventare data scientist - Paolo Pellegrini
Come diventare data scientist - Paolo PellegriniCome diventare data scientist - Paolo Pellegrini
Come diventare data scientist - Paolo Pellegrini
 
Deeper Questions: How Interactive Visualization Empowers Analysts
Deeper Questions: How Interactive Visualization Empowers AnalystsDeeper Questions: How Interactive Visualization Empowers Analysts
Deeper Questions: How Interactive Visualization Empowers Analysts
 
A Connected Data Landscape: Virtualization and the Internet of Things
A Connected Data Landscape: Virtualization and the Internet of ThingsA Connected Data Landscape: Virtualization and the Internet of Things
A Connected Data Landscape: Virtualization and the Internet of Things
 

Plus de Inside Analysis

An Ounce of Prevention: Forging Healthy BI
An Ounce of Prevention: Forging Healthy BIAn Ounce of Prevention: Forging Healthy BI
An Ounce of Prevention: Forging Healthy BIInside Analysis
 
First in Class: Optimizing the Data Lake for Tighter Integration
First in Class: Optimizing the Data Lake for Tighter IntegrationFirst in Class: Optimizing the Data Lake for Tighter Integration
First in Class: Optimizing the Data Lake for Tighter IntegrationInside Analysis
 
Fit For Purpose: Preventing a Big Data Letdown
Fit For Purpose: Preventing a Big Data LetdownFit For Purpose: Preventing a Big Data Letdown
Fit For Purpose: Preventing a Big Data LetdownInside Analysis
 
To Serve and Protect: Making Sense of Hadoop Security
To Serve and Protect: Making Sense of Hadoop Security To Serve and Protect: Making Sense of Hadoop Security
To Serve and Protect: Making Sense of Hadoop Security Inside Analysis
 
The Hadoop Guarantee: Keeping Analytics Running On Time
The Hadoop Guarantee: Keeping Analytics Running On TimeThe Hadoop Guarantee: Keeping Analytics Running On Time
The Hadoop Guarantee: Keeping Analytics Running On TimeInside Analysis
 
Introducing: A Complete Algebra of Data
Introducing: A Complete Algebra of DataIntroducing: A Complete Algebra of Data
Introducing: A Complete Algebra of DataInside Analysis
 
Ahead of the Stream: How to Future-Proof Real-Time Analytics
Ahead of the Stream: How to Future-Proof Real-Time AnalyticsAhead of the Stream: How to Future-Proof Real-Time Analytics
Ahead of the Stream: How to Future-Proof Real-Time AnalyticsInside Analysis
 
All Together Now: Connected Analytics for the Internet of Everything
All Together Now: Connected Analytics for the Internet of EverythingAll Together Now: Connected Analytics for the Internet of Everything
All Together Now: Connected Analytics for the Internet of EverythingInside Analysis
 
Goodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETL
Goodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETLGoodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETL
Goodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETLInside Analysis
 
The Biggest Picture: Situational Awareness on a Global Level
The Biggest Picture: Situational Awareness on a Global LevelThe Biggest Picture: Situational Awareness on a Global Level
The Biggest Picture: Situational Awareness on a Global LevelInside Analysis
 
Structurally Sound: How to Tame Your Architecture
Structurally Sound: How to Tame Your ArchitectureStructurally Sound: How to Tame Your Architecture
Structurally Sound: How to Tame Your ArchitectureInside Analysis
 
SQL In Hadoop: Big Data Innovation Without the Risk
SQL In Hadoop: Big Data Innovation Without the RiskSQL In Hadoop: Big Data Innovation Without the Risk
SQL In Hadoop: Big Data Innovation Without the RiskInside Analysis
 
The Perfect Fit: Scalable Graph for Big Data
The Perfect Fit: Scalable Graph for Big DataThe Perfect Fit: Scalable Graph for Big Data
The Perfect Fit: Scalable Graph for Big DataInside Analysis
 
A Revolutionary Approach to Modernizing the Data Warehouse
A Revolutionary Approach to Modernizing the Data WarehouseA Revolutionary Approach to Modernizing the Data Warehouse
A Revolutionary Approach to Modernizing the Data WarehouseInside Analysis
 
Rethinking Data Availability and Governance in a Mobile World
Rethinking Data Availability and Governance in a Mobile WorldRethinking Data Availability and Governance in a Mobile World
Rethinking Data Availability and Governance in a Mobile WorldInside Analysis
 
DisrupTech - Dave Duggal
DisrupTech - Dave DuggalDisrupTech - Dave Duggal
DisrupTech - Dave DuggalInside Analysis
 
Phasic Systems - Dr. Geoffrey Malafsky
Phasic Systems - Dr. Geoffrey MalafskyPhasic Systems - Dr. Geoffrey Malafsky
Phasic Systems - Dr. Geoffrey MalafskyInside Analysis
 
Red Hat - Sarangan Rangachari
Red Hat - Sarangan RangachariRed Hat - Sarangan Rangachari
Red Hat - Sarangan RangachariInside Analysis
 

Plus de Inside Analysis (20)

An Ounce of Prevention: Forging Healthy BI
An Ounce of Prevention: Forging Healthy BIAn Ounce of Prevention: Forging Healthy BI
An Ounce of Prevention: Forging Healthy BI
 
First in Class: Optimizing the Data Lake for Tighter Integration
First in Class: Optimizing the Data Lake for Tighter IntegrationFirst in Class: Optimizing the Data Lake for Tighter Integration
First in Class: Optimizing the Data Lake for Tighter Integration
 
Fit For Purpose: Preventing a Big Data Letdown
Fit For Purpose: Preventing a Big Data LetdownFit For Purpose: Preventing a Big Data Letdown
Fit For Purpose: Preventing a Big Data Letdown
 
To Serve and Protect: Making Sense of Hadoop Security
To Serve and Protect: Making Sense of Hadoop Security To Serve and Protect: Making Sense of Hadoop Security
To Serve and Protect: Making Sense of Hadoop Security
 
The Hadoop Guarantee: Keeping Analytics Running On Time
The Hadoop Guarantee: Keeping Analytics Running On TimeThe Hadoop Guarantee: Keeping Analytics Running On Time
The Hadoop Guarantee: Keeping Analytics Running On Time
 
Introducing: A Complete Algebra of Data
Introducing: A Complete Algebra of DataIntroducing: A Complete Algebra of Data
Introducing: A Complete Algebra of Data
 
Ahead of the Stream: How to Future-Proof Real-Time Analytics
Ahead of the Stream: How to Future-Proof Real-Time AnalyticsAhead of the Stream: How to Future-Proof Real-Time Analytics
Ahead of the Stream: How to Future-Proof Real-Time Analytics
 
All Together Now: Connected Analytics for the Internet of Everything
All Together Now: Connected Analytics for the Internet of EverythingAll Together Now: Connected Analytics for the Internet of Everything
All Together Now: Connected Analytics for the Internet of Everything
 
Goodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETL
Goodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETLGoodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETL
Goodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETL
 
The Biggest Picture: Situational Awareness on a Global Level
The Biggest Picture: Situational Awareness on a Global LevelThe Biggest Picture: Situational Awareness on a Global Level
The Biggest Picture: Situational Awareness on a Global Level
 
Structurally Sound: How to Tame Your Architecture
Structurally Sound: How to Tame Your ArchitectureStructurally Sound: How to Tame Your Architecture
Structurally Sound: How to Tame Your Architecture
 
SQL In Hadoop: Big Data Innovation Without the Risk
SQL In Hadoop: Big Data Innovation Without the RiskSQL In Hadoop: Big Data Innovation Without the Risk
SQL In Hadoop: Big Data Innovation Without the Risk
 
The Perfect Fit: Scalable Graph for Big Data
The Perfect Fit: Scalable Graph for Big DataThe Perfect Fit: Scalable Graph for Big Data
The Perfect Fit: Scalable Graph for Big Data
 
A Revolutionary Approach to Modernizing the Data Warehouse
A Revolutionary Approach to Modernizing the Data WarehouseA Revolutionary Approach to Modernizing the Data Warehouse
A Revolutionary Approach to Modernizing the Data Warehouse
 
Rethinking Data Availability and Governance in a Mobile World
Rethinking Data Availability and Governance in a Mobile WorldRethinking Data Availability and Governance in a Mobile World
Rethinking Data Availability and Governance in a Mobile World
 
DisrupTech - Dave Duggal
DisrupTech - Dave DuggalDisrupTech - Dave Duggal
DisrupTech - Dave Duggal
 
Modus Operandi
Modus OperandiModus Operandi
Modus Operandi
 
Phasic Systems - Dr. Geoffrey Malafsky
Phasic Systems - Dr. Geoffrey MalafskyPhasic Systems - Dr. Geoffrey Malafsky
Phasic Systems - Dr. Geoffrey Malafsky
 
Red Hat - Sarangan Rangachari
Red Hat - Sarangan RangachariRed Hat - Sarangan Rangachari
Red Hat - Sarangan Rangachari
 
WebAction-Sami Abkay
WebAction-Sami AbkayWebAction-Sami Abkay
WebAction-Sami Abkay
 

Dernier

Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfAarwolf Industries LLC
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Mark Simos
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 

Dernier (20)

Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdf
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 

Data Wrangling and the Art of Big Data Discovery

  • 1. Grab some coffee and enjoy the pre-show banter before the top of the hour!
  • 2. The Briefing Room Data Wrangling and the Art of Big Data Discovery
  • 3. Twitter Tag: #briefr The Briefing Room Welcome Host: Eric Kavanagh eric.kavanagh@bloorgroup.com @eric_kavanagh
  • 4. Twitter Tag: #briefr The Briefing Room   Reveal the essential characteristics of enterprise software, good and bad   Provide a forum for detailed analysis of today s innovative technologies   Give vendors a chance to explain their product to savvy analysts   Allow audience members to pose serious questions... and get answers! Mission
  • 5. Twitter Tag: #briefr The Briefing Room Topics March: BI/ANALYTICS April: BIG DATA May: CLOUD
  • 6. Twitter Tag: #briefr The Briefing Room Should I Bring My Tools? Ø  Hammers aren’t good for plumbing! Ø  Big Data requires a new set of tools Ø  Preparing and Exploring are very different Ø  Don’t throw out your old tool box!
  • 7. Twitter Tag: #briefr The Briefing Room Analyst: Robin Bloor Robin Bloor is Chief Analyst at The Bloor Group robin.bloor@bloorgroup.com @robinbloor
  • 8. Twitter Tag: #briefr The Briefing Room Trifacta and Zoomdata Trifacta offers a platform for data transformation and preparation   The interface is rich in visualization and provides previews and recommendations   The platform also includes a learning layer which employs machine learning algorithms to facilitate automation and self- learning Zoomdata is a Big Data exploration, visualization and analytics platform   The platform offers a wide range of analytics and BI tools, such as dashboards, stream processing and IoT analytics   Its pre-built connectors allow the Zoomdata server to connect directly to data sources
  • 9. Twitter Tag: #briefr The Briefing Room Guests: Russ Cosentino is Vice President of Marketing & Business Development at Zoomdata. Throughout his career he has focused on developing solutions that leverage technology to solve business problems. His experience includes application development for mission critical systems for the DoD, automated recruitment programs for the intelligence community and the application of text analytics for commercial VOC programs. Dr. Joe Hellerstein is Trifacta’s Chief Strategy Officer and a Professor of Computer Science at Berkeley. His career in research and industry has focused on data-centric systems and the way they drive computing. In 2010, Fortune Magazine included him in their list of 50 smartest people in technology, and MIT Technology Review magazine included his Bloom language for cloud computing on their TR10 list of the 10 technologies “most likely to change our world.”
  • 10. Data Wrangling and the Art of Big Data Discovery
  • 11. Dr. Joe Hellerstein Professor, EECS Computer Science Division, UC Berkeley Co-founder & Chief Strategy Officer, Trifacta DATA WRANGLING AND THE ART OF BIG DATA DISCOVERY Russ Cosentino Vice President Marketing & Business Development, Zoomdata
  • 12. Founded in 2012, from Berkeley/Stanford research roots dp = data to the people “facilitating interactions between people and data throughout the analytic lifecycle” Stanford Visualization Group’s “Data Wrangler” Elegant solutions for a messy world: The 80% problem of preparing data for exploratory analytics
  • 13. TRADITIONAL APPROACH TO DATA MANAGEMENT Enterprise  Data  Warehouse   Implement  Data  Sources   ETL   Structured   Ingest Storage  #1,  2,  N   ELT   Store  &  Process   EDW   Archive   ETL   Access  Data   Analyze  Data   Search Statistical Machine Learning SQL Serve Serve Optimize Implement Custom Application Point Solution ELT   ELT  
  • 14. MANY PEOPLE INVOLVED IN THE PROCESS DATA ARCHITECT DATABASE ADMINISTRATOR SYSTEM ADMINISTRATOR BUSINESS ANALYST BI ADMINISTRATOR SYSTEM ADMINISTRATOR
  • 15. IT COULD BE SIMPLER DATABASE ADMINISTRATOR BUSINESS ANALYST
  • 16. MODERN DATA AND VISUALIZATION ENVIRONMENT Visualiza8on  Data  Sources   Structured   Ingest Store  &  Process   Data  Prepara8on   Serve Unstructured   Ingest Serve
  • 17. REAL BENEFITS OF A SELF-SERVICE APPROACH +15% Cash Increase +26% Pipeline Growth -67% Cost Reduction Real-Time
  • 18. +15% Cash Increase +26% Pipeline Growth -67% Cost Reduction +48% Speed of Delivery +42% Self-Service Access +40% Decision Quality Real-Time Big Data REAL BENEFITS OF A SELF-SERVICE APPROACH
  • 19. +15% Cash Increase +26% Pipeline Growth -67% Cost Reduction +70% Collaboration +64% Decision Speed +61% User Adoption +48% Speed of Delivery +42% Self-Service Access +40% Decision Quality Real-Time InteractiveBig Data REAL BENEFITS OF A SELF-SERVICE APPROACH
  • 21. MODERN DATA ARCHITECTURE FOR SELF-SERVICE INTELLIGENCE
  • 22.
  • 23. Twitter Tag: #briefr The Briefing Room Perceptions & Questions Analyst: Robin Bloor
  • 24. I am not a number! To Round-Up & Wrangle Robin Bloor, PhD
  • 25. The Flow of Data The movement of data: from ACQUISITION through PREPARATION to ANALYSIS Is not necessarily simple…
  • 26. The General Picture Data Sources Analytics Service Mgt Life Cycle Mgt MetaData Discovery MDM MetaData Mgt Data Cleansing Data Lineage R O U N D | U P W R A N G L I N G Staging Area (Hadoop) Data Warehouse or other location Data Streams ETL ETL
  • 27. Immediate Analytics & the Rest §  Metadata discovery §  Metadata management §  Data cleansing §  Data lineage IMMEDIATE ANALYTICS Data Sources Analytics Service Mgt Life Cycle Mgt MetaData Discovery MDM MetaData Mgt Data Cleansing Data Lineage R O U N D | U P W R A N G L I N G Staging Area (Hadoop) Data Warehouse or other location Data Streams ETL ETL §  MDM §  Service mgt §  Lifecycle mgt §  ETL DOWNSTREAM
  • 28. The Analytics Business Process §  The main point to note is that it is iterative §  It has morphed, because of: o  Data availability o  Parallel technology o  Scalable software o  Open source tools o  M/C learning Data Access Data Prep Model Analyze Deploy Execute
  • 29. Analytical Latencies 1.  Data access 2.  Data preparation 3.  Model development 4.  Execution 5.  Implementation 6.  Model audit & update This is where the rubber meets the road: Speed = Value
  • 30. The Impending Reality Technology is speeding up analytics by TWO ORDERS OF MAGNITUDE (on the IT side) This is changing analytics
  • 31. u  Is your capability only relevant to analytics or does it have broader areas of application? u  Technically, what makes it fast? u  Please comment on analytical workloads: - What do you see as the natural IT bottlenecks? - What do you see as the natural business bottlenecks? u  Do we want business analysts to become ersatz data scientists?
  • 32. u  In respect to scale, what is your largest implementation by data volume, and what was the industry sector/problem space? u  Who do you partner with? u  What do you see as the largest barrier to adoption of Trifacta?
  • 33. Twitter Tag: #briefr The Briefing Room
  • 34. Twitter Tag: #briefr The Briefing Room Upcoming Topics www.insideanalysis.com March: BI/ANALYTICS April: BIG DATA May: CLOUD
  • 35. Twitter Tag: #briefr The Briefing Room THANK YOU for your ATTENTION! Some images provided courtesy of Wikimedia Commons and Wikipedia, including: "Multiple pliers" by Ed Stevenhagen from nl. Licensed under CC BY-SA 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Multiple_pliers.jpg#mediaviewer/File:Multiple_pliers.jpg