SlideShare une entreprise Scribd logo
1  sur  35
@joe_Caserta@joe_Caserta
Webinar: Building A New Platform
for Customer Analytics
Presented By Sponsored By
Nov 16, 2016
@joe_Caserta
@casertaconcepts
@databricks
@joe_Caserta
Joe Caserta Timeline
Launched Big Data practice
Co-author, with Ralph Kimball, The Data
Warehouse ETL Toolkit (Wiley)
Data Analysis, Data Warehousing and Business
Intelligence since 1996
Began consulting database programing and data
modeling 25+ years hands-on experience building database
solutions
Founded Caserta Concepts in NYC
Web log analytics solution published in Intelligent
Enterprise magazine
Launched Data Science, Data Interaction and Cloud
practices
Laser focus on extending Data Analytics with Big Data
solutions
1986
2004
1996
2009
2001
2013
2012
2014
Dedicated to Data Governance Techniques on Big
Data (Innovation)
Awarded Top 20 Big Data Companies 2016
Top 20 Most Powerful
Big Data consulting firms
Launched Big Data Warehousing (BDW) Meetup NYC:
2,000+ Members
2016 Awarded Fastest Growing Big Data Companies
2016
Established best practices for big data ecosystem
implementations
@joe_Caserta
About Caserta Concepts
– Consulting Data Innovation and Modern Data Engineering
– Award-winning company
– Internationally recognized work force
– Strategy, Architecture, Implementation, Governance
– Innovation Partner
– Strategic Consulting
– Advanced Architecture
– Build & Deploy
• Leader in Enterprise Data Solutions
– Big Data Analytics
– Data Warehousing
– Business Intelligence
Data Science
Cloud Computing
Data Governance
@joe_Caserta
Caserta Client Portfolio
Retail/eCommerce
& Manufacturing
Finance, Healthcare
& Insurance
Digital Media/AdTech
Education & Services
@joe_Caserta
Awards & Recognition
Top 10
Fastest Growing
Big Data Companies
2016
@joe_Caserta
Our Partners
@joe_Caserta
Use Case Objectives
• Cross-Channel Behavior Tracking / Identity Resolution
• Access to Atomic Level Transaction Data
• Blend Data Assets from Multiple Marketing Channels
• Fast Data Onboarding and Discovery
• Ensure Data Quality
• Single platform / Landscape simplification
• Understand Path-to-Purchase Customer Journey
• Improve Customer Experience & Increase Sales
@joe_Caserta
The Recipe: Build a Dynamic Data Platform
OLD WAY:
• Structure Data  Ingest Data  Analyze Data
• Fixed Capacity
• Monolith
NEW WAY:
• Ingest Data  Analyze Data  Structure Data
• Dynamic Capacity
• Ecosystem
RECIPE:
• Cloud
• Data Lake
• Holistic Architecture & Framework
@joe_Caserta
Analytics: The Whole Brain Challenge
Front
Back
Analytics Oriented
• Data Science
• Research
Process Oriented
• Data Governance
• Compliance
Operations Oriented
• Shared Services
• Data Engineering
Revenue Oriented
• Revenue Goals
• Monetizing Data
@joe_Caserta
Chief Data Organization (Oversight)
Vertical Business Area
[Sales/Finance/Marketing/Operations/Customer Svc]
Product Owner
SCRUM Master
Development Team
Business Subject Matter Expertise
Data Librarian/Data Stewardship
Data Science/ Statistical Skills
Data Engineering / Architecture
Presentation/ BI Report Development Skills
Data Quality Assurance
DevOps
IT Organization
(Oversight)
Enterprise Data Architect
Solution Engineers
Data Integration Practice
User Experience Practice
QA Practice
Operations Practice
Advanced Analytics
Business Analysts
Data Analysts
Data Scientists
Statisticians
Data Engineers
Planning Organization
Project Managers
Data Organization
Data Gov Coordinator
Data Librarians
Data Stewards
It Takes a Village!
@joe_Caserta
Unexpected Reaction to Change
@joe_Caserta
Global economics
Intensity of competition
Reduce costs
Move to cross-functional teams
New executive leadership
Speed of technical change
Social trends and changes
Period of time in present role
Status & perks of office/dept under threat
No apparent reasons for proposed changes
Lack of understanding of proposed changes
Fear of inability to cope with new technology
Concern over job security
Forces for Change Forces Resisting Change
Status Quo
Moving the Status Quo
http://www.change-management-coach.com/force-field-analysis.html
@joe_Caserta
The Data Lake Paradigm
Technology:
• Scalable distributed storage  S3
• Pluggable fit-for-purpose processing  EMR
• Consistent extensible framework  Spark
• Dimensional Data Warehouse  Redshift
Functional Capabilities:
• Remove barriers between data ingestion and analysis
• Democratize Data with Just Enough Data Governance
@joe_Caserta
Why AWS?
@joe_Caserta
Why Spark?
“Big Box” tools vs ROI?
– Prohibitively expensive  limited by licensing $$$
– Typically limited to the scalability of a single server
We Spark!
• Development local or distributed is identical
• Beautiful high level API’s
• Full universe of Python modules
• Open source and Free
• Blazing fast!
• Databricks cloud makes it easier
Spark has become our default processing engine for a
myriad of engineering & science problems
@joe_Caserta
Why we Databricks
• Interactive UI
• Includes a workspace with notebooks, dashboards, job scheduler,
point-and-click cluster management
• Cluster sharing
• Multiple users can connect to the same cluster, saving cost
• Security features
• Access controls to the whole workspace, clusters
• Collaboration
• Multi-user access to the same notebook, revision control, and IDE
and GitHub integration
• Data management
• Support for connecting different data sources to Spark, caching
service to speed up queries
@joe_Caserta
Ingest Raw
Data
Organize, Define,
Complete
Munging, Blending
Machine Learning
Data Quality and Monitoring
Metadata, ILM , Security
Data Catalog
Data Integration
Fully Governed ( trusted)
Arbitrary/Ad-hoc Queries
and Reporting
Big
Data
Ware
house
Data Science Workspace
Data Lake – Integrated Sandbox
Landing Area – Source Data in “Full Fidelity”
Usage Pattern Data Governance
Metadata, ILM,
Security
Corporate Data Pyramid (CDP)
@joe_Caserta
Data Integration
Identity Resolution
Data Quality
Discovery / Exploration
Machine Learning
Models Development
Reports / Dashboards
Applications
APIs
Structured Data
Unstructured Data
SQL, NoSQL, Object Store
Find Share Collaborate
Data Engineer
Data Scientist
Business Analyst
App Developer
Analyze
Persist
DeployIngest
Data Lake on the Cloud
@joe_Caserta
Solution Architecture
ODS
ETL/ID Res
@joe_Caserta
Type
Comments
Single Touch Rules-Based Statistically Driven
Assign the credit
to the first or last
exposure
Assign the credit to
each interaction
based on business
rules
Assign the credit to
interactions based
on data-driven
model
Ad-Click Mailing MailingE-mail E-mailAd-Click Ad-Click
100% 33% 33% 33% 27% 49% 24%
- Last touch only
- Ignores bulk of
customer journey
- Undervalues
other interactions
and influencers
- Subjective
- Assigns arbitrary
values to each
interaction
- Lacks analytics rigor
to determine weights
ü Looks at full behavior
patterns
ü Consider all touch points
ü Can apply different
models for best results
ü Use data to find
correlations between
touch points (winning
combinations)
Path-to-Purchase Methods
@joe_Caserta
Unifying the Customer Across Channels
Customer Data Integration (CDI):
Match and manage customer information from all available sources
Marketing channels: DMP, Salesforce, Adobe, Social, Direct Mail, Call Center, CRM
In other words…
We need to figure out how to
LINK people across systems!
@joe_Caserta
Mastering Master Data is Still MDM
Standardize
Match
Survivorship
Validate
@joe_Caserta
Standardization and Matching
Cleanse and Parse:
• Names
• Resolve nicknames
• Create deterministic hash, phonetic
representation
• Addresses
• Emails
• Phone Numbers
Matching:
Join based on combinations of cleansed
and standardized data to create match
results:
Spark map operations:
• Data cleansing, transformation, and
standardization
– Address Parsing: usaddress, postal-
address, etc
– Name Hashing: fuzzy, etc
– Genderization: sexmachine, etc
@joe_Caserta
Mastering Unmanageable Source Data
Reveal
• Wait for the customer to “reveal” themselves
• Create link between anonymous self and known profile
Vector
• May need behavioral statistical profiling
• Compare use vectors
Rebuild
• Recluster all prior activities
• Rebuild the Graph
@joe_Caserta
The Matching Process
The matching process output gives us the relationships between customers:
Great, but it’s not very useable, you need to traverse the dataset to find out 1234 and 1235 are the
same person (and this is a trivial case)
And we need to cluster and identify our survivors (vertex)
xid yid match_type
1234 4849 phone
4849 5499 email
5499 1235 address
4849 7788 cookie
5499 7788 cookie
4849 1234 phone
@joe_Caserta
Graph to the Rescue
1234 4849
5499
7788
We just need to import our edges into a graph
and “dump” out communities
Don’t think table…
think Graph! These matches are
actually communities
1235
@joe_Caserta
Connected Components algorithm labels each connected component of the
graph with the ID of its lowest-numbered vertex
This lowest number vertex can serve as our “survivor” (not field survivorship)
Connected Components
xid yid
1234 4849
1234 5499
1234 1235
1234 7788
1234 7788
1234 1234
@joe_Caserta
Identity Resolution Process
@joe_Caserta
The Notebook is the ETL Tool
@joe_Caserta
The Notebook is the Data Science Tool
@joe_Caserta
The Redshift DW is still Dimensional
@joe_Caserta
Use Graph for Data Lineage
@joe_Caserta
The Goal: Top Conversion Paths
Source: Accelerom AG, Zurich
@joe_Caserta
• Data Lake: S3 / NoSQL or SQL as Needed
• Identity Resolution: Spark / Graph Frames
• ETL: Spark/Python
• Path-to-Purchase: S3 / Spark / MLlib
• Data Warehouse: RedShift
• Shared Interface: Notebooks
• Business Intelligence: Tableau
Recap
@joe_Caserta
Joe Caserta
President, Caserta Concepts
joe@casertaconcepts.com
@joe_Caserta
• Award-winning company
• Transformative Data Strategies
• Modern Data Engineering
• Advanced Architecture
• Innovation Partner
• Strategic Consulting
• Advanced Technical Design
• Build & Deploy Solutions
• BDW Meetup
• New York City
• 3,000+ members
• Knowledge sharing
Data is not important, it’s what you do with it that’s important!
Thank You

Contenu connexe

Tendances

You're the New CDO, Now What?
You're the New CDO, Now What?You're the New CDO, Now What?
You're the New CDO, Now What?Caserta
 
General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017Caserta
 
A modern, flexible approach to Hadoop implementation incorporating innovation...
A modern, flexible approach to Hadoop implementation incorporating innovation...A modern, flexible approach to Hadoop implementation incorporating innovation...
A modern, flexible approach to Hadoop implementation incorporating innovation...DataWorks Summit
 
Intro to Data Science on Hadoop
Intro to Data Science on HadoopIntro to Data Science on Hadoop
Intro to Data Science on HadoopCaserta
 
Mastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkMastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkCaserta
 
Moving Past Infrastructure Limitations
Moving Past Infrastructure LimitationsMoving Past Infrastructure Limitations
Moving Past Infrastructure LimitationsCaserta
 
Making Big Data Easy for Everyone
Making Big Data Easy for EveryoneMaking Big Data Easy for Everyone
Making Big Data Easy for EveryoneCaserta
 
DataOps: Nine steps to transform your data science impact Strata London May 18
DataOps: Nine steps to transform your data science impact  Strata London May 18DataOps: Nine steps to transform your data science impact  Strata London May 18
DataOps: Nine steps to transform your data science impact Strata London May 18Harvinder Atwal
 
What Data Do You Have and Where is It?
What Data Do You Have and Where is It? What Data Do You Have and Where is It?
What Data Do You Have and Where is It? Caserta
 
Big Data Analytics on the Cloud
Big Data Analytics on the CloudBig Data Analytics on the Cloud
Big Data Analytics on the CloudCaserta
 
Maximize the Value of Your Data: Neo4j Graph Data Platform
Maximize the Value of Your Data: Neo4j Graph Data PlatformMaximize the Value of Your Data: Neo4j Graph Data Platform
Maximize the Value of Your Data: Neo4j Graph Data PlatformNeo4j
 
Setting Up the Data Lake
Setting Up the Data LakeSetting Up the Data Lake
Setting Up the Data LakeCaserta
 
An Overview of the Neo4j Cloud Strategy and the Future of Graph Databases in ...
An Overview of the Neo4j Cloud Strategy and the Future of Graph Databases in ...An Overview of the Neo4j Cloud Strategy and the Future of Graph Databases in ...
An Overview of the Neo4j Cloud Strategy and the Future of Graph Databases in ...Neo4j
 
DataOps - Big Data and AI World London - March 2020 - Harvinder Atwal
DataOps - Big Data and AI World London - March 2020 - Harvinder AtwalDataOps - Big Data and AI World London - March 2020 - Harvinder Atwal
DataOps - Big Data and AI World London - March 2020 - Harvinder AtwalHarvinder Atwal
 
Smart Data Webinar: Choosing the Right Data Management Architecture for Cogni...
Smart Data Webinar: Choosing the Right Data Management Architecture for Cogni...Smart Data Webinar: Choosing the Right Data Management Architecture for Cogni...
Smart Data Webinar: Choosing the Right Data Management Architecture for Cogni...DATAVERSITY
 
Using Machine Learning to Understand and Predict Marketing ROI
Using Machine Learning to Understand and Predict Marketing ROIUsing Machine Learning to Understand and Predict Marketing ROI
Using Machine Learning to Understand and Predict Marketing ROIDATAVERSITY
 
Business Value of Data
Business Value of Data Business Value of Data
Business Value of Data UIResearchPark
 
Smart Data Webinar: Knowledge as a Service
Smart Data Webinar: Knowledge as a ServiceSmart Data Webinar: Knowledge as a Service
Smart Data Webinar: Knowledge as a ServiceDATAVERSITY
 
IBM Governed Data Lake
IBM Governed Data LakeIBM Governed Data Lake
IBM Governed Data LakeKaran Sachdeva
 
Big Data's Impact on the Enterprise
Big Data's Impact on the EnterpriseBig Data's Impact on the Enterprise
Big Data's Impact on the EnterpriseCaserta
 

Tendances (20)

You're the New CDO, Now What?
You're the New CDO, Now What?You're the New CDO, Now What?
You're the New CDO, Now What?
 
General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017
 
A modern, flexible approach to Hadoop implementation incorporating innovation...
A modern, flexible approach to Hadoop implementation incorporating innovation...A modern, flexible approach to Hadoop implementation incorporating innovation...
A modern, flexible approach to Hadoop implementation incorporating innovation...
 
Intro to Data Science on Hadoop
Intro to Data Science on HadoopIntro to Data Science on Hadoop
Intro to Data Science on Hadoop
 
Mastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkMastering Customer Data on Apache Spark
Mastering Customer Data on Apache Spark
 
Moving Past Infrastructure Limitations
Moving Past Infrastructure LimitationsMoving Past Infrastructure Limitations
Moving Past Infrastructure Limitations
 
Making Big Data Easy for Everyone
Making Big Data Easy for EveryoneMaking Big Data Easy for Everyone
Making Big Data Easy for Everyone
 
DataOps: Nine steps to transform your data science impact Strata London May 18
DataOps: Nine steps to transform your data science impact  Strata London May 18DataOps: Nine steps to transform your data science impact  Strata London May 18
DataOps: Nine steps to transform your data science impact Strata London May 18
 
What Data Do You Have and Where is It?
What Data Do You Have and Where is It? What Data Do You Have and Where is It?
What Data Do You Have and Where is It?
 
Big Data Analytics on the Cloud
Big Data Analytics on the CloudBig Data Analytics on the Cloud
Big Data Analytics on the Cloud
 
Maximize the Value of Your Data: Neo4j Graph Data Platform
Maximize the Value of Your Data: Neo4j Graph Data PlatformMaximize the Value of Your Data: Neo4j Graph Data Platform
Maximize the Value of Your Data: Neo4j Graph Data Platform
 
Setting Up the Data Lake
Setting Up the Data LakeSetting Up the Data Lake
Setting Up the Data Lake
 
An Overview of the Neo4j Cloud Strategy and the Future of Graph Databases in ...
An Overview of the Neo4j Cloud Strategy and the Future of Graph Databases in ...An Overview of the Neo4j Cloud Strategy and the Future of Graph Databases in ...
An Overview of the Neo4j Cloud Strategy and the Future of Graph Databases in ...
 
DataOps - Big Data and AI World London - March 2020 - Harvinder Atwal
DataOps - Big Data and AI World London - March 2020 - Harvinder AtwalDataOps - Big Data and AI World London - March 2020 - Harvinder Atwal
DataOps - Big Data and AI World London - March 2020 - Harvinder Atwal
 
Smart Data Webinar: Choosing the Right Data Management Architecture for Cogni...
Smart Data Webinar: Choosing the Right Data Management Architecture for Cogni...Smart Data Webinar: Choosing the Right Data Management Architecture for Cogni...
Smart Data Webinar: Choosing the Right Data Management Architecture for Cogni...
 
Using Machine Learning to Understand and Predict Marketing ROI
Using Machine Learning to Understand and Predict Marketing ROIUsing Machine Learning to Understand and Predict Marketing ROI
Using Machine Learning to Understand and Predict Marketing ROI
 
Business Value of Data
Business Value of Data Business Value of Data
Business Value of Data
 
Smart Data Webinar: Knowledge as a Service
Smart Data Webinar: Knowledge as a ServiceSmart Data Webinar: Knowledge as a Service
Smart Data Webinar: Knowledge as a Service
 
IBM Governed Data Lake
IBM Governed Data LakeIBM Governed Data Lake
IBM Governed Data Lake
 
Big Data's Impact on the Enterprise
Big Data's Impact on the EnterpriseBig Data's Impact on the Enterprise
Big Data's Impact on the Enterprise
 

En vedette

The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation Caserta
 
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!Caserta
 
Benefits of the Azure Cloud
Benefits of the Azure CloudBenefits of the Azure Cloud
Benefits of the Azure CloudCaserta
 
Not Your Father's Database by Databricks
Not Your Father's Database by DatabricksNot Your Father's Database by Databricks
Not Your Father's Database by DatabricksCaserta
 
Designing High Performance ETL for Data Warehouse
Designing High Performance ETL for Data WarehouseDesigning High Performance ETL for Data Warehouse
Designing High Performance ETL for Data WarehouseMarcel Franke
 
1DMP: Marketing Data Platform - the future of data-driven marketing
1DMP: Marketing Data Platform - the future of data-driven marketing1DMP: Marketing Data Platform - the future of data-driven marketing
1DMP: Marketing Data Platform - the future of data-driven marketingCleverLEAF
 
Oracle PL/SQL Best Practices
Oracle PL/SQL Best PracticesOracle PL/SQL Best Practices
Oracle PL/SQL Best PracticesEmrah METE
 
Veri Ambarları için Oracle'ın Analitik SQL Desteği
Veri Ambarları için Oracle'ın Analitik SQL DesteğiVeri Ambarları için Oracle'ın Analitik SQL Desteği
Veri Ambarları için Oracle'ın Analitik SQL DesteğiEmrah METE
 
Thank Bunny - Customer Engagement Platform
Thank Bunny - Customer Engagement PlatformThank Bunny - Customer Engagement Platform
Thank Bunny - Customer Engagement PlatformSeshu Karthick
 
The Big Data Ecosystem for Financial Services
The Big Data Ecosystem for Financial ServicesThe Big Data Ecosystem for Financial Services
The Big Data Ecosystem for Financial ServicesDataStax
 
The Art Of Net Promoter Score
The Art Of Net Promoter ScoreThe Art Of Net Promoter Score
The Art Of Net Promoter ScoreAureus Analytics
 
Galvanize Data Science Open House
Galvanize Data Science Open HouseGalvanize Data Science Open House
Galvanize Data Science Open Houseryanorban
 
Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kur...
Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kur...Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kur...
Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kur...Spark Summit
 
Your data structures are made of maths!
Your data structures are made of maths!Your data structures are made of maths!
Your data structures are made of maths!kenbot
 
DGIQ 2015 The Fundamentals of Data Quality
DGIQ 2015 The Fundamentals of Data QualityDGIQ 2015 The Fundamentals of Data Quality
DGIQ 2015 The Fundamentals of Data QualityCaserta
 
Appraiser: How Airbnb Generates Complex Models in Spark for Demand Prediction...
Appraiser: How Airbnb Generates Complex Models in Spark for Demand Prediction...Appraiser: How Airbnb Generates Complex Models in Spark for Demand Prediction...
Appraiser: How Airbnb Generates Complex Models in Spark for Demand Prediction...Spark Summit
 
Architecting for change: LinkedIn's new data ecosystem
Architecting for change: LinkedIn's new data ecosystemArchitecting for change: LinkedIn's new data ecosystem
Architecting for change: LinkedIn's new data ecosystemYael Garten
 

En vedette (20)

The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation
 
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
 
Benefits of the Azure Cloud
Benefits of the Azure CloudBenefits of the Azure Cloud
Benefits of the Azure Cloud
 
Not Your Father's Database by Databricks
Not Your Father's Database by DatabricksNot Your Father's Database by Databricks
Not Your Father's Database by Databricks
 
Designing High Performance ETL for Data Warehouse
Designing High Performance ETL for Data WarehouseDesigning High Performance ETL for Data Warehouse
Designing High Performance ETL for Data Warehouse
 
BIg Data Trends in 2016
BIg Data Trends in 2016BIg Data Trends in 2016
BIg Data Trends in 2016
 
1DMP: Marketing Data Platform - the future of data-driven marketing
1DMP: Marketing Data Platform - the future of data-driven marketing1DMP: Marketing Data Platform - the future of data-driven marketing
1DMP: Marketing Data Platform - the future of data-driven marketing
 
Internet of Things and Big Data
Internet of Things and Big DataInternet of Things and Big Data
Internet of Things and Big Data
 
Oracle PL/SQL Best Practices
Oracle PL/SQL Best PracticesOracle PL/SQL Best Practices
Oracle PL/SQL Best Practices
 
Veri Ambarları için Oracle'ın Analitik SQL Desteği
Veri Ambarları için Oracle'ın Analitik SQL DesteğiVeri Ambarları için Oracle'ın Analitik SQL Desteği
Veri Ambarları için Oracle'ın Analitik SQL Desteği
 
Thank Bunny - Customer Engagement Platform
Thank Bunny - Customer Engagement PlatformThank Bunny - Customer Engagement Platform
Thank Bunny - Customer Engagement Platform
 
The Big Data Ecosystem for Financial Services
The Big Data Ecosystem for Financial ServicesThe Big Data Ecosystem for Financial Services
The Big Data Ecosystem for Financial Services
 
Cloud transition - The Trivadis approach
Cloud transition - The Trivadis approachCloud transition - The Trivadis approach
Cloud transition - The Trivadis approach
 
The Art Of Net Promoter Score
The Art Of Net Promoter ScoreThe Art Of Net Promoter Score
The Art Of Net Promoter Score
 
Galvanize Data Science Open House
Galvanize Data Science Open HouseGalvanize Data Science Open House
Galvanize Data Science Open House
 
Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kur...
Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kur...Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kur...
Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kur...
 
Your data structures are made of maths!
Your data structures are made of maths!Your data structures are made of maths!
Your data structures are made of maths!
 
DGIQ 2015 The Fundamentals of Data Quality
DGIQ 2015 The Fundamentals of Data QualityDGIQ 2015 The Fundamentals of Data Quality
DGIQ 2015 The Fundamentals of Data Quality
 
Appraiser: How Airbnb Generates Complex Models in Spark for Demand Prediction...
Appraiser: How Airbnb Generates Complex Models in Spark for Demand Prediction...Appraiser: How Airbnb Generates Complex Models in Spark for Demand Prediction...
Appraiser: How Airbnb Generates Complex Models in Spark for Demand Prediction...
 
Architecting for change: LinkedIn's new data ecosystem
Architecting for change: LinkedIn's new data ecosystemArchitecting for change: LinkedIn's new data ecosystem
Architecting for change: LinkedIn's new data ecosystem
 

Similaire à Building a New Platform for Customer Analytics

Architecting for Big Data: Trends, Tips, and Deployment Options
Architecting for Big Data: Trends, Tips, and Deployment OptionsArchitecting for Big Data: Trends, Tips, and Deployment Options
Architecting for Big Data: Trends, Tips, and Deployment OptionsCaserta
 
Balancing Data Governance and Innovation
Balancing Data Governance and InnovationBalancing Data Governance and Innovation
Balancing Data Governance and InnovationCaserta
 
BAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneyBAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneySai Paravastu
 
Building the Artificially Intelligent Enterprise
Building the Artificially Intelligent EnterpriseBuilding the Artificially Intelligent Enterprise
Building the Artificially Intelligent EnterpriseDatabricks
 
Incorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic ArchitectureIncorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic ArchitectureCaserta
 
Big Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data LakeBig Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data LakeCaserta
 
Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Caserta
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data ScienceCaserta
 
Big Data Analytics with Microsoft
Big Data Analytics with MicrosoftBig Data Analytics with Microsoft
Big Data Analytics with MicrosoftCaserta
 
Why an AI-Powered Data Catalog Tool is Critical to Business Success
Why an AI-Powered Data Catalog Tool is Critical to Business SuccessWhy an AI-Powered Data Catalog Tool is Critical to Business Success
Why an AI-Powered Data Catalog Tool is Critical to Business SuccessInformatica
 
Derfor skal du bruge en DataLake
Derfor skal du bruge en DataLakeDerfor skal du bruge en DataLake
Derfor skal du bruge en DataLakeMicrosoft
 
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...DATAVERSITY
 
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)Denodo
 
Bridging the Gap: Analyzing Data in and Below the Cloud
Bridging the Gap: Analyzing Data in and Below the CloudBridging the Gap: Analyzing Data in and Below the Cloud
Bridging the Gap: Analyzing Data in and Below the CloudInside Analysis
 
Accelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and VisualizationAccelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and VisualizationDenodo
 
Analytics in a Day Virtual Workshop
Analytics in a Day Virtual WorkshopAnalytics in a Day Virtual Workshop
Analytics in a Day Virtual WorkshopCCG
 
DataCanvas: Big Data Analytic Flow in Cloud
DataCanvas: Big Data Analytic Flow in CloudDataCanvas: Big Data Analytic Flow in Cloud
DataCanvas: Big Data Analytic Flow in CloudLei Fang
 
Graph Databases – Benefits and Risks
Graph Databases – Benefits and RisksGraph Databases – Benefits and Risks
Graph Databases – Benefits and RisksDATAVERSITY
 
Knowledge Graphs Webinar- 11/7/2017
Knowledge Graphs Webinar- 11/7/2017Knowledge Graphs Webinar- 11/7/2017
Knowledge Graphs Webinar- 11/7/2017Neo4j
 
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...Dataconomy Media
 

Similaire à Building a New Platform for Customer Analytics (20)

Architecting for Big Data: Trends, Tips, and Deployment Options
Architecting for Big Data: Trends, Tips, and Deployment OptionsArchitecting for Big Data: Trends, Tips, and Deployment Options
Architecting for Big Data: Trends, Tips, and Deployment Options
 
Balancing Data Governance and Innovation
Balancing Data Governance and InnovationBalancing Data Governance and Innovation
Balancing Data Governance and Innovation
 
BAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneyBAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, Sydney
 
Building the Artificially Intelligent Enterprise
Building the Artificially Intelligent EnterpriseBuilding the Artificially Intelligent Enterprise
Building the Artificially Intelligent Enterprise
 
Incorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic ArchitectureIncorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic Architecture
 
Big Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data LakeBig Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data Lake
 
Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Big Data Analytics with Microsoft
Big Data Analytics with MicrosoftBig Data Analytics with Microsoft
Big Data Analytics with Microsoft
 
Why an AI-Powered Data Catalog Tool is Critical to Business Success
Why an AI-Powered Data Catalog Tool is Critical to Business SuccessWhy an AI-Powered Data Catalog Tool is Critical to Business Success
Why an AI-Powered Data Catalog Tool is Critical to Business Success
 
Derfor skal du bruge en DataLake
Derfor skal du bruge en DataLakeDerfor skal du bruge en DataLake
Derfor skal du bruge en DataLake
 
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
 
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)
 
Bridging the Gap: Analyzing Data in and Below the Cloud
Bridging the Gap: Analyzing Data in and Below the CloudBridging the Gap: Analyzing Data in and Below the Cloud
Bridging the Gap: Analyzing Data in and Below the Cloud
 
Accelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and VisualizationAccelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and Visualization
 
Analytics in a Day Virtual Workshop
Analytics in a Day Virtual WorkshopAnalytics in a Day Virtual Workshop
Analytics in a Day Virtual Workshop
 
DataCanvas: Big Data Analytic Flow in Cloud
DataCanvas: Big Data Analytic Flow in CloudDataCanvas: Big Data Analytic Flow in Cloud
DataCanvas: Big Data Analytic Flow in Cloud
 
Graph Databases – Benefits and Risks
Graph Databases – Benefits and RisksGraph Databases – Benefits and Risks
Graph Databases – Benefits and Risks
 
Knowledge Graphs Webinar- 11/7/2017
Knowledge Graphs Webinar- 11/7/2017Knowledge Graphs Webinar- 11/7/2017
Knowledge Graphs Webinar- 11/7/2017
 
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
 

Dernier

9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024Timothy Spann
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degreeyuu sss
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 

Dernier (20)

9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 

Building a New Platform for Customer Analytics

  • 1. @joe_Caserta@joe_Caserta Webinar: Building A New Platform for Customer Analytics Presented By Sponsored By Nov 16, 2016 @joe_Caserta @casertaconcepts @databricks
  • 2. @joe_Caserta Joe Caserta Timeline Launched Big Data practice Co-author, with Ralph Kimball, The Data Warehouse ETL Toolkit (Wiley) Data Analysis, Data Warehousing and Business Intelligence since 1996 Began consulting database programing and data modeling 25+ years hands-on experience building database solutions Founded Caserta Concepts in NYC Web log analytics solution published in Intelligent Enterprise magazine Launched Data Science, Data Interaction and Cloud practices Laser focus on extending Data Analytics with Big Data solutions 1986 2004 1996 2009 2001 2013 2012 2014 Dedicated to Data Governance Techniques on Big Data (Innovation) Awarded Top 20 Big Data Companies 2016 Top 20 Most Powerful Big Data consulting firms Launched Big Data Warehousing (BDW) Meetup NYC: 2,000+ Members 2016 Awarded Fastest Growing Big Data Companies 2016 Established best practices for big data ecosystem implementations
  • 3. @joe_Caserta About Caserta Concepts – Consulting Data Innovation and Modern Data Engineering – Award-winning company – Internationally recognized work force – Strategy, Architecture, Implementation, Governance – Innovation Partner – Strategic Consulting – Advanced Architecture – Build & Deploy • Leader in Enterprise Data Solutions – Big Data Analytics – Data Warehousing – Business Intelligence Data Science Cloud Computing Data Governance
  • 4. @joe_Caserta Caserta Client Portfolio Retail/eCommerce & Manufacturing Finance, Healthcare & Insurance Digital Media/AdTech Education & Services
  • 5. @joe_Caserta Awards & Recognition Top 10 Fastest Growing Big Data Companies 2016
  • 7. @joe_Caserta Use Case Objectives • Cross-Channel Behavior Tracking / Identity Resolution • Access to Atomic Level Transaction Data • Blend Data Assets from Multiple Marketing Channels • Fast Data Onboarding and Discovery • Ensure Data Quality • Single platform / Landscape simplification • Understand Path-to-Purchase Customer Journey • Improve Customer Experience & Increase Sales
  • 8. @joe_Caserta The Recipe: Build a Dynamic Data Platform OLD WAY: • Structure Data  Ingest Data  Analyze Data • Fixed Capacity • Monolith NEW WAY: • Ingest Data  Analyze Data  Structure Data • Dynamic Capacity • Ecosystem RECIPE: • Cloud • Data Lake • Holistic Architecture & Framework
  • 9. @joe_Caserta Analytics: The Whole Brain Challenge Front Back Analytics Oriented • Data Science • Research Process Oriented • Data Governance • Compliance Operations Oriented • Shared Services • Data Engineering Revenue Oriented • Revenue Goals • Monetizing Data
  • 10. @joe_Caserta Chief Data Organization (Oversight) Vertical Business Area [Sales/Finance/Marketing/Operations/Customer Svc] Product Owner SCRUM Master Development Team Business Subject Matter Expertise Data Librarian/Data Stewardship Data Science/ Statistical Skills Data Engineering / Architecture Presentation/ BI Report Development Skills Data Quality Assurance DevOps IT Organization (Oversight) Enterprise Data Architect Solution Engineers Data Integration Practice User Experience Practice QA Practice Operations Practice Advanced Analytics Business Analysts Data Analysts Data Scientists Statisticians Data Engineers Planning Organization Project Managers Data Organization Data Gov Coordinator Data Librarians Data Stewards It Takes a Village!
  • 12. @joe_Caserta Global economics Intensity of competition Reduce costs Move to cross-functional teams New executive leadership Speed of technical change Social trends and changes Period of time in present role Status & perks of office/dept under threat No apparent reasons for proposed changes Lack of understanding of proposed changes Fear of inability to cope with new technology Concern over job security Forces for Change Forces Resisting Change Status Quo Moving the Status Quo http://www.change-management-coach.com/force-field-analysis.html
  • 13. @joe_Caserta The Data Lake Paradigm Technology: • Scalable distributed storage  S3 • Pluggable fit-for-purpose processing  EMR • Consistent extensible framework  Spark • Dimensional Data Warehouse  Redshift Functional Capabilities: • Remove barriers between data ingestion and analysis • Democratize Data with Just Enough Data Governance
  • 15. @joe_Caserta Why Spark? “Big Box” tools vs ROI? – Prohibitively expensive  limited by licensing $$$ – Typically limited to the scalability of a single server We Spark! • Development local or distributed is identical • Beautiful high level API’s • Full universe of Python modules • Open source and Free • Blazing fast! • Databricks cloud makes it easier Spark has become our default processing engine for a myriad of engineering & science problems
  • 16. @joe_Caserta Why we Databricks • Interactive UI • Includes a workspace with notebooks, dashboards, job scheduler, point-and-click cluster management • Cluster sharing • Multiple users can connect to the same cluster, saving cost • Security features • Access controls to the whole workspace, clusters • Collaboration • Multi-user access to the same notebook, revision control, and IDE and GitHub integration • Data management • Support for connecting different data sources to Spark, caching service to speed up queries
  • 17. @joe_Caserta Ingest Raw Data Organize, Define, Complete Munging, Blending Machine Learning Data Quality and Monitoring Metadata, ILM , Security Data Catalog Data Integration Fully Governed ( trusted) Arbitrary/Ad-hoc Queries and Reporting Big Data Ware house Data Science Workspace Data Lake – Integrated Sandbox Landing Area – Source Data in “Full Fidelity” Usage Pattern Data Governance Metadata, ILM, Security Corporate Data Pyramid (CDP)
  • 18. @joe_Caserta Data Integration Identity Resolution Data Quality Discovery / Exploration Machine Learning Models Development Reports / Dashboards Applications APIs Structured Data Unstructured Data SQL, NoSQL, Object Store Find Share Collaborate Data Engineer Data Scientist Business Analyst App Developer Analyze Persist DeployIngest Data Lake on the Cloud
  • 20. @joe_Caserta Type Comments Single Touch Rules-Based Statistically Driven Assign the credit to the first or last exposure Assign the credit to each interaction based on business rules Assign the credit to interactions based on data-driven model Ad-Click Mailing MailingE-mail E-mailAd-Click Ad-Click 100% 33% 33% 33% 27% 49% 24% - Last touch only - Ignores bulk of customer journey - Undervalues other interactions and influencers - Subjective - Assigns arbitrary values to each interaction - Lacks analytics rigor to determine weights ü Looks at full behavior patterns ü Consider all touch points ü Can apply different models for best results ü Use data to find correlations between touch points (winning combinations) Path-to-Purchase Methods
  • 21. @joe_Caserta Unifying the Customer Across Channels Customer Data Integration (CDI): Match and manage customer information from all available sources Marketing channels: DMP, Salesforce, Adobe, Social, Direct Mail, Call Center, CRM In other words… We need to figure out how to LINK people across systems!
  • 22. @joe_Caserta Mastering Master Data is Still MDM Standardize Match Survivorship Validate
  • 23. @joe_Caserta Standardization and Matching Cleanse and Parse: • Names • Resolve nicknames • Create deterministic hash, phonetic representation • Addresses • Emails • Phone Numbers Matching: Join based on combinations of cleansed and standardized data to create match results: Spark map operations: • Data cleansing, transformation, and standardization – Address Parsing: usaddress, postal- address, etc – Name Hashing: fuzzy, etc – Genderization: sexmachine, etc
  • 24. @joe_Caserta Mastering Unmanageable Source Data Reveal • Wait for the customer to “reveal” themselves • Create link between anonymous self and known profile Vector • May need behavioral statistical profiling • Compare use vectors Rebuild • Recluster all prior activities • Rebuild the Graph
  • 25. @joe_Caserta The Matching Process The matching process output gives us the relationships between customers: Great, but it’s not very useable, you need to traverse the dataset to find out 1234 and 1235 are the same person (and this is a trivial case) And we need to cluster and identify our survivors (vertex) xid yid match_type 1234 4849 phone 4849 5499 email 5499 1235 address 4849 7788 cookie 5499 7788 cookie 4849 1234 phone
  • 26. @joe_Caserta Graph to the Rescue 1234 4849 5499 7788 We just need to import our edges into a graph and “dump” out communities Don’t think table… think Graph! These matches are actually communities 1235
  • 27. @joe_Caserta Connected Components algorithm labels each connected component of the graph with the ID of its lowest-numbered vertex This lowest number vertex can serve as our “survivor” (not field survivorship) Connected Components xid yid 1234 4849 1234 5499 1234 1235 1234 7788 1234 7788 1234 1234
  • 30. @joe_Caserta The Notebook is the Data Science Tool
  • 31. @joe_Caserta The Redshift DW is still Dimensional
  • 33. @joe_Caserta The Goal: Top Conversion Paths Source: Accelerom AG, Zurich
  • 34. @joe_Caserta • Data Lake: S3 / NoSQL or SQL as Needed • Identity Resolution: Spark / Graph Frames • ETL: Spark/Python • Path-to-Purchase: S3 / Spark / MLlib • Data Warehouse: RedShift • Shared Interface: Notebooks • Business Intelligence: Tableau Recap
  • 35. @joe_Caserta Joe Caserta President, Caserta Concepts joe@casertaconcepts.com @joe_Caserta • Award-winning company • Transformative Data Strategies • Modern Data Engineering • Advanced Architecture • Innovation Partner • Strategic Consulting • Advanced Technical Design • Build & Deploy Solutions • BDW Meetup • New York City • 3,000+ members • Knowledge sharing Data is not important, it’s what you do with it that’s important! Thank You