Ravi Pillala, Chief Data Architect & Distinguished Engineer
Modernizing Analytics & AI for today’s needs:
Intuit TurboTax Case Study
7/21/2022
©2021 Intuit Inc. All rights reserved. 2
Who we serve
Consumers
Small businesses
Self-employed
Unique consumer and small business assets at scale
RETURNING TURBOTAX CUSTOMER
Liam
Married 2 years ago; last year he claimed his daughter, Candace, as a dependent. This year his ex-wife will claim their daughter.
Recently left his job at Toyota to work for Honda
Had been renting, but just bought a condo
Goal
To be confident he can file easily with TurboTax, given all the changes in his life.
First time filers
Liam
Goal
To be confident she can file easily with TurboTax to get the maximum refund possible.
Powering Prosperity with AI and Data-driven platforms
Key areas to focus
Behavioral Analytics
● Event Collection Standards
● Customer intents
● Personalization
Analytics at Scale
● Analytics tech stack
● Separate storage & processing
● Real-time analytics
Data Discovery/Understanding
● Data Documentation
● Tools to explore data
● Data Stewardship
● Centralized Governance
● Data Lineage
Event Collection (From → To)
From: Behavior Analytics - Event Collection
Legacy Clickstream Architecture
Data available for consumption after 4 hours to 1 day
Legacy Payload
fid : 75F773438B1D0E25-3DDB5C9586B1731B
cc : USD
ch : support
c1 : TT_S_SQ_COOKIE
c2 : 1588699149406
c4 : fecb4198593190599779
c5 : Customer Care
c6 : sh-view
c7 : Help System<mytt
c14 : View>LCQ>4331716>>2>IL
c19 : ViewWidget
c34 : en-US
c36 : websdk-prod
c44 : HPArticle<MYTT:undefined<expert_approved_ugc:false
v3 : display:viewWidget
pageName : MYTT/sh-view
v47 : https://ttlc.intuit.com/questions/4331716
WHERE?
WHAT?
WHO?
Unreadable and difficult to use
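To make the readability problem concrete, here is a minimal sketch that parses a few lines of the legacy payload and flags the keys that say nothing about what they hold. The `OPAQUE_KEY` pattern and both function names are illustrative assumptions, not part of Intuit's tooling.

```python
import re

# Assumption: a key counts as "opaque" when it is a single letter followed
# by digits (c5, c19, v3). Readable names such as pageName do not match.
OPAQUE_KEY = re.compile(r"^[a-z]\d+$")

# A few lines copied from the legacy payload slide.
LEGACY_SAMPLE = """\
ch : support
c5 : Customer Care
c6 : sh-view
c19 : ViewWidget
v3 : display:viewWidget
pageName : MYTT/sh-view
"""


def parse_payload(text: str) -> dict[str, str]:
    """Split each 'key : value' line on the first ' : ' separator."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        if " : " not in line:
            continue  # skip blank or malformed lines
        key, value = line.split(" : ", 1)
        fields[key.strip()] = value.strip()
    return fields


def opaque_keys(fields: dict[str, str]) -> list[str]:
    """Return the keys whose names carry no meaning on their own."""
    return [key for key in fields if OPAQUE_KEY.match(key)]


payload = parse_payload(LEGACY_SAMPLE)
print(opaque_keys(payload))
```

Any consumer of such a payload needs an external lookup table to learn that a key like `c5` holds a support category, which is the gap the WHERE/WHAT/WHO annotations point at.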
To: Behavior Analytics - Event Collection
Amplitude
Adobe
Braze
Rainbow Properties
Who: ivid, pseudonym_id
What (logical): action, object, object_detail
What (behavioral): ui_action, ui_object, ui_object_detail, ui_access_point
Domain: org, purpose, scope
Where: screen, scope_area
Event Collection Standards (ECS) - Standard Event Tracking Example
WHO
WHAT
org : cg
purpose : prod
scope : turbotax
event sender name : oihs/contact-us-plugin/widget
event sender purpose : care
event sender scope : contactus
event sender screen : questionStep
event : content : engaged
object : content
action : engaged
search term : I haven't received my refund yet and I need to know what's the problem.
ui action : clicked
ui object : button
ui object detail : Continue
workflow id : 7fa8d4d6-6fb5-41c0-b2d5-971742227b6c
topic name : cg-turbotax-clickstream
timestamp : 2020-05-04T06:27:41.799Z
userId : 20abd451b935d4c27ad417a258f15ccba
*** This example only includes a specific subset of attributes ***
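As a rough illustration of what a standard event buys you, the sketch below models a subset of the example attributes as a typed record. The class name `StandardEvent`, the snake_case field names, and the `missing_fields` check are assumptions made for this sketch; only the attribute values come from the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandardEvent:
    """Hypothetical typed view of a subset of the ECS example attributes."""

    # Where the event came from
    org: str
    purpose: str
    scope: str
    screen: str
    # What happened (logical)
    obj: str          # "object" in the slide; renamed to avoid the builtin
    action: str
    # What happened (behavioral)
    ui_action: str
    ui_object: str
    ui_object_detail: str
    # Who and when
    user_id: str
    timestamp: str

    @property
    def event(self) -> str:
        # The example shows event as "<object> : <action>".
        return f"{self.obj} : {self.action}"

    def missing_fields(self) -> list[str]:
        """Names of attributes that were left empty."""
        return [name for name, value in vars(self).items() if not value]


evt = StandardEvent(
    org="cg",
    purpose="prod",
    scope="turbotax",
    screen="questionStep",
    obj="content",
    action="engaged",
    ui_action="clicked",
    ui_object="button",
    ui_object_detail="Continue",
    user_id="20abd451b935d4c27ad417a258f15ccba",
    timestamp="2020-05-04T06:27:41.799Z",
)
print(evt.event)
```

Unlike the legacy keys, every attribute name here states what it holds, so a reader can answer who, what and where without a lookup table.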
Intuit analytics journey before modernization
Reporting silos → MPP appliance → Hadoop data lake → New MPP appliance → Migrated to Cloud
Lift and Shift to AWS
Data Sources: Applications, Behavioral, 3rd Party
Data Lake: Hive Metastore, Data
EC2, EBS
Processing: EMR Cluster, Batch, Stream
MPP
Data Workers
Tables: 50K, Data: 2.5 PB, ETLs: 10K, Queries: 500K, Users: 2,000, ETL Users: 60
Phase 2: Migrating to Redshift (Modernizing analytics)
Data Sources: Applications, Behavioral, 3rd Party
Data Lake: Hive Metastore, Data
ETL Processing: EMR Cluster (Batch, Stream), AWS Glue, Redshift ETL
Athena, Redshift Reporting, Dashboards
Data Workers
Tables: 10K, Data: 400 TB, ETLs: 3K, Queries: 130K, Users: 2,000, ETL Users: N/A
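The footprint figures quoted for the lift-and-shift phase and the Redshift phase can be put side by side with a little arithmetic. The sketch below is only that: it assumes decimal units (2.5 PB = 2,500 TB), and the slides do not say whether the Redshift-phase numbers describe a consolidated platform or only the migrated subset, so the result is labeled a share rather than a reduction. The dictionary and function names are made up for the example.

```python
# Figures from the two phase slides; data volume normalized to TB
# under the assumption that 1 PB = 1,000 TB.
LIFT_AND_SHIFT = {"tables": 50_000, "data_tb": 2_500, "etls": 10_000, "queries": 500_000}
REDSHIFT_PHASE = {"tables": 10_000, "data_tb": 400, "etls": 3_000, "queries": 130_000}


def share(after: float, before: float) -> float:
    """Redshift-phase figure as a percentage of the lift-and-shift figure."""
    return round(100 * after / before, 1)


shares = {
    metric: share(REDSHIFT_PHASE[metric], LIFT_AND_SHIFT[metric])
    for metric in LIFT_AND_SHIFT
}

for metric, pct in shares.items():
    print(f"{metric}: {pct}% of the lift-and-shift footprint")
```

On these numbers the Redshift phase covers roughly a fifth of the tables and a sixth of the data volume, while the user count stays at 2,000 in both phases.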
Modernized analytics platform with Redshift
● Amazon Redshift managed storage
● Data sharing
● Amazon Redshift Spectrum
● Concurrency scaling
● Elasticity
Processors and Pipelines
● Serial processors (e.g., reusable intermediate topic)
● Parallel processors (e.g., fleet deployment)
● Processor = Business Logic & Code
● Pipeline = Deployment & Infrastructure
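The split between processors and pipelines can be sketched in a few lines. This is an in-memory stand-in, not Intuit's streaming platform: a plain list plays the role of the reusable intermediate topic, a thread pool stands in for a fleet deployment, and the processor functions `enrich` and `tag_scope` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Event = dict
Processor = Callable[[Event], Event]  # processor = business logic only


def enrich(event: Event) -> Event:
    """Hypothetical processor: derive the event name from object and action."""
    return {**event, "event": f"{event['object']} : {event['action']}"}


def tag_scope(event: Event) -> Event:
    """Hypothetical processor: stamp the product scope on the event."""
    return {**event, "scope": "turbotax"}


def run_serial(events: list[Event], first: Processor, second: Processor) -> list[Event]:
    """Serial processors: the first writes to an intermediate topic,
    the second reads from it. Other consumers could reuse that topic."""
    intermediate_topic = [first(e) for e in events]
    return [second(e) for e in intermediate_topic]


def run_parallel(events: list[Event], processor: Processor, workers: int = 4) -> list[Event]:
    """Parallel processors: several workers run the same logic.
    pool.map returns results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(processor, events))


events = [{"object": "content", "action": "engaged"}]
print(run_serial(events, enrich, tag_scope))
print(run_parallel(events * 3, enrich))
```

The processor functions never mention threads or topics, and the two `run_*` helpers never mention business rules, which mirrors the "Processor = Business Logic & Code, Pipeline = Deployment & Infrastructure" division.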
Tech Stack Overview
● Processor CI/CD Layer
● UX Layer
● Control Layer
● Runtime Layer
● Infrastructure Layer
● Application Layer
● Pipeline CI/CD Layer
Diagram labels: Customer Experience, Behind-the-scenes
Our Data Ecosystem is big, complex and messy...
We have a lot of data, which is great, but it is very hard to discover and to figure out what to use
200,000+ Tables
3,000+ Schemas
200+ Data Sources
DATA SOURCES: Internal, External/3P
DATA LAKE: DATA MARTS, CURATED DATA, RAW DATA, ANALYST PROCESSED
DATA WAREHOUSE(S): SELECT RAW DATA, DATA MARTS, REPORTING TABLES & more
Pradeep
What are the Core Personas and why is data important to them
Our Users
I am a DATA SCIENTIST building ML models and often use data produced by BU/FG Analysts. I would like to know the owner, data quality and reliability of the data I want to use.
I am a BU DEVELOPER trying to see if data produced by the newly launched service is being ingested accurately into the lake for downstream consumption.
I am a DATA ENGINEER building pipelines for data marts and trying to choose the right data for my use-case and get alerted when metadata changes occur so I can ensure my pipelines continue to work properly.
I am a BUSINESS ANALYST trying to build Dashboards to report on KPIs for a newly launched product Feature. I need to find data that I can trust and use for my analysis.
I am an ENTITY DATA STEWARD curating Data Map entities in my domain for downstream use. I need to query the raw data to produce the entities.
Veena
Understanding user problems we need to solve
What is making data discovery and exploration hard for our data workers?
Where can I find the data?
What does the data mean?
Can I trust the data?
How is the data connected?
How can I get access to data?
Which data source to use?
When to use what tool?
Why are my queries slow?
DISCOVERY EXPLORATION
Ideal State
What do users need for a great Data Discovery & Exploration experience?
A tool that helps our data workers to
● easily find relevant data that is well-documented, reliable and trustworthy, by providing quality metrics like data freshness and completeness, the ability to quickly reach out to the owner for clarifications, and a view of similar data and joins to solve the use-case
● seamlessly request access, run queries against blazing-fast, performant engines, and reuse & share their work
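The two quality metrics named here, freshness and completeness, are simple to compute once a table's rows and last-update time are available. The following is a minimal sketch under assumed definitions (completeness as the fraction of non-empty values in a column, freshness as time since the last update); the function names and the 24-hour threshold are illustrative, not taken from the talk.

```python
from datetime import datetime, timedelta, timezone


def completeness(rows: list[dict], column: str) -> float:
    """Fraction of rows where the column is present and non-empty."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(column) not in (None, ""))
    return filled / len(rows)


def freshness(last_updated: datetime, now: datetime) -> timedelta:
    """Time elapsed since the table was last updated."""
    return now - last_updated


def is_fresh(last_updated: datetime, now: datetime,
             max_age: timedelta = timedelta(hours=24)) -> bool:
    """Assumed rule: data counts as fresh if updated within max_age."""
    return freshness(last_updated, now) <= max_age


rows = [
    {"user_id": "a"},
    {"user_id": None},
    {"user_id": "c"},
    {"user_id": ""},
]
now = datetime(2022, 7, 21, 12, 0, tzinfo=timezone.utc)
last_updated = datetime(2022, 7, 21, 6, 0, tzinfo=timezone.utc)

print(completeness(rows, "user_id"))
print(freshness(last_updated, now))
print(is_fresh(last_updated, now))
```

A catalog could surface numbers like these next to each table so that a data worker can judge trustworthiness before running a query.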
Veena
solve it!
OUR APPROACH
Data Map: Organize and govern data across Intuit
Data Discovery: Build a rich data discovery (catalog) experience for all our data in the lake & warehouses
Data Exploration: Buy a superior data exploration tool for all our data
- powered by MDR
Data Discovery app
Data Exploration
Work In Progress!!!
AWS Glue/ Lake Formation: Data Lake Design
Data Lake & Data Mesh
Tax
Work
Commerce
Finance
Q&A
Intuit’s Journey
1980s: ERA OF DOS
1990s: ERA OF WINDOWS
2000s: ERA OF WEB
2010s: ERA OF MOBILE AND CLOUD
2020 to Present*: ERA OF ARTIFICIAL INTELLIGENCE
Axis label: DATA VOLUME PER CUSTOMER
Intuit Founded
Customers: 1.3M, Revenue: $33M, Digital Footprint: MBs
Customers: 5.6M, Revenue: $1B, Digital Footprint: GBs
Customers: 29M, Revenue: $3.5B, Digital Footprint: TBs
Customers: 102M, Revenue: $9.6B, Digital Footprint: PBs
2019: Analytical Platform on AWS
2021: Analytics powered by Redshift

 
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlCall Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlkumarajju5765
 

Dernier (20)

Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlCall Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
 

Modernizing Analytics & AI at Intuit: A Case Study

  • 1. Modernizing Analytics & AI for today’s needs: Intuit TurboTax Case Study. Ravi Pillala, Chief Data Architect & Distinguished Engineer. 7/21/2022
  • 2. ©2021 Intuit Inc. All rights reserved.
  • 4. Unique consumer and small business assets at scale
  • 5. Returning TurboTax customer: Liam. Married 2 years ago; last year he claimed his daughter, Candace, as a dependent. This year his ex-wife will claim their daughter. Recently left his job at Toyota to work for Honda. Had been renting, but just bought a condo. Goal: to be confident he can file easily with TurboTax, given all the changes in his life.
  • 6. First-time filers: Liam. Goal: to be confident she can file easily with TurboTax to get the maximum refund possible.
  • 8. Powering Prosperity with AI and Data-driven platforms
  • 10. Key areas to focus. Behavioral Analytics: Event Collection Standards; Customer intents; Personalization. Analytics at Scale: Analytics tech stack; Separate storage & processing; Real-time analytics. Data Discovery/Understanding: Data Documentation; Tools to explore data; Data Stewardship; Centralized Governance; Data Lineage.
  • 11. Event Collection (From → To)
  • 12. From: Behavior Analytics - Event Collection
  • 13. Legacy Clickstream Architecture. Data available for consumption after 4 hours to 1 day.
  • 14. Legacy Payload (slide callouts: WHERE? WHAT? WHO? Unreadable and Difficult to Use)
      fid : 75F773438B1D0E25-3DDB5C9586B1731B
      cc : USD
      ch : support
      c1 : TT_S_SQ_COOKIE
      c2 : 1588699149406
      c4 : fecb4198593190599779
      c5 : Customer Care
      c6 : sh-view
      c7 : Help System<mytt
      c14 : View>LCQ>4331716>>2>IL
      c19 : ViewWidget
      c34 : en-US
      c36 : websdk-prod
      c44 : HPArticle<MYTT:undefined<expert_approved_ugc:false
      v3 : display:viewWidget
      pageName : MYTT/sh-view
      v47 : https://ttlc.intuit.com/questions/4331716
  • 15. To: Behavior Analytics - Event Collection. Amplitude, Adobe, Braze
  • 16. Rainbow Properties
      What (logical) : action, object, object_detail
      What (behavioral) : ui_action, ui_object, ui_object_detail, ui_access_point
      Domain : org, purpose, scope
      Where : screen, scope_area
      Who : ivid, pseudonym_id
  • 17. Event Collection Standards (ECS) - Standard Event Tracking Example (slide callouts: WHO, WHAT)
      org : cg
      purpose : prod
      scope : turbotax
      event sender name : oihs/contact-us-plugin/widget
      event sender purpose : care
      event sender scope : contactus
      event sender screen : questionStep
      event : content : engaged
      object : content
      action : engaged
      search term : I haven't received my refund yet and I need to know what's the problem.
      ui action : clicked
      ui object : button
      ui object detail : Continue
      workflow id : 7fa8d4d6-6fb5-41c0-b2d5-971742227b6c
      topic name : cg-turbotax-clickstream
      timestamp : 2020-05-04T06:27:41.799Z
      userId : 20abd451b935d4c27ad417a258f15ccba
      *** This example only includes a specific subset of attributes ***
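A minimal sketch of how the standardized event in slide 17 might be modeled in code, to contrast it with the opaque `c1`…`c44` keys of the legacy payload. The class name `EcsEvent`, the Python field names, and the rule that the event name is `object:action` are assumptions made for illustration; the slides show only the attribute values, not Intuit's actual schema or SDK.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EcsEvent:
    """Hypothetical container for a subset of the attributes shown on slide 17."""
    # Domain
    org: str
    purpose: str
    scope: str
    # Where
    screen: str
    # What (logical); `obj` stands in for the ECS attribute named "object"
    obj: str
    action: str
    # What (behavioral)
    ui_action: str
    ui_object: str
    ui_object_detail: str
    # Who and when
    user_id: str
    timestamp: str

    @property
    def event(self) -> str:
        # Assumption: the event name is derived as "<object>:<action>".
        return f"{self.obj}:{self.action}"

    def to_json(self) -> str:
        payload = asdict(self)
        payload["object"] = payload.pop("obj")  # restore the attribute name from the slide
        payload["event"] = self.event
        return json.dumps(payload, sort_keys=True)


example = EcsEvent(
    org="cg", purpose="prod", scope="turbotax",
    screen="questionStep",
    obj="content", action="engaged",
    ui_action="clicked", ui_object="button", ui_object_detail="Continue",
    user_id="20abd451b935d4c27ad417a258f15ccba",
    timestamp="2020-05-04T06:27:41.799Z",
)
print(example.to_json())
```

Naming each attribute means a reader can answer who, what, and where from the payload itself, without a lookup table for positional keys.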
  • 19. Intuit analytics journey before modernization: Reporting silos → MPP appliance → Hadoop data lake → New MPP appliance → Migrated to Cloud
  • 20. Lift and Shift to AWS. Diagram labels: MPP; Data Lake; Data Sources (Applications, Behavioral, 3rd Party); Hive Metastore; Data; EC2; EBS; … …; EMR Cluster; Batch; Stream; Processing; Data Workers. Scale: Tables: 50K; Data: 2.5PB; ETLs: 10K; Queries: 500K; Users: 2000; ETL Users: 60
  • 21. Phase 2: Migrating to Redshift (Modernizing analytics). Diagram labels: ETL Processing; Data Lake; Data Sources (Applications, Behavioral, 3rd Party); Hive Metastore; Data; EMR Cluster; Batch; Stream; Data Workers; AWS Glue; Redshift ETL; Athena; Redshift Reporting; Dashboards. Scale: Tables: 10K; Data: 400TB; ETLs: 3K; Queries: 130K; Users: 2000; ETL Users: N/A
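As a rough worked comparison of the scale figures on slides 20 and 21, the sketch below computes each phase-2 number as a percentage of its phase-1 counterpart. It assumes 1 PB = 1000 TB and that the two slides measure comparable things; the slides do not say whether the phase-2 figures describe the whole platform or only the part moved to Redshift, so the ratios are illustrative only.

```python
# Figures copied from slides 20 and 21; data volume converted to TB (assumes 1 PB = 1000 TB).
phase1 = {"tables": 50_000, "data_tb": 2_500, "etls": 10_000, "queries": 500_000}
phase2 = {"tables": 10_000, "data_tb": 400, "etls": 3_000, "queries": 130_000}


def share_of_phase1(before: dict, after: dict) -> dict:
    """Return each phase-2 figure as a percentage of the phase-1 figure."""
    return {key: round(100 * after[key] / before[key], 1) for key in before}


shares = share_of_phase1(phase1, phase2)
for key, pct in shares.items():
    print(f"{key}: {pct}% of the phase-1 figure")
```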
  • 22. Modernized analytics platform with Redshift: Amazon Redshift managed storage; Data sharing; Amazon Redshift Spectrum; Concurrency scaling; Elasticity
  • 24. Processors and Pipelines: Serial processors (e.g., reusable intermediate topic); Parallel processors (e.g., fleet deployment); Processor = Business Logic & Code; Pipeline = Deployment & Infrastructure
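The processor/pipeline split on slide 24 can be sketched with an in-memory stand-in for a streaming platform. Everything here is hypothetical: `InMemoryBus`, `run_pipeline`, the topic names, and the three processor functions are illustrative, and reading "parallel" as two processors consuming the same intermediate topic is only one interpretation of the slide.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Event = Dict[str, str]
Processor = Callable[[Event], Event]  # Processor = business logic only


class InMemoryBus:
    """Toy replacement for a topic-based message bus."""

    def __init__(self) -> None:
        self.topics: Dict[str, List[Event]] = defaultdict(list)

    def publish(self, topic: str, event: Event) -> None:
        self.topics[topic].append(event)


def run_pipeline(bus: InMemoryBus, source: str, processor: Processor, sink: str) -> None:
    """Pipeline = wiring: read each event on `source`, apply `processor`, publish to `sink`."""
    for event in list(bus.topics[source]):
        bus.publish(sink, processor(event))


def normalize(event: Event) -> Event:
    return {**event, "ui_action": event["ui_action"].lower()}


def tag_intent(event: Event) -> Event:
    is_refund = "refund" in event.get("search_term", "").lower()
    return {**event, "intent": "refund_status" if is_refund else "unknown"}


def only_ui_fields(event: Event) -> Event:
    return {key: value for key, value in event.items() if key.startswith("ui_")}


bus = InMemoryBus()
bus.publish("raw", {"ui_action": "CLICKED", "search_term": "I haven't received my refund yet"})

# Serial: raw -> normalize -> "normalized" (reusable intermediate topic) -> tag_intent
run_pipeline(bus, "raw", normalize, "normalized")
run_pipeline(bus, "normalized", tag_intent, "intents")
# A second processor reuses the same intermediate topic independently.
run_pipeline(bus, "normalized", only_ui_fields, "ui_events")
```

Keeping the processor functions free of any bus or deployment details is what lets the same business logic be rewired into a different pipeline.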
  • 25. Tech Stack Overview. Diagram labels: Processor CI/CD Layer; UX Layer; Control Layer; Runtime Layer; Infrastructure Layer; Application Layer; Pipeline CI/CD Layer; Customer Experience; Behind-the-scenes
  • 27. Our Data Ecosystem is big, complex and messy...
  • 28. Our Data Ecosystem is big, complex and messy... We have a lot of data, which is great, but it is very hard to discover and figure out what to use. Scale: 200,000+ Tables; 3,000+ Schemas; 200+ Data Sources. Diagram labels: DATA LAKE; DATA WAREHOUSE(S); DATA MARTS; CURATED DATA; RAW DATA; ANALYST PROCESSED DATA SOURCES; SELECT RAW DATA; DATA MARTS; REPORTING TABLES & more; Internal; External/3P. (Pradeep)
  • 29. Our Users: What are the Core Personas and why is data important to them? (Veena)
      ● I am a DATA SCIENTIST building ML models and often use data produced by BU/FG Analysts. I would like to know the owner, data quality and reliability of the data I want to use.
      ● I am a BU DEVELOPER trying to see if data produced by the new service launched is being ingested accurately into the lake for downstream consumption.
      ● I am a DATA ENGINEER building pipelines for data marts and trying to choose the right data for my use-case and get alerted when metadata changes occur so I can ensure my pipelines continue to work properly.
      ● I am a BUSINESS ANALYST trying to build Dashboards to report on KPIs for a new product feature launched. I need to find data that I can trust and use for my analysis.
      ● I am an ENTITY DATA STEWARD curating Data Map entities in my domain for downstream use. I need to query the raw data to produce the entities.
  • 30. Understanding user problems we need to solve. What is making data discovery and exploration hard for our data workers? Questions across DISCOVERY and EXPLORATION: Where can I find the data? What does the data mean? Can I trust the data? How is the data connected? How can I get access to data? Which datasource to use? When to use what tool? Why are my queries slow?
  • 31. Ideal State: What do users need for a great Data Discovery & Exploration experience? A tool that helps our data workers to (1) easily find relevant data that is well-documented, reliable and trustworthy, by providing quality metrics like data freshness and completeness, the ability to quickly reach out to the owner for clarifications, and similar data and joins to solve the use-case; and (2) seamlessly request access, run queries against blazing-fast, performant engines, and reuse and share their work. (Veena)
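Slide 31 names data freshness and completeness as quality metrics but does not define them. A minimal sketch under assumed definitions: completeness as the fraction of non-null values in a column, and freshness as the hours elapsed since the last update. The function names and both definitions are assumptions, not the catalog's actual implementation.

```python
from datetime import datetime, timezone
from typing import Optional, Sequence


def completeness(values: Sequence[Optional[object]]) -> float:
    """Fraction of values that are not None; 0.0 for an empty column."""
    if not values:
        return 0.0
    return sum(value is not None for value in values) / len(values)


def freshness_hours(last_updated: datetime, now: datetime) -> float:
    """Hours elapsed between the last update and `now`."""
    return (now - last_updated).total_seconds() / 3600


# Hypothetical column sample and timestamps, for illustration only.
column = ["cg", None, "cg", "cg"]
last_load = datetime(2022, 7, 20, 12, 0, tzinfo=timezone.utc)
checked_at = datetime(2022, 7, 21, 12, 0, tzinfo=timezone.utc)

print(f"completeness: {completeness(column):.2f}")
print(f"freshness: {freshness_hours(last_load, checked_at):.1f} hours")
```

Surfacing numbers like these next to a table in a catalog is one way to address the "Can I trust the data?" question from slide 30.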
  • 33. OUR APPROACH. Data Map: Organize and govern data across Intuit. Data Discovery: Build a rich data discovery (catalog) experience for all our data in the lake & warehouses. Data Exploration: Buy a superior data exploration tool for all our data - powered by MDR
  • 34. Data Discovery app
  • 35. Data Exploration
  • 38. AWS Glue/Lake Formation: Data Lake Design
  • 39. Data Lake & Data Mesh: Tax; Work; Commerce; Finance
  • 40. Q&A
  • 41. Intuit’s Journey. Eras: ERA OF DOS (1980s); ERA OF WINDOWS (1990s); ERA OF WEB (2000s); ERA OF MOBILE AND CLOUD (2010s); ERA OF ARTIFICIAL INTELLIGENCE (2020 to Present*). Axis: DATA VOLUME PER CUSTOMER. Milestones, in slide order: Intuit Founded; Customers: 1.3M, Revenue: $33M, Digital Footprint: MBs; Customers: 5.6M, Revenue: $1B, Digital Footprint: GBs; Customers: 29M, Revenue: $3.5B, Digital Footprint: TBs; Customers: 102M, Revenue: $9.6B, Digital Footprint: PBs. 2019: Analytical Platform on AWS. 2021: Analytics powered by Redshift