Data and Analytics at Holland & Barrett
Building a "3-Michelin-star" data platform on AWS to power insights at the speed of thought
Dobo Radichkov
Chief Data Officer
7 June 2023
About Holland & Barrett
Founded in 1870, we exist to make health and wellness a way of life for everyone.
The Holland & Barrett Data & Analytics vision
To become the beating heart of the organisation and unlock success for our colleagues, customers and partners.
For our colleagues
For our customers
For our partners
The Holland & Barrett Data & Analytics vision
▪ Data platform ⭐⭐⭐
▪ Single source of truth ⭐⭐⭐
▪ Analytics & BI ⭐⭐⭐
▪ Personalisation ⭐⭐
▪ Data Science & ML ⭐⭐
▪ Health analytics ⭐
▪ Analytics in the field (stores & suppliers) ⭐
▪ Data monetisation ⭐
⭐⭐⭐ Mature ⭐⭐ Scaling ⭐ Early days
We are now in ‘Phase II’ of this journey
Phase I (2022): BUILD NEW FOUNDATION
▪ Data strategy & vision
▪ Set up data teams
▪ AWS-centric data lake
▪ Redshift data warehouse
▪ Metabase BI platform
Phase II (2023): SCALE OPERATING MODEL
▪ Complete core reporting
▪ Self-service BI
▪ Functional analytics
▪ Analytics in the field
▪ Data science & ML
Phase III (2024+): DRIVE VALUE & INNOVATION
▪ Data as driver of value:
– Increase revenue
– Reduce costs
– Improve UX
– Optimise processes
▪ Data as driver of innovation
CRAWL METAMORPHOSE WALK FLY TRANSCEND
The H&B data organisation
1. DATA ENGINEERING
§ Data lake & governance
§ Source system integration
§ Data services
2. DATA WAREHOUSE
§ Data modelling & transformations
§ Single source of truth for reporting & analytics
3. BUSINESS INTELLIGENCE
§ Management reporting
§ Operational reporting
§ Data visualisation
4. DATA SCIENCE
§ Data science and applied machine learning
§ Forecasting & optimisation
§ Personalisation
5. WEB & APP ANALYTICS
§ Product squad analytics
§ Product experimentation
6. DIGITAL ANALYTICS
§ Digital trade analytics
§ Performance marketing analytics
§ CRM analytics
“3-Michelin-star” data platform 😋
Production systems & services: AS400 (until demise), Oracle (until demise), GA4, till system, order management system, single view of stock, …
DATA LAKE, “raw ingredients & food storage”: raw systems data (security, data governance, access control)
DATA WAREHOUSE, “the kitchen & cooking process”: operational master data (customers, products, orders, stock, etc.)
“The finished meals & service”: BI & Core Reporting, Data Science / Applied ML, Product & Digital Analytics
Domains: Supply Chain, Retail Ops, Commercial, Customer, Finance
Data lake architecture
Sources
▪ On-premise DBs (legacy estate): AS400, Oracle
▪ Cloud DBs: Amazon Aurora, Amazon RDS, …
▪ DynamoDB tables
▪ APIs & SaaS
▪ Scraper (in-house crawler)
INGEST
▪ Kafka Connect (Amazon MSK)
▪ Airflow
Data lake (Amazon S3)
▪ 5,000 datasets
▪ 98k fields
▪ 10.4M files
▪ Data lake S3 buckets in JSON*, Parquet, AVRO and CSV formats
GOVERNANCE
▪ Data lake index (DynamoDB)
▪ Katalog UI and Katalog DB (Aurora PgSQL)
▪ Right to erase / access: Eraser / Accessor → Success
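The deck shows a data lake index (DynamoDB) alongside an Eraser / Accessor service for right-to-erase and right-of-access requests, but not how the two fit together. The Python sketch below is one plausible arrangement, not H&B's implementation: plain dicts stand in for the DynamoDB index and the S3 datasets, and `LAKE_INDEX`, `DATASETS`, `access` and `erase` are hypothetical names. The index records which field in each dataset holds a customer identifier, so a request only scans the datasets that can contain that customer.

```python
from typing import Any

# Hypothetical stand-in for the DynamoDB data lake index:
# dataset name -> field that holds a customer identifier.
LAKE_INDEX: dict[str, str] = {
    "orders": "customer_id",
    "loyalty_events": "member_id",
}

# Hypothetical stand-in for S3 datasets: dataset name -> records.
DATASETS: dict[str, list[dict[str, Any]]] = {
    "orders": [
        {"order_id": 1, "customer_id": "c1", "total": 12.5},
        {"order_id": 2, "customer_id": "c2", "total": 8.0},
    ],
    "loyalty_events": [{"event": "signup", "member_id": "c1"}],
    "stores": [{"store_id": 10, "city": "Leeds"}],  # not indexed, never scanned
}


def access(customer_id: str) -> dict[str, list[dict[str, Any]]]:
    """Right of access: collect every indexed record that references the customer."""
    found: dict[str, list[dict[str, Any]]] = {}
    for dataset, field in LAKE_INDEX.items():
        rows = [r for r in DATASETS.get(dataset, []) if r.get(field) == customer_id]
        if rows:
            found[dataset] = rows
    return found


def erase(customer_id: str) -> dict[str, int]:
    """Right to erase: rewrite each indexed dataset without the customer's records.

    Returns the number of records removed per affected dataset.
    """
    removed: dict[str, int] = {}
    for dataset, field in LAKE_INDEX.items():
        rows = DATASETS.get(dataset, [])
        kept = [r for r in rows if r.get(field) != customer_id]
        if len(kept) != len(rows):
            removed[dataset] = len(rows) - len(kept)
            DATASETS[dataset] = kept  # on S3 this would mean rewriting the affected files
    return removed
```

For example, `erase("c1")` returns `{'orders': 1, 'loyalty_events': 1}` and leaves `stores` untouched. Against real S3 data, erasing means rewriting the affected files (for Parquet, whole files) rather than deleting rows in place.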
Data warehouse architecture
Data warehouse (Amazon Redshift, 4 x ra3.16xlarge)
▪ 2,670 tables
▪ 2m queries / month
▪ Layered data architecture
▪ Raw data stored in SUPER columns
▪ Hourly ELT with idempotent pipelines
ELT orchestration
Data lake (Amazon S3)
▪ COPY (data ingest)
▪ External tables (Amazon Redshift Spectrum)
APIs & SaaS
Cache (Amazon Aurora)
▪ Foreign data wrapper (pg_cron for scheduling)
▪ External schema (live federated queries)
▪ Used as fast storage layer for data apps
▪ Serves raw data for ELT data pipelines
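“Hourly ELT with idempotent pipelines” means that re-running any hour's load must leave the warehouse in the same state. A minimal Python sketch of that property, assuming a raw layer that keeps each source payload as a JSON string (loosely mimicking a SUPER column) and an upsert keyed on a hypothetical `order_id` that only overwrites a row with a strictly newer version, the role a Redshift MERGE statement would play. All table and field names are illustrative.

```python
import json
from typing import Any

# Hypothetical raw-layer row: the untouched source payload as a JSON string
# (loosely mimicking a Redshift SUPER column) plus the hour it was loaded.
RawRow = dict[str, Any]
Order = dict[str, Any]


def transform(raw_rows: list[RawRow]) -> list[Order]:
    """Parse raw payloads into the typed columns of a hypothetical orders table."""
    orders = []
    for row in raw_rows:
        payload = json.loads(row["payload"])
        orders.append({
            "order_id": int(payload["id"]),
            "status": payload["status"],
            "updated_at": payload["updated_at"],  # ISO 8601 string
        })
    return orders


def merge(target: dict[int, Order], batch: list[Order]) -> None:
    """MERGE-style upsert on order_id: insert unseen keys and overwrite existing
    ones only when the incoming version is strictly newer. Replaying the same
    batch therefore leaves the target unchanged."""
    for rec in batch:
        current = target.get(rec["order_id"])
        if current is None or rec["updated_at"] > current["updated_at"]:
            target[rec["order_id"]] = rec


def run_hour(target: dict[int, Order], raw_table: list[RawRow], hour: str) -> None:
    """One hourly ELT run: select that hour's raw rows, transform them, merge."""
    merge(target, transform([r for r in raw_table if r["load_hour"] == hour]))
```

Running an hour twice, or replaying an older hour after a newer one, leaves the target unchanged because stale or identical versions are skipped. Comparing `updated_at` as strings assumes every timestamp uses the same ISO 8601 format.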
New Redshift features we are excited about
1. ROLLUP / CUBE
▪ Long-awaited improvement that helps us efficiently generate large pre-aggregated multi-dimensional cubes
▪ Great in combination with HLL functions for fast unique counts
2. DATA MASKING
▪ Create “masked” versions of tables to improve data privacy and governance
▪ Eliminates the overhead of maintaining multiple versions / slices of the data
3. OTHER
▪ MERGE to simplify our incremental data pipelines
▪ S3 auto-copy to simplify data lake ingest pipelines
▪ Aurora zero-ETL integration to simplify CDC pipelines
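To make the ROLLUP / CUBE point concrete: `GROUP BY ROLLUP(a, b)` aggregates over the grouping sets (a, b), (a) and (), while `GROUP BY CUBE(a, b)` aggregates over every subset of the dimensions. The Python sketch below reproduces those semantics in memory for illustration only; the `region` / `channel` / `sales` fields one would feed it are hypothetical. It reports rolled-up dimensions as `None`, the way SQL returns NULL, so it assumes the dimension values themselves are never `None` (SQL's `GROUPING()` function exists to resolve exactly that ambiguity).

```python
from collections import defaultdict
from itertools import combinations
from typing import Any, Optional, Sequence

Key = tuple[Optional[Any], ...]


def rollup_sets(dims: Sequence[str]) -> list[tuple[str, ...]]:
    """ROLLUP(a, b, c) -> (a, b, c), (a, b), (a,), ()"""
    return [tuple(dims[:i]) for i in range(len(dims), -1, -1)]


def cube_sets(dims: Sequence[str]) -> list[tuple[str, ...]]:
    """CUBE(a, b) -> every subset of the dimensions, 2**n grouping sets."""
    return [c for r in range(len(dims), -1, -1) for c in combinations(dims, r)]


def aggregate(
    rows: list[dict[str, Any]],
    dims: Sequence[str],
    measure: str,
    grouping_sets: list[tuple[str, ...]],
) -> dict[Key, float]:
    """Sum `measure` for each grouping set. Dimensions outside the set are
    reported as None, mirroring the NULLs SQL emits for rolled-up columns."""
    result: dict[Key, float] = {}
    for gs in grouping_sets:
        totals: dict[Key, float] = defaultdict(float)
        for row in rows:
            key = tuple(row[d] if d in gs else None for d in dims)
            totals[key] += row[measure]
        result.update(totals)
    return result
```

With two dimensions ROLLUP yields 3 grouping sets and CUBE yields 4; with n dimensions CUBE grows to 2**n sets, so producing them in one statement rather than one query per grouping set matters for large cubes.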
BI & Analytics architecture
Data warehouse (Amazon Redshift)
1. Raw data layer: raw, unmodified data from source systems, loaded by ELT from the data lake.
2. Operational data layer: clean, transformed, disaggregated entity relationship model; the starting point for all reporting & analytics. Customer, orders, product, stores, warehouse and stock master data.
3. BI data layer: semi-aggregated datasets that enable fast reporting & analytics, including pre-computed HLL sketches for efficient unique counts.
4. Cubes: multi-dimensional ROLAP cubes delivering pre-aggregated metrics along pre-defined dimensions. Best practice: CUBE/ROLLUP on top of pre-computed HLL sketches.
5. Consumers: Data IDEs (JDBC), data sharing, Athena, one-stop shop analytics, APIs.
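The best practice above works because HyperLogLog sketches are mergeable: the union of two sketches is their register-wise maximum, so unique counts can be rolled up along any dimension from pre-computed sketches without rescanning raw rows. Redshift exposes this through its HLL functions (for example `HLL_CREATE_SKETCH`, `HLL_COMBINE` and `HLL_CARDINALITY`). The Python class below is a textbook HyperLogLog written only to illustrate the idea, not Redshift's internal implementation; its SHA-1-based hashing and default precision `p=12` are arbitrary choices.

```python
import hashlib
import math


class HLL:
    """Minimal HyperLogLog sketch with 2**p registers (illustrative only)."""

    def __init__(self, p: int = 12) -> None:
        if not 7 <= p <= 16:  # the alpha formula below assumes m >= 128
            raise ValueError("p must be between 7 and 16")
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, value: str) -> None:
        # 64-bit hash from the first 8 bytes of SHA-1
        h = int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest()[:8], "big")
        w = 64 - self.p
        idx = h >> w                   # first p bits select the register
        rest = h & ((1 << w) - 1)      # remaining w bits
        rank = w - rest.bit_length() + 1  # 1-based position of the leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other: "HLL") -> "HLL":
        """Union of two sketches: register-wise max, no raw values needed."""
        if other.p != self.p:
            raise ValueError("cannot merge sketches with different precision")
        out = HLL(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)  # linear counting for small cardinalities
        return raw
```

With `p=12` the sketch has 4,096 registers and a typical relative error of about 1.04 / sqrt(4096), roughly 1.6%. Merging a sketch of customers 0 to 5,999 with one of customers 4,000 to 9,999 yields exactly the registers of a sketch built over all 10,000, so overlapping customers are not double counted when rolling up.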
Redshift enables all reporting & analytics use cases
▪ Official reporting built by the central BI team
▪ Self-service analytics done autonomously within teams
▪ Field analytics embedded in customer-facing apps
Chart: Registered users (self-service analytics)
Data Science & ML architecture
1. Develop
▪ R / Python notebooks on EC2 instances
▪ Feature engineering and model development
▪ Data from Amazon Athena and Amazon Redshift
2. Train
▪ Model training on Amazon EC2 and AWS Batch
▪ Feature extraction pipelines over Amazon Redshift and Amazon Athena (ML data layer)
3. Serve (serverless)
▪ API Gateway and AWS Lambda
▪ Aurora / RDS and DynamoDB
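On the serving side, one common serverless pattern is for training jobs to write pre-computed outputs to a key-value store that API Gateway and AWS Lambda then expose. The sketch below assumes that pattern; the deck does not say what H&B serves or how. It uses the standard Python Lambda handler signature and the API Gateway proxy event / response shape, while the in-memory `PREDICTIONS` dict stands in for a DynamoDB table and the product IDs are made up.

```python
import json
from typing import Any

# Hypothetical stand-in for a DynamoDB table of pre-computed model outputs
# (e.g. recommendations written by a batch training job).
PREDICTIONS: dict[str, list[str]] = {
    "c1": ["vitamin-d", "omega-3", "magnesium"],
}
# Hypothetical fallback for customers the model has not scored.
FALLBACK: list[str] = ["best-seller-1", "best-seller-2"]


def handler(event: dict[str, Any], context: Any = None) -> dict[str, Any]:
    """AWS Lambda handler behind an API Gateway proxy integration."""
    params = event.get("queryStringParameters") or {}  # None when no query string
    customer_id = params.get("customer_id")
    if not customer_id:
        return {"statusCode": 400, "body": json.dumps({"error": "customer_id is required"})}
    items = PREDICTIONS.get(customer_id, FALLBACK)
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"customer_id": customer_id, "items": items}),
    }
```

In a deployed function the dict lookup would become a DynamoDB read (for instance through boto3); keeping the handler a thin lookup is what lets the serving layer stay serverless while training runs elsewhere.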
H&B data drives core business value & innovation
Commercial
✓ Unit economics
✓ Store network planning
✓ Competitor intelligence
✓ Promo effectiveness
✓ Econometrics / MMM
✓ Space & range analytics
Finance
✓ Daily / weekly / monthly management reporting
✓ Operational trade reporting
✓ Intraday / peak reporting
✓ Exception reporting
Supply chain
✓ Single view of stock
✓ Forecasting & replenishment
✓ Fulfilment analytics
✓ Stock availability
✓ Clearance / overstock analytics
✓ Supplier analytics
Wellness
✓ Diagnostics
✓ Health analytics
✓ Personalised wellness
✓ Behavioural engine
Customer
✓ Single customer view
✓ Customer lifecycle management
✓ eCRM enablement
✓ Customer lifetime value
Digital
✓ Digital marketing measurement
✓ Personalisation & search
✓ OKRs
✓ UX / funnel analytics
✓ Experimentation platform
✓ Web / app event tracking
✓ SEO analytics

Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 

Data and Analytics at Holland & Barrett: Building a '3-Michelin-star' Data Platform on AWS

  • 1. Data and Analytics at Holland & Barrett: Building a "3-Michelin-star" data platform on AWS to power insights at the speed of thought. Dobo Radichkov, Chief Data Officer, 7 June 2023.
  • 2. About Holland & Barrett: founded in 1870, we exist to make health and wellness a way of life for everyone.
  • 3. The Holland & Barrett Data & Analytics vision: to become the beating heart of the organisation and unlock success for our colleagues, customers and partners.
  • 4. The Holland & Barrett Data & Analytics vision, serving our colleagues, customers and partners, by capability and maturity (⭐⭐⭐ mature, ⭐⭐ scaling, ⭐ early days):
      ▪ Data platform ⭐⭐⭐
      ▪ Single source of truth ⭐⭐⭐
      ▪ Analytics & BI ⭐⭐⭐
      ▪ Personalisation ⭐⭐
      ▪ Data Science & ML ⭐⭐
      ▪ Health analytics ⭐
      ▪ Analytics in the field (stores & suppliers) ⭐
      ▪ Data monetisation ⭐
  • 5. We are now in ‘Phase II’ of this journey:
      ▪ Phase I (2022), build new foundation: data strategy & vision; set up data teams; AWS-centric data lake; Redshift data warehouse; Metabase BI platform.
      ▪ Phase II (2023), scale operating model: complete core reporting; self-service BI; functional analytics; analytics in the field; data science & ML.
      ▪ Phase III (2024+), drive value & innovation: data as a driver of value (increase revenue, reduce costs, improve UX, optimise processes); data as a driver of innovation.
      ▪ Maturity path: CRAWL, METAMORPHOSE, WALK, FLY, TRANSCEND.
  • 6. The H&B data organisation:
      ▪ (1) DATA ENGINEERING: data lake & governance; source system integration; data services.
      ▪ (2) DATA WAREHOUSE: data modelling & transformations; single source of truth for reporting & analytics.
      ▪ (3) BUSINESS INTELLIGENCE: management reporting; operational reporting; data visualisation.
      ▪ (4) DATA SCIENCE: data science and applied machine learning; forecasting & optimisation; personalisation.
      ▪ (5) WEB & APP ANALYTICS: product squad analytics; product experimentation.
      ▪ (6) DIGITAL ANALYTICS: digital trade analytics; performance marketing analytics; CRM analytics.
  • 7. The "3-Michelin-star" data platform 😋
      ▪ Sources (production systems & services): AS400 (until demise), Oracle (until demise), GA4, till system, order mgmt. system, single view of stock, …
      ▪ DATA LAKE, "raw ingredients & food storage": raw systems data, with security, data governance and access control.
      ▪ DATA WAREHOUSE, "the kitchen & cooking process": operational master data (customers, products, orders, stock, etc.).
      ▪ "The finished meals & service": BI & core reporting; data science / applied ML; product & digital analytics; serving Supply Chain, Retail Ops, Commercial, Customer and Finance.
  • 8. Data lake architecture
      ▪ Sources: on-premise DBs from the legacy estate (AS400, Oracle); cloud DBs (Amazon Aurora, Amazon RDS, …); Kafka Connect (Amazon MSK); APIs & SaaS; DynamoDB tables.
      ▪ Ingest (Airflow): data lake (Amazon S3) with 5,000 datasets, 98k fields and 10.4M files, stored in S3 buckets as JSON*, Parquet, AVRO and CSV.
      ▪ Data lake index (DynamoDB).
      ▪ Governance (Airflow): scraper (in-house crawler); Katalog DB (Aurora PgSQL) and Katalog UI; Eraser / Accessor for right-to-erase / right-to-access requests.
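Slide 8 pairs right-to-erase / right-to-access requests (the Eraser / Accessor) with a data lake index in DynamoDB, but does not say how they work. The following is a minimal Python sketch under one assumption: the index maps each customer ID to the data lake files that contain that customer's records. In-memory dicts stand in for S3 and DynamoDB, and every name here (`access_customer`, `erase_customer`, the `customer_id` field, the file keys) is hypothetical.

```python
import json
from typing import Dict, List, Set

# Hypothetical stand-ins: `lake` maps a file key to its JSON-lines records (S3 in reality),
# `index` maps a customer ID to the file keys that mention it (DynamoDB in reality).
Lake = Dict[str, List[str]]
Index = Dict[str, Set[str]]


def access_customer(customer_id: str, lake: Lake, index: Index) -> List[dict]:
    """Right to access: collect every record for the customer from the indexed files."""
    found = []
    for key in sorted(index.get(customer_id, set())):
        for line in lake[key]:
            record = json.loads(line)
            if record.get("customer_id") == customer_id:
                found.append(record)
    return found


def erase_customer(customer_id: str, lake: Lake, index: Index) -> int:
    """Right to erase: rewrite each indexed file without the customer's records.

    Returns the number of records removed and drops the customer from the index."""
    removed = 0
    for key in sorted(index.pop(customer_id, set())):
        kept = []
        for line in lake[key]:
            if json.loads(line).get("customer_id") == customer_id:
                removed += 1
            else:
                kept.append(line)
        lake[key] = kept  # in S3 this would mean writing a replacement object
    return removed
```

In this sketch the index is what keeps both operations cheap: only the files known to contain the customer are read or rewritten, instead of scanning the whole lake.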
  • 9. Data warehouse architecture
      ▪ Data warehouse (Amazon Redshift, 4 x ra3.16xlarge): 2,670 tables; 2M queries / month; layered data architecture; raw data stored in SUPER columns; hourly ELT with idempotent pipelines.
      ▪ Inputs: data lake (Amazon S3), loaded via COPY (data ingest) or queried via external tables (Amazon Redshift Spectrum); APIs & SaaS; ELT orchestration.
      ▪ Cache (Amazon Aurora): external schema (live federated queries); foreign data wrapper (pg_cron for scheduling); used as a fast storage layer for data apps; serves raw data for ELT data pipelines.
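Slide 9 mentions hourly ELT with idempotent pipelines. One common way to get idempotency is to replace an entire time slice inside a single transaction (delete, then insert), so re-running the same hour never duplicates rows. A minimal sketch, using Python's built-in SQLite as a stand-in for Redshift; the `orders_hourly` table, its columns and `load_hour` are hypothetical, not taken from the deck.

```python
import sqlite3
from typing import Iterable, Tuple


def load_hour(conn: sqlite3.Connection, hour: str, rows: Iterable[Tuple[str, float]]) -> None:
    """Atomically replace the slice for `hour`, so re-running the same hour changes nothing."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.execute("DELETE FROM orders_hourly WHERE load_hour = ?", (hour,))
        conn.executemany(
            "INSERT INTO orders_hourly (load_hour, order_id, amount) VALUES (?, ?, ?)",
            [(hour, order_id, amount) for order_id, amount in rows],
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders_hourly (load_hour TEXT, order_id TEXT, amount REAL)")
    batch = [("o-1", 12.5), ("o-2", 3.0)]
    for _ in range(3):  # simulate retries / re-runs of the same hourly job
        load_hour(conn, "2023-06-07T10", batch)
    print(conn.execute("SELECT COUNT(*) FROM orders_hourly").fetchone()[0])  # 2
```

Because the delete and insert commit together, a failed run leaves the previous slice intact and a retried run produces the same result as a single successful one.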
  • 10. New Redshift features we are excited about
      ▪ (1) ROLLUP / CUBE: a long-awaited improvement that helps us efficiently generate large pre-aggregated multi-dimensional cubes; great in combination with HLL functions for fast unique counts.
      ▪ (2) DATA MASKING: create "masked" versions of tables to improve data privacy and governance; eliminates the overhead of maintaining multiple versions / slices of the data.
      ▪ (3) OTHER: MERGE to simplify our incremental data pipelines; S3 auto-copy to simplify data lake ingest pipelines; Aurora zero-ETL integration to simplify CDC pipelines.
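Slide 10 names MERGE as a way to simplify incremental pipelines. The in-memory Python sketch below only illustrates the semantics such a pipeline relies on (matched keys are updated or deleted, unmatched keys are inserted); it is not Redshift's implementation, and the change-record format `(op, key, row)` with `op` in `"I"`, `"U"`, `"D"` is an assumption loosely modelled on CDC feeds.

```python
from typing import Dict, Iterable, Tuple

Row = Dict[str, object]
Change = Tuple[str, int, Row]  # (op, primary key, changed columns); op is "I", "U" or "D"


def merge(target: Dict[int, Row], changes: Iterable[Change]) -> Dict[str, int]:
    """Apply CDC-style changes to `target` (keyed by primary key) with MERGE-like rules:
    a delete removes a matched row; any other change updates a matched row
    or inserts an unmatched one. Returns how many rows each action touched."""
    stats = {"inserted": 0, "updated": 0, "deleted": 0}
    for op, key, row in changes:
        if op == "D":
            if target.pop(key, None) is not None:
                stats["deleted"] += 1
        elif key in target:
            target[key] = {**target[key], **row}  # WHEN MATCHED THEN UPDATE
            stats["updated"] += 1
        else:
            target[key] = dict(row)  # WHEN NOT MATCHED THEN INSERT
            stats["inserted"] += 1
    return stats
```

Since matched rows are overwritten rather than appended, replaying the same batch leaves the target unchanged, which is what makes MERGE-based incremental loads safe to retry. The sketch applies changes in order; a set-based SQL MERGE would normally be given at most one source row per key, so a batch would typically be deduplicated to the latest change per key first.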
  • 11. BI & Analytics architecture on the data warehouse (Amazon Redshift)
      ▪ (1) Raw data layer: raw, unmodified data from source systems, loaded by ELT from the data lake.
      ▪ (2) Operational data layer: a clean, transformed, disaggregated entity relationship model and the starting point for all reporting & analytics; customer, orders, product, stores, warehouse and stock master data.
      ▪ (3) BI data layer: semi-aggregated datasets to enable fast reporting & analytics, including pre-computed HLL sketches for efficient unique counts.
      ▪ (4) Cubes: multi-dimensional ROLAP cubes delivering pre-aggregated metrics along pre-defined dimensions. Best practice: CUBE/ROLLUP on top of pre-computed HLL sketches.
      ▪ (5) Consumers: data IDEs (JDBC); data sharing; Athena; one-stop-shop analytics APIs.
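Slide 11's best practice, CUBE/ROLLUP on top of pre-computed HLL sketches, works because distinct counts are not additive (a customer who shops in two stores is counted twice if per-store counts are summed), whereas HyperLogLog sketches can be merged by taking the register-wise maximum. Below is a minimal, textbook-style HyperLogLog in pure Python (precision `p=12`, SHA-256 as the hash, linear counting for small cardinalities) plus a ROLLUP over two hypothetical dimensions, region and store. It illustrates the idea only and is not Redshift's HLL implementation.

```python
import hashlib
import math
from typing import Dict, Optional, Tuple


class HyperLogLog:
    """Minimal HyperLogLog: 2**p registers, each holding the max observed rank."""

    def __init__(self, p: int = 12) -> None:
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        # 64-bit hash taken from the first 8 bytes of SHA-256.
        h = int.from_bytes(hashlib.sha256(item.encode()).digest()[:8], "big")
        width = 64 - self.p
        idx = h >> width  # top p bits choose the register
        rest = h & ((1 << width) - 1)  # remaining bits give the rank
        rank = width - rest.bit_length() + 1  # 1-based position of the leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other: "HyperLogLog") -> "HyperLogLog":
        """Union of two sketches: register-wise max (the inputs are not modified)."""
        if other.p != self.p:
            raise ValueError("cannot merge sketches with different precision")
        out = HyperLogLog(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)  # linear counting for small cardinalities
        return raw


Key = Tuple[Optional[object], ...]


def rollup(sketches: Dict[Key, HyperLogLog], n_dims: int) -> Dict[Key, float]:
    """Distinct-count estimates for every ROLLUP level, from finest to grand total.

    Rolled-up dimensions are reported as None, mirroring the NULLs SQL ROLLUP emits."""
    result: Dict[Key, float] = {}
    for level in range(n_dims, -1, -1):
        merged: Dict[Key, HyperLogLog] = {}
        for key, sketch in sketches.items():
            k = key[:level] + (None,) * (n_dims - level)
            merged[k] = merged[k].merge(sketch) if k in merged else sketch
        result.update({k: s.estimate() for k, s in merged.items()})
    return result


def build_sketches() -> Dict[Key, HyperLogLog]:
    """Hypothetical BI-layer data: 5,000 customers, each seen in two of four stores."""
    sketches: Dict[Key, HyperLogLog] = {}
    for c in range(5000):
        for store in (c % 4, (c + 1) % 4):
            key = ("north" if store < 2 else "south", store)
            sketches.setdefault(key, HyperLogLog()).add(f"customer-{c}")
    return sketches
```

With this made-up data every store has 2,500 distinct customers, so summing store-level counts gives roughly 10,000, while `rollup(build_sketches(), 2)[(None, None)]` estimates the true total of about 5,000 by merging sketches instead of adding counts.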
  • 12. Redshift enables all reporting & analytics use cases:
      ▪ Official reporting built by the central BI team.
      ▪ Self-service analytics done autonomously within teams.
      ▪ Field analytics embedded in customer-facing apps.
      ▪ [Chart: registered users (self-service analytics)]
  • 13. Data Science & ML architecture, in three stages:
      ▪ (1) Develop: R / Python notebooks for feature engineering and model development, on data from Amazon Athena and Amazon Redshift.
      ▪ (2) Train: model training on EC2 instances (Amazon EC2, AWS Batch), fed by feature extraction pipelines (Amazon Redshift, Amazon Athena).
      ▪ (3) Serve: serverless serving from an ML data layer (Aurora / RDS, DynamoDB) via API Gateway and AWS Lambda.
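Slide 13 lists API Gateway, AWS Lambda and DynamoDB among the serving components. A minimal sketch of that pattern, assuming predictions are batch-scored upstream and stored per customer: a Lambda-style handler answers an API Gateway proxy request by looking the customer up. The `PREDICTIONS` dict is a hypothetical stand-in for a DynamoDB table, and the route, field names and model version are illustrative only.

```python
import json

# Hypothetical stand-in for a DynamoDB table of batch-scored model outputs,
# keyed by customer ID.
PREDICTIONS = {
    "c1": {"recommended_skus": ["SKU-1", "SKU-7"], "model_version": "v3"},
}


def handler(event, context):
    """Lambda-style handler for an API Gateway proxy request like GET /predictions/{customer_id}."""
    # pathParameters can be None when the route defines no path parameters.
    customer_id = (event.get("pathParameters") or {}).get("customer_id")
    item = PREDICTIONS.get(customer_id)
    if item is None:
        return {"statusCode": 404, "body": json.dumps({"error": "unknown customer"})}
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"customer_id": customer_id, **item}),
    }
```

Keeping the expensive scoring in the training stage leaves the serverless path as a single key lookup, which is one reason a key-value store such as DynamoDB is a natural fit on the serving side.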
  • 14. H&B data drives core business value & innovation
      ▪ Commercial: unit economics; store network planning; competitor intelligence; promo effectiveness; econometrics / MMM; space & range analytics.
      ▪ Finance: daily / weekly / monthly management reporting; operational trade reporting; intraday / peak reporting; exception reporting.
      ▪ Supply chain: single view of stock; forecasting & replenishment; fulfilment analytics; stock availability; clearance / overstock analytics; supplier analytics.
      ▪ Wellness: diagnostics; health analytics; personalised wellness; behavioural engine.
      ▪ Customer: single customer view; customer lifecycle management; eCRM enablement; customer lifetime value.
      ▪ Digital: digital marketing measurement; personalisation & search; OKRs; UX / funnel analytics; experimentation platform; web / app event tracking; SEO analytics.