This presentation, delivered at the AWS London Summit 2023, provides an in-depth look at how Holland & Barrett built a robust, high-performing data platform on AWS to power insights at the speed of thought. Dobo Radichkov, Chief Data Officer, shares key aspects of the data strategy, outlining how the company combined an AWS-centric data lake on Amazon S3, an Amazon Redshift data warehouse, and an analytics layer built with tools such as Metabase and Retool. The presentation also discusses the impact of this data infrastructure on business areas including Finance, Commercial, Supply Chain, Customer, Digital, and Wellness. Through this journey, the Data & Analytics team aims to become the beating heart of the organisation, unlocking success for colleagues, customers, and partners alike.
In the presentation, Dobo Radichkov lays out Holland & Barrett's vision for its Data & Analytics team to become the beating heart of the organisation, a vision that has guided the team's strategy and tool selection. He explains how this vision is brought to life through an organisational structure of six specialised teams: Data Engineering, Data Warehouse, Business Intelligence, Data Science, Web & App Analytics, and Digital Analytics.
Dobo then takes the audience through the strategic roadmap, a three-phase plan for growing the team's data capabilities: building a new foundation (2022), scaling the operating model (2023, the phase the team was in at the time of the talk), and driving value and innovation (2024 onwards). The roadmap is as much a transformation of the team as a technology plan, aiming to embed data-driven decision-making in the DNA of Holland & Barrett.
Finally, he walks through the architecture of the '3-Michelin-star' data platform, showing how data moves from raw systems data in the data lake, to operational master data in the data warehouse, and on to the analytics layer. The presentation closes by showing how the platform drives core business value and innovation across business domains, reinforcing Holland & Barrett's commitment to becoming a data-led organisation.
Data and Analytics at Holland & Barrett: Building a '3-Michelin-star' Data Platform on AWS
1. Data and Analytics at Holland & Barrett
Building a "3-Michelin-star" data platform on AWS to power insights at the speed of thought
Dobo Radichkov, Chief Data Officer
7 June 2023
2. About Holland & Barrett
Founded in 1870, we exist to make health and wellness a way of life for everyone.
3. The Holland & Barrett Data & Analytics vision
To become the beating heart of the organisation and unlock success for our colleagues, customers and partners.
▪ For our colleagues
▪ For our customers
▪ For our partners
4. The Holland & Barrett Data & Analytics vision
▪ Data platform ⭐⭐⭐
▪ Single source of truth ⭐⭐⭐
▪ Analytics & BI ⭐⭐⭐
▪ Personalisation ⭐⭐
▪ Data Science & ML ⭐⭐
▪ Health analytics ⭐
▪ Analytics in the field (stores & suppliers) ⭐
▪ Data monetisation ⭐
Key: ⭐⭐⭐ Mature, ⭐⭐ Scaling, ⭐ Early days
5. We are now in ‘Phase II’ of this journey
Journey stages: CRAWL → METAMORPHOSE → WALK → FLY → TRANSCEND
I. BUILD NEW FOUNDATION (2022)
▪ Data strategy & vision
▪ Set up data teams
▪ AWS-centric data lake
▪ Redshift data warehouse
▪ Metabase BI platform
II. SCALE OPERATING MODEL (2023)
▪ Complete core reporting
▪ Self-service BI
▪ Functional analytics
▪ Analytics in the field
▪ Data science & ML
III. DRIVE VALUE & INNOVATION (2024+)
▪ Data as driver of value:
– Increase revenue
– Reduce costs
– Improve UX
– Optimise processes
▪ Data as driver of innovation
6. The H&B data organisation
1) DATA ENGINEERING
▪ Data lake & governance
▪ Source system integration
▪ Data services
2) DATA WAREHOUSE
▪ Data modelling & transformations
▪ Single source of truth for reporting & analytics
3) BUSINESS INTELLIGENCE
▪ Management reporting
▪ Operational reporting
▪ Data visualisation
4) DATA SCIENCE
▪ Data science and applied machine learning
▪ Forecasting & optimisation
▪ Personalisation
5) WEB & APP ANALYTICS
▪ Product squad analytics
▪ Product experimentation
6) DIGITAL ANALYTICS
▪ Digital trade analytics
▪ Performance marketing analytics
▪ CRM analytics
7. “3-Michelin-star” data platform 😋
▪ Production systems & services: AS400 (until demise), Oracle (until demise), GA4, Till system, Order mgmt. system, Single view of stock, …
▪ DATA LAKE, “Raw ingredients & food storage”: raw systems data (security, data governance, access control)
▪ DATA WAREHOUSE, “The kitchen & cooking process”: operational master data (customers, products, orders, stock, etc.)
▪ “The finished meals & service”: BI & Core Reporting, Data Science / Applied ML, Product & Digital Analytics
Business domains: Supply Chain, Retail Ops, Commercial, Customer, Finance
8. Data lake architecture
INGEST
▪ Sources:
– On-premise DBs (legacy estate): AS400, Oracle
– Cloud DBs: Amazon Aurora, Amazon RDS, …
– Kafka Connect (Amazon MSK)
– APIs & SaaS
– DynamoDB tables
– Scraper (in-house crawler)
▪ Orchestration: Airflow
▪ Data lake (Amazon S3): 5,000 datasets, 98k fields, 10.4M files
▪ Data lake S3 bucket formats: JSON*, Parquet, AVRO, CSV
▪ Data lake index (DynamoDB)
GOVERNANCE
▪ Katalog UI and Katalog DB (Aurora PgSQL)
▪ Right to erase / access: Eraser / Accessor
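The slide names an Eraser / Accessor component for right-to-erase and right-of-access requests but does not describe how it works. A minimal sketch of the core logic, assuming records are stored as JSON Lines, that each record carries a hypothetical customer_id field, and that the affected files have already been located (this sketch skips that step):

```python
import json

# Hypothetical field name: assumes each JSON record stores the customer id here.
CUSTOMER_FIELD = "customer_id"


def access(lines, customer_id, field=CUSTOMER_FIELD):
    """Right of access: return every record in a JSON Lines file that belongs to the customer."""
    records = (json.loads(line) for line in lines if line.strip())
    return [rec for rec in records if rec.get(field) == customer_id]


def erase(lines, customer_id, field=CUSTOMER_FIELD):
    """Right to erase: return the file's lines without the customer's records, plus a removal count.

    S3 objects cannot be edited in place, so a caller would rewrite the
    whole object with the kept lines.
    """
    non_blank = [line for line in lines if line.strip()]
    kept = [line for line in non_blank if json.loads(line).get(field) != customer_id]
    return kept, len(non_blank) - len(kept)
```

Columnar formats such as Parquet would need the same filter-and-rewrite approach, just with a Parquet reader and writer in place of json.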
9. Data warehouse architecture
▪ Data warehouse (Amazon Redshift, 4 x ra3.16xlarge):
– 2,670 tables
– 2m queries / month
– Layered data architecture
– Raw data stored in SUPER columns
– Hourly ELT with idempotent pipelines
▪ Data lake (Amazon S3): loaded via COPY (data ingest) and queried in place via external tables (Amazon Redshift Spectrum)
▪ APIs & SaaS
▪ ELT orchestration
▪ Cache (Amazon Aurora):
– Foreign data wrapper (pg_cron for scheduling): used as a fast storage layer for data apps
– External schema (live federated queries): serves raw data for ELT data pipelines
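The slide does not say how the hourly pipelines are made idempotent; one common approach is to delete and reload the whole time slice inside a single transaction, so a retry or re-run for the same hour cannot duplicate rows. A minimal sketch of that pattern, using SQLite in place of Redshift and a hypothetical orders_hourly table:

```python
import sqlite3


def load_hour(conn, hour, rows):
    """Idempotently (re)load one hourly slice of the hypothetical orders_hourly table.

    Deleting the slice and re-inserting it in one transaction means running
    the same hour twice leaves the table in the same state as running it once.
    """
    with conn:  # commits on success, rolls back on any error
        conn.execute("DELETE FROM orders_hourly WHERE load_hour = ?", (hour,))
        conn.executemany(
            "INSERT INTO orders_hourly (load_hour, order_id, amount) VALUES (?, ?, ?)",
            [(hour, order_id, amount) for order_id, amount in rows],
        )


def make_demo_db():
    """In-memory database with the hypothetical target table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders_hourly (load_hour TEXT, order_id INTEGER, amount REAL)")
    return conn
```

Because the transaction wraps both statements, a failure mid-load leaves the previous contents of that hour untouched rather than half-deleted.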
10. New Redshift features we are excited about
1) ROLLUP / CUBE
▪ A long-awaited improvement that helps us efficiently generate large, pre-aggregated multi-dimensional cubes
▪ Great in combination with HLL functions for fast unique counts
2) DATA MASKING
▪ Create "masked" versions of tables to improve data privacy and governance
▪ Eliminates the overhead of maintaining multiple versions / slices of the data
3) OTHER
▪ MERGE to simplify our incremental data pipelines
▪ S3 auto-copy to simplify data lake ingest pipelines
▪ Aurora zero-ETL integration to simplify CDC pipelines
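MERGE combines "update the rows that already exist, insert the ones that don't" into a single statement, which is why it simplifies incremental pipelines. A minimal sketch of those semantics in plain Python, with a dict keyed by a hypothetical id column standing in for the target table (this models the behaviour only, not Redshift's SQL syntax):

```python
def merge(target, source, key="id"):
    """Apply an incremental batch with MERGE-style semantics.

    target: dict mapping key -> row dict, updated in place.
    source: iterable of row dicts from the incremental batch.
    Returns (updated, inserted) counts.
    """
    updated = inserted = 0
    for row in source:
        k = row[key]
        if k in target:
            # WHEN MATCHED THEN UPDATE: overwrite the columns present in the batch.
            target[k] = {**target[k], **row}
            updated += 1
        else:
            # WHEN NOT MATCHED THEN INSERT: add the new row.
            target[k] = dict(row)
            inserted += 1
    return updated, inserted
```

Re-applying the same batch only re-updates existing rows, so the target ends up unchanged; that property makes upsert-style loads safe to retry.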
11. BI & Analytics architecture
Data warehouse (Amazon Redshift) layers:
1) Raw data layer: raw, unmodified data from source systems, loaded by ELT from the data lake.
2) Operational data layer: clean, transformed, disaggregated entity relationship model; the starting point for all reporting & analytics. Covers customer, orders, product, stores, warehouse and stock master data.
3) BI data layer: semi-aggregated datasets to enable fast reporting & analytics. Includes pre-computed HLL sketches for efficient unique counts.
4) Cubes: multi-dimensional ROLAP cubes delivering pre-aggregated metrics along pre-defined dimensions. Best practice: CUBE/ROLLUP on top of pre-computed HLL sketches.
5) Consumers: data IDEs (JDBC), data sharing, Athena, one-stop shop analytics, APIs.
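Unique counts cannot simply be summed when rolling up: a customer who buys in two stores would be counted twice. Sketches avoid this because they can be merged before taking the cardinality; Redshift exposes HLL sketch functions for this (for example HLL_CREATE_SKETCH, HLL_COMBINE and HLL_CARDINALITY). A minimal sketch of the "CUBE/ROLLUP on top of pre-computed sketches" idea, assuming a hypothetical (store, channel) grain and using exact Python sets as a stand-in for HLL sketches (both merge by union, but a real HLL sketch is small and approximate):

```python
from itertools import combinations

# Hypothetical finest-grain BI layer: (store, channel) -> customer ids seen.
BASE = {
    ("London", "web"): {"c1", "c2"},
    ("London", "store"): {"c2", "c3"},
    ("Leeds", "web"): {"c1"},
}
DIMS = ("store", "channel")


def cube_sets(n):
    """All grouping sets of CUBE over n dimensions (2**n sets)."""
    return [g for r in range(n, -1, -1) for g in combinations(range(n), r)]


def rollup_sets(n):
    """Grouping sets of ROLLUP: every prefix of the dimension list (n + 1 sets)."""
    return [tuple(range(r)) for r in range(n, -1, -1)]


def aggregate(base, dims, grouping_sets):
    """Unique-customer counts per group, merging sketches instead of summing counts."""
    out = {}
    for grouping in grouping_sets:
        merged = {}
        for key, customers in base.items():
            group_key = tuple(key[i] for i in grouping)
            merged.setdefault(group_key, set()).update(customers)  # sketch merge
        names = tuple(dims[i] for i in grouping)
        for group_key, customers in merged.items():
            out[(names, group_key)] = len(customers)  # cardinality
    return out


cube = aggregate(BASE, DIMS, cube_sets(len(DIMS)))
```

Here the grand total is 3 unique customers, whereas summing the finest-grain counts would give 5.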
12. Redshift enables all reporting & analytics use cases
▪ Official reporting built by the central BI team
▪ Self-service analytics done autonomously within teams
▪ Field analytics embedded in customer-facing apps
Chart: Registered users (self-service analytics)
13. Data Science & ML architecture
1) Develop: R / Python notebooks on EC2 instances for feature engineering and model development, with data from Amazon Athena and Amazon Redshift
2) Train: model training on Amazon EC2 and AWS Batch, fed by feature extraction pipelines (Amazon Redshift, Amazon Athena) into an ML data layer
3) Serve (serverless): Aurora / RDS, DynamoDB, API Gateway, AWS Lambda
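The slide lists API Gateway, AWS Lambda and Aurora / RDS / DynamoDB under Serve without showing how a request flows. One plausible pattern, assumed here, is a Lambda function behind API Gateway that looks up precomputed model outputs by key. The handler signature (event, context) and the proxy-integration response shape (statusCode, headers, body) follow AWS Lambda conventions; the customer_id path parameter and the in-memory PREDICTIONS dict (a stand-in for a DynamoDB or Aurora lookup) are hypothetical:

```python
import json

# Hypothetical stand-in for a DynamoDB / Aurora table of precomputed model outputs.
PREDICTIONS = {"c1": {"product_id": "p123", "score": 0.82}}


def handler(event, context):
    """AWS Lambda handler for an API Gateway request using proxy integration."""
    customer_id = (event.get("pathParameters") or {}).get("customer_id")
    if customer_id is None:
        return {"statusCode": 400, "body": json.dumps({"error": "customer_id is required"})}

    prediction = PREDICTIONS.get(customer_id)
    if prediction is None:
        return {"statusCode": 404, "body": json.dumps({"error": "not found"})}

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(prediction),
    }
```

Serving precomputed scores keeps the Lambda function light; models that need live inference would instead load the model artefact at cold start.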
14. H&B data drives core business value & innovation
Commercial
✓ Unit economics
✓ Store network planning
✓ Competitor intelligence
✓ Promo effectiveness
✓ Econometrics / MMM
✓ Space & range analytics
Finance
✓ Daily / weekly / monthly management reporting
✓ Operational trade reporting
✓ Intraday / peak reporting
✓ Exception reporting
Supply chain
✓ Single view of stock
✓ Forecasting & replenishment
✓ Fulfilment analytics
✓ Stock availability
✓ Clearance / overstock analytics
✓ Supplier analytics
Wellness
✓ Diagnostics
✓ Health analytics
✓ Personalised wellness
✓ Behavioural engine
Customer
✓ Single customer view
✓ Customer lifecycle management
✓ eCRM enablement
✓ Customer lifetime value
Digital
✓ Digi marketing measurement
✓ Personalisation & search
✓ OKRs
✓ UX / funnel analytics
✓ Experimentation platform
✓ Web / app event tracking
✓ SEO analytics