What exactly is a data product, and how do you build one in a data-driven manner? In this session I will dive into those questions, based on recent experience in a project where a particular search technology was replaced by a data-driven search pipeline.
First, some context will be sketched by laying out the starting point of this project, and how we moved from a once-a-day update of the index to real-time updates in an architecture based on Elasticsearch, Kafka, microservices, command query responsibility segregation (CQRS), and real-time monitoring using Kafka Streams.
Next, various parts of the new pipeline will be highlighted while discussing what kind of data was measured and how it steered the engineering efforts. Various lessons learned will be discussed, such as the order in which to do things when building a data product and how to deal with the relationship between engineering and data science.
Finally, we will look at how machine learning techniques, such as learning-to-rank and entity recognition, were added to the mix. After this session you will have a better understanding of the pitfalls when building a data product and some concrete anchors to drive the engineering efforts of your own team.
5. Be aware of both cycles, start both as early as possible, and organize the right people.
Start monitoring early, keep the metric simple, verify you can move it!
15. Each search phrase should return a result
● Easy to measure
● Hard to implement ⇒ “Bosch boorhamer PBH 3000 750 watt 1600 rpm”
Results should be regularly bought
● Tricky to measure
● Even harder to get right
Interactions should happen with the first N results
● Easy to measure
● Hard to get right
Each search result should be interacted with
● Easy to measure
● All interactions happen with results on page 5
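Two of the metrics above that are listed as easy to measure can be sketched in a few lines. This is a minimal illustration only: the `SearchEvent` record, its field names, and the default of N = 10 are assumptions for the example, not the event format used in the project.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SearchEvent:
    """Hypothetical record of one search request (not the project's schema)."""
    phrase: str
    result_count: int
    # 1-based position of the result the user interacted with, None if no interaction.
    clicked_position: Optional[int] = None


def zero_result_rate(events: List[SearchEvent]) -> float:
    """Fraction of searches that returned no result at all."""
    if not events:
        return 0.0
    return sum(1 for e in events if e.result_count == 0) / len(events)


def top_n_interaction_share(events: List[SearchEvent], n: int = 10) -> float:
    """Fraction of interactions that happened with one of the first n results."""
    positions = [e.clicked_position for e in events if e.clicked_position is not None]
    if not positions:
        return 0.0
    return sum(1 for p in positions if p <= n) / len(positions)


sample = [
    SearchEvent("bosch", 12, clicked_position=1),
    SearchEvent("Bosch boorhamer PBH 3000 750 watt 1600 rpm", 0),
    SearchEvent("drill", 40, clicked_position=45),
    SearchEvent("hammer", 8, clicked_position=3),
]
print(zero_result_rate(sample))
print(top_n_interaction_share(sample, n=10))
```

A low share of interactions in the first N results is the "all interactions happen on page 5" situation: every result is interacted with, but users have to dig for it.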
16. Start a discussion about what success means at the start, and iterate over it regularly.
Make initial metrics global, combine multiple metrics in the simplest way possible, and stick to them.
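One possible reading of "the simplest way possible" is an unweighted mean of metrics that are all scaled to [0, 1] and oriented so that higher is better. The sketch below assumes exactly that; the metric names in the usage line are hypothetical, and the talk does not state which combination was actually used.

```python
from typing import Dict


def combined_score(metrics: Dict[str, float]) -> float:
    """Unweighted mean of metrics that are already scaled to [0, 1].

    Assumption for this sketch: every metric is oriented so that higher is better.
    """
    if not metrics:
        raise ValueError("at least one metric is required")
    for name, value in metrics.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"metric {name!r} is outside [0, 1]: {value}")
    return sum(metrics.values()) / len(metrics)


# Hypothetical global metrics for one day of search traffic.
print(combined_score({"searches_with_results": 0.75, "top_n_interaction_share": 0.25}))
```

Keeping the combination this plain makes it easy to explain why the number moved, which helps when the team has to stick to it.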
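18. [Architecture diagram]
Components: backend systems, event bus, e-commerce website (header / footer, search overview page, basket), search service, index.
● User navigates to the search overview page.
● The search phrase / filters are sent to the search service (JSON/HTTP).
● The search service sends a technology-specific query to the index (JSON/HTTP).
● Results are forwarded back.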
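19. [Architecture diagram, extended with event collection and metrics]
Components: backend systems, e-commerce website (header / footer, search overview page, basket), search service, index, event collector, web events, metrics processor, dashboard.
● User navigates to the search overview page.
● The search phrase / filters are sent to the search service (JSON/HTTP).
● The search service sends a technology-specific query to the index (JSON/HTTP).
● Results are forwarded back.
● Two arrows send events to the event collector (JSON/HTTP).
● The event collector produces events to web events (AVRO/Kafka).
● The metrics processor consumes events (AVRO/Kafka).
● The dashboard consumes metrics (JSON/Backend).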
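The metrics processor in this diagram turns a stream of events into metrics for the dashboard. The sketch below shows the general idea as a tumbling-window aggregation in plain Python, with no Kafka dependency. It stands in for the Kafka Streams processor rather than reproducing it, and the event shape (a timestamp in seconds plus a result count) and the 60-second window are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def zero_result_rate_per_window(
    events: Iterable[Tuple[float, int]], window_seconds: int = 60
) -> Dict[int, float]:
    """Aggregate (timestamp_seconds, result_count) events into tumbling windows.

    Returns a mapping from window start time to the fraction of searches
    in that window that returned zero results.
    """
    totals: Dict[int, int] = defaultdict(int)
    zeros: Dict[int, int] = defaultdict(int)
    for timestamp, result_count in events:
        # Every event falls into exactly one fixed-size, non-overlapping window.
        window_start = int(timestamp // window_seconds) * window_seconds
        totals[window_start] += 1
        if result_count == 0:
            zeros[window_start] += 1
    return {start: zeros[start] / totals[start] for start in sorted(totals)}


# Hypothetical events: (seconds since start, number of results returned).
stream = [(0, 5), (30, 0), (61, 3), (119, 0), (120, 0)]
for window_start, rate in zero_result_rate_per_window(stream).items():
    print(window_start, rate)
```

A real stream processor would emit each window as it closes instead of returning a dictionary at the end, but the per-window counting is the same.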
21. Use a single source for metrics and maintain them within the team whenever possible.
Discuss which decisions you make based on the currently gathered metrics.
29. Ranking
[Diagram: a textual order and a data-driven order of the same three results (most bought, most viewed, least bought), shown in opposite order to each other, followed by a "?".]
https://github.com/o19s/elasticsearch-learning-to-rank
36. Data Driven Product - A software system that has measurable impact on a well-defined business process, using machine learning.
37. +31 (0)1 - 68479294
Coltbaan 4E, Nieuwegein
info@bigdatarepublic.nl
www.bigdatarepublic.nl
/company/bigdata-republic
@bigdatarep
DATA SCIENCE | BIG DATA ANALYTICS | BIG DATA ARCHITECTURES