2. Content
- About Kumparan
- Intro to Recommendation System
- Building From Scratch
- Getting user’s behaviour
- Analyze user behaviour
- Processing the data and deploy
- Serving the recommendation
- Iterate and improve
- Questions?
3. About Kumparan
- A startup that focuses on both media and
technology
- We want to become a scalable yet credible
media platform
- Media platform means a platform where people
can publish their own content on Kumparan
4. Intro to Recommendation System
A recommendation system is basically a system that
manages which content each user sees on
Kumparan
5. Building From Scratch
- Imagine you need to build a recommendation
system from scratch.
- Then you will realize that no data is available...
- First Challenge: Gather the data!
6. Building From Scratch
- Once we have the data, the problem becomes how
to use it.
- Second Challenge: Process the data!
7. Building From Scratch
- Lastly, after we process the data and use it,
we should ask how to improve it.
- Third Challenge: Iterate and improve!
8. Getting Users Behaviour Data
- We need to build a tracker to collect users'
behaviour data
- Challenges:
- High-velocity data stream
- High volume of data to process
- Media traffic comes in bursts during certain
periods, so the tracker system needs to be able
to autoscale
- (optional) Having a real-time tracking
system
10. Getting Users Behaviour Data
Further challenges:
- Define the event format
- Write documentation on what to track
- Implement the tracking code on the frontend,
which will later call the tracker-api
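To make the idea of an event format concrete, here is a minimal sketch of what a tracked event and its validation could look like. The field names, the event names in ALLOWED_EVENTS, and the parse_event helper are all assumptions for illustration; the slides do not show the actual Kumparan format.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field

# Hypothetical event names; the real list would come from the tracking documentation.
ALLOWED_EVENTS = {"story_view", "story_click", "story_share"}


@dataclass
class TrackingEvent:
    """One user-behaviour event as the frontend might send it (assumed fields)."""
    event_name: str
    user_id: str
    story_id: str
    timestamp_ms: int = field(default_factory=lambda: int(time.time() * 1000))
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def validate(self) -> None:
        # Reject events that the documentation does not define.
        if self.event_name not in ALLOWED_EVENTS:
            raise ValueError(f"unknown event name: {self.event_name!r}")
        if not self.user_id or not self.story_id:
            raise ValueError("user_id and story_id must be non-empty")

    def to_json(self) -> str:
        self.validate()
        return json.dumps(asdict(self), sort_keys=True)


def parse_event(payload: str) -> TrackingEvent:
    """Parse and validate a JSON payload, as a tracker API might on receipt."""
    event = TrackingEvent(**json.loads(payload))
    event.validate()
    return event
```

Agreeing on a single schema like this lets the frontend and the tracker reject malformed events early, before they reach the data warehouse.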
11. Analyze user behaviour
- Usually the challenge in analysing user
behaviour is processing big data
- We solved it by using BigQuery as our
data warehouse
- We can basically use SQL to both collect and
process the data in BigQuery
- For more detailed analysis, we aggregate the
data in BigQuery and process it further with
Python in a Jupyter notebook
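As a rough illustration of the "aggregate in BigQuery, then continue in Python" step, the sketch below takes rows that are assumed to have already been aggregated per category and day, and computes a click-through rate per category. The column names and the numbers are made up for the example; they are not real Kumparan data.

```python
from typing import Dict, List


def click_through_rates(rows: List[dict]) -> Dict[str, float]:
    """Sum impressions and clicks per category, then return clicks / impressions."""
    totals: Dict[str, List[int]] = {}
    for row in rows:
        impressions, clicks = totals.get(row["category"], (0, 0))
        totals[row["category"]] = [
            impressions + row["impressions"],
            clicks + row["clicks"],
        ]
    # Skip categories without impressions to avoid dividing by zero.
    return {cat: clk / imp for cat, (imp, clk) in totals.items() if imp > 0}


# Made-up rows standing in for the result of an aggregation query.
rows = [
    {"category": "news", "day": 1, "impressions": 1000, "clicks": 50},
    {"category": "news", "day": 2, "impressions": 1000, "clicks": 70},
    {"category": "sports", "day": 1, "impressions": 500, "clicks": 40},
]
ctr = click_through_rates(rows)
```

Doing the heavy aggregation in SQL keeps the data that reaches the notebook small enough to handle in memory.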
13. Processing the Data and Deployment
- As with the analysis, we use both BigQuery
and Python for deployment
- We use our own system, built on top of Airflow,
to manage the BigQuery queries
- The Python code is deployed on Kubernetes as
a CronJob
- The results are stored in a serving database
(Elasticsearch, MySQL, Redis, or Bigtable)
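A scheduled Python job of this kind could look roughly like the sketch below: it reads view events, computes a result, and writes it to a key-value store. The popularity ranking is only an assumed stand-in for the real recommendation logic, the key naming is hypothetical, and a plain dict stands in for the serving database.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, MutableMapping, Tuple


def top_stories_per_category(
    views: Iterable[Tuple[str, str]], n: int = 2
) -> Dict[str, List[str]]:
    """Rank stories by view count within each category and keep the top n."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for category, story_id in views:
        counts[category][story_id] += 1
    return {
        category: [story for story, _ in counter.most_common(n)]
        for category, counter in counts.items()
    }


def run_job(
    views: Iterable[Tuple[str, str]],
    store: MutableMapping[str, List[str]],
    n: int = 2,
) -> None:
    """Write one list of story ids per category under a hypothetical key scheme."""
    for category, stories in top_stories_per_category(views, n).items():
        store[f"top_stories:{category}"] = stories


# Made-up (category, story_id) view events; a dict stands in for the serving database.
views = [
    ("news", "a"), ("news", "a"), ("news", "a"),
    ("news", "b"), ("news", "b"),
    ("news", "c"),
    ("sports", "d"),
]
serving_store: Dict[str, List[str]] = {}
run_job(views, serving_store)
```

Because the job overwrites each key on every run, rerunning it on a schedule keeps the serving data fresh without any extra cleanup step.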
15. Serving the Recommendation
- Need to explain the infra for serving the API
- Once the results are in the serving database,
the last step is serving them through the API
- We serve the API using Python (Sanic or Flask),
also on Kubernetes
- The endpoints in Kubernetes are all combined
under one subdomain with an NGINX reverse
proxy for integration
- We can cache the endpoints to get lower
latency
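One simple way to cache an endpoint is a small time-to-live cache in front of the database lookup. The sketch below is an assumption about how this could be done inside the API process; the slides do not say where the cache lives (it could equally sit in the reverse proxy). TTLCache and the fetch callable are hypothetical names.

```python
import time
from typing import Callable, Dict, List, Tuple


class TTLCache:
    """Cache results of `fetch` per key and refresh them after `ttl_seconds`."""

    def __init__(
        self,
        fetch: Callable[[str], List[str]],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def get(self, key: str) -> List[str]:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: skip the database
        value = self._fetch(key)  # missing or expired: hit the database
        self._entries[key] = (now, value)
        return value
```

A request handler would then call cache.get(user_id) instead of querying the serving database directly. The trade-off is that users may see recommendations up to ttl_seconds old.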
17. Iterate and Improve
- Iterating and improving basically means changing
the parameters, the algorithm, or possibly the UX
- One of the challenges is knowing whether a
change actually brings a positive
improvement
- We can use an AB-test to test our hypothesis that
the new idea brings a statistically significant
improvement!
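As one example of checking statistical significance, the sketch below runs a two-sided two-proportion z-test on click counts from a control and a treatment group. This particular test is an assumption chosen for illustration; the slides do not say which test Kumparan uses.

```python
import math
from typing import Tuple


def two_proportion_z_test(
    conv_a: int, n_a: int, conv_b: int, n_b: int
) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in rates between B and A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups share one rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates are all 0 or all 1")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

For example, two_proportion_z_test(100, 1000, 150, 1000) compares a 10% click rate in control against 15% in treatment. A p-value below the chosen threshold (commonly 0.05) suggests the difference is unlikely to be due to chance alone.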
18. Iterate and Improve
- We built our own AB-test platform at Kumparan
- It is divided into two parts:
- AB-test system
- AB-test analysis
- We use the open-source library PlanOut for the
AB-test system
- For the AB-test analysis, we use the Python Dash
library from Plotly for visualization
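The core job of the AB-test system is to assign each user to a variant in a stable way. The sketch below shows deterministic hash-based assignment, which is similar in spirit to what experiment frameworks do, but it does not use PlanOut's API; assign_variant and its parameters are hypothetical.

```python
import hashlib
from typing import Sequence


def assign_variant(
    experiment: str,
    user_id: str,
    variants: Sequence[str] = ("control", "treatment"),
) -> str:
    """Map a user to a variant; the same inputs always give the same variant."""
    # Salting with the experiment name makes assignments independent across experiments.
    digest = hashlib.sha1(f"{experiment}.{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:15], 16) % len(variants)
    return variants[bucket]
```

Because the assignment depends only on the experiment name and the user id, no assignment table has to be stored, and a returning user always sees the same variant.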
21. Conclusion
- We explained the recommender infrastructure at
Kumparan
- The infrastructure includes data gathering,
processing, serving, and how to iterate and
improve
- I hope this presentation is useful :)
25. We are hiring!
1. Data Engineer
2. Data Scientist
3. BI Engineer
4. BI Analyst
5. Software Engineer (Frontend, Backend & Mobile
Application)
Email us at joindev@kumparan.com