Recommender System Infrastructure in Kumparan
Yosua Michael Maranatha
Content
- About Kumparan
- Intro to Recommendation System
- Building From Scratch
- Getting users' behaviour data
- Analyzing user behaviour
- Processing the data and deploying
- Serving the recommendation
- Iterate and improve
- Questions?
About Kumparan
- A startup that focuses on both media and technology
- We want to become a scalable yet credible media platform
- A media platform here means a platform where people can publish their own content on Kumparan
Intro to Recommendation System
A recommendation system is basically a system that manages which content each user sees on Kumparan.
Building From Scratch
- Imagine you need to build a recommendation system from scratch.
- You will soon realize that no data is available...
- First challenge: gather the data!
Building From Scratch
- Once we have the data, the next problem is how to use it.
- Second challenge: process the data!
Building From Scratch
- Lastly, once we have processed the data and put it to use, we should ask how to improve it.
- Third challenge: iterate and improve!
Getting Users' Behaviour Data
- We need to build a tracker to collect users' behaviour data
- Challenges:
  - High-velocity data stream
  - High volume of data to process
  - Media traffic comes in bursts during certain periods, so the tracker system needs to be able to autoscale
  - (Optional) Having a real-time tracking system
Getting Users' Behaviour Data
Further challenges:
- Define the format of the events
- Create documentation on what to track
- Implement the tracking code on the frontend, which later calls the tracker-api
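As a rough illustration of what "defining the event format" could look like, here is a minimal Python sketch. The field names, the event types, and the `TrackingEvent` class are all hypothetical; the slides do not show the actual schema used by the tracker-api.

```python
import json
import time
import uuid
from dataclasses import dataclass, field, asdict

# Hypothetical event types; the real list would come from the tracking documentation.
ALLOWED_EVENTS = {"story_view", "story_click", "story_share"}


@dataclass
class TrackingEvent:
    """One user-behaviour event as a frontend might send it to a tracker API."""
    user_id: str
    event_type: str
    story_id: str
    timestamp: float = field(default_factory=time.time)
    event_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def validate(self) -> None:
        # Reject events that the (assumed) documentation does not define.
        if self.event_type not in ALLOWED_EVENTS:
            raise ValueError(f"unknown event_type: {self.event_type}")
        if not self.user_id or not self.story_id:
            raise ValueError("user_id and story_id must be non-empty")

    def to_json(self) -> str:
        # Validate first so malformed events never leave the client.
        self.validate()
        return json.dumps(asdict(self), sort_keys=True)


event = TrackingEvent(user_id="u1", event_type="story_click", story_id="s42")
payload = event.to_json()
```

Validating against a fixed set of event types keeps the frontend and the documentation in agreement about what is tracked.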
Analyze User Behaviour
- The usual challenge in analysing user behaviour is processing big data
- We solved it by using BigQuery as our data warehouse
- We can basically use SQL to both collect and process data in BigQuery
- For more detailed analysis, we aggregate the data in BigQuery and process it further with Python in a Jupyter notebook
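The "aggregate in the warehouse, then process further in Python" step might look roughly like the sketch below. The rows are made up and only imitate the shape of an aggregation query result; the metric (click-through rate per story) and all names are assumptions for illustration, not something the slides specify.

```python
from collections import defaultdict

# Hypothetical rows, shaped like the result of a GROUP BY query in the warehouse:
# (story_id, event_type, event_count)
aggregated_rows = [
    ("s1", "impression", 1000),
    ("s1", "click", 50),
    ("s2", "impression", 400),
    ("s2", "click", 60),
    ("s3", "impression", 200),
]


def click_through_rates(rows):
    """Return {story_id: clicks / impressions} for stories that have impressions."""
    counts = defaultdict(lambda: defaultdict(int))
    for story_id, event_type, n in rows:
        counts[story_id][event_type] += n
    return {
        story_id: by_type["click"] / by_type["impression"]
        for story_id, by_type in counts.items()
        if by_type["impression"] > 0
    }


ctr = click_through_rates(aggregated_rows)
# Stories ordered from highest to lowest click-through rate.
ranked = sorted(ctr, key=ctr.get, reverse=True)
```

Because the heavy aggregation already happened in the warehouse, the Python side only handles a small result set.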
Processing the Data and Deployment
- As with the analysis, we use both BigQuery and Python for deployment
- We use our own system, built on top of Airflow, to manage the BigQuery queries
- The Python code is deployed on Kubernetes as a CronJob
- The results are stored in a serving database (Elasticsearch, MySQL, Redis, or Bigtable)
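A scheduled Python job of this kind could be sketched as follows. The slides do not say which algorithm the jobs run, so item-to-item co-occurrence counting is used here purely as a stand-in, and a plain dict stands in for the serving database. All names and data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical input: the stories each user has read.
reads_by_user = {
    "u1": ["s1", "s2", "s3"],
    "u2": ["s1", "s2"],
    "u3": ["s2", "s3"],
    "u4": ["s1", "s2"],
}


def build_related_stories(reads, top_n=2):
    """Count how often two stories are read by the same user; keep the top_n per story."""
    co_counts = defaultdict(Counter)
    for stories in reads.values():
        # Each unordered pair of distinct stories counts once per user.
        for a, b in combinations(sorted(set(stories)), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    return {
        story: [other for other, _ in counter.most_common(top_n)]
        for story, counter in co_counts.items()
    }


# A plain dict stands in for the serving database here.
serving_store = build_related_stories(reads_by_user)
```

Precomputing the lists in a batch job means the serving API only has to do a key lookup at request time.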
Serving the Recommendation
- Here we explain the infrastructure for serving the API
- Once the results are in the serving database, the last step is serving the API
- We also serve the API using Python (Sanic or Flask) on Kubernetes
- The endpoints in Kubernetes are all combined under one subdomain behind an NGINX reverse proxy for integration
- We can cache the endpoints to reduce latency
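The caching idea can be sketched independently of the web framework. The snippet below is a minimal in-memory time-to-live cache, written as an assumption about how such caching might work; the slides do not say where or how the cache is implemented, and `TTLCache` and `fetch_recommendations` are hypothetical names.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: skip the database
        value = compute(key)
        self._entries[key] = (now + self.ttl_seconds, value)
        return value


# Hypothetical stand-in for a query against the serving database.
db_calls = []


def fetch_recommendations(user_id):
    db_calls.append(user_id)
    return ["s2", "s3"]


cache = TTLCache(ttl_seconds=60)
first = cache.get_or_compute("u1", fetch_recommendations)
second = cache.get_or_compute("u1", fetch_recommendations)  # served from the cache
```

The trade-off is freshness: a longer TTL lowers latency and database load, but users may see slightly stale recommendations.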
Iterate and Improve
- To iterate and improve basically means changing the parameters, the algorithm, or possibly the UX
- One of the challenges is knowing whether a change actually brings an improvement
- We can use an AB-test to test our hypothesis that the new idea brings a statistically significant improvement!
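One common way to check statistical significance for conversion-style metrics is a two-proportion z-test. The sketch below is an illustration under that assumption; the slides do not state which test the analysis uses, and the numbers are made up.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two_sided_p_value) for H0: both variants convert at the same rate."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B are identical.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up numbers for illustration only.
z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
significant = p_value < 0.05
```

A positive `z` with a p-value below the chosen threshold suggests variant B converts better than A; the 0.05 threshold is a convention, not something taken from the slides.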
Iterate and Improve
- We built our own AB-test platform at Kumparan
- It is divided into two parts:
  - AB-test system
  - AB-test analysis
- We use the open-source library PlanOut for the AB-test system
- For the AB-test analysis, we use the Python Plotly Dash library for visualization
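The core idea behind an AB-test assignment system is that the same user always lands in the same variant. The sketch below illustrates deterministic hash-based assignment using only `hashlib`; it does not use PlanOut's API, and the function name, experiment name, and variant labels are hypothetical.

```python
import hashlib


def assign_variant(experiment, user_id, variants=("control", "treatment")):
    """Deterministically map a user to a variant by hashing experiment name and user id."""
    digest = hashlib.sha1(f"{experiment}.{user_id}".encode("utf-8")).hexdigest()
    # Use a prefix of the hash as an integer and spread users evenly over the variants.
    bucket = int(digest[:15], 16) % len(variants)
    return variants[bucket]


# Hypothetical experiment name.
variant = assign_variant("new_ranking_v2", "u123")
```

Including the experiment name in the hash input means a user's assignment in one experiment does not determine their assignment in another.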
Iterate and Improve
AB-test System
Iterate and Improve
AB-test Analysis
Conclusion
- We explained the recommender infrastructure at Kumparan
- The infrastructure includes data gathering, processing, serving, and how to iterate and improve
- I hope this presentation is useful :)
Recommender Infrastructure Diagram
AB-Test Infrastructure Diagram
THANK YOU!
We are hiring!
1. Data Engineer
2. Data Scientist
3. BI Engineer
4. BI Analyst
5. Software Engineer (Frontend, Backend & Mobile Application)
Email us at joindev@kumparan.com
QUESTIONS ?
