StudySapuri Data Analytics Platform
with Treasure Data
Tetsuo Yamabe
Recruit Marketing Partners Co., Ltd.
(C) Recruit Marketing Partners Co.,Ltd. All rights reserved.
About Me
Tetsuo Yamabe
Data Engineer / Ph.D. (Eng)
Communication Design Group
Business Development Department
Online Learning Development Office
Education & Learning Business Division
Joined RMP in Aug. 2015
10 months of TD experience
Developing the data analytics platform for our online learning service (a.k.a. StudySapuri)
• From 980 JPY / month
• Individual and in-class business models
Individual / In class
http://www.slideshare.net/Seigen/ss-61816140
Adaptive learning for a personalized learning experience (LX)
Collaborative research with Matsuo Lab., University of Tokyo
Outline
1. Background
2. Platform Migration and TD
3. Technical Details
4. Challenges and Future Work
5. Conclusion
1. Background
Recruit Technologies / Recruit Marketing Partners
Recruit Marketing Partners / Recruit Technologies / Quipper
Quipper
• “Distributors of Wisdom”
‒ Japanese EdTech company launched in London
‒ Teacher-student communication support system
• Worldwide presence in the global education scene
‒ London, Tokyo, Manila, Jakarta, Mexico City
‒ Open culture with strong engineering competence
‒ Acquired by Recruit Marketing Partners in Apr. 2015
Recruit private cloud (Before) → AWS (After), 2016.2.25
2. Platform Migration and TD
Before “Quipper Migration”
• Main usage
‒ KPI monitoring
‒ Ad hoc user activity analytics
• Used together with a private Hadoop cluster
‒ WebHive
Before “Quipper Migration”
Raw tables/logs (Member attributes, Activity logs) → Transformed tables
Teams: Data, Ops
Extract, Transform and Load (ETL) Pattern
Pros
• Easy to use (simple schema, aggregated information)
• Easy to maintain (from the data team’s perspective)
• Reduced size of information and logs
Cons
• Inflexible: fixed data sources and schema definitions
• Bloating tables
• Black-boxed transformation
• Communication cost across divisions/companies
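A minimal sketch of the ETL pattern, assuming hypothetical activity-log records and using an in-memory SQLite database as a stand-in for the warehouse (it is not TD). The logs are aggregated in application code before loading, so only the summary table ever reaches the warehouse.

```python
import sqlite3
from collections import Counter

# Hypothetical raw activity logs: (user_id, dt, event)
RAW_LOGS = [
    ("user_0001", "2016-04-01", "video_play"),
    ("user_0001", "2016-04-01", "quiz_answer"),
    ("user_0002", "2016-04-01", "video_play"),
    ("user_0001", "2016-04-02", "video_play"),
]


def extract():
    """Extract: read raw records from the source (here, a constant list)."""
    return list(RAW_LOGS)


def transform(rows):
    """Transform: count events per (user_id, dt) before anything is loaded."""
    counts = Counter((user_id, dt) for user_id, dt, _event in rows)
    return [(user_id, dt, n) for (user_id, dt), n in sorted(counts.items())]


def load(conn, rows):
    """Load: only the aggregated rows reach the warehouse."""
    conn.execute("CREATE TABLE daily_activity (user_id TEXT, dt TEXT, events INTEGER)")
    conn.executemany("INSERT INTO daily_activity VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the warehouse
load(conn, transform(extract()))
print(conn.execute("SELECT * FROM daily_activity ORDER BY user_id, dt").fetchall())
```

The `event` column never reaches the warehouse. A new question such as "how many quiz answers?" therefore requires changing the upstream transformation, which is the inflexibility and black-boxing listed under Cons.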
After “Quipper Migration”
Raw tables/logs (Member attributes, Activity logs) → Scooped tables → Transformed tables
Teams: Data, Infra, Dev
Extract, Load and Transform (ELT) Pattern
Pros
• Everything you need or want is available
• All data is consolidated in TD
Cons
• Business logic duplicated between the application and TD
• Batch process maintenance cost
• Larger data volume and longer load times
• Learning cost (application data and internal architecture)
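For contrast, a minimal ELT sketch under the same assumptions (hypothetical records, in-memory SQLite standing in for TD): the raw records are first copied unchanged into a "scooped" table, and the transformation runs afterwards as SQL inside the warehouse.

```python
import sqlite3

# Hypothetical raw activity logs: (user_id, dt, event)
RAW_LOGS = [
    ("user_0001", "2016-04-01", "video_play"),
    ("user_0001", "2016-04-01", "quiz_answer"),
    ("user_0002", "2016-04-01", "video_play"),
    ("user_0001", "2016-04-02", "video_play"),
]

conn = sqlite3.connect(":memory:")  # stand-in for the warehouse

# Extract + Load: copy the application records as-is into a scooped table.
conn.execute("CREATE TABLE scooped_activity_logs (user_id TEXT, dt TEXT, event TEXT)")
conn.executemany("INSERT INTO scooped_activity_logs VALUES (?, ?, ?)", RAW_LOGS)

# Transform: derive a table inside the warehouse with SQL.
conn.execute(
    """
    CREATE TABLE daily_activity AS
    SELECT user_id, dt, COUNT(*) AS events
    FROM scooped_activity_logs
    GROUP BY user_id, dt
    """
)
conn.commit()

print(conn.execute("SELECT * FROM daily_activity ORDER BY user_id, dt").fetchall())

# The raw table remains, so an unplanned question can still be answered.
video_plays = conn.execute(
    "SELECT COUNT(*) FROM scooped_activity_logs WHERE event = 'video_play'"
).fetchone()[0]
print(video_plays)
```

The trade-off mirrors the Cons above: the warehouse now stores every raw row, and the aggregation logic lives in SQL, separately from any equivalent logic in the application.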
Use cases
• Content performance monitoring
• Customer Support support
• Student performance reports
• Class status reports
• KPI monitoring
• Sales support
• Developer support
• Prototyping new features
• Data science support
Fact Sheet
• 50+ tables are imported daily with Embulk
• 30+ Hive queries are invoked by Luigi
• 10+ Presto queries are scheduled in the TD web console
• 20+ reports are delivered to 5 business divisions
3. Technical Details
System Overview
• Application (client side) → TD SDK
• Application (server side) → Kinesis → Lambda → streaming insert
• Databases → bulk import
• Stores: PlazmaDB, DataTank (join w/ FDW)
• Other data: payment logs, video info
Featured Topics
• Client-side events
‒ SPA event tracking
‒ Customized TD tag
• Server-side events
‒ Streaming insert with Kinesis + Lambda
• td-client-python
‒ Durability improvement
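A minimal sketch of the server-side path, assuming a Python AWS Lambda function subscribed to a Kinesis stream. Lambda delivers each Kinesis record base64-encoded under `record["kinesis"]["data"]`. The `table` field in the payload, the table names, and `send_batch` are hypothetical: the deck does not show the actual insert call into TD, so `send_batch` only prints.

```python
import base64
import json
from collections import defaultdict


def decode_records(event):
    """Yield the JSON payload of each Kinesis record in a Lambda event."""
    for record in event.get("Records", []):
        raw = base64.b64decode(record["kinesis"]["data"])
        yield json.loads(raw)


def group_by_table(payloads):
    """Group events by a hypothetical "table" field so each destination
    table gets one batched insert instead of one call per event."""
    batches = defaultdict(list)
    for payload in payloads:
        table = payload.pop("table", "server_events")
        batches[table].append(payload)
    return dict(batches)


def send_batch(table, rows):
    """Hypothetical sink: replace with the real insert into TD."""
    print(f"would insert {len(rows)} row(s) into {table}")


def handler(event, context):
    batches = group_by_table(decode_records(event))
    for table, rows in batches.items():
        send_batch(table, rows)
    return {table: len(rows) for table, rows in batches.items()}


def _kinesis_record(payload):
    """Build a record in the shape Lambda receives from Kinesis (local testing)."""
    data = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")
    return {"kinesis": {"data": data}}


sample_event = {
    "Records": [
        _kinesis_record({"table": "payments", "user_id": "user_0001", "amount": 980}),
        _kinesis_record({"table": "video_views", "user_id": "user_0002", "video_id": "v1"}),
        _kinesis_record({"table": "payments", "user_id": "user_0003", "amount": 980}),
    ]
}
print(handler(sample_event, None))
```

Batching per table keeps the number of downstream insert calls proportional to the number of destination tables rather than the number of events.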
Featured Topics
• DataTank
‒ Isolate sensitive information from PlazmaDB
‒ Data mart store for connecting BI tools
• Luigi
‒ Define data transformation jobs with table dependencies
‒ Invoke Embulk commands inside Luigi jobs
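The idea of "jobs with table dependencies" can be sketched without Luigi itself. The version below is a stand-in that uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to order hypothetical tables so that each one is built after its inputs, and builds an `embulk run <config>` command for tables that are imported rather than derived. All table names and config paths are hypothetical, and with `execute=False` nothing is actually run.

```python
import subprocess
from graphlib import TopologicalSorter

# Hypothetical table dependencies: each table maps to the tables it is built from.
# In Luigi this information lives in each task's requires(); graphlib is used
# here only so the sketch runs without Luigi installed.
TABLE_DEPENDENCIES = {
    "scooped_users": set(),
    "scooped_activity_logs": set(),
    "user_daily_activity": {"scooped_users", "scooped_activity_logs"},
    "class_status_report": {"user_daily_activity"},
}

# Hypothetical Embulk configs for the tables that are imported, not derived.
EMBULK_CONFIGS = {
    "scooped_users": "config/scooped_users.yml.liquid",
    "scooped_activity_logs": "config/scooped_activity_logs.yml.liquid",
}


def embulk_command(config_path):
    """Command line for `embulk run <config>`."""
    return ["embulk", "run", config_path]


def plan(dependencies):
    """Build order in which every table comes after all of its inputs."""
    return list(TopologicalSorter(dependencies).static_order())


def run_all(dependencies, execute=False):
    """Walk the plan: import tables via Embulk, derive the rest with queries.

    With execute=False nothing is run; only the planned steps are returned.
    """
    steps = []
    for table in plan(dependencies):
        if table in EMBULK_CONFIGS:
            cmd = embulk_command(EMBULK_CONFIGS[table])
            if execute:
                subprocess.run(cmd, check=True)  # raise if Embulk fails
            steps.append(("embulk", table))
        else:
            # Placeholder: a real job would submit the Hive query for `table` here.
            steps.append(("transform", table))
    return steps


for step in run_all(TABLE_DEPENDENCIES):
    print(step)
```

In Luigi, the same graph is declared by returning upstream tasks from `requires()`, and each task's `output()` target lets the scheduler skip work that is already complete.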
Featured Topics
• Bulk import
‒ Cross import from MongoDB and PostgreSQL to PlazmaDB and DataTank
• embulk-input-mongodb
• embulk-input-postgresql
• embulk-filter-insert
• embulk-filter-eval
• embulk-output-td
• embulk-output-postgresql
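A sketch of what one PostgreSQL → TD import job might look like, written as a Python dict with the same shape as an Embulk YAML config (`in` / `out` sections, optionally a `filters:` list for plugins such as embulk-filter-insert). The host, credentials, database, table, and time column are hypothetical placeholders. In practice, secrets would come from the environment, for example through a `.yml.liquid` template.

```python
import json


def postgresql_to_td_config(table, td_database, time_column):
    """Build an Embulk config (as a dict) copying one PostgreSQL table to TD.

    All connection values are hypothetical placeholders.
    """
    return {
        "in": {
            "type": "postgresql",  # embulk-input-postgresql
            "host": "db.example.internal",
            "user": "readonly",
            "password": "********",
            "database": "app",
            "table": table,
        },
        "out": {
            "type": "td",  # embulk-output-td
            "apikey": "********",
            "endpoint": "api.treasuredata.com",
            "database": td_database,
            "table": table,
            "time_column": time_column,
        },
    }


config = postgresql_to_td_config("payments", "studysapuri_raw", "created_at")
print(json.dumps(config, indent=2))  # printed for inspection; Embulk reads YAML
```

Generating configs from one function keeps the many per-table import jobs consistent. The output side can be switched to embulk-output-postgresql for DataTank in the same way.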
4. Challenges and Future Work
Scooped raw tables → Transformed tables → Report tables / marts
Scheduled queries in the web console
• Select all without conditions
• Column names assigned in Japanese
• Results exported to Google Sheets
Transform tables in Luigi tasks
Record Set Versioning in the Transformation Phase
Partition-based versioning pattern
(Figure: each of Table A, B and C receives records user_0001 to user_0003 for 2016/03/31, 2016/04/01 and 2016/04/02, appended to the same table as daily partitions.)
Record Set Versioning in the Transformation Phase
Table-based versioning pattern
(Figure: a new table (Table A_yyyymmdd, Table B_yyyymmdd, Table C_yyyymmdd) is created for each of 2016/03/31, 2016/04/01 and 2016/04/02, each holding records user_0001 to user_0003.)
Record Set Versioning in the Transformation Phase
• Table-based versioning doesn’t fit TD
‒ The growing number of tables degrades query performance
‒ A UNION across all the tables is needed
‒ Appending and removing tables is not realistic
• Partition-based versioning with a “once a day” rule
‒ Drop the daily partition before inserting records
‒ ALTER TABLE support would help by allowing a partition to be dropped within a query
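The "once a day" rule amounts to an idempotent daily load: drop that day's partition, then insert. Below is a minimal sketch using SQLite as a stand-in (it is not TD, and SQLite has no real partitions). A `DELETE` on a hypothetical `dt` column plays the role of "drop partition", and the table and column names are illustrative only.

```python
import sqlite3


def load_daily_partition(conn, dt, rows):
    """Replace the partition for `dt` with `rows` (idempotent daily load).

    The existing partition is removed first, so re-running the job for the
    same day never leaves duplicate records.
    """
    with conn:  # one transaction: delete and insert succeed or fail together
        conn.execute("DELETE FROM user_snapshot WHERE dt = ?", (dt,))
        conn.executemany(
            "INSERT INTO user_snapshot (dt, user_id, status) VALUES (?, ?, ?)",
            [(dt, user_id, status) for user_id, status in rows],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_snapshot (dt TEXT, user_id TEXT, status TEXT)")

# Hypothetical daily snapshot of user records
snapshot = [("user_0001", "active"), ("user_0002", "active"), ("user_0003", "paused")]
load_daily_partition(conn, "2016-04-01", snapshot)
load_daily_partition(conn, "2016-04-02", snapshot)
load_daily_partition(conn, "2016-04-02", snapshot)  # re-run: no duplicates

for dt, n in conn.execute("SELECT dt, COUNT(*) FROM user_snapshot GROUP BY dt ORDER BY dt"):
    print(dt, n)

# Reading one version is a filter on dt, not a UNION over per-day tables.
print(conn.execute(
    "SELECT user_id FROM user_snapshot WHERE dt = '2016-04-01' ORDER BY user_id"
).fetchall())
```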
Reusing the Application’s Business Logic
• Frequently used clauses should be defined as common UDFs or views
‒ Including schema definitions, constant definitions, etc.
‒ TD lacks both UDF and view features
• Pre-transform complicated tables on the application side before loading them into TD?
‒ Hybrid approach
‒ Reuse application code
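One possible workaround, given that the deck notes TD lacks views and UDFs, is to generate frequently used SQL clauses from the application's own constant definitions so that the definition lives in one place. The `Plan` enum, its codes, and the `members` table are hypothetical, and SQLite stands in for the query engine.

```python
import sqlite3
from enum import Enum


class Plan(Enum):
    """Hypothetical application-side constant definition."""
    INDIVIDUAL = 1
    IN_CLASS = 2


def case_expression(column, enum_cls):
    """Render an SQL CASE expression mapping stored codes to enum names.

    Generating the clause from the application's definition keeps a single
    source of truth, standing in for a shared view or UDF.
    """
    branches = " ".join(
        f"WHEN {member.value} THEN '{member.name.lower()}'" for member in enum_cls
    )
    return f"CASE {column} {branches} ELSE 'unknown' END"


query = "SELECT user_id, " + case_expression("plan_code", Plan) + " AS plan FROM members"
print(query)

# Check the generated clause against a tiny in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (user_id TEXT, plan_code INTEGER)")
conn.executemany(
    "INSERT INTO members VALUES (?, ?)",
    [("user_0001", 1), ("user_0002", 2), ("user_0003", 9)],
)
rows = conn.execute(query + " ORDER BY user_id").fetchall()
print(rows)
```

If a new plan is added to the enum, every generated query picks it up without hand-editing SQL.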
Other topics
• Growing number of users across divisions
‒ Account management (incl. dev/ops/biz)
‒ Contention for Presto resources
‒ Large file delivery via the web console
• Presto/Hive query testing framework
‒ Test against small datasets through the Presto/Hive SQL interface?
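A minimal sketch of the "test against a small dataset" idea: load a handful of fixture rows, run the query under test, and compare with a hand-computed result. SQLite is used here only because it ships with Python. Its SQL dialect differs from Presto/Hive, so this checks query logic rather than engine compatibility. The query, table, and columns are hypothetical.

```python
import sqlite3

# Query under test (hypothetical): daily active users per day.
DAU_QUERY = """
SELECT dt, COUNT(DISTINCT user_id) AS dau
FROM activity_logs
GROUP BY dt
ORDER BY dt
"""


def run_query_on_fixture(query, table, columns, rows):
    """Load fixture rows into an in-memory table and return the query result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


def test_dau_counts_each_user_once_per_day():
    fixture = [
        ("user_0001", "2016-04-01"),
        ("user_0001", "2016-04-01"),  # repeated event must not be double-counted
        ("user_0002", "2016-04-01"),
        ("user_0001", "2016-04-02"),
    ]
    result = run_query_on_fixture(DAU_QUERY, "activity_logs", ["user_id", "dt"], fixture)
    assert result == [("2016-04-01", 2), ("2016-04-02", 1)]


test_dau_counts_each_user_once_per_day()
print("ok")
```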
5. Conclusion
Success Factors
• TD allowed us to focus on understanding the application and communicating with Quipper engineers
‒ Fully managed Hadoop service
‒ Quick responses from customer support
• Different DBs, but still in the same TD
‒ No extra cost for cross-database JOINs
‒ Continuous analytics with JukenSapuri data
Success Factors
• Quipper’s culture and strong skills were very helpful in setting up a data analytics platform for their application
‒ The global market already had a BigQuery (BQ) based platform
‒ Open information and communication
• Slack x GitHub x Google Drive
‒ Clean, highly readable code
‒ HRT: Humility, Respect, and Trust
• Cultural convergence between Quipper and RMP
Conway’s Law?
Data / Infra / Dev
Casual, open communication over chat and pull requests (PRs)
Beyond Monitoring and Reporting
• More sophisticated machine learning with Hivemall
• Real-time data processing and feeding results back to the application
Distributors of Wisdom
x
Delivering the best learning to the ends of the earth (世界の果てまで最高のまなびを届ける)
42
(C) Recruit Marketing Partners Co.,Ltd. All rights reserved. 43

Contenu connexe

En vedette

『GMOプライベートDMP』の開発にあたって取り組んできた DevOps、更にその反省点と現在進行中のカイゼン事例の紹介
『GMOプライベートDMP』の開発にあたって取り組んできた DevOps、更にその反省点と現在進行中のカイゼン事例の紹介『GMOプライベートDMP』の開発にあたって取り組んできた DevOps、更にその反省点と現在進行中のカイゼン事例の紹介
『GMOプライベートDMP』の開発にあたって取り組んできた DevOps、更にその反省点と現在進行中のカイゼン事例の紹介Tetsuo Yamabe
 
継続的デリバリーと読み解く Web 開発あるあるとその対策
継続的デリバリーと読み解く Web 開発あるあるとその対策継続的デリバリーと読み解く Web 開発あるあるとその対策
継続的デリバリーと読み解く Web 開発あるあるとその対策Tetsuo Yamabe
 
GMO プライベート DMP 開発で 取り組んできた DevOps と今後の展望
GMO プライベート DMP 開発で 取り組んできた DevOps と今後の展望GMO プライベート DMP 開発で 取り組んできた DevOps と今後の展望
GMO プライベート DMP 開発で 取り組んできた DevOps と今後の展望Tetsuo Yamabe
 
GMO プライベート DMP で ビッグデータ解析をするために アプリクラウドで Apache Spark の検証をしてみた
GMO プライベート DMP で ビッグデータ解析をするために アプリクラウドで Apache Spark の検証をしてみたGMO プライベート DMP で ビッグデータ解析をするために アプリクラウドで Apache Spark の検証をしてみた
GMO プライベート DMP で ビッグデータ解析をするために アプリクラウドで Apache Spark の検証をしてみたTetsuo Yamabe
 
並列データベースシステムの概念と原理
並列データベースシステムの概念と原理並列データベースシステムの概念と原理
並列データベースシステムの概念と原理Makoto Yui
 
モノタロウが トレジャーデータを使う理由と、 データを活かす企業文化
モノタロウがトレジャーデータを使う理由と、データを活かす企業文化モノタロウがトレジャーデータを使う理由と、データを活かす企業文化
モノタロウが トレジャーデータを使う理由と、 データを活かす企業文化株式会社MonotaRO Tech Team
 
ゲームニクス理論
ゲームニクス理論ゲームニクス理論
ゲームニクス理論TANREN Inc.
 
EmbulkとDigdagとデータ分析基盤と
EmbulkとDigdagとデータ分析基盤とEmbulkとDigdagとデータ分析基盤と
EmbulkとDigdagとデータ分析基盤とToru Takahashi
 
時を超えた越境への道
時を超えた越境への道時を超えた越境への道
時を超えた越境への道toshihiro ichitani
 
なぜ今、ハードテックスタートアップなのか
なぜ今、ハードテックスタートアップなのかなぜ今、ハードテックスタートアップなのか
なぜ今、ハードテックスタートアップなのかTakaaki Umada
 
Re:ゼロから文化を創り、技術を伝承する ~客先常駐エンジニアと「社内勉強会」で築いた価値と変化
Re:ゼロから文化を創り、技術を伝承する ~客先常駐エンジニアと「社内勉強会」で築いた価値と変化Re:ゼロから文化を創り、技術を伝承する ~客先常駐エンジニアと「社内勉強会」で築いた価値と変化
Re:ゼロから文化を創り、技術を伝承する ~客先常駐エンジニアと「社内勉強会」で築いた価値と変化Shunsuke Suga
 
2016-10-25 product manager conference 資料
2016-10-25 product manager conference 資料2016-10-25 product manager conference 資料
2016-10-25 product manager conference 資料Takeo Iyo
 
とあるスタートアップの評価指標(メトリクス)
とあるスタートアップの評価指標(メトリクス)とあるスタートアップの評価指標(メトリクス)
とあるスタートアップの評価指標(メトリクス)Takaaki Umada
 
ハードテック スタートアップのトレンド (2016 年版)
ハードテック スタートアップのトレンド (2016 年版)ハードテック スタートアップのトレンド (2016 年版)
ハードテック スタートアップのトレンド (2016 年版)Takaaki Umada
 
ゼロからはじめるプロダクトマネージャー生活
ゼロからはじめるプロダクトマネージャー生活ゼロからはじめるプロダクトマネージャー生活
ゼロからはじめるプロダクトマネージャー生活Takaaki Umada
 
Googleのインフラ技術から考える理想のDevOps
Googleのインフラ技術から考える理想のDevOpsGoogleのインフラ技術から考える理想のDevOps
Googleのインフラ技術から考える理想のDevOpsEtsuji Nakai
 
逆説のスタートアップ思考
逆説のスタートアップ思考逆説のスタートアップ思考
逆説のスタートアップ思考Takaaki Umada
 

En vedette (17)

『GMOプライベートDMP』の開発にあたって取り組んできた DevOps、更にその反省点と現在進行中のカイゼン事例の紹介
『GMOプライベートDMP』の開発にあたって取り組んできた DevOps、更にその反省点と現在進行中のカイゼン事例の紹介『GMOプライベートDMP』の開発にあたって取り組んできた DevOps、更にその反省点と現在進行中のカイゼン事例の紹介
『GMOプライベートDMP』の開発にあたって取り組んできた DevOps、更にその反省点と現在進行中のカイゼン事例の紹介
 
継続的デリバリーと読み解く Web 開発あるあるとその対策
継続的デリバリーと読み解く Web 開発あるあるとその対策継続的デリバリーと読み解く Web 開発あるあるとその対策
継続的デリバリーと読み解く Web 開発あるあるとその対策
 
GMO プライベート DMP 開発で 取り組んできた DevOps と今後の展望
GMO プライベート DMP 開発で 取り組んできた DevOps と今後の展望GMO プライベート DMP 開発で 取り組んできた DevOps と今後の展望
GMO プライベート DMP 開発で 取り組んできた DevOps と今後の展望
 
GMO プライベート DMP で ビッグデータ解析をするために アプリクラウドで Apache Spark の検証をしてみた
GMO プライベート DMP で ビッグデータ解析をするために アプリクラウドで Apache Spark の検証をしてみたGMO プライベート DMP で ビッグデータ解析をするために アプリクラウドで Apache Spark の検証をしてみた
GMO プライベート DMP で ビッグデータ解析をするために アプリクラウドで Apache Spark の検証をしてみた
 
並列データベースシステムの概念と原理
並列データベースシステムの概念と原理並列データベースシステムの概念と原理
並列データベースシステムの概念と原理
 
モノタロウが トレジャーデータを使う理由と、 データを活かす企業文化
モノタロウがトレジャーデータを使う理由と、データを活かす企業文化モノタロウがトレジャーデータを使う理由と、データを活かす企業文化
モノタロウが トレジャーデータを使う理由と、 データを活かす企業文化
 
ゲームニクス理論
ゲームニクス理論ゲームニクス理論
ゲームニクス理論
 
EmbulkとDigdagとデータ分析基盤と
EmbulkとDigdagとデータ分析基盤とEmbulkとDigdagとデータ分析基盤と
EmbulkとDigdagとデータ分析基盤と
 
時を超えた越境への道
時を超えた越境への道時を超えた越境への道
時を超えた越境への道
 
なぜ今、ハードテックスタートアップなのか
なぜ今、ハードテックスタートアップなのかなぜ今、ハードテックスタートアップなのか
なぜ今、ハードテックスタートアップなのか
 
Re:ゼロから文化を創り、技術を伝承する ~客先常駐エンジニアと「社内勉強会」で築いた価値と変化
Re:ゼロから文化を創り、技術を伝承する ~客先常駐エンジニアと「社内勉強会」で築いた価値と変化Re:ゼロから文化を創り、技術を伝承する ~客先常駐エンジニアと「社内勉強会」で築いた価値と変化
Re:ゼロから文化を創り、技術を伝承する ~客先常駐エンジニアと「社内勉強会」で築いた価値と変化
 
2016-10-25 product manager conference 資料
2016-10-25 product manager conference 資料2016-10-25 product manager conference 資料
2016-10-25 product manager conference 資料
 
とあるスタートアップの評価指標(メトリクス)
とあるスタートアップの評価指標(メトリクス)とあるスタートアップの評価指標(メトリクス)
とあるスタートアップの評価指標(メトリクス)
 
ハードテック スタートアップのトレンド (2016 年版)
ハードテック スタートアップのトレンド (2016 年版)ハードテック スタートアップのトレンド (2016 年版)
ハードテック スタートアップのトレンド (2016 年版)
 
ゼロからはじめるプロダクトマネージャー生活
ゼロからはじめるプロダクトマネージャー生活ゼロからはじめるプロダクトマネージャー生活
ゼロからはじめるプロダクトマネージャー生活
 
Googleのインフラ技術から考える理想のDevOps
Googleのインフラ技術から考える理想のDevOpsGoogleのインフラ技術から考える理想のDevOps
Googleのインフラ技術から考える理想のDevOps
 
逆説のスタートアップ思考
逆説のスタートアップ思考逆説のスタートアップ思考
逆説のスタートアップ思考
 

Similaire à StudySapuri Data Analytics Platform with Treasure Data

3.pp level iii training ppt block&precasting, crusher
3.pp level iii   training ppt  block&precasting, crusher3.pp level iii   training ppt  block&precasting, crusher
3.pp level iii training ppt block&precasting, crusherAbhijit Patil
 
Embrace And Extend Your Analytics
Embrace And Extend Your AnalyticsEmbrace And Extend Your Analytics
Embrace And Extend Your AnalyticsWiiisdom
 
Neo4j GraphTour New York_EY Presentation_Michael Moore
Neo4j GraphTour New York_EY Presentation_Michael MooreNeo4j GraphTour New York_EY Presentation_Michael Moore
Neo4j GraphTour New York_EY Presentation_Michael MooreNeo4j
 
Webinar (UK/Europe) - Demystifying SAP S/4HANA & Test Automation
Webinar (UK/Europe) - Demystifying SAP S/4HANA & Test AutomationWebinar (UK/Europe) - Demystifying SAP S/4HANA & Test Automation
Webinar (UK/Europe) - Demystifying SAP S/4HANA & Test AutomationJK Tech
 
SCM Migration Webinar - English
SCM Migration Webinar - EnglishSCM Migration Webinar - English
SCM Migration Webinar - EnglishCollabNet
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyYour Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyNeo4j
 
Migrating from Oracle to Postgres
Migrating from Oracle to PostgresMigrating from Oracle to Postgres
Migrating from Oracle to PostgresEDB
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph Strategy Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph Strategy Neo4j
 
Data-Centric Approach for Project Delivery
Data-Centric Approach for Project DeliveryData-Centric Approach for Project Delivery
Data-Centric Approach for Project DeliveryAVEVA Group plc
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyYour Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyNeo4j
 
Moving from BI to AI : For decision makers
Moving from BI to AI : For decision makersMoving from BI to AI : For decision makers
Moving from BI to AI : For decision makerszekeLabs Technologies
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyYour Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyNeo4j
 
Unified Data Catalog - Recommendations powered by Apache Spark & Neo4j
Unified Data Catalog - Recommendations powered by Apache Spark & Neo4jUnified Data Catalog - Recommendations powered by Apache Spark & Neo4j
Unified Data Catalog - Recommendations powered by Apache Spark & Neo4jDeepak Chandramouli
 
Sugar Analytics: Powered by Cognos Business Intelligence
Sugar Analytics: Powered by Cognos Business IntelligenceSugar Analytics: Powered by Cognos Business Intelligence
Sugar Analytics: Powered by Cognos Business IntelligenceSugarCRM
 
Unifying Marketing Data & Multi-Touch Attribution Analysis
Unifying Marketing Data & Multi-Touch Attribution AnalysisUnifying Marketing Data & Multi-Touch Attribution Analysis
Unifying Marketing Data & Multi-Touch Attribution AnalysisPrinciple America
 
Webinar (UK/Europe) - Demystifying SAP S/4HANA
Webinar (UK/Europe) - Demystifying SAP S/4HANAWebinar (UK/Europe) - Demystifying SAP S/4HANA
Webinar (UK/Europe) - Demystifying SAP S/4HANAJK Tech
 
Ai and data migration as a service subhash bhat cwin18-india
Ai and data migration as a service subhash bhat cwin18-indiaAi and data migration as a service subhash bhat cwin18-india
Ai and data migration as a service subhash bhat cwin18-indiaCapgemini
 

Similaire à StudySapuri Data Analytics Platform with Treasure Data (20)

3.pp level iii training ppt block&precasting, crusher
3.pp level iii   training ppt  block&precasting, crusher3.pp level iii   training ppt  block&precasting, crusher
3.pp level iii training ppt block&precasting, crusher
 
Resume_Presious
Resume_PresiousResume_Presious
Resume_Presious
 
Embrace And Extend Your Analytics
Embrace And Extend Your AnalyticsEmbrace And Extend Your Analytics
Embrace And Extend Your Analytics
 
Neo4j GraphTour New York_EY Presentation_Michael Moore
Neo4j GraphTour New York_EY Presentation_Michael MooreNeo4j GraphTour New York_EY Presentation_Michael Moore
Neo4j GraphTour New York_EY Presentation_Michael Moore
 
Webinar (UK/Europe) - Demystifying SAP S/4HANA & Test Automation
Webinar (UK/Europe) - Demystifying SAP S/4HANA & Test AutomationWebinar (UK/Europe) - Demystifying SAP S/4HANA & Test Automation
Webinar (UK/Europe) - Demystifying SAP S/4HANA & Test Automation
 
SCM Migration Webinar - English
SCM Migration Webinar - EnglishSCM Migration Webinar - English
SCM Migration Webinar - English
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyYour Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph Strategy
 
Migrating from Oracle to Postgres
Migrating from Oracle to PostgresMigrating from Oracle to Postgres
Migrating from Oracle to Postgres
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph Strategy Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph Strategy
 
Data-Centric Approach for Project Delivery
Data-Centric Approach for Project DeliveryData-Centric Approach for Project Delivery
Data-Centric Approach for Project Delivery
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyYour Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph Strategy
 
Amit_Kumar_CV
Amit_Kumar_CVAmit_Kumar_CV
Amit_Kumar_CV
 
Moving from BI to AI : For decision makers
Moving from BI to AI : For decision makersMoving from BI to AI : For decision makers
Moving from BI to AI : For decision makers
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyYour Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph Strategy
 
Unified Data Catalog - Recommendations powered by Apache Spark & Neo4j
Unified Data Catalog - Recommendations powered by Apache Spark & Neo4jUnified Data Catalog - Recommendations powered by Apache Spark & Neo4j
Unified Data Catalog - Recommendations powered by Apache Spark & Neo4j
 
Sugar Analytics: Powered by Cognos Business Intelligence
Sugar Analytics: Powered by Cognos Business IntelligenceSugar Analytics: Powered by Cognos Business Intelligence
Sugar Analytics: Powered by Cognos Business Intelligence
 
Unifying Marketing Data & Multi-Touch Attribution Analysis
Unifying Marketing Data & Multi-Touch Attribution AnalysisUnifying Marketing Data & Multi-Touch Attribution Analysis
Unifying Marketing Data & Multi-Touch Attribution Analysis
 
Webinar (UK/Europe) - Demystifying SAP S/4HANA
Webinar (UK/Europe) - Demystifying SAP S/4HANAWebinar (UK/Europe) - Demystifying SAP S/4HANA
Webinar (UK/Europe) - Demystifying SAP S/4HANA
 
Ai and data migration as a service subhash bhat cwin18-india
Ai and data migration as a service subhash bhat cwin18-indiaAi and data migration as a service subhash bhat cwin18-india
Ai and data migration as a service subhash bhat cwin18-india
 
BP_SAP_MDM
BP_SAP_MDMBP_SAP_MDM
BP_SAP_MDM
 

Dernier

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024The Digital Insurer
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 

Dernier (20)

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 

StudySapuri Data Analytics Platform with Treasure Data

  • 1. StudySapuri Data Analytics Platform with Treasure Data Tetsuo Yamabe Recruit Marketing Partners Co., Ltd.
  • 2. (C) Recruit Marketing Partners Co.,Ltd. All rights reserved. About Me Tetsuo Yamabe 2 Data Engineer / Ph.D. (Eng) Communication Design Group Business Development Department Online Learning Development Office Education & Learning Business Division
  • 3. (C) Recruit Marketing Partners Co.,Ltd. All rights reserved. About Me Tetsuo Yamabe 3 Joined RMP at Aug.2015 10 months TD experience Data analytics platform development for our online learning service (a.k.a. StudySapuri)
  • 4. (C) Recruit Marketing Partners Co.,Ltd. All rights reserved.
  • 5. (C) Recruit Marketing Partners Co.,Ltd. All rights reserved. • 980 JPY / month ~ • Individual & In class business model 5
  • 6. (C) Recruit Marketing Partners Co.,Ltd. All rights reserved. Individual In class
  • 7. (C) Recruit Marketing Partners Co.,Ltd. All rights reserved. Individual In class
  • 8. (C) Recruit Marketing Partners Co.,Ltd. All rights reserved. http://www.slideshare.net/Seigen/ss-61816140 Adaptive Learning for personalized LX Collaborative research with Matsuo Lab. at Tokyo Univ.
  • 9. (C) Recruit Marketing Partners Co.,Ltd. All rights reserved. Outline 1. Background 2. Platform Migration and TD 3. Technical Details 4. Challenges and Future Work 5. Conclusion 9
  • 10. (C) Recruit Marketing Partners Co.,Ltd. All rights reserved. 1. Background
  • 11. (C) Recruit Marketing Partners Co.,Ltd. All rights reserved. 11
  • 12. Recruit Technologies, Recruit Marketing Partners
  • 13. Recruit Marketing Partners, Recruit Technologies, Quipper
  • 14. Quipper • "Distributors of Wisdom" ‒ Japanese EdTech company launched in London ‒ Teacher-student communication support system • Worldwide presence in the global education scene ‒ London, Tokyo, Manila, Jakarta, Mexico City ‒ Open culture with strong engineering competence ‒ Acquired by Recruit Marketing Partners in Apr. 2015
  • 16. Infrastructure migration on 2016.2.25 • Before: Recruit private cloud • After: AWS
  • 17. 2. Platform Migration and TD
  • 18. Before the "Quipper Migration" • Main usage ‒ KPI monitoring ‒ Ad hoc user activity analytics • Used together with a private Hadoop cluster ‒ WebHive
  • 19. Before the "Quipper Migration" (diagram): Raw tables/logs → Transformed tables (Member attributes, Activity logs); Data Ops
  • 20. Extract, Transform and Load (ETL) pattern. Pros: • Easy to use (simple schema, aggregated information) • Easy to maintain (from the data team's perspective) • Reduced volume of information and logs. Cons: • Inflexible, fixed data sources and schema definitions • Table bloat • Black-box transformations • Communication cost across divisions/companies
  • 21. After the "Quipper Migration" (diagram): Raw tables/logs → Scooped tables (Member attributes, Activity logs) → Transformed tables; DataInfraDev
  • 22. Extract, Load and Transform (ELT) pattern. Pros: • You have everything you need or want • All data aggregated in TD. Cons: • Duplicated business logic • Batch process maintenance cost • Data volume and load time • Learning cost (application data and internal architecture)
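The ETL/ELT contrast can be sketched in a few lines. This is a minimal illustration that assumes SQLite as a stand-in for TD and uses made-up activity-log rows (`RAW_EVENTS`, `etl`, and `elt` are hypothetical names). ETL aggregates before loading and keeps only the summary. ELT loads the raw rows as-is and derives the same summary with SQL afterwards, so the raw table stays available for questions nobody planned for.

```python
import sqlite3

# Hypothetical raw activity-log rows: (user_id, day, event).
RAW_EVENTS = [
    ("user_0001", "2016-04-01", "video_play"),
    ("user_0001", "2016-04-01", "quiz_answer"),
    ("user_0002", "2016-04-01", "video_play"),
]


def etl(conn: sqlite3.Connection) -> None:
    """ETL: transform (aggregate) in code first, then load only the summary."""
    counts: dict = {}
    for user_id, day, _event in RAW_EVENTS:
        counts[(user_id, day)] = counts.get((user_id, day), 0) + 1
    conn.execute(
        "CREATE TABLE etl_daily_activity (user_id TEXT, day TEXT, n_events INTEGER)"
    )
    conn.executemany(
        "INSERT INTO etl_daily_activity VALUES (?, ?, ?)",
        [(user_id, day, n) for (user_id, day), n in counts.items()],
    )


def elt(conn: sqlite3.Connection) -> None:
    """ELT: load the raw rows as-is, then transform with SQL inside the store."""
    conn.execute("CREATE TABLE raw_events (user_id TEXT, day TEXT, event TEXT)")
    conn.executemany("INSERT INTO raw_events VALUES (?, ?, ?)", RAW_EVENTS)
    conn.execute(
        """
        CREATE TABLE elt_daily_activity AS
        SELECT user_id, day, COUNT(*) AS n_events
        FROM raw_events
        GROUP BY user_id, day
        """
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    etl(conn)
    elt(conn)
    print(conn.execute("SELECT * FROM elt_daily_activity ORDER BY user_id").fetchall())
```

Once the transformation lives in SQL, any rule the application also implements has to be written twice. That is the "duplicated business logic" con.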
  • 23. Contents (use cases): Performance Monitoring, Customer Support, Support Students, Performance Report, Class Status Report, KPI Monitoring, Salesman Support, Developer Support, Prototyping New Feature, Data Science Support
  • 24. Fact Sheet • 50+ tables are imported daily by Embulk • 30+ Hive queries are invoked by Luigi • 10+ Presto queries are scheduled in the TD web console • 20+ reports are delivered to 5 business divisions
  • 25. 3. Technical Details
  • 26. System Overview (diagram): Application (client side) → TD SDK; Application (server side) → streaming insert via Kinesis + Lambda; Databases → bulk import; PlazmaDB and DataTank (join w/ FDW); also labeled: Payment logs, Video info
  • 27. Featured Topics • Client-side events ‒ SPA event tracking ‒ Customized TD tag • Server-side events ‒ Streaming insert with Kinesis + Lambda • td-client-python ‒ Durability improvement
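A server-side streaming insert with Kinesis + Lambda comes down to decoding the batch of records that Lambda receives and writing them out. The sketch below assumes JSON payloads and a hypothetical `sink` callable that stands in for the actual write to TD (for example through td-client-python; that call is not shown). The retry with exponential backoff is one generic way to improve durability. It is an assumption of this sketch, not a description of the change the author made to td-client-python.

```python
import base64
import json
import time
from typing import Callable, List

# Hypothetical sink: writes a batch of events (e.g. to TD) and raises on failure.
Sink = Callable[[List[dict]], None]


def decode_kinesis_records(event: dict) -> List[dict]:
    """Decode the records of a Kinesis-triggered Lambda event.

    Lambda delivers each Kinesis record's payload base64-encoded under
    Records[].kinesis.data; this sketch assumes every payload is a JSON object.
    """
    return [
        json.loads(base64.b64decode(record["kinesis"]["data"]))
        for record in event.get("Records", [])
    ]


def send_with_retry(
    rows: List[dict],
    sink: Sink,
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call sink(rows), retrying with exponential backoff; return attempts used."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            sink(rows)
            return attempt
        except Exception:
            if attempt >= max_attempts:
                # Re-raise so the invocation fails and Lambda can retry the batch.
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5 s, 1 s, 2 s, ...
            attempt += 1


def make_handler(sink: Sink) -> Callable[..., dict]:
    """Build a Lambda-style handler(event, context) bound to the given sink."""

    def handler(event: dict, context: object = None) -> dict:
        rows = decode_kinesis_records(event)
        if rows:
            send_with_retry(rows, sink)
        return {"inserted": len(rows)}

    return handler
```

Injecting `sink` and `sleep` keeps the handler testable without AWS or TD credentials.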
  • 28. Featured Topics • DataTank ‒ Isolate sensitive information from PlazmaDB ‒ Data mart store for BI connections • Luigi ‒ Define data transformation jobs with table dependencies ‒ Invoke Embulk commands inside Luigi jobs
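Luigi tasks declare their upstream tasks in `requires()`, and the scheduler runs incomplete dependencies first. The sketch below reproduces only that ordering idea with the standard library's `graphlib`. It does not use Luigi's API. The table graph, the `config/<table>.yml` layout, and the load/transform callbacks are hypothetical. Raw tables are loaded by shelling out to `embulk run <config>`, which mirrors "invoke Embulk commands inside Luigi jobs".

```python
import subprocess
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable, Dict, List, Set

# Hypothetical table graph: each table maps to the tables it is built from.
TABLE_DEPENDENCIES: Dict[str, Set[str]] = {
    "raw_users": set(),
    "raw_study_logs": set(),
    "user_daily_activity": {"raw_users", "raw_study_logs"},
    "class_status_report": {"user_daily_activity"},
}

# Hypothetical split: raw tables come from Embulk, the rest from Hive/Presto queries.
EMBULK_TABLES = {"raw_users", "raw_study_logs"}


def embulk_argv(table: str) -> List[str]:
    """Command line for loading one raw table (`embulk run CONFIG`); path layout is made up."""
    return ["embulk", "run", f"config/{table}.yml"]


def run_embulk(table: str) -> None:
    """Shell out to Embulk; check=True turns a failed load into an exception."""
    subprocess.run(embulk_argv(table), check=True)


def run_pipeline(
    deps: Dict[str, Set[str]],
    load_table: Callable[[str], None],
    transform_table: Callable[[str], None],
) -> List[str]:
    """Visit tables so that every table runs after the tables it depends on."""
    order = list(TopologicalSorter(deps).static_order())
    for table in order:
        if table in EMBULK_TABLES:
            load_table(table)
        else:
            transform_table(table)
    return order
```

In a real run, `load_table` would be `run_embulk` and `transform_table` would submit that table's query. In this sketch, an exception in any step stops the whole run.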
  • 29. Featured Topics • Bulk import ‒ Cross import from MongoDB and PostgreSQL to PlazmaDB and DataTank • embulk-input-mongodb • embulk-input-postgresql • embulk-filter-insert • embulk-filter-eval • embulk-output-td • embulk-output-postgresql
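Cross importing one source into both PlazmaDB and DataTank takes two Embulk runs, because a config has a single `out` section. The sketch builds both configs from a shared `in` section. The option names (`uri`, `collection`, `apikey`, `database`, `table`, `host`, `user`, `password`, `mode`) are assumed to follow embulk-input-mongodb, embulk-output-td, and embulk-output-postgresql; check them against each plugin's README. Filters such as embulk-filter-insert and embulk-filter-eval are left out because the slide does not show their settings. The collection, database, and host values used with this helper are placeholders.

```python
import json
from typing import Dict


def cross_import_configs(
    collection: str,
    *,
    mongo_uri: str,
    td_apikey: str,
    td_database: str,
    pg: Dict[str, str],
) -> Dict[str, dict]:
    """Build two Embulk configs that read the same MongoDB collection.

    One config writes to PlazmaDB via embulk-output-td, the other to DataTank
    (PostgreSQL) via embulk-output-postgresql. Option names are assumptions to
    check against each plugin's README; real configs may need more (for example,
    how TD's `time` column is populated).
    """
    source = {"type": "mongodb", "uri": mongo_uri, "collection": collection}
    return {
        "plazmadb": {
            "in": dict(source),
            "out": {
                "type": "td",
                "apikey": td_apikey,
                "database": td_database,
                "table": collection,
            },
        },
        "datatank": {
            "in": dict(source),
            "out": {
                "type": "postgresql",
                "host": pg["host"],
                "user": pg["user"],
                "password": pg["password"],
                "database": pg["database"],
                "table": collection,
                "mode": "replace",
            },
        },
    }


def to_config_text(config: dict) -> str:
    """Serialize one config; simple JSON like this is also valid YAML."""
    return json.dumps(config, indent=2)
```

Each generated file would then run separately with `embulk run`. The sketch emits JSON only because simple JSON documents also parse as YAML. Writing YAML directly works just as well.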
  • 30. 4. Challenges and Future Work
  • 31. (diagram) Scooped raw tables → Transformed tables (transformed in Luigi tasks) → Report tables / marts (scheduled queries in the web console). Scheduled queries in the web console: • Select all without conditions • Column names assigned in Japanese • Results exported to Google Spreadsheets
  • 32. Record Set Versioning at the Transforming Phase. Figure: partition-based versioning pattern. Each day's record set (2016/03/31, 2016/04/01, 2016/04/02; user_0001 to user_0003) is appended as a new partition of the same Table A, Table B, and Table C.
  • 33. Record Set Versioning at the Transforming Phase. Figure: table-based versioning pattern. Each day (2016/03/31, 2016/04/01, 2016/04/02) creates new tables Table A_yyyymmdd, Table B_yyyymmdd, and Table C_yyyymmdd that hold that day's record set (user_0001 to user_0003).
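The table-based pattern creates one new table per logical table per day, so reading the full history needs one `UNION ALL` branch per day. A small sketch with hypothetical table names (`table_a_yyyymmdd`) shows the growth. One year of snapshots is 365 tables and 364 `UNION ALL` operators for a single logical table.

```python
from datetime import date, timedelta
from typing import List


def daily_table_names(base: str, first_day: date, days: int) -> List[str]:
    """Names created by the table-based pattern: one new table per day (base_yyyymmdd)."""
    return [f"{base}_{first_day + timedelta(days=i):%Y%m%d}" for i in range(days)]


def read_all_versions_sql(tables: List[str]) -> str:
    """Reading the full history needs one UNION ALL branch per daily table."""
    return "\nUNION ALL\n".join(
        f"SELECT '{table}' AS version_table, user_id FROM {table}" for table in tables
    )


if __name__ == "__main__":
    tables = daily_table_names("table_a", date(2016, 3, 31), 3)
    print(read_all_versions_sql(tables))
```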
  • 34. Record Set Versioning at the Transforming Phase • Table-based versioning doesn't fit TD ‒ More tables degrade query performance ‒ A UNION operator is needed across all the tables ‒ Appending and removing tables is not realistic • Partition-based versioning with a "once a day" rule ‒ Drop the daily partition first, before inserting records ‒ ALTER TABLE support would help, since the partition drop could then be invoked in a query
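The partition-based "once a day" rule (drop the daily partition first, then insert) makes each daily load idempotent. Rerunning a day replaces that day's rows instead of duplicating them. This is a minimal sketch that assumes SQLite as a stand-in. SQLite has no partitions, so a `DELETE` on a hypothetical `snapshot_date` column plays the role of dropping the partition, and both statements run in one transaction.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical snapshot table: one logical table, one "partition" per snapshot date.
SCHEMA = "CREATE TABLE IF NOT EXISTS table_a (snapshot_date TEXT, user_id TEXT, score INTEGER)"


def load_daily_partition(
    conn: sqlite3.Connection, snapshot_date: str, rows: List[Tuple[str, int]]
) -> None:
    """Replace one day's record set: drop that day first, then insert.

    SQLite has no partitions, so deleting by snapshot_date stands in for
    dropping the daily partition. Rerunning the same day therefore replaces
    its rows instead of appending duplicates.
    """
    with conn:  # one transaction: the delete and the insert commit together
        conn.execute("DELETE FROM table_a WHERE snapshot_date = ?", (snapshot_date,))
        conn.executemany(
            "INSERT INTO table_a VALUES (?, ?, ?)",
            [(snapshot_date, user_id, score) for user_id, score in rows],
        )
```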
  • 35. Reuse the Application's Business Logic • Frequently appearing clauses should be defined as common UDFs or views ‒ Incl. schema definitions, constant definitions, etc. ‒ TD lacks both UDF and view features • Preliminarily transform complicated tables on the application side before loading them into TD? ‒ Hybrid approach ‒ Reuse application code
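Without UDFs or views, one fallback is to keep frequently used clauses in one place on the client side and expand them into each query before submitting it. The sketch uses Python's `string.Template`. The fragment names and their SQL (`is_paying_member`, `is_not_test_account`, and the status values) are invented for illustration and are not StudySapuri's actual rules.

```python
import sqlite3
from string import Template

# Hypothetical shared clause definitions, kept in one place instead of being
# copied into every query. Each is parenthesized so it composes safely with AND/OR.
SHARED_SQL = {
    "is_paying_member": "(status IN ('active', 'trial_converted'))",
    "is_not_test_account": "(user_id NOT LIKE 'test%')",
}


def render(query_template: str) -> str:
    """Expand $name placeholders with shared clauses; an unknown name raises KeyError."""
    return Template(query_template).substitute(SHARED_SQL)


PAYING_USERS_SQL = """
SELECT COUNT(DISTINCT user_id) AS paying_users
FROM members
WHERE $is_paying_member AND $is_not_test_account
"""

if __name__ == "__main__":
    # Tiny demo against SQLite; in production the rendered SQL would be submitted to TD.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE members (user_id TEXT, status TEXT)")
    conn.executemany(
        "INSERT INTO members VALUES (?, ?)",
        [("user_0001", "active"), ("user_0002", "canceled"), ("test_0003", "active")],
    )
    print(render(PAYING_USERS_SQL))
    print(conn.execute(render(PAYING_USERS_SQL)).fetchone())
```

`substitute` fails loudly on a misspelled placeholder. `safe_substitute` would instead leave the placeholder in the SQL.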
  • 36. Other topics • Increasing users across divisions ‒ Account management (incl. dev/ops/biz) ‒ Race conditions on Presto resources ‒ Large file delivery via the web console • Presto/Hive query testing framework ‒ Test against a small dataset through the Presto/Hive SQL interface?
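A query testing framework could load a handful of fixture rows into an in-memory database and assert on the query result. This sketch uses SQLite with `unittest`, which is an assumption. SQLite is not Presto or Hive, so it only catches logic errors in portable SQL, and dialect-specific functions would still need a run against the real engine. The query and the `study_logs` table are hypothetical.

```python
import sqlite3
import unittest
from typing import Dict, List, Tuple

# Hypothetical transformation under test; in production it would run on Presto or Hive.
DAILY_ACTIVE_USERS_SQL = """
SELECT day, COUNT(DISTINCT user_id) AS active_users
FROM study_logs
GROUP BY day
ORDER BY day
"""

Fixtures = Dict[str, Tuple[List[str], List[tuple]]]


def run_query(sql: str, fixtures: Fixtures) -> List[tuple]:
    """Load small fixture tables into an in-memory database and run `sql` on them."""
    conn = sqlite3.connect(":memory:")
    try:
        for table, (columns, rows) in fixtures.items():
            conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
            placeholders = ", ".join("?" for _ in columns)
            conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


class DailyActiveUsersTest(unittest.TestCase):
    def test_counts_each_user_once_per_day(self) -> None:
        fixtures: Fixtures = {
            "study_logs": (
                ["user_id", "day"],
                [
                    ("user_0001", "2016-04-01"),
                    ("user_0001", "2016-04-01"),  # duplicate event, same day
                    ("user_0002", "2016-04-01"),
                    ("user_0001", "2016-04-02"),
                ],
            )
        }
        self.assertEqual(
            run_query(DAILY_ACTIVE_USERS_SQL, fixtures),
            [("2016-04-01", 2), ("2016-04-02", 1)],
        )
```

Run it with `python -m unittest <file>`.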
  • 37. 5. Conclusion
  • 38. Success Factors • TD allows us to focus on understanding the application and communicating with Quipper engineers ‒ Fully managed Hadoop service ‒ Quick responses from customer support • Different databases, but still within the same TD ‒ No extra cost for cross-database JOINs ‒ Continuous analytics with JukenSapuri data
  • 39. Success Factors • Quipper's culture and strong skills were really helpful in setting up a data analytics platform for their application ‒ The global market already had a BigQuery (BQ)-based platform ‒ Open information and communication • Slack x GitHub x Google Drive ‒ Clean code with good readability ‒ HRT: Humility, Respect, and Trust • Cultural convergence between Quipper and RMP
  • 40. Conway's Law? (diagram): Data Infra Dev; casual open communication over chat + PRs
  • 41. Beyond Monitoring and Reporting • Sophisticated machine learning with Hivemall • Real-time data processing and feedback to the application
  • 42. Distributors of Wisdom x "Deliver the best learning to the ends of the earth" (世界の果てまで最高のまなびを届ける)