StudySapuri Data Analytics Platform with Treasure Data
1. StudySapuri Data Analytics Platform
with Treasure Data
Tetsuo Yamabe
Recruit Marketing Partners Co., Ltd.
2. (C) Recruit Marketing Partners Co.,Ltd. All rights reserved.
About Me
Tetsuo Yamabe
Data Engineer / Ph.D. (Eng)
Communication Design Group
Business Development Department
Online Learning Development Office
Education & Learning Business Division
3.
About Me
Tetsuo Yamabe
Joined RMP in Aug. 2015
10 months of TD experience
Data analytics platform development
for our online learning service
(a.k.a. StudySapuri)
8.
http://www.slideshare.net/Seigen/ss-61816140
Adaptive Learning for personalized LX
Collaborative research with Matsuo Lab. at the University of Tokyo
9.
Outline
1. Background
2. Platform Migration and TD
3. Technical Details
4. Challenges and Future Work
5. Conclusion
12.
Recruit Technologies
Recruit Marketing Partners
13.
Recruit Marketing Partners
Recruit Technologies
Quipper
14.
Quipper
• “Distributors of Wisdom”
‒ Japanese EdTech company launched in London
‒ Teacher-student communication support system
• Worldwide presence in the global education scene
‒ London, Tokyo, Manila, Jakarta, Mexico City
‒ Open culture with strong engineering competence
‒ Acquired by Recruit Marketing Partners in Apr. 2015
16.
Before: Recruit private cloud
After: AWS
2016.2.25
17.
2. Platform Migration and TD
18.
Before “Quipper Migration”
• Main usage
‒ KPI monitoring
‒ Ad hoc user activity analytics
• Used together with private Hadoop
‒ WebHive
19.
Before “Quipper Migration”
[Diagram: Raw tables/logs (Member attributes, Activity logs) and Transformed tables. Labels: Data, Ops]
20.
Extract, Transform and Load Pattern
Pros
• Easy to use (simple schema, aggregated information)
• Easy to maintain (data team perspective)
• Reduced size of information and logs
Cons
• Inflexible: data sources and schema definitions are fixed
• Bloating tables
• Black-boxed transformation
• Communication cost across divisions/companies
21.
After “Quipper Migration”
[Diagram: Raw tables/logs (Member attributes, Activity logs), Scooped tables, and Transformed tables. Labels: Data, Infra, Dev]
22.
Extract, Load and Transform Pattern
Pros
• You have everything you need/want
• Fully aggregated data in TD
Cons
• Duplicate business logic
• Batch process maintenance cost
• Data volume and load time
• Learning cost (app data and internal architecture)
23.
Contents Performance Monitoring
Customer Support Support
Students Performance Report
Class Status Report
KPI Monitoring
Salesman Support
Developer Support
Prototyping New Feature
Data Science Support
24.
Fact Sheet
• 50+ tables are imported daily by Embulk
• 30+ Hive queries are invoked by Luigi
• 10+ Presto queries are scheduled in the TD web console
• 20+ reports are delivered to 5 business divisions
26.
System Overview
[Diagram labels: Application (Client side), TD SDK, Application (Server side), Kinesis, Lambda, Streaming insert, Databases, Bulk import, PlazmaDB, DataTank, Join w/ FDW, Payment logs, Video info]
27.
Featured Topics
• Client-side events
‒ SPA event tracking
‒ Customized TD tag
• Server-side events
‒ Streaming insert with Kinesis + Lambda
• td-client-python
‒ Durability improvement
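The server-side path (streaming insert with Kinesis + Lambda) could look roughly like the sketch below. It relies only on the documented shape of a Kinesis-triggered Lambda event, where each record carries a base64-encoded payload under `kinesis.data`. The JSON payload format, the `table` routing field, and the injected `sink` callable are all assumptions made for illustration; the slides do not show how events are actually forwarded to TD.

```python
import base64
import json
from typing import Callable, Dict, List

# Hypothetical sink: in a real deployment this would wrap a Treasure Data
# import call; it is injected here so the sketch has no external dependency.
Sink = Callable[[str, List[dict]], None]


def decode_kinesis_records(event: dict) -> List[dict]:
    """Decode the base64-encoded JSON payloads of a Kinesis-triggered event."""
    rows = []
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        rows.append(json.loads(payload))
    return rows


def group_by_table(rows: List[dict], table_key: str = "table") -> Dict[str, List[dict]]:
    """Group events by destination table (the "table" field is an assumed convention)."""
    grouped: Dict[str, List[dict]] = {}
    for row in rows:
        grouped.setdefault(row.get(table_key, "unknown"), []).append(row)
    return grouped


def make_handler(sink: Sink):
    """Build a Lambda-style handler that forwards decoded events to `sink`."""
    def handler(event, context=None):
        grouped = group_by_table(decode_kinesis_records(event))
        for table, rows in grouped.items():
            sink(table, rows)
        return {"tables": len(grouped), "rows": sum(len(r) for r in grouped.values())}
    return handler
```

Injecting the sink keeps the decoding and routing logic testable without AWS or TD credentials; the real handler would pass a function that performs the insert.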
28.
Featured Topics
• DataTank
‒ Isolate sensitive information from Plazma DB
‒ Data mart store for connecting BI tools
• Luigi
‒ Define data transformation jobs with table dependencies
‒ Invoke Embulk command inside Luigi Jobs
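The idea of ordering jobs by table dependency and shelling out to Embulk can be sketched without Luigi itself. The example below uses the standard library's `graphlib.TopologicalSorter` as a stand-in for Luigi's dependency resolution; the job names, config paths, and the placeholder command for query jobs are hypothetical. Only `embulk run <config>` is taken from Embulk's actual CLI.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Tuple

# Hypothetical job graph: each job lists the jobs it depends on, mirroring
# what a Luigi task would return from requires().
JOBS: Dict[str, List[str]] = {
    "import_members": [],
    "import_activity_logs": [],
    "transform_daily_activity": ["import_members", "import_activity_logs"],
    "report_kpi": ["transform_daily_activity"],
}

# Hypothetical mapping from import jobs to Embulk config files.
EMBULK_CONFIGS: Dict[str, str] = {
    "import_members": "config/members.yml",
    "import_activity_logs": "config/activity_logs.yml",
}


def execution_order(jobs: Dict[str, List[str]]) -> List[str]:
    """Return job names so that every job appears after its dependencies."""
    return list(TopologicalSorter(jobs).static_order())


def command_for(job: str) -> List[str]:
    """Return the command for a job; import jobs shell out to `embulk run`."""
    if job in EMBULK_CONFIGS:
        return ["embulk", "run", EMBULK_CONFIGS[job]]
    # Transform and report jobs would submit a Hive or Presto query instead;
    # a placeholder command stands in for that step here.
    return ["echo", f"run query for {job}"]


def plan(jobs: Dict[str, List[str]]) -> List[Tuple[str, List[str]]]:
    """Pair each job with its command, in dependency order."""
    return [(job, command_for(job)) for job in execution_order(jobs)]
```

In Luigi, each entry would instead be a `luigi.Task` subclass whose `requires()` returns the upstream tasks and whose `run()` calls the command, for example via `subprocess`.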
29.
Featured Topics
• Bulk import
‒ Cross import from MongoDB and PostgreSQL to
PlazmaDB and DataTank
• embulk-input-mongodb
• embulk-input-postgresql
• embulk-filter-insert
• embulk-filter-eval
• embulk-output-td
• embulk-output-postgresql
30.
4. Challenges and Future Work
31.
Scooped raw tables
Transformed tables
Report tables / marts
Scheduled queries in web console
• Select all without conditions
• Assign column names in Japanese
• Export results to Google spreadsheet
Transform tables in Luigi tasks
32.
Record Set Versioning at Transforming Phase
[Figure: Partition-based versioning pattern. Tables A, B and C each hold records for user_0001, user_0002 and user_0003 per day (2016/03/31, 2016/04/01, 2016/04/02); each day's record set is appended.]
34.
Record Set Versioning at Transforming Phase
• Table-based versioning doesn’t fit TD
‒ A growing number of tables degrades query performance
‒ Union operator is needed for all the tables
‒ Appending and removing tables is not realistic
• Partition-based versioning with “once a day” rule
‒ Drop the daily partition before inserting records
‒ ALTER TABLE capability would be helpful to
invoke drop partition in a query
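The "once a day" rule amounts to making the daily load idempotent: re-running a job for the same day replaces that day's records instead of duplicating them. The sketch below models this with an in-memory table. `PartitionedTable` and its methods are hypothetical names for illustration, not a TD API.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List


class PartitionedTable:
    """In-memory stand-in for a table partitioned by day."""

    def __init__(self) -> None:
        self._partitions: Dict[date, List[dict]] = defaultdict(list)

    def drop_partition(self, day: date) -> None:
        """Remove every record stored for `day` (no-op if none exist)."""
        self._partitions.pop(day, None)

    def append(self, day: date, rows: List[dict]) -> None:
        """Add rows to the partition for `day` without touching existing rows."""
        self._partitions[day].extend(rows)

    def load_daily(self, day: date, rows: List[dict]) -> None:
        """Apply the "once a day" rule: drop the day's partition, then insert."""
        self.drop_partition(day)
        self.append(day, rows)

    def count(self, day: date) -> int:
        """Number of records currently stored for `day`."""
        return len(self._partitions.get(day, []))
```

Calling `load_daily` twice for the same day leaves one copy of the record set, whereas a plain `append` on a retry would double it.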
35.
Reuse Application’s Business Logic
• Frequently appearing clauses should be defined as a
common UDF or view
‒ Incl. schema definitions, constant definitions, etc.
‒ TD is missing both UDF and view features
• Pre-transform complicated tables on the
application side before loading into TD?
‒ Hybrid approach
‒ Reuse application code
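Without UDFs or views, one workaround is to keep shared clauses and constants in code and expand them into query strings before submission. The following is a minimal sketch of that idea using `string.Template`; the fragment names, constants, and table name are hypothetical, and nothing here reflects how the team actually handled it.

```python
from string import Template

# Hypothetical shared definitions; in practice these would live next to the
# application code whose business logic they mirror.
CONSTANTS = {
    "paid_status": "'paid'",
    "active_days": "30",
}

FRAGMENTS = {
    # A frequently repeated condition, defined once.
    "is_paid_member": "status = ${paid_status}",
}


def render(sql: str) -> str:
    """Expand shared fragments first, then constants, into a query string."""
    # safe_substitute leaves unknown placeholders (the constants) untouched.
    expanded = Template(sql).safe_substitute(FRAGMENTS)
    # substitute raises KeyError if a placeholder is still undefined.
    return Template(expanded).substitute(CONSTANTS)
```

For example, `render("SELECT COUNT(1) FROM members WHERE ${is_paid_member}")` yields `SELECT COUNT(1) FROM members WHERE status = 'paid'`. An undefined placeholder fails loudly at render time rather than reaching the query engine.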
36.
Other topics
• Increasing number of users across divisions
‒ Account management (incl. dev/ops/biz)
‒ Race condition in Presto resource
‒ Large file delivery via web console
• Presto/Hive query testing framework
‒ Test against small dataset with Presto/Hive SQL
interface?
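One way to approach the testing question is to run a query against a tiny fixture dataset and assert on the result. The sketch below swaps in SQLite (via the standard library's `sqlite3`) for Presto/Hive, so it only works for queries written in SQL that both engines accept. The table, columns, and query are hypothetical.

```python
import sqlite3
from typing import List, Tuple

# Query under test. It sticks to SQL that SQLite also accepts; Presto- or
# Hive-specific functions would need the real engine or a translation layer.
DAILY_ACTIVE_USERS = """
SELECT day, COUNT(DISTINCT user_id) AS dau
FROM activity_logs
GROUP BY day
ORDER BY day
"""


def run_on_fixture(query: str, rows: List[Tuple[str, str]]) -> List[tuple]:
    """Load fixture rows into an in-memory table and run the query against it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE activity_logs (day TEXT, user_id TEXT)")
        conn.executemany("INSERT INTO activity_logs VALUES (?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()
```

A fixture with a duplicated user on one day checks that the query counts distinct users rather than raw events. Dialect differences remain the main limitation of this approach.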
38.
Success Factors
• TD allows us to focus on understanding the application and
communicating with Quipper engineers
‒ Fully managed Hadoop service
‒ Quick response from customer support
• Different DBs but still in the same TD
‒ No extra cost for cross-database JOINs
‒ Continuous analytics with JukenSapuri data
39.
Success Factors
• Quipper’s culture and strong skills are really helpful to
set up a data analytics platform for their application
‒ Global market already had a BQ-based platform
‒ Open information and communication
• Slack x GitHub x Google Drive
‒ Clean code with fine readability
‒ HRT: Humility, Respect, and Trust
• Cultural convergence between Quipper and RMP
40.
Conway’s Law?
[Diagram labels: Data, Infra, Dev]
Casual open communication over chat + PR
41.
Beyond Monitoring and Reporting
• Sophisticated machine learning with Hivemall
• Real-time data processing, with results fed to the application
42.
Distributors of Wisdom
x
Delivering the best learning to the ends of the earth