6. Our development situation
• We love OSS.
• We have a robust analytics environment.
• We love SQL.
• We have a corporate culture that encourages
trying new technologies.
9. Our business situation
• We manage and operate a B2B website.
• Our data lifecycle is long.
• The business side does not write SQL.
• They watch re:dash and Adobe Analytics instead.
10. Our Embulk use cases
• Create point-in-time data.
• Create machine learning data.
• Run machine learning in the cloud.
• Natural language processing.
• Marketing automation.
• One-shot data migration.
• Large-scale parallel crawling.
• etc…
16. My Digdag use cases
• Digdag is a very easy-to-use workflow
engine.
• It resolves the complex dependencies between
machine learning and natural language
processing tasks.
• Crawling websites.
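As a rough illustration of the dependency problem a workflow engine takes care of, the sketch below orders a small pipeline with the graphlib module from Python's standard library (Python 3.9 or later). The task names are hypothetical, and this is neither Digdag's API nor its scheduling algorithm; it only shows what "every task runs after the tasks it depends on" means.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each key maps to the set of tasks it depends on.
dependencies = {
    "crawl": set(),
    "tokenize": {"crawl"},
    "normalize": {"crawl"},
    "build_features": {"tokenize", "normalize"},
    "train_model": {"build_features"},
}


def execution_order(deps):
    """Return one valid order in which every task follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Several orders can be valid (tokenize and normalize are independent of each other), which is also where a workflow engine can run tasks in parallel.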
17. embulk-filter-kuromoji
• Kuromoji is an open source Japanese
morphological analyzer written in Java.
• Runs morphological analysis in Embulk.
• Supports NEologd.
• Supports custom dictionaries.
• 安全と安心は違う ("safety and peace of mind are different")
=> ["安全", "と", "安心", "は", "違う"]
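To make the segmentation in the example sentence concrete, here is a toy greedy longest-match tokenizer. It is not Kuromoji and not how Kuromoji works internally: a real morphological analyzer uses a full dictionary with cost-based path selection and also returns part-of-speech information. The mini dictionary is a hypothetical stand-in that covers only the example sentence; it loosely mirrors the idea that adding dictionary entries changes where words are split.

```python
def tokenize(text, dictionary):
    """Greedy longest-match segmentation.

    Characters that match no dictionary entry become single-character tokens.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for length in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + length]
            if candidate in dictionary or length == 1:
                tokens.append(candidate)
                pos += length
                break
    return tokens


# Hypothetical mini dictionary covering only the example sentence.
dictionary = {"安全", "と", "安心", "は", "違う"}
tokens = tokenize("安全と安心は違う", dictionary)
print(tokens)
```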
18. embulk-filter-icu4j
• ICU is a Unicode library; ICU4J is its Java version.
• Runs Unicode normalization in Embulk.
• https://hondou.homedns.org/pukiwiki/pukiwiki.php?Java%2520ICU4J
• Many normalization patterns.
– Any-Katakana
– Any-Lower
– Any-NFC
– Any-NFKC
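The effect of these patterns can be approximated with Python's standard library, as in the sketch below. This is a stand-in, not ICU4J or the plugin itself: the PATTERNS table and the to_katakana helper are hypothetical names, and the katakana conversion here only shifts hiragana code points, whereas ICU's transforms cover more cases.

```python
import unicodedata


def to_katakana(text):
    """Shift hiragana (U+3041..U+3096) to katakana; other characters pass through."""
    return "".join(
        chr(ord(ch) + 0x60) if "\u3041" <= ch <= "\u3096" else ch
        for ch in text
    )


# Hypothetical mapping from the pattern names above to stdlib stand-ins.
PATTERNS = {
    "Any-Katakana": to_katakana,
    "Any-Lower": str.lower,
    "Any-NFC": lambda s: unicodedata.normalize("NFC", s),
    "Any-NFKC": lambda s: unicodedata.normalize("NFKC", s),
}


def apply_pattern(name, text):
    """Apply one named pattern to a string."""
    return PATTERNS[name](text)


# Half-width katakana and full-width digits are folded by NFKC.
print(apply_pattern("Any-NFKC", "ｴﾝﾊﾞﾙｸ１２３"))
print(apply_pattern("Any-Katakana", "えんばるく"))
```

Normalizing before tokenization or matching helps because visually identical strings (half-width versus full-width, composed versus decomposed) otherwise compare as different values.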
28. Cloud Vision API limits
Type of limit                     Usage limit
MB per image                      4 MB
MB per request                    8 MB
Requests per second               10
Requests per feature per day      700,000
Requests per feature per month    20,000,000
Images per second                 8
Images per request                16
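One way to stay within the per-image and per-request limits is to plan batches before sending anything. The sketch below is a minimal example under stated assumptions: plan_batches is a hypothetical helper, 1 MB is taken as 1024 * 1024 bytes, and the sizes are treated as the payload sizes that count against the limits (any encoding overhead is ignored). It does not call the Cloud Vision API and does not handle the per-second or per-day quotas.

```python
MB = 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes

MAX_IMAGE_BYTES = 4 * MB       # 4 MB per image
MAX_REQUEST_BYTES = 8 * MB     # 8 MB per request
MAX_IMAGES_PER_REQUEST = 16    # images per request


def plan_batches(image_sizes):
    """Group image sizes (in bytes) into requests that respect the limits.

    Returns (batches, rejected). Each batch is a list of indexes into
    image_sizes; rejected lists the indexes of images over the per-image limit.
    """
    batches, rejected = [], []
    current, current_bytes = [], 0
    for index, size in enumerate(image_sizes):
        if size > MAX_IMAGE_BYTES:
            rejected.append(index)
            continue
        too_many = len(current) >= MAX_IMAGES_PER_REQUEST
        too_big = current_bytes + size > MAX_REQUEST_BYTES
        if current and (too_many or too_big):
            # Close the current request and start a new one.
            batches.append(current)
            current, current_bytes = [], 0
        current.append(index)
        current_bytes += size
    if current:
        batches.append(current)
    return batches, rejected


batches, rejected = plan_batches([3 * MB, 3 * MB, 3 * MB, 5 * MB, 1 * MB])
print(batches, rejected)
```

Images are kept in input order rather than packed optimally, which keeps results easy to join back to the source rows.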
30. Natural language processing
• Google Cloud Natural Language API
• Microsoft Cognitive Services
– Language Understanding Intelligent Service
– Text Analytics API
– Bing Spell Check API
33. Machine translation
• Google Translate API
– Supports neural machine translation (requires the
premium edition).
• Azure
– Supports neural machine translation.
– Covers text and speech:
Translator Text API and Translator Speech API.
36. My requests for Embulk
• Binary type support.
– I want to script image processing.
• Replace logger.
– I want structured logs.
– I want to use fluent-logger.
• Make executor plugins more developer-friendly.
– Pluggable resource control.