This is a real case from VMfive: shifting our ELK architecture away from AWS. Today the GCP data pipeline gives us a more efficient and stable environment for running our service.
4. Pros & Cons
• Pros :
• Well supported.
• Well documented.
• Easy to find references.
• Cons :
• High cost.
• Not open source.
• Capacity has to be set up front.
7. The Products and Services logos may be used to accurately reference Google's technology and tools, for instance in architecture diagrams.
[Architecture diagram: time-series events stream in through Cloud Pub/Sub; Cloud Dataflow handles processing, both streaming and batch; Cloud Storage is the batch storage layer; BigQuery is the storage layer for BI analysis]
8. [Architecture diagram: targeting engines. Data sources — device-related, behavior-related, and 3rd-party data — each feed in through Cloud Pub/Sub. Machine learning: Spark MLlib on Cloud Dataproc transforms the data; Cloud Machine Learning serves hosted models behind a real-time Prediction API. Applications: an API backend on App Engine and Compute Engine, plus Redis on Compute Engine]
9. Pros & Cons
• Pros :
• Cost-effective.
• Operation-effective.
• Google's got your back.
• Cons :
• APIs/SDKs change almost daily.
• Some services are still in beta.
• Documentation is scattered.
10. Workflow Monitoring
• Digdag (vs. Airflow/Oozie/Luigi)
• Native support for Python & Ruby
• Multi-cloud
• Modular
• Workflow as code
• Docker support
• Alerting to Slack
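A minimal sketch of what "workflow as code" looks like in Digdag, assuming a daily BigQuery aggregation job. The file names, table name, and Slack script are hypothetical; `sh>`, `bq>`, `schedule:`, and the `_error:` directive are standard Digdag features:

```yaml
# daily_pipeline.dig — hypothetical Digdag workflow definition
timezone: UTC

schedule:
  daily>: 02:00:00

+load_events:
  # Load raw events (e.g. via an Embulk config — file name is hypothetical)
  sh>: embulk run load_events.yml

+aggregate:
  # Run a SQL file on BigQuery and write the result to a table
  bq>: queries/daily_agg.sql
  destination_table: stats.daily

_error:
  # Fires when any task fails — this is where "alerting to Slack" plugs in
  sh>: ./notify_slack.sh "daily_pipeline failed"
```

Because the workflow is a plain text file, it can live in version control next to the application code — which is the main point of "workflow as code."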
14. Cost Comparison
• $2,000/month on AWS
• About $200/month on GCP for production
• About another $200/month for dev
• 50M events per month
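A quick back-of-envelope check on the numbers above, counting both GCP environments against the single AWS bill:

```python
# Per-event cost comparison using the figures from this slide.
aws_monthly = 2000.0           # USD/month on AWS
gcp_monthly = 200.0 + 200.0    # USD/month on GCP (production + dev)
events = 50_000_000            # events per month

aws_per_million = aws_monthly / (events / 1_000_000)  # USD per 1M events on AWS
gcp_per_million = gcp_monthly / (events / 1_000_000)  # USD per 1M events on GCP
savings = 1 - gcp_monthly / aws_monthly               # fraction saved

print(aws_per_million, gcp_per_million, savings)  # → 40.0 8.0 0.8
```

Even counting the dev environment, GCP comes out at roughly one fifth of the AWS cost, about 80% savings.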
15. Business Use Case
• Digital Ads Targeting
• User Behavior Tagging
• BI
• Geo Reporting
• KPI Reporting
• User Demographics
16. Some Tips
• BigQuery
• https://status.cloud.google.com/incident/bigquery/18022
• Solved by Fluentd's retry and HA
• Dataflow's SDK & docs are out of sync
• Dataflow side input has a bug in streaming mode
• Compute Engine SLB - TCP/UDP setup for forwarding
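The "Fluentd's retry and HA" tip above can be sketched as a forwarder config with a standby aggregator, so events are buffered and retried locally during an outage (such as the BigQuery incident linked above) instead of being dropped. The host IPs are hypothetical; `<server>` blocks with `standby` and buffered retry are standard Fluentd `forward` output features:

```
# Forwarder side: buffer locally, retry on failure, fail over to a standby.
<match app.**>
  @type forward
  # Keep retrying instead of discarding events when the downstream is out
  retry_limit 17
  <server>
    name aggregator1
    host 10.0.0.11   # hypothetical primary aggregator
  </server>
  <server>
    name aggregator2
    host 10.0.0.12   # hypothetical hot standby, used only if primary is down
    standby
  </server>
</match>
```

With this shape, a downstream outage drains from the local buffer once the service recovers, which is how the deck's BigQuery incident was ridden out.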
17. Fluentd Update
• Release notes for v0.14
• Sub-second event flush
• New plugin APIs
• Support for formatting configurations dynamically
(e.g., path /my/dest/${tag}/mydata.%Y-%m-%d.log)
• Secure forward
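The dynamic-path example above needs matching buffer chunk keys in v0.14 — the `${tag}` and `%Y-%m-%d` placeholders are only filled in for keys the buffer chunks on. A minimal sketch (the match pattern and timekey values are assumptions):

```
# Fluentd v0.14 placeholder syntax: ${tag} and strftime patterns in `path`
# are resolved per buffer chunk, so `tag` and `time` must be chunk keys.
<match myapp.**>
  @type file
  path /my/dest/${tag}/mydata.%Y-%m-%d.log
  <buffer tag, time>
    timekey 1d        # one chunk (and one file) per day
    timekey_wait 10m  # grace period for late events before flushing
  </buffer>
</match>
```

Omitting `tag` or `time` from the `<buffer>` chunk keys leaves the corresponding placeholder unresolved, which is a common first stumble with the v0.14 config format.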