How To Troubleshoot Collaboration Apps for the Modern Connected Worker
Challenges in Building a Data Pipeline
1. Manish Singh
Engineer at Hevo
https://linkedin.com/in/manishsingh123/
Challenges in Building a
Data Pipeline
2. ● Data Pipeline
● Possible Implementations
● Challenges
● Data Processing Architectures
Agenda
3. ● Highly scalable
● Highly available
● Low latency
● Zero data loss
● Support for multiple data sources (e.g. MySQL, NoSQL,
Mixpanel, Analytics)
● Instrumentation, monitoring, and alerting
● Real-time vs Batch
Expectations
7. ● Complexity of transformation logic compromises latency
● Hardware systems today are better equipped
● Efficient, reduces load time
● Cost effective in the cloud, less components required
Moving from traditional ETL
to ELT
8. ● Query Source DB and keep offset (ID, Updated timestamp)
● Database change logs (e.g. Mysql Binlogs, MongoDB Oplogs)
Replication Modes
9. ● New fields can be added to a source at any point in time
● Character lengths of String columns in source can increase
● Data Type incompatibility between Source and Destination
● Varying type casting
● Data loss during loads - Power failure, Server failure, Code
bugs, etc
Challenges
10. ● Schema detection cannot be done upfront
● Different documents in a single collection can have a different
set of fields
● Different documents in a single collection can have
incompatible field data types
● Nested objects and arrays with a dynamic structure
Additional Challenges with
NoSQL
11. ● Transformations
● Security (Filter, Hashing)
● Replay Mechanism
● Integrity and Anomaly Detection
● Monitoring and Alerts for failures
● Activity Log
Effective Implementations
12.
13.
14. ● How to beat the CAP theorem by Nathan Marz
● Different layers for stream and batch processing
● Need to manage two different layers of the system
Lambda Architecture
16. ● Questioning the Lambda Architecture by Jay Kreps
● Only stream processing with parallelism
● Set Kafka retention policy
● Reprocess into separate table
● Switch table when done and delete the old one
Kappa Architecture