In Data Engineer's Lunch #60, Rahul Singh, CEO here at Anant, will discuss modern data processing/pipeline approaches.
Want to learn about modern data engineering patterns and practices for global data platforms? This session gives a high-level overview of the different types, frameworks, and workflows in data processing and pipeline design.
Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness
1. Developing Enterprise Consciousness
Global Data & Analytics Platforms on Cloud Neutral Systems
Rahul Xavier Singh Anant Corporation
Confidential - Not to be Distributed Except to Pre-Approved Audiences. Copyright Anant 2022
2. Data & Analytics platforms are the central nervous system of business platforms. Every business, small or large, has a need to get connected.
7. Beyond 12 Factor … “Enterprise Consciousness”
● Current Business Information is available to the user / customer in the swiftest way possible within the bounds of reasonable costs.
● Business Information is generally available to the enterprise, siloed only by security and governance.
● Data platforms make use of appropriate resources for hot vs. cold, raw vs. enhanced data.
● Data platforms are always available and redundant, always trying to achieve an RPO/RTO of zero.
16. Current & Future State
Current Tools & Issues
● RMQ, Redis, Mongo not scaling.
● C#, Java, Node, etc. can’t do Big Data alone.
● Data Replication / Resiliency is difficult
● DevOps / DataOps ?
Future Goals
● Scalable & Resilient Message Delivery
● Fault Tolerant Data Processing
● Real Time Data Storage & Retrieval
● Automatic Deployment & Upgrades
● Predictable, Scalable Growth for Platform
● Customer Satisfaction with Data Quality & Freshness
Example: Cloud-neutral global data & analytics platform.
Technologies in evaluation.
17. Old Data & Analytics: ETL + Batch + Waiting
Much of the current thinking is that the state of the systems in an enterprise is synchronous and that analysis must be done sequentially and iteratively, from beginning to end, in batch.
18. New Data & Analytics: Events + Current Data
The growing trend in new thinking is that the state of the
systems in an enterprise is dynamically asynchronous and
that there is no “state” but everything is a stream of events.
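One way to read “no state, only a stream of events” is that any current value can be rebuilt by replaying the events in order. The following is a minimal Python sketch of that idea, not something taken from the talk; the `Event` type, the account names, and the `replay` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Event:
    """A hypothetical business event: an amount applied to an account."""
    account: str
    amount: int  # positive = deposit, negative = withdrawal


def replay(events: Iterable[Event]) -> Dict[str, int]:
    """Derive current balances by folding over the event stream in order."""
    balances: Dict[str, int] = {}
    for event in events:
        balances[event.account] = balances.get(event.account, 0) + event.amount
    return balances


# Illustrative stream; in practice events would arrive from a log or queue.
stream = [
    Event("acme", 100),
    Event("globex", 50),
    Event("acme", -30),
]

current = replay(stream)
print(current)
```

Because the balances are only ever derived from the events, the same stream can be replayed later to rebuild them or to compute a different view without touching the source systems.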
19. New Data & Analytics: Streams + Queues + Bus
Streams, Queues, Bus: These technologies have been around for a long time. What’s different today is that customer demand for real-time data is forcing their adoption across the board.
23. Cassandra + Spark + Kafka: Use Cases
Image: https://mesosphere.com/blog/kafka-dcos-tutorial/
1. Lambda Architecture: Balances stream and batch processing for reliability.
2. Machine Learning: Delivering predictive and descriptive analytics in real time.
3. Master Data Management: Ensure that all the data is consistent all the time in all the systems.
4. Realtime Customer Experience: Customers are always informed, recommendations are made, etc.
5. Realtime Information Systems: Team members are always informed, etc.
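The Lambda Architecture in item 1 can be sketched in a few lines: a batch layer periodically recomputes a view from all historical events, a speed layer covers events that arrived since the last batch run, and a serving layer merges the two at query time. The sketch below is plain Python standing in for the Kafka, Spark, and Cassandra roles; the page-view data, the `cutoff` value, and all function names are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Each event is (timestamp, page); all names here are hypothetical.
PageView = Tuple[int, str]


def batch_view(events: Iterable[PageView], cutoff: int) -> Counter:
    """Batch layer: recompute counts from all events before the cutoff."""
    return Counter(page for ts, page in events if ts < cutoff)


def speed_view(events: Iterable[PageView], cutoff: int) -> Counter:
    """Speed layer: count only recent events the batch run has not seen."""
    return Counter(page for ts, page in events if ts >= cutoff)


def serve(batch: Counter, speed: Counter) -> Counter:
    """Serving layer: answer queries by merging both views."""
    return batch + speed


events: List[PageView] = [(1, "home"), (2, "pricing"), (5, "home"), (7, "home")]
cutoff = 5  # timestamp of the last completed batch run (illustrative)

merged = serve(batch_view(events, cutoff), speed_view(events, cutoff))
print(merged["home"], merged["pricing"])
```

The cutoff keeps the two layers from double counting: every event falls on exactly one side of it, so the merged view matches a full recount while the batch layer remains free to be rebuilt from scratch.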
38. Topics
Data Pipeline
Data Engineering Tools
Apache Spark*
Apache Kafka*
Kubernetes/Docker/Helm
Terraform/Ansible
GitOps for Dev/DataOps
Airflow
Argo/Kubeflow
39. Sessions
● Presentation
○ Overview of how this fits into the larger
picture
○ Concept of the topics
○ Any tour of the technology
● Discussion
○ Session - Two Way Q&A
○ Online - Slack
○ Offline - Slack / Email
● Assignment
○ Hands-on
○ Self-paced work to put into practice
○ Create portfolio items
○ Try out new technology
Presentation (Slides)
● Overview
● Concepts
● Tech Tour
Discussion (Notes)
● Q&A
● Online
● Offline
Assignment (Git Repo)
● Design
● Engineering
● Automation
Editor's notes
Challenge
Large organizations and small businesses have the same problem.
Large organizations need to integrate data first before they can extract value whether through business intelligence, analytics, or machine learning.
Small organizations use online platforms to run their operations, so they are likely to rely on software that gives them less control over their data.
Solution
Enterprise consciousness: an eventually consistent enterprise where all data is synchronized between applications and back to a central translytical database.
Challenge
Currently, the components are split across different vendors and parts.
This is similar to building a computer from scratch for every client.