Anatomy of a Data-Driven Architecture - Tamir Dresher
1. Anatomy of a Data-Driven Architecture
@tamir_dresher
System Architect @ Payoneer
2. 2
Tamir Dresher
System Architect @ Payoneer, @tamir_dresher
Software Engineering Lecturer, Ruppin Academic Center
tamirdr@payoneer.com
https://www.israelclouds.com/iasaisrael
3. 3
The Need for Data
Data-driven decision making:
• What markets are leading and where can I expand?
• What’s slowing my process?
• Is there a correlation between the time invested in a sale and the income from the tenant?
Data-powered products:
• What products should I recommend to this user?
• Is this action fraudulent?
• Should I suggest a discount to this user to raise the chance for a purchase?
4. 4
ETL vs. ELT vs. Streaming
ETL: Extract → Transform → Load, from the transactional DB (OLTP) into the analytical DB (OLAP)
ELT: Extract → Load → Transform, from the transactional DB (OLTP) into the analytical DB (OLAP) / storage
Streaming: Events → Event Stream → Stream Processor → Real-time insights
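The difference between the three patterns is mainly where the transform step runs. A minimal sketch with in-memory "tables" (the rows and the usd_amount column are hypothetical examples):

```python
# Minimal sketch contrasting ETL and ELT with in-memory "tables".

def extract(source):
    """Pull raw rows out of the transactional (OLTP) store."""
    return list(source)

def transform(rows):
    """Normalize amounts to USD, before (ETL) or after (ELT) loading."""
    return [{**r, "usd_amount": r["amount"] * r["fx_rate"]} for r in rows]

def load(rows, store):
    """Append rows into the analytical (OLAP) store."""
    store.extend(rows)
    return store

oltp = [{"amount": 100, "fx_rate": 1.25}, {"amount": 50, "fx_rate": 0.5}]

# ETL: transform in flight, load only the cleaned result.
warehouse = load(transform(extract(oltp)), [])

# ELT: load the raw rows first, transform later inside the warehouse/lake.
lake = load(extract(oltp), [])
lake_view = transform(lake)
```

Both paths end with the same derived view; ELT just keeps the raw rows around as well, which is why it pairs naturally with cheap storage.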
12. 12
Data Sources → Ingestion & Transformation → Storage → Query & Processing → Consumption
Ingestion & Transformation
Data Integration Platforms
• Connectors to sources and destinations
• E.g. Fivetran, Stitch Data, Rivery
Customized batching/micro-batching
• Spark jobs, data libraries (pandas, boto3), Hive
• Workflows – Airflow, Dagster, Luigi
https://towardsdatascience.com/building-a-production-level-etl-pipeline-platform-using-apache-airflow-a4cf34203fbd
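The micro-batching idea above can be sketched in a few lines, with SQLite standing in for the analytical store (the "sales" schema and batch size are illustrative assumptions):

```python
import sqlite3

# Micro-batch load: rows are written and committed in small batches
# rather than one-by-one or in a single huge transaction.

def load_in_micro_batches(conn, rows, batch_size=2):
    conn.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    for i in range(0, len(rows), batch_size):
        batch = rows[i:i + batch_size]
        conn.executemany("INSERT INTO sales VALUES (?, ?)", batch)
        conn.commit()  # one commit per micro-batch, not per row

conn = sqlite3.connect(":memory:")
load_in_micro_batches(conn, [("EU", 10.0), ("US", 20.0), ("EU", 5.0)])
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
```

A workflow engine such as Airflow or Dagster would schedule and retry jobs like this; the loading logic itself stays the same.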
13. 13
Data Sources → Ingestion & Transformation → Storage → Query & Processing → Consumption
Ingestion & Transformation (cont.)
Event streaming and processing
• Messaging – Kafka, Pulsar, Kinesis, Event Hubs
• Processing – Spark, Flink, Samza, Kafka Streams, Azure Stream Analytics
14. 14
Data Sources → Ingestion & Transformation → Storage → Query & Processing → Consumption
Ingestion & Transformation (cont.)

-- Continuously aggregating a stream into a table with a ksqlDB push query
CREATE STREAM locationUpdatesStream ...;
CREATE TABLE locationsPerUser AS
  SELECT username, COUNT(*)
  FROM locationUpdatesStream
  GROUP BY username
  EMIT CHANGES;

// Continuously aggregating a stream into a table with Kafka Streams
// (LocationUpdate is an illustrative value type with a username field)
KStream<String, LocationUpdate> locationUpdatesStream = ...;
KTable<String, Long> locationsPerUser =
    locationUpdatesStream
        .groupBy((k, v) -> v.username)
        .count();

https://www.confluent.io/blog/kafka-streams-tables-part-1-event-streaming/
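The same stream-to-table aggregation as the ksqlDB and Kafka Streams snippets above can be sketched as plain Python state (the event shape is a hypothetical example):

```python
from collections import Counter

# A running count per user: each incoming event updates the
# continuously aggregated "table", mirroring the push-query semantics.
locations_per_user = Counter()

def on_location_update(event):
    """Process one event and return the current state of the table."""
    locations_per_user[event["username"]] += 1
    return dict(locations_per_user)

events = [{"username": "alice"}, {"username": "bob"}, {"username": "alice"}]
for event in events:
    table = on_location_update(event)
```

The stream is the sequence of events; the table is the mutable state they fold into, which is exactly the stream/table duality the linked post describes.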
15. 15
Data Sources → Ingestion & Transformation → Storage → Query & Processing → Consumption
Storage
Data Warehouse
• Structured format
• Designed to quickly generate insights from SQL-like queries
• Modern cloud-based offerings – Redshift, BigQuery, Snowflake, Azure Synapse
Data Lake
• Structured and unstructured data – CSV, Parquet, images, audio
• Raw and historical data
• Designed to be used by data scientists to build models in various languages
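The warehouse/lake split can be illustrated by storing one record both ways, with SQLite standing in for the warehouse and a JSON blob for the lake (names and fields are illustrative):

```python
import json
import sqlite3

# One record stored two ways: structured columns in a warehouse-style
# table (typed, SQL-queryable) vs. the raw object in a lake-style blob
# (schema applied only on read).
record = {"user": "alice", "amount": 42.5, "raw_payload": {"clicks": [1, 2]}}

# Warehouse: keep only the modeled, structured columns.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE facts (user TEXT, amount REAL)")
warehouse.execute("INSERT INTO facts VALUES (?, ?)",
                  (record["user"], record["amount"]))
amount = warehouse.execute("SELECT amount FROM facts").fetchone()[0]

# Lake: keep the whole raw record, including fields the warehouse drops.
lake_object = json.dumps(record)
reread = json.loads(lake_object)
```

The warehouse answers the SQL question fast; the lake preserves the raw payload a data scientist may later need for a model.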
16. 16
Data Sources → Ingestion & Transformation → Storage → Query & Processing → Consumption
Query & Processing
Retrospective (Historical)
• Deriving intelligence based on statistics
• Built-in engine (Data Warehouse) OR query engines – Presto, Impala
Predictive
• Generate a model with data science and ML libraries – pandas, NumPy, R, scikit-learn, etc.
• The model is periodically refreshed
Real-Time Analytics
• Run analytical queries over big volumes of data with interactive latencies
• Apache Pinot, ClickHouse, Druid
Data Science Platforms
• Help manage workflows, productization, and operations – SageMaker, Iguazio, Databricks, etc.
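The "periodically refreshed" predictive path can be sketched as recomputing a simple statistic over the latest history; the 3x-mean fraud threshold here is a hypothetical stand-in for a real trained model:

```python
# A "model" that is just a threshold derived from historical data,
# rebuilt whenever a periodic refresh runs over the latest window.

def train(amounts):
    mean = sum(amounts) / len(amounts)
    return {"threshold": 3 * mean}  # flag anything 3x the historical mean

def predict(model, amount):
    return amount > model["threshold"]

history = [10.0, 12.0, 8.0]
model = train(history)            # initial model build
flagged = predict(model, 100.0)   # serving predictions to applications

history.append(100.0)             # new data keeps landing in storage
model = train(history)            # periodic refresh picks it up
```

The serving side only ever calls `predict`; the refresh job swaps in a new model underneath it, which is the pattern platforms like SageMaker operationalize.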
17. 17
Data Sources → Ingestion & Transformation → Storage → Query & Processing → Consumption
Consumption
Custom Apps
• Execute the model – how is the model reachable?
• Translate user/system actions into queries
• Visualize – custom (Plotly Dash, Streamlit, etc.) or embedded (Power BI, Looker, etc.)
External Apps
• Augmented analytics – external services that generate insights and explain them (e.g. anomaly detection with Anodot, CrunchMetrics, outlier.ai)
• Customizable reports and dashboards (e.g. Looker, Tableau, Power BI, Sisense)
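One common answer to "how is the model reachable" is to put it behind a small service endpoint that translates a user action into a model call. A minimal sketch (the route, payload shape, and discount rule are illustrative assumptions):

```python
import json

# An HTTP-style handler: the custom app receives a user action,
# calls the model, and returns the prediction as JSON.

def score_discount(user):
    """Hypothetical model: discount big carts to raise purchase chance."""
    return 0.1 if user.get("cart_value", 0) > 100 else 0.0

def handle_request(path, body):
    if path == "/recommend-discount":
        user = json.loads(body)
        return 200, json.dumps({"discount": score_discount(user)})
    return 404, json.dumps({"error": "unknown route"})

status, payload = handle_request("/recommend-discount", '{"cart_value": 150}')
```

In a real system the handler would sit behind a web framework and the model behind a model server, but the translation step, from user action to model input to response, is the same.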
18. 18
Summary
Data Sources → Ingestion & Transformation → Storage → Query & Processing → Consumption
• Data at rest → Workflow Engine → Analytical Storage/Lake → Query Engine → Model Engine → Model → Accessible API → Application
• Data in motion → Event/Data Stream → Stream Processor → Real-time insights
19. 19
Thank You!
@tamir_dresher
Editor's notes
Data comes from many sources and feeds our applications.
We need it for OLTP, obviously, but data being the new gold, we can also use the live feed and historical data to gain more insights:
BI, ML/AI, data-driven decision making (analytic systems), and data-powered products.
Traditionally, organizations moved data from the OLTP to the OLAP (DWH) with ETL.
Modern architectures rely on ELT for batched loads.
And for the best real-time response, streaming is the way to go.