The modern data stack has become increasingly popular in the analytics community. Patterns like domain-driven design, known from classical software development, are finding their way into analytics contexts and form the basis of new paradigms such as Data Mesh. In a Data Mesh, every domain - a different department, for example - wants to solve similar problems with its own business data. Therefore, it is vital to implement a flexible, lightweight, and manageable, but also secure and monitorable, central self-service data platform. With the containerization of services, and using Kubernetes as a runtime, you can build flexible data architectures. Data visualization, data ingestion, orchestration, and ETL tools, as well as cloud data warehouses, should all live together in a kind of mesh. In this session, learn how Kuma, Kong's CNCF Sandbox project, provides the next level of security when handling data, working with other business domains, and exchanging data with external systems. Uncover the advantages of end-to-end tracing, data collection, and external access from outside the mesh using Data APIs.
4. Data Lake and DWH are combined
In the same way, BI and AI are growing together; see e.g.
Snowflake and Databricks. This is a counter-movement to the
increasing specialization. In general, the market is moving a lot.
WHAT WE ARE CURRENTLY SEEING...
A lot of money in the market
With it comes increasing fragmentation of functionality: each
new startup takes care of one special function, distributing
individual functionalities across separate tools.
Cloud is the new standard
Hardly anyone still builds analytical architectures on-premises,
though there are exceptions. The Hadoop ecosystem is becoming
less important; data lakes increasingly live on object storage
in the public cloud.
Use of software development best
practices
Partly a result of the migration to the cloud: infrastructure
as code, automation, CI/CD. Increasing sympathy for code-first
approaches and open source, a return of frameworks, DIY, and
SQL-only stacks (skills!).
6. CORE CHARACTERISTICS OF THE MODERN DATA STACK (MDS)
Automation and
operationalization
Basic paradigms of modern software
development are being introduced, including
GitOps, CI/CD, containers and automated
testing.
Best of Breed and Modular
A Modern Data Stack has a modular
structure. Individual components can be
exchanged. EL+T are separated. The best tool
is selected for each discipline.
Cloud DWH
Central data storage component of the
Modern Data Stack. Combines advantages
of data lake and data warehouse.
SaaS / IaC
Focus on maintainability and low time to
market. This can be achieved using SaaS
services from cloud providers or automation
using IaC.
7. WHY USE A MODERN DATA STACK?
• Data Mesh
• Total hype at the moment
• Organizational framework for data-driven companies
• Data products with APIs - similar to microservices
• Domain-driven design from software development as a basis
• Clear responsibilities for data products
• More flexibility for developers in tool selection
• Modern Data Stack as a technical framework to implement the (organizational) Data Mesh
• Flexible architecture to support a “free choice of weapons” - a Modern Data Platform
• APIs for internal and external purposes
• Focus: shorter time to market
8. DATA MESH – DATA PRODUCTS
• Directly analogous to microservices from the classic software development environment
• Operational applications vs. analytical applications
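To make the microservices analogy concrete, here is a minimal sketch of a data product in plain Python - a domain-owned dataset with a clear owner, a published schema as its contract, and a small read API that consumers program against. All names (`DataProduct`, the `orders` example) are illustrative, not part of any real framework.

```python
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    """A domain-owned data product: data plus a small, stable API."""
    name: str
    owner: str       # owning domain team (clear responsibility)
    schema: dict     # published output schema - the contract with consumers
    _records: list = field(default_factory=list)

    def publish(self, record: dict) -> None:
        # Validate against the published schema before accepting data.
        missing = set(self.schema) - set(record)
        if missing:
            raise ValueError(f"record missing fields: {missing}")
        self._records.append(record)

    def query(self, **filters) -> list:
        # The read API - analogous to a microservice endpoint.
        return [r for r in self._records
                if all(r.get(k) == v for k, v in filters.items())]

# The sales domain owns and serves its own orders data.
orders = DataProduct(name="orders", owner="sales-domain",
                     schema={"order_id": int, "amount": float})
orders.publish({"order_id": 1, "amount": 99.5})
orders.publish({"order_id": 2, "amount": 10.0})
print(orders.query(order_id=1))   # [{'order_id': 1, 'amount': 99.5}]
```

In a real mesh, the `query` method would sit behind a versioned HTTP Data API; the point is that ownership and the contract live with the domain, not with a central team.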
11. AIRBYTE
• Data ingestion
• Many standard connectors available
• SaaS, cloud, APIs, databases, …
• Facebook, Google, Salesforce, Redshift, Snowflake, BigQuery, …
• Own connectors with the Python Connector Development Kit
• Simple transformations possible
• SaaS (US only) and open source for own installations
• Container-based operation
• Separation of platform and connectors (server, UI, scheduler, …)
• New container for each connector
• Possible alternatives: Stitch, Fivetran, Singer, Meltano, …
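The connector pattern that Airbyte's Python CDK formalizes boils down to two responsibilities: a connection check and streams that yield records. The toy sketch below illustrates that shape in plain Python - the class and method names are made up for illustration and are not the real CDK API.

```python
import json

class GreetingSource:
    """Toy source connector: a connection check plus a stream of records.
    (Illustrative only - the real Airbyte CDK uses AbstractSource and
    protocol messages, with each connector running in its own container.)"""

    def __init__(self, config: dict):
        self.config = config

    def check(self) -> bool:
        # Verify credentials / reachability before a sync starts.
        return "api_key" in self.config

    def read(self):
        # Each stream yields records as JSON-serializable dicts;
        # the platform wraps them in protocol messages and hands
        # them to the destination.
        for i in range(3):
            yield {"stream": "greetings", "data": {"id": i, "msg": f"hello {i}"}}

source = GreetingSource({"api_key": "secret"})
assert source.check()
for message in source.read():
    print(json.dumps(message))
```

Because the interface is this small, each connector can live in its own container and be swapped independently - the "separation of platform and connectors" bullet above.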
12. DBT
• Data transformation (“Data Build Tool”)
• Only the “T” in EL+T - extraction is handled separately
• ELT approach: so-called models are compiled for the target platform (e.g. a Cloud DWH such as Snowflake) and executed there
• Code-first, SQL with Jinja (templating)
• Growing community; extensions can be downloaded
• SaaS and open source (Python)
• Hosted development environment at cloud.getdbt.com
• Any editor can be used locally; a CLI is available (dbt-core)
• Deployment
• VM, Docker container - can be integrated almost anywhere
• Possible alternatives: Azure Data Factory, Talend, Informatica, …
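The "models are compiled for the target platform" idea can be shown in miniature: a model is SQL with Jinja expressions such as `{{ ref('stg_orders') }}`, and compilation resolves those references to concrete relations in the warehouse. The sketch below mimics that resolution step with a regex instead of a real Jinja engine; the model and relation names are invented for the example.

```python
import re

# Model name -> fully qualified relation in the target warehouse.
# In dbt this mapping comes from the project and profile configuration;
# these names are made up for illustration.
RELATIONS = {"stg_orders": "analytics.staging.stg_orders"}

MODEL_SQL = """
select customer_id, sum(amount) as revenue
from {{ ref('stg_orders') }}
group by customer_id
"""

def compile_model(sql: str) -> str:
    # Replace each {{ ref('name') }} with the qualified relation,
    # roughly mimicking what `dbt compile` does via Jinja.
    return re.sub(r"\{\{\s*ref\('([^']+)'\)\s*\}\}",
                  lambda m: RELATIONS[m.group(1)], sql)

print(compile_model(MODEL_SQL))
```

The compiled SQL is then shipped to the Cloud DWH and executed there - dbt itself moves no data, which is why it covers only the "T" in EL+T.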
13. APACHE AIRFLOW
• Workflow management system
• Originally developed by Airbnb
• Runs DAGs (Directed Acyclic Graphs)
• Nodes contain operators that can execute code, but also control other tools
• Popular for building and running data pipelines
• Well suited for GitOps / integration into pipelines
• Managed variants available, and open source
• Astronomer; managed Airflow offerings at AWS and Google
• Consists of: scheduler, workers, UI, DB, Flower (Celery, Redis)
• Parallel processing on several workers possible
• Scales thanks to container technology
• Possible alternatives: Dagster, Luigi, Prefect, …
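The core idea - tasks wired into a DAG and executed in dependency order - can be sketched without Airflow itself, using only the standard library. The task names below are made up; a real Airflow DAG would wire operators with `>>` and let the scheduler dispatch them to workers.

```python
from graphlib import TopologicalSorter

# A toy pipeline as a DAG: each task maps to the set of tasks it
# depends on, just like operators wired up in an Airflow DAG.
dag = {
    "extract": set(),
    "load": {"extract"},
    "transform": {"load"},
    "report": {"transform"},
}

def run(task: str) -> str:
    # Stand-in for an operator: execute code or control another tool.
    return f"ran {task}"

# The scheduler's job in miniature: respect dependencies when ordering work.
order = list(TopologicalSorter(dag).static_order())
results = [run(t) for t in order]
print(order)   # ['extract', 'load', 'transform', 'report']
```

Tasks with no dependency between them could run in parallel on several workers - exactly what Airflow's Celery-based setup provides.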
18. KUMA IN ACTION
• All internal MDS services get sidecars
• Central overview of the services of all domains
• Status of services
• Metrics of services
• Traffic between components can be controlled
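As an illustration of "traffic between components can be controlled", a Kuma traffic policy might look roughly like the sketch below, allowing only the ingestion service to talk to the warehouse gateway. The service names are hypothetical, and newer Kuma releases favor the `MeshTrafficPermission` resource over this older policy type - treat this as a shape, not a copy-paste config.

```yaml
type: TrafficPermission
name: allow-airbyte-to-dwh
mesh: default
sources:
  - match:
      kuma.io/service: airbyte        # hypothetical service name
destinations:
  - match:
      kuma.io/service: dwh-gateway    # hypothetical service name
```

Any service pair without such a permission is denied by the sidecars, so cross-domain data access becomes an explicit, auditable decision.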
20. WHAT PROBLEMS DOES A SERVICE MESH SOLVE IN AN MDS?
• Centralized service mesh implementation
• Centralized overview of all services
• Internal - MDS
• External - Data APIs
• Centralized monitoring of all services
• Monitoring
• Logging
• Tracing
• Decreased time to market
• Developers don't have to worry about recurring problems
• Security - TLS
• Authentication & authorization
• Support for exporting data & APIs
22. SUMMARY
• Modern Data Stack as a distributed data platform
• One possible architecture to support a Data Mesh implementation
• A service mesh helps to
• secure…
• monitor…
• trace…
• …this Modern Data Stack architecture
• But: the complexity of the system is additionally increased
• The team must have a deep understanding of the service mesh