Relational Database Stockholm Syndrome
Liberate your data
Introduction
Neil Murray – Lead Big Data Architect
• Big Data platforms, products and solutions
• Telco
• Public Sector
6point6 - Technology consultancy with strong expertise in digital, data,
emerging technology and cyber.
About
Digital
Helping businesses on their
digital transformation journey,
which we see as a past-to-future
continuum.
Data
Helping businesses
leverage data platforms,
data science and data
engineering to drive value
from the data they generate.
Cyber
Helping businesses
understand, manage and
contain cyber risks, with
appropriate measures,
prioritised for all their
digital assets.
What we do
Emerging
Technology
Working with businesses to
combine human and
artificial intelligence (AI),
optimising data, process
and technology to support
enhanced decision making.
Relational Database Stockholm Syndrome - the psychological
phenomenon often observed in hostage situations where the hostages
(Data/Services) start to identify with (and sympathise with) their captor
(Relational Database), even though trapped.
Situation
Typical Situation : Data Warehouse
• The Challenge:
• 40+ siloed data sources ingested at irregular intervals ‘as-is’ into a data
warehouse
• Deploy an OLAP data warehouse to co-locate the previously disparate data
for exploration, analysis and analytics.
• A need for abstraction! Business domains, entities and data models to be
identified - subsets of data transformed
• The Result:
• Batch based, single vendor, MPP OLAP shared database with a universal
interface - SQL
• Data team select ELT tooling leveraging push-down SQL to exploit DB
investment – storage and compute
• Platform team to monitor, support, tune, tidy and upgrade
We have all lived the typical situation
[Diagram: DATA sources (DATABASE: OLTP; FILE: CSV, XML, XLS; API: REST, gRPC; STREAMING: Kafka, CDC; …); COLLECT via ELT (Extract, Load, Transform) with a specific ELT tool; DATA WAREHOUSE (relational database, MPP OLAP) holding Atomic Data and Abstract Data.]
Typical Situation : Data Access
• Services deployed - Reporting, Analytics, Search,
Rules Engines, ML, …
• Services access data only via abstract layer
• SQL interface (JDBC, ODBC, Clients)
• Storage and compute - utilise DB to transform and
shape data for specific service usage
• Emerging pattern of exporting transformed data
where appropriate to support specific use cases
The service solution
[Diagram: DATA WAREHOUSE (relational database, MPP OLAP) with Atomic Data and Abstract Data; SERVICES: Search, Reporting, Rules; DATA CONSUMERS: Data Scientist, Data Analyst; SERVICE CONSUMERS.]
Exercising Control
• How to support rich data structures?
• Impossible to predict future needs!
• Data ‘lost in translation’
• Difficult to model key metadata
• Design leads to bad leaky data - shortcuts
• Change to schema design difficult to manage and
has enormous blast area
Relational Schema is too rigid
[Diagram: DATA WAREHOUSE (relational database, MPP OLAP) with Atomic Data and Abstract Data; SERVICES: Search, Reporting, Rules; DATA CONSUMERS: Data Scientist, Data Analyst.]
Exercising Control
• Services struggle to be independently deployable using a
single shared database
• Transformations embed business logic in the DB
• Failed attempts to create Service APIs
• ELT tooling may only work for target DB
• ELT requires careful scheduling, hand cranking
Lost independence
[Diagram: COLLECT via ELT (Extract, Load, Transform) with a specific ELT tool; DATA WAREHOUSE (relational database, MPP OLAP) with Atomic Data and Abstract Data; SERVICES: Search, Reporting and one unlabelled service; DATA CONSUMERS: Data Scientist, Data Analyst.]
Exercising Control
• Complex data ecosystem
• Limited shared resources
• SQL interface
• Punishing queries
• High latency data polling based ‘subscriptions’
• Poll/batch/bulk/delta the only valid approaches =
stale data
Performance
[Diagram: COLLECT via ELT (Extract, Load, Transform) with a specific ELT tool; DATA WAREHOUSE (relational database, MPP OLAP) with Atomic Data and Abstract Data; SERVICES: Analytics, Search, Reporting, Rules; DATA CONSUMERS: Data Scientist, Data Analyst.]
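The staleness that comes with polling-based "subscriptions" can be illustrated with a small sketch. It assumes a hypothetical `orders` table with an `updated_at` column used as a watermark, and uses SQLite from the Python standard library purely as a stand-in for the shared warehouse. The consumer only sees a change when it next polls, and any intermediate states written between two polls are never observed.

```python
import sqlite3


def poll_changes(conn, watermark):
    """Delta poll: return rows changed since `watermark` and the new watermark."""
    rows = conn.execute(
        "SELECT id, status, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    new_watermark = rows[-1][2] if rows else watermark
    return rows, new_watermark


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, updated_at INTEGER)"
)

# Three writes land between two polls; updated_at is a logical clock here.
conn.execute("INSERT INTO orders VALUES (1, 'created', 1)")
conn.execute("UPDATE orders SET status = 'paid', updated_at = 2 WHERE id = 1")
conn.execute("UPDATE orders SET status = 'shipped', updated_at = 3 WHERE id = 1")

# The poller sees only the final row state; 'created' and 'paid' are lost.
rows, watermark = poll_changes(conn, watermark=0)
```

An event log keeps each of those three writes as a separate record, which is the contrast the later slides build on.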
Sympathy
• Temptation to reach around abstract layer
irresistible! Hybrid access patterns emerge
• Spaghetti SQL code harbouring complex nested
dependencies
• No protection from upstream changes
• Tight coupling and technical debt that will never be
repaid
Workarounds
[Diagram: DATA WAREHOUSE (relational database, MPP OLAP) with Atomic Data and Abstract Data; SERVICES: Search, Reporting, Rules.]
Sympathy
• Solutions converge around data locality (gravity)
• ELTTTTTTT
• Repetition of compute
• Challenging to share
• Challenging to maintain
• Bespoke frameworks emerge
Locality and Transformations
[Diagram: DATA WAREHOUSE (relational database, MPP OLAP) with Atomic Data and Abstract Data; SERVICES: Search, Reporting, Rules; FRAMEWORK.]
Data Hostage Test
• Does your data reside in a single DBMS and do the SMT frequently discuss the purchase of additional nodes,
capacity and licenses?
• Is the business slow to adapt? Are opportunities frequently missed?
• Is it easy to trial/adopt new technologies?
• Is there an appetite for change or does change = unpredictable risk and cost?
• You have agile teams, yet agility is not reflected in services/products?
• Is there a personnel skew towards a large DBA/Platform team?
• Cookie-cutter solution architecture - 1 tool fits all? We’ve always done it this way!
Question your data captivity…
Data Liberation
• Kappa Architecture (J. Kreps)
• Turning the Database Inside Out (M. Kleppmann)
• A Database Unbundled (B. Stopford)
• Deconstruct the data warehouse
• Data relocated to a distributed log
• State management
• Queries/Projections relocated to services
• The right tool for the right job
• Separation of storage and compute - BYOC
An alternative approach
[Diagram: DATA sources (DATABASE: OLTP; FILE: CSV, XML, XLS; API: REST, gRPC; STREAMING: Kafka, CDC; …); COLLECT via ETP (Extract, Transform, Publish) with Record Assembly; EVENTS on a DISTRIBUTED LOG + STATE PROCESSING (Apache Kafka, State); SERVICES: Search, Reporting, Rules; STORAGE: Document, Object; DATA CONSUMERS: Data Scientist, Data Analyst.]
Data Liberation
• Entities modelled as Domain Driven Design Aggregates or
Events
• Evolvable Schema – Avro/Protobuf/Thrift
• Support for rich data structures
• Governable/extensible/composable/versionable
• Efficient/full data capture with generics
• A contract, insulating change and democratising data whilst not
mandating shape of storage
Domain Entities in an Evolvable Schema
[Diagram: Abstract Evolvable Schema (Metadata, Common, Type, Generic: Attrs, Feats) alongside the Abstract Relational Schema.]
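How an evolvable schema insulates readers from change can be sketched in a few lines. This is a simplified imitation of the reader-schema idea used by formats such as Avro, not the Avro library itself: the schema dictionaries, the `REQUIRED` marker, the field names and the `resolve` function are all hypothetical. A reader fills fields it expects but the writer did not send from defaults, and ignores fields it does not know.

```python
import copy

REQUIRED = object()  # marker: field has no default

# Hypothetical reader schemas: field name -> default value.
CUSTOMER_V1 = {"id": REQUIRED, "name": REQUIRED}
CUSTOMER_V2 = {"id": REQUIRED, "name": REQUIRED, "segment": "unknown", "attrs": {}}


def resolve(record, reader_schema):
    """Shape a written record to the reader's schema."""
    resolved = {}
    for field, default in reader_schema.items():
        if field in record:
            resolved[field] = record[field]
        elif default is REQUIRED:
            raise ValueError(f"record lacks required field {field!r}")
        else:
            # Copy so that mutable defaults are never shared between records.
            resolved[field] = copy.deepcopy(default)
    return resolved


old_record = {"id": "c1", "name": "Ada"}
new_record = {"id": "c2", "name": "Grace", "segment": "telco", "attrs": {"tier": "gold"}}

upgraded = resolve(old_record, CUSTOMER_V2)    # new reader, old data
downgraded = resolve(new_record, CUSTOMER_V1)  # old reader, new data
```

Here the generic `attrs` map stands for the kind of open-ended capture the slide calls "generics": data with no modelled field yet still has somewhere to go.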
Data Liberation
• Paradigm shift from stateful entity ELT to stateless
entity ETP
• Extract: extensible adapters get data from variety of
data sources batch or stream
• Transform: map entity data to schema
• Publish: submit commands
• Language/tool/log agnostic
Stateful Entity Data Pipeline - ETP
[Diagram: DATA sources (DATABASE: OLTP; FILE: CSV, XML, XLS; API: REST, gRPC; STREAMING: Kafka, CDC; …); COLLECT via ETP (Extract, Transform, Publish) with Record Assembly; a Command Envelope carrying the Abstract Evolvable Schema (Metadata, Common, Type, Generic: Attrs, Feats); EVENTS on a DISTRIBUTED LOG + STATE PROCESSING (Apache Kafka, State).]
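The Extract, Transform, Publish steps above can be sketched as three small functions. Everything named here is an assumption for illustration: the source names, the field mappings, the `UpsertCustomer` command and the envelope fields are hypothetical, and a plain list stands in for the log topic. Each record is handled on its own, so the pipeline itself holds no entity state.

```python
import csv
import io


def extract_csv(text):
    """Adapter: yield raw rows from CSV text (a batch source)."""
    yield from csv.DictReader(io.StringIO(text))


def extract_stream(messages):
    """Adapter: yield raw messages from any iterable (a stream source)."""
    yield from messages


# Per-source field mappings onto the shared entity schema (hypothetical names).
MAPPINGS = {
    "crm_csv": {"cust_id": "id", "full_name": "name"},
    "billing_stream": {"customerId": "id", "displayName": "name"},
}


def transform(raw, source):
    """Map one raw record to the entity schema."""
    return {target: raw[field] for field, target in MAPPINGS[source].items()}


def publish(log, entity, source):
    """Wrap the entity in a command envelope and append it to the log."""
    log.append(
        {"command": "UpsertCustomer", "key": entity["id"], "source": source, "payload": entity}
    )


def run_etp(records, source, log):
    for raw in records:
        publish(log, transform(raw, source), source)


log = []  # in-memory stand-in for a log topic
run_etp(extract_csv("cust_id,full_name\nc1,Ada\n"), "crm_csv", log)
run_etp(extract_stream([{"customerId": "c2", "displayName": "Grace"}]), "billing_stream", log)
```

Adding a data source in this sketch means adding an adapter and a mapping; nothing downstream of the log has to change.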
Data Liberation
• Domain Processing consumes commands to
handle state mutation, de-duplication and
out-of-order resolution
• Utilise Kafka Streams API and RocksDB
• Produces state change events and entity
snapshots for downstream consumers
• Distributed, resilient, separation of concerns
• Eventual Consistency paradigm can remove
assembly complexity (joins)
Stateful Entity Data Pipeline – Domain Processing
[Diagram: EVENTS on a DISTRIBUTED LOG + STATE PROCESSING (Apache Kafka); DOMAIN PROCESSING / AGGREGATE PROCESSING (Kafka Streams, RocksDB) with State; inputs and outputs labelled Command, State Change + Snapshot, Fact.]
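A minimal sketch of the domain processing step follows, in plain Python rather than Kafka Streams, with dictionaries standing in for the RocksDB state store. The command fields (`id`, `key`, `ts`, `payload`) and the resolution policy are assumptions: duplicates are detected by command id, and a command older than the state already held for its entity is dropped. Real out-of-order handling may need a richer policy, such as per-field timestamps.

```python
class DomainProcessor:
    """Toy stand-in for aggregate processing over a command stream."""

    def __init__(self):
        self.state = {}        # key -> {"data": {...}, "as_of": ts}
        self.seen_ids = set()  # handled command ids (unbounded in this sketch)
        self.changes = []      # emitted state change events
        self.snapshots = []    # emitted entity snapshots

    def handle(self, command):
        # De-duplication: a redelivered command is ignored.
        if command["id"] in self.seen_ids:
            return "duplicate"
        self.seen_ids.add(command["id"])

        # Out-of-order resolution: drop commands older than the held state.
        current = self.state.get(command["key"])
        if current is not None and command["ts"] < current["as_of"]:
            return "stale"

        # State mutation: merge the payload into the entity.
        before = current["data"] if current else {}
        after = {**before, **command["payload"]}
        changed = {
            f: v for f, v in after.items() if f not in before or before[f] != v
        }
        self.state[command["key"]] = {"data": after, "as_of": command["ts"]}
        if not changed:
            return "unchanged"

        event = {"key": command["key"], "ts": command["ts"]}
        self.changes.append({**event, "changed": changed})
        self.snapshots.append({**event, "data": dict(after)})
        return "applied"


processor = DomainProcessor()
commands = [
    {"id": "a", "key": "c1", "ts": 10, "payload": {"name": "Ada"}},
    {"id": "a", "key": "c1", "ts": 10, "payload": {"name": "Ada"}},   # redelivery
    {"id": "b", "key": "c1", "ts": 5, "payload": {"name": "Old"}},    # late arrival
    {"id": "c", "key": "c1", "ts": 12, "payload": {"segment": "telco"}},
]
results = [processor.handle(c) for c in commands]
```

Downstream consumers can then choose between the small state change events and the full snapshots, depending on whether they keep state of their own.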
Data Liberation
• Enrichment pattern to utilise Domain Processing
mutation logic
• Examples: lookups, conformance, matching,
scoring, feature engineering
• De-coupled, event driven, independently deployable
• Cross-datasource
Stateful Entity Data Pipeline - Enrichment
[Diagram: three ENRICHMENT (Lookup) components around EVENTS on a DISTRIBUTED LOG + STATE PROCESSING (Apache Kafka); DOMAIN PROCESSING / AGGREGATE PROCESSING (Kafka Streams, RocksDB) with State; flows labelled Command and State Change + Snapshot.]
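The enrichment pattern can be sketched as a function that reads a snapshot and, when it has something to add, returns a command for domain processing to apply. The lookup table, the field names and the `apply_command` stand-in for domain processing are all hypothetical. The point of the sketch is that enrichment never mutates state itself, and that it stops once the field it supplies is present.

```python
COUNTRY_BY_PREFIX = {"+44": "GB", "+46": "SE"}  # hypothetical lookup table


def enrich_country(snapshot):
    """Return an enrichment command for this snapshot, or None."""
    data = snapshot["data"]
    if "country" in data:
        return None  # already enriched; this ends the feedback loop
    for prefix, country in COUNTRY_BY_PREFIX.items():
        if data.get("phone", "").startswith(prefix):
            return {
                "key": snapshot["key"],
                "source": "enrichment.country",
                "payload": {"country": country},
            }
    return None


def apply_command(state, command):
    """Stand-in for domain processing: merge the payload, emit a snapshot."""
    merged = {**state.get(command["key"], {}), **command["payload"]}
    state[command["key"]] = merged
    return {"key": command["key"], "data": dict(merged)}


state = {}
snapshot = apply_command(
    state, {"key": "c1", "source": "crm", "payload": {"phone": "+46700000000"}}
)
command = enrich_country(snapshot)          # enrichment proposes a change
enriched = apply_command(state, command)    # domain processing applies it
again = enrich_country(enriched)            # nothing further to add
```

Because the enrichment only consumes snapshots and publishes commands, it can be deployed, replaced or removed without touching the processor.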
Data Liberation
• Polyglot data stores, the right tool for the right
job
• Maintain existing services
• Embedded lightweight materialised views
(CQRS)
• Low latency
• Independently deployable, agile
• Service patterns: stateful (snapshot), stateless
(state change), ephemeral, one-time, serverless, …
Stateful Entity Data Pipeline - Services
[Diagram: EVENTS on a DISTRIBUTED LOG + STATE PROCESSING (Apache Kafka, State); SERVICES: Search, Reporting, Rules; SERVING stores: Document, Object; flows labelled Command and State Change + Snapshot.]
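Two of the service patterns listed above can be contrasted in a short sketch. The event shapes and names are assumptions. The stateful service keeps an embedded materialised view built from snapshots, which is the read side of CQRS. The stateless service keeps nothing and simply reacts to each state change event.

```python
class CustomerView:
    """Stateful pattern: a materialised view kept up to date from snapshots."""

    def __init__(self):
        self._by_key = {}

    def on_snapshot(self, snapshot):
        # Each snapshot carries the full entity, so the latest one wins.
        self._by_key[snapshot["key"]] = snapshot["data"]

    def get(self, key):
        return self._by_key.get(key)


def segment_alert(change):
    """Stateless pattern: react to one state change event, keep nothing."""
    new_segment = change["changed"].get("segment")
    if new_segment is None:
        return None
    return f"{change['key']} moved to segment {new_segment}"


view = CustomerView()
view.on_snapshot({"key": "c1", "data": {"name": "Ada"}})
view.on_snapshot({"key": "c1", "data": {"name": "Ada", "segment": "telco"}})

alerts = [
    segment_alert(change)
    for change in [
        {"key": "c1", "changed": {"name": "Ada"}},
        {"key": "c1", "changed": {"segment": "telco"}},
    ]
]
```

A view like this answers reads locally, which is where the low latency comes from, and it can be rebuilt at any time by replaying snapshots from the log.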
The way forward
• Right tool for the right job – an enabler for innovation
• Don’t underestimate the sympathy factor – the process will be like removing a comfort blanket, there will be
resistance. Challenge opinions on storage vs compute, batch vs stream, SQL all the things
• Your schema is your contract – take care to maintain compatibility, put governance in place. Avoid the
temptation to use JSON
• Utilise Kafka ecosystem first
• Leverage Kafka Streams for domain processing and services or KSQL where appropriate
• Use Schema Registry
• Prefer Kafka Connect
• Consider Kafka managed service offerings – Confluent Cloud, KMS
Learnings
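The "schema is your contract" learning suggests a governance gate before a schema change is released. The sketch below is a deliberately narrow, hypothetical check and not the Schema Registry API: it only tests whether a reader on the new schema could still read records written with the old one, by requiring a default for every added field. Type changes and renames are ignored here.

```python
REQUIRED = object()  # marker: field has no default


def compatibility_problems(old_schema, new_schema):
    """List reasons a reader on new_schema could not read old_schema records."""
    return [
        f"new field {field!r} has no default"
        for field, default in new_schema.items()
        if field not in old_schema and default is REQUIRED
    ]


V1 = {"id": REQUIRED, "name": REQUIRED}
V2_SAFE = {"id": REQUIRED, "name": REQUIRED, "segment": "unknown"}
V2_BREAKING = {"id": REQUIRED, "name": REQUIRED, "segment": REQUIRED}

safe = compatibility_problems(V1, V2_SAFE)
breaking = compatibility_problems(V1, V2_BREAKING)
```

In practice a schema registry can enforce checks of this kind at publish time, so a breaking change is rejected before any consumer sees it.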
Get in touch
Neil Murray
Lead Big Data Architect, Data
neil.murray@6point6.co.uk
About 6point6
Integrating digital technology into your business can result in fundamental changes to
how you operate and deliver value to your customers. To go digital is to reinvent
yourself to the core, opening yourself and your clients to a world of possibilities.
6point6 is a technology consultancy. We bring a wealth of hands-on experience to help
financial service providers, media houses and government achieve more with digital.
Using cutting edge technology and agile delivery methods, we help you reinvent,
transform and secure a brighter digital future.
Visit us on www.6point6.co.uk
Twitter: @6point6ltd
LinkedIn: linkedin.com/company/6point6
Relational Database Stockholm Syndrome (Neil Murray, 6 Point 6) London 2019 Confluent Streaming Event

  • 2. Introduction Neil Murray – Lead Big Data Architect • Big Data platforms, products and solutions • Telco • Public Sector 6point6 - Technology consultancy with strong expertise in digital, data, emerging technology and cyber. About
  • 3. Digital Helping businesses on their digital transformation journey, which we see as a past to the future continuum. Data Helping businesses leverage data platforms, data science and data engineering to drive value from the data they generate. Cyber Helping businesses understand, manage and contain cyber risks, with appropriate measures, prioritised for all their digital assets. What we do Emerging Technology Working with businesses to combine human and artificial intelligence (AI), optimising data, process and technology to support enhanced decision making.
  • 4. Relational Database Stockholm Syndrome - the psychological phenomenon often observed in hostage situations where the hostages (Data/Services) start to identify with (and sympathise with) their captor (Relational Database), even though trapped. Situation
  • 5. Typical Situation : Data Warehouse • The Challenge: • 40+ siloed data sources ingested at irregular intervals ‘as-is’ into a data warehouse • Deploy an OLAP data warehouse to co-locate the previously disparate data for exploration, analysis and analytics. • A need for abstraction! Business domains, entities and data models to be identified - subsets of data transformed • The Result: • Batch based, single vendor, MPP OLAP shared database with a universal interface - SQL • Data team select ELT tooling leveraging push-down SQL to exploit DB investment – storage and compute • Platform team to monitor, support, tune, tidy and upgrade We have all lived the typical situation DATA WAREHOUSE RELATIONAL DATABASE MPP OLAP Atomic Data COLLECT ELT Specific ELT Tool Extract Load Transform DATA DATABASE OLTP FILE CSV, XML, XLS API REST, GRPC STREAMING Kafka, CDC … … Abstract Data
  • 6. Typical Situation : Data Access • Services deployed - Reporting, Analytics, Search, Rules Engines, ML, … • Services access data only via abstract layer • SQL interface (JDBC, ODBC, Clients) • Storage and compute - utilise DB to transform and shape data for specific service usage • Emerging pattern of exporting transformed data where appropriate to support specific use cases The service solution SERVICE SEARCH SERVICE REPORTING DATA WAREHOUSE RELATIONAL DATABASE MPP OLAP Atomic Data Abstract Data SERVICE RULES DATA CONSUMERS Data Scientist Data Analyst SERVICE CONSUMERS
  • 7. Exercising Control • How to support rich data structures? • Impossible to predict future needs! • Data ‘lost in translation’ • Difficult to model key metadata • Design leads to bad leaky data - shortcuts • Change to schema design difficult to manage and has enormous blast area Relational Schema is too rigid SERVICE SEARCH SERVICE REPORTING DATA WAREHOUSE RELATIONAL DATABASE MPP OLAP Atomic Data Abstract Data SERVICE RULES DATA CONSUMERS Data Scientist Data Analyst
  • 8. Exercising Control: Lost independence
      • Services struggle to be independently deployable when they share a single database
      • Transformations embed business logic in the DB
      • Failed attempts to create Service APIs
      • ELT tooling may only work for the target DB
      • ELT requires careful scheduling and hand-cranking
      • [Diagram: the specific ELT tool (Extract, Load, Transform) feeding the MPP OLAP relational data warehouse (atomic and abstract data), with Search and Reporting services and data consumers (Data Scientist, Data Analyst) attached.]
  • 9. Exercising Control: Performance
      • Complex data ecosystem
      • Limited shared resources
      • SQL interface
      • Punishing queries
      • High-latency, polling-based data ‘subscriptions’
      • Poll, batch, bulk and delta are the only valid approaches, which means stale data
      • [Diagram: the specific ELT tool (Extract, Load, Transform) feeding the MPP OLAP relational data warehouse (atomic and abstract data), with Analytics, Search, Reporting and Rules services and data consumers (Data Scientist, Data Analyst) attached.]
  • 10. Sympathy: Workarounds
      • The temptation to reach around the abstract layer is irresistible! Hybrid access patterns emerge
      • Spaghetti SQL code harbouring complex nested dependencies
      • No protection from upstream changes
      • Tight coupling and technical debt that will never be repaid
      • [Diagram: Search, Reporting and Rules services around the MPP OLAP relational data warehouse with its atomic and abstract data.]
  • 11. Sympathy: Locality and Transformations
      • Solutions converge around data locality (gravity)
      • ELTTTTTTT
      • Repetition of compute
      • Challenging to share
      • Challenging to maintain
      • Bespoke frameworks emerge
      • [Diagram: Search, Reporting and Rules services and a bespoke framework around the MPP OLAP relational data warehouse with its atomic and abstract data.]
  • 12. Data Hostage Test: Question your data captivity…
      • Does your data reside in a single DBMS, and do the SMT frequently discuss the purchase of additional nodes, capacity and licenses?
      • Is the business slow to adapt? Are opportunities frequently missed?
      • Is it easy to trial or adopt new technologies?
      • Is there an appetite for change, or does change = unpredictable risk and cost?
      • Do you have agile teams whose agility is not reflected in services and products?
      • Is there a personnel skew towards a large DBA/Platform team?
      • Cookie-cutter solution architecture: 1 tool fits all? We’ve always done it this way!
  • 13. Data Liberation: An alternative approach
      • Kappa Architecture (J. Kreps)
      • Turning the Database Inside Out (M. Kleppmann)
      • A Database Unbundled (B. Stopford)
      • Deconstruct the data warehouse
      • Data relocated to a distributed log
      • State management
      • Queries/Projections relocated to services
      • The right tool for the right job
      • Separation of storage and compute - BYOC
      • [Diagram: data sources (OLTP databases; CSV, XML and XLS files; REST and gRPC APIs; Kafka and CDC streams; …) are collected by ETP record assembly (Extract, Transform, Publish) into events on a distributed log with state processing (Apache Kafka, State). Search, Reporting and Rules services, document and object storage, and data consumers (Data Scientist, Data Analyst) sit downstream.]
  • 14. Data Liberation: Domain Entities in an Evolvable Schema
      • Entities modelled as Domain-Driven Design Aggregates or Events
      • Evolvable schema: Avro/Protobuf/Thrift
      • Support for rich data structures
      • Governable/extensible/composable/versionable
      • Efficient/full data capture with generics
      • A contract, insulating change and democratising data whilst not mandating the shape of storage
      • [Diagram: an Abstract Evolvable Schema (Metadata, Common, Type, Generic, Attrs, Feats) set against an Abstract Relational Schema.]
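To make the "schema as a contract" idea on this slide concrete, here is a minimal sketch of how an evolvable schema lets old and new records coexist. It is loosely modelled on the reader-schema-with-defaults idea found in formats such as Avro, but it is plain Python and not any library's API. The `CUSTOMER_V1` and `CUSTOMER_V2` schemas, their field names and the `resolve` helper are all hypothetical.

```python
import copy
from typing import Any

REQUIRED = object()  # sentinel: the field has no default and must be present

# Hypothetical schema versions (field name -> default value), not from the slides.
CUSTOMER_V1 = {"id": REQUIRED, "name": REQUIRED}
CUSTOMER_V2 = {"id": REQUIRED, "name": REQUIRED, "segment": "unknown", "attrs": {}}


def resolve(record: dict[str, Any], reader_schema: dict[str, Any]) -> dict[str, Any]:
    """Project a record onto the reader's schema.

    Fields the reader does not know are dropped; fields the writer did not
    supply are filled from the reader's defaults.
    """
    resolved: dict[str, Any] = {}
    for field_name, default in reader_schema.items():
        if field_name in record:
            resolved[field_name] = record[field_name]
        elif default is REQUIRED:
            raise ValueError(f"missing required field: {field_name}")
        else:
            # Copy so that mutable defaults are never shared between records.
            resolved[field_name] = copy.deepcopy(default)
    return resolved


old_record = {"id": "c1", "name": "Ada"}
new_record = {"id": "c2", "name": "Grace", "segment": "gold", "attrs": {"tier": 1}}

upgraded = resolve(old_record, CUSTOMER_V2)    # old data read by a new consumer
downgraded = resolve(new_record, CUSTOMER_V1)  # new data read by an old consumer
```

Adding a field with a default keeps both directions working, which is why defaults and compatibility rules matter more than the storage shape.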
  • 15. Data Liberation: Stateful Entity Data Pipeline - ETP
      • Paradigm shift from stateful entity ELT to stateless entity ETP
      • Extract: extensible adapters get data from a variety of data sources, batch or stream
      • Transform: map entity data to the schema
      • Publish: submit commands
      • Language/tool/log agnostic
      • [Diagram: data sources (OLTP databases; CSV, XML and XLS files; REST and gRPC APIs; Kafka and CDC streams; …) pass through ETP record assembly (Extract, Transform, Publish), which wraps data conforming to the Abstract Evolvable Schema (Metadata, Common, Type, Generic, Attrs, Feats) in a Command Envelope and publishes it to events on a distributed log with state processing (Apache Kafka, State).]
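The Extract, Transform, Publish steps on this slide can be sketched as three small functions. This is an illustrative, in-memory sketch only: the `Command` envelope fields, the CSV column names and the use of a Python list as the "log" are assumptions made for the example, and a real pipeline would publish to a distributed log such as Kafka.

```python
import csv
import io
from dataclasses import dataclass
from typing import Any, Iterable, Iterator


@dataclass(frozen=True)
class Command:
    """Hypothetical command envelope; the field names are assumptions."""
    entity_type: str
    entity_id: str
    source: str
    event_time: int
    payload: dict[str, Any]


def extract_csv(text: str) -> Iterator[dict[str, str]]:
    """Extract: one adapter per source type; this one reads CSV rows."""
    yield from csv.DictReader(io.StringIO(text))


def transform(row: dict[str, str], source: str) -> Command:
    """Transform: map a raw row onto the entity schema, keeping no state."""
    return Command(
        entity_type="customer",
        entity_id=row["id"],
        source=source,
        event_time=int(row["updated_at"]),
        payload={"name": row["name"].strip()},
    )


def publish(commands: Iterable[Command], log: list[Command]) -> None:
    """Publish: submit commands to the log (a list stands in for a topic)."""
    log.extend(commands)


raw = "id,name,updated_at\nc1, Ada ,100\nc2,Grace,105\n"
log: list[Command] = []
publish((transform(row, "crm.csv") for row in extract_csv(raw)), log)
```

Because each step is stateless, the same `transform` can serve a batch file or a stream of rows; any state handling is left to downstream processing.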
  • 16. Data Liberation: Stateful Entity Data Pipeline - Domain Processing
      • Domain Processing consumes commands to handle state mutation, de-duplication and out-of-order resolution
      • Utilise the Kafka Streams API and RocksDB
      • Produces state change events and entity snapshots for downstream consumers
      • Distributed, resilient, separation of concerns
      • The Eventual Consistency paradigm can remove assembly complexity (joins)
      • [Diagram: domain (aggregate) processing, built on Kafka Streams and RocksDB, consumes Command messages from the distributed log (Apache Kafka, State) and writes back State Change + Snapshot and Fact messages.]
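The responsibilities listed for Domain Processing (state mutation, de-duplication, out-of-order resolution, emitting state changes and snapshots) can be sketched in a few lines. This is an in-memory stand-in, not Kafka Streams code: the dictionary plays the role a RocksDB-backed state store would play, and the `AggregateProcessor` class, its `handle` signature, the event shapes and the last-writer-wins-by-event-time policy are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class AggregateProcessor:
    # entity_id -> (last event time applied, current attributes)
    state: dict[str, tuple[int, dict[str, Any]]] = field(default_factory=dict)
    # command ids already processed, used for de-duplication
    seen: set[str] = field(default_factory=set)

    def handle(self, command_id: str, entity_id: str, event_time: int,
               payload: dict[str, Any]) -> list[dict[str, Any]]:
        """Apply one command and return the events to emit downstream."""
        if command_id in self.seen:
            return []  # duplicate delivery: ignore
        self.seen.add(command_id)

        last_time, attrs = self.state.get(entity_id, (-1, {}))
        if event_time < last_time:
            return []  # arrived out of order and is stale: ignore

        changed = {k: v for k, v in payload.items() if attrs.get(k) != v}
        new_attrs = {**attrs, **payload}
        self.state[entity_id] = (event_time, new_attrs)
        if not changed:
            return []  # nothing actually changed, so nothing to announce
        return [
            {"type": "state_change", "entity_id": entity_id, "changed": changed},
            {"type": "snapshot", "entity_id": entity_id, "state": new_attrs},
        ]


processor = AggregateProcessor()
first = processor.handle("k1", "c1", 100, {"name": "Ada"})
duplicate = processor.handle("k1", "c1", 100, {"name": "Ada"})
stale = processor.handle("k2", "c1", 90, {"name": "Old"})
later = processor.handle("k3", "c1", 110, {"segment": "gold"})
```

Stateless consumers can subscribe to the `state_change` events alone, while stateful consumers can rebuild from the `snapshot` events.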
  • 17. Data Liberation: Stateful Entity Data Pipeline - Enrichment
      • Enrichment pattern to utilise Domain Processing mutation logic
      • Examples: lookups, conformance, matching, scoring, feature engineering
      • De-coupled, event-driven, independently deployable
      • Cross-datasource
      • [Diagram: several enrichment (lookup) components attached to the distributed log (Apache Kafka, State) alongside domain (aggregate) processing on Kafka Streams and RocksDB, exchanging Command and State Change + Snapshot messages.]
  • 18. Data Liberation: Stateful Entity Data Pipeline - Services
      • Polyglot data stores, the right tool for the right job
      • Maintain existing services
      • Embedded lightweight materialised views (CQRS)
      • Low latency
      • Independently deployable, agile
      • Service patterns: stateful (snapshot), stateless (state change), ephemeral, one-time, serverless, …
      • [Diagram: Search, Reporting and Rules services and document and object serving stores attached to the distributed log (Apache Kafka, State), exchanging Command and State Change + Snapshot messages.]
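As an illustration of the "embedded lightweight materialised view" bullet, the sketch below shows a search service keeping its own read model up to date from snapshot events. The `SearchView` class, the snapshot event shape and the choice to index only a `name` attribute are assumptions for the example; a real service would consume the events from the log and might keep the view in whichever store suits it best.

```python
from collections import defaultdict
from typing import Any


class SearchView:
    """A tiny read model: a token -> entity ids index built from snapshots."""

    def __init__(self) -> None:
        self.docs: dict[str, dict[str, Any]] = {}
        self.index: defaultdict[str, set[str]] = defaultdict(set)

    @staticmethod
    def _tokens(state: dict[str, Any]) -> list[str]:
        return str(state.get("name", "")).lower().split()

    def apply(self, snapshot: dict[str, Any]) -> None:
        """Replace the entity's document and re-index it."""
        entity_id = snapshot["entity_id"]
        for token in self._tokens(self.docs.get(entity_id, {})):
            self.index[token].discard(entity_id)  # drop stale index entries
        self.docs[entity_id] = snapshot["state"]
        for token in self._tokens(snapshot["state"]):
            self.index[token].add(entity_id)

    def search(self, term: str) -> list[str]:
        return sorted(self.index.get(term.lower(), set()))


view = SearchView()
view.apply({"entity_id": "c1", "state": {"name": "Ada Lovelace"}})
view.apply({"entity_id": "c2", "state": {"name": "Grace Hopper"}})
view.apply({"entity_id": "c1", "state": {"name": "Ada King"}})  # a later snapshot
```

The view is disposable: it can be rebuilt at any time by replaying snapshots, which is what lets each service choose and change its own storage independently.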
  • 19. The way forward: Learnings
      • Right tool for the right job: an enabler for innovation
      • Don’t underestimate the sympathy factor. The process will be like removing a comfort blanket, and there will be resistance. Challenge opinions on storage vs compute, batch vs stream, and ‘SQL all the things’
      • Your schema is your contract: take care to maintain compatibility and put governance in place. Avoid the temptation to use JSON
      • Utilise the Kafka ecosystem first
      • Leverage Kafka Streams for domain processing and services, or KSQL where appropriate
      • Use Schema Registry
      • Prefer Kafka Connect
      • Consider Kafka managed service offerings: Confluent Cloud, KMS
  • 20. Get in touch
      • Neil Murray, Lead Big Data Architect, Data: neil.murray@6point6.co.uk
      • About 6point6: Integrating digital technology into your business can result in fundamental changes to how you operate and deliver value to your customers. To go digital is to reinvent yourself to the core, opening yourself and your clients to a world of possibilities. 6point6 is a technology consultancy. We bring a wealth of hands-on experience to help financial service providers, media houses and government achieve more with digital. Using cutting-edge technology and agile delivery methods, we help you reinvent, transform and secure a brighter digital future.
      • Visit us on www.6point6.co.uk
      • Twitter: @6point6ltd
      • LinkedIn: linkedin.com/company/6point6