Data Architectures for Robust Decision Making

Gwen (Chen) Shapira, System Architect at Confluent
21 Feb 2015

Speaker notes

  1. This gives me a lot of perspective regarding the use of Hadoop
  2. Not everyone, obviously. But I see a lot of “POC” type use-cases. 1 use case, maybe 3 data sources, 2 interesting insights from analysis. Everything requires lots of manual labor.
  3. Shikumika means “systemize.” This is the step that is crucial to improvement at any large entity. Shikumika means creating a base on which you can continue the improvement process. Because at an individual level, the original three steps are sufficient: build a hypothesis; act on it; and verify the results. If the validation proves the hypothesis to be the right one, you can simply continue acting on it. But for an entire organization, that’s not enough. The steps could end up as a hollow slogan. From the viewpoint of an organization, the cycle of hypothesizing, practicing and validating, conducted by an employee or a department, is a small experiment. If a hypothesis holds true in a small experiment, we can run with that hypothesis on a larger, organization-wide scale.
  4. We are looking for AGILE. The ability to expand, grow and evolve. To be flexible without adding tons of risk and overhead.
  5. Then we end up adding clients to use that source.
  6. But as we start to deploy our applications, we realize that clients need data from a number of sources. So we add them as needed.
  7. But over time, particularly if we are segmenting services by function, we have stuff all over the place, and the dependencies are a nightmare. This makes for a fragile system.
  8. Kafka is a pub/sub messaging system that can decouple your data pipelines. Most of you are probably familiar with its history at LinkedIn, where it is used as a high-throughput, relatively low-latency commit log. It allows sources to push data without worrying about which clients are reading it. Note that producers push, and consumers pull. Kafka itself is a cluster of brokers, which handle both persisting data to disk and serving it in response to consumer requests. (A minimal producer sketch follows these notes.)
  9. Approach #1 is easier to develop and deploy in production, and doesn't require a set of "spare" servers for the second stream. Approach #2 allows for real-time experiments.
  10. There will be tools and patterns to move seamlessly between the two. Perhaps you won’t even need to care – just say how often you want the data refreshed – every day? Hour? 5 minutes? 5 seconds? 5 milliseconds?
  11. Sorry, but "Schema on Read" is kind of B.S. We admit that there is a schema, but we want to "ingest fast", so we shift the burden to the readers. But the data is written once and read many, many times by many different people. They each need to figure the schema out on their own? This makes no sense. Also, how are you going to validate the data without a schema? (See the Avro sketch after these notes.)
  14. https://github.com/schema-repo/schema-repo
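
To make the decoupling described in note 8 concrete, here is a minimal Java producer sketch. It is an illustration only: the broker addresses and the page-views topic are invented for the example. The producer pushes to a topic without knowing which clients will read it; consumers pull from the brokers independently.

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class PageViewProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            // The producer only needs the broker list -- it knows nothing
            // about who consumes the topic downstream.
            props.put("bootstrap.servers", "broker1:9092,broker2:9092");
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            Producer<String, String> producer = new KafkaProducer<>(props);
            // Push one event; brokers persist it to disk, and any number of
            // consumer groups can pull it later, each at its own pace.
            producer.send(new ProducerRecord<>("page-views", "user-42", "/pricing"));
            producer.close();
        }
    }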
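
Similarly, a small Avro sketch of the schema-on-write argument from note 11 (the PageView record and its fields are made up for the illustration): the writer declares the schema once and validates records before publishing them, instead of shifting that burden to every reader.

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;

    public class SchemaOnWriteDemo {
        // The schema is declared once, by the writer, instead of every
        // reader reverse-engineering field names and types on their own.
        private static final String SCHEMA_JSON =
            "{\"type\":\"record\",\"name\":\"PageView\",\"fields\":["
            + "{\"name\":\"user\",\"type\":\"string\"},"
            + "{\"name\":\"path\",\"type\":\"string\"}]}";

        public static void main(String[] args) {
            Schema schema = new Schema.Parser().parse(SCHEMA_JSON);

            GenericRecord view = new GenericData.Record(schema);
            view.put("user", "user-42");
            view.put("path", "/pricing");

            // Validate at write time, before the record is published,
            // rather than letting each reader discover problems later.
            System.out.println(GenericData.get().validate(schema, view)); // true
        }
    }

A schema repository, such as the schema-repo project linked in note 14, is one place producers and consumers can share these schemas.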