Structured Streaming Data Pipeline Using Kafka, Spark, and MapR-DB
Carol McDonald
1.
© 2017 MapR Technologies
Data Pipeline Using Apache APIs: Kafka, Spark, and MapR-DB
2.
Data Pipeline Using Apache APIs: Kafka, Spark, and MapR-DB
• Kafka
• Spark Streaming
• Spark SQL
3.
Streaming ETL Pipeline
[Diagram: Data, Collect, Process, Store, Analyze; labels: Stream Topic, Kafka API, Spark Streaming, Open JSON API, JSON, SQL]
4.
Traditional ETL
Image Reference: Databricks
5.
Streaming ETL
Image Reference: Databricks
6.
What is a Stream?
• A stream is a continuous sequence of events or records
• Records are key-value pairs
[Diagram: Producers → Stream of Data (key/value records) → Consumers]
7.
Examples of Streaming Data
• Fraud detection
• Smart Machinery
• Smart Meters
• Home Automation
• Networks
• Manufacturing
• Security Systems
• Patient Monitoring
8.
Examples of Streaming Data
• Monitoring devices combined with ML can provide alerts for sepsis, which is one of the leading causes of death in hospitals
– http://www.computerweekly.com/news/450422258/Putting-sepsis-algorithms-into-electronic-patient-records
9.
Examples of Streaming Data
• A Stanford team has shown that a machine-learning model can identify heart arrhythmias from an electrocardiogram (ECG) better than an expert
– https://www.technologyreview.com/s/608234/the-machines-are-getting-ready-to-play-doctor/
10.
Applying Machine Learning to Live Patient Data
• https://www.slideshare.net/caroljmcdonald/applying-machine-learning-to-live-patient-data
11.
What has changed in the past 10 years?
• Distributed computing
• Streaming analytics
• Improved machine learning
12.
What Do We Need to Do?
[Diagram: Data Sources → Collect Data → Process Data → Store Data → Serve Data, each stage marked with a "?"]
13.
Collect the Data
• Data Ingest:
– Using the Kafka API
[Diagram: Source → Data Ingest → Stream Topic]
14.
Organize Data into Topics with MapR Event Streams
Topics: logical collections of events that organize events into categories
[Diagram: consumers connect through the Kafka API to the Pressure, Temperature, and Warnings topics in a MapR cluster]
15.
Scalable Messaging with MapR Event Streams
Topics are partitioned for throughput and scalability
[Diagram: Servers 1 to 3; server N holds partition N of each of the Pressure, Temperature, and Warning topics]
16.
Scalable Messaging with MapR Event Streams
Producers are load balanced between partitions
[Diagram: producers write through the Kafka API to partitions 1 to 3 of the Pressure, Temperature, and Warning topics]
17.
Scalable Messaging with MapR Event Streams
Consumer groups can read in parallel
[Diagram: consumers read through the Kafka API from partitions 1 to 3 of the Pressure, Temperature, and Warning topics]
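One common way to spread keyed records over partitions is to hash the key, so that records with the same key always land in the same partition. The sketch below illustrates that idea in plain Scala. `KeyPartitioner`, `partitionFor`, and `assign` are hypothetical names, and the use of `hashCode` is an assumption made for illustration; it is not the partitioner that Kafka or MapR Event Streams actually ships.

```scala
// Hypothetical sketch of hash-based partition assignment; not a Kafka or MapR API.
object KeyPartitioner {
  // Map a record key to a partition index in [0, numPartitions).
  // floorMod keeps the result non-negative even when hashCode is negative.
  def partitionFor(key: String, numPartitions: Int): Int =
    Math.floorMod(key.hashCode, numPartitions)

  // Group (key, value) records by the partition they would be written to.
  // Records keep their arrival order within each partition.
  def assign(records: Seq[(String, String)], numPartitions: Int): Map[Int, Seq[(String, String)]] =
    records.groupBy { case (key, _) => partitionFor(key, numPartitions) }

  def main(args: Array[String]): Unit = {
    val records = Seq("sensor-1" -> "21.5", "sensor-2" -> "19.0", "sensor-1" -> "21.7")
    assign(records, 3).toSeq.sortBy(_._1).foreach { case (partition, rs) =>
      println(s"partition $partition: $rs")
    }
  }
}
```

Because all records for one key share a partition, a consumer reading that partition sees them in the order they were produced.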
18.
Partition is like an Event Log
New messages are appended to the end
[Diagram: producers append to three partitions of the Admission topic (one per server) in a MapR cluster; messages are numbered from old (1) to new (6); consumers read from the partitions]
19.
Partition is like a Queue
Messages are delivered in the order they are received
[Diagram: producers write messages 1 to 6 to a MapR cluster; each consumer group has its own read cursor]
20.
Unlike a queue, events are still persisted after they're delivered
Messages remain on the partition, available to other consumers
[Diagram: MapR cluster (1 server), topic Warning, partition 1 with events 1 to 3; the consumer polls through the client library to get unread events]
21.
When Are Messages Deleted?
• Messages can be persisted forever
• Or older messages can be deleted automatically based on time to live
[Diagram: MapR cluster (1 server), partition 1 with messages 1 to 6; message 1 is the older message]
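The partition behavior described on the preceding slides (append-only log, ordered delivery, one read cursor per consumer group, messages kept after delivery, optional time-to-live) can be sketched with a small in-memory model. This is an illustration under stated assumptions, not the MapR Event Streams or Kafka implementation: `Record`, `PartitionLog`, `poll`, and `expireOlderThan` are hypothetical names, and timestamps are passed in explicitly so the example stays deterministic.

```scala
import scala.collection.mutable

// Hypothetical in-memory model of a single topic partition.
final case class Record(offset: Long, timestampMs: Long, key: String, value: String)

final class PartitionLog {
  private val records = mutable.ArrayBuffer.empty[Record]
  // Next offset to read, tracked separately for each consumer group.
  private val cursors = mutable.Map.empty[String, Long]
  private var nextOffset = 0L

  // New messages are appended to the end and receive the next offset.
  def append(key: String, value: String, timestampMs: Long): Long = {
    val offset = nextOffset
    records += Record(offset, timestampMs, key, value)
    nextOffset += 1
    offset
  }

  // Return up to `max` unread records for this group, in offset order,
  // and advance only this group's cursor. Records stay in the log.
  def poll(group: String, max: Int): List[Record] = {
    val from = cursors.getOrElse(group, 0L)
    val batch = records.iterator.filter(_.offset >= from).take(max).toList
    batch.lastOption.foreach(r => cursors(group) = r.offset + 1)
    batch
  }

  // Time-to-live cleanup: drop records older than the cutoff, return how many were dropped.
  def expireOlderThan(cutoffMs: Long): Int = {
    val kept = records.filter(_.timestampMs >= cutoffMs)
    val removed = records.size - kept.size
    records.clear()
    records ++= kept
    removed
  }

  def size: Int = records.size
}
```

Two consumer groups polling the same `PartitionLog` each receive every record, because delivery moves a cursor rather than removing data; only `expireOlderThan` deletes anything.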
22.
Traditional Message Queue
23.
How do we do this with High Performance at Scale?
• Parallel operations
• Minimizes disk reads/writes
24.
Processing Same Message for Different Purposes
[Diagram: producers write through the Kafka API to MapR-FS; several consumers read the same messages through the Kafka API]
25.
Stream as the System of Record
26.
A Table is a Snapshot of a Stream
Imagine each event as a change to an entry in a database.
Change log (updates):
1: WillO : Deposit : 100.00
2: BradA : Deposit : 50.00
3: BradA : Withdraw : 30.00
4: WillO : Withdraw : 20.00
Table:
Account Id | Balance
WillO | 80.00
BradA | 20.00
Source: https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
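The relationship between the change log and the table can be sketched as a fold: replaying the events in order yields the balances. The event values below are the four from this slide; the names `AccountSnapshot`, `Event`, and `snapshot` are hypothetical and chosen only for this illustration.

```scala
// Hypothetical sketch: derive the account table by replaying the change log in order.
object AccountSnapshot {
  sealed trait Kind
  case object Deposit extends Kind
  case object Withdraw extends Kind

  final case class Event(seq: Int, account: String, kind: Kind, amount: BigDecimal)

  // The four change-log entries shown on the slide.
  val changeLog: Seq[Event] = Seq(
    Event(1, "WillO", Deposit, BigDecimal("100.00")),
    Event(2, "BradA", Deposit, BigDecimal("50.00")),
    Event(3, "BradA", Withdraw, BigDecimal("30.00")),
    Event(4, "WillO", Withdraw, BigDecimal("20.00"))
  )

  // Apply each event to the running table; the result is the snapshot after the last event.
  def snapshot(log: Seq[Event]): Map[String, BigDecimal] =
    log.sortBy(_.seq).foldLeft(Map.empty[String, BigDecimal]) { (table, e) =>
      val delta = e.kind match {
        case Deposit  => e.amount
        case Withdraw => -e.amount
      }
      table.updated(e.account, table.getOrElse(e.account, BigDecimal(0)) + delta)
    }

  def main(args: Array[String]): Unit =
    println(snapshot(changeLog)) // balances: WillO 80.00, BradA 20.00
}
```

Replaying only a prefix of the log, for example `snapshot(changeLog.take(2))`, gives the table as it stood at that point in the stream.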
27.
A Stream is a Change Log of a Table
Duality of Streams and Tables
Replication of changes:
• Master: append writes
• Slave: apply writes in order
[Diagram: change log entries 1 to 3 replicated from the master to the slaves]
Source: https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
28.
Rewind: Reprocessing Events
[Diagram: producers write messages 1 to 6 to a MapR cluster; a consumer reprocesses from the oldest message to create a new view, index, or cache]
29.
Rewind: Reprocessing Events
[Diagram: the consumer reprocesses messages 1 to 6 up to the newest to build the new view; readers then read from the new view]
30.
Event Sourcing, Command Query Responsibility Segregation: Turning the Database Upside Down
[Diagram: events (updates) feed multiple stores: Key-Val, Document, Graph, Wide Column, Time Series, Relational, ???]
31.
Use Case: Streaming System of Record for Healthcare
Objective:
• Build a flexible, secure healthcare exchange
Challenges:
• Many different data models
• Security and privacy issues
• HIPAA compliance
[Diagram: Records → Analysis, Applications]
32.
ALLOY Health: Exchange State HIE
• Clinical Data Viewer
• Reporting and Analytics
• Clinical Data
• Financial Data
• Provider Organizations
What are the outcomes in the entire state on diabetes? Are there doctors that are doing this better than others?
Georgia Health Connect
33.
Use Case: Streaming System of Record for Healthcare
34.
Spark Dataset
35.
Spark Distributed Datasets
• Read-only collection of typed objects: Dataset[T]
• Partitioned across a cluster
• Operated on in parallel
• Can be cached
Partition 1: 8213034705, 95, 2.927373, jake7870, 0……
Partition 2: 8213034705, 115, 2.943484, Davidbresler2, 1….
Partition 3: 8213034705, 100, 2.951285, gladimacowgirl, 58…
Partition 4: 8213034705, 117, 2.998947, daysrus, 95….
[Diagram: a Dataset partitioned (P1 to P4) across worker executors]
36.
Spark Distributed Datasets read from a file
val df: Dataset[Payment] = spark.read.json("/p/file.json").as[Payment]
[Diagram: the driver sends tasks to workers; each worker task reads a file block (Block 1 to 3), then processes and caches the data (Cache 1 to 3)]
37.
DataFrame is like a table
• DataFrame = Dataset[Row]
• Organized into rows and columns
• Can use Spark SQL
38.
A Dataset is a collection of typed objects
• Dataset[Typed Object]
• Organized into objects and columns
• Can use Spark SQL and functions
39.
Spark Streaming
40.
Process the Data with Spark Streaming
• Scalable, high-throughput stream processing of live data
[Diagram: Collect Data (Stream Topic) → Process Data]
41.
What is a Stream?
• A stream is a continuous sequence of events or records
• Records are key-value pairs
[Diagram: Producers → Stream of Data (key/value records) → Consumers]
42.
Treat Stream as Unbounded Tables
New data in the data stream = new rows appended to an unbounded table
[Figure: data stream as an unbounded table]
43.
Spark Distributed Datasets read from Stream partitions
Data is cached for aggregations and windowed functions
[Diagram: Driver, offsets, and three tasks, each reading one stream partition, then processing and caching the data]
44.
Stream Processing on Spark SQL Engine
• Streaming data = unbounded table
• Static data = bounded table
• Same Dataset operations & SQL
45.
Conceptual model
[Figure: incremental query; 3. Append]
46.
Continuous incremental execution
Spark SQL converts queries to incremental execution plans for input of data
[Figure: a sequence of incremental executions, one per batch of new input]
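The idea of incremental execution can be sketched without Spark: instead of recomputing an aggregate over the whole unbounded table each time new rows arrive, keep running state and fold in only the new batch. The plain-Scala example below is a conceptual illustration only. `IncrementalSum`, `update`, and `recompute` are hypothetical names, and this is not how Spark SQL plans queries internally.

```scala
// Hypothetical sketch of an incremental per-key sum over micro-batches.
object IncrementalSum {
  // Running state: total amount per key for all rows seen so far.
  type State = Map[String, Double]

  // Fold one new batch of (key, amount) rows into the existing state.
  def update(state: State, batch: Seq[(String, Double)]): State =
    batch.foldLeft(state) { case (s, (key, amount)) =>
      s.updated(key, s.getOrElse(key, 0.0) + amount)
    }

  // Reference result: aggregate over every row seen so far, starting from scratch.
  def recompute(allRows: Seq[(String, Double)]): State =
    update(Map.empty, allRows)

  def main(args: Array[String]): Unit = {
    val batch1 = Seq("payerA" -> 1.0, "payerB" -> 2.0)
    val batch2 = Seq("payerA" -> 3.0)
    val afterBatch1 = update(Map.empty, batch1)
    val afterBatch2 = update(afterBatch1, batch2)
    println(afterBatch2)
  }
}
```

After each batch the incremental state equals what a full recomputation over all rows so far would produce, which is the property that lets the same query run over bounded and unbounded input.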
47.
Use Case: Payment Data
Payment input data (stream input):
"NEW","Covered Recipient Physician",,,,"132655","GREGG","D","ALZATE",,"8745 AERO DRIVE","STE 200","SAN DIEGO","CA","92123","United States",,,"Medical Doctor","Allopathic & Osteopathic Physicians|Radiology|Diagnostic Radiology","CA",,,,,"DFINE, Inc","100000000326","DFINE, Inc","CA","United States", 90.87,"02/12/2016","1","In-kind items and services","Food and Beverage",,,,"No","No Third Party Payment",,,,,"No","346039438","No","Yes","Covered","Device","Radiology","StabiliT", ,"Covered","Device","Radiology","STAR Tumor Ablation System",,,,,,,,,,,,,,,,,"2016","06/30/2017"
Transformed by Spark Streaming into JSON:
{
  "_id": "317150_08/26/2016_346122858",
  "physician_id": "317150",
  "date_payment": "08/26/2016",
  "record_id": "346122858",
  "payer": "Mission Pharmacal Company",
  "amount": 9.23,
  "Physician_Specialty": "Obstetrics & Gynecology",
  "Nature_of_payment": "Food and Beverage"
}
48.
Use Case: Open Payment Dataset
• Payments that drug and device companies make to
• Physicians and teaching hospitals for
• Travel, research, gifts, speaking fees, and meals
49.
Scenario: Payment Data
Provider ID | Date | Payer | Payer State | Provider Specialty | Provider State | Amount | Payment Nature
1261770 | 01/11/2016 | Southern Anesthesia & Surgical, Inc | CO | Oral and Maxillofacial Surgery | CA | 117.5 | Food and Beverage
50.
Stream the Data into a DataFrame: Define the Schema
import org.apache.spark.sql.types._

// One parsed payment record; the field names match the schema below
case class Payment(_id: String, physician_id: String, date_payment: String,
  payer: String, payer_state: String, amount: Double,
  physician_type: String, physician_specialty: String,
  physician_state: String, nature_of_payment: String)

val schema = StructType(Array(
  StructField("_id", StringType, true),
  StructField("physician_id", StringType, true),
  StructField("date_payment", StringType, true),
  StructField("payer", StringType, true),
  StructField("payer_state", StringType, true),
  StructField("amount", DoubleType, true),
  StructField("physician_specialty", StringType, true),
  StructField("physician_type", StringType, true),
  StructField("physician_state", StringType, true),
  StructField("nature_of_payment", StringType, true)
))
51.
Function to Parse CSV into Payment Class
def parse(str: String): Payment = {
  // Split on commas that are outside double-quoted fields
  val td = str.split(""",(?=([^"]*"[^"]*")*[^"]*$)""")
  val physician_id = td(5)
  val payer = td(27)
  . . .
  val physician_state = td(20)
  var focus = td(19)
  // Composite row key: state_specialty_date_recordId
  val id = physician_state + "_" + focus + "_" + date_payment + "_" + record_id
  Payment(id, physician_id, date_payment, payer, payer_state, amount,
    physician_type, focus, physician_state, nature_of_payment)
}
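The split expression in the parse function is easy to misread, so here is a small standalone sketch of the same idea: split only on commas that are followed by an even number of double quotes, that is, commas outside a quoted field. The object name, the helper `splitCsv`, the shortened sample line, and the quote stripping are illustrative additions. The `-1` limit, which keeps trailing empty fields, is also a choice made here and is not on the slide.

```scala
object CsvSplitSketch {
  // A comma counts as a separator only if the rest of the line
  // contains an even number of double quotes.
  private val commaOutsideQuotes = """,(?=([^"]*"[^"]*")*[^"]*$)"""

  def splitCsv(line: String): Array[String] =
    line
      .split(commaOutsideQuotes, -1) // -1 keeps trailing empty fields
      .map(_.trim.stripPrefix("\"").stripSuffix("\""))

  def main(args: Array[String]): Unit = {
    // Shortened sample in the style of the payment input row.
    val line = "\"NEW\",\"DFINE, Inc\",,90.87,\"02/12/2016\""
    val fields = splitCsv(line)

    // The comma inside "DFINE, Inc" does not split the field.
    fields.zipWithIndex.foreach { case (field, i) => println(s"$i: $field") }
  }
}
```

A regular expression like this is adequate for well-formed rows; input with escaped quotes or embedded newlines would need a real CSV parser.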
52.
Parsed and Transformed Payment Data
Example Dataset row:
{
  "_id":"TX_Gynecology_08/26/2016_346122858",
  "physician_id":"317150",
  "date_payment":"08/26/2016",
  "payer":"Mission Pharmacal Company",
  "payer_state":"CO",
  "amount":9.23,
  "physician_specialty":"Gynecology",
  "physician_state":"TX",
  "nature_of_payment":"Food and Beverage"
}
53.
Streaming pipeline: Data source
Specify the data source; load() returns a DataFrame
val df1 = spark.readStream.format("kafka")
  .option("kafka.bootstrap.servers",...)
  .option("subscribe", "topic")
  .load()
54.
Transformation
Cast bytes from Kafka records to a string, parse the CSV, and return a Dataset[Payment]
import spark.implicits._

spark.udf.register("deserialize", (message: String) => parse(message))
val df2 = df1
  .selectExpr("""deserialize(CAST(value as STRING)) AS message""")
  .select($"message".as[Payment])
55.
Stream Processing:
• Filtering
• Transformations
• Aggregations
• Enrichments with ML
• Enrichments with joins
[Diagram labels: streams (raw, enriched, filtered), stream processing, storage (MapR-DB, MapR-XD)]
56.
DataFrame Integrated Queries
Language Integrated Query | Description
agg(expr, exprs) | Aggregates on entire DataFrame
distinct | Returns new DataFrame with unique rows
except(other) | Returns new DataFrame with rows from this DataFrame not in other DataFrame
filter(expr); where(condition) | Filter based on the SQL expression or condition
groupBy(cols: Columns) | Groups DataFrame using specified columns
join(DataFrame, joinExpr) | Joins with another DataFrame using given join expression
sort(sortcol) | Returns new DataFrame sorted by specified column
select(col) | Selects set of columns
57.
Continuous aggregations
Continuously compute the average payment amount
import org.apache.spark.sql.functions.avg

val df3 = df2.agg(avg("amount"))
58.
Continuous aggregations and filter
val df3 = df2.groupBy("payer")
  .avg("amount")
59.
Continuous aggregations and filter
val df3 = df2
  .filter($"amount" > 20000)
60.
Streaming pipeline: Data Sink (Kafka topic)
Write results to a Kafka topic, then start running the query
// The Kafka sink also needs a checkpoint location
// (the "checkpointLocation" option or spark.sql.streaming.checkpointLocation)
val query = df3.writeStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
  .option("topic", "/apps/uberstream:uberp")
query.start().awaitTermination()
61.
Streaming Application
[Diagram labels: stream topics, streaming Dataset, transformed streaming Dataset, stream topics]
62.
Spark & MapR-DB
63.
What Do We Need to Do?
[Diagram stages: Data Sources, Collect Data, Process Data, Store Data, Analyze; labels: Stream Topic, Spark Streaming, JSON; remaining components shown as question marks]
64.
Stream Processing Pipeline
[Diagram stages: Data, Collect, Process, Store, Analyze; labels: Stream Topic, Kafka API, Spark Streaming, Open JSON API, JSON, SQL]
65.
Where/How to Store Data?
66.
Relational Database vs. MapR-DB
Storage model:
• RDBMS: normalized schema → joins for queries can cause a bottleneck
• MapR-DB: denormalized schema → data that is read together is stored together
[Diagram: tables with columns Key, colB, colC]
67.
MapR-DB JSON Document Store
Data that is read together is stored together
68.
Designed for Partitioning and Scaling
• Fast reads and writes by key
• Data is automatically partitioned by key range
[Diagram: tables with columns Key, colB, colC, each covering one key range]
69.
MapR-DB JSON Document Store
Data is automatically partitioned by key range
70.
Payment Data
Automatically sorted and partitioned by key range (_id)
{
  "_id":"TX_Gynecology_08/26/2016_346122858",
  "physician_id":"317150",
  "date_payment":"08/26/2016",
  "payer":"Mission Pharmacal Company",
  "payer_state":"CO",
  "amount":9.23,
  "physician_specialty":"Gynecology",
  "physician_state":"TX",
  "nature_of_payment":"Food and Beverage"
}
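Because rows are sorted by _id, a composite key that starts with state and specialty should keep related payments next to each other. The plain Scala sketch below illustrates that with ordinary string sorting. It does not call MapR-DB, and it assumes the table's key order agrees with string order for ASCII keys like these. All names are illustrative, and every key except the first is made up.

```scala
object RowKeySketch {
  // Composite key in the same shape as the _id on the slide:
  // state_specialty_date_recordId.
  def rowKey(state: String, specialty: String, date: String, recordId: String): String =
    s"${state}_${specialty}_${date}_${recordId}"

  // Stand-in for a key-range scan over sorted keys:
  // skip to the first key with the prefix, then read while it still matches.
  def prefixScan(sortedKeys: Seq[String], prefix: String): Seq[String] =
    sortedKeys.dropWhile(!_.startsWith(prefix)).takeWhile(_.startsWith(prefix))

  def main(args: Array[String]): Unit = {
    val keys = Seq(
      rowKey("TX", "Gynecology", "08/26/2016", "346122858"), // from the slide
      rowKey("CA", "Radiology", "02/12/2016", "100000000"),  // made up
      rowKey("TX", "Cardiology", "03/01/2016", "100000001"), // made up
      rowKey("NY", "Gynecology", "05/14/2016", "100000002")  // made up
    ).sorted // string ordering, standing in for the table's key order

    keys.foreach(println)

    // All Texas rows are contiguous, so one range read covers them.
    prefixScan(keys, "TX_").foreach(k => println(s"TX scan: $k"))
  }
}
```

The trade-off in a key like this is that queries by state or by state and specialty become range reads, while queries by date alone still have to touch every key range.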
71.
Spark Streaming writing to MapR-DB JSON
72.
Spark MapR-DB Connector
• Connection object in every Spark executor
• Distributed parallel writes and reads
73.
Use Case: Payment Data
Payment input data (stream input):
"NEW","Covered Recipient Physician",,,,"132655","GREGG","D","ALZATE",,"8745 AERO DRIVE","STE 200","SAN DIEGO","CA","92123","United States",,,"Medical Doctor","Allopathic & Osteopathic Physicians|Radiology|Diagnostic Radiology","CA",,,,,"DFINE, Inc","100000000326","DFINE, Inc","CA","United States",90.87,"02/12/2016","1","In-kind items and services","Food and Beverage",,,,"No","No Third Party Payment",,,,,"No","346039438","No","Yes","Covered","Device","Radiology","StabiliT",,"Covered","Device","Radiology","STAR Tumor Ablation System",,,,,,,,,,,,,,,,,"2016","06/30/2017"
Transformed by Spark Streaming into JSON:
{
  "_id":"317150_08/26/2016_346122858",
  "physician_id":"317150",
  "date_payment":"08/26/2016",
  "record_id":"346122858",
  "payer":"Mission Pharmacal Company",
  "amount":9.23,
  "Physician_Specialty":"Obstetrics & Gynecology",
  "Nature_of_payment":"Food and Beverage"
}
74.
Streaming pipeline: Data Sink
Write the result to MapR-DB, then start running the query
val query = df2.writeStream
  .format(MapRDBSourceConfig.Format)
  .option(MapRDBSourceConfig.TablePathOption, "/apps/paytable")
  .option(MapRDBSourceConfig.CreateTableOption, false)
  .option(MapRDBSourceConfig.IdFieldPathOption, "value")
  .outputMode("append")
query.start().awaitTermination()
75.
Streaming Application
Data is rapidly available for complex, ad-hoc analytics
[Diagram labels: stream topics, streaming Dataset, transformed streaming Dataset, tablets]
76.
Explore the Data With Spark SQL
77.
What Do We Need to Do?
[Diagram stages: Data Sources, Collect Data, Process Data, Store Data, Analyze; labels: Stream Topic, Spark Streaming, JSON, SQL; remaining components shown as question marks]
78.
Spark SQL Querying MapR-DB JSON
79.
Load the data into a DataFrame
val pdf: Dataset[Payment] = spark.loadFromMapRDB[Payment]("/apps/paytable", schema)
  .as[Payment]
pdf.select("_id", "payer", "amount").show
80.
Spark Distributed Datasets read from MapR-DB Partitions
val pdf: Dataset[Payment] = spark.loadFromMapRDB[Payment]("/apps/paytable", schema)
  .as[Payment]
[Diagram labels: driver, tasks, workers, Cache 1, Cache 2, Cache 3 (process and cache data)]
81.
Language Integrated Queries
Language Integrated Query | Description
agg(expr, exprs) | Aggregates on entire DataFrame
distinct | Returns new DataFrame with unique rows
except(other) | Returns new DataFrame with rows from this DataFrame not in other DataFrame
filter(expr); where(condition) | Filter based on the SQL expression or condition
groupBy(cols: Columns) | Groups DataFrame using specified columns
join(DataFrame, joinExpr) | Joins with another DataFrame using given join expression
sort(sortcol) | Returns new DataFrame sorted by specified column
select(col) | Selects set of columns
82.
Top 5 Nature of Payment by count
import org.apache.spark.sql.functions.desc

pdf.groupBy("nature_of_payment")
  .count()
  .orderBy(desc("count"))
  .show(5)
83.
Top 5 Nature of Payment by amount of payment
%sql
select Nature_of_payment, sum(amount) as total
from payments
group by Nature_of_payment
order by total desc
limit 5
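The %sql queries refer to a table named payments, and the slides do not show how that name is registered. One common way, sketched below, is to register the Dataset as a temporary view. The sketch assumes spark-sql on the classpath and uses a tiny in-memory Dataset with made-up rows in place of the MapR-DB load; `PaymentRow` and `topNaturesByAmount` are illustrative names.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

object PaymentsViewSketch {
  // Cut-down record with only the two columns the query needs.
  case class PaymentRow(nature_of_payment: String, amount: Double)

  // Register the Dataset under the name the SQL uses, then run the query.
  def topNaturesByAmount(spark: SparkSession, payments: Dataset[PaymentRow]): DataFrame = {
    payments.createOrReplaceTempView("payments")
    spark.sql(
      """select nature_of_payment, sum(amount) as total
        |from payments
        |group by nature_of_payment
        |order by total desc
        |limit 5""".stripMargin)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("payments-view-sketch")
      .master("local[2]")
      .getOrCreate()
    import spark.implicits._

    // Made-up rows standing in for the data loaded from the table.
    val pdf = Seq(
      PaymentRow("Food and Beverage", 9.23),
      PaymentRow("Food and Beverage", 90.87),
      PaymentRow("Travel and Lodging", 250.0),
      PaymentRow("Gift", 15.0)
    ).toDS()

    topNaturesByAmount(spark, pdf).show(false)
    spark.stop()
  }
}
```

A temporary view lives only as long as the Spark session, which is usually enough for notebook-style exploration like the queries on these slides.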
84.
What are the Natures of Payment for payments > $1000, with count
pdf.filter($"amount" > 1000)
  .groupBy("Nature_of_payment")
  .count()
  .orderBy(desc("count"))
  .show()
85.
Top 5 Physician Specialties by total Amount
%sql
select physician_specialty, sum(amount) as total
from payments
where physician_specialty IS NOT NULL
group by physician_specialty
order by total desc
limit 5
86.
Average Payment by Specialty
87.
Top Payers by Total Amount with count
%sql
select payer, payer_state, count(*) as cnt, sum(amount) as total
from payments
group by payer, payer_state
order by total desc
limit 10
88.
Building a Complete Data Architecture
All of these components can run on the same cluster with the MapR Converged platform.
[Diagram: MapR Converged Data Platform with MapR File System (MapR-XD), MapR Database (MapR-DB), and MapR Event Streams; labels: Sources/Apps, Stream Processing, Bulk Processing]
89.
Data Pipelines and Machine Learning Logistics
[Diagram labels: real-time flight data, input stream, consumers with ML models 1, 2, and 3, decoy, consumers, scores stream, results, archive stream, input data + predictions, input data + actual delay, input data + predictions + actual delay, SQL, real-time dashboard + historical analysis]
90.
91.
To Learn More:
• MapR Free ODT http://learn.mapr.com/
92.
…helping you put data technology to work
Connect with fellow Apache Hadoop and Spark professionals
● Find answers
● Ask technical questions
● Join on-demand training course discussions
● Follow release announcements
● Share and vote on product ideas
● Find Meetup and event listings
community.mapr.com
93.
MapR Blog
• https://www.mapr.com/blog/
94.
To Learn More: ETL Payment data pipeline
• https://mapr.com/blog/etl-pipeline-healthcare-dataset-with-spark-json-mapr-db/
• https://mapr.com/blog/streaming-data-pipeline-transform-store-explore-healthcare-dataset-mapr-db/
95.
To Learn More:
• https://mapr.com/blog/how-stream-first-architecture-patterns-are-revolutionizing-healthcare-platforms/
96.
To Learn More:
• https://mapr.com/blog/ml-iot-connected-medical-devices/
97.
Applying Machine Learning to Live Patient Data
• https://www.slideshare.net/caroljmcdonald/applying-machine-learning-to-live-patient-data
98.
MapR Container for Developers
• https://maprdocs.mapr.com/home/MapRContainerDevelopers/MapRContainerDevelopersOverview.html
99.
MapR Data Science Refinery
• https://mapr.com/products/data-science-refinery/
100.
MapR Data Platform
101.
Q&A
Engage with us