WITSML data processing example with Kafka and Spark Streaming
Houston Hadoop Meetup, 4/26/2016
About me - Dmitry Kniazev
Currently Solution Architect at EPAM Systems
- About 4 years in Oil & Gas here in Houston
- Started working with Hadoop about 2 years ago
Before that BI/DW Specialist at EPAM Systems for 6 years
- Reports, ETL with Oracle, Microsoft, Cognos and other tools
- Enjoyed not SO HOT life in Eastern Europe
Before that Performance Analyst at EPAM Systems for 4 years
- Web Applications and Databases optimization
What is the problem?
Source: http://www.croftsystems.net/blog/conventional-vs.-unconventional
What is WITSML?
DATA EXCHANGE STANDARD FOR THE UPSTREAM OIL AND GAS INDUSTRY
[Diagram. Labels: Rig, Aggregation Solution, and WITSML Data Store (at Service Company #1 and Service Company #2); Corp Store and WITSML-based Applications (at Operator #1, Operator Company Data Center); links labeled WITSML.]
Architecture
[Diagram. Labels, in apparent flow order: WITSML Data Store (Service Company DC), WITSML via SOAP, Internet, Producer (Scala), Kafka, Consumer (Scala) to HBase, Consumer (Scala) to Email / Browser.]
What is Kafka?
What is Spark Streaming?
Discretized Stream
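A discretized stream (DStream) is Spark Streaming's view of a live stream as a sequence of small batches, one per batch interval. The following is a plain-Scala sketch of that idea only, with no Spark involved: the `Event` type, the `discretize` helper, and the use of an arrival timestamp in milliseconds are assumptions made for illustration.

```scala
object DStreamSketch {
  // Hypothetical event type: arrival time in milliseconds plus a payload.
  final case class Event(arrivalMs: Long, value: String)

  // Assign each event to the batch interval its arrival time falls into,
  // and return the batches in time order as (batchStartMs, payloads).
  def discretize(events: Seq[Event], batchMs: Long): Seq[(Long, Seq[String])] =
    events
      .groupBy(_.arrivalMs / batchMs)
      .toSeq
      .sortBy(_._1)
      .map { case (batch, es) => (batch * batchMs, es.map(_.value)) }
}
```

With a 1000 ms interval, events arriving at 100 ms and 900 ms share the first batch, while an event at 1200 ms starts the second one.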
Producer - prep
// some important imports
import com.mycompany.witsml.client.WitsmlClient // based on jwitsml 1.0
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.joda.time.DateTime // assumption: DateTime.now() comes from Joda-Time
import scala.xml.{Elem, Node, XML}
// variables initialization
var producer: KafkaProducer[String, String] = null
var startTimeIndex = DateTime.now()
var topic = ""
var pollInterval = 5
Producer - Kafka Properties
bootstrap.servers = srv1:9092,srv2:9092
key.serializer = org.apache.kafka.common.serialization.StringSerializer
value.serializer = org.apache.kafka.common.serialization.StringSerializer
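One possible way to turn these settings into the `props` object that the producer constructor expects is `java.util.Properties`. This is a minimal sketch, assuming the settings are kept as text in properties-file syntax; the `ProducerProps` object and its `load` helper are hypothetical names, and the host names are the slide's placeholders.

```scala
import java.io.StringReader
import java.util.Properties

object ProducerProps {
  // The same three settings as on the slide.
  val text: String =
    """bootstrap.servers = srv1:9092,srv2:9092
      |key.serializer = org.apache.kafka.common.serialization.StringSerializer
      |value.serializer = org.apache.kafka.common.serialization.StringSerializer
      |""".stripMargin

  // Parse properties-file syntax from a string; a FileReader works the same way.
  def load(source: String): Properties = {
    val props = new Properties()
    props.load(new StringReader(source))
    props
  }
}
```

The result of `ProducerProps.load(ProducerProps.text)` can be passed as-is to `new KafkaProducer[String, String](props)`.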
Producer - main function
producer = new KafkaProducer[String, String](props)
// each wellbore is a separate Kafka topic, which is going to be partitioned by log
topic = args(0)
while (true) {
  val logs = WitsmlClient.getWitsmlResponse(logsQuery)
  // parse logs and send messages to Kafka
  // (`\` selects direct children; use `\\` if an element is nested deeper)
  (logs \ "log").foreach { node: Node =>
    // send all data from one log to the same partition
    val key = (node \ "@uidLog").text
    (node \ "data").foreach { data =>
      val message = new ProducerRecord[String, String](topic, key, data.text)
      producer.send(message)
    }
  }
  // assumption: pollInterval is in seconds; wait before polling the store again
  Thread.sleep(pollInterval * 1000L)
}
Producer - results
"Well123" => Topic
"5207KFSJ18" => Key (Partition)
Content of <data> element => Message
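Why does the key decide the partition? When a record has a key and no explicit partition, Kafka's default partitioner hashes the key and takes the result modulo the number of partitions, so equal keys always land in the same partition. The sketch here is a simplified stand-in: Kafka hashes the serialized key bytes with murmur2, while this illustration uses `String.hashCode`, so the partition numbers will differ from a real cluster. `PartitionSketch` and `partitionFor` are hypothetical names.

```scala
object PartitionSketch {
  // Stand-in for Kafka's key hashing (the real partitioner uses murmur2 on the key bytes).
  // The mask clears the sign bit so the result of the modulo is never negative.
  def partitionFor(key: String, numPartitions: Int): Int =
    (key.hashCode & 0x7fffffff) % numPartitions
}
```

Every message keyed with "5207KFSJ18" gets the same partition number, which is what keeps the data of one log in order.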
Consumer - prep
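import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.StructType
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.streaming.kafka.KafkaUtils
var schema: StructType = null
val sc = new SparkConf().setAppName("WitsmlKafkaDemo")
val ssc = new StreamingContext(sc, Seconds(1))
// the direct stream yields (key, value) pairs; both are decoded as strings
val dStream: InputDStream[(String, String)] =
  KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics)
val sqlContext = new SQLContext(ssc.sparkContext)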
Consumer - Rules Definition
# fields for Spark SQL query
`Co. Man G/L`,`Gain Loss - Spare`,`ACC_DRILL_STRKS`
# where clause for SQL query
`Co. Man G/L`>100 OR `Gain Loss - Spare`<(-42.1)
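The two rule lines are meant to be spliced into one Spark SQL statement against the `timeLog` temp table. A small sketch of that assembly follows, assuming the rule is stored as text where lines starting with `#` are comments, the first remaining line is the field list, and the second is the condition; the `RuleQuery` object and this file layout are assumptions, not something the slides specify.

```scala
object RuleQuery {
  // Rule text as on the slide; backticks quote column names containing spaces and dots.
  val ruleText: String =
    """# fields for Spark SQL query
      |`Co. Man G/L`,`Gain Loss - Spare`,`ACC_DRILL_STRKS`
      |# where clause for SQL query
      |`Co. Man G/L`>100 OR `Gain Loss - Spare`<(-42.1)
      |""".stripMargin

  // Drop comment and blank lines; return (fields, condition).
  def parse(text: String): (String, String) = {
    val lines = text.split("\n").map(_.trim).filter(l => l.nonEmpty && !l.startsWith("#"))
    require(lines.length == 2, "expected one fields line and one condition line")
    (lines(0), lines(1))
  }

  // Build the query string for a given table name.
  def build(fields: String, condition: String, table: String): String =
    s"SELECT $fields FROM $table WHERE $condition"
}
```

Keeping the rule outside the code means a threshold can be changed without recompiling the consumer.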
Consumer - main function
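dStream.foreachRDD { batchRDD =>
  // each Kafka message value is one comma-separated data row; wrap it in a Row,
  // because createDataFrame(rdd, schema) expects an RDD[Row]
  // (assumption: schema declares every column as a string; convert values here otherwise)
  val messages = batchRDD.map(_._2).map(line => Row.fromSeq(line.split(",").toSeq))
  // create DataFrame with a custom schema
  val df = sqlContext.createDataFrame(messages, schema)
  // register temp table and test against rule
  df.registerTempTable("timeLog")
  val collected = sqlContext.sql("SELECT " + fields + " FROM timeLog WHERE " + condition).collect()
  if (collected.length > 0) {
    // send email alert
    WitsmlKafkaUtil.sendEmail(collected)
  }
}
ssc.start()
ssc.awaitTermination()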
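To see what the rule does to one micro-batch without a Spark cluster, the same check can be written in plain Scala. This is a stand-in for the Spark SQL query, not the consumer itself: the column order, the numeric column types, and the `BatchRuleSketch` names are assumptions, since the slides do not show the schema.

```scala
object BatchRuleSketch {
  // Assumed column order of one comma-separated message.
  val columns = Vector("Co. Man G/L", "Gain Loss - Spare", "ACC_DRILL_STRKS")

  // Turn one message into a column -> value map.
  def parse(message: String): Map[String, Double] =
    columns.zip(message.split(",").toSeq.map(_.trim.toDouble)).toMap

  // The slide's WHERE clause, written as a Scala predicate.
  def violates(row: Map[String, Double]): Boolean =
    row("Co. Man G/L") > 100 || row("Gain Loss - Spare") < -42.1

  // Rows of one micro-batch that would trigger an alert.
  def alerts(batch: Seq[String]): Seq[Map[String, Double]] =
    batch.map(parse).filter(violates)
}
```

A non-empty result corresponds to the `collected.length > 0` branch that sends the email.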
Visualization with Highcharts
Why Highcharts?
- Websockets support -> real-time data visualization
- Multiple Y-axes that automatically scale -> many mnemonics on the same chart
- Inverted X-axis -> great for Depth Logs
- 3D charts that can be rotated -> Trajectories
- Area range with custom colors -> Formations in the background
- 100% client-side JavaScript -> easy to deploy
Lessons Learned
- Throw away and re-design:
  - Logs should be Topics, Wells (Wellbores) should be Partitions for Scalability
  - Producers and Consumers should be Managed Services (Flume Agents?)
- Backend:
  - Land data to HBase (and probably OpenTSDB)
- Frontend:
  - WebApp to visualize both NRT and historical data?
  - Mobile App for Alerts?
- Improve Producers:
  - Speak many WITSML dialects?
- Get ready for Real-time:
  - Support for ETP standard
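The first lesson swaps the roles of wellbore and log in the Kafka layout. A tiny sketch of the two mappings side by side is given here, assuming the topic is named directly after the wellbore or the log; whether "log" means a log uid or a log type is not stated in the slides, and `RoutingSketch` and `Routing` are hypothetical names.

```scala
object RoutingSketch {
  // Where a message goes: the Kafka topic and the key that selects the partition.
  final case class Routing(topic: String, key: String)

  // Layout used in the demo: one topic per wellbore, log uid as the key.
  def current(wellbore: String, logUid: String): Routing =
    Routing(topic = wellbore, key = logUid)

  // Layout proposed in the lessons learned: one topic per log, wellbore as the key.
  def proposed(wellbore: String, logUid: String): Routing =
    Routing(topic = logUid, key = wellbore)
}
```

With the proposed layout the number of topics no longer grows with the number of wells; only the key space does.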
Thank you!
dmitry_kniazev@epam.com
Links:
http://www.energistics.org/
http://www.highcharts.com/
https://spark.apache.org/
http://kafka.apache.org/

 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 

WITSML data processing with Kafka and Spark Streaming

  • 1. WITSML data processing example with Kafka and Spark Streaming Houston Hadoop Meetup, 4/26/2016
  • 2. About me - Dmitry Kniazev
      Currently Solution Architect at EPAM Systems
      - About 4 years in Oil & Gas here in Houston
      - Started working with Hadoop about 2 years ago
      Before that BI/DW Specialist at EPAM Systems for 6 years
      - Reports, ETL with Oracle, Microsoft, Cognos and other tools
      - Enjoyed not SO HOT life in Eastern Europe
      Before that Performance Analyst at EPAM Systems for 4 years
      - Web Applications and Databases optimization
  • 3. What is the problem? Source: http://www.croftsystems.net/blog/conventional-vs.-unconventional
  • 4. What is WITSML? Data exchange standard for the upstream oil and gas industry. (Diagram labels: Rig Aggregation Solution, WITSML Data Store, Service Company #1, Service Company #2, Corp Store, Operator #1, WITSML-based Applications.)
  • 5. Architecture (diagram labels: Service Company DC, WITSML Data Store, WITSML via SOAP, Internet, Operator Company Data Center, Producer (Scala), Kafka, Consumer (Scala), HBase, Email / Browser.)
  • 7. What is Spark Streaming?
  • 9. Producer - prep
      // some important imports
      import com.mycompany.witsml.client.WitsmlClient // based on jwitsml 1.0
      import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
      import scala.xml.{Elem, Node, XML}
      // variables initialization
      var producer: KafkaProducer[String, String] = null
      var startTimeIndex = DateTime.now()
      var topic = ""
      var pollInterval = 5
  • 10. Producer - Kafka Properties
      bootstrap.servers = srv1:9092,srv2:9092
      key.serializer = org.apache.kafka.common.serialization.StringSerializer
      value.serializer = org.apache.kafka.common.serialization.StringSerializer
  • 11. Producer - main function
      producer = new KafkaProducer[String, String](props)
      // each wellBore is a separate Kafka topic which is going to be partitioned by log
      topic = args(0)
      while (true) {
        val logs = WitsmlClient.getWitsmlResponse(logsQuery)
        // parse logs and send messages to Kafka
        (logs \ "log").foreach { node: Node =>
          // send all data from one log to the same partition
          val key = (node \ "@uidLog").text
          (node \ "data").foreach { data =>
            val message = new ProducerRecord(topic, null, key, data.text)
            producer.send(message)
          }
        }
  • 12. Producer - results
      "Well123" => Topic
      "5207KFSJ18" => Key (Partition)
      Content of <data> element => Message
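  • 13. Consumer - prep
      import kafka.serializer.StringDecoder
      import org.apache.spark.SparkConf
      import org.apache.spark.sql.{Row, SQLContext}
      import org.apache.spark.sql.types.StructType
      import org.apache.spark.streaming.{Seconds, StreamingContext}
      import org.apache.spark.streaming.dstream.InputDStream
      import org.apache.spark.streaming.kafka.KafkaUtils
      var schema: StructType = null
      val sc = new SparkConf().setAppName("WitsmlKafkaDemo")
      // one-second micro-batches
      val ssc = new StreamingContext(sc, Seconds(1))
      // direct stream of (key, value) pairs read from the Kafka topics
      val dStream: InputDStream[(String, String)] =
        KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics)
      val sqlContext = new SQLContext(ssc.sparkContext)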
  • 14. Consumer - Rules Definition
      # fields for Spark SQL query
      `Co. Man G/L`,`Gain Loss - Spare`,`ACC_DRILL_STRKS`
      # where clause for SQL query
      `Co. Man G/L`>100 OR `Gain Loss - Spare`<(-42.1)
  • 15. Consumer - main function
      dStream.foreachRDD( batchRDD => {
        // keep the message value, split it on commas and wrap it in a Row
        val messages = batchRDD.map(_._2).map(line => Row.fromSeq(line.split(",").toSeq))
        // create DataFrame with a custom schema
        val df = sqlContext.createDataFrame(messages, schema)
        // register temp table and test against rule
        df.registerTempTable("timeLog")
        val collected = sqlContext.sql("SELECT " + fields + " FROM timeLog WHERE " + condition).collect
        if (collected.length > 0) {
          // send email alert
          WitsmlKafkaUtil.sendEmail(collected)
        }
      })
      ssc.start()
      ssc.awaitTermination()
  • 17. Why Highcharts?
      - Websockets support -> real-time data visualization
      - Multiple Y-axes that automatically scale -> many mnemonics on the same chart
      - Inverted X-axis -> great for Depth Logs
      - 3D charts that can be rotated -> Trajectories
      - Area range with custom colors -> Formations on the background
      - 100% client-side JavaScript -> easy to deploy
  • 18. Lessons Learned
      - Throw away and re-design:
        - Logs should be Topics, Wells (Wellbores) should be Partitions for Scalability
        - Producers and Consumers should be Managed Services (Flume Agents?)
      - Backend:
        - Land data to HBase (and probably OpenTSDB)
      - Frontend:
        - WebApp to visualize both NRT and historical data?
        - Mobile App for Alerts?
      - Improve Producers:
        - Speak many WITSML dialects?
      - Get ready for Real-time:
        - Support for ETP standard
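
The rule check from slides 14 and 15 can be tried without a Spark cluster. The sketch below is a plain-Scala stand-in for the Spark SQL query, not the code from the talk: it assumes each Kafka message is one comma-separated row whose columns are exactly the three fields named on slide 14, in that order. The names `RuleDemo`, `parse`, `violates`, `alerts` and the sample values are hypothetical.

```scala
// Plain-Scala stand-in for the Spark SQL rule on slide 14 (not the original Spark job).
// Assumption: each message is one comma-separated row whose columns follow `columns`.
object RuleDemo {
  // Column order is an assumption made for this sketch.
  val columns: Vector[String] =
    Vector("Co. Man G/L", "Gain Loss - Spare", "ACC_DRILL_STRKS")

  // Turn one CSV message into a column-name -> value map.
  // Returns None when the column count is wrong or a value is not numeric.
  def parse(message: String): Option[Map[String, Double]] = {
    val parts = message.split(",").map(_.trim)
    if (parts.length != columns.length) None
    else scala.util.Try(columns.zip(parts.map(_.toDouble)).toMap).toOption
  }

  // Same condition as the slide: `Co. Man G/L` > 100 OR `Gain Loss - Spare` < -42.1
  def violates(row: Map[String, Double]): Boolean =
    row("Co. Man G/L") > 100 || row("Gain Loss - Spare") < -42.1

  // Rows of one micro-batch that should trigger an alert; malformed rows are skipped.
  def alerts(batch: Seq[String]): Seq[Map[String, Double]] =
    batch.flatMap(parse).filter(violates)

  def main(args: Array[String]): Unit = {
    // Made-up sample batch: one normal row, two violations, one malformed row.
    val batch = Seq("12.5,-3.0,40", "150.0,0.0,41", "5.0,-50.2,42", "not,a,row")
    alerts(batch).foreach(row => println(s"ALERT: $row"))
  }
}
```

In the streaming job the same predicate is expressed as the SQL `WHERE` clause, so a rule can be changed by editing configuration rather than code; the hard-coded `violates` function here only illustrates what that clause evaluates.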
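
The difference between the demo's keying scheme (slides 11 and 12) and the redesign suggested under Lessons Learned can be sketched as two small routing functions. This is an illustration under assumptions, not code from the talk: `RoutingDemo`, `Route`, `demoRoute`, `redesignedRoute`, `partitionFor` and the `logName` parameter are hypothetical, and `partitionFor` is a simplified stand-in that uses `String.hashCode`, whereas Kafka's default partitioner hashes the serialized key with murmur2.

```scala
// Hypothetical routing helpers contrasting the demo design with the
// redesign proposed on slide 18. Uses only the Scala standard library.
object RoutingDemo {
  final case class Route(topic: String, key: String)

  // Demo design (slides 11-12): the wellbore is the topic, the log uid is the key.
  def demoRoute(wellbore: String, logUid: String): Route =
    Route(topic = wellbore, key = logUid)

  // Proposed redesign (slide 18): the log is the topic, the wellbore is the key.
  // `logName` is an assumed identifier for a kind of log.
  def redesignedRoute(wellbore: String, logName: String): Route =
    Route(topic = logName, key = wellbore)

  // Simplified stand-in for key-based partition selection:
  // non-negative hash of the key modulo the partition count.
  // Equal keys always map to the same partition.
  def partitionFor(key: String, numPartitions: Int): Int = {
    require(numPartitions > 0, "numPartitions must be positive")
    (key.hashCode & Int.MaxValue) % numPartitions
  }

  def main(args: Array[String]): Unit = {
    val demo = demoRoute("Well123", "5207KFSJ18")
    val redesigned = redesignedRoute("Well123", "TimeLog")
    println(s"demo:       $demo -> partition ${partitionFor(demo.key, 4)}")
    println(s"redesigned: $redesigned -> partition ${partitionFor(redesigned.key, 4)}")
  }
}
```

With the redesign, the number of topics grows with the kinds of logs rather than with the number of wellbores, and all records of one wellbore still land in one partition, which preserves their relative order.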