SlideShare une entreprise Scribd logo
1  sur  19
Télécharger pour lire hors ligne
WITSML data processing
example with Kafka and
Spark Streaming
Houston Hadoop Meetup, 4/26/2016
About me - Dmitry Kniazev
Currently Solution Architect at EPAM Systems
- About 4 years in Oil & Gas here in Houston
- Started working with Hadoop about 2 years ago
Before that BI/DW Specialist at EPAM Systems for 6 years
- Reports, ETL with Oracle, Microsoft, Cognos and other tools
- Enjoyed not SO HOT life in Eastern Europe
Before that Performance Analyst at EPAM Systems for 4 years
- Web Applications and Databases optimization
What is the problem?
Source: http://www.croftsystems.net/blog/conventional-vs.-unconventional
What is WITSML?
DATA EXCHANGE STANDARD FOR THE UPSTREAM OIL AND GAS INDUSTRY
WITSML
Data
Store
Rig
Aggregation
Solution
Rig
Aggregation
Solution
Corp
Store
WITSML
Data
Store
Service Company
#1
Operator #1
Service Company
#2
WITSML based
ApplicationsWITSML
Operator Company Data Center
Architecture
WITSML
Data
Store
HBase
WITSML
via
SOAP
Internet
Consumer
(Scala)
Producer
(Scala)
Service
Company
DC
Kafka
Consumer
(Scala)
Email /
Browser
What is Kafka?
What is Spark Streaming?
Discretized Stream
Producer - prep
// some important imports
import com.mycompany.witsml.client.WitsmlClient //based on jwitsml 1.0
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import scala.xml.{Elem, Node, XML}
// variables initialization
var producer: KafkaProducer[String, String] = null
var startTimeIndex = DateTime.now()
var topic = ""
var pollInterval = 5
Producer - Kafka Properties
bootstrap.servers = srv1:9092,srv2:9092
key.serializer = org.apache.kafka.common.serialization.StringSerializer
value.serializer = org.apache.kafka.common.serialization.StringSerializer
Producer - main function
producer = new KafkaProducer[String, String](props)
// each wellBore is a separate Kafka topic which is going to be partitioned by log
topic = args(0)
while (true) {
val logs = WitsmlClient.getWitsmlResponse(logsQuery)
// parse logs and send messages to Kafka
(logs  "log").foreach { node: Node =>
// send all data from one log to the same partition
val key = (node  "@uidLog").text
(node  "data").foreach { data =>
val message = new ProducerRecord(topic, null, key, data.text)
producer.send(message)
}
}
Producer - results
”Well123” => Topic
“5207KFSJ18” => Key (Partition)
Content of <data> element => Message
Consumer - prep
import org.apache.spark.SparkConf
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.streaming.kafka.KafkaUtils
var schema: StructType = null
val sc = new SparkConf().setAppName("WitsmlKafkaDemo")
val ssc = new StreamingContext(sc, Seconds(1))
val dStream: InputDStream = KafkaUtils.createDirectStream(ssc, kafkaParams, topics)
val sqlContext = new SQLContext(ssc.sparkContext)
Consumer - Rules Definition
# fields for Spark SQL query
`Co. Man G/L`,`Gain Loss - Spare`,`ACC_DRILL_STRKS`
# where clause for SQL query
`Co. Man G/L`>100 OR `Gain Loss - Spare`<(-42.1)
Consumer - main function
dStream.foreachRDD( batchRDD => {
val messages = batchRDD.map(_._2).map(_.split(","))
//create DataFrame with a custom schema
val df = sqlContext.createDataFrame(messages, schema)
//register temp table and test against rule
df.registerTempTable("timeLog")
val collected = sqlContext.sql("SELECT " + fields + " FROM timeLog WHERE " + condition).collect
if (collected.length > 0) {
//send email alert
WitsmlKafkaUtil.sendEmail(collected)
}
})
ssc.start()
ssc.awaitTermination()
Visualization with Highcharts
Why Highcharts?
- Websockets support -> real-time data visualization
- Multiple Y-axes that automatically scale -> many mnemonics on the same chart
- Inverted X-axis -> great for Depth Logs
- 3D charts that can be rotated -> Trajectories
- Area range with custom colors -> Formations on the background
- 100% client side javascript -> easy to deploy
Lessons Learned
- Throw away and re-design:
- Logs should be Topics, Wells(Wellbores) should be Partitions for Scalability
- Producers and Consumers should be Managed Services (Flume Agents?)
- Backend:
- Land data to HBase (and probably OpenTSDB)
- Frontend:
- WebApp to visualize both NRT and historical data?
- Mobile App for Alerts?
- Improve Producers:
- Speak many WITSML dialects?
- Get ready for Real-time:
- Support for ETP standard
Thank you!
dmitry_kniazev@epam.com
Links:
http://www.energistics.org/
http://www.highcharts.com/
https://spark.apache.org/
http://kafka.apache.org/

Contenu connexe

Tendances

Haystack 2019 - Query relaxation - a rewriting technique between search and r...
Haystack 2019 - Query relaxation - a rewriting technique between search and r...Haystack 2019 - Query relaxation - a rewriting technique between search and r...
Haystack 2019 - Query relaxation - a rewriting technique between search and r...OpenSource Connections
 
Spring Framework
Spring Framework  Spring Framework
Spring Framework tola99
 
Hierarchical Object Oriented Design
Hierarchical Object Oriented DesignHierarchical Object Oriented Design
Hierarchical Object Oriented Designsahibsahib
 
MongoDB Cheat Sheet – Quick Reference
MongoDB Cheat Sheet – Quick ReferenceMongoDB Cheat Sheet – Quick Reference
MongoDB Cheat Sheet – Quick ReferenceCésar Trigo
 
Java I/O and Object Serialization
Java I/O and Object SerializationJava I/O and Object Serialization
Java I/O and Object SerializationNavneet Prakash
 
Robust Calibration For SVI Model Arbitrage Free
Robust Calibration For SVI Model Arbitrage FreeRobust Calibration For SVI Model Arbitrage Free
Robust Calibration For SVI Model Arbitrage FreeTahar FERHATI
 
Data Modeling with Neo4j
Data Modeling with Neo4jData Modeling with Neo4j
Data Modeling with Neo4jNeo4j
 
Time series clustering presentation
Time series clustering presentationTime series clustering presentation
Time series clustering presentationEleni Stamatelou
 
MODULE 4-Text Analytics.pptx
MODULE 4-Text Analytics.pptxMODULE 4-Text Analytics.pptx
MODULE 4-Text Analytics.pptxnikshaikh786
 
Knowledge Graphs and Generative AI
Knowledge Graphs and Generative AIKnowledge Graphs and Generative AI
Knowledge Graphs and Generative AINeo4j
 
Advanced Functional Programming in Scala
Advanced Functional Programming in ScalaAdvanced Functional Programming in Scala
Advanced Functional Programming in ScalaPatrick Nicolas
 
Easy data-with-spring-data-jpa
Easy data-with-spring-data-jpaEasy data-with-spring-data-jpa
Easy data-with-spring-data-jpaStaples
 

Tendances (20)

Linear discriminant analysis
Linear discriminant analysisLinear discriminant analysis
Linear discriminant analysis
 
Haystack 2019 - Query relaxation - a rewriting technique between search and r...
Haystack 2019 - Query relaxation - a rewriting technique between search and r...Haystack 2019 - Query relaxation - a rewriting technique between search and r...
Haystack 2019 - Query relaxation - a rewriting technique between search and r...
 
Spring Framework
Spring Framework  Spring Framework
Spring Framework
 
Hierarchical Object Oriented Design
Hierarchical Object Oriented DesignHierarchical Object Oriented Design
Hierarchical Object Oriented Design
 
JavaScript and Artificial Intelligence by Aatman & Sagar - AhmedabadJS
JavaScript and Artificial Intelligence by Aatman & Sagar - AhmedabadJSJavaScript and Artificial Intelligence by Aatman & Sagar - AhmedabadJS
JavaScript and Artificial Intelligence by Aatman & Sagar - AhmedabadJS
 
MongoDB Cheat Sheet – Quick Reference
MongoDB Cheat Sheet – Quick ReferenceMongoDB Cheat Sheet – Quick Reference
MongoDB Cheat Sheet – Quick Reference
 
4. features of .net
4. features of .net4. features of .net
4. features of .net
 
Java I/O and Object Serialization
Java I/O and Object SerializationJava I/O and Object Serialization
Java I/O and Object Serialization
 
Robust Calibration For SVI Model Arbitrage Free
Robust Calibration For SVI Model Arbitrage FreeRobust Calibration For SVI Model Arbitrage Free
Robust Calibration For SVI Model Arbitrage Free
 
Topics Modeling
Topics ModelingTopics Modeling
Topics Modeling
 
Data Modeling with Neo4j
Data Modeling with Neo4jData Modeling with Neo4j
Data Modeling with Neo4j
 
Introduction to HBase
Introduction to HBaseIntroduction to HBase
Introduction to HBase
 
Time series clustering presentation
Time series clustering presentationTime series clustering presentation
Time series clustering presentation
 
Entity Linking
Entity LinkingEntity Linking
Entity Linking
 
MODULE 4-Text Analytics.pptx
MODULE 4-Text Analytics.pptxMODULE 4-Text Analytics.pptx
MODULE 4-Text Analytics.pptx
 
Knowledge Graphs and Generative AI
Knowledge Graphs and Generative AIKnowledge Graphs and Generative AI
Knowledge Graphs and Generative AI
 
Advanced angular
Advanced angularAdvanced angular
Advanced angular
 
Advanced Functional Programming in Scala
Advanced Functional Programming in ScalaAdvanced Functional Programming in Scala
Advanced Functional Programming in Scala
 
Word2Vec
Word2VecWord2Vec
Word2Vec
 
Easy data-with-spring-data-jpa
Easy data-with-spring-data-jpaEasy data-with-spring-data-jpa
Easy data-with-spring-data-jpa
 

En vedette

Hadoop as a service presented by Ajay Jha at Houston Hadoop Meetup
Hadoop as a service presented by Ajay Jha at Houston Hadoop MeetupHadoop as a service presented by Ajay Jha at Houston Hadoop Meetup
Hadoop as a service presented by Ajay Jha at Houston Hadoop MeetupMark Kerzner
 
Oil and gas big data edition
Oil and gas  big data editionOil and gas  big data edition
Oil and gas big data editionMark Kerzner
 
Night owl by Boyd Meyer of PROS
Night owl by Boyd Meyer of PROS Night owl by Boyd Meyer of PROS
Night owl by Boyd Meyer of PROS Mark Kerzner
 
Witsml core api_version_1.3.1
Witsml core api_version_1.3.1Witsml core api_version_1.3.1
Witsml core api_version_1.3.1Suresh Ayyappan
 
Oil and Gas Climate Initiative 2016 report
Oil and Gas Climate Initiative 2016 reportOil and Gas Climate Initiative 2016 report
Oil and Gas Climate Initiative 2016 reportTotal
 
Set up Hadoop Cluster on Amazon EC2
Set up Hadoop Cluster on Amazon EC2Set up Hadoop Cluster on Amazon EC2
Set up Hadoop Cluster on Amazon EC2IMC Institute
 
Hadoop Hadoop & Spark meetup - Altiscale
Hadoop Hadoop & Spark meetup - AltiscaleHadoop Hadoop & Spark meetup - Altiscale
Hadoop Hadoop & Spark meetup - AltiscaleMark Kerzner
 
WITSML data processing with Kafka and Spark Streaming
WITSML data processing with Kafka and Spark StreamingWITSML data processing with Kafka and Spark Streaming
WITSML data processing with Kafka and Spark StreamingDmitry Kniazev
 
Challenges in Global Standardisation | EnergySys Hydrocarbon Allocation Forum
Challenges in Global Standardisation | EnergySys Hydrocarbon Allocation ForumChallenges in Global Standardisation | EnergySys Hydrocarbon Allocation Forum
Challenges in Global Standardisation | EnergySys Hydrocarbon Allocation ForumEnergySys Limited
 
GIS Technology and E&P in Petroleum Industry Context, Applications and Impact...
GIS Technology and E&P in Petroleum Industry Context, Applications and Impact...GIS Technology and E&P in Petroleum Industry Context, Applications and Impact...
GIS Technology and E&P in Petroleum Industry Context, Applications and Impact...Carlos Gabriel Asato
 
WITSML to PPDM mapping project
WITSML to PPDM mapping projectWITSML to PPDM mapping project
WITSML to PPDM mapping projectETLSolutions
 
Standards for Production Allocation
Standards for Production AllocationStandards for Production Allocation
Standards for Production AllocationEnergySys Limited
 
kafka-steaming-data
kafka-steaming-datakafka-steaming-data
kafka-steaming-dataBryan Jacobs
 
Configuring Your First Hadoop Cluster On EC2
Configuring Your First Hadoop Cluster On EC2Configuring Your First Hadoop Cluster On EC2
Configuring Your First Hadoop Cluster On EC2benjaminwootton
 
Prodml Production Reporting | Hydrocarbon Allocation Forum | 2014 09-30
Prodml Production Reporting | Hydrocarbon Allocation Forum | 2014 09-30Prodml Production Reporting | Hydrocarbon Allocation Forum | 2014 09-30
Prodml Production Reporting | Hydrocarbon Allocation Forum | 2014 09-30EnergySys Limited
 

En vedette (20)

Toorcamp 2016
Toorcamp 2016Toorcamp 2016
Toorcamp 2016
 
Cloudera search
Cloudera searchCloudera search
Cloudera search
 
Hadoop as a service presented by Ajay Jha at Houston Hadoop Meetup
Hadoop as a service presented by Ajay Jha at Houston Hadoop MeetupHadoop as a service presented by Ajay Jha at Houston Hadoop Meetup
Hadoop as a service presented by Ajay Jha at Houston Hadoop Meetup
 
Oil and gas big data edition
Oil and gas  big data editionOil and gas  big data edition
Oil and gas big data edition
 
Night owl by Boyd Meyer of PROS
Night owl by Boyd Meyer of PROS Night owl by Boyd Meyer of PROS
Night owl by Boyd Meyer of PROS
 
Witsml core api_version_1.3.1
Witsml core api_version_1.3.1Witsml core api_version_1.3.1
Witsml core api_version_1.3.1
 
Oil and Gas Climate Initiative 2016 report
Oil and Gas Climate Initiative 2016 reportOil and Gas Climate Initiative 2016 report
Oil and Gas Climate Initiative 2016 report
 
Hadoop on ec2
Hadoop on ec2Hadoop on ec2
Hadoop on ec2
 
Set up Hadoop Cluster on Amazon EC2
Set up Hadoop Cluster on Amazon EC2Set up Hadoop Cluster on Amazon EC2
Set up Hadoop Cluster on Amazon EC2
 
Hadoop Hadoop & Spark meetup - Altiscale
Hadoop Hadoop & Spark meetup - AltiscaleHadoop Hadoop & Spark meetup - Altiscale
Hadoop Hadoop & Spark meetup - Altiscale
 
WITSML data processing with Kafka and Spark Streaming
WITSML data processing with Kafka and Spark StreamingWITSML data processing with Kafka and Spark Streaming
WITSML data processing with Kafka and Spark Streaming
 
Challenges in Global Standardisation | EnergySys Hydrocarbon Allocation Forum
Challenges in Global Standardisation | EnergySys Hydrocarbon Allocation ForumChallenges in Global Standardisation | EnergySys Hydrocarbon Allocation Forum
Challenges in Global Standardisation | EnergySys Hydrocarbon Allocation Forum
 
GIS Technology and E&P in Petroleum Industry Context, Applications and Impact...
GIS Technology and E&P in Petroleum Industry Context, Applications and Impact...GIS Technology and E&P in Petroleum Industry Context, Applications and Impact...
GIS Technology and E&P in Petroleum Industry Context, Applications and Impact...
 
WITSML to PPDM mapping project
WITSML to PPDM mapping projectWITSML to PPDM mapping project
WITSML to PPDM mapping project
 
Standards for Production Allocation
Standards for Production AllocationStandards for Production Allocation
Standards for Production Allocation
 
kafka-steaming-data
kafka-steaming-datakafka-steaming-data
kafka-steaming-data
 
Configuring Your First Hadoop Cluster On EC2
Configuring Your First Hadoop Cluster On EC2Configuring Your First Hadoop Cluster On EC2
Configuring Your First Hadoop Cluster On EC2
 
WITSML
WITSMLWITSML
WITSML
 
Prodml Production Reporting | Hydrocarbon Allocation Forum | 2014 09-30
Prodml Production Reporting | Hydrocarbon Allocation Forum | 2014 09-30Prodml Production Reporting | Hydrocarbon Allocation Forum | 2014 09-30
Prodml Production Reporting | Hydrocarbon Allocation Forum | 2014 09-30
 
Data Modelling and WITSML
Data Modelling and WITSMLData Modelling and WITSML
Data Modelling and WITSML
 

Similaire à Witsml data processing with kafka and spark streaming

5 Ways to Use Spark to Enrich your Cassandra Environment
5 Ways to Use Spark to Enrich your Cassandra Environment5 Ways to Use Spark to Enrich your Cassandra Environment
5 Ways to Use Spark to Enrich your Cassandra EnvironmentJim Hatcher
 
Jump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksJump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksAnyscale
 
Strata NYC 2015: What's new in Spark Streaming
Strata NYC 2015: What's new in Spark StreamingStrata NYC 2015: What's new in Spark Streaming
Strata NYC 2015: What's new in Spark StreamingDatabricks
 
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-AirflowPyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-AirflowChetan Khatri
 
Spark Study Notes
Spark Study NotesSpark Study Notes
Spark Study NotesRichard Kuo
 
Spark Cassandra Connector: Past, Present and Furure
Spark Cassandra Connector: Past, Present and FurureSpark Cassandra Connector: Past, Present and Furure
Spark Cassandra Connector: Past, Present and FurureDataStax Academy
 
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3Databricks
 
Spark streaming state of the union
Spark streaming state of the unionSpark streaming state of the union
Spark streaming state of the unionDatabricks
 
Intro to apache spark
Intro to apache sparkIntro to apache spark
Intro to apache sparkAmine Sagaama
 
Learning spark ch10 - Spark Streaming
Learning spark ch10 - Spark StreamingLearning spark ch10 - Spark Streaming
Learning spark ch10 - Spark Streamingphanleson
 
Unified Data Access with Gimel
Unified Data Access with GimelUnified Data Access with Gimel
Unified Data Access with GimelAlluxio, Inc.
 
Data orchestration | 2020 | Alluxio | Gimel
Data orchestration | 2020 | Alluxio | GimelData orchestration | 2020 | Alluxio | Gimel
Data orchestration | 2020 | Alluxio | GimelDeepak Chandramouli
 
Spark streaming with kafka
Spark streaming with kafkaSpark streaming with kafka
Spark streaming with kafkaDori Waldman
 
Spark stream - Kafka
Spark stream - Kafka Spark stream - Kafka
Spark stream - Kafka Dori Waldman
 
Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Michael Rys
 
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Anant Corporation
 
SnappyData overview NikeTechTalk 11/19/15
SnappyData overview NikeTechTalk 11/19/15SnappyData overview NikeTechTalk 11/19/15
SnappyData overview NikeTechTalk 11/19/15SnappyData
 

Similaire à Witsml data processing with kafka and spark streaming (20)

5 Ways to Use Spark to Enrich your Cassandra Environment
5 Ways to Use Spark to Enrich your Cassandra Environment5 Ways to Use Spark to Enrich your Cassandra Environment
5 Ways to Use Spark to Enrich your Cassandra Environment
 
Jump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksJump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on Databricks
 
Strata NYC 2015: What's new in Spark Streaming
Strata NYC 2015: What's new in Spark StreamingStrata NYC 2015: What's new in Spark Streaming
Strata NYC 2015: What's new in Spark Streaming
 
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-AirflowPyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
 
Spark Study Notes
Spark Study NotesSpark Study Notes
Spark Study Notes
 
20170126 big data processing
20170126 big data processing20170126 big data processing
20170126 big data processing
 
Spark Cassandra Connector: Past, Present and Furure
Spark Cassandra Connector: Past, Present and FurureSpark Cassandra Connector: Past, Present and Furure
Spark Cassandra Connector: Past, Present and Furure
 
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
 
Spark streaming state of the union
Spark streaming state of the unionSpark streaming state of the union
Spark streaming state of the union
 
Intro to apache spark
Intro to apache sparkIntro to apache spark
Intro to apache spark
 
Learning spark ch10 - Spark Streaming
Learning spark ch10 - Spark StreamingLearning spark ch10 - Spark Streaming
Learning spark ch10 - Spark Streaming
 
Spark ML Pipeline serving
Spark ML Pipeline servingSpark ML Pipeline serving
Spark ML Pipeline serving
 
Unified Data Access with Gimel
Unified Data Access with GimelUnified Data Access with Gimel
Unified Data Access with Gimel
 
Data orchestration | 2020 | Alluxio | Gimel
Data orchestration | 2020 | Alluxio | GimelData orchestration | 2020 | Alluxio | Gimel
Data orchestration | 2020 | Alluxio | Gimel
 
Spark streaming with kafka
Spark streaming with kafkaSpark streaming with kafka
Spark streaming with kafka
 
Spark stream - Kafka
Spark stream - Kafka Spark stream - Kafka
Spark stream - Kafka
 
Nike tech talk.2
Nike tech talk.2Nike tech talk.2
Nike tech talk.2
 
Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)
 
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
 
SnappyData overview NikeTechTalk 11/19/15
SnappyData overview NikeTechTalk 11/19/15SnappyData overview NikeTechTalk 11/19/15
SnappyData overview NikeTechTalk 11/19/15
 

Plus de Mark Kerzner

IBM Strategy for Spark
IBM Strategy for SparkIBM Strategy for Spark
IBM Strategy for SparkMark Kerzner
 
Joe Witt presentation on Apache NiFi
Joe Witt presentation on Apache NiFiJoe Witt presentation on Apache NiFi
Joe Witt presentation on Apache NiFiMark Kerzner
 
FreeEed popcorn overview
FreeEed popcorn overviewFreeEed popcorn overview
FreeEed popcorn overviewMark Kerzner
 
FreeEed presentation
FreeEed presentationFreeEed presentation
FreeEed presentationMark Kerzner
 
Nutch + Hadoop scaled, for crawling protected web sites (hint: Selenium)
Nutch + Hadoop scaled, for crawling protected web sites (hint: Selenium)Nutch + Hadoop scaled, for crawling protected web sites (hint: Selenium)
Nutch + Hadoop scaled, for crawling protected web sites (hint: Selenium)Mark Kerzner
 
Porting your hadoop app to horton works hdp
Porting your hadoop app to horton works hdpPorting your hadoop app to horton works hdp
Porting your hadoop app to horton works hdpMark Kerzner
 
Automated Hadoop Cluster Construction on EC2
Automated Hadoop Cluster Construction on EC2Automated Hadoop Cluster Construction on EC2
Automated Hadoop Cluster Construction on EC2Mark Kerzner
 
Open source e_discovery
Open source e_discoveryOpen source e_discovery
Open source e_discoveryMark Kerzner
 
FreEed - Open Source eDiscovery
FreEed - Open Source eDiscoveryFreEed - Open Source eDiscovery
FreEed - Open Source eDiscoveryMark Kerzner
 
Houston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
Houston Hadoop Meetup Presentation by Vikram Oberoi of ClouderaHouston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
Houston Hadoop Meetup Presentation by Vikram Oberoi of ClouderaMark Kerzner
 
Google Office in Zurich, Switzerland
Google Office in Zurich, SwitzerlandGoogle Office in Zurich, Switzerland
Google Office in Zurich, SwitzerlandMark Kerzner
 
Fun art with fruit and vegetable
Fun art with fruit and vegetableFun art with fruit and vegetable
Fun art with fruit and vegetableMark Kerzner
 
Carnavale de Venice
Carnavale de VeniceCarnavale de Venice
Carnavale de VeniceMark Kerzner
 
Holocaust Memorial Tato
Holocaust Memorial TatoHolocaust Memorial Tato
Holocaust Memorial TatoMark Kerzner
 
Venice views with music
Venice views with musicVenice views with music
Venice views with musicMark Kerzner
 

Plus de Mark Kerzner (20)

IBM Strategy for Spark
IBM Strategy for SparkIBM Strategy for Spark
IBM Strategy for Spark
 
Joe Witt presentation on Apache NiFi
Joe Witt presentation on Apache NiFiJoe Witt presentation on Apache NiFi
Joe Witt presentation on Apache NiFi
 
FreeEed popcorn overview
FreeEed popcorn overviewFreeEed popcorn overview
FreeEed popcorn overview
 
FreeEed presentation
FreeEed presentationFreeEed presentation
FreeEed presentation
 
Nutch + Hadoop scaled, for crawling protected web sites (hint: Selenium)
Nutch + Hadoop scaled, for crawling protected web sites (hint: Selenium)Nutch + Hadoop scaled, for crawling protected web sites (hint: Selenium)
Nutch + Hadoop scaled, for crawling protected web sites (hint: Selenium)
 
SHMcloud vision
SHMcloud visionSHMcloud vision
SHMcloud vision
 
Porting your hadoop app to horton works hdp
Porting your hadoop app to horton works hdpPorting your hadoop app to horton works hdp
Porting your hadoop app to horton works hdp
 
Automated Hadoop Cluster Construction on EC2
Automated Hadoop Cluster Construction on EC2Automated Hadoop Cluster Construction on EC2
Automated Hadoop Cluster Construction on EC2
 
Open source e_discovery
Open source e_discoveryOpen source e_discovery
Open source e_discovery
 
FreEed - Open Source eDiscovery
FreEed - Open Source eDiscoveryFreEed - Open Source eDiscovery
FreEed - Open Source eDiscovery
 
Houston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
Houston Hadoop Meetup Presentation by Vikram Oberoi of ClouderaHouston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
Houston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
 
Google Office in Zurich, Switzerland
Google Office in Zurich, SwitzerlandGoogle Office in Zurich, Switzerland
Google Office in Zurich, Switzerland
 
Fun art with fruit and vegetable
Fun art with fruit and vegetableFun art with fruit and vegetable
Fun art with fruit and vegetable
 
Carnavale de Venice
Carnavale de VeniceCarnavale de Venice
Carnavale de Venice
 
Holocaust Memorial Tato
Holocaust Memorial TatoHolocaust Memorial Tato
Holocaust Memorial Tato
 
Yehuda Pen
Yehuda PenYehuda Pen
Yehuda Pen
 
Mark Chagall
Mark ChagallMark Chagall
Mark Chagall
 
Thailand Visite
Thailand VisiteThailand Visite
Thailand Visite
 
Venice views with music
Venice views with musicVenice views with music
Venice views with music
 
Jean Beraud Paris
Jean Beraud ParisJean Beraud Paris
Jean Beraud Paris
 

Dernier

Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 

Dernier (20)

Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 

Witsml data processing with kafka and spark streaming

  • 1. WITSML data processing example with Kafka and Spark Streaming Houston Hadoop Meetup, 4/26/2016
  • 2. About me - Dmitry Kniazev Currently Solution Architect at EPAM Systems - About 4 years in Oil & Gas here in Houston - Started working with Hadoop about 2 years ago Before that BI/DW Specialist at EPAM Systems for 6 years - Reports, ETL with Oracle, Microsoft, Cognos and other tools - Enjoyed not SO HOT life in Eastern Europe Before that Performance Analyst at EPAM Systems for 4 years - Web Applications and Databases optimization
  • 3. What is the problem? Source: http://www.croftsystems.net/blog/conventional-vs.-unconventional
  • 4. What is WITSML? DATA EXCHANGE STANDARD FOR THE UPSTREAM OIL AND GAS INDUSTRY WITSML Data Store Rig Aggregation Solution Rig Aggregation Solution Corp Store WITSML Data Store Service Company #1 Operator #1 Service Company #2 WITSML based ApplicationsWITSML
  • 5. Operator Company Data Center Architecture WITSML Data Store HBase WITSML via SOAP Internet Consumer (Scala) Producer (Scala) Service Company DC Kafka Consumer (Scala) Email / Browser
  • 7. What is Spark Streaming?
  • 9. Producer - prep // some important imports import com.mycompany.witsml.client.WitsmlClient //based on jwitsml 1.0 import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord} import scala.xml.{Elem, Node, XML} // variables initialization var producer: KafkaProducer[String, String] = null var startTimeIndex = DateTime.now() var topic = "" var pollInterval = 5
  • 10. Producer - Kafka Properties bootstrap.servers = srv1:9092,srv2:9092 key.serializer = org.apache.kafka.common.serialization.StringSerializer value.serializer = org.apache.kafka.common.serialization.StringSerializer
  • 11. Producer - main function producer = new KafkaProducer[String, String](props) // each wellBore is a separate Kafka topic which is going to be partitioned by log topic = args(0) while (true) { val logs = WitsmlClient.getWitsmlResponse(logsQuery) // parse logs and send messages to Kafka (logs "log").foreach { node: Node => // send all data from one log to the same partition val key = (node "@uidLog").text (node "data").foreach { data => val message = new ProducerRecord(topic, null, key, data.text) producer.send(message) } }
  • 12. Producer - results ”Well123” => Topic “5207KFSJ18” => Key (Partition) Content of <data> element => Message
  • 13. Consumer - prep import org.apache.spark.SparkConf import org.apache.spark.sql.{Row, SQLContext} import org.apache.spark.streaming.dstream.InputDStream import org.apache.spark.streaming.kafka.KafkaUtils var schema: StructType = null val sc = new SparkConf().setAppName("WitsmlKafkaDemo") val ssc = new StreamingContext(sc, Seconds(1)) val dStream: InputDStream = KafkaUtils.createDirectStream(ssc, kafkaParams, topics) val sqlContext = new SQLContext(ssc.sparkContext)
  • 14. Consumer - Rules Definition # fields for Spark SQL query `Co. Man G/L`,`Gain Loss - Spare`,`ACC_DRILL_STRKS` # where clause for SQL query `Co. Man G/L`>100 OR `Gain Loss - Spare`<(-42.1)
  • 15. Consumer - main function dStream.foreachRDD( batchRDD => { val messages = batchRDD.map(_._2).map(_.split(",")) //create DataFrame with a custom schema val df = sqlContext.createDataFrame(messages, schema) //register temp table and test against rule df.registerTempTable("timeLog") val collected = sqlContext.sql("SELECT " + fields + " FROM timeLog WHERE " + condition).collect if (collected.length > 0) { //send email alert WitsmlKafkaUtil.sendEmail(collected) } }) ssc.start() ssc.awaitTermination()
  • 17. Why Highcharts? - Websockets support -> real-time data visualization - Multiple Y-axes that automatically scale -> many mnemonics on the same chart - Inverted X-axis -> great for Depth Logs - 3D charts that can be rotated -> Trajectories - Area range with custom colors -> Formations on the background - 100% client side javascript -> easy to deploy
  • 18. Lessons Learned - Throw away and re-design: - Logs should be Topics, Wells(Wellbores) should be Partitions for Scalability - Producers and Consumers should be Managed Services (Flume Agents?) - Backend: - Land data to HBase (and probably OpenTSDB) - Frontend: - WebApp to visualize both NRT and historical data? - Mobile App for Alerts? - Improve Producers: - Speak many WITSML dialects? - Get ready for Real-time: - Support for ETP standard