2. About me - Dmitry Kniazev
Currently Solution Architect at EPAM Systems
- About 4 years in Oil & Gas here in Houston
- Started working with Hadoop about 2 years ago
Before that BI/DW Specialist at EPAM Systems for 6 years
- Reporting and ETL with Oracle, Microsoft, Cognos, and other tools
- Enjoyed the not-SO-HOT climate of Eastern Europe
Before that Performance Analyst at EPAM Systems for 4 years
- Web application and database optimization
3. What is the problem?
Source: http://www.croftsystems.net/blog/conventional-vs.-unconventional
4. What is WITSML?
DATA EXCHANGE STANDARD FOR THE UPSTREAM OIL AND GAS INDUSTRY
[Diagram: Service Company #1 and Service Company #2 each run a rig-side aggregation solution feeding a WITSML data store; Operator #1 pulls from those stores into a corporate store that serves WITSML-based applications.]
5. Operator Company Data Center Architecture
[Diagram: a Producer (Scala) in the operator's data center polls the WITSML data store in the service company DC over SOAP via the Internet and publishes to Kafka; Consumers (Scala) read from Kafka, land data in HBase, and deliver alerts to email/browser.]
11. Producer - main function
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import scala.xml.Node

val producer = new KafkaProducer[String, String](props)
// each wellbore is a separate Kafka topic, partitioned by log
val topic = args(0)
while (true) {
  val logs = WitsmlClient.getWitsmlResponse(logsQuery)
  // parse the logs and send messages to Kafka
  (logs \ "log").foreach { node: Node =>
    // key on the log uid so all data from one log lands on the same partition
    val key = (node \ "@uidLog").text
    (node \ "data").foreach { data =>
      val message = new ProducerRecord(topic, key, data.text)
      producer.send(message)
    }
  }
}
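The snippet above assumes a `props` object configured elsewhere. A minimal sketch of what that configuration might look like (the broker address is a placeholder assumption, not from the talk):

```scala
import java.util.Properties

// Hypothetical configuration for the KafkaProducer used above.
val props = new Properties()
props.put("bootstrap.servers", "localhost:9092") // placeholder broker address
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
// acks=all waits for full replication before a send is confirmed
props.put("acks", "all")
```

Both serializers are strings here because the producer is typed `KafkaProducer[String, String]`.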
13. Consumer - prep
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.StructType
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.streaming.kafka.KafkaUtils

var schema: StructType = null
val sc = new SparkConf().setAppName("WitsmlKafkaDemo")
val ssc = new StreamingContext(sc, Seconds(1))
val dStream: InputDStream[(String, String)] =
  KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics)
val sqlContext = new SQLContext(ssc.sparkContext)
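The `kafkaParams` and `topics` values passed to `createDirectStream` are not shown on the slide; a minimal sketch of what they could look like (broker address and topic name are placeholder assumptions):

```scala
// Assumed Kafka connection parameters for the Spark 1.x direct-stream API;
// "metadata.broker.list" names the broker(s) to read from directly.
val kafkaParams = Map[String, String]("metadata.broker.list" -> "localhost:9092")

// One topic per wellbore, per the producer slide; the name is hypothetical.
val topics = Set("wellbore-1")
```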
14. Consumer - Rules Definition
# fields for Spark SQL query
`Co. Man G/L`,`Gain Loss - Spare`,`ACC_DRILL_STRKS`
# where clause for SQL query
`Co. Man G/L`>100 OR `Gain Loss - Spare`<(-42.1)
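The two data lines above look like a small rules file: `#` lines are comments, the first data line lists the fields, the second is the WHERE clause. A hedged sketch of a parser that would produce the `fields` and `condition` strings used on the next slide (the format and helper are assumptions, not shown in the talk):

```scala
// Hypothetical parser for the two-line rules format shown above.
def parseRules(lines: Seq[String]): (String, String) = {
  // drop comments and blank lines; what remains is (fields, condition)
  val data = lines.map(_.trim).filter(l => l.nonEmpty && !l.startsWith("#"))
  (data(0), data(1))
}

val rules = Seq(
  "# fields for Spark SQL query",
  "`Co. Man G/L`,`Gain Loss - Spare`,`ACC_DRILL_STRKS`",
  "# where clause for SQL query",
  "`Co. Man G/L`>100 OR `Gain Loss - Spare`<(-42.1)")
val (fields, condition) = parseRules(rules)
```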
15. Consumer - main function
dStream.foreachRDD( batchRDD => {
  // split each CSV message into fields and wrap it in a Row
  val messages = batchRDD.map(_._2).map(s => Row.fromSeq(s.split(",")))
  // create a DataFrame with the custom schema
  val df = sqlContext.createDataFrame(messages, schema)
  // register a temp table and test it against the rule
  df.registerTempTable("timeLog")
  val collected = sqlContext.sql("SELECT " + fields + " FROM timeLog WHERE " + condition).collect()
  if (collected.length > 0) {
    // send an email alert
    WitsmlKafkaUtil.sendEmail(collected)
  }
})
ssc.start()
ssc.awaitTermination()
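`WitsmlKafkaUtil.sendEmail` is the talk's own helper and its implementation is not shown; one small piece of it might be formatting the collected rows into a message body. A hypothetical sketch, modeling the Spark `Row` objects as plain `Seq[Any]` to stay self-contained:

```scala
// Hypothetical formatter for the alert email body; the real helper receives
// Spark Rows, represented here as Seq[Any] so the sketch runs standalone.
def alertBody(fields: Seq[String], rows: Seq[Seq[Any]]): String =
  rows.map(r => fields.zip(r).map { case (f, v) => s"$f = $v" }.mkString(", "))
      .mkString("Rule violation detected:\n", "\n", "")

val body = alertBody(Seq("Co. Man G/L"), Seq(Seq(142.0)))
```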
17. Why Highcharts?
- WebSocket support -> real-time data visualization
- Multiple Y-axes that automatically scale -> many mnemonics on the same chart
- Inverted X-axis -> great for Depth Logs
- 3D charts that can be rotated -> Trajectories
- Area ranges with custom colors -> Formations in the background
- 100% client-side JavaScript -> easy to deploy
18. Lessons Learned
- Throw away and re-design:
- Logs should be Topics and Wells (Wellbores) should be Partitions, for scalability
- Producers and Consumers should be Managed Services (Flume Agents?)
- Backend:
- Land data to HBase (and probably OpenTSDB)
- Frontend:
- WebApp to visualize both NRT and historical data?
- Mobile App for Alerts?
- Improve Producers:
- Speak many WITSML dialects?
- Get ready for Real-time:
- Support for ETP standard