Gimel Data Platform Overview
Agenda
©2018 PayPal Inc. Confidential and proprietary. 2
• Introduction
• PayPal’s Analytics Ecosystem
• Why Gimel?
• Challenges in Analytics
• Walk through a simple use case
• Gimel Open Source Journey
About Us
• Product manager, data processing products at PayPal
• 20 years in data and analytics across networking, semiconductors, telecom, security and fintech industries
• Data warehouse developer, BI program manager, data product manager
romehta@paypal.com
https://www.linkedin.com/in/romit-mehta/
©2018 PayPal Inc. Confidential and proprietary. 3
Romit Mehta
• Big data platform engineer at PayPal
• 13 years in data engineering, 5 years in scalable solutions with big data
• Developed several Spark-based solutions across NoSQL, key-value, messaging, document-based and relational systems
dmohanakumarchan@paypal.com
https://www.linkedin.com/in/deepakmc/
Deepak Mohanakumar Chandramouli
PayPal – Key Metrics and Analytics Ecosystem
©2018 PayPal Inc. Confidential and proprietary. 4
PayPal Big Data Platform
5
• 160+ PB of data
• 75,000+ YARN jobs/day
• One of the largest Aerospike, Teradata, Hortonworks and Oracle installations
• Compute supported: MR, Pig, Hive, Spark, Beam
• 13 prod clusters, 12 non-prod clusters
• GPU co-located with Hadoop
6
Gimel Data Platform (architecture overview):
• Users: Developer, Data scientist, Analyst, Operator
• User experience and access: Gimel SDK, Notebooks, R Studio, BI tools
• Gimel Data Platform: PCatalog, Data API
• Compute framework and APIs / application lifecycle management: Logging, Monitoring, Alerting, Security
• Infrastructure services leveraged for elasticity and redundancy: Multi-DC, Public cloud, Predictive resource allocation
Why Gimel?
7
Use case - Flights Cancelled
9
Data points: Flights Events, Airports, Airlines, Carrier, Geography & Geo Tags
Data sources: Kafka, Teradata, External, HDFS / Hive
Pipeline: Stream Ingest / Extract / Load → Data Prep / Availability (Parquet/ORC/Text?) → Process → Load → Publish (real-time / processed data) → Analysis
Productionalize, Logging, Monitoring, Alerting, Auditing, Data Quality
Use case challenges …
©2018 PayPal Inc. Confidential and proprietary.
©2018 PayPal Inc. Confidential and proprietary. 10
Spark Read From HBase
Data Access Code is Cumbersome and Fragile
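The code screenshot from this slide is not reproduced in the transcript. As a rough illustration of the boilerplate the slide is referring to, here is a hedged sketch of a direct Spark-on-HBase read through the SHC connector; the table, column mapping and catalog JSON are hypothetical, not from the deck:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("direct-hbase-read").getOrCreate()

// Hand-written SHC catalog mapping HBase column families/qualifiers to DataFrame columns.
// Every application reading this table has to repeat (and keep in sync) this mapping.
val hbaseCatalog =
  s"""{
     |  "table":   {"namespace": "default", "name": "flights"},
     |  "rowkey":  "key",
     |  "columns": {
     |    "flight_id": {"cf": "rowkey", "col": "key",     "type": "string"},
     |    "carrier":   {"cf": "info",   "col": "carrier", "type": "string"},
     |    "status":    {"cf": "info",   "col": "status",  "type": "string"}
     |  }
     |}""".stripMargin

// Connector-specific format string and options; these change when the connector or the
// storage version changes, which is exactly the fragility the slide calls out.
val flightsDf = spark.read
  .option("catalog", hbaseCatalog)
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .load()

flightsDf.show(10)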
©2018 PayPal Inc. Confidential and proprietary. 11
Spark Read From HBase | Spark Read From Elasticsearch
Spark Read From Aerospike | Spark Read From Druid
Data Access Code is Cumbersome and Fragile
©2018 PayPal Inc. Confidential and proprietary. 12
Datasets Challenges
• Data access tied to compute and data store versions
• Hard to find available data sets
• Storage-specific dataset creation results in duplication and increased latency
• No audit trail for dataset access
• No standards for on-boarding data sets for others to discover
• No statistics on data set usage and access trends
©2018 PayPal Inc. Confidential and proprietary. 13
High-friction Data Application Lifecycle
• Onboarding Big Data Apps: Learn → Code → Optimize → Build → Deploy → Run
• Compute Engine Changed: Learn → Code → Optimize → Build → Deploy → Run
• Compute Version Upgraded: Learn → Code → Optimize → Build → Deploy → Run
• Storage API Changed: Learn → Code → Optimize → Build → Deploy → Run
• Storage Connector Upgraded: Learn → Code → Optimize → Build → Deploy → Run
• Storage Hosts Migrated: Learn → Code → Optimize → Build → Deploy → Run
• Storage Changed: Learn → Code → Optimize → Build → Deploy → Run
• *********************: Learn → Code → Optimize → Build → Deploy → Run
Gimel Demo
14
15
API, PCatalog, Tools
With Gimel & Notebooks
©2018 PayPal Inc. Confidential and proprietary.
Data points: Flights Events, Airports, Airlines, Carrier, Geography & Geo Tags
Data sources: Kafka, Teradata, External, HDFS / Hive
Pipeline: Ingest / Extract / Load → Data Prep / Availability (Parquet/ORC/Text?) → Process → Load → Publish → Analysis
Productionalize, Logging, Monitoring, Alerting, Auditing, Data QC
Use case challenges - Simplified with Gimel
©2018 PayPal Inc. Confidential and proprietary.
Spark Read From HBase | Spark Read From Elasticsearch
Spark Read From Aerospike | Spark Read From Druid
With Data API
✔
Data Access Simplified with Gimel Data API
16
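The code screenshots for this slide are likewise not in the transcript. Based on the Data API calls shown later in the deck, a minimal sketch of the equivalent reads through Gimel could look like the following; the dataset names, options and import path are illustrative assumptions:

// Assumed import path for the open-sourced Data API (https://github.com/paypal/gimel);
// dataset names and options below are illustrative.
import com.paypal.gimel.DataSet

val dataSet = DataSet(sparkSession)
val options = Map.empty[String, Any]

// Same one-line call regardless of whether the dataset lives in HBase, Elasticsearch,
// Aerospike or Druid; the catalog entry resolves the right connector.
val hbaseDf = dataSet.read("pcatalog.HBASE_dataset", options)
val esDf    = dataSet.read("pcatalog.ES_dataset", options)
val aeroDf  = dataSet.read("pcatalog.AEROSPIKE_dataset", options)
val druidDf = dataSet.read("pcatalog.DRUID_dataset", options)

// Writes are symmetric (signature as shown on the "Anatomy of API" slide).
dataSet.write(hbaseDf, "pcatalog.HIVE_dataset", options)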
©2018 PayPal Inc. Confidential and proprietary.
Spark Read From HBase | Spark Read From Elasticsearch
Spark Read From Aerospike | Spark Read From Druid
With Data API
✔
SQL Support in Gimel Data Platform
17
©2018 PayPal Inc. Confidential and proprietary. 18
Data Application Lifecycle with Data API
• Onboarding Big Data Apps: Learn → Code → Optimize → Build → Deploy → Run
• Compute Engine Changed: Run
• Compute Version Upgraded: Run
• Storage API Changed: Run
• Storage Connector Upgraded: Run
• Storage Hosts Migrated: Run
• Storage Changed: Run
• *********************: Run
Open Source
©2018 PayPal Inc. Confidential and proprietary. 19
Gimel Open Source Journey
• Open sourced the Gimel Data API in April 2018 (http://try.gimel.io)
• Open source Gimel PCatalog: metadata services, discovery services, Catalog UI
• Open source Compute Framework (SCaaS): Livy features and enhancements, monitoring and alerting, SDK and Gimel integration
• Open source PayPal Notebooks: Jupyter features and enhancements, Gimel integration
©2018 PayPal Inc. Confidential and proprietary.
Gimel - Open Sourced
Codebase available: https://github.com/paypal/gimel
Slack: https://gimel-dev.slack.com
Google Groups: https://groups.google.com/d/forum/gimel-dev
©2017 PayPal Inc. Confidential and proprietary. 21
Q&A
Github: http://gimel.io
Try it yourself: http://try.gimel.io
Slack: https://gimel-dev.slack.com
Google Groups: https://groups.google.com/d/forum/gimel-dev
22
Gimel – Deep Dive
23
[Architecture diagram: a Livy grid (job server with Livy API and NAS) running batch and interactive sessions (including Sparkling Water), Spark History Server and Thrift Server, metrics, log indexing and search, xDiscovery services that scan data stores and maintain the catalog, Metadata Services, and the PCatalog UI for exploring, configuring and discovering datasets.]
PayPal Analytics Ecosystem
©2018 PayPal Inc. Confidential and proprietary.
©2018 PayPal Inc. Confidential and proprietary. 25
A peek into Streaming SQL and Batch SQL

Streaming SQL – launches a Spark Streaming app:

-- Streaming window seconds
set gimel.kafka.throttle.streaming.window.seconds=10;
-- Throttling
set gimel.kafka.throttle.streaming.maxRatePerPartition=1500;
-- ZK checkpoint root path
set gimel.kafka.consumer.checkpoint.root=/checkpoints/appname;
-- Checkpoint enabling flag - implicitly checkpoints after each mini-batch in streaming
set gimel.kafka.reader.checkpoint.save.enabled=true;

-- Jupyter magic for streaming SQL on Notebooks | Interactive use cases
-- Livy REPL - the same magic for streaming SQL works | Streaming use cases
%%gimel-stream
-- Assume a pre-split HBASE table as an example
insert into pcatalog.HBASE_dataset
select
  cust_id,
  kafka_ds.*
from pcatalog.KAFKA_dataset kafka_ds;

Batch SQL – launches a Spark Batch app:

-- Establish 10 concurrent connections per Topic-Partition
set gimel.kafka.throttle.batch.parallelsPerPartition=10;
-- Fetch at max 10 M messages from each partition
set gimel.kafka.throttle.batch.maxRecordsPerPartition=10,000,000;

-- Jupyter magic on Notebooks | Interactive use cases
-- Livy REPL - the same magic works | Batch use cases
%%gimel
insert into pcatalog.HIVE_dataset
partition(yyyy,mm,dd,hh,mi)
select kafka_ds.*, gimel_load_id
  ,substr(commit_timestamp,1,4) as yyyy
  ,substr(commit_timestamp,6,2) as mm
  ,substr(commit_timestamp,9,2) as dd
  ,substr(commit_timestamp,12,2) as hh
  ,case when cast(substr(commit_timestamp,15,2) as INT) <= 30 then "00" else "30" end as mi
from pcatalog.KAFKA_dataset kafka_ds;

The following are Jupyter/Livy magic terms:
• %%gimel : calls gimel.executeBatch(sql)
• %%gimel-stream : calls gimel.executeStream(sql)
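Outside a notebook, the same entry points can be called directly from a Spark application. A minimal sketch based on the GimelQueryProcessor calls shown in the integration section, assuming a sparkSession in scope; the SQL text is illustrative and the exact package prefix follows the deck's "gimel." usage rather than a verified import path:

val batchSql =
  """insert into pcatalog.HIVE_dataset
    |select kafka_ds.* from pcatalog.KAFKA_dataset kafka_ds""".stripMargin

// Same call the %%gimel magic makes under the hood.
gimel.GimelQueryProcessor.executeBatch(batchSql, sparkSession)

// And the streaming equivalent of %%gimel-stream.
val streamSql = "insert into pcatalog.HBASE_dataset select cust_id, kafka_ds.* from pcatalog.KAFKA_dataset kafka_ds"
gimel.GimelQueryProcessor.executeStream(streamSql, sparkSession)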
Anatomy of API

gimel.dataset.factory {
  KafkaDataSet
  ElasticSearchDataSet
  DruidDataSet
  HiveDataSet
  AerospikeDataSet
  HbaseDataSet
  CassandraDataSet
  JDBCDataSet
}

gimel.datastream.factory {
  KafkaDataStream
}

Metadata Services:
CatalogProvider.getDataSetProperties("dataSetName")

dataSet.read("dataSetName", options)
dataSet.write(dataToWrite, "dataSetName", options)
dataStream.read("dataSetName", options)

val storageDataSet = getFromFactory(type = "Hive")
val storageDataStream = getFromStreamFactory(type = "kafka")

kafkaDataSet.read("dataSetName", options)
hiveDataSet.write(dataToWrite, "dataSetName", options)
storageDataStream.read("dataSetName", options)

{
  Core connector implementation, example – Kafka:
  a combination of open source connectors and in-house implementations;
  open source connectors such as DataStax / SHC / ES-Spark.
}

A Gimel SQL statement such as:

%%gimel
-- Establish 10 concurrent connections per Topic-Partition
set gimel.kafka.throttle.batch.parallelsPerPartition=10;
-- Fetch at max 10 M messages from each partition
set gimel.kafka.throttle.batch.maxRecordsPerPartition=10,000,000;
insert into pcatalog.HIVE_dataset
partition(yyyy,mm,dd,hh,mi)
select kafka_ds.*, gimel_load_id
  ,substr(commit_timestamp,1,4) as yyyy
  ,substr(commit_timestamp,6,2) as mm
  ,substr(commit_timestamp,9,2) as dd
  ,substr(commit_timestamp,12,2) as hh
from pcatalog.KAFKA_dataset kafka_ds
join default.geo_lkp lkp
  on kafka_ds.zip = geo_lkp.zip
where geo_lkp.region = 'MIDWEST'

resolves internally into Data API calls:

val dataSet: gimel.DataSet = DataSet(sparkSession)
val df1 = dataSet.read("pcatalog.KAFKA_dataset", options)
df1.createGlobalTempView("tmp_abc123")
val resolvedSelectSQL = selectSQL.replace("pcatalog.KAFKA_dataset", "tmp_abc123")
val readDf: DataFrame = sparkSession.sql(resolvedSelectSQL)
dataSet.write("pcatalog.HIVE_dataset", readDf, options)
©2018 PayPal Inc. Confidential and proprietary.
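To make the factory pattern above concrete, here is a simplified, hypothetical sketch of how getFromFactory could dispatch a dataset's storage type to a connector. Only the connector names and getFromFactory come from the slide; the trait, constructors and option keys used here are illustrative assumptions, not the actual Gimel code:

import org.apache.spark.sql.{DataFrame, SparkSession}

// Unified read/write contract each connector implements (a simplified assumption).
trait GimelDataSet {
  def read(dataSetName: String, options: Map[String, Any]): DataFrame
  def write(dataToWrite: DataFrame, dataSetName: String, options: Map[String, Any]): DataFrame
}

// Two illustrative connectors; the real ones wrap SHC, ES-Spark, DataStax, etc.
class HiveDataSet(spark: SparkSession) extends GimelDataSet {
  def read(dataSetName: String, options: Map[String, Any]): DataFrame =
    spark.table(dataSetName)
  def write(df: DataFrame, dataSetName: String, options: Map[String, Any]): DataFrame = {
    df.write.mode("append").saveAsTable(dataSetName); df
  }
}

class KafkaDataSet(spark: SparkSession) extends GimelDataSet {
  def read(dataSetName: String, options: Map[String, Any]): DataFrame =
    spark.read.format("kafka")
      .option("kafka.bootstrap.servers", options("gimel.kafka.bootstrap.servers").toString)
      .option("subscribe", options("gimel.kafka.whitelist.topics").toString)
      .load()
  def write(df: DataFrame, dataSetName: String, options: Map[String, Any]): DataFrame = {
    // Spark's Kafka sink expects key/value columns; schema handling omitted for brevity.
    df.write.format("kafka")
      .option("kafka.bootstrap.servers", options("gimel.kafka.bootstrap.servers").toString)
      .option("topic", options("gimel.kafka.whitelist.topics").toString)
      .save()
    df
  }
}

// getFromFactory: resolve gimel.storage.type (from the catalog properties) to a connector.
object DataSetFactory {
  def getFromFactory(storageType: String, spark: SparkSession): GimelDataSet =
    storageType.toLowerCase match {
      case "hive"  => new HiveDataSet(spark)
      case "kafka" => new KafkaDataSet(spark)
      case other   => throw new IllegalArgumentException(s"No connector for storage type: $other")
    }
}

In the real Data API the storage type comes from CatalogProvider.getDataSetProperties, so callers only ever see dataSet.read and dataSet.write.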
Catalog Provider – USER | HIVE | PCATALOG | Your Own Catalog

set gimel.catalog.provider=PCATALOG
CatalogProvider.getDataSetProperties("dataSetName")   -- resolved via Metadata Services

set gimel.catalog.provider=USER
CatalogProvider.getDataSetProperties("dataSetName")

set gimel.catalog.provider=HIVE
CatalogProvider.getDataSetProperties("dataSetName")
sql> set dataSetProperties={
  "key.deserializer":"org.apache.kafka.common.serialization.StringDeserializer",
  "auto.offset.reset":"earliest",
  "gimel.kafka.checkpoint.zookeeper.host":"zookeeper:2181",
  "gimel.storage.type":"kafka",
  "gimel.kafka.whitelist.topics":"kafka_topic",
  "datasetName":"test_table1",
  "value.deserializer":"org.apache.kafka.common.serialization.ByteArrayDeserializer",
  "value.serializer":"org.apache.kafka.common.serialization.ByteArraySerializer",
  "gimel.kafka.checkpoint.zookeeper.path":"/pcatalog/kafka_consumer/checkpoint",
  "gimel.kafka.avro.schema.source":"CSR",
  "gimel.kafka.zookeeper.connection.timeout.ms":"10000",
  "gimel.kafka.avro.schema.source.url":"http://schema_registry:8081",
  "key.serializer":"org.apache.kafka.common.serialization.StringSerializer",
  "gimel.kafka.avro.schema.source.wrapper.key":"schema_registry_key",
  "gimel.kafka.bootstrap.servers":"localhost:9092"
}

sql> Select * from pcatalog.test_table1;
spark.sql("set gimel.catalog.provider=USER");
val dataSetOptions = DataSetProperties(
"KAFKA",
Array(Field("payload","string",true)) ,
Array(),
Map(
"datasetName" -> "test_table1",
"auto.offset.reset"-> "earliest",
"gimel.kafka.bootstrap.servers"-> "localhost:9092",
"gimel.kafka.avro.schema.source"-> "CSR",
"gimel.kafka.avro.schema.source.url"-> "http://schema_registry:8081",
"gimel.kafka.avro.schema.source.wrapper.key"-> "schema_registry_key",
"gimel.kafka.checkpoint.zookeeper.host"-> "zookeeper:2181",
"gimel.kafka.checkpoint.zookeeper.path"->
"/pcatalog/kafka_consumer/checkpoint",
"gimel.kafka.whitelist.topics"-> "kafka_topic",
"gimel.kafka.zookeeper.connection.timeout.ms"-> "10000",
"gimel.storage.type"-> "kafka",
"key.serializer"-> "org.apache.kafka.common.serialization.StringSerializer",
"value.serializer"-> "org.apache.kafka.common.serialization.ByteArraySerializer"
)
)
dataSet.read(”test_table1",Map("dataSetProperties"->dataSetOptions))
CREATE EXTERNAL TABLE `pcatalog.test_table1`
(payload string)
LOCATION 'hdfs://tmp/'
TBLPROPERTIES (
  "datasetName" -> "dummy",
  "auto.offset.reset" -> "earliest",
  "gimel.kafka.bootstrap.servers" -> "localhost:9092",
  "gimel.kafka.avro.schema.source" -> "CSR",
  "gimel.kafka.avro.schema.source.url" -> "http://schema_registry:8081",
  "gimel.kafka.avro.schema.source.wrapper.key" -> "schema_registry_key",
  "gimel.kafka.checkpoint.zookeeper.host" -> "zookeeper:2181",
  "gimel.kafka.checkpoint.zookeeper.path" -> "/pcatalog/kafka_consumer/checkpoint",
  "gimel.kafka.whitelist.topics" -> "kafka_topic",
  "gimel.kafka.zookeeper.connection.timeout.ms" -> "10000",
  "gimel.storage.type" -> "kafka",
  "key.serializer" -> "org.apache.kafka.common.serialization.StringSerializer",
  "value.serializer" -> "org.apache.kafka.common.serialization.ByteArraySerializer"
);

Spark-sql> Select * from pcatalog.test_table1

Scala> dataSet.read("test_table1", Map("dataSetProperties" -> dataSetOptions))
set gimel.catalog.provider=YOUR_CATALOG
CatalogProvider.getDataSetProperties("dataSetName")
{
  // Implement this!
}
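The "Implement this!" stub above can be backed by any catalog. Below is a hypothetical sketch of what a custom provider might look like, assuming a simplified DataSetProperties shape; the real Gimel signature shown earlier also carries fields and partition arrays, and the trait and class names here are illustrative, not the actual Gimel interface:

import scala.io.Source

// Assumed, simplified shape of the properties a provider returns for a dataset.
case class DataSetProperties(storageType: String, props: Map[String, String])

trait CatalogProvider {
  def getDataSetProperties(dataSetName: String): DataSetProperties
}

// Example: resolve dataset properties from one properties file per dataset name.
class FileCatalogProvider(baseDir: String) extends CatalogProvider {
  override def getDataSetProperties(dataSetName: String): DataSetProperties = {
    val lines = Source.fromFile(s"$baseDir/$dataSetName.properties").getLines()
    val props = lines.filter(_.contains("=")).map { line =>
      val Array(k, v) = line.split("=", 2)
      k.trim -> v.trim
    }.toMap
    // gimel.storage.type is the key the deck uses to identify the backing store.
    DataSetProperties(props.getOrElse("gimel.storage.type", "unknown"), props)
  }
}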
©2018 PayPal Inc. Confidential and proprietary.
Spark Thrift Server
org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala

// result = sqlContext.sql(statement)  <-- Original SQL execution
// Integration of Gimel in Spark
result = GimelQueryProcessor.executeBatch(statement, sqlContext.sparkSession)
Integration with ecosystems
class SparkSqlInterpreter(conf: SparkConf) extends SparkInterpreter(conf) {
  private val SCALA_MAGIC = "%%[sS][cC][aA][lL][aA] (.*)".r
  private val PCATALOG_BATCH_MAGIC = "%%[gG][iI][mM][eE][lL](.*)".r
  private val PCATALOG_STREAM_MAGIC = "%%[gG][iI][mM][eE][lL]-[sS][tT][rR][eE][aA][mM](.*)".r
  // ........
  // .....
  case PCATALOG_BATCH_MAGIC(gimelCode) => GimelQueryProcessor.executeBatch(gimelCode, sparkSession)
  case PCATALOG_STREAM_MAGIC(gimelCode) => GimelQueryProcessor.executeStream(gimelCode, sparkSession)
case _ =>
// ........
// .....
com/cloudera/livy/repl/SparkSqlInterpreter.scala
Livy REPL
sparkmagic/sparkmagic/kernels/sparkkernel/kernel.js

define(['base/js/namespace'], function(IPython){
  var onload = function() {
    IPython.CodeCell.config_defaults.highlight_modes['magic_text/x-sql'] = {'reg':[/^%%gimel/]};
  }
  return { onload: onload }
})
Jupyter Notebooks
©2018 PayPal Inc. Confidential and proprietary.
Data Stores Supported
©2018 PayPal Inc. Confidential and proprietary. 29
[Diagram: logos of supported systems, including REST datasets]
Acknowledgements
30
Acknowledgements
Gimel and PayPal Notebooks team:
Andrew Alves
Anisha Nainani
Ayushi Agarwal
Baskaran Gopalan
Dheeraj Rampally
Deepak Chandramouli
Laxmikant Patil
Meisam Fathi Salmi
Prabhu Kasinathan
Praveen Kanamarlapudi
Romit Mehta
Thilak Balasubramanian
Weijun Qian
31
Appendix
©2018 PayPal Inc. Confidential and proprietary. 32
References Used
Images referred:
https://www.google.com/search?q=big+data+stack+images&source=lnms&tbm=isch&sa=X&ved=0ahUKEwip1Jz3voPaAhUoxFQKHV33AsgQ_AUICigB&biw=1440&bih=799
©2018 PayPal Inc. Confidential and proprietary. 33
Spark Thrift Server - Integration
spark/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala

// result = sqlContext.sql(statement)  <-- Original SQL execution
// Integration of Gimel in Spark
result = GimelQueryProcessor.executeBatch(statement, sqlContext.sparkSession)
©2018 PayPal Inc. Confidential and proprietary.
Livy - Integration
class SparkSqlInterpreter(conf: SparkConf) extends SparkInterpreter(conf) {
private val SCALA_MAGIC = "%%[sS][cC][aA][lL][aA] (.*)".r
private val PCATALOG_BATCH_MAGIC = "%%[gG][iI][mM][eE][lL](.*)".r
private val PCATALOG_STREAM_MAGIC = "%%[gG][iI][mM][eE][lL]-[sS][tT][rR][eE][aA][mM](.*)".r
// ........
// .....
override def execute(code: String, outputPath: String): Interpreter.ExecuteResponse = {
require(sparkContext != null && sqlContext != null && sparkSession != null)
code match {
case SCALA_MAGIC(scalaCode) =>
super.execute(scalaCode, null)
case PCATALOG_BATCH_MAGIC(gimelCode) =>
Try {
GimelQueryProcessor.executeBatch(gimelCode, sparkSession)
} match {
case Success(x) => Interpreter.ExecuteSuccess(TEXT_PLAIN -> x)
case _ => Interpreter.ExecuteError("Failed", " ")
}
case PCATALOG_STREAM_MAGIC(gimelCode) =>
Try {
GimelQueryProcessor.executeStream(gimelCode, sparkSession)
} match {
case Success(x) => Interpreter.ExecuteSuccess(TEXT_PLAIN -> x)
case _ => Interpreter.ExecuteError("Failed", " ")
}
case _ =>
// ........
// .....
/repl/src/main/scala/com/cloudera/livy/repl/SparkSqlInterpreter.scala
©2018 PayPal Inc. Confidential and proprietary.
PayPal Notebooks (Jupyter) - Integration
def _scala_pcatalog_command(self, sql_context_variable_name):
    if sql_context_variable_name == u'spark':
        command = u'val output= {{import java.io.{{ByteArrayOutputStream, StringReader}};val outCapture = new ByteArrayOutputStream;Console.withOut(outCapture){{gimel.GimelQueryProcessor.executeBatch("""{}""",sparkSession)}}}}'.format(self.query)
    else:
        command = u'val output= {{import java.io.{{ByteArrayOutputStream, StringReader}};val outCapture = new ByteArrayOutputStream;Console.withOut(outCapture){{gimel.GimelQueryProcessor.executeBatch("""{}""",{})}}}}'.format(self.query, sql_context_variable_name)
    if self.samplemethod == u'sample':
        command = u'{}.sample(false, {})'.format(command, self.samplefraction)
    if self.maxrows >= 0:
        command = u'{}.take({})'.format(command, self.maxrows)
    else:
        command = u'{}.collect'.format(command)
    return Command(u'{}.foreach(println)'.format(command + ';\noutput'))
sparkmagic/sparkmagic/livyclientlib/sqlquery.py
sparkmagic/sparkmagic/kernels/sparkkernel/kernel.js
define(['base/js/namespace'], function(IPython){
var onload = function() {
IPython.CodeCell.config_defaults.highlight_modes['magic_text/x-sql'] =
{'reg':[/^%%sql/]};
IPython.CodeCell.config_defaults.highlight_modes['magic_text/x-python'] =
{'reg':[/^%%local/]};
IPython.CodeCell.config_defaults.highlight_modes['magic_text/x-sql'] =
{'reg':[/^%%gimel/]};}
return { onload: onload }
})
©2018 PayPal Inc. Confidential and proprietary.
Connectors | High level
©2018 PayPal Inc. Confidential and proprietary. 37
• Kafka (0.10.2): Batch & stream connectors – implementation from scratch.
• Elasticsearch (5.4.6): Connector | https://www.elastic.co/guide/en/elasticsearch/hadoop/5.4/spark.html. Additional implementations added in Gimel to support daily / monthly partitioned indexes in ES.
• Aerospike (3.1x): Read | Aerospike Spark Connector (Aerospark) is used to read data directly into a DataFrame (https://github.com/sasha-polev/aerospark). Write | the Aerospike native Java client Put API is used; for each partition of the DataFrame a client connection is established to write that partition's data to Aerospike.
• HBase (1.2): Connector | Hortonworks HBase Connector for Spark (SHC) – https://github.com/hortonworks-spark/shc
• Cassandra (2.x): Connector | DataStax connector – https://github.com/datastax/spark-cassandra-connector
• Hive (1.2): Leverages Spark APIs under the hood.
• Druid (0.82): Connector | Leverages Tranquility under the hood – https://github.com/druid-io/tranquility
• Teradata / Relational: Leverages the JDBC storage handler; supports batch reads/loads, FAST Load & FAST Export.
• Alluxio: Leverages cross-cluster access via reads using the Spark conf spark.yarn.access.namenodes
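As a rough illustration of the Aerospike write path described above (one client connection per DataFrame partition, writes via the native Java client's put API), here is a hedged sketch; the host, namespace, set and bin names are hypothetical:

import com.aerospike.client.{AerospikeClient, Bin, Key}
import org.apache.spark.sql.DataFrame

// For each partition, open one Aerospike client and write that partition's rows.
def writeToAerospike(df: DataFrame): Unit = {
  df.foreachPartition { rows: Iterator[org.apache.spark.sql.Row] =>
    val client = new AerospikeClient("aerospike-host", 3000) // hypothetical host/port
    try {
      rows.foreach { row =>
        val key = new Key("test", "flights", row.getAs[String]("flight_id")) // namespace, set, user key
        client.put(null, key,
          new Bin("carrier", row.getAs[String]("carrier")),
          new Bin("status", row.getAs[String]("status")))
      }
    } finally {
      client.close()
    }
  }
}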
More Related Content

What's hot

Agile Integration Workshop
Agile Integration WorkshopAgile Integration Workshop
Agile Integration WorkshopJudy Breedlove
 
[INFOGRAPHIC] Event-driven Business: How to Handle the Flow of Event Data
[INFOGRAPHIC] Event-driven Business: How to Handle the Flow of Event Data[INFOGRAPHIC] Event-driven Business: How to Handle the Flow of Event Data
[INFOGRAPHIC] Event-driven Business: How to Handle the Flow of Event Dataconfluent
 
Simplifying the OpenAPI Development Experience
Simplifying the OpenAPI Development Experience Simplifying the OpenAPI Development Experience
Simplifying the OpenAPI Development Experience confluent
 
Airline reservations and routing: a graph use case
Airline reservations and routing: a graph use caseAirline reservations and routing: a graph use case
Airline reservations and routing: a graph use caseDataWorks Summit
 
SnapLogic Raises $37.5M to Fuel Big Data Integration Push
SnapLogic Raises $37.5M to Fuel Big Data Integration PushSnapLogic Raises $37.5M to Fuel Big Data Integration Push
SnapLogic Raises $37.5M to Fuel Big Data Integration PushSnapLogic
 
Next gen tooling for building streaming analytics apps: code-less development...
Next gen tooling for building streaming analytics apps: code-less development...Next gen tooling for building streaming analytics apps: code-less development...
Next gen tooling for building streaming analytics apps: code-less development...DataWorks Summit
 
IT Modernization in Practice
IT Modernization in PracticeIT Modernization in Practice
IT Modernization in PracticeTom Diederich
 
Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...
Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...
Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...HostedbyConfluent
 
Keynote: Customer Journey with Streaming Data on AWS - Rahul Pathak, AWS
Keynote: Customer Journey with Streaming Data on AWS - Rahul Pathak, AWSKeynote: Customer Journey with Streaming Data on AWS - Rahul Pathak, AWS
Keynote: Customer Journey with Streaming Data on AWS - Rahul Pathak, AWSFlink Forward
 
A Solution for Leveraging Kafka to Provide End-to-End ACID Transactions
A Solution for Leveraging Kafka to Provide End-to-End ACID TransactionsA Solution for Leveraging Kafka to Provide End-to-End ACID Transactions
A Solution for Leveraging Kafka to Provide End-to-End ACID Transactionsconfluent
 
SnapLogic's Latest Elastic iPaaS Release Adds Hybrid Links for Spark, Cortana...
SnapLogic's Latest Elastic iPaaS Release Adds Hybrid Links for Spark, Cortana...SnapLogic's Latest Elastic iPaaS Release Adds Hybrid Links for Spark, Cortana...
SnapLogic's Latest Elastic iPaaS Release Adds Hybrid Links for Spark, Cortana...SnapLogic
 
VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...
VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...
VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...VoltDB
 
IoT and Microservice
IoT and MicroserviceIoT and Microservice
IoT and Microservicekgshukla
 
Introduction to Apache NiFi dws19 DWS - DC 2019
Introduction to Apache NiFi   dws19 DWS - DC 2019Introduction to Apache NiFi   dws19 DWS - DC 2019
Introduction to Apache NiFi dws19 DWS - DC 2019Timothy Spann
 
Choosing the Right Open Source Database
Choosing the Right Open Source DatabaseChoosing the Right Open Source Database
Choosing the Right Open Source DatabaseAll Things Open
 
Streaming Analytics - Comparison of Open Source Frameworks and Products
Streaming Analytics - Comparison of Open Source Frameworks and ProductsStreaming Analytics - Comparison of Open Source Frameworks and Products
Streaming Analytics - Comparison of Open Source Frameworks and ProductsKai Wähner
 
Why Integrating IBM Z into ServiceNow and Splunk Is So Important
Why Integrating IBM Z into ServiceNow and Splunk Is So ImportantWhy Integrating IBM Z into ServiceNow and Splunk Is So Important
Why Integrating IBM Z into ServiceNow and Splunk Is So ImportantPrecisely
 
Express Scripts: Driving Digital Transformation from Mainframe to Microservices
Express Scripts: Driving Digital Transformation from Mainframe to MicroservicesExpress Scripts: Driving Digital Transformation from Mainframe to Microservices
Express Scripts: Driving Digital Transformation from Mainframe to Microservicesconfluent
 

What's hot (20)

Agile Integration Workshop
Agile Integration WorkshopAgile Integration Workshop
Agile Integration Workshop
 
[INFOGRAPHIC] Event-driven Business: How to Handle the Flow of Event Data
[INFOGRAPHIC] Event-driven Business: How to Handle the Flow of Event Data[INFOGRAPHIC] Event-driven Business: How to Handle the Flow of Event Data
[INFOGRAPHIC] Event-driven Business: How to Handle the Flow of Event Data
 
Simplifying the OpenAPI Development Experience
Simplifying the OpenAPI Development Experience Simplifying the OpenAPI Development Experience
Simplifying the OpenAPI Development Experience
 
Airline reservations and routing: a graph use case
Airline reservations and routing: a graph use caseAirline reservations and routing: a graph use case
Airline reservations and routing: a graph use case
 
SnapLogic Raises $37.5M to Fuel Big Data Integration Push
SnapLogic Raises $37.5M to Fuel Big Data Integration PushSnapLogic Raises $37.5M to Fuel Big Data Integration Push
SnapLogic Raises $37.5M to Fuel Big Data Integration Push
 
Next gen tooling for building streaming analytics apps: code-less development...
Next gen tooling for building streaming analytics apps: code-less development...Next gen tooling for building streaming analytics apps: code-less development...
Next gen tooling for building streaming analytics apps: code-less development...
 
IT Modernization in Practice
IT Modernization in PracticeIT Modernization in Practice
IT Modernization in Practice
 
Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...
Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...
Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...
 
Keynote: Customer Journey with Streaming Data on AWS - Rahul Pathak, AWS
Keynote: Customer Journey with Streaming Data on AWS - Rahul Pathak, AWSKeynote: Customer Journey with Streaming Data on AWS - Rahul Pathak, AWS
Keynote: Customer Journey with Streaming Data on AWS - Rahul Pathak, AWS
 
Quantum metrics
Quantum metricsQuantum metrics
Quantum metrics
 
A Solution for Leveraging Kafka to Provide End-to-End ACID Transactions
A Solution for Leveraging Kafka to Provide End-to-End ACID TransactionsA Solution for Leveraging Kafka to Provide End-to-End ACID Transactions
A Solution for Leveraging Kafka to Provide End-to-End ACID Transactions
 
SnapLogic's Latest Elastic iPaaS Release Adds Hybrid Links for Spark, Cortana...
SnapLogic's Latest Elastic iPaaS Release Adds Hybrid Links for Spark, Cortana...SnapLogic's Latest Elastic iPaaS Release Adds Hybrid Links for Spark, Cortana...
SnapLogic's Latest Elastic iPaaS Release Adds Hybrid Links for Spark, Cortana...
 
VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...
VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...
VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...
 
IoT and Microservice
IoT and MicroserviceIoT and Microservice
IoT and Microservice
 
Introduction to Apache NiFi dws19 DWS - DC 2019
Introduction to Apache NiFi   dws19 DWS - DC 2019Introduction to Apache NiFi   dws19 DWS - DC 2019
Introduction to Apache NiFi dws19 DWS - DC 2019
 
Choosing the Right Open Source Database
Choosing the Right Open Source DatabaseChoosing the Right Open Source Database
Choosing the Right Open Source Database
 
Streaming Analytics - Comparison of Open Source Frameworks and Products
Streaming Analytics - Comparison of Open Source Frameworks and ProductsStreaming Analytics - Comparison of Open Source Frameworks and Products
Streaming Analytics - Comparison of Open Source Frameworks and Products
 
Why Integrating IBM Z into ServiceNow and Splunk Is So Important
Why Integrating IBM Z into ServiceNow and Splunk Is So ImportantWhy Integrating IBM Z into ServiceNow and Splunk Is So Important
Why Integrating IBM Z into ServiceNow and Splunk Is So Important
 
Big data ready Enterprise
Big data ready EnterpriseBig data ready Enterprise
Big data ready Enterprise
 
Express Scripts: Driving Digital Transformation from Mainframe to Microservices
Express Scripts: Driving Digital Transformation from Mainframe to MicroservicesExpress Scripts: Driving Digital Transformation from Mainframe to Microservices
Express Scripts: Driving Digital Transformation from Mainframe to Microservices
 

Similar to Gimel at Dataworks Summit San Jose 2018

QCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic PlatformQCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic PlatformDeepak Chandramouli
 
PayPal Notebooks at Jupytercon 2018
PayPal Notebooks at Jupytercon 2018PayPal Notebooks at Jupytercon 2018
PayPal Notebooks at Jupytercon 2018Romit Mehta
 
Gimel and PayPal Notebooks @ TDWI Leadership Summit Orlando
Gimel and PayPal Notebooks @ TDWI Leadership Summit OrlandoGimel and PayPal Notebooks @ TDWI Leadership Summit Orlando
Gimel and PayPal Notebooks @ TDWI Leadership Summit OrlandoRomit Mehta
 
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?SnapLogic
 
apidays LIVE Australia 2020 - Data with a Mission by Matt McLarty
apidays LIVE Australia 2020 -  Data with a Mission by Matt McLarty apidays LIVE Australia 2020 -  Data with a Mission by Matt McLarty
apidays LIVE Australia 2020 - Data with a Mission by Matt McLarty apidays
 
apidays LIVE Paris - Data with a mission: a COVID-19 API case study by Matt M...
apidays LIVE Paris - Data with a mission: a COVID-19 API case study by Matt M...apidays LIVE Paris - Data with a mission: a COVID-19 API case study by Matt M...
apidays LIVE Paris - Data with a mission: a COVID-19 API case study by Matt M...apidays
 
apidays LIVE New York 2021 - Simplify Open Policy Agent with Styra DAS by Tim...
apidays LIVE New York 2021 - Simplify Open Policy Agent with Styra DAS by Tim...apidays LIVE New York 2021 - Simplify Open Policy Agent with Styra DAS by Tim...
apidays LIVE New York 2021 - Simplify Open Policy Agent with Styra DAS by Tim...apidays
 
DEM07 Best Practices for Monitoring Amazon ECS Containers Launched with Fargate
DEM07 Best Practices for Monitoring Amazon ECS Containers Launched with FargateDEM07 Best Practices for Monitoring Amazon ECS Containers Launched with Fargate
DEM07 Best Practices for Monitoring Amazon ECS Containers Launched with FargateAmazon Web Services
 
Motadata - Unified Product Suite for IT Operations and Big Data Analytics
Motadata - Unified Product Suite for IT Operations and Big Data AnalyticsMotadata - Unified Product Suite for IT Operations and Big Data Analytics
Motadata - Unified Product Suite for IT Operations and Big Data Analyticsnovsela
 
Achieving digital transformation with Siebel CRM and Oracle Cloud
Achieving digital transformation with Siebel CRM and Oracle Cloud Achieving digital transformation with Siebel CRM and Oracle Cloud
Achieving digital transformation with Siebel CRM and Oracle Cloud Sonia Wadhwa
 
Ad hoc analytics with Cassandra and Spark
Ad hoc analytics with Cassandra and SparkAd hoc analytics with Cassandra and Spark
Ad hoc analytics with Cassandra and SparkMohammed Guller
 
How Trek10 Uses Datadog's Distributed Tracing to Improve AWS Lambda Projects ...
How Trek10 Uses Datadog's Distributed Tracing to Improve AWS Lambda Projects ...How Trek10 Uses Datadog's Distributed Tracing to Improve AWS Lambda Projects ...
How Trek10 Uses Datadog's Distributed Tracing to Improve AWS Lambda Projects ...Amazon Web Services
 
Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise
Pivotal Digital Transformation Forum: Journey to Become a Data-Driven EnterprisePivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise
Pivotal Digital Transformation Forum: Journey to Become a Data-Driven EnterpriseVMware Tanzu
 
Why You Need Manageability Now More than Ever and How to Get It
Why You Need Manageability Now More than Ever and How to Get ItWhy You Need Manageability Now More than Ever and How to Get It
Why You Need Manageability Now More than Ever and How to Get ItGustavo Rene Antunez
 
Glassbeam: Ad-hoc Analytics on Internet of Complex Things with Apache Cassand...
Glassbeam: Ad-hoc Analytics on Internet of Complex Things with Apache Cassand...Glassbeam: Ad-hoc Analytics on Internet of Complex Things with Apache Cassand...
Glassbeam: Ad-hoc Analytics on Internet of Complex Things with Apache Cassand...DataStax Academy
 
Office 365 Monitoring Best Practices
Office 365 Monitoring Best PracticesOffice 365 Monitoring Best Practices
Office 365 Monitoring Best PracticesThousandEyes
 
20181127 オラクル講演資料(DataRobot AI Experience Tokyo)
20181127 オラクル講演資料(DataRobot AI Experience Tokyo)20181127 オラクル講演資料(DataRobot AI Experience Tokyo)
20181127 オラクル講演資料(DataRobot AI Experience Tokyo)オラクルエンジニア通信
 

Similar to Gimel at Dataworks Summit San Jose 2018 (20)

QCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic PlatformQCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic Platform
 
PayPal Notebooks at Jupytercon 2018
PayPal Notebooks at Jupytercon 2018PayPal Notebooks at Jupytercon 2018
PayPal Notebooks at Jupytercon 2018
 
Gimel and PayPal Notebooks @ TDWI Leadership Summit Orlando
Gimel and PayPal Notebooks @ TDWI Leadership Summit OrlandoGimel and PayPal Notebooks @ TDWI Leadership Summit Orlando
Gimel and PayPal Notebooks @ TDWI Leadership Summit Orlando
 
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?
 
Scale By The Bay | 2020 | Gimel
Scale By The Bay | 2020 | GimelScale By The Bay | 2020 | Gimel
Scale By The Bay | 2020 | Gimel
 
apidays LIVE Australia 2020 - Data with a Mission by Matt McLarty
apidays LIVE Australia 2020 -  Data with a Mission by Matt McLarty apidays LIVE Australia 2020 -  Data with a Mission by Matt McLarty
apidays LIVE Australia 2020 - Data with a Mission by Matt McLarty
 
apidays LIVE Paris - Data with a mission: a COVID-19 API case study by Matt M...
apidays LIVE Paris - Data with a mission: a COVID-19 API case study by Matt M...apidays LIVE Paris - Data with a mission: a COVID-19 API case study by Matt M...
apidays LIVE Paris - Data with a mission: a COVID-19 API case study by Matt M...
 
Cloud Native with Kyma
Cloud Native with KymaCloud Native with Kyma
Cloud Native with Kyma
 
Top 5 Lessons Learned in Deploying AI in the Real World
Top 5 Lessons Learned in Deploying AI in the Real WorldTop 5 Lessons Learned in Deploying AI in the Real World
Top 5 Lessons Learned in Deploying AI in the Real World
 
apidays LIVE New York 2021 - Simplify Open Policy Agent with Styra DAS by Tim...
apidays LIVE New York 2021 - Simplify Open Policy Agent with Styra DAS by Tim...apidays LIVE New York 2021 - Simplify Open Policy Agent with Styra DAS by Tim...
apidays LIVE New York 2021 - Simplify Open Policy Agent with Styra DAS by Tim...
 
DEM07 Best Practices for Monitoring Amazon ECS Containers Launched with Fargate
DEM07 Best Practices for Monitoring Amazon ECS Containers Launched with FargateDEM07 Best Practices for Monitoring Amazon ECS Containers Launched with Fargate
DEM07 Best Practices for Monitoring Amazon ECS Containers Launched with Fargate
 
Motadata - Unified Product Suite for IT Operations and Big Data Analytics
Motadata - Unified Product Suite for IT Operations and Big Data AnalyticsMotadata - Unified Product Suite for IT Operations and Big Data Analytics
Motadata - Unified Product Suite for IT Operations and Big Data Analytics
 
Achieving digital transformation with Siebel CRM and Oracle Cloud
Achieving digital transformation with Siebel CRM and Oracle Cloud Achieving digital transformation with Siebel CRM and Oracle Cloud
Achieving digital transformation with Siebel CRM and Oracle Cloud
 
Ad hoc analytics with Cassandra and Spark
Ad hoc analytics with Cassandra and SparkAd hoc analytics with Cassandra and Spark
Ad hoc analytics with Cassandra and Spark
 
How Trek10 Uses Datadog's Distributed Tracing to Improve AWS Lambda Projects ...
How Trek10 Uses Datadog's Distributed Tracing to Improve AWS Lambda Projects ...How Trek10 Uses Datadog's Distributed Tracing to Improve AWS Lambda Projects ...
How Trek10 Uses Datadog's Distributed Tracing to Improve AWS Lambda Projects ...
 
Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise
Pivotal Digital Transformation Forum: Journey to Become a Data-Driven EnterprisePivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise
Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise
 
Why You Need Manageability Now More than Ever and How to Get It
Why You Need Manageability Now More than Ever and How to Get ItWhy You Need Manageability Now More than Ever and How to Get It
Why You Need Manageability Now More than Ever and How to Get It
 
Glassbeam: Ad-hoc Analytics on Internet of Complex Things with Apache Cassand...
Glassbeam: Ad-hoc Analytics on Internet of Complex Things with Apache Cassand...Glassbeam: Ad-hoc Analytics on Internet of Complex Things with Apache Cassand...
Glassbeam: Ad-hoc Analytics on Internet of Complex Things with Apache Cassand...
 
Office 365 Monitoring Best Practices
Office 365 Monitoring Best PracticesOffice 365 Monitoring Best Practices
Office 365 Monitoring Best Practices
 
20181127 オラクル講演資料(DataRobot AI Experience Tokyo)
20181127 オラクル講演資料(DataRobot AI Experience Tokyo)20181127 オラクル講演資料(DataRobot AI Experience Tokyo)
20181127 オラクル講演資料(DataRobot AI Experience Tokyo)
 

Recently uploaded

➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...amitlee9823
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Pooja Nehwal
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 

Recently uploaded (20)

➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 

Gimel at Dataworks Summit San Jose 2018

  • 2. Agenda ©2018 PayPal Inc. Confidential and proprietary. 2 • Introduction • PayPal’s Analytics Ecosystem • Why Gimel • Challenges in Analytics • Walk through simple use case • Gimel Open Source Journey
  • 3. About Us • Product manager, data processing products at PayPal • 20 years in data and analytics across networking, semi-conductors, telecom, security and fintech industries • Data warehouse developer, BI program manager, Data product manager romehta@paypal.com https://www.linkedin.com/in/romit-mehta/ ©2018 PayPal Inc. Confidential and proprietary. 3 Romit Mehta • Big data platform engineer at PayPal • 13 years in data engineering, 5 years in scalable solutions with big data • Developed several Spark-based solutions across NoSQL, Key-Value, Messaging, Document based & relational systems dmohanakumarchan@paypal.com https://www.linkedin.com/in/deepakmc/ Deepak Mohanakumar Chandramouli
  • 4. PayPal – Key Metrics and Analytics Ecosystem 4©2018 PayPal Inc. Confidential and proprietary.
  • 5. PayPal Big Data Platform 5 160+ PB Data 75,000+ YARN jobs/day One of the largest Aerospike, Teradata, Hortonworks and Oracle installations Compute supported: MR, Pig, Hive, Spark, Beam 13 prod clusters, 12 non- prod clusters GPU co-located with Hadoop
  • 6. 6 Developer Data scientist Analyst Operator Gimel SDK Notebooks PCatalog Data API Infrastructure services leveraged for elasticity and redundancy Multi-DC Public cloudPredictive resource allocation Logging Monitoring Alerting Security Application Lifecycle Management Compute Frameworkand APIs GimelData Platform User Experience andAccess R Studio BI tools
  • 8. Use case - Flights Cancelled
  • 9. 9 Kafka Teradata External HDFS / Hive Data Prep / Availability ProcessStream Ingest LoadExtract/Load Parquet/ORC/Text? Productionalize, Logging, Monitoring, Alerting, Auditing, Data Quality Data SourcesData Points Flights Events Airports Airlines Carrier Geography & Geo Tags Publish Use case challenges … ©2018 PayPal Inc. Confidential and proprietary. Analysis Real-time/ processed data
  • 10. ©2018 PayPal Inc. Confidential and proprietary. 10 Spark Read From Hbase Data Access Code is Cumbersome and Fragile
  • 11. ©2018 PayPal Inc. Confidential and proprietary. 11 Spark Read From Hbase Spark Read From Elastic Search Spark Read From AeroSpike Spark Read From Druid Data Access Code is Cumbersome and Fragile
  • 12. ©2018 PayPal Inc. Confidential and proprietary. 12 Datasets Challenges Data access tied to compute and data store versions Hard to find available data sets Storage-specific dataset creation results in duplication and increased latency No audit trail for dataset access No standards for on-boarding data sets for others to discover No statistics on data set usage and access trends Datasets
  • 13. ©2018 PayPal Inc. Confidential and proprietary. 13 High-friction Data Application Lifecycle Learn Code Optimize Build Deploy RunOnboarding Big Data Apps Learn Code Optimize Build Deploy RunCompute Engine Changed Learn Code Optimize Build Deploy RunCompute Version Upgraded Learn Code Optimize Build Deploy RunStorage API Changed Learn Code Optimize Build Deploy RunStorage Connector Upgraded Learn Code Optimize Build Deploy RunStorage Hosts Migrated Learn Code Optimize Build Deploy RunStorage Changed Learn Code Optimize Build Deploy Run*********************
  • 15. 15 API, PCatalog, Tools With Gimel & Notebooks ©2018 PayPal Inc. Confidential and proprietary. Kafka Teradata External HDFS/ Hive Data Prep / Availability ProcessIngest LoadExtract/Load Parquet/ORC/Text? Productionalize, Logging, Monitoring, Alerting, Auditing, Data QC Data SourcesData Points Flights Events Airports Airlines Carrier Geography & Geo Tags Analysis Publish Use case challenges - Simplified with Gimel
  • 16. ©2018 PayPal Inc. Confidential and proprietary. Spark Read From Hbase Spark Read From Elastic Search Spark Read From AeroSpike Spark Read From Druid With Data API ✔ Data Access Simplified with Gimel Data API 16
  • 17. ©2018 PayPal Inc. Confidential and proprietary. Spark Read From Hbase Spark Read From Elastic Search Spark Read From AeroSpike Spark Read From Druid With Data API ✔ SQL Support in Gimel Data Platform 17
  • 18. ©2018 PayPal Inc. Confidential and proprietary. 18 Data Application Lifecycle with Data API Learn Code Optimize Build Deploy RunOnboarding Big Data Apps RunCompute Engine Changed Compute Version Upgraded Storage API Changed Storage Connector Upgraded Storage Hosts Migrated Storage Changed ********************* Run Run Run Run Run Run
  • 19. Open Source 19©2018 PayPal Inc. Confidential and proprietary.
  • 20. Gimel Open Source Journey • Open source Gimel PCatalog: • Metadata services • Discovery services • Catalog UI • Open source Compute Framework (SCaaS) • Livy features and enhancements • Monitoring and alerting • SDK and Gimel integration • Open source PayPal Notebooks • Jupyter features and enhancements • Gimel integration ©2018 PayPal Inc. Confidential and proprietary. • Open sourced Gimel Data API in April 2018 (http://try.gimel.io)
  • 21. Gimel - Open Sourced Codebase available: https://github.com/paypal/gimel Slack: https://gimel-dev.slack.com Google Groups: https://groups.google.com/d/forum/gimel-dev ©2017 PayPal Inc. Confidential and proprietary. 21
  • 22. Q&A G i t h u b : h t t p : / / g i m e l . i o Tr y i t y o u r s e l f : h t t p : / / t r y. g i m e l . i o S l a c k : h t t p s : / / g i m e l - d e v. s l a c k . c o m G o o g l e G r o u p s : h t t p s : / / g r o u p s . g o o g l e . c o m / d / f o r u m / g i m e l - d e v 22
  • 23. Gimel – Deep Dive 23
  • 24. Job LIVY GRID Job Server Batch Livy API NAS Batch In InIn Interactive Sparkling Water Interactive Interactive Metrics History Server Thrift Server In InIn Interactive Interactive Log Log Indexing Search xDiscovery Maintain Catalog Scan Discover Metadata Services PCatalog UI Explore Configure Log Indexing Search PayPal Analytics Ecosystem ©2018 PayPal Inc. Confidential and proprietary.
  • 25. ©2018 PayPal Inc. Confidential and proprietary. 25 A peek into Streaming SQL Launches … Spark Streaming App --StreamingWindowSeconds setgimel.kafka.throttle.streaming.window.seconds=10; --Throttling setgimel.kafka.throttle.streaming.maxRatePerPartition=1500; --ZK checkpoint rootpath setgimel.kafka.consumer.checkpoint.root=/checkpoints/appname; --Checkpoint enablingflag -implicitlycheckpoints aftereach mini-batch in streaming setgimel.kafka.reader.checkpoint.save.enabled=true; --Jupyter MagicforstreamingSQLon Notebooks | Interactive Usecases --LivyREPL-Same magicforstreamingSQLworks | Streaming Usecases %%gimel-stream --AssumePre-SplitHBASETable as anexample insertintopcatalog.HBASE_dataset select cust_id, kafka_ds.* frompcatalog.KAFKA_dataset kafka_ds; Batch SQL Launches … Spark Batch App --Establish10 concurrent connections perTopic-Partition setgimel.kafka.throttle.batch.parallelsPerPartition=10; --Fetchat max-10 M messagesfromeach partition setgimel.kafka.throttle.batch.maxRecordsPerPartition=10,000,000; --Jupyter Magicon Notebooks | Interactive Usecases --LivyREPL-Same magicworks| Batch Usecases %%gimel insertintopcatalog.HIVE_dataset partition(yyyy,mm,dd,hh,mi) selectkafka_ds.*,gimel_load_id ,substr(commit_timestamp,1,4)as yyyy ,substr(commit_timestamp,6,2)as mm ,substr(commit_timestamp,9,2)as dd ,substr(commit_timestamp,12,2)as hh ,case when cast(substr(commit_timestamp,15,2)asINT) <= 30then "00" else "30" end asmi from pcatalog.KAFKA_dataset kafka_ds; Following are Jupyter/Livy Magic terms • %%gimel : calls gimel.executeBatch(sql) • %%gimel-stream : calls gimel.executeStream(sql)
  • 26. gimel.dataset.factory { KafkaDataSet ElasticSearchDataSet DruidDataSet HiveDataSet AerospikeDataSet HbaseDataSet CassandraDataSet JDBCDataSet } Metadata Services dataSet.read(“dataSetName”,options) dataSet.write(dataToWrite,”dataSetName”,options) dataStream.read(“dataSetName”, options) valstorageDataSet =getFromFactory(type=“Hive”) { Core Connector Implementation, example –Kafka Combination ofOpen SourceConnector and In-house implementations Open source connector such asDataStax/SHC /ES-Spark } & Anatomy of API gimel.datastream.factory{ KafkaDataStream } CatalogProvider.getDataSetProperties(“dataSetName”) valstorageDataStream= getFromStreamFactory(type=“kafka”) kafkaDataSet.read(“dataSetName”,options) hiveDataSet.write(dataToWrite,”dataSetName”,options) storageDataStream.read(“dataSetName”,options) dataSet.write(”pcatalog.HIVE_dataset”,readDf, options) val dataSet :gimel.DataSet =DataSet(sparkSession) valdf1 =dataSet.read(“pcatalog.KAFKA_dataset”, options); df1.createGlobalTempView(“tmp_abc123”) Val resolvedSelectSQL= selectSQL.replace(“pcatalog.KAFKA_dataset”,”tmp_abc123”) Val readDf : DataFrame= sparkSession.sql(resolvedSelectSQL); selectkafka_ds.*,gimel_load_id ,substr(commit_timestamp,1,4)as yyyy ,substr(commit_timestamp,6,2)as mm ,substr(commit_timestamp,9,2)as dd ,substr(commit_timestamp,12,2)as hh frompcatalog.KAFKA_dataset kafka_ds join default.geo_lkp lkp on kafka_ds.zip =geo_lkp.zip where geo_lkp.region = ‘MIDWEST’ %%gimel insertintopcatalog.HIVE_dataset partition(yyyy,mm,dd,hh,mi) --Establish10 concurrent connections perTopic-Partition setgimel.kafka.throttle.batch.parallelsPerPartition=10; --Fetch at max -10 M messagesfromeach partition setgimel.kafka.throttle.batch.maxRecordsPerPartition=10,000,000; ©2018 PayPal Inc. Confidential and proprietary.
  • 27. Catalog Provider - USER | HIVE | PCATALOG | Your Own Catalog

PCATALOG - dataset properties come from the Gimel Metadata Services:

    set gimel.catalog.provider=PCATALOG;
    CatalogProvider.getDataSetProperties("dataSetName")

USER - dataset properties are supplied inline by the user:

    set gimel.catalog.provider=USER;
    CatalogProvider.getDataSetProperties("dataSetName")

SQL usage:

    sql> set dataSetProperties={
      "key.deserializer":"org.apache.kafka.common.serialization.StringDeserializer",
      "auto.offset.reset":"earliest",
      "gimel.kafka.checkpoint.zookeeper.host":"zookeeper:2181",
      "gimel.storage.type":"kafka",
      "gimel.kafka.whitelist.topics":"kafka_topic",
      "datasetName":"test_table1",
      "value.deserializer":"org.apache.kafka.common.serialization.ByteArrayDeserializer",
      "value.serializer":"org.apache.kafka.common.serialization.ByteArraySerializer",
      "gimel.kafka.checkpoint.zookeeper.path":"/pcatalog/kafka_consumer/checkpoint",
      "gimel.kafka.avro.schema.source":"CSR",
      "gimel.kafka.zookeeper.connection.timeout.ms":"10000",
      "gimel.kafka.avro.schema.source.url":"http://schema_registry:8081",
      "key.serializer":"org.apache.kafka.common.serialization.StringSerializer",
      "gimel.kafka.avro.schema.source.wrapper.key":"schema_registry_key",
      "gimel.kafka.bootstrap.servers":"localhost:9092"
    }
    sql> select * from pcatalog.test_table1;

Scala usage:

    spark.sql("set gimel.catalog.provider=USER")
    val dataSetOptions = DataSetProperties(
      "KAFKA",
      Array(Field("payload", "string", true)),
      Array(),
      Map(
        "datasetName" -> "test_table1",
        "auto.offset.reset" -> "earliest",
        "gimel.kafka.bootstrap.servers" -> "localhost:9092",
        "gimel.kafka.avro.schema.source" -> "CSR",
        "gimel.kafka.avro.schema.source.url" -> "http://schema_registry:8081",
        "gimel.kafka.avro.schema.source.wrapper.key" -> "schema_registry_key",
        "gimel.kafka.checkpoint.zookeeper.host" -> "zookeeper:2181",
        "gimel.kafka.checkpoint.zookeeper.path" -> "/pcatalog/kafka_consumer/checkpoint",
        "gimel.kafka.whitelist.topics" -> "kafka_topic",
        "gimel.kafka.zookeeper.connection.timeout.ms" -> "10000",
        "gimel.storage.type" -> "kafka",
        "key.serializer" -> "org.apache.kafka.common.serialization.StringSerializer",
        "value.serializer" -> "org.apache.kafka.common.serialization.ByteArraySerializer"
      )
    )
    dataSet.read("test_table1", Map("dataSetProperties" -> dataSetOptions))

HIVE - dataset properties are read from a Hive external table's TBLPROPERTIES:

    set gimel.catalog.provider=HIVE;
    CatalogProvider.getDataSetProperties("dataSetName")

    CREATE EXTERNAL TABLE `pcatalog.test_table1` (payload string)
    LOCATION 'hdfs://tmp/'
    TBLPROPERTIES (
      "datasetName" = "dummy",
      "auto.offset.reset" = "earliest",
      "gimel.kafka.bootstrap.servers" = "localhost:9092",
      "gimel.kafka.avro.schema.source" = "CSR",
      "gimel.kafka.avro.schema.source.url" = "http://schema_registry:8081",
      "gimel.kafka.avro.schema.source.wrapper.key" = "schema_registry_key",
      "gimel.kafka.checkpoint.zookeeper.host" = "zookeeper:2181",
      "gimel.kafka.checkpoint.zookeeper.path" = "/pcatalog/kafka_consumer/checkpoint",
      "gimel.kafka.whitelist.topics" = "kafka_topic",
      "gimel.kafka.zookeeper.connection.timeout.ms" = "10000",
      "gimel.storage.type" = "kafka",
      "key.serializer" = "org.apache.kafka.common.serialization.StringSerializer",
      "value.serializer" = "org.apache.kafka.common.serialization.ByteArraySerializer"
    );

    spark-sql> select * from pcatalog.test_table1
    scala> dataSet.read("test_table1", Map("dataSetProperties" -> dataSetOptions))

YOUR_CATALOG - plug in your own catalog:

    set gimel.catalog.provider=YOUR_CATALOG;
    CatalogProvider.getDataSetProperties("dataSetName") { // Implement this! }

©2018 PayPal Inc. Confidential and proprietary.
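The "Your Own Catalog" hook above only needs getDataSetProperties to return the properties for a dataset name. A minimal, hypothetical sketch of such a provider backed by an in-memory map; the object name and method signature are illustrative only, not Gimel's actual CatalogProvider contract.

    // Hypothetical custom catalog provider; all names here are illustrative.
    object MyCatalogProvider {
      private val catalog: Map[String, Map[String, String]] = Map(
        "pcatalog.test_table1" -> Map(
          "gimel.storage.type"             -> "kafka",
          "gimel.kafka.bootstrap.servers"  -> "localhost:9092",
          "gimel.kafka.whitelist.topics"   -> "kafka_topic",
          "gimel.kafka.avro.schema.source" -> "CSR"
        )
      )

      // Return the dataset's properties; an empty map signals "unknown dataset".
      def getDataSetProperties(dataSetName: String): Map[String, String] =
        catalog.getOrElse(dataSetName, Map.empty)
    }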
  • 28. Integration with ecosystems

Spark Thrift Server
org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala

    // result = sqlContext.sql(statement)   <-- original SQL execution
    // Integration of Gimel in Spark:
    result = GimelQueryProcessor.executeBatch(statement, sqlContext.sparkSession)

Livy REPL
com/cloudera/livy/repl/SparkSqlInterpreter.scala

    class SparkSqlInterpreter(conf: SparkConf) extends SparkInterpreter(conf) {
      private val SCALA_MAGIC = "%%[sS][cC][aA][lL][aA] (.*)".r
      private val PCATALOG_BATCH_MAGIC = "%%[gG][iI][mM][eE][lL](.*)".r
      private val PCATALOG_STREAM_MAGIC = "%%[gG][iI][mM][eE][lL]-[sS][tT][rR][eE][aA][mM](.*)".r
      // ...
      case PCATALOG_BATCH_MAGIC(gimelCode)  => GimelQueryProcessor.executeBatch(gimelCode, sparkSession)
      case PCATALOG_STREAM_MAGIC(gimelCode) => GimelQueryProcessor.executeStream(gimelCode, sparkSession)
      case _ => // ...
    }

Jupyter Notebooks
sparkmagic/sparkmagic/kernels/sparkkernel/kernel.js

    define(['base/js/namespace'], function(IPython){
      var onload = function() {
        IPython.CodeCell.config_defaults.highlight_modes['magic_text/x-sql'] = {'reg': [/^%%gimel/]};
      };
      return { onload: onload };
    })

©2018 PayPal Inc. Confidential and proprietary.
  • 29. Data Stores Supported (diagram of supported systems, REST services and datasets). ©2018 PayPal Inc. Confidential and proprietary. 29
  • 31. Acknowledgements Gimel and PayPal Notebooks team: Andrew Alves Anisha Nainani Ayushi Agarwal Baskaran Gopalan Dheeraj Rampally Deepak Chandramouli Laxmikant Patil Meisam Fathi Salmi Prabhu Kasinathan Praveen Kanamarlapudi Romit Mehta Thilak Balasubramanian Weijun Qian 31
  • 32. Appendix 32©2018 PayPal Inc. Confidential and proprietary.
  • 33. References - Images referred to: https://www.google.com/search?q=big+data+stack+images&source=lnms&tbm=isch&sa=X&ved=0ahUKEwip1Jz3voPaAhUoxFQKHV33AsgQ_AUICigB&biw=1440&bih=799 ©2018 PayPal Inc. Confidential and proprietary. 33
  • 34. Spark Thrift Server - Integration
spark/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala

    // result = sqlContext.sql(statement)   <-- original SQL execution
    // Integration of Gimel in Spark:
    result = GimelQueryProcessor.executeBatch(statement, sqlContext.sparkSession)

©2018 PayPal Inc. Confidential and proprietary.
  • 35. Livy - Integration
/repl/src/main/scala/com/cloudera/livy/repl/SparkSqlInterpreter.scala

    class SparkSqlInterpreter(conf: SparkConf) extends SparkInterpreter(conf) {
      private val SCALA_MAGIC = "%%[sS][cC][aA][lL][aA] (.*)".r
      private val PCATALOG_BATCH_MAGIC = "%%[gG][iI][mM][eE][lL](.*)".r
      private val PCATALOG_STREAM_MAGIC = "%%[gG][iI][mM][eE][lL]-[sS][tT][rR][eE][aA][mM](.*)".r
      // ...
      override def execute(code: String, outputPath: String): Interpreter.ExecuteResponse = {
        require(sparkContext != null && sqlContext != null && sparkSession != null)
        code match {
          case SCALA_MAGIC(scalaCode) => super.execute(scalaCode, null)
          case PCATALOG_BATCH_MAGIC(gimelCode) =>
            Try { GimelQueryProcessor.executeBatch(gimelCode, sparkSession) } match {
              case Success(x) => Interpreter.ExecuteSuccess(TEXT_PLAIN -> x)
              case _ => Interpreter.ExecuteError("Failed", " ")
            }
          case PCATALOG_STREAM_MAGIC(gimelCode) =>
            Try { GimelQueryProcessor.executeStream(gimelCode, sparkSession) } match {
              case Success(x) => Interpreter.ExecuteSuccess(TEXT_PLAIN -> x)
              case _ => Interpreter.ExecuteError("Failed", " ")
            }
          case _ => // ...
        }
      }
      // ...
    }

©2018 PayPal Inc. Confidential and proprietary.
  • 36. PayPal Notebooks (Jupyter) - Integration
sparkmagic/sparkmagic/livyclientlib/sqlquery.py

    def _scala_pcatalog_command(self, sql_context_variable_name):
        if sql_context_variable_name == u'spark':
            command = u'val output = {{import java.io.{{ByteArrayOutputStream, StringReader}};val outCapture = new ByteArrayOutputStream;Console.withOut(outCapture){{gimel.GimelQueryProcessor.executeBatch("""{}""", sparkSession)}}}}'.format(self.query)
        else:
            command = u'val output = {{import java.io.{{ByteArrayOutputStream, StringReader}};val outCapture = new ByteArrayOutputStream;Console.withOut(outCapture){{gimel.GimelQueryProcessor.executeBatch("""{}""", {})}}}}'.format(self.query, sql_context_variable_name)
        if self.samplemethod == u'sample':
            command = u'{}.sample(false, {})'.format(command, self.samplefraction)
        if self.maxrows >= 0:
            command = u'{}.take({})'.format(command, self.maxrows)
        else:
            command = u'{}.collect'.format(command)
        return Command(u'{}.foreach(println)'.format(command + ';\noutput'))

sparkmagic/sparkmagic/kernels/sparkkernel/kernel.js

    define(['base/js/namespace'], function(IPython){
      var onload = function() {
        IPython.CodeCell.config_defaults.highlight_modes['magic_text/x-sql'] = {'reg': [/^%%sql/]};
        IPython.CodeCell.config_defaults.highlight_modes['magic_text/x-python'] = {'reg': [/^%%local/]};
        IPython.CodeCell.config_defaults.highlight_modes['magic_text/x-sql'] = {'reg': [/^%%gimel/]};
      };
      return { onload: onload };
    })

©2018 PayPal Inc. Confidential and proprietary.
  • 37. Connectors | High level

Storage                | Version | API implementation
Kafka                  | 0.10.2  | Batch & stream connectors implemented from scratch
Elastic Search         | 5.4.6   | Connector: https://www.elastic.co/guide/en/elasticsearch/hadoop/5.4/spark.html. Additional implementations added in Gimel to support daily/monthly partitioned indexes in ES
Aerospike              | 3.1x    | Read: Aerospike Spark Connector (Aerospark) reads data directly into a DataFrame (https://github.com/sasha-polev/aerospark). Write: Aerospike native Java client Put API; for each partition of the DataFrame a client connection is established to write that partition to Aerospike
HBASE                  | 1.2     | Connector: Hortonworks HBASE Connector for Spark (SHC), https://github.com/hortonworks-spark/shc
Cassandra              | 2.x     | Connector: DataStax connector, https://github.com/datastax/spark-cassandra-connector
HIVE                   | 1.2     | Leverages Spark APIs under the hood
Druid                  | 0.82    | Connector: leverages Tranquility under the hood, https://github.com/druid-io/tranquility
Teradata / Relational  |         | Leverages the JDBC storage handler; supports batch reads/loads, FastLoad & FastExport
Alluxio                |         | Leverages cross-cluster access via reads, using the Spark conf spark.yarn.access.namenodes

©2018 PayPal Inc. Confidential and proprietary. 37
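The table above maps each store to its connector. A minimal sketch of the kind of storage-type dispatch the dataset factory performs, in the spirit of gimel.dataset.factory on the anatomy slide; the trait and class names here are illustrative placeholders, not Gimel's actual classes.

    import org.apache.spark.sql.{DataFrame, SparkSession}

    // Illustrative contract a store-specific dataset would satisfy.
    trait StorageDataSet {
      def read(dataset: String, options: Map[String, String]): DataFrame
    }

    class KafkaSet(spark: SparkSession) extends StorageDataSet {
      // Would wrap the from-scratch Kafka batch connector listed in the table.
      def read(dataset: String, options: Map[String, String]): DataFrame = ???
    }

    class JdbcSet(spark: SparkSession) extends StorageDataSet {
      // Would wrap the JDBC storage handler (Teradata FastLoad/FastExport support).
      def read(dataset: String, options: Map[String, String]): DataFrame = ???
    }

    object DataSetFactory {
      // Dispatch on the catalogued gimel.storage.type, e.g. "kafka" or "teradata".
      def getFromFactory(storageType: String, spark: SparkSession): StorageDataSet =
        storageType.toLowerCase match {
          case "kafka"             => new KafkaSet(spark)
          case "teradata" | "jdbc" => new JdbcSet(spark)
          case other               => throw new IllegalArgumentException(s"Unsupported store: $other")
        }
    }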