Site | https://www.infoq.com/qconai2018/
Youtube | https://www.youtube.com/watch?v=2h0biIli2F4&t=19s
At PayPal, data engineers, analysts, and data scientists work with a variety of data sources (Messaging, NoSQL, RDBMS, Documents, TSDB), compute engines (Spark, Flink, Beam, Hive), languages (Scala, Python, SQL), and execution models (stream, batch, interactive).
Due to this complex matrix of technologies and thousands of datasets, engineers spend considerable time learning about different data sources, formats, programming models, APIs, optimizations, etc., which lengthens time-to-market (TTM). To solve this problem and to make product development more effective, PayPal Data Platform developed "Gimel", a unified analytics data platform that provides access to any storage through a single unified data API and SQL, both powered by a centralized data catalog.
In this session, we will introduce you to the various components of Gimel - Compute Platform, Data API, PCatalog, GSQL and Notebooks. We will provide a demo depicting how Gimel reduces TTM by helping our engineers write a single line of code to access any storage without knowing the complexity behind the scenes.
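The idea of one read call backed by a central catalog can be illustrated with a minimal sketch. This is not Gimel's actual API: the names `UnifiedDataAPI`, `CatalogEntry`, the reader functions, and the dataset names are all hypothetical, and the "storage systems" are in-memory dictionaries standing in for a messaging system and an RDBMS. It only shows how a catalog lookup lets the caller use the same call for every storage type.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical catalog record: which storage backs a logical dataset."""
    storage_type: str  # e.g. "messaging" or "rdbms"
    location: str      # storage-specific address (topic, table, ...)


# In-memory stand-ins for real storage systems (assumption for this sketch).
FAKE_MESSAGING: Dict[str, List[Row]] = {
    "payments-topic": [{"txn_id": 1, "amount": 25.0}],
}
FAKE_RDBMS: Dict[str, List[Row]] = {
    "crm.customers": [{"customer_id": 7, "name": "Ada"}],
}


def read_messaging(location: str) -> List[Row]:
    # A real reader would consume from a topic; here we copy a list.
    return list(FAKE_MESSAGING[location])


def read_rdbms(location: str) -> List[Row]:
    # A real reader would run a query; here we copy a list.
    return list(FAKE_RDBMS[location])


class UnifiedDataAPI:
    """Resolves a logical dataset name via the catalog, then dispatches
    to the reader registered for that dataset's storage type."""

    def __init__(
        self,
        catalog: Dict[str, CatalogEntry],
        readers: Dict[str, Callable[[str], List[Row]]],
    ) -> None:
        self._catalog = catalog
        self._readers = readers

    def read(self, dataset_name: str) -> List[Row]:
        entry = self._catalog[dataset_name]          # catalog lookup
        reader = self._readers[entry.storage_type]   # pick storage reader
        return reader(entry.location)


catalog = {
    "catalog.payments": CatalogEntry("messaging", "payments-topic"),
    "catalog.customers": CatalogEntry("rdbms", "crm.customers"),
}
readers = {"messaging": read_messaging, "rdbms": read_rdbms}
data = UnifiedDataAPI(catalog, readers)

# The caller writes the same single line regardless of the storage behind it.
payments = data.read("catalog.payments")
customers = data.read("catalog.customers")
```

The caller never names the storage system; moving a dataset to a different store would only change its catalog entry, not the calling code.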
6. PayPal Big Data Platform
13 prod clusters, 12 non-prod clusters
GPU co-located with Hadoop
150+ PB data
40,000+ YARN jobs/day
One of the largest Aerospike, Teradata, Hortonworks and Oracle installations
Compute supported: MR, Pig, Hive, Spark, Beam
8. [Architecture diagram]
Users: Developer, Data scientist, Analyst, Operator
Layers: User Experience and Access; Gimel Data Platform; Compute Framework and APIs; Application Lifecycle Management
Components: Gimel SDK, Notebooks, R Studio, BI tools, PCatalog, Data API
Logging, Monitoring, Alerting, Security
Infrastructure services leveraged for elasticity and redundancy: Multi-DC, Public cloud, Predictive resource allocation
47. Q&A
(10:55 AM) Gimel Codelabs: http://try.gimel.io
Slack: https://gimel-dev.slack.com
Google Groups: https://groups.google.com/d/forum/gimel-dev