Gimel is PayPal's data platform that provides a unified interface for accessing and analyzing data across different data stores and processing engines. The presentation provides an overview of Gimel, including PayPal's analytics ecosystem, the challenges Gimel addresses around data access and application lifecycle, and a demo of how Gimel simplifies a flights cancelled use case. It also discusses Gimel's open source journey and integration with ecosystems like Spark and Jupyter notebooks.
5. PayPal Big Data Platform
5
160+ PB Data
75,000+ YARN
jobs/day
One of the largest
Aerospike,
Teradata,
Hortonworks and
Oracle installations
Compute supported:
MR, Pig, Hive, Spark,
Beam
13 prod clusters, 12 non-
prod clusters
GPU co-located with
Hadoop
6. 6
Developer Data scientist Analyst Operator
Gimel SDK Notebooks
PCatalog Data API
Infrastructure services leveraged for elasticity and redundancy
Multi-DC Public cloudPredictive resource allocation
Logging
Monitoring
Alerting
Security
Application
Lifecycle
Management
Compute
Frameworkand
APIs
GimelData
Platform
User
Experience
andAccess
R Studio BI tools
22. Q&A
G i t h u b : h t t p : / / g i m e l . i o
Tr y i t y o u r s e l f : h t t p : / / t r y. g i m e l . i o
S l a c k : h t t p s : / / g i m e l - d e v. s l a c k . c o m
G o o g l e G r o u p s : h t t p s : / / g r o u p s . g o o g l e . c o m / d / f o r u m / g i m e l - d e v
22