SlideShare une entreprise Scribd logo
1  sur  31
Télécharger pour lire hors ligne
Improving Mobile
Payments with Real time
Spark
● Madhukara Phatak
● Big data consultant and
trainer at datamantra.io
● Consult in Hadoop, Spark
and Scala
● www.madhukaraphatak.com
Agenda
● Mobile as drive for big data
● Our customer solution
● Existing data solution
● Improved solution
● Technical details
● Future enhancements
● Q & A
Mobile as Big data drive
● Mobile has changed the way in which we interact with
world
● Most of the buy/sell happens on mobile today
○ Myntra went fully mobile
○ Flipkart and amazon say their 50% buy happens on
mobile
○ Quikr and OLX is mobile based selling platform
○ Ola etc
Challenges in Mobile
● Customers expect the service to available 24/7
● Tiny screens make very challenging to typical software
flows
● Flaky connectivity of mobile networks makes it tougher
● Constant moving results in drop in interactions
● No more downtime
● Everything has to be done in realtime
Mobile payments
● Almost every app earlier mentioned needs some kind of
payment
● Getting payments right on mobile is very hard
● Globally 21% of online shoppers abandon their basket
due to payment failures or delays
● Some companies are building sdk’s to help the app
developers
● Our customer is one of them
Why mobile payments are hard?
Too many inputs
Terrible interface by Banks
OTP vs Password
Our customer solution
Our customer solution
● Mobile sdk for applications simplify the payments
● SDK provides better user interface like big buttons to
generate OTP or other flows
● SDK also helps in filling up different kind of forms given
by different banks using consistent UI
● Better user experience across applications
● Application sends anonymous payments details across
apps to our customer servers
Some numbers
● 40 + customers
● Over 1 million transactions per month as per March
● Around 55% success rate ( 5 % above average)
● Supports major banks, payment gateways and wallet
providers
● Soon will be available in other than mobile payment
space
Why data matters?
● As number of transaction increases, things will go
wrong
● There are so many different combinations to go wrong
● Example
○ Airtel OTP failing with state bank netbanking
○ Customers stuck in password page
○ Not able to read OTP from some specific
● Understanding customer pain and reacting to it is
paramount
● Every help results in payment
Initial BI solution
Events
Hourly
Push
JSON
Data
S3FS
Session Wise
Aggregations
Initial BI solution
● Phone sdk pushes events like transaction initiation,
payment complete to logging servers
● Logging servers roll log for every one hour and push to
s3
● A single node spark machine aggregates data by
sessions and pushes it to mysql
● Google BigQuery is used for adhoc querying
Challenges with BI solution
● Batch processing
● Geared towards more of report generation oriented flow
● Very minimal use of Spark API’s as team was not well
aware of it’s potential
● No integration with mobile sdk for feed back loop
Requirements for consulting
● Bring the same reporting calculations to real time
● Understanding the user behaviour and tracking his/her
flow over a session
● Closing the loop by providing automatic alerts based on
the metric calculations
● Some new specific business cases like loyalty
management etc
● Improving team expertise on spark
Choosing Spark streaming
● Company was already invested in Spark so spark
streaming was no brainer
● Also porting spark batch code to streaming was mostly
straight forward as both talk same API
● Company used python as Spark API language which
was supported by streaming also
● So we didn't consider storm we went ahead with Spark
streaming
First version
Events
Five Minute Push
JSON
Data
FileStream
Session Wise
Aggregations
First version
● We used fileStream API of spark streaming which
allowed us to poll a s3 bucket for every few mins
● A new rolling appender was added to log servers to
push logs to s3 every 5 mins
● Exact same batch code was used for calculations which
made transition very easy
● All downstream applications remained same
Second version
Events
JSON
Data
Session Wise
Aggregations
Hourly
Push Realtime
Amazon Kinesis
● A kafka like distributed message queue by Amazon
● It’s used as managed kafka source on AWS web
services
● Highly scalable and low latency support
● Persistence with fault tolerance across multiple
availability zones
● Great integration with Spark
Second version
● Amazon kinesis is added as real time stream source
● Logging server push logs to kinesis as they arrive
● Streaming application pulls the data from kinesis for
every few mins
● Multiple partitions support added for parallel streams
Challenges with Python
● Spark streaming API for python was introduced in 1.2
whereas spark-streaming for Scala/Java is available
from 0.8
● No aws kinesis connector was available as of March
● Team has to write it’s own
● No support for python in Spark job server
Challenges from batch to streaming
● Session typically last from 1-10 mins. Batch is easy
most of the time session is done for a one hour data but
challenging for real time data
● Designing state for session
● Designing checkpointing and deciding on interval
● Weird checkpointing issues with s3 due to eventual
consistency
Improvements to batch code
● Most of the code was written in rdd paradigm as it was
only know to team
● Team was trained on spark sql and spark streaming
● Majority code was ported to Spark sql based solution to
improve readability and maintainability
● Recently moved into Dataframe based code
Third version
Events
JSON
Data
Session Wise
Aggregations
Hourly
Push Realtime
Choosing Mesos
● Mesos is a great cluster manager for Spark only
workloads
● Has specific coarse-grain mode which is dedicated for
the real time systems
● Minimal overhead compared to YARN
● Easy to setup on EC2
Fourth version
Events
JSON
Data
Session Wise
Aggregations
Hourly
Push Realtime
Grafana
● Added grafana for visualization and dashboards
● Graphana = Graphite + influxDB
● Moved away from mysql to time series database influx
DB
● Scales much better compared to mysql
● Data scientists or product managers can monitor
customers using these dashboards
● Integrates with mobile sdk

Contenu connexe

Tendances

Multi Source Data Analysis using Spark and Tellius
Multi Source Data Analysis using Spark and TelliusMulti Source Data Analysis using Spark and Tellius
Multi Source Data Analysis using Spark and Telliusdatamantra
 
Productionalizing a spark application
Productionalizing a spark applicationProductionalizing a spark application
Productionalizing a spark applicationdatamantra
 
Understanding transactional writes in datasource v2
Understanding transactional writes in  datasource v2Understanding transactional writes in  datasource v2
Understanding transactional writes in datasource v2datamantra
 
Introduction to Spark 2.0 Dataset API
Introduction to Spark 2.0 Dataset APIIntroduction to Spark 2.0 Dataset API
Introduction to Spark 2.0 Dataset APIdatamantra
 
Introduction to Datasource V2 API
Introduction to Datasource V2 APIIntroduction to Datasource V2 API
Introduction to Datasource V2 APIdatamantra
 
Exploratory Data Analysis in Spark
Exploratory Data Analysis in SparkExploratory Data Analysis in Spark
Exploratory Data Analysis in Sparkdatamantra
 
Introduction to dataset
Introduction to datasetIntroduction to dataset
Introduction to datasetdatamantra
 
Interactive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark StreamingInteractive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark Streamingdatamantra
 
Anatomy of Data Source API : A deep dive into Spark Data source API
Anatomy of Data Source API : A deep dive into Spark Data source APIAnatomy of Data Source API : A deep dive into Spark Data source API
Anatomy of Data Source API : A deep dive into Spark Data source APIdatamantra
 
Evolution of apache spark
Evolution of apache sparkEvolution of apache spark
Evolution of apache sparkdatamantra
 
Introduction to Structured streaming
Introduction to Structured streamingIntroduction to Structured streaming
Introduction to Structured streamingdatamantra
 
Introduction to Flink Streaming
Introduction to Flink StreamingIntroduction to Flink Streaming
Introduction to Flink Streamingdatamantra
 
Migrating to spark 2.0
Migrating to spark 2.0Migrating to spark 2.0
Migrating to spark 2.0datamantra
 
Laskar: High-Velocity GraphQL & Lambda-based Software Development Model
Laskar: High-Velocity GraphQL & Lambda-based Software Development ModelLaskar: High-Velocity GraphQL & Lambda-based Software Development Model
Laskar: High-Velocity GraphQL & Lambda-based Software Development ModelGarindra Prahandono
 
Migrating to Spark 2.0 - Part 2
Migrating to Spark 2.0 - Part 2Migrating to Spark 2.0 - Part 2
Migrating to Spark 2.0 - Part 2datamantra
 
Spark architecture
Spark architectureSpark architecture
Spark architecturedatamantra
 
Introduction to spark 2.0
Introduction to spark 2.0Introduction to spark 2.0
Introduction to spark 2.0datamantra
 
Real time ETL processing using Spark streaming
Real time ETL processing using Spark streamingReal time ETL processing using Spark streaming
Real time ETL processing using Spark streamingdatamantra
 
Structured Streaming with Kafka
Structured Streaming with KafkaStructured Streaming with Kafka
Structured Streaming with Kafkadatamantra
 
Building real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark StreamingBuilding real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark Streamingdatamantra
 

Tendances (20)

Multi Source Data Analysis using Spark and Tellius
Multi Source Data Analysis using Spark and TelliusMulti Source Data Analysis using Spark and Tellius
Multi Source Data Analysis using Spark and Tellius
 
Productionalizing a spark application
Productionalizing a spark applicationProductionalizing a spark application
Productionalizing a spark application
 
Understanding transactional writes in datasource v2
Understanding transactional writes in  datasource v2Understanding transactional writes in  datasource v2
Understanding transactional writes in datasource v2
 
Introduction to Spark 2.0 Dataset API
Introduction to Spark 2.0 Dataset APIIntroduction to Spark 2.0 Dataset API
Introduction to Spark 2.0 Dataset API
 
Introduction to Datasource V2 API
Introduction to Datasource V2 APIIntroduction to Datasource V2 API
Introduction to Datasource V2 API
 
Exploratory Data Analysis in Spark
Exploratory Data Analysis in SparkExploratory Data Analysis in Spark
Exploratory Data Analysis in Spark
 
Introduction to dataset
Introduction to datasetIntroduction to dataset
Introduction to dataset
 
Interactive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark StreamingInteractive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark Streaming
 
Anatomy of Data Source API : A deep dive into Spark Data source API
Anatomy of Data Source API : A deep dive into Spark Data source APIAnatomy of Data Source API : A deep dive into Spark Data source API
Anatomy of Data Source API : A deep dive into Spark Data source API
 
Evolution of apache spark
Evolution of apache sparkEvolution of apache spark
Evolution of apache spark
 
Introduction to Structured streaming
Introduction to Structured streamingIntroduction to Structured streaming
Introduction to Structured streaming
 
Introduction to Flink Streaming
Introduction to Flink StreamingIntroduction to Flink Streaming
Introduction to Flink Streaming
 
Migrating to spark 2.0
Migrating to spark 2.0Migrating to spark 2.0
Migrating to spark 2.0
 
Laskar: High-Velocity GraphQL & Lambda-based Software Development Model
Laskar: High-Velocity GraphQL & Lambda-based Software Development ModelLaskar: High-Velocity GraphQL & Lambda-based Software Development Model
Laskar: High-Velocity GraphQL & Lambda-based Software Development Model
 
Migrating to Spark 2.0 - Part 2
Migrating to Spark 2.0 - Part 2Migrating to Spark 2.0 - Part 2
Migrating to Spark 2.0 - Part 2
 
Spark architecture
Spark architectureSpark architecture
Spark architecture
 
Introduction to spark 2.0
Introduction to spark 2.0Introduction to spark 2.0
Introduction to spark 2.0
 
Real time ETL processing using Spark streaming
Real time ETL processing using Spark streamingReal time ETL processing using Spark streaming
Real time ETL processing using Spark streaming
 
Structured Streaming with Kafka
Structured Streaming with KafkaStructured Streaming with Kafka
Structured Streaming with Kafka
 
Building real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark StreamingBuilding real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark Streaming
 

En vedette

Digital assets And Currencies In The Information Age: Do Your Ones And Zero H...
Digital assets And Currencies In The Information Age: Do Your Ones And Zero H...Digital assets And Currencies In The Information Age: Do Your Ones And Zero H...
Digital assets And Currencies In The Information Age: Do Your Ones And Zero H...Bradley Perry
 
Google Wallet Presentation
Google Wallet PresentationGoogle Wallet Presentation
Google Wallet PresentationRaghav Sharma
 
Mesos and Kubernetes ecosystem overview
Mesos and Kubernetes ecosystem overviewMesos and Kubernetes ecosystem overview
Mesos and Kubernetes ecosystem overviewKrishna-Kumar
 
Digital currencies new technology new business model
Digital currencies new technology new business modelDigital currencies new technology new business model
Digital currencies new technology new business modelShiva Bissessar
 
Payments industry watches for inflections in mobile, credit
Payments industry watches for inflections in mobile, creditPayments industry watches for inflections in mobile, credit
Payments industry watches for inflections in mobile, creditBloomberg LP
 
Apple's Passbook and Google Wallet: 5 key differences you need to know
Apple's Passbook and Google Wallet: 5 key differences you need to knowApple's Passbook and Google Wallet: 5 key differences you need to know
Apple's Passbook and Google Wallet: 5 key differences you need to knowVibes_Thought_Leadership
 
Demonetisation: Push Towards a Digital Economy
Demonetisation: Push Towards a Digital EconomyDemonetisation: Push Towards a Digital Economy
Demonetisation: Push Towards a Digital EconomyShreyas Kamath
 
Introduction to Apache Flink
Introduction to Apache FlinkIntroduction to Apache Flink
Introduction to Apache Flinkdatamantra
 
CBGTBT - Part 5 - Blockchains 102
CBGTBT - Part 5 - Blockchains 102CBGTBT - Part 5 - Blockchains 102
CBGTBT - Part 5 - Blockchains 102Blockstrap.com
 
CBGTBT - Part 6 - Transactions 102
CBGTBT - Part 6 - Transactions 102CBGTBT - Part 6 - Transactions 102
CBGTBT - Part 6 - Transactions 102Blockstrap.com
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Sparkdatamantra
 
Python in the Hadoop Ecosystem (Rock Health presentation)
Python in the Hadoop Ecosystem (Rock Health presentation)Python in the Hadoop Ecosystem (Rock Health presentation)
Python in the Hadoop Ecosystem (Rock Health presentation)Uri Laserson
 

En vedette (16)

Digital Currencies: Where to from here?
Digital Currencies: Where to from here?Digital Currencies: Where to from here?
Digital Currencies: Where to from here?
 
Digital assets And Currencies In The Information Age: Do Your Ones And Zero H...
Digital assets And Currencies In The Information Age: Do Your Ones And Zero H...Digital assets And Currencies In The Information Age: Do Your Ones And Zero H...
Digital assets And Currencies In The Information Age: Do Your Ones And Zero H...
 
Anatomy of file write in hadoop
Anatomy of file write in hadoopAnatomy of file write in hadoop
Anatomy of file write in hadoop
 
Google Wallet Presentation
Google Wallet PresentationGoogle Wallet Presentation
Google Wallet Presentation
 
Mesos and Kubernetes ecosystem overview
Mesos and Kubernetes ecosystem overviewMesos and Kubernetes ecosystem overview
Mesos and Kubernetes ecosystem overview
 
Visa
VisaVisa
Visa
 
Digital currencies new technology new business model
Digital currencies new technology new business modelDigital currencies new technology new business model
Digital currencies new technology new business model
 
Payments industry watches for inflections in mobile, credit
Payments industry watches for inflections in mobile, creditPayments industry watches for inflections in mobile, credit
Payments industry watches for inflections in mobile, credit
 
Apple's Passbook and Google Wallet: 5 key differences you need to know
Apple's Passbook and Google Wallet: 5 key differences you need to knowApple's Passbook and Google Wallet: 5 key differences you need to know
Apple's Passbook and Google Wallet: 5 key differences you need to know
 
Demonetisation: Push Towards a Digital Economy
Demonetisation: Push Towards a Digital EconomyDemonetisation: Push Towards a Digital Economy
Demonetisation: Push Towards a Digital Economy
 
Introduction to Apache Flink
Introduction to Apache FlinkIntroduction to Apache Flink
Introduction to Apache Flink
 
A Multi Colored YARN
A Multi Colored YARNA Multi Colored YARN
A Multi Colored YARN
 
CBGTBT - Part 5 - Blockchains 102
CBGTBT - Part 5 - Blockchains 102CBGTBT - Part 5 - Blockchains 102
CBGTBT - Part 5 - Blockchains 102
 
CBGTBT - Part 6 - Transactions 102
CBGTBT - Part 6 - Transactions 102CBGTBT - Part 6 - Transactions 102
CBGTBT - Part 6 - Transactions 102
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Python in the Hadoop Ecosystem (Rock Health presentation)
Python in the Hadoop Ecosystem (Rock Health presentation)Python in the Hadoop Ecosystem (Rock Health presentation)
Python in the Hadoop Ecosystem (Rock Health presentation)
 

Similaire à Improving Mobile Payments With Real time Spark

Simply Business' Data Platform
Simply Business' Data PlatformSimply Business' Data Platform
Simply Business' Data PlatformDani Solà Lagares
 
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ UberKafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uberconfluent
 
Understanding time in structured streaming
Understanding time in structured streamingUnderstanding time in structured streaming
Understanding time in structured streamingdatamantra
 
apidays LIVE India - Asynchronous and Broadcasting APIs using Kafka by Rohit ...
apidays LIVE India - Asynchronous and Broadcasting APIs using Kafka by Rohit ...apidays LIVE India - Asynchronous and Broadcasting APIs using Kafka by Rohit ...
apidays LIVE India - Asynchronous and Broadcasting APIs using Kafka by Rohit ...apidays
 
Simply Business - Near Real Time Event Processing
Simply Business - Near Real Time Event ProcessingSimply Business - Near Real Time Event Processing
Simply Business - Near Real Time Event Processingidan_by
 
apidays LIVE New York 2021 - Managing the usage of Asynchronous APIs: What do...
apidays LIVE New York 2021 - Managing the usage of Asynchronous APIs: What do...apidays LIVE New York 2021 - Managing the usage of Asynchronous APIs: What do...
apidays LIVE New York 2021 - Managing the usage of Asynchronous APIs: What do...apidays
 
Google App Engine Overview and Update
Google App Engine Overview and UpdateGoogle App Engine Overview and Update
Google App Engine Overview and UpdateChris Schalk
 
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...Flink Forward
 
Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...
Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...
Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...confluent
 
Instruments to play microservice
Instruments to play microserviceInstruments to play microservice
Instruments to play microserviceChandresh Pancholi
 
[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQLWSO2
 
Scaling Production Data across Microservices
Scaling Production Data across MicroservicesScaling Production Data across Microservices
Scaling Production Data across MicroservicesErik Ashepa
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAshyamraj55
 
Nginx Conference 2016 - Learnings and State of the Industry
Nginx Conference 2016 - Learnings and State of the IndustryNginx Conference 2016 - Learnings and State of the Industry
Nginx Conference 2016 - Learnings and State of the IndustryBenjamin Scholler
 
[APIdays NY] Managing the usage of Asynchronous APIs: What does it take?
[APIdays NY] Managing the usage of Asynchronous APIs: What does it take?[APIdays NY] Managing the usage of Asynchronous APIs: What does it take?
[APIdays NY] Managing the usage of Asynchronous APIs: What does it take?WSO2
 
Cloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and FastCloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and FastDatabricks
 
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018Bowen Li
 
The good, the bad, the ugly side of step functions
The good, the bad, the ugly side of step functionsThe good, the bad, the ugly side of step functions
The good, the bad, the ugly side of step functionsMohsiur Rahman
 
Sweet Streams (Are made of this)
Sweet Streams (Are made of this)Sweet Streams (Are made of this)
Sweet Streams (Are made of this)Corneil du Plessis
 
Structured Streaming in Spark
Structured Streaming in SparkStructured Streaming in Spark
Structured Streaming in SparkDigital Vidya
 

Similaire à Improving Mobile Payments With Real time Spark (20)

Simply Business' Data Platform
Simply Business' Data PlatformSimply Business' Data Platform
Simply Business' Data Platform
 
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ UberKafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
 
Understanding time in structured streaming
Understanding time in structured streamingUnderstanding time in structured streaming
Understanding time in structured streaming
 
apidays LIVE India - Asynchronous and Broadcasting APIs using Kafka by Rohit ...
apidays LIVE India - Asynchronous and Broadcasting APIs using Kafka by Rohit ...apidays LIVE India - Asynchronous and Broadcasting APIs using Kafka by Rohit ...
apidays LIVE India - Asynchronous and Broadcasting APIs using Kafka by Rohit ...
 
Simply Business - Near Real Time Event Processing
Simply Business - Near Real Time Event ProcessingSimply Business - Near Real Time Event Processing
Simply Business - Near Real Time Event Processing
 
apidays LIVE New York 2021 - Managing the usage of Asynchronous APIs: What do...
apidays LIVE New York 2021 - Managing the usage of Asynchronous APIs: What do...apidays LIVE New York 2021 - Managing the usage of Asynchronous APIs: What do...
apidays LIVE New York 2021 - Managing the usage of Asynchronous APIs: What do...
 
Google App Engine Overview and Update
Google App Engine Overview and UpdateGoogle App Engine Overview and Update
Google App Engine Overview and Update
 
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...
 
Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...
Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...
Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...
 
Instruments to play microservice
Instruments to play microserviceInstruments to play microservice
Instruments to play microservice
 
[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL
 
Scaling Production Data across Microservices
Scaling Production Data across MicroservicesScaling Production Data across Microservices
Scaling Production Data across Microservices
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
 
Nginx Conference 2016 - Learnings and State of the Industry
Nginx Conference 2016 - Learnings and State of the IndustryNginx Conference 2016 - Learnings and State of the Industry
Nginx Conference 2016 - Learnings and State of the Industry
 
[APIdays NY] Managing the usage of Asynchronous APIs: What does it take?
[APIdays NY] Managing the usage of Asynchronous APIs: What does it take?[APIdays NY] Managing the usage of Asynchronous APIs: What does it take?
[APIdays NY] Managing the usage of Asynchronous APIs: What does it take?
 
Cloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and FastCloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and Fast
 
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018
 
The good, the bad, the ugly side of step functions
The good, the bad, the ugly side of step functionsThe good, the bad, the ugly side of step functions
The good, the bad, the ugly side of step functions
 
Sweet Streams (Are made of this)
Sweet Streams (Are made of this)Sweet Streams (Are made of this)
Sweet Streams (Are made of this)
 
Structured Streaming in Spark
Structured Streaming in SparkStructured Streaming in Spark
Structured Streaming in Spark
 

Plus de datamantra

State management in Structured Streaming
State management in Structured StreamingState management in Structured Streaming
State management in Structured Streamingdatamantra
 
Spark on Kubernetes
Spark on KubernetesSpark on Kubernetes
Spark on Kubernetesdatamantra
 
Core Services behind Spark Job Execution
Core Services behind Spark Job ExecutionCore Services behind Spark Job Execution
Core Services behind Spark Job Executiondatamantra
 
Optimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloadsOptimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloadsdatamantra
 
Spark stack for Model life-cycle management
Spark stack for Model life-cycle managementSpark stack for Model life-cycle management
Spark stack for Model life-cycle managementdatamantra
 
Productionalizing Spark ML
Productionalizing Spark MLProductionalizing Spark ML
Productionalizing Spark MLdatamantra
 
Testing Spark and Scala
Testing Spark and ScalaTesting Spark and Scala
Testing Spark and Scaladatamantra
 
Understanding Implicits in Scala
Understanding Implicits in ScalaUnderstanding Implicits in Scala
Understanding Implicits in Scaladatamantra
 
Scalable Spark deployment using Kubernetes
Scalable Spark deployment using KubernetesScalable Spark deployment using Kubernetes
Scalable Spark deployment using Kubernetesdatamantra
 
Introduction to concurrent programming with akka actors
Introduction to concurrent programming with akka actorsIntroduction to concurrent programming with akka actors
Introduction to concurrent programming with akka actorsdatamantra
 
Functional programming in Scala
Functional programming in ScalaFunctional programming in Scala
Functional programming in Scaladatamantra
 
Telco analytics at scale
Telco analytics at scaleTelco analytics at scale
Telco analytics at scaledatamantra
 
Platform for Data Scientists
Platform for Data ScientistsPlatform for Data Scientists
Platform for Data Scientistsdatamantra
 
Building scalable rest service using Akka HTTP
Building scalable rest service using Akka HTTPBuilding scalable rest service using Akka HTTP
Building scalable rest service using Akka HTTPdatamantra
 
Anatomy of Spark SQL Catalyst - Part 2
Anatomy of Spark SQL Catalyst - Part 2Anatomy of Spark SQL Catalyst - Part 2
Anatomy of Spark SQL Catalyst - Part 2datamantra
 
Anatomy of spark catalyst
Anatomy of spark catalystAnatomy of spark catalyst
Anatomy of spark catalystdatamantra
 

Plus de datamantra (16)

State management in Structured Streaming
State management in Structured StreamingState management in Structured Streaming
State management in Structured Streaming
 
Spark on Kubernetes
Spark on KubernetesSpark on Kubernetes
Spark on Kubernetes
 
Core Services behind Spark Job Execution
Core Services behind Spark Job ExecutionCore Services behind Spark Job Execution
Core Services behind Spark Job Execution
 
Optimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloadsOptimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloads
 
Spark stack for Model life-cycle management
Spark stack for Model life-cycle managementSpark stack for Model life-cycle management
Spark stack for Model life-cycle management
 
Productionalizing Spark ML
Productionalizing Spark MLProductionalizing Spark ML
Productionalizing Spark ML
 
Testing Spark and Scala
Testing Spark and ScalaTesting Spark and Scala
Testing Spark and Scala
 
Understanding Implicits in Scala
Understanding Implicits in ScalaUnderstanding Implicits in Scala
Understanding Implicits in Scala
 
Scalable Spark deployment using Kubernetes
Scalable Spark deployment using KubernetesScalable Spark deployment using Kubernetes
Scalable Spark deployment using Kubernetes
 
Introduction to concurrent programming with akka actors
Introduction to concurrent programming with akka actorsIntroduction to concurrent programming with akka actors
Introduction to concurrent programming with akka actors
 
Functional programming in Scala
Functional programming in ScalaFunctional programming in Scala
Functional programming in Scala
 
Telco analytics at scale
Telco analytics at scaleTelco analytics at scale
Telco analytics at scale
 
Platform for Data Scientists
Platform for Data ScientistsPlatform for Data Scientists
Platform for Data Scientists
 
Building scalable rest service using Akka HTTP
Building scalable rest service using Akka HTTPBuilding scalable rest service using Akka HTTP
Building scalable rest service using Akka HTTP
 
Anatomy of Spark SQL Catalyst - Part 2
Anatomy of Spark SQL Catalyst - Part 2Anatomy of Spark SQL Catalyst - Part 2
Anatomy of Spark SQL Catalyst - Part 2
 
Anatomy of spark catalyst
Anatomy of spark catalystAnatomy of spark catalyst
Anatomy of spark catalyst
 

Dernier

Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxHaritikaChhatwal1
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 
convolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfconvolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfSubhamKumar3239
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxTasha Penwell
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfblazblazml
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...Dr Arash Najmaei ( Phd., MBA, BSc)
 
INTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingINTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingsocarem879
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data VisualizationKianJazayeri1
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxSimranPal17
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 

Dernier (20)

Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptx
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
convolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfconvolutional neural network and its applications.pdf
convolutional neural network and its applications.pdf
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
INTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingINTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processing
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptx
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 

Improving Mobile Payments With Real time Spark

  • 2. ● Madhukara Phatak ● Big data consultant and trainer at datamantra.io ● Consult in Hadoop, Spark and Scala ● www.madhukaraphatak.com
  • 3. Agenda ● Mobile as drive for big data ● Our customer solution ● Existing data solution ● Improved solution ● Technical details ● Future enhancements ● Q & A
  • 4. Mobile as Big data drive ● Mobile has changed the way in which we interact with world ● Most of the buy/sell happens on mobile today ○ Myntra went fully mobile ○ Flipkart and amazon say their 50% buy happens on mobile ○ Quikr and OLX is mobile based selling platform ○ Ola etc
  • 5. Challenges in Mobile ● Customers expect the service to available 24/7 ● Tiny screens make very challenging to typical software flows ● Flaky connectivity of mobile networks makes it tougher ● Constant moving results in drop in interactions ● No more downtime ● Everything has to be done in realtime
  • 6. Mobile payments ● Almost every app earlier mentioned needs some kind of payment ● Getting payments right on mobile is very hard ● Globally 21% of online shoppers abandon their basket due to payment failures or delays ● Some companies are building sdk’s to help the app developers ● Our customer is one of them
  • 12. Our customer solution ● Mobile sdk for applications simplify the payments ● SDK provides better user interface like big buttons to generate OTP or other flows ● SDK also helps in filling up different kind of forms given by different banks using consistent UI ● Better user experience across applications ● Application sends anonymous payments details across apps to our customer servers
  • 13. Some numbers ● 40 + customers ● Over 1 million transactions per month as per March ● Around 55% success rate ( 5 % above average) ● Supports major banks, payment gateways and wallet providers ● Soon will be available in other than mobile payment space
  • 14. Why data matters? ● As number of transaction increases, things will go wrong ● There are so many different combinations to go wrong ● Example ○ Airtel OTP failing with state bank netbanking ○ Customers stuck in password page ○ Not able to read OTP from some specific ● Understanding customer pain and reacting to it is paramount ● Every help results in payment
  • 16. Initial BI solution ● Phone sdk pushes events like transaction initiation, payment complete to logging servers ● Logging servers roll log for every one hour and push to s3 ● A single node spark machine aggregates data by sessions and pushes it to mysql ● Google BigQuery is used for adhoc querying
  • 17. Challenges with BI solution ● Batch processing ● Geared towards more of report generation oriented flow ● Very minimal use of Spark API’s as team was not well aware of it’s potential ● No integration with mobile sdk for feed back loop
  • 18. Requirements for consulting ● Bring the same reporting calculations to real time ● Understanding the user behaviour and tracking his/her flow over a session ● Closing the loop by providing automatic alerts based on the metric calculations ● Some new specific business cases like loyalty management etc ● Improving team expertise on spark
  • 19. Choosing Spark streaming ● Company was already invested in Spark so spark streaming was no brainer ● Also porting spark batch code to streaming was mostly straight forward as both talk same API ● Company used python as Spark API language which was supported by streaming also ● So we didn't consider storm we went ahead with Spark streaming
  • 20. First version Events Five Minute Push JSON Data FileStream Session Wise Aggregations
  • 21. First version ● We used fileStream API of spark streaming which allowed us to poll a s3 bucket for every few mins ● A new rolling appender was added to log servers to push logs to s3 every 5 mins ● Exact same batch code was used for calculations which made transition very easy ● All downstream applications remained same
  • 23. Amazon Kinesis ● A kafka like distributed message queue by Amazon ● It’s used as managed kafka source on AWS web services ● Highly scalable and low latency support ● Persistence with fault tolerance across multiple availability zones ● Great integration with Spark
  • 24. Second version ● Amazon kinesis is added as real time stream source ● Logging server push logs to kinesis as they arrive ● Streaming application pulls the data from kinesis for every few mins ● Multiple partitions support added for parallel streams
  • 25. Challenges with Python ● Spark streaming API for python was introduced in 1.2 whereas spark-streaming for Scala/Java is available from 0.8 ● No aws kinesis connector was available as of March ● Team has to write it’s own ● No support for python in Spark job server
  • 26. Challenges from batch to streaming ● Session typically last from 1-10 mins. Batch is easy most of the time session is done for a one hour data but challenging for real time data ● Designing state for session ● Designing checkpointing and deciding on interval ● Weird checkpointing issues with s3 due to eventual consistency
  • 27. Improvements to batch code ● Most of the code was written in rdd paradigm as it was only know to team ● Team was trained on spark sql and spark streaming ● Majority code was ported to Spark sql based solution to improve readability and maintainability ● Recently moved into Dataframe based code
  • 29. Choosing Mesos ● Mesos is a great cluster manager for Spark only workloads ● Has specific coarse-grain mode which is dedicated for the real time systems ● Minimal overhead compared to YARN ● Easy to setup on EC2
  • 31. Grafana ● Added grafana for visualization and dashboards ● Graphana = Graphite + influxDB ● Moved away from mysql to time series database influx DB ● Scales much better compared to mysql ● Data scientists or product managers can monitor customers using these dashboards ● Integrates with mobile sdk