Data Science with Python
From analytics scripts to services in scale
Giuseppe Broccolo
Data Engineer @ Decibel ltd.
@giubro
/gbroccolo
gbroccolo@decibelinsight.com
PyLondinium 2019
London, UK, June 15th
Intro
DNA primase enzyme:
- unrolls human DNA (3.2G base pairs)
- DNA is unrolled at a rate of ~5 base pairs per second (~20 years for the whole genome)
- distributed processing: every 500k base pairs (~20 yr → ~20 min)
Wikipedia, © 2019
The repo
https://github.com/gbroccolo/k8s_example
Outline of the talk
● Requirements to deploy a service in scale
● A simple case study: from a prototype to a service in scale
● The roadmap
● Conclusions
The requirements of a scalable service
https://12factor.net/
● Prefer stateless applications
● Async is better than sync
● Ingest data via HTTP requests, or by consuming streams
● Store the configuration as ENVs – avoid filesystem persistence (see the sketch below)
● Avoid even temporary artifacts on the filesystem – prefer in-memory execution
● Rely on backing services if necessary
● High portability and easy deployment of the processing units – use Docker!
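A minimal sketch of the "store the configuration as ENVs" point (the variable names below are only examples, chosen to match the REDIS_HOST used later in this deck):

import os

# configuration is read from environment variables, not from files on disk;
# the defaults are only a convenience for local runs
REDIS_HOST = os.environ.get('REDIS_HOST', 'localhost')
REDIS_PORT = int(os.environ.get('REDIS_PORT', '6379'))

print('backing service expected at %s:%s' % (REDIS_HOST, REDIS_PORT))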
The scenario

Data scientists produced a fantastic algorithm to detect anomalies in Gaussian time series. Data engineers have to design the pipeline of a responsive, scalable service able to ingest the time series provided via HTTP requests:

curl -X POST -H "Content-Type: application/x-www-form-urlencoded" \
     -d "(18:27:26.345, 2.345) (18:27:26.346, 2.352) ..." \
     http://<url>/get_anomalous_data
from itertools import count

import numpy as np
import pandas as pd


def nd_rolling(data, window_size):
    # enumerate the (timestamp, value) pairs of the two-column DataFrame
    sample = list(zip(count(), data.values[:, 0], data.values[:, 1]))
    for idx in range(0, len(sample)):
        idx0 = idx if idx - window_size < 0 else idx - window_size
        # the points falling inside the rolling window of the idx-th point
        window = [it for it in sample
                  if it[0] >= idx0 and it[0] <= idx0 + window_size]
        x = np.array([it[2] for it in window])
        yield {'idx': sample[idx][1],
               'value': sample[idx][2],
               'window_mean': np.mean(x),
               'window_std': np.std(x)}


def get_anomalous_values(data, window_size=100):
    # a point is anomalous if it deviates from its window mean
    # by more than 5 standard deviations
    return [(p['idx'], p['value'])
            for p in nd_rolling(data, window_size)
            if abs(p['value'] - p['window_mean']) > 5 * p['window_std']]
Example input time series (timestamp, value): (18:27:26.345, 2.345), (18:27:26.346, 2.352), (18:27:26.347, 2.348), ...
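A quick usage sketch of the prototype, assuming the two functions above are in scope (column names are arbitrary, only the column order matters):

import numpy as np
import pandas as pd

# a small Gaussian series with one spike injected at position 250
np.random.seed(0)
values = np.random.normal(loc=2.35, scale=0.01, size=500)
values[250] = 3.0
timestamps = pd.date_range('2019-06-15 18:27:26.345', periods=500, freq='ms')
data = pd.DataFrame({'ts': timestamps.astype(str), 'value': values})

# with the default 5-sigma threshold only the injected spike should be reported
print(get_anomalous_values(data, window_size=100))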
Roadmap
Step 1 – define an importable module
anomaly/
    __init__.py
    anomaly.py
setup.cfg
setup.py
requirements.txt
anomaly.py contains the nd_rolling() / get_anomalous_values() prototype shown above.
https://setuptools.readthedocs.io
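A minimal setup.py for the layout above could look like the following sketch (the metadata is an assumption, not the exact content of the repo):

from setuptools import setup, find_packages

setup(
    name='anomaly',
    version='0.1.0',
    description='rolling-window anomaly detection on time series',
    packages=find_packages(),
    install_requires=['numpy', 'pandas'],
)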
$ pip install --no-cache-dir . && rm -rf ./*
$ python
Python 3.7.1 (default, Dec 20 2018, 10:12:31)
[Clang 10.0.0 (clang-1000.11.45.5)] on darwin
Type "help", "copyright", "credits" or "license" for more
information.
>>> from anomaly import anomaly
>>>
Step 2 – expose a responsive endpoint
References: a lot of blogs, articles, ...
from flask import Flask
from flask import request
import json
import re

import pandas as pd

from anomaly.anomaly import get_anomalous_values

app = Flask(__name__)


def bloat2float(x):
    # parse a single "(timestamp, value)" token into a (str, float) tuple
    y = re.search(r"\((.*?),\s*(.*?)\)", x).group(1, 2)
    return y[0], float(y[1])


@app.route('/get_anomalous_data', methods=['POST'])
def get_anomalous_data():
    body = request.get_data().decode('utf-8')
    # one "(timestamp, value)" pair per row
    stream = pd.DataFrame([bloat2float(x) for x in re.findall(r"\([^)]*\)", body)])
    # drop the rows whose value is not numeric
    stream = stream[pd.to_numeric(stream[1], errors='coerce').notnull()]
    response = get_anomalous_values(stream)
    return json.dumps(response), 200


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=80, debug=True)
main.py
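For a quick local smoke test of the endpoint, Flask's built-in test client can be used without spinning up the full stack (this snippet is a sketch, not part of the repo):

from main import app

client = app.test_client()
resp = client.post(
    '/get_anomalous_data',
    data='(18:27:26.345, 2.345) (18:27:26.346, 2.352) (18:27:26.347, 2.348)',
    content_type='application/x-www-form-urlencoded',
)
print(resp.status_code, resp.get_data(as_text=True))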
Step 3 – make it asynchronous
Reduce the impact on the requestor: submit the time series with a POST, then poll for the result with a GET (sketched below).
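From the requestor's side the interaction looks roughly like the sketch below (it assumes the requests library and a service reachable on localhost; the 202 status marks a still-running task, as in the handler shown a few slides later):

import time

import requests

URL = 'http://localhost/get_anomalous_data'
payload = '(18:27:26.345, 2.345) (18:27:26.346, 2.352)'

# submit the time series: the service answers immediately and queues the work
requests.post(URL, data=payload,
              headers={'Content-Type': 'application/x-www-form-urlencoded'})

# poll until the worker has finished (202 = still running)
while True:
    resp = requests.get(URL)
    if resp.status_code != 202:
        break
    time.sleep(1)

print(resp.status_code, resp.text)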
[architecture diagram: source → APP (with redis & celery clients) → celery worker, connected through a broker pipeline and a backend pipeline]
[supervisord]
nodaemon=true

[program:uwsgi]
environment=PYTHONPATH=/app/
command=/usr/local/bin/uwsgi --ini /etc/uwsgi/uwsgi.ini --die-on-term --need-app --plugin python3
autostart=true
autorestart=true

[program:nginx]
command=/usr/sbin/nginx
autostart=true
autorestart=true

[program:celery_worker]
environment=PYTHONPATH=/app/
command=/usr/local/bin/celery worker -A main.celery --loglevel=info
autostart=true
autorestart=true
supervisord.ini
[…]
from celery import Celery
from celery.signals import after_setup_logger
from redis import StrictRedis
[…]
app = Flask(__name__)

# both the broker and the result backend point at the same Redis instance,
# addressed through the REDIS_HOST environment variable
app.config['CELERY_RESULT_BACKEND'] = 'redis://%s:6379/0' % os.environ.get('REDIS_HOST')
app.config['CELERY_BROKER_URL'] = 'redis://%s:6379/0' % os.environ.get('REDIS_HOST')

celery = Celery(app.name, broker=app.config['CELERY_BROKER_URL'])
celery.conf.update(app.config)
[…]
# a separate Redis db used to keep track of the submitted task ids
REDIS_CACHE = StrictRedis(
    host='%s' % os.environ.get('REDIS_HOST'), port=6379, db=1, decode_responses=True)
[…]
@celery.task(bind=True)
def wrap_long_task(self, arg):
    # thin Celery wrapper around the analytics function
    return get_anomalous_values(arg)
@app.route("/get_anomalous_data", methods=['GET', 'POST'])
def get_anomalous_data():
    if request.method == 'GET':
        # pop a pending task id and check its state
        task_id = REDIS_CACHE.rpop("running_task_ids")
        task = wrap_long_task.AsyncResult(task_id)
        if task.state == 'SUCCESS':
            return json.dumps(task.result), 200
        elif task.state == 'FAILURE':
            return str(task.traceback), 500
        else:
            # still running: push the id back and tell the client to retry
            REDIS_CACHE.lpush("running_task_ids", task_id)
            return '', 202
    elif request.method == 'POST':
        body = request.get_data().decode('utf-8')
        stream = pd.DataFrame([bloat2float(x) for x in re.findall(r"\([^)]*\)", body)])
        stream = stream[pd.to_numeric(stream[1], errors='coerce').notnull()]
        # Celery ingests JSON-serialisable inputs only: pass the DataFrame as JSON
        task = wrap_long_task.apply_async(args=[stream.to_json(orient='records')])
        REDIS_CACHE.lpush("running_task_ids", task.id)
        return 'submitted', 200
Celery ingests JSON-serialisable inputs only!
[…]
def get_anomalous_values(data, window_size=100):
    """
    data: str - a JSON-serialised (orient='records') two-column time series
    window_size: int
    return: list
    """
    # deserialise the ingested JSON back into a pandas DataFrame
    pd_data = pd.read_json(data, orient='records')
    # compute the rolling window for each point, and report an anomaly if the
    # point deviates from the window mean by more than 5 standard deviations
    return [(p['idx'], p['value']) for p in nd_rolling(pd_data, window_size)
            if abs(p['value'] - p['window_mean']) > 5 * p['window_std']]
The worker needs to deserialise the ingested data back into a pandas DataFrame.
anomaly.py
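A sketch of the serialisation round trip (the positional column labels 0 and 1 come from building the DataFrame out of plain tuples, as in the endpoint above):

import pandas as pd

stream = pd.DataFrame([('18:27:26.345', 2.345), ('18:27:26.346', 2.352)])

# what the Flask endpoint hands to the Celery task...
serialised = stream.to_json(orient='records')
print(serialised)  # e.g. '[{"0":"18:27:26.345","1":2.345}, ...]'

# ...and what the worker rebuilds inside get_anomalous_values()
rebuilt = pd.read_json(serialised, orient='records')
print(rebuilt)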
Step 4 – microservices (Docker)
Two services: the main application, and the broker & backing service.
Dockerfile

FROM decibel/uwsgi-nginx-flask-docker:python3.6-alpine3.8-pandas
MAINTAINER Giuseppe Broccolo <gbroccolo@decibelinsight.com>

RUN mkdir -p /source
COPY . /source
COPY ./main.py /app/
COPY ./supervisord.ini /etc/supervisor.d/

RUN cd /source && \
    pip install --no-cache-dir . && \
    cd / && \
    rm -rf /source
inherited from: https://hub.docker.com/r/tiangolo/uwsgi-nginx-flask/
version: '3.5'

networks:
  redis_network:

services:
  python-app:
    build: .
    environment:
      - REDIS_HOST=redis
    ports:
      - "0.0.0.0:80:80"
    depends_on:
      - redis
    networks:
      - redis_network
  redis:
    image: redis:5.0.3-alpine3.8
    networks:
      - redis_network
docker-compose.yml
Step 5 – kubernetes on cloud
● Dynamic cluster of worker units, autoscaled based on the load of the single units
● Several cloud providers – GKE, EKS, …
● Deployable through YAML configurations:
  ● Pods – the elementary unit, composed of 1+ containers
  ● Deployments – how the Pods are deployed
  ● Services – how the deployed Pods are exposed
  ● Horizontal autoscalers – how the cluster autoscales

https://kubernetes.io
Step 5 – kubernetes on cloud – static cluster
pods.yml

---
apiVersion: v1
kind: Pod
[…]
spec:
  containers:
  - name: python-app
    image: <some-docker-hub>/python-app:latest
    env:
    - name: REDIS_HOST
      value: "localhost"
    resources:
      limits:
        cpu: "400m"
      requests:
        cpu: "200m"
  - name: redis
    image: redis:5.0.3-alpine3.8
Step 5 – kubernetes on cloud – scalable cluster
---
apiVersion: v1
kind: Pod
[…]
spec:
  containers:
  - name: python-app
    image: <some-docker-hub>/python-app:latest
    env:
    - name: REDIS_HOST
      value: "redis"
    resources:
      limits:
        cpu: "400m"
      requests:
        cpu: "200m"
    ports:
    - containerPort: 80
---
apiVersion: v1
kind: Pod
[…]
spec:
  containers:
  - name: redis
    image: redis:5.0.3-alpine3.8
---
apiVersion: v1
kind: Service
[…]
spec:
  type: ClusterIP
  ports:
  - port: 6379
    targetPort: 6379
Room for improvements
● Consume streams, publish results on streams
● Make the cache backing service more reliable (HA cluster, persist data in a volume in case of pod restarts)
● Decouple Celery workers from HTTP data ingestion (see the sketch below)
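As a sketch of the last point: a small consumer that blocks on a Redis list and feeds the same Celery task, so the data no longer has to arrive through the HTTP endpoint (the queue name and the db index are assumptions, not part of the repo):

import os

from redis import StrictRedis

from main import wrap_long_task

redis_client = StrictRedis(host=os.environ.get('REDIS_HOST', 'localhost'),
                           port=6379, db=2, decode_responses=True)

while True:
    # BLPOP blocks until a JSON-serialised time series is pushed on the queue
    _, payload = redis_client.blpop('incoming_series')
    wrap_long_task.apply_async(args=[payload])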
https://github.com/gbroccolo/k8s_example

gbroccolo@decibelinsight.com
@giubro
/gbroccolo
© Giuseppe Broccolo, Decibel ltd, 2019