Presented at PyOhio 2017: https://pyohio.org/schedule/presentation/284/
The Python data ecosystem provides amazing tools to quickly get up and running with machine learning models, but the path to stably serving them in production is not so clear. We'll discuss details of wrapping a minimal REST API around scikit-learn, training and persisting models in batch, and logging decisions, then compare to some other common approaches to productionizing models.
• What’s the problem we’re solving?
• Why machine learning?
• Walkthrough of developing the model
• ✨ Live demo ✨
• Complications of moving this workflow to production
• Other potential approaches
Overview
Categorizing chats
# SELECT subject, body, category FROM chats;
subject | body | category
--------------+---------------------------+----------------
Check deposit | Hi how are you? I was… | education
Lost Card | Can you send me a new… | urgent
my transfer | My transfer of $10 isn’t… | education
Mail deposits | I have a large check… | education
urgent, customer education, new product, incidents, other
Training the model
import pandas as pd
from sklearn.model_selection import train_test_split

data_frame = pd.read_sql(
    "SELECT category, subject, body FROM chats;",
    redshift_connection)

X = data_frame[['subject', 'body']]
y = data_frame['category']

X_train, X_test, y_train, y_test = \
    train_test_split(X, y, test_size=0.33, random_state=0)

pipeline.fit(X_train, y_train)
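The slides don't show how `pipeline` itself is built; a minimal sketch, assuming the subject and body are joined into one text field and fed through TF-IDF into a logistic regression (the specific vectorizer and classifier here are illustrative choices, not necessarily the talk's):

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# A pipeline bundles the feature extraction and the classifier
# into one object, so the same transformations are applied at
# training time and at prediction time.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),       # raw text -> sparse TF-IDF vectors
    ("clf", LogisticRegression()),      # TF-IDF vectors -> category labels
])
```

Because the vectorizer's vocabulary is fit inside the pipeline, serializing the pipeline later captures both the learned parameters and the preprocessing code in one artifact.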
Testing the model
from sklearn.metrics import classification_report
y_predicted = pipeline.predict(X_test)
print(classification_report(y_test, y_predicted))
             precision    recall  f1-score   support

    class 0       0.67      1.00      0.80         2
    class 1       0.00      0.00      0.00         1
    class 2       1.00      0.50      0.67         2

avg / total       0.67      0.60      0.59         5
Serving the model in Flask
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route('/chat-classification-api/messages',
           methods=['POST'])
def classify_messages():
    """Classify given chat messages"""
    messages = request.get_json()
    y = pipeline.predict(messages)
    # join class labels back with identifiers
    predictions = [{"chat_id": message["chat_id"],
                    "class_label": label}
                   for message, label in zip(messages, y)]
    return jsonify(predictions)
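An endpoint like this can be exercised end-to-end without deploying anything, using Flask's built-in test client. A minimal sketch, with a stub standing in for the trained pipeline (the stub and its fixed label are illustrative, not part of the talk):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

class StubPipeline:
    """Stand-in for the trained scikit-learn pipeline."""
    def predict(self, messages):
        # Always predicts "education" -- enough to test the plumbing.
        return ["education"] * len(messages)

pipeline = StubPipeline()

@app.route('/chat-classification-api/messages', methods=['POST'])
def classify_messages():
    messages = request.get_json()
    y = pipeline.predict(messages)
    predictions = [{"chat_id": m["chat_id"], "class_label": label}
                   for m, label in zip(messages, y)]
    return jsonify(predictions)

# Exercise the endpoint in-process, no server required.
client = app.test_client()
resp = client.post('/chat-classification-api/messages',
                   json=[{"chat_id": 1,
                          "subject": "Lost Card",
                          "body": "Can you send me a new card?"}])
```

Swapping the stub for the real pipeline turns the same test into an integration check on the serialized model.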
• Train and test in a batch environment
• Output serialized model and classification report
• sklearn.pipeline is convenient for storing code+params
• Serve on-demand predictions separately
• Treat this like any production service
Recap
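The batch/serve split in the recap hinges on serializing the fitted pipeline in one place and loading it in another. A minimal sketch using joblib, the persistence tool the scikit-learn docs recommend (the pipeline contents and file name here are illustrative):

```python
import joblib
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Batch side: fit on a toy corpus, then write the whole pipeline
# (vectorizer vocabulary + classifier weights) to disk as one artifact.
pipeline = Pipeline([("tfidf", TfidfVectorizer()),
                     ("clf", LogisticRegression())])
pipeline.fit(["lost card help please", "how do mail deposits work"],
             ["urgent", "education"])
joblib.dump(pipeline, "chat_classifier.joblib")

# Serving side: load once at startup, then call predict() per request.
restored = joblib.load("chat_classifier.joblib")
```

Keeping preprocessing inside the pipeline means the serving process never has to reimplement feature extraction; it only loads the artifact and predicts.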