
Data Product Architectures


Data products derive their value from data and generate new data in return; as a result, machine learning techniques must be applied to their architecture and development. Machine learning fits models to make predictions on unknown inputs, and those models must be generalizable and adaptable. As such, fitted models cannot exist in isolation; they must be operationalized and user-facing so that applications can benefit from the new data, respond to it, and feed it back into the data product. Data product architectures are therefore life cycles, and understanding the data product life cycle enables architects to develop robust, failure-free workflows and applications. In this talk we discuss the data product life cycle, exploring how to connect a model build, evaluation, and selection phase with an operation and interaction phase. Following the lambda architecture, we investigate wrapping a central computational store for speed and querying, and we incorporate a discussion of monitoring, management, and data exploration for hypothesis-driven development. From web applications to big data appliances, this architecture serves as a blueprint for handling data services of all sizes!

Published in: Technology


  1. Data Product Architectures. Benjamin Bengfort (@bbengfort), District Data Labs
  2. Abstract
  3. What is data science? Or what is the goal of data science? Or why do they pay us so much?
  4. Two Objectives: Orient Data Science to Users
  5. Data Products are self-adapting, broadly applicable software-based engines that derive their value from data and generate more data by influencing human behavior or by making inferences or predictions upon new data.
  6. Data Products are Applications that Employ Many Machine Learning Models
  7. Data Report
  8. Without Feedback, Models are Disconnected: they cannot adapt, tune, or react.
  9. Data Products aren’t single models. So how do we architect data products?
  10. The Lambda Architecture
  11. Three Case Studies
  12. Analyst Architecture
  13. Analyst Architecture: Document Review
  14. Analyst Architecture: Triggers
  15. Recommender Architecture
  16. Recommender: Annotation Service
  17. Partisan Discourse Architecture
  18. Partisan Discourse: Adding Documents
  19. Partisan Discourse: Documents
  20. Partisan Discourse: User-Specific Models
  21. Commonalities?
  22. Microservices Architecture: Smart Endpoints, Dumb Pipes (diagram: HTTP connections between stateful and database-backed services)
  23. Django Application Model
  24. Class-Based, Definitional Programming

      from django.db import models
      from rest_framework import serializers as rf
      from rest_framework import viewsets

      class Instance(models.Model):
          # choices expects (value, label) pairs; CharField requires max_length
          SHAPES = [('square', 'square'), ('triangle', 'triangle'), ('circle', 'circle')]

          color = models.CharField(max_length=32, default='red')
          shape = models.CharField(max_length=32, choices=SHAPES)
          amount = models.IntegerField()

      class InstanceSerializer(rf.ModelSerializer):
          prediction = rf.CharField(read_only=True)

          class Meta:
              model = Instance
              # a declared field must also be listed in fields
              fields = ('color', 'shape', 'amount', 'prediction')

      class InstanceViewSet(viewsets.ModelViewSet):
          queryset = Instance.objects.all()
          serializer_class = InstanceSerializer

          def list(self, request): pass
          def create(self, request): pass
          def retrieve(self, request, pk=None): pass
          def update(self, request, pk=None): pass
          def destroy(self, request, pk=None): pass
  25. Features and Instances as Star Schema
  26. REST API Feature Interaction
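The star schema of slide 25 can be sketched with a throwaway SQLite database; the table and column names here (instances, features, instance_id) are illustrative assumptions, not the deck's actual schema:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    -- Fact table: one row per instance seen by the application.
    CREATE TABLE instances (
        id      INTEGER PRIMARY KEY,
        created TEXT,
        label   TEXT
    );
    -- Dimension table: one row per feature value, keyed to the fact table.
    CREATE TABLE features (
        id          INTEGER PRIMARY KEY,
        instance_id INTEGER REFERENCES instances(id),
        name        TEXT,
        value       TEXT
    );
""")
conn.execute("INSERT INTO instances VALUES (1, '2016-10-06', 'square')")
conn.execute("INSERT INTO features VALUES (1, 1, 'color', 'red')")

# Joining the dimension back onto the fact table reassembles an instance.
row = conn.execute("""
    SELECT instances.label, features.value
    FROM instances JOIN features ON features.instance_id = instances.id
""").fetchone()
print(row)   # ('square', 'red')
```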
  27. Model (ML) Build Process: Export Instance Table

      -- COPY (query) TO takes no alias; table references made consistent
      COPY (
          SELECT instances.*
          FROM instances
          JOIN feature ON feature.id = instances.id
          ...
          ORDER BY instances.created
          LIMIT 10000
      ) TO '/tmp/instances.csv' DELIMITER ',' CSV HEADER;
  28. Model (ML) Build Process: Build Model

      import pandas as pd
      from sklearn.svm import SVC
      from sklearn.model_selection import KFold   # sklearn.cross_validation was removed

      # Load data; 'target' stands in for whatever the label column is called
      data = pd.read_csv('/tmp/instances.csv')
      X = data.drop('target', axis=1)
      y = data['target']
      scores = []

      # Evaluation: 12-fold cross validation
      folds = KFold(n_splits=12)
      for train, test in folds.split(X):
          model = SVC()
          model.fit(X.iloc[train], y.iloc[train])   # fit needs both X and y
          scores.append(model.score(X.iloc[test], y.iloc[test]))

      # Build the actual model on all of the data
      model = SVC()
      model.fit(X, y)
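The evaluation loop on slide 28 can be written more compactly with cross_val_score; a sketch on synthetic stand-in data (X and y here are random placeholders, not the exported instances):

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

# Random stand-in for the exported instance table: 120 rows, 3 features.
rng = np.random.RandomState(42)
X = rng.rand(120, 3)
y = (X[:, 0] > 0.5).astype(int)

# One score per fold; equivalent to the explicit fit/score loop on the slide.
scores = cross_val_score(SVC(), X, y, cv=KFold(n_splits=12))
print(len(scores))   # 12
```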
  29. Model (ML) Build Process: Store Model

      import base64
      import datetime
      import pickle

      data = pickle.dumps(model)        # dumps (to bytes), not dump (to file)
      data = base64.b64encode(data)     # base64.b64encode, not base64.base64encode

      return {
          "model": data,
          "created": datetime.datetime.now(),
          "form": repr(model),
          "name": model.__class__.__name__,
          "scores": scores,
      }
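Slide 29 covers only the store direction; loading is the inverse pair of calls. A minimal stdlib-only sketch (store_model and load_model are hypothetical helper names, with a plain dict standing in for a fitted model):

```python
import base64
import pickle

def store_model(model):
    """Serialize a fitted model to a base64 text blob, as on the slide."""
    return base64.b64encode(pickle.dumps(model)).decode('ascii')

def load_model(blob):
    """Rehydrate a model previously stored with store_model."""
    return pickle.loads(base64.b64decode(blob))

# Round trip with a stand-in "model" object.
restored = load_model(store_model({'name': 'SVC', 'C': 1.0}))
print(restored)   # {'name': 'SVC', 'C': 1.0}
```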
  30. Model Data Storage

      from django.db import models

      class PredictiveModel(models.Model):
          name = models.CharField(max_length=255)   # CharField requires max_length
          params = models.JSONField()
          build = models.FloatField()
          f1_score = models.FloatField()
          created = models.DateTimeField()
          data = models.BinaryField()
  31. REST API Model Interaction (diagram: requests pass through featurize() and predict() against models stored in memory, and predictions update the stored annotations)
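The flow on slide 31 (featurize a request, predict against an in-memory model, feed the prediction back as an annotation) can be sketched as below; ModelRegistry, its method names, and the stand-in model are hypothetical, stdlib-only illustrations:

```python
class ModelRegistry:
    """Holds the latest fitted model in process memory so prediction
    does not require a database round trip on every request."""

    def __init__(self):
        self._model = None
        self.annotations = []        # predictions fed back into the data product

    def update(self, model):
        """Swap in a freshly built model, e.g. after a nightly rebuild."""
        self._model = model

    def featurize(self, instance):
        """Order the raw request fields into the vector the model expects."""
        return [instance['color'], instance['shape'], instance['amount']]

    def predict(self, instance):
        label = self._model.predict(self.featurize(instance))
        self.annotations.append((instance, label))   # the feedback loop
        return label


class MajorityModel:
    """Stand-in with a scikit-learn-style predict() for the sketch."""
    def predict(self, features):
        return 'square'


registry = ModelRegistry()
registry.update(MajorityModel())
label = registry.predict({'color': 'red', 'shape': None, 'amount': 3})
print(label)   # square
```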
  32. Build Data Products!
  33. Questions?