
[DSC Europe 22] Engineers guide for shepherding models in to production - Marko Dimitrijevic




Sometimes just creating a good model is not enough: we need to enable people to use it, and that often means making it part of a bigger system or deploying it somehow. This talk takes an engineering point of view on how we work with a data scientist, or a team of them, to make sure a model is production ready. Here is a short checklist of things we would do for each model: 1. understand what the model is trying to do/predict; 2. define all of the model inputs and outputs; 3. define the point (both a point in time and an integration point) in the wider system where the model is called; 4. define how we want to host the model. We on the engineering team usually help make sure we can gather all of the model inputs and process all of the model outputs. We also make sure models are fast and reliable to call in a production environment, help optimize them for that, and help enforce good engineering practices that rub off on DS people and make them more efficient. In this talk we will see a few examples of how we do things and what to look for.




1. Engineers guide for shepherding models in to production
Marko Dimitrijević
Staff software engineer at Vroom
marko.dimitrijevic@vast.com
m_a_r_e.91@hotmail.com
2. What is an example of a data science model?
● Takes in data
● Learns from data
● Gets updated
● Produces results
● Provides value
3. What does it mean to run a model in a production environment? And what does it mean to run a model in real time?
● Returns a result every time (or a meaningful error)
● Inputs and outputs are well defined and explained
● Performant and reasonably optimized for the task
● Results can be depended on and are verified to be correct
● The system is able to handle errors and edge cases
4. Two main topics we will cover
Building the model pipeline
● Defining inputs and outputs
● Defining a call place
● Collecting inputs and delivering outputs
● Handling edge cases
Hosting the model
● Defining resource needs
● Optimizing model code
● Integrating with platform code
● Testing and iterating
5. What does the engineering team do?
We also rename stuff to follow standards!
● We take care of monitoring and scaling to match the load
● We take care of the model hosting and the plumbing for providing inputs and outputs
● We advise on what is doable and what is not, and help find the optimal solution
● We help rework models to run faster, better, or to follow good practices
● We know to ask the right questions and prevent future problems
6. How a model pipeline is built
Assuming the research part is done and an algorithm is picked:
● Produce the model
● Identify where we can get the data for training
● Find a way to get all the inputs the model needs
● Find the right location to integrate the model and call it
● Deliver the model outputs to the right place
7. When a model trains on processed data
Not a good way to do it!
8. Common issues in building model pipelines
The most common issues are data related.
● The model uses data upstream of the place from where it will be called. This can lead to inputs that take not raw data but some processed/aggregated version of it that is not available at the model call place.
● Data gets renamed many times on its way from the source to model training. Sometimes people call the same thing different names, and vice versa; this can lead to confusion and to the wrong inputs being used. If I want a location, is that the location of the customer, the location of the vehicle, or the address of the reconditioning center handling the vehicle?
● Incomplete data is filtered out in training, but can it be in the real world? It is easy to ignore incomplete data when there is a lot of leftover data to work with, but in the real world pieces of information are often missing, and sometimes a prediction is better than no prediction. Coverage can be very low for models that are strict about their inputs.
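The last point above can be handled with a small preparation step in front of the model. This is an illustrative sketch, not from the talk: the field names and fallback values are hypothetical, and a real system would derive the fallbacks from training data.

```python
# Illustrative sketch: keep model coverage high by imputing missing inputs
# at call time instead of refusing to predict. Field names and fallback
# values are hypothetical.

FALLBACKS = {
    "mileage": 60_000,   # e.g. a fleet-average mileage from training data
    "owner_count": 1,
    "state": "UNKNOWN",  # explicit "missing" key the model was trained with
}

def prepare_inputs(raw: dict) -> dict:
    """Return a complete input dict, filling absent or None fields."""
    prepared = {}
    for name, fallback in FALLBACKS.items():
        value = raw.get(name)
        prepared[name] = fallback if value is None else value
    return prepared
```

For this to work, the model has to have seen the fallback values (like the explicit "UNKNOWN" state) during training, which ties back to not filtering incomplete rows out entirely.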
9. Common issues in building model pipelines
Potential timing issues:
● If the model is called in real time and a user is waiting for a response, it should be <1s. Sometimes models are not built to operate fast on a single input but are optimized for batch processing. We also often need to spend time querying different services to prepare the model inputs, and that time adds up to the total response time to the user.
● Once the model is done and needs to be integrated, the process takes a long time. Inserting a model into a production flow can take time, and if we need to build a pipeline that gathers multiple different and “exotic” inputs, that can involve many teams and be time consuming.
● It is hard to identify the right moment in time to trigger the model. We often want to predict something as early in the process as possible, but we also want to have as many inputs as possible. Sometimes inputs become available over time, and we have to decide how long to wait.
10. Some tips for building models
● Build for the data available at model call time/place. Figure out what data is actually available at the place in the system that is going to use the model, and at what point in time the model is going to be called and what data we have at that point. Expect to have missing data at runtime and prepare for it.
● Describe your inputs and outputs. When defining inputs, give each one a description that helps people figure out what it is.
● Communicate early and optimize. Start a conversation with people early in the process about what the model needs to run, so it can be planned for, and be ready to iterate on the inputs to adjust for the system's limitations. Optimize for the way data is being processed (batch or a single piece at a time) and make sure the model is fast enough for the intended use case.
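One lightweight way to follow the "describe your inputs and outputs" tip is to make the descriptions part of the input definition itself, so they travel with the code. A hypothetical sketch (the schema and field names are made up for illustration):

```python
from dataclasses import dataclass, field, fields

# Hypothetical input schema: each input carries a description, so a consumer
# can tell, for example, *which* location a field actually refers to.

@dataclass
class ReconCostInput:
    vehicle_year: int = field(metadata={"description": "Model year of the vehicle"})
    vehicle_mileage: int = field(metadata={"description": "Odometer reading in miles"})
    recon_center_state: str = field(
        metadata={"description": "US state of the reconditioning center handling the vehicle"}
    )

def describe(schema) -> dict:
    """Collect per-field descriptions, e.g. to render input documentation."""
    return {f.name: f.metadata["description"] for f in fields(schema)}
```

The same structure can then be reused to validate incoming requests, so the documentation and the runtime contract cannot drift apart.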
11. Some more tips for building models
● Build lookup tables for “static” data. Not everything has to be provided to the model; consider making lookup tables that contain some of the “static”, slower-changing inputs. For example, average user activity per day per state, or popularity per year, make, model, and trim of a vehicle. Also include keys for missing values in the lookup tables. Bundle the lookup tables with the model and make them an output of the training process, so each time the model is updated the tables can be updated as well.
● Retrain often and include previous model tracking. Automate the model training process and retrain the model often if the data is changing. Set up tracking of the model once it is in production and record inputs and outputs. Compare those values with what is expected and include them in future model training.
● Be careful with string inputs. Data normalization might not happen at the source of the data, and some edge-case values might appear rarely enough to be missed. Define a set of acceptable inputs or do normalization at runtime.
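The lookup-table tip can be sketched like this. The values and the missing-key convention are illustrative only; in the setup described above, this table would be produced by the training process and shipped alongside the model.

```python
# Illustrative lookup table of "static", slower-changing inputs, bundled
# with the model and regenerated on each training cycle. A dedicated key
# provides the fallback for combinations missing from the table.

MISSING = "__missing__"

AVG_POPULARITY = {
    ("2020", "Toyota", "Camry", "SE"): 0.83,
    ("2019", "Honda", "Civic", "LX"): 0.77,
    MISSING: 0.50,  # fallback used when the exact combination is unknown
}

def popularity(year: str, make: str, model: str, trim: str) -> float:
    """Resolve a per-vehicle popularity score, falling back for unseen keys."""
    return AVG_POPULARITY.get((year, make, model, trim), AVG_POPULARITY[MISSING])
```

Because the table ships with the model data, the model and its lookups are always updated together and cannot go out of sync.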
12. How we host our models
We support thousands of requests per second and a few tens of milliseconds of response time in some cases! This is an example of how we could host a model and does not have to represent the real Vroom approach.
● Iterate on the model code with the DS owner many times
● Provide a template for how to deliver and build a model
● Provide a stable, fast, and reliable platform that makes calling models easy
● Provide an easy way to run many versions of the model and to add new versions
● Be ready to evolve to satisfy requirements and add new features
13. Example of a hosted model
● Each time the model is trained, a new model data version is created
● Model calling code is shared by all model data versions
● The service can load multiple versions at the same time
● The model data version is picked dynamically
● An alias can be defined and pointed to a specific model data version
Simplified example system
14. How to define model code and model data
Model data
● Produced by training, owned by DS, kept on remote storage
● Consists of the model artifact and lookup files. The artifact is packed-up code (like a .pkl file) that has a simple API to call a predict function. Lookup files should hold values needed for input processing; they can be used by the model code, and values from them are passed in to the model artifact.
● Should be updated often. New model data can be produced with each training cycle; the API and format should remain the same so it stays compatible with the model code using it.
Model code
● Owned by the ENG team, located in the hosting service
● Should not be changed often (ever). It is intended to do pre-processing and post-processing of inputs/outputs using the lookup files.
● Should not contain model logic. All the logic should be in the model artifact that is called by the model code.
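The split above could look roughly like this in code. A minimal sketch, assuming a pickled artifact that exposes a `predict()` method and a JSON lookup file; the file names and the `avg_price` lookup are made up for illustration.

```python
import json
import pickle
from pathlib import Path

class ModelCode:
    """Generic calling code, owned by engineering. It pairs one model data
    version (artifact + lookup files) with pre/post processing; all model
    logic stays inside the artifact."""

    def __init__(self, artifact, lookups: dict):
        self.artifact = artifact  # packed-up code exposing a predict() API
        self.lookups = lookups    # values used for input processing

    @classmethod
    def from_model_data(cls, model_data_dir: Path) -> "ModelCode":
        # One directory per model data version, produced by each training run.
        artifact = pickle.loads((model_data_dir / "artifact.pkl").read_bytes())
        lookups = json.loads((model_data_dir / "lookups.json").read_text())
        return cls(artifact, lookups)

    def predict(self, raw_inputs: dict) -> dict:
        # Pre-processing: resolve lookup-backed features, then delegate to
        # the artifact; post-processing wraps the result for the caller.
        features = dict(raw_inputs)
        features.setdefault("avg_price", self.lookups.get(features.get("make"), 0))
        return {"prediction": self.artifact.predict(features)}
```

Because `ModelCode` never changes with retraining, swapping in a new model data version is just pointing `from_model_data` at a new directory.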
15. How to integrate a model
● Write some code to invoke the trained model. Define generalized calling code that is parametrized by lookup files and knows how to pass in data and interpret the result. Make that into the model code.
● Add all dependencies. Identify all the dependencies the model needs to run, all inputs, and all files; extract some inputs into lookups and make them into the model data.
● Deliver model data to a remote location. Define the location where model data is delivered and the way the data is structured.
● Iterate, iterate, iterate. Analyze performance and speed and iterate on improving them: convert from batch processing to single-input processing, change the data types used, or rework code blocks that are suboptimal. Promote the model through environments and compare results with expected results.
● Integrate with the other service and start using it in production.
16. What does a platform like this offer?
● Easy and fast integration. Simple calling-code logic to invoke the model, written in Python.
● Easy and fast model updates. Model data is delivered to a remote location at will, at any cadence. The service automatically scans for and detects new model data versions periodically at runtime.
● Easy and fast testing. Detected model data versions can be loaded (or unloaded) and hosted dynamically. The service hosts multiple model data versions of a model at the same time, allowing for easy comparison, A/B testing, and independent updates of different use cases. When calling the service, the specific model data version that should be used can be specified. If the requested version is loaded, the response is sub-second; if that version is discovered but not yet loaded, the service downloads the required model data, loads it up, starts hosting it, and still returns the result within the same request, with a delay. After the first load, subsequent calls are fast.
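The load-on-first-request behaviour described above can be sketched as a small registry. This is an assumed simplification of what the talk describes, reduced to a single thread and with the download/load step abstracted into a callable:

```python
class ModelVersionRegistry:
    """Sketch of lazy model-data-version loading: the first request for a
    discovered-but-unloaded version pays the download/load cost once, inside
    that request; later calls hit the in-memory copy. Single-threaded
    simplification, illustrative only."""

    def __init__(self, load_version):
        self._load_version = load_version  # callable: version -> model object
        self._loaded = {}                  # version -> hosted model

    def predict(self, version: str, inputs: dict):
        if version not in self._loaded:
            # Discovered but not loaded yet: fetch and host it now, so the
            # caller still gets an answer within the same request.
            self._loaded[version] = self._load_version(version)
        return self._loaded[version].predict(inputs)
```

A production version would add locking around the load, eviction of unused versions, and the periodic scan that discovers new model data.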
17. What more does a platform like this offer?
● Easy data manipulation. You can define calling code to transform your input before passing it to the model, and do the same thing with your outputs.
● Monitoring and constant uptime are built in. We monitor and report various metrics and have alerting for potential issues.
● Designed for scaling and cost optimization. We can scale horizontally to cover huge loads, with automatic load-based scaling and automatic replacement of unhealthy hosts with continuous uptime. We can host models with high resource requirements in an optimized way. We run efficiently with multiple model data versions sharing the same server, and also with multiple different models, each with multiple model data versions, sharing the same server. This cuts down on cost dramatically. It might not seem important with only one or two models in use, but when we get into tens of models, each with many different model data versions, that can become a huge cost if each model runs on a separate server.
18. A peek under the hood
● We use a Python-based service for model hosting to make DS integration of calling code easy
● We expose a set of REST API routes for each supported model
● We run on AWS EC2 instances directly and support GPU- and CPU-based models
● We use S3 to store model data
● We run batch predictions and one-by-one real-time predictions and produce millions of predictions per day
● Models are called by services that specialize in orchestration, data gathering, and caching

host/some-model/1/predict?modelDataVersion=last_one
Request:
[{
  "vehicle": {
    "year": 2019,
    "make": "Lamborghini",
    "model": "Huracan Spyder"
    ....
  }
  other inputs....
}]
Response:
{
  "modelDataVersion": "2022-11-11",
  "prediction": [
    {
      "reconditioningCost": "a lot of money :)"
    }
  ]
}
*fictional api example
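A caller of a route like the fictional one above might build its request as follows. Only the URL shape comes from the slide's fictional example; the function, host, and payload are assumptions for illustration.

```python
import json
from urllib.request import Request

def build_predict_request(host: str, model: str, model_data_version: str,
                          inputs: list) -> Request:
    """Build an HTTP POST for the fictional predict route on the hosting
    service, pinning a specific model data version via the query string."""
    url = f"{host}/{model}/1/predict?modelDataVersion={model_data_version}"
    body = json.dumps(inputs).encode("utf-8")
    return Request(url, data=body, headers={"Content-Type": "application/json"})
```

Pinning the version in the query string is what lets two callers compare model data versions side by side against the same service.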
19. Let’s summarize
We also do a lot of other interesting stuff!
● Start building the pipeline early in the process
● Don’t build a model in a silo
● Ability to run many models in parallel, and many model versions for each of them
● Automated support for updating models
● Ability to run efficiently and scale to match high loads
20. Vroom is hiring! Reach out to our recruiters via LinkedIn to find out more, or send us your CV at vroomcareer@vast.com
21. Questions?
22. S3 structure example
23. Model loading example
