9. Functionalities
Identify dependencies – producers and consumers.
Ensure server and application health monitoring/alerts are set
up.
Ensure all the dependent components are up and running.
Restart services and applications automatically if they are not
working as desired.
If the restart does not help, send out an email notification with
logs attached.
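
The steps above could be sketched roughly as follows. This is a minimal illustration, not the team's actual tool: the `Service` class, `start_order`, `build_alert`, `watch_once`, the status strings, and the example.com addresses are all hypothetical names chosen for this sketch. The health check, restart, log reader, and mail sender are passed in as plain callables so the flow can be tried without touching real servers.

```python
from dataclasses import dataclass, field
from email.message import EmailMessage
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable, Dict, List


@dataclass
class Service:
    """Hypothetical record: a service and the producers it consumes from."""
    name: str
    depends_on: List[str] = field(default_factory=list)


def start_order(services: Dict[str, Service]) -> List[str]:
    """Order services so that producers come before their consumers."""
    graph = {s.name: set(s.depends_on) for s in services.values()}
    return list(TopologicalSorter(graph).static_order())


def build_alert(service: str, log_text: str,
                sender: str = "watchdog@example.com",    # assumed address
                recipient: str = "team@example.com"      # assumed address
                ) -> EmailMessage:
    """Build an email notification with the service log attached."""
    msg = EmailMessage()
    msg["Subject"] = f"[watchdog] {service} is still down after restart"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"{service} failed its health check and did not recover.")
    msg.add_attachment(log_text, filename=f"{service}.log")
    return msg


def watch_once(services: Dict[str, Service],
               is_healthy: Callable[[str], bool],
               restart: Callable[[str], None],
               read_logs: Callable[[str], str],
               send: Callable[[EmailMessage], None]) -> Dict[str, str]:
    """Check every service once, in dependency order.

    Unhealthy services are restarted; if a service is still unhealthy
    after the restart, an alert with its logs is handed to `send`.
    """
    status: Dict[str, str] = {}
    for name in start_order(services):
        if is_healthy(name):
            status[name] = "healthy"
            continue
        restart(name)
        if is_healthy(name):
            status[name] = "restarted"
        else:
            status[name] = "failed"
            send(build_alert(name, read_logs(name)))
    return status
```

In a real setup, `restart` might wrap a service-manager command and `send` might wrap `smtplib.SMTP(...).send_message`; both depend on the environment and are left to the caller here. Walking the services in dependency order also means the first entry marked "failed" is a likely starting point when looking for the root cause of downtime.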
11. Key pointers
Use a human-centered design approach to solve the
problem
Identify multiple metrics to assess training and monitoring
When possible, directly examine your raw data
Understand the limitations of your dataset and model
Test, test and test.
Continue to monitor and update the system after deployment
12. Impact on the team
• Improved uptime, with largely no manual intervention or debugging.
• Know the exact point of failure for downtime.
• Consistent test results without false positives.
• Drastic productivity boost by avoiding manual debugging.
• Extensibility to reliability/failover testing, leading to graceful
failure handling.
• We have time for solving bigger problems!