Financial Services companies are using machine learning to reduce fraud, streamline processes, and improve their bottom line. AWS provides tools that make it easy to use frameworks like MXNet and TensorFlow for predictive analytics, clustering, and more advanced data analyses. In this session, you'll hear how IHS Markit has used machine learning on AWS to help global banking institutions manage their commodities portfolios. You will also learn how Amazon Machine Learning can take the hassle out of AI.
2. Takeaways from today’s session
I. How is machine learning changing Financial Services?
II. Case Study: IHS Markit’s commodity inventory solution
III. How do customers get started?
IV. Best practices when building machine learning in the cloud
4. Artificial Intelligence
A system or service that can perform tasks usually requiring human intelligence.
[Diagram: nested circles, AI ⊃ ML ⊃ DL. AI: hardware + learning algorithms. ML: fit a function to the data. DL: fit a network structure to the data.]
7. The challenge: SCALE
• Data: PBs and EBs of existing data; aggressive migration
• Training: tons of GPUs; elastic capacity; pre-built images
• Prediction: tons of GPUs and CPUs; serverless; at the edge, on IoT devices
9. Financial Services use cases

Business groups and use cases:
• Asset Management: equity ratings, portfolio management, portfolio optimization
• Banking: credit card/lending fraud, customer onboarding, digital banking
• Compliance: anti-money laundering, counter-terrorism financing, due diligence
• Security: cyber-attack detection, data leak detection, virus detection

ML models:
- Machine learning (clustering, SVM, logistic regression, Gaussian processes)
- Deep learning (embeddings, matrix factorization, LSTM, conversational UI)
- Graphical models

AWS tools:
- Amazon Machine Learning
- Deep learning with MXNet
- Spark ML on Amazon EMR
- TensorFlow
12. Commodity inventory challenges
• Global warehouses are all different
• Diverse flow of paper documents, from warehouse inventory reports to bills of lading
• Digitize, classify, and aggregate the document flow
• Improve cost and risk management
• Localization diversity
• Eliminate error-prone manual operations
13. Commodity inventory solution

[Diagram: global warehouse locations and aluminum inventories. Singapore (Al 123.4 mT), Vlissingen (Al 234.5 mT), Qingdao (Al 567.8 mT)]

1. Report generation (vendors): Vendors continue to generate their reports as usual. Documents are redirected to a Markit server.
2. Process (IHS Markit): Markit scrapes the critical data from each warehouse's unique report using a variety of technologies, including OCR.
3. Audit: Markit stores a digital copy of the original document for auditing purposes.
4. Disseminate: The data is standardized for language and terminology, then compiled onto a customer portal.
5. Data access (trading firms): Dependent teams at the trading firms (logistics teams, finance teams, compliance teams, schedulers/traders) connect to the portal to pull a personalized view of the required data.
14. IHS Markit – cloud strategy
• Cloud first strategy
• Cloud native vs. lift and shift
• Time-to-market focus
• Investment upfront
• Refreshing the estate
• Financial transparency
• A DevOps world
• Global scale
• Cost savings in the end
15. Architecture approach
• Microservices design approach
• PaaS supporting orchestration and discovery
• OCR, classification, and analysis of documents
• Cloud enablement with some lock-in
• API driven
• Modern UI, SSO, Preferences, Notifications
• SDLC – Agile, CI/CD, high velocity of change
• Highly available (Multi-AZ/Global)
16. Machine learning enablement
• Document classification and templates to reduce exceptions
• Configuration data and models (Keras + Tensorflow)
• Use of prebuilt ML AMI-based images for tooling vs. DIY
• AWS compute with GPU on-demand for trainer
• Data lake of document classifications (strategic)
[Pipeline: document repository → OCR → classification → predictor → QA review, exposed via ML doc APIs; corrected predictions feed back into the classification trainer]
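The document pipeline described on this slide can be sketched as a chain of composed functions. Every function below is a stub standing in for the real components (OCR engines, Keras + TensorFlow classifiers, the QA review step); the names and field values are hypothetical, chosen only to show how corrected output flows back toward the trainer.

```python
# Minimal sketch of the pipeline: OCR -> classification -> prediction -> QA review.
def ocr(document):
    # stub for the OCR stage: extract raw text from a scanned document
    return document["raw_text"]

def classify(text):
    # stub classifier: a real model would match text to a warehouse-report template
    return "warehouse_inventory" if "inventory" in text else "unknown"

def predict(text, doc_class):
    # stub extraction of structured fields from a classified document
    return {"class": doc_class, "fields": {"metal": "Al"}}

def qa_review(prediction):
    # corrected predictions from this step would feed the classification trainer
    prediction["reviewed"] = True
    return prediction

doc = {"raw_text": "monthly inventory report"}
text = ocr(doc)
result = qa_review(predict(text, classify(text)))
print(result["class"], result["reviewed"])
```

The design point the slide makes is the feedback loop: QA-corrected predictions become new training data, which is why the trainer sits beside, not inside, the serving path.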
17. ML product maintenance

[Architecture: P2 instances running an ML AMI handle training. A job scheduler trains models with new data, and new models are designed as needed; trained models are loaded by the production predictor (AWS Lambda), which handles new documents and prediction corrections. Documents and results flow through a data lake on Amazon S3.]
19. Commodity tracker benefits
• 80% of docs STP (straight-through processing): focus human effort on the exceptions.
• 60% improvement in timeliness: reduce risk by cutting the lag between data collection and use.
• 50% increase in efficiency: empower teams to focus energy on value-creation activities.
21. Amazon Machine Learning (Amazon ML)
Easy-to-use, managed machine learning
service built for developers
Robust, powerful machine learning
technology based on Amazon’s internal
systems
Create models using your data already
stored in the AWS cloud
Deploy models to production in seconds
22. Easy to use and developer-friendly
Use the intuitive, powerful service console to build
and explore your initial models
• Data retrieval
• Model training, quality evaluation, fine-tuning
• Deployment and management
Automate model lifecycle with fully featured APIs and
SDKs
• Java, Python, .NET, JavaScript, Ruby, PHP
Easily create smart iOS and Android applications with
AWS Mobile SDK
23. Powerful machine learning technology
Based on Amazon’s battle-hardened internal systems
Not just the algorithms:
• Smart data transformations
• Input data and model quality alerts
• Built-in industry best practices
Grows with your needs
• Train on up to 100 GB of data
• Generate billions of predictions
• Obtain predictions in batches or real-time
24. Integrated with the AWS data ecosystem
Access data that is stored in Amazon S3,
Amazon Redshift, or MySQL databases in
Amazon RDS
Output predictions to Amazon S3 for easy
integration with your data flows
Use AWS Identity and Access Management
(IAM) for fine-grained data access permission
policies
25. Fully managed model and prediction services
End-to-end service, with no servers to provision
and manage
One-click production model deployment
Programmatically query model metadata to
enable automatic retraining workflows
Monitor prediction usage patterns with Amazon
CloudWatch metrics
26. Three supported types of predictions
Binary classification
• Predict the answer to a Yes/No question
Multiclass classification
• Predict the correct category from a list
Regression
• Predict the value of a numeric variable
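The three prediction types map directly onto the `ml_model_type` strings used in the `create_ml_model` call shown later. The chooser function below is our own illustration, not part of the service API; only the three uppercase type strings come from Amazon ML.

```python
# The three Amazon ML model types, keyed by the kind of question answered.
MODEL_TYPES = {
    "yes/no": "BINARY",        # binary classification
    "category": "MULTICLASS",  # multiclass classification
    "number": "REGRESSION",    # numeric regression
}

def model_type_for(question_kind):
    """Return the ml_model_type string for a kind of question (illustrative helper)."""
    return MODEL_TYPES[question_kind]

print(model_type_for("number"))  # REGRESSION
```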
31. Building smart applications with Amazon ML
Step 1 of 3: Train model
- Create a datasource object pointing to your data
- Explore and understand your data
- Transform data and train your model
32. Create a datasource object
>>> import boto
>>> ml = boto.connect_machinelearning()
>>> ds = ml.create_data_source_from_s3(
...     data_source_id='my_datasource',
...     data_spec={
...         'DataLocationS3': 's3://bucket/input/data.csv',
...         'DataSchemaLocationS3': 's3://bucket/input/data.schema'},
...     compute_statistics=True)
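The `data.schema` file referenced in the datasource call is a JSON document describing each column. A minimal sketch is below; the attribute names and target are hypothetical, while the `attributeType` values (BINARY, CATEGORICAL, NUMERIC, TEXT) are the ones Amazon ML recognizes.

```python
import json

# Illustrative Amazon ML schema for a CSV with a hypothetical binary target.
schema = {
    "version": "1.0",
    "targetAttributeName": "will_default",   # hypothetical target column
    "dataFormat": "CSV",
    "dataFileContainsHeader": True,
    "attributes": [
        {"attributeName": "age", "attributeType": "NUMERIC"},
        {"attributeName": "employment", "attributeType": "CATEGORICAL"},
        {"attributeName": "notes", "attributeType": "TEXT"},
        {"attributeName": "will_default", "attributeType": "BINARY"},
    ],
}

schema_json = json.dumps(schema, indent=2)
print(schema_json)
```

This file would be uploaded to the `DataSchemaLocationS3` path before creating the datasource.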
34. Train your model
>>> import boto
>>> ml = boto.connect_machinelearning()
>>> model = ml.create_ml_model(
...     ml_model_id='my_model',
...     ml_model_type='REGRESSION',
...     training_data_source_id='my_datasource')
35. Building smart applications with Amazon ML
Step 2 of 3: Evaluate and optimize
- Measure and understand model quality
- Adjust model interpretation
39. Building smart applications with Amazon ML
Step 3 of 3: Retrieve predictions
- Batch predictions
- Real-time predictions
40. Batch predictions
Asynchronous, large-volume prediction generation
Request through service console or API
Best for applications that deal with batches of data records
>>> import boto
>>> ml = boto.connect_machinelearning()
>>> bp = ml.create_batch_prediction(
...     batch_prediction_id='my_batch_prediction',
...     batch_prediction_data_source_id='my_datasource',
...     ml_model_id='my_model',
...     output_uri='s3://examplebucket/output/')
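Because batch predictions are asynchronous, an application typically polls the job until it reaches a terminal status before reading the output from S3. Below is a sketch of such a loop; `poll` is a stub standing in for a call like `ml.get_batch_prediction(batch_prediction_id)`, and the terminal statuses match Amazon ML's COMPLETED/FAILED states.

```python
import time

def wait_for_batch(poll, delay=0.0, max_tries=10):
    """Poll a batch prediction until it completes or fails (illustrative sketch)."""
    for _ in range(max_tries):
        status = poll()["Status"]
        if status in ("COMPLETED", "FAILED"):
            return status
        time.sleep(delay)  # back off between polls
    raise TimeoutError("batch prediction did not finish")

# Stubbed status sequence standing in for repeated service calls:
responses = iter([{"Status": "PENDING"}, {"Status": "INPROGRESS"},
                  {"Status": "COMPLETED"}])
print(wait_for_batch(lambda: next(responses)))  # COMPLETED
```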
41. Real-time predictions
Synchronous, low-latency, high-throughput prediction generation
Request through service API, server, or mobile SDKs
Best for interactive applications that deal with individual data records
>>> import boto
>>> ml = boto.connect_machinelearning()
>>> ml.predict(
...     ml_model_id='my_model',
...     predict_endpoint='example_endpoint',
...     record={'key1': 'value1', 'key2': 'value2'})
{
    'Prediction': {
        'predictedValue': 13.284348,
        'details': {
            'Algorithm': 'SGD',
            'PredictiveModelType': 'REGRESSION'
        }
    }
}
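An application consumes the `predict()` response by reading the nested dictionary it returns. The sketch below pulls the fields out of a response shaped like the example above (the values are the example's, not live output).

```python
# A response dict shaped like the real-time predict() result shown above.
response = {
    "Prediction": {
        "predictedValue": 13.284348,
        "details": {"Algorithm": "SGD", "PredictiveModelType": "REGRESSION"},
    }
}

prediction = response["Prediction"]
value = prediction["predictedValue"]          # the numeric prediction itself
model_type = prediction["details"]["PredictiveModelType"]
print(value, model_type)
```

For BINARY models the same structure carries `predictedLabel` alongside `predictedScores`, so checking the model type before reading fields is a reasonable defensive pattern.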