IBM z Analytics: Machine Learning and Cognitive Analytics for the Enterprise

IBM z Analytics
Mehmet Cüneyt Göksu
zAnalytics Technical Leader, IBM MEA
IBM Analytics Platform
Machine Learning and Cognitive Analytics for
the Enterprise

AGENDA
• Machine Learning in general and the IBM solution
• Technical View of Machine Learning
• Video Demo
• Q&A Discussion

Three drivers are transforming the industry and IBM
• Data is becoming the world’s new natural resource, transforming industries and professions. Data is the new basis of competitive advantage.
• Gartner identifies Machine Learning as the top trend in IT for 2018 and at the top of every CIO's strategy and budget. Machine Learning is the enabling technology of the 21st century.
• Mobile and social are transforming individual engagement, creating expectations of security, trust and value in return for personal information. A systematic approach to engagement is now required.

Top Trends In Analytics, Machine
Learning And Cognitive Applications

All data originates in real-time…

…but, traditional analytics to gain insights and
build models is usually done much, much later.

Insights are perishable.
*Perishable: an article that can lose its usefulness and value if not used within a certain period

Traditional analytics infrastructure is too slow and, perhaps, even harmful
[Figure: business value (positive to negative) plotted against time to action. Stages: data originated, analytics performed, insights gleaned, decision made, action taken. Late-stage outcomes: outdated insights, poor decision, impotent or harmful actions.]

Customers want and increasingly expect
to be treated like celebrities.

Celebrity experiences must:
• Learn individual customer characteristics and behaviors
• Detect customer needs and desires in real-time
• Adapt applications to serve an individual customer in real-time

Artificial Intelligence (AI)
Any system that mimics human intelligence
IBM Watson
Machine Learning (ML)
Allows computers to learn on their own
IBM Machine Learning
Deep Learning (DL)
Deep learning is part of a broader family of machine learning methods based on learning data representations, as opposed to task-specific algorithms. (neural network)
IBM Deep Learning

What is Machine Learning?
Machine Learning for z/OS
Computers that learn without being explicitly programmed:
• Identify patterns not readily foreseen by humans
• Build models of behavior from those patterns
• Score or predict behavior with the deployed models
Traditional Programming: data + program → computer → output
Machine Learning: data + output → computer → program
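The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the amounts and the flags are hypothetical, and the "learning" step is a deliberately simple threshold search, not an algorithm from the presentation.

```python
# Traditional programming: a human writes the program (the rule).
def flag_large_amount(amount, threshold=100.0):
    return amount > threshold


# Machine learning: data plus known outputs go in, and the "program"
# (here, just a threshold) comes out.
def learn_threshold(amounts, flags):
    """Pick the candidate threshold that misclassifies the fewest examples."""
    best_threshold, best_errors = None, None
    for candidate in sorted(set(amounts)):
        errors = sum((a > candidate) != f for a, f in zip(amounts, flags))
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold


# Hypothetical training data: amounts and whether each one was flagged.
amounts = [20.0, 35.0, 50.0, 400.0, 650.0, 900.0]
flags = [False, False, False, True, True, True]

learned = learn_threshold(amounts, flags)
```

The learned threshold can then be passed to `flag_large_amount`, which is the "program" produced from data and outputs.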

Why Machine Learning?
• With Machine Learning, companies can truly tap into their rich vein of historical system-of-record information
• They can mine it to automatically discover insights and generate predictive models that take advantage of all the data they are capturing
• This means that instead of looking into the past to generate reports, businesses can predict what will happen in the future based on analysis of their existing data
• Predicting the future means things like
– Personalizing every client interaction, risk reduction, fraud detection, cross-sell/upsell, customer categorization, inventory optimization, and many others, all meant to increase your revenue and disrupt your competition
The value of machine learning is rooted in its ability to create accurate models to guide future actions and to discover patterns that we’ve never seen before

Machine learning is everywhere, influencing nearly everything we do…
7 out of 10 financial customers would take recommendations from a robo advisor.
Voice recognition systems such as Siri use machine learning to imitate human interaction.
Machine Learning Basics
• Identifies patterns in historical data
• Builds behavioral models from patterns
• Makes recommendations
• The data (operational & historical) is used to “train” a model

Common Types of Machine Learning Approaches
1. Supervised – Makes a prediction
• Classification: Goal is to predict a category
– Binary-classification (yes/no)
• Examples: Fraud, Churn, Purchase, Spam email detection
– Multi-class classification (which of several items to recommend)
• Examples: Netflix and Amazon recommendations, ad recommendations for products
• Regression: Goal is to predict a value
• Examples: Value of your home in a year, Stock prices prediction
2. Unsupervised – Groups items into clusters
• Clustering: Goal is to group data into clusters for better organization
– Example: Group banking customers by income in order to know what products they will buy
– Example: Group customers by whether they returned a product in the last 3 months

Machine Learning 101 : Supervised Learning
• A feature is a piece of information that might be useful for
prediction
– Example, predict the probability of a customer buying a product
• Labeled data is data that includes the desired output
– Example: 1.0 represents a customer who has bought a product; 0.0 represents one who has not
GENDER   AGE      MARITAL_STATUS  PROFESSION   CUSTOMER_ID    LABEL
F        24       Married         Retail       4003           1.0
M        43       Married         Trades       4004           1.0
F        43       Unspecified     Hospitality  4005           0.0
F        43       Unspecified     Sales        4006           1.0
M        28       Single          Trades       4007           1.0
Feature  Feature  Feature         Feature      NOT a feature  Label
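As a minimal sketch of the table above, the following Python separates the feature columns from the label and leaves out CUSTOMER_ID, which the slide marks as not a feature. The function and variable names are illustrative assumptions; the rows are copied from the slide.

```python
COLUMNS = ["GENDER", "AGE", "MARITAL_STATUS", "PROFESSION", "CUSTOMER_ID", "LABEL"]

# Rows copied from the slide's table.
ROWS = [
    ("F", 24, "Married", "Retail", 4003, 1.0),
    ("M", 43, "Married", "Trades", 4004, 1.0),
    ("F", 43, "Unspecified", "Hospitality", 4005, 0.0),
    ("F", 43, "Unspecified", "Sales", 4006, 1.0),
    ("M", 28, "Single", "Trades", 4007, 1.0),
]

# CUSTOMER_ID identifies a row but carries no predictive information.
FEATURE_COLUMNS = ["GENDER", "AGE", "MARITAL_STATUS", "PROFESSION"]


def split_features_and_labels(rows):
    """Return (list of feature dicts, list of labels) for the given rows."""
    features, labels = [], []
    for row in rows:
        record = dict(zip(COLUMNS, row))
        features.append({name: record[name] for name in FEATURE_COLUMNS})
        labels.append(record["LABEL"])
    return features, labels


features, labels = split_features_and_labels(ROWS)
```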

Machine Learning 101: a TrainOps (DevOps) story
[Figure: Dev side (Data Science Experience), training a model: labeled examples → feature engineering → training → model. The model is deployed to the Ops side (operational system), scoring: new data → feature engineering → scoring with the model → predicted data.]

Example: Credit card transaction anomaly (fraud) detection
• Input data: transaction history
– Credit card number, amount, date, merchant id, etc
– Fraud or not
• Data preparation
– Compute and update card profiles after each transaction:
• Time since previous transaction
• Average and variance of frequency of use
• Average amount
• Variance in amount
• Etc
• Training with machine learning algorithms:
– Anomaly detection
– Classification (with unbalanced classes, since fraud occurrence is low)
• Yields model(s) that predict the probability that a given (transaction, card profile) pair is anomalous.
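The data preparation step, updating a card profile after each transaction, can be sketched as follows. This is an assumption-laden illustration: the `CardProfile` fields and `update_profile` are hypothetical names, and Welford's online algorithm is one common way to maintain a running average and variance without storing the full history. The frequency-of-use statistics from the slide are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CardProfile:
    count: int = 0
    mean_amount: float = 0.0
    m2_amount: float = 0.0  # running sum of squared deviations from the mean
    last_timestamp: Optional[float] = None
    seconds_since_previous: Optional[float] = None

    @property
    def variance_amount(self) -> float:
        # Sample variance; reported as 0.0 until two transactions are seen.
        return self.m2_amount / (self.count - 1) if self.count > 1 else 0.0


def update_profile(profile: CardProfile, amount: float, timestamp: float) -> CardProfile:
    """Update the profile in place after one transaction (Welford's algorithm)."""
    if profile.last_timestamp is not None:
        profile.seconds_since_previous = timestamp - profile.last_timestamp
    profile.last_timestamp = timestamp
    profile.count += 1
    delta = amount - profile.mean_amount
    profile.mean_amount += delta / profile.count
    profile.m2_amount += delta * (amount - profile.mean_amount)
    return profile


# Hypothetical transactions: (amount, timestamp in seconds).
profile = CardProfile()
for amount, ts in [(10.0, 0.0), (20.0, 60.0), (30.0, 180.0)]:
    update_profile(profile, amount, ts)
```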

Scoring
• Once trained, the machine learning model must be used
within the transaction processing system
• Data must be prepared the same way:
– Before scoring a transaction, the card profile must be loaded
[Figure: transaction processing passes each transaction and its card profile to the model, receives a score, and updates the card profile.]
Spark terms - DataFrame
• Spark uses DataFrame APIs to read data sets

Spark terms – Transformer
• Transformer is an operator which can transform one DataFrame into another DataFrame
• Example
– A StringIndexer encodes a string column of labels to a column of label indices.
• StringIndexer is needed because some algorithms can handle numeric types only
– G_IDX is encoded by StringIndexer from GENDER column
GENDER  G_IDX  AGE  MARITAL_STATUS  PROFESSION   CUSTOMER_ID  LABEL
F       1.0    24   Married         Retail       4003         1.0
M       0.0    43   Married         Trades       4004         1.0
F       1.0    43   Unspecified     Hospitality  4005         0.0
F       1.0    43   Unspecified     Sales        4006         1.0
M       0.0    28   Single          Trades       4007         1.0
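The encoding idea can be sketched in plain Python without Spark. Spark's StringIndexer by default gives index 0 to the most frequent label; the sketch below follows that ordering, and its alphabetical tie-break is an assumption of the sketch. Run on only the five sample rows it maps F to 0.0 and M to 1.0, which differs from the slide; the slide's indices presumably reflect the complete data set. The function names are hypothetical and this is not the Spark API.

```python
from collections import Counter


def fit_string_index(values):
    """Learn a label -> index mapping, most frequent label first."""
    counts = Counter(values)
    # Ties are broken alphabetically (an assumption of this sketch).
    ordered = sorted(counts, key=lambda value: (-counts[value], value))
    return {value: float(index) for index, value in enumerate(ordered)}


def apply_string_index(values, mapping):
    """Replace each string with its learned numeric index."""
    return [mapping[value] for value in values]


# GENDER column of the five sample rows.
genders = ["F", "M", "F", "F", "M"]
mapping = fit_string_index(genders)
g_idx = apply_string_index(genders, mapping)
```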

Spark terms – Pipeline
• A Pipeline chains multiple Transformers and Estimators together to specify an ML
workflow.
Pipeline stages: StringIndexer
GENDER  G_IDX  AGE  MARITAL_STATUS  PROFESSION   CUSTOMER_ID  LABEL
F       1.0    24   Married         Retail       4003         1.0
M       0.0    43   Married         Trades       4004         1.0
F       1.0    43   Unspecified     Hospitality  4005         0.0
F       1.0    43   Unspecified     Sales        4006         1.0
M       0.0    28   Single          Trades       4007         1.0

Pipeline stages: StringIndexer, StringIndexer
GENDER  G_IDX  AGE  MARITAL_STATUS  M_IDX  PROFESSION   CUSTOMER_ID  LABEL
F       1.0    24   Married         1.0    Retail       4003         1.0
M       0.0    43   Married         1.0    Trades       4004         1.0
F       1.0    43   Unspecified     2.0    Hospitality  4005         0.0
F       1.0    43   Unspecified     2.0    Sales        4006         1.0
M       0.0    28   Single          0.0    Trades       4007         1.0

Pipeline stages: StringIndexer, StringIndexer, StringIndexer
GENDER  G_IDX  AGE  MARITAL_STATUS  M_IDX  PROFESSION   P_IDX  CUSTOMER_ID  LABEL
F       1.0    24   Married         1.0    Retail       1.0    4003         1.0
M       0.0    43   Married         1.0    Trades       3.0    4004         1.0
F       1.0    43   Unspecified     2.0    Hospitality  2.0    4005         0.0
F       1.0    43   Unspecified     2.0    Sales        0.0    4006         1.0
M       0.0    28   Single          0.0    Trades       3.0    4007         1.0

Pipeline stages: StringIndexer, StringIndexer, StringIndexer, VectorAssembler
GENDER  G_IDX  AGE  MARITAL_STATUS  M_IDX  PROFESSION   P_IDX  FEATURES         CUSTOMER_ID  LABEL
F       1.0    24   Married         1.0    Retail       1.0    (1.0, 1.0, 1.0)  4003         1.0
M       0.0    43   Married         1.0    Trades       3.0    (0.0, 1.0, 3.0)  4004         1.0
F       1.0    43   Unspecified     2.0    Hospitality  2.0    (1.0, 2.0, 2.0)  4005         0.0
F       1.0    43   Unspecified     2.0    Sales        0.0    (1.0, 2.0, 0.0)  4006         1.0
M       0.0    28   Single          0.0    Trades       3.0    (0.0, 0.0, 3.0)  4007         1.0

Pipeline stages: StringIndexer, StringIndexer, StringIndexer, VectorAssembler, LogisticRegression
GENDER  G_IDX  AGE  MARITAL_STATUS  M_IDX  PROFESSION   P_IDX  FEATURES         CUSTOMER_ID  LABEL
F       1.0    24   Married         1.0    Retail       1.0    (1.0, 1.0, 1.0)  4003         1.0
M       0.0    43   Married         1.0    Trades       3.0    (0.0, 1.0, 3.0)  4004         1.0
F       1.0    43   Unspecified     2.0    Hospitality  2.0    (1.0, 2.0, 2.0)  4005         0.0
F       1.0    43   Unspecified     2.0    Sales        0.0    (1.0, 2.0, 0.0)  4006         1.0
M       0.0    28   Single          0.0    Trades       3.0    (0.0, 0.0, 3.0)  4007         1.0

Spark terms – PipelineModel
• PipelineModel is a Transformer that is used to make predictions
[Figure: fitting the Pipeline (StringIndexer, StringIndexer, StringIndexer, VectorAssembler, LogisticRegression) produces a PipelineModel that contains a LogisticRegressionModel.]
Input to PipelineModel.transform():
GENDER  AGE  MARITAL_STATUS  PROFESSION
F       24   Married         Retail
Output: Prediction: 1.0, Probability: 0.8523
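The relationship between Pipeline, Estimator, Transformer and PipelineModel can be illustrated with a toy pure-Python re-implementation. This is not the Spark API: the `Toy*` classes are hypothetical, rows are plain dictionaries, the indexer orders labels alphabetically for simplicity, and the final LogisticRegression stage is left out. What the sketch keeps is the pattern: `fit()` turns each Estimator into a fitted Transformer, and the resulting model only calls `transform()`.

```python
class ToyStringIndexer:
    """Estimator: fit() learns a mapping and returns a fitted Transformer."""

    def __init__(self, input_col, output_col):
        self.input_col, self.output_col = input_col, output_col

    def fit(self, rows):
        distinct = sorted({row[self.input_col] for row in rows})
        mapping = {value: float(i) for i, value in enumerate(distinct)}
        return ToyStringIndexerModel(self.input_col, self.output_col, mapping)


class ToyStringIndexerModel:
    """Transformer produced by ToyStringIndexer.fit()."""

    def __init__(self, input_col, output_col, mapping):
        self.input_col, self.output_col, self.mapping = input_col, output_col, mapping

    def transform(self, rows):
        # Values not seen during fit() raise KeyError in this sketch.
        return [dict(row, **{self.output_col: self.mapping[row[self.input_col]]})
                for row in rows]


class ToyVectorAssembler:
    """Transformer with nothing to learn: combines columns into one tuple."""

    def __init__(self, input_cols, output_col):
        self.input_cols, self.output_col = input_cols, output_col

    def transform(self, rows):
        return [dict(row, **{self.output_col: tuple(row[c] for c in self.input_cols)})
                for row in rows]


class ToyPipeline:
    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            # Estimators are fitted; plain Transformers are used as they are.
            model = stage.fit(rows) if hasattr(stage, "fit") else stage
            rows = model.transform(rows)
            fitted.append(model)
        return ToyPipelineModel(fitted)


class ToyPipelineModel:
    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


train = [
    {"GENDER": "F", "MARITAL_STATUS": "Married"},
    {"GENDER": "M", "MARITAL_STATUS": "Single"},
    {"GENDER": "F", "MARITAL_STATUS": "Unspecified"},
]
pipeline = ToyPipeline([
    ToyStringIndexer("GENDER", "G_IDX"),
    ToyStringIndexer("MARITAL_STATUS", "M_IDX"),
    ToyVectorAssembler(["G_IDX", "M_IDX"], "FEATURES"),
])
model = pipeline.fit(train)
scored = model.transform([{"GENDER": "M", "MARITAL_STATUS": "Married"}])
```

New rows go through exactly the same fitted stages as the training rows, which is why the slide applies PipelineModel.transform() directly to raw input columns.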

Machine Learning is for all of our Industries
• Healthcare: aided diagnosis, disease prevention
• Finance: fraud prevention, financial trade optimization
• Retail: marketing personalization, improved customer service
• Security: security screening optimization, improved cyber-security
• Media & Entertainment: ad targeting, audience prediction
• Utilities: usage pattern analysis, identify efficiency opportunities
• Telco: customer churn & retention, network performance
• Transportation: self-driving cars, traffic congestion prediction
• ITOA: predict outages, prevent outages

Improve Fraud Detection for Money Transfers
Business Challenge
Mizuho's current rules-based process for monitoring fraudulent money transfer transactions is very manual and resource intensive. Accounts with suspected fraudulent activity are flagged automatically, and then two additional manual steps evaluate the activity: 1) evaluate whether the transaction is suspect and 2) if suspect, evaluate whether to report the transaction.
Proofs of Concept
1) The IBM Data Science Elite team, with support provided by CDL, worked with Mizuho IMS transactional data from across 9 tables to build a model that scored transactions while minimizing the rate of false positives. The client prioritized reducing false positives over false negatives (i.e., missing suspect transactions). The IBM team was able to develop a model that yielded a 93% accuracy rate, with just 7% false positives.
2) CDL is also working in parallel with Mizuho on an IT Operational Analytics use case, leveraging our System Health Tree API to ensure continuous availability of their IMS system.
Why they chose Machine Learning for z/OS
• Since transactions originate from IBM Z, they want a solution that automates fraud detection close to the data on z/OS.
• Having a flexible and scalable platform that allows them to bring in external data, including their data lake, is another important factor.
Mizuho Bank - A global bank with one of the largest customer bases in Japan, with a keen focus on putting both corporate and individual clients first
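The trade-off named in the first proof of concept, fewer false positives at the price of more false negatives, can be illustrated with a small sketch. The scores and labels below are made up for illustration and are not Mizuho data. The slide does not say how its accuracy and false positive figures were computed, so the definitions here are the conventional ones and are an assumption.

```python
def confusion_counts(scores, labels, threshold):
    """Count true/false positives/negatives when flagging scores >= threshold."""
    tp = fp = tn = fn = 0
    for score, is_fraud in zip(scores, labels):
        flagged = score >= threshold
        if flagged and is_fraud:
            tp += 1
        elif flagged and not is_fraud:
            fp += 1
        elif not flagged and is_fraud:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def rates(scores, labels, threshold):
    tp, fp, tn, fn = confusion_counts(scores, labels, threshold)
    return {
        "accuracy": (tp + tn) / len(labels),
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# Hypothetical model scores and true fraud labels.
scores = [0.05, 0.10, 0.20, 0.40, 0.55, 0.60, 0.80, 0.95]
labels = [False, False, False, False, False, True, True, True]

lower_threshold = rates(scores, labels, 0.5)
higher_threshold = rates(scores, labels, 0.7)
```

Raising the threshold removes the false positive in this toy data but lets one suspect transaction through, which is the kind of prioritization the slide attributes to the client.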

Multiple Flavors of IBM Machine Learning
1. Machine Learning on IBM Cloud
2. IBM DSX Local
3. Machine Learning for z/OS
• API access for model training, deployment, and management
• Immediate access to ML models within apps
• Real-time, streaming, & batch deploy options
• Packaged out of the box with DSX, accessible via API, Wizard GUI, or DSX Canvas
• On-premises deployment on private cloud IBM Z infrastructure
• Performance of optimized hardware
• Access to live transactional data
• IBM Z data remains in-place, while also combining data from non-z sources
• Cost effective strategy for lowest latency and highest degrees of security
Platform agnostic function to address business goals with the same look and feel across deployment options
[Figure labels: IBM Data Science Experience (DSX) Cloud, “IBM manages for you”; IBM DSX Local, “Behind your firewall for on-prem management”; IBM Machine Learning for z/OS, “Leverage Z data for hybrid models”; platforms: x86 Servers, IBM Power, IBM Z]

IBM Machine Learning / Data Science Experience Offerings
Public Cloud (2 different services)
• DSX-Cloud (analytics/machine learning assets development)
• Watson Machine Learning (machine learning workflow end-to-end management)
DSX-Local (analytics/machine learning assets development + machine learning workflow end-to-end management)
• Platforms: x86, Power, Linux on z
IBM Machine Learning for z/OS (analytics/machine learning assets development + machine learning workflow end-to-end management)
• Platforms: x86 Linux + z/OS, Linux on z + z/OS
• Deployment (scoring) on z
• Repository in DB2z
• Authentication with zLDAP
Similar functions, same look and feel

IBM has transformed Machine Learning to Learning Machines
1. Quick model development
2. Fast deployment
3. Easy management: continuous auditing & proactive notification

Introducing IBM Machine Learning
IBM extracted ML out of Watson
IBM z Systems on premise
[Figure labels: Machine Learning, Artificial Intelligence, Cognitive Computing]

Custom Industry Machine Learning Solutions can be developed in any environment.
[Figure labels: Machine Learning; Watson APIs (NLP, Speech, Vision, Data); Custom Industry ML Solutions; Scalable Compute, Multi-purpose Tooling, Rich Algorithms, Open Source Core, …; Structured Data, Unstructured Data; offerings: Watson ML (WML), DSX, ML for z/OS (MLz), IBM ML, DSX Local, DSX Desktop]

Machine Learning for z/OS Components
• Announced on February 15 at NYC launch event
– General availability on March 17
• Two-tiered architecture: the application cluster runs on Linux x86 and z Linux, with computing on z/OS using z/OS Platform for Apache Spark as the runtime
– Components on z/OS
• Machine Learning for z/OS scoring service
• Spark cluster, including various SPARK ML libraries and CADS/HPO library
• Jupyter Kernel Gateway + Apache Toree + Python
– Components on z Linux/x86 Linux – delivered as Docker images
• Docker images deployed through Kubernetes
• Images contain: authentication token/broker, repository service, deployment service, ingestion service, training
service
• DB2 for z/OS used as the database to store the metadata information for the models, model deployment
information, and evaluation information
• z/OS Tivoli Directory Server (LDAP) is used for user management. Its backend can be RACF, DB2 for
z/OS, LDBM or any repository that z/OS LDAP supports.

Machine Learning for z/OS Hardware and Software Prerequisites
– Scoring service and computing cluster on z/OS
• z/OS 2.1 or later
• z/OS Platform for Apache Spark V1.1 (a PTF must be applied to reach Spark level 2.0.2)
• LDAP (part of z/OS base product)
• IBM 64-bit JDK for z/OS
• DB2 for z/OS V10 or later
– Application Cluster
• Deployment on Linux x86
  – x86 64-bit system with 8 cores, 32 GB RAM and 250 GB disk space (recommendation: 3 Linux x86 systems for HA coverage)
  – 200 GB storage device for each x86 system as sharing volumes
  – CentOS 7.2 or RedHat Enterprise Linux Server 7.2 or later
  – OpenJDK 8
• Deployment on z Linux (z13, z13s, zEnterprise EC12, zEnterprise BC12, LinuxOne Emperor, or LinuxOne Rockhopper system)
  – 2 IFLs (or 4 virtual CPs) with 32 GB memory, 500 GB storage
  – Ubuntu (64-bit) 16.04 or later
  – OpenSSL 1.0.2g-1ubuntu9.1
  – OpenJDK 1.8.0 or later
  – curl 7.47.0
IBM Machine Learning for z/OS Architecture
• Move Machine Learning capability to the platform where the most valuable data resides
• Integrate real-time predictive analytics with transactions
• Leverage z/OS superior reliability, availability and security
[Figure: architecture diagram. Legend: bundled software, MLz component, pre-requisite software.
Application cluster on Linux (x86 or Linux on z) with Kubernetes, Docker and GlusterFS: auth service, repository service, deployment service (model monitoring), ingestion service, training service.
z/OS: scoring service; z/OS Spark cluster with ingestion lib, pipeline lib and CADS/HPO lib; Jupyter Kernel Gateway, Apache Toree, Python 2.7; DB2z (service metadata, ML models); zLDAP with RACF (optional) or LDBM; z/OS data sources: IMS, VSAM, DB2, SMF.
Other labels: IBM Machine Learning UI (Jupyter Notebook / Visual Model Builder, Model Management / Model Deployment / Monitoring), Jupyter Notebook Server, feedback service, CouchDB (NoSQL metadata), Brunel (visualization), z/OS Liberty, z/OS Spark in local mode, MDSS driver, DB2 JDBC driver.]

What is Apache Spark?
• Spark Core
• Spark SQL: Unified Data Access. Fast, familiar query language for all data
• Spark Streaming: Stream Processing. Near real-time data processing & analytics
• MLlib (machine learning): Machine Learning. Incredibly fast, easy to deploy algorithms
• GraphX (graph): Graph Analytics. Fast and integrated graph computation

Data Ingestion
• Leverage Spark SQL to ingest data from various data sources
[Figure: Spark applications (IBM and partners) run on IBM z/OS Platform for Apache Spark (Apache Spark Core, Spark Stream, Spark SQL, MLlib, GraphX; RDD and DF interfaces) over an optimized data layer, with extremely high zIIP utilization.
z/OS key business transaction systems: VSAM, Logstream, IMS, DB2 z/OS, and many more.
Distributed * (in development): HortonWorks, HDFS, BigInsights, Teradata, DB2 LUW, Oracle, and many more.
DB2 Analytics Accelerator: unique capability, only on Apache Spark z/OS. Leverage IDAA to optimize Spark queries transparently.]

Model Creation – Integrated Jupyter Notebook
• The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text.
[Figure callouts: cell for code snippet; interactive execution]

Model Training – Visual Model Builder
• Visual model builder is a wizard guiding users to create a model step by step
• No programming skill is required

© Copyright IBM Corporation 2017. Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM.
Model Management – Saving Model
• Models are managed in a central repository in DB2 for z/OS
• Leverage the high availability of DB2 and z
[Figure: the notebook and the visual model builder use ML services; ML libraries / services persist models and their metadata in DB2 for z/OS V10 or above. Additional label: PMML model.]

Model Deployment
• Model deployment is the process of moving a model into the production environment to serve a business need (single-click deployment)
• Models are deployed as REST interfaces
• Runtime performance monitoring for scoring services
[Figure labels: CICS, WAS, Mobile, DFHJSON]
Continuous Performance Monitoring (cont.)
• Highlights deployments whose performance is degrading

Machine Learning for z/OS Enhancements for June 2017
Continuous delivery model provides new features and functions to
meet customer needs
• Standard PMML model support allows you to leverage your
existing assets
– Existing models that can be exported to standard PMML can now be
scored on z/OS if your data originates from z Systems
– Lightweight SPSS scoring engine used to score PMML models
• New MLeap scoring engine gives customers more options for
scoring Spark ML models
– Huge performance boost for online scoring
• Feedback data ingestion further simplifies model evaluation and
feedback loop
• New Administration Dashboard allows system programmers to
easily monitor and manage the system resources from the web UI

PMML Model
• The Predictive Model Markup Language (PMML) is an XML-based predictive model interchange format.
• Many vendors' tools can export models to PMML format, including SPSS, R and SAS
• IBM Machine Learning for z/OS supports scoring for PMML models that conform to the PMML standard
• Support for PMML extensions is not guaranteed
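Since PMML is plain XML, its contents can be inspected with any XML parser. The sketch below reads the field definitions from a trimmed, hypothetical PMML 4.2 fragment using the Python standard library. The fragment omits the model element that a complete PMML document would contain, and the field names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical PMML fragment (a real file also contains a model element).
PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">
  <Header description="Hypothetical example model"/>
  <DataDictionary numberOfFields="2">
    <DataField name="AGE" optype="continuous" dataType="double"/>
    <DataField name="LABEL" optype="categorical" dataType="string"/>
  </DataDictionary>
</PMML>
"""

# PMML elements live in a versioned namespace.
NAMESPACES = {"pmml": "http://www.dmg.org/PMML-4_2"}


def data_fields(pmml_text):
    """Return (name, dataType) pairs from the PMML DataDictionary."""
    root = ET.fromstring(pmml_text)
    fields = root.findall("pmml:DataDictionary/pmml:DataField", NAMESPACES)
    return [(field.get("name"), field.get("dataType")) for field in fields]


fields = data_fields(PMML_DOC)
```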
