Digital business is enabled by artificial intelligence, machine learning, and data science. Artificial intelligence and machine learning depend on the right information architecture and data foundation. A governed data lake combined with a data science platform gives you the power to take the organization on its digital transformation and AI journey.
Enabling digital business with governed data lake
1. Enabling Digital business with IBM Governed Data Lake
Karan Sachdeva (karan@sg.ibm.com)
IBM Big Data Analytics Sales Leader, Asia Pacific
Connect with me at https://www.linkedin.com/in/karan20/
2. 1. Digital and AI Challenges
2. How IBM Governed Data Lake can help?
3. Industry Use Cases
4. IBM Governed Data Lake Building blocks
   1. Collect
   2. Govern
   3. Analyze (Data Science and ML)
5. How to start your Governed Data Lake journey
7. What you need is an Integrated Platform based on Open Standards
Ingest | Persist | Analyze | Deploy
Data | Assets | Pipelines | APIs
Intelligent governance | Metadata Management
Governed Data Lake Core Tenets
1. Intelligent by Design
2. Based on Open Standards and Extensible
3. Collaborative for data Professionals
4. Self-service access to trusted data
5. Best in class streaming and real-time analytics
Collaborate | Find | Share
Data steward | Data engineer | Data scientist | Developer
9. Industry Use Cases
Governed Data Lake Use Cases: Burning Business Problems More Robustly Addressed
Financial Services, Insurance
• Customer 360
• Fraud
• Compliance: GDPR, PDPA etc
• Risk Analytics
• Operational Data Store
• Predictive analytics
Telco, Media, Energy & Utilities
• Customer 360
• Customer Insights
• Network Optimization
• Data Monetization
• EDW Augmentation
• Predictive maintenance
Retail, Ecommerce
• Personalized Customer offers
• Omni-channel Customer Experience
• Loyalty programs
• Next Best Offer/Action
• Recommendation Engine
• Agile Supply Chain
Manufacturing, Industrial
• Connected: Car, Plane, Equipment
• Agile Supply Chain
• Predictive Maintenance
• IoT Data enabled “Smart Services”
Government, Public Sector
• Border Control
• Public Safety / Intelligence
• 360 Tax payer
• Tax Optimization
• Cyber Threat
• Citizen Self Service
• Social Services Fraud
IBM Governed Data Lake: Open Standards | Governance | Machine Learning & Data Science
Enterprise data warehouse (EDW) Modernization | EDW Offload | Teradata/Oracle Refresh
10. Example: Governed Data Lake vision for Retail
Retail Data | Governed Data Lake | Insight | Action | Business Outcomes
Retail Data
- Behavioural: CRM, Store, POS
- Descriptive: Location, Mobile, Demographic, Weather
- Interaction: Web clickstreams, Call Center Notes, Emails
- Attitudes: Social Media Sentiments
Governed Data Lake
- Continual data feeds: to support real-time decision making at every point of customer interaction
- Leverage All Data: high volumes
- Integrate Data: traditional CRM and POS data, combined with modern sources
- Capture and Access all data: paper-based customer notes, call center notes
- Customer identification: single record of customer detail
- Text processing: social media comments, call center logs
Insight
- Understanding your customer: helps retailers understand the “why” question and not only the “who/what/where/when”, providing you with greater insight into customer behavior and buying patterns; real-time analytics to anticipate customer behavior
- Analytical questioning: exploration of the data
- Business user dashboards and visualisation across multiple departments: upsell/cross sell options, marketing campaign effectiveness, product analysis, comprehensive view of a customer
- Improve customer segmentation: advanced customer analytics to better define homogeneous customer clusters
Action
- Integrate inbound touchpoints: offer the best offer for optimum outcomes
- Cross-channel delivery of best action to address customer need and enhance long term business revenue
- Relevant & Timely Marketing Offers: highly personalized communications and offers, making your customer relationship management more proactive
- Consistency across all customer interaction points: Web, Mobile, Call Center, Email, Social Media
Business Outcomes
- Improve service delivery and customer satisfaction
- Optimize revenue generating actions such as up sell, cross sell and retention
- Increase strategic lifetime value and loyalty
Platform components
- Hadoop System
- Data Federation
- Data Science Models
- Streaming Data
- Hadoop + BigSQL
- Info Integration & Governance
- Entity Matching
- Predictive Analytics
- Social Media Analytics
- Discovery & Exploration
- Business Intelligence
- Prescriptive Analytics
- IBM Data Science Experience
- Decision Optimization
- Social Merchandise: social data (internal and external), frameworks, models and dashboards
Retail, Ecommerce
• Personalized Customer offers
• Omni-channel Customer Experience
• Loyalty programs
• Next Best Offer/Action
• Recommendation Engine
• Agile Supply Chain
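The "Customer identification: single record of customer detail" step on the retail slide can be illustrated with a small sketch. This is not IBM BigMatch or any product API. It is a hypothetical, minimal example that assumes CRM and POS rows share an email field and merges them on a normalized email key. The names `CustomerRecord`, `normalize_email`, and `build_single_view` and all sample data are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CustomerRecord:
    """Hypothetical single customer view assembled from several sources."""
    key: str
    names: set = field(default_factory=set)
    sources: set = field(default_factory=set)
    total_spend: float = 0.0


def normalize_email(email: str) -> str:
    """Trim and lowercase an email so the same person matches across systems."""
    return email.strip().lower()


def build_single_view(crm_rows, pos_rows):
    """Merge CRM and POS rows that share a normalized email into one record."""
    customers = {}
    for source, rows in (("crm", crm_rows), ("pos", pos_rows)):
        for row in rows:
            key = normalize_email(row["email"])
            record = customers.setdefault(key, CustomerRecord(key=key))
            record.sources.add(source)
            if row.get("name"):
                record.names.add(row["name"])
            record.total_spend += row.get("amount", 0.0)
    return customers


# Invented sample data: one customer appears in both systems.
crm = [{"email": "Ana@Example.com ", "name": "Ana Lim"}]
pos = [
    {"email": "ana@example.com", "amount": 40.0},
    {"email": "ben@example.com", "amount": 15.5},
]
view = build_single_view(crm, pos)
```

Real entity matching usually has to cope with missing or conflicting identifiers, so exact matching on one key is only the simplest possible starting point.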
11. Select the entry points to your Governed Data Lake journey
Disruptive | Competitive | Optimized
Data & Insight Accessibility
- Big Data made accessible and simple
- Data assets made understood, protected and trusted
- ML, AI, Optimize and Automate
- Natural Language Visualization and Exploration
Collect | Govern | Data Science
Entry points
- Enterprise DW Modernization
- EDW Augmentation
- Customer 360
- Operational DataStore
- Machine Learning
- Anomaly Detection
- Recommendation Engine
- Cognitive Text Analytics
12. IBM Governed Data Lake Customer Spotlights
Collect | Organize | Analyze
Hybrid Data Management | Unified Governance & Integration | Data Science, Machine Learning
- Understand customer behavior to make smarter marketing & programming decisions. Billions of records analyzed in seconds, rather than days, increasing on-demand viewing.
- Provide governed self-service data lake for fraud detection and customer engagement. Disciplined data classification upon entry, managing access, quality, privacy, and retention.
- Reduce unplanned trucking standstills, helping clients better predict maintenance needs. Applied statistical and ML techniques to lower diagnostic time 70% and repair time 20%.
14. Governed Data Lake Use Cases: Burning Business Problems More Robustly Addressed
Capabilities
- Collect: Hybrid Data Management
- Govern: Unified Governance & Integration
- Analyze: Data Science & Business Analytics
Under the hood IBM technologies
- Hortonworks Data Platform
- Hortonworks Data Flow
- Db2 Big SQL
- IBM Big Replicate
- Information Governance Catalog
- BigIntegrate
- BigQuality
- BigMatch
- CDC for Hadoop
§ Data Science Experience Local
§ Decision Optimization
§ Watson Explorer (v12 +)
Financial Services, Insurance
• Customer 360
• Fraud
• Compliance: GDPR
• Risk
• Operational Data Store
• Predictive analytics
Telco, Media, Energy & Utilities
• Customer 360
• Customer Insights
• Network Optimization
• Data Monetization
• EDW Augmentation
• Predictive maintenance
Retail, Ecommerce
• Personalized Customer offers
• Omni-channel Customer Experience
• Loyalty programs
• Next Best Offer/Action
• Recommendation Engine
• Agile Supply Chain
Manufacturing, Industrial
• Connected: Car, Plane, Equipment
• Agile Supply Chain
• Predictive Maintenance
• IoT Data enabled “Smart Services”
Government, Public Sector
• Border Control
• Risk / Intelligence
• 360 Tax payer
• Tax Optimization
• Cyber Threat
• Fraud prevention
Enterprise data warehouse (EDW) Modernization / EDW Offload / Teradata Takeout
15. IBM Big SQL for Hadoop: Data Federation across data repositories
IBM Cloud
Embedded machine learning and data science
Drive more value from your data. Run analytics where the data lives using the tools your data professionals prefer.
§ Spark and Jupyter notebooks built-in
§ Integration with model building, BI, and visualization tools
Transactional and analytic processing, all in one place
Instant insight from real-time operational data for growing revenue, reducing cost and lowering risk.
§ Simplify IT with transactions and reporting (HTAP) within the same system
§ Easy, low-risk offload from expensive data warehouse: Teradata or Oracle.
Common SQL engine with built-in data virtualization
Anchored by a common SQL engine to enable scalable data management solutions with portable analytics.
§ Application and operational compatibility
§ Provide transparent access to other data sources
Support for on-premises or cloud, NoSQL or SQL
Offers flexibility in choosing the form factor that best suits your business, enabling a controlled journey to the cloud.
§ A platform that fits your data strategy
§ Bridge data stores for seamless data integration
More intelligent analytics and insights | Go at the speed of your business | Write once, run anywhere, from any source | Deploy your data where you need it
18. IBM Data Science Experience
http://datascience.ibm.com
Learn: Built-in learning to get started or go the distance with advanced tutorials
Create: The best of open source and IBM value-add to create state-of-the-art data products
Collaborate: Community and social features that provide meaningful collaboration
• Find tutorials and datasets
• Connect with Data Scientists
• Ask questions
• Read articles and papers
• Fork and share projects
• Watson Machine Learning
• SPSS Modeler Canvas
• Advanced Visualizations
• Projects and Version Control
• Managed Spark Service
• Code in Scala/Python/R
• Jupyter Notebooks
• RStudio IDE and Shiny
• Apache Spark
• Your favorite libraries
ML Model Lifecycle
1. Build model: auto-generate model from input data, testing various algorithms for best fit (e.g. CADS)
2. Deploy model
3. Refresh model: detect loss of predictive power and refresh model, subject to preferences
Model Builder (CADS)
Import Sources:
§ DSx Notebooks
§ DSx Flow UI
§ External tools
Chart labels: Predictive Power, 100%, Capacity, Model
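The build and refresh steps of the ML model lifecycle can be sketched in a few lines. This is a hedged, stdlib-only illustration, not CADS, Watson Machine Learning, or any DSX API. It assumes two toy candidate algorithms, mean absolute error as the fit measure, and a fixed degradation margin as the refresh rule. All function names, data, and thresholds are hypothetical.

```python
import statistics


def fit_mean(train):
    """Candidate 1: always predict the mean of the training targets."""
    mean = statistics.fmean([y for _, y in train])
    return lambda x: mean


def fit_line(train):
    """Candidate 2: least-squares straight line through (x, y) pairs."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in train) / var if var else 0.0
    return lambda x: my + slope * (x - mx)


def mean_abs_error(model, data):
    """Average absolute prediction error of a model on (x, y) pairs."""
    return statistics.fmean([abs(model(x) - y) for x, y in data])


def build_best(train, holdout, candidates):
    """Build step: fit every candidate and keep the lowest holdout error."""
    best = None
    for name, fit in candidates.items():
        model = fit(train)
        error = mean_abs_error(model, holdout)
        if best is None or error < best[2]:
            best = (name, model, error)
    return best


def needs_refresh(model, recent, baseline_error, max_degradation=1.0):
    """Refresh step: flag the model when recent error exceeds the margin."""
    return mean_abs_error(model, recent) > baseline_error + max_degradation


CANDIDATES = {"mean": fit_mean, "line": fit_line}
train = [(0, 0), (1, 2), (2, 4), (3, 6)]
holdout = [(4, 8), (5, 10)]
name, model, baseline = build_best(train, holdout, CANDIDATES)

# Invented drift: the relationship shifts upward after deployment.
drifted = [(4, 13), (5, 15)]
refresh = needs_refresh(model, drifted, baseline)
```

In practice the margin and the monitoring window would be set by the "preferences" the slide mentions. The fixed margin here is only a placeholder for that policy.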
19. Open source is a powerful engine, but as with any engine, it needs the full system to accomplish any work
We provide:
§ Security – SSO and code hardening to reduce security gaps
§ Version Currency – We keep up-to-date as open source quickly iterates
§ Data Connectivity – Connect to data sources
§ Scalability – Makes tools designed for desktops scalable to enterprise workloads
§ Enterprise IBM Support – World Class support by SMEs
21. Data Science Sandbox Quick Start Solution
1. Receive best practices for your organization to get started with governed data lake
2. Achieve faster time-to-value with pre-built accelerators
3. Leverage world-class data scientists and engineers with proven results at most mature big data and Data Science customers
12 Nodes of best in class Open Source Hadoop: IBM Hortonworks Data Platform
+ 5 Users for IBM Data Science Experience
+ 1 week of partner services engagement
= 75K USD*
*Commercials for services from third party Partner Services
*Commercials focused on Data Science environment for Governed Data Lake
*Commercials may vary with local conditions and from country to country
*Offer valid till 31st March 2018
23. IBM is the industry leader in Open Source and Data Science platforms
40,000+ Clients in 160 countries, training 70,000+ client employees
43% Improvement in Client Satisfaction
IBM Leadership - Total Portfolio
80% of reports, 60% of Forrester
IBM Consensus Leader in Data Science & Business Analytics
General Excellence
reddot award Interface Design
NPS 2017
OPEN SOURCE: Apache Committers Top 20
24. Call To Action:
• Identify the use case for your business. Involve us for a free Discovery workshop for Governed Data Lake.
• Read, learn and contribute at www.ibmbigdatahub.com
• Write to us: karan@sg.ibm.com
25. What you need is an Integrated Platform based on Open Standards
Ingest | Persist | Analyze | Deploy
Projects | Data | Assets | Pipelines | APIs
Intelligent governance | Policy enforcement
Our Core Tenets
1. Intelligent by Design
2. Collaborative for data Professionals
3. Self-service access to trusted data
4. Best in class streaming and real-time analytics
5. Open and Extensible
Collaborate | Find | Share
Data steward | Data engineer | Data scientist | Developer
26. IBM Big SQL Query Federation = virtualized data access
Transparent
§ Appears to be one source
§ Programmers don’t need to know how /
where data is stored
Heterogeneous
§ Accesses data from diverse sources
High Function
§ Full query support against all data
§ Capabilities of sources as well
Autonomous
§ Non-disruptive to data sources, existing
applications, systems.
High Performance
§ Optimization of distributed queries
SQL tools, applications | Virtualized data | Data sources
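The "appears to be one source" idea behind query federation can be shown with a small analogy. The sketch below does not use Big SQL. It uses Python's built-in `sqlite3` module and SQLite's `ATTACH DATABASE` statement to join tables that live in two separate in-memory databases through a single connection. The schema names, table names, and rows are all assumptions made for illustration.

```python
import sqlite3

# One connection acts as the single access point for the application.
conn = sqlite3.connect(":memory:")

# Attach a second, separate in-memory database under the alias "warehouse".
conn.execute("ATTACH DATABASE ':memory:' AS warehouse")

# Each table lives in a different database (a stand-in for different stores).
conn.execute("CREATE TABLE main.web_clicks (customer_id INTEGER, page TEXT)")
conn.execute("CREATE TABLE warehouse.customers (customer_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO main.web_clicks VALUES (?, ?)",
    [(1, "/offers"), (1, "/cart"), (2, "/home")],
)
conn.executemany(
    "INSERT INTO warehouse.customers VALUES (?, ?)",
    [(1, "Ana"), (2, "Ben")],
)

# A single SQL statement joins across both databases; the caller does not
# need to know where each table is physically stored.
rows = conn.execute(
    """
    SELECT c.name, COUNT(*) AS clicks
    FROM warehouse.customers AS c
    JOIN main.web_clicks AS w ON w.customer_id = c.customer_id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
```

A federation engine adds what this analogy lacks: heterogeneous remote sources, pushdown of work to those sources, and optimization of the distributed query.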
27. Use Case: Social Programs Organizations Need to Be Able to Turn Their Data into Actionable Information
1. Citizen/ focused information
2. Outcomes Focused
3. Integrated Service Delivery
- Gain a comprehensive view of a family’s ongoing needs and program results
- Match citizen’s needs to the right program or service
- Maximizing a limited budget
- Am I managing my resources effectively?
28. Governed Data Lake Reference Architecture
All Data Sources: Streaming Data, Text Data, Applications Data, Time Series, Geo Spatial, Relational, Social Network, Video & Image
Ingest: Sqoop, Flume
Real Time Data Flow: Kafka, Apache NiFi
Data Processing: MapReduce, Spark
Analyze: Big SQL, Watson Explorer
Governance and other platform components: CDC for Hadoop, IBM Information Governance Catalog, Atlas, Big Integrate, Big Quality, Big Match, HDFS, Yarn, DSX
Delivered to: Low Latency Data Feeds; Reports, dashboards, apps; Alerts; New / Enhanced Applications; Automated Process; Use cases; Analytic Applications; Watson Cloud Services; ISV Solutions
29. Top 5 Best Practices to set up Governed Data Lake
Five Key Recommendations to Innovate with Machine Learning and Big Data
1. Use Case Generation and Prioritization: Make innovation central to business vision, strategy, and execution
2. Metadata management and Data Lineage
3. Cognitive Search in Natural Language
4. People: Data Engineers, Data Scientists, CDO and LOB Executives
5. Continuous Learning Incorporation with Feedback Loops
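The "Metadata management and Data Lineage" recommendation can be made concrete with a tiny lineage lookup. This sketch is not the Information Governance Catalog or Apache Atlas API. It assumes lineage is recorded as a plain mapping from each dataset to the datasets it is derived from, and every dataset name in it is hypothetical.

```python
from collections import deque

# Hypothetical lineage metadata: dataset -> datasets it is derived from.
LINEAGE = {
    "customer_360": ["crm_clean", "pos_clean"],
    "crm_clean": ["crm_raw"],
    "pos_clean": ["pos_raw"],
    "crm_raw": [],
    "pos_raw": [],
}


def upstream(dataset, lineage):
    """Return every dataset that the given dataset depends on, at any depth."""
    seen = set()
    queue = deque(lineage.get(dataset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(lineage.get(current, []))
    return seen


sources = upstream("customer_360", LINEAGE)
```

A lookup like this answers "which raw feeds does this report depend on", which is the question lineage metadata exists to answer during audits and impact analysis.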