How Customer and Developer Challenges in a heterogeneous data world have driven a more holistic data management strategy and approach to consumable solutions, team execution, and AI.
1. MongoDB and IBM: How Customer and Developer Challenges in a heterogeneous data world have driven a more holistic data management strategy and approach to consumable solutions, team execution, and AI.
Michael Connor
Program Director, Offering Management, Open Source Databases
IBM Analytics
November 8, 2018
2. Data Management is complex…
• 74% of respondents say their data landscape is so complex that it limits agility
• 85% struggle with data from a variety of locations, and 72% say that their data landscape is complex because of the variety and number of data sources
• Gartner states that by 2019, 75% of analytics solutions will incorporate 10 or more exogenous data sources from 2nd party partners or 3rd party providers
3. It’s not the team with the best players that wins. It’s the players with the best team that wins!
4. “We must have had 99 percent of the game. It was the other three percent that cost us the match.”
Ruud Gullit
6. …change drives response
IBM’s commitment to:
• Open Source
• Freedom of choice
• Fusing Open Source with a modern, high performance data architecture
• Cloud flexibility – for data federation, advanced analytics and AI
• IBM Cloud Private for Data
9. …and response…Ladder to AI
Multi-Cloud
COLLECT
ORGANIZE
ANALYZE
AUTOMATE
Data of every type, regardless of where it lives
MODERNIZE
TRUST
AI
10. The Hybrid Data Management Solution Set expands access by leveraging the Common SQL Engine and Virtualization, improving visibility of disparate data sources across the enterprise
11. Digital transformation journey with hybrid data management
COLLECT: Hybrid Data Management
ORGANIZE: Governance and Integration
ANALYZE: Data Science and Business Analytics
12. Write Once, Run Anywhere, with a Common SQL Engine
Hybrid Data Management Solutions - Unified application and user experience
Anchored by a Common SQL Engine enabling true, highly scalable hybrid data warehousing solutions with portable analytics
– Application compatibility: Write once, run anywhere
– Operational compatibility: Reuse operational and housekeeping procedures
– Licensing: Single entitlement for flexible consumption, enabling business agility and cost-optimization
– Integration: Data virtualization capabilities for query federation and data movement
– Standardized analytics: Common programming model for in-DB analytics
– Ecosystem: One ISV product certification for all platforms
Deployment options:
– Managed public Cloud DBaaS: Db2 on Cloud, Db2 Warehouse on Cloud, Compose
– Software defined warehouse on-premises or in cloud: Db2 Warehouse
– Dedicated analytics appliance: Integrated Analytics System
– Custom deployable database: Db2
– Open source: MongoDB, PostgreSQL
– Big SQL (Hadoop w/Hortonworks)
13. So what is Data Virtualization?
The ability to view, access, manipulate and analyze data without the need to know or understand its physical format or location.
14. Bringing Data Virtualization to bear on real problems
Where it applies …
• Optimizing the analytics over different lines of business.
• Unifying data from multiple independent sources without copying the data.
• Staying in compliance with privacy and security legislation.
• Combining IoT and enterprise data.
15. Your applications can provide transparent access to other data sources via built-in data virtualization
[Diagram: Your applications → IBM Hybrid Data Management → Data Sources]
Data virtualization enables IT provisioning for the business
Select details for MongoDB:
• JSON support for NoSQL data stores
• Federate to MongoDB collection
• Ability to parse and query collection
• Initial phase supports local processing
• Next phase supports Pushdown
16. - DDL to create Federation objects
create server mongotest type jdbc version 2.54 wrapper JAVA
options(host '9.30.252.5', port '28017', dbname 'test');
create nickname students
(
name char(32) options(jpath '$.name'),
exam_score double options(jpath '$.scores[0].score'),
quiz_score double options(jpath '$.scores[1].score'),
homework_score double options(jpath '$.scores[2].score')
)
for server mongotest options(collection 'students');
- SQL for federated query from MongoDB
select * from students where exam_score > 60;
NAME EXAM_SCORE QUIZ_SCORE HOMEWORK_SCORE
--------------------------- -------------------------- ------------------------- --------------------------
Salena Olmos +8.03782650915718E+001 +4.24878066695681E+001 +9.65298617163333E+001
Sanda Ryba +7.70050995365469E+001 +8.78044963253892E+001 +2.52736853243295E+001
Aurelia Menendez +6.50604507103096E+001 +5.27979069190387E+001 +7.17613343916554E+001
{
"_id":1,
"name":"Aurelia Menendez",
"scores":[
{
"score":65.06045071030959,
"type":"exam"
},
{
"score":52.79790691903873,
"type":"quiz"
},
{
"score":71.76133439165544,
"type":"homework"
}
]
}
Database: test (collections: students, companies, …)
Example json document
Federated access to MongoDB
[Diagram labels: SQL, Federation, Server, Nickname, JSON Data Parsing, Predicate Pushdown, Data Source, Data]
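To make the nickname mapping concrete, here is a minimal Python sketch of what "local processing" of the federated query amounts to: each jpath option projects one field of the JSON document into a column, and the WHERE predicate is applied after the rows are built. This is an illustration only, not how the Db2 federation wrapper is implemented. The helper names (resolve, to_row, federated_select) are hypothetical, and the path resolver handles only the simple `$.field` and `$.field[index].field` forms used on the slide.

```python
import json
import re

# Example document copied from the slide (database "test", collection "students").
DOC = json.loads("""
{"_id": 1, "name": "Aurelia Menendez",
 "scores": [{"score": 65.06045071030959, "type": "exam"},
            {"score": 52.79790691903873, "type": "quiz"},
            {"score": 71.76133439165544, "type": "homework"}]}
""")

# Column -> JSON path, mirroring the jpath options in the nickname DDL.
COLUMNS = {
    "NAME": "$.name",
    "EXAM_SCORE": "$.scores[0].score",
    "QUIZ_SCORE": "$.scores[1].score",
    "HOMEWORK_SCORE": "$.scores[2].score",
}


def resolve(doc, path):
    """Resolve a tiny JSON-path subset: dotted keys with optional [index]."""
    value = doc
    for key, index in re.findall(r"\.(\w+)(?:\[(\d+)\])?", path):
        value = value[key]
        if index:
            value = value[int(index)]
    return value


def to_row(doc):
    """Project one JSON document into a flat row keyed by column name."""
    return {column: resolve(doc, path) for column, path in COLUMNS.items()}


def federated_select(docs, min_exam_score):
    """Hypothetical local processing: build every row, then filter on exam score."""
    return [row for row in map(to_row, docs) if row["EXAM_SCORE"] > min_exam_score]


if __name__ == "__main__":
    for row in federated_select([DOC], 60):
        print(row)
```

With pushdown, the filter on exam score would instead be evaluated by the data source, so only matching documents would cross the network.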
17. Moving forward, an extended approach is required
Dynamic multipath routing avoids bottlenecks and slow systems
Each node instead simultaneously sends the relevant portions of the query to both its connected data source(s) and its peers in the network.
It combines and processes the results as they are received.
This implicitly results in balanced processing of the query through the constellation.
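The fan-out-and-combine behaviour described above can be sketched with Python's asyncio. This is a toy model under stated assumptions, not the product's routing logic: the source names, delays, and row values are invented, and query_source merely simulates a data source or peer that answers after some latency. It shows one node issuing all sub-queries at once and folding results in as each one arrives, so a slow system does not hold up the fast ones.

```python
import asyncio


async def query_source(name, delay, rows):
    """Simulate a data source or peer that returns its rows after `delay` seconds."""
    await asyncio.sleep(delay)
    return name, rows


async def fan_out(specs):
    """Send all sub-queries concurrently and combine results as they are received."""
    tasks = [asyncio.create_task(query_source(*spec)) for spec in specs]
    arrival_order, combined = [], []
    for finished in asyncio.as_completed(tasks):
        name, rows = await finished
        arrival_order.append(name)
        combined.extend(rows)  # process each partial result immediately
    return arrival_order, combined


def run_demo():
    # Hypothetical topology: one directly connected source and two peers.
    specs = [
        ("local-source", 0.10, [1, 2]),
        ("peer-a", 0.01, [3]),
        ("peer-b", 0.05, [4, 5]),
    ]
    return asyncio.run(fan_out(specs))


if __name__ == "__main__":
    order, rows = run_demo()
    print(order, rows)
```

The sketch leaves out the routing decision itself (which peer to prefer when several paths reach the same data), which is where the dynamic multipath part would live.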
19. Collaborative Teaming across various Roles
Data Engineer
Architects data pipelines & ensures operability.
Data Steward
Governs data & ensures regulatory compliance.
Data Scientist
Gets deep into the data to draw insights for the business.
Business Analyst
Works with data to apply insights to business strategy.
App Developer
Plugs into analysis and code to build apps.
20. What is IBM Cloud Private for Data?
Collect Data
– Fast provisioning of Databases
– Data Warehousing
– Fast data ingest
– Data Virtualizing for internal and remote sources
– Structured and Unstructured data
Organize Data
– Data integration & shaping
– Data curation
– Governance and privacy policies
– Data asset lifecycle management
Analyze Data
– Self-service analytics tooling and productivity
– Data visualization & exploration
– Machine learning
– Model management and deployment
– Dashboards and business reporting
21. Cloud-native Micro Services
Instant Provisioning of Infrastructure & Experiences
• Data Science & ML
• Data Preparation
• Dashboards
New
Enterprise Data Catalog
IBM Cloud Private
• Data integration
• Data profiling
• Policy management
• Databases & warehousing
• Fast data event store
• Data virtualization
A Self-Service, cloud native experience
Enterprise Dbs
Community Dbs
• Redis Community
• MariaDB Community
• MongoDB Community
• PostgreSQL Community w/Elite Support
• MongoDB Enterprise Advanced
*Coming Soon…more to follow
MongoDB Enterprise Tile Accessible in ICP Catalog
User Requests MongoDB Enterprise Instance(s)
Kubernetes VM Environment Provisioned
Deploy MongoDB Instances and Ops Manager
Secure Access, Manage, and Grow
22. A growing ecosystem
App Dev - Data In Motion
Analytics Visualizations
Warehouse Disaster Recovery
Data Preparation / Wrangling
Data Repositories
Technology Consulting
Persistent Container Storage
Storage as a Service
24. What is Watson?
https://github.com/IBM/watson-training-from-on-prem-data
IBM Watson is a vast umbrella of technologies and solutions, one of which is Watson Studio, a PAML solution
Watson Studio blends workflow capabilities with open source machine learning libraries and notebook-based interfaces
It is designed for all collaborators who are key to making machine learning models surface into production applications
Watson offers easy integrated access to IBM Cloud pretrained machine learning models such as Visual Recognition, Watson Natural Language Classifier, and many others.
Open Source tools – Jupyter and RStudio
Watson Visual Recognition – retrain Watson
Elastic and customizable compute environments
Create ML flows and design Neural Networks visually
25. Visual Recognition
An image recognition service that enables users to quickly and accurately tag, classify, and train visual content using machine learning.
Tooling and API to make building apps easy, with the ability to create and manage custom models with Watson Studio.
Support for CoreML to leverage models on iOS devices.
Privacy and Security ensured by IBM.
[Example image tags: BASIL, LEAF, HERB, PLANT STEM, GREEN]
26. What is Visual Recognition used for?
Watson Visual Recognition is trained on:
• General: Quickly understand the contents, scenes, and actions within an image.
• Faces: Locate faces within an image and receive age and gender estimates. GA: Face Detection. 4Q Beta: Face Matching.
• Explicit: Determine if an image contains inappropriate content that may be unsuitable for general audiences.
• Custom: Train Watson to understand and classify your own custom content.
• Food: Recognize foods and meals with enhanced accuracy.
• Text: Extract full words from natural scene images (e.g. billboards, street signs).
• Object Detection: Identify multiple objects in images to better understand an image as a whole. 4Q Roadmap: Closed Beta.
Use cases:
• Visual Inspection: An Insurance company builds an image recognition solution to automate visual inspections for damage, defects, and quality assurance.
• Aerial Inspection: A drone can use a custom image model to survey and quickly identify burned or flood damaged homes.
• Social Media Listening: An Advertising agency analyzes visual content in social media posts to understand content, sentiment, and trends.
• Demographics: A Retailer uses face detection capabilities to gather age and gender estimates of its shoppers.
• Resource Identification: A Mining & Minerals company uses image recognition to automatically identify assets and sites in satellite imagery.
• Content Enrichment: A Media company uses image recognition to automatically append metadata to visual content, turning dark data into searchable content.
27. Watson Visual Recognition – What is produced?
* Two critical concepts: Classifiers/classes and Scores
https://pbs.twimg.com/media/CdJavKoUAAAFLBG.jpg
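As a rough illustration of how the two concepts fit together, the Python sketch below models a result as a list of classifiers, each holding classes with a score, and keeps only the classes above a confidence threshold. Everything here is an assumption made for illustration: the dictionary layout is a simplified stand-in rather than the service's documented response schema, the class names are borrowed from the example tags on the Visual Recognition slide, and the scores are invented rather than real Watson output.

```python
# Hypothetical result: one classifier holding several classes, each with a score.
# Class names come from the slide's example tags; the scores are made up.
RESULT = {
    "classifiers": [
        {
            "name": "default",
            "classes": [
                {"class": "basil", "score": 0.95},
                {"class": "leaf", "score": 0.90},
                {"class": "herb", "score": 0.85},
                {"class": "plant stem", "score": 0.60},
                {"class": "green", "score": 0.40},
            ],
        }
    ]
}


def confident_classes(result, threshold=0.5):
    """Return (classifier, class, score) tuples at or above threshold, best first."""
    hits = [
        (classifier["name"], cls["class"], cls["score"])
        for classifier in result["classifiers"]
        for cls in classifier["classes"]
        if cls["score"] >= threshold
    ]
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


if __name__ == "__main__":
    for classifier, name, score in confident_classes(RESULT):
        print(f"{classifier}: {name} ({score:.2f})")
```

An application would typically tune the threshold per use case, since a tag that is acceptable for content enrichment may be too uncertain for automated inspection.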
29. Advantage with IBM Platforms
Secure and Scale Your Data through IBM Z® / Hyper Protect Services / DBaaS
• Industry-leading data confidentiality through built-in workload isolation, restricted administrator access, tamper protection against internal threats
• High availability and reliability
• Supports industry compliance and certifications – GDPR
• Provides standard APIs to provision, manage, maintain and monitor multiple database types
• Integrates with IBM Cloud services
PowerAI + IBM POWER9™ + GPUs with MongoDB on Private and Hybrid Cloud
• 4x faster model training on best GPU server for AI
• 43% lower solution costs saving up to $2M per rack
• 2x faster on fewer systems with less cost
30. Summary: Why IBM and Hybrid Data
IBM has always been a major sponsor of Open Source, including in the areas of Hadoop, Data science, NoSQL, and Java
IBM Hybrid Data Management now includes MongoDB Enterprise, with integration points including Db2 family Federation, Governance, and Analytics
Cloud Private speeds consumption of databases across organizations and roles, and its fit with Watson helps move organizations from analytics to AI
IBM Platforms can further extend the benefits of HDM
IBM is focused on getting developers to AI faster
If you are watching the soccer World Cup that’s happening right now, one thing becomes clear: you can have the best individual player, but to win consistently, or to win the championship, you need to have the best team.