Infosys and MongoDB – A strategic relationship
What is GDPR?
Overview of GDPR – Infosys PoV [Key Focus Areas, Own Journey]
Infosys Solution Framework to GDPR
What Organizations are doing to be GDPR Ready and Infosys’ Relevant experience
2. Agenda
• Time to make your GDPR program successful
• GDPR: Data Protection Requirements
• Single View Methodology
• Q&A and Conclusion
3. Your Speakers today:
Paul Anderson
Regional Director
UK&I, MongoDB
Paul.Anderson@mongodb.com
Susan Geuens
Industry Principal
Infosys
Mat Keep
Director, Product Marketing
MongoDB
Mat.Keep@mongodb.com
Rich Cullen
Solution Architect
MongoDB
Rich.Cullen@mongodb.com
5. Time to make your GDPR
program successful
Susan Geuens
Industry Principal
Infosys
6. Time to make your
GDPR Program
Successful
Infosys PoV
https://www.infosys.com/gdpr/
For GDPR questions and consulting requests please feel free to email: gdprcompliance@Infosys.com
7. Presentation Agenda
01 Infosys and MongoDB – A strategic relationship
02 What is GDPR?
03 Overview of GDPR – Infosys PoV [Key Focus Areas, Own Journey]
04 Infosys Solution Framework to GDPR
05 What Organizations are doing to be GDPR Ready and Infosys’ Relevant experience
8. Infosys & MongoDB – a Strategic relationship
Innovation Alliance Sales Solutions Knowledge
Relationship Highlights
MongoDB’s Top Globally Managed SI partners
900+ MongoDB experienced people
Winner of Innovation award at MongoDB world 2015
Joint investments in go-to-market solutions
Dedicated alliance and field teams to support business
Go-To-Market
Engagements ranging from up-front business consulting through support and maintenance
Leading Global 2K clients supported by MongoDB and Infosys jointly
Joint Innovation hub as a part of the partnership to drive solutions to the market
Knowledge, Solutions and Competencies
Dedicated experts working on joint solutions for industry solutions
Regular trainings and workshops for people to upgrade skills on MongoDB
Technical consulting on projects through MongoDB experts
A strategic relationship with a
commitment to bring in best in
class solution and people for a
successful migration
9. Infosys & MongoDB – Credentials
• About 100+ Big Data Architects &
Developers
• MongoDB Skilled Resources
• About 10+ Architects
• About 40+ Developers
• About 10+ Admins
• 10+ MongoDB certified professionals
SKILLS
• About 10+ MongoDB Engagements
• Reconciliation Platform with huge
savings in processing times and cost
for a US high tech Manufacturer
• A High performance Metadata
repository system to deliver response
within seconds
PROJECTS
• Strategic Partnership with MongoDB.
• Driving together many new modern
applications development and
migration of legacy workloads
• Infosys was a Gold sponsor at
MongoDB World 2016.
• Infosys received the Partner of the
Year award at the MongoDB World
2016
ALLIANCES
• Infosys Data Services Suite offering
MongoDB migration solution
• POVs and Pilots in progress
TOOLS
10. GDPR – Key Implications
The new EU data protection regulation has been finalized. Organizations need to assess their current environment and implement a new
‘Data framework’ to be GDPR compliant before 25th May 2018
What do I need to know
about GDPR?
The EU General Data Protection Regulation will replace the current
Directive and will be directly applicable in all Member States
without the need for implementing national legislation.
What impact can it
have on my
organization?
The regulation applies not only to organizations in the EU but also to
non-EU organizations that process personal data of EU residents or
provide services to EU residents.
Where are we now, and
what is needed to be
compliant?
Organizations need to assess the current data landscape to understand
how data is being captured, processed and published, and then define a new
framework to govern data activities in compliance with GDPR
Fines for some infringements of up to 4% of
annual worldwide turnover or €20 million, whichever is higher
Data controllers must notify most data
breaches without undue delay and, where
feasible,
within 72 hours of awareness.
Need for enhanced security and Governance
- DPO (Data Protection Officer)
Right to be forgotten – New rights for
data subjects
Key Implications
International Transfer of Data will
be governed by GDPR
Source: www.iapp.org
11. GDPR – Key Focus Areas
Analyzed the 173 Recitals and 99 Articles of the GDPR as published in the Official Journal of the European Union and identified the Key Focus Areas of GDPR
Rights of
Data Subjects
Data subjects can exercise
right to data portability,
right to restrict processing,
right to rectification, right to
access, right to erasure
and right to object
Consent
Management
Organizations must inform
data subjects of the existence
and consequences of any
activities which they carry out
and obtain explicit consent
from data subjects
Profiling
and automated
processing
Data subjects should
be informed of the existence
and consequences of
profiling and automated
processing
DPO
Every organization will need to
appoint a DPO (Data protection
officer) at Controller / Processor
level who will interact with state
appointed Lead Supervisory
Authority.
The DPO will need dashboards to stay
informed and keep control of
notification & consent
management
DPIA
Organizations
processing personal
data are required to conduct
privacy impact
assessments at
regular intervals
Notifications
Breach notifications to
lead supervisory
authority and data
subjects.
Need to implement
automated alerts of
breach
Privacy
by Design
Data protection
principles should be
adopted into product
& project design
Territorial
Scope
Organizations that
provide goods
or services to EU
citizens need
to comply
Data security
and protection
measures
Immediate or near term requirements due to technology impact
12. Infosys’ own journey in safeguarding privacy of our clients, employees,
contractors and vendors
13. Infosys Framework for GDPR*
GDPR Compliance Assessment
Policies, Procedures
Define & Design
Architect, Validate and Design
Assess
Assess, Envision and Roadmap
Administer & Implement
Build, Test and Integrate
Monitor & Secure
Stabilize and Improve
Organizational Assessment
and Data Management
Data Governance
and Change Management
Reporting and
Communication
Data Security, Privacy,
Accuracy and Storage
Overall Game Plan
Gap Analysis
Governance framework, Architecture,
Personal Data Life Cycle
Evaluate process and technology
landscape
Roadmap Strategy
Transition from As-Is to To-Be
Change Management
Develop Architecture and IT
Infrastructure Plan for GDPR
Processes Design
Maximize automation in personal data
collection/aggregation/reporting
Refine Personal Data Reporting
Complete, Accurate, Adaptable, Timely
Program Governance Plan
Plan to establish DPO Organization
Build Data Management Framework
for GDPR Compliance
Re-Alignment of Operations
Testing under Normal and
Stress/Crisis Situation
Roadmap Realization
Supervision & Remedial Actions
Periodic Review of Principles Within
and Outside Jurisdiction
GDPR Strategy Realization
Commission/ Decommission
Refine
Architecture, IT Infrastructure, Data
Aggregation & Reporting Processes
Deliverables, Accelerators, Templates
ADAM-InfosysFramework
*Infosys IP
14. Overall game plan to deliver a GDPR program
Infosys BoK On
GDPR
Identify
GDPR
capabilities
required by
client
Establish
Priorities for
Sales &
Marketing
Impact
Define BRDs for
client GDPR Program
We refine
identified capabilities tailored to
client
Assess
where client
stands w.r.t.
GDPR
Identify & prioritize
the gaps that need to be
addressed & define the
framework.
Design
GDPR compliant
solutions
Realize the design
Assessment
Framework
Value
Realization
Framework
“To-Be” State
GDPR Compliant
Architecture
Refine the solution
Monitor and
Manage
Enable client
transition from As-Is to
To-Be GDPR
Compliant
We analyze and review the plans developed by client
& the existing policies and procedures
Lead generation, Sales
prospecting and Customer
outreach and communication
Change Management
15. … And GDPR requirements can be powered by the Reference Architecture
This Reference Architecture encapsulates business functions with technical capabilities
Customer Interaction Process
Consent
Management
Right to be
Forgotten and
Rectification
Customer
Engagement
Services
Customer Data Collection, Generation and
Processing Applications
Data Portability Services
Customer Data
Exchange
Customer Data
Requests
Enterprise Capabilities
Data Classification
Data Profiling
Data Discovery
Data Quality
Management
Information Life
Cycle Management
Authentication &
Authorization
Encryption
Data Pseudonymization /
Masking
Data Usage Auditing
& Analysis
System Security
Breach Identification &
Notification
Data /
Application
Security
Infrastructure
Security
Communication
Security
Auditing &
Monitoring
Information
Governance
Data Standards
Strategy & Policy
Organization
Structure
Compliance
Reporting
DPO Dashboard &
Reporting
Security Testing
16. What Organisations should be doing to be GDPR-Ready?
Sense of Urgency: Risk management company Marsh stresses the importance of leadership in prioritizing cyber preparedness. Compliance with global data hygiene standards is part of that preparedness.
Data Protection Plan: Many companies already have a plan in place, but they will need to review and update it to ensure that it aligns with GDPR requirements.
Risk Mitigation: Once you’ve identified the risks and how to mitigate them, you must put those measures into place. For most companies, that means revising existing risk mitigation measures.
Ask For Help: All companies will be affected by GDPR, some more significantly than others. They may not have the resources needed to meet requirements. Seek outside help wherever required.
Incident Response Plan: The GDPR requires that companies report breaches within 72 hours. How well the response teams minimize the damage will directly affect the company’s risk of fines for the breach. Make sure you are able to adequately report and respond within the time period.
DPO Appointment: The GDPR does not say whether the DPO needs to be a discrete position, so presumably a company may name someone who already has a similar role to the position as long as that person can ensure the protection of PII with no conflict of interest.
Stakeholder Involvement: IT alone is ill-prepared to meet GDPR requirements. A task force that spans the organization should be set up to achieve the desired outcome.
Continuous Assessment: You need to ensure that you remain in compliance, which will require monitoring and continuous improvement.
Risk Assessment: Organizations need to know what data they store and process on EU citizens and understand the risks around it. Remember, the risk assessment must also outline measures taken to mitigate that risk.
17. Case Study | GDPR Assessment for a British Mutual Financial Institution
Assessment Program - Objective & Desired Outcome
Assess a set of applications within the Data and Analytics unit of the world’s largest building
society from a GDPR perspective. The assessment includes ‘Information
Capture’ around personal data mapping, inbound and outbound interfaces, support,
operations and infrastructure for each application.
Enable gap analysis and roadmap for client’s GDPR-compliant To-Be State.
Key Learnings
• GDPR assessment program should have inter-linked and parallel assessment
of business + processes (top-down) and application + data (bottom-up) to
capture complete view from GDPR perspective.
• Due to the urgency of being GDPR-compliant by 25th May 2018, prioritization
and parallel execution of GDPR requirements is desirable.
• GDPR-inspired ‘Ways of Working’ within an organization play an important
role in maintaining the GDPR-compliant status once achieved.
GDPR Assessment Approach with Infosys
Top-Down Approach
Bottom-Up Approach
Business Analysis
Prioritized
Processes/App.
Top-Down
Observations
Information Audit
Program
Bottom – Up
Observations
Gap Analysis
( As-Is vs To-Be )
Recommendations
Roadmap for To-Be
( Next Phase )
Change
Requirements
Infosys Deliverables –
Information Audit
• Pilot Application Audit
Framework
• Project Plan
• Application Audit
Framework (Compiled)
• Assessment Report
Journey to GDPR Compliance – Design, Build, Test,
Support
18. Case Study | GDPR with a Norwegian Financial Services major
Determine whether applicable client systems and applications have technical functionalities and
controls that enable compliance with GDPR requirements.
Identify the compliance gaps for all applications processing personal data, estimate the cost for
remediation, prioritize the applications based on their risk rating and develop a roadmap for
implementation.
Key Learnings
• Personal data protection is key in BFS with the amount of personal and
sensitive data processed in this industry.
• Close collaboration and common understanding of IT and business is
important for the success of the GDPR assessment
• GDPR compliance should not be treated as an overhead but an
accelerator for successful business performance
GDPR Assessment Approach with Infosys
Infosys Deliverables –
• List of in-scope systems
• Assessment survey response
submission for in-scope systems
• Inputs to creating the cost
estimation model
• Remediation Cost estimates for
the pilot systems
• Review Gap analysis results and
Gap report
• Support in Roadmap creation
Journey to GDPR Compliance –
Initiate implementation of remediation plan
• Identified and finalized in-scope systems for the GDPR Technical Gap Assessment
• Conducted survey for each in-scope application to assess and compile the GDPR compliance gaps
• Finalized system and application sizing classification criteria to enable categorizing the systems into different sizes
• The assessed, in-scope systems were scored for compliance risk and complexity risk, ranked and plotted on a risk heat map to enable prioritization
• Selected 3 pilot systems to represent the three different sizes to conduct pilot estimations
• Calculated the cost of remediating compliance functionality and control gaps for each system/application
• Brainstormed with the various stakeholders to prioritize the risky systems and create an actionable GDPR Roadmap for the client
Assessment Program - Objective & Desired Outcome
21. Disclosure
For a full description of the GDPR’s regulations, roles, and
responsibilities, it is recommended that readers refer to the text
of the GDPR (Regulation (EU) 2016/679), available from the
Official Journal of the European Union, and refer to legal counsel
for the interpretation of how the regulations apply to their
organization.
22. What’s Needed for Compliance?
What compliance isn’t….
•Turn on a bunch of database
security controls
•BOOM…we’re done!
What compliance is…
•People
•Roles, responsibilities, accountability
•Process
•Business practices
•Product
•Technologies to implement controls
24. GDPR Data Protection Requirements
DISCOVER: Identify all PII in your systems
DEFEND: Implement appropriate security controls
DETECT: Monitor to identify suspicious behavior, remediate gaps
25. Mapping Required Capabilities to GDPR
Discover: Identify Personal Data; Implement Retention Policies
Defend: Access Control; Pseudonymisation & Encryption; Resilience & DR; Data Sovereignty
Detect: Monitor & Report; Audit
26. Discover
Identification of Personal Data
Data Protection Impact Assessment
GDPR Article 35 (clause 1)
“Where a type of processing in particular using new technologies, and
taking into account the nature, scope, context and purposes of the
processing, is likely to result in a high risk to the rights and freedoms of
natural persons, the controller shall, prior to the processing, carry
out an assessment of the impact of the envisaged processing
operations on the protection of personal data.”
27. MongoDB Compass
The GUI for MongoDB
• Visualize & explore your schema
with an intuitive GUI
• Gain quick insights about your data
with easy-to-read histograms
• Build queries with a few clicks
• Drill down to view individual
documents in your collection
• Create governance rules to enforce
data controls
28. Discover
Retention of Personal Data
“Information to be Provided”
GDPR Article 13 (clause 2a)
“the period for which the personal data will be stored, or if
that is not possible, the criteria used to determine that period.”
29. Time to Live (TTL) Indexes
• Automates the expiry of data from the
database
• Define TTL index against a date field, specify
the expiration period
• Background process deletes the document
once retention period expires
• Simplifies enforcement, with lower overhead
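As a rough illustration of the bullets above, the sketch below builds the `createIndexes` command document that would define a TTL index, without connecting to a database. The collection name `customers`, the field `created_at` and the 90-day retention period are illustrative assumptions, and `is_expired` only mimics locally the rule the background process applies.

```python
from datetime import datetime, timedelta, timezone

def ttl_index_command(collection, date_field, retention_days):
    """Build a createIndexes command document for a TTL index.

    expireAfterSeconds is counted from the date stored in date_field.
    """
    return {
        "createIndexes": collection,
        "indexes": [{
            "key": {date_field: 1},
            "name": f"{date_field}_ttl",
            "expireAfterSeconds": retention_days * 24 * 60 * 60,
        }],
    }

def is_expired(document, date_field, retention_days, now):
    """Local illustration of the expiry rule; not run by the database."""
    return document[date_field] + timedelta(days=retention_days) <= now

# Hypothetical names and retention period, chosen for the example only.
command = ttl_index_command("customers", "created_at", 90)
```

With a driver such as PyMongo, a command document like this could be passed to `Database.command`; the retention period itself should come from the organization's documented retention policy.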
30. Defend
General Security Requirements
“Security of Processing”
GDPR Article 32 (clause 1)
“….the controller and the processor shall implement appropriate technical and organisational
measures to ensure a level of security appropriate to the risk, including inter alia as appropriate:
a. the pseudonymisation and encryption of personal data;
b. the ability to ensure the ongoing confidentiality, integrity, availability and resilience of
processing systems and services;
c. the ability to restore the availability and access to personal data in a timely manner in the
event of a physical or technical incident;
d. a process for regularly testing, assessing and evaluating the effectiveness of technical
and organisational measures for ensuring the security of the processing.”
32. Defend
Pseudonymisation & Encryption
“Security of Processing”
GDPR Article 32 (clause 1)
“…. shall implement appropriate technical and organisational measures to ensure a level of
security appropriate to the risk…:
a. the pseudonymisation and encryption of personal data;”
“Communication of a Personal Data Breach to the Data Subject”
GDPR Article 34 (clause 3a)
Communication of a breach to a data subject is not required if the data is rendered unintelligible,
e.g. via encryption
33. Pseudonymisation & Encryption
• Read-Only Views: expose a subset of data from the
underlying database
• Exclude or mask fields, without affecting source collection
• Reduces risk of sensitive data exposure
• Separately specified permissions levels
• End to end data encryption
• Data in motion, TLS encryption
• Data at rest in persistent storage and backups
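The read-only view idea above can be sketched as follows. The code builds a `create` command that defines a view whose pipeline excludes sensitive fields, and applies the same exclusion to a sample document locally. The view name `customers_masked`, the source collection `customers` and the choice of hidden fields are assumptions made for illustration.

```python
def masked_view_command(view_name, source, hidden_fields):
    """Build a `create` command defining a read-only view on `source`
    whose $project stage excludes the given fields."""
    projection = {field: 0 for field in hidden_fields}
    return {
        "create": view_name,
        "viewOn": source,
        "pipeline": [{"$project": projection}],
    }

def apply_exclusion(document, hidden_fields):
    """Local stand-in for what the view would return for one document."""
    return {k: v for k, v in document.items() if k not in hidden_fields}

# Hypothetical field selection for the example.
HIDDEN = ["dob", "phones"]
command = masked_view_command("customers_masked", "customers", HIDDEN)
sample = {"_id": 1, "first_name": "Mark", "dob": "1968-07-14", "phones": []}
visible = apply_exclusion(sample, HIDDEN)
```

Users granted access only to the view never see the excluded fields, while the source collection stays unchanged.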
34. Defend
Resilience & Disaster Recovery
“Security of Processing”
GDPR Article 32 (clause 1)
“…. implement appropriate technical and organisational measures to ensure a level of security
appropriate to the risk, including …:
b. the ability to ensure the ongoing confidentiality, integrity, availability and resilience of
processing systems and services;
c. the ability to restore the availability and access to personal data in a timely manner in the
event of a physical or technical incident;”
35. Resilience & Disaster Recovery
• Replica set – 2 to 50 copies, always-on data
availability
• Self healing from failures
• Rolling restarts for planned maintenance
• Continuous backups, consistent cluster-wide
snapshots
• Point-in-time restore
Application
Driver
Primary
Secondary
Secondary
Replication
36. Defend
Sovereignty: Data Transfers Outside of the EU
GDPR Article 45 (clause 1)
“A transfer of personal data to a third country or an international organisation may take
place where the Commission has decided that the third country, a territory or one or more
specified sectors within that third country, or the international organisation in question ensures an
adequate level of protection.”
37. MongoDB Zones
• Partition data across distributed
clusters based on data locality policies
• Adhere to data sovereignty requirements
• If policies change, update the shard key range
and data is automatically migrated
• Can be configured visually from
MongoDB Ops Manager
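A minimal sketch of zone ranges, assuming a hypothetical numeric shard key field `region_code` in which values 0 to 99 denote EU residents and 100 to 199 US residents. It builds `updateZoneKeyRange` command documents and shows locally which zone a given key value falls into; the namespace `crm.customers` and the zone names are also assumptions.

```python
# (zone, inclusive lower bound, exclusive upper bound) on the shard key field.
ZONE_RANGES = [
    ("EU", 0, 100),
    ("US", 100, 200),
]

def zone_range_commands(namespace, shard_key_field, zone_ranges):
    """Build one updateZoneKeyRange command document per zone range."""
    return [
        {
            "updateZoneKeyRange": namespace,
            "min": {shard_key_field: low},
            "max": {shard_key_field: high},
            "zone": zone,
        }
        for zone, low, high in zone_ranges
    ]

def zone_for(value, zone_ranges):
    """Local illustration: find the zone whose [low, high) range holds value."""
    for zone, low, high in zone_ranges:
        if low <= value < high:
            return zone
    return None

commands = zone_range_commands("crm.customers", "region_code", ZONE_RANGES)
```

Changing a data locality policy then amounts to issuing a new range for the affected zone, after which the cluster migrates the data as described above.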
38. Detect
Monitoring, Alerting, Auditing
“Notification of a Personal Data Breach to the Supervisory Authority”
GDPR Article 33 (clause 1)
“In the case of a personal data breach, the controller shall without undue delay
and, where feasible, not later than 72 hours after having become aware of it,
notify the personal data breach to the supervisory authority....”
“Records of Processing Activities”
GDPR Article 30 (clause 1)
“....Each controller and, where applicable, the controller's representative, shall
maintain a record of processing activities under its responsibility”
39. Monitoring & Auditing
• 100+ database-related metrics
• Visualized across charts and dashboards
• Real-time alerting
• API integration into APM platforms
• Audit log records all actions
taken against the database
• Configurable filters (commands, IP, etc) &
role-based auditing
• Change streams (coming)
40. Security Training
“.... the appropriate data protection training to personnel having permanent or
regular access to personal data”
“Binding corporate rules”
GDPR Article 47 (clause 2n)
• MongoDB M310 Security Course
• MongoDB University public & private training
• MongoDB Global Consulting Services
42. Digital Transformation with MongoDB
UK’s Leading Commercial Property Data Service Drives GDPR
readiness
Problem
• Need to develop a new platform for the company to move from traditional print media to a digital business delivering market intelligence and tools across multiple online channels
• Monolithic application architecture and rigid relational database prevented the IT team from pushing new updates more than once per month
Solution
• Moved to MEAN stack powered by a microservices-based architecture in the cloud
• MongoDB Enterprise Advanced for access to advanced security and support
• MongoDB Encrypted storage engine to support GDPR readiness
Results
• GDPR readiness with a much more agile data platform
• Supports 50x more releases per month, with always-on availability
• Transformed business: digital is now driving revenue growth
43. Developing New Mobile Channels
Enabling “Security by Design, and by Default”
Leading European Retailer
Situation
• 50k employees, €4.5bn sales
• Extend beyond brick and mortar to mobile apps
• Developing opt-in marketing services for customer data collection
Solution
• MongoDB to store all customer data collected from mobile apps
• MongoDB Enterprise Advanced for access controls, encryption, & auditing
• MongoDB Global Consulting Services to advise on data protection best practices
Results
• Implement security best practices at the start of the project
• Avoid need to adjust architecture later in the product cycle
• Can demonstrate “security by design and default”
51. Single View Defined
• What
• Single, real-time representation of a business entity or
domain
• Customer, product, supply chain, financial asset class, &
more
• How
• Gathers and organises data from multiple, disconnected
sources
• Aggregates information into a standardised format and joint
information model
• Why
• Improves business visibility
• Serve operational applications
• Foundation for analytics
52. Single View Use Cases
Finance
• Comparative view of traders or products
• Firm-wide view of asset exposure
• Aggregated transactions for fraud models
Retail
• Omni-channel view of customers for personalized marketing
• Inventory control & management
• Single view of product across channels & demographics
Healthcare
• Management of patient medical records for treatment plans
• Macro-analysis view for public health
• Medical history to identify insurance risk
53. Challenges
• Current State
• Data dispersed across multitude of systems
• Different structures, different attributes
• Apps built to meet specific business requirements, not
integrated
• New data sources from new apps, M&A
• Governance Processes
• How to deliver & maintain single view in face of constant
business change
• Technology Limitations
• Traditional databases not well suited to single view
required capabilities
56. 10-Step Methodology
Discover
Step 1: Define Scope
Step 2: Identify Data Consumers
Step 3: Identify Data Producers
Develop
Step 4: Appoint Data Stewards
Step 5: Develop Data Model
Step 6: Load & Standardize
Step 7: Merge, Test & Reconcile
Deploy
Step 8: Infrastructure Design
Step 9: Modify Consuming Systems
Step 10: Maintenance Processes
GDPR requires this…
…and this…
…and this!
58. Step 1: Define Scope & Sponsorship
• Realistic scope, defined by specific success metric
• Long term: aggregate all customer data into a single view, serving all
business functions
• Initial phase: collecting all customer interactions on digital channels
over the past 3 months to improve call center MTTR
• Appoint executive sponsors
• Senior: allocate resources and command credibility
• Combination of senior title from the business, and from the technology
group
Discover
59. Step 2 & 3: Identify Data Consumers & Data
Producers
Discover
Step 2: Data Consumers
• Single view consumers define:
– Typical queries and SLAs
– Required data attributes
– Current data sources
Step 3: Data Producers
• Identify apps generating the source data
– Identify application owners + associated databases
– Profile apps: operational, analytical
Source Systems
Web
Mobile
CRM
Mainframe
60. Step 4: Appoint Data Stewards
• Data steward appointed for each data
source.
• Deep knowledge of:
• Source system schema
• Which tables store required attributes, what format
• Clients and apps that generate & consume the source
data
• Advise on data loading strategies
Develop
61. Step 5: Develop Single View Data Model
• Key inputs
• Required data attributes
• Query patterns
• Define common fields & data types
• Create rules to validate common data
• Define primary & secondary indexes
• Identify dynamic fields
• No need to pre-declare when using a document database
• Add relevant schema validation rules
• Localise data into a single document (where
appropriate)
{
  _id : "mark.smith@mongodb.com",
  first_name : "Mark",
  last_name : "Smith",
  city : "San Francisco",
  phones : [
    { number : "1-212-777-1212", dnc : true, type : "home" },
    { number : "1-212-777-1213", type : "cell" }
  ]
}
Single View
Develop
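The "schema validation rules" bullet can be made concrete with a sketch like the one below. It builds a `collMod` command carrying a `$jsonSchema` validator for the customer document shown on this slide. It assumes a MongoDB version that supports `$jsonSchema` (3.6 or later); the required fields, the allowed phone types and the collection name `customers` are illustrative choices, not rules taken from the presentation.

```python
# Hypothetical validation rules for the single view customer document.
CUSTOMER_VALIDATOR = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["_id", "first_name", "last_name"],
        "properties": {
            "first_name": {"bsonType": "string"},
            "last_name": {"bsonType": "string"},
            "city": {"bsonType": "string"},
            "phones": {
                "bsonType": "array",
                "items": {
                    "bsonType": "object",
                    "required": ["number", "type"],
                    "properties": {
                        "number": {"bsonType": "string"},
                        "type": {"enum": ["home", "cell", "work"]},
                        "dnc": {"bsonType": "bool"},
                    },
                },
            },
        },
    }
}

def validation_command(collection, validator):
    """Build a collMod command that attaches the validator to a collection.

    "moderate" applies the rules to inserts and to updates of documents
    that already satisfy them.
    """
    return {
        "collMod": collection,
        "validator": validator,
        "validationLevel": "moderate",
    }

command = validation_command("customers", CUSTOMER_VALIDATOR)
```

Fields not listed under `properties` remain allowed, which keeps the dynamic fields mentioned above free to vary between documents.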
62. Resources to Support Schema Design
MongoDB
Documentation
MongoDB
Development Rapid Start
Develop
63. Step 6: Load
2 phases: Initial Load & Delta Load
Emit JSON from the source systems; use Extended JSON to preserve data types
Initial Load
• ETL Tools
• Custom Loaders
Delta Load
• Batch loads: use tools above
• Real-time loads: Message queue
Load
ETL or Message Queue
Single View
Develop
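To illustrate why Extended JSON helps preserve data types during a load, the sketch below encodes Python `datetime` values in the canonical Extended JSON date form (`$date` wrapping `$numberLong` milliseconds since the Unix epoch) and decodes them again. The sample record and the helper names are hypothetical; a real loader would more likely rely on the driver's own Extended JSON support.

```python
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_extended_json(value):
    """Recursively replace datetimes with canonical Extended JSON dates."""
    if isinstance(value, datetime):
        millis = (value - EPOCH) // timedelta(milliseconds=1)
        return {"$date": {"$numberLong": str(millis)}}
    if isinstance(value, dict):
        return {key: to_extended_json(item) for key, item in value.items()}
    if isinstance(value, list):
        return [to_extended_json(item) for item in value]
    return value

def from_extended_date(encoded):
    """Decode a canonical Extended JSON date back into a datetime."""
    millis = int(encoded["$date"]["$numberLong"])
    return EPOCH + timedelta(milliseconds=millis)

# Hypothetical source record with a typed date of birth.
record = {"cust_id": 14, "dob": datetime(1968, 7, 14, tzinfo=timezone.utc)}
line = json.dumps(to_extended_json(record))
```

Plain JSON would force the date into an untyped string; the wrapper keeps the type explicit so the loader can store a real date.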
64. Step 6 (cont’d): Standardize
Data Source A
cust_id: 14
f_name: James
l_name: Bond
dob: 07/14/1968
eMail: 007@spook.com
Data Source B
fno: 77
first: Jim
last: Bond
born: 1968-07-14
email: 007@spook.com
Data Source C
xc_id: 26
name: James Bind
bdate: July 14, 68
Email: 007@spook.com
Develop
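The standardization of the three source records could be sketched as below: each source's field names are mapped onto common names and every date format is normalized to ISO 8601. The mapping tables, the `source_id` convention and the rule for two-digit years (a parsed date in the future is moved back a century) are assumptions made for this example.

```python
from datetime import date, datetime

# Per-source mapping from common field names to local field names.
FIELD_MAPS = {
    "A": {"id": "cust_id", "first_name": "f_name", "last_name": "l_name",
          "dob": "dob", "email": "eMail"},
    "B": {"id": "fno", "first_name": "first", "last_name": "last",
          "dob": "born", "email": "email"},
}
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%B %d, %y")

def parse_dob(text):
    """Try each known date format and return an ISO 8601 date string."""
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt).date()
        except ValueError:
            continue
        if parsed > date.today():
            # A two-digit year resolved to the future: assume last century.
            parsed = parsed.replace(year=parsed.year - 100)
        return parsed.isoformat()
    raise ValueError(f"unrecognized date format: {text!r}")

def standardize(source, record):
    """Map one source record onto the common field names and data types."""
    if source == "C":
        # Source C keeps the full name in a single field.
        first, last = record["name"].split(" ", 1)
        raw = {"id": record["xc_id"], "first_name": first, "last_name": last,
               "dob": record["bdate"], "email": record["Email"]}
    else:
        mapping = FIELD_MAPS[source]
        raw = {common: record[local] for common, local in mapping.items()}
    return {
        "source_id": f"{source}_{raw['id']}",
        "first_name": raw["first_name"],
        "last_name": raw["last_name"],
        "dob": parse_dob(raw["dob"]),
        "eMail": raw["email"],
    }
```

Note that standardization deliberately leaves "Jim" and "Bind" untouched; reconciling such differences belongs to the matching step that follows.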
65. Step 7: Match, Merge & Reconcile
Develop
Source Data
cust_id: 14
f_name: James
l_name: Bond
dob: 07/14/1968
eMail: 007@spook.com
fno: 77
first: Jim
last: Bond
born: 1968-07-14
email: 007@spook.com
xc_id: 26
name: James Bind
bdate: July 14, 68
Email: 007@spook.com
Standardized Data (field names & data types)
source_id: A_14
first_name: James
last_name: Bond
dob: 1968-07-14
eMail: 007@spook.com
source_id: B_77
first_name: Jim
last_name: Bond
dob: 1968-07-14
eMail: 007@spook.com
source_id: C_26
first_name: James
last_name: Bind
dob: 1968-07-14
eMail: 007@spook.com
Single View (data merged, tested & reconciled)
_id: 007@spook.com
first_name: James
last_name: Bond
dob: 1968-07-14
66. Step 7 (cont’d): Match, Merge & Reconcile
• Use iterative grouping functions to cluster records with similar
attributes
1. Match against unique, authoritative attributes (email address, credit card #)
2. Match by combining attributes (last name, DoB, zip code)
3. Use fuzzy matching to catch errors in source data (e.g. different spellings of customer
name)
• Apply confidence factor to dictate merging
• Automatically merge records with 95%+ confidence
• Manually inspect records with lower confidence
Develop
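The confidence-based routing on this slide might look like the sketch below. The 95% threshold comes from the slide; the tiered scores (0.99 for a matching email address, 0.90 for matching last name plus date of birth) are invented for illustration and would need tuning against real data.

```python
AUTO_MERGE_THRESHOLD = 0.95  # from the slide: merge automatically at 95%+

def match_confidence(a, b):
    """Hypothetical tiered scoring of how likely two records match."""
    if a["eMail"].lower() == b["eMail"].lower():
        return 0.99  # unique, authoritative attribute
    if (a["last_name"], a["dob"]) == (b["last_name"], b["dob"]):
        return 0.90  # combination of weaker attributes
    return 0.0

def route(a, b):
    """Decide whether a pair is merged, reviewed by hand, or left apart."""
    confidence = match_confidence(a, b)
    if confidence >= AUTO_MERGE_THRESHOLD:
        return "merge"
    if confidence > 0:
        return "manual_review"
    return "no_match"
```

Pairs routed to `manual_review` are the ones a data steward would inspect before merging.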
67. Step 7 (cont’d): MongoDB Tools
• Workers framework to parallelize document comparisons
• Grouping tool to cluster documents based on attribute similarity
• Levenshtein to calculate distances, single-linkage clustering for matching
Develop
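The techniques named above can be sketched in plain Python: a Levenshtein distance and a single-linkage clustering in which two names land in the same cluster whenever a chain of pairs within `max_distance` connects them. This is an independent illustration, not the MongoDB tooling itself; the function names and the distance threshold are assumptions.

```python
def levenshtein(a, b):
    """Edit distance via the classic dynamic-programming recurrence."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]

def single_linkage(names, max_distance):
    """Cluster names with union-find: link every pair within max_distance."""
    parent = list(range(len(names)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if levenshtein(names[i], names[j]) <= max_distance:
                parent[find(i)] = find(j)

    clusters = {}
    for i, name in enumerate(names):
        clusters.setdefault(find(i), []).append(name)
    return sorted(clusters.values())

NAMES = ["James Bond", "James Bind", "Jim Bond", "Mark Smith"]
clusters = single_linkage(NAMES, 2)
```

The pairwise comparison is quadratic in the number of records, which is why a workers framework to parallelize the comparisons matters at scale.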
68. Step 8: Architecture Design
Deploy
• Deployment infrastructure
• MongoDB Production Readiness Consulting
Package provides recommendations:
• Hardware sizing
• HA/DR strategies
• Scaling
• Security for corporate and regulatory compliance
• Follow-on services for implementation
69. Step 9: Modify Consuming Systems
Deploy
• Modify the apps that consume the
single view
• Create an API that exposes the single view (e.g.
RESTful web service)
• Re-point apps to the web service (reads initially)
• Modify one consuming application at a
time
Consuming
Systems
Reads
Single View
Call Center
Analytics
Technical
Support
Billing
70. Step 10: Implement Maintenance Processes
Deploy
• Frequency of application launch & evolution
is accelerating
• Impacts to single view
• Adding new attributes from source systems
• Onboarding new data sources or digital channels
• Creating new apps that consume the single view
• Single view team needs to institutionalise
governance around on-going maintenance
• Repeat the 10-step process
• Dynamic schema is HUGE!
72. Single View Maturity Model
Transforming the role of the single view
Axes: Scope, Business Benefits
Read Centric
Single View Application: Records are copied via ETL or message queue mechanisms from the source systems into the single view, serving read queries. The single view serves one specific application
Single View Platform: The single view data model is enriched with additional sources to serve more applications, including real-time analytics. The single view becomes a platform serving multiple applications
Reads & Writes
Dual Writes: Writes are performed concurrently to the source systems as well as the single view
Single View First: Transactions are written first to the single view, which propagates the data back to the source system of record.
• Advantages of writing to the single view
– Fresher data
– Reduced app complexity
– Improved application agility
73. Architecture for Writes to the Single View
ETL or Message Queue
Web
Mobile
CRM
Mainframe
Single View
Update
Queue
Reads
Writes
Source Systems Consuming Systems
Load
Call Center
Analytics
Technical
Support
Billing
76. Why Not Use The Usual Tech – Relational
Databases?
Database MUST
simultaneously handle source
systems complexity
Untenable change
management
Complex data access
83. Single View of the Customer
Insurance leader generates coveted single view of
customers in 90 days – “The Wall”
Problem
• No single view of customer, leading to poor customer experience and churn
• 145 years of policy data, 70+ systems, 24 "800" numbers, 15+ front-end apps that are not integrated
• Spent 2 years, $25M trying to build single view with RDBMS – failed
Solution
• Built “The Wall,” pulling in disparate data and serving single view to customer service reps in real time
• Flexible data model to aggregate disparate data into single data store
• Expressive query language and secondary indexes to serve any field in real time
Results
• Prototyped in 2 weeks
• Deployed to production in 90 days
• Decreased churn and improved ability to upsell/cross-sell
84. Single View of Analytics
Data aggregation system to accelerate scientific research &
discovery
Problem
• Raw data from LHC & experiments distributed across multitude of source systems
• Scientists don’t know location of source data, or how to extract it
• Relational databases’ rigid data model prevented aggregation of data from different sources
Solution
• Data Aggregation System built on MongoDB, consolidating analytics into a single view
• Dynamic schema represents data of any structure
• MongoDB query language supports simple lookups to complex search, traversals & analytics
Results
• A single query to MongoDB can return 10,000 documents from different data sources for real time analytics
• Accelerates scientific time to insight
• Accessed by 3,000 physicists from 200 research institutions across the globe
85. Single View of the Customer
360° view of the customer increases customer satisfaction,
cross-sell & up-sell with MongoDB, Spark, & Hadoop
Global Airline
Problem
• Customer data scattered across 100+ different systems
• Poor customer experience: no personalization, no consistent experience across brands or devices
• No way to analyze customer behavior to deliver targeted offers
Solution
• Single View application on MongoDB: flexible data model, expressive query language, secondary indexes, & horizontal scalability
• Data from old relational systems fed into Spark for analysis and then stored in MongoDB to support real-time CRM
• Customer data synced from MongoDB to Hadoop for nightly batch jobs, then fed back to MongoDB for personalized recommendations
Results
• Single view serves customers from any channel
• Stores 10s of TBs of customer data across multiple data centers
• Increased revenues from improved customer intimacy, driving cross-sell and upsell
87. Where to Go from Here?
• Single view projects are challenging
• Partner with a vendor offering proven methodology, tools
& technologies
• Learn More
• Download the whitepaper
• 10-Step Methodology to Building a Single View
• Engage
• MongoDB Global Consulting Services can help you
scope the project and get started
• Book a workshop