© COPYRIGHT 2015 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Data Integration and Polyglot Persistence
Damon Feldman, Ph.D.
Solutions Director – MarkLogic
Twitter: @damonfeldman
Integration Done Right – Avoiding the Franken-Beast
Agenda
• Review a specific data integration project
  – The names have been changed to protect the innocent
• Why did it become complex?
• How does this inform integration generally?
• MarkLogic’s vision and features to address this problem
Our Data Integration Project
• Simple need to allow people to apply for mortgages
  – Accept binary Excel submissions containing structured data, review and approve
• Became complex
• We’ll walk through the various issues and considerations
• Finally, we’ll talk about how to simplify these systems
Poly-what?
• Polyglot Persistence
• Polyglot: “someone who speaks or writes several languages”
• “The term polyglot is redefined for big data as [using] several core database technologies [needed] no matter how narrow your approach to big data.”
  – Hurwitz et al., Big Data for Dummies
• Rows & columns; documents; binaries; RDF triples; text
• Note that MarkLogic handles multiple data forms, within one technology, via universal indexing
The Requirement
• Mortgage application system
  – Input: Excel worksheet submissions
  – Business Entities are extracted
  – Workflow and approval
  – Binaries and XML documents are both persisted
  – This is a NoSQL system, because it is focused on Business Entities
• Our customer chose to bifurcate their data
  – MarkLogic for Documents (Business Entities)
  – Alfresco for binaries
    – Input Excel, PDF notices, some metadata
Polyglot, or “Poly-Not?”
“MarkLogic is the best XML/JSON document store in the world – we get that!
But binaries should go into a ‘content system’ … a CMS or DAM.
Let’s use Alfresco to store the Excel and some generated PDF notices, and put the XML in MarkLogic.
That way we use a best-of-breed system for each type of data!”
Envisioned Architecture
– Store the input
– Extract structured data
– Store the Business Entity XML
Coordinating The Parts
• Export super-jumbo loans from last week in 1GB chunks
  – Include binaries and XML Business Entities
• Data is bifurcated
  – MarkLogic knows dates and super-jumbo thresholds per zip code
  – Alfresco has the binaries
• Now what?
• Who controls paging to hit 1GB per file?
• What knows how to get a record and then make a REST call to Alfresco?
We Went with Two Passes
• Export all for the week
• Chunk it in a second pass with Python
• Two phases, so two operations
Bonus Questions: how do you monitor the Python output for errors? What if it fails? What if the consumer finds data issues?
Is there traceability from the Business Entity query to the temp data, through the Python script to the bundled output?
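The second-pass chunking step can be sketched as follows. This is a minimal illustration, not the project's actual script: the `(record_id, payload)` record shape and the `chunk_by_size` name are assumptions made for the example, and the size limit is a parameter rather than a hard-coded 1GB.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record shape for illustration: (record_id, payload_bytes).
Record = Tuple[str, bytes]


def chunk_by_size(records: Iterable[Record], max_bytes: int) -> Iterator[List[Record]]:
    """Group records into batches whose total payload size stays within max_bytes.

    A single record larger than max_bytes is emitted alone in its own batch.
    """
    batch: List[Record] = []
    batch_size = 0
    for record in records:
        size = len(record[1])
        # Close the current batch if adding this record would exceed the limit.
        if batch and batch_size + size > max_bytes:
            yield batch
            batch, batch_size = [], 0
        batch.append(record)
        batch_size += size
    if batch:
        yield batch


if __name__ == "__main__":
    demo = [("a", b"x" * 4), ("b", b"x" * 4), ("c", b"x" * 4), ("d", b"x" * 12)]
    for number, batch in enumerate(chunk_by_size(demo, max_bytes=10)):
        print(number, [record_id for record_id, _ in batch])
```

Even this small step needs its own monitoring, retry, and traceability story, which is the point of the bonus questions above.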
The Franken-Beast
OPERATIONAL
Think DevOps
#DevOps => Simplify, Monitor, Think of the impacts
Systemic Simplicity
• Your architecture diagram is a Chimera
  – And we want it that way
  – A couple more boxes on your architecture diagram may mean a couple dozen boxes in your deployment diagram
Humans create simplified views like architecture diagrams exactly because we are not well suited to deal with this level of complexity
Building and Development is One Aspect
• Multiple stores required coordination and extra processing
• Architecture and development time were affected
• Other aspects of the program were also slowed down
Operational Components
• Alfresco ships as a unit
• … but deploys as a set of technologies
• … and needs reliable storage
Beneath the Hood
HA/DR
• HA means copies of all persisted data
• Many stores, many copies
• Many copies, many configs
HA/DR
• DR means copies of entire systems
• All with replication
Is HA/DR Possible?
• Consistent data requires transactional control
• Having two (or more!) persistent components makes this difficult or impossible
• Synchronizing data, restoring data, recovering to a point in time? All require a notion of transactional consistency.
• This was a huge time- and brain-drain
With MarkLogic it is transactional, fast, correct, and fully tested under load.
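The consistency problem with two persistent components can be illustrated with a toy sketch. Everything here is hypothetical: `InMemoryStore` stands in for any two independent stores and is not the API of MarkLogic or Alfresco. The sketch shows how a failure between two independent writes leaves an orphaned record, and why a compensating delete narrows the window without closing it.

```python
class StoreError(Exception):
    """Raised by the toy store to simulate an outage."""


class InMemoryStore:
    """Toy stand-in for one independent persistent store (hypothetical)."""

    def __init__(self, fail_on_put: bool = False) -> None:
        self.data = {}
        self.fail_on_put = fail_on_put

    def put(self, key: str, value: bytes) -> None:
        if self.fail_on_put:
            raise StoreError(f"put failed for {key}")
        self.data[key] = value

    def delete(self, key: str) -> None:
        self.data.pop(key, None)


def save_naive(entities, binaries, key, entity, binary):
    """Two independent writes: a failure on the second leaves an orphaned entity."""
    entities.put(key, entity)
    binaries.put(key, binary)


def save_with_compensation(entities, binaries, key, entity, binary):
    """Undo the first write if the second fails.

    This narrows, but does not close, the inconsistency window:
    the compensating delete can itself fail, and readers may see
    the entity before it is removed.
    """
    entities.put(key, entity)
    try:
        binaries.put(key, binary)
    except StoreError:
        entities.delete(key)
        raise


if __name__ == "__main__":
    entities, binaries = InMemoryStore(), InMemoryStore(fail_on_put=True)
    try:
        save_naive(entities, binaries, "MTG-0042", b"<application/>", b"excel-bytes")
    except StoreError:
        pass
    print("orphaned entity after naive save:", "MTG-0042" in entities.data)
```

With a single transactional store, both writes would commit or roll back together, and backup, restore, and point-in-time recovery would share one notion of "now".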
Monitoring
CM, CI/CD
• Production is reflected in Dev, QA, Stage, etc.
• Entire process should be automated, repeatable and constant
[Diagram labels: Master Config; MarkLogic Code; Alfresco Config; Oracle DDL; Batch Process Code; Python Script; Clustered FS Setup; Create Directories; Set up initial data or config; environments: Production, QA, DEV]
How does this story end?
• We are now working to remove much of the complexity
• Design is for binaries inside MarkLogic
• To reduce outages, operational complexity
• And improve performance
MarkLogic Approach
WHAT ABOUT OTHER DATA SOURCES?
What about other data types?
• This use case is not specific to binaries and XML documents
[Diagram: load and index data “as is” from varied sources (Binary, RDF, RDB); deliver data in unified form]
Same complexity applies to other data types
• Structured data + Semantic data
• Structured data + text data
• Semantic + text
• [ . . . ]
• Structured + Semantic + Binary, with mixed text
What would our Mortgage example look like with RDF Triples?
What is Semantic Triple Data?
• AKA RDF. AKA Linked Open Data.
dbr:Kevin_Bacon foaf:knows dbr:Harvey_Keitel
dbr:Kevin_Bacon dbo:spouse dbr:Kyra_Sedgwick
dbo:spouse rdfs:subPropertyOf dbo:knows
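What the third triple buys can be shown with a small sketch of the RDFS subproperty rule: if (s p o) holds and p is a subproperty of q, then (s q o) holds too. The triples are the ones on the slide, held as plain Python tuples; the `infer_subproperties` function is an illustrative name, not a triple-store API.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# The three triples from the slide, as (subject, predicate, object).
TRIPLES: Set[Triple] = {
    ("dbr:Kevin_Bacon", "foaf:knows", "dbr:Harvey_Keitel"),
    ("dbr:Kevin_Bacon", "dbo:spouse", "dbr:Kyra_Sedgwick"),
    ("dbo:spouse", "rdfs:subPropertyOf", "dbo:knows"),
}


def infer_subproperties(triples: Set[Triple]) -> Set[Triple]:
    """Apply (s p o) + (p rdfs:subPropertyOf q) => (s q o) until nothing new appears."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        sub_props = {(s, o) for s, p, o in inferred if p == "rdfs:subPropertyOf"}
        for s, p, o in list(inferred):
            for sub, sup in sub_props:
                if p == sub and (s, sup, o) not in inferred:
                    inferred.add((s, sup, o))
                    changed = True
    return inferred


if __name__ == "__main__":
    for triple in sorted(infer_subproperties(TRIPLES) - TRIPLES):
        print(triple)
```

Here the only new fact is that Kevin Bacon `dbo:knows` Kyra Sedgwick, derived from the spouse relationship without anyone having stated it directly.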
Back to Polyglot Persistence
• Documents: Natural Business Entities stored as documents
• Triples: Relationships among Business Entities as RDF Triples
[Diagram: document entities Applicant (bob-jones-03), Application (MTG-0042), CreditHistory (EQFX-9928), Property (MTG-0042), Loan (MTG-0042), linked by triples:]
bob-jones-03 :appliesOn MTG-0042
bob-jones-03 hasCredit EQFX-9928
… includesDebt …
… hasCollateral …
+Inference: What real-estate exposures does this applicant have?
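The exposure question amounts to a graph walk over the relationship triples. In the sketch below only the first two triples come from the slide; the slide elides the subjects and objects of `includesDebt` and `hasCollateral`, so the remaining triples and their identifiers are invented purely for illustration, as is the `real_estate_exposures` function.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

TRIPLES: List[Triple] = [
    ("bob-jones-03", "appliesOn", "MTG-0042"),    # from the slide
    ("bob-jones-03", "hasCredit", "EQFX-9928"),   # from the slide
    # Hypothetical triples below: identifiers are made up for this example.
    ("EQFX-9928", "includesDebt", "debt:prior-mortgage"),
    ("MTG-0042", "hasCollateral", "property:MTG-0042"),
    ("debt:prior-mortgage", "hasCollateral", "property:prior-home"),
]


def build_index(triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Index triples by subject so outgoing edges can be followed quickly."""
    index: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def real_estate_exposures(applicant: str, triples: List[Triple]) -> Set[str]:
    """Collect every hasCollateral target reachable from the applicant."""
    index = build_index(triples)
    seen, stack, exposures = {applicant}, [applicant], set()
    while stack:
        node = stack.pop()
        for predicate, obj in index.get(node, []):
            if predicate == "hasCollateral":
                exposures.add(obj)
            if obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return exposures


if __name__ == "__main__":
    print(sorted(real_estate_exposures("bob-jones-03", TRIPLES)))
```

In a triple store the same walk would normally be a query rather than application code; the sketch only shows why the relationships are worth storing alongside the documents.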
Naïve Polyglot Architecture
• What’s wrong with this picture?
[Diagram labels: Ingest Process; Extract Triples; Triple Store; Store JSON]
Customer Use Case: Documents + Triples (RDF)
[Diagram labels: BPS; Gloss; Impact; Vendors; Client; Event-Based Feeds; Faceted Search; REST APIs; Common Services; Integration Layer (Camel)]
MarkLogic Vision
• Polyglot Persistence
  – Many types of data, not many sub-systems for data
  – One simplified component
• XML, JSON, SQL views, unstructured (full-text search), Semantic data (RDF Triples, SPARQL), Binary data (large binaries, streaming)
• Enterprise NoSQL
  – All transactional. All HA. All with DR. All query-able with one API. All scalable. All in one backup. All monitored together.
• Ingest as-is
• Data Services out of the box
In Summary
Additional Resources
• Narrative form of this content: http://www.marklogic.com/blog/polyglot-persistence-done-right/
• Fowler’s early Polyglot Persistence note: http://martinfowler.com/bliki/PolyglotPersistence.html
• Structured Document data + Triple/RDF Data presentation: http://www.marklogic.com/resources/data-modeling-in-practice-documents-and-triples/
• damon.feldman@marklogic.com
• @damonfeldman
Questions?
END
Deliver the right content, to the right user, in the right format, in real time
[Diagram: MarkLogic; load and index data “as is” from ever-changing sources (PDF, RDF, RDB)]
It didn’t have to be that way
[Diagram labels: MarkLogic; Workflow; Persistence (Business Entities + Binaries!) (Highly-Available); DR; Monitoring]

Contenu connexe

Tendances

Advanced Analytics: Analytic Platforms Should Be Columnar Orientation
Advanced Analytics: Analytic Platforms Should Be Columnar OrientationAdvanced Analytics: Analytic Platforms Should Be Columnar Orientation
Advanced Analytics: Analytic Platforms Should Be Columnar OrientationDATAVERSITY
 
Data Integration, Interoperability and Virtualization
Data Integration, Interoperability and VirtualizationData Integration, Interoperability and Virtualization
Data Integration, Interoperability and VirtualizationJavier Ramírez
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?DATAVERSITY
 
Data-Ed Webinar: Data Modeling Fundamentals
Data-Ed Webinar: Data Modeling FundamentalsData-Ed Webinar: Data Modeling Fundamentals
Data-Ed Webinar: Data Modeling FundamentalsDATAVERSITY
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best PracticesDATAVERSITY
 
Building an Effective Data & Analytics Operating Model A Data Modernization G...
Building an Effective Data & Analytics Operating Model A Data Modernization G...Building an Effective Data & Analytics Operating Model A Data Modernization G...
Building an Effective Data & Analytics Operating Model A Data Modernization G...Mark Hewitt
 
ADV Slides: How to Improve Your Analytic Data Architecture Maturity
ADV Slides: How to Improve Your Analytic Data Architecture MaturityADV Slides: How to Improve Your Analytic Data Architecture Maturity
ADV Slides: How to Improve Your Analytic Data Architecture MaturityDATAVERSITY
 
Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...
Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...
Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...DATAVERSITY
 
The Analytical HR Professional: A Look at Data-Driven Talent Management
The Analytical HR Professional: A Look at Data-Driven Talent ManagementThe Analytical HR Professional: A Look at Data-Driven Talent Management
The Analytical HR Professional: A Look at Data-Driven Talent ManagementHuman Capital Media
 
Why My Wife Loves Data Governance
Why My Wife Loves Data GovernanceWhy My Wife Loves Data Governance
Why My Wife Loves Data GovernancePaul Boal
 
Data-Ed Webinar: Data Governance Strategies
Data-Ed Webinar: Data Governance StrategiesData-Ed Webinar: Data Governance Strategies
Data-Ed Webinar: Data Governance StrategiesDATAVERSITY
 
How to Create Controlled Vocabularies for Competitive Intelligence
How to Create Controlled Vocabularies for Competitive IntelligenceHow to Create Controlled Vocabularies for Competitive Intelligence
How to Create Controlled Vocabularies for Competitive IntelligenceIntelCollab.com
 
Data-Ed Webinar: The Importance of MDM
Data-Ed Webinar: The Importance of MDMData-Ed Webinar: The Importance of MDM
Data-Ed Webinar: The Importance of MDMDATAVERSITY
 
Data-Ed Online: Unlock Business Value through Reference & MDM
Data-Ed Online: Unlock Business Value through Reference & MDMData-Ed Online: Unlock Business Value through Reference & MDM
Data-Ed Online: Unlock Business Value through Reference & MDMDATAVERSITY
 
Showing ROI for Your Analytic Project
Showing ROI for Your Analytic ProjectShowing ROI for Your Analytic Project
Showing ROI for Your Analytic ProjectDATAVERSITY
 
DataEd Slides: Data Modeling is Fundamental
DataEd Slides:  Data Modeling is FundamentalDataEd Slides:  Data Modeling is Fundamental
DataEd Slides: Data Modeling is FundamentalDATAVERSITY
 
DMBOK 2.0 and other frameworks including TOGAF & COBIT - keynote from DAMA Au...
DMBOK 2.0 and other frameworks including TOGAF & COBIT - keynote from DAMA Au...DMBOK 2.0 and other frameworks including TOGAF & COBIT - keynote from DAMA Au...
DMBOK 2.0 and other frameworks including TOGAF & COBIT - keynote from DAMA Au...Christopher Bradley
 
Why Data Modeling Is Fundamental
Why Data Modeling Is FundamentalWhy Data Modeling Is Fundamental
Why Data Modeling Is FundamentalDATAVERSITY
 
Data Management Meets Human Management - Why Words Matter
Data Management Meets Human Management - Why Words MatterData Management Meets Human Management - Why Words Matter
Data Management Meets Human Management - Why Words MatterDATAVERSITY
 
Unlocking the Value of Your Data Lake
Unlocking the Value of Your Data LakeUnlocking the Value of Your Data Lake
Unlocking the Value of Your Data LakeDATAVERSITY
 

Tendances (20)

Advanced Analytics: Analytic Platforms Should Be Columnar Orientation
Advanced Analytics: Analytic Platforms Should Be Columnar OrientationAdvanced Analytics: Analytic Platforms Should Be Columnar Orientation
Advanced Analytics: Analytic Platforms Should Be Columnar Orientation
 
Data Integration, Interoperability and Virtualization
Data Integration, Interoperability and VirtualizationData Integration, Interoperability and Virtualization
Data Integration, Interoperability and Virtualization
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?
 
Data-Ed Webinar: Data Modeling Fundamentals
Data-Ed Webinar: Data Modeling FundamentalsData-Ed Webinar: Data Modeling Fundamentals
Data-Ed Webinar: Data Modeling Fundamentals
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best Practices
 
Building an Effective Data & Analytics Operating Model A Data Modernization G...
Building an Effective Data & Analytics Operating Model A Data Modernization G...Building an Effective Data & Analytics Operating Model A Data Modernization G...
Building an Effective Data & Analytics Operating Model A Data Modernization G...
 
ADV Slides: How to Improve Your Analytic Data Architecture Maturity
ADV Slides: How to Improve Your Analytic Data Architecture MaturityADV Slides: How to Improve Your Analytic Data Architecture Maturity
ADV Slides: How to Improve Your Analytic Data Architecture Maturity
 
Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...
Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...
Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...
 
The Analytical HR Professional: A Look at Data-Driven Talent Management
The Analytical HR Professional: A Look at Data-Driven Talent ManagementThe Analytical HR Professional: A Look at Data-Driven Talent Management
The Analytical HR Professional: A Look at Data-Driven Talent Management
 
Why My Wife Loves Data Governance
Why My Wife Loves Data GovernanceWhy My Wife Loves Data Governance
Why My Wife Loves Data Governance
 
Data-Ed Webinar: Data Governance Strategies
Data-Ed Webinar: Data Governance StrategiesData-Ed Webinar: Data Governance Strategies
Data-Ed Webinar: Data Governance Strategies
 
How to Create Controlled Vocabularies for Competitive Intelligence
How to Create Controlled Vocabularies for Competitive IntelligenceHow to Create Controlled Vocabularies for Competitive Intelligence
How to Create Controlled Vocabularies for Competitive Intelligence
 
Data-Ed Webinar: The Importance of MDM
Data-Ed Webinar: The Importance of MDMData-Ed Webinar: The Importance of MDM
Data-Ed Webinar: The Importance of MDM
 
Data-Ed Online: Unlock Business Value through Reference & MDM
Data-Ed Online: Unlock Business Value through Reference & MDMData-Ed Online: Unlock Business Value through Reference & MDM
Data-Ed Online: Unlock Business Value through Reference & MDM
 
Showing ROI for Your Analytic Project
Showing ROI for Your Analytic ProjectShowing ROI for Your Analytic Project
Showing ROI for Your Analytic Project
 
DataEd Slides: Data Modeling is Fundamental
DataEd Slides:  Data Modeling is FundamentalDataEd Slides:  Data Modeling is Fundamental
DataEd Slides: Data Modeling is Fundamental
 
DMBOK 2.0 and other frameworks including TOGAF & COBIT - keynote from DAMA Au...
DMBOK 2.0 and other frameworks including TOGAF & COBIT - keynote from DAMA Au...DMBOK 2.0 and other frameworks including TOGAF & COBIT - keynote from DAMA Au...
DMBOK 2.0 and other frameworks including TOGAF & COBIT - keynote from DAMA Au...
 
Why Data Modeling Is Fundamental
Why Data Modeling Is FundamentalWhy Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
 
Data Management Meets Human Management - Why Words Matter
Data Management Meets Human Management - Why Words MatterData Management Meets Human Management - Why Words Matter
Data Management Meets Human Management - Why Words Matter
 
Unlocking the Value of Your Data Lake
Unlocking the Value of Your Data LakeUnlocking the Value of Your Data Lake
Unlocking the Value of Your Data Lake
 

Similaire à A Data Integration Case Study - Avoid Creating a “Franken-Beast”

Dom introduction-website-v1.0
Dom introduction-website-v1.0Dom introduction-website-v1.0
Dom introduction-website-v1.0Cogility
 
The New Database Frontier: Harnessing the Cloud
The New Database Frontier: Harnessing the CloudThe New Database Frontier: Harnessing the Cloud
The New Database Frontier: Harnessing the CloudInside Analysis
 
Gimel at Dataworks Summit San Jose 2018
Gimel at Dataworks Summit San Jose 2018Gimel at Dataworks Summit San Jose 2018
Gimel at Dataworks Summit San Jose 2018Romit Mehta
 
Dataworks | 2018-06-20 | Gimel data platform
Dataworks | 2018-06-20 | Gimel data platformDataworks | 2018-06-20 | Gimel data platform
Dataworks | 2018-06-20 | Gimel data platformDeepak Chandramouli
 
ADV Slides: Modern Analytic Data Architecture Maturity Modeling
ADV Slides: Modern Analytic Data Architecture Maturity ModelingADV Slides: Modern Analytic Data Architecture Maturity Modeling
ADV Slides: Modern Analytic Data Architecture Maturity ModelingDATAVERSITY
 
From Shadow IT to Empowered IT
From Shadow IT to Empowered ITFrom Shadow IT to Empowered IT
From Shadow IT to Empowered ITWSO2
 
Autonomous Platform with AIML Document Intelligence Capabilities to Handle Se...
Autonomous Platform with AIML Document Intelligence Capabilities to Handle Se...Autonomous Platform with AIML Document Intelligence Capabilities to Handle Se...
Autonomous Platform with AIML Document Intelligence Capabilities to Handle Se...IRJET Journal
 
Cloud Native Applications Containers Microservices Platforms CICD Oh my
Cloud Native Applications Containers Microservices Platforms CICD Oh myCloud Native Applications Containers Microservices Platforms CICD Oh my
Cloud Native Applications Containers Microservices Platforms CICD Oh myFabio Chiodini
 
Using IBM DataPower for rapid security and application integration with an op...
Using IBM DataPower for rapid security and application integration with an op...Using IBM DataPower for rapid security and application integration with an op...
Using IBM DataPower for rapid security and application integration with an op...Gennadiy Civil
 
apidays LIVE Paris 2021 - EDI & API on One Integration Platform by Mir Mustha...
apidays LIVE Paris 2021 - EDI & API on One Integration Platform by Mir Mustha...apidays LIVE Paris 2021 - EDI & API on One Integration Platform by Mir Mustha...
apidays LIVE Paris 2021 - EDI & API on One Integration Platform by Mir Mustha...apidays
 
GoldenGate and Stream Processing with Special Guest Rakuten
GoldenGate and Stream Processing with Special Guest RakutenGoldenGate and Stream Processing with Special Guest Rakuten
GoldenGate and Stream Processing with Special Guest RakutenJeffrey T. Pollock
 
DEVNET-1127 Unifying Application Logic with Datacenter Automation
DEVNET-1127	Unifying Application Logic with Datacenter AutomationDEVNET-1127	Unifying Application Logic with Datacenter Automation
DEVNET-1127 Unifying Application Logic with Datacenter AutomationCisco DevNet
 
IBM i Development: Increase Accuracy and Efficiency with SEQUEL's ABSTRACT a...
 IBM i Development: Increase Accuracy and Efficiency with SEQUEL's ABSTRACT a... IBM i Development: Increase Accuracy and Efficiency with SEQUEL's ABSTRACT a...
IBM i Development: Increase Accuracy and Efficiency with SEQUEL's ABSTRACT a...HelpSystems
 
Modern data integration expert sessions
Modern data integration expert sessionsModern data integration expert sessions
Modern data integration expert sessionsJessicaMurrell3
 
Modern Data Integration Expert Session Webinar
Modern Data Integration Expert Session Webinar Modern Data Integration Expert Session Webinar
Modern Data Integration Expert Session Webinar ibi
 
A Blueprint for Cloud-Native Financial Institutions
A Blueprint for Cloud-Native Financial InstitutionsA Blueprint for Cloud-Native Financial Institutions
A Blueprint for Cloud-Native Financial InstitutionsAngelo Agatino Nicolosi
 
Public hyperledger meetup sf may 2018
Public hyperledger meetup sf may 2018Public hyperledger meetup sf may 2018
Public hyperledger meetup sf may 2018Oracle Developers
 
"You don't need a bigger boat": serverless MLOps for reasonable companies
"You don't need a bigger boat": serverless MLOps for reasonable companies"You don't need a bigger boat": serverless MLOps for reasonable companies
"You don't need a bigger boat": serverless MLOps for reasonable companiesData Science Milan
 

Similaire à A Data Integration Case Study - Avoid Creating a “Franken-Beast” (20)

Dom introduction-website-v1.0
Dom introduction-website-v1.0Dom introduction-website-v1.0
Dom introduction-website-v1.0
 
The New Database Frontier: Harnessing the Cloud
The New Database Frontier: Harnessing the CloudThe New Database Frontier: Harnessing the Cloud
The New Database Frontier: Harnessing the Cloud
 
Gimel at Dataworks Summit San Jose 2018
Gimel at Dataworks Summit San Jose 2018Gimel at Dataworks Summit San Jose 2018
Gimel at Dataworks Summit San Jose 2018
 
Dataworks | 2018-06-20 | Gimel data platform
Dataworks | 2018-06-20 | Gimel data platformDataworks | 2018-06-20 | Gimel data platform
Dataworks | 2018-06-20 | Gimel data platform
 
Industrial IoT bootcamp
Industrial IoT bootcampIndustrial IoT bootcamp
Industrial IoT bootcamp
 
ADV Slides: Modern Analytic Data Architecture Maturity Modeling
ADV Slides: Modern Analytic Data Architecture Maturity ModelingADV Slides: Modern Analytic Data Architecture Maturity Modeling
ADV Slides: Modern Analytic Data Architecture Maturity Modeling
 
From Shadow IT to Empowered IT
From Shadow IT to Empowered ITFrom Shadow IT to Empowered IT
From Shadow IT to Empowered IT
 
Autonomous Platform with AIML Document Intelligence Capabilities to Handle Se...
Autonomous Platform with AIML Document Intelligence Capabilities to Handle Se...Autonomous Platform with AIML Document Intelligence Capabilities to Handle Se...
Autonomous Platform with AIML Document Intelligence Capabilities to Handle Se...
 
Cloud Native Applications Containers Microservices Platforms CICD Oh my
Cloud Native Applications Containers Microservices Platforms CICD Oh myCloud Native Applications Containers Microservices Platforms CICD Oh my
Cloud Native Applications Containers Microservices Platforms CICD Oh my
 
Using IBM DataPower for rapid security and application integration with an op...
Using IBM DataPower for rapid security and application integration with an op...Using IBM DataPower for rapid security and application integration with an op...
Using IBM DataPower for rapid security and application integration with an op...
 
apidays LIVE Paris 2021 - EDI & API on One Integration Platform by Mir Mustha...
apidays LIVE Paris 2021 - EDI & API on One Integration Platform by Mir Mustha...apidays LIVE Paris 2021 - EDI & API on One Integration Platform by Mir Mustha...
apidays LIVE Paris 2021 - EDI & API on One Integration Platform by Mir Mustha...
 
GoldenGate and Stream Processing with Special Guest Rakuten
GoldenGate and Stream Processing with Special Guest RakutenGoldenGate and Stream Processing with Special Guest Rakuten
GoldenGate and Stream Processing with Special Guest Rakuten
 
DEVNET-1127 Unifying Application Logic with Datacenter Automation
DEVNET-1127	Unifying Application Logic with Datacenter AutomationDEVNET-1127	Unifying Application Logic with Datacenter Automation
DEVNET-1127 Unifying Application Logic with Datacenter Automation
 
IBM i Development: Increase Accuracy and Efficiency with SEQUEL's ABSTRACT a...
 IBM i Development: Increase Accuracy and Efficiency with SEQUEL's ABSTRACT a... IBM i Development: Increase Accuracy and Efficiency with SEQUEL's ABSTRACT a...
IBM i Development: Increase Accuracy and Efficiency with SEQUEL's ABSTRACT a...
 
Oracle 360
Oracle 360Oracle 360
Oracle 360
 
Modern data integration expert sessions
Modern data integration expert sessionsModern data integration expert sessions
Modern data integration expert sessions
 
Modern Data Integration Expert Session Webinar
Modern Data Integration Expert Session Webinar Modern Data Integration Expert Session Webinar
Modern Data Integration Expert Session Webinar
 
A Blueprint for Cloud-Native Financial Institutions
A Blueprint for Cloud-Native Financial InstitutionsA Blueprint for Cloud-Native Financial Institutions
A Blueprint for Cloud-Native Financial Institutions
 
Public hyperledger meetup sf may 2018
Public hyperledger meetup sf may 2018Public hyperledger meetup sf may 2018
Public hyperledger meetup sf may 2018
 
"You don't need a bigger boat": serverless MLOps for reasonable companies
"You don't need a bigger boat": serverless MLOps for reasonable companies"You don't need a bigger boat": serverless MLOps for reasonable companies
"You don't need a bigger boat": serverless MLOps for reasonable companies
 

Plus de DATAVERSITY

Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...DATAVERSITY
 
Data at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceData at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceDATAVERSITY
 
Exploring Levels of Data Literacy
Exploring Levels of Data LiteracyExploring Levels of Data Literacy
Exploring Levels of Data LiteracyDATAVERSITY
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsDATAVERSITY
 
Make Data Work for You
Make Data Work for YouMake Data Work for You
Make Data Work for YouDATAVERSITY
 
Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?DATAVERSITY
 
Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?DATAVERSITY
 
Data Modeling Fundamentals
Data Modeling FundamentalsData Modeling Fundamentals
Data Modeling FundamentalsDATAVERSITY
 
Showing ROI for Your Analytic Project
Showing ROI for Your Analytic ProjectShowing ROI for Your Analytic Project
Showing ROI for Your Analytic ProjectDATAVERSITY
 
How a Semantic Layer Makes Data Mesh Work at Scale
How a Semantic Layer Makes  Data Mesh Work at ScaleHow a Semantic Layer Makes  Data Mesh Work at Scale
How a Semantic Layer Makes Data Mesh Work at ScaleDATAVERSITY
 
Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?DATAVERSITY
 
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...DATAVERSITY
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?DATAVERSITY
 
Data Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and ForwardsData Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and ForwardsDATAVERSITY
 
Data Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement TodayData Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement TodayDATAVERSITY
 
2023 Trends in Enterprise Analytics
2023 Trends in Enterprise Analytics2023 Trends in Enterprise Analytics
2023 Trends in Enterprise AnalyticsDATAVERSITY
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best PracticesDATAVERSITY
 
Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?DATAVERSITY
 
Data Management Best Practices
Data Management Best PracticesData Management Best Practices
Data Management Best PracticesDATAVERSITY
 
MLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageMLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageDATAVERSITY
 

A Data Integration Case Study - Avoid Creating a “Franken-Beast”

  • 1. Data Integration and Polyglot Persistence. Integration Done Right – Avoiding the Franken-Beast
      - Damon Feldman, Ph.D., Solutions Director – MarkLogic
      - Twitter: @damonfeldman
      - © Copyright 2015 MarkLogic Corporation. All rights reserved.
  • 2. Agenda
      - Review a specific data integration project
          - The names have been changed to protect the innocent
      - Why did it become complex?
      - How does this inform integration generally?
      - MarkLogic’s vision and features to address this problem
  • 3. Our Data Integration Project
      - Simple need: allow people to apply for mortgages
          - Accept binary Excel submissions containing structured data, then review and approve
      - It became complex
      - We’ll walk through the various issues and considerations
      - Finally, we’ll talk about how to simplify these systems
  • 4. Poly-what?
      - Polyglot Persistence
      - Polyglot: “someone who speaks or writes several languages”
      - “The term polyglot is redefined for big data as [using] several core database technologies [needed] no matter how narrow your approach to big data.” (Hurwitz et al., Big Data for Dummies)
      - Rows & columns; documents; binaries; RDF triples; text
      - Note that MarkLogic handles multiple data forms within one technology, via universal indexing
  • 5. The Requirement
      - Mortgage application system
          - Input: Excel worksheet submissions
          - Business Entities are extracted
          - Workflow and approval
          - Binaries and XML documents are both persisted
          - This is a NoSQL system, because it is focused on Business Entities
      - Our customer chose to bifurcate their data
          - MarkLogic for documents (Business Entities)
          - Alfresco for binaries: input Excel, PDF notices, some metadata
  • 6. Polyglot, or “Poly-Not?”
      - “MarkLogic is the best XML/JSON document store in the world – we get that! But binaries should go into a ‘content system’… a CMS or DAM. Let’s use Alfresco to store the Excel and some generated PDF notices, and put the XML in MarkLogic. That way we use a best-of-breed system for each type of data!”
  • 7. Envisioned Architecture
      - Store the input
      - Extract structured data
      - Store the Business Entity XML
  • 8. Coordinating the Parts
      - Export super-jumbo loans from last week in 1 GB chunks
          - Include binaries and XML Business Entities
      - Data is bifurcated
          - MarkLogic knows dates and super-jumbo thresholds per zip code
          - Alfresco has the binaries
      - Now what?
          - Who controls paging to hit 1 GB per file?
          - What knows how to get a record and then make a REST call to Alfresco?
  • 9. We Went with Two Passes
      - Export everything for the week
      - Chunk it in a second pass with Python
      - Two phases, so two operations
      - Bonus questions:
          - How do you monitor the Python output for errors?
          - What if it fails?
          - What if the consumer finds data issues?
          - Is there traceability from the Business Entity query to the temp data, through the Python script, to the bundled output?
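
The second pass described on slide 9 can be pictured with a short sketch. The deck does not show the actual script, so everything below is an assumption for illustration: the `(record id, size in bytes)` record shape, the function name `chunk_by_size`, and the choice to emit an oversized record alone rather than reject it.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record shape: (record id, combined size in bytes of the
# Business Entity XML and its binaries). Not taken from the deck.
Record = Tuple[str, int]


def chunk_by_size(records: Iterable[Record], limit: int) -> Iterator[List[str]]:
    """Group record ids into bundles whose total size stays at or under limit.

    A single record larger than limit is emitted in a bundle of its own,
    so nothing is silently dropped.
    """
    bundle: List[str] = []
    used = 0
    for record_id, size in records:
        # Close the current bundle before it would overflow.
        if bundle and used + size > limit:
            yield bundle
            bundle, used = [], 0
        bundle.append(record_id)
        used += size
    if bundle:
        yield bundle


if __name__ == "__main__":
    one_gb = 1024 ** 3
    week = [("loan-1", 400 * 1024 ** 2), ("loan-2", 700 * 1024 ** 2)]
    for number, ids in enumerate(chunk_by_size(week, one_gb), start=1):
        print(number, ids)
```

Even this small loop needs the sizes of binaries held in one store and the record list held in another, which is the coordination cost the bonus questions point at.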
  • 10. The Franken-Beast
  • 12. Think DevOps
      - #DevOps => simplify, monitor, think of the impacts
  • 13. Systemic Simplicity
      - Your architecture diagram is a Chimera
          - And we want it that way
          - A couple more boxes on your architecture diagram may mean a couple dozen boxes in your deployment diagram
      - Humans create simplified views like architecture diagrams exactly because we are not well suited to deal with this level of complexity
  • 14. Building and Development is One Aspect
      - Multiple stores required coordination and extra processing
      - Architecture and development time were affected
      - Other aspects of the program were also slowed down
  • 15. Operational Components: Beneath the Hood
      - Alfresco ships as a unit
      - … but deploys as a set of technologies
      - … and needs reliable storage
  • 16. HA/DR
      - HA means copies of all persisted data
      - Many stores, many copies
      - Many copies, many configs
  • 17. HA/DR
      - DR means copies of entire systems
      - All with replication
  • 18. Is HA/DR Possible?
      - Consistent data requires transactional control
      - Having two (or more!) persistent components makes this difficult or impossible
      - Synchronizing data, restoring data, recovering to a point in time? All require a notion of transactional consistency.
      - This was a huge time- and brain-drain
      - With MarkLogic it is transactional, fast, correct, and fully tested under load
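
To make the consistency point on slide 18 concrete, here is a minimal sketch of writing one submission to two independent stores with no shared transaction. It does not use the MarkLogic or Alfresco APIs: `InMemoryStore`, `StoreError`, and `save_application` are hypothetical stand-ins, and the failure is simulated.

```python
class StoreError(Exception):
    """Raised by the stand-in store when a write is made to fail."""


class InMemoryStore:
    """Hypothetical stand-in for one persistence component."""

    def __init__(self, name, fail_on=None):
        self.name = name
        self.data = {}
        self.fail_on = set(fail_on or ())  # keys whose writes should fail

    def put(self, key, value):
        if key in self.fail_on:
            raise StoreError(f"{self.name}: write failed for {key}")
        self.data[key] = value

    def delete(self, key):
        self.data.pop(key, None)


def save_application(doc_store, binary_store, key, entity_xml, excel_bytes):
    """Write the entity and its binary to two stores without a shared transaction.

    If the second write fails, a compensating delete undoes the first one.
    In a real deployment the compensating call could itself fail, and that
    gap is what a single transactional store avoids.
    """
    doc_store.put(key, entity_xml)
    try:
        binary_store.put(key, excel_bytes)
    except StoreError:
        doc_store.delete(key)  # hand-written rollback
        raise


if __name__ == "__main__":
    docs = InMemoryStore("documents")
    binaries = InMemoryStore("binaries", fail_on={"MTG-0042"})
    try:
        save_application(docs, binaries, "MTG-0042", "<loan/>", b"excel-bytes")
    except StoreError as err:
        print("rolled back:", err)
```

Backup, restore, and point-in-time recovery each need the same kind of hand-written coordination once the data lives in more than one place.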
  • 19. Monitoring
  • 20. CM, CI/CD
      - Production is reflected in Dev, QA, Stage, etc.
      - Entire process should be automated, repeatable and constant
      - [Diagram labels: Clustered FS Setup; MarkLogic Code; Alfresco Config; Oracle DDL; Batch Process Code; Python Script; Master Config; Create Directories; Set up initial data or config; Production; QA; DEV]
  • 21. How does this story end?
      - We are now working to remove much of the complexity
      - Design is for binaries inside MarkLogic
      - To reduce outages and operational complexity
      - And improve performance
  • 22. MarkLogic Approach
  • 24. What about other data types?
      - This use case is not specific to binaries and XML documents
      - Load and index data “as is” from varied sources
      - Deliver data in unified form
      - [Diagram labels: Binary; RDF; RDB]
  • 25. Same complexity applies to other data types
      - Structured data + Semantic data
      - Structured data + text data
      - Semantic + text
      - [ . . . ]
      - Structured + Semantic + Binary, with mixed text
      - What would our Mortgage example look like with RDF Triples?
  • 26. What is Semantic Triple Data?
      - AKA RDF. AKA Linked Open Data.
      - dbr:Kevin_Bacon foaf:knows dbr:Harvey_Keitel
      - dbr:Kevin_Bacon dbo:spouse dbr:Kyra_Sedgwick
      - dbo:spouse rdfs:subPropertyOf dbo:knows
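
The three triples on slide 26 are enough to show what `rdfs:subPropertyOf` buys you. The sketch below is plain Python over tuples, not a triple store and not MarkLogic's inference engine; the function name `infer_subproperties` is an assumption. It applies the standard RDFS rule that if `p` is a sub-property of `q` and `s p o` holds, then `s q o` holds as well.

```python
SUB_PROPERTY_OF = "rdfs:subPropertyOf"

# The triples exactly as they appear on the slide.
TRIPLES = {
    ("dbr:Kevin_Bacon", "foaf:knows", "dbr:Harvey_Keitel"),
    ("dbr:Kevin_Bacon", "dbo:spouse", "dbr:Kyra_Sedgwick"),
    ("dbo:spouse", SUB_PROPERTY_OF, "dbo:knows"),
}


def infer_subproperties(triples):
    """Return the input triples plus those implied by rdfs:subPropertyOf.

    Repeats until no new triple appears, so chains of sub-properties
    are followed as well.
    """
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        sub_props = [(s, o) for s, p, o in inferred if p == SUB_PROPERTY_OF]
        for sub, sup in sub_props:
            for s, p, o in list(inferred):
                if p == sub and (s, sup, o) not in inferred:
                    inferred.add((s, sup, o))
                    changed = True
    return inferred


if __name__ == "__main__":
    for triple in sorted(infer_subproperties(TRIPLES) - TRIPLES):
        print(triple)
```

The only new fact is that Kevin Bacon `dbo:knows` Kyra Sedgwick, derived from the spouse triple. Nothing is derived from the `foaf:knows` triple, because the slide does not relate `foaf:knows` to `dbo:knows`.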
  • 27. Back to Polyglot Persistence
      - Documents: natural Business Entities stored as documents
      - Triples: relationships among Business Entities as RDF Triples
      - Documents shown: Applicant bob-jones-03; Application MTG-0042; CreditHistory EQFX-9928; Property MTG-0042; Loan MTG-0042
      - Triples shown: bob-jones-03 :appliesOn MTG-0042; bob-jones-03 hasCredit EQFX-9928; … includesDebt …; … hasCollateral …
      - + Inference: What real-estate exposures does this applicant have?
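
A small sketch of how the documents and triples on slide 27 fit together: the triples say which entities are related, and the entity ids lead back to the documents. It uses only the two complete triples on the slide (the elided `includesDebt` and `hasCollateral` triples are left out), drops the `:` prefix from `appliesOn`, and treats the `match` and `related_entities` helpers as hypothetical names, not a real query API.

```python
# Entity kinds keyed by id, as labelled on the slide. MTG-0042 appears
# there under three labels, so each id maps to a list of kinds.
ENTITY_KINDS = {
    "bob-jones-03": ["Applicant"],
    "MTG-0042": ["Application", "Property", "Loan"],
    "EQFX-9928": ["CreditHistory"],
}

# Only the two triples that the slide spells out in full.
TRIPLES = [
    ("bob-jones-03", "appliesOn", "MTG-0042"),
    ("bob-jones-03", "hasCredit", "EQFX-9928"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def related_entities(triples, subject):
    """List (predicate, object id, object kinds) for everything linked to subject."""
    return [(p, o, ENTITY_KINDS.get(o, [])) for _, p, o in match(triples, s=subject)]


if __name__ == "__main__":
    for row in related_entities(TRIPLES, "bob-jones-03"):
        print(row)
```

When documents and triples live in separate systems, each lookup in `related_entities` becomes a cross-system call, which is the problem the next slide raises.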
  • 28. Naïve Polyglot Architecture
      - What’s wrong with this picture?
      - [Diagram labels: Triple Store; Extract Triples; Ingest Process; Store JSON]
  • 29. Customer Use Case: Documents + Triples (RDF)
      - [Diagram labels: BPS Gloss Impact Vendors Client; Event-Based Feeds; Faceted Search; REST APIs; Common Services; Integration Layer (Camel)]
  • 30. MarkLogic Vision
      - Polyglot Persistence
          - Many types of data, not many sub-systems for data
          - One simplified component
      - XML, JSON, SQL views, unstructured (full-text search), Semantic data (RDF Triples, SPARQL), Binary data (large binaries, streaming)
      - Enterprise NoSQL
          - All transactional. All HA. All with DR. All query-able with one API. All scalable. All in one backup. All monitored together.
      - Ingest as-is
      - Data Services out of the box
  • 31. In Summary
  • 32. Additional Resources
      - Narrative form of this content: http://www.marklogic.com/blog/polyglot-persistence-done-right/
      - Fowler’s early Polyglot Persistence note: http://martinfowler.com/bliki/PolyglotPersistence.html
      - Structured Document data + Triple/RDF Data presentation: http://www.marklogic.com/resources/data-modeling-in-practice-documents-and-triples/
      - damon.feldman@marklogic.com
      - @damonfeldman
      - Questions?
  • 33. END
  • 34. Deliver the right content, to the right user, in the right format, in real time
      - Load and index data “as is” from ever-changing sources
      - [Diagram labels: MarkLogic; PDF; RDF; RDB]
  • 35. It didn’t have to be that way
      - [Diagram labels: Workflow; Persistence (Business Entities + Binaries!); (Highly-Available); DR; Monitoring; MarkLogic]