SlideShare une entreprise Scribd logo
1  sur  35
Massimo Brignoli
Principal Solution Architect
massimo@mongodb.com
@massimobrignoli
L’architettura di Classe Enterprise di Nuova
Generazione
Agenda
• Nascita dei Data Lake
• Overview di MongoDB
• Proposta di un’architettura
EDM
• Case Study & Scenarios
• Data Lake Lessons Learned
Quanti dati?
• Una cosa non manca alla aziende: dati
– Flussi dei sensori
– Sentiment sui social
– Log dei server
– App mobile
• Analisti stimano una crescita del volume di dati del 40%
annuo, 90% dei quali non strutturati.
• Le tecnologie tradizionali (alcune disegnate 40 anni fa) non
sono sufficienti
La Promessa del “Big Data”
• Scoprire informazioni collezionando ed analizzando i dati
porta la promessa di
– Un vantaggio competitivo
– Risparmio economico
• Un esempio diffuso dell’utilizzo della tecnologia Big Data è la
“Single View”: aggregare tutto quello che si conosce di un
cliente per migliorarne l’ingaggio e i ricavi
• Il tradizionale EDW scricchiola sotto il carico, sopraffatto dal
volume e varietà dei dati (e dall’alto costo).
La Nascita dei Data Lake
• Molte aziende hanno iniziato a guardare verso un’architettura
detta Data Lake:
– Piattaforma per gestire i dati in modo flessibile
– Per aggregare e correlare i dati cross-silos in un unico posto
– Permette l’esplorazione di tutti i dati
• La piattaforma più in voga in questo momento è Hadoop:
– Permette la scalabilità orizzontale su hardware commodity
– Permette una schema di dati variegati ottimizzato in lettura
– Include strati di lavorazione dei dati in SQL e linguaggi comuni
– Grandi referenze (Yahoo e Google in primis)
Perché Hadoop?
• Hadoop Distributed FileSystem è disegnato per scalare su
grandi operazioni batch
• Fornisce un modello write-one read-many append-only
• Ottimizzato per lunghe scansione di TB o PB di dati
• Questa capacità di gestire dati multi-strutturati è usata:
– Segmentazione dei clienti per campagne di marketing e
recommendation
– Analisi predittiva
– Modelli di Rischio
Ma va bene per tutto?
• I Data Lake sono disegnati per fornire l’output di Hadoop alle
applicazioni online. Queste applicazioni hanno dei requisiti
tra cui:
– Latenza di risposta in ms
– Accesso random su un sottoinsieme di dati indicizzato
– Supporto di query espressive ed aggregazioni di dati
– Update di dati che cambiano valori frequentemente in real-time
Hadoop è la risposta a tutto?
• Nel nostro mondo guidato ormai dai dati, i millisecondi sono
importanti.
– Ricercatori IBM affermano che il 60% dei dati perde valore alcuni
millisecondi dopo la generazione
– Ad esempio identificare una transazione di borsa fraudolenta può essere
inutile dopo alcuni minuti
• Gartner predice che il 70% delle installazioni di Hadoop fallirà
per non aver raggiunto gli obiettivi di costo e di incremento
del fatturato.
Enterprise Data Management Pipeline
…
Siloed source databases
External feeds
(batch)
Streams
Stream icon from: https://en.wikipedia.org/wiki/File:Activity_Streams_icon.png
Transform
Store raw
data
AnalyzeAggregate
Pub-sub,ETL,fileimports
Stream Processing
Users
Other
Systems
Veloce Overview di MongoDB
Document Model
{
first_name: ‘Paul’,
surname: ‘Miller’,
city: ‘London’,
location: {
type : ‘Point’,
coordinates : [45.123,47.232]
},
cars: [
{ model: ‘Bentley’,
year: 1973,
value: 100000, … },
{ model: ‘Rolls Royce’,
year: 1965,
value: 330000, … }
]
}
MongoDB
RDBMS
Documents are Rich Data Structures
{
first_name: ‘Paul’,
surname: ‘Miller’,
cell: 447557505611,
city: ‘London’,
location: {
type : ‘Point’,
coordinates : [45.123,47.232]
},
Profession: [‘banking’, ‘finance’, ‘trader’],
cars: [
{ model: ‘Bentley’,
year: 1973,
value: 100000, … },
{ model: ‘Rolls Royce’,
year: 1965,
value: 330000, … }
]
}
Fields can contain an array
of sub-documents
Typed field values
Fields can contain
arrays
Fields
Documents are Flexible
{
product_name: ‘Acme Paint’,
color: [‘Red’, ‘Green’],
size_oz: [8, 32],
finish: [‘satin’, ‘eggshell’]
}
{
product_name: ‘T-shirt’,
size: [‘S’, ‘M’, ‘L’, ‘XL’],
color: [‘Heather Gray’ … ],
material: ‘100% cotton’,
wash: ‘cold’,
dry: ‘tumble dry low’
}
{
product_name: ‘Mountain Bike’,
brake_style: ‘mechanical disc’,
color: ‘grey’,
frame_material: ‘aluminum’,
no_speeds: 21,
package_height: ‘7.5x32.9x55’,
weight_lbs: 44.05,
suspension_type: ‘dual’,
wheel_size_in: 26
}
Documents in the same product catalog collection in MongoDB
Drivers & Frameworks
Morphia
MEAN Stack
Development – The Past
{ CODE } DB SCHEMAXML CONFIG
APPLICATION RELATIONAL DATABASE
OBJECT RELATIONAL
MAPPING
Development – With MongoDB
{ CODE } DB SCHEMAXML CONFIG
APPLICATION RELATIONAL DATABASE
OBJECT RELATIONAL
MAPPING
MongoDB is Full-Featured
Rich Queries
Find Paul’s cars
Find everybody in London with a car between 1970 and 1980
Geospatial Find all of the car owners within 5km of Trafalgar Sq.
Text Search Find all the cars described as having leather seats
Aggregation Calculate the average value of Paul’s car collection
Map Reduce
What is the ownership pattern of colors by geography over time
(is purple trending in China?)
Dynamic Lookup
• Combine data from multiple collections with left outer joins
for richer analytics & more flexibility in data modeling
Model of the Aggregation Framework
Analytics & BI Integration
Replica Sets
• Replica set – 2 to 50 copies
• Replica sets make up a self-healing ‘shard’
• Data center awareness
• Replica sets address:
• High availability
• Data durability, consistency
• Maintenance (e.g., HW swaps)
• Disaster Recovery
Application
Driver
Primary
Secondary
Secondary
Replication
Query Routing
• Multiple query optimization models
• Each of the sharding options are appropriate for different
apps / use cases
3.2 Features Relevant for EDM
• WiredTiger operational Engine
• In-memory storage engine
• Encryption
• Compass (data viewer & query builder)
• Connector for BI (Visualization)
• Connector for Hadoop
• Connector for Spark
• $lookUp (left outer join)
Enterprise Data Management Pipeline
…
Siloed source databases
External feeds
(batch)
Streams
Stream icon from: https://en.wikipedia.org/wiki/File:Activity_Streams_icon.png
Transform
Store raw
data
AnalyzeAggregate
Pub-sub,ETL,fileimports
Stream Processing
Users
Other
Systems
Come scegliere il layer di Data Management per
ognuno degli stage?
Processing
Layer
?
When you want:
1. Secondary indexes
2. Sub-second latency
3. Aggregations in DB
4. Updates of data
For:
1. Scanning files
2. When indexes
not needed
Wide column store
(e.g. HBase)
For:
1. Primary key queries
2. If multiple indexes &
slices not needed
3. Optimized for
writing, not reading
Data Store for Raw Dataset
…
Siloed source databases
External feeds
(batch)
Streams
Stream icon from: https://en.wikipedia.org/wiki/File:Activity_Streams_icon.png
Transform
Store raw
data
AnalyzeAggregate
Pub-sub,ETL,fileimports
Stream Processing
Users
Other
Systems
Store
raw data
Transform
- Typically just writing
record-by-record from
source data
- Usually just need high
write volumes
- All 3 options handle that
Transform read requirements
- Benefits to reading multiple datasets
sorted [by index], e.g. to do a merge
- Might want to look up across tables
with indexes (and join functionality in
MDB v3.2)
- Want high read performance while
writes are happening
Interactive querying on
the raw data could use
indexes with MongoDB
Data Store for Transformed Dataset
…
Siloed source databases
External feeds
(batch)
Streams
Stream icon from: https://en.wikipedia.org/wiki/File:Activity_Streams_icon.png
Transform
Store raw
data
AnalyzeAggregate
Pub-sub,ETL,fileimports
Stream Processing
Users
Other
Systems
AggregateTransform
Often benefits to
updating data as
merging multiple
datasets
Dashboards &
reports can have
sub-second latency
with indexes
Aggregate read requirements
- Benefits to using indexes for grouping
- Aggregations natively in the DB would help
- With indexes, can do aggregations on slices of
data
- Might want to look up across tables with
indexes to aggregate
Data Store for Aggregated Dataset
…
Siloed source databases
External feeds
(batch)
Streams
Stream icon from: https://en.wikipedia.org/wiki/File:Activity_Streams_icon.png
Transform
Store raw
data
AnalyzeAggregate
Pub-sub,ETL,fileimports
Stream Processing
Users
Other
Systems
AnalyzeAggregate
Dashboards &
reports can have
sub-second
latency with
indexes
Analytics read requirements
- For scanning all of data, could
be in any data store
- Often want to analyze a slice
of data (using indexes)
- Querying on slices is best in
MongoDB
Data Store for Last Dataset
…
Siloed source databases
External feeds
(batch)
Streams
Stream icon from: https://en.wikipedia.org/wiki/File:Activity_Streams_icon.png
Transform
Store raw
data
AnalyzeAggregate
Pub-sub,ETL,fileimports
Stream Processing
Users
Other
Systems
Analyze
Users
Dashboards &
reports can have
sub-second
latency with
indexes
- At the last step, there are
many consuming systems and
users
- Need expressive querying with
secondary indexes
- MongoDB is best option for the
publication or distribution of
analytical results and
operationalization of data
Other
Systems
Often digital
applications
- High scale
- Expressive querying
- JSON preferred
Often
RESTful
services,
APIs
Architettura EDM Completa
…
Siloed source
databases
External feeds
(batch)
Streams
Data processing pipeline
Pub-sub,ETL,fileimports
Stream Processing
Downstrea
m Systems
… …
Single CSR
Application
Unified
Digital Apps
Operational
Reporting
…
… …
Analytic
Reporting
Drivers & Stacks
Customer
Clusterin
g
Churn
Analysis
Predictiv
e
Analytics
…
Distributed
Processing
Governance to
choose where to
load and process
data
Optimal
location for
providing
operational
response times
& slices
Can run
processing on
all data or
slices
Data Lake
Example scenarios
1.Single Customer View
a. Operational
b. Analytics on customer segments
c. Analytics on all customers
2.Customer profiles & clustering
3.Presenting churn analytics on high value customers
Single View of Customer
Spanish bank replaces Teradata and Microstrategy to
increase business and avoid significant cost
Problem Why MongoDB Results
Problem Solution Results
Took days to implement new
functionality and business policies,
inhibiting revenue growth
Branches needed an app providing
single view of the customer and real
time recommendations for new
products and services
Multi-minute latency for accessing
customer data stored in Teradata and
Microstrategy
Built single view of customer on
MongoDB – flexible and scalable app
easy to adapt to new business needs
Super fast, ad hoc query capabilities
(milliseconds), and real-time analytics
thanks to MongoDB’s Aggregation
Framework
Can now leverage distributed
infrastructure and commodity
hardware for lower total cost of
ownership and greater availability
Cost avoidance of 10M$+
Application developed and deployed
in less than 6 months. New business
policies easily deployed and executed,
bringing new revenue to the company
Current capacity allows branches to
load instantly all customer info in
milliseconds, providing a great
customer experience
Large Spanish
Bank
Case Study
Insurance leader generates coveted single view of
customers in 90 days – “The Wall”
Problem Why MongoDB ResultsProblem Solution Results
No single view of customer, leading
to poor customer experience and
churn
145 years of policy data, 70+
systems, 15+ apps that are not
integrated
Spent 2 years, $25M trying build
single view with Oracle – failed
Built “The Wall” pulling in
disparate data and serving single
view to customer service reps in
real time
Flexible data model to aggregate
disparate data into single data
store
Churn analysis done with Hadoop
with relevant results output to
MongoDB
Prototyped in 2 weeks
Deployed to production in 90
days
Decreased churn and improved
ability to upsell/cross-sell
Top 15
Global Bank
Kicking Out Oracle
Global bank with 48M customers in 50 countries terminates
Oracle ULA & makes MongoDB database of choice
Problem Why MongoDB Results
Problem Solution Results
Slow development cycles due to RDBMS’
rigid data model hindering ability to meet
business demands
High TCO for hardware, licenses,
development, and support
(>$50M Oracle ULA)
Poor overall performance of customer-
facing and internal applications
Building dozens of apps on MongoDB,
both net new and migrations from Oracle
– e.g., significant portion of retail
banking, including customer-facing and
backoffice apps, fraud detection, card
activation, equity research content mgt.)
Flexible data model to develop apps
quickly and accommodate diverse data
Ability to scale infrastructure and costs
elastically
Able to cancel Oracle ULA. Evaluating
what apps can be migrated to MongoDB.
For new apps, MongoDB is default choice
Apps built in weeks instead of months or
years, e.g., ebanking app prototyped in 2
weeks and in production in 4 weeks
70% TCO reduction
L'architettura di classe enterprise di nuova generazione - Massimo Brignoli

Contenu connexe

Tendances

Webinar: How MongoDB is making Government Better, Faster, Smarter
Webinar: How MongoDB is making Government Better, Faster, SmarterWebinar: How MongoDB is making Government Better, Faster, Smarter
Webinar: How MongoDB is making Government Better, Faster, Smarter
MongoDB
 
Linked Data Marketplaces
Linked Data MarketplacesLinked Data Marketplaces
Linked Data Marketplaces
Marin Dimitrov
 
Event-Based Subscription with MongoDB
Event-Based Subscription with MongoDBEvent-Based Subscription with MongoDB
Event-Based Subscription with MongoDB
MongoDB
 
BigData-Architecture
BigData-ArchitectureBigData-Architecture
BigData-Architecture
Narayana B
 

Tendances (20)

Graphs in Action
Graphs in ActionGraphs in Action
Graphs in Action
 
Neo4j GraphTalks - Einführung in Graphdatenbanken
Neo4j GraphTalks - Einführung in GraphdatenbankenNeo4j GraphTalks - Einführung in Graphdatenbanken
Neo4j GraphTalks - Einführung in Graphdatenbanken
 
Enabling the Bank of the Future by Ignacio Bernal
Enabling the Bank of the Future by Ignacio BernalEnabling the Bank of the Future by Ignacio Bernal
Enabling the Bank of the Future by Ignacio Bernal
 
Webinar: How MongoDB is making Government Better, Faster, Smarter
Webinar: How MongoDB is making Government Better, Faster, SmarterWebinar: How MongoDB is making Government Better, Faster, Smarter
Webinar: How MongoDB is making Government Better, Faster, Smarter
 
How Financial Services Organizations Use MongoDB
How Financial Services Organizations Use MongoDBHow Financial Services Organizations Use MongoDB
How Financial Services Organizations Use MongoDB
 
Using a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data FabricUsing a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data Fabric
 
HOW TO APPLY BIG DATA ANALYTICS AND MACHINE LEARNING TO REAL TIME PROCESSING ...
HOW TO APPLY BIG DATA ANALYTICS AND MACHINE LEARNING TO REAL TIME PROCESSING ...HOW TO APPLY BIG DATA ANALYTICS AND MACHINE LEARNING TO REAL TIME PROCESSING ...
HOW TO APPLY BIG DATA ANALYTICS AND MACHINE LEARNING TO REAL TIME PROCESSING ...
 
Graph db
Graph dbGraph db
Graph db
 
Connecting Silos in Real Time with Data Virtualization
Connecting Silos in Real Time with Data VirtualizationConnecting Silos in Real Time with Data Virtualization
Connecting Silos in Real Time with Data Virtualization
 
Data Virtualization at Logitech = #Winning
Data Virtualization at Logitech = #WinningData Virtualization at Logitech = #Winning
Data Virtualization at Logitech = #Winning
 
Beyond the Basics 3: Introduction to the MongoDB BI Connector
Beyond the Basics 3: Introduction to the MongoDB BI ConnectorBeyond the Basics 3: Introduction to the MongoDB BI Connector
Beyond the Basics 3: Introduction to the MongoDB BI Connector
 
Webinar: Real-time Risk Management and Regulatory Reporting with MongoDB
Webinar: Real-time Risk Management and Regulatory Reporting with MongoDBWebinar: Real-time Risk Management and Regulatory Reporting with MongoDB
Webinar: Real-time Risk Management and Regulatory Reporting with MongoDB
 
Geschäftliches Potential für System-Integratoren und Berater - Graphdatenban...
Geschäftliches Potential für System-Integratoren und Berater -  Graphdatenban...Geschäftliches Potential für System-Integratoren und Berater -  Graphdatenban...
Geschäftliches Potential für System-Integratoren und Berater - Graphdatenban...
 
Linked Data Marketplaces
Linked Data MarketplacesLinked Data Marketplaces
Linked Data Marketplaces
 
Solution architecture
Solution architectureSolution architecture
Solution architecture
 
Neo4j GraphTalks Munich - Einführung
Neo4j GraphTalks Munich - EinführungNeo4j GraphTalks Munich - Einführung
Neo4j GraphTalks Munich - Einführung
 
Event-Based Subscription with MongoDB
Event-Based Subscription with MongoDBEvent-Based Subscription with MongoDB
Event-Based Subscription with MongoDB
 
BigData-Architecture
BigData-ArchitectureBigData-Architecture
BigData-Architecture
 
AIRBNB DATA WAREHOUSE & GRAPH DATABASE
AIRBNB DATA WAREHOUSE & GRAPH DATABASEAIRBNB DATA WAREHOUSE & GRAPH DATABASE
AIRBNB DATA WAREHOUSE & GRAPH DATABASE
 
4 Steps to Make Customer Data Actionable
4 Steps to Make Customer Data Actionable 4 Steps to Make Customer Data Actionable
4 Steps to Make Customer Data Actionable
 

Similaire à L'architettura di classe enterprise di nuova generazione - Massimo Brignoli

L’architettura di classe enterprise di nuova generazione
L’architettura di classe enterprise di nuova generazioneL’architettura di classe enterprise di nuova generazione
L’architettura di classe enterprise di nuova generazione
MongoDB
 
MongoDB Breakfast Milan - Mainframe Offloading Strategies
MongoDB Breakfast Milan -  Mainframe Offloading StrategiesMongoDB Breakfast Milan -  Mainframe Offloading Strategies
MongoDB Breakfast Milan - Mainframe Offloading Strategies
MongoDB
 

Similaire à L'architettura di classe enterprise di nuova generazione - Massimo Brignoli (20)

L’architettura di Classe Enterprise di Nuova Generazione
L’architettura di Classe Enterprise di Nuova GenerazioneL’architettura di Classe Enterprise di Nuova Generazione
L’architettura di Classe Enterprise di Nuova Generazione
 
L’architettura di classe enterprise di nuova generazione
L’architettura di classe enterprise di nuova generazioneL’architettura di classe enterprise di nuova generazione
L’architettura di classe enterprise di nuova generazione
 
MongoDB Europe 2016 - The Rise of the Data Lake
MongoDB Europe 2016 - The Rise of the Data LakeMongoDB Europe 2016 - The Rise of the Data Lake
MongoDB Europe 2016 - The Rise of the Data Lake
 
MongoDB Breakfast Milan - Mainframe Offloading Strategies
MongoDB Breakfast Milan -  Mainframe Offloading StrategiesMongoDB Breakfast Milan -  Mainframe Offloading Strategies
MongoDB Breakfast Milan - Mainframe Offloading Strategies
 
Webinar: Enterprise Trends for Database-as-a-Service
Webinar: Enterprise Trends for Database-as-a-ServiceWebinar: Enterprise Trends for Database-as-a-Service
Webinar: Enterprise Trends for Database-as-a-Service
 
Unlocking Operational Intelligence from the Data Lake
Unlocking Operational Intelligence from the Data LakeUnlocking Operational Intelligence from the Data Lake
Unlocking Operational Intelligence from the Data Lake
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
 
BigData Analysis
BigData AnalysisBigData Analysis
BigData Analysis
 
Engineering patterns for implementing data science models on big data platforms
Engineering patterns for implementing data science models on big data platformsEngineering patterns for implementing data science models on big data platforms
Engineering patterns for implementing data science models on big data platforms
 
Creating a Modern Data Architecture for Digital Transformation
Creating a Modern Data Architecture for Digital TransformationCreating a Modern Data Architecture for Digital Transformation
Creating a Modern Data Architecture for Digital Transformation
 
Ops Jumpstart: MongoDB Administration 101
Ops Jumpstart: MongoDB Administration 101Ops Jumpstart: MongoDB Administration 101
Ops Jumpstart: MongoDB Administration 101
 
Apache CarbonData+Spark to realize data convergence and Unified high performa...
Apache CarbonData+Spark to realize data convergence and Unified high performa...Apache CarbonData+Spark to realize data convergence and Unified high performa...
Apache CarbonData+Spark to realize data convergence and Unified high performa...
 
Unlocking Operational Intelligence from the Data Lake
Unlocking Operational Intelligence from the Data LakeUnlocking Operational Intelligence from the Data Lake
Unlocking Operational Intelligence from the Data Lake
 
MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...
MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...
MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...
 
Lecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in detailsLecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in details
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
Webinar: Enterprise Trends for Database-as-a-Service
Webinar: Enterprise Trends for Database-as-a-ServiceWebinar: Enterprise Trends for Database-as-a-Service
Webinar: Enterprise Trends for Database-as-a-Service
 
MongoDB and In-Memory Computing
MongoDB and In-Memory ComputingMongoDB and In-Memory Computing
MongoDB and In-Memory Computing
 
Hadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewHadoop Master Class : A concise overview
Hadoop Master Class : A concise overview
 
Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Big Data part 2
Skillwise Big Data part 2
 

Plus de Data Driven Innovation

Plus de Data Driven Innovation (20)

Integrazione della mobilità elettrica nei sistemi urbani (Stefano Carrese, Un...
Integrazione della mobilità elettrica nei sistemi urbani (Stefano Carrese, Un...Integrazione della mobilità elettrica nei sistemi urbani (Stefano Carrese, Un...
Integrazione della mobilità elettrica nei sistemi urbani (Stefano Carrese, Un...
 
La statistica ufficiale e i trasporti marittimi nell'era dei big data (Vincen...
La statistica ufficiale e i trasporti marittimi nell'era dei big data (Vincen...La statistica ufficiale e i trasporti marittimi nell'era dei big data (Vincen...
La statistica ufficiale e i trasporti marittimi nell'era dei big data (Vincen...
 
How can we realize the Mobility as a Service (Maas) (Andrea Paletti, London S...
How can we realize the Mobility as a Service (Maas) (Andrea Paletti, London S...How can we realize the Mobility as a Service (Maas) (Andrea Paletti, London S...
How can we realize the Mobility as a Service (Maas) (Andrea Paletti, London S...
 
Il DTC-Lazio e i dati del patrimonio culturale (Maria Prezioso, Università To...
Il DTC-Lazio e i dati del patrimonio culturale (Maria Prezioso, Università To...Il DTC-Lazio e i dati del patrimonio culturale (Maria Prezioso, Università To...
Il DTC-Lazio e i dati del patrimonio culturale (Maria Prezioso, Università To...
 
CHNet-DHLab: Servizi Cloud a supporto dei beni culturali (Fabio Proietti, INF...
CHNet-DHLab: Servizi Cloud a supporto dei beni culturali (Fabio Proietti, INF...CHNet-DHLab: Servizi Cloud a supporto dei beni culturali (Fabio Proietti, INF...
CHNet-DHLab: Servizi Cloud a supporto dei beni culturali (Fabio Proietti, INF...
 
Progetto EOSC-Pillar (Fulvio Galeazzi, GARR)
Progetto EOSC-Pillar (Fulvio Galeazzi, GARR)Progetto EOSC-Pillar (Fulvio Galeazzi, GARR)
Progetto EOSC-Pillar (Fulvio Galeazzi, GARR)
 
Una infrastruttura per l’accesso al patrimonio culturale: il Progetto del Por...
Una infrastruttura per l’accesso al patrimonio culturale: il Progetto del Por...Una infrastruttura per l’accesso al patrimonio culturale: il Progetto del Por...
Una infrastruttura per l’accesso al patrimonio culturale: il Progetto del Por...
 
Utilizzo dei Big data per l’analisi dei flussi veicolari e della mobilità (Ma...
Utilizzo dei Big data per l’analisi dei flussi veicolari e della mobilità (Ma...Utilizzo dei Big data per l’analisi dei flussi veicolari e della mobilità (Ma...
Utilizzo dei Big data per l’analisi dei flussi veicolari e della mobilità (Ma...
 
I dati personali nell'analisi comportamentale della mobilità di dipendenti e ...
I dati personali nell'analisi comportamentale della mobilità di dipendenti e ...I dati personali nell'analisi comportamentale della mobilità di dipendenti e ...
I dati personali nell'analisi comportamentale della mobilità di dipendenti e ...
 
Estrarre valore dai dati: tecnologie per ottimizzare la mobilità del futuro (...
Estrarre valore dai dati: tecnologie per ottimizzare la mobilità del futuro (...Estrarre valore dai dati: tecnologie per ottimizzare la mobilità del futuro (...
Estrarre valore dai dati: tecnologie per ottimizzare la mobilità del futuro (...
 
Le piattaforme dati per la mobilità nelle città italiane (Marco Mena, EY)
Le piattaforme dati per la mobilità nelle città italiane (Marco Mena, EY)Le piattaforme dati per la mobilità nelle città italiane (Marco Mena, EY)
Le piattaforme dati per la mobilità nelle città italiane (Marco Mena, EY)
 
WiseTown, un ecosistema di applicazioni e strumenti per migliorare la qualità...
WiseTown, un ecosistema di applicazioni e strumenti per migliorare la qualità...WiseTown, un ecosistema di applicazioni e strumenti per migliorare la qualità...
WiseTown, un ecosistema di applicazioni e strumenti per migliorare la qualità...
 
CityOpenSource as a civic tech tool (Ilaria Vitellio, CityOpenSource)
CityOpenSource as a civic tech tool (Ilaria Vitellio, CityOpenSource)CityOpenSource as a civic tech tool (Ilaria Vitellio, CityOpenSource)
CityOpenSource as a civic tech tool (Ilaria Vitellio, CityOpenSource)
 
Big Data Confederation: toward the local urban data market place (Renzo Taffa...
Big Data Confederation: toward the local urban data market place (Renzo Taffa...Big Data Confederation: toward the local urban data market place (Renzo Taffa...
Big Data Confederation: toward the local urban data market place (Renzo Taffa...
 
Making citizens the eyes of policy makers: a sweet spot for hybrid AI? (Danie...
Making citizens the eyes of policy makers: a sweet spot for hybrid AI? (Danie...Making citizens the eyes of policy makers: a sweet spot for hybrid AI? (Danie...
Making citizens the eyes of policy makers: a sweet spot for hybrid AI? (Danie...
 
Dall'Agenda Digitale alla Smart City: il percorso di Roma Capitale verso il D...
Dall'Agenda Digitale alla Smart City: il percorso di Roma Capitale verso il D...Dall'Agenda Digitale alla Smart City: il percorso di Roma Capitale verso il D...
Dall'Agenda Digitale alla Smart City: il percorso di Roma Capitale verso il D...
 
Reusing open data: how to make a difference (Vittorio Scarano, Università di ...
Reusing open data: how to make a difference (Vittorio Scarano, Università di ...Reusing open data: how to make a difference (Vittorio Scarano, Università di ...
Reusing open data: how to make a difference (Vittorio Scarano, Università di ...
 
Gestire i beni culturali con i big data (Sandro Stancampiano, Istat)
Gestire i beni culturali con i big data (Sandro Stancampiano, Istat)Gestire i beni culturali con i big data (Sandro Stancampiano, Istat)
Gestire i beni culturali con i big data (Sandro Stancampiano, Istat)
 
Data Governance: cos’è e perché è importante? (Elena Arista, Erwin)
Data Governance: cos’è e perché è importante? (Elena Arista, Erwin)Data Governance: cos’è e perché è importante? (Elena Arista, Erwin)
Data Governance: cos’è e perché è importante? (Elena Arista, Erwin)
 
Data driven economy: bastano i dati per avviare una start up? (Gabriele Anton...
Data driven economy: bastano i dati per avviare una start up? (Gabriele Anton...Data driven economy: bastano i dati per avviare una start up? (Gabriele Anton...
Data driven economy: bastano i dati per avviare una start up? (Gabriele Anton...
 

Dernier

Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
AroojKhan71
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Dernier (20)

(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 

L'architettura di classe enterprise di nuova generazione - Massimo Brignoli

  • 1. Massimo Brignoli Principal Solution Architect massimo@mongodb.com @massimobrignoli L’architettura di Classe Enterprise di Nuova Generazione
  • 2. Agenda • Nascita dei Data Lake • Overview di MongoDB • Proposta di un’architettura EDM • Case Study & Scenarios • Data Lake Lessons Learned
  • 3. Quanti dati? • Una cosa non manca alla aziende: dati – Flussi dei sensori – Sentiment sui social – Log dei server – App mobile • Analisti stimano una crescita del volume di dati del 40% annuo, 90% dei quali non strutturati. • Le tecnologie tradizionali (alcune disegnate 40 anni fa) non sono sufficienti
  • 4. La Promessa del “Big Data” • Scoprire informazioni collezionando ed analizzando i dati porta la promessa di – Un vantaggio competitivo – Risparmio economico • Un esempio diffuso dell’utilizzo della tecnologia Big Data è la “Single View”: aggregare tutto quello che si conosce di un cliente per migliorarne l’ingaggio e i ricavi • Il tradizionale EDW scricchiola sotto il carico, sopraffatto dal volume e varietà dei dati (e dall’alto costo).
  • 5. La Nascita dei Data Lake • Molte aziende hanno iniziato a guardare verso un’architettura detta Data Lake: – Piattaforma per gestire i dati in modo flessibile – Per aggregare e correlare i dati cross-silos in un unico posto – Permette l’esplorazione di tutti i dati • La piattaforma più in voga in questo momento è Hadoop: – Permette la scalabilità orizzontale su hardware commodity – Permette una schema di dati variegati ottimizzato in lettura – Include strati di lavorazione dei dati in SQL e linguaggi comuni – Grandi referenze (Yahoo e Google in primis)
  • 6. Perché Hadoop? • Hadoop Distributed FileSystem è disegnato per scalare su grandi operazioni batch • Fornisce un modello write-one read-many append-only • Ottimizzato per lunghe scansione di TB o PB di dati • Questa capacità di gestire dati multi-strutturati è usata: – Segmentazione dei clienti per campagne di marketing e recommendation – Analisi predittiva – Modelli di Rischio
  • 7. Ma va bene per tutto? • I Data Lake sono disegnati per fornire l’output di Hadoop alle applicazioni online. Queste applicazioni hanno dei requisiti tra cui: – Latenza di risposta in ms – Accesso random su un sottoinsieme di dati indicizzato – Supporto di query espressive ed aggregazioni di dati – Update di dati che cambiano valori frequentemente in real-time
  • 8. Hadoop è la risposta a tutto? • Nel nostro mondo guidato ormai dai dati, i millisecondi sono importanti. – Ricercatori IBM affermano che il 60% dei dati perde valore alcuni millisecondi dopo la generazione – Ad esempio identificare una transazione di borsa fraudolenta può essere inutile dopo alcuni minuti • Gartner predice che il 70% delle installazioni di Hadoop fallirà per non aver raggiunto gli obiettivi di costo e di incremento del fatturato.
  • 9. Enterprise Data Management Pipeline … Siloed source databases External feeds (batch) Streams Stream icon from: https://en.wikipedia.org/wiki/File:Activity_Streams_icon.png Transform Store raw data AnalyzeAggregate Pub-sub,ETL,fileimports Stream Processing Users Other Systems
  • 11. Document Model { first_name: ‘Paul’, surname: ‘Miller’, city: ‘London’, location: { type : ‘Point’, coordinates : [45.123,47.232] }, cars: [ { model: ‘Bentley’, year: 1973, value: 100000, … }, { model: ‘Rolls Royce’, year: 1965, value: 330000, … } ] } MongoDB RDBMS
  • 12. Documents are Rich Data Structures { first_name: ‘Paul’, surname: ‘Miller’, cell: 447557505611, city: ‘London’, location: { type : ‘Point’, coordinates : [45.123,47.232] }, Profession: [‘banking’, ‘finance’, ‘trader’], cars: [ { model: ‘Bentley’, year: 1973, value: 100000, … }, { model: ‘Rolls Royce’, year: 1965, value: 330000, … } ] } Fields can contain an array of sub-documents Typed field values Fields can contain arrays Fields
  • 13. Documents are Flexible { product_name: ‘Acme Paint’, color: [‘Red’, ‘Green’], size_oz: [8, 32], finish: [‘satin’, ‘eggshell’] } { product_name: ‘T-shirt’, size: [‘S’, ‘M’, ‘L’, ‘XL’], color: [‘Heather Gray’ … ], material: ‘100% cotton’, wash: ‘cold’, dry: ‘tumble dry low’ } { product_name: ‘Mountain Bike’, brake_style: ‘mechanical disc’, color: ‘grey’, frame_material: ‘aluminum’, no_speeds: 21, package_height: ‘7.5x32.9x55’, weight_lbs: 44.05, suspension_type: ‘dual’, wheel_size_in: 26 } Documents in the same product catalog collection in MongoDB
  • 15. Development – The Past { CODE } DB SCHEMAXML CONFIG APPLICATION RELATIONAL DATABASE OBJECT RELATIONAL MAPPING
  • 16. Development – With MongoDB { CODE } DB SCHEMAXML CONFIG APPLICATION RELATIONAL DATABASE OBJECT RELATIONAL MAPPING
  • 17. MongoDB is Full-Featured Rich Queries Find Paul’s cars Find everybody in London with a car between 1970 and 1980 Geospatial Find all of the car owners within 5km of Trafalgar Sq. Text Search Find all the cars described as having leather seats Aggregation Calculate the average value of Paul’s car collection Map Reduce What is the ownership pattern of colors by geography over time (is purple trending in China?)
  • 18. Dynamic Lookup • Combine data from multiple collections with left outer joins for richer analytics & more flexibility in data modeling
  • 19. Model of the Aggregation Framework
  • 20. Analytics & BI Integration
  • 21. Replica Sets • Replica set – 2 to 50 copies • Replica sets make up a self-healing ‘shard’ • Data center awareness • Replica sets address: • High availability • Data durability, consistency • Maintenance (e.g., HW swaps) • Disaster Recovery Application Driver Primary Secondary Secondary Replication
  • 22. Query Routing • Multiple query optimization models • Each of the sharding options are appropriate for different apps / use cases
  • 23. 3.2 Features Relevant for EDM • WiredTiger operational Engine • In-memory storage engine • Encryption • Compass (data viewer & query builder) • Connector for BI (Visualization) • Connector for Hadoop • Connector for Spark • $lookUp (left outer join)
  • 24. Enterprise Data Management Pipeline … Siloed source databases External feeds (batch) Streams Stream icon from: https://en.wikipedia.org/wiki/File:Activity_Streams_icon.png Transform Store raw data AnalyzeAggregate Pub-sub,ETL,fileimports Stream Processing Users Other Systems
  • 25. Come scegliere il layer di Data Management per ognuno degli stage? Processing Layer ? When you want: 1. Secondary indexes 2. Sub-second latency 3. Aggregations in DB 4. Updates of data For: 1. Scanning files 2. When indexes not needed Wide column store (e.g. HBase) For: 1. Primary key queries 2. If multiple indexes & slices not needed 3. Optimized for writing, not reading
  • 26. Data Store for Raw Dataset … Siloed source databases External feeds (batch) Streams Stream icon from: https://en.wikipedia.org/wiki/File:Activity_Streams_icon.png Transform Store raw data AnalyzeAggregate Pub-sub,ETL,fileimports Stream Processing Users Other Systems Store raw data Transform - Typically just writing record-by-record from source data - Usually just need high write volumes - All 3 options handle that Transform read requirements - Benefits to reading multiple datasets sorted [by index], e.g. to do a merge - Might want to look up across tables with indexes (and join functionality in MDB v3.2) - Want high read performance while writes are happening Interactive querying on the raw data could use indexes with MongoDB
  • 27. Data Store for Transformed Dataset … Siloed source databases External feeds (batch) Streams Stream icon from: https://en.wikipedia.org/wiki/File:Activity_Streams_icon.png Transform Store raw data AnalyzeAggregate Pub-sub,ETL,fileimports Stream Processing Users Other Systems AggregateTransform Often benefits to updating data as merging multiple datasets Dashboards & reports can have sub-second latency with indexes Aggregate read requirements - Benefits to using indexes for grouping - Aggregations natively in the DB would help - With indexes, can do aggregations on slices of data - Might want to look up across tables with indexes to aggregate
  • 28. Data Store for Aggregated Dataset … Siloed source databases External feeds (batch) Streams Stream icon from: https://en.wikipedia.org/wiki/File:Activity_Streams_icon.png Transform Store raw data AnalyzeAggregate Pub-sub,ETL,fileimports Stream Processing Users Other Systems AnalyzeAggregate Dashboards & reports can have sub-second latency with indexes Analytics read requirements - For scanning all of data, could be in any data store - Often want to analyze a slice of data (using indexes) - Querying on slices is best in MongoDB
  • 29. Data Store for Last Dataset … Siloed source databases External feeds (batch) Streams Stream icon from: https://en.wikipedia.org/wiki/File:Activity_Streams_icon.png Transform Store raw data AnalyzeAggregate Pub-sub,ETL,fileimports Stream Processing Users Other Systems Analyze Users Dashboards & reports can have sub-second latency with indexes - At the last step, there are many consuming systems and users - Need expressive querying with secondary indexes - MongoDB is best option for the publication or distribution of analytical results and operationalization of data Other Systems Often digital applications - High scale - Expressive querying - JSON preferred Often RESTful services, APIs
  • 30. Architettura EDM Completa … Siloed source databases External feeds (batch) Streams Data processing pipeline Pub-sub,ETL,fileimports Stream Processing Downstrea m Systems … … Single CSR Application Unified Digital Apps Operational Reporting … … … Analytic Reporting Drivers & Stacks Customer Clusterin g Churn Analysis Predictiv e Analytics … Distributed Processing Governance to choose where to load and process data Optimal location for providing operational response times & slices Can run processing on all data or slices Data Lake
  • 31. Example scenarios 1.Single Customer View a. Operational b. Analytics on customer segments c. Analytics on all customers 2.Customer profiles & clustering 3.Presenting churn analytics on high value customers
  • 32. Single View of Customer Spanish bank replaces Teradata and Microstrategy to increase business and avoid significant cost Problem Why MongoDB Results Problem Solution Results Took days to implement new functionality and business policies, inhibiting revenue growth Branches needed an app providing single view of the customer and real time recommendations for new products and services Multi-minute latency for accessing customer data stored in Teradata and Microstrategy Built single view of customer on MongoDB – flexible and scalable app easy to adapt to new business needs Super fast, ad hoc query capabilities (milliseconds), and real-time analytics thanks to MongoDB’s Aggregation Framework Can now leverage distributed infrastructure and commodity hardware for lower total cost of ownership and greater availability Cost avoidance of 10M$+ Application developed and deployed in less than 6 months. New business policies easily deployed and executed, bringing new revenue to the company Current capacity allows branches to load instantly all customer info in milliseconds, providing a great customer experience Large Spanish Bank
  • 33. Case Study Insurance leader generates coveted single view of customers in 90 days – “The Wall” Problem Why MongoDB ResultsProblem Solution Results No single view of customer, leading to poor customer experience and churn 145 years of policy data, 70+ systems, 15+ apps that are not integrated Spent 2 years, $25M trying build single view with Oracle – failed Built “The Wall” pulling in disparate data and serving single view to customer service reps in real time Flexible data model to aggregate disparate data into single data store Churn analysis done with Hadoop with relevant results output to MongoDB Prototyped in 2 weeks Deployed to production in 90 days Decreased churn and improved ability to upsell/cross-sell
  • 34. Top 15 Global Bank Kicking Out Oracle Global bank with 48M customers in 50 countries terminates Oracle ULA & makes MongoDB database of choice Problem Why MongoDB Results Problem Solution Results Slow development cycles due to RDBMS’ rigid data model hindering ability to meet business demands High TCO for hardware, licenses, development, and support (>$50M Oracle ULA) Poor overall performance of customer- facing and internal applications Building dozens of apps on MongoDB, both net new and migrations from Oracle – e.g., significant portion of retail banking, including customer-facing and backoffice apps, fraud detection, card activation, equity research content mgt.) Flexible data model to develop apps quickly and accommodate diverse data Ability to scale infrastructure and costs elastically Able to cancel Oracle ULA. Evaluating what apps can be migrated to MongoDB. For new apps, MongoDB is default choice Apps built in weeks instead of months or years, e.g., ebanking app prototyped in 2 weeks and in production in 4 weeks 70% TCO reduction

Notes de l'éditeur

  1. Stream processing is often separate processing layer than the batch processing, but it can be stored into the data stores at various stages
  2. Could make more visual
  3. Here we have greatly reduced the relational data model for this application to two tables. In reality no database has two tables. It is much more common to have hundreds or thousands of tables. And as a developer where do you begin when you have a complex data model?? If you’re building an app you’re really thinking about just a hand full of common things, like products, and these can be represented in a document much more easily that a complex relational model where the data is broken up in a way that doesn’t really reflect the way you think about the data or write an application. Document Model Benefits Agility and flexibility Data model supports business change Rapidly iterate to meet new requirements Intuitive, natural data representation Eliminates ORM layer Developers are more productive Reduces the need for joins, disk seeks Programming is more simple Performance delivered at scale
  4. "location" : {     "type" : "Point",     "coordinates" : [          -106.3974968970265,          31.79689833641156     ] }
  5. Rich queries, text search, geospatial, aggregation, mapreduce are types of things you can build based on the richness of the query model.
  6. Blend data from multiple sources for analysis Higher performance analytics with less application-side code and less effort from your developers Executed via the new $lookup operator, a stage in the MongoDB Aggregation Framework pipeline
  7. Start with the original collection; each record (document) contains a number of shapes (keys), each with a particular color (value) $match filters out documents that don’t contain a red diamond $project adds a new “square” attribute with a value computed from the value (color) of the snowflake and triangle attributes $lookup performs a left outer join with another collection, with the star being the comparison key Finally, the $group stage groups the data by the color of the square and produces statistics for each group
  8. High Availability – Ensure application availability during many types of failures Meet stringent SLAs with fast-failover algorithm Under 2 seconds to detect and recover from replica set primary failure Disaster Recovery – Address the RTO and RPO goals for business continuity Maintenance – Perform upgrades and other maintenance operations with no application downtime Secondaries can be used for a variety of applications – failover, hot backup, rolling upgrades, data locality and privacy and workload isolation
  9. Sharding is transparent to applications; whether there is one or one hundred shards, the application code for querying MongoDB is the same. Applications issue queries to a query router that dispatches the query to the appropriate shards. For key-value queries that are based on the shard key, the query router will dispatch the query to the shard that manages the document with the requested key. When using range-based sharding, queries that specify ranges on the shard key are only dispatched to shards that contain documents with values within the range. For queries that don’t use the shard key, the query router will dispatch the query to all shards and aggregate and sort the results as appropriate. Multiple query routers can be used with a MongoDB system, and the appropriate number is determined based on performance and availability requirements of the application.
  10. Kernel 3.2 Scope tracking: https://docs.google.com/spreadsheets/d/1L1EbbWoshUIHXBzCh5e3sALtAFxm_dJ52SRPR6GzeAY/edit#gid=0 Release notes for 3.1.6: http://docs.mongodb.org/manual/release-notes/3.1-dev-series/
  11. Stream processing is often separate processing layer than the batch processing, but it can be stored into the data stores at various stages
  12. Just a logical diagram. Processing could be on same physical servers as storage nodes to minimize data movement