Partner Technical Solutions, MongoDB
Sandeep Parikh
#MongoDBWorld
MongoDB and Hadoop
Driving Business Insights
Agenda
• Evolving Data Landscape
• MongoDB & Hadoop Use Cases
• MongoDB Connector Features
• Demo
Evolving Data Landscape
Hadoop
The Apache Hadoop software library is a framework
that allows for the distributed processing of large data
sets across clusters of computers using simple
programming models.
• Terabyte and petabyte datasets
• Data warehousing
• Advanced analytics
Enterprise IT Stack
• Applications: CRM, ERP, Collaboration, Mobile, BI
• Data Management (Operational, Analytical): RDBMS, EDW
• Infrastructure: OS & Virtualization, Compute, Storage, Network
• Management & Monitoring
• Security & Auditing
Operational vs. Analytical: Enrichment
• Operational: Applications, Interactions
• Analytical: Warehouse, Analytics
Operational: MongoDB
Use-case spectrum: Real-Time Analytics, Product/Asset Catalogs, Security & Fraud, Internet of Things, Mobile Apps, Customer Data Mgmt, Single View, Social, Churn Analysis, Recommender, Warehouse & ETL, Risk Modeling, Trade Surveillance, Predictive Analytics, Ad Targeting, Sentiment Analysis
Analytical: Hadoop
Use-case spectrum: Real-Time Analytics, Product/Asset Catalogs, Security & Fraud, Internet of Things, Mobile Apps, Customer Data Mgmt, Single View, Social, Churn Analysis, Recommender, Warehouse & ETL, Risk Modeling, Trade Surveillance, Predictive Analytics, Ad Targeting, Sentiment Analysis
Operational vs. Analytical: Lifecycle
Use-case spectrum: Real-Time Analytics, Product/Asset Catalogs, Security & Fraud, Internet of Things, Mobile Apps, Customer Data Mgmt, Single View, Social, Churn Analysis, Recommender, Warehouse & ETL, Risk Modeling, Trade Surveillance, Predictive Analytics, Ad Targeting, Sentiment Analysis
MongoDB & Hadoop Use Cases
Commerce
Applications powered by MongoDB
• Products & Inventory
• Recommended products
• Customer profile
• Session management
Analysis powered by Hadoop
• Elastic pricing
• Recommendation models
• Predictive analytics
• Clickstream history
The two sides are linked by the MongoDB Connector for Hadoop.
Insurance
Applications powered by MongoDB
• Customer profiles
• Insurance policies
• Session data
• Call center data
Analysis powered by Hadoop
• Customer action analysis
• Churn analysis
• Churn prediction
• Policy rates
The two sides are linked by the MongoDB Connector for Hadoop.
Fraud Detection
Diagram components: Online payments processing, Payments, 3rd Party Data Sources, Fraud Detection (query only), Fraud modeling, Nightly Analysis, MongoDB Connector for Hadoop, Results Cache
MongoDB Connector for Hadoop
Connector Overview
• Data
– Read/Write MongoDB
– Read/Write BSON
• Tools
– MapReduce, Pig, Hive, Spark
• Platforms
– Apache Hadoop, Cloudera CDH, Hortonworks HDP, Amazon EMR
Connector Features and Functionality
• Computes splits to read data
– Single Node, Replica Sets, Sharded Clusters
• Mappings for Pig and Hive
– MongoDB as a standard data source/destination
• Support for
– Filtering data with MongoDB queries
– Authentication
– Reading from Replica Set tags
– Appending to existing collections
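As a rough illustration of the "filtering data with MongoDB queries" item: the connector takes its input filter from a job property, which this sketch assumes is named `mongo.input.query` (confirm the name against the connector version you run). The Python below uses only the standard library to serialize a query document into the JSON string such a property would carry. The `rating` field is hypothetical.

```python
import json

def input_query_property(query):
    """Return (name, value) for a job property carrying a MongoDB query filter."""
    # Assumption: the connector reads its filter from "mongo.input.query".
    return ("mongo.input.query", json.dumps(query, sort_keys=True))

# Hypothetical filter: only documents with a rating of 4 or higher.
name, value = input_query_property({"rating": {"$gte": 4}})
print(f"{name} = {value}")
```

Pushing the filter into the input query means only matching documents leave MongoDB, rather than filtering after the data has already been read into Hadoop.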
MapReduce Configuration
• MongoDB input
– mongo.job.input.format = com.mongodb.hadoop.MongoInputFormat
– mongo.input.uri = mongodb://mydb:27017/db1.collection1
• MongoDB output
– mongo.job.output.format = com.mongodb.hadoop.MongoOutputFormat
– mongo.output.uri = mongodb://mydb:27017/db1.collection2
• BSON input/output
– mongo.job.input.format = com.mongodb.hadoop.BSONFileInputFormat
– mapred.input.dir = hdfs:///tmp/database.bson
– mongo.job.output.format = com.mongodb.hadoop.BSONFileOutputFormat
– mapred.output.dir = hdfs:///tmp/output.bson
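One way to hand these properties to a job is a Hadoop-style `<configuration>` XML file. The sketch below renders the MongoDB-to-MongoDB properties in that format using only the Python standard library. The helper name `hadoop_configuration_xml` is made up for illustration, and the format classes are written with the `com.mongodb.hadoop` package, which is where the connector's input and output formats live.

```python
import xml.etree.ElementTree as ET

def hadoop_configuration_xml(properties):
    """Render a dict of job properties as Hadoop <configuration> XML."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")

# Property names and URIs are taken from the slide; hosts are placeholders.
mongo_to_mongo = {
    "mongo.job.input.format": "com.mongodb.hadoop.MongoInputFormat",
    "mongo.input.uri": "mongodb://mydb:27017/db1.collection1",
    "mongo.job.output.format": "com.mongodb.hadoop.MongoOutputFormat",
    "mongo.output.uri": "mongodb://mydb:27017/db1.collection2",
}
xml_text = hadoop_configuration_xml(mongo_to_mongo)
print(xml_text)
```

The same properties can also be set programmatically on a job's configuration object; the XML form is shown only because it is easy to inspect.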
Pig Mappings
• Input: BSONLoader and MongoLoader
data = LOAD 'mongodb://mydb:27017/db.collection'
using com.mongodb.hadoop.pig.MongoLoader;
• Output: BSONStorage and MongoInsertStorage
STORE records INTO 'hdfs:///output.bson'
using com.mongodb.hadoop.pig.BSONStorage;
Hive Support
CREATE TABLE mongo_users (id int, name string, age int)
STORED BY "com.mongodb.hadoop.hive.MongoStorageHandler"
WITH SERDEPROPERTIES("mongo.columns.mapping" = "_id,name,age")
TBLPROPERTIES("mongo.uri" = "mongodb://host:27017/test.users");
• Access collections as Hive tables
• Use with MongoStorageHandler or BSONStorageHandler
Spark Usage
• Use with MapReduce input/output formats
• Create Configuration objects with input/output formats and data URI
• Load/save data using the SparkContext Hadoop file API
Data Movement
Dynamic queries to MongoDB vs. BSON snapshots in HDFS
• Dynamic queries to MongoDB
– Work with the most recent data
– Put load on the operational database
• BSON snapshots in HDFS
– Move load to Hadoop
– Add predictable load to MongoDB
Demo
MovieWeb
MovieWeb Components
• MovieLens dataset
– 10M ratings, 10K movies, 70K users
• Python web app to browse movies and recommendations
– Flask, PyMongo
• Spark app computes recommendations
– MLlib collaborative filtering
• Predicted ratings are exposed in web app
– New predictions collection
MovieWeb Web Application
• Browse
– Top movies by ratings count
– Top genres by movie count
• Log in to
– See My Ratings
– Rate movies
• What’s missing?
– Movies You May Like
– Recommendations
Spark Recommender
• Apache Hadoop 2.3.0
– HDFS and YARN
• Spark 1.0
– Execute within YARN
– Assign executor resources
• Data
– From HDFS, MongoDB
– To MongoDB
MovieWeb Workflow
• Snapshot database as BSON
• Store BSON in HDFS
• Read BSON into Spark app
• Train model from existing ratings
• Create user-movie pairings
• Predict ratings for all pairings
• Write predictions to MongoDB collection
• Web application exposes recommendations
• Repeat the process weekly
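The "create user-movie pairings" step can be sketched in plain Python as a cross product of user and movie IDs. Skipping pairs that already have a rating is an assumption made here for illustration; the slides only say that ratings are predicted for all pairings. The function name and sample IDs are hypothetical.

```python
from itertools import product

def candidate_pairings(user_ids, movie_ids, rated):
    """Yield (user, movie) pairs that have no existing rating."""
    for pair in product(user_ids, movie_ids):
        # Assumption: pairs the user has already rated need no prediction.
        if pair not in rated:
            yield pair

users = [1, 2]
movies = [10, 20, 30]
existing = {(1, 10), (2, 30)}
pairs = list(candidate_pairings(users, movies, existing))
print(pairs)  # [(1, 20), (1, 30), (2, 10), (2, 20)]
```

At the demo's scale (about 70K users and 10K movies) the full cross product is on the order of 700 million pairs, which suggests why this step runs inside the Spark app rather than in a single process.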
Execution
$ export SPARK_JAR=spark-assembly-1.0.0-hadoop2.3.0.jar
$ export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
$ bin/spark-submit \
    --master yarn-cluster \
    --class com.mongodb.hadoop.demo.Recommender \
    --jars mongo-java-2.12.2.jar,mongo-hadoop-1.2.1.jar \
    --driver-memory 1G \
    --executor-memory 2G \
    --num-executors 4 \
    demo-1.0.jar
Questions?
