MongoDB Hackathon 02
Vivek A. Ganesan
vivganes@gmail.com
Big Data Gods Meetup, Santa Clara, CA, May 18, 2013
Before we start
Copyright 2013, Vivek A. Ganesan, All rights reserved 1
o A BIG thank you to our sponsors –
Big Data Cloud
o Meeting Space
o Food + Drinks
o Consulting/Training
Agenda
o Review of Hackathon 01
o Data Modeling
o Indexing
o Aggregation
o Map/Reduce
Introduction
o This is a hackathon, not a class
o Which means we work on stuff together
o Please consult and help your teammates
o There will be labs (that’s when we learn!)
o Talk to your teammates
o Figure out what problem you want to solve
o Think about your data sets and how to model them in MongoDB
Review – MongoDB Basics
o MongoDB is a document-oriented NoSQL data store
o It saves data internally as BSON (Binary JSON)
o A mongo data store may hold multiple databases
o A database may have multiple collections (analog of tables)
o A collection is a container of documents
o Documents contain Key/Value pairs
o A default key of “_id” is inserted by MongoDB for all documents
o Users can set the value of “_id” to anything they want, as long as it is unique within the collection
o Documents are schema-free
o No fixed structure to a collection
o A collection can have documents with different key/value pairs
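As a small illustration of the points above, here is a sketch in plain JavaScript (runnable in Node.js, no MongoDB server needed) of two documents with different keys sitting in the same collection. The `users` array and the `withId` helper are hypothetical stand-ins: the real server assigns an ObjectId, not a counter-based string, when `_id` is missing.

```javascript
// In-memory stand-in for a collection: just an array of documents.
// Hypothetical helper; MongoDB itself generates an ObjectId for a missing _id.
let nextId = 1;
function withId(doc) {
  // Keep a user-supplied _id, otherwise assign a default one.
  return Object.prototype.hasOwnProperty.call(doc, "_id")
    ? doc
    : Object.assign({ _id: "auto-" + nextId++ }, doc);
}

const users = [
  // No _id given, so one is assigned.
  withId({ username: "admin" }),
  // Custom _id and an extra key: same collection, different shape.
  withId({ _id: "editor", username: "editor", email: "editor@example.com" })
];

// Print the key names of each document to show the differing shapes.
console.log(users.map(function (u) { return Object.keys(u); }));
```

Nothing forces the two documents to share a structure; any such rule would have to live in the application.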
Review – Shell and Clients
o A Mongo Shell is a CLI client to MongoDB
o Shell commands are JavaScript functions
o You can write your own JavaScript code within the shell
o You can also import JavaScript modules using load()
o Mongo Shell looks for an initialization file : ~/.mongorc.js
o Set up global variables here
o To use your favorite editor within the Mongo shell :
o Set the environment variable EDITOR to your editor
o MongoDB supports clients in several programming languages :
o JS, Java, C, C++, C#, Scala, Python, Ruby, Perl and Erlang
Review – Mongo DB Objects
o Note : Mongo Shell commands are in blue and output is in green
o Mongo uses a hierarchical naming scheme for database objects
o The current database is always in the db object
o The db command prints the name of the current db
o A collection called “mycollection” in the current database :
o db.mycollection (Note : This is a mongodb object)
o Commands are methods invoked on objects
o For example, to insert a document into the db.mycollection collection :
o db.mycollection.insert command
o For example, to find documents in the db.mycollection collection :
o db.mycollection.find command
Review – Create
o First exercise :
o Create a new database called “blog”
o Create a collection called “users” and a collection called “posts”
o Solution to first exercise :
o use blog;
o db; => blog
o show collections; => system.indexes
o db.createCollection("users"); => { "ok" : 1 }
o db.createCollection("posts"); => { "ok" : 1 }
o show collections; => posts, system.indexes, users
Review – Insert
o Second Exercise :
o In the “users” collection :
o Insert a single document, {username: "admin"}
o In the “posts” collection :
o Insert ten posts using a loop
o Blog data : post_title, post_body and post_tags as CSV
o Solution to Second Exercise :
o db.users.insert({username : "admin"});
o for (var i = 1; i <= 10; i++) {
o   db.posts.insert({post_title: "Title", post_body: "Post Body",
o     post_tags: "tag1,tag2,tag3,tag4,tag5"});
o }
Review – Updates with modifier
o Third Exercise :
o In the “posts” collection :
o Update ten posts with an updated_at key and set it to the current timestamp
o Solution to the Third Exercise :
o Note : MongoDB replaces the entire document for an update call without a modifier (modifiers start with a ‘$’ symbol)
o db.posts.update({}, {$set : {updated_at: new Date()}}, false, true);
o Note : The third and fourth arguments are upsert (false) and multi (true); multi makes the update apply to every matching post
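To make the difference between a whole-document update and a `$set` update concrete, here is a minimal in-memory sketch in plain JavaScript. The `replaceDocument` and `applySet` functions are hypothetical emulations written for this example, not MongoDB's own code, and the date string is a placeholder.

```javascript
// In-memory emulation of two update styles (not MongoDB's own code).

// Without a modifier: the update document replaces everything except _id.
function replaceDocument(doc, replacement) {
  return Object.assign({ _id: doc._id }, replacement);
}

// With $set: only the listed fields change; the rest of the document is kept.
function applySet(doc, update) {
  return Object.assign({}, doc, update.$set);
}

const post = { _id: 1, post_title: "Title", post_body: "Post Body" };

const replaced = replaceDocument(post, { updated_at: "2013-05-18" });
const modified = applySet(post, { $set: { updated_at: "2013-05-18" } });

console.log(replaced); // post_title and post_body are gone
console.log(modified); // original fields kept, updated_at added
```

This is why the solution uses `$set`: leaving it out would wipe the title, body and tags of every updated post.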
Review – Selective Updates
o Fourth Exercise :
o In the “posts” collection :
o Update the posts such that the first three posts have a “foo” tag (use the cursor functionality to iterate)
o Solution to the Fourth Exercise :
o c = db.posts.find().limit(3);
o while ( c.hasNext() ) {
o post = c.next();
o post["post_tags"] = post["post_tags"] + ",foo";
o db.posts.save(post);
o }
Review – Mastering find
o In a Mongo Shell,
o Find all posts but extract only the post_title field
o db.posts.find({}, {post_title: 1, _id: 0});
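o List all posts in reverse order of creation (the posts have no created_on field, so sort on _id: a default ObjectId starts with a creation timestamp)
o db.posts.find().sort({_id: -1});
o Do the same as above but paginate in sets of three (skip(3) returns the second page; use skip(0) for the first)
o db.posts.find().sort({_id: -1}).skip(3).limit(3);
o Find all posts that contain a tag called “foo” (note that this regex also matches tags that merely contain “foo”)
o db.posts.find({post_tags: /foo/});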
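The tag query above relies on an unanchored regular expression over a CSV string. The sketch below, in plain JavaScript with hypothetical sample posts, compares that pattern with a stricter one that only accepts "foo" as a whole comma-delimited entry. The stricter pattern is a suggestion for illustration, not something from the original exercise.

```javascript
// Tags stored as one CSV string, as in the exercises above.
const posts = [
  { post_title: "A", post_tags: "tag1,foo" },
  { post_title: "B", post_tags: "tag1,food" },
  { post_title: "C", post_tags: "tag2" }
];

// Same pattern as the find() query: a plain substring match.
const loose = /foo/;
// Stricter pattern: "foo" must be a whole comma-delimited entry.
const strict = /(^|,)foo(,|$)/;

// Return the titles of the posts whose tag string matches the pattern.
function titlesMatching(pattern) {
  return posts
    .filter(function (p) { return pattern.test(p.post_tags); })
    .map(function (p) { return p.post_title; });
}

console.log(titlesMatching(loose));  // includes the post tagged "food"
console.log(titlesMatching(strict)); // only the post tagged exactly "foo"
```

Storing tags as an array, as a later exercise does, removes the need for the regex altogether.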
Review – Modifiers
o Fifth Exercise :
o Modify “posts” collection
o Change the post_tags field to an array instead of a CSV list
o c = db.posts.find();
o while ( c.hasNext() ) {
o post = c.next();
o post["post_tags"] = post["post_tags"].split(",");
o db.posts.save(post);
o }
Data Modeling
o http://docs.mongodb.org/manual/core/data-modeling/
o When to reference?
o When it makes sense, e.g., for many-to-many relationships
o When document size is a concern
o Some drivers may do this automatically
o When to embed?
o When it is “natural”, e.g., a blog post and its comments
o When there is a need for atomic operations
o When read performance is critical
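The two modeling styles can be sketched with plain JavaScript objects. The field names (`comments`, `post_id`, `author`, `text`) and the `commentsFor` helper are assumptions made for this example; the deck does not prescribe a schema.

```javascript
// Embedded: comments live inside the post, so one read fetches everything.
const embeddedPost = {
  _id: 1,
  post_title: "Title",
  comments: [
    { author: "admin", text: "First!" },
    { author: "guest", text: "Nice post" }
  ]
};

// Referenced: comments are separate documents that point back to the post.
const post = { _id: 1, post_title: "Title" };
const comments = [
  { _id: 10, post_id: 1, author: "admin", text: "First!" },
  { _id: 11, post_id: 1, author: "guest", text: "Nice post" },
  { _id: 12, post_id: 2, author: "guest", text: "Other post" }
];

// With references the client does the "join": a second lookup by post_id.
function commentsFor(postId) {
  return comments.filter(function (c) { return c.post_id === postId; });
}

console.log(embeddedPost.comments.length);
console.log(commentsFor(post._id).length);
```

The embedded form needs a single read and a single-document write; the referenced form keeps each document small at the cost of an extra lookup.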
Lab 01 – Model your data set
o Break – 15 minutes
o Lab 01 – 45 minutes - With your team :
o Look at your data set and figure out how you will model it
o How would you bulk load the data?
o How would you handle errors while loading?
o Implement the schema for your data set
o Bulk load a small portion of your data set
o Verify the load and also run some sample queries
o Figure out what queries you would run frequently
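One possible answer to the error-handling question is to set bad records aside instead of aborting the whole load. The sketch below assumes the input is newline-delimited JSON; the `parseLines` helper and the movie-rating sample records are hypothetical. The resulting `docs` array is what a team would then insert into its collection.

```javascript
// Parse newline-delimited JSON, keeping bad lines aside instead of aborting.
function parseLines(text) {
  const docs = [];
  const errors = [];
  text.split("\n").forEach(function (line, index) {
    if (line.trim() === "") { return; } // skip blank lines
    try {
      docs.push(JSON.parse(line));
    } catch (e) {
      // Record where the problem was so the line can be fixed and reloaded.
      errors.push({ lineNumber: index + 1, line: line, reason: e.message });
    }
  });
  return { docs: docs, errors: errors };
}

// Hypothetical sample: the second line is malformed on purpose.
const sample = [
  '{"user_id": 1, "movie_id": 10, "rating": 4}',
  '{"user_id": 2, "movie_id": 10, "rating": }',
  '{"user_id": 2, "movie_id": 11, "rating": 5}'
].join("\n");

const result = parseLines(sample);
console.log(result.docs.length, result.errors.length);
```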
Indexes
o http://docs.mongodb.org/manual/core/indexes/
o When to index?
o Improve find performance
o Improve sort performance
o Note : There is a performance impact for writes
o What to index?
o Depends on the query
o Usually, most frequently searched for fields
o Sometimes, fields in embedded documents as well
Types of Indexes and Options
o Unique indexes (_id has a unique index by default)
o Simple
o Compound Indexes
o Prefix order is important!
o Text indexes
o Sparse Indexes
o Multi-key indexes (for arrays)
o Geospatial and Geohaystack indexes
o Indexes can be built in the background (recommended!)
o Indexes can be named explicitly (definitely recommended!)
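The remark that prefix order matters for compound indexes can be pictured with a deliberately simplified model in plain JavaScript. The `usablePrefixLength` function below is a hypothetical teaching aid, not the real query planner: it only counts how many leading index keys a query constrains.

```javascript
// Simplified model of the prefix rule (not the real query planner):
// count how many leading index keys the query constrains.
function usablePrefixLength(indexKeys, queryFields) {
  let n = 0;
  while (n < indexKeys.length && queryFields.indexOf(indexKeys[n]) !== -1) {
    n++;
  }
  return n;
}

// Hypothetical compound index on { username: 1, updated_at: -1 }.
const indexKeys = ["username", "updated_at"];

console.log(usablePrefixLength(indexKeys, ["username"]));               // 1
console.log(usablePrefixLength(indexKeys, ["username", "updated_at"])); // 2
console.log(usablePrefixLength(indexKeys, ["updated_at"]));             // 0
```

A result of 0 means the query skips the leading key, so in this model the index gives it no help; that is the case to watch for when choosing key order.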
Lab 02 – Indexes
o Lab 02 – 30 minutes - With your team :
o Look at the frequent queries from Lab 01 and :
o Which would you index and why?
o What kind of indexes are needed?
o Since this is predominantly a read use case, index away
o Would you use the sparse index? For what and how?
o Would you use the geospatial index? For what and how?
o Would you use the TTL index? For what and how?
Aggregation
o Used for “group by”-like queries
o Aggregation Framework (introduced in 2.1)
o http://docs.mongodb.org/manual/aggregation/
o Simple count : db.posts.count();
o Using Aggregation Framework : db.posts.aggregate([{ $group: { _id: null, count: {$sum: 1}}}]);
o Check the reference for comparison with SQL group by
o Still supports Map/Reduce (older approach and still relevant)
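What the `$group` stage with `{$sum: 1}` computes can be emulated in a few lines of plain JavaScript. The `groupCount` function and the `author` field below are hypothetical and exist only for this sketch; the real pipeline runs inside the server.

```javascript
// In-memory emulation of a $group stage that counts documents per key.
// keyFn returns the grouping key; returning null puts everything in one
// group, like _id: null in the pipeline above.
function groupCount(docs, keyFn) {
  const counts = new Map();
  docs.forEach(function (doc) {
    const key = keyFn(doc);
    counts.set(key, (counts.get(key) || 0) + 1);
  });
  // Shape the output like aggregation results: one document per group.
  return Array.from(counts, function (entry) {
    return { _id: entry[0], count: entry[1] };
  });
}

const posts = [
  { post_title: "A", author: "admin" },
  { post_title: "B", author: "admin" },
  { post_title: "C", author: "guest" }
];

console.log(groupCount(posts, function () { return null; }));
console.log(groupCount(posts, function (p) { return p.author; }));
```

The first call mirrors the simple count; the second is the "group by" case, where each distinct key gets its own counter.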
Lab 03 – Aggregation
o Lab 03 – 30 minutes - With your team :
o Figure out what aggregations to run on the data set :
o For example, average rating per user?
o Or, average number of movies rated by all users?
o Write the queries for these aggregations and test them
o Are indexes helpful in aggregations? Why/Why not?
o Are you better off just doing these in your client code? Why/Why not?
o When would you use pipelined aggregations?
Map/Reduce
o Scatter/Gather framework
o db.collection.mapReduce(map_fn, red_fn, {out: output_coll})
o http://docs.mongodb.org/manual/aggregation/
o Mapper – just emits key/value pairs
o Framework – Groups and sorts mapper output => Reducer
o Reducer – Applies a function on the input => Output Coll.
o Distributed computation framework for full table scans
o http://docs.mongodb.org/manual/tutorial/map-reduce-examples/
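The mapper, grouping and reducer steps described above can be emulated in memory with plain JavaScript. This is a sketch under assumptions, not MongoDB's implementation: the `mapReduce` function here is hypothetical and passes `emit` as an argument, whereas in the shell the mapper reads the document from `this` and calls a global `emit`.

```javascript
// In-memory emulation of the map -> group -> reduce flow (not MongoDB's code).
function mapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  // The framework supplies emit(); the mapper calls it per key/value pair.
  function emit(key, value) {
    if (!groups.has(key)) { groups.set(key, []); }
    groups.get(key).push(value);
  }
  docs.forEach(function (doc) { mapFn(doc, emit); });

  // One reduce call per key, over all values emitted for that key.
  const out = [];
  Array.from(groups.keys()).sort().forEach(function (key) {
    out.push({ _id: key, value: reduceFn(key, groups.get(key)) });
  });
  return out;
}

// Tags as arrays, as produced by the fifth exercise.
const posts = [
  { post_tags: ["tag1", "foo"] },
  { post_tags: ["tag1"] }
];

// Count how many posts carry each tag.
const tagCounts = mapReduce(
  posts,
  function (doc, emit) { doc.post_tags.forEach(function (t) { emit(t, 1); }); },
  function (key, values) { return values.reduce(function (a, b) { return a + b; }, 0); }
);

console.log(tagCounts);
```

The mapper knows nothing about other documents and the reducer knows nothing about where values came from, which is what lets the framework spread the work out.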
Lab 04 – Map/Reduce
o Lab 04 – 30 minutes - With your team :
o Go through the Map/Reduce examples
o Figure out what Map/Reduce functions you would use
o Implement these functions (on a small data set)
o Some things to think about :
o Can you use Map/Reduce to “seed” your recommendations?
o Can you use incremental Map/Reduce to “update” your recommendations? How would you do this?
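One way to think about the incremental question: run Map/Reduce over the new data only, then fold those results into the existing output by reducing again wherever a key already exists. The plain JavaScript sketch below emulates that idea in memory, which is also the idea behind the `out: { reduce: ... }` option of `mapReduce`. The `mergeReduce` helper and the movie keys are hypothetical.

```javascript
// Merge freshly reduced results into an existing output, re-reducing on
// key collisions. In-memory emulation only.
function mergeReduce(existing, fresh, reduceFn) {
  const merged = new Map();
  existing.forEach(function (r) { merged.set(r._id, r.value); });
  fresh.forEach(function (r) {
    merged.set(
      r._id,
      merged.has(r._id) ? reduceFn(r._id, [merged.get(r._id), r.value]) : r.value
    );
  });
  return Array.from(merged, function (e) { return { _id: e[0], value: e[1] }; });
}

// A reducer that can safely be applied to its own earlier output.
function sum(key, values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// Hypothetical rating counts per movie: earlier output plus a new batch.
const existing = [{ _id: "movie-1", value: 3 }, { _id: "movie-2", value: 1 }];
const fresh = [{ _id: "movie-2", value: 2 }, { _id: "movie-3", value: 1 }];

console.log(mergeReduce(existing, fresh, sum));
```

This only works when the reducer gives the same answer whether it sees raw values or earlier reduced values, as a sum does; an average would need to carry a sum and a count instead.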
Questions? Comments?
Thank You!
E-mail: vivganes@gmail.com
Twitter : onevivek

Contenu connexe

Tendances

Search Engine-Building with Lucene and Solr
Search Engine-Building with Lucene and SolrSearch Engine-Building with Lucene and Solr
Search Engine-Building with Lucene and SolrKai Chan
 
"Solr Update" at code4lib '13 - Chicago
"Solr Update" at code4lib '13 - Chicago"Solr Update" at code4lib '13 - Chicago
"Solr Update" at code4lib '13 - ChicagoErik Hatcher
 
Starting with JSON Path Expressions in Oracle 12.1.0.2
Starting with JSON Path Expressions in Oracle 12.1.0.2Starting with JSON Path Expressions in Oracle 12.1.0.2
Starting with JSON Path Expressions in Oracle 12.1.0.2Marco Gralike
 
JSON in Oracle 18c and 19c
JSON in Oracle 18c and 19cJSON in Oracle 18c and 19c
JSON in Oracle 18c and 19cstewashton
 
JSON in 18c and 19c
JSON in 18c and 19cJSON in 18c and 19c
JSON in 18c and 19cstewashton
 
This upload requires better support for ODP format
This upload requires better support for ODP formatThis upload requires better support for ODP format
This upload requires better support for ODP formatForest Mars
 
UKOUG Tech14 - Getting Started With JSON in the Database
UKOUG Tech14 - Getting Started With JSON in the DatabaseUKOUG Tech14 - Getting Started With JSON in the Database
UKOUG Tech14 - Getting Started With JSON in the DatabaseMarco Gralike
 
Oracle Database - JSON and the In-Memory Database
Oracle Database - JSON and the In-Memory DatabaseOracle Database - JSON and the In-Memory Database
Oracle Database - JSON and the In-Memory DatabaseMarco Gralike
 
An introduction into Spring Data
An introduction into Spring DataAn introduction into Spring Data
An introduction into Spring DataOliver Gierke
 
Jdbc Java Programming
Jdbc Java ProgrammingJdbc Java Programming
Jdbc Java Programmingchhaichivon
 
MySQL without the SQL -- Cascadia PHP
MySQL without the SQL -- Cascadia PHPMySQL without the SQL -- Cascadia PHP
MySQL without the SQL -- Cascadia PHPDave Stokes
 
Los Angeles R users group - Dec 14 2010 - Part 2
Los Angeles R users group - Dec 14 2010 - Part 2Los Angeles R users group - Dec 14 2010 - Part 2
Los Angeles R users group - Dec 14 2010 - Part 2rusersla
 
第2回 Hadoop 輪読会
第2回 Hadoop 輪読会第2回 Hadoop 輪読会
第2回 Hadoop 輪読会Toshihiro Suzuki
 
Oracle Developer Day, 20 October 2009, Oracle De Meern, Holland: Oracle Datab...
Oracle Developer Day, 20 October 2009, Oracle De Meern, Holland: Oracle Datab...Oracle Developer Day, 20 October 2009, Oracle De Meern, Holland: Oracle Datab...
Oracle Developer Day, 20 October 2009, Oracle De Meern, Holland: Oracle Datab...Marco Gralike
 
UKOUG Tech14 - Using Database In-Memory Column Store with Complex Datatypes
UKOUG Tech14 - Using Database In-Memory Column Store with Complex DatatypesUKOUG Tech14 - Using Database In-Memory Column Store with Complex Datatypes
UKOUG Tech14 - Using Database In-Memory Column Store with Complex DatatypesMarco Gralike
 

Tendances (20)

Search Engine-Building with Lucene and Solr
Search Engine-Building with Lucene and SolrSearch Engine-Building with Lucene and Solr
Search Engine-Building with Lucene and Solr
 
"Solr Update" at code4lib '13 - Chicago
"Solr Update" at code4lib '13 - Chicago"Solr Update" at code4lib '13 - Chicago
"Solr Update" at code4lib '13 - Chicago
 
Starting with JSON Path Expressions in Oracle 12.1.0.2
Starting with JSON Path Expressions in Oracle 12.1.0.2Starting with JSON Path Expressions in Oracle 12.1.0.2
Starting with JSON Path Expressions in Oracle 12.1.0.2
 
Spring data jpa
Spring data jpaSpring data jpa
Spring data jpa
 
JSON in Oracle 18c and 19c
JSON in Oracle 18c and 19cJSON in Oracle 18c and 19c
JSON in Oracle 18c and 19c
 
Full metal mongo
Full metal mongoFull metal mongo
Full metal mongo
 
JSON in 18c and 19c
JSON in 18c and 19cJSON in 18c and 19c
JSON in 18c and 19c
 
This upload requires better support for ODP format
This upload requires better support for ODP formatThis upload requires better support for ODP format
This upload requires better support for ODP format
 
UKOUG Tech14 - Getting Started With JSON in the Database
UKOUG Tech14 - Getting Started With JSON in the DatabaseUKOUG Tech14 - Getting Started With JSON in the Database
UKOUG Tech14 - Getting Started With JSON in the Database
 
Oracle Database - JSON and the In-Memory Database
Oracle Database - JSON and the In-Memory DatabaseOracle Database - JSON and the In-Memory Database
Oracle Database - JSON and the In-Memory Database
 
An introduction into Spring Data
An introduction into Spring DataAn introduction into Spring Data
An introduction into Spring Data
 
Persistences
PersistencesPersistences
Persistences
 
Jdbc Java Programming
Jdbc Java ProgrammingJdbc Java Programming
Jdbc Java Programming
 
MySQL without the SQL -- Cascadia PHP
MySQL without the SQL -- Cascadia PHPMySQL without the SQL -- Cascadia PHP
MySQL without the SQL -- Cascadia PHP
 
Android Data Storagefinal
Android Data StoragefinalAndroid Data Storagefinal
Android Data Storagefinal
 
Los Angeles R users group - Dec 14 2010 - Part 2
Los Angeles R users group - Dec 14 2010 - Part 2Los Angeles R users group - Dec 14 2010 - Part 2
Los Angeles R users group - Dec 14 2010 - Part 2
 
第2回 Hadoop 輪読会
第2回 Hadoop 輪読会第2回 Hadoop 輪読会
第2回 Hadoop 輪読会
 
Oracle Developer Day, 20 October 2009, Oracle De Meern, Holland: Oracle Datab...
Oracle Developer Day, 20 October 2009, Oracle De Meern, Holland: Oracle Datab...Oracle Developer Day, 20 October 2009, Oracle De Meern, Holland: Oracle Datab...
Oracle Developer Day, 20 October 2009, Oracle De Meern, Holland: Oracle Datab...
 
Mysql
MysqlMysql
Mysql
 
UKOUG Tech14 - Using Database In-Memory Column Store with Complex Datatypes
UKOUG Tech14 - Using Database In-Memory Column Store with Complex DatatypesUKOUG Tech14 - Using Database In-Memory Column Store with Complex Datatypes
UKOUG Tech14 - Using Database In-Memory Column Store with Complex Datatypes
 

En vedette

Collaborative filtering getting_started
Collaborative filtering getting_startedCollaborative filtering getting_started
Collaborative filtering getting_startedVivek Aanand Ganesan
 
Recommendation Engines Program Kickoff
Recommendation Engines Program KickoffRecommendation Engines Program Kickoff
Recommendation Engines Program KickoffVivek Aanand Ganesan
 
Luigi presentation NYC Data Science
Luigi presentation NYC Data ScienceLuigi presentation NYC Data Science
Luigi presentation NYC Data ScienceErik Bernhardsson
 
Building a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe CrobakBuilding a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe CrobakHakka Labs
 

En vedette (7)

Big data pipelines
Big data pipelinesBig data pipelines
Big data pipelines
 
Mongodb hackathon 01
Mongodb hackathon 01Mongodb hackathon 01
Mongodb hackathon 01
 
Collaborative filtering getting_started
Collaborative filtering getting_startedCollaborative filtering getting_started
Collaborative filtering getting_started
 
Recommendation Engines Program Kickoff
Recommendation Engines Program KickoffRecommendation Engines Program Kickoff
Recommendation Engines Program Kickoff
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data Engineering
 
Luigi presentation NYC Data Science
Luigi presentation NYC Data ScienceLuigi presentation NYC Data Science
Luigi presentation NYC Data Science
 
Building a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe CrobakBuilding a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe Crobak
 

Similaire à Mongodb hackathon 02

Mdb dn 2016_07_elastic_search
Mdb dn 2016_07_elastic_searchMdb dn 2016_07_elastic_search
Mdb dn 2016_07_elastic_searchDaniel M. Farrell
 
MongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDBMongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDBLisa Roth, PMP
 
MongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDBMongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDBMongoDB
 
mongodb11 (1) (1).pptx
mongodb11 (1) (1).pptxmongodb11 (1) (1).pptx
mongodb11 (1) (1).pptxRoopaR36
 
Building Spring Data with MongoDB
Building Spring Data with MongoDBBuilding Spring Data with MongoDB
Building Spring Data with MongoDBMongoDB
 
Getting Started - MongoDB
Getting Started - MongoDBGetting Started - MongoDB
Getting Started - MongoDBWildan Maulana
 
MongoDB World 2019: Fast Machine Learning Development with MongoDB
MongoDB World 2019: Fast Machine Learning Development with MongoDBMongoDB World 2019: Fast Machine Learning Development with MongoDB
MongoDB World 2019: Fast Machine Learning Development with MongoDBMongoDB
 
Top MongoDB interview Questions and Answers
Top MongoDB interview Questions and AnswersTop MongoDB interview Questions and Answers
Top MongoDB interview Questions and Answersjeetendra mandal
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDBRaghunath A
 
1 Project 2 Introduction - the SeaPort Project seri.docx
1  Project 2 Introduction - the SeaPort Project seri.docx1  Project 2 Introduction - the SeaPort Project seri.docx
1 Project 2 Introduction - the SeaPort Project seri.docxhoney725342
 
Back to Basics 2017: Mí primera aplicación MongoDB
Back to Basics 2017: Mí primera aplicación MongoDBBack to Basics 2017: Mí primera aplicación MongoDB
Back to Basics 2017: Mí primera aplicación MongoDBMongoDB
 
Object Oriented Concepts and Principles
Object Oriented Concepts and PrinciplesObject Oriented Concepts and Principles
Object Oriented Concepts and Principlesdeonpmeyer
 
What's new in MongoDB v1.8
What's new in MongoDB v1.8What's new in MongoDB v1.8
What's new in MongoDB v1.8MongoDB
 

Similaire à Mongodb hackathon 02 (20)

The emerging world of mongo db csp
The emerging world of mongo db   cspThe emerging world of mongo db   csp
The emerging world of mongo db csp
 
Mongo learning series
Mongo learning series Mongo learning series
Mongo learning series
 
Mdb dn 2016_07_elastic_search
Mdb dn 2016_07_elastic_searchMdb dn 2016_07_elastic_search
Mdb dn 2016_07_elastic_search
 
MongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDBMongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDB
 
MongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDBMongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDB
 
Mongo db
Mongo dbMongo db
Mongo db
 
mongodb11 (1) (1).pptx
mongodb11 (1) (1).pptxmongodb11 (1) (1).pptx
mongodb11 (1) (1).pptx
 
Building Spring Data with MongoDB
Building Spring Data with MongoDBBuilding Spring Data with MongoDB
Building Spring Data with MongoDB
 
Getting Started - MongoDB
Getting Started - MongoDBGetting Started - MongoDB
Getting Started - MongoDB
 
MongoDB World 2019: Fast Machine Learning Development with MongoDB
MongoDB World 2019: Fast Machine Learning Development with MongoDBMongoDB World 2019: Fast Machine Learning Development with MongoDB
MongoDB World 2019: Fast Machine Learning Development with MongoDB
 
Top MongoDB interview Questions and Answers
Top MongoDB interview Questions and AnswersTop MongoDB interview Questions and Answers
Top MongoDB interview Questions and Answers
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
MongoDB
MongoDBMongoDB
MongoDB
 
Mongodb By Vipin
Mongodb By VipinMongodb By Vipin
Mongodb By Vipin
 
MongoDB
MongoDBMongoDB
MongoDB
 
1 Project 2 Introduction - the SeaPort Project seri.docx
1  Project 2 Introduction - the SeaPort Project seri.docx1  Project 2 Introduction - the SeaPort Project seri.docx
1 Project 2 Introduction - the SeaPort Project seri.docx
 
Mongo-Drupal
Mongo-DrupalMongo-Drupal
Mongo-Drupal
 
Back to Basics 2017: Mí primera aplicación MongoDB
Back to Basics 2017: Mí primera aplicación MongoDBBack to Basics 2017: Mí primera aplicación MongoDB
Back to Basics 2017: Mí primera aplicación MongoDB
 
Object Oriented Concepts and Principles
Object Oriented Concepts and PrinciplesObject Oriented Concepts and Principles
Object Oriented Concepts and Principles
 
What's new in MongoDB v1.8
What's new in MongoDB v1.8What's new in MongoDB v1.8
What's new in MongoDB v1.8
 

Dernier

The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 

Dernier (20)

The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Histor y of HAM Radio presentation slide

MongoDB Hackathon 02

  • 1. MongoDB Hackathon 02 Vivek A. Ganesan vivganes@gmail.com Big Data Gods Meetup, Santa Clara, CA May 18, 2013
  • 2. Before we start
      Copyright 2013, Vivek A. Ganesan, All rights reserved
      o A BIG thank you to our sponsors – Big Data Cloud
          o Meeting Space
          o Food + Drinks
          o Consulting/Training
  • 3. Agenda
      o Review of Hackathon 01
      o Data Modeling
      o Indexing
      o Aggregation
      o Map/Reduce
  • 4. Introduction
      o This is a hackathon, not a class
          o Which means we work on stuff together
          o Please consult and help your teammates
      o There will be labs (that’s when we learn!)
      o Talk to your teammates
          o Figure out what problem you want to solve
          o Think about your data sets and how to model them in MongoDB
  • 5. Review – MongoDB Basics
      o MongoDB is a document-oriented NoSQL data store
      o It saves data internally as Binary JSON (BSON)
      o A mongo data store may hold multiple databases
      o A database may have multiple collections (analog of tables)
      o A collection is a container of documents
      o Documents contain Key/Value pairs
          o If a document has no “_id” key, MongoDB inserts one by default
          o Users can set “_id” to any value they want, as long as it is unique within the collection
      o Documents are schema-free
          o No fixed structure to a collection
          o A collection can have documents with different key/value pairs
  • 6. Review – Shell and Clients
      o A Mongo Shell is a CLI client to MongoDB
          o Shell commands are JavaScript functions
          o You can write your own JavaScript code within the shell
          o You can also import JavaScript modules using load()
      o Mongo Shell looks for an initialization file : ~/.mongorc.js
          o Set up global variables here
      o To use your favorite editor within the Mongo shell :
          o Set the environment variable EDITOR to your editor
      o MongoDB supports clients in several programming languages :
          o JS, Java, C, C++, C#, Scala, Python, Ruby, Perl and Erlang
  • 7. Review – MongoDB Objects
      o Note : In the slides, Mongo Shell commands are in blue and output is in green
      o Mongo uses a hierarchical naming scheme for database objects
          o The current database is always in the db object
          o The db command prints the name of the current db
      o A collection called “mycollection” in the current database :
          o db.mycollection (Note : This is a MongoDB object)
      o Commands are methods invoked on objects
          o For example, to insert a document into the db.mycollection collection :
              o db.mycollection.insert command
          o For example, to find documents in the db.mycollection collection :
              o db.mycollection.find command
  • 8. Review – Create
      o First exercise :
          o Create a new database called “blog”
          o Create a collection called “users” and a collection called “posts”
      o Solution to first exercise :
          o use blog;
          o db; => blog
          o show collections; => system.indexes
          o db.createCollection("users"); => { "ok" : 1 }
          o db.createCollection("posts"); => { "ok" : 1 }
          o show collections; => posts, system.indexes, users
  • 9. Review – Insert
      o Second Exercise :
          o In the “users” collection :
              o Insert a single document, {username: "admin"}
          o In the “posts” collection :
              o Insert ten posts using a loop
              o Blog data : post_title, post_body and post_tags as CSV
      o Solution to Second Exercise :
          o db.users.insert({username : "admin"});
          o for (var i = 1; i <= 10; i++) { db.posts.insert({post_title: "Title", post_body: "Post Body", post_tags: "tag1,tag2,tag3,tag4,tag5"}); }
  • 10. Review – Updates with modifier
      o Third Exercise :
          o In the “posts” collection :
              o Update ten posts with an updated_at key and set it to the current timestamp
      o Solution to the Third Exercise :
          o Note : MongoDB replaces the entire document for an update call without a modifier (modifiers start with a ‘$’ symbol)
          o db.posts.update({}, {$set : {updated_at: new Date()}}, false, true);
          o The last two arguments are upsert (false) and multi (true); multi makes the update apply to every matching document
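The difference between a replacement update and a `$set` update can be sketched in plain JavaScript, without a database. This is only a simplified in-memory model under stated assumptions: `applyUpdate` is a hypothetical helper, not a MongoDB function, and it handles `$set` alone.

```javascript
// Simplified in-memory model of update semantics; only $set is handled.
// applyUpdate is a hypothetical helper, not part of MongoDB.
function applyUpdate(doc, update) {
  var keys = Object.keys(update);
  var hasModifier = keys.some(function (k) { return k.charAt(0) === "$"; });
  var result = {};
  if (!hasModifier) {
    // No modifier: the update document replaces everything except _id.
    result._id = doc._id;
    keys.forEach(function (k) { result[k] = update[k]; });
    return result;
  }
  // With $set: copy the existing fields, then overwrite or add the named ones.
  Object.keys(doc).forEach(function (k) { result[k] = doc[k]; });
  Object.keys(update.$set || {}).forEach(function (k) {
    result[k] = update.$set[k];
  });
  return result;
}

var post = { _id: 1, post_title: "Title", post_body: "Post Body" };
var replaced = applyUpdate(post, { updated_at: new Date() });
var modified = applyUpdate(post, { $set: { updated_at: new Date() } });

console.log(Object.keys(replaced)); // title and body are gone
console.log(Object.keys(modified)); // title and body are kept
```

In this model the replacement keeps only `_id` and `updated_at`, which is why the exercise uses `$set`.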
  • 11. Review – Selective Updates
      o Fourth Exercise :
          o In the “posts” collection :
              o Update the posts such that the first three posts have a “foo” tag (use the cursor functionality to iterate)
      o Solution to the Fourth Exercise :
          var c = db.posts.find().limit(3);
          while ( c.hasNext() ) {
              var post = c.next();
              post["post_tags"] = post["post_tags"] + ",foo";
              db.posts.save(post);
          }
  • 12. Review – Mastering find
      o In a Mongo Shell,
          o Find all posts but extract only the post_title field
              o db.posts.find({}, {post_title: 1, _id: 0});
          o List all posts in reverse order of creation (the posts have no created_on field; a default ObjectId in _id starts with a creation timestamp, so sorting on _id approximates it)
              o db.posts.find().sort({_id: -1});
          o Do the same as above but paginate in sets of three (this call returns the second page)
              o db.posts.find().sort({_id: -1}).skip(3).limit(3);
          o Find all posts that contain a tag called “foo” (a substring match on the CSV string)
              o db.posts.find({post_tags: /foo/});
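Because the `/foo/` pattern matches the characters "foo" anywhere in the CSV string, it would also match a tag such as "foobar". The plain JavaScript sketch below (runnable outside the shell) compares it with an anchored pattern that only matches a whole CSV entry. The `hasTag` helper and the sample strings are hypothetical.

```javascript
// Hypothetical sample values for the post_tags CSV field.
var tagsA = "tag1,tag2,foo";
var tagsB = "tag1,foobar,tag3";

// Substring match, as in db.posts.find({post_tags: /foo/}).
var loose = /foo/;

// Anchored match: "foo" must be a complete comma-separated entry.
var strict = /(^|,)foo(,|$)/;

// Hypothetical helper: does the CSV string match the pattern?
function hasTag(csv, pattern) {
  return pattern.test(csv);
}

console.log(hasTag(tagsA, loose), hasTag(tagsB, loose));   // both match
console.log(hasTag(tagsA, strict), hasTag(tagsB, strict)); // only tagsA matches
```

The anchored pattern can be passed to find in the same way as the loose one.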
  • 13. Review – Modifiers
      o Fifth Exercise :
          o Modify “posts” collection
              o Change the post_tags field to an array instead of a CSV list
          var c = db.posts.find();
          while ( c.hasNext() ) {
              var post = c.next();
              post["post_tags"] = post["post_tags"].split(",");
              db.posts.save(post);
          }
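If the loop above runs twice, the second pass calls split on an array and fails. A minimal sketch of a conversion that is safe to repeat is shown here in plain JavaScript. The `tagsToArray` helper is hypothetical, and trimming whitespace and dropping empty entries are assumptions not stated in the exercise.

```javascript
// Hypothetical helper: convert a CSV tag string to an array.
// Arrays are left untouched, so running it more than once is harmless.
function tagsToArray(post) {
  if (typeof post.post_tags === "string") {
    post.post_tags = post.post_tags
      .split(",")
      .map(function (t) { return t.trim(); })         // assumption: ignore stray spaces
      .filter(function (t) { return t.length > 0; }); // assumption: drop empty entries
  }
  return post;
}

var post = { post_title: "Title", post_tags: "tag1, tag2,foo" };
tagsToArray(post);
tagsToArray(post); // second call is a no-op
console.log(post.post_tags);
```

In the shell, the same function could be applied to each document inside the cursor loop before the save call.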
  • 14. Data Modeling
      o http://docs.mongodb.org/manual/core/data-modeling/
      o When to reference?
          o When it makes sense to, e.g., many-to-many relationships
          o When document size is a concern
          o Some drivers may do this automatically
      o When to embed?
          o When it is “natural”, e.g., blog post and comments
          o When there is a need for atomic operations
          o When read performance is critical
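The two choices can be illustrated with document shapes written as plain JavaScript objects. This is only a sketch: the field names (`comments`, `tag_ids`), the sample values and the `resolveTags` helper are hypothetical and not part of the original exercises.

```javascript
// Embedded: comments live inside the post document, so one read returns
// everything and a change to the post and its comments touches one document.
var embeddedPost = {
  _id: 1,
  post_title: "Title",
  comments: [
    { author: "alice", text: "Nice post" },
    { author: "bob", text: "Thanks" }
  ]
};

// Referenced: tags are separate documents; the post stores only their _id
// values, which suits a many-to-many relationship between posts and tags.
var tags = [
  { _id: "t1", name: "mongodb" },
  { _id: "t2", name: "nosql" }
];
var referencedPost = { _id: 2, post_title: "Title", tag_ids: ["t1", "t2"] };

// Hypothetical helper: resolving references needs a second lookup
// (done in memory here; against a database it would be a second query).
function resolveTags(post, allTags) {
  return allTags
    .filter(function (t) { return post.tag_ids.indexOf(t._id) !== -1; })
    .map(function (t) { return t.name; });
}

console.log(embeddedPost.comments.length);
console.log(resolveTags(referencedPost, tags));
```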
  • 15. Lab 01 – Model your data set
      o Break – 15 minutes
      o Lab 01 – 45 minutes - With your team :
          o Look at your data set and figure out how you will model it
          o How would you bulk load the data?
          o How would you handle errors while loading?
          o Implement the schema for your data set
          o Bulk load a small portion of your data set
          o Verify the load and also run some sample queries
          o Figure out what queries you would run frequently
  • 16. Indexes
      o http://docs.mongodb.org/manual/core/indexes/
      o When to index?
          o Improve find performance
          o Improve sort performance
          o Note : There is a performance impact for writes
      o What to index?
          o Depends on the query
          o Usually, most frequently searched for fields
          o Sometimes, fields in embedded documents as well
  • 17. Types of Indexes and Options
      o Unique indexes (_id has a unique index by default)
      o Simple
      o Compound Indexes
          o Prefix order is important!
      o Text indexes
      o Sparse Indexes
      o Multi-key indexes (for arrays)
      o Geospatial and Geohaystack indexes
      o Indexes can be built in the background (recommended!)
      o Indexes can be named explicitly (definitely recommended!)
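The prefix rule for compound indexes can be sketched in plain JavaScript. This is a deliberately simplified model, not the behavior of the real query planner, which weighs more cases. The helpers `indexPrefixes` and `matchesSomePrefix` and the field names `author` and `created_on` are hypothetical.

```javascript
// Hypothetical helper: list the prefixes of a compound index,
// given its keys in index order.
function indexPrefixes(keys) {
  var prefixes = [];
  for (var i = 1; i <= keys.length; i++) {
    prefixes.push(keys.slice(0, i));
  }
  return prefixes;
}

// Hypothetical helper: true when the queried fields are exactly the fields
// of some index prefix. Assumes queryFields holds distinct field names;
// their order within the query does not matter.
function matchesSomePrefix(keys, queryFields) {
  if (queryFields.length === 0 || queryFields.length > keys.length) {
    return false;
  }
  var prefix = keys.slice(0, queryFields.length);
  return queryFields.every(function (f) { return prefix.indexOf(f) !== -1; });
}

// Hypothetical compound index on (author, created_on, post_title).
var indexKeys = ["author", "created_on", "post_title"];

console.log(indexPrefixes(indexKeys));
console.log(matchesSomePrefix(indexKeys, ["author"]));               // a prefix
console.log(matchesSomePrefix(indexKeys, ["created_on", "author"])); // a prefix
console.log(matchesSomePrefix(indexKeys, ["created_on"]));           // not a prefix
```

Under this model, a query on `created_on` alone is not served by the index, so the leading field should be the one most queries filter on.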
  • 18. Lab 02 – Indexes
      o Lab 02 – 30 minutes - With your team :
          o Look at the frequent queries from Lab 01 and :
              o Which would you index and why?
              o What kind of indexes are needed?
              o Since this is predominantly a read use case, index away
          o Would you use the sparse index? For what and how?
          o Would you use the geospatial index? For what and how?
          o Would you use the TTL index? For what and how?
  • 19. Aggregation
      o Used for “group by”-like queries
      o Aggregation Framework (introduced in the 2.1 development series, released as stable in 2.2)
          o http://docs.mongodb.org/manual/aggregation/
      o Simple count : db.posts.count();
      o Using Aggregation Framework : db.posts.aggregate([{ $group: { _id: null, count: {$sum: 1}}}]);
      o Check the reference for comparison with SQL group by
      o Still supports Map/Reduce (older approach and still relevant)
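What a `$group` stage with `{$sum: 1}` computes can be sketched in plain JavaScript over an in-memory array. This is an illustration of the grouping idea only, not how the server executes a pipeline. The `groupCount` helper, the `author` field and the sample documents are hypothetical.

```javascript
// Hypothetical helper: count documents per group key, which is the result a
// $group stage with count: {$sum: 1} produces.
function groupCount(docs, keyOf) {
  var counts = Object.create(null); // no inherited keys to collide with
  docs.forEach(function (doc) {
    var key = String(keyOf(doc));
    counts[key] = (counts[key] || 0) + 1;
  });
  return counts;
}

// Hypothetical sample documents.
var posts = [
  { author: "admin", post_title: "Title" },
  { author: "admin", post_title: "Title" },
  { author: "guest", post_title: "Title" }
];

// A constant key puts every document in one group, like _id: null.
var total = groupCount(posts, function () { return null; });

// A per-field key gives one count per distinct value, like _id: "$author".
var byAuthor = groupCount(posts, function (doc) { return doc.author; });

console.log(total["null"]);
console.log(byAuthor.admin, byAuthor.guest);
```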
  • 20. Lab 03 – Aggregation
      o Lab 03 – 30 minutes - With your team :
          o Figure out what aggregations to run on the data set :
              o For example, average rating per user?
              o Or, average number of movies rated by all users?
          o Write the queries for these aggregations and test them
          o Are indexes helpful in aggregations? Why/Why not?
          o Are you better off just doing these in your client code? Why/Why not?
          o When would you use pipelined aggregations?
  • 21. Map/Reduce
      o Scatter/Gather framework
      o db.collection.mapReduce(map_fn, red_fn, {out: output_coll})
      o http://docs.mongodb.org/manual/aggregation/
      o Mapper – just emits key/value pairs
      o Framework – Groups and sorts mapper output => Reducer
      o Reducer – Applies a function on the input => Output Coll.
      o Distributed computation framework for full table scans
      o http://docs.mongodb.org/manual/tutorial/map-reduce-examples/
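The mapper, grouping and reducer steps can be sketched with an in-memory emulation in plain JavaScript. It is not the MongoDB implementation: this `mapReduce` is a hypothetical stand-in for db.collection.mapReduce, and it passes `emit` to the mapper as an argument, whereas in the shell `emit` is available to the map function directly. As in MongoDB, each document is bound to `this` in the mapper and the reducer is skipped for keys with a single value.

```javascript
// Hypothetical in-memory stand-in for db.collection.mapReduce.
function mapReduce(docs, mapFn, reduceFn) {
  var groups = Object.create(null);

  // Mapper output is grouped by key as it is emitted.
  function emit(key, value) {
    var k = String(key);
    if (!groups[k]) { groups[k] = []; }
    groups[k].push(value);
  }

  // Map phase: each document is bound to `this` inside the mapper.
  docs.forEach(function (doc) { mapFn.call(doc, emit); });

  // Reduce phase: keys are sorted; single-value keys skip the reducer.
  return Object.keys(groups).sort().map(function (k) {
    var values = groups[k];
    return {
      _id: k,
      value: values.length === 1 ? values[0] : reduceFn(k, values)
    };
  });
}

// Mapper: emit (tag, 1) for every tag of a post (post_tags as an array).
function mapTags(emit) {
  this.post_tags.forEach(function (tag) { emit(tag, 1); });
}

// Reducer: sum the emitted counts for one tag.
function sumValues(key, values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// Hypothetical sample posts.
var posts = [
  { post_tags: ["mongodb", "nosql"] },
  { post_tags: ["mongodb"] },
  { post_tags: ["nosql", "foo", "mongodb"] }
];

var tagCounts = mapReduce(posts, mapTags, sumValues);
console.log(tagCounts);
```

The result is one `{_id, value}` entry per tag, the same shape as the documents written to a map/reduce output collection.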
  • 22. Lab 04 – Map/Reduce
      o Lab 04 – 30 minutes - With your team :
          o Go through the Map/Reduce examples
          o Figure out what Map/Reduce functions you would use
          o Implement these functions (on a small data set)
      o Some things to think about :
          o Can you use Map/Reduce to “seed” your recommendations?
          o Can you use incremental Map/Reduce to “update” your recommendations? How would you do this?
  • 23. Questions? Comments? Thank You!
      E-mail: vivganes@gmail.com
      Twitter : onevivek