Data Modeling for Performance

Michael Dwan, Snapjoy
Mongo Boulder, January 21, 2010
i’m michael dwan
 @michaeldwan on the twitter
the project
  Company X
• find business details (web + api)
• search by category/keyword + geo (web + api)
• update (api)



                                   application spec
• 15,000,000 businesses
• 100,000 geo areas
• 30,000 partners
• 100,000,000 tags
• 2,300 categories
• 2,000,000 requests daily
• 24,000,000 urls in sitemap

                          why is this interesting?
• infrequent changes
• monthly updates w/ 12M monthly changes
• “zero downtime”



                                           updates
the problem
 mo’ data, mo’ problems
complexity
[diagram: relational schema with tables providers, mappings, phone_numbers, zips, assets, businesses_phone_numbers, cities, categorizations, businesses, states, categories, businesses_neighborhoods, taggings, users, tags, neighborhoods]
architecture
read performance
downtime
solr
solr getting fussy
downtime
migrations
the solution
> gem install acts_as_web_scale
{
    "_id" : ObjectId("4ce838ef4a882579960001b9"),
    "name" : "Acme Glass Co",
    "tagline" : "Your trusty glass hole",
    "description" : "Glass repair...",
    "hours" : "Mon Fri 8 5",
    "url" : "http://acmeglasshole.biz"
}

                                        a business...
{
    "_id" : ObjectId("4ce838ef4a882579960001b9"),
    "name" : "Acme Glass Co",
    "tagline" : "Your trusty glass hole",
    "description" : "Glass repair...",
    "hours" : "Mon Fri 8 5",
    "url" : "http://acmeglasshole.biz",
    "phone_numbers" : [
      "5035550091",
      "8005555456"
    ]
}


            a business... has many phone numbers
{
    "_id" : ObjectId("4ce838ef4a882579960001b9"),
    "name" : "Acme Glass Co",
    "tagline" : "Your trusty glass hole",
    "description" : "Glass repair...",
    "hours" : "Mon Fri 8 5",
    "url" : "http://acmeglasshole.biz",
    "phone_numbers" : [
       "5035550091",
       "8005555456"
    ],
    "coordinates" : [
       45.559294,
       -122.644053
    ]
}

                      a business... has coordinates
{
    ...
    "url" : "http://acmeglasshole.biz",
    "phone_numbers" : [
       "5035550091",
       "8005555456"
    ],
    "coordinates" : [
       45.559294,
       -122.644053
    ],
    "tags" : [
       "glass",
       "mirrors",
       "flat glass"
    ]
}

                        a business... has many tags
{
    ...
    "coordinates" : [
       45.559294,
       -122.644053
    ],
    "tags" : [
       "glass",
       "mirrors",
       "flat glass"
    ],
    "location" : {
       "street_address" : "2035 NE Alberta St"
    }
}

                         a business... has an address
belongs to?
{
    "_id" : ObjectId("4ce82937961552247900000f"),
    "name" : "Illinois",
    "slug" : "il",
    ...
}




                                             a state
{
    ...
    "tags" : [
       "glass",
       "mirrors",
       "flat glass"
    ],
    "location" : {
       "street_address" : "2035 NE Alberta St",
       "state" : {
         "_id" : ObjectId("4ce829379615522479000026"),
         "meta" : {
            "slug" : "or"
         },
         "display_name" : "Oregon"
       }
    }
}

                     a business... belongs to a state
{
    ...
    "location" : {
        ...
        "state" : {
           "_id" : ObjectId("4ce829379615522479000026"),
           "meta" : {
              "slug" : "or"
           },
           "display_name" : "Oregon"
        },
        "city" : {
           "_id" : ObjectId("4ce82abdd3dfaa10f8006faa"),
           "meta" : {
              "slug" : "portland"
           },
           "display_name" : "Portland, OR"
        }
    }
}

                          a business... belongs to a city
{
    ...
    "location" : {
        ...
        "city" : {
           "_id" : ObjectId("4ce82abdd3dfaa10f8006faa"),
           "meta" : {
              "slug" : "portland"
           },
           "display_name" : "Portland, OR"
        },
        "zip" : {
           "_id" : ObjectId("4ce82c29d3dfaa116b006dfa"),
           "display_name" : "97211"
        }
    }
}

                     a business... belongs to a zip code
many-to-many?
{
    "_id" : ObjectId("4ce82e64d3dfaa16360014eb"),
    "name" : "Auto Glass",
    "slug" : "3063-auto-glass",
    "tags" : [
       "windshields"
    ],
    ...
}




                                       a category
{
    ...
    "location" : {
        ...
    },
    "categories" : [
       {
          "_id" : ObjectId("4ce82e50d3dfaa16360004f2"),
          "meta" : {
             "slug" : "282-glass",
             "tags" : [ "windows" ]
          },
          "display_name" : "Glass"
       },
       {
          "_id" : ObjectId("4ce82e64d3dfaa16360014eb"),
          "meta" : {
             "slug" : "3063-auto-glass",
             "tags" : [ "windshields" ]
          },
          "display_name" : "Auto Glass"
       }
    ]
}

               a business... belongs to many categories
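
The embedded state, city, zip, and category entries all share one shape: the referenced document's _id, an optional meta sub-document, and a display_name. A minimal sketch of how such a stub might be built at write time, in plain JavaScript. The helper name embedRef, the metaFields parameter, and the use of name as the display name are assumptions for illustration (the city stub above shows "Portland, OR", so the real display name was probably computed differently); ids are plain strings here so the sketch runs without a MongoDB driver.

```javascript
// Hypothetical helper (not from the talk): copy only the fields a business
// needs from a referenced document into a small embedded stub.
function embedRef(doc, metaFields = ["slug"]) {
  const meta = {};
  for (const field of metaFields) {
    if (doc[field] !== undefined) meta[field] = doc[field];
  }
  // Assumption: display_name is taken straight from the document's name.
  return { _id: doc._id, meta, display_name: doc.name };
}

// A category shaped like the one shown on the "a category" slide.
const autoGlass = {
  _id: "4ce82e64d3dfaa16360014eb",
  name: "Auto Glass",
  slug: "3063-auto-glass",
  tags: ["windshields"],
};

// The stub that would be pushed onto a business's "categories" array.
const embeddedCategory = embedRef(autoGlass, ["slug", "tags"]);
```

The trade-off is that renaming a category means rewriting every business that embeds it, which is tolerable here because the data changes infrequently.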
queries & indexes
    know what you want
#1 find a business
    I want *that* one
// single business
db.businesses.findOne({
   _id: ObjectId("4ce838ef4a882579960001b9")
})




                                 find a business
#2 find by location
  Businesses in San Francisco, CA
// find all within state
db.businesses.find({
   "location.state._id": ObjectId("4ce82937961552247900000f")
})

// find all within city
db.businesses.find({
   "location.city._id": ObjectId("4ce82aa0d3dfaa10f8004a95")
})

// find all within zip
db.businesses.find({
   "location.zip._id": ObjectId("4ce82b5ed3dfaa116b0026f0")
})




                       find businesses by state/city/zip
// the indexes
db.businesses.ensureIndex({"location.city._id": 1})
db.businesses.ensureIndex({"location.zip._id": 1})

1.5GB each

skip “location.state._id” -- only 51 possibilities

                                                 indexes
#3 find by category
 Businesses in the Auto Repair category
// find by category id
db.businesses.find({
   "categories._id": ObjectId("4ce82e50d3dfaa16360004f2")
})


// the index
db.businesses.ensureIndex({
   "categories._id":1
})




                               businesses by category
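
The query on "categories._id" works because a dotted path that crosses an array matches when any element matches. A simplified emulation of that behaviour in plain JavaScript, for equality only. The function names valuesAtPath and matchesPath are hypothetical, and this is a sketch of the matching rule, not MongoDB's implementation.

```javascript
// Collect every value reachable by following `parts` from `value`.
// When a step lands on an array, every element is tried.
function valuesAtPath(value, parts) {
  if (parts.length === 0) return Array.isArray(value) ? value : [value];
  if (Array.isArray(value)) {
    return value.flatMap((item) => valuesAtPath(item, parts));
  }
  if (value === null || typeof value !== "object") return [];
  return valuesAtPath(value[parts[0]], parts.slice(1));
}

// True when any value at the dotted path equals `expected`.
function matchesPath(doc, path, expected) {
  return valuesAtPath(doc, path.split(".")).includes(expected);
}

// Ids are plain strings here so the sketch runs without a driver.
const business = {
  name: "Acme Glass Co",
  location: { city: { _id: "4ce82abdd3dfaa10f8006faa" } },
  categories: [
    { _id: "4ce82e50d3dfaa16360004f2", display_name: "Glass" },
    { _id: "4ce82e64d3dfaa16360014eb", display_name: "Auto Glass" },
  ],
};

const inAutoGlass = matchesPath(business, "categories._id", "4ce82e64d3dfaa16360014eb");
```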
#4 - find by category + location
   Businesses in the Plumbing category in Chicago, IL
// find by city id and category id
db.businesses.find({
   "location.city._id": ObjectId("4ce82aa0d3dfaa10f8004a95"),
   "categories._id": ObjectId("4ce82e50d3dfaa16360004f2")
})




                         businesses by category + city
// city id
{"location.city._id":1}

         ~ or ~

// category id
{"categories._id":1}

answer: both suck
we need a compound index

         which index should we use?
db.businesses.ensureIndex({
    "location.city._id" : 1, "categories._id" : 1
})

                     ~ or ~

db.businesses.ensureIndex({
    "categories._id" : 1, "location.city._id" : 1
})

35,000 cities & 2,500 categories

answer: cities → categories
create one for zip codes and categories too!

                                          which order?
{"location.city._id" : 1}
{"location.city._id" : 1, "categories._id" : 1}

answer: yes -- the compound index starts with city id, so it also serves city-only queries

db.businesses.dropIndex("location.city._id_1")

              don’t we have 2 indexes on city id?
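
Why the single-field city index is redundant can be sketched with an in-memory stand-in for a compound index: entries sorted by city, then category. All entries for one city sit next to each other, so a city-only lookup scans the same structure. The function names and the string ids below are assumptions for illustration. A sorted array is only a model of the idea, not how MongoDB stores its indexes.

```javascript
// Compare an index entry against a key prefix, field by field.
function comparePrefix(entry, prefix) {
  for (let i = 0; i < prefix.length; i++) {
    if (entry[i] < prefix[i]) return -1;
    if (entry[i] > prefix[i]) return 1;
  }
  return 0;
}

// Build [cityId, categoryId, businessId] entries sorted by city, then category.
function buildIndex(businesses) {
  const entries = [];
  for (const b of businesses) {
    for (const categoryId of b.categories) {
      entries.push([b.city, categoryId, b.id]);
    }
  }
  return entries.sort((a, b) => comparePrefix(a, [b[0], b[1]]));
}

// Binary search for the first entry whose leading fields are >= prefix.
function lowerBound(entries, prefix) {
  let lo = 0;
  let hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (comparePrefix(entries[mid], prefix) < 0) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Walk the contiguous run of entries that match the prefix.
// (A real database would also de-duplicate businesses listed twice.)
function scan(entries, prefix) {
  const ids = [];
  for (let i = lowerBound(entries, prefix); i < entries.length; i++) {
    if (comparePrefix(entries[i], prefix) !== 0) break;
    ids.push(entries[i][2]);
  }
  return ids;
}

const cityCategoryIndex = buildIndex([
  { id: "b1", city: "chicago", categories: ["plumbing"] },
  { id: "b2", city: "portland", categories: ["glass", "auto-glass"] },
  { id: "b3", city: "chicago", categories: ["glass"] },
]);

const inChicago = scan(cityCategoryIndex, ["chicago"]);           // city only
const chicagoGlass = scan(cityCategoryIndex, ["chicago", "glass"]); // city + category
```

The reverse does not hold: a category-only lookup has no contiguous range in this ordering, which is why the separate index on "categories._id" is kept.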
#5 - find by keyword
  “something awesome” in Boulder, CO
{
    "_id" : ObjectId("4ce838ef4a882579960001b9"),
    "name" : "Acme Glass Co",
    "keywords" : [
      "glass",
      "repair",
      "acme",
      ...
    ]
}



db.businesses.ensureIndex({
   "location.city._id":1,
   "keywords":1
})



db.businesses.find({
   "location.city._id":ObjectId("4ce82aa0d3dfaa10f8004a95"),
   "keywords":/glass/i
})




             find businesses in city by keyword
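
The talk does not show how the keywords array was produced. One plausible approach is to tokenize the business text at write time, sketched below in plain JavaScript. The function name extractKeywords and its rules (lowercase, ASCII letters and digits only, drop one-character tokens) are assumptions. Storing lowercased tokens would also allow an exact match on "glass" instead of /glass/i, which matters because an unanchored, case-insensitive pattern generally cannot narrow an index scan.

```javascript
// Hypothetical tokenizer: lowercase, split on anything that is not an ASCII
// letter or digit, drop empty and one-character tokens, and de-duplicate
// while keeping first-seen order.
function extractKeywords(...texts) {
  const seen = new Set();
  for (const text of texts) {
    for (const token of text.toLowerCase().split(/[^a-z0-9]+/)) {
      if (token.length > 1) seen.add(token);
    }
  }
  return [...seen];
}

const keywords = extractKeywords("Acme Glass Co", "Glass repair...");
// keywords is ["acme", "glass", "co", "repair"]
```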
me: we’re switching from postgres+solr to mongo
kyle: oh wow, you can replace solr with mongo?
me: with some creativity
kyle: seems like it’d still be hard to get just right
me: it works well
kyle: gotcha



                                chat with Kyle Banker
i was wrong, kyle was right
I’ll never leave you again

...until MongoDB supports full text later this year
                      :)
aggregation
map/reduce to the rescue
sitemaps
big list of every url
• xml files containing each unique url ~ 24M
• 50,000 urls per file, about 500 files
• urls are generated from live data
• http://companyx.com/sitemaps/1.xml


                                              sitemaps
>> "hello!".hash % 6 #=> 5

>> "/ny/new-york/c/apartments".hash % 6 #=> 5

returns an integer from 0 up to, but not including, the number specified

                   partition by consistent hash
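
A sketch of the same partitioning step in JavaScript. Ruby's String#hash is typically seeded per process in current Ruby versions, so a partitioner that must give the same answer on every run needs a deterministic hash. FNV-1a is used here purely as an example, and the choice of hash, the function names, and the shift to 1-based partition numbers (matching the map slide) are all assumptions, not the talk's implementation.

```javascript
// 32-bit FNV-1a computed over the string's UTF-16 code units.
// Deterministic across runs and machines.
function fnv1a(str) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep unsigned
  }
  return hash >>> 0;
}

// Map a url path to a partition number in 1..partitions.
function partitionFor(url, partitions) {
  return (fnv1a(url) % partitions) + 1;
}

const examplePartition = partitionFor("/ny/new-york/c/apartments", 6);
```

With about 24M urls and 50,000 urls per file, the partition count would be in the region of 500 rather than 6. A plain modulo does not balance partitions exactly, so some headroom below the per-file limit is advisable.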
1. map each url in the site to a partition
2. reduce all partitions to a single document containing
   all urls in that partition
3. save to a permanent collection




                                             map/reduce
/il/chicago/c/pizza                       4
/ny/new-york/c/apartments                 1
/nd/rugby/c/apartments                    6
/14076500-bayside-marina                  2
/13401000-comtrak-logistics-inc           3
/12347500-allstate-auto-insurance         1
/il/downers-grove/c/computer-web-design   6
/1009500-heidelberg-lodges                5
/mn/redwood-falls/c/food-service          4
/14077000-bank-of-america                 5
/mn/savage/c/audio-visual-equipment       1
...

(each url → its partition, 1 through 6)

                                             map
{
    "total" : 2,
    "urls" : [
      "/12347500-allstate-auto-insurance",
      "/ny/new-york/c/apartments"
    ]
}

{
    "total" : 1,
    "urls" : [
      "/mn/savage/c/audio-visual-equipment"
    ]
}

{
    "_id" : 1,
    "value" : {
      "total" : 3,
      "urls" : [
        "/12347500-allstate-auto-insurance",
        "/mn/savage/c/audio-visual-equipment",
        "/ny/new-york/c/apartments"
      ]
    }
}

                                             reduce
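
The map and reduce steps can be sketched in plain JavaScript with a tiny in-memory driver standing in for the database. Everything here is an assumption for illustration: the function names, the toy partitioner, and the driver are not the talk's code. The one property carried over from real map/reduce is that reduce returns the same shape it receives, because the database may call reduce more than once for the same key.

```javascript
// Toy partitioner for the sketch only: sum of character codes modulo the
// partition count, shifted to 1..partitions.
function toyPartition(url, partitions) {
  let sum = 0;
  for (let i = 0; i < url.length; i++) sum += url.charCodeAt(i);
  return (sum % partitions) + 1;
}

// map: one (partition, value) pair per url.
function mapUrl(url, partitions) {
  return { key: toyPartition(url, partitions), value: { total: 1, urls: [url] } };
}

// reduce: merge all values for one key. The result has the same shape as
// each input value, so it can be fed back into reduce.
function reduceUrls(key, values) {
  const result = { total: 0, urls: [] };
  for (const v of values) {
    result.total += v.total;
    for (const url of v.urls) result.urls.push(url);
  }
  return result;
}

// In-memory stand-in for the map/reduce runner: group by key, reduce each
// group, and return documents shaped like the sitemaps collection.
function buildSitemaps(urls, partitions) {
  const grouped = new Map();
  for (const url of urls) {
    const { key, value } = mapUrl(url, partitions);
    if (!grouped.has(key)) grouped.set(key, []);
    grouped.get(key).push(value);
  }
  const docs = [];
  for (const [key, values] of grouped) {
    docs.push({ _id: key, value: reduceUrls(key, values) });
  }
  return docs.sort((a, b) => a._id - b._id);
}

const sitemapDocs = buildSitemaps(
  [
    "/12347500-allstate-auto-insurance",
    "/mn/savage/c/audio-visual-equipment",
    "/ny/new-york/c/apartments",
  ],
  6
);
```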
db.sitemaps.findOne({_id:1}).value.urls




[
    "/12347500-allstate-auto-insurance",
    "/mn/savage/c/audio-visual-equipment",
    "/ny/new-york/c/apartments"
]




                                             usage
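
Once the url list for a partition has been fetched, serving a file such as http://companyx.com/sitemaps/1.xml is mostly string building. A minimal sketch, assuming the standard sitemaps.org urlset format. The function names are hypothetical and the talk does not show its rendering code.

```javascript
// Escape the characters that are special in XML text.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Render one sitemap file from a base url and a list of paths.
function renderSitemap(baseUrl, paths) {
  const lines = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
  ];
  for (const path of paths) {
    lines.push(`  <url><loc>${escapeXml(baseUrl + path)}</loc></url>`);
  }
  lines.push("</urlset>");
  return lines.join("\n");
}

// The paths would come from db.sitemaps.findOne({_id:1}).value.urls.
const sitemapXml = renderSitemap("http://companyx.com", [
  "/12347500-allstate-auto-insurance",
  "/mn/savage/c/audio-visual-equipment",
  "/ny/new-york/c/apartments",
]);
```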
wrap up
115ms average response times


                        2 months later
thank you
 @michaeldwan

[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Data Modeling for Performance

  • 1. Data Modeling for Performance Mongo Boulder Michael Dwan January 21, 2010 Snapjoy
  • 2. i’m michael dwan @michaeldwan on the twitter
  • 3. the project Company X
  • 4. • find business details (web + api) • search by category/keyword + geo (web + api) • update (api) application spec
  • 5. 100,000 geo areas • 30,000 partners • 100,000,000 tags • 2,300 categories • 15,000,000 businesses • 2,000,000 requests daily • 24,000,000 urls in sitemap why is this interesting?
  • 6. • infrequent changes • monthly updates w/ 12M monthly changes • “zero downtime” updates
  • 7. the problem mo’ data, mo’ problems
  • 9. providers mappings phone_numbers zips assets businesses_phone_numbers cities categorizations businesses states categories businesses_neighborhoods taggings users tags neighborhoods
  • 10. x xx x architecture
  • 12. downtime solr
  • 14. downtime migrations
  • 16. > gem install acts_as_web_scale
  • 19. { "_id" : ObjectId("4ce838ef4a882579960001b9"), "name" : "Acme Glass Co", "tagline" : "Your trusty glass hole", "description" : "Glass repair...", "hours" : "Mon Fri 8 5", "url" : "http://acmeglasshole.biz", } a business...
  • 20. { "_id" : ObjectId("4ce838ef4a882579960001b9"), "name" : "Acme Glass Co", "tagline" : "Your trusty glass hole", "description" : "Glass repair...", "hours" : "Mon Fri 8 5", "url" : "http://acmeglasshole.biz", } a business... has many phone numbers
  • 21. { "_id" : ObjectId("4ce838ef4a882579960001b9"), "name" : "Acme Glass Co", "tagline" : "Your trusty glass hole", "description" : "Glass repair...", "hours" : "Mon Fri 8 5", "url" : "http://acmeglasshole.biz", "phone_numbers" : [ "5035550091", "8005555456" ] } a business... has many phone numbers
  • 22. "_id" : ObjectId("4ce838ef4a882579960001b9"), "name" : "Acme Glass Co", "tagline" : "Your trusty glass hole", "description" : "Glass repair...", "hours" : "Mon Fri 8 5", "url" : "http://acmeglasshole.biz", "phone_numbers" : [ "5035550091", "8005555456" ] } a business... has coordinates
  • 23. "_id" : ObjectId("4ce838ef4a882579960001b9"), "name" : "Acme Glass Co", "tagline" : "Your trusty glass hole", "description" : "Glass repair...", "hours" : "Mon Fri 8 5", "url" : "http://acmeglasshole.biz", "phone_numbers" : [ "5035550091", "8005555456" ], "coordinates" : [ 45.559294, -122.644053 ] } a business... has coordinates
  • 24. "url" : "http://acmeglasshole.biz", "phone_numbers" : [ "5035550091", "8005555456" ], "coordinates" : [ 45.559294, -122.644053 ] } a business... has many tags
  • 25. "url" : "http://acmeglasshole.biz", "phone_numbers" : [ "5035550091", "8005555456" ], "coordinates" : [ 45.559294, -122.644053 ], "tags" : [ "glass", "mirrors", "flat glass" ] } a business... has many tags
  • 26. "coordinates" : [ 45.559294, -122.644053 ], "tags" : [ "glass", "mirrors", "flat glass" ] } a business... has an address
  • 27. "coordinates" : [ 45.559294, -122.644053 ], "tags" : [ "glass", "mirrors", "flat glass" ], "location" : { "street_address" : "2035 NE Alberta St" } } a business... has an address
  • 29. { "_id" : ObjectId("4ce82937961552247900000f"), "name" : "Illinois", "slug" : "il", ... } a state
  • 30. "coordinates" : [ 45.559294, -122.644053 ], "tags" : [ "glass", "mirrors", "flat glass" ], "location" : { "street_address" : "2035 NE Alberta St" } } a business... belongs to a state
  • 31. "tags" : [ "glass", "mirrors", "flat glass" ], "location" : { "street_address" : "2035 NE Alberta St" } } a business... belongs to a state
  • 32. "tags" : [ "glass", "mirrors", "flat glass" ], "location" : { "street_address" : "2035 NE Alberta St", "state" : { "_id" : ObjectId("4ce829379615522479000026"), "meta" : { "slug" : "or" }, "display_name" : "Oregon" } } } a business... belongs to a state
  • 33. "state" : { "_id" : ObjectId("4ce829379615522479000026"), "meta" : { "slug" : "or" }, "display_name" : "Oregon" } } } a business... belongs to a city
  • 34. "state" : { "_id" : ObjectId("4ce829379615522479000026"), "meta" : { "slug" : "or" }, "display_name" : "Oregon" }, "city" : { "_id" : ObjectId("4ce82abdd3dfaa10f8006faa"), "meta" : { "slug" : "portland", }, "display_name" : "Portland, OR" }, } } a business... belongs to a city
  • 35. }, "display_name" : "Oregon" }, "city" : { "_id" : ObjectId("4ce82abdd3dfaa10f8006faa"), "meta" : { "slug" : "portland", }, "display_name" : "Portland, OR" }, } } a business... belongs to a zip code
  • 36. }, "display_name" : "Oregon" }, "city" : { "_id" : ObjectId("4ce82abdd3dfaa10f8006faa"), "meta" : { "slug" : "portland", }, "display_name" : "Portland, OR" }, "zip" : { "_id" : ObjectId("4ce82c29d3dfaa116b006dfa"), "display_name" : "97211" } } } a business... belongs to a zip code
  • 38. { "_id" : ObjectId("4ce82e64d3dfaa16360014eb"), "name" : "Auto Glass", "slug" : "3063-auto-glass", "tags" : [ "windshields" ], ... } a category
  • 39. "meta" : { "slug" : "or" }, "display_name" : "Oregon" }, "city" : { "_id" : ObjectId("4ce82abdd3dfaa10f8006faa"), "meta" : { "slug" : "portland", }, "display_name" : "Portland, OR" }, "zip" : { "_id" : ObjectId("4ce82c29d3dfaa116b006dfa"), "display_name" : "97211" } } } a business... belongs to a zip code
  • 40. } } } a business... belongs to many categories
  • 41. } }, "categories" : [ { "_id" : ObjectId("4ce82e50d3dfaa16360004f2"), "meta" : { "slug" : "282-glass", "tags" : [ "windows" ], }, "display_name" : "Glass" }, { "_id" : ObjectId("4ce82e64d3dfaa16360014eb"), "meta" : { "slug" : "3063-auto-glass", "tags" : [ "windshields" ], }, "display_name" : "Auto Glass" } ] } a business... belongs to many categories
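Slides 19 through 41 build the business document one fragment at a time. Pieced together, the embedded model might look like the sketch below. The field values are taken from the slides; the ObjectId values are written as plain hex strings so the snippet runs outside the mongo shell, and `valuesAt` is a hypothetical helper (not part of MongoDB) that only imitates how dotted paths such as "categories._id" reach into embedded documents and arrays.

```javascript
// Sketch of the fully embedded business document, assembled from the slide fragments.
// ObjectIds are shown as hex strings (an assumption made so this runs in plain Node).
const business = {
  _id: "4ce838ef4a882579960001b9",
  name: "Acme Glass Co",
  tagline: "Your trusty glass hole",
  description: "Glass repair...",
  hours: "Mon Fri 8 5",
  url: "http://acmeglasshole.biz",
  phone_numbers: ["5035550091", "8005555456"],
  coordinates: [45.559294, -122.644053],
  tags: ["glass", "mirrors", "flat glass"],
  location: {
    street_address: "2035 NE Alberta St",
    state: { _id: "4ce829379615522479000026", meta: { slug: "or" }, display_name: "Oregon" },
    city: { _id: "4ce82abdd3dfaa10f8006faa", meta: { slug: "portland" }, display_name: "Portland, OR" },
    zip: { _id: "4ce82c29d3dfaa116b006dfa", display_name: "97211" },
  },
  categories: [
    { _id: "4ce82e50d3dfaa16360004f2", meta: { slug: "282-glass", tags: ["windows"] }, display_name: "Glass" },
    { _id: "4ce82e64d3dfaa16360014eb", meta: { slug: "3063-auto-glass", tags: ["windshields"] }, display_name: "Auto Glass" },
  ],
};

// Hypothetical helper: collect the values found at a dotted path,
// stepping into each element when an intermediate value is an array.
function valuesAt(doc, path) {
  return path.split(".").reduce(
    (values, key) =>
      values
        .flatMap((v) => (Array.isArray(v) ? v : [v]))
        .map((v) => v?.[key])
        .filter((v) => v !== undefined),
    [doc]
  );
}

console.log(valuesAt(business, "location.city._id")); // one city id
console.log(valuesAt(business, "categories._id"));    // one id per embedded category
```

Everything needed to render a business page (names, slugs, display names) lives in this one document, which is why the single `findOne` on slide 44 is enough.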
  • 42. queries & indexes know what you want
  • 43. #1 find a business I want *that* one
  • 44. // single business db.businesses.findOne({ _id: ObjectId("4ce838ef4a882579960001b9") }) find a business
  • 45. #2 find by location Businesses in San Francisco, CA
  • 46. // find all within state db.businesses.find({ "location.state._id": ObjectId("4ce82937961552247900000f") }) find businesses by state/city/zip
  • 47. // find all within state db.businesses.find({ "location.state._id": ObjectId("4ce82937961552247900000f") }) // find all within city db.businesses.find({ "location.city._id": ObjectId("4ce82aa0d3dfaa10f8004a95") }) find businesses by state/city/zip
  • 48. // find all within state db.businesses.find({ "location.state._id": ObjectId("4ce82937961552247900000f") }) // find all within city db.businesses.find({ "location.city._id": ObjectId("4ce82aa0d3dfaa10f8004a95") }) // find all within zip db.businesses.find({ "location.zip._id": ObjectId("4ce82b5ed3dfaa116b0026f0") }) find businesses by state/city/zip
  • 49. // the indexes db.businesses.ensureIndex({"location.city._id": 1}) db.businesses.ensureIndex({"location.zip._id": 1}) 1.5GB each skip “location.state._id” -- only 51 possibilities indexes
  • 50. #3 find by category Businesses in the Auto Repair category
  • 51. // find by category id db.businesses.find({ "categories._id": ObjectId("4ce82e50d3dfaa16360004f2") }) // the index db.businesses.ensureIndex({ "categories._id":1 }) businesses by category
  • 52. #4 - find by category + location Businesses in the Plumbing category in Chicago, IL
  • 53. // find by city id and category id db.businesses.find({ "location.city._id": ObjectId("4ce82aa0d3dfaa10f8004a95"), "categories._id": ObjectId("4ce82e50d3dfaa16360004f2") }) businesses by category + city
  • 54. // city id {"location.city._id":1} ~ or ~ // category id {"categories._id":1} answer: both suck we need a compound index which index should we use?
  • 55. db.businesses.ensureIndex({ "location.city._id" : 1, "categories._id" : 1 }) ~ or ~ db.businesses.ensureIndex({ "categories._id" : 1, "location.city._id" : 1 }) 35,000 cities & 2,500 categories answer: cities → categories create one for zip codes and categories too! which order?
  • 56. {"location.city._id" : 1} {"location.city._id" : 1, "categories._id" : 1} answer: yes db.businesses.dropIndex("location.city._id_1") don’t we have 2 indexes on city id?
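The single-field city index on slide 56 can be dropped because MongoDB can serve a query from a compound index whenever the queried fields form a leading prefix of that index. The sketch below models an index as an ordered list of field names and checks the prefix relationship; `isPrefixOf` is a hypothetical helper written for illustration, not a MongoDB API.

```javascript
// Indexes modeled as ordered field lists (sort direction is ignored in this sketch).
const cityIndex = ["location.city._id"];
const cityThenCategory = ["location.city._id", "categories._id"];
const categoryThenCity = ["categories._id", "location.city._id"];

// Hypothetical helper: true when `candidate` is a leading prefix of `compound`,
// meaning the compound index can also answer queries the candidate index answers.
function isPrefixOf(candidate, compound) {
  return (
    candidate.length <= compound.length &&
    candidate.every((field, i) => compound[i] === field)
  );
}

console.log(isPrefixOf(cityIndex, cityThenCategory)); // true: the city-only index is redundant
console.log(isPrefixOf(cityIndex, categoryThenCity)); // false: city-only queries would need their own index
```

One consequence of the cities → categories order chosen on slide 55 is that city-only lookups keep working after the standalone city index is dropped, while category-only lookups still rely on the separate "categories._id" index from slide 51.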
  • 57. #5 - find by keyword “something awesome” in Boulder, CO
  • 58. { "_id" : ObjectId("4ce838ef4a882579960001b9"), "name" : "Acme Glass Co", "keywords" : [ "glass", "repair", "acme", ... ] } db.businesses.ensureIndex({ "location.city._id":1, "keywords":1 }) db.businesses.find({ "location.city._id":ObjectId("4ce82aa0d3dfaa10f8004a95"), "keywords":/glass/i }) find businesses in city by keyword
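The slide shows a "keywords" array but not how it is filled in. As a purely hypothetical sketch, one way to build it is to lowercase and tokenize a few text fields when the document is written. `extractKeywords` and the choice of fields are assumptions made for illustration, not the speaker's code.

```javascript
// Hypothetical helper: build a de-duplicated, lowercased keyword list from text fields.
function extractKeywords(...texts) {
  const tokens = texts
    .join(" ")
    .toLowerCase()
    .split(/[^a-z0-9]+/) // split on anything that is not a letter or digit
    .filter((token) => token.length > 0);
  return [...new Set(tokens)]; // a Set keeps first-seen order while removing repeats
}

const keywords = extractKeywords("Acme Glass Co", "Glass repair...");
console.log(keywords);
```

With keywords stored in lowercase, a query could match an exact token (for example "keywords": "glass") and walk the compound index directly, whereas an unanchored, case-insensitive regular expression such as /glass/i generally cannot be satisfied efficiently from an index. Exact tokens still give none of the stemming or relevance ranking a search engine provides, which fits the conclusion the next slides reach.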
  • 59. me: we’re switching from postgres+solr to mongo kyle: oh wow, you can replace solr with mongo? me: with some creativity kyle: seems like it’d still be hard to get just right me: it works well kyle: gotcha chat with Kyle Banker
  • 60. i was wrong, kyle was right
  • 61. I’ll never leave you again ...until MongoDB supports full text later this year :)
  • 63. sitemaps big list of every url
  • 64. • xml files containing each unique url ~ 24M • 50,000 urls per file, about 500 files • urls are generated from live data • http://companyx.com/sitemaps/1.xml sitemaps
  • 65. >> "hello!".hash % 6 #=> 5 >> "/ny/new-york/c/apartments".hash % 6 #=> 5 returns an integer from 0 up to (but not including) the number specified partition by consistent hash
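The same hash-then-modulo partitioning can be sketched in JavaScript. The slide uses Ruby's built-in String#hash; current Ruby versions salt that value per process, so a scheme that must give the same answer on every run needs a deterministic hash instead. The FNV-1a function below is a swapped-in stand-in chosen for that reason, and `partitionFor` is a hypothetical name.

```javascript
// 32-bit FNV-1a over the string's UTF-16 code units (equivalent to bytes for ASCII URLs).
function fnv1a(str) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep unsigned 32 bits
  }
  return hash >>> 0;
}

// Hypothetical helper: map a URL to a partition number in the range 0 .. partitions - 1.
function partitionFor(url, partitions) {
  return fnv1a(url) % partitions;
}

// File count implied by the sitemap slide: 24M urls at 50,000 urls per file.
const fileCount = Math.ceil(24000000 / 50000);

console.log(fileCount); // 480, which the slide rounds to "about 500"
console.log(partitionFor("/ny/new-york/c/apartments", fileCount));
```

Hash partitioning only balances the files approximately, so a real setup would likely want a few more partitions than the exact quotient to stay under the 50,000-URL limit per file.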
  • 66. 1. map each url in the site to a partition 2. reduce all partitions to a single document containing all urls in that partition 3. save to a permanent collection map/reduce
  • 67. /il/chicago/c/pizza → 4 • /ny/new-york/c/apartments → 1 • /nd/rugby/c/apartments → 6 • /14076500-bayside-marina → 2 • /13401000-comtrak-logistics-inc → 3 • /12347500-allstate-auto-insurance → 1 • /il/downers-grove/c/computer-web-design → 6 • /1009500-heidelberg-lodges → 5 • /mn/redwood-falls/c/food-service → 4 • /14077000-bank-of-america → 5 • /mn/savage/c/audio-visual-equipment → 1 • partitions: 1 2 3 4 5 6 ... map
  • 68. { "total" : 2, "urls" : [ "/12347500-allstate-auto-insurance", "/ny/new-york/c/apartments" ] } { "total" : 1, "urls" : [ "/mn/savage/c/audio-visual-equipment" ] } { "_id" : 1, "value" : { "total" : 3, "urls" : [ "/12347500-allstate-auto-insurance", "/mn/savage/c/audio-visual-equipment", "/ny/new-york/c/apartments" ] } } reduce
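The map and reduce steps from slides 66 through 68 can be imitated in plain JavaScript without a database. This is a minimal in-memory sketch, not the MongoDB mapReduce command itself: the partition numbers are copied from the "map" slide rather than computed, and `mapUrl`, `reducePartition` and `runMapReduce` are hypothetical names.

```javascript
// Partition assignments copied from the "map" slide; in the talk they come from hash(url) % N.
const partitionOf = {
  "/il/chicago/c/pizza": 4,
  "/ny/new-york/c/apartments": 1,
  "/14076500-bayside-marina": 2,
  "/12347500-allstate-auto-insurance": 1,
  "/mn/savage/c/audio-visual-equipment": 1,
};

// Map step: emit one (partition, { total, urls }) pair per URL.
function mapUrl(url) {
  return { key: partitionOf[url], value: { total: 1, urls: [url] } };
}

// Reduce step: merge partial results for one partition. The output has the same
// shape as each input, so partial results can safely be reduced again.
function reducePartition(key, values) {
  return {
    total: values.reduce((sum, v) => sum + v.total, 0),
    urls: values.flatMap((v) => v.urls).sort(),
  };
}

// Group the mapped pairs by key, then reduce each group into one document.
function runMapReduce(urls) {
  const grouped = new Map();
  for (const { key, value } of urls.map(mapUrl)) {
    if (!grouped.has(key)) grouped.set(key, []);
    grouped.get(key).push(value);
  }
  return [...grouped].map(([key, values]) => ({
    _id: key,
    value: reducePartition(key, values),
  }));
}

const sitemaps = runMapReduce(Object.keys(partitionOf));
console.log(JSON.stringify(sitemaps.find((doc) => doc._id === 1), null, 2));
```

For partition 1 the merged document holds three URLs, and its total is the sum of the input totals. Keeping the reduce output in the same shape as its inputs matters because a map/reduce engine may call reduce more than once for the same key.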
  • 69. db.sitemaps.findOne({_id:1}).value.urls [ "/12347500-allstate-auto-insurance", "/mn/savage/c/audio-visual-equipment", "/ny/new-york/c/apartments" ] usage
  • 71. 115ms average response times 2 months later