Data Modeling for Performance

Mongo Boulder, January 21, 2010
Michael Dwan, Snapjoy
i’m michael dwan
 @michaeldwan on the twitter
the project
  Company X
• find business details (web + api)
• search by category/keyword + geo (web + api)
• update (api)



                                   application spec
• 15,000,000 businesses
• 100,000 geo areas
• 30,000 partners
• 100,000,000 tags
• 2,300 categories
• 2,000,000 requests daily
• 24,000,000 urls in sitemap
                          why is this interesting?
• infrequent changes
• monthly updates w/ 12M monthly changes
• “zero downtime”



                                           updates
the problem
 mo’ data, mo’ problems
complexity
[diagram: relational schema with the tables providers, mappings, phone_numbers, zips, assets, businesses_phone_numbers, cities, categorizations, businesses, states, categories, businesses_neighborhoods, taggings, users, tags, neighborhoods]
     architecture
read performance
downtime
solr
solr getting fussy
downtime
migrations
the solution
> gem install acts_as_web_scale
{
    "_id" : ObjectId("4ce838ef4a882579960001b9"),
    "name" : "Acme Glass Co",
    "tagline" : "Your trusty glass hole",
    "description" : "Glass repair...",
    "hours" : "Mon Fri 8 5",
    "url" : "http://acmeglasshole.biz"
}




                                        a business...
{
    "_id" : ObjectId("4ce838ef4a882579960001b9"),
    "name" : "Acme Glass Co",
    "tagline" : "Your trusty glass hole",
    "description" : "Glass repair...",
    "hours" : "Mon Fri 8 5",
    "url" : "http://acmeglasshole.biz",
    "phone_numbers" : [
      "5035550091",
      "8005555456"
    ]
}


            a business... has many phone numbers
{
    "_id" : ObjectId("4ce838ef4a882579960001b9"),
    "name" : "Acme Glass Co",
    "tagline" : "Your trusty glass hole",
    "description" : "Glass repair...",
    "hours" : "Mon Fri 8 5",
    "url" : "http://acmeglasshole.biz",
    "phone_numbers" : [
       "5035550091",
       "8005555456"
    ],
    "coordinates" : [
       45.559294,
       -122.644053
    ]
}



                      a business... has coordinates
{
    ...
    "url" : "http://acmeglasshole.biz",
    "phone_numbers" : [
       "5035550091",
       "8005555456"
    ],
    "coordinates" : [
       45.559294,
       -122.644053
    ],
    "tags" : [
       "glass",
       "mirrors",
       "flat glass"
    ]
}



                        a business... has many tags
{
    ...
    "coordinates" : [
       45.559294,
       -122.644053
    ],
    "tags" : [
       "glass",
       "mirrors",
       "flat glass"
    ],
    "location" : {
       "street_address" : "2035 NE Alberta St"
    }
}




                         a business... has an address
belongs to?
{
    "_id" : ObjectId("4ce82937961552247900000f"),
    "name" : "Illinois",
    "slug" : "il",
    ...
}




                                             a state
{
    ...
    "tags" : [
       "glass",
       "mirrors",
       "flat glass"
    ],
    "location" : {
       "street_address" : "2035 NE Alberta St",
       "state" : {
         "_id" : ObjectId("4ce829379615522479000026"),
         "meta" : {
            "slug" : "or"
         },
         "display_name" : "Oregon"
       }
    }
}


                     a business... belongs to a state
{
    ...
    "location" : {
        ...
        "state" : {
           "_id" : ObjectId("4ce829379615522479000026"),
           "meta" : {
              "slug" : "or"
           },
           "display_name" : "Oregon"
        },
        "city" : {
           "_id" : ObjectId("4ce82abdd3dfaa10f8006faa"),
           "meta" : {
              "slug" : "portland"
           },
           "display_name" : "Portland, OR"
        }
    }
}

                          a business... belongs to a city
{
    ...
    "location" : {
        ...
        "state" : {
           ...
           "display_name" : "Oregon"
        },
        "city" : {
           "_id" : ObjectId("4ce82abdd3dfaa10f8006faa"),
           "meta" : {
              "slug" : "portland"
           },
           "display_name" : "Portland, OR"
        },
        "zip" : {
           "_id" : ObjectId("4ce82c29d3dfaa116b006dfa"),
           "display_name" : "97211"
        }
    }
}

                     a business... belongs to a zip code
many-to-many?
{
    "_id" : ObjectId("4ce82e64d3dfaa16360014eb"),
    "name" : "Auto Glass",
    "slug" : "3063-auto-glass",
    "tags" : [
       "windshields"
    ],
    ...
}




                                       a category
{
    ...
    "location" : {
        ...
    },
    "categories" : [
       {
          "_id" : ObjectId("4ce82e50d3dfaa16360004f2"),
          "meta" : {
             "slug" : "282-glass",
             "tags" : [ "windows" ]
          },
          "display_name" : "Glass"
       },
       {
          "_id" : ObjectId("4ce82e64d3dfaa16360014eb"),
          "meta" : {
             "slug" : "3063-auto-glass",
             "tags" : [ "windshields" ]
          },
          "display_name" : "Auto Glass"
       }
    ]
}

               a business... belongs to many categories
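
The embedded state, city, zip and category entries share one shape: the `_id` of the referenced document, a `display_name`, and where needed a `meta` object. A minimal sketch of how such an embed might be built from a full category document. `toEmbed` is a hypothetical helper, not code from the talk, and plain strings stand in for ObjectId values so it runs without a driver.

```javascript
// Hypothetical helper: copy only the fields a business needs
// from a full state/city/category document into a small embed.
function toEmbed(doc) {
  const embed = {
    _id: doc._id,
    meta: { slug: doc.slug },
    display_name: doc.name,
  };
  // Categories carry tags; the state document on the slides does not.
  if (Array.isArray(doc.tags)) {
    embed.meta.tags = doc.tags.slice();
  }
  return embed;
}

// Plain strings stand in for ObjectId values here.
const autoGlass = {
  _id: "4ce82e64d3dfaa16360014eb",
  name: "Auto Glass",
  slug: "3063-auto-glass",
  tags: ["windshields"],
};

const business = { name: "Acme Glass Co", categories: [toEmbed(autoGlass)] };
console.log(JSON.stringify(business, null, 2));
```

The cost of this layout is that renaming a category means rewriting every business that embeds it, which fits a data set with infrequent changes.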
queries & indexes
    know what you want
#1 find a business
    I want *that* one
// single business
db.businesses.findOne({
   _id: ObjectId("4ce838ef4a882579960001b9")
})




                                 find a business
#2 find by location
  Businesses in San Francisco, CA
// find all within state
db.businesses.find({
   "location.state._id": ObjectId("4ce82937961552247900000f")
})

// find all within city
db.businesses.find({
   "location.city._id": ObjectId("4ce82aa0d3dfaa10f8004a95")
})

// find all within zip
db.businesses.find({
   "location.zip._id": ObjectId("4ce82b5ed3dfaa116b0026f0")
})




                       find businesses by state/city/zip
// the indexes
db.businesses.ensureIndex({"location.city._id": 1})
db.businesses.ensureIndex({"location.zip._id": 1})



    1.5GB each

    skip “location.state._id” -- only 51 possibilities


                                                 indexes
#3 find by category
 Businesses in the Auto Repair category
// find by category id
db.businesses.find({
   "categories._id": ObjectId("4ce82e50d3dfaa16360004f2")
})


// the index
db.businesses.ensureIndex({
   "categories._id":1
})




                               businesses by category
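
The query above reaches through the `categories` array: a dotted path such as `categories._id` matches a business when any element of the array has that `_id`. The sketch below imitates that matching rule in plain JavaScript, for illustration only. `valuesAt` and `matches` are made-up names, string ids stand in for ObjectIds, and this is not how the server is implemented.

```javascript
// Collect every value reachable by a dotted path, fanning out over arrays.
function valuesAt(value, parts) {
  if (parts.length === 0) return [value];
  if (Array.isArray(value)) {
    return value.flatMap((item) => valuesAt(item, parts));
  }
  if (value === null || typeof value !== "object") return [];
  return valuesAt(value[parts[0]], parts.slice(1));
}

// A document matches when any value found at the path equals the wanted one.
function matches(doc, path, wanted) {
  return valuesAt(doc, path.split(".")).flat().includes(wanted);
}

const acme = {
  name: "Acme Glass Co",
  tags: ["glass", "mirrors", "flat glass"],
  location: { city: { _id: "4ce82abdd3dfaa10f8006faa" } },
  categories: [
    { _id: "4ce82e50d3dfaa16360004f2", display_name: "Glass" },
    { _id: "4ce82e64d3dfaa16360014eb", display_name: "Auto Glass" },
  ],
};

console.log(matches(acme, "categories._id", "4ce82e64d3dfaa16360014eb")); // true
console.log(matches(acme, "location.city._id", "nope"));                  // false
```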
#4 - find by category + location
   Businesses in the Plumbing category in Chicago, IL
// find by city id and category id
db.businesses.find({
   "location.city._id": ObjectId("4ce82aa0d3dfaa10f8004a95"),
   "categories._id": ObjectId("4ce82e50d3dfaa16360004f2")
})




                         businesses by category + city
// city id
{"location.city._id":1}

~ or ~

// category id
{"categories._id":1}

answer: both suck
we need a compound index


         which index should we use?
db.businesses.ensureIndex({
    "location.city._id" : 1, "categories._id" : 1
 })

                     ~ or ~
 db.businesses.ensureIndex({
    "categories._id" : 1, "location.city._id" : 1
 })


      35,000 cities & 2,500 categories


   answer: cities → categories
create one for zip codes and categories too!

                                          which order?
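
One way to read the numbers on this slide: the field with more distinct values narrows the search further on its own. A rough back-of-the-envelope sketch, assuming the 15,000,000 businesses are spread evenly across cities and categories (they are not in practice) and that each business has a single category:

```javascript
const businesses = 15000000;
const cities = 35000;
const categories = 2500;

// Average number of businesses sharing one key value, under a uniform spread.
function averagePerKey(total, distinctKeys) {
  return Math.round(total / distinctKeys);
}

const perCity = averagePerKey(businesses, cities);         // 429
const perCategory = averagePerKey(businesses, categories); // 6000
console.log({ perCity, perCategory });
```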
{"location.city._id" : 1}
 {"location.city._id" : 1, "categories._id" : 1}




                 answer: yes

db.businesses.dropIndex("location.city._id_1")




              don’t we have 2 indexes on city id?
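
The single-field city index is redundant because its key is a leading prefix of the compound index, and a compound index can also serve queries on its leading fields alone. A small sketch of that prefix test. `isPrefixOf` is a hypothetical helper, and index keys are written here as ordered arrays of [field, direction] pairs:

```javascript
// True when every key of `shorter` appears, in order, at the start of `longer`.
function isPrefixOf(shorter, longer) {
  if (shorter.length > longer.length) return false;
  return shorter.every(
    ([field, dir], i) => longer[i][0] === field && longer[i][1] === dir
  );
}

const cityOnly = [["location.city._id", 1]];
const categoryOnly = [["categories._id", 1]];
const cityThenCategory = [["location.city._id", 1], ["categories._id", 1]];

console.log(isPrefixOf(cityOnly, cityThenCategory));     // true
console.log(isPrefixOf(categoryOnly, cityThenCategory)); // false
```

By the same test the category-only index from #3 is not covered by the compound index, so it stays.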
#5 - find by keyword
  “something awesome” in Boulder, CO
{
    "_id" : ObjectId("4ce838ef4a882579960001b9"),
    "name" : "Acme Glass Co",
    "keywords" : [
      "glass",
      "repair",
      "acme",
      ...
    ]
}



db.businesses.ensureIndex({
   "location.city._id":1,
   "keywords":1
})



db.businesses.find({
   "location.city._id":ObjectId("4ce82aa0d3dfaa10f8004a95"),
   "keywords":/glass/i
})




             find businesses in city by keyword
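
The slides do not show how the `keywords` array was produced. A minimal sketch of one way to derive it, assuming keywords are simply the lowercased words of the name and description. `extractKeywords` and the "longer than two characters" rule are assumptions, not the talk's implementation.

```javascript
// Hypothetical tokenizer: lowercase, split on anything that is not a letter
// or digit, drop very short words, and keep each word once.
function extractKeywords(...texts) {
  const seen = new Set();
  for (const text of texts) {
    for (const word of text.toLowerCase().split(/[^a-z0-9]+/)) {
      if (word.length > 2) seen.add(word);
    }
  }
  return [...seen];
}

const keywords = extractKeywords("Acme Glass Co", "Glass repair");
console.log(keywords); // [ 'acme', 'glass', 'repair' ]
```

With keywords stored in lowercase, an exact match such as "keywords": "glass" can seek directly into the index, whereas an unanchored, case-insensitive regex like /glass/i generally cannot narrow the index range on keywords.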
me: we’re switching from postgres+solr to mongo
kyle: oh wow, you can replace solr with mongo?
me: with some creativity
kyle: seems like it’d still be hard to get just right
me: it works well
kyle: gotcha



                                chat with Kyle Banker
i was wrong, kyle was right
I




        I’ll never leave you again

...until MongoDB supports full text later this year
                      :)
aggregation
map/reduce to the rescue
sitemaps
big list of every url
• xml files containing each unique url ~ 24M
• 50,000 urls per file, about 500 files
• urls are generated from live data
• http://companyx.com/sitemaps/1.xml


                                              sitemaps
>> "hello!".hash % 6 #=> 5

>> "/ny/new-york/c/apartments".hash % 6 #=> 5




    returns an integer from 0 up to, but not
    including, the number specified




                   partition by consistent hash
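
The slide uses Ruby's `String#hash`, which on current Ruby versions is not guaranteed to return the same value in different processes. A partition number is only useful if the same url always lands in the same file, so the sketch below swaps in a deterministic 32-bit FNV-1a hash. This is a stand-in, not the hash from the talk, and `fnv1a` and `partitionFor` are hypothetical names.

```javascript
// 32-bit FNV-1a computed over UTF-16 code units (a stand-in hash).
function fnv1a(text) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep 32 bits
  }
  return hash >>> 0;
}

// Map a url to one of `partitions` buckets, numbered 0 .. partitions - 1.
function partitionFor(url, partitions) {
  return fnv1a(url) % partitions;
}

console.log(partitionFor("hello!", 6));
console.log(partitionFor("/ny/new-york/c/apartments", 6));
```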
1. map each url in the site to a partition
2. reduce all partitions to a single document containing
   all urls in that partition
3. save to a permanent collection




                                             map/reduce
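
The three steps above can be sketched in plain JavaScript. The `map` and `reduce` functions follow the usual map/reduce contract (emit a key and value; reduce returns a value of the same shape as its inputs), but they take their inputs as arguments rather than using `this` and a global `emit`, and `runMapReduce` is a tiny in-memory stand-in for the database job. The partition numbers come from a lookup table copied from the slides instead of a real hash.

```javascript
// map: emit one small document per url, keyed by its partition.
function map(url, partitionOf, emit) {
  emit(partitionOf(url), { total: 1, urls: [url] });
}

// reduce: merge values that share a key. The result has the same shape as
// the inputs, so it can safely be fed back into reduce again.
function reduce(key, values) {
  const result = { total: 0, urls: [] };
  for (const value of values) {
    result.total += value.total;
    result.urls.push(...value.urls);
  }
  result.urls.sort();
  return result;
}

// In-memory driver: group emitted values by key, then reduce each group.
function runMapReduce(urls, partitionOf) {
  const groups = new Map();
  for (const url of urls) {
    map(url, partitionOf, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  return [...groups].map(([key, values]) => ({
    _id: key,
    value: reduce(key, values),
  }));
}

// Stand-in for the hash: partition numbers as shown on the slides.
const slidePartitions = {
  "/il/chicago/c/pizza": 4,
  "/ny/new-york/c/apartments": 1,
  "/12347500-allstate-auto-insurance": 1,
  "/mn/savage/c/audio-visual-equipment": 1,
};

const sitemaps = runMapReduce(
  Object.keys(slidePartitions),
  (url) => slidePartitions[url]
);
console.log(JSON.stringify(sitemaps, null, 2));
```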
/il/chicago/c/pizza                       4
/ny/new-york/c/apartments                 1
/nd/rugby/c/apartments                    6
/14076500-bayside-marina                  2
/13401000-comtrak-logistics-inc           3
/12347500-allstate-auto-insurance         1
/il/downers-grove/c/computer-web-design   6
/1009500-heidelberg-lodges                5
/mn/redwood-falls/c/food-service          4
/14077000-bank-of-america                 5
/mn/savage/c/audio-visual-equipment       1
...

partitions: 1  2  3  4  5  6


                                             map
{
    "total" : 2,
    "urls" : [
      "/12347500-allstate-auto-insurance",
      "/ny/new-york/c/apartments"
    ]
}

{
    "total" : 1,
    "urls" : [
      "/mn/savage/c/audio-visual-equipment"
    ]
}

{
    "_id" : 1,
    "value" : {
      "total" : 3,
      "urls" : [
        "/12347500-allstate-auto-insurance",
        "/mn/savage/c/audio-visual-equipment",
        "/ny/new-york/c/apartments"
      ]
    }
}

                                             reduce
db.sitemaps.findOne({_id:1}).value.urls




[
    "/12347500-allstate-auto-insurance",
    "/mn/savage/c/audio-visual-equipment",
    "/ny/new-york/c/apartments"
]




                                             usage
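
Serving http://companyx.com/sitemaps/1.xml then comes down to turning one partition's `urls` array into XML. A minimal sketch using the standard sitemap `urlset` format. `renderSitemap` and `escapeXml` are hypothetical helpers, and the host is taken from the sitemaps slide.

```javascript
// Escape the five characters that are special in XML text.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build one sitemap file from the paths stored in a partition document.
function renderSitemap(host, paths) {
  const entries = paths
    .map((path) => `  <url><loc>${escapeXml(host + path)}</loc></url>`)
    .join("\n");
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    entries,
    "</urlset>",
  ].join("\n");
}

const xml = renderSitemap("http://companyx.com", [
  "/12347500-allstate-auto-insurance",
  "/mn/savage/c/audio-visual-equipment",
  "/ny/new-york/c/apartments",
]);
console.log(xml);
```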
wrap up
115ms average response times


                        2 months later
thank you
 @michaeldwan

From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 

Data Modeling for Performance

  • 1. Data Modeling for Performance Mongo Boulder Michael Dwan January 21, 2010 Snapjoy
  • 2. i’m michael dwan @michaeldwan on the twitter
  • 3. the project Company X
  • 4. application spec: find business details (web + api); search by category/keyword + geo (web + api); update (api)
  • 5. why is this interesting? 15,000,000 businesses; 100,000 geo areas; 30,000 partners; 100,000,000 tags; 2,300 categories; 2,000,000 requests daily; 24,000,000 urls in sitemap
  • 6. updates: infrequent changes; monthly updates w/ 12M monthly changes; “zero downtime”
  • 7. the problem mo’ data, mo’ problems
  • 9. providers, mappings, phone_numbers, zips, assets, businesses_phone_numbers, cities, categorizations, businesses, states, categories, businesses_neighborhoods, taggings, users, tags, neighborhoods
  • 10. architecture (diagram)
  • 12. downtime solr
  • 14. downtime migrations
  • 16. > gem install acts_as_web_scale
  • 19. { "_id" : ObjectId("4ce838ef4a882579960001b9"), "name" : "Acme Glass Co", "tagline" : "Your trusty glass hole", "description" : "Glass repair...", "hours" : "Mon Fri 8 5", "url" : "http://acmeglasshole.biz", } a business...
  • 20. { "_id" : ObjectId("4ce838ef4a882579960001b9"), "name" : "Acme Glass Co", "tagline" : "Your trusty glass hole", "description" : "Glass repair...", "hours" : "Mon Fri 8 5", "url" : "http://acmeglasshole.biz", } a business... has many phone numbers
  • 21. { "_id" : ObjectId("4ce838ef4a882579960001b9"), "name" : "Acme Glass Co", "tagline" : "Your trusty glass hole", "description" : "Glass repair...", "hours" : "Mon Fri 8 5", "url" : "http://acmeglasshole.biz", "phone_numbers" : [ "5035550091", "8005555456" ] } a business... has many phone numbers
  • 22. "_id" : ObjectId("4ce838ef4a882579960001b9"), "name" : "Acme Glass Co", "tagline" : "Your trusty glass hole", "description" : "Glass repair...", "hours" : "Mon Fri 8 5", "url" : "http://acmeglasshole.biz", "phone_numbers" : [ "5035550091", "8005555456" ] } a business... has coordinates
  • 23. "_id" : ObjectId("4ce838ef4a882579960001b9"), "name" : "Acme Glass Co", "tagline" : "Your trusty glass hole", "description" : "Glass repair...", "hours" : "Mon Fri 8 5", "url" : "http://acmeglasshole.biz", "phone_numbers" : [ "5035550091", "8005555456" ], "coordinates" : [ 45.559294, -122.644053 ] } a business... has coordinates
  • 24. "url" : "http://acmeglasshole.biz", "phone_numbers" : [ "5035550091", "8005555456" ], "coordinates" : [ 45.559294, -122.644053 ] } a business... has many tags
  • 25. "url" : "http://acmeglasshole.biz", "phone_numbers" : [ "5035550091", "8005555456" ], "coordinates" : [ 45.559294, -122.644053 ], "tags" : [ "glass", "mirrors", "flat glass" ] } a business... has many tags
  • 26. "coordinates" : [ 45.559294, -122.644053 ], "tags" : [ "glass", "mirrors", "flat glass" ] } a business... has an address
  • 27. "coordinates" : [ 45.559294, -122.644053 ], "tags" : [ "glass", "mirrors", "flat glass" ], "location" : { "street_address" : "2035 NE Alberta St" } } a business... has an address
  • 29. { "_id" : ObjectId("4ce82937961552247900000f"), "name" : "Illinois", "slug" : "il", ... } a state
  • 30. "coordinates" : [ 45.559294, -122.644053 ], "tags" : [ "glass", "mirrors", "flat glass" ], "location" : { "street_address" : "2035 NE Alberta St" } } a business... belongs to a state
  • 31. "tags" : [ "glass", "mirrors", "flat glass" ], "location" : { "street_address" : "2035 NE Alberta St" } } a business... belongs to a state
  • 32. "tags" : [ "glass", "mirrors", "flat glass" ], "location" : { "street_address" : "2035 NE Alberta St", "state" : { "_id" : ObjectId("4ce829379615522479000026"), "meta" : { "slug" : "or" }, "display_name" : "Oregon" } } } a business... belongs to a state
  • 33. "state" : { "_id" : ObjectId("4ce829379615522479000026"), "meta" : { "slug" : "or" }, "display_name" : "Oregon" } } } a business... belongs to a city
  • 34. "state" : { "_id" : ObjectId("4ce829379615522479000026"), "meta" : { "slug" : "or" }, "display_name" : "Oregon" }, "city" : { "_id" : ObjectId("4ce82abdd3dfaa10f8006faa"), "meta" : { "slug" : "portland", }, "display_name" : "Portland, OR" }, } } a business... belongs to a city
  • 35. }, "display_name" : "Oregon" }, "city" : { "_id" : ObjectId("4ce82abdd3dfaa10f8006faa"), "meta" : { "slug" : "portland", }, "display_name" : "Portland, OR" }, } } a business... belongs to a zip code
  • 36. }, "display_name" : "Oregon" }, "city" : { "_id" : ObjectId("4ce82abdd3dfaa10f8006faa"), "meta" : { "slug" : "portland", }, "display_name" : "Portland, OR" }, "zip" : { "_id" : ObjectId("4ce82c29d3dfaa116b006dfa"), "display_name" : "97211" } } } a business... belongs to a zip code
  • 38. { "_id" : ObjectId("4ce82e64d3dfaa16360014eb"), "name" : "Auto Glass", "slug" : "3063-auto-glass", "tags" : [ "windshields" ], ... } a category
  • 39. "meta" : { "slug" : "or" }, "display_name" : "Oregon" }, "city" : { "_id" : ObjectId("4ce82abdd3dfaa10f8006faa"), "meta" : { "slug" : "portland", }, "display_name" : "Portland, OR" }, "zip" : { "_id" : ObjectId("4ce82c29d3dfaa116b006dfa"), "display_name" : "97211" } } } a business... belongs to a zip code
  • 40. } } } a business... belongs to many categories
  • 41. } }, "categories" : [ { "_id" : ObjectId("4ce82e50d3dfaa16360004f2"), "meta" : { "slug" : "282-glass", "tags" : [ "windows" ], }, "display_name" : "Glass" }, { "_id" : ObjectId("4ce82e64d3dfaa16360014eb"), "meta" : { "slug" : "3063-auto-glass", "tags" : [ "windshields" ], }, "display_name" : "Auto Glass" } ] } a business... belongs to many categories
  • 42. queries & indexes know what you want
  • 43. #1 find a business I want *that* one
  • 44. find a business
        // single business
        db.businesses.findOne({ _id: ObjectId("4ce838ef4a882579960001b9") })
  • 45. #2 find by location Businesses in San Francisco, CA
  • 46. find businesses by state/city/zip
        // find all within state
        db.businesses.find({ "location.state._id": ObjectId("4ce82937961552247900000f") })
  • 47. find businesses by state/city/zip
        // find all within state
        db.businesses.find({ "location.state._id": ObjectId("4ce82937961552247900000f") })
        // find all within city
        db.businesses.find({ "location.city._id": ObjectId("4ce82aa0d3dfaa10f8004a95") })
  • 48. find businesses by state/city/zip
        // find all within state
        db.businesses.find({ "location.state._id": ObjectId("4ce82937961552247900000f") })
        // find all within city
        db.businesses.find({ "location.city._id": ObjectId("4ce82aa0d3dfaa10f8004a95") })
        // find all within zip
        db.businesses.find({ "location.zip._id": ObjectId("4ce82b5ed3dfaa116b0026f0") })
  • 49. indexes
        // the indexes
        db.businesses.ensureIndex({"location.city._id": 1})
        db.businesses.ensureIndex({"location.zip._id": 1})
        1.5GB each
        skip “location.state._id” -- only 51 possibilities
  • 50. #3 find by category Businesses in the Auto Repair category
  • 51. businesses by category
        // find by category id
        db.businesses.find({ "categories._id": ObjectId("4ce82e50d3dfaa16360004f2") })
        // the index
        db.businesses.ensureIndex({ "categories._id":1 })
  • 52. #4 - find by category + location Businesses in the Plumbing category in Chicago, IL
  • 53. businesses by category + city
        // find by city id and category id
        db.businesses.find({ "location.city._id": ObjectId("4ce82aa0d3dfaa10f8004a95"), "categories._id": ObjectId("4ce82e50d3dfaa16360004f2") })
  • 54. which index should we use?
        // city id
        {"location.city._id":1}
        ~ or ~
        // category id
        {"categories._id":1}
        answer: both suck
        we need a compound index
  • 55. which order?
        db.businesses.ensureIndex({ "location.city._id" : 1, "categories._id" : 1 })
        ~ or ~
        db.businesses.ensureIndex({ "categories._id" : 1, "location.city._id" : 1 })
        35,000 cities & 2,500 categories
        answer: cities → categories
        create one for zip codes and categories too!
  • 56. don’t we have 2 indexes on city id?
        {"location.city._id" : 1}
        {"location.city._id" : 1, "categories._id" : 1}
        answer: yes
        db.businesses.dropIndex("location.city._id_1")
  • 57. #5 - find by keyword “something awesome” in Boulder, CO
  • 58. find businesses in city by keyword
        { "_id" : ObjectId("4ce838ef4a882579960001b9"), "name" : "Acme Glass Co", "keywords" : [ "glass", "repair", "acme", ... ] }
        db.businesses.ensureIndex({ "location.city._id":1, "keywords":1 })
        db.businesses.find({ "location.city._id":ObjectId("4ce82aa0d3dfaa10f8004a95"), "keywords":/glass/i })
  • 59. chat with Kyle Banker
        me: we’re switching from postgres+solr to mongo
        kyle: oh wow, you can replace solr with mongo?
        me: with some creativity
        kyle: seems like it’d still be hard to get just right
        me: it works well
        kyle: gotcha
  • 60. i was wrong, kyle was right
  • 61. I’ll never leave you again ...until MongoDB supports full text later this year :)
  • 63. sitemaps big list of every url
  • 64. sitemaps: xml files containing each unique url (~24M); 50,000 urls per file, about 500 files; urls are generated from live data; http://companyx.com/sitemaps/1.xml
  • 65. partition by consistent hash
        >> "hello!".hash % 6
        #=> 5
        >> "/ny/new-york/c/apartments".hash % 6
        #=> 5
        returns an integer from 0 up to, but not including, the number specified
  • 66. map/reduce
        1. map each url in the site to a partition
        2. reduce all partitions to a single document containing all urls in that partition
        3. save to a permanent collection
  • 67. map (url → partition, partitions 1 to 6)
        /il/chicago/c/pizza → 4
        /ny/new-york/c/apartments → 1
        /nd/rugby/c/apartments → 6
        /14076500-bayside-marina → 2
        /13401000-comtrak-logistics-inc → 3
        /12347500-allstate-auto-insurance → 1
        /il/downers-grove/c/computer-web-design → 6
        /1009500-heidelberg-lodges → 5
        /mn/redwood-falls/c/food-service → 4
        /14077000-bank-of-america → 5
        /mn/savage/c/audio-visual-equipment → 1
        ...
  • 68. reduce
        // two values emitted for partition 1
        { "total" : 2, "urls" : [ "/12347500-allstate-auto-insurance", "/ny/new-york/c/apartments" ] }
        { "total" : 1, "urls" : [ "/mn/savage/c/audio-visual-equipment" ] }
        // reduced document for partition 1
        { "_id" : 1, "value" : { "total" : 3, "urls" : [ "/12347500-allstate-auto-insurance", "/mn/savage/c/audio-visual-equipment", "/ny/new-york/c/apartments" ] } }
  • 69. usage
        db.sitemaps.findOne({_id:1}).value.urls
        [ "/12347500-allstate-auto-insurance", "/mn/savage/c/audio-visual-equipment", "/ny/new-york/c/apartments" ]
  • 71. 115ms average response times 2 months later
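
The sitemap steps on slides 65 to 68 (hash each url to a partition, map, reduce, save one document per partition) can be sketched in plain JavaScript. This is an in-memory imitation under stated assumptions, not the talk's actual map/reduce job: it does not call MongoDB at all, and the names `fnv1a`, `partitionFor`, `mapUrl`, `reducePartition` and `buildSitemaps` are hypothetical. The slides use Ruby's `String#hash`; the sketch swaps in a 32-bit FNV-1a hash because the partition for a url has to be the same on every run, and partitions here are numbered from 0.

```javascript
// Sketch only: mimics the map -> reduce -> save flow in memory.
// All function names are hypothetical, chosen for illustration.

const PARTITIONS = 6; // the slides use `% 6`; the real site needed about 500 files

// 32-bit FNV-1a: a deterministic stand-in for Ruby's String#hash.
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep it an unsigned 32-bit value
  }
  return hash;
}

// Partition number in the range 0 .. partitions - 1.
function partitionFor(url, partitions = PARTITIONS) {
  return fnv1a(url) % partitions;
}

// map: emit one (partition, value) pair per url.
function mapUrl(url) {
  return { key: partitionFor(url), value: { total: 1, urls: [url] } };
}

// reduce: merge values that share a partition. The result has the same shape
// as each input value, so already-reduced values can be reduced again.
function reducePartition(key, values) {
  const merged = { total: 0, urls: [] };
  for (const v of values) {
    merged.total += v.total;
    merged.urls.push(...v.urls);
  }
  merged.urls.sort();
  return merged;
}

// Group the mapped pairs by key, reduce each group, and return one
// document per partition, shaped like the documents on slide 68.
function buildSitemaps(urls) {
  const grouped = new Map();
  for (const url of urls) {
    const { key, value } = mapUrl(url);
    if (!grouped.has(key)) grouped.set(key, []);
    grouped.get(key).push(value);
  }
  const docs = [];
  for (const [key, values] of grouped) {
    docs.push({ _id: key, value: reducePartition(key, values) });
  }
  return docs.sort((a, b) => a._id - b._id);
}

const sampleUrls = [
  "/il/chicago/c/pizza",
  "/ny/new-york/c/apartments",
  "/14076500-bayside-marina",
  "/12347500-allstate-auto-insurance",
  "/mn/savage/c/audio-visual-equipment",
];
const sitemaps = buildSitemaps(sampleUrls);
console.log(JSON.stringify(sitemaps, null, 2));
```

Serving `/sitemaps/N.xml` then only needs the single document whose `_id` is `N`. Hashing spreads urls only roughly evenly, so a partition count derived from 24M urls at 50,000 per file (480) does not by itself guarantee that every file stays under the limit.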