Scaling to billions of people and places QCon London 2010 Josh Devins Copyright 2010 Nokia
Just what scale are we talking about? Since the start of this talk 1K+ Nokia devices were made and sold (13/sec) 15M+ phone calls were made using Nokia phones 3M+ text messages were sent using Nokia phones At any given moment there are 350 Nokia devices at Taj Mahal 525 Nokia devices at the Eiffel Tower 5000 Nokia devices at Disney World 6750 Nokia devices in the Forbidden City
 220 countries and territories
 46 languages
 Billions of devices
 82 million GPS devices
What consumers see
With this much reach… Brand expectations Public visibility Nearly immediate scale
 Global presence
 Legal implications
 A dataset of all the places in the world
 A dataset that is always growing
 A dense dataset
 New suppliers arrive all the time
 Technology: Storage
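The speaker notes for this slide point out that geo data needs range queries and spatial indexing rather than plain fetch-by-ID. As a rough illustration only (not the MySQL or Lucene setup the talk describes), a fixed grid can turn a bounding-box query into a handful of key lookups. The cell size, class name, place ids and coordinates below are all hypothetical.

```python
from collections import defaultdict
from math import floor

CELL_DEG = 0.01  # hypothetical cell size in degrees (about 1 km of latitude)


def cell_of(lat, lon):
    """Map a coordinate to the integer grid cell that contains it."""
    return (floor(lat / CELL_DEG), floor(lon / CELL_DEG))


class GridIndex:
    """Toy spatial index: each grid cell holds the places that fall inside it."""

    def __init__(self):
        self._cells = defaultdict(list)

    def add(self, place_id, lat, lon):
        self._cells[cell_of(lat, lon)].append((place_id, lat, lon))

    def in_box(self, south, west, north, east):
        """Return ids of places inside the bounding box (edges inclusive)."""
        row_lo, col_lo = cell_of(south, west)
        row_hi, col_hi = cell_of(north, east)
        found = []
        # Visit only the cells the box overlaps, then filter exactly.
        for row in range(row_lo, row_hi + 1):
            for col in range(col_lo, col_hi + 1):
                for place_id, lat, lon in self._cells.get((row, col), []):
                    if south <= lat <= north and west <= lon <= east:
                        found.append(place_id)
        return found


index = GridIndex()
index.add("hotel-adlon", 52.5159, 13.3805)    # illustrative coordinates
index.add("eiffel-tower", 48.8584, 2.2945)    # illustrative coordinates
berlin_mitte = index.in_box(52.50, 13.36, 52.53, 13.40)
```

This sketch ignores boxes that cross the antimeridian and uses one fixed cell size, which would cope badly with the mix of very dense and very sparse areas the talk goes on to describe.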
 Data correctness is vital
 Technology: Deduplication “Hotel Adlon”       “The   Hotel Adlon, Berlin”
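The speaker notes for this slide list name normalisation and Levenshtein distance among the matching steps. A minimal sketch of just those two steps might look like the following. The stopword list, the threshold and the function names are assumptions made for illustration, and the geo, category and address-geocoding steps from the notes are left out.

```python
import re
import unicodedata

STOPWORDS = {"the"}  # hypothetical and deliberately tiny


def normalise_name(name, city=None):
    """Lowercase, strip accents and punctuation, drop stopwords and the city name."""
    text = unicodedata.normalize("NFKD", name).lower()
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text)
    tokens = [t for t in text.split() if t not in STOPWORDS]
    if city is not None:
        tokens = [t for t in tokens if t != city.lower()]
    return " ".join(tokens)


def levenshtein(a, b):
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def probably_same_place(name_a, name_b, city, max_distance=2):
    """Treat two names as one place if their normalised forms are close."""
    a = normalise_name(name_a, city)
    b = normalise_name(name_b, city)
    return levenshtein(a, b) <= max_distance


match = probably_same_place("Hotel Adlon", "The   Hotel Adlon, Berlin", "Berlin")
```

Whitespace tokenisation like this would not work for the non-tokenizable languages the notes mention as a challenge.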
The data is not evenly distributed
Technology: Geospatial search “phoenix bar in warsaw” “paris bar berlin” “moma” “the bar”
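The speaker notes stress that context matters for a query like “paris bar berlin”: where is the user searching from? One simple signal, sketched here under the assumption that candidate interpretations have already been found, is to rank them by great-circle distance from the searcher. The candidate list and function names are hypothetical, the coordinates are plain city-centre values, and the notes make clear that real ranking needs far more than this (machine learning, human input, query evaluation).

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    h = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def rank_by_proximity(candidates, user_lat, user_lon):
    """Order candidate interpretations so the nearest one comes first."""
    return sorted(
        candidates,
        key=lambda c: haversine_km(user_lat, user_lon, c["lat"], c["lon"]),
    )


# Two hypothetical readings of "paris bar berlin".
candidates = [
    {"label": "a bar in Paris, France", "lat": 48.8566, "lon": 2.3522},
    {"label": "'Paris Bar' in Berlin", "lat": 52.5200, "lon": 13.4050},
]
ranked = rank_by_proximity(candidates, 52.52, 13.40)  # searcher is in Berlin
```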
 Coping with traffic spikes
Coordinating with public announcements
 Anticipating scale (but avoiding premature optimisation)
Caching
Caching
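The speaker notes describe these two slides as histograms of how many places were fetched a given number of times, with the first column (places hit only once) standing for requests that cannot benefit from caching. A small sketch of that count-of-counts calculation, using made-up place ids, could look like this. The function names are assumptions.

```python
from collections import Counter


def fetch_histogram(place_ids):
    """Map 'number of fetches' -> 'how many places were fetched that often'."""
    hits_per_place = Counter(place_ids)
    return Counter(hits_per_place.values())


def single_hit_share(place_ids):
    """Fraction of all requests that went to places fetched exactly once."""
    hits_per_place = Counter(place_ids)
    total = sum(hits_per_place.values())
    singles = sum(1 for n in hits_per_place.values() if n == 1)
    return singles / total if total else 0.0


# Hypothetical request stream: one entry per fetched place id.
requests = ["a", "a", "a", "b", "b", "c", "d"]
histogram = fetch_histogram(requests)
share = single_hit_share(requests)
```

Plotting `histogram` on a linear and then a logarithmic axis would give the two views the notes contrast.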
Technology: Hadoop and Pig
logs = FILTER logs BY method == 'GET' AND statusCode >= 200 AND statusCode < 300;
groupedByUri = GROUP logs BY uri;
uriCounts = FOREACH groupedByUri GENERATE group AS uri, COUNT(logs) AS numHits;
What’s special about mobile? “There are four times as many mobile subscribers in the world as there are installed PCs.” - Financial Times (ft.com)
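For readers who do not know Pig, the same filter, group and count can be sketched in plain Python on a small in-memory sample. This only shows what the script computes. It is not how the Hadoop job runs, and the record layout and URIs are hypothetical.

```python
from collections import Counter


def successful_get_counts(records):
    """Count hits per URI for GET requests with a 2xx status code.

    records: iterable of (method, status_code, uri) tuples.
    """
    return Counter(
        uri
        for method, status_code, uri in records
        if method == "GET" and 200 <= status_code < 300
    )


# Hypothetical parsed access-log records.
sample = [
    ("GET", 200, "/places/1"),
    ("GET", 200, "/places/1"),
    ("GET", 404, "/places/2"),
    ("POST", 201, "/places"),
    ("GET", 204, "/places/3"),
]
uri_counts = successful_get_counts(sample)
```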
 Serving to mobiles: latency
 Serving to mobiles: bandwidth
Serving to mobiles: offline
What about the future? More devices Common web runtimes Merger of Maemo and Moblin into MeeGo All equals larger reach
Come work with us! We’re looking for the best developers, testers, architects, designers Send your CV to developer.jobs.berlin@nokia.com
Thanks! Any questions? Josh Devins josh.devins@nokia.com www.joshdevins.net Come work with us in Berlin developer.jobs.berlin@nokia.com Free beer! Whittle Room, 5:30pm

Scaling to billions of people and places - QCon London 2010

Editor's Notes

  1. Hi everyone, thanks for coming today. I’m Josh, I’m from Nokia in Berlin and I’m going to talk today about people, places, maps, and building location based services. As I go through the talk here, I’m happy to answer any questions you may have as well, so please feel free to just shoot up your arm or yell out if something comes up.
  2. [BACKGROUND]-Probably obvious who Nokia is and what we do, or traditionally have done, we make these things!-Shift from manufacturing to services in the last few years with the Ovi brand-Currently 4 major areas for Ovi services: Store, Music, Messaging and Maps-Since I work for Maps, I can really only talk about that-More specifically, the group I work in is called Places and you can think of us as the points of interest or POI managers, we’re the ones that deal with all of the things we know about a place
  3. [BUBBLE PEOPLE]-More affectionately, we’re known as “the bubble people”!-Of course there’s more to us than just bubbles, but the name certainly does make light of the fact that we are just a piece of the overall Maps puzzle-Integrate with: search, physical vector maps, devices, web, many supporting teams to deliver a complete product-More pertinent here is maybe the startup story of Ovi services within a massively scaled, efficient organization and production machine-This is an organization that does things on the scale of tens and hundreds of millions, every year[TRANSITION into talking about scale and efficiency]
  4. Sources: Nokia free navigation press release from 21 January; Nokia Q4 results announcement from 28 January; CEO keynote at CES, January 2010-To put some real numbers and context to the scale of what we’re dealing with here-Let’s just say there’s lots of phones in use all over the world!
  5. -Photo by Professor Quentin Ziplash - http://www.flickr.com/photos/ziplash/2413559809/ -In the context of this talk of course we have to mention GPS-enabled devices-More than 82M GPS enabled devices shipped so far since the N95 launched in 2007 – first GPS phone-Reach that keeps on increasing as more and more devices are built and sold that support GPS-Basically there’s a whole bunch of devices/computers out there that have the capability to do things that have changed the way we build products-Products are now built up from normal web technologies, JavaScript, AJAX, REST and so on [extra points]-Installation base covers about 10 different device models as well-Since the announcement of free walk and drive navigation at the end of January we have had over 3.5M Maps downloads-Digital maps for 180 countries, 75 of which have navigation covering 650K cities, 28M km of road and a population of about 1.5B people
  6. -What does this all mean to the consumer?-In our case, as I mentioned earlier, we’re in the business of managing places-We saw the bubble on the web and here it is again-And again on the device, although tailored for the user experience you might have on the device[TRANSITION]
  7. [TRANSITION to big scale]-Okay so hopefully I’ve set the stage a little bit and given some context of where the rest of the talk is coming from-But really what we’re here to talk about is stuff that’s big and the consequences of building stuff and operating at scale-I think the first thing that comes to mind for me when we talk about size and scale is probably just branding and the name Nokia-In a lot of countries, particularly in Asia, Nokia is even regarded as being one of the most trusted and well known brands and this in itself has a whole bunch of consequences-Being big naturally also leads to having a very broad public image and generally high visibility to the public and the press-Everyone is, at some point, looking at what you are doing, from the press to bloggers to other people in the organisation-So when you set out to do something at scale with these kinds of pressures, you better make damn sure that you’ve got a plan and your head screwed on straight-For example when we do things like over the air software updates, we really have the potential, the ability to completely hose everyone’s phones and that’s literally millions of consumers-We have a pretty big army of test devices that we put through the paces to make sure this doesn’t happen, but the potential is there-The last point here speaks a bit to the speed at which we try to operate-Being an existing, successful, global company means that you have not only high public visibility in one place, like Europe or North America, but really worldwide-So again when you launch some product or launch something like Ovi services, you have to do it at a massive scale for customers all over the world[cover in the first 5 mins, ideally]
  8. Photo by Genista - http://flic.kr/p/BmPq [TRANSITION to global presence]-Okay, speaking of getting things done, let’s dive into some details-I mentioned a minute ago that being big is hard because our services need to be accessible everywhere-And really this means not just being available but being usable everywhere even in the face of varying worldwide network latency-To do this…CDNs, “edge applications”, and data centers all around the world-We are benchmarking from around the world to try to feel what a user feels, using external services like Keynote-Australia story, use case-The wire is always there, regardless of scale and performance
  9. [TRANSITION to dataset talk]-So being global is one dimension of what we operate in-Something that we faced almost immediately that is a bit different from maybe typical web applications is that our dataset is really one of our biggest assets, and it also has multiple dimensions to it-Part of the immediate scale consequence is that we need to be immediately useful as well, implying that we need to start with a large dataset-Started with tens of millions of places, quite a lot of which were sourced from Navteq, our sister company-Targets are for upwards of 500M/half-billion places == 1 place per 13 people in the world
  10. Photo by Dr. Jaus - http://flic.kr/p/4GRinF -This scale needs constant love and attention-Insatiable appetite of the consumer to always be adding more places and adding to the density-Both as places are created and shut down every day around the world-Countries and cities are growing, populations are becoming more dense
  11. -Speaking of density…-1 sq km of San Francisco-Nearly 10K places
  12. -Photo by mikebaird - http://flic.kr/p/6WwnRv -In order to fulfill this growth in the dataset, rely on partners-20+ major content partners-Not just any partners but really strategic ones to help us fill in gaps internationally and cover the whole world-So we have to deal with growth of not just dataset but providers as well-Constant new suppliers – how to onboard, partnerships, non-tech and tech problems
  13. [technical interlude]-Photo by twicepic - http://www.flickr.com/photos/twicepix/4408790286[-2 problems we face coming from these varying dimensions of the dataset – storage and matching]-geo-based datasets are not perfectly suitable to simple key value stores since you are doing a lot of range queries and such and not just a fetch on ID-need lots of indexing to make that work and not just regular indexing but real spatial indexing too-right away relied on simple MySQL, RDBMS-at startup, didn’t want to rely on bleeding edge technology or things at least that we have no experience with as developers or ops-much easier to find people with really strong MySQL experience-we knew that MySQL could handle tens of millions of records, so we at least had a starting place-now that we have established services with MySQL, we’re moving on to look at more sophisticated geo-indexing and NoSQL data stores since we plan on eventually reaching the hundreds of millions of place records-at the moment we are using Lucene, which I’ll talk more about in a minute, as a basis for some of our geo search-we’re also looking most heavily at CouchDB and Project Voldemort right now as our future data store, hopefully in production in the next 6-months or so
  14. [technical interlude][-2 problems we face coming from these varying dimensions of the dataset – storage and deduplication]-One of our goals is to create not just a huge dataset, but to create a dataset that is accurate and precise-And not just in terms of metadata -- that is, making sure that the address is correct and accurate -- but also meaning that we have only one official representation of every place-So given two representations of a place, how do we go about deduplicating and matching up two of the same place representations?-The best way to show this is really with an example, so let’s have a look at some of the steps we go through to do this-STEPS: - narrow down to a geo region - narrow down to a category - coarse grained steps using indexing - normalise the address string using external geocoding services again from our sister company Navteq - normalise the name - name comparison using things like Levenshtein distance - fine grained steps in-memory-Challenges: -Super dense cities -Non-tokenizable languages
  15. -Tens of millions of places in the dataset and growing daily-The dataset has special characteristics though, it’s not just any dataset-This is not an actual graph of POIs but gives you a sense of just how much the density changes from area to area-Sparse dataset and the nature of in relation to caching and sourcing and crowd-sourcing, market share, developing nations, etc.-Canada as an example (hockey loving nation): look at the difference in density and sparseness-Long tail story – if you are in town X which is not a minor city, but Nokia didn’t care about it, bad experience, bad for trust, etc.-You have to pay some attention to the entire spectrum-Not just sparseness but density – how do you design UI, algorithms, etc. to work in both dense and sparse? There is no average square kilometer – Canada vs Beijing. Very different experiences.
  16. [technical interlude]-non-trivial, text based search, it adds a new dimension-best way to look at this is with a couple of examples 1: Phoenix Bar in Warsaw Poland? Phoenix Bar Warsaw Indiana? semantics matter, word ordering can matter but not always, not just about stopwords or tokenization-and I’ve purposely put all of this in lower case text since, particularly on the phone, search requests are not punctuated or capitalized properly, so you can’t always rely on that as hints to the meaning of words 2: is this a bar in Paris or a bar in Berlin? context matters, where are you searching from? 3: Museum in New York (alt name)? Misspelled city name Momo, town in Gabon? Alternative place names matter, common spelling mistakes matter 4: a Bar in The, a town in Burgundy France? Or literally a place called “The Bar” in whatever city you are currently searching from?-So, really, non-trivial problems, and can’t just apply simple rules to everything-Machine learning, human input, evaluation of search queries, and so on
  17. Photo by th.omas - http://flic.kr/p/5mpBVR[TRANSITION from dataset to traffic]-Okay, so we’ve seen what the dataset looks like and some of the unique aspects of it-Of course the thing that all of us as public facing services have to deal with at some point is traffic and how to deal with it-Already mentioned the use of CDNs, edge applications for dealing with worldwide distribution, but of course these are used as well to deal with traffic and more specifically traffic spikes-Of course caching in multiple places is something that helps deal with traffic spikes as well-I don’t think there’s any real big secret there, but sometimes traffic spikes are not exactly what you expect…[NEXT]
  18. -Sometimes a spike isn’t just a spike-We cause our own traffic spikes…i.e. marketing campaigns-Not only just a spike, but a big uptake in overall, long term users, bring you up to the next level, people discover just how good things are-Maybe nothing technical that we can do in addition, but people need to be informed in lots of places, data centers, etc.-Coordination with multiple parts of the organization go a long way to manage expected spikes and capacity planning
  19. -So if public announcements are the predictable side of scale, the complement to that is the fact that we know our user base will grow, just don’t quite know exactly when and how fast this will happen-To try and predict some of this we do typical things like looking at device sales figures, and our market growth in various parts of the world-But this can only take you so far-When we’re looking at where we need to go in the future with speed and size of scale, we look at first trying to prove that we need to scale or use a certain technology-It’s about pragmatic optimisation and extrapolation of existing data that we have-Using real measurements and not just guessing-Photo by kenleewrites - http://flic.kr/p/6voGK3
  20. -Speaking of measuring and scale, I want to show a few slides on caching effectiveness and our geo dataset-Linear traffic histogram to show the effectiveness of caching given a mixed sparse and dense dataset-Attempts to illustrate the hits that would not benefit from caching (hit only once in this timeframe), this is the first column here-Every other column shows the number of places that were fetched a particular number of times-So basically the long tail of this shows the variety of popularity of places[NEXT]-The graph is a bit at fault as you just saw since with this scale you can’t really see the really low values [NEXT]
  21. -To really see the long tail, this logarithmic projection is better-We can more easily see that the distribution of even the cache hits is not very even in the long tail-Such a variety of how the rest of the places are accessed-Are there smart things that we can do given the nature of the dataset?-Pre-seed cache? Smart cache miss algorithms?-Some of this we don’t have answers to (yet)-But this really comes down to doing good analysis and estimation
  22. [technology interlude]-We collect a ton of logs containing a lot of data and this is often the basis for our analysis and estimation-We collect usage statistics for specific features, all access logs of course from Tomcat and Apache, and we have to somehow make sense of this-To comb through all of this and churn out some pretty graphs, we built a small Hadoop cluster consisting of 60 cores and about 50TB of raw storage space-The graphs you just saw were created using standard Tomcat access log files generated from our core place registry service-I ran about 17M lines of access logs through Hadoop and really it was relatively easy to do this-For looking at just standard access logs, we mostly use Pig-For those that haven’t seen or used Pig, it’s a high-level language that lets you express data analysis jobs in a far easier way than being forced to write Java MapReduce jobs[NEXT]-The top bit of code here shows really how simple it is to do something like filter out specific requests from some standard Apache or Tomcat log file-Here I’m just looking for all successful GET requests[NEXT]-This next snippet shows an aggregation function which lets us count up the number of hits on each URI-You can probably already start to see how those graphs from the previous slide were generated-If anyone is interested in the nitty gritty details, I’ve got a blog post on using Pig and gnuplot to create those caching graphs
  23. [photo needed]-Okay, so we’ve talked a lot about scale and being big and what exactly that means-But of course being a device company as well, we have to mention what is special about services that are consumed by mobile devices-What things do you need to take care of or at least be mindful of?[NEXT]-Well, for starters, it’s important to see that mobile is not only a concern to us but really it’s something that affects everyone-And since we’ve been talking about big, well, I think that quote says it all
  24. Photo by Mrs Logic - http://flic.kr/p/6EedxC -Latency is maybe one of the first things that comes to mind-Mobile networks are very different from regular networks-Have to deal with things like GSM modems starting up, session startup and teardown and such things that affect the end user’s experience-In the end, it’s about optimising the right areas and taking a holistic view of your service-Can make services as fast as possible, but without good smart clients that do things like deal with network sessions appropriately, it could be all for naught-The user won’t see all of the work you’ve put in on your service
  25. Photo by Mathieu Ramage - http://flic.kr/p/5Nfnv4-2.5M per month average for users of Maps-Other maps products are 10x this amount-Why? Basically because people load their devices with maps beforehand, leaving only using bandwidth for enhanced features that require you to be online-Cost to the consumer – not everyone has flat data plans, not all networks in the world can even do this places like India, China, etc.-Data roaming is $$$
  26. Photo by makerbot - http://flic.kr/p/6gPB7D-People are not always online, people’s connections drop out-Plus roaming costs, nobody wants to be travelling and pay roaming costs just to get a city guide or find their way to the Roman Coliseum-We can’t always rely on a connection to the internet to be there-How do we deal with this?-Like I mentioned a minute ago, for starters we let people load up the maps they want onto their phones-Second is that the maps data contains not only the actual digital vector maps, but map data as well-This lets you do things like offline search for addresses and landmarks-And really, I’ve travelled a lot with these phones and it’s a real godsend when you’re tired, fresh off the plane in a strange city
  27. [TRANSITION to wrap up]-Reach of more classes of devices
  28. -And just to be clear, I’ve talked a lot about big things and scale-Don’t want to give you the wrong impression about what it’s like in the trenches-Here’s a couple of people from one of our teams-And like the previous slide said, we are truly looking for the best-We try to work with the best too-In the middle there is Simon who is a Lucene committer and very active in the Hadoop community as well-We also work with top notch Hadoop consultants, SpringSource, and ThoughtWorks for architecture and continuous integration and continuous deployment too
  29. -And on that note…