Building Search@Airbnb
Mousom Dhar Gupta
Total Guests: 20,000,000+
Countries: 190
Cities: 34,000+
Castles: 600+
Listings Worldwide: 1,200,000+
Search
Technical Stack
____________________________
Dropwizard as the service framework (includes Jetty, Jersey, and Jackson).
ZooKeeper (via SmartStack) for service discovery.
Lucene for index storage and simple retrieval.
Forward index, real-time indexing, ranking, and advanced filtering built in-house.
Search Overview
[Diagram: the Web App fans out to search nodes Search1, Search2, … SearchN (~30 replicas of the same index). Each node is a JVM running 150 search threads over a Lucene index plus data.]
[Diagram: a search request fans out to eight Lucene shards; a Combiner merges their results, followed by Filtering and Ranking.]
Shards
____________________________
Each box has 8 shards of the Lucene index.
Latency is 50% lower than with a single-shard index.
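The shard-and-combine idea above can be sketched as follows. This is a minimal illustration, not the code from the talk: `Hit`, `Shard`, and `ShardedSearcher` are hypothetical names, and each `Shard` stands in for one Lucene index searched on its own thread.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical scored hit; not a class from the talk. */
final class Hit {
    final long listingId;
    final double score;

    Hit(long listingId, double score) {
        this.listingId = listingId;
        this.score = score;
    }
}

/** Hypothetical stand-in for one Lucene shard. */
interface Shard {
    List<Hit> search(String query, int topK);
}

final class ShardedSearcher {
    private final List<Shard> shards;
    private final ExecutorService pool;

    ShardedSearcher(List<Shard> shards) {
        this.shards = shards;
        this.pool = Executors.newFixedThreadPool(shards.size());
    }

    /** Queries every shard in parallel, then combines the partial results into one top-K list. */
    List<Hit> search(String query, int topK) {
        List<Future<List<Hit>>> futures = new ArrayList<>();
        for (Shard shard : shards) {
            Callable<List<Hit>> task = () -> shard.search(query, topK);
            futures.add(pool.submit(task));
        }
        List<Hit> combined = new ArrayList<>();
        try {
            for (Future<List<Hit>> future : futures) {
                combined.addAll(future.get());
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("shard search failed", e);
        }
        // Combiner step: highest score first, keep only the best topK overall.
        combined.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(combined.subList(0, Math.min(topK, combined.size())));
    }

    void close() {
        pool.shutdown();
    }
}
```

Each shard only has to return its own top K, so the combiner sorts at most (number of shards × K) hits regardless of index size.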
Challenges
____________________________
Bootstrap (creating the index from scratch)
Ensuring consistency of the index with ground truth data in real time
Indexing
What’s in the Lucene index?
____________________________
Positions of listings, indexed using Lucene’s spatial module (RecursivePrefixTreeStrategy)
Categorical and numerical properties like room type and maximum occupancy
Full text (descriptions, reviews, etc.)
~40 fields per listing from a variety of data sources, all updated in real time
[Diagram: Realtime Update. Source databases (master, calendar, fraud, …), SpinalTap, Medusa, DataStore, and search nodes Search 1, Search 2, … Search N.]
Tails binary update logs from MySQL servers (5.6+).
Converts changes in any of the tables into actionable objects called “Mutations” (inserts, updates, deletes).
Broadcasts them to Medusa using Kafka.
SpinalTap
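To make the "Mutation" idea concrete, here is a minimal sketch of how a row change might be turned into such an object. The class shape, field names, and the `fromRowChange` helper are assumptions for illustration; the talk does not show SpinalTap's actual types, and binlog parsing and Kafka publishing are left out.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical shape of a mutation; field names are assumptions. */
final class Mutation {
    enum Type { INSERT, UPDATE, DELETE }

    final Type type;
    final String table;
    final long primaryKey;
    final Map<String, Object> before; // row image before the change (empty for INSERT)
    final Map<String, Object> after;  // row image after the change (empty for DELETE)

    private Mutation(Type type, String table, long primaryKey,
                     Map<String, Object> before, Map<String, Object> after) {
        this.type = type;
        this.table = table;
        this.primaryKey = primaryKey;
        this.before = copy(before);
        this.after = copy(after);
    }

    private static Map<String, Object> copy(Map<String, Object> row) {
        if (row == null) {
            return Collections.<String, Object>emptyMap();
        }
        return Collections.unmodifiableMap(new LinkedHashMap<String, Object>(row));
    }

    /** Classifies a row change by which row images are present. */
    static Mutation fromRowChange(String table, long primaryKey,
                                  Map<String, Object> before, Map<String, Object> after) {
        if (before == null && after == null) {
            throw new IllegalArgumentException("a row change needs at least one row image");
        }
        Type type;
        if (before == null) {
            type = Type.INSERT;
        } else if (after == null) {
            type = Type.DELETE;
        } else {
            type = Type.UPDATE;
        }
        return new Mutation(type, table, primaryKey, before, after);
    }
}
```

A consumer such as Medusa could then switch on `type` and `table` to decide which listings need their index data rebuilt.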
Source of truth for search index data.
Listens to updates from SpinalTap and builds new IndexData by querying ~15 MySQL tables from three different databases.
Persists everything in a DataStore and broadcasts the latest version to all search nodes.
Uses ZooKeeper for leader election.
Medusa
What’s in the forward index?
____________________________
Holds all the metadata about a listing required by scoring and filtering.
We also have complicated business rules to calculate Price, Availability, InstantBook, etc., which need a ton of metadata.
~50 fields built from multiple data sources and updated in real time.
public final class ForwardIndexData {
    private final CalendarData calendarData;
    private final PricingData pricingData;
    private final HostInfo hostInfo;
    . . . .
    . . . .
}

public final class CalendarData {
    private final DateRanges reservationDates;
    private final SeasonalValues startDayOfWeeks;
    . . . .
}

private final class SeasonalValues<T> {
    private final DateRange startDate;
    private final T value;
    . . . .
}
Forward Index
Availability
____________________________
Depends on the profile of the guest.
The check-in date must be one of the valid start days of the week.
Must satisfy seasonal minimum nights.
There must be enough preparation time for the host.
Busy dates are imported from external calendars to avoid booking conflicts.
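The availability rules above could be evaluated along these lines. This is a simplified sketch under stated assumptions: `AvailabilityRules` and `BusyRange` are hypothetical names, the minimum-nights value is a single number rather than seasonal, preparation time is counted in whole days, and the guest profile is not modeled.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.Set;

/** Hypothetical busy period: reserved or imported from an external calendar. */
final class BusyRange {
    final LocalDate start;        // inclusive
    final LocalDate endExclusive; // exclusive

    BusyRange(LocalDate start, LocalDate endExclusive) {
        this.start = start;
        this.endExclusive = endExclusive;
    }
}

/** Hypothetical rule set for one listing; all fields are illustrative assumptions. */
final class AvailabilityRules {
    private final Set<DayOfWeek> validStartDays;
    private final int minNights;
    private final int preparationDays;
    private final List<BusyRange> busyRanges;

    AvailabilityRules(Set<DayOfWeek> validStartDays, int minNights,
                      int preparationDays, List<BusyRange> busyRanges) {
        this.validStartDays = validStartDays;
        this.minNights = minNights;
        this.preparationDays = preparationDays;
        this.busyRanges = busyRanges;
    }

    boolean isAvailable(LocalDate checkIn, LocalDate checkOut) {
        long nights = ChronoUnit.DAYS.between(checkIn, checkOut);
        if (nights <= 0 || nights < minNights) {
            return false; // stay is too short
        }
        if (!validStartDays.contains(checkIn.getDayOfWeek())) {
            return false; // check-in falls on a day the host does not allow
        }
        for (BusyRange busy : busyRanges) {
            // Widen each busy period by the preparation time on both sides.
            LocalDate blockedFrom = busy.start.minusDays(preparationDays);
            LocalDate blockedUntil = busy.endExclusive.plusDays(preparationDays);
            boolean overlaps = checkIn.isBefore(blockedUntil) && checkOut.isAfter(blockedFrom);
            if (overlaps) {
                return false;
            }
        }
        return true;
    }
}
```

Because every rule reads only in-memory fields, a check like this can run for each candidate listing on every query without touching a database.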
Pricing
____________________________
Depends on the number of guests and the number of nights.
How near or far away the check-in date is.
How long the trip is, and whether the host has weekly and monthly pricing.
Whether there is a special price override for these nights.
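A rough sketch of how these pricing inputs might combine is shown below. Everything here is an assumption for illustration: the `PricingRules` class, the extra-guest fee, the 7-night and 28-night thresholds for weekly and monthly pricing, and expressing those as percentage discounts. The lead-time factor (how far away the check-in date is) is not modeled.

```java
import java.time.LocalDate;
import java.util.Map;

/** Hypothetical pricing inputs for one listing; amounts are in cents. */
final class PricingRules {
    private final long nightlyCents;
    private final int guestsIncluded;
    private final long extraGuestCentsPerNight;
    private final int weeklyDiscountPercent;   // assumed to apply from 7 nights
    private final int monthlyDiscountPercent;  // assumed to apply from 28 nights
    private final Map<LocalDate, Long> nightlyOverrides; // special price per night

    PricingRules(long nightlyCents, int guestsIncluded, long extraGuestCentsPerNight,
                 int weeklyDiscountPercent, int monthlyDiscountPercent,
                 Map<LocalDate, Long> nightlyOverrides) {
        this.nightlyCents = nightlyCents;
        this.guestsIncluded = guestsIncluded;
        this.extraGuestCentsPerNight = extraGuestCentsPerNight;
        this.weeklyDiscountPercent = weeklyDiscountPercent;
        this.monthlyDiscountPercent = monthlyDiscountPercent;
        this.nightlyOverrides = nightlyOverrides;
    }

    long totalCents(LocalDate checkIn, int nights, int guests) {
        long total = 0;
        for (int i = 0; i < nights; i++) {
            // A special override replaces the base nightly price for that night.
            Long override = nightlyOverrides.get(checkIn.plusDays(i));
            total += (override != null) ? override : nightlyCents;
        }
        long extraGuests = Math.max(0, guests - guestsIncluded);
        total += extraGuests * extraGuestCentsPerNight * nights;

        int discountPercent = 0;
        if (nights >= 28) {
            discountPercent = monthlyDiscountPercent;
        } else if (nights >= 7) {
            discountPercent = weeklyDiscountPercent;
        }
        return total * (100 - discountPercent) / 100;
    }
}
```

Working in integer cents avoids floating-point rounding when the discount is applied.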
Instant Book
____________________________
Depends on the number of guests and the number of nights.
Profile of the guest, such as positive reviews and whether they have a profile photo.
How much preparation time the host has, etc.
Why did we need our custom Forward Index?
Requirements
Needs to store objects with 50-100 fields as values, keyed by listing id.
Should avoid the cost of serialization/deserialization on every fetch.
Data must be available in memory for fast lookup, but also persisted on disk.
Highly concurrent: the writer shouldn’t block the readers (one writer but >100 reader threads).
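// Forward Index
import java.util.Map;

public interface ForwardIndex<V> {

    // Read-only, in-memory view used by reader threads.
    Map<Long, V> asMap();

    void put(long id, V value);

    void putAll(Map<Long, V> values);

    void remove(long id);

    // Persists pending writes and makes them visible to readers.
    void commit();

}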
Forward Index Interface
// Writer
forwardIndex.put(listingId, listingData);
. . .
// write to disk and also make it visible to readers.
forwardIndex.commit();

// Reader
// Fetch forward index data from in-memory map
Map<Long, ListingData> fwdIndex = forwardIndex.asMap();
ListingData data = fwdIndex.get(listingId);

// Use it to evaluate business rules
checkAvailability(data, searchRequest);
calculatePrice(data, searchRequest);
[Diagram: a non-blocking in-memory HashMap backed by a DiskStore.]
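// Forward Index
public class ForwardIndexStore<V> implements ForwardIndex<V> {
    private final DB<V> diskStore;  // persistent copy on disk
    private final Cache<V> cache;   // in-memory copy served to readers

    . . . .

    @Override
    public Map<Long, V> asMap() {
        // Read-only view; this assumes Cache<V> is itself a Map<Long, V>.
        return Collections.unmodifiableMap(cache);
    }

    @Override
    public void put(long id, V value) {
        diskStore.put(id, value);
        cache.put(id, value);
    }

    . . . .

    @Override
    public void commit() {
        // Flush to disk, then make the pending writes visible to readers.
        diskStore.commit();
        cache.commit();
    }
}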
Forward Index Implementation
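For readers who want something runnable, here is one way the single-writer, many-reader behaviour could be sketched with only the JDK. It is not the implementation from the talk: `InMemoryForwardIndex` is a hypothetical class, disk persistence is omitted, and it assumes exactly one writer thread calls `put`, `remove`, and `commit`.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory forward index: one writer thread, many reader threads. */
final class InMemoryForwardIndex<V> {
    // Readers only ever touch this map; ConcurrentHashMap reads do not block.
    private final ConcurrentHashMap<Long, V> visible = new ConcurrentHashMap<Long, V>();
    // Pending changes are private to the single writer thread until commit().
    private final Map<Long, V> pendingPuts = new HashMap<Long, V>();
    private final Set<Long> pendingRemoves = new HashSet<Long>();

    /** Read-only live view; values are plain objects, so no deserialization on fetch. */
    Map<Long, V> asMap() {
        return Collections.unmodifiableMap(visible);
    }

    void put(long id, V value) {
        pendingRemoves.remove(id);
        pendingPuts.put(id, value);
    }

    void remove(long id) {
        pendingPuts.remove(id);
        pendingRemoves.add(id);
    }

    /** Publishes pending changes to readers. A real store would also flush to disk here. */
    void commit() {
        visible.putAll(pendingPuts);
        for (Long id : pendingRemoves) {
            visible.remove(id);
        }
        pendingPuts.clear();
        pendingRemoves.clear();
    }
}
```

Note that this commit is not atomic across keys: a reader running during `commit()` may see some of the new entries before others. If that matters, one alternative is to build a fresh map and swap a `volatile` reference, at the cost of copying.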
Ranking Problem
____________________________
Not a text search problem
Users are almost never searching for a specific item; rather, they’re looking to “discover”
The most common component of a query is location
Highly personalized – the user is a part of the query
Optimizing for conversion (Search -> Inquiry -> Booking)
Evolution through continuous experimentation
Ranking
Ranking Components
____________________________
Relevance
Quality
Bookability
Personalization
Desirability of location
etc.
Ranking
Several hundred signals are used to build machine learning models:
Properties of the listing (reviews, location, etc.)
Behavioral signals (mined from request logs)
Image quality and clickability (computer vision)
Host behavior (response time/rate, cancellations, etc.)
Host preferences model
[Diagram labels: DB snapshots, Logs]
Life of a Query
[Diagram: pipeline stages shown are Query Understanding (geocoding, configuring retrieval options, choosing ranking models), Retrieval, Populator, First Pass Scorer (quality, bookability, relevance), Filtering by Price and Availability, Second Pass Ranking, Result Generation, and AirEvents. Result-count labels on the diagram: 25 results, 2000 results, 25 results.]
Second Pass Ranking
____________________________
Traditional ranking works like this: score each result independently, then sort by score.
In contrast, second pass operates on the entire list at once.
Makes it possible to implement features like result diversity, etc.
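As an illustration of why operating on the whole list enables diversity, here is a small greedy re-ranker. It is a sketch of one possible approach, not the method from the talk: `Candidate`, `SecondPassRanker`, the use of neighborhood as the diversity key, and the fixed per-repeat penalty are all assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical first-pass output for one listing. */
final class Candidate {
    final long listingId;
    final String neighborhood;
    final double firstPassScore;

    Candidate(long listingId, String neighborhood, double firstPassScore) {
        this.listingId = listingId;
        this.neighborhood = neighborhood;
        this.firstPassScore = firstPassScore;
    }
}

final class SecondPassRanker {
    /**
     * Greedy list-wise re-ranking: each pick depends on what was already picked,
     * which a per-item score followed by a sort cannot express.
     */
    static List<Candidate> rerank(List<Candidate> candidates, int limit, double penalty) {
        List<Candidate> remaining = new ArrayList<Candidate>(candidates);
        List<Candidate> result = new ArrayList<Candidate>();
        Map<String, Integer> pickedPerNeighborhood = new HashMap<String, Integer>();

        while (!remaining.isEmpty() && result.size() < limit) {
            Candidate best = remaining.get(0);
            double bestAdjusted = adjusted(best, pickedPerNeighborhood, penalty);
            for (Candidate c : remaining) {
                double score = adjusted(c, pickedPerNeighborhood, penalty);
                if (score > bestAdjusted) {
                    best = c;
                    bestAdjusted = score;
                }
            }
            remaining.remove(best);
            result.add(best);
            pickedPerNeighborhood.merge(best.neighborhood, 1, Integer::sum);
        }
        return result;
    }

    /** First-pass score minus a penalty for each already-picked listing in the same neighborhood. */
    private static double adjusted(Candidate c, Map<String, Integer> picked, double penalty) {
        return c.firstPassScore - penalty * picked.getOrDefault(c.neighborhood, 0);
    }
}
```

With a penalty of zero this reduces to sorting by first-pass score; a positive penalty pushes a second listing from an already-represented neighborhood below a slightly lower-scored listing from a new one.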
  • 4. That Awesome Slide Title of Yours
  • 5. Technical Stack ____________________________ DropWizard as a service framework (incl. Jetty, Jersey, Jackson) ZooKeeper (via Smartstack) for service discovery. Lucene for index storage and simple retrieval. In-house built forward index, real-time indexing, ranking, advanced filtering.
  • 6. Web App Search1 150 Search Threads Lucene Index ~30 replicas of same index dataJVM …Search2 SearchN Search Overview
  • 8. Indexing. Challenges: bootstrap (creating the index from scratch); ensuring consistency of the index with ground-truth data in real time.
  • 9. What’s in the Lucene index? Positions of listings, indexed using Lucene’s spatial module (RecursivePrefixTreeStrategy); categorical and numerical properties like room type and maximum occupancy; full text (descriptions, reviews, etc.); ~40 fields per listing from a variety of data sources, all updated in real time.
  • 11. Spinaltap: tails binary update logs from MySQL servers (5.6+); converts changes in any of the tables into actionable objects called “Mutations” (inserts, deletes, updates); broadcasts them to Medusa using Kafka.
  • 13. Medusa: the source of truth for search index data. Listens to updates from Spinaltap and builds new IndexData by querying ~15 MySQL tables from three different databases. Persists everything in a DataStore and broadcasts the latest version to all search nodes. Uses ZooKeeper for leader election.
  • 15. Forward Index: what’s in the forward index? It holds all the metadata about a listing required by scoring and filtering. The complicated business rules that calculate Price, Availability, InstantBook, etc. need a large amount of metadata. ~50 fields are built from multiple data sources and updated in real time.

        public final class ForwardIndexData {
            private final CalendarData calendarData;
            private final PricingData pricingData;
            private final HostInfo hostInfo;
            // ...
        }

        public final class CalendarData {
            private final DateRanges reservationDates;
            private final SeasonalValues startDayOfWeeks;
            // ...
        }

        private final class SeasonalValues<T> {
            private final DateRange startDate;
            private final T value;
            // ...
        }
  • 16. Availability: depends on the profile of the guest. The check-in date must be one of the valid start days of the week. The stay must satisfy the seasonal minimum nights. There must be enough preparation time for the host. Busy dates are imported from external calendars to avoid booking conflicts.
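The availability rules on slide 16 could be evaluated roughly as follows. This is a minimal sketch, not the implementation from the talk: `ListingCalendar`, its fields, and `isAvailable` are hypothetical names, seasonal variation and the guest profile are left out, and preparation time is modelled as a fixed number of free days before check-in.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.Set;

// Hypothetical, simplified calendar data for one listing.
final class ListingCalendar {
    final Set<DayOfWeek> validStartDays; // allowed check-in days of the week
    final int minNights;                 // minimum stay (seasonality omitted)
    final int prepDays;                  // free days the host needs before check-in
    final Set<LocalDate> busyDates;      // reserved nights, incl. imported external calendars

    ListingCalendar(Set<DayOfWeek> validStartDays, int minNights,
                    int prepDays, Set<LocalDate> busyDates) {
        this.validStartDays = validStartDays;
        this.minNights = minNights;
        this.prepDays = prepDays;
        this.busyDates = busyDates;
    }

    // checkOut is exclusive: a stay from the 5th to the 7th covers two nights.
    boolean isAvailable(LocalDate checkIn, LocalDate checkOut) {
        long nights = ChronoUnit.DAYS.between(checkIn, checkOut);
        if (nights < 1 || nights < minNights) {
            return false;
        }
        if (!validStartDays.contains(checkIn.getDayOfWeek())) {
            return false;
        }
        // Every night of the stay, plus the preparation days before it, must be free.
        for (LocalDate d = checkIn.minusDays(prepDays); d.isBefore(checkOut); d = d.plusDays(1)) {
            if (busyDates.contains(d)) {
                return false;
            }
        }
        return true;
    }
}
```

The checks are ordered from cheapest to most expensive, so most unavailable listings are rejected before the per-night loop runs.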
  • 17. Pricing: depends on the number of guests and the number of nights; how near or far away the check-in date is; how long the trip is and whether the host has weekly and monthly pricing; whether there is a special price override for these nights.
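A price calculation along the lines of slide 17 might look like the sketch below. Everything here is an assumption made for illustration: the class `ListingPricing`, its fields, the use of integer cents, and the 7-night and 28-night thresholds for weekly and monthly pricing. The dependence on how far away the check-in date is has been left out.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.Map;

// Hypothetical, simplified pricing data for one listing (amounts in cents).
final class ListingPricing {
    final long baseNightlyCents;
    final Map<LocalDate, Long> overrideCents; // special per-night price overrides
    final int guestsIncluded;                 // guests covered by the nightly price
    final long extraGuestNightlyCents;        // fee per additional guest per night
    final int weeklyDiscountPct;              // assumed to apply to stays of 7 to 27 nights
    final int monthlyDiscountPct;             // assumed to apply to stays of 28+ nights

    ListingPricing(long baseNightlyCents, Map<LocalDate, Long> overrideCents,
                   int guestsIncluded, long extraGuestNightlyCents,
                   int weeklyDiscountPct, int monthlyDiscountPct) {
        this.baseNightlyCents = baseNightlyCents;
        this.overrideCents = overrideCents;
        this.guestsIncluded = guestsIncluded;
        this.extraGuestNightlyCents = extraGuestNightlyCents;
        this.weeklyDiscountPct = weeklyDiscountPct;
        this.monthlyDiscountPct = monthlyDiscountPct;
    }

    // checkOut is exclusive; the result is the total for the whole stay.
    long totalCents(LocalDate checkIn, LocalDate checkOut, int guests) {
        long nights = ChronoUnit.DAYS.between(checkIn, checkOut);
        if (nights < 1) {
            throw new IllegalArgumentException("stay must be at least one night");
        }
        long extraGuests = Math.max(0, guests - guestsIncluded);
        long total = 0;
        for (LocalDate d = checkIn; d.isBefore(checkOut); d = d.plusDays(1)) {
            // An override replaces the base price for that night only.
            total += overrideCents.getOrDefault(d, baseNightlyCents)
                    + extraGuests * extraGuestNightlyCents;
        }
        int discountPct = nights >= 28 ? monthlyDiscountPct
                        : nights >= 7 ? weeklyDiscountPct
                        : 0;
        return total * (100 - discountPct) / 100;
    }
}
```

Because the result depends on the dates and guest count of each request, a price like this has to be computed at query time from per-listing metadata rather than stored as a single indexed field.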
  • 18. Instant Book: depends on the number of guests and the number of nights; the profile of the guest (positive reviews, whether they have a profile photo); how much preparation time the host has, etc.
  • 19. Why did we need our custom Forward Index? Requirements: it needs to store objects with 50-100 fields as values keyed by listing id; it should avoid the cost of serialization/deserialization on every fetch; data must be available in memory for fast lookup, but also persisted on disk; it must be highly concurrent, and the writer shouldn’t block the readers (one writer but >100 reader threads).
  • 20. Forward Index Interface

        // Forward Index
        public interface ForwardIndex<V> {
            Map<Long, V> asMap();
            void put(long id, V value);
            void putAll(Map<Long, V> values);
            void remove(long id);
            void commit();
        }

        // Writer
        forwardIndex.put(listingId, listingData);
        // ...
        // Write to disk and also make it visible to readers.
        forwardIndex.commit();

        // Reader
        // Fetch forward index data from the in-memory map
        Map<Long, ListingData> fwdIndex = forwardIndex.asMap();
        ListingData data = fwdIndex.get(listingId);

        // Use it to evaluate business rules
        checkAvailability(data, searchRequest);
        calculatePrice(data, searchRequest);
  • 21. Forward Index Implementation (NonBlocking In-Memory HashMap, DiskStore)

        // Forward Index
        public class ForwardIndexStore<V> implements ForwardIndex<V> {
            private final DB<V> diskStore;
            private final Cache<V> cache;
            // ...

            @Override
            public Map<Long, V> asMap() {
                return Collections.unmodifiableMap(cache);
            }

            @Override
            public void put(long id, V value) {
                diskStore.put(id, value);
                cache.put(id, value);
            }
            // ...

            @Override
            public void commit() {
                diskStore.commit();
                cache.commit();
            }
        }
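One way to get the "one writer, many non-blocking readers" behaviour described on slides 19 to 21 is sketched below. It is an assumption-laden illustration, not the `ForwardIndexStore` from the talk: the class name `InMemoryForwardIndex` is made up, the disk store is omitted entirely, and a `ConcurrentHashMap` stands in for the slide's non-blocking cache. The interface is repeated from slide 20 so that the block compiles on its own.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Interface as shown on slide 20.
interface ForwardIndex<V> {
    Map<Long, V> asMap();
    void put(long id, V value);
    void putAll(Map<Long, V> values);
    void remove(long id);
    void commit();
}

// Hypothetical in-memory-only variant: writes are staged and become
// visible to readers only when commit() is called.
final class InMemoryForwardIndex<V> implements ForwardIndex<V> {
    // Read side: lookups never take a lock, so readers are not blocked by the writer.
    private final ConcurrentHashMap<Long, V> visible = new ConcurrentHashMap<>();
    // Write side: touched only by the single writer thread. A null value marks a removal.
    private final Map<Long, V> pending = new HashMap<>();

    @Override
    public Map<Long, V> asMap() {
        return Collections.unmodifiableMap(visible);
    }

    @Override
    public void put(long id, V value) {
        if (value == null) {
            throw new NullPointerException("value");
        }
        pending.put(id, value);
    }

    @Override
    public void putAll(Map<Long, V> values) {
        values.forEach((id, value) -> put(id, value));
    }

    @Override
    public void remove(long id) {
        pending.put(id, null);
    }

    @Override
    public void commit() {
        // Entries are published one by one, so a reader may observe a partially
        // applied commit; each individual listing is always seen whole.
        for (Map.Entry<Long, V> e : pending.entrySet()) {
            if (e.getValue() == null) {
                visible.remove(e.getKey());
            } else {
                visible.put(e.getKey(), e.getValue());
            }
        }
        pending.clear();
    }
}
```

Readers hold live object references rather than serialized bytes, which is what avoids the per-fetch deserialization cost listed in the requirements. The values should be immutable (as the `final` fields on slide 15 suggest) so that a reader never sees a half-updated listing.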
  • 22. Ranking. The ranking problem: it is not a text search problem. Users are almost never searching for a specific item; rather, they’re looking to “discover”. The most common component of a query is location. Highly personalized: the user is a part of the query. Optimizing for conversion (Search -> Inquiry -> Booking). Evolution through continuous experimentation.
  • 24. Several hundred signals are used to build machine learning models: properties of the listing (reviews, location, etc.); behavioral signals (mined from request logs); image quality and clickability (computer vision); host behavior (response time/rate, cancellations, etc.); host preferences model. Data sources (diagram labels): DB snapshots, logs.
  • 25. Life of a Query (diagram). Stages: Query Understanding, Retrieval, Populator, First Pass Scorer, Second Pass Ranking, Result Generation. Other labels: Geocoding; Configuring retrieval options; Choosing ranking models; Quality; Bookability; Relevance; AirEvents; Filtering by Price and Availability; result counts of 25, 2000, and 25.
  • 26. Second Pass Ranking. Traditional ranking works like this: [formula missing from transcript], then sort by [formula missing from transcript]. In contrast, second pass operates on the entire list at once: [formula missing from transcript]. This makes it possible to implement features like result diversity.
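The contrast on slide 26 between per-item scoring and list-level re-ranking can be illustrated as follows. The slide's formulas did not survive in the transcript, so this is only one possible reading: a greedy diversity re-ranker that discounts a result each time its neighborhood has already been picked. The names `Ranking`, `Result`, `firstPass`, `secondPass`, and the multiplicative `penalty` are all hypothetical, and scores are assumed to be non-negative.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class Ranking {
    // Hypothetical result: a listing id, a first-pass score, and a grouping key.
    static final class Result {
        final long listingId;
        final double score;
        final String neighborhood;

        Result(long listingId, double score, String neighborhood) {
            this.listingId = listingId;
            this.score = score;
            this.neighborhood = neighborhood;
        }
    }

    // Traditional ranking: every result is scored on its own, then the list is sorted.
    static List<Result> firstPass(List<Result> results) {
        List<Result> sorted = new ArrayList<>(results);
        sorted.sort(Comparator.comparingDouble((Result r) -> r.score).reversed());
        return sorted;
    }

    // Second pass: looks at the whole list. Each pick multiplies the score of the
    // remaining results from the same neighborhood by `penalty` (0 < penalty <= 1).
    static List<Result> secondPass(List<Result> results, double penalty) {
        List<Result> remaining = firstPass(results);
        List<Result> ranked = new ArrayList<>();
        Map<String, Integer> picked = new HashMap<>();
        while (!remaining.isEmpty()) {
            Result best = null;
            double bestAdjusted = Double.NEGATIVE_INFINITY;
            for (Result r : remaining) {
                int timesSeen = picked.getOrDefault(r.neighborhood, 0);
                double adjusted = r.score * Math.pow(penalty, timesSeen);
                if (adjusted > bestAdjusted) {
                    best = r;
                    bestAdjusted = adjusted;
                }
            }
            ranked.add(best);
            remaining.remove(best);
            picked.merge(best.neighborhood, 1, Integer::sum);
        }
        return ranked;
    }
}
```

A result's final position here depends on which results were placed above it, which a sort on independent per-item scores cannot express. The greedy loop is quadratic in the list length, which is acceptable for a candidate list of a few thousand results.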