The Anatomy of a Large-Scale Social Search Engine, WWW 2010
• Damon Horowitz, Sepandar D. Kamvar
• The Anatomy of a Large-Scale Social Search Engine
• WWW 2010

• Aardvark QA
• web
• QA
• Google
• Aardvark
• Google
“Do you have any good babysitter recommendations in Palo Alto for my 6-year-old twins? I’m looking for somebody that won’t let them watch TV.”
• Crawler and Indexer
• Query Analyzer
• Ranking Function
• UI
\[
s(u_i, u_j, q) = p(u_i \mid u_j) \cdot p(u_i \mid q)
               = p(u_i \mid u_j) \sum_{t \in T} p(u_i \mid t)\, p(t \mid q)
\]

• p(u_i | u_j): quality score
• p(u_i | q): relevance score
• u: user, q: question, t: topic
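The scoring formula above can be illustrated with a minimal Python sketch. The user names, topic names, and probabilities below are hypothetical values chosen only for illustration; the slides do not give concrete numbers, and this is not presented as Aardvark's actual implementation.

```python
from typing import Dict


def relevance(p_user_given_topic: Dict[str, float],
              p_topic_given_query: Dict[str, float]) -> float:
    """p(u_i | q) = sum over topics t of p(u_i | t) * p(t | q)."""
    return sum(p_user_given_topic.get(t, 0.0) * p_tq
               for t, p_tq in p_topic_given_query.items())


def score(quality: float,
          p_user_given_topic: Dict[str, float],
          p_topic_given_query: Dict[str, float]) -> float:
    """s(u_i, u_j, q) = p(u_i | u_j) * p(u_i | q)."""
    return quality * relevance(p_user_given_topic, p_topic_given_query)


# Hypothetical topic distribution p(t | q) for one question.
p_topic_given_query = {"babysitting": 0.7, "palo alto": 0.3}

# Hypothetical candidates: "quality" stands in for p(u_i | u_j),
# "topics" stands in for p(u_i | t).
candidates = {
    "alice": {"quality": 0.5, "topics": {"babysitting": 0.4, "palo alto": 0.2}},
    "bob": {"quality": 0.9, "topics": {"palo alto": 0.1}},
}

scores = {
    name: score(c["quality"], c["topics"], p_topic_given_query)
    for name, c in candidates.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

In this toy example the candidate with the lower quality score still ranks first, because the relevance term over topics dominates the product.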
  
P(u_i | t)
\[
p(u_i \mid t) = \frac{p(t \mid u_i)\, p(u_i)}{p(t)}
\]
\[
s(t \mid u_i) = p(t \mid u_i) + \gamma \sum_{u \in U} p(t \mid u)
\]
\[
\sum_{t \in T} p(t \mid u_i) = 1
\]
     • facebook
     • blog
     • twitter
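The Bayes inversion and the smoothing step above can be sketched in Python as follows. This is a sketch under stated assumptions: the slide does not define the set U, so the hypothetical `neighbors` argument stands in for it; p(t) is computed by summing over the users passed in; and the profiles are made-up values.

```python
from typing import Dict, Iterable

Profile = Dict[str, float]  # topic -> p(t | u)


def smooth(user: str, profiles: Dict[str, Profile],
           neighbors: Iterable[str], gamma: float) -> Profile:
    """s(t | u_i) = p(t | u_i) + gamma * sum_{u in U} p(t | u),
    renormalized so that the result sums to 1 over topics."""
    raw = dict(profiles[user])
    for u in neighbors:  # assumption: U is the given set of other users
        for topic, p in profiles[u].items():
            raw[topic] = raw.get(topic, 0.0) + gamma * p
    total = sum(raw.values())
    return {topic: value / total for topic, value in raw.items()}


def invert(p_topic_given_user: Dict[str, Profile],
           p_user: Dict[str, float], topic: str) -> Dict[str, float]:
    """p(u_i | t) = p(t | u_i) p(u_i) / p(t), with p(t) = sum_u p(t | u) p(u)."""
    joint = {u: prof.get(topic, 0.0) * p_user[u]
             for u, prof in p_topic_given_user.items()}
    p_t = sum(joint.values())
    if p_t == 0.0:
        return {u: 0.0 for u in joint}
    return {u: j / p_t for u, j in joint.items()}


# Hypothetical profiles, e.g. topics extracted from the sources listed above.
profiles = {
    "alice": {"cooking": 1.0},
    "bob": {"cooking": 0.5, "hiking": 0.5},
}
smoothed = smooth("alice", profiles, neighbors=["bob"], gamma=0.2)
posterior = invert({"alice": smoothed, "bob": profiles["bob"]},
                   {"alice": 0.5, "bob": 0.5}, "hiking")
print(smoothed, posterior)
```

After smoothing, "alice" gets a small non-zero weight for a topic she never listed herself, which is what lets her appear, with low probability, as a candidate for that topic.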
P(u_i | u_j)
P(t | q):
•  Non-Question Classifier
•  Inappropriate Question Classifier
•  Trivial Question Classifier
•  Location Sensitive Classifier
P(t | q):
     –  Keyword Match Topic Mapper
     –  Taxonomy Topic Mapper
         •  SVM over a taxonomy of roughly 3000 topics
     –  Salient Term Topic Mapper
         •  tf-idf
     –  User Tag Topic Mapper
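As a rough illustration of the tf-idf idea behind the Salient Term Topic Mapper, the following Python sketch scores each term of a question against a small background corpus. Treating every term as a candidate topic and normalizing the weights into a p(t | q)-like distribution are assumptions made here for brevity; the function names and the background corpus are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def salient_terms(question: str, background: List[str]) -> Dict[str, float]:
    """Weight each question term by tf * idf and normalize to sum to 1."""
    n_docs = len(background)
    doc_freq: Counter = Counter()
    for doc in background:
        doc_freq.update(set(tokenize(doc)))
    term_freq = Counter(tokenize(question))
    # Smoothed idf: unseen terms get the largest weight, and terms that
    # occur in every background document get weight 0.
    weights = {
        term: count * math.log((1 + n_docs) / (1 + doc_freq[term]))
        for term, count in term_freq.items()
    }
    total = sum(weights.values())
    if total == 0.0:
        return {}
    return {term: w / total for term, w in weights.items()}


# Hypothetical background corpus of earlier questions.
background = [
    "do you have a pen",
    "i have a good idea",
    "do you know a good place",
]
result = salient_terms("do you have a good babysitter", background)
print(max(result, key=result.get), result)
```

A production mapper would more plausibly work on phrases rather than single tokens; this sketch only shows how tf-idf separates rare, informative terms from common ones.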
     –  Topic Expertise: p(u_i | q)
     –  Connectedness: p(u_i | u_j)
     –  Availability
     –  Google PC
•  Mobile Google Aardvark
     –  Google Aardvark

              Average query length    Unique queries
Aardvark      18.6 words              98.1%
Web search    2.2 to 2.9 words        57 to 63%
     –  fact
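•  57.2% of questions received their first answer in under 10 minutes
     –  On Facebook, 15.7% of questions are answered within 15 minutes
•  Median response time: 6 minutes 37 seconds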
•  87.7% of questions received at least one answer
•  2.08 answers per question on average
•  97.7% of users have at least 3 topics in their profiles
•  174,605 distinct topics
•  1,199,323 topics added to profiles in total
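•  Comparison with Google
     –  200 Aardvark questions
     –  Answers from Aardvark and Google rated on a 5-point scale
     –  10-minute limit per question

              Median time (minutes)    Questions answered    Mean rating
Aardvark      5                        71.5%                 3.93 ± 1.23
Google        2                        70.5%                 3.07 ± 1.46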
Lab seminar20100604
