Monkey with the Semantic Web
SearchMonkey

Presentation by: Paul Tarjan, Chief Technical Monkey (ptarjan@yahoo-inc.com)

Online at: http://www.slideshare.net/ptarjan/semantic-searchmonkey
The web was / is fragmented


  •  Funny pictures
  •  Super secret military site
  •  Friend’s website
  •  University event page
  •  Cool bookmarks
So we added search to find stuff


  •  Google, Yahoo
     –  Super secret military site
     –  Funny pictures
     –  Friend’s website
     –  University event page
     –  Cool bookmarks
But there are many similar sites



   Facebook Events    Evite Events   Upcoming Events



      Youtube          Metacafe          Vimeo




        Digg            Reddit          Technorati




    Let’s treat these as “views” onto “objects”
Wouldn’t it be cool if you could do:

  •  object:video creator:“Paul Tarjan”
     length<=60s
Wouldn’t it be cool if you could do:

  •  object:video creator:http://paulisageek.com/
     length<=60s
Wouldn’t it be cool if you could do:

  •  object:game name:“Desktop Tower Defense”
     version:1.5 publishdate:“May 2 2005”
Wouldn’t it be cool if you could do:

  •  object:video author:“The Escapist”
     game:“Left 4 Dead”
It gets even cooler
Aggregation:

  •  object:review type:camera make:canon
     model:D40
Aggregation:

  •  object:event date:“May 16, 2008”
     type:party price<$5
Aggregation:

  •  object:photo person:“Paul Tarjan”
Aggregation:

  •  object:photo person:http://paulisageek.com
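
The queries on these slides treat pages from different sites as records with typed fields. A minimal sketch of that idea, assuming a hypothetical in-memory list of objects that have already been extracted from several sites. The records, the field names and the `search` helper are invented for illustration; this is not a real search syntax or API.

```python
# Hypothetical records, as if extracted from structured markup on different sites.
OBJECTS = [
    {"object": "video", "creator": "Paul Tarjan", "length": 45, "site": "youtube"},
    {"object": "video", "creator": "Paul Tarjan", "length": 300, "site": "vimeo"},
    {"object": "video", "creator": "Someone Else", "length": 30, "site": "metacafe"},
    {"object": "photo", "person": "Paul Tarjan", "site": "example.org"},
]


def search(objects, max_length=None, **fields):
    """Return the objects whose fields equal the given values.

    max_length, if given, keeps only objects with a 'length' (in seconds)
    at or below it; objects without a length are dropped in that case.
    """
    results = []
    for obj in objects:
        if any(obj.get(key) != value for key, value in fields.items()):
            continue
        if max_length is not None and obj.get("length", float("inf")) > max_length:
            continue
        results.append(obj)
    return results


# Roughly: object:video creator:"Paul Tarjan" length<=60s
short_videos = search(OBJECTS, object="video", creator="Paul Tarjan", max_length=60)
print([video["site"] for video in short_videos])
```

The point of the sketch is that the same query runs across every site once each page exposes the same fields, which is what the markup recommendations in the following slides are for.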
The Semantic What?

  •  Web pages are views of data for people to
     read
  •  Search Engines are a hack
  •  They treat pages as a bucket of words
  •  Let’s turn the web into a database
  •  APIs are good, but there is no “web” of APIs
  •  If you figure out a good way of doing that, let
     me know 
OK, I want to do it. Now what?
Recommendation: µF

  •  If there is a microformat for your data, use it
     –  hCard
     –  hReview
     –  hResume
     –  hCalendar
     –  rel-tag
     –  rel-license
     –  XFN
     –  hAtom
     –  geo
µF in a nutshell

  •  Change your @class to something that is known
  •  <div>
     –  <span class="name">Paul Tarjan</span>
     –  <span class="email">spam@paulisageek.com</span>
  •  </div>
  •  BECOMES
  •  <div class="vcard">
     –  <span class="fn">Paul Tarjan</span>
     –  <span class="email">spam@paulisageek.com</span>
  •  </div>
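
Once the class names are the well-known ones, a generic consumer can read the card without knowing anything about the site. A rough sketch using Python's standard `html.parser`, assuming only the two properties shown on the slide (`fn` and `email`). The `HCardParser` class is hypothetical and is not a conformant microformats parser; for instance, it does not handle void elements such as `<br>`.

```python
from html.parser import HTMLParser

HCARD_HTML = """
<div class="vcard">
  <span class="fn">Paul Tarjan</span>
  <span class="email">spam@paulisageek.com</span>
</div>
"""


class HCardParser(HTMLParser):
    """Collects the text of fn/email elements found inside a class="vcard" element."""

    WANTED = {"fn", "email"}

    def __init__(self):
        super().__init__()
        self.depth_in_vcard = 0  # greater than 0 while inside a vcard element
        self.current = None      # property name whose text is being read
        self.card = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth_in_vcard:
            self.depth_in_vcard += 1
            hits = self.WANTED.intersection(classes)
            if hits:
                self.current = sorted(hits)[0]
        elif "vcard" in classes:
            self.depth_in_vcard = 1

    def handle_endtag(self, tag):
        if self.depth_in_vcard:
            self.depth_in_vcard -= 1
        self.current = None

    def handle_data(self, data):
        if self.current and data.strip():
            self.card[self.current] = data.strip()


parser = HCardParser()
parser.feed(HCARD_HTML)
print(parser.card)
```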
Recommendation: RDFa

  •  If you have data that doesn’t really fit in a
     µF
  •  Examples:
    –  Markup APIs (YUI, javadoc, etc)
    –  Media (Audio, Video, Games, Presentations)
    –  Job Postings
RDFa in a nutshell

  •  Make a namespace
  •  Use @property, @rel and @resource
  •  For DATA: @property makes the node
     contents into the value 
  •  For URLs: @rel makes the @resource into
     the value
Normal HTML

  •  <html>
     …
     <div class="private">
       private static String
       <strong>_createCookieHash</strong> (hash)
     …
RDFa: example

  •  <html xmlns:yui="http://yuilibrary.com/rdf/1.0/yui.rdf#">
     …
     <div class="private" rel="yui:method"
          resource="#method__createCookieHash">
       private static String
       <strong property="yui:name">_createCookieHash</strong> (hash)
     …
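
To make the two rules concrete (@property takes the node contents, @rel takes the @resource), here is a small sketch that pulls predicate/value pairs out of the fragment above with Python's standard `html.parser`. The `RdfaSketch` class is hypothetical and deliberately naive: it ignores subjects, namespaces and nesting, so it is an illustration of the attribute roles and not an RDFa processor.

```python
from html.parser import HTMLParser

RDFA_HTML = """
<div class="private" rel="yui:method" resource="#method__createCookieHash">
  private static String
  <strong property="yui:name">_createCookieHash</strong> (hash)
</div>
"""


class RdfaSketch(HTMLParser):
    """Records (predicate, value) pairs from @rel/@resource and from @property."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self.open_property = None  # @property of the element being read
        self.text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "rel" in attrs and "resource" in attrs:
            # For URLs: @rel names the predicate, @resource is the value.
            self.pairs.append((attrs["rel"], attrs["resource"]))
        if "property" in attrs:
            # For data: @property names the predicate, the node text is the value.
            self.open_property = attrs["property"]
            self.text = []

    def handle_data(self, data):
        if self.open_property is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if self.open_property is not None:
            self.pairs.append((self.open_property, "".join(self.text).strip()))
            self.open_property = None


sketch = RdfaSketch()
sketch.feed(RDFA_HTML)
print(sketch.pairs)
```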
That’s it!

   •  Automatically picked up by semantic
      parsers / crawlers
   •  Can build a SearchMonkey app on it
   •  Can make a mashup way easier than screen
      scraping
   •  Can get the data from Yahoo! BOSS
What is SearchMonkey?

  •  An open platform for using structured data to build more useful and relevant search results

[Screenshots: a search result before and after]
Enhanced Result: Zagat




  •  Image
  •  Links
  •  Key/Value Pairs or Abstract
Infobar: Wikipedia Preview




  •  Summary
  •  Blob
Part of the puzzle


           Semantic vocabularies


    Semantic markup on web pages


                SearchMonkey
Vocabularies

  •  Need to speak the same language
  •  I like to see girls of that... caliber.
  •  English, French, Spanish, Esperanto?
  •  URLs to the rescue
     –  Dublin Core (http://purl.org/dc/elements/1.1/)
     –  Friend of a Friend (http://xmlns.com/foaf/0.1/)
     –  XHTML Friends Network (http://gmpg.org/xfn/11/)
     –  … (many more)
Syntax

  •  Nouns, Verbs, and Adjectives, oh my!
  •  All phrases become lots of triples
  •  (Subject, Verb / Adj. / Prep. / etc, Object)
  •  Key / Value pairs ++
     –  Everything is a URL or String
     –  Subject doesn’t have to be the document
Syntax 2

  •  Key / Value pair
     –  Title = Awesome SearchMonkey Presentation
     –  Homepage =
        http://search.yahoo.com/searchmonkey
  •  Triples
     –  (self, http://purl.org/dc#title, “Awesome
        SearchMonkey Presentation”)
     –  (self, http://vcard#url,
        http://search.yahoo.com/searchmonkey)
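
The step from key/value pairs to triples is mechanical: pick a subject, and replace each key with a URL from a vocabulary. A minimal sketch, reusing the shortened predicate URLs exactly as they appear on this slide; the `to_triples` helper and the `VOCABULARY` mapping are names made up for the example.

```python
# Maps the human-readable keys to the predicate URLs used on the slide.
VOCABULARY = {
    "Title": "http://purl.org/dc#title",
    "Homepage": "http://vcard#url",
}

PAGE = {
    "Title": "Awesome SearchMonkey Presentation",
    "Homepage": "http://search.yahoo.com/searchmonkey",
}


def to_triples(subject, pairs, vocabulary):
    """Turn key/value pairs into (subject, predicate, object) triples."""
    return [(subject, vocabulary[key], value) for key, value in pairs.items()]


triples = to_triples("self", PAGE, VOCABULARY)
for triple in triples:
    print(triple)
```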
Decompose to triples

  •  My friend “Bob” is an idiot.
     –  (self, http://xmlns.com/foaf/0.1/knows,
        genid:Ui__152310312_366)
     –  (genid:Ui__152310312_366,
        http://www.w3.org/2001/vcard-rdf/3.0#fn, “Bob”)
     –  (genid:Ui__152310312_366,
        http://example.org/ptarjan/isInstanceOf,
        http://example.org/ptarjan/idiot)
  •  Unnamed nodes are O.K.
Writing URLs takes a lot of work!

  •  xmlns:foaf=http://xmlns.com/foaf/0.1/
  •  xmlns:vcard=http://www.w3.org/2001/vcard-rdf/3.0#
  •  xmlns:junk=http://example.org/ptarjan/
  •  My friend “Bob” is an idiot.
     –  (self, foaf:knows, genid:Ui__152310312_366)
     –  (genid:Ui__152310312_366, vcard:fn, “Bob”)
     –  (genid:Ui__152310312_366, junk:isInstanceOf, junk:idiot)
  •  Unnamed nodes are O.K.
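
Prefix expansion is plain string concatenation: the namespace URL plus the part after the colon. A short sketch using the three prefixes declared on this slide; the `expand` function is a hypothetical helper, and terms whose prefix is not declared (such as the `genid:` node or a plain string) are passed through unchanged.

```python
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "vcard": "http://www.w3.org/2001/vcard-rdf/3.0#",
    "junk": "http://example.org/ptarjan/",
}


def expand(term, prefixes=PREFIXES):
    """Expand prefix:local into a full URL when the prefix is declared."""
    prefix, separator, local = term.partition(":")
    if separator and prefix in prefixes:
        return prefixes[prefix] + local
    return term


SHORT_TRIPLES = [
    ("self", "foaf:knows", "genid:Ui__152310312_366"),
    ("genid:Ui__152310312_366", "vcard:fn", "Bob"),
    ("genid:Ui__152310312_366", "junk:isInstanceOf", "junk:idiot"),
]

full_triples = [tuple(expand(term) for term in triple) for triple in SHORT_TRIPLES]
for triple in full_triples:
    print(triple)
```

Note that the trailing `/` or `#` of the namespace matters: without it, `foaf:knows` would expand to `http://xmlns.com/foaf/0.1knows`.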
RDFa

  •  <html xmlns:foaf="http://xmlns.com/foaf/0.1/"
           xmlns:vcard="http://www.w3.org/2001/vcard-rdf/3.0#"
           xmlns:junk="http://example.org/ptarjan/">
       <div rel="foaf:knows">
         <span property="vcard:fn">Bob</span>
         <span rel="junk:isInstanceOf" resource="junk:idiot" />
       </div>
     </html>
•  </SemanticWeb>


•  Questions?
Innards of SearchMonkey

  •  You build a web-service inside our
     framework
  •  When a search page renders
    –  We check which SM apps are enabled
    –  We call them
       • 50ms for in-page
       • Long time for AJAX
    –  They return data in our template
    –  We render them (and cache)
Prototyping with XSLT

  •  What if I don’t have structured data?
     –  I don’t own the site
     –  I do own the site, but I want to prototype first
  •  Build an XSLT custom data service first
     –  Write some XSLT to extract the data and
        transform it into DataRSS
     –  Mostly about finding the right XPath (use
        Firebug or XPather)
     –  Quick to implement, but brittle
     –  Can’t do a good Enhanced Result
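
The "find the right XPath" step can be tried out without any XSLT tooling. Python's standard library has no XSLT processor, so the sketch below swaps in `xml.etree.ElementTree` and its limited XPath support as a stand-in. The page markup, the `FIELDS` mapping and the `extract` helper are invented for the example, and the DataRSS output format is not modelled; the result is just a dictionary keyed by Dublin Core style names.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page markup standing in for a real site.
PAGE = """
<html>
  <body>
    <div id="listing">
      <h1>Desktop Tower Defense</h1>
      <p class="summary">A tower defense game.</p>
    </div>
  </body>
</html>
"""

# One XPath expression per output field; the keys name the vocabulary terms.
FIELDS = {
    "dc:title": ".//div[@id='listing']/h1",
    "dc:description": ".//p[@class='summary']",
}


def extract(page_source, fields):
    """Apply each XPath to the page and collect the text of the first match."""
    root = ET.fromstring(page_source)
    data = {}
    for name, xpath in fields.items():
        node = root.find(xpath)
        if node is not None and node.text:
            data[name] = node.text.strip()
    return data


print(extract(PAGE, FIELDS))
```

This also shows why the approach is brittle: if the site renames `id="listing"` or the `summary` class, the expressions silently match nothing and the fields disappear.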
Do it for real

   •  Demo
Examples



  •  Rubik’s Cube
  •  VTA Bus
  •  API Monkey
  •  BugMeNot
  •  RetailMeNot
  •  Amazon
questions?

Editor’s notes

  1. A SearchMonkey Enhanced Result contains a great deal of structured data. It could have a picture, key/value pairs, deep links… This kind of information goes far beyond what normal search results give you: a title and an auto-extracted summary. Where does this information come from?
  2. Likewise, an Infobar has a summary (what the user sees before the pane is expanded) and a “blob”, an area of free-form HTML.
  3. XSLT custom data services are excellent when there is no good structured data available, either because you don’t own the site in question, or because you just want to get a prototype out quickly without having to change your site’s template markup. You can use these data services to mock up what is possible with SearchMonkey. As with the PHP, the XSLT is fairly simple. The “hard” part of writing the stylesheet is really just finding the right XPath expression for extracting the information you want. The other thing you need to do is pick a good vocabulary for describing the extracted data. For example, a description is a dc:description (Dublin Core description) and so on. If the page is not well-formed XHTML, have no fear: we tidy up the page ahead of time and run the XSLT on that. The tidying can fail, but only if the markup is really pathologically bad. As we mentioned before, XSLT custom data services are good for mocking up Enhanced Results, but they’re too slow in practice. For a production-quality app, you’ll need to use them in infobars. [Show demo]