3. Approach: Outside -> In
1. Use external sources to find interesting entities related to a given date
2. Feed those entities into a query to the Discovery API
3. Present the entities and a set of related holdings from WorldCat
8. WorldCat Discovery API
Things -> Strings -> Things
creator:[author name]
name:[book title]
subject:[author name | country name | book title]
Add number of results to ranking
rank += 10 * number_dapi_results
Important to consider the context:
* OCLC's Developer House event, December 2015
* Focus on Linked Data and the WorldCat Discovery API
- OCLC-provided venue and staff to help get up to speed on the Discovery API as well as Linked Data concepts
* Work with colleagues to prototype a tool or service
PROTOTYPE: I'll present the prototype my team -- and it was a team, whom I am attempting to represent -- worked on, what it was intended to demonstrate, and how it works. I think it's cool, but don't confuse this with anything production-ready.
Which won't be hard, because I want to highlight some "opportunities" for future work along these lines
... by which I mean: not all of this worked very well, and I want to talk about the shortcomings and hard problems we encountered as well as the things that worked.
The pitch for this tool would be something like "This day in history"
* could we leverage Linked Data and the Discovery API to highlight holdings for a given day?
* applications could be:
- drive a recommendation tile on a search results page or a website
- drive content for digital signage or other building displays
* Something fully automated would be great, but even a weekly email of "interesting things for the next week" could be useful.
Approach: Outside -> In
* use external sources to find *interesting* *entities* related to a given date
* feed those entities into a query to the Discovery API
* present the entities and a set of interesting holdings from WorldCat
Two things here:
* entities: what should we be searching for? how is that thing related to the date?
* interesting: how do we know whether an entity is interesting?
DBpedia
So, the entities... Find entities via DBpedia
* "DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web"
* Wikipedia as Linked Data
* Things not strings
* RDF (Resource Description Framework) data model makes statements using triples; subject -> predicate -> object
* we were looking for subject -> predicate -> date
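To make the triple shape concrete, here is a minimal sketch in Python. The sample triples are hand-written for illustration and only mimic DBpedia-style identifiers; the `Triple` type and the `on_this_day` helper are hypothetical names, not part of the prototype.

```python
from datetime import date
from typing import NamedTuple, Union


class Triple(NamedTuple):
    """One RDF-style statement: subject -> predicate -> object."""
    subject: str
    predicate: str
    obj: Union[str, date]


# Hand-written sample statements (illustrative, not pulled from DBpedia).
TRIPLES = [
    Triple("dbr:David_Foster_Wallace", "dbo:birthDate", date(1962, 2, 21)),
    Triple("dbr:David_Foster_Wallace", "rdf:type", "dbo:Writer"),
    Triple("dbr:Infinite_Jest", "dbo:author", "dbr:David_Foster_Wallace"),
]


def on_this_day(triples, month, day):
    """Keep only triples whose object is a date on the given month and day."""
    return [
        t for t in triples
        if isinstance(t.obj, date) and (t.obj.month, t.obj.day) == (month, day)
    ]
```

Only the first statement has a date as its object, so it is the only one that answers "what is related to February 21?".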
DBpedia has a SPARQL interface
* (SPARQL = SPARQL Protocol and RDF Query Language)
* So we can create queries against DBpedia to find entities related to our date
Tried out a bunch of different examples to see what worked.
* Authors who were born on a date
* Books that were published on a date
* Countries that were founded on a date
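As a rough illustration of the first example, the sketch below builds a SPARQL query string for writers born on a given month and day. It is not the team's actual query: the function name is hypothetical, the choice of `dbo:Writer` and `dbo:birthDate` is an assumption about which DBpedia ontology terms to use, and matching on the end of the date's string form is just one simple way to ignore the year. It has not been run against the live endpoint.

```python
def authors_born_on_query(month: int, day: int, limit: int = 50) -> str:
    """Build a SPARQL query for writers whose birth date ends in -MM-DD."""
    if not (1 <= month <= 12 and 1 <= day <= 31):
        raise ValueError("month must be 1-12 and day must be 1-31")
    suffix = f"-{month:02d}-{day:02d}"
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT DISTINCT ?author ?birthDate WHERE {\n"
        "  ?author a dbo:Writer ;\n"
        "          dbo:birthDate ?birthDate .\n"
        # Assumption: comparing the literal's string form sidesteps
        # differences in how endpoints handle MONTH()/DAY() on xsd:date.
        f'  FILTER (STRENDS(STR(?birthDate), "{suffix}"))\n'
        "}\n"
        f"LIMIT {limit}"
    )
```

The resulting string could be sent to a SPARQL endpoint with any HTTP client; the books and countries examples would follow the same pattern with a different class and date predicate.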
(Not going to get too into the nitty-gritty here, code is on GitHub)
Statements about the DBpedia resource David Foster Wallace
* fits our criteria
* also, sorta-kinda gives us interesting information like a VIAF ID
Now about "interesting"
* I don't know if you've noticed, but there's a lot of stuff in Wikipedia -- how do we rank the date-related entities?
* each entity has a Wikipedia page, and each Wikipedia page will tell you things about itself through the Wikipedia API
* using that relationship, we can pull some data to compute a score for each entity that we can sort on
Rank = revision_count + (10 * article_length) + number_external_links
(all of the Wikipedia variables are normalized)
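A minimal sketch of that ranking in Python follows. The slides do not say how the variables were normalized, so min-max scaling to the 0-1 range is an assumption here, and the function names and dictionary keys are hypothetical.

```python
def min_max_normalize(values):
    """Scale values to 0..1; assumed normalization, not confirmed by the slides."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_entities(entities):
    """Return (name, rank) pairs sorted from most to least interesting.

    Each entity is a dict with keys: name, revision_count,
    article_length, number_external_links.
    """
    revisions = min_max_normalize([e["revision_count"] for e in entities])
    lengths = min_max_normalize([e["article_length"] for e in entities])
    links = min_max_normalize([e["number_external_links"] for e in entities])
    scored = [
        # Rank = revision_count + (10 * article_length) + number_external_links
        (e["name"], r + 10 * a + x)
        for e, r, a, x in zip(entities, revisions, lengths, links)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With the weight of 10 on article length, a long article outranks a short one even if the short one has been revised more often.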
Political element to this that I don't want to gloss over entirely: we're highlighting things that are well-represented in Wikipedia. Worth considering what limitations that puts on what we're highlighting -- and, more importantly, what we're not seeing.
Here's where the "outside-in" approach gets tricky
* A lot of what we might like to do in a linked data way, we can't do with the Discovery API at this point
* We need to go back from our "thing" to some "strings" -- we map some string values from our entities to the query-able indices in the Discovery API
* And we add the number of results back to the ranking to indicate more interest
* Big opportunity for improvement, even for strings, by, say, reconciling against FAST and plugging those strings (if not the URIs) into DAPI Diem
* Could also further mine the statements about the entity to find further information for use with DAPI Diem (e.g., look up works and query for them more specifically, or look up related topics/authors)
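The "things -> strings" step can be sketched as below. The index prefixes (`creator:`, `name:`, `subject:`) and the `rank += 10 * number_dapi_results` bump come from the slide; the function names, the entity type labels, and the unquoted query format are assumptions, and no call to the Discovery API is made here.

```python
# Which Discovery API indices each kind of entity is searched in,
# following the slide: creator / name / subject.
QUERY_TEMPLATES = {
    "author": ["creator:{}", "subject:{}"],
    "book": ["name:{}", "subject:{}"],
    "country": ["subject:{}"],
}


def discovery_queries(entity_type: str, label: str) -> list:
    """Turn an entity (a "thing") into index-prefixed query strings."""
    if entity_type not in QUERY_TEMPLATES:
        raise ValueError(f"no query mapping for entity type: {entity_type}")
    return [template.format(label) for template in QUERY_TEMPLATES[entity_type]]


def add_result_bonus(rank: float, number_dapi_results: int) -> float:
    """Apply the slide's adjustment: rank += 10 * number_dapi_results."""
    return rank + 10 * number_dapi_results
```

For example, `discovery_queries("author", "David Foster Wallace")` yields a `creator:` query and a `subject:` query; whatever result count comes back would then be passed to `add_result_bonus`.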
Example of some things DAPI Diem brings up for November 11
- image where available
- statement about the relationship to the day
- Description
- 5 items from WorldCat
Bigger image
* sources
* thumbnail images never made it out of the backlog...
* This works pretty well, some duplication but useful enough...
Not everything works that well
the good...
...the bad...
...the ugly
Credits and code
My fantastic teammates
A nod to OCLC’s help along the way
Code is on GitHub