The Listening Experience Database (LED) project collects primary evidence of listening experiences from any historical period and musical genre. Through supervised crowdsourcing it has gathered over 8,000 listening experiences, which are stored as Linked Open Data using a modular ontology. The data describe people, locations, musical works and genres, and are linked to external datasets such as DBpedia to enrich the representation of these entities. Ongoing work includes text mining of the evidence, analytics on the structured data, and the adoption of controlled vocabularies.
1. The Listening Experience Database
Alessandro Adamou
Knowledge Media Institute, The Open University
alessandro.adamou@open.ac.uk
@anticitizen79
2. Background
Research gap between leading strands of analysis of the musical
experience (cognitive, commercial, critical) [1,2], widened during
the Web of Data age.
Primary sources assumed to exist in significant quantities but:
• unstructured (or worse, not digitised) and/or
• unpublished and/or
• domain-biased (popular interest, phonographic era, social media)
LED consortium formed end 2012 to collate primary evidence of listening.
• £0.75m AHRC grant (2013-15)
• £0.98m AHRC grant (2016-19)
[1] S. Burstyn. In quest of the period ear. Early Music, XXV(4):692-701, November 1997.
[2] R. C. Wegman. Music as heard: Listeners and listening in Late-Medieval and Early Modern
Europe (1300-1600). Musical Quarterly, 82(3-4):432-433, 1998.
3. Crowdsourcing in databases
Obtaining data by soliciting contributions from a
community.
examples:
• Historic Cambridge Newspaper Collection
• Zooniverse (SETIlive, Old Weather etc.)
• Wikimedia Foundation projects
• setlist.fm, discogs.com
• UK Reading Experience Database
“Must a modern database really start up empty today?”
4. Inclusion protocols
• From any historical period and culture
– current oldest entry is 11th Century AD
• Involving any musical genre
• Must be documented with a referenceable source
– also unpublished, if obtained from an archival resource
– e.g. diaries, private correspondence, oral history, official
papers, (auto)biographies, social media
• No solicited criticism or fictional accounts
• No minimum standard for the level of detail in
describing the entities involved
• In English (originally, or in an official translation)
5. LED-in-a-slide
http://open.ac.uk/Arts/LED
8227 individual listening experiences / ~10k submissions
Evidence from published sources as well as manuscripts
Supervised crowdsourcing by experts and enthusiasts
Implemented using Linked Data, cross-domain data reuse
Faceted browsing, search, geographical browsing
~ 400,000 RDF triples
9. Native Linked Data implementation
• All generated data are entirely stored as RDF triples
– Browsing, searching etc. directly on the quad store
• Multi-tenancy and crowdsourcing model with named graphs
• Modular ontology using Bibo, DC, Music Ontology, FOAF,
Schema.org (and a little in-house)
• Data reuse and reconciliation with external sources is
integrated with the whole lifecycle
• Flexibility of SPARQL query interface: not constrained by the
facets offered by the Web portal.
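A minimal sketch of how named graphs can support the multi-tenancy and crowdsourcing model listed above, assuming each contributor writes to a private graph and a moderator promotes approved statements to a public graph. The `QuadStore` class, the graph names and the `led:listener` predicate are hypothetical and not taken from the LED codebase; a real deployment would sit on a SPARQL-capable quad store.

```python
from collections import defaultdict


class QuadStore:
    """Toy in-memory quad store: triples grouped by named graph."""

    def __init__(self):
        self._graphs = defaultdict(set)

    def add(self, graph, s, p, o):
        """Store one triple inside the given named graph."""
        self._graphs[graph].add((s, p, o))

    def match(self, s=None, p=None, o=None, graphs=None):
        """Yield (graph, s, p, o) for matching quads; None acts as a wildcard."""
        names = list(self._graphs) if graphs is None else graphs
        for g in names:
            for triple in sorted(self._graphs.get(g, ())):
                if all(want is None or want == got
                       for want, got in zip((s, p, o), triple)):
                    yield (g, *triple)

    def promote(self, source, target):
        """Copy every triple of a contributor graph into the target graph."""
        self._graphs[target] |= self._graphs.get(source, set())


# Hypothetical graph names and identifiers, for illustration only.
store = QuadStore()
store.add("graph/user/alice", "led:experience/1", "led:listener", "dbp:Benjamin_Britten")
store.add("graph/user/bob", "led:experience/2", "led:listener", "dbp:Jane_Austen")
store.promote("graph/user/alice", "graph/public")
public = list(store.match(p="led:listener", graphs=["graph/public"]))
```

Restricting `match` to the public graph returns only moderated statements, while omitting `graphs` queries every tenant at once, which mirrors the point that queries on the store are not limited to the facets of the portal.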
10. External Linked datasets
● British National Bibliography http://bnb.data.bl.uk
○ Published works in the UK
● DBpedia http://dbpedia.org
○ Geographical data, musical works, published works worldwide
● LinkedBrainz http://linkedbrainz.org
○ Discontinued, but reengineering code has been made public
● data.gov.uk http://reference.data.gov.uk
○ Exact time instants and intervals in the British calendar
● VIAF http://viaf.org
○ for bridging alignments between BNB and DBpedia (mainly)
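The bridging role of VIAF can be sketched as a join on a shared identifier: if a BNB entity and a DBpedia entity both point to the same VIAF record, they are candidates for alignment. This is only an illustration of the idea; the function name is hypothetical and the VIAF identifiers below are invented placeholders, not real records.

```python
def bridge_alignments(bnb_to_viaf, dbpedia_to_viaf):
    """Pair BNB and DBpedia identifiers that point to the same VIAF record."""
    viaf_to_dbpedia = {}
    for dbp_id, viaf_id in dbpedia_to_viaf.items():
        viaf_to_dbpedia.setdefault(viaf_id, []).append(dbp_id)

    pairs = []
    for bnb_id, viaf_id in bnb_to_viaf.items():
        for dbp_id in viaf_to_dbpedia.get(viaf_id, []):
            pairs.append((bnb_id, dbp_id))
    return sorted(pairs)


# Placeholder VIAF identifiers, invented for illustration only.
bnb_links = {
    "bnb:AustenJane1775-1817": "viaf:EXAMPLE-1",
    "bnb:UnmatchedAuthor": "viaf:EXAMPLE-9",
}
dbpedia_links = {
    "dbpedia:Jane_Austen": "viaf:EXAMPLE-1",
    "dbpedia:Aaron_Copland": "viaf:EXAMPLE-2",
}
alignments = bridge_alignments(bnb_links, dbpedia_links)
```

Entities without a shared VIAF record simply produce no pair, so they would be left for manual reconciliation.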
11. Dealing with vagueness
• Under-represented or vague spatial data
– e.g. “at home in Haymarket”, “church in Italy”, “a trip from Vienna to
London”
• Not fully qualified temporal instants or intervals
– e.g. “April 3, the 1820s”, “late 18th Century”, “Summer 1938, at night”,
“sometime between [...]”
• Entities being described but not named
– e.g. “British soldiers”, “Anglican Mass”, “mourners of Felix
Mendelssohn”, “Mrs. Britten”
• Unaligned semantics
– e.g. “Chords”, “Electric Guitar”, “Gibson Les Paul Sunburst”
– e.g. “King of England”, “Queen”, “Monarch”
Not the primary application of Linked Data, but the paradigm and
founding semantics can be adapted (to an extent).
12. Spatial data extraction
Pipeline: Free text input → Stanbol + OpenNLP → Curated input → RDF + GeoSPARQL
• Input text: “Apollo Theatre, gallery”
• Candidate entities: dbp:Apollo_Theater (Manhattan), dbp:Apollo_Theatre (City of Westminster)
• Resulting RDF: led:place/12345 sg:sfIntersects dbp:Apollo_Theater ; rdfs:label “Apollo ...gallery”@en
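The curation step in this pipeline can be sketched as follows, assuming the entity-extraction stage has already proposed candidate entities and a curator picks one by its disambiguating context. The function name and data layout are hypothetical; the candidates are hard-coded from the slide, and the Stanbol + OpenNLP stage itself is not reproduced here.

```python
def curate_place(free_text, candidates, chosen_context, place_id):
    """Build triples for a curated place from the candidate the curator picked.

    `candidates` maps a disambiguating context (e.g. a district) to an entity.
    """
    if chosen_context not in candidates:
        raise ValueError(f"no candidate for context {chosen_context!r}")
    return [
        # The free-text place intersects the chosen entity without being equal to it.
        (place_id, "sg:sfIntersects", candidates[chosen_context]),
        # The original wording is kept as a label on the new place.
        (place_id, "rdfs:label", f'"{free_text}"@en'),
    ]


# Candidates as shown on the slide; the choice of context is the curator's.
candidates = {
    "Manhattan": "dbp:Apollo_Theater",
    "City of Westminster": "dbp:Apollo_Theatre",
}
triples = curate_place("Apollo Theatre, gallery", candidates, "Manhattan", "led:place/12345")
```

Linking with an intersection relation rather than identity lets a vague description such as a theatre gallery stay a distinct place while still being anchored to a known entity.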
13. Fuzziness in temporal data
Extended Date/Time Format (standard draft,
Library of Congress, 2012)
• Allows formalisation of underspecified points in time and
intervals
– “187u-22-uu” means “sometime in summer, in the 1870s”
• We extended it to support subjective intervals (e.g.
early/mid/late, also for daytimes) and ranges (from-to)
• Made available in RDF for others to reuse, through
data.open.ac.uk (currently only materialised data)
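A minimal sketch of how an underspecified value such as “187u-22-uu” can be interpreted, assuming the conventions of the 2012 EDTF draft: `u` stands for an unspecified digit and the codes 21 to 24 in the month position denote spring, summer, autumn and winter. The function names are hypothetical, and the LED extensions for subjective intervals (early/mid/late) and ranges are not covered because their syntax is not given here.

```python
import re

# Season codes used in the month position by the EDTF draft.
SEASONS = {"21": "spring", "22": "summer", "23": "autumn", "24": "winter"}
PATTERN = re.compile(r"^([0-9u]{4})(?:-([0-9u]{2}))?(?:-([0-9u]{2}))?$")


def digit_range(field):
    """Smallest and largest numbers obtained by replacing each 'u' with a digit."""
    return int(field.replace("u", "0")), int(field.replace("u", "9"))


def describe(edtf):
    """Interpret a year[-month[-day]] string that may contain unspecified digits."""
    match = PATTERN.match(edtf)
    if match is None:
        raise ValueError(f"unsupported date expression: {edtf!r}")
    year, month, day = match.groups()
    result = {"year_range": digit_range(year), "season": None, "month": None, "day": None}
    if month in SEASONS:
        result["season"] = SEASONS[month]
    elif month is not None and "u" not in month:
        result["month"] = int(month)
    if day is not None and "u" not in day:
        result["day"] = int(day)
    return result


summer_1870s = describe("187u-22-uu")
```

Keeping the year as a range rather than a single value preserves the vagueness of the source instead of forcing a precise date.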
14. LED contributions to the LD cloud
Novel data: historical music performances
• Royal Carl Rosa Company – “Faust”, for orchestra and voice
– date: 14 May, 1917
– location: Garrick Theatre (indoors, private space)
Novel data: portions and quotes of document sources / manuscripts (not modelled in BNB)
• Journeying boy : the diaries of the young Benjamin Britten 1928-1938
Diary entries:
• Page 17, Feb 14 1929: “Still absent from school work. Everso much more […]”
• Page 67, March 18 1931: “Go with Mummy to B.B.C – Beethoven concert […]”
• Page 70, April 22 1931: “Go to John Nicholson’s to tea at 2.45. & to hear
Gramophone records on his new Radio-Gram Hear. Brahms. Pft. Concerto Mov.
1. (Rubenstein) Tchaik.”
• …
15. LED contributions to the LD cloud
Refined data: biographical enhancements
Refined data: semantic alignments between
DBpedia, BNB and MusicBrainz
dbpedia:Aaron_Copland ≡ mb:aad3af83-5b59-4b86-a569-1a8409149b09#_
dbpedia:Jane_Austen ≡ bnb:AustenJane1775-1817
Mary Somerville
Full name: Mary Fairfax Greig Somerville
Social group: Rulers, chiefs, aristocracy & gentry etc.
Occupation: Scientist
Religion: Christian, Protestant
wrote: Memoir of Mary Somerville (1817, 1840’s, 1849, 1850…)
16. Figures on reuse
Reuse by entity type (computed on 8227 distinct listening experiences):
Type                            Unique instances   Total reuse   Peak
People                          8186               31869         1479
Written works                   425                7474          431
Geographical locations          1410               8470          1061
Musical works (songs, albums)   6790               4241          46
Musical genres                  343                7104          1195
Reused distinct instances from external data sources:
Source         Reused distinct instances
DBpedia        2596
BNB            553
data.gov.uk    3278
MusicBrainz    1203
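One way to read the per-type figures above is as an average number of references per entity, obtained by dividing total reuse by unique instances. The sketch below does only that arithmetic on the numbers from the table; it assumes that "Total reuse" counts references to entities across listening experiences, which the slide does not state explicitly.

```python
# (unique instances, total reuse) per entity type, copied from the table.
reuse = {
    "People": (8186, 31869),
    "Written works": (425, 7474),
    "Geographical locations": (1410, 8470),
    "Musical works (songs, albums)": (6790, 4241),
    "Musical genres": (343, 7104),
}

# Average number of references per distinct entity of each type.
mean_reuse = {kind: total / unique for kind, (unique, total) in reuse.items()}

for kind, value in sorted(mean_reuse.items(), key=lambda item: item[1], reverse=True):
    print(f"{kind}: {value:.2f}")
```

Under that assumption, genres and written works are shared by many experiences, whereas the ratio for musical works falls below one.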
17. Ongoing work
• Text mining of listening evidence (e.g. most commonly used
terms for describing listening for specific periods or genres).
• Analytics on structured data (community
detection/clustering)
• Detection of listening experiences through Web crawling or
hooking into the user experience
• Controlled vocabularies (e.g. HISCO for historical occupations)
• Linked Data Fragments for facilitating reuse (under
investigation)
18. Further Reading
about LED:
Brown, S., Barlow, H., Adamou, A. and d'Aquin, M. (2015). The Listening Experience
Database Project: Collating the Responses of the "Ordinary Listener" to Prompt New
Insights into Musical Experience, The International Journal of the Humanities: Annual
Review, 13, p. 17-32, CGPublisher
Brown, S., Adamou, A., Barlow, H. and d'Aquin, M. (2014). Building listening experience
Linked Data through crowd-sourcing and reuse of library data, Proceedings of the 1st
International Workshop on Digital Libraries for Musicology, p. 1-8, ACM
related:
Hyvönen, E. (2012). Publishing and Using Cultural Heritage Linked Data on the Semantic
Web, Morgan & Claypool
Lewis, D. and Martin, T. (2015). Managing Vagueness with Fuzzy in Hierarchical Big Data.
2015 INNS Conference on Big Data, Vol. 53, p. 19-28, Elsevier
19. Thank you - Q&A time