1. Classifying the (digital)
Arts and Humanities
Wishful thinking in fifteen slides
By Dr Torsten Reimer
Centre for e-Research, King's College London
IEEE Conference on e-Science - 11/12/2009
5. Once upon a time
ICT Guides
• Projects
• Methods
• Tools
arts-humanities.net
• Events and reports
• Community
• Bibliography
etc.
7. arts-humanities.net
• An online hub for research & teaching in the digital arts and humanities
• Support for creating and using digital resources
• Enables members to locate information, promote their research and discuss ideas
• Mix of centrally provided and user-contributed content
• Use of Web 2.0 functionality such as tagging, feeds, wiki, blogging, user profiles etc.
• Community resource
8. Methods Taxonomy
• Originally developed for the projects and methods database
• Focus on resource creation
• Used to categorize projects, tools and resources
• Now part of arts-humanities.net
• Seven main categories
9. Data analysis
• Collating: Collation is the process of comparing different versions of a text to discover the location and type of textual variants. Collation is fundamental to a variety of scholarly pursuits; in the Arts and Humanities, for example, it is used to reconstruct the texts of classical works accurately. In the past collation was performed by hand; today it is performed with the assistance of a computer.
• Collocating: Refers to the techniques used to detect pairs or groups of words that appear together in a text more often than would be expected by chance. Such collocations habitually occur together, and they can illustrate restrictions on which verbs or adjectives can be used with particular nouns, or on the order in which words appear.
• Content analysis: Content analysis is a research technique focused on the content and internal features of media. It is used to determine the presence of certain words, concepts, themes, phrases, characters, or sentences within a text or set of texts, and to quantify this presence objectively.
• Content-based image retrieval: Content-based image retrieval (CBIR) refers to techniques used to search for digital images by features of their content, which is particularly helpful when studying large databases. CBIR is often preferable to searching by metadata, which can be expensive and time-consuming to produce because it requires humans to describe each individual item in the database.
• Content-based sound retrieval: Refers to techniques that use specialist software to search for sound files by features of their content, which is particularly helpful when studying large databases. Such searches are often preferable to searches relying on metadata, which can be expensive and time-consuming to produce because it requires humans to describe each individual item in the database.
• Data mining: Data mining is the process of using computing power to extract hidden patterns from data, analysing the results from different perspectives and summarising them in a useful format, such as a graph or table. This process is often facilitated by the use of metadata. It is important that any patterns found are verified and validated by comparison with other data samples. In this way, data mining can identify trends that go beyond simple data analysis.
• Image feature measurement: Image feature measurement describes techniques used to acquire, measure, and analyse the parameters of digital images, such as size, shape, relative locations, textures, grey tones and colours. These parameters are also known as ‘perception attributes’.
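To make the collating entry concrete, here is a minimal sketch of pairwise, word-level collation using Python's standard `difflib`. It is an illustration under simplifying assumptions (two witnesses only, whitespace tokenisation, no normalisation of spelling or punctuation); dedicated collation tools align many witnesses at once. The function name `collate` and the sample sentences are hypothetical.

```python
import difflib


def collate(base: str, witness: str) -> list[tuple[str, int, str, str]]:
    """Hypothetical word-level collation of two versions of a text.

    Returns (variant_type, word_position_in_base, base_reading, witness_reading)
    for every point where the witness differs from the base text.
    """
    a, b = base.split(), witness.split()
    # autojunk=False stops frequent words (e.g. "the") being ignored in long texts
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    variants = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # identical readings are not variants
        # tag is "replace" (substitution), "delete" (omission) or "insert" (addition)
        variants.append((tag, i1, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return variants


if __name__ == "__main__":
    print(collate("the quick brown fox jumps over the lazy dog",
                  "the quick red fox jumps over the dog"))
```

For the two invented witnesses in the usage line, the sketch reports a substitution ("brown" read as "red") at word 2 and an omission of "lazy" at word 7 of the base text.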
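The collocating entry defines collocations as words that co-occur "more often than would be expected by chance". One common way to score that, shown here as a sketch rather than the method the taxonomy prescribes, is pointwise mutual information (PMI) over adjacent word pairs. The function name, the `min_count` cut-off and the toy sentence are assumptions for illustration.

```python
import math
from collections import Counter


def collocations(tokens: list[str], min_count: int = 2) -> list[tuple[tuple[str, str], float]]:
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    PMI = log2(P(w1, w2) / (P(w1) * P(w2))); a positive score means the pair
    occurs together more often than chance would predict.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # PMI is unreliable for very rare pairs
        p_pair = count / (n - 1)  # there are n - 1 adjacent pairs
        p1 = unigrams[w1] / n
        p2 = unigrams[w2] / n
        scored.append(((w1, w2), math.log2(p_pair / (p1 * p2))))
    # highest-scoring (most strongly associated) pairs first
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    text = "strong tea and strong tea and powerful computers and strong tea"
    for pair, score in collocations(text.split()):
        print(pair, round(score, 2))
```

On real corpora, PMI is usually combined with a frequency threshold (as `min_count` does here), because it overrates pairs that occur only once or twice.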
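For the content analysis entry, the "quantify this presence" step can be sketched as counting indicator words for each concept in a coding scheme and normalising by text length. The coding scheme, its indicator words and the function name are hypothetical; in practice the scheme is designed by the researcher and validated, and the normalisation (here, per 1,000 words) is a choice rather than a fixed rule.

```python
import re
from collections import Counter

# Hypothetical coding scheme: each concept is recognised by a set of indicator words
CODING_SCHEME = {
    "war": {"war", "battle", "soldier", "soldiers"},
    "sea": {"sea", "ship", "ships", "waves"},
}


def code_text(text: str, scheme: dict[str, set[str]]) -> dict[str, float]:
    """Return each concept's frequency per 1,000 words of `text`."""
    words = re.findall(r"[a-z]+", text.lower())  # crude tokenisation on letters only
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero for empty texts
    return {
        concept: 1000 * sum(counts[w] for w in indicators) / total
        for concept, indicators in scheme.items()
    }


if __name__ == "__main__":
    print(code_text("The soldiers crossed the sea in ships before the battle.", CODING_SCHEME))
```

Normalising by length lets texts of different sizes be compared, which is usually the point of quantifying presence across a set of texts.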
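As an illustration of content-based image retrieval, the sketch below uses one simple, well-known technique: comparing quantised colour histograms with histogram intersection. It assumes images have already been decoded into lists of RGB tuples (0-255). The function names, the bin count and the toy "images" are hypothetical, and practical CBIR systems use much richer features (texture, shape, learned descriptors).

```python
def colour_histogram(pixels: list[tuple[int, int, int]], bins: int = 4) -> list[float]:
    """Normalised colour histogram: each 0-255 RGB channel is quantised into `bins` levels."""
    hist = [0.0] * bins ** 3
    for r, g, b in pixels:
        # map each channel value to a bin index in 0..bins-1
        ri, gi, bi = r * bins // 256, g * bins // 256, b * bins // 256
        hist[(ri * bins + gi) * bins + bi] += 1
    total = len(pixels) or 1  # guard against empty images
    return [count / total for count in hist]


def intersection(h1: list[float], h2: list[float]) -> float:
    """Histogram intersection: 1.0 for identical colour distributions, 0.0 for disjoint ones."""
    return sum(min(x, y) for x, y in zip(h1, h2))


def search(query: list[tuple[int, int, int]],
           collection: dict[str, list[tuple[int, int, int]]],
           bins: int = 4) -> list[tuple[str, float]]:
    """Rank images in `collection` (name -> pixels) by colour similarity to `query`."""
    q = colour_histogram(query, bins)
    scores = {name: intersection(q, colour_histogram(px, bins)) for name, px in collection.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    collection = {"red_cloak": [(250, 10, 10)] * 10, "sky": [(20, 40, 240)] * 10}
    query = [(240, 20, 20)] * 8 + [(20, 40, 240)] * 2
    print(search(query, collection))
```

No human-written description of either image is needed: the ranking comes entirely from the pixels, which is the advantage over metadata-based search that the entry describes.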
12. CHAIN
Coalition of Humanities and Arts Infrastructures and Networks
• Partners: ADHO, centerNet, CLARIN, DARIAH, Project Bamboo, NoC
• Key theme: advocacy for an improved digital research infrastructure for the Humanities and Arts
• Knowledge base: all partners want one; we have one
• International desire to overcome the 'mine, all mine' problem
13. Problems with current set-up
• Shared editing necessary
• Versioning system
• Distributed across several websites
• Only parent-child relationships
• Different terminology for the same method in different fields
• Only monolingual
14. Solution: semantic web?
Linked Data:
• 1. Use URIs to identify things.
• 2. Use HTTP URIs so that these things can be referred to and looked up ("dereferenced") by people and user agents.
• 3. Provide useful information (i.e., a structured description, or metadata) about the thing when its URI is dereferenced.
• 4. Include links to other, related URIs in the exposed data to improve discovery of other related information on the Web.
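As a sketch of what a taxonomy term might look like when published along these lines, the code below describes one method as a W3C SKOS concept and serialises it as N-Triples. SKOS is one plausible choice of vocabulary, not something the taxonomy is stated to use. Its `skos:broader` and `skos:related` properties go beyond parent-child relationships, language-tagged `skos:prefLabel` values address monolingual labels, and `skos:altLabel` can record field-specific terminology. The base URI `http://example.org/methods/`, the concept slugs and the German label are hypothetical.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/methods/"  # hypothetical HTTP base URI for the taxonomy


def uri(term: str) -> str:
    """Wrap an absolute URI in angle brackets, as N-Triples requires."""
    return f"<{term}>"


def literal(text: str, lang: str) -> str:
    """Language-tagged string literal with the basic N-Triples escapes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@{lang}'


def concept_triples(slug: str, labels: dict[str, str],
                    alt_labels: list[tuple[str, str]] = (),
                    broader: list[str] = (), related: list[str] = ()) -> list[str]:
    """Describe one (hypothetical) taxonomy concept as N-Triples lines.

    labels: {language_tag: preferred label}; alt_labels: (label, language_tag) pairs;
    broader / related: slugs of other concepts in the same taxonomy.
    """
    s = uri(BASE + slug)  # principles 1 and 2: an HTTP URI names the method
    lines = [f"{s} {uri(RDF_TYPE)} {uri(SKOS + 'Concept')} ."]
    for lang, label in labels.items():  # one preferred label per language
        lines.append(f"{s} {uri(SKOS + 'prefLabel')} {literal(label, lang)} .")
    for label, lang in alt_labels:  # alternative terms, e.g. from other fields
        lines.append(f"{s} {uri(SKOS + 'altLabel')} {literal(label, lang)} .")
    for other in broader:  # hierarchical link (principle 4)
        lines.append(f"{s} {uri(SKOS + 'broader')} {uri(BASE + other)} .")
    for other in related:  # non-hierarchical link (principle 4)
        lines.append(f"{s} {uri(SKOS + 'related')} {uri(BASE + other)} .")
    return lines


if __name__ == "__main__":
    for line in concept_triples("collating", {"en": "Collating", "de": "Kollation"},
                                alt_labels=[("Collation", "en")],
                                broader=["data-analysis"], related=["text-encoding"]):
        print(line)
```

Serving these lines when the concept's URI is dereferenced (principle 3), for example with the `application/n-triples` media type, is outside the scope of this sketch.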
15. Taxonomy as service
Semantic web (linked data)
Shared taxonomy
• CeRch
• DHO
• OeRC
• (CHAIN)
• and you?
16. Glorious future
• Build a resource owned by and useful for the wider Digital Humanities / Arts community
• Bring field(s) together
• Make what we do more easily accessible to funding bodies and the public