2. We will talk about…
1. Theoretical and methodological foundations of the DSpace-GLAM project
2. Managing digital objects with DSpace
3. Extending the DSpace data model with DSpace-GLAM
4. Integrating DSpace and DSpace-GLAM entities
5. Access to and sharing of digital cultural resources with add-ons
6. Dataset analysis with CKAN
7. Conclusions
3. The BIG DATA age
• For several years now, the term "Big Data" has
been bursting into the world of Information
Technology,
• promising the potential of a new
generation of technologies and architectures
able to extract value from the enormous
amount of data continuously
produced in the most diverse fields
4. In science, "Big Data" is seen as an
even bigger opportunity
The "data deluge" will make obsolete some of the
fundamental concepts on which the scientific
method has been based so far
A new scientific paradigm?
5. No more theories?
No more hypotheses?
No more models?
Numbers speak for themselves?
A new scientific paradigm?
6. Certainly new opportunities…
Source: http://bouache.com/blog/big-data/
• Being able to manipulate and
analyze massive amounts of
data represents an important
progress for science
• It won’t abolish the need to
build, refine and verify theories
• It will allow us to formulate
hypotheses and test them
far more rapidly, and on
far larger samples, than
in the past
7. …also for humanities
No data deluge, but… a growing
amount of data
• Databases
• Electronic journals
• Digitization
• Tools for data extraction
• …
8. A variety of multidisciplinary
data are related to Cultural Heritage and History
Different in:
Typology
Format
Structure
Scale
9. More and more complexity
In the humanities most of the data
are created or collected by people
(not measured by instruments)
They are affected by individuals, place, time
They are fragmentary, partial, biased
Source: http://www.asianscientist.com/2016/07/print/body-as-a-source-of-big-data/
10. Putting data in context
Digital Cultural Data have to be analyzed together
with all the contextual information, digital and
non-digital, needed to answer research questions
such as:
• the production context (cultural, social, economic,
technological…) of a document/monument
• formation processes of an archaeological
record
• contextual associations at different levels and
scales (according to the different dimensions of
variations)
Source: https://ddd.uab.cat/pub/expbib/2006/terradefoc/10.pdf
11. A Digital Humanities approach is fundamental…
Such an approach, with its focus on relationships,
can help identify the important dimensions
of variation (the CONTEXT)
It can help analyze primary sources as
evidence of a network of heterogeneous systems,
which can be studied through those sources by means
of a global (holistic) and multidimensional analysis
[Figure: technological, environmental, social, cultural and economic dimensions]
Source: Hodder I. 2016, Studies in Human-Thing Entanglement, p. 28
12. …within a Digital Library Management System
To move such an approach from theory to practice, we need infrastructures and tools for the
integration, analysis and storage of digital data and resources.
Today most cultural digital resources and data are held in Digital Libraries or Repositories.
It is Digital Libraries and Repositories that must provide tools for:
• modeling, visualising and analysing information, both in a qualitative and quantitative way, as
well as collaboratively working on it
• highlighting the relationships between data at different scales
• explaining interpretations about the important dimensions of variation and about the network
of contextual relations in which historical sources are involved
In this way, such tools can enter the daily workflow of historians, archaeologists and humanities scholars.
13. Why DSpace?
To achieve the outlined goals and build a state-of-the-art
Digital Library Management System, open source
software is preferable.
Open source software offers an effective
way to create Digital Library Management Systems with
a small financial investment.
With sustainability in mind, among the most widely used
open source Digital Library Management Systems we
chose DSpace.
14. Why DSpace?
Out of the box, DSpace allows you to:
• capture and describe digital material using a
submission workflow module, or a variety of batch
ingest options
• distribute digital assets over the web through a
search and retrieval system
• preserve digital assets over the long term
15. Why DSpace?
The system is based on the specifications of the OAIS (Open
Archival Information System) for Long Term Preservation and is
able to manage the whole "life-cycle" of a digital object in
terms of "Digital Curation", by means of:
• metadata creation according to different standards
• SIP (Submission Information Package) import and validation
• AIP (Archival Information Package) creation
• AIP export
• storage management
• digital resources dissemination (also by means of OAI-PMH)
• digital object history management and integrity check
16. Why DSpace?
There are over 2200 digital repositories and libraries worldwide using the
DSpace application for a variety of digital archiving and dissemination needs.
DSpace is often used as an institutional repository to provide access to
research outputs, scholarly publications, library collections, educational
material and more.
It is also used as a digital library to store, preserve and disseminate digital
cultural heritage.
A fairly large part of the world's cultural and scientific heritage is already
managed, accessed and preserved using DSpace.
It makes sense to enhance a system that is already widely used rather than propose
migrating data to new platforms.
18. Communities & Collections
• Communities and collections are entities useful to aggregate DSpace
items by:
• Provenance and responsibility >>> Communities
• Metadata, workflow, curation >>> Collections
• They both aggregate the items but they are conceptually different things!
27. DSpace metadata
Out of the box, DSpace can support
multiple flat metadata schemas
You can configure multiple schemas by
means of the “Metadata Schema Registry”
and select metadata fields from a mix of
configured schemas to describe your items
Communities and collections have some
simple descriptive metadata (a name, and
some descriptive prose)
29. Defining the submission form
Configure the submission form by means
of the input-forms.xml file
You can configure different forms for
different collections
You can create internal vocabularies for
the fields
31. input-forms.xml
dc-schema (Required): Name of the metadata schema employed, e.g. dc for Dublin Core. This value must
match the value of the schema element defined in the Metadata Schema Registry.
dc-element (Required): Name of the element.
dc-qualifier: Qualifier of the element entered, e.g. when the field is contributor.advisor the value of
this element would be advisor. Leaving this out means the input is for an unqualified element.
repeatable: Value is true when multiple values of this field are allowed, false otherwise. When you mark
a field repeatable, the UI servlet will add a control to let the user ask for more fields to enter additional
values.
label (Required): Text to display as the label of this field, describing what to enter, e.g. "Your Advisor's
Name".
input-type (Required): Defines the kind of interactive widget to put in the form to collect the Dublin Core
value.
32. input-forms.xml
hint (Required): Content is the text that will appear as a "hint", or instructions, next to the input fields. It
can be left empty, but the element must be present.
required: When this element is included with any content, it marks the field as a required input. If the
user tries to leave the page without entering a value for this field, that text is displayed as a warning
message. For example, <required>You must enter a title.</required> Note that leaving the required
element empty will not mark a field as required, e.g. <required></required>
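To see how the elements above fit together, here is a minimal Python sketch that builds one `<field>` entry with the standard library's `xml.etree.ElementTree`. The function name `build_field` and the sample `dc.title.alternative` field are illustrative assumptions; in a real installation the entry would sit inside a page of a form in input-forms.xml, and the exact layout should be checked against your DSpace version.

```python
import xml.etree.ElementTree as ET


def build_field(schema, element, qualifier=None, repeatable=False,
                label="", input_type="onebox", hint="", required=None):
    """Return a <field> element following the input-forms.xml layout
    (hypothetical helper, for illustration only)."""
    field = ET.Element("field")
    ET.SubElement(field, "dc-schema").text = schema
    ET.SubElement(field, "dc-element").text = element
    if qualifier:
        # Leaving dc-qualifier out means the field is unqualified
        ET.SubElement(field, "dc-qualifier").text = qualifier
    ET.SubElement(field, "repeatable").text = "true" if repeatable else "false"
    ET.SubElement(field, "label").text = label
    ET.SubElement(field, "input-type").text = input_type
    # <hint> must always be present, even when its text is empty
    ET.SubElement(field, "hint").text = hint
    if required:
        # Only a non-empty <required> marks the field as mandatory
        ET.SubElement(field, "required").text = required
    return field


field = build_field("dc", "title", qualifier="alternative", repeatable=True,
                    label="Other Titles",
                    hint="If the item has any alternative titles, enter them here.")
print(ET.tostring(field, encoding="unicode"))
```

Emitting `<hint>` unconditionally while adding `<required>` only when it has text mirrors the two rules described above.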
33. input-forms.xml – dropdown menus
To create an internal flat vocabulary you
have to:
• use the «dropdown», «qualdrop_value» or
«list» value within the <input-type>
element
• populate the <value-pairs> element
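A minimal sketch of the two steps above, again using `xml.etree.ElementTree`: a `<value-pairs>` list in which each pair maps a displayed label to the stored value, and an `<input-type>` that points a dropdown at that list by name. The list name `object_types` and its values are hypothetical; the element and attribute names follow the usual DSpace input-forms.xml layout, which is worth verifying against your version.

```python
import xml.etree.ElementTree as ET


def build_value_pairs(name, dc_term, pairs):
    """Build a <value-pairs> list; each pair maps a displayed label to the
    value actually stored in the metadata field."""
    value_pairs = ET.Element("value-pairs",
                             {"value-pairs-name": name, "dc-term": dc_term})
    for displayed, stored in pairs:
        pair = ET.SubElement(value_pairs, "pair")
        ET.SubElement(pair, "displayed-value").text = displayed
        ET.SubElement(pair, "stored-value").text = stored
    return value_pairs


def input_type(widget, value_pairs_name):
    """<input-type> element binding a widget to a named value-pairs list."""
    element = ET.Element("input-type", {"value-pairs-name": value_pairs_name})
    element.text = widget
    return element


# Hypothetical vocabulary of object types
object_types = build_value_pairs(
    "object_types", "type",
    [("Manuscript", "manuscript"), ("Painting", "painting"),
     ("Archaeological find", "archaeological find")])
widget = input_type("dropdown", "object_types")
print(ET.tostring(widget, encoding="unicode"))
```

Keeping displayed and stored values separate lets the label shown to cataloguers change without touching the values already saved on items.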
34. Hierarchical Taxonomies and Controlled Vocabularies
DSpace also offers a way to structure
and manage more complex, hierarchical
controlled vocabularies
Managed in a separate file
Taxonomies are described in XML
Vocabularies are invoked from input-forms.xml,
using the <vocabulary> tag
within the related <field>
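The sketch below shows how such a hierarchical taxonomy can be read. It assumes the nested `<node id="…" label="…">` / `<isComposedBy>` layout used by DSpace controlled-vocabulary files and prints the full path of every term. The "Historical periods" taxonomy is invented for illustration, and the `::` separator is simply a choice made here.

```python
import xml.etree.ElementTree as ET

# Hypothetical taxonomy in the nested node/isComposedBy layout
TAXONOMY = """
<node id="periods" label="Historical periods">
  <isComposedBy>
    <node id="ant" label="Antiquity"/>
    <node id="mid" label="Middle Ages">
      <isComposedBy>
        <node id="mid-late" label="Late Middle Ages"/>
      </isComposedBy>
    </node>
    <node id="ren" label="Renaissance"/>
  </isComposedBy>
</node>
"""


def term_paths(node, prefix=(), sep="::"):
    """Yield (term id, full hierarchical path) for a node and all its
    descendants, walking the <isComposedBy> children depth-first."""
    path = prefix + (node.get("label"),)
    yield node.get("id"), sep.join(path)
    for child in node.findall("isComposedBy/node"):
        yield from term_paths(child, path, sep)


root = ET.fromstring(TAXONOMY.strip())
paths = dict(term_paths(root))
for term_id, path in paths.items():
    print(term_id, "->", path)
```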
35. Batch submission process
Requires the creation of a DSpace Simple Archive:
• A directory for each item to import, containing:
• the files that make up the item
• an XML file (dublin_core.xml) where each metadata element has its own
entry within a <dcvalue> tagset. There are currently
three attributes available on the <dcvalue> tag:
• element - the Dublin Core element
• qualifier - the element's qualifier
• language - (optional) ISO language code for the
element
• a “contents” file, listing the files
• an (optional) “collections” file with the information
about the collection(s) the item belongs to
<dublin_core>
  <dcvalue element="title" qualifier="none">A Tale of Two Cities</dcvalue>
  <dcvalue element="date" qualifier="issued">1990</dcvalue>
  <dcvalue element="title" qualifier="alternative" language="fr">J'aime les Printemps</dcvalue>
</dublin_core>
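Preparing a Simple Archive by hand quickly becomes tedious. The following Python sketch writes one item directory with the `dublin_core.xml`, `contents` and optional `collections` files described above. The helper name `write_saf_item`, the scan file and the collection handle `123456789/2` are hypothetical placeholders, and only the basic file layout is covered (no bundle or permission options in `contents`).

```python
import os
import shutil
import tempfile
import xml.etree.ElementTree as ET


def write_saf_item(item_dir, metadata, files, collections=None):
    """Write one item of a DSpace Simple Archive (hypothetical helper).

    metadata: (element, qualifier, value, language) tuples; qualifier and
    language may be None. files: bitstreams to copy into the item folder.
    collections: optional collection handles, one per line."""
    os.makedirs(item_dir, exist_ok=True)
    root = ET.Element("dublin_core")
    for element, qualifier, value, language in metadata:
        attrs = {"element": element, "qualifier": qualifier or "none"}
        if language:
            attrs["language"] = language
        ET.SubElement(root, "dcvalue", attrs).text = value
    ET.ElementTree(root).write(os.path.join(item_dir, "dublin_core.xml"),
                               encoding="utf-8", xml_declaration=True)
    names = []
    for path in files:
        shutil.copy(path, item_dir)
        names.append(os.path.basename(path))
    # "contents" enumerates the item's files, one per line
    with open(os.path.join(item_dir, "contents"), "w", encoding="utf-8") as fh:
        fh.write("\n".join(names) + "\n")
    if collections:
        with open(os.path.join(item_dir, "collections"), "w", encoding="utf-8") as fh:
            fh.write("\n".join(collections) + "\n")


workdir = tempfile.mkdtemp()
scan = os.path.join(workdir, "page1.tif")
with open(scan, "wb") as fh:
    fh.write(b"placeholder")  # stand-in for a real scan
item_dir = os.path.join(workdir, "archive", "item_000")
write_saf_item(item_dir,
               [("title", None, "A Tale of Two Cities", None),
                ("date", "issued", "1990", None),
                ("title", "alternative", "J'aime les Printemps", "fr")],
               [scan], collections=["123456789/2"])
```

The resulting `archive` directory is what DSpace's command-line `import` tool expects as its source (e.g. `[dspace]/bin/dspace import --add --eperson=... --collection=... --source=archive --mapfile=mapfile`); check the options against your DSpace version.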
36. UI Batch Import
You have to:
• Compress the item
directories into a zip
file.
• Place the zip file at a
publicly accessible URL,
such as Dropbox or Google
Drive, or wherever you
have access to do so
• Then log in as
Administrator and fill in
the form
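The compression step can be scripted with Python's `zipfile` module. This sketch stores the paths relative to the archive folder (e.g. `item_000/dublin_core.xml`). That layout is an assumption: check the structure the UI import form of your DSpace version expects before uploading. All file names here are hypothetical.

```python
import os
import tempfile
import zipfile


def zip_archive(archive_dir, zip_path):
    """Compress every item directory of a Simple Archive into one zip,
    keeping paths relative to archive_dir (item_000/contents, ...)."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for folder, _dirs, files in os.walk(archive_dir):
            for name in sorted(files):
                full = os.path.join(folder, name)
                zf.write(full, arcname=os.path.relpath(full, archive_dir))
    return zip_path


# Build a tiny hypothetical archive with one (empty) item
workdir = tempfile.mkdtemp()
item = os.path.join(workdir, "archive", "item_000")
os.makedirs(item)
for name in ("dublin_core.xml", "contents"):
    with open(os.path.join(item, name), "w", encoding="utf-8") as fh:
        fh.write("")
zip_file = zip_archive(os.path.join(workdir, "archive"),
                       os.path.join(workdir, "items.zip"))
```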
38. Batch metadata editing
DSpace provides a batch metadata editing tool.
The batch editing tool allows the user to perform the following:
• Batch editing of metadata by means of a comma-separated values (CSV) file
• Batch addition of metadata (e.g. add an abstract to a set of items)
• Batch find and replace of metadata values (e.g. correct a misspelled surname across several records)
• Mass move of items between collections
• Mass deletion, withdrawal, or reinstatement of items
• Batch addition of new items (without bitstreams) via a CSV file
• Re-ordering of the values in a list (e.g. authors)
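As an example of the find-and-replace case, here is a minimal Python sketch that edits an exported metadata CSV before it is re-imported. It assumes the usual export layout (an `id` column, a `collection` column, one column per metadata field, and multiple values separated by `||`). Real exports may add a language suffix to column names, and the sample rows and surnames are invented.

```python
import csv
import io


def find_and_replace(csv_text, column, old, new, separator="||"):
    """Replace a whole metadata value in one column of a metadata CSV.

    Multi-valued cells are split on the separator, so only values exactly
    equal to `old` are replaced. Returns the edited CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        values = row[column].split(separator) if row[column] else []
        row[column] = separator.join(new if v == old else v for v in values)
        writer.writerow(row)
    return out.getvalue()


# Hypothetical export with a misspelled surname in a multi-valued cell
EXPORT = (
    "id,collection,dc.contributor.author\n"
    "101,123456789/2,Smtih||Jones\n"
    "102,123456789/2,Brown\n"
)
fixed = find_and_replace(EXPORT, "dc.contributor.author", "Smtih", "Smith")
print(fixed)
```

The edited file would then go back through DSpace's `metadata-import` command (the counterpart of `metadata-export`), which shows the detected changes before applying them.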
40. Extending DSpace
Cultural Institutions in the «Big Data Age» ask for:
• Complex and multidimensional metadata structures
• Complex data models
• Management of relationships between different entities
• Tools for the visualization, analysis and interpretation
of digital data and resources
Why not use an “extended” version of DSpace to meet these
needs?
41. DSpace-GLAM
(Galleries, Libraries, Archives, Museums)
Built by 4Science on top of DSpace to
meet the needs of Cultural Heritage
institutions
Flexible and extensible data model
inherited from DSpace-CRIS (our RIMS)
to manage relevant metadata standards
and specific conceptual models
With dedicated add-ons for digital object
curation, access and sharing
Also an add-on for dataset visualization
and analysis
42. DSpace-GLAM
(Galleries, Libraries, Archives, Museums)
DSpace-GLAM is free, open source, compliant
with open standards
Add-ons are mainly distributed following a
new business model (crowdsourcing)
Provides institutions with
a sustainable and effective tool to manage
and analyze Cultural Heritage Information
43. Weaknesses of DSpace metadata management
• Flat metadata model
• Weak support for technical and structural metadata
• All information is stored as strings at the database level, with minimal
support (and validation) for data entry in the UI
• DSpace-GLAM improves the metadata at the item level by providing:
- additional input types for data entry (number, year and regex
validation)
- partial support for nested metadata
- support for technical and structural metadata
44. DSpace-GLAM: interoperability
• Connects to VIAF records and Getty Vocabularies for precise identification
of persons, artists and places
• It has been reported to work well with «plain» DSpace through its
authority framework; there are plans to include it out of the box in DSpace 7
45. Extending the DSpace Data Model
DSpace-GLAM can manage all the entities important to
contextualize digital cultural heritage:
• Persons
• Families
• Fonds
• Events
• Places
• Concepts
• …
Entities can be created to integrate different metadata
standards and conceptual models
47. Pre-defined entities
• Persons
• Projects
• Organizations
are pre-defined entities inherited from DSpace-CRIS
… but you are not required to use (all of) them.
You can define additional entities.
You can define your own relationships between entities, including the
ones that you have defined.
53. Data model configuration: DSpace-GLAM visibility and security
• Each DSpace-GLAM entity instance has a status flag
• Public: the details page is visible to anyone and it will be linked where
appropriate. The record is included in public search results
• Private: only administrators can access the details page. The entity is indexed
only for use as an authority entry
• Each property/attribute value has an edit mode:
• Editable
• Visibility flag only
• Only Administrators
• Read only
• A field becomes visible when it is included in a publicly visible tab/box
54. Data model configuration: DSpace-GLAM visibility and security
• Visibility of a tab or box can be restricted to:
• System administrators
• Only RP owner
• Admins and RP owner
• Specific users and groups related to the entity instance
• To restrict the visibility of a box or tab to specific groups or users, one
or more properties must be indicated containing the users and/or
groups that have access to the protected box/tab
55. Data model configuration
• It can be performed via the UI and exported to XLS
• It can be imported from XLS files
58. Data model configuration: Creating inverse relationships between entities
DSpace-CRIS can use the SOLR indexes to reverse a relation:
• documents are linked to the person
• but you can also list the documents under a specific person
Relations are defined in the Spring configuration file cris-relationpreference.xml and are
characterized by:
• a name
• the target entity (a CRIS entity or a DSpace item)
• the SOLR query, with {0}, {1} placeholders to be replaced with the CRIS-ID or the
UUID of the source CRIS instance
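To illustrate the placeholder mechanism, here is a minimal Python sketch of the substitution step. It assumes, from the order given above, that `{0}` stands for the CRIS-ID and `{1}` for the UUID of the source instance. The function name and the sample identifiers are hypothetical, and this is not DSpace-CRIS's own (Java) code.

```python
import re

PLACEHOLDER = re.compile(r"\{([01])\}")


def build_relation_query(template, cris_id, uuid):
    """Replace {0} with the CRIS-ID and {1} with the UUID of the source
    instance; other braces (e.g. Solr range syntax) are left untouched."""
    values = {"0": cris_id, "1": uuid}
    # A function replacement is inserted literally (no backslash handling)
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


query = build_relation_query(
    "crisevents.eventsrelatedinterpretation_authority:{0}",
    "int00042", "00000000-0000-0000-0000-000000000042")
print(query)
```

Matching only `{0}` and `{1}` rather than using `str.format` keeps Solr's exclusive-range syntax such as `{1990 TO *}` intact inside a template.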
59. Data model configuration
Creating inverse relationships between entities
(cris-relationpreference.xml)
<bean id="relationINTERPRETATIONVSEVENTSConfiguration"
      class="org.dspace.app.cris.configuration.RelationConfiguration">
  <!-- Name -->
  <property name="relationName" value="crisinterpretation.events" />
  <!-- Target entity -->
  <property name="relationClass" value="org.dspace.app.cris.model.ResearchObject" />
  <property name="type" value="crisevents" />
  <!-- Solr query -->
  <property name="query">
    <value>crisevents.eventsrelatedinterpretation_authority:{0}</value>
  </property>
</bean>
60. Data model configuration: Creating inverse relationships between entities
• Inverse relations can be:
• visualized
• used to show aggregated statistics
• To be visualized, relations are embedded in components (see cris-components.xml)
61. Data model configuration
Creating inverse relationships between entities
(cris-components.xml)
<!-- Dynamic object component -->
<bean id="doComponentsService" class="org.dspace.app.cris.integration.CrisComponentsService">
<property name="components">
<map>
<!-- key: name of the related box used for visualizing the data -->
<entry key="journalspublications" value-ref="publicationlistforjournals" />
<entry key="eventsdocuments" value-ref="publicationlistforevents" />
<entry key="placesevents" value-ref="eventlistforplaces" />
<entry key="eventsperson" value-ref="personlistforevents" />
<entry key="fondschild" value-ref="fondschildforfonds" />
<entry key="fondspublications" value-ref="publicationlistforfonds" />
<entry key="conceptdocuments" value-ref="publicationlistforconcept"/>
<entry key="conceptperson" value-ref="personlistforconcept"/>
</map>
</property>
</bean>
62. Data model configuration
Creating inverse relationships between entities
(cris-components.xml)
<!-- Person list for Events dynamic entity -->
<bean id="personlistforevents"
class="org.dspace.app.webui.cris.components.CRISRPConfigurerComponent">
<property name="relationConfiguration" ref="relationEVENTSVSRPConfiguration" />
<property name="commonFilter">
<util:constant
static-field="org.dspace.app.webui.cris.util.RelationPreferenceUtil.HIDDEN_FILTER" />
</property>
<property name="target" value="org.dspace.app.cris.model.ResearchObject" />
<property name="facets" ref="facetsRPforComponentConfiguration" />
<property name="types">
<map>
<entry key="all" value-ref="allObjectsComponent" />
</map>
</property>
</bean>
64. Data model configuration
Integrating DSpace and DSpace-GLAM
(dspace.cfg)
• All the GLAM’s entities can be
linked with DSpace Items and
used as authorities for item’s
metadata
• This is done by adding some
configuration to the dspace.cfg file
##### Authority Control Settings #####
plugin.named.org.dspace.content.authority.ChoiceAuthority = \
org.dspace.app.cris.integration.ORCIDAuthority = RPAuthority, \
org.dspace.content.authority.ItemAuthority = PublicationAuthority, \
org.dspace.content.authority.ItemAuthority = DataSetAuthority, \
org.dspace.app.cris.integration.DOAuthority = EVENTAuthority, \
org.dspace.app.cris.integration.DOAuthority = FONDSAuthority, \
org.dspace.app.cris.integration.DOAuthority = CONCEPTAuthority, \
org.dspace.app.cris.integration.DOAuthority = INTERPRETATIONAuthority
65. Data model configuration
Integrating DSpace and DSpace-GLAM
(dspace.cfg)
choices.plugin.dc.relation.conference = EVENTAuthority
choices.presentation.dc.relation.conference = suggest
authority.controlled.dc.relation.conference = true
cris.DOAuthority.dc_relation_conference.filter = resourcetype_authority:events
cris.DOAuthority.dc.relation.conference.new-instances = events
ItemCrisRefDisplayStrategy.publicpath.dc.relation.conference = events
choices.plugin.dc.relation.concept = CONCEPTAuthority
choices.presentation.dc.relation.concept = suggest
authority.controlled.dc.relation.concept = true
cris.DOAuthority.dc_relation_concept.filter = resourcetype_authority:concept
cris.DOAuthority.dc.relation.concept.new-instances = concept
ItemCrisRefDisplayStrategy.publicpath.dc.relation.concept = concept
choices.plugin.dc.relation.fond = FONDSAuthority
choices.presentation.dc.relation.fond = suggest
authority.controlled.dc.relation.fond = true
cris.DOAuthority.dc_relation_fond.filter = resourcetype_authority:crisfonds AND crisfonds.fondsleaf:true
ItemCrisRefDisplayStrategy.publicpath.dc.relation.fond = fonds
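Legend:
• choices.plugin: authority name
• choices.presentation: display mode for authority values
• authority.controlled: the authority has its own ID
• cris.DOAuthority.<field>.filter: origin of the authority values
• cris.DOAuthority.<field>.new-instances: entity to populate with new values
• ItemCrisRefDisplayStrategy.publicpath: path to use to link the entity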
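With several fields configured this way, it is easy to forget one of the properties. The following Python sketch groups the per-field authority settings of a dspace.cfg fragment and flags fields declared authority-controlled but lacking a choice plugin. It is a hypothetical helper, not part of DSpace. It is a simplified reader: one `key = value` per line, `#` comments skipped, backslash continuations not handled. The `dc.relation.place` line is an invented misconfiguration.

```python
def authority_settings(cfg_text):
    """Group per-field authority properties of a dspace.cfg fragment by
    metadata field, e.g. {"dc.relation.conference": {"plugin": ...}}."""
    prefixes = {
        "choices.plugin.": "plugin",
        "choices.presentation.": "presentation",
        "authority.controlled.": "controlled",
        "ItemCrisRefDisplayStrategy.publicpath.": "publicpath",
    }
    fields = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        for prefix, setting in prefixes.items():
            if key.startswith(prefix):
                fields.setdefault(key[len(prefix):], {})[setting] = value
    return fields


def controlled_without_plugin(fields):
    """Fields declared authority-controlled but with no choice plugin."""
    return sorted(name for name, s in fields.items()
                  if s.get("controlled") == "true" and "plugin" not in s)


CFG = """
choices.plugin.dc.relation.conference = EVENTAuthority
choices.presentation.dc.relation.conference = suggest
authority.controlled.dc.relation.conference = true
ItemCrisRefDisplayStrategy.publicpath.dc.relation.conference = events
authority.controlled.dc.relation.place = true
"""
settings = authority_settings(CFG)
print(settings["dc.relation.conference"])
print(controlled_without_plugin(settings))
```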
69. Data model configuration: Clustering of related objects
Out of the box, component implementations are available that allow
configurable rendering of inverse relations for each entity (DSpace items or
DSpace-GLAM entities)
It is possible:
• to configure which facets are shown in the component
• to apply filters to the relation
• to enable clustering using custom categories defined
by facet queries
The components take into account the preferences expressed for the relationships
70. Managing hierarchical archival structures
Extending the data model makes the system able to manage the hierarchical
metadata structure required by archival standards such as ISAD(G) and EAD
DSpace-GLAM can also manage the production and preservation context of the
archive, as required by ISAAR(CPF), EAC-CPF and ISDIAH
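ISAD(G) describes an archive as a multilevel tree (fonds, series, file, …). The minimal Python sketch below walks such a tree, deriving each unit's reference code from its ancestors and marking leaf units, in the spirit of the `fondsleaf` flag that appears in the dspace.cfg filter example. The `ArchivalUnit` class, its fields and the sample archive are hypothetical and do not reproduce DSpace-GLAM's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class ArchivalUnit:
    """Hypothetical node of a multilevel (ISAD(G)-style) description."""
    code: str
    title: str
    level: str  # e.g. "fonds", "series", "file"
    children: list = field(default_factory=list)

    def add(self, child):
        self.children.append(child)
        return child

    @property
    def is_leaf(self):
        return not self.children


def walk(unit, parent_ref=""):
    """Yield (reference code, level, is_leaf) for the unit and its
    descendants, top-down; codes are built from the ancestors' codes."""
    ref = f"{parent_ref}/{unit.code}" if parent_ref else unit.code
    yield ref, unit.level, unit.is_leaf
    for child in unit.children:
        yield from walk(child, ref)


fonds = ArchivalUnit("F01", "Family archive", "fonds")
series = fonds.add(ArchivalUnit("S1", "Correspondence", "series"))
series.add(ArchivalUnit("U1", "Letters, 1438-1459", "file"))
fonds.add(ArchivalUnit("S2", "Accounts", "series"))
rows = list(walk(fonds))
for row in rows:
    print(row)
```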
76. Pointing out Social Networks
The system is able to draw graphs based on relationships between Persons using data
from the different entities and from the DSpace Items
In particular it draws relationships between persons who:
• Co-authored the same items
• Participated in the same event(s)
• Participated in event(s) in the same place(s)
• Are related to the same concept(s)
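The relationships above all follow one pattern: two persons are connected when they share an item, an event, a place or a concept. A minimal Python sketch of that idea weights each pair by the number of shared entities. It is an illustration with invented persons and events, not the network plugin shipped with DSpace-GLAM.

```python
from collections import Counter
from itertools import combinations


def co_participation_edges(groups):
    """Weight each pair of persons by the number of groups they share.

    groups maps a shared entity (an item, event, place or concept) to the
    persons linked to it; edge keys are alphabetically sorted name pairs."""
    edges = Counter()
    for persons in groups.values():
        for a, b in combinations(sorted(set(persons)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical participation data
events = {
    "event-1": ["Person A", "Person B", "Person C"],
    "event-2": ["Person B", "Person C"],
}
edges = co_participation_edges(events)
for (a, b), weight in edges.most_common():
    print(f"{a} -- {b}: {weight}")
```

The same function works for co-authorship or shared concepts by passing a different `groups` mapping. The edge weight can then drive the thickness of the links when the graph is drawn.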
78. Network configuration (network.cfg)
Networks are implemented by plugins
You can write your own implementation, typically starting from the default
ones
You can configure the network layout (colors, number of nodes, levels)
79. Formalizing and analysing interpretations
Interpretations are logical processes which start from
data and/or assumptions
and,
through logical reasoning connecting persons, events, documents, etc.,
arrive at one or more conclusions
Often, in the humanities, such processes are merged and hidden within natural
language narratives
To make such processes explicit, we have to decompose them into different
components and atomic propositions, and display those elements
81. Linking interpretations to entities
With DSpace-GLAM you can link an interpretation to the items,
events and persons it is related to
Moreover you can link different interpretations to the same entity
82. Contextualizing historical data
• Painting: The Flagellation (painter: Piero della Francesca)
• Events: Council of Ferrara (AD 1438), Council of Mantua (AD 1459)
• Places: Ferrara, Mantua
• Person: Emperor John VIII Palaiologos
• Concepts: Renaissance, Humanism, Neoplatonism
• Interpretation: Ronchey
84. Ready for Linked Open Data
By linking and relating the created
entities to other authorities,
the institution becomes ready to be part
of the Linked Data graph
We are now working to include
the additional entities in
the DSpace RDF management
features
91. DSpace-GLAM use cases
Cultural Heritage image files (digitized manuscripts, paintings, monuments,
archaeological finds, rare books, etc.) need to be consulted online, discussed
and commented on / annotated
IIIF protocols and formats allow you to meet these requirements in a standard
way that is understandable by both humans and machines
92. DSpace-GLAM use cases
High-quality scanned books typically have images of over 100MB per page
The structure of image sequences is complex and relevant (sequences of
pages, of the phases of a historical event, of a cycle of frescoes, etc.)
93. DSpace-GLAM use cases
The same requirements apply to audio and video content
-Streaming
-Internal structure
-Annotation / commenting / transcript
Adopt an open standard: the MPEG-DASH format allows adaptive streaming to a simple
HTML client, with full support for multiple tracks, ToC and subtitles
94. 4Science IIIF Image Viewer Addon
IIIF Compliant
1. Presentation API
2. Image API
3. Search API
4. Authentication API (soon)
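To give a concrete idea of the Image API, here is a Python sketch of how its request URLs are composed: `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`, plus `info.json` for the image description. The server base URL and identifiers are hypothetical. The default `size="full"` follows Image API 2.x (version 3.0 uses `max` instead).

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="full",
                   rotation=0, quality="default", fmt="jpg"):
    """Compose an IIIF Image API request URL:
    {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}"""
    ident = quote(identifier, safe="")  # "/" inside identifiers must be escaped
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


def iiif_info_url(base, identifier):
    """URL of the image information document (info.json)."""
    return f"{base.rstrip('/')}/{quote(identifier, safe='')}/info.json"


BASE = "https://iiif.example.org/iiif"  # hypothetical image server
# A 1024x1024 detail from the top-left corner, scaled to 512 px wide
print(iiif_image_url(BASE, "p1", region="0,0,1024,1024", size="512,"))
print(iiif_info_url(BASE, "ms 1/p1"))
```

Because region, size and rotation are part of the URL, a viewer can request only the tiles it needs instead of downloading a 100MB page.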
100. Link images with their textual transcription / OCR
OCR output in a standard format (hOCR) is indexed in a Web Annotation
server to supply the IIIF Search API
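hOCR stores each recognized word in a `span` with class `ocrx_word`, with its bounding box in the `title` attribute (`bbox x0 y0 x1 y1`). The minimal sketch below uses only Python's standard library to map each word to its page coordinates, so that a search hit can be highlighted as a region of the image. A real deployment would feed an annotation server rather than a Python dict (an assumption here), and the sample page is invented.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

BBOX = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWords(HTMLParser):
    """Collect (word, bbox) pairs from the ocrx_word spans of an hOCR page."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._in_word = False
        self._text = []
        self._bbox = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "ocrx_word" in (attrs.get("class") or "").split():
            match = BBOX.search(attrs.get("title") or "")
            self._bbox = tuple(int(n) for n in match.groups()) if match else None
            self._in_word = True
            self._text = []

    def handle_data(self, data):
        if self._in_word:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._in_word and tag == "span":
            word = "".join(self._text).strip()
            if word:
                self.words.append((word, self._bbox))
            self._in_word = False


def build_index(hocr):
    """Map each lower-cased word to the list of boxes where it occurs."""
    parser = HocrWords()
    parser.feed(hocr)
    parser.close()
    index = defaultdict(list)
    for word, bbox in parser.words:
        index[word.lower()].append(bbox)
    return index


PAGE = """<div class='ocr_page' title='bbox 0 0 1000 1400'>
 <span class='ocr_line' title='bbox 36 92 400 116'>
  <span class='ocrx_word' title='bbox 36 92 96 116; x_wconf 95'>Council</span>
  <span class='ocrx_word' title='bbox 110 92 170 116; x_wconf 91'>of</span>
  <span class='ocrx_word' title='bbox 180 92 300 116; x_wconf 93'>Ferrara</span>
 </span>
</div>"""
index = build_index(PAGE)
print(dict(index))
```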
101. Side by side – image vs text using an additional OCR
panel
102. An example in Arabic characters
https://dspace-glam.4science.it/handle/1234/24
103. IIIF Image Viewer: share and reuse
Share images with other scholars/users
without waiving proper attribution, e.g.
using the «manifest» JSON file:
https://dspace-glam.4science.it/json/iiif/1234/11/30/manifest
in another IIIF Image Viewer:
http://projectmirador.org/demo/
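A manifest is plain JSON, so any client can read it. The Python sketch below extracts the label, the attribution and the list of pages (canvases) with their image resources. It assumes the IIIF Presentation API 2.x layout (`sequences` → `canvases` → `images`); version 3.0 manifests use `items` instead, and the version served by a given installation should be checked. The sample manifest is invented; a real one could be loaded with `urllib.request.urlopen(url).read()`.

```python
import json


def summarize_manifest(manifest_json):
    """Extract label, attribution and (canvas label, image id) pairs from a
    IIIF Presentation API 2.x manifest (sequences -> canvases -> images)."""
    manifest = json.loads(manifest_json)
    canvases = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            images = canvas.get("images", [])
            image_id = images[0]["resource"]["@id"] if images else None
            canvases.append((canvas.get("label"), image_id))
    return {"label": manifest.get("label"),
            "attribution": manifest.get("attribution"),
            "canvases": canvases}


# Hypothetical two-page manifest
SAMPLE = json.dumps({
    "@context": "http://iiif.io/api/presentation/2/context.json",
    "@id": "https://example.org/iiif/book1/manifest",
    "@type": "sc:Manifest",
    "label": "Book 1",
    "attribution": "Example Library",
    "sequences": [{
        "@type": "sc:Sequence",
        "canvases": [
            {"@id": "https://example.org/iiif/book1/canvas/p1", "@type": "sc:Canvas",
             "label": "f. 1r", "width": 800, "height": 1000,
             "images": [{"@type": "oa:Annotation", "motivation": "sc:painting",
                         "resource": {"@id": "https://example.org/iiif/p1/full/full/0/default.jpg",
                                      "@type": "dctypes:Image"},
                         "on": "https://example.org/iiif/book1/canvas/p1"}]},
            {"@id": "https://example.org/iiif/book1/canvas/p2", "@type": "sc:Canvas",
             "label": "f. 1v", "width": 800, "height": 1000, "images": []},
        ],
    }],
})
summary = summarize_manifest(SAMPLE)
print(summary)
```

The attribution travels inside the manifest itself, which is why reusing the manifest in another viewer keeps the credit to the holding institution.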
105. Audio/Video streaming
https://dspace-glam.4science.it/explore?bitstream_id=1841&handle=1234/7&provider=video-streaming
Audio/video files are transcoded into a
format and encoding appropriate to the adopted media
server (adaptive video streaming)
Using the DASH standard protocol allows sharing video with
other scholars/users without waiving proper attribution,
e.g. using the «manifest» XML file:
https://dspace-glam.4science.it/av-stream/1841/ch/0/29/94/83/manifest.mpd
in another DASH client
http://dashif.org/reference/players/javascript/v2.4.1/samples/dash-if-reference-player/index.html
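The `manifest.mpd` file lists the same content encoded at several bitrates ("Representations"). Adaptive streaming means the client switches between them according to the available bandwidth. This Python sketch reads an MPD with the standard namespace `urn:mpeg:dash:schema:mpd:2011` and applies a deliberately simplified selection rule. Real players such as the DASH-IF reference player use more elaborate adaptation logic, and the sample MPD is invented.

```python
import xml.etree.ElementTree as ET

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# Hypothetical MPD with two video bitrates and one audio track
MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static"
     mediaPresentationDuration="PT1M" minBufferTime="PT2S"
     profiles="urn:mpeg:dash:profile:isoff-on-demand:2011">
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <Representation id="v360" bandwidth="800000" width="640" height="360"/>
      <Representation id="v720" bandwidth="2500000" width="1280" height="720"/>
    </AdaptationSet>
    <AdaptationSet mimeType="audio/mp4" lang="en">
      <Representation id="a128" bandwidth="128000"/>
    </AdaptationSet>
  </Period>
</MPD>"""


def representations(mpd_xml):
    """List (mimeType, representation id, bandwidth in bit/s) from an MPD."""
    root = ET.fromstring(mpd_xml)
    reps = []
    for aset in root.iterfind("mpd:Period/mpd:AdaptationSet", NS):
        for rep in aset.iterfind("mpd:Representation", NS):
            mime = rep.get("mimeType") or aset.get("mimeType")
            reps.append((mime, rep.get("id"), int(rep.get("bandwidth"))))
    return reps


def pick(reps, mime, available_bps):
    """Simplified rule: highest bandwidth that fits, else the lowest one."""
    candidates = sorted((r for r in reps if r[0] == mime), key=lambda r: r[2])
    if not candidates:
        return None
    fitting = [r for r in candidates if r[2] <= available_bps]
    return fitting[-1] if fitting else candidates[0]


reps = representations(MPD)
print(pick(reps, "video/mp4", 1_000_000))
```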
106. Visualizing and analysing datasets
4Science has released a free and open source integration with CKAN, the
world's leading open-source data management platform
Using an extensible viewer framework you can now offer data discovery,
exploration, preview, sampling and visualization from your DSpace repository
CKAN makes open web services for tabular data available: https://ckan.org/
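As an example of these web services, tabular data loaded into CKAN's DataStore can be queried through the Action API (`/api/3/action/datastore_search`), which answers with JSON of the form `{"success": ..., "result": {"records": [...]}}`. The Python sketch below builds such a request and aggregates the returned rows, for instance to count pottery finds per site. The CKAN site, the resource id and the response content are hypothetical; a live call would use `urllib.request.urlopen(url)`.

```python
import json
from collections import Counter
from urllib.parse import urlencode


def datastore_search_url(site, resource_id, limit=100, offset=0):
    """URL of a CKAN DataStore query through the Action API."""
    params = urlencode({"resource_id": resource_id, "limit": limit, "offset": offset})
    return f"{site.rstrip('/')}/api/3/action/datastore_search?{params}"


def records(response_text):
    """Return the rows of a datastore_search response, raising on failure."""
    body = json.loads(response_text)
    if not body.get("success"):
        raise RuntimeError(body.get("error"))
    return body["result"]["records"]


url = datastore_search_url("https://ckan.example.org", "pottery-finds", limit=5)
# Hypothetical response, shaped like a datastore_search result
RESPONSE = json.dumps({"success": True, "result": {
    "resource_id": "pottery-finds",
    "records": [{"site": "Site A", "ware": "amphora"},
                {"site": "Site B", "ware": "amphora"},
                {"site": "Site A", "ware": "lamp"}]}})
by_site = Counter(row["site"] for row in records(RESPONSE))
print(url)
print(by_site.most_common())
```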
107. Visualizing and analysing datasets
We look at DSpace-GLAM not only as a tool for
management and preservation, but also for
analysis
Our integration with CKAN allows the
visualization and analysis of repertoires and
inventories by means of grids, graphs or maps
Datasets can also be related to items and other
entities
https://dspace-glam.4science.it/handle/1234/15
Geolocation of archaeological finds
108. Visualizing and analysing datasets
https://dspace-glam.4science.it/explore?bitstream_id=1971&handle=1234/22&provider=ckan-recline
Pottery distribution
109. Why do I need DSpace-GLAM?
• DSpace-GLAM is a powerful extension of DSpace created by 4Science to meet
the needs of Galleries, Libraries, Archives and Museums
• to be able to manage, analyze and preserve digital objects
• together with historical, archaeological or other cultural datasets,
• relating them with other entities such as persons, events, places,
concepts, etc.
• to describe the context of cultural objects and data, according to different
granularity levels, and to different interpretations
• using worldwide adopted, cutting-edge, open-source software and open
standards
110. How do I get DSpace-GLAM?
• Every institution can install DSpace-GLAM or upgrade its DSpace installation
to DSpace-GLAM, extending document management by creating new entities
• Your publications will be safely managed as before, adding the advantage of
linking them to relevant information such as authors, datasets, events,
concepts, networks and much more
111. When can I move to DSpace-GLAM?
• Now: every moment is appropriate to enhance your Digital Library, to better
support research activities and make your service more relevant
• Upgrading from DSpace to DSpace-GLAM, or installing a brand-new “extended”
DLMS, does not take much extra effort, and the effort is well rewarded by the
results that you can get
• As an extra safeguard, if you already have a DSpace repository, DSpace-GLAM
does not alter the structure of the current objects managed by DSpace, so you
can go back from DSpace-GLAM to DSpace at any time just by dropping (a lot of)
extra tables… but we are confident that you will not want to do that
112. Data Science in a Digital Humanities Framework
• Our goal is to provide an environment for integrating the traditional
hermeneutic and interpretative work of historical sciences with data
visualization and analysis
• In this way, we hope, there may be a fundamental change in the way
digital cultural heritage is experienced, analyzed and contributed to
by the whole scientific community
113. Thanks for your attention
Andrea Bollini
<andrea.bollini@4science.it>
mobile: +39 333 934 1808
skype: a.bollini
orcid: 0000-0002-9029-1854
Claudio Cortese
<claudio.cortese@4science.it>
mobile: +39 333 9340846
skype: claudio.cortese74
orcid: 0000-0003-4572-9711