[Unclear] words are denoted in square brackets
#4 FAIR Data Principles – R is Reusable
20 September 2017
Video & slides available from ANDS website
START OF TRANSCRIPT
Keith Russell: Welcome to the fourth webinar in this series.
My name is Keith Russell from the Australian National Data Service and
I am your host for today. My colleague Susannah Sabine is behind the
scenes co-hosting the webinar with me.
The Australian National Data Service works with research organisations
around Australia to establish trusted partnerships, provide reliable
services to add value to research data and enhance the capability in
the research sector.
We work together with two other NCRIS-funded projects, RDS
(Research Data Services) and Nectar, to create an aligned set of joint
investments to deliver transformation in the research sector.
This webinar is part of a series of ANDS activities which aim to support
the Australian research community in increasing our ability to manage
our research data as a national asset.
This is the fourth and final in a series of webinars on the FAIR data
principles. We have had webinars on Findable, Accessible and
Interoperable. Today we will talk about making data re-usable,
according to the FAIR data principles.
Today I will kick off with an introduction to what the Force11 FAIR data
principles say about making your research data re-usable.
Then we will have two speakers that will talk about how you can take
these principles and apply these in practice.
First we will have Nerida Quatermass from Creative Commons
Australia who will provide more information on using the Creative
Commons Licensing framework and things to think about when
choosing a licence.
After that Margie Smith from Geoscience Australia will present on the
work that they have been doing on attaching provenance information to
research data.
These are the elements that Force11 has described for making
your research data re-usable.
First of all it is important to note that the other elements under FAIR
(Findable, Accessible, Interoperable) are also really important to make
data re-usable.
For example, if nobody can find the data, it will not be re-used.
The first high level heading is that the data and the metadata should
have a plurality of accurate and relevant attributes. Under this heading
they have described three elements that these attributes should cover.
1) Number one is that the data and the metadata should be released
with a clear and accessible licence for the data. Making data available
without assigning a licence makes the data really hard to re-use: as a
re-user, it is completely unclear what you can actually do with the data.
If you attach a licence make sure that it is in a machine readable format.
That way machines can access the data and know whether it can be
used for analysis.
Nerida will explain about a possible framework to use to assign a
licence to the data.
2) Number two is that the data and the metadata are associated with
provenance information on how the data was created. This provides
clarity on the steps that were taken in collecting, selecting and analysing
the data, turning it from raw data into derived data and finally into the
final data set. This is extremely useful information if you want to re-use
the data, as it provides context and gives you background on whether the
data will also be suitable for your purposes.
Attaching provenance information is easier said than done and I am
very grateful that Margie Smith is willing to present on how they have
picked up this challenge at Geoscience Australia.
3) The final point is that the data and the metadata should meet
domain-relevant community standards. For example, the data is best in a
data format and file format that is commonly used in the discipline, so it
is easy for another researcher in that discipline to pick up and use. Also
use a metadata format that is common in that discipline, as that often
contains specific fields that are relevant to that discipline and help a
researcher in that field to quickly understand the potential re-use of the
data set.
I would now like to ask our two speakers to talk in more detail about
aspects that are relevant for making the data re-usable.
First we will have Nerida Quatermass from Creative Commons
Australia, based at QUT.
She will present on the Creative Commons Licensing framework and
considerations on using these licences.
Next we will have Margie Smith from Geoscience Australia who will
present on how GA has attached provenance information to data.
We will save up questions till the end of the webinar. But please feel
free to type them in the question box as we go along.
Nerida Quatermass: Copyright law grants the monopoly over a work in material form to the
“owner” of it. CC licences have filled a need for a public licence, i.e. one
that anybody can rely on as permission to re-use a work. Before CC
licences, the only way to get re-use rights was by exceptions allowed
in copyright law or by licences directly negotiated between a copyright
owner and a licensee.
Public licences like CC are central to opening up access to research
outputs, including sharing of the data associated with them.
I’ve put an open access spectrum there because it’s really important
to distinguish between free access and re-usability which starts with
permission to share, and extends to the right to make derivative
works.
These permissions to re-use are communicated with a clear machine
readable licence.
You probably all know about CC licences. But as an overview:
Four licence elements can be combined, resulting in six CC licences.
They are featured in this slide on a spectrum of allowing more to less
re-use of a work.
The most open or permissive licence is Attribution, the most restrictive
is Attribution-Noncommercial-NoDerivatives.
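As a rough illustration of how four elements give six licences, the sketch below enumerates the combinations. It assumes the usual element abbreviations (BY, NC, SA, ND) and the rules that Attribution is always present and that ShareAlike and NoDerivatives cannot be combined. The function name is made up for this example, and the output order is simply enumeration order, not the spectrum shown on the slide.

```python
from itertools import product


def cc_licences():
    """Enumerate the six CC licences built from BY, NC, SA and ND."""
    licences = []
    # BY is always present; NC is optional; SA and ND are mutually
    # exclusive, so the "adaptation" slot has three possible values.
    for non_commercial, adaptation in product((False, True), (None, "SA", "ND")):
        parts = ["BY"]
        if non_commercial:
            parts.append("NC")
        if adaptation:
            parts.append(adaptation)
        licences.append("CC " + "-".join(parts))
    return licences


if __name__ == "__main__":
    for name in cc_licences():
        print(name)
```

Two choices for NonCommercial times three choices for the adaptation condition gives the six licences.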
The “free cultural Works seal” was developed for Wikimedia content. It
signals an important delineation between less and more restrictive
licenses applied to works in the digital commons. It distinguishes non-
software works.
In addition to the licenses, CC offers two public domain tools.
CC0, the public domain dedication tool for creators, and
the Public Domain Mark (PD Mark, a C with a strikethrough), which is
used to indicate works that are already in the public domain. It is
commonly used by cultural heritage institutions in digital collections.
CC0 can be particularly important to maximize the re-use of data and
databases because it otherwise may be unclear whether highly factual
data and databases are restricted by copyright or other rights. CC0 is
intended to cover all copyright and database rights, so however data
and databases are restricted under copyright or otherwise, those
rights are all surrendered.
It is foremost a waiver. It means you waive all of YOUR rights so that
you have zero rights left in a work, effectively dedicating it to the
public domain.
It has a legal code beneath it, because you need a legal mechanism
to relinquish your rights.
When you release content under CC Zero, you are explicitly stating
that you do not expect attribution. There's a little bit of uncertainty
around CC0 because Australian moral rights are fairly new, but the
licences are designed as carefully as possible to respect the author's
wishes.
The main points are:
Do license your data: international rules are too variable to rely on
the public domain.
CC0 ensures maximum compatibility with other licensed works and
prevents attribution stacking (e.g. attributing many contributors in a
project: the immediate source of a derivative work plus all upstream
works). There are other ways to acknowledge contribution.
The next best option is CC BY, if you really want attribution to be a
legal requirement.
The licences communicate re-use rights through a three-layer
design:
1. The Legal Code is the legal instrument which states the terms and
conditions of the licence
2. The Human Readable format is a plain-language summary of the
licence, with relevant icons that clearly indicate the conditions of
licensing and the re-use rights under the licence: "You are free to…
under the following terms".
In addition to supporting its reuse by individuals, the FAIR Principles
put specific emphasis on enhancing the ability of machines to
automatically find and use the data, which brings us to the third layer:
3. A machine-readable translation of the licence attaches to digital works
or digital copies of a work. The translation code (rights expression
language) becomes embedded in the digital source, which helps
search engines and other applications identify a work. This can also
be achieved by uploading a work to a content sharing platform that
supports CC licensing and takes care of the machine-readability for
you.
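To make the machine-readable layer a little more concrete, here is a minimal sketch of the kind of HTML marking it relies on: a link to the licence deed carrying the standard rel="license" link type. This is a simplified stand-in, not the actual output of the Creative Commons licence chooser; the function name, the title and the author are hypothetical.

```python
from html import escape


def licence_markup(title, author, licence_name, licence_url):
    """Return an HTML snippet that marks a work with its licence.

    The rel="license" attribute is what lets crawlers and other software
    recognise the link target as the licence that applies to the page.
    """
    return (
        f"<p>{escape(title)} by {escape(author)} is licensed under "
        f'<a rel="license" href="{escape(licence_url)}">'
        f"{escape(licence_name)}</a>.</p>"
    )


if __name__ == "__main__":
    print(
        licence_markup(
            "Sample dataset",            # hypothetical title
            "A. Researcher",             # hypothetical author
            "CC BY 4.0",
            "https://creativecommons.org/licenses/by/4.0/",
        )
    )
```

The same snippet serves the human reader (a visible licence statement) and the machine (a typed link), which is why marking and machine-readability are usually handled together.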
It’s also important to mark a work with the licence. I’ll talk about
marking shortly.
Regarding the robustness of the legal instrument:
The Creative Commons Licences have been upheld in every
jurisdiction in which litigation concerning them has occurred. There
have been no recorded cases of litigation concerning a Creative
Commons Licence in Australia, which would tend to support the
quality of their construction.
CC Licences are irrevocable, so they last for the term of copyright.
They are non-exclusive, so it is open to the rights holder to apply
another licence to the material should the need arise. For example, if
you release material under a CC NonCommercial licence, but a
commercial partner wishes to exploit the material, you can enter into a
separate licence with the commercial partner that permits commercial
reuse. This is known as “dual licensing”.
To maximise discoverability by search engines and software systems,
make sure to use our licence chooser tool to get the machine-readable
HTML code. The licence chooser also mints the licence for
marking a work.
Four important things come out of this: licence selection, attribution,
citation and more permissions.
1. Licence selection is guided by questions about what re-use you will
allow:
• Allow adaptations of your work to be shared?
• Allow commercial uses of your work?
• Remember that if your work is an adaptation of a work licensed
under either CC BY-SA or CC BY-NC-SA, then your derivative
work must be made available under the same license as per the
ShareAlike condition.
2. Attribution is a base condition of all the CC licences.
There is flexibility in the attribution requirements: attribution need
only be “reasonable to means, medium and context”, it can link to a
separate resource, and the licensor may waive some or all of the
attribution requirements.
3. Citation: the location of the work, and also of source works. This
answers concerns from data creators about being able to find the
original data.
• If the work you are licensing is a derivative of another work, then
you need to communicate that your work is a derivative, including
the source URL of the original work and a description of the
derivation or modification.
• When modifying materials under one of the Version 4.0 CC licenses,
you must make a note of any modifications you make to the
materials, regardless of whether the modification is significant
enough to merit a derivative work, and provide a URI back to the source.
This answers concerns from data creators about being able to find the
original data.
• It might be unfeasible to include attribution within a merged dataset, in
which case include a URI back to the unmodified version.
Lastly, more permissions:
• For example, if you license something under CC BY but are okay with
people not attributing you in certain cases, this is your chance to
specify those cases.
• You can't change the terms of a CC license, but you can always grant
additional permissions or warranties beyond what the license
allows.
• Does your work incorporate elements of several third party materials?
Mark these and provide attribution.
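The attribution and citation points above can be sketched as a small helper that assembles an attribution statement: who made the work, where the source lives, which licence applies, and a note of any modification. The function name, field names and example values are assumptions made for illustration; the wording of a real attribution can be adapted to the medium.

```python
def build_attribution(title, author, source_url, licence, licence_url, changes=None):
    """Assemble a plain-text attribution statement for a re-used work.

    `changes` is an optional short description of modifications, which
    should be noted when the material has been modified.
    """
    text = (
        f'"{title}" by {author} ({source_url}) '
        f"is licensed under {licence} ({licence_url})."
    )
    if changes:
        text += f" Modified from the original: {changes}."
    return text


if __name__ == "__main__":
    print(
        build_attribution(
            "Sample dataset",                 # hypothetical work
            "A. Researcher",                  # hypothetical creator
            "https://example.org/data/1",     # placeholder source URI
            "CC BY 4.0",
            "https://creativecommons.org/licenses/by/4.0/",
            changes="clipped to a smaller area",
        )
    )
```

Keeping the source URI in the statement is what gives later users a path back to the unmodified data.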
Marking communicates the licence ON the work: here is a list of ways to
mark a work
Regarding content platforms: If there is no licence field there is usually
a description or other free form field where you can enter info about a
work.
My key message today is Re-use is a core component of FAIR data. So,
do licence your data to enable re-use.
Creative Commons licenses provide a simple mechanism:
• to ensure that users of research have the rights they need to reuse,
replicate, and apply research outputs and data
• to disseminate and communicate research output in order to
maximise the impact of work while protecting intellectual property
and academic integrity
• with built-in attribution and citation, which create a clear path to the
original data.
Margie Smith: Hi there!
My name is Margie Smith and I have worked at Geoscience Australia
since November 2016 in the Science Data Governance and Policy
team… a team of two.
I came across to help GA meet its obligations under the National
Archives of Australia’s Digital Continuity 2020 Policy, to bring some
external policy knowledge into the organisation and to provide
governance guidance around science data management.
In response to the National Archives Digital Continuity 2020 Policy
and other Australian Government Open Data policies, government
organisations have been tasked with making their data holdings
visible and available.
Making data open is not new to GA but there is most definitely now a
whole of government push for access to all data domains.
I have produced several documents to meet the DC2020 data
governance milestones, but as you can see from this diagram, there
has to be a balance of both oversight and execution across the data
lifecycle – to have one without the other will either produce a pile of
documents that nobody reads or a plethora of silos of excellence
generating portals, datasets and services that only those in the know
can find and use.
Whilst there are a series of external drivers for data management, use
and re-use, there are also strong drivers currently within the
organisation.
For example:
• the cost of collecting or acquiring the data
• the cost of not finding data previously acquired or
• finding data and not being the person who ‘knows’ all about it
• succession planning
• analogue collections – diaries or paper products that have yet to
be digitised
• general public servant obligations like the Archives Act
• and, of course, GA’s Science Principles and vision.
Provenance will support the organisation through enabling data re-use
(as you can now find it) and allow for transparent science and advice
through understanding the data supply chain.
At the moment, our metadata records indicate provenance of the data
through the lineage statement or in the abstract.
As shown in these examples, the provenance of a dataset or product
is usually free-text and can be semi-structured or unstructured.
Very concise or…
… not exactly concise.
Here the abstract includes everything you need to know about the
Coastal inundation modelling for Busselton, Western Australia, under
current and future climate.
Whilst this provenance information is very useful, it is not particularly
useable; and by useable*, I mean its ability to be located, retrieved,
presented and interpreted, by a person or, ideally, by machine search.
* From ISO 15489-1:2016 Information and documentation -- Records
management -- Part 1: Concepts and principles
As an example of why we need provenance for data reuse, I have
made up a scenario.
In this scenario, the advice was generated from the complete dataset
at the time.
A scientist generated a model using algorithms and provided advice
based on the output of the model.
The advice, assuming it was of a general nature, is then made
available through the catalogue – generally as a PDF document.
The metadata for the advice gives the name of the dataset used, the
area that the advice covers, the organisation as author of the report,
and perhaps some of the methodology used in the generation of the
report.
In most cases, you could link the advice to the name of the dataset
that was used to generate the advice, but not easily to the scientist or
team and the models used to generate the advice.
So this provenance model of a data product could work well as a
highly structured PROV system.
My colleague Nick Car gave a presentation on GA’s PROV model to
ANDS in March and I suggest you watch that for specific information
about the model at Geoscience Australia.
Adapting Nick’s model, I have tried to replicate my previous scenario –
modelling what we are working towards at GA.
This is currently happening through lineage and association with
digital objects rather than a true PROV model of digital objects.
Working from right to left, the Advice would have a metadata record in
eCat, our electronic catalogue, that indicates the process used to
generate the advice, which is made up of the temporal subset of the
dataset the advice is based on, the software or models that were
applied to the data and information around that data’s acquisition as
well as the reason the advice was required.
If the data is to be re-used in future advice, it might also be helpful to
know what models were tried previously that didn’t work.
For our catalogue-like things, we need to gradually add the ability to
link Entities, Agents, Activities etc to be able to use graph structured
provenance (PROV-DM) across multiple types of objects and across
multiple systems in the future.
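As a hedged sketch of what graph-structured provenance could look like for the advice scenario, the example below records a few PROV-DM style relations (wasGeneratedBy, used, wasAssociatedWith) as simple triples and walks them back from the advice. The identifiers are invented for illustration, and this is not GA's actual model or catalogue schema.

```python
from collections import namedtuple

# One provenance statement: subject --predicate--> obj
Relation = namedtuple("Relation", "subject predicate obj")

# Hypothetical identifiers for the scenario: advice produced by a
# modelling run that used a dataset subset and a model, run by a scientist.
relations = [
    Relation("advice", "wasGeneratedBy", "modelling_run"),
    Relation("modelling_run", "used", "dataset_subset"),
    Relation("modelling_run", "used", "inundation_model"),
    Relation("modelling_run", "wasAssociatedWith", "scientist"),
]


def inputs_and_agents(entity, relations):
    """Return (inputs, agents) for the activities that generated `entity`."""
    activities = [
        r.obj for r in relations
        if r.subject == entity and r.predicate == "wasGeneratedBy"
    ]
    inputs = sorted(
        r.obj for r in relations
        if r.subject in activities and r.predicate == "used"
    )
    agents = sorted(
        r.obj for r in relations
        if r.subject in activities and r.predicate == "wasAssociatedWith"
    )
    return inputs, agents


if __name__ == "__main__":
    print(inputs_and_agents("advice", relations))
```

With the links stored explicitly, the question from the scenario (which data, which model and which person sit behind a piece of advice) becomes a lookup rather than a reading of free-text lineage.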
In my role I am particularly interested in the repeatability of advice
given by any government entity. Per the Archives Act, advice of this
type given by government must be stored for a period of years and
include the models, algorithms, software and data used to generate
the advice. It is a safety net for the entity and the public servants that
generated the advice at that point in time.
This is currently a manual process, heavily reliant on the individual
generating the advice and storing it appropriately.
It would be excellent if the work we are currently undertaking would
make it a lot easier for scientists to generate and catalogue this advice
in the future.
Prior to sorting out what I wanted to include in this presentation, I had
another look at the FAIR principles for data reuse.
Looking at these principles, I was feeling a lot better about what has
been achieved at GA in the last 18 months.
We have a public catalogue, it has a clear and accessible data usage
license and the standards used for cataloguing are in the spatial
domain.
The lineage in a metadata record has been the de facto ‘data
provenance’ to date.
We are currently working on multi-domain metadata retrieval from our
catalogue; for example, we will be able to export records in AGRIF for
Records Management, ISO 19115 for spatial and DCAT for the
National Archives.
The Google search is already enabled in the search panel on the
ga.gov.au splash page – this enables a search of both the website
and the catalogue for content.
In June, I was fortunate to attend the technical meeting of the Open
Geospatial Consortium, which is an international spatial standards
organisation. It was evident in discussions there that many other
countries were also working towards delivering their catalogues in
formats other than spatial to enable searching by other domains.
We have a new catalogue, our eCat, where:
• metadata records will have a persistent identifier
• the licence for data re-use is clear
• you can get to the data or product directly from the metadata
record
• and records for data are linked to services and portals that use
them, and vice versa.
At the moment, we are working to publish the 19115-3 catalogue
schema and codelists that are used by GA in the catalogue.
In terms of oversight, we have data product plans, roles and
responsibilities, and workflows for the release of products from GA
through eCat which is a longstanding and well understood process.
For the past month, my area has been undertaking work to highlight
the need for science areas to focus on a data-first rather than product-
first view. This data-first process will echo the data product publishing
workflows and have a dedicated internal catalogue we are calling
SourceCat.
SourceCat is a clone of the eCat software and is being trialled within
two areas of GA before being released across the organisation.
Once we have this in place, being able to show provenance from the
product to the data will be made easier as we start the process at the
beginning rather than try and remediate at the product publishing end
of a project.
This is a view of our new eCat – the electronic catalogue for products
generated at GA.
We have moved to the newer metadata standard for Australian Spatial
Data, the ISO 19115-1:2014 which you can see indicated on the page.
There are also Keyword lists which have been somewhat free-forming
to date. We have now selected well defined vocabularies where they
exist and are working with the custodians to publish them whilst at the
same time wrapping a governance structure around their maintenance
and future extension.
There is a persistent id and data download is indicated.
In the scenario I gave before, I pictured how the provenance of a data
product would work well as part of a highly structured PROV model.
The structure required supports data provenance and re-use even if it
doesn’t become a PROV system immediately.
The Source Catalogue is currently being built as a proof of concept for
two science areas in the organisation with the intention of making it an
agency tool for all data that is acquired or created.
In the future we intend to have a Software Catalogue and Objects
Catalogue so that the software or models used in data curation or
data products can be included as per PROV models. These are all
clones of the eCat software.
With this comes the need to support the organisation with tools and
documented procedures that in the future will become automagic
processes to bring data into the building. This support is more of the
oversight and execution balance that I spoke of earlier.
We are also using the catalogue standard to introduce elements that
will align with a future PROV model.
We will be including the element ‘derivedFrom’ in the metadata record.
In the future, if a product does not have a ‘derivedFrom’ element, it will
not be published.
Further into the future we will include the element ‘haveProv’, which is
different to lineage, as it is forward facing – linking the data to all
products that have used it.
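The two elements described here can be sketched with plain dictionaries standing in for metadata records: a publishing gate that refuses any product without a 'derivedFrom' entry, and a forward-facing index, built by inverting 'derivedFrom', that lists the products using each dataset. The record layout, identifiers and function names are assumptions for illustration only; the transcript names the elements but not their structure.

```python
def can_publish(record):
    """A product is publishable only if it names at least one source."""
    return bool(record.get("derivedFrom"))


def forward_links(records):
    """Invert 'derivedFrom' so each source lists the products that used it."""
    index = {}
    for record in records:
        for source in record.get("derivedFrom", []):
            index.setdefault(source, []).append(record["id"])
    return index


# Hypothetical records: product-C has no 'derivedFrom' and is held back.
records = [
    {"id": "product-A", "derivedFrom": ["dataset-1"]},
    {"id": "product-B", "derivedFrom": ["dataset-1", "dataset-2"]},
    {"id": "product-C"},
]

publishable = [r["id"] for r in records if can_publish(r)]

if __name__ == "__main__":
    print(publishable)
    print(forward_links(records))
```

The backward link is entered once, when the product is catalogued; the forward view can then be derived from it rather than maintained by hand.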
Nick explained that having all these links embedded will allow
a machine-readable PROV record to link to a metadata record to
indicate that provenance exists. He then started talking about PROV
bundles and lost me, but hopefully all these steps will lead to the working
PROV model of the future GA.
I was also thinking about the next talk on licensing frameworks. In this
future machine-to-machine scenario, the licenses of aggregated
products may be determined through an automated rule set
depending on the way the data product is delivered.
In this example a dataset and its associated web service have
differing licences. For third-party aggregated data use this process is
currently determined through extensive written agreements for each
product.
Finally, it takes a lot of work to remediate legacy metadata records.
Are we going to remediate every single one of our legacy data
records? NO – or at least not straight away. Not all data is high value
nor does all data have to be highly useable, but all data acquired and
data products created should be FAIR.
To re-use data, it is necessary to understand its provenance to assess
if it is fit for purpose. In working towards a PROV model and
implementing tools like the SourceCat, we are also further along the
path to achieving GA’s vision to fully maximise our data potential.
END OF TRANSCRIPT

Contenu connexe

Tendances

The open semantic enterprise enterprise data meets web data
The open semantic enterprise   enterprise data meets web dataThe open semantic enterprise   enterprise data meets web data
The open semantic enterprise enterprise data meets web data
Georg Guentner
 
Legal interoperability: glocal perspective (LAPSI, Torino)
Legal interoperability: glocal perspective (LAPSI, Torino)Legal interoperability: glocal perspective (LAPSI, Torino)
Legal interoperability: glocal perspective (LAPSI, Torino)
Federico Morando
 

Tendances (15)

A Look at CESSDA and Data Re-use Licenses
A Look at CESSDA and Data Re-use LicensesA Look at CESSDA and Data Re-use Licenses
A Look at CESSDA and Data Re-use Licenses
 
FOSDEM 2012 Legal Devroom: ⊂ (FLOSS legal/policy ∩ CC [4.0])
FOSDEM 2012 Legal Devroom: ⊂ (FLOSS legal/policy ∩ CC [4.0])FOSDEM 2012 Legal Devroom: ⊂ (FLOSS legal/policy ∩ CC [4.0])
FOSDEM 2012 Legal Devroom: ⊂ (FLOSS legal/policy ∩ CC [4.0])
 
Monica Crocker & Cathy Beil eDiscovery In The Real World
Monica Crocker & Cathy Beil eDiscovery In The Real WorldMonica Crocker & Cathy Beil eDiscovery In The Real World
Monica Crocker & Cathy Beil eDiscovery In The Real World
 
Audio Discovery by Albert Kassis
 Audio  Discovery by Albert Kassis  Audio  Discovery by Albert Kassis
Audio Discovery by Albert Kassis
 
The Basics of Cloud Computing
The Basics of Cloud ComputingThe Basics of Cloud Computing
The Basics of Cloud Computing
 
Hello Open World - Semtech 2009
Hello Open World - Semtech 2009Hello Open World - Semtech 2009
Hello Open World - Semtech 2009
 
Domain Name System
Domain Name SystemDomain Name System
Domain Name System
 
The open semantic enterprise enterprise data meets web data
The open semantic enterprise   enterprise data meets web dataThe open semantic enterprise   enterprise data meets web data
The open semantic enterprise enterprise data meets web data
 
Dangerous Liaisons - Software Combinations as Derivative Works?
Dangerous Liaisons - Software Combinations as Derivative Works?Dangerous Liaisons - Software Combinations as Derivative Works?
Dangerous Liaisons - Software Combinations as Derivative Works?
 
Chapter 5 - Developments in Multimedia and Internet Licensing - The Licensing...
Chapter 5 - Developments in Multimedia and Internet Licensing - The Licensing...Chapter 5 - Developments in Multimedia and Internet Licensing - The Licensing...
Chapter 5 - Developments in Multimedia and Internet Licensing - The Licensing...
 
Partly Sunny With a Chance of Rain: Forecasting the Legal Issues in Cloud Com...
Partly Sunny With a Chance of Rain: Forecasting the Legal Issues in Cloud Com...Partly Sunny With a Chance of Rain: Forecasting the Legal Issues in Cloud Com...
Partly Sunny With a Chance of Rain: Forecasting the Legal Issues in Cloud Com...
 
Lgd 2
Lgd 2Lgd 2
Lgd 2
 
Tags, Networks, Narrative
Tags, Networks, NarrativeTags, Networks, Narrative
Tags, Networks, Narrative
 
How to Publish Open Data
How to Publish Open DataHow to Publish Open Data
How to Publish Open Data
 
Legal interoperability: glocal perspective (LAPSI, Torino)
Legal interoperability: glocal perspective (LAPSI, Torino)Legal interoperability: glocal perspective (LAPSI, Torino)
Legal interoperability: glocal perspective (LAPSI, Torino)
 

Similaire à Transcript #4 fair -R for Reusable

The Proliferation And Advances Of Computer Networks
The Proliferation And Advances Of Computer NetworksThe Proliferation And Advances Of Computer Networks
The Proliferation And Advances Of Computer Networks
Jessica Deakin
 
Sears web30e connectionartificialintelligence
Sears web30e connectionartificialintelligenceSears web30e connectionartificialintelligence
Sears web30e connectionartificialintelligence
hrpiza
 
Sears web30e connectionartificialintelligence
Sears web30e connectionartificialintelligenceSears web30e connectionartificialintelligence
Sears web30e connectionartificialintelligence
hrpiza
 
Secure File Sharing on Cloud
Secure File Sharing on CloudSecure File Sharing on Cloud
Secure File Sharing on Cloud
Supriya .
 

Similaire à Transcript #4 fair -R for Reusable (20)

How FAIR is your data? Copyright, licensing and reuse of data
How FAIR is your data? Copyright, licensing and reuse of dataHow FAIR is your data? Copyright, licensing and reuse of data
How FAIR is your data? Copyright, licensing and reuse of data
 
Research data management 1.5
Research data management 1.5Research data management 1.5
Research data management 1.5
 
Session 1.1 dalicc - data licenses clearance center
Session 1.1   dalicc - data licenses clearance centerSession 1.1   dalicc - data licenses clearance center
Session 1.1 dalicc - data licenses clearance center
 
Sharing Scientific Data: Legal, Normative and Social Issues
Sharing Scientific Data: Legal, Normative and Social IssuesSharing Scientific Data: Legal, Normative and Social Issues
Sharing Scientific Data: Legal, Normative and Social Issues
 
Lightweight rights modeling and linked data publication for online cultural h...
Lightweight rights modeling and linked data publication for online cultural h...Lightweight rights modeling and linked data publication for online cultural h...
Lightweight rights modeling and linked data publication for online cultural h...
 
Smith RDAP11 NSF Data Management Plan Case Studies
Smith RDAP11 NSF Data Management Plan Case StudiesSmith RDAP11 NSF Data Management Plan Case Studies
Smith RDAP11 NSF Data Management Plan Case Studies
 
Fact Sheet on Creative Commons & Open Science (by Creative Commons UK)
Fact Sheet on Creative Commons & Open Science (by Creative Commons UK)Fact Sheet on Creative Commons & Open Science (by Creative Commons UK)