JABES 2018 - Opening keynote, Jurgen Kett, presentation with commentary

Developing a backbone for the web of cultural and scientific data.
Jurgen Kett, head of the Library Standards department and responsible for the GND, Deutsche Nationalbibliothek
Journées ABES 2018

  2. The idea of linked data, or of the semantic web, is to combine data beyond the boundaries of systems and domains in a meaningful way. The idea is not new. In fact, as librarians we have a long tradition of interlinking, sharing and re-using data. Perhaps that is one of the reasons why libraries embraced the movement very quickly. The difference from our tradition is that we had been using our own standards, formats and protocols, which were hard for outsiders to understand. Now it seemed that with some minor transformations we could plug directly into the world. It was too tempting. We had just been treated as dinosaurs, and suddenly we found ourselves setting the tone in future technologies. Since then we have learned that it is not that easy. It takes much more time and effort to build a living cultural semantic web. It is not enough to just change the cover; we need to revise the way we work and the environments we use. One important step towards this goal is to develop standard ways of interlinking cultural collections. The existing concept of authority data is a perfect way to do this.
  3. Our goal is to make the GND, which is still mainly used by libraries, a cross-domain community project. This project will address perhaps the most important task of our time: building bridges.
  4. The bridges the GND provides are components of the Semantic Web: they connect data collections. The GND is the central authority file of the German-speaking world. It is used most intensively in Germany, Austria and Switzerland, but also in South Tyrol, Liechtenstein and Luxembourg.
  5. It contains records for persons, corporate bodies, works, conferences, geographical names and subject headings.
  6. In order to make the data available to a broad user base and to maximize its dissemination, the data is published under a free license. Offering different formats is also important for distribution. The most used format is still MARC21, but increasing use outside the library domain has made RDF and JSON-based formats more important.
  7. Currently the GND contains about sixteen million records. The majority of these records (three quarters) describe persons. As you can see in the diagram, a large portion of them are “Names of Persons”. These records are of lower quality and are not precise. One challenge is to improve as much of this data as possible. With the introduction of the FRBR-based RDA in the German-speaking world, the number of works in the GND will grow very fast.
  8. The main application of the GND database was initially the standardization of search entries and the re-use of cataloging data. But with the success of the World Wide Web, its potential as a tool for linking publications semantically, in meaningful machine-readable ways, has become increasingly important. Over the years our partners integrated all their local authority data into the GND database. More and more collections and datasets referred to it. Step by step the GND became a data network. You can roughly distinguish the three kinds of links listed here.
  9. The most important feature of the GND is its highly cooperative character. The GND is carried by many partners, most of them national libraries and library centers. Hundreds of libraries and other institutions are connected via these partners. We all maintain the GND jointly. We have built it together over the years, defined its rules, and specified interfaces and business processes. We argued heavily about strategic decisions. And whenever old local leftovers (forgotten, so-called authority data) had to be integrated, we were regularly annoyed about each other's data garbage. In short: we have learned to manage an authority file together.
  10. Over the last couple of years our user community has grown. That is a great opportunity, but our organisation and systems do not scale well enough. The biggest challenge is the integration of new communities (e.g. museums, archives, historic preservation, portals).
  11. But it is also clear that the complexity of the system is increasing due to the different perspectives.
  12. In order to meet these challenges, we have set up an initiative. It covers the revision of the organization (responsibilities, rights and obligations), the rules (cataloging rules, workflows and rights management), the system environment (tools and interfaces for editing, analyzing and visualizing) and the technical infrastructure. As a basis, we set up a new organizational structure, agreed on common strategic goals and principles, and defined a work program.
  13. Before the start of the initiative, the GND had no formal organizational structure. Although it is great in principle when things work without the usual formalities, this situation had some disadvantages. The roles and responsibilities (especially with regard to strategic issues) were not clear. It was obvious that a binding structure was needed to handle the fundamental modernization of the GND.
  14. The new organizational structure needed to be scalable. It must be able to grow as new partners from different domains join. In 2017 we agreed on the new organizational model shown here. The GND committee is responsible for strategic decisions and oversees the operation of the GND. The GND office provides the common core system environment (GND platform) and services, including quality control services. The office also oversees the development of rules, formats and the GND platform. The so-called agencies are mainly responsible for the management of user groups (participants). They represent the interests of their participants in the GND committee. Agencies coordinate change requests and their implementation, and provide support and training. They are also responsible for the quality of all the data provided by their participants. This means that if problems arise with a record, they are obliged to take care of it.
  15. With the founding of the GND cooperative, we also agreed on common principles. The principles partly contradict each other and form a field of tension. They already show what the fundamental challenges of the coming years are. For example, in future the GND should be designed consistently across domains and take into account the requirements of non-library institutions. On the other hand, we commit ourselves to consistency (unambiguous) and uniformly high data quality (binding rules, trusted quality).
  16. The final step to get the development going was to set up a work program with six Action Fields.
  17. Our goals are impossible to achieve without additional resources and political support. Fortunately, our vision has wide support in the whole scientific and cultural community. It is a shared vision. The Deutsche Forschungsgemeinschaft (DFG) gives us the opportunity to obtain substantial funds for essential, development-intensive parts of the program.
  18. Numerous cultural institutions participate in this from their own resources. We have also been able to build strategic partnerships with other communities.
  19. These are the ongoing projects of the work program and those planned for the near future. The project “GND4C”, which is to lay important foundations for the opening of the GND, forms something like the centerpiece. There are also numerous activities outside of projects.
  21. The basic idea of Action Field 1 is that new partners join together to form interest groups. In a second step, these stakeholders should then either be assigned to an existing agency or create a new agency of their own. Of course, this is a gradual process and there is a lot to learn about each other. In order to enable a constant dialogue, representatives of the interest groups are involved in the committees of the GND from the beginning. It is clear that the policy of cooperation needs to be developed further in view of the needs of these new partners. Important preparatory work has already been done in setting up the German Digital Library (DDB), a kind of national sibling of Europeana. The structures created there should be reused and strengthened.
  22. Action Field 2 addresses the already mentioned contradiction between uniqueness and unity on the one hand and community-specific demands on the other. The rules and the data model should better meet the needs of the new partners; at the same time, they should promote cross-domain collaboration. Through years of collaboration with special interest groups, we have learned that it is useless to force everyone into a single model. In practice, existing data fields get reinterpreted. Unfortunately, this leads to incompatibility, misunderstandings and, as a rule, to the creation of duplicates and to inefficient and frustrating “edit battles”. Therefore, we plan to introduce modular extensions for specific stakeholders (GND-PLUS). The rules and fields within these extensions are set by the stakeholders. The fields are protected. They are not obligatory for the other groups, but the information can be used by everyone. Many library-specific changes will be made in a GND-PLUS space.
  23. Action Field 3, “Import and data mining”, is about creating tools that allow efficient data analysis and data integration. New partners come with previously unconnected datasets and collections. We need tools and workflows to support the integration, better tools for quality control, and better support for adding internal and external links.
  24. We also need to improve access to the GND network. Currently the underlying network of interconnected collections and datasets is not really visible to users. Our goal is to provide a central entry point to that network. This entry point will not be a giant integrated portal, but a signpost to connected datasets, collections and services. It will also provide means to explore the GND.
  25. Action Field 5 is necessary because the data exchange infrastructure is unstable and will not survive further growth. Currently many partners mirror the complete database locally. The number of updates we can distribute is limited by the processing speed of the slowest connected system. We have to pour a swimming pool full of changes cup by cup through a tiny tube to prevent overflows. The main difficulty with this work package is that it also has to deal with the local systems.
  26. Last but not least, Action Field 6, “Collaboration”, is about reaching out to new user groups and usage scenarios. We want other user groups to use the GND as a tool in their own environment. In particular, authors and publishers should discover the GND for themselves and immediately claim their publications with a personal unique identifier. We are still in the early stages of this and are currently starting a project together with important representatives of the publishing industry. In order to reach scientists and universities, we started a cooperation project around ORCID with various project partners (ORCID DE).
  27. We also want to make the GND more attractive to software developers and research projects. Therefore, we plan to develop a lightweight API for the GND and to build a registry for existing projects and tools.
  28. Even more important is the cooperation with the Wikimedia projects "Wikipedia" and "Wikibase". For about ten years there has been a successful cooperation with Wikipedia. Wikipedians connect articles to the GND and make suggestions for corrections. Our goal is to bring the Wikidata community even closer to the GND and vice versa. We also plan to evaluate the re-use of Wikibase, the software running on Wikidata.
  29. In the end, it's about skillfully complementing each other's strengths. We hope that through our initiative we will make a difference for other, similar projects.
  30. If you look at the parallels between our two endeavors, the idea of a European authority data system comes up. If our data hubs are based on similarly flexible concepts and on the same software base, then in a second or third step it will not be difficult for us to interlink these hubs. We should just try it.
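
The modular extensions described under Action Field 2 (GND-PLUS) can be illustrated with a small sketch. It is not the GND data model: the class, the field names and the layout of the `plus` namespaces are hypothetical, chosen only to show how stakeholder-owned fields could stay protected from edits by other groups while remaining readable by everyone.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorityRecord:
    """Hypothetical record: shared core fields plus per-stakeholder extensions."""
    identifier: str
    core: dict = field(default_factory=dict)  # fields governed by the common rules
    plus: dict = field(default_factory=dict)  # stakeholder name -> that group's own fields

    def set_plus(self, editor_group: str, namespace: str, key: str, value: str) -> None:
        """Write an extension field; only the owning group may edit its namespace."""
        if editor_group != namespace:
            raise PermissionError(
                f"{editor_group} may not edit fields owned by {namespace}"
            )
        self.plus.setdefault(namespace, {})[key] = value

    def get_plus(self, namespace: str, key: str):
        """Read an extension field; reading is open to every group."""
        return self.plus.get(namespace, {}).get(key)


# Illustrative values only; identifiers and field names are made up.
record = AuthorityRecord(identifier="example-1", core={"preferredName": "Example, Person"})
record.set_plus("museums", "museums", "objectRole", "artist")

# Another group can read the museum field ...
role_seen_by_libraries = record.get_plus("museums", "objectRole")

# ... but cannot overwrite it, so the field is not silently reinterpreted.
try:
    record.set_plus("libraries", "museums", "objectRole", "author")
    overwrite_blocked = False
except PermissionError:
    overwrite_blocked = True
```

In this sketch the core fields stay under the common rules, while each extension namespace carries its own rules, which is one possible reading of "protected, not obligatory, but usable by everyone".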