
TDM of National Libraries in the EU.pptx


Text and Data Mining (TDM) for scientific research or for any other purpose is included in the provisions of the Directive 2019/790/EU on Copyright in the Digital Single Market. Research on TDM operations in the National Libraries of EU Member States was conducted and is presented.


  1. Text & Data Mining in Archives, Libraries & Museums: Research on TDM of National Libraries in the EU
     Centre of International and European Economic Law & Jean Monnet Foundation For Europe
     Prof. Maria Kanellopoulou Botti, Department of Archives, Library Science & Museology, Ionian University, Attorney-at-Law & Dr. Marinos Papadopoulos, Attorney-at-Law
     e-Conference on Mass Digitization and the EU Policy for Intellectual Property @ 30-31/03/2022
  2. Text and Data Mining (TDM)
     - Art.3 & Art.4 of Directive 2019/790/EU on Copyright in the Digital Single Market (DSM Directive).
     - TDM includes Web Harvesting and Web Archiving activities.
     - A statutory mandatory exception to Copyright that has long been requested (e.g., IFLA Statement on Text and Data Mining, 2013).
     - The TDM exception is inspired by, and contains partly the same conditions as, the scientific research exception.
     1. Has to be implemented across all EU Member States in order to ensure effective harmonization of the law.
     2. Must not be subject to contractual overrides regarding TDM implemented for scientific purposes.
     3. Must not be subject to lock-up behind technological protection measures.
  3. What TDM is
     - TDM is an automated analytical technique aimed at analyzing text and data in digital form in order to generate information, which includes but is not limited to patterns, trends and correlations. It is any activity where computer technology is used to index, analyze, evaluate and interpret mass quantities of content and data (Recitals 8, 11).
     - TDM is an inherent part of Artificial Intelligence and Machine Learning research.
     - TDM works in the following manner:
       1. Identifying
       2. Copying
          a. Pre-processing
             i. Tokenization
             ii. Normalization (stemming or lemmatization)
             iii. Parsing (POS tagging)
          b. Uploading
       3. Extracting
       4. Recombining
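The pre-processing and extraction steps listed on this slide can be illustrated with a minimal Python sketch. It is not taken from the presentation: the function names, the two-sentence corpus and the suffix rules are all hypothetical, the normalization is a toy stand-in for real stemming or lemmatization, and parsing (POS tagging) is left out entirely.

```python
import re
from collections import Counter


def tokenize(text):
    """Tokenization: split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def naive_normalize(token):
    """Toy normalization (assumption: NOT a real stemmer or lemmatizer).

    Applies two illustrative suffix rules and leaves short tokens alone.
    """
    for suffix, replacement in (("ies", "y"), ("s", "")):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)] + replacement
    return token


def preprocess(text):
    """Pre-processing: tokenization followed by normalization."""
    return [naive_normalize(token) for token in tokenize(text)]


def extract_term_counts(documents):
    """Extraction: count normalized terms across all copied documents."""
    counts = Counter()
    for document in documents:
        counts.update(preprocess(document))
    return counts


# Hypothetical corpus standing in for the "identified" and "copied" works.
corpus = [
    "Libraries archive websites.",
    "A library archives a website.",
]
counts = extract_term_counts(corpus)
print(counts.most_common())
```

Because both sentences normalize to the same terms, the counts show a simple pattern (each term occurring once per document) that is invisible when the raw strings are compared directly. A production pipeline would normally rely on an established NLP library for these steps.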
  4. European v American perspective on TDM
     In the US legal environment, courts have found that reproducing copyrighted works as one step in the process of knowledge discovery through text data mining is transformative, and thus ultimately the act of reproduction of works through the TDM process is a fair use of those works that fits in the first fair use factor of the US Copyright Act. The concept of “transformative use” fits in the concept of “non-expressive use”, the latter being considered as a subset of the former.
     In EU Copyright law, the notion of reproduction is accepted in its broadest meaning, as is clearly stated in art.2 of the InfoSoc Directive and also indicated in Recital 21 of the InfoSoc Directive. In EU Copyright law, the meaning of reproduction is to be determined technically rather than functionally. Thus, copying of works in the framework of the TDM process falls, under EU Copyright law, within the legal meaning of reproduction, which is an exclusive right of the author of a work.
  5. Lawful Access
     1. Access to a work through a subscription or access to content based on open access (Recitals 10, 14). Access to content that is freely available online. Access to a work that is allowed by an existing exception or limitation to Copyright.
     2. Access to a database in compliance with the terms of use and the conditions of access set by the rightholder of the database. Access to a work that is allowed by an existing exception or limitation to Copyright.
     3. Lawful access = normal use = lawful use (Recital 33 of InfoSoc Directive).
     4. Lawful access does not allow the circumvention of Technological Protection Measures (TPM).
  6. Purpose-specific TDM (art.3) v TDM for any purpose (art.4)
     1. Art.3: the TDM exception to Copyright is provided for the purpose of scientific research.
     2. Art.4: the TDM exception to Copyright is not purpose-specific.
     Scientific research: “creative work undertaken on a systematic basis in order to increase the stock of knowledge, including knowledge of man, culture and society, and the use of this stock of knowledge to devise new applications.” (OECD definition)
  7. The beneficiary of purpose-specific TDM
     1. Art.3: a research organization and/or a cultural heritage organization.
        - Research organization: a university, including its libraries, a research institute or any other entity, the primary goal of which is to conduct “scientific research” or to carry out educational activities involving also the conduct of scientific research: (a) on a not-for-profit basis or by reinvesting all the profits in its scientific research; or (b) pursuant to a public interest mission recognized by a Member State, and in such a way that the access to the results generated by such scientific research cannot be enjoyed on a preferential basis by an undertaking that exercises a decisive influence upon such organization (Art.2§1).
        - Cultural heritage organization: a publicly accessible library or museum, an archive or a film or audio heritage institution, regardless of the type of works or other subject matter that they hold in their permanent collections; cultural heritage organizations should also be understood to include, inter alia, national libraries, national archives, educational establishments, research organizations and public sector broadcasting organizations (Art.2§3, Recital 13).
  8. The beneficiary of TDM (that is not purpose-specific)
     1. Art.4: any public or private, non-profit or for-profit, legal or physical person (Recital 18).
  9. Empirical Research of TDM in National Libraries of EU Member States
     SURVEY’S IDENTITY
     Name: A survey on web archiving in EU Member States’ national libraries
     Kind: Empirical research via questionnaire
     Medium: Internet, by Google Forms
     Provider: Ionian University; co-funded by Greece and the European Union – European Social Fund
     Part of: A research project titled “Web Archiving in Public Libraries and IP Law” within the framework of the Operational Program “Human Resources Development, Education and Lifelong Learning” of NSRF - Partnership Agreement 2014-2020
     Duration: March – July 2019
     Target group: National Libraries of EU Member States
     Language: English
     Basic fields/components: 1. Library’s policies on Web-harvesting / Arrangement / Procedures, 2. Technological issues, 3. Legal issues, 4. Access/Utilization, 5. Co-operation & Perspectives, 6. Proposals and useful observations
     Number of questions: 17
     Main scope: Collecting elements on current web archiving situation
     Expected results: Enhancing countries involved in Web Archiving, complications, perspectives, new projects
  10. Empirical Research of TDM in National Libraries of EU Member States
      - EU Member States’ National Libraries that responded to the research undertaken
  11. Empirical Research of TDM in National Libraries of EU Member States
      - 17 Questions:
        1. Policy issues questions on Web-harvesting, library arrangements and procedures
        2. Technological issues questions
        3. Legal issues questions
        4. Access/utilization questions
        5. Co-operation & perspectives questions
        6. Proposals and useful observations (open-ended questions)
  12. Empirical Research of TDM in National Libraries of EU Member States
      - The importance of Web-Harvesting/Archiving for EU National Libraries
  13. Empirical Research of TDM in National Libraries of EU Member States
      - Number of operators for Web-harvesting/archiving in EU National Libraries (not all surveyed EU National Libraries answered this question)

        Country     Operators No.    Country     Operators No.
        Denmark     7-8              Hungary     5
        France      4                Sweden      3
        Slovenia    3                Belgium     1
        Greece      3                Germany     4
        Spain       5                Estonia     4
  14. Empirical Research of TDM in National Libraries of EU Member States
      - Use of quality filters for Web-Harvesting of EU National Libraries
  15. Empirical Research of TDM in National Libraries of EU Member States
      - The main purpose for Web-Harvesting/Archiving of EU National Libraries
  16. Empirical Research of TDM in National Libraries of EU Member States
      - The use of third parties for Web-Harvesting of EU National Libraries
  17. Empirical Research of TDM in National Libraries of EU Member States
      - The use of software for Web-Harvesting of EU National Libraries
  18. Empirical Research of TDM in National Libraries of EU Member States
      Software: Archive-it of Internet Archive W3ACT Heritrix crawl engine, Annotation Curation Tool Repox Software Heritrix Proprietary software of the service provider Heritrix bundled with NetarchiveSuite Heritrix with Net Archive Suite (NAS) Heritrix 3, ArchiveIt, Webrecorder (as an experiment) NetarchiveSuite, Heritrix, Free text search using Solr, and Wayback. Developing search frontend and playback engine SolrWayback. Archive-It Heritrix, Net Archive Suite, Open Wayback, SolR Heritrix (and the Web Curator Tool) Web Curator Tool, Heritrix NetarchiveSuite and Heritrix. Heritrix web harvesting software OWA-Client, developed by service provider Heritrix (harvesting), SOLR (indexing), Wayback (search and representation) Heritrix
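One response on this slide splits the tool chain into three roles: harvesting (Heritrix), indexing (SOLR), and search and representation (Wayback). The sketch below illustrates only that division of labour, using the Python standard library. It does not call Heritrix, Solr or Wayback, and the capture records, URLs and function names are hypothetical stand-ins.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for harvested captures: (URL, capture date, extracted text).
# A real crawler would store captures in archive files rather than a list.
captures = [
    ("http://example.org/", "2019-03-01", "National library web archive"),
    ("http://example.org/news", "2019-03-02", "Library opens new reading room"),
]


def build_index(captures):
    """Indexing role: map each lowercase term to the captures containing it."""
    index = defaultdict(set)
    for position, (_url, _date, text) in enumerate(captures):
        for term in re.findall(r"\w+", text.lower()):
            index[term].add(position)
    return index


def search(index, captures, query):
    """Search role: return (url, date) of captures containing every query term."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return []
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return [captures[i][:2] for i in sorted(hits)]


index = build_index(captures)
print(search(index, captures, "reading room"))
```

Keeping the capture store, the index and the search function separate mirrors the way the respondents combine independent tools, so that one component can be replaced without re-harvesting.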
  19. Empirical Research of TDM in National Libraries of EU Member States
      - Concern for author’s consent before execution of Web-Harvesting of EU National Libraries
  20. Empirical Research of TDM in National Libraries of EU Member States
      - Concern for personal data protection before execution of Web-Harvesting of EU National Libraries
  21. Empirical Research of TDM in National Libraries of EU Member States
      - Concern for intellectual property protection in process of Web-Harvesting of EU National Libraries
  22. Empirical Research of TDM in National Libraries of EU Member States
      - The terms of access to and use of works harvested from the Web and archived by EU National Libraries
        1. Usually only inside the library in the research reading rooms (7).
        2. On legal deposit terminals with firewall (3).
        3. Only on Library premises to registered users (6).
        4. Available online with the specific permission of the website holder and publishers (5).
        5. Available online on the permission of National Library (1).
        6. The web archive is publicly available without restrictions. Intellectual property right holders can request their material to be accessible only on library premises (1).
        7. The archived websites are available for research purposes only (3).
        8. Only printing is permitted and not in all libraries (3).
  23. Empirical Research of TDM in National Libraries of EU Member States
      - Inquiry into user satisfaction with the Web-harvesting service of EU National Libraries
  24. Empirical Research of TDM in National Libraries of EU Member States
      - Forms of co-operation for the Web-harvesting service of EU National Libraries
  25. Empirical Research of TDM in National Libraries of EU Member States
      - Connection of Web-harvesting systems of EU National Libraries and e-book publishers
  26. Empirical Research of TDM in National Libraries of EU Member States
      - Answers to the question on plans for new projects related to Web-harvesting of EU National Libraries
        1. Integration of the web documents metadata in the National Library Service Catalog.
        2. Exploring the use of the web recorder tool to archive websites and push the WARCs gathered in this way into the library’s collection.
        3. More stakeholder involvement and projects to raise awareness of web harvesting.
        4. Exploring new tools for harvesting content from social and streaming media platforms.
        5. Harvesting of press websites with paywall (an automated authentication of the crawler).
        6. Cooperation with the Internet Archive, in order to achieve better bulk harvesting.
        7. Upgrading the library’s services with the support of another software (MINT), which will make it possible to enrich metadata during the harvesting process.
        8. Web-harvesting of new thematic fields on digital music, climate change etc.
        9. Increasing the Web-harvested collections constantly.
        10. Modernizing and expanding the Web-harvesting environment, including the system used for access to harvested works, where the library will switch from an in-house system to the Open Wayback system.
        11. Social media harvesting, depending on whether there will be funding.
  27. Empirical Research of TDM in National Libraries of EU Member States
      - The most important problem in Web-harvesting operation of EU National Libraries
  28. Empirical Research of TDM in National Libraries of EU Member States
      - Proposals & Observations for Web-harvesting operation of EU National Libraries
        1. The necessity to continually improve technology in general (e.g., to extract material from large and dynamic web pages, which is not yet satisfactory or feasible with Heritrix).
        2. Legal issues are always at the forefront of interest because the legislation is general and incomplete and allows only for limited access to content harvested from the Web; library experts also noted the necessity of protecting and securing their web collections.
        3. Libraries prefer to develop small collections of works harvested from different websites initially (quality and variety are important for them); they consider developing extensive collections at a later stage in their Web-harvesting operation (quantity is not an immediate goal).
        4. Improving technical infrastructures and tools comes at the forefront of upcoming library research projects, along with expanding collections, better description of web archives metadata and extracting pages on new topics and fields such as social media and live streaming.
        5. The libraries most experienced in web harvesting aim at the extraction of materials from “difficult” websites, such as complex websites and sites with paywalls. Less experienced libraries aim at developing collaboration and co-operation and at awareness-raising programs for their Web-harvesting operation.
  29. Text & Data Mining in Archives, Libraries & Museums: Research on TDM of National Libraries in the EU
      Centre of International and European Economic Law & Jean Monnet Foundation For Europe
      Prof. Maria Kanellopoulou Botti, Department of Archives, Library Science & Museology, Ionian University, Attorney-at-Law & Dr. Marinos Papadopoulos, Attorney-at-Law
