The HAL open archive: implementing a shared policy for open access to scientific publications.
Christine Berthaud
Director of the Centre pour la Communication Scientifique Directe (CCSD),
CNRS, France.
Abstract:
The HAL open archive was created in 2001 by the Centre pour la Communication Scientifique Directe (CCSD), on the initiative of CNRS physicists and on the model of arXiv, and it has evolved steadily over these 12 years.
HAL is an inter-institutional, multidisciplinary open archive: it collects deposits from all higher education and research institutions and in all scientific disciplines. In April 2013, the memorandum of understanding signed in 2006 between the universities and many research organizations was renewed within the framework of the Very Large Research Infrastructure (TGIR) named Bibliothèque Scientifique Numérique (BSN), so that HAL effectively becomes a common platform with high international visibility. The signatory institutions commit to feeding HAL according to defined procedures. This policy of openness has led to changes in the platform and in the procedures that govern it. Greater awareness of the possibilities created by digital technology and open access also makes it possible to design new models between the green road and the gold road, notably the epi-journal model offered by the Episciences.org platform. This talk presents how a common archive can be shared by all while taking into account the specific needs of each.
1. Open Archive (HAL) :
implementation of a policy for open archives
Christine.Berthaud@ccsd.cnrs.fr
2. Open access
• Economic factor
• Subscription prices rose by 300% between 1975 and 1995
• Digital technology also brings in new intermediaries (search
engines)
• Technical factor
• Internet: FTP servers, electronic mail, the Web and its associated
services (concept of interoperability)
• New techniques for scientific communication
• Socio-behavioral factor
• Emergence of new practices in scientific and technical information (STI)
and a need, expressed by researchers, computer specialists and
STI professionals, to share resources
ISD study day (Journée d'étude ISD), Tunis, 10 April 2013 - Christine.Berthaud@ccsd.cnrs.fr
3. The prehistory
• The precursors of open-access practices:
• From the 1980s, researchers deposited their articles on FTP servers, then
on Web sites (computer scientists at Bell Labs, physicists,
astrophysicists).
• Paul Ginsparg creates a collective Web site
• First self-archiving server, 1991
http://arxiv.org/
arXiv has mirror sites all over the world
4. Now
• ArXiv, the example in Physics and Mathematics for 20 years
• An initiative of Paul Ginsparg at Los Alamos
• 4 000 manuscripts deposited each month
• 675 000 manuscripts deposited to date
• 300 000 daily searches
• Mirror sites all over the world
Australia, Brazil, China, Germany, India, Israel
Italy, Japan, Russia, South Africa, France
South Korea, Spain, Taiwan, England, and so on
5. Declarations
• Berlin Declaration (October 2003)
• Major European scientific institutions affirm their commitment
to open access (archives and journals).
• In France, the CNRS, INSERM, INRIA and Institut Pasteur have signed ...
• 25 Nobel laureates support open access (open letter to Congress,
September 2004)
• 2-4 December 2009: the Berlin 7 conference took place in Paris.
6. Definition
• An open archive (OA) is a data repository (server) whose content
(scientific and technical documents) is freely available online on
the Web and which complies with OAI-PMH (Open Archives Initiative
Protocol for Metadata Harvesting) ...
• Self-archiving means depositing documents on this server.
• And…
• Thanks to this common protocol, open archives can be harvested by a
harvester such as OAIster, which becomes a meta-archive; documents can be
seen everywhere in the world, whatever their location.
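To make the harvesting idea concrete, here is a minimal Python sketch of what a harvester does with OAI-PMH. The `ListRecords` verb and the `oai_dc` metadata prefix come from the protocol; the base URL `https://example.org/oai`, the sample response and the function names are assumptions made for illustration only, and nothing here describes HAL's or OAIster's actual implementation.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL (base_url is hypothetical)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hand-written sample of the kind of XML a repository returns (illustrative only).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:0001</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample preprint</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Extract identifier, title and creators from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(f"{{{OAI_NS}}}record"):
        records.append({
            "identifier": rec.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier"),
            "title": rec.findtext(f".//{{{DC_NS}}}title"),
            "creators": [c.text for c in rec.iter(f"{{{DC_NS}}}creator")],
        })
    return records
```

A real harvester would fetch the URL over HTTP and follow resumption tokens for large result sets; both are left out to keep the sketch self-contained.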
7. Electronic ID and citability
• A "document" in an open archive is:
• One or several versions of the same article, each containing
• The descriptive record (metadata)
• Possibly including bibliographical references
• one or more file formats of the same text (PDF, TeX, XML, DOC, etc.)
• possibly additional files:
images, sounds, videos, posters, "ppt" presentations, etc.
• The persistent, citable identifier references a "container" that
houses all components of the scientific paper
• A simplified URL always takes the reader to the latest version
of the article and gives access to the version history
• A stable URL does not depend on the physical storage location of the document
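The "container" idea can be sketched as a small data structure: one persistent identifier, several versions, and a resolver that returns the latest version unless a specific one is requested. The identifier format (`doc-00000001`, with an optional `v2` suffix) and the class and method names are hypothetical, chosen only for this illustration.

```python
import re
from dataclasses import dataclass, field

# Hypothetical reference syntax: "<prefix>-<digits>" with an optional "v<number>" suffix.
REF_PATTERN = re.compile(r"^(?P<id>[a-z]+-\d+)(?:v(?P<n>\d+))?$")


@dataclass
class Container:
    """One citable identifier that houses every version of a paper."""
    identifier: str
    versions: list = field(default_factory=list)  # each item: list of file names

    def add_version(self, files):
        """Append a new version and return its 1-based version number."""
        self.versions.append(list(files))
        return len(self.versions)

    def resolve(self, reference):
        """Return (version number, files); no suffix means the latest version."""
        match = REF_PATTERN.match(reference)
        if not match or match.group("id") != self.identifier:
            raise LookupError(f"unknown reference: {reference}")
        if not self.versions:
            raise LookupError("no version deposited yet")
        number = int(match.group("n")) if match.group("n") else len(self.versions)
        if not 1 <= number <= len(self.versions):
            raise LookupError(f"no such version: {number}")
        return number, self.versions[number - 1]
```

Because the reference never encodes a file path, the files can move between storage systems without breaking citations, which is the point of the stable URL.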
8. In France
• Creation of the Center for Direct Scientific Communication, 2000
(Centre pour la Communication Scientifique Directe : CCSD)
• HAL development, 2001, and TEL for theses
• Official announcement of a common OA policy, focussing on the
creation of institutional repositories, March 2005
• Open Archives Steering Committee, March 2006
• 1st Inter-institution agreements
• CNRS, CPU, CGE, INRIA, INRA, INSERM, IRD, CIRAD, CEMAGREF,
PASTEUR
9. Context
• May 2008: Salençon report:
• political leadership of STI shared by the major players in higher
education and research
• offer services to all communities
• develop new models and an economic balance between public and private
publishing
• 2008: the BSN is included in the roadmap of
"very large research infrastructures" (TGIR)
10. First steps
• 2010: creation of a strategic steering committee for digital
acquisitions
• Composition:
• Conference of University Presidents (CPU)
• Conférence des Grandes Ecoles (CGE)
• Major research institutions
• Ministry of Culture
• Reference text, setting out the objectives of coordination:
• Curb spending
• Quality of provision …
11. Current structure
• Strategic steering committee
• 9 technical groups
• 1- Acquisitions of current and retrospective collections
• 2- Access arrangements and hosting
• 3- Cataloging and metadata
• 4- Open access
• 5- Digitization of scientific heritage
• 6- Digital preservation
• 7- Digital publishing
• 8- Interlibrary loans
• 9- Training
13. National MoU
• Partnership agreement in favor of open archives and the shared
HAL platform
• signed by the CPU (Conférence des Présidents
d’Universités), the CGE (Conférence des Grandes
Ecoles) and 23 institutions:
• AMUE, ANDRA, ANR, BNF, BRGM, CDEFI, CEA, CEE, CIRAD,
CNRS, CSTB, IFPEN, IFSTTAR, INED, INERIS, INRA, INRIA, INSERM,
INVS, IRD, IRSN, IRSTEA and Institut Pasteur.
14. HAL, The CCSD’s Mission
• An initiative for multidisciplinary scientific archives
• An international approach
• Joining the open archives movement (no restriction to building a
national or institutional archive)
• An essentially researcher-centered approach:
• Archives built directly by researchers, with the primary objective
of creating a scientific tool that provides access to the full text of
a document
• An « indirect » institutional tool through metadata harvesting
(association author -> laboratories -> institutions)
• A long-term archival mission
15. HAL objectives
• Full-text oriented multidisciplinary archive
• All kinds of scientific publications
• Researcher-centered approach
• Principle of self-archiving (with possible
editorial assistance by professionals)
• Customized environments (flexible metadata
structure, community-specific interfaces,
building of ‘collections’, alert systems)
• Interoperability
• Open technologies (Linux, Apache, MySQL,
PHP,…)
• Long-term archiving
16. A reminder : the Berlin Declaration
• …defines open access contributions as
including :
• ‘original scientific research results, raw data
and metadata, source materials, digital
representations of pictorial and graphical
materials and scholarly multimedia material’
17. Quality
• Quality of information
• Institutional recognition and certification
• Editorial assistance by librarians (metadata check and
improvement)
• Value-creation tools (overlay journals, creation of
‘collections’ via stamps,…)
• Required scientific level: “any article that, if submitted
to a peer-reviewed journal, would be sent to a reviewer”
• Broad coverage
• Quality of infrastructure provision
• Archiving and long-term preservation environments
(management of technological changes)
• Large scale dissemination
18. Services
• To the researchers
• Quality, accessibility, long term archives, reporting
aid (annual assessment, lab assessment), legal
support, single input and multiple output formats,
export facilities
• To the institution
• Quality, broad coverage of lab production, enhanced
assessment, branding, administrative and prospective
tools
• To the research (and taxpayer) community
• Quality, large scale accessibility of the French
(multidisciplinary) research output
19. The HAL Open Archive
• Entirely developed by the CCSD
• Open Technologies
• LAMP (Linux, Apache, MySQL, PHP)
• Secured environment (hosted by the IN2P3’s Computer Center)
• Partnership with INRIA, CINES
20. Interoperability, interconnections
• OAI-PMH
• Multiple formats, including OAI_DC (unqualified Dublin Core)
• REDIF
• For harvesting by RePEc (economics)
• RSS feeds
• Connections
• arXiv in Physics and Mathematics (Biology): transfers in one
direction only, from HAL to arXiv
• PubMed Central (HAL-INSERM)
• Since February 2006, record import feature from PubMed
Central
• Transfer to PubMed Central
• RePEc
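As an illustration of the OAI_DC format mentioned above, the following Python sketch serializes a simple record as unqualified Dublin Core. The two namespace URIs are the standard ones; the function name `to_oai_dc` and the sample field values are assumptions for this example and say nothing about how HAL generates its own exports.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Ask ElementTree to emit readable prefixes instead of ns0, ns1, ...
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def to_oai_dc(record):
    """Serialize {element name: value or list of values} as an oai_dc:dc element."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, values in record.items():
        if isinstance(values, str):
            values = [values]  # a single value is treated as a one-item list
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")
```

Calling `to_oai_dc({"title": "A sample preprint", "creator": ["Doe, Jane", "Roe, Richard"]})` yields one `dc:title` element and two `dc:creator` elements; repeating an element is how unqualified Dublin Core expresses multiple values.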
21. Imports/exports
• Imports
• XML files, metadata and full-text (Web services some time in
2006)
• Exports
• Lists according to search criteria, in all common file
formats
• Researcher’s « Home page »
• Institutional exports (CRAC, Labintel, etc.)
• Web services
23. Deposit portals
[Diagram. Portal categories: domain-dependent, generic, institutional, typological. Portal labels: UNIV, INRIA, HAL, TEL, HAL-SHS, INRA, others. Central HAL database holding bibliographic records and full text, with common, discipline-specific and institutional metadata. Imports: XML, WS. Exports: OAI-PMH, REDIF, RSS, etc. Connections: arXiv, PubMed Central (2006).]
24. [Diagram. Central HAL database. Imports: XML, WS. Exports: OAI-PMH, REDIF, RSS, etc. Customized collections, extractions, « stamps ».]
26. Deposit
• A streamlined process
• 4 steps: metadata, authors, files, recap
1. Metadata
2. Author(s), laboratory affiliation(s), institution(s)
3. Document upload
4. Recap, check and deposit
• Go back to a previous step if needed
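The four-step deposit can be pictured as a small linear workflow with a "back" action. The Python sketch below is only an illustration under that assumption: the step names mirror the slide, while the class `DepositForm` and its methods are hypothetical and do not describe HAL's actual code.

```python
STEPS = ["metadata", "authors", "files", "recap"]


class DepositForm:
    """Linear deposit workflow: fill three steps, review at recap, then deposit."""

    def __init__(self):
        self.index = 0          # position in STEPS
        self.answers = {}       # data entered at each step
        self.deposited = False

    @property
    def current_step(self):
        return STEPS[self.index]

    def fill(self, value):
        """Store the data for the current step and move to the next one."""
        if self.current_step == "recap":
            raise RuntimeError("nothing to fill at recap; call confirm()")
        if not value:
            raise ValueError(f"step '{self.current_step}' cannot be empty")
        self.answers[self.current_step] = value
        self.index += 1

    def back(self):
        """Return to the previous step; earlier answers are kept until overwritten."""
        self.index = max(0, self.index - 1)

    def confirm(self):
        """Final check at the recap step; returns a copy of what is deposited."""
        if self.current_step != "recap":
            raise RuntimeError("deposit is only possible from the recap step")
        self.deposited = True
        return dict(self.answers)
```

Keeping the recap as a separate, read-only step means the depositor sees everything once before the deposit becomes final.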
27. Depositing in HAL
Deposit conditions
• Streamlined identification
• Self-validated account (institutional authentication possible)
• Contributors: authors, librarians, scientific publishers, etc.
• Required scientific level
• « Any article that, if submitted to a peer-reviewed journal, would be sent to a
reviewer »
• Checks before going online
• Technical check
• Cursory scientific validation by field or submission portal
• Open document format
• Mandatory display format (PDF, PS)
• Source files recommended
• Deposits cannot be withdrawn
• Ability to deposit new versions
• All versions are accessible to any Internet user
28. HAL
• Uses
• Preprints, Postprints, bibliographic records
• Types of bibliographic records
• Articles in peer-reviewed journals
• Invited conference papers
• Peer-reviewed proceedings
• Articles in non peer-reviewed journals
• Conference papers
• Seminars, workshops
• Book chapters
• Books
• Patents
• Theses and dissertations
29. Statistics
• 220,449 scientific articles available in full text on HAL, including:
• 44,136 documents in the humanities and social sciences (SHS)
• 34,071 theses on TEL
• More than 600,000 authors referenced
• 2,800 deposits per month
• In 2012: 2,763 transfers to arXiv, 1,736 to RePEc and 465 to PubMed
• Harvested by:
RePEc, Google Scholar, OAIster ...
32. HAL, or Why Centralize
• Ensure access to the full-text
• Maintain a consistent level of scientific quality
• Provide enhanced international visibility
• Supply permanent URLs
• Enable overall indexing of the full-text included in the database
• Timestamp the deposits
• Give an institutional viewpoint
• Interconnect with other important databases in the world
• Automatically enrich the institutional directories
• Limit repeated record entry
• Centralize readers' alerts
• Manage long-term archiving
• And … limit efforts to one single specialized unit
33. The variants of HAL
• Thematic portals
• HAL-SHS, @rchiveSIC, HAL-SDE…
• Generic portals
• HAL, TEL, CEL…
• 83 institutional portals developed specifically in HAL:
• 32 universities
• 28 grandes écoles
• 23 research institutions…
• 2,563 collections
For laboratories, research projects, conferences, journals ...
34. In conclusion, the future
• The Episciences.org project is part of the open access
movement.
• The main idea is to provide a technical platform for peer review;
its purpose is to promote the emergence of epijournals (overlay
journals), namely open access electronic journals that take their
content from preprints deposited in open archives such as arXiv or
HAL and not published elsewhere.
• No transfer of rights is signed: authors retain the rights to
their articles.
• In preparation: the "Episciences-Maths" project, supported by the
Institut Fourier, CNRS UMR5582 and Université Joseph Fourier
(Grenoble)