PDF/A – A standard for document archiving
Dipl. Inf. Reinhold Müller-Meernach
Röttenbach
Dr. Uwe Wächter
Roßdorf
No. 2/2006
SEAL Systems
info@sealsystems.com
www.sealsystems.com
DOCUMENT MANAGEMENT
The »leader of the pack«, TIFF/G4, has got competition.
With PDF/A, a new standard for long-term archiving of electronic
documents has now been defined. Checks on existing document
archives show that a large share of the PDF files archived there
do not meet even the minimum requirements of the new standard.
But this is no reason to panic.

Paper archives have been, and are still being, replaced by digital
storage. The number of electronically created documents is growing
constantly. For long-term archiving of these documents, standards are
beneficial if well-defined reproducibility and distribution are to be
supported over a long period of time. The monochrome raster format
TIFF/G4 has been the de facto standard for more than ten years.
For text-heavy documents (such as those from Office applications), the
Portable Document Format, PDF for short, from Adobe has become
established as an application-neutral exchange format. With PDF/A,
there is now a standard that defines a subset of the PDF specification
to make PDF files particularly suitable for archiving.

Fig. 1: Investigations show that almost no PDF files in existing
archives conform to PDF/A. (Fig.: Seal Systems AG, Röttenbach)

ISO 19005-1 is based on the »PDF Reference 1.4« from Adobe. It makes
PDF 1.4 more precise and defines whether each of its properties is
mandatory, recommended, restricted or prohibited. The standard
distinguishes two levels of PDF/A: A (PDF/A-1a) and B (PDF/A-1b).

Level B is important for archiving

Level B deals mainly with preserving the visual appearance over long
periods of time. This requires that all the information needed for
reproduction is contained in the file itself: all text, graphics,
images, fonts and colour information. References to external sources,
such as further files, images, websites or external fonts, violate
the PDF/A standard.

An especially important characteristic of PDF/A is the embedding of
fonts. Only this can ensure that a document can be printed in exactly
the same way after many years, without relying on font definitions on
a computer or printer. PDF can also demonstrate its advantage over
TIFF/G4 through its colour support. However, this only conforms to the
standard if the PDF file can be printed directly on any colour
printer. To this end, device-independent colour definitions are saved
in the file and are converted only at print time.

Protection mechanisms and certain compression methods and encodings
can stand in the way of simple and reliable reproduction. These
techniques are therefore also prohibited in PDF/A-conforming files.

Some applications deliberately overlap images to create particular
effects for the viewer, for example transparency, colour blending and
background stamps. Many PDF-generating processes cannot reproduce
these characteristics one-to-one, so they must be avoided in PDF/A.

Fig. 2: With test and correction procedures for PDF/A, data stocks
can be inspected and modified as required.
(Fig.: Seal Systems AG, Röttenbach)

Secure archiving in line with the standard

Secure archiving in line with the standard means that the saved files
can still be used even if the management system fails. PDF/A-conforming
files must therefore comply with a clause on metadata, so that the
descriptive information travels with the file itself.

The Portable Document Format makes it possible to store several
alternative representations of graphic content at the same time. This
allows an improved display on different screens (PC, handheld or PDA)
or adaptation to the user (German or English). However, because it is
then unclear which representation will be reproduced, this feature is
not permitted by ISO 19005-1.

Beyond the requirements of level B, level A also standardises those
characteristics which define the
properties for content, structure and semantics. This makes it
possible to re-extract parts or information from PDF documents at a
later point in time. Level A also specifies how Unicode fonts must be
handled. Work is already under way on an extension of this standard,
named ISO 19005-2, which is based on the »Adobe PDF Reference 1.6«.

PDF/A level A covers the complete standard

Every international standard is a compromise between the interest
groups concerned and their requirements, which can be contradictory in
parts. Existing procedures and local regulations should be taken into
account. On the other hand, new technical possibilities should not be
ruled out either. Specifying every detail to the maximum can make a
standard unusable in operational practice. It must therefore be
checked whether company standards can be defined that take
practicability and compatibility with existing procedures into
account. Such a company standard adopts definitions from the ISO
standard and combines them with comprehensible instructions that all
members of the company can follow.

Define minimum standards

The past has shown that even individual industries can agree on a
standardised understanding and procedure.

If a company decides on PDF as a reliable document format for
long-term archiving, the next logical question is: is every PDF
allowed, or must it satisfy certain minimum requirements? The ISO
standard for PDF/A can help in answering this question and in defining
the minimum standards.

Once this question is answered, the next steps can be taken. It must
be clarified which procedures guarantee that these minimum
requirements are met. In addition, it must be decided how to proceed
with existing stock. Finally, it must be specified who is responsible
for inspecting these processes and ensuring compliance.

Fig. 3: Test logs provide users and IT managers with information
about the quality of the data stock.
(Fig.: Seal Systems AG, Röttenbach)

There are by now countless software tools for creating PDF files. The
best known is Acrobat from Adobe. In addition to many conversion
applications from third-party suppliers, a number of applications can
export PDF directly. In the future, this should also be possible for
the Office products from Microsoft. However, investigations show that
some PDFs created in this way do not even meet the standard PDF
specification, and so certainly fall short of the stricter
ISO 19005-1.

Only in a very small number of cases are PDF files created solely
within the company with a verified tool. PDF is an exchange format, so
there is a high probability that considerable data stocks stem from
other, uncheckable sources. Business partners, the internet and emails
are examples of this. For these reasons, it makes more sense for
conformity with the standard to be inspected by the responsible body
within the archiving organisation.

Fig. 4: PDF/A inspections can be integrated into existing Document
Management Systems (DMS) and Product Data Management Systems.
(Fig.: Seal Systems AG, Röttenbach)

Nowadays there are test programs with which PDF files can be
inspected for conformity with configurable ISO and company standards.
The result of an inspection is always a confirmation of
conformity or a rejection. In the latter case, a qualified analysis
should take place so that the creator can be given targeted
instructions.

However, an alternative to rejection can be the automated correction
of a PDF file so that it conforms to the standard. Frequently observed
nonconformities, such as missing font embedding, can then be corrected
with minimal effort. To safeguard processes, the timing of the
conformity inspection is decisive.

The first and best time is definitely the generation process. For
unknown documents or unsecured generation processes, a simple
checking procedure is provided on the desktop. Both methods make it
easier for the parties concerned to carry out an inspection, but do
not force them to do so.

It is therefore recommended that manufacturers and operators of
Document Management Systems (DMS), Enterprise Content Management
Systems (ECM) and archiving solutions provide a suitable interface
through which test methods can be integrated.

Fig. 5: The diagram shows the integration of the PDF/A methods into
the SAP document management system.
(Fig.: Seal Systems AG, Röttenbach)

If all archiving and conversion processes then pass through this
interface, the PDF/A inspection becomes mandatory. For existing PDF
archives, too, a one-off or regular inspection is recommended. A first
run provides information about the quality of the data stock.
Subsequent steps can then be derived from this. One part of the data
can be corrected, another part cannot.

If the sources are known, it may be possible to have a new, conforming
version supplied. Initial experience with reference deliveries from
industrial customers has shown that almost no PDF file met the
PDF/A-1b definition.

The most frequent errors are (in this order) missing metadata, missing
font embedding, colour management and protection mechanisms. However,
such weaknesses can be corrected automatically with suitable tools.

PDF/A – an archiving format with a future

The Portable Document Format is becoming more powerful and extensive
with every new version. 3D visualisations, form processing, digital
signatures, change management and pre-print inspection are only part
of the PDF application spectrum. Its use as a largely simple exchange
format suggests its use as an archiving format as well. Here the
technical requirements are lower, but the legal ones are higher.

Fig. 6: The PDF/A data format is a standard with which both the risks
and, above all, the expense of long-term archiving can be minimised.

With PDF/A, a standard has now been adopted with which the risks and
future expense of long-term archiving can be minimised. There are
tools to generate, inspect and correct PDF/A files. As a result, the
new standard will rapidly become established as a practical
alternative.
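
As a rough illustration of the inspection step discussed in this article (confirm, correct or reject), the following Python sketch runs a quick pre-check on the raw bytes of a PDF file. It is a heuristic under stated assumptions, not a PDF/A validator. The names `quick_inspect`, `Inspection` and `summarise`, the finding labels and the rule for what counts as correctable are all invented for this example. The sketch only looks for an `/Encrypt` entry and for the PDF/A identification (`pdfaid:part`) in the XMP metadata; it does not examine font embedding or colour management at all.

```python
import re
from collections import Counter
from dataclasses import dataclass, field

# Matches the PDF/A identification in XMP, written either as an element
# (<pdfaid:part>1</pdfaid:part>) or as an attribute (pdfaid:part="1").
PDFAID_PART = re.compile(rb'''pdfaid:part\s*(?:>|=\s*["'])\s*(\d)''')

# Assumption for this sketch: which findings an automated tool could repair.
CORRECTABLE = {"missing_pdfa_metadata"}


@dataclass
class Inspection:
    """Result of one quick pre-check (hypothetical structure)."""
    findings: list = field(default_factory=list)

    @property
    def verdict(self) -> str:
        if not self.findings:
            return "accept"    # nothing found; a full validator should still confirm
        if all(f in CORRECTABLE for f in self.findings):
            return "correct"   # candidate for automated correction
        return "reject"        # return to the creator with the findings


def quick_inspect(data: bytes) -> Inspection:
    """Heuristic byte scan; false positives and negatives are possible."""
    findings = []
    if b"/Encrypt" in data:                 # protection mechanism present
        findings.append("encrypted")
    if PDFAID_PART.search(data) is None:    # no PDF/A identification metadata
        findings.append("missing_pdfa_metadata")
    return Inspection(findings)


def summarise(inspections) -> Counter:
    """Tally verdicts, e.g. for a first run over an existing archive."""
    return Counter(i.verdict for i in inspections)
```

For a first run over an existing archive, one could pass the bytes of every file to `quick_inspect` and feed the results to `summarise` to get a coarse picture of the data stock. Any file that passes this pre-check would still need a full conformity test with a dedicated PDF/A tool.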