An Introduction to
Document
Scanning
Business Document Scanning 101:
From the Data Capture Perspective
So you
have a lot
of this?
And you’ve decided
this is the answer.
So you need a crash
course in scanning
Lessons:
Lesson 1: Simplex or Duplex
Lesson 2: Resolution
Lesson 3: Color Depth
Lesson 4: File Formats
Lesson 5: Indexing
Lesson 6: Document Prep and Estimating Volumes
Homework: Learn More About Data Capture and Document Management
Lesson 1: Simplex or Duplex
Are the documents single or double-sided?
This may seem obvious but…
You may not want documents such as
purchase invoices scanned in duplex where
the back of the document only contains terms
and conditions.
On the other hand, if the documents have
high legal importance you may want every
conceivable item of information captured
such as small signatures or notes on the back.
Duplex scanning requires
more scanning
time/processing and
results in larger files.
And you don’t have to be
a genius to know that is
more costly.
Lesson 2: Resolution
So what is resolution and why does it matter?
What is Resolution?
Resolution is the density at which the image was
sampled, expressed as dots per inch (dpi) or, less
frequently, pixels per inch (ppi). A pixel is a
“picture element,” one of the samples that make up
the image.
Implications of Resolution
This graphic contains
two images, a “0” as a
grayscale image and an
“x” as black and white.
Implications of Resolution
• If we halved the size of the grid squares horizontally
and vertically (doubling the resolution), the characters
would appear smoother and the image quality would
improve. The inverse would be true if we doubled the
size of the squares.
• If we kept the squares the same size but significantly
reduced the size of the characters, the resolution would
be insufficient to capture them.
Implications of Resolution
• The higher the resolution, the better the image
quality.
• For small characters, increase the resolution to
capture them effectively
So:
And, the higher the resolution,
the slower the scan and the
larger the file.
Which means higher scanning
and file storage costs, Einstein.
Typical Scanning Resolutions
• Web graphic – 96 dpi
• Standard archive document – 200 dpi
• Document required for optical character
recognition (OCR) – 300 dpi
• Plans/drawings for vectorization – 400 dpi
• Documents required for historical archiving –
600 dpi
Resolution is generally determined by intended
use.
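As a rough illustration of what these resolutions mean in practice, the sketch below converts a page size in inches into pixel dimensions. The dictionary keys and the function name are made up for this example, and the US Letter page size is an assumption.

```python
# Typical scanning resolutions from the list above, keyed by intended use.
# The keys are illustrative labels chosen for this sketch, not standard terms.
TYPICAL_DPI = {
    "web_graphic": 96,
    "standard_archive": 200,
    "ocr": 300,
    "vectorization": 400,
    "historical_archive": 600,
}


def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Return the (width, height) in pixels of a page scanned at the given dpi."""
    return round(width_in * dpi), round(height_in * dpi)


# Example: a US Letter page (8.5 x 11 inches) at each typical resolution.
for use, dpi in TYPICAL_DPI.items():
    w, h = pixel_dimensions(8.5, 11, dpi)
    print(f"{use:>20}: {dpi} dpi -> {w} x {h} pixels")
```

Doubling the dpi doubles both dimensions, so the number of pixels (and the raw data to store) grows by a factor of four.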
Lesson 3: Color Depth
Understanding Black and White
Documents scanned in black and white are
always scanned as grayscale within the
scanner. The scanner then applies a process
known as thresholding to the image to produce
the black and white image.
Thresholding simply determines when a pixel
should be black or white.
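A minimal sketch of thresholding, assuming 8-bit grayscale values (0 is black, 255 is white) and a fixed cutoff of 128. Both the cutoff and the function name are assumptions for illustration. Scanners and capture software may use more sophisticated, adaptive methods.

```python
def threshold(gray_rows, cutoff=128):
    """Convert grayscale rows (0 = black, 255 = white) to 1-bit rows.

    Pixels darker than the cutoff become 1 (black); the rest become 0 (white).
    """
    return [[1 if value < cutoff else 0 for value in row] for row in gray_rows]


# A tiny 4 x 3 grayscale "image" with a dark stroke on a light background.
gray = [
    [250, 240, 30, 245],
    [245, 20, 25, 250],
    [240, 130, 120, 235],
]
bitonal = threshold(gray)
for row in bitonal:
    print("".join("#" if bit else "." for bit in row))
```

Note how the two mid-gray pixels in the last row (130 and 120) end up on opposite sides of the cutoff. This is why the choice of threshold matters for faint text.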
Understanding Grayscale
Grayscale is used when the image contains
color or grayscale data and the tone of the
image needs to be retained, e.g. photographs or
shaded graphics.
Understanding Color
Color is obviously used when the image
contains color data. Some users wish to retain
only important color information, for example
land boundaries or graphical data, and not
letterhead logos, highlighter marks, etc.
File Storage Requirements (bits per pixel)
• Color: 24
• Grayscale: 8
• Black and white: 1
So the uncompressed storage requirement for a grayscale
image is 8 times that of a black and white image, and the
requirement for a color image is 24 times that of black and white.
And remember, Einstein, larger files equal higher costs.
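The arithmetic behind these ratios can be sketched as follows. This is an estimate of raw, uncompressed image data only. It ignores file headers, row padding and compression, and the page size and 300 dpi setting are assumptions for the example.

```python
BITS_PER_PIXEL = {"black_and_white": 1, "grayscale": 8, "color": 24}


def uncompressed_bytes(width_in, height_in, dpi, mode):
    """Raw image size in bytes, before any compression is applied."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * BITS_PER_PIXEL[mode] // 8


# Compare the three color depths for a US Letter page scanned at 300 dpi.
bw = uncompressed_bytes(8.5, 11, 300, "black_and_white")
for mode in BITS_PER_PIXEL:
    size = uncompressed_bytes(8.5, 11, 300, mode)
    print(f"{mode:>16}: {size / 1_000_000:.1f} MB ({size / bw:.0f}x black and white)")
```

Compression changes the real file sizes considerably, which is where the choice of file format comes in.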
Lesson 4: File Formats
TIFF
JPEG
PDF
For an in-depth look visit: PDF v. TIFF
Understanding TIFF*
*Tagged Image File Format
• Well established format
• Most often used for black and white documents
• Supports multiple pages
• Interpreted correctly by most applications, with some
caution needed for certain color implementations
• “Group 4” refers to the compression method used on
black and white images. It is a “lossless” compression:
no original data is lost in compression/decompression.
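To show what “lossless” means, here is a simple run-length encoder for one row of black and white pixels. This is not Group 4, which is a considerably more elaborate scheme. It is only an illustrative sketch of the idea that long runs of identical pixels compress well and can be restored exactly.

```python
def rle_encode(bits):
    """Encode a row of 0/1 pixels as [value, run_length] pairs."""
    runs = []
    for bit in bits:
        if runs and runs[-1][0] == bit:
            runs[-1][1] += 1
        else:
            runs.append([bit, 1])
    return runs


def rle_decode(runs):
    """Rebuild the original row of pixels from [value, run_length] pairs."""
    bits = []
    for value, length in runs:
        bits.extend([value] * length)
    return bits


# One 100-pixel row: mostly white (0) with two short black (1) strokes.
row = [0] * 40 + [1] * 5 + [0] * 30 + [1] * 3 + [0] * 22
encoded = rle_encode(row)
print(encoded)
print(rle_decode(encoded) == row)
```

The 100 pixels collapse into five runs, and decoding gives back exactly the same row. A lossy format such as JPEG makes no such guarantee.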
Understanding PDF*
*Portable Document Format
• Well established format created by Adobe
• Supports color, grayscale, and black and white
• Supports multiple pages
• Scanned pages are generally stored using Group 4 and
JPEG compression, although other methods are supported too.
• Used when more advanced features are needed
within the file such as embedded Optical Character
Recognition (OCR), hyperlinking, digital signing
and other security features.
Understanding PDF Variations
Searchable PDF:
Many scanning applications can create searchable
PDF files. Here, the application applies OCR technology
to make the file text searchable. Your application
may label this as “make searchable”, “apply OCR”,
“text-under-image” or “searchable PDF.” If selected,
your file will be text searchable or text selectable
within the Acrobat viewer and many other programs
that search PDF files.
Understanding PDF Variations
PDF/A:
PDF/A is an ISO standard for digital preservation or
archiving of electronic documents.
It differs from standard PDF by omitting features not
suited to long-term archiving, such as font
linking (fonts must be embedded instead).
Its use is growing in international government and industry
segments, including legal systems, libraries,
newspapers, and regulated industries.
Understanding JPEG*
*Joint Photographic Experts Group
• Well established format
• Most often used for photographs and graphics
• Supports single page only
• A “lossy” compression format; that is, some of the
data is lost during compression. However, it provides
good compression ratios for grayscale and color
images.
Compression and File Size
*Comparison courtesy of Wikipedia
OMG,
right?
The bottom line: experiment with your
images and file size. A middle quality
scan may meet your needs and save
tremendous file space.
Lesson 5: Indexing
For an in-depth look visit: What is Document Indexing?
What is Indexing?
Document indexing (sometimes referred to as
adding metadata) enables users to quickly and
efficiently locate their documents, either
through a folder structure, a database or an
electronic document management system.
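As a hedged illustration of the idea, the sketch below keeps a few index fields alongside each scanned file and looks documents up by those fields. The class names, field names and file paths are all hypothetical. A real document management system would use a database rather than an in-memory list.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedDocument:
    """A scanned file together with the index fields (metadata) describing it."""
    path: str
    fields: dict = field(default_factory=dict)


class DocumentIndex:
    """A tiny in-memory index; stands in for a database or DMS."""

    def __init__(self):
        self._documents = []

    def add(self, path, **fields):
        self._documents.append(IndexedDocument(path, fields))

    def find(self, **criteria):
        """Return paths of documents whose index fields match every criterion."""
        return [
            doc.path
            for doc in self._documents
            if all(doc.fields.get(k) == v for k, v in criteria.items())
        ]


index = DocumentIndex()
index.add("scans/0001.pdf", doc_type="invoice", customer="Acme", year=2013)
index.add("scans/0002.pdf", doc_type="invoice", customer="Globex", year=2014)
index.add("scans/0003.pdf", doc_type="contract", customer="Acme", year=2014)
print(index.find(customer="Acme"))
print(index.find(doc_type="invoice", year=2014))
```

The choice of fields (document type, customer, year) is the indexing scheme. Deciding on it up front is the design step the following slides warn about.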
Avoid a disaster
Great care should be taken to design an efficient indexing
scheme. If the design is not devised correctly at the outset,
trying to rectify it later can be both difficult and costly.
Sometimes it makes sense to replicate the current manual
method for document location to create a familiar, but faster
system.
Don’t worry, there is automation
Technologies such as
• Barcode recognition
• OCR
• Batch processing
• Data Mining, Text Mining
can save time and money by automating indexing and
more.
Using Barcodes for Indexing
Intelligent data
capture software
can extract data
from barcodes to
create and send
index information
to a document
management
system.
For an in-depth look at barcodes in data capture
visit: What Can Barcodes Do For Me?
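Once capture software has decoded a barcode, turning its text into index fields can be as simple as the sketch below. The “KEY=VALUE;KEY=VALUE” payload layout and the function name are assumptions made up for this example. Decoding the barcode image itself is left to the capture product.

```python
def parse_barcode_payload(payload):
    """Split a decoded barcode string such as 'TYPE=INVOICE;ID=4711' into index fields."""
    fields = {}
    for part in payload.split(";"):
        if not part.strip():
            continue  # tolerate a trailing or doubled separator
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"Malformed barcode field: {part!r}")
        fields[key.strip().lower()] = value.strip()
    return fields


print(parse_barcode_payload("TYPE=INVOICE;ID=4711;DATE=2014-01-31"))
```

The resulting dictionary can then be passed on as index information to whatever system stores the document.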
With OCR, make your image-based file fully
text searchable or extract data from a zone for
indexing.
Using OCR for Indexing
With zonal OCR, document
areas are identified for
automatic OCR capture.
Additionally, drag-and-drop
OCR allows an operator to
highlight document text
which is automatically OCR'd
and dropped into index
fields.
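A zone for zonal OCR is essentially a named rectangle on the page. The sketch below, with a hypothetical Zone class and made-up coordinates, converts a zone measured in inches into a pixel box for a given scan resolution, ready to be cropped and passed to an OCR engine of your choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """A named rectangular region of the page, measured in inches from the top-left."""
    name: str
    left: float
    top: float
    width: float
    height: float

    def pixel_box(self, dpi):
        """Return (left, top, right, bottom) in pixels for a scan at the given dpi."""
        return (
            round(self.left * dpi),
            round(self.top * dpi),
            round((self.left + self.width) * dpi),
            round((self.top + self.height) * dpi),
        )


# Hypothetical zone: an invoice number printed near the top-right corner.
invoice_number = Zone("invoice_number", left=6.0, top=1.0, width=2.0, height=0.5)
print(invoice_number.pixel_box(300))
```

Defining zones in inches rather than pixels keeps the same template usable when the scan resolution changes.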
TIPS for OCR
• Scan at 300 dpi for greater accuracy
and to ensure that small text is captured.
• Limit the use of color on documents.
• Pre-process the image with image
enhancement software (available in
many data capture products; learn
more).
What is Batch Processing?
Intelligent data capture solutions often use batch processing that
lets you process a whole folder of documents at a time. Some
products can “watch folders” and process files as they are
scanned into the folder.
For an in-depth look visit: What is Batch Document Processing?
Processing can include indexing, file routing, file splitting,
and cleaning/enhancing the scans. Learn more.
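Folder watching can be approximated with simple polling, as in the sketch below. This is a generic illustration, not how any particular product works. The function names, the polling interval and the max_polls parameter (added so the loop can be stopped) are all assumptions.

```python
import os
import time


def new_files(folder, seen):
    """Return files in the folder not yet processed, sorted by name."""
    current = {
        name for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name))
    }
    fresh = sorted(current - seen)
    seen.update(fresh)
    return fresh


def watch(folder, process, interval_seconds=5.0, max_polls=None):
    """Poll the folder and call process(path) once for each newly arrived file."""
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for name in new_files(folder, seen):
            process(os.path.join(folder, name))
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval_seconds)
```

A call such as watch("incoming", print) would print each new file path as it appears, where the folder name is a placeholder. A production version would also need to make sure a file has finished being written before processing it.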
Lesson 6: Document Prep and Estimating Volumes
Preparation, quality control and indexing are the
most time consuming elements of any scanning
job and usually the most costly.
Typically, a good operator can prepare 750-1000
documents per hour; however, a number of
factors may drop throughput to between 300 and 500.
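These throughput figures translate directly into labor estimates. A minimal sketch, assuming a hypothetical job of 30,000 documents:

```python
def prep_hours(documents, docs_per_hour):
    """Hours of operator time needed to prepare the given number of documents."""
    if docs_per_hour <= 0:
        raise ValueError("docs_per_hour must be positive")
    return documents / docs_per_hour


# The job size is made up; the rates come from the range quoted above.
documents = 30_000
for label, rate in [("best case", 1000), ("typical", 750), ("difficult", 300)]:
    print(f"{label:>10}: {prep_hours(documents, rate):.0f} hours")
```

The gap between the best and worst case is more than threefold, which is why the prep factors listed next deserve attention when quoting a job.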
Factors that Influence Document Prep
• Odd size document types: sales receipts, photos,
plans/drawings
• Bindings: three ring, spiral, glue, folder
• Fasteners: staples, paper clips, binder clips,
rubber bands
• Attachments: Post-its, tabs
Estimating Volumes and Storage

Type                 Simplex (avg #s)   Duplex (avg #s)
Paper Folders        30 to 100          60 to 200
Ring Binder          200                400
Lever arch folder    500                1000
Transfer Cases       500                1000
Bankers Boxes        500                1000
Archive Boxes        2500               5000
Filing Cabinets      3000/drawer        6000/drawer
Learn more about estimating volumes
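The averages above can be turned into a quick volume estimate. In this sketch the container names are illustrative labels, the paper folder figure uses the midpoint of the 30 to 100 range (an assumption), and the inventory is made up.

```python
# Average simplex counts per container, as read from the figures above.
# Paper folders use the midpoint of the 30 to 100 range (an assumption).
SIMPLEX_PAGES = {
    "paper_folder": 65,
    "ring_binder": 200,
    "lever_arch_folder": 500,
    "transfer_case": 500,
    "bankers_box": 500,
    "archive_box": 2500,
    "filing_cabinet_drawer": 3000,
}


def estimate_images(inventory, duplex=False):
    """Estimate the number of scanned images for a {container: count} inventory."""
    pages = sum(SIMPLEX_PAGES[kind] * count for kind, count in inventory.items())
    return pages * 2 if duplex else pages


# Hypothetical inventory for a small office.
inventory = {"ring_binder": 12, "archive_box": 4, "filing_cabinet_drawer": 8}
print(estimate_images(inventory))
print(estimate_images(inventory, duplex=True))
```

Combined with a storage estimate per image and a prep rate per hour, this gives a first rough figure for both disk space and labor.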
Homework: Learn More About Data Capture and Document Management
More
Document Management
Determine whether you require a full document
management system or just a simple search and
retrieval system.
Could a simple system serve as a stepping stone
while you evaluate a full document management
system?
More
Learn More
Call us for information on:
How to digitize medical or dental records.
The best way to scan medical or dental records.
Scanning paper records.
Document scanning for medical or dental records.
Going paperless at the medical or dental office.
How to capture medical or dental records efficiently.
Scanning medical or dental records with Fujitsu ScanSnap.
Touchscreen scanning of medical or dental records.
How to improve your medical or dental workflow with document scanning.
Scanning to EMR or scanning to EDR
How to maximize your Fujitsu ScanSnap
Using your ScanSnap for a basic document management system
Using barcodes and the Fujitsu ScanSnap
Scanning with the Fujitsu ScanSnap
Automating workflow with the Fujitsu ScanSnap
Automating document management capture
Scanning into Dentrix
Indexing into Dentrix
Understanding basic Document Scanning
Things your teacher never told you about Document Scanning
An introduction to Document Scanning
Scanning Fundamentals for the average Joe
By DocuFi
Makers of ImageRamp Data Capture Solutions
30 years’ Experience in the Document Imaging
Market
Proven Fujitsu ISV Partner
Find out more at ImageRamp and
www.docufi.com
Image Credits
All images are owned or licensed by DocuFi with acknowledgement given to:
• Pjohnkeane, “Requirements, requirements, requirements”, http://bit.ly/1fcULDf
• Doug Waldron, “Files (85)”, http://bit.ly/1bfciII
• UBC Learning Commons, “Scanner_icon-1024x671”, http://bit.ly/1eewI4P
• Knile Lucy, “you have some sorting to do!”, http://bit.ly/19bSgjF
• Michael 1952, “SJSA Fifth Grade - I Fell in Love With The Teacher”, http://bit.ly/1eevu9A
• Ton Haex, “Einstein show....”, http://bit.ly/LVqeBi
• Loco Steve, “Sunrise under scrutiny”, http://bit.ly/1eevSVv
• Tax Credits, “Coins”, http://bit.ly/1mtQj5j
• j_baer, “Ubuntu Color Wheel”, http://bit.ly/1jARikx
• Marcin Wichary, “Alphabetical”, http://bit.ly/1aILOku
• David Erickson e-strategyblog.com, “Hindenburg Disaster”, http://bit.ly/1jASeFF
• William Warby wwarby, “Gears”, http://bit.ly/1dwtU1S
• Alan Cleaver, “watching”, http://bit.ly/1h1k9k7
• Zoetnet, “overflowing”, http://bit.ly/KHW9Em
• Seattle Municipal Archives, “Comptroller's Office employees, 1960”, http://bit.ly/1eBvLGE
• Seattle Municipal Archives, “City Light worker with office machine, 1954”, http://bit.ly/1eBw3NM
• Patrick Hoesly, “Thank you”, http://bit.ly/17xKErE

 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 

An Introduction to Document Scanning, Understanding Your Requirements

  • 1. An Introduction to Document Scanning. Business Document Scanning 101: From the Data Capture Perspective
  • 2. So you have a lot of this?
  • 3. And you’ve decided this is the answer.
  • 4. So you need a crash course in scanning
  • 5. Lessons: • Lesson 1: Simplex or Duplex • Lesson 2: Resolution • Lesson 3: Color Depth • Lesson 4: File Formats • Lesson 5: Indexing • Lesson 6: Document Prep and Estimating Volumes • Homework: Learn More About Data Capture and Document Management
  • 6. Lesson 1: Simplex or Duplex Are the documents single or double-sided? This may seem obvious but…
  • 7. You may not want documents such as purchase invoices scanned in duplex when the back of the document contains only terms and conditions. On the other hand, if the documents have high legal importance, you may want every conceivable item of information captured, such as small signatures or notes on the back.
  • 8. Duplex scanning requires more scanning time/processing and results in larger files.
  • 9. And you don’t have to be a genius to know that is more costly.
  • 11. So what is resolution and why does it matter?
  • 12. What is Resolution? Resolution is expressed as the number of dots per inch (dpi) or, less frequently, pixels per inch (ppi). A pixel is a “picture element”; pixels make up the image, and the resolution is really the rate at which the image was sampled.
  • 13. Implications of Resolution This graphic contains two images, a “0” as a grayscale image and an “x” as black and white.
  • 14. Implications of Resolution • If we halved the size of the grid squares horizontally and vertically (doubled the resolution), the pixels would appear smoother and produce a better quality image; the inverse would be true if we doubled the size of the squares. • If we kept the squares the same size but reduced the size of the characters significantly, the resolution would be insufficient.
  • 15. Implications of Resolution. So: • The higher the resolution, the better the image quality. • For small characters, increase the resolution to capture them effectively.
  • 17. And, the higher the resolution, the slower the scan and the larger the file. Which means higher scanning and file storage costs, Einstein.
  • 18. Typical Scanning Resolutions • Web graphic – 96 dpi • Standard archive document – 200 dpi • Document required for optical character recognition (OCR) – 300 dpi • Plans/drawings for vectorization – 400 dpi • Documents required for historical archiving – 600 dpi Resolution is generally determined by intended use.
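To see why higher resolution means larger files, here is a minimal sketch that counts the pixels sampled from one side of a page at each of the typical resolutions listed above. The 8.5 x 11 inch page size and the function name are assumptions for illustration; they are not from the slides.

```python
# Illustrative sketch only. The page size (US Letter) is an assumption.
PAGE_WIDTH_IN = 8.5
PAGE_HEIGHT_IN = 11.0

# Typical resolutions by intended use, as listed on the slide.
TYPICAL_DPI = {
    "web graphic": 96,
    "standard archive document": 200,
    "OCR": 300,
    "plans/drawings for vectorization": 400,
    "historical archiving": 600,
}


def pixels_per_page(dpi, width_in=PAGE_WIDTH_IN, height_in=PAGE_HEIGHT_IN):
    """Number of pixels sampled from one side of a page at the given dpi."""
    return round(width_in * dpi) * round(height_in * dpi)


for use, dpi in TYPICAL_DPI.items():
    print(f"{use}: {dpi} dpi -> {pixels_per_page(dpi):,} pixels")
```

Because the pixel count grows with the square of the dpi, going from 300 dpi to 600 dpi quadruples the number of pixels rather than doubling it.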
  • 20. Understanding Black and White: Documents scanned in black and white are always scanned as grayscale within the scanner. The scanner then applies a process known as thresholding to the image to produce the black and white image. Thresholding simply determines when a pixel should be black or white.
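Thresholding can be sketched in a few lines. This example assumes 8-bit grayscale values (0 is black, 255 is white) and a single fixed cutoff of 128; both are assumptions for illustration, and a real scanner may choose its cutoff differently.

```python
def threshold(gray_rows, cutoff=128):
    """Convert 8-bit grayscale pixels (0 = black, 255 = white) to 1-bit.

    Pixels darker than the cutoff become 0 (black); all others become 1 (white).
    The fixed cutoff of 128 is an illustrative assumption.
    """
    return [[0 if value < cutoff else 1 for value in row] for row in gray_rows]


# A tiny 3x3 "image" with a dark diagonal stroke on a light background.
sample = [
    [250, 240, 30],
    [245, 20, 235],
    [10, 230, 225],
]
print(threshold(sample))
```

Each 8-bit value collapses to a single bit, which is why black and white files are so much smaller than grayscale ones.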
  • 21. Understanding Grayscale: Grayscale is used when the image contains color or grayscale data and the tone of the image needs to be retained, e.g. photographs or shaded graphics.
  • 22. Understanding Color: Color is obviously used when the image contains color data. Some users wish to retain only important color information, for example land boundaries or graphical data, and not letterhead logos, highlighter marks, etc.
  • 24. Bits per pixel and file storage requirements: color 24, grayscale 8, black and white 1. So the storage requirement for a grayscale image is 8 times larger than for black and white, and the color requirement is 24 times larger than for black and white. And, remember Einstein, larger files equal higher costs.
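The 1 : 8 : 24 ratio can be checked with a rough calculation of uncompressed image size. This is a sketch under stated assumptions: the page size, the 300 dpi setting, and the function name are illustrative, and real files will be smaller once compression is applied.

```python
# Bits per pixel for each color depth, as given on the slide.
BITS_PER_PIXEL = {"black and white": 1, "grayscale": 8, "color": 24}


def uncompressed_bytes(width_in, height_in, dpi, mode):
    """Raw image size in bytes, before any compression is applied."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * BITS_PER_PIXEL[mode] // 8


# Assumed example: one side of an 8.5 x 11 inch page scanned at 300 dpi.
for mode in BITS_PER_PIXEL:
    size = uncompressed_bytes(8.5, 11, 300, mode)
    print(f"{mode}: {size / 1_000_000:.1f} MB uncompressed")
```

The absolute numbers depend on the assumed page size and resolution, but the ratio between the three modes does not.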
  • 25. Lesson 4: File Formats: TIFF, JPEG, PDF. For an in-depth look visit: PDF v. TIFF
  • 26. Understanding TIFF (Tagged Image File Format) • Well established format • Most often used for black and white documents • Supports multiple pages • Interpreted correctly by most applications, with a caution on certain color implementations • “Group 4” refers to the compression method used on black and white images, a “lossless” compression in which original data is not lost in compression/decompression.
  • 27. Understanding PDF (Portable Document Format) • Well established format by Adobe • Supports color, grayscale, and black and white • Supports multiple pages • Generally stored using Group 4 and JPEG compression, although it supports other compression methods too • Used when more advanced features are needed within the file, such as embedded Optical Character Recognition (OCR), hyperlinking, digital signing and other security features.
  • 28. Understanding PDF Variations: Searchable PDF. Many scanning applications can create searchable PDF files. Here, the scanner applies OCR technology to make the file text searchable. Your application may label this as “make searchable”, “apply OCR”, “text-under-image” or “searchable PDF.” If selected, your file will be text searchable or text selectable within the Acrobat viewer and many other programs that search PDF files.
  • 29. Understanding PDF Variations: PDF/A. PDF/A is an ISO standard for digital preservation or archiving of electronic documents. It differs from standard PDF by omitting features not suited to long-term archiving, such as font linking. Its use is growing in international government and industry segments, including legal systems, libraries, newspapers, and regulated industries.
  • 30. Understanding JPEG (Joint Photographic Experts Group) • Well established format • Most often used for photographs and graphics • Supports single page only • A “lossy” compression format, that is, some of the data is lost during compression; however, it provides good compression ratios for grayscale and color images.
  • 32. Compression and File Size *Comparison courtesy of Wikipedia OMG, right? The bottom line: experiment with your images and file size. A middle quality scan may meet your needs and save tremendous file space.
  • 33. Lesson 5: Indexing For an in-depth look visit: What is Document Indexing?
  • 34. What is Indexing? Document indexing (sometimes referred to as metadata) enables users to quickly and efficiently locate their documents, whether through a folder structure, database or electronic document management system.
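As a rough illustration of the database option, the sketch below stores a few index fields alongside each scanned file and looks documents up by those fields. The table name, field names, file paths and sample records are all hypothetical; a real indexing scheme would use whatever fields your users actually search by.

```python
import sqlite3

# In-memory database used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE document_index ("
    "path TEXT, doc_type TEXT, customer TEXT, doc_date TEXT)"
)
# Hypothetical index records: one row of metadata per scanned file.
conn.executemany(
    "INSERT INTO document_index VALUES (?, ?, ?, ?)",
    [
        ("scans/0001.pdf", "invoice", "Acme", "2014-01-15"),
        ("scans/0002.pdf", "invoice", "Globex", "2014-01-20"),
        ("scans/0003.pdf", "contract", "Acme", "2014-02-03"),
    ],
)


def find_documents(doc_type, customer):
    """Return the file paths whose index fields match the query."""
    rows = conn.execute(
        "SELECT path FROM document_index "
        "WHERE doc_type = ? AND customer = ? ORDER BY path",
        (doc_type, customer),
    )
    return [path for (path,) in rows]


print(find_documents("invoice", "Acme"))
```

The point of the index is that the lookup never has to open the image files themselves; it only consults the metadata.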
  • 36. Avoid a disaster Great care should be taken to design an efficient indexing scheme. If the design is not devised correctly at the outset, trying to rectify it later can be both difficult and costly. Sometimes it makes sense to replicate the current manual method for document location to create a familiar, but faster system.
  • 37. Don’t worry, there is automation. Technologies such as barcode recognition, OCR, batch processing, and data mining/text mining can save time and money by automating indexing and more.
  • 38. Using Barcodes for Indexing: Intelligent data capture software can extract data from barcodes to create and send index information to a document management system. For an in-depth look at barcodes in data capture visit: What Can Barcodes Do For Me?
  • 39. With OCR, make your image-based file fully text searchable or extract data from a zone for indexing.
  • 40. Using OCR for Indexing With zonal OCR, document areas are identified for automatic OCR capture. Additionally, drag-and-drop OCR allows an operator to highlight document text which is automatically OCR'd and dropped into index fields.
  • 41. TIPS for OCR • Scan at 300 dpi for greater accuracy and to ensure that small text is captured. • Limit the use of color on documents. • Pre-process the image with image enhancement software (available in many data capture products).
  • 42. What is Batch Processing? Intelligent data capture solutions often use batch processing, which lets you process a whole folder of documents at a time. Some products can “watch folders” and process files as they are scanned into the folder. For an in-depth look visit: What is Batch Document Processing?
  • 43. What is Batch Processing? Processing can include indexing, file routing, file splitting, and cleaning/enhancing the scans. Learn more.
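The “watch folder” idea can be sketched as a function that is called repeatedly and hands each newly arrived file to a processing step. This is not how any particular product works; the function name, the `handler` callback and the in-memory set of seen file names are assumptions for illustration.

```python
from pathlib import Path


def process_new_files(folder, already_seen, handler):
    """Run handler on each file in folder that has not been processed before.

    already_seen is a set of file names and is updated in place, so calling
    this function again only picks up files that arrived since the last call.
    Returns the names of the files processed on this call.
    """
    processed = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.name not in already_seen:
            handler(path)  # e.g. index, route, split or clean the scan
            already_seen.add(path.name)
            processed.append(path.name)
    return processed
```

Calling `process_new_files` on a timer, with the same `already_seen` set each time, gives the basic watch-folder behaviour: each scan is handled once, shortly after it lands in the folder.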
  • 45. Preparation, quality control and indexing are the most time-consuming elements of any scanning job and usually the most costly.
  • 46. TIPS for OCR: Typically a good operator can prepare 750-1000 documents per hour; however, a number of factors may drop throughput to 300 or 500.
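Those throughput figures translate directly into preparation hours. The sketch below uses a hypothetical job of 30,000 documents; only the per-hour rates come from the slide.

```python
def prep_hours(document_count, docs_per_hour):
    """Hours of operator time needed to prepare a batch of documents."""
    return document_count / docs_per_hour


documents = 30_000  # hypothetical job size, not from the slides

for label, rate in [("fast", 1000), ("good", 750), ("slowed", 300)]:
    print(f"{label} ({rate}/hour): {prep_hours(documents, rate):.0f} hours")
```

The gap between the best and worst case is more than a factor of three, which is why document condition matters so much when estimating a job.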
  • 47. Factors that Influence Document Prep • Odd size/document type: sales receipts, photos, plans/drawings • Bindings: three ring, spiral, glue, folder • Fasteners: staples, paper clips, binder clips, rubber bands • Attachments: Post-its, tabs
  • 48. Estimating Volumes and Storage (average numbers by type, simplex / duplex) • Paper folders: 30 to 100 / 60 to 200 • Ring binder: 200 / 400 • Lever arch folder: 500 / 1000 • Transfer cases: 500 / 1000 • Bankers boxes: 500 / 1000 • Archive boxes: 2500 / 5000 • Filing cabinets: 3000 per drawer / 6000 per drawer. Learn more about estimating volumes
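A volume estimate is just these averages multiplied by what you count on the shelves. The sketch below uses a subset of the simplex figures from the slide and doubles them for duplex, as the slide does; the inventory counts and all names are hypothetical.

```python
# Average simplex counts per container (a subset of the slide's figures).
# Each entry is a (low, high) pair so the paper folder range can be kept.
SIMPLEX_AVERAGES = {
    "paper folder": (30, 100),
    "ring binder": (200, 200),
    "archive box": (2500, 2500),
    "filing cabinet drawer": (3000, 3000),
}


def estimate_volume(inventory, duplex=False):
    """Return a (low, high) estimate for the given container counts."""
    sides = 2 if duplex else 1  # the slide's duplex figures are double simplex
    low = sum(SIMPLEX_AVERAGES[kind][0] * n for kind, n in inventory.items())
    high = sum(SIMPLEX_AVERAGES[kind][1] * n for kind, n in inventory.items())
    return low * sides, high * sides


# Hypothetical inventory for a small records room.
inventory = {"archive box": 12, "ring binder": 40, "paper folder": 100}
print("simplex:", estimate_volume(inventory))
print("duplex:", estimate_volume(inventory, duplex=True))
```

Keeping a low and a high figure makes it obvious how much of the uncertainty comes from loosely filled containers such as paper folders.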
  • 49. Homework: Learn More About Data Capture and Document Management More
  • 50. Document Management: Determine whether you require a full document management system or just a simple search and retrieval system. Can you use the simpler system as a stepping stone while you evaluate a document management system?
  • 52. Call us for information on: • How to digitize medical or dental records. • The best way to scan medical or dental records. • Scanning paper records. • Document scanning for medical or dental records. • Going paperless at the medical or dental office. • How to capture medical or dental records efficiently. • Scanning medical or dental records with Fujitsu ScanSnap. • Touchscreen scanning of medical or dental records. • How to improve your medical or dental workflow with document scanning. • Scanning to EMR or scanning to EDR. • How to maximize your Fujitsu ScanSnap. • Using your ScanSnap for a basic document management system. • Using barcodes and the Fujitsu ScanSnap. • Scanning with the Fujitsu ScanSnap. • Automating workflow with the Fujitsu ScanSnap. • Automating document management capture. • Scanning into Dentrix. • Indexing into Dentrix. • Understanding basic Document Scanning. • Things your teacher never told you about Document Scanning. • An introduction to Document Scanning. • Scanning Fundamentals for the average Joe. By DocuFi, makers of ImageRamp Data Capture Solutions. 30 years’ experience in the document imaging market. Proven Fujitsu ISV Partner. Find out more at ImageRamp and www.docufi.com
  • 53. Image Credits: All images are owned or licensed by DocuFi with acknowledgement given to: • Pjohnkeane, Requirements, requirements, requirements, http://bit.ly/1fcULDf • Doug Waldron, “Files (85)”, http://bit.ly/1bfciII • UBC Learning Commons, “Scanner_icon-1024x671”, http://bit.ly/1eewI4P • Knile Lucy, you have some sorting to do! http://bit.ly/19bSgjF • Michael 1952, SJSA Fifth Grade - I Fell in Love With The Teacher, http://bit.ly/1eevu9A • Ton Haex, “Einstein show....”, http://bit.ly/LVqeBi • Loco Steve, “Sunrise under scrutiny”, http://bit.ly/1eevSVv • Tax Credits, “Coins”, http://bit.ly/1mtQj5j • j_baer, “Ubuntu Color Wheel”, http://bit.ly/1jARikx • Marcin Wichary, Alphabetical, http://bit.ly/1aILOku • David Erickson e-strategyblog.com, “Hindenburg Disaster”, http://bit.ly/1jASeFF • William Warby wwarby, “Gears”, http://bit.ly/1dwtU1S • Alan Cleaver, “watching”, http://bit.ly/1h1k9k7 • Zoetnet, “overflowing,” http://bit.ly/KHW9Em • Seattle Municipal Archives, “Comptroller's Office employees, 1960”, http://bit.ly/1eBvLGE • Seattle Municipal Archives, “City Light worker with office machine, 1954”, http://bit.ly/1eBw3NM • Patrick Hoesly, “Thank you” http://bit.ly/17xKErE