An Introduction to
Document
Scanning
Business Document Scanning 101:
From the Data Capture Perspective
So you
have a lot
of this?
And you’ve decided
this is the answer.
So you need a crash
course in scanning
Lessons:
Lesson 1: Simplex or Duplex
Lesson 2: Resolution
Lesson 3: Color Depth
Lesson 4: File Formats
Lesson 5: Indexing
Lesson 6: Document Prep and Estimating Volumes
Homework: Learn More About Data Capture and Document Management
Lesson 1: Simplex or Duplex
Are the documents single or double-sided?
This may seem obvious but…
You may not want documents such as
purchase invoices scanned in duplex where
the back of the document only contains terms
and conditions.
On the other hand, if the documents have
high legal importance you may want every
conceivable item of information captured
such as small signatures or notes on the back.
Duplex scanning requires
more scanning
time/processing and
results in larger files.
And you don’t have to be
a genius to know that is
more costly.
Lesson 2: Resolution
So what is resolution and why does it matter?
Resolution is expressed as the number of dots
per inch (dpi) or, less frequently, pixels per inch
(ppi). A pixel is a “picture element”; pixels make
up the image, and the resolution is really the rate
at which the image was sampled.
What is Resolution?
Implications of Resolution
This graphic contains
two images, a “0” as a
grayscale image and an
“x” as black and white.
Implications of Resolution
• If we halved the size of the grid horizontally and
vertically (doubled the resolution), the pixels would
appear smoother and produce a better quality image,
the inverse would be true if we doubled the size of the
squares.
• If we kept the squares the same size but reduced the
size of the characters significantly, the resolution would
become insufficient to capture them.
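To put numbers on the grid idea, here is a minimal Python sketch. The function names and the 8.5 x 11 inch page are assumptions made for illustration, not part of the slides. It shows that doubling the resolution quadruples the number of pixels sampled.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Return (width_px, height_px) for a page scanned at the given dpi."""
    return round(width_in * dpi), round(height_in * dpi)


def pixel_count(width_in, height_in, dpi):
    """Total number of pixels sampled for the page."""
    w, h = pixel_dimensions(width_in, height_in, dpi)
    return w * h


# Assumed example page: 8.5 x 11 inches.
low = pixel_count(8.5, 11, 150)
high = pixel_count(8.5, 11, 300)
print(high / low)  # doubling the dpi gives 4x the pixels
```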
Implications of Resolution
• The higher the resolution, the better the image
quality.
• For small characters, increase the resolution to
capture them effectively
So:
And, the higher the resolution,
the slower the scan and the
larger the file.
Which means higher scanning
and file storage costs, Einstein.
Typical Scanning Resolutions
• Web graphic – 96 dpi
• Standard archive document – 200 dpi
• Document required for optical character
recognition (OCR) – 300 dpi
• Plans/drawings for vectorization – 400 dpi
• Documents required for historical archiving –
600 dpi
Resolution is generally determined by intended
use.
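As a rough sketch of "resolution is determined by intended use", the typical values above can be kept in a lookup table. The key names and the rule of taking the highest value when a document serves several uses are assumptions for this example.

```python
# Typical resolutions from the list above, keyed by intended use.
TYPICAL_DPI = {
    "web graphic": 96,
    "standard archive": 200,
    "ocr": 300,
    "vectorization": 400,
    "historical archive": 600,
}


def recommended_dpi(uses):
    """Pick the highest typical dpi required by any of the intended uses."""
    return max(TYPICAL_DPI[use] for use in uses)


print(recommended_dpi(["standard archive", "ocr"]))
```

A document that only needs archiving stays at 200 dpi; add OCR to the list and the same call returns 300.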
Lesson 3: Color Depth
Documents scanned in black and white are
always scanned as grayscale within the
scanner. The scanner then applies a process
known as thresholding to the image to produce
the black and white image.
Thresholding simply determines when a pixel
should be black or white.
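Thresholding can be sketched in a few lines of Python. This is a simplified illustration, assuming a single fixed cutoff of 128 on a 0 to 255 grayscale range; the cutoff value, the function name and the sample patch are assumptions, and real scanners may choose the threshold differently.

```python
def threshold(gray_rows, cutoff=128):
    """Convert grayscale rows (0 = black, 255 = white) to 1-bit rows.

    Pixels darker than the cutoff become 1 (black); the rest become 0 (white).
    """
    return [[1 if value < cutoff else 0 for value in row] for row in gray_rows]


# A tiny assumed 3x4 grayscale patch.
patch = [
    [250, 240, 30, 245],
    [235, 20, 25, 230],
    [120, 200, 40, 135],
]
for row in threshold(patch):
    print("".join("#" if bit else "." for bit in row))
```

Note how the values 120 and 135 sit close to the cutoff: a small change in the threshold flips them, which is why faint text can disappear in black and white scans.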
Understanding Black and White
Grayscale is used when the image contains
color or grayscale data and the tone of the
image needs to be retained, e.g. photographs or
shaded graphics.
Understanding Grayscale
Color is obviously used when the image
contains color data. Some users wish to retain
only important color information, for example land
boundaries or graphical data, and not
letterhead logos, highlighter marks, etc.
Understanding Color
File Storage Requirements
Bits per pixel: color 24, grayscale 8, black and white 1
So, before compression, the storage requirements for a grayscale
image are 8 times larger than for black and white, and color
requirements are 24 times larger than for black and white.
And remember, Einstein, larger files equal higher costs.
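The arithmetic behind those ratios can be sketched as follows. The page size (8.5 x 11 inches), the 300 dpi setting and the names used are assumptions for illustration, and the figures are raw sizes before any compression.

```python
BITS_PER_PIXEL = {"black_and_white": 1, "grayscale": 8, "color": 24}


def uncompressed_bytes(width_in, height_in, dpi, mode):
    """Raw image size in bytes before any compression is applied."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * BITS_PER_PIXEL[mode] // 8


for mode in BITS_PER_PIXEL:
    size = uncompressed_bytes(8.5, 11, 300, mode)
    print(f"{mode}: {size / 1_000_000:.1f} MB")
```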
Lesson 4: File Formats
TIFF
JPEG
PDF
For an in-depth look visit: PDF v. TIFF
• Well established format
• Most often used for black and white documents
• Supports multiple pages
• Interpreted correctly by most applications with a
caution on certain color implementations
• “Group 4” format refers to the compression method
used on black and white images which is a “lossless”
compression where original data is not lost in
compression/decompression.
Understanding TIFF*
TIFF
*Tagged Image File Format
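To show what "lossless" means in practice, here is a small Python sketch. It is not the Group 4 algorithm; it swaps in plain run-length encoding, a much simpler scheme, purely to illustrate that a black and white row can be compressed and then restored exactly.

```python
def rle_encode(bits):
    """Encode a row of 0/1 pixels as [value, run_length] pairs."""
    runs = []
    for bit in bits:
        if runs and runs[-1][0] == bit:
            runs[-1][1] += 1
        else:
            runs.append([bit, 1])
    return runs


def rle_decode(runs):
    """Rebuild the original row of pixels from the runs."""
    return [value for value, length in runs for _ in range(length)]


# Assumed sample row: white margin, a short black stroke, more white.
row = [0] * 20 + [1] * 5 + [0] * 15
encoded = rle_encode(row)
print(encoded)
print(rle_decode(encoded) == row)
```

Forty pixels shrink to three runs, and decoding gives back every pixel unchanged.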
• Well established format by Adobe
• Supports color, grayscale, and black and white
• Supports multiple pages
• Generally stored using Group 4 and JPEG
compression, although it supports other methods too.
• Used when more advanced features are needed
within the file such as embedded Optical Character
Recognition (OCR), hyperlinking, digital signing
and other security features.
Understanding PDF*
PDF
*Portable Document Format
Searchable PDF:
Understanding PDF Variations
PDF
Many scanning applications can create searchable
PDF files. Here, the scanner applies OCR technology
to make the file text searchable. Your application
may label this as “make searchable”, “apply OCR”,
“text-under-image” or “searchable PDF.” If selected,
your file will be text searchable or text selectable
within the Acrobat viewer and many other programs
that search PDF files.
PDF/A:
Understanding PDF Variations
PDF
PDF/A is an ISO standard for digital preservation or
archiving of electronic documents.
It differs from standard PDF by omitting features not
necessary for long-term archiving, such as font
linking.
Growing in international government and industry
segments, including legal systems, libraries,
newspapers, and regulated industries.
Understanding JPEG
JPEG
*Joint Photographic Experts Group
• Well established format
• Most often used for photographs and graphics
• Supports single page only
• A “lossy” compression format, that is, some of the
data is lost during compression. However, it provides
good compression ratios for grayscale and color
images.
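The idea of "lossy" can be illustrated with a toy Python example. This is not the JPEG algorithm; it is a coarse quantization stand-in, with an assumed bucket width of 16, that shows how discarding fine detail makes values cheaper to store but impossible to restore exactly.

```python
def quantize(values, step=16):
    """Map each 0-255 value to the centre of its bucket of width `step`."""
    return [(v // step) * step + step // 2 for v in values]


# Assumed grayscale samples: three dark pixels, three light pixels.
original = [12, 13, 14, 200, 201, 203]
restored = quantize(original)
print(restored)
print(restored == original)
```

Nearby values collapse into the same bucket, so the small differences between them are gone for good.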
Compression and File Size
*Comparison courtesy of Wikipedia
OMG,
right?
The bottom line: experiment with your
images and file size. A middle quality
scan may meet your needs and save
tremendous file space.
Lesson 5: Indexing
For an in-depth look visit: What is Document Indexing?
What is Indexing?
Document indexing (sometimes referred to as
metadata) enables users to quickly and
efficiently locate their documents, whether
through a folder structure, a database or an
electronic document management system.
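A bare-bones picture of an index, sketched in Python. The field names, file names and the `find` helper are all hypothetical; a real document management system would keep this in a database.

```python
# Hypothetical index records: each scanned file gets a few searchable fields.
index = [
    {"file": "scan_0001.pdf", "doc_type": "invoice", "customer": "Acme", "year": 2013},
    {"file": "scan_0002.pdf", "doc_type": "contract", "customer": "Acme", "year": 2012},
    {"file": "scan_0003.pdf", "doc_type": "invoice", "customer": "Globex", "year": 2013},
]


def find(records, **criteria):
    """Return the files whose index fields match every given criterion."""
    return [
        rec["file"]
        for rec in records
        if all(rec.get(field) == value for field, value in criteria.items())
    ]


print(find(index, doc_type="invoice", year=2013))
```

The choice of fields is the design decision that matters: anything not captured as a field at scan time cannot be searched on later.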
Avoid a disaster
Great care should be taken to design an efficient indexing
scheme. If the design is not devised correctly at the outset,
trying to rectify it later can be both difficult and costly.
Sometimes it makes sense to replicate the current manual
method for document location to create a familiar, but faster
system.
Don’t worry, there is automation
Technologies such as
• Barcode recognition
• OCR
• Batch processing
• Data Mining, Text Mining
can save time and money by automating indexing and
more.
Using Barcodes for Indexing
Intelligent data
capture software
can extract data
from barcodes to
create and send
index information
to a document
management
system.
For an in-depth look at barcodes in data capture
visit: What Can Barcodes Do For Me?
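Once a barcode has been read, turning its value into index fields is simple string handling. The Python sketch below assumes a pipe-delimited payload and three field names; both are hypothetical conventions for this example, and reading the barcode from the image is left to the capture software.

```python
def index_from_barcode(payload, field_names=("doc_type", "doc_id", "date")):
    """Split a decoded barcode string into named index fields.

    The pipe-delimited layout is an assumed convention for this sketch.
    """
    parts = payload.split("|")
    if len(parts) != len(field_names):
        raise ValueError(f"expected {len(field_names)} fields, got {len(parts)}")
    return dict(zip(field_names, parts))


print(index_from_barcode("INV|12345|2013-11-05"))
```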
With OCR, make your image-based file fully
text searchable or extract data from a zone for
indexing.
Using OCR for Indexing
With zonal OCR, document
areas are identified for
automatic OCR capture.
Additionally, drag-and-drop
OCR allows an operator to
highlight document text
which is automatically OCR'd
and dropped into index
fields.
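Zonal OCR boils down to "keep only the text that falls inside a rectangle". Here is a minimal Python sketch of that selection step. The shape of the OCR output (a word plus its bounding box in pixels), the sample words and the zone coordinates are assumptions; the recognition itself is done by the OCR engine.

```python
def text_in_zone(words, zone):
    """Join the words whose boxes lie entirely inside the zone.

    `words` is a list of (text, left, top, right, bottom) tuples, a
    hypothetical shape for OCR output; `zone` is (left, top, right, bottom).
    """
    z_left, z_top, z_right, z_bottom = zone
    inside = [
        text
        for text, left, top, right, bottom in words
        if left >= z_left and top >= z_top and right <= z_right and bottom <= z_bottom
    ]
    return " ".join(inside)


# Assumed OCR output for the top of an invoice, in pixel coordinates.
words = [
    ("Invoice", 100, 50, 260, 90),
    ("No.", 270, 50, 330, 90),
    ("48213", 1800, 50, 1950, 90),
    ("Total", 100, 2900, 220, 2940),
]
invoice_number_zone = (1700, 30, 2100, 110)
print(text_in_zone(words, invoice_number_zone))
```

The zone only works when the field sits in the same place on every page, which is why zonal OCR suits forms and invoices from a known layout.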
TIPS for OCR
• Scan at 300 dpi for greater accuracy
and to ensure that small text is captured.
• Limit the use of color on documents.
• Pre-process the image with image
enhancement software (available in
many data capture products, learn
more).
Intelligent data capture solutions often use batch processing that
lets you process a whole folder of documents at a time. Some
products can “watch folders,” and process files as they are
scanned into the folder.
What is Batch Processing?
For an in-depth look visit: What is Batch Document Processing?
Processing can include indexing, file routing, file splitting,
and cleaning/enhancing the scans. Learn more.
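A watched folder can be approximated by checking a folder repeatedly and handling only the files not seen before. The Python sketch below uses the standard library only; the function name, the `*.pdf` pattern and the file names are assumptions, and a temporary folder stands in for the scanner's output folder.

```python
import tempfile
from pathlib import Path


def process_new_files(folder, handler, seen, pattern="*.pdf"):
    """Run `handler` on each matching file not seen before; return those files.

    Call this repeatedly (for example, from a timer loop) to emulate a
    watched folder. `seen` is a set that remembers processed paths.
    """
    processed = []
    for path in sorted(Path(folder).glob(pattern)):
        if path not in seen:
            handler(path)
            seen.add(path)
            processed.append(path)
    return processed


# Demo with a temporary folder standing in for the scanner's output folder.
with tempfile.TemporaryDirectory() as folder:
    seen, handled = set(), []
    (Path(folder) / "batch1_page1.pdf").touch()
    first_pass = process_new_files(folder, handled.append, seen)
    (Path(folder) / "batch1_page2.pdf").touch()
    second_pass = process_new_files(folder, handled.append, seen)
    print([p.name for p in first_pass], [p.name for p in second_pass])
```

In a real job the handler would do the indexing, routing, splitting or clean-up; here it just records which files it was given.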
Lesson 6:
Document
Prep and
Estimating
Volumes
Preparation, quality control and indexing are the
most time consuming elements of any scanning
job and usually the most costly.
Typically a good operator can prepare 750-1000
documents per hour; however, a number of
factors may drop throughput to 300-500.
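Those throughput figures translate directly into labor hours. A quick Python sketch, assuming a job of 30,000 documents (the job size and function name are made up for illustration):

```python
import math


def prep_hours(documents, docs_per_hour):
    """Hours of operator time needed to prepare the given number of documents."""
    return math.ceil(documents / docs_per_hour)


documents = 30_000  # assumed job size
print(prep_hours(documents, 1000), "hours at best")
print(prep_hours(documents, 300), "hours if prep is difficult")
```

The same job can take more than three times as long when the paper is hard to prepare, which is why prep deserves a close look before quoting.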
Factors that Influence Document Prep
• Odd-size document types: sales receipts, photos, plans/drawings
• Bindings: three-ring, spiral, glue, folder
• Fasteners: staples, paper clips, binder clips, rubber bands
• Attachments: Post-its, tabs
Estimating Volumes and Storage
Type                  Simplex (avg #s)   Duplex (avg #s)
Paper folders         30 to 100          60 to 200
Ring binder           200                400
Lever arch folder     500                1000
Transfer cases        500                1000
Bankers boxes         500                1000
Archive boxes         2500               5000
Filing cabinets       3000/drawer        6000/drawer
Learn more about estimating volumes
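The averages above can be turned into a quick estimator. This Python sketch uses the simplex figures given above and doubles them for duplex, as the source figures do. The dictionary keys and the sample job are assumptions, and paper folders are left out because the source gives them as a range rather than a single number.

```python
# Average counts per container, taken from the figures above.
SIMPLEX_AVERAGES = {
    "ring_binder": 200,
    "lever_arch_folder": 500,
    "transfer_case": 500,
    "bankers_box": 500,
    "archive_box": 2500,
    "filing_cabinet_drawer": 3000,
}


def estimate_volume(containers, duplex=False):
    """Estimate the total count for a dict of {container: quantity}."""
    total = sum(SIMPLEX_AVERAGES[name] * qty for name, qty in containers.items())
    return total * 2 if duplex else total


# Assumed job: 12 ring binders, 4 archive boxes, 2 filing cabinet drawers.
job = {"ring_binder": 12, "archive_box": 4, "filing_cabinet_drawer": 2}
print(estimate_volume(job), estimate_volume(job, duplex=True))
```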
Homework: Learn More About
Data Capture and Document
Management
More
Document Management
Determine whether you require a full document
management system or just need a
simple search and retrieval system.
Can I use it as a stepping stone while I
evaluate my document management
system?
More
Learn More
Call us for information on:
How to digitize medical or dental records.
The best way to scan medical or dental records.
Scanning paper records.
Document scanning for medical or dental records.
Going paperless at the medical or dental office.
How to capture medical or dental records efficiently.
Scanning medical or dental records with Fujitsu ScanSnap.
Touchscreen scanning of medical or dental records.
How to improve your medical or dental workflow with document scanning.
Scanning to EMR or scanning to EDR
How to maximize your Fujitsu ScanSnap
Using your ScanSnap for a basic document management system
Using barcodes and the Fujitsu ScanSnap
Scanning with the Fujitsu ScanSnap
Automating workflow with the Fujitsu ScanSnap
Automating document management capture
Scanning into Dentrix
Indexing into Dentrix
Understanding basic Document Scanning
Things your teacher never told you about Document Scanning
An introduction to Document Scanning
Scanning Fundamentals for the average Joe
By DocuFi
Makers of ImageRamp Data Capture Solutions
30 years’ Experience in the Document Imaging
Market
Proven Fujitsu ISV Partner
Find out more at ImageRamp and
www.docufi.com
Image Credits
All images are owned or licensed by DocuFi with acknowledgement given to:
• Pjohnkeane, Requirements, requirements, requirements, http://bit.ly/1fcULDf
• Doug Waldron, “Files (85)”, http://bit.ly/1bfciII
• UBC Learning Commons, “Scanner_icon-1024x671”, http://bit.ly/1eewI4P
• Knile Lucy, you have some sorting to do! http://bit.ly/19bSgjF
• Michael 1952, SJSA Fifth Grade - I Fell in Love With The Teacher, http://bit.ly/1eevu9A
• Ton Haex, “Einstein show....”, http://bit.ly/LVqeBi
• Loco Steve, “Sunrise under scrutiny”, http://bit.ly/1eevSVv
• Tax Credits, “Coins”, http://bit.ly/1mtQj5j
• j_baer, “Ubuntu Color Wheel”, http://bit.ly/1jARikx
• Marcin Wichary, Alphabetical, http://bit.ly/1aILOku
• David Erickson e-strategyblog.com, “Hindenburg Disaster”, http://bit.ly/1jASeFF
• William Warby wwarby, “Gears”, http://bit.ly/1dwtU1S
• Alan Cleaver, “watching”, http://bit.ly/1h1k9k7
• Zoetnet, “overflowing”, http://bit.ly/KHW9Em
• Seattle Municipal Archives, “Comptroller's Office employees, 1960”, http://bit.ly/1eBvLGE
• Seattle Municipal Archives, “City Light worker with office machine, 1954”, http://bit.ly/1eBw3NM
• Patrick Hoesly, “Thank you”, http://bit.ly/17xKErE

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 

An Introduction to Document Scanning, Understanding Your Requirements

  • 1. An Introduction to Document Scanning Business Document Scanning 101: From the Data Capture Perspective
  • 2. So you have a lot of this?
  • 3. And you’ve decided this is the answer.
  • 4. So you need a crash course in scanning
  • 5. Lessons: Lesson 1: Simplex or Duplex; Lesson 2: Resolution; Lesson 3: Color Depth; Lesson 4: File Formats; Lesson 5: Indexing; Lesson 6: Document Prep and Estimating Volumes; Homework: Learn More About Data Capture and Document Management
  • 6. Lesson 1: Simplex or Duplex Are the documents single or double-sided? This may seem obvious but…
  • 7. You may not want documents such as purchase invoices scanned in duplex when the back of the document only contains terms and conditions. On the other hand, if the documents have high legal importance, you may want every conceivable item of information captured, such as small signatures or notes on the back.
  • 8. Duplex scanning requires more scanning time/processing and results in larger files.
  • 9. And you don’t have to be a genius to know that is more costly.
  • 11. So what is resolution and why does it matter?
  • 12. What is Resolution? Resolution is expressed as the number of dots per inch (dpi) or, less frequently, pixels per inch (ppi). A pixel is a “picture element”; pixels make up the image, and the resolution is the rate at which the image was sampled.
  • 13. Implications of Resolution This graphic contains two images, a “0” as a grayscale image and an “x” as black and white.
  • 14. Implications of Resolution • If we halved the size of the grid squares horizontally and vertically (doubled the resolution), the pixels would appear smoother and produce a better quality image; the inverse would be true if we doubled the size of the squares. • If we kept the squares the same size but reduced the size of the characters significantly, the resolution would be insufficient.
  • 15. Implications of Resolution So: • The higher the resolution, the better the image quality. • For small characters, increase the resolution to capture them effectively.
  • 17. And, the higher the resolution, the slower the scan and the larger the file. Which means higher scanning and file storage costs, Einstein.
  • 18. Typical Scanning Resolutions • Web graphic – 96 dpi • Standard archive document – 200 dpi • Document required for optical character recognition (OCR) – 300 dpi • Plans/drawings for vectorization – 400 dpi • Documents required for historical archiving – 600 dpi Resolution is generally determined by intended use.
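
To make the cost of resolution concrete, here is a minimal Python sketch that computes the pixel dimensions of one page at each of the typical resolutions listed on the slide. The page size (US Letter, 8.5 x 11 inches) and the function name `pixel_dimensions` are assumptions for illustration and do not come from the slides.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Return (width_px, height_px) for a page scanned at the given dpi."""
    return round(width_in * dpi), round(height_in * dpi)

# Assumed page size: US Letter, 8.5 x 11 inches.
for dpi in (96, 200, 300, 400, 600):
    w, h = pixel_dimensions(8.5, 11, dpi)
    print(f"{dpi} dpi: {w} x {h} px = {w * h:,} pixels")
```

Doubling the resolution quadruples the number of pixels, which is why scan time and file size climb quickly as dpi goes up.
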
  • 20. Understanding Black and White Documents scanned in black and white are always scanned as grayscale within the scanner. The scanner then applies a process known as thresholding to the image to produce the black and white image. Thresholding simply determines when a pixel should be black or white.
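
Thresholding can be sketched in a few lines of Python. This is an illustration only, assuming 8-bit grayscale values (0 is black, 255 is white) and a fixed cutoff of 128; the cutoff, the sample data and the function name are assumptions, and real scanners may use adjustable or adaptive thresholds.

```python
def threshold(gray_rows, cutoff=128):
    """Convert 8-bit grayscale values (0=black, 255=white) to 1-bit values.

    Returns 0 for black and 1 for white. The cutoff of 128 is an assumed
    default chosen for illustration.
    """
    return [[1 if value >= cutoff else 0 for value in row] for row in gray_rows]

# A tiny 3x3 grayscale patch (hypothetical sample data).
patch = [
    [250, 40, 245],
    [35, 130, 30],
    [240, 45, 127],
]
bw = threshold(patch)
for row in bw:
    print(row)
```

Note how the mid-gray values 130 and 127 land on opposite sides of the cutoff: faint marks near the threshold are exactly where a black and white scan can lose information.
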
  • 21. Understanding Grayscale Grayscale is used when the image contains color or grayscale data and the tone of the image needs to be retained, e.g. photographs or shaded graphics.
  • 22. Understanding Color Color is obviously used when the image contains color data. Some users wish to retain only important color information, for example land boundaries or graphical data, and not letterhead logos, highlighter marks, etc.
  • 24. File Storage Requirements (bits per pixel): color 24, grayscale 8, black and white 1. So the storage requirements for a grayscale image are 8 times larger than for black and white, and color requirements are 24 times larger than for black and white. And, remember Einstein, larger files equal higher costs.
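
The bits-per-pixel arithmetic can be checked with a short Python sketch. It computes raw, uncompressed sizes only; the example page of 2550 x 3300 pixels (US Letter at 300 dpi) and the names used are assumptions for illustration. Actual files are usually compressed, so real-world ratios will differ.

```python
BITS_PER_PIXEL = {"black_and_white": 1, "grayscale": 8, "color": 24}

def uncompressed_bytes(width_px, height_px, mode):
    """Raw image size in bytes, before any compression is applied."""
    bits = width_px * height_px * BITS_PER_PIXEL[mode]
    return (bits + 7) // 8  # round up to whole bytes

# Assumed example: a US Letter page at 300 dpi is 2550 x 3300 pixels.
for mode in BITS_PER_PIXEL:
    size = uncompressed_bytes(2550, 3300, mode)
    print(f"{mode}: {size / 1_000_000:.1f} MB")
```
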
  • 25. Lesson 4: File Formats: TIFF, JPEG, PDF. For an in-depth look visit: PDF v. TIFF
  • 26. Understanding TIFF* (*Tagged Image File Format) • Well established format • Most often used for black and white documents • Supports multiple pages • Interpreted correctly by most applications, with a caution on certain color implementations • “Group 4” format refers to the compression method used on black and white images, which is a “lossless” compression where original data is not lost in compression/decompression.
  • 27. Understanding PDF* (*Portable Document Format) • Well established format by Adobe • Supports color, grayscale, and black and white • Supports multiple pages • Generally stored using Group 4 and JPEG compression, although it supports other compression methods too • Used when more advanced features are needed within the file, such as embedded Optical Character Recognition (OCR), hyperlinking, digital signing and other security features.
  • 28. Understanding PDF Variations: Searchable PDF. Many scanning applications can create searchable PDF files. Here, the scanner applies OCR technology to make the file text searchable. Your application may label this as “make searchable”, “apply OCR”, “text-under-image” or “searchable PDF.” If selected, your file will be text searchable or text selectable within the Acrobat viewer and many other programs that search PDF files.
  • 29. Understanding PDF Variations: PDF/A. PDF/A is an ISO standard for digital preservation or archiving of electronic documents. It differs from standard PDF by prohibiting features ill-suited to long-term archiving, such as font linking (as opposed to font embedding). Its use is growing in international government and industry segments, including legal systems, libraries, newspapers, and regulated industries.
  • 30. Understanding JPEG* (*Joint Photographic Experts Group) • Well established format • Most often used for photographs and graphics • Supports single page only • A “lossy” compression format, that is, some of the data is lost during compression. However, it provides good compression ratios for grayscale and color images.
  • 32. Compression and File Size *Comparison courtesy of Wikipedia OMG, right? The bottom line: experiment with your images and file size. A middle quality scan may meet your needs and save tremendous file space.
  • 33. Lesson 5: Indexing For an in-depth look visit: What is Document Indexing?
  • 34. What is Indexing? Document indexing (sometimes referred to as metadata) enables users to quickly and efficiently locate their documents, whether through a folder structure, a database or an electronic document management system.
  • 36. Avoid a disaster Great care should be taken to design an efficient indexing scheme. If the design is not devised correctly at the outset, trying to rectify it later can be both difficult and costly. Sometimes it makes sense to replicate the current manual method for document location to create a familiar, but faster system.
  • 37. Don’t worry, there is automation Technologies such as • Barcode recognition • OCR • Batch processing • Data Mining, Text Mining can save time and money by automating indexing and more.
  • 38. Using Barcodes for Indexing Intelligent data capture software can extract data from barcodes to create and send index information to a document management system. For an in-depth look at barcodes in data capture visit: What Can Barcodes Do For Me?
  • 39. With OCR, make your image-based file fully text searchable or extract data from a zone for indexing.
  • 40. Using OCR for Indexing With zonal OCR, document areas are identified for automatic OCR capture. Additionally, drag-and-drop OCR allows an operator to highlight document text which is automatically OCR'd and dropped into index fields.
  • 41. TIPS for OCR • Scan at 300 dpi for greater accuracy and to ensure that small text is captured. • Limit the use of color on documents. • Pre-process the image with image enhancement software (available in many data capture products, learn more).
  • 42. What is Batch Processing? Intelligent data capture solutions often use batch processing that lets you process a whole folder of documents at a time. Some products can “watch folders,” and process files as they are scanned into the folder. For an in-depth look visit: What is Batch Document Processing?
  • 43. What is Batch Processing? Processing can include indexing, file routing, file splitting, and cleaning/enhancing the scans. Learn more.
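
As a rough illustration of the "watch folder" idea, the Python sketch below reports files that have appeared in a folder since the last check. It is a hypothetical polling helper, not how any particular data capture product works; the function name `new_scans` and the list of file suffixes are assumptions.

```python
from pathlib import Path

def new_scans(folder, seen, suffixes=(".tif", ".tiff", ".pdf", ".jpg")):
    """Return scan files in `folder` not yet listed in `seen`.

    `seen` is a set of file names that is updated in place, so calling
    the function again only reports files that arrived in the meantime.
    """
    fresh = sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in suffixes and p.name not in seen
    )
    seen.update(p.name for p in fresh)
    return fresh

# Typical use: call new_scans("scans", seen) on a timer and hand each
# returned path to the indexing, splitting or clean-up step.
```
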
  • 45. Preparation, quality control and indexing are the most time-consuming elements of any scanning job and usually the most costly.
  • 46. Typically a good operator can prepare 750-1000 documents per hour; however, a number of factors may drop throughput to 300 or 500.
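
The throughput figures translate directly into labor time. A minimal sketch, assuming a hypothetical job of 10,000 documents:

```python
def prep_hours(documents, docs_per_hour):
    """Hours of preparation for a given volume at a given hourly throughput."""
    return documents / docs_per_hour

documents = 10_000  # hypothetical job size
for rate in (1000, 750, 500, 300):
    print(f"{rate} docs/hour: {prep_hours(documents, rate):.1f} hours")
```

The same job takes roughly three times as long at 300 documents per hour as at 1000, which is why the condition of the paper matters so much when estimating cost.
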
  • 47. Factors that Influence Document Prep • Odd size/document type: sales receipts, photos, plans/drawings • Bindings: three ring, spiral, glue, folder • Fasteners: staples, paper clips, binder clips, rubber bands • Attachments: Post-its, tabs
  • 48. Estimating Volumes and Storage
      Type                  Simplex (avg #s)   Duplex (avg #s)
      Paper Folders         30 to 100          60 to 200
      Ring Binder           200                400
      Lever arch folder     500                1000
      Transfer Cases        500                1000
      Bankers Boxes         500                1000
      Archive Boxes         2500               5000
      Filing Cabinets       3000/drawer        6000/drawer
      Learn more about estimating volumes
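
The averages on this slide can be turned into a rough volume estimate. The Python sketch below assumes the figures map to the container types in the order they are listed on the slide, and that duplex counts are double the simplex counts, as the slide shows. The inventory is hypothetical.

```python
# Average simplex counts per container, read from the slide above.
# Ranges are stored as (low, high); duplex counts are double these.
SIMPLEX_COUNTS = {
    "paper_folder": (30, 100),
    "ring_binder": (200, 200),
    "lever_arch_folder": (500, 500),
    "transfer_case": (500, 500),
    "bankers_box": (500, 500),
    "archive_box": (2500, 2500),
    "filing_cabinet_drawer": (3000, 3000),
}

def estimate_volume(inventory, duplex=False):
    """Return (low, high) estimated image count for a container inventory."""
    factor = 2 if duplex else 1
    low = sum(SIMPLEX_COUNTS[kind][0] * n for kind, n in inventory.items())
    high = sum(SIMPLEX_COUNTS[kind][1] * n for kind, n in inventory.items())
    return low * factor, high * factor

# Hypothetical office inventory.
inventory = {"paper_folder": 40, "archive_box": 6, "filing_cabinet_drawer": 4}
print(estimate_volume(inventory))
print(estimate_volume(inventory, duplex=True))
```
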
  • 49. Homework: Learn More About Data Capture and Document Management More
  • 50. Document Management Determine whether you require a full document management system or just a simple search and retrieval system. Can you use the simpler system as a stepping stone while you evaluate a document management system?
  • 52. Call us for information on: How to digitize medical or dental records. The best way to scan medical or dental records. Scanning paper records. Document scanning for medical or dental records. Going paperless at the medical or dental office. How to capture medical or dental records efficiently. Scanning medical or dental records with Fujitsu ScanSnap. Touchscreen scanning of medical or dental records. How to improve your medical or dental workflow with document scanning. Scanning to EMR or scanning to EDR. How to maximize your Fujitsu ScanSnap. Using your ScanSnap for a basic document management system. Using barcodes and the Fujitsu ScanSnap. Scanning with the Fujitsu ScanSnap. Automating workflow with the Fujitsu ScanSnap. Automating document management capture. Scanning into Dentrix. Indexing into Dentrix. Understanding basic Document Scanning. Things your teacher never told you about Document Scanning. An introduction to Document Scanning. Scanning Fundamentals for the average Joe. By DocuFi, makers of ImageRamp Data Capture Solutions. 30 years’ experience in the document imaging market. Proven Fujitsu ISV Partner. Find out more at ImageRamp and www.docufi.com
  • 53. Image Credits All images are owned or licensed by DocuFi with acknowledgement given to: • Pjohnkeane, “Requirements, requirements, requirements”, http://bit.ly/1fcULDf • Doug Waldron, “Files (85)”, http://bit.ly/1bfciII • UBC Learning Commons, “Scanner_icon-1024x671”, http://bit.ly/1eewI4P • Knile Lucy, “you have some sorting to do!”, http://bit.ly/19bSgjF • Michael 1952, “SJSA Fifth Grade - I Fell in Love With The Teacher”, http://bit.ly/1eevu9A • Ton Haex, “Einstein show....”, http://bit.ly/LVqeBi • Loco Steve, “Sunrise under scrutiny”, http://bit.ly/1eevSVv • Tax Credits, “Coins”, http://bit.ly/1mtQj5j • j_baer, “Ubuntu Color Wheel”, http://bit.ly/1jARikx • Marcin Wichary, “Alphabetical”, http://bit.ly/1aILOku • David Erickson e-strategyblog.com, “Hindenburg Disaster”, http://bit.ly/1jASeFF • William Warby wwarby, “Gears”, http://bit.ly/1dwtU1S • Alan Cleaver, “watching”, http://bit.ly/1h1k9k7 • Zoetnet, “overflowing”, http://bit.ly/KHW9Em • Seattle Municipal Archives, “Comptroller's Office employees, 1960”, http://bit.ly/1eBvLGE • Seattle Municipal Archives, “City Light worker with office machine, 1954”, http://bit.ly/1eBw3NM • Patrick Hoesly, “Thank you”, http://bit.ly/17xKErE