2. What is digitization?
Digitization is the process of converting
information into a digital format. In this format,
information is organized into discrete units of data
that can be separately addressed. This is
the binary data that computers and many devices
with computing capacity can process.
http://whatis.techtarget.com/definition/0,,sid9_gci896692,00.html
See also: http://en.wikipedia.org/wiki/Digitizing
3. Steps of digitization
1. Choose the book you want to digitize.
2. Choose OCR software (GO!)
3. Scan your book (choose the device: scanner,
compact device, digital camera, or IRIScan) (GO!)
4. Optical Character Recognition (image)
5. Correction (image1) (image2)
6. Save as a text-searchable PDF document
See other versions:
http://www.inquisition.ca/en/info/artic/comment_numeriser.htm
http://dlg.galileo.usg.edu/guide.html#01
4. Text and images
Text and images can be digitized similarly:
a scanner captures an image (which may be an image
of text) and converts it to an image file, such as
a bitmap. An optical character recognition (OCR)
program analyzes a text image for light and dark
areas in order to identify each alphabetic letter or
numeric digit, and converts each character into
an ASCII code.
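The light-and-dark analysis described above can be illustrated with a toy sketch. It is not how any real OCR product works: the 3x5 glyph templates, the `recognize` and `ocr` function names, and the simple pixel-matching score are all assumptions made up for this example. Each scanned glyph is compared with every template, the best match wins, and the recognized character is converted to its ASCII code.

```python
# Toy 3x5 glyph templates (hypothetical): '#' is a dark pixel, '.' a light one.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
    "O": ["###", "#.#", "#.#", "#.#", "###"],
}


def to_bits(rows):
    """Flatten a glyph into a list of 1 (dark) and 0 (light) pixels."""
    return [1 if ch == "#" else 0 for row in rows for ch in row]


def recognize(glyph):
    """Return the template letter whose pixels agree most with the glyph."""
    bits = to_bits(glyph)

    def score(name):
        return sum(a == b for a, b in zip(bits, to_bits(TEMPLATES[name])))

    return max(TEMPLATES, key=score)


def ocr(glyphs):
    """Recognize each glyph and return the text plus its ASCII codes."""
    text = "".join(recognize(g) for g in glyphs)
    codes = [ord(c) for c in text]
    return text, codes


# A small "scan": the first glyph is an L with one noisy pixel.
scanned = [
    ["#..", "#..", "#..", "#.#", "###"],
    ["###", "#.#", "#.#", "#.#", "###"],
    ["###", ".#.", ".#.", ".#.", ".#."],
]

text, codes = ocr(scanned)
print(text, codes)  # LOT [76, 79, 84]
```

Because the score counts matching pixels, one wrong pixel in the scanned L still leaves it closest to the L template. Real scans are much noisier, which is why a correction step follows recognition.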
5. Choose an OCR software
There are many software packages for digitizing
your documents.
Wikipedia has a comparison list of optical
character recognition software. Check it out!
http://en.wikipedia.org/wiki/List_of_optical_character_recognition_software
(I recommend ABBYY FineReader.)
If you don’t want to buy (or download)
software, here’s a free online OCR service:
http://www.newocr.com/
6. What is OCR?
OCR (optical character recognition) is the
recognition of printed or written text characters by
a computer. This involves photoscanning of the
text character-by-character, analysis of the
scanned-in image, and then translation of the
character image into character codes, such as
ASCII, commonly used in data processing.
http://searchciomidmarket.techtarget.com/definition/OCR
Read more:
http://en.wikipedia.org/wiki/Optical_character_recognition
7. What is ASCII?
ASCII (American Standard Code for Information
Interchange) is the most common format for
text files in computers and on the Internet. In an
ASCII file, each alphabetic, numeric, or special
character is represented with a 7-bit binary number
(a string of seven 0s or 1s). 128 possible characters
are defined.
In: http://searchciomidmarket.techtarget.com/definition/ASCII
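As a small illustration of the 7-bit representation, the sketch below uses only Python built-ins. The function names `to_ascii_bits` and `from_ascii_bits` are made up for this example.

```python
def to_ascii_bits(text):
    """Return each character of text as a 7-bit binary string."""
    # encode("ascii") raises UnicodeEncodeError for characters outside ASCII.
    codes = text.encode("ascii")
    return [format(code, "07b") for code in codes]


def from_ascii_bits(bit_strings):
    """Turn a list of 7-bit binary strings back into text."""
    return "".join(chr(int(bits, 2)) for bits in bit_strings)


bits = to_ascii_bits("OCR")
print(bits)                   # ['1001111', '1000011', '1010010']
print(from_ascii_bits(bits))  # OCR
print(2 ** 7)                 # 128 possible characters
```

Seven bits give 2^7 = 128 combinations, which is where the 128 defined characters come from.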
8. How to scan the book
With a scanner: http://www.wikihow.com/Scan-a-Book
http://www.proportionalreading.com/scan.html
With a compact device:
http://www.ehow.com/how_6950098_scan-bookpdf-format.html
With a digital camera:
http://www.wikihow.com/Scan-a-Book-With-aDigital-Camera
With IRIScan:
http://www.youtube.com/watch?v=9bgcDHLe3Xg
12. Videos
How to digitize a book:
http://www.youtube.com/watch?v=-M95Ob4kIak
How to chop and scan a book:
http://www.youtube.com/watch?v=8tx2JmW_p4c
Scanning text using OCR software:
http://www.youtube.com/watch?v=_SwrGtSY4-c
How to OCR PDFs easily with Acrobat Batch OCR:
http://www.youtube.com/watch?v=V6Iz3U5X-SU
How to digitize a million books:
http://www.youtube.com/watch?v=OlKhKyTS23E
13. How to put a scanned doc into PDF format
http://www.ehow.com/how_8563246_put-scanned-document-pdf-format.html
Some OCR software can save
directly to PDF format.
Enjoy reading on
your digital device!
Made by Mario Laskovics (2012.04.03)