3.2 Kofax Partner Connect 2013 - Transformation Modules - Advanced Track and What Is New in KTM
1. PartnerConnect Deutschland
Frankfurt, 31 January 2013
Kofax Transformation Modules – Expert Forum & What's New in KTM
Stephan Mayer / Stefan Skrok
Presales EMEA
2. Agenda
How to build a successful KTM project?
New features in KTM
3. Advanced Track & What’s new in KTM - Overview
On Ramp
KIC - PDF and Color Documents

KTM for Forms
Kofax Capture Add On – Features

Productivity Enhancements – Design Time
Benchmarking
Separation
Classification
Extraction
Knowledge Base Conflict Management
Clustering Utility
Project Merge Tool
Project Builder – Test Documents
New Xdoc Browser

Technology Enhancements
Trainable Document Separation
LimLoc Enhancements
Kofax Search and Matching Server
Mix Print Detection

Productivity Enhancements – Users
Localisation
Thin Client Enhancements
Field and Table Drop Down Lists
Sticky Notes (Annotations)
Advanced Routing
Docking and Zooming

This and That
Recostar 5
Normalization
Format Locator Enhancements
Locator Dialog - Testing
Script
Rotation
4. The Golden Rule of KTM
Automation? (i.e. how much data is extracted automatically)
User productivity? (i.e. how many docs can a user process per hour)
5. The Fallacy of OCR Accuracy
What OCR accuracy do you have?
What is the straight-through processing rate?
How much can we automate?
85% straight-through processing
23 fields → 99.29% field accuracy
6 chars/field → 99.89% character accuracy
What is the cost of the other 15%?
You will lose this deal against an OCR provider because this deal is being fought over features and tech, and not business value.
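The relationship between the three percentages can be sketched as follows. This is a minimal illustration, assuming (the slide does not say so) that a document passes straight through only when all of its fields are correct, that a field is correct only when all of its characters are, and that errors are independent. The function name is illustrative.

```python
def per_unit_accuracy(overall_rate: float, units: int) -> float:
    """Accuracy each of `units` independent parts needs so that
    all of them are right `overall_rate` of the time."""
    return overall_rate ** (1.0 / units)


stp_rate = 0.85        # straight-through processing rate from the slide
fields_per_doc = 23    # fields per document from the slide
chars_per_field = 6    # characters per field from the slide

field_accuracy = per_unit_accuracy(stp_rate, fields_per_doc)
char_accuracy = per_unit_accuracy(field_accuracy, chars_per_field)

print(f"field accuracy:     {field_accuracy:.2%}")
print(f"character accuracy: {char_accuracy:.2%}")
```

Under these assumptions the results land within about a hundredth of a percentage point of the figures quoted above, which is the point of the slide: a very high character accuracy still leaves 15% of documents needing manual work.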
6. User Productivity Example
Large supermarket chain in Turkey
Pan European Wholesaler | invoices/person/day | Improvement
Before Kofax | 800 |
After 3 months of automation | 1200 | 50%
After 2 weeks of "user productivity" | >2000 | 66% (150%!)
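The improvement percentages can be checked with a few lines of arithmetic. This sketch treats ">2000" as exactly 2000, so the last two results are lower bounds; the function name is illustrative.

```python
def improvement(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`, as a fraction."""
    return (after - before) / before


# invoices/person/day; 2000 stands in for the ">2000" reported on the slide
baseline, automated, tuned = 800, 1200, 2000

print(f"automation vs. baseline:        {improvement(baseline, automated):.0%}")
print(f"user productivity vs. automation: {improvement(automated, tuned):.0%}")
print(f"user productivity vs. baseline:   {improvement(baseline, tuned):.0%}")
```

The second step is measured against the already automated figure, and the value in parentheses on the slide is the total gain measured against the original baseline.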
7. What are the goals of a KTM Project?
Every KTM project can be reduced to the following goals:
Increase documents/person/hour.
Decrease clicks/document.
Can a user correct a problem faster than your complex solution?
The goal is not:
Perfect OCR
Perfect UI
Be guided by simplicity, order, speed.
Do not chase accuracy, chase docs/person/hour.
8. Anyone can do KTM
Classify
Separate
Folder
Extract
Validate
Learning
9. All you need is paper and highlighters
Classify
Separate
Folder
Extract
Validate
10. Build a Benchmark
Add the Fields you need
Classify (F5)
Validate (F8)
Save Xdocs
Train Xdocs (F10)
Train Project
Tools/ExtractionBenchmark/AllClasses
Save Benchmark
Open in Microsoft Excel
11. Goals of every KTM Project
1. Human Productivity
2. Eliminate False Positives (bad data leaving Kofax)
3. Reduce False Negatives (user pressing ENTER)
4. Few True Negatives (OCR accuracy, database problems & learning)
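One way to read these categories is sketched below. It assumes each benchmarked field has an extracted value, a golden-file value, and a flag saying whether the field was marked valid; the mapping is an interpretation of the slide, not KTM's benchmark code, and all names are hypothetical.

```python
from collections import Counter


def outcome(extracted: str, golden: str, marked_valid: bool) -> str:
    """Classify one field result against its golden-file value."""
    correct = extracted == golden
    if marked_valid and correct:
        return "true positive"    # correct and passes without user work
    if marked_valid and not correct:
        return "false positive"   # bad data leaving Kofax
    if correct:
        return "false negative"   # correct but flagged: user just presses ENTER
    return "true negative"        # wrong and flagged: user has to fix it


# (extracted value, golden value, marked valid?) - made-up sample fields
fields = [
    ("4711", "4711", True),
    ("4711", "4717", True),
    ("2013-01-31", "2013-01-31", False),
    ("1OO.00", "100.00", False),
]

counts = Counter(outcome(*field) for field in fields)
for name, count in counts.items():
    print(f"{name}: {count}")
```

Counting the four outcomes per field over a benchmark set shows where the effort should go: false positives first, then unnecessary confirmations.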
15. How to improve extraction quality beyond just OCR?
We probably have much of the information to be extracted in our dictionaries/databases already!
18. New Utility for Clustering Unknown Documents
What it does
Requirements
Step-by-step
Importing into KTM
19. What does the Kofax Clustering Utility do?
When configuring KTM content classification, the customer needs to provide samples for each class.
What KTM requires:
20. What does the Kofax Clustering Utility do?
When configuring KTM content classification, the customer needs to provide samples for each class.
What customers usually provide:
21. What does the Kofax Clustering Utility do?
Presorts a document set into clusters of similar documents
User labels some of these clusters
Utility learns from the labeling and presorts again
Several iterations of labeling and presorting
Export of sorted documents as learn-set for KTM content classification
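The presorting step can be illustrated with a toy example. The slides do not describe the utility's actual algorithm, so this sketch swaps in a simple greedy grouping by word overlap (Jaccard similarity); the threshold, function names, and sample texts are all assumptions, and the labeling and re-learning iterations are not modelled.

```python
def tokens(text: str) -> frozenset[str]:
    """Lower-cased set of words in a document's text."""
    return frozenset(text.lower().split())


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Share of words two documents have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def presort(docs: dict[str, str], threshold: float = 0.3) -> list[list[str]]:
    """Greedy clustering: each document joins the first cluster whose
    first member (the seed) is similar enough, else starts a new cluster."""
    clusters: list[list[str]] = []
    seeds: list[frozenset[str]] = []
    for name, text in docs.items():
        doc_tokens = tokens(text)
        for cluster, seed in zip(clusters, seeds):
            if jaccard(doc_tokens, seed) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
            seeds.append(doc_tokens)
    # Largest clusters first, since they are the ones worth labeling first
    return sorted(clusters, key=len, reverse=True)


docs = {
    "a.txt": "invoice number total amount due vat",
    "b.txt": "invoice number amount due vat net",
    "c.txt": "dear sir complaint about delivery delay",
    "d.txt": "invoice total amount vat due date",
}
for cluster in presort(docs):
    print(cluster)
```

In this toy data the three invoice texts end up in one cluster and the complaint letter in another, which is the kind of grouping a user would then label.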
22. What does the Kofax Clustering Utility do?
New KTM project:
Customer uses the Utility to provide KPSG or a partner with sorted documents
KPSG or a partner uses the Utility to sort documents from the customer
Understanding which are the biggest subsets of documents in a customer’s monthly mailroom volume
Enhancing a KTM project:
Customer adds new classes to the project and needs samples for classification
23. Requirements
Kofax Clustering Utility works with XDocuments
XDocuments must be created with KTM OCR Server tool
KTM (5.5 SP2) must be installed to use Clustering Utility.
24. Requirements
Using the KTM OCR Server reduces the KTM base volume count
Eval licenses supported
Hardware requirements same as for KC/KTM
Files to be clustered should be local for performance
Need write access to file location
25. Step by Step – KTM OCR Server
Configuring the KTM OCR Server:
Select path to unsorted images
Enable "Save XDoc files" and "Save text files"
Under OCR Settings, select proper language
Leave rest at default
Running the KTM OCR Server:
Simply press the Start button
26. Step by Step – Kofax Clustering Utility
1. Import
Point "Import directory" to the same directory as the unsorted documents
For each document, an .xdc file and a .txt file must exist
Select "Start Discovery".
Takes a while, ~0.5 sec per document
Converts XDocs into internal format
Identifies initial clusters
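Before starting discovery it can help to check the import directory. The sketch below assumes that the .xdc and .txt files sit next to each image and share its base name, and that images are TIFF or PDF files; neither detail is confirmed by the slides, and the function names are illustrative.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".tif", ".tiff", ".pdf"}  # assumption: adjust to the image types in use
SECONDS_PER_DOC = 0.5                        # rough figure quoted on the slide


def missing_companions(import_dir: Path) -> list[str]:
    """Names of .xdc/.txt files that are missing next to an image."""
    missing = []
    for image in sorted(import_dir.iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        for suffix in (".xdc", ".txt"):
            companion = image.with_suffix(suffix)
            if not companion.exists():
                missing.append(companion.name)
    return missing


def estimated_discovery_seconds(import_dir: Path) -> float:
    """Rough discovery time, based on the number of XDoc files."""
    documents = [p for p in import_dir.iterdir() if p.suffix.lower() == ".xdc"]
    return len(documents) * SECONDS_PER_DOC


# Demo on a throwaway directory with one complete and one incomplete document
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("a.tif", "a.xdc", "a.txt", "b.tif", "b.xdc"):
        (root / name).touch()
    print(missing_companions(root))
    print(estimated_discovery_seconds(root))
```

For a real run, point both functions at the directory used as "Import directory" instead of the temporary one.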
27. Step by Step – Kofax Clustering Utility
2. Discovery
Label initial 3 clusters
You see the most representative document of each cluster
Provide a name for each cluster; it will be used as the class name in KTM
29. Step by Step – Kofax Clustering Utility
2. Discovery
You can stop discovery when 80-90% of the documents are discovered, or continue until all documents are discovered
At 80-90% the most common document types are often known; the remaining documents are likely in very small clusters
Click "Review" to continue to the next step
30. Step by Step – Kofax Clustering Utility
3. Review
Sort by categories (labels)
Examine the categories for consistency
Confirm some documents if you want to cluster again
32. Step by Step – Kofax Clustering Utility
4. Export
Select any directory for export
Sub directories will be created for each category/label
.txt files are exported (plus tif/xdoc for reference), since only .txt files are used to train KTM content classification later
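A quick way to see how many training samples each category received is to count the .txt files per sub directory. This sketch assumes the directory passed in directly contains one sub directory per category, as described above; the function name and the demo category names are made up.

```python
import tempfile
from pathlib import Path


def samples_per_class(export_dir: Path) -> dict[str, int]:
    """Count .txt files in each category sub directory of the export."""
    counts = {}
    for sub in sorted(export_dir.iterdir()):
        if sub.is_dir():
            counts[sub.name] = sum(
                1 for f in sub.iterdir() if f.suffix.lower() == ".txt"
            )
    return counts


# Demo on a throwaway export directory with two categories
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in ("Invoice/1.txt", "Invoice/1.tif", "Invoice/2.txt", "Complaint/3.txt"):
        path = root / rel
        path.parent.mkdir(exist_ok=True)
        path.touch()
    for name, count in samples_per_class(root).items():
        print(f"{name}: {count} sample(s)")
```

Categories with very few samples are worth a second look before training, since only the .txt files feed the content classifier.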
33. Importing into KTM
In Project Builder, point the Content Classifier settings of the New Project dialog to the exported directory
Select the "Discovered documents" sub directory
34. Importing into KTM
A class is created in Project Builder for each category
Training documents are imported
Select „Train“ in Project Builder main menu
Verify in Classification Benchmark (Result Matrix)
35. Importing into KTM
Setting this up manually and finding/organizing the proper training documents takes hours or days.
With the Kofax Clustering Utility, this example took 20 minutes.
37. Kofax Transformation Modules vs Xtrata
KTM: for fixed-form and free-form documents.
Xtrata: limited to fixed forms only.
KTM key applications:
Automatic Indexing for Archive.
Workflow (Mailroom) Automation.
Forms Processing.
Accounts Payable Automation.
Automatic Document Separation.
Records management.
Xtrata key applications: forms processing only.
38. Kofax Transformation Modules vs Xtrata
Advantages of using KTM for forms processing applications
Ability to perform database matching to improve extraction rates - improves ROI.
More powerful and flexible validation interface (with Xtrata you have to use the KC validation module) - improves productivity.
More classification methods, i.e. layout and content vs layout only in Xtrata - improves classification accuracy (requires Full Base license).
Scripting for more advanced applications - improves flexibility.
39. KTM for Forms
Basic Information
http://www.kofax.com/downloads/datasheets/ds-kc10-license-update-en.pdf
Features
Layout-based classification
Unlimited extraction fields
Advanced Zone Locator
Barcode Locator
ABBYY FineReader OCR
Document Review (thick client)
Validation, Verification, Correction (thick and thin client)
40. Not supported in KTM for Forms
All locators not mentioned on the previous slide
Content based classification
Any OCR besides ABBYY
Trainable Document Separation
42. KTM 5.5 – On Ramp
[Diagram. Labels: Scan; Fax; Email; Folder; Web service; Kofax Capture Import Connector; Native PDF Support; Original Format; Kofax Capture; Kofax Transformation Modules; Export Connector; Business Processes]
43. KTM 5.5 – PDF
Supports Color
Supports Advanced Binarization for full compatibility with all KTM functions
Supports PDF settings at project level
44. KTM 5.5 – PDF
Extracts “perfect” original PDF text layer.
No OCR required!
Image layer is ignored!
58. KTM 5.5- Extraction Benchmarking
Extraction Benchmark
EV = Extracted Value; GFV = Golden File Value (perfect file)
EV = GFV Super
EV = GFV Work
EV ≠ GFV Work
EV ≠ GFV False positives
Project quality
Project design
64. KTM 5.5 – Conflict Management
Toolbar
Navigate between conflicts
Synchronize Zoom
Show All Fields
65. KTM 5.5 – Conflict Management
A Conflict Document
Delete document
Delete field
File name
Page navigation
Conflicting field
Field area on document
114. KTM 5.5 – Localisation
KTM Languages
English
German
115. KTM 5.5 – Localisation
Additional KTM Languages
# | Language Pack | Language ID
1 | Brazilian | pt-BR
2 | Chinese | zh-CN
3 | Czech | cs
4 | French | fr
5 | Italian | it
6 | Japanese | ja
7 | Polish | pl
8 | Russian | ru
9 | Spanish | es
10 | Swedish | sv-SE
116. KTM 5.5 – Localisation
Additional KTM Languages
Graphical User Interface
Project Builder and runtime modules
Component based messages
KTM Server
Documentation
(runtime modules and Userguide.pdf)
1. Document Review
2. Correction
3. Validation
4. Verification
119. KTM 5.5 – Localisation
.NET concept
Primary language: English (en)
Secondary language: English (United Kingdom) (en-GB), English (United States) (en-US)
120. KTM 5.5 – Localisation
Fall back principle
121. KTM 5.5 – Localisation
Fall back principle
Localise: is there a translation for the primary-secondary language?
Yes: use translation value for display name.
No: is there a translation for the primary language?
Yes: use translation value for display name.
No: use default value for display name.
End
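The fall back principle can be sketched as a three-step lookup. The dictionary layout, function name, and sample translations below are illustrative assumptions and not KTM's internal data model; the language IDs follow the .NET culture naming used on the previous slides.

```python
def resolve_display_name(translations: dict[str, str], language: str, default: str) -> str:
    """Pick a display name: exact language first (e.g. 'en-US'),
    then its primary language ('en'), then the default value."""
    if language in translations:
        return translations[language]
    primary = language.split("-")[0]
    if primary in translations:
        return translations[primary]
    return default


# Hypothetical translations for one field
translations = {
    "de": "Rechnungsnummer",
    "en": "Invoice number",
    "en-US": "Invoice No.",
}

print(resolve_display_name(translations, "en-US", "InvoiceNumber"))  # exact match
print(resolve_display_name(translations, "en-GB", "InvoiceNumber"))  # falls back to 'en'
print(resolve_display_name(translations, "fr", "InvoiceNumber"))     # falls back to default
```

Keeping a complete primary-language translation means regional variants only need entries where the wording actually differs.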
122. KTM 5.5 – Localisation
KTM GUI, Server and Active Language
123. KTM 5.5 – Localisation
KTM GUI Language, Server and Active Language
The Project.ActiveLanguage overrides the Region and Language settings
124. KTM 5.5 – Localisation
Summary
KTM Graphical User Interface language
KTM Server language
Project language (Project.ActiveLanguage)
125. KTM 5.5 – Localisation
What can be localised?
KTM Element | Yes/No | Note
Fields | |
Table Columns | |
Formatting Methods | | Component messages used
Validation Methods | | Regular Expression only; component messages used
Validation Form | | Tab captions, field label, simple label, button captions, DB button captions, group captions
Script Resources | |
139. KTM TC 5.5 Improvements
Preserve User Settings
User name at login screen
Batch Open dialog box: size, columns, sorting settings
Panels: size, expanded states
Zoom settings: fit width, fit height, custom zoom
Annotation settings: hide/display annotations
140. KTM TC 5.5 Improvements
Advanced Login Capabilities
Domain login for linked users
Single sign-on support for Active Directory users
141. KTM TC 5.5 Improvements
Combo-boxes Inside Tables, Items With Descriptions
Display descriptions, values or both
Support empty strings consistently for all combo-boxes
Paging control for over 100 items
Type-ahead filtering capabilities
New script events to initialize scripted combo-boxes
142. KTM TC 5.5 Improvements
Other “Small” Things…
Batch loading performance improvements (project caching)
PDF support
Reject/Unreject documents – support scripting on the server
Allows installing the Thin Client Server on top of a previous version
Propagates user changes in config files to a new version
144. KTM 5.5 – Advanced Routing
Batch routing was new in KTM 5.0 with KC 9.0
Kofax Capture Service Packs allow more functionality:
KTM 5.0
Batch Routing (routing of documents) is available in KC 9
Batch Routing (routing of folders) is available with KC 9 SP1
KTM 5.5
Assigning a new batch class to the child batches requires KC 9 SP2
145. KTM 5.5 – Advanced Routing
Setting an XValue assigns a new batch class to a child batch:
KTM_DOCUMENTROUTING_NEWBATCHCLASS_<PlaceHolder>
147. KTM 5.5 – TDS Enhancements
• AFC or SVM
• TDS Separation algorithm unchanged
• Re-use training sets
• Re-build model
[Diagram. Labels: KTM: AFC – Documents; KTM 5.5: AFC – Pages (1st, Middle, Last); SVM; AFC]
148. KTM 5.5 – TDS Enhancements
SVM vs AFC
Training set: 100,000 pages, 30,000 docs, 100 doc. types
[Chart comparing SVM and AFC]
Similar accuracy, but AFC produces fewer missed splits
AFC allows for more frequent benchmarking
150. KTM 5.5 – Line Item Matching Locator
Use cases for new features:
1. Multi PO discovery
2. Online Learning
3. Release Matching information to ERP
4. Getting more data
151. KTM 5.5 – Line Item Matching Locator
• Multi PO discovery
152. KTM 5.5 – Line Item Matching Locator
Version | KTM Server | Validation clerk | KTM KB Learning Server
KTM 5.0 | | Marked for Learning | Learned
KTM 5.5 | Marked for Learning | Marked for Learning | Learned
153. KTM 5.5 – Line Item Matching Locator
Match Remarks
Information about under-/over-delivery, ambiguous matches, etc. is now stored in a new global column for Match Remarks
154. KTM 5.5 – Line Item Matching Locator
Additional columns
Table Locator can be used to find additional columns on the invoice (e.g. Supplier Article code)
LIM Loc as input to Table Locator
Table Header pack for column detection
155. KTM 5.5 – Line Item Matching Locator
Additional columns
Additional database columns (e.g. Cost Center ID) can be copied to the XDoc Table
157. KTM 5.5 – Search and Matching Server
Business Value | New in KTM 5.5
Faster client startup time – instant feedback | Client Server instead of local copy (No Loading Delay – No Local Memory Usage)
Access large enterprise DBs | Unlimited DB size due to 64 bit support (50 million records tested)
Fast response time | Multithreaded design with full support of multi core architecture
Industry standard connectivity | MS SQL, Oracle, ODBC and CSV
Low Maintenance | Automatic DB Update Scheduler in background
159. Technical Background
Instant access, no loading time
Automatic update
Direct access to databases
Made for 64 bit systems and big databases
Load balancing available
Multiple KSMS servers
Security
Active Directory support
Secure communication
Administration through KTM remote or KTM local client possible
Separate installer
160. Kofax Search and Matching Server
Enterprise Customer DB
[Chart: KSMS Search Speed - 1 million records. Y axis: Search Operations / second (0 to 40); X axis: 0 to 24; series: Server (8 cores + HT), Server (24 Cores - no HT)]
168. KTM 5.5 – This and That
Drop Down Boxes in Table Cells
Drop Down Boxes Description | Value
RecoStar 5
Normalization
Format Locator Enhancements
Locator Dialogue & Testing
Sticky Notes (Annotations)
Docking and Zooming
Rotation
Script
169. KTM 5.5 – This and That: Dropdown Boxes in Table Cells
Validation Form Designer
Validation Form
Same script events as for normal combo boxes
170. KTM 5.5 – This and That: Recostar 5
Country and Language
171. KTM 5.5 – This and That: Recostar 5
Dictionaries
172. KTM 5.5 – This and That: Recostar 5
Zonal
173. KTM 5.5 – This and That: Normalization
Batch and Document structures
Memory or on disk?
174. KTM 5.5 – This and That: Format Locator Enhancements
Use and Sorting
175. KTM 5.5 – This and That: Locator Dialogue & Testing
176. KTM 5.5 – This and That: Sticky Notes (Annotations)
177. KTM 5.5 – This and That: Sticky Notes (Annotations)
Script events
Application_AnnotationCreated
Application_AnnotationSaved
178. KTM 5.5 – This and That: Docking and Zooming
Allow user to change the view - True/False
Docking
179. KTM 5.5 – This and That: Docking and Zooming
Allow user to change the view - True/False
Docking
The zoom value is stored separately for [Left/Right] and [Top/Bottom]
[Diagram: docking positions Top, Left, Right and Bottom around the Fields panel]
180. KTM 5.5 – This and That: Docking and Zooming
Fit to Width
181. KTM 5.5 – This and That: Docking and Zooming
182. KTM 5.5 – This and That: Rotation
Use case: User rights
183. KTM 5.5 – This and That: Rotation
Project script: Document_XDocPageRotated
184. KTM 5.5 – This and That: Scripting
Class Script
ValidationForm_ButtonDialogClosed
ValidationForm_AfterViewerLassoDrawn