Concept Searching provides solutions for automatic metadata generation, taxonomy management, and classification to improve search and business processes. Their product suite includes Concept Classifier and Taxonomy Manager which extract conceptual metadata from content and align it with taxonomies. This facilitates guided navigation, records management, and compliance. A demo will show how end users can search and tag in SharePoint 2010 leveraging conceptual metadata and how the products integrate with term stores and FAST Search.
Webinar - The Swiss Army Knife for SharePoint 2010 – Tagging, Term Store and Taxonomies
1. Don Miller, VP of Business Development
1 (408) 828-3400
donm@conceptsearching.com
John Challis, CTO and Founder
johnc@conceptsearching.com
The Swiss Army Knife of
SharePoint
2. Agenda – Tagging Taxonomies and Term Store
Concept Searching • Don Miller • (408) 828-3400 • donm@conceptsearching.com
First 20 Minutes
• Company Overview
• Review metadata, taxonomy, classification, search problem and value
• Case Study
• Review tagging approaches
• Review suggested taxonomy approaches to build out 2010 Term Store
• Concept Searching turn key approach to metadata management and term store development
Second 30 Minutes
• Product demo – End Users
– How does an end user search in 2010
– How does an end user tag in 2010
• How does conceptClassifier and Taxonomy Manager help 2010
– How do we accelerate process of tagging
– How do we improve end user search experience with “Guided Navigation”
• Product demo – Technical
– Improve Information Architecture Design
– Integration into 2010 Term Store
– Improve Taxonomy Design for Content Stewards
– Integration to FAST 2010
3. Concept Searching, Inc.
Company founded in 2002
Product launched in 2003
Focus on management of structured and unstructured information
Technology
Automatic concept identification, content tagging, auto-classification, taxonomy management
Only statistical vendor that can extract conceptual metadata
2009 and 2010 ‘100 Companies that Matter in KM’ (KMWorld Magazine)
KMWorld ‘Trend Setting Product’ of 2009 and 2010
Locations: US, UK, & South Africa
Client base: Fortune 500/1000 organizations
Managed Partner under Microsoft global ISV Program - “go to partner” for Microsoft for auto-classification and taxonomy management
Microsoft Enterprise Search ISV, FAST Partner
Software Product Suite: conceptSearch, conceptTaxonomyManager,
conceptClassifier, conceptClassifier for SharePoint, contentTypeUpdater
4. What Is Keyword vs. Metadata Costing You?
Data Privacy Protection
Problem
• Average cost per exposed record is $197 and ranges from $90 to $305 per record
• 70% of breaches are due to a mistake or malicious intent by an organization’s own staff
Solution
• Identify any type of organizationally defined privacy data
• Combines pattern matching with associated vocabulary
• Automatic Content Type updating enabling workflows and rights management
Benefit
• Cost avoidance: average cost runs from $225K to $35M
Search
Problem
• “It’s not about better search”
• Less than 50% of content is correctly indexed, meta tagged or efficiently searchable
• 85% of relevant documents are never retrieved in search
Solution
• Eliminate manual tagging & replace with automatic identification of multi-word concepts
• Provide guided navigation via the taxonomy structure (i.e. concepts)
• Go beyond dynamic clustering with conceptual clustering based on the taxonomies
Benefit
• Taxonomy navigation is 36% - 48% faster
• Savings 2.5 hours per user per day
Records Management
Problem
• 67% of data loss in Records Management is due to end user error
• It costs an organization $180 per document to recreate it when it is not tagged correctly and cannot be found
Solution
• Eliminate inconsistent end user tagging
• Automatically declare documents of record based on vocabulary and retention codes
• Automatically change the Content Type and route to the Records Management repository
Benefit
• Savings of $4.00 - $7.04 per record by eliminating manual tagging
• Ensures compliance and reduces potential litigation exposures
Pre Migration/Collaboration
Problem
• 60% of stored documents are obsolete
• 50% of documents are duplicates
• Requires resources to identify content alignment and what should/should not be migrated
Solution
• Eliminate duplicate documents
• Identify privacy data exposures
• Identify and declare records that were not previously identified
• Notify users of high value content
• Migrating required content to a structure
Benefit
• Reduces migration costs
• Ensures compliance and protection of content assets
• Easy end user updates
5. Tel: 703.246.9360 | Fax: 240.465.1182
USAF Human Performance Clearinghouse
GOAL : Leverage Existing USAF, AFDW, and AFMS License Agreements to
Enable IM, RM, & Privacy & Security Compliance
Requirements
• DoDD 8320 (Data Sharing in a Net-Centric DoD)
• DoDD 5015 (Records Management)
• USAF Privacy Act Program & HIPAA
• Freedom of Information Act (FOIA)
Distribution Statement A: Approved for public release; distribution is unlimited.
311 ABG/PA No. 09-488, 16 Oct 2009
Migration
Data Privacy
Records Management
Search
eDiscovery & FOIA
6. Accurate metadata requires three components: taxonomy management, classification and search
Taxonomy Management or IA Design
• There is no such thing as an auto taxonomy generation tool. Without taxonomy, you cannot align content or guide users to content.
Classification and Content Alignment
• Without classification your taxonomy does not connect to search or records management. There is no alignment.
Search and Validation
• 100% of results. Without search, you cannot validate taxonomy and classification. Should support multi-word terms, because single keywords are ambiguous.
An enterprise metadata generation solution requires all three; otherwise the results are inferior and cumbersome or impossible to implement.
7. A Manual Metadata Approach Will Fail 95%+ Of The Time
Issue | Organizational Impact
Inconsistent | Less than 50% of content is correctly indexed, meta-tagged or efficiently searchable, rendering it unusable to the organization (IDC)
Subjective | Highly trained Information Specialists will agree on meta tags between 33% - 50% of the time (C. Cleverdon)
Cumbersome - Expensive | Average cost of manually tagging one item runs from $4 - $7 per document and does not factor in the accuracy of the meta tags nor the repercussions from mis-tagged content (Hoovers)
Malicious Compliance | End users select first value in list (Perspectives on Metadata, Sarah Courier)
No perceived value for end user | What’s in it for me? End user creates document, does not see value for organization nor risks associated with litigation and non-conformance to policies.
What have you seen | Metadata will continue to be a problem due to inconsistent human behavior
The answer to consistent metadata is an automated approach that can extract the meaning from content, eliminating manual metadata generation yet still providing the ability to manage knowledge assets in alignment with the unique corporate knowledge infrastructure.
8. conceptClassifier’s TaxonomyManager automated metadata approach drives business value
Create enterprise automated metadata framework/model
• Average return on investment minimum of 38% and runs as high as 600% (IDC)
Apply consistent meaningful metadata to enterprise content
• Incorrect meta tags cost an organization $2,500 per user per year, in addition to potential costs for non-compliance (IDC)
Guide users to relevant content with taxonomy navigation
• Savings of $8,965 per year per user based on an $80K salary (Chen & Dumais)
• 100% “Recall” of content, 35% faster access to content (“Precision”)
Use automatic conceptual metadata generation to improve Records Management
• Eliminate inconsistent end user tagging at $4-$7 per record (Hoovers)
• Improve compliance processes, eliminate potential privacy exposures
1. Model and Validate
2. Automate Tagging
3. Findability
4. Business Processes
5. Records Management and PII
6. Life Cycle Management
9. Core search expertise drives business value
Concept Extraction – the ability to extract ‘concepts in context’
• Only statistical metadata generation and classification company that can extract
concepts from content as it is created or ingested
triple heart bypass
Triple: Baseball, Three
Heart: Organ, Center
Bypass: Highway, Avoid
conceptClassifier will generate conceptual metadata by extracting multi-word terms, identifying ‘triple heart bypass’ as a concept as opposed to three single keywords
• Search will return results based on the concept even if the exact terms are not contained in
the document (i.e. ‘coronary artery surgery’, ‘heart surgery’)
• Metadata can be used by any search engine index or any application/process that uses
metadata
10. conceptClassifier and Taxonomy Manager Value Propositions
• No Behavior Modification for tagging or searching
• Accurate metadata (Metadata is descriptive information about
information)
• Accelerate and validate the building out of Taxonomies for Business Applications (THERE IS NO SUCH THING AS AUTOGENERATING A TAXONOMY!!!). If nothing else, start with the file share folder structure.
• Align content with Business Requirements
• Drive: Records Management, PII, Collaboration, Findability, EIA and
Taxonomy Based Applications
11. Start
• File Share
• Corporate WWW Site Directory
• Industry Standard – ~50% of terms; but which 50% are in alignment with
your content?
• Align with Business Goals
• Search Logs
Optimize
• Subject Matter Expert Interviews
• Card Sorts/Tree Builds
• Clustering
Validate
• Classify against Taxonomy
• Guided Navigation
Manual taxonomy approaches to build out 2010 Term Store
However, the real problem will still be that until you force end users to manually tag, your taxonomy is of little value!
12. Taxonomy Manager aligns the content and provides “Guided Navigation”
• Enterprise Architects / Content Editors can ensure alignment with taxonomy
• After 100% of Results are returned, leverage metadata for guided navigation and refiners
• Accelerate document finding [PRECISION] by a minimum of 35%
I want all proposals in two specific regions. I could then have a guided refiner for vertical, amount, etc.
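The proposal query on this slide can be sketched in a few lines of Python. This is a minimal illustration only, not conceptClassifier or SharePoint code: the sample documents, the field names (`doc_type`, `region`, `vertical`) and the helpers `filter_docs` and `refiner_counts` are all hypothetical. It shows the idea behind a guided refiner: filter on metadata first, then count the remaining values of another metadata field so the user can narrow further.

```python
from collections import Counter

# Hypothetical tagged documents; in practice the metadata would come from
# classifying content against the taxonomy.
docs = [
    {"title": "Proposal A", "doc_type": "Proposal", "region": "EMEA", "vertical": "Finance"},
    {"title": "Proposal B", "doc_type": "Proposal", "region": "APAC", "vertical": "Healthcare"},
    {"title": "Proposal C", "doc_type": "Proposal", "region": "EMEA", "vertical": "Healthcare"},
    {"title": "Contract D", "doc_type": "Contract", "region": "EMEA", "vertical": "Finance"},
    {"title": "Proposal E", "doc_type": "Proposal", "region": "Americas", "vertical": "Finance"},
]

def filter_docs(docs, doc_type, regions):
    """Keep documents of one type that fall in any of the given regions."""
    return [d for d in docs if d["doc_type"] == doc_type and d["region"] in regions]

def refiner_counts(docs, field):
    """Count the values of a metadata field to offer as refiner choices."""
    return Counter(d[field] for d in docs)

# "All proposals in two specific regions", then a refiner on vertical.
proposals = filter_docs(docs, "Proposal", {"EMEA", "APAC"})
vertical_refiner = refiner_counts(proposals, "vertical")
```

The refiner is only as good as the metadata behind it: if `region` or `vertical` is missing or inconsistently tagged, matching documents silently drop out of the result set, which is the precision problem the slide describes.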
13. Dynamic clustering is not Guided Navigation for “Proposals”
• Most clustering is keyword based
• Brings back clusters, they are best guesses
• Clustering is not a starting point taxonomy, good for clues about a term or concept.
• They might help, they might make it worse
• Better than nothing, but not a long term strategy or evolution of keyword search
Dynamic navigation (CLUSTERING) is helpful, but how does an information worker know when it is a good topic or not? This is NOT PRECISION!
14. Clustering can be used to drive relevant clues about a term
15. Search allows us to validate our terms and the clues for our terms
16. conceptClassifier and taxonomyManager
We Make Metadata Work For You
Automatic Conceptual Metadata Generation
Automated Classification
Taxonomy Development & Management
• Proven to reduce taxonomy development by 80%
Microsoft Integration
• Runs natively in SharePoint 2007 and SharePoint 2010, Microsoft Office Applications, SharePoint Search and FAST, Windows Server 2008 R2 FCI
• Fully integrated with SharePoint Content Types
Content Type Updater
• Automatically changes the Content Type based on presence of organizationally defined metadata found within the document
• Identification of confidential/privacy data
• Ability to identify records based on the records retention schedule and route to the records center
Technology
• Downloadable in 30 minutes – no programming required
• Fully SOA compliant, delivered as Web Parts, based on open standards
• Highly scalable
17. conceptClassifier for FAST Search
• Improves search outcomes by placing conceptual metadata in the FAST Search index to increase relevancy of search results
• Enables import of FAST Entities into the conceptClassifier taxonomy manager to fine-tune them with metadata generated from your own content and nomenclature
• Runs natively as a FAST Pipeline Stage, eliminating integration and customization issues
• Eliminates vocabulary normalization issues across global boundaries through controlled vocabularies
• Improves faceted search results as facets are based on concepts aligned with the taxonomy
• Provides taxonomy browse capabilities based on the nodes within the corporate taxonomy(s)
• Provides accurate metadata filters such as numeric range searching and wildcard alphanumeric matching
• Removes documents from search results that are confidential/sensitive through automatic Content Type updating and routing to secure server
• Automatically tags content with both vocabulary and retention codes and respects SharePoint security that could prevent access to the document once it has been declared a record
19. Please feel free to call me about your term store questions and
best of breed approaches.
Don Miller, VP of Business Development
1 (408) 828-3400
donm@conceptsearching.com
Editor's notes
It is important to note that metadata, auto-classification, and taxonomies are not applications – the business value of these tools is often realized when they are integrated with other solutions, such as the offerings of the other participants in this panel
Let’s look at where these tools can complement other solutions and improve business processes
CLICK: Migration:
With the vast amounts of content - moving all content doesn’t make sense and using valuable resources to identify what should/should not be migrated isn’t a good use of time or money
Before the migration you can use these technologies to:
Eliminate duplicate documents
Identify documents that contain confidential or privacy data
Identify and declare records
Identify high value content
Savings: We had one client who needed to manually tag 45K marketing documents and estimated that it would take 6 months with 2 full-time people – with our tools it took 2 weeks
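Of the pre-migration steps listed above, duplicate elimination is the simplest to illustrate. The sketch below is a generic technique (content hashing with Python's standard `hashlib`), not a description of how conceptClassifier detects duplicates; the file names and the helpers `content_fingerprint` and `find_duplicates` are hypothetical. It only catches copies that are identical after case and whitespace are normalized.

```python
import hashlib

def content_fingerprint(text):
    """Hash lowercased, whitespace-normalized text so trivial differences are ignored."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def find_duplicates(docs):
    """Group document names by fingerprint; return only groups with more than one member."""
    groups = {}
    for name, text in docs.items():
        groups.setdefault(content_fingerprint(text), []).append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]

# Hypothetical file share contents (name -> extracted text).
docs = {
    "q1_plan.txt": "Marketing plan for Q1",
    "q1_plan_copy.txt": "marketing  plan for q1 ",
    "budget.txt": "Budget forecast",
}
duplicates = find_duplicates(docs)
```

Near-duplicates (a revised paragraph, a different footer) would need a similarity measure rather than an exact hash, which is outside the scope of this sketch.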
CLICK Search:
The age-old problem is how to get end users to tag content – it’s estimated that less than 50% of content is correctly indexed, meta tagged or efficiently searchable – it isn’t about what search engine you use
Statistics still claim that end users spend 15% of their time duplicating information, 25% searching, and 40% can’t find what they need to do their jobs
Automatic generation of conceptual metadata removes the end user from the tagging process. HUMANS WON’T TAG CONTENT THROUGH FORMS, PICKLISTS, OR DROP DOWNS; WE WILL ALWAYS FIND WAYS TO AVOID TAGGING
Content, once tagged can be provided to any search engine index to deliver more accurate search results
Using the taxonomy users can more efficiently find relevant information via the hierarchical structure
Savings: 2.5 hours per day per user
CLICK Records Management:
The problem cited most frequently is inconsistent end user tagging in the declaration of records
With metadata generation and a taxonomy that mirrors the file plan – documents can be automatically declared records based on the concepts and descriptors within the document
Based on custom Content Types in SharePoint the document can be declared a record and routed to the RM repository
Savings: $4 - $7.04 per document record
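The records flow described above (a taxonomy that mirrors the file plan, declaration based on concepts found in the document, then a Content Type change and routing) can be sketched as follows. Everything here is an assumption made for illustration: the taxonomy nodes, the retention codes, the concept lists and the functions `declare_record` and `route` are hypothetical, and simple substring matching stands in for real classification.

```python
# Hypothetical file plan: taxonomy node -> retention code.
FILE_PLAN = {
    "Contracts": "RET-7Y",
    "Personnel Records": "RET-30Y",
}

# Hypothetical vocabulary: multi-word concepts that signal each taxonomy node.
NODE_CONCEPTS = {
    "Contracts": ["master services agreement", "statement of work"],
    "Personnel Records": ["performance review", "employment history"],
}

def declare_record(text):
    """Return (node, retention_code) for the first matching node, or None."""
    lowered = text.lower()
    for node, concepts in NODE_CONCEPTS.items():
        if any(concept in lowered for concept in concepts):
            return node, FILE_PLAN[node]
    return None

def route(text):
    """Decide a content type and destination from the declaration result."""
    declared = declare_record(text)
    if declared is None:
        return {"content_type": "Document", "destination": "collaboration site"}
    node, code = declared
    return {
        "content_type": "Record",
        "node": node,
        "retention_code": code,
        "destination": "records repository",
    }

decision = route("Signed Statement of Work for Acme")
```

The point of the sketch is that the end user never picks a retention code: the decision follows from the document's content and the file plan.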
CLICK Data Privacy Protection
Taxonomy(s) can be created to identify any organizationally defined confidential information
When content is created or ingested, the document can be identified as containing confidential information and, using Content Type updating, routed to a secure location and locked down using Windows Rights Management
Cost Avoidance: Average cost of a data exposure is $225K - $35 million
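The deck describes privacy detection as combining pattern matching with associated vocabulary. A minimal sketch of that combination, using Python's standard `re` module, is shown below. The pattern (the US Social Security number layout NNN-NN-NNNN), the vocabulary list and the function `contains_privacy_data` are assumptions chosen for illustration, not the product's rules.

```python
import re

# US Social Security number layout (NNN-NN-NNNN); used only as an example pattern.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical vocabulary that makes a pattern hit more likely to be real privacy data.
PRIVACY_TERMS = ("social security", "ssn", "date of birth")

def contains_privacy_data(text):
    """Flag text only when a pattern hit and a vocabulary term both occur."""
    lowered = text.lower()
    has_pattern = SSN_PATTERN.search(text) is not None
    has_vocabulary = any(term in lowered for term in PRIVACY_TERMS)
    return has_pattern and has_vocabulary

flagged = contains_privacy_data("Employee SSN: 123-45-6789")
not_flagged = contains_privacy_data("Part number 123-45-6789")
```

Requiring both signals is the design choice worth noting: the pattern alone would flag part numbers that happen to share the layout, and the vocabulary alone would flag any document that merely mentions the topic.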
Tying this all together and seeing how it works in the real world:
The USAF Human Performance Clearinghouse (HPC) is an enterprise solution that serves the USAF Human Systems Integration Community. Led by the Air Force Medical Service (AFMS), the HPC leverages SharePoint and conceptClassifier to deliver real-time collaboration, Information/Content Management, Knowledge Management, Taxonomy Management, automatic metadata tagging, and automated Windows Rights Management to over 75 locations worldwide.
US Air Force Medical Service
Initially deployed conceptClassifier to power Knowledge Portal with over 65K users
Controlled vocabulary consists of over 27K unique keywords, metadata, and multi-word fragments generated by conceptClassifier
They are now using it to solve a variety of challenges
CLICK THROUGH ARROWS:
CLICK: Migration
CLICK: Data Privacy
CLICK: Search
CLICK: Although we didn’t talk about this, the US Air Force HPC also uses it for eDiscovery and Freedom of Information Act
conceptClassifier provides technologies that are natively integrated with SharePoint and delivers the missing pieces, including the conceptual metadata generation, auto-classification, and taxonomy management tools that can be used to leverage your metadata and improve business outcomes, resulting in a tangible ROI and reduced costs and organizational risk.
Traditional search assumes the end user knows what they are looking for, or must enter the ‘right’ combination of words to get the ‘right’ result.
Knowledge workers need to identify content in the context of what they are seeking. The fundamental problem with search solutions is that they are based on an index of single words. Yet most queries are expressed in short patterns of words and not single words in isolation – which are highly ambiguous. In the example above, a search engine would identify all the documents that contained the words: triple, heart, bypass instead of documents that contained the concept of ‘triple heart bypass’. Since the concept has been identified, other documents that have related concepts will be identified even if they do not contain that exact phrase.
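The difference between matching single words and matching a multi-word term can be shown with a small sketch. The two sample sentences and the functions `tokenize`, `keyword_match` and `phrase_match` are hypothetical, and the code illustrates only the ambiguity problem: it does not attempt the statistical identification of related concepts (for example ‘heart surgery’) that the notes attribute to the product.

```python
def tokenize(text):
    """Lowercase the text, replace punctuation with spaces, and split into words."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return cleaned.split()

def keyword_match(query, text):
    """Single-word matching: every query word appears somewhere in the text."""
    words = set(tokenize(text))
    return all(q in words for q in tokenize(query))

def phrase_match(query, text):
    """Multi-word matching: the query words appear together, in order."""
    q, t = tokenize(query), tokenize(text)
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))

medical = "The patient recovered well after a triple heart bypass."
unrelated = "He hit a triple, then took the bypass road through the heart of town."

query = "triple heart bypass"
keyword_hits = [keyword_match(query, medical), keyword_match(query, unrelated)]
phrase_hits = [phrase_match(query, medical), phrase_match(query, unrelated)]
```

Both sentences contain all three words, so the single-word approach returns both; only the medical sentence contains the term as a unit.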
Metadata generation is a growing concern in enterprises, not only for search but also for records management, compliance, and enterprise content management. A comprehensive approach requires more than syntactic metadata, and requiring end users to add rich metadata is haphazard and subjective at best. Since conceptClassifier for SharePoint is no longer restricted to keyword identification, compound term metadata can be automatically generated when the content is created or ingested. Concept-based metadata generation extracts compound terms and keywords from a document or corpus of documents that are highly correlated to a particular concept. By identifying the most significant patterns in any text, these compound terms can then be used to generate non-subjective metadata based on an understanding of conceptual meaning.
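As a rough illustration of compound term extraction, the sketch below counts recurring two- and three-word sequences across a tiny corpus. It is a deliberately simple frequency count and should not be read as conceptClassifier's statistical engine, whose method is not described in these notes; the stopword list, the sample corpus and the functions `candidate_terms` and `compound_terms` are assumptions.

```python
from collections import Counter

# Small illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "is", "was", "after"}

def candidate_terms(text, max_words=3):
    """Yield 2..max_words word sequences that neither start nor end with a stopword."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    words = [w for w in words if w]
    for n in range(2, max_words + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            yield " ".join(gram)

def compound_terms(corpus, min_count=2):
    """Return multi-word terms that recur across the corpus, most frequent first."""
    counts = Counter(term for doc in corpus for term in candidate_terms(doc))
    return [(term, count) for term, count in counts.most_common() if count >= min_count]

corpus = [
    "Recovery after triple heart bypass surgery.",
    "Triple heart bypass is a common procedure.",
    "Risks of heart bypass in older patients.",
]
terms = dict(compound_terms(corpus))
```

Terms that survive the `min_count` threshold are candidates a subject matter expert could review when building out a taxonomy; a production approach would weigh them statistically rather than by raw count.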
Compound term processing can address many challenges facing large enterprises and provide many benefits. Identification of concepts within a large corpus of information removes the ambiguity in search and eliminates inconsistent meta-tagging, while automatic classification and taxonomy management based on concept identification simplify development and ongoing maintenance.
conceptClassifier for SharePoint is fully integrated with SharePoint, Microsoft Office, Windows Server 2008 R2 FCI, FAST and Microsoft Enterprise Search.
The automatic extraction of compound terms enables the Subject Matter Expert (SME) to use the terms within the taxonomy generation process, reducing the time to build out and maintain taxonomies by 80%.
Features:
Downloadable in 30 minutes – no programming required
Automatic classification and compound term meta data extraction
Classification technology uses concept extraction and compound term processing
Taxonomy based and faceted navigation
Robust suite of tools to build and maintain taxonomies
Fully integrated with Content Types
Automatic classification from MS Office and Outlook
Taxonomy browse, faceted navigation, and preview functionality from the search interface
Can automatically classify from SharePoint, folders, and web sites providing a single interface to all permissible content
Simple intuitive interface designed for the SME
Fully SOA compliant, delivered as Web Parts, based on open standards
Integrates with Microsoft Office, Microsoft Records Center
The Only Microsoft Solution that Runs Natively in ...
FAST Search, SharePoint 2007, 2010, Windows Server R2 FCI, and Microsoft Office
conceptClassifier provides the tools to rapidly build and easily manage unstructured content. By providing automatic conceptual metadata generation, automated classification and taxonomy management, it lets organizations harness the power of content not only to improve findability within the FAST Search product suite, but also to drive additional business processes such as records management and compliance, and to enforce governance.
The Only FAST Search Solution that ...
Automatically Generates Conceptual Metadata
Utilizing our unique concept identification and extraction capabilities, conceptClassifier’s statistical engine can identify, out of the box, all the meaningful concepts resident within an organization’s own information repositories and automatically generate semantic metadata that is unique to the organization and its nomenclature.
Because conceptual multi-word term metadata is generated automatically and placed in the FAST Search index, searches can be performed with a higher degree of accuracy: the ambiguity inherent in single words is no longer a problem.
Utilizing the Concept Searching technology framework, end users can now search on concepts, delivering a multi-dimensional view of relevant information and easily identify the relationships between content assets that otherwise may not have been found.
The Only FAST Search Solution that ...
Eliminates Manual Metadata Tagging
The Only FAST Search Solution that...
Delivers Innovative, Intuitive, & Rapidly Deployed Taxonomy Management Managed by Business Users