The document discusses conceptClassifier, a product from Concept Searching that provides automatic semantic metadata generation and classification of documents in SharePoint. It extracts concepts using compound term processing to tag documents with metadata and classify them into appropriate taxonomy nodes. This reduces the time and cost of manual metadata tagging while improving search, navigation, and other business processes that rely on high-quality metadata. The product demonstrates how metadata can be generated, applied consistently across document sources, and used to drive governance, records management, compliance and other enterprise initiatives when integrated with SharePoint.
2. Speakers NS Rana – Business Productivity Advisor, Microsoft. NS is a 19-year IT industry veteran. For the last 12 years NS has worked at Microsoft in various roles, helping organizations both large and small achieve their full potential by effectively adopting and deploying Microsoft technologies. In his current role as a Business Productivity Advisor, he defines, manages and delivers solution scenarios that include Collaboration, Enterprise Content Management, Enterprise Search, and Business Intelligence, powered by technologies like Microsoft Office and SharePoint. Donald T. Miller – Vice President Business Development, Concept Searching. With over 20 years of experience, Don Miller is an industry veteran of search and information management solutions and is the Vice President of Business Development for Concept Searching. Val Orekhov – Chief Architect, Portal Solutions. Val is the Chief Architect at Portal Solutions. Portal Solutions is a leading systems integrator of Microsoft solutions and recently published a white paper on the next step in knowledge management, which is knowledge optimization. Knowledge Optimization is positioned as the next logical evolution of Knowledge Management, and continues the company’s long tradition of thought leadership in the enterprise software industry. Concept Searching • Martin Garland • +1 (703) 531-8567 • marting@conceptsearching.com
20. Metadata Drives the Enterprise
21. The Enterprise Information Problem – Metadata is Everywhere Metadata Problems
22. Traditional Manual Metadata Approaches Bleed 100Ks of Dollars From Your Company. A manual metadata approach is unacceptable: INACCURATE x INCONSISTENT = INCOMPLETE & UNACCEPTABLE
23. conceptClassifier and taxonomyManager are changing the metadata market. What are Enterprise Metadata Approaches? [Chart: metadata approaches plotted on two axes, Productivity Gains and Cost Savings, each running Low to High. One group of approaches is labeled "Requires Domain Expertise", "Cumbersome and slow to build", "Requires Boolean Logic/Developer"; another is labeled "Requires Domain Expertise", "Requires large training sets per node", "Cumbersome and slow to build". Callout: "Concept Searching has changed the game!"]
30. Reduce IT infrastructure costs with metadata management for life cycle management
33. Search will return results based on the concept even if the exact terms are not contained in the document (e.g. ‘coronary artery surgery’, ‘heart surgery’)
34. Can be used by any search engine index or any application/process that uses metadata. [Diagram: ambiguity of single words. Triple: Baseball, Three. Heart: Organ, Center. Bypass: Highway, Avoid.]
59. Provides a single search interface to end users from within SharePoint to multiple repositories (SharePoint, file stores, web sites). [Diagram: "We Make Metadata Work For You". Labels: Taxonomy Development Management; MS Office Integration for Metadata; Faceted & Taxonomy Navigation Plus Text Preview; Full Integration with Content Types; Single Classification Interface to SharePoint, File Stores, & Websites; MOSS Record Center Workflow Automation.]
62. Content will be automatically classified to one or more nodes based on concepts within the content
63. Reduces time to develop, build, and maintain a taxonomy by as much as 80%
64. Can import industry standard taxonomies
68. Editable from within SharePoint & the Concept Searching Taxonomy Manager
72. Authorized users have complete control over automatically generated metadata
90. Automatically classify and place semantic metadata in search engine index
106. Improves any business process that requires metadata. We make metadata work for you!
The key points on this slide are:
- In business since 2002, with first customers in 2003.
- Major enterprises with up to 66,000 users have deployed successfully to manage unstructured data.
- Owned by the founders, with no external investment. Profitable, with 35% growth in 2008 and already trading for similar growth in 2009.
- An increasing number of specialized partners in this space are buying into our value proposition.
Concept Searching was founded in 2002 with the goal of developing statistical search and classification products that delivered critical functionality unavailable in the marketplace at the time. The products were launched in 2003, and Concept Searching has experienced growth and profitability every year since. Concept Searching is the only statistical classification software company in the world that uses concept extraction and compound term processing to achieve the highest precision without loss of recall. Our products are the only solutions that are fully integrated with MOSS and Microsoft Search. In side-by-side comparisons against industry leaders, Concept Searching has been able to dramatically illustrate the strength of the technology. Concept Searching counts an ever-growing number of global, Fortune 500 and Fortune 1000 clients, and we have built a strong partnership channel with Microsoft Partners. By continuing to invest in product development, Concept Searching is defining new standards for the search and classification industry and is committed to delivering quantifiable business benefits to organizations around the world.
Traditional search assumes the end user knows what they are looking for, or must enter the ‘right’ combination of words to get the ‘right’ result. Knowledge workers need to identify content in the context of what they are seeking. The fundamental problem with search solutions is that they are based on an index of single words. Yet most queries are expressed as short patterns of words, not as single words in isolation, which are highly ambiguous. In the example above, a search engine would identify all the documents that contain any of the words ‘triple’, ‘heart’, or ‘bypass’ instead of documents that contain the concept of ‘triple heart bypass’. Once the concept has been identified, other documents that have related concepts will be identified even if they do not contain that exact phrase.
Metadata generation is a growing concern in enterprises, not only for search but also for records management, compliance, and enterprise content management. A comprehensive approach requires more than syntactic metadata, and requiring end users to add rich metadata is haphazard and subjective at best. Because conceptClassifier for SharePoint is not restricted to keyword identification, compound term metadata can be generated automatically when content is created or ingested. Concept-based metadata generation extracts compound terms and keywords that are highly correlated to a particular concept from a document or corpus of documents. By identifying the most significant patterns in any text, these compound terms can then be used to generate non-subjective metadata based on an understanding of conceptual meaning.
Compound term processing can address many challenges facing large enterprises and provide many benefits. Identification of concepts within a large corpus of information removes the ambiguity in search and eliminates inconsistent meta-tagging, while automatic classification and taxonomy management based on concept identification simplify development and ongoing maintenance.
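To make the single-word ambiguity described above concrete, here is a minimal sketch that contrasts matching on any single query word with matching on the adjacent phrase ‘triple heart bypass’. The toy documents and both matcher functions are assumptions made up for illustration; this is not Concept Searching’s algorithm, and it does not attempt the related-concept matching the notes describe.

```python
# Illustrative only: toy documents and naive matchers, not the product's algorithm.
DOCS = {
    "cardiology": "survival rates after a triple heart bypass operation",
    "baseball": "he hit a triple in the heart of the order",
    "traffic": "the new highway bypass opened with a triple lane",
}

def keyword_match(query, docs):
    """Return ids of documents that contain at least one single query word."""
    words = set(query.lower().split())
    return sorted(d for d, text in docs.items() if words & set(text.lower().split()))

def compound_match(phrase, docs):
    """Return ids of documents that contain the phrase words as one adjacent run."""
    target = phrase.lower().split()
    n = len(target)
    hits = []
    for d, text in docs.items():
        tokens = text.lower().split()
        if any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1)):
            hits.append(d)
    return sorted(hits)

keyword_hits = keyword_match("triple heart bypass", DOCS)
compound_hits = compound_match("triple heart bypass", DOCS)
print(keyword_hits)   # all three documents share at least one word
print(compound_hits)  # only the cardiology document contains the compound term
```

The single-word matcher returns the baseball and traffic documents as well, which is the loss of precision that the notes attribute to indexes built on isolated words.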
“At last a tool set that enables enterprise content to be the driver for business productivity.” Concept Searching provides a comprehensive suite of tools for automatic semantic metadata generation, automated classification and taxonomy management of enterprise content. The ability to identify ‘concepts in context’ generates far richer metadata, improving precision and relevancy in the information retrieval process.
Metadata generation is a growing concern in large enterprises. A comprehensive approach requires more than syntactic metadata (i.e. date, author, title), and requiring end users to add rich metadata is haphazard and subjective at best. Because Concept Searching’s technology is not restricted to keyword identification, compound term metadata can be generated automatically when content is created or ingested. Concept-based metadata generation extracts compound terms and keywords that are highly correlated to a particular concept from a document or corpus of documents. By identifying the most significant patterns in any text, these compound terms can then be used to generate non-subjective metadata based on an understanding of conceptual meaning. Meta-tags are automatically added to the properties field of each document, making the document more valuable to the organization because it is easier to retrieve with Microsoft search products that use keywords and metadata to retrieve information. conceptClassifier for SharePoint is fully integrated with SharePoint, Microsoft Office, Exchange, FAST and Microsoft Enterprise Search.
The automatic extraction of compound terms enables the Subject Matter Expert (SME) to use the terms within the taxonomy generation process, reducing the time to build out and maintain taxonomies by 80%. (Compound Term Processing performs matching on the basis of compound terms as opposed to keywords. Compound terms are built by combining two or more simple terms; for example, ‘triple’ is a single-word term but ‘triple heart bypass’ is a compound term. By identifying and forming compound (multi-word) terms and placing these in the search engine’s index, the search can be performed with a greater degree of accuracy because the ambiguity inherent in single words is no longer a problem. A search for ‘survival rate after triple bypass surgery’ will locate documents about this topic even if the precise phrase is not contained in any of the documents. A traditional search query would return all documents that contain the word ‘triple’, all documents that contain ‘heart’, and all documents that contain ‘bypass’.)
Features:
- Downloadable in 30 minutes, no programming required
- Automatic classification and compound term metadata extraction
- Classification technology uses concept extraction and compound term processing
- Taxonomy-based and faceted navigation
- Robust suite of tools to build and maintain taxonomies
- Fully integrated with Content Types
- Automatic classification from MS Office and Outlook
- Taxonomy browse, faceted navigation, and preview functionality from the search interface
- Can automatically classify from SharePoint, folders, and web sites, providing a single interface to all permissible content
- Simple, intuitive interface designed for the SME
- Fully SOA compliant, delivered as Web Parts, based on open standards
- Integrates with Microsoft Office, Microsoft Records Center, and the Microsoft Business Data Catalog
A taxonomy is a classification structure represented by a hierarchical view of topics that have been grouped together because they share the same quality or characteristic. A taxonomy provides a unified view of, and access to, relevant information across often dispersed silos of information. Concept Searching supports multiple taxonomies within an organization. Taxonomy development is traditionally a very time-consuming and costly activity. Our Taxonomy Manager has been proven to reduce taxonomy development time by 80%, generating a time savings of 6-12 months and a cost savings of $150K - $300K. Concept Searching also has a robust and frequently expanding library of off-the-shelf taxonomies, covering nearly any industry, to help jumpstart a classification project. Subject Matter Experts (SMEs) can easily build one or more taxonomies and classify documents into predefined categories based on a small number of descriptors or clues. Once classified, the documents can then be applied to a corporate taxonomy and made available to the organization. The taxonomy management features include:
- Ability to change the node weighting (score)
- Auto clue suggestion: automatic generation of node clues from compound terms found in the document corpus, eliminating training sets and complex Boolean rules
- Dynamic screen updating: the user interface is fully AJAX enabled, so changes to the taxonomy are immediately available for further refinement
- Document movement feedback: this feature enables the SME to see the cause and effect on the taxonomy without re-indexing
Compound term processing is a new approach to an old problem. Instead of identifying single keywords, compound term processing identifies multi-word terms that form a complex entity and treats them as a concept. By deriving these compound terms from the client’s own document corpus, we can tag content with meaningful semantic metadata and enable Microsoft’s Enterprise Search to filter across that metadata at retrieval. This delivers a higher degree of accuracy because the ambiguity inherent in searching against single words in isolation is no longer a problem. As a result, a search for “survival rates following a triple heart bypass” will locate documents about this topic even if this precise phrase is not contained in any document. The unique compound term processing identifies compound terms (not keywords) from highly relevant content, and these can be used to trigger the automatic meta-tagging and auto-classification processes. This conceptual metadata is added to the original metadata for the category/folder.
The more semantic metadata that can be linked to a document or record, the more useful that information becomes to the organization.
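As a rough illustration of deriving compound terms from a client’s own corpus, the sketch below keeps multi-word terms that recur across several documents. The stopword list, the document-frequency threshold, and the function names are all assumptions chosen for the example; the notes say only that the approach is statistical, and this simple counting heuristic is a stand-in rather than the vendor’s method.

```python
from collections import Counter

# Assumed stopword list, kept tiny for the example.
STOPWORDS = {"a", "an", "the", "of", "after", "for", "and", "in", "to", "is"}

def candidate_terms(text, max_len=3):
    """Yield runs of 2..max_len adjacent tokens that contain no stopword."""
    tokens = text.lower().split()
    for n in range(2, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if not any(t in STOPWORDS for t in gram):
                yield " ".join(gram)

def extract_compound_terms(corpus, min_docs=2):
    """Keep candidate terms that occur in at least min_docs documents."""
    doc_freq = Counter()
    for text in corpus:
        # Count each term once per document (document frequency).
        doc_freq.update(set(candidate_terms(text)))
    return sorted(t for t, c in doc_freq.items() if c >= min_docs)

CORPUS = [
    "recovery after triple heart bypass surgery",
    "triple heart bypass outcomes for older patients",
    "heart bypass surgery waiting times",
]
terms = extract_compound_terms(CORPUS)
print(terms)
```

Terms that survive the threshold, such as ‘triple heart bypass’, are the kind of candidates that could then be written to a document’s properties as metadata or suggested as taxonomy clues.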
Following the automatic generation (tagging) of compound terms and semantic metadata, the documents in the document libraries are automatically classified to one or more categories within the taxonomy. The generated terms can be edited from within SharePoint or from within the Taxonomy Manager tool. The content remains in, and can be accessed from, its original location, but it can be linked to multiple categories/nodes.
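Classification of a document to one or more taxonomy nodes from weighted clues could be sketched as follows. The taxonomy contents, the weights, the threshold, and the additive scoring rule are hypothetical, chosen only to show the idea of node clues with adjustable weighting; the product’s actual scoring is not described in these notes.

```python
# Hypothetical taxonomy: node name -> {clue phrase: weight}.
# Names, weights and the threshold are invented for illustration.
TAXONOMY = {
    "Cardiology": {"heart bypass": 60, "coronary artery": 40},
    "Road Transport": {"highway bypass": 60, "traffic": 30},
}

def classify(text, taxonomy, threshold=50):
    """Return {node: score} for each node whose matched clue weights reach the threshold."""
    lowered = text.lower()
    scores = {}
    for node, clues in taxonomy.items():
        # Sum the weights of every clue phrase found in the text.
        score = sum(weight for clue, weight in clues.items() if clue in lowered)
        if score >= threshold:
            scores[node] = score
    return scores

result = classify("Recovery after heart bypass and coronary artery surgery", TAXONOMY)
print(result)
```

Because every node is scored independently, a document can land in several nodes at once while staying in its original location, which matches the linking behavior described above.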
Enterprises increasingly understand the value of, and the critical need for, using Content Types to structure their content and identify the type of a document regardless of its physical site or library storage location. Content Types can be used to enforce metadata governance, adhere to policies and drive workflows in line with business processes. The new release includes the ability to assign taxonomies to specific Content Types. Documents that correspond to the selected Content Types will be classified; documents that do not correspond to a Content Type, or that lack metadata elements a specific Content Type requires, will not be classified. This essential functionality allows different taxonomies to be assigned to different Content Types. For example, assign the HR taxonomy to all Content Types of type “HR”, including any Content Types derived from “HR”, and assign the Finance taxonomy to all Content Types of type “Finance”, including any Content Types derived from “Finance”. The configuration can be performed using a wizard that runs inside SharePoint. The taxonomies will be available for these documents regardless of their location. conceptClassifier’s site columns and Event Handlers are associated with the Content Types, which makes it possible to add classification functionality automatically to new sites when they are created.
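The rule that a taxonomy assigned to a Content Type also applies to Content Types derived from it can be sketched as a walk up the inheritance chain. The hierarchy, the assignment table, and the function name below are hypothetical and do not use the SharePoint object model; they only illustrate the lookup logic described in the HR and Finance example.

```python
# Hypothetical content type hierarchy (child -> parent) and taxonomy assignments.
PARENT = {
    "HR Policy": "HR",
    "Invoice": "Finance",
    "HR": None,
    "Finance": None,
    "Memo": None,
}
ASSIGNED = {"HR": "HR taxonomy", "Finance": "Finance taxonomy"}

def taxonomy_for(content_type):
    """Walk up the hierarchy until an assigned taxonomy is found.

    Returns None when no ancestor has a taxonomy, meaning the document
    would not be classified.
    """
    current = content_type
    while current is not None:
        if current in ASSIGNED:
            return ASSIGNED[current]
        current = PARENT.get(current)
    return None

print(taxonomy_for("HR Policy"))  # inherits the assignment made on "HR"
print(taxonomy_for("Memo"))       # no assignment anywhere in its chain
```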
conceptClassifier for SharePoint fully supports Content Types. An add-on feature provides the ability to update Content Types based on the content identified during the classification process. This is particularly useful in records management and in data privacy and security, because it allows the organization to define a series of actions that occur when content contains specific metadata.
conceptClassifier for SharePoint integration with Microsoft Office and Microsoft Exchange enables automatic metadata generation and classification without end user participation. Alternatively, the Subject Matter Expert (SME) or knowledge worker can be granted the authority to modify the results from within the traditional Microsoft Office interface. The knowledge worker is the most qualified person to anticipate how the asset will be searched for and how to make it easy to find. The automatic classification returns not only single words but also identifies concepts within the document to assist the knowledge worker in the classification process. This guided approach enables the knowledge worker to classify the document precisely and accurately for reuse and retrieval. Placing the ability to classify documents in the hands of knowledge workers results in rich and comprehensive metadata, significantly improving the organization’s ability to leverage its information capital.
· Gives business experts the ability to classify critical business information with highly relevant metadata
· Greatly improves the search and retrieval process by ensuring accurate and complete metadata
· Expedites organizational access to real-time information
· Provides a consistent content management approach
· Delivers metadata-rich information retrieval, thereby maximizing productivity and organizational agility
Knowledge workers need to identify content in the context of what they are seeking. The fundamental problem with most enterprise search solutions, and all statistical search solutions, is that they are based on an index of single words. Yet most queries are expressed as short patterns of words, not as single words in isolation, which are highly ambiguous. A concept search engine can isolate the key meaning that is normally expressed as proper nouns, noun phrases and verb phrases. Although linguistic products can do this, their performance is highly variable depending upon the vocabulary and language in use. A statistically based, language-independent concept search can accept queries in natural language, with the user typing words, phrases or whole sentences. The system then analyzes the natural language query to extract the keywords and phrases, identify the main concepts, and retrieve content that is highly relevant.
Precision and recall are the two key performance measures for information retrieval. Precision measures how many of the retrieved items are relevant to the query. Recall measures how many of the items relevant to the query are retrieved. Yet most information retrieval technologies are less than 22% accurate for both precision and recall. The ideal goal is to have them balanced. Compound Term Processing has the ability to increase precision with no loss of recall.
Documents that have been auto-classified are now accessible by searching for all the content within a folder and by using Microsoft Enterprise Search, which can now filter on highly relevant metadata created with Taxonomy Manager. Search results are clustered into categories or facets, enabling an end user to rapidly drill into a result set based on organizational, functional, product line, and geographic metadata that has been generated using Taxonomy Manager and automatically tagged to relevant documents and records within document libraries. As the end user refines the search and the query changes, new facets are generated.
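The two measures named above have standard definitions that are easy to compute for a single query. In the sketch below the document ids and the helper name are made up for the example: precision is the share of retrieved items that are relevant, and recall is the share of relevant items that were retrieved.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for one query from two collections of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Guard against division by zero when either set is empty.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Four documents returned, three documents actually relevant, two in common.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(p, r)
```

Here two of the four retrieved documents are relevant (precision 0.5) and two of the three relevant documents were found (recall of two thirds), which shows how the two measures can diverge for the same result set.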