Do not reinvent
Findability and Knowledge Management




Håkan Tylén
Western Europe Business Development
+46703091665
hatylen@microsoft.com
Agenda outline
Customer/Employee Service,
in the Self-service channel


         How can I help YOU?
Metadata basics
What is it? Where is it stored?




     Metadata is the set of properties that characterize a document.
Poor metadata impairs the search experience
Degraded findability leads to the erosion of users’ trust in search
   Inconsistent, incorrect or missing metadata is commonplace within
   most organizations today
   This impairs findability in the context of enterprise search
      Hard to scan or navigate results
      Documents returned may be incomplete or not current
      No confidence in authority and correctness of information
      Difficult to locate relevant experts
   Even with refinement tools, users do not rely on them
      Multiple variations or spellings
      Hit counts do not add up

   User reaction: “I’m not confident I will find what I need here… This is a waste of time!”

   What users see in the search results
      Few options to navigate or refine a large result list other than trying to reformulate the query
      Unchanged template metadata make results look like duplicates
      Meaningless metadata confuses users as they scan the search results
      Missing metadata raises questions about result set completeness
ROI - Scenarios

1.   Time Wasted Searching
2.   Cost of Reworking Information
3.   Opportunity Costs to the Enterprise




                                6   | SharePoint Server 2010 for Internet Sites Microsoft confidential.
Scenario 1: Time wasted

 Employee cost: €3.000/month + social costs ≈ €50.000/year
 Time wasted searching: 10 minutes/day × 220 working days/year
   ≈ €1.000/employee/year

 1000 employees = €1.000.000/year of “released time”
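
The arithmetic behind this scenario can be checked with a short sketch. The annual cost, the 10 minutes per day and the 220 working days come from the slide; the 8-hour working day and the function name `wasted_search_cost` are assumptions made only for this illustration.

```python
def wasted_search_cost(annual_cost_eur, minutes_per_day, working_days, hours_per_day=8.0):
    """Yearly cost of time spent searching, for one employee.

    hours_per_day is an assumption (8-hour working day); it is not on the slide.
    """
    hourly_rate = annual_cost_eur / (working_days * hours_per_day)
    hours_wasted = minutes_per_day * working_days / 60.0
    return hourly_rate * hours_wasted


# Figures from the slide: EUR 50,000/year, 10 minutes/day, 220 days/year.
per_employee = wasted_search_cost(50_000, 10, 220)
total = per_employee * 1000  # 1000 employees

print(f"per employee: EUR {per_employee:,.0f}/year")
print(f"1000 employees: EUR {total:,.0f}/year")
```

Under these assumptions the result lands slightly above the rounded €1.000 per employee quoted on the slide, which is consistent with the slide's order-of-magnitude estimate.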




Creating quality metadata is a real challenge
Few organizations have good quality metadata on internal content
Challenge
• Ineffective information governance across the enterprise
• Multiple content silos and search interfaces
• Manually entered metadata is inconsistent, incorrect or missing
• No automated tools for content classification
• Impossible to keep up with ever growing content volumes

Assist users in tagging content with automated metadata suggestions or enrichment tools

Solution
• FAST Search for SharePoint (FS4SP) delivers business value out-of-the-box
• Sophisticated content processing optimizes findability across multiple silos
  of unstructured and structured content
• In addition, property extraction overcomes poor metadata by generating it
  and normalizing it on-the-fly
Content Processing Pipeline – what is it?
Enhance your content for optimal search experience and findability

   The pipeline is a sequentially arranged set of discrete processing
   stages that break down and enrich content for indexing
     Convert documents to plain text (support for 400+ file formats)
     Detect document languages and encoding (support for 80+ languages)
     Apply linguistic normalization to optimize content for search
     Identify and leverage existing metadata where applicable
     Parse content to extract or generate additional metadata
     Map content and associated metadata (crawled properties) to the index
     schema (managed properties) for searching
     Custom stages can be created and added to the pipeline




   Pipeline stages
      Format Conversion: Extracts plain text and metadata from multiple content formats (e.g. Microsoft Office, PDF, HTML, etc.)
      Language and Encoding Detection: Identifies the encoding and languages used in the content so that the appropriate linguistic rules and dictionaries can be applied downstream
      Tokenization: Breaks text into tokens using language-specific rules to handle punctuation, accents, diacritics, compound words, etc.
      Lemmatization: Applies linguistic normalization to enable queries to match words or phrases with inflected forms (singular/plural, masculine/feminine, etc.)
      Date and Time Normalization: Converts dates and times in users’ locale-specific representations to a standard canonical representation; for example, the date 14-Mar-10 is equivalent to March 14, 2010
      Property Extraction: Recognizes predefined entities mentioned in the content; out of the box support for Companies, Locations and People, but this can be extended to other categories (currency, telephone numbers, part numbers, etc.)
      Vectorization: Creates document vectors (phrase/weight pairs) reflecting the important terms and frequency of occurrence, to enable “find similar” functionality
      Properties Mapper: Maps the relevant pieces of content and metadata discovered in the pipeline to the index schema for search
      Custom Processing Stage: Enables you to extend the content processing pipeline with custom stages (home-grown solutions or 3rd party software) to address your own business needs
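
The idea of a pipeline as a sequence of discrete stages can be sketched in a few lines of Python. This is an illustration only, not the FS4SP pipeline or its API: the stage functions, the dictionary-based document and the `managed` key are all hypothetical, and two-digit years are assumed to mean 20xx.

```python
import re
from datetime import date

MONTHS = {m: i for i, m in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}
DATE_RE = re.compile(r"\b(\d{1,2})-([A-Za-z]{3})-(\d{2})\b")


def tokenize(doc):
    # Break the lower-cased text into word tokens.
    return {**doc, "tokens": re.findall(r"\w+", doc["text"].lower())}


def normalize_dates(doc):
    # Convert dd-Mon-yy dates to ISO form (assumption: yy means 20yy).
    dates = []
    for day, mon, yy in DATE_RE.findall(doc["text"]):
        month = MONTHS.get(mon.lower())
        if month:
            dates.append(date(2000 + int(yy), month, int(day)).isoformat())
    return {**doc, "dates": dates}


def map_properties(doc):
    # Map what earlier stages discovered to a (hypothetical) index schema.
    return {**doc, "managed": {"body": doc["text"], "dates": doc.get("dates", [])}}


PIPELINE = [tokenize, normalize_dates, map_properties]


def run_pipeline(doc, stages=PIPELINE):
    # Each stage receives the output of the previous one, in order.
    for stage in stages:
        doc = stage(doc)
    return doc


result = run_pipeline({"text": "Sales forecast updated on 14-Mar-10 in London."})
print(result["managed"])
```

A custom stage is then just one more function appended to the list, which mirrors the extensibility point described above.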
Property Extraction
Create metadata on-the-fly, adding structure to unstructured content
   In a nutshell, property extraction is the ability to
      Process unstructured content (e.g. a document’s body)
      Recognize entities mentioned in the text (e.g. people, companies, locations, concepts, etc.)
      Optionally, normalize variations to a single, canonical form
      Expose these extracted entities as crawled properties in pipeline
      Map them to managed properties for filtering and searching

   Crawled Properties
      Companies: Microsoft, Contoso, Woodgrove, …
      Locations: London, San Francisco, Moscow, …
      People: Bill Gates, Barack Obama, José Caires, ...




Index Schema:                                                                        Managed Properties
 Type   Doc ID     Title      Author       Date        Size    Keywords     Companies      Locations     People        ...      Body Text

         xxx     Sales For…   John Doe   2010-04-15   386 KB   sales; pipe… Microsoft; …   London; …   Bill Gates; …   …        The mark…

         yyy         …           …           …          …          …             …            …             …          …                 …

         zzz         …           …           …          …          …             …            …             …          …                 …
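
A minimal sketch of dictionary-based property extraction with normalization follows. It is not how FS4SP implements the feature; the function `extract_entities`, the dictionaries and the spelling variants (such as "MSFT") are hypothetical, while the canonical names are taken from the slide.

```python
import re

# Variant (lower-case) -> canonical form. The variants are invented for illustration.
COMPANIES = {
    "microsoft": "Microsoft",
    "microsoft corp": "Microsoft",
    "msft": "Microsoft",
    "contoso": "Contoso",
    "contoso ltd": "Contoso",
    "woodgrove": "Woodgrove",
}
LOCATIONS = {
    "london": "London",
    "san francisco": "San Francisco",
    "moscow": "Moscow",
}


def extract_entities(text, dictionary):
    """Return the canonical entities whose variants occur in the text."""
    found = set()
    for variant, canonical in dictionary.items():
        # Whole-word, case-insensitive match of each dictionary variant.
        if re.search(r"\b" + re.escape(variant) + r"\b", text, flags=re.IGNORECASE):
            found.add(canonical)
    return sorted(found)


body = "MSFT and Contoso Ltd met in London; Microsoft Corp. confirmed the deal."
crawled = {
    "companies": extract_entities(body, COMPANIES),
    "locations": extract_entities(body, LOCATIONS),
}
print(crawled)
```

Because every variant maps to one canonical entry, "MSFT" and "Microsoft Corp." both end up as the single value "Microsoft", which is what keeps refiner values consistent.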
Good metadata greatly improves findability
Property extraction enables consistent metadata across all content
   Metadata quality is critical to the search experience
   FS4SP leverages metadata, i.e. managed properties, to present deep refiners
      Offer at-a-glance overview
      Organize free-text search results into multiple facets
      Make search conversational
      Guide users toward possible refinement choices
      Prevent users drilling down into a “0 results” dead end
   Additional uses for managed properties in FS4SP
      Relevancy tuning & ranking
      Multi-level sorting
      Advanced (or fielded) search

   User reaction: “This is really great! Now I can navigate through this large information universe without feeling lost…”

   Example refiners: File Formats, Companies, Products, Concepts, and many more…

   Metadata is also used for relevancy tuning, multi-level sorting and advanced search
   Precise hit counts in deep refiners are computed across the whole result set.
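
To illustrate why hit counts computed across the whole result set avoid “0 results” dead ends, here is a small sketch. The result documents and the helper names `refiner_counts` and `refine` are hypothetical; this is not the FS4SP query API.

```python
from collections import Counter


def refiner_counts(results, prop):
    """Count, over the whole result set, how many documents carry each value."""
    counts = Counter()
    for doc in results:
        # A document counts once per value, even if the value is repeated.
        counts.update(set(doc.get(prop, [])))
    return counts


def refine(results, prop, value):
    """Keep only the documents that carry the chosen refiner value."""
    return [doc for doc in results if value in doc.get(prop, [])]


results = [
    {"title": "Sales forecast", "companies": ["Microsoft", "Contoso"]},
    {"title": "Partner review", "companies": ["Contoso"]},
    {"title": "Press summary", "companies": ["Microsoft", "Woodgrove", "Microsoft"]},
]

counts = refiner_counts(results, "companies")
for value, n in counts.most_common():
    print(f"{value} ({n})")
```

Since only values that occur in the result set are offered, and each count equals the size of the refined list, every refinement choice leads to at least one result.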
The Microsoft IT Intranet
Environment
   As of September 2010

   Total: 29.89 TB (31,346,042 MB), 223,595 sites, 545,387 sub-sites
   Grows by 1.5 TB per quarter

   Seattle (Americas): 19.4 TB (65%), 127,986 sites, 345,935 sub-sites
   Dublin (Europe, Middle East, Africa): 6.4 TB (22%), 49,731 sites, 117,324 sub-sites
   Singapore (Asia Pacific): 4.1 TB (13%), 45,878 sites, 82,128 sub-sites
Knowledge Transfer: MSW
Property extraction and refiners in FS4SP
What’s available out-of-the-box?
   FS4SP automatically detects 80+
   languages in content
   Property extraction dictionaries are
   included for 11 languages* and 3
   types of entities
         Locations
         Companies
         Persons
   The metadata is exposed to users as
   refiners, drives relevancy and other
   features to improve findability
   This delivers real business value to
   organizations struggling with issues
   such as
         Poor document metadata
         Large content volumes
         Lack of result refinement options
         Low user adoption of search
   * Arabic, Dutch, English, French, German, Italian, Japanese, Norwegian, Portuguese, Russian, Spanish
Extending property extraction in FS4SP (1/2)
Make search speak the language of your business using dictionaries

   Property extraction in FS4SP is customizable using a dictionary,
   i.e. list of keywords and phrases
   Matching variations can be normalized to a single entry
   Several dictionaries may co-exist to address needs of the business
      Projects
      Products
      Customers
      Competitors
      Employees
      Business-specific concepts
   The necessary data may be readily available within the organization
   or from external sources
      SharePoint lists & Term Store
      LOB applications, Databases & XML
   Create custom search refiners to fit your own business needs
Extending property extraction in FS4SP (2/2)
Use existing text mining or classification tools to go even further

   Another approach is to invoke external tools during content
   processing in FS4SP
      This leverages the standard pipeline extensibility mechanism
   Such tools typically address problems like
      Text mining for entity, fact or relationship extraction
      Taxonomy classification
   Moreover, these tools may be already deployed for other
   purposes in the enterprise
      Home-grown solutions
      3rd party, specialized vendors
         Industry sectors or verticals
         Scientific or technology domains

   Diagram: the content pipeline takes the original document from the repository, has an external text mining/classification tool (local software or Web service) analyze the text content and return metadata tags, and passes the enriched document on to the index.
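
The shape of such an integration can be sketched as a stage that hands the document text to a pluggable classifier and merges the returned tags. Everything here is hypothetical: `keyword_classifier` is a trivial stand-in for an external tool, and `enrich` is not part of any FS4SP interface.

```python
from typing import Callable, Dict, List

# Any callable that takes text and returns metadata tags can be plugged in.
Classifier = Callable[[str], List[str]]


def keyword_classifier(text: str) -> List[str]:
    """Stand-in for an external text mining or taxonomy classification tool."""
    rules = {"contract": "Legal", "invoice": "Finance"}  # invented rules
    lowered = text.lower()
    return sorted({tag for keyword, tag in rules.items() if keyword in lowered})


def enrich(document: Dict, classify: Classifier) -> Dict:
    """Return a copy of the document enriched with the tags from the classifier."""
    enriched = dict(document)
    enriched["tags"] = classify(document["body"])
    return enriched


original = {"id": "xxx", "body": "Signed contract and invoice attached."}
enriched = enrich(original, keyword_classifier)
print(enriched["tags"])
```

Passing the classifier in as a parameter keeps the stage independent of whether the tool runs as local software or behind a web service.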
Best practice #1
Deepen your understanding of your audiences and your content




   Enterprise content: Marketing, Sales, Consulting, Procurement, Production, Research, IT Support, HR / Legal

   Before you start deploying enterprise search: understand your content,
   your users and what they need to get their jobs done effectively.
Best practice #2
     Use existing language resources inside and outside your enterprise



   Internal assets
      • Thesauri, controlled vocabularies
      • Taxonomies, ontologies
      • Master databases
      • Enterprise systems
      • Line-of-business applications
      • Subject matter experts
      • Examples*
         • SharePoint (Lists, Term Store)
         • Employees (AD, HR)
         • Customers (CRM)
         • Suppliers (ERP)
         • Products (PLM)
         • Processes (BPM)
         • Projects (EPM)

   Internet resources
      • Government agencies
      • Industry bodies
      • Research institutions
      • Academia
      • Virtual communities
      • Examples
         • Wikipedia.org
         • DBpedia.org
         • WordNet, from Princeton University
         • Medical Subject Headings (MeSH)

   Content providers

   Specialized vendors

* AD – Active Directory; CRM – Customer Relationship Mgmt.; ERP – Enterprise Resource Planning; PLM – Product Lifecycle Mgmt.; BPM – Business Process Modeling; EPM – Enterprise Project Mgmt.
Best practice #3
Keep the index synchronized with content sources and dictionaries

      The language of the business will change over time
               External environment
               Enterprise content
               Users’ needs
      Ensure that property extraction dictionaries and search index
      are systematically updated to respond to these changes
      Where possible, automate dictionary upkeep as part of
      standard business workflows
               Taxonomies and thesauri
               Enterprise project management
               Product lifecycle management
      Schedule regular analysis and review checkpoints to handle
      exceptional cases

      Diagram: Dictionary Data Sources, Property Extraction Dictionaries, Search Index and Enterprise Content Sources; search synchronized with changes over time
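
One way to automate part of this upkeep is to compare the current dictionary with its authoritative source on a schedule and report what changed. The sketch below assumes both are available as plain lists of entries; the function name `dictionary_changes` and the product names are hypothetical.

```python
def dictionary_changes(current_entries, source_entries):
    """Compare a property extraction dictionary with its source of truth."""
    current, source = set(current_entries), set(source_entries)
    return {
        "add": sorted(source - current),     # new in the source, missing in the dictionary
        "remove": sorted(current - source),  # no longer present in the source
    }


# Hypothetical product dictionary versus a product master list (e.g. from PLM).
dictionary = ["Product A", "Product B", "Legacy X"]
master_list = ["Product A", "Product B", "Product C"]

changes = dictionary_changes(dictionary, master_list)
print(changes)
```

Entries flagged for removal are good candidates for the review checkpoints mentioned above, since older documents may still mention retired names.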
Best practice #4
Distinguish search management from systems management

   As the language of your business and users’ needs evolve, so should
   your search solution
   If not, the search experience and findability inevitably degrade over
   time – users’ trust will plunge too

   Chart: original implementation of the search solution versus actual search experience, if left unattended...

   Search management is not an IT responsibility; it’s for the business

   Job profile
   • Skillset of a SharePoint administrator (not a programmer or systems engineer)
   • Business perspective and focus
   • Good ability with languages
   • Attention to detail

   Sample tasks
   • Monitor search reports (daily/weekly)
   • Run user polls and/or focus groups (quarterly)
   • Process users’ feedback/questions
   • Update dictionaries and manage keywords (as required)
   • Support search-related projects

   Staffing – depends on scale
   • One person part-time, or
   • A geographically distributed team
Case study #1
General Mills (Research & Development)
    Business Problem
   • Researchers forced to search each internal and
     external content source separately
   • Low relevancy in existing search applications
   • High effort in information discovery tasks
   • Growing difficulty in establishing connections with
     experts as company grew worldwide


    Approach & Solution
   • FAST Search for SharePoint indexes all internal
     sources and federates external industry services
   • Property extraction dictionaries extended to
     recognize product names cited in documents
   • Deep refiners are used on extracted properties to
     drill down by products, companies and people


    Benefits & Value
   • Improved employee productivity with more relevant
     search results in a unified interface
   • Greater information sharing and reuse across
     product areas & geographies
   • Integrated people search eases social networking
   • Proof point for wider search roll-out in enterprise

   “By using FAST Search Server 2010 for SharePoint, our researchers can refine their searches and find exactly what they are looking for. They spend more time innovating than looking for information.”
      – Michelle Check, R&D Systems Leader, General Mills
Case study #2
Mississippi Department of Transportation (MDOT)
    Business Problem
    • Poor access to a large, active collection of paper-
      based contracts and project documents
    • Metadata managed in a separate DMS (database)
    • Information silos stifle sharing of data and
      collaboration
    • Requirements to provide internal and public access


    Approach & Solution
    • FAST Search for SharePoint indexes images with
      iFilter-based OCR technology
    • Pipeline extended with custom .NET code to merge
      metadata from database with indexed documents
    • Custom refiners reflect language used in the
      business for navigating search results


    Benefits & Value
    • Unified self-service interface to locate information
    • Ability to slice & dice results according to specific
      needs (dates, project, folder, route, district, etc.)
    • Information search times cut from several hours or
      days to mere seconds or minutes
    • Users have more time to focus on higher value tasks

    “We are literally reducing decision cycles from days to minutes for hundreds of overlapping decisions a day. With SharePoint Server 2010, we can make better spending decisions and enhance program performance without a very large investment.”
       – John Michael Simpson, CTO, MDOT
Ingredients for great enterprise search
The business value of FAST Search Server 2010 for SharePoint
               The challenges
               • Explosive content growth puts information management and
                 governance under pressure
               • Multiple content silos with different search interfaces
               • Poor metadata – missing, inconsistent, incorrect


               The solution
               • Content processing optimizes findability across disparate sources
               • Property extraction generates metadata while indexing content
               • Deep refiners expose metadata in search results helping users
                 quickly zoom to the right information


               The benefits
               • Reduced costs through enterprise search consolidation and
                 automated metadata enrichment
               • Enhanced findability helps employees to get their job done faster
               • Increased user adoption across the enterprise drives ROI
microsoft.com / Enterprise Search

 © 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market
    conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.
                                        MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Contenu connexe

Tendances

Hw09 Terapot Email Archiving With Hadoop
Hw09   Terapot  Email Archiving With HadoopHw09   Terapot  Email Archiving With Hadoop
Hw09 Terapot Email Archiving With HadoopCloudera, Inc.
 
WHAT IS A DBMS? EXPLAIN DIFFERENT MYSQL COMMANDS AND CONSTRAINTS OF THE SAME.
WHAT IS A DBMS? EXPLAIN DIFFERENT MYSQL COMMANDS AND  CONSTRAINTS OF THE SAME.WHAT IS A DBMS? EXPLAIN DIFFERENT MYSQL COMMANDS AND  CONSTRAINTS OF THE SAME.
WHAT IS A DBMS? EXPLAIN DIFFERENT MYSQL COMMANDS AND CONSTRAINTS OF THE SAME.`Shweta Bhavsar
 
[Hadoop] NexR Terapot: Massive Email Archiving
[Hadoop] NexR Terapot: Massive Email Archiving[Hadoop] NexR Terapot: Massive Email Archiving
[Hadoop] NexR Terapot: Massive Email ArchivingJinho Jung
 
Internet of Information and Services (IoIS)
Internet of Information and Services (IoIS)Internet of Information and Services (IoIS)
Internet of Information and Services (IoIS)Antonio Marcos Alberti
 
Survey of Machine Learning Techniques in Textual Document Classification
Survey of Machine Learning Techniques in Textual Document ClassificationSurvey of Machine Learning Techniques in Textual Document Classification
Survey of Machine Learning Techniques in Textual Document ClassificationIOSR Journals
 
Aardvark Final Www2010
Aardvark Final Www2010Aardvark Final Www2010
Aardvark Final Www2010guestcc519e
 

Tendances (8)

Hw09 Terapot Email Archiving With Hadoop
Hw09   Terapot  Email Archiving With HadoopHw09   Terapot  Email Archiving With Hadoop
Hw09 Terapot Email Archiving With Hadoop
 
WHAT IS A DBMS? EXPLAIN DIFFERENT MYSQL COMMANDS AND CONSTRAINTS OF THE SAME.
WHAT IS A DBMS? EXPLAIN DIFFERENT MYSQL COMMANDS AND  CONSTRAINTS OF THE SAME.WHAT IS A DBMS? EXPLAIN DIFFERENT MYSQL COMMANDS AND  CONSTRAINTS OF THE SAME.
WHAT IS A DBMS? EXPLAIN DIFFERENT MYSQL COMMANDS AND CONSTRAINTS OF THE SAME.
 
[Hadoop] NexR Terapot: Massive Email Archiving
[Hadoop] NexR Terapot: Massive Email Archiving[Hadoop] NexR Terapot: Massive Email Archiving
[Hadoop] NexR Terapot: Massive Email Archiving
 
Edi text
Edi textEdi text
Edi text
 
Internet of Information and Services (IoIS)
Internet of Information and Services (IoIS)Internet of Information and Services (IoIS)
Internet of Information and Services (IoIS)
 
Survey of Machine Learning Techniques in Textual Document Classification
Survey of Machine Learning Techniques in Textual Document ClassificationSurvey of Machine Learning Techniques in Textual Document Classification
Survey of Machine Learning Techniques in Textual Document Classification
 
Boot slides xxl
Boot slides xxlBoot slides xxl
Boot slides xxl
 
Aardvark Final Www2010
Aardvark Final Www2010Aardvark Final Www2010
Aardvark Final Www2010
 

En vedette

US Colleges & Universities: An Insider’s Perspective
US Colleges & Universities: An Insider’s PerspectiveUS Colleges & Universities: An Insider’s Perspective
US Colleges & Universities: An Insider’s PerspectiveEducational Initiatives
 
Nicolas Games PL 18.05 Final
Nicolas Games PL 18.05 FinalNicolas Games PL 18.05 Final
Nicolas Games PL 18.05 FinalNicolas Games
 
Presentation Bob Smits during SharePoint roundtable Imtech ICT
Presentation Bob Smits during SharePoint roundtable Imtech ICTPresentation Bob Smits during SharePoint roundtable Imtech ICT
Presentation Bob Smits during SharePoint roundtable Imtech ICTDynamic People B.V.
 
Andron design portfolio final
Andron design portfolio finalAndron design portfolio final
Andron design portfolio finalpartymarty
 
Zoeken in SharePoint by Arno Flapper Imtech ICT
Zoeken in SharePoint by Arno Flapper Imtech ICTZoeken in SharePoint by Arno Flapper Imtech ICT
Zoeken in SharePoint by Arno Flapper Imtech ICTDynamic People B.V.
 
Enterprise iPhone developers building bespoke apps
Enterprise iPhone developers building bespoke appsEnterprise iPhone developers building bespoke apps
Enterprise iPhone developers building bespoke apps3squared.com
 
Adding structure to unstructured content for enhanced findability hakan tylen

  • 1. Do not reinvent Findability and Knowledge Management Håkan Tylén Western Europe Business Development +46703091665 hatylen@microsoft.com
  • 3. Customer/Employee Service, in the Self-service channel How can I help YOU?
  • 4. Metadata basics What is it? Where is it stored? Metadata is the set of properties that characterize a document.
  • 5. Poor metadata impairs the search experience. Degraded findability leads to the erosion of users’ trust in search. Inconsistent, incorrect or missing metadata is commonplace within most organizations today. This impairs findability in the context of enterprise search: results are hard to scan or navigate; documents returned may be incomplete or not current; there is no confidence in the authority and correctness of information; relevant experts are difficult to locate. Even with refinement tools, users do not rely on them: multiple variations or spellings; hit counts do not add up. User reaction: “I’m not confident I will find what I need here… This is a waste of time!” Callouts: few options to navigate or refine a large result list other than trying to reformulate the query; unchanged template metadata make results look like duplicates; meaningless metadata confuses users as they scan the search results; missing metadata raises questions about result set completeness.
  • 6. ROI - Scenarios 1. Time Wasted Searching 2. Cost of Reworking Information 3. Opportunity Costs to the Enterprise 6 | SharePoint Server 2010 for Internet Sites Microsoft confidential.
  • 7. Scenario 1: Time wasted. Salary of €3.000/month plus social costs ≈ €50.000/year per employee. 10 minutes/day × 220 working days ≈ €1.000/employee/year. With 1000 employees = €1.000.000/year of “released time”.
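The arithmetic behind Scenario 1 can be checked with a few lines of Python. This is a rough sketch: the €50,000/year cost, 10 minutes/day, 220 working days and 1000 employees come from the slide, while the 8-hour working day is an assumption added here to turn the annual cost into an hourly rate.

```python
# Figures from the slide
ANNUAL_COST_EUR = 50_000        # salary plus social costs per employee
WORKING_DAYS_PER_YEAR = 220
MINUTES_WASTED_PER_DAY = 10
EMPLOYEES = 1_000

# Assumption (not on the slide): an 8-hour working day
HOURS_PER_DAY = 8


def released_time_value(annual_cost, working_days, hours_per_day,
                        minutes_per_day, employees):
    """Return (yearly value per employee, yearly value for the workforce)."""
    hourly_rate = annual_cost / (working_days * hours_per_day)
    hours_wasted = minutes_per_day * working_days / 60
    per_employee = hourly_rate * hours_wasted
    return per_employee, per_employee * employees


per_employee, total = released_time_value(
    ANNUAL_COST_EUR, WORKING_DAYS_PER_YEAR, HOURS_PER_DAY,
    MINUTES_WASTED_PER_DAY, EMPLOYEES,
)
print(f"Per employee: EUR {per_employee:,.0f} per year")
print(f"Workforce:    EUR {total:,.0f} per year")
```

Under the 8-hour assumption the result is a little over €1,000 per employee, which is consistent with the rounded figures on the slide.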
  • 8. Creating quality metadata is a real challenge. Few organizations have good quality metadata on internal content. Challenge: • Ineffective information governance across the enterprise • Multiple content silos and search interfaces • Manually entered metadata is inconsistent, incorrect or missing • No automated tools for content classification • Impossible to keep up with ever growing content volumes. Solution: assist users in tagging content with automated metadata suggestions or enrichment tools • FAST Search for SharePoint (FS4SP) delivers business value out-of-the-box • Sophisticated content processing optimizes findability across multiple silos of unstructured and structured content • In addition, property extraction overcomes poor metadata by generating it and normalizing it on-the-fly
  • 10. Content Processing Pipeline: what is it? Enhance your content for optimal search experience and findability. The pipeline is a sequentially arranged set of discrete processing stages that break down and enrich content for indexing: convert documents to plain text (support for 400+ file formats); detect document languages and encoding (support for 80+ languages); apply linguistic normalization to optimize content for search; identify and leverage existing metadata where applicable; parse content to extract or generate additional metadata; map content and associated metadata (crawled properties) to the index schema (managed properties) for searching. Custom stages can be created and added to the pipeline. Stages named on the slide: Format Conversion; Language and Encoding Detection; Tokenization; Lemmatization; Property Extraction; Date and Time Normalization (e.g. 14-Mar-10 is equivalent to March 14, 2010); Vectorization; Properties Mapper; Custom Stage.
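The idea of a sequentially arranged set of processing stages can be sketched in Python as below. This is a toy illustration only, not the FS4SP pipeline or its API: the `Document` class, the stage functions and the language heuristic are all hypothetical stand-ins for the far richer stages described on the slide.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    raw: str                                   # content as fetched from the source
    text: str = ""                             # plain text after format conversion
    tokens: List[str] = field(default_factory=list)
    properties: Dict[str, str] = field(default_factory=dict)


Stage = Callable[[Document], Document]


def convert_format(doc: Document) -> Document:
    # Stand-in for format conversion: crudely strip markup tags.
    doc.text = re.sub(r"<[^>]+>", " ", doc.raw)
    return doc


def detect_language(doc: Document) -> Document:
    # Toy heuristic, for illustration only.
    padded = f" {doc.text.lower()} "
    doc.properties["language"] = "en" if " the " in padded else "unknown"
    return doc


def tokenize(doc: Document) -> Document:
    doc.tokens = re.findall(r"\w+", doc.text.lower())
    return doc


def count_words(doc: Document) -> Document:
    # Example of a custom stage appended to the standard ones.
    doc.properties["word_count"] = str(len(doc.tokens))
    return doc


def run_pipeline(doc: Document, stages: List[Stage]) -> Document:
    # Each stage receives the output of the previous one, in order.
    for stage in stages:
        doc = stage(doc)
    return doc


STANDARD_STAGES: List[Stage] = [convert_format, detect_language, tokenize]

result = run_pipeline(
    Document("<p>The pipeline enriches the content</p>"),
    STANDARD_STAGES + [count_words],
)
print(result.tokens)
print(result.properties)
```

Keeping every stage behind the same signature is what makes it possible to add a custom stage without touching the existing ones.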
  • 11. Property Extraction. Create metadata on-the-fly, adding structure to unstructured content. In a nutshell, property extraction is the ability to: process unstructured content (e.g. a document’s body); recognize entities mentioned in the text (e.g. people, companies, locations, concepts, etc.); optionally, normalize variations to a single, canonical form; expose these extracted entities as crawled properties in the pipeline; map them to managed properties for filtering and searching. Crawled properties shown: Companies (Microsoft, Contoso, Woodgrove, …); Locations (London, San Francisco, Moscow, …); People (Bill Gates, Barack Obama, José Caires, ...). Index Schema: Managed Properties
      Doc ID | Title      | Author   | Date       | Size   | Keywords     | Companies    | Locations | People        | ... | Body Text
      xxx    | Sales For… | John Doe | 2010-04-15 | 386 KB | sales; pipe… | Microsoft; … | London; … | Bill Gates; … | …   | The mark…
      yyy    | …          | …        | …          | …      | …            | …            | …         | …             | …   | …
      zzz    | …          | …        | …          | …      | …            | …            | …         | …             | …   | …
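The steps on this slide can be illustrated with a small dictionary lookup. This is a minimal sketch, not how FS4SP implements extraction: the entity names are taken from the slide, while the function names and the crawled-to-managed mapping table are hypothetical.

```python
import re

# Entity dictionaries, using the examples shown on the slide
DICTIONARIES = {
    "companies": ["Microsoft", "Contoso", "Woodgrove"],
    "locations": ["London", "San Francisco", "Moscow"],
    "people": ["Bill Gates", "Barack Obama", "José Caires"],
}

# Hypothetical mapping from crawled property names to managed property names
CRAWLED_TO_MANAGED = {
    "companies": "Companies",
    "locations": "Locations",
    "people": "People",
}


def extract_crawled_properties(body):
    """Find dictionary entries mentioned in the body text."""
    crawled = {}
    for entity_type, entries in DICTIONARIES.items():
        found = [
            entry for entry in entries
            if re.search(rf"\b{re.escape(entry)}\b", body)
        ]
        if found:
            crawled[entity_type] = found
    return crawled


def map_to_managed(crawled):
    """Expose crawled properties under their managed property names."""
    return {
        CRAWLED_TO_MANAGED[name]: "; ".join(values)
        for name, values in crawled.items()
    }


body = ("Bill Gates visited the Contoso office in London "
        "before flying to San Francisco.")
crawled = extract_crawled_properties(body)
managed = map_to_managed(crawled)
print(managed)
```

The resulting values play the role of the Companies, Locations and People columns in the index schema table above.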
  • 12. Good metadata greatly improves findability. Property extraction enables consistent metadata across all content. Metadata quality is critical to the search experience. FS4SP leverages metadata, i.e. managed properties, to present deep refiners: offer at-a-glance overview; organize free-text search results into multiple facets; make search conversational; guide users toward possible refinement choices; prevent users drilling down into a “0 results” dead end. Additional uses for managed properties in FS4SP: relevancy tuning & ranking; multi-level sorting; advanced (or fielded) search; and many more… User reaction: “This is really great! Now I can navigate through this large information universe without feeling lost…” Callouts: metadata is also used for relevancy tuning, multi-level sorting and advanced search; precise hit counts in deep refiners are computed across the whole result set. Refiners shown: File Formats, Companies, Products, Concepts.
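The point about hit counts computed across the whole result set can be made concrete with a short sketch. It is illustrative only: the sample results, the property names and both functions are made up for this example and do not reflect the FS4SP query API.

```python
from collections import Counter


def compute_refiners(results, managed_properties):
    """Count every value of each managed property over ALL results."""
    refiners = {prop: Counter() for prop in managed_properties}
    for doc in results:
        for prop in managed_properties:
            for value in doc.get(prop, []):
                refiners[prop][value] += 1
    return refiners


def refine(results, prop, value):
    """Drill down: keep only results carrying the chosen refiner value."""
    return [doc for doc in results if value in doc.get(prop, [])]


# Hypothetical result set for a free-text query
results = [
    {"Title": "Sales forecast", "Companies": ["Microsoft", "Contoso"],
     "FileFormat": ["docx"]},
    {"Title": "Partner review", "Companies": ["Contoso"],
     "FileFormat": ["pptx"]},
    {"Title": "Market notes", "Companies": ["Microsoft"],
     "FileFormat": ["docx"]},
]

refiners = compute_refiners(results, ["Companies", "FileFormat"])
for prop, counts in refiners.items():
    print(prop, dict(counts))

contoso_only = refine(results, "Companies", "Contoso")
print([doc["Title"] for doc in contoso_only])
```

Because a refiner value is only listed when at least one result carries it, every choice offered to the user leads to a non-empty result list.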
  • 13. The Microsoft IT Intranet Environment (as of September 2010). Total: 29.89 TB (31,346,042 MB), 223,595 sites, 545,387 sub-sites; grows by 1.5 TB per quarter. Data centers: Seattle, Dublin, Singapore. Regions: Americas; Europe, Middle East, Africa; Asia Pacific. Per-data-center figures shown: 19.4 TB, 127,986 sites, 345,935 sub-sites (65%); 6.4 TB, 49,731 sites, 117,324 sub-sites (22%); 4.1 TB, 45,878 sites, 82,128 sub-sites (13%).
  • 16. Property extraction and refiners in FS4SP: what’s available out-of-the-box? FS4SP automatically detects 80+ languages in content. Property extraction dictionaries are included for 11 languages* and 3 types of entities: Locations, Companies, Persons. The metadata is exposed to users as refiners, drives relevancy and other features to improve findability. This delivers real business value to organizations struggling with issues such as: poor document metadata; large content volumes; lack of result refinement options; low user adoption of search. * Arabic, Dutch, English, French, German, Italian, Japanese, Norwegian, Portuguese, Russian, Spanish
  • 18. Extending property extraction in FS4SP (1/2). Make search speak the language of your business using dictionaries. Property extraction in FS4SP is customizable using a dictionary, i.e. a list of keywords and phrases. Matching variations can be normalized to a single entry. Several dictionaries may co-exist to address needs of the business: Projects; Products; Customers; Competitors; Employees; Business-specific concepts. Create custom search refiners to fit your own business needs. The necessary data may be readily available within the organization or from external sources: SharePoint lists & Term Store; LOB applications, databases & XML.
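Normalizing matching variations to a single entry might look like the sketch below. It is a generic illustration, not the FS4SP dictionary format: the product names are real, but the variation lists and the helper functions are hypothetical examples.

```python
import re

# Hypothetical custom dictionary: canonical entry -> known variations
PRODUCT_DICTIONARY = {
    "SharePoint Server 2010": ["SharePoint 2010", "SPS 2010"],
    "FAST Search Server 2010 for SharePoint": [
        "FS4SP", "FAST Search for SharePoint",
    ],
}


def build_matcher(dictionary):
    """Compile one pattern for all variations plus a lookup to canonical form."""
    lookup = {}
    for canonical, variations in dictionary.items():
        for phrase in set(variations) | {canonical}:
            lookup[phrase.lower()] = canonical
    # Longest phrases first, so a longer variation wins over a shorter one.
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(a) for a in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return pattern, lookup


def extract_canonical(text, dictionary):
    """Return canonical entries mentioned in the text, in order of appearance."""
    pattern, lookup = build_matcher(dictionary)
    found = []
    for match in pattern.finditer(text):
        canonical = lookup[match.group(1).lower()]
        if canonical not in found:
            found.append(canonical)
    return found


text = ("We piloted FS4SP on top of SharePoint 2010; "
        "fs4sp refiners worked well.")
products = extract_canonical(text, PRODUCT_DICTIONARY)
print(products)
```

A refiner built on the canonical values then shows one entry per product, however the product was spelled in the documents.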
  • 19. Extending property extraction in FS4SP (2/2). Use existing text mining or classification tools to go even further. Another approach is to invoke external tools during content processing in FS4SP. This leverages the standard pipeline extensibility mechanism. Such tools typically address problems like: text mining for entity, fact or relationship extraction; taxonomy classification. Moreover, these tools may be already deployed for other purposes in the enterprise: home-grown solutions; 3rd party, specialized vendors; industry sectors or verticals; scientific or technology domains. Diagram: original document from repository; content pipeline; external text mining/classification tool (local software or web service) that analyzes text content and returns metadata tags; enriched document for indexing; index.
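The flow in this diagram (send text to an external tool, get metadata tags back, index the enriched document) can be sketched as a wrapper stage. Everything here is hypothetical: the stage factory, the document shape and the toy classifier merely stand in for a real text mining tool and for the actual pipeline extensibility mechanism.

```python
from typing import Callable, Dict, List

# An external tool takes text and returns metadata tags per property name.
ExternalTool = Callable[[str], Dict[str, List[str]]]


def make_enrichment_stage(tool: ExternalTool):
    """Wrap an external tool as a processing stage."""
    def stage(document: dict) -> dict:
        enriched = dict(document)
        try:
            tags = tool(document["body"])
        except Exception:
            # If the tool fails, index the document without enrichment
            # rather than dropping it.
            return enriched
        for name, values in tags.items():
            merged = list(enriched.get(name, []))
            for value in values:
                if value not in merged:
                    merged.append(value)
            enriched[name] = merged
        return enriched
    return stage


def toy_taxonomy_classifier(text: str) -> Dict[str, List[str]]:
    # Stand-in for a real classifier running locally or behind a web service.
    topics = []
    lowered = text.lower()
    if "contract" in lowered:
        topics.append("Legal")
    if "forecast" in lowered:
        topics.append("Sales")
    return {"Topics": topics}


def broken_tool(text: str) -> Dict[str, List[str]]:
    raise RuntimeError("service unavailable")


original = {"body": "Sales forecast and contract terms", "Topics": ["Sales"]}
enriched = make_enrichment_stage(toy_taxonomy_classifier)(original)
untouched = make_enrichment_stage(broken_tool)(original)
print(enriched["Topics"])
```

Falling back to the un-enriched document on failure is a design choice made for this sketch; a production setup might instead retry or queue the document for later enrichment.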
  • 21. Best practice #1: Deepen your understanding of your audiences and your content. Audiences shown around the enterprise content: Marketing; Sales; Consulting; Procurement; Production; Research; IT Support; HR / Legal. Before you start deploying enterprise search: understand your content, your users and what they need to get their jobs done effectively.
  • 22. Best practice #2: Use existing language resources inside and outside your enterprise. Internal assets: thesauri, controlled vocabularies; taxonomies, ontologies; master databases; enterprise systems; line-of-business applications; subject matter experts. Examples*: SharePoint (Lists, Term Store); Employees (AD, HR); Customers (CRM); Suppliers (ERP); Products (PLM); Processes (BPM); Projects (EPM). External sources (Internet resources, content providers, specialized vendors): government agencies; industry bodies; research institutions; academia; virtual communities. Examples: Wikipedia.org; DBpedia.org; WordNet, from Princeton University; Medical Subject Headings (MeSH). * AD: Active Directory; CRM: Customer Relationship Mgmt.; ERP: Enterprise Resource Planning; PLM: Product Lifecycle Mgmt.; BPM: Business Process Modeling; EPM: Enterprise Project Mgmt.
  • 23. Best practice #3: Keep the index synchronized with content sources and dictionaries. The language of the business will change over time: external environment; enterprise content; users’ needs. Ensure that property extraction dictionaries and search index are systematically updated to respond to these changes. Where possible, automate dictionary upkeep as part of standard business workflows: taxonomies and thesauri; enterprise project management; product lifecycle management. Schedule regular analysis and review checkpoints to handle exceptional cases. Diagram labels: dictionary data sources; property extraction dictionaries; enterprise content sources; search index; search synchronized with changes over time.
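One way to picture the synchronization step is to compare two versions of a dictionary and work out which indexed documents need to be processed again. This is a simplified sketch under stated assumptions: the in-memory index structure, the project names and both functions are hypothetical, and a real deployment would rely on the search platform's own re-crawl and re-processing facilities.

```python
def diff_dictionary(old_entries, new_entries):
    """Return (added, removed) entries between two dictionary versions."""
    return new_entries - old_entries, old_entries - new_entries


def documents_to_reprocess(index, added, removed):
    """Pick documents whose extracted properties may be out of date.

    index maps a document id to its body text and previously extracted entries.
    """
    stale = set()
    for doc_id, doc in index.items():
        body = doc["body"].lower()
        mentions_new_entry = any(entry.lower() in body for entry in added)
        carries_removed_entry = bool(doc["extracted"] & removed)
        if mentions_new_entry or carries_removed_entry:
            stale.add(doc_id)
    return stale


# Hypothetical project dictionary before and after a business change
old_version = {"Project Alpha", "Project Beta"}
new_version = {"Project Beta", "Project Gamma"}

index = {
    "d1": {"body": "Kick-off notes for Project Gamma", "extracted": set()},
    "d2": {"body": "Project Alpha retrospective",
           "extracted": {"Project Alpha"}},
    "d3": {"body": "Project Beta status", "extracted": {"Project Beta"}},
}

added, removed = diff_dictionary(old_version, new_version)
stale = documents_to_reprocess(index, added, removed)
print(sorted(stale))
```

Running such a check as part of the workflow that updates the dictionary keeps the index from drifting away from the vocabulary the business actually uses.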
  • 24. Best practice #4: Distinguish search management from systems management. As the language of your business and users’ needs evolve, so should your search solution. If not, the search experience and findability inevitably degrade over time, and users’ trust will plunge too. Search management is not an IT responsibility, it’s for the business. Job profile: • Skillset of a SharePoint administrator (not a programmer or systems engineer) • Business perspective and focus • Good ability with languages • Attention to detail. Sample tasks: • Monitor search reports (daily/weekly) • Run user polls and/or focus groups (quarterly) • Process users’ feedback/questions • Update dictionaries and manage keywords (as required) • Support search-related projects. Staffing depends on scale: • One person part-time, or • A geographically distributed team. Diagram labels: original implementation of the search solution; actual search experience, if left unattended...
  • 26. Case study #1: General Mills (Research & Development). Business Problem: • Researchers forced to search each internal and external content source separately • Low relevancy in existing search applications • High effort in information discovery tasks • Growing difficulty in establishing connections with experts as company grew worldwide. Approach & Solution: • FAST Search for SharePoint indexes all internal sources and federates external industry services • Property extraction dictionaries extended to recognize product names cited in documents • Deep refiners are used on extracted properties to drill down by products, companies and people. Benefits & Value: • Improved employee productivity with more relevant search results in a unified interface • Greater information sharing and reuse across product areas & geographies • Integrated people search eases social networking • Proof point for wider search roll-out in enterprise. Quote: “By using FAST Search Server 2010 for SharePoint, our researchers can refine their searches and find exactly what they are looking for. They spend more time innovating than looking for information.” Michelle Check, R&D Systems Leader, General Mills
  • 27. Case study #2: Mississippi Department of Transportation (MDOT). Business Problem: • Poor access to a large, active collection of paper-based contracts and project documents • Metadata managed in a separate DMS (database) • Information silos stifle sharing of data and collaboration • Requirements to provide internal and public access. Approach & Solution: • FAST Search for SharePoint indexes images with iFilter-based OCR technology • Pipeline extended with custom .NET code to merge metadata from database with indexed documents • Custom refiners reflect language used in the business for navigating search results. Benefits & Value: • Unified self-service interface to locate information • Ability to slice & dice results according to specific needs (dates, project, folder, route, district, etc.) • Information search times cut from several hours or days to mere seconds or minutes • Users have more time to focus on higher value tasks. Quote: “We are literally reducing decision cycles from days to minutes for hundreds of overlapping decisions a day. With SharePoint Server 2010, we can make better spending decisions and enhance program performance without a very large investment.” John Michael Simpson, CTO, MDOT
  • 28. Ingredients for great enterprise search: the business value of FAST Search Server 2010 for SharePoint. The challenges: • Explosive content growth puts information management and governance under pressure • Multiple content silos with different search interfaces • Poor metadata: missing, inconsistent, incorrect. The solution: • Content processing optimizes findability across disparate sources • Property extraction generates metadata while indexing content • Deep refiners expose metadata in search results, helping users quickly zoom to the right information. The benefits: • Reduced costs through enterprise search consolidation and automated metadata enrichment • Enhanced findability helps employees to get their job done faster • Increased user adoption across the enterprise drives ROI
  • 29. microsoft.com / Enterprise Search © 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.