Agnes Molnar is an international SharePoint consultant and Microsoft MVP with over 10 years of experience with SharePoint. In her presentation, she discusses some of the real-world challenges organizations face with enterprise search, including information overload, the complexity of content and metadata, security, scaling, and relevance ranking. She emphasizes that search is an application: to succeed, it requires an understanding of user needs and behaviors as well as of the content sources.
2. Real World Challenges in SP
AGNES MOLNAR
SEARCH CONSULTANT, INDEPENDENT
SHAREPOINT SERVER MVP – HUNGARY
SHAREPOINT AND PROJECT CONFERENCE ADRIATICS 2013
ZAGREB, NOVEMBER 27–28, 2013
3. Introduction – Agnes Molnar
International SharePoint Consultant
• 10+ Years SharePoint Experience
• Information Architecture & ECM
• Search
SharePoint Server MVP
• 6 Years SharePoint Server MVP
• 5+ Years Speaking at Conferences Around the World
• Numerous Books, White Papers, Articles
Contact
• E-mail: aghy@aghy.hu
• Blog: http://aghy.hu
• Twitter: @molnaragnes
8. Search as an Application
Source: http://www.domorewithsearch.com
9. Search as an Application
• Search is no longer the white box
• Content lives in disparate locations
• Structured and unstructured content lives in different locations
• Need to aggregate content according to:
  • Process
  • Context
  • Customer
  • Goal
  • Program
  • Parameter of any of the above
10. User – Context – Content
• Context: business models & goals, corporate culture, resources [Where information is used]
• Content: document types, objects, structure, attributes, metainformation [How to describe the information]
• Users: information needs, audience types, expertise, tasks [How to use the information]
12. Search is more than Technology
Source: http://searchpatterns.org
13. The Complexity of Enterprise Information
What we give to the search engine…
What the search engine sees…
• Title: Overview of SharePoint 2013 Preview Installation and Configuration
• Author: Alex Yarrow
• Created Date: 06/21/2012
• Modified Date: 10/16/2012
• File Type: docx
• …
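The point of this slide is that the engine does not "read" the document the way a person does; for a Word file it mostly sees property values. A minimal sketch, assuming the properties come from the standard OOXML core-properties part (`docProps/core.xml`) of a `.docx` package: it reads title, author and dates with the Python standard library. The helper names `read_core_properties` and `build_sample_docx` are hypothetical, and the sample package contains only the core-properties part, not a complete Word document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the OOXML core-properties part (docProps/core.xml).
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

def read_core_properties(docx_bytes: bytes) -> dict:
    """Return the title/author/dates a crawler could read from a .docx package."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))

    def text(tag: str):
        element = root.find(tag, NS)
        return element.text if element is not None else None

    return {
        "Title": text("dc:title"),
        "Author": text("dc:creator"),
        "Created Date": text("dcterms:created"),
        "Modified Date": text("dcterms:modified"),
        "File Type": "docx",
    }

def build_sample_docx() -> bytes:
    """Hypothetical helper: an in-memory package holding only docProps/core.xml."""
    core_xml = f"""<?xml version="1.0" encoding="UTF-8"?>
<cp:coreProperties xmlns:cp="{NS['cp']}" xmlns:dc="{NS['dc']}" xmlns:dcterms="{NS['dcterms']}">
  <dc:title>Overview of SharePoint 2013 Preview Installation and Configuration</dc:title>
  <dc:creator>Alex Yarrow</dc:creator>
  <dcterms:created>2012-06-21</dcterms:created>
  <dcterms:modified>2012-10-16</dcterms:modified>
</cp:coreProperties>"""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("docProps/core.xml", core_xml)
    return buffer.getvalue()

print(read_core_properties(build_sample_docx()))
```

Everything else in the document (its actual meaning) has to be recovered from the body text, which is what the next slide is about.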
14. Explicit metadata versus implicit metadata
Explicit metadata
• Content Type = License
• Organization = ABC Company, DEF Company
• Topic = Support
ABC shall provide first level technical support to all Licensed Product end users and/or Sublicensed Product customers/users. DEF will provide second level support. DEF shall provide to ABC a primary and a secondary support person to act as the primary interface with ABC’s technical and customer support team. DEF shall provide direct technical support to ABC for all uses of the DEF Software. Support level definitions and responsibilities are set forth in Exhibit C. An “SLA Failure” as defined in Exhibit C shall qualify as a Release Condition sufficient to authorize the Escrow Agent to release to Source Code to ABC pursuant to Section 7 and the Escrow Agreement.
Implicit metadata
• Forward Index – Words per document
• Inverted Index – Documents per word
• Terms found in the content: ABC, customers, customer support, customer support team, DEF, DEF software, end users, escrow agreement, escrow agent, exhibit c, licensed product, release condition, section 7, secondary support, SLA, SLA failure, software, source code, support level, sublicensed product, technical support
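The two index types named on this slide can be sketched in a few lines. This is a minimal illustration, assuming a naive lower-case word tokenizer and two hypothetical documents built from sentences of the license excerpt; a production engine also stores positions, handles stemming, and extracts multi-word phrases such as "customer support team", which this single-word index does not.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    """Lower-case word tokens; deliberately naive, for illustration only."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_forward_index(docs: dict[str, str]) -> dict[str, list[str]]:
    """Forward index: for each document, the words it contains (in order)."""
    return {doc_id: tokenize(text) for doc_id, text in docs.items()}

def build_inverted_index(forward: dict[str, list[str]]) -> dict[str, set[str]]:
    """Inverted index: for each word, the set of documents that contain it."""
    inverted: dict[str, set[str]] = defaultdict(set)
    for doc_id, words in forward.items():
        for word in words:
            inverted[word].add(doc_id)
    return dict(inverted)

# Hypothetical documents, using sentences from the excerpt above.
docs = {
    "license.docx": "DEF shall provide direct technical support to ABC for all uses of the DEF Software.",
    "sla.docx": 'An "SLA Failure" as defined in Exhibit C shall qualify as a Release Condition.',
}
forward = build_forward_index(docs)
inverted = build_inverted_index(forward)
print(sorted(inverted["shall"]))  # ['license.docx', 'sla.docx']
```

The inverted index is what makes query-time lookup cheap: answering "which documents mention ABC?" is a single dictionary access instead of a scan of every document's word list.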
15. The Complexity of Search
[Diagram: the moving parts of a search solution: content sources and data sources; indexing into the local search index; federation to a remote search index; result sources; query rules; result blocks and the result set; metadata; refinement panel; hover panel; display templates.]
16. Requirements Gathering
Information-Seeking Patterns
• “I know what I’m searching for and know how to do that”
• “I know what I’m searching for but I don’t know how to do that”
• “I don’t know what I’m searching for”
• “Am I searching?…”
19. Content Inventory
• SharePoint content (2013, 2010, …)
  • Intranet
  • Department sites
  • Project sites
  • Internal KB
• File shares
  • Sales repository (RFPs, proposals, etc.)
  • Marketing documents (DMs, brochures, etc.)
• Web sites
  • Company public web site
  • Professional know-how web sites (finance, IT, development, etc.)
  • Common interest (stock, management, etc.)
• Exchange Public Folders
  • Internal communication
• Business Data
  • Data from databases
• Custom connector
  • SAP data
  • CRM data
21. Crawl or Federate? – Where to get the content from?
• Crawl + Use Local Index:
  • Examples:
    • Intranet
    • Company file shares
  • Pros:
    • Full control over the index (crawl schedule, metadata included, etc.) and ranking model
    • Results can be aggregated into one result set
    • Common refiners (facets)
  • Cons:
    • Needs resources for the crawling process
    • Needs storage to store the index
• Federate:
  • Examples:
    • Professional know-how web sites (TechNet, MSDN, etc.)
    • Internet results for a specific topic (financial news, stock information, etc.)
    • 3rd party Content Management System
  • Pros:
    • Doesn’t need resources to crawl / store the index
  • Cons:
    • Live Internet connection is required
    • No control over the index
    • No control over the ranking model
    • No real aggregation with other result sources
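The pros and cons above can be turned into a rough planning rule. The sketch below is only one hypothetical reading of the slide, not a SharePoint feature: `SourceProfile` and `recommend` are invented names, and the rule is that a source must be crawlable (and you must have the resources to crawl and store it) before crawling is an option, and that crawling is worth it when you need your own ranking, shared refiners, or a single aggregated result set.

```python
from dataclasses import dataclass

@dataclass
class SourceProfile:
    """Hypothetical planning record for one content source."""
    name: str
    can_be_crawled: bool         # do we have access and permission to crawl it?
    needs_own_ranking: bool      # must results follow our ranking model?
    needs_common_refiners: bool  # must it share refiners (facets) with other sources?
    needs_aggregation: bool      # must results merge into one result set?
    has_crawl_capacity: bool     # resources for crawling and for storing the index?

def recommend(source: SourceProfile) -> str:
    """Return 'crawl' or 'federate' based on the trade-offs listed on the slide."""
    if not source.can_be_crawled or not source.has_crawl_capacity:
        return "federate"
    if source.needs_own_ranking or source.needs_common_refiners or source.needs_aggregation:
        return "crawl"
    # Crawlable, but nothing requires local control: federation saves resources.
    return "federate"

intranet = SourceProfile("Intranet", True, True, True, True, True)
technet = SourceProfile("TechNet", False, False, False, False, True)
print(recommend(intranet), recommend(technet))  # crawl federate
```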
22. Content Source Inventory
| Name | Type | Location | Owner | Volume of Content | Frequency of Updates |
|---|---|---|---|---|---|
| Intranet | SharePoint | http://intranet | Intranet Team | 200K items | 100-300/hr |
| Project Sites | SharePoint | http://projects | Delivery | 200K items | 100-200/hr |
| Sales share | File share | X:\Sales | Sales | 500K docs | 300-500/hr |
| Marketing share | File share | X:\Marketing | Marketing | 200K docs | 300-500/hr |
| Company web site | Web site | http://mycompany.com | Marketing/Publishing Team | <100K pages | 1-10/day |
| Competitor’s web site | Web site | http://competitor.com | [external] | <100K pages | 1-10/day |
| Professional Know-How | Web site | http://www.mykb.com | [external] | <100K pages | 5-10/week |
| Company Announcements | Exchange Public Folder | Exchange/Public Folders/Announcements | Marketing/Internal Comm. Team | <100K items | 5-10/day |
| HR data | Business Data (SQL) | SQL database | HR | <100K items | 10-100/day |
| CRM data | Custom Connector | CRM system | Sales | 500K entries | 500-1000/hr |
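One practical use of the "Frequency of Updates" column is a back-of-the-envelope estimate of how many changed items incremental crawls must keep up with. A minimal sketch, assuming each frequency means "items changed per time unit" and that every source hits its upper bound at the same time (a pessimistic peak); the function name `peak_changes_per_hour` is hypothetical.

```python
import re

# Update frequencies copied from the inventory table above.
UPDATE_FREQUENCIES = {
    "Intranet": "100-300/hr",
    "Project Sites": "100-200/hr",
    "Sales share": "300-500/hr",
    "Marketing share": "300-500/hr",
    "Company web site": "1-10/day",
    "Competitor's web site": "1-10/day",
    "Professional Know-How": "5-10/week",
    "Company Announcements": "5-10/day",
    "HR data": "10-100/day",
    "CRM data": "500-1000/hr",
}

HOURS_PER_UNIT = {"hr": 1, "day": 24, "week": 24 * 7}

def peak_changes_per_hour(frequency: str) -> float:
    """Convert an 'A-B/unit' range into its upper bound, expressed per hour."""
    match = re.fullmatch(r"(\d+)-(\d+)/(hr|day|week)", frequency)
    if match is None:
        raise ValueError(f"unrecognized frequency: {frequency!r}")
    upper, unit = int(match.group(2)), match.group(3)
    return upper / HOURS_PER_UNIT[unit]

total = sum(peak_changes_per_hour(f) for f in UPDATE_FREQUENCIES.values())
print(f"Peak changed items per hour across all sources: {total:.0f}")  # 2505
```

Almost all of the load comes from the five sources measured per hour (2,500 of roughly 2,505 items/hour), which suggests where crawl scheduling and capacity planning deserve the most attention.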
23. Metadata in Search
• The “glue” of Search Applications
• Crawled property: metadata extracted from the documents/items during the crawl.
• Managed property: mapped to crawled properties and controlled by Search Admins; helps users perform more efficient and successful queries:
  • Refiners
  • Displayed in Search Results
  • Sorting Properties
24. Metadata in Search
| Crawled Property | Managed Property | Usage |
|---|---|---|
| Author, CreatedBy, From | Author | Refiner; Display on Result Set; Display on Hover Panel; Sort by |
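The mapping above, where several crawled properties feed one managed property, can be modeled as a small data structure. This is a simplified sketch, not the SharePoint object model: `ManagedProperty` and `resolve` are hypothetical names. It assumes the two mapping behaviors SharePoint's managed property settings offer (take the first non-empty crawled property in a specified order, or include content from all mapped crawled properties) and models only the usage flags shown on the slide.

```python
from dataclasses import dataclass

@dataclass
class ManagedProperty:
    """Simplified, hypothetical model of a managed property."""
    name: str
    crawled_properties: list[str]       # mapped crawled properties, in priority order
    refinable: bool = False             # usable as a refiner
    retrievable: bool = False           # can be displayed in results / hover panel
    sortable: bool = False              # usable for sorting
    first_non_empty_only: bool = True   # False: include values from all mappings

    def resolve(self, crawled: dict[str, str]) -> list[str]:
        """Compute this managed property's value(s) from one item's crawled properties."""
        values = [crawled[cp] for cp in self.crawled_properties if crawled.get(cp)]
        return values[:1] if self.first_non_empty_only else values

author = ManagedProperty(
    name="Author",
    crawled_properties=["Author", "CreatedBy", "From"],
    refinable=True, retrievable=True, sortable=True,
)

# An e-mail item has no 'Author' crawled property, but it does have 'From'.
print(author.resolve({"From": "Alex Yarrow"}))  # ['Alex Yarrow']
```

This is why a single "Author" refiner can cover documents, list items and e-mails alike: the user sees one consistent property even though each source calls it something different.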
31. Sorting the Results – Relevance Ranking
• Requirements:
“I’d like to see ALL the relevant results.”
vs.
“I don’t want to see anything that is not relevant (to me, in this context).”
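These two requirements correspond to the classic retrieval measures: the first asks for high recall (every relevant item is returned), the second for high precision (every returned item is relevant), and tuning a result list usually trades one against the other. A minimal sketch with hypothetical item IDs:

```python
def precision_recall(returned: list[str], relevant: set[str]) -> tuple[float, float]:
    """Precision: share of returned results that are relevant.
    Recall: share of all relevant items that were returned."""
    hits = sum(1 for doc in returned if doc in relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

relevant = {"a", "b", "c", "d"}
# A broad result list finds everything relevant but includes noise...
print(precision_recall(["a", "b", "c", "d", "x", "y", "z", "w"], relevant))  # (0.5, 1.0)
# ...while a narrow one is clean but misses relevant items.
print(precision_recall(["a", "b"], relevant))  # (1.0, 0.5)
```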
32. Sorting the Results – Relevance Ranking
| Element | Description |
|---|---|
| Freshness | Age of a document compared to the time when the query is issued |
| Authority | Importance of a document determined by the links to it from other documents |
| Quality | Assigned importance of a document, independent of the query |
| Geo | Importance of the geographical distance between a document’s associated latitude/longitude and a target location specified in a query |
| Context | Importance of matching a query in a given document field |
| Proximity | For multi-term queries: the shorter the distance between query terms in a document, the higher the document’s rank value |
| Position | The earlier a query term occurs in a field, the higher the document’s rank value |
| Frequency | The more frequently a query term occurs in a document, the higher the document’s rank value |
| Completeness | The greater the number of query terms present in the same field of a matching document, the higher the document’s rank value |
| Number | For multi-term queries: the more query terms matched in a document, the higher the document’s rank value |

Reference: Okapi BM25
http://en.wikipedia.org/wiki/Probabilistic_relevance_model_(BM25)
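The referenced Okapi BM25 formula captures the Frequency and Number elements (term frequency with saturation, summed over the matched query terms) plus document-length normalization. A minimal sketch of plain single-field BM25, assuming pre-tokenized text, the common defaults k1 = 1.2 and b = 0.75, and the non-negative IDF variant ln((N − n + 0.5)/(n + 0.5) + 1); field-aware variants such as BM25F would be needed to model the Context element, and Freshness, Authority, Proximity and the others are separate ranking features not covered here. The sample documents are hypothetical.

```python
import math
from collections import Counter

def bm25_scores(query: list[str], docs: dict[str, list[str]],
                k1: float = 1.2, b: float = 0.75) -> dict[str, float]:
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    if not docs:
        return {}
    n_docs = len(docs)
    avg_len = sum(len(words) for words in docs.values()) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for words in docs.values() for term in set(words))
    scores = {}
    for doc_id, words in docs.items():
        tf = Counter(words)
        # Longer-than-average documents are penalized through this denominator term.
        length_norm = k1 * (1 - b + b * len(words) / avg_len)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores[doc_id] = score
    return scores

docs = {
    "d1": "escrow agent releases source code".split(),
    "d2": "technical support for source code".split(),
    "d3": "customer support team".split(),
}
print(bm25_scores(["source", "code", "escrow"], docs))
```

Here d1 ranks above d2 because, at equal length, it matches one more query term, and the rarer term "escrow" carries a higher IDF; d3 matches nothing and scores 0.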
34. Search Analytics in SharePoint 2013
• Usage Events – As users interact with content in SharePoint, actions are captured and stored as events (click a link, press a button, view or open a document).
• Access and create experiences using data captured in the analytics database.
• No longer within the firewall
• Relevance is critical
• Search within the organization
• “Transparent” search
• Search-driven applications
Management by Walking Around
• “Join” by…
• Filter
• Refinement
• Display
• Sort/Order
Resource: Configure properties of the Search Box Web Part in SharePoint Server 2013 (http://technet.microsoft.com/en-us/library/gg576963.aspx).
• Entity extraction for other content sources
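Entity extraction turns implicit metadata in the text into something search can refine on. SharePoint Server 2013 supports custom entity extraction driven by dictionaries supplied by an administrator; the sketch below only illustrates the general idea of dictionary-based extraction and is not the product's implementation. It assumes case-insensitive, whole-word matching; `extract_entities` and the `COMPANY_TERMS` dictionary are hypothetical, with terms taken from the license excerpt earlier in the deck.

```python
import re

def extract_entities(text: str, dictionary: list[str]) -> list[str]:
    """Return dictionary terms found in text (case-insensitive, whole words), in dictionary order."""
    found = []
    for term in dictionary:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(term)
    return found

# Hypothetical company dictionary.
COMPANY_TERMS = ["Escrow Agent", "Source Code", "SLA Failure", "Exhibit C", "Licensed Product"]

text = ('An "SLA Failure" as defined in Exhibit C shall qualify as a Release Condition '
        "sufficient to authorize the Escrow Agent to release to Source Code to ABC.")
print(extract_entities(text, COMPANY_TERMS))
```

The extracted terms could then be stored as a multi-valued property on the item and used as a refiner, just like explicit metadata.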
Search “opens up windows”, but it must not become a “security leak”!
• Plan!
• Research the SOURCE SYSTEM, and involve its admins!
• Test:
  • On the source system
  • On search
• Involve:
  • Source system key users
  • Source system admins
  • Test users (<7)
  • More test users
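Keeping search from becoming a security leak usually relies on security trimming: permissions are captured from the source system at crawl time and each result is filtered against the current user's identity at query time. This is also why the source-system research above matters: if the crawled permissions are wrong, the trimming is wrong. A deliberately simplified sketch, assuming allow-only ACLs of user and group names (no deny entries, no nested groups); `IndexedItem`, `security_trim` and the URLs are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IndexedItem:
    url: str
    allowed_principals: frozenset[str]  # ACL captured from the source system at crawl time

def security_trim(results: list[IndexedItem], user: str, groups: set[str]) -> list[IndexedItem]:
    """Keep only results whose ACL names the user or one of the user's groups."""
    principals = {user} | groups
    return [item for item in results if item.allowed_principals & principals]

results = [
    IndexedItem("http://intranet/news", frozenset({"Everyone"})),
    IndexedItem("http://projects/sales-proposal", frozenset({"Sales"})),
    IndexedItem("http://hr/salaries", frozenset({"HR"})),
]
visible = security_trim(results, user="alice", groups={"Everyone", "Sales"})
print([item.url for item in visible])
```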
New analytics processing component: analyzes content in the search index and user actions that were performed on a site to identify items that users perceive as more relevant than others.
• Number of Views
• Number of Clicks
• Overall item usage
• Recommendation
• Social distance
• …
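The signals listed above start from raw usage events. The sketch below is not SharePoint's analytics algorithm; it only illustrates two of the ideas under stated assumptions: counting events per item (number of views, number of clicks) and a simple "users who viewed this also viewed" recommendation based on co-views by the same user. Social distance is not modeled, and the event tuples and function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical usage events: (user, action, item).
events = [
    ("alice", "view", "specA"), ("alice", "click", "specA"), ("alice", "view", "specB"),
    ("bob", "view", "specA"), ("bob", "view", "specB"),
    ("carol", "view", "specA"), ("carol", "view", "specC"),
]

def usage_counts(events, action):
    """Number of events of the given kind per item (e.g. number of views)."""
    return Counter(item for _, kind, item in events if kind == action)

def co_viewed(events):
    """Count, for every ordered pair of items, how many users viewed both."""
    viewed_by_user = defaultdict(set)
    for user, kind, item in events:
        if kind == "view":
            viewed_by_user[user].add(item)
    pairs = Counter()
    for items in viewed_by_user.values():
        for a, b in combinations(sorted(items), 2):
            pairs[(a, b)] += 1
            pairs[(b, a)] += 1
    return pairs

def recommend(item, pairs, top=2):
    """Items most often co-viewed with `item`."""
    scores = Counter({other: n for (source, other), n in pairs.items() if source == item})
    return [other for other, _ in scores.most_common(top)]

print(usage_counts(events, "view"))
print(recommend("specA", co_viewed(events)))  # ['specB', 'specC']
```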