Much has been written about Search in SharePoint 2013 – and rightly so. But if you think the airtight FAST integration is all there is… think again! New cutting-edge technology has made its way into the platform, promising more flexibility, better performance and richer functionality.
But how is it different? What can we use it for? And how does all of this affect capacity planning, deployment, and day-to-day maintenance like resizing, coping with hardware failures and keeping up with the seemingly ever-changing business requirements? In this session we'll explore all of this - and more!
2. Marcus Johansson
• Senior Consultant, Comperio
• V-TSP Enterprise Search, Microsoft
Email: marcus.johansson@comperiosearch.com
Twitter: @marcjoha
Blog: http://blog.comperiosearch.com
LinkedIn: http://www.linkedin.com/in/marcusjohansson
3. End of an era, birth of a New age
• FAST now “fully integrated”
– True, but there’s more!
• No longer a “FAST license”
– SP2013 contains everything
– Enterprise version
• Migration from FS4SP?
– Brr… 1997 – 2013
4. The evolution of FAST
[Diagram: the FAST product lineage (FDS, ESP, FSIS, FSIA, FS4SP) and Search in SP2010, plus some “secret sauce” (incl. Mars), converging in Search in SP2013.]
5. All this talk about the new Sheriff…
• Search in SP2013 gets a lot of attention
– Revamped user/admin interface
– Hover panels, previews
– Query rules, result blocks
– Result types, display templates
– “You’ve seen this result before”
– Query Builder
– Content Search web part
– Etc.
• Notice the pattern?
6. …what Search in SP13 really is
• Empowering the whole SharePoint experience
• Better, more powerful extensibility
• Major user experience overhaul
• Finally a single search architecture
• Vastly improved search core
• How come most of the buzz is about the UX?
7. For the first time,
Search isn’t defined by the
nuts and bolts,
but by the User Experience
and high-level tools around it.
13. Keeping it all together
Services and processes:
Process name                Description
hostcontrollerservice.exe   Process controller. Monitors and restarts children.
noderunner.exe              A search component (except the crawl component).
mssearch.exe                The crawl component.
14. Crawl component
• Changes from SP2010
– Only crawling
• No indexing
– Continuous crawl
• Improves freshness
– Crawl Log
• More details
• Document removal
– Crawl Health Report
• Huge improvement!
15. Continuous crawls
• Not event-driven indexing
• Starts crawl regardless of prior crawl session
• Large change sets no longer bad for freshness
[Diagram: timeline comparing full and incremental crawls with continuous crawls (default interval 15 min).]
• Only available for content sources of type SharePoint Sites
– Possible to crawl SP 2010 and 2007
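The effect on freshness can be illustrated with a toy model. This is only a sketch, not how the crawler is implemented: it assumes a scheduled (incremental) crawl simply waits for its predecessor to finish, while a continuous crawl starts on the interval regardless. The function names and the sample durations are made up for illustration.

```python
def incremental_starts(durations, interval):
    """Start times (minutes) when each crawl must wait for the previous one.

    Simplifying assumption: a crawl that is due while its predecessor is
    still running is delayed until the predecessor has finished.
    """
    starts, free_at = [], 0
    for i, duration in enumerate(durations):
        scheduled = i * interval
        start = max(scheduled, free_at)  # blocked until the prior crawl ends
        starts.append(start)
        free_at = start + duration
    return starts


def continuous_starts(durations, interval):
    """Start times (minutes) when a crawl starts regardless of prior sessions."""
    return [i * interval for i in range(len(durations))]


# One large change set (40 min) followed by two small ones, 15 min interval.
durations = [40, 5, 5]
print(incremental_starts(durations, 15))  # [0, 40, 45]
print(continuous_starts(durations, 15))   # [0, 15, 30]
```

In the blocking model the large first change set pushes every later crawl back; in the continuous model the later crawls still start on time.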
16. Crawl health reports
• Reports: Rate, Latency, Freshness, CPU and memory load, Content Processing activity, etc.
[Screenshots: crawl rate per type; crawl load.]
17. Crawl component performance
• Anecdotal: feels faster, more stable
• Bound by CPU and network
– Documents per second
– Link discovery
• Some I/O – files temporarily stored on disk
• Adjust performance by:
– Crawler impact rules
– Performance level (number of threads)
Set-SPEnterpriseSearchService -PerformanceLevel X
18. Content processing component
• Schema mapping
– Crawled properties to managed properties
• Entity extraction
– Company names and custom entities
• Advanced Filter Pack is gone
– Though PDFs are out of the box
• Extensible through web service
• Internally: processing flows
– Replaces the pipeline in FS4SP
– Based on FSIS/CTS, but hidden
19. Content processing flows
• Hidden in SP2013. In FSIS, flows could be created, modified and debugged in real-time.
• Why on earth was this not included in SP2013!?
[Screenshot: the flow designer in FSIS, not available in SP2013.]
23. Query processing component
• Prepares the queries
– Query rules
– Result sources
– Linguistics/dictionaries
– Etc.
• Manipulates the results
– Display templates
– Late security trimming
– Etc.
• Internally: processing flows
– Derived from FSIS/IMS. Again, this is hidden
– Still MAJOR improvement compared to FS4SP
24. Query rules
• When a query contains a certain term, trigger a certain action:
– Add/change query terms
– Use alternate sorting/relevance
– Hybrid search (or other federated results)
– Etc.
• Replaces search keywords in SP2010
• Configure at farm, site collection or site-level
• Warning: Triggering the query rules engine comes with a penalty
– Anecdotal tests: ~70 ms or more, excluding parallel queries
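As a rough mental model of "term triggers action", a query rule can be thought of as a trigger term paired with a function that rewrites the query. The sketch below is purely illustrative and is not the SharePoint query rules engine: `QueryRule`, `apply_rules` and the sample rule are hypothetical names invented here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QueryRule:
    """Hypothetical rule: fire `action` when `trigger` appears in the query."""
    trigger: str
    action: Callable[[str], str]


def apply_rules(query: str, rules: List[QueryRule]) -> str:
    """Apply every rule whose trigger term occurs in the original query."""
    terms = query.lower().split()
    for rule in rules:
        if rule.trigger in terms:
            query = rule.action(query)
    return query


# Illustrative rule: when the user types "pdf", add a file type restriction.
rules = [QueryRule("pdf", lambda q: q + " filetype:pdf")]

print(apply_rules("budget pdf", rules))  # budget pdf filetype:pdf
print(apply_rules("budget", rules))      # budget
```

Every query has to be checked against the configured rules before it reaches the index, which is one way to picture where the extra latency comes from.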
27. Query health reports
• Reports: Trend, Overall, Latency in main flow, Latency in each subflow, Index times, etc.
[Screenshot: latency per processing node in SharePoint flow.]
28. Analytics processing component
• Analyzes crawled items and search usage
• Updates index without re-indexing documents
• Result: relevance becomes self-learning
– Also: search reports and recommendations
[Diagram: Link and Analytics Reporting databases.]
29. Type 1: Search analytics
Influences relevance:
Type                Description
Anchor processing   Comparable to Google PageRank.
Click Distance      Number of clicks to an authoritative page.
Search clicks       Keeps track of how users click in the results.
Used in search center:
Type                Description
Social tags         Tags that users apply to content. Not used by default, but could be integrated as e.g. refiners.
Social distance     Used for sorting in People search.
Deep links          Subsites that users click on are added as deep links on the top-site result.
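Click distance, described above as the number of clicks to an authoritative page, can be pictured as a breadth-first search over the link graph. The following is a minimal sketch under that reading, not the actual analytics implementation; the page names and the `click_distance` function are illustrative.

```python
from collections import deque


def click_distance(links, authoritative):
    """Fewest clicks needed to reach each page from any authoritative page.

    `links` maps a page to the pages it links to. Pages that cannot be
    reached from an authoritative page are left out of the result.
    """
    dist = {page: 0 for page in authoritative}
    queue = deque(authoritative)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in dist:  # first visit is the shortest path in BFS
                dist[target] = dist[page] + 1
                queue.append(target)
    return dist


links = {"home": ["hr", "it"], "hr": ["policy"], "policy": ["home"]}
print(click_distance(links, ["home"]))
# {'home': 0, 'hr': 1, 'it': 1, 'policy': 2}
```

A lower number means the page sits closer to an authoritative page, which is the signal a ranking model could then reward.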
30. Type 2: SP usage analytics
• Usage counts
– Opened and viewed items
– From all of SharePoint, not just search results
– Improves relevance
• Activity ranking
– Looks for trends and boosts “hot” items
• Recommendations
– Looks for usage patterns within a site
– “People who viewed this also viewed…”
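The "people who viewed this also viewed" idea can be sketched as simple co-occurrence counting over view sessions. This is an assumption about the general technique, not a description of how SharePoint computes recommendations; the session data and function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def co_view_counts(sessions):
    """Count, per item, how often every other item was viewed by the same user."""
    counts = defaultdict(Counter)
    for viewed in sessions:
        # Each ordered pair (a, b) means "someone who viewed a also viewed b".
        for a, b in permutations(set(viewed), 2):
            counts[a][b] += 1
    return counts


def recommend(counts, item, n=3):
    """Return up to n items most often viewed together with `item`."""
    return [other for other, _ in counts.get(item, Counter()).most_common(n)]


sessions = [
    ["budget.xlsx", "forecast.xlsx"],
    ["budget.xlsx", "forecast.xlsx", "plan.docx"],
    ["budget.xlsx", "travel.docx"],
]
counts = co_view_counts(sessions)
print(recommend(counts, "budget.xlsx", n=1))  # ['forecast.xlsx']
```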
31. Search reports
• Self-learning relevance aside, never underestimate manual effort!
– Query rules, synonyms, boosts, etc.
• Automatic reports:
– Number of queries
– Top queries
– Abandoned queries
– No-result queries
– Query rule usage
32. Search administration component
• Provisions other search components
• Talks to the Admin database on behalf of the Crawl, Content processing and Query processing components
• In previous FAST products, it was impossible to make the admin component redundant
– Not the case in SP2013!
– Scale appropriately
33. Hardware properties
Component CPU Memory Disk I/O Network
Crawl Medium Medium Medium High
Content processing High High Medium
Index High High High Medium
Query processing Low Medium Medium
Analytics processing Medium Medium Medium High
Search administration Low Low Low Low
• Special cases
– The crawler temporarily stores files on disk
– Memory usage of the admin component increases with topology size
34. Changes in HW requirements
• Before:
– I/O-bound, lots of IOPS!
– VMs not recommended
– Often issues with SANs
– Thresholds: 15M items/server, tested at 500M items
• In SP2013:
– Still I/O-bound, but VMs are fine and SANs are fine!
– More RAM required, but lower indexing latency and lower search times
– Thresholds: 10M items/server, tested at 500M items
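For capacity planning, the 10M items/server threshold above translates into a simple back-of-the-envelope calculation. The sketch below is only that: the `replicas` parameter (extra copies of each index server for redundancy) is an assumption added here for illustration, and real sizing depends on far more than item count.

```python
ITEMS_PER_SERVER = 10_000_000  # threshold quoted on the slide for SP2013


def index_servers_needed(total_items, replicas=1):
    """Estimate index servers: one per started 10M items, times replicas.

    `replicas` is a hypothetical knob for redundancy, not taken from the slide.
    """
    servers = -(-total_items // ITEMS_PER_SERVER)  # integer ceiling division
    return max(servers, 1) * replicas


print(index_servers_needed(25_000_000))              # 3
print(index_servers_needed(25_000_000, replicas=2))  # 6
print(index_servers_needed(500_000_000))             # 50
```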
35. A note on RAM consumption
• Search is a BIG thief of RAM in SP13
• Memory limit configurable in:
<15 hive>\Search\Runtime\1.0\noderunner.exe.config
– Warning: Components may crash at limit
• Safer options:
– Decrease the memory limit for the Distributed Cache service.
– Tell your boss: “RAM is cheap. I’m not!”