Making Search Better: Managing Relevancy
3. Join us right after the event at Firehouse
Grill for a free drink, kindly provided by
AvePoint and Rackspace! 1765 East
Bayshore Road East Palo Alto, CA 94303
(Next to Nordstrom Rack).
Drinks to be provided by…..
4. Miles Kehoe, New Idea Engineering, Inc.
miles.kehoe@ideaeng.com, 408-446-3460 / 408-828-4592
Mark Bennett, New Idea Engineering, Inc.
mbennett@ideaeng.com, 408-446-3460 / 408-829-6513
Chris Fernandez, New Idea Engineering, Inc.
chris.fernandez@ideaeng.com, 408-446-3460 / 650-279-8343
Evan Sayer, New Idea Engineering, Inc.
Evan.sayer@ideaeng.com, 408-446-3460 /
6. New Idea Engineering Inc. founded in 1996
◦ The Business & Technology of Search
Vendor-neutral search expertise:
◦ Commercial and Open Source Projects
◦ Evaluation, Selection, Implementation, Ongoing Mgmt
Search analysts, developers, consultants, authors:
◦ Blog: www.enterprisesearchblog.com
◦ Presentations at SPTechCon / ESS / KMWorld / …
◦ Authors of Professional Microsoft Search
12. Some Assembly Expected – often Required
The Platform
Most search platforms are general purpose
Vendors expect you to customize platform
The OOB platform works*
The Demo
Sales staff work from a script
Demos are scripted and use easy, clean data
“That’s the product I want!”
A POC using your own content is the best criterion
*“Any sufficiently advanced demo
is indistinguishable from product”
13. User Requirements Drive Product Capability
Early search: exact match
Users pushed for (and got):
Plurals/Stemming
Soundex
Document filters
Fielded search
Boolean Operators & advanced query syntax
Synonyms and thesaurus support
Security
Faceted navigation
…
14. But Search is Tough to get Right
Search touches everything:
All repositories/formats
Structured and unstructured
Has to respect security
Expected to co-exist with other repositories
Many ‘near duplicate’ documents
Content changing in real time
And it looks so easy on the ’net and in the demo
- and it has to have sub-second response!
16. Simple Search Model
The search user sees search like this:
Query “health” → Search Kernel → Result List
Did you mean: Health Services
1. Health and smoking
2. High risk behaviors
…
As a search manager, you see the magic behind the curtain:
Crawler/Indexer
Search
Index
18. Anatomy of a search engine
Inverted ‘table-like’ structure
Word list: apple, autonomy, cisco, digital, google, hewlett, intel, microsoft
Extracted fields (metadata):
New printer ships    Jan   20K    PressRel.TXT
iPhone top hit       Dec   12K    Music.PDF
8 core chip          Apr   128K   News.DOCX
Snow Leopard         Dec   8K     Jdbc:12345
Win 8 Launch         Aug   39K    Info.XPS
Content repository
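The inverted structure on this slide can be illustrated with a minimal sketch. It assumes a plain word-to-document mapping with AND matching; the document texts below are hypothetical (only the file names echo the slide), and a real engine also stores positions, fields, and ranking data.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each word to the sorted list of document keys containing it."""
    postings = defaultdict(set)
    for key, text in docs.items():
        for word in tokenize(text):
            postings[word].add(key)
    return {word: sorted(keys) for word, keys in postings.items()}

def search(index, query):
    """Return the keys of documents that contain every query word."""
    words = tokenize(query)
    if not words:
        return []
    result = set(index.get(words[0], []))
    for word in words[1:]:
        result &= set(index.get(word, []))
    return sorted(result)

# Hypothetical document texts, for illustration only
docs = {
    "PressRel.TXT": "Hewlett Packard: new printer ships",
    "Music.PDF": "Apple iPhone top hit",
    "News.DOCX": "Intel 8 core chip",
}
index = build_index(docs)
hits = search(index, "apple iphone")
```

Looking up a word is a dictionary access rather than a scan of every document, which is what makes sub-second response feasible on large collections.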
19. Site Design
◦ Good planning is a key step in search!
Indexing
◦ Improve content prior to/during indexing
Query Pre-Processing
◦ Enhance the user query
Maximize Kernel Capability
◦ Use all that your platform supports
Post Processing
◦ Enhance result set
20. Site Design: Part 1
http://sales/eastern_region
http://sales/western_region
http://sales/international
22. The indexing process
Start with a ‘key’: a file name, URL, DB row
Process the document:
fetch the document
identify the format
convert into text
recognize the language
apply indexed synonyms
extract entities
Feed the doc into the indexer
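The steps above can be sketched roughly as follows. This is a minimal illustration, not any vendor's pipeline: the key is assumed to be a local file path, the format is guessed from the extension, only plain text is converted, and the "indexer" is a plain dictionary. Language recognition, synonyms, and entity extraction are left out here.

```python
import mimetypes
import tempfile
from pathlib import Path

def identify_format(key):
    """Guess the format from the key's extension (an assumption; real
    platforms usually inspect the file contents as well)."""
    mime, _ = mimetypes.guess_type(key)
    return mime or "application/octet-stream"

def convert_to_text(raw, mime):
    """Only plain text is handled; other formats need a document filter."""
    if mime.startswith("text/"):
        return raw.decode("utf-8", errors="replace")
    raise ValueError(f"no document filter for {mime}")

def process(key, index):
    """Run one document through fetch, identify, convert, and feed."""
    raw = Path(key).read_bytes()            # fetch the document
    mime = identify_format(key)             # identify the format
    text = convert_to_text(raw, mime)       # convert into text
    index[key] = {"format": mime, "text": text}  # feed the indexer
    return index[key]

# Usage with a temporary, hypothetical document
with tempfile.TemporaryDirectory() as tmp:
    key = str(Path(tmp) / "note.txt")
    Path(key).write_text("Health and smoking", encoding="utf-8")
    index = {}
    record = process(key, index)
```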
24. Enhance the quality of the content
◦ Improve content prior to indexing
◦ Add synonym terms for indexing
◦ Provide/extract entities for facets
◦ Add content where it makes sense
(Context of user and document)
◦ How? FS4SP pipeline; IDOL IDX; Exalead API…
◦ Remember: speed is of the essence!
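As a rough illustration of index-time enrichment, the sketch below adds synonym terms and pulls out one kind of entity to use as a facet. The synonym table, the part-number pattern, and the field names are all hypothetical; they do not reflect the FS4SP, IDOL, or Exalead interfaces.

```python
import re

# Hypothetical synonym table and entity pattern, for illustration only
SYNONYMS = {"laptop": ["notebook"], "tv": ["television"]}
PART_NUMBER = re.compile(r"\b[A-Z]{2}-\d{4}\b")

def enrich(doc):
    """Return a copy of doc with extra index terms and a facet field."""
    words = re.findall(r"[a-z0-9]+", doc["text"].lower())
    extra = []
    for word in words:
        for synonym in SYNONYMS.get(word, []):
            # Skip synonyms the document already contains
            if synonym not in words and synonym not in extra:
                extra.append(synonym)
    enriched = dict(doc)
    enriched["synonym_terms"] = extra
    enriched["part_numbers"] = sorted(set(PART_NUMBER.findall(doc["text"])))
    return enriched

doc = {"key": "spec.txt",
       "text": "Laptop dock AB-1234 also fits TV mount AB-1234"}
result = enrich(doc)
```

Both steps are simple lookups and a single regular-expression pass, in keeping with the reminder that enrichment must not slow indexing down.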
25. Enhance the quality of the query
◦ Autocomplete
◦ Evaluate/parse the query
Recognize any special format
(name, product number)
Inappropriate vocabulary?
◦ Expand query using platform query language
◦ Is it a best bet term?
Think ‘Siri’
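A minimal sketch of this kind of query pre-processing, under stated assumptions: the patterns, synonym table, best-bet table, and intranet URL are hypothetical, and the generated AND/OR syntax is generic rather than any specific platform's query language.

```python
import re

# All tables and patterns below are hypothetical examples
PART_NUMBER = re.compile(r"^[A-Z]{2}-\d{4}$", re.IGNORECASE)
PERSON_NAME = re.compile(r"^[A-Z][a-z]+ [A-Z][a-z]+$")  # naive heuristic
SYNONYMS = {"health": ["wellness"], "car": ["automobile"]}
BEST_BETS = {"benefits": "http://intranet/hr/benefits"}

def classify(query):
    """Recognize special query formats before treating it as keywords."""
    q = query.strip()
    if PART_NUMBER.match(q):
        return "part_number"
    if PERSON_NAME.match(q):
        return "person_name"
    return "keywords"

def expand(query):
    """Rewrite keywords as a boolean query with OR-ed synonyms."""
    clauses = []
    for word in query.lower().split():
        alternatives = [word] + SYNONYMS.get(word, [])
        if len(alternatives) > 1:
            clauses.append("(" + " OR ".join(alternatives) + ")")
        else:
            clauses.append(word)
    return " AND ".join(clauses)

def preprocess(query):
    """Classify the query, expand it if appropriate, look up a best bet."""
    kind = classify(query)
    return {
        "kind": kind,
        "query": expand(query) if kind == "keywords" else query.strip(),
        "best_bet": BEST_BETS.get(query.strip().lower()),
    }

cooked = preprocess("health services")
```

The capitalized two-word name test is deliberately crude (it would also match a title such as "Health Services"); a real deployment would check the query against a directory of people instead.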
26. Note: Scriptaculous.js includes this style of Autocomplete code.
Pingar also includes this in their commercial library of tools.
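For readers who want to see the idea behind autocomplete, here is a small server-side sketch of prefix suggestion over a sorted term list. It is not the Scriptaculous or Pingar implementation; the function names and the sample terms are assumptions made for illustration.

```python
from bisect import bisect_left

def build_suggester(terms):
    """Lowercase, de-duplicate, and sort the candidate terms."""
    return sorted({term.lower() for term in terms})

def suggest(sorted_terms, prefix, limit=5):
    """Return up to `limit` terms that start with the typed prefix."""
    prefix = prefix.lower()
    if not prefix:
        return []
    # Binary search finds the first term that could match the prefix
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break
        matches.append(term)
        if len(matches) == limit:
            break
    return matches

terms = build_suggester([
    "health", "Health", "health services",
    "healthy eating", "high risk behaviors",
])
suggestions = suggest(terms, "hea")
```

In practice the candidate terms would usually come from query logs or the index's own word list, ordered by popularity rather than alphabetically.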
27. Use all that your platform supports:
◦ Spell check (custom dictionary)
◦ Thesaurus/Synonyms
◦ Stemming/Plurals
◦ Fielded search/relevance
Call out to likely federated sources
28. Processing results before the user sees them
◦ Eliminate or collapse duplicate documents
◦ Insert best bets, spelling suggestions
◦ Don’t hesitate to insert your own smarts
◦ Assemble sources (tabular view, clusters)
◦ If it looks like:
a person name, show contact info
a part number, show a link to the product page
a common term, offer to disambiguate
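Two of the post-processing steps above, collapsing duplicates and inserting a best bet, might look like the sketch below. It assumes each hit is a dictionary with hypothetical `url` and `text` fields, and it only catches copies that differ in case or whitespace; true near-duplicate detection needs a fuzzier comparison.

```python
import hashlib
import re

def fingerprint(text):
    """Hash the text with case and whitespace normalized."""
    normalized = re.sub(r"\s+", " ", text.strip().lower())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

def collapse_duplicates(results):
    """Keep the first hit per fingerprint and list the rest under it."""
    seen = {}
    collapsed = []
    for hit in results:
        fp = fingerprint(hit["text"])
        if fp in seen:
            seen[fp]["duplicates"].append(hit["url"])
        else:
            entry = dict(hit, duplicates=[])
            seen[fp] = entry
            collapsed.append(entry)
    return collapsed

def insert_best_bet(results, query, best_bets):
    """Put the editor-chosen URL first and drop its organic copy."""
    bet = best_bets.get(query.strip().lower())
    if bet is None:
        return results
    top = {"url": bet, "text": "", "best_bet": True, "duplicates": []}
    return [top] + [hit for hit in results if hit["url"] != bet]

# Hypothetical result set
results = [
    {"url": "http://sales/eastern_region", "text": "Q3  Sales Report"},
    {"url": "http://sales/western_region", "text": "q3 sales report"},
    {"url": "http://sales/international", "text": "Export rules"},
]
collapsed = collapse_duplicates(results)
final = insert_best_bet(collapsed, "Export",
                        {"export": "http://sales/international"})
```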
34. Emulate Google.com
Simplify search form
Use the context
Regularly monitor and adjust search
Actively encourage user feedback
Encourage and recognize good
metadata at content creation
35. Add quality metadata tools
Wide range of tools
Costs vary based on solution
◦ Use ‘behavior-based’ metadata
◦ Use custom-created taxonomies
◦ Use automated tools
36. Some solutions from NIE Partners
Pingar:
Automated taxonomy, entity extraction,
summarization and more
WAND:
Vertical market human created taxonomies
Concept Searching:
Seeded automatic taxonomy with
human validation
Custom Taxonomies:
Expert creates custom taxonomy for
your company
37. Key Take-Aways
Quantify the metric for your site(s)
Encourage tagging at content creation
Keep search and results simple
Use the context (query cooking)
Facilitate and take action on feedback
Ongoing monitoring and tuning is required
38. Miles Kehoe, New Idea Engineering, Inc.
miles.kehoe@ideaeng.com, 408-446-3460 / 408-828-4592
Chris Fernandez, New Idea Engineering, Inc.
chris.fernandez@ideaeng.com, 408-446-3460 / 650-279-8343
Mark Bennett, New Idea Engineering, Inc.
mbennett@ideaeng.com, 408-446-3460 / 408-829-6513
Editor's notes
Copyright (2012) Miles Kehoe/New Idea Engineering Inc