1
The Enterprise Search Market in a
Nutshell
Iain Fletcher
ifletcher@searchtechnologies.com
October 19, 2015
ICIC 2015, Nice
2
Agenda
• About Search Technologies (30 seconds)
• The enterprise search market
• Likely future architectures for supporting
important search applications
3
Search Technologies: Background
• Founded 2005
• 180 employees
• 600+ customers
• Independent consulting company
• Focus on enterprise search
• Working with all leading platforms
Offices: San Diego; London, UK; San Jose, CR; Cincinnati; San Francisco; Washington (HQ); Frankfurt, DE; Prague, CZ
4
600+ Customers
5
The Enterprise Search Market
6
High-level Search Engine Classifications
1. Part of a portfolio, many are recently acquired technologies
– E.g. SharePoint/FAST, HP Autonomy, IBM/Vivisimo, Dassault/Exalead,
Oracle/Endeca
2. Stand-alone specialists, often deployed to address specific apps or
challenges
– E.g. GSA, Coveo, Attivio, Sinequa, Recommind
3. Open source, with or without support or proprietary add-ons
– Raw: Lucene, Solr, Elasticsearch
– With support/add-ons: LucidWorks, Cloudera Search, Elastic ELK
4. Cloud-based services, typically based on open source technology
– E.g. Amazon CloudSearch (Solr), Microsoft Azure Search (Elasticsearch)
7
Market Observations
The dominant market share is currently with
SharePoint, open source, and the GSA
• SharePoint 2013 search is credible, and bundled
– Search teams are under pressure to use it, or to provide a
compelling reason to do otherwise
• Solr and Elasticsearch are robust and reliable
– Thanks to very widespread deployment
• The Google brand sells – and a lot of GSAs have been
shipped during the past few years
8
Functional Observations
• Core indexing / searching is generally fast and reliable
– Search is a maturing / converging technology
• Key differences remain in peripheral functionality, such as
content processing prior to indexing, and query processing
– Coveo, Attivio, Sinequa etc. have well-developed indexing
pipelines, UI tools, and a range of data connectors
– SharePoint and GSA are delivered with limited content
processing functionality and limited connectivity
– Solr, Elasticsearch, AWS CloudSearch and Azure Search don’t
provide a formal indexing pipeline, UI, or connectors
9
Further Observations
• The search engines with less focus on peripheral issues
such as content processing and connectivity have dominant
market share
• Connectivity is often challenging, especially when
combined with continual data growth, and document-level
security requirements
• The movement of data sets to the cloud adds further
complexity for enterprise search systems
– Hybrid indexing environments will be with us for some years
– Some content sets in the cloud, some behind the firewall
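To illustrate the document-level security requirement mentioned above, here is a minimal sketch of trimming search results by access control list. It assumes each indexed document carries the groups allowed to read it; the `IndexedDoc` class, its `allowed_groups` field, and the `security_trim` function are hypothetical names for illustration, not part of any platform named in these slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedDoc:
    doc_id: str
    title: str
    allowed_groups: frozenset  # hypothetical ACL field captured at index time


def security_trim(hits, user_groups):
    """Keep only hits whose ACL shares at least one group with the user."""
    groups = frozenset(user_groups)
    return [hit for hit in hits if hit.allowed_groups & groups]


hits = [
    IndexedDoc("1", "Public handbook", frozenset({"everyone"})),
    IndexedDoc("2", "Salary review", frozenset({"hr"})),
]
visible = security_trim(hits, {"everyone", "engineering"})
print([doc.doc_id for doc in visible])
```

In practice the ACLs have to be kept in sync with each source repository, which is one reason connectivity and security are harder than core indexing.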
10
Great Search requires Attention to Detail
E.g. in content processing
prior to indexing
• Normalization
– Names, dates, synonyms….
• Entity identification and resolution
• Categorization
• Document vector extraction
• Document splitting and concatenation
• Link & popularity analysis
• Dupe & near-dupe detection
[Diagram: Index, labelled with security, category, and metadata]
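Two of the content processing steps listed on this slide, normalization and near-duplicate detection, can be sketched as follows. This is an illustrative approach using word shingles and Jaccard similarity, not necessarily the method used by any product named in this deck; the function names and the 0.8 threshold are assumptions.

```python
import re


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def shingles(text, k=3):
    """Return the set of k-word shingles of the normalized text."""
    words = normalize(text).split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(doc_a, doc_b, threshold=0.8):
    """Flag two documents whose shingle sets overlap above the threshold."""
    return jaccard(shingles(doc_a), shingles(doc_b)) >= threshold
```

Pairwise comparison like this does not scale to large collections; at volume, a hashing scheme such as MinHash is the usual substitute.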
11
Future Directions for Search
So what will search architectures look like in the future?
Important influences:
• The business need for organizational and analytical agility
• The convergence of search and (“big data”) analytics
• Continual growth in data volumes, and evolution in
repository / storage fashions
12
Converging Architectures
Let’s take a brief look at:
1. The “Big Data Architecture”, as evangelized by IBM,
Cloudera, etc.
2. Recent Search Architectures
Background Info
13
The Big Data Architecture
Designed for Structured Data
14
The Traditional Search Architecture
[Diagram: Content Sources (Employee Directory, CMS, File Share, etc.) feed an Integrated Search Engine (Connectors, Index Pipeline, Search Index), which serves the UI]
Designed for Unstructured Content
15
The Traditional Search Architecture
[Diagram repeated: Content Sources, Integrated Search Engine (Connectors, Index Pipeline, Search Index), UI]
• As data volumes grow, re-indexing
becomes challenging
• The rate at which content can be
acquired from repositories is usually the
bottleneck
Designed for Unstructured Content
16
The Traditional Search Architecture
[Diagram repeated: Content Sources, Integrated Search Engine (Connectors, Index Pipeline, Search Index), UI]
• A few documents per second?
• There are only about 2.6 million seconds in a
month
RE-INDEX
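The arithmetic behind this slide can be worked through in a few lines. The 30-day month matches the slide's 2.6 million figure; the rate of 3 documents per second and the 50-million-document corpus are assumed values for illustration only.

```python
SECONDS_PER_DAY = 24 * 60 * 60            # 86,400
SECONDS_PER_MONTH = 30 * SECONDS_PER_DAY  # 2,592,000, roughly 2.6 million


def docs_per_month(docs_per_second):
    """Documents acquired in a 30-day month at a sustained rate."""
    return int(docs_per_second * SECONDS_PER_MONTH)


def days_to_reindex(corpus_size, docs_per_second):
    """Days needed to re-acquire a whole corpus from its repositories."""
    return corpus_size / docs_per_second / SECONDS_PER_DAY


print(docs_per_month(3))                          # 7776000
print(round(days_to_reindex(50_000_000, 3), 1))   # 192.9
```

At an assumed 3 documents per second, a month of continuous acquisition covers under 8 million documents, so a full re-index of a large corpus straight from the repositories can take many months.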
17
A Better Search Architecture
• Re-indexing rates greatly improved
• “Touch-time” with repositories can be managed autonomously
[Diagram: Content Sources (Employee Directory, CMS, etc.), Connectors, Content Processing, Secure Cache, and a Search Engine (Index Pipeline, Search Index); labels: RE-INDEX, Iterative Development]
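The idea on this slide, re-indexing from a cache instead of re-acquiring content from the repositories, can be sketched as below. It is a minimal in-memory illustration; `ContentCache`, `acquire`, and `reindex` are hypothetical names, and a real cache would also need persistence, security, and incremental updates.

```python
class ContentCache:
    """In-memory stand-in for a cache placed between connectors and indexing."""

    def __init__(self):
        self._raw = {}

    def store(self, doc_id, raw_text):
        self._raw[doc_id] = raw_text

    def items(self):
        return self._raw.items()


def acquire(repository, cache):
    """Connector step: the only code that touches the source repository."""
    for doc_id, raw_text in repository.items():
        cache.store(doc_id, raw_text)
    return len(repository)


def reindex(cache, process):
    """Rebuild an index from cached content; the repository is not contacted."""
    return {doc_id: process(raw) for doc_id, raw in cache.items()}


repository = {"doc-1": "  Hello World  "}
cache = ContentCache()
acquire(repository, cache)

# Two iterations of the processing logic, both served from the cache.
index_v1 = reindex(cache, str.strip)
index_v2 = reindex(cache, lambda text: text.strip().lower())
```

Because each change to the processing logic only re-reads the cache, iterating on content processing no longer depends on how fast the repositories can deliver documents.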
18
The Future Architecture?
[Diagram: Hadoop; Content Sources (Employee Directory, CMS, etc.), Connectors, Content Processing, Secure Cache, and a Search Engine (Index Pipeline, Search Index); labels: RE-INDEX, Iterative Development]
• This environment will encourage ever more sophisticated text analytics
• We expect to see much innovation in text analytics during the next few years
• The deliverable is a better, richer search index
19
An Established Architecture
[Diagram repeated: Hadoop; Content Sources, Connectors, Content Processing, Secure Cache, Search Engine (Index Pipeline, Search Index); labels: RE-INDEX, Iterative Development]
• Google.com has worked something like this
since 2004
20
An Integrated Search/Analytics Architecture
[Diagram: Hadoop with Content Processing, Secure Cache, and Rapid Indexing (Iterative Development); inputs: Content Sources (CMS, File system, etc.) with Connectors, and Data Sources (Data Warehouse, Logfiles, etc.) with ETL; outputs: Search Apps and Analysis Apps]
• Encourages agile exploitation of data and content resources
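As a small illustration of one processed content store serving both kinds of application, the sketch below builds a search index and an analytical aggregate from the same records. The record layout (`id`, `text`, `category`) and the function names are assumptions made for this example.

```python
from collections import Counter, defaultdict


def build_inverted_index(records):
    """Search side: map each lowercase term to the ids of records containing it."""
    index = defaultdict(set)
    for rec in records:
        for term in rec["text"].lower().split():
            index[term].add(rec["id"])
    return index


def count_by_category(records):
    """Analysis side: aggregate the same records by their category field."""
    return Counter(rec["category"] for rec in records)


records = [
    {"id": 1, "text": "Search market report", "category": "research"},
    {"id": 2, "text": "Search index design", "category": "engineering"},
    {"id": 3, "text": "Market share data", "category": "research"},
]

search_index = build_inverted_index(records)
category_counts = count_by_category(records)
```

Both outputs derive from a single processing pass, so improved enrichment benefits the search and analysis applications together.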
21
Summary 1
• Search and Big Data applications are tending towards the
same architecture
• Autonomous connectivity and content processing simplify
and de-risk search projects, if you can get them right
• The foundation of great search is still a clean, rich and
detailed index
• The “search index” itself is a mature technology, almost a
commodity
• Much of the innovation during the next few years will be in
text analytics, and other methods of preparing content
prior to indexing
22
The compulsory analyst quote….
And finally….
“Enterprise Search Can Bring Big Data Within Reach”
• Multiple, purpose-built indexes that are derived from enriched
content are necessary.
http://blogs.gartner.com/darin-stewart/2014/04/01/enterprise-search-can-bring-big-data-within-reach/
* Darin Stewart, Enterprise Search Can Bring Big Data Within Reach, April 2014 Blog
23
The Enterprise Search Market in a
Nutshell
Iain Fletcher
ifletcher@Searchtechnologies.com
October 20, 2015
Questions?
24
Spare Slides
25
Reference Architecture
[Diagram: Content sources, Connectors, Staging Repository, Content Processing Pipelines (Semantics, Text Mining, Quality Metrics) on a Big Data Framework, Indexes, Search Engine (Query parsing), Web Browser]
26
Where is the Focus?
• The Business View
• The Implementation View
[Diagram: each view shows three blocks: Application; Content Capture & Preparation; Data Store / Index]

在线制作约克大学毕业证(yu毕业证)在读证明认证可查
 
APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53
 

The Enterprise Search Market in a Nutshell

  • 1. The Enterprise Search Market in a Nutshell. Iain Fletcher, ifletcher@searchtechnologies.com. October 19, 2015. ICIC 2015, Nice
  • 2. Agenda • About Search Technologies (30 seconds) • The enterprise search market • Likely future architectures for supporting important search applications
  • 3. Search Technologies: Background • Founded 2005 • 180 employees • 600+ customers • Independent consulting company • Focus on enterprise search • Working with all leading platforms. Office locations: San Diego; London, UK; San Jose, CR; Cincinnati; San Francisco; Washington (HQ); Frankfurt, DE; Prague, CZ
  • 6. High-level Search Engine Classifications 1. Part of a portfolio; many are recently acquired technologies – E.g. SharePoint/FAST, HP Autonomy, IBM/Vivisimo, Dassault/Exalead, Oracle/Endeca 2. Stand-alone specialists, often deployed to address specific apps or challenges – E.g. GSA, Coveo, Attivio, Sinequa, Recommind 3. Open source, with or without support or proprietary add-ons – Raw: Lucene, Solr, Elasticsearch – With support/add-ons: LucidWorks, Cloudera Search, Elastic ELK 4. Cloud-based services, typically based on open source technology – E.g. Amazon CloudSearch (Solr), Microsoft Azure Search (Elasticsearch)
  • 7. Market Observations: The dominant market share is currently with SharePoint, open source, and the GSA • SharePoint 2013 search is credible, and bundled – Search teams are under pressure to use it, or to provide a compelling reason to do otherwise • Solr and Elasticsearch are robust and reliable – Thanks to very widespread deployment • The Google brand sells, and a lot of GSAs have been shipped during the past few years
  • 8. Functional Observations • Core indexing / searching is generally fast and reliable – Search is a maturing / converging technology • Key differences remain in peripheral functionality, such as content processing prior to indexing, and query processing – Coveo, Attivio, Sinequa etc. have well-developed indexing pipelines, UI tools, and a range of data connectors – SharePoint and GSA are delivered with limited content processing functionality and limited connectivity – Solr, Elasticsearch, AWS CloudSearch and Azure Search don’t provide a formal indexing pipeline, UI, or connectors
  • 9. Further Observations • The search engines with less focus on peripheral issues such as content processing and connectivity have the dominant market share • Connectivity is often challenging, especially when combined with continual data growth and document-level security requirements • The movement of data sets to the cloud adds further complexity for enterprise search systems – Hybrid indexing environments will be with us for some years – Some content sets in the cloud, some behind the firewall
  • 10. Great Search Requires Attention to Detail. E.g. in content processing prior to indexing: • Normalization – Names, dates, synonyms… • Entity identification and resolution • Categorization • Document vector extraction • Document splitting and concatenation • Link & popularity analysis • Dupe & near-dupe detection. Diagram labels: index; security; category; metadata
  • 11. Future Directions for Search. So what will search architectures look like in the future? Important influences: • The business need for organizational and analytical agility • The convergence of search and (“big data”) analytics • Continual growth in data volumes, and evolution in repository / storage fashions
  • 12. Converging Architectures (Background Info). Let’s take a brief look at: 1. The “Big Data Architecture”, as evangelized by IBM, Cloudera, etc. 2. Recent Search Architectures
  • 13. The Big Data Architecture: Designed for Structured Data
  • 14. The Traditional Search Architecture: Designed for Unstructured Content. Diagram labels: Integrated Search Engine; Content Sources; Connectors; Index Pipeline; Search Index; Employee Directory; CMS; File Share; UI; etc.
  • 15. The Traditional Search Architecture: Designed for Unstructured Content (same diagram) • As data volumes grow, re-indexing becomes challenging • The rate at which content can be acquired from repositories is usually the bottleneck
  • 16. The Traditional Search Architecture (same diagram, with RE-INDEX highlighted) • A few documents per second? • There are only 2.6 million seconds in a month
  • 17. A Better Search Architecture • Re-indexing rates greatly improved • “Touch-time” with repositories can be managed autonomously. Diagram labels: Search Engine; Content Sources; Connectors; Index Pipeline; Search Index; Employee Directory; CMS; etc.; RE-INDEX; Content Processing; Secure Cache; Iterative Development
  • 18. The Future Architecture? • This environment will encourage ever more sophisticated text analytics • We expect to see much innovation in text analytics during the next few years • The deliverable is a better and richer search index. Diagram labels: Hadoop; Search Engine; Content Sources; Connectors; Index Pipeline; Search Index; Employee Directory; CMS; etc.; RE-INDEX; Content Processing; Secure Cache; Iterative Development
  • 19. An Established Architecture (same diagram as slide 18) • Google.com has worked something like this since 2004
  • 20. An Integrated Search/Analytics Architecture • Encourages agile exploitation of data and content resources. Diagram labels: Hadoop; Content Sources; Connectors; CMS; File system; etc.; Rapid Indexing; Content Processing; Secure Cache; Iterative Development; ETL; Data Sources; Data Warehouse; Logfiles; etc.; Search App.; Search App.; Analysis App.; Analysis App.
  • 21. Summary 1 • Search and Big Data applications are tending towards the same architecture • Autonomous connectivity and content processing simplify and de-risk, if you can get it right • The foundation of great search is still a clean, rich and detailed index • The “search index” itself is a mature technology, almost a commodity • Much of the innovation during the next few years will be in text analytics, and other methods of preparing content prior to indexing
  • 22. And finally… the compulsory analyst quote: “Enterprise Search Can Bring Big Data Within Reach” • Multiple, purpose-built indexes that are derived from enriched content are necessary. Source: Darin Stewart, Enterprise Search Can Bring Big Data Within Reach, April 2014 blog, http://blogs.gartner.com/darin-stewart/2014/04/01/enterprise-search-can-bring-big-data-within-reach/
  • 23. The Enterprise Search Market in a Nutshell. Iain Fletcher, ifletcher@Searchtechnologies.com. October 20, 2015. Questions?
  • 25. Reference Architecture. Diagram labels: Content sources; Connectors; Indexes; Semantics; Text Mining; Quality Metrics; Content Processing Pipelines; Big Data Framework; Indexes; Query parsing; Search Engine; Web Browser; Staging Repository
  • 26. Where is the Focus? • The Business View • The Implementation View. Diagram labels (shown for both views): Application; Content Capture & Preparation; Data Store / Index
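
The re-indexing point on slide 16 ("a few documents per second", "only 2.6 million seconds in a month") can be checked with simple arithmetic. The following is a minimal sketch, not part of the original deck: the function names, the 30-day month, the rate of 3 documents per second, and the 10-million-document corpus are all assumptions chosen purely for illustration.

```python
# Back-of-the-envelope re-indexing arithmetic (illustrative assumptions only).

SECONDS_PER_DAY = 24 * 60 * 60
# Assumes a 30-day month: 2,592,000 seconds, i.e. the slide's "2.6 million".
SECONDS_PER_MONTH = 30 * SECONDS_PER_DAY


def docs_per_month(docs_per_second: float) -> int:
    """Documents that can be acquired in one 30-day month at a steady rate."""
    return int(docs_per_second * SECONDS_PER_MONTH)


def days_to_reindex(total_docs: int, docs_per_second: float) -> float:
    """Days needed to re-acquire a whole corpus from its repositories."""
    if docs_per_second <= 0:
        raise ValueError("docs_per_second must be positive")
    return total_docs / docs_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    rate = 3.0            # hypothetical "few documents per second"
    corpus = 10_000_000   # hypothetical corpus size
    print(f"Seconds in a 30-day month: {SECONDS_PER_MONTH:,}")
    print(f"Documents per month at {rate} docs/s: {docs_per_month(rate):,}")
    print(f"Days to re-index {corpus:,} docs: {days_to_reindex(corpus, rate):.1f}")
```

Under these assumed figures a full re-index takes more than a month, which is the motivation the slides give for decoupling repository access from indexing with a secure cache.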