FIBEP World Media Intelligence Congress, 17-20 November 2015, Vienna
www.wmicongress.com
Session Title: How Infomedia upgraded their closed-source search engine to a fast, scalable and flexible open-source platform
Speaker: Kristian Schou, Infomedia & Charlie Hull, Flax
Twitter: @InfomediaDK @Flaxsearch Web: www.flax.co.uk
2015-11-19
About Infomedia
• Founded in 2003
• The leading Danish provider of media monitoring and media
analysis
• Largest and oldest Danish Media archive with access to
approximately 75 million searchable articles
@_FIBEP #_FIBEP #WMIC15 2015-11-19
About Flax
• Founded in 2001 in Cambridge, U.K.
• Independent, honest advice and analysis
• Expert design & development, Apache Solr committers
• Test-driven relevancy and performance tuning
• Custom training & mentoring for your staff
• Flexible support up to 24/7/365 with SLAs
• Some of our clients:
The situation at Infomedia in 2013
• Very old media monitoring system based on Verity
– Verity was put into production in 2001 at the company that would later become Infomedia
• Slightly less old installation of Autonomy IDOL used for Infomedia’s Media Archive
– Put into production at Infomedia in 2009/10
• Drawbacks:
– Verity at almost max capacity needing constant attention
– Old and complex workflow for receiving and processing articles
– Different platforms for monitoring and archive searches meant we were ‘bi-lingual’,
using two different query languages in-house.
– Verity no longer supported by the owning company (HP)
– Verity not scalable!
What to do?
• Different upgrading options explored throughout 2011-2012
– Upgrade everything to Autonomy IDOL?
– Switch to another commercial search engine?
– Go open-source?
• Recommendations and internal testing drew us to Apache Solr, an
open source enterprise search platform
• Advantages:
– Transparency (going from commercial to open-source)
– Rapid maturity of Solr – development moving very fast
– Large and active Solr Community
– Customizability
– Solr is known to be fast and highly scalable
– No license fees
Defining the project with Flax
• Infomedia searched for Solr expertise in Denmark/Scandinavia
– could not find an option that we were comfortable with
• Introduced to Flax through networking and recommendations
– Experience from similar upgrade projects with Gorkana and AAP
– Very impressed with Flax’s insight, knowledge and credentials
– Actual committer to Apache Solr
• Project began in autumn of 2013 with the goals of:
– Building a completely new search architecture to replace Verity and IDOL
– Defining Infomedia's own query language, IQL, owned and controlled by Infomedia
– Translating old monitoring queries (approx. 8,000) to this new IQL syntax
Replacing Verity
• Verity replaced by Flax Monitor
– Parses IQL to Lucene queries
– Runs on 2 servers
– Uses Luwak, Flax's 'stored search' library:
• Built on Apache Lucene (as is Solr)
• Also used by Bloomberg, Booz Allen Hamilton & others
• In use with 1 million stored searches (some 250,000 characters long) and 1 million stories per day
• 40x faster than Elasticsearch Percolator
• Open source at https://github.com/flaxsearch/luwak
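To put the quoted Luwak figures in perspective, the short calculation below works out the load they imply. The stored-search and stories-per-day numbers come from the slide; the brute-force model (every stored search evaluated against every story) is only an illustrative assumption and not a description of how Luwak or Flax Monitor works.

```python
# Figures quoted on the slide.
STORED_QUERIES = 1_000_000
STORIES_PER_DAY = 1_000_000
SECONDS_PER_DAY = 24 * 60 * 60


def stories_per_second(stories_per_day: int) -> float:
    """Average arrival rate if stories were spread evenly over the day."""
    return stories_per_day / SECONDS_PER_DAY


def brute_force_evaluations_per_day(queries: int, stories_per_day: int) -> int:
    """Query evaluations needed if every query ran against every story.

    This is an assumed worst case, used only to show the scale of the problem.
    """
    return queries * stories_per_day


if __name__ == "__main__":
    rate = stories_per_second(STORIES_PER_DAY)
    total = brute_force_evaluations_per_day(STORED_QUERIES, STORIES_PER_DAY)
    print(f"Average arrival rate: {rate:.1f} stories/second")
    print(f"Brute-force evaluations per day: {total:.0e}")
    print(f"Brute-force evaluations per second: {total / SECONDS_PER_DAY:,.0f}")
```

Under that assumption the system would need roughly ten million query evaluations every second, which is why a stored-search library has to avoid running most queries against most documents.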
Turning search upside down
[Diagram built up over six slides. Labels: Docs, Query, Result, Stored Queries, $$$]
• 1 million stored queries, some 250k long, with complex rules
• 1 million new documents a day
• Results within 5-100ms
• Step 1, Pre Query: the incoming Doc is used to select a Subset of ~200 of the Stored Queries
• Step 2, Search: the Subset is run against the Doc to produce the Result
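The two steps in the diagram (a pre-query that selects a subset of stored queries, then a search of only that subset against the document) can be sketched as follows. This is a toy model under stated assumptions, not Luwak's API: the `Monitor` and `StoredQuery` names are hypothetical, stored queries are simplified to "all of these terms must appear", and each query is filed under one arbitrary required term.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredQuery:
    """Hypothetical stored query: matches when every term is in the document."""
    query_id: str
    all_terms: frozenset


def tokenize(text: str) -> set[str]:
    """Lower-case the text and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


class Monitor:
    """Toy stored-search monitor with a term-based pre-query step."""

    def __init__(self) -> None:
        self._queries: dict[str, StoredQuery] = {}
        # term -> ids of the queries filed under that term
        self._by_term: dict[str, set[str]] = defaultdict(set)

    def register(self, query_id: str, *terms: str) -> None:
        if not terms:
            raise ValueError("a stored query needs at least one term")
        query = StoredQuery(query_id, frozenset(t.lower() for t in terms))
        self._queries[query_id] = query
        # File the query under one required term. A document lacking that
        # term cannot match, so the query can be skipped safely. The choice
        # of term (alphabetically first) is arbitrary in this sketch.
        self._by_term[min(query.all_terms)].add(query_id)

    def presearch(self, doc_terms: set[str]) -> set[str]:
        """Step 1: pick only the stored queries that could possibly match."""
        candidates: set[str] = set()
        for term in doc_terms:
            candidates |= self._by_term.get(term, set())
        return candidates

    def match(self, text: str) -> list[str]:
        """Step 2: run the candidate subset against the document."""
        doc_terms = tokenize(text)
        return sorted(
            qid for qid in self.presearch(doc_terms)
            if self._queries[qid].all_terms <= doc_terms
        )


if __name__ == "__main__":
    monitor = Monitor()
    monitor.register("q1", "solr", "search")
    monitor.register("q2", "verity", "idol")
    monitor.register("q3", "infomedia", "verity")
    print(monitor.match("Infomedia moved its search to Solr"))
```

In the usage example, `q2` is never evaluated because the document contains neither of its terms, while `q3` survives the pre-query step but fails the full match. The pre-query may over-select, but it must never drop a query that would have matched.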
Replacing Autonomy IDOL
• Autonomy IDOL replaced by Apache Solr
− Parses IQL to Lucene queries
− SolrCloud distributes the index & queries across several servers
− Setup: 75 million documents hosted on 8 servers, each with 6 cores, 24 GB memory and 125 GB storage
− This setup is doubled to have full redundancy
− Features added to standard Solr by Flax:
• Custom highlighting
• Framework to handle multiple languages
• Extended error logging
• Cluster management
• Performance enhancements for complex wildcard queries
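The sizing figures above can be turned into a quick back-of-the-envelope calculation. The sketch below also builds a SolrCloud Collections API `CREATE` request for a comparable layout. The slide does not say how shards and replicas were arranged, or whether the redundancy comes from replicas or from a second cluster, so one shard per server with a replication factor of 2 is an assumption. The host and the collection name `articles` are hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class ClusterSpec:
    documents: int
    servers: int               # servers in one copy of the setup
    cores_per_server: int
    memory_gb_per_server: int
    storage_gb_per_server: int
    copies: int                # 2 = "doubled to have full redundancy"

    def docs_per_server(self) -> float:
        return self.documents / self.servers

    def total_servers(self) -> int:
        return self.servers * self.copies

    def total_memory_gb(self) -> int:
        return self.total_servers() * self.memory_gb_per_server

    def total_storage_gb(self) -> int:
        return self.total_servers() * self.storage_gb_per_server


def create_collection_url(base_url: str, name: str,
                          num_shards: int, replication_factor: int) -> str:
    """Build (but do not send) a Collections API CREATE request URL."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Figures from the slide; `copies=2` models the doubled setup.
    spec = ClusterSpec(75_000_000, 8, 6, 24, 125, copies=2)
    print(f"Documents per server: {spec.docs_per_server():,.0f}")
    print(f"Servers in total: {spec.total_servers()}")
    print(f"Memory in total: {spec.total_memory_gb()} GB")
    print(f"Storage in total: {spec.total_storage_gb()} GB")
    # Assumed layout: one shard per server, two copies of each shard.
    print(create_collection_url("http://localhost:8983/solr", "articles", 8, 2))
```

With these numbers each server holds a little under 10 million documents, and the doubled setup comes to 16 servers.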
Benefits of the project
• Articles indexed and searchable within minutes of receiving them
• New, much smarter tools for constructing and comparing
monitoring queries
• The Flax Monitor is an extremely smart and performant monitoring
solution
• Huge benefits from defining the Infomedia Query Language, IQL
– Extremely enlightening and empowering process to analyze what we actually need from a
query language
– We fully understand and have documented how IQL works
– IQL is designed to match Infomedia’s demands and preferences
– We can revise and expand IQL as new needs and opportunities arise
– Not bound to any search platform. We can take it with us
Learnings/Where are we now?
• A challenging, complex, time-consuming but ultimately rewarding project
– The ripple effect: we have had to revisit and update a lot of legacy systems
– Customization is great, but can also mean more specification
– Open source prevents lock-in but demands investment in education; otherwise it is still just a magic box
– Flax’s expert knowledge has been invaluable
• A successful migration
– More than 90% of Infomedia’s monitoring queries have been migrated to IQL with practically no negative change in precision or recall
• The collaboration with Flax continues
– As Infomedia develops, so do new ideas and feature requests
– A customized open source platform also means continuous improvement
– Currently updating to Solr 5.3
– Still experimenting with different ways to scale our Solr installation
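The migration claim rests on comparing precision and recall before and after translating a query. A minimal sketch of such a comparison is shown below. It assumes a hand-judged set of relevant articles is available for each query; the function names and article ids are hypothetical and this is not Infomedia's actual test harness.

```python
def precision_recall(returned: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision and recall of one result set against judged-relevant ids.

    An empty result set is given precision 0.0 here; conventions vary.
    """
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def migration_delta(old_hits: set[str], new_hits: set[str],
                    relevant: set[str]) -> tuple[float, float]:
    """Change in (precision, recall) going from the old query to the new one."""
    old_p, old_r = precision_recall(old_hits, relevant)
    new_p, new_r = precision_recall(new_hits, relevant)
    return new_p - old_p, new_r - old_r


if __name__ == "__main__":
    relevant = {"a1", "a2", "a3", "a4"}        # judged relevant by an analyst
    old_hits = {"a1", "a2", "a3", "a9"}        # results of the legacy query
    new_hits = {"a1", "a2", "a3", "a4", "a9"}  # results of the translated query
    d_precision, d_recall = migration_delta(old_hits, new_hits, relevant)
    print(f"precision change: {d_precision:+.2f}, recall change: {d_recall:+.2f}")
```

A translated query would count as migrated without loss when neither delta is meaningfully negative across a representative sample of articles.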
Other lessons
• You can also keep your old query language
- Flax have written dtSearch & Verity parsers for Lucene
• Some of your old queries might not be working
- e.g. Verity doesn't always tell you when queries are broken!
• Open source can help future-proof your search
- and you have control of the software
• Engage with the open source community:
- User groups
- Mailing lists
- Contribute back if you can
Thanks for listening
- any questions?
Kristian Schou, Infomedia & Charlie Hull, Flax
@InfomediaDK @Flaxsearch Web: www.flax.co.uk
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 

FIBEP WMIC 2015 - How Infomedia upgraded their closed-source search engine to a fast, scalable and flexible open-source platform

  • 1. FIBEP World Media Intelligence Congress, 17-20 November 2015, Vienna (www.wmicongress.com)
      • Session title: How Infomedia upgraded their closed-source search engine to a fast, scalable and flexible open-source platform
      • Date: 2015-11-19
      • Speakers: Kristian Schou, Infomedia & Charlie Hull, Flax
      • Twitter: @InfomediaDK @Flaxsearch
      • Web: www.flax.co.uk
  • 2. About Infomedia
      • Founded in 2003
      • The leading Danish provider of media monitoring and media analysis
      • Largest and oldest Danish media archive with access to approximately 75 million searchable articles
      @_FIBEP #_FIBEP #WMIC15, 2015-11-19
  • 3. About Flax
      • Founded in 2001 in Cambridge, U.K.
      • Independent, honest advice and analysis
      • Expert design & development, Apache Solr committers
      • Test-driven relevancy and performance tuning
      • Custom training & mentoring for your staff
      • Flexible support up to 24/7/365 with SLAs
      • Some of our clients: (client logos shown on the slide)
  • 4. The situation at Infomedia in 2013
      • Very old media monitoring system based on Verity
          – Verity was put into production in 2001 at the company that would later become Infomedia!
      • Slightly less old installation of Autonomy IDOL used for Infomedia’s Media Archive
          – Put into production at Infomedia in 2009/10
      • Drawbacks:
          – Verity at almost max capacity, needing constant attention
          – Old and complex workflow for receiving and processing articles
          – Different platforms for monitoring and archive searches meant we were ‘bi-lingual’, using two different query languages in-house
          – Verity no longer supported by the owning company (HP)
          – Verity not scalable!
  • 5. What to do?
      • Different upgrading options explored throughout 2011-2012
          – Upgrade everything to Autonomy IDOL?
          – Switch to other commercial search engine?
          – Go open-source?
      • Recommendations and internal testing drew us to Apache Solr, an open source enterprise search platform
      • Advantages:
          – Transparency (going from commercial to open-source)
          – Rapid maturity of Solr: development moving very fast
          – Large and active Solr community
          – Customizability
          – Solr is known to be fast and highly scalable
          – No license fees
  • 6. Defining the project with Flax
      • Infomedia searched for Solr expertise in Denmark/Scandinavia but could not find an option that we were comfortable with
      • Introduced to Flax through networking and recommendations
          – Experience from similar upgrade projects with Gorkana and AAP
          – Very impressed with Flax’s insight, knowledge and credentials
          – Actual committer to Apache Solr
      • Project began in autumn of 2013 with the goals of:
          – Building a completely new search architecture to replace Verity and IDOL
          – Defining Infomedia's own query language, IQL, owned and controlled by Infomedia
          – Translating old monitoring queries (approx. 8,000) to this new IQL syntax
  • 7. Replacing Verity
      • Verity replaced by Flax Monitor
          – Parses IQL to Lucene queries
          – Runs on 2 servers
          – Uses Luwak, Flax's 'stored search' library:
              • Built on Apache Lucene (as is Solr)
              • Also used by Bloomberg, Booz Allen Hamilton & others
              • In use for 1m stored searches (some 250k characters), 1m stories/day
              • 40x faster than Elasticsearch Percolator
              • Open source at https://github.com/flaxsearch/luwak
  • 8-13. Turning search upside down (diagram built up over six slides)
      • Conventional search: Query → Docs → Result
      • Stored search: Doc → Stored Queries → Result
      • 1 million stored queries, some 250k long, with complex rules
      • 1 million new documents a day, each to be matched within 5-100ms
      • Running every document against every stored query is marked as costly ($$$)
      • Two steps: 1. Pre Query selects a Subset (~200); 2. Search produces the Result
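The two steps in the diagram (a pre-query that narrows the stored queries to a small subset, then a full search against only that subset) can be sketched in a few lines. This is an illustration of the general idea only, not Luwak's API or the Flax Monitor implementation: the `QueryIndex` class, its method names, and the "all terms must appear" query model are hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredQuery:
    """Hypothetical stored query: matches when every term occurs in the document."""
    query_id: str
    required_terms: frozenset


class QueryIndex:
    """Toy stored-query matcher with a preselection step (not Luwak's API)."""

    def __init__(self):
        self.queries = {}   # query id -> StoredQuery
        self.by_term = {}   # term -> ids of queries indexed under that term

    def register(self, query_id, terms):
        query = StoredQuery(query_id, frozenset(t.lower() for t in terms))
        if not query.required_terms:
            raise ValueError("a stored query needs at least one term")
        self.queries[query_id] = query
        # Index the query under one of its required terms. A document that
        # lacks this term cannot match, so the lookup never misses a match.
        key = min(query.required_terms)
        self.by_term.setdefault(key, set()).add(query_id)

    def candidates(self, doc_terms):
        """Step 1 (pre-query): cheap lookup that returns a superset of matches."""
        found = set()
        for term in doc_terms:
            found |= self.by_term.get(term, set())
        return found

    def match(self, text):
        """Step 2 (search): run the full check only on the candidate subset."""
        doc_terms = set(text.lower().split())
        subset = self.candidates(doc_terms)
        return sorted(q for q in subset
                      if self.queries[q].required_terms <= doc_terms)


index = QueryIndex()
index.register("q1", ["solr", "denmark"])
index.register("q2", ["verity"])
index.register("q3", ["archive", "percolator"])

doc = "Infomedia in Denmark moved its archive to Solr"
print(index.candidates(set(doc.lower().split())))  # q1 and q3 are candidates
print(index.match(doc))                            # only q1 fully matches
```

In this toy version `q2` is never evaluated for the document, and `q3` is selected but rejected by the full check. The real saving comes from the same effect at scale: most stored queries are skipped before any expensive matching happens.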
  • 14. Replacing Autonomy IDOL
      • Autonomy IDOL replaced by Apache Solr
          – Parses IQL to Lucene queries
          – SolrCloud distributes the index & queries across several servers
          – Setup: 75 million documents hosted on 8 servers, each with 6 cores, 24 GB memory and 125 GB storage
          – This setup is doubled to have full redundancy
          – Features added to standard Solr by Flax:
              • Custom highlighting
              • Framework to handle multiple languages
              • Extended error logging
              • Cluster management
              • Performance enhancements for complex wildcard queries
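As a rough sizing check, the figures on this slide can be turned into per-server and whole-installation totals. The sketch below assumes that documents are spread evenly across the 8 servers and that "doubled" means a second, identical set of 8 servers; neither detail is stated on the slide.

```python
# Figures taken from the slide.
TOTAL_DOCS = 75_000_000
SERVERS_PER_SET = 8
CORES_PER_SERVER = 6
MEMORY_GB_PER_SERVER = 24
STORAGE_GB_PER_SERVER = 125

# Assumption: full redundancy means two identical sets of servers.
REPLICA_SETS = 2


def cluster_totals():
    """Return per-server load and installation-wide hardware totals."""
    servers = SERVERS_PER_SET * REPLICA_SETS
    return {
        # Assumption: documents are distributed evenly within one set.
        "docs_per_server": TOTAL_DOCS // SERVERS_PER_SET,
        "servers": servers,
        "cores": servers * CORES_PER_SERVER,
        "memory_gb": servers * MEMORY_GB_PER_SERVER,
        "storage_gb": servers * STORAGE_GB_PER_SERVER,
    }


for name, value in cluster_totals().items():
    print(f"{name}: {value:,}")
```

Under these assumptions each server holds about 9.4 million documents, and the redundant installation comes to 16 servers with 96 cores, 384 GB of memory and 2,000 GB of storage.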
  • 15. Benefits of the project
      • Articles indexed and searchable within minutes of receiving them
      • New, much smarter tools for constructing and comparing monitoring queries
      • The Flax Monitor is an extremely smart and performant monitoring solution
      • Huge benefits from defining the Infomedia Query Language, IQL
          – Extremely enlightening and empowering process to analyze what we actually need from a query language
          – We fully understand and have documented how IQL works
          – IQL is designed to match Infomedia’s demands and preferences
          – We can revise and expand IQL as new needs and opportunities arrive
          – Not bound to any search platform. We can take it with us
  • 16. Learnings/Where are we now?
      • A challenging, complex, time-consuming but ultimately rewarding project
      • The ripple effect: we have had to revisit and update a lot of legacy systems
      • Customization is great, but can also mean more specification
      • Open source prevents lock-in but demands investment in education, otherwise it is still just a magic box
      • Flax's expert knowledge has been invaluable
      • A successful migration
          – More than 90% of Infomedia’s monitoring queries have been migrated to IQL with practically no negative change in precision or recall
      • The collaboration with Flax continues
          – As Infomedia develops, so do new ideas and feature requests
          – A customized open source platform also means continuous improvement
          – Currently updating to Solr 5.3
          – Still experimenting with different ways to scale our Solr installation
  • 17. Other lessons
      • You can also keep your old query language: Flax have written dtSearch & Verity parsers for Lucene
      • Some of your old queries might not be working, e.g. Verity doesn't always tell you when queries are broken!
      • Open source can help future-proof your search, and you have control of the software
      • Engage with the open source community:
          – User groups
          – Mailing lists
          – Contribute back if you can
  • 18. Thanks for listening - any questions?
      • Kristian Schou, Infomedia & Charlie Hull, Flax
      • Twitter: @InfomediaDK @Flaxsearch
      • Web: www.flax.co.uk
  • 19. Something else you might like
      • Think outside the search box!
      • 2DSearch is a patent-pending, radical alternative to traditional keyword search. Instead of a one-dimensional search box, concepts are expressed and manipulated as objects on a two-dimensional canvas. So you spend less time worrying about Boolean strings, and more time creating semantically transparent queries and effective search strategies.
      • Sign up to gain early access at www.2dsearch.com