SlideShare a Scribd company logo
1 of 53
SYMPOSIUM ON BIAS AND DIVERSITY IN IRA TESTBED FOR DIVERSIFICATON IN SEARCH Koblenz, August 31, 2011 Michael Matthews, Barcelona Media/Yahoo! Research 1
OVERVIEW Introduction to LivingKnowledge Testbed – The Diversity Engine Getting started – Our first application! Adding text analysis Adding multimedia analysis Evaluation Indexing and search Developing applications Future work 2
DIVERSITY ENGINE Provide collections, annotation tools and an evaluation framework to allow for collaborative and comparable research Supports indexing and searching on a wide variety of document annotations including entities, bias, trust, polarity, and multimedia features  Support development of bias and diversity aware applications
ARCHITECTURE Document Collections Analysis Pipeline Index/ Search Application Development NYT Yahoo! News ARC Crawls Evaluation Framework
 DESIGN DECISIONS Use Open Source tools when available Programming Language - Java 1.6 Data format – LK XML Analysis tools Operating System – Linux (any software language) Indexing/Search - Solr GUI – JSP, HTML, JavaScript, CSS 5
LK-XML format.
 DOCUMENT COLLECTIONS Supported Formats -ARC (Internet Memory Crawls) ,Text, HTML. Kyoto, BBN, NYT Collections Testing Examples included with Diversity Engine Large ARCs available from Internet Memory Converters provided for other collections (MPQA, BBN, NYT) that have licensing restrictions 7
 ANALYSIS MODULES 8
 INDEXING/SEARCH Solr Enterprise search platform built on top of Lucene Xml input and output allows for easy integration with Diversity Engine Plug-in framework allows customization Built-in facet capabilities support indexing and searching on annotations Integration Converter from LK XML – Solr XML Plug-in for facet ranking and speed improvements 9
 APPLICATION DEVELOPMENT ,[object Object]
Future Predictor
Media Content Analysis
Support development – coding required!
Real World Problems
HTML Extraction
Scaling to Large Collections
Provenance
Some pluggable GUI components
Examples to ease learning curve10
 APPLICATION DEVELOPMENT 11
 APPLICATION DEVELOPMENT 12
EVALUATION FRAMEWORK ,[object Object]
Evaluates any possible annotation pipeline
Measures correctness and quality
Outputs Precision + Recall
Compares annotation output of pipeline with ground truth data13
 OUR FIRST APPLICATION Download Diversity Engine release from SourceForge  tar xzvf [release file] cd testbed ant build apps/testbed conf/testbed/tutorial-application.xml What happened? 197 text files and 127 images files converted from arc format to LK XML and stored in devapps/example/data/lkxml 2 annotators were run over collection OpenNLP for tokenization, sentence splitting, Pos tags SST named entity recognizer Results stored in devapps/example/data/lkxml Files were converted to Solr xml format and indexed using solr Solr XML stored to devapps/example/data/solr HTML Visualization Files stored in devapps/example/data/html ant deploy-testbed Solr running at http://localthost:8983/solr/ Example app running at http://localhost:8983/testbed/ 14
 EXAMPLE SOLR OUTPUT http://localhost:8983/solr/select/?q=putin 15
 EXAMPLE APPLICATION http://localhost:8983/testbed/results.jsp?query=putin 16
 EXAMPLE DOCUMENT 17
 CONFIGURATION FILE <lk-applicationlogDir="log"appDir="devapps/example"> 	<corpusdir="corpora/examples/smallarc"format="arc"/> 	<image-pipeline> 		<annotators> 		</annotators> 	</image-pipeline> 	<pipeline> 		<annotators> 			<annotatorexec="./opennlp"/> 			<annotatorexec="./sst"/> 		</annotators> 	</pipeline> 	<visualize/> 	<indexersolrHomeDir="solr/solr“ 		solrDataDir="solr/solr/data“ 		converter="conf/testbed/tutorial-lk2solr.xml"/> 	<searcherappTitle="LivingKnowledge  - Example Application" appShortTitle="Example Application" appUrl="http://localhost:8983/solr/"> 	<facets> 			<facetfield="per"description="Person"/> 			<facetfield="loc"description="Location"/> 	</facets> 	</searcher> </lk-application> 18
 TEXT ANALYSIS 	<pipeline> 		<annotators> 			<annotatorexec="./opennlp"/> 			<annotatorexec="./sst"/> 		</annotators> 	</pipeline> 	<pipeline> 		<annotators> 			<annotatorexec="./opennlp"/> 			<annotatorexec="./sst"/> 			<annotatorexec="./facts"/> 			<annotatorexec="./unitn_tagger"/> 			<annotatorexec="./unitn_subjexpr"/> 		</annotators> 	</pipeline> apps/testbed –run pipeline conf/testbed/tutorial-application.xml apps/testbed –run visualization conf/testbed/tutorial-application.xml 19
 TEXT ANALYSIS - FACTS devapps/example/data/lkxml/EA-EUElections2009-euobserver-0729-20090729085530-00000.arc.15521713.facts.xml 20
 TEXT ANALYSIS - FACTS devapps/example/data/html/EA-EUElections2009-euobserver-0729-20090729085530-00000.arc.15521713.html 21
 IMAGE ANALYSIS 	<image-pipeline> 		<annotators> 			<annotatorexec="./soton_haarfacedetector"/> 		</annotators> 	</pipeline> 	<pipeline> 		<annotators> 			<annotatorexec="./opennlp"/> 			<annotatorexec="./sst"/> 			<annotatorexec="./facts"/> 			<annotatorexec="./unitn_tagger"/> 			<annotatorexec="./unitn_subjexpr"/> 			<annotatorexec="./imageannots"/> 		</annotators> 	</pipeline> apps/testbed –run pipeline,image-pipeline –pipeline imageannotsconf/testbed/tutorial-application.xml ls devapps/example/data/lkxml/img/* 22
 ANALYSIS API Documents in LK XML format  Annotators passed a single document directory –They should add annotations for each document in directory Files will have consistent naming convention LkText file = id + “.lktext.xml” LkMedia = id + “.lkmedia.xml” LkAnnotation = id + “.” + annotatorId + “.xml” Annotators will be processed sequentially in the order listed in the XML file Annotators can be written in any language but must run on Linux – Helper classes will exist for Java, but there is no obligation to use them. Add application calling your new annotator to apps directory Add your application to the configuration file as before 23
 ANALYSIS API – JAVA Extend class org.diversityengine.annotator.AbstractAnnotator Implement Methods getName() getType() - TEXT OR IMAGE For Image Analysis implement LkAnnotation getLkAnnotation(ImageDocument document) For Text Analysis implement LkAnnotation getLkAnnotation(TextDocument document) In main, instantiate and call annotator NewAnnotator annotator = new NewAnnotator() annotator.processDirectory(args[0]); Add application calling your new annotator to apps directory Add your application to the configuration file as before 24
EVALUATION Evaluation works with same configuration file. Simply add evaluation element <lk-applicationlogDir="log"appDir="devapps/evaluation"> 	<corpusdir="corpora/evaluation/sst/text/"format="bbn"/> 	<pipeline> 	<annotators> 		<annotatorexec="./sst"/> 	</annotators> 	</pipeline> 	<evaluationevalDir="evaluation/sst/"> 		<evaluatorprovides="ENTITIES" goldDir="corpora/evaluation/sst/gold/" goldAnnotator="sstgold" annotator="sst" /> 	</evaluation> </lk-application> apps/testbed conf/evaluation/sst.xml 25
EVALUATION RESULTS <evaluationgoldDir="/home/mikemat/code/livingknowledge/WP6/testbed/corpora/evaluation/sst/gold/"lkDir="/home/mikemat/code/livingknowledge/WP6/testbed/devapps/evaluation/data/lkxml"annotation="sst"goldAnnotation="sstgold"provides="ENTITIES"> <docs> <docid="WSJ0375"N="19"tp="18"fp="1"fn="1" /> <docid="WSJ0380"N="19"tp="15"fp="4"fn="1" /> <docid="WSJ0376"N="72"tp="61"fp="11"fn="7" /> <docid="WSJ0377"N="26"tp="17"fp="9"fn="6" /> <docid="WSJ0378"N="10"tp="10"fp="0"fn="0" /> <docid="WSJ0379"N="24"tp="19"fp="5"fn="2" /> </docs> <totalsN="170"tp="140"fp="30"fn="17"p="0.8235294117647058"r="0.89171974522293"f="0.8562691131498471" /> </evaluation> cat evaluation/sst/sst.ENTITIES.xml 26
 INDEXING AND SEARCH Search Engines - Traditional Bag-of-words representation Inverted index (words -> documents) for efficiency 10 docs ranked according tf-idf similarity with query Search Engines – Today Much metadata associated with documents Ranking based on 100s of features (date, location, pagerank, click data, etc, personalization) Richer display Facets for exploratory search Answers when appropriate etc.. Many open source options - Lucene/Solr most widely used 27
 APACHE LUCENE/SOLR Lucene/Solr 28
 FACETED SEARCH Diagram by Yonik Seeley 29
FACETED SEACH ,[object Object]
price ranges for product query
related people or locations for news query
Exploratory Search
Show documents that matching the query term and a selected facet
Make inferences not clear from simple document list
Living Knowledge Analysis is modeled very well by facets
Topics as determined by entity and fact extraction
Location and Time diversity dimensions
Opinions as determined by opinion extraction30
LK XML TO SOLR ,[object Object]

More Related Content

What's hot

Opal Hermes - towards representative benchmarks
Opal  Hermes - towards representative benchmarksOpal  Hermes - towards representative benchmarks
Opal Hermes - towards representative benchmarksMichaelEichberg1
 
Automated Evolution of Feature Logging Statement Levels Using Git Histories a...
Automated Evolution of Feature Logging Statement Levels Using Git Histories a...Automated Evolution of Feature Logging Statement Levels Using Git Histories a...
Automated Evolution of Feature Logging Statement Levels Using Git Histories a...Raffi Khatchadourian
 
QTP Automation Testing Tutorial 7
QTP Automation Testing Tutorial 7QTP Automation Testing Tutorial 7
QTP Automation Testing Tutorial 7Akash Tyagi
 
Dita ot pipeline webinar
Dita ot pipeline webinarDita ot pipeline webinar
Dita ot pipeline webinarSuite Solutions
 
Net framework session03
Net framework session03Net framework session03
Net framework session03Vivek chan
 
Intro To C++ - Class 07 - Headers, Interfaces, & Prototypes
Intro To C++ - Class 07 - Headers, Interfaces, & PrototypesIntro To C++ - Class 07 - Headers, Interfaces, & Prototypes
Intro To C++ - Class 07 - Headers, Interfaces, & PrototypesBlue Elephant Consulting
 
EA User Group London 2018 - Extending EA with custom scripts to cater for spe...
EA User Group London 2018 - Extending EA with custom scripts to cater for spe...EA User Group London 2018 - Extending EA with custom scripts to cater for spe...
EA User Group London 2018 - Extending EA with custom scripts to cater for spe...Guillaume Finance
 
Jpylyzer, a validation and feature extraction tool developed in SCAPE project
Jpylyzer, a validation and feature extraction tool developed in SCAPE projectJpylyzer, a validation and feature extraction tool developed in SCAPE project
Jpylyzer, a validation and feature extraction tool developed in SCAPE projectSCAPE Project
 
CustomizingStyleSheetsForHTMLOutputs
CustomizingStyleSheetsForHTMLOutputsCustomizingStyleSheetsForHTMLOutputs
CustomizingStyleSheetsForHTMLOutputsSuite Solutions
 
Understanding and Configuring the FO Plug-in for Generating PDF Files: Part I...
Understanding and Configuring the FO Plug-in for Generating PDF Files: Part I...Understanding and Configuring the FO Plug-in for Generating PDF Files: Part I...
Understanding and Configuring the FO Plug-in for Generating PDF Files: Part I...Suite Solutions
 
The_Little_Jenkinsfile_That_Could
The_Little_Jenkinsfile_That_CouldThe_Little_Jenkinsfile_That_Could
The_Little_Jenkinsfile_That_CouldShelley Lambert
 
QTP Automation Testing Tutorial 2
QTP Automation Testing Tutorial 2QTP Automation Testing Tutorial 2
QTP Automation Testing Tutorial 2Akash Tyagi
 
LDAP Injection & Blind LDAP Injection in Web Applications
LDAP Injection & Blind LDAP Injection in Web ApplicationsLDAP Injection & Blind LDAP Injection in Web Applications
LDAP Injection & Blind LDAP Injection in Web ApplicationsChema Alonso
 
WebLogic's ClassLoaders, Filtering ClassLoader and ClassLoader Analysis Tool
WebLogic's ClassLoaders, Filtering ClassLoader and ClassLoader Analysis ToolWebLogic's ClassLoaders, Filtering ClassLoader and ClassLoader Analysis Tool
WebLogic's ClassLoaders, Filtering ClassLoader and ClassLoader Analysis ToolJeffrey West
 
WebLogic Filtering ClassLoader and ClassLoader Analysis Tool Demo
WebLogic Filtering ClassLoader and ClassLoader Analysis Tool DemoWebLogic Filtering ClassLoader and ClassLoader Analysis Tool Demo
WebLogic Filtering ClassLoader and ClassLoader Analysis Tool DemoJeffrey West
 

What's hot (16)

Opal Hermes - towards representative benchmarks
Opal  Hermes - towards representative benchmarksOpal  Hermes - towards representative benchmarks
Opal Hermes - towards representative benchmarks
 
Automated Evolution of Feature Logging Statement Levels Using Git Histories a...
Automated Evolution of Feature Logging Statement Levels Using Git Histories a...Automated Evolution of Feature Logging Statement Levels Using Git Histories a...
Automated Evolution of Feature Logging Statement Levels Using Git Histories a...
 
QTP Automation Testing Tutorial 7
QTP Automation Testing Tutorial 7QTP Automation Testing Tutorial 7
QTP Automation Testing Tutorial 7
 
Dita ot pipeline webinar
Dita ot pipeline webinarDita ot pipeline webinar
Dita ot pipeline webinar
 
Net framework session03
Net framework session03Net framework session03
Net framework session03
 
Intro To C++ - Class 07 - Headers, Interfaces, & Prototypes
Intro To C++ - Class 07 - Headers, Interfaces, & PrototypesIntro To C++ - Class 07 - Headers, Interfaces, & Prototypes
Intro To C++ - Class 07 - Headers, Interfaces, & Prototypes
 
EA User Group London 2018 - Extending EA with custom scripts to cater for spe...
EA User Group London 2018 - Extending EA with custom scripts to cater for spe...EA User Group London 2018 - Extending EA with custom scripts to cater for spe...
EA User Group London 2018 - Extending EA with custom scripts to cater for spe...
 
Jpylyzer, a validation and feature extraction tool developed in SCAPE project
Jpylyzer, a validation and feature extraction tool developed in SCAPE projectJpylyzer, a validation and feature extraction tool developed in SCAPE project
Jpylyzer, a validation and feature extraction tool developed in SCAPE project
 
CustomizingStyleSheetsForHTMLOutputs
CustomizingStyleSheetsForHTMLOutputsCustomizingStyleSheetsForHTMLOutputs
CustomizingStyleSheetsForHTMLOutputs
 
LOD2: State of Play WP6 - LOD2 Stack Architecture
LOD2: State of Play WP6 - LOD2 Stack ArchitectureLOD2: State of Play WP6 - LOD2 Stack Architecture
LOD2: State of Play WP6 - LOD2 Stack Architecture
 
Understanding and Configuring the FO Plug-in for Generating PDF Files: Part I...
Understanding and Configuring the FO Plug-in for Generating PDF Files: Part I...Understanding and Configuring the FO Plug-in for Generating PDF Files: Part I...
Understanding and Configuring the FO Plug-in for Generating PDF Files: Part I...
 
The_Little_Jenkinsfile_That_Could
The_Little_Jenkinsfile_That_CouldThe_Little_Jenkinsfile_That_Could
The_Little_Jenkinsfile_That_Could
 
QTP Automation Testing Tutorial 2
QTP Automation Testing Tutorial 2QTP Automation Testing Tutorial 2
QTP Automation Testing Tutorial 2
 
LDAP Injection & Blind LDAP Injection in Web Applications
LDAP Injection & Blind LDAP Injection in Web ApplicationsLDAP Injection & Blind LDAP Injection in Web Applications
LDAP Injection & Blind LDAP Injection in Web Applications
 
WebLogic's ClassLoaders, Filtering ClassLoader and ClassLoader Analysis Tool
WebLogic's ClassLoaders, Filtering ClassLoader and ClassLoader Analysis ToolWebLogic's ClassLoaders, Filtering ClassLoader and ClassLoader Analysis Tool
WebLogic's ClassLoaders, Filtering ClassLoader and ClassLoader Analysis Tool
 
WebLogic Filtering ClassLoader and ClassLoader Analysis Tool Demo
WebLogic Filtering ClassLoader and ClassLoader Analysis Tool DemoWebLogic Filtering ClassLoader and ClassLoader Analysis Tool Demo
WebLogic Filtering ClassLoader and ClassLoader Analysis Tool Demo
 

Viewers also liked

A Linear-Algebraic Technique with an Application in Semantic Image Retrieval
A Linear-Algebraic Technique with an Application in Semantic Image RetrievalA Linear-Algebraic Technique with an Application in Semantic Image Retrieval
A Linear-Algebraic Technique with an Application in Semantic Image RetrievalJonathon Hare
 
Southampton Web Science DTC - Innovations in web publishing and services for ...
Southampton Web Science DTC - Innovations in web publishing and services for ...Southampton Web Science DTC - Innovations in web publishing and services for ...
Southampton Web Science DTC - Innovations in web publishing and services for ...David Worlock
 
5 schéma de financement des investissements
5  schéma de financement des investissements5  schéma de financement des investissements
5 schéma de financement des investissementsJean-michel Neugate
 
Mining Events from Multimedia Streams (WAIS Research group seminar June 2014)
Mining Events from Multimedia Streams (WAIS Research group seminar June 2014)Mining Events from Multimedia Streams (WAIS Research group seminar June 2014)
Mining Events from Multimedia Streams (WAIS Research group seminar June 2014)Jonathon Hare
 
University of Southampton StarStream
University of Southampton StarStreamUniversity of Southampton StarStream
University of Southampton StarStreamAlan Scrase
 

Viewers also liked (9)

Economics pp
Economics ppEconomics pp
Economics pp
 
Greentree Theme
Greentree ThemeGreentree Theme
Greentree Theme
 
A Linear-Algebraic Technique with an Application in Semantic Image Retrieval
A Linear-Algebraic Technique with an Application in Semantic Image RetrievalA Linear-Algebraic Technique with an Application in Semantic Image Retrieval
A Linear-Algebraic Technique with an Application in Semantic Image Retrieval
 
Poemes
PoemesPoemes
Poemes
 
Magrana pdi
Magrana pdiMagrana pdi
Magrana pdi
 
Southampton Web Science DTC - Innovations in web publishing and services for ...
Southampton Web Science DTC - Innovations in web publishing and services for ...Southampton Web Science DTC - Innovations in web publishing and services for ...
Southampton Web Science DTC - Innovations in web publishing and services for ...
 
5 schéma de financement des investissements
5  schéma de financement des investissements5  schéma de financement des investissements
5 schéma de financement des investissements
 
Mining Events from Multimedia Streams (WAIS Research group seminar June 2014)
Mining Events from Multimedia Streams (WAIS Research group seminar June 2014)Mining Events from Multimedia Streams (WAIS Research group seminar June 2014)
Mining Events from Multimedia Streams (WAIS Research group seminar June 2014)
 
University of Southampton StarStream
University of Southampton StarStreamUniversity of Southampton StarStream
University of Southampton StarStream
 

Similar to ESSIR LivingKnowledge DiversityEngine tutorial

torque - Automation Testing Tool for C-C++ on Linux
torque -  Automation Testing Tool for C-C++ on Linuxtorque -  Automation Testing Tool for C-C++ on Linux
torque - Automation Testing Tool for C-C++ on LinuxJITENDRA LENKA
 
Must be similar to screenshotsI must be able to run the projects.docx
Must be similar to screenshotsI must be able to run the projects.docxMust be similar to screenshotsI must be able to run the projects.docx
Must be similar to screenshotsI must be able to run the projects.docxherthaweston
 
Introduction to Roslyn and its use in program development
Introduction to Roslyn and its use in program developmentIntroduction to Roslyn and its use in program development
Introduction to Roslyn and its use in program developmentPVS-Studio
 
Understanding and extending p2 for fun and profit
Understanding and extending p2 for fun and profitUnderstanding and extending p2 for fun and profit
Understanding and extending p2 for fun and profitPascal Rapicault
 
Ensuring Software Quality Through Test Automation- Naperville Software Develo...
Ensuring Software Quality Through Test Automation- Naperville Software Develo...Ensuring Software Quality Through Test Automation- Naperville Software Develo...
Ensuring Software Quality Through Test Automation- Naperville Software Develo...LinkCompanyAdmin
 
Generative AI Application Development using LangChain and LangFlow
Generative AI Application Development using LangChain and LangFlowGenerative AI Application Development using LangChain and LangFlow
Generative AI Application Development using LangChain and LangFlowGene Leybzon
 
Building and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache AirflowBuilding and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache AirflowKaxil Naik
 
Android application structure
Android application structureAndroid application structure
Android application structureAlexey Ustenko
 
AWS re:Invent 2016: Building a Platform for Collaborative Scientific Research...
AWS re:Invent 2016: Building a Platform for Collaborative Scientific Research...AWS re:Invent 2016: Building a Platform for Collaborative Scientific Research...
AWS re:Invent 2016: Building a Platform for Collaborative Scientific Research...Amazon Web Services
 
Performance Evaluation of Open Source Data Mining Tools
Performance Evaluation of Open Source Data Mining ToolsPerformance Evaluation of Open Source Data Mining Tools
Performance Evaluation of Open Source Data Mining Toolsijsrd.com
 
Webinar: What's New in Pipeline Pilot 8.5 Collection Update 1?
Webinar: What's New in Pipeline Pilot 8.5 Collection Update 1?Webinar: What's New in Pipeline Pilot 8.5 Collection Update 1?
Webinar: What's New in Pipeline Pilot 8.5 Collection Update 1?BIOVIA
 
OWASP Dependency-Track Introduction
OWASP Dependency-Track IntroductionOWASP Dependency-Track Introduction
OWASP Dependency-Track IntroductionSergey Sotnikov
 
Creation of a Test Bed Environment for Core Java Applications using White Box...
Creation of a Test Bed Environment for Core Java Applications using White Box...Creation of a Test Bed Environment for Core Java Applications using White Box...
Creation of a Test Bed Environment for Core Java Applications using White Box...cscpconf
 
Swap For Dummies Rsp 2007 11 29
Swap For Dummies Rsp 2007 11 29Swap For Dummies Rsp 2007 11 29
Swap For Dummies Rsp 2007 11 29Julie Allinson
 

Similar to ESSIR LivingKnowledge DiversityEngine tutorial (20)

ShwetaKBijay-resume
ShwetaKBijay-resumeShwetaKBijay-resume
ShwetaKBijay-resume
 
torque - Automation Testing Tool for C-C++ on Linux
torque -  Automation Testing Tool for C-C++ on Linuxtorque -  Automation Testing Tool for C-C++ on Linux
torque - Automation Testing Tool for C-C++ on Linux
 
Document Summarizer
Document SummarizerDocument Summarizer
Document Summarizer
 
Cognos Software Development Kit
Cognos Software Development KitCognos Software Development Kit
Cognos Software Development Kit
 
Must be similar to screenshotsI must be able to run the projects.docx
Must be similar to screenshotsI must be able to run the projects.docxMust be similar to screenshotsI must be able to run the projects.docx
Must be similar to screenshotsI must be able to run the projects.docx
 
Introduction to Roslyn and its use in program development
Introduction to Roslyn and its use in program developmentIntroduction to Roslyn and its use in program development
Introduction to Roslyn and its use in program development
 
Apache lucene
Apache luceneApache lucene
Apache lucene
 
Understanding and extending p2 for fun and profit
Understanding and extending p2 for fun and profitUnderstanding and extending p2 for fun and profit
Understanding and extending p2 for fun and profit
 
Ensuring Software Quality Through Test Automation- Naperville Software Develo...
Ensuring Software Quality Through Test Automation- Naperville Software Develo...Ensuring Software Quality Through Test Automation- Naperville Software Develo...
Ensuring Software Quality Through Test Automation- Naperville Software Develo...
 
Resume_Shanthi
Resume_ShanthiResume_Shanthi
Resume_Shanthi
 
Generative AI Application Development using LangChain and LangFlow
Generative AI Application Development using LangChain and LangFlowGenerative AI Application Development using LangChain and LangFlow
Generative AI Application Development using LangChain and LangFlow
 
Building and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache AirflowBuilding and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache Airflow
 
Android application structure
Android application structureAndroid application structure
Android application structure
 
AWS re:Invent 2016: Building a Platform for Collaborative Scientific Research...
AWS re:Invent 2016: Building a Platform for Collaborative Scientific Research...AWS re:Invent 2016: Building a Platform for Collaborative Scientific Research...
AWS re:Invent 2016: Building a Platform for Collaborative Scientific Research...
 
Performance Evaluation of Open Source Data Mining Tools
Performance Evaluation of Open Source Data Mining ToolsPerformance Evaluation of Open Source Data Mining Tools
Performance Evaluation of Open Source Data Mining Tools
 
Webinar: What's New in Pipeline Pilot 8.5 Collection Update 1?
Webinar: What's New in Pipeline Pilot 8.5 Collection Update 1?Webinar: What's New in Pipeline Pilot 8.5 Collection Update 1?
Webinar: What's New in Pipeline Pilot 8.5 Collection Update 1?
 
10071756.ppt
10071756.ppt10071756.ppt
10071756.ppt
 
OWASP Dependency-Track Introduction
OWASP Dependency-Track IntroductionOWASP Dependency-Track Introduction
OWASP Dependency-Track Introduction
 
Creation of a Test Bed Environment for Core Java Applications using White Box...
Creation of a Test Bed Environment for Core Java Applications using White Box...Creation of a Test Bed Environment for Core Java Applications using White Box...
Creation of a Test Bed Environment for Core Java Applications using White Box...
 
Swap For Dummies Rsp 2007 11 29
Swap For Dummies Rsp 2007 11 29Swap For Dummies Rsp 2007 11 29
Swap For Dummies Rsp 2007 11 29
 

More from Jonathon Hare

Scale Saliency: Applications in Visual Matching,Tracking and View-Based Objec...
Scale Saliency: Applications in Visual Matching,Tracking and View-Based Objec...Scale Saliency: Applications in Visual Matching,Tracking and View-Based Objec...
Scale Saliency: Applications in Visual Matching,Tracking and View-Based Objec...Jonathon Hare
 
Content-based image retrieval using a mobile device as a novel interface
Content-based image retrieval using a mobile device as a novel interfaceContent-based image retrieval using a mobile device as a novel interface
Content-based image retrieval using a mobile device as a novel interfaceJonathon Hare
 
IMAGE DIVERSITY ANALYSIS: CONTEXT, OPINION AND BIAS
IMAGE DIVERSITY ANALYSIS: CONTEXT, OPINION AND BIASIMAGE DIVERSITY ANALYSIS: CONTEXT, OPINION AND BIAS
IMAGE DIVERSITY ANALYSIS: CONTEXT, OPINION AND BIASJonathon Hare
 
Bridging the Semantic Gap in Multimedia Information Retrieval: Top-down and B...
Bridging the Semantic Gap in Multimedia Information Retrieval: Top-down and B...Bridging the Semantic Gap in Multimedia Information Retrieval: Top-down and B...
Bridging the Semantic Gap in Multimedia Information Retrieval: Top-down and B...Jonathon Hare
 
OpenIMAJ and ImageTerrier: Java Libraries and Tools for Scalable Multimedia A...
OpenIMAJ and ImageTerrier: Java Libraries and Tools for Scalable Multimedia A...OpenIMAJ and ImageTerrier: Java Libraries and Tools for Scalable Multimedia A...
OpenIMAJ and ImageTerrier: Java Libraries and Tools for Scalable Multimedia A...Jonathon Hare
 
Mind the Gap: Another look at the problem of the semantic gap in image retrieval
Mind the Gap: Another look at the problem of the semantic gap in image retrievalMind the Gap: Another look at the problem of the semantic gap in image retrieval
Mind the Gap: Another look at the problem of the semantic gap in image retrievalJonathon Hare
 
Saliency-based Models of Image Content and their Application to Auto-Annotati...
Saliency-based Models of Image Content and their Application to Auto-Annotati...Saliency-based Models of Image Content and their Application to Auto-Annotati...
Saliency-based Models of Image Content and their Application to Auto-Annotati...Jonathon Hare
 
The Art and Science of Image Retrieval
The Art and Science of Image RetrievalThe Art and Science of Image Retrieval
The Art and Science of Image RetrievalJonathon Hare
 
Searching Images: Recent research at Southampton
Searching Images: Recent research at SouthamptonSearching Images: Recent research at Southampton
Searching Images: Recent research at SouthamptonJonathon Hare
 
Searching Images: Recent research at Southampton
Searching Images: Recent research at SouthamptonSearching Images: Recent research at Southampton
Searching Images: Recent research at SouthamptonJonathon Hare
 
Searching Images: Recent research at Southampton
Searching Images: Recent research at SouthamptonSearching Images: Recent research at Southampton
Searching Images: Recent research at SouthamptonJonathon Hare
 
BUILDING A SCALABLE MULTIMEDIA WEB OBSERVATORY
BUILDING A SCALABLE MULTIMEDIA WEB OBSERVATORYBUILDING A SCALABLE MULTIMEDIA WEB OBSERVATORY
BUILDING A SCALABLE MULTIMEDIA WEB OBSERVATORYJonathon Hare
 
Sharp images and fuzzy concepts: Multimedia retrieval and the semantic gap
Sharp images and fuzzy concepts: Multimedia retrieval and the semantic gapSharp images and fuzzy concepts: Multimedia retrieval and the semantic gap
Sharp images and fuzzy concepts: Multimedia retrieval and the semantic gapJonathon Hare
 
Semantic Retrieval and Automatic Annotation: Linear Transformations, Correlat...
Semantic Retrieval and Automatic Annotation: Linear Transformations, Correlat...Semantic Retrieval and Automatic Annotation: Linear Transformations, Correlat...
Semantic Retrieval and Automatic Annotation: Linear Transformations, Correlat...Jonathon Hare
 
Spot the Dog: An overview of semantic retrieval of unannotated images in the ...
Spot the Dog: An overview of semantic retrieval of unannotated images in the ...Spot the Dog: An overview of semantic retrieval of unannotated images in the ...
Spot the Dog: An overview of semantic retrieval of unannotated images in the ...Jonathon Hare
 
Multimodal Searching and Semantic Spaces: ...or how to find images of Dalmati...
Multimodal Searching and Semantic Spaces: ...or how to find images of Dalmati...Multimodal Searching and Semantic Spaces: ...or how to find images of Dalmati...
Multimodal Searching and Semantic Spaces: ...or how to find images of Dalmati...Jonathon Hare
 
SEWM'14 keynote: Mining Events from Multimedia Streams
SEWM'14 keynote: Mining Events from Multimedia StreamsSEWM'14 keynote: Mining Events from Multimedia Streams
SEWM'14 keynote: Mining Events from Multimedia StreamsJonathon Hare
 
A brief introduction to extracting information from images
A brief introduction to extracting information from imagesA brief introduction to extracting information from images
A brief introduction to extracting information from imagesJonathon Hare
 
WAISFest 2011: Southampton Goggles
WAISFest 2011: Southampton GogglesWAISFest 2011: Southampton Goggles
WAISFest 2011: Southampton GogglesJonathon Hare
 

More from Jonathon Hare (19)

Scale Saliency: Applications in Visual Matching,Tracking and View-Based Objec...
Scale Saliency: Applications in Visual Matching,Tracking and View-Based Objec...Scale Saliency: Applications in Visual Matching,Tracking and View-Based Objec...
Scale Saliency: Applications in Visual Matching,Tracking and View-Based Objec...
 
Content-based image retrieval using a mobile device as a novel interface
Content-based image retrieval using a mobile device as a novel interfaceContent-based image retrieval using a mobile device as a novel interface
Content-based image retrieval using a mobile device as a novel interface
 
IMAGE DIVERSITY ANALYSIS: CONTEXT, OPINION AND BIAS
IMAGE DIVERSITY ANALYSIS: CONTEXT, OPINION AND BIASIMAGE DIVERSITY ANALYSIS: CONTEXT, OPINION AND BIAS
IMAGE DIVERSITY ANALYSIS: CONTEXT, OPINION AND BIAS
 
Bridging the Semantic Gap in Multimedia Information Retrieval: Top-down and B...
Bridging the Semantic Gap in Multimedia Information Retrieval: Top-down and B...Bridging the Semantic Gap in Multimedia Information Retrieval: Top-down and B...
Bridging the Semantic Gap in Multimedia Information Retrieval: Top-down and B...
 
OpenIMAJ and ImageTerrier: Java Libraries and Tools for Scalable Multimedia A...
OpenIMAJ and ImageTerrier: Java Libraries and Tools for Scalable Multimedia A...OpenIMAJ and ImageTerrier: Java Libraries and Tools for Scalable Multimedia A...
OpenIMAJ and ImageTerrier: Java Libraries and Tools for Scalable Multimedia A...
 
Mind the Gap: Another look at the problem of the semantic gap in image retrieval
Mind the Gap: Another look at the problem of the semantic gap in image retrievalMind the Gap: Another look at the problem of the semantic gap in image retrieval
Mind the Gap: Another look at the problem of the semantic gap in image retrieval
 
Saliency-based Models of Image Content and their Application to Auto-Annotati...
Saliency-based Models of Image Content and their Application to Auto-Annotati...Saliency-based Models of Image Content and their Application to Auto-Annotati...
Saliency-based Models of Image Content and their Application to Auto-Annotati...
 
The Art and Science of Image Retrieval
The Art and Science of Image RetrievalThe Art and Science of Image Retrieval
The Art and Science of Image Retrieval
 
Searching Images: Recent research at Southampton
Searching Images: Recent research at SouthamptonSearching Images: Recent research at Southampton
Searching Images: Recent research at Southampton
 
Searching Images: Recent research at Southampton
Searching Images: Recent research at SouthamptonSearching Images: Recent research at Southampton
Searching Images: Recent research at Southampton
 
Searching Images: Recent research at Southampton
Searching Images: Recent research at SouthamptonSearching Images: Recent research at Southampton
Searching Images: Recent research at Southampton
 
BUILDING A SCALABLE MULTIMEDIA WEB OBSERVATORY
BUILDING A SCALABLE MULTIMEDIA WEB OBSERVATORYBUILDING A SCALABLE MULTIMEDIA WEB OBSERVATORY
BUILDING A SCALABLE MULTIMEDIA WEB OBSERVATORY
 
Sharp images and fuzzy concepts: Multimedia retrieval and the semantic gap
Sharp images and fuzzy concepts: Multimedia retrieval and the semantic gapSharp images and fuzzy concepts: Multimedia retrieval and the semantic gap
Sharp images and fuzzy concepts: Multimedia retrieval and the semantic gap
 
Semantic Retrieval and Automatic Annotation: Linear Transformations, Correlat...
Semantic Retrieval and Automatic Annotation: Linear Transformations, Correlat...Semantic Retrieval and Automatic Annotation: Linear Transformations, Correlat...
Semantic Retrieval and Automatic Annotation: Linear Transformations, Correlat...
 
Spot the Dog: An overview of semantic retrieval of unannotated images in the ...
Spot the Dog: An overview of semantic retrieval of unannotated images in the ...Spot the Dog: An overview of semantic retrieval of unannotated images in the ...
Spot the Dog: An overview of semantic retrieval of unannotated images in the ...
 
Multimodal Searching and Semantic Spaces: ...or how to find images of Dalmati...
Multimodal Searching and Semantic Spaces: ...or how to find images of Dalmati...Multimodal Searching and Semantic Spaces: ...or how to find images of Dalmati...
Multimodal Searching and Semantic Spaces: ...or how to find images of Dalmati...
 
SEWM'14 keynote: Mining Events from Multimedia Streams
SEWM'14 keynote: Mining Events from Multimedia StreamsSEWM'14 keynote: Mining Events from Multimedia Streams
SEWM'14 keynote: Mining Events from Multimedia Streams
 
A brief introduction to extracting information from images
A brief introduction to extracting information from imagesA brief introduction to extracting information from images
A brief introduction to extracting information from images
 
WAISFest 2011: Southampton Goggles
WAISFest 2011: Southampton GogglesWAISFest 2011: Southampton Goggles
WAISFest 2011: Southampton Goggles
 

Recently uploaded

SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 

Recently uploaded (20)

SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 

ESSIR LivingKnowledge DiversityEngine tutorial

  • 1. SYMPOSIUM ON BIAS AND DIVERSITY IN IRA TESTBED FOR DIVERSIFICATON IN SEARCH Koblenz, August 31, 2011 Michael Matthews, Barcelona Media/Yahoo! Research 1
  • 2. OVERVIEW Introduction to LivingKnowledge Testbed – The Diversity Engine Getting started – Our first application! Adding text analysis Adding multimedia analysis Evaluation Indexing and search Developing applications Future work 2
  • 3. DIVERSITY ENGINE Provide collections, annotation tools and an evaluation framework to allow for collaborative and comparable research Supports indexing and searching on a wide variety of document annotations including entities, bias, trust, polarity, and multimedia features Support development of bias and diversity aware applications
  • 4. ARCHITECTURE Document Collections Analysis Pipeline Index/ Search Application Development NYT Yahoo! News ARC Crawls Evaluation Framework
  • 5. DESIGN DECISIONS Use Open Source tools when available Programming Language - Java 1.6 Data format – LK XML Analysis tools Operating System – Linux (any software language) Indexing/Search - Solr GUI – JSP, HTML, JavaScript, CSS 5
  • 7. DOCUMENT COLLECTIONS Supported Formats -ARC (Internet Memory Crawls) ,Text, HTML. Kyoto, BBN, NYT Collections Testing Examples included with Diversity Engine Large ARCs available from Internet Memory Converters provided for other collections (MPQA, BBN, NYT) that have licensing restrictions 7
  • 9. INDEXING/SEARCH Solr Enterprise search platform built on top of Lucene Xml input and output allows for easy integration with Diversity Engine Plug-in framework allows customization Built-in facet capabilities support indexing and searching on annotations Integration Converter from LK XML – Solr XML Plug-in for facet ranking and speed improvements 9
  • 10.
  • 13. Support development – coding required!
  • 16. Scaling to Large Collections
  • 18. Some pluggable GUI components
  • 19. Examples to ease learning curve10
  • 22.
  • 23. Evaluates any possible annotation pipeline
  • 26. Compares annotation output of pipeline with ground truth data13
  • 27. OUR FIRST APPLICATION Download Diversity Engine release from SourceForge tar xzvf [release file] cd testbed ant build apps/testbed conf/testbed/tutorial-application.xml What happened? 197 text files and 127 images files converted from arc format to LK XML and stored in devapps/example/data/lkxml 2 annotators were run over collection OpenNLP for tokenization, sentence splitting, Pos tags SST named entity recognizer Results stored in devapps/example/data/lkxml Files were converted to Solr xml format and indexed using solr Solr XML stored to devapps/example/data/solr HTML Visualization Files stored in devapps/example/data/html ant deploy-testbed Solr running at http://localthost:8983/solr/ Example app running at http://localhost:8983/testbed/ 14
  • 28. EXAMPLE SOLR OUTPUT http://localhost:8983/solr/select/?q=putin 15
  • 29. EXAMPLE APPLICATION http://localhost:8983/testbed/results.jsp?query=putin 16
  • 31. CONFIGURATION FILE <lk-applicationlogDir="log"appDir="devapps/example"> <corpusdir="corpora/examples/smallarc"format="arc"/> <image-pipeline> <annotators> </annotators> </image-pipeline> <pipeline> <annotators> <annotatorexec="./opennlp"/> <annotatorexec="./sst"/> </annotators> </pipeline> <visualize/> <indexersolrHomeDir="solr/solr“ solrDataDir="solr/solr/data“ converter="conf/testbed/tutorial-lk2solr.xml"/> <searcherappTitle="LivingKnowledge - Example Application" appShortTitle="Example Application" appUrl="http://localhost:8983/solr/"> <facets> <facetfield="per"description="Person"/> <facetfield="loc"description="Location"/> </facets> </searcher> </lk-application> 18
  • 32. TEXT ANALYSIS <pipeline> <annotators> <annotatorexec="./opennlp"/> <annotatorexec="./sst"/> </annotators> </pipeline> <pipeline> <annotators> <annotatorexec="./opennlp"/> <annotatorexec="./sst"/> <annotatorexec="./facts"/> <annotatorexec="./unitn_tagger"/> <annotatorexec="./unitn_subjexpr"/> </annotators> </pipeline> apps/testbed –run pipeline conf/testbed/tutorial-application.xml apps/testbed –run visualization conf/testbed/tutorial-application.xml 19
  • 33. TEXT ANALYSIS - FACTS devapps/example/data/lkxml/EA-EUElections2009-euobserver-0729-20090729085530-00000.arc.15521713.facts.xml 20
  • 34. TEXT ANALYSIS - FACTS devapps/example/data/html/EA-EUElections2009-euobserver-0729-20090729085530-00000.arc.15521713.html 21
  • 35. IMAGE ANALYSIS <image-pipeline> <annotators> <annotatorexec="./soton_haarfacedetector"/> </annotators> </pipeline> <pipeline> <annotators> <annotatorexec="./opennlp"/> <annotatorexec="./sst"/> <annotatorexec="./facts"/> <annotatorexec="./unitn_tagger"/> <annotatorexec="./unitn_subjexpr"/> <annotatorexec="./imageannots"/> </annotators> </pipeline> apps/testbed –run pipeline,image-pipeline –pipeline imageannotsconf/testbed/tutorial-application.xml ls devapps/example/data/lkxml/img/* 22
  • 36. ANALYSIS API Documents in LK XML format Annotators passed a single document directory –They should add annotations for each document in directory Files will have consistent naming convention LkText file = id + “.lktext.xml” LkMedia = id + “.lkmedia.xml” LkAnnotation = id + “.” + annotatorId + “.xml” Annotators will be processed sequentially in the order listed in the XML file Annotators can be written in any language but must run on Linux – Helper classes will exist for Java, but there is no obligation to use them. Add application calling your new annotator to apps directory Add your application to the configuration file as before 23
  • 37. ANALYSIS API – JAVA Extend class org.diversityengine.annotator.AbstractAnnotator Implement Methods getName() getType() - TEXT OR IMAGE For Image Analysis implement LkAnnotation getLkAnnotation(ImageDocument document) For Text Analysis implement LkAnnotation getLkAnnotation(TextDocument document) In main, instantiate and call annotator NewAnnotator annotator = new NewAnnotator() annotator.processDirectory(args[0]); Add application calling your new annotator to apps directory Add your application to the configuration file as before 24
  • 38. EVALUATION Evaluation works with same configuration file. Simply add evaluation element <lk-applicationlogDir="log"appDir="devapps/evaluation"> <corpusdir="corpora/evaluation/sst/text/"format="bbn"/> <pipeline> <annotators> <annotatorexec="./sst"/> </annotators> </pipeline> <evaluationevalDir="evaluation/sst/"> <evaluatorprovides="ENTITIES" goldDir="corpora/evaluation/sst/gold/" goldAnnotator="sstgold" annotator="sst" /> </evaluation> </lk-application> apps/testbed conf/evaluation/sst.xml 25
  • 39. EVALUATION RESULTS <evaluationgoldDir="/home/mikemat/code/livingknowledge/WP6/testbed/corpora/evaluation/sst/gold/"lkDir="/home/mikemat/code/livingknowledge/WP6/testbed/devapps/evaluation/data/lkxml"annotation="sst"goldAnnotation="sstgold"provides="ENTITIES"> <docs> <docid="WSJ0375"N="19"tp="18"fp="1"fn="1" /> <docid="WSJ0380"N="19"tp="15"fp="4"fn="1" /> <docid="WSJ0376"N="72"tp="61"fp="11"fn="7" /> <docid="WSJ0377"N="26"tp="17"fp="9"fn="6" /> <docid="WSJ0378"N="10"tp="10"fp="0"fn="0" /> <docid="WSJ0379"N="24"tp="19"fp="5"fn="2" /> </docs> <totalsN="170"tp="140"fp="30"fn="17"p="0.8235294117647058"r="0.89171974522293"f="0.8562691131498471" /> </evaluation> cat evaluation/sst/sst.ENTITIES.xml 26
  • 40. INDEXING AND SEARCH Search Engines - Traditional Bag-of-words representation Inverted index (words -> documents) for efficiency 10 docs ranked according tf-idf similarity with query Search Engines – Today Much metadata associated with documents Ranking based on 100s of features (date, location, pagerank, click data, etc, personalization) Richer display Facets for exploratory search Answers when appropriate etc.. Many open source options - Lucene/Solr most widely used 27
  • 41. APACHE LUCENE/SOLR Lucene/Solr 28
  • 42. FACETED SEARCH Diagram by Yonik Seeley 29
  • 43.
  • 44. price ranges for product query
  • 45. related people or locations for news query
  • 47. Show documents that matching the query term and a selected facet
  • 48. Make inferences not clear from simple document list
  • 49. Living Knowledge Analysis is modeled very well by facets
  • 50. Topics as determined by entity and fact extraction
  • 51. Location and Time diversity dimensions
  • 52. Opinions as determined by opinion extraction30
  • 53.
  • 54. Diversity Engine provides a simple language to map LX XML to Solr XML31
  • 55. LK2SOLR CONVERSION <indexersolrHomeDir="solr/solr“ solrDataDir="solr/solr/data“ converter="conf/testbed/tutorial-lk2solr.xml"/> <lktosolr> <fieldsolr="per"annotation="ENTITIES_CLEAN"value="$text“ filter="org.diversityengine.solr.converter.filters.PerValueFilter"/> <fieldsolr="loc"annotation="ENTITIES_CLEAN"value="$text“ filter="org.diversityengine.solr.converter.filters.LocValueFilter"/> <fieldsolr="keywords"annotation="TOP_ENTITIES"value="$text" /> <fieldsolr="pubdate"annotation="metainfo:lktext"value="date“ type="date"/> </lktosolr> solr – Name of the field in solr annotation – Name of the LKXML Annotation value – Value of annotation filter – Allows post processing on annotation type – Only Date supported currently 32
  • 56. ADDING FACTS TO INDEX <lktosolr> <fieldsolr="per"annotation="ENTITIES_CLEAN"value="$text“ filter="org.diversityengine.solr.converter.filters.PerValueFilter"/> <fieldsolr="loc"annotation="ENTITIES_CLEAN"value="$text“ filter="org.diversityengine.solr.converter.filters.LocValueFilter"/> <fieldsolr="keywords"annotation="TOP_ENTITIES"value="$text" /> <fieldsolr="pubdate"annotation="metainfo:lktext"value="date“ type="date"/> <fieldsolr="yago"annotation="yago-entities"value="$text" /> <fieldsolr="yago-country"annotation="facts" value="xpath:/entity-information[facts/type/text()= 'wordnet_country_108544813']/id/text()" /> </lktosolr> apps/testbed –run convert-solr conf/testbed/tutorial-application.xml ls devapps/example/data/solr/* apps/testbed –run index conf/testbed/tutorial-application.xml 33
  • 57. FACTS TO SOLR <fieldsolr="yago"annotation="yago-entities"value="$text" /> 34
  • 58. FACTS TO SOLR <fieldsolr="yago-country"annotation="facts" value="xpath:/entity-information[facts/type/text()= 'wordnet_country_108544813']/id/text()" /> 35
  • 59. ADDING IMAGES TO INDEX <lktosolr> <fieldsolr="per"annotation="ENTITIES_CLEAN"value="$text“ filter="org.diversityengine.solr.converter.filters.PerValueFilter"/> <fieldsolr="loc"annotation="ENTITIES_CLEAN"value="$text“ filter="org.diversityengine.solr.converter.filters.LocValueFilter"/> <fieldsolr="keywords"annotation="TOP_ENTITIES"value="$text" /> <fieldsolr="yago"annotation="yago-entities"value="$text" /> <fieldsolr="yago-country"annotation="facts" value="xpath:/entityinformation[facts/type/text() ='wordnet_country_108544813']/id/text()" /> <fieldsolr="pubdate"annotation="metainfo:lktext"value="date“ type="date"/> <fieldsolr="image"annotation="IMAGE_ANNOTS"value="$text" /> <fieldsolr="bestimage"annotation="BEST_IMAGES"value="$text" /> </lktosolr> apps/testbed –run convert-solr conf/testbed/tutorial-application.xml ls devapps/example/data/solr/* apps/testbed –run index conf/testbed/tutorial-application.xml 36
  • 60. APPLICATION DEVELOPMENT Examples HTML Extraction Scaling to Large Collections Provenance Some pluggable GUI components 37
  • 61. FACT/IMAGE APPLICATION <searcherappTitle="LivingKnowledge - Example Application" appShortTitle="Example Application" appUrl="http://localhost:8983/solr/"> <facets> <facetfield=“yago"description=“Yago"/> <facetfield=“yago-country"description=“Country"/> <facetfield="per"description="Person"/> <facetfield="loc"description="Location"/> <facetfield=“image"description=“Images"/> </facets> </searcher> ant deploy-testbed 38
  • 62. FACT/IMAGE APPLICATION http://localhost:8983/testbed/results.jsp?query=putin 39
  • 63. OPINION APPLICATION Opinions are at sentence level, not document level – same analysis, but different indexing cat conf/testbed/tutorial-lk2solr-sentence.xml <lktosolrsolrDoc="SENTENCES"contextSize="1"> <fieldsolr="per"annotation="ENTITIES_CLEAN"value="$text“ filter="org.diversityengine.solr.converter.filters.PerValueFilter“ source="solrdoc" /> <fieldsolr="loc"annotation="ENTITIES_CLEAN"value="$text“ filter="org.diversityengine.solr.converter.filters.LocValueFilter“ source="solrdoc" /> <fieldsolr="keywords"annotation="TOP_ENTITIES"value="$text" /> <fieldsolr="yago"annotation="yago-entities"value="$text“ source="solrdoc" /> <fieldsolr="image"annotation="IMAGE_ANNOTS"value="$text" /> <fieldsolr="bestimage"annotation="BEST_IMAGES"value="$text" /> <fieldsolr="pubdate"annotation="metainfo:lktext"value="date“ type="date"/> <fieldsolr="polarity" annotation="MPQA-expressive-subjectivity,MPQA-direct-subjective“ value="xpath:/node()[@pol]/@pol"source="solrdoc“ filter="org.diversityengine.solr.converter.filters.PolarityValueFilter"/> <fieldsolr="pol-int“ annotation="MPQA-expressive-subjectivity,MPQA-direct-subjective“ value="xpath:concat(/node()[@pol and @int]/@pol,/node()[@int and @pol]/@int)“ source="solrdoc"/> </lktosolr> apps/testbed –run convert-solr,index conf/testbed/tutorial-application-sentence.xml ls devapps/example/data/solr/* 40
  • 64. SOLR XML – SENTENCE 41
  • 65. OPINION APPLICATION modify webappEB-INFeb.xml <web-appxmlns="http://java.sun.com/xml/ns/javaee" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd" version="2.5"> <description> LivingKnowledge Testbed Example Application </description> <display-name>Testbed Examples</display-name> <context-param> <param-name>applicationDef</param-name> <param-value>conf/testbed/tutorial-application-sentence.xml</param-value> <description>The Living Knowledge application description XML file </description> </context-param> </web-app> ant deploy-testbed 42
  • 66. OPINION APPLICATION http://localhost:8983/testbed/results.jsp?query=putin 43
  • 68. HTML EXTRACTION Boilerplate can lead to false positive results and inaccurate facet aggregation Real example – before extraction developed, most common person for most queries was in a top story title (on all pages) the day of the crawl! Titles, Authors and Dates are important for bias and diversity aware search 45
  • 69. PROVENANCE How an annotation is derived is often as important as the annotation itself Users want to verify results Developers need to validate results Open Provenance provides an open source solution Testbed annotations can be extended with Open Provenance chains 46
  • 71. SCALING TO LARGE COLLECTIONS In the real world, even “small” datasets have million of documents NLP/Image processing is expensive – 1 doc/sec = 11 days for 1 million docs! Hadoop Mapper allows for scaling – scales linearly with number of machines ZipCollection writer allows partitioning data into subsets for processing 48
  • 75. FUTURE WORK More components Maven to manage dependencies Better integration of Timeline and Geo visualization components Integration of ranking algorithms Better Documentation  52
  • 76. Thanks! LivingKnowledge Partners! You for coming!! Questions? 53