Selected Talk by Allan Hanbury, at the European Data Forum 2013, 10 April 2013 in Dublin, Ireland: Algorithm any good? A Cloud-based Infrastructure for Evaluation on Big Data
1. Algorithm any good?
A Cloud-based
Infrastructure for
Evaluation on Big Data
Allan Hanbury
Vienna University of Technology
The research leading to these results has received funding from the European Union Seventh
Framework Programme (FP7/2007-2013) under grant agreement n° 318068 (VISCERAL).
2. Evaluation
Evaluation campaigns / Challenges /
Benchmarks / Competitions / ...
Makes economic sense
“for every $1 that NIST and its partners invested in
TREC, at least $3.35 to $5.07 in benefits accrued
to IR researchers.”
Has scientific impact
3. Evaluation Campaigns
[Diagram: the organiser provides tasks, data and ground truth to the participants.
Image: Kyle Mcdonald, http://www.flickr.com/photos/kylemcdonald/6187343093/]
4. Evaluation Campaigns
[Same diagram: tasks, data and ground truth flow from the organiser to the participants.]
5. With Big Data?
[Same diagram, now with big data: shipping the data and ground truth from the organiser to the participants becomes the bottleneck.]
6. Benchmarking Algorithms on Big Data
Distributing terabytes is hard
Sending hard disks or downloading is not feasible
Bringing algorithms to the data is necessary
Motivating participants
Tasks of general interest with few infrastructure
barriers (how to store or process terabytes ...)
Allow infrastructure to be shared
Manual ground truthing does not scale. Use:
Semi-automation (e.g. silver corpus)
Coercion (e.g. crowdsourcing)
…
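A common form of semi-automation is the "silver corpus": fuse the outputs of several algorithms and treat the consensus as (imperfect) ground truth, reserving human annotators for the disputed cases. A minimal sketch using majority voting as the fusion rule (the function names are illustrative, not part of any VISCERAL tool):

```python
from collections import Counter

def silver_label(votes):
    """Fuse one item's labels from several algorithms by majority vote."""
    label, count = Counter(votes).most_common(1)[0]
    # Accept only a strict majority; ties and disagreements are left as
    # None and would be sent to a human annotator instead.
    return label if count > len(votes) / 2 else None

def build_silver_corpus(predictions):
    """predictions: one label sequence per algorithm, all the same length."""
    return [silver_label(votes) for votes in zip(*predictions)]
```

For example, `build_silver_corpus([[1, 1, 0], [1, 0, 0], [1, 1, 1]])` yields `[1, 1, 0]`: each position takes the label that a majority of the three algorithms agree on.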
7. Evaluation on the Cloud
(http://visceral.eu)
Bring the algorithms to the data, not the data
to the algorithms
Put the data on the cloud
Participants develop their algorithms in computing
instances on the cloud
First benchmark on structure recognition in
medical images
8. Training Phase
[Diagram: on the cloud, the participant instances have access to the training data only; a registration system and an analysis system connect participants and organiser. The test data is also held on the cloud but is not accessible to participants.]
9. Evaluation Phase
[Diagram: the same cloud setup, but the participant instances are now run against the test data under the organiser's control.]
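The two phases above can be sketched as follows. All class and method names here are hypothetical stand-ins, not the actual VISCERAL API; the point is the key property of the protocol: participants only ever see the training data, while the organiser runs the same instances against the test data.

```python
# Illustrative sketch of the training/evaluation protocol; the names are
# invented for this example, not taken from VISCERAL.

class Instance:
    """Toy stand-in for a participant's cloud computing instance."""

    def __init__(self, algorithm):
        self.algorithm = algorithm          # callable: data -> output
        self.data = None
        self.participant_access = True      # participant can still log in

    def mount(self, data):
        self.data = data

    def revoke_participant_access(self):
        self.participant_access = False

    def run(self):
        return self.algorithm(self.data)


class CloudBenchmark:
    """Organiser-side view: the data stays on the cloud, algorithms come to it."""

    def __init__(self, training_data, test_data):
        self.training_data = training_data
        self.test_data = test_data
        self.instances = {}                 # participant -> Instance

    def register(self, participant, instance):
        # Training phase: the instance sees only the training data.
        instance.mount(self.training_data)
        self.instances[participant] = instance

    def evaluate(self, metric):
        # Evaluation phase: the organiser takes over each instance,
        # switches it to the unseen test data and scores its output.
        results = {}
        for participant, instance in self.instances.items():
            instance.revoke_participant_access()
            instance.mount(self.test_data)
            results[participant] = metric(instance.run(), self.test_data)
        return results
```

The design choice this illustrates: because the test data is mounted only after participant access is revoked, nobody can tune on the test set, and the terabytes never leave the cloud.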
10. Annotators (Radiologists)
[Diagram: annotators (radiologists) create the ground truth using locally installed annotation clients, coordinated by an annotation management system connected to the data on the cloud; participant instances, registration system and analysis system are as in the previous diagrams.]
11. Future Development
Dealing with private data
Does it make sense to evaluate on data that the
participant cannot see?
Does it make sense to evaluate only on extracted
features?
Moving toward eScience
Data identifiers
Algorithm identifiers?
Continuous evaluation
Modular construction of the algorithms
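Continuous evaluation built on data and algorithm identifiers could look like the following sketch (all names invented here): every score is keyed by which algorithm ran on which dataset, so new algorithms can be ranked against previously evaluated ones at any time.

```python
# Hypothetical sketch: continuous evaluation keyed by persistent data and
# algorithm identifiers, so every score is reproducibly tied to exactly
# what produced it.

class Leaderboard:
    def __init__(self):
        self.results = {}   # (algorithm_id, dataset_id) -> score

    def submit(self, algorithm_id, dataset_id, score):
        # New algorithms (or new dataset versions) can be scored at any
        # time; the identifiers make each result citable and comparable.
        self.results[(algorithm_id, dataset_id)] = score

    def ranking(self, dataset_id):
        # Rank all algorithms ever evaluated on this dataset, best first.
        rows = [(alg, score) for (alg, ds), score in self.results.items()
                if ds == dataset_id]
        return sorted(rows, key=lambda row: row[1], reverse=True)
```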
12. Challenges
Sharing components
Who should provide the cloud service?
Who pays for using it?
Transferring components to industry