This document discusses challenges in medical imaging and the VISCERAL model. It provides an overview of systematic evaluations of information retrieval since the 1960s. It describes the ImageCLEF benchmark, which has run medical image retrieval tasks since 2003. It discusses open science initiatives to share data and tools. It introduces the VISCERAL model, which brings algorithms to medical image data stored in the cloud to enable large-scale challenges. The document concludes that open science has potential advantages but that the medical domain poses complications regarding data protection, and that challenges will be part of the ecosystem for sharing medical image analysis tools.
Challenges in medical imaging and the VISCERAL model
1. Challenges in medical imaging and the VISCERAL model
Henning Müller
HES-SO & Martinos Center
2. Overview
• Systematic evaluations
– Information retrieval, industrial challenges
• ImageCLEF
– 2003-2016
• Challenges in medical imaging and Open Science
– Conferences and platforms (Kaggle, TopCoder, …)
• VISCERAL
– “Moving the algorithms to the data and not the data to the algorithms”
• Conclusions
3. Systematic evaluations
• 1960-: the Cranfield tests
– Test collection, tasks, ground truth
– Automatic indexing better than manual terms
• 1992-: TREC – Text Retrieval Conference
– At NIST, Gaithersburg
– Many different tasks over the years
• 1999-: CLEF, TRECVid as offspring of TREC
• Industrial performance benchmarks
– TPC (1988), common transaction processing framework
– Supercomputer benchmark (1993), common criteria
– …
Cleverdon, C. W. (1960). ASLIB Cranfield research project on the comparative efficiency of indexing systems. ASLIB Proceedings, XII, 421-431.
4. ImageCLEF
• Benchmark on multimodal image retrieval
– Run since 2003, medical task since 2004
– Part of the Cross Language Evaluation Forum (CLEF)
• Many tasks related to medical image retrieval
– Image classification (modality, body part, …)
– Image-based retrieval
– Case-based retrieval (finding similar cases)
– Compound figure separation
– Caption prediction
– …
• Many old databases remain available, imageclef.org
Henning Müller, Paul Clough, Thomas Deselaers, Barbara Caputo, ImageCLEF – Experimental Evaluation of Visual Information Retrieval, Springer, 2010.
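Runs in such retrieval tasks are typically scored against ground-truth relevance judgments, with mean average precision (MAP) as a common summary metric. A minimal sketch of the computation (function names and toy data are illustrative, not taken from an ImageCLEF tool):

```python
def average_precision(ranked_ids, relevant_ids):
    """AP for one query: mean of precision@k over the ranks of the relevant hits."""
    hits, precisions = 0, []
    for k, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant_ids) if relevant_ids else 0.0

def mean_average_precision(runs, qrels):
    """MAP over all queries: `runs` maps query -> ranked result list,
    `qrels` maps query -> set of relevant image ids (the ground truth)."""
    return sum(average_precision(runs[q], qrels[q]) for q in qrels) / len(qrels)

# Toy example: one query with two relevant images, retrieved at ranks 1 and 3.
runs = {"q1": ["img3", "img7", "img1"]}
qrels = {"q1": {"img3", "img1"}}
print(mean_average_precision(runs, qrels))  # (1/1 + 2/3) / 2 ≈ 0.833
```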
5. ImageCLEF experiences
• Creating a community is important for good participation (many groups register to access the data)
– Workshop to discuss results, evolution of tasks over
the years (attracting postgraduate students)
• Impact of data sets can be high (see also TREC)
– Overview articles are frequently cited, as are the best participant algorithms
• Large data sets cause distribution problems in some countries
• Hard to make groups collaborate
– e.g., to evaluate system components
• Little interactive evaluation of systems
• Not everything is fully reproducible
6. Open Science
• Initiatives to share data, tasks and tools
– Not only experts, really everyone
– More efficient way to do science, no reimplementation
– NIH, some journals push for open data, open access
• Data papers and executable papers
– Full reproducibility
• Which is otherwise often not given!!
• Challenges as an important way to bring many people into the loop of data science
– http://www.challenge.gov/
– Kaggle, TopCoder, …
Open Science is the practice of science in such a way
that others can collaborate and contribute, where
research data, lab notes and other research processes
are freely available, under terms that enable reuse,
redistribution and reproduction of the research and its underlying data and methods.
7. Challenges in Medical Imaging
• Grand Challenges in Medical Imaging
– http://grand-challenge.org/
– Includes the challenges plus details on their impact and why they matter
• 2007: MICCAI workshop on liver segmentation
– Including common data
• Now most conferences organize many challenge
sessions in addition to workshops
– MICCAI, ISBI, SPIE, ICPR, …
• Problem: most publications are still based on closed data sets that are small, impossible to verify, ...
– What if all were available on a secure infrastructure?
8. Platforms for ML challenges
• Kaggle
– Much influence on machine learning challenges
– Big commercial factor, offering prize money and also hiring good talent
– Download data, submit results
• TopCoder
– Use of code instead of results list
– 79,900,000 in prize money distributed
– Almost 1 million members
• Many others exist in specific domains
– Sage Bionetworks in the biomedical field
9. Challenges with challenges
• Getting a large number of participants and diverse techniques is hard, as there are many burdens
– Only one can win the prize money in the end
• Same conditions for all (computation, bandwidth)
• How to distribute very large data sets?
• How to deal with confidential/restricted data?
– Medical, commercial data, forbidden data sets
• How to deal with quickly changing data?
– Data of cell phone providers, Internet companies
• Reproducibility
– Optimizations on test data, particularly with prizes
11. VISCERAL model
• VISCERAL – Visual Concept Extraction
Challenge in Radiology
• “Bringing the algorithms to the data”
– Have the data centrally stored, in our case in the
cloud (which can be HIPAA compliant)
• Three types of challenges
– Anatomy segmentations (3x), 20 organs
– Retrieval challenge (2x), finding similar cases
– Lesion detection challenge (2x), 5 organs
• Provide large data sets that are well annotated
and can be shared long term
– Challenging with IRB approval in three countries
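To make the "algorithms to the data" idea concrete, here is a minimal sketch of a participant entry point as it could run inside such a cloud virtual machine, with the protected data mounted read-only. The paths and the segment_organ placeholder are hypothetical, not the actual VISCERAL conventions:

```python
from pathlib import Path

# Hypothetical mount points inside the participant VM; the real VISCERAL
# setup defines its own directory conventions.
DATA_DIR = Path("/data/volumes")          # read-only medical image volumes
OUT_DIR = Path("/output/segmentations")   # results picked up by the organiser

def segment_organ(volume_path: Path) -> bytes:
    """Placeholder for the participant's own segmentation algorithm."""
    raise NotImplementedError("plug in your method here")

def main():
    OUT_DIR.mkdir(parents=True, exist_ok=True)
    for volume in sorted(DATA_DIR.glob("*.nii.gz")):
        mask = segment_organ(volume)
        (OUT_DIR / volume.name).write_bytes(mask)

if __name__ == "__main__":
    main()
```

The organiser's analysis system then scores the written outputs; the participant never downloads the protected test data.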
15. [Architecture diagram: training data and test data are stored in the Microsoft Azure cloud. Participants sign up through a registration system and work in participant virtual machines; the organiser's analysis system scores their outputs on the test data. Annotators (radiologists) create the ground truth with locally installed annotation clients, coordinated by an annotation management system.]
16. Silver corpus (example trachea)
• Executable code of all participants
– Run it on new data, do label fusion
Participant segmentations: Dice 0.85, 0.71, 0.84, 0.83
Silver corpus (after label fusion): Dice 0.92
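A minimal sketch of the idea behind the silver corpus: fuse the participant segmentations, here with simple majority voting as one possible label-fusion rule, and measure agreement with the Dice coefficient, 2|A∩B| / (|A| + |B|). The toy masks below are illustrative:

```python
import numpy as np

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice overlap between two binary masks."""
    intersection = np.logical_and(a, b).sum()
    return 2.0 * intersection / (a.sum() + b.sum())

def majority_vote(masks) -> np.ndarray:
    """Fuse binary masks: a voxel is foreground if most participants label it so."""
    votes = np.sum(masks, axis=0)
    return votes > len(masks) / 2

# Toy 1-D "volumes": four participant masks and one reference annotation.
masks = [np.array(m, dtype=bool) for m in
         ([1, 1, 1, 0, 0], [1, 1, 0, 0, 1], [1, 1, 1, 1, 0], [0, 1, 1, 0, 0])]
reference = np.array([1, 1, 1, 0, 0], dtype=bool)

fused = majority_vote(masks)
print(dice(fused, reference))  # the fused mask tends to beat the individual ones
```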
18. Evaluation as a Service (EaaS)
• Evaluation via APIs, code, cloud, …
• Workshop in Sierre in March 2015
• Many aspects, viewpoints, interests
• White paper published
– On arXiv
• All comments are welcome!!
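To make the EaaS idea concrete, a sketch of what submitting a run through an evaluation API could look like from the participant side. The endpoint, payload, and response fields are entirely hypothetical; the white paper discusses the design space rather than one fixed API:

```python
import json
import urllib.request

# Hypothetical EaaS endpoint; a real service defines its own API and auth.
SUBMIT_URL = "https://eval.example.org/api/v1/submissions"

def submit_run(run_file: str, task: str, token: str) -> dict:
    """Upload a result file; the server scores it against hidden ground truth."""
    with open(run_file, "r") as f:
        payload = json.dumps({"task": task, "run": f.read()}).encode()
    request = urllib.request.Request(
        SUBMIT_URL, data=payload,
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)  # e.g. {"metric": "MAP", "score": 0.31}

# scores = submit_run("myrun.txt", task="anatomy-segmentation", token="...")
```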
19. Cloud-based evaluation
• Workshop at the Martinos Center, Boston, MA, November 2015
• How to run benchmarks on very large data sets in
the cloud (reproducibility, motivation)
• Many different stakeholders
– Scientists, infrastructure providers, companies,
funding organizations
• Sustainability is a major challenge
• Interests of single persons vs. interests of a domain
– Give credit to creators of data and tools
• Nature Scientific Data
20. Coding4Cancer & others
• Challenge on cancer prediction (breast, lung)
• Prize money for the challenges
– Make code open source to be eligible
• Commercial medical imaging challenges
– Zebra Medical Vision
• Large data sets available for research
• Use of their infrastructure only, via Docker
– RadLogic
• Plug-in concept for algorithms of scientists
21. What is needed now?
• Long-term vision of how medical data analysis will
develop and how data & tools can be shared
– Cancer Moonshot initiative (Biden)
• International research infrastructure
– Public-private partnerships to make them sustainable; how to share costs is still not clear
– Leaving data where produced, moving the code
• Incentives to share data and task environments
– Those doing the major work should receive credit
– More of the work falls on those preparing data & tasks
• Annotate data, standard formats, support to others
22. Conclusions
• Open Science is developing quickly
– Potential advantages for all
• The medical domain is complicated, as data require protection (the more data, the bigger the concern)
– Particularly for genomics
– Avoiding duplication limits data exposure
• Translational aspects also need to be taken into
account (transfer code towards products)
– Executable “papers” and available data should help
– Objective performance comparison
• Challenges will be part of this ecosystem
23. Contact
• More information can be found at
– http://www.imageclef.org/
– http://visceral.eu/
– http://medgift.hevs.ch/
– http://publications.hevs.ch/
• Contact:
– Henning.mueller@hevs.ch