Journal images represent an important part of the knowledge stored in the medical literature. Figure classification has received much attention because information about image types can be used in a variety of contexts to focus image search and to filter out unwanted information or "noise", for example non-clinical images. A major problem in figure classification is that many figures in the biomedical literature are compound figures and often contain more than one figure type. Some journals separate compound figures into several parts, but many do not, so separation currently has to be done manually.
In this work, a technique of compound figure separation is proposed and implemented based on systematic detection and analysis of uniform space gaps. The method discussed in this article is evaluated on a dataset of journal figures of the open access literature that was created for the ImageCLEF 2012 benchmark and contains about 3000 compound figures.
Automatic tools can easily reach a relatively high accuracy in separating compound figures. To increase accuracy further, the detection process must be improved and over-separation avoided through more powerful analysis strategies. The tools described in this article have also been tested on a database of approximately 150,000 compound figures from the biomedical literature, making these images available as separate figures for further image analysis and allowing important information to be filtered from them.
Separating compound figures in journal articles to allow for subfigure classification
1. Institute of
Information Systems
Separating compound figures in journal
articles to allow for subfigure classification
Ajad Chhatkuli
Antonio Foncubierta-Rodríguez
Dimitrios Markonis
Henning Müller
2. Motivation
• Figures in biomedical journals contain a lot of
information
• CBIR has been proposed for accessing medical
literature
• Modality classification
• Improves accessibility
• Allows result filtering
• But 50% of figures are compound or multipanel
3. Aim
• Develop a system that separates compound figures
in the biomedical literature
• Visual-information only
• Textual information is discarded
• Modality-independent
• One method for many image types
• Rather than many methods for few image types
• Tunable according to the dataset
• Large-scale tested
• Approximately 250 open access journals
5. Methods. Dataset
• 2982 manually classified figures from ImageCLEF
2012 dataset
• Ground truth:
• Image subclass: 2x1,1x2,
• Position of separators
6. Methods. Overview
• The problem is split into two steps
• Find subfigure separator candidates
• Preprocessing if required
• Analyze candidates
• Remove false positives
• Rule-based decisions
7. Methods. Separator detection
• Based on minimum
pixel projection for
white-space separated
figures
• Horizontal, then vertical
detection
• Inverse order by rotation
according to aspect ratio
• Recursive
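A minimal sketch of how projection-based detection with recursion might look for white-space separated figures. It is not the authors' implementation: the function names, the near-white threshold of 250, the minimum gap width, the recursion depth, and the rule of trying columns first on wide images are all assumptions made for illustration. Images are plain lists of grayscale rows to keep the example dependency-free.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale rows, 0 = black, 255 = white


def find_gaps(image: Image, axis: str, white: int = 250,
              min_width: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) ranges of rows or columns that are entirely near-white."""
    lines = image if axis == "rows" else [list(col) for col in zip(*image)]
    # Minimum projection: a line is part of a gap only if its darkest pixel is near-white.
    is_gap = [min(line) >= white for line in lines]
    gaps, start = [], None
    for i, flag in enumerate(is_gap + [False]):  # sentinel closes a trailing run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_width:
                gaps.append((start, i))
            start = None
    return gaps


def separate(image: Image, top: int = 0, left: int = 0,
             depth: int = 0, max_depth: int = 4) -> List[Tuple[int, int, int, int]]:
    """Recursively cut at interior gaps; return boxes as (top, left, bottom, right)."""
    height, width = len(image), len(image[0])
    if depth < max_depth:
        # Assumed ordering rule: wide images are cut by columns first, tall ones by rows.
        order = ("cols", "rows") if width >= height else ("rows", "cols")
        for axis in order:
            size = width if axis == "cols" else height
            # Gaps touching the border are margins, not separators.
            inner = [g for g in find_gaps(image, axis) if g[0] > 0 and g[1] < size]
            if not inner:
                continue
            start, end = inner[0]
            if axis == "cols":
                first = [row[:start] for row in image]
                second = [row[end:] for row in image]
                return (separate(first, top, left, depth + 1, max_depth)
                        + separate(second, top, left + end, depth + 1, max_depth))
            return (separate(image[:start], top, left, depth + 1, max_depth)
                    + separate(image[end:], top + end, left, depth + 1, max_depth))
    return [(top, left, top + height, left + width)]
```

Cutting at the first interior gap and recursing on both halves handles grids such as 2x2 without a dedicated case, at the cost of treating every wide enough gap as a separator; the analysis step is what would filter out false positives.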
8. Methods. Separator detection
• Rule-based processing
• Progressive truncation to remove labels if no
separators are found
• Text removal based on connected components if no
separators are found
• Complement image for black-space separations
• Standard deviation image for subtle separations
• Binarization of non-graph figures:
• Less than 40% of the image is white or almost white
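The preprocessing rules above can be sketched as follows. Only the 40% white limit comes from the slide; the "almost white" level of 240, the binarization cut at 128, and the per-row reading of the standard deviation step are assumptions for illustration, and the function names are hypothetical.

```python
from statistics import pstdev
from typing import List

Image = List[List[int]]  # grayscale rows, 0 = black, 255 = white


def white_fraction(image: Image, almost_white: int = 240) -> float:
    """Share of pixels that are white or almost white."""
    pixels = [p for row in image for p in row]
    return sum(p >= almost_white for p in pixels) / len(pixels)


def complement(image: Image) -> Image:
    """Invert intensities so black-space separators become white gaps."""
    return [[255 - p for p in row] for row in image]


def binarize(image: Image, cut: int = 128) -> Image:
    """Map every pixel to pure black or pure white (cut level is an assumption)."""
    return [[255 if p >= cut else 0 for p in row] for row in image]


def line_stddev(image: Image) -> List[float]:
    """Per-row standard deviation: near zero on uniform rows of any colour."""
    return [pstdev(row) for row in image]


def preprocess(image: Image, white_limit: float = 0.40) -> Image:
    """Binarize figures treated as non-graphs, i.e. less than 40% white."""
    if white_fraction(image) < white_limit:
        return binarize(image)
    return image
```

A detector written for white gaps can then be reused on `complement(image)` for black separators, and rows where `line_stddev` is close to zero are candidates for subtle separations that are neither white nor black.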
9. Methods. Separator analysis
• Classification problem
• True/false separator
• Features used:
• Closeness to border, division ratio, standard
deviation, text removal analysis, histogram, gap
comparison
• Classifiers:
• SVM
• Rule-based classifier
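As an illustration of the rule-based option, the sketch below scores a separator candidate from three of the listed features: closeness to border, division ratio, and gap comparison. The feature formulas, the weights, and the 0.5 decision threshold are hypothetical and chosen only to show the shape of such a classifier; the source does not give the actual rules.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    start: int   # first index of the gap along the cut axis
    end: int     # index one past the gap
    length: int  # image size along the cut axis


def closeness_to_border(c: Candidate) -> float:
    """0.0 when the gap sits on the border, 1.0 when it is centred."""
    center = (c.start + c.end) / 2
    return 2 * min(center, c.length - center) / c.length


def division_ratio(c: Candidate) -> float:
    """Ratio of the smaller to the larger part produced by the cut."""
    a, b = c.start, c.length - c.end
    return min(a, b) / max(a, b) if max(a, b) > 0 else 0.0


def gap_comparison(c: Candidate, candidates: List[Candidate]) -> float:
    """Gap width relative to the widest candidate gap."""
    widest = max(k.end - k.start for k in candidates)
    return (c.end - c.start) / widest


def score(c: Candidate, candidates: List[Candidate],
          weights: Tuple[float, float, float] = (0.4, 0.4, 0.2)) -> float:
    """Weighted sum of the features; the weights are placeholders to be tuned."""
    w_border, w_ratio, w_gap = weights
    return (w_border * closeness_to_border(c)
            + w_ratio * division_ratio(c)
            + w_gap * gap_comparison(c, candidates))


def is_true_separator(c: Candidate, candidates: List[Candidate],
                      threshold: float = 0.5) -> bool:
    return score(c, candidates) >= threshold
```

With candidates `Candidate(48, 52, 100)` and `Candidate(2, 3, 100)`, the centred wide gap is accepted and the narrow gap near the border is rejected. The same feature vector could equally be fed to an SVM instead of a weighted sum.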
13. Unsuccessful examples
Not horizontal/vertical
No separation gap
14. Conclusions and future work
• Good results for a wide range of images
• Using purely visual information
• Separation problem: detection and analysis
• Rule weights can be fine-tuned according to dataset
• What would be the impact of a larger training set?
• What would be the impact in existing modality
classification accuracy?
16.
Thanks for your attention!
More information at http://medgift.hevs.ch
Ajad Chhatkuli, Dimitrios Markonis, Antonio Foncubierta-Rodríguez, Fabrice Meriaudeau
and Henning Müller, Separating compound figures in journal articles to allow for subfigure
classification, in: SPIE, Medical Imaging, Orlando, FL, USA, 2013