pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
Moving beyond the box: automating the digitisation of insect collections
1. Moving beyond the box:Moving beyond the box:
automating the digitisation ofautomating the digitisation of
insect collectionsinsect collections
2. Blagoderov et al (2012) No specimen left behind: industrial scale digitization of natural history collections.
ZooKeys 209: 131–146, doi: 10.3897/zookeys.209.3178
3. Drawer level imaging is (mostly) a solved problem
1. Place drawer 2. Scan 3. Stitch
4. Result: High resolution composite image
Drawer level imaging is (mostly) a solved problem
• Fast (5 mins per drawer)
• High resolution (circa 500MB per image)
• No specimen handling
5. But, two key problems remain…
Synchronisation Label data
Keeping the physical & digital
copies in sync
Capturing data from multiple
pinned labels
6. • Don’t worry about it, re-image as required
• Lock down the drawers
• Crop-out each specimen image
– Automate the cropping process
– Link each specimen to its digital image
– Make it easy to collect label data
Approaches to the synchronisation problem
The only practical solution,
but a new rate limiting step
7. Annotation software, NHM working with SmartDrive:
Initial supporting software
• No automated cropping
• Manually link images &
specimens
• Poor UX/UI
• Closed source and
proprietary
• Not cross-platform
(Windows only)
A good first step to
understanding the problems
8. Automating specimen segmentation
Starting image
Auto-segment
Mark errors
Correct
Work with Pieter Holtzhausen and Stéfan van der Walt (Stellenbosch University)
Software: Inselect, written in Python
14. Works on slides & pinned insects
Whole Drawer image Also testing mineral & fossil samples
15. Planned UX / UI Enhancements
Whole Drawer image
Unit tray
recognition
Multi-
specimen
annotation
Plug in controlled
vocabulary services
16. 1D & 2D barcode recognition
Whole Drawer image
• Recognition and reading of 1D &
2D matrix barcodes from images
• Different physical requirements
(smallest 6x6mm matrix,
readable via handheld scanners)
• Testing open source &
commercial libraries at different
scan resolutions
Initial results
• Commercial solutions outperform
open source
• Max. read success 94%
• Idiosyncratic results (different
results on different OS)
• Testing continues…
17. The Holy Grail: label imaging & text recognition
Whole Drawer image
Chauliodes pectinicornis
DorsalCaudalFrontal
Reconstructed labels
18. The Holy Grail: label imaging & text recognition
Whole Drawer image
Agulla astuta
DorsalCaudalLateral
Reconstructed labels
19. Approaches to label imaging pinned specimens
Whole Drawer image
Could be incorporated as part of the barcode dispensing process
1. Barcode dispenser & scanner
(two sides barcode labels)
2. Freshly pinned barcode label
3. Other collection labels
4. Multiple label imaging cameras
5. Assemble labels from composite images
20. Acknowledgements
Whole Drawer image
Segmentation algorithm & app. development
Stefan van der Walt and Pieter Holtzhausen
Application development
Alice Heaton
Barcode recognition & testing
Lawrence Hudson
Analysis & testing
Laurence Livermore, Vladimir Blagoderov and Ben Price
Initial specification and funding
Vince Smith
Notes de l'éditeur
Big digitisation programmes. Need high throughput approaches.
Big digitisation programme
Project history: Inselect was originally conceived as an answer to whole-drawer imaging hardware setups, such as Dscan, SatScan and InvertNet, which are in use at different natural history collections around the world.
The systems are typically used to industrially digitise insect collections but can also be used to digitise slide collections.
In early 2014 we started working with research teams in higher education to develop prototypes for automated drawer segmentation.
The Inselect prototype was developed in collaboration between the NHM, London and Stellenbosch University, South Africa.
The initial collaboration was instigated by Ben Price and the following people: Stéfan van der Walt, Pieter Holtzhausen, Vladimir Blagoderov and Laurence Livermore
The first prototype developed by Stéfan and Pieter was Python-based and had the following fucntionality:
Downscale high resolution TIFF images for processing to JPEG – easier to run segmentation algorithm
Automatically detect specimens
Create rectangular bounding boxes around specimens
Batch crop specimen images
One of the rate-limiting steps for these whole drawing imaging systems is annotation and post-processing. Typically collections want to record metadata about individual specimens which involves cropping specimens and entering metadata such as taxonomic name, location in collection etc.
Most software currently in use has limitations:
No automation – all specimens have to be manually selected by a human operator
Poor user interface and user experience – most software has a very basic user interface and provides a poor user experience
Closed source & proprietary – making it costly or impossible for museums to collaborate, build upon and share software
Not cross-platform – limiting annotation and post-processing to Windows users only
What I want to do for the remainder of this talk is focus on elements of this programme and pick out some of the key elements which are of broader interest, in terms of …
Contrast based is the simple approach (quick and simple – less computationally expensive)– When you resegment you use seed growing. (More time consuming but better fidelity. It also uses a human to mark a startting centroid
The sidebar allows very quick reviewing of segmentation for false positives and incorrectly segmentated objects (e.g. multiple objects together)
The sidebar allows very quick reviewing of segmentation for false positives and incorrectly segmentated objects (e.g. multiple objects together)
The sidebar allows very quick reviewing of segmentation for false positives and incorrectly segmentated objects (e.g. multiple objects together)
The sidebar allows very quick reviewing of segmentation for false positives and incorrectly segmentated objects (e.g. multiple objects together)
(Fishfly: Megaloptera)
The sidebar allows very quick reviewing of segmentation for false positives and incorrectly segmentated objects (e.g. multiple objects together) Lace wing Neuroptera
The sidebar allows very quick reviewing of segmentation for false positives and incorrectly segmentated objects (e.g. multiple objects together)
The sidebar allows very quick reviewing of segmentation for false positives and incorrectly segmentated objects (e.g. multiple objects together)