An evaluation of taxonomic name finding & next steps in Biodiversity Heritage Library (BHL) developments
1. An evaluation of taxonomic name finding & next steps in Biodiversity Heritage Library (BHL) developments. Chris Freeland, Technical Director, BHL; Director of Bioinformatics, Missouri Botanical Garden
7. Open Access Data. The snakes of Australia; an illustrated and descriptive catalogue of all the known species. By Gerard Krefft... Publisher: Sydney, T. Richards, Government Printer, 1869. Formats: PDF, OCR, XML, JP2
9. Name Finding in action with Taxonomic Intelligence… Raw Image -> Converted to text via OCR -> Name finding via TaxonFinder -> Extract names -> Submit to NameBank -> SOAP response
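The flow on this slide (OCR text in, candidate names out, names checked against a repository) can be sketched in a few lines. This is a toy illustration only: the regular expression is a naive stand-in for TaxonFinder, the `KNOWN_NAMES` set is a hypothetical stand-in for a NameBank lookup (no SOAP call is made), and the sample sentence is invented.

```python
import re

# Hypothetical local name list standing in for a repository lookup.
# This is NOT the NameBank API; it only illustrates the verification step.
KNOWN_NAMES = {"Morelia spilotes", "Hoplocephalus curtus"}

# Naive pattern: a capitalised word followed by a lowercase word.
# A real name finder such as TaxonFinder is considerably more elaborate.
BINOMIAL = re.compile(r"\b([A-Z][a-z]{2,})\s+([a-z]{3,})\b")


def find_candidate_names(ocr_text):
    """Extract strings that merely look like Latin binomials."""
    return [f"{genus} {epithet}" for genus, epithet in BINOMIAL.findall(ocr_text)]


def verify_names(candidates, known=KNOWN_NAMES):
    """Keep only the candidates present in the reference list."""
    return [name for name in candidates if name in known]


# Invented sample text, standing in for one page of OCR output.
page_text = "These snakes, such as Morelia spilotes, are common."
candidates = find_candidate_names(page_text)
verified = verify_names(candidates)
print(candidates)
print(verified)
```

The pattern alone also picks up "These snakes", which is why a lookup against a names repository follows the extraction step.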
15. Characteristics of sample
Number of Pages: 392
Average Number of Words per Page: 446.8
Average Number of Names per Page: 7.7
Total Number of Names: 3003
Total Number of Unique Names: 2610 (= 86.91% of all names)
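The ratios on this slide can be reproduced from the raw counts. The short check below uses only the figures shown; the "names per 100 words" value is a derived estimate that does not appear on the slide, and the variable names are illustrative.

```python
# Counts as given on the slide.
pages = 392
total_names = 3003
unique_names = 2610
avg_words_per_page = 446.8

# Ratios stated on the slide.
names_per_page = total_names / pages                # about 7.7
unique_share = 100 * unique_names / total_names     # about 86.91 %

# Derived estimate (not on the slide): names per 100 words of text.
estimated_words = avg_words_per_page * pages
names_per_100_words = 100 * total_names / estimated_words

print(f"names per page: {names_per_page:.1f}")
print(f"unique share:   {unique_share:.2f}%")
print(f"names per 100 words (estimate): {names_per_100_words:.2f}")
```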
16. OCR error rate for names only
Of the 3,003 names, 1,056 (35.16%) were incorrectly transcribed by OCR.
Top OCR errors:
e->o 14
h->ii 13
h->l 12
u->ii 11
r->i 10
l->i 9
n->v 8
c->e 7
i->l 6
u->n 5
u->I 4
e->c 3
Omit Space 2
Insert Space 1
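One way a table of character confusions like this could be tallied is by aligning each known-correct name with its OCR output and recording the differing spans. The sketch below uses Python's standard `difflib` for the alignment; it is not the method used in the study, and the sample name pairs are invented for illustration. It also recomputes the 35.16% error rate from the two counts on the slide.

```python
from collections import Counter
from difflib import SequenceMatcher


def ocr_confusions(truth, ocr):
    """Return (expected, observed) pairs where the OCR text differs from the truth."""
    pairs = []
    matcher = SequenceMatcher(None, truth, ocr, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            pairs.append((truth[i1:i2], ocr[j1:j2]))
    return pairs


# Invented (truth, OCR) pairs; not taken from the study sample.
samples = [
    ("Hoplocephalus", "Hoplocophalus"),  # e read as o
    ("Python", "Pytiion"),               # h read as ii
    ("Morelia", "Morelia"),              # transcribed correctly
]

tally = Counter()
names_with_errors = 0
for truth, ocr in samples:
    found = ocr_confusions(truth, ocr)
    tally.update(found)
    names_with_errors += bool(found)

# Error rate from the counts on the slide: 1,056 of 3,003 names.
reported_rate = round(100 * 1056 / 3003, 2)

print(tally.most_common())
print(f"{names_with_errors} of {len(samples)} sample names contain an OCR error")
print(f"slide error rate: {reported_rate}%")
```

An omitted or inserted space would show up here as a pair with an empty string on one side.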
17. Performances of algorithms

TaxonFinder
            Excluding names with OCR errors    Including names with OCR errors
Precision   28.20%                             40.32%
Recall      23.34%                             36.62%
F-score     25.77%                             38.47%

FAT
            Excluding names with OCR errors    Including names with OCR errors
Precision   32.25%                             43.77%
Recall      17.21%                             25.82%
F-score     24.73%                             34.80%
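For readers comparing these figures with other evaluations: to two decimals, each F-score on this slide equals the arithmetic mean of its precision and recall, whereas the more common F1 measure is their harmonic mean and comes out slightly lower. The sketch below computes both for the four precision/recall pairs shown; the function names are illustrative.

```python
def arithmetic_mean(precision, recall):
    """Simple average of precision and recall."""
    return (precision + recall) / 2


def harmonic_f1(precision, recall):
    """Standard F1: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-score) in percent, as printed on the slide.
rows = [
    (28.20, 23.34, 25.77),
    (40.32, 36.62, 38.47),
    (32.25, 17.21, 24.73),
    (43.77, 25.82, 34.80),
]

for p, r, reported in rows:
    print(
        f"P={p:.2f} R={r:.2f} reported={reported:.2f} "
        f"arithmetic={arithmetic_mean(p, r):.2f} harmonic={harmonic_f1(p, r):.2f}"
    )
```

The gap between the two means grows as precision and recall diverge, so it is largest for the pair 32.25% / 17.21%.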