Much computer vision research has focused on natural images, but technical documents typically consist of abstract images, such as charts, drawings, diagrams, and schematics. How well do general web search engines discover abstract images? Recent advancements in computer vision and machine learning have led to the rise of reverse image search engines. Where conventional search engines accept a text query and return a set of document results, including images, a reverse image search accepts an image as a query and returns a set of images as results. This paper evaluates how well common reverse image search engines discover abstract images. We conducted an experiment leveraging images from Wikimedia Commons, a website known to be well indexed by Baidu, Bing, Google, and Yandex. We measure how difficult an image is to find again (retrievability), what percentage of images returned are relevant (precision), and the average number of results a visitor must review before finding the submitted image (mean reciprocal rank). When trying to discover the same image again among similar images, Yandex performs best. When searching for pages containing a specific image, Google and Yandex outperform the others when discovering photographs, with precision scores of 0.8191 and 0.8297, respectively. In both of these cases, Google and Yandex perform better with natural images than with abstract ones, achieving a difference in retrievability as high as 54% between images in these categories. These results affect anyone applying common web search engines to search for technical documents that use abstract images.
Abstract Images Have Different Levels of Retrievability Per Reverse Image Search Engine
1. 1
2022/10/24
@shawnmjones
Managed by Triad National Security, LLC, for the U.S. Department of Energy’s NNSA.
Shawn M. Jones & Diane Oyen
Information Sciences (CCS-3)
LA-UR-XXXXXX
2. 2
There are few computer vision research papers focused on querying and retrieving abstract, technical drawings
• Technical documents typically contain abstract images
• Many reasons exist to search for abstract images online:
  • protect intellectual property
  • build datasets
  • find evidence for legal cases
  • establish scholarly evidence
  • justify funding through image reuse
https://commons.wikimedia.org/wiki/File:Complete_neuron_cell_diagram_en.svg
https://commons.wikimedia.org/wiki/File:Carriage-house-2.jpg
https://commons.wikimedia.org/wiki/File:Interspiro_DCSC_loop_schematic.png
3. 3
Major search engines now support reverse image search: Baidu, Bing, Google, and Yandex
Screenshot source:
https://image.baidu.com
Screenshot source:
https://images.google.com
Screenshot source:
https://www.bing.com/
Screenshot source:
https://yandex.com/images
4. 4
With each service, a user can upload an image and receive different types of results
Screenshot labels: the uploaded query image, pages-with results, similar-to results
Uploaded image source: https://commons.wikimedia.org/wiki/File:Adams_The_Tetons_and_the_Snake_River.jpg
Screenshot from: https://www.bing.com
6. 6
To collect query images, we submitted terms to Wikimedia Commons’ API
• abstract images: “diagram”, “schematic”
• natural images: “photo”, “photograph”
• images collected per search term, in the order shown on the slide: 100, 100, 100, and 99
Previous studies have shown that Wikipedia content has high retrievability.
Image sources:
• https://commons.wikimedia.org/wiki/File:Galileo_Diagram.jpg
• https://commons.wikimedia.org/wiki/File:Complete_neuron_cell_diagram_en.svg
• https://commons.wikimedia.org/wiki/File:Bicycle_diagram-es.svg
• https://commons.wikimedia.org/wiki/File:Systems_Engineering_V_diagram.jpg
Image sources:
• https://commons.wikimedia.org/wiki/File:Hvdc_bipolar_schematic.svg
• https://commons.wikimedia.org/wiki/File:Beve_gear_schematic.png
• https://commons.wikimedia.org/wiki/File:Interspiro_DCSC_loop_schematic.png
• https://commons.wikimedia.org/wiki/File:Carriage-house-2.jpg
Image sources:
• https://commons.wikimedia.org/wiki/File:Manatee_photo.jpg
• https://commons.wikimedia.org/wiki/File:Frank_W._Micklethwaite_photo_of_downtown_Toronto,_1890_-2.jpg
• https://commons.wikimedia.org/wiki/File:James_Abram_Garfield,_photo_portrait_seated.jpg
• https://commons.wikimedia.org/wiki/File:Wtc-photo.jpg
Image sources:
• https://commons.wikimedia.org/wiki/File:Adams_The_Tetons_and_the_Snake_River.jpg
• https://commons.wikimedia.org/wiki/File:Photographing_sunrise_1745.jpg
• https://commons.wikimedia.org/wiki/File:FEMA_-_5399_-_Photograph_by_Andrea_Booher_taken_on_09-28-2001_in_New_York.jpg
• https://commons.wikimedia.org/wiki/File:Photographing_a_model.jpg
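The collection step on this slide could be sketched roughly as follows. The slides do not say which API parameters were used, so the choice of the MediaWiki `list=search` module, the File namespace filter, and the helper name `build_search_url` are all assumptions made for illustration. The sketch only builds the request URLs and does not contact the network.

```python
from urllib.parse import urlencode

# Public MediaWiki API endpoint for Wikimedia Commons.
COMMONS_API = "https://commons.wikimedia.org/w/api.php"

# Search terms from the slide, grouped by image category.
TERMS = {
    "abstract": ["diagram", "schematic"],
    "natural": ["photo", "photograph"],
}


def build_search_url(term, limit=100):
    """Build a full-text search URL for one term (hypothetical helper).

    Assumption: the standard search module restricted to namespace 6
    (File:) is close enough to the query used in the study.
    """
    params = {
        "action": "query",
        "list": "search",
        "srsearch": term,
        "srnamespace": 6,  # File: namespace, i.e. media files
        "srlimit": limit,
        "format": "json",
    }
    return COMMONS_API + "?" + urlencode(params)


# One request URL per search term.
urls = {
    term: build_search_url(term)
    for terms in TERMS.values()
    for term in terms
}
```

Each URL would then be fetched with an HTTP client of choice, and the returned file titles resolved to image URLs before submission to the search engines.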
7. 7
We then submitted the same image to each reverse image search engine, then again with the next image, and so on...
Image source: https://commons.wikimedia.org/wiki/File:Manatee_photo.jpg
Image source: https://commons.wikimedia.org/wiki/File:Interspiro_DCSC_loop_schematic.png
Screenshot source:
https://images.google.com
Screenshot source:
https://www.bing.com/
Screenshot source:
https://image.baidu.com
Screenshot source:
https://yandex.com/images
8. 8
Using ImageHash’s pHash and GoFigure’s VisHash, we evaluated how often the same image existed in the results
pHash was designed to compare photographs via Discrete Cosine Transforms (DCT).
VisHash was designed to compare diagrams and technical drawings by finding shapes in the image.
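To make the DCT idea behind pHash more concrete, here is a minimal pure-Python sketch of a generic DCT-based perceptual hash. It is not ImageHash's code and not the pipeline used in the study. The function names, the 32x32 input size, and the 8x8 low-frequency block are assumptions chosen for illustration, and the input is assumed to be a grayscale image already resized to a small square grid.

```python
import math
from statistics import median


def dct_1d(values):
    """Unnormalized DCT-II of a sequence of numbers."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(values))
        for k in range(n)
    ]


def dct_2d(matrix):
    """Apply the 1-D DCT to every row, then to every column."""
    rows = [dct_1d(row) for row in matrix]
    cols = [dct_1d(col) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def phash_bits(pixels, hash_size=8):
    """Hash a square grayscale grid (list of lists) into hash_size**2 bits.

    Each bit records whether a low-frequency DCT coefficient is above
    the median of the low-frequency block.
    """
    coeffs = dct_2d(pixels)
    low = [coeffs[r][c] for r in range(hash_size) for c in range(hash_size)]
    mid = median(low)
    return [1 if v > mid else 0 for v in low]


def hamming(a, b):
    """Number of positions at which two bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


# Synthetic 32x32 "image" used only to exercise the functions.
sample = [[(x * 8 + y * 3) % 256 for x in range(32)] for y in range(32)]
print(hamming(phash_bits(sample), phash_bits(sample)))  # 0: identical images
```

In a setup like the one on this slide, a result image would presumably count as "the same image" when the Hamming distance between its hash and the query's hash falls below some threshold. The slides do not state that threshold, so none is assumed here.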
Uploaded images:
https://commons.wikimedia.org/wiki/File:Manatee_photo.jpg
https://commons.wikimedia.org/wiki/File:Interspiro_DCSC_loop_schematic.png
Screenshots source:
https://yandex.com/images
9. 9
Precision differs based on pages-with or similar-to results, with Yandex performing best
blue = abstract images
green = natural images
Precision@k: What percentage of images in the results are the same as the query image if we stop at k results?
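The Precision@k definition above can be sketched in a few lines. The representation is an assumption: each result list is a sequence of booleans, where `True` means the result was judged to be the same image as the query (for example, via a hash comparison). Dividing by k even when fewer than k results come back is the conventional choice, and it is assumed here rather than taken from the slides.

```python
def precision_at_k(matches, k):
    """Fraction of the top-k results that match the query image.

    matches: booleans in rank order, True if that result is the query image.
    Missing results (fewer than k returned) count as non-matches.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    return sum(1 for m in matches[:k] if m) / k


# Hypothetical judgments for one query, in rank order.
results = [True, False, True, False, False]
print(precision_at_k(results, 2))  # 0.5
print(precision_at_k(results, 5))  # 0.4
```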
S. M. Jones and D. Oyen. 2022. “Abstract Images Have Different Levels of Retrievability Per Reverse Image Search Engine,” Proceedings of the 2nd Drawings and abstract Imagery: Representation and Analysis (DIRA) Workshop. (Tel Aviv, Israel).
10. 10
After reviewing 10 pages-with results, Google shows a retrievability difference of up to 54% between images in the photograph and diagram categories
blue = abstract images
green = natural images
Retrievability: Given a query image, was it retrieved within the cutoff c?
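As a rough sketch of the retrievability definition above: each query yields a yes/no answer at cutoff c, and a per-category score can be formed as the fraction of queries answered "yes". Aggregating as a simple fraction, the function names, and the sample data are all assumptions for illustration, not figures from the study.

```python
def retrieved_within(matches, c):
    """True if the query image appears among the first c results."""
    return any(matches[:c])


def retrievability_rate(queries, c):
    """Fraction of queries whose image was retrieved within cutoff c."""
    if not queries:
        raise ValueError("at least one query is required")
    return sum(retrieved_within(m, c) for m in queries) / len(queries)


# Hypothetical judgments: one list of booleans per query, in rank order.
natural = [[True], [False, True], [False, False, True], [True, True]]
abstract = [[False, False], [True], [False, False, False], [False]]

gap = retrievability_rate(natural, 3) - retrievability_rate(abstract, 3)
print(gap)  # 0.75: all 4 natural queries found vs. 1 of 4 abstract ones
```

Comparing the two rates at the same cutoff gives a between-category difference of the kind the slide reports.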
11. 11
For similar-to results, Yandex consistently provides a high MRR (0.8) for natural images
MRR: How many results, on average, across all queries, must a visitor review before finding the same image again?
Google does well with pages-with results
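For reference, mean reciprocal rank is conventionally computed as the average, over all queries, of 1/rank of the first matching result. A minimal sketch follows, assuming the same boolean-list representation of judged results and assuming that a query with no match contributes 0. The sample data is hypothetical.

```python
def reciprocal_rank(matches):
    """1/rank of the first matching result, or 0.0 if nothing matches."""
    for rank, is_match in enumerate(matches, start=1):
        if is_match:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries):
    """Average reciprocal rank across all queries."""
    if not queries:
        raise ValueError("at least one query is required")
    return sum(reciprocal_rank(m) for m in queries) / len(queries)


# Hypothetical judgments: match at rank 1, match at rank 2, no match.
queries = [[True], [False, True], [False, False]]
print(mean_reciprocal_rank(queries))  # (1 + 0.5 + 0) / 3 = 0.5
```

Under this convention, an MRR near 1 means the query image usually appears as the first result, while an MRR near 0.5 corresponds to it typically appearing around the second position.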
12. 12
Key Takeaways
• We submitted abstract and natural images from Wikimedia Commons to four major reverse image search engines.
• When they do return results, Bing and Baidu do not perform well.
• Google does not perform well for similar-to results, likely indicating that its definition of similar-to differs from that of other search engines.
• Yandex performs best in all cases.
• Yandex and Google consistently perform better for natural images in pages-with results.