Applying Commercial Computer Vision Tools to Cope with Uncertainties in a Citizendriven Archive

•

0 j'aime•78 vues

Technological Ecosystems for Enhancing Multiculturality

Track 13. Uncertainty in Digital Humanities Author: Amelie Dorn, Eveline Wandl-Vogt, Thomas Palfinger, Jose Luis Preza Diaz, Barbara Piringer, Alexander Schatek and Rainer Zoubek

Formation

Applying Commercial Computer Vision
Tools to Cope with Uncertainties in a
Citizen-driven Archive.
The case study Topothek@exploreAT!
Amelie Dorn 1, Eveline Wandl-Vogt 1
Thomas Palfinger 1, José Luis Preza Díaz 1, Barbara Piringer 1
Alexander Schatek 2, Rainer Zoubek 2
1 Austrian Academy of Sciences, Austrian Centre for Digital Humanities - AAS (AT)
2 Topothek (AT)
TEEM’2018. Track 13. Salamanca (ES) 24/10/2018

Background, Problem Definition and Motivation
Uncertainty
• commonly encountered issue across data of
different disciplines and communities
• causes: mixed data quality, low-level annotations,
time, lack of standards, incoherent
infrastructures, etc
• challenges: data extraction, analysis, interlinking

Background, Problem Definition and Motivation
Motivation
• Digital Humanities → availability of digital methods
& tools to tackle uncertainties
• Use-case Topothek → testing of AI technologies
(Clarifai, IBM Watson, Microsoft cognitive services)
with the aim to enhance data access for
humans (researchers, citizens) as well as machines

Background, Problem Definition and Motivation
Context
@ exploration space ACDH-OeAW
→ multidisciplinary DH-project
→ implementation in Open Innovation Research
Infrastructure (OI-RI)
→ link language data to real-world objects

Use Case Topothek - Background, Goals and Tagging
● completely citizen-driven archive
● focus on history of municipalities
● 200 topotheques in 14 European
countries
● ~300.000 objects
Sources: https://www.topothek.at/en/; https://www.topothek.at/de/unsere-topotheken
Map data ©2018 GeoBasis-DE/BKG (©2009), Google

Use Case Topotheque - Background, Goals and
Tagging
Community tags
- persons
Object tags
- buildings,
objects
Context tags
- events
Source: Topothek Wieselburg. http://wieselburg.topothek.at/#doc=311992. Copyright notice: Urheber: Stadtarchiv, Foto-Dufek 1437/25A

Use Case Topotheque - Background, Goals and
Tagging
Community tags
- persons
Object tags
- buildings,
objects
Context tags
- events
→ sources of uncertainty:
image quality, tag quality
Source: Topothek Wieselburg. http://wieselburg.topothek.at/#doc=311992. Copyright notice: Urheber: Stadtarchiv, Foto-Dufek 1437/25A

Methods: addressing uncertainties in Topothek (AT)
© 2018 Eveline Wandl-Vogt, Thomas Palfinger, Barbara Piringer

Research Questions
Q1: How many algorithm generated concepts are
identified as relevant/ irrelevant for the image for pooled
results from the three different tools?
Q2: What are the consequences of Q1 regarding
certainty/ uncertainty in Topotheque’s data for access
and interconnection?

Results: Computer Vision (1)
● 3 CV-algorithms
● 21 images
● 912 / 1067 tags
● 24 common
concepts
● 782 differing
concepts

Results: Computer Vision (2)
Q1: How many algorithm generated concepts are identified
as relevant/ irrelevant for the image for pooled results from
the three different tools?
Total: 912 unique concepts
relevant: 40% (n=367)
non-relevant: 36% (n=326)
not useful: 14% (n=128)
uncertain: 4% (n=37)

Results: CV & Uncertainty
Q2: What are the consequences of Q1 regarding certainty/
uncertainty in Topotheque’s data for the further
application of cv, the citizen user and the researcher?
→ need to train algorithm on data to make CV feasible for
citizens
→ 3/3: uncertainty significantly reduced when same concepts
produced by the 3 algorithms
→2/3: on average 5 relevant and 1 non-relevant
concept/image = more than double number of manually added
object tags

Challenges, learnings & conclusion
• build on the experiences of a science-as-a-service
infrastructure
• improve results with human-machine-interaction
• improve the developmentof a thesaurus
• tackle technical and societal challenges of uncertainty
further

Thank you for your attention!
@adooorn
#explorations4u

Recommandé

Machine Learning part1 - Introduction to Data Science Frank Kienle

Machine Learning part 2 - Introduction to Data Science Frank Kienle

GTU GeekDay Data Science and ApplicationsKürşat İNCE

[DOLAP2019] Augmented Business IntelligenceUniversity of Bologna

[ADBIS 2021] - Optimizing Execution Plans in a MultistoreChiara Forresi

Business Models - Introduction to Data ScienceFrank Kienle

NoSQL (Not Only SQL)Pouria Amirian

From a sea of projects to collaboration opportunities within secondsMichel Drescher

Recommandé

Machine Learning part1 - Introduction to Data Science Frank Kienle

Machine Learning part 2 - Introduction to Data Science Frank Kienle

GTU GeekDay Data Science and ApplicationsKürşat İNCE

[DOLAP2019] Augmented Business IntelligenceUniversity of Bologna

[ADBIS 2021] - Optimizing Execution Plans in a MultistoreChiara Forresi

Business Models - Introduction to Data ScienceFrank Kienle

NoSQL (Not Only SQL)Pouria Amirian

From a sea of projects to collaboration opportunities within secondsMichel Drescher

2005)butest

Kaggle Days Milan - March 2019Alberto Danese

Jiali_Han_ResumeJiali Han

Kaggle Days Brussels - Alberto DaneseAlberto Danese

Ivann's Resume(Recent Graduate)Ivann Altman-McCray

data scienceskhraletta

Learning to Learn Model Behavior ( Capital One: data intelligence conference )Pramit Choudhary

ECAI 2014 Tutorial on a behavioral analysis tool for agent-based simulations ...tamasmahr

Adopting a situated learning framework for (big) data projectsCranfield University

Adding Open Data Value to 'Closed Data' ProblemsSimon Price

Killam_PaulResumeProgrammingPaul Killam

Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...Ilkay Altintas, Ph.D.

Programming for data science in pythonUmmeSalmaM1

Computer Sciences Lab butest

Data scienceMohamed Loey

Short intro to 3D Geoinfo pilot The NetherlandsLéon Berlo

Deep learning: challenges and applicationsAboul Ella Hassanien

Data science SouravSadhukhan6

Learning End to EndThilo Stadelmann

Data scienceSreejith c

The Art and Science of Analyzing Software DataCS, NcState

Data Sets as Facilitator for new Products and Services for UniversitiesHendrik Drachsler

Contenu connexe

Tendances

2005)butest

Kaggle Days Milan - March 2019Alberto Danese

Jiali_Han_ResumeJiali Han

Kaggle Days Brussels - Alberto DaneseAlberto Danese

Ivann's Resume(Recent Graduate)Ivann Altman-McCray

data scienceskhraletta

Learning to Learn Model Behavior ( Capital One: data intelligence conference )Pramit Choudhary

ECAI 2014 Tutorial on a behavioral analysis tool for agent-based simulations ...tamasmahr

Adopting a situated learning framework for (big) data projectsCranfield University

Adding Open Data Value to 'Closed Data' ProblemsSimon Price

Killam_PaulResumeProgrammingPaul Killam

Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...Ilkay Altintas, Ph.D.

Programming for data science in pythonUmmeSalmaM1

Computer Sciences Lab butest

Data scienceMohamed Loey

Short intro to 3D Geoinfo pilot The NetherlandsLéon Berlo

Deep learning: challenges and applicationsAboul Ella Hassanien

Data science SouravSadhukhan6

Learning End to EndThilo Stadelmann

Data scienceSreejith c

Tendances (20)

2005)

Kaggle Days Milan - March 2019

Jiali_Han_Resume

Kaggle Days Brussels - Alberto Danese

Ivann's Resume(Recent Graduate)

data science

Learning to Learn Model Behavior ( Capital One: data intelligence conference )

ECAI 2014 Tutorial on a behavioral analysis tool for agent-based simulations ...

Adopting a situated learning framework for (big) data projects

Adding Open Data Value to 'Closed Data' Problems

Killam_PaulResumeProgramming

Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...

Programming for data science in python

Computer Sciences Lab

Data science

Short intro to 3D Geoinfo pilot The Netherlands

Deep learning: challenges and applications

Data science

Learning End to End

Data science

Similaire à Applying Commercial Computer Vision Tools to Cope with Uncertainties in a Citizendriven Archive

The Art and Science of Analyzing Software DataCS, NcState

Data Sets as Facilitator for new Products and Services for UniversitiesHendrik Drachsler

Opportunities and methodological challenges of Big Data for official statist...Piet J.H. Daas

Thoughts on Knowledge Graphs & Deeper ProvenancePaul Groth

Towards a Community-driven Data Science Body of Knowledge – Data Management S...Research Data Alliance

Lecture_1_Intro_toDS&AI.pptxDrSudheerHanumanthak

Toward supporting decision-making under uncertainty in digital humanities wit...Technological Ecosystems for Enhancing Multiculturality

Ci2004-10.docbutest

data scienceskhraletta

How to crack down big data? Ta-Wei (David) Huang

Archaeology and cultural heritage application working groupManolis Vavalis

The Nature of Digitally-Produced Data: Towards Social-Scientific Tool CriticismJacco van Ossenbruggen

The Nature Of Digitally-Produced Data: Towards Social-Scientific Tool CriticismMyriam Traub

Visual Information Analysis for Crisis and Natural Disasters Management and R...Yiannis Kompatsiaris

Big data as a source for official statisticsEdwin de Jonge

Data Science Demystified_ Journeying Through Insights and InnovationsVaishali Pal

Data Science in 2016: Moving UpPaco Nathan

Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015Big Data Spain

Digital Forensics: The next 10 yearsAl Imran, CISA

Parthenos Webinar Make It Happen - Carrying Out Research and Analyzing DataParthenos

Similaire à Applying Commercial Computer Vision Tools to Cope with Uncertainties in a Citizendriven Archive (20)

The Art and Science of Analyzing Software Data

Data Sets as Facilitator for new Products and Services for Universities

Opportunities and methodological challenges of Big Data for official statist...

Thoughts on Knowledge Graphs & Deeper Provenance

Towards a Community-driven Data Science Body of Knowledge – Data Management S...

Lecture_1_Intro_toDS&AI.pptx

Toward supporting decision-making under uncertainty in digital humanities wit...

Ci2004-10.doc

data science

How to crack down big data?

Archaeology and cultural heritage application working group

The Nature of Digitally-Produced Data: Towards Social-Scientific Tool Criticism

The Nature Of Digitally-Produced Data: Towards Social-Scientific Tool Criticism

Visual Information Analysis for Crisis and Natural Disasters Management and R...

Big data as a source for official statistics

Data Science Demystified_ Journeying Through Insights and Innovations

Data Science in 2016: Moving Up

Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015

Digital Forensics: The next 10 years

Parthenos Webinar Make It Happen - Carrying Out Research and Analyzing Data

Plus de Technological Ecosystems for Enhancing Multiculturality

A Preliminary Study of Proof of Concept Practices and their connection with I...Technological Ecosystems for Enhancing Multiculturality

Social networks as a promotional space for Spanish radio content. The case st...Technological Ecosystems for Enhancing Multiculturality

Towards the study of sentiment in the public opinion of science in SpanishTechnological Ecosystems for Enhancing Multiculturality

A Three-Step Data-Mining Analysis of Top-Ranked Higher Education Institutions...Technological Ecosystems for Enhancing Multiculturality

Specifics of multimedia texts in the context of social networks media aestheticsTechnological Ecosystems for Enhancing Multiculturality

Combined Effects of Similarity and Imagined Contact on First-Person Testimoni...Technological Ecosystems for Enhancing Multiculturality

Direct online political communication effects on civil participation in spain...Technological Ecosystems for Enhancing Multiculturality

University Media in Ecuador: Types, Functions and Self-determinationTechnological Ecosystems for Enhancing Multiculturality

Like it or die: using social networks to improve collaborative learning in hi...Technological Ecosystems for Enhancing Multiculturality

Framing theory in studies of environmental information in pressTechnological Ecosystems for Enhancing Multiculturality

Domain engineering for generating dashboards to analyze employment and employ...Technological Ecosystems for Enhancing Multiculturality

Mapping the systematic literature studies about software ecosystemsTechnological Ecosystems for Enhancing Multiculturality

Tag-Based Browsing of Digital Collections with Inverted Indexes and Browsing ...Technological Ecosystems for Enhancing Multiculturality

A Multivocal Literature Review on the use of DevOps for e-learning systemsTechnological Ecosystems for Enhancing Multiculturality

Document Annotation Tools: Annotation Classification MechanismsTechnological Ecosystems for Enhancing Multiculturality

Managing Uncertainty in the Humanities: Digital and Analogue ApproachesTechnological Ecosystems for Enhancing Multiculturality

Representing Imprecise and Uncertain Knowledge in Digital Humanities: A Theor...Technological Ecosystems for Enhancing Multiculturality

Dotmocracy and Planning Poker for Uncertainty Management in Collaborative Res...Technological Ecosystems for Enhancing Multiculturality

Appliying topic modeling techniques to degraded texts. Spanish historical pre...Technological Ecosystems for Enhancing Multiculturality

Uncertainty in the spatial metadata of historical photographs – A geomatic an...Technological Ecosystems for Enhancing Multiculturality

Plus de Technological Ecosystems for Enhancing Multiculturality (20)

A Preliminary Study of Proof of Concept Practices and their connection with I...

Social networks as a promotional space for Spanish radio content. The case st...

Towards the study of sentiment in the public opinion of science in Spanish

A Three-Step Data-Mining Analysis of Top-Ranked Higher Education Institutions...

Specifics of multimedia texts in the context of social networks media aesthetics

Combined Effects of Similarity and Imagined Contact on First-Person Testimoni...

Direct online political communication effects on civil participation in spain...

University Media in Ecuador: Types, Functions and Self-determination

Like it or die: using social networks to improve collaborative learning in hi...

Framing theory in studies of environmental information in press

Domain engineering for generating dashboards to analyze employment and employ...

Mapping the systematic literature studies about software ecosystems

Tag-Based Browsing of Digital Collections with Inverted Indexes and Browsing ...

A Multivocal Literature Review on the use of DevOps for e-learning systems

Document Annotation Tools: Annotation Classification Mechanisms

Managing Uncertainty in the Humanities: Digital and Analogue Approaches

Representing Imprecise and Uncertain Knowledge in Digital Humanities: A Theor...

Dotmocracy and Planning Poker for Uncertainty Management in Collaborative Res...

Appliying topic modeling techniques to degraded texts. Spanish historical pre...

Uncertainty in the spatial metadata of historical photographs – A geomatic an...

Dernier

Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron

Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB

Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K

“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr

Arihant handbook biology for class 11 .pdfchloefrazer622

Software Engineering Methodologies (overview)eniolaolutunde

Staff of Color (SOC) Retention Efforts DDSDDavid Douglas School District

Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle

How to Make a Pirate ship Primary Education.pptxmanuelaromero2013

BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy

Paris 2024 Olympic Geographies - an activityGeoBlogs

Código Creativo y Arte de Software | Unidad 1Maestría en Comunicación Digital Interactiva - UNR

Nutritional Needs Presentation - HLTH 104misteraugie

INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxRAM LAL ANAND COLLEGE, DELHI UNIVERSITY.

1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh

SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood

Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"National Information Standards Organization (NISO)

Accessible design: Minimum effort, maximum impactdawncurless

Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD

Measures of Central Tendency: Mean, Median and ModeThiyagu K

Dernier (20)

Q4-W6-Restating Informational Text Grade 3

Beyond the EU: DORA and NIS 2 Directive's Global Impact

Z Score,T Score, Percential Rank and Box Plot Graph

“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...

Arihant handbook biology for class 11 .pdf

Software Engineering Methodologies (overview)

Staff of Color (SOC) Retention Efforts DDSD

Hybridoma Technology ( Production , Purification , and Application )

How to Make a Pirate ship Primary Education.pptx

BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf

Paris 2024 Olympic Geographies - an activity

Código Creativo y Arte de Software | Unidad 1

Nutritional Needs Presentation - HLTH 104

INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx

1029-Danh muc Sach Giao Khoa khoi 6.pdf

SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx

Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"

Accessible design: Minimum effort, maximum impact

Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...

Measures of Central Tendency: Mean, Median and Mode

Applying Commercial Computer Vision Tools to Cope with Uncertainties in a Citizendriven Archive

1. Applying Commercial Computer Vision Tools to Cope with Uncertainties in a Citizen-driven Archive. The case study Topothek@exploreAT! Amelie Dorn 1, Eveline Wandl-Vogt 1 Thomas Palfinger 1, José Luis Preza Díaz 1, Barbara Piringer 1 Alexander Schatek 2, Rainer Zoubek 2 1 Austrian Academy of Sciences, Austrian Centre for Digital Humanities - AAS (AT) 2 Topothek (AT) TEEM’2018. Track 13. Salamanca (ES) 24/10/2018

2. Background, Problem Definition and Motivation Uncertainty • commonly encountered issue across data of different disciplines and communities • causes: mixed data quality, low-level annotations, time, lack of standards, incoherent infrastructures, etc • challenges: data extraction, analysis, interlinking

3. Background, Problem Definition and Motivation Motivation • Digital Humanities → availability of digital methods & tools to tackle uncertainties • Use-case Topothek → testing of AI technologies (Clarifai, IBM Watson, Microsoft cognitive services) with the aim to enhance data access for humans (researchers, citizens) as well as machines

4. Background, Problem Definition and Motivation Context @ exploration space ACDH-OeAW → multidisciplinary DH-project → implementation in Open Innovation Research Infrastructure (OI-RI) → link language data to real-world objects

5. Use Case Topothek - Background, Goals and Tagging ● completely citizen-driven archive ● focus on history of municipalities ● 200 topotheques in 14 European countries ● ~300.000 objects Sources: https://www.topothek.at/en/; https://www.topothek.at/de/unsere-topotheken Map data ©2018 GeoBasis-DE/BKG (©2009), Google

6. Use Case Topotheque - Background, Goals and Tagging Community tags - persons Object tags - buildings, objects Context tags - events Source: Topothek Wieselburg. http://wieselburg.topothek.at/#doc=311992. Copyright notice: Urheber: Stadtarchiv, Foto-Dufek 1437/25A

7. Use Case Topotheque - Background, Goals and Tagging Community tags - persons Object tags - buildings, objects Context tags - events → sources of uncertainty: image quality, tag quality Source: Topothek Wieselburg. http://wieselburg.topothek.at/#doc=311992. Copyright notice: Urheber: Stadtarchiv, Foto-Dufek 1437/25A

9. Research Questions Q1: How many algorithm generated concepts are identified as relevant/ irrelevant for the image for pooled results from the three different tools? Q2: What are the consequences of Q1 regarding certainty/ uncertainty in Topotheque’s data for access and interconnection?

10. Results: Computer Vision (1) ● 3 CV-algorithms ● 21 images ● 912 / 1067 tags ● 24 common concepts ● 782 differing concepts

11. Results: Computer Vision (2) Q1: How many algorithm generated concepts are identified as relevant/ irrelevant for the image for pooled results from the three different tools? Total: 912 unique concepts relevant: 40% (n=367) non-relevant: 36% (n=326) not useful: 14% (n=128) uncertain: 4% (n=37)

12. Results: CV & Uncertainty Q2: What are the consequences of Q1 regarding certainty/ uncertainty in Topotheque’s data for the further application of cv, the citizen user and the researcher? → need to train algorithm on data to make CV feasible for citizens → 3/3: uncertainty significantly reduced when same concepts produced by the 3 algorithms →2/3: on average 5 relevant and 1 non-relevant concept/image = more than double number of manually added object tags

13. Challenges, learnings & conclusion • build on the experiences of a science-as-a-service infrastructure • improve results with human-machine-interaction • improve the developmentof a thesaurus • tackle technical and societal challenges of uncertainty further

14. Thank you for your attention! @adooorn #explorations4u