Tool Criticism
Marijn Koolen
(Huygens Institute for the History of the Netherlands)
Tools & Methods guest lecture - 2021-03-02 - Groningen
Your Interests?
● Python/Jupyter to do:
○ GIS, Plotting locations on Google Maps
○ Machine Learning, (un)supervised learning, visualisation techniques
○ Mining social media data
○ TF*IDF, information processing, Word embedding models
○ Statistics / JASP
● Questions:
○ Why are you interested?
○ What would you like to do with this?
Online Resources
● Online (note)books for DH and Python
● Generic Jupyter:
○ https://jupyter4edu.github.io/jupyter-edu-book/
○ https://programminghistorian.org/en/lessons/jupyter-notebooks
● Specific methods and techniques
○ Cultural Analytics: https://melaniewalsh.github.io/Intro-Cultural-Analytics/welcome.html
■ Includes TF*IDF, Tweet mining and analysis, Geocoding
○ Named Entity Recognition: http://ner.pythonhumanities.com/intro.html
○ Deep Learning: https://course.fast.ai
○ NLP: Traditional and Deep Learning: https://www.fast.ai/2019/07/08/fastai-nlp/
■ Includes Word Embeddings, sentiment analysis, topic modelling, classification, …
○ GLAM Workbench: https://glam-workbench.github.io
■ Retrieving and analysing data from Galleries, Libraries, Archives, Museums
Guiding Questions
● Starting point: (digital) source criticism
○ Method / approach in the humanities and specifically in historical research (cf. Fickers, 2012)
○ Internal source criticism: content of the document
○ External source criticism: metadata of the document (context)
■ Who created the document?
■ What kind of document is it?
■ Where was it made and distributed?
■ When was it made?
■ Why was it made?
● Digital Tool Criticism
○ What makes digital tool criticism different from digital source criticism?
○ Tool hermeneutics: what was its intended use? Does that align with my intended use? How does it affect the digital sources/data it operates on?
Recommendations
For researchers:
- Incorporate digital source, data and tool criticism in the research process
- Explicitly ask and answer questions about assumptions, choices, limitations
- Document and share workarounds
- Look for “About” pages and documentation on
  - Functionalities, configurations, parameter choices
  - Selection criteria and transformations of data sets
- Develop a method of experimenting with a tool to test how it functions
- Look under the hood to develop better intuitions, grow your conceptual toolbox
  - E.g. how can you test if a search engine filters stopwords or does linguistic normalization?
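One way to approach the question about testing a search engine is to treat the tool as a black box and send it pairs of queries that should only return identical results if a specific normalisation is applied. The sketch below is a minimal illustration, not a real search API: `toy_search`, its stopword list and the three documents are hypothetical stand-ins. In practice you would replace `toy_search` with queries against the tool you are studying.

```python
# Hypothetical stand-in for a black-box search tool; replace with real queries.
STOPWORDS = {"the", "of", "a"}  # assumption: normally hidden from the user

DOCS = {
    1: "The History of the Netherlands",
    2: "history of tools",
    3: "A Tool for Historians",
}

def toy_search(query):
    """Return ids of documents containing all query terms (after hidden processing)."""
    terms = [t for t in query.lower().split() if t not in STOPWORDS]
    hits = set()
    for doc_id, text in DOCS.items():
        doc_terms = set(text.lower().split())
        if all(t in doc_terms for t in terms):
            hits.add(doc_id)
    return hits

def probe(search, query_a, query_b):
    """Identical result sets suggest the tool treats the two queries as equivalent."""
    return search(query_a) == search(query_b)

# Probe 1: does adding a stopword change the results?
ignores_stopwords = probe(toy_search, "history", "the history")
# Probe 2: does letter case matter?
ignores_case = probe(toy_search, "history", "HISTORY")
# Probe 3: are singular and plural conflated (stemming or lemmatisation)?
conflates_plural = probe(toy_search, "tool", "tools")

print(ignores_stopwords, ignores_case, conflates_plural)
```

Identical results are evidence, not proof: a single query pair can coincide by accident, so a real experiment would use several pairs per hypothesis and document the outcomes.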
Model: Reflection as Integrative Practice
Koolen, van Gorp & van Ossenbruggen, 2018. Toward a model for digital tool criticism: Reflection as integrative practice. In Digital
Scholarship in the Humanities 2018. https://academic.oup.com/dsh/advance-article/doi/10.1093/llc/fqy048/5127711
Role of Reflection
● Reflection In Action
○ Process is often unpredictable and uncertain (Schön 1983, p. 40)
○ Some actions, recognitions and judgements we carry out spontaneously, without thinking
about them (p. 54)
○ Use reflection to criticize tacit understanding grown from repetitive experiences (p. 61)
● This fits certain aspects of scholarly practice
○ E.g. searching, browsing, selecting using various information systems (digital archives and
libraries, catalogs and other databases).
○ But information systems already have pre-selection, rarely well-documented (digital source
criticism!)
Research Design as Wicked Problem
● Wicked problem
○ Design theory concept, a problem that is inherently ill-defined (Rittel in Churchman 1967)
○ Working towards solution changes the nature of the problem
● Humanities research is designed iteratively (Bron et al. 2016)
○ Impossible to plan where investigation takes you
○ Engagement with research materials shifts the goalposts
○ Affects appropriateness of design for RQ
● User-friendliness of digital tools exacerbates the problem
○ Graphical User Interfaces (GUIs) often hide relevant data transformations and manipulations
○ Difficult to look under the hood
○ Requires active reflective attitude
Entanglement of Data and Tools
Each step changes the underlying data!
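To make this point concrete, here is a small sketch of a text-preparation pipeline that records what is left of the data after every step. The steps, the stopword list and the example sentence are illustrative assumptions, not the pipeline of any particular tool.

```python
import re

STOPWORDS = {"the", "of", "and", "in"}  # illustrative assumption

def lowercase(tokens):
    return [t.lower() for t in tokens]

def strip_punctuation(tokens):
    # Remove non-word characters and drop tokens that become empty.
    cleaned = [re.sub(r"[^\w]", "", t) for t in tokens]
    return [t for t in cleaned if t]

def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]

def run_pipeline(text, steps):
    """Apply each step in turn and keep a snapshot of the data after every step."""
    tokens = text.split()
    trace = [("split on whitespace", list(tokens))]
    for step in steps:
        tokens = step(tokens)
        trace.append((step.__name__, list(tokens)))
    return trace

text = "The History of the Netherlands, and the VOC in Asia!"
trace = run_pipeline(text, [lowercase, strip_punctuation, remove_stopwords])
for name, tokens in trace:
    print(f"{name:20} {len(tokens):2} tokens, {len(set(tokens)):2} distinct: {tokens}")
```

Even in this tiny example the output no longer distinguishes the acronym "VOC" from a lowercase word, and more than half of the tokens are gone. A GUI tool may perform all of these steps without showing any of them.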
Tools or Methods?
● How to address tool criticism questions
○ Focus on research methods
● E.g. Social Network Analysis (SNA)
○ Understand concepts, techniques and applications of SNA before assessing SNA tools
○ How many of you have used SNA tools? How many of you want to use them?
○ Gephi or NetworkX (Python library)
● Before you ask...
○ Which layout algorithm should I use?
○ Which community detection algorithm should I use? What parameters are good?
● … understand core concepts:
○ nodes, edges, link degrees, paths, connected components
○ Modularity, bridge, weak ties
○ Completeness, impact of missing data
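Several of the core SNA concepts can be explored without any tool, which helps when judging what Gephi or NetworkX report later. The sketch below uses plain Python on a made-up friendship network (names and edges are hypothetical). It computes degrees and connected components, finds bridges by brute force, and shows how a single missing edge changes the picture. It is meant for building intuition, not as an efficient implementation.

```python
from collections import defaultdict, deque

def build_graph(edges):
    """Undirected graph as a dict of adjacency sets."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def connected_components(graph):
    """Breadth-first search from every node that has not been visited yet."""
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components

def bridges(edges):
    """An edge is a bridge if removing it increases the number of components."""
    nodes = {n for edge in edges for n in edge}
    base = len(connected_components(build_graph(edges)))
    found = []
    for edge in edges:
        graph = build_graph([e for e in edges if e != edge])
        for n in nodes:
            graph[n]  # touching the key keeps isolated nodes in the graph
        if len(connected_components(graph)) > base:
            found.append(edge)
    return found

# Two tightly knit triangles joined by a single tie (hypothetical data).
edges = [("Ann", "Bob"), ("Bob", "Cas"), ("Ann", "Cas"),
         ("Cas", "Dee"),
         ("Dee", "Eva"), ("Eva", "Fay"), ("Dee", "Fay")]

graph = build_graph(edges)
degrees = {node: len(neighbours) for node, neighbours in graph.items()}
print(degrees)
print(len(connected_components(graph)), "component(s)")
print("bridges:", bridges(edges))

# Impact of missing data: if the Cas-Dee tie was never recorded,
# the same people appear as two unrelated groups.
incomplete = [e for e in edges if e != ("Cas", "Dee")]
print(len(connected_components(build_graph(incomplete))), "component(s)")
```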
Source: https://towardsdatascience.com/generating-twitter-ego-networks-detecting-ego-communities-93897883d255
TF*IDF
● Term Frequency * Inverse Document Frequency
○ Used in many methods and tools
○ What was TF*IDF originally intended for?
● Again, start from method
○ Natural Language Processing, Information Theory
○ Concepts: Zipf’s law, tokenisation, stop, stem, lemma, part-of-speech, mutual information
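For reference, one common formulation of the weighting is given below for a term $t$ in a document $d$ from a collection of $N$ documents. Many variants exist, and tools differ in which one they implement, so this should be read as an example rather than as the definition.

```latex
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \cdot \log \frac{N}{\mathrm{df}(t)}
```

Here $\mathrm{tf}(t, d)$ is the number of times $t$ occurs in $d$ and $\mathrm{df}(t)$ is the number of documents that contain $t$. Which variant a tool uses (raw or log-scaled term frequency, smoothed inverse document frequency, length normalisation) is itself a tool criticism question.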
Hands On with TF*IDF
● I’ve prepared a Jupyter notebook that demonstrates the workings of TF*IDF
○ Using social media data (tweets and online reviews)
○ With 7 questions to reflect on its details
● Break out groups
○ Open the notebook and discuss the questions (take 20 mins.)
○ Afterwards we discuss your observations and your own questions
○ Also, look at the Wikipedia page on TF*IDF: https://en.wikipedia.org/wiki/Tf-idf
Text Mining in Tweets
● Text Mining of tweets (and other short records)
○ Tweets are peculiar textual representations
■ Minimal amount of text, low redundancy
■ Majority of terms occur only once
○ Which part of TF*IDF contributes more to the TF*IDF score of a tweet?
○ Consequences for ranking/clustering/mining?
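The question of which factor contributes more can be checked directly. The sketch below computes TF and IDF separately on four made-up tweets, using raw term counts and log(N/df) as one common variant and whitespace tokenisation as a simplifying assumption. Because nearly every term occurs once per tweet, TF is almost constant and the differences between scores come mostly from IDF.

```python
import math
from collections import Counter

# Hypothetical tweets, already lowercased and without punctuation.
tweets = [
    "stuck in traffic again",
    "traffic is terrible today",
    "lovely weather today",
    "again and again the same traffic",
]

docs = [tweet.split() for tweet in tweets]
n_docs = len(docs)

# Document frequency: in how many tweets does each term occur?
df = Counter(term for doc in docs for term in set(doc))

def tf_idf_parts(doc):
    """Return {term: (tf, idf, tf * idf)} with raw counts and idf = log(N / df)."""
    tf = Counter(doc)
    parts = {}
    for term, count in tf.items():
        idf = math.log(n_docs / df[term])
        parts[term] = (count, idf, count * idf)
    return parts

# How often is the term frequency simply 1?
all_tfs = [count for doc in docs for count in Counter(doc).values()]
share_tf_one = sum(1 for c in all_tfs if c == 1) / len(all_tfs)
print(f"share of (tweet, term) pairs with tf = 1: {share_tf_one:.2f}")

parts = tf_idf_parts(docs[3])
for term, (tf, idf, score) in sorted(parts.items()):
    print(f"{term:8} tf={tf} idf={idf:.2f} tf*idf={score:.2f}")
```

In a collection of long documents the TF column would vary widely. Here it hardly varies at all, which is worth keeping in mind when a tool ranks or clusters tweets by TF*IDF.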
Sentiment Analysis and Emotion Lexicons
● Resources
○ NRC EmoLex: 8 basic emotions (https://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm)
○ LIWC: over 70 categories, incl. emotions (https://liwc.wpengine.com)
○ VADER: Valence Aware Dictionary and sEntiment Reasoner (https://github.com/cjhutto/vaderSentiment)
● Critical questions
○ How do they work? What are they intended to measure? For what text genres?
○ How reliable are they? What do they capture well? What are typical mistakes they make?
● Lessons from 20+ years of NLP research:
○ sentiment is domain-specific, nowadays aspect-based (reviews of hotels, restaurants and smartphones have their own vocabularies)
● ALWAYS combine quantitative with qualitative analysis!
○ They contextualise each other
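The critical questions about lexicons become tangible with a toy lexicon-based scorer. The word list below is invented for illustration and is not taken from NRC EmoLex, LIWC or VADER. The scorer simply sums word polarities, which is a strong simplification of how lexicon-based tools work. It shows two typical failure modes: negation and domain-specific vocabulary.

```python
# Invented mini-lexicon (assumption); real lexicons are far larger and differ in design.
LEXICON = {"good": 1, "great": 1, "long": -1, "bad": -1, "terrible": -1}

def lexicon_score(text, lexicon=LEXICON):
    """Sum the polarity of every known word; unknown words count as 0."""
    words = text.lower().replace(",", " ").replace(".", " ").split()
    return sum(lexicon.get(w, 0) for w in words)

examples = {
    "negation": "The food was not good.",
    "smartphone review": "Great phone, long battery life.",
    "hotel review": "Great hotel, long wait at the desk.",
}

scores = {label: lexicon_score(text) for label, text in examples.items()}
for label, score in scores.items():
    print(f"{label:18} {score:+d}  {examples[label]}")
```

The negated sentence receives a positive score, and the two reviews receive the same score although "long" is praise for a battery and a complaint about waiting. Reading a sample of scored texts alongside the numbers is what makes such errors visible.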
Questions About Social Media Sentiment Mining
● Another Jupyter notebook that dissects sentiment analysis
○ Using social media data (tweets and online reviews)
○ With 9 questions to reflect on its details and output
● Break out groups
○ Open the notebook and discuss the questions (take 20 mins.)
○ Afterwards we discuss your observations and your own questions
Word Embedding Models
● Concepts:
○ N-grams, skipgrams, distributional semantics
○ Semantic vs. syntactic similarity (related to the size of the context window)
○ Generic vs. domain-specific models and text corpora
○ Pre-trained models, transfer learning
○ Corpus size
● See also (shameless self-promotion):
○ Wevers, M., & Koolen, M. (2020). Digital begriffsgeschichte: Tracing semantic change using word embeddings. Historical Methods: A Journal of Quantitative and Interdisciplinary History, 53(4), 226-243.
○ https://www.tandfonline.com/doi/pdf/10.1080/01615440.2020.1760157
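The notions of n-grams and context windows can be illustrated in a few lines of Python. The sketch below extracts n-grams and (target, context) pairs for a given window size from a tokenised sentence. The sentence is an arbitrary example, and the pair extraction only follows the general idea behind skip-gram style training; none of the actual learning is included.

```python
def ngrams(tokens, n):
    """All contiguous sequences of n tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def context_pairs(tokens, window):
    """(target, context) pairs for every token within `window` positions of the target."""
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        pairs.extend((target, tokens[j]) for j in range(start, end) if j != i)
    return pairs

tokens = "the ship sailed from texel to batavia".split()

bigrams = ngrams(tokens, 2)
narrow = context_pairs(tokens, window=1)
wide = context_pairs(tokens, window=3)

narrow_context = [c for t, c in narrow if t == "sailed"]
wide_context = [c for t, c in wide if t == "sailed"]

print(bigrams[:3])
print("window=1:", narrow_context)
print("window=3:", wide_context)
```

With a window of 1 the word "sailed" is only described by its direct neighbours, while a window of 3 also brings in "texel". The window size is therefore a parameter that shapes what kind of similarity a model can pick up, and it is often set by default without the user noticing.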
Machine Learning
● Finding patterns in data
○ But are they meaningful patterns?
○ Main point: separating regular features (signal) from ‘accidental’ features (noise) of a dataset
■ If I throw a 6-sided die 10 times, the average is probably close to 3.5 (regular/signal) but the particular sequence of sides is accidental (irregular/noise)
■ Many ‘regularities’ are artefacts introduced through selection (tweets from the last 24 hours may cover Sunday evening for one part of the world and Monday morning for another)
● Which regularities are relevant depends on your research question
○ But ML methods are oblivious to your research question and context
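The die example can be simulated. The sketch below throws a six-sided die 10 times and repeats that experiment 1000 times. The seed and the number of repetitions are arbitrary choices made for repeatability. The averages cluster around 3.5 (signal), while the exact sequences practically never repeat (noise).

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed, an arbitrary choice for repeatability

def throw_dice(n_throws=10):
    """One experiment: the sequence of sides from n_throws throws."""
    return tuple(rng.randint(1, 6) for _ in range(n_throws))

experiments = [throw_dice() for _ in range(1000)]
averages = [mean(sequence) for sequence in experiments]

overall = mean(averages)
share_close = sum(2.5 <= a <= 4.5 for a in averages) / len(averages)
distinct_sequences = len(set(experiments))

print(f"mean of the 1000 averages: {overall:.2f}")
print(f"share of averages between 2.5 and 4.5: {share_close:.2f}")
print(f"distinct sequences among 1000 experiments: {distinct_sequences}")
```

A model that memorises the sequences has learned the noise; a model that learns the average has learned the signal. Which of the two a given ML method picks up from your data is not something the method can tell you.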
Tweet Corpora
● Existing corpora
○ Kaggle sentiment140: https://www.kaggle.com/kazanova/sentiment140
○ GESIS TweetsCOV19: https://data.gesis.org/tweetscov19/
○ GateNLP BTC: https://github.com/GateNLP/broad_twitter_corpus
○ Disaster Tweet Corpus 2020: https://zenodo.org/record/3713920
● How were they constructed?
○ Multiple layers of selection:
■ Twitter API
■ Collection methods and period, queries and cleaning/filtering
● For what purpose were they collected?
○ How has that shaped their construction?
Tool Criticism Recommendations (From Journal Article)
● Analyze and discuss tools at the level of data transformations.
○ How do inputs and outputs differ?
○ What does this mean for interpreting the transformed data?
● Questions to ask about digital data:
○ Where do the data come from? Who made the data? Who made the data available? What selection criteria were used?
How is it organized? What preprocessing steps were used to make the data available? If digitized from analogue sources,
how does the digitized data differ from the analogue sources? Are all sources digitized or only selected materials? What
are known omissions/gaps in the data?
● Questions about digital tools:
○ Which tools are available and relevant for your research? Which tool best fits the method you want to use? How does the
tool fit the method you want to use? For which phase of your research is this tool suitable? What kind of tool is it? Who
made the tool, when, why, and what for? How does the tool transform the data that it works upon? What are the potential
consequences of this?
● Questions about digital search tools:
○ What search strategies does the tool allow? What feedback about matching and non-matching documents does the tool
provide? What ways does the tool offer for sense-making and getting an overview of the data it gives access to?
● Questions about digital analysis tools:
○ What elements of the data does the tool allow you to analyze qualitatively or quantitatively? What ways of analyzing does
the tool offer, and what ways to contextualize your analysis?
References
Bron, M., Van Gorp, J., & De Rijke, M. (2016). Media studies research in the data‐driven age: How research questions evolve. Journal
of the Association for Information Science and Technology, 67(7), 1535-1554.
Churchman, C. W. (1967). Wicked problems. Management Science, 14(4), B141–B142.
Fickers, A. (2012). Towards a new digital historicism? Doing history in the age of abundance. VIEW Journal of European Television
History and Culture, 1(1), 19-26.
Hoekstra, R., & Koolen, M. (2019). Data scopes for digital history research. Historical Methods: A Journal of Quantitative and
Interdisciplinary History, 52(2), 79-94.
Koolen, M., Van Gorp, J., & Van Ossenbruggen, J. (2018). Toward a model for digital tool criticism: Reflection as integrative practice. Digital
Scholarship in the Humanities.
Schön, D. (1983). The reflective practitioner: How professionals think in action. New York: Basic Books.
Wevers, M., & Koolen, M. (2020). Digital begriffsgeschichte: Tracing semantic change using word embeddings. Historical Methods: A
Journal of Quantitative and Interdisciplinary History, 53(4), 226-243.

Tool criticism

  • 1. Tool Criticism Marijn Koolen (Huygens Institute for the History of the Netherlands) Tools & Methods guest lecture - 2021-03-02 - Groningen
  • 2. ● Python/Jupyter to do: ○ GIS, Plotting locations on Google Maps ○ Machine Learning, (un)supervised learning, visualisation techniques ○ Mining social media data ○ TF*IDF, information processing, Word embedding models ○ Statistics / JASP ● Questions: ○ Why interested? ○ What would you like do with this? Your Interests?
  • 3. Online Resources ● Online (note)books for DH and Python ● Generic Jupyter: ○ https://jupyter4edu.github.io/jupyter-edu-book/ ○ https://programminghistorian.org/en/lessons/jupyter-notebooks ● Specific methods and techniques ○ Cultural Analytics: https://melaniewalsh.github.io/Intro-Cultural-Analytics/welcome.html ■ Includes TF*IDF, Tweet mining and analysis, Geocoding ○ Named Entity Recognition: http://ner.pythonhumanities.com/intro.html ○ Deep Learning: https://course.fast.ai ○ NLP: Traditional and Deep Learning: https://www.fast.ai/2019/07/08/fastai-nlp/ ■ Includes Word Embeddings, sentiment analysis, topic modelling, classification, … ○ GLAM Workbench: https://glam-workbench.github.io ■ Retrieving and analysing data from Galleries, Libraries, Archives, Museums
  • 4.
  • 5. ● Starting point: (digital) source criticism ○ Method / approach in the humanities and specifically in historical research (cf. Fickers, 2012) ○ Internal source criticism: content of the document ○ External source criticism: metadata of the document (context) ■ Who created the document? ■ What kind of document is it? ■ Where was it made and distributed? ■ When was it made? ■ Why was it made? ● Digital Tool Criticism ○ What makes digital tool criticism different from digital source criticism? ○ Tool hermeneutics: what was its intended use? Does that align with my intended use? How does it affect the digital sources/data it operates on? Guiding Questions
  • 6. For researchers: - Incorporate digital source, data and tool criticism in research process - Explicitly ask and answer questions about assumptions, choices, limitations - Document and share workarounds - Look for “About” pages and documentation on - Functionalities, configurations, parameter choices - Selection criteria and transformations of data sets - Develop method of experimentation with tool to test functioning - Look under the hood to develop better intuitions, grow your conceptual toolbox - E.g. how can you test if a search engine filters stopwords or does linguistic normalization? Recommendations
  • 7. Model: Reflection as Integrative Practice Koolen, van Gorp & van Ossenbruggen, 2018. Toward a model for digital tool criticism: Reflection as integrative practice. In Digital Scholarship in the Humanities 2018. https://academic.oup.com/dsh/advance-article/doi/10.1093/llc/fqy048/5127711
  • 8. Role of Reflection ● Reflection In Action ○ Process is often unpredictable and uncertain (Schön 1983, p. 40) ○ Some actions, recognitions and judgements we carry out spontaneously, without thinking about them (p. 54) ○ Use reflection to criticize tacit understanding grown from repetitive experiences (p. 61) ● This fits certain aspects of scholarly practice ○ E.g. searching, browsing, selecting using various information systems (digital archives and libraries, catalogs and other databases). ○ But information systems already have pre-selection, rarely well-documented (digital source criticism!)
  • 9. Research Design as Wicked Problem ● Wicked problem ○ Design theory concept, a problem that is inherently ill-defined (Rittel in Churchman 1967) ○ Working towards a solution changes the nature of the problem ● Humanities research is designed iteratively (Bron et al. 2016) ○ Impossible to plan where the investigation takes you ○ Engagement with research materials shifts the goalposts ○ Affects appropriateness of design for RQ ● User-friendliness of digital tools exacerbates the problem ○ Graphical User Interfaces (GUIs) often hide relevant data transformations and manipulations ○ Difficult to look under the hood ○ Requires active reflective attitude
  • 10. Entanglement of Data and Tools
  • 11. Entanglement of Data and Tools Each step changes the underlying data!
  • 12. ● How to address tool criticism questions ○ Focus on research methods ● E.g. Social Network Analysis (SNA) ○ Understand concepts, techniques and applications of SNA before assessing SNA tools ○ How many of you have used SNA tools? How many of you want to use them? ○ Gephi or NetworkX (Python library) ● Before you ask... ○ Which layout algorithm should I use? ○ Which community detection algorithm should I use? What parameters are good? ● … understand core concepts: ○ nodes, edges, link degrees, paths, connected components, ○ Modularity, bridge, weak ties, ○ Completeness, impact of missing data Tools or Methods?
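A few of the core concepts listed above (nodes, edges, degree, connected components, bridge, impact of missing data) can be made concrete without any SNA tool. The sketch below uses plain Python rather than Gephi or NetworkX, and the edge list is a hypothetical correspondence network, not real data. It shows how dropping one edge (a bridge) changes what a tool would report about the network.

```python
from collections import defaultdict

def build_graph(edges):
    """Undirected graph as an adjacency dict: node -> set of neighbours."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def degrees(graph):
    """Degree of a node = number of edges attached to it."""
    return {node: len(neighbours) for node, neighbours in graph.items()}

def connected_components(graph):
    """Group nodes that can reach each other via some path."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components

# Hypothetical network: the edge C-D is a bridge between two tight groups.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("D", "F")]
full = build_graph(edges)
# Simulate missing data: the one letter linking C and D was not preserved.
partial = build_graph([e for e in edges if e != ("C", "D")])

print(degrees(full))
print(len(connected_components(full)), len(connected_components(partial)))
```

One missing edge is enough to split the network into two components, so any layout or community detection result depends on how complete the underlying data are.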
  • 14. ● Term Frequency * Inverse Document Frequency ○ Used in many methods and tools ○ What was TF*IDF originally intended for? ● Again, start from method ○ Natural Language Processing, Information Theory ○ Concepts: Zipf’s law, tokenisation, stopwords, stemming, lemmatisation, part-of-speech, mutual information TF*IDF
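To see what a tool does when it reports TF*IDF scores, it helps to compute them by hand once. The sketch below uses one common variant (raw term count multiplied by log(N / df)); actual tools often use other weightings and smoothing, so treat the formula, the whitespace tokeniser and the three example documents as assumptions made for illustration.

```python
import math
from collections import Counter

def tokenise(text):
    """Naive tokenisation: lowercase and split on whitespace (an assumption)."""
    return text.lower().split()

def tf_idf(documents):
    """Return one {term: score} dict per document, with score = tf * log(N / df)."""
    tokenised = [tokenise(doc) for doc in documents]
    n_docs = len(tokenised)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenised for term in set(tokens))
    scores = []
    for tokens in tokenised:
        tf = Counter(tokens)
        scores.append({term: count * math.log(n_docs / df[term])
                       for term, count in tf.items()})
    return scores

# Made-up example documents.
docs = [
    "the archive holds the letters",
    "the museum holds paintings",
    "the library holds the books and the maps",
]
scores = tf_idf(docs)
for doc, doc_scores in zip(docs, scores):
    top = max(doc_scores, key=doc_scores.get)
    print(f"{doc!r}: top term = {top!r}")
```

Terms that occur in every document ("the", "holds") get a score of zero in this variant no matter how often they are repeated, which is the behaviour a stopword list otherwise has to enforce by hand.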
  • 15. ● I’ve prepared a Jupyter notebook that demonstrates the workings of TF*IDF ○ Using social media data (tweets and online reviews) ○ With 7 questions to reflect on its details ● Break out groups ○ Open the notebook and discuss the questions (take 20 mins.) ○ Afterwards we discuss your observations and your own questions ○ Also, look at the Wikipedia page on TF*IDF: https://en.wikipedia.org/wiki/Tf-idf Hands On with TF*IDF
  • 16. ● Text Mining of tweets (and other short records) ○ Tweets are peculiar textual representations ■ Minimal amount of text, low redundancy ■ Majority of terms occur only once ○ Which part of TF*IDF contributes more to the TF*IDF score of a tweet? ○ Consequences for ranking/clustering/mining? Text Mining in Tweets
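The observation that most terms in a tweet occur only once can be checked directly. The sketch below counts, for each text, the share of distinct terms with a frequency of exactly one. The short messages and the longer text are made up for illustration, and whitespace tokenisation is an assumption.

```python
from collections import Counter

# Made-up short messages standing in for tweets (not real data).
tweets = [
    "new exhibition opens today at the museum",
    "loved the exhibition so much",
    "the archive is closed today",
]
# A longer made-up text with repetition, for contrast.
long_text = "the archive holds the letters and the archive holds the maps"

def share_of_singletons(text):
    """Fraction of distinct terms in a text that occur exactly once."""
    counts = Counter(text.lower().split())
    return sum(1 for c in counts.values() if c == 1) / len(counts)

tweet_shares = [share_of_singletons(t) for t in tweets]
long_share = share_of_singletons(long_text)
print(tweet_shares, long_share)
```

When every term in a message has a term frequency of 1, the TF factor is the same for all its terms, so differences between their TF*IDF scores can only come from the IDF part. That is worth keeping in mind when ranking or clustering short records.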
  • 17. ● Resources ○ NRC EmoLex: 8 basic emotions (https://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm) ○ LIWC: over 70 categories, incl. emotions (https://liwc.wpengine.com) ○ VADER: Valence Aware Dictionary and sEntiment Reasoner, a lexicon- and rule-based sentiment tool (https://github.com/cjhutto/vaderSentiment) ● Critical questions ○ How do they work? What are they intended to measure? For what text genres? ○ How reliable are they? What do they capture well? What are typical mistakes they make? ● Lessons from 20+ years of NLP research: ○ sentiment is domain-specific, nowadays aspect-based (reviews of hotels, restaurants and smartphones have their own vocabularies) ● ALWAYS combine quantitative with qualitative analysis! ○ They contextualise each other Sentiment Analysis and Emotion Lexicons
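The critical questions above ("How do they work? What are typical mistakes they make?") are easier to answer after building the simplest possible lexicon scorer yourself. The sketch below is a deliberately naive word-count approach with a hypothetical six-word lexicon. It is not how NRC EmoLex, LIWC or VADER are implemented (VADER, for instance, adds rules for negation and intensifiers), but it shows the kind of errors a plain lexicon lookup produces.

```python
# Hypothetical mini-lexicon; real resources are far larger and use their
# own categories and file formats.
LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "terrible": -1, "sick": -1}

def lexicon_score(text):
    """Sum the polarity of every token found in the lexicon."""
    cleaned = text.lower().replace(",", " ").replace(".", " ").replace("!", " ")
    return sum(LEXICON.get(token, 0) for token in cleaned.split())

examples = [
    "Great hotel, love the view",   # plain positive wording
    "The food was not good",        # negation is ignored by a bag-of-words lookup
    "That concert was sick!",       # slang: 'sick' used approvingly
]
example_scores = [lexicon_score(text) for text in examples]
for score, text in zip(example_scores, examples):
    print(score, text)
```

The second and third examples receive the wrong sign, which is exactly what a qualitative reading of a sample of scored texts would reveal.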
  • 18. Questions About Social Media Sentiment Mining ● Another Jupyter notebook that dissects sentiment analysis ○ Using social media data (tweets and online reviews) ○ With 9 questions to reflect on its details and output ● Break out groups ○ Open the notebook and discuss the questions (take 20 mins.) ○ Afterwards we discuss your observations and your own questions
  • 19. ● Concepts: ○ N-grams, skipgrams, distributional semantics ○ Semantic vs. syntactic similarity (related to the size of the context window) ○ Generic vs. domain-specific models and text corpora ○ Pre-trained models, transfer learning ○ Corpus size ● See also (shameless self-promotion): ○ Wevers, M., & Koolen, M. (2020). Digital begriffsgeschichte: Tracing semantic change using word embeddings. Historical Methods: A Journal of Quantitative and Interdisciplinary History, 53(4), 226-243. ○ https://www.tandfonline.com/doi/pdf/10.1080/01615440.2020.1760157 Word Embedding Models
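The role of the context window can be illustrated by listing which words count as "context" for a target word at different window sizes. The sketch below only generates (target, context) pairs of the kind a skip-gram style model is trained on. It does not train an embedding model, and the example sentence and the function name `context_pairs` are made up for illustration.

```python
def context_pairs(tokens, window):
    """Yield (target, context) pairs for every token within `window` positions."""
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                yield target, tokens[j]

tokens = "the old archive holds many letters".split()

# Context words seen for 'archive' with a narrow and a wide window.
narrow = [c for t, c in context_pairs(tokens, window=1) if t == "archive"]
wide = [c for t, c in context_pairs(tokens, window=3) if t == "archive"]
print(narrow)
print(wide)
```

With a window of 1 the target only sees its direct neighbours, which tends to emphasise how a word is used grammatically. A wider window pulls in more of the surrounding topic. Which setting a pre-trained model used is therefore part of its tool criticism.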
  • 20. ● Finding patterns in data ○ But are they meaningful patterns? ○ Main point: separating regular features (signal) from ‘accidental’ features (noise) of a dataset ■ If I throw a 6-sided die 10 times, the average is probably close to 3.5 (regular/signal) but the particular sequence of sides is accidental (irregular/noise) ■ Many ‘regularities’ are artefacts introduced through selection (tweets from the last 24 hours may cover Sunday evening for one part of the world and Monday morning for another) ● Which regularities are relevant depends on your research question ○ But ML methods are oblivious to your research question and context Machine Learning
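The die example can be run as a small simulation. The sketch below assumes a fair die and uses Python's standard `random` module with fixed seeds so that the runs are repeatable; the seeds and the number of throws are arbitrary choices. For a fair die the standard deviation of a single throw is about 1.71, so the average of only 10 throws can still easily land half a point away from 3.5.

```python
import random

def throw_dice(n_throws, seed):
    """Simulate n_throws of a fair six-sided die with a fixed seed."""
    rng = random.Random(seed)
    return [rng.randint(1, 6) for _ in range(n_throws)]

# Several runs of 10 throws: the sequences differ, the averages vary around 3.5.
for seed in range(3):
    throws = throw_dice(10, seed)
    print(throws, sum(throws) / len(throws))

# With many throws the average settles near the expected value of 3.5.
many = throw_dice(100_000, seed=42)
mean_many = sum(many) / len(many)
print(round(mean_many, 2))
```

The averages are the regular feature that carries over between runs. The individual sequences are not, and a pattern-finding method applied to a single run has no way of knowing the difference.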
  • 21. Tweet Corpora ● Existing corpora ○ Kaggle sentiment140: https://www.kaggle.com/kazanova/sentiment140 ○ GESIS TweetsCOV19: https://data.gesis.org/tweetscov19/ ○ GateNLP BTC: https://github.com/GateNLP/broad_twitter_corpus ○ Disaster Tweet Corpus 2020: https://zenodo.org/record/3713920 ● How were they constructed? ○ Multiple layers of selection: ■ Twitter API ■ Collection methods and period, queries and cleaning/filtering ● For what purpose were they collected? ○ How has that shaped their construction?
  • 22. Tool Criticism Recommendations (From Journal Article) ● Analyze and discuss tools at the level of data transformations. ○ How do inputs and outputs differ? ○ What does this mean for interpreting the transformed data? ● Questions to ask about digital data: ○ Where do the data come from? Who made the data? Who made the data available? What selection criteria were used? How is it organized? What preprocessing steps were used to make the data available? If digitized from analogue sources, how does the digitized data differ from the analogue sources? Are all sources digitized or only selected materials? What are known omissions/gaps in the data? ● Questions about digital tools: ○ Which tools are available and relevant for your research? Which tool best fits the method you want to use? How does the tool fit the method you want to use? For which phase of your research is this tool suitable? What kind of tool is it? Who made the tool, when, why, and what for? How does the tool transform the data that it works upon? What are the potential consequences of this? ● Questions about digital search tools: ○ What search strategies does the tool allow? What feedback about matching and non-matching documents does the tool provide? What ways does the tool offer for sense-making and getting an overview of the data it gives access to? ● Questions about digital analysis tools: ○ What elements of the data does the tool allow you to analyze qualitatively or quantitatively? What ways of analyzing does the tool offer, and what ways to contextualize your analysis?
  • 23. References Bron, M., Van Gorp, J., & De Rijke, M. (2016). Media studies research in the data‐driven age: How research questions evolve. Journal of the Association for Information Science and Technology, 67(7), 1535-1554. Churchman, C. W. (1967). Wicked problems. Management Science, 14(4), B141-B142. Fickers, A. (2012). Towards a new digital historicism? Doing history in the age of abundance. VIEW Journal of European Television History and Culture, 1(1), 19-26. Hoekstra, R., & Koolen, M. (2019). Data scopes for digital history research. Historical Methods: A Journal of Quantitative and Interdisciplinary History, 52(2), 79-94. Koolen, M., Van Gorp, J., & Van Ossenbruggen, J. (2018). Toward a model for digital tool criticism: Reflection as integrative practice. Digital Scholarship in the Humanities. Schön, D. (1983). The reflective practitioner: How professionals think in action. New York: Basic Books. Wevers, M., & Koolen, M. (2020). Digital begriffsgeschichte: Tracing semantic change using word embeddings. Historical Methods: A Journal of Quantitative and Interdisciplinary History, 53(4), 226-243.