1. Toine Pieters and Jaap Verheul
Utrecht University, The Netherlands
Texcavator
Text Mining Historical Newspapers
2. Overview
Translantis research project
Concept of reference cultures
Digital humanities
Texcavator tool
Requirements
Features
Configuration
Texcavator use cases
Future ambitions
Challenges
Cultural Text Mining
KB Big Data Conference 24 March 2015
3. Translantis.nl
Translantis research project
4. Translantis
Topic: emergence of the United States in Public Discourse in the
Netherlands, 1890-1990
Concept: transnational reference cultures
Method: digital humanities text mining
Translantis.nl
6. Translantis.nl
Texcavator
7. Texcavator
generic tool for cultural text mining and big data
research
enables scholars to systematically search very large
quantities of textual data in a reliable and reproducible
way
able to support exploration and contextualization
serve multiple user groups
Wide community of historians using big data
Translantis team (NWO-funded)
Asymmetrical Encounters team (HERA-funded)
8. Features
Direct access to big data repository
Integrated text-mining tools
Boolean search
Named Entity Recognition
Sentiment mining
Stemming
Real-time visualization of search results
Dynamic word clouds (and export of underlying data)
Timelines (normalized, bursts)
Input-output storage
Close and distant reading
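As an illustration of what a normalized timeline could mean, the sketch below divides the number of matching articles per year by the total number of articles published that year, so that growth of the archive itself does not show up as a trend. This is a minimal sketch under assumptions, not Texcavator's actual implementation: the function name `normalized_timeline` and the toy data are hypothetical.

```python
from collections import Counter


def normalized_timeline(hit_years, corpus_years):
    """Relative frequency per year: matching articles / all articles that year.

    Hypothetical helper for illustration; not part of Texcavator.
    """
    hits = Counter(hit_years)
    totals = Counter(corpus_years)
    return {year: hits[year] / totals[year] for year in sorted(totals)}


# Toy data (invented): publication year of every article in the corpus,
# and of every article that matched a query.
corpus_years = [1920, 1920, 1920, 1920, 1921, 1921]
hit_years = [1920, 1921]

timeline = normalized_timeline(hit_years, corpus_years)
print(timeline)
```

Both years have one hit in absolute terms, but the normalized values differ because 1921 has fewer articles overall.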
20. Taylorism
Voyant word cloud of the
"wetenschappelijke bedrijfsleiding"
(scientific management) dataset
References over time within the
"wetenschappelijke bedrijfsleiding" dataset
to "Taylor", "taylor-stelsel", "Taylor-systeem"
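Counting references to several spelling variants over time can be sketched roughly as follows. This is an assumption-laden illustration, not the method used for the slide: the regular expression, the function `references_per_year`, and the sample sentences are all hypothetical.

```python
import re
from collections import Counter

# Matches "Taylor", "taylor-stelsel" and "Taylor-systeem", case-insensitively.
TAYLOR = re.compile(r"\btaylor(?:-(?:stelsel|systeem))?\b", re.IGNORECASE)


def references_per_year(articles):
    """Sum the number of variant matches per publication year.

    `articles` is an iterable of (year, text) pairs (hypothetical format).
    """
    counts = Counter()
    for year, text in articles:
        counts[year] += len(TAYLOR.findall(text))
    return dict(sorted(counts.items()))


# Invented sample sentences, for illustration only.
articles = [
    (1915, "Het Taylor-systeem wint terrein."),
    (1915, "Taylor en het taylor-stelsel."),
    (1920, "Geen verwijzing hier."),
]

refs = references_per_year(articles)
print(refs)
```

In practice OCR errors would produce further variants, so a pattern this strict would likely undercount.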
21. Challenges & Opportunities
Ambitions
22. Challenges
Software development
Stable version of Texcavator
Intuitive interface
Additional features
Technological
Processor and server capacity
Data exchange and standardization (metatags)
OCR
Scientific
Combining close and distant reading
Reproducibility
23. Cultural Text Mining
Mining of cultural aspects of entities and events
Concepts, mentalities, ideas, utopias, etc.
Mining for Meaning
Towards digital conceptual history or digital history of
mentalities
Address macro-historical questions:
Trends, patterns, structures in debates
Circulation of knowledge
Emergence of transnational reference cultures