Mapping Commodity Trading in the 19th Century
Benjamin Bach, INRIA, Paris
Asma Malik, University of Strathclyde, Glasgow
Michael Mauderer, University of St Andrews
Sadiq Sani, Robert Gordon University, Aberdeen
Joe Wandy, University of Glasgow
Outline
● Project Overview
● Data
● Technology
● Demo
● Future Work
Overview
19th century: commodities, diseases, locations, disasters
Process
Tasks
● Retrieve documents mentioning
○ Commodities
○ Locations
○ Time range
● Relations between retrieved terms
○ Spatial relations
○ Temporal relations
○ Co-occurrence relations
Users: Historians
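The retrieval task (documents mentioning given commodities and locations within a time range) can be sketched as a filter over documents. This is a minimal in-memory illustration, assuming a hypothetical `Doc` record that carries one year per document plus the sets of commodity and location terms it mentions; the project kept its data in PostgreSQL and the slides do not show its schema.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// HYPOTHETICAL model: one record per document with the year it is dated to
// and the commodity and location terms it mentions (illustrative names only).
record Doc(int id, int year, Set<String> commodities, Set<String> locations) {}

final class DocumentFilter {
    private DocumentFilter() {}

    // Ids of documents dated within [fromYear, toYear] (inclusive) that
    // mention both the given commodity and the given location.
    static List<Integer> retrieve(List<Doc> docs, String commodity,
                                  String location, int fromYear, int toYear) {
        return docs.stream()
                .filter(d -> d.year() >= fromYear && d.year() <= toYear)
                .filter(d -> d.commodities().contains(commodity))
                .filter(d -> d.locations().contains(location))
                .map(Doc::id)
                .collect(Collectors.toList());
    }
}
```

At the scale of this corpus the same filter would run as a database query rather than in memory; the sketch only shows the logic of combining the three criteria.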
Data
● Commodities: 1067
● Time: 1600 - 1952 (352 years)
● Documents: 18 580
● Location occurrences: 91 650 469
● Commodity occurrences: 29 020 013
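For a rough sense of scale, dividing the figures above gives the following averages. These are simple means over the slide's totals; the slides do not say how occurrences are distributed across documents, so individual documents may differ widely.

```latex
\begin{aligned}
\text{time span} &= 1952 - 1600 = 352 \text{ years}\\
\frac{91\,650\,469 \text{ location occurrences}}{18\,580 \text{ documents}} &\approx 4\,933 \text{ per document}\\
\frac{29\,020\,013 \text{ commodity occurrences}}{18\,580 \text{ documents}} &\approx 1\,562 \text{ per document}
\end{aligned}
```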
The Data
● PostgreSQL Database in Edinburgh
○ Not accessible
● PostgreSQL Database in St Andrews
○ Low performance
● PostgreSQL Database Backup
○ 2.5GB compressed binary data
○ Cannot be imported into Amazon RDS
Solution 1
● Create a more compatible SQL export to import into Amazon RDS
○ 24GB raw text file containing SQL statements
○ Still incompatible
○ Errors hard to correct in a timely manner
Solution 2
● Create an EC2 instance running a PostgreSQL database
○ Powerful enough
○ Enough storage
○ Accessible
Big Data Problems
● Simple things take a long time
● Errors and new problems only surface incrementally
The Pipeline
● D3 for client-side presentation
● Java+SQL for server-side processing
● Flow: Database ↔ Web Service ↔ Client (request: commodities, date range; response: data)
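The client sends commodities and a date range to the Java web service, which queries the database. A minimal sketch of that request step, assuming a hypothetical `CommodityQuery` record and a hypothetical table `commodity_mention(doc_id, commodity, year)`; neither name comes from the project. The query is built with `?` placeholders so the values can be bound through JDBC (for example `PreparedStatement.setObject`) instead of being concatenated into the SQL text.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// HYPOTHETICAL request object: the commodities and date range sent by the client.
record CommodityQuery(List<String> commodities, int fromYear, int toYear) {
    CommodityQuery {
        commodities = List.copyOf(commodities); // defensive, immutable copy
        if (commodities.isEmpty()) {
            throw new IllegalArgumentException("at least one commodity is required");
        }
        if (fromYear > toYear) {
            throw new IllegalArgumentException("fromYear must not be after toYear");
        }
    }

    // Parameterized SQL against the HYPOTHETICAL table commodity_mention;
    // one "?" per commodity, then two for the inclusive year range.
    String toSql() {
        String placeholders = String.join(", ", Collections.nCopies(commodities.size(), "?"));
        return "SELECT doc_id, commodity, year FROM commodity_mention"
                + " WHERE commodity IN (" + placeholders + ")"
                + " AND year BETWEEN ? AND ?"
                + " ORDER BY year";
    }

    // Values to bind, in the same order as the placeholders in toSql().
    List<Object> parameters() {
        List<Object> params = new ArrayList<>(commodities);
        params.add(fromYear);
        params.add(toYear);
        return params;
    }
}
```

Validating the range in the record's constructor means a malformed request is rejected before any database work starts, which matters when, as the slides note, simple queries already take a long time.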
Initial Sketches
Visualization
- Space and time
  -> Finding related terms and documents
- Find related documents
- What are the documents talking about?
- Implicit knowledge:
  - Co-occurrences of terms in documents
For every commodity:
1) Get the top 10 documents
2) Limit related terms to 6
3) Sum up co-occurrences
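The three per-commodity steps can be sketched as follows. This is one reading of the slide, not the project's code: it ASSUMES "top" documents are those mentioning the commodity most often, and it applies the 6-term limit after summing, keeping the terms with the largest co-occurrence totals. `DocTerms` and `CoOccurrence` are hypothetical names.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// HYPOTHETICAL per-document term counts (term -> occurrences in that document).
record DocTerms(int docId, Map<String, Integer> termCounts) {}

final class CoOccurrence {
    private CoOccurrence() {}

    static Map<String, Integer> relatedTerms(String commodity, List<DocTerms> docs,
                                             int topDocs, int maxTerms) {
        // Top documents: those mentioning the commodity most often (an assumption).
        List<DocTerms> top = docs.stream()
                .filter(d -> d.termCounts().getOrDefault(commodity, 0) > 0)
                .sorted(Comparator.comparingInt(
                        (DocTerms d) -> d.termCounts().get(commodity)).reversed())
                .limit(topDocs)
                .collect(Collectors.toList());

        // Sum co-occurrences: total count of every other term over those documents.
        Map<String, Integer> sums = new HashMap<>();
        for (DocTerms d : top) {
            d.termCounts().forEach((term, n) -> {
                if (!term.equals(commodity)) {
                    sums.merge(term, n, Integer::sum);
                }
            });
        }

        // Limit related terms: keep the maxTerms largest sums, ties broken by term.
        return sums.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.comparingByKey()))
                .limit(maxTerms)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }
}
```

With the slide's parameters this would be called as `relatedTerms(commodity, docs, 10, 6)`; the returned `LinkedHashMap` keeps the terms in descending order of co-occurrence, which is convenient for laying out matrix rows or columns.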
Demo
Future work
- Query by location
- Time diagrams for term frequency over time
- Encode information in matrix cells (number of documents, collection, ...)
- Show and browse documents
- Handle big data: diseases, disasters, ...
- Co-occurrences?
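The "time diagrams for term frequency over time" item needs, at its core, occurrence counts binned by period. A minimal sketch, assuming a hypothetical `Mention` record with one (term, year) pair per occurrence and decade-sized bins; the project's actual data model and binning are not specified.

```java
import java.util.List;
import java.util.TreeMap;

// HYPOTHETICAL input: one (term, year) pair per occurrence in the corpus.
record Mention(String term, int year) {}

final class TermTimeline {
    private TermTimeline() {}

    // Counts occurrences of `term` per bin of `binYears` years, keyed by the
    // first year of the bin (e.g. 1850 covers 1850-1859 when binYears = 10).
    // TreeMap keeps the bins in chronological order for plotting.
    static TreeMap<Integer, Integer> frequency(List<Mention> mentions, String term, int binYears) {
        TreeMap<Integer, Integer> bins = new TreeMap<>();
        for (Mention m : mentions) {
            if (m.term().equals(term)) {
                int start = Math.floorDiv(m.year(), binYears) * binYears;
                bins.merge(start, 1, Integer::sum);
            }
        }
        return bins;
    }
}
```

Bins with no occurrences are simply absent from the map, so a chart would need to fill gaps with zero across the 1600 - 1952 range.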
Thank you for listening!