Appropriate Technology
Across the Data Pipeline:
Toolchains and Technologies
Tony Hirst
Computing and Communications
The Open University
@psychemedia
blog.ouseful.info
Vicky Hugheston / flickr/vicky_dom – “Craft”
morph.io
tabula.technology
openrefine.org
datascientistworkbench.com [IBM]
Kitematic (Docker Toolbox)
Docker & docker-compose
Jupyter Notebooks (try.jupyter.org)
Gors appropriate

Speaker notes

  1. “Appropri-ut” in the sense of not inappropriate, the “right thing” to use, as well as “appropri-ate”, as in co-opt, or use for something it perhaps wasn’t originally intended for.
  2. So, for example, one thing I do is appropriate openly licensed media resources for my own slides. In this case, I want to set the scene for this presentation as one in which I haven’t been afraid to get my hands dirty, and in which I have also played with and explored a particular medium – in this case, various digital technologies – and created my own things which may also, ultimately, be of direct use to others. You might also say they’re at best half-baked, if not completely unbaked ;-)
  3. The tools I’m going to talk about are situated within a data context. I spend a lot of time playing with openly licensed datasets, working across the whole data pipeline. This example, taken from the OU course TM351 “Data Analysis and Management” (equivalent to a third-year undergraduate course), provides a simplistic view of some of the processes involved in working with data. (We all know it’s not quite that straightforward, and that it often involves a lot of iteration or backtracking. But as well as “The role of the academic [making] everything less simple”, as Mary Beard put it in an Observer interview a few weeks ago, the academic also simplifies and idealises through abstraction and revisionist storytelling, particularly when it comes to describing processes.) So what I plan to do is spend a few minutes showing you some of the tools and emerging approaches I use when working across the various steps of this pipeline.
  4. So, the first thing to note is that I’m a technology optimist: I believe technology can help make our lives simpler, even if at first it may look as if we are making them more complex by introducing yet more tools to learn, and to install on computers that our IT department would rather we left under their control. Taking control of your computing destiny is another theme of this talk… In this example, the box diagram I showed on the first line was /written/ rather than drawn. If I want to add steps, or add sub-branches to the diagram, I don’t need to start faffing around in PowerPoint or Word figures trying to line things up and get them sized right and so on. I let the machine do it. This particular online tool is blockdiag (you can see the URL in the screenshot at the top of the slide; I’ll pop a copy of the annotated slides online, and also let Alan have a copy), and there are other diagram types available in it. The underlying code is also open source and available as a Python package, so you can write diagrams such as these in a Jupyter notebook, for example. I’ll have more to say about Jupyter notebooks later.
  5. One other point to note, and a bit of blatant self-promotion here: most of the individual slides within this talk are backed up by one or more posts on my personal blog, Ouseful.info. I’ve been writing this blog for many years and it represents a reasonably complete notebook of a lot of the ideas I’ve explored over that time. In many cases, the posts are comprehensive and self-contained: they record all the steps I took to do something in case I need to remind myself later.
  6. So, the pipeline. The first step, acquisition, relates to how we get hold of data. This may be from downloaded data files such as Excel spreadsheet documents (which are actually zip files: you know you can change the xlsx suffix to zip and unzip them, right? Same with docx Word document files and pptx PowerPoint files), from databases, or from online APIs (application programming interfaces), but it may also be scraped from other sorts of document: web pages, for example, or PDF documents (even though PDF documents are horrible, it’s often quite easy to extract data tables from them). I’m not going to talk about the mechanics of scraping, but journalism lecturer Paul Bradshaw has a good intro to a variety of tools and techniques in his Leanpub book “Scraping for Journalists”.
  7. I will briefly mention a couple of tools I use, though. morph.io is a site hosted by an Australian open data group, and is actually a fork of a tool by the Liverpool-based UK start-up ScraperWiki. Morph.io will run a scraper of your own writing, hosted on GitHub, once a day and pop the results into a SQLite database that you can download. The slide shows a scraper I use for scraping licence applications made to the Isle of Wight council.
  8. Another tool I use a lot is Tabula. Tabula is a Java application with a browser-based user interface that will extract data tables from PDF documents. You simply drag to select the area of the page you want to scrape (you can mirror the same area over multiple pages or define different areas on each).
  9. The heart of the application is actually a command-line engine, recently wrapped by the R tabulizer package. This means you can automate the use of Tabula to scrape tabular data from PDF documents within R, getting the data back as an R data frame. That’s tabulizer: very nice, and the developer (on GitHub) is quite responsive.
  10. Another tool I use from time to time is Apache Tika – this can extract text from PDFs, Word documents and so on, as well as from images. There are quite a few online OCR services now, many of them appearing as part of “AI toolsets”, offering a range of commodity AI API services – IBM, Microsoft and Google all have them, for example. So as well as OCR text extraction, they do face and emotion detection in images, semantic tagging / entity labeling within documents, automatic image tagging, speech to text, and so on. All with varying degrees of success. But all of them steadily improving.
  11. After data acquisition, we’re often faced with cleaning a dataset. A tool I use for cleaning data is another Java application, again accessed via a browser, called OpenRefine. OpenRefine will open a wide range of document types (spreadsheets, CSV or tab-separated data files, XML, JSON, HTML), either locally or from the web, and presents the data in a spreadsheet-style UI. A wide range of options are provided for applying a particular transformation to each cell in a particular column (you can also script your own in OpenRefine’s expression language, GREL, or in Python), as well as tools for faceting and filtering the display of rows based on values within one or more columns. The clustering tools are useful for finding and correcting partial matches: for example, you can normalise MyCo Ltd, MyCo Ltd. and MyCo Limited to a single form.
  12. OpenRefine can also provide support for a limited range of data reshaping actions. I’ve described a few of them in this post, which takes a messy local election results data set and shows how to clean and reshape it. OpenRefine also has a templated export – so we can generate simple ‘line at a time’ reports from a filtered dataset.
  13. One of the things I look for in applications is whether they are open source and whether they provide a browser-based UI: if you can use it via a browser, you should be able to use it on your own local machine or as a remotely hosted version accessed over the web. OpenRefine meets both these criteria, which means it’s no problem for someone like IBM to make it available via their DataScientistWorkbench site. (It’s also not too hard to roll your own version of something like this site.) The other tools currently provided by this site are RStudio, a powerful and friendly IDE for the R programming language, and Jupyter notebooks.
  14. One reason why it’s getting easier to expose these applications over the web in a scalable way is containerisation. Containerisation is a form of application virtualisation where one or more applications can be wired together and isolated from each other within a multi-tenanted virtual machine. Docker containers offer the promise of being able to “run anywhere”, or at least anywhere the container platform can operate. Docker is the most popular route to this at the moment. The application shown here is called Kitematic. It lets you search for public application containers, download them and run them locally on your own computer. The example shows various containers I’ve put together for OpenRefine (some are different versions, others are experiments / demos I really should delete). So rather than install Java on your computer and then download and install OpenRefine, you can just click once in Kitematic and it will fetch a prepackaged OpenRefine container that includes everything OpenRefine needs to run.
  15. One of the spin-offs from the early days of OpenRefine was the notion of a “reconciliation service”, whereby you could look up each item in an OpenRefine column against a web service that would try to match it to (reconcile it with) a known entity: essentially, a partial / fuzzy matching lookup against a controlled vocabulary. OpenCorporates, the open data international company lookup service, offers a reconciliation endpoint. It’s easy enough to package up your own lookup tables, and this recipe describes how to do it using a homebrewed reconciliation container. I did ones for MPs, for example.
  16. Just as an aside, when putting together reconciliation services, we ideally want a canonical list of the entities or entity names we want to reconcile against. Registers can be a good source of these. But it’s also worth noting that registers can be used to generate derived datasets. For example, I wanted a list of UK prisons with location information. In the absence of a single openly licensed dataset with this information (a website with one prison per page was the closest I found, which I could have scraped but chose not to), I instead did a lookup via the Food Standards Agency, which has inspection information for public food outlets. (Another source might have been the CQC, with a search for health surgeries or dental treatment centres, filtered by “HMP” or “prison”.)
  17. RStudio is another application that can be freely redistributed and exposed via a browser. These posts show how to run an RStudio application in the cloud using a simple container management dashboard formerly known as Tutum, now available as Docker Cloud. I’ve also described how to package a Shiny application in a container so you can deploy it anywhere. Does anyone use Shiny? Shiny is a rapid prototyping tool for building browser-based, interactive HTML5 applications and dashboards, and it makes it relatively easy to build interactive data exploration tools on top of an R environment. (RStudio released a new dashboarding framework over the last couple of weeks.)
  18. One really nice component of the Docker ecosystem is docker-compose, formerly known as fig, which allows you to orchestrate the launch of several interlinked containers so that you can easily access one from another. The example here shows how to link RStudio and Jupyter notebooks to a neo4j database.
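To give a flavour of what such a composition looks like, the sketch below builds a compose specification as a Python dict and prints it as JSON. It assumes a current Compose setup, where services on the default network can reach each other by service name (so notebook code could connect to `bolt://neo4j:7687`), rather than the older fig-style `links:`. It also relies on JSON being parseable as YAML, so the output could be saved as, say, `docker-compose.json` and passed with `-f`. The image names are public images but the choice of them, the ports and the passwords are assumptions for illustration.

```python
import json

def compose_spec():
    """A compose spec linking RStudio and Jupyter containers to a neo4j database."""
    return {
        "services": {
            "neo4j": {
                "image": "neo4j",
                "ports": ["7474:7474", "7687:7687"],  # browser UI and bolt protocol
                # placeholder credentials, in neo4j's user/password format
                "environment": {"NEO4J_AUTH": "neo4j/changeme-password"},
            },
            "rstudio": {
                "image": "rocker/rstudio",
                "ports": ["8787:8787"],
                "environment": {"PASSWORD": "changeme"},  # placeholder login password
                "depends_on": ["neo4j"],  # start the database first
            },
            "jupyter": {
                "image": "jupyter/scipy-notebook",
                "ports": ["8888:8888"],
                "depends_on": ["neo4j"],
            },
        }
    }

# Print the spec; it could be written to a file and launched with docker compose -f.
print(json.dumps(compose_spec(), indent=2))
```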
  19. I’ve mentioned Jupyter a few times – does anyone use Jupyter notebooks? IPython notebooks? The browser-based notebook UI lets you enter text (as markdown) and executable code (in a variety of languages), then run the code and display the results back in the notebook. One thing I’ve been exploring recently is a way of calling command line applications packaged in a container from a notebook cell, and returning the output of the containerised command as a shared file. This post describes how I packaged the ContentMine tools – a set of tools for harvesting scientific journal papers and extracting knowledge from them, which are a real pain to set up normally – and then used them via a notebook.
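One way to do this from a notebook cell is to shell out to `docker run`, mounting a host directory into the container so that whatever the tool writes there shows up as an ordinary file on the host. A minimal sketch follows. It uses the real `docker run` flags `--rm` (remove the container afterwards), `-v` (bind-mount a volume) and `-w` (set the working directory). The image name, tool command and output filename in the usage line are hypothetical, and this is not necessarily how the ContentMine packaging described in the post was wired up.

```python
import subprocess
from pathlib import Path

def docker_run_command(image, args, host_dir, container_dir="/data"):
    """Build a `docker run` command that shares host_dir with the container."""
    host = str(Path(host_dir).resolve())
    return [
        "docker", "run", "--rm",          # throw the container away when done
        "-v", f"{host}:{container_dir}",  # shared directory for input/output files
        "-w", container_dir,              # run the tool inside the shared directory
        image, *args,
    ]

def run_in_container(image, args, host_dir, output_name):
    """Run the containerised command and return the path of the file it wrote."""
    subprocess.run(docker_run_command(image, args, host_dir), check=True)
    return Path(host_dir) / output_name

# Hypothetical usage from a notebook cell (needs Docker and a suitable image):
# out = run_in_container("example/cli-tool", ["mytool", "--out", "result.txt"], ".", "result.txt")
# print(out.read_text())
```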
  20. Just by the by, if you want to try the notebooks out, there’s a live demo available. (I also did a post on “Seven Ways to Run Jupyter Notebooks” which describes several alternative ways of running them.) The code example here shows all the code needed to open an Excel file containing average travel times to GP surgeries by LSOA, filter the data down to a particular local authority area, pull in an openly licensed GeoJSON boundary file for that area, and then plot (and embed) an interactive choropleth map via the folium Python package (using Google Maps tiles, I think, though it may be OpenStreetMap?).
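The filter-and-join step at the heart of that example can be sketched with the standard library alone. In the notebook version, pandas (to read the Excel file) and a folium choropleth layer would do this work; here a small in-memory CSV stands in for the spreadsheet. The column names, LSOA codes and travel times below are all made up for illustration.

```python
import csv
import io

# Hypothetical extract; column names, codes and values are illustrative only.
RAW = """lsoa_code,la_name,gp_travel_mins
LSOA-001,Isle of Wight,12.5
LSOA-002,Isle of Wight,18.0
LSOA-003,Southampton,9.5
"""

def values_for_authority(text, authority):
    """Map LSOA code -> travel time for the rows belonging to one local authority."""
    reader = csv.DictReader(io.StringIO(text))
    return {
        row["lsoa_code"]: float(row["gp_travel_mins"])
        for row in reader
        if row["la_name"] == authority
    }

def attach_values(geojson, values, code_prop="lsoa_code", value_prop="gp_travel_mins"):
    """Copy each area's value onto the matching GeoJSON feature (None if missing)."""
    for feature in geojson["features"]:
        code = feature["properties"].get(code_prop)
        feature["properties"][value_prop] = values.get(code)
    return geojson

values = values_for_authority(RAW, "Isle of Wight")

# Hypothetical boundary file with geometry omitted for brevity.
boundaries = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None, "properties": {"lsoa_code": "LSOA-001"}},
        {"type": "Feature", "geometry": None, "properties": {"lsoa_code": "LSOA-002"}},
    ],
}
attach_values(boundaries, values)
print(values)
```

Once each feature carries its value, the boundaries are ready to be shaded by a choropleth layer.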
  21. One problem with producing interactive maps is that sometimes you actually want an image. It turns out that web testing frameworks like Selenium make it easy to grab screenshots of pages rendered in a test browser, so I co-opted the idea to produce a routine that lets me grab a PNG snapshot of a map.
  22. That example was actually created for a side project I dabbled with, with our hyperlocal news outlet on the Isle of Wight, OnTheWight. OnTheWight have been reporting monthly job figures for years, so I thought I’d have a go at automating the production of the reports from nomis data, as well as producing a few charts. The report is largely a literal reporting of the figures, although I do try to add some colour and a tiny amount of analysis, for example by using directional and magnitude terms – “the numbers went UP SLIGHTLY from last month, although they are SIGNIFICANTLY DOWN from the same time last year”. And so on.
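The directional and magnitude terms can be produced by bucketing the relative change between two figures. A minimal sketch follows. The 2% and 10% thresholds are arbitrary assumptions, not the cut-offs used in the OnTheWight reports, and the choice between “and” and “although” simply checks whether the two changes point in opposite directions.

```python
def describe_change(previous, current, slight=0.02, significant=0.10):
    """Turn the change between two figures into a direction + magnitude phrase."""
    if previous == 0:
        raise ValueError("previous value must be non-zero to compute a relative change")
    rel = (current - previous) / abs(previous)
    if rel == 0:
        return "UNCHANGED"
    direction = "UP" if rel > 0 else "DOWN"
    size = abs(rel)
    if size < slight:
        return f"{direction} SLIGHTLY"
    if size < significant:
        return direction
    return f"SIGNIFICANTLY {direction}"

def sign(previous, current):
    """+1 if the figure rose, -1 if it fell, 0 if it stayed the same."""
    return (current > previous) - (current < previous)

def monthly_sentence(last_month, this_month, last_year):
    month_phrase = describe_change(last_month, this_month)
    year_phrase = describe_change(last_year, this_month)
    # Use "although" only when the monthly and annual changes point in opposite directions.
    opposite = sign(last_month, this_month) * sign(last_year, this_month) < 0
    connector = "although" if opposite else "and"
    return (f"The numbers are {month_phrase} from last month, "
            f"{connector} they are {year_phrase} from the same time last year.")

print(monthly_sentence(last_month=1000, this_month=1010, last_year=1200))
```

With those inputs the monthly change is +1% and the annual change about -16%, which gives the kind of sentence quoted above.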
  23. On my own site, I started trying to pull out some geographical insight, automatically reporting on areas with noticeably high unemployment, broken down by gender, compared to other areas. The map does look like a population map, but the unemployment rate is actually higher in some of the more heavily populated areas!
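One simple way of deciding that an area is “noticeably high” is to flag any area whose rate sits more than some number of standard deviations above the mean across all areas, and to do this separately for each group (gender, say). The sketch below does exactly that. The 1.5 standard deviation threshold and the example rates are assumptions, and this is not necessarily the rule used for the reports on my site. With only a handful of areas the achievable z-scores are limited, so the threshold needs some care.

```python
import statistics

def noticeably_high(rates, k=1.5):
    """Areas whose rate is more than k population standard deviations above the mean."""
    values = list(rates.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []  # every area has the same rate, so nothing stands out
    return sorted(area for area, rate in rates.items() if (rate - mean) / sd > k)

def noticeably_high_by_group(rates_by_group, k=1.5):
    """Apply the same test separately within each group (e.g. male / female)."""
    return {group: noticeably_high(rates, k) for group, rates in rates_by_group.items()}

# Hypothetical unemployment rates (%) by area, split by gender.
example = {
    "male": {"A": 2.0, "B": 2.1, "C": 1.9, "D": 2.0, "E": 5.0},
    "female": {"A": 3.0, "B": 3.0},
}
print(noticeably_high_by_group(example))
```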
  24. Just a side note – the idea of being able to build something once and then deploy it more widely for no extra effort really appeals to me. In the case of national datasets broken down to local level, building a solution for a local area you know about and understand helps get you started on automatically detecting and pulling out stories or features – but the same code can then run for other areas.
  25. The pain points often come in splitting the data down to local areas and then generating the stories.
  26. But if you automate a pain point away for one local area, you’ve solved the problem for all of them. The approach I’ve been taking is to think in terms of producing press releases rather than finished stories, relying on the journalist, or some other editorial role, to act as the final arbiter of the quality and relevance of the press release style communication. The implication is also that more work needs to be done checking and working up the press release into the final story (if, indeed, there is any story).
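The “solve it once, run it everywhere” idea can be sketched as two steps: split the national rows by area, then run the same drafting function over every group. In this minimal sketch the row fields (`area`, `month`, `claimants`) and the example figures are hypothetical, and the output is deliberately marked as a draft for an editor to check rather than a finished story.

```python
from collections import defaultdict

def split_by_area(rows, key="area"):
    """Group national rows into per-area lists."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row)
    return dict(grouped)

def draft_release(area, rows):
    """Draft a press-release style paragraph for one area."""
    ordered = sorted(rows, key=lambda r: r["month"])  # "YYYY-MM" strings sort chronologically
    latest = ordered[-1]
    text = f"{area}: {latest['claimants']} claimants in {latest['month']}."
    if len(ordered) > 1:
        previous = ordered[-2]
        diff = latest["claimants"] - previous["claimants"]
        if diff == 0:
            text += f" That is unchanged from {previous['month']}."
        else:
            more_or_fewer = "more" if diff > 0 else "fewer"
            text += f" That is {abs(diff)} {more_or_fewer} than in {previous['month']}."
    return "DRAFT (for editorial review)\n" + text

def drafts_for_all_areas(rows):
    """The same code, run for every area in the national dataset."""
    return {area: draft_release(area, area_rows) for area, area_rows in split_by_area(rows).items()}

# Hypothetical national rows.
rows = [
    {"area": "Area A", "month": "2016-01", "claimants": 130},
    {"area": "Area A", "month": "2016-02", "claimants": 120},
    {"area": "Area B", "month": "2016-02", "claimants": 45},
]
for draft in drafts_for_all_areas(rows).values():
    print(draft, end="\n\n")
```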
  27. So picking up on this idea of reuse – or laziness – the nomis data-to-text engine can easily be wrapped to provide a conversational UI. In this example, I can ask the service for the latest JSA figures in a particular area. Although not shown, you can also put in a postcode and get the figures back for the local authority area containing that postcode. At the time I did this demo, I was half thinking of trying to persuade Johnston Press to give me some pin money to play with, so I scraped a list of Johnston Press papers, found the postcode of each paper’s office, and used it as the basis for a lookup of jobless figures by newspaper title area.
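The postcode route through such a conversational UI comes down to three steps: pull a postcode out of the message, map it to a local authority, then look up the figures for that area. A minimal sketch follows. It uses a simplified UK postcode pattern that will not validate every edge case. Both lookup tables are hypothetical stand-ins, with made-up postcodes, areas and counts; a real bot would query a postcode lookup service and the nomis data instead.

```python
import re

# Simplified UK postcode pattern: outward code, optional space, inward code.
POSTCODE = re.compile(r"\b([A-Z]{1,2}\d[A-Z\d]?)\s*(\d[A-Z]{2})\b", re.IGNORECASE)

# Hypothetical lookup tables standing in for real postcode and nomis lookups.
POSTCODE_TO_AREA = {"ZZ1 1AA": "Exampleshire", "ZZ2 2BB": "Sampleton"}
LATEST_JSA = {"Exampleshire": 1234, "Sampleton": 567}

def extract_postcode(message):
    """Find a postcode in free text and normalise it to 'OUTWARD INWARD' form."""
    match = POSTCODE.search(message)
    if match is None:
        return None
    return f"{match.group(1)} {match.group(2)}".upper()

def answer(message):
    postcode = extract_postcode(message)
    if postcode is None:
        return "Please include a postcode, e.g. 'JSA figures for ZZ1 1AA'."
    area = POSTCODE_TO_AREA.get(postcode)
    if area is None:
        return f"Sorry, I don't know which area {postcode} is in."
    return f"The latest JSA claimant count for {area} (containing {postcode}) is {LATEST_JSA[area]}."

print(answer("latest JSA figures for zz11aa please"))
```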
  28. Having got some machinery set up to work with Slack, I could also use it as an interface for a simple “spreadsheet row to paragraph of text” toy I was trying to put together. So here, for example, I’m looking up the latest figures for CQC care home inspections. (Actually, I think this is based on a scraper of the CQC website rather than a data file download.)
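A “spreadsheet row to paragraph of text” toy can be as simple as a sentence template filled in from the row’s columns, plus a few conditional sentences. A minimal sketch using the standard library’s `string.Template` follows. The field names, the template wording and the example row are all hypothetical, not the CQC’s actual schema.

```python
from string import Template

# Hypothetical sentence template keyed on column names.
SENTENCE = Template('$name, in $town, was inspected on $date and rated "$rating" overall.')

def row_to_paragraph(row):
    """Render one data row as a short paragraph of text."""
    text = SENTENCE.substitute(row)
    previous = row.get("previous_rating")
    # Only mention the previous rating when it differs from the current one.
    if previous and previous != row["rating"]:
        text += f' This is a change from its previous rating of "{previous}".'
    return text

example_row = {
    "name": "Example House",
    "town": "Sampleton",
    "date": "1 March 2016",
    "rating": "Good",
    "previous_rating": "Requires improvement",
}
print(row_to_paragraph(example_row))
```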
  29. The original experiments had the Slack bot code running on my personal computer. More recently, I started looking at how things like Amazon AWS Lambda functions, essentially serverless remote procedure calls, could be used to host the bot. The examples here make use of the UK Parliament API to provide the content, allowing me to look up recent reports, or committee memberships, for example.
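To show the shape of a serverless bot, here is a minimal AWS Lambda handler for a Slack slash command. It assumes an API Gateway proxy integration, so Slack’s form-encoded POST arrives as a plain string in `event["body"]` (not base64-encoded), and it replies with a JSON body containing `text` and `response_type`. `lookup_committee` is a hypothetical stand-in for a call to the UK Parliament API. A production bot should also verify Slack’s request signature, which is omitted here.

```python
import json
from urllib.parse import parse_qs

def lookup_committee(name):
    """Hypothetical stand-in for a UK Parliament API lookup of committee members."""
    fake_data = {"science and technology": ["Member A", "Member B"]}
    return fake_data.get(name.lower())

def lambda_handler(event, context):
    # Slack slash commands POST form-encoded fields; the user's text is in "text".
    params = parse_qs(event.get("body") or "")
    text = params.get("text", [""])[0].strip()
    members = lookup_committee(text)
    if members is None:
        reply = f"Sorry, I couldn't find a committee called '{text}'."
    else:
        reply = f"Members of the {text} committee: " + ", ".join(members) + "."
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        # "ephemeral" replies are only shown to the user who issued the command.
        "body": json.dumps({"response_type": "ephemeral", "text": reply}),
    }

# Local test with a fake event, no AWS or Slack needed.
fake_event = {"body": "command=%2Fcommittee&text=Science+and+Technology"}
print(lambda_handler(fake_event, None))
```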
  30. The data2text area is a rich one, and one thing I find when reflecting on my own exploratory data activities is that I often look to charts (often custom, multilayered charts of my own devising – ggplot is great for that) for inspiration. Working in education, where we have a legal requirement to make our teaching materials accessible, charts and figures often require written descriptions. So one thing I’ve started wondering recently is whether we can introspect on chart objects created using things like ggplot as a “data basis” for a textualisation of the chart components (and then do data2text analysis for simple analytics insight reporting). And it seems we can – ggplot chart objects, for example, have a ggplot_build() introspector, and we can also access the chart objects directly.
  31. When I posted about my ggplot2text experiment, I idly wondered whether we could do the same for matplotlib chart objects. And it seems we can, as this demo shared by a commenter shows. #Lazyweb ftw, you might say :-)
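Here is a minimal sketch of matplotlib introspection for text description (not the commenter’s demo). It reads the title and axis labels back off an `Axes`, then walks its `Line2D` objects to report each line’s label, number of points, minimum and maximum. It assumes numeric x and y data. The example chart and its data are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so no display is needed
import matplotlib.pyplot as plt

def describe_axes(ax):
    """Generate a plain-text description of the lines drawn on a matplotlib Axes."""
    parts = []
    title = ax.get_title()
    if title:
        parts.append(f'A chart titled "{title}".')
    xlabel = ax.get_xlabel() or "unlabelled values"
    ylabel = ax.get_ylabel() or "unlabelled values"
    parts.append(f"The x-axis shows {xlabel} and the y-axis shows {ylabel}.")
    for line in ax.get_lines():
        x = [float(v) for v in line.get_xdata()]
        y = [float(v) for v in line.get_ydata()]
        label = line.get_label()
        # matplotlib gives unlabelled artists default labels starting with "_"
        name = f'labelled "{label}"' if not label.startswith("_") else "with no label"
        lo, hi = min(y), max(y)
        parts.append(
            f"The line {name} has {len(y)} points, ranging from {lo:g} at x={x[y.index(lo)]:g} "
            f"to {hi:g} at x={x[y.index(hi)]:g}."
        )
    return " ".join(parts)

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [10, 12, 9, 15], label="Claimants")
ax.set_title("Monthly claimants")
ax.set_xlabel("month")
ax.set_ylabel("count")
print(describe_axes(ax))
```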
  32. As I was looking at the Parliament API backend for a simple conversational search agent, the ONS Beta website became the live site. One of the nice things about the new ONS site is that a JSON feed alternative is available for much of the HTML content on the site, which means we can repurpose that website content directly as a response to a conversational search.
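A minimal sketch of repurposing that JSON for a bot reply follows. It assumes that the JSON alternative for an ONS page is reached by appending `/data` to the page URL, and that the payload carries a `description` object with `title` and `summary` fields. Treat both as assumptions to check against the live site, not a documented contract. Only the URL building and summarising are exercised without a network connection.

```python
import json
from urllib.request import urlopen

def json_url(page_url):
    """Assumed convention: the JSON alternative lives at the page URL plus /data."""
    return page_url.rstrip("/") + "/data"

def fetch_json(page_url, timeout=10):
    """Fetch and decode the JSON version of an ONS page (needs network access)."""
    with urlopen(json_url(page_url), timeout=timeout) as resp:
        return json.load(resp)

def summarise(payload):
    """Turn the payload into a one-line reply; the field names are assumptions."""
    description = payload.get("description", {})
    title = description.get("title", "Untitled")
    summary = description.get("summary", "")
    return f"{title}: {summary}".strip().rstrip(":")

# Offline example with a made-up payload:
print(summarise({"description": {"title": "Example release", "summary": "Figures went up."}}))
```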
  33. Finally, I want to return to the Jupyter ecosystem. I absolutely love the notebook environment: it provides a great environment for writing literate, reproducible data analysis scripts (several news outlets are starting to publish Jupyter notebooks showing the analysis behind their news stories – BuzzFeed is a great example of this, as with their recent story on tennis match-fixing and betting), as well as a great environment for documenting exploratory data analyses. But the Jupyter ecosystem is already much richer than that. I haven’t described the dashboard toolkit for creating live dashboards, the slideshow view that lets you create interactive slides with live code execution, the range of programming language kernels (not just Python and R), or the kernel wrapper that lets you define an API via a notebook. But I do just want to quickly mention remote kernels.
  34. At the moment, we’re rewriting a day-long residential school activity that uses Lego robots. Until this year, we’ve used the original yellow Lego Mindstorms RCX brick. This year, we’re using the Lego EV3 brick, which has wifi and can be set up to run Linux and a Python shell that can access the robot’s bits. The approach I’ve been exploring is to run a remote IPython kernel on the brick and a Jupyter server on a desktop machine, and then connect a notebook to the remote kernel via the Jupyter server. Running the notebook server on the desktop removes the load of running the server from the brick. (The same approach can be – and is – used to run large tasks on supercomputer clusters.) The notebooks also allow us to create simple interactive UIs – just as R has the Shiny framework, Jupyter notebooks can run interactive ipywidgets wired directly to Python state. In the example above, I have a slider for controlling motor speed (actually, the duty cycle of the motor) and another widget that displays the value being seen by a particular sensor. (Again, there’s a tiny element of simplistic data2text contextualisation in the display.)
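The widget wiring can be kept separate from the hardware, which makes it easy to test away from the brick. In this minimal sketch the callback takes the change dictionary that ipywidgets’ `observe()` passes, with the new value under `"new"`, clamps it to a -100 to 100 duty-cycle range and writes it to a motor object. `FakeMotor` is a stand-in. The `duty_cycle_sp` attribute mirrors what the ev3dev Python bindings appear to expose, but treat that as an assumption. The sensor thresholds in `describe_reading` are arbitrary.

```python
class FakeMotor:
    """Stand-in for an EV3 motor object; duty_cycle_sp is an assumed attribute name."""
    def __init__(self):
        self.duty_cycle_sp = 0

def clamp(value, lo=-100, hi=100):
    """Keep a requested duty cycle within the allowed percentage range."""
    return max(lo, min(hi, int(value)))

def make_slider_callback(motor):
    """Build an observer that pushes slider changes onto the motor."""
    def on_change(change):
        # ipywidgets observers receive a change mapping with the new value under "new"
        motor.duty_cycle_sp = clamp(change["new"])
    return on_change

def describe_reading(value, dark=20, bright=60):
    """Tiny data2text gloss for a light sensor percentage (thresholds are arbitrary)."""
    if value < dark:
        level = "dark"
    elif value > bright:
        level = "bright"
    else:
        level = "medium"
    return f"Sensor reads {value}% ({level})"

motor = FakeMotor()
callback = make_slider_callback(motor)
callback({"new": 150})  # simulate the slider being dragged past the limit
print(motor.duty_cycle_sp, describe_reading(75))
```

In a notebook connected to the brick’s kernel, the callback would be attached with something like `widgets.IntSlider(min=-100, max=100).observe(callback, names="value")`, using a real motor object in place of the fake one.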
  35. So that’s me done. Some of the tools and technologies that I think are appropriate for, or can be appropriated for, data-related tasks. Sometimes a pen will do as well as a spoon.
  36. And finally, a last bit of blatant self-promotion. In the same way that maths has recreational maths – fun puzzles in the Sunday papers – I engage in recreational data activities. And as with the blog, I keep a record of what I’ve done. Several years ago, I started to learn R, and used Formula One results and timing sheet data as the context for that. Over the years, I’ve pulled various tricks and techniques together into this evolving book. (Actually, the book was also another experiment – Leanpub encourages you to publish as you write, and uses markdown for the manuscript. I was looking for an opportunity to explore whether we might be able to use something like RStudio, and in particular Rmd (R Markdown), for authoring OU course materials, so this gave me a reason – and a context – for exploring such a workflow.) It’s still a work in progress, but at over 400 pages already it represents a reasonably deep dive into the different things you can do with a limited range of datasets on a particular topic, as well as exploring a variety of ways of using – and appropriating – R to help us find stories in data.