SlideShare une entreprise Scribd logo
1  sur  104
A Tinkerer’s Toolbox:
Data Driven Journalism




                                           Tony Hirst
                     Dept of Communication and Systems
                                   The Open University
               Visiting Senior Research Fellow, University of Lincoln
@psychemedia

blog.ouseful.info

      #???
Where I situate myself…
Visualising data helps me make
sense of the world around me
Do you know
   what’s
 possible?
#ddj
Google
Spreadsheets
Data Distributions




                     Outliers
Trends and (anti)correlations...
Explanatory visualization
Data visualizations that are used to
transmit information or a point of
view from the designer to the
reader. Explanatory visualizations
typically have a specific “story” or
information that they are intended
to transmit.

Exploratory visualization
Data visualizations that are used by
the designer for self-informative
purposes to discover patterns,
trends, or sub-problems in a
dataset. Exploratory visualizations
typically don’t have an already-
known story.
Exploiting
Structure
Hierarchical data and treemaps - medals




Pivot tables
Templated data views
Macroscopes
Look for
Differences
Data Can Tell a
    Story
http://www.musik-therapie.at/PederHill/Structure&Plot.htm
Visual Data
Summaries
ggplot() +
geom_linerange(data = d1,aes(x= car, ymin = ymin,ymax = ymax)) +
geom_point(data = d2,aes(x= car, y= value,shape = variable),size = 2) +
opts(title="F1 2011 Korea nRace Summary Chart",
    axis.text.x=theme_text(angle=-90, hjust=0)) +
labs(x = NULL, y = "Position", shape = "")
Data
Clean(s)ing
Google Refine
(Inner) Joins &
 Reconciliation
Google Fusion
Tables
Google Refine
OpenHeatMap
“Data Flow”
“Analog Synth Meeting”, Todd Huffman
Google
                                                   Yahoo! Pipe
Wikipedia       HTML    Spreadsheet        CSV
                                                   Import CSV
                        =importHTML



            Embedded
                       <embed>        Google Map    KML
            object
Find the data…

                        Google
                                                   Yahoo! Pipe
Wikipedia       HTML    Spreadsheet        CSV
                                                   Import CSV
                        =importHTML



            Embedded
                       <embed>        Google Map    KML
            object
Get the data as data…

                        Google
                                                   Yahoo! Pipe
Wikipedia       HTML    Spreadsheet        CSV
                                                   Import CSV
                        =importHTML



            Embedded
                       <embed>        Google Map    KML
            object
Transform the data…

                        Google
                                                   Yahoo! Pipe
Wikipedia       HTML    Spreadsheet        CSV
                                                   Import CSV
                        =importHTML



            Embedded
                       <embed>        Google Map    KML
            object
Enrich the data and transform again…

                        Google
                                                   Yahoo! Pipe
Wikipedia       HTML    Spreadsheet        CSV
                                                   Import CSV
                        =importHTML



            Embedded
                       <embed>        Google Map    KML
            object
Display the data…

                        Google
                                                   Yahoo! Pipe
Wikipedia       HTML    Spreadsheet        CSV
                                                   Import CSV
                        =importHTML



            Embedded
                       <embed>        Google Map    KML
            object
Publish the displayed data…

                        Google
                                                   Yahoo! Pipe
Wikipedia       HTML    Spreadsheet        CSV
                                                   Import CSV
                        =importHTML



            Embedded
                       <embed>        Google Map    KML
            object
The onlineCSV file
      becomes a spreadsheet
          becomes A DATABASE
Finding data…
site:.gov.uk
filetype:xls
underspend
inurl:http://phx.corporate-ir.net/phoenix.zhtml?
intitle:press
site:phx.corporate-ir.net
inurl:http://phx.corporate-ir.net/phoenix.zhtml?
intitle:press
site:phx.corporate-ir.net
Tapping the
Data Burden
Reporting body              Receiving body


                      Data tap




Data Burdens and FOI
Opening Data
 Up via FOI
“Public Data” &
 Social Media
   Mapping
Emergent views
 of structural
  properties
My “journalism” is tracking down
tools and working out recipes that
     help datasets tell stories
http://delicious.com/stacks/view/CROBXt
Build lazy…
Electrical Safety 101

We get a lot of stuff from
Asia, so it all comes with
funny plugs, travelling just
adds to the fun.

Left to right top to bottom we
have:

Singapore wall socket UK
Adapter UK -> NZ/AU
Double adapter NZ/AU
My cell charger NZ/AU
Adapter NZ/AU -> everything
Andreas cell charger Euro
Camera charger US




                    tolomea
“Hands Passing Baton at Sporting Event”, tableatny
@psychemedia

blog.ouseful.info

Contenu connexe

Plus de Tony Hirst

Virtual computing.pptx
Virtual computing.pptxVirtual computing.pptx
Virtual computing.pptxTony Hirst
 
ouseful-parlihacks
ouseful-parlihacksouseful-parlihacks
ouseful-parlihacksTony Hirst
 
Gors appropriate
Gors appropriateGors appropriate
Gors appropriateTony Hirst
 
Gors appropriate
Gors appropriateGors appropriate
Gors appropriateTony Hirst
 
Robotlab jupyter
Robotlab   jupyterRobotlab   jupyter
Robotlab jupyterTony Hirst
 
Fco open data in half day th-v2
Fco open data in half day  th-v2Fco open data in half day  th-v2
Fco open data in half day th-v2Tony Hirst
 
Notes on the Future - ILI2015 Workshop
Notes on the Future - ILI2015 WorkshopNotes on the Future - ILI2015 Workshop
Notes on the Future - ILI2015 WorkshopTony Hirst
 
Community Journalism Conf - hyperlocal data wire
Community Journalism Conf - hyperlocal data wireCommunity Journalism Conf - hyperlocal data wire
Community Journalism Conf - hyperlocal data wireTony Hirst
 
Residential school 2015_robotics_interest
Residential school 2015_robotics_interestResidential school 2015_robotics_interest
Residential school 2015_robotics_interestTony Hirst
 
Data Mining - Separating Fact From Fiction - NetIKX
Data Mining - Separating Fact From Fiction - NetIKXData Mining - Separating Fact From Fiction - NetIKX
Data Mining - Separating Fact From Fiction - NetIKXTony Hirst
 
A Quick Tour of OpenRefine
A Quick Tour of OpenRefineA Quick Tour of OpenRefine
A Quick Tour of OpenRefineTony Hirst
 
Conversations with data
Conversations with dataConversations with data
Conversations with dataTony Hirst
 
Data reuse OU workshop bingo
Data reuse OU workshop bingoData reuse OU workshop bingo
Data reuse OU workshop bingoTony Hirst
 
Inspiring content - You Don't Need Big Data to Tell Good Data Stories
Inspiring content - You Don't Need Big Data to Tell Good Data Stories Inspiring content - You Don't Need Big Data to Tell Good Data Stories
Inspiring content - You Don't Need Big Data to Tell Good Data Stories Tony Hirst
 
Lincoln jun14datajournalism
Lincoln jun14datajournalismLincoln jun14datajournalism
Lincoln jun14datajournalismTony Hirst
 
Lincoln Journalism Research Day - Data Journalism
Lincoln Journalism Research Day - Data JournalismLincoln Journalism Research Day - Data Journalism
Lincoln Journalism Research Day - Data JournalismTony Hirst
 
Hestia linear tales
Hestia linear talesHestia linear tales
Hestia linear talesTony Hirst
 

Plus de Tony Hirst (20)

Virtual computing.pptx
Virtual computing.pptxVirtual computing.pptx
Virtual computing.pptx
 
ouseful-parlihacks
ouseful-parlihacksouseful-parlihacks
ouseful-parlihacks
 
Gors appropriate
Gors appropriateGors appropriate
Gors appropriate
 
Gors appropriate
Gors appropriateGors appropriate
Gors appropriate
 
Robotlab jupyter
Robotlab   jupyterRobotlab   jupyter
Robotlab jupyter
 
Fco open data in half day th-v2
Fco open data in half day  th-v2Fco open data in half day  th-v2
Fco open data in half day th-v2
 
Notes on the Future - ILI2015 Workshop
Notes on the Future - ILI2015 WorkshopNotes on the Future - ILI2015 Workshop
Notes on the Future - ILI2015 Workshop
 
Community Journalism Conf - hyperlocal data wire
Community Journalism Conf - hyperlocal data wireCommunity Journalism Conf - hyperlocal data wire
Community Journalism Conf - hyperlocal data wire
 
Residential school 2015_robotics_interest
Residential school 2015_robotics_interestResidential school 2015_robotics_interest
Residential school 2015_robotics_interest
 
Data Mining - Separating Fact From Fiction - NetIKX
Data Mining - Separating Fact From Fiction - NetIKXData Mining - Separating Fact From Fiction - NetIKX
Data Mining - Separating Fact From Fiction - NetIKX
 
Week4
Week4Week4
Week4
 
A Quick Tour of OpenRefine
A Quick Tour of OpenRefineA Quick Tour of OpenRefine
A Quick Tour of OpenRefine
 
Conversations with data
Conversations with dataConversations with data
Conversations with data
 
Data reuse OU workshop bingo
Data reuse OU workshop bingoData reuse OU workshop bingo
Data reuse OU workshop bingo
 
Inspiring content - You Don't Need Big Data to Tell Good Data Stories
Inspiring content - You Don't Need Big Data to Tell Good Data Stories Inspiring content - You Don't Need Big Data to Tell Good Data Stories
Inspiring content - You Don't Need Big Data to Tell Good Data Stories
 
Lincoln jun14datajournalism
Lincoln jun14datajournalismLincoln jun14datajournalism
Lincoln jun14datajournalism
 
Lincoln Journalism Research Day - Data Journalism
Lincoln Journalism Research Day - Data JournalismLincoln Journalism Research Day - Data Journalism
Lincoln Journalism Research Day - Data Journalism
 
Calrg14 tm351
Calrg14 tm351Calrg14 tm351
Calrg14 tm351
 
Calrg14 tm351
Calrg14 tm351Calrg14 tm351
Calrg14 tm351
 
Hestia linear tales
Hestia linear talesHestia linear tales
Hestia linear tales
 

Dernier

Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 

Dernier (20)

Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 

Lincoln ddj

Notes de l'éditeur

  1. Do we have a hashtag for the workshop?
  2. Collaborative commentary
  3. Through the provision of an API on top of the aggregated local council data, OpenlyLocal can also be treated as a database in its own right. In the example shown here, committee membership is displayed via a treemap showing party affiliations of committee members. (Hovering over a particular grouping displays a list of names of council members on that committee from that party political grouping.) Whilst it would be a major task to take data from every council website in a variety of formats in order to generate similar views for other councils, the work done by OpenlyLocal in aggregating this data and then republishing it via a single API in a single format means that the treemap view can be applied to each council whose data is stored in OpenlyLocal.In passing, it is also worth mentioning how the use of visualisations can be helpful in cleaning data or identifying possible errors in it. In the above example, we see that party affiliations for councillors on the Isle of Wight Council are declared as both Liberal Democrat and and Liberal Democrat Group.
  4. The top, blue strip shows the gear (1 to 7); the green strip shows the throttle pedal depression (0-100%), and the red strip shows the brake (0-100%). The light blue strip is a composite of the previous three strips. The whiter the pixel, the closer it is to 100% throttle in 7th gear with no braking.The bottom two traces show the longitudinal and lateral g-force respectively. For the longitudinal trace, red shows braking – being forced into the steering wheel; green shows acceleration – being forced back into your seat. You’ll see the greatest g-force under braking occurs when the brakes are slapped full on… (the red bits in the third and fifth traces line up). For the latitudinal g-force, the red shows the driving being flung to the left (i.e. right hand corner), the green shows them being pushed out to the right.
  5. Analogsynth – pretty much ultimate freedom to linlk audio processing effects modules together. Simplified by having a common plug.
  6. Some scene setting about what I mean by “flow”…
  7. Suppose we have a table of numerical data associated with placenames on something like Wikipedia. How do we knock up a quick map view of the data?
  8. UK city population search onwikipedia
  9. This can all be a bit flakey – a bit like balancing stones… But It can also be surprisingly stable (for a time at least!)
  10. Here we see the result of pulling data into a Google Spreadsheet from a CSV file published at a particular web address. We now have the ability to run the full range of spreadsheet tools over the data – data which is being pulled in from the datastore, remember.(A similar functionality presumably exists in Microsoft Excel?)
  11. Emergent Social Positioning: origins: 1.5 degree egonet (how followers follow each other, how hashtaggers follow each other)- projection maps from followers to folk they commonly follow;-- projection maps from hashtaggers to folk they commonly follow- projection maps from friends to folk who commonly follow them
  12. Lots of the time, things don’t quite fit: the import format for one tool does not match up with the export formats of another… so sometimes we need an adapter. (Cf. also the notion of impedance mismatch.)
  13. Do we have a hashtag for the workshop?