One take on what data journalism may or may not be… a lecture presented to journalism students at the University of Lincoln, UK, February 2014.
  

1	
  
Let’s start with an easy(?!) question: what is journalism?

One way of answering that question is to list some of the functions, or attributes, associated with it – informing, educating, holding to account, the watchdog function, campaigning, contextualising.
  

2	
  
Sensemaking seems to me to be an important part of it… In part contextualisation, in part identifying the bits that make the difference, the bits that make it important, the bits that make it news that people need to know.
  

3	
  
Second question: what is data? National statistics, sports results, polls, financial figures, health data, school league tables, etc. etc.

Is a book data? Or a speech? What if I split a speech up into separate words, count the occurrences of each unique word and then display the result as a “tag cloud”, or word frequency diagram?
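
Here is a minimal Python sketch of the word-counting idea. It splits a text into words and counts each unique word, which is the raw material for a tag cloud or word frequency diagram. The sample sentence is made up. The tokenising rule (lower-cased runs of letters and apostrophes) is an assumption; real speech transcripts may need more careful handling of numbers, hyphens and common "stop words".

```python
import re
from collections import Counter

def word_frequencies(text, top_n=10):
    """Split text into words and count how often each unique word occurs."""
    # Lower-case first so "The" and "the" count as the same word; runs of
    # letters (and apostrophes) are treated as words, punctuation is dropped.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top_n)

# A made-up sentence standing in for a speech transcript.
speech = "The cat sat on the mat, and the dog sat on the rug."
for word, count in word_frequencies(speech, top_n=3):
    print(word, count)
```

The counts, not the words themselves, are what a tag cloud encodes as font size.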
  

4	
  
One way of thinking about data is that it is a particular sort of source, or a source that can respond to a particular style of questioning in a particular way.

Another take on this is that many “data sources” are experts on a particular topic, experts that know a lot about a very particular class of facts.
  

5	
  
So what is data journalism?

One way is to think of it as a process, as exemplified by Paul Bradshaw’s inverted pyramid of data journalism. I see it more as a conversation in which data is one of the conversants. The conversational view also allows us to think about process, but more important, for me, is that in a conversation, it gets personal…
  

6	
  
The inverted pyramid gives us one way of considering the data journalistic process, or at least identifying some of the steps involved in a data investigation.

But there are many other ways of conceptualising the process – for example, finding stories and telling stories…
  

7	
  
When it comes to finding stories, do we:

a) want to find stories in a dataset we are provided with, or
b) use data to help draw out a story lead we have already been tipped off to?
  

8	
  
One of the ways I like to work with data is to have a conversation with it – asking questions of it and then further questions based on the responses I get.
  

9	
  
Sometimes it looks at first as if we have data in a form where we might be able to do something with it – then we realise it needs cleaning and reshaping.

For example, in this case we have percentage signs contaminating numbers, and data organised in separate sections – but how do we get a “well behaved” view over data from all the wards, and over different sorts of data: votes polled per candidate versus the size of the electorate in a particular ward, for example?

Walkthrough: http://blog.ouseful.info/2013/05/03/a-wrangling-example-with-openrefine-making-ready-data/
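
The walkthrough linked above uses OpenRefine. As an alternative, here is a rough Python sketch of the same two steps on a small, made-up ward results file. First it cleans the data by stripping the `%` signs so the shares become real numbers. Then it reshapes the data by totalling the candidate rows per ward. The column names and figures are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw export: the vote share column has "%" signs mixed in with the numbers.
RAW = """ward,candidate,votes,share
Abbey,Smith,1204,55.0%
Abbey,Jones,987,45.0%
Castle,Smith,1530,61.2%
Castle,Jones,970,38.8%
"""

def clean_percentage(value):
    """Turn text such as ' 55.0% ' into the number 55.0."""
    return float(value.strip().rstrip("%"))

def load_clean_rows(raw_text):
    """Parse the CSV text into rows with proper numeric types."""
    rows = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        rows.append({
            "ward": row["ward"],
            "candidate": row["candidate"],
            "votes": int(row["votes"]),
            "share": clean_percentage(row["share"]),
        })
    return rows

def votes_by_ward(rows):
    """Reshape: total votes polled in each ward."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["ward"]] += row["votes"]
    return dict(totals)

rows = load_clean_rows(RAW)
print(votes_by_ward(rows))
```

Once each ward has a single total, it can be set against the size of that ward's electorate, which is the comparison the question above asks for.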
  

10	
  
One of the first datasets I played with was MPs’ expenses data. Here are a couple of ways I started to cha

The bar chart is ordered, for a particular expenses area, by total amount for each individual MP.

The block histogram shows how many MPs made a total claim in a particular expenses area of a particular

A scatterplot is another very powerful sort of chart – we can plot two sorts of value against each other to

Some scatterplot tools allow you to size or colour nodes according to further dimensions. Colouring node

Maps can be used to pull out different sorts of relationships. For example, if we plot markers in the centre of each MP’s constituency, coloured by the total value of travel expenses claimed, we can easily see whether or not an MP is claiming an amount significantly different to MPs in neighbouring constituencies. In this case – travel expenses – we might expect (at first glance at least) a homophilic effect: folk a similar distance away from Westminster should presumably make similar sorts of travel claim? At second glance, we might then start to refine our questioning – does constituency size (in terms of geographical area) or rurality have an effect? Does an MP travel to and from home more than their neighbours (or perhaps claim more in terms of accommodation in London?)
  

13	
  
Sometimes we need to provide quite a lot of explanation when it comes to making sense of even a simple data visualisation – “what am I supposed to be looking at?”
  

14	
  
Contextualisation can take many forms – Trinity Mirror Group have a data unit that produces partially packaged data stories and lines for regional titles, who can then add local colour, knowledge, interpretation and spin to the resulting story.
  

15	
  
For many readers, it may be that data ONLY makes sense when appropriately contextualised.

In passing, it’s also worth noting that sometimes the data you don’t collect affects the interpretation of the data you do…

For example: http://www.open.edu/openlearn/science-maths-technology/mathematics-and-statistics/statistics/diary-data-sleuth-when-the-data-you-dont-collect-affects-the-data-you-do
  

16	
  
In passing, it’s worth mentioning that one thing statistics does is help provide context.

Is this number a big number in the greater scheme of things? Is this thing likely to happen by chance, or is there a meaningful causal relationship between this thing and another thing?

The chart in the corner is a reminder of how surprising probabilities can be. It shows the probability (y-axis) that at least two people in a group share a birthday, for groups of different sizes (the number of people is given on the x-axis). The chart shows that if there are 23 or more people in a room, there is more than a 50/50 chance that two of them will share a birthday (that is, share the same birth day and month, though not necessarily the same birth year).

How many people are in the room? If it’s more than 23, I bet that at least two people share a birthday (at least in terms of day and month).
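
You can check the birthday result directly. The chance that n people all have different birthdays is (365/365) × (364/365) × … × ((365 − n + 1)/365). The chance that at least two of them share a birthday is one minus that product. The sketch below assumes 365 equally likely birthdays and ignores leap years and seasonal patterns in births.

```python
def shared_birthday_probability(n, days=365):
    """Probability that at least two of n people share a birthday.

    Assumes birthdays are independent and spread evenly over `days` days.
    """
    if n > days:
        return 1.0  # more people than days: a shared birthday is certain
    p_all_different = 1.0
    for k in range(n):
        # person k+1 must avoid the k birthdays already taken
        p_all_different *= (days - k) / days
    return 1.0 - p_all_different

for n in (10, 22, 23, 30):
    print(n, round(shared_birthday_probability(n), 3))
```

The printed values show the probability crossing one half between 22 and 23 people.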
  

17	
  
The other way of using data is to tell stories. But what does that even mean…?
  

18	
  
A common source of stories based on data is polls: either polls commissioned by a publisher with a view to generating a story, or polls commissioned by a lobbying group or PR firm to promote not only stories around a particular issue, but stories that follow a line favourable to the organisation that commissioned the poll (or detrimental to positions that whoever commissioned the poll is campaigning against).

When presented with a press release written around a poll commissioned by a PR company, look to the raw data to see where the numbers quoted in the press release actually come from.

In the above example, I could, for example, claim that 96% of people (a creative reading of the numbers) did not appear to disagree with the idea that press behaviour should be independently regulated (a creative reading of the question; the repeated negatives also serve to further obscure what is, or isn’t, actually being claimed…).

And when reading raw results, or quoting from them, take care over which numbers you quote. Sometimes the presentation of the results can lead you to misread them, or to misread the way they add up.

Sometimes, two or more polls may be commissioned around the same topic and appear to give contradictory results. For an example of this, see: http://www.open.edu/openlearn/science-maths-technology/mathematics-and-statistics/statistics/two-can-play-game-when-polls-collide
  
19	
  
Many polling organisations publish press releases featuring “highlight” results from a poll. The more reputable ones also publish copies of the poll or survey questions and the results that were returned.

YouGov polls often split results down by political persuasion or newspaper preference, as well as demographically segmenting responses by gender, age or region.

The majority of polling organisations publish the data via PDFs rather than “as data” (for example, in the form of spreadsheet datatables). Tools such as Tabula (URL) are making it increasingly easy to extract the data contained within PDFs into actual datatables. Your local techie should also be able to “scrape” the data from a PDF document and put it into a proper data format.

For examples of how to scrape data as well as images from PDF documents, see:

- scraping data tables from PDFs:
- extracting images from PDFs:

Even if you feel as if you can’t do this yourself, you should make yourself aware of what is possible and achievable by people who have the skills to perform these tasks.
  

20	
  
Stephen Few has written several excellent books about creating data visualisations and data dashboards, although you shouldn’t necessarily believe everything he says!

This quote gets across the idea that, just as we use emphasis and tone in written communication, we can and should make use of emphasis and tone in charts.

Many newspapers are starting to make use of charts that show several datapoints (for example, several bars in a bar chart) but highlight one or two of them that are the focus of a particular storyline, the other points or bars being used to provide context.

In chart design, “less is more” often works (this reflects a principle attributed to data visualisation guru Edward Tufte of using “least ink” when creating charts).
  

21	
  
This video – showing part of a lecture by science fiction writer Kurt Vonnegut – shows how simple lines can tell archetypal stories. Note how the narration sets the scene: the axes are explained, then the line is constructed. When the x-axis represents time, remember that someone riding the line as it was constructed does not necessarily know what the future holds. When you see a line chart with time as an x-axis, remember that it shows a trace of a story that unfolded over time.

Another powerful example of this can be found on YouTube – search for house price rollercoaster to find an animation where house price values over time are visualised as an animated roller coaster ride…
  

22	
  
This second clip shows Hans Rosling, the Swedish health statistician made famous by his “data performances”, narrating an animated data visualisation rendered using a dynamic bubble chart technique that he popularised via his Gapminder website.

Note how the first 30 seconds of the clip are spent explaining the set-up of the chart – what the axes mean, what the bubbles represent. When you see a rich data-driven interactive on a website, how much coaching and contextualisation is provided to help the user/reader make sense of it?

If you turn the sound off on the Rosling clip, how much sense do the moving bubbles make, in terms of the story they tell, without the benefit of Rosling’s narration? Can you tell where to focus your attention to pull out a meaningful storyline? Are there many possible storylines that can be pulled out? What tricks does Rosling use to focus your attention on – and illustrate – the story he is telling? Is there any sleight of hand in terms of not commenting on what some of the other bubbles are doing (is he using, or could he potentially use, misdirection to focus your attention away from possible stories he does not want you to pull out of the data?)

For more examples of Rosling’s compelling performances, see the recent OU/BBC Two co-production “Don’t Panic – The Truth About Population Change”, available on the Gapminder website: http://www.gapminder.org/videos/dont-panic-the-facts-about-population/
  

23	
  
Few suggests that graphical communication requires stylistic devices that emphasise particular aspects of a graphic. Hans Rosling achieves this by pointing to items of interest and reinforcing that emphasis both in his narration and through the use of overlays on the graphic itself.

So how can we go about drawing emphasis within a static graphic or chart, such as one might find in a print publication?
  

24	
  
To show one way of emphasising particular elements of a graphic, let’s produce a quick chart of our own.

The first thing we need is some data – I’m going to use some data from the Winter Olympics, a grab of the medal table from the back end of the first week of the 2014 games. The question I want to explore is the extent to which the country leading the medal table, as measured by the number of gold medals awarded, still leads when the table is instead ordered according to the total number of medals awarded.
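
Before you build the chart, you can sketch the comparison itself in a few lines of Python. Rank the same table once by golds and once by total medals, then see whether the leaders change. The nations and counts below are invented for illustration and are not the real 2014 figures. The gold ranking follows the usual medal-table convention of breaking ties on silver, then bronze.

```python
# Hypothetical medal counts (nation, gold, silver, bronze): illustrative
# numbers only, not the real Sochi 2014 table.
MEDALS = [
    ("Country A", 6, 2, 1),
    ("Country B", 4, 3, 6),
    ("Country C", 4, 5, 2),
    ("Country D", 2, 1, 1),
]

def gold_key(row):
    # Medal-table order: gold first, ties broken by silver, then bronze.
    _, gold, silver, bronze = row
    return (gold, silver, bronze)

def total_key(row):
    _, gold, silver, bronze = row
    return gold + silver + bronze

def ranking(rows, key):
    """Nation names ordered from highest to lowest by key(row)."""
    return [row[0] for row in sorted(rows, key=key, reverse=True)]

by_gold = ranking(MEDALS, gold_key)
by_total = ranking(MEDALS, total_key)

for position, (gold_leader, total_leader) in enumerate(zip(by_gold, by_total), start=1):
    print(f"{position}. {gold_leader:<10} {total_leader}")
```

With these made-up numbers the gold leader drops to third on total medals. That kind of shift is what the highlighted chart is meant to make visible.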
  
The data I’m going to use comes from a Wikipedia page. The medal table is contained within an HTML table. To get the data out of the page, we are going to screenscrape the HTML table that contains the data. There are a variety of tools for doing this, from browser extensions, to scraper applications such as import.io, to environments such as Scraperwiki that provide a range of developer tools configured to support screenscraping-based data collection.

But the tool I’m going to use is…
  

25	
  
…Google (spread)sheets, and in particular a formula that will import a particular HTML table – in this case, the 2nd table in the page – from a specified URL, in this case the URL of the Wikipedia page containing the medal table.

The formula?

    =importhtml("URL", "table", tableNumber)

On entering the formula, the spreadsheet will pull the data in from the Wikipedia page and make it available as spreadsheet data.

We can now use the spreadsheet to create charts within the sheet itself. If the data in the Wikipedia page is updated, the data in the spreadsheet will be updated whenever the spreadsheet is refreshed.
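
If you are curious about what a formula like IMPORTHTML is doing, here is a rough Python equivalent that uses only the standard library's `html.parser`. It collects every HTML table on a page and returns the nth one, counting from 1 as the spreadsheet formula does. This is a simplified sketch, not how Google Sheets works internally. It assumes tables are not nested and that every cell and row has an explicit closing tag. It also parses a tiny inline page instead of fetching the real Wikipedia URL, which you could do with `urllib.request.urlopen`.

```python
from html.parser import HTMLParser

class TableGrabber(HTMLParser):
    """Collect the rows of every <table> in a page as lists of cell strings.

    Simplifications: nested tables are not handled, and every <td>/<th>/<tr>
    is assumed to have an explicit closing tag.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # one entry per <table>; each is a list of rows
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def import_html_table(html, table_number):
    """Return the table_number-th table (counting from 1) as a list of rows."""
    grabber = TableGrabber()
    grabber.feed(html)
    grabber.close()
    return grabber.tables[table_number - 1]


# A tiny stand-in page: the first table is navigation, the second holds the data.
PAGE = """
<table><tr><td>Navigation</td></tr></table>
<table>
  <tr><th>Nation</th><th>Gold</th></tr>
  <tr><td>Country A</td><td>6</td></tr>
</table>
"""
print(import_html_table(PAGE, 2))
```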
  

26	
  
Whilst we could generate charts within the spreadsheet, I’m actually going to use an online tool called datawrapper (available at datawrapper.de).

Datawrapper charts are starting to make an appearance in many online news reports, such as those published by the Guardian and Trinity Mirror’s ampp3d, so being familiar with this tool – and what you can do with it – could be a useful skill to have.

To get the data into datawrapper you can upload a CSV file, or paste a copy of the data into the upload area. I’ve taken the latter approach, highlighting and copying the table from the spreadsheet and then pasting it into datawrapper.
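
If you upload a CSV file instead of pasting, you can write the file from scraped rows with Python's `csv` module. Here is a minimal sketch with made-up rows. Cells that contain commas are quoted automatically. Hand-made CSV files often break on exactly that detail.

```python
import csv
import io

def rows_to_csv(rows):
    """Serialise a list of rows (each a list of cell values) as CSV text."""
    buffer = io.StringIO()
    # csv.writer quotes any cell containing a comma, so such names stay in one column.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()

table = [["Nation", "Gold", "Total"], ["Country A", 6, 9], ["Country B, North", 4, 13]]
print(rows_to_csv(table), end="")
```

To save the result to disk, write the string to a file opened with `open("medals.csv", "w", newline="")`. The filename is just an example.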
  

27	
  
Having uploaded the data, we can configure several properties for each column. In many cases datawrapper should be able to detect what sort of content is contained within each column (for example, whether it is a number or a text field).

If necessary, we can apply a limited amount of processing to the contents of a specified column. We can also choose to hide one or more columns from the displayed view. In this case, I am going to hide the Rank, Silver and Bronze columns.
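
Datawrapper's own detection rules aren't described here, so the following is only a hypothetical heuristic that shows the general idea of column type detection. A column counts as numeric if every non-empty cell parses as a number.

```python
def guess_column_type(values):
    """Guess whether a column of text cells holds numbers or text.

    Hypothetical heuristic: if every non-empty cell parses as a number
    (allowing thousands separators such as "1,204"), call it numeric.
    """
    cells = [v.strip() for v in values if v.strip()]
    if not cells:
        return "text"
    for cell in cells:
        try:
            float(cell.replace(",", ""))
        except ValueError:
            return "text"
    return "number"

columns = {
    "Nation": ["Country A", "Country B", "Country C"],
    "Gold": ["6", "4", "4"],
    "Total": ["9", "13", ""],
}
for name, values in columns.items():
    print(name, guess_column_type(values))
```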
  

28	
  
We now get to choose the chart type – I’m going to go for a horizontal bar chart and select the default datawrapper style.
  

29	
  
Different chart types have different configuration options. I’m going to choose to automatically sort the bars based on the selected value – notice the buttons in the chart that allow us to select whether to display the Gold medal count or the Total medal count.
  

30	
  
Now we get to add some emphasis – remember emphasis? This is an example of how to show emphasis in a chart…

In this case, I’m going to emphasise the top 2 positions in the Gold medal ranking – the “point” of the piece is to explore the extent to which these positions hold, or don’t hold, when we rank the table by total medal count.

At this point, we can also give the chart a title, and add some provenance information describing and pointing to the source of the data.
  

31	
  
Here’s an example of the final chart, with the ranking (automatically) sorted according to total medal count. Note how the order and positioning of the two highlighted countries has changed.

The difference is further exemplified when switching between the Gold and Total counts by the use of animation – the highlighted bars draw the eye and allow you to better see how their relative positions change across each of the two ranking schemes.
  

32	
  
Having created the chart, you can now save it to your datawrapper account. An embed code for the chart is provided so that you can embed the chart within your own web page.
  

33	
  
Bar charts are a very effective way of displaying particular sorts of information, such as counts. But what other ways are there of displaying data?
  

34	
  
Datawrapper provides a variety of chart types, including:

- horizontal and vertical (column) bar charts,
- grouped bars that collate different bars according to groups (for example, the percentage of the vote for different political parties, election on election),
- stacked column charts (for example, for a selection of countries we could display a column showing the total number of medals, constructed by stacking the individual gold, silver and bronze medal counts for those countries),
- line charts, which are widely used for plotting some value on the vertical y-axis against time on the horizontal x-axis,
- pie charts, to show proportions of a whole, and variants thereof, such as the donut chart (a pie chart with the middle cut out),
- simple data tables (never underestimate the power of a table – they can be really useful for showing specific values, and can be very powerful when they allow the user to sort the table by ascending or descending values in particular columns),
- maps, which, as we shall see, can draw out very powerful relationships across data elements.
  

35	
  
We’ve also seen some other “basic” charts that can be useful for displaying the distribution of data elements:

- the block histogram shows a count on the y-axis of data elements falling within particular ranges of values on the x-axis
- the scatterplot allows us to plot two values against each other, for example height versus weight. These charts can provide us with clues about possible correlations or relationships between the two values. Some scatterplot tools further allow us to colour each point according to group membership, so that we can look to see whether numbers are clustered or grouped according to group membership.
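
Both chart types rest on simple calculations. The sketch below uses made-up values. It counts values into fixed-width bins, which gives the bar heights of a block histogram. It also computes Pearson's correlation coefficient for a pair of columns, one common numerical summary of what a scatterplot shows. A coefficient near +1 or −1 suggests a strong linear relationship, but it says nothing by itself about cause.

```python
from collections import Counter
from statistics import mean

def block_histogram(values, bin_width):
    """Count how many values fall into each bin [start, start + bin_width).

    Bins with no values are simply absent from the result.
    """
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists.

    Raises ZeroDivisionError if either list is constant.
    """
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical total claims (in pounds) for six MPs, binned in 500-pound bands.
claims = [120, 340, 360, 910, 990, 1500]
print(block_histogram(claims, bin_width=500))

# Hypothetical heights (cm) and weights (kg).
heights = [150, 160, 170, 180]
weights = [50, 60, 70, 80]
print(round(pearson(heights, weights), 3))
```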
  

36	
  
Visualising data is a powerful way of asking questions of data – what data points you choose to display and how you display them represent the framing of the question.

What the data looks like is the response, but a response that often takes careful reading. The data source has drawn you the answer – you need to turn it into words that you can use to formulate further questions to check your understanding of the answer first provided. (Each question (each chart) typically leads to another… or more than one other…)

Asking questions that have a graphical answer is one way of querying a data source – but are there other approaches?

Let’s explore that a little more – what do we mean by asking questions of data?
  

37	
  
A database that most of us use every day is the Google web search engine. We put in a key term or phrase, and Google finds web pages ranked according to a variety of criteria that are deemed most relevant to the query you have made (and it could well be who you actually are that affects the ranking).

Sometimes we may know what websites we actually want to search over. Google Custom Search Engines provide one way of defining your own search engine that just searches over the part of the web that you are interested in.

One of the custom search engines I have developed searches over websites that act as wire services for press releases: https://www.google.com/cse/publicurl?cx=016419300868826941330:wvfrmcn2oxc

This allows us to track down the source of many a news item and explore the extent to which a given news story has just churned a press release.

See also: http://blog.ouseful.info/2014/02/06/polling-the-news/
This post also describes how to create a bookmarklet that allows you to highlight a quote in a news report and search for press releases that contain that quote.
  

38	
  
Here’s an example of the search engine in action – I’ve used a bookmarklet that takes a highlighted quote from a news story and passes it to the custom search engine, allowing me to easily see the source of the quote, and the story itself.

I’ve also started defining another related custom search engine that allows us to search news sites and polling companies for stories about, and sources of, polls and surveys:
https://www.google.com/cse/publicurl?cx=016419300868826941330:ewbi9skvnmq
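
A bookmarklet like this essentially builds a search URL from the highlighted text. Here is that step sketched in Python rather than browser JavaScript. Wrapping the quote in double quotes asks for an exact-phrase match. One assumption has not been checked against Google's documentation: that the public custom search URL accepts the query as a `q` parameter.

```python
from urllib.parse import urlencode

# Public URL and engine ID of the press-release search engine described above.
CSE_BASE = "https://www.google.com/cse/publicurl"
CSE_ID = "016419300868826941330:wvfrmcn2oxc"

def quote_search_url(quote, engine_id=CSE_ID):
    """Build a search URL that looks for `quote` as an exact phrase.

    Assumption: the public custom search page accepts the query as `q`.
    """
    # Collapse stray whitespace from the highlighted text, then wrap it in
    # double quotes so the search engine treats it as a single phrase.
    phrase = '"' + " ".join(quote.split()) + '"'
    return CSE_BASE + "?" + urlencode({"cx": engine_id, "q": phrase})

print(quote_search_url("press behaviour should be independently regulated"))
```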
  	
  	
  

39	
  
Custom search engines are a powerful tool for helping us develop focussed web search tools that limit results to a particular part of the web we are interested in, either by location or topic.

We can also use (advanced) search limits in ‘everyday’ web queries using the major web search engine.

For example, the query shown on this slide searches for the word underspend appearing in Excel spreadsheets (filetype:xls) that can be found on UK government websites (or, more specifically, websites hosted on the gov.uk domain (site:gov.uk)).
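
Search limits are just extra terms in the query string, so they are easy to assemble in code. Here is a minimal sketch. The helper names are hypothetical, and the URL uses Google's standard `search?q=` form.

```python
from urllib.parse import urlencode

def advanced_query(*terms, site=None, filetype=None):
    """Combine search terms with optional site: and filetype: limits."""
    parts = list(terms)
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)

def google_search_url(query):
    """URL that runs the query on Google's standard results page."""
    return "https://www.google.com/search?" + urlencode({"q": query})

q = advanced_query("underspend", site="gov.uk", filetype="xls")
print(q)
print(google_search_url(q))
```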
  
Another query limit combination I have found useful is:

confidential filetype:ppt

This can turn up presentations that have been delivered at closed corporate events but that have leaked onto the web…
  	
  

40	
  
Even if you don’t consider yourself a geek or a database expert, writing advanced search queries using search limits is but a small step away from writing queries over databases themselves.
One of the most widely used languages for querying databases is SQL. The slide above shows a simple, made-up SQL query that, run over a very simple search engine database, could have a similar effect to the simpler search engine query.
The idea is that we select those webPages where the text content of the webpage contains the word underspend anywhere – the % signs denote wildcard characters, so underspend can appear preceded or followed by any number of arbitrary characters. We also want the query to be limited to pages that have a particular filetype and domain.
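To make the made-up query concrete, here is a sketch using Python’s built-in `sqlite3` module. The `webPages` table, its columns (`url`, `filetype`, `domain`, `content`) and the rows are all invented for illustration; the query on the slide may differ.

```python
import sqlite3

# Hypothetical schema: one row per crawled page. Table name, column
# names and rows are illustrative assumptions only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE webPages (url TEXT, filetype TEXT, domain TEXT, content TEXT)"
)
conn.executemany(
    "INSERT INTO webPages VALUES (?, ?, ?, ?)",
    [
        ("https://example.gov.uk/budget.xls", "xls", "example.gov.uk",
         "Q3 underspend on highways"),
        ("https://example.gov.uk/report.pdf", "pdf", "example.gov.uk",
         "An underspend was reported"),
        ("https://example.com/figures.xls", "xls", "example.com",
         "Projected underspend"),
        ("https://other.gov.uk/plan.xls", "xls", "other.gov.uk",
         "Capital programme overspend"),
    ],
)

# The % wildcards let 'underspend' appear anywhere in the content;
# the domain pattern is a rough stand-in for the site:gov.uk limit.
query = """
    SELECT url FROM webPages
    WHERE content LIKE '%underspend%'
      AND filetype = 'xls'
      AND domain LIKE '%gov.uk'
"""
matches = [row[0] for row in conn.execute(query)]
print(matches)
```

Only the first row passes all three conditions. Note that `LIKE '%gov.uk'` is looser than `site:gov.uk`: it would also match a domain such as `notgov.uk`.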
  
Far more complicated queries can be written over far more complex databases.
What’s important is that you develop an idea of what sorts of database structure and query are possible, not necessarily that you can run and query such databases yourself.
For more examples, see:
Asking Questions of Data – Garment Factories Data Expedition – http://schoolofdata.org/2013/05/24/asking-questions-of-data-garment-factories-data-expedition/
Asking Questions of Data – Some Simple One-Liners – http://schoolofdata.org/2013/05/13/asking-questions-of-data-some-simple-one-liners/
  	
  

41	
  
One of the simplest, but often one of the most useful, things we can do is to count things. You just need to be creative in what you count!
One of the nice features of working with database query languages such as SQL is that we can write queries that count the number of matching records and then rank the results on that basis. For example, in a database of public spending transactions with different companies, we could count the number of transactions with a particular company, sum the value of transactions carried out with a particular company, or find the companies with which the largest total amounts have been spent.
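A sketch of counting, summing and ranking in one query, again with `sqlite3`. The `transactions` table, the supplier names and the amounts are invented for illustration.

```python
import sqlite3

# Hypothetical spending table; supplier names and amounts are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (supplier TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [("Acme Ltd", 1200.0), ("Acme Ltd", 800.0), ("Beta plc", 5000.0),
     ("Gamma & Co", 150.0), ("Gamma & Co", 250.0), ("Gamma & Co", 100.0)],
)

# Count and sum the transactions per supplier, then rank by total spend.
query = """
    SELECT supplier, COUNT(*) AS n_transactions, SUM(amount) AS total
    FROM transactions
    GROUP BY supplier
    ORDER BY total DESC
"""
league_table = conn.execute(query).fetchall()
for supplier, n, total in league_table:
    print(f"{supplier}: {n} transactions, £{total:,.2f}")
```

Ranking by `n_transactions` instead of `total` tells a different story: Gamma & Co has the most transactions but the smallest total.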
  

42	
  
As has already been mentioned, a key part of the journalistic exercise is putting things into context.
When working with data, interpreting what the data says often depends on understanding the context and, more importantly, the caveats that arise by virtue of asking a particular question of a particular dataset that has been collected in a particular way under particular conditions.
That said, given a particular dataset, are there any obvious questions we can ask of it?
  

43	
  
When results are ranked, as in league tables for example, there are often easy stories to be picked up around the top three and bottom three positions. In national rankings, local news stories can be identified if your local schools or council appear at either of those extremes.
For contextualisation purposes, it often makes sense to look at distributions. Many summary statistics report the mean value, but looking at measures of variation, or spread, about the mean, as well as the position of the median value, can often change the context of a story.
If the lecture room has 20 students in it, each living on a £6,000-a-year maintenance loan, the total income is £120,000 and their mean income is £6,000. If there is also an academic on £40,000 in the room, the total income for the room is £160,000, and the mean income of the 21 people is now just a little over £7,500. If we define a poverty level as a mean income below £10,000, the members of the room are, on average, in poverty.
If a senior academic such as a professor, on an income of over £65,000, wanders into the room, the total income goes to over £225,000. With 22 people now in the room, the mean income is over £10,000: the room is out of poverty. The median income, however, is still £6,000.
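The arithmetic above can be checked with Python’s `statistics` module. The only assumption added here is that the professor’s “over £65,000” is taken as exactly £65,000.

```python
from statistics import mean, median

# Annual incomes in pounds, following the lecture-room example.
# Assumption: the professor's "over £65,000" is taken as exactly £65,000.
students = [6_000] * 20
academic = [40_000]
professor = [65_000]

rooms = {
    "20 students": students,
    "+ academic": students + academic,
    "+ professor": students + academic + professor,
}
for label, incomes in rooms.items():
    print(f"{label:12} people={len(incomes):2} total=£{sum(incomes):,} "
          f"mean=£{mean(incomes):,.0f} median=£{median(incomes):,.0f}")
```

The mean moves from £6,000 to about £7,619 and then to about £10,227, while the median stays at £6,000 throughout.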
  
As well as the top, bottom, mean and median, we should also look at outliers. If Bill Gates or Mark Zuckerberg walks into a bar, the mean net worth of the people in that bar is likely to rise to a level of previously unimagined wealth.
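One rough way to flag values like this automatically is Tukey’s fences: anything more than 1.5 times the interquartile range beyond the quartiles is marked as a possible outlier. This is a common rule of thumb swapped in here for illustration, not a method the lecture prescribes, and the net worth figures are hypothetical.

```python
from statistics import quantiles

def flag_outliers(values, k=1.5):
    """Return values lying outside Tukey's fences (k * IQR beyond the quartiles).

    One common rule of thumb, not the only way to define an outlier.
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical net worths (in £) of people in a bar, plus one billionaire.
bar = [20_000, 35_000, 50_000, 42_000, 28_000, 60_000, 31_000, 80_000_000_000]
print(flag_outliers(bar))
```

Only the billionaire is flagged. Because the quartiles ignore the extreme value, the fences stay close to everyone else’s figures, which is exactly why the median is more robust here than the mean.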
  
Here are several reasons why you should pay attention to outliers:
- they may be ‘dirty’ or incorrect data points that need to be corrected and that may well raise questions about data quality;
- the outlier may truly be an outlier, a remarkable point and a story in its own right;
- the outlier may skew other measures, such as mean values or other summary statistics. In such cases, it may make sense to use other measures or to rerun the
  

44	
  
This rather dense graphic is a view over my local council’s spending data as it relates to spend on libraries. The separate charts show the accumulated spend with different suppliers over a period of time. The intention of the display was to provide an at-a-glance view of accumulated spend with different companies across different directorates and spending areas, to see whether any companies had a significant spend compared to other companies.
The table at the bottom shows the top of a league table of companies with the largest accumulated spend, by directorate and expense type.
At first glance, the spend on phone lines with different suppliers seems to outweigh the spend on books. How can that be? Are the librarians spending their time calling premium rate phone lines?
If we guess at 20 libraries and a six-month spend period, and then assume that the phone lines correspond to broadband data bills, do the monthly payments per library still seem outrageous? These assumptions can be tested by putting questions to the relevant authorities, of course, but they demonstrate the care we need to take when trying to understand why a number that appears to be large is that large.
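The back-of-envelope normalisation is simple division, but writing it down makes the assumptions explicit. The defaults of 20 libraries and six months are the guesses from the text, and the £12,000 total is a made-up figure, not one taken from the council data.

```python
def monthly_cost_per_library(total_spend, n_libraries=20, n_months=6):
    """Spread a total spend evenly across libraries and months.

    The defaults are guesses (20 libraries, six months), not confirmed
    figures; both should be checked with the council.
    """
    return total_spend / (n_libraries * n_months)

# Hypothetical total phone-line spend, for illustration only.
print(f"£{monthly_cost_per_library(12_000):,.2f} per library per month")
```

Changing either guess changes the answer proportionally, which is why each assumption is worth checking before deciding whether a figure is outrageous.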
  
See also: Local Council Spending Data – Time Series Charts http://blog.ouseful.info/2013/11/06/local-council-spending-data-time-series-charts/
  

45	
  
As well as looking for outliers, we should also look for similarities between things we expect to be different and differences between things we expect to be the same, or at least, similar.
  

46	
  
Looking again at some of my local council’s spending data, I noticed that a search on “music” pulled back what appeared to be a shift in responsibility between directorates for spend on school music service provision.
An obvious question follows: if the service did change hands (something we can check), was there a resulting difference in the way the directorates were spending? Could we, for example, identify whether any projects got dropped (or at least renamed out of scope!)?
This forensic approach can also be used to track the consequences of a shift in control of a service, if we know it has happened. When a service changes hands, we can make a note of the fact and then, a year on, look for evidence of whether the treatment of the service has changed, at least in terms of its consequences for spending.
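A sketch of the “search, then compare by directorate and year” step in plain Python. The records, directorate names and amounts are invented, and `spend_by_year_and_directorate` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical spending records: (date, directorate, description, amount).
# Directorate names and figures are invented for illustration.
records = [
    ("2012-05-14", "Children's Services", "School music service", 4_000.0),
    ("2012-09-03", "Children's Services", "Music tuition", 2_500.0),
    ("2013-04-22", "Community Wellbeing", "School music service", 3_800.0),
    ("2013-07-01", "Community Wellbeing", "Library books", 900.0),
    ("2013-10-09", "Community Wellbeing", "Music tuition", 1_200.0),
]

def spend_by_year_and_directorate(rows, keyword):
    """Total the spend on rows whose description mentions `keyword`
    (case-insensitive), grouped by (year, directorate)."""
    totals = defaultdict(float)
    for date, directorate, description, amount in rows:
        if keyword.lower() in description.lower():
            totals[(date[:4], directorate)] += amount
    return dict(sorted(totals.items()))

music = spend_by_year_and_directorate(records, "music")
for (year, directorate), total in music.items():
    print(year, directorate, f"£{total:,.2f}")
```

In this toy data the music spend sits with one directorate in 2012 and another in 2013; in real data, a pattern like that is the cue to check whether the service changed hands.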
  
See also: What Role, If Any, Does Spending Data Have to Play in Local Council Budget Consultations? http://blog.ouseful.info/2013/11/03/what-role-if-any-does-spending-data-have-to-play-in-local-council-budget-consultations/
  

47	
  
When asking questions of data, one question can often lead to another.
For example, a query over my local council spending data about amounts spent with the local newspaper, the Isle of Wight County Press, identified a variety of expense types associated with those spending transactions. One such expense type was Advertising & Publicity. This led me to steer the conversation I was having with this expert (data) source on council spending on to a slightly different tack: so who else have you been spending advertising and publicity budgets with?
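The two-step conversation can be sketched as two queries: first ask which expense types are linked to the newspaper, then follow one of those expense types back out to every supplier. The table layout, the other suppliers and all amounts are invented for illustration.

```python
import sqlite3

# Hypothetical spending table; apart from the newspaper, suppliers and
# all amounts are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spend (supplier TEXT, expense_type TEXT, amount REAL)")
conn.executemany("INSERT INTO spend VALUES (?, ?, ?)", [
    ("Isle of Wight County Press", "Advertising & Publicity", 1_500.0),
    ("Isle of Wight County Press", "Subscriptions", 120.0),
    ("Radio Station X", "Advertising & Publicity", 900.0),
    ("Print Shop Y", "Advertising & Publicity", 2_100.0),
    ("Print Shop Y", "Stationery", 300.0),
])

# Step 1: which expense types are associated with the newspaper?
types = [row[0] for row in conn.execute(
    "SELECT DISTINCT expense_type FROM spend WHERE supplier = ? "
    "ORDER BY expense_type",
    ("Isle of Wight County Press",))]

# Step 2: follow one expense type back out to every supplier, ranked by spend.
others = conn.execute(
    "SELECT supplier, SUM(amount) AS total FROM spend "
    "WHERE expense_type = ? GROUP BY supplier ORDER BY total DESC",
    ("Advertising & Publicity",)).fetchall()

print(types)
print(others)
```

Using `?` placeholders rather than pasting names into the SQL string also copes with values such as `Advertising & Publicity` that contain punctuation.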
  	
  

48	
  
If you are in the position of paying energy supply bills – electricity and gas – you’ll probably be familiar with the idea that payments are set so that you tend to overpay on a monthly basis. After collecting the interest on your overpayments, the utility companies may eventually get round to sending you a small repayment to cover the excess (less any interest, of course…).
Is the same true at the council level?
One thing I noticed in my local council’s spend with the supplier Southern Electric was that there appeared to be more than a few “negative payments”. So where were these coming from? The chart shown in this slide has positive payments by date (not on an evenly spaced timeline) in black, and the magnitude of negative payments in red. Where a red triangle sits over a black dot, a positive and a negative payment of the same amount were made on the same day. Why’s that?
Some days show several negative payments – again, what’s happening? There’s not necessarily anything suspicious going on, but what story does this chart appear to tell us, particularly in terms of the similarities in amount between certain positive and negative spends?
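One way to pull out the pattern the chart shows is to match each negative payment against a positive payment of the same size on the same day, and to count negative payments per day. The payments below are hypothetical; with real data, working in whole pence rather than floating-point pounds would make the matching more robust.

```python
from collections import Counter

# Hypothetical payments to one supplier: (date, amount). Negative amounts
# stand for the "negative payments" (refunds or reversals) in the data.
payments = [
    ("2013-01-10", 412.50),
    ("2013-01-10", -412.50),
    ("2013-02-11", 398.00),
    ("2013-03-12", 405.75),
    ("2013-03-12", -12.30),
    ("2013-03-12", -7.45),
]

def matched_reversals(rows):
    """Find (date, amount) pairs where a payment and a negative payment
    of the same magnitude were made on the same day."""
    positives = Counter((d, a) for d, a in rows if a > 0)
    negatives = Counter((d, -a) for d, a in rows if a < 0)
    # Counter intersection keeps the smaller count of each shared key.
    return sorted((positives & negatives).elements())

def negatives_per_day(rows):
    """Count negative payments per day, to spot days with several."""
    return Counter(d for d, a in rows if a < 0)

print(matched_reversals(payments))
print(negatives_per_day(payments).most_common())
```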
  

49	
  
Just by the by, this chart refines the question I’m asking of the spend with Southern Electric, asking for more information about positive and negative payments made on the gas and electricity accounts separately.
  

50	
  
As well as similarities and differences, data can tell us tales about trends…
  

51	
  
Regular releases from the ONS – the Office for National Statistics – provide bread and butter news stories on a regular basis, according to a known schedule.
For example, monthly job seeker figures get a monthly write-up in OnTheWight, the hyperlocal news blog local to me. The report compares the current figures with those from the previous month and from the same month of the previous year, so that we can see how the numbers have changed month on month, and year on year.
I started to explore a simple script that would take data directly from the ONS and produce assets that could be reused in a news story – for example, a table showing the change in figures over recent months.
I also started to explore ways in which we could automate the production of prose from the data [code: https://gist.github.com/psychemedia/7536017]. For example, the following phrase was generated automatically from monthly figures:
The total number of people claiming Job Seeker's Allowance (JSA) on the Isle of Wight in October was 2781, up 94 from 2687 in September, 2013, and down 377 from 3158 in October, 2012.
The words up and down were selected based on a simple if-then rule that compared figures to see which was the greater. The numbers and dates are pulled in from the data. The other words are canned phrases.
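A minimal sketch of the if-then rule and canned phrases described above. This is not the code in the linked gist: `change_phrase` and `jsa_sentence` are hypothetical names, and the figures are the October 2013 numbers quoted in the text.

```python
import calendar

def change_phrase(current, previous):
    """Simple if-then rule: choose 'up' or 'down' by comparing two figures."""
    if current > previous:
        return f"up {current - previous}"
    if current < previous:
        return f"down {previous - current}"
    return "unchanged"

def jsa_sentence(area, year, month, count, prev_month_count, last_year_count):
    """Wrap figures pulled from the data in canned phrases.

    month is 1-12; prev_month_count is the previous month's total and
    last_year_count the same month's total a year earlier.
    """
    prev_month = 12 if month == 1 else month - 1
    prev_month_year = year - 1 if month == 1 else year
    names = calendar.month_name  # English names under the default locale
    return (
        f"The total number of people claiming Job Seeker's Allowance (JSA) "
        f"on {area} in {names[month]} was {count}, "
        f"{change_phrase(count, prev_month_count)} from {prev_month_count} "
        f"in {names[prev_month]}, {prev_month_year}, "
        f"and {change_phrase(count, last_year_count)} from {last_year_count} "
        f"in {names[month]}, {year - 1}."
    )

print(jsa_sentence("the Isle of Wight", 2013, 10, 2781, 2687, 3158))
```

With these inputs the sketch reproduces the sentence quoted above word for word. The January branch handles the one case where the previous month falls in the previous year.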
  
The automated production of text from data has received attention from several companies, particularly in the areas of baseball reports and financial reporting. See for example: http://blog.ouseful.info/2013/05/22/notes-on-narrative-science-and-automated-insight/
  

52	
  
If we plot a line chart with some quantity against a time axis, we can often see increasing or decreasing trends over time. If we are looking for constant rates of increase in some value, it often makes sense to use a log (logarithmic) scale to display that value on the y-axis. Periodic trends can also be seen as ‘waves’ appearing in the line over time, but other displays can draw out periodicity or seasonality in a more visually compelling way.
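Why a log scale suits constant rates of increase: if a value grows by the same proportion r each period, taking logs turns the growth curve into a straight line whose slope depends only on r.

```latex
y_t = y_0 (1 + r)^t
\quad\Longrightarrow\quad
\log y_t = \log y_0 + t \log(1 + r)
```

So on a logarithmic y-axis, a steady 5% monthly rise plots as a straight line with slope log(1.05), whereas a rate of increase that is speeding up or slowing down bends the line.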
  
For example, in these charts – of jobless figures on the Isle of Wight once again – we have months ordered along the horizontal x-axis and the number of Job Seeker’s Allowance claimants on the vertical y-axis. The separate coloured lines represent different years.
On the left, we use a legend to identify the lines; on the right is an example of labelling the lines directly.
The lines show strong seasonality. Because the Isle of Wight is a tourist destination, job seeker figures tend to fall over the summer months. Putting lines for several years on the same axes allows us to compare annual cycles over time.
  

53	
  
Another trend we can try to pull out is change over years for each given month. Here, the horizontal x-axis blocks out the months, as before, but within each month we have an ordered range of years. The line within each block thus represents the year-on-year change in numbers within a given month.
The step change within each month suggests that the way the figures were calculated changed significantly several years ago.
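The month-by-month view can be produced by regrouping a time series so that each month holds its values in year order, then differencing within each month. A sketch in plain Python with invented claimant counts; a large change in the same year across every month is the kind of signature that may point to a change in methodology, though it is not proof of one.

```python
from collections import defaultdict

def by_month(series):
    """Regroup {(year, month): value} into {month: [(year, value), ...]},
    with years in ascending order within each month."""
    grouped = defaultdict(list)
    for (year, month), value in sorted(series.items()):
        grouped[month].append((year, value))
    return dict(grouped)

def year_on_year(grouped):
    """Within each month, the change from each year to the next."""
    return {
        month: [(year, value - prev)
                for (_, prev), (year, value) in zip(pairs, pairs[1:])]
        for month, pairs in grouped.items()
    }

# Hypothetical claimant counts for January and July over three years.
series = {
    (2011, 1): 4200, (2012, 1): 4100, (2013, 1): 3300,
    (2011, 7): 2600, (2012, 7): 2550, (2013, 7): 1900,
}
print(year_on_year(by_month(series)))
```

In this toy data both months show a much larger drop into 2013 than into 2012, the same shape as the step change described above.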
  
Further reading: a good guide to statistics as used by government, including a description of the way that “seasonal adjustments” are handled, is provided by the House of Commons Library’s Statistical Literacy Guide http://www.parliament.uk/business/publications/research/briefing-papers/SN04944/statistical-literacy-guide
  

54	
  
As well as the patterns we can see over time by plotting data against a time axis, we can also look for patterns in space…
  

55	
  
In part because they are so recognisable to the majority of people as an idea as well as an artefact, maps are widely used in many publications.
I have already mentioned how the use of a map to compare travel claims by MPs based on their constituency locations provided a way of making a particular sort of comparison between MPs (in particular, a comparison based on geographical location).
But we can take the idea of a map more generally, as a spatial distribution of points that are related in some way, with strong relations represented as spatial proximity. Things that are close together on the page are taken to be close together in some sort of space, a space which may be conceptual or social, not just (or not even) geographic.
  

56	
  
Take this map, for example, a map of Twitter users commonly followed by a sample of followers of @UL_journalism.
The map has been laid out so that Twitter users who are heavily interlinked are grouped closely together (for the most part, at least). A network statistic has been used in an attempt to colour clusters of nodes with high interconnection. The coloured regions thus represent a first attempt at identifying different groupings of Twitter users. You will note how the spatial layout algorithm and the grouping/colouring algorithm complement each other well – they both seem to tell a similar story, where the story is that certain groups of individuals are somehow alike.
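The notes don’t name the network statistic used for colouring. Modularity is one common choice (tools such as Gephi offer modularity-based community detection), so here is a sketch that scores how well a given grouping separates a network, treating follow links as undirected ties. It only scores a grouping; it does not search for the best one, and the toy network is invented.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman-Girvan modularity of a grouping of a simple undirected graph.

    edges: list of (u, v) pairs; community: dict mapping node -> group label.
    Q = sum over groups of (internal edges / m) - (group degree / 2m) ** 2.
    """
    m = len(edges)
    internal = defaultdict(int)  # edges with both ends in the group
    degree = defaultdict(int)    # total degree of the nodes in the group
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Toy network: two tightly knit triangles joined by a single tie.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
two_groups = {n: (0 if n in "abc" else 1) for n in "abcdef"}
one_group = {n: 0 for n in "abcdef"}
print(round(modularity(edges, two_groups), 3), round(modularity(edges, one_group), 3))
```

Splitting the network into its two triangles scores about 0.357, while lumping everyone together scores 0: higher values mean denser connections within groups than chance would suggest.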
  
About the technique: http://schoolofdata.org/2014/02/14/mapping-social-positioning-on-twitter/
Let’s have a closer look at some of the regions…
  

57	
  
This area seems to be Twitter accounts that relate in large part to the University of Lincoln and its related organisations and activities.
  

58	
  
This area of the map contains accounts associated with Lincoln more generally. Such a map may be useful for identifying companies that are used by students, and as such may provide useful leads for advertising agents looking to sell adverts in university magazines or poster areas.
  

59	
  
This area of the map actually conflates several different groupings, at least on my reading of it. In fact, it may make sense to try to find clusters within this group on its own and then recolour accordingly.
So what groups can I see? Bottom left, there look to be Lincoln local media outlets. Moving counter-clockwise between the 6 and 3 o’clock positions, we see photography-related users moving up into celebrities. As we move further up towards the twelve o’clock position, we see news sites, both “popular” and more industry-related (@journalismnews, for example).
That there does not appear to be a strong independent cluster of journalists and industry-related sites suggests that, from the sampled followers of @UL_journalism at least, there isn’t necessarily a very strong notion of following these industry lights…
  

60	
  
One of the things to mention about data mapping and visualisation techniques is that they often tell us things we already (think we) know; in that sense, they are not news. But they may also tell us things we know in new, visually appealing ways. And by making use of such ‘confirmatory’ visualisations and displays, we can build confidence within an audience that they know how to interpret these sorts of representation.
  

61	
  
As the audience becomes comfortable reading the charts and making sense of data, anything new or surprising in the data manifests itself as a surprise in the reading of the data or chart.
For journalists working with data, developing a sense of familiarity with how to read and interpret data when it is just confirming what you already know helps to refine your senses for spotting things that are odd, noteworthy, or newsworthy.
Taking a little bit of time each day to:
- read charts as if they were stories;
- look behind the data to find original sources, such as polls or news releases containing data, and then compare the original release with the way it is reported, paying particular attention to the points that are highlighted and how the data is contextualised;
will help you develop some of the skills you will need if you want to be able to identify, develop and treat some of the stories that your specialist source that is data can provide you with, if only you ask…
  	
  

62	
  
And finally, a couple of handy books and resources on data journalism if you’re interested in reading more generally around the subject…
  

63	
  

 
NACHR Conf Presentation Headhunting in Healthcare - Free Tools Greg Hawkes
NACHR Conf Presentation Headhunting in Healthcare - Free Tools Greg HawkesNACHR Conf Presentation Headhunting in Healthcare - Free Tools Greg Hawkes
NACHR Conf Presentation Headhunting in Healthcare - Free Tools Greg Hawkes
 
Deploying Next Generation Firewalling with ASA - CX
Deploying Next Generation Firewalling with ASA - CXDeploying Next Generation Firewalling with ASA - CX
Deploying Next Generation Firewalling with ASA - CX
 
TAG Recruiting JAN2015 - google cse (steve rath)
TAG Recruiting JAN2015 - google cse (steve rath)TAG Recruiting JAN2015 - google cse (steve rath)
TAG Recruiting JAN2015 - google cse (steve rath)
 
In-Depth with Local SEO
In-Depth with Local SEOIn-Depth with Local SEO
In-Depth with Local SEO
 
Hothouse: CX Design in a Big Company
Hothouse: CX Design in a Big CompanyHothouse: CX Design in a Big Company
Hothouse: CX Design in a Big Company
 
How to Benchmark Your Online Customer Experience Against Competition
How to Benchmark Your Online Customer Experience Against CompetitionHow to Benchmark Your Online Customer Experience Against Competition
How to Benchmark Your Online Customer Experience Against Competition
 

Similaire à An Introduction to Data Journalism

An Introduction to Data Visualization
An Introduction to Data VisualizationAn Introduction to Data Visualization
An Introduction to Data VisualizationNupur Samaddar
 
Platforms and Analytical Gestures
Platforms and Analytical GesturesPlatforms and Analytical Gestures
Platforms and Analytical GesturesBernhard Rieder
 
591 Final Report - Team 7 - Political Issues
591 Final Report - Team 7 - Political Issues591 Final Report - Team 7 - Political Issues
591 Final Report - Team 7 - Political IssuesTim Sawicki
 
Denver Event - 2013 - Floodlight and Data Engine User Survey
Denver Event - 2013 - Floodlight and Data Engine User SurveyDenver Event - 2013 - Floodlight and Data Engine User Survey
Denver Event - 2013 - Floodlight and Data Engine User SurveyKDMC
 
Community profiling presenting your infomation
Community profiling  presenting your infomationCommunity profiling  presenting your infomation
Community profiling presenting your infomationTim Curtis
 
Data Literacy in Public Relations by the PRCA Innovation Forum.pdf
Data Literacy in Public Relations by the PRCA Innovation Forum.pdfData Literacy in Public Relations by the PRCA Innovation Forum.pdf
Data Literacy in Public Relations by the PRCA Innovation Forum.pdfJames
 
Please accept this assignment 25 pages minimum double space courie.docx
Please accept this assignment 25 pages minimum double space courie.docxPlease accept this assignment 25 pages minimum double space courie.docx
Please accept this assignment 25 pages minimum double space courie.docxrandymartin91030
 
Activity to Explore Community Demographics
Activity to Explore Community DemographicsActivity to Explore Community Demographics
Activity to Explore Community DemographicsEveryday Democracy
 
How Big Data Deep Analysis and Agile SQL Querying Give 2016 Campaigners an Ed...
How Big Data Deep Analysis and Agile SQL Querying Give 2016 Campaigners an Ed...How Big Data Deep Analysis and Agile SQL Querying Give 2016 Campaigners an Ed...
How Big Data Deep Analysis and Agile SQL Querying Give 2016 Campaigners an Ed...Dana Gardner
 
Frontiers of Computational Journalism week 8 - Visualization and Network Anal...
Frontiers of Computational Journalism week 8 - Visualization and Network Anal...Frontiers of Computational Journalism week 8 - Visualization and Network Anal...
Frontiers of Computational Journalism week 8 - Visualization and Network Anal...Jonathan Stray
 
Add a section to the paper you submittedIt is based on the paper (.docx
Add a section to the paper you submittedIt is based on the paper (.docxAdd a section to the paper you submittedIt is based on the paper (.docx
Add a section to the paper you submittedIt is based on the paper (.docxdaniahendric
 
Social Networking Facebook My Space
Social Networking Facebook My SpaceSocial Networking Facebook My Space
Social Networking Facebook My Spaceannesunita
 
Gov Transformation Through Public Data
Gov Transformation Through Public DataGov Transformation Through Public Data
Gov Transformation Through Public DataW. David Stephenson
 
Sample Text Citation Mla T. Online assignment writing service.
Sample Text Citation Mla T. Online assignment writing service.Sample Text Citation Mla T. Online assignment writing service.
Sample Text Citation Mla T. Online assignment writing service.Michelle Brown
 

Similaire à An Introduction to Data Journalism (20)

An Introduction to Data Visualization
An Introduction to Data VisualizationAn Introduction to Data Visualization
An Introduction to Data Visualization
 
Platforms and Analytical Gestures
Platforms and Analytical GesturesPlatforms and Analytical Gestures
Platforms and Analytical Gestures
 
591 Final Report - Team 7 - Political Issues
591 Final Report - Team 7 - Political Issues591 Final Report - Team 7 - Political Issues
591 Final Report - Team 7 - Political Issues
 
Denver Event - 2013 - Floodlight and Data Engine User Survey
Denver Event - 2013 - Floodlight and Data Engine User SurveyDenver Event - 2013 - Floodlight and Data Engine User Survey
Denver Event - 2013 - Floodlight and Data Engine User Survey
 
Community profiling presenting your infomation
Community profiling  presenting your infomationCommunity profiling  presenting your infomation
Community profiling presenting your infomation
 
Data Literacy in Public Relations by the PRCA Innovation Forum.pdf
Data Literacy in Public Relations by the PRCA Innovation Forum.pdfData Literacy in Public Relations by the PRCA Innovation Forum.pdf
Data Literacy in Public Relations by the PRCA Innovation Forum.pdf
 
Onalytica WP
Onalytica WPOnalytica WP
Onalytica WP
 
Mmis 61 so2016
Mmis 61 so2016Mmis 61 so2016
Mmis 61 so2016
 
Please accept this assignment 25 pages minimum double space courie.docx
Please accept this assignment 25 pages minimum double space courie.docxPlease accept this assignment 25 pages minimum double space courie.docx
Please accept this assignment 25 pages minimum double space courie.docx
 
Activity to Explore Community Demographics
Activity to Explore Community DemographicsActivity to Explore Community Demographics
Activity to Explore Community Demographics
 
Tallink
TallinkTallink
Tallink
 
How Big Data Deep Analysis and Agile SQL Querying Give 2016 Campaigners an Ed...
How Big Data Deep Analysis and Agile SQL Querying Give 2016 Campaigners an Ed...How Big Data Deep Analysis and Agile SQL Querying Give 2016 Campaigners an Ed...
How Big Data Deep Analysis and Agile SQL Querying Give 2016 Campaigners an Ed...
 
NLP journal paper
NLP journal paperNLP journal paper
NLP journal paper
 
Frontiers of Computational Journalism week 8 - Visualization and Network Anal...
Frontiers of Computational Journalism week 8 - Visualization and Network Anal...Frontiers of Computational Journalism week 8 - Visualization and Network Anal...
Frontiers of Computational Journalism week 8 - Visualization and Network Anal...
 
Add a section to the paper you submittedIt is based on the paper (.docx
Add a section to the paper you submittedIt is based on the paper (.docxAdd a section to the paper you submittedIt is based on the paper (.docx
Add a section to the paper you submittedIt is based on the paper (.docx
 
Social Networking Facebook My Space
Social Networking Facebook My SpaceSocial Networking Facebook My Space
Social Networking Facebook My Space
 
Sociological Methodology
Sociological MethodologySociological Methodology
Sociological Methodology
 
Gov Transformation Through Public Data
Gov Transformation Through Public DataGov Transformation Through Public Data
Gov Transformation Through Public Data
 
Sample Text Citation Mla T. Online assignment writing service.
Sample Text Citation Mla T. Online assignment writing service.Sample Text Citation Mla T. Online assignment writing service.
Sample Text Citation Mla T. Online assignment writing service.
 
Big6 intro
Big6 introBig6 intro
Big6 intro
 

Plus de Tony Hirst

15 in 20 research fiesta
15 in 20 research fiesta15 in 20 research fiesta
15 in 20 research fiestaTony Hirst
 
Jupyternotebooks ou.pptx
Jupyternotebooks ou.pptxJupyternotebooks ou.pptx
Jupyternotebooks ou.pptxTony Hirst
 
Virtual computing.pptx
Virtual computing.pptxVirtual computing.pptx
Virtual computing.pptxTony Hirst
 
ouseful-parlihacks
ouseful-parlihacksouseful-parlihacks
ouseful-parlihacksTony Hirst
 
Gors appropriate
Gors appropriateGors appropriate
Gors appropriateTony Hirst
 
Gors appropriate
Gors appropriateGors appropriate
Gors appropriateTony Hirst
 
Robotlab jupyter
Robotlab   jupyterRobotlab   jupyter
Robotlab jupyterTony Hirst
 
Community Journalism Conf - hyperlocal data wire
Community Journalism Conf - hyperlocal data wireCommunity Journalism Conf - hyperlocal data wire
Community Journalism Conf - hyperlocal data wireTony Hirst
 
Residential school 2015_robotics_interest
Residential school 2015_robotics_interestResidential school 2015_robotics_interest
Residential school 2015_robotics_interestTony Hirst
 
Data Mining - Separating Fact From Fiction - NetIKX
Data Mining - Separating Fact From Fiction - NetIKXData Mining - Separating Fact From Fiction - NetIKX
Data Mining - Separating Fact From Fiction - NetIKXTony Hirst
 
A Quick Tour of OpenRefine
A Quick Tour of OpenRefineA Quick Tour of OpenRefine
A Quick Tour of OpenRefineTony Hirst
 
Conversations with data
Conversations with dataConversations with data
Conversations with dataTony Hirst
 
Data reuse OU workshop bingo
Data reuse OU workshop bingoData reuse OU workshop bingo
Data reuse OU workshop bingoTony Hirst
 
Inspiring content - You Don't Need Big Data to Tell Good Data Stories
Inspiring content - You Don't Need Big Data to Tell Good Data Stories Inspiring content - You Don't Need Big Data to Tell Good Data Stories
Inspiring content - You Don't Need Big Data to Tell Good Data Stories Tony Hirst
 
Lincoln Journalism Research Day - Data Journalism
Lincoln Journalism Research Day - Data JournalismLincoln Journalism Research Day - Data Journalism
Lincoln Journalism Research Day - Data JournalismTony Hirst
 

Plus de Tony Hirst (20)

15 in 20 research fiesta
15 in 20 research fiesta15 in 20 research fiesta
15 in 20 research fiesta
 
Dev8d jupyter
Dev8d jupyterDev8d jupyter
Dev8d jupyter
 
Ili 16 robot
Ili 16 robotIli 16 robot
Ili 16 robot
 
Jupyternotebooks ou.pptx
Jupyternotebooks ou.pptxJupyternotebooks ou.pptx
Jupyternotebooks ou.pptx
 
Virtual computing.pptx
Virtual computing.pptxVirtual computing.pptx
Virtual computing.pptx
 
ouseful-parlihacks
ouseful-parlihacksouseful-parlihacks
ouseful-parlihacks
 
Gors appropriate
Gors appropriateGors appropriate
Gors appropriate
 
Gors appropriate
Gors appropriateGors appropriate
Gors appropriate
 
Robotlab jupyter
Robotlab   jupyterRobotlab   jupyter
Robotlab jupyter
 
Community Journalism Conf - hyperlocal data wire
Community Journalism Conf - hyperlocal data wireCommunity Journalism Conf - hyperlocal data wire
Community Journalism Conf - hyperlocal data wire
 
Residential school 2015_robotics_interest
Residential school 2015_robotics_interestResidential school 2015_robotics_interest
Residential school 2015_robotics_interest
 
Data Mining - Separating Fact From Fiction - NetIKX
Data Mining - Separating Fact From Fiction - NetIKXData Mining - Separating Fact From Fiction - NetIKX
Data Mining - Separating Fact From Fiction - NetIKX
 
Week4
Week4Week4
Week4
 
A Quick Tour of OpenRefine
A Quick Tour of OpenRefineA Quick Tour of OpenRefine
A Quick Tour of OpenRefine
 
Conversations with data
Conversations with dataConversations with data
Conversations with data
 
Data reuse OU workshop bingo
Data reuse OU workshop bingoData reuse OU workshop bingo
Data reuse OU workshop bingo
 
Inspiring content - You Don't Need Big Data to Tell Good Data Stories
Inspiring content - You Don't Need Big Data to Tell Good Data Stories Inspiring content - You Don't Need Big Data to Tell Good Data Stories
Inspiring content - You Don't Need Big Data to Tell Good Data Stories
 
Lincoln Journalism Research Day - Data Journalism
Lincoln Journalism Research Day - Data JournalismLincoln Journalism Research Day - Data Journalism
Lincoln Journalism Research Day - Data Journalism
 
Calrg14 tm351
Calrg14 tm351Calrg14 tm351
Calrg14 tm351
 
Calrg14 tm351
Calrg14 tm351Calrg14 tm351
Calrg14 tm351
 

Dernier

Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991RKavithamani
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 

Dernier (20)

Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 

An Introduction to Data Journalism

  • 1. One take on what data journalism may or may not be… a lecture presented to journalism students at the University of Lincoln, UK, February 2014.
  • 2. Let’s start with an easy(?!) question: what is journalism? One way of answering that question is to list some of the functions, or attributes, associated with it – informing, educating, holding to account, the watchdog function, campaigning, contextualising.
  • 3. Sensemaking seems to me to be an important part of it… In part contextualisation, in part identifying the bits that make the difference, the bits that make it important, the bits that make it news that people need to know.
  • 4. Second question: what is data? National statistics, sports results, polls, financial figures, health data, school league tables, and so on. Is a book data? Or a speech? What if I split a speech up into separate words, count the occurrences of each unique word and then display the result as a “tag cloud”, or word frequency diagram?
  • 5. One way of thinking about data is that it is a particular sort of source, or a source that can respond to a particular style of questioning in a particular way. Another take on this is that many “data sources” are experts on a particular topic, experts that know a lot about a very particular class of facts.
  • 6. So what is data journalism? One way is to think of it as a process, as exemplified by Paul Bradshaw’s inverted pyramid of data journalism. I see it more as a conversation in which data is one of the conversants. The conversational view also allows us to think about process, but more important, for me, is that in a conversation, it gets personal…
  • 7. The inverted pyramid gives us one way of considering the data journalistic process, or at least identifying some of the steps involved in a data investigation. But there are many other ways of conceptualising the process, for example, finding stories and telling stories…
  • 8. When it comes to finding stories, do we: a) want to find stories in a dataset we are provided with, or b) use data to help draw out a story lead we have already been tipped off to?
  • 9. One of the ways I like to work with data is to have a conversation with it – asking questions of it and then further questions based on the responses I get.
  • 10. Sometimes it looks at first as if we have data in a form where we might be able to do something with it – then we realise it needs cleaning and reshaping. For example, in this case we have percentage signs contaminating numbers and data organised in separate sections. How do we get a “well behaved” view over data from all the wards? And there are different sorts of data: votes polled per candidate versus the size of the electorate in a particular ward, for example. Walkthrough: http://blog.ouseful.info/2013/05/03/a-wrangling-example-with-openrefine-making-ready-data/
  • 11. One of the first datasets I played with was MPs’ expenses data. Here are a couple of ways I started to cha… The bar chart is ordered, for a particular expenses area, by total amount for each individual MP. The block histogram shows how many MPs made a total claim in a particular expenses area of a particular…
  • 12. A scatterplot is another very powerful sort of chart – we can plot two sorts of value against each other to… Some scatterplot tools allow you to size or colour nodes according to further dimensions. Colouring node…
  • 13. Maps can be used to pull out different sorts of relationships. For example, by plotting markers in the centre of each MP’s ward, coloured by the total value of travel expenses claimed in a particular area, we can easily see whether or not an MP is claiming an amount significantly different to MPs in neighbouring wards. In this case (travel expenses) we might expect, at first glance at least, a homophilic effect: shouldn’t folk a similar distance away from Westminster presumably make similar sorts of travel claim? At second glance, we might then start to refine our questioning. Does ward size (in terms of geographical area) or rurality have an effect? Does an MP travel to and from home more than neighbours (or perhaps claim more in terms of accommodation in London)?
  • 14. Sometimes we need to provide quite a lot of explanation when it comes to making sense of even a simple data visualisation: “what am I supposed to be looking at?”
  • 15. Contextualisation can take many forms. Trinity Mirror Group have a data unit that produces partially packaged data stories and lines for regional titles, who can then add local colour, knowledge, interpretation and spin to the resulting story.
  • 16. For many readers, it may be that data ONLY makes sense when appropriately contextualised. In passing, it’s also worth noting that the data you don’t collect sometimes affects the interpretation of the data you do… For example: http://www.open.edu/openlearn/science-maths-technology/mathematics-and-statistics/statistics/diary-data-sleuth-when-the-data-you-dont-collect-affects-the-data-you-do
  • 17. In passing, it’s worth mentioning that one thing statistics does is help provide context. Is this number a big number in the greater scheme of things? Is this thing likely to happen by chance, or is there a meaningful causal relationship between this thing and another thing? The chart in the corner is a reminder of how surprising probabilities can be. It shows the probability (y-axis) that at least two people in a group share a birthday, for a given number of people (x-axis). If there are 23 or more people in a room, there is more than a 50/50 chance that two of them will share a birthday (that is, share the same birth day and month, though not necessarily the same birth year). How many people are in the room? If it’s 23 or more, I bet that at least two people share a birthday (at least in terms of day and month).
  • 18. The other way of using data is to tell stories. But what does that even mean…?
  • 19. A common source of stories based on data is polls: either polls commissioned by a publisher with a view to generating a story, or polls commissioned by a lobbying group or PR firm to promote not only stories around a particular issue, but stories that follow a line favourable to the organisation that commissioned the poll (or detrimental to positions that whoever commissioned the poll is campaigning against). When presented with a press release written around a poll commissioned by a PR company, look to the raw data to see where the numbers that appear in the press release quotes actually come from. In the example above, I could claim that 96% of people (creative reading of the numbers) did not appear to disagree with the idea that press behaviour should be independently regulated (creative reading of the question; the repeated negatives also further obscure what is, or isn’t, actually being claimed…). And when reading raw results, or quoting from them, take care which numbers you quote. Sometimes the presentation of the results can lead you to misread them or the way they add up. Sometimes two or more polls may be commissioned around the same topic and appear to give contradictory results. For an example of this, see: http://www.open.edu/openlearn/science-maths-technology/mathematics-and-statistics/statistics/two-can-play-game-when-polls-collide
  • 20. Many polling organisations publish press releases featuring “highlight” results from a poll. The more reputable ones also publish copies of the poll or survey questions and the results that were returned. YouGov polls often split results down by political persuasion or newspaper preference, as well as demographically segmenting responses by gender, age or region. The majority of polling organisations publish the data via PDFs rather than “as data” (for example, in the form of spreadsheet data tables). Tools such as Tabula (URL) are making it increasingly easy to extract the data contained within PDFs into actual data tables. Your local techie should also be able to “scrape” the data from a PDF document and put it into a data format. For examples of how to scrape data as well as images from PDF documents, see: - scraping data tables from PDFs: - extracting images from PDFs: Even if you feel as if you can’t do this yourself, you should make yourself aware of what is possible and achievable by people who have the skills to perform these tasks.
  • 21. Stephen Few has written several excellent books about creating data visualisations and data dashboards, although you shouldn't necessarily believe everything he says! This quote gets across the idea that just as we use emphasis and tone in written communication, we can and should also make use of emphasis and tone in charts.
Many newspapers are starting to use charts that show several datapoints (for example, several bars in a bar chart) but highlight the one or two that are the focus of a particular storyline, the other points or bars being used to provide context.
In chart design, "less is more" often works (this reflects a principle attributed to data visualisation guru Edward Tufte of using "least ink" when creating charts).
  • 22. This video, showing part of a lecture by science fiction writer Kurt Vonnegut, shows how simple lines can tell archetypal stories. Note how the narration sets the scene: the axes are explained, then the line is constructed. When the x-axis represents time, remember that someone riding the line as it was constructed does not necessarily know what the future holds. When you see a line chart with time on the x-axis, remember that it shows a trace of a story that unfolded over time.
Another powerful example of this can be found on YouTube: search for house price rollercoaster to find an animation in which house prices over time are visualised as an animated roller coaster ride…
  • 23. This second clip shows Hans Rosling, the Swedish health statistician made famous by his "data performances", narrating an animated data visualisation rendered using a dynamic bubble chart technique that he popularised via his Gapminder website.
Note how the first 30 seconds of the clip are spent explaining the set-up of the chart: what the axes mean, what the bubbles represent. When you see a rich data-driven interactive on a website, how much coaching and contextualisation is provided to help the user/reader make sense of it?
If you turn the sound off on the Rosling clip, how much sense do the moving bubbles make in terms of the story they tell without the benefit of Rosling's narration? Can you tell where to focus your attention to pull out a meaningful storyline? Are there many possible storylines that can be pulled out? What tricks does Rosling use to focus your attention on, and illustrate, the story he is telling? Is there any sleight of hand in not commenting on what some of the other bubbles are doing? (Is he using, or could he potentially use, misdirection to focus your attention away from possible stories he does not want you to pull out of the data?)
For more examples of Rosling's compelling performances, see the recent OU/BBC Two co-production "Don't Panic: The Truth About Population Change", available on the Gapminder website: http://www.gapminder.org/videos/dont-panic-the-facts-about-population/
  • 24. Few suggests that graphical communication requires stylistic devices that emphasise particular aspects of a graphic. Hans Rosling achieves this by pointing to items of interest and by reinforcing that emphasis both with his narration and with overlays on the graphic itself.
So how can we go about drawing emphasis within a static graphic or chart, such as one might find in a print publication?
  • 25. To show one way of emphasising particular elements of a graphic, let's produce a quick chart of our own.
The first thing we need is some data. I'm going to use data from the Winter Olympics: a grab of the medal table from the end of the first week of the 2014 games. The question I want to explore is how the leaders of the medal table, ranked by number of gold medals awarded, compare with a ranking in which the table is ordered by the total number of medals awarded.
The data I'm going to use comes from a Wikipedia page, where the medal table is contained within an HTML table. To get the data out of the page we are going to screenscrape the HTML table that contains it. There are a variety of tools for doing this, from browser extensions, to scraper applications such as import.io, to environments such as Scraperwiki that provide a range of developer tools configured to support screenscraping-based data collection.
But the tool I'm going to use is…
  • 26. …Google (spread)sheets, and in particular a formula that will import a particular HTML table (in this case, the 2nd table in the page) from a specified URL, in this case the URL of the Wikipedia page containing the medal table.
The formula?
=importhtml("URL","table",tableNumber)
On entering the formula, the spreadsheet will pull the data in from the Wikipedia page and make it available as spreadsheet data.
We can now use the spreadsheet to create charts within the sheet itself. If the data in the Wikipedia page is updated, the data in the spreadsheet will be updated whenever the spreadsheet is refreshed.
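For anyone who would rather do the same thing in a script, here is a minimal Python sketch of what the importhtml formula does: parse a page and return its nth table, counted from 1 as in the formula. It uses only the standard library's html.parser; the function name import_html_table is hypothetical, nested tables and cells with omitted closing tags are not handled, and fetching the page (for example with urllib.request.urlopen) is left out so that the example runs offline.

```python
from html.parser import HTMLParser


class _TableCollector(HTMLParser):
    """Collects each <table> in a page as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []     # tables in document order
        self._table = None   # table currently being filled
        self._row = None     # cells of the current <tr>
        self._cell = None    # text fragments of the current <td>/<th>

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._table = []
            self.tables.append(self._table)
        elif tag == "tr" and self._table is not None:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the cell's text and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self._table.append(self._row)
            self._row = None
        elif tag == "table":
            self._table = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def import_html_table(html, table_number):
    """Return the table_number-th table (1-based) in html as a list of rows."""
    parser = _TableCollector()
    parser.feed(html)
    return parser.tables[table_number - 1]
```

Called with the page source and 2, this returns the second table as a list of rows, ready to write out as CSV for a charting tool.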
  • 27. Whilst we could generate charts within the spreadsheet, I'm actually going to use an online tool called datawrapper (available at datawrapper.de).
Datawrapper charts are starting to appear in many online news reports, such as those published by the Guardian and Trinity Mirror's ampp3d, so being familiar with this tool, and what you can do with it, could be a useful skill to have.
To get the data into datawrapper you can upload a CSV file, or paste a copy of the data into the upload area. I've taken the latter approach, highlighting and copying the table from the spreadsheet and then pasting it into datawrapper.
  • 28. Having uploaded the data, we can configure several properties for each column. In many cases datawrapper should be able to detect what sort of content each column contains (for example, whether it is a number or a text field).
If necessary, we can apply a limited amount of processing to the contents of a specified column. We can also choose to hide one or more columns from the displayed view. In this case, I am going to hide the Rank, Silver and Bronze columns.
  • 29. We now get to choose the chart type: I'm going to go for a horizontal bar chart and select the default datawrapper style.
  • 30. Different chart types have different configuration options. I'm going to choose to automatically sort the bars based on the selected value. Notice the buttons in the chart that allow us to select whether to display the Gold medal count or the Total medal count.
  • 31. Now we get to add some emphasis (remember emphasis? This is an example of how to show emphasis in a chart…).
In this case, I'm going to emphasise the top 2 positions in the Gold medal ranking: the "point" of the piece is to explore the extent to which these positions hold, or don't hold, when we rank the table by total medal count.
At this point, we can also give the chart a title, and add some provenance information describing and pointing to the source of the data.
  • 32. Here's an example of the final chart, with the ranking (automatically) sorted according to total medal count. Note how the order and positioning of the two highlighted countries have changed.
The difference is further emphasised by the animation used when switching between the Gold and Total counts: the highlighted bars draw the eye and let you see better how their relative positions change across the two ranking schemes.
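The re-ranking that the animated chart makes visible can also be checked directly. The Python sketch below uses made-up medal counts (not the actual 2014 table) and compares each nation's position when ranked on golds, with silver and bronze as tie-breakers (mimicking a common medal-table convention), against its position when ranked on total medals.

```python
# Made-up medal counts for illustration only: nation -> (gold, silver, bronze).
medals = {
    "Nation A": (4, 1, 2),
    "Nation B": (3, 4, 5),
    "Nation C": (2, 2, 1),
    "Nation D": (1, 3, 6),
}


def rank(table, key):
    """Return nation names ordered best-first according to key(medal_tuple)."""
    return sorted(table, key=lambda nation: key(table[nation]), reverse=True)


# Tuples compare element by element: gold first, then silver, then bronze.
by_gold = rank(medals, key=lambda counts: counts)
# Total medals: the height of a stacked gold + silver + bronze column.
by_total = rank(medals, key=sum)

# Report how the two highlighted nations move between the ranking schemes.
for nation in ("Nation A", "Nation B"):
    print(f"{nation}: #{by_gold.index(nation) + 1} on golds, "
          f"#{by_total.index(nation) + 1} on total")
```

With these invented numbers, Nation A drops from first on golds to third on total, while Nation B rises from second to first: exactly the kind of shift the highlighted bars are meant to draw the eye to.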
  • 33. Having created the chart, you can now save it to your datawrapper account. An embed code for the chart is provided so that you can embed the chart within your own web page.
  • 34. Bar charts are a very effective way of displaying particular sorts of information, such as counts. But what other ways are there of displaying data?
  • 35. datawrapper provides a variety of chart types, including:
- horizontal and vertical (column) bar charts;
- grouped bars that collate different bars according to groups (for example, election-on-election percentage of the vote for different political parties);
- stacked column charts (for example, for a selection of countries we could display a column showing the total number of medals, constructed by stacking the individual gold, silver and bronze medal counts for those countries);
- line charts, which are widely used for plotting some value on the vertical y-axis against time on the horizontal x-axis;
- pie charts, to show proportions of a whole, and variants thereof, such as the donut chart (a pie chart with the middle cut out);
- simple data tables (never underestimate the power of a table: tables can be really useful for showing specific values, and can be very powerful when they let the user sort by ascending or descending values in particular columns);
- maps, which, as we shall see, can draw out very powerful relationships across data elements.
  • 36. We've also seen some other "basic" charts that can be useful for displaying the distribution of data elements:
- the block histogram shows, on the y-axis, a count of the data elements falling within particular ranges of values on the x-axis;
- the scatterplot allows us to plot two values against each other, for example height versus weight. These charts can provide us with clues about possible correlations or relationships between the two values. Some scatterplot tools further allow us to colour each point according to group membership, so that we can see whether points are clustered according to that membership.
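The counting behind a block histogram is simple enough to sketch by hand. This Python function (a hypothetical helper, not taken from any particular charting tool) assigns each value to a fixed-width range and counts the values per range, keeping empty ranges so that gaps in the distribution stay visible.

```python
from collections import Counter


def block_histogram(values, bin_width, start=0):
    """Count values in each range [lower, lower + bin_width), keyed by lower edge.

    Ranges run from the lowest to the highest occupied bin; empty ranges get a
    count of 0 so gaps in the distribution are not hidden. values must be non-empty.
    """
    bins = Counter(int((v - start) // bin_width) for v in values)
    return {
        start + i * bin_width: bins[i]
        for i in range(min(bins), max(bins) + 1)
    }


# Illustrative values only.
print(block_histogram([3, 7, 12, 14, 18, 35], bin_width=10))
```

This prints `{0: 2, 10: 3, 20: 0, 30: 1}`: the zero count for the 20 to 30 range is what shows up as a gap in the chart.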
  • 37. Visualising data is a powerful way of asking questions of data: which data points you choose to display, and how you display them, represent the framing of the question. What the data looks like is the response, but a response that often takes careful reading. The data source has drawn you the answer; you need to turn it into words that you can use to formulate further questions to check your understanding of the answer first provided. (Each question, each chart, typically leads to another… or to more than one other…)
Asking questions that have a graphical answer is one way of querying a data source, but are there other approaches?
Let's explore that a little more: what do we mean by asking questions of data?
  • 38. A database that most of us use every day is the Google web search engine. We put in a key term or phrase and Google finds the web pages it deems most relevant to our query, ranked according to a variety of criteria (and who you actually are may well affect that ranking).
Sometimes we may know which websites we actually want to search over. Google Custom Search Engines provide one way of defining your own search engine that searches over just the part of the web you are interested in.
One of the custom search engines I have developed searches over websites that act as wire services for press releases: https://www.google.com/cse/publicurl?cx=016419300868826941330:wvfrmcn2oxc
This allows us to track down the source of many a news item and explore the extent to which a given news story has simply churned a press release.
See also: http://blog.ouseful.info/2014/02/06/polling-the-news/ This post also describes how to create a bookmarklet that allows you to highlight a quote in a news report and search for press releases that contain that quote.
  • 39. Here's an example of the search engine in action: I've used a bookmarklet that takes a highlighted quote from a news story and passes it to the custom search engine, allowing me to easily see the source of the quote, and the story itself.
I've also started defining another related custom search engine that allows us to search news sites and polling companies for stories about, and sources of, polls and surveys: https://www.google.com/cse/publicurl?cx=016419300868826941330:ewbi9skvnmq
  • 40. Custom search engines are a powerful tool for helping us develop focused web search tools that limit results to a particular part of the web we are interested in, either by location or topic.
We can also use (advanced) search limits in 'everyday' queries on the major web search engines.
For example, the query shown on this slide searches for the word underspend appearing in Excel spreadsheets (filetype:xls) that can be found on UK government websites (or, more specifically, websites hosted on the gov.uk domain (site:gov.uk)).
Another query limit combination I have found useful is:
confidential filetype:ppt
This can turn up presentations that were delivered at closed corporate events but have leaked on to the web…
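Search limits are just extra terms in the query string, so they can also be assembled programmatically, for example to generate a batch of queries across several sites or filetypes. A minimal Python sketch, assuming Google's standard https://www.google.com/search?q=... URL form; the helper name search_url is hypothetical.

```python
from urllib.parse import urlencode


def search_url(terms, filetype=None, site=None):
    """Build a web search URL that combines free-text terms with search limits."""
    query = [terms]
    if filetype:
        query.append(f"filetype:{filetype}")
    if site:
        query.append(f"site:{site}")
    # urlencode escapes spaces as '+' and ':' as '%3A'.
    return "https://www.google.com/search?" + urlencode({"q": " ".join(query)})


print(search_url("underspend", filetype="xls", site="gov.uk"))
print(search_url("confidential", filetype="ppt"))
```

The first call reproduces the slide's example query; looping over a list of domains or filetypes would produce a family of related searches.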
  • 41. Even if you don't consider yourself a geek or database expert, writing advanced search queries using search limits is but a small step away from writing queries over databases themselves.
One of the most widely used languages for querying databases is SQL. The slide above shows a simple, made-up SQL query that could have a similar effect to the search engine query, run over a very simple search engine database.
The idea is that we select those webPages where the text content of the page contains the word underspend anywhere: the % signs denote wildcard characters, so the word underspend can be preceded or followed by any number of arbitrary characters. We also want the query to be limited to pages that have a particular filetype and domain.
Far more complicated queries can be written over far more complex databases. What's important is that you develop an idea of what sorts of database structure and query are possible, not necessarily that you can run and query such databases yourself.
For more examples, see:
Asking Questions of Data: Garment Factories Data Expedition http://schoolofdata.org/2013/05/24/asking-questions-of-data-garment-factories-data-expedition/
Asking Questions of Data: Some Simple One-Liners http://schoolofdata.org/2013/05/13/asking-questions-of-data-some-simple-one-liners/
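A query of this sort can be tried out without any database server, using SQLite from Python's standard library. The webPages table, its columns and its rows below are all made up to mirror the idea on the slide; they are not the slide's actual query or data.

```python
import sqlite3

# A toy, in-memory "search engine database" (table, columns and rows are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE webPages (url TEXT, domain TEXT, filetype TEXT, content TEXT)")
conn.executemany(
    "INSERT INTO webPages VALUES (?, ?, ?, ?)",
    [
        ("http://council.example.gov.uk/q3.xls", "gov.uk", "xls", "Q3 underspend by directorate"),
        ("http://council.example.gov.uk/report.pdf", "gov.uk", "pdf", "Annual underspend report"),
        ("http://www.example.com/budget.xls", "example.com", "xls", "Budget underspend"),
    ],
)

# % is SQL's wildcard: match 'underspend' with anything before or after it.
rows = conn.execute(
    """
    SELECT url FROM webPages
    WHERE content LIKE '%underspend%'
      AND filetype = 'xls'
      AND domain = 'gov.uk'
    """
).fetchall()
print(rows)
```

Only the spreadsheet on the gov.uk domain survives all three conditions; dropping either of the last two lines of the WHERE clause widens the result, much as removing a search limit does.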
  • 42. One of the simplest, but often one of the most useful, things we can do is to count things. You just need to be creative in what you count!
One of the nice features of database query languages such as SQL is that we can write queries that count the number of responses and rank the results on that basis. For example, in a database of public spending transactions with different companies, we could count the number of transactions with a particular company, sum the value of the transactions carried out with that company, or find the companies with which the largest total amounts were spent.
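Counting, summing and ranking of this kind is what SQL's GROUP BY is for. The sketch below runs against a small, hypothetical payments table in an in-memory SQLite database (supplier names and amounts are invented) and produces a league table of suppliers by total spend.

```python
import sqlite3

# Hypothetical spending transactions: one row per payment to a supplier.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (supplier TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?)",
    [
        ("Supplier X", 100.0), ("Supplier Y", 250.0), ("Supplier X", 300.0),
        ("Supplier Z", 50.0), ("Supplier X", 20.0), ("Supplier Y", 30.0),
    ],
)

# Count and total the transactions per supplier, largest total first.
league_table = conn.execute(
    """
    SELECT supplier, COUNT(*) AS n_transactions, SUM(amount) AS total
    FROM payments
    GROUP BY supplier
    ORDER BY total DESC
    """
).fetchall()

for supplier, n_transactions, total in league_table:
    print(f"{supplier}: {n_transactions} transactions, £{total:,.2f}")
```

Swapping `ORDER BY total DESC` for `ORDER BY n_transactions DESC` ranks suppliers by how often they are paid rather than by how much.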
  • 43. As has already been mentioned, a key part of the journalistic exercise is putting things into context.
When working with data, interpreting what the data says often depends on understanding the context and, more importantly, the caveats that arise from asking a particular question of a particular dataset that has been collected in a particular way under particular conditions.
That said, given a particular dataset, are there any obvious questions we can ask of it?
  • 44. When results are ranked, as for example in league tables, there are often easy-picking stories to be had around the top three/bottom three positions. In national rankings, local news stories can be identified if your local schools or council appear at either of those extremes.
For contextualisation purposes, it often makes sense to look at distributions. Many summary statistics report the mean value, but looking at measures of variation, or spread, about the mean, as well as the position of the median value, can often change the context of a story.
If the lecture room has 20 students in it, each with an income of £6,000 a year from a maintenance loan, the total income is £120,000 and their mean income is £6,000. If an academic on £40,000 is also in the room, the total income for the room is £160,000, shared among 21 people, so the mean income is now a little over £7,500 (about £7,619). If we define a poverty level as a mean income below £10,000, the members of the room are, on average, in poverty.
If a senior academic, such as a professor on an income over £65,000, wanders into the room, the total income goes to over £225,000. With 22 people now in the room, the mean income is over £10,000: the room is out of poverty. The median income, however, is still £6,000.
As well as top, bottom, mean and median, we should also look to outliers. If Bill Gates or Mark Zuckerberg walks into a bar, the average net worth of people in that bar is likely to go up to a level of previously unimagined wealth.
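The lecture-room arithmetic can be replayed with Python's statistics module. This sketch takes the professor's salary as exactly £65,000 (the text says "over") and treats the £10,000 poverty line as the illustrative threshold on mean income described above.

```python
from statistics import mean, median

POVERTY_LINE = 10_000  # the illustrative threshold, applied to mean income


def describe(incomes):
    """Summarise a room's incomes by mean and median."""
    avg = mean(incomes)
    side = "below" if avg < POVERTY_LINE else "above"
    return (f"{len(incomes)} people: mean £{avg:,.0f}, "
            f"median £{median(incomes):,.0f} (mean {side} the line)")


room = [6_000] * 20          # 20 students on a £6,000 maintenance loan
print(describe(room))
room.append(40_000)          # an academic on £40,000 joins them
print(describe(room))
room.append(65_000)          # then a professor on £65,000
print(describe(room))
```

The three lines printed are:

20 people: mean £6,000, median £6,000 (mean below the line)
21 people: mean £7,619, median £6,000 (mean below the line)
22 people: mean £10,227, median £6,000 (mean above the line)

Two new arrivals lift the mean over the line while the median does not move at all.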
Here are several reasons why you should pay attention to outliers:
- they may be 'dirty' or incorrect data points that need to be corrected, and that may well raise questions about data quality;
- the outlier may truly be an outlier, a remarkable point and a story in its own right;
- the outlier may skew other measures, such as mean values or other summary statistics. In such cases, it may make sense to use other measures or to rerun the
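One common rule of thumb for finding candidate outliers is Tukey's fences: flag anything more than 1.5 interquartile ranges below the first quartile or above the third. This is a swapped-in technique, not one the lecture prescribes, and the flagged points are leads for a closer look rather than values to delete automatically. A minimal Python sketch with illustrative values:

```python
from statistics import quantiles


def flag_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    quantiles() needs at least two data points; k=1.5 is the conventional choice.
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Illustrative values only: one point sits far above the rest.
print(flag_outliers([12, 15, 14, 13, 16, 15, 14, 95]))
```

Here only 95 is flagged. Whether it is a typo, a genuine story, or simply a value distorting the mean is the question to take back to the data source.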
  • 45. This rather dense graphic is a view over council spending data in my local area as it relates to spend on libraries. The separate charts show the accumulated spend with different suppliers over a period of time. The intention of the display was to provide an at-a-glance view of accumulated spend with different companies across different directorates and spending areas, to see whether any companies had a significant spend compared to other companies.
The table at the bottom shows the top of a league table of companies with the largest accumulated spend by directorate and expense type.
At first glance, the spend on phone lines with different suppliers seems to outweigh the spend on books. How can that be? Are the librarians spending their time calling premium rate phone lines?
If we guess at 20 libraries and a 6-month spend period, and then assume that the phone lines correspond to broadband data bills, do the monthly payments per library still seem outrageous? These assumptions are testable via questions to the relevant authorities, of course, but they demonstrate the care we need to take when trying to understand why a number that appears to be large is that large.
See also: Local Council Spending Data: Time Series Charts http://blog.ouseful.info/2013/11/06/local-council-spending-data-time-series-charts/
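The back-of-the-envelope check above is just a division. A tiny Python sketch, in which a hypothetical total of £24,000 stands in for the figure read off the chart (the real figure is not given here) and the helper name is likewise hypothetical:

```python
def monthly_cost_per_site(total_spend, sites=20, months=6):
    """Spread a total spend evenly across sites and months (a rough sanity check)."""
    return total_spend / (sites * months)


# Hypothetical total: £24,000 on phone lines across 20 libraries over 6 months.
print(f"£{monthly_cost_per_site(24_000):,.2f} per library per month")
```

Dividing by the number of sites and months turns an alarming-looking headline total into a per-library monthly bill that can be compared with what a broadband line might plausibly cost.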
  • 46. As well as looking for outliers, we should also look for similarities between things we expect to be different, and differences between things we expect to be the same, or at least similar.
  • 47. Looking again at some of my local council's spending data, I noticed that a search on "music" pulled back what appeared to be a shift in responsibility between directorates for spend on school music service provision.
An obvious question that follows is: if the service did change hands (something we can check), was there a resulting difference in the way the directorates were spending? Could we, for example, identify whether any projects got dropped (or at least renamed out of scope!)?
This forensic approach can also be used to track the consequences of a shift in control of a service, if we know it has happened. When a service changes hands, we can note the fact and then, a year on, look for evidence of whether treatment of the service has changed, at least in its consequences for spending.
See also: What Role, If Any, Does Spending Data Have to Play in Local Council Budget Consultations? http://blog.ouseful.info/2013/11/03/what-role-if-any-does-spending-data-have-to-play-in-local-council-budget-consultations/
  • 48. When asking questions of data, one question can often lead to another.
For example, a query over my local council spending data about amounts spent with the local newspaper, the Isle of Wight County Press, identified a variety of expense types associated with those spending transactions. One such expense type was Advertising & Publicity. This led me to steer the conversation I was having with this expert (data) source on council spending on to a slightly different tack: so who else have you been spending advertising and publicity budgets with?
  • 49. If you are in the position of paying energy supply bills (electricity and gas), you'll probably be familiar with the idea that payments are set so that you tend to overpay on a monthly basis. After collecting the interest on your overpayments, the utility companies may eventually get round to sending you a small repayment to cover the excess (ex- any interest, of course…).
Is the same true at the council level?
One thing I noticed in my local council's spend with supplier Southern Electric was that there appeared to be more than a few "negative payments". So where were these coming from? The chart shown in this slide has positive payments by date (not ordered on an evenly spaced timeline) in black, and the magnitude of negative payments in red. Where a red triangle sits over a black dot, a positive and a negative payment of the same amount were made on the same day. Why's that?
Some days show several negative payments: again, what's happening? There's not necessarily anything suspicious going on, but what story does this chart appear to tell us, particularly in terms of the similarities in amount between certain positive and negative spends?
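The pattern described on this slide, positive and negative payments of the same amount on the same day, is easy to hunt for systematically. The Python sketch below uses made-up transactions stored in pence (to avoid floating-point rounding), and the helper names are hypothetical; it lists same-day matching pairs and counts negative payments per day.

```python
from collections import Counter, defaultdict

# Hypothetical transactions with one supplier: (date, amount in pence).
payments = [
    ("2013-04-02", 51230), ("2013-04-02", -51230),
    ("2013-05-01", 23000),
    ("2013-05-14", -4510), ("2013-05-14", -6000),
]


def same_day_reversals(rows):
    """Return (date, amount) where a payment and an equal negative payment share a date."""
    amounts_by_date = defaultdict(set)
    for date, amount in rows:
        amounts_by_date[date].add(amount)
    return sorted(
        (date, amount)
        for date, amounts in amounts_by_date.items()
        for amount in amounts
        if amount > 0 and -amount in amounts
    )


def negatives_per_day(rows):
    """Count the negative payments made on each date."""
    return Counter(date for date, amount in rows if amount < 0)


print(same_day_reversals(payments))
print(negatives_per_day(payments))
```

Each matched pair, and each day with several credits, is a specific question to put to the council rather than an answer in itself.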
  • 50. Just by the by, this chart refines the question I'm asking of the spend with Southern Electric, asking for more information about the positive and negative payments made on the gas and electricity accounts separately.
  • 51. As well as similarities and differences, data can tell us tales about trends…
  • 52. Regular releases from the ONS (the Office for National Statistics) provide bread-and-butter news stories according to a known schedule.
For example, monthly job seeker figures get a monthly write-up in OnTheWight, the hyperlocal news blog local to me. The report compares the current figures with figures from the previous month and from the same month of the previous year, so that we can see how the numbers have changed month on month, and year on year.
I started to explore a simple script that would take data directly from the ONS and produce assets that could be reused in a news story, for example a table showing the change in figures over recent months.
I also started to explore ways in which we could automate the production of prose from the data [code: https://gist.github.com/psychemedia/7536017]. For example, the following phrase was generated automatically from monthly figures:
The total number of people claiming Job Seeker's Allowance (JSA) on the Isle of Wight in October was 2781, up 94 from 2687 in September, 2013, and down 377 from 3158 in October, 2012.
The words up and down were selected by a simple if-then rule that compared the figures to see which was the greater. The numbers and dates are pulled in from the data. The other words are canned phrases.
The automated production of text from data has received attention from several companies, particularly in the areas of baseball reports and financial reporting. See for example: http://blog.ouseful.info/2013/05/22/notes-on-narrative-science-and-automated-insight/
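The if-then rule and canned phrases described above can be sketched in a few lines of Python. This is an independent illustration, not the code in the linked gist; the function names are hypothetical, and it reproduces the example sentence from the October figures quoted above.

```python
def change_phrase(current, previous, label):
    """Describe the move from previous to current, e.g. 'up 94 from 2687 in September, 2013'."""
    if current > previous:
        direction = "up"
    elif current < previous:
        direction = "down"
    else:
        return f"unchanged from {previous} in {label}"
    return f"{direction} {abs(current - previous)} from {previous} in {label}"


def jsa_sentence(area, month, current, last_month, last_month_label,
                 last_year, last_year_label):
    """Fill the canned sentence template with figures and comparison phrases."""
    return (
        f"The total number of people claiming Job Seeker's Allowance (JSA) on {area} "
        f"in {month} was {current}, "
        f"{change_phrase(current, last_month, last_month_label)}, "
        f"and {change_phrase(current, last_year, last_year_label)}."
    )


print(jsa_sentence("the Isle of Wight", "October", 2781,
                   2687, "September, 2013", 3158, "October, 2012"))
```

Keeping the comparison logic in its own function means the same rule can be reused for other templates, such as a headline or a table caption.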
  • 53. If we plot a line chart of some quantity against a time axis, we can often see increasing or decreasing trends over time. If we are looking for constant rates of increase in some value, it often makes sense to use a log (logarithmic) scale for that value on the y-axis: on a log scale, a quantity growing by a constant percentage each period plots as a straight line. Periodic trends can also be seen as 'waves' appearing in the line over time, but other displays can draw out periodicity or seasonality in a more visually compelling way.
For example, in these charts (of jobless figures on the Isle of Wight once again) we have months ordered along the horizontal x-axis and the number of Job Seeker's Allowance claimants on the vertical y-axis. The separate coloured lines represent different years. On the left, we use a legend to identify the lines; on the right is an example of labelling the lines directly.
The lines show strong seasonality. The Isle of Wight being a tourist destination, job seeker figures tend to fall over the summer months. Putting lines for several years on the same axes allows us to compare annual cycles over time.
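Why a log scale straightens out constant-rate growth can be checked numerically. In this Python sketch, a hypothetical series grows by 10% each period: the raw step from one period to the next keeps getting bigger, but the step in log10 of the value is the same every time, which is what makes the line straight on a log axis.

```python
import math

# Hypothetical series growing at a constant 10% per period.
values = [100 * 1.1 ** t for t in range(6)]

# On a linear axis the period-on-period steps keep getting bigger...
linear_steps = [later - earlier for earlier, later in zip(values, values[1:])]
# ...but on a log axis every step has the same height, log10(1.1).
log_steps = [math.log10(later) - math.log10(earlier)
             for earlier, later in zip(values, values[1:])]

print([round(step, 1) for step in linear_steps])
print([round(step, 4) for step in log_steps])
```

A bend in a log-scale line therefore signals a change in the growth rate, not just in the amount.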
  • 54. Another trend we can try to pull out is the change over the years for each given month. Here, the horizontal x-axis blocks out the months, as before, but within each month we have an ordered range of years. The line within each block thus represents the year-on-year change in numbers for a given month.
The step change within each month suggests that the way the figures were calculated changed significantly several years ago.
Further reading: a good guide to statistics as used by government, including a description of the way that "seasonal adjustments" are handled, is provided by the House of Commons Library's Statistical Literacy Guide http://www.parliament.uk/business/publications/research/briefing-papers/SN04944/statistical-literacy-guide
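Both kinds of seasonal display come from regrouping the same monthly series. This Python sketch uses a few invented claimant counts keyed by (year, month) and builds the two groupings: one series per year indexed by month (one line per year, as on the previous slide), and one series per month ordered by year (the blocked, year-on-year view on this slide). The helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical monthly claimant counts keyed by (year, month number).
claimants = {
    (2011, 1): 3400, (2011, 7): 2600,
    (2012, 1): 3500, (2012, 7): 2700,
    (2013, 1): 3300, (2013, 7): 2500,
}


def by_year(data):
    """One series per year, indexed by month: one line per year."""
    series = defaultdict(dict)
    for (year, month), value in data.items():
        series[year][month] = value
    return dict(series)


def by_month(data):
    """One series per month, ordered by year: the year-on-year view."""
    series = defaultdict(list)
    for (year, month), value in sorted(data.items()):
        series[month].append((year, value))
    return dict(series)


print(by_year(claimants))
print(by_month(claimants))
```

A step change like the one visible on this slide would show up in by_month as a jump that appears in every month's list at the same year.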
  • 55. As well as the patterns we can see over time by plotting data against a time axis, we can also look for patterns in space…
  • 56. In part because they are so recognisable to most people, as an idea as well as an artefact, maps are widely used in many publications.
I have already mentioned how using a map to compare MPs' travel claims based on their constituency locations provided a way of making a particular sort of comparison between MPs (in particular, a comparison based on geographical location).
But we can take the idea of a map more generally, as a spatial distribution of points that are related in some way, with strong relations represented as spatial proximity. Things that are close together on the page are taken to be close together in some sort of space, a space which may be conceptual or social, not just (or not even) geographic.
  • 57. Take this map, for example: a map of Twitter users commonly followed by a sample of followers of @UL_journalism.
The map has been laid out so that Twitter users who are heavily interlinked are grouped closely together (for the most part, at least). A network statistic has been used in an attempt to colour clusters of nodes with high interconnection. The coloured regions thus represent a first attempt at identifying different groupings of Twitter users. Note how well the spatial layout algorithm and the grouping/colouring algorithm complement each other: they both seem to tell a similar story, namely that certain groups of individuals are somehow alike.
About the technique: http://schoolofdata.org/2014/02/14/mapping-social-positioning-on-twitter/
Let's have a closer look at some of the regions…
  • 58. This area seems to contain Twitter accounts that relate in large part to the University of Lincoln and its related organisations and activities.
  • 59. This area of the map contains accounts associated with Lincoln more generally. Such a map may be useful for identifying companies that are used by students and, as such, may provide useful leads for advertising agents looking to sell adverts in university magazines or poster areas.
• 60. This area of the map actually conflates several different groupings, at least on my reading of it. In fact, it may make sense to try to find clusters within this group on its own and then recolour accordingly. So what groups can I see? Bottom left, there appear to be Lincoln local media outlets. Moving counter-clockwise between the 6 and 3 o'clock positions, we see photography-related users shading into celebrities. As we move further up towards the twelve o'clock position, we see news sites, both "popular" and more industry-related (@journalismnews, for example). That there does not appear to be a strong independent cluster of journalists and industry-related sites suggests that, at least among the sampled followers of @UL_journalism, there isn't necessarily a very strong notion of following these industry lights…
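One way to follow up the idea of finding clusters within this group on its own is as follows. Take just the accounts in the mixed region, keep only the links between them, and re-run a community-detection step. The sketch below uses label propagation, a simple technique chosen here for illustration. It is not necessarily the network statistic used to colour the original map. Label propagation is also not guaranteed to converge in general, so the sketch caps the number of rounds. The function name and all account names and links are hypothetical.

```python
from collections import Counter


def label_propagation(nodes, edges, max_rounds=20):
    """Cluster a subgroup by simple, deterministic label propagation.

    Each node starts with its own label and repeatedly adopts the most common
    label among its neighbours (ties broken by the smallest label). Edges to
    nodes outside `nodes` are ignored, so this works on a sub-region of a map.
    Returns the clusters as sorted lists of node names.
    """
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        if a in neighbours and b in neighbours:  # keep links inside the subgroup only
            neighbours[a].add(b)
            neighbours[b].add(a)

    labels = {n: n for n in nodes}
    for _ in range(max_rounds):  # cap rounds: convergence is not guaranteed
        changed = False
        for n in sorted(nodes):  # fixed visiting order keeps results repeatable
            if not neighbours[n]:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[m] for m in neighbours[n])
            best = max(counts.values())
            new_label = min(lab for lab, c in counts.items() if c == best)
            if new_label != labels[n]:
                labels[n] = new_label
                changed = True
        if not changed:
            break

    groups = {}
    for n, lab in labels.items():
        groups.setdefault(lab, set()).add(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical mixed region: two tight triangles joined by one link, plus a link
# to "uni", which lies outside the region and is therefore ignored.
region = ["local_a", "local_b", "local_c", "celeb_a", "celeb_b", "celeb_c"]
links = [
    ("local_a", "local_b"), ("local_b", "local_c"), ("local_a", "local_c"),
    ("celeb_a", "celeb_b"), ("celeb_b", "celeb_c"), ("celeb_a", "celeb_c"),
    ("local_c", "celeb_a"), ("local_a", "uni"),
]
print(label_propagation(region, links))
# → [['celeb_a', 'celeb_b', 'celeb_c'], ['local_a', 'local_b', 'local_c']]
```

Re-running the clustering on the sub-region lets finer groupings, such as local media versus celebrities, emerge separately. On the whole map, those groupings may have been swallowed into one colour.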
• 61. One thing to mention about data mapping and visualisation techniques is that they often tell us things we already (think we) know; in that sense, they are not news. But they may also tell us things we know in new, visually appealing ways. And by making use of such "confirmatory" visualisations and displays, we can build an audience's confidence that they know how to interpret these sorts of representation.
• 62. As the audience becomes comfortable reading charts and making sense of data, then when there is something new or surprising in the data, the surprise manifests itself in the reading of the data or chart. For journalists working with data, becoming familiar with reading and interpreting data that merely confirms what you already know helps refine your senses for spotting things that are odd, noteworthy, or newsworthy. Taking a little time each day to:
  - read charts as if they were stories;
  - look behind the data to find original sources, such as polls or news releases containing data, and then compare the original release with the way it is reported, paying particular attention to the points that are highlighted and how the data is contextualised;
will help you develop some of the skills you will need to identify, develop and treat the stories that your specialist source, data, can provide you with, if only you ask…
• 63. And finally, a couple of handy books and resources on data journalism if you're interested in reading more generally around the subject…