Taking this opportunity to explore some of the issues associated with whatever this thing called “data journalism” is…
  
1	
  
I’m not a journalist, and don’t have any form of journalism training. But I do have an interest in ICT, and from that have an interest in “communication”.

Let’s start with an easy(?!) question: what is journalism?

One way of answering that question is to list some of the functions, or attributes, associated with it – informing, educating, holding to account, the watchdog function, campaigning, contextualising for a particular audience.
  
2	
  
Sensemaking seems to me to be an important part of it… In part contextualisation, in part identifying the bits that make the difference, the bits that make it important, the bits that make it news that people need to know…

…and often with a particular audience in mind.
  
3	
  
Critical judgement.
  
4	
  
Second question: what is data? National statistics, sports results, polls, financial figures, health data, school league tables, etc.

Is a book data? Or a speech? What if I split a speech up into separate words, count the occurrences of each unique word and then display the result as a “tag cloud”, or word frequency diagram?
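The split-and-count step can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular tag cloud tool: the tokenising rule (lowercase runs of letters and apostrophes) and the sample "speech" are assumptions made for the example.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Split text into lowercase words and count each unique word."""
    # Assumption: a "word" is a run of letters and apostrophes; digits and
    # punctuation are dropped.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

# A made-up sample standing in for the text of a speech.
speech = "Data is a source. Ask the data a question, and the data may answer."
freqs = word_frequencies(speech)

# The most frequent words are the ones a tag cloud would draw largest.
for word, count in freqs.most_common(3):
    print(word, count)
```

A tag cloud or word frequency diagram is then just a rendering of these counts; in practice you would usually also drop very common "stop words" such as "the" and "a" before drawing it.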
  
5	
  
One way of thinking about data is that it is a particular sort of source, or a source that can respond to a particular style of questioning in a particular way.

Another take on this is that many “data sources” are experts on a particular topic, experts that know a lot about a very particular class of facts.
  
6	
  
  
7	
  
So what is data journalism?

If I were to ask you, the members of a school of journalism, “is this or that news article ‘journalism’?”, I imagine one response might be: “well… it’s the output of a journalistic process.”

But if I point at a map with some markers on it and ask “is this map ‘data journalism’?”, you might answer: yes. Or at least, that’s what many of the early job ads for data journalists implied.
  
8	
  
Sports journalism has sport as the topical contextual frame for some journalistic activity.

Political journalism has politics as the topical contextual frame for some journalistic activity.

Investigative journalism has a particular process as the contextual frame for some journalistic activity, a process that may be applied to particular topic areas.

So for data journalism, does “data” relate to the topic or to the process?

Where we focus on data outputs, the implication is that the “topic” of data is the focus of the framing. But I think we need to reframe to consider the procedural role.
  
	
  
9	
  
So as a starting point, let’s frame the idea that data journalism is a process-related epithet that implies one of the key sources in a journalistic activity is “data”.
  
10	
  
11	
  
By focusing on this notion of data journalism as relating to process, we can then start to explore, with a little more criticality, what the practice of data journalism might involve that identifies it as such.

That is, how is practice influenced by the fact that it must engage with “data as a source”?
  
12	
  
The inverted pyramid gives us one way of considering the data journalistic process, or at least identifying some of the steps involved in a data investigation.

But there are many other ways of conceptualising the process – for example, finding stories and telling stories…
  
13	
  
When it comes to finding stories, do we:

a) want to find stories in a dataset we are provided with, or
b) use data to help draw out a story lead we have already been tipped off to?
  
14	
  
Anscombe’s Quartet is a toy dataset that first appeared in a 1973 paper by statistician Francis Anscombe.

His paper – “Graphs in Statistical Analysis” – was based around the claim that “graphs are essential to good statistical analysis”.
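Anscombe’s point can be checked numerically. The sketch below types in the published values of the quartet and computes, for each of the four sets, the means, the Pearson correlation and the least-squares line. All four come out almost identical (means of 9 and about 7.50, r of about 0.816, a fitted line of about y = 3.00 + 0.500x), which is exactly why the summary numbers alone are not enough. The helper name `summarise` is illustrative.

```python
from math import sqrt

# Anscombe's Quartet (Anscombe, 1973): four small x/y datasets.
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # x values shared by sets I-III
QUARTET = {
    "I":   (X, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (X, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def summarise(xs, ys):
    """Return (mean_x, mean_y, pearson_r, slope, intercept) for paired data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx            # least-squares fit of y on x
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy)    # Pearson correlation coefficient
    return mx, my, r, slope, intercept

summaries = {name: summarise(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, (mx, my, r, slope, intercept) in summaries.items():
    print(f"{name:>3}: mean x={mx:.2f} mean y={my:.2f} r={r:.3f} "
          f"y={intercept:.2f}+{slope:.3f}x")
```

The printed table shows four near-identical rows. Only plotting the points reveals the curve in set II, the single outlier in set III and the one high-leverage point in set IV.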
  
15	
  
But this is where we start to hit some stumbling blocks.
  
16	
  
And a big stumbling block is one that is often denied in higher education, which is the provision of skills, as compared to “higher level conceptual or academic understanding”.

There is an old saw that we become better writers through reading more. But how much time do you invest in reading charts?

Really reading them?

I came across this beautifully titled book a few weeks ago: “Making Sense of Squiggly Lines”.

The blurb on the back summarises the situation well: “Data points are just words, but when connected with a squiggly line they tell a story”.
  
17	
  
18	
  
In an ideal world, the process would be simple: have data, get story.
  
19	
  
But it’s not that simple.

It’s more likely that we need to engage with the dataset to try to tease the stories out of it, or to draw facts and relationships from it that we can use to support the claims we make when narrating some sort of story that is at least supported by the data, or that contextualises the data in a narrative way that is hopefully “truthy”.
  
20	
  
One of the ways I like to work with data is to have a conversation with it – asking questions of it and then further questions based on the responses I get.
  
21	
  
Sometimes it looks at first as if we have data in a form where we might be able to do something with it – then we realise it needs cleaning and reshaping.

For example, in this case we have percentage signs contaminating numbers, and data organised in separate sections. How do we get a “well behaved” view over the data from all the wards, and over different sorts of data (votes polled per candidate versus the size of the electorate in a particular ward, for example)?

Walkthrough: http://blog.ouseful.info/2013/05/03/a-wrangling-example-with-openrefine-making-ready-data/
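The two problems described here, percentage signs in numeric cells and data split into per-ward sections, can be illustrated with a small Python sketch. The linked walkthrough uses OpenRefine, and this is not a reproduction of it: the ward names, candidates and figures are invented, and the section layout (a "Ward:" line followed by a header and candidate rows) is an assumed stand-in for the real spreadsheet.

```python
import csv
import io

# Hypothetical raw export: each ward is a separate section, and the share
# column has "%" signs mixed in with the numbers.
RAW = """Ward: Abbey
Candidate,Votes,Share
Smith,1204,48.2%
Jones,1093,43.7%
Ward: Birchwood
Candidate,Votes,Share
Patel,987,51.0%
Brown,948,49.0%
"""

def to_number(cell):
    """Strip stray percent signs, thousands commas and spaces, then convert."""
    return float(cell.replace("%", "").replace(",", "").strip())

def tidy_sections(raw):
    """Flatten per-ward sections into one list of uniform records."""
    records, ward = [], None
    for row in csv.reader(io.StringIO(raw)):
        if not row:
            continue                                  # skip blank lines
        if row[0].startswith("Ward:"):
            ward = row[0].split(":", 1)[1].strip()    # start of a new section
        elif row[0] == "Candidate":
            continue                                  # repeated header line
        else:
            candidate, votes, share = row
            records.append({"ward": ward, "candidate": candidate,
                            "votes": int(votes), "share_pct": to_number(share)})
    return records

rows = tidy_sections(RAW)
for r in rows:
    print(r)
```

Once every row carries its ward as a column, the votes can be joined on the `ward` key to another table, such as the size of the electorate in each ward.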
  
22	
  
  
23	
  
  
24	
  
Tidying data – or cleaning data – or, more colloquially, “wrangling data” – refers to the process we need to engage in to turn a dataset we have found into one that is usable.

Many published datasets are horrible.

Really horrible.

They don’t work as we might want or expect them to in the applications we tend to have to hand.
  
25	
  
Take producing data visualisations, for example: have data, produce visualisation.

No.

That’s like saying: have two hours of rambling conversation with a source, have a 200-word story with strong quotes.

No. Just: no.

It doesn’t work like that.

Yes, there are powerful charting tools available, BUT they require the data to be clean and tidy, and to be in the right shape for the tool. And it typically isn’t.
  
26	
  
We have to wrangle it.

Now, wrangling is a technical job, and arguably a job for technicians – the higher apprentices of the journalistic world – not graduate journalists.

But I think our journalists are going to have to learn the equivalent of some machining in the mechanical world.
  
27	
  
Just by the by, I didn’t draw those block diagrams, I wrote them.
  
28	
  
I “wrote” these charts – you can see how at the top. That code is applied to a suitably shaped version of a dataset known as Anscombe’s Quartet.

The data has been reshaped to a three-column format: a column for the x values, which are plotted on the horizontal x-axis; a column for the y values, which form the vertical y-axis; and a column for the groups, which specify which panel, or facet, each point should be plotted in.

The code defines the construction of those charts. Exactly. There is no magic. At least, no other magic.
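The reshaping step can be sketched in plain Python: it turns a wide table, with one x column and one y column per set, into long rows of (x, y, group), and then splits those rows into one panel per group. The slide’s own charting code is not reproduced here; a plotting library with faceting support (for example `facet_wrap()` in R’s ggplot2) could take the long table from this point. The wide column names `x1`…`y4` are an assumption about the starting layout, and only the first three rows of the quartet are typed in.

```python
from collections import defaultdict

# Wide layout (assumed): one x and one y column per set.
# Only the first three rows of Anscombe's Quartet are shown here.
WIDE = [
    {"x1": 10, "y1": 8.04, "x2": 10, "y2": 9.14, "x3": 10, "y3": 7.46, "x4": 8, "y4": 6.58},
    {"x1": 8,  "y1": 6.95, "x2": 8,  "y2": 8.14, "x3": 8,  "y3": 6.77, "x4": 8, "y4": 5.76},
    {"x1": 13, "y1": 7.58, "x2": 13, "y2": 8.74, "x3": 13, "y3": 12.74, "x4": 8, "y4": 7.71},
]
GROUPS = ["1", "2", "3", "4"]

def wide_to_long(rows, groups):
    """Reshape to three columns: x (horizontal axis), y (vertical axis), group (facet)."""
    return [{"x": row["x" + g], "y": row["y" + g], "group": g}
            for row in rows for g in groups]

def facets(long_rows):
    """Collect the (x, y) points belonging to each group, i.e. each panel."""
    panels = defaultdict(list)
    for r in long_rows:
        panels[r["group"]].append((r["x"], r["y"]))
    return dict(panels)

long_rows = wide_to_long(WIDE, GROUPS)
panels = facets(long_rows)
print(len(long_rows), "rows;", {g: len(pts) for g, pts in panels.items()})
```

The long form looks more repetitive, but it is the shape that faceting tools expect: one row per plotted point, with the panel named in a column rather than encoded in column headings.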
  
29	
  
One of the first datasets I played with was MPs’ expenses data. Here are a couple of ways I started to ch…

The bar chart is ordered, for a particular expenses area, by the total amount for each individual MP.

The block histogram shows how many MPs made a total claim in a particular expenses area of a particular…
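Both views start from the same aggregation. A minimal sketch, assuming the claims arrive as (MP, expenses area, amount) rows: total the claims per MP within one area, sort those totals for the ordered bar chart, and count how many MPs fall into each band of total claimed for the block histogram. The MP names, amounts and the £1,000 band width are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical claim lines: (MP, expenses area, amount in pounds).
CLAIMS = [
    ("MP A", "travel", 1200.0), ("MP A", "travel", 450.0),
    ("MP B", "travel", 3100.0), ("MP C", "travel", 800.0),
    ("MP C", "office", 5000.0), ("MP D", "travel", 2950.0),
    ("MP E", "travel", 1900.0),
]

def totals_for_area(claims, area):
    """Total amount claimed by each MP in one expenses area."""
    totals = defaultdict(float)
    for mp, claim_area, amount in claims:
        if claim_area == area:
            totals[mp] += amount
    return dict(totals)

def bar_chart_order(totals):
    """MPs ordered by total claim, largest first (the bar chart ordering)."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

def block_histogram(totals, band=1000):
    """Count how many MPs fall into each band of total claimed (keyed by band start)."""
    return Counter(int(total // band) * band for total in totals.values())

travel = totals_for_area(CLAIMS, "travel")
print(bar_chart_order(travel))
print(sorted(block_histogram(travel).items()))
```

The band width is a real editorial choice: wider bands smooth the picture, narrower ones make individual MPs stand out, so it is worth trying more than one.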
Critical judgement – it applies to data too...
  
31	
  
One of the things to mention about data mapping and visualisation techniques is that they often tell us things we already (think we) know; in that sense, they are not news. But they may also tell us things we know in new, visually appealing ways. And by making use of such ‘confirmatory’ visualisations and displays, we can build confidence within an audience that they know how to interpret these sorts of representation.
  
32	
  
As the audience becomes comfortable reading the charts and making sense of data, when there is something new or surprising in the data, the surprise manifests itself in the reading of the data or chart.

For journalists working with data, developing a sense of familiarity with how to interpret and read data when it is just confirming what you already know helps to refine your senses for spotting things that are odd, noteworthy, or newsworthy.

Taking a little bit of time each day to:

- read charts as if they were stories;
- look behind the data to find original sources, such as polls or data-containing news releases, and then compare the original release with the way it is reported, paying particular attention to the points that are highlighted and how the data is contextualised;

will help you develop some of the skills you will need if you want to be able to identify, develop and treat some of the stories that your specialist source, data, can provide you with, if only you ask…
  	
  
33	
  
A scatterplot is another very powerful sort of chart – we can plot two sorts of value against each other to…

Some scatterplot tools allow you to size or colour nodes according to further dimensions. Colouring node…
Maps can be used to pull out different sorts of relationships. For example, by plotting markers at the centre of each MP’s constituency, coloured by the total value of travel expenses claimed in a particular area, we can easily see whether or not an MP is claiming an amount significantly different to MPs in neighbouring constituencies. In this case – travel expenses – we might expect (at first glance at least) a homophilic effect: folk a similar distance away from Westminster should presumably make similar sorts of travel claim? At second glance, we might then start to refine our questioning: does constituency size (in terms of geographical area) or rurality have an effect? Does an MP travel to and from home more than neighbours (or perhaps claim more in terms of accommodation in London)?
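The “compare with the neighbours” reading of the map can also be made explicit. This is a hypothetical sketch: the seat labels, the adjacency list and the claim totals are invented, comparing against the median of neighbouring seats is one possible baseline among several, and the 1.5× threshold for “significantly different” is an arbitrary choice rather than a statistical test.

```python
from statistics import median

# Hypothetical annual travel-claim totals (pounds) for five invented seats,
# plus which seats border which. Real work would need boundary data.
TRAVEL = {"A": 12000, "B": 11500, "C": 12500, "D": 13000, "E": 31000}
NEIGHBOURS = {
    "A": ["B", "C", "E"],
    "B": ["A", "D", "E"],
    "C": ["A", "D", "E"],
    "D": ["B", "C", "E"],
    "E": ["A", "B", "C", "D"],
}

def neighbour_ratio(claims, neighbours, seat):
    """A seat's claim divided by the median claim of its neighbouring seats.

    The median is used so that one unusual neighbour does not drag the
    baseline up for every seat around it."""
    return claims[seat] / median([claims[n] for n in neighbours[seat]])

def flag_outliers(claims, neighbours, threshold=1.5):
    """Seats claiming >= threshold times, or <= 1/threshold of, the neighbour median."""
    flagged = {}
    for seat in claims:
        ratio = neighbour_ratio(claims, neighbours, seat)
        if ratio >= threshold or ratio <= 1 / threshold:
            flagged[seat] = round(ratio, 2)
    return flagged

print(flag_outliers(TRAVEL, NEIGHBOURS))  # prints {'E': 2.53}
```

A flag is only a prompt for the next round of questions (geographical size, rurality, accommodation claims), not a finding in itself.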
  
35	
  
Sometimes we need to provide quite a lot of explanation when it comes to making sense of even a simple data visualisation – “what am I supposed to be looking at?”
  
36	
  
The other way of using data is to tell stories. But what does that even mean…?
  
37	
  
  
38	
  
In passing, it’s worth mentioning that one thing statistics does is help provide context.

Is this number a big number in the greater scheme of things? Is this thing likely to happen by chance, or is there a meaningful causal relationship between this thing and another thing?

The chart in the corner is a reminder of how surprising probabilities can be. The chart shows the probability (y-axis) that at least two people in a group share a birthday (the number of people is given on the x-axis). It shows that if there are 23 or more people in a room, there is more than a 50/50 chance that two of them will share a birthday (that is, share the same birth day and month, though not necessarily the same birth year).

How many people are in the room? If it’s more than 23, I bet that at least two people share a birthday (at least in terms of day and month).
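The 23-person figure follows from a short calculation. Assuming 365 equally likely, independent birthdays and ignoring 29 February (the usual simplification, and an assumption here), the probability that n people all have different birthdays is (365/365) × (364/365) × … × ((365 − n + 1)/365), so the probability that at least two share a birthday is 1 minus that product. A minimal Python sketch:

```python
def p_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday.

    Assumes every day of a `days`-day year is equally likely and that
    birthdays are independent of one another."""
    p_all_different = 1.0
    for k in range(n):
        p_all_different *= (days - k) / days
    return 1.0 - p_all_different

# Smallest group for which a shared birthday is more likely than not.
smallest = next(n for n in range(1, 366) if p_shared_birthday(n) > 0.5)
print(smallest, round(p_shared_birthday(smallest), 3))  # prints: 23 0.507
```

With 22 people the probability is still just under one half (about 0.476), so 23 is the tipping point the chart shows.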
  
39	
  
40	
  
42	
  
43	
  
44	
  
45	
  
46	
  
47	
  
48	
  
49	
  
50	
  
51	
  
52	
  
  
53	
  
54	
  
55	
  

 

Lincoln jun14datajournalism (Tony Hirst)

  • 1. Taking  this  opportunity  to  explore  some  of  the  issues  associated  with  whatever  this   thing  called  “data  journalism”  is…   1  
  • 2. I’m  not  a  journalist,  and  don’t  have  any  form  of   journalism  training.  But  I  do  have  an  interest  in  ICT,   and  from  that  have  an  interest  in   “communicaDon”.     Let’s  start  with  an  easy(?!)  quesDon  -­‐  what  is   journalism?     One  way  of  answering  that  quesDon  is  to  list  some   of  the  funcDons,  or  aMributed,  associated  with  it  –   informing,  educaDng,  holding  to  account,   watchdog  funcDon,  campaigning,  contextualising   for  a  par'cular  audience.   2  
  • 3. Sensemaking  seems  to  me  to  be  an  important  part  of  it…  In  part  contextualisaDon,  in   part  idenDfying  the  bits  that  make  the  difference,  the  bits  that  make  it  important,  the   bits  that  make  it  news  that  people  need  to  know…     …and  oRen  with  a  parDcular  audience  in  mind.   3  
  • 5. Second question: what is data? National statistics, sports results, polls, financial figures, health data, school league tables, and so on. Is a book data? Or a speech? What if I split a speech up into separate words, count the occurrences of each unique word and then display the result as a “tag cloud”, or word frequency diagram?
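The split-and-count step behind a tag cloud takes only a few lines. This is a minimal sketch, assuming plain-text input and a deliberately crude tokenisation (lowercase, runs of letters and apostrophes); the function name `word_frequencies` and the sample sentence are illustrative, not from the talk.

```python
import re
from collections import Counter

def word_frequencies(text: str) -> Counter:
    """Split text into lowercase words and count each unique word."""
    # Runs of letters (and apostrophes) count as words; punctuation is dropped.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

# Hypothetical sample text standing in for a speech.
speech = "Data is a source. A source can be questioned, and a good source answers the questions we ask."
counts = word_frequencies(speech)

# In a tag cloud, the most frequent words would be drawn largest.
for word, n in counts.most_common(3):
    print(word, n)
```

A real speech would usually also need a stop-word list (dropping "a", "the", "and") before the cloud says anything interesting; that choice is itself an editorial judgement.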
  • 6. One way of thinking about data is that it is a particular sort of source, or a source that can respond to a particular style of questioning in a particular way. Another take on this is that many “data sources” are experts on a particular topic, experts that know a lot about a very particular class of facts.
  • 8. So what is data journalism? If I were to ask you, the members of a school of journalism, “is this or that news article ‘journalism’?”, I imagine one response might be, “well… it’s the output of a journalistic process.” But if I point at a map with some markers on it and ask “is this map ‘data journalism’?”, you might answer: yes. Or at least, that’s what many of the early job ads for data journalists implied.
  • 9. Sports journalism has sport as the topical contextual frame for some journalistic activity. Political journalism has politics as the topical contextual frame for some journalistic activity. Investigative journalism has a particular process as the contextual frame for some journalistic activity, a process that may be applied to particular topic areas. So for data journalism, does “data” relate to the topic or the process? Where we focus on data outputs, the implication is that the “topic” of data is the focus of the framing. But I think we need to reframe to consider the procedural role.
  • 10. So as a starting point, let’s frame the idea that data journalism is a process-related epithet that implies one of the key sources in a journalistic activity is “data”.
  • 12. By focusing on this notion of data journalism as relating to process, we can then start to explore with a little more criticality what the practice of data journalism might involve that identifies it as such. That is, how is practice influenced by the fact that it must engage with “data as a source”?
  • 13. The inverted pyramid gives us one way of considering the data journalistic process, or at least identifying some of the steps involved in a data investigation. But there are many other ways of conceptualising the process, for example finding stories and telling stories…
  • 14. When it comes to finding stories, do we: a) want to find stories in a dataset we are provided with, or b) use data to help draw out a story lead we have already been tipped off to?
  • 15. Anscombe’s Quartet is a toy dataset that first appeared in a 1973 paper by statistician Francis Anscombe. His paper, “Graphs in Statistical Analysis”, was based around the claim that “graphs are essential to good statistical analysis”.
  • 16. But this is where we start to hit some stumbling blocks.
  • 17. And a big stumbling block is one that is often denied in higher education, which is the provision of skills, as compared to “higher level conceptual or academic understanding”. There is an old saw that we become better writers through reading more. But how much time do you invest in reading charts? Really reading them? I came across this beautifully titled book a few weeks ago: “Making Sense of Squiggly Lines”. The blurb on the back summarises the situation well: “Data points are just words, but when connected with a squiggly line they tell a story”.
  • 19. In an ideal world, the process would be simple: have data, get story.
  • 20. But it’s not that simple. It’s more likely that we need to engage with the dataset to try to tease the stories out of it, or to draw from it facts and relationships that we can use to support the claims we make in a narration of some sort of story that is at least supported by the data, or that contextualises the data in a narrative way that is hopefully “truthy”.
  • 21. One of the ways I like to work with data is to have a conversation with it, asking questions of it and then further questions based on the responses I get.
  • 22. Sometimes it looks at first as if we have data in a form where we might be able to do something with it; then we realise it needs cleaning and reshaping. For example, in this case we have percentage signs contaminating numbers; data organised in separate sections, which raises the question of how we get a “well behaved” view over data from all the wards; and different sorts of data, such as votes polled per candidate versus the size of the electorate in a particular ward. Walkthrough: http://blog.ouseful.info/2013/05/03/a-wrangling-example-with-openrefine-making-ready-data/
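The linked walkthrough uses OpenRefine. As a swapped-in illustration of the same two fixes (stripping percentage signs and flattening per-ward sections into one uniform table), here is a Python sketch. The export layout, ward names, candidates and figures are all hypothetical, and the functions `parse_number` and `tidy` are my own names, not part of any tool.

```python
import csv
import io

# Hypothetical export: each ward is a separate section, and vote shares
# carry percentage signs that stop them being read as numbers.
RAW = """Ward A
Electorate,"5,120"
Smith,"1,204",48.1%
Jones,"1,300",51.9%
Ward B
Electorate,"4,870"
Patel,980,40.0%
Brown,"1,470",60.0%
"""

def parse_number(cell):
    """Turn '1,204' or '48.1%' into a float."""
    return float(cell.replace(",", "").rstrip("%"))

def tidy(raw):
    """Flatten per-ward sections into one record per candidate per ward."""
    records, ward, electorate = [], None, None
    for row in csv.reader(io.StringIO(raw)):
        if not row:
            continue
        if len(row) == 1:            # a section header naming the ward
            ward, electorate = row[0], None
        elif row[0] == "Electorate":  # ward-level data, copied onto each record
            electorate = parse_number(row[1])
        else:                         # a candidate row: name, votes, share
            name, votes, share = row
            records.append({
                "ward": ward,
                "candidate": name,
                "votes": parse_number(votes),
                "share_pct": parse_number(share),
                "electorate": electorate,
            })
    return records

records = tidy(RAW)
for record in records:
    print(record)
```

Copying the ward-level electorate onto every candidate row is what makes the two different sorts of data comparable, for example to compute votes as a fraction of the electorate.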
  • 25. Tidying data, or cleaning data, or, more colloquially, “wrangling data”, refers to the process we need to engage in to turn a dataset we have found into one that is usable. Many published datasets are horrible. Really horrible. They don’t work as we might want or expect them to in the applications we tend to have to hand.
  • 26. Take producing data visualisations, for example: have data, produce visualisation. No. That’s like saying: have two hours of rambling conversation with source, have 200-word story with strong quotes. No. Just: no. It doesn’t work like that. Yes, there are powerful charting tools available, BUT they require the data to be clean and tidy and to be in the right shape for the tool. But it typically isn’t.
  • 27. We have to wrangle it. Now, wrangling is a technical job, and arguably a job for technicians (the higher apprentices of the journalistic world), not graduate journalists. But I think our journalists are going to have to learn the equivalent of some machining in the mechanical world.
  • 28. Just by the by, I didn’t draw those block diagrams, I wrote them.
  • 29. I “wrote” these charts; you can see how at the top. That code was applied to a suitably shaped version of a dataset known as Anscombe’s Quartet. The data has been reshaped to a 3-column format: a column for the x values, which are plotted on the horizontal x-axes; a column for the y values, which form the vertical y-axes; and a column for the groups, which specify which panel, or facet, each point should be plotted in. The code defines the construction of those charts. Exactly. There is no magic. At least, no other magic.
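The charting code from the slide isn’t reproduced in this transcript. As a hedged illustration of the reshaping step only, this Python sketch turns a “wide” layout (columns `x1`..`x4` and `y1`..`y4`, the layout R’s built-in `anscombe` data frame uses) into the 3-column x/y/group format. The function name `to_long` is hypothetical, and only the first three observations are included to keep it short.

```python
# First three observations of Anscombe's Quartet in "wide" form:
# one x column and one y column per dataset.
wide = [
    {"x1": 10, "x2": 10, "x3": 10, "x4": 8, "y1": 8.04, "y2": 9.14, "y3": 7.46, "y4": 6.58},
    {"x1": 8, "x2": 8, "x3": 8, "x4": 8, "y1": 6.95, "y2": 8.14, "y3": 6.77, "y4": 5.76},
    {"x1": 13, "x2": 13, "x3": 13, "x4": 8, "y1": 7.58, "y2": 8.74, "y3": 12.74, "y4": 7.71},
]

def to_long(rows, groups=("1", "2", "3", "4")):
    """Reshape wide rows into (x, y, group) records, one record per plotted point."""
    long_rows = []
    for group in groups:
        for row in rows:
            long_rows.append({"x": row["x" + group], "y": row["y" + group], "group": group})
    return long_rows

long_rows = to_long(wide)
for record in long_rows:
    print(record)
```

Once every point carries its own group label, a faceting chart tool can draw one panel per distinct `group` value without any further instructions about which columns belong together.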
  • 30. One of the first datasets I played with was MPs’ expenses data. Here are a couple of ways I started to ch The bar chart is ordered, for a particular expenses area, by total amount for each individual MP. The block histogram shows how many MPs made a total claim in a particular expenses area of a particular
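Both charts described above rest on simple aggregations: a per-MP total for one expenses area, sorted for the bar chart, and a count of MPs per claim band for the block histogram. A minimal sketch; the claim records, MP names, amounts and the £1,000 band width are all hypothetical, and the helper names are mine.

```python
from collections import Counter, defaultdict

# Hypothetical claim records: (MP, expenses area, amount in pounds).
claims = [
    ("MP A", "travel", 1200.0), ("MP B", "travel", 300.0),
    ("MP A", "travel", 450.0), ("MP C", "travel", 2100.0),
    ("MP D", "travel", 800.0), ("MP B", "office", 500.0),
]

def totals_for_area(records, area):
    """Total amount claimed by each MP in one expenses area."""
    totals = defaultdict(float)
    for mp, claim_area, amount in records:
        if claim_area == area:
            totals[mp] += amount
    return dict(totals)

def ordered_bars(totals):
    """MPs sorted by total claim, largest first: the ordering for the bar chart."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

def block_histogram(totals, band_width):
    """How many MPs fall into each claim band (0, band_width, 2*band_width, ...)."""
    bands = Counter(int(total // band_width) * band_width for total in totals.values())
    return dict(sorted(bands.items()))

travel = totals_for_area(claims, "travel")
print(ordered_bars(travel))
print(block_histogram(travel, 1000))
```

The band width is an editorial choice: wider bands smooth the picture, narrower ones can make a single outlying claimant stand out.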
  • 31. Critical judgement: it applies to data too...
  • 32. One of the things to mention about data mapping and visualisation techniques is that they often tell us things we already (think we) know; in that sense, they are not news. But they may also tell us things we know in new, visually appealing ways. And by making use of such ‘confirmatory’ visualisations and displays, we can build confidence within an audience that they know how to interpret these sorts of representation.
  • 33. As the audience becomes comfortable reading charts and making sense of data, when there is something new or surprising in the data, the surprise manifests itself in the reading of the data or chart. For journalists working with data, developing a sense of familiarity with how to interpret and read data when it is just confirming what you already know helps to refine your senses for spotting things that are odd, noteworthy, or newsworthy. Taking a little bit of time each day to: a) read charts as if they were stories; and b) look behind the data to find original sources, such as polls or news releases containing data, and then compare the original release with the way it is reported, paying particular attention to the points that are highlighted and how the data is contextualised; will help you develop some of the skills you will need if you want to be able to identify, develop and treat some of the stories that your specialist source that is data can provide you with, if only you ask…
  • 34. A scatterplot is another very powerful sort of chart: we can plot two sorts of value against each other to Some scatterplot tools allow you to size or colour nodes according to further dimensions. Colouring node
  • 35. Maps can be used to pull out different sorts of relationships. For example, by plotting markers in the centre of each MP’s ward, coloured by the total value of travel expenses claimed in a particular area, we can easily see whether or not an MP is claiming an amount significantly different to MPs in neighbouring wards. In this case (travel expenses) we might expect, at first glance at least, a homophilic effect: folk a similar distance away from Westminster should presumably make similar sorts of travel claim? At second glance, we might then start to refine our questioning: does ward size (in terms of geographical area) or rurality have an effect? Does an MP travel to and from home more than neighbours (or perhaps claim more in terms of accommodation in London)?
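The map makes the neighbour comparison visual; the same question can also be asked numerically by dividing each area’s claim by the average claim of its neighbours. A minimal sketch, assuming we already know which areas border which; the area names, totals, adjacency list and the 1.5 threshold are all hypothetical.

```python
from statistics import mean

# Hypothetical travel-claim totals (pounds) and a hypothetical adjacency
# list saying which areas border which.
travel_totals = {"North": 5200.0, "South": 4900.0, "East": 5100.0, "West": 9800.0}
neighbours = {
    "North": ["East", "West"],
    "South": ["East", "West"],
    "East": ["North", "South"],
    "West": ["North", "South"],
}

def neighbour_ratios(totals, adjacency):
    """Ratio of each area's claim to the mean claim of its neighbouring areas."""
    return {
        area: totals[area] / mean(totals[n] for n in adjacency[area])
        for area in totals
    }

def flag_outliers(ratios, threshold=1.5):
    """Areas claiming at least `threshold` times their neighbours' average."""
    return sorted(area for area, ratio in ratios.items() if ratio >= threshold)

ratios = neighbour_ratios(travel_totals, neighbours)
print(flag_outliers(ratios))
```

A flagged area is a prompt for the second-glance questions above (geographical size, rurality, travel patterns), not evidence of anything by itself.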
  • 36. Sometimes we need to provide quite a lot of explanation when it comes to making sense of even a simple data visualisation: “what am I supposed to be looking at?”
  • 37. The other way of using data is to tell stories. But what does that even mean…?
  • 39. In passing, it’s worth mentioning that one thing statistics does is help provide context. Is this number a big number in the greater scheme of things? Is this thing likely to happen by chance, or is there a meaningful causal relationship between this thing and another thing? The chart in the corner is a reminder of how surprising probabilities can be. It shows the probability (y-axis) that at least two people in a group share a birthday, against the number of people in the group (x-axis). The chart shows that if there are 23 or more people in a room, there is more than a 50/50 chance that two of them will share a birthday (that is, share the same birth day and month, though not necessarily the same birth year). How many people are in the room? If it’s 23 or more, I bet that at least two people share a birthday (at least in terms of day and month).
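The birthday chart can be reproduced from first principles: the chance that nobody shares a birthday is the product of each new person avoiding the birthdays already taken, and the chance of a shared birthday is one minus that. A minimal sketch, assuming 365 equally likely birthdays and ignoring leap days and seasonal variation in births; the function names are mine.

```python
def prob_shared_birthday(n: int, days: int = 365) -> float:
    """Probability that at least two of n people share a birthday."""
    p_all_different = 1.0
    for i in range(n):
        # Person i+1 must avoid the i birthdays already taken.
        p_all_different *= (days - i) / days
    return 1.0 - p_all_different

def smallest_group_over(threshold: float = 0.5, days: int = 365) -> int:
    """Smallest group size whose shared-birthday probability exceeds the threshold."""
    n = 1
    while prob_shared_birthday(n, days) <= threshold:
        n += 1
    return n

print(smallest_group_over())               # 23
print(round(prob_shared_birthday(23), 3))  # 0.507
```

The surprise comes from counting pairs rather than people: 23 people make 253 distinct pairs, each a chance for a match.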