Lincoln jun14datajournalism

Taking this opportunity to explore some of the issues associated with whatever this thing called “data journalism” is…

1

I’m not a journalist, and don’t have any form of journalism training. But I do have an interest in ICT, and from that have an interest in “communication”.

Let’s start with an easy(?!) question - what is journalism?

One way of answering that question is to list some of the functions, or attributes, associated with it – informing, educating, holding to account, watchdog function, campaigning, contextualising for a particular audience.

2

Sensemaking seems to me to be an important part of it… In part contextualisation, in part identifying the bits that make the difference, the bits that make it important, the bits that make it news that people need to know…

…and often with a particular audience in mind.

3

Second question: what is data? National statistics, sports results, polls, financial figures, health data, school league tables, etc etc.

Is a book data? Or a speech? What if I split a speech up into separate words, count the occurrence of each unique word and then display the result as a “tag cloud”, or word frequency diagram?
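The word-counting step described above can be sketched in a few lines of Python. This is only an illustration: the `speech` text is a made-up stand-in rather than a real transcript, and `word_frequencies` is a hypothetical helper name.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Split text into lowercase words and count each unique word."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Hypothetical snippet standing in for a speech transcript.
speech = "We will work, we will build, and we will not stop."
counts = word_frequencies(speech)

# The most frequent words are the ones a tag cloud would draw largest.
for word, n in counts.most_common(3):
    print(word, n)
```

Once the speech has been turned into counts like these, it can be questioned in the same way as any other table of numbers.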

5

One way of thinking about data is that it is a particular sort of source, or a source that can respond to a particular style of questioning in a particular way.

Another take on this is that many “data sources” are experts on a particular topic, experts that know a lot of a very particular class of facts.

6


So what is data journalism?

If I was to ask you, the members of a school of journalism, “is this or that news article ‘journalism’?” I imagine one response might be, “well… It’s the output of a journalistic process.”

But if I point at a map with some markers on it and ask: “is this map ‘data journalism’?”, you might answer: yes. Or at least, that’s what many of the early job ads for data journalists implied.

8

Sports journalism has sport as the topical contextual frame for some journalistic activity,

Political journalism has politics as the topical contextual frame for some journalistic activity,

Investigative journalism has a particular process as the contextual frame for some journalistic activity, a process that may be applied to particular topic areas.

So for data journalism does “data” relate to the topic or the process?

Where we focus on data outputs, then the implication is that the “topic” of data is the focus of the framing. But I think we need to reframe to consider the procedural role.

9

So as a starting point, let’s frame the idea that data journalism is a process-related epithet that implies one of the key sources in a journalistic activity is “data”.

10

By focusing on this notion of data journalism as relating to process, we can then start to explore with a little bit more criticality what the practice of data journalism might involve that identifies it as such.

That is, how is practice influenced by the fact that it must engage with “data as a source”?

12

The inverted pyramid gives us one way of considering the data journalistic process, or at least identifying some of the steps involved in a data investigation.

But there are many other ways of conceptualising the process – for example, finding stories and telling stories…

13

When it comes to finding stories, do we:

a) want to find stories in a dataset we are provided with, or

b) use data to help draw out a story lead we have already been tipped off to?

14

Anscombe’s Quartet is a toy dataset that first appeared in a 1973 paper by statistician Francis Anscombe.

His paper – Graphs in Statistical Analysis – was based around the claim that “graphs are essential to good statistical analysis”.
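To see why the claim bites, here is a small Python sketch that computes summary statistics for the four sets in the quartet. The values are the ones commonly reproduced from the 1973 paper; the helper names `mean` and `pearson` are my own. The four sets should come out with near-identical means and correlations, even though they look nothing like each other when plotted.

```python
from math import sqrt

# Anscombe's Quartet, values as commonly reproduced from the 1973 paper.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# One (mean x, mean y, correlation) triple per set.
summary = {
    name: (mean(xs), mean(ys), pearson(xs, ys))
    for name, (xs, ys) in quartet.items()
}
for name, (mx, my, r) in summary.items():
    print(f"{name}: mean x = {mx:.2f}, mean y = {my:.2f}, r = {r:.3f}")
```

The numbers alone give no hint that the sets differ; only charting them does.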

15

But this is where we start to hit some stumbling blocks.

16

And a big stumbling block is one that is often denied in higher education, which is the provision of skills, as compared to “higher level conceptual or academic understanding”.

There is an old saw that we become better writers through reading more. But how much time do you invest in reading charts?

Really reading them?

I came across this beautifully titled book a few weeks ago - “Making Sense of Squiggly Lines”.

The blurb on the back summarises the situation well: “Data points are just words, but when connected with a squiggly line they tell a story”.

17

In an ideal world, the process would be simple: have data, get story.

19

But it’s not that simple.

It’s more likely that we need to engage with the dataset to try to tease the stories out of it, or facts and relationships from it that we can use to support the claims we make in a narration of some sort of story that is at least supported by the data, or contextualises it in a narrative way that is hopefully “truthy”.

20

One of the ways I like to work with data is to have a conversation with it – asking questions of it and then further questions based on the responses I get.
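As a rough sketch of what such a conversation might look like in code, here is a Python example in which each question is prompted by the answer to the previous one. The `claims` table is entirely made up for illustration, and the "more than twice the typical amount" threshold is an arbitrary assumption, not a recommended test.

```python
from statistics import median

# Hypothetical claims table: (MP, expenses area, total claimed in pounds).
claims = [
    ("MP A", "travel", 1200),
    ("MP B", "travel", 900),
    ("MP C", "travel", 4800),
    ("MP A", "office", 3000),
    ("MP C", "office", 2500),
]

# Question 1: what did each MP claim for travel?
travel = {mp: amount for mp, area, amount in claims if area == "travel"}

# Question 2, prompted by the answer: what is a typical travel claim?
typical = median(travel.values())

# Question 3: who claimed more than twice the typical amount?
outliers = [mp for mp, amount in travel.items() if amount > 2 * typical]
print(outliers)
```

The answer to the third question is not a story in itself, just a prompt for the next round of questions.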

21

Sometimes it looks at first as if we have data in a form where we might be able to do something with it – then we realise it needs cleaning and reshaping.

For example, in this case we have percentage signs contaminating numbers, data organised in separate sections – but how do we get a “well behaved” view over data from all the wards – and different sorts of data: votes polled per candidate versus the size of the electorate in a particular ward for example.

Walkthrough: http://blog.ouseful.info/2013/05/03/a-wrangling-example-with-openrefine-making-ready-data/
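The walkthrough linked above uses OpenRefine. Purely as an illustration of the same sort of clean-up, here is a Python sketch that strips stray percentage signs so the values can be treated as numbers. The rows, ward names and the `parse_percentage` helper are all hypothetical.

```python
def parse_percentage(raw):
    """Turn a string such as ' 34.5% ' into the float 34.5."""
    return float(raw.strip().rstrip("%").strip())


# Hypothetical rows standing in for one ward's section of a results table.
rows = [
    {"ward": "Ward A", "candidate": "Candidate 1", "share": "34.5%"},
    {"ward": "Ward A", "candidate": "Candidate 2", "share": " 65.5 %"},
]

# Keep every other field as it is, replacing only the contaminated column.
cleaned = [{**row, "share": parse_percentage(row["share"])} for row in rows]

for row in cleaned:
    print(row["ward"], row["candidate"], row["share"])
```

Keeping the ward name on every row, rather than as a section heading, is what gives a single view over all the wards.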

22


Tidying data – or cleaning data – or more colloquially, “wrangling data” – refers to the process we need to engage in to turn a dataset we have found into one that is useable.

Many published datasets are horrible.

Really horrible.

They don’t work as we might want or expect them to in the applications we tend to have to hand.

25

Take producing data visualisations, for example: have data, produce visualisation.

No.

That’s like saying: have two hours of rambling conversation with source, have 200 word story with strong quotes.

No. Just: no.

It doesn’t work like that.

Yes, there are powerful charting tools available BUT they require the data to be clean and tidy and to be in the right shape for the tool. But it typically isn’t.

26

We have to wrangle it.

Now wrangling is a technical job, and arguably a job for technicians – higher apprentices of the journalistic world – not graduate journalists.

But I think our journalists are going to have to learn the equivalent of some machining in the mechanical world.

27

Just by the by, I didn’t draw those block diagrams, I wrote them.

28

I “wrote” these charts – you can see how at the top. That code was applied to a suitably shaped version of a dataset known as Anscombe’s Quartet.

The data has been reshaped to 3 column format: a column for the x values, that are plotted on the horizontal x-axes; a column for the y values, that form the vertical y-axes; and a column for the groups, which specify which panel, or facet, each point should be plotted in.

The code defines the construction of those charts. Exactly. There is no magic. At least, no other magic.
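The charting code on the slide is not reproduced in these notes, so the following is only a Python sketch of the reshaping step described above, going from a "wide" layout to the 3 column x, y, group format. The column names (`x1`, `y1` and so on) and the `wide_to_long` helper are assumptions, and only two rows for two of the groups are shown.

```python
# Two illustrative rows from a "wide" layout: one x/y pair of columns per group.
wide = [
    {"x1": 10, "y1": 8.04, "x2": 10, "y2": 9.14},
    {"x1": 8, "y1": 6.95, "x2": 8, "y2": 8.14},
]


def wide_to_long(rows, groups):
    """Reshape wide rows into one (x, y, group) record per plotted point."""
    long_rows = []
    for group in groups:
        for row in rows:
            long_rows.append(
                {"x": row[f"x{group}"], "y": row[f"y{group}"], "group": group}
            )
    return long_rows


long_rows = wide_to_long(wide, groups=["1", "2"])
for record in long_rows:
    print(record)
```

With the data in this shape, a charting tool only needs to be told which column is x, which is y, and which one picks the panel.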

29

One of the first datasets I played with was MPs’ expenses data. Here are a couple of ways I started to ch

The bar chart is ordered, for a particular expenses area, by total amount for each individual MP.

The block histogram shows how many MPs made a total claim in a particular expenses area of a particular

Critical judgement – it applies to data too...

31

One of the things to mention about data mapping and visualisation techniques is that they often tell us things we already (think we) know; in that sense, they are not news. But they may also tell us things we know in new, visually appealing ways. And by making use of such ‘confirmatory’ visualisations and displays we can build confidence within an audience that they know how to interpret these sorts of representation.

32

As the audience becomes comfortable reading the charts and making sense of data, when there is something new or surprising in the data, the surprise manifests itself in the reading of the data or chart.

For journalists working with data, developing a sense of familiarity with how to interpret and read data when it is just confirming what you already know helps to refine your senses for spotting things that are odd, noteworthy, or newsworthy.

Taking a little bit of time each day to:

- read charts as if they were stories;

- look behind the data to find original sources, such as polls or data containing news releases, and then compare the original release with the way it is reported, paying particular attention to the points that are highlighted, and how the data is contextualised;

will help you develop some of the skills you will need if you want to be able to identify, develop and treat some of the stories that your specialist source that is data can provide you with, if only you ask…

33

A scatterplot is another very powerful sort of chart – we can plot two sorts of value against each other to

Some scatterplot tools allow you to size or colour nodes according to further dimensions. Colouring node

Maps can be used to pull out different sorts of relationships – for example, plotting markers in the centre of each MP’s ward coloured by the total value of travel expenses claim in a particular area, we can easily see whether or not an MP is claiming an amount significantly different to MPs in neighbouring wards. In this case – travel expenses – we might expect (at first glance at least) a homophilitic effect – folk a similar distance away from Westminster should presumably make similar sorts of travel claim? At second glance, we might then start to refine our questioning – does ward size (in terms of geographical area) or rurality have an effect? Does an MP travel to and from home more than neighbours (or perhaps claim more in terms of accommodation in London?)

35

Sometimes we need to provide quite a lot of explanation when it comes to making sense of even a simple data visualisation – “what am I supposed to be looking at?”

36

The other way of using data is to tell stories. But what does that even mean…?

37


In passing, it’s worth mentioning that one thing statistics does is help provide context.

Is this number a big number in the greater scheme of things? Is this thing likely to happen by chance or is there a meaningful causal relationship between this thing and another thing?

The chart in the corner is a reminder about how surprising probabilities can be. The chart shows the probability (y-axis) that two people share a birthday (the number of people is given on the x-axis). The chart shows that if there are 23 or more people in a room, there is more than a 50/50 chance that two of them will share a birthday (that is, share the same birth day and month, though not necessarily same birth year).

How many people are in the room? If it’s more than 23 – I bet that at least two people share a birthday (at least in terms of day and month).
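The curve on that chart can be reproduced with a short calculation. The Python sketch below assumes 365 equally likely birthdays and ignores leap years, which is the usual simplification; the function name is my own. It works out the chance that everyone has a different birthday and subtracts that from 1.

```python
def shared_birthday_probability(people, days=365):
    """Probability that at least two of `people` share a birthday.

    Assumes every day of the year is equally likely and ignores 29 February.
    """
    p_all_different = 1.0
    for i in range(people):
        # Person i+1 must avoid the i birthdays already taken.
        p_all_different *= (days - i) / days
    return 1.0 - p_all_different


# Print the probability for a few room sizes around the tipping point.
for n in (10, 22, 23, 30):
    print(n, round(shared_birthday_probability(n), 3))
```

Under these assumptions the probability first passes one half at 23 people, which is where the bet above comes from.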

39

