Appropri-ut, in the sense of not inappropriate – the “right thing” to use – as well as
appropri-ate, as in co-opt, or use for something it perhaps wasn’t originally intended
for.
1
So for example, one thing I do is appropriate openly licensed media resources for my
own slides.

In this case, I want to set the scene for this presentation as one in which I haven’t
been afraid to get my hands dirty, but I have also played with and explored a
particular medium – in this case, various digital technologies – and created my own
things which may also, ultimately, be of direct use to others.

You might also say they’re at best half-baked, if not completely unbaked ;-)
2
The tools I’m going to talk about are situated within a data context. I spend a lot of
time playing with openly licensed datasets, working across the whole data pipeline.

This example, taken from the third year undergrad equivalent OU course TM351
“Data Management and Analysis”, provides a simplistic view of some of the processes
involved in working with data.

(We all know it’s not quite that straightforward, and often involves a lot of iteration
or backtracking, but as well as “The role of the academic [making] everything less
simple”, as Mary Beard put it in an Observer interview a few weeks ago, the
academic also simplifies and idealises through abstraction and revisionist storytelling,
particularly when it comes to describing processes.)

So what I plan to do is spend a few minutes showing you some of the tools and
emerging approaches I use working across the various steps of this pipeline.
3
So – the first thing to note is that I’m a technology optimist: I believe technology can
help make our lives simpler, even if at first it may look as if we are making it more
complex by introducing yet more tools to learn – and install on computers that our IT
department would rather we left under their control.

Taking control of your computing destiny is another theme of this talk…

In this example, the box diagram I showed on the first line was /written/ rather than
drawn. If I want to add steps, or have sub-branches added to the diagram, I don’t
need to start faffing around in PowerPoint or Word figures trying to line things up
and get them sized right and so on.

I let the machine do it.

In this particular online tool (you can see the URL in the screenshot at the top of the
slide – I’ll pop a copy of the annotated slides online, and also let Alan have a copy) –
so, in this particular tool, blockdiag, there are other diagram types available.

The underlying code is also open source and available as a Python package, so you can
write diagrams such as these in a Jupyter notebook, for example.

I’ll have more to say about Jupyter notebooks later.
4
One other point to note – and a bit of blatant self-promotion here – most of the
individual slides within this talk are backed up by one or more posts on my personal
blog, OUseful.Info.

I’ve been writing this blog for many years and it represents a reasonably complete
notebook of a lot of the ideas I’ve explored over that time.

In many cases, the posts are comprehensive and self-contained: they record all the
steps I took to do something in case I need to remind myself later.
5
So, the pipeline.

The first step, acquisition, relates to how we get hold of data. This may be from
downloaded data files – Excel spreadsheet documents (which are actually zip files –
you know you can change the xlsx suffix to zip and unzip them, right? Same with docx
Word document files and pptx PowerPoint files), databases, or online APIs (application
programming interfaces) – or it may be scraped from other sorts of document: web
pages, for example, or PDF documents (even though PDF documents are horrible, it’s
often quite easy to extract data tables from them).
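
The point about xlsx, docx and pptx files being zip archives can be checked without renaming anything. The following is a minimal sketch using Python's standard zipfile module; the function name list_office_parts is made up for illustration and is not from the talk.

```python
import zipfile


def list_office_parts(path):
    """Return the names of the files packed inside an xlsx, docx or pptx document.

    `path` may be a filename or a file-like object opened in binary mode.
    """
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()
```

Calling list_office_parts on a spreadsheet of your own (for example a hypothetical "example.xlsx") should list the XML parts that make up the workbook.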
	
I’m not going to talk about the mechanics of scraping, but journalism lecturer Paul
Bradshaw has a good intro to a variety of tools and techniques in his Leanpub book
“Scraping for Journalists”.
6
I will briefly mention a couple of tools I use though – morph.io is a site hosted by an
Australian open data group that is actually a fork of a tool by UK Liverpudlian start-up,
ScraperWiki.
	
Morph.io will run a scraper of your own writing, hosted on GitHub, once a day and pop
the results into a SQLite database that you can download.
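
As a rough sketch of the "pop the results into a SQLite database" step, the function below writes scraped rows to a SQLite file using only the standard library. The file name data.sqlite and table name data follow what I understand to be the morph.io convention, so treat them as assumptions. The ref and applicant columns are hypothetical and are not taken from the scraper on the slide.

```python
import sqlite3


def save_rows(rows, db_path="data.sqlite"):
    """Save scraped rows (dicts with 'ref' and 'applicant' keys) to a SQLite file.

    Rows are keyed on 'ref', so re-running the scraper each day updates
    existing records rather than duplicating them.
    """
    conn = sqlite3.connect(db_path)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS data (ref TEXT PRIMARY KEY, applicant TEXT)"
        )
        conn.executemany(
            "INSERT OR REPLACE INTO data (ref, applicant) VALUES (:ref, :applicant)",
            rows,
        )
    conn.close()
```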
	
The slide shows a scraper I use for scraping licence applications made to the Isle of
Wight Council.
7
Another tool I use a lot is Tabula. Tabula is a Java application with a browser-based
user interface that will extract data tables from PDF documents.

You simply drag to select the area of the page you want to scrape (you can mirror the
same area over multiple pages or define different areas on each).
8
The heart of the application is actually a command line engine, recently wrapped by
the R tabulizer package.

This means you can automate the use of Tabula in order to scrape tabular data from
PDF documents within R, getting the data back as an R data frame.

That’s tabulizer – very nice; and the developer (on GitHub) is quite responsive.
9
Another tool I use from time to time is Apache Tika – this can extract text from PDFs,
Word documents and so on, as well as from images.

There are quite a few online OCR services now, many of them appearing as part of
“AI toolsets”, offering a range of commodity AI API services – IBM, Microsoft and
Google all have them, for example.

So as well as OCR text extraction, they do face and emotion detection in images,
semantic tagging / entity labelling within documents, automatic image tagging,
speech to text, and so on. All with varying degrees of success. But all of them steadily
improving.
10
After data acquisition, we’re often faced with cleaning a dataset. A tool I use for
cleaning data is another Java application, again accessed via a browser, called
OpenRefine.

OpenRefine will open a wide range of document types – spreadsheets, CSV or tabbed
data files, XML, JSON, HTML – either locally or from the web, and present the data in a
spreadsheet-style UI.

A wide range of options are provided for applying a particular transformation to each
cell in a particular column – you can also script your own in a custom scripting
language, or Python – as well as tools for faceting and filtering the display of rows
based on values within one or more columns.

The clustering tools are useful for finding and correcting partial matches – so for
example, you can normalise MyCo Ltd, with MyCo Ltd., with MyCo Limited, and so
on.
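
To give a flavour of how this kind of clustering can work, here is a minimal sketch of a "fingerprint" key: names that reduce to the same key are grouped as likely variants. It is loosely modelled on the key collision idea and is not OpenRefine's own code. The function names are made up for illustration.

```python
import string
from collections import defaultdict


def fingerprint(name):
    """Reduce a name to a key: lowercase, no punctuation, unique tokens sorted."""
    cleaned = name.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(cleaned.split())))


def cluster(names):
    """Group names sharing a fingerprint; keep only groups with distinct variants."""
    groups = defaultdict(list)
    for name in names:
        groups[fingerprint(name)].append(name)
    return [group for group in groups.values() if len(set(group)) > 1]
```

With this key, "MyCo Ltd" and "MyCo Ltd." fall into the same group, but "MyCo Limited" does not. Catching that variant needs a looser method, such as a nearest-neighbour match, or an abbreviation lookup.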
11
OpenRefine can also provide support for a limited range of data reshaping actions.
I’ve described a few of them in this post, which takes a messy local election results
data set and shows how to clean and reshape it.

OpenRefine also has a templated export – so we can generate simple ‘line at a time’
reports from a filtered dataset.
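
The ‘line at a time’ idea is easy to sketch outside OpenRefine too: fill a text template once per row. The snippet below uses plain Python string formatting rather than OpenRefine's own template syntax, and the field names in the example are invented.

```python
def rows_to_lines(rows, template):
    """Fill `template` once per row, using the row's keys as placeholder names."""
    return [template.format(**row) for row in rows]
```

For example, with a row such as {"ward": "Northtown", "candidate": "A. Example", "votes": 123} and the template "{candidate} polled {votes} votes in {ward}.", each row of the filtered dataset becomes one sentence of the report.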
12
One of the things I try to look for in applications is whether they are open source and
whether they provide a browser-based UI – if you can use it via a browser, you should
be able to use it on your own local machine or from a remotely hosted version
accessed over the web.

OpenRefine meets both these criteria, which means it’s no problem for someone like
IBM to make it available via their Data Scientist Workbench site. (It’s also not too hard
to roll your own version of something like this site.)

The other tools currently provided by this site are RStudio, a powerful – and friendly
– IDE for the R programming language, and Jupyter notebooks.
13
One reason why it’s getting easier to expose these applications over the web in a
scalable way is through containerisation. Containerisation is a form of application
virtualisation where one or more applications can be wired together and isolated from
each other within a multi-tenanted virtual machine.

Containers offer the promise of being able to “run anywhere” – or at least,
anywhere where the container platform can operate. Docker is the most popular
route to this at the moment.

The application shown here is called Kitematic. It lets you search for public application
containers, and download them and run them locally on your own computer.

The example shows various containers I’ve put together for OpenRefine (some are
different versions, others are experiments / demos I really should delete).

So rather than install Java on your computer and then download and install
OpenRefine, you can just one-click in Kitematic and it will get a prepackaged
OpenRefine container for you that includes all that OpenRefine needs to run.
14
One of the spin-offs from the early days of OpenRefine was the notion of a
“reconciliation service”, whereby you could look up each item in an OpenRefine
column against a web service that would try to match it to – reconcile it with – a
known entity. A partial / fuzzy matching lookup against a controlled vocabulary,
essentially. OpenCorporates, the open data international company lookup service,
offers a reconciliation endpoint.
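
The core of such a lookup can be sketched in a few lines. This is not the reconciliation service protocol itself, just the fuzzy matching step against a controlled vocabulary, done here with Python's standard difflib module. The function name, the cutoff value and the example vocabulary are all assumptions.

```python
import difflib


def reconcile(value, vocabulary, cutoff=0.6):
    """Return up to three (canonical name, score) candidates, best match first."""
    # Match case-insensitively, but report the canonical spelling.
    lookup = {entry.lower(): entry for entry in vocabulary}
    query = value.lower()
    matches = difflib.get_close_matches(query, list(lookup), n=3, cutoff=cutoff)
    return [
        (lookup[m], round(difflib.SequenceMatcher(None, query, m).ratio(), 2))
        for m in matches
    ]
```

Given a vocabulary such as ["Isle of Wight Council", "Southampton City Council"], a misspelt query like "isle of wight counsil" should come back with the canonical name as its top candidate, while a query with nothing in common returns an empty list.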
	
It’s easy enough to package up your own lookup tables and this recipe describes how
to do it using a homebrewed reconciliation container. I did one for MPs, for
example.
15
Just as an aside, when putting together reconciliation services, we ideally want a
canonical list of entities or entity names we want to reconcile against.

Registers can be a good source of these. But it’s also worth noting that registers can
also be used to generate derived datasets. For example, I wanted a list of UK prisons
with location information.

In the absence of a single openly licensed dataset with this information (a
website with one prison per page was the closest I found, which I could have scraped
but chose not to), I instead did a lookup via the Food Standards Agency, which has
inspection information for public food outlets. (Another source might have been the
CQC, with a search for health surgeries or dental treatment centres, filtered by
“HMP” or “prison”.)
	
16
RStudio is another application that can be freely redistributed and exposed via a
browser.

These posts show how to run an RStudio application in the cloud using a simple
container management dashboard formerly known as Tutum, now available as
Docker Cloud.

I’ve also described how to package a Shiny application in a container so you can
deploy it anywhere.

Does anyone use Shiny? Shiny is a rapid prototyping tool for building browser-based,
HTML5 interactive applications and dashboards – RStudio released a new
dashboarding framework over the last couple of weeks – that makes it relatively easy
to build interactive data exploration tools against an R environment.
17
One really nice component of the Docker ecosystem is docker-compose, formerly
known as fig, which allows you to orchestrate the launch of several interlinked
containers, so you can easily access one from another.

The example here shows how to link RStudio and Jupyter notebooks to a Neo4j
database.
18
I’ve mentioned Jupyter a few times – does anyone use Jupyter notebooks? IPython
notebooks?

The browser-based notebook UI lets you enter text (as markdown) and executable
code (in a variety of languages) and then run the code and display the results of the
code execution back in the notebook.

One thing I’ve been exploring recently is a way of calling command line application
functions packaged in a container from a notebook cell, and returning the output of
the containerised command line function as a shared file.

This post describes how I package the ContentMine tools – a set of tools for
harvesting scientific journal papers and extracting knowledge from them, which are
a real pain to set up normally – and then use them via a notebook.
19
Just by the by, if you want to try the notebooks out, there’s a live demo available. (I
also did a post on “Seven Ways to Run Jupyter Notebooks” which describes several
other alternative ways of running the notebooks.)

The code example here shows all the code needed to open an Excel file containing
average travel times to GP surgeries by LSOA, filter the data down to a particular local
authority area, pull in an openly licensed GeoJSON shapefile for that area, and then
plot (and embed) an interactive choropleth map via the folium Python package (using
Google Maps, I think, though it may be OpenStreetMap?).
20
One problem with producing interactive maps is that sometimes you actually want an
image.

It turns out that web testing frameworks like Selenium make it easy to grab
screenshots from test pages rendered in a test browser, so I co-opted the idea to
produce a routine that lets me grab a PNG snapshot of a map.
21
That example was actually created for a side project I dabbled in with our
hyperlocal news outlet on the Isle of Wight, called OnTheWight.

OnTheWight have been reporting monthly job figures for years, so I thought I’d have a
go at automating the production of the reports from Nomis data, as well as producing
a few charts.

The report is just a literal reporting, although I do try to add some colour and a tiny
amount of analysis, for example by using directional and magnitude terms – “the
numbers went UP SLIGHTLY from last month, although they are SIGNIFICANTLY
DOWN from the same time last year”. And so on.
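
A minimal sketch of how directional and magnitude terms like these might be chosen is shown below. It is not the code behind the OnTheWight reports: the function names and the 2% and 10% thresholds are assumptions picked for illustration.

```python
def change_terms(current, previous, slight=0.02, significant=0.10):
    """Return (direction, magnitude) words for the relative change between two counts."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    change = (current - previous) / previous
    if change == 0:
        return ("UNCHANGED", "")
    direction = "UP" if change > 0 else "DOWN"
    size = abs(change)
    if size < slight:
        magnitude = "SLIGHTLY"
    elif size >= significant:
        magnitude = "SIGNIFICANTLY"
    else:
        magnitude = ""  # a middling change gets no qualifier
    return (direction, magnitude)


def jobs_sentence(this_month, last_month, last_year):
    """Build one report sentence comparing this month with last month and last year."""
    m_dir, m_mag = change_terms(this_month, last_month)
    y_dir, y_mag = change_terms(this_month, last_year)
    monthly = " ".join(part for part in ("went", m_dir, m_mag) if part)
    yearly = " ".join(part for part in ("are", y_mag, y_dir) if part)
    return (f"The numbers {monthly} from last month, although they "
            f"{yearly} from the same time last year.")
```

A fuller version would also vary the connective, since "although" only reads well when the two comparisons point in different directions.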
22
On my own site, I started trying to pull out some geographical insight, automatically
reporting on areas with noticeably high unemployment compared to other areas by
gender.

The map does look like a population map, but the unemployment rate is actually
higher in some of the more heavily populated areas!
23
Just a side note – the idea of being able to build something once then deploy it more
widely for no extra effort really appeals to me.

In the case of national datasets broken down to local level, building a solution for a
local area you know about and understand helps get you started on automatically
detecting and pulling out stories or features – but the same code can then run for
other areas.
24
The pain points often come in splitting the data down to local areas and then
generating the stories.
25
But if you automate a pain point away for one local area, you’ve solved the problem
for all of them.

The approach I’ve been taking is to think in terms of producing press releases rather
than finished stories, relying on the journalist, or some other editorial role, to
act as the final arbiter of the quality and relevance of the press release style
communication.

The implication is also that more work needs to be done checking and working up the
press release for the final story (if, indeed, there is any story).
26
So picking up on this idea of reuse – or laziness – the Nomis data to text engine can
be easily wrapped to provide a conversational UI for it.

In this example, I can ask the service for the latest JSA figures in a particular area.
Although not shown, you can put in a postcode, for example, and get the figures back
for the local authority area containing that postcode.

At the time I did this demo, I was half thinking of trying to persuade Johnston Press to
give me some pin money to play with, so I scraped a list of Johnston Press papers,
found the postcodes of their offices, and used them as the basis for a lookup of jobless
figures by newspaper title area.
	
27
Having got some machinery set up to work with Slack, I could also use it as an
interface for a simple “spreadsheet row to paragraph of text” toy I was trying to put
together.

So here, for example, I’m looking up latest figures for CQC care home inspections.
(Actually, I think this is based on a scraper of the CQC website rather than a data file
download.)
28
The original experiments had the Slack bot code running on my personal computer.
More recently, I started looking at how things like AWS Lambda functions,
essentially serverless remote procedure calls, could be used to host the bot.

The examples here make use of the UK Parliament API to provide the content,
allowing me to look up recent reports, or committee memberships, for example.
29
The data2text area is a rich one, and one thing I find reflecting on my own
exploratory data activities is that I often look to charts (which are often custom,
multilayered charts of my own devising – ggplot is great for that) for inspiration.

Working in education, where we have a legal requirement to make our teaching
materials accessible, charts and figures often require written descriptions.

So one thing I’ve started wondering recently is whether we can introspect on chart
objects created using things like ggplot as a “data basis” for a textualisation of the
chart components (and then do data2text analysis for the simple analytics insight
reporting).

And it seems we can – ggplot chart objects, for example, have a ggplot_build()
introspector, and we can also get access directly to chart objects.
30
When I posted about my ggplot2text experiment, I idly wondered whether we could
do the same for matplotlib chart objects. And it seems we can, as this demo shared
by a commenter shows.

#Lazyweb FTW, you might say :-)
31
As I was looking at the Parliament API backend for a simple conversational search
agent, the ONS Beta website became the live site. One of the nice things about the
new ONS site is that a JSON feed alternative is available for much of the HTML
content on the site.

Which means we can repurpose that website content directly as a response to a
conversational search.
32
Finally, I want to return to the Jupyter ecosystem.

I absolutely love the notebook environment: it provides a great environment for
writing literate, reproducible data analysis scripts (several news outlets are starting to
publish Jupyter notebooks showing the analysis behind their news stories – BuzzFeed
is a great example of this, as with their recent tennis match fixing / betting scam
story), as well as providing a great environment for documenting exploratory data
analyses.

But the Jupyter ecosystem is already much richer than that.

I haven’t described the dashboard toolkit for creating live dashboards, the slideshow
view that lets you create interactive slides with live code execution, the range of
programming language kernels (not just Python and R) or the kernel wrapper that lets
you define an API via a notebook.

But I do just want to quickly mention remote kernels.
33
At the moment, we’re rewriting a day long residential school activity that
uses Lego robots. Until this year, we’ve used the original yellow Lego Mindstorms
RCX brick. This year, we’re using the Lego EV3 brick, which has wifi and can be set up
to run Linux and a Python shell that can access the robot’s bits.

The approach I’ve been exploring is to run a remote IPython kernel on the brick, and
a Jupyter server on a desktop machine, and then connect a notebook to the remote
kernel via the Jupyter server.

Running the notebook server on the desktop machine removes the load of running the
server from the brick. (The same approach can be – and is – used to run large tasks on
supercomputer clusters.)

The notebooks also allow us to create simple interactive UIs – just like R has the Shiny
framework, the Jupyter notebooks can run interactive ipywidgets directly wired to
Python state. In the example above, I have a slider for controlling motor speed, for
example (actually, the duty cycle of the stepper motor) and another that displays the
value being seen by a particular sensor. (Again, there’s a tiny element of simplistic
data2text contextualisation in the display.)
		
34
So that’s me done.

Some of the tools and technologies that I think are appropriate for, or can be
appropriated for, data related tasks.

Sometimes a pen will do as well as a spoon.
35
And finally, a last bit of blatant self-promotion.

In the same way that maths has recreational maths – fun puzzles in the Sunday
papers – I engage in recreational data activities. And as with the blog, I keep a record
of what I’ve done.

Several years ago, I started to learn R, and used Formula One results and timing
sheets data as context for that. Over the years, I’ve pulled various tricks and
techniques together into this evolving book. (Actually, the book was also another
experiment – Leanpub encourages you to publish as you write, and uses markdown
for the manuscript. I was looking for an opportunity to explore whether we might be
able to use something like RStudio, and in particular Rmd (R Markdown), for authoring
OU course materials, so this gave me a reason – and a context – for exploring such a
workflow.)

It’s still a work in progress, but at over 400 pages already it represents a reasonably
deep dive into the different things you can do with a limited range of datasets on a
particular topic, as well as exploring a variety of ways of using – and appropriating – R
to help us find stories in data.
36

Contenu connexe

Similaire à Gors appropriate

My self assessment
My self assessmentMy self assessment
My self assessmentjcmahoney76
 
5 tactics for practical privacy protection
5 tactics for practical privacy protection5 tactics for practical privacy protection
5 tactics for practical privacy protectionAmber Macintyre
 
Choose Boring Technology
Choose Boring TechnologyChoose Boring Technology
Choose Boring TechnologyDan McKinley
 
A Few of My Favorite Tools
A Few of My Favorite ToolsA Few of My Favorite Tools
A Few of My Favorite ToolsShimon Shmueli
 
Reading response #3
Reading response #3Reading response #3
Reading response #3cfregoso
 
Sourceress cover letter
Sourceress cover letterSourceress cover letter
Sourceress cover letterTala Shivute
 
Collabtipskennedymighellts09
Collabtipskennedymighellts09Collabtipskennedymighellts09
Collabtipskennedymighellts09denniskennedy
 
English for Computer Unit 1 Introduction
English for Computer Unit 1 IntroductionEnglish for Computer Unit 1 Introduction
English for Computer Unit 1 Introductionanchalee khunseesook
 
Respond to two of your colleagues in one or more of the following .docx
Respond to two of your colleagues in one or more of the following .docxRespond to two of your colleagues in one or more of the following .docx
Respond to two of your colleagues in one or more of the following .docxaudeleypearl
 
Respond to two of your colleagues in one or more of the following .docx
Respond to two of your colleagues in one or more of the following .docxRespond to two of your colleagues in one or more of the following .docx
Respond to two of your colleagues in one or more of the following .docxwilfredoa1
 
Arduino lessons learned
Arduino lessons learnedArduino lessons learned
Arduino lessons learnedBryce Roberts
 
Formal vs informal.pptx
Formal vs informal.pptxFormal vs informal.pptx
Formal vs informal.pptxFionaKee3
 
TIP OF THE DAY series about DIP
TIP OF THE DAY series about DIPTIP OF THE DAY series about DIP
TIP OF THE DAY series about DIPDarshana Samanpura
 
Hacker High School-Book 01- being_a_hacker
Hacker High School-Book 01- being_a_hackerHacker High School-Book 01- being_a_hacker
Hacker High School-Book 01- being_a_hackerBons Ju
 

Similaire à Gors appropriate (17)

My self assessment
My self assessmentMy self assessment
My self assessment
 
5 tactics for practical privacy protection
5 tactics for practical privacy protection5 tactics for practical privacy protection
5 tactics for practical privacy protection
 
Choose Boring Technology
Choose Boring TechnologyChoose Boring Technology
Choose Boring Technology
 
A Few of My Favorite Tools
A Few of My Favorite ToolsA Few of My Favorite Tools
A Few of My Favorite Tools
 
Paradox of the Active User
Paradox of the Active UserParadox of the Active User
Paradox of the Active User
 
Reading response #3
Reading response #3Reading response #3
Reading response #3
 
Sourceress cover letter
Sourceress cover letterSourceress cover letter
Sourceress cover letter
 
Collabtipskennedymighellts09
Collabtipskennedymighellts09Collabtipskennedymighellts09
Collabtipskennedymighellts09
 
English for Computer Unit 1 Introduction
English for Computer Unit 1 IntroductionEnglish for Computer Unit 1 Introduction
English for Computer Unit 1 Introduction
 
Respond to two of your colleagues in one or more of the following .docx
Respond to two of your colleagues in one or more of the following .docxRespond to two of your colleagues in one or more of the following .docx
Respond to two of your colleagues in one or more of the following .docx
 
Respond to two of your colleagues in one or more of the following .docx
Respond to two of your colleagues in one or more of the following .docxRespond to two of your colleagues in one or more of the following .docx
Respond to two of your colleagues in one or more of the following .docx
 
Arduino lessons learned
Arduino lessons learnedArduino lessons learned
Arduino lessons learned
 
Formal vs informal.pptx
Formal vs informal.pptxFormal vs informal.pptx
Formal vs informal.pptx
 
Ass6
Ass6Ass6
Ass6
 
TIP OF THE DAY series about DIP
TIP OF THE DAY series about DIPTIP OF THE DAY series about DIP
TIP OF THE DAY series about DIP
 
Hacker High School-Book 01- being_a_hacker
Hacker High School-Book 01- being_a_hackerHacker High School-Book 01- being_a_hacker
Hacker High School-Book 01- being_a_hacker
 
C programming guide new
C programming guide newC programming guide new
C programming guide new
 

Plus de Tony Hirst

15 in 20 research fiesta
15 in 20 research fiesta15 in 20 research fiesta
15 in 20 research fiestaTony Hirst
 
Jupyternotebooks ou.pptx
Jupyternotebooks ou.pptxJupyternotebooks ou.pptx
Jupyternotebooks ou.pptxTony Hirst
 
Virtual computing.pptx
Virtual computing.pptxVirtual computing.pptx
Virtual computing.pptxTony Hirst
 
ouseful-parlihacks
ouseful-parlihacksouseful-parlihacks
ouseful-parlihacksTony Hirst
 
Gors appropriate
Gors appropriateGors appropriate
Gors appropriateTony Hirst
 
Notes on the Future - ILI2015 Workshop
Notes on the Future - ILI2015 WorkshopNotes on the Future - ILI2015 Workshop
Notes on the Future - ILI2015 WorkshopTony Hirst
 
Community Journalism Conf - hyperlocal data wire
Community Journalism Conf - hyperlocal data wireCommunity Journalism Conf - hyperlocal data wire
Community Journalism Conf - hyperlocal data wireTony Hirst
 
Residential school 2015_robotics_interest
Residential school 2015_robotics_interestResidential school 2015_robotics_interest
Residential school 2015_robotics_interestTony Hirst
 
Data Mining - Separating Fact From Fiction - NetIKX
Data Mining - Separating Fact From Fiction - NetIKXData Mining - Separating Fact From Fiction - NetIKX
Data Mining - Separating Fact From Fiction - NetIKXTony Hirst
 
A Quick Tour of OpenRefine
A Quick Tour of OpenRefineA Quick Tour of OpenRefine
A Quick Tour of OpenRefineTony Hirst
 
Conversations with data
Conversations with dataConversations with data
Conversations with dataTony Hirst
 
Data reuse OU workshop bingo
Data reuse OU workshop bingoData reuse OU workshop bingo
Data reuse OU workshop bingoTony Hirst
 
Inspiring content - You Don't Need Big Data to Tell Good Data Stories
Inspiring content - You Don't Need Big Data to Tell Good Data Stories Inspiring content - You Don't Need Big Data to Tell Good Data Stories
Inspiring content - You Don't Need Big Data to Tell Good Data Stories Tony Hirst
 
Lincoln Journalism Research Day - Data Journalism
Lincoln Journalism Research Day - Data JournalismLincoln Journalism Research Day - Data Journalism
Lincoln Journalism Research Day - Data JournalismTony Hirst
 
Hestia linear tales
Hestia linear talesHestia linear tales
Hestia linear talesTony Hirst
 
Hestia linear tales
Hestia linear talesHestia linear tales
Hestia linear talesTony Hirst
 

Plus de Tony Hirst (20)

15 in 20 research fiesta
15 in 20 research fiesta15 in 20 research fiesta
15 in 20 research fiesta
 
Dev8d jupyter
Dev8d jupyterDev8d jupyter
Dev8d jupyter
 
Ili 16 robot
Ili 16 robotIli 16 robot
Ili 16 robot
 
Jupyternotebooks ou.pptx
Jupyternotebooks ou.pptxJupyternotebooks ou.pptx
Jupyternotebooks ou.pptx
 
Virtual computing.pptx
Virtual computing.pptxVirtual computing.pptx
Virtual computing.pptx
 
ouseful-parlihacks
ouseful-parlihacksouseful-parlihacks
ouseful-parlihacks
 
Gors appropriate
Gors appropriateGors appropriate
Gors appropriate
 
Notes on the Future - ILI2015 Workshop
Notes on the Future - ILI2015 WorkshopNotes on the Future - ILI2015 Workshop
Notes on the Future - ILI2015 Workshop
 
Community Journalism Conf - hyperlocal data wire
Community Journalism Conf - hyperlocal data wireCommunity Journalism Conf - hyperlocal data wire
Community Journalism Conf - hyperlocal data wire
 
Residential school 2015_robotics_interest
Residential school 2015_robotics_interestResidential school 2015_robotics_interest
Residential school 2015_robotics_interest
 
Data Mining - Separating Fact From Fiction - NetIKX
Data Mining - Separating Fact From Fiction - NetIKXData Mining - Separating Fact From Fiction - NetIKX
Data Mining - Separating Fact From Fiction - NetIKX
 
Week4
Week4Week4
Week4
 
A Quick Tour of OpenRefine
A Quick Tour of OpenRefineA Quick Tour of OpenRefine
A Quick Tour of OpenRefine
 
Conversations with data
Conversations with dataConversations with data
Conversations with data
 
Data reuse OU workshop bingo
Data reuse OU workshop bingoData reuse OU workshop bingo
Data reuse OU workshop bingo
 
Inspiring content - You Don't Need Big Data to Tell Good Data Stories
Inspiring content - You Don't Need Big Data to Tell Good Data Stories Inspiring content - You Don't Need Big Data to Tell Good Data Stories
Inspiring content - You Don't Need Big Data to Tell Good Data Stories
 
Lincoln Journalism Research Day - Data Journalism
Lincoln Journalism Research Day - Data JournalismLincoln Journalism Research Day - Data Journalism
Lincoln Journalism Research Day - Data Journalism
 
Calrg14 tm351
Calrg14 tm351Calrg14 tm351
Calrg14 tm351
 
Hestia linear tales
Hestia linear talesHestia linear tales
Hestia linear tales
 
Hestia linear tales
Hestia linear talesHestia linear tales
Hestia linear tales
 

Gors appropriate