This document contains the schedule for a data science event taking place on November 29th from 14:00-18:30 in room NAB314. The schedule includes 15-minute presentations on topics like big data practices, data as a design tool, gamification and crowd-sourcing, understanding game play behavior, big data and disasters, ethical challenges in data science, values in digital relations and prosperity theology, legible machine learning, data science applications in interdisciplinary research, and an MSc in data science program. There will also be coffee and discussion/drinks at the end.
Data Science Event Covers Big Data, Ethics
{Data|Social} Science
29/11/13 14:00-18:30 NAB314

14:00 Introduction

14:15 Big Data Practices
Evelyn Rupert

14:30 Data as Design Tool
Rebecca Fiebrink

14:45 Gamification, visualisation and crowd-sourcing
Frederic Fol Leymarie

15:00 Understanding Game play behaviour
Jeremy Gow

15:15 Big Data and Disasters
Dhiraj Murthy

15:30 Coffee

15:45 Ethical Challenges for Data Science
Dan McQuillan

16:00 Values in modern digital relations and traditional prosperity theology
Bev Skeggs

16:15 Legible Machine Learning
Marco Gillies

16:30 On some Data Science applications in interdisciplinary research
Daniel Stamate

16:45 MSc Data Science
Daniel Stamate

17:00 Discussion and Drinks


Big Data Practices
Evelyn Rupert

I use the term ‘Big Data practices’ to suggest that what is ‘big’ about Big Data are changing
practices that are reconfiguring four kinds of relations: social, method, data, and research.
I’ll focus on the latter and how our academic craft is generating Big Data from online
research articles to other forms of digital content such as websites, databases, blogs,
profiles, images, tweets, podcasts and so on. Through these online media academics
are re-versioning and multiplying their research outputs such that the main output – the
research article – is but one part of a larger and longer process of relations and practices
accumulated as data on the internet. How might we think about this? I’ll respond to this in
relation to the journal I am editing, Big Data & Society. I’ll discuss how we are organising
the journal as a digital space for linking out to related content and developing a ‘lively’ logo
built on the co-word analysis of journal keywords to explore how it is part of the practices
making up what ‘is’ Big Data.
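
Co-word analysis, in its simplest form, counts how often two keywords are attached to the same article. The sketch below is only an illustration of that idea, not the method used for the journal's logo: the function name, the input layout (one list of keyword strings per article) and the lower-casing of keywords are all assumptions.

```python
from collections import Counter
from itertools import combinations


def coword_counts(keyword_lists):
    """Count how many articles each unordered keyword pair appears on.

    keyword_lists: an iterable with one list of keyword strings per article
    (a hypothetical layout chosen for this sketch).
    """
    counts = Counter()
    for keywords in keyword_lists:
        # Normalise case and whitespace, and drop duplicates within one article.
        unique = sorted({k.strip().lower() for k in keywords})
        # Every unordered pair of keywords on the same article co-occurs once.
        for pair in combinations(unique, 2):
            counts[pair] += 1
    return counts
```

Calling `coword_counts` on a list of per-article keyword lists returns a `Counter` keyed by alphabetically ordered keyword pairs; `most_common()` on the result lists the strongest links first, which could then feed a network drawing.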

Data as Design Tool
Rebecca Fiebrink

Gamification, visualisation and crowd-sourcing
Frederic Fol Leymarie

I will discuss gamification, visualisation and crowd-sourcing, illustrated by our new BBSRC
grant, DockIt: a Crowd-Sourced Molecular Docking Puzzle Game. I will address the
potential to apply this approach to other complex big data & analytics problems, in
particular in the realm of smart cities.
Information retrieval: the need for better multimedia search and data management.
I will illustrate what we can contribute with recent ongoing research on a novel way to
search images using shape information; this work is funded in part by the EU FET project
CEEDs. I will also say a few words about CEEDs, which focuses on novel interfaces for
human users dealing with complex big data problems: http://ceeds-project.eu/

Understanding Game play behaviour
Jeremy Gow

Big Data and Disasters
Dhiraj Murthy

Though natural disasters are the product of meteorological, seismic, and other physical
actors, they are always social events. Specifically, the ways in which warning occurs,
disasters are responded to, and reconstruction takes place are all mediated by
sociopolitical factors. These three time envelopes of pre-disaster, disaster, and aftermath
are particularly important in studying disasters. Social media is 'always on' and ubiquitous
and these traits have meant that data is being generated during all three time periods. The
volume of data being collected on various social media is immense and easily places it
within the category of Big Data. My recent work has been focused on data from Hurricane
Sandy. The storm caused over $65 billion in damage, making it the second costliest storm
in U.S. history. In this project, I examine the behavior of Twitter users from October 22,
2012 to November 3, 2012, using mentions, links and hashtags for data analysis. We
found that certain Twitter rose to prominence depending on the stage of the storm. For
example, in the days following Hurricane Sandy’s initial landfall, users became more
interested in relief efforts. Data was collected from October 22, 2012 to November 3, 2012,
giving a two-week window of Twitter activity. We utilized the Twitter API to collect
geolocated tweets from 50 major US cities. Tweets were filtered for three storm-related terms:
“hurricane”, “storm” and “sandy”, yielding a total of 142,768 tweets. A second project I am
working on refined this data by following any links to Instagram images within the tweet.
This search returned 11,964 Instagram images that were hand coded into thirteen
separate categories. By studying these images, we were able to discern which categories
rose to prominence during the three time envelopes. For example, food images were
mostly dominant pre-disaster, whereas damage-related images were dominant during
the disaster. The data and methods of both projects will be briefly introduced.
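
The keyword filtering step described above can be sketched as follows. This is a minimal illustration rather than the project's actual pipeline: the dict layout with a "text" key, the function names and the choice of case-insensitive substring matching are all assumptions, and collection through the Twitter API is not shown.

```python
# Hypothetical sketch: keep only tweets that mention a storm-related term.
STORM_TERMS = ("hurricane", "storm", "sandy")  # the three terms named in the abstract


def mentions_storm(text, terms=STORM_TERMS):
    """Return True if any term occurs in the text, ignoring case.

    Substring matching is an assumption of this sketch; it also matches
    hashtags such as "#HurricaneSandy" and unrelated words such as "brainstorm".
    """
    lowered = text.lower()
    return any(term in lowered for term in terms)


def filter_tweets(tweets, terms=STORM_TERMS):
    """Return the tweets (dicts with a "text" key, an assumed layout) that match."""
    return [tweet for tweet in tweets if mentions_storm(tweet["text"], terms)]
```

Substring matching was chosen here so that fused hashtags are caught; a stricter whole-word match would avoid false positives such as "brainstorm" at the cost of missing those hashtags.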


Ethical Challenges for Data Science
Dan McQuillan

This presentation will interrogate key ethical challenges that are arising at the borders of
social science and computing, and will suggest some approaches to transform these
tensions into productive lines of research. In a post-PRISM environment, big data research
needs to distinguish itself from surveillance. 'Because we can' is not an adequate rationale
for researching social media and the data exhaust because it is indistinguishable from the
dynamics of the NSA on one hand and Silicon Valley on the other. Do ethics committees
understand the implications of heterogeneous metadata better than the judiciary who failed
in their oversight of PRISM? Further, the algorithms are as important as the data: a
computing-based understanding of algorithms must be combined with a sociological
appreciation of their consequences. We are already seeing a proliferation of 'predictive
methods' with the application of data science and machine learning to everything from
Wonga loans to drone strikes. Rapid development of methods is outpacing the
development of a social framework for their governance. By drilling down to issues of
data construction, and looking at algorithms through a combination of Foucault and
cybernetics, this presentation will propose participatory methods as an important new line
of development in data science, and suggest that emerging areas of citizen science are
finding an appropriate balance of the empirical and the ethical.

Values in modern digital relations and traditional prosperity theology
Bev Skeggs

There has been a great deal of interest in how capital has intervened in almost every area
of life, leading some to propose new forms of capital, e.g. ‘emotional capitalism’, and others
to suggest that processes of valuation are now the major method for understanding the
social world. Whilst, no doubt, capital behaves according to its own logic, finding new lines
of flight, converting affects into value, making multi-culturalism marketable, generating
new forms of bio-capital, and making many of our actions subject to the logic of
calculation, this project asks if anything is left behind. Is there anything that cannot be
capitalized upon? Many social theories reproduce the logic of capital. But if we only
understand the world from the perspective of this logic what do we miss seeing? My
previous research projects have drawn attention to how values are formed beyond value,
unnoticed and unseen, producing new ways of being and doing in the world, organized
differently through spatial and temporal co-ordinates. This project consolidates and
expands this analysis by exploring values (and their relationship to value) through two limit
cases that attempt to convert all values to value: modern digital relations and traditional
prosperity theology.

Legible Machine Learning
Marco Gillies

This talk will give an overview of research that uses machine learning as part of a tool to
enable actors and ordinary gamers to design the movement and behaviour of a virtual
character. They use data of their movements as the means for customising the algorithms
that control the characters. The key challenge in this work is how to debug the models
when they go wrong and do not work as intended. Learning algorithms are often opaque,
even to expert researchers, making them difficult to debug. This research has led us to
the importance of designing algorithms and tools that are legible to users. This means that
they must support a clearly legible conceptual model both in their interface and the
algorithm itself. We will conclude with a brief discussion of how this might apply to data
research in the social sciences.

On some Data Science applications in interdisciplinary research
Daniel Stamate

We present a series of applications of Machine Learning, Statistical Data Mining and Big
Data Analytics and research work in: (a) predicting medical treatment outcomes based on
genotype data in medical sectors in which efficient treatment prescribing is paramount but
in which the trial and error approach to prescribing a working treatment is current practice;
(b) diagnosing cancer patients based on gene expression data; (c) the evaluation of
forecasting models in the renewable energy sector (wind time series); (d) web mining and
sentiment analysis; (e) mining census data. A brief introduction of the new Data Science &
Soft Computing Lab and its activity will conclude this presentation.

MSc Data Science
Daniel Stamate

We outline the profile of this new MSc programme in Data Science, and the opportunities it
brings to its students in particular in studying cutting edge Data Science technologies, and
in being exposed to and potentially involved in interdisciplinary research work in the
College, to which these students could contribute with their expertise in Machine Learning,
Statistical Data Mining, and Big Data Management and Analytics during their final project
work or possibly in subsequent PhD study. These fields are inspiring new trends not only
in industry but in every other sector of activity, including research, in which processing and
analysing data brings unprecedented challenges and offers unprecedented opportunities.
In this presentation we also want to suggest concrete ways in which the Data Science
MSc's students could be offered the opportunity to be inspired by the interdisciplinary
research activities developed in the College's departments, an opportunity which could
potentially be followed by the involvement of some of these students in these activities.