This document discusses the ethics of big data and smart cities. It notes that cities are generating vast amounts of data from various sources that can be used to understand and manage urban systems. However, it also notes several critiques of smart city approaches, including the risk of technocratic governance, the exacerbation of inequalities, and threats to privacy from extensive data collection and integration. The document outlines various privacy concerns around urban big data, such as extensive surveillance and inference, weak anonymization that allows re-identification, and a lack of transparency and individual control. It argues that notice and consent are often empty given the complexity of data flows. Overall, it calls for balancing the benefits of data-driven urbanism with minimizing its pernicious effects through governance,
1. The Ethics of Urban Big Data
and Smart Cities
Prof. Rob Kitchin
Maynooth University
2. Data and the city
• Rich history of data being generated about cities
• Urban data are a key input for understanding city life,
solving urban problems, formulating policy and plans,
guiding operational governance, modelling possible
futures, and tackling a diverse set of other issues
• For as long as data have been generated about cities,
various kinds of data-informed urbanism have been
occurring
• Data-informed urbanism is increasingly being
complemented and replaced by data-driven, networked
urbanism
• Post-Millennium, the urban data landscape is being
transformed moving from small to big data
3. Urban big data
• Directed
o Surveillance: CCTV,
drones/satellite
o Scaled public admin records
• Automated
o Automated surveillance
o Digital devices
o Sensors, actuators,
transponders, meters (IoT)
o Interactions and transactions
• Volunteered
o Social media
o Sousveillance/wearables
o Crowdsourcing/neogeography
o Citizen science
4. Urban big data
• Diverse range of public and private
generation of fine-scale (uniquely
indexical) data about citizens and places in
real-time:
• utilities
• transport providers
• environmental agencies
• mobile phone operators
• social media sites
• travel and accommodation websites
• home appliances and entertainment
systems
• financial institutions and retail chains
• private surveillance and security firms
• remote sensing, aerial surveying
• emergency services
• Producing a data deluge that can be
combined, analyzed, acted upon
9. Data-driven, networked urbanism
• Cities are becoming ever more instrumented and
networked, their systems interlinked and integrated
• Consequently, cities are becoming knowable and
controllable in new dynamic ways
• Urban operational governance and city services are
becoming highly responsive to a form of networked
urbanism in which big data systems are:
• prefiguring and setting the urban agenda
• producing a deluge of contextual and actionable data
• influencing and controlling how city systems respond and
perform in real-time
10. Creating smart cities
• Tackle pressing issues
• New forms of operational governance
• More efficient, competitive and productive service delivery
• Increase resilience and sustainability
• More transparency and accountability
• Enhance participation in city life and quality of life
• Stimulate creativity, innovation, entrepreneurship and
economic growth
• Improve models and simulations for future development
11. Eight critiques of smart cities
• City as a knowable, rational, steerable machine
• Ahistorical, aspatial and homogenizing
• Technocratic governance and solutionism
• Corporatisation of governance
• Serve certain interests and reinforce inequalities
• The politics of urban data
• Social, political, ethical effects
• Buggy, brittle, hackable urban systems
12. The politics of urban data
• Big data and dashboards are not simply technical tools
• Nor are they pragmatic, neutral, objective or
non-ideological; nor can they speak for themselves
• Data do not exist independently of the ideas,
instruments, practices, contexts, knowledges and
systems used to generate, process & analyze them
• Big data and dashboards express a normative notion
about what should be measured, for what reasons, and
what they should tell us
• And they have normative effect - being used to
influence decision-making, modify institutional
behaviour, condition workers, etc
13. The politics of urban data
[Figure: Digital socio-technical assemblage]
• System/process (performs a task)
o Material platform (infrastructure, hardware)
o Code platform (operating system)
o Code/algorithms (software)
o Data(base)
o Interface
o Reception/operation (user/usage)
• Context (frames the system/task)
o Systems of thought
o Forms of knowledge
o Finance
o Political economies
o Governmentalities & legalities
o Organisations and institutions
o Subjectivities and communities
o Marketplace
o Places
o Practices
14. Ethics of data-driven urbanism
• Data-driven, networked
urbanism raises all kinds of
ethical & related questions
• Data ownership and control
• Data integration and data
markets
• Data security and integrity
• Dataveillance and privacy
• Data quality and provenance
• Data uses
15. Privacy and big urban data
• Privacy debates concern acceptable practices with
regards to accessing and disclosing personal and sensitive
information about a person
• identity privacy (to protect personal and confidential data)
• bodily privacy (to protect the integrity of the physical
person);
• territorial privacy (to protect personal space, objects and
property);
• locational and movement privacy (to protect against the
tracking of spatial behaviour)
• communications privacy (to protect against the surveillance of
conversations and correspondence);
• transactions privacy (to protect against monitoring of
queries/searches, purchases, and other exchanges)
16. A Taxonomy of Privacy Harms (compiled from Solove 2006)
Domain | Privacy breach | Description
Information Collection | Surveillance | Watching, listening to, or recording of an individual’s activities
Information Collection | Interrogation | Various forms of questioning or probing for information
Information Processing | Aggregation | The combination of various pieces of data about a person
Information Processing | Identification | Linking information to particular individuals
Information Processing | Insecurity | Carelessness in protecting stored information from leaks and improper access
Information Processing | Secondary Use | Use of information collected for one purpose for a different purpose without the data subject’s consent
Information Processing | Exclusion | Failure to allow the data subject to know about the data that others have about her and participate in its handling and use, including being barred from being able to access and correct errors
Information Dissemination | Breach of Confidentiality | Breaking a promise to keep a person’s information confidential
Information Dissemination | Disclosure | Revelation of information about a person that impacts the way others judge her character
Information Dissemination | Exposure | Revealing another’s nudity, grief, or bodily functions
Information Dissemination | Increased Accessibility | Amplifying the accessibility of information
Information Dissemination | Blackmail | Threat to disclose personal information
Information Dissemination | Appropriation | The use of the data subject’s identity to serve the aims and interests of another
Information Dissemination | Distortion | Dissemination of false or misleading information about individuals
Invasion | Intrusion | Invasive acts that disturb one’s tranquillity or solitude
Invasion | Decisional Interference | Incursion into the data subject’s decisions regarding her private affairs
17. Privacy and big urban data
• Intensifies datafication
• The capture and circulation of data are:
• indiscriminate and exhaustive (involve all individuals, objects,
transactions, etc.);
• distributed (occur across multiple devices, services and places);
• platform independent (data flow easily across platforms, services,
and devices);
• continuous (data are generated on a routine and automated basis).
• Much greater levels of intensified scrutiny and modes of
surveillance/dataveillance
• Tasks previously unmonitored or caught through disciplinary gaze
now routinely tracked and traced
• All but impossible to live everyday lives without leaving digital
footprints and shadows
• Mass recording, organizing, storing and sharing big data changes
the uses to which data can be put
18. Location/movement data
• Controllable digital CCTV cameras + ANPR + facial
recognition
• Smart phones: cell masts, GPS, wifi
• Sensor networks: capture and track phone identifiers
such as MAC addresses
• Wifi mesh: capture & track phones with wifi turned on
• Smart card tracking: barcodes/RFID chips (buildings &
public transport)
• Vehicle tracking: unique ID transponders for automated
road tolls & car parking
• Other staging points: ATMs, credit card use, metadata
tagging
• Electronic tagging; shared calendars
19. Data permissions that can be sought by Android apps (from Hein 2014)
Data type | Data
Accounts log | email log
App Activity | name, package name, process number of activity, processed id
App Data Usage | cache size, code size, data size, name, package name
App Install | installed at, name, package name, unknown sources enabled, version code, version name
Battery | health, level, plugged, present, scale, status, technology, temperature, voltage
Device Info | board, brand, build version, cell number, device, device type, display, fingerprint, IP, MAC address, manufacturer, model, OS platform, product, SDK code, total disk space, unknown sources enabled
GPS | accuracy, altitude, latitude, longitude, provider, speed
MMS | from number, MMS at, MMS type, service number, to number
NetData | bytes received, bytes sent, connection type, interface type
PhoneCall | call duration, called at, from number, phone call type, to number
SMS | from number, service number, SMS at, SMS type, to number
TelephonyInfo | cell tower ID, cell tower latitude, cell tower longitude, IMEI, ISO country code, local area code, MEID, mobile country code, mobile network code, network name, network type, phone type, SIM serial number, SIM state, subscriber ID
WifiConnection | BSSID, IP, linkspeed, MAC addr, network ID, RSSI, SSID
WifiNeighbors | BSSID, capabilities, frequency, level, SSID
Root Check | root status code, root status reason code, root version, sig file version
Malware Info | algorithm confidence, app list, found malware, malware SDK version, package list, reason code, service list, sigfile version
20. Privacy and big urban data
• Deepens inferencing
• Big data and predictive modelling enable a lot of inference
beyond the data generated
• They can infer information about an individual that is not directly
encoded in a database but still constitutes PII, which can produce
‘predictive privacy harms’.
• For example, co-proximity and co-movement with others
can be used to infer political, social, and/or religious
affiliation.
• Also can produce ‘the tyranny of the minority’
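The co-proximity example above can be made concrete with a minimal sketch. It assumes a hypothetical log of sightings (pseudonymous device id, sensor id, time slot); the data, names and the `min_shared` threshold are illustrative assumptions, not taken from the talk. It shows how repeated co-presence alone suggests an association between two devices, even though no relationship is recorded anywhere in the data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sightings: (pseudonymous device id, sensor id, time slot).
sightings = [
    ("dev_a", "s1", 0), ("dev_b", "s1", 0), ("dev_c", "s1", 0),
    ("dev_a", "s2", 1), ("dev_b", "s2", 1), ("dev_c", "s4", 1),
    ("dev_a", "s3", 2), ("dev_b", "s3", 2),
]


def co_presence_counts(rows):
    """Count how often each pair of devices shares a sensor and time slot."""
    by_place_time = {}
    for device, sensor, slot in rows:
        by_place_time.setdefault((sensor, slot), set()).add(device)
    pairs = Counter()
    for devices in by_place_time.values():
        for pair in combinations(sorted(devices), 2):
            pairs[pair] += 1
    return pairs


def likely_associates(rows, min_shared=3):
    """Return device pairs seen together at least `min_shared` times."""
    counts = co_presence_counts(rows)
    return [pair for pair, n in counts.items() if n >= min_shared]


print(likely_associates(sightings))
```

Here `dev_a` and `dev_b` are flagged because they move together across three sensors, while `dev_c` merely passes one of the same places. A real system would need far more careful thresholds, but the inference step itself is this simple.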
21. Privacy and big urban data
• Weak anonymization enables re-identification
• A key strategy for ensuring individual privacy is anonymization, either
through the use of pseudonyms or aggregation or other strategies.
• Pseudonyms simply mean that a unique tag is used to identify a person
in place of a name.
• This code is persistent, distinguishable from others and recognizable on
an on-going basis, meaning it can be tracked over time and space and
used to create detailed individual profiles.
• No different from other persistent identifiers such as social security
numbers and in effect constitutes PII.
• Some companies talking of ‘anonymous identifiers’ is thus somewhat of
an oxymoron, especially when the identifier is directly linked to an
account with known personal details
• Inference and the linking of a pseudonym to other accounts and
transactions means it can potentially be re-identified.
• It is possible to reverse engineer anonymization strategies by combing
and combining datasets
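A minimal sketch of why a pseudonym is a weak safeguard, under stated assumptions: the MAC addresses, place names, the second "named" dataset and the unsalted-hash scheme are all hypothetical, chosen only to illustrate the points above. The tag is persistent, so it traces a trail over time, and combining it with a second dataset links it back to a name.

```python
import hashlib


def pseudonym(mac):
    """Replace a MAC address with an unsalted hash: a persistent unique tag."""
    return hashlib.sha256(mac.encode("utf-8")).hexdigest()[:12]


# Hypothetical "anonymized" sensor log keyed by pseudonym.
sensor_log = [
    {"id": pseudonym("aa:bb:cc:00:00:01"), "place": "station", "hour": 8},
    {"id": pseudonym("aa:bb:cc:00:00:01"), "place": "office", "hour": 9},
    {"id": pseudonym("aa:bb:cc:00:00:02"), "place": "station", "hour": 8},
    {"id": pseudonym("aa:bb:cc:00:00:02"), "place": "clinic", "hour": 9},
]

# Hypothetical second dataset that carries names (e.g. card transactions).
named_events = [
    {"name": "Alice", "place": "office", "hour": 9},
    {"name": "Bob", "place": "clinic", "hour": 9},
]


def reidentify(log, events):
    """Link a pseudonym to a name when a (place, hour) matches one tag only."""
    seen = {}
    for row in log:
        seen.setdefault((row["place"], row["hour"]), set()).add(row["id"])
    links = {}
    for event in events:
        ids = seen.get((event["place"], event["hour"]), set())
        if len(ids) == 1:
            links[next(iter(ids))] = event["name"]
    return links


def trail(log, tag):
    """Everything recorded under one persistent tag, in log order."""
    return [(row["place"], row["hour"]) for row in log if row["id"] == tag]


links = reidentify(sensor_log, named_events)
for tag, name in links.items():
    print(name, trail(sensor_log, tag))
```

Once a single event ties a tag to a person, every other record under that tag is attributed to them as well, which is why a persistent pseudonym behaves like PII.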
22. Privacy and big urban data
• Opacity and automation create obfuscation and reduce control
• The emerging big data landscape is complex and fragmented.
• Various smart city technologies are composed of multiple interacting systems
run by a number of corporate and state actors.
• Data are thus passed between ‘devices, platforms, services, applications, and
analytics engines’ and shared with third parties.
• Across this maze-like assemblage data can be ‘leaked, intercepted,
transmitted, disclosed, dis/assembled across data streams, and repurposed’ in
ways that are difficult to track and control
• Moreover, algorithmic processing is black-boxed, so it’s not clear how data are
being processed
• Opacity and automation undermine the FIPPs at the heart of privacy regulation
in a number of respects:
• making it difficult for individuals to seek access to verify, query, correct or
delete data, or even to know whom to ask (given the tangled set of roles of data
processors and controllers);
• to know how data collected about them are used; to assess how fair any
actions taken upon the data are;
• to hold data controllers to account
23. Privacy and big urban data
• Data are being shared and repurposed and used in unpredictable
and unexpected ways
• One of the key features of the data revolution is the wholesale erosion
of data minimization principles;
• that is, the undermining of the purpose specification and use limitation
principles, which hold that data should only be generated to perform a
particular task, only retained for as long as they are needed for that
task, and only used to perform that task.
• The solution pursued by many companies is to repackage data by
de-identifying them (using pseudonyms or aggregation) or creating derived
data, with only the original dataset being subjected to data
minimization. The repackaged data can then be sold on and repurposed
in a plethora of ways
• The data and services that data brokers offer are used to perform a
wide variety of tasks for which the data were never intended, including
to predictively profile, socially sort, behaviourally nudge, and regulate,
control and govern individuals and the various systems and
infrastructures with which they interact
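The purpose specification and use limitation principles described on this slide can be read as a simple check applied before any use of a record. The sketch below is only an illustration under assumed names: the `Record` fields, the purposes and the retention period are hypothetical and do not come from any regulation or from the talk.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Record:
    subject: str
    collected_for: str   # purpose stated when the data were generated
    collected_on: date
    retain_days: int     # how long the data are needed for that purpose


def may_use(record, purpose, today):
    """Allow use only for the original purpose and within the retention period."""
    expires = record.collected_on + timedelta(days=record.retain_days)
    return purpose == record.collected_for and today <= expires


toll = Record("vehicle-123", "road tolling", date(2020, 1, 1), retain_days=30)

print(may_use(toll, "road tolling", date(2020, 1, 15)))  # same purpose, in time
print(may_use(toll, "marketing", date(2020, 1, 15)))     # repurposed
print(may_use(toll, "road tolling", date(2020, 3, 1)))   # retained too long
```

Repackaging works around exactly this kind of check: a de-identified or derived copy is treated as a new dataset that no longer carries the original purpose or retention limit.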
24. Privacy and big urban data
• Notice and consent are an empty exercise or absent
• Individuals interact with a number of smart city technologies on a daily basis,
each of which is generating data about them.
• Given the volume and diversity of these interactions it is simply too onerous for
individuals to police their privacy across dozens of entities, to weigh up the
costs and benefits of agreeing to terms and conditions without knowing how the
data might be used now and in the future, and to assess the cumulative and
holistic effects of their data being merged with other datasets
• In the case of some smart city technologies there is little mechanism to seek
notice and consent
• For example, CCTV, ANPR and MAC address tracking, and sensing by the Internet
of Things, all take place with no attempt at consent and often with little
notification
• Moreover, there is no ability to opt-out
• As such, there is no sense in which a person can selectively reveal themselves;
instead they must always reveal themselves.
• If a person is unaware that data about them are being generated, then it is
impossible to discover and query the purposes to which those data are being
put
25. Fair Information Practice Principles (OECD, 1980)
Principle | Description
Notice | Individuals are informed that data are being generated and the purpose to which the data will be put
Choice | Individuals have the choice to opt-in or opt-out as to whether and how their data will be used or disclosed
Consent | Data are only generated and disclosed with the consent of individuals
Security | Data are protected from loss, misuse, unauthorized access, disclosure, alteration and destruction
Integrity | Data are reliable, accurate, complete and current
Access | Individuals can access, check and verify data about themselves
Use | Data are only used for the purpose for which they are generated and individuals are informed of each change of purpose
Accountability | The data holder is accountable for ensuring the above principles and has mechanisms in place to assure compliance
Redundant in the age of big urban data?
27. Suggested solutions
• Market:
• Industry standards and self-regulation
• Privacy/security as competitive advantage
• Technological
• End-to-end strong encryption, access controls, security controls, audit
trails, backups, up-to-date patching, etc.
• Privacy enhancement tools
• Policy and regulation
• FIPPs
• Privacy by design;
• security by design
• Governance
• Vision and strategy: (1) smart city advisory board and smart city strategy;
• Oversight of delivery and compliance: (2) smart city governance, risk and
compliance board;
• Day-to-day delivery: (3) core privacy/security team, smart city
privacy/security assessments, and (4) computer emergency response team
28. Conclusion
• We are entering an era of embedded and mobile computation
• Devices and infrastructures are producing vast quantities of data in
real-time, and are responsive to these data, enabling new kinds of
monitoring, regulation and control
• Cities are becoming data-driven and are enacting new forms of
algorithmic governance
• Whilst data-driven, networked urbanism undoubtedly provides a set of
solutions for urban problems, it also raises a number of ethical and
normative questions
• The challenge facing urban managers and citizens is to realise the
benefits of planning and delivering city services using urban data and
real-time responsive systems whilst minimizing pernicious effects
• At present, little serious thought has been expended on the latter