Behavioral Big Data & Healthcare Research: Talk at WiDS Taipei
1. Behavioral Big Data
& Healthcare Research
WiDS Taipei, 31 March 2019
Galit Shmueli 徐茉莉
Institute of Service Science
[Diagram: Behavioral Big Data, Researcher, Human Subjects, Research Question]
In memory of
Prof Aya Cohen
1940-2019
2. My Academic Path
1991-1994 (BA, Statistics & Psychology)
University of Haifa, Israel
1994-2000 (MSc + PhD, Statistics)
Israel Institute of Technology
Faculty of IE & M
2000-2002
Carnegie Mellon University
Department of Statistics
2002-2012
University of Maryland
Smith School of Business
2011-2014
Indian School of Business
Hyderabad, India
2014-…
National Tsing Hua University
Institute of Service Science
My Research
‘Entrepreneurial’ statistical &
data mining modeling
Interdisciplinary
Statistical Strategy
• To Explain or To Predict?
• Information Quality
• Data Mining for Causality
• Predicting with Causal Models
• Behavioral Big Data
3. What is Behavioral Big Data (BBD)
Special type of Big Data
• Behavioral: people’s measurable
“everyday” behavior, interactions, self-
reported opinions, thoughts, feelings
• Human and social aspects:
Intentions, deception, emotion,
reciprocation, herding,…
When aware of data collection ->
modified behavior (legal risks, embarrassment, unwanted solicitation)
4. BBD vs. Inanimate Big Data
Behavioral Big Data: Researcher, Human Subjects, Research Question
Inanimate Big Data: Researcher, Research Question
Human subjects in BBD:
1. Are aware of, and interact continuously with, the BBD; they “contaminate” it
with intention, deception, emotion, herding…
2. Can be harmed by BBD
5. Figure 1: The types of physiological
data points and the wearable
sensors under development or on
the market to monitor them.
Elenko, Underwood & Zohar (2015),
“Defining Digital Medicine”,
Nature Biotechnology 33, 456-461
Physiological
Big Data
Human
Subjects
6. BBD vs. Physio Big Data
Physio Big Data:
• Individual bodies
• Physical measurements
• Medical systems set data collection timing
• Clinical trials: awareness & vested interest
BBD:
• Collection of connected people
• Measurable behaviors: actions, interactions, self-reported
feelings, opinions, thoughts
• User chooses data generation content & timing
• Experiments: users unaware; not always in user’s best interest
Different research methods in life sciences and behavioral sciences
• Measurement instruments
• Models (latent variable models, social network analysis)
• Human subjects risks
8. “The main products of the 21st
century economy will not be
textiles, vehicles, and weapons
but bodies, brains, and minds”
https://www.ynharari.com/homo-deus-impact-digitalization-society/
“If you wear biometric sensors (such as a Fitbit band) and these
sensors are connected to the computer, the computer will know
exactly what your heart rate, blood pressure and adrenalin level are,
and based on this information, it can identify your emotional state
better than any human psychologist”
https://www.yediot.co.il/articles/0,7340,L-4948868,00.html
Physiological data
translated to BBD
9. He’s part of a small but
growing group of people
who are wearing CGMs
to track—and then
hack—what goes on in
their own bodies.
Physiological
data collection
turns into BBD
13. Data from a typical hospital, about…
Patients
Personal info
Medical history (visits, tests,
medication, hospitalization...)
Scheduled events, billing
Physicians
Scheduled + actual appointments,
procedures, prescriptions,…
Entries of patient info/data
Nurses
Location, work hours,…
Pharmacy staff
Speed of service
Quality of service
Lab staff
Speed of service
Quality of service
Other staff
Finance/accounting
Cleaning
Receptionists
Volunteers
Food court!
Data Collection
Technologies:
• Medical devices
• HIT systems
(EHR, HR for
Health Info
System)
• WiFi
---
Smart Hospital
• Cameras
• Sensors
• GPS
• IoT
14. Interactions between
Patients – doctors/nurses
Doctors – other doctors
Patients – other patients
Patient family – hospital staff
Patients – social network “friends”
...
New data #1:
Recorded Interactions
15. Chiu, C. C., Tripathi, A., Chou, K., Co, C., Jaitly, N., Jaunzeikare, D., ... & Tansuwan, J. (2017).
Speech recognition for medical conversations. arXiv preprint arXiv:1711.07274.
Data:
• 90,000 conversations between
doctors and patients during
clinical visits.
• 151 types of medical visits for
different purposes
• Each conversation is typically
between a single doctor and a
patient, sometimes also including
a nurse, or family member.
17. Mobile health apps and wearable devices that use
artificial intelligence to help diagnose or even treat
medical conditions pose a new regulatory challenge for
the U.S. Food and Drug Administration.
This comes at a time when medical devices have
evolved from fairly self-contained gadgets into
implants and wearables that communicate
wirelessly with medical software on separate
computers or in the cloud. The definition of medical
device has also stretched as smartphone apps and
online services—often backed by machine-
learning algorithms—promise to deliver medical
diagnoses that once would have required a visit to a
doctor's office and specialized lab equipment.
18. This is where it becomes
ethically challenging:
Who’s collecting the data
and for what purpose?
Are users aware of the data collection
and usage?
What are users’ benefits & risks
from sharing their data?
20. Health-related BBD: Online
• Medical/health websites
• Online forums
• Social networks
• Search engines
Data voluntarily entered by users: personal details, photos, comments,
messages, search terms, likes, payment information, connections with “friends”
Passive footprints: duration on the website, pages browsed, sequence,
referring website, Internet browser, operating system, location, IP address
21. New data #4:
Health-related behavior self-logged on Apps
Every day, women manually log around 1.4
million new data points including cycle history,
ovulation and pregnancy test results, age,
height, weight, lifestyle statistics about
sleep, activity, and nutrition. In addition,
more data comes from wearable devices
like Fitbit & Apple Watch.
Data voluntarily entered by users: health condition, symptoms, behaviors
(eating, exercise, sleep, sex, parking, feelings…)
Passive footprints: app log times, pages browsed, sequence, location…
22. Flo became the most
downloaded app
worldwide in its
category within months
after introducing
neural networks to its
prediction algorithm.
In addition to logging a menstruation and health diary, users can join a number of
different themed groups including weight loss, clothing, fitness,
relationships, and travel. These groups look and work much like “message
board”-style social networks.
To date, Meet You has reportedly accumulated two million daily active users,
1.2 million daily active users of its social network, and over 800,000 daily posts.
24. Sea Hero Quest, a mobile app that
measures spatial navigation ability.
Credit: Hugo Spiers et al.
Since its launch in May 2016, some 2.5
million people have played Sea Hero Quest
Health-related BBD:
Gaming
27. “Some hospitals are collecting new information
from patients directly, while others have sought
data from companies that sell consumer and
financial information, or federal agencies that
provide statistics on poverty, housing density
and unemployment.”
The big obstacle: access to the data. Doctors and nurses have limited time to collect new data
and patients bombarded with questions about their lives may suffer “interview fatigue”
28. This is where it becomes
ethically challenging:
Who’s collecting the data
and for what purpose?
Are users aware of the data collection
and usage?
What are users’ benefits & risks
from sharing their data?
30. Before starting Mindstrong, Paul Dagum, its founder
and CEO, paid for two Bay Area–based studies to
figure out whether there might be a systemic measure
of cognitive ability—or disability—hidden in how we
use our phones. 150 research subjects came into a
clinic and underwent a standardized
neurocognitive assessment.
Subjects went home with an app that measured the
ways they touched their phone’s display (swipes,
taps, and keyboard typing).
memory problems… can be spotted by looking at things
including how rapidly you type and what errors you make
(such as how frequently you delete characters), as well as by
how fast you scroll down a list of contacts.
31. “thousands of people are using
the app, and the company now
has five years of clinical study
data to confirm its science and
technology.”
PRIVACY:
“while Mindstrong says it protects users’
data, collecting such data at all could be
a scary prospect for many of the people it
aims to help.
Companies may be interested in, say,
including it as part of an employee
wellness plan, but most of us wouldn’t
want our employers anywhere near our
mental health data”
33. Microsoft Xbox 360 comes with a
microphone, a camera and technology that
recognizes a user's voice and face
• sign in and sign off
• games you played
• game-score statistics
• Xbox console hardware & operating performance data
• manufacturing codes from game discs
• network performance data
• data that indicates the quality of the Xbox service
to prevent cheating
• IP address
• operating system
• Xbox Live software version
to improve your experience
• Bing search terms
• samples of voice commands to perform search
• what you watched on Xbox One’s TV service
• music & videos you watched or listened to using Xbox
Live
At
home/school/work
36. provide a ride-hailing platform available specifically to
healthcare providers, letting clinics, hospitals, rehab
centers and more easily assign rides for their patients
and clients from a centralized dashboard – without
requiring that the rider even have the Uber app, or a
smartphone.
Uber Health’s creation was rooted in some alarming
statistics about patient care and healthcare client
absentee rates.
38. Research Fields using Health BBD
Operations Researchers and Industrial Engineers
For: Hospital Management and Operations
(staffing, scheduling,…)
Medical/Healthcare Researchers & Clinicians
For: Improved Medical Treatment
(safety, effectiveness,…)
Information Systems Researchers
For: Improved Design & Use of Medical IS
(value of IS, effectiveness, standardization,…)
Marketing
Advertising
Insurance
Machine Learning
Social science
39. How Do Researchers Get
Health BBD?
1. Open/Publicly Available Data
Constantly refreshed or single data dump
API, web scraping
Hacked data
2. Partner with Company/Organization
• Both parties interested in research question
• Data purchase
• Personal connections, sabbaticals, internships
• Partnership between school and organization
• Third party (WCAI)
3. Crowdsourcing
4. China (!)
40. Research Using New Health BBD: Challenges
[Diagram linking four elements: Behavioral Big Data, Researcher, Human Subjects, Research Question]
Research Question:
• Old Q, new data: Operationalize new variables
• New Q: Lack of literature
Different (conflicting) Goals:
• Scientific vs. Clinical vs. Commercial
• Explain vs. Predict
Generalization Challenges:
• Unit of analysis vs. Unit of measurement
• Under/over-coverage
• Average effect vs. individual effect (ATE vs. Individual)
Acquire + analyze data:
• Technical expertise
Data contaminated by:
• Users (self-selection, spill-over, knowledge of allocation, network)
• Company algorithms
New ethical challenges:
• New risks (privacy, liability, security, HIPAA compliance)
• New modes of connection & information (social networks, forums, IoT, Apps)
Researcher and Human Subjects: larger distance
42. Two examples of high-profile studies
using new health BBD
Emotional contagion in
social networks
Kramer et al. (PNAS, 2014)
Detecting influenza epidemics
using search engine query data
Ginsberg et al. (Nature, 2009)
44. • No Ethics Board Review (IRB)
“[The work] was consistent with Facebook’s Data
Use Policy, to which all users agree prior to
creating an account on Facebook, constituting
informed consent for this research.”
• PNAS editorial Expression of Concern
• Varied response from public, academia, press,
ethicists, corporates
Where do Data Scientists get Ethics Training?
45. [Challenges diagram from slide 40, repeated]
46. Example #2
• “Up-to-date influenza estimates may
enable public health officials and health
professionals to better respond to seasonal
epidemics”
• BBD: automated search results for 50M
keywords on Google.com (2003-2007). For
each query: {query text, IP address}
• Fit 450M different models, correlating
each query text with CDC data; Combined
45 queries with highest correlation
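The selection step described on this slide can be sketched roughly as follows. This is a minimal illustration, not the study's actual pipeline: the function names, the toy weekly series, and the choice to combine the selected queries by simple summation are all assumptions made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no signal
    return cov / sqrt(var_x * var_y)


def top_correlated_queries(query_series, cdc_series, k):
    """Return the k query texts whose weekly series correlate most with CDC data."""
    scored = [(pearson(series, cdc_series), text)
              for text, series in query_series.items()]
    scored.sort(reverse=True)
    return [text for _, text in scored[:k]]


def combined_signal(query_series, selected):
    """Sum the selected query series week by week (an assumed way to combine them)."""
    weeks = len(next(iter(query_series.values())))
    return [sum(query_series[q][t] for q in selected) for t in range(weeks)]


# Toy, made-up weekly data for illustration only (not from the study).
cdc_ili = [1.0, 2.0, 4.0, 3.0, 1.5]
queries = {
    "flu symptoms": [10, 19, 41, 30, 14],
    "fever remedy": [5, 9, 18, 16, 8],
    "high school basketball": [3, 3, 4, 9, 9],
}
selected = top_correlated_queries(queries, cdc_ili, k=2)
signal = combined_signal(queries, selected)
```

Selecting purely on correlation means any query that happens to follow the same seasonal curve can be picked, whether or not it has anything to do with influenza.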
47. Researchers: epidemiologists + data science academics
Dalton et al. (2016), “Flutracking weekly online community
survey of influenza-like illness annual report, 2015”
Communicable diseases intelligence quarterly report
Challenge: Acquire data
48. • Does the algorithm detect “flu” or
“winter”?
• Persistent over-estimation
• Performs worse than lagged
(3-week-old) CDC data
• The 45 terms used were never released
• Lazer et al. recommend
combining/calibrating Google Flu Trends (GFT) with
CDC data
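One way to check a claim like "performs worse than lagged CDC data" is to compare the search-based estimate against a naive baseline that simply reuses the CDC value from three weeks earlier. The sketch below assumes weekly series and mean absolute error as the metric; the numbers are made up for illustration and the function names are hypothetical.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute gap between paired predictions and observations."""
    pairs = list(zip(predicted, actual))
    return sum(abs(p - a) for p, a in pairs) / len(pairs)


def lagged_baseline(series, lag):
    """Predict week t with the value observed at week t - lag."""
    return series[:-lag]


# Made-up weekly values for illustration only.
cdc = [1.0, 1.2, 1.5, 2.0, 2.6, 3.0, 2.8, 2.2]
search_estimate = [1.6, 1.9, 2.4, 3.1, 3.9, 4.4, 4.0, 3.1]  # persistently too high

lag = 3
target = cdc[lag:]  # only weeks that have a lagged value available
baseline_error = mean_absolute_error(lagged_baseline(cdc, lag), target)
search_error = mean_absolute_error(search_estimate[lag:], target)
search_is_worse = search_error > baseline_error
```

A baseline like this costs nothing to compute, so a new data source is only informative if it beats it.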
But most importantly…
49. Changes made by Google’s search
algorithm to display potential
diagnoses + recommend search for
treatment (more advertising)
-> increased search
51. [Challenges diagram from slide 40, repeated]
52. Uses Google searches to measure sensitive
behaviors/opinions/thoughts on
racism, self-induced abortion, depression,
child abuse, hateful mobs, the science of
humor, sexual preference, anxiety, son
preference, and sexual insecurity, among
many other topics.
53. Let’s Discuss Data Privacy
[Challenges diagram from slide 40, repeated]
54. Data Privacy is a Big Issue Right Now
[Challenges diagram from slide 40, repeated]
55. What we’ve learned… is that
we need to take a more
proactive role in a broader
view of our responsibility. It’s
not enough to just build tools,
we need to make sure that
they’re used for good
56. “patients as well as medical staff will be
communicating in a non-private
environment. It is very important to
understand, monitor and control your own
content for its privacy implications. More
dangerous and needing control will be the
reach of patient-to-patient identification
and communication.”
58. Medical data privacy is typically regulated
What about BBD?
several Israeli hospitals have been
conducting a pilot program which
used AI software to assist in deciding
whether patients should undergo
surgery. However these patients were
subjected to these tests without their
knowledge. The software has been
developed by a startup named
MEDecide in Tel Aviv and used
61. Using BBD for Research: Human Subjects
Institutional Review Board (IRB)
“ethics committee”
University-level committee designated
to approve, monitor, and review
biomedical and behavioral research
involving humans.
Medical and behavioral researchers are
aware of IRB. What about data science
researchers?
62. The “Final Rule” (July 19, 2018):
Update to the “Common Rule”
New exemption category: Research involving “benign
behavioral interventions”
Exemption for secondary research using identifiable
private information or identifiable biospecimens
No review needed under certain circumstances:
- publicly available data
- participant cannot readily be identified
- participant is regulated under HIPAA for purposes of “health-care
operations,” “research,” or “public health activities”—but not
where the investigator plans to report individual research results
63. • Am I respecting the rights of my data subjects?
• Are my data pseudonymized?
• Is my research “minimal risk?”
• Do I have broad consent for secondary analysis?
Greene, Shmueli, Ray, and Fell (2019)
Adjusting to the GDPR: The Impact on
Data Scientists and Behavioral
Researchers
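For the question "Are my data pseudonymized?", one common technique is to replace direct identifiers with keyed hashes, with the key stored apart from the dataset. The sketch below assumes HMAC-SHA256 from the Python standard library; the field names, records, and key are hypothetical, and keyed hashing alone may not satisfy every legal requirement.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Map an identifier to a stable pseudonym using keyed hashing (HMAC-SHA256)."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize_records(records, id_field, secret_key):
    """Return copies of the records with the identifier field replaced."""
    return [{**r, id_field: pseudonymize(r[id_field], secret_key)} for r in records]


# Hypothetical key and records; in practice the key is stored apart from the data.
key = b"example-key-kept-separately"
records = [
    {"user_id": "alice@example.com", "steps": 8200},
    {"user_id": "bob@example.com", "steps": 4100},
    {"user_id": "alice@example.com", "steps": 9100},
]
pseudo = pseudonymize_records(records, "user_id", key)
```

The same person maps to the same pseudonym, so records can still be linked for analysis, while re-identification requires the separately held key. Other fields (location, timestamps, free text) can still identify someone and need their own review.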
66. … and new challenges
[Challenges diagram from slide 40, repeated]
67. Analytics
Humanity
Responsibility
Galit Shmueli 徐茉莉
Institute of Service Science
Shmueli, G. (2017), Research Dilemmas With Behavioral Big Data, Big Data, vol 5 issue 2, pp. 98-119