NOTICING THE NUANCE:
Designing intelligent systems that can understand semantic, psychological, and behavioral dimensions of our digital footprints
Elizabeth L. Murnane
elm236@cornell.edu
www.cs.cornell.edu/~elm236/
ABOUT ELIZABETH
Currently
•  3rd year PhD at Cornell Information Science
•  Committee: Profs. Dan Cosley (chair), Claire Cardie, Geri Gay
Research
•  Personalization; IR/NLP; Personal Informatics; Affective, Semantic, and Social Computing
•  2011 NSF Graduate Research Fellow
Background
•  2007 MIT S.B. in Mathematics with Computer Science
•  Co-founded MIT CSAIL startup
USER-CENTRIC DATA
•  Explicit & Implicit
•  User-generated content
•  Sensor data
•  Big Data & Big Personal Data (“Little Data”)
DIGITAL FOOTPRINTS
•  Search Queries
•  Social web, microblogs, media sharing
•  Mobile sensing, personal informatics, life-logging, check-ins
•  Social networking
NUANCED DIMENSIONS OF DATA
•  Semantics
   •  Helping machines extract intended meaning from an individual’s content
•  Personality & Emotion
   •  Helping machines interpret psychological, affective, and subjective characteristics of users and their data
•  Behavior
   •  Helping machines understand the dynamics of both private and interpersonal activities
APPLICATION AREAS
Knowledge Sharing
Personal Informatics
Information Retrieval
RESEARCH PROJECTS
Computational Problem: Information Retrieval
•  Dimensions Mined: Semantic, Psychological
•  Projects: RESLVE, Sentiment-based search
Computational Problem: Knowledge Sharing
•  Dimensions Mined: Semantic, Psychological
•  Projects: CeRI, Outreach, Task routing, Commenting interface
Computational Problem: Personal Informatics
•  Dimensions Mined: Psychological, Behavioral
•  Projects: Smart Pensieve, Activity Rhythms, Smoking Cessation
THE RESLVE PROJECT
•  Gain better understanding of challenges machines face in understanding semantic meaning of social Web data
•  Use those insights to develop more advanced computational methods that can more reliably make sense of this data
SOCIAL WEB
•  10 million pages per day
•  800 million visitors per month
•  7 billion images (twice as many as 4 years ago)
TASK DEFINITION
Named Entity Recognition (NER)
•  Systematically identifying mentions of entities (e.g., people, places, concepts, ideas)
Named Entity Disambiguation (NED)
•  Resolving the intended meaning of ambiguous entities from multiple candidate meanings
AMBIGUOUS ENTITIES
aaahh one more day until finn!!! #cantwait
office holiday party
Beetle
Footage: office holiday party
Episode 4
office, december 3
Footage:
•  Workplace?
•  TV Show?
•  US Version?
•  UK Version?
ANALYSIS
Data Sample
•  Twitter: tweets
•  YouTube: video titles, descriptions
•  Flickr: photo tags, titles, descriptions
TEXT LENGTH
•  Longest utterances still shorter than even shortest texts from NER task corpora like Reuters-21578, Brown Corpus
[Figure: text lengths of Twitter, YouTube, and Flickr utterances compared with Reuters and Brown texts]
HIGH AMBIGUITY
•  NER services have low confidence
[Figure: confidence scores (0 to 1) for Wikipedia Miner and DBPedia Spotlight]
•  Many potential candidates (2 to 163, avg. 5-6, median 4)
•  91% of utterances contain at least 1 ambiguous entity
•  2/3 of entities detected are ambiguous
•  Almost no entities without at least 2 senses to disambiguate
CHALLENGES & FOCUS
•  Short Length
•  Sparse Lexical Context
•  Noisy
•  Highly personal in nature
LIMITATIONS OF EXTANT RESEARCH
Tweets severely degrade traditional techniques
•  Stanford NER: F1 drops 90% → 46%
•  DBPedia Spotlight & Wikipedia Miner: P@1 < 40%
Recent strategies
•  Crowd-sourcing
   •  Limitation: Dependent on reliable human workers
•  Automated attempts
   •  Limitation: Focus on NER not NED
   •  Limitation: Generalizability beyond Twitter?
HYPOTHESES
•  User has core interests
   •  User more likely to mention an entity about a topic relevant to personal interests than mention a topic of non-interest
•  User expresses these interests consistently in content she posts online in multiple communities
•  Can use a semantic knowledge base to formally represent these topics of interest
   •  Wikipedia
      •  Articles, categories effectively represent topic
      •  Compatible with NER toolkits (DBPedia Spotlight, Wikipedia Miner)
      -  Article editing behavior ≈ interests
QUALITATIVE ANALYSIS: STABLE INTERESTS
A user’s topics of contribution similar across Web:
•  Same Topic
   Ambiguous YouTube post: office, december 3
   Same user’s recent Wikipedia edit: <item userid="xxxx" user="xxxx" pageid="31841130" title="The Office (U.S. season 8)"/>
•  Same categories
   On average, 52.4% of entities a user mentions in social Web (e.g., “Java”) have at least 1 candidate sense in the same parent category as a Wikipedia article the same user edited (e.g., “Programming language”)
   Extending to just 4 parents up the category hierarchy covers all 100%
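The category-overlap check behind the figures above can be sketched roughly as follows. This is a minimal illustration and not the analysis code from the talk: the `PARENTS` map, the category names, and both helper functions are hypothetical, and a real run would read category links from Wikipedia instead of a hand-written dictionary.

```python
def ancestors(category, parents, max_levels):
    """Return `category` plus every category within `max_levels` parent hops."""
    seen = {category}
    frontier = {category}
    for _ in range(max_levels):
        frontier = {p for c in frontier for p in parents.get(c, ())} - seen
        if not frontier:
            break
        seen |= frontier
    return seen


def shares_category(candidate_cats, edited_cats, parents, max_levels=0):
    """True if a candidate sense and an edited article meet within `max_levels` hops."""
    cand = set().union(*(ancestors(c, parents, max_levels) for c in candidate_cats))
    edit = set().union(*(ancestors(c, parents, max_levels) for c in edited_cats))
    return bool(cand & edit)


# Hypothetical toy hierarchy (child category -> parent categories).
PARENTS = {
    "Object-oriented programming languages": ["Programming languages"],
    "Programming languages": ["Computer languages"],
    "Islands of Indonesia": ["Islands"],
}
EDITED = {"Programming languages"}  # categories of an article the user edited
JAVA_LANGUAGE = {"Object-oriented programming languages"}
JAVA_ISLAND = {"Islands of Indonesia"}

print(shares_category(JAVA_LANGUAGE, EDITED, PARENTS, max_levels=1))
print(shares_category(JAVA_ISLAND, EDITED, PARENTS, max_levels=4))
```

Raising `max_levels` corresponds to walking further up the category hierarchy, which is why coverage grows as more parent levels are allowed.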
  
STRATEGY
•  Bridge user identity between social Web and knowledge base, K
•  Model interests using K’s organizational scheme
•  Rank entity senses according to relevance to interests
EXPLORING A PERSONALIZED SOLUTION
•  Individual-centric approach to NED
•  Incorporates external, user-specific semantic data
•  Model personal interests with respect to this information
•  Determine user’s likely intended meaning of ambiguous entity based on similarity between potential meanings and interests
RESLVE: Resolving Entity Sense by LeVeraging Edits
Personal Context
IMPLEMENTATION: THE RESLVE SYSTEM
RESLVE (Resolving Entity Sense by LeVeraging Edits) addresses NED by:
I.  Connecting social Web + Wikipedia editor identity
II.  Modeling topics of interest using article edits
III.  Ranking entity candidates by personal relevance
[Figure: RESLVE system diagram. Labels: user utterances (unstructured short texts); pre-processor; Wikipedia Miner; DBPedia Spotlight; detected entities & candidate meanings ("m"); username; user contributed structured documents; user interest model; I. Bridging user identity; II. Modeling user interest; III. Ranking candidates by personal relevance; top ranked personally-relevant candidates]
PHASE 1: BRIDGING WEB IDENTITIES
•  Connect identity of social media user with Wikipedia editor
•  Simple string matching
   -  Iofciu, 2011; Perito, 2011
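One way to read "simple string matching" is to compare usernames after light normalization. The sketch below is an assumption about what such matching could look like, not the procedure used in RESLVE or in the cited work; `normalize` and `bridge_identity` are hypothetical helpers.

```python
import re


def normalize(username):
    """Lowercase a username and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", username.lower())


def bridge_identity(social_username, wikipedia_usernames):
    """Return the Wikipedia usernames whose normalized form equals the social one."""
    target = normalize(social_username)
    if not target:
        return []  # nothing usable to match on
    return [w for w in wikipedia_usernames if normalize(w) == target]


# Hypothetical usernames, for illustration only.
matches = bridge_identity("Jane_Doe42", ["janedoe42", "JaneDoe", "Jane Doe 42"])
print(matches)
```

Exact matching after normalization is cheap but can link two different people who happen to share a handle, so a match is better treated as a candidate link than as proof of shared identity.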
  
PHASE 2: REPRESENTING USERS AND ENTITIES
•  Models user’s topics of interest using bridged Wiki account’s editing history
•  Compares similarity of those topics to topic associated with candidate sense
•  Content-based & knowledge-graph based similarity
MODELING A KNOWLEDGE CONTEXT
•  Knowledge base, K
•  K = (N, E)
•  2 node types:
   -  Categories
   -  Topics
[Figure: example knowledge graph with nodes c1 to c4, t1 to t3, and d1 to d3]
USER INTEREST MODEL
•  Editing a description signals interest in associated topic
•  Topic nodes: all topics user edited description of
•  Category nodes: categories reachable in knowledge graph from those topics
•  Edge weight = inverse of shortest path length
[Table: example topic-by-category weight matrix, rows t1 to t3, columns c1 to c4]
•  Same representation for candidates
PHASE 2: REPRESENTING USERS AND ENTITIES
•  Weighted vectors used to represent user and candidate sense
PHASE 3: RANKING BY PERSONAL RELEVANCE
Output highest scoring candidate as intended meaning by measuring:
sim(u, m) = α · sim_content(u, m) + (1 − α) · sim_category(u, m)
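A minimal sketch of the scoring formula above. The slides do not say how the content and category similarities are computed, so cosine similarity over weighted vectors is an assumption here, as are the vector layouts, the value of `alpha`, and the function names.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def sim(user, candidate, alpha=0.5):
    """sim(u, m) = alpha * sim_content(u, m) + (1 - alpha) * sim_category(u, m)."""
    return (alpha * cosine(user["content"], candidate["content"])
            + (1 - alpha) * cosine(user["category"], candidate["category"]))


def rank_candidates(user, candidates, alpha=0.5):
    """Candidate names sorted from most to least personally relevant."""
    return sorted(candidates, key=lambda name: sim(user, candidates[name], alpha),
                  reverse=True)


# Hypothetical vectors, for illustration only.
USER = {"content": [1.0, 0.0, 1.0], "category": [1.0, 0.0]}
CANDIDATES = {
    "Office (workplace)": {"content": [0.0, 1.0, 0.0], "category": [0.0, 1.0]},
    "The Office (U.S. TV series)": {"content": [1.0, 0.0, 1.0], "category": [1.0, 0.0]},
}

ranking = rank_candidates(USER, CANDIDATES)
print(ranking[0])
```

Setting `alpha` closer to 1 leans on content similarity, while values closer to 0 lean on the category-graph similarity.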
  
PRE-PROCESSING & PREPARATION MODULES
[Figure: RESLVE system diagram]
EXPERIMENT
Labeling correct entity meaning
•  1545 valid ambiguous entities
•  Mechanical Turk Categorization Masters
•  Averaged observed agreement across all coders and items = 0.866
•  Average Fleiss Kappa = 0.803
•  918 unanimously labeled ambiguous entities
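For readers unfamiliar with the agreement statistics above, the following sketch computes mean observed agreement and Fleiss' kappa using the standard textbook definitions. The label counts are invented toy data, not the study's annotations, and `fleiss_kappa` is a hypothetical helper that assumes every item has the same number of raters.

```python
def fleiss_kappa(counts):
    """counts[i][j] = number of raters who assigned item i to category j.

    Returns (mean observed agreement, Fleiss' kappa).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    # Per-item agreement: share of rater pairs that agree on that item.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_items
    # Chance agreement from the overall category proportions.
    totals = [sum(col) for col in zip(*counts)]
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals)
    return p_bar, (p_bar - p_e) / (1 - p_e)


# Toy data: 4 items, 3 raters each, 2 candidate meanings.
COUNTS = [[3, 0], [0, 3], [2, 1], [3, 0]]
observed, kappa = fleiss_kappa(COUNTS)
unanimous = sum(1 for row in COUNTS if max(row) == sum(row))
print(round(observed, 3), round(kappa, 3), unanimous)
```

Kappa discounts the agreement expected by chance, which is why it is reported alongside, and is lower than, the raw observed agreement.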
  
PERFORMANCE
Metric
•  Precision at rank 1 (P@1)
Methods of comparison
•  Human annotated gold standard
•  RC: Randomly sorted candidates
•  PF: Prior frequency
•  RU: RESLVE given a random Wikipedia user's interest model
•  DS: DBPedia Spotlight
•  WM: Wikipedia Miner
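Precision at rank 1 is the share of ambiguous entities whose top-ranked candidate matches the gold-standard label. A small sketch, with hypothetical entity names and rankings standing in for real system output:

```python
def precision_at_1(rankings, gold):
    """Fraction of entities whose first-ranked candidate equals the gold label."""
    hits = sum(1 for entity, ranked in rankings.items()
               if ranked and ranked[0] == gold[entity])
    return hits / len(rankings)


# Hypothetical rankings from some method, plus human gold labels.
RANKINGS = {
    "office": ["The Office (U.S. TV series)", "Office"],
    "java": ["Java", "Java (programming language)"],
}
GOLD = {
    "office": "The Office (U.S. TV series)",
    "java": "Java (programming language)",
}

print(precision_at_1(RANKINGS, GOLD))
```

Each method of comparison listed above would supply its own `rankings` dictionary, scored against the same gold labels.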
  
RESULTS
Method   Flickr   Twitter   YouTube
RESLVE   0.63     0.76      0.84
RC       0.21     0.32      0.31
PF       0.74     0.69      0.66
RU       0.51     0.71      0.78
WM       0.78     0.58      0.80
DS       0.53     0.67      0.63
RESULTS
•  Best performance on YouTube texts (longest) due to content-based sim
•  Outperforms on more personal text (e.g., tweets)
•  Less effective on impersonal text (e.g., photo geo-tags)
   •  High prior frequency so standard methods suffice
   •  Personally-unfamiliar topics so not likely to make Wiki edits about them
   •  Stable interests assumption breaks down here
SENTIMENT BASED SEARCH
•  Zip codes of 10 most populated cities, 10 least populated cities, 10 random cities across the country
•  54,015 places across 1500 US cities
•  Movie theaters, hotels, spas, stores, restaurants, etc.
CHALLENGES FOR USERS
•  Interpreting mixed reviews
•  Confidence in reviewer’s subjective opinions
•  Reading multiple reviews
RESEARCH QUESTIONS
•  Language and rating
   •  How does a place’s rating relate to the language used in its reviews?
•  Personality and rating
   •  Do people with similar personalities tend to like or dislike the same places?
•  Search interfaces
   •  How can we rank search results in order to recommend places according to how appealing their atmosphere is likely to be to a user based on her personality and mood?
STRATEGY
•  Extract features from reviews using Linguistic Inquiry and Word Count (LIWC) and MRC Psycholinguistic Database
•  Support vector models trained by Mairesse algorithm to derive Big Five personality types of reviewers
•  Average personality score of reviewers of a place who rated the place 5 or higher/lower as proxy for people who like/dislike a location’s essence
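The averaging step above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the review record layout, the trait names in `TRAITS`, the function `place_profiles`, and the `threshold` parameter are all assumptions made for the example, and the personality scores are taken as already computed by some upstream model.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trait keys; the slides name the Big Five but not a data layout.
TRAITS = ("extraversion", "stability", "agreeableness",
          "conscientiousness", "openness")


def place_profiles(reviews, threshold):
    """Average reviewer personality per place, split by rating.

    reviews: iterable of dicts with keys 'place', 'rating', and
             'personality' (a dict mapping each trait to a score).
    threshold: ratings >= threshold count as "like" (an assumed cutoff).
    """
    groups = defaultdict(lambda: {"like": [], "dislike": []})
    for review in reviews:
        side = "like" if review["rating"] >= threshold else "dislike"
        groups[review["place"]][side].append(review["personality"])

    profiles = {}
    for place, sides in groups.items():
        profiles[place] = {
            # None marks a side with no reviewers, so nothing is averaged.
            side: ({t: mean(p[t] for p in people) for t in TRAITS}
                   if people else None)
            for side, people in sides.items()
        }
    return profiles
```

The "like" profile of a place could then be compared against a searcher's own trait scores to rank results, which is one possible reading of the search interface question above.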
  
SEARCH INTERFACES
Extraversion = Red, Stability = Purple, Agreeableness = Green, Conscientiousness = Blue, Openness = Yellow
  
CONSUMING INFORMATION
CREATING & SHARING INFORMATION
  
RESEARCH PROJECTS
Knowledge Sharing
•  CeRI (Outreach, Task routing, Commenting interface)
  
CERI: CORNELL E-RULEMAKING INITIATIVE
•  Law School
•  Legal Information Institute (LII)
•  Information Science
•  Computer Science
  
BACKGROUND
•  Rulemaking: process federal agencies use to create regulations (called “rules”)
•  e-Rulemaking: the use of digital technologies during this process
•  Regulations.gov, RegulationRoom.org: online communities that allow people to learn about, discuss, and react to proposed rules during e-Rulemaking process
  
  
PARTICIPATION PATTERNS
•  Regulations.gov
    •  14,000 rules
    •  2 million comments
•  Regulation Room
    •  5 live rules
    •  1,318 comments
•  Common problem: under-contribution
[Figures: Frequency of comments per rule; Comments per rule across agencies]
  
CHALLENGE
•  A major goal of eRulemaking is to increase public participation across a broad audience and make the process more representative
•  A major challenge is sustained participation by multiple actors across rules
SOLUTION
A solution is to bring new users to an e-rule
•  Twitter is a popular medium where people express views and ideas
•  Identify and target Twitter users who may be interested in contributing feedback on a rule
  
EXPERIMENT
•  How useful is Twitter content for drawing inferences about people’s interests and knowledgeability about a topic?
•  Are users who create content about topics relevant to an e-rule more likely to engage in related e-Rulemaking processes if targeted with requests for participation?
  
1 Identify Subjects
Conditions: Bio, Tweet, Combo, Control
User: Document term matrix
    D1 = bio, D2 = tweets, D3 = bio+tweets (“combo”)
Rule: Query
    q = words in rule + query expansion *
•  Similarity between query and each document
•  Highest score used to assign user to condition
* via Google Keyword Tool, which provides less technical words used by public to discuss same topics
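The matching step above can be illustrated with a small sketch. The slides do not say which term weighting or similarity function was used, so TF-IDF with cosine similarity is an assumption here, as are the helper names (`tokenize`, `tfidf_vectors`, `cosine`, `assign_condition`). The Control condition and the query expansion step are not modeled.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real pipeline would also strip punctuation.
    return text.lower().split()


def tfidf_vectors(docs):
    """docs: dict of name -> text. Returns name -> {term: tf-idf weight}."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(counts)
    doc_freq = Counter(term for c in counts.values() for term in c)
    # Smoothed idf so that every term keeps a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return {name: {t: tf * idf[t] for t, tf in c.items()}
            for name, c in counts.items()}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def assign_condition(query, bio, tweets):
    """Score q against D1 (bio), D2 (tweets), D3 (combo); pick the highest."""
    docs = {"query": query, "bio": bio, "tweet": tweets,
            "combo": bio + " " + tweets}
    vecs = tfidf_vectors(docs)
    scores = {name: cosine(vecs["query"], vecs[name])
              for name in ("bio", "tweet", "combo")}
    return max(scores, key=scores.get), scores
```

For example, `assign_condition("airline passenger rights", "i love cooking", "airline passenger rights delayed flight")` places the user in the tweet condition, because the tweets match the query terms with the fewest unrelated words.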
  
2 Send Tweets
•  Highest ranked users in each group sent an outreach tweet
  
3 Measure Response
•  Engagement (retweets, replies, and follows)
•  Click Through Rate
•  Contributed to the rule
  
PSYCHOLOGICAL TRAITS OF EFFECTIVE CONTRIBUTORS
•  Connecting psychological traits, language use, and contribution capability
•  Classification, Outreach, and Task Routing
•  Inventories
    •  Self-efficacy & self-esteem
    •  Big 5 personality
    •  Self-regulation & self-monitoring
    •  Trendsetting & Opinion Leadership
    •  Pro-social & altruistic value orientations
  
COMPUTATIONAL SUPPORTS FOR KNOWLEDGE SHARING
•  Meaningful games to teach community norms
•  Personalized rule recommendation
•  Providing assistance, prompts, and examples to improve the quality of contributions
  
RESEARCH PROJECTS
Personal Informatics
•  Smart Pensieve
  
REMINISCENCE
•  Current tools are too technically focused
    •  Emphasize data capture and logging (photos, videos, scanned documents)
    •  Treat memories as information to be later manipulated
•  But the activity of reminiscence is actually...
    •  Imprecise
    •  Social
    •  Nuanced
  
SMART PENSIEVE: WHAT MAKES A MEMORY MEANINGFUL?
•  Content type
    •  Photos, wall posts, status updates, event information
•  Social dynamics
    •  Tie strength, kind of relationship, amount of interaction
•  Temporal features
    •  Recent, distant past
  
TRIGGERING MEMORY
PAM, PANAS, ISS, MSCS
  
COLLECTING SURVEY DATA
•  Laboratory setting
    •  Pro: Can monitor participants & ensure data quality
    •  Con: More time consuming for researcher
    •  Con: Higher pay rates
•  Online surveys
    •  Pro: Allow larger scale collection
    •  Pro: Cheaper (time & money)
    •  Con: Drop-outs and missing responses common
IMPROVING SURVEY ADMINISTRATION
  
RESEARCH PROJECTS
Personal Informatics
•  Activity Rhythms
  
BEHAVIOR & HEALTH
•  Assess sleep patterns & circadian rhythm
•  Capture behavioral factors associated with stress
•  Approach
    •  Screen on/off
    •  Unlocking
    •  Application usage
    •  Internet search
    •  SMS, email, phone
BEHAVIOR & HEALTH
•  Scheduling Patterns
•  Socially-Oriented Behaviors
•  Approach
    •  Calendar entries, social media posts, messages
    •  Psycholinguistic Analysis
    •  Personality Inventory
  
RESEARCH PROJECTS
Personal Informatics
•  Smoking Cessation
  
SMOKING CESSATION
•  Leading cause of preventable death & leading form of chemical dependence in U.S.
•  44 million smokers in the U.S. alone (1/5 of population)
•  68.8% report they want to quit and over 50% have tried for at least 1 day in the past year
•  Relapse common & a minority permanently abstain
  
INTERVENTION
•  Requires tailoring to individual conditions
•  Lack of long term patient assessment & follow-up
•  Access and affordability are obstacles
•  Technology based interventions have major shortcomings
    •  Low adherence to established guidelines
    •  Not personalized
    •  Unable to handle user struggles and setbacks
  
FACTORS INFLUENCING OUTCOME
•  Personal, psychological, emotional traits
•  Behaviors & activities
•  Environment and social interactions
•  Cessation motivations and process
LEVERAGING DIGITAL FOOTPRINTS
•  Naturally expressed language
•  Content is posted spontaneously and regularly
•  Social setting
•  Low-cost, large-scale, longitudinal data access
  
MAKE A PREDICTION
•  General illness + coughing + wheezing = Today I quit smoking.
•  Just saw a cigarette commercial with people with holes in their throat. It's official. No more cigarettes.
•  Today, I quit smoking. My son came home with an ashtray he made in arts and crafts class. FML
•  i’m cool, day 4 no cigs but my mom smokes, i stay with her, does not respect me trying to quit :
•  I quit smoking on Sunday evening. Day 3 today. I feel exhausted, annoyed, bored. But the fight must go on. Keep fighting :)
•  somebody is getting punched in the f***ing mouth today. #coldturkey
  
METHODOLOGY & DATA COLLECTION
•  Identify smokers
    •  Query Twitter firehose for cessation event tweets
    •  Sample 2000 users
    •  3 Mechanical Turkers per tweet for verification
    •  2 years worth of tweets per verified smoker (1 year before cessation event, 1 year after)
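The verification step above could be aggregated along these lines. The slides only say that three Mechanical Turk workers judged each tweet; combining their judgments by simple majority vote is an assumption, and `verify_tweet` and `verified_users` are hypothetical helper names.

```python
from collections import Counter


def verify_tweet(labels):
    """labels: worker judgments, True = genuine cessation event tweet.

    Returns True when a strict majority of workers agree it is genuine
    (an assumed rule; the source does not state how votes were combined).
    """
    votes = Counter(labels)
    return votes[True] > len(labels) / 2


def verified_users(annotations):
    """annotations: dict of user -> list of worker labels for that user's tweet."""
    return [user for user, labels in annotations.items() if verify_tweet(labels)]
```

With three workers per tweet, two agreeing judgments are enough to keep a user in the sample under this rule.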
  
MEASURES
Activity variables
•  Tweet volume, burstiness, frequency
Social variables
•  Friends, followers, tweets with @mentions, unique mentions
Personal & Emotional variables
•  Location, sentiment intensity
Behavior Change Process variables
•  Cessation date, motive to quit, treatment, stages of behavior change
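A few of the activity and social variables above can be sketched as follows. The slides do not define burstiness or frequency, so the formulas here are assumptions: burstiness uses the coefficient B = (σ − μ) / (σ + μ) over gaps between consecutive tweets, which ranges from −1 to 1 and is therefore on a different scale from the values reported later in the deck. The function names and the mention pattern are also illustrative.

```python
import re
from statistics import mean, pstdev

# Simplified @mention pattern (an assumption, not Twitter's exact username rules).
MENTION = re.compile(r"@(\w+)")


def activity_measures(timestamps):
    """timestamps: sorted tweet times in seconds."""
    volume = len(timestamps)
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if not gaps:
        return {"volume": volume, "mean_gap": None, "burstiness": None}
    mu, sigma = mean(gaps), pstdev(gaps)
    # -1 = perfectly regular posting, values near 1 = highly bursty posting.
    burstiness = (sigma - mu) / (sigma + mu) if (sigma + mu) else 0.0
    return {"volume": volume, "mean_gap": mu, "burstiness": burstiness}


def social_measures(tweets):
    """tweets: list of tweet texts for one user."""
    mentions = [m.lower() for text in tweets for m in MENTION.findall(text)]
    return {
        "tweets_with_mentions": sum(1 for text in tweets if MENTION.search(text)),
        "unique_mentions": len(set(mentions)),
    }
```

Computing each measure separately on the year before and the year after the cessation date would give the before/after pairs that the results slides compare.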
  
	
  
RESPONSE VARIABLES
•  Outcome
    •  Survival / Relapse
•  Survivors
    •  Congratulations to me, still smoke free :)
    •  @username nope i don’t smoke anymore
    •  first few weeks were hard but I haven’t craved a cig in months
•  Relapsers
    •  Day 26: Broke down and bought a pack of smokes last weekend. Smoked the last one today.
    •  Well, tried to quit smokin tobacco but..had a fucked up day
    •  So day 3 of not smoking is about to get cut short..i can’t do it lol
  
ALIGNMENT WITH CDC REPORTS
•  Location
•  Gender
                Men    Women
    CDC         54%    46%
    Twitter     59%    41%
•  Abstinence Rates
RESULTS
•  Survivors (S) and Relapsers (R)
•  Before (B) and After (A) the cessation point
SIGNIFICANT DIFFERENCES: ACTIVITY
            Tweets before   Tweets after   Burst before   Burst after   Freq before   Freq after
FAIL        1243            3551           10.119         10.943        3.56          2.704
SUCCEED     412             771            4.459          4.278         9.906         11.254
  
TIME OF DAY
“outside the club and guy beside me smoking makes me wanna”
“im really considering smoking tonight bcause im so stressed”
  
SIGNIFICANT DIFFERENCES: SOCIAL
            Friends before   Friends after   Followers before   Followers after
FAIL        .093             .073            .074               .064
SUCCEED     .187             .207            .114               .125
“Starting the patch today. Everyone please support me on the road to quitting smoking”
“Ok I started a really big challenge yesterday... I quit smoking! I may need some help from you guys in the upcoming days/weeks”.
“Day 2 of not smoking #bittersweet”
“I quit smoking yesterday and everyone is pissing me off!”
“Day 3 without a cig. Ooo I'm about to shoot someone”
  
MOTIVES
PREDICTION
  
CONTRIBUTIONS
•  Theoretical contributions
    •  Goal setting
    •  Behavior change
•  Computational contributions
    •  Classification of smoking-relevant content
    •  Extraction of informative data features
    •  Modeling the process & predicting ultimate outcome
    •  Design implications for intelligent intervention technologies
  
  
SUMMARY & CONCLUSION
•  Advance our understanding of what our digital footprints reveal about us as humans
•  Develop new computational techniques that can make sense of and utilize this data’s nuanced semantic, psychological, and behavioral dimensions
•  Apply the resulting intelligent systems across multiple domains in order to help people use digital information and have meaningful experiences with technology
THANK YOU!
•  Questions, comments, and guidance welcome!
Elizabeth L. Murnane
elm236@cornell.edu
www.cs.cornell.edu/~elm236/
  

 
Rapid Data Exploration With Hadoop
Rapid Data Exploration With HadoopRapid Data Exploration With Hadoop
Rapid Data Exploration With Hadoop
 
Kohacon2016
Kohacon2016Kohacon2016
Kohacon2016
 
Social Media Brian Johnson Program, 07.14.10
Social Media Brian Johnson Program, 07.14.10Social Media Brian Johnson Program, 07.14.10
Social Media Brian Johnson Program, 07.14.10
 
Cortana intelligence suite for projects &amp; hacks
Cortana intelligence suite for projects &amp; hacksCortana intelligence suite for projects &amp; hacks
Cortana intelligence suite for projects &amp; hacks
 

Dernier

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Dernier (20)

A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 

Noticing the Nuance: Designing intelligent systems that can understand semantic, psychological, and behavioral dimensions of our digital footprints

  • 1. NOTICING  THE  NUANCE:   Designing  intelligent  systems  that  can  understand     semantic,  psychological,  and  behavioral  dimensions     of  our  digital  footprints   Elizabeth  L.  Murnane     elm236@cornell.edu     www.cs.cornell.edu/~elm236/  
  • 2. ABOUT ELIZABETH   Currently   •  3rd year PhD at Cornell Information Science   •  Committee: Profs. Dan Cosley (chair), Claire Cardie, Geri Gay   Research   •  Personalization; IR/NLP; Personal Informatics; Affective-, Semantic-, Social-Computing   •  2011 NSF Graduate Research Fellow   Background   •  2007 MIT S.B. in Mathematics with Computer Science   •  Co-founded MIT CSAIL startup
  • 3. USER-CENTRIC DATA   •  Explicit & Implicit   •  User-generated content   •  Sensor data   •  Big Data & Big Personal Data (“Little Data”)
  • 8. DIGITAL FOOTPRINTS   •  Search queries   •  Social web, microblogs, media sharing   •  Mobile sensing, personal informatics, life-logging, check-ins   •  Social networking
  • 9. NUANCED  DIMENSIONS  OF  DATA   •  Semantics   •  Helping  machines  extract  intended  meaning  from  an  individual’s   content   •  Personality  &  Emotion   •  Helping  machines  interpret  psychological,  affective,  and  subjective   characteristics  of  users  and  their  data   •  Behavior   •  Helping  machines  understand  the  dynamics  of  both  private  and   interpersonal  activities  
  • 10. APPLICATION  AREAS   Knowledge   Sharing   Personal   Informatics   Information   Retrieval  
  • 11. RESEARCH PROJECTS (by computational problem)   •  Information Retrieval: dimensions mined: semantic, psychological; projects: RESLVE, sentiment-based search   •  Knowledge Sharing: dimensions mined: semantic, psychological; projects: CeRI, outreach, task routing, commenting interface   •  Personal Informatics: dimensions mined: psychological, behavioral; projects: Smart Pensieve, Activity Rhythms, Smoking Cessation
  • 13. THE RESLVE PROJECT   •  Gain a better understanding of the challenges machines face in understanding the semantic meaning of social Web data   •  Use those insights to develop more advanced computational methods that can more reliably make sense of this data
  • 14. SOCIAL WEB
  • 15. SOCIAL WEB   10 million pages per day
  • 16. SOCIAL WEB   800 million visitors per month
  • 17. SOCIAL WEB   7 billion images (twice as many as 4 years ago)
  • 20. TASK DEFINITION   Named Entity Recognition (NER)   •  Systematically identifying mentions of entities (e.g., people, places, concepts, ideas)   Named Entity Disambiguation (NED)   •  Resolving the intended meaning of ambiguous entities from multiple candidate meanings
  • 21. AMBIGUOUS ENTITIES   aaahh one more day until finn!!! #cantwait   office holiday party   Beetle
  • 30. Episode 4   office holiday party   office, december 3   Footage:   •  Workplace?   •  TV show?   •  US version?   •  UK version?
  • 31. ANALYSIS   Data sample   •  Twitter: tweets   •  YouTube: video titles, descriptions   •  Flickr: photo tags, titles, descriptions
  • 32. TEXT LENGTH   •  Even the longest utterances are still shorter than the shortest texts from NER task corpora such as Reuters-21578 and the Brown Corpus   [Figure: text length chart comparing Twitter, YouTube, Flickr, Reuters, and Brown]
  • 34. HIGH AMBIGUITY   •  NER services have low confidence   •  Many potential candidates (2 to 163, avg. 5-6, median 4)   [Figure: confidence scores on a 0 to 1 scale for Wikipedia Miner and DBpedia Spotlight]
  • 35. HIGH AMBIGUITY   •  91% of utterances contain at least 1 ambiguous entity   •  2/3 of entities detected are ambiguous   •  Almost no entities without at least 2 senses to disambiguate
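The two ratios on this slide can be computed from NER output along the following lines. This is a minimal sketch, not the talk's analysis code: the input shape (one dict per utterance, mapping each detected entity to its list of candidate senses) and the function name `ambiguity_stats` are assumptions made for illustration.

```python
def ambiguity_stats(utterances):
    """Summarize ambiguity; each utterance maps a detected entity to its candidate senses."""
    # Flatten to one candidate list per detected entity.
    entities = [cands for utt in utterances for cands in utt.values()]
    # An entity counts as ambiguous when it has at least 2 candidate senses.
    ambiguous = [c for c in entities if len(c) >= 2]
    with_ambiguous = [u for u in utterances if any(len(c) >= 2 for c in u.values())]
    return {
        "utterances_with_ambiguous": len(with_ambiguous) / len(utterances) if utterances else 0.0,
        "ambiguous_entities": len(ambiguous) / len(entities) if entities else 0.0,
    }
```

The first value corresponds to the "utterances containing at least 1 ambiguous entity" share, the second to the share of detected entities that are ambiguous.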
  • 36. CHALLENGES & FOCUS   •  Short length   •  Sparse lexical context   •  Noisy   •  Highly personal in nature
  • 42. LIMITATIONS OF EXTANT RESEARCH   Tweets severely degrade traditional techniques   •  Stanford NER: F1 drops from 90% to 46%   •  DBpedia Spotlight & Wikipedia Miner: P@1 < 40%   Recent strategies   •  Crowd-sourcing   -  Limitation: Dependent on reliable human workers   •  Automated attempts   -  Limitation: Focus on NER, not NED   -  Limitation: Generalizability beyond Twitter?
  • 44. HYPOTHESES   •  A user has core interests   -  A user is more likely to mention an entity about a topic relevant to personal interests than to mention a topic of non-interest   •  A user expresses these interests consistently in content she posts online in multiple communities   •  A semantic knowledge base can formally represent these topics of interest   •  Wikipedia   -  Articles and categories effectively represent topics   -  Compatible with NER toolkits (DBpedia Spotlight, Wikipedia Miner)   -  Article editing behavior ≈ interests
  • 47. QUALITATIVE ANALYSIS: STABLE INTERESTS   A user’s topics of contribution are similar across the Web:   •  Same topic   -  Ambiguous YouTube post: office, december 3   -  Same user’s recent Wikipedia edit: <item userid="xxxx" user="xxxx" pageid="31841130" title="The Office (U.S. season 8)"/>   •  Same categories   -  On average, 52.4% of entities a user mentions in the social Web (e.g., “Java”) have at least 1 candidate sense in the same parent category as a Wikipedia article the same user edited (e.g., “Programming language”)   -  Extending to just 4 parents up the category hierarchy covers 100% of entities
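The category-overlap check described on this slide can be sketched as follows. The toy `PARENTS` hierarchy, the article names in it, and the function names are hypothetical and only illustrate the idea; the real analysis would walk Wikipedia's category graph.

```python
from collections import deque

# Hypothetical toy hierarchy: node -> parent categories (not real Wikipedia data).
PARENTS = {
    "Java (programming language)": ["Programming languages"],
    "Java (island)": ["Islands of Indonesia"],
    "Python (programming language)": ["Programming languages"],
    "Programming languages": ["Computing"],
    "Islands of Indonesia": ["Geography of Indonesia"],
}

def ancestors(node, max_levels):
    """Return categories reachable from `node` within `max_levels` parent steps."""
    seen = set()
    frontier = deque([(node, 0)])
    while frontier:
        current, depth = frontier.popleft()
        if depth == max_levels:
            continue  # do not climb past the allowed number of levels
        for parent in PARENTS.get(current, []):
            if parent not in seen:
                seen.add(parent)
                frontier.append((parent, depth + 1))
    return seen

def shares_category(candidates, edited_articles, max_levels=1):
    """True if any candidate sense shares an ancestor category with any edited article."""
    edited_cats = set()
    for article in edited_articles:
        edited_cats |= ancestors(article, max_levels)
    return any(ancestors(c, max_levels) & edited_cats for c in candidates)
```

With `max_levels=1` this mirrors the "same parent category" condition; raising it to 4 corresponds to the "4 parents up" variant.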
  • 48. STRATEGY   •  Bridge user identity between the social Web and a knowledge base, K   •  Model interests using K’s organizational scheme   •  Rank entity senses according to relevance to interests
  • 53. EXPLORING A PERSONALIZED SOLUTION   •  Individual-centric approach to NED   •  Incorporates external, user-specific semantic data   •  Models personal interests with respect to this information   •  Determines the user’s likely intended meaning of an ambiguous entity based on similarity between potential meanings and interests   Personal Context   RESLVE: Resolving Entity Sense by LeVeraging Edits
  • 57. IMPLEMENTATION: THE RESLVE SYSTEM   RESLVE (Resolving Entity Sense by LeVeraging Edits) addresses NED by:   I.  Connecting social Web + Wikipedia editor identity   II.  Modeling topics of interest using article edits   III.  Ranking entity candidates by personal relevance   [Figure: RESLVE pipeline. Inputs: user utterances (unstructured short texts) and a username. Components: pre-processor, Wikipedia Miner, DBpedia Spotlight, detected entities with candidate meanings ("m"), user-contributed structured documents, user interest model. Phases: I bridging user identity, II modeling user interest, III ranking candidates by personal relevance. Output: top-ranked personally relevant candidates.]
  • 60. PHASE 1: BRIDGING WEB IDENTITIES   •  Connect identity of social media user with Wikipedia editor   •  Simple string matching   -  Iofciu, 2011; Perito, 2011
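A minimal sketch of what "simple string matching" between usernames could look like. The slide does not say how usernames are compared, so the normalization step (lowercasing and dropping non-alphanumeric characters) and both function names are assumptions for illustration.

```python
import re

def normalize(username):
    """Lowercase and drop everything except ASCII letters and digits (assumed normalization)."""
    return re.sub(r"[^a-z0-9]", "", username.lower())

def bridge_identity(social_username, wikipedia_usernames):
    """Return the Wikipedia usernames whose normalized form equals the social one."""
    key = normalize(social_username)
    # An empty key (e.g. a username made only of punctuation) never matches.
    return [w for w in wikipedia_usernames if key and normalize(w) == key]
```

Exact matching after normalization favors precision; a fuzzier comparison would find more accounts at the cost of linking unrelated people.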
  • 62. PHASE 2: REPRESENTING USERS AND ENTITIES   •  Models the user’s topics of interest using the bridged Wikipedia account’s editing history   •  Compares similarity of those topics to the topic associated with each candidate sense   •  Content-based & knowledge-graph-based similarity
  • 63. MODELING A KNOWLEDGE CONTEXT   •  Knowledge base, K   •  K = (N, E)   •  2 node types:   -  Categories   -  Topics   [Figure: example graph with nodes labeled c1 to c4, t1 to t3, and d1 to d3]
  • 64. USER INTEREST MODEL   •  Editing a description signals interest in the associated topic   •  Topic nodes: all topics whose description the user edited   •  Category nodes: categories reachable in the knowledge graph from those topics   •  Edge weight = inverse of shortest path length   [Table: example weight matrix with topic rows t1 to t3 and category columns c1 to c4]   •  Same representation for candidates
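The weighting rule above (edge weight = inverse of shortest path length) can be sketched with a breadth-first search. The adjacency-dict graph format, the node names in the usage note, and the function name `interest_weights` are illustrative assumptions, not the RESLVE implementation.

```python
from collections import deque

def interest_weights(edited_topics, graph):
    """For each edited topic, map every reachable node to 1 / shortest path length."""
    weights = {}
    for topic in edited_topics:
        dist = {topic: 0}
        queue = deque([topic])
        # Breadth-first search yields shortest path lengths in an unweighted graph.
        while queue:
            node = queue.popleft()
            for nxt in graph.get(node, []):
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        # Skip the topic itself (distance 0); unreachable nodes are simply absent.
        weights[topic] = {n: 1.0 / d for n, d in dist.items() if d > 0}
    return weights
```

For a graph such as `{"t1": ["c1"], "c1": ["c2"], "t2": ["c2"]}`, topic `t1` gets weight 1 for `c1` and 0.5 for `c2`. Treating an absent entry as weight 0 is one way to read the zero cells in the slide's matrix.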
  • 65. PHASE 2: REPRESENTING USERS AND ENTITIES   •  Models the user’s topics of interest using the bridged Wikipedia account’s editing history   •  Compares similarity of those topics to the topic associated with each candidate sense   •  Content-based & knowledge-graph-based similarity   •  Weighted vectors used to represent user and candidate sense
  • 66. PHASE 3: RANKING BY PERSONAL RELEVANCE   Output the highest-scoring candidate as the intended meaning by measuring:   $\mathrm{sim}(u,m) = \alpha \cdot \mathrm{sim}_{\mathrm{content}}(u,m) + (1-\alpha) \cdot \mathrm{sim}_{\mathrm{category}}(u,m)$
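The scoring formula can be sketched as a weighted sum of two similarities. The slide does not name the similarity measure, so cosine similarity over sparse dict vectors is an assumption here, as are the default `alpha=0.5`, the `"content"`/`"category"` keys, and the function names.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_candidates(user, candidates, alpha=0.5):
    """Sort candidate names by alpha*sim_content + (1-alpha)*sim_category, best first."""
    def score(name):
        cand = candidates[name]
        return (alpha * cosine(user["content"], cand["content"])
                + (1 - alpha) * cosine(user["category"], cand["category"]))
    return sorted(candidates, key=score, reverse=True)
```

The first element of the returned list plays the role of the "highest-scoring candidate" that the slide outputs as the intended meaning.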
  • 67. PRE-PROCESSING & PREPARATION MODULES   [Figure: RESLVE pipeline diagram, as on slide 57]
  • 68. pre- processor Wikipedia Miner user utterances unstructured short texts DBPedia Spotlight top ranked personally- relevant candidates entity m m m entity username user contributed structured documents user interest model BRIDGING USER IDENTITY MODELING USER INTEREST I II III RANKING CANDIDATES BY PERSONAL RELEVANCE m m m m m m m m m m entity entity detected entities & candidate meanings ("m") PRE-­‐PROCESSING  &  PREPARATION  MODULES  
  • 69. EXPERIMENT   Labeling correct entity meaning   •  1545 valid ambiguous entities   •  Mechanical Turk Categorization Masters   •  Average observed agreement across all coders and items = 0.866   •  Average Fleiss Kappa = 0.803   •  918 unanimously labeled ambiguous entities
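For readers unfamiliar with the agreement statistics above, the following is a minimal sketch of the standard Fleiss' kappa computation. It assumes every item is labeled by the same number of raters and takes, per item, the count of raters who chose each category. The function name and the toy counts are hypothetical, and the slide does not say how the reported averages were aggregated.

```python
def fleiss_kappa(counts):
    """Return (observed agreement, Fleiss' kappa).

    counts[i][j] is the number of raters who assigned item i to category j.
    Assumes the same number of raters (at least 2) for every item.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])

    # Observed agreement: fraction of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_obs = sum(per_item) / n_items

    # Chance agreement from the overall share of labels in each category.
    shares = [
        sum(row[j] for row in counts) / (n_items * n_raters) for j in range(n_cats)
    ]
    p_exp = sum(s * s for s in shares)

    if p_exp == 1.0:  # every label fell in a single category
        return p_obs, 1.0
    return p_obs, (p_obs - p_exp) / (1 - p_exp)


# Hypothetical toy data: 4 items, 3 raters, 2 candidate meanings.
toy_counts = [[2, 1], [1, 2], [3, 0], [0, 3]]
agreement, kappa = fleiss_kappa(toy_counts)
print(round(agreement, 3), round(kappa, 3))
```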
  • 71. PERFORMANCE   Metric   •  Precision at rank 1 (P@1)   Methods of comparison   •  Human-annotated gold standard   •  RC: Randomly sorted candidates   •  PF: Prior frequency   •  RU: RESLVE given a random Wikipedia user's interest model   •  DS: DBPedia Spotlight   •  WM: Wikipedia Miner
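Precision at rank 1 is the fraction of ambiguous entities for which a method's top-ranked candidate matches the gold-standard meaning. A small sketch follows; the function name, entity identifiers, and candidate labels are hypothetical.

```python
def precision_at_1(rankings, gold):
    """Fraction of entities whose top-ranked candidate equals the gold meaning.

    rankings: dict mapping entity id -> list of candidates, best first.
    gold: dict mapping entity id -> correct candidate.
    """
    if not rankings:
        return 0.0
    hits = sum(
        1 for entity, ranked in rankings.items() if ranked and ranked[0] == gold[entity]
    )
    return hits / len(rankings)


# Hypothetical toy data: two of the four top-ranked candidates are correct.
toy_rankings = {
    "e1": ["a", "b"],
    "e2": ["b", "a"],
    "e3": ["c"],
    "e4": ["d", "c"],
}
toy_gold = {"e1": "a", "e2": "a", "e3": "c", "e4": "c"}
p_at_1 = precision_at_1(toy_rankings, toy_gold)
print(p_at_1)
```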
  • 72. RESULTS
        Method    Flickr   Twitter   YouTube
        RESLVE    0.63     0.76      0.84
        RC        0.21     0.32      0.31
        PF        0.74     0.69      0.66
        RU        0.51     0.71      0.78
        WM        0.78     0.58      0.80
        DS        0.53     0.67      0.63
  • 75. RESULTS   •  Best performance on YouTube texts (longest) due to content-based sim   •  Outperforms on more personal text (e.g., tweets)   •  Less effective on impersonal text (e.g., photo geo-tags)   •  High prior frequency, so standard methods suffice   •  Personally unfamiliar topics, so users are not likely to make Wiki edits about them   •  Stable interests assumption breaks down here
  • 76. RESEARCH PROJECTS   Computational problem: Information Retrieval   •  Dimensions mined: semantic, psychological   •  Projects: RESLVE, sentiment-based search   Computational problem: Knowledge Sharing   •  Dimensions mined: semantic, psychological   •  Projects: CeRI, outreach, task routing, commenting interface   Computational problem: Personal Informatics   •  Dimensions mined: psychological, behavioral   •  Projects: Smart Pensieve, Activity Rhythms, Smoking Cessation
  • 77. SENTIMENT BASED SEARCH
  • 78. SENTIMENT BASED SEARCH   •  Zip codes of 10 most populated cities, 10 least populated cities, 10 random cities across the country   •  54,015 places across 1500 US cities   •  Movie theaters, hotels, spas, stores, restaurants, etc.
  • 83. CHALLENGES FOR USERS   •  Interpreting mixed reviews   •  Confidence in reviewer’s subjective opinions   •  Reading multiple reviews
  • 84. RESEARCH QUESTIONS   •  Language and rating   •  How does a place’s rating relate to the language used in its reviews?   •  Personality and rating   •  Do people with similar personalities tend to like or dislike the same places?   •  Search interfaces   •  How can we rank search results in order to recommend places according to how appealing their atmosphere is likely to be to a user based on her personality and mood?
  • 85. STRATEGY   •  Extract features from reviews using Linguistic Inquiry and Word Count (LIWC) and MRC Psycholinguistic Database   •  Support vector models trained by Mairesse algorithm to derive Big Five personality types of reviewers   •  Average personality score of reviewers of a place who rated the place 5 or higher/lower as proxy for people who like/dislike a location’s essence
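The last step of the strategy, averaging reviewer personality per place, can be sketched as follows. The sketch assumes each reviewer already has Big Five scores (the LIWC/MRC feature extraction and the personality models are not reproduced here), treats the rating cut-off as a parameter because the slide leaves it ambiguous, and uses hypothetical function names and toy reviews.

```python
from statistics import mean

# Trait names as used on the color-legend slide of this deck.
TRAITS = ("extraversion", "stability", "agreeableness", "conscientiousness", "openness")


def place_profiles(reviews, threshold):
    """Average reviewer personality for those who liked vs. disliked a place.

    reviews: list of dicts with a numeric "rating" and a "personality" dict
             holding one score per trait (assumed to be computed upstream).
    threshold: ratings >= threshold count as "like" (an assumption of this sketch).
    """
    liked = [r["personality"] for r in reviews if r["rating"] >= threshold]
    disliked = [r["personality"] for r in reviews if r["rating"] < threshold]

    def average(group):
        if not group:
            return None  # no reviewers on this side of the threshold
        return {t: mean(p[t] for p in group) for t in TRAITS}

    return {"like": average(liked), "dislike": average(disliked)}


def scores(value):
    """Helper for the toy data: the same score for every trait."""
    return {t: value for t in TRAITS}


# Hypothetical toy reviews for a single place.
toy_reviews = [
    {"rating": 5, "personality": scores(4.0)},
    {"rating": 4, "personality": scores(2.0)},
    {"rating": 2, "personality": scores(1.0)},
]
profiles = place_profiles(toy_reviews, threshold=4)
print(profiles["like"]["extraversion"], profiles["dislike"]["extraversion"])
```

A search interface could then compare a user's own trait scores with the "like" profile of each place; how that comparison is done is not specified in the slides.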
  • 86. SEARCH INTERFACES
  • 87. Extraversion = Red, Stability = Purple, Agreeableness = Green, Conscientiousness = Blue, Openness = Yellow
  • 89. CONSUMING INFORMATION
  • 90. CREATING & SHARING INFORMATION
  • 91. RESEARCH PROJECTS   Knowledge Sharing: CeRI
  • 92. CERI: CORNELL E-RULEMAKING INITIATIVE   •  Law School   •  Legal Information Institute (LII)   •  Information Science   •  Computer Science
  • 93. BACKGROUND   •  Rulemaking: process federal agencies use to create regulations (called “rules”)   •  e-Rulemaking: the use of digital technologies during this process   •  Regulations.gov, RegulationRoom.org: online communities that allow people to learn about, discuss, and react to proposed rules during e-Rulemaking process
  • 95. PARTICIPATION PATTERNS   •  Regulations.gov   •  14,000 rules   •  2 million comments   •  Regulation Room   •  5 live rules   •  1,318 comments   •  Common problem: under-contribution
  • 96. PARTICIPATION PATTERNS
  • 97. PARTICIPATION PATTERNS   [Figures: frequency of comments per rule; comments per rule across agencies]
  • 98. CHALLENGE   •  A major goal of e-Rulemaking is to increase public participation across a broad audience and make the process more representative   •  A major challenge is sustained participation by multiple actors across rules
  • 99. SOLUTION   •  A major goal of e-Rulemaking is to increase public participation across a broad audience and make the process more representative   •  A major challenge is sustained participation by multiple actors across rules   A solution is to bring new users to an e-rule   •  Twitter is a popular medium where people express views and ideas   •  Identify and target Twitter users who may be interested in contributing feedback on a rule
  • 100. EXPERIMENT   •  How useful is Twitter content for drawing inferences about people’s interests and knowledgeability about a topic?   •  Are users who create content about topics relevant to an e-rule more likely to engage in related e-Rulemaking processes if targeted with requests for participation?
  • 101. STEP 1: IDENTIFY SUBJECTS   •  Rule: query q = words in rule + query expansion (via Google Keyword Tool, which provides less technical words used by the public to discuss the same topics)   •  User: documents D1 = bio, D2 = tweets, D3 = bio + tweets (“combo”)   •  Document-term matrix   •  Similarity between query and each document   •  Highest score used to assign user to condition   •  Conditions: Bio, Tweet, Combo, Control
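The subject-identification step can be sketched as below. The slide does not specify the term weighting or the similarity measure, so this sketch assumes raw term counts with whitespace tokenization and cosine similarity; the function names and toy texts are hypothetical, and assignment to the Control condition is not modeled.

```python
import math
from collections import Counter


def term_vector(text):
    """One row of a document-term matrix: raw counts of lowercased tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_condition(query, bio, tweets):
    """Score D1 = bio, D2 = tweets, D3 = bio + tweets against the rule query
    and return the name of the best-scoring document with all scores."""
    tweet_text = " ".join(tweets)
    docs = {"bio": bio, "tweet": tweet_text, "combo": bio + " " + tweet_text}
    q = term_vector(query)
    scores = {name: cosine(q, term_vector(text)) for name, text in docs.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy data: an expanded rule query and one Twitter user.
toy_query = "truck driver safety rest hours"
best, scores = assign_condition(
    toy_query, bio="truck driver", tweets=["love my dog", "coffee time"]
)
print(best, scores)
```

In this toy case the bio alone matches the query best, so the user would fall in the Bio condition; ties resolve to the first document in the order bio, tweet, combo.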
  • 102. STEP 2: SEND TWEETS   •  Highest-ranked users in each group are sent an outreach tweet
  • 103. STEP 3: MEASURE RESPONSE   •  Engagement (retweets, replies, and follows)   •  Click-through rate   •  Contributed to the rule
  • 105. PSYCHOLOGICAL TRAITS OF EFFECTIVE CONTRIBUTORS   •  Connecting psychological traits, language use, and contribution capability   •  Classification, Outreach, and Task Routing   •  Inventories   •  Self-efficacy & self-esteem   •  Big 5 personality   •  Self-regulation & self-monitoring   •  Trendsetting & Opinion Leadership   •  Pro-social & altruistic value orientations
  • 106. COMPUTATIONAL SUPPORTS FOR KNOWLEDGE SHARING   •  Meaningful games to teach community norms   •  Personalized rule recommendation   •  Providing assistance, prompts, and examples to improve the quality of contributions
  • 109. RESEARCH PROJECTS   Personal Informatics: Smart Pensieve
  • 111. REMINISCENCE   •  Current tools are too technically focused   •  Emphasize data capture and logging (photos, videos, scanned documents)   •  Treat memories as information to be later manipulated   •  But the activity of reminiscence is actually...   •  Imprecise   •  Social   •  Nuanced
  • 112. SMART PENSIEVE: WHAT MAKES A MEMORY MEANINGFUL?   •  Content type   •  Photos, wall posts, status updates, event information   •  Social dynamics   •  Tie strength, kind of relationship, amount of interaction   •  Temporal features   •  Recent, distant past
  • 113. TRIGGERING MEMORY
  • 114. PAM, PANAS, ISS, MSCS
  • 115. COLLECTING SURVEY DATA   •  Laboratory setting   •  Pro: Can monitor participants & ensure data quality   •  Con: More time consuming for researcher   •  Con: Higher pay rates   •  Online surveys   •  Pro: Allow larger scale collection   •  Pro: Cheaper (time & money)   •  Con: Drop-outs and missing responses common
  • 116. IMPROVING SURVEY ADMINISTRATION
  • 117. RESEARCH PROJECTS   Personal Informatics: Activity Rhythms
  • 118. BEHAVIOR & HEALTH   •  Assess sleep patterns & circadian rhythm   •  Capture behavioral factors associated with stress   •  Approach   •  Screen on/off   •  Unlocking   •  Application usage   •  Internet search   •  SMS, email, phone
  • 119. BEHAVIOR & HEALTH   •  Scheduling Patterns   •  Socially-Oriented Behaviors   •  Approach   •  Calendar entries, social media posts, messages   •  Psycholinguistic Analysis   •  Personality Inventory
  • 120. RESEARCH PROJECTS   Personal Informatics: Smoking Cessation
  • 121. SMOKING CESSATION   •  Leading cause of preventable death & leading form of chemical dependence in U.S.   •  44 million smokers in the U.S. alone (1/5 of population)   •  68.8% report they want to quit and over 50% have tried for at least 1 day in the past year   •  Relapse common & a minority permanently abstain
  • 123. INTERVENTION   •  Requires tailoring to individual conditions   •  Lack of long-term patient assessment & follow-up   •  Access and affordability are obstacles   •  Technology-based interventions have major shortcomings   •  Low adherence to established guidelines   •  Not personalized   •  Unable to handle user struggles and setbacks
  • 124. FACTORS INFLUENCING OUTCOME   •  Personal, psychological, emotional traits   •  Behaviors & activities   •  Environment and social interactions   •  Cessation motivations and process
  • 125. LEVERAGING DIGITAL FOOTPRINTS   •  Naturally expressed language   •  Content is posted spontaneously and regularly   •  Social setting   •  Low-cost, large-scale, longitudinal data access
  • 128. MAKE A PREDICTION   •  “General illness + coughing + wheezing = Today I quit smoking.”   •  “Just saw a cigarette commercial with people with holes in their throat. It's official. No more cigarettes.”   •  “Today, I quit smoking. My son came home with an ashtray he made in arts and crafts class. FML”   •  “i’m cool, day 4 no cigs but my mom smokes, i stay with her, does not respect me trying to quit :”   •  “I quit smoking on Sunday evening. Day 3 today. I feel exhausted, annoyed, bored. But the fight must go on. Keep fighting :)”   •  “somebody is getting punched in the f***ing mouth today. #coldturkey”
  • 130. METHODOLOGY & DATA COLLECTION   •  Identify smokers   •  Query Twitter firehose for cessation event tweets   •  Sample 2000 users   •  3 Mechanical Turkers per tweet for verification   •  2 years’ worth of tweets per verified smoker (1 year before cessation event, 1 year after)
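Two pieces of this collection procedure lend themselves to a short sketch: combining the three worker labels per tweet, and splitting a verified smoker's timeline into the year before and the year after the cessation event. The slide does not say how the three labels were combined, so simple majority vote is an assumption here, as are the function names, the 365-day window length, and the toy timeline.

```python
from datetime import datetime, timedelta


def is_verified(labels):
    """Majority vote over worker labels (True = genuine cessation tweet).
    Majority voting is an assumption of this sketch."""
    return sum(labels) > len(labels) / 2


def split_window(tweets, cessation, days=365):
    """Split tweets into (before, after) lists within `days` of the cessation time.

    tweets: list of dicts with a "time" datetime and a "text" string.
    Tweets outside the window on either side are dropped.
    """
    window = timedelta(days=days)
    before = [t for t in tweets if cessation - window <= t["time"] < cessation]
    after = [t for t in tweets if cessation <= t["time"] <= cessation + window]
    return before, after


# Hypothetical toy timeline for one user.
cessation = datetime(2013, 1, 1)
toy_tweets = [
    {"time": datetime(2011, 1, 1), "text": "too early"},
    {"time": datetime(2012, 6, 1), "text": "before quitting"},
    {"time": datetime(2013, 3, 1), "text": "after quitting"},
    {"time": datetime(2014, 6, 1), "text": "too late"},
]
before, after = split_window(toy_tweets, cessation)
print(len(before), len(after), is_verified([True, True, False]))
```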
  • 131. MEASURES   Activity  variables   •  Tweet  volume,  burstiness,  frequency   Social  variables   •  Friends,  followers,  tweets  with  @mentions,  unique  mentions   Personal  &  Emotional  variables   •  Location,  sentiment  intensity   Behavior  Change  Process  variables   •  Cessation  date,  motive  to  quit,  treatment,  stages  of  behavior   change  
  • 136. RESPONSE VARIABLES   Outcome: Survival / Relapse   Survivors   •  “Congratulations to me, still smoke free :)”   •  “@username nope i don’t smoke anymore”   •  “first few weeks were hard but I haven’t craved a cig in months”   Relapsers   •  “Day 26: Broke down and bought a pack of smokes last weekend. Smoked the last one today.”   •  “Well, tried to quit smokin tobacco but..had a fucked up day”   •  “So day 3 of not smoking is about to get cut short..i can’t do it lol”
  • 137. ALIGNMENT WITH CDC REPORTS   [Figures: location; abstinence rates]   Gender
        Source    Men   Women
        CDC       54%   46%
        Twitter   59%   41%
  • 141. RESULTS   •  Survivors  (S)  and  Relapsers  (R)   •  Before  (B)  and  After  (A)  the  cessation  point  
  • 142. SIGNIFICANT DIFFERENCES: ACTIVITY
        Outcome   Tweets before   Tweets after   Burst before   Burst after   Freq before   Freq after
        FAIL      1243            3551           10.119         10.943        3.56          2.704
        SUCCEED   412             771            4.459          4.278         9.906         11.254
  • 144. TIME OF DAY   •  “outside the club and guy beside me smoking makes me wanna”   •  “im really considering smoking tonight bcause im so stressed”
  • 145. SIGNIFICANT DIFFERENCES: SOCIAL
        Outcome   Friends before   Friends after   Followers before   Followers after
        FAIL      .093             .073            .074               .064
        SUCCEED   .187             .207            .114               .125
        •  “Starting the patch today. Everyone please support me on the road to quitting smoking”
        •  “Ok I started a really big challenge yesterday... I quit smoking! I may need some help from you guys in the upcoming days/weeks”.
  • 146. SIGNIFICANT DIFFERENCES: SOCIAL
        Outcome   Friends before   Friends after   Followers before   Followers after
        FAIL      .093             .073            .074               .064
        SUCCEED   .187             .207            .114               .125
        •  “Day 2 of not smoking #bittersweet”
        •  “I quit smoking yesterday and everyone is pissing me off!”
        •  “Day 3 without a cig. Ooo I'm about to shoot someone”
  • 148. PREDICTION
  • 149. CONTRIBUTIONS   Theoretical contributions   •  Goal setting   •  Behavior change   Computational contributions   •  Classification of smoking-relevant content   •  Extraction of informative data features   •  Modeling the process & predicting ultimate outcome   •  Design implications for intelligent intervention technologies
  • 150. RESEARCH PROJECTS   Computational problem: Information Retrieval   •  Dimensions mined: semantic, psychological   •  Projects: RESLVE, sentiment-based search   Computational problem: Knowledge Sharing   •  Dimensions mined: semantic, psychological   •  Projects: CeRI, outreach, task routing, commenting interface   Computational problem: Personal Informatics   •  Dimensions mined: psychological, behavioral   •  Projects: Smart Pensieve, Activity Rhythms, Smoking Cessation
  • 151. SUMMARY  &  CONCLUSION   •  Advance  our  understanding  of  what  our  digital  footprints  reveal   about  us  as  humans   •  Develop  new  computational  techniques  that  can  make  sense  of   and  utilize  this  data’s  nuanced  semantic,  psychological,  and   behavioral  dimensions   •  Apply  the  resulting  intelligent  systems  across  multiple  domains  in   order  to  help  people  use  digital  information  and  have  meaningful   experiences  with  technology  
  • 152. THANK YOU!   Questions, comments, and guidance welcome!   Elizabeth L. Murnane   elm236@cornell.edu   www.cs.cornell.edu/~elm236/