Personal Information Systems
and Personal Semantics
Gregory Grefenstette
CLEF 2015
September 8, 2015

Clef 2015 Keynote Grefenstette September 8, 2015, Toulouse


People generally think of Big Data as something generated by machines or large communities of people interacting with the digital world. But technological progress means that each individual is currently, or soon will be, generating masses of digital data in their everyday lives. In every interaction with an application, every web page visited, every time your telephone is turned on, you generate information about yourself, Personal Big Data. With the rising adoption of quantified self gadgets, and the foreseeable adoption of intelligent glasses capturing daily life, the quantity of personal Big Data will only grow. In this Personal Big Data, as in other Big Data, a key problem is aligning concepts in the same semantic space. While concept alignment in the public sphere is an understood, though unresolved, problem, what does ontological organization of a personal space look like? Is it idiosyncratic, or something that can be shared between people? We will describe our current approach to this problem of organizing personal data and creating and exploiting a personal semantics.


  1. Personal Information Systems and Personal Semantics. Gregory Grefenstette, CLEF 2015, September 8, 2015
  2. Information is moving from the Web to Apps
     Each person generates a lot of data
     Two communities use it now
     Search in one’s own data is the future
     Four ways to search
     We need personal facets
  3. Apple announced that 100 billion apps had been downloaded from its App Store (June 2015)
     http://www.statista.com/statistics/263795/number-of-available-apps-in-the-apple-app-store/
  4. 2014
  5. Another trend
  6. Smart Glasses
     http://en.wikipedia.org/wiki/File:A_Google_Glass_wearer.jpg
     http://en.wikipedia.org/wiki/File:Aimoneyetap.jpg
     http://en.wikipedia.org/wiki/File:Golden-i_3.8_Headset_Computer.png
     Sony US Patent Application 20130069850
     Microsoft US Patent Application 20120293548
  7. https://www.youtube.com/watch?v=b7I7JuQXttw
  8. Okay … Apps, Quantified Self, Smart Glasses. Step back to NOW
  9. Personal Big Data
  10. Personal Big Data: Email sent, Email received, Social network posts, IP address location, SMS, chats, Search history, Web pages visited, Media viewed, Credit card purchases, Call data, GPS locations, Vital signs, Activity/inactivity, Lifestyle, Conversations, Reading, People seen, Noises heard
  11. Who uses this data today? Surely, each person should have the same access to their own data
  12. Impediments to using our own data
     •  Data Silos
     •  Ownership
     •  Privacy
     •  Big Data Problems
     •  Variety
     •  Volume
     •  Merging -- Semantics
  13. Supposing we could get all our data back into our own hands, how could we search it? Short course on 4 types of search
  14. Search Engines – Cranfield/SMART Model
     ftp://ftp.cs.cornell.edu/pub/smart/cran
     queries:
       .I 6
       .W
       ventricular septal defect occurring in association with aortic regurgitation
       .I 7
       .W
       radioisotopes in heart scanning. mainly used in diagnosis of pericardial effusions. also used to study tumors, heart enlargement, aneurysms and pericardial thickening. technetium, rihsa, radioactive hippurate, cholegraffin are used.
       .I 8
       .W
       the effects of drugs on the bone marrow of man and animals, …
     qrels (query number, document number):
       5 332, 5 333
       6 112, 6 115, 6 116, 6 118, 6 122, 6 238, 6 239, 6 242, 6 260, 6 309, 6 320, 6 321, 6 323
       7 92, 7 121, 7 189, 7 389, 7 390, 7 391, 7 392, 7 393
       8 52, 8 60
     documents:
       … conditions .
       .I 237
       cisternal fluid oxygen ...
       using a beckman micro-oxyg..
       tension simultaneously in the..
       and in arterial blood under..
       that the cisternal oxygen..
       oxygen tension of the surroun.
       the available free oxygen...
       duration in the cerebral...
       .I 238
       ventricular septal defect obstruction .
       a case of ventricular...
       lesion and infundibular...
       coronary cusp of the aortic..
       septal defect, was demonstra..
       as a polyp-like mass in the...
       catheterization and angiocard
       ventricular outflow obstr...
       .I 239
       functional adaptations of the congenital heart disease ....
  15. Search Engines – Cranfield/SMART Model
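The three parts of the Cranfield/SMART setup shown on the slide (queries, qrels, documents) can be sketched in a few lines. This is only an illustration: the helper names `parse_smart`, `parse_qrels` and `precision_at_k` are hypothetical, the one-pair-per-line qrels layout is an assumption, and the ranked list stands in for the output of some unspecified search engine.

```python
from collections import defaultdict


def parse_smart(text):
    """Parse SMART-style records: '.I <id>' opens a record, '.W' opens its text."""
    records = {}
    current_id = None
    in_text = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(".I "):
            current_id = int(line.split()[1])
            records[current_id] = []
            in_text = False
        elif line == ".W":
            in_text = True
        elif in_text and current_id is not None and line:
            records[current_id].append(line)
    return {rid: " ".join(parts) for rid, parts in records.items()}


def parse_qrels(text):
    """Read 'query_id doc_id' pairs (assumed one pair per line)."""
    relevant = defaultdict(set)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            relevant[int(fields[0])].add(int(fields[1]))
    return dict(relevant)


def precision_at_k(ranked_doc_ids, relevant_ids, k):
    """Fraction of the top-k returned documents that the qrels mark as relevant."""
    top = ranked_doc_ids[:k]
    return sum(1 for doc_id in top if doc_id in relevant_ids) / k


QUERIES = """
.I 6
.W
ventricular septal defect occurring in association with aortic
regurgitation
"""

QRELS = """
6 112
6 115
6 238
"""

queries = parse_smart(QUERIES)
relevant = parse_qrels(QRELS)

# Hypothetical ranking produced by some engine for query 6.
ranking_for_q6 = [238, 237, 112, 239]
p_at_4 = precision_at_k(ranking_for_q6, relevant[6], 4)
print(queries[6])
print(p_at_4)
```

The point of the model is that the queries and relevance judgments are fixed in advance, so any ranking can be scored without asking a user again.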
  16. Search Engines – Dewey Decimal Faceted Model
     “The Lancashire cotton industry : a study in economic development”
     Assigned DDC Code: 338.4767721094276
     Built from Schedules:
       3 Economics, Education, Society
       33 Economics and Management
       338 Industries, Products
       338.1 – 338.4 Specific kinds of industries
       338.4 Secondary Industries and Services
       338.47 Goods and Services
       338.471 – 338.479 Subdivisions for Goods and Services
     Built from Schedules:
       338.476 Technology
       338.4767 Manufacturing
       338.47677 Textiles
       338.476772 Textiles of Seed hair fibres
       338.4767721 Cotton
     Built from Table 1:
       338.47677210 Facet Indicator for Standard Subdivision
       338.476772109 Historical, geographic, persons treatment
     Built from Table 2:
       338.4767721094 Europe Western Europe
       338.47677210942 England and Wales
       338.476772109427 Northwestern England and Isle of Man
       338.4767721094276 Lancashire
  17. Search Engines – Dewey Decimal Faceted Model
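The Dewey example reads as a chain of ever longer prefixes, each one a narrower facet. A minimal sketch of that reading follows; the function names `ddc_prefixes` and `matches_facet` are hypothetical, the labels are copied from the slide, and treating every prefix as a searchable facet is a simplification of how DDC numbers are actually built.

```python
def ddc_prefixes(code):
    """Return every prefix of a DDC number, from broadest class to the full code."""
    digits = code.replace(".", "")
    prefixes = []
    for n in range(1, len(digits) + 1):
        head = digits[:n]
        # DDC writes the decimal point after the third digit.
        prefixes.append(head if n <= 3 else head[:3] + "." + head[3:])
    return prefixes


def matches_facet(document_code, facet_code):
    """A document falls under a facet when the facet code is a prefix of its code."""
    return document_code.replace(".", "").startswith(facet_code.replace(".", ""))


# Labels taken from the slide; prefixes without a label are simply skipped.
LABELS = {
    "3": "Economics, Education, Society",
    "33": "Economics and Management",
    "338": "Industries, Products",
    "338.4": "Secondary Industries and Services",
    "338.47677": "Textiles",
    "338.4767721": "Cotton",
    "338.47677210942": "England and Wales",
    "338.4767721094276": "Lancashire",
}

book_code = "338.4767721094276"  # "The Lancashire cotton industry"
prefixes = ddc_prefixes(book_code)
for prefix in prefixes:
    if prefix in LABELS:
        print(prefix, LABELS[prefix])

print(matches_facet(book_code, "338.47677"))  # under Textiles
print(matches_facet(book_code, "338.5"))      # not under 338.5
```

Searching by facet then amounts to prefix matching: asking for 338.47677 returns everything filed under Textiles, including the Lancashire book.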
  18. 2 Other Search Models: Maps, Time Intervals
  19. Past Attempts
  20. MyLifeBits
     "But even with convenient classifications and labels ready to apply, we are still asking the user to become a filing clerk – manually annotating every document, email, photo, or conversation."
     Gemmell, Jim, Gordon Bell, and Roger Lueder. "MyLifeBits: a personal database for everything." Communications of the ACM 49.1 (2006): 88-95.
  21. LifeLog
     "…The user can order the life-log agent to add retrieval keys (annotation) with an arbitrary name by simple operations on his cellular phone while the agent is capturing a life-log video. This enables the agent to identify a scene that the user wants to remember throughout his life, and thus the user can access easily to the videos that were captured during precious experiences"
     Aizawa, Kiyoharu, Tetsuro Hori, Shinya Kawasaki, and Takayuki Ishikawa. "Capture and efficient retrieval of life log." In Pervasive 2004 Workshop on Memory and Sharing Experiences, pp. 15-20. 2004.
  22. Stuff I’ve Seen
     "…Research in cognitive psychology has found that people remember information, particularly older information, not in terms of exact time, but in terms of key episodes, such as a child’s birthday, exotic travel,…"
     Cutrell, Edward, Susan T. Dumais, and Jaime Teevan. "Searching to eliminate personal information management." Communications of the ACM 49.1 (2006): 58-64.
  23. PERSON
     "…we define the general category for user’s activity in advance, such as ordinary activity and extra-ordinary activity. In ordinary activity is related to the activity in home or office. Generally, the activities occurred outside of those area, they are classified as extraordinary activities. In addition to these pre-defined activities, users can add their own activity through our learning based structure… For some duration, we record whole activities of user. For the repeated activities at same time, in same place with similar objects, our activity engine will register as user defined activities by asking in which category those can be included."
     Kim, Ig-Jae, et al. "PERSON: personalized experience recoding and searching on networked environment." Proceedings of the 3rd ACM workshop on Continuous archival and retrieval of personal experiences. ACM, 2006.
  24. Personal Data Prototype
     "…Landmarks of tags are defined by the frequency of tags that are assigned to each item of personal data. A tag that has been in heavy use during a period of time is a candidate for a landmark. A tag that has rarely been used during a long period of time is also a candidate for a landmark. Outliers are candidates for landmarks in time-series data, such as home energy use, the number of steps walked, and histories of body weight. Data that exceed pre-defined or user-defined thresholds are also candidates. Other landmarks are public landmarks, which include shocking public news, bestsellers, blockbuster films, and annual rankings of top Web-search words. We can recall our own experiences on those days from these landmarks."
     Teraoka, Teruhiko. "Organization and exploration of heterogeneous personal data collected in daily life." Human-Centric Computing and Information Sciences 2.1 (2012): 1-15.
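The landmark idea quoted above (outliers and threshold crossings in a personal time series) can be illustrated with a small sketch. It is not Teraoka's implementation: the z-score test is a stand-in for whatever outlier detection the prototype used, and the function names, dates and step counts are invented for the example.

```python
from statistics import mean, pstdev


def outlier_landmarks(series, z_threshold=2.0):
    """Return the days whose value lies more than z_threshold deviations from the mean."""
    values = [value for _, value in series]
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []
    return [day for day, value in series if abs(value - mu) / sigma > z_threshold]


def threshold_landmarks(series, limit):
    """Return the days whose value exceeds a pre-defined or user-defined limit."""
    return [day for day, value in series if value > limit]


# Invented daily step counts; one day stands out.
steps = [
    ("2015-09-01", 5000),
    ("2015-09-02", 5200),
    ("2015-09-03", 4800),
    ("2015-09-04", 5100),
    ("2015-09-05", 4900),
    ("2015-09-06", 5000),
    ("2015-09-07", 20000),
    ("2015-09-08", 5050),
]

print(outlier_landmarks(steps))
print(threshold_landmarks(steps, limit=10000))
```

Either test singles out the unusual day, which can then serve as a memory cue for searching everything else recorded on that date.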
  25. Dublin City University
     "…The user can order the life-log agent to add retrieval keys (annotation) with an arbitrary name by simple operations on his cellular phone while the agent is capturing a life-log video. This enables the agent to identify a scene that the user wants to remember throughout his life, and thus the user can access easily to the videos that were captured during precious experiences"
     Qiu, Zhengwei. "A lifelogging system supporting multimodal access." PhD diss., Dublin City University, 2013.
     Wang, Peng, and Alan F. Smeaton. "Aggregating semantic concepts for event representation in lifelogging." Proceedings of the International Workshop on Semantic Web Information Management. ACM, 2011.
  26. Okay, we’ve seen
     -- Apps / QS
     -- Personal Big Data
     -- Some early attempts
     Everyone says
     Time is important
     Maps are important
     String search is important
     but…
     Facets, what are our personal facets? How can we automate them?
  27. swimming
  28. swimming: (my) people involved in something about swimming
  29. swimming: things I’ve bought involving swimming
  30. swimming: (my) photos and facebook posts related to swimming
  31. swimming: emails about swimming things
  32. swimming: places I’ve been involving swimming
  33. swimming: days involving swimming things
  34. swimming: phone calls about swimming things…
  35. swimming
  36. Rather Self-Centred, no?
  37. Personal Information System
     Personal archives
     Induction semantic dimensions
     Personal Semantic hierarchies
     Crowdsourced semantic Hierarchies (e.g. Wikipedia)
     Expert semantic Hierarchies (e.g. MeSH)
     Ingest/Annotate/Merge
  38. swimming, Knitting, poker, Painting, ...
  39. Expert Ontology >>> Crowdsourcing Folksonomy >>> Personal Models
  40. Expert Models >>> Crowdsourcing Folksonomy >>> Personal Models
  41. Expert Models >>> Crowdsourcing Folksonomy >>> Personal Models
  42. Expert Models >>> Crowdsourcing Folksonomy >>> Personal Models
     Knitting>Knitting_methods_for_shaping>Short_ro
     Knitting>Knitting_stitches
     Knitting>Knitting_stitches>List_of_knitting_stitche
     Knitting>Knitting_stitches>Basic_knitted_fabrics
     Knitting>Knitting_stitches>Decrease_(knitting)
     Knitting>Knitting_stitches>Dip_stitch
     Knitting>Knitting_stitches>Drop-stitch_knitting
     Knitting>Knitting_stitches>Elongated_stitch
     Knitting>Knitting_stitches>Fair_Isle_(technique)
     Knitting>Knitting_stitches>Grafting_(knitting)
     Knitting>Knitting_stitches>Loop_knitting
     Knitting>Knitting_stitches>Pick_up_stitches_(kni
     Knitting>Knitting_stitches>Plaited_stitch_(knitting
     Knitting>Knitting_stitches>Slip-stitch_knitting
     Knitting>Knitting_stitches>Yarn_over
     Knitting>Knitting_tools_and_materials
     Knitting>Knitting_tools_and_materials>Eisaku_N
     Knitting>Knitting_tools_and_materials>Hank_(te
     Knitting>Knitting_tools_and_materials>Knitting_m
     Knitting>Knitting_tools_and_materials>Knitting_N
     Knitting>Knitting_tools_and_materials>Knitting_n
     Knitting>Knitting_tools_and_materials>Knitting_n
     Knitting>Knitting_tools_and_materials>Lazy_Kat
     Knitting>Knitting_tools_and_materials>Liaghra
     Knitting>Knitting_tools_and_materials>Nostepinn
     Knitting>Knitting_tools_and_materials>Row_cou
  43. Expert Models >>> Crowdsourcing Folksonomy >>> Personal Models
  46. Well, no….
  47. Tweet: "Less than 12 hours until I am in the pool crying... thankful for mirrored goggles"
     facets I’d want: Swimming>pool, Swimming>goggles
     this …
  48. Distributional Semantics, 1.5 billion words (http://webdocs.cs.ualberta.ca/~lindek/downloads.htm)
     swimming -- weightlifting, cycling, gymnastics, judo, table, volleyball, archery, rowing, badminton, track, water, taekwondo, tennis, field, diving, handball, boxing, softball, karate, pentathlon, fencing, athletics, triathlon, wrestling, soccer
     Wordnet
  49. Existing taxonomies are for societal exchanges
     Do you want to buy this?
     What famous person did this when?
     What can we make for this?
     We are missing a description of what is related to us, doing something… specific vocabularies, loose taxonomies … facets
  50. Something like…
     Sports/swimming/backstroke
     Sports/swimming/on my back
     Sports/swimming/breaststroke
     Sports/swimming/fins
     Sports/swimming/goggles
     Sports/swimming/fast lane
     Sports/swimming/slow lane
     Sports/swimming/laps
     Sports/swimming/lifeguard
     Sports/swimming/pool
     Sports/swimming/lake
     Sports/swimming/ocean
     Sports/swimming/Neuilly Nautic Centre
     Sports/swimming/South Hills Pool
     Sports/swimming/towel
     Sports/swimming/25m
     Sports/swimming/goggles
     Sports/swimming/cap
     Sports/swimming/swim suit
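To make the idea of personal facets concrete, here is a rough sketch that tags the tweet from slide 47 with facet paths like those on slide 50. The function `match_facets` is hypothetical and uses plain whole-word matching on the last path segment; the talk does not say how facets would actually be assigned, so this shows only the simplest possible baseline.

```python
import re

# A subset of the facet paths listed on the slide.
FACETS = [
    "Sports/swimming/backstroke",
    "Sports/swimming/fins",
    "Sports/swimming/goggles",
    "Sports/swimming/fast lane",
    "Sports/swimming/laps",
    "Sports/swimming/pool",
    "Sports/swimming/lake",
    "Sports/swimming/towel",
    "Sports/swimming/cap",
]


def match_facets(text, facet_paths):
    """Return the facet paths whose last segment occurs as a whole word or phrase in text."""
    lowered = text.lower()
    hits = set()
    for path in facet_paths:
        leaf = path.rsplit("/", 1)[-1].lower()
        if re.search(r"\b" + re.escape(leaf) + r"\b", lowered):
            hits.add(path)
    return sorted(hits)


tweet = ("Less than 12 hours until I am in the pool crying... "
         "thankful for mirrored goggles")
tweet_facets = match_facets(tweet, FACETS)
print(tweet_facets)
```

Even this crude matcher recovers the pool and goggles facets for the tweet, but only because the vocabulary was already there, which is exactly the resource the talk says is missing.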
  51. Not just swimming! http://www.notsoboringlife.com/list-of-hobbies/
  52. Conclusion on Personal facets
     There is a lot of work to do
     •  for predictable needs (hobbies, pastimes, sports), we do not have the basic facets we need
     •  for personal information (family, friends, familiar places), we have very little
     •  And this should be multilingual, too
  53. Conclusion: Searching Personal Big Data
     •  Information is moving from the Web into Apps
     •  People are generating information in these siloed Apps
     •  People generate more digital information every day
     •  Wearable computing will create even more
  54. Conclusion: Searching Personal Big Data
     •  Information is moving from the Web into Apps
     •  People are generating information in these siloed Apps
     •  People generate more digital information every day
     •  Wearable computing will create even more
     •  At some point, people will want their information back
  55. Conclusion: Searching Personal Big Data
     •  Information is moving from the Web into Apps
     •  People are generating information in these siloed Apps
     •  People generate more digital information every day
     •  Wearable computing will create even more
     •  At some point, people will want their information back
     •  When you have too much information, you need facets
     •  The facets for organizing personal information will be needed and do not yet exist
  56. Conclusion: Searching Personal Big Data
     •  Information is moving from the Web into Apps
     •  People are generating information in these siloed Apps
     •  People generate more digital information every day
     •  Wearable computing will create even more
     •  At some point, people will want their information back
     •  When you have too much information, you need facets
     •  The facets for organizing personal information will be needed and do not yet exist
     •  There are billions of cell phone users. They will all want this. You should start working on it.
  57. Thank you ! www.inria.fr
  58. Gurrin, Cathal and Smeaton, Alan F. and Doherty, Aiden R. (2014) LifeLogging: personal big data. Foundations and Trends in Information Retrieval, 8 (1). pp. 1-125. ISSN 1554-0677

     Content type             Per day            Volume per day   Volume per year
     Video                    16 hours           90 GB            33 TB
     Autographer Camera       3000 images        1.3 GB           480 GB
     Audio                    16 hours           630 MB           230 GB
     Microsoft Sensecam       4500 images        82 MB            30 GB
     Accelerometer            58,000 readings    138 KB           50 MB
     Locations                10,000 readings    27 KB            10 MB
     Bluetooth Interactions   400 (estimated)    5 MB             2 GB
     Words heard or read      100,000            700 KB           255 MB
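The yearly column in the table above is, to rounding, the daily column multiplied by 365. A short check follows, assuming decimal units (1 KB = 1000 bytes), which the slide does not state; the names `TABLE` and `yearly_ratio` are made up for this sketch.

```python
UNIT = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}

# (volume per day, volume per year) as given in the table.
TABLE = {
    "Video": ((90, "GB"), (33, "TB")),
    "Autographer Camera": ((1.3, "GB"), (480, "GB")),
    "Audio": ((630, "MB"), (230, "GB")),
    "Microsoft Sensecam": ((82, "MB"), (30, "GB")),
    "Accelerometer": ((138, "KB"), (50, "MB")),
    "Locations": ((27, "KB"), (10, "MB")),
    "Bluetooth Interactions": ((5, "MB"), (2, "GB")),
    "Words heard or read": ((700, "KB"), (255, "MB")),
}


def to_bytes(value, unit):
    """Convert a (value, unit) pair to bytes using decimal prefixes."""
    return value * UNIT[unit]


def yearly_ratio(per_day, per_year):
    """Ratio of 365 x daily volume to the yearly volume reported in the table."""
    return to_bytes(*per_day) * 365 / to_bytes(*per_year)


ratios = {name: yearly_ratio(day, year) for name, (day, year) in TABLE.items()}
for name, ratio in ratios.items():
    print(f"{name}: 365 x daily / yearly = {ratio:.3f}")
```

Every ratio lands within about ten percent of 1, so the yearly figures are rounded projections of the daily ones; continuous video alone accounts for nearly all of the roughly 34 TB a year.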
