MyLifeBits from Microsoft

  1. 1. MyLifeBits, Jim Gemmell, February 2005
  2. 2. Conclusion <ul><li>We have entered an era of virtually unlimited storage, enabling the lifetime store </li></ul><ul><li>To make the store useful we need annotation, typed links, and database features </li></ul><ul><li>More capture, more correlation – less work by the user </li></ul>
  3. 3. Collaborators <ul><li>Chief inspiration & guinea pig: Gordon Bell </li></ul><ul><li>Software development lead: Roger Lueder </li></ul><ul><li>MSR Collaborators: Lyndsay Williams, Ken Wood, Kentaro Toyama, Ron Logan, Steve Drucker, Curtis Wong, Mary Czerwinski, Brian Meyers </li></ul><ul><li>Interns: Josh Blumenstock, Evan Salomon, Aleks Aris </li></ul>
  4. 4. Outline <ul><li>What is MyLifeBits </li></ul><ul><li>History/Motivation </li></ul><ul><li>MyLifeBits system outline </li></ul><ul><li>Demo </li></ul><ul><li>Future work </li></ul>
  5. 5. MyLifeBits is: <ul><li>An experiment in lifetime storage </li></ul><ul><ul><li>Digitizing Gordon Bell’s past </li></ul></ul><ul><ul><li>Capturing more of his future </li></ul></ul><ul><li>A software system </li></ul><ul><ul><li>Capture </li></ul></ul><ul><ul><li>Storage & retrieval </li></ul></ul><ul><ul><li>Organization & annotation </li></ul></ul><ul><li>Minimum requirement: fulfill Vannevar Bush’s 1945 “Memex” vision </li></ul>
  6. 6. Memex: “As We May Think”, Vannevar Bush, 1945 <ul><li>“A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility” </li></ul><ul><li>Full-text search, text & audio annotations, and hyperlinks </li></ul>
  7. 7. I am data
  8. 8. The guinea pig <ul><li>Has now scanned virtually all: </li></ul><ul><ul><li>Books written (and read when possible) </li></ul></ul><ul><ul><li>Personal documents (correspondence including memos and email, bills, legal documents, papers written, …) </li></ul></ul><ul><ul><li>Photos </li></ul></ul><ul><ul><li>Posters, paintings, photos of things (artifacts, …medals, plaques) </li></ul></ul><ul><ul><li>Home movies and videos </li></ul></ul><ul><ul><li>CD collection </li></ul></ul><ul><ul><li>And, of course, all PC files </li></ul></ul><ul><li>Now recording: phone, radio, TV (movies), web pages… conversations and meetings to come </li></ul><ul><li>Paperless throughout 2002. 12” scanned, 12’ discarded. </li></ul><ul><li>Only 44 GB, incl. 10 wma, 14 SQL!!! Video: o(100) + 500 mov </li></ul>
  9. 9. The 1 TB Life <ul><li>1 TB gives you 65+ years of: </li></ul><ul><ul><li>100 email messages a day (5 KB each) </li></ul></ul><ul><ul><li>100 web pages a day (50 KB each) </li></ul></ul><ul><ul><li>5 scanned pages a day (100 KB each) </li></ul></ul><ul><ul><li>1 book every 10 days (1 MB each) </li></ul></ul><ul><ul><li>10 photos a day (400 KB JPEG each) </li></ul></ul><ul><ul><li>8 hours a day of sound, e.g. telephone, voice annotations, and meeting recordings (8 Kb/s) </li></ul></ul><ul><ul><li>1 new music CD every 10 days (45 min each at 128 Kb/s) </li></ul></ul><ul><li>It will take you 5 years to fill up your 80 GB drive </li></ul><ul><li>Want video? Buy more cheap drives (1 TB/year lets you record 4 hours/day of 1.5 Mb/s video) </li></ul>
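The daily budget on this slide can be checked with a few lines of arithmetic. The sketch below is only a rough reconstruction: the slide does not say whether its units are binary or decimal, so binary units (1 KB = 1024 bytes, 1 Kb = 1024 bits) and a 365-day year are assumed, and all names are made up for the example.

```python
# Back-of-the-envelope check of the slide's daily capture budget.
# Assumption (not stated on the slide): binary units, i.e. 1 KB = 1024 bytes.
KB = 1024
MB = 1024 * KB
GB = 1024 * MB
TB = 1024 * GB
DAYS_PER_YEAR = 365


def stream_bytes(kilobits_per_second, seconds):
    """Bytes produced by a constant-bitrate stream (1 Kb = 1024 bits assumed)."""
    return kilobits_per_second * 1024 / 8 * seconds


# Average bytes captured per day for each item on the slide.
daily_bytes = {
    "email (100 x 5 KB)": 100 * 5 * KB,
    "web pages (100 x 50 KB)": 100 * 50 * KB,
    "scanned pages (5 x 100 KB)": 5 * 100 * KB,
    "books (1 MB every 10 days)": 1 * MB / 10,
    "photos (10 x 400 KB)": 10 * 400 * KB,
    "sound (8 h at 8 Kb/s)": stream_bytes(8, 8 * 3600),
    "music (45 min CD at 128 Kb/s, every 10 days)": stream_bytes(128, 45 * 60) / 10,
}

total_per_day = sum(daily_bytes.values())
years_to_fill_1tb = TB / total_per_day / DAYS_PER_YEAR
years_to_fill_80gb = 80 * GB / total_per_day / DAYS_PER_YEAR

# Video: 4 hours a day at 1.5 Mb/s (1 Mb = 1024 Kb assumed), in TB per year.
video_tb_per_year = stream_bytes(1.5 * 1024, 4 * 3600) * DAYS_PER_YEAR / TB

print(f"Daily capture: {total_per_day / MB:.1f} MB")
print(f"Years to fill 1 TB: {years_to_fill_1tb:.1f}")
print(f"Years to fill 80 GB: {years_to_fill_80gb:.1f}")
print(f"Video at 4 h/day: {video_tb_per_year:.2f} TB per year")
```

Under these assumptions the totals are roughly consistent with the slide: a little over 40 MB a day, more than 65 years per terabyte, about 5 years for an 80 GB drive, and close to 1 TB a year for the video stream. Decimal units shift the figures by a few percent.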
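  10. 10. Trying to fill a terabyte in a year <ul><li>Gordon’s lifetime collection < 30 GB (12 GB is music CDs) </li></ul><ul><li>Item: amount per TB, amount per day </li></ul><ul><ul><li>Photo (400 KB JPEG): 2.7M photos per TB, 7.3K photos per day </li></ul></ul><ul><ul><li>1 MB document: 1.0M docs per TB, 2.9K docs per day </li></ul></ul><ul><ul><li>128 kb/s audio: 18.6K hours per TB, 51 hours per day </li></ul></ul><ul><ul><li>256 kb/s video: 9.3K hours per TB, 26 hours per day </li></ul></ul><ul><ul><li>1.5 Mb/s video: 1.6K hours per TB, 4 hours per day </li></ul></ul>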
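The per-terabyte figures on this slide follow from dividing one terabyte by the size of each item, and the per-day figures from spreading that over a year. A minimal sketch, assuming binary units (1 KB = 1024 bytes, 1 kb = 1024 bits) and a 365-day year, neither of which the slide states:

```python
# Recompute the "fill a terabyte in a year" table.
# Assumption: binary units throughout; the helper names are illustrative only.
TB = 1024 ** 4
DAYS_PER_YEAR = 365


def items_per_tb(bytes_per_item):
    """How many fixed-size items fit in one terabyte."""
    return TB / bytes_per_item


def hours_per_tb(kilobits_per_second):
    """How many hours of a constant-bitrate stream fit in one terabyte."""
    bytes_per_second = kilobits_per_second * 1024 / 8
    return TB / bytes_per_second / 3600


# Amount per TB for each row of the table.
per_tb = {
    "Photo (400 KB JPEG)": items_per_tb(400 * 1024),
    "1 MB document": items_per_tb(1024 ** 2),
    "128 kb/s audio (hours)": hours_per_tb(128),
    "256 kb/s video (hours)": hours_per_tb(256),
    "1.5 Mb/s video (hours)": hours_per_tb(1.5 * 1024),
}

# Amount per day needed to use up the terabyte within one year.
per_day = {item: amount / DAYS_PER_YEAR for item, amount in per_tb.items()}

for item in per_tb:
    print(f"{item}: {per_tb[item]:,.0f} per TB, {per_day[item]:,.1f} per day")
```

With these assumptions the results land within rounding of the slide's values, which suggests the original table was also computed with binary units.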
  11. 11. “yet if the user inserted 5000 pages of material a day it would take him hundreds of years to fill the repository, so that he can be profligate and enter material freely” -Vannevar Bush, 1945
  12. 12. So you’ve got it – now what do you do with it? <ul><li>Can you find anything? </li></ul><ul><li>Can you organize that many objects? </li></ul><ul><li>Once you find it will you know what it is? </li></ul><ul><li>Once you’ve found it once, could you find it again? </li></ul>
  13. 13. <ul><li>“A record if it is to be useful … must be continuously extended, it must be stored, and above all it must be consulted” </li></ul><ul><li>“The difficulty seems to be, not so much that we publish unduly … but rather that publication has been extended far beyond our present ability to make real use of the record” </li></ul><ul><li>- Vannevar Bush </li></ul>
  14. 14. MyLifeBits Software <ul><li>Diagram labels: MyLifeBits store, database, Voice annotation tool, Telephone capture tool, TV capture tool, TV EPG download tool, Radio capture & EPG, PocketPC transfer tool, PocketRadio player, Import files, MyLifeBits Shell, Browser tool, Internet, IM capture, GPS import & Map display, SenseCam, Screen saver, Text annotation tool, MAPI interface, Legacy email client, Outlook interface, files, Legacy applications, VIBE logging </li></ul>
  15. 15. Entities & Links <ul><li>Diagram labels: Annotates; Caller in Phone Call; Photo of Event; Transcludes </li></ul>
  16. 16. MyLifeBits Schema (simplified) <ul><li>Diagram labels: Images, Music, Phone calls, Resources, Relationships, Relationship types, Entity types, Resource entities, Event types, Event log, Events, Tasks, People, Notes, Email Messages, Saved searches </li></ul>
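To make the idea of typed links between entities concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It is not the MyLifeBits schema: the table names loosely echo the slide labels, while the columns, helper functions, and sample data are all assumptions made up for illustration.

```python
import sqlite3

# Illustrative schema only: entities of some type, joined by typed relationships.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity_types (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE entities (
    id INTEGER PRIMARY KEY,
    type_id INTEGER NOT NULL REFERENCES entity_types(id),
    title TEXT NOT NULL
);
CREATE TABLE relationship_types (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE relationships (
    source_id INTEGER NOT NULL REFERENCES entities(id),
    target_id INTEGER NOT NULL REFERENCES entities(id),
    type_id INTEGER NOT NULL REFERENCES relationship_types(id),
    PRIMARY KEY (source_id, target_id, type_id)
);
""")


def type_id(table, name):
    """Return the id of a named type, creating the row if it does not exist.

    `table` is only ever one of the two fixed type-table names used below.
    """
    conn.execute(f"INSERT OR IGNORE INTO {table} (name) VALUES (?)", (name,))
    return conn.execute(f"SELECT id FROM {table} WHERE name = ?", (name,)).fetchone()[0]


def add_entity(type_name, title):
    """Insert an entity of the given type and return its id."""
    cur = conn.execute(
        "INSERT INTO entities (type_id, title) VALUES (?, ?)",
        (type_id("entity_types", type_name), title),
    )
    return cur.lastrowid


def link(source_id, relationship, target_id):
    """Record a typed link from one entity to another."""
    conn.execute(
        "INSERT INTO relationships (source_id, target_id, type_id) VALUES (?, ?, ?)",
        (source_id, target_id, type_id("relationship_types", relationship)),
    )


def sources_of(relationship, target_id):
    """Titles of all entities linked to `target_id` by the given relationship."""
    rows = conn.execute(
        """SELECT e.title FROM relationships r
           JOIN relationship_types t ON t.id = r.type_id
           JOIN entities e ON e.id = r.source_id
           WHERE t.name = ? AND r.target_id = ?
           ORDER BY e.title""",
        (relationship, target_id),
    )
    return [title for (title,) in rows]


party = add_entity("event", "Birthday party")
cake = add_entity("image", "cake.jpg")
guests = add_entity("image", "guests.jpg")
note = add_entity("note", "Blowing out the candles")

link(cake, "photo of", party)
link(guests, "photo of", party)
link(note, "annotates", cake)

print(sources_of("photo of", party))
print(sources_of("annotates", cake))
```

Keeping the relationship type in its own table means new link types can be added as data rather than as schema changes, which is one plausible reading of the "Relationship types" box on the slide.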
  17. 17. DEMO
  18. 18. Future work: new capture modes/devices <ul><li>SenseCam </li></ul><ul><li>Deja View </li></ul><ul><li>Body Media </li></ul><ul><li>Quindi </li></ul>
  19. 19. Future work: Visualizations <ul><li>Don't give me a little card image and say, "That's all you've got, because that's what I thought you should want for your virtual shoebox." There have got to be multiple modalities and the designers have to be able to deal with that. … don't metaphor me in, don't give me only one way of looking at things. </li></ul><ul><li>-Andy van Dam, Hypertext '87 Keynote Address </li></ul>Next Media U. Maryland IN-SPIRE Web Scout
  20. 20. Future work: UI <ul><li>UI Improvements </li></ul><ul><li>User studies </li></ul>
  21. 21. Future work: Content analysis & Data Mining <ul><li>Is MyLifeBits just enough rope to hang yourself with? </li></ul><ul><li>MyLifeBits must become MyPersonalAssistant </li></ul><ul><li>Content analysis and data mining </li></ul><ul><li>Doc similarity & “clean living” </li></ul><ul><li>Document meta-data extraction </li></ul><ul><li>“Creative thought and essentially repetitive thought are very different things. For the latter there are, and may be, powerful mechanical aids” – Vannevar Bush </li></ul>
  22. 22. Future work: scaling <ul><li>Just starting to hit performance problems </li></ul><ul><li>Stress tests & design modifications </li></ul>
  23. 23. www.MyLifeBits.com http://research.microsoft.com/CARPE2004
  25. 25. Everything goes in a database <ul><li>You need all the features of a database (Consistency, Indexing, Pivoting, Queries, Speed/scalability, Backup, replication) </li></ul><ul><li>If you don’t use one, you will find yourself creating one! </li></ul><ul><li>Files as blobs, also sync with file system for legacy apps </li></ul>SQL
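The "files as blobs, also sync with file system" point can be sketched as a small round trip: file contents go into a database table as blobs and are written back out to a directory that legacy applications can read. This is only an illustration using Python's `sqlite3`; the table layout and function names are hypothetical and say nothing about how MyLifeBits itself uses SQL.

```python
import sqlite3
import tempfile
from pathlib import Path


def make_store():
    """Create an in-memory store with one table holding file contents as blobs."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE files (name TEXT PRIMARY KEY, content BLOB NOT NULL)")
    return db


def store_file(db, path):
    """Copy a file's bytes into the store, replacing any earlier version."""
    db.execute(
        "INSERT OR REPLACE INTO files (name, content) VALUES (?, ?)",
        (path.name, path.read_bytes()),
    )


def sync_to_directory(db, target_dir):
    """Write every stored blob out as a plain file and return the names written."""
    target_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, content in db.execute("SELECT name, content FROM files ORDER BY name"):
        (target_dir / name).write_bytes(content)
        written.append(name)
    return written


def demo():
    """Round-trip one file through the store and back out to a directory."""
    db = make_store()
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "memo.txt"
        source.write_bytes(b"scanned memo")
        store_file(db, source)
        exported = sync_to_directory(db, Path(tmp) / "export")
        round_trip_ok = (Path(tmp) / "export" / "memo.txt").read_bytes() == b"scanned memo"
    return exported, round_trip_ok


print(demo())
```

The database stays the authoritative copy in this sketch, and the exported directory is a disposable view of it; a two-way sync would also need to detect edits made by the legacy applications.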
  26. 26. CARPE ’04: The First ACM Workshop on Continuous Archival & Retrieval of Personal Experiences, October 15th, 2004, Columbia University, New York, NY, USA
  27. 27. Dear Appy, How committed are you? Signed, Lost and Forgotten Data <ul><li>Dear Appy, </li></ul><ul><li>I'm having trouble with long-term commitment -- not on my end, heaven knows, but from the apps that created me and with whom I like to associate. Over time, these pesky apps evolve and they simply don't recognize the data that they once helped create! But, we data progeny -- and there are lots of us -- feel that as our creators, these apps should be responsible for eternal support. </li></ul><ul><li>But the little problem with recognition isn't the worst of it – sometimes the apps even disappear altogether. I ask you, is it expecting too much for 20-something year old data like me to be interpretable by my app (e.g. Acrobat, DB2, Draw, Eudora, Office, Quicken, or RealNetworks), or am I just associating with irresponsible apps? </li></ul><ul><li>If things continue on their current path, it seems I will be completely un-interpretable within 20 to 50 years! My apps will move to other platforms, or evolve to be more Internet- or Next-Big-Thing-centric... </li></ul>By Gordon Bell http://research.microsoft.com/~gbell
  28. 28. A Storocratic Oath <ul><li>Do no harm to dates (File creation, Photo taken) </li></ul><ul><li>Do no harm to device created & other meta-data. </li></ul><ul><ul><li>Camera data & location data are sacred. </li></ul></ul><ul><li>Support & aid the creation of critical meta-data. </li></ul><ul><ul><li>When/how the user feels like it </li></ul></ul><ul><ul><li>Auto-magically! </li></ul></ul><ul><li>Maintain user confidentiality </li></ul>
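As one small illustration of "do no harm to dates", the sketch below archives a file with `shutil.copy2`, which copies the contents and also attempts to preserve metadata such as the modification time. The `archive_copy` helper and the sample file are hypothetical; note that a file's creation time is not necessarily preserved by a copy on every platform, and embedded metadata such as camera data travels with the file bytes rather than with the file system.

```python
import os
import shutil
import tempfile
from pathlib import Path


def archive_copy(src, dest_dir):
    """Copy a file into dest_dir, keeping its modification time where possible."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    # copy2 copies the data and then tries to carry over file metadata (e.g. mtime).
    shutil.copy2(src, dest)
    return dest


def demo():
    """Return the modification times of an original file and its archived copy."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "photo.jpg"
        src.write_bytes(b"not really a JPEG")
        # Pretend the file was last modified in September 2001.
        os.utime(src, (1_000_000_000, 1_000_000_000))
        copy = archive_copy(src, Path(tmp) / "archive")
        return int(src.stat().st_mtime), int(copy.stat().st_mtime)


print(demo())
```

A plain `shutil.copy` or a read-then-write would stamp the copy with the current time, which is exactly the kind of harm the oath warns against.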
  29. 29. Classification wish list <ul><li>Download classifications rather than build them </li></ul><ul><li>Definitions & synonyms should help find what I want </li></ul><ul><li>Today it is too expensive to manually classify my scanned paper. E.g. “right time” meta-data is critical! </li></ul><ul><li>Next year I hope “the system” can classify papers and other documents e.g. bills </li></ul><ul><li>In 10 years I expect all documents to appear electronically & classified with a little help from me </li></ul>
  30. 30. Personal Search is not Professional or Web search <ul><li>System sees every entry & access </li></ul><ul><li>Everything, not just a professional life </li></ul><ul><li>Limited to SIS, not an infinite amount, covers a profession & personal life </li></ul><ul><li>Figure: knowledge breadth (e.g. Dewey classification) against depth (e.g. information item types & coverage), comparing the Web as seen by search engines, MyLifeBits, and a professional user </li></ul>