12. Codebase History. Projects: Lius, Nutch, Lius Lite, Tika, textmining, Jackrabbit. People: Andy Clark, Jukka Zitting, Rida Benjelloun, Chris Mattmann, Jerome Charron, Sami Siren, Bertrand Delacretaz, Keith Bennett.
14. Content Extraction. Input: a PPT file. Output: Type: application/vnd.ms-powerpoint; Title: Apache Tika; Author: Jukka Zitting. Call: new PowerPointParser().parse(…);
15. Media Type Detection. Input: an unknown document (?). Output: application/vnd.ms-powerpoint. Call: MimeTypes types = …; MimeType type = types.getMimeType(…); Type databases: tika-mimetypes.xml, /etc/magic, mime.types.
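The detection idea on this slide can be sketched in plain Java without any Tika dependency. The class below is hypothetical (MediaTypeSniffer and detect are names invented for illustration, not Tika's API). It assumes two kinds of evidence, in the spirit of /etc/magic and mime.types: leading magic bytes first, then the file name extension. The generic label returned for unrecognized OLE2 containers is a placeholder chosen for this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical helper for illustration; not part of Tika. */
final class MediaTypeSniffer {
    // OLE2 compound document signature, shared by legacy .ppt, .doc and .xls files.
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    private static final byte[] PDF = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] XML = "<?xml".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if data begins with the given magic byte sequence. */
    static boolean startsWith(byte[] data, byte[] magic) {
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    /**
     * Guesses a media type from the first bytes of a document and,
     * where the bytes are ambiguous or unknown, from the file name.
     */
    static String detect(byte[] prefix, String fileName) {
        String name = fileName == null ? "" : fileName.toLowerCase(Locale.ROOT);
        if (startsWith(prefix, PDF)) {
            return "application/pdf";
        }
        if (startsWith(prefix, XML)) {
            return "application/xml";
        }
        if (startsWith(prefix, OLE2)) {
            // The magic bytes only identify the container, so the
            // extension is used as a hint for the specific format.
            if (name.endsWith(".ppt")) {
                return "application/vnd.ms-powerpoint";
            }
            return "application/x-ole-storage"; // placeholder label for this sketch
        }
        if (name.endsWith(".txt")) {
            return "text/plain";
        }
        return "application/octet-stream";
    }
}
```

Calling MediaTypeSniffer.detect with the first bytes of a file and its name returns a type string; a real detector would load its rules from a database such as tika-mimetypes.xml rather than hard-coding them.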
16. Combined Detection and Extraction. Input: an unknown document (?) that may be PPT, TXT, PDF, or XML. Output: Type: application/vnd.ms-powerpoint; Title: Apache Tika; Author: Jukka Zitting. Call: new AutoDetectParser().parse(…);
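The detect-then-delegate pattern behind this slide can be illustrated with a small plain-Java sketch. Everything here is hypothetical (AutoDetectSketch, SimpleParser, and the "Type" and "text" keys are invented for illustration and are not Tika's classes). It assumes only two formats, XML and plain text, and uses a deliberately crude tag-stripping regex in place of a real XML parser.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of detect-then-delegate; not Tika's implementation. */
final class AutoDetectSketch {

    /** A parser turns raw bytes into a small map holding the type and text. */
    interface SimpleParser {
        Map<String, String> parse(byte[] data);
    }

    // Registry from media type to the parser that handles it.
    private final Map<String, SimpleParser> parsers = new LinkedHashMap<>();

    AutoDetectSketch() {
        // Crude XML "parser": drops anything that looks like a tag.
        parsers.put("application/xml", data -> result("application/xml",
                new String(data, StandardCharsets.UTF_8)
                        .replaceAll("<[^>]*>", " ").trim()));
        parsers.put("text/plain", data -> result("text/plain",
                new String(data, StandardCharsets.UTF_8)));
    }

    /** Detects the type from the leading bytes; defaults to plain text. */
    static String detect(byte[] data) {
        String head = new String(data, 0, Math.min(data.length, 5),
                StandardCharsets.US_ASCII);
        return head.equals("<?xml") ? "application/xml" : "text/plain";
    }

    private static Map<String, String> result(String type, String text) {
        Map<String, String> out = new LinkedHashMap<>();
        out.put("Type", type);
        out.put("text", text);
        return out;
    }

    /** Detects the media type, then hands the bytes to the matching parser. */
    Map<String, String> parse(byte[] data) {
        String type = detect(data);
        SimpleParser parser = parsers.get(type);
        if (parser == null) {
            throw new IllegalArgumentException("No parser for " + type);
        }
        return parser.parse(data);
    }
}
```

The caller never names a format: it passes bytes to parse and reads the detected type from the result, which is the convenience the single AutoDetectParser call on the slide is meant to convey.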