OpenMinTeD Project - building a TDM infrastructure

by Stelios Piperidis, Head of Department, ILSP/ARC

  1. OpenMinTeD: Building an Open Text and Data Mining Infrastructure • Stelios Piperidis • spip@ilsp.gr • Institute for Language & Speech Processing • Athena Research & Innovation Centre
  2. ● > 1.08 billion websites and 3.46 billion internet users, as of 25 September 2016.
     ● > 24 million wireless sensors and actuators worldwide (up 553% between 2011 and 2016).
     ● > 16 zettabytes of useful data (16 trillion GB) by 2020.
     ● YouTube claims that 24 hours of video are uploaded every minute, making the site a hugely significant data aggregator.
     ● “Every second, on average, around 6,000 tweets are tweeted on Twitter, which corresponds to over 350,000 tweets sent per minute, >500 million tweets per day and around 200 billion tweets per year”.
     ● 74,200,000 pages existed on Facebook, with 7 million apps and websites integrated with Facebook, as of 30 May 2016.
  3. ● The global research community generates over 1.5 million new scholarly articles per annum. (The STM Report, 2009)
     ● … some 90% of papers … are never cited. … 50% of papers are never read by anyone other than their authors, referees and journal editors …
     ● … one paper published every 30 seconds …
     ● 70,000 papers published on a single protein, the tumor suppressor p53. (The STM Report, 2009)
  4. ● process textual sources, organise and classify them in various dimensions, extract the main (indexical) information items
     ● identify and extract entities and relations between entities
     ● facilitate the transformation of unstructured textual sources into structured data
     ● enable the multidimensional analysis of structured data to extract meaningful insights and improve the ability to predict
  5. Text Types: Newswire; Scientific Literature; Tweets/blogs; Patents; Clinical/medical records; Textbooks, monographs; Online forums; …
     Languages: English; French; German; Spanish; Portuguese; Italian; Polish
     Tasks: Translation; Information Extraction; Semantic Search; Question Answering; Sentiment Analysis; Summarization; Knowledge Discovery
     Domains: Finance/Business; Health; Biology; Social Sciences; Humanities; …
  6. Establish an open and sustainable Text and Data Mining (TDM) platform and infrastructure where researchers can collaboratively create, discover, share and re-use knowledge from a wide range of text-based scientific and scholarly sources.
  7. Text Mining Researchers • Content Providers • End Users • Computing Infrastructures
  8. ● ACCESSIBLE CONTENT: via standardised programmatic interfaces and access rules.
     ● DISCOVERABLE SERVICES: well-documented, easily discoverable text mining services and workflows which process, analyse and annotate text.
     ● EFFICIENT PROCESSING: operate on public e-Infrastructures via standardised APIs.
     ● RESEARCH COMMUNITIES: different scientific communities have different challenges.
     ● VALUE ADDED APPS: community-driven applications to illustrate the value of the infrastructure. Engage with industry.
  9. From the very beginning… requirements, content, barriers, expected outcomes. … to the very end: create applications, validate and evaluate the results.
  10. • Document literature content, language/knowledge resources, data categories taxonomies, provenance information
      • Document language processing/text mining services and workflows
      • Generic and domain-specific metadata descriptions
      • Combine services into workflows
      • Combine content and language resources with services and workflows
      • Combine automatic and manual/crowdsourcing annotation services
      • Study IPR restrictions for reuse of sources as well as possible exceptions
      • Promote clarity and standardisation of legal rights and obligations
      • Translate the legal & policy aspects into specifications for lawful user-to-service and service-to-service interactions
  11. • documenting, depositing, managing, publishing and sharing scientific content and data, text and data mining software tools, services and workflows, language and knowledge resources
      • enabling, both technically and legally, the linking and pipelining of text mining tools, services and workflows, as well as language and knowledge resources
      • automatic analysis, annotation and extraction of important information from scientific content
      • composing, scheduling and orchestrating new processing workflows by combining existing text mining services and language/knowledge resources
      • services for advising on the lawful use and combination of content, language resources and text mining services
  12. 1. End users
         - Researchers, database curators, …
         - Novice: use services to advance their science
         - Advanced: combine TDM services into complex workflows
      2. Content and service providers
         - Publishers, libraries, scientific database centres, …
         - TDM researchers
         - SMEs
  13. twitter.com/openminted_eu • facebook.com/openminted • bit.do/openmintedlinkedin • vimeo.com/openminted • bit.do/openmintedplus
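
Slides 4, 10 and 11 talk about extracting entities from unstructured text and about combining text mining services into workflows. A minimal Python sketch of that idea follows. Everything in it is hypothetical and for illustration only: the `Document` type, the `split_sentences` and `tag_entities` services, the toy `GAZETTEER` and the `run_workflow` helper are assumptions and not part of the OpenMinTeD platform or its APIs.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical toy gazetteer; a real service would rely on trained models
# or curated language/knowledge resources.
GAZETTEER: Dict[str, str] = {
    "p53": "PROTEIN",
    "Twitter": "ORGANISATION",
    "YouTube": "ORGANISATION",
}


@dataclass
class Document:
    """Unstructured text plus the annotations added by each service."""
    text: str
    sentences: List[str] = field(default_factory=list)
    entities: List[Dict[str, object]] = field(default_factory=list)


# A "service" here is simply a function that annotates a document.
Service = Callable[[Document], Document]


def split_sentences(doc: Document) -> Document:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", doc.text)
    doc.sentences = [p.strip() for p in parts if p.strip()]
    return doc


def tag_entities(doc: Document) -> Document:
    """Record every gazetteer term found in each sentence as a structured row."""
    for index, sentence in enumerate(doc.sentences):
        for term, label in GAZETTEER.items():
            if re.search(r"\b" + re.escape(term) + r"\b", sentence):
                doc.entities.append({"sentence": index, "text": term, "type": label})
    return doc


def run_workflow(text: str, services: List[Service]) -> Document:
    """Apply the services in order, passing the document from one to the next."""
    doc = Document(text=text)
    for service in services:
        doc = service(doc)
    return doc


result = run_workflow(
    "Some 70,000 papers discuss p53. Many are shared on Twitter.",
    [split_sentences, tag_entities],
)
for row in result.entities:
    print(row)
```

Each service only reads and extends the shared `Document`, so services can be reordered or swapped without changing `run_workflow`. That is the property the slides point at when they mention combining existing services into new workflows.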