Alyona Medelyan (Pingar)
@zelandiya

THE NEXT-GENERATION SHAREPOINT:
POWERED BY TEXT ANALYTICS
AGENDA
• Information tasks
• Text analytics
• APIs
• Demos
• Conclusions
Information tasks
What do they cost us?
How does SharePoint help?
Avg. hours per week = $37K / year / person
[Bar chart; bar values: 14.5, 13.3, 9.6, 9.5, 8.8, 8.3, 6.8, 6.7, 5.6, 5.6, 4.3, 4.2, 1; task labels not recoverable]
Source: IDC, Hidden Cost of Information (2005)
SHAREPOINT SAVES TIME
• Interact with SP from Outlook
• Create docs collaboratively
• Customize search configuration
• Use sites, sets & libraries
• Define Managed Metadata
• Configure forms
• Design Workflow
Text Analytics
What is it and how does it work?
What tasks does it solve?
WHAT IS TEXT ANALYTICS?
                    unstructured data

Linguistics                    Search
Statistics                     Data Extraction
Text Processing                Document Organization
Machine Learning               Business Intelligence
Natural Language Processing    Opinion Mining
Text Mining
TEXT ANALYTICS SAVES MORE TIME
… automatically:
• Compose search reports
• Extract entities
• Mine opinions & sentiment
• Cluster search results
• Redact
• Summarize
• Generate metadata
• Fill databases
• Profanity check
Text Analytics Software
What companies offer text analytics?
What are open source tools like?
TEXT ANALYTICS: GLOBAL PERSPECTIVE

User adoption grew by 25% in 2010,
creating an $835 million market, because:

• Unstructured data grows (e.g. social) → Text analytics!
• Text analytics is central to effective information access
• Many successes in NLP: IBM Watson, Wolfram Alpha



                                    Full report by Seth Grimes:
                                  http://altaplana.com/TA2011
APPLICATIONS OF TEXT ANALYTICS
Search & info access              39%
Customer experience management    39%
Brand management                  39%
Research                          36%
Competitive intelligence          33%
Customer service                  26%
E-discovery                       15%
Life sciences                     15%
Product design                    15%
Online commerce                   11%
Finance                           10%
Other                              9%
Content management                 8%
Insurance & fraud                  8%
Military intelligence              7%
Law enforcement                    6%
Source: http://altaplana.com/TA2011
SEARCH & INFO ACCESS
 METADATA EXTRACTION

Document     Metadata

Easy to extract:
File type, name & location, creation & modification date, authors

Difficult to extract:
Keywords, people & companies mentioned, suppliers & addresses mentioned
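The "easy to extract" side of this slide can be sketched with the Python standard library alone. This is a minimal illustration, not part of the original talk: the function name `easy_metadata` and the returned field names are assumptions. Creation date and authors are left out on purpose, because creation time is platform-dependent and authorship is stored inside the document format rather than in the file system.

```python
import mimetypes
from datetime import datetime, timezone
from pathlib import Path


def easy_metadata(path):
    """Collect metadata that needs no text analysis (hypothetical helper)."""
    p = Path(path)
    stat = p.stat()
    # Guess the file type from the extension; fall back to the bare suffix.
    mime_type, _ = mimetypes.guess_type(p.name)
    return {
        "name": p.name,
        "location": str(p.resolve().parent),
        "file_type": mime_type or p.suffix.lstrip("."),
        "size_bytes": stat.st_size,
        "modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc),
    }
```

Keywords, people, companies and addresses cannot be read off the file system this way, which is where text analytics comes in.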
SEARCH & INFO ACCESS
KEYWORD EXTRACTION

Document     Candidates                                         Keywords



           Hi All,
           As of today, MetaStock has several new functions.
           The most important new feature is the ability to
           display forward heat rate charts.
           Also, notice that the interface looks different -- this
           reflects and accommodates the new features.
           If you have any questions regarding this new
           version of MetaStock, please contact Bella Santuri.
SEARCH & INFO ACCESS
KEYWORD EXTRACTION

Document     Candidates     Properties     Keywords

Properties: Frequency, Position, Corpus stats, Relatedness

           Hi All,
           As of today, MetaStock has several new functions.
           The most important new feature is the ability to
           display forward heat rate charts.
           Also, notice that the interface looks different -- this
           reflects and accommodates the new features.
           If you have any questions regarding this new
           version of MetaStock, please contact Bella Santuri.
SEARCH & INFO ACCESS
KEYWORD EXTRACTION

Document     Candidates     Properties     Scoring     Keywords

Scoring: Heuristic scoring, Machine learning

           Hi All,
           As of today, MetaStock has several new functions.
           The most important new feature is the ability to
           display forward heat rate charts.
           Also, notice that the interface looks different -- this
           reflects and accommodates the new features.
           If you have any questions regarding this new
           version of MetaStock, please contact Bella Santuri.
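The pipeline on these slides (candidates, properties, scoring, keywords) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the method used by any product named in the talk: the stopword list, the n-gram candidate rule and the scoring formula (frequency weighted by how early the phrase first appears) are all made up for the example. Corpus statistics and relatedness are omitted because they need a reference corpus, and a machine-learned scorer would replace the hand-written formula.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "all", "also", "and", "any", "as", "has", "have", "hi", "if",
             "is", "of", "please", "that", "the", "this", "to", "you"}

MEMO = """Hi All,
As of today, MetaStock has several new functions.
The most important new feature is the ability to display forward heat rate charts.
Also, notice that the interface looks different -- this reflects and
accommodates the new features.
If you have any questions regarding this new version of MetaStock,
please contact Bella Santuri."""


def candidates(text, max_len=3):
    """Yield (phrase, relative position) for n-grams that neither start
    nor end with a stopword. Sentence boundaries are ignored for brevity."""
    tokens = re.findall(r"[a-z]+", text.lower())
    for start in range(len(tokens)):
        for n in range(1, max_len + 1):
            gram = tokens[start:start + n]
            if len(gram) < n:
                break
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            yield " ".join(gram), start / len(tokens)


def extract_keywords(text, top_n=5):
    """Heuristic scoring: frequency times (1 - position of first occurrence)."""
    freq = Counter()
    first_pos = {}
    for phrase, pos in candidates(text):
        freq[phrase] += 1
        first_pos.setdefault(phrase, pos)
    scores = {p: freq[p] * (1.0 - first_pos[p]) for p in freq}
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_n]


keywords = extract_keywords(MEMO)
```

On the example memo, the repeated terms "new" and "metastock" rank above phrases that occur only once, which shows both the appeal and the limits of frequency-based heuristics.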
SEARCH & INFO ACCESS
NAMES EXTRACTION

Document     Examples     Properties     Learning     Names

           If you have any questions regarding this new version of
           MetaStock, please contact Bella Santuri.

Examples: Training data (annotations)
Properties: NLP, Heuristics, Text mining
Learning: Machine Learning
<SEARCH + TEXT ANALYTICS> COMPANIES




 Pingar, BasisTech, AlchemyAPI, LanguageComputer, OpenCalais, Extractiv
BRAND & CUSTOMER MANAGEMENT
   SENTIMENT ANALYSIS

Documents (Reviews, Tweets, Surveys)     Sentiment Analysis     Visualization, Summary

Naïve approach: Sentiment-words dictionary!

Negative: suck, terrible, awful
Positive: fantastic, excellent, awesome

BUT:
"If you are reading this because it is your darling fragrance, please
wear it at home exclusively, and tape the windows shut."

No sentiment words!
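The naïve dictionary approach, and the way it fails on the fragrance review, can be shown with a short Python sketch. The word lists are the six words from the slide; the function name `lexicon_sentiment` and the three-way label are assumptions made for the example.

```python
import re

NEGATIVE = {"suck", "terrible", "awful"}
POSITIVE = {"fantastic", "excellent", "awesome"}


def lexicon_sentiment(text):
    """Count dictionary hits and return 'positive', 'negative' or 'neutral'."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


review = ("If you are reading this because it is your darling fragrance, "
          "please wear it at home exclusively, and tape the windows shut.")

print(lexicon_sentiment("The new interface is awesome"))  # positive
print(lexicon_sentiment(review))                          # neutral
```

The review is clearly negative to a human reader, yet it contains no dictionary word, so the lexicon method labels it neutral.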
BRAND & CUSTOMER MANAGEMENT
   SENTIMENT ANALYSIS

Documents (Reviews, Tweets, Surveys)     Examples     Properties     Learning     Visualization, Summary

Training data (annotations)
Lexicon induction
Presence, Position, Part-of-Speech, Negation, Generalization
Machine Learning

Important:
• Identifying sentiment-bearing sentences
• Attaching sentiment to a topic!
SENTIMENT ANALYSIS COMPANIES
Attensity
AlchemyAPI
Lexalytics
Saplo
Medallia
SAS
RESEARCH
    TEXT SUMMARIZATION
          Address      Hi All,
    Announcement       As of today, MetaStock has several new functions.
           Details     The most important new feature is the ability to
                       display forward heat rate charts.
       More details    Also, notice that the interface looks different -- this
                       reflects and accommodates the new features.
         Conclusion    If you have any questions regarding this new
                       version of MetaStock, please contact Bella Santuri.

Extractive summary:   As of today, MetaStock has several new functions.
Sentence compression: MetaStock has several new functions.
                      The new interface looks different.
Abstractive summary: MetaStock has new features and a new interface.
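Of the three summary types on this slide, the extractive one is the simplest to sketch: pick the sentence whose words are most frequent in the document. The Python below is an illustration under assumptions (the stopword list, the sentence splitter and the average-frequency score are all choices made for the example), not the algorithm of any vendor listed in the talk. Sentence compression and abstractive summarization need deeper language processing and are not attempted here.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "all", "also", "and", "any", "as", "has", "have", "hi", "if",
             "is", "of", "please", "that", "the", "this", "to", "you"}

MEMO = ("Hi All, As of today, MetaStock has several new functions. "
        "The most important new feature is the ability to display forward "
        "heat rate charts. Also, notice that the interface looks different -- "
        "this reflects and accommodates the new features. If you have any "
        "questions regarding this new version of MetaStock, please contact "
        "Bella Santuri.")


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def extractive_summary(text, n=1):
    """Return the n sentences with the highest average word frequency,
    in their original order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: -score(sentences[i]))
    return [sentences[i] for i in sorted(ranked[:n])]


summary = extractive_summary(MEMO)
```

An extractive summary can only return sentences that already exist in the document, which is why the slide lists compression and abstraction as separate, harder steps.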
TEXT SUMMARIZATION COMPANIES




Lexalytics, Pingar
COMPETITIVE INTELLIGENCE:
ENTITY & ENTITY RELATION EXTRACTION




     Companies:
     OpenCalais, Extractiv, Pingar, Evri, AlchemyAPI, Zemanta
FRAUD INVESTIGATION:
NORMALIZATION OF DATES & NAMES




           Companies:
           Cicero, BasisTech
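As a rough illustration of what normalization means here, the Python sketch below maps a few date spellings to ISO 8601 and a few name spellings to "First Last". It is a simplified example under assumptions, not how the listed companies do it: the list of accepted formats is arbitrary, slashed dates are assumed to be day-first, month names are assumed to be English, and the title-casing rule mishandles names such as "McDonald".

```python
from datetime import datetime

# Accepted input formats (an illustrative, incomplete list).
FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d %B %Y", "%B %d, %Y", "%d %b %Y")


def normalize_date(raw):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    cleaned = raw.strip()
    for fmt in FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalize_name(raw):
    """Collapse whitespace, turn 'Last, First' into 'First Last', fix casing."""
    name = " ".join(raw.split())
    if "," in name:
        last, first = [part.strip() for part in name.split(",", 1)]
        name = f"{first} {last}"
    return name.title()


print(normalize_date("March 3, 2011"))      # 2011-03-03
print(normalize_name("SANTURI,  bella"))    # Bella Santuri
```

Once dates and names share one canonical form, records that refer to the same event or person can be matched across documents.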
OPEN-SOURCE TOOLS
• NLTK – Apache license, Book, Python & academic datasets, nltk.org
• LingPipe – Commercial licenses, Tutorials, Coreference & Chinese segment, alias-i.com/lingpipe
• OpenNLP – Apache license, Parsing, MaxEnt ML, incubator.apache.org/opennlp
• GATE – restricted GPL, Training courses, Applications & framework, gate.ac.uk
• Stanford NLP – full GPL, Online docs, Full library, nlp.stanford.edu
APIs
What’s an API and how does it work?
What are the advantages of the API model?
Which API is the right one for you?
API ACCESS
Developer creates an application
  • SDK
  • usage examples
  • calls via a web service
  • a call is an XML message describing the request
  • includes API authentication
API: An interface that ensures communication
  • a protocol specifies how XML needs to be encoded: SOAP, REST
ENGINE: Software engine solves a specific task
REST API ACCESS FROM A BROWSER
API request
http://search.yahooapis.com/WebSearchService/V1/webSearch?appid=YahooDemo&query=madonna&context=Italian+sculptors+and+painters+of+the+renaissance+favored+the+Virgin+Mary+for+inspiration
API response
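The request on this slide is an ordinary URL with the parameters encoded in the query string, so it can be assembled programmatically. The Python sketch below rebuilds the same URL with the standard library; the function name `build_request_url` is an assumption. The snippet deliberately does not send the request, since this demo endpoint may no longer be available.

```python
from urllib.parse import urlencode

BASE_URL = "http://search.yahooapis.com/WebSearchService/V1/webSearch"


def build_request_url(query, context, appid="YahooDemo"):
    """Encode the parameters as a query string; spaces become '+'."""
    params = {"appid": appid, "query": query, "context": context}
    return f"{BASE_URL}?{urlencode(params)}"


url = build_request_url(
    "madonna",
    "Italian sculptors and painters of the renaissance favored "
    "the Virgin Mary for inspiration",
)
print(url)
```

The same pattern applies to any REST-style text analytics API: the application key, the text and any options travel as request parameters, and the response comes back as a structured document.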
SOAP API ACCESS FROM VS2010
SOAP API ACCESS IN POWERSHELL




Read complete blog post “Bulk metadata extraction in SharePoint”:
http://bit.ly/powershell-migrate
API = EASY INTEGRATION & FLEXIBILITY
• Integrate into existing architecture
  via any programming language
• Improve known flaws in the current system/process
• Minimize adoption barriers within the company:
  little or no training required for staff
• Only pay for the features you need
• Flexible deployment:
   • Host API on site = Secure data exchange
   • Access the API in the cloud = Save on tech support & hardware
WHICH API IS BEST FOR YOU?
"I need to take some text and get a list of the
important entities/keywords/phrases."

APIs compared: Y: Term Extractor, OpenCalais, BeliefNetworks, OpenAmplify, AlchemyAPI (2nd), Evri (1st)
Criteria: API restrictions, Supported languages, Quality of results, Semantic links, Synonyms/Duplicates

Blog post on API comparison: faganm.com/blog
HOW TO CHOOSE AN API:
• Define a specific task
• Think of what features are important
• Get prepared:
  • Sign up for API keys
  • Get SDKs
  • Learn libraries
• Find representative data
• Build a test framework
• Compare results
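The last three steps (representative data, a test framework, comparing results) can be sketched as a small evaluation harness in Python. Everything here is hypothetical: the extractor functions are stand-ins for real API clients, and exact-match precision and recall against hand-labelled keywords is only one possible way to measure quality.

```python
def precision_recall(predicted, gold):
    """Exact-match precision and recall, ignoring case."""
    predicted_set = {p.lower() for p in predicted}
    gold_set = {g.lower() for g in gold}
    hits = len(predicted_set & gold_set)
    precision = hits / len(predicted_set) if predicted_set else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    return precision, recall


def compare_apis(extractors, dataset):
    """extractors: name -> callable(text) returning a list of keywords.
    dataset: list of (text, gold_keywords) pairs.
    Returns name -> (mean precision, mean recall)."""
    report = {}
    for name, extract in extractors.items():
        scores = [precision_recall(extract(text), gold) for text, gold in dataset]
        report[name] = (
            sum(p for p, _ in scores) / len(scores),
            sum(r for _, r in scores) / len(scores),
        )
    return report


# Hypothetical stand-ins for two API clients.
def api_a(text):
    return ["MetaStock", "charts"]


def api_b(text):
    return ["MetaStock", "heat rate charts"]


dataset = [("As of today, MetaStock has several new functions ...",
            ["metastock", "heat rate charts"])]
report = compare_apis({"api_a": api_a, "api_b": api_b}, dataset)
```

Running every candidate API over the same labelled documents makes the comparison repeatable and keeps it tied to the task that was defined in the first step.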
METADATA EXTRACTION
IN SHAREPOINT
Demo
Pingar’s add-on for SharePoint 2010
built using a text analytics API
INTEGRATING APIS
INTO SCANNING
Video
Using Fuji Xerox SmartConnect and Pingar API
to scan documents in batch into SharePoint



                       http://www.youtube.com/watch?v=kluVp25upag
THE NEXT-GENERATION SHAREPOINT:
POWERED BY TEXT ANALYTICS
• What can be automated?
  • Metadata extraction, Data entry, Opinion mining,
    Sanitization, Doc approval, Summarization, …

• How to integrate text analytics
  into existing SharePoint applications?
  • Easy! Via an API

• How to find the right text analytics API?
  • Review what’s available
  • Set up an experiment
  • Compare results
Thank you to all of our Sponsors

The Next-Generation SharePoint: Powered by Text Analytics

  • 1. Alyona Medelyan (Pingar) @zelandiya THE NEXT-GENERATION SHAREPOINT: POWERED BY TEXT ANALYTICS
  • 2. AGENDA • Information tasks • Text analytics • APIs • Demos • Conclusions
  • 3. Information tasks What do they cost us? How does SharePoint help?
  • 4. Avg. hours per week 14.5 13.3 = $37K year / person 9.6 9.5 8.8 8.3 6.8 6.7 5.6 5.6 4.3 4.2 1 Source: IDC, Hidden Cost of Information (2005)
  • 5. SHAREPOINT SAVES TIME  Interact with SP from Outlook  Create docs collaboratively  Customize search configuration  Use sites, sets & libraries  Define Managed Metadata  Configure forms  Design Workflow
  • 6. Text Analytics What is it and how does it work? What tasks does it solve?
  • 7. WHAT IS TEXT ANALYTICS? unstructured data Linguistics Search Statistics Data Extraction Text Processing Document Organization Machine Learning Business Intelligence Natural Language Processing Opinion Mining Text Mining
  • 8. TEXT ANALYTICS SAVES MORE TIME  Compose search reports  Extract entities … automatically  Mine opinions & sentiment  Cluster search results  Redact  Summarize  Generate metadata  Fill databases  Profanity check
  • 9. Text Analytics Software What companies offer text analytics? What are open source tools like?
  • 10. TEXT ANALYTICS: GLOBAL PERSPECTIVE User adoption grew by 25% in 2010, creating an $835 million market, because: • Unstructured data grows (e.g. social) → Text analytics! • Text analytics is central to effective information access • Many successes in NLP: IBM Watson, Wolfram Alpha. Full report by Seth Grimes: http://altaplana.com/TA2011
  • 11. APPLICATIONS OF TEXT ANALYTICS Search & info access 39%; Customer experience management 39%; Brand management 39%; Research 36%; Competitive intelligence 33%; Customer service 26%; E-discovery 15%; Life sciences 15%; Product design 15%; Online commerce 11%; Finance 10%; Other 9%; Content management 8%; Insurance & fraud 8%; Military intelligence 7%; Law enforcement 6%. Source: http://altaplana.com/TA2011
  • 12. SEARCH & INFO ACCESS: METADATA EXTRACTION Document, Metadata. Easy to extract: file type, name & location, creation & modification date, authors. Difficult to extract: keywords, people & companies mentioned, suppliers & addresses mentioned
  • 13. SEARCH & INFO ACCESS: KEYWORD EXTRACTION Stages: Document, Candidates, Keywords. Example document: "Hi All, As of today, MetaStock has several new functions. The most important new feature is the ability to display forward heat rate charts. Also, notice that the interface looks different -- this reflects and accommodates the new features. If you have any questions regarding this new version of MetaStock, please contact Bella Santuri."
  • 15. SEARCH & INFO ACCESS: KEYWORD EXTRACTION Stages: Document, Candidates, Properties, Keywords. Candidate properties: frequency, position, corpus stats, relatedness. Example document: "Hi All, As of today, MetaStock has several new functions. The most important new feature is the ability to display forward heat rate charts. Also, notice that the interface looks different -- this reflects and accommodates the new features. If you have any questions regarding this new version of MetaStock, please contact Bella Santuri."
  • 16. SEARCH & INFO ACCESS: KEYWORD EXTRACTION Stages: Document, Candidates, Properties, Scoring, Keywords. Scoring: heuristic scoring, machine learning. Example document: "Hi All, As of today, MetaStock has several new functions. The most important new feature is the ability to display forward heat rate charts. Also, notice that the interface looks different -- this reflects and accommodates the new features. If you have any questions regarding this new version of MetaStock, please contact Bella Santuri."
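
The keyword extraction stages on slides 13 to 16 (candidates, properties, scoring) can be illustrated with a minimal Python sketch. It is not the method used by Pingar or any vendor named in the deck. It uses only two of the listed properties, frequency and position, and combines them with a simple heuristic score. The stopword list, the n-gram limit and the scoring formula are all assumptions made for illustration.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption; real systems use larger lists).
STOPWORDS = {
    "a", "all", "also", "and", "any", "as", "has", "have", "hi", "if", "is",
    "most", "new", "of", "please", "that", "the", "this", "to", "you",
}

# Example document taken from the slides.
EMAIL = (
    "Hi All, As of today, MetaStock has several new functions. "
    "The most important new feature is the ability to display forward heat "
    "rate charts. Also, notice that the interface looks different -- this "
    "reflects and accommodates the new features. If you have any questions "
    "regarding this new version of MetaStock, please contact Bella Santuri."
)


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return [w.lower() for w in re.findall(r"[A-Za-z]+", text)]


def candidates(words, max_len=3):
    """Yield (phrase, start_index) for n-grams that neither start nor end with a stopword."""
    for i in range(len(words)):
        for n in range(1, max_len + 1):
            gram = words[i:i + n]
            if len(gram) < n:
                break  # ran past the end of the document
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            yield " ".join(gram), i


def extract_keywords(text, top_k=5):
    """Score candidates by frequency, weighted towards phrases that first appear early."""
    words = tokenize(text)
    freq = Counter()
    first_pos = {}
    for phrase, i in candidates(words):
        freq[phrase] += 1
        first_pos.setdefault(phrase, i)
    total = max(1, len(words))
    # Hypothetical heuristic: frequency times (1 - relative first position).
    scores = {p: freq[p] * (1.0 - first_pos[p] / total) for p in freq}
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_k]


print(extract_keywords(EMAIL))
```

A machine learning variant would replace the hand-written score with a model trained on annotated documents, using the same properties as features.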
  • 17. SEARCH & INFO ACCESS: NAMES EXTRACTION Stages: Document, Examples, Properties, Learning, Names. Techniques: training data (annotations); NLP, heuristics, text mining; machine learning. Example document: "If you have any questions regarding this new version of MetaStock, please contact Bella Santuri."
  • 18. <SEARCH + TEXT ANALYTICS> COMPANIES Pingar, BasisTech, AlchemyAPI, LanguageComputer, OpenCalais, Extractiv
  • 19. BRAND & CUSTOMER MANAGEMENT: SENTIMENT ANALYSIS Reviews, tweets, surveys (documents) → sentiment analysis → visualization, summary. Naïve approach: sentiment-words dictionary! Negative: suck, terrible, awful. Positive: fantastic, excellent, awesome. BUT: "If you are reading this because it is your darling fragrance, please wear it at home exclusively, and tape the windows shut." No sentiment words!
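
The naïve dictionary approach on slide 19, and the reason it fails, can be shown in a few lines of Python. This is a sketch only: the two word lists are the ones from the slide, while the function name, the tokenisation and the "neutral" label are assumptions made for illustration.

```python
import re

# Word lists taken from the slide.
NEGATIVE = {"suck", "terrible", "awful"}
POSITIVE = {"fantastic", "excellent", "awesome"}

# The counter-example review quoted on the slide.
REVIEW = (
    "If you are reading this because it is your darling fragrance, please "
    "wear it at home exclusively, and tape the windows shut."
)


def naive_sentiment(text):
    """Count dictionary hits and return 'positive', 'negative' or 'neutral'."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(naive_sentiment("The service was awful"))
print(naive_sentiment(REVIEW))
```

The review is clearly negative to a human reader, yet it contains none of the dictionary words, so the word-counting approach cannot detect its sentiment.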
  • 20. BRAND & CUSTOMER MANAGEMENT: SENTIMENT ANALYSIS Reviews, tweets, surveys (documents) → Examples, Properties, Learning → visualization, summary. Techniques: training data (annotations); presence, position, part-of-speech, negation; lexicon induction; generalization; machine learning. Important: identifying sentiment-bearing sentences; attaching sentiment to a topic!
  • 22. RESEARCH: TEXT SUMMARIZATION Address: "Hi All," Announcement: "As of today, MetaStock has several new functions." Details: "The most important new feature is the ability to display forward heat rate charts." More details: "Also, notice that the interface looks different -- this reflects and accommodates the new features." Conclusion: "If you have any questions regarding this new version of MetaStock, please contact Bella Santuri." Extractive summary: As of today, MetaStock has several new functions. Sentence compression: MetaStock has several new functions. The new interface looks different. Abstractive summary: MetaStock has new features and a new interface.
  • 24. COMPETITIVE INTELLIGENCE: ENTITY & ENTITY RELATION EXTRACTION Companies: OpenCalais, Extractiv, Pingar, Evri, AlchemyAPI, Zemanta
  • 25. FRAUD INVESTIGATION: NORMALIZATION OF DATES & NAMES Companies: Cicero, BasisTech
  • 26. OPEN-SOURCE TOOLS • NLTK – Apache license, Book, Python & academic datasets, nltk.org • LingPipe – Commercial licenses, Tutorials, Coreference & Chinese segment, alias-i.com/lingpipe • OpenNLP – Apache license, Parsing, MaxEnt ML, incubator.apache.org/opennlp • GATE – restricted GPL, Training courses, Applications & framework, gate.ac.uk • Stanford NLP – full GPL, Online docs, Full library, nlp.stanford.edu
  • 27. APIs What’s an API and how does it work? What are the advantages of the API model? Which API is the right one for you?
  • 28. API ACCESS Developer creates an application → API: an interface that ensures communication → ENGINE: software engine solves a specific task. Protocols: • SOAP • REST. A protocol specifies how XML needs to be encoded; a call is an XML message describing the request and includes API authentication; calls are made via a web service. SDK: usage examples
  • 29. REST API ACCESS FROM A BROWSER API request: http://search.yahooapis.com/WebSearchService/V1/webSearch?appid=YahooDemo&query=madonna&context=Italian+sculptors+and+painters+of+the+renaissance+favored+the+Virgin+Mary+for+inspiration API response
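
The REST request on slide 29 is just a base URL plus URL-encoded query parameters. The Python sketch below rebuilds that request URL with the standard library. The parameter names (`appid`, `query`, `context`) come from the slide; the function name is hypothetical, and the sketch only builds the URL rather than calling the service, since the endpoint dates from the time of the talk and may no longer respond.

```python
from urllib.parse import urlencode

# Endpoint as shown on the slide.
BASE_URL = "http://search.yahooapis.com/WebSearchService/V1/webSearch"


def build_request_url(app_id, query, context):
    """Assemble a REST request URL; urlencode turns spaces into '+'."""
    params = {"appid": app_id, "query": query, "context": context}
    return BASE_URL + "?" + urlencode(params)


url = build_request_url(
    "YahooDemo",
    "madonna",
    "Italian sculptors and painters of the renaissance favored the "
    "Virgin Mary for inspiration",
)
print(url)
```

The same pattern applies to any REST-style text analytics API: the API key and the text to analyse travel as request parameters, and the response is parsed by the calling application.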
  • 30. SOAP API ACCESS FROM VS2010
  • 31. SOAP API ACCESS IN POWERSHELL Read complete blog post “Bulk metadata extraction in SharePoint”: http://bit.ly/powershell-migrate
  • 32. API = EASY INTEGRATION & FLEXIBILITY • Integrate into existing architecture via any programming language • Improve known flaws in the current system/process • Minimize adoption barriers within the company: little or no training required for staff • Only pay for the features you need • Flexible deployment: • Host API on site = Secure data exchange • Access the API in the cloud = Save on tech support & hardware
  • 33. WHICH API IS BEST FOR YOU? "I need to take some text and get a list of the important entities/keywords/phrases." APIs compared: Y: Term Extractor, OpenCalais, BeliefNetworks, OpenAmplify, AlchemyAPI (2nd), Evri (1st). Criteria: API restrictions, supported languages, quality of results, semantic links, synonyms/duplicates. Blog post on API comparison: faganm.com/blog
  • 34. HOW TO CHOOSE AN API: • Define a specific task • Think of what features are important • Get prepared: • Subscribe for API keys • Get SDKs • Learn libraries • Find representative data • Build a test framework • Compare results
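
The last two steps on slide 34, building a test framework and comparing results, can be sketched as follows. The idea is to score each API's keyword output against a hand-made gold list for representative documents. The metric (precision, recall, F1 on exact lowercase matches) and all names here are assumptions for illustration; the slides do not prescribe a particular evaluation method.

```python
def precision_recall_f1(predicted, gold):
    """Compare predicted keywords with gold keywords (case-insensitive exact match)."""
    pred = {p.lower() for p in predicted}
    gold_set = {g.lower() for g in gold}
    hits = len(pred & gold_set)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def rank_apis(outputs, gold):
    """Rank APIs by F1; `outputs` maps a (hypothetical) API name to its keyword list."""
    scored = {name: precision_recall_f1(kws, gold)[2] for name, kws in outputs.items()}
    return sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical gold keywords and API outputs for the MetaStock email.
gold = ["MetaStock", "heat rate charts", "interface"]
outputs = {
    "api_a": ["metastock", "interface", "today"],
    "api_b": ["questions"],
}
print(rank_apis(outputs, gold))
```

In practice the outputs would come from real API calls over a set of representative documents, and the scores would be averaged per API before comparing.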
  • 35. METADATA EXTRACTION IN SHAREPOINT Demo Pingar’s add-on for SharePoint 2010 built using a text analytics API
  • 36. INTEGRATING APIS INTO SCANNING Video Using Fuji Xerox SmartConnect and Pingar API to scan documents in batch into SharePoint http://www.youtube.com/watch?v=kluVp25upag
  • 38. THE NEXT-GENERATION SHAREPOINT: POWERED BY TEXT ANALYTICS • What can be automated? • Metadata extraction, Data entry, Opinion mining, Sanitization, Doc approval, Summarization, … • How to integrate text analytics into existing SharePoint applications? • Easy! Via an API • How to find the right text analytics API? • Review what’s available • Set up an experiment • Compare results
  • 39. Thank you to all of our Sponsors

Editor's notes

  1. What are your primary applications where text comes into play?