Introducing Priberam Labs:
        Machine Learning and Natural Language Processing


                                      André Martins




                              IST, Lisbon, November 22nd, 2012




André Martins (Priberam/IT)          Introducing Priberam Labs          IST 22/11/2012
Collaborators




      Mário Figueiredo, Noah Smith, Pedro Aguiar, Eric Xing, Miguel Almeida.

Outline


1   Introduction
       What is Priberam?
       What are the Priberam Labs?


2   Research at Priberam Labs


3   Master’s Projects


4   Academia Partnerships




What is Priberam?

      A spin-off from IST founded in 1989
      R&D in the area of language technologies
      Microsoft gold certified partner, PME Líder, PME Inovadora COTEC
      Some of our clients: [logos not reproduced]




Online Dictionary




      (http://www.priberam.pt/dlpo — 1M page-views per day)



Grammar Checker




      (http://www.flip.pt)



Legal Search




      (http://www.legix.pt)



Newswire Search
      [Screenshot labels: question, answer]




      (http://www.dn.pt, http://www.jn.pt, http://www.tsf.pt)

What are the Priberam Labs?




Every day we deal with challenging and stimulating problems, some of
them unanswered by current scientific knowledge
Our key areas: Natural Language Processing and Machine Learning
Our goals:
      advance the state of the art in NLP and ML
      incorporate the resulting innovations in new products
      promote collaborations with other researchers in academia
Our Research Interests




      Natural Language Processing
      Machine Learning
      Structured Prediction
      Graphical Models




Natural Language Processing



Goal: make machines capable of “understanding” human language.

                                                  Information Retrieval
                                                  Machine Translation
                                                  Syntactic Parsing
                                                  Semantic Parsing
                                                  Speech Recognition
                                                  ...




The Empirical “Revolution” in NLP

Until the 1980s: rule-based methods were prevalent in AI
Since the mid 1990s: statistical methods, corpus linguistics
Today: emphasis on machine learning and large-scale data processing
      “The unreasonable effectiveness of data”, Halevy et al. 2009




Example: Spam Detector

      [Figures: spam detector example, not reproduced]

Machine Learning

Goal: build systems that learn from the data.

      Input set X and output set Y
      Learn a classifier h : X → Y from a set of labeled examples
      $\{(x_i, y_i)\}_{i=1}^{N} \subseteq X \times Y$
      Given an unseen example x ∈ X, predict y = h(x)
      Many approaches: decision trees, neural networks, nearest neighbors,
      naive Bayes, logistic regression, support vector machines, ...
      Many learning formalisms: supervised, unsupervised, semi-supervised,
      weakly-supervised, active, online, reinforcement, ...
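
As a concrete instance of learning a classifier h : X → Y from labeled examples, here is a minimal sketch of a naive Bayes spam detector. The training messages, the labels "spam" and "ham", and the function names are invented for illustration; the slides do not say which classifier the spam example used.

```python
import math
from collections import Counter

def train(examples):
    """Fit a multinomial naive Bayes model from (text, label) pairs."""
    label_counts = Counter()
    word_counts = {}  # label -> Counter of word frequencies
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        words = text.lower().split()
        word_counts.setdefault(label, Counter()).update(words)
        vocab.update(words)
    return label_counts, word_counts, vocab

def predict(model, text):
    """Return the label y = h(x) with the highest posterior log-score."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                # add-one smoothing avoids zero probabilities
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy labeled examples {(x_i, y_i)}; invented for illustration only.
data = [
    ("win money now", "spam"),
    ("cheap pills win prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
model = train(data)
print(predict(model, "win a cheap prize"))
print(predict(model, "monday meeting"))
```

Here X is the set of messages, Y = {spam, ham}, and `predict` plays the role of h on unseen inputs.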


Mitchell (1997); Manning and Schütze (1999); Schölkopf and Smola (2002); Bishop (2006)




Structured Prediction


Language is structured, complex, and ambiguous.
The input set X is typically structured (a string, an acoustic signal, etc.)
Often: the output set Y is also structured (a string, a parse tree, etc.)
Some problems:
      How to decode structured outputs?
      How to learn models for structured prediction?
      How to learn the structure itself?

Lafferty et al. (2001); Taskar et al. (2003); Altun et al. (2003); Tsochantaridis et al. (2004)




Example: Part-of-Speech Tagging

Goal: given a sentence, determine the part-of-speech tag of each word.

      Rule-based systems (Brill, 1993)
      Hidden Markov models (Brants, 2000)
      Conditional random fields (Lafferty et al., 2001)

Some words are ambiguous in isolation:

      Candidates:   Noun   Noun? / Verb?   Prep? / Verb?   Det   Noun
      Sentence:     Time   flies           like            an    arrow

Tagging in context resolves the ambiguity:

      Tags:         Noun   Verb    Prep   Det   Noun
      Sentence:     Time   flies   like   an    arrow
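
The hidden Markov model approach can be sketched with Viterbi decoding over the example sentence. The start, transition, and emission tables below are hand-set toy values chosen for this illustration; they are assumptions, not estimates from a corpus and not the model of Brants (2000).

```python
TAGS = ["Noun", "Verb", "Prep", "Det"]

# Toy HMM parameters (assumed for illustration).
START = {"Noun": 0.6, "Verb": 0.2, "Prep": 0.1, "Det": 0.1}
TRANS = {
    "Noun": {"Noun": 0.2, "Verb": 0.5, "Prep": 0.2, "Det": 0.1},
    "Verb": {"Noun": 0.2, "Verb": 0.1, "Prep": 0.4, "Det": 0.3},
    "Prep": {"Noun": 0.3, "Verb": 0.1, "Prep": 0.1, "Det": 0.5},
    "Det": {"Noun": 0.8, "Verb": 0.1, "Prep": 0.05, "Det": 0.05},
}
EMIT = {
    "Noun": {"time": 0.4, "flies": 0.3, "arrow": 0.3},
    "Verb": {"time": 0.1, "flies": 0.5, "like": 0.4},
    "Prep": {"like": 1.0},
    "Det": {"an": 1.0},
}

def viterbi(words, tags, start, trans, emit):
    """Return the most probable tag sequence under a first-order HMM."""
    # score[t][tag]: probability of the best path that ends in `tag` at position t
    score = [{tag: start[tag] * emit[tag].get(words[0], 0.0) for tag in tags}]
    back = [{}]
    for t in range(1, len(words)):
        score.append({})
        back.append({})
        for tag in tags:
            prev = max(tags, key=lambda p: score[t - 1][p] * trans[p][tag])
            score[t][tag] = (score[t - 1][prev] * trans[prev][tag]
                             * emit[tag].get(words[t], 0.0))
            back[t][tag] = prev
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda tag: score[-1][tag])]
    for t in range(len(words) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

words = "time flies like an arrow".split()
best = viterbi(words, TAGS, START, TRANS, EMIT)
print(list(zip(words, best)))
```

Because each tag depends only on its predecessor, the search over all tag sequences reduces to one pass over the sentence with a table of size (number of words) × (number of tags).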




Graphical Models




      Inspired by statistical mechanics (Ising, 1925; Potts, 1952)
      Applications in coding theory, vision, computational biology, ...
      (Tanner, 1981; Pearl, 1988; Kschischang et al., 2001; Koller and Friedman, 2009)

MAP Inference: obtain the most likely configuration.

      Graphs without cycles: dynamic programming (Viterbi, 1967)
      In general NP-hard!
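
To make the cost of the general case concrete, here is a minimal sketch of exhaustive MAP inference on a tiny pairwise model whose graph is a cycle. The potentials are made-up log-scores and the function name is hypothetical. The loop visits every joint assignment, so its running time grows exponentially with the number of variables.

```python
from itertools import product

def map_brute_force(n_vars, n_states, unary, pairwise):
    """Exhaustive MAP: score every joint assignment and keep the best one."""
    best_assignment, best_score = None, float("-inf")
    for assignment in product(range(n_states), repeat=n_vars):
        score = sum(unary[i][assignment[i]] for i in range(n_vars))
        score += sum(table[assignment[i]][assignment[j]]
                     for (i, j), table in pairwise.items())
        if score > best_score:
            best_assignment, best_score = assignment, score
    return best_assignment, best_score

# Three binary variables on the cycle 0-1-2-0 (toy log-potentials).
unary = [[0.0, 1.0], [0.0, 0.5], [0.2, 0.0]]
agree = [[1.0, 0.0], [0.0, 1.0]]  # rewards neighbours that take the same state
pairwise = {(0, 1): agree, (1, 2): agree, (0, 2): agree}

assignment, score = map_brute_force(3, 2, unary, pairwise)
print(assignment, score)
```

With n binary variables the loop runs 2^n times, which is why exact enumeration is only feasible for very small graphs.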


AD³ Algorithm (Martins et al., 2010a, 2011a)

“Alternating Directions Dual Decomposition.”

      An approximate MAP inference algorithm based on an LP relaxation
      Fundamental idea: decompose the graph into parts; at each iteration t,
      solve local subproblems and promote a consensus on the overlaps
      Convergence rate O(1/t)
      Can tackle combinatorial parts and first-order logic constraints
      Code available at: http://www.ark.cs.cmu.edu/AD3
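
For orientation, the standard LP relaxation of MAP inference can be written as follows for a pairwise model. The notation is introduced here and is not taken from the slides: V and E are the nodes and edges of the graph, θ are the log-potentials, and μ are relaxed indicator variables (pseudo-marginals). The slides do not show the exact formulation AD³ uses, so this is a sketch of the usual pairwise case only.

```latex
\begin{align*}
\max_{\mu \ge 0} \quad
  & \sum_{i \in V} \sum_{x_i} \theta_i(x_i)\, \mu_i(x_i)
    + \sum_{(i,j) \in E} \sum_{x_i, x_j} \theta_{ij}(x_i, x_j)\, \mu_{ij}(x_i, x_j) \\
\text{s.t.} \quad
  & \sum_{x_i} \mu_i(x_i) = 1 && \forall i \in V, \\
  & \sum_{x_j} \mu_{ij}(x_i, x_j) = \mu_i(x_i) && \forall (i,j) \in E,\ \forall x_i, \\
  & \sum_{x_i} \mu_{ij}(x_i, x_j) = \mu_j(x_j) && \forall (i,j) \in E,\ \forall x_j.
\end{align*}
```

Restricting μ to 0/1 values recovers exact MAP inference. Allowing fractional values gives a linear program whose marginalization constraints are the agreement on overlaps that a decomposition method has to enforce.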


Graphs are Everywhere

      [Figure panels: WWW graph, Facebook graph, Protein folding, Image Segmentation]

Syntactic Parsing
(Chomsky, 1965; Magerman, 1995; Charniak, 1996; Collins, 1999; Klein and Manning, 2003)

      She solved the problem with the statistical method.

Grammar:
      S --> NP VP
      NP --> Pro
      NP --> Det N
      NP --> Det Nbar
      Nbar --> Adj N
      VP --> V NP PP
      PP --> P NP
      Det --> the
      Pro --> She
      N --> problem
      N --> method
      V --> solved
      P --> with
      Adj --> statistical

Parse tree:
      (S (NP (Pro She))
         (VP (V solved)
             (NP (Det the) (N problem))
             (PP (P with)
                 (NP (Det the)
                     (Nbar (Adj statistical) (N method))))))



Syntactic Ambiguity
  1 She employed the statistical method (the PP attaches to the verb):

      (S (NP She)
         (VP (V solved)
             (NP the problem)
             (PP with the statistical method)))

  2 The statistical method was broken (the PP attaches to the noun phrase):

      (S (NP She)
         (VP (V solved)
             (NP (NP the problem)
                 (PP with the statistical method))))
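
The ambiguity can be checked mechanically by counting parse trees. The sketch below encodes the grammar from the Syntactic Parsing slide and counts derivations with a memoized span-based recursion. The two extra rules, VP --> V NP and NP --> NP PP, are assumptions read off the second tree above; they are not listed in the slide grammar. The function and variable names are hypothetical.

```python
from functools import lru_cache

# Grammar from the Syntactic Parsing slide: left-hand side -> right-hand sides.
SLIDE_RULES = {
    "S": [("NP", "VP")],
    "NP": [("Pro",), ("Det", "N"), ("Det", "Nbar")],
    "Nbar": [("Adj", "N")],
    "VP": [("V", "NP", "PP")],
    "PP": [("P", "NP")],
    "Det": [("the",)],
    "Pro": [("She",)],
    "N": [("problem",), ("method",)],
    "V": [("solved",)],
    "P": [("with",)],
    "Adj": [("statistical",)],
}

# Assumed extension that licenses attaching the PP to the noun phrase.
AMBIGUOUS_RULES = dict(SLIDE_RULES)
AMBIGUOUS_RULES["VP"] = SLIDE_RULES["VP"] + [("V", "NP")]
AMBIGUOUS_RULES["NP"] = SLIDE_RULES["NP"] + [("NP", "PP")]

def count_parses(rules, words, root="S"):
    """Count the distinct parse trees of `words` rooted at `root`."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def symbol(sym, i, j):
        # Number of ways `sym` can derive words[i:j].
        if sym not in rules:  # terminal symbol
            return int(j == i + 1 and words[i] == sym)
        return sum(sequence(rhs, i, j) for rhs in rules[sym])

    @lru_cache(maxsize=None)
    def sequence(rhs, i, j):
        # Number of ways the symbol sequence `rhs` can derive words[i:j].
        if len(rhs) == 1:
            return symbol(rhs[0], i, j)
        # The first symbol covers words[i:k]; each remaining symbol needs a word.
        return sum(symbol(rhs[0], i, k) * sequence(rhs[1:], k, j)
                   for k in range(i + 1, j - len(rhs) + 2))

    return symbol(root, 0, len(words))

SENTENCE = "She solved the problem with the statistical method".split()
print(count_parses(SLIDE_RULES, SENTENCE))
print(count_parses(AMBIGUOUS_RULES, SENTENCE))
```

Every multi-symbol rule splits its span into strictly smaller pieces, so the left-recursive rule NP --> NP PP does not cause infinite recursion.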

Dependency Syntax
(Pāṇini, 4th century BCE; Tesnière 1959; Hudson 1984; Mel’čuk 1988; Eisner 1996; McDonald
et al. 2005; Nivre et al. 2006; Koo et al. 2007)




       *       She            solved   the   problem        with    the   statistical   method



Tree obtained by “lexicalizing” the previous phrase-structure tree.
      A lightweight syntactic formalism, without phrases
      Grammatical functions represented as lexical relationships



Turbo Parser (Martins et al., 2009, 2010b, 2011b)


   A multi-lingual statistical dependency parser,
   which formulates parsing as inference in a
   graphical model.


      Ignores global effects caused by the cycles of the graph
      Same idea that underlies turbo decoders (Berrou et al., 1993)
      Uses AD³ for solving the relaxation
      State-of-the-art accuracies, extremely fast (1,200 words per second)
      Code available at: http://www.ark.cs.cmu.edu/TurboParser




Ongoing Project: Summarization
Given a set of documents about an event, generate a brief summary.




Extractive Summarization
Just extract the most salient sentences.
      Reward relevance and coverage, penalize redundancy
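
One simple way to trade these three criteria off is a greedy selection. This is an illustrative heuristic, not the summarizer described in the talk. Relevance is approximated by how many sentences a word occurs in, coverage by the not-yet-covered words a sentence adds, and redundancy by the already-covered words it repeats. The example sentences are invented.

```python
def gain(bag, covered, freq, redundancy_weight):
    """Score a sentence: frequent new words count for it, repeated words against it."""
    new = sum(freq[w] for w in bag - covered)        # relevance and coverage
    repeated = sum(freq[w] for w in bag & covered)   # redundancy
    return new - redundancy_weight * repeated

def summarize(sentences, budget, redundancy_weight=1.0):
    """Greedily select up to `budget` sentences, returned in document order."""
    bags = [set(s.lower().split()) for s in sentences]
    freq = {}  # word -> number of sentences that contain it
    for bag in bags:
        for word in bag:
            freq[word] = freq.get(word, 0) + 1
    chosen, covered = [], set()
    while len(chosen) < min(budget, len(sentences)):
        candidates = [i for i in range(len(sentences)) if i not in chosen]
        best = max(candidates,
                   key=lambda i: gain(bags[i], covered, freq, redundancy_weight))
        chosen.append(best)
        covered |= bags[best]
    return [sentences[i] for i in sorted(chosen)]

docs = [
    "the storm hit the coast on monday",
    "the storm hit the coast early on monday",
    "rescue teams reached the flooded villages",
]
summary = summarize(docs, budget=2)
print(summary)
```

The two near-duplicate sentences compete for the same words, so once one is chosen the other scores poorly and the third sentence is preferred.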




Compressive Summarization
Jointly extract and compress sentences.
      Trade-off between informativeness, length, and grammaticality




Released Software


      A multilingual part-of-speech tagger (TurboTagger)
      A multilingual dependency parser (TurboParser)
      An algorithm for approximate inference in graphical models (AD³)


                     http://www.ark.cs.cmu.edu/TurboParser
                     http://www.ark.cs.cmu.edu/AD3


Master’s Projects




      Opinion Mining in Newspapers and Blogs
      Text-Driven Forecasting
      Recommendation Systems
      Weakly Supervised Sentiment Analysis




Opinion Mining in Newspapers and Blogs




Build a system that extracts “opinions” from text in natural language.
      Examples: opinions of politicians about controversial topics, user
      reviews about products, opinions expressed in blogs and Twitter, etc.
      Goal: a computer program that extracts opinions and identifies the
      opinion holder, the aspect the opinion is about, and the
      opinion polarity (positive or negative sentiment)
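
The target output of such a system can be pictured as a small record with three fields. The sketch below is only an illustration: the class, the word lists, and the example snippet are all invented. Holder and aspect are filled in by hand, and only the polarity is computed, using a tiny hand-made lexicon.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Opinion:
    holder: str    # who expresses the opinion
    aspect: str    # what the opinion is about
    polarity: str  # "positive" or "negative"

# Tiny hand-made polarity lexicon; a real system would learn this from data.
POSITIVE = {"great", "excellent", "love"}
NEGATIVE = {"poor", "terrible", "hate"}

def polarity(text):
    """Label text by counting positive and negative lexicon words."""
    words = text.lower().replace(".", " ").replace(",", " ").split()
    balance = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    # Ties (including text with no lexicon words) default to "positive" here.
    return "positive" if balance >= 0 else "negative"

snippet = "The battery life is terrible."
opinion = Opinion(holder="reviewer", aspect="battery life",
                  polarity=polarity(snippet))
print(opinion)
```

Identifying the holder and the aspect automatically is the harder part of the project and is not attempted in this sketch.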


Example: Google Products

      [Screenshot labels: aspects, opinion snippets]

Text-Driven Forecasting

       Example: a movie by a famous director has
       premiered. Can we predict its gross revenue
       given opinionated text?

              “[...] a masterpiece in sheer
              awfulness.” — Rotten Tomatoes


      Goal: develop ML algorithms for predicting numeric quantities about
      an event given a body of text.
      Possible applications: predicting the revenue of movies, opinion
      polls from blogs, stock volatility from financial reports, the number of
      external links given a news article, etc.
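
A minimal baseline for this kind of task is linear regression on bag-of-words counts. The sketch below trains one with stochastic gradient descent. The review texts and revenue figures are invented, and the slides do not prescribe this model; it only illustrates mapping text to a numeric quantity.

```python
def featurize(text, vocab):
    """Bag-of-words count vector over a fixed vocabulary."""
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def fit_linear(X, y, epochs=2000, lr=0.05):
    """Least-squares linear regression trained by stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            err = b + sum(wi * xi for wi, xi in zip(w, x)) - target
            b -= lr * err
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w, b

def predict(text, vocab, w, b):
    """Predicted numeric quantity for a new piece of text."""
    return b + sum(wi * xi for wi, xi in zip(w, featurize(text, vocab)))

# Invented (review text, revenue in arbitrary units) pairs, for illustration only.
train = [
    ("a brilliant masterpiece", 9.0),
    ("brilliant and moving", 8.0),
    ("sheer awfulness", 2.0),
    ("dull and awful", 1.0),
]
vocab = sorted({word for text, _ in train for word in text.split()})
X = [featurize(text, vocab) for text, _ in train]
y = [revenue for _, revenue in train]
w, b = fit_linear(X, y)

high = predict("a brilliant film", vocab, w, b)
low = predict("dull and awful film", vocab, w, b)
print(high, low)
```

Words outside the training vocabulary (such as "film") contribute nothing, which is one reason realistic systems need far more text and richer features.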


Recommendation Systems
In many applications (e.g. movie rental systems) users assign ratings to
products according to their taste (from to          )
These ratings can be seen as entries in a matrix (of        N users by M movies)
                                                           
                               ? ? ...
                    ?             ? ...                    
                                                           
                   
                              ? ? ...                      
                                                            
                        .
                         .     . . ..
                               . .              .
                                                .           
                        .     . .       .      .           
                                   ?     ? ...




Andr´ Martins (Priberam/IT)
    e                           Introducing Priberam Labs        IST 22/11/2012   41 / 56
Recommendation Systems
In many applications (e.g. movie rental systems) users assign ratings to
products according to their taste (from to          )
These ratings can be seen as entries in a matrix (of        N users by M movies)
                                                           
                               ? ? ...
                    ?             ? ...                    
                                                           
                   
                              ? ? ...                      
                                                            
                        .
                         .     . . ..
                               . .              .
                                                .           
                        .     . .       .      .           
                                   ?     ? ...

Goal: fill the blanks (matrix completion).




Andr´ Martins (Priberam/IT)
    e                           Introducing Priberam Labs        IST 22/11/2012   41 / 56
Recommendation Systems
In many applications (e.g. movie rental systems) users assign ratings to
products according to their taste (from to          )
These ratings can be seen as entries in a matrix (of        N users by M movies)
                                                           
                               ? ? ...
                    ?             ? ...                    
                                                           
                   
                              ? ? ...                      
                                                            
                        .
                         .     . . ..
                               . .              .
                                                .           
                        .     . .       .      .           
                                   ?     ? ...

Goal: fill the blanks (matrix completion).
      Predict the rating that the ith user will assign to the jth movie based
      on similar user/movie profiles: collaborative filtering



Andr´ Martins (Priberam/IT)
    e                           Introducing Priberam Labs        IST 22/11/2012   41 / 56
Recommendation Systems
In many applications (e.g. movie rental systems), users assign ratings to
products according to their taste (on a graded scale, from worst to best).
These ratings can be seen as entries in a matrix (of N users by M movies).
Schematically, with r_{ij} a known rating and "?" a missing one:

$$
\begin{pmatrix}
r_{11} & ?      & ?      & \cdots \\
?      & r_{22} & ?      & \cdots \\
r_{31} & ?      & ?      & \cdots \\
\vdots & \vdots & \vdots & \ddots \\
r_{N1} & ?      & r_{N3} & \cdots
\end{pmatrix}
$$

Goal: fill the blanks (matrix completion).
      Predict the rating that the ith user will assign to the jth movie based
      on similar user/movie profiles: collaborative filtering
      Recommend new movies to unseen users
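
A minimal sketch of neighborhood-based collaborative filtering follows. It assumes a sparse dictionary stands in for the ratings matrix, and the users, movies and ratings are made up. It uses one common similarity choice (cosine on mean-centered ratings), which is not necessarily what any production system uses.

```python
import math

# ratings[user][movie] = rating; a missing key is a "?" cell. Toy, made-up data.
RATINGS = {
    "ana":   {"m1": 5, "m2": 4, "m3": 1, "m4": 2},
    "carla": {"m1": 1, "m2": 2, "m3": 5, "m4": 4},
    "bruno": {"m1": 5, "m2": 5, "m4": 1},  # m3 is unknown
}

def mean(user_ratings):
    return sum(user_ratings.values()) / len(user_ratings)

def similarity(ratings, a, b):
    """Cosine similarity of mean-centered ratings over co-rated movies."""
    common = set(ratings[a]) & set(ratings[b])
    if not common:
        return 0.0
    mu_a, mu_b = mean(ratings[a]), mean(ratings[b])
    dot = sum((ratings[a][m] - mu_a) * (ratings[b][m] - mu_b) for m in common)
    norm_a = math.sqrt(sum((ratings[a][m] - mu_a) ** 2 for m in common))
    norm_b = math.sqrt(sum((ratings[b][m] - mu_b) ** 2 for m in common))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def predict_rating(ratings, user, movie):
    """User's mean plus the similarity-weighted deviations of other users."""
    mu = mean(ratings[user])
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or movie not in theirs:
            continue
        sim = similarity(ratings, user, other)
        num += sim * (theirs[movie] - mean(theirs))
        den += abs(sim)
    return mu + num / den if den else mu
```

Here `predict_rating(RATINGS, "bruno", "m3")` comes out below bruno's own average, because bruno agrees with ana (who disliked m3) and disagrees with carla (who liked it).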

Recommendation Systems


Netflix Prize: $1M for whoever improves Netflix’s Cinematch® by more than 10%
      Winner: BellKor’s Pragmatic Chaos, 21/9/2009
Data: some entries of the user/movie matrix (training and test splits)
Evaluation metric: root mean squared error (RMSE)
Some possible approaches:
      k-nearest neighbors (for some similarity metric)
      probabilistic models with latent variables
      low-rank matrix factorization
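
The last approach and the RMSE metric can be sketched together. This is a toy illustration, not the winning Netflix Prize method: the observed entries are made up, the function names are hypothetical, and the factors are fit with plain stochastic gradient descent.

```python
import math
import random

# Observed entries as (user_index, movie_index, rating); toy, made-up data.
OBSERVED = [
    (0, 0, 5), (0, 1, 4), (0, 2, 1), (0, 3, 2),
    (1, 0, 1), (1, 1, 2), (1, 2, 5), (1, 3, 4),
    (2, 0, 5), (2, 1, 5), (2, 3, 1),
    (3, 0, 2), (3, 2, 4), (3, 3, 5),
]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def rmse(entries, P, Q):
    """Root mean squared error of the predictions P[u] . Q[m] on the given entries."""
    squared = sum((r - dot(P[u], Q[m])) ** 2 for u, m, r in entries)
    return math.sqrt(squared / len(entries))

def init_factors(n_users, n_movies, rank, seed=0):
    """Random user factors P and movie factors Q, each of width `rank`."""
    rng = random.Random(seed)
    P = [[rng.random() for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.random() for _ in range(rank)] for _ in range(n_movies)]
    return P, Q

def fit(entries, P, Q, epochs=1000, lr=0.01, reg=0.02):
    """Update P and Q in place by SGD on regularized squared error."""
    for _ in range(epochs):
        for u, m, r in entries:
            err = r - dot(P[u], Q[m])
            for k in range(len(P[u])):
                pu, qm = P[u][k], Q[m][k]
                P[u][k] += lr * (err * qm - reg * pu)
                Q[m][k] += lr * (err * pu - reg * qm)
    return P, Q
```

After `P, Q = init_factors(4, 4, rank=2)` and `fit(OBSERVED, P, Q)`, a missing cell such as user 2, movie 2 is predicted by `dot(P[2], Q[2])`. The RMSE here is measured on the training entries only; a fair evaluation would use a held-out test split, as in the competition.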




Weakly Supervised Sentiment Analysis


Classify a product review as positive or negative.
      “This camera takes poor quality photos. Yes, it’s slim and
      lightweight. Yes, the shutter speed is snappy. But the photos are
      of such poor quality that it’s a pretty useless camera.”

      — Amazon.com

Data: a set of reviews along with product ratings.
Goal: an algorithm that, given a new product review as input, predicts
its polarity (positive or negative).
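
As a rough sketch of this setup, the product ratings can serve as labels for a simple bag-of-words perceptron. The star thresholds, the toy reviews and the function names below are assumptions made for illustration only.

```python
import re
from collections import Counter

# Toy, made-up (review text, star rating) pairs.
REVIEWS = [
    ("great photos and snappy shutter", 5),
    ("poor quality photos useless camera", 1),
    ("slim lightweight and great", 4),
    ("useless poor battery", 2),
    ("average camera", 3),
]

def label_from_rating(stars):
    """Assumed convention: 4-5 stars is positive, 1-2 negative, 3 is skipped."""
    if stars >= 4:
        return 1
    if stars <= 2:
        return -1
    return None

def features(text):
    """Bag-of-words counts over lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def score(weights, feats):
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())

def train_perceptron(reviews, epochs=10):
    """Classic perceptron: adjust the weights only on misclassified reviews."""
    weights = {}
    for _ in range(epochs):
        for text, stars in reviews:
            y = label_from_rating(stars)
            if y is None:
                continue
            feats = features(text)
            if y * score(weights, feats) <= 0:
                for f, v in feats.items():
                    weights[f] = weights.get(f, 0.0) + y * v
    return weights

def classify(weights, text):
    """Return +1 (positive) or -1 (negative)."""
    return 1 if score(weights, features(text)) > 0 else -1
```

With weights from `train_perceptron(REVIEWS)`, `classify` labels the camera review quoted above as negative, although a handful of toy examples says little about real accuracy.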




Weakly Supervised Sentiment Analysis


Consider a scenario with weak supervision: domain adaptation,
semi-supervised learning, language transfer, etc.
Possible tasks:
      Classify movie reviews with a system trained on cellphone reviews
      Train a system on English data and use it for reviews in Portuguese
What are the relevant features?
      Adjectives? (not always helpful...)
      Connective words: but, however, although,...
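
One hypothetical way to expose connectives to a classifier is to mark the words that follow a contrastive connective within a sentence, so they can receive their own weights. The connective set and the `conn:` / `post:` feature prefixes below are assumptions made for this sketch.

```python
import re
from collections import Counter

# Assumed set: the clause after these connectives often carries the final verdict.
CONTRASTIVE = {"but", "however"}

def connective_features(text):
    """Bag of words in which tokens after a contrastive connective are prefixed."""
    feats = Counter()
    for sentence in re.split(r"[.!?]+", text.lower()):
        seen_connective = False
        for token in re.findall(r"[a-z']+", sentence):
            if token in CONTRASTIVE:
                feats["conn:" + token] += 1
                seen_connective = True
            elif seen_connective:
                feats["post:" + token] += 1
            else:
                feats[token] += 1
    return feats
```

For example, `connective_features("It is slim. But the photos are poor.")` keeps `slim` as a plain feature and emits `post:poor` for the word after "But", letting a learner weight the two differently. Concessive connectives such as "although" behave differently and would need separate handling.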




Academia Partnerships




      CMU/Portugal
      Seminars
      Summer School (LxMLS)
      Opportunity: Research Internships




CMU/Portugal


      Dual PhD Program in Language Technologies
      Priberam is an industrial partner
      See how to apply at: http://www.cmuportugal.org
      Note: deadline soon (December 15th)




Priberam Machine Learning Lunch Seminars




      A series of informal meetings every two weeks at IST (Tuesdays 1PM)
      Discussion forum involving different research groups interested in
      machine learning
      Everyone can attend, no registration needed
      Delicious free food!



Lisbon Machine Learning School




      An annual summer school held since 2011 devoted to ML and NLP
      > 100 participants worldwide (mostly MSc and PhD students)
      Priberam Labs co-organizes and is one of the sponsors
      Google is the main sponsor
      Next year’s topic is Big Data
      More information and videos of past lectures: http://lxmls.it.pt



Opportunity: Research Internships


We’re offering short-term research internships at Priberam Labs!

      Who? MSc/PhD students who want a short experience in industry
      What? A stimulating research environment, with connections to the
      international ML and NLP research scene
      How? Interns will work with us on a research project of their choice


Interested?
      labs@priberam.com




Thank You!

More information about the Labs: http://labs.priberam.com
(You could be here.)




Andr´ Martins (Priberam/IT)
    e                         Introducing Priberam Labs   IST 22/11/2012   52 / 56
References I
Altun, Y., Tsochantaridis, I., and Hofmann, T. (2003). Hidden Markov support vector
   machines. In Proc. of International Conference of Machine Learning.
Berrou, C., Glavieux, A., and Thitimajshima, P. (1993). Near Shannon limit error-correcting
   coding and decoding. In Proc. of International Conference on Communications, volume 93,
   pages 1064–1070.
Bishop, C. (2006). Pattern recognition and machine learning. Springer New York.
Brants, T. (2000). Tnt: a statistical part-of-speech tagger. In Proc. of the Sixth Conference on
   Applied Natural Language Processing.
Brill, E. (1993). A Corpus-Based Approach to Language Learning. PhD thesis, University of
   Pennsylvania.
Charniak, E. (1996). Tree-bank grammars. In Proc. of the National Conference on Artificial
  Intelligence, pages 1031–1036.
Chomsky, N. (1965). Aspects of the Theory of Syntax, volume 119. The MIT press.
Collins, M. (1999). Head-driven statistical models for natural language parsing. PhD thesis,
   University of Pennsylvania.
Eisner, J. (1996). Three new probabilistic models for dependency parsing: An exploration. In
   Proc. of International Conference on Computational Linguistics, pages 340–345.
Halevy, A., Norvig, P., and Pereira, F. (2009). The unreasonable effectiveness of data.
  Intelligent Systems, IEEE, 24(2):8–12.
Hudson, R. (1984). Word grammar. Blackwell Oxford.

Andr´ Martins (Priberam/IT)
    e                                Introducing Priberam Labs             IST 22/11/2012      53 / 56
References II
Ising, E. (1925). Beitrag zur theorie des ferromagnetismus. Zeitschrift f¨r Physik A Hadrons
                                                                         u
    and Nuclei, 31(1):253–258.
Klein, D. and Manning, C. (2003). Accurate unlexicalized parsing. In Proc. of Annual Meeting
   on Association for Computational Linguistics, pages 423–430.
Koller, D. and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques.
  The MIT Press.
Koo, T., Globerson, A., Carreras, X., and Collins, M. (2007). Structured prediction models via
  the matrix-tree theorem. In Empirical Methods for Natural Language Processing.
Kschischang, F. R., Frey, B. J., and Loeliger, H. A. (2001). Factor graphs and the sum-product
  algorithm. IEEE Transactions on Information Theory, 47.
Lafferty, J., McCallum, A., and Pereira, F. (2001). Conditional random fields: Probabilistic
  models for segmenting and labeling sequence data. In Proc. of International Conference of
  Machine Learning.
Magerman, D. (1995). Statistical decision-tree models for parsing. In Proc. of Annual Meeting
  on Association for Computational Linguistics, pages 276–283.
Manning, C. and Sch¨tze, H. (1999). Foundations of Statistical Natural Language Processing.
                   u
  MIT Press, Cambridge, MA.
Martins, A. F. T., Figueiredo, M. A. T., Aguiar, P. M. Q., Smith, N. A., and Xing, E. P.
  (2011a). An Augmented Lagrangian Approach to Constrained MAP Inference. In Proc. of
  International Conference of Machine Learning.


Andr´ Martins (Priberam/IT)
    e                                Introducing Priberam Labs            IST 22/11/2012   54 / 56
References III
Martins, A. F. T., Smith, N. A., Aguiar, P. M. Q., and Figueiredo, M. A. T. (2011b). Dual
  Decomposition with Many Overlapping Components. In Proc. of Empirical Methods for
  Natural Language Processing.
Martins, A. F. T., Smith, N. A., and Xing, E. P. (2009). Concise Integer Linear Programming
  Formulations for Dependency Parsing. In Proc. of Annual Meeting of the Association for
  Computational Linguistics.
Martins, A. F. T., Smith, N. A., Xing, E. P., Aguiar, P. M. Q., and Figueiredo, M. A. T.
  (2010a). Augmented Dual Decomposition for MAP Inference. In Neural Information
  Processing Systems: Workshop in Optimization for Machine Learning.
Martins, A. F. T., Smith, N. A., Xing, E. P., Figueiredo, M. A. T., and Aguiar, P. M. Q.
  (2010b). Turbo Parsers: Dependency Parsing by Approximate Variational Inference. In Proc.
  of Empirical Methods for Natural Language Processing.
McDonald, R. T., Pereira, F., Ribarov, K., and Hajic, J. (2005). Non-projective dependency
  parsing using spanning tree algorithms. In Proc. of Empirical Methods for Natural Language
  Processing.
Mel’ˇuk, I. (1988). Dependency syntax: theory and practice. State University of New York Press.
    c
Mitchell, T. (1997). Machine learning. McGraw Hill.
Nivre, J., Hall, J., Nilsson, J., Eryiˇit, G., and Marinov, S. (2006). Labeled pseudo-projective
                                      g
   dependency parsing with support vector machines. In Procs. of International Conference on
   Natural Language Learning.
Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference.
   Morgan Kaufmann.

Andr´ Martins (Priberam/IT)
    e                                 Introducing Priberam Labs             IST 22/11/2012   55 / 56
References IV


Potts, R. (1952). Some generalized order-disorder transformations. In Proceedings of the
  Cambridge Philosophical Society, volume 48, pages 106–109. Cambridge Univ Press.
Sch¨lkopf, B. and Smola, A. J. (2002). Learning with Kernels. The MIT Press, Cambridge, MA.
   o
Tanner, R. (1981). A recursive approach to low complexity codes. IEEE Transactions on
  Information Theory, 27(5):533–547.
Taskar, B., Guestrin, C., and Koller, D. (2003). Max-margin Markov networks. In Proc. of
   Neural Information Processing Systems.
Tesni`re, L. (1959). El´ments de syntaxe structurale. Libraire C. Klincksieck.
     e                 e
Tsochantaridis, I., Hofmann, T., Joachims, T., and Altun, Y. (2004). Support vector machine
  learning for interdependent and structured output spaces. In Proc. of International
  Conference of Machine Learning.
Viterbi, A. (1967). Error bounds for convolutional codes and an asymptotically optimum
   decoding algorithm. IEEE Transactions on Information Theory, 13(2):260–269.




Andr´ Martins (Priberam/IT)
    e                                 Introducing Priberam Labs            IST 22/11/2012   56 / 56

2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

Introducing Priberam Labs: Machine Learning and Natural Language Processing

  • 1. Introducing Priberam Labs: Machine Learning and Natural Language Processing. André Martins (Priberam/IT). IST, Lisbon, November 22nd, 2012.
  • 2. Collaborators: Mário Figueiredo, Noah Smith, Pedro Aguiar, Eric Xing, Miguel Almeida.
  • 3. Outline: (1) Introduction: What is Priberam? What are the Priberam Labs? (2) Research at Priberam Labs. (3) Master’s Projects. (4) Academia Partnerships.
  • 6. What is Priberam? A spin-off from IST, founded in 1989. R&D in the area of language technologies. Microsoft Gold Certified Partner, PME Líder, PME Inovadora COTEC. Some of our clients: (client logos shown on the slide).
  • 7. Online Dictionary (http://www.priberam.pt/dlpo, 1M page views per day).
  • 8. Grammar Checker (http://www.flip.pt).
  • 9. Legal Search (http://www.legix.pt).
  • 12. Newswire Search (http://www.dn.pt, http://www.jn.pt, http://www.tsf.pt). The screenshot highlights a question and its answer.
  • 16. What are the Priberam Labs? Every day we deal with challenging and stimulating problems, some of them unanswered by current scientific knowledge. Our key areas: Natural Language Processing and Machine Learning. Our goals: advance the state of the art in NLP and ML; incorporate the resulting innovations in new products; promote collaborations with other researchers in academia.
  • 19. Our Research Interests: Natural Language Processing, Machine Learning, Structured Prediction, Graphical Models.
  • 21. Natural Language Processing. Goal: make machines capable of “understanding” human language. Tasks include Information Retrieval, Machine Translation, Syntactic Parsing, Semantic Parsing, Speech Recognition, ...
  • 24. The Empirical “Revolution” in NLP. Until the 1980s: rule-based methods were prevalent in AI. Since the mid-1990s: statistical methods, corpus linguistics. Today: emphasis on machine learning and large-scale data processing (“The unreasonable effectiveness of data”, Halevy et al. 2009).
  • 32. Example: Spam Detector
  • 37. Machine Learning. Goal: build systems that learn from data. Input set $X$ and output set $Y$. Learn a classifier $h : X \to Y$ from a set of labeled examples $\{(x_i, y_i)\}_{i=1}^{N} \subseteq X \times Y$. Given an unseen example $x \in X$, predict $y = h(x)$. Many approaches: decision trees, neural networks, nearest neighbors, naive Bayes, logistic regression, support vector machines, ... Many learning formalisms: supervised, unsupervised, semi-supervised, weakly-supervised, active, online, reinforcement, ... (Mitchell, 1997; Manning and Schütze, 1999; Schölkopf and Smola, 2002; Bishop, 2006)
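The supervised setting on this slide (learn h : X → Y from labeled pairs, then predict on unseen inputs) can be illustrated with a minimal sketch. The perceptron, the bag-of-words features, and the four-message training set below are illustrative assumptions chosen to echo the spam-detector example; the slides do not say which learner was used.

```python
from collections import defaultdict

def features(text):
    """Map an input x (a string) to a bag-of-words feature dictionary."""
    counts = defaultdict(float)
    for word in text.lower().split():
        counts[word] += 1.0
    counts["<bias>"] = 1.0
    return counts

def train_perceptron(examples, epochs=10):
    """Learn weights from labeled pairs (x_i, y_i), with y_i in {+1, -1}."""
    w = defaultdict(float)
    for _ in range(epochs):
        for text, label in examples:
            phi = features(text)
            score = sum(w[f] * v for f, v in phi.items())
            if label * score <= 0:  # mistake: move w toward the correct label
                for f, v in phi.items():
                    w[f] += label * v
    return w

def h(w, text):
    """The learned classifier h: X -> Y."""
    score = sum(w.get(f, 0.0) * v for f, v in features(text).items())
    return 1 if score > 0 else -1

# Toy, made-up training set (hypothetical data): +1 = spam, -1 = not spam.
TRAIN = [
    ("win money now", 1),
    ("cheap money offer", 1),
    ("meeting at noon", -1),
    ("lunch meeting tomorrow", -1),
]
w = train_perceptron(TRAIN)
print(h(w, "free money offer"), h(w, "project meeting"))  # prints: 1 -1
```

Any of the other approaches listed on the slide (naive Bayes, logistic regression, and so on) would fit the same `train` / `h` interface; the perceptron is used here only because it fits in a few lines.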
  • 42. Structured Prediction. Language is structured, complex, and ambiguous. The input set $X$ is typically structured (a string, an acoustic signal, etc.). Often the output set $Y$ is also structured (a string, a parse tree, etc.). Some problems: How to decode structured outputs? How to learn models for structured prediction? How to learn the structure itself? (Lafferty et al., 2001; Taskar et al., 2003; Altun et al., 2003; Tsochantaridis et al., 2004)
  • 49. Example: Part-of-Speech Tagging. Goal: given a sentence, determine the part-of-speech tag of each word. Example: Time/Noun flies/Verb like/Prep an/Det arrow/Noun, where “flies” could be a noun or a verb and “like” a preposition or a verb. Approaches: rule-based systems (Brill, 1993); hidden Markov models (Brants, 2000); conditional random fields (Lafferty et al., 2001).
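To make the hidden Markov model approach concrete, here is a minimal Viterbi decoding sketch for the example sentence. All scores in `START`, `TRANS`, and `EMIT` are made-up numbers chosen for illustration (they are not estimated from a corpus and do not come from the slides); the point is only to show how the ambiguity of “flies” and “like” is resolved by scoring whole tag sequences.

```python
TAGS = ["Noun", "Verb", "Prep", "Det"]

# Hypothetical scores for illustration only.
START = {"Noun": 0.6, "Verb": 0.1, "Prep": 0.1, "Det": 0.2}
TRANS = {  # TRANS[previous_tag][tag]
    "Noun": {"Noun": 0.2, "Verb": 0.5, "Prep": 0.2, "Det": 0.1},
    "Verb": {"Noun": 0.2, "Verb": 0.1, "Prep": 0.3, "Det": 0.4},
    "Prep": {"Noun": 0.3, "Verb": 0.05, "Prep": 0.05, "Det": 0.6},
    "Det": {"Noun": 0.9, "Verb": 0.03, "Prep": 0.02, "Det": 0.05},
}
EMIT = {  # EMIT[word][tag]; tags not listed get score 0
    "time": {"Noun": 0.9, "Verb": 0.1},
    "flies": {"Noun": 0.4, "Verb": 0.6},
    "like": {"Verb": 0.3, "Prep": 0.7},
    "an": {"Det": 1.0},
    "arrow": {"Noun": 1.0},
}

def viterbi(words):
    """Return the highest-scoring tag sequence for `words`."""
    # best[t][tag]: score of the best sequence for words[:t+1] ending in tag
    best = [{tag: START[tag] * EMIT[words[0]].get(tag, 0.0) for tag in TAGS}]
    back = []
    for word in words[1:]:
        scores, pointers = {}, {}
        for tag in TAGS:
            prev = max(TAGS, key=lambda p: best[-1][p] * TRANS[p][tag])
            scores[tag] = best[-1][prev] * TRANS[prev][tag] * EMIT[word].get(tag, 0.0)
            pointers[tag] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    tag = max(TAGS, key=lambda t: best[-1][t])
    path = [tag]
    for pointers in reversed(back):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]

print(viterbi("time flies like an arrow".split()))
# prints: ['Noun', 'Verb', 'Prep', 'Det', 'Noun']
```

A practical tagger would work with log-probabilities to avoid underflow on long sentences and would handle unseen words; both are omitted here for brevity.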
  • 55. Graphical Models. Inspired by statistical mechanics (Ising, 1925; Potts, 1952). Applications in coding theory, vision, computational biology, ... (Tanner, 1981; Pearl, 1988; Kschischang et al., 2001; Koller and Friedman, 2009). MAP inference: obtain the most likely configuration. Graphs without cycles: dynamic programming (Viterbi, 1967). In general: NP-hard!
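As a small illustration of why MAP inference becomes expensive once the graph has cycles, the sketch below finds the most likely configuration of a three-variable model by exhaustive enumeration. The triangle graph and its log-potentials are hypothetical values chosen for the example; the cost of this approach grows as 2^n for n binary variables, which is what motivates approximate methods.

```python
from itertools import product

def map_brute_force(num_vars, unary, pairwise):
    """Exhaustive MAP over binary variables: tries all 2**num_vars configurations."""
    best_score, best_config = float("-inf"), None
    for config in product([0, 1], repeat=num_vars):
        score = sum(unary[i][config[i]] for i in range(num_vars))
        score += sum(table[config[i]][config[j]] for (i, j), table in pairwise.items())
        if score > best_score:
            best_score, best_config = score, config
    return best_config, best_score

# Hypothetical log-potentials on a triangle (a graph with one cycle).
UNARY = [[0.0, 2.0], [0.0, 0.0], [1.0, 0.0]]   # variable 0 prefers 1, variable 2 prefers 0
AGREE = [[1.5, 0.0], [0.0, 1.5]]               # neighbors are rewarded for agreeing
PAIRWISE = {(0, 1): AGREE, (1, 2): AGREE, (0, 2): AGREE}

config, score = map_brute_force(3, UNARY, PAIRWISE)
print(config, score)  # prints: (1, 1, 1) 6.5
```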
  • 61. AD3 Algorithm (Martins et al., 2010a, 2011a). “Alternating Directions Dual Decomposition.” An approximate MAP inference algorithm based on an LP relaxation. Fundamental idea: decompose the graph into parts; at each iteration t, solve local subproblems and promote a consensus on the overlaps. Convergence rate O(1/t). Can tackle combinatorial parts and first-order logic constraints. Code available at: http://www.ark.cs.cmu.edu/AD3
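The slide does not spell out the LP relaxation. For orientation, the textbook relaxation of MAP inference for a pairwise model is sketched below; the notation (log-potentials θ, pseudo-marginals μ, node set V, edge set E) is an assumption introduced here, and AD3 itself is formulated for more general factor graphs, so this should be read as the standard pairwise case rather than the authors' exact formulation.

```latex
% MAP inference for a pairwise model with log-potentials \theta:
\begin{align*}
\hat{x} &= \arg\max_{x} \; \sum_{i \in V} \theta_i(x_i)
          + \sum_{(i,j) \in E} \theta_{ij}(x_i, x_j)
\end{align*}

% LP relaxation: replace the discrete x by pseudo-marginals \mu that are
% only required to be locally consistent.
\begin{align*}
\max_{\mu \ge 0} \quad & \sum_{i \in V} \sum_{x_i} \theta_i(x_i)\, \mu_i(x_i)
  + \sum_{(i,j) \in E} \sum_{x_i, x_j} \theta_{ij}(x_i, x_j)\, \mu_{ij}(x_i, x_j) \\
\text{s.t.} \quad & \sum_{x_i} \mu_i(x_i) = 1 \quad \forall i \in V, \\
 & \sum_{x_j} \mu_{ij}(x_i, x_j) = \mu_i(x_i), \quad
   \sum_{x_i} \mu_{ij}(x_i, x_j) = \mu_j(x_j) \quad \forall (i,j) \in E.
\end{align*}
```

On graphs without cycles this relaxation is tight; on graphs with cycles its optimum can be fractional, which is why the resulting inference is approximate.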
  • 62. Graphs are Everywhere: Facebook graph, WWW graph, protein folding, image segmentation.
  • 65. Syntactic Parsing (Chomsky, 1965; Magerman, 1995; Charniak, 1996; Collins, 1999; Klein and Manning, 2003). Example sentence: She solved the problem with the statistical method.
    Grammar:
      S --> NP VP
      NP --> Pro
      NP --> Det N
      NP --> Det Nbar
      Nbar --> Adj N
      VP --> V NP PP
      PP --> P NP
      Det --> the
      Pro --> She
      N --> problem
      N --> method
      V --> solved
      P --> with
      Adj --> statistical
    Parse tree: [S [NP [Pro She]] [VP [V solved] [NP [Det the] [N problem]] [PP [P with] [NP [Det the] [Nbar [Adj statistical] [N method]]]]]]
  • 66. Syntactic Ambiguity. The same sentence has two readings:
    1. She employed the statistical method (the PP attaches to the verb phrase): [S [NP She] [VP [V solved] [NP the problem] [PP with the statistical method]]]
    2. The statistical method was broken (the PP attaches to the noun phrase): [S [NP She] [VP [V solved] [NP [NP the problem] [PP with the statistical method]]]]
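The two readings can be checked mechanically. The sketch below counts the parse trees that a small context-free grammar assigns to the sentence. It uses the grammar from the parsing slide plus two rules, VP --> V NP and NP --> NP PP, which are read off the second tree on the ambiguity slide and are an assumption here since the grammar slide does not list them. The counting routine is a generic memoized span parser written for this example, not the parser described in the talk.

```python
from functools import lru_cache

RULES = {
    "S": [("NP", "VP")],
    "NP": [("Pro",), ("Det", "N"), ("Det", "Nbar"), ("NP", "PP")],  # NP PP: assumed rule
    "Nbar": [("Adj", "N")],
    "VP": [("V", "NP", "PP"), ("V", "NP")],                         # V NP: assumed rule
    "PP": [("P", "NP")],
}
LEXICON = {
    "Det": {"the"}, "Pro": {"She"}, "N": {"problem", "method"},
    "V": {"solved"}, "P": {"with"}, "Adj": {"statistical"},
}

def count_parses(words, root="S"):
    """Count the distinct parse trees of `words` rooted at `root`."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        """Number of trees rooted at `symbol` that span words[i:j]."""
        if symbol in LEXICON:
            return int(j == i + 1 and words[i] in LEXICON[symbol])
        return sum(expand(rhs, i, j) for rhs in RULES[symbol])

    @lru_cache(maxsize=None)
    def expand(rhs, i, j):
        """Number of ways the symbol sequence `rhs` can cover words[i:j]."""
        if len(rhs) == 1:
            return count(rhs[0], i, j)
        # The first symbol takes words[i:k]; keep at least one word
        # for each of the remaining symbols.
        return sum(count(rhs[0], i, k) * expand(rhs[1:], k, j)
                   for k in range(i + 1, j - len(rhs) + 2))

    return count(root, 0, len(words))

SENTENCE = "She solved the problem with the statistical method".split()
print(count_parses(SENTENCE))  # prints: 2
```

Because every child of a multi-symbol rule covers a strictly shorter span, the left-recursive rule NP --> NP PP does not cause infinite recursion in this formulation.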
  • 67. Dependency Syntax (Pāṇini, 4th century BCE; Tesnière 1959; Hudson 1984; Mel’čuk 1988; Eisner 1996; McDonald et al. 2005; Nivre et al. 2006; Koo et al. 2007). Example: * She solved the problem with the statistical method. Tree obtained by “lexicalizing” the previous phrase-structure tree. A lightweight syntactic formalism, without phrases. Grammar functions represented as lexical relationships.
  • 68. Turbo Parser (Martins et al., 2009, 2010b, 2011b). A multilingual statistical dependency parser, which formulates parsing as inference in a graphical model. Ignores global effects caused by the cycles of the graph. Same idea that underlies turbo decoders (Berrou et al., 1993). Uses AD3 for solving the relaxation. State-of-the-art accuracies, extremely fast (1,200 words per second). Code available at: http://www.ark.cs.cmu.edu/TurboParser
  • 70. Ongoing Project: Summarization. Given a set of documents about an event, generate a brief summary.
  • 72. Extractive Summarization. Just extract the most salient sentences. Reward relevance and coverage, penalize redundancy.
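One simple way to realize “reward relevance and coverage, penalize redundancy” is a greedy selection loop, sketched below. This is a generic heuristic written for illustration and is not the method of the ongoing project, which the slides do not describe. The relevance measure (how many sentences a word occurs in), the `penalty` weight, and the example sentences are all assumptions.

```python
def summarize(sentences, budget=2, penalty=0.5):
    """Greedily pick `budget` sentences that cover frequent words without repeating them."""
    bags = [set(s.lower().split()) for s in sentences]
    # Relevance of a word: the number of sentences it occurs in.
    freq = {}
    for bag in bags:
        for word in bag:
            freq[word] = freq.get(word, 0) + 1

    chosen, covered = [], set()
    while len(chosen) < min(budget, len(sentences)):
        def gain(i):
            new = sum(freq[w] for w in bags[i] - covered)       # relevance and coverage
            repeated = sum(freq[w] for w in bags[i] & covered)  # redundancy
            return new - penalty * repeated
        best = max((i for i in range(len(sentences)) if i not in chosen), key=gain)
        chosen.append(best)
        covered |= bags[best]
    return [sentences[i] for i in sorted(chosen)]

# Made-up input sentences.
SENTS = [
    "the storm hit the coast",
    "the storm hit the coast on monday",
    "rescue teams reached the coast",
    "schools will reopen next week",
]
print(summarize(SENTS))
# prints: ['the storm hit the coast on monday', 'schools will reopen next week']
```

The first sentence is skipped because everything it says is already covered by the second, which is the redundancy penalty at work.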
  • 74. Compressive Summarization. Jointly extract and compress sentences. Trade-off between informativeness, length, and grammaticality.
  • 75. Released Software. A multilingual part-of-speech tagger (TurboTagger). A multilingual dependency parser (TurboParser). An algorithm for approximate inference in graphical models (AD3). http://www.ark.cs.cmu.edu/TurboParser http://www.ark.cs.cmu.edu/AD3
  • 78. Master’s Projects: Opinion Mining in Newspapers and Blogs; Text-Driven Forecasting; Recommendation Systems; Weakly Supervised Sentiment Analysis.
  • 81. Opinion Mining in Newspapers and Blogs. Build a system that extracts “opinions” from natural language text. Examples: opinions of politicians about controversial topics, user reviews about products, opinions expressed in blogs and Twitter, etc. Goal: a computer program that extracts opinions and identifies the opinion holder, the aspect the opinion is about, and the opinion polarity (positive or negative sentiment).
  • 82. Example: Google Products. The screenshot highlights opinion snippets and aspects.
  • 87. Text-Driven Forecasting. Example: a movie by a famous director has premiered. Can we predict its gross revenue given opinionated text? “[...] a masterpiece in sheer awfulness.” (Rotten Tomatoes) Goal: develop ML algorithms for predicting numeric quantities about an event given a body of text. Possible applications: predicting the revenue of movies, opinion polls from blogs, stock volatility from financial reports, the number of external links given a news article, etc.
  • 94. Recommendation Systems. In many applications (e.g. movie rental systems) users assign ratings to products according to their taste. These ratings can be seen as entries in a matrix (of $N$ users by $M$ movies) in which some entries are unknown (shown as “?”). Goal: fill in the blanks (matrix completion). Predict the rating that the $i$th user will assign to the $j$th movie based on similar user/movie profiles: collaborative filtering. Recommend new movies to unseen users.
  • 99. Recommendation Systems
      Netflix Prize: $1M for whoever improves on Netflix’s Cinematch® by more than 10%.
      Winner: BellKor’s Pragmatic Chaos, 21/9/2009.
      Data: some entries of the user/movie matrix (training and test splits).
      Evaluation metric: root mean squared error (RMSE).
      Some possible approaches:
        k-nearest neighbors (for some similarity metric)
        probabilistic models with latent variables
        low-rank matrix factorization
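To make the RMSE metric and the low-rank factorization approach concrete, here is a small sketch that fits R ≈ U Vᵀ on the observed entries by stochastic gradient descent. It is an illustration under stated assumptions, not the method used by any Netflix Prize team: the toy entries, the rank, and the learning-rate and regularization values are all hypothetical.

```python
import random
from math import sqrt


def predict(U, V, i, j):
    """Predicted rating: inner product of user factor i and movie factor j."""
    return sum(u * v for u, v in zip(U[i], V[j]))


def rmse(U, V, entries):
    """Root mean squared error over (user, movie, rating) triples."""
    squared = sum((predict(U, V, i, j) - r) ** 2 for i, j, r in entries)
    return sqrt(squared / len(entries))


def factorize(entries, n_users, n_movies, rank=2, lr=0.02, reg=0.02,
              epochs=500, seed=0):
    """Fit low-rank factors U (users) and V (movies) on observed entries only."""
    rng = random.Random(seed)
    U = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_movies)]
    for _ in range(epochs):
        for i, j, r in entries:
            err = r - predict(U, V, i, j)
            for f in range(rank):
                u, v = U[i][f], V[j][f]
                # Gradient step on squared error with L2 regularization.
                U[i][f] += lr * (err * v - reg * u)
                V[j][f] += lr * (err * u - reg * v)
    return U, V


# Hypothetical observed entries: (user index, movie index, rating).
TRAIN = [(0, 0, 5), (0, 1, 5), (0, 2, 1), (1, 0, 5), (1, 3, 1),
         (2, 1, 1), (2, 2, 5), (2, 3, 5), (3, 0, 1), (3, 2, 5)]
# Held-out entries, playing the role of the test split.
TEST = [(1, 1, 5), (3, 3, 5)]

if __name__ == "__main__":
    U, V = factorize(TRAIN, n_users=4, n_movies=4)
    print("train RMSE:", rmse(U, V, TRAIN))
    print("test RMSE:", rmse(U, V, TEST))
```

Only the observed entries enter the loss, which is what distinguishes matrix completion from an ordinary factorization of a fully known matrix.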
  • 100. Master’s Projects
      Opinion Mining in Newspapers and Blogs
      Text-Driven Forecasting
      Recommendation Systems
      Weakly Supervised Sentiment Analysis
  • 106. Weakly Supervised Sentiment Analysis
      Classify a product review as positive or negative.
      “This camera takes poor quality photos. Yes, it’s slim and lightweight. Yes, the shutter speed is snappy. But the photos are of such poor quality that it’s a pretty useless camera.” (Amazon.com)
      Data: a set of reviews along with product ratings.
      Goal: an algorithm which, given a new product review as input, predicts its polarity (positive or negative).
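One possible baseline for the task stated on this slide is a bag-of-words Naive Bayes classifier that uses the product ratings as labels. This is a sketch under assumptions, not the approach proposed in the talk: the star thresholds (4 or more is positive, 2 or fewer is negative, 3 is skipped) and the training reviews are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(reviews):
    """Train from (text, stars) pairs, using the star rating as a polarity label."""
    word_counts = {"pos": Counter(), "neg": Counter()}
    doc_counts = Counter()
    for text, stars in reviews:
        if stars == 3:
            continue  # assumption: treat 3 stars as neutral and skip it
        label = "pos" if stars >= 4 else "neg"
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["pos"]) | set(word_counts["neg"])
    return word_counts, doc_counts, vocab


def classify(model, text):
    """Return 'pos' or 'neg' using Naive Bayes with add-one smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in ("pos", "neg"):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                count = word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical training reviews with star ratings.
REVIEWS = [
    ("Great camera, sharp photos and I love the battery life.", 5),
    ("Excellent screen and great sound.", 4),
    ("Poor quality photos, a useless camera.", 1),
    ("Terrible battery and poor sound.", 2),
    ("It is okay.", 3),
]
```

After `model = train(REVIEWS)`, `classify(model, "The photos are of such poor quality that it's a pretty useless camera.")` labels the last sentence of the slide's example review as negative.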
  • 109. Weakly Supervised Sentiment Analysis
      Consider a scenario with weak supervision: domain adaptation, semi-supervised learning, language transfer, etc.
      Possible tasks:
        Classify movie reviews with a system trained on cellphone reviews.
        Train a system on English data and use it for reviews in Portuguese.
      What are the relevant features?
        Adjectives? (not always helpful...)
        Connective words: but, however, although, ...
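The question about connective words can be illustrated with a small feature extractor. In addition to plain word features, it emits a second copy of every word that follows a connective in the same sentence, so a classifier can weight "useless" after "but" differently from "useless" elsewhere. The feature naming scheme is hypothetical, and the connective list is just the three words named on the slide.

```python
import re
from collections import Counter

# The three connectives named on the slide.
CONNECTIVES = {"but", "however", "although"}


def extract_features(text):
    """Bag-of-words features plus position-relative-to-connective features."""
    features = Counter()
    for sentence in re.split(r"[.!?]+", text.lower()):
        connective = None  # most recent connective seen in this sentence
        for token in re.findall(r"[a-z']+", sentence):
            if token in CONNECTIVES:
                connective = token
                features["has_" + token] += 1
                continue
            features["word=" + token] += 1
            if connective is not None:
                # Extra feature marking that the word follows a connective.
                features["after_" + connective + "=" + token] += 1
    return features
```

On the camera review above, "slim" and "lightweight" yield only `word=` features, while "poor" and "useless" in the final sentence also yield `after_but=` features. The extractor does not decide what a connective means; it only exposes the position and leaves the weighting to the learner.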
  • 110. Outline
      1 Introduction
          What is Priberam?
          What are the Priberam Labs?
      2 Research at Priberam Labs
      3 Master’s Projects
      4 Academia Partnerships
  • 111. Academia Partnerships
      CMU/Portugal
      Seminars
      Summer School (LxMLS)
      Opportunity: Research Internships
  • 113. CMU/Portugal
      Dual PhD Program in Language Technologies
      Priberam is an industrial partner
      See how to apply at: http://www.cmuportugal.org
      Note: deadline soon (December 15th)
  • 115. Priberam Machine Learning Lunch Seminars
      A series of informal meetings every two weeks at IST (Tuesdays, 1 PM)
      Discussion forum involving different research groups interested in machine learning
      Everyone can attend; no registration needed
      Delicious free food!
  • 119. Lisbon Machine Learning School
      An annual summer school devoted to ML and NLP, held since 2011
      More than 100 participants from around the world (mostly MSc and PhD students)
      Priberam Labs co-organizes and is one of the sponsors
      Google is the main sponsor
      Next year’s topic is Big Data
      More information and videos of past lectures: http://lxmls.it.pt
  • 120. Opportunity: Research Internships
      We’re offering short-term research internships at Priberam Labs!
      Who? MSc/PhD students wanting a short experience in industry
      What? A stimulating research environment, with connections to the international ML and NLP research scene
      How? Interns will work with us on a research project of their choice
      Interested? labs@priberam.com
  • 121. Thank You!
      More information about the Labs: http://labs.priberam.com
      (You could be here.)
  • 122. References I
      Altun, Y., Tsochantaridis, I., and Hofmann, T. (2003). Hidden Markov support vector machines. In Proc. of International Conference on Machine Learning.
      Berrou, C., Glavieux, A., and Thitimajshima, P. (1993). Near Shannon limit error-correcting coding and decoding. In Proc. of International Conference on Communications, volume 93, pages 1064–1070.
      Bishop, C. (2006). Pattern Recognition and Machine Learning. Springer New York.
      Brants, T. (2000). TnT: a statistical part-of-speech tagger. In Proc. of the Sixth Conference on Applied Natural Language Processing.
      Brill, E. (1993). A Corpus-Based Approach to Language Learning. PhD thesis, University of Pennsylvania.
      Charniak, E. (1996). Tree-bank grammars. In Proc. of the National Conference on Artificial Intelligence, pages 1031–1036.
      Chomsky, N. (1965). Aspects of the Theory of Syntax, volume 119. The MIT Press.
      Collins, M. (1999). Head-driven statistical models for natural language parsing. PhD thesis, University of Pennsylvania.
      Eisner, J. (1996). Three new probabilistic models for dependency parsing: An exploration. In Proc. of International Conference on Computational Linguistics, pages 340–345.
      Halevy, A., Norvig, P., and Pereira, F. (2009). The unreasonable effectiveness of data. Intelligent Systems, IEEE, 24(2):8–12.
      Hudson, R. (1984). Word Grammar. Blackwell Oxford.
  • 123. References II
      Ising, E. (1925). Beitrag zur Theorie des Ferromagnetismus. Zeitschrift für Physik A Hadrons and Nuclei, 31(1):253–258.
      Klein, D. and Manning, C. (2003). Accurate unlexicalized parsing. In Proc. of Annual Meeting of the Association for Computational Linguistics, pages 423–430.
      Koller, D. and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. The MIT Press.
      Koo, T., Globerson, A., Carreras, X., and Collins, M. (2007). Structured prediction models via the matrix-tree theorem. In Proc. of Empirical Methods for Natural Language Processing.
      Kschischang, F. R., Frey, B. J., and Loeliger, H. A. (2001). Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47.
      Lafferty, J., McCallum, A., and Pereira, F. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proc. of International Conference on Machine Learning.
      Magerman, D. (1995). Statistical decision-tree models for parsing. In Proc. of Annual Meeting of the Association for Computational Linguistics, pages 276–283.
      Manning, C. and Schütze, H. (1999). Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA.
      Martins, A. F. T., Figueiredo, M. A. T., Aguiar, P. M. Q., Smith, N. A., and Xing, E. P. (2011a). An Augmented Lagrangian Approach to Constrained MAP Inference. In Proc. of International Conference on Machine Learning.
  • 124. References III
      Martins, A. F. T., Smith, N. A., Aguiar, P. M. Q., and Figueiredo, M. A. T. (2011b). Dual Decomposition with Many Overlapping Components. In Proc. of Empirical Methods for Natural Language Processing.
      Martins, A. F. T., Smith, N. A., and Xing, E. P. (2009). Concise Integer Linear Programming Formulations for Dependency Parsing. In Proc. of Annual Meeting of the Association for Computational Linguistics.
      Martins, A. F. T., Smith, N. A., Xing, E. P., Aguiar, P. M. Q., and Figueiredo, M. A. T. (2010a). Augmented Dual Decomposition for MAP Inference. In Neural Information Processing Systems: Workshop in Optimization for Machine Learning.
      Martins, A. F. T., Smith, N. A., Xing, E. P., Figueiredo, M. A. T., and Aguiar, P. M. Q. (2010b). Turbo Parsers: Dependency Parsing by Approximate Variational Inference. In Proc. of Empirical Methods for Natural Language Processing.
      McDonald, R. T., Pereira, F., Ribarov, K., and Hajič, J. (2005). Non-projective dependency parsing using spanning tree algorithms. In Proc. of Empirical Methods for Natural Language Processing.
      Mel’čuk, I. (1988). Dependency Syntax: Theory and Practice. State University of New York Press.
      Mitchell, T. (1997). Machine Learning. McGraw Hill.
      Nivre, J., Hall, J., Nilsson, J., Eryiğit, G., and Marinov, S. (2006). Labeled pseudo-projective dependency parsing with support vector machines. In Proc. of International Conference on Natural Language Learning.
      Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.
  • 125. References IV
      Potts, R. (1952). Some generalized order-disorder transformations. In Proceedings of the Cambridge Philosophical Society, volume 48, pages 106–109. Cambridge Univ Press.
      Schölkopf, B. and Smola, A. J. (2002). Learning with Kernels. The MIT Press, Cambridge, MA.
      Tanner, R. (1981). A recursive approach to low complexity codes. IEEE Transactions on Information Theory, 27(5):533–547.
      Taskar, B., Guestrin, C., and Koller, D. (2003). Max-margin Markov networks. In Proc. of Neural Information Processing Systems.
      Tesnière, L. (1959). Éléments de syntaxe structurale. Librairie C. Klincksieck.
      Tsochantaridis, I., Hofmann, T., Joachims, T., and Altun, Y. (2004). Support vector machine learning for interdependent and structured output spaces. In Proc. of International Conference on Machine Learning.
      Viterbi, A. (1967). Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, 13(2):260–269.