CS-GN-TEAM: internal presentation




research taster project
    temporal expressions extraction
                        Michele Filannino + You




                                                     Manchester, 15/02/2012
presentation my research taster project




cdt?


■ 4-year PhD course
■ funded by EPSRC
■ industrial partners
■ multi-disciplinary
■ new model for all PhD training within the UK





cdt?
■ 6 months of foundation period
   ●   3 postgraduate courses
        ▶   Machine Learning and Data Mining, Modelling and
            visualisation of high-dimensional data, Semi-structured data
            and the web
   ●   3 scientific methods courses
   ●   1 short taster project [6 weeks]
   ●   creativity workshops

■ 3.5 years of PhD research




where we are

■ Computer science
  ●   natural language processing
      ▶   information retrieval
           ★ information extraction

               ✦   temporal expressions extraction








or...

 ■ Computer science
   ●    data mining
        ▶   text mining
             ★ information extraction

                 ✦   temporal expressions extraction








temporal expression
       ■ natural language phrase that denotes a temporal
          entity: an interval or an instant [1]
            ●    fully-qualified: no reference to any other temporal
                 entity
                    ▶    March 15, 2001
            ●    deictic: reference to the time of utterance
                    ▶    today, yesterday, three weeks ago, last Thursday
            ●    anaphoric: reference to a timex [2] previously evoked in
                 the text
                    ▶    March 15, the next week, Saturday, at that time
[1] L. Ferro, I. Mani, B. Sundheim, and G. Wilson, “TIDES temporal annotation guidelines, v. 1.0.2,” MITRE, 2001
[2] timex: temporal expression
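
A minimal Python sketch of how a deictic expression could be resolved against the time of utterance. The helper name resolve_deictic, the tiny phrase inventory, and the reading of "last Thursday" as the most recent Thursday strictly before the utterance date are assumptions made for illustration; they are not taken from the TIDES guidelines.

```python
from datetime import date, timedelta

# Illustrative vocabularies (assumed, far from exhaustive).
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]
NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def resolve_deictic(expression: str, utterance: date) -> date:
    """Return the calendar date that a deictic expression points to."""
    words = expression.lower().split()
    if words == ["today"]:
        return utterance
    if words == ["yesterday"]:
        return utterance - timedelta(days=1)
    # "<number> days ago" / "<number> weeks ago"
    if len(words) == 3 and words[2] == "ago" and words[0] in NUMBERS:
        n = NUMBERS[words[0]]
        unit = words[1].rstrip("s")
        if unit == "day":
            return utterance - timedelta(days=n)
        if unit == "week":
            return utterance - timedelta(weeks=n)
    # "last <weekday>": assumed to mean the most recent such day
    # strictly before the utterance date (1 to 7 days back).
    if len(words) == 2 and words[0] == "last" and words[1] in WEEKDAYS:
        target = WEEKDAYS.index(words[1])
        days_back = (utterance.weekday() - target - 1) % 7 + 1
        return utterance - timedelta(days=days_back)
    raise ValueError(f"unsupported expression: {expression!r}")
```

With the presentation date as the time of utterance, resolve_deictic("three weeks ago", date(2012, 2, 15)) gives date(2012, 1, 25). Anaphoric expressions would additionally need the previously evoked timex as an anchor, which this sketch does not model.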




why?

■ user’s perspective
   ●   temporal aspects of events and entities provide a
       natural mechanism for organising information.

■ machine’s perspective
   ●   improvements in
        ▶   question answering, summarisation, browsing







how?
■ annotation
  ●   recognition
      ▶   automatically detect and delimit expressions
      ▶   mostly machine-learning techniques
  ●   normalisation
      ▶   assign attribute values to all the recognised
          expressions
      ▶   using a shared and formal format (standard?)
      ▶   mostly rule-based techniques
■ reasoning or searching
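
The two annotation steps can be sketched in Python for one narrow case: fully-qualified dates such as "March 15, 2001". This is a sketch under stated assumptions. A regular expression stands in for the recogniser (the slide notes that recognition is mostly done with machine learning), and the function names recognise, normalise and annotate, as well as the choice of ISO 8601 calendar dates as the output format, are illustrative.

```python
import re
from datetime import date

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]

# Recognition pattern: "<Month> <day>[st|nd|rd|th], <year>".
DATE_PATTERN = re.compile(
    r"\b(" + "|".join(MONTHS) + r")\s+(\d{1,2})(?:st|nd|rd|th)?,\s+(\d{4})\b",
    re.IGNORECASE,
)


def recognise(text):
    """Detect and delimit date expressions as (start, end, match) tuples."""
    return [(m.start(), m.end(), m) for m in DATE_PATTERN.finditer(text)]


def normalise(match):
    """Assign a value to a recognised expression: an ISO 8601 date string."""
    month = MONTHS.index(match.group(1).lower()) + 1
    # date() raises ValueError for impossible dates such as February 31.
    return date(int(match.group(3)), month, int(match.group(2))).isoformat()


def annotate(text):
    """Run recognition, then normalisation; return (surface, value) pairs."""
    return [(text[start:end], normalise(m)) for start, end, m in recognise(text)]
```

For example, annotate("11pm, February 14th, 2005") returns the pair ("February 14th, 2005", "2005-02-14"); the time of day "11pm" is outside the pattern and is ignored.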




timex forms [1]

       ■ time or date references
            ●    11pm, February 14th, 2005

       ■ time references that anchor on another time
            ●    one hour after midnight, two weeks before Christmas

       ■ durations
            ●    a few months, two days, five years

       ■ recurring times
            ●    every third month, twice in the hour

[1] J. Poveda, M. Surdeanu, and J. Turmo, “An analysis of Bootstrapping for the Recognition of Temporal Expressions”, 2009
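
As an illustration of the "durations" form, here is a small Python sketch that maps simple duration phrases to ISO 8601-style period values (for example "P2D"). The function name duration_value, the number-word table and the restriction to day, week, month and year units are assumptions; vague quantities such as "a few months" are deliberately left unresolved.

```python
import re

# Assumed, deliberately small vocabularies.
NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
           "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}
UNITS = {"day": "D", "week": "W", "month": "M", "year": "Y"}

DURATION = re.compile(
    r"\b(" + "|".join(NUMBERS) + r"|\d+)\s+(" + "|".join(UNITS) + r")s?\b",
    re.IGNORECASE,
)


def duration_value(expression):
    """Return an ISO 8601-style period string, or None if unsupported."""
    m = DURATION.search(expression)
    if m is None:
        return None  # e.g. vague quantities like "few months"
    quantity = m.group(1).lower()
    n = NUMBERS[quantity] if quantity in NUMBERS else int(quantity)
    return f"P{n}{UNITS[m.group(2).lower()]}"
```

Here duration_value("two days") gives "P2D" and duration_value("five years") gives "P5Y", while "few months" yields None. Sub-day units (hours, minutes) would need the "PT" prefix and are not covered.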




timex forms [1]

       ■ context-dependent times
            ●    today, last year

       ■ vague references
            ●    somewhere in the middle of June, the near future

       ■ times indicated by an event
            ●    the day S. Berlusconi resigned
                   ▶    “event” is used as a cover term for situations that
                        happen or occur

[1] J. Poveda, M. Surdeanu, and J. Turmo, “An analysis of Bootstrapping for the Recognition of Temporal Expressions”, 2009




timeline

[timeline figure: the items below plotted on a year axis from 2000 to 2013]

■ evaluation campaigns
   ●   ACE-2004 dev & eval (TERN2004 corpus)
   ●   TempEval Task #15 (in SemEval07)
   ●   TempEval-2 Task #13 (in SemEval10)
   ●   TempEval-3 Task #1 (in SemEval13)

■ resources
   ●   TimeML (standard)
   ●   TimeBank (corpus)

■ approaches
   ●   hand grammar approach (rule-based)
   ●   Maximum Entropy Classifier (machine learning)
   ●   SVM (machine learning)
   ●   Conditional Random Fields (machine learning)
   ●   Markov logic network (machine learning)

■ scores shown on the timeline: 85% [1], 87.8% [1], 90.7% [1]

[1] TERN2004 corpus




standards

■ “The nice thing about standards is that you have so
  many to choose from.” (Andrew S. Tanenbaum)
   ●   TimeML
   ●   DAML-Time
   ●   TIDES
   ●   ACE-TERN






standards

■ there’s a tension between
   ●   flexibility and efficiency
   ●   usability and flexibility
   ●   complexity and spreadability
   ●   flexibility and agreement







about the spreadability








about the agreement
        TimeML tag     agreement
        TIMEX3         0.83
        SIGNAL         0.77
        EVENT          0.78
        ALINK          0.81
        SLINK          0.85
        TLINK          0.55

Source: http://timeml.org/site/timebank/documentation-1.2.html




example: raw text



        That means Unisys must pay about $100 million in interest every
        quarter, on top of $27 million in dividends on preferred stock.




Source: TRIOS TimeBank v.0.1




example: recognition


        That means Unisys must <ev>pay</ev> about $100 million in interest
        <te>every quarter</te>, on top of $27 million in dividends on preferred
        stock.




Source: TRIOS TimeBank v.0.1




example: normalisation
        That means Unisys must <EVENT eid="e110" mainevent="YES"
        class="OCCURRENCE" stem="pay" tense="NONE" aspect="NONE"
        polarity="POS" pos="VERB">pay</EVENT> about $100 million in
        interest <TIMEX3 tid="t256" type="SET" value="P1Q"
        temporalFunction="false" functionInDocument="NONE"
        quant="every">every quarter</TIMEX3>, on top of $27 million in
        dividends on preferred stock.
        <TLINK lid="l32" relType="BEFORE" relatedToEvent="e110"
        eventID="e107"/>
        <TLINK lid="l26" relType="OVERLAP" eventID="e110"
        relatedToTime="t256"/>

Source: TRIOS TimeBank v.0.1
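
To show how such an annotation can be consumed, the following Python sketch reads the normalised sentence with the standard library's xml.etree.ElementTree. The enclosing <doc> root element is an assumption added only so that the fragment parses as well-formed XML; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The annotated sentence from the slide, wrapped in a hypothetical <doc>
# root element so that the fragment is well-formed XML.
FRAGMENT = """<doc>That means Unisys must <EVENT eid="e110" mainevent="YES"
class="OCCURRENCE" stem="pay" tense="NONE" aspect="NONE"
polarity="POS" pos="VERB">pay</EVENT> about $100 million in
interest <TIMEX3 tid="t256" type="SET" value="P1Q"
temporalFunction="false" functionInDocument="NONE"
quant="every">every quarter</TIMEX3>, on top of $27 million in
dividends on preferred stock.
<TLINK lid="l32" relType="BEFORE" relatedToEvent="e110"
eventID="e107"/>
<TLINK lid="l26" relType="OVERLAP" eventID="e110"
relatedToTime="t256"/></doc>"""

root = ET.fromstring(FRAGMENT)

# Temporal expressions: (id, type, normalised value, surface text).
timexes = [(t.get("tid"), t.get("type"), t.get("value"), t.text)
           for t in root.iter("TIMEX3")]

# Links that relate an event to a temporal expression.
event_time_links = [(l.get("eventID"), l.get("relType"), l.get("relatedToTime"))
                    for l in root.iter("TLINK")
                    if l.get("relatedToTime") is not None]

print(timexes)
print(event_time_links)
```

The first list contains the single TIMEX3 ("t256", "SET", "P1Q", "every quarter"); the second contains the OVERLAP link between event e110 ("pay") and that expression.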




considerations
■ specialised linguistic approaches do not pay off
   ●   machine learning techniques usually perform better

■ scarcity of pre-annotated corpora
   ●   manual corpus annotation is very tricky
   ●   partially solved with TempEval-3 (2013)
        ▶   1M-word corpus automatically annotated by TRIOS

■ vibrant area in the biomedical domain




[stacked bar chart: Google Scholar results per year for two queries]

        year    “temporal expressions”    “temporal expressions” AND “clinical”
        2000    182                       10
        2001    180                       12
        2002    220                       16
        2003    230                       15
        2004    280                       15
        2005    310                       22
        2006    410                       42
        2007    370                       36
        2008    410                       41
        2009    412                       45
        2010    433                       44
        2011    382                       46
        2012    33                        9

Source: Google Scholar (last update 09/02/2012)



[100% stacked bar chart: share of Google Scholar results per year for two queries]

        year    “temporal expressions”    “temporal expressions” AND “clinical”
        2000    95%                       5%
        2001    94%                       6%
        2002    93%                       7%
        2003    94%                       6%
        2004    95%                       5%
        2005    93%                       7%
        2006    91%                       9%
        2007    91%                       9%
        2008    91%                       9%
        2009    90%                       10%
        2010    91%                       9%
        2011    89%                       11%
        2012    79%                       21%

Source: Google Scholar (last update 09/02/2012)




considerations


■ rule-based approaches will never die
   ●   CRF and MLN are hybridised machine learning approaches

■ better performance means clever decomposition
   ●   how to divide the general problem into sub-problems








my to-do list
 ■ collect a corpus in the clinical domain
 ■ study novel machine learning approaches
    ●   maximum likelihood, logistic regression, CRF, MLN

 ■ implement a prototype
    ●   Python or MATLAB


 [progress bar, 0 to 30 days: 12 days elapsed, 18 days remaining]




Thank you.

My research taster project

  • 1. CS-GN-TEAM: internal presentation research taster project temporal expressions extraction Michele Filannino + You Manchester, 15/02/2012
  • 2. presentation my research taster project cdt? ■ 4-year PhD course ■ funded by EPSRC ■ industrial partners ■ multi-disciplinary ■ new model for all PhD training within the UK 15/02/2012, Michele Filannino 2 / 23
  • 3. cdt? ■ 6 months of foundation period ● 3 postgraduate courses ▶ Machine Learning and Data Mining, Modelling and visualisation of high-dimensional data, Semi-structured data and the web ● 3 scientific methods courses ● 1 short taster project [6 weeks] ● creativity workshops ■ 3.5 years of PhD research
  • 4. where we are ■ Computer science ● natural language processing ▶ information retrieval ★ information extraction ✦ temporal expressions extraction
  • 5. or... ■ Computer science ● data mining ▶ text mining ★ information extraction ✦ temporal expressions extraction
  • 6. temporal expression ■ natural language phrase that denotes a temporal entity: an interval or an instant [1] ● fully-qualified: no reference to any other temporal entity ▶ March 15, 2001 ● deictic: reference to the time of utterance ▶ today, yesterday, three weeks ago, last Thursday ● anaphoric: reference to a timex [2] previously evoked in the text ▶ March 15, the next week, Saturday, at that time
    [1] L. Ferro, I. Mani, B. Sundheim, and G. Wilson, “TIDES temporal annotation guidelines, v. 1.0.2,” MITRE, 2001
    [2] timex: temporal expression
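The difference between fully-qualified and deictic expressions on slide 6 can be made concrete with a small sketch: a deictic expression only denotes a date once the time of utterance is known. The lookup table and both function names below are hypothetical and are not part of the presentation; "last Thursday" is read here as the most recent Thursday strictly before the utterance date, which is one of several possible readings.

```python
from datetime import date, timedelta

# Hypothetical lookup table: deictic expression -> offset in days from the
# time of utterance. The expressions come from slide 6; the table itself is
# an illustrative assumption.
DEICTIC_OFFSETS = {
    "today": 0,
    "yesterday": -1,
    "three weeks ago": -21,
}


def resolve_deictic(expression: str, utterance: date) -> str:
    """Return the ISO 8601 date denoted by a deictic expression."""
    offset = DEICTIC_OFFSETS[expression.lower()]
    return (utterance + timedelta(days=offset)).isoformat()


def resolve_last_weekday(weekday: int, utterance: date) -> str:
    """Resolve 'last <weekday>' (Monday=0 ... Sunday=6) to the most recent
    such day strictly before the utterance date (an assumed reading)."""
    days_back = (utterance.weekday() - weekday - 1) % 7 + 1
    return (utterance - timedelta(days=days_back)).isoformat()


if __name__ == "__main__":
    talk_day = date(2012, 2, 15)  # date of the presentation
    print(resolve_deictic("yesterday", talk_day))
    print(resolve_last_weekday(3, talk_day))  # 3 = Thursday
```

A fully-qualified expression such as "March 15, 2001" needs no `utterance` argument at all, whereas an anaphoric one would need the previously evoked timex in place of the utterance date.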
  • 7. why? ■ user’s perspective ● temporal aspects of events and entities provide a natural mechanism for organising information. ■ machine’s perspective ● improvements in ▶ question answering, summarisation, browsing
  • 8. how? ■ annotation ● recognition ▶ automatically detect and delimit expressions ▶ mostly machine-learning techniques ● normalisation ▶ assign attribute values to all the recognised expressions ▶ using a shared and formal format (standard?) ▶ mostly rule-based techniques ■ reasoning or searching
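The two annotation steps on slide 8 can be sketched for one narrow case, simple durations such as "two days" or "five years". This is a deliberately tiny rule-based illustration and not the approach of the presentation, which notes that recognition is mostly done with machine learning. The word lists, the pattern, and the function names are assumptions; the output follows the ISO 8601 duration notation.

```python
import re

# Illustrative, deliberately incomplete vocabularies (assumptions).
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
UNIT_CODES = {"day": "D", "week": "W", "month": "M", "year": "Y"}

DURATION = re.compile(
    r"\b(?P<count>" + "|".join(NUMBER_WORDS) + r")\s+"
    r"(?P<unit>" + "|".join(UNIT_CODES) + r")s?\b",
    re.IGNORECASE,
)


def recognise(text):
    """Recognition: detect and delimit durations as (start, end, surface)."""
    return [(m.start(), m.end(), m.group(0)) for m in DURATION.finditer(text)]


def normalise(surface):
    """Normalisation: map a recognised duration to an ISO 8601 value."""
    m = DURATION.fullmatch(surface)
    if m is None:
        raise ValueError(f"not a supported duration: {surface!r}")
    count = NUMBER_WORDS[m.group("count").lower()]
    unit = UNIT_CODES[m.group("unit").lower()]
    return f"P{count}{unit}"


if __name__ == "__main__":
    for start, end, surface in recognise("It took two days, not five years."):
        print(start, end, surface, normalise(surface))
```

Keeping the two functions separate mirrors the slide: the recogniser could be swapped for a learned model without touching the rule-based normaliser.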
  • 9. timex forms [1] ■ time or date references ● 11pm, February 14th, 2005 ■ time references that anchor on another time ● one hour after midnight, two weeks before Christmas ■ durations ● few months, two days, five years ■ recurring times ● every third month, twice in the hour
  • 10. timex forms [1] ■ context-dependent times ● today, last year ■ vague references ● somewhere in the middle of June, the near future ■ times indicated by an event ● the day S. Berlusconi resigned ▶ an event is considered a cover term for situations that happen or occur
    [1] J. Poveda, M. Surdeanu, and J. Turmo, “An analysis of Bootstrapping for the Recognition of Temporal Expressions”, 2009
  • 11. timeline [figure: timeline from 2000 to 2013; the exact position of each item was lost in extraction]
    ■ resources and evaluations: TimeML (standard), TimeBank (corpus), ACE-2004 dev & eval (TERN2004 corpus), TempEval Task#15 (in SemEval07), TempEval-2 Task#13 (in SemEval10), TempEval-3 Task#1 (in SemEval13)
    ■ approaches: hand grammar approach (rule-based); SVM, Maximum Entropy classifier, Conditional Random Fields, Markov logic network (machine learning)
    ■ scores shown on the TERN2004 corpus: 85%, 87.8%, 90.7%
  • 12. standards ■ “the nice thing about standards is, there are so many to choose from” by Andrew S. Tanenbaum ● TimeML ● DAML-Time ● TIDES ● ACE-TERN
  • 13. standards ■ there’s a tension between ● flexibility and efficiency ● usability and flexibility ● complexity and spreadability ● flexibility and agreement
  • 14. about the spreadability
  • 15. about the agreement
    TimeML tag   agreement
    TIMEX3       0.83
    SIGNAL       0.77
    EVENT        0.78
    ALINK        0.81
    SLINK        0.85
    TLINK        0.55
    Source: http://timeml.org/site/timebank/documentation-1.2.html
  • 16. example: raw text That means Unisys must pay about $100 million in interest every quarter, on top of $27 million in dividends on preferred stock. Source: TRIOS TimeBank v.0.1
  • 17. example: recognition That means Unisys must <ev>pay</ev> about $100 million in interest <te>every quarter</te>, on top of $27 million in dividends on preferred stock. Source: TRIOS TimeBank v.0.1
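To train a sequence labeller on recognition output like slide 17, the inline markup is commonly converted into one label per token. The sketch below uses BIO encoding, which is a widespread convention but an assumption here: the presentation does not say which encoding is used. The function name, the label names, and the whitespace tokenisation are illustrative only.

```python
import re

# The recognition example from slide 17.
TAGGED = (
    "That means Unisys must <ev>pay</ev> about $100 million in interest "
    "<te>every quarter</te>, on top of $27 million in dividends on "
    "preferred stock."
)


def to_bio(tagged, tag="te"):
    """Turn inline <tag>...</tag> markup into (token, label) pairs."""
    # Drop every other inline tag (e.g. <ev>) but keep the text inside it.
    other_tags = re.compile(r"</?(?!" + re.escape(tag) + r">)\w+>")
    text = other_tags.sub("", tagged)
    pairs = []
    # The capturing group makes re.split keep the tagged chunks.
    for chunk in re.split(rf"(<{tag}>.*?</{tag}>)", text):
        inside = chunk.startswith(f"<{tag}>")
        if inside:
            # Strip "<tag>" and "</tag>" from the chunk.
            chunk = chunk[len(tag) + 2 : -(len(tag) + 3)]
        # Naive whitespace tokenisation (an assumption, not a real tokeniser).
        for i, token in enumerate(chunk.split()):
            if inside:
                label = "B-TIMEX" if i == 0 else "I-TIMEX"
            else:
                label = "O"
            pairs.append((token, label))
    return pairs


if __name__ == "__main__":
    for token, label in to_bio(TAGGED):
        print(f"{token}\t{label}")
```

Calling `to_bio(TAGGED, tag="ev")` would label the event instead, so the same helper covers both kinds of markup on the slide.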
  • 18. example: normalisation
    That means Unisys must <EVENT eid="e110" mainevent="YES" class="OCCURRENCE" stem="pay" tense="NONE" aspect="NONE" polarity="POS" pos="VERB">pay</EVENT> about $100 million in interest <TIMEX3 tid="t256" type="SET" value="P1Q" temporalFunction="false" functionInDocument="NONE" quant="every">every quarter</TIMEX3>, on top of $27 million in dividends on preferred stock.
    <TLINK lid="l32" relType="BEFORE" relatedToEvent="e110" eventID="e107"/>
    <TLINK lid="l26" relType="OVERLAP" eventID="e110" relatedToTime="t256"/>
    Source: TRIOS TimeBank v.0.1
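Because the normalised annotation on slide 18 is XML, the links between events and temporal expressions can be read back with a standard parser. The sketch below uses Python's `xml.etree.ElementTree` on an abbreviated copy of the slide's sentence. The `<s>` wrapper element and the function name are assumptions added so that the fragment parses as a single document, and several attributes are omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the annotation on slide 18, wrapped in a hypothetical
# <s> root element so that it is a well-formed XML document.
FRAGMENT = (
    "<s>That means Unisys must "
    '<EVENT eid="e110" class="OCCURRENCE" stem="pay">pay</EVENT> '
    "about $100 million in interest "
    '<TIMEX3 tid="t256" type="SET" value="P1Q" quant="every">'
    "every quarter</TIMEX3>"
    ", on top of $27 million in dividends on preferred stock. "
    '<TLINK lid="l26" relType="OVERLAP" eventID="e110" relatedToTime="t256"/>'
    "</s>"
)


def event_time_links(xml_text):
    """Return (event text, relation, timex text, timex value) tuples."""
    root = ET.fromstring(xml_text)
    events = {e.get("eid"): e for e in root.iter("EVENT")}
    timexes = {t.get("tid"): t for t in root.iter("TIMEX3")}
    links = []
    for link in root.iter("TLINK"):
        event = events.get(link.get("eventID"))
        timex = timexes.get(link.get("relatedToTime"))
        # Skip links whose endpoints are not in this fragment; compare with
        # None because an Element without children is falsy.
        if event is not None and timex is not None:
            links.append(
                (event.text, link.get("relType"), timex.text, timex.get("value"))
            )
    return links


if __name__ == "__main__":
    print(event_time_links(FRAGMENT))
```

The event-to-event link on the slide (`l32`) is left out of the fragment because its second event, `e107`, lies outside the quoted sentence.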
  • 19. considerations ■ specialised linguistic approaches do not pay off ● machine learning techniques usually perform better ■ scarcity of pre-annotated corpora ● manual corpus annotation is very tricky ● partially solved with TempEval-3 (2013) ▶ 1M-word corpus automatically annotated by TRIOS ■ vibrant area in the biomedical domain
  • 20. [figure: stacked bar chart of results per year, 2000 to 2012, for “temporal expressions” and for “temporal expressions” AND “clinical”; per-year values are not reliably recoverable from the extraction] Source: Google Scholar (last update 09/02/2012)
  • 21. [figure: the same two queries as percentage shares per year, 2000 to 2012; the labelled share of “clinical” results ranges from 5% to 21%] Source: Google Scholar (last update 09/02/2012)
  • 22. considerations ■ the rule-based approach will never die ● CRF and MLN are hybridisations of machine learning ■ better performance means clever decomposition ● how to divide the general problem into sub-problems
  • 23. my to-do list ■ collect a corpus in the clinical field ■ study novel machine learning approaches ● maximum likelihood, logistic regression, CRF, MLN ■ implement a prototype ● Python or MATLAB [progress bar: 12 days elapsed, 18 days remaining]