How to Evaluate
Controlled Natural Languages
Tobias Kuhn
Workshop on Controlled Natural Language (CNL 2009),
Marettimo, Italy
8 June 2009
Tobias Kuhn, CNL 2009, Marettimo, Italy, 8 June 2009 2
Off Topic: AceWiki
Off Topic: ACE Editor
Introduction
• (Formal) Controlled Natural Languages (CNL) are designed to be more understandable and more usable by humans than common formal languages.
• But how do we know whether this goal is achieved?
• The only way to find out: User Studies!
Evaluation of CNL Tools
• Many user studies have been performed to evaluate tools that use CNL, e.g. [1].
• Hard to determine how much the CNL contributes to the understandability
• Hard to compare CNLs to other formal languages because different languages usually require different tools
[1] Abraham Bernstein, Esther Kaufmann. GINO – A Guided Input Natural Language Ontology Editor. ISWC 2006.
Tool-Independent Evaluation of CNLs
• Only very few evaluations have been performed that test a CNL independently of a particular tool.
• [2] presents a paraphrase-based approach: The subjects of an experiment receive a CNL statement and have to choose from four paraphrases in natural English.
[2] Glen Hart, Martina Johnson, Catherine Dolbear. Rabbit: Developing a Controlled Natural Language for Authoring Ontologies. ESWC 2008.
Challenges with
Paraphrase-based Approaches
• Ambiguity of natural language
  - One has to make sure that the subjects understand the natural language paraphrases in the right way.
• Does good performance imply understanding?
  - The formal statement and the paraphrases tend to look very similar if both rely on English.
  - One has to rule out that the subjects do the right thing without understanding the statements:
    - Following some syntactic patterns
    - Misunderstanding both statement and paraphrase in the same way
My Approach: Ontograph Framework
• Using a simple graphical notation: Ontographs
• Designed to be used in experiments
• Idea: Let the subjects perform tasks on the basis of situations depicted by diagrams (i.e. Ontographs).
• Assumption: Ontographs are very easy to understand.
✔ Every present is bought by John.
✘ John buys at most one present.
Ontographs
• Ontographs consist of a legend and a mini world.
• The legend introduces types and relations.
• The mini world shows the existing individuals, their types, and their relations.
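The legend/mini-world split can be sketched as a small data structure. This is a minimal illustration and not part of the Ontograph resources: the class name `Ontograph`, its fields, and the individuals `John`, `Mary`, `p1`, and `p2` are all hypothetical, chosen so that the two example statements from the previous slide come out as marked there.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class Ontograph:
    """Hypothetical container: a legend plus a mini world."""
    types: Set[str]                       # legend: names of types
    relations: Set[str]                   # legend: names of relations
    individuals: Dict[str, Set[str]]      # mini world: individual -> its types
    facts: Set[Tuple[str, str, str]] = field(default_factory=set)  # (subject, relation, object)

    def of_type(self, type_name: str) -> Set[str]:
        """All individuals that have the given type."""
        return {i for i, ts in self.individuals.items() if type_name in ts}

    def related(self, subject: str, relation: str) -> Set[str]:
        """All objects the subject is linked to by the given relation."""
        return {o for s, r, o in self.facts if s == subject and r == relation}


# Hypothetical mini world: John buys both presents, Mary buys nothing.
world = Ontograph(
    types={"person", "present"},
    relations={"buys"},
    individuals={
        "John": {"person"},
        "Mary": {"person"},
        "p1": {"present"},
        "p2": {"present"},
    },
    facts={("John", "buys", "p1"), ("John", "buys", "p2")},
)

presents = world.of_type("present")
bought_by_john = world.related("John", "buys")

# "Every present is bought by John."
every_present_bought_by_john = presents <= bought_by_john

# "John buys at most one present."
john_buys_at_most_one = len(bought_by_john & presents) <= 1

print(every_present_bought_by_john, john_buys_at_most_one)
```

Because the mini world lists every individual and every relation instance, each statement can be checked directly against it.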
Ontographs: Properties
• Formal language
• Intuitive graphical icons
• No partial knowledge
• No explicit negation
• No generalization
• Large syntactical distance to textual languages
Experiment: Goal
• The goal of the experiment was to find out whether controlled natural languages are more understandable than comparable common formal languages.
• CNL: Attempto Controlled English (ACE)
• Comparable language: Manchester OWL Syntax [3]:
  »The syntax, which is known as the Manchester OWL Syntax, was developed in response to a demand from a wide range of users, who do not have a Description Logic background, for a “less logician like” syntax. The Manchester OWL Syntax is derived from the OWL Abstract Syntax, but is less verbose and minimises the use of brackets. This means that it is quick and easy to read and write.«
• For a direct comparison, we defined a slightly modified version: MLL (Manchester-like language)
[3] Matthew Horridge, Nick Drummond, John Goodwin, Alan Rector, Robert Stevens, Hai H. Wang. The Manchester OWL Syntax. OWLED 2006.
ACE versus MLL
ACE                                                                     | MLL
------------------------------------------------------------------------|---------------------------------------------------
Bill is not a golfer.                                                   | Bill HasType not golfer
No golfer is a woman.                                                   | golfer DisjointWith woman
Nobody who is a man or who is a golfer is an officer and is a traveler. | man or golfer SubTypeOf not (officer and traveler)
Every man buys a present.                                               | man SubTypeOf buys some present
Lisa helps at most 1 person.                                            | Lisa HasType helps max 1 person
If X helps Y then Y does not love X.                                    | helps DisjointWith inverse loves
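For readers who want a neutral reference point, the statement pairs above can be glossed in first-order logic. The formulas below are the usual textbook readings of such sentences, added here as an illustration; they do not appear on the slides, and the predicate names simply reuse the words from the examples.

```latex
\begin{align*}
\text{Bill is not a golfer.} \quad
  & \neg\,\mathit{golfer}(\mathit{Bill}) \\
\text{No golfer is a woman.} \quad
  & \forall x\,\bigl(\mathit{golfer}(x) \rightarrow \neg\,\mathit{woman}(x)\bigr) \\
\text{Nobody who is a man or \dots} \quad
  & \forall x\,\bigl((\mathit{man}(x) \lor \mathit{golfer}(x)) \rightarrow
    \neg(\mathit{officer}(x) \land \mathit{traveler}(x))\bigr) \\
\text{Every man buys a present.} \quad
  & \forall x\,\bigl(\mathit{man}(x) \rightarrow
    \exists y\,(\mathit{present}(y) \land \mathit{buys}(x,y))\bigr) \\
\text{Lisa helps at most 1 person.} \quad
  & \bigl|\{\, y : \mathit{person}(y) \land \mathit{helps}(\mathit{Lisa},y) \,\}\bigr| \le 1 \\
\text{If X helps Y then Y does not love X.} \quad
  & \forall x\,\forall y\,\bigl(\mathit{helps}(x,y) \rightarrow \neg\,\mathit{loves}(y,x)\bigr)
\end{align*}
```

Both columns of the table are meant to express the same content, so each row has one formula.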
Learning Time
[Figure: understanding as a function of learning time (axis marks: 0, 20 min, 4 h, 2 weeks, 1 year), with one curve for a controlled natural language and one for a common formal language.]
4 Series of Ontographs
[Figure: the four ontograph series, numbered 1 to 4.]
Statements in ACE and MLL
for each Ontograph
Experiment: Subjects
• Requirements:
  - Students, but no computer scientists or logicians
  - At least intermediate level in written German and English
• Recruitment of 64 subjects:
  - Broad variety of fields of study
  - On average 22 years old
  - 42% female, 58% male
• The subjects were equally distributed into eight groups: (Series 1, Series 2, Series 3, Series 4) x (ACE first, MLL first)
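The eight groups are the cross product of four series and two language orders, which gives 64 / 8 = 8 subjects per group. A minimal sketch of that design follows. The round-robin assignment is an assumption made only for illustration, since the slides do not say how subjects were allocated, and the names `SERIES`, `ORDERS`, and `assignment` are hypothetical.

```python
from itertools import product

SERIES = [1, 2, 3, 4]
ORDERS = [("ACE", "MLL"), ("MLL", "ACE")]  # which language is learned and tested first
N_SUBJECTS = 64

# 4 series x 2 orders = 8 groups
groups = list(product(SERIES, ORDERS))

# Assumption: subjects are dealt out round-robin; the actual allocation is not documented here.
assignment = {subject: groups[subject % len(groups)] for subject in range(N_SUBJECTS)}

group_sizes = {group: 0 for group in groups}
for group in assignment.values():
    group_sizes[group] += 1

for (series, order), size in group_sizes.items():
    print(f"series {series}, {order[0]} first: {size} subjects")
```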
Experiment: Procedure
1. Subjects read an instruction sheet that explains the procedure, the pay-out, and the ontograph notation.
2. The subjects answer control questions in order to check whether they understood the instructions.
3. During a learning phase that lasts at most 16 minutes, the subjects read a language description sheet (of either ACE or MLL) and see on the screen an ontograph together with 10 statements marked as “true” and 10 marked as “false”.
4. During the test phase that lasts at most 6 minutes, the subjects see another ontograph on the screen and have to classify 10 statements as “true”, “false”, or “don't know”.
5. Steps 3 and 4 are repeated with the other language.
6. The subjects fill out a questionnaire.
Language Instruction Sheets:
ACE versus MLL
Experiment: Learning Phase
Experiment: Testing Phase
Experiment: Pay-out
• Every subject got 20.00 CHF for participation.
• Furthermore, they got 0.60 CHF for every correctly classified statement and 0.30 CHF for every “don't know”.
• Thus, every subject earned between 20 and 32 CHF.
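The pay-out rule is simple arithmetic, sketched below. The function name `payout_chf` and the constants are hypothetical. The sketch assumes 20 test statements per subject (10 per language, two languages), which is what makes the stated upper bound of 32 CHF come out: 20 + 20 x 0.60 = 32. Amounts are computed in Rappen (hundredths of a franc) to avoid floating-point rounding.

```python
BASE_RAPPEN = 2000          # 20.00 CHF for participation
CORRECT_RAPPEN = 60         # 0.60 CHF per correctly classified statement
DONT_KNOW_RAPPEN = 30       # 0.30 CHF per "don't know"
TOTAL_STATEMENTS = 10 * 2   # assumption: 10 test statements for each of the 2 languages


def payout_chf(correct: int, dont_know: int) -> float:
    """Pay-out in CHF for one subject; wrong answers earn nothing."""
    if correct < 0 or dont_know < 0 or correct + dont_know > TOTAL_STATEMENTS:
        raise ValueError("answer counts must be non-negative and sum to at most 20")
    rappen = BASE_RAPPEN + CORRECT_RAPPEN * correct + DONT_KNOW_RAPPEN * dont_know
    return rappen / 100


print(payout_chf(0, 0))    # everything wrong
print(payout_chf(20, 0))   # everything right
print(payout_chf(0, 20))   # "don't know" throughout
```

Under this rule a "don't know" pays half as much as a correct answer, and a wrong answer pays nothing.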
Evaluation: Ontograph Framework
• Did the Ontograph framework work? Answer: Yes!
  - The subjects performed very well in the experiment (on average 8.9 correct classifications out of 10).
  - They found the ontographs very easy to understand (questionnaire score of 2.7, where 0 is “very hard to understand” and 3 is “very easy to understand”).
Evaluation: ACE vs MLL
• Which language performed better?
• Answer: ACE was understood better, within shorter time, and was liked better by the subjects than MLL!
p-values obtained by Wilcoxon signed-rank test: 0.003421, 1.493e-10, 3.24e-07
Evaluation: First/Second Language
Evaluation: Series 1/2/3/4
Evaluation: Regression
• Regression on the 128 test phase results with the normalized classification score (-5 to 5) as the dependent variable
• Baseline: testing MLL as second language on series 1, male subject of 18 years with good (but not very good) English skills

               |                Robust
sc_norm        |      Coef.  Std. Err.       t   P>|t|
---------------|--------------------------------------
ace            |   .5156250   .1800104    2.86   0.006
first_lang     |  -.2187500   .1800104   -1.22   0.229
series_2       |  -.4802784   .3371105   -1.42   0.159
series_3       |  -.2776878   .3485605   -0.80   0.429
series_4       |  -.8795029   .5219091   -1.69   0.097
female         |   .1413201   .2982032    0.47   0.637
age_above_18   |  -.0724091   .0296851   -2.44   0.018
very_good_engl |   .2031366   .2967447    0.68   0.496
_cons          |   4.302329   .3251371   13.23   0.000
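To read the table, a predicted normalized score is the constant plus the coefficients of whichever regressors apply. The sketch below only redoes that addition with the coefficients listed above. It assumes that all regressors except `age_above_18` are 0/1 dummies and that `age_above_18` counts years above 18, which the baseline description suggests but the slide does not state. The function name `predict` is hypothetical.

```python
# Coefficients copied from the regression table above.
COEF = {
    "ace": 0.5156250,
    "first_lang": -0.2187500,
    "series_2": -0.4802784,
    "series_3": -0.2776878,
    "series_4": -0.8795029,
    "female": 0.1413201,
    "age_above_18": -0.0724091,
    "very_good_engl": 0.2031366,
}
CONSTANT = 4.302329  # _cons: the baseline case described on the slide


def predict(**regressors: float) -> float:
    """Predicted normalized score (-5 to 5) for the given regressor values."""
    unknown = set(regressors) - set(COEF)
    if unknown:
        raise KeyError(f"unknown regressors: {sorted(unknown)}")
    return CONSTANT + sum(COEF[name] * value for name, value in regressors.items())


baseline = predict()                 # MLL, second language, series 1
with_ace = predict(ace=1)            # same subject and series, but tested on ACE
older_subject = predict(ace=1, age_above_18=4)  # assumption: a 22-year-old subject

print(f"{baseline:.3f} {with_ace:.3f} {older_subject:.3f}")
```

The gap between `with_ace` and `baseline` is exactly the `ace` coefficient, about half a point on the 10-point scale.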
Conclusions
• The Ontograph framework seems to be suitable for understandability experiments for CNLs.
• ACE is understood significantly better than MLL.
• There is no reason to believe that another logic syntax (except CNLs) would have performed better than MLL.
• Furthermore, ACE requires significantly less time to be learned and was liked better by the subjects.
Resources for the
Ontograph Framework
• The resources for the Ontograph framework are available freely under a Creative Commons license:
  http://attempto.ifi.uzh.ch/site/docs/ontograph/
Thank you for your attention!
Questions/Discussion
