Constructing an AI knowledge base requires decomposing complex sentences into simplified statements and encoding their concepts. Because this knowledge engineering is costly and complex, we designed an experiment to test whether college students could perform the task using a semantic wiki. The wiki also tracked each student’s progress and provided an integrated environment for our knowledge workers.
In this presentation we discuss the layout of the imported data within the wiki, the user experience throughout the publishing process, the technologies behind the wiki app, and the preliminary results of the experiment. The semantic wiki web application used the following technologies:
• Semantic MediaWiki Plus, which provides an object-oriented framework for semi-structured data.
• JavaScript, HTML5, and AJAX services for graphing triples and entities within the project and across interconnected services.
• Faceted browsing and semantic pivoting among related entities: textbook paragraphs, sentences, concepts, and sentence encodings.
• Virtuoso integration with the knowledge base.
2. Today we will be talking about…
• Populating a Symbolic AI – Aura
• The spiraling cost structure for encoding data into a symbolic AI
• How do we bring low-cost domain experts into the process?
• Creating a Semantic MediaWiki Installation
• Importing a textbook into Semantic MediaWiki and marking up pages with properties
• Customizing the installation for annotating textbook sentences
4. 1) Determining Relevance -- 2% time
Highlighting, Diagram Analysis, QA Check
Status Labeling: Relevant, Irrelevant (Closed)
2) Reaching Consensus -- 14% time
Universal Truth Authoring, Concept Chosen, QA Check
3) Encoding Planning -- 35% time
Group Common UTs, ID KR/KE Issues, ID Already Encoded, Write How to Encode
Pre-Planning, QA Check
Status Labeling: Encoding Complete, KR Issue (Closed)
4) Encoding -- 10% time
Encode, File JIRA Issues
QA Check
Status Labeling: Encoding Complete, KE Issue
5) Key Term Review -- 25% time
KR Evaluated by Modeling Expert and Biologist, Encoder Makes Changes
KR Evaluated by Modeling Expert and Biologist
QA Check
6) Question-Based Testing -- 14% time
Use Minimal Test Suite, Reasoning JIRA Issues Filed, Encoder Fills KB Gaps
QA Check with Screenshots of “Passing” Comparison and Relationship Questions
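The six steps above can be tabulated with their share of encoding time. This is a minimal Python sketch; the `Step` type and helper names are hypothetical, and the only data are the step names and percentages from the slide. It lists the steps in order and checks that the shares cover the whole process:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    number: int
    name: str
    percent_time: int  # share of total encoding time, as given on the slide


# Step names and percentages transcribed from the encoding-process slide.
STEPS = [
    Step(1, "Determining Relevance", 2),
    Step(2, "Reaching Consensus", 14),
    Step(3, "Encoding Planning", 35),
    Step(4, "Encoding", 10),
    Step(5, "Key Term Review", 25),
    Step(6, "Question-Based Testing", 14),
]


def total_percent(steps):
    """Sum the time shares; for the slide's figures this is 100."""
    return sum(s.percent_time for s in steps)


def most_expensive(steps):
    """Return the step that consumes the largest share of encoding time."""
    return max(steps, key=lambda s: s.percent_time)


if __name__ == "__main__":
    for s in sorted(STEPS, key=lambda s: s.number):
        print(f"{s.number}) {s.name}: {s.percent_time}%")
    print("total:", total_percent(STEPS))
```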
5. -- How to choose a concept given a UT?
-- How to produce UTs from sentences?
[Diagram: Chapter, Sentence, UT, CMap, and KBBook nodes]
2) Reaching Consensus -- 14% time
Universal Truth Authoring, Concept Chosen
6. What is a Universal Truth?
• “A Universal Truth is a stand-alone, unambiguous declarative sentence about a textbook topic that expresses a single fact that is universally true”
- AURA Knowledge Engineering Manual
• “Water is composed of two Hydrogen element molecules and one Oxygen element molecule with the chemical formula H2O”
• Water is composed of hydrogen
• Water is composed of oxygen
• Hydrogen is an element
• Oxygen is an element
• Water has the chemical formula H2O
• Does “Water is a compound” count?
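The water example can also be read as a set of subject-predicate-object triples, one fact per triple, which is the kind of structure the wiki graphs. A minimal sketch, assuming hypothetical predicate names (`composed_of`, `is_a`, `has_chemical_formula`); the talk does not show AURA's actual ontology terms:

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str  # hypothetical predicate names, not AURA's ontology
    obj: str

    def as_sentence(self) -> str:
        """Render the triple back into a short declarative UT."""
        if self.predicate == "is_a":
            # Pick the indefinite article from the object's first letter.
            article = "an" if self.obj[0].lower() in "aeiou" else "a"
            return f"{self.subject} is {article} {self.obj}"
        verbs = {
            "composed_of": "is composed of",
            "has_chemical_formula": "has the chemical formula",
        }
        return f"{self.subject} {verbs[self.predicate]} {self.obj}"


# The five UTs from the water example, one fact per triple.
WATER_UTS = [
    Triple("Water", "composed_of", "hydrogen"),
    Triple("Water", "composed_of", "oxygen"),
    Triple("Hydrogen", "is_a", "element"),
    Triple("Oxygen", "is_a", "element"),
    Triple("Water", "has_chemical_formula", "H2O"),
]


def facts_about(subject, triples):
    """Return every UT whose subject matches, e.g. all facts about water."""
    return [t for t in triples if t.subject == subject]


if __name__ == "__main__":
    for t in WATER_UTS:
        print(t.as_sentence())
```

Under this representation, “Water is a compound” would be one more triple, `Triple("Water", "is_a", "compound")`, even though the source sentence never states it.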
7. Project Goals
• “Crowd Source Universal Truth Authoring”
• Can Domain Experts Author Useful Universal Truths?
• Can We Speed Up Encoding a Textbook with Input from Domain Experts?
• Can We Create a UT Authoring Portal for Multiple Textbooks?
• Can Existing Social Networks Provide Domain Experts Capable of UT Authoring?
• Could Gamification be Applied to an Existing Portal to Add Non-Domain Experts?
8. About the Domain Experts
• Students attending the University of Washington or recent graduates
• All have a background in biology or life sciences
• Native English speakers with excellent writing skills
• Each student read the chapters in question and was provided with an iPad running the Inquire application
• Students were paid for their time
10. Storing a Textbook in Aura Wiki
• The wiki was created with instances of page types composed of textbook sentences
• Sentence
• Paragraph
• Section
• Chapter
• Book
• The wiki also has imported resources to aid in the UT authoring process
• Glossary Pages
• Taxonomy Concepts
• Universal Truths – Human and Machine
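The page hierarchy above can be sketched as an importer that flattens a nested textbook into one wiki page per unit, each pointing at its parent. This assumes a hypothetical title scheme (`Book/chapter/section/paragraph/sentence`, numbered by position) and a nested-list input; the talk does not describe the actual import format:

```python
from dataclasses import dataclass
from typing import List, Optional

# Page types from the slide, ordered from the top of the hierarchy down.
PAGE_TYPES = ["Book", "Chapter", "Section", "Paragraph", "Sentence"]


@dataclass
class Page:
    title: str             # hypothetical positional title, e.g. "Bio/1/2/3/4"
    category: str          # one of PAGE_TYPES
    parent: Optional[str]  # title of the containing page; None for the book
    text: Optional[str] = None  # sentence text, only set on Sentence pages


def import_book(book_title: str, chapters) -> List[Page]:
    """Flatten chapters -> sections -> paragraphs -> sentences into pages.

    Each chapter is a list of sections, each section a list of paragraphs,
    and each paragraph a list of sentence strings.
    """
    pages = [Page(book_title, "Book", None)]

    def walk(nodes, depth, parent_title):
        for i, child in enumerate(nodes, start=1):
            title = f"{parent_title}/{i}"
            is_sentence = depth == len(PAGE_TYPES) - 1
            pages.append(Page(title, PAGE_TYPES[depth], parent_title,
                              text=child if is_sentence else None))
            if not is_sentence:
                walk(child, depth + 1, title)

    walk(chapters, 1, book_title)
    return pages
```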
15. Authoring Universal Truths
• Semantic Wiki Properties
• Each page has a unique id for the table of contents element
• The sentence itself is an element
• Elements pointing to the previous and next sentences
• Elements pointing to top-level entities
• Users can update the sentence’s relevancy and encoding status
Sentence and Context View
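The sentence properties above (a unique id, links to the previous and next sentences, a top-level entity, and user-editable relevancy and encoding status) map naturally onto a small record type. A minimal sketch, assuming the status labels from the encoding-process slide; the `Unreviewed` and `Not Started` defaults and all names are hypothetical. The `context` helper walks the previous/next links to gather neighboring sentences, similar to what the Sentence and Context View displays:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Relevancy(Enum):
    UNREVIEWED = "Unreviewed"  # hypothetical default before a user decides
    RELEVANT = "Relevant"
    IRRELEVANT = "Irrelevant"


class EncodingStatus(Enum):
    NOT_STARTED = "Not Started"  # hypothetical default
    ENCODING_COMPLETE = "Encoding Complete"
    KR_ISSUE = "KR Issue"
    KE_ISSUE = "KE Issue"


@dataclass
class Sentence:
    toc_id: str              # unique id for the table-of-contents element
    text: str
    prev_id: Optional[str]   # previous sentence; None at the start
    next_id: Optional[str]   # next sentence; None at the end
    chapter: str             # a top-level entity the sentence points to
    relevancy: Relevancy = Relevancy.UNREVIEWED
    encoding: EncodingStatus = EncodingStatus.NOT_STARTED


def context(sentences, toc_id, radius=1):
    """Follow prev/next links to collect up to `radius` neighbors per side."""
    by_id = {s.toc_id: s for s in sentences}
    center = by_id[toc_id]
    before, cur = [], center
    for _ in range(radius):
        if cur.prev_id is None:
            break
        cur = by_id[cur.prev_id]
        before.insert(0, cur)
    after, cur = [], center
    for _ in range(radius):
        if cur.next_id is None:
            break
        cur = by_id[cur.next_id]
        after.append(cur)
    return before + [center] + after
```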
17. Authoring Universal Truths
• Semantic Wiki Properties
• Reference sentence
• The universal truth text
• UT concept – AURA provided
• UT context – AURA provided
• Accuracy rating for the universal truth
• Date created, approved, and when ratings were applied
Universal Truth
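A Universal Truth page carrying the properties above could be written in Semantic MediaWiki's inline annotation syntax, `[[Property::Value]]`. A minimal sketch; the property names, the `Universal Truth` category, and the `ut_wikitext` helper are hypothetical, not the wiki's actual schema:

```python
from datetime import date
from typing import Optional


def ut_wikitext(reference_sentence: str, text: str, concept: str, context: str,
                rating: Optional[int] = None,
                created: Optional[date] = None) -> str:
    """Render a Universal Truth page body as SMW property annotations.

    All property names and the category below are hypothetical.
    """
    for v in (reference_sentence, text, concept, context):
        # These sequences would break, or change the meaning of, the markup.
        if any(tok in v for tok in ("[[", "]]", "|", "::")):
            raise ValueError(f"value cannot be embedded in an annotation: {v!r}")
    lines = [
        f"[[Reference sentence::{reference_sentence}]]",
        f"[[UT text::{text}]]",
        f"[[UT concept::{concept}]]",   # concept chosen from AURA's taxonomy
        f"[[UT context::{context}]]",
    ]
    if rating is not None:
        lines.append(f"[[Accuracy rating::{rating}]]")
    if created is not None:
        lines.append(f"[[Date created::{created.isoformat()}]]")
    lines.append("[[Category:Universal Truth]]")
    return "\n".join(lines)
```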
19. Navigating Aura Wiki
• Unregistered and Registered Main Pages
• Unregistered users are locked out
• Registration is turned off for anonymous users
• Unique Extensions Proposed for Guided Authoring
20. How to View a Textbook Paragraph?
Auto-create triple-format UTs from a sentence?
21. How to View a Universal Truth Page?
How do we unify versions of the page for export to AURA?
25. Domain Expert Authoring Statistics
• 6 University of Washington students participated in the test
• Each received 45 minutes of training on creating Universal Truths
• Each was given 1 hour and a pre-selected list of sentences on a user page to complete
• The groups generated over 100 Universal Truths each session
• They averaged 37 Universal Truths an hour per student
• Students were frequently observed using their domain experience to construct UTs not specifically worded in the source sentence (e.g., “Water is a compound”)
27. Project Goals
• “Crowd Source Universal Truth Authoring”
• Can Domain Experts Author Useful Universal Truths?
• Can We Speed Up Encoding a Textbook with Input from Domain Experts?
28. Project Goals
• “Crowd Source Universal Truth Authoring”
• Can We Create a UT Authoring Portal for Multiple Textbooks?
29. Project Goals
• “Crowd Source Universal Truth Authoring”
• Can Existing Social Networks Provide Domain Experts Capable of UT Authoring?
• Could Gamification be Applied to an Existing Portal to Add Non-Domain Experts?
Hello. Looking over the program, I’m aware this is a pretty competitive hour for talks… we’re doing this right after lunch… going up against a Google talk… and with a cryptic title about artificial intelligence engines and a Semantic MediaWiki installation.
This talk covers an experiment we ran during the last six months of 2012, involving a program to populate a symbolic AI and our approach to lowering the costs of encoding a textbook into its knowledge base. We’ll expand on the process for adding new data to the knowledge base, and on our attempt to lower that cost structure by having domain experts work in an installation of Semantic MediaWiki created specifically to populate AURA.
So let’s begin with AURA. AURA itself is pretty large… so I chose one screenshot to include on one slide. In fact, this screenshot doesn’t show AURA doing anything beyond one screen: the one used to populate the knowledge base and to debug a question into an explanation via concept maps. This screen quickly became a major choke point for populating the concept maps that make up the underlying knowledge base. In fact, adding new concepts and relations to AURA got exponentially more expensive and time-consuming as more chapters were encoded. This is a good screenshot because you can see AURA failing to answer a question because it needs more data encoded. At the third arrow, AURA is reporting that the group of CMaps needed to answer the question “What are the parts of the Eukaryotic Cell” does not exist. So it’s time to start the process of adding these concept maps from the textbook…
A process that looks roughly like this… I don’t want to dwell too long on all the steps shown here, but as you can see, adding even trivial data to the knowledge base is quite an extensive process. This is the work process of several groups, from knowledge engineers to SRI research groups to biologists and teachers. When project management was asked which step needed focus to speed up data population, it came down to number 2… actually, the first part of #2…
We cared about this step. Authoring the “Universal Truth” portion of this process was time-consuming, expensive, and getting more difficult as the knowledge base grew. It required trained biologists, trained educators familiar with the source text, and a knowledge engineering team focused on hiring individuals who could be trained to understand “how” to encode these universal truths. A large part of the experiment was dedicated to training students to recognize a universal truth and to derive universal truths from source sentences. We also created work paths within our Semantic MediaWiki installation specifically to aid in recognizing and constructing Universal Truths.
… and that wasn’t an easy task due to the nature of a “Universal Truth”. – Read definition – Easy enough to understand? I chose a sentence from Wikipedia to demonstrate just how easy this task can get. – Read sentence – Any guesses on how many universal truths lie in that sentence? Well, just at a glance I found 5, and the last one is probably not valid, since it is composed of two truths (water has a chemical formula; H2O is a chemical formula) plus a statement connecting water to H2O.
With all of that in mind and facing a pretty significant problem adding more content to AURA, we devised an experiment with the explicit intent to outsource universal truth authoring to the greatest number of domain experts. This is our “bullet list of pain” thinly veiled as “project goals”…
And finally… with our simple problem complete with simple project goals, we decided on the easiest group of people in the world to schedule – College Students. – Read points – Students attending the University of Washington or recent graduates; all have a background in biology or life sciences; native English speakers with excellent writing skills; each student read the chapters in question and was provided with an iPad running the Inquire application; students were paid for their time.
We designed Aura Wiki as a portal for annotating a textbook with Universal Truths and built it to expand into each aspect of the project, assuming the students passed the current project goal (i.e., one painful bullet point at a time). Here is an example of the entry point to the wiki functioning as a portal, and an early version of the UT authoring page at the sentence level.
We also decided to take on the task of storing and marking up the entire textbook with semantic entities. First, we began at the top level, importing standard table of contents data into a set of wiki pages marked by category. – Read top section point – Then we added the markup, including glossaries, a taxonomy of existing concepts imported from Aura, and existing universal truths imported from the current system as examples.
Frequently deemed the ugliest (and most common) page on the website, it quickly became the focal point for UI/UX improvements as we realized it wasn’t really plausible to hand users random sentences for UT annotation. These pages were originally created as background pages for tracking textbook properties and were not intended to be navigational elements. However, users would often leave the UT authoring page soon after creating their first set of annotations and navigate to the actual textbook table of contents pages, generating these criticisms…
Once the import was complete and we had added the annotation pages, this was the site map structure that emerged: where we intended users to stay and focus; everything the users found and decided to use; a proposed review system for moderators / trusted users; and what was removed to Google Analytics.
-- Add arrows and explain turning on -- First, we had our import sources and the addition of knowledge engineering UTs, including marking up pages with additional semantic properties. The data was normalized for wiki presentation and queries. Then come the wiki portions of Aura Wiki and the import agents that create the textbook pages. Finally, the export and sync agents push and pull UTs to and from AURA.
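One way an export agent could pull Universal Truths out of the wiki is Semantic MediaWiki's `ask` API module (`api.php?action=ask&query=...&format=json`). A minimal sketch, assuming that approach; the talk does not say how the export and sync agents were built, and the category and property names are hypothetical. It only builds the request URL and reads an already-parsed JSON response, so it runs without network access:

```python
from urllib.parse import urlencode


def ask_url(api_base, conditions, printouts, limit=50, offset=0):
    """Build a Semantic MediaWiki `action=ask` API URL.

    `conditions` is an SMW query condition such as "[[Category:Universal Truth]]";
    `printouts` lists properties to return. Names here are hypothetical.
    """
    query = conditions + "".join(f"|?{p}" for p in printouts)
    query += f"|limit={limit}|offset={offset}"  # page through large result sets
    params = {"action": "ask", "query": query, "format": "json"}
    return api_base + "?" + urlencode(params)


def extract_uts(response, prop):
    """Return (page title, values of `prop`) pairs from a parsed ask response."""
    results = response.get("query", {}).get("results", {})
    if not results:  # handle an empty result set
        return []
    return [(title, row.get("printouts", {}).get(prop, []))
            for title, row in results.items()]
```

Check the result layout against the SMW version in use; printout values for page-type properties may come back as objects rather than plain strings.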
After all of the importing, normalization, alignment of wiki semantic properties to AURA’s ontology, and addition of pre-existing Universal Truths, we ended up with a sentence annotation page that looks like this. On this page you can… – read slide – read the sentence, access the sentence context, access neighboring sentences, check and submit relevancy, check and submit authoring status, display existing Universal Truths, and author Universal Truths. And on closer inspection…
Here is the expanded view of the context surrounding a sentence available for UT annotation. Each page has a unique id for the table of contents element, the sentence itself is an element, elements point to the previous and next sentences and to top-level entities, and users can update the sentence’s relevancy and encoding status.
Each sentence has a collection of universal truths, each represented by a wiki page and created inline on the sentence page. Here you’re viewing the expanded editing pane for adding a universal truth, including: the listing of existing universal truths applied to the sentence, the UT authoring block, and two autocomplete boxes for applying additional semantic properties to the universal truth.
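The two autocomplete boxes need fast prefix lookup over a fixed vocabulary, such as the taxonomy concepts imported from AURA. A minimal sketch using a sorted list and binary search (Python's `bisect` module); the class name and sample concepts are hypothetical, and the talk does not say how the wiki's autocomplete was implemented:

```python
import bisect


class ConceptAutocomplete:
    """Case-insensitive prefix lookup over a fixed list of concept names."""

    def __init__(self, concepts):
        # Sorted (lowercased key, original name) pairs; duplicates removed.
        self._entries = sorted((c.lower(), c) for c in set(concepts))
        self._keys = [key for key, _ in self._entries]

    def suggest(self, prefix, limit=10):
        """Return up to `limit` concept names starting with `prefix`."""
        p = prefix.lower()
        start = bisect.bisect_left(self._keys, p)  # first key >= prefix
        out = []
        for i in range(start, len(self._entries)):
            key, name = self._entries[i]
            if not key.startswith(p) or len(out) >= limit:
                break  # sorted order: no later key can match
            out.append(name)
        return out
```

A sorted list keeps each lookup at O(log n + k) for k suggestions; a trie is a common alternative.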
Reference sentence; the universal truth text; UT concept – AURA provided; UT context – AURA provided; accuracy rating for the universal truth; date created, approved, and when ratings were applied.
How do we show progress? How do we show community contributors? How do we focus members on a specific chapter or sentence? How do we train users in what a universal truth entails? – Guided Tutorial. There were several requests for unique MediaWiki extensions.
Our original text view needed to be expanded to add context for authoring. -- 4 clicks -- The problem is that this made pages very long, so authoring UTs in our original format required a lot of scrolling up and down the page.
These pages were created behind the scenes by the inline UT authoring component, and there was a huge debate on whether they should be visible to users. While these pages were important to the wiki for queries, moderating universal truths, and exporting semantic properties, the operations provided by default wiki pages conflicted with some of our original assumptions. -- 4 clicks --
As with the second proposal, it soon became obvious that people couldn’t moderate a universal truth without the full context of a paragraph, and possibly even an entire textbook section. This meant we had to remove the ability to approve and deny universal truths across sentences and focus on the annotations per sentence.
6 University of Washington students participated in the test. Each received 45 minutes of training on creating Universal Truths. Each was given 1 hour and a pre-selected list of sentences on a user page to complete. The groups generated over 100 Universal Truths each session. They averaged 37 Universal Truths an hour per student. Students were frequently observed using their domain experience to construct UTs not specifically worded in the source sentence.
Inquire is a complex iPad application, so I chose one wireframe to put on one slide. You’re looking at Inquire displaying the online textbook portion of Aura.