Cf intro
1. April 13, 2006 1
Introducing:
The Cyc Foundation
April 13, 2006
2. April 13, 2006 2
Motivations
Wikimedia Foundation:
“Imagine a world in which every single person is
given free access to the sum of all human
knowledge. That's what we're doing.”
Cyc Foundation:
“Imagine a world in which every single person is given
free access to programs that reason with the sum of all
human knowledge. That's what we're doing.”
4. Cyc
[Diagram: the Cyc Reasoning System. Components: Cyc Ontology & Knowledge Base; Reasoning Modules; Interface to External Data Sources; CycAPI; Knowledge Entry Tools; User Interface (with Natural Language Dialog). Connected to: Knowledge Authors; Knowledge Users; Other Applications; External Data Sources (Data Bases, Web Pages, Text Sources, Other KBs).]
5. April 13, 2006 5
Query:
“Someone happy”
Caption:
“A man watching
his daughter take
her first step”
Help Find Information by Inference (+KB)
6. April 13, 2006 6
Query:
“Someone happy”
Caption:
“A man watching
his daughter take
her first step”
Help Find Information by Inference (+KB)
(∃x) (feelsEmotion x Happiness Positive)
(∃x,y) (and (father x y) (gender x Female) (sees x y) (walking
Logical Inference
(deduction)
7. April 13, 2006 7
Help Find Information by Inference (+KB)
(∃x) (feelsEmotion x Happiness Positive)
(∃x,y) (and (father x y) (gender x Female) (sees x y) (walking
Logical Inference
(deduction)
(implies
(and
(isa ?BIG-EVENT HumanLifecycleMilestone)
(doneBy ?BIG-EVENT ?CHILD)
(sees ?PARENT ?BIG-EVENT)
(children ?PARENT ?CHILD))
(holdsIn ?BIG-EVENT
(feelsEmotionTypeAtLevel ?PARENT
(PositiveAmountFn Pride))))
. . .
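The deduction on this slide can be sketched as one step of forward chaining. This is a minimal illustration in Python, not Cyc's inference engine: the rule is copied from the slide, while the constants `Step1`, `Man1` and `Daughter1` and the helper functions are made up for the example. It derives only the Pride assertion; linking Pride to the "Someone happy" query would need further rules that are not shown on the slide.

```python
# Facts read off the caption, as (predicate, arg1, arg2) triples.
# Step1, Man1 and Daughter1 are hypothetical constants for this sketch.
facts = {
    ("isa", "Step1", "HumanLifecycleMilestone"),
    ("doneBy", "Step1", "Daughter1"),
    ("sees", "Man1", "Step1"),
    ("children", "Man1", "Daughter1"),
}

# The rule from the slide; strings starting with "?" are variables.
antecedent = [
    ("isa", "?BIG-EVENT", "HumanLifecycleMilestone"),
    ("doneBy", "?BIG-EVENT", "?CHILD"),
    ("sees", "?PARENT", "?BIG-EVENT"),
    ("children", "?PARENT", "?CHILD"),
]
consequent = (
    "holdsIn",
    "?BIG-EVENT",
    ("feelsEmotionTypeAtLevel", "?PARENT", ("PositiveAmountFn", "Pride")),
)


def unify(pattern, fact, bindings):
    """Match one pattern against one fact; return extended bindings or None."""
    if len(pattern) != len(fact):
        return None
    new = dict(bindings)
    for p, f in zip(pattern, fact):
        if isinstance(p, str) and p.startswith("?"):
            # Bind the variable, or fail if it is already bound differently.
            if new.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return new


def match_all(patterns, known, bindings=None):
    """Yield every set of bindings that satisfies all patterns."""
    bindings = bindings or {}
    if not patterns:
        yield bindings
        return
    for fact in known:
        extended = unify(patterns[0], fact, bindings)
        if extended is not None:
            yield from match_all(patterns[1:], known, extended)


def substitute(term, bindings):
    """Replace variables in a (possibly nested) term with their bindings."""
    if isinstance(term, tuple):
        return tuple(substitute(t, bindings) for t in term)
    return bindings.get(term, term)


derived = [substitute(consequent, b) for b in match_all(antecedent, sorted(facts))]
for assertion in derived:
    print(assertion)
```

With these four facts the rule fires once, binding `?PARENT` to `Man1`, `?CHILD` to `Daughter1` and `?BIG-EVENT` to `Step1`.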
8. ALICE
• JUDGE11[9:26:0]I WILL ASK YOU SOME QUESTIONS
• PROGRAM[9:26:0]Do you think your plan will succeed?
• JUDGE11[9:26:16]WHAT IS THE COLOR OF A BLUE CAR
• PROGRAM[9:26:16]WHAT COLOR IS a blue car
• JUDGE11[9:26:35]WHERE IS SUE’S NOSE WHEN SUE IS IN
HER HOUSE
• PROGRAM[9:26:35]Where it belongs.
Try searching the World Wide Web.
• JUDGE11[9:26:49]WHAT HAPPENS TO AN ICE CUBE IN A
HOT DRINK
• PROGRAM[9:26:49]Try it.
• JUDGE11[9:27:25]HOW IS THE FATHER OF ANDY S
MOTHER RELATED TO ANDY
• PROGRAM[9:27:25]Fine as far as I know.
9. Natural Language Understanding
requires having lots of knowledge
1. The pen is in the box.
The box is in the pen.
2. The police watched the demonstrators…
…because they feared violence.
…because they advocated violence.
10. April 13, 2006 10
System: Ready for input.
User: what do you know about anthrax?
System: By “anthrax” do you mean
(1) the heavy metal band or (2) bacteria?
User: 2
System: Here is what I know about it:
Anthrax is a species of organism.
All anthrax is a bacterium.
All anthrax can lethally infect mammals.
Anthrax is extremely toxic for people to inhale.
Iraq has biological weapons that deliver anthrax.
Anthrax is opaque, tangible, microscopic, .
Anthrax can be dried into spores stored indefinitely
11. April 13, 2006 11
System: Ready for input.
User: what do you know about TR1?
System: I know nothing about it.
What sort of thing is TR1?
User: bacteria
System: TR1 lethally infects_________?
User: People
System: By “People”, I assume you mean
human beings, not People Magazine.
System: Does it infect some broader category of
animals (e.g., air-breathing vertebrates, chordates,
placental mammals, primates, eucaryotes, etc.)?
User: mammals
12. April 13, 2006 12
System: Is TR1 similar to anthrax?
User: yes
System: Which of the following are true of TR1:
.
.
.
User: Al Qaida possesses some TR1.
System: Ready for input.
User: Does Osama Bin Laden control anything that can
be the basis of weapons of mass destruction?
System: Yes, some TR1.
User: please explain
System: Osama Bin Laden controls Al Qaida.
Al Qaida possesses some TR1.
TR1 is a bacteria that lethally infects people.
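The "please explain" step can be sketched as a query that keeps the facts it used. This is a toy Python illustration, not the Cyc API. The three facts come from the dialog; the two chaining rules in the docstring and the function name `wmd_basis_controlled_by` are assumptions made for this example.

```python
# Facts from the dialog, as (subject, object) pairs per predicate.
controls = {("OsamaBinLaden", "AlQaida")}
possesses = {("AlQaida", "TR1")}
lethally_infects = {("TR1", "people")}


def wmd_basis_controlled_by(agent):
    """Return (thing, explanation) pairs for things the agent indirectly controls.

    Assumed rules (not from the slides): an agent controls whatever a group
    it controls possesses, and anything that lethally infects people can be
    the basis of a weapon of mass destruction.
    """
    answers = []
    for controller, group in sorted(controls):
        if controller != agent:
            continue
        for owner, thing in sorted(possesses):
            if owner == group and (thing, "people") in lethally_infects:
                # Record the supporting facts so the answer can be explained.
                explanation = [
                    f"{controller} controls {group}.",
                    f"{group} possesses some {thing}.",
                    f"{thing} lethally infects people.",
                ]
                answers.append((thing, explanation))
    return answers


answers = wmd_basis_controlled_by("OsamaBinLaden")
for thing, explanation in answers:
    print(f"Yes, some {thing}.")
    for line in explanation:
        print("  " + line)
```

Keeping the justification next to each answer is what lets the system respond to "please explain" without re-running the query.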
22. April 13, 2006 22
Efficiency vs. Expressiveness
[Chart: axes are Efficiency and Expressiveness. Plotted: C++; PASCAL; LISP; First-order logic; nth-order logic; English, German; HL (heuristic level language); EL (epistemological level language).]
Use two cooperating
languages (EL and HL)
to escape the limitations of
an age-old tradeoff.
Continuing improvements
in inference performance
won’t negatively affect
expressiveness.
24. April 13, 2006 24
BURC: Bootstrapping
Using ResearchCyc
• Goal: To extend Cyc’s knowledge base using
“relationships implied to be possible, normal or
commonplace in the world”
• Prior work with Cyc knowledge entry has been manually
oriented
• How will we collect common sense without a body and
manual labor…?
• Read, Parse, Mine!
• Proposal: Read text, Parse into a database, Extract
relations between words, Propose hypothetical relations
between concepts
25. April 13, 2006 25
BURC: Basic Analogy
• The Shotgun approach to the Human Genome
• Extract millions of fragments
• Knit them back together by finding commonalities
• Will it work for the Human Memome?
• James Burke: ‘Mr. Connections’
Lenat’s Bootstrap Hypothesis: once Cyc reaches a certain
level/scale it can help in its own development and start using
NLP to augment its knowledge base
26. April 13, 2006 26
Mining Adjective Knowledge
Example
• “white blouse” as factoid fragment
• Hypothesis: (plausibleValueOfType Blouse
mainColorOfObject WhiteColor)
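The "Read, Parse, Mine" idea for adjectives can be sketched in a few lines of Python. This is only an illustration of the approach, not the BURC implementation: the two lookup tables stand in for a real lexicon that maps English words to Cyc terms, and every entry in them is an assumption taken from the single example on the slide.

```python
import re
from collections import Counter

# Hypothetical word-to-term lexicon. A real system would draw these
# mappings from Cyc's lexical knowledge; only the slide's example is covered.
ADJECTIVES = {"white": ("mainColorOfObject", "WhiteColor")}
NOUNS = {"blouse": "Blouse"}


def propose_hypotheses(text):
    """Count candidate plausibleValueOfType assertions from adjective-noun pairs."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    # Slide a two-word window over the text and keep known adjective-noun pairs.
    for adjective, noun in zip(tokens, tokens[1:]):
        if adjective in ADJECTIVES and noun in NOUNS:
            predicate, value = ADJECTIVES[adjective]
            counts[f"(plausibleValueOfType {NOUNS[noun]} {predicate} {value})"] += 1
    return counts


hypotheses = propose_hypotheses("She wore a white blouse. The white blouse was new.")
for assertion, seen in hypotheses.items():
    print(seen, assertion)
```

The counts matter because a fragment seen many times across many texts is a better candidate for "normal or commonplace" than one seen once. The hypotheses would still need validation before being asserted.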
28. April 13, 2006 28
(Very) Brief History of Cyc
• c. 1967 – AI is used on toy problems.
• c. 1977 – Expert systems reason in narrow domains.
• c. 1983 – Lenat, Minsky, Feigenbaum, Kay, and others
recognize need for a substrate of shared world knowledge;
and realize it would take hundreds of person-years to
“prime the pump”.
• 1984 – Admiral Bob Inman convinces Lenat to leave
Stanford and pursue this high-risk, high-payoff project
(Cyc) within MCC.
• 1994 – Cycorp is formed.
30. April 13, 2006 30
“The driver of the power of intelligent systems is the knowledge
the systems have about their universe of discourse, not the
sophistication of the reasoning process the systems employ.
Cyc has not only the world’s largest knowledge base, but the
best represented from a technical point of view.”
Ed Feigenbaum
inventor of the first expert system
editor of the AI Handbook
31. April 13, 2006 31
“People have silly reasons why computers don’t really
think. The answer is we haven’t programmed them
right; they just don’t have much common sense.
There’s been only one large project to do something
about that, that’s the famous Cyc project…”.
-- Marvin Minsky
32. April 13, 2006 32
How has Cycorp done?
• 20 years
• 3 million facts and rules (hand-entered)
• Compelling demos
• Some applications (constrained by business model)
• The basis for much greater growth
• “If the right way to build an A.I. involves giving
Cyc away for free, that is what we will do.”
– Doug Lenat (repeatedly)
– Note: Jury is out on what the “right way” is
33. April 13, 2006 33
Cycorp: True to its Promise
• OpenCyc
– The entire Cyc structural ontology: FREE
– 300,000 concept terms, ~2M facts and rules
• ResearchCyc
– Equal to Full Cyc (w/ Research-only license)
– Source code for inference engine not released
– API with 18,000 functions and macros!
– Ability to compile in your own additions
• Q: Will more be released? A: It depends.
– Cycorp must financially support its own R&D.
– Existing releases must result in major project benefits.
And it doesn’t
really matter.
34. April 13, 2006 34
Time for the Next Phase
• Cycorp has gotten us to where we are
– Representational ability
– Inference ability
– …and will continue (R&D leader, commercialization)
• The rest of the world will help get us where we
are going
– Breadth of content
– Broad real-world diffusion
The thinking that got us to where we are today is insufficient to solve the problems
that exist today. To solve today's problems requires a new level of thinking.
-- Einstein
35. April 13, 2006 35
Building Cyc qua Engineering Task
[Chart: rate of learning vs. amount known. Labels: codify & enter each piece of knowledge, by hand; learning via natural language; learning by discovery; Frontier of human knowledge. CYC: 750 person-years, 21 realtime years, $75 million. Years marked: 1984, 2004, 2006.]
36. April 13, 2006 36
Building Cyc qua Engineering Task
[Chart: rate of learning vs. amount known. Labels: codify & enter each piece of knowledge, by hand; 1000 years; 10 years. CYC: 750 person-years, 21 realtime years, $75 million. Years marked: 1984, 2004, 2006.]
37. April 13, 2006 37
How will we get the knowledge?
Games
That
Matter!
38. April 13, 2006 38
Foundation as Continuation
• Are we trying to make an A.I.?
– No.
• Are we trying to make computers behave
much more intelligently?
– Yes!
39. April 13, 2006 39
Mission (DRAFT)
The Cyc Foundation has been formed as an
independent not-for-profit organization
to hasten the arrival of intelligent tools
that will help humanity.
40. April 13, 2006 40
Assumptions
• (Currently) 9 ideas that shape strategy,
objectives and policy
• These may need to be validated,
modified or augmented
• In some cases, assumptions are
followed by related policy
41. April 13, 2006 41
Assumption #1
Long before computers are as smart as people,
they will be (in some cases already have been)
put to use to cure disease, address hunger
problems, make important new scientific
discoveries and help people work together.
Smarter computers will do a better job of this.
42. April 13, 2006 42
Assumption #2
Cycorp has developed and cared for what we believe
is an important piece of the AI puzzle.
They have always wanted to release it to the public,
but it had to be when people could realistically
develop it further on their own without in some way
endangering the project.
One fear was “forking”, or creating incompatible
variants of the knowledge base.
Cycorp and The Foundation will cooperate on 1 KB.
43. April 13, 2006 43
Flow of Cyc Data
Cycorp
Cyc Foundation
RCyc User
Gamer / Wikipedia user
Team:
- Subject-matter expert
- Ontologist
44. April 13, 2006 44
Assumption #3
The knowledge that will give computers
human-like intelligence ultimately needs to be
free.
That's our best hope of having it put to best use.
Portions of knowledge will always be held
proprietary.
The more shared a piece of knowledge, the greater
will be the force pulling all of its representations
toward freedom (to avoid the burden of
maintaining a non-standard representation).
45. April 13, 2006 45
Assumption #4
Proposed Semantic Web standards (such as those related
to OWL) are an important step in the right direction,
because they provide a foundation for working with
meaning on the Web.
The Cyc ontology will be a valuable addition, because it
can act as a semantic hub, allowing us to have shared
meaning.
There is some concern that a top-down central ontology
will dictate use of terms that may not meet a project’s
needs. We will be able to show that use of the Cyc
ontology can satisfy both needs and will be a useful
complement to the great work that has already been done
toward the Semantic Web.
46. April 13, 2006 46
Assumption #5
We all have something to learn.
We all have something to teach.
The Foundation mission will benefit from a very
broad base of support, rather than the traditional
rule by the technical elite.
47. April 13, 2006 47
Assumption #6
For this effort, focused work by many will be more
valuable than genius work by a few.
To be most helpful, people should work together,
and on tasks where they are capable of
contributing successfully.
(Example: don’t go off and try to “solve the A.I.
problem” by yourself.)
48. April 13, 2006 48
Assumption #7
Regular humans can be turned off by overly
technical talk that is out of place – and rightly so.
We need to be inclusive in our language and in our
activities in order to ensure the broadest base of
support and participation.
This is especially true in the Cyclify initiative.
49. April 13, 2006 49
Assumption #8
There is no “us” and “them”
• The Foundation is managed by its volunteer board
and run by its volunteer members
• The Foundation will start with no employees
• There will be no BDFL – Benevolent Dictator for
Life
50. April 13, 2006 50
Assumption #9
Fun is mandatory!
• By comparison, contributing to SETI is like
cleaning your oven while you sleep.
• This work will be hands-on, compelling and
(hopefully) addictive.
• If you’re not having fun, find out why and
fix it.
51. April 13, 2006 51
Foundation Goals
• Convert human knowledge to a form that computers
can reason with
– Grow the Cyc Ontology and KB Exponentially
• Establish a standard vocabulary and language for
representing concepts & knowledge
• Support the creation of intelligent tools
• Promote free and efficient knowledge transfer
Cyclify
52. April 13, 2006 52
Cyclify Knowledge Collection Activities
• Web Games
– Validate acquired knowledge
– Multiple-choice fact entry
– More?
• Wikipedia Linking
• KR Dating Service
– Wiki-based knowledge entry
– A SME paired with an ontologist
• WordNet Linking
53. April 13, 2006 53
Playflow Within Cyclify
[Diagram: participants and components. Gamer; Wikipedia user; Team (Subject-matter expert, Ontologist); Game Server; Wiki Knowledge Server; Wikipedia Data; K. Acquisition Data; RCyc; RCyc User; Cycorp.]
54. April 13, 2006 54
I’m thinking of a sentence…
Because I read about it on the web.
Status:
I have 2
answers
Fibromyalgia is caused by ticks.
True
False
Don’t Know
Doesn’t make sense
Score: 24
55. April 13, 2006 55
Status:
I think this
sentence is
probably
not right
Submitting...
Thank you!
Answers: 2
You agreed with: 100%
I now have a better understanding of:
Fibromyalgia is caused by ticks.
Score: +2
Next
Score: 26
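The validation game on the last two slides can be sketched as simple vote aggregation. This Python sketch shows one possible way to do it, not the game's actual logic: the answer labels mirror the four buttons, while the function names, `min_answers` and `threshold` are assumptions, and the scoring rule is left out because the slides do not state it.

```python
from collections import Counter


def agreement_percent(previous_answers, new_answer):
    """Percentage of earlier answers that match the player's answer."""
    if not previous_answers:
        return None  # nothing to agree with yet
    same = sum(1 for answer in previous_answers if answer == new_answer)
    return round(100 * same / len(previous_answers))


def consensus(answers, min_answers=3, threshold=0.75):
    """Return the majority answer once enough players agree, else 'undecided'.

    min_answers and threshold are illustrative values, not the game's settings.
    """
    if len(answers) < min_answers:
        return "undecided"
    label, count = Counter(answers).most_common(1)[0]
    return label if count / len(answers) >= threshold else "undecided"


# Two earlier players answered "false" to "Fibromyalgia is caused by ticks."
previous = ["false", "false"]
player_answer = "false"

print("You agreed with:", agreement_percent(previous, player_answer), "%")
print("Consensus:", consensus(previous + [player_answer]))
```

Requiring several independent answers before accepting a verdict is what lets untrained players validate hypothesized facts without any one answer being trusted on its own.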
56. April 13, 2006 56
Current Architecture
[Diagram: a DMZ boundary separates a computer (inside) from a computer (outside). Components: Cyc Image holding GAFs (web gathered, hypothesized, asserted, …) and forward rules; SubL form, running KAG-collecting query; KAGs; XML file transferred by scp; Populator (Java); Postgres database; Question Server (Java); Applets.]
57. April 13, 2006 57
Cyc Foundation Projects
• Nonprofit Formation (planning/budgeting/filing)
• Foundation Website
• Cyclify
• Fundraising
• Membership management
• Events
• ResearchCyc
– Recommend Cyc features / functions / design
– Help with ResearchCyc testing, documentation
58. April 13, 2006 58
Budgeting
• Must develop budget related to Year 1 plan
• Possible areas of spending
– Legal filings
– Server hosting
– W3C membership
– Conference attendance
– Fundraising
59. April 13, 2006 59
Foundation Website
• Requirements
– Content management features
– Collaboration features
– Out-of-the-box ease of use
– Free
• Currently evaluating Joomla (Mambo)
• Desired launch: May 15
60. April 13, 2006 60
Cyclify Projects
• First Web Game
– Develop game
– Viral marketing
– Add wiki linking activity
• Wiki Knowledge Collection
– Set up wikip.cyclify.org
– Add frame for ontologizing
– Feed wikip links to Web game
• Back End
– Design and implement PlayFlow
– Submit collected knowledge to Cycorp
61. April 13, 2006 61
Fundraising
• Individual Memberships
– Free membership for first 6 months for Cyclify
members and ResearchCyc users?
– How much?
– What do you get?
• Corporate Donations
– Need to prepare story
– Seems feasible to get donations
62. April 13, 2006 62
What does nonprofit mean?
• Cannot have investors or disburse earnings
• Can have earnings, though
• Revenues must come from services that are
within mission
• 501(c)(3)? (like Wikimedia Foundation)
• Or 501(c)(6)? (like Eclipse Foundation)
63. April 13, 2006 63
The Foundation Board of Directors
Name              Position                        Role
John De Oliveira  Founder and President           Strategy, Corp. Fundraising
Mark Baltzegar    Co-Founder and Vice President   Strategy, Game Devel., IT
OPEN              Secretary, Treasurer            Secretary, Treasurer
David James       Board Member                    Organizational Dynamics
OPEN              Board Member                    Standards
OPEN              Board Member                    Events, Operations Delegator
OPEN              Board Member                    Architecture, Playflow Design
TBD Sept. 2006    Board Member                    Oversight
64. April 13, 2006 64
The Foundation: Membership
Project Leader, Cyclify Stu Baurman Keith Wright
Project Leader, ResearchCyc Kino Coursey Pierluigi Miraglia
High Scorer (current month) Douglas Miles Gavin Matthews
High Scorer (all time) Arturo Hernandez Joe Simone
David Whitten Guyren Howe ~100 ResearchCyc Users
Brad Bouldin John Cabral YOU!
Larry Lefkowitz Ben Rode
Bill Jarrold Jason Azbahr
65. April 13, 2006 65
ResearchCyc Users
Xerox PARC
Daxtron Labs
Lockheed Martin ATLD
Government
Government-related
Commercial
Houston VA Medical Center
Air Force Rome Labs
Institute for the Study of Accelerating Change
U of Maryland
Language Computer Corporation
NTT Communications Science Laboratories (Japan)
Northwestern U
Stanford NLP Dept.
ANSER, Inc.
LBJ School of Public Affairs
Fraunhofer Institute
U of Illinois Urbana-Champaign
New Mexico Highlands Univ.
Harvard U
Linkoping U (Sweden)
Radboud U (Netherlands)
Tokyo Inst. of Technology
Terra Incognita
University
Microfabrica, Inc.
U of Stuttgart
NPOs
MIT Media Lab
Witan International
U of Pennsylvania
SRI
21st Century Technologies
U of Minnesota
Stone’s Throw Technologies
ISI
Trimtab Consulting
U of Hawaii
Rensselaer AI and Reasoning Lab
TNO-DMV (Netherlands)
Sapio Systems (Denmark)
U of Toronto
Knowledge Media Institute, Open University
Austin Info Systems
66. April 13, 2006 66
How can I help?
• Humans (a.k.a. common sense experts)
• Programmers
– Web programmers
– Cyc programmers
• Ontologists
• Subject-matter experts
• Bloggers
67. April 13, 2006 67
Human Cyclists*
• Play the Web Game
• Come up with new game ideas
• Link Wikipedia to Cyc
• Learn more about Cyc
• Befriend an ontologist
• Tell a friend about Cyclify
• Write to a blog about Cyclify
• Help with viral marketing
• Design a logo
• T-Shirts: Buy one, or Create and sell them
* From now on, we’re all “Cyclists” – people who interact with Cyc in one way or another.
68. April 13, 2006 68
Programmers
• Help design and build a web services interface
• Learn the architecture of Web Game #1
• Design an add-on for the Web game
• Learn how to use the question server
• Propose a new game
• Help develop/support technical infrastructure
• Help organize documentation
• Help write the Cyc books
– to be published by O'Reilly
69. April 13, 2006 69
Ontologists
• Identify gaps in the knowledge base
• Befriend a Subject Matter Expert
– Work together on a domain
• Befriend a Human Cyclist
– Teach one who wants to learn basic ontology skills
• Help organize documentation
• Help write the Cyc books
70. April 13, 2006 70
Bloggers
• Blog about Cyclify
• Link to each other’s blogs
71. April 13, 2006 71
Timeline (Milestones)
• May 15 – Launch Foundation Website
• Build membership up until July 15
• June 15
– File Articles of Formation w/ Sec. Of State
– First Web game in beta
• July 15 – Launch Game
• October – First OpenCyc build containing
game data
Editor's notes
dodai is “the base upon which a structure is built.” It was the simplest Kanji I could find for “foundation”. This might be the basis of a logo, or it could be something totally temporary.
[Before each answer]: what answer would you expect?
You can tell it’s just matching on “where is X?”
Employee will let Cyc read the page and will help it figure out meaning.
This gets filled in at a different place and time.
Florence served in Viet Nam and was injured in an explosion.
Now when Florence goes to the site, there is a special link just for her.
This is a detailed itinerary page.
Nothing about the Red Cross museum appeared in the itinerary page.
Notice that this does not require incredibly deep reasoning.
Cyc transcends the traditional tradeoff that other languages must make between efficiency and expressiveness. Its use of two cooperating languages allows it to be almost as expressive as English while operating as efficiently as a C++ program.
The OpenCyc website has said for years that an independent organization would be formed. That is finally happening.
Many of you have probably seen this curve that describes how knowledge gets into Cyc. It was hand-entered for years, but now there is enough knowledge there to augment the effort with other methods: natural language (NL) and discovery. For quite some time, the NL method will need human help, but it no longer depends exclusively on highly trained logicians.
There’s an entrance exam. Can you find yourself on this list? Yes? You’re hired. You can help.