Data mining, truth, justice, the American Way, and the Giant Spaghetti Monster
1. Data Mining, Truth, Justice, the American Way,
and the Flying Spaghetti Monster
tim@menzies.us Ph.D.
LCSEE, WVU, 20 Sept 2007
2. Expose, and hose
• "Part of education is to
expose people to different
schools of thought."
- President George Bush,
August 1, 2005
• "Part of science is to
expose people to the
critical and continual
(re)evaluation of ideas."
- Some guy called Timm,
September 20, 2007
3. "Look up in the sky! It's a bird! It's a
plane! It's Superman!"
"Yes, it's Superman, strange visitor from
another planet who came to Earth with
powers and abilities far beyond those of
mortal men."
"Superman, who can change the course of
mighty rivers, bend steel in his bare hands;
and who, disguised as Clark Kent, mild-mannered
reporter for a great metropolitan
newspaper, fights a never-ending battle for
truth, justice, and the American way."
Why a never-ending battle?
How to ensure justice?
How to make lottsa $$?
How to find truth?
4. So, tonight
Notions of certainty
Standards for debate
Surprises
Nothing is “truth”
but many more things are false
And some things are useful
Implications for humility
And for justice
5. God gave me a brain.
I take it (s)he wants me to use it.
Mark of the rational
while not dead; do
Review and revise assumptions;
Done
Entertain a wide range of ideas
But don’t necessarily accept them
Demand evidence
that lets you repeat/refute/improve
prior conclusions
But what of faith?
That is another talk
There is room for the
divine in my universe
But in my test tubes?
Not too much
6. Data miners: agents that automate the
creation and review of new ideas
Mountains of data
@relation weather.symbolic
@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
Tablespoons of knowledge
outlook = sunny
| humidity = high: no
| humidity = normal: yes
outlook = overcast: yes
outlook = rainy
| windy = TRUE: no
| windy = FALSE: yes
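One way to see how the small tree summarizes the data is to replay the 14 instances through it. The following is a minimal sketch, not the learner that produced the tree: the function name `tree` and the variable names are assumptions made for illustration, and the tree is simply hand-copied from the slide. Note that this only measures fit to the training data.

```python
# The 14 weather instances from the slide:
# outlook, temperature, humidity, windy, play
ROWS = """\
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
""".strip().splitlines()
DATA = [tuple(line.split(",")) for line in ROWS]


def tree(outlook, temperature, humidity, windy):
    """Hand-coded copy of the decision tree on the slide (temperature is unused)."""
    if outlook == "sunny":
        return "no" if humidity == "high" else "yes"
    if outlook == "overcast":
        return "yes"
    # Remaining case: outlook == "rainy"
    return "no" if windy == "TRUE" else "yes"


# Count how many instances the tree labels the same way as the data.
correct = sum(tree(*row[:4]) == row[4] for row in DATA)
print(f"{correct}/{len(DATA)} training instances classified correctly")
```

The tree never consults temperature, which is the "tablespoons" point: five attributes and 14 rows collapse to a handful of rules.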
7. Data doubling every 20 months
Internet, Radio Frequency Identification (RFID) tracking, on-line
shopping (patterns of sales tracked at Amazon)
So now we can automatically learn answers to many questions; e.g.
What eggs to select for IVF?
What will software cost to develop?
What diseases does a patient have?
Which loan applications to fund?
What houses will have the best resale value?
Which parts of the program need more inspection?
What products are best to sell to what markets?
What cows to keep and which to send to the abattoir?
How to teach a satellite to distinguish between cloud shadows and oil
spills?
How much electricity will be needed in two hours,
i.e. what coal-powered generators to fire up?
8. More fundamentally, what can we say
about the world, with any certainty?
Same data, different data miners
different conclusions
Every miner biased by
Evaluation bias
Language
What is the “shape” of the
models we can learn?
Decision trees, equations, etc
Search
Pruning the possibly infinite
space of candidate models
What not to explore
Over-fitting avoidance
How to stop the learner fixating on noise
E.g. pruning back decision trees
9. Any learning scheme
has many biases
• Bias lets us ignore “stuff”.
• Without it, we don’t know
what is important or dull, we
can’t summarize, generalize.
• Without bias, we can’t
learn from the past
• Bias blinds us but
lets us see the future
• But changing biases changes what
we best believe
• No wonder truth is a
never-ending battle
10. Generalizing from
the past works
Sometimes, very clearly
Heavy smokers have
2000% to 3000%
higher chance of lung
cancer
Learned theories
perform very well on
new data
But ...
the “best” learned theory
can be a moveable feast.
11. So, a relativistic soup?
No certainty?
No way to plan effective actions?
No way to rule out absurd notions?
12. I don’t want to offend
any one, but…
… I think that once …
there were no cell phones
or iPods, or clothes, or
countries, or language, or
human society, or 4-valved
hearts, or homeostasis, or
organs, or brains, or planets,
or stars, or matter
Where the net energy
in-flow is positive…
the universe selects for
self-perpetuating systems,
an exponentially decreasing
number of which are of
exponentially increasing
complexity
Should I even say this in a
public place?
"Part of education is to expose
people to different schools of
thought."
- President George Bush,
August 1, 2005
Shouldn't I have to give
credence to all theories?
Evolution,
Intelligent design,
Pirates cause global
warming?
13. The Church of the Flying
Spaghetti Monster (FSM)
Founded in 2005
OSU physics graduate Bobby Henderson
A protest against the decision by the Kansas State Board of Education
to require the teaching of intelligent design as an alternative to biological evolution.
Henderson wrote to the board
professing belief in a supernatural Creator called the Flying Spaghetti Monster
and demanded that his "Pastafarian" theory of creation be taught in science classrooms.
14. FSM is not about religion
It is a mistake to view FSM as anti-religion
Rather, FSM is anti-anti-scientific rigor
No one in their right mind would ever
believe this nonsense
And that’s the point
Truth is a never-ending battle
We must have standards to assess scientific
theories, to reject absurdities
Or any nonsense can be released on this world
E.g. “Global warming is caused by pirates.”
15. Wikipedia on FSM
FSM: an invisible, undetectable
Flying Spaghetti Monster
Evidence for evolution planted by
FSM to test Pastafarians' faith
FSM changes the results of
measurements, like radiocarbon
dating, via His Noodly Appendage.
Heaven contains beer volcanoes
and a stripper factory.
Hell is similar, but with stale beer
and diseased strippers.
Pirates are "absolute divine
beings" and the original
Pastafarians.
Their image as "thieves and
outcasts" is misinformation spread
by Christian theologians in the
Middle Ages and Hare Krishnas.
Pirates are "peace-loving
explorers and spreaders of good
will" who distributed candy to
small children.
Global warming, earthquakes,
hurricanes, and other natural
disasters are a direct effect of the
shrinking numbers of pirates since
the 1800s.
16. FSM “proof” of the
divinity of pirates
A case study on how
not to present data
X-axis deliberately
misleading.
Crazy? Yes!
• But would you recognize such craziness if you saw it again?
20. To our peril, we trust
old ideas too much
Columbia ice strike:
Size: 1200 in³,
Speed: 477 mph
(relative to vehicle)
Certified as “safe” by the
CRATER micro-meteorite
model
A typical experiment in
CRATER’s test database
Size: 3 in³ piece of debris
Speed: under 150 mph.
21. Value of estrogen
(NYT magazine,
Sept 16, 2007)
1990s:
American Heart Association
recommends hormone replacement
therapy for older women to ward off
heart disease and osteoporosis.
2001:
15 million Americans filling H.R.T.
prescriptions annually
2002:
estrogen therapy exposed as a hazard,
not a benefit, for health
Failure of scientific method
Benefits of estrogen reported from large
observational studies, not randomized trials
Repeated epidemiological finding:
randomized trials rarely support conclusions
from observational studies.
So forget what you've read about
Anti-oxidants like vitamins E & C & β-carotene
preventing heart disease
Fiber preventing colon cancer
22. So, why is FSM silly?
And please, rest assured,
it is very very silly stuff indeed.
Theories need an entrance exam
Many possible theories
one for each bias
Demand that a theory has passed at least
some operational test before we
condone it, act on it.
If no reason to accept the new, don’t
Trust the most what has been
challenged the most
Karl Popper
23. No things are “right”, but some
things are “useful”
Sure, one data set supports many theories.
But there are many many more theories that are
unsupported.
No model is right, but some things are useful
(perform well on test data)
George Box
And many many many more ideas are useless
Can’t make predictions
Not defined enough to support (possible) refutation
24. Wolfgang Pauli
The "conscience of physics",
the critic to whom his colleagues were accountable.
Scathing in his dismissal of poor theories
often labeling them ganz falsch, utterly false.
But "ganz falsch" was not his most severe
criticism,
He hated theories so unclearly presented as to be
untestable,
unevaluatable,
Worse than wrong because they could not be
proven wrong.
Not properly belonging within the realm of science,
even though posing as such.
Famously, he wrote of one such unclear paper:
"This paper is not right. It is not even wrong."
25. Believe those who seek the truth;
doubt those who find it
-Andre Gide.
26. Don’t test once on just
the training data
Study more than the
average
performance
Also look at the
variance
E.g. here, no
significant on new
data after X=8
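The advice above (test on held-out data, and report variance as well as the average) can be sketched as repeated random hold-out splits. This is only an illustration under stated assumptions: `majority_class_accuracy` is a hypothetical stand-in for a real learner, the toy data is invented, and the split sizes and seed are arbitrary choices.

```python
import random
import statistics


def majority_class_accuracy(train, test):
    """Hypothetical stand-in learner: always predict the most common training label."""
    labels = [label for _, label in train]
    # sorted() makes tie-breaking deterministic across runs
    guess = max(sorted(set(labels)), key=labels.count)
    return sum(label == guess for _, label in test) / len(test)


def repeated_holdout(data, learner, repeats=20, test_fraction=0.33, seed=1):
    """Score the learner on several random train/test splits, never on its own training rows."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        shuffled = data[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * test_fraction)
        test, train = shuffled[:cut], shuffled[cut:]
        scores.append(learner(train, test))
    return scores


# Toy data (an assumption): 9 "yes" and 5 "no" labels; features are ignored by this learner.
data = [(i, "yes") for i in range(9)] + [(i, "no") for i in range(5)]
scores = repeated_holdout(data, majority_class_accuracy)
print(f"mean={statistics.mean(scores):.2f}  stdev={statistics.stdev(scores):.2f}")
```

Two learners with the same mean but very different spreads are not equally trustworthy, which is why the spread is reported alongside the mean.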
27. If something works, poke it till it breaks
i) Sort attributes on “infogain”
ii) Learn using first N attributes
[Plots for four data sets: labor, soybean, diabetes, anneal]
A few variables
are (often) enough
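Steps (i) and (ii) can be sketched on the small weather data from the earlier slide. This is a minimal illustration, assuming the standard entropy-based definition of information gain; the function names (`entropy`, `info_gain`, `project`) are hypothetical, and the data sets named on this slide (labor, soybean, diabetes, anneal) are not reproduced here.

```python
from collections import Counter
from math import log2

NAMES = ["outlook", "temperature", "humidity", "windy"]
ROWS = """\
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
""".strip().splitlines()
DATA = [tuple(line.split(",")) for line in ROWS]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())


def info_gain(rows, column):
    """Reduction in class entropy after splitting the rows on one attribute."""
    labels = [row[-1] for row in rows]
    remainder = 0.0
    for value in set(row[column] for row in rows):
        subset = [row[-1] for row in rows if row[column] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


def project(rows, keep):
    """Keep only the named attributes (plus the class) for a learner to use."""
    columns = [NAMES.index(name) for name in keep]
    return [tuple(row[c] for c in columns) + (row[-1],) for row in rows]


# (i) sort attributes on infogain; (ii) hand the learner only the first N
ranked = sorted(NAMES, key=lambda name: info_gain(DATA, NAMES.index(name)), reverse=True)
N = 2
reduced = project(DATA, ranked[:N])
print(ranked, reduced[0])
```

Increasing `N` one step at a time and re-scoring the learner on held-out data is the "poke it till it breaks" experiment: performance often stops improving after the first few attributes.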
30. Living with uncertainty:
count, alert, fix
Count: stuff seen in past
Alert: if new counts different
Fix: find delta new to old
Very, very fast
An incremental discretizer + a Bayes
classifier where all inputs
are mono-classified
Track average max
likelihood for data,
processing in "eras" of X
instances
Contrast set learning
Linear time inference,
tiny memory footprint
And, it works [Orrego, 2004]
F15 simulator data [courtesy B. Cukic]
Five flights: a,b,c,d,e
each with different off-nominal condition
imposed at “time” 15
Off-nominal condition not present in prior data
In all cases,
massive change detected
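The "count, alert" part of this loop can be illustrated with a much simpler stand-in than the system described on the slide. The sketch below is not the [Orrego, 2004] implementation: it swaps the incremental discretizer and Bayes classifier for plain frequency counts over discrete symbols, and the class name `EraMonitor`, the `drop` threshold, and the toy stream are all assumptions made for illustration.

```python
from collections import Counter
from math import log


class EraMonitor:
    """Count symbols seen so far; alert when a new era looks unlikely given past counts."""

    def __init__(self, era_size=10, drop=2.0):
        self.era_size = era_size  # the "X instances" per era
        self.drop = drop          # hypothetical threshold on the fall in average log-likelihood
        self.counts = Counter()   # Count: stuff seen in past
        self.total = 0
        self.buffer = []
        self.baseline = None      # average log-likelihood of the last unremarkable era
        self.era = 0
        self.alerts = []          # indices of eras that triggered an alert

    def _log_likelihood(self, symbol):
        # Laplace smoothing: unseen symbols get a small, non-zero probability
        vocabulary = len(self.counts) + 1
        return log((self.counts[symbol] + 1) / (self.total + vocabulary))

    def add(self, symbol):
        self.buffer.append(symbol)
        if len(self.buffer) == self.era_size:
            self._close_era()

    def _close_era(self):
        if self.total:  # score the new era against counts from earlier eras only
            average = sum(self._log_likelihood(s) for s in self.buffer) / len(self.buffer)
            if self.baseline is not None and average < self.baseline - self.drop:
                self.alerts.append(self.era)  # Alert: new counts look different
            else:
                self.baseline = average
        self.counts.update(self.buffer)  # one pass per instance, memory bounded by vocabulary
        self.total += len(self.buffer)
        self.buffer = []
        self.era += 1


# Toy stream (an assumption): three nominal eras of "a"/"b", then an off-nominal era of "z".
monitor = EraMonitor(era_size=10)
for symbol in ["a", "b"] * 15 + ["z"] * 10:
    monitor.add(symbol)
print("alerts raised in eras:", monitor.alerts)
```

In this toy stream only the final era, whose symbols were never seen before, falls far enough below the baseline to raise an alert. The "fix" step (finding what differs between new and old) is not sketched here.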
31. Living with uncertainty
Life is a balance between
Policy #1: exploration
Tolerate the sub-optimal, a little
Doing crazy things to learn new things
Policy #2: exploitation
Fix your theories and base your work on those fixed ideas.
Kuhn:
• most "science" is puzzle solving…
• … within existing paradigms.
• Sometimes the paradigm breaks down…
• … prompting revolutionary research
Human young:
• Do crazy things (take long trips)
• Less craziness as we grow older
32. Tolerance of “exploration”
Critical to the
American way
America: history of
tolerance and acceptance
1945:
400 German rocket
scientists choose to
surrender to the Yankees,
not the Russians
They chose their post-war
life based on their
perceptions of American
ideology
Hence,
33. Tolerance = hi-tech = $$$
R. Florida: The Economic
Geography of Talent,
Annals of the Association of American
Geographers 92(4), 2002, pp. 743-755
Best predictor for hi-tech industry
R² 0.42 to "coolness"
R² 0.49 to cultural amenities
R² 0.50 to median house value
R² 0.77 to "diversity" index
34. Data Mining, Truth, Justice, the
American Way & Flying Spaghetti Monsters
"Superman, fights a never-ending battle
for truth, justice, and the American way."
To make $$,
institutionalize
exploration
and tolerance
Old conclusions must
be constantly re-assessed
No "truth",
all is biased.
A healthy hi-tech needs
tolerance to support
exploration
and that the FSM is silly,
but would consider revising
that view if new evidence
emerges
35. Expose, and hose
• "Part of education is to
expose people to different
schools of thought."
- President George Bush,
August 1, 2005
• "Part of science is to
expose people to the
critical and continual
(re)evaluation of ideas."
- Some guy called Timm,
September 20, 2007