This document discusses potential contributions of design theory and methods for data science projects. It begins by introducing design theory, particularly C-K design theory which describes innovative design as the interaction and joint expansion of concepts and knowledge. It then presents two design methods - Innovation Field Mapping which uses C-K theory to map out an innovation field, and the KCP process which involves collective reasoning and action on concepts and knowledge. The document argues that these design theory approaches can help data science initiatives generate high-potential projects that can lead to breakthrough results, beyond traditional brainstorming.
Innovative design methods for data science projects
1. Innovative design methods for data science projects: beyond brainstorming
Akın O. Kazakçı
akin.kazakci@mines-paristech.fr
Centre for Data Science
January the 7th, 2014
3. Design Theory and Methods for Innovation
• Chair for Research and Education
• Fundamental Research on Design Theory
• 11 Industrial Sponsors
• Theory, Field research, History, Laboratory experiments
4. CDS: Peculiar Characteristics & Lots of Unknowns
• What is data science?
– You have 10 secs. Please avoid dictionary definitions. And no, do not use a list of subdomains.
• Is this a new form of organisation? Which model?
– Neither private R&D, nor traditional research lab.
• How to unify and align researchers' interests?
– Would traditional incentives be enough?
• What is the overall project for CDS?
– How to build a joint long-term vision with clearly articulated (scientific or not) objectives?
Akın O. Kazakçı, MINES ParisTech
5. Gartner’s Hype Cycle
Cabane et al. 2014, Understanding the Role of Collective Imaginary in the Dynamics of Expectations, Int. Prod. Dev. Mana. (IPDM) Conf.
6. Are there strategies that would allow « smooth landing »?
[Figure: hype-cycle curves labelled "Average DSI Curve", "« Smooth-Lander » DSI" and "Innovative DSI"]
• How to reach the plateau of productivity? How to reach it before others and lead the way?
• Which methods, processes or principles would allow building innovation strategies for DSIs?
• How would a data science initiative (e.g. centres or groups) generate high-potential projects that can lead to breakthrough results?
8. Profound Transformation of NPD activities
• New functional spaces
• New user experiences
• New competencies
• New partnerships
• New business models
• Fuzzy industrial sectors
→ 3rd Industrial revolution (Le Masson et al., 2006)
→ New Products vs. New Product Types
→ Revision of Objects' Identities (Hatchuel et al., 1999)
10. Innovation: optimisation or identity change?
Innovation as « optimisation »: rule-based design
• Main functions and design parameters are maintained
Innovation as « identity change »: rule-breaking design
• New functional spaces
• New competencies
• New partnerships
• New business models
11. How to capture revision of identities? A concept-knowledge theory of design
Traditional object definitions:
• « Design specs »: an example of design specs for locomotive engines (1890s)
• Knowledge: methods, judgements, R&D competencies…
In design, objects can be defined by a « design spec »: a list of features (or properties). The designer (individual or group) needs some knowledge specific to each « feature » to be able to implement (or build) it and to handle interactions.
12. Revision of identities as « Dual expansive reasoning »
[Figure: concept expansions and knowledge expansions]
In « innovative design », both design specs and the associated knowledge are « dissolved » and « made to evolve ».
13. C-K design theory: a breakthrough in understanding design
C-K design theory describes innovative design as the interaction and joint expansion of concepts and knowledge.
• Collective reasoning and action on desired, unknown and undecidable objects
• Two spaces for exploring: the space of concepts (arborescent exploration of unfeasible specifications) and the knowledge space (propositions about the world: all kinds of knowledge)
• Operations for identity change: expansive partitions (flying ship, free newspaper, mobile museum, camera-glass, …)
A revival of the design theory field: Yoshikawa, 81; Suh, 91; Braha and Reich, 03; Shai and Reich, 03; Research in Engineering Design, Special Issue on Design Theory (2013), …
Source: Wikipedia; Hatchuel 96; Hatchuel and Weil 99, 02; Kazakci and Tsoukias, 03; Kazakci 07
16. C-K for Innovation Field Mapping
What is the Open Rotor innovation field?
Project with Snecma (Brogard, Joanny, 2010; Chaire TMCI)
[Figure: C-K map with a Concept space and a Knowledge space; knowledge regions labelled "Classic K" and "New K for motorist"]
17. C-K for Innovation Field Mapping
How to go beyond traditional design paths?
– Exploring improvements to classic engines
– Changing plane and flying experience
18. Design strategy analysis for HiggsML challenge teams
[Diagram: C-K map of team strategies; labels "1st", "2nd" and "3rd" appear in the diagram]
C space
• Design for statistical efficiency
– Select a classification method (SVM, Decision Trees, NN, …)
– Pre-processing
– Choose hyper-params
– Train
– Optimize for accuracy
– Monitoring progress with cross-validation
– Ensembles + selecting a cutoff threshold that optimises (or stabilises) AMS
– Integrate AMS directly in training: during gradient boosting (John); during node split in random forest (John)
– Weighted Classification Cascades
– Achieve 5σ!
K space
• Gradient boosting methods fit a classifier to the 'per data point loss' and since AMS is not a sum of per data point (event) losses, it's not obvious how to use AMS as a loss in gradient boosting (Andre Holzner)
• AMS: 3.3 → The node split works by looking for the split that maximises the AMS of one side of the split when predicting it as pure signal (John)
• An alternative may be to « use AUC in gradient boosting till you get to the max cv result and then try to move forward with an AMS loss function from that point »
• « In principle, the AMS approximate function is derivable (http://tinyurl.com/ov5pedq) at a node level (s and b being the totals of other nodes, considered constant, and x, w being the probability prediction and weight for the node to be split) and one could rewrite the part of code where the objective function is evaluated, replacing the sums with a different calculation » (Giulio Casa)
• Two participants observe that AMS can be refactorized and its terms can be rewritten in terms of their convex conjugate form, which allows using the Fenchel-Young inequality from the convex optimization literature. Ref: http://arxiv.org/pdf/1409.2655v2.pdf, Mackey & Brian. Optimization of AMS becomes possible by a procedure they name Weighted Classification Cascades. (Rank: 461st)
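The "selecting a cutoff threshold that optimises AMS" path can be illustrated with a minimal sketch. It assumes the approximate median significance as defined for the HiggsML challenge, AMS = sqrt(2((s + b + b_reg) ln(1 + s / (b + b_reg)) - s)) with b_reg = 10, where s and b are the weighted sums of selected signal and background events. The function names `ams` and `best_cutoff` and the toy data are hypothetical, introduced only for illustration; this is not the code of any participating team.

```python
import math


def ams(s, b, b_reg=10.0):
    """Approximate median significance for weighted signal s and background b.

    b_reg = 10 is the regularisation term assumed here (HiggsML convention).
    """
    inner = (s + b + b_reg) * math.log1p(s / (b + b_reg)) - s
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(2.0 * max(0.0, inner))


def best_cutoff(scores, is_signal, weights):
    """Scan every observed score as a cutoff and keep the one with the highest AMS.

    Events with score >= cutoff are predicted as signal.
    Returns (best AMS value, best cutoff).
    """
    best_value, best_cut = -1.0, None
    for cut in sorted(set(scores)):
        s = sum(w for x, y, w in zip(scores, is_signal, weights) if x >= cut and y)
        b = sum(w for x, y, w in zip(scores, is_signal, weights) if x >= cut and not y)
        value = ams(s, b)
        if value > best_value:
            best_value, best_cut = value, cut
    return best_value, best_cut


# Toy data (hypothetical): classifier scores, true labels, event weights.
scores = [0.1, 0.4, 0.35, 0.8]
is_signal = [False, False, True, True]
weights = [1.0, 1.0, 1.0, 1.0]
value, cutoff = best_cutoff(scores, is_signal, weights)
print(f"best cutoff = {cutoff}, AMS = {value:.3f}")
```

Scanning the cutoff on held-out folds rather than on the training scores would correspond to the "stabilise AMS" variant mentioned on the slide.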
19. Dealing with CIP
[Diagram: C-K map of strategies for dealing with CIP. Source: Data science as a new frontier for design, A. Kazakci, ICED'15 (submitted). The node labels of the C space follow.]
• By adjusting class distribution
– Reduce within-class imbalance
– Reduce between-class imbalance
– Reduce both
– Identifying class distribution: progressive sampling
– Oversampling signals
  – by duplicating
  – by synthesizing new points: SMOTE (Chawla, Bowyer et al., 2002); MSMOTE (Hu et al., 2009); Borderline SMOTE (Han et al., 2005); Adaptive Synthetic Sampling (He et al., 2008); SafeLevel Sampling (Bunkhumpornpat et al., 2008)
– Undersampling the background
  – Resample: each mixture contains all signals + some background, such that all background points are used in at least one mixture
    – Use meta-learning (Chan, Stolfo, 2001)
    – Use SVM ensemble (Yan, Lin et al., 2003)
  – Remove redundant (Kubat, Matwin, 1997)
  – Remove border regions with background examples (Kubat, Matwin, 1997)
  – Reduce overlap
  – Preferential sampling: remove background whose average distance to its 3 NN is smallest (Mani, Zhang, 2003)
• Working in input space
– Re-representing inputs: local distortion; produce an embedding; change spatial resolution (for some X; X is a support vector)
– With raw data
– Feature engineering: exploratory (knowledge or intuition based); automated: Genetic Algorithms (Wasilowski, Chen, 2009)
• By adapting algorithms
– Improve predictive accuracy
– Reduce predictive variance
– Alternative search techniques: non-greedy methods; Genetic Alg.; Simulated Annealing; depth-bound exhaustive; Brute ()
– Detect rare events: TimeWeaver ()
– Discover small disjuncts (Carvalho, Freitas)
– Change evaluation metrics: Laplace estimate; evaluate small disjuncts separately (Quinlan)
– Modify definition of learning
  – Bias induction towards specificity: Max Specificity (Acker, Porter, 1989); Specificity for small disjuncts (Ting, 1989)
  – Minimize error costs: costs are known; unknown costs
– Change levels of learning: cascade of learners; learn only rare class (); two-level learning ()
• Cost-based methods
– Modify base learner
  – Base is a tree learner: split attributes are selected to minimise total expected cost
  – Base is a NN: cost-weighted error propagation
– Relabeling for min expected cost: test data; training data
– Weighting (Ting, 1998); CSC (Witten, Frank, 2005); MetaCost (Domingos, 1999); Costing (Zadrozny et al., 2003)
– Preprocessing: cost-based sampling
– Empirical Threshold Setting
  – Plot total cost for various thresholds; choose min using plot
  – With cross-validation, by choosing less steep hills: Thresholding (Sheng, Ling, 2006)
• Using ensembles
– Using cross-validation
– Cost-sensitive boosting: AdaCost (); CSB (); RareBoost ()
– Imbalanced IVotes ()
– Using sampling to alter weight distribution
  – Boosting: SMOTE Boost (); MSMOTE Boost (); Data Boost-IM (); RUSBoost ()
  – Bagging: Overbagging (); Underbagging (); Under-Over-Bagging ()
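The resampling node of the map ("each mixture contains all signals + some background, such that all background points are used in at least one mixture") can be sketched as follows. This is only one simple way to satisfy that constraint, assuming the background is split into disjoint slices; the function name `balanced_mixtures` and its parameters are hypothetical and do not come from the cited papers.

```python
import random


def balanced_mixtures(signal, background, n_mixtures, seed=0):
    """Build n_mixtures training sets for an ensemble.

    Every mixture contains all signal points plus one slice of the shuffled
    background, so each background point is used in exactly one mixture
    (and therefore in at least one).
    """
    rng = random.Random(seed)
    shuffled = list(background)
    rng.shuffle(shuffled)
    # Slice i takes every n_mixtures-th background point, starting at index i.
    return [list(signal) + shuffled[i::n_mixtures] for i in range(n_mixtures)]


# Toy example (hypothetical data): 2 signal events, 10 background events.
signal = ["s1", "s2"]
background = list(range(10))
mixtures = balanced_mixtures(signal, background, n_mixtures=3)
for m in mixtures:
    print(m)
```

Each mixture would then train one member of the ensemble (for instance the meta-learning or SVM-ensemble variants named on the slide), and the members' predictions would be combined afterwards.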
20. DKCP process: Linearising C-K dynamics
Proven methodology:
- Developed at Mines ParisTech (TMCI) with RATP and Thalès Avionics
- 40+ KCP by researchers (2002-2014)
- 2 PhD projects (Arnoux, 2013; Klasing Chen, in process)
- Now, a network of specialist consultants
Initialisation → [K] Knowledge sharing workshops → [C] IFM-Design workshops → [P] Project building → [RUN]
22. "Fixation effects"
Experiments with 210 subjects (842 propositions)
Three types of solutions cover 81% of the results:
• Slowing the fall
• Protecting the egg
• Damping the shock
Fixations on an object's identity
You got anything better???
23. Determining fixation path using C-K reasoning / Determining expansive path using C-K reasoning
Expanding both in the C-space and in the K-space for the "egg" task
[Figure: C space and K space]
Theory-driven experiments – SIG Design Theory 2012 – M. Cassotti & M. Agogué
24. Result 1: the paths identified as fixation paths using C-K theory are the ones within the fixation effect for adults
Theory-driven experiments – SIG Design Theory 2012 – M.Cassotti & M.Agogué
(1) Natural distribution of solutions of a design task
25. Types of « fixation » based on C-K theory
• Cognitive fixations
• Social fixations
26. Limits of traditional methods for collective creativity
[Figure: axes "Consensus & shared understanding" and "Originality"]
• Participative seminars: fixation phenomena
• Creative commandos: isolation phenomena
→ Classical methods do not allow generating concepts that are both breakthrough and shared!
27. DKCP : Organising for shared breakthrough projects
[Figure: axes "Consensus & shared understanding" and "Originality", with the fixation and isolation phenomena]
A method for steering the breakthrough process
28. DKCP process: Linearising C-K dynamics
Management of the cognitive and social aspects (KCP facilitators)
Innovation effort (participants: 20-50)
• D: Pre-C, Pre-K, project organisation; defining and pre-exploration of K pockets
• K: sharing and integrating K; orientation of phase C
• C: guided creativity
• P: building actionable strategies
Initialisation → [K] Knowledge sharing workshops → [C] IFM-Design workshops → [P] Project building → [RUN]
29. Thank you!
Disclaimer: Copyrights of images belong to their respective owners.
Feel free to contact me for more:
Akın O. Kazakçı
akin.kazakci@mines-paristech.fr