R.B. Benosman,
McGowan Institute,
BST-3, Rm 2046, 3501 Fifth Avenue
Pittsburgh, PA 15213
benosman@pitt.edu
Event-based Neuromorphic Perception and
Computation: The Future of Sensing and AI
1
1959
Russell Kirsch's scan of his infant son: 5 cm by 5 cm (176×176 array), Portland Art Museum.
David Hubel and Torsten Wiesel, 1959. Their publication, entitled "Receptive fields of single neurons in the cat's striate cortex".
Lawrence Roberts’
“Machine perception
of three-dimensional
solids”. Process 2D
photographs to build
up 3D representations
from lines.
1963 1966
MIT Summer Vision Project (Seymour Papert): automatically perform background/foreground segmentation and extract non-overlapping objects.
David Marr, "Vision: A Computational Investigation into the Human Representation and Processing of Visual Information". Vision is hierarchical.
1982
Frank Rosenblatt:
Perceptron
1958
AI hype?
Frank Rosenblatt (1928-1971)
• Books: Marr (1982), Ballard & Brown (1982), Horn (1986)
1990s: Geometry reigns
• Fundamental matrix: Faugeras (1992)
• Normalized 8-point algorithm: Hartley (1997)
• RANSAC for robust fundamental matrix
estimation: Torr & Murray (1997)
• Bundle adjustment: Triggs et al. (1999)
• Hartley & Zisserman book (2000)
• Projective structure from motion: Faugeras and
Luong (2001)
1969
Early applications of image analysis
• Character and digit recognition
• First OCR conference in 1962
• Microscopy, cytology
• Interpretation of aerial images
• Even before satellites!
• Particle physics
• Hough transform for analysis of bubble
chamber photos published in 1959
• Face recognition
• Article about W. Bledsoe
• Fingerprint recognition
Azriel Rosenfeld (1931-2004)
“Father of computer vision”
• Ph.D. in mathematics, Columbia, 1957
• Professor at UMD and ordained rabbi
• Wrote first textbook in the field in 1969
• Oral history (IEEE interview, 1998)
Azriel Rosenfeld
Early applications of
image analysis
1990’
s
Vision is ruled by
Geometry, (projective)
3D reconstruction,
fundamental matrix,
RANSAC, bundle,
2000’
s
Featurebased Pattern
recognition.
Keypoints
3D reconstruction gets
“solved”, generic object
recognition
Historic Timeline
3
• Computer vision has been reinvented at least three times.
• Too close to the market: application-based research.
Tendency to resist novelty, choosing applications over potentially more promising methods that could not yet deliver.
• Not idea-driven.
Topic distribution of submissions for ICCV 2019
From Svetlana Lazebnik, "Computer Vision: Looking Back to Look Forward" (2020)
What Went Wrong?
5
What Went Wrong?
6
• Images are the optimal structure of data
• Grey levels as the source of information
Why Are We Using Images?
7
• Invention of the camera obscura in 1544 (L. Da Vinci?)
• The mother of all cameras
Computer Vision: a Heritage from Art!
8
Origins of Imaging
• Increasing profits: painting faster
• Evolution from portable models for travellers to current
digital cameras
• Evolving from canvas, to paper, to glass, to celluloid, to
pixels
9
Origins of Video: Motion Picture
• Early work in motion-picture projection
• Pioneering work on animal locomotion in 1877 and 1878
• Used multiple cameras to capture motion in stop-motion
photographs
Eadweard Muybridge
(1830-1904)
10
Computer Vision:
the Impossible Trade-Off!
• power vs frame rate
• redundant, useless data
• power- and resource-hungry: need to acquire/transmit/store/process
• motion blur
• displacement between frames
High Power & High Latency
11
Event Acquisition
• Reduce the data load: only detect "meaningful" events, at the time they happen!
• No generic solution.
Scope:
• Avoid burning energy to acquire, transmit and store information that ends up being trashed.
Solutions:
• There is an almost infinite number of ways to extract events.
• They need to be adapted to the dynamics and nature of the data.
12
Event acquisition
• New Information is detected when it happens
• When nothing happens, nothing is sent or processed
• Sparse information coding
[Text excerpt: to overcome these limitations, it is more efficient to sample the values of f(t) just at the exact times at which they occur, namely sampling on the other axis. Figure: two ways to sample a function's values, (a) using a constant scale on the t axis, (b) using a constant scale Δf on the values of f(t); threshold crossings at times α1, β1, α2, β2 emit +1/−1 events Ev(t).]
Popular solution: sample on the amplitude axis of signals.
Time is the most valuable information.
13
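A minimal sketch of this amplitude-axis ("send-on-delta") sampling idea, assuming a densely sampled input signal and an illustrative threshold delta; real event sensors do this per pixel, in continuous time and in analog circuitry:

```python
import numpy as np

def amplitude_sample(signal, times, delta):
    """Emit (t, +1/-1) events whenever the signal has moved by `delta` since the last event."""
    events = []
    reference = signal[0]
    for t, value in zip(times, signal):
        while value - reference >= delta:      # signal increased by one step
            reference += delta
            events.append((t, +1))
        while reference - value >= delta:      # signal decreased by one step
            reference -= delta
            events.append((t, -1))
    return events

# Example: a dense sine wave reduced to a sparse event stream.
t = np.linspace(0.0, 1.0, 10_000)
f = np.sin(2 * np.pi * 3 * t)
ev = amplitude_sample(f, t, delta=0.1)
print(f"{len(t)} samples reduced to {len(ev)} events")
```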
[Excerpt (truncated columns) from a paper on a high-speed dynamic vision sensor and event-by-event processing of its spike events (labeling, tracking and filtering based on recent past event times); all code is open-source (hosted at sourceforge.net). Keywords: address-event, vision sensor, feature extraction, low-level processing.]
[Fig. 5: The TMPDIFF128 camera system with USB2.0 interface. (a) The vision sensor system. (b) The USB hardware and software interface: the vision sensor (TMPDIFF128) sends AEs to the USB interface, which also captures timestamps from a free-running counter at 100 kHz sharing the same 16-bit bus. The timestamped events are buffered by the USB FIFOs to be sent to the host PC, which also buffers the data in USB driver FIFOs, "unwraps" the 16-bit timestamps to 32-bit values, and offers this data to other threads for further processing. The USB chip also uses a serial interface to control the vision sensor biases; flash memory on the USB chip stores persistent bias values.]
15
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 46, NO. 1, JANUARY 2011 259
A QVGA 143 dB Dynamic Range Frame-Free PWM
Image Sensor With Lossless Pixel-Level Video
Compression and Time-Domain CDS
Christoph Posch, Member, IEEE, Daniel Matolin, and Rainer Wohlgenannt
Abstract—The biomimetic CMOS dynamic vision and image
sensor described in this paper is based on a QVGA (304 × 240)
array of fully autonomous pixels containing event-based change
detection and pulse-width-modulation (PWM) imaging circuitry.
Exposure measurements are initiated and carried out locally by
the individual pixel that has detected a change of brightness in
its field-of-view. Pixels do not rely on external timing signals
and independently and asynchronously request access to an
(asynchronous arbitrated) output channel when they have new
grayscale values to communicate. Pixels that are not stimulated
visually do not produce output. The visual information acquired
from the scene, temporal contrast and grayscale data, are com-
municated in the form of asynchronous address-events (AER),
with the grayscale values being encoded in inter-event intervals.
The pixel-autonomous and massively parallel operation ideally
results in lossless video compression through complete temporal
redundancy suppression at the pixel level. Compression factors
depend on scene activity and peak at 1000 for static scenes. Due
to the time-based encoding of the illumination information, very
high dynamic range—intra-scene DR of 143 dB static and 125 dB
at 30 fps equivalent temporal resolution—is achieved. A novel
time-domain correlated double sampling (TCDS) method yields
array FPN of 0.25% rms. SNR is 56 dB (9.3 bit) for 10 Lx
illuminance.
Index Terms—Address-event representation (AER),
biomimetics, CMOS image sensor, event-based vision, focal-plane
processing, high dynamic range (HDR), neuromorphic electronics,
time-domain CDS, time-domain imaging, video compression.
I. INTRODUCTION
BIOLOGICAL sensory and information processing sys-
tems appear to be much more effective in dealing with
real-world tasks than their artificial counterparts. Humans still
outperform the most powerful computers in routine functions
involving, e.g., real-time sensory data processing, perception
tasks and motion control and are, most strikingly, orders of
synchronous information processing. It has been demonstrated
[1]–[3] that modern silicon VLSI technology can be employed
in the construction of biomimetic or neuromorphic artefacts that
mimic biological neural functions. Neuromorphic systems, as
the biological systems they model, process information using
energy-efficient, asynchronous, event-driven methods.
The greatest successes of neuromorphic systems to date have
been in the emulation of peripheral sensory transduction, most
notably in vision. Since the seminal attempt to build a “silicon
retina” by Mahowald and Mead in the late 1980s [4], a variety of
biomimetic vision devices has been proposed and implemented
[5]. In the field of imaging and vision, two observations are
crucial: biology has no notion of a frame, and the world—the
source of most visual information we are interested in—works
in continuous-time and asynchronously. The authors are con-
vinced that biomimetic asynchronous electronics and signal
processing have the potential—also in fields that are histori-
cally dominated by synchronous approaches such as artificial
vision, image sensing and image processing—to reach entirely
new levels of performance and functionality, comparable to
the ones found in biological systems. Future artificial vision
systems, if they want to succeed in demanding applications
such as, e.g., autonomous robot navigation, high-speed motor
control, visual feedback loops, etc. must exploit the power of
the asynchronous, frame-free, biomimetic approach.
Studying biological vision, it has been noted that there exist
two different types of retinal ganglion cells and corresponding
retina–brain pathways in, e.g., the human retina: The “Magno”-
cells are at the basis of what is named the transient channel or the
Magno-cellular pathway. They have short latencies and respond
transiently when changes—movements, onsets, offsets—are in-
volved. The “Parvo”-cells are at the basis of what is called the
sustained channel or the Parvo-cellular pathway. Parvo-cells are
Temporal events and absolute light measurement
16
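The ATIS abstract above notes that grayscale values are encoded in inter-event intervals. A minimal sketch of such a decoder, assuming intensity is inversely proportional to the integration time between the two exposure-measurement events (the calibration constant k is illustrative, not from the paper):

```python
def decode_gray_level(t_first, t_second, k=1.0):
    """Map an inter-event interval to a grey level (illustrative sketch)."""
    dt = t_second - t_first   # integration time between the two threshold crossings
    return k / dt             # longer interval -> darker pixel

# Example: a pixel whose second event arrives 2 ms after the first.
print(decode_gray_level(0.010, 0.012))   # 500.0 (arbitrary units)
```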
Frames vs Events
17
Why Event Based Sensors?
Chaotic pendulum tracking
18
Event Time-based Sensor: Grey-Level Events
19
[Figure caption (truncated): CCAM sensors provide frame-free visual information; the CCAM generates fewer events than the equivalent frame-based camera.]
The number of events depends on the dynamics of the scene. For standard cameras this amount is constant.
Why Event Based Sensors?
20
Event Cameras
21
Event cameras have become a commodity
Event Cameras
22
How to Process Events?
23
How to Process Events?
24
Incremental (time): update the previous result incrementally for the single-pixel change?
Batch (frame): compute the new 1-pixel-change frame and recompute the result using the whole frame?
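An illustrative comparison of the two options (not from the slides), using a running image statistic, here the mean intensity, updated as single-pixel events arrive:

```python
import numpy as np

def batch_update(image, x, y, new_value):
    """Frame-style processing: apply the change, then recompute from scratch."""
    image[y, x] = new_value
    return image.mean()                      # O(width * height) per event

def incremental_update(current_mean, image, x, y, new_value):
    """Event-style processing: fold the single-pixel change into the result."""
    n = image.size
    delta = new_value - image[y, x]
    image[y, x] = new_value
    return current_mean + delta / n          # O(1) per event

img = np.zeros((4, 4))
mean = img.mean()
mean = incremental_update(mean, img, x=1, y=2, new_value=8.0)
assert np.isclose(mean, batch_update(img.copy(), 1, 2, 8.0))
```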
Figure 3: Sample results from our method. The columns depict raw events, time manifold, result without manifold regularisation and finally with our manifold regularisation. Notice the increased contrast in weakly textured regions (especially around the edge of the monitor).
…the results of [1]. We point out that no ground truth data is available so we are limited to purely qualitative comparisons.
In Fig. 4 we show a few images from the sequences. Since we are dealing with highly dynamic data, we point the reader to the included supplementary video, which shows whole sequences of several hundred frames.
Figure 4: Comparison to the method of [1]. The first row shows the raw input events that have been used for both methods. The second row depicts the results of Bardow et al., and the last row shows our result. We can see that our method produces more details (e.g. face, beard) as well as more graceful gray value variations in untextured areas, where [1] tends to produce a single gray value.
4.4 Comparison to Standard Cameras
We have captured a sequence using a DVS128 camera as well as a Canon EOS60D DSLR camera to compare the fundamental differences of traditional cameras and event-based cameras. As already pointed out by [1], rapid movement results in motion blur for conventional
(Supplementary video: https://www.youtube.com/watch?v=rvB2URrGT94)
Figure 2: Block diagram of the proposed approach. The output of the event camera is collected into frames over a specified time interval T, using a separate channel depending on the event polarity (positive and negative). The resulting synchronous event frames are processed by a ResNet-inspired network, which produces a prediction of the steering angle of the vehicle.
…without resorting to partitioning the solution space; the angles produced by our network can take any value, not just discrete ones, in the range [−180°, 180°]. Moreover, in contrast to previous event-based vision learning works which use small datasets, we show results on the largest and most challenging (due to scene variability) event-based dataset to date.
3. Methodology
Our approach aims at predicting steering wheel commands from a forward-looking DVS sensor [1] mounted on a car. As shown in Fig. 2, we propose a learning approach that takes as input the visual information acquired by an event camera and outputs the vehicle's steering angle. The events are converted into event frames by pixel-wise accumulation over a constant time interval. Then, a deep neural network maps the event frames to steering angles by solving a regression task. In the following, we detail the different …
… and negative events. The histogram for positive events is
$$h^{+}(x, y) \doteq \sum_{t_k \in T,\; p_k = +1} \delta(x - x_k,\, y - y_k), \qquad (1)$$
where $\delta$ is the Kronecker delta, and the histogram $h^{-}$ for the negative events is defined similarly, using $p_k = -1$. The histograms $h^{+}$ and $h^{-}$ are stacked to produce a two-channel event image. Events of different polarity are stored in different channels, as opposed to a single channel with the balance of polarities ($h^{+} - h^{-}$), to avoid information loss due to cancellation in case events of opposite polarity occur in the same pixel during the integration interval T.
3.2. Learning Approach
3.2.1. Preprocessing. A correct normalization of input and output data is essential for reliably training any neural network. Since roads are almost always straight, the steering angle's distribution of a driving car is mainly peaked in [−5°, 5°]. This unbalanced distribution results in a biased regression. In addition, vehicles frequently stand still be-
Event Computation
VS
25
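A minimal sketch of the event-frame construction in Eq. (1) of the excerpt above: events of each polarity are accumulated into a two-channel histogram over the integration interval T (event fields and array shapes are illustrative):

```python
import numpy as np

def events_to_frame(events, width, height):
    """events: iterable of (x, y, t, polarity) with polarity in {+1, -1}."""
    frame = np.zeros((2, height, width), dtype=np.float32)
    for x, y, t, p in events:
        channel = 0 if p > 0 else 1        # channel 0: h+, channel 1: h-
        frame[channel, y, x] += 1.0        # Kronecker delta accumulation
    return frame

# Usage: three events inside one interval, two positive and one negative.
evs = [(3, 2, 0.001, +1), (3, 2, 0.004, +1), (5, 1, 0.007, -1)]
f = events_to_frame(evs, width=8, height=4)
assert f[0, 2, 3] == 2 and f[1, 1, 5] == 1
```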
Applications: Event Stereovision
• Matching pixels is hard
• Changing lighting conditions, occlusions, motion blur…
26
• Matching binocular events only using the time of events
• Two events arriving at the same time and fulfilling geometric
constraints are matched
Event Stereovision
P. Rogister, R.B. Benosman, S-H. Ieng, P. Lichtsteiner, T. Delbruck, Asynchronous Event-based Binocular Stereo Matching, 2011 December 23, IEEE
Transactions on Neural Networks 23(2): 347-353, DOI: 10.1109/TNNLS.2011.2180025
27
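A minimal sketch of the time-based matching idea on this slide: a left and a right event are paired if they are nearly coincident in time and satisfy a geometric constraint. The rectified-camera (same-row) epipolar test and the tolerances are simplifying assumptions, not the exact procedure of the cited papers:

```python
def match_events(left_events, right_events, dt_max=1e-4, row_tol=1):
    """Events are (x, y, t, polarity); returns matched (left, right) pairs."""
    matches = []
    for le in left_events:
        candidates = [
            re for re in right_events
            if abs(re[2] - le[2]) <= dt_max          # co-occurrence in time
            and abs(re[1] - le[1]) <= row_tol        # epipolar line (rectified)
            and re[3] == le[3]                       # same polarity
        ]
        if candidates:
            # keep the temporally closest candidate
            matches.append((le, min(candidates, key=lambda re: abs(re[2] - le[2]))))
    return matches
```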
S.H. Ieng, J. Carneiro, M. Osswald, R.B. Benosman, Neuromorphic Event-Based Generalized Time-based Stereovision, 2018 July 18, Frontiers in
Neuroscience, 12(442), DOI: 10.3389/fnins.2018.00442
Event Stereovision
28
Courtesy Chronocam 2016
29
Visual Odometry
Courtesy Chronocam 2016
30
When events are transmitted off-chip, they are time-stamped and then transmitted to a computer using a standard connection.
EVENT-BASED VISUAL MOTION FLOW
The stream of events from the silicon retina can be mathematically defined as follows: let $e(\mathbf{p}, t) = (\mathbf{p}, t)^T$ be a triplet giving the position $\mathbf{p} = (x, y)^T$ and the time $t$ of an event. We can then define the function $\Sigma_e$ that maps to each $\mathbf{p}$ the time $t$:
$$\Sigma_e : \mathbb{R}^2 \to \mathbb{R}, \quad \mathbf{p} \mapsto t = \Sigma_e(\mathbf{p}). \qquad (1)$$
Time being an increasing function, $\Sigma_e$ is then a monotonically increasing surface.
[DVS pixel figure, fragmentary: the differencing circuit removes DC and, due to the logarithmic conversion in the photoreceptor, the pixel is sensitive to temporal contrast; the DVS sensor has 128 by 128 pixels, and ON and OFF spikes (adapted from Lichtsteiner et al.) are generated when the pixel voltage Vp crosses the change thresholds, from which the evolution of the signal can be followed.]
[Fig. 2: General principle of visual flow computation; the surface of active events $\Sigma_e$ is derived to provide an estimation of orientation and amplitude of motion.]
We then set the first partial derivatives with respect to the parameters as $\Sigma_{e_x} = \partial\Sigma_e/\partial x$ and $\Sigma_{e_y} = \partial\Sigma_e/\partial y$. We can then write $\Sigma_e$ as:
$$\Sigma_e(\mathbf{p} + \Delta\mathbf{p}) = \Sigma_e(\mathbf{p}) + \nabla\Sigma_e^T \Delta\mathbf{p} + o(\lVert\Delta\mathbf{p}\rVert), \qquad (2)$$
with $\nabla\Sigma_e = (\partial\Sigma_e/\partial x, \partial\Sigma_e/\partial y)^T$.
The partial functions of $\Sigma_e$ are functions of a single variable, whether $x$ or $y$. Time being a strictly increasing function, $\Sigma_e$ is a surface with nonzero derivatives at any point. It is then possible to use the inverse function theorem to write, around a location $\mathbf{p} = (x, y)^T$:
$$\frac{\partial\Sigma_e}{\partial x}(x, y_0) = \frac{d\Sigma_e|_{y=y_0}}{dx}(x) = \frac{1}{v_x(x, y_0)}, \qquad \frac{\partial\Sigma_e}{\partial y}(x_0, y) = \frac{d\Sigma_e|_{x=x_0}}{dy}(y) = \frac{1}{v_y(x_0, y)}, \qquad (3)$$
$\Sigma_e|_{x=x_0}$ and $\Sigma_e|_{y=y_0}$ being $\Sigma_e$ restricted respectively to $y$ and $x$. The gradient $\nabla\Sigma_e$ can then be written:
$$\nabla\Sigma_e = \left(\frac{1}{v_x}, \frac{1}{v_y}\right)^T,$$
which provides the inverse of the pixellic velocity of events.
For an incoming event: form the surface of event times; we then have the flow from the local gradient.
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 25, NO. 2, FEBRUARY 2014 407
Event-Based Visual Flow
Ryad Benosman, Charles Clercq, Xavier Lagorce, Sio-Hoi Ieng, and Chiara Bartolozzi
Abstract—This paper introduces a new methodology to com-
pute dense visual flow using the precise timings of spikes from
an asynchronous event-based retina. Biological retinas, and their
artificial counterparts, are totally asynchronous and data-driven
and rely on a paradigm of light acquisition radically different
from most of the currently used frame-grabber technologies.
This paper introduces a framework to estimate visual flow from
the local properties of events’ spatiotemporal space. We will
show that precise visual flow orientation and amplitude can
be estimated using a local differential approach on the surface
defined by coactive events. Experimental results are presented;
they show the method adequacy with high data sparseness and
temporal resolution of event-based acquisition that allows the
computation of motion flow with microsecond accuracy and at
very low computational cost.
Index Terms—Event-based vision, event-based visual motion
flow, neuromorphic sensors, real time.
I. INTRODUCTION
RECENT work on the analysis of neural activity has shown
that each spike arrival time is reliable [1]–[4]. However,
the extent to which the precise timing of neural spikes down
to millisecond precision is significant for computation is still a
matter of debate. In this paper, we address this issue by focus-
ing on the computational principles that could be operated
by motion-sensitive neurons of the visual system to compute
the visual flow. We bring together new experimental sensors
delivering truly naturalistic precise timed visual outputs and a
new mathematical method that allows to compute event-based
visual motion flow using each incoming spike’s timing as main
computation feature. This presented method does not rely on
gray levels, nor on the integration of activity over long time
intervals. It uses each relative timing of changes of individual
information in various demanding, machine vision applica-
tions [9]–[15]. The microsecond temporal resolution and the
inherent redundancy suppression of the frame-free, event-
driven acquisition and subsequent representation of visual
information employed by these cameras enables to derive a
novel methodology to process the visual information at a very
high speed and with low computational cost.
Visual flow is a topic of several research fields that has
been intensively studied since the early days of computational
neuroscience. It is widely used in artificial vision and essential
in navigation. Visual flow is known to be an ill-posed noisy
visual measure limited by the aperture problem. Its use in real-
time applications on natural scenes is generally difficult. It is
usually computed sparsely on high salient points.
Visual flow techniques are commonly classified under one of
the four major categories.
1) Energy-based or frequency-based methods estimate opti-
cal flow from the output of the velocity-tuned filters
designed in the Fourier domain [16]–[18].
2) Phase-based methods estimate image velocity in terms
of band-pass filter outputs [19].
3) Correlation-based or region-matching methods search
for a best match of small spatial neighborhoods between
adjacent frames [20]–[25].
4) Gradient-based or differential methods use spatiotem-
poral image intensity derivatives and an assumption of
brightness constancy [26]–[28].
Energy-based techniques are slow [19] and are not adequate
for real-time applications where gradient-based approaches
perform better, as they rely on correlations. Visual flow
31
Event Flow
32
• High temporal resolution generates smooth space-time
surfaces
• The slope of the local surface contains the orientation and
amplitude of the optical flow
Motion estimation: optical flow
Event Flow
33
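A minimal sketch of this idea: fit a local plane to the surface of event times around an incoming event; by Eq. (3) of the excerpt, the plane's slopes are the inverse velocities, so the flow is their reciprocal. The window size and least-squares fit are illustrative choices, not the paper's exact procedure:

```python
import numpy as np

def local_flow(time_surface, x, y, radius=2):
    """time_surface[y, x] holds the latest event time at each pixel (or NaN)."""
    patch = time_surface[y - radius:y + radius + 1, x - radius:x + radius + 1]
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    valid = ~np.isnan(patch)
    if valid.sum() < 3:
        return None                                   # not enough neighbours
    # Fit t = a*x + b*y + c in the least-squares sense.
    A = np.column_stack([xs[valid], ys[valid], np.ones(valid.sum())])
    (a, b, _), *_ = np.linalg.lstsq(A, patch[valid], rcond=None)
    vx = 1.0 / a if abs(a) > 1e-9 else 0.0            # slope = 1 / velocity
    vy = 1.0 / b if abs(b) > 1e-9 else 0.0
    return vx, vy
```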
Tracking Real-Time Outdoor Scenes
Z. Ni, S.H. Ieng, C. Posch, S. Regnier, R.B. Benosman, Visual Tracking using Neuromorphic Asynchronous Event-based Cameras, 24
February 2015 Neural Computation 27(4) :925-53, DOI : 10.1162/NECO-a-00720
35
…that errors are more equally shared between the van and the car thanks to the more reliable event attribution.
Figure 16: The events generated by the van and the car during 3 s are fitted into two planes, denoted as P1 in blue and P2 in red; n1 and n2 are the surface normals.
Tracking Real-Time Outdoor Scenes
Z. Ni, S.H. Ieng, C. Posch, S. Regnier, R.B. Benosman, Visual Tracking using Neuromorphic Asynchronous Event-based Cameras, 24
February 2015 Neural Computation 27(4) :925-53, DOI : 10.1162/NECO-a-00720
36
Event-Based Tracking with a Moving Camera
Z. Ni, S.H. Ieng, C. Posch, S. Regnier, R.B. Benosman, Visual Tracking using Neuromorphic Asynchronous Event-based Cameras, 24
February 2015 Neural Computation 27(4) :925-53, DOI : 10.1162/NECO-a-00720
37
Event-Based 3D Tracking and Pose Estimation
D. Reverter-Valeiras, G. Orchard, S.H. Ieng, R.B. Benosman, Neuromorphic Event-Based 3D Pose Estimation, 2016 January 22, Frontiers in Neuroscience,
9(522), DOI: 10.3389/fnins.2015.00522
38
Low Power and Latency Streaming
39
Asynchronous Event-Based Fourier Analysis
Q. Sabatier, S.H. Ieng, R.B. Benosman, Asynchronous Event-based Fourier Analysis, 2017 February 6, IEEE Transactions on Image
Processing 26 (5) :2192-2202, DOI : 10.1109/TIP.2017.2661702
40
Last Two Decades: Rethinking Computer Vision in the Time Domain
[Figure panels: (a) event-driven time-based vision sensor (ATIS or DVS); (b) ON and OFF events from the sensor over the X, Y (spatial) axes; (f) time surface (context amplitude / surface amplitude).]
Deep Temporal Learning: Time Surfaces
Time Surface
Pixels
X. Lagorce, G. Orchard, F. Galluppi, B.E. Shi, R.B. Benosman, HOTS: A Hierarchy Of event-based Time-Surfaces for pattern recognition, 2016
July 11, IEEE Transactions on Pattern Analysis and Machine Intelligence 39(7): 1346-1359, DOI: 10.1109/TPAMI.2016.2574707
42
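A minimal sketch of a time surface with an exponential kernel, in the spirit of the HOTS reference above: each pixel stores its latest event time, and the surface decays exponentially with the time elapsed since that event (array sizes and tau are illustrative):

```python
import numpy as np

class TimeSurface:
    def __init__(self, width, height, tau=0.05):
        self.tau = tau
        self.last_t = np.full((height, width), -np.inf)   # latest event time per pixel

    def update(self, x, y, t):
        self.last_t[y, x] = t

    def surface(self, t_now):
        """Exponentially decayed view of past activity at time t_now."""
        return np.exp(-(t_now - self.last_t) / self.tau)

ts = TimeSurface(width=4, height=4, tau=0.05)
ts.update(1, 1, t=0.00)
ts.update(2, 1, t=0.04)
s = ts.surface(t_now=0.05)        # the more recent pixel has the higher value
assert s[1, 2] > s[1, 1]
```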
[Fig. 4 (HOTS hierarchical model): From left to right, a moving digit (a) is presented to the ATIS camera (b), which produces ON and OFF events (c) that are fed into Layer 1. The events are convolved with exponential kernels (d) to build event contexts from a spatial receptive field of side-length (2R1 + 1). These contexts are clustered into N1 features (e). When a feature is matched, it produces an event (f). Events from the N1 features constitute the output of the layer (g). Each layer k takes input from its previous layer and feeds the next by reproducing steps (d)-(g), considering a receptive field of side-length (2Rk + 1) around each pixel; the event contexts are compared to the different features of that layer (represented as surfaces in the gray boxes).]
Layer 1 pipeline: Stimulus → Event camera (ATIS or DVS) → ON/OFF events in the spatio-temporal domain → time context via exponential kernels (time constant τ1) → time surface → clustering into features.
43
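A minimal sketch of one layer of such a hierarchy, following the caption above: the time-surface context around an incoming event is compared to a set of prototype surfaces ("features"), the closest prototype is nudged toward the context (simple online clustering), and a feature event is emitted. The update rule and learning rate are simplifications, not the paper's exact algorithm:

```python
import numpy as np

class HOTSLayer:
    def __init__(self, n_features, context_size, lr=0.05, rng=None):
        rng = rng or np.random.default_rng(0)
        self.prototypes = rng.random((n_features, context_size * context_size))
        self.lr = lr

    def process(self, context):
        """context: flattened local time surface around the incoming event."""
        d = np.linalg.norm(self.prototypes - context, axis=1)
        k = int(np.argmin(d))                       # best-matching feature
        self.prototypes[k] += self.lr * (context - self.prototypes[k])
        return k                                    # emitted feature index

layer = HOTSLayer(n_features=8, context_size=5)
feature_event = layer.process(np.random.default_rng(1).random(25))
```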
[Layer 2 of the hierarchy: the same steps (d)-(g) applied to the Layer-1 feature events, with a receptive field of side-length (2R2 + 1) and time constant τ2; the event contexts are clustered into Layer-2 features.]
44
[Layer 3: the same steps repeated with a receptive field of side-length (2R3 + 1) and time constant τ3; clustering into Layer-3 features.]
45
…normalized, and Bhattacharyya distances respectively. The recognized class is the one with the smallest bar in each column, and is marked with a star. Each character is presented once to the sensor in the order we gave to the classes, so that perfect results correspond to stars only on the main diagonal. In this experiment, all distances lead to a recognition performance of 100 percent.
We ran a cross-validation test by randomly choosing for each pattern which presentation would be used for learning (both the model and classifier); the other is then used for testing. Every trial amongst the hundred we ran gave a recognition accuracy of 100 percent.
This experiment is run on a dataset composed of objects with very distinct spatial characteristics. Because of that, it is the best one to try to interpret the features learnt by the architecture. Fig. 11 presents the activation of all three …
… experiment, we use a dataset consisting of the faces of seven subjects. Each subject is moving his or her head to follow the same visual stimulus tracing out a square path on a computer monitor (see Fig. 12). The data is acquired by an ATIS camera [22]. Each subject has 20 presentations in the dataset, of which one is used for training and the others for testing.
We again use the same hierarchical system described previously with the following parameters: R1 = 4; t1 = 50 ms; N1 = 8; KR = 2; Kt = 5; KN = 2. Because the faces are bigger and more complex objects, we …
Fig. 9. Letters & Digits experiment: Pattern signatures for some of the input classes. For each letter and digit the trained histogram used as a signature by the classifier is shown. The snapshot shows an accumulation of events from the sensor (white dots for ON events and black dots for OFF events). The histograms present the signatures: the X-axis is the index of the feature, the Y-axis is the number of activations of the feature during the stimulus presentation. The signatures of all the letters & digits are presented in the supplemental material, available online.
Deeper
46
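A minimal sketch of the signature classification described in the excerpt: each class keeps a trained histogram of feature activations, and a test signature is assigned to the class with the smallest histogram distance, here the Bhattacharyya distance (names and data are illustrative):

```python
import numpy as np

def bhattacharyya(p, q):
    p = p / p.sum()
    q = q / q.sum()
    return -np.log(np.sum(np.sqrt(p * q)) + 1e-12)

def classify(signature, class_histograms):
    """class_histograms: dict mapping class label -> trained activation histogram."""
    return min(class_histograms, key=lambda c: bhattacharyya(signature, class_histograms[c]))

trained = {"A": np.array([9., 1., 0., 2.]), "B": np.array([1., 8., 3., 0.])}
print(classify(np.array([10., 2., 1., 1.]), trained))   # -> "A"
```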
Deep Temporal Learning with Adaptive Temporal Feedback:
Temporal Surfaces
X. Lagorce, G. Orchard, F. Galluppi, B.E. Shi, R.B. Benosman, HOTS: A Hierarchy Of event-based Time-Surfaces for pattern recognition, 2016
July 11, IEEE Transactions on Pattern Analysis and Machine Intelligence 39(7): 1346-1359, DOI: 10.1109/TPAMI.2016.2574707
47
Computation Platforms?
Two tendencies:
Biomimetism | Understand and Accelerate
49
From: Goi, E., Zhang, Q., Chen, X. et al. Perspective on photonic memristive neuromorphic computing. PhotoniX 1, 3 (2020).
https://doi.org/10.1186/s43074-020-0001-6
Computation Platforms?
Replicate
50
https://wiki.ebrains.eu/bin/view/Collabs/
neuromorphic/BrainScaleS/
~4 million bio-realistic neurons
880 million learning synapses
10⁵ faster than real time
164 Giga-events/s (normal)
1 Tera-event/s (burst)
several hundreds of kW [2010]
Analog vs Digital
SpiNNaker
1 million processors
200 million million actions per second
51
Neuromorphic Computing
[from The Scientist, 2019]
Today’s digital neuromorphic
hardware
52
A Processing Solution Adapted to Event Data
[Block diagram: event inputs from vision, LIDAR, radar, audio and IMU (analog and digital) feed Communication AER modules, an Event-based Processing block, an Event-based Machine Learning module, memory and I/O.]
• Architecture for event-per-event processing: programmable event-based computation and machine learning
• Industry-standard I/O and programmability
• Scalable, enabling fast and cost-effective derivatives
• <10-100 mW and up to 20 GigaEvents/s processing
SPEAR-1
Understand
53
https://www.summerrobotics.ai
VII. RESULTS
A. Experiment
The method is applied on a scene containing two objects placed around one meter away from the system. We used an SPSM of 20×30 symbols and the pattern codification is done using the duty cycle of the signal. The neighborhood in the burst filter is set to 1 (3×3 pixels) and the signal's frequency is set to 20 Hz for performance reasons. The algorithm runs in MATLAB and, even though it could be heavily parallelized, this was not done for this experiment and real time was not achieved. An image of accumulated outputs from the filters is given in Fig. 9; colors in the image reflect the estimation of the duty cycle by each pixel's associated filter. Point correspondence is performed by extracting the 3×3 duty-cycle-based codewords and, as the pattern's dots are bigger than a single pixel, we perform a spatial averaging to obtain a sub-pixel position of the dot in the camera frame. After triangulation, a 3D point cloud is obtained (Fig. 10).
Fig. 9. Image of the projected pattern as extracted by the algorithm. Colors correspond to the estimated duty cycle.
Fig. 10. 3D point cloud.
VIII. CONCLUSION AND DISCUSSION
This paper introduced a methodology that makes use of the high temporal resolution of the event-based sensor to rethink the problem of structured-light depth reconstruction. We used a unique spatial distribution of light patterns composed of elements each flickering at a unique frequency. We also used the idea of a random shift allowing each neighborhood to be unique. Each methodology relying on the combined use of an event-based camera and a light projector coding each spatial position in the frequency domain can make use of the developed approach. There are several ways to decode frequencies from the output of the event-based camera. The method used here could be bettered, using a more precise event-based camera allowing the frequency to be extracted from the timing of the oscillation of each pattern from inter-spike information. If that condition is fulfilled, even simpler patterns could be used without the need to rely on unique neighborhoods.
Fig. 11. Reconstructed depth.
Fig. 3. Experimental setup.
III. PATTERN CODIFICATION STRATEGIES
To get the best out of the ATIS sensor, it is mandatory to choose a suitable pattern codification. In the event domain, the only information available is the polarity and timestamp of pixels, thus the correspondence problem can't be solved by color or grey-level coding. Moreover, the need for an asynchronous system imposes that our pattern can be segmented into small independent patches that we can use to extract local information only when it is needed. This constraint has a direct impact on the spatial organisation of the pattern, and direct methods like the one used in ?? must be discarded.
In the next section we expand on three different methods that could be used to gather local data independently as well as being decodable without too much computation from the sensor's output, to ensure a decent reconstruction speed. The first method we develop is based on the well-known time-multiplexing binary coding, but encodes the binary frames as a signal with a set frequency and duty cycle; each element is then part of a spatial codification that can be decoded locally. Our second method aims to make use of the spatio-temporal continuity of events generated from fast-moving objects to create codewords based on the spatial orientation of multiple moving dots. This approach differs from the first only in the way information is extracted from the sensor; the decoding process is done the same way. Lastly, we developed an approach based on phase-shifting algorithms, but instead of evaluating the phase from several shifted patterns, we continuously move a series of stripes and record the precise time difference between stripe positions from an arbitrary origin, which gives us a robust measurement. This last method is …
Fig. 4. Structured light principle. A coded light pattern is projected onto the scene, which is observed by a camera. Decoding the pattern allows the matching of paired points in the two views (p1, p2) and, by triangulation of the rays coming from the optical centers C1 and C2, the position of the real point P is found. Colors in the picture are for clarity only and in our case refer to different binary signal duty cycles for the first method or different motion orientations for the second. [Diagram labels: projector plane, camera plane, scene, C1, C2, p1, p2, P.]
…instead of being computed through a sequence of intensity values. Fig. ?? shows the pattern's spatial organisation; each color refers to a specific duty cycle. It uses a De Bruijn sequence that is repeated for each line. Ideally we should have projected full lines instead of small spots, but practical limitations limited our choices. A periodic projection generates, at the pixel level, an event stream composed of successive ON and OFF event bursts following the signal's periodic behavior. The length and the number of events constituting each burst is defined by the contrast and illumination conditions as well as the set of biases used to operate the sensor.
In a perfect case, each edge of the signal would trigger only one event of the corresponding polarity for each pixel. In this situation, measuring the inter-event time between successive ON and OFF events would be enough to get a correct estimation of the signal's frequency and duty cycle. However, because of small imprecisions in the attribution of timestamps by the sensor's arbiter and possible variations of the global illumination, the number of events triggered at each edge of the signal is not consistent. To ensure at least one event per period, we have two choices: increase the senso-
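A minimal sketch of the decoding idea in the excerpt above, assuming the ideal one-event-per-edge case: per-pixel ON/OFF event times give the flicker period (hence frequency) and the duty cycle. The event format and averaging are simplifications:

```python
def frequency_and_duty_cycle(events):
    """events: time-sorted list of (t, polarity), polarity +1 = ON, -1 = OFF."""
    on_times = [t for t, p in events if p > 0]
    off_times = [t for t, p in events if p < 0]
    if len(on_times) < 2 or not off_times:
        return None
    period = (on_times[-1] - on_times[0]) / (len(on_times) - 1)  # mean ON-to-ON gap
    # high time = delay from each rising edge to the next falling edge
    highs = [min(off - on for off in off_times if off > on)
             for on in on_times if any(off > on for off in off_times)]
    duty = sum(highs) / len(highs) / period
    return 1.0 / period, duty

evs = [(0.000, +1), (0.015, -1), (0.050, +1), (0.065, -1), (0.100, +1)]
print(frequency_and_duty_cycle(evs))   # ~ (20 Hz, 0.3) for this toy 20 Hz signal
```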
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
of light. By mixing the received signal
of pt (t) at the in-phase receiver of
nd by low-pass filtering the resulting
frequency component fc is removed,
l is obtained:
2παTdt + 2πfcTd − παT 2
d . (4)
of the baseband signal is proportional
d, its spectrum shows peaks at certain
ding to the distance between the radar
bjects (the receive signal being, in fact,
of the form given by (4) when multiple
fter the analog-to-digital conversion by
discrete-time signal r[i] is found as
i] = r(t = iTf ). (5)
ADC sampling period Tf impacts the
age dmax of the radar [21]
dmax =
c
2αTf
. (6)
ned as follows:
α =
Tc
(7)
duration and is the bandwidth of the
he range resolution [21]
dres =
c
2
. (8)
ange profile Rn[k] is defined as the
n the magnitude of Rn[k] indicate the
hile the phase evolution of Rn[k∗
] for a
between successive chirps represents the
TABLE I
DATASET CONTENT USED IN THE DEMONSTRATION EXPERIMENTS
Fig. 1. Gesture acquisition setup.
making them more sensitive to proper preprocessing and
feature extraction.
The radar parameters were set as follows: the number of
ADC samples per chirp is 512, the number of chirps per frame
is 192, and the time between chirps is Ts = 1.2 ms, while the
chirp duration Tc is 41 µs. Therefore, the total duration for
a frame capture is 238 ms and Tf = 80 ns. Fig. 1 shows
the gesture acquisition setup with the antennas and the radar
Radar and LIDAR and much more…
54
Sight Restoration: Prosthetics and Optogenetics
• Development of Retina Stimulation Goggles
• 3 generations of Retina Prosthetics
• Asynchronous Retina Stimulation: Prosthetics and
Optogenetics
IRIS 2
PRIMA
55
Sight Restoration: Prosthetics and Optogenetics
56
& much more…
• Decision making: game theory, stock market
• Low-power online decoding and classification
• Robotics
• ADAS
• Sensory substitution
• Always-on sensing
• Space awareness
57
Conclusions
?
Where we came from…
A Chance to Move Perception from
Engineering to Basic Science!
observe → understand → model → application
A Chance to ground Perception in Basic
Science!
• A whole new world to explore
• A deep paradigm shift for Sensing & AI
• Novel sensors to build
• New adapted processing architectures to design
58
Conclusions
• A whole new world to explore
• A deep paradigm shift for Sensing & AI
• Novel sensors to build
• New adapted processing architectures to design
Physiology
- Models
- Hardware
- Prosthetics
- Robotics
- Computation
59
 
Introduction to Binocular Stereo in Computer Vision
Introduction to Binocular Stereo in Computer VisionIntroduction to Binocular Stereo in Computer Vision
Introduction to Binocular Stereo in Computer Visionothersk46
 
Ivan Sutherland - A pioneer in Human Computer Interaction
Ivan Sutherland - A pioneer in Human Computer InteractionIvan Sutherland - A pioneer in Human Computer Interaction
Ivan Sutherland - A pioneer in Human Computer InteractionAtul Narkhede
 
Fit for Purpose MCN 2011 mjw
Fit for Purpose MCN 2011 mjwFit for Purpose MCN 2011 mjw
Fit for Purpose MCN 2011 mjwwachowiakm
 
A historical introduction to deep learning: hardware, data, and tricks
A historical introduction to deep learning: hardware, data, and tricksA historical introduction to deep learning: hardware, data, and tricks
A historical introduction to deep learning: hardware, data, and tricksBalázs Kégl
 
120_SEM_Special_Topics.ppt
120_SEM_Special_Topics.ppt120_SEM_Special_Topics.ppt
120_SEM_Special_Topics.pptzaki194502
 
Life and Work of Ivan Sutherland | Turing100@Persistent
Life and Work of Ivan Sutherland | Turing100@PersistentLife and Work of Ivan Sutherland | Turing100@Persistent
Life and Work of Ivan Sutherland | Turing100@PersistentPersistent Systems Ltd.
 
Scratch to Supercomputers: Bottoms-up Build of Large-scale Computational Lens...
Scratch to Supercomputers: Bottoms-up Build of Large-scale Computational Lens...Scratch to Supercomputers: Bottoms-up Build of Large-scale Computational Lens...
Scratch to Supercomputers: Bottoms-up Build of Large-scale Computational Lens...inside-BigData.com
 
Image compression using fractal functions
Image compression using fractal functionsImage compression using fractal functions
Image compression using fractal functionskanimozhirajasekaren
 
Semantic Mapping of Road Scenes
Semantic Mapping of Road ScenesSemantic Mapping of Road Scenes
Semantic Mapping of Road ScenesSunando Sengupta
 
Bruce Damer's talk on the Von Neumann Bottleneck to the Singularity Universit...
Bruce Damer's talk on the Von Neumann Bottleneck to the Singularity Universit...Bruce Damer's talk on the Von Neumann Bottleneck to the Singularity Universit...
Bruce Damer's talk on the Von Neumann Bottleneck to the Singularity Universit...Bruce Damer
 

Similaire à “Event-Based Neuromorphic Perception and Computation: The Future of Sensing and AI,” a Keynote Presentation from Ryad Benosman (20)

AI is Impacting HPC Everywhere
AI is Impacting HPC EverywhereAI is Impacting HPC Everywhere
AI is Impacting HPC Everywhere
 
Practical computer vision-- A problem-driven approach towards learning CV/ML/DL
Practical computer vision-- A problem-driven approach towards learning CV/ML/DLPractical computer vision-- A problem-driven approach towards learning CV/ML/DL
Practical computer vision-- A problem-driven approach towards learning CV/ML/DL
 
La latent
La latentLa latent
La latent
 
Introduction to Binocular Stereo in Computer Vision
Introduction to Binocular Stereo in Computer VisionIntroduction to Binocular Stereo in Computer Vision
Introduction to Binocular Stereo in Computer Vision
 
Optical CT
Optical CTOptical CT
Optical CT
 
Ivan Sutherland - A pioneer in Human Computer Interaction
Ivan Sutherland - A pioneer in Human Computer InteractionIvan Sutherland - A pioneer in Human Computer Interaction
Ivan Sutherland - A pioneer in Human Computer Interaction
 
07 Network Visualization
07 Network Visualization07 Network Visualization
07 Network Visualization
 
Fit for Purpose MCN 2011 mjw
Fit for Purpose MCN 2011 mjwFit for Purpose MCN 2011 mjw
Fit for Purpose MCN 2011 mjw
 
Hci history
Hci historyHci history
Hci history
 
A historical introduction to deep learning: hardware, data, and tricks
A historical introduction to deep learning: hardware, data, and tricksA historical introduction to deep learning: hardware, data, and tricks
A historical introduction to deep learning: hardware, data, and tricks
 
Computer Vision Workshop
Computer Vision WorkshopComputer Vision Workshop
Computer Vision Workshop
 
RJW CCAE Course 1977-79
RJW CCAE Course 1977-79RJW CCAE Course 1977-79
RJW CCAE Course 1977-79
 
120_SEM_Special_Topics.ppt
120_SEM_Special_Topics.ppt120_SEM_Special_Topics.ppt
120_SEM_Special_Topics.ppt
 
Medical x ray image sensors
Medical x ray image sensorsMedical x ray image sensors
Medical x ray image sensors
 
upload3.pptx
upload3.pptxupload3.pptx
upload3.pptx
 
Life and Work of Ivan Sutherland | Turing100@Persistent
Life and Work of Ivan Sutherland | Turing100@PersistentLife and Work of Ivan Sutherland | Turing100@Persistent
Life and Work of Ivan Sutherland | Turing100@Persistent
 
Scratch to Supercomputers: Bottoms-up Build of Large-scale Computational Lens...
Scratch to Supercomputers: Bottoms-up Build of Large-scale Computational Lens...Scratch to Supercomputers: Bottoms-up Build of Large-scale Computational Lens...
Scratch to Supercomputers: Bottoms-up Build of Large-scale Computational Lens...
 
Image compression using fractal functions
Image compression using fractal functionsImage compression using fractal functions
Image compression using fractal functions
 
Semantic Mapping of Road Scenes
Semantic Mapping of Road ScenesSemantic Mapping of Road Scenes
Semantic Mapping of Road Scenes
 
Bruce Damer's talk on the Von Neumann Bottleneck to the Singularity Universit...
Bruce Damer's talk on the Von Neumann Bottleneck to the Singularity Universit...Bruce Damer's talk on the Von Neumann Bottleneck to the Singularity Universit...
Bruce Damer's talk on the Von Neumann Bottleneck to the Singularity Universit...
 

Plus de Edge AI and Vision Alliance

“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...Edge AI and Vision Alliance
 
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...Edge AI and Vision Alliance
 
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...Edge AI and Vision Alliance
 
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...Edge AI and Vision Alliance
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...Edge AI and Vision Alliance
 
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...Edge AI and Vision Alliance
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...Edge AI and Vision Alliance
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsightsEdge AI and Vision Alliance
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...Edge AI and Vision Alliance
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...Edge AI and Vision Alliance
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...Edge AI and Vision Alliance
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...Edge AI and Vision Alliance
 
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...Edge AI and Vision Alliance
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...Edge AI and Vision Alliance
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...Edge AI and Vision Alliance
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from SamsaraEdge AI and Vision Alliance
 
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...Edge AI and Vision Alliance
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...Edge AI and Vision Alliance
 
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...Edge AI and Vision Alliance
 
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...Edge AI and Vision Alliance
 

Plus de Edge AI and Vision Alliance (20)

“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
 
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
 
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
 
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
 
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
 
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara
 
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
 
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
 
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
 

Dernier

SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 

Dernier (20)

SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 

  • 5. • Computer vision has been reinvented at least three times. • Too close to the market: applications based research Tendency to resist novelty choosing applications over potentially more promising methods that could not yet deliver • Not idea driven Topic distribution of submissions for ICCV 2019 FromSvetlana Lazebnik,ComputerVision:LookingBack to Look Forward(2020) What Went Wrong? 5
  • 7. • Images are the optimal structure of data • Grey Levels as source of information Why Are We Using Images? 7
  • 8. • Invention of the camera obscura in 1544 (L. Da Vinci?) • The mother of all cameras Computer Vision: a Heritage from Art! 8
  • 9. Origins of Imaging • Increasing profits: painting faster • Evolution from portable models for travellers to current digital cameras • Evolving from canvas, to paper, to glass, to celluloid, to pixels 9
  • 10. Origins of Video: Motion Picture • Early work in motion-picture projection • Pioneering work on animal locomotion in 1877 and 1878 • Used multiple cameras to capture motion in stop-motion photographs Eadweard Muybridge (1830-1904) 10
  • 11. Computer vision, the Impossible Trade Off! power vs frame rate redundant useless data power and resource hungry: need to acquire/transmit/store/ process motion blur displacement between frames High Power & High Latency 11
  • 12. Event Acquisition • Reduce Data Load and only Detect “meaningful” events, at the time they happen! • No generic solution, Solutions: Scopes: • Avoid burning energy to acquire, transmit and store information that ends up being trashed • There are almost an infinite number of solutions to extract events • Need to be adapted to the dynamics and nature of the data 12
  • 13. Event acquisition • New Information is detected when it happens • When nothing happens, nothing is sent or processed • Sparse information coding mages. In order to overcome these limitations and ate temporal sampling of f (t), it is more efficient to s of f (t) just at the exact time at which they occur mely sampling on the other axis. a) t (b) f(t) t ways to sample functions values, in (a) using a t scale on the t axis, in (b) using a constant scale es of f (t). t0 tk f x,y(t) ∆f ˆ f x,y( α1 β1 α2 β2 Ev(t) +1 +1 +1 -1 -1 -1 -1 -1 +1 Popular solution: Sample on the amplitude axis of signals Time is the most valuable information 13
  • 14. Event acquisition mages. In order to overcome these limitations and ate temporal sampling of f (t), it is more efficient to s of f (t) just at the exact time at which they occur mely sampling on the other axis. a) t (b) f(t) t ways to sample functions values, in (a) using a t scale on the t axis, in (b) using a constant scale es of f (t). t0 tk f x,y(t) ∆f ˆ f x,y( α1 β1 α2 β2 Ev(t) +1 +1 +1 -1 -1 -1 -1 -1 +1 Popular solution: Sample on the amplitude axis of signals Time is the most valuable information 14
  • 15. e. This paper reviews elopment of a high- namic vision sensor pt entirely, and then r efficient low-level nd high-level object spike events. These use them for object mber of events but nts. Labeling attaches s, e.g. orientation or vents to track moving nt-by-event basis and as the basis for ject for filtering and ent past event times. ese past event times ger branching logic to . These methods are al digital hardware, g-based approach for egrates a neural style e. All code is open- sourceforge.net). address-event, vision ature extraction, low- (a) Vision sensor system (b) Vision sensor USB interface TMPDIFF128 16 bit counter USB interface Host PC USB Req Biases Ack Enb Enb 16 bit bus FIFOs buffers 3 Fig. 5 Shows the present implementation of the TMPDIFF128 camera system with USB2.0 interface. (a) shows the vision sensor system. (b of the USB hardware and software interface. The vision sensor (TMPDIFF128) sends AEs to the USB interface, which also captures tim running counter running at 100 kHz that shares the same 16-bit bus. These timestamped events are buffered by the USB FIFOs to be sent to also buffers the data in USB driver FIFOs, ‘unwraps’ the 16 bit timestamps to 32 bit values, and offers this data to other threads for further p USB chip also uses a serial interface to control the vision sensor biases. Flash memory on the USB chip stores persistent bias values. 15
  • 16. IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 46, NO. 1, JANUARY 2011 259 A QVGA 143 dB Dynamic Range Frame-Free PWM Image Sensor With Lossless Pixel-Level Video Compression and Time-Domain CDS Christoph Posch, Member, IEEE, Daniel Matolin, and Rainer Wohlgenannt Abstract—The biomimetic CMOS dynamic vision and image sensor described in this paper is based on a QVGA (304 240) array of fully autonomous pixels containing event-based change detection and pulse-width-modulation (PWM) imaging circuitry. Exposure measurements are initiated and carried out locally by the individual pixel that has detected a change of brightness in its field-of-view. Pixels do not rely on external timing signals and independently and asynchronously request access to an (asynchronous arbitrated) output channel when they have new grayscale values to communicate. Pixels that are not stimulated visually do not produce output. The visual information acquired from the scene, temporal contrast and grayscale data, are com- municated in the form of asynchronous address-events (AER), with the grayscale values being encoded in inter-event intervals. The pixel-autonomous and massively parallel operation ideally results in lossless video compression through complete temporal redundancy suppression at the pixel level. Compression factors depend on scene activity and peak at 1000 for static scenes. Due to the time-based encoding of the illumination information, very high dynamic range—intra-scene DR of 143 dB static and 125 dB at 30 fps equivalent temporal resolution—is achieved. A novel time-domain correlated double sampling (TCDS) method yields array FPN of 0.25% rms. SNR is 56 dB (9.3 bit) for 10 Lx illuminance. Index Terms—Address-event representation (AER), biomimetics, CMOS image sensor, event-based vision, focal-plane processing, high dynamic range (HDR), neuromorphic electronics, time-domain CDS, time-domain imaging, video compression. I. INTRODUCTION BIOLOGICAL sensory and information processing sys- tems appear to be much more effective in dealing with real-world tasks than their artificial counterparts. Humans still outperform the most powerful computers in routine functions involving, e.g., real-time sensory data processing, perception tasks and motion control and are, most strikingly, orders of synchronous information processing. It has been demonstrated [1]–[3] that modern silicon VLSI technology can be employed in the construction of biomimetic or neuromorphic artefacts that mimic biological neural functions. Neuromorphic systems, as the biological systems they model, process information using energy-efficient, asynchronous, event-driven methods. The greatest successes of neuromorphic systems to date have been in the emulation of peripheral sensory transduction, most notably in vision. Since the seminal attempt to build a “silicon retina” by Mahowald and Mead in the late 1980s [4], a variety of biomimetic vision devices has been proposed and implemented [5]. In the field of imaging and vision, two observations are crucial: biology has no notion of a frame, and the world—the source of most visual information we are interested in—works in continuous-time and asynchronously. 
The authors are con- vinced that biomimetic asynchronous electronics and signal processing have the potential—also in fields that are histori- cally dominated by synchronous approaches such as artificial vision, image sensing and image processing—to reach entirely new levels of performance and functionality, comparable to the ones found in biological systems. Future artificial vision systems, if they want to succeed in demanding applications such as, e.g., autonomous robot navigation, high-speed motor control, visual feedback loops, etc. must exploit the power of the asynchronous, frame-free, biomimetic approach. Studying biological vision, it has been noted that there exist two different types of retinal ganglion cells and corresponding retina–brain pathways in, e.g., the human retina: The “Magno”- cells are at the basis of what is named the transient channel or the Magno-cellular pathway. They have short latencies and respond transiently when changes—movements, onsets, offsets—are in- volved. The “Parvo”-cells are at the basis of what is called the sustained channel or the Parvo-cellular pathway. Parvo-cells are Temporal events and absolute light measurement 16
  • 18. Why Event Based Sensors? Chaotic pendulum tracking 18
  • 19. Event Time-based Sensor: Grey Levels Events 19
  • 20. CCAM sensors provide frame-free visual informa CCAM is g less events equivalent 1 based came the number of events depends on the dynamics of the scene. For standard cameras this amount is constant. Why Event Based Sensors? 20
  • 22. Event cameras have become a commodity Event Cameras 22
  • 23. How to Process Events? 23
  • 24. How to Process Events? 24
  • 25. Incremental Time ? Update previous result incrementally for the single pixel change Compute the new 1 pixel- change frame and compute result using the whole frame Batch or Frame Figure 3: Sample results from our method. The columns depict raw events, time manifold, result without manifold regularisation and finally with our manifold regularisation. Notice the increased contrast in weakly textured regions (especially around the edge of the monitor). the results of [1]. We point out that no ground truth data is available so we are limited to purely qualitative comparisons. In Fig. 4 we show a few images from the sequences. Since we are dealing with highly dynamic data, we point the reader to the included supplementary video3 which shows whole sequences of several hundred frames. Figure 4: Comparison to the method of [1]. The first row shows the raw input events that have been used for both methods. The second row depicts the results of Bardow et al., and the last row shows our result. We can see that out method produces more details (e.g. face, beard) as well as more graceful gray value variations in untextured areas, where [1] tends to produce a single gray value. 4.4 Compar ison to Standar d Camer as We have captured a sequence using a DVS128 camera as well as a Canon EOS60D DSLR camera to compare the fundamental differences of traditional cameras and event-based cam- eras. As already pointed out by [1], rapid movement results in motion blur for conventional 3ht t ps: / / www. yout ube. com / wat ch?v=r vB2URr G T94 Figure 2: Block diagram of the proposed approach. The output of the event camera is collected into frames over a specified time interval T, using aseparate channel depending on the event polarity (positiveand negative). The resulting synchronous event framesareprocessed by aResNet-inspired network, which producesaprediction of the steering angle of thevehicle. without resorting to partitioning the solution space; the an- gles produced by our network can take any value, not just discreteones, in therange[− 180◦ ,180◦ ]. Moreover, in con- trast to previous event-based vision learning works which use small datasets, we show results on the largest and most challenging (dueto scenevariability) event-based dataset to date. 3. Methodology Our approach aims at predicting steering wheel com- mandsfrom aforward-looking DVSsensor [1] mounted on a car. As shown in Fig. 2, we propose a learning approach that takes as input the visual information acquired by an event camera and outputs the vehicle’s steering angle. The events are converted into event frames by pixel-wise accu- mulation over aconstant time interval. Then, adeep neural network mapstheevent framesto steering anglesby solving a regression task. In the following, we detail the different and negativeevents. Thehistogram for positiveevents is h+ (x,y) . = Â tk2T, pk= + 1 d(x− xk,y− yk), (1) where d is the Kronecker delta, and the histogram h− for thenegativeeventsisdefined similarly, using pk = − 1. The histogramsh+ and h− arestacked to produceatwo-channel event image. Events of different polarity are stored in dif- ferent channels, asopposed to asinglechannel with thebal- ance of polarities (h+ − h− ), to avoid information loss due to cancellation in case events of opposite polarity occur in the samepixel during the integration interval T. 3.2. Learning Approach 3.2.1. Preprocessing. A correct normalization of input and output data is essential for reliably training any neural network. 
Since roads are almost always straight, the steer- ing angle’s distribution of a driving car is mainly picked in [−5◦ ,5◦ ]. This unbalanced distribution results in a biased regression. In addition, vehicles frequently stand still be- Event Computation V S 25
  • 26. Applications: Event Stereovision • Matching pixels is hard • Changing lighting conditions, occlusions, motion blur…. 26
  • 27. • Matching binocular events only using the time of events • Two events arriving at the same time and fulfilling geometric constraints are matched Event Stereovision P.Rogister, R.B. Benosman, S-H. Ieng, P. Lichtsteiner, T. Delbruck, Asynchronous Event-based Binocu- lar Stereo Matching. 2011 December 23, IEEE Transactions on Neural Networks 23(2) : 347-353, DOI : 10.1109/TNNLS.2011.2180025 27
  • 28. S.H. Ieng, J. Carneiro, M. Osswald, R. B.Benosman, Neuromorphic Event-Based Generalized Time-based Stereovision, 2018 July 18, Frontiers in Neuroscience, 12(442), DOI :10.3389/fnins.2018.00442 Event Stereovision 28
  • 31. 2 When events are transmitted off-chip, they are time- ped and then transmitted to a computer using a standard connection. EVENT-BASED VISUAL MOTION FLOW stream of events from the silicon retina can be mathemat- y defined as follows: let e(p, t) = (p, t)T a triplet giving position p = (x, y)T and the time t of an event. We can define the function Σe that maps to each p, the time t: Σe : R2 → R3 p → t = Σe. (1) e being an increasing function, Σe is then a monotonically asing surface. t Σey 2 hen events are transmitted off-chip, they are time- and then transmitted to a computer using a standard nnection. ENT-BASED VISUAL MOTION FLOW am of events from the silicon retina can be mathemat- fined as follows: let e(p,t) = (p,t)T a triplet giving ion p = (x, y)T and the time t of an event. We can ne the function Σe that maps to each p, the time t: Σe : R2 → R3 p → t = Σe. (1) ng an increasing function, Σe isthen amonotonically g surface. t Σey eeds a set the pixel decreased. al clocked data thus he timing d accurate me rate” is 2, FEBRUARY 2008 peration. In (a), the C and due to the e pixel is sensi- fine as h 128 by F spikes ner et al. ortional to X x Fig. 2. General principle of visual flow computation, the surface of active events Σe is derived to provide an estimation of orientation and amplitude of motion. We then set the first partial derivatives with respect to the parameters as: Σex = ∂ Σ e ∂ x and Σey = ∂ Σ e ∂ y . We can then write Σe as: Σe(p + ∆ p) = Σe(p) + ∇ ΣT e ∆ p + o(||∆ p||), (2) with ∇ Σe = ( ∂ Σ e ∂ x , ∂ Σ e ∂ y )T . Thepartial functions of Σe are functions of asingle variable whether x or y. Time being a strictly increasing function, Σe is a nonzero derivatives surface at any point. It is then possible to use the inverse function theorem to write around a location p = (x, y)T : ∂Σe ∂x (x, y0) = dΣe|y= y0 dx (x) = 1 vx (x, y0) , ∂Σe ∂y (x0, y) = dΣe|x= x 0 dy (y) = 1 vy (x0, y) , (3) Σe|x= x , Σe|y= y being Σe restricted respectively to y and pixel schematic. (b) Principle of operation. In (a), the or single-ended inverting amplifiers. ferencing circuit removes DC and due to the rsion in the photoreceptor, the pixel is sensi- ontrast , which we define as (1) hotocurrent. (The units of do not affect ) illustrates the principle of operation of the of this section, we will consider in detail the component parts of the pixel circuit (Fig. 2). (b) on DVS sensor with 128 by ple of ON and OFF spikes dapted from Lichtsteiner et al. xel’s voltage Vp proportional to corresponding generation of e change threshold) and OFF , from which the evolution of Σe(p + ∆ p) = Σe(p) + ∇ Σe ∆ p + o(||∆ p||), with ∇ Σe = ( ∂ Σ e ∂ x , ∂ Σ e ∂ y )T . Thepartial functions of Σe arefunctions of asingle vari whether x or y. Time being a strictly increasing function is a nonzero derivatives surface at any point. It is then pos to use the inverse function theorem to write around a loca p = (x, y)T : ∂Σe ∂x (x, y0) = dΣe|y= y0 dx (x) = 1 vx (x, y0) , ∂Σe ∂y (x0, y) = dΣe|x= x 0 dy (y) = 1 vy (x0, y) , Σe|x= x 0 , Σe|y= y0 being Σe restricted respectively to y x. The gradient ∇ Σe can then be written: ∇ Σe = ( 1 vx , 1 vy )T , which provides the inverse of the pixellic velocity of ev For an incoming event : Form the surface (event times): We then have: IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 25, NO. 
IEEE Transactions on Neural Networks and Learning Systems, vol. 25, no. 2, February 2014, p. 407: Event-Based Visual Flow. Ryad Benosman, Charles Clercq, Xavier Lagorce, Sio-Hoi Ieng, and Chiara Bartolozzi.

Abstract: This paper introduces a new methodology to compute dense visual flow using the precise timings of spikes from an asynchronous event-based retina. Biological retinas, and their artificial counterparts, are totally asynchronous and data-driven and rely on a paradigm of light acquisition radically different from most of the currently used frame-grabber technologies. This paper introduces a framework to estimate visual flow from the local properties of events' spatiotemporal space. We will show that precise visual flow orientation and amplitude can be estimated using a local differential approach on the surface defined by coactive events. Experimental results are presented; they show the adequacy of the method with the high data sparseness and temporal resolution of event-based acquisition, which allows the computation of motion flow with microsecond accuracy and at very low computational cost.

Index Terms: Event-based vision, event-based visual motion flow, neuromorphic sensors, real time.

I. INTRODUCTION. Recent work on neural activity has shown that each spike arrival time is reliable [1]–[4]. However, the extent to which the precise timing of neural spikes down to millisecond precision is significant for computation is still a matter of debate. In this paper, we address this issue by focusing on the computational principles that could be operated by motion-sensitive neurons of the visual system to compute the visual flow. We bring together new experimental sensors delivering truly naturalistic, precisely timed visual outputs and a new mathematical method that allows event-based visual motion flow to be computed using each incoming spike's timing as the main computation feature. The presented method does not rely on gray levels, nor on the integration of activity over long time intervals. It uses the relative timing of changes at individual pixels. Event-based sensors of this kind have been used to provide visual information in various demanding machine vision applications [9]–[15]. The microsecond temporal resolution and the inherent redundancy suppression of the frame-free, event-driven acquisition and subsequent representation of visual information employed by these cameras enable a novel methodology to process visual information at very high speed and with low computational cost.

Visual flow is a topic of several research fields and has been intensively studied since the early days of computational neuroscience. It is widely used in artificial vision and essential in navigation. Visual flow is known to be an ill-posed, noisy visual measure limited by the aperture problem. Its use in real-time applications on natural scenes is generally difficult; it is usually computed sparsely on highly salient points. Visual flow techniques are commonly classified under one of four major categories:
1) Energy-based or frequency-based methods estimate optical flow from the output of velocity-tuned filters designed in the Fourier domain [16]–[18].
2) Phase-based methods estimate image velocity in terms of band-pass filter outputs [19].
3) Correlation-based or region-matching methods search for a best match of small spatial neighborhoods between adjacent frames [20]–[25].
4) Gradient-based or differential methods use spatiotemporal image intensity derivatives and an assumption of brightness constancy [26]–[28].
Energy-based techniques are slow [19] and are not adequate for real-time applications, where gradient-based approaches perform better, as they rely on correlations. 31
  • 33. Motion estimation: optical flow • High temporal resolution generates smooth space-time surfaces • The slope of the local surface contains the orientation and amplitude of the optical flow [figure: events vs. estimated flow] 33
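To make the relation above concrete, here is a minimal sketch (not the paper's implementation) of estimating the flow at one location by least-squares fitting a plane t ≈ a·x + b·y + c to the event times in a small spatiotemporal neighborhood and then inverting the slopes, following the relation ∇Σe = (1/vx, 1/vy)^T. The function name, the neighborhood radius r and the time window dt are illustrative choices.

```python
import numpy as np

def local_flow(events, x0, y0, t0, r=3, dt=0.05):
    """Estimate (vx, vy) at (x0, y0, t0) from the local surface of event times.

    events: (N, 3) array with columns (x, y, t); r is the spatial radius in
    pixels and dt the temporal window in seconds (illustrative values).
    """
    x, y, t = events[:, 0], events[:, 1], events[:, 2]
    m = (np.abs(x - x0) <= r) & (np.abs(y - y0) <= r) & (np.abs(t - t0) <= dt)
    if m.sum() < 5:                         # too few events to fit a plane
        return None
    # Least-squares fit of the plane t ~ a*x + b*y + c to the event times.
    A = np.c_[x[m], y[m], np.ones(int(m.sum()))]
    (a, b, c), *_ = np.linalg.lstsq(A, t[m], rcond=None)
    # Slide's relation: grad(Sigma_e) = (1/vx, 1/vy)^T, so invert the slopes.
    eps = 1e-9
    vx = 1.0 / a if abs(a) > eps else float("inf")
    vy = 1.0 / b if abs(b) > eps else float("inf")
    return vx, vy
```

In practice the fit would be made robust (for example by rejecting outlier events), but inverting the local slopes of the surface of event times is the core of the event-based flow idea.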
  • 34. Event-Based Visual Flow, R. Benosman, C. Clercq, X. Lagorce, S.-H. Ieng, and C. Bartolozzi, IEEE Transactions on Neural Networks and Learning Systems, vol. 25, no. 2, February 2014 (first page of the paper, as reproduced on slide 31). 34
  • 35. Tracking Real-Time Outdoor Scenes. Z. Ni, S.H. Ieng, C. Posch, S. Regnier, R.B. Benosman, Visual Tracking Using Neuromorphic Asynchronous Event-Based Cameras, Neural Computation 27(4): 925–953, 24 February 2015, DOI: 10.1162/NECO-a-00720. 35
  • 36. Tracking Real-Time Outdoor Scenes: errors are more equally shared between the van and the car thanks to the more reliable event attribution. Figure 16: the events generated by the van and the car during 3 s are fitted into two planes, denoted as P1 (blue) and P2 (red); n1 and n2 are the surface normals. Z. Ni, S.H. Ieng, C. Posch, S. Regnier, R.B. Benosman, Visual Tracking Using Neuromorphic Asynchronous Event-Based Cameras, Neural Computation 27(4): 925–953, 24 February 2015, DOI: 10.1162/NECO-a-00720. 36
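As a rough illustration of the event-attribution idea in Figure 16 (not the tracker of Ni et al.), the sketch below fits one space-time plane per object and assigns each incoming event to the plane that best predicts its timestamp. The array layouts and the residual criterion are assumptions made for the example.

```python
import numpy as np

def fit_plane(events):
    """Least-squares space-time plane t ~ a*x + b*y + c through one object's
    events; events is an (N, 3) array with columns (x, y, t)."""
    A = np.c_[events[:, :2], np.ones(len(events))]
    coeffs, *_ = np.linalg.lstsq(A, events[:, 2], rcond=None)
    return coeffs                            # (a, b, c)

def attribute(event, planes):
    """Assign one event (x, y, t) to the plane with the smallest timing residual."""
    x, y, t = event
    residuals = [abs(t - (a * x + b * y + c)) for a, b, c in planes]
    return int(np.argmin(residuals))

# Example: fit P1 to the van's events and P2 to the car's events, then route
# each new event to the object whose plane predicts its timestamp best:
# planes = [fit_plane(van_events), fit_plane(car_events)]
# owner = attribute((x, y, t), planes)
```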
  • 37. Event-Based Tracking with a Moving Camera. Z. Ni, S.H. Ieng, C. Posch, S. Regnier, R.B. Benosman, Visual Tracking Using Neuromorphic Asynchronous Event-Based Cameras, Neural Computation 27(4): 925–953, 24 February 2015, DOI: 10.1162/NECO-a-00720. 37
  • 38. Event-Based 3D Tracking and Pose Estimation. D. Reverter-Valeiras, G. Orchard, S.H. Ieng, R.B. Benosman, Neuromorphic Event-Based 3D Pose Estimation, Frontiers in Neuroscience 9:522, 22 January 2016, DOI: 10.3389/fnins.2015.00522. 38
  • 39. Low-Power and Low-Latency Streaming 39
  • 40. Asynchronous Event-Based Fourier Analysis. Q. Sabatier, S.H. Ieng, R.B. Benosman, Asynchronous Event-Based Fourier Analysis, IEEE Transactions on Image Processing 26(5): 2192–2202, 6 February 2017, DOI: 10.1109/TIP.2017.2661702. 40
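A minimal sketch of the general idea of per-event spectral analysis (an illustrative recursion, not the algorithm of Sabatier et al.): each incoming event updates a running, exponentially forgotten estimate of one Fourier coefficient for its pixel. The class name and the parameters f and tau are assumptions made for the example.

```python
import numpy as np

class EventFourierBin:
    """Running estimate of one temporal-frequency coefficient for one pixel,
    updated event by event with an exponential forgetting window."""

    def __init__(self, f, tau):
        self.f = f                  # analysed frequency (Hz)
        self.tau = tau              # forgetting time constant (s)
        self.coeff = 0.0 + 0.0j     # current coefficient estimate
        self.last_t = None

    def update(self, t, polarity):
        # Decay the accumulated contribution, then add the new event's phasor.
        if self.last_t is not None:
            self.coeff *= np.exp(-(t - self.last_t) / self.tau)
        self.coeff += polarity * np.exp(-2j * np.pi * self.f * t)
        self.last_t = t
        return self.coeff
```

Called once per event with its timestamp and polarity (+1/-1), the coefficient is refreshed asynchronously, without ever forming frames.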
  • 41. Last Two Decades: Rethinking Computer Vision in The Time Domain
  • 42. Deep Temporal Learning: Time Surfaces. [Figure panels: (a) event-driven time-based vision sensor (ATIS or DVS); (b) events from the sensor (ON and OFF events, X and Y spatial axes); (f) time surface (context amplitude over the pixel neighborhood).] X. Lagorce, G. Orchard, F. Galluppi, B.E. Shi, R.B. Benosman, HOTS: A Hierarchy of Event-Based Time-Surfaces for Pattern Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 39(7): 1346–1359, 11 July 2016, DOI: 10.1109/TPAMI.2016.2574707. 42
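A minimal sketch of how a time surface can be built around an incoming event, assuming a map of per-pixel latest timestamps and the exponential kernel described on the following slides. R and tau play the role of the layer's receptive-field radius and time constant, but the code is an illustrative reconstruction rather than the released HOTS implementation.

```python
import numpy as np

def update_time_surface(last_times, event, R=2, tau=0.05):
    """Update the per-pixel latest-timestamp map and return the local time
    surface (context) around the incoming event.

    last_times: 2-D array of the most recent event time at each pixel,
                initialised to -np.inf so silent pixels map to 0
    event:      (x, y, t) of the incoming event, assumed to lie at least
                R pixels away from the image border
    R, tau:     receptive-field radius and exponential time constant
    """
    x, y, t = event
    last_times[y, x] = t
    patch = last_times[y - R:y + R + 1, x - R:x + R + 1]
    # Exponential kernel: recent activity maps near 1, old or absent near 0.
    return np.exp(-(t - patch) / tau)        # shape (2R+1, 2R+1)
```

In a hierarchy, each such context is compared against the layer's learned feature surfaces and the best match emits a new event for the next layer.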
  • 43. [Fig. 4 of the HOTS paper: view of the proposed hierarchical model. From left to right, a moving digit (a) is presented to the ATIS camera (b), which produces ON and OFF events (c) that are fed into Layer 1. The events are convolved with exponential kernels (d) to build event contexts from a spatial receptive field of side-length (2R1 + 1). These contexts are clustered into N1 features (e). When a feature is matched, it produces an event (f). Events from the N1 features constitute the output of the layer (g). Each layer k (gray boxes) takes input from its previous layer and feeds the next by reproducing steps (d)–(g). The output of Layer k is presented between Layer k and k + 1 ((g), (h), (i)). To compute event contexts, each layer considers a receptive field of side-length (2Rk + 1) around each pixel; the event contexts are compared to the different features, represented as surfaces in the gray boxes. Panels: (a) stimulus, (b) event camera, (c) spatio-temporal domain, (d) time context, (e) exponential kernels (time constant τ1), (f) time surface, followed by clustering.] 43
  • 44. [Same Fig. 4 caption as on slide 43, here highlighting the second layer: outputs of Layer 1 are convolved with exponential kernels of time constant τ2, event contexts are taken over a receptive field of side-length (2Rk + 1) around each pixel, compared to the layer's features, and clustered.] 44
  • 45. [Same hierarchical steps at the third layer, with time constant τ3: build contexts, compare them to the layer's features, and cluster.] 45
  • 46. Deeper. [Results excerpt from the HOTS paper:] … the normalized and Bhattacharyya distances, respectively. The recognized class is the one with the smallest bar in each column and is marked with a star. Each character is presented once to the sensor, in the order we gave to the classes, so that perfect results correspond to stars only on the main diagonal. In this experiment, all distances lead to a recognition performance of 100 percent. We ran a cross-validation test by randomly choosing, for each pattern, which presentation would be used for learning (both the model and the classifier), the other being used for testing; every trial amongst the hundred we ran gave a recognition accuracy of 100 percent. This experiment is run on a dataset composed of objects with very distinct spatial characteristics; because of that, it is the best one for interpreting the features learnt by the architecture. Fig. 11 presents the activation of all three … [In the face-recognition] experiment, we use a dataset consisting of the faces of seven subjects. Each subject moves his or her head to follow the same visual stimulus tracing out a square path on a computer monitor (see Fig. 12). The data are acquired by an ATIS camera [22]. Each subject has 20 presentations in the dataset, of which one is used for training and the others for testing. We again use the same hierarchical system described previously with the following parameters: R1 = 4, τ1 = 50 ms, N1 = 8, KR = 2, Kτ = 5, KN = 2. Because the faces are bigger and more complex objects, we … [Fig. 9: Letters & Digits experiment, pattern signatures for some of the input classes. For each letter and digit, the trained histogram used as a signature by the classifier is shown. The snapshot shows an accumulation of events from the sensor (white dots for ON events, black dots for OFF events). In the histograms, the x-axis is the index of the feature and the y-axis the number of activations of that feature during the stimulus presentation. The signatures of all the letters and digits are presented in the supplemental material, available online.] 46
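The classification step described above compares the histogram of feature activations of a recording against the trained class signatures. A minimal sketch of that comparison using the Bhattacharyya distance is given below; the data layout (one 1-D histogram per class) is an assumption made for the example.

```python
import numpy as np

def bhattacharyya(p, q):
    """Bhattacharyya distance between two normalized histograms."""
    bc = np.sum(np.sqrt(p * q))              # Bhattacharyya coefficient
    return -np.log(max(bc, 1e-12))

def classify(activation_counts, class_signatures):
    """Pick the class whose trained signature is closest to the observed
    histogram of last-layer feature activations.

    activation_counts: 1-D array, activations of each last-layer feature
    class_signatures:  dict class_name -> 1-D array (trained histograms)
    """
    p = activation_counts / max(activation_counts.sum(), 1)
    dists = {c: bhattacharyya(p, s / max(s.sum(), 1))
             for c, s in class_signatures.items()}
    return min(dists, key=dists.get)
```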
  • 47. Deep Temporal Learning with Adaptive Temporal Feedback: Temporal Surfaces. X. Lagorce, G. Orchard, F. Galluppi, B.E. Shi, R.B. Benosman, HOTS: A Hierarchy of Event-Based Time-Surfaces for Pattern Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 39(7): 1346–1359, 11 July 2016, DOI: 10.1109/TPAMI.2016.2574707. 47
  • 48. Deep Temporal Learning with Adaptive Temporal Feedback: Temporal Surfaces. X. Lagorce, G. Orchard, F. Galluppi, B.E. Shi, R.B. Benosman, HOTS: A Hierarchy of Event-Based Time-Surfaces for Pattern Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 39(7): 1346–1359, 11 July 2016, DOI: 10.1109/TPAMI.2016.2574707. 48
  • 50. Computation Platforms? Replicate. From: Goi, E., Zhang, Q., Chen, X. et al., Perspective on photonic memristive neuromorphic computing, PhotoniX 1, 3 (2020). https://doi.org/10.1186/s43074-020-0001-6 50
  • 51. Analog vs Digital. BrainScaleS (https://wiki.ebrains.eu/bin/view/Collabs/neuromorphic/BrainScaleS/): ~4 million bio-realistic neurons, 880 million learning synapses, 10^5 times faster than real time, 164 giga-events/s (normal), 1 tera-event/s (burst), several hundreds of kW [2010]. SpiNNaker: 1 million processors, 200 million million actions per second. 51
  • 52. Neuromorphic Computing [from The Scientist, 2019] Today’s digital neuromorphic hardware 52
  • 53. A Processing Solution Adapted to Event Data: SPEAR-1. [Block diagram: event inputs (vision, LIDAR, radar, audio, IMU; analog and digital) feed an event-based processing block and an event-based machine learning module, connected to memory, I/O, and AER communication modules.] • Architecture for event-per-event, programmable event-based computation and machine learning • Industry-standard I/O and programmability • Scalable, enabling fast and cost-effective derivatives • <10–100 mW and up to 20 GigaEvents/s processing 53
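As an illustration of event-per-event processing over AER links (an assumed record layout and dispatch loop, not SPEAR-1's actual interface), each address-event carries a source, an address, a polarity and a timestamp, and is pushed through the processing stages as soon as it arrives:

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional

@dataclass
class AEREvent:
    """Address-event representation record (illustrative field layout)."""
    source: str       # e.g. "vision", "lidar", "radar", "audio", "imu"
    address: int      # sensor-specific address, e.g. a pixel index
    polarity: int     # +1 / -1 for contrast events, 0 if unused
    timestamp: int    # microseconds

def run_pipeline(events: Iterable[AEREvent],
                 stages: List[Callable[[AEREvent], Optional[AEREvent]]]) -> None:
    """Process events one by one: each stage may transform an event or drop it
    (by returning None), in which case the pipeline moves to the next event."""
    for ev in events:
        for stage in stages:
            out = stage(ev)
            if out is None:       # event filtered out, fetch the next one
                break
            ev = out
```

The point of the sketch is the control flow: there is no frame buffer, only a stream of timestamped records dispatched as they arrive.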
  • 54. Radar, LIDAR and much more… https://www.summerrobotics.ai

[Excerpts from an event-based structured-light paper reproduced on the slide:]

Pattern codification strategies. To get the best out of the ATIS sensor, a suitable pattern codification is mandatory. In the event domain the only information available is the polarity and timestamp of pixels, so the correspondence problem cannot be solved by color or gray-level coding. Moreover, the need for an asynchronous system imposes that the pattern can be segmented into small independent patches from which local information is extracted only when needed; this constraint directly shapes the spatial organisation of the pattern and rules out direct methods. Three strategies are considered: (1) time-multiplexed binary coding in which each element is a periodic signal with a set frequency and duty cycle, forming a spatial codification that can be decoded locally; (2) codewords built from the spatial orientation of multiple fast-moving dots, exploiting the spatio-temporal continuity of the events they generate (only the information-extraction step differs, the decoding is the same); (3) a phase-shifting-like approach in which, instead of evaluating the phase from several shifted patterns, a series of stripes is moved continuously and the precise time differences between stripe positions from an arbitrary origin give a robust measurement.

[Fig. 3: experimental setup. Fig. 4: structured-light principle. A coded light pattern is projected onto the scene and observed by a camera; decoding the pattern matches paired points (p1, p2) in the two views, and triangulating the rays from the optical centers C1 and C2 recovers the 3D point P. Colors denote different duty cycles (first method) or motion orientations (second method).]

The pattern uses a De Bruijn sequence repeated for each line, each color corresponding to a specific duty cycle. A periodic projection generates, at the pixel level, an event stream of successive ON and OFF bursts following the signal's period; the length and number of events per burst depend on contrast, illumination, and the sensor biases. Ideally each edge of the signal would trigger exactly one event of the corresponding polarity, so inter-event times would directly give the signal's frequency and duty cycle; in practice, timestamping imprecision in the sensor's arbiter and global-illumination variations make the number of events per edge inconsistent.

Results. The method is applied to a scene containing two objects placed around one meter from the system, with an SPSM of 20×30 symbols coded by duty cycle, a 3×3-pixel burst-filter neighborhood, and a 20 Hz signal; the unparallelized MATLAB implementation does not run in real time. Point correspondence extracts 3×3 duty-cycle codewords and, since the pattern's dots cover more than one pixel, spatial averaging yields sub-pixel dot positions; triangulation then produces a 3D point cloud. [Fig. 9: the extracted pattern, colored by estimated duty cycle. Fig. 10: 3D point cloud. Fig. 11: reconstructed depth.] The approach exploits the high temporal resolution of the event-based sensor to rethink structured-light depth reconstruction, using a spatial distribution of light elements each flickering at a unique frequency, with random shifts making every neighborhood unique; with more precise event-based cameras the frequency could be extracted directly from inter-spike timing, allowing even simpler patterns without unique neighborhoods.

Event-driven sensing also extends to radar. [Excerpt from an FMCW radar gesture-recognition paper:] mixing the received chirp with the transmitted one at the in-phase receiver and low-pass filtering removes the carrier fc and yields a baseband signal of phase 2παTd·t + 2πfcTd − παTd² (4); its beat frequency is proportional to the distance d, so the spectrum shows peaks corresponding to the distances between the radar and the objects. After analog-to-digital conversion the discrete-time signal is r[i] = r(t = iTf) (5). The ADC sampling period Tf sets the maximum range dmax = c/(2αTf) (6) [21], the chirp slope is α = B/Tc (7), with Tc the chirp duration and B the bandwidth, and the range resolution is dres = c/(2B) (8) [21]. Peaks in the magnitude of the range profile Rn[k] indicate target distances, while the phase evolution of Rn[k*] between successive chirps captures target motion. [Table I: dataset content used in the demonstration experiments. Fig. 1: gesture acquisition setup.] The radar parameters: 512 ADC samples per chirp, 192 chirps per frame, time between chirps Ts = 1.2 ms, chirp duration Tc = 41 µs, so a frame capture lasts 238 ms and Tf = 80 ns. 54
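A minimal sketch of the per-pixel duty-cycle decoding idea described above, assuming one ON event per rising edge and one OFF event per falling edge of the projected signal; the burst-cleaning step and the exact codeword construction of the paper are omitted, and all names are illustrative.

```python
import numpy as np

def decode_duty_cycle(on_times, off_times):
    """Estimate period and duty cycle of a flickering projector dot at one pixel.

    on_times / off_times: sorted arrays of ON / OFF event timestamps (seconds)
    for that pixel, ideally one event per signal edge. Returns (period, duty).
    """
    on_times, off_times = np.asarray(on_times), np.asarray(off_times)
    if len(on_times) < 2 or len(off_times) < 1:
        return None
    period = np.median(np.diff(on_times))          # rising edge to rising edge
    # For each rising edge, find the next falling edge: high time = OFF - ON.
    idx = np.searchsorted(off_times, on_times)
    valid = idx < len(off_times)
    high = off_times[idx[valid]] - on_times[valid]
    high = high[(high > 0) & (high < period)]
    if len(high) == 0:
        return None
    return float(period), float(np.median(high) / period)
```

Neighborhoods of decoded duty cycles then form the codewords used to match projector and camera points before triangulation.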
  • 55. Sight Restoration: Prosthetics and Optogenetics • Development of Retina Stimulation Goggles • 3 generations of Retina Prosthetics • Asynchronous Retina Stimulation: Prosthetics and Optogenetics [devices pictured: IRIS 2, PRIMA] 55
  • 56. Sight Restoration: Prosthetics and Optogenetics 56
  • 57. & much more… Decision making: game theory, stock market • Low-power online decoding and classification • Robotics, ADAS • Sensory substitution • Always-on sensing • Space awareness 57
  • 58. Conclusions? Where we came from… A Chance to Move Perception from Engineering to Basic Science, a chance to ground perception in basic science! (observe → understand → model → application) • A whole new world to explore • A deep paradigm shift for Sensing & AI • Novel sensors to build • New adapted processing architectures to design 58
  • 59. Conclusions • A whole new world to explore • A deep paradigm shift for Sensing & AI • Novel sensors to build • New adapted processing architectures to design Physiology - Models - Hardware - Prosthetics - Robotics - Computation 59