XCube-overview-brochure-revB
1. XCube StreamX®
The StreamX® streaming data management platform from XCube is an integrated, turnkey, end-to-end solution for huge streaming data. Whether your data streams are combinations of video, lidar, radar, or other sensors, you can find the right time segments in hundreds of petabytes of data files, then use those segments as input to your application and run it all in parallel.
The StreamX platform is designed to solve two of the biggest challenges when working with huge streaming datasets:
• How to run an existing desktop application, such as a simulator, on lots of data many times faster than a workstation or standard server
• How to find the specific time segments inside petabytes of streaming data files that you want to run the application on or analyze further
The StreamX platform includes a distributed virtual file system, virtual machine technology, and a parallel execution engine that together enable any desktop application to run in parallel in a distributed cluster. Applications are sent into the cluster to where the data resides and then executed in parallel, all controlled by a simple user interface from any web browser. No parallel programming. No transferring or reformatting the data. Working with huge streaming datasets has never been easier.
The StreamX streaming data management platform from XCube can shave months off your development, test, and validation.
Data Management Tools for Big Streaming Data
Capture, Store, Tag, Search, Simulate, and Automate in Parallel
without reprogramming applications or reformatting data
With the StreamX platform you can:
Automatically tag content in data streams
• Mine the data with user-defined detection algorithms
• Run the detection algorithms in parallel without parallel programming
• Tag detected scenes in the stream to enable content-based searches
Find and manage specific content out of petabytes stored globally
• Search in parallel for tagged content in hundreds of petabytes stored globally
• Manage the search results to create independent sets of data for training, testing, and validation
Run desktop applications in parallel using search results as input
• Run desktop applications such as simulators in parallel without recoding
• Work with petabytes of data distributed globally without ever transmitting the data
Automate training and testing
• Automate AI training and regression testing
• Partition training and test data to maintain test integrity
Validate using re-simulated data
• Bring real-world streaming data into your simulators
• Automatically vary input parameters to create new simulated situations
• Use results from one simulation run as input to the next to provide active feedback
“With XCube’s StreamX systems, each of our GIDAS simulations runs overnight instead of a week.”
- Automotive OEM test engineer
2. The StreamX Platform: Powerful and Easy to Use
The StreamX streaming data management platform is powerful yet easy to use. After capturing the data, there are five main functions in the system, all of which can be performed remotely from any browser.
Capture. Capture the data stream using your own system or the StreamX Data Recorder. The StreamX Data Recorder can capture multi-sensor data at up to 4.4 GB/s, eliminating the need to have separate recorders for each sensor stream. Up to 256 TB of removable storage holds extended data collection sessions and can then be removed and shipped to the data center without having to disconnect the sensors.
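To put the two recorder figures in perspective, the short calculation below estimates how long a full set of removable storage lasts at the peak capture rate. It is only a back-of-the-envelope sketch: it assumes decimal units (1 TB = 1000 GB) and a sustained 4.4 GB/s, and the function name is made up for illustration.

```python
# Rough estimate only: assumes decimal units (1 TB = 1000 GB) and that
# the recorder sustains its stated peak rate for the whole session.
CAPTURE_RATE_GB_PER_S = 4.4   # peak capture rate stated in the text
STORAGE_TB = 256              # maximum removable storage stated in the text


def hours_to_fill(storage_tb: float, rate_gb_per_s: float) -> float:
    """Return the hours of continuous recording before storage is full."""
    seconds = storage_tb * 1000 / rate_gb_per_s
    return seconds / 3600


hours = hours_to_fill(STORAGE_TB, CAPTURE_RATE_GB_PER_S)
print(f"{hours:.1f} hours")  # roughly 16 hours at the peak rate
```

Real sessions that record below the peak rate would last proportionally longer.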
Store. The incoming data needs to be incorporated into the distributed storage cluster and organized so that the StreamX tools can see it. This is done with an import function that associates each of the streams in a multi-sensor dataset with a time-synchronized project. When using the StreamX Data Recorder, the data is already stored in projects and is immediately accessible.
Tag. Once the data is recognized by the system, the next step is to tag or annotate the data based on the content important to the application. The user provides one or more image or pattern recognition algorithms in the form of a small application that runs on the data and indicates when the content of interest is detected in a particular data frame. The StreamX platform uses those mini-applications to crawl through the data distributed across the cluster nodes in the background, mining the data in parallel for interesting content. When the content is found, the system tags all the timestamps of the data file where the content is located. Timestamps can carry many different tags to indicate multiple types of content or multiple versions of the detection algorithm.
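The tagging idea can be illustrated with a small sketch. This is not the StreamX interface, which the text does not describe; the frame layout, the detector signature, and the tag names (including the ":v1" and ":v2" version suffixes) are all assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Set, Tuple

# Hypothetical types: a frame is (timestamp in seconds, decoded payload).
Frame = Tuple[float, dict]
Detector = Callable[[dict], bool]


def tag_stream(frames: Iterable[Frame],
               detectors: Dict[str, Detector]) -> Dict[float, Set[str]]:
    """Run every detector on every frame; record which tags fired per timestamp."""
    tags: Dict[float, Set[str]] = defaultdict(set)
    for timestamp, payload in frames:
        for tag_name, detect in detectors.items():
            if detect(payload):
                tags[timestamp].add(tag_name)
    return dict(tags)


# Toy detectors standing in for real image or pattern recognition code.
# Two versions of the same detector coexist as separate tags.
detectors = {
    "pedestrian:v1": lambda p: p.get("pedestrians", 0) > 0,
    "pedestrian:v2": lambda p: p.get("pedestrians", 0) > 1,
    "rain": lambda p: p.get("rain", False),
}

frames = [
    (0.0, {"pedestrians": 0}),
    (0.1, {"pedestrians": 2, "rain": True}),
    (0.2, {"pedestrians": 1}),
]

tag_index = tag_stream(frames, detectors)
```

Timestamps where no detector fires are simply absent from the index, and one timestamp can hold several tags at once, matching the description above.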
Search. When a user wants to find specific sequences of data within any of the large multi-sensor data files, they use a web-based interface to set up a parallel search. The search criteria are the user-defined tags set up during the tagging and mining phase, and the criteria can be as simple or complex as needed. The search runs in parallel on every data node in the cluster and returns a list of time sequences that matched the search criteria. The search results can be stored, partitioned, and managed for repeated testing and validation.
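A minimal sketch of such a search follows, under stated assumptions: each node holds its own tag index, worker threads stand in for cluster nodes, the criterion is "all of these tags present", and neighboring hits no more than max_gap seconds apart are merged into one time sequence. None of these names or rules come from the StreamX product itself.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Set, Tuple

# Hypothetical layout: node name -> {timestamp: tags} for data on that node.
TagIndex = Dict[float, Set[str]]
Segment = Tuple[str, float, float]  # (node, start time, end time)


def search_node(node: str, index: TagIndex, required: Set[str],
                max_gap: float) -> List[Segment]:
    """Merge matching timestamps at most max_gap apart into segments."""
    hits = sorted(t for t, tags in index.items() if required <= tags)
    segments: List[Segment] = []
    for t in hits:
        if segments and t - segments[-1][2] <= max_gap:
            # Extend the current segment to include this timestamp.
            segments[-1] = (node, segments[-1][1], t)
        else:
            segments.append((node, t, t))
    return segments


def parallel_search(cluster: Dict[str, TagIndex], required: Set[str],
                    max_gap: float = 0.5) -> List[Segment]:
    """Search every node's index concurrently and combine the results."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(search_node, node, index, required, max_gap)
                   for node, index in cluster.items()]
        return sorted(seg for f in futures for seg in f.result())


cluster = {
    "node-a": {1.0: {"rain", "pedestrian"}, 1.5: {"rain", "pedestrian"},
               9.0: {"rain"}},
    "node-b": {4.0: {"pedestrian"}, 7.0: {"rain", "pedestrian"}},
}
results = parallel_search(cluster, {"rain", "pedestrian"})
```

The returned list of (node, start, end) tuples is the kind of result that could then be stored or partitioned for later runs.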
Use. Once the user has identified which sections of each data file are needed, they use a web-based interface to launch an application, such as a simulator, into the cluster to run in parallel on every node where any of the target data is located. The application is the same one they run on a desktop or workstation, without recoding. Often the application is large, complex, and proprietary, such as a simulator. The application results are collected and stored separately. The process can be saved so that the same test run can be performed the same way automatically in the future.
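The "send the application to the data" pattern can be sketched as follows. This is an illustration under assumptions, not how StreamX works internally: segments are grouped by the node that holds them, an unmodified command-line program is launched once per segment, and local subprocesses stand in for remote cluster nodes. The segment tuple layout and all function names are hypothetical.

```python
import subprocess
import sys
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Segment = Tuple[str, float, float]  # hypothetical: (node, start, end)


def group_by_node(segments: List[Segment]) -> Dict[str, List[Segment]]:
    """Group segments by the node that holds them, so work goes to the data."""
    plan: Dict[str, List[Segment]] = defaultdict(list)
    for seg in segments:
        plan[seg[0]].append(seg)
    return dict(plan)


def run_on_node(command: List[str], node: str,
                segments: List[Segment]) -> List[str]:
    """Run the unmodified command once per segment.

    Runs locally here; in a real cluster, `node` would select the host.
    """
    outputs = []
    for _, start, end in segments:
        done = subprocess.run(command + [str(start), str(end)],
                              capture_output=True, text=True, check=True)
        outputs.append(done.stdout.strip())
    return outputs


def run_everywhere(command: List[str],
                   segments: List[Segment]) -> Dict[str, List[str]]:
    """Process all nodes concurrently and collect results per node."""
    plan = group_by_node(segments)
    with ThreadPoolExecutor() as pool:
        futures = {node: pool.submit(run_on_node, command, node, segs)
                   for node, segs in plan.items()}
        return {node: f.result() for node, f in futures.items()}


# Stand-in "desktop application": prints the length of the segment it is given.
app = [sys.executable, "-c",
       "import sys; print(float(sys.argv[2]) - float(sys.argv[1]))"]

results = run_everywhere(app, [("node-a", 1.0, 1.5), ("node-b", 7.0, 7.0),
                               ("node-a", 2.0, 4.0)])
```

The application itself is never changed; only the arguments that tell it which time segment to read vary from run to run, and its output is collected separately from the source data.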
Automate. Every part of the process can be saved and automated to save time and provide reliable testing and validation. Complex searches can be saved and re-run after new datasets are imported. Parallel execution runs can be saved and re-run on updated sets of search results. Training runs can be set up with thousands or millions of examples to train a classifier. Simulators can be given slightly different input parameters automatically to vary the environment and test conditions, creating more robust testing and validation. The test automation system supports active feedback, so that the results from one simulation run can be used as input to the next.
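Parametric variation with active feedback can be pictured with a short sketch. The grid sweep, the toy simulator, and every name below are assumptions made for illustration; the text does not specify how StreamX schedules variations or passes results between runs.

```python
from itertools import product
from typing import Callable, Dict, Iterator, List


def parameter_grid(ranges: Dict[str, List[float]]) -> Iterator[Dict[str, float]]:
    """Yield every combination of the given parameter values."""
    names = list(ranges)
    for values in product(*(ranges[n] for n in names)):
        yield dict(zip(names, values))


def run_with_feedback(simulate: Callable[[Dict[str, float], float], float],
                      ranges: Dict[str, List[float]],
                      initial_state: float) -> List[float]:
    """Run the simulator once per parameter set, feeding each result forward."""
    state = initial_state
    history = []
    for params in parameter_grid(ranges):
        state = simulate(params, state)  # previous result is part of the input
        history.append(state)
    return history


# Toy simulator (hypothetical): a stopping distance that grows with speed
# and shrinks with friction; it carries the worst case seen so far forward.
def toy_simulator(params: Dict[str, float], previous_result: float) -> float:
    distance = params["speed"] ** 2 / (20 * params["friction"])
    return max(distance, previous_result)


history = run_with_feedback(toy_simulator,
                            {"speed": [10.0, 20.0], "friction": [0.5, 1.0]},
                            initial_state=0.0)
```

A full grid is only one possible variation scheme; random sampling or a search guided by earlier results would fit the same loop.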
It is as simple as that. The user supplies the data, the content detection algorithm, the search criteria, and the desktop application. The StreamX streaming data management platform takes care of the rest.
Capture | Store | Tag | Search | Use | Automate
3. One Platform Addressing Multiple Streaming Data Challenges
Many different challenges with big streaming data can be solved by the StreamX platform. Whether the focus is on analyzing streams as they come in, mining the data offline for deeper understanding, or training AIs, the StreamX platform has the capability to find the exact data you are looking for and accelerate your use of that data.
Test and Validation
Challenges
• Finding the right time slices of data to test with from inside many large data files
• Quickly testing the new application version against those test data
• Sensitivity analysis based on parametric variation of the state space
StreamX Solution
• Content-based search of up to 100 PB of streaming data in parallel
• Automatic parallel execution of the application against that data
• Test automation engine allowing flexible parametric variation schemes and active feedback control of test scenarios
Training Classifiers
Challenges
• Finding the right slices of video or sensor data files for training out of hundreds of terabytes or petabytes
• Manual labor or scripts to feed the training data to the classifier
StreamX Solution
• Content-based search of up to 100 PB of streaming data in parallel, extracting just the relevant scenes
• Automated test harness trains the classifier after initial setup from the web-based user interface
Real-time Analysis
Challenges
• Making newly acquired data instantly available for analysis
• Quickly comparing newly acquired data to mounds of historical data
StreamX Solution
• Instantly makes newly acquired data on StreamX data recorders part of the StreamX distributed file system
• Automatic parallel execution of a comparison application for the new data against existing data
Automated Regression Testing
Challenges
• Separating test data from data used to develop the application
• Manually setting up each regression run, or writing complicated scripts to manage some of the testing
StreamX Solution
• Data management that keeps regression test data separate from development data
• Automated runs of test data in parallel, executing the application on the cluster nodes where the data resides
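One common way to keep regression test data apart from development data is to assign each segment to a partition by hashing its identifier, so the assignment never changes between runs. The sketch below shows that general technique; it is not described in the text as the StreamX mechanism, and the segment IDs and the 20% test fraction are made-up examples.

```python
import hashlib
from typing import Dict, Iterable, List


def assign_partition(segment_id: str, test_fraction: float = 0.2) -> str:
    """Map a segment ID to 'test' or 'development' deterministically."""
    digest = hashlib.sha256(segment_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return "test" if bucket < test_fraction else "development"


def partition(segment_ids: Iterable[str],
              test_fraction: float = 0.2) -> Dict[str, List[str]]:
    """Split segment IDs into two disjoint lists."""
    parts: Dict[str, List[str]] = {"test": [], "development": []}
    for sid in segment_ids:
        parts[assign_partition(sid, test_fraction)].append(sid)
    return parts


# Hypothetical segment IDs for illustration.
ids = [f"drive-{n:04d}" for n in range(1000)]
parts = partition(ids)
```

Because the assignment depends only on the ID, newly imported segments can be partitioned later without moving any existing segment from one set to the other.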
How We Do It
XCube has five guiding principles that drive the design of our stream data management solutions:
• Never require a rewrite of the customer's application. Those are complex and proprietary.
• Never move the data, because individual datasets can be terabytes and collections used for testing can be petabytes. Also, some datasets are restricted from being moved.
• Support globally distributed data and teams, so that any of the data can be accessed and used anywhere in the world.
• Enable customers to define what content is important for searches.
• Provide the maximum amount of automation, so the customer can focus on the science and engineering instead of IT and data management.
Why We Do It
Our inspiration came while XCube founder Mikael Taveniku was on a consulting project to design the autonomous drive architecture for a major automotive company. His approach extended the US DoD multi-sensor fusion architecture for unmanned aerial vehicles to unmanned land vehicles, and their architecture is still in use today.
While completing that project, he realized that there was a challenge even bigger than the design of the autonomous drive system: validating the correctness of the system design and operation. The enabling functionality for that validation is to find relevant data, process it, and do so in parallel without parallel programming.
From that vision, XCube is changing the way ADAS and autonomous vehicles get developed, tested, and validated.