Greenplum Database on HDFS
- 1. Greenplum Database on HDFS (GOH)
Presenter: Lei Chang
lei.chang@emc.com
© Copyright 2012 EMC Corporation. All rights reserved. 1
- 2. Outline
• Introduction
• Architecture
• Features
• Performance study
- 4. GOH use cases
• All Greenplum customers who want to minimize the amount of duplicate storage they have to buy for analytics
– managing scale is much easier when you focus on the growth of one pool rather than many fragmented pools.
• Customers who want the functionality of GPDB together with the generality and storage provided by their HBase store.
• Potential ability to plug various storage systems such as Isilon, Atmos, MapR Filesystem, CloudStore, GPFS, Lustre, PVFS and Ceph into the GPDB/Hadoop software stack
- 5. [Architecture diagram: the GPDB master host connects to segment hosts over the GPDB Interconnect; each segment host runs primary segments and mirror segments, which issue metadata operations and reads/writes against tables stored in an HDFS filespace; below, a Namenode coordinates Datanodes across Rack1 and Rack2, with block replication between datanodes.]
- 6. GOH features
• A pluggable storage layer. If a new file system can support the full semantics of the HDFS interface, it can be added as GPDB AO table storage.
• Attributed filespaces
• HDFS filespaces are natively supported
• Full transaction support for AO tables on HDFS.
• HDFS truncation capability to support the transaction capability of GOH.
• An HDFS native C interface to eliminate the concurrency limitation of the current Java JNI-based client.
• All current GPDB functionality: fault tolerance, etc.
- 7. Pluggable storage: user interface

CREATE FUNCTION open_func AS '(' obj_file ',' link_symbol ')'

CREATE FILESYSTEM filesystemname [OWNER ownername]
(
  connect = connect_func,
  open    = open_func,
  close   = close_func,
  read    = read_func,
  write   = write_func,
  seek    = seek_func,
  ...
)
- 8. Attributed filespaces
• The number of replicas for the tables in the filespace
• Whether mirroring is supported for the tables stored in the filespace
• Other attributes…
- 9. Example SQL

CREATE FILESPACE goh ON HDFS
(
  1: 'hdfs://name-node/users/changl1/gp-data/gohmaster/gpseg-1',
  2: 'hdfs://name-node/users/changl1/gp-data/goh/gpseg0',
  3: 'hdfs://name-node/users/changl1/gp-data/goh/gpseg1'
)
WITH (NUMREPLICA = 3, MIRRORING = false);
- 10. Transaction support
• When a load transaction is aborted, some garbage data is left at the end of the file. In HDFS-like systems, data cannot be truncated or overwritten, so we need a way to handle the partial data in order to support transactions.
– Option 1: Load the data into a separate HDFS file. Leads to an unlimited number of files.
– Option 2: Use metadata to record the boundary of the garbage data, and implement a kind of vacuum mechanism.
– Option 3: Implement HDFS truncation.
- 11. HDFS C client: why
• libhdfs (the current HDFS C client) is based on JNI, which makes it difficult for GOH to support a large number of concurrent queries.
• Example:
– 6 segments on each segment host
– 50 concurrent queries
– each query may have 12 or more QE processes that do scans
– that means about 600 processes, starting 600 JVMs to access HDFS.
– If each JVM uses 500 MB of memory, the JVMs will consume 600 * 500 MB = 300 GB of memory.
– Thus naïve usage of libhdfs is not suitable for GOH. Currently we have three options to solve this problem.
- 12. HDFS client: three options
• Option 1: use HDFS FUSE. HDFS FUSE introduces some performance overhead, and its scalability has not been verified yet.
• Option 2: implement a C RPC interface that communicates directly with the NameNode and DataNodes. Requires many changes whenever the RPC protocol changes.
• Option 3: implement a webhdfs-based C client. webhdfs is based on HTTP, which also introduces some cost; performance should be benchmarked. The webhdfs-based method has several benefits, such as ease of implementation and low maintenance cost.
• Currently, we have implemented option 2 and option 3.
- 13. HDFS truncate
• API
– truncate (DistributedFileSystem) - truncate a file to a specified length
– void truncate(Path src, long length) throws IOException;
• Semantics
– Only a single writer/appender/truncater is allowed. Users can only call truncate on closed files.
– HDFS guarantees the atomicity of a truncate operation: it either succeeds or fails; it does not leave the file in an undefined state.
– Concurrent readers may read content of a file that will be truncated by a concurrent truncate operation, but they must be able to read all the data that is not affected by the concurrent truncate operation.
- 14. HDFS truncate implementation (HDFS-3107)
• Get the lease of the to-be-truncated file (F)
• If the truncate is at a block boundary
– Delete the tail blocks as an atomic operation.
• Otherwise, if the truncate is not at a block boundary
– Copy the last block (B) of the result file (R) to a temporary file (T).
– Remove the tail blocks of file F (including B, B+1, …), then concat F and T to get R.
• Release the lease on the file
- 16. Thank you!