Pivotal HD and Spring for Apache Hadoop

2. Hadoop and Pivotal HD
April 23, 2013
3. About the speakers
Adam Shook
- Technical Architect for Pivotal
- 2+ years Hadoop experience
- Instructor for Hadoop-based courses
Mark Pollack
- Spring committer since 2003
- Founder of Spring.NET
- Lead of the Spring Data family of projects
4. Agenda
What is Hadoop?
Pivotal HD
HAWQ
Spring for Apache Hadoop
Questions
5. What is Hadoop?
6. Why Is Hadoop Important?
Delivers performance and scalability at low cost
Handles large amounts of data
Stores data in native format
Resilient in case of infrastructure failures
Transparent application scalability
7. Hadoop Overview
Open-source Apache project out of Yahoo! in 2006
Distributed fault-tolerant data storage and batch processing
Linear scalability on commodity hardware
8. Hadoop Overview
Great at
- Reliable storage for huge data sets
- Batch queries and analytics
- Changing schemas
Not so great at
- Changes to files (can't do it...)
- Low-latency responses
- Analyst usability
9. HDFS Overview
Hierarchical UNIX-like file system for data storage
- sort of
Splitting of large files into blocks
Distribution and replication of blocks to nodes
Two key services
- Master NameNode
- Many DataNodes
Secondary/Checkpoint Node
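The same file system is also reachable through Hadoop's Java client API. A minimal read sketch, assuming a NameNode at hdfs://localhost:9000 (matching the properties used later in this deck) and an illustrative file path:

// Minimal HDFS client sketch using the standard Hadoop FileSystem API.
// The NameNode URI and file path are illustrative placeholders.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsClientSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The NameNode brokers metadata; DataNodes serve the actual blocks
        FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9000"), conf);
        Path file = new Path("/wc/input/sample.txt");
        if (fs.exists(file)) {
            BufferedReader reader =
                new BufferedReader(new InputStreamReader(fs.open(file)));
            try {
                System.out.println(reader.readLine());
            } finally {
                reader.close();
            }
        }
        fs.close();
    }
}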
10. How HDFS Works - Writes
[Diagram: Client, NameNode, and DataNodes A-D; blocks A1-A4]
1. Client contacts NameNode to write data
2. NameNode says write it to these nodes
3. Client sequentially writes blocks to the DataNodes
11. How HDFS Works - Writes
[Diagram: blocks A1-A4 replicated across DataNodes A-D]
DataNodes replicate data blocks, orchestrated by the NameNode
12. How HDFS Works - Reads
[Diagram: Client, NameNode, and DataNodes A-D with replicated blocks]
1. Client contacts NameNode to read data
2. NameNode says you can find it here
3. Client sequentially reads blocks from the DataNodes
13. Hadoop MapReduce 1.x
Moves the code to the data
JobTracker
- Master service to monitor jobs
TaskTracker
- Multiple services to run tasks
- Same physical machine as a DataNode
A job contains many tasks
A task contains one or more task attempts
14. How MapReduce Works
[Diagram: JobTracker and TaskTrackers A-D, each co-located with DataNodes A-D holding blocks A1-A4 and B1-B4]
1. Client submits job to JobTracker
2. JobTracker submits tasks to TaskTrackers
3. Job output is written to DataNodes w/replication
4. JobTracker reports metrics
15. MapReduce Paradigm
Data processing system with two key phases
Map
- Perform a map function on key/value pairs
Reduce
- Perform a reduce function on key/value groups
Groups created by sorting map output
16. [Diagram: word count flowing through three map tasks, shuffle and sort, and two reduce tasks]
Map input (byte offset, line):
(0, "hadoop is fun") (52, "I love hadoop") (104, "Pig is more fun")
Map output:
("hadoop", 1) ("is", 1) ("fun", 1)
("I", 1) ("love", 1) ("hadoop", 1)
("Pig", 1) ("is", 1) ("more", 1) ("fun", 1)
Reducer input groups (created by the shuffle and sort):
("hadoop", {1,1}) ("is", {1,1}) ("fun", {1,1}) ("love", {1}) ("I", {1}) ("Pig", {1}) ("more", {1})
Reducer output:
("hadoop", 2) ("fun", 2) ("love", 1) ("I", 1) ("is", 2) ("Pig", 1) ("more", 1)
17. Word Count
Count the number of times each word is used in a body of text
Map input is a line of text
Reduce output is a word and its count
map(byte_offset, line)
    foreach word in line
        emit(word, 1)

reduce(word, counts)
    sum = 0
    foreach count in counts
        sum += count
    emit(word, sum)
18. Mapper Code
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the line into tokens and emit (word, 1) for each
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, ONE);
        }
    }
}
19. Reducer Code
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class IntSumReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum all counts for this word and emit the total
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
20. Pivotal HD
21. Pivotal HD
- World's first true SQL processing for enterprise-ready Hadoop
- 100% Apache Hadoop-based platform
- Virtualization and cloud ready with VMware and Isilon
22. Pivotal HD Architecture
[Architecture diagram]
- Apache Hadoop core: HDFS, MapReduce, YARN, HBase, Pig, Hive, Mahout, Sqoop, Flume, ZooKeeper, resource management & workflow
- Pivotal HD Enterprise additions: Command Center (deploy, configure, monitor, manage), Hadoop Virtualization (HVE), Data Loader, Spring
- HAWQ - Advanced Database Services: Xtension Framework, Catalog Services, Query Optimizer, Dynamic Pipelining, ANSI SQL + Analytics
23. HAWQ
24. HAWQ: The Crown Jewel of Greenplum
- SQL compliant
- World-class query optimizer
- Interactive query
- Horizontal scalability
- Robust data management
- Common Hadoop formats
- Deep analytics
25. HAWQ
Query Processing
- Interactive and true ANSI SQL support
- Multi-petabyte horizontal scalability
- Cost-based parallel query optimizer
- Programmable analytics
Database Services and Management
- Scatter-gather data loading
- Row and column storage
- Workload management
- Multi-level partitioning
- 3rd-party tool & open client interfaces
26. 10+ Years MPP Database R&D to Hadoop
[Product feature diagram]
- MPP architecture: Shared-Nothing MPP, Parallel Query Optimizer, Parallel Dataflow Engine, Software Interconnect, Polymorphic Data Storage™, Multi-Level Fault Tolerance
- Adaptive services: Online System Expansion, Workload Management
- Loading & external access: Scatter/Gather Streaming™ Data Loading, Petabyte-Scale Loading, Trickle Micro-Batching, Anywhere Data Access
- Storage & data access: Hybrid Storage & Execution (Row- & Column-Oriented), In-Database Compression, Multi-Level Partitioning
- Language support: Comprehensive SQL (SQL 92, 99, 2003), OLAP Extensions, Analytics Extensions
- Client access & tools: ODBC, JDBC, OLEDB, MapReduce, etc.; 3rd-party BI, ETL, and data-mining tools; admin tools (Command Center, Package Manager)
27. Query Optimizer
Physical plan contains scans, joins, sorts, aggregations, etc.
Cost-based optimization looks for the most efficient plan
Global planning avoids sub-optimal "SQL pushing" to segments
Directly inserts "motion" nodes for inter-segment communication
[Example execution plan with nodes:]
Scan Bars b
HashJoin b.name = s.bar
Scan Sells s
Filter b.city = 'San Francisco'
Project s.beer, s.price
Motion Gather
Motion Redist(b.name)
28. Dynamic Pipelining™
A supercomputing-based "soft-switch"
Core execution technology, borrowed from GPDB, allows us to run complex jobs without materializing intermediate results
Efficiently pumps streams of data between motion nodes during query-plan execution
Delivers messages, moves data, collects results, and coordinates work among the segments in the system
29. Xtension Framework
Enables intelligent query integration with filter pushdown to HBase, Hive, and HDFS
Supports common data formats such as Avro, Protocol Buffers, and Sequence Files
Provides an extensible framework for connectivity to other data sources
[Diagram: Xtension Framework over HDFS, HBase, Hive]
30. HAWQ Deployment
[Deployment diagram]
- ODBC/JDBC driver: client access
- Master servers & NameNodes: query planning & dispatch
- Segment servers & DataNodes: query processing & data storage
- Dynamic Pipelining connects the segments
- External sources: loading, streaming, etc.
- All data stored in HDFS
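For reference, a client-side sketch of running the slide-27 example through HAWQ's JDBC interface. The SQL is reconstructed from the example plan, and the driver, host, port, database, and credentials are placeholder assumptions (a PostgreSQL-compatible JDBC driver is assumed here, given HAWQ's GPDB lineage):

// Hypothetical JDBC client for HAWQ; all connection details are placeholders.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HawqQuerySketch {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://hawq-master:5432/demo", "gpadmin", "secret");
        try {
            Statement stmt = conn.createStatement();
            // Query reconstructed from the slide-27 execution plan
            ResultSet rs = stmt.executeQuery(
                "SELECT s.beer, s.price FROM Bars b " +
                "JOIN Sells s ON b.name = s.bar " +
                "WHERE b.city = 'San Francisco'");
            while (rs.next()) {
                System.out.println(rs.getString("beer") + " " + rs.getBigDecimal("price"));
            }
        } finally {
            conn.close();
        }
    }
}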
37. Spring for Apache Hadoop
Simplify developing Hadoop applications
38. Developer observations on Hadoop
Hadoop has a poor out-of-the-box programming model
Non-trivial applications often become a collection of scripts calling Hadoop command-line applications
Spring aims to simplify developing Hadoop applications
- Leverage several Spring ecosystem projects
39. Spring for Apache Hadoop - Features
Consistent programming and declarative configuration model
- Create, configure, and parameterize Hadoop connectivity and all job types
- Environment profiles: easily move an application from dev to qa to production
Developer productivity
- Create well-formed applications, not spaghetti-script applications
- Simplify HDFS access and the FsShell API with support for JVM scripting
- Runner classes for MR/Pig/Hive/Cascading for small workflows
- Helper "Template" classes for Pig/Hive/HBase
40. Spring for Apache Hadoop - Use Cases
Apply across a wide range of use cases
- Ingest: Events/JDBC/NoSQL/Files to HDFS
- Orchestrate: Hadoop jobs
- Export: HDFS to JDBC/NoSQL
Spring Integration and Spring Batch make this possible
41. Counting Words - Configuring M/R
- Standard Hadoop APIs
Configuration conf = new Configuration();
Job job = new Job(conf, "wordcount");
job.setJarByClass(WordCountMapper.class);
job.setMapperClass(WordCountMapper.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.waitForCompletion(true);
42. Configuring Hadoop with Spring
applicationContext.xml:

<context:property-placeholder location="hadoop-dev.properties"/>

<hdp:configuration>
fs.default.name=${hd.fs}
</hdp:configuration>

<hdp:job id="word-count-job"
    input-path="${input.path}"
    output-path="${output.path}"
    jar="hadoop-examples.jar"
    mapper="examples.WordCount.WordMapper"
    reducer="examples.WordCount.IntSumReducer"/>

<hdp:job-runner id="runner" job-ref="word-count-job"
    run-at-startup="true"/>

hadoop-dev.properties:

input.path=/wc/input/
output.path=/wc/word/
hd.fs=hdfs://localhost:9000

Note: the job's output key and value classes are determined automatically.
43. Injecting Jobs
Use DI to obtain a reference to a Hadoop Job
- Perform additional runtime configuration and submit

public class WordService {
    @Autowired
    private Job mapReduceJob;

    public void processWords() throws Exception {
        mapReduceJob.submit();
    }
}
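As a sketch of what "additional runtime configuration" might look like before submission, the injected job's Configuration can carry a parameter down to the tasks (the property name here is a hypothetical application setting, not a Hadoop built-in):

public class WordService {
    @Autowired
    private Job mapReduceJob;

    public void processWords(boolean caseSensitive) throws Exception {
        // Hypothetical application property; a Mapper could read it back via
        // context.getConfiguration().getBoolean("wordcount.case.sensitive", true)
        mapReduceJob.getConfiguration().setBoolean("wordcount.case.sensitive", caseSensitive);
        mapReduceJob.submit();
    }
}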
44. Streaming Jobs and Environment Configuration

bin/hadoop jar hadoop-streaming.jar \
    -input /wc/input -output /wc/output \
    -mapper /bin/cat -reducer /bin/wc \
    -files stopwords.txt

<context:property-placeholder location="hadoop-${env}.properties"/>

<hdp:streaming id="wc" input-path="${input}" output-path="${output}"
    mapper="${cat}" reducer="${wc}"
    files="classpath:stopwords.txt">
</hdp:streaming>

env=dev java -jar SpringLauncher.jar applicationContext.xml

hadoop-dev.properties:
input.path=/wc/input/
output.path=/wc/word/
hd.fs=hdfs://localhost:9000
45. Streaming Jobs and Environment Configuration

bin/hadoop jar hadoop-streaming.jar \
    -input /wc/input -output /wc/output \
    -mapper /bin/cat -reducer /bin/wc \
    -files stopwords.txt

<context:property-placeholder location="hadoop-${env}.properties"/>

<hdp:streaming id="wc" input-path="${input}" output-path="${output}"
    mapper="${cat}" reducer="${wc}"
    files="classpath:stopwords.txt">
</hdp:streaming>

env=qa java -jar SpringLauncher.jar applicationContext.xml

hadoop-qa.properties:
input.path=/gutenberg/input/
output.path=/gutenberg/word/
hd.fs=hdfs://darwin:9000
46. HDFS and Hadoop Shell as APIs
- Access all "bin/hadoop fs" commands through Spring's FsShell helper class
  - mkdir, chmod, test
class MyScript {
    @Autowired FsShell fsh;

    @PostConstruct void init() {
        String outputDir = "/data/output";
        // Remove the output directory if it already exists
        if (fsh.test(outputDir)) {
            fsh.rmr(outputDir);
        }
    }
}
47. HDFS and Hadoop Shell as APIs
FsShell is designed to support JVM scripting languages
// use the shell (made available under variable fsh)
if (!fsh.test(inputDir)) {
fsh.mkdir(inputDir);
fsh.copyFromLocal(sourceFile, inputDir);
fsh.chmod(700, inputDir)
}
if (fsh.test(outputDir)) {
fsh.rmr(outputDir)
}
copy-files.groovy
48. HDFS and Hadoop Shell as APIs
Reference the script and supply variables in the application configuration
<script id="setupScript" location="copy-files.groovy">
<property name="inputDir" value="${wordcount.input.path}"/>
<property name="outputDir" value="${wordcount.output.path}"/>
<property name="sourceFile" value="${localSourceFile}"/>
</script>
appCtx.xml
49. Small workflows
Often need the following steps
- Execute HDFS operations before the job
- Run MapReduce job
- Execute HDFS operations after the job completes
Spring's JobRunner helper class sequences these steps
- Can reference multiple scripts with comma-delimited names

<hdp:job-runner id="runner" run-at-startup="true"
    pre-action="setupScript"
    job="wordcountJob"
    post-action="tearDownScript"/>
50. Runner classes
Similar runner classes are available for Hive and Pig
Implement the JDK Callable interface
Easy to schedule for simple needs using Spring
Can later "graduate" to use Spring Batch for more complex workflows
- Start simple and grow, reusing existing configuration

<hdp:job-runner id="runner" run-at-startup="false"
    pre-action="setupScript"
    job="wordcountJob"
    post-action="tearDownScript"/>

<task:scheduled-tasks>
    <task:scheduled ref="runner" method="call" cron="3/30 * * * * ?"/>
</task:scheduled-tasks>
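Because the runners implement Callable, they can also be invoked on demand from application code. A minimal sketch (the wrapper class and wiring are illustrative):

import java.util.concurrent.Callable;
import org.springframework.beans.factory.annotation.Autowired;

public class OnDemandRunner {
    // The <hdp:job-runner> bean implements java.util.concurrent.Callable
    @Autowired
    private Callable<?> runner;

    public void runNow() throws Exception {
        runner.call(); // executes pre-action, job, and post-action in sequence
    }
}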
51. Spring's PigRunner
Execute a small Pig workflow

<pig-factory job-name="analysis" properties-location="pig-server.properties"/>

<script id="hdfsScript" location="copy-files.groovy">
    <property name="sourceFile" value="${localSourceFile}"/>
    <property name="inputDir" value="${inputDir}"/>
    <property name="outputDir" value="${outputDir}"/>
</script>

<pig-runner id="pigRunner" pre-action="hdfsScript" run-at-startup="true">
    <script location="wordCount.pig">
        <arguments>
            inputDir=${inputDir}
            outputDir=${outputDir}
        </arguments>
    </script>
</pig-runner>
52. PigTemplate - Configuration
Helper class that simplifies the programmatic use of Pig
- Common tasks are one-liners
Similar template helper classes exist for Hive and HBase

<pig-factory id="pigFactory" properties-location="pig-server.properties"/>
<pig-template pig-factory-ref="pigFactory"/>
53. PigTemplate - Programmatic Use
public class PigPasswordRepository implements PasswordRepository {
@Autowired
private PigTemplate pigTemplate;
@Autowired
private String outputDir;
private String pigScript = "classpath:password-analysis.pig";
public void processPasswordFile(String inputFile) {
Properties scriptParameters = new Properties();
scriptParameters.put("inputDir", inputFile);
scriptParameters.put("outputDir", outputDir);
pigTemplate.executeScript(pigScript, scriptParameters);
}
}
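A sketch of how the repository above might be bootstrapped and driven; the context file name and the HDFS input path are illustrative placeholders:

// Illustrative bootstrap: load the Spring context and invoke the Pig-backed repository.
import org.springframework.context.support.ClassPathXmlApplicationContext;

public class AnalysisMain {
    public static void main(String[] args) {
        ClassPathXmlApplicationContext ctx =
            new ClassPathXmlApplicationContext("applicationContext.xml");
        try {
            PasswordRepository repo = ctx.getBean(PigPasswordRepository.class);
            repo.processPasswordFile("/data/passwords/input.txt"); // hypothetical HDFS path
        } finally {
            ctx.close();
        }
    }
}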
54. Big Data problems are also integration problems
[Diagram: Collect → Transform → RT Analysis → Ingest → Batch Analysis → Distribute → Use;
technologies spanning the pipeline: Twitter Search & Gardenhose, Spring Integration & Data,
Redis, GemFire (CQ), Spring Hadoop + Batch, Spring MVC]
55. Spring Integration
- Implementation of Enterprise Integration Patterns
  - Mature, since 2007
  - Apache 2.0 License
- Separates integration concerns from processing logic
  - Framework handles message reception and method invocation, e.g. polling vs. event-driven
  - Endpoints written as POJOs, which increases testability
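A minimal sketch of the POJO endpoint style (class and method names are illustrative): the handler holds only business logic, and the framework maps messages onto the method when the class is wired as, say, a transformer or service activator.

// Illustrative POJO endpoint: Spring Integration invokes the method and
// handles message reception; the class itself has no messaging dependencies.
public class SyslogTransformer {
    public String transform(String rawEvent) {
        // Plain logic, unit-testable without any framework running
        return rawEvent.trim().toLowerCase();
    }
}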
56. Pipes and Filters Architecture
- Endpoints are connected through channels and exchange messages

$> cat foo.txt | grep the | while read l; do echo $l; done

[Diagram: producer endpoint → channel → consumer endpoint; adapters include File, JMS, TCP, and routing]
57. Spring Batch
Framework for batch processing
- Basis for JSR-352
Born out of a collaboration with Accenture in 2007
Features
- Parsers, mappers, readers, writers
- Automatic retries after failure
- Periodic commits
- Synchronous and asynchronous processing
- Parallel processing
- Partial processing (skipping records)
- Non-sequential processing
- Job tracking and restart
58. Spring Integration and Batch for Hadoop Ingest/Export
Event streams - Spring Integration
- Examples:
  - Consume syslog events, transform, and write to HDFS
  - Consume Twitter search results and write to HDFS
Batch - Spring Batch
- Examples:
  - Read log files on the local file system, transform, and write to HDFS
  - Read from HDFS, transform, and write to JDBC, HBase, MongoDB, ...
59. Spring Data, Integration, & Batch for Analytics
Real-time analytics - Spring Integration & Spring Data
- Examples - a Service Activator that:
  - Increments counters in Redis or MongoDB using Spring Data helper libraries
  - Creates GemFire continuous queries using Spring GemFire
Batch analytics - Spring Batch
- Orchestrate Hadoop-based workflows with Spring Batch
- Also orchestrate non-Hadoop-based workflows
60. Ingesting - Syslog into HDFS
Use SI's syslog adapter
Perform transformation on the data
Route to specific channels based on category
One route leads to HDFS, with filtered data stored in Redis
61. Ingesting - Multi-node syslog into HDFS
Syslog collection across multiple machines
Break the processing chain at channel boundaries
Use SI's TCP adapters to forward events
- Or other SI middleware adapters
62. Hadoop analytical workflow managed by Spring Batch
- Reuse the same Batch infrastructure and knowledge to manage Hadoop workflows
- A step can be any Hadoop job type or HDFS script
63. Spring Batch Configuration for Hadoop
<job id="job1">
    <step id="import" next="wordcount">
        <tasklet ref="import-tasklet"/>
    </step>
    <step id="wordcount" next="pig">
        <tasklet ref="wordcount-tasklet"/>
    </step>
    <step id="pig" next="parallel">
        <tasklet ref="pig-tasklet"/>
    </step>
    <split id="parallel" next="hdfs">
        <flow>
            <step id="mrStep">
                <tasklet ref="mr-tasklet"/>
            </step>
        </flow>
        <flow>
            <step id="hive">
                <tasklet ref="hive-tasklet"/>
            </step>
        </flow>
    </split>
    <step id="hdfs">
        <tasklet ref="hdfs-tasklet"/>
    </step>
</job>
64. Exporting HDFS to JDBC
- Use Spring Batch's
  - MultiResourceItemReader
  - JdbcBatchItemWriter
<step id="step1">
    <tasklet>
        <chunk reader="flatFileItemReader" processor="itemProcessor" writer="jdbcItemWriter"
               commit-interval="100" retry-limit="3"/>
    </tasklet>
</step>
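For the processor referenced above, a minimal sketch; the record type, field layout, and parsing logic are illustrative assumptions about the exported data, not part of the slide:

// Illustrative ItemProcessor: turns a line read from HDFS into an object
// the JDBC writer can bind to an INSERT statement.
import org.springframework.batch.item.ItemProcessor;

public class LineToRecordProcessor implements ItemProcessor<String, WordCountRecord> {
    @Override
    public WordCountRecord process(String line) throws Exception {
        // Hypothetical "word<TAB>count" format produced by the wordcount job
        String[] fields = line.split("\t");
        return new WordCountRecord(fields[0], Integer.parseInt(fields[1]));
    }
}

class WordCountRecord {
    private final String word;
    private final int count;
    WordCountRecord(String word, int count) { this.word = word; this.count = count; }
    public String getWord() { return word; }
    public int getCount() { return count; }
}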
66. Next Steps - Spring XD
New open-source umbrella project to support common big data use cases
- High-throughput distributed data ingestion into HDFS
  - From a variety of input sources
- Real-time analytics at ingestion time
  - Gathering metrics, counting values, GemFire CQ, ...
- On- and off-Hadoop workflow orchestration
- High-throughput data export
  - From HDFS to a RDBMS or NoSQL database
XD = eXtreme Data, or y = mx + b
67. Next Steps - Spring XD
Consistent model that spans the four use-case categories
Move beyond delivering a set of libraries
- Provide an out-of-the-box executable server
- High-level DSL to configure flows and jobs
  - http | hdfs
- Pluggable module system
See the blog post for more information
- GitHub: http://github.com/springsource/spring-xd
Get involved!
68. Resources
Pivotal
- goPivotal.com
Spring Data
- http://www.springsource.org/spring-data
- http://www.springsource.org/spring-hadoop
Spring Data Book - http://bit.ly/sd-book
- Part III covers big data
Example code: https://github.com/SpringSource/spring-data-book
Spring XD: http://github.com/springsource/spring-xd
Editor's Notes
- Writes: the client contacts the NameNode with a request to write some data; the NameNode responds and says okay, write it to these DataNodes; the client connects to each DataNode and writes out four blocks, one per node. After the file is closed, the DataNodes traffic data around to replicate the blocks in triplicate, all orchestrated by the NameNode. In the event of a node failure, data can be accessed on other nodes and the NameNode will move data blocks to other nodes.
- MapReduce: uses key/value pairs as input and output to both phases; a highly parallelizable paradigm and a very easy choice for data processing on a Hadoop cluster.
- Advanced Database Services (HAWQ): high-performance, "True SQL" query interface running within the Hadoop cluster.
- Xtensions Framework: support for ADS interfaces on external data providers (HBase, Avro, etc.).
- Advanced Analytics Functions (MADlib): ability to access parallelized machine-learning and data-mining functions at scale.
- Unified Storage Services (USS) and Unified Catalog Services (UCS): support for tiered storage (hot, warm, cold) and integration of multiple data provider catalogs into a single interface.
- Xtension data format support: HDFS (Delimited Text, Sequence File, GPDB Writable Format, Protocol Buffer, Avro); HBase (predicate pushdown); Hive (RCFile, Text File, Sequence File).