© 2012 coreservlets.com and Dima May 
HDFS - Java API 
Originals of Slides and Source Code for Examples: 
http://www.coreservlets.com/hadoop-tutorial/ 
Customized Java EE Training: http://courses.coreservlets.com/ 
Hadoop, Java, JSF 2, PrimeFaces, Servlets, JSP, Ajax, jQuery, Spring, Hibernate, RESTful Web Services, Android. 
Developed and taught by well-known author and developer. At public venues or onsite at your location. 
© 2012 coreservlets.com and Dima May 
For live Hadoop training, please see courses 
at http://courses.coreservlets.com/. 
Taught by the author of this Hadoop tutorial. Available 
at public venues, or customized versions can be held 
on-site at your organization. 
• Courses developed and taught by Marty Hall 
– JSF 2, PrimeFaces, servlets/JSP, Ajax, jQuery, Android development, Java 6 or 7 programming, custom mix of topics 
– Ajax courses can concentrate on 1 library (jQuery, Prototype/Scriptaculous, Ext-JS, Dojo, etc.) or survey several 
Customized Java EE Training: http://courses.coreservlets.com/ 
• Courses developed and taught by coreservlets.com experts (edited by Marty) 
Hadoop, Java, JSF 2, PrimeFaces, Servlets, JSP, Ajax, jQuery, Spring, Hibernate, RESTful Web Services, Android. 
– Hadoop, Spring, Hibernate/JPA, GWT, SOAP-based and RESTful Web Services 
Developed and taught by well-known author and developer. At public venues or onsite at your location. 
Contact hall@coreservlets.com for details.
Agenda 
• Java API Introduction 
• Configuration 
• Reading Data 
• Writing Data 
• Browsing file system 
4 
File System Java API 
• org.apache.hadoop.fs.FileSystem 
– Abstract class that serves as a generic file system 
representation 
– Note it’s a class, not an interface 
• Implemented in several flavors 
– Ex. Distributed or Local 
5
FileSystem Implementations 
• Hadoop ships with multiple concrete 
implementations: 
– org.apache.hadoop.fs.LocalFileSystem 
• Good old native file system using local disk(s) 
– org.apache.hadoop.hdfs.DistributedFileSystem 
• Hadoop Distributed File System (HDFS) 
• Will mostly focus on this implementation 
– org.apache.hadoop.hdfs.HftpFileSystem 
• Access HDFS in read-only mode over HTTP 
– org.apache.hadoop.fs.ftp.FTPFileSystem 
• File system on FTP server 
6 
FileSystem Implementations 
• FileSystem concrete implementations 
– Two options that are backed by Amazon S3 cloud 
• org.apache.hadoop.fs.s3.S3FileSystem 
• http://wiki.apache.org/hadoop/AmazonS3 
– org.apache.hadoop.fs.kfs.KosmosFileSystem 
• Backed by CloudStore 
• http://code.google.com/p/kosmosfs 
7
FileSystem Implementations 
• Different use cases for different concrete 
implementations 
• HDFS is the most common choice 
– org.apache.hadoop.hdfs.DistributedFileSystem 
8 
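Which concrete class you get back is driven by the URI scheme. A minimal sketch (the class name and the localhost:8020 address are assumptions for illustration): 

import java.net.URI; 
import org.apache.hadoop.conf.Configuration; 
import org.apache.hadoop.fs.FileSystem; 

public class FsBySchemeDemo { 
  public static void main(String[] args) throws Exception { 
    Configuration conf = new Configuration(); 
    // file:// scheme selects LocalFileSystem, regardless of core-site.xml 
    FileSystem local = FileSystem.get(URI.create("file:///"), conf); 
    // hdfs:// scheme selects DistributedFileSystem (assumes HDFS at localhost:8020) 
    FileSystem hdfs = FileSystem.get(URI.create("hdfs://localhost:8020/"), conf); 
    System.out.println(local.getClass().getName()); 
    System.out.println(hdfs.getClass().getName()); 
  } 
} 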
SimpleLocalLs.java Example 
9 
import org.apache.hadoop.conf.Configuration; 
import org.apache.hadoop.fs.FileStatus; 
import org.apache.hadoop.fs.FileSystem; 
import org.apache.hadoop.fs.Path; 

public class SimpleLocalLs { 
  public static void main(String[] args) throws Exception { 
    // Read the location from the command line arguments; default to "/" 
    Path path = new Path("/"); 
    if (args.length == 1) { 
      path = new Path(args[0]); 
    } 
    // Acquire a FileSystem instance 
    Configuration conf = new Configuration(); 
    FileSystem fs = FileSystem.get(conf); 
    // List the files and directories under the provided path 
    FileStatus[] files = fs.listStatus(path); 
    // Print each sub-directory or file to the screen 
    for (FileStatus file : files) { 
      System.out.println(file.getPath().getName()); 
    } 
  } 
}
FileSystem API: Path 
• Hadoop's Path object represents a file or a 
directory 
– Not java.io.File, which is tightly coupled to the local filesystem 
• Path is really a URI on the FileSystem 
– HDFS: hdfs://localhost/user/file1 
– Local: file:///user/file1 
• Examples: 
– new Path("/test/file1.txt"); 
– new Path("hdfs://localhost:9000/test/file1.txt"); 
10 
Hadoop's Configuration Object 
• Configuration object stores clients’ and 
servers’ configuration 
– Very heavily used in Hadoop 
• HDFS, MapReduce, HBase, etc... 
• Simple key-value paradigm 
– Wrapper for java.util.Properties class which itself is just a 
wrapper for java.util.Hashtable 
• Several construction options 
– Configuration conf1 = new Configuration(); 
– Configuration conf2 = new Configuration(conf1); 
• Configuration object conf2 is seeded with configurations of 
conf1 object 
11
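A minimal sketch of the copy constructor (the key "my.prop" is made up for illustration); conf2 starts with conf1's entries, but the two then diverge independently: 

Configuration conf1 = new Configuration(); 
conf1.set("my.prop", "original");          // "my.prop" is an illustrative key 
Configuration conf2 = new Configuration(conf1); 
conf2.set("my.prop", "changed"); 
System.out.println(conf1.get("my.prop"));  // prints: original 
System.out.println(conf2.get("my.prop"));  // prints: changed 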
Hadoop's Configuration Object 
• Getting properties is simple! 
• Get the property 
– String nnName = conf.get("fs.default.name"); 
• returns null if the property doesn't exist 
• Get the property, and if it doesn't exist, return the provided default 
– String nnName = conf.get("fs.default.name", "hdfs://localhost:9000"); 
• There are also typed versions of these methods: 
– getBoolean, getInt, getFloat, etc... 
– Example: int prop = conf.getInt("file.size", 0); (the typed getters require a default) 
12 
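A minimal sketch of the typed getters (the keys are illustrative); each takes a default that is returned when the key is absent: 

Configuration conf = new Configuration(); 
conf.setInt("file.size", 1024); 
int size = conf.getInt("file.size", 0);             // 1024 
int missing = conf.getInt("does.not.exist", -1);    // -1, key is absent 
boolean flag = conf.getBoolean("some.flag", false); // false, key is absent 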
Hadoop's Configuration Object 
13 
• Usually seeded via configuration files that are read from the CLASSPATH (files like conf/core-site.xml and conf/hdfs-site.xml): 
Configuration conf = new Configuration(); 
conf.addResource(new Path(HADOOP_HOME + "/conf/core-site.xml")); 
• Must comply with the Configuration XML 
schema, ex: 
<?xml version="1.0"?> 
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?> 
<configuration> 
<property> 
<name>fs.default.name</name> 
<value>hdfs://localhost:9000</value> 
</property> 
</configuration>
Hadoop Configuration Object 
• conf.addResource() - the parameters are either String or Path 
– conf.addResource("hdfs-site.xml") 
• CLASSPATH is referenced when a String parameter is provided 
– conf.addResource(new Path("/exact/location/file.xml")) 
• Path points to an exact location on the local file system 
• By default Hadoop loads 
– core-default.xml 
• Located at hadoop-common-X.X.X.jar/core-default.xml 
– core-site.xml 
• Looks for these configuration files on the CLASSPATH 
14 
LoadConfigurations.java 
Example 
15 
public class LoadConfigurations { 
  private final static String PROP_NAME = "fs.default.name"; 
  public static void main(String[] args) { 
    // 1. Print the property with an empty Configuration 
    Configuration conf = new Configuration(); 
    System.out.println("After construction: " + conf.get(PROP_NAME)); 
    // 2. Add properties from core-site.xml 
    conf.addResource(new Path(Vars.HADOOP_HOME + "/conf/core-site.xml")); 
    System.out.println("After addResource: " + conf.get(PROP_NAME)); 
    // 3. Manually set the property 
    conf.set(PROP_NAME, "hdfs://localhost:8111"); 
    System.out.println("After set: " + conf.get(PROP_NAME)); 
  } 
}
Run LoadConfigurations 
16 
$ java -cp $PLAY_AREA/HadoopSamples.jar:$HADOOP_HOME/share/hadoop/common/hadoop-common-2.0.0-cdh4.0.0.jar:$HADOOP_HOME/share/hadoop/common/lib/* hdfs.LoadConfigurations 
After construction: file:/// 
After addResource: hdfs://localhost:9000 
After set: hdfs://localhost:8111 
1. Print the property with an empty Configuration 
2. Add properties from core-site.xml 
3. Manually set the property 
FileSystem API 
• Recall FileSystem is a generic abstract class 
used to interface with a file system 
• FileSystem class also serves as a factory for 
concrete implementations, with the 
following methods 
– public static FileSystem get(Configuration conf) 
• Will use information from Configuration such as scheme 
and authority 
• Recall Hadoop loads conf/core-site.xml by default 
• core-site.xml typically sets the fs.default.name property to something like hdfs://localhost:8020 
– org.apache.hadoop.hdfs.DistributedFileSystem will be used by 
default 
– Otherwise known as HDFS 
17
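A minimal sketch of the factory in action; getLocal() always returns the local implementation, while get(conf) is driven by fs.default.name: 

Configuration conf = new Configuration(); 
FileSystem fs = FileSystem.get(conf);              // implementation chosen by fs.default.name 
LocalFileSystem local = FileSystem.getLocal(conf); // always the local file system 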
Simple List Example 
18 
public class SimpleLocalLs { 
  public static void main(String[] args) throws Exception { 
    Path path = new Path("/"); 
    if (args.length == 1) { 
      path = new Path(args[0]); 
    } 
    Configuration conf = new Configuration(); 
    FileSystem fs = FileSystem.get(conf); 
    FileStatus[] files = fs.listStatus(path); 
    for (FileStatus file : files) { 
      System.out.println(file.getPath().getName()); 
    } 
  } 
}
What happens when you run this code? 
Execute Simple List Example 
• The Answer is.... it depends 
19 
$ java -cp $PLAY_AREA/HadoopSamples.jar:$HADOOP_HOME/share/hadoop/common/hadoop-common-2.0.0-cdh4.0.0.jar:$HADOOP_HOME/share/hadoop/common/lib/* hdfs.SimpleLs 
lib 
... 
... 
var 
sbin 
etc 
$ yarn jar $PLAY_AREA/HadoopSamples.jar hdfs.SimpleLs 
hbase 
lost+found 
test1 
tmp 
training 
user 
● Uses the java command, not yarn 
● core-site.xml and core-default.xml are not on the CLASSPATH 
● properties are then NOT added to the Configuration object 
● default FileSystem is loaded => local file system 
● The yarn script will place core-default.xml and core-site.xml on the CLASSPATH 
● Properties within those files are added to the Configuration object 
● HDFS is utilized, since it was specified in core-site.xml
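If you must launch with the plain java command, one way to still get HDFS is to set the property in code before calling the factory; a minimal sketch (the address is an assumption - match it to your core-site.xml): 

Configuration conf = new Configuration(); 
conf.set("fs.default.name", "hdfs://localhost:8020"); // assumed address 
FileSystem fs = FileSystem.get(conf);                 // now DistributedFileSystem 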
Reading Data from HDFS 
1. Create FileSystem 
2. Open InputStream to a Path 
3. Copy bytes using IOUtils 
4. Close Stream 
20 
1: Create FileSystem 
• FileSystem fs = FileSystem.get(new 
Configuration()); 
– If you run with the yarn command, a DistributedFileSystem (HDFS) instance will be created 
• Utilizes the fs.default.name property from the configuration 
• Recall that the Hadoop framework loads core-site.xml, which sets the property to HDFS (hdfs://localhost:8020) 
21
2: Open Input Stream to a Path 
• fs.open returns 
org.apache.hadoop.fs.FSDataInputStream 
– Other FileSystem implementations return their own custom implementations of InputStream 
• Opens stream with a default buffer of 4k 
• If you want to provide your own buffer size 
use 
– fs.open(Path f, int bufferSize) 
22 
... 
InputStream input = null; 
try { 
input = fs.open(fileToRead); 
... 
3: Copy bytes using IOUtils 
• Copy bytes from InputStream to 
OutputStream 
• Hadoop’s IOUtils makes the task simple 
– buffer parameter specifies number of bytes to buffer at a 
time 
23 
IOUtils.copyBytes(inputStream, outputStream, buffer);
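There is also an overloaded version that takes a close flag, used later in the SeekReadFile example; passing false leaves both streams open so they can be reused: 

IOUtils.copyBytes(inputStream, outputStream, 4096, false); // copy but keep streams open 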
4: Close Stream 
• Utilize IOUtils to avoid boilerplate code that catches IOException 
24 
... 
} finally { 
IOUtils.closeStream(input); 
... 
ReadFile.java Example 
25 
import java.io.IOException; 
import java.io.InputStream; 
import org.apache.hadoop.conf.Configuration; 
import org.apache.hadoop.fs.FileSystem; 
import org.apache.hadoop.fs.Path; 
import org.apache.hadoop.io.IOUtils; 

public class ReadFile { 
  public static void main(String[] args) throws IOException { 
    Path fileToRead = new Path("/training/data/readMe.txt"); 
    // 1: Open FileSystem 
    FileSystem fs = FileSystem.get(new Configuration()); 
    InputStream input = null; 
    try { 
      // 2: Open InputStream 
      input = fs.open(fileToRead); 
      // 3: Copy from input to output stream 
      IOUtils.copyBytes(input, System.out, 4096); 
    } finally { 
      // 4: Close stream 
      IOUtils.closeStream(input); 
    } 
  } 
}
$ yarn jar $PLAY_AREA/HadoopSamples.jar hdfs.ReadFile 
Hello from readme.txt
Reading Data - Seek 
• FileSystem.open returns 
FSDataInputStream 
– Extension of java.io.DataInputStream 
– Supports random access and reading via interfaces: 
• PositionedReadable : read chunks of the stream 
• Seekable : seek to a particular position in the stream 
26 
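A minimal sketch of the PositionedReadable side (the path is an assumption): read a chunk at an absolute offset without disturbing the stream's current position: 

FileSystem fs = FileSystem.get(new Configuration()); 
FSDataInputStream in = fs.open(new Path("/training/data/readMe.txt")); 
byte[] buf = new byte[10]; 
int n = in.read(11, buf, 0, buf.length);  // pread: up to 10 bytes at absolute offset 11 
System.out.println(new String(buf, 0, n)); 
System.out.println(in.getPos());          // still 0 - the stream position is unchanged 
IOUtils.closeStream(in); 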
Seeking to a Position 
• FSDataInputStream implements Seekable 
interface 
– void seek(long pos) throws IOException 
• Seek to a particular position in the file 
• Next read will begin at that position 
• If you attempt to seek past the end of the file, an IOException is thrown 
• Somewhat expensive operation – strive for streaming rather than seeking 
– long getPos() throws IOException 
• Returns the current position/offset from the beginning of 
the stream/file 
27
SeekReadFile.java Example 
28 
import java.io.IOException; 
import org.apache.hadoop.conf.Configuration; 
import org.apache.hadoop.fs.FSDataInputStream; 
import org.apache.hadoop.fs.FileSystem; 
import org.apache.hadoop.fs.Path; 
import org.apache.hadoop.io.IOUtils; 

public class SeekReadFile { 
  public static void main(String[] args) throws IOException { 
    Path fileToRead = new Path("/training/data/readMe.txt"); 
    FileSystem fs = FileSystem.get(new Configuration()); 
    FSDataInputStream input = null; 
    try { 
      input = fs.open(fileToRead); 
      // Start at position 0 
      System.out.print("start position=" + input.getPos() + ": "); 
      IOUtils.copyBytes(input, System.out, 4096, false); 
      // Seek to position 11 
      input.seek(11); 
      System.out.print("start position=" + input.getPos() + ": "); 
      IOUtils.copyBytes(input, System.out, 4096, false); 
      // Seek back to 0 
      input.seek(0); 
      System.out.print("start position=" + input.getPos() + ": "); 
      IOUtils.copyBytes(input, System.out, 4096, false); 
    } finally { 
      IOUtils.closeStream(input); 
    } 
  } 
}
Run SeekReadFile Example 
29 
$ yarn jar $PLAY_AREA/HadoopSamples.jar hdfs.SeekReadFile 
start position=0: Hello from readme.txt 
start position=11: readme.txt 
start position=0: Hello from readme.txt
Write Data 
1. Create FileSystem instance 
2. Open OutputStream 
– FSDataOutputStream in this case 
– Open a stream directly to a Path from FileSystem 
– Creates all needed directories on the provided path 
3. Copy data using IOUtils 
30 
WriteToFile.java Example 
31 
import java.io.BufferedInputStream; 
import java.io.ByteArrayInputStream; 
import java.io.IOException; 
import java.io.InputStream; 
import org.apache.hadoop.conf.Configuration; 
import org.apache.hadoop.fs.FSDataOutputStream; 
import org.apache.hadoop.fs.FileSystem; 
import org.apache.hadoop.fs.Path; 
import org.apache.hadoop.io.IOUtils; 

public class WriteToFile { 
  public static void main(String[] args) throws IOException { 
    String textToWrite = "Hello HDFS! Elephants are awesome!\n"; 
    InputStream in = new BufferedInputStream( 
        new ByteArrayInputStream(textToWrite.getBytes())); 
    Path toHdfs = new Path("/training/playArea/writeMe.txt"); 
    // 1: Create FileSystem instance 
    Configuration conf = new Configuration(); 
    FileSystem fs = FileSystem.get(conf); 
    // 2: Open OutputStream 
    FSDataOutputStream out = fs.create(toHdfs); 
    // 3: Copy data 
    IOUtils.copyBytes(in, out, conf); 
  } 
}
Run WriteToFile 
32 
$ yarn jar $PLAY_AREA/HadoopSamples.jar hdfs.WriteToFile 
$ hdfs dfs -cat /training/playArea/writeMe.txt 
Hello HDFS! Elephants are awesome! 
FileSystem: Writing Data 
• Append to the end of an existing file 
– Optionally supported by each concrete FileSystem 
– HDFS supports it (see the sketch below) 
• No support for writing in the middle of the file 
33 
fs.append(path)
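A minimal sketch of appending (the path is an assumption; the file must already exist, and the cluster must have append support enabled): 

FileSystem fs = FileSystem.get(new Configuration()); 
FSDataOutputStream out = fs.append(new Path("/training/playArea/writeMe.txt")); 
out.write("One more line\n".getBytes()); // bytes land at the end of the file 
out.close(); 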
FileSystem: Writing Data 
• FileSystem's create and append methods have overloaded versions that take a callback interface to notify the client of progress 
34 
FileSystem fs = FileSystem.get(conf); 
FSDataOutputStream out = fs.create(toHdfs, new Progressable() { 
  @Override 
  public void progress() { 
    // Report progress to the screen 
    System.out.print(".."); 
  } 
}); 
Overwrite Flag 
• Recall that FileSystem's create(Path) creates all the directories on the provided path 
– create(new Path("/doesnt_exist/doesnt_exist/file.txt")) 
– can be dangerous; if you want to protect yourself, utilize the following overloaded method: 
35 
public FSDataOutputStream create(Path f, boolean overwrite) 
Set to false to make sure you do 
not overwrite important data
Overwrite Flag Example 
36 
Path toHdfs = new Path("/training/playArea/writeMe.txt"); 
FileSystem fs = FileSystem.get(conf); 
FSDataOutputStream out = fs.create(toHdfs, false); 
Set to false to make sure you do 
not overwrite important data 
$ yarn jar $PLAY_AREA/HadoopSamples.jar hdfs.BadWriteToFile 
Exception in thread "main" 
org.apache.hadoop.ipc.RemoteException: java.io.IOException: 
failed to create file 
/training/playArea/anotherSubDir/writeMe.txt on client 
127.0.0.1 either because the filename is invalid or the file 
exists 
... 
... 
Error indicates that the file already exists 
Copy/Move from and to Local 
FileSystem 
• Higher level abstractions that allow you to 
copy and move from and to HDFS 
– copyFromLocalFile 
– moveFromLocalFile 
– copyToLocalFile 
– moveToLocalFile 
37
Copy from Local to HDFS 
38 
FileSystem fs = FileSystem.get(new Configuration()); 
Path fromLocal = new Path("/home/hadoop/Training/exercises/sample_data/hamlet.txt"); 
Path toHdfs = new Path("/training/playArea/hamlet.txt"); 
fs.copyFromLocalFile(fromLocal, toHdfs); 
Copy file from local file 
system to HDFS 
Empty directory 
before copy 
$ hdfs dfs -ls /training/playArea/ 
$ yarn jar $PLAY_AREA/HadoopSamples.jar hdfs.CopyToHdfs 
File was copied 
$ hdfs dfs -ls /training/playArea/ 
Found 1 items 
-rw-r--r-- hadoop supergroup /training/playArea/hamlet.txt 
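Going the other direction is symmetric; a minimal sketch of copyToLocalFile (both paths are assumptions): 

FileSystem fs = FileSystem.get(new Configuration()); 
Path fromHdfs = new Path("/training/playArea/hamlet.txt"); 
Path toLocal = new Path("/tmp/hamlet.txt"); 
fs.copyToLocalFile(fromHdfs, toLocal); 
// moveToLocalFile(fromHdfs, toLocal) would also remove the HDFS copy 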
Delete Data 
39 
FileSystem.delete(Path path, boolean recursive) 
If recursive == true then a non-empty directory will be deleted, otherwise an IOException is thrown 
Path toDelete = new Path("/training/playArea/writeMe.txt"); 
boolean isDeleted = fs.delete(toDelete, false); 
System.out.println("Deleted: " + isDeleted); 
$ yarn jar $PLAY_AREA/HadoopSamples.jar hdfs.DeleteFile 
Deleted: true 
$ yarn jar $PLAY_AREA/HadoopSamples.jar hdfs.DeleteFile 
Deleted: false 
File was already deleted by 
previous run
FileSystem: mkdirs 
• Create a directory - will create all the parent 
directories 
40 
Configuration conf = new Configuration(); 
Path newDir = new Path("/training/playArea/newDir"); 
FileSystem fs = FileSystem.get(conf); 
boolean created = fs.mkdirs(newDir); 
System.out.println(created); 
$ yarn jar $PLAY_AREA/HadoopSamples.jar hdfs.MkDir 
true 
FileSystem: listStatus 
41 
Browse the FileSystem with listStatus() 
methods 
Path path = new Path("/"); 
Configuration conf = new Configuration(); 
FileSystem fs = FileSystem.get(conf); 
FileStatus[] files = fs.listStatus(path); 
for (FileStatus file : files) { 
  System.out.println(file.getPath().getName()); 
} 
$ yarn jar $PLAY_AREA/HadoopSamples.jar hdfs.SimpleLs 
training 
user 
List files under "/" for HDFS
LsWithPathFilter.java example 
42 
FileSystem fs = FileSystem.get(conf); 
// Restrict the result of listStatus() by supplying a PathFilter object. 
// Here: do not show paths whose name equals "user". 
FileStatus[] files = fs.listStatus(path, new PathFilter() { 
  @Override 
  public boolean accept(Path path) { 
    if (path.getName().equals("user")) { 
      return false; 
    } 
    return true; 
  } 
}); 
for (FileStatus file : files) { 
  System.out.println(file.getPath().getName()); 
} 
Run LsWithPathFilter 
Example 
43 
$ yarn jar $PLAY_AREA/HadoopSamples.jar hdfs.SimpleLs 
training 
user 
$ yarn jar $PLAY_AREA/HadoopSamples.jar hdfs.LsWithPathFilter 
training
FileSystem: Globbing 
• FileSystem supports file name pattern 
matching via globStatus() methods 
• Good for traversing through a sub-set of 
files by using a pattern 
• Support is similar to bash glob: *, ?, etc... 
44 
SimpleGlobbing.java 
45 
public class SimpleGlobbing { 
  public static void main(String[] args) throws IOException { 
    // Read the glob from the command line 
    Path glob = new Path(args[0]); 
    FileSystem fs = FileSystem.get(new Configuration()); 
    // Similar usage to the listStatus method 
    FileStatus[] files = fs.globStatus(glob); 
    for (FileStatus file : files) { 
      System.out.println(file.getPath().getName()); 
    } 
  } 
}
Run SimpleGlobbing 
46 
$ hdfs dfs -ls /training/data/glob/ 
Found 4 items 
drwxr-xr-x - hadoop supergroup 0 2011-12-24 11:20 /training/data/glob/2007 
drwxr-xr-x - hadoop supergroup 0 2011-12-24 11:20 /training/data/glob/2008 
drwxr-xr-x - hadoop supergroup 0 2011-12-24 11:21 /training/data/glob/2010 
drwxr-xr-x - hadoop supergroup 0 2011-12-24 11:21 /training/data/glob/2011 
$ yarn jar $PLAY_AREA/HadoopSamples.jar hdfs.SimpleGlobbing /training/data/glob/201* 
2010 
2011 
Usage of glob with * 
FileSystem: Globbing 
47 
Glob          Explanation 
?             Matches any single character 
*             Matches zero or more characters 
[abc]         Matches a single character from the character set {a,b,c} 
[a-b]         Matches a single character from the character range {a...b}. Note that character a must be lexicographically less than or equal to character b. 
[^a]          Matches a single character that is not from the character set or range {a}. Note that the ^ character must occur immediately to the right of the opening bracket. 
\c            Removes (escapes) any special meaning of character c 
{ab,cd}       Matches a string from the string set {ab, cd} 
{ab,c{de,fh}} Matches a string from the string set {ab, cde, cfh} 
Source: FileSystem.globStatus API documentation
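globStatus() also has an overload that takes a PathFilter, so a glob can be narrowed further; a minimal sketch (the pattern and filter logic are assumptions): 

FileSystem fs = FileSystem.get(new Configuration()); 
FileStatus[] files = fs.globStatus( 
    new Path("/training/data/glob/20*"), 
    new PathFilter() { 
      @Override 
      public boolean accept(Path p) { 
        return !p.getName().equals("2008"); // skip one year 
      } 
    }); 
for (FileStatus f : files) { 
  System.out.println(f.getPath().getName()); 
} 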
FileSystem 
• There are several methods that return ‘true’ 
for success and ‘false’ for failure 
– delete 
– rename 
– mkdirs 
• What to do if the method returns 'false'? 
– Check Namenode's log 
• Located at $HADOOP_LOG_DIR/ 
48 
BadRename.java 
49 
FileSystem fs = FileSystem.get(new Configuration()); 
Path source = new Path("/does/not/exist/file.txt"); 
Path nonExistentPath = new Path("/does/not/exist/file1.txt"); 
boolean result = fs.rename(source, nonExistentPath); 
System.out.println("Rename: " + result); 
$ yarn jar $PLAY_AREA/HadoopSamples.jar hdfs.BadRename 
Rename: false 
Namenode's log at $HADOOP_HOME/logs/hadoop-hadoop-namenode-hadoop-laptop.log 
2011-12-25 01:18:54,684 WARN org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.unprotectedRenameTo: failed to rename /does/not/exist/file.txt to /does/not/exist/file1.txt because source does not exist
© 2012 coreservlets.com and Dima May 
Wrap-Up 
Customized Java EE Training: http://courses.coreservlets.com/ 
Hadoop, Java, JSF 2, PrimeFaces, Servlets, JSP, Ajax, jQuery, Spring, Hibernate, RESTful Web Services, Android. 
Developed and taught by well-known author and developer. At public venues or onsite at your location. 
Summary 
• We learned about 
– HDFS API 
– How to use Configuration class 
– How to read from HDFS 
– How to write to HDFS 
– How to browse HDFS 
51
© 2012 coreservlets.com and Dima May 
Questions? 
JSF 2, PrimeFaces, Java 7, Ajax, jQuery, Hadoop, RESTful Web Services, Android, Spring, Hibernate, Servlets, JSP, GWT, and other Java EE training. 
Customized Java EE Training: http://courses.coreservlets.com/ 
Hadoop, Java, JSF 2, PrimeFaces, Servlets, JSP, Ajax, jQuery, Spring, Hibernate, RESTful Web Services, Android. 
Developed and taught by well-known author and developer. At public venues or onsite at your location.
