Data science bootcamp day2

Data Science - CCCS936 - Department of Computer Science, University of Kachchh.

Published in: Data & Analytics

  1. Data Science Bootcamp, Day 2
     Presented by: Chetan Khatri, Volunteer Teaching Assistant, Data Science Lab.
     Guidance by: Prof. Devji D. Chhanga, University of Kachchh.
  2. Agenda
     ● Understanding Git.
     ● Understanding Apache Maven.
     ● Hello World Java program with Apache Maven.
     ● Understanding Hadoop administrative commands.
     ● WordCount Hadoop program on a Hadoop cluster with Maven.
  3. Git with GitHub
     ● GitHub: a hosted repository service where you can store your source code and collaborate with your team members.
     ● Installation: sudo apt-get install git
     ● Steps to do:
        1. Create a repository.
        2. Clone: copy an existing repository (yours or someone else's) to your machine.
        3. Commit: record your staged changes in the local repository, ready to be submitted.
  4. Let's Have a Demo with Git
     ● Create a repository on GitHub named hadoopdemo.
     ● Clone the repository and move into its directory:
        git clone https://github.com/dskskv/hadoopdemo.git
        cd hadoopdemo
     ● Configure the name and email that Git records in your commits:
        git config --global user.email "you@example.com"
        git config --global user.name "Your Name"
     ● Stage an individual file:
        git add README.md
     ● Stage every changed file:
        git add .
  5. Let's Have a Demo with Git (continued)
     ● Commit the staged changes:
        git commit -m "Comment anything"
     ● Push (submit) your commits to the master branch of the GitHub repository:
        git push origin master
     ● Pull: fetch and merge the latest code from the repository, for example:
        git pull https://github.com/dskskv/hadoopdemo.git
     ● Git branches: separate lines of development within one repository, often used for phases of software development such as development, test and production. By convention, the master branch holds the main, up-to-date code.
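  6. Understanding Apache Maven
     Apache Maven is a build tool for Java. You declare other artifacts (JAR files written by someone else) as dependencies, and Maven downloads them and uses them to build your own JAR file. By default that JAR contains only your own classes; bundling the dependencies into it requires a plugin such as the Maven Shade or Assembly plugin.
     Maven workflow:
        1. Create the Maven project.
        2. Update the Maven project.
        3. Write the Java code.
        4. Maven clean.
        5. Maven build (builds your JAR file).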
  7. Understanding Hadoop Administrative Commands
     1. Clone the CCCS936 GitHub repository:
        git clone https://github.com/dskskv/CCCS936.git
     2. Start the Hadoop cluster (from the Hadoop installation directory):
        sbin/start-dfs.sh
        sbin/start-yarn.sh
     3. Check the Hadoop version:
        hadoop version
     4. List all the options of the hadoop command (run it without arguments):
        hadoop
     5. Create a directory named dskskv in HDFS:
        hadoop fs -mkdir /dskskv
  8. Understanding Hadoop Administrative Commands (continued)
     6. List the contents of the /dskskv directory in HDFS:
        hadoop fs -ls /dskskv
     7. Create a text file locally:
        gedit inputfile.txt
     8. Copy the text file into HDFS, which stores it as one or more blocks:
        hadoop fs -put inputfile.txt /dskskv
     9. Read the contents of the file stored in HDFS:
        hadoop fs -cat /dskskv/inputfile.txt
  9. Understanding Hadoop Administrative Commands (continued)
     10. The same operations can also be run with the hdfs dfs command. Only the older hadoop dfs form is deprecated; hadoop fs still works.
        hdfs dfs -mkdir /chetan
        hdfs dfs -put inputfile.txt /chetan
        hdfs dfs -cat /chetan/inputfile.txt
     11. Delete a file from HDFS:
        hadoop fs -rm /dskskv/inputfile.txt
     12. Delete a directory and its contents from HDFS:
        hadoop fs -rm -r /dskskv
  10. WordCount Hadoop Program on a Hadoop Cluster with Maven
     1) Log in as the Hadoop user:
        su hduser
     2) Start the Hadoop daemons:
        sbin/start-dfs.sh
        sbin/start-yarn.sh
     3) Check that all daemons are running. On a single-node setup, jps typically lists NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager.
        jps
     4) Create an input directory in HDFS. Note: make sure the Hadoop user has permission to access the directory you are working in on the console.
        hadoop fs -mkdir /input
     5) Copy the text file into HDFS:
        hadoop fs -put inputfile.txt /input
  11. WordCount Hadoop Program on a Hadoop Cluster with Maven (continued)
     6) Check that the file was transferred successfully:
        hadoop fs -ls /input
     7) Run the Hadoop job. Pass the program's executable JAR file, the HDFS input directory that holds the text file, and the HDFS output directory where the processed data should be stored. The output directory must not already exist, otherwise the job fails.
        hadoop jar WordCountDSKSKV-0.0.1-SNAPSHOT.jar /input /output
     8) Check that the output directory contains the processed files:
        hadoop fs -ls /output
     9) Read the job's output:
        hadoop fs -cat /output/part-r-00000
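The agenda lists a Hello World Java program built with Apache Maven, but the deck does not show its source. The class below is a minimal sketch of what such a program might look like. The class name HelloWorld and the greeting() helper are hypothetical. Maven compiles sources found under src/main/java by default, so the file would go at src/main/java/HelloWorld.java. A real project would usually also add a package declaration.

```java
// Hypothetical Hello World class for a Maven project.
// Place it at src/main/java/HelloWorld.java (Maven's default source folder).
public class HelloWorld {

    // Kept separate from main() so the message can be checked without capturing stdout.
    static String greeting() {
        return "Hello World";
    }

    public static void main(String[] args) {
        System.out.println(greeting());
    }
}
```

Running mvn clean package from the project directory (the "Maven clean" and "Maven build" steps of slide 6) produces target/<artifactId>-<version>.jar. The WordCountDSKSKV-0.0.1-SNAPSHOT.jar name on slide 11 appears to follow the same pattern. Running java -cp target/<artifactId>-<version>.jar HelloWorld then prints the greeting.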
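Slide 8 copies inputfile.txt into HDFS. HDFS splits every file into blocks and replicates each block across DataNodes; only the last block of a file can be smaller than the block size. The plain-Java arithmetic sketch below does not call any Hadoop API. It shows how many blocks a file occupies and how much raw disk space its replicas use, assuming the Hadoop 2.x/3.x defaults of a 128 MB block size (dfs.blocksize) and a replication factor of 3 (dfs.replication). A given cluster may be configured differently. The class name, method names and example file sizes are hypothetical.

```java
// Hypothetical helper: block arithmetic for a file stored in HDFS.
// Assumes default settings; real clusters may override dfs.blocksize and dfs.replication.
public class HdfsBlockMath {

    // Assumed default dfs.blocksize in Hadoop 2.x/3.x: 128 MB (Hadoop 1.x used 64 MB).
    static final long DEFAULT_BLOCK_SIZE = 128L * 1024 * 1024;

    // Assumed default dfs.replication.
    static final int DEFAULT_REPLICATION = 3;

    // Number of blocks = ceil(fileSize / blockSize); returns 0 for an empty file.
    static long blockCount(long fileSizeBytes, long blockSizeBytes) {
        if (fileSizeBytes <= 0) {
            return 0;
        }
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    // The last block is not padded, so raw usage is file size times replication factor.
    static long rawBytes(long fileSizeBytes, int replication) {
        return fileSizeBytes * replication;
    }

    public static void main(String[] args) {
        long small = 2L * 1024;            // hypothetical 2 KB inputfile.txt
        long large = 300L * 1024 * 1024;   // hypothetical 300 MB file
        System.out.println(blockCount(small, DEFAULT_BLOCK_SIZE)); // 1
        System.out.println(blockCount(large, DEFAULT_BLOCK_SIZE)); // 3
        System.out.println(rawBytes(large, DEFAULT_REPLICATION));  // 943718400
    }
}
```

Under these assumptions, a small text file fits in a single block. A 300 MB file needs three blocks: two full 128 MB blocks plus one 44 MB block. On a real cluster, hdfs fsck /dskskv/inputfile.txt -files -blocks reports the blocks a file actually uses.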
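Slides 10 and 11 run a WordCount JAR whose source (WordCountDSKSKV) is not shown in the deck. The sketch below is a plain-Java illustration of the data flow such a job usually follows, with no Hadoop dependency:

* The map step emits a (word, 1) pair for each whitespace-separated token.
* The shuffle groups the pairs by word.
* The reduce step sums the counts.

The class and method names are hypothetical. A real Hadoop job would instead implement Mapper and Reducer classes from org.apache.hadoop.mapreduce and let the framework perform the shuffle across the cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java illustration of the WordCount data flow (not the Hadoop API).
public class WordCountSketch {

    // Map step: split each line on whitespace and emit a (word, 1) pair per token.
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            for (String token : line.trim().split("\\s+")) {
                if (!token.isEmpty()) { // an empty line yields a single "" token; skip it
                    pairs.add(Map.entry(token, 1));
                }
            }
        }
        return pairs;
    }

    // Shuffle + reduce: group pairs by word in a sorted map and sum the counts per word.
    static TreeMap<String, Integer> shuffleAndReduce(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    // Format each result as "word<TAB>count", one line per word, in key order.
    static List<String> format(Map<String, Integer> counts) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            out.add(e.getKey() + "\t" + e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = List.of("hello hadoop", "hello maven");
        format(shuffleAndReduce(map(input))).forEach(System.out::println);
    }
}
```

For the two-line input in main(), the program prints hadoop 1, hello 2 and maven 1, with a tab between each word and its count. This resembles the tab-separated word and count lines that hadoop fs -cat /output/part-r-00000 typically shows for a WordCount job with a single reducer, where keys come out sorted. The TreeMap is used here only to imitate that sorted output.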