EN.605.788 Assignment 3

Description

In this week’s module, we learned how to set up our development environment for Hadoop and how to access HDFS using the Java API. For this assignment, please do the following:

  • Create a Java command-line class named ParallelLocalToHdfsCopy. This class should be part of the bdpuh.hw2 package and should contain the main() method.

  • The program should take 3 arguments: the first is an absolute directory path on the local filesystem, the second is an absolute directory path in the HDFS filesystem, and the third is the number of threads to use for copying.
  • The program should check that the local source directory exists. If it does not, it should print the error message “Source directory does not exist” and then exit.
  • The program should check that the HDFS destination directory does not already exist. If it does, it should print the error message “Destination directory already exists. Please delete before running the program” and then exit. If the HDFS destination directory does not exist, the program should create it.
  • The program should start copying files from the source local directory to the target HDFS directory in parallel.
  • As the program copies files into HDFS, it should compress each file in gzip format (.gz).
  • Assume that there are no subdirectories in the source directory. If any are present, they can be ignored.
  • Create 10 files in the source directory and use them to test your code.
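The steps above can be illustrated with a minimal, JDK-only sketch of the validation and parallel gzip copy logic. To keep it runnable without a Hadoop cluster, a local directory stands in for the HDFS destination, and the class name `ParallelGzipCopySketch` and method `copyAll` are hypothetical. In the actual ParallelLocalToHdfsCopy class, the destination calls (`Files.exists`, `Files.createDirectories`, `Files.newOutputStream`) would presumably be replaced by their counterparts on an `org.apache.hadoop.fs.FileSystem` instance, such as `exists(Path)`, `mkdirs(Path)`, and `create(Path)`; treat that mapping as an outline, not a tested HDFS implementation.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: a local directory stands in for the HDFS destination.
public class ParallelGzipCopySketch {

    // Copies every regular file in src to dst as <name>.gz using `threads` worker threads.
    // Subdirectories are skipped. Returns the number of files copied.
    public static int copyAll(Path src, Path dst, int threads)
            throws IOException, InterruptedException {
        if (threads < 1) {
            throw new IllegalArgumentException("Number of threads must be positive");
        }
        if (!Files.isDirectory(src)) {
            throw new IllegalArgumentException("Source directory does not exist");
        }
        if (Files.exists(dst)) {
            throw new IllegalArgumentException(
                "Destination directory already exists. Please delete before running the program");
        }
        Files.createDirectories(dst); // HDFS version: FileSystem.mkdirs(Path)

        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(src)) {
            for (Path p : entries) {
                if (Files.isRegularFile(p)) {
                    files.add(p);
                }
            }
        }

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> results = new ArrayList<>();
            for (Path file : files) {
                // Each task compresses one file on its own, so no shared state needs locking.
                results.add(pool.submit(() -> {
                    Path target = dst.resolve(file.getFileName() + ".gz");
                    try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(target))) {
                        Files.copy(file, out); // stream the source bytes through the gzip encoder
                    }
                    return null;
                }));
            }
            for (Future<?> f : results) {
                f.get(); // waits for the task and rethrows any worker failure
            }
        } catch (ExecutionException e) {
            throw new IOException("Copy failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return files.size();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("Usage: ParallelGzipCopySketch <localSrcDir> <destDir> <numThreads>");
            System.exit(1);
        }
        try {
            int n = copyAll(Paths.get(args[0]), Paths.get(args[1]), Integer.parseInt(args[2]));
            System.out.println("Copied " + n + " files");
        } catch (IllegalArgumentException e) {
            System.err.println(e.getMessage()); // print the error message, then exit
            System.exit(1);
        }
    }
}
```

Because each file is handled by its own task on a fixed-size pool, the third argument bounds how many files are being compressed at once, and calling `get()` on every `Future` surfaces a failure in any worker back in the main thread.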

What to turn in?

  • A JAR file that can be run from the command line
  • A zip of NetBeans or Eclipse project
  • A sample run