Hi friends, we all know that there is a lot of buzz around Hadoop in the market lately. According to a Gartner report, Hadoop may bring 4.5 million jobs in the coming years. After reading all this, I thought, why not create a tutorial on Hadoop? So I started a series of posts on codingdevil.com related to Hadoop. Those of you who are new to this technology and don't know what Hadoop is, or how to set it up on a single PC, can refer to chapter 1.
Now, what the heck is MapReduce?
MapReduce is a parallel programming paradigm that performs computations on data in parallel through the use of mappers and reducers. Here, mapper and reducer stand for the tasks that perform the mapping and reducing functions respectively. A mapper takes a set of data in the form of (key, value) pairs and converts it into another set of (key, value) pairs.
(k1,v1) -> (k2,v2)
The reducer takes the output produced by the mappers, grouped by key, and performs a computation (like aggregation) on it.
(k2,v2) -> (k2,f[v2])
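To make the two shapes above concrete, here is a minimal sketch in plain Python. It runs on a single machine and does not use Hadoop at all. The order data, the function names (`mapper`, `reducer`, `run_map_reduce`) and the choice of a sum as f are all assumptions made up for illustration.

```python
from collections import defaultdict


def mapper(key, value):
    # (k1, v1) -> (k2, v2): k1 is a made-up order id, v1 is a (city, amount) pair.
    city, amount = value
    yield city, amount


def reducer(key, values):
    # (k2, all v2 for that key) -> (k2, f[v2]); f is a sum in this sketch.
    return key, sum(values)


def run_map_reduce(records, mapper, reducer):
    # Group every v2 emitted by the mapper under its k2, then reduce each group.
    grouped = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in mapper(k1, v1):
            grouped[k2].append(v2)
    return dict(reducer(k2, v2s) for k2, v2s in grouped.items())


# Hypothetical input records, invented for this example.
orders = [(1, ("Pune", 40)), (2, ("Delhi", 25)), (3, ("Pune", 10))]
totals = run_map_reduce(orders, mapper, reducer)
print(totals)
```

In real Hadoop the grouping step is done by the framework across many machines; here it is just a dictionary.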
To make the concept clear, let's take the example of running word count on Hadoop. The MapReduce job takes the data blocks from the datanodes as directed by the namenode. Each mapper then counts the frequencies of the words in the subset of the documents it received from the datanode and generates its output. This output is then shuffled to the reducers, so that all the counts for the same word end up at the same reducer. Each reducer takes these outputs and performs aggregation on them, merging them into final totals. In the end we receive the frequencies of the words appearing in the input file.
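The word count flow described above can be sketched in plain Python as well. This is only a local simulation under a few assumptions: each string in `blocks` stands in for one data block, and `map_block`, `shuffle`, `reduce_counts` and `word_count` are hypothetical helper names, not Hadoop APIs.

```python
from collections import Counter, defaultdict


def map_block(block):
    # Mapper: count word frequencies inside a single block of text.
    return list(Counter(block.lower().split()).items())


def shuffle(mapped_outputs):
    # Shuffle: collect the partial counts of each word from all mappers.
    grouped = defaultdict(list)
    for output in mapped_outputs:
        for word, count in output:
            grouped[word].append(count)
    return grouped


def reduce_counts(word, counts):
    # Reducer: aggregate the partial counts into one total per word.
    return word, sum(counts)


def word_count(blocks):
    mapped = [map_block(block) for block in blocks]
    grouped = shuffle(mapped)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


# Two made-up "blocks" standing in for pieces of an input file.
blocks = ["hadoop stores data", "hadoop processes data in parallel"]
result = word_count(blocks)
print(result)
```

Each call to `map_block` is independent of the others, which is what lets a real cluster run the mappers on different machines at the same time.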
- This chapter includes the installation steps for Hadoop on a single node and on multiple nodes. Please download the PDF attached at the start of the post to get the quick installation steps.
- Before we begin to learn MapReduce, we need to set up Eclipse for developing MapReduce applications.
- This chapter helps to simplify the later chapters mentioned below.