31.3.12

A Pyrrhic methodology for cleaning out Hadoop in pseudo-distributed mode.

Every once in a while, Hadoop goes totally haywire when I play with it in pseudo-distributed mode.

Problems include:

1) Data not being replicated to the data nodes (i.e. you do a namenode format, and the data nodes are now out of sync).

2) No connection available (i.e. Hadoop keeps trying to connect to localhost:9000 and failing; see the config note after this list).

3) Permissions problems or other cryptic exceptions.
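(For reference: that localhost:9000 address isn't magic. In the standard pseudo-distributed setup it comes from fs.default.name in conf/core-site.xml; the snippet below is the stock value from the Hadoop 1.x quickstart docs, so yours may differ.)

    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>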

The solution is simple: pseudo-distributed mode, by default, writes everything under /tmp (which aliases to /private/tmp on OS X; specifically, hadoop.tmp.dir defaults to /tmp/hadoop-${user.name}). Thus, to clean up your pseudo-distributed Hadoop DFS, you can simply:

1) Run stop-all.sh (or stop all the Hadoop daemons in some other manner).

2) Remove everything in /tmp (careful here: I'm assuming you don't have anything important in /tmp; if you do, just remove the things that look Hadoop-related, e.g. the /tmp/hadoop-* directories).

3) Run hadoop namenode -format: this formats the namenode, starting things over from scratch.

4) Fire up the rest of the "cluster" by running start-all.sh.
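Putting those four steps together, here's a minimal sketch of the whole reset as one shell script. The /tmp/hadoop-"$USER"* glob is my assumption; it presumes hadoop.tmp.dir is still at its stock default and that Hadoop's bin/ scripts are on your PATH:

    #!/bin/sh
    # Reset a pseudo-distributed Hadoop install from scratch.
    # Assumes hadoop.tmp.dir is at its default of /tmp/hadoop-${user.name}
    # and that stop-all.sh / start-all.sh are on the PATH.

    stop-all.sh                      # 1) stop all Hadoop daemons
    rm -rf /tmp/hadoop-"$USER"*      # 2) remove the Hadoop dirs under /tmp
    hadoop namenode -format          # 3) reformat the namenode
    start-all.sh                     # 4) bring the "cluster" back up

One nicety: because step 2 deletes the old name directory, the -format step won't stop to ask for the usual "Re-format filesystem?" confirmation.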
