Ubuntu: How to use Amazon instances' public IPs in Hadoop configuration?


I am trying to configure Hadoop using the public IPs of my Amazon instances instead of their internal network IPs, because my aim is to create a hybrid cluster, i.e. a cloud + local machine cluster. Although all the SSH settings are fine, Hadoop cannot connect when the Amazon public IPs are used (the datanodes cannot find the namenode). I used an Amazon instance's public IP in the ZooKeeper configuration of HBase and it connected properly. So why does HBase connect when Hadoop doesn't?

I had the same problem with Kafka too.


I have found the answer.

The trick is to have no entries for the Amazon EC2 instances in the '/etc/hosts' file. Also, for Amazon instances the '~/.ssh/config' file should contain only the following settings:

Host ec2-x-x-x-x.compute-1.amazonaws.com
    StrictHostKeyChecking no
    IdentityFile /path to private key

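Here ec2-x-x-x-x.compute-1.amazonaws.com is the public DNS name of the Amazon instance (the x-x-x-x part is its public IP). As far as I understand, Amazon's DNS resolves this name to the instance's private IP from inside EC2 and to its public IP from outside, which is probably why using the name without any '/etc/hosts' override works for both the cloud and the local machines.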
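
To check the '/etc/hosts' part quickly, something like the following Python sketch could help. It is only an illustration and not part of the fix itself: the function name find_ec2_entries and the markers parameter are my own inventions, and it assumes the EC2 entries can be recognised by the substring "amazonaws.com" (or by whatever IPs or names you pass in). Entries that map an EC2 IP to a custom alias will only be found if you pass that IP or alias as a marker.

```python
def find_ec2_entries(hosts_text, markers=("amazonaws.com",)):
    """Return (line_number, entry) pairs for hosts-file entries that mention a marker.

    hosts_text: the full text of a hosts file (e.g. the contents of /etc/hosts).
    markers:    substrings to look for in the IP or host name fields (hypothetical
                default; pass your own public IPs or aliases as needed).
    """
    hits = []
    for lineno, raw in enumerate(hosts_text.splitlines(), start=1):
        # Drop trailing comments and surrounding whitespace.
        entry = raw.split("#", 1)[0].strip()
        if not entry:
            continue  # blank line or comment-only line
        # A hosts entry is an IP address followed by one or more host names.
        fields = entry.split()
        if any(marker in field for field in fields for marker in markers):
            hits.append((lineno, entry))
    return hits
```

Reading the file with open('/etc/hosts').read() and passing the text to find_ec2_entries should give an empty list on every node once the EC2 entries have been removed. Note that the match is a plain substring test, so a marker like 10.0.0.5 would also match 10.0.0.50.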
