{"id":8,"date":"2014-06-02T09:47:00","date_gmt":"2014-06-02T13:47:00","guid":{"rendered":"https:\/\/mberlove.com\/blog\/?p=8"},"modified":"2014-10-08T11:24:30","modified_gmt":"2014-10-08T15:24:30","slug":"little-lessons-in-hadoop","status":"publish","type":"post","link":"https:\/\/mberlove.com\/blog\/little-lessons-in-hadoop\/","title":{"rendered":"Little Lessons in Hadoop"},"content":{"rendered":"<p>Hadoop is notoriously under-documented, as I recently discovered. I am using Hadoop in my summer research position, and have launched myself into the wonderful and aggravating world of servers and open-source map-reduce programs. And one of the fun aspects of releasing open-source software, I suppose, is that no one can complain if you leave it largely undocumented.<\/p>\n<p>However, this does make installing and running Hadoop a rather harrowing experience for the uninitiated. But hands-on learning is the best way! And there are some pretty good, if often incomplete or outdated, tutorials out there, including&nbsp;<a href=\"http:\/\/www.michael-noll.com\/tutorials\/running-hadoop-on-ubuntu-linux-single-node-cluster\/\" target=\"_blank\">this<\/a>&nbsp;and <a href=\"http:\/\/cs.smith.edu\/dftwiki\/index.php\/Hadoop_Tutorial_1_--_Running_WordCount\" target=\"_blank\">this<\/a>.<\/p>\n<p>Those, along with a few dozen web searches and hours of pain, struggle, and frustration, led me to the successful operation of Hadoop on the standard WordCount trial code.<\/p>\n<p>I record my efforts, failures, and discoveries now for my own benefit as well as for any who might be struggling with the same.<\/p>\n<div class=\"separator\" style=\"clear: both; text-align: center;\"><a href=\"http:\/\/2.bp.blogspot.com\/-HzurpOuM_0A\/U4yqXX3OTJI\/AAAAAAAAAFU\/y0VvRECaabk\/s1600\/Screenshot+from+2014-06-02+12:44:47.png\" imageanchor=\"1\" style=\"margin-left: 1em; margin-right: 1em;\"><img loading=\"lazy\" decoding=\"async\" alt=\"Working with Hadoop\" border=\"0\" 
src=\"http:\/\/2.bp.blogspot.com\/-HzurpOuM_0A\/U4yqXX3OTJI\/AAAAAAAAAFU\/y0VvRECaabk\/s1600\/Screenshot+from+2014-06-02+12:44:47.png\" height=\"202\" title=\"Working with Hadoop\" width=\"320\" \/><\/a><\/div>\n<p><b>The &#8220;No such file or directory&#8221; error.<\/b><br \/>When Hadoop is set up, and you attempt to start the instance using start-all.sh or start-dfs.sh, you may get the error noted above. It is likely that either your HADOOP_HOME environment variable is not set for the user Hadoop is running under, or mkdir failed to create the log directory due to permission errors.<br \/>To check for the first of these cases, type &#8220;echo $HADOOP_HOME&#8221; to see if the variable is set. If you see nothing but a blank line, or get an error telling you that the directory cannot be found, you&#8217;ll need to set this variable to the true Hadoop installation directory (like &#8220;\/home\/&lt;user&gt;\/hadoop&#8221; or wherever you placed it). You can set it with the&nbsp;<a href=\"http:\/\/www.cyberciti.biz\/faq\/linux-unix-shell-export-command\/\" target=\"_blank\">export command<\/a>.<br \/>If HADOOP_HOME prints correctly, you will need to chmod the permissions on the Hadoop directory. Instructions on using chmod can be found&nbsp;<a href=\"http:\/\/www.linfo.org\/mkdir.html\" target=\"_blank\">here<\/a>. Remember the -R flag to include subdirectories.<\/p>\n<p><b>HADOOP_OPTS and HADOOP_CLASSPATH<\/b><br \/>Contrary to what several tutorials indicate, you will likely not need to have your HADOOP_OPTS variable set &#8212; in fact, it can be empty.<br \/>On the other hand, HADOOP_CLASSPATH should contain the location of the hadoop\/lib directory, e.g. &#8220;\/home\/&lt;user&gt;\/hadoop\/lib&#8221; (use the export command&nbsp;for this as well).<\/p>\n<p><b>Other Small but Important Items<\/b><\/p>\n<ul>\n<li>Don&#8217;t forget your &#8216;sudo&#8217;. If you&#8217;re operating on files from a different user&#8217;s directory (like if you&#8217;re using a Hadoop-specific user but saving files on the standard user), you&#8217;ll need to sudo most of your commands.<\/li>\n<li>Likewise, chmod all the important directories before you get started.<\/li>\n<li>The PATH environment variable must have the &#8220;bin&#8221; folder within it, e.g. &#8220;\/home\/&lt;user&gt;\/hadoop\/bin&#8221;. You can add this with the&nbsp;<a href=\"http:\/\/www.cyberciti.biz\/faq\/linux-unix-shell-export-command\/\" target=\"_blank\">export command<\/a>&nbsp;(don&#8217;t forget to use the &#8220;:&#8221; concatenator to avoid overwriting existing locations).<\/li>\n<li>When creating new directories, for input or output files, etc., use the -p flag to create any missing parent directories along the way. For instance, if your \/home\/&lt;user&gt;\/Documents directory is empty, you can create \/home\/&lt;user&gt;\/Documents\/hadoop-output\/wordcount-results using mkdir with a -p flag.<\/li>\n<li>When running a program such as WordCount, you will need to work with HDFS; if you&#8217;re not sure how it is set up, you can use Hadoop&#8217;s ls command to look around, the same as with the equivalent command-line operation: &#8220;hadoop fs -ls &lt;directory&gt;&#8221;.<\/li>\n<li>Attempting to test the Hadoop setup, I had difficulty ascertaining the location of the WordCount example &#8212; every tutorial seemed to show it in a different place. As of Hadoop 2.3.0, the jar with this example is in &#8220;&lt;main Hadoop directory&gt;\/share\/hadoop\/mapreduce\/hadoop-mapreduce-examples-2.3.0.jar&#8221;.<\/li>\n<li>To save some typing, of which you will be doing plenty, consider using aliases for the more common commands. For instance, you might use &#8220;h-start&#8221; as an alias for &#8220;&lt;main Hadoop directory&gt;\/bin\/start-all.sh&#8221;. You can <a href=\"https:\/\/en.wikipedia.org\/wiki\/Alias_(command)\" target=\"_blank\">learn about aliases here<\/a>.<\/li>\n<\/ul>\n<p>Good luck with your Hadooping! I will add more hints and tips as I encounter them.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hadoop is notoriously under-documented, as I recently discovered. I am using Hadoop in my summer research position, and have launched myself into the wonderful and aggravating world of servers and open-source map-reduce programs. And one of the fun aspects of [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[45,46,16,47,48,23,49,50,51,14,52,53,54],"class_list":["post-8","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-error","tag-fix","tag-hadoop","tag-hadoop_classpath","tag-hadoop_opts","tag-issues","tag-map-reduce","tag-no-such-file-or-directory","tag-path","tag-software","tag-tips-and-tricks","tag-trouble","tag-wordcount"],"_links":{"self":[{"href":"https:\/\/mberlove.com\/blog\/wp-json\/wp\/v2\/posts\/8","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mberlove.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mberlove.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mberlove.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mberlove.com\/blog\/wp-json\/wp\/v2\/comments?post=8"}],"version-history":[{"count":1,"href":"https:\/\/mberlove.com\/blog\/wp-json\/wp\/v2\/posts\/8\/revisions"}],"predecessor-version":[{"id":76,"href":"https:\/\/mberlove.com\/blog\/wp-json\/wp\/v2\/posts\/8\/revisions\/76"}],"wp:attachment":[{"h
ref":"https:\/\/mberlove.com\/blog\/wp-json\/wp\/v2\/media?parent=8"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mberlove.com\/blog\/wp-json\/wp\/v2\/categories?post=8"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mberlove.com\/blog\/wp-json\/wp\/v2\/tags?post=8"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}