I am a bit new to Hadoop. I have recently installed a stable version of Apache Hadoop 2.7.2 on Ubuntu 14.04.
I am trying to execute some basic Hadoop commands, such as the following:
hadoop version
The command gives me the correct output.
However, when I try to execute hadoop fs -ls, it gives me an error.
I have searched previous questions related to this problem on Stack Overflow, such as StackoverflowQuestion, but I cannot find a /user directory in my Hadoop installation. Could you please help me resolve this issue?
The content of my .bashrc file is as follows:
The content of my hdfs-site.xml file is as follows:
First of all, "hadoop fs -ls" is a command to the HDFS file system, not a Linux command.
Second, the command as you typed it is incomplete. The correct syntax is "hadoop fs -ls [-d] [-h] [-R] <args>", where the [-d], [-h] and [-R] components are optional. That said, you MUST specify a path for the <args> component. The <args> component expects an HDFS path (e.g. substituting / for <args> will list the entire tree ON HDFS, starting at the HDFS root directory /). You will need to create a directory called "user" on HDFS under the root directory using "hadoop fs -mkdir /user". Then the command "hadoop fs -ls /user" will work and will show an empty user directory.
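For example, assuming your NameNode and DataNode daemons are already running, the following sequence creates the directory and then lists it (the last listing will be empty at first):
hadoop fs -mkdir /user
hadoop fs -ls /
hadoop fs -ls /user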
Third, there is no way to tell HDFS to use a local filesystem (Linux) path as the value of <args>, which is what you appear to be attempting or expecting. Any value for <args> must resolve to an HDFS path, not a Linux filesystem path.
Fourth, for newcomers to Hadoop, it is very important to keep a clear distinction between the native host operating system's filesystem (in this case the Linux filesystem) and the Hadoop filesystem (in this case HDFS).
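As a quick illustration of that distinction (the paths are only examples): ls reads the Linux filesystem, while hadoop fs -ls reads HDFS, so the two can show completely different contents.
ls /                 # lists the Linux root directory
hadoop fs -ls /      # lists the HDFS root directory (e.g. /user once you have created it)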
One thing to note when running Hadoop commands in v2.7.2 is that Hadoop works on top of the Linux OS, so when we want to access the Hadoop Distributed File System we would use a command such as hdfs dfs -ls / rather than a bare hadoop fs -ls.
Also, in your hdfs-site.xml configuration, you seem to have missed adding this property:
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///path/to/datanode</value>
</property>
Please take note of your $HADOOP_HOME as well.
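A quick sanity check (the exact layout depends on where you unpacked Hadoop 2.7.2, so treat these paths as examples):
echo $HADOOP_HOME
ls $HADOOP_HOME/etc/hadoop    # should contain core-site.xml and hdfs-site.xml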
I am new to all this, as I am only in my second semester, and I just need help understanding a command I need to run. I am trying to load a local CSV file into HDFS on Cloudera using the terminal. I have to use that data and work with Pig for an assignment. I have tried everything and it still gives me 'no such file or directory'. I have turned off safe mode, checked the directories, and even made sure the file could be read. Here are the commands I have tried to load the data:
hadoop fs -copyFromLocal 2008.csv
hdfs dfs -copyFromLocal 2008.csv
hdfs dfs -copyFromLocal 2008.csv /user/root
hdfs dfs -copyFromLocal 2008.csv /home/cloudera/Desktop
Nothing at all has worked, and it keeps giving me '2008.csv': no such file or directory. What could I do to fix this? Thank you very much.
I have to use that data and work with Pig for an assignment
You can run Pig without HDFS.
pig -x local
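For instance, in local mode Pig reads straight from the Linux filesystem, so something along these lines (the relation name and load path are just placeholders for your assignment data) works without copying anything to HDFS first:
pig -x local
grunt> flights = LOAD '/home/cloudera/Desktop/2008.csv' USING PigStorage(',');
grunt> DUMP flights;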
I have tried everything and it still gives me 'no such file or directory'
Well, that error is not from HDFS; it seems to be from your local shell.
ls shows you the files available in the current directory; 2008.csv needs to be there for -copyFromLocal or -put to work without an absolute path.
For complete assurance about what you are copying, as well as where it is going, use full paths in both arguments. The second path is always an HDFS path when using those two commands.
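So before retrying the copy, it is worth confirming on the local (Linux) side where 2008.csv actually lives, for example:
ls -l /home/cloudera/Desktop/2008.csv
find /home/cloudera -name 2008.csv 2>/dev/null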
Try this
hadoop fs -mkdir -p /user/cloudera # just in case
hadoop fs -copyFromLocal ./2008.csv /user/cloudera/
Or even
hadoop fs -copyFromLocal /home/cloudera/Desktop/2008.csv /user/cloudera/
What I think you are having issues with is that /user/root is not correct unless you are running commands as the root user, and neither is /home/cloudera/Desktop, because HDFS has no concept of a Desktop.
The default behavior without the second path is
hadoop fs -copyFromLocal <file> /user/$(whoami)/
(Without the trailing slash, or without a pre-existing directory, it will copy <file> literally as a file named after the destination, which can be unexpected in certain situations, for example when trying to copy a file into a user directory that does not exist yet.)
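To make that concrete (this simply restates the behavior described above, using the paths from this question):
hadoop fs -copyFromLocal 2008.csv /user/cloudera/   # ends up as /user/cloudera/2008.csv
hadoop fs -copyFromLocal 2008.csv /user/cloudera    # same result only if /user/cloudera already exists;
                                                    # otherwise the data is written to a single file literally named /user/cloudera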
I believe you have already checked and made sure that 2008.csv exists. That's why I think the permissions on the file are not allowing you to copy it.
Try: sudo -u hdfs cat 2008.csv
If you get a permission denied error, this is your issue; fix the permissions on the file or create a new one. If you again get a "no such file" error, try using the whole path to the file, like:
hdfs dfs -copyFromLocal /user/home/csvFiles/2008.csv /user/home/cloudera/Desktop
I want to bulk upload a CSV file using Phoenix, but I cannot understand the command below. Can you explain it to me in detail?
HADOOP_CLASSPATH=$(hbase mapredcp):/path/to/hbase/conf hadoop jar phoenix-<version>-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool --table EXAMPLE --input /data/example.csv
I took this command from the following website:
https://phoenix.apache.org/bulk_dataload.html
I am not sure if you are still looking for an answer, but here it is. You are first setting HADOOP_CLASSPATH and then calling the "hadoop" executable with the jar option, pointing it at the Phoenix client jar and at the class to run along with its parameters.
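Broken down piece by piece (the comments are mine, and <version> and the paths are placeholders from the Phoenix documentation), the same command looks like this:
# $(hbase mapredcp) prints the HBase jars that MapReduce jobs need;
# /path/to/hbase/conf is appended so the job can also find hbase-site.xml.
# The variable assignment in front applies only to this single invocation.
HADOOP_CLASSPATH=$(hbase mapredcp):/path/to/hbase/conf \
hadoop jar phoenix-<version>-client.jar \
  org.apache.phoenix.mapreduce.CsvBulkLoadTool \
  --table EXAMPLE \
  --input /data/example.csv
# "hadoop jar <jar> <class> <args>" runs the given class from the jar; here the
# class is Phoenix's CsvBulkLoadTool, --table names the target Phoenix table,
# and --input is the CSV path on HDFS.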
The following can also help you understand general hadoop command usage (try typing hadoop in your SSH shell):
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
  CLASSNAME            run the class named CLASSNAME
 or
  where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
                       note: please use "yarn jar" to launch
                             YARN applications, not this command.
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  envvars              display computed Hadoop environment variables
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  credential           interact with credential providers
  daemonlog            get/set the log level for each daemon
  trace                view and modify Hadoop tracing settings

Most commands print help when invoked w/o parameters.
I have the Heroku Toolbelt installed on Windows 7 (x64 Home Edition). When I try to log in to Heroku, I receive a strange message about MySQL (even though MySQL already works fine with all installed software):
Microsoft Windows [Version 6.1.7601]
(c) Microsoft Corp., 2009. All rights reserved.
C:\windows\system32>heroku login
"MySQL" is not recognized as an internal or external command,
operable program or batch file.
"MySQL" is not recognized as an internal or external command,
operable program or batch file.
C:\windows\system32>
Exactly as shown, two times. As far as I can tell, heroku.bat does not invoke MySQL at any step. I think the OS runs some script (a kind of autoexec) just before or in parallel with heroku.bat, but I can't find out how this is done. This is confirmed by the fact that when I start other .bat files, the same two messages about MySQL appear.
Can you help me find out how that strange script is invoked?
Detailed research showed that the cause is a MySQL component named "MySQL Fabric 1.5.3 & MySQL Utilities 1.5.3 1.5". During installation, MySQL added its location to the Path environment variable. The "&" symbol broke the contents of the Path variable, and any expansion of the Path variable then caused the error.
It seems that at some point a script tries to call MySQL.exe and the file can't be found. To solve this problem, you should add the directory containing MySQL.exe to %PATH%. First make sure MySQL.exe is present on your system. If you are not sure where it is, start CMD, go to your root directory (CD \) and enter dir /S MySQL.exe. This will search your drive for the file and show you the path where it can be found. Save that path somewhere.
Now that you know where the file is, you have to add its location to %PATH%. To do so, enter setx PATH "<NEWPATH>;%path%;", where <NEWPATH> is the path to the directory containing MySQL.exe. Don't forget ;%path%;, this is very important: if you don't put it there, it will mess up your %PATH%.
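For example, if dir /S reported MySQL.exe under C:\Program Files\MySQL\MySQL Server 5.7\bin (the location mentioned in another answer here; yours may differ), the command would be:
setx PATH "C:\Program Files\MySQL\MySQL Server 5.7\bin;%path%;"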
Close the console, open a new one and type heroku login. This should fix the problem.
Now if you are not interested in fixing the error and just want to know where it comes from, you should post the code of the batch file that is executed by calling heroku login.
Just remove C:\Program Files (x86)\MySQL\MySQL Fabric 1.5 & MySQL Utilities 1.5\;C:\Program Files (x86)\MySQL\MySQL Fabric 1.5 & MySQL Utilities 1.5\Doctrine extensions for PHP\; from your Path (it is what breaks things, and it gets added during MySQL installation) and add the MySQL bin path instead. For me it is C:\Program Files\MySQL\MySQL Server 5.7\bin.
I am connecting to a MySQL database from Lua using:
mysql = require "luasql.mysql"                           -- load the LuaSQL MySQL driver
local env = mysql.mysql()                                -- create the environment object
local conn = env:connect(database, userName, password)   -- open the connection
but the local-infile option is not enabled, so my queries using LOAD DATA don't work.
I tried to put the line
local-infile = 1
in the [client] section of the my.cnf file, but it still doesn't work.
FYI: I am using Linux and MySQL 5.1.
I went through the same situation last week. The LOAD DATA INFILE query worked on Mac OS X, but I could not make it work on Ubuntu. The only way I found to make it work was to add one line of code to the LuaSQL project and recompile it.
I used the MySQL driver's mysql_options function (you can check its prototype in the mysql.h file, probably located under /usr/include/mysql) to enable local-infile. You can check the code at the repository.
To compile and install this workaround, you should download the files:
$ wget https://github.com/rafaeldias/luasql/archive/master.zip
$ unzip master.zip
To compile and install:
$ cd luasql-master/
$ make
$ sudo make install
Note: Depending on where your Lua and MySQL folders are located, you may need to set the proper values for LUA_LIBDIR, LUA_DIR, LUA_INC, DRIVER_LIBS and DRIVER_INCS in the config file within the LuaSQL folder.
Hope it helps.
I recently installed Hadoop on my local Ubuntu machine. I have started the DataNode by invoking the bin/start-all.sh script. However, when I try to run the word count program
bin/hadoop jar hadoop-examples-1.2.1.jar wordcount /home/USER/Desktop/books /home/USER/Desktop/books-output
I always get a connect exception. The folder 'books' is on my desktop (local filesystem). Any suggestions on how to overcome this?
I have followed every step in this tutorial. I am not sure how to get rid of that error. All help will be appreciated.
Copy your books folder into HDFS,
and for the input path argument use the HDFS path of the copied books folder.
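For example (mirroring the paths from your command; adjust USER to your actual username), something like this should work:
hdfs dfs -mkdir -p /user/USER
hdfs dfs -copyFromLocal /home/USER/Desktop/books /user/USER/books
bin/hadoop jar hadoop-examples-1.2.1.jar wordcount /user/USER/books /user/USER/books-output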
For more detail, go through the link below:
http://cs.smith.edu/dftwiki/index.php/Hadoop_Tutorial_1_--_Running_WordCount#Basic_Hadoop_Admin_Commands
There is a bit of confusion here: when you run the hadoop ... command, the default filesystem it uses is the Hadoop distributed filesystem, so the files must be located on HDFS for Hadoop to access them.
To copy files from the local filesystem to the Hadoop filesystem, you have to use the following command:
hdfs dfs -copyFromLocal /path/in/local/file/system /destination/on/hdfs
One more thing: if you want to run the program directly from your IDE, you sometimes get this issue, which can be solved by adding the
core-site.xml and hdfs-site.xml files to the conf variable, something like:
// conf is an org.apache.hadoop.conf.Configuration; Path is org.apache.hadoop.fs.Path
conf.addResource(new Path("/usr/local/hadoop/etc/hadoop/core-site.xml"));
conf.addResource(new Path("/usr/local/hadoop/etc/hadoop/hdfs-site.xml"));
Change the paths to hdfs-site.xml and core-site.xml above to match your local installation.
These configuration files can also be provided from the command line by adding their directory to the classpath with the -cp flag.
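For instance, when launching the driver class directly with java rather than through the hadoop script (the jar name, class name, and arguments here are only illustrative), the Hadoop configuration directory can be put on the classpath like this:
java -cp myjob.jar:/usr/local/hadoop/etc/hadoop:$(hadoop classpath) com.example.WordCountDriver /user/USER/books /user/USER/books-output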