Hadoop ConnectException

I recently installed Hadoop on my local Ubuntu machine. I have started the data node by invoking the bin/start-all.sh script. However, when I try to run the word count program
bin/hadoop jar hadoop-examples-1.2.1.jar wordcount /home/USER/Desktop/books /home/USER/Desktop/books-output
I always get a connect exception. The folder 'books' is on my desktop (local filesystem). Any suggestions on how to overcome this?
I have followed every step in this tutorial. I am not sure how to get rid of that error. All help will be appreciated.

Copy your books folder into HDFS, and use the HDFS path of the copied folder as the input path argument.
For more detail, go through the link below:
http://cs.smith.edu/dftwiki/index.php/Hadoop_Tutorial_1_--_Running_WordCount#Basic_Hadoop_Admin_Commands
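For example, a minimal sketch of that workflow, assuming HDFS is running and using the paths from the question (the HDFS destination path is illustrative):
# copy the local folder into HDFS, then run wordcount against the HDFS paths
bin/hadoop fs -put /home/USER/Desktop/books /user/USER/books
bin/hadoop jar hadoop-examples-1.2.1.jar wordcount /user/USER/books /user/USER/books-output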

There is a bit of confusion here: when you run the hadoop ... command, the default filesystem it uses is the Hadoop distributed filesystem (HDFS), so the files must be located on HDFS for Hadoop to access them.
To copy files from the local filesystem to HDFS, use the following command:
hdfs dfs -copyFromLocal /path/in/local/file/system /destination/on/hdfs
One more thing: if you want to run the program directly from your IDE, you can sometimes get this issue. It can be solved by adding the core-site.xml and hdfs-site.xml files to the conf variable, something like:
conf.addResource(new Path("/usr/local/hadoop/etc/hadoop/core-site.xml"));
conf.addResource(new Path("/usr/local/hadoop/etc/hadoop/hdfs-site.xml"));
Change the paths above so they point to wherever your hdfs-site.xml and core-site.xml live.
These configuration files can also be provided from the command line by adding them to the classpath with the -cp flag.
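For instance, a hedged sketch of such an invocation (the jar and driver class names are hypothetical; hadoop classpath prints the Hadoop jars your job needs):
# Putting the conf directory on the classpath lets Configuration pick up
# core-site.xml and hdfs-site.xml without calling addResource explicitly.
java -cp wordcount.jar:/usr/local/hadoop/etc/hadoop:$(hadoop classpath) \
  WordCountDriver /user/USER/books /user/USER/books-output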

Related

How do I install a driver in Erlang? (Specifically MySQL-otp driver)

From the documentation this driver looks great. I don't know how to install it so that I can use it though. I read somewhere that I should maybe use rebar? I looked at that documentation though and it appears to have the opposite problem. It says how to install it, but not how to use it.
Update
So it looks like after installing rebar, I can add the lines
{deps, [
{mysql, ".*", {git, "https://github.com/mysql-otp/mysql-otp",
{tag, "1.3.3"}}}
]}.
to my rebar.config file. I don't know what this does though. Do I have to compile or make this file now? Does rebar.config have to be in the same directory as my project? Right now the path to rebar.config is ~/rebar/rebar.config.
Is it at all correct to place my project so that it is a sibling of rebar in the file hierarchy?
Update
I ran ./rebar get-deps within the rebar folder and got
Pulling mysql from {git,"https://github.com/mysql-otp/mysql-otp",
{tag,"1.3.3"}}
Cloning into 'mysql'...
==> mysql (get-deps)
I still don't really know what this means though, and when I try compiling my Erlang file I get:
c(erlangFile.erl).
{error,non_existing}
rebar is a build tool for Erlang. Please go through https://github.com/rebar/rebar/wiki/Rebar-commands for the commands.
After getting the dependencies, "rebar compile" is required to compile them.
To use the resulting .beam files, you have to add their output path to the code path, using the methods described in Add Path to Erlang Search Path?.
Then you will be able to use it in your code.
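As a rough sketch, assuming the layout from the question (rebar in ~/rebar, so get-deps clones dependencies into ~/rebar/deps):
cd ~/rebar && ./rebar compile       # builds the fetched deps into deps/*/ebin
erlc erlangFile.erl                 # compiles your own module to erlangFile.beam
erl -pa ~/rebar/deps/mysql/ebin     # start a shell that can find the mysql beams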
Download your package, in this case
git clone https://github.com/mysql-otp/mysql-otp.git
Download a tool called rebar
git clone git://github.com/rebar/rebar.git
cd rebar
./bootstrap
Add the following to rebar/rebar.config
{deps, [
{mysql, ".*", {git, "https://github.com/mysql-otp/mysql-otp",
{tag, "1.3.3"}}}
]}.
Within the rebar/mysql-otp directory run
./rebar get-deps
Then within the same directory, run
./rebar compile
This will put a bunch of .beam files and an .app file into the ebin/ directory.
Next add the ebin/ directory to your path. You can update your $ERL_LIBS environment variable, or run a command within the Erlang console like
1> code:add_pathz("~/rebar/mysql-otp/ebin").
or
1> code:add_pathz("rebar/mysql-otp/ebin")
And there are a few other ways to add it to your Erlang path.
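For the $ERL_LIBS route, a small sketch (the path is assumed from this answer's layout):
# ERL_LIBS entries are directories that contain application directories,
# each with an ebin/ inside, so point it at the parent of mysql-otp:
export ERL_LIBS=$ERL_LIBS:~/rebar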
Also, make sure MySQL itself is installed.
Here are a few links with MySQL installation instructions that worked for me:
https://www.digitalocean.com/community/tutorials/how-to-install-mysql-on-centos-7
No package msyql-server available

How to understand the Phoenix bulk CSV upload command?

I want to bulk upload a CSV file using Phoenix, but I cannot understand the command below. Can you explain it in detail?
HADOOP_CLASSPATH=$(hbase mapredcp):/path/to/hbase/conf hadoop jar phoenix-<version>-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool --table EXAMPLE --input /data/example.csv
I took this command from the following website:
https://phoenix.apache.org/bulk_dataload.html
I am not sure if you are still looking for an answer, but here it is: you first set HADOOP_CLASSPATH, then call the hadoop executable with the jar option, pointing it at the Phoenix client jar and the class to run, followed by that class's parameters.
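For instance, you can inspect what each piece contributes; the Phoenix version number and HBase conf path below are assumptions, not values from the question:
# print the colon-separated list of HBase jars that $(hbase mapredcp) injects
hbase mapredcp
# the full invocation then looks something like:
HADOOP_CLASSPATH=$(hbase mapredcp):/etc/hbase/conf \
  hadoop jar phoenix-4.7.0-client.jar \
  org.apache.phoenix.mapreduce.CsvBulkLoadTool \
  --table EXAMPLE --input /data/example.csv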
The following can help you understand hadoop command usage (try typing hadoop in your SSH shell):
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
  CLASSNAME            run the class named CLASSNAME
 or
  where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
                       note: please use "yarn jar" to launch
                             YARN applications, not this command.
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  envvars              display computed Hadoop environment variables
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  credential           interact with credential providers
  daemonlog            get/set the log level for each daemon
  trace                view and modify Hadoop tracing settings

Most commands print help when invoked w/o parameters.

missing /user directory in hadoop-2.7.2

I am a bit new to Hadoop. I recently installed a stable version of Apache Hadoop 2.7.2 on Ubuntu 14.04.
I am trying to execute some basic Hadoop commands such as the following:
hadoop version
The command gives me the correct output.
However, when I try to execute hadoop fs -ls, it gives me an error.
I have searched the previous questions related to this problem on Stack Overflow, such as StackoverflowQuestion. But I am not finding a /user directory in my Hadoop installation. Could you please help me resolve this issue?
The content of my .bashrc file is as follows:
The content of hdfs-site.xml file is as follows:
First of all, the command "hadoop fs -ls" is a command to the HDFS filesystem and not a Linux command.
Second, the command as you typed it is incomplete. The correct syntax is "hadoop fs -ls [-d] [-h] [-R] <args>", where the [-d], [-h] and [-R] components are optional. That said, you MUST specify a path for the <args> component, and it expects an HDFS path (e.g. substituting / for <args> will list the entire tree ON HDFS, starting at the HDFS root directory /). You will need to create a directory called "user" on HDFS under the root directory using "hadoop fs -mkdir /user". Then the command "hadoop fs -ls /user" will work and will show an empty user directory.
Third, there is no way to tell HDFS to use a local filesystem (Linux) path for <args>, which is what you are attempting, or understanding it to be. Any value for <args> must resolve to an HDFS filesystem path, not a Linux filesystem path.
Fourth, for newcomers to Hadoop, it is very important to keep a clear distinction between the native host operating system filesystem (in this case the Linux filesystem) and the Hadoop filesystem (in this case HDFS).
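Putting that together, a minimal sketch (the per-user directory name is an assumption; adjust it to your login):
hadoop fs -mkdir /user          # create the HDFS /user directory under the root
hadoop fs -mkdir /user/$USER    # optional per-user home directory
hadoop fs -ls /user             # now lists the (empty) user directory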
One thing to note when running Hadoop commands in v2.7.2: Hadoop works on top of the Linux OS, so when we want to access the Hadoop Distributed File System we use a command like hdfs dfs -ls / instead of hadoop fs -ls.
Also, in your hdfs-site.xml configuration, you seem to have missed adding this property:
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///path/to/datanode</value>
</property>
Please take note of your $HADOOP_HOME as well.
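A quick sanity check of the installation paths might look like this (the values shown are examples, not your actual paths):
echo $HADOOP_HOME                # e.g. /usr/local/hadoop
$HADOOP_HOME/bin/hadoop version  # should print the 2.7.2 banner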

Cypher Neo4j Couldn't load the external resource

In a Windows environment, I'm trying to load a .csv file with the statement:
LOAD CSV WITH HEADERS FROM "file:///E:/Neo4j/customers.csv" AS row
It seems not to work properly and returns:
Couldn't load the external resource at:
file:/E:/Neo4j/Customers.csv
Neo.TransientError.Statement.ExternalResourceFailure
What am I doing wrong? Thanks in advance.
I was getting this error on Community Edition 3.0.1 on Mac OS X 10.10
It appears that LOAD CSV with file:/// looks for files in a predefined directory. One would think that the argument given to the Cypher statement would be the full path, but that is not the case.
The file:/// prefix, in my situation, meant that Neo4j would append the argument you gave to a predefined path and then go look for that combined path.
That predefined directory, /Users/User/Documents/Neo4j/default.graphdb/import, did not exist entirely on my machine: I was missing the /import folder, which was not created at install.
To fix this on my system, I created an "import" directory and put the file to be read in that directory. When I executed the Cypher load statement, I put ONLY the name of the file in the file argument, i.e.
LOAD CSV FROM "file:///data.csv" AS row
This worked for me.
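In shell terms, the fix amounted to something like the following (the CSV source location is hypothetical):
# create the missing import directory and drop the file into it
mkdir -p /Users/User/Documents/Neo4j/default.graphdb/import
cp ~/Downloads/data.csv /Users/User/Documents/Neo4j/default.graphdb/import/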
It appears to be a security configuration. Here's the original answer I found: https://stackoverflow.com/a/37444571/327004
You can add the following setting in conf/neo4j.conf in order to bypass this:
dbms.security.allow_csv_import_from_file_urls=true
Or change the import directory dbms.directories.import=import
You can find the answer in the file
"C:\Users\Jack\AppData\Roaming\Neo4j Community Edition\neo4j.conf"
(above "dbms.directories.import=import").
For version neo4j-community_windows-x64_3_1_1 you have to comment out this line, or you have to create the folder \import (which isn't created by the installation) and add your file to that folder.
It's written there that, for security reasons, they only allow file loads from the \Documents\Neo4j\default.graphdb\import folder.
After commenting out # dbms.directories.import=import, you can execute e.g.
LOAD CSV FROM "file:///C:/Users/Jack/Documents/products.csv" AS row
In neo4j.conf I didn't have to add/set
dbms.security.allow_csv_import_from_file_urls=true
On (Arch) Linux + neo4j-community-3.4.0-alpha09, edit $NEO4J_HOME/conf/neo4j.conf:
uncomment or add: dbms.security.allow_csv_import_from_file_urls=true
comment: #dbms.directories.import=import
Restart neo4j (in terminal: neo4j restart), and reload the Neo4j Browser (http://localhost:7474/browser/) if you are using a web browser as your Neo4j interface/GUI.
Then, you should be able to load a csv from outside your $NEO4J_HOME/... directory
E.g.,
LOAD CSV WITH HEADERS FROM "file:///mnt/Vancouver/Programming/data/metabolism/practice/a.csv" AS ...
where my $NEO4J_HOME/ is /mnt/Vancouver/apps/neo4j/neo4j-community-3.4.0-alpha09/
LOAD CSV WITH HEADERS FROM "file:/mnt/Vancouver/Programming/data/metabolism/practice/a.csv" AS ...
also works, but not
LOAD CSV WITH HEADERS FROM "file://mnt/Vancouver/Programming/data/metabolism/practice/a.csv" AS...
or
LOAD CSV WITH HEADERS FROM "/mnt/Vancouver/Programming/data/metabolism/practice/a.csv" AS...
i.e. use ...file:/... or ...file:///...
It's probably a URL issue; try file:c:/path/to/data.csv
See my blog posts:
http://jexp.de/blog/2014/10/load-cvs-with-success/
http://jexp.de/blog/2014/06/load-csv-into-neo4j-quickly-and-successfully/
For an Ubuntu system, placing the file in /usr/lib/neo4j solved the issue for me. In every other location I tried, even with full permissions (777), the problem remained. After going through another Stack Overflow post, I realized that the file should be kept in the Neo4j directory.
In the Neo4j Desktop, select the database you are using, go to the settings, and there you will find the solution: just comment out the "dbms.directories.import=import" line.
# This setting constrains all LOAD CSV import files to be under the import directory. Remove or comment it out to
# allow files to be loaded from anywhere in the filesystem; this introduces possible security problems. See the
# LOAD CSV section of the manual for details.
dbms.directories.import=import ### COMMENT THIS LINE
For macOS Mojave v10.14.5:
Actually, I had to uncomment dbms.directories.import=import in ~/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-e2dd2a9c-d450-4639-861b-1e7e42b56b31/installation-3.5.5/conf/neo4j.conf and restart the service. Then it worked. All files have to be placed in the import directory.
Run the command: LOAD CSV WITH HEADERS FROM 'file:///<yourCSV>.csv' AS l RETURN l
I am using the Neo4j Desktop and, as others have said, the default graph database has a predefined import location. You can find the location by using the UI. If you put the CSV into the import directory, then you can use the relative path directly from your LOAD CSV command.
Neo4j version is 3.1.1, OS is Win10.
For me, LOAD CSV would read from Neo4j_Database_Location/testDB/import/artists.csv.
At first, I put the csv file at the path F:\code\java\helloworld\artists.csv, and my Cypher statement was
LOAD CSV FROM 'file:///F:\\code\\java\\helloworld\\artists.csv' AS line
CREATE (:Artist {name: line[1], year: toInt(line[2])})
Then I got the following error message:
Couldn't load the external resource at: file:/D:/Neo4j/db/testDB/import/code/java/helloworld/artists.csv
It means Neo4j itself concatenates the file path: "D:/Neo4j/db/testDB/import/" is the Neo4j database location, and "code/java/helloworld/artists.csv" is the csv file location.
For example, I installed Neo4j at the path D:\Neo4j\Neo4j CE 3.1.1, and the database location is D:\Neo4j\db. I put the CSV file at the path D:\Neo4j\db\testDB\import\artist.csv. If you don't have the folder "import" on that path, you should create it yourself and put your file in the folder "import".
Then, with your csv file in that path, enter the Cypher statement:
LOAD CSV FROM 'file:///artist.csv' AS line
CREATE (:Artist {name: line[1], year: toInt(line[2])})
In a word, once you put the CSV file in the right path, the problem is solved.
Related explanation in the LOAD CSV developer manual:
If dbms.directories.import is set to the default value import, using the above URLs in LOAD CSV would read from <NEO4J_HOME>/import/myfile.csv and <NEO4J_HOME>/import/myproject/myfile.csv respectively.
If it is set to /data/csv, using the above URLs in LOAD CSV would read from /data/csv/myfile.csv and /data/csv/myproject/myfile.csv respectively.
Set the property dbms.directories.import=import.
Create the folder "import" explicitly at /Users/User/Documents/Neo4j/default.graphdb/, because the predefined directory does not exist by default.
Place the csv data set in the import folder.
Then run the code, like: LOAD CSV FROM "file:///C:/customers.csv" AS row
In addition, after you run the line you can analyze what is going wrong in the code section to get a better understanding.
Put your dataset into the import directory under the neo4j-community path, then re-run your command.
Add your csv file to the import folder of your Neo4j installation. To do this:
Open Neo4j and start the graph of your project.
Then, in the open folders tab, open the import folder.
Copy your csv file into this folder.
Use that path in your load syntax, e.g. file:///C:/neo4j_module_datasets/test.csv, since Neo4j is running on the C drive.
Use the following syntax:
LOAD CSV WITH HEADERS FROM "file:///my_collection.csv" AS row CREATE (n:myCollection) SET n = row
If you are running Docker, then run these commands before running the above query:
docker run \
-p=7474:7474 \
-p=7687:7687 \
-v=$HOME/neo4j/data:/data \
-v=$HOME/neo4j/logs:/logs \
-v=$HOME/local_import_dir:/var/lib/neo4j/import \
neo4j:3.0
Then,
sudo cp my_collection.csv /home/bajju/local_import_dir/
One of the following should solve the LOAD CSV errors (assuming you have dbms.security.allow_csv_import_from_file_urls=true)
If using Linux, check the permissions on the file; change them using chmod 777 file_name.csv.
Check that the file format and the format of the contents within the file are correct.
The easiest way (beware of the security implications) is to serve your directory over HTTP and use the HTTP import.
On the command line, go to the folder where the csv files are located
and run one of the following, depending on your Python environment.
Python 2
$ python -m SimpleHTTPServer 8000
Python 3
$ python3 -m http.server 8000
- Now you can load your files from localhost:
LOAD CSV FROM 'http://localhost:8000/mycsvfile.csv' AS row
return row
- You can actually expose files on one host and load them where your DB is running, by exposing the folder and replacing localhost with your IP.

how to connect mysql databases in weka?

I want to use my MySQL databases in Weka in order to analyze data.
I downloaded mysql-connector-java-5.0.8-bin.jar, put it in the Weka folder under my Program Files folder, and added this path to the system PATH variable. But when I open the Weka Explorer and click OpenDB, I don't know what I should write in the URL textbox.
I don't fully know what I should do. The error that I see is:
problem connecting to database:
no suitable driver found for!
Please give me complete guidance. Thanks in advance.
Add mysql-connector-java-5.1.12-bin.jar to the CLASSPATH,
or put it in the Weka folder, navigate to the Weka installation folder, and run:
%java_home%/bin/java -Xmx300M -cp ".;weka.jar;mysql-connector-java-5.1.12-bin.jar;" weka.gui.GUIChooser
Then click Open DB,
fill in the proper user and password,
and put in the URL: jdbc:mysql://localhost:3306/DATABASENAME
Click Execute. The result window should show the results.
For Weka 3.7.10, the CLASSPATH system environment variable is not taken into account (at least under Windows 7). The only working approach for me was to modify the RunWeka.ini file in the Weka installation folder: the cp= setting was changed to
cp=%CLASSPATH%;d:/Programs/jdbc/mysql-connector-java-5.1.26/mysql-connector-java-5.1.26-bin.jar
whereas originally only the %CLASSPATH%; setting was provided. It does not make much sense, but it worked.
You can find an introduction to WEKA and the environment variable here:
http://weka.wikispaces.com/CLASSPATH
Copy mysql-connector-java-X.X.XX-bin.jar to /usr/share/java/
Unzip /usr/share/java/weka.jar
Edit /usr/share/java/weka/experiment/DatabaseUtils.props and add:
jdbcDriver=com.mysql.jdbc.Driver
jdbcURL=jdbc:mysql://localhost:3306/test (with your server)
In your shell, add:
export CLASSPATH=/usr/share/java/mysql-connector-java-X.X.XX-bin.jar:$CLASSPATH
export CP=/usr/share/java/mysql-connector-java-X.X.XX-bin.jar:/usr/share/java/:/usr/share/java/weka.jar
Execute Weka:
java -cp $CP -Xmx500m weka.gui.explorer.Explorer
Ready.