Hadoop/Hive: Loading data from a .csv file on a remote machine

I have a csv file that is available from an HTTP URL. Is there any way I can load it into Hive from there? This is what I am trying:
LOAD DATA INPATH 'http://192.168.56.101:8081/TeamHalf.csv' OVERWRITE INTO TABLE csvdata;

The Hive LOAD command is as follows:
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
1) If LOCAL is specified, it loads from a local filesystem filepath.
2) If LOCAL is omitted, it loads from an HDFS filepath only, i.e.:
filepath must refer to files within the same filesystem as the table's (or partition's) location
So loading from a remote http: path won't work; refer to the Hive DML documentation. The workaround is staging: copy the data from the remote http: path to the local filesystem or HDFS first, then load it into the Hive warehouse, as sketched below.
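A minimal sketch of that staging approach, assuming curl and the hadoop client are available on a machine that can reach both the HTTP server and the cluster (the /tmp and /staging paths are arbitrary choices):
# fetch the file from the remote HTTP server to the local filesystem
curl -o /tmp/TeamHalf.csv http://192.168.56.101:8081/TeamHalf.csv
# stage it into HDFS
hdfs dfs -mkdir -p /staging
hdfs dfs -put /tmp/TeamHalf.csv /staging/TeamHalf.csv
Then, in Hive:
LOAD DATA INPATH '/staging/TeamHalf.csv' OVERWRITE INTO TABLE csvdata;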

Related

Couldn't load file in neo4j

I used the following commands in neo4j, but the system always responds with the following error message:
"Couldn't load the external resource at: file:/import/Tokyo_subway_system.csv ()"
Here is my script:
load csv with headers from "file:///Tokyo_subway_system.csv" as csvLine
create (s:Station {id: toInteger(csvLine.id), station_No: csvLine.station_No, station_Name: csvLine.station_Name, station_English: csvLine.station_English, line_Name: csvLine.line_Name ,line_English: csvLine.line_English, latitude: csvLine.latitude, longitade: csvLine.longitade})
Find the $NEO4J_HOME/import/ folder on your server or local machine, then copy the file Tokyo_subway_system.csv into that directory, as shown below. If you have multiple versions of neo4j installed, ensure that you are in the right neo4j home directory.
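A minimal sketch, assuming a default Neo4j layout (the source path of the CSV is hypothetical):
# copy the CSV into Neo4j's import directory
cp /path/to/Tokyo_subway_system.csv $NEO4J_HOME/import/
# then re-run the LOAD CSV statement from the Neo4j browser or cypher-shell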

Neo4j load csv error : Couldn't load the external resource

I am using Neo4j 3.0.1, and for loading a csv file:
LOAD CSV WITH HEADERS FROM 'file:///D:/dummy.csv' as line
CREATE (:myData {line})
But it throws an error:
Couldn't load the external resource at: file:/D:/dummy.csv
Note: I've already tried configuring neo4j.conf as described here.
Please suggest an alternative besides placing the csv file in the import folder.
Try setting dbms.directories.import to D: in neo4j.conf:
dbms.directories.import=D:
and then run:
LOAD CSV WITH HEADERS FROM 'file:///dummy.csv' as line
CREATE (:myData {line})
EDIT:
As noted in the comments, the problem was solved by changing the owner of the directory containing the CSV file, as described in this answer:
sudo chown neo4j:adm <csv file location>

How does Greenplum implement external tables?

I am learning gpdb, and I am confused about external tables. The following is my understanding.
For a read-only table, gpdb only creates the table's metadata. All data is stored on a remote server such as HDFS. When querying, data is transferred from the remote server to the segments; it is not saved on the segments when the query ends.
For a write-only table, gpdb will load data from the remote server, and all data will be saved on the segments. An insert will modify data on the local segment, not the remote server.
Is my understanding right?
An External Table is a table where the data is stored and managed outside of the database. An example would be a CSV or TEXT file.
A Readable External Table has the metadata stored in the database and, like you said, all data is stored somewhere else like HDFS. It can also be the local filesystem of the Greenplum hosts. The most common External Table uses gpfdist to "serve" the file(s); gpfdist is basically a web server that multiple segments can read from at the same time.
Example:
First, I will start gpfdist on four different hosts. These hosts must be accessible by all segment hosts too.
[gpadmin@host1] gpfdist -p 8999 -d /landing/ &
[gpadmin@host2] gpfdist -p 8999 -d /landing/ &
[gpadmin@host3] gpfdist -p 8999 -d /landing/ &
[gpadmin@host4] gpfdist -p 8999 -d /landing/ &
And now put an example file in each:
[gpadmin@host1] hostname > /landing/hostname.txt
[gpadmin@host2] hostname > /landing/hostname.txt
[gpadmin@host3] hostname > /landing/hostname.txt
[gpadmin@host4] hostname > /landing/hostname.txt
Create an External Table to read it:
[gpadmin@master] psql
gpadmin=# create external table public.ext_hostnames
(hostname text)
location (
'gpfdist://host1:8999/hostname.txt',
'gpfdist://host2:8999/hostname.txt',
'gpfdist://host3:8999/hostname.txt',
'gpfdist://host4:8999/hostname.txt')
format 'text' (delimiter '|' null as '');
This simple example shows files managed outside of the database that are now accessible by Greenplum with an External Table.
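Reading it should return one row per served file, i.e. each host's name; a query as simple as this exercises all four gpfdist instances:
gpadmin=# select * from public.ext_hostnames;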
A Writable External Table reverses this and allows you to write data to an external system like HDFS or the posix filesystem.
[gpadmin@master] gpfdist -p 8999 -d /home/gpadmin/ &
[gpadmin@master] psql
gpadmin=# create writable external table public.wrt_example
(foo text)
location ('gpfdist://master:8999/foo.txt')
format 'text' (delimiter '|' null as '');
gpadmin=# insert into public.wrt_example select 'jon';
gpadmin=# \q
[gpadmin@master] cat /home/gpadmin/foo.txt
jon
The use case for a Writable External Table is to take data in Greenplum and put it somewhere else.
A common example is when you use HDFS for a Data Lake and Greenplum for Analytics. You could read data out of HDFS with an External Table and INSERT it into Greenplum tables. You then analyze this data, use analytical functions from the Madlib package, and find new insights about your data. Now you want to push this new data from Greenplum back to HDFS so that other consumers can benefit from the insights. You then use a Writable External Table to INSERT the data from Greenplum to HDFS.
The most common use case for a Readable External Table is loading files from external sources into Greenplum.
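A minimal sketch of that round trip (the table, column, and host names here are hypothetical):
-- read side: pull data from an external source into a regular Greenplum table
create external table public.ext_sales (id int, amount numeric)
location ('gpfdist://etlhost:8999/sales.txt')
format 'text' (delimiter '|' null as '');
insert into public.sales select * from public.ext_sales;
-- write side: push results back out through a writable external table
create writable external table public.wrt_results (id int, score numeric)
location ('gpfdist://etlhost:8999/results.txt')
format 'text' (delimiter '|' null as '');
insert into public.wrt_results select id, score from public.analysis_output;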

MySQL : Load data infile

I am getting an error when using LOAD DATA to insert data:
"load data infile '/home/bharathi/out.txt' into table Summary"
The file exists at that location, but MySQL throws the error below:
ERROR 29 (HY000): File '/home/bharathi/out.txt' not found (Errcode: 13)
show variables like 'data%';
+---------------+-----------------+
| Variable_name | Value |
+---------------+-----------------+
| datadir | /var/lib/mysql/ |
+---------------+-----------------+
datadir is pointing to a root-permissioned folder. I can't change this variable because it's read-only.
How can I do the LOAD DATA INFILE operation?
I have tried changing file permissions and LOAD DATA LOCAL INFILE; neither works.
As documented under LOAD DATA INFILE Syntax:
For security reasons, when reading text files located on the server, the files must either reside in the database directory or be readable by all. Also, to use LOAD DATA INFILE on server files, you must have the FILE privilege. See Section 6.2.1, “Privileges Provided by MySQL”. For non-LOCAL load operations, if the secure_file_priv system variable is set to a nonempty directory name, the file to be loaded must be located in that directory.
You should therefore either:
Ensure that your MySQL user has the FILE privilege and, assuming that the secure_file_priv system variable is not set:
make the file readable by all; or
move the file into the database directory.
Or else, use the LOCAL keyword to have the file read by your client and transmitted to the server. However, note that:
LOCAL works only if your server and your client both have been configured to permit it. For example, if mysqld was started with --local-infile=0, LOCAL does not work. See Section 6.1.6, “Security Issues with LOAD DATA LOCAL”.
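A minimal sketch of the LOCAL route, assuming the server permits it and using the path from the question (myuser and mydb are placeholders):
mysql --local-infile=1 -u myuser -p mydb
mysql> LOAD DATA LOCAL INFILE '/home/bharathi/out.txt' INTO TABLE Summary;
Here --local-infile=1 enables LOCAL on the client side; the server must also have been started with local-infile enabled.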
The solution which really worked for me was to:
sudo chown mysql:mysql /path/to/the/file/to/be/read.csv
Adding it for future reference.

MySQL cannot find data file for a load operation

I would like to load a data file into MySQL using the following command:
LOAD DATA LOCAL INFILE '/Users/David/Desktop/popularity20110511/test_data' INTO TABLE test_table
The above command gives me the following error:
#7890 - Can't find file '/Users/David/Desktop/popularity20110511/test_data'.
I've also tried:
LOAD DATA INFILE '/Users/David/Desktop/popularity20110511/test_data' INTO TABLE test_table
It also gives me an error:
#13 - Can't get stat of '/Users/David/Desktop/popularity20110511/test_data' (Errcode: 13)
I've repeatedly checked the file path and name and I've also made sure the file privilege is set to Read & Write for everyone.
I am using a Mac and phpMyAdmin.
Any suggestions on what the problem may be?
I had the same problem using macOS and tried to change permissions, etc., but then I realized you have to use the same directory structure you would use in the Terminal application. Example: if you have localhost/myproject/myfile.csv, try using /Applications/XAMPP/htdocs/myproject/myfile.csv:
LOAD DATA LOCAL INFILE '/Applications/XAMPP/htdocs/myproject/myfile.csv'
INTO TABLE `mytable`
FIELDS TERMINATED BY ';'
LINES TERMINATED BY '\r';
I'm not too sure what the problem is, but I made it work by moving the file to /tmp/test_data and using LOAD DATA INFILE...
I have had the same issue, trying to import an SQL file that uses LOAD DATA LOCAL INFILE... to import a CSV file in phpMyAdmin and got the same error message:
#7890 - Can't find file 'myfile.csv'
The solution I found was to put the file in the same folder as phpMyAdmin.
I was having the same problem:
#7890 - Can't find file 'C:/Program Files/DatabaseTableHolders/Menu.csv'
The first thing I did was move the files to the "Program Files" directory. It still wouldn't work.
Then I changed the path from
'C:/Program Files/DatabaseTableHolders/Menu.csv'
to
'C:\Program Files\DatabaseTableHolders\Menu.csv'
THIS WORKS!!!
For me it's something to do with the path structure.
By the way, I'm using Eclipse and phpMyAdmin on WAMP (Windows operating system). I hope this helps.
Yes, I met the same error.
My situation:
XAMPP + Mac OS 10.9
load data local infile '/Applications/XAMPP/xamppfiles/htdocs/jsonSQL.txt' into table `ttlegs` fields terminated by ',' lines terminated by '\n'
This works when I put jsonSQL.txt in htdocs.
It is best if you put that text file in the 'xampp/phpMyAdmin' directory (I assume you work on XAMPP). That's it; then LOAD DATA LOCAL INFILE will work. Happy coding.