neo4j LOAD CSV returns Couldn't Load external resource - csv

I'm trying to import a CSV into Neo4j, but it doesn't seem to be working.
I'm loading a local file using the syntax:
LOAD CSV WITH HEADERS FROM "file:///location/local/my.csv" AS csvDoc
I'm wondering if there's something wrong with my CSV file, or if there's some syntax problem here.
As the title says, the error is:
Couldn't load the external resource at: file:/location/local/my.csv
[Neo.TransientError.Statement.ExternalResourceFailure]

Neo4j seems to need a full path specification to load a file from the local system.
On Linux or macOS, try:
LOAD CSV FROM "file:/Users/you/location/local/my.csv"
On Windows, try:
LOAD CSV FROM "file://c:/location/local/my.csv"

In the browser interface (Neo4j 3.0.3, macOS 10.11) it looks like Neo4j prefixes your file path with $path_to_graph_database/import, so you can move your files there. If you are using a command-line tool, see this SO question.
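For example, if the file is copied into that import directory, a relative URL should resolve (a minimal sketch; my.csv is a placeholder file name):
LOAD CSV WITH HEADERS FROM "file:///my.csv" AS row
RETURN row LIMIT 5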

Easy solution:
Once you have chosen your database location (in my case ReactomeGraphDB60, which is where I placed my database), go to that folder and create a folder inside it called "import".
Then, in the Cypher query, write (as an example):
LOAD CSV WITH HEADERS FROM "file:///ILClasiffStruct.csv" AS row
CREATE (n:Interleukines)
SET n = row
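Note that SET n = row stores every CSV column as a string property. If some columns are numeric, you may want to coerce them explicitly; a hedged variant of the query above, where position is a hypothetical column name:
LOAD CSV WITH HEADERS FROM "file:///ILClasiffStruct.csv" AS row
CREATE (n:Interleukines)
// position is a hypothetical numeric column; use toInt() on releases before Neo4j 3.1
SET n = row, n.position = toInteger(row.position)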

Related

Trying to load text file in mysql but error is no such file or directory

I am trying to load a file (a.txt) into MySQL with the LOAD DATA command, but it says "no such file or directory" even though the file is present at the specified path:
load data local infile 'F:\makarand\a.txt'
into TABLE file;
I have also tried removing the LOCAL keyword, but the issue remains the same. It says:
Error No 2: No such file or directory
FILE is a protected name in MySQL:
https://dev.mysql.com/doc/refman/8.0/en/keywords.html
MySQL expects a filename there rather than your table name. Either rename the table (add a prefix or suffix) or quote it in your query with backticks (into TABLE `file`). These kinds of things sadly happen, which is why I always use a prefix on my tables, to make sure I don't run into them.
// edit //
See also this topic; this has nothing to do with Windows or Linux:
Load data infile, difference between Windows and Linux

LOAD CSV command keeps using old file: location, ignores command input

I am using Community edition 3.0.5 on Windows 10. I made multiple attempts to execute a LOAD CSV command before being told that such files cannot reside on an external drive. When I moved the file to users/user/ and tried to execute the LOAD CSV command, I got the same message: "Couldn't load the external resource at: file:/F:/Neo4j%20DBs/Data.gov%20Consumer%20Complaints/Consumer%20Complaints%20DB/import/Users/CharlieOh/Consumer_Complaints.csv", in spite of the fact that the command I entered was:
LOAD CSV WITH HEADERS FROM
'file:///Users/CharlieOh/Consumer_Complaints.csv' AS line
WITH line
LIMIT 1
RETURN line
I tried to locate the file neo4j.conf and could only find C:\Program Files (x86)\Neo4j Community 3.2.2\Neo4j Community.install4j\i4jparams.conf. I even deleted the old DB and recreated the small amount of data, and got the same error, which seems to indicate that the LOAD CSV function is totally useless across all my Neo4j databases.
BTW, the %20 in the file specification was due to suggestions on Stack Overflow, as was using underscores to avoid blank spaces in the file specification. None of it worked, and now that I believe I may have solved the problem by putting the CSV file in the user directory, the LOAD CSV function won't let me do it. One last thing: I am following the YouTube video https://www.youtube.com/watch?v=Eh_79goBRUk to learn how to load a CSV file into Neo4j.
The CSV file needs to go in the import directory of the specific database. With Neo4j Desktop this is easy to find: click the Manage button for the database, then the open folder button. It looks like you've found it.
Once the database import directory is located, you reference the file relative to it: LOAD CSV WITH HEADERS FROM 'file:///FN', where FN is your file name, including the csv extension. You do NOT use the full path; that is assumed.
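For example, with the question's file copied into that import directory, the relative form would be (a sketch using the asker's file name):
LOAD CSV WITH HEADERS FROM 'file:///Consumer_Complaints.csv' AS line
WITH line
LIMIT 1
RETURN line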

Spark S3 CSV read returns org.apache.hadoop.mapred.InvalidInputException

I see several posts here and in a Google search for org.apache.hadoop.mapred.InvalidInputException, but most deal with HDFS files or trapping errors. My issue is that while I can read a CSV file from spark-shell, running the same code from a compiled JAR consistently returns an org.apache.hadoop.mapred.InvalidInputException error.
The rough process of the jar:
1. read from JSON documents in S3 (this works)
2. read from parquet files in S3 (this also succeeds)
3. write a result of a query against #1 and #2 to a parquet file in S3 (also succeeds)
4. read a configuration csv file from the same bucket #3 is written to. (this fails)
These are the various approaches that I have tried in code:
1. val osRDD = spark.read.option("header","true").csv("s3://bucket/path/")
2. val osRDD = spark.read.format("com.databricks.spark.csv").option("header", "true").load("s3://bucket/path/")
All variations of the two above with s3, s3a and s3n prefixes work fine from the REPL but inside a JAR they return this:
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: s3://bucket/path/eventsByOS.csv
So, it found the file but can't read it.
Thinking this was a permissions issue, I have tried:
a. export AWS_ACCESS_KEY_ID=<access key> and export AWS_SECRET_ACCESS_KEY=<secret> from the Linux prompt. With Spark 2 this has been sufficient to provide us access to the S3 folders up until now.
b. .config("fs.s3.access.key", <access>)
.config("fs.s3.secret.key", <secret>)
.config("fs.s3n.access.key", <access>)
.config("fs.s3n.secret.key", <secret>)
.config("fs.s3a.access.key", <access>)
.config("fs.s3a.secret.key", <secret>)
Before this failure, the code reads from parquet files located in the same bucket and writes parquet files to the same bucket. The CSV file is only 4.8 KB in size.
Any ideas why this is failing?
Thanks!
Adding stack trace:
org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:253)
org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:201)
org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:281)
org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
scala.Option.getOrElse(Option.scala:121)
org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
scala.Option.getOrElse(Option.scala:121)
org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
scala.Option.getOrElse(Option.scala:121)
org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1332)
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
org.apache.spark.rdd.RDD.take(RDD.scala:1326)
org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1367)
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
org.apache.spark.rdd.RDD.first(RDD.scala:1366)
org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.findFirstLine(CSVFileFormat.scala:206)
org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.inferSchema(CSVFileFormat.scala:60)
org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:184)
org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:184)
scala.Option.orElse(Option.scala:289)
org.apache.spark.sql.execution.datasources.DataSource.org$apache$spark$sql$execution$datasources$DataSource$$getOrInferFileFormatSchema(DataSource.scala:183)
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:387)
org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:415)
org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:352)
Nothing springs out when I paste that stack into the IDE, but I'm looking at a later version of Hadoop and can't currently switch to older ones.
Have a look at these instructions.
That landsat gz file is actually a CSV file you can try to read; it's the one we generally use for testing because it's there and free to use. Start by seeing if you can work with it.
If using Spark 2.0, use Spark's own CSV package.
Do use s3a, not the others.
I solved this problem by adding the Hadoop configuration specific to the filesystem scheme in use (s3 in the example here). The odd thing is that the credential setup above works for everything in Spark 2.0 EXCEPT reading the CSV.
This code solved my problem using S3.
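// Set S3 credentials directly on the Hadoop configuration (legacy s3:// filesystem keys)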
spark.sparkContext.hadoopConfiguration.set("fs.s3.awsAccessKeyId", p.aws_accessKey)
spark.sparkContext.hadoopConfiguration.set("fs.s3.awsSecretAccessKey",p.aws_secretKey)

Import csv file in neo4j from local disk

I am facing difficulty importing a CSV file into Neo4j. I am working on Windows and have been trying this:
LOAD CSV WITH HEADERS FROM "file:c:/path/to/data.csv" AS submissions
CREATE (a1:Submission {preview: submissions.preview, secure_media_embed: submissions.secure_media_embed, media: submissions.media, secure_media: submissions.secure_media, media_embed: submissions.media_embed})
Getting error:
URI is not hierarchical
Any suggestion on what I am doing wrong here? I have been following blogs and they all suggest this.
Edit the Neo4j conf file (/etc/neo4j/neo4j.conf) and change the line
dbms.directories.import=import
to
dbms.directories.import=/home/suyati/Downloads/
to load a file from Downloads.
In the Neo4j browser:
load csv with headers from "file:///1.csv" as row
(Your file should then be at /home/suyati/Downloads/1.csv.)
It will work fine.
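Alternatively (as the last answer on this page notes), if dbms.directories.import is not constrained, a Windows path can be referenced directly by using three slashes and forward slashes; a sketch based on the question's query, with the property list shortened:
LOAD CSV WITH HEADERS FROM "file:///c:/path/to/data.csv" AS submissions
CREATE (a1:Submission {preview: submissions.preview, media: submissions.media})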

I get a mysterious "Neo.ClientError.Statement.InvalidSyntax" error when loading a CSV in Neo4j

For a course on Excel I was trying to load a CSV into Neo4j (first time using this application) when I was blocked at the first step of replicating an example shown in said course: loading.
The command used in the example was this:
LOAD CSV WITH HEADERS FROM "file:/path/to/file/file.csv"
as row
CREATE (m:movie {name:row.movie})
But it gave syntax errors. I found out I could correct it by using double backslashes and adding "file:":
LOAD CSV WITH HEADERS FROM "file://C:\\path\\to\\file\\file.csv"
as row
CREATE (m:movie {name:row.movie})
Neo4j accepts this syntax, processes for a few moments, and returns YET ANOTHER error:
Neo.TransientError.Statement.ExternalResourceFailure
I tried the same commands (original and my own) in the online Neo4j console, but no luck. I can reach the file using that path without a problem; it really is there. The CSV file consists of just 5 strings of regular letters, that's all. No fancy formatting or characters.
What's going on?
Not that mysterious: Neo4j's LOAD CSV looks for the specified CSV file in the import directory within your server configuration for that database, as specified in its configuration file (i.e. dbms.directories.import=import in your neo4j.conf file).
You should create the import directory in:
"C:\Users\[User Name]\Documents\Neo4j\default.graphdb\"
If you place your CSV file there, you can specify any sub-directory, or just the "file.csv" you want to import with LOAD CSV, as below.
LOAD CSV WITH HEADERS FROM "file:///file.csv"
AS row
RETURN row
LIMIT 5
Try using:
"file:///C:/path/to/file/file.csv"
Since your file is on your local computer, the third / following the file scheme is not preceded by a host name or address, but it still needs to be there. Also, file URI path separators should be forward slashes (even on Windows machines).
See the File URI scheme Wikipedia page if you need more information.
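Putting that together with the original command from the question, a minimal sketch:
LOAD CSV WITH HEADERS FROM "file:///C:/path/to/file/file.csv"
as row
CREATE (m:movie {name: row.movie})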