I am using comma-separated value (CSV) files to create nodes and edges in a Neo4j database. The commands that create nodes run without issue. The attempt to create edges fails with this error:
Exception in thread "GC-Monitor" java.lang.OutOfMemoryError: GC
overhead limit exceeded
Exception: java.lang.OutOfMemoryError thrown from the
UncaughtExceptionHandler in thread "GC-Monitor"
Further, in the output from the commands there was this:
neo4j-sh (?)$ using periodic commit 400 load csv with headers from 'file://localhost/tmp/vm2set3.csv' as line match (u:VM {id: line.vm_id}),(s:VNIC {id: line.set3_id}) create (u)-[:VNIC]->(s);
SystemException: Kernel has encountered some problem, please perform neccesary action (tx recovery/restart)
neo4j-sh (?)$
SystemException: Kernel has encountered some problem, please perform neccesary action (tx recovery/restart)
neo4j-sh (?)$ using periodic commit 400 load csv with headers from 'file://localhost/tmp/unix2switch.csv' as line match (u:UNIX {id: line.intf_id}),(s:switch {id: line.set2a_id}) create (u)-[:cable]->(s);
SystemException: Kernel has encountered some problem, please perform neccesary action (tx recovery/restart)
neo4j-sh (?)$
My shell script is:
cat /home/ES2Neo/2.1/neo4j_commands.cql | /export/neo4j-community-2.1.4/bin/neo4j-shell -path /export/neo4j-community-2.1.4/data/graph.db > /tmp/na.out
The commands are like this:
load csv WITH HEADERS from 'file://localhost/tmp/intf.csv' AS line CREATE (:UNIX {id: line.id, MAC: line.MAC ,BIA: line.BIA ,host: line.host,name: line.name});
for nodes, and
using periodic commit 400 load csv with headers from 'file://localhost/tmp/unix2switch.csv' as line match (u:UNIX {id: line.intf_id}),(s:switch {id: line.set2a_id}) create (u)-[:cable]->(s);
for edges.
The csv input files look like this:
"intf_id","set2a_id"
"100321","6724919"
"125850","6717849"
"158249","6081895"
"51329","5565380"
"57248","6680663"
"235196","6094139"
"229242","4800249"
"225630","6661742"
"183281","4760022"
Is there something I am doing wrong? Is there something in Neo4j configuration I need to check? Thanks.
The problem is that you're running out of memory for loading the data into the database.
Take a look at this blog post which goes into a number of details about how to load CSV data in successfully.
In particular, here's the key bit from the blog post you should pay attention to.
The more memory you have the faster it will import your data.
So make sure to edit conf/neo4j-wrapper.conf and set:
# Uncomment these lines to set specific initial and maximum
# heap size in MB.
wrapper.java.initmemory=4096
wrapper.java.maxmemory=4096
In conf/neo4j.properties set:
# Default values for the low-level graph engine
neostore.nodestore.db.mapped_memory=50M
neostore.relationshipstore.db.mapped_memory=500M
neostore.propertystore.db.mapped_memory=100M
neostore.propertystore.db.strings.mapped_memory=100M
neostore.propertystore.db.arrays.mapped_memory=0M
I'm using the MariaDB C connector with prepare, bind and execute. It usually works, but one case ends in "corrupted unsorted chunks" and a core dump when freeing the bind buffer. I suspect the whole malloc organisation is corrupted after calling mysql_stmt_execute(). My test program MysqlDynamic.c shows:
the problem is connected only to the x509cert variable bound by bnd[9]
freeing memory fails only if bnd[9].is_null = 0; if it is null, execute ends normally
freeing memory (using FreeStmt()) after bind and before execute ends normally
printing bnd[9].buffer before execute shows that the (void*) points to the correct string buffer
the behavior is the same whether bnd[9].buffer_length is set to STMT_INDICATOR_NTS or strlen()
other similar bindings (picture, bnd[10]) do not lead to corrupted memory or a core dump.
I defined a C structure test for the test data in my test program MysqlDynamic.c, which is bound in a MYSQL_BIND structure.
For the bindings for x509cert (string buffer), see bindInsTest():
bnd[9].buffer_type = MYSQL_TYPE_STRING;
bnd[9].buffer_length = STMT_INDICATOR_NTS;
bnd[9].is_null = &para->x509certI;
bnd[9].buffer = (void*) para->x509cert;
Please take the details from the source file MysqlDynamic.c. Adapt the defines in the source to your environment, verify the content, and run it. You will find compile info in the source code. MysqlDynamic -c creates the table, MysqlDynamic -i inserts 3 records on each run, and MysqlDynamic -d drops the table again.
MysqlDynamic -vc shows:
session set autocommit to <0>
connection id: 175
mariadb server ver:<100408>, client ver:<100408>
connected on localhost to db test by testA
>> if program get stuck - table is locked
table t_test created
mysql connection closed
pgm ended normaly
MysqlDynamic -i shows:
ins2: BufPara <92> name<master> stamp<> epoch<1651313806000>
cert is cert<(nil)> buf<(nil)> null<1>
picure is pic<0x5596a0f0c220> buf<0x5596a0f0c220> null<0> length<172>
ins1: BufPara <91> name<> stamp<2020-04-30> epoch<1650707701123>
cert is cert<0x5596a0f181d0> buf<0x5596a0f181d0> null<0>
picure is pic<(nil)> buf<(nil)> null<1> length<0>
ins0: BufPara <90> name<gugus> stamp<1988-10-12T18:43:36> epoch<922337203685477580>
cert is cert<(nil)> buf<(nil)> null<1>
picure is pic<(nil)> buf<(nil)> null<1> length<0>
free(): corrupted unsorted chunks
Aborted (core dumped)
Checking the t_test table content shows that all records are inserted as expected.
You can disable loading of x509cert and/or picture by commenting out the defines on lines 57/58. The program then ends normally. You can also comment out line 208; the buffers are then indicated as NULL.
Questions:
Is there a generic coding mistake in the program causing this behavior?
Can you run the program in your environment without a core dump? I'm currently using version 10.04.08.
Any improvement to the code is welcome.
I'm using v5.1.1 of JMeter and attempting to use the "CSV Data Set Config". The file is read correctly as I can tell from the Debug Sampler/Results Tree, but the file is not being read line by line. In other words, it reads the first line and never proceeds to the next line for processing.
I would like to use the data inside the CSV to iterate over a series of HTTP Requests to an external API. I currently have a single thread with only the "CSV Data Set Config" and "HTTP Request".
Do I need to wrap this with a ForEach controller or another looping construct? Perhaps I'm missing it, but I do not see anything in the documentation that would indicate it's necessary.
Thanks
You don't need to wrap this in a ForEach loop. The first line in the CSV file holds the variable names:
Let's say your csv file looks like
foo, bar
1, John
2, George
3, Laura
And you use an http request sampler
then ${foo} and ${bar} will get iterated sequentially. However, please be mindful of the CSV Data Set Config options. The following options work OK for me:
By default, CSV Data Set Config doesn't trigger any "looping"; it reads the next line from the CSV file for each thread (virtual user) on each iteration.
So if you want to see more values from the CSV file - either add more users or loops or both.
Given
This CSV file:
line1
line2
line3
Following CSV Data Set Config setup:
And the following Thread Group setup:
You will get the following values (assuming __threadNum() function to visualize current virtual user number and ${__jm__Thread Group__idx} pre-defined variable to show current Thread Group iteration) :
Check out JMeter Parameterization - The Complete Guide article for more information on various approaches on parameterizing JMeter tests using external data sources
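To make the "next line per thread per iteration" rule above concrete, here is a minimal Python sketch that models it. This is not JMeter code: the function name simulate_csv_data_set is made up for illustration, it assumes "Recycle on EOF" is enabled, and it hands out lines in a fixed thread order, whereas real JMeter threads run concurrently and may interleave differently.

```python
from itertools import cycle


def simulate_csv_data_set(lines, threads, loops):
    """Model a shared CSV Data Set Config: every thread takes the next
    line on every iteration, wrapping around at the end of the file."""
    source = cycle(lines)  # assumption: "Recycle on EOF" is enabled
    results = []
    for iteration in range(1, loops + 1):
        for thread in range(1, threads + 1):
            # Real threads run concurrently; this fixed order is a simplification.
            results.append((thread, iteration, next(source)))
    return results


for thread, iteration, value in simulate_csv_data_set(
    ["line1", "line2", "line3"], threads=2, loops=2
):
    print(f"thread {thread}, iteration {iteration}: {value}")
```

With threads=1 and loops=1 the model returns only line1, which matches the symptom in the question: a single user running a single iteration never gets past the first row.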
I have a CSV data file with a header row that I am using to populate a BigQuery table:
$ cat dummy.csv
Field1,Field2,Field3,Field4
10.5,20.5,30.5,40.5
10.6,20.6,30.6,40.6
10.7,20.7,30.7,40.7
When using the Web UI, there is a text box where I am able to specify how many header rows to skip. However, if I upload the data into BigQuery using the bq command line tool, I do not have an option to do this, and always get the following error:
$ bq load my-project:my-dataset.dummydata dummy.csv Field1:float,Field2:float,Field3:float,Field4:float
Upload complete.
Waiting on bqjob_r7eccfe35f_0000015e3e8c_1 ... (0s) Current status: DONE
BigQuery error in load operation: Error processing job 'my-project:bqjob_r7eccfe35f_0000015e3e8c_1': CSV table encountered too many errors, giving up. Rows: 1;
errors: 1.
Failure details:
- file-00000000: Could not parse 'Field1' as double for field Field1
(position 0) starting at location 0
The bq command line tool quickstart documentation also does not mention any options for skipping headers.
One simple/obvious solution is to edit dummy.csv to remove the header row, but this is not an option if pointing to a CSV file on Google Cloud Storage instead of the local file dummy.csv.
This is possible to do through the web interface, and through the Python API, so it should also be possible to do with the bq tool.
Checking bq help load revealed a --skip_leading_rows option:
--skip_leading_rows : The number of rows at the beginning of the source file to skip.
(an integer)
Also found this option in the bq command line tool documentation (which is not the same as the quickstart documentation, linked to above).
Adding a --skip_leading_rows=1 to the bq load command worked like a charm.
Here is the successful command:
$ bq load --skip_leading_rows=1 my-project:my-dataset.dummydata dummy.csv Field1:float,Field2:float,Field3:float,Field4:float
Upload complete.
Waiting on bqjob_r43eb07bad58_0000015ecea_1 ... (0s) Current status: DONE
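For anyone wondering why the header row triggers the "Could not parse 'Field1' as double" error, here is a small standard-library Python sketch of the same idea. It does not call BigQuery at all; parse_float_rows is a hypothetical helper that only mimics what skipping leading rows does before the values are parsed as floats.

```python
import csv
import io


def parse_float_rows(text, skip_leading_rows=0):
    """Parse CSV text as rows of floats, ignoring the first N rows."""
    reader = csv.reader(io.StringIO(text))
    rows = []
    for index, row in enumerate(reader):
        if index < skip_leading_rows:
            continue  # header rows are dropped, not parsed
        rows.append([float(cell) for cell in row])
    return rows


dummy = "Field1,Field2,Field3,Field4\n10.5,20.5,30.5,40.5\n10.6,20.6,30.6,40.6\n"

try:
    parse_float_rows(dummy)  # the header is treated as data and fails
except ValueError as error:
    print("without skipping:", error)

print(parse_float_rows(dummy, skip_leading_rows=1))
```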
What I'm trying to import is a CSV file with phone calls, and represent it as phone numbers in nodes and each call as an arrow.
The file is separated by pipes.
I have tried a first version:
load csv from 'file:///com.csv' as line FIELDTERMINATOR '|'
with line
merge (a:line {number:COALESCE(line[1],"" )})
return line
limit 5
and it worked as expected: one node (the outgoing number) is created for each row.
After that I could test what I've done with a simple
Match (a) return a
So the next step I tried was creating the second node of the call (the receiver):
load csv from 'file:///com.csv' as line FIELDTERMINATOR '|'
with line
merge (a:line {number:COALESCE(line[1],"" )})
merge (b:line {number:COALESCE(line[2],"" )})
return line
limit 5
After I run this code I receive no response (I'm using the browser GUI at localhost:7474/browser), and if I try to run any other query on this server I get no result either.
So again if I run
match (a) return a
nothing happens.
The only way I've found to recover is stopping the server and starting it again.
Any ideas?
It is possible that opening that big file twice causes the problem, because how big files are handled depends heavily on the operating system.
Anyway, if you accidentally run it without the 'limit 5' clause, this can happen, since you are then trying to load the 26GB in a single transaction.
Since LOAD CSV is meant for medium-sized datasets, I recommend two solutions:
- Using the neo4j-import tool, or
- Splitting the file into smaller parts and using periodic commit to prevent out-of-memory situations and hangs, like this:
USING PERIODIC COMMIT 100000
LOAD CSV FROM ...
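Splitting the file can be scripted. The following is a rough Python sketch, not a polished tool: split_file and the part naming scheme (com.csv becomes com.part0001.csv and so on) are my own inventions, and it assumes every record sits on exactly one line, so quoted fields with embedded line breaks would be cut apart. It streams the input, so the whole file never has to fit in memory.

```python
from itertools import islice
from pathlib import Path


def split_file(source, rows_per_part):
    """Split a text file into numbered parts of at most rows_per_part lines."""
    source = Path(source)
    parts = []
    with source.open("r", encoding="utf-8", newline="") as handle:
        while True:
            # Read the next batch of lines without loading the whole file.
            chunk = list(islice(handle, rows_per_part))
            if not chunk:
                break
            # Hypothetical naming scheme: com.csv -> com.part0001.csv
            name = f"{source.stem}.part{len(parts) + 1:04d}{source.suffix}"
            part = source.with_name(name)
            with part.open("w", encoding="utf-8", newline="") as out:
                out.writelines(chunk)
            parts.append(part)
    return parts
```

Each returned path can then be loaded with its own LOAD CSV statement. The file in the question has no header row; if yours has one, it would need to be repeated at the top of every part.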
I'm trying to import roughly 64 thousand rows into a Neo4j graph. During the import I'm converting some attributes to relationships with a MERGE, as these are used by other fields as well.
This is my cypher query:
USING PERIODIC COMMIT 150
LOAD CSV WITH HEADERS FROM "http://example.com/some.csv" as csvline
MERGE (gem:Gemeente { name: csvline.GEMEENTE})
MERGE (cbs:CBS { name: csvline.CBSCODE})
CREATE (obj:Object { id: toInt(csvline.NUMMER),
prop2: toInt(csvline.PROP2)
})
CREATE (obj)-[:IN_GEMEENTE]->(gem)
CREATE (obj)-[:CBS_CODE]->(cbs)
When I manually truncate the csv-file to 10 rows; this cypher runs perfectly. I'm getting a nice graph with the appropriate relationships.
But when I run the Cypher script against every row in my CSV file, the server just stalls with an error/warning.
Within the dashboard at 7474 I'm just getting a plain simple error, without any information. While in the neo4j shell I'm getting the following error:
Error occurred in server thread; nested exception is:
java.lang.OutOfMemoryError: Java heap space
So it appears I'm running out of memory. So I tried to reduce the commit number; but this has no effect.
Of course I have indexes on both :Gemeente(naam) and :CBS(naam).
A solution could be to split up the file into 'affordable' chunks, but that's of course a lot of work :) and not a real solution.
How can I resolve this issue?
You're probably running into the "eager" issue. It's discussed in these posts:
http://jexp.de/blog/2014/10/load-cvs-with-success/
http://www.markhneedham.com/blog/2014/10/23/neo4j-cypher-avoiding-the-eager/
It will probably work better like this:
USING PERIODIC COMMIT 150
LOAD CSV WITH HEADERS FROM "http://example.com/some.csv" as csvline
MERGE (gem:Gemeente { name: csvline.GEMEENTE});
USING PERIODIC COMMIT 150
LOAD CSV WITH HEADERS FROM "http://example.com/some.csv" as csvline
MERGE (cbs:CBS { name: csvline.CBSCODE});
USING PERIODIC COMMIT 150
LOAD CSV WITH HEADERS FROM "http://example.com/some.csv" as csvline
CREATE (obj:Object { id: toInt(csvline.NUMMER),
prop2: toInt(csvline.PROP2)
})
WITH obj, csvline
MATCH
(gem:Gemeente { name: csvline.GEMEENTE}),
(cbs:CBS { name: csvline.CBSCODE})
CREATE (obj)-[:IN_GEMEENTE]->(gem)
CREATE (obj)-[:CBS_CODE]->(cbs);
You may not need to split it up as much as that, though. Also, since you'd need to load the CSV file at least twice, you might want to save it locally and run the CSV import from disk. The syntax is LOAD CSV WITH HEADERS FROM "file:///path/to/file" as csvline (I had lots of trouble finding an example when I first tried it. It's file:// followed by the path. My example is a Unix path, but that can also be followed by a Windows path, I believe.)
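If you generate the import statements from a script, the file:// form described above can be built mechanically. Below is a small Python sketch; csv_file_url is a hypothetical helper, and the Windows form it produces (file:///C:/...) is the conventional file URL shape, which I am only assuming Neo4j accepts, in line with the "I believe" above.

```python
from urllib.parse import quote


def csv_file_url(path):
    """Build a file:// URL from an absolute Unix or Windows path string."""
    normalized = path.replace("\\", "/")
    if not normalized.startswith("/"):
        normalized = "/" + normalized  # e.g. C:/data -> /C:/data
    # Percent-encode spaces and similar characters, keeping "/" and ":" intact.
    return "file://" + quote(normalized, safe="/:")


print(csv_file_url("/path/to/file.csv"))
print(csv_file_url("C:\\data\\some file.csv"))
```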