I have 2 CSV files which I want to convert into a Neo4j database. They look like this:
first file:
name,enzyme
Aminomonas paucivorans,M1.Apa12260I
Aminomonas paucivorans,M2.Apa12260I
Bacillus cellulosilyticus,M1.BceNI
Bacillus cellulosilyticus,M2.BceNI
second file:
name,motif
Aminomonas paucivorans,GGAGNNNNNGGC
Aminomonas paucivorans,GGAGNNNNNGGC
Bacillus cellulosilyticus,CCCNNNNNCTC
As you can see, the common factor between the files is the name of the organism. Each organism will have a few enzymes and each enzyme will have one motif. Motifs can be the same between enzymes. I used the following statements to create my database:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file1.csv" AS csvLine
MATCH (o:Organism { name: csvLine.name}),(e:Enzyme { name: csvLine.enzyme})
CREATE (o)-[:has_enzyme]->(e) //or maybe CREATE UNIQUE?
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file2.csv" AS csvLine
MATCH (o:Organism { name: csvLine.name}),(m:Motif { name: csvLine.motif})
CREATE (o)-[:has_motif]->(m) //or maybe CREATE UNIQUE?
This gives me an error on the very first line, at USING PERIODIC COMMIT, which says Invalid input 'S': expected. If I get rid of it, the next error I get is WITH is required between CREATE and LOAD CSV (line 6, column 1)
"MATCH (o:Organism { name: csvLine.name}),(m:Motif { name: csvLine.motif})". I googled this issue, which led me to this answer. I tried the answer given there (refreshing the browser cache) but the problem persists. What am I doing wrong here? Is the query correct? Is there another solution to this issue? Any help will be greatly appreciated.
Your queries have two issues at once:
You can't refer to a local file just with "file1.csv", because Neo4j expects a URL
You're using MATCH in cases where the data may not already exist; you need to use MERGE there instead, which basically acts like the CREATE UNIQUE you suggested in your comments.
I don't know what the source of your specific error message is, but as written it doesn't look like these queries could possibly work. Here are your queries reformulated so that they will work (I tested them on my machine with your CSV samples):
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/home/myuser/tmp/file1.csv" AS csvLine
MERGE (o:Organism { name: coalesce(csvLine.name, "No Name")})
MERGE (e:Enzyme { name: csvLine.enzyme})
MERGE (o)-[:has_enzyme]->(e);
Notice the 3 MERGE statements here (MERGE basically does a MATCH, plus a CREATE if the pattern doesn't already exist), and the fact that I've used a file: URL.
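Because MERGE is idempotent, running the same import twice won't create duplicate nodes or relationships. A quick sanity check against your sample data (the expected count comes straight from the two M1./M2. rows in the first file):
MATCH (o:Organism { name: "Aminomonas paucivorans" })-[:has_enzyme]->(e:Enzyme)
RETURN count(e); // expect 2, even after re-running the import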
The second query gets formulated basically the same way:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/home/myuser/tmp/file2.csv" AS csvLine
MERGE (o:Organism { name: coalesce(csvLine.name, "No Name")})
MERGE (m:Motif { name: csvLine.motif})
MERGE (o)-[:has_motif]->(m);
EDIT I added coalesce for the Organism's name property. If you have null values for name in the CSV, the query would otherwise fail. Coalesce guarantees that if csvLine.name is null, you'll get back "No Name" instead.
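You can see what coalesce does with a throwaway query (the null below just stands in for a missing CSV field):
RETURN coalesce(null, "No Name") AS name; // returns "No Name"
With a non-null first argument it simply returns that argument unchanged.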
I have a CSV file like the sample below, and I want to import it into Neo4j and create nodes and relationships.
"N_ID","Name","Relationship","A_ID","Address"
"N_01","John Doe","resident","A_01","1138 Mapleview Drive"
"N_02","Jane Doe","resident","A_01","1138 Mapleview Drive"
"N_03","Randall L Russo","visitor","A_02","866 Sweetwood Drive"
"N_04","Sam B Haley","resident","A_03","152 Point Street"
"N_01","John Doe","mailing address","A_04",3490 Horizon Circle"
I'm able to create nodes using the code below, but I don't know how to create the relationships based on the CSV file.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM
'file://contacts.csv' AS line
CREATE (:Person {ID:line.N_ID, name:line.Name})
I tried this, but it doesn't work.
CREATE (:Person {N_ID:line.N_ID, Name:line.Name})-[:line.Relationship]-> (:Address {A_ID:line.A_ID, Address:line.Address})
Please bear with me I'm new to Neo4j.
Plain Cypher doesn't allow a relationship type to come from a variable like line.Relationship, which is why your CREATE attempt fails. Install the APOC plugin and try this query:
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file://contacts.csv' AS line
MERGE (p1:Person {N_ID:line.N_ID})
ON CREATE SET p1.Name=line.Name
MERGE (a1:Address {A_ID:line.A_ID})
ON CREATE SET a1.Address=line.Address
WITH a1,p1,line
CALL apoc.merge.relationship(p1,line.Relationship,{},{},a1) YIELD rel
RETURN count(*);
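The call takes the start node, the relationship type as a string (here line.Relationship), two property maps, and the end node, so the type is resolved at run time, which plain Cypher can't do. A hypothetical spot check after the import, just to confirm the dynamic types came through:
MATCH (p:Person)-[r]->(a:Address)
RETURN p.Name, type(r), a.Address LIMIT 5;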
I am querying data from an MS SQL Server and saving it into CSVs. With those CSVs, I am modelling the data in Neo4j. But the MS SQL database updates regularly, so I also want to update the Neo4j data on a regular basis. Neo4j has two types of nodes, X and Y. The query and indexes below are used to model the data:
CREATE INDEX ON :X(X_Number, X_Description,X_Type)
CREATE INDEX ON :Y(Y_Number, Y_Name)
using periodic commit
LOAD CSV WITH HEADERS FROM "file:///CR_n_Feature_new.csv" AS line
MERGE (x:X {
    X_Number: line.X_num,
    X_Description: line.X_txt,
    X_Type: line.X_Type
})
MERGE (y:Y {
    Y_Number: line.Y_number,
    Y_Name: line.Y_name
})
MERGE (y)-[:delivered_by]->(x)
Now there are two kinds of updates:
There may be new X and Y nodes, which can be taken care of by the MERGE command.
But there can be modifications to the properties of already created X and Y nodes. For example, a node X {X_Number: 1, X_Description: "abc", X_Type: "z"} might, in the updated data, change to X {X_Number: 1, X_Description: "def", X_Type: "y"}.
So I don't want to create a new node for X_Number: 1, but just want to update the existing node's properties, X_Description and X_Type.
You could just re-write your query to support new nodes and changes to existing nodes by merging only on the X_Number or Y_Number attributes.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///CR_n_Feature_new.csv" AS line
MERGE (x:X {X_Number: line.X_num})
SET x.X_Description = line.X_txt, x.X_Type = line.X_Type
MERGE (y:Y {Y_Number: line.Y_number})
SET y.Y_Name = line.Y_name
MERGE (y)-[:delivered_by]->(x)
This way the MERGE statements will always match the existing X and Y nodes based on the X_Number and Y_Number attributes, which presumably are immutable. Then the existing X_Description, X_Type and Y_Name attributes will be updated with the newer values.
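If you ever need to treat first-time creation differently from an update (say, stamping when a node last changed), MERGE's ON CREATE SET / ON MATCH SET clauses split the two cases. A sketch of that variant, where the updated property is purely illustrative:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///CR_n_Feature_new.csv" AS line
MERGE (x:X {X_Number: line.X_num})
ON CREATE SET x.X_Description = line.X_txt, x.X_Type = line.X_Type
ON MATCH SET x.X_Description = line.X_txt, x.X_Type = line.X_Type, x.updated = timestamp()
MERGE (y:Y {Y_Number: line.Y_number})
ON CREATE SET y.Y_Name = line.Y_name
ON MATCH SET y.Y_Name = line.Y_name
MERGE (y)-[:delivered_by]->(x)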
I am using syslog-ng to parse some logs that I am receiving via a csv-parser. However, I want to achieve insert operations that are a bit more complex than the conventional insert using the "destination" option in syslog-ng. Currently, my destination into MySQL from my syslog-ng conf file looks like this:
destination d_sql_test {
    sql(
        type(mysql)
        host('<host>')
        username('<user>')
        password('<pass>')
        database('<db_name>')
        table('test')
        columns('col1')
        values('${val1}')
    );
};
However, this simply just inserts the contents of val1 into the column col1. I want to be able to specify my insert "logic" as shown in the example in this question.
I am unsure as to where to actually do this, and whether it is even supported by syslog-ng.
I think you can do this if you can somehow make the decision within syslog-ng.
You could try to use an in-list() filter to check if the username is already listed in a file. If it is not, you can send the log to the MySQL destination, and also to another destination (possibly a program() destination) that updates the file containing the list of users and reloads syslog-ng to update the in-list() filter.
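A rough sketch of that first option, reusing your d_sql_test destination (the list path, the s_src source name, and using val1 as the username macro are all assumptions on my part):
filter f_new_user {
    # true only when the username is NOT yet in the list file
    not in-list("/etc/syslog-ng/known_users.list", value("val1"));
};
log {
    source(s_src);
    filter(f_new_user);
    destination(d_sql_test);
};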
You can write a syslog-ng template-function in Python that implements the logic somehow, and for example sets a macro to 1 in the message if it should be sent to the database. Then you can use a filter for this macro in your log path with the mysql destination.
Or you can write a separate destination that does the work in Python: see Writing syslog-ng destinations in Python.
Also, you might want to post this question on the syslog-ng mailing list, where the developers notice it more easily.
I want to move one table with self reference from PostgreSQL to Neo4j.
PostgreSQL:
COPY (SELECT * FROM "public".empbase) TO '/tmp/empbase.csv' WITH CSV header;
Result:
$ cat /tmp/empbase.csv | head
e_id,e_name,e_bossid
1,emp_no_1,
2,emp_no_2,
3,emp_no_3,
4,emp_no_4,
5,emp_no_5,3
6,emp_no_6,2
7,emp_no_7,3
8,emp_no_8,1
9,emp_no_9,4
Size:
$ du -h /tmp/empbase.csv
631M /tmp/empbase.csv
I import the data into Neo4j with:
neo4j-sh (?)$ USING PERIODIC COMMIT 1000
> LOAD CSV WITH HEADERS FROM "file:/tmp/empbase.csv" AS row
> CREATE (:EmpBase:_EmpBase { neo_eb_id: toInt(row.e_id),
> neo_eb_bossID: toInt(row.e_bossid),
> neo_eb_name: row.e_name});
and this works fine:
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 20505764
Properties set: 61517288
Labels added: 41011528
846284 ms
The Neo4j console says:
Location:
/home/neo4j/data/graph.db
Size:
5.54 GiB
But then I want to proceed with the relation that each emp has a boss, a simple emp->bossid self reference.
Now I do it like this:
LOAD CSV WITH HEADERS FROM "file:/tmp/empbase.csv" AS row
MATCH (employee:EmpBase:_EmpBase {neo_eb_id: toInt(row.e_id)})
MATCH (manager:EmpBase:_EmpBase {neo_eb_id: toInt(row.e_bossid)})
MERGE (employee)-[:REPORTS_TO]->(manager);
But this runs for 5-6 hours and breaks in the end with system failures; it freezes the system.
I think something might be terribly wrong here.
1. Am I doing something wrong, or is this a bug in Neo4j?
2. Why, out of a 631 MB CSV, do I now get 5.5 GB?
EDIT1:
$ du -h /home/neo4j/data/
20K /home/neo4j/data/graph.db/index
899M /home/neo4j/data/graph.db/schema/index/lucene/1
899M /home/neo4j/data/graph.db/schema/index/lucene
899M /home/neo4j/data/graph.db/schema/index
27M /home/neo4j/data/graph.db/schema/label/lucene
27M /home/neo4j/data/graph.db/schema/label
925M /home/neo4j/data/graph.db/schema
6,5G /home/neo4j/data/graph.db
6,5G /home/neo4j/data/
SOLUTION:
Wait until the :schema in console says ONLINE, not POPULATING
Change the log size in the config file
Add USING PERIODIC COMMIT 1000 to the second CSV import
Index only on the label
Only match on one label: MATCH (employee:EmpBase {neo_eb_id: toInt(row.e_id)})
Did you create the index? CREATE INDEX ON :EmpBase(neo_eb_id);
Then wait for the index to come online (:schema in the browser).
Or, if it is a unique id: CREATE CONSTRAINT ON (e:EmpBase) ASSERT e.neo_eb_id IS UNIQUE;
Otherwise your match will scan all nodes in the database for each MATCH.
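Putting those pieces together with your solution list (index first, single label in the MATCH, periodic commit), the relationship pass might look like this:
CREATE INDEX ON :EmpBase(neo_eb_id);
// wait until :schema reports ONLINE before running the load below
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:/tmp/empbase.csv" AS row
MATCH (employee:EmpBase {neo_eb_id: toInt(row.e_id)})
MATCH (manager:EmpBase {neo_eb_id: toInt(row.e_bossid)})
MERGE (employee)-[:REPORTS_TO]->(manager);
Rows with an empty e_bossid (like the first four in your sample) simply fail the second MATCH and are skipped.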
For your second question, I think it's the transaction log files; you can limit their size in conf/neo4j.properties with
keep_logical_logs=100M size
The actual node and property store files shouldn't be that large. Also, you don't have to store the boss-id in the database; that's actually handled by the relationship :)
I am running a script that stores different datasets to a MySQL database. This works so far, but only sequentially. e.g.:
# write table1
replaceTable(con,tbl="table1",dframe=dframe1)
# write table2
replaceTable(con,tbl="table2",dframe=dframe2)
If I select both (I use StatET / Eclipse) and run the selection, I get an error:
Error in function (classes, fdef, mtable) :
unable to find an inherited method for function "dbWriteTable",
for signature "MySQLConnection", "data.frame", "data.frame".
I guess this has to do with the fact that my con is still busy or something when the second request is started. When I run the script line by line it works just fine. Hence I wonder: how can I tell R to wait until the first request is finished and only then go ahead? How can I make R scripts interactive (console only, like the plot examples; no tcl/tk)?
EDIT:
require(RMySQL)
replaceTable <- function(con, tbl, dframe){
  if (dbExistsTable(con, tbl)) {
    dbRemoveTable(con, tbl)
    dbWriteTable(con, tbl, dframe)
    cat("Existing database table updated / overwritten.")
  } else {
    dbWriteTable(con, tbl, dframe)
    cat("New database table created")
  }
}
dbWriteTable has two important arguments:
overwrite: a logical specifying whether to overwrite an existing table or not. Its default is FALSE.
append: a logical specifying whether to append to an existing table in the DBMS. Its default is FALSE.
For past projects I have successfully achieved appending, overwriting, and creating tables with proper combinations of these.
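Given those arguments, a minimal sketch of replaceTable that lets dbWriteTable handle the overwrite itself (row.names = FALSE is just a common choice here, not something from your setup):
require(RMySQL)

replaceTable <- function(con, tbl, dframe) {
  # overwrite = TRUE drops and recreates the table in a single call
  dbWriteTable(con, tbl, dframe, overwrite = TRUE, row.names = FALSE)
  cat("Database table created or overwritten.")
}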