Neo4j Cypher: MERGE conditionally with values from LOAD CSV

I'm trying to import from a CSV where some lines have an account number and some don't. Where accounts do have numbers I'd like to merge using them: there will be records where the name on an account has changed but the number will always stay the same. For the other records without an account number the best I can do is merge on the account name.
So really I need some kind of conditional: if a line has an account number, merge on that; otherwise, merge on the account name. Something like...
LOAD CSV WITH HEADERS FROM 'file:///testfile.csv' AS line
MERGE (x:Thing {
CASE line.accountNumber WHEN NULL
THEN name: line.accountName
ELSE number: line.accountNumber
END
})
ON CREATE SET
x.name = line.accountName,
x.number = line.accountNumber
Though of course that doesn't work. Any ideas?

To test for a 'NULL' value in a .csv file with LOAD CSV, you have to test for an empty string: a missing field comes through as "", not as null.
testfile.csv
acct_name,acct_num
John,1
Stacey,2
Alice,
Bob,4
This assumes the account names are unique...
LOAD CSV WITH HEADERS FROM 'file:///testfile.csv' AS line
// If acct_num is not null, merge on account number and set name if node is created instead of found.
FOREACH(number IN (CASE WHEN line.acct_num <> "" THEN [TOINT(line.acct_num)] ELSE [] END) |
MERGE (x:Thing {number:number})
ON CREATE SET x.name = line.acct_name
)
// If acct_num is null, merge on account name. This node will not have an account number if it is created instead of matched.
FOREACH(name IN (CASE WHEN line.acct_num = "" THEN [line.acct_name] ELSE [] END) |
MERGE (x:Thing {name:name})
)
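You can see the same behaviour with a plain CSV reader — a quick Python check reading the testfile.csv above shows the missing field arriving as an empty string rather than None:

import csv

# A missing field in a CSV row is an empty string, not None/null --
# which is why the Cypher above tests acct_num against "".
with open('testfile.csv', newline='') as f:
    for row in csv.DictReader(f):
        print(row['acct_name'], repr(row['acct_num']))

# John '1'
# Stacey '2'
# Alice ''    <- empty string for the missing account number
# Bob '4'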

Related

Update and insert from qtableWidget into MYSQL database more efficiently

I'm building a desktop app with PyQt5 to connect to, load data from, insert data into, and update a MySQL database. What I came up with for updating and inserting data works, but I feel there should be a much faster way to do it in terms of computation speed. If anyone could help, that would be really appreciated. What I have as of now for updating the database is this:
def log_change(self, item):
    self.changed_items.append([item.row(), item.column()])
    # I connect this function to the item changed signal to log any cells which have been changed

def update_db(self):
    # Creating an empty list to remove the duplicated cells from the initial list
    self.changed_items_load = []
    [self.changed_items_load.append(x) for x in self.changed_items if x not in self.changed_items_load]
    # loop through the changed_items list and remove cells with no values in them
    for db_wa in self.changed_items_load:
        if self.tableWidget.item(db_wa[0], db_wa[1]).text() == "":
            self.changed_items_load.remove(db_wa)
    try:
        mycursor = mydb.cursor()
        # loop through the list and update the database cell by cell
        for ecr in self.changed_items_load:
            command = "update table1 set `{col_name}` = %s where id=%s;"
            # table widget column name matches db table column name;
            # self.col_names is a list of the tableWidget columns
            data = (str(self.tableWidget.item(ecr[0], ecr[1]).text()),
                    int(self.tableWidget.item(ecr[0], 0).text()))
            mycursor.execute(command.format(col_name=self.col_names[ecr[1]]), data)
        mydb.commit()
        mycursor.close()
    except OperationalError:
        Msgbox = QMessageBox()
        Msgbox.setText("Error! Connection to database lost!")
        Msgbox.exec()
    except NameError:
        Msgbox = QMessageBox()
        Msgbox.setText("Error! Connect to database!")
        Msgbox.exec()
For inserting data and new rows into the db I was able to find some info online. But I have been unable to insert multiple rows at once, or to insert a varying number of columns for each row. Like if I want to insert only 2 columns at row 1, and then 3 columns at row 2... something like that.
def insert_db(self):
    # creating a list of each column
    self.a = [self.tableWidget.item(row, 1).text() for row in range(self.tableWidget.rowCount()) if self.tableWidget.item(row, 1) is not None]
    self.b = [self.tableWidget.item(row, 2).text() for row in range(self.tableWidget.rowCount()) if self.tableWidget.item(row, 2) is not None]
    self.c = [self.tableWidget.item(row, 3).text() for row in range(self.tableWidget.rowCount()) if self.tableWidget.item(row, 3) is not None]
    self.d = [self.tableWidget.item(row, 4).text() for row in range(self.tableWidget.rowCount()) if self.tableWidget.item(row, 4) is not None]
    try:
        mycursor = mydb.cursor()
        mycursor.execute("INSERT INTO table1(Name, Date, Quantity, Comments) VALUES ('%s', '%s', '%s', '%s')"
                         % (''.join(self.a), ''.join(self.b), ''.join(self.c), ''.join(self.d)))
        mydb.commit()
        mycursor.close()
    except OperationalError:
        Msgbox = QMessageBox()
        Msgbox.setText("Error! Connection to database lost!")
        Msgbox.exec()
    except NameError:
        Msgbox = QMessageBox()
        Msgbox.setText("Error! Connect to database!")
        Msgbox.exec()
Help would be appreciated. Thanks.
Like if I want to insert only 2 columns at row 1, and then 3 columns at row 2
No. A given database table has a specific number of columns; that is an integral part of the definition of a "table".
INSERT adds new rows to a table. It is possible to construct a single SQL statement that inserts multiple rows "all at once".
UPDATE modifies one or more rows of a table. The rows are indicated by some condition specified in the UPDATE statement.
Constructing SQL with %s string interpolation is risky -- it gets in trouble if there are quotes in the string being inserted, and it invites SQL injection. Parameter binding, as in the sketch below, avoids both.
(I hope these comments help you get to the next stage of understanding databases.)
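To make the multi-row insert concrete, here is a minimal, hedged sketch using executemany from mysql-connector-python. The table and column names come from the question; the connection details are placeholders:

import mysql.connector

# Placeholder connection details -- adjust for your setup.
mydb = mysql.connector.connect(host="localhost", user="user",
                               password="password", database="mydb")

def insert_rows(rows):
    """Insert many rows in one call.

    `rows` is a list of (name, date, quantity, comments) tuples. The %s
    placeholders are bound by the driver, so quotes inside the values
    cannot break the statement (and cannot inject SQL).
    """
    sql = ("INSERT INTO table1 (Name, Date, Quantity, Comments) "
           "VALUES (%s, %s, %s, %s)")
    mycursor = mydb.cursor()
    mycursor.executemany(sql, rows)  # the driver can batch this into one multi-row INSERT
    mydb.commit()
    mycursor.close()

insert_rows([
    ("Alice", "2020-01-01", 3, "first batch"),
    ("Bob",   "2020-01-02", 5, None),  # None becomes SQL NULL
])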

MATCH blocks for Neo4j, to prevent "null property value"-error

I'm creating Person-labelled nodes from a JSON, in Neo4j, with something like this (the JSON is blank here because of privacy/too much data):
WITH [] AS contacts
UNWIND contacts AS contact
But sometimes the person data doesn't have a first name and/or last name, so this query fails:
MERGE(personBind:Person {first_name: contact.first_name, last_name: contact.last_name})
It gives this error:
Cannot merge node using null property value
Is it possible to have a MATCH, with something like these blocks in it:
NOT MATCH
CANT CREATE (Don't do something with person, but make other stuff)
CAN CREATE (Make person and relations to person, and make other stuff)
ON MATCH (Make relations to person, and make other stuff)
This is the other code that I need to use, but it is too "attached" to the person (it will sometimes fail):
FOREACH (addr IN contact.addrs |
MERGE addrPath=((zipBind:ZipCode {zipcode: addr.zipcode})-[nz:NUMBER_IN_ZIPCODE]->(houseBind:House {number: addr.housenumber})<-[ns:NUMBER_IN_STREET]-(streetBind:Street))
MERGE(personBind)-[:WORKS_AT]->(houseBind)
)
FOREACH( phoneValue IN contact.phone | MERGE(phoneBind:Phone {number: phoneValue.value}) MERGE(personBind)-[:REACHABLE_BY]->(phoneBind) )
FOREACH( emailValue IN contact.email | MERGE(emailBind:Email {email: emailValue.value}) MERGE(personBind)-[:REACHABLE_BY]->(emailBind) )
Break your query up into two parts: first, address the universal parts (the contact data), then filter your query for people who have a first and last name and merge the other components.
UNWIND [] AS contact
WITH contact
UNWIND contact.addrs AS addr
MERGE (zipBind:ZipCode {zipcode: addr.zipcode})
MERGE (streetBind:Street) // does this need properties, like a name?
MERGE (streetBind) - [:NUMBER_IN_STREET] ->(houseBind:House {number: addr.housenumber})
MERGE (zipBind) -[:NUMBER_IN_ZIPCODE]-> (houseBind)
WITH contact, COLLECT(houseBind) AS houseBinds
UNWIND contact.phone AS phoneValue
MERGE(phoneBind:Phone {number: phoneValue.value})
WITH contact, houseBinds, COLLECT(phoneBind) AS phoneBinds
UNWIND contact.email AS emailValue
MERGE(emailBind:Email {email: emailValue.value})
WITH contact, houseBinds, phoneBinds, COLLECT(emailBind) AS emailBinds
// now filter for contacts that have person data
WHERE exists(contact.first_name) AND exists(contact.last_name)
MERGE (personBind:Person {first_name: contact.first_name, last_name: contact.last_name})
FOREACH (houseBind IN houseBinds | MERGE (personBind)-[:WORKS_AT]->(houseBind))
FOREACH (phoneBind IN phoneBinds | MERGE (personBind)-[:REACHABLE_BY]->(phoneBind))
FOREACH (emailBind IN emailBinds | MERGE (personBind)-[:REACHABLE_BY]->(emailBind))
If you need rows that don't have Person names later in the query, you can change the filter step to create two lists, one that has names, one that doesn't, but that's more complicated than your question requires.
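For illustration, that two-list split could even happen before the data reaches Cypher -- a minimal Python sketch, where load_contacts_from_json is a hypothetical helper standing in for however you parse your JSON:

# Hypothetical helper: returns a list of contact dicts parsed from your JSON.
contacts = load_contacts_from_json()

# Contacts with full person data can be MERGEd as :Person nodes;
# the rest can still get their address/phone/email nodes.
named, unnamed = [], []
for contact in contacts:
    has_name = contact.get("first_name") and contact.get("last_name")
    (named if has_name else unnamed).append(contact)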
EDITED: Your Street nodes have no properties, so this query is going to find the same street for every row. Is there something you want to assign there?
Use a combination of FOREACH and CASE: the CASE yields a one-element list when both names are present and an empty list otherwise, so the FOREACH body runs conditionally.
UNWIND [{lastName: 'Fam1', firstName: 'Nam1'}, {lastName: 'Fam2'}] AS contact
FOREACH (x IN CASE WHEN contact.lastName IS NOT NULL AND
                        contact.firstName IS NOT NULL
                   THEN [1] ELSE [] END |
  MERGE (personBind:TMP:Person {first_name: contact.firstName,
                                last_name: contact.lastName})
)
http://www.markhneedham.com/blog/2014/06/17/neo4j-load-csv-handling-conditionals/
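If the import is driven from an application anyway, the same guard can live in client code instead -- a minimal sketch with the official Neo4j Python driver, where the URI and credentials are placeholders:

from neo4j import GraphDatabase

# Placeholder connection details -- adjust for your setup.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

contacts = [{"firstName": "Nam1", "lastName": "Fam1"}, {"lastName": "Fam2"}]
merge_person = "MERGE (p:Person {first_name: $first, last_name: $last})"

with driver.session() as session:
    for contact in contacts:
        # Mirror the FOREACH/CASE guard: only MERGE when both names exist.
        if contact.get("firstName") is not None and contact.get("lastName") is not None:
            session.run(merge_person, first=contact["firstName"], last=contact["lastName"])

driver.close()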

How to find all possible trains from Source and Destination?

I have a table named Route that stores all possible Routes of Trains.
I need to write a query to find all possible Train_ID such that Station_ID of Source = "NDLS" and Station_ID of Destination = "KNP".
My attempt:
Select t.Train_ID from Route as t,Route as d where t.Train_ID = d.Train_ID and t.Stop_Number < d.Stop_Number and t.Station_ID = "KNP" and d.Station_ID = "NDLS";
But this is returning empty set.
select t.train_id
, case when t.station_id = 'NDLS' then t.station_id end as source
, case when t.station_id = 'KNP' then t.station_id end as destination
from route t;
This will give you the train_id in the first column; the second column shows 'NDLS' on rows where that stop is the source, and the third shows 'KNP' on rows where it is the destination.
You might get NULL for the destination where a source value is present, and vice versa.
I hope this is OK for you.
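For what it's worth, the original self-join looks close: it seems to return an empty set because the station conditions are inverted relative to the stop-order check (it asks for 'KNP' at an earlier stop than 'NDLS'). A hedged sketch of the corrected join, run here through mysql-connector-python with placeholder connection details:

import mysql.connector

# Placeholder connection details -- adjust for your setup.
mydb = mysql.connector.connect(host="localhost", user="user",
                               password="password", database="railways")

# Self-join Route to itself: s is the source stop, d the destination stop.
query = """
    SELECT s.Train_ID
    FROM Route AS s
    JOIN Route AS d ON d.Train_ID = s.Train_ID
    WHERE s.Station_ID = 'NDLS'          -- source
      AND d.Station_ID = 'KNP'           -- destination
      AND s.Stop_Number < d.Stop_Number  -- the source stop must come first
"""

mycursor = mydb.cursor()
mycursor.execute(query)
for (train_id,) in mycursor.fetchall():
    print(train_id)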

How to do this lookup more elegantly in Python and MySQL?

I have a bunch of files in a directory named with nameid_cityid.txt, nameid and cityid being the ids of name (integer(10)) and city (integer(10)) in mydata table.
While the following solution works, I am doing type conversions, since fetchall returns long integers (the trailing 'L') while the nameid and cityid parsed from the file name are strings.
If you can suggest a pythonic or more elegant way of doing the same, that will be awesome, for me and the community!
What I am trying to achieve :
Find those files from a directory that don't have a record in the database and then do something with that file, like parse/move/delete it.
MySQL table mydata:
nameid cityid
15633 45632
2354 76894
Python:
for pdffile in os.listdir(filepath):
    cityid, nameid = pdffile.strip('.txt').split('_')[0], pdffile.strip('.txt').split('_')[1]
    cursor.execute("select cityid, nameid from mydata")
    alreadyparsed = cursor.fetchall()
    targetvalues = ((str(cityid), str(nameid)) for cityid, nameid in alreadyparsed)
    if (int(cityid), int(nameid)) in alreadyparsed:
        print cityid, nameid, "Found"
    else:
        print cityid, nameid, "Not found"
I'd use a set for quick and easy testing:
cursor.execute("select CONCAT(nameid, '_', cityid, '.txt') from mydata")
present = set([r[0] for r in cursor])
for pdffile in os.listdir(filepath):
nameid, cityid = map(int, pdffile.rsplit('.', 1)[0].split('_'))
print nameid, cityid,
print "Found" if pdffile in present else "Not found"
First, I've pulled the query outside of the filename loop; no point in querying the same set of rows each time.
Secondly, I'll let MySQL generate filenames for me using CONCAT for ease of collecting the information into a set.
Thirdly, because we now have a set of filenames, testing each individual filename against the set is a simple pdffile in present test.
And finally, I've simplified your filename splitting logic to one line.
Now, if all you want is a set of filenames that are not present yet in the database (rather than enumerate which ones are and which ones are not), just use a set operation:
cursor.execute("select CONCAT(nameid, '_', cityid, '.txt') from mydata")
present = set([r[0] for r in cursor])
for pdffile in (set(os.listdir(filepath)) - present):
nameid, cityid = map(int, pdffile.rsplit('.', 1)[0].split('_'))
print nameid, cityid, "Found"
Here we use the .difference operation (with the - operator) to remove all the filenames for which there are already rows in the database, in one simple operation.
You could perform the concatenation in SQL, which will return a string:
SELECT CONCAT(nameid, '_', cityid, '.txt') FROM mydata

Ruby on Rails export to csv - maintain mysql select statement order

Exporting some data from MySQL to a CSV file using FasterCSV. I'd like the columns in the output CSV to be in the same order as the select statement in my query.
Example:
rows = Data.find(
  :all,
  :select => 'name, age, height, weight'
)

headers = rows[0].attributes.keys
FasterCSV.generate do |csv|
  csv << headers
  rows.each do |r|
    csv << r.attributes.values
  end
end
CSV Output:
height,weight,name,age
74,212,bob,23
70,201,fred,24
.
.
.
I want the CSV columns in the same order as my select statement, and obviously the attributes method is not going to guarantee that. Any ideas on the best way to ensure the columns in my CSV file end up in that order? I've got a lot of data and performance is an issue, and the select statement is not static. I realize I could loop through the column names within the rows.each loop, but that seems kind of dirty.
Use the Comma gem:
class Data < ActiveRecord::Base
  comma do
    name
    age
    height
    weight
  end

  comma :height_weight do
    name
    age
    height_in_feet
    weight
  end
end
Now you can generate the CSV as follows:
Data.all(:select => 'name, age, height, weight').to_comma
Data.all(:select => 'name, age, height_in_feet, weight').to_comma(:height_weight)
Edit:
The ActiveRecord finders do not support calculated columns in the result set, e.g.
data = Data.first(:select => 'name, age, height/12 as height_in_feet, weight')
data.height_in_feet # throws error
You can use select_extra_columns gem if you want to include the calculated columns.
Try this:
def export_to_csv(rows, col_names)
  # Accept either an array of column names or the comma-separated string
  # used in the :select option (strip spaces so row.send works).
  col_names = col_names.split(',').map(&:strip) if col_names.is_a?(String)
  FasterCSV.generate do |csv|
    # header row
    csv << col_names
    # data rows
    rows.each do |row|
      csv << col_names.collect { |name| row.send(name) }
    end
  end
end
cols = 'name, age, height, weight'
rows = Data.find(:all, :select=> cols)
csv = export_to_csv(rows, cols)