I'm trying to load some data into Neo4j. I have a Person node which is already set up. Now, this node needs to have an email property which should be an array (or collection). Basically, the email property needs to have multiple values, like this:
email: ["abc#xyz.com", "abc#foo.com"]
I've come across similar questions here, but all of the answers involve setting multiple property values at the time the node itself is created, like this query from this answer:
CREATE (e:Employee { name:"Sam",languages: ["C", "C#"]})
RETURN e
But the problem in my case is that the Person node is already created, and I need to set the email property on it now.
This is a small subset of the data I have to load:
Personid|email
933|Mahinda933@hotmail.com
933|Mahinda933@yahoo.com
933|Mahinda933@zoho.com
1129|Carmen1129@gmail.com
1129|Carmen1129@gmx.com
1129|Carmen1129@yahoo.com
4194|Ho.Chi4194@gmail.com
4194|Ho.Chi4194@gmx.com
Also, the data is coming from a CSV file with thousands of rows, so my query needs to be generic; I can't set the properties for each individual Person node.
When I was testing out the creation of the email property with this subset, my first attempt was this:
MATCH (n:TESTPERSON{id:933})
SET n.email = "Mahinda933@hotmail.com"
RETURN n
MATCH (n:TESTPERSON{id:933})
SET n.email = "Mahinda933@yahoo.com"
RETURN n
As I expected, this just overwrites the email property with the value from the most recent query.
After looking at the answers here and in the Cypher docs, I found out that Neo4j allows you to set an array/collection (multiple values of the same type) as a property value, so I tried this:
// CREATE test node
CREATE (n:TESTPERSON{id:933})
RETURN n
// at this time, this node does not have any `email` property, so set up
// email as an array with one string value
MATCH (n:TESTPERSON{id:933})
SET n.email = ["Mahinda933@hotmail.com"]
RETURN n
// Now, using +, I can append to the array of strings
MATCH (n:TESTPERSON{id:933})
SET n.email = n.email + "Mahinda933@yahoo.com"
RETURN n
// add a third value to the array
MATCH (n:TESTPERSON{id:933})
SET n.email = n.email + "Mahinda933@zoho.com"
RETURN n
Here's the result: the email property now has multiple values.
But the problem is that since my CSV file has thousands of rows, I need a generic query to do this.
I thought of using a CASE statement as per the documentation here, and tried this:
MATCH (n:TESTPERSON {id:933})
CASE
WHEN n.email IS NULL THEN SET n.email = ["Mahinda933@hotmail.com"]
ELSE SET n.email = n.email + "Mahinda933@yahoo.com"
RETURN n
But this just throws the error - mismatched input CASE expecting ;.
I was hoping I could use this as a generic query for my CSV file, like this:
LOAD CSV WITH HEADERS FROM 'FILEURL' AS line FIELDTERMINATOR '|'
MATCH (n:TESTPERSON {id:toInt(line.Personid)})
CASE
WHEN n.email IS NULL THEN SET n.email = [line.email]
ELSE SET n.email = n.email + line.email
But I don't even know if this would work, even if the CASE error is fixed.
I'm really stumped, and would appreciate any help. Thank You.
You can use COALESCE() to use a default value in case the value you're trying to get is null. You might use it like this:
...
SET n.email = COALESCE(n.email, []) + "Mahinda933@yahoo.com"
...
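Putting that together with LOAD CSV gives the generic query you're after; a sketch, keeping the 'FILEURL' placeholder and toInt() conversion from the question:

LOAD CSV WITH HEADERS FROM 'FILEURL' AS line FIELDTERMINATOR '|'
MATCH (n:TESTPERSON {id:toInt(line.Personid)})
// COALESCE substitutes an empty list while n.email is still null,
// so the same SET works for the first and every subsequent email
SET n.email = COALESCE(n.email, []) + line.email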
Whenever you're setting an array of values as a node property, it's a good idea to consider whether you might instead model these as separate nodes with relationships to the original node.
In this case, :Email nodes with some relationship to your :TESTPERSON nodes, with one :Email node per email, and multiple relationships from :TESTPERSON to multiple :Emails.
An advantage here is that you'd be able to support uniqueness constraints, if you want to ensure there's only one :Email in the system. You would also be able to quickly look up a person by their email: with an index or unique constraint, the query would use the index to look up the :Email, and from there it's only one relationship traversal to the owner of the email.
When you have values in a collection on a node, you can't use an index to look up a value inside the collection, so your current model won't be able to quickly look up a person by their email.
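For example, a minimal sketch of that model; the :HAS_EMAIL relationship type and the address property are placeholder names I'm assuming, not anything from the question:

LOAD CSV WITH HEADERS FROM 'FILEURL' AS line FIELDTERMINATOR '|'
MATCH (n:TESTPERSON {id:toInt(line.Personid)})
// one node per distinct address; MERGE avoids duplicate :Email nodes
MERGE (e:Email {address: line.email})
MERGE (n)-[:HAS_EMAIL]->(e)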
Try this solution using MERGE:
LOAD CSV WITH HEADERS FROM 'file:///p.csv' AS line FIELDTERMINATOR '|'
MERGE (p:Person {id:toInteger(line.Personid)})
ON CREATE SET p.mail = line.email
ON MATCH SET p.mail = p.mail + '-' + line.email
The MERGE command takes care of duplicate nodes: with ON CREATE SET we set the property only when the node is first created, and with ON MATCH SET (i.e. when the node is already in the database) we append the email address to the property.
Hope that helps.
A quick workaround is to load your data in two steps:
1. LOAD CSV, creating each node with an empty array property
2. LOAD CSV again, appending the emails with SET n.email = n.email + line.email
3. Optional, depending on your data: for each node, remove duplicates in the array (do it with a custom procedure).
That should do it; see the sketch below. I also am not very happy with the CASE syntax.
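A minimal sketch of those two passes, reusing the labels and the 'FILEURL' placeholder from the question:

// pass 1: ensure the node exists and reset email to an empty array
LOAD CSV WITH HEADERS FROM 'FILEURL' AS line FIELDTERMINATOR '|'
MERGE (n:TESTPERSON {id:toInt(line.Personid)})
SET n.email = []

// pass 2: append every email from the file to the array
LOAD CSV WITH HEADERS FROM 'FILEURL' AS line FIELDTERMINATOR '|'
MATCH (n:TESTPERSON {id:toInt(line.Personid)})
SET n.email = n.email + line.email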
I'm using Lucee 5.x and Maria DB (MySQL).
I have a user supplied comma delimited list. I need to query the database and if the item isn't in the database, I need to add it.
user supplied list
green
blue
purple
white
database items
black
white
red
blue
pink
orange
lime
It is not expected that the database list would grow to more than 30 items but end-users always find 'creative' ways to use the tools we provide them.
So using the user supplied list above, only green and purple should be added to the database.
Do I compare the user supplied list against the database items or vice versa? Would the process change if the user supplied list count exceeds what is in the database (meaning if the user submits 10 items and the database only contains 5)? I'm not sure which loop is the better way to determine which items are new. It needs to be in cfscript, and I'm looking at the looping options as outlined here (https://www.petefreitag.com/cheatsheets/coldfusion/cfscript/):
FOR Loop
FOR IN Loop (Array)
FOR IN Loop (Query)
I tried MySQL's NOT IN, but that left me with the existing database values in addition to the new ones. I know this should be simple; I'm overcomplicating this somewhere and/or am too close to the problem to see the solution.
You could do this:
get a list with existing items from database
append user supplied list
remove duplicates
update db if items were added
<cfscript>
var userItems = '"green","blue","purple","white"';
var dbItems = '"black","white","red","blue","pink","orange","lime"';
// append the user list to the db list, then strip the duplicates
var result = ListRemoveDuplicates( ListAppend(dbItems, userItems) );
// if the de-duplicated list is longer, the user supplied new items
if (ListLen(result) neq ListLen(dbItems)) {
    // update db
}
</cfscript>
Update (only new items)
<cfscript>
var userItems = '"green","blue","purple","white"';
var dbItems = '"black","white","red","blue","pink","orange","lime"';
var newItems = '';
// keep only the user items that are not already in the db list
ListEach(userItems, function (item) {
    if (not ListFind(dbItems, item)) {
        newItems = ListAppend(newItems, item);
    }
});
</cfscript>
trycf.com gist: https://trycf.com/gist/f6a44821165338b3c10b7808606979e6/lucee5?theme=monokai
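To cover the last step ("update db"), here's a minimal sketch using queryExecute; the colors table, mycolor column, and dsn datasource name are assumptions for illustration, not part of the original code:

<cfscript>
// insert each genuinely new item, parameterized to avoid SQL injection;
// item values are inserted as-is, so strip the embedded quotes first if
// your real list doesn't carry them
ListEach(newItems, function (item) {
    queryExecute(
        "INSERT INTO colors (mycolor) VALUES (:color)",
        { color: { value: item, cfsqltype: "cf_sql_varchar" } },
        { datasource: "dsn" }
    );
});
</cfscript>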
Again, since this is an operation that the database can do, I'd feed the input data to the database and then let it decide how to deal with multiple keys. I don't recommend using CF to loop through your values to check them and then doing the INSERT. This will require multiple trips to the database and then processing on the application server that isn't really needed.
My suggestion is to use MariaDB's INSERT ... ON DUPLICATE KEY UPDATE ... syntax. This also requires that whatever field you are trying to insert on actually has a UNIQUE constraint on it. Without that constraint, your database itself doesn't care if you have duplicate data, which can cause its own set of issues.
For the database, we have
CREATE TABLE t1 (mycolor varchar(50)
, CONSTRAINT constraint_mycolor UNIQUE (mycolor)
) ;
INSERT INTO t1(mycolor)
VALUES ('black'),('white'),('red'),('blue'),('pink'),('orange'),('lime')
;
The ColdFusion is:
<cfscript>
myInputValues = "green,blue,purple,white" ;
myQueryValues = "" ;
function sanitizeValue ( required string inVal ) {
    // do sanitization stuff here
    var sanitizedInVal = arguments.inVal ;
    return sanitizedInVal ;
}
myQueryValues = myInputValues.listMap(
function(i) {
return "('" & sanitizeValue(i) & "')" ;
}
) ;
// This will take parameterization out of the cfquery tag and
// perform sanitization and validation before building the
// query string.
myQuery = new query();
myQuery.name = "myQuery";
myQuery.setDataSource("dsn");
sqlString = "INSERT INTO t1(mycolor) VALUES "
& myQueryValues
& " ON DUPLICATE KEY UPDATE mycolor=mycolor;"
;
myQuery.setSQL(sqlString);
myQueryResult = myQuery.execute().getResult();
</cfscript>
First, build up your input values (myInputValues). You'll want to do validation and sanitization on them to prevent nastiness from entering your database. I created a sanitizeValue function to be the placeholder for the sanitization and validation operations.
myQueryValues will become a string list of the values in the proper format that we will use to insert into the database.
Then we just build up a new query(), using myQueryValues in the sqlString to get our query. Again, since we are building a string of multiple VALUES to INSERT, I don't think there's a way to use queryparam for those VALUES. But since we cleaned up our string earlier, it should do much of what cfqueryparam does anyway.
We use MariaDB's INSERT INTO .... ON DUPLICATE KEY UPDATE ... syntax to only insert unique values. Again, this requires that the database itself has a constraint to prevent duplicates in whatever column we're inserting.
For a demo: https://dbfiddle.uk/?rdbms=mariadb_10.2&fiddle=4308da3addb9135e49eeee451c6e9e58
This should do what you're looking to do without beating up on your database too much. I don't have a Lucee or MariaDB server set up to test, so you'll have to give it a shot and see how it performs. I don't know how big your database is or will become, but this should still query pretty quickly.
The SQLite JSON1 extension has some really neat capabilities. However, I have not been able to figure out how I can update or insert individual JSON attribute values.
Here is an example
CREATE TABLE keywords
(
id INTEGER PRIMARY KEY,
lang INTEGER NOT NULL,
kwd TEXT NOT NULL,
locs TEXT NOT NULL DEFAULT '{}'
);
CREATE INDEX kwd ON keywords(lang,kwd);
I am using this table to store keyword searches, recording the locations from which each search was initiated in the object locs. A sample entry in this table would look like the one shown below.
id:1,lang:1,kwd:'stackoverflow',locs:'{"1":1,"2":1,"5":1}'
The location object attributes here are indices to the actual locations stored elsewhere.
Now imagine the following scenarios
A search for stackoverflow is initiated from location index "2". In this case I simply want to increment the value at that index so that after the operation the corresponding row reads
id:1,lang:1,kwd:'stackoverflow',locs:'{"1":1,"2":2,"5":1}'
A search for stackoverflow is initiated from a previously unknown location index "7" in which case the corresponding row after the update would have to read
id:1,lang:1,kwd:'stackoverflow',locs:'{"1":1,"2":1,"5":1,"7":1}'
It is not clear to me that this can in fact be done. I tried something along the lines of
UPDATE keywords json_set(locs,'$.2','2') WHERE kwd = 'stackoverflow';
which gave the error message error near json_set. I'd be most obliged to anyone who might be able to tell me how/whether this should/can be done.
It is not necessary to create such complicated SQL with subqueries to do this.
The SQL below would solve your needs.
UPDATE keywords
SET locs = json_set(locs,'$.7', IFNULL(json_extract(locs, '$.7'), 0) + 1)
WHERE kwd = 'stackoverflow';
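Since the JSON path argument is just a string, the same statement can also be made generic instead of hard-coding '$.7'; a sketch, where :loc (the location index) and :kwd are bound parameters supplied at execution time:

UPDATE keywords
SET locs = json_set(locs, '$.' || :loc,
                    IFNULL(json_extract(locs, '$.' || :loc), 0) + 1)
WHERE kwd = :kwd;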
I know this is old, but since it's one of the first results when searching for this, it deserves a better solution.
I could have just deleted this question but given that the SQLite JSON1 extension appears to be relatively poorly understood I felt it would be more useful to provide an answer here for the benefit of others. What I have set out to do here is possible but the SQL syntax is rather more convoluted.
-- N stands for the location index in question, e.g. $.2 or $.7
UPDATE keywords SET locs =
  (SELECT json_set(json(keywords.locs), '$.N',
     IFNULL(
       (SELECT json_extract(keywords.locs, '$.N') FROM keywords WHERE id = '1'),
       0)
     + 1)
   FROM keywords WHERE id = '1')
WHERE id = '1';
will accomplish both of the updates I have described in my original question above. Given how complicated this looks, a few explanations are in order:
- The UPDATE keywords part does the actual updating, but it needs to know what to update
- The SELECT json_set part is where we establish the value to be updated
- If the relevant value does not exist in the first place, we do not want to do a + 1 on a null value, so we do an IFNULL test
- The WHERE id = bits ensure that we target the right row
Having now worked with JSON1 in SQLite for a while, I have a tip to share with others going down the same road. It is easy to waste your time writing extremely convoluted and hard-to-maintain SQL in an effort to perform in-place JSON manipulation. Consider using SQLite temp tables (CREATE TEMP TABLE ...) to store intermediate results and writing a sequence of SQL statements instead. This makes the code a whole lot easier to understand and to maintain.
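For example, a rough sketch of that approach, using JSON1's json_each to flatten the JSON into a temp table first:

-- flatten each keyword's locs object into ordinary rows
CREATE TEMP TABLE loc_counts AS
SELECT k.id, j.key AS loc, j.value AS hits
FROM keywords AS k, json_each(k.locs) AS j;

-- ...update/aggregate loc_counts with plain SQL here, then rebuild
-- the JSON with json_group_object(loc, hits) when writing it back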
I have a column within my database that holds text similar to this
CNEWS # Trinidad : "By Any Means Necessary" Watson Duke Swims And Sails To Toco http://somewebsitehere.com
What can I do to remove the entire http address from the column? Please note that some links may be broken, so the column may contain something like http:// somewebsitehere.com.
I was thinking of using a substring index but not sure that would work.
You could use whichever your favorite programming language is to iterate through the rows in the table, pluck out that column, apply a regular expression replacement rule to it, then update the row in the table with the new value.
Here is some pseudo-code:
theRows = SELECT * FROM TheTable WHERE 1;
foreach row in theRows
BEGIN
oldColumnValue = row[theColumnName]
// Removes any link appearing at the end of the column
newColumnValue = oldColumnValue.replace(/http:\/\/[^\s]*$/, '')
UPDATE TheTable SET theColumnName = newColumnValue WHERE id = row[id]
END
For something as small and specific as this, you could use Perl with the DBI library to connect to MySQL. Here's a useful resource on regular expressions if you want to go deeper into it: http://www.regular-expressions.info/perl.html
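That said, if you're on MySQL 8.0+ or MariaDB 10.0.5+, both ship REGEXP_REPLACE, so the cleanup can stay entirely in SQL with no application-side loop. A sketch, with TheTable and theColumnName as placeholder names from the pseudo-code above:

UPDATE TheTable
SET theColumnName = TRIM(REGEXP_REPLACE(theColumnName, 'https?://[^[:space:]]*', ''))
WHERE theColumnName REGEXP 'https?://';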
I want to specify the return values for a specific update in sqlalchemy.
The documentation of the underlying update statement (sqlalchemy.sql.expression.update) says it accepts a "returning" argument and the docs for the query object state that query.update() accepts a dictionary "update_args" which will be passed as the arguments to the query statement.
Therefore my code looks like this:
session.query(
ItemClass
).update(
{ItemClass.value: value_a},
synchronize_session='fetch',
update_args={
'returning': (ItemClass.id,)
}
)
However, this does not seem to work. It just returns the usual integer count of matched rows.
My question is now: Am I doing something wrong or is this simply not possible with a query object and I need to manually construct statements or write raw sql?
The full solution that worked for me was to use the SQLAlchemy table object directly.
You can get that table object and the columns from your model easily by doing
table = Model.__table__
columns = table.columns
Then with this table object, I can replicate what you did in the question:
from your_settings import db

update_statement = table.update().returning(table.c.id)\
    .where(columns.column_name == value_one)\
    .values(column_name='New column name')
result = db.session.execute(update_statement)
tuple_of_results = result.fetchall()
db.session.commit()
The tuple_of_results variable would contain a list of result rows.
Note that you would have to run db.session.commit() in order to persist the changes to the database, as it is currently running within a transaction.
You could perform an update based on the current value of a column by doing something like:
update_statement = table.update().returning(table.c.id)\
    .where(columns.column_name == value_one)\
    .values(like_count=columns.like_count + 1)
This would increment our numeric like_count column by one.
Hope this was helpful.
Here's a snippet from the SQLAlchemy documentation (adapted to execute on an explicit connection):
# UPDATE..RETURNING
stmt = table.update().returning(table.c.col1, table.c.col2).\
    where(table.c.name == 'foo').values(name='bar')
result = conn.execute(stmt)
print(result.fetchall())
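On SQLAlchemy 1.4+ there's also a 2.0-style route through the ORM session; a sketch using the ItemClass model and value_a from the question (this needs a backend that supports RETURNING, e.g. PostgreSQL):

from sqlalchemy import update

# ORM-enabled UPDATE..RETURNING executed via the session
stmt = (
    update(ItemClass)
    .values(value=value_a)
    .returning(ItemClass.id)
)
result = session.execute(stmt)
updated_ids = [row.id for row in result]
session.commit()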
I need to get the last record from the db. I'm using SQLAlchemy.
At the moment, I'm doing it like this:
obj = ObjectRes.query.all()
return str(obj[-1].id)
But it's too heavy a query. How can I get the last record in a better way?
Take a look at Query.first(). If you specify a sort on the right column, the first will be your last. An example could look like this:
obj = session.query(ObjectRes).order_by(ObjectRes.id.desc()).first()
Sometimes it is difficult to reformulate simple things:
SELECT * FROM ObjectRes WHERE id IN (SELECT MAX(id) FROM ObjectRes)
but this worked for me:
from sqlalchemy import func

# scalar_subquery() needs SQLAlchemy 1.4+; on older versions use .as_scalar()
obj = session.query(ObjectRes).filter(
    ObjectRes.id == session.query(func.max(ObjectRes.id)).scalar_subquery()
).first()
Don't forget to disable existing ordering if needed
In my case I have dynamic ordered relationships:
class Match:
...
records = relationship("Record", backref="match", lazy="dynamic", order_by="Record.id")
And when I tried the accepted answer I got the first record, not the last, because ORDER BY was applied twice and spoiled the results.
According to documentation:
All existing ORDER BY settings can be suppressed by passing None
So the solution will be:
match = db_session.query(Match).first()
last_record = match.records.order_by(None).order_by(Record.id.desc()).first()
This answer modifies the others to allow for cases where you don't know what the primary key is called.
from sqlalchemy.inspection import inspect
# ...
def last_row(Table: type, *, session): # -> Table
primary_key = inspect(Table).primary_key[0].name # must be an arithmetic type
primary_key_row = getattr(Table, primary_key)
# get first, sorted by negative ID (primary key)
return session.query(Table).order_by(-primary_key_row).first()
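Hypothetical usage, with the ObjectRes model from the earlier answers and an open session:

last = last_row(ObjectRes, session=session)
print(last.id)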