When an object is saved in Rails, its database ID is assigned to it, yet at that point it doesn't appear to have actually been saved in the DB.
On the console, I haven't seen any query being fired other than the INSERT query, which is performed after the after_save callback.
So how does Rails assign the ID to the object before the INSERT query?
There are different ways for different databases. For more details, you have to look through the ActiveRecord adapters, or those of whatever ORM you are using.
For PostgreSQL, see: rails postgres insert
If you don't get an ID in your records, show more details from your schema.rb
Typically, this is done by the database itself. Usually, the id column of a table is an auto_increment column, which means the database keeps an auto-incrementing counter and assigns its value to the new record when it is saved. Rails then has to pull the newly assigned ID back from the database after inserting the record.
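To make this concrete, here is a rough SQL-level sketch (my illustration, assuming a hypothetical users table):
-- MySQL: id is AUTO_INCREMENT; the adapter reads the generated key back
INSERT INTO users (name) VALUES ('huey');
SELECT LAST_INSERT_ID();
-- PostgreSQL: id comes from a sequence; the INSERT can return it directly
INSERT INTO users (name) VALUES ('huey') RETURNING id;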
This is what Rails does when inserting a new row into the DB (see the docs for the insert method):
# ActiveRecord::ConnectionAdapters::DatabaseStatements#insert
#
# Returns the last auto-generated ID from the affected table.
#
# +id_value+ will be returned unless the value is nil, in
# which case the database will attempt to calculate the last inserted
# id and return that value.
#
# If the next id was calculated in advance (as in Oracle), it should be
# passed in as +id_value+.
def insert(arel, name = nil, pk = nil, id_value = nil, sequence_name = nil, binds = [])
  sql, binds = sql_for_insert(to_sql(arel, binds), pk, id_value, sequence_name, binds)
  value = exec_insert(sql, name, binds, pk, sequence_name)
  id_value || last_inserted_id(value)
end
So, in practice, the ID is never passed from Rails in the INSERT statement. But after the insert, the created Rails object will have its ID defined.
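In practice (a hypothetical console session; the User model and name column are made up for illustration):
user = User.create(:name => "huey")  # fires the INSERT
user.id  #=> 1, read back from the database after the INSERT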
I have a case where I need to use conditional updates/inserts using peewee.
The query looks similar to what is shown here, conditional-duplicate-key-updates-with-mysql
As of now, what I'm doing is: do a get_or_create, and then if it is not a create, check the condition in code and call an insert with on_conflict_replace.
But this is prone to race conditions, since the condition check happens back in the web server, not in the DB server.
Is there a way to do the same with insert in peewee?
Using: AWS Aurora-MySQL-5.7
Yes, Peewee supports the ON DUPLICATE KEY UPDATE syntax. Here's an example from the docs:
from datetime import datetime

from peewee import *

class User(Model):
    username = TextField(unique=True)
    last_login = DateTimeField(null=True)
    login_count = IntegerField()

# Insert a new user.
User.create(username='huey', login_count=0)

# Simulate the user logging in. The login count and timestamp will be
# either created or updated correctly.
now = datetime.now()
rowid = (User
         .insert(username='huey', last_login=now, login_count=1)
         .on_conflict(
             preserve=[User.last_login],  # Use the value we would have inserted.
             update={User.login_count: User.login_count + 1})
         .execute())
Doc link: http://docs.peewee-orm.com/en/latest/peewee/querying.html#upsert
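On MySQL, that query corresponds roughly to the following SQL (a sketch, not verbatim peewee output; the timestamp value is illustrative):
INSERT INTO user (username, last_login, login_count)
VALUES ('huey', '2019-01-01 00:00:00', 1)
ON DUPLICATE KEY UPDATE
    last_login = VALUES(last_login),
    login_count = login_count + 1;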
I have a data frame made up of 3 columns named INTERNAL_ID, NT_CLONOTYPE and SAMPLE_ID. I need to write a script in R that will transfer this data into the appropriate 3 columns with the exact names in a MySQL table. However, the table has more than 3 columns, say 5 (INTERNAL_ID, COUNT, NT_CLONOTYPE, AA_CLONOTYPE, and SAMPLE_ID). The MySQL table already exists and may or may not include preexisting rows of data.
I'm using the dbx and RMariaDB libraries in R. I've been able to connect to the MySQL database with dbxConnect(). This is what I run when I try dbxUpsert():
conx <- dbxConnect(adapter = "mysql", dbname = "TCR_DB", host = "127.0.0.1", user = "xxxxx", password = "xxxxxxx")
table <- "TCR"
records <- newdf #dataframe previously created with the update data.
dbxUpsert(conx, table, records, where_cols = c("INTERNAL_ID"))
dbxDisconnect(conx)
I expect to obtain an updated mysql table with the new rows, which may or may not have null entries in the columns not contained in the data frame.
Ex.
INTERNAL_ID   COUNT   NT_CLONOTYPE   AA_CLONOTYPE   SAMPLE_ID
Pxxxxxx.01            CTTGGAACTG                    PMA.01
The connection and disconnection all run fine, but instead of the output I obtain the following error:
Error in .local(conn, statement, ...) :
could not run statement: Field 'COUNT' doesn't have a default value
I suspect it's because the number of columns in the data frame and in the table are not the same, but I'm not sure. If so, how can I get around this?
I figured it out. I changed the table definition so that "COUNT" defaults to NULL. This allowed the program to proceed by ignoring "COUNT".
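For reference, a minimal sketch of that change (assuming COUNT is an integer column; adjust the type to match your schema):
ALTER TABLE TCR MODIFY `COUNT` INT NULL DEFAULT NULL;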
I'm trying to load some data in Neo4J. I have a Person node which is already setup. Now, this node needs to have an email property which should be an array(or collection). Basically, the email property needs to have multiple values, like -
email: ["abc#xyz.com", "abc#foo.com"]
I've come across similar questions here but all of the answers indicate to setting multiple property values at the time the node itself is created. Like this query from this answer -
CREATE (e:Employee { name:"Sam",languages: ["C", "C#"]})
RETURN e
But the problem in my case is that Person node is already created, and I need to set the email property on it now.
This is a small subset of the data I have to load -
Personid|email
933|Mahinda933@hotmail.com
933|Mahinda933@yahoo.com
933|Mahinda933@zoho.com
1129|Carmen1129@gmail.com
1129|Carmen1129@gmx.com
1129|Carmen1129@yahoo.com
4194|Ho.Chi4194@gmail.com
4194|Ho.Chi4194@gmx.com
Also, the data is coming from a CSV file with thousands of rows, so my query needs to be generic, I can't set the properties for each individual Person node.
When I was testing out the creation of the email property with this subset, my first attempt was this -
MATCH (n:TESTPERSON{id:933})
SET n.email = "Mahinda933@hotmail.com"
RETURN n

MATCH (n:TESTPERSON{id:933})
SET n.email = "Mahinda933@yahoo.com"
RETURN n
As I was thinking, this just overwrites the email property to the value in the most recent query.
After looking at the answers here and on the Cypher docs, I found out that Neo4J allows you to set an array/collection (multiple values of the same type) as a property value, and then I tried this -
// CREATE test node
CREATE (n:TESTPERSON{id:933})
RETURN n
// at this time, this node does not have any `email` property, so setup
// email as an array with one string value
MATCH (n:TESTPERSON{id:933})
SET n.email = ["Mahinda933#hotmail.com"]
RETURN n
// Now, using +=, I can append to the array of strings
MATCH (n:TESTPERSON{id:933})
SET n.email = n.email + "Mahinda933#yahoo.com"
RETURN n
// add a third value to array
MATCH (n:TESTPERSON{id:933})
SET n.email = n.email + "Mahinda933#zoho.com"
RETURN n
Here's the result: the email property now has multiple values, i.e. email: ["Mahinda933@hotmail.com", "Mahinda933@yahoo.com", "Mahinda933@zoho.com"].
But the problem is that since my CSV file has thousands of rows, I need a generic query to do this.
I thought of using a CASE statement as per the documentation here, and tried this -
MATCH (n:TESTPERSON {id:933})
CASE
WHEN n.email IS NULL THEN SET n.email = ["Mahinda933@hotmail.com"]
ELSE SET n.email = n.email + "Mahinda933@yahoo.com"
RETURN n
But this just throws the error - mismatched input CASE expecting ;.
I was hoping I could use this query as a generic way for my CSV file like this -
LOAD CSV WITH HEADERS FROM 'FILEURL' AS line FIELDTERMINATOR '|'
MATCH (n:TESTPERSON {id:toInt(line.Personid)})
CASE
WHEN n.email IS NULL THEN SET n.email = [line.email]
ELSE SET n.email = n.email + line.email
But I don't even know if this would work, even if the CASE error is fixed.
I'm really stumped, and would appreciate any help. Thank You.
You can use COALESCE() to use a default value in case the value you're trying to get is null. You might use it like this:
...
SET n.email = COALESCE(n.email, []) + "Mahinda933@yahoo.com"
...
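Applied to the CSV load from the question, that might look like this (a sketch, reusing the question's file URL, field terminator, and headers):
LOAD CSV WITH HEADERS FROM 'FILEURL' AS line FIELDTERMINATOR '|'
MATCH (n:TESTPERSON {id: toInt(line.Personid)})
// start from an empty list when the property doesn't exist yet
SET n.email = COALESCE(n.email, []) + line.email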
Whenever you're setting an array of values as a node property, it's a good idea to consider whether you might instead model these as separate nodes with relationships to the original node.
In this case, :Email nodes with some relationship to your :TESTPERSON nodes, with one :Email node per email, and multiple relationships from :TESTPERSON to multiple :Emails.
An advantage here is that you'd be able to support uniqueness constraints, if you want to ensure there's only one :Email in the system. You would also be able to quickly look up a person by their email: with an index or unique constraint, the query uses the index to look up the :Email, and from there it's only one relationship traversal to the owner of the email.
When you have values in a collection on a node, you can't use an index lookup to a value in the collection, so your current model won't be able to quickly lookup a person by their email.
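A sketch of that alternative model (the address property and the :HAS_EMAIL relationship type are my own naming):
LOAD CSV WITH HEADERS FROM 'FILEURL' AS line FIELDTERMINATOR '|'
MATCH (p:TESTPERSON {id: toInt(line.Personid)})
// one :Email node per distinct address, shared if two people use it
MERGE (e:Email {address: line.email})
MERGE (p)-[:HAS_EMAIL]->(e)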
Try this solution using MERGE:
LOAD CSV WITH HEADERS FROM 'file:///p.csv' AS line FIELDTERMINATOR '|'
MERGE (p:Person {id:toInteger(line.Personid)})
ON CREATE SET p.mail = line.email
ON MATCH SET p.mail = p.mail + '-' + line.email
The MERGE command takes care of duplicate nodes. We set the property only when the node is created (ON CREATE SET); when the node is already in the database (ON MATCH SET), we append the email address to the existing property.
Hope that helps.
A quick workaround is to load your data in two steps (sketched below):
1/ LOAD CSV, create each node with an empty array property
2/ LOAD CSV again, set the emails with +=
3/ Optional, depending on your data: for each node, remove duplicates in the array (do it with a custom procedure).
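A sketch of those two passes (file URL and headers assumed from the question; note that some Neo4j versions are picky about storing an empty array as a property):
// pass 1: make sure every person exists and has an empty email array
LOAD CSV WITH HEADERS FROM 'FILEURL' AS line FIELDTERMINATOR '|'
MERGE (n:TESTPERSON {id: toInt(line.Personid)})
ON CREATE SET n.email = []

// pass 2: append each address
LOAD CSV WITH HEADERS FROM 'FILEURL' AS line FIELDTERMINATOR '|'
MATCH (n:TESTPERSON {id: toInt(line.Personid)})
SET n.email = n.email + line.email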
That should do it. I also am not very happy with the CASE syntax.
I have Task objects with several attributes. These tasks are bounced between several processes (using Celery) and I'd like to update the task status in a database.
Every update should update only non-NULL attributes of the object. So far I have something like:
def del_empty_attrs(task):
    for name in (key for key, val in vars(task).iteritems() if val is None):
        delattr(task, name)

def update_task(session, id, **kw):
    task = session.query(Task).get(id)
    if task is None:
        task = Task(id=id)
    for key, value in kw.iteritems():
        if not hasattr(task, key):
            raise AttributeError('Task does not have {} attribute'.format(key))
        setattr(task, key, value)
    del_empty_attrs(task)  # Don't update empty fields
    session.merge(task)
However, I get either an IntegrityError or a StaleDataError. What's the right way to do this?
I think the problem is that every process has its own session, but I'm not sure.
A lot more detail would be needed to say for sure, but there is a race condition in this code:
def update_task(session, id, **kw):
    # 1.
    task = session.query(Task).get(id)
    if task is None:
        # 2.
        task = Task(id=id)
    for key, value in kw.iteritems():
        if not hasattr(task, key):
            raise AttributeError('Task does not have {} attribute'.format(key))
        setattr(task, key, value)
    del_empty_attrs(task)  # Don't update empty fields
    # 3.
    session.merge(task)
If two processes both encounter #1, and find the object for the given id to be None, they both proceed to create a new Task() object with the given primary key (assuming id here is the primary key attribute). Both processes then race down to the Session.merge() which will attempt to emit an INSERT for the row. One process gets the INSERT, the other one gets an IntegrityError as it did not INSERT the row before the other one did.
There's no simple answer for how to "fix" this, it depends on what you're trying to do. One approach might be to ensure that no two processes work on the same pool of primary key identifiers. Another would be to ensure that all INSERTs of non-existent rows are handled by a single process.
Edit: other approaches might involve going with an "optimistic" approach, where SAVEPOINT (e.g. Session.begin_nested()) is used to intercept an IntegrityError on an INSERT, then continue on after it occurs.
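A minimal sketch of that optimistic approach (my illustration; upsert_task is a hypothetical helper, and the fallback UPDATE keeps the question's "only non-NULL attributes" rule):
from sqlalchemy.exc import IntegrityError

def upsert_task(session, id, **kw):
    try:
        # SAVEPOINT: if the INSERT fails, only this nested
        # transaction is rolled back
        with session.begin_nested():
            session.add(Task(id=id, **kw))
            session.flush()  # emit the INSERT now, inside the SAVEPOINT
    except IntegrityError:
        # another process inserted the row first; update it instead
        task = session.query(Task).get(id)
        for key, value in kw.items():
            if value is not None:
                setattr(task, key, value)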
I need to create an AR migration for a table of image files. The images are being checked into the source tree, and should act like attachment_fu files. That being the case, I'm creating a hierarchy for them under /public/system.
Because of the way attachment_fu generates links, I need to use the directory naming convention to insert primary key values. How do I override the auto-increment in MySQL as well as any Rails magic so that I can do something like this:
image = Image.create(:id => 42, :filename => "foo.jpg")
image.id #=> 42
Yikes, not a pleasant problem to have. The least-kludgy way I can think of to do it is to have some code in your migration that actually "uploads" all the files through attachment-fu, and therefore lets the plugin create the IDs and place the files.
Something like this:
require 'tempfile'
require 'fileutils'

Dir.glob("/images/to/import/*.{jpg,png,gif}").each do |path|
  # simulate uploading the image
  tempfile = Tempfile.new(File.basename(path))
  tempfile.set_encoding(Encoding::BINARY) if tempfile.respond_to?(:set_encoding)
  tempfile.binmode
  FileUtils.copy_file(path, tempfile.path)

  # build as you do in the controller - may need other metadata here
  image = Image.new(:uploaded_data => tempfile)
  unless image.save
    logger.info "Failed to save image #{path} in migration: #{image.errors.full_messages}"
  end
  tempfile.close!
end
A look at attachment-fu's tests might be useful.
Unlike, say, Sybase, MySQL lets you insert any valid, non-duplicate value into the id column, as long as you list id in the INSERT statement's column list. No need to do anything special.
I suspect the Rails magic is just not letting Rails know the id is auto-increment. If this is the only way you'll be inserting into this table, then don't make the id auto_increment. Just make it an int not null primary key.
Though frankly, this is using a key as data, and so it makes me uneasy. If attachment_fu is just looking for a column named "id", make a column named id that's really data, and make a column named "actual_id" the actual, synthetic, auto_incremented key.
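A sketch of that alternative schema (column types are illustrative):
CREATE TABLE images (
  actual_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, -- real synthetic key
  id INT NOT NULL,        -- the "id" attachment_fu sees; really data
  filename VARCHAR(255)
);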
# Pass a block to create so the id is assigned before the INSERT:
image = Image.create(:filename => "foo.jpg") { |r| r.id = 42 }
Here's my kluge:
class AddImages < ActiveRecord::Migration
  def self.up
    Image.destroy_all
    execute("ALTER TABLE images AUTO_INCREMENT = 1")
    image = Image.create(:filename => "foo.jpg")
    image.id #=> 1
  end

  def self.down
  end
end
I'm not entirely sure I understand why you need to do this, but if you only need to do this a single time, for a migration, just use execute in the migration to set the ID (assuming it's not already taken, which I can't imagine it would be):
execute "INSERT INTO images (id, filename) VALUES (42, 'foo.jpg')"
I agree with AdminMyServer although I believe you can still perform this task on the object directly:
image = Image.new :filename => "foo.jpg"
image.id = 42
image.save
You'll also need to ensure your id auto-increment is updated at the end of the process to avoid clashes in the future.
new_value = Image.find(:first, :order => 'id DESC').id + 1
execute("ALTER TABLE images AUTO_INCREMENT = #{new_value}")
Hope this helps.