The MySQL docs say that LAST_INSERT_ID() works on a per-connection basis, that is, the last insert ID value will not be overwritten by INSERT statements executed through other connections.
AFAIU, in Go (unlike PHP, for example) we don't create a separate DB connection for each client request. Instead, we are told to create just one instance of the sql.DB object, which manages a pool of SQL connections under the hood. Consequently, they say, there is no guarantee that two consecutive SQL statements in a Go program (even in the same thread) will be executed through the same DB connection. Conversely, two different threads could execute two different SQL statements on the same (reused) DB connection.
The question is: could this automatic connection management inside sql.DB affect the thread safety of sql.Result.LastInsertId()?
Consider the following case: Right after the INSERT statement in one thread, the sql.DB object reuses the connection in another thread and the other thread executes another INSERT statement on that same (reused) connection. Afterwards, the first thread queries the sql.Result.LastInsertId().
Will this return the row ID of the second INSERT or the first one? Is the last insert ID cached at the moment the statement is executed, or does calling LastInsertId() cause a separate statement to be sent over the DB connection?
The MySQL client-server protocol returns the value of LAST_INSERT_ID() in the response packet to a query performing an INSERT operation. The client APIs generally hand that value back to client code through methods like sql.Result.LastInsertId() in Go's database/sql. No round-trip query is required.
So the answer to your question is "the first INSERT."
To be clear, MySQL connections aren't thread safe in the broad sense. Instead, they are serially reusable resources. Multi-threaded client environments make them appear thread-safe by managing the serial reuse. You have described how that works for golang in your question.
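For illustration, here is a minimal Go sketch of that behavior (the widgets table, the column, the DSN, and the driver choice are all assumptions made for the example): the ID comes back with the result of Exec itself, so it is already captured before the connection can be reused by anyone else.

package main

import (
    "database/sql"
    "fmt"
    "log"

    _ "github.com/go-sql-driver/mysql" // driver choice is just one common option
)

func main() {
    // DSN is a placeholder; adjust for your environment.
    db, err := sql.Open("mysql", "user:pass@tcp(127.0.0.1:3306)/test")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    // Exec borrows one connection from the pool, runs the INSERT, and the
    // driver copies the insert ID out of MySQL's OK packet into res before
    // the connection goes back to the pool.
    res, err := db.Exec(`INSERT INTO widgets (name) VALUES (?)`, "example")
    if err != nil {
        log.Fatal(err)
    }

    // LastInsertId only reads that already-captured value; no extra query is
    // sent, so another goroutine reusing the same connection cannot change it.
    id, err := res.LastInsertId()
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println("inserted row id:", id)
}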
I have a distributed serverless application, based on AWS Aurora Serverless MySQL 5.6 and multiple Lambda functions. Some of the Lambdas act as writing threads, others as reading threads. To pin down the important details, let's suppose there is only one table with the following structure:
id: bigint primary key autoincrement
key1: varchar(700)
key2: bigint
content: blob
unique(key1, key2)
Writing threads perform INSERTs in the following manner: every writing thread generates one entry with key1+key2+content, where the key1+key2 pair is unique and id is generated automatically by auto-increment. Some writing threads may fail with a DUPLICATE KEY ERROR if key1+key2 repeats an existing value, but that does not matter and is okay.
There are also some reading threads, which poll the table and try to process newly inserted entries. The goal of a reading thread is to retrieve all new entries and process them in some way. The number of reading threads is uncontrolled; they do not communicate with each other and do not write anything to the table above, but they may store some state in a custom table.
At first, polling seems very simple: it is enough for the reading process to store the last id it has processed and continue polling from it, e.g. SELECT * FROM table WHERE id > ${lastId}. This approach works well under a small load, but breaks under a high load for an obvious reason: some entries are still being inserted and have not yet appeared in the database, because the cluster has not been synchronized at that point.
Let's see what happens from the cluster's point of view, even if it consists of only two servers A and B.
1) Server A accepts a write transaction that inserts an entry and acquires auto-increment number 100500
2) Server B accepts a write transaction that inserts an entry and acquires auto-increment number 100501
3) Server B commits its write transaction
4) Server B accepts a read transaction and returns entries with id > 100499, which is only entry 100501
5) Server A commits its write transaction
6) The reading thread receives only entry 100501 and moves its lastId cursor to 100501. Entry 100500 is lost to this reading thread forever.
QUESTION: Is there a way to solve the problem above WITHOUT hard-locking tables across the whole cluster, in some lock-free way or something similar?
The issue here is that the local state in each lambda (thread) does not reflect the global state of said table.
As a first attempt, I would always consult the table for the latest ID before reading the entry with that ID.
Have a look at the built-in function LAST_INSERT_ID() in MySQL.
The caveat:
[...] the most recently generated ID is maintained in the server on a per-connection basis
Your Lambda could create its connection before the handler function/method, which would make it longer-lived (a known trick, but it's not bomb-proof here), but I think each simultaneously executing Lambda instance would be given a new connection, in which case the above solution would fall apart.
Luckily, what you have to do then is wrap all WRITES and all READS in transactions, so that additional coordination takes place when the same table is read and written simultaneously.
In your quest you might come across transaction isolation levels; SERIALIZABLE would be the safest and least performant, but apparently AWS Aurora does not support it (I have not verified that statement).
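For what it's worth, a rough sketch of that read side in Go (the entries table, the id column, and the readNewEntries helper are all made up for the example, and whether this is sufficient depends on which isolation levels Aurora actually supports):

package poller

import (
    "context"
    "database/sql"
)

// readNewEntries polls inside a transaction with an elevated isolation level,
// as suggested above.
func readNewEntries(ctx context.Context, db *sql.DB, lastID int64) ([]int64, error) {
    tx, err := db.BeginTx(ctx, &sql.TxOptions{
        Isolation: sql.LevelRepeatableRead, // SERIALIZABLE is stricter, if the engine supports it
    })
    if err != nil {
        return nil, err
    }
    defer tx.Rollback() // harmless after a successful Commit

    rows, err := tx.QueryContext(ctx, `SELECT id FROM entries WHERE id > ? ORDER BY id`, lastID)
    if err != nil {
        return nil, err
    }
    defer rows.Close()

    var ids []int64
    for rows.Next() {
        var id int64
        if err := rows.Scan(&id); err != nil {
            return nil, err
        }
        ids = append(ids, id)
    }
    if err := rows.Err(); err != nil {
        return nil, err
    }
    return ids, tx.Commit()
}

The write side would wrap its INSERT in the same kind of BeginTx/Commit pair.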
HTH
I have a general understanding question about how Slick/the database manage asynchronous operations. When I compose a query, or an action, say
(for {
  users <- UserDAO.findUsersAction(usersInput.map(_.email))
  addToInventoriesResult <- insertOrUpdate(inventoryInput, user)
  deleteInventoryToUsersResult <- inventoresToUsers.filter(_.inventoryUuid === inventoryInput.uuid).delete if addToInventoriesResult == 1
  addToInventoryToUsersResult <- inventoresToUsers ++= users.map(u => DBInventoryToUser(inventoryInput.uuid, u.uuid)) if addToInventoriesResult == 1
} yield (addToInventoriesResult)).transactionally
Is there a possibility that another user removes the users just after the first action UserDAO.findUsersAction(usersInput.map(_.email)) is executed, but before the rest, such that the insert will fail (because of a foreign key error)? Or a scenario that leads to a lost update, like: transaction A reads data, then transaction B updates this data, then transaction A does an update based on what it read; it will not see B's update and will overwrite it.
I think this probably depends on the database implementation or maybe JDBC, as this is sent to the database as a block of SQL, but maybe Slick plays a role in this. I'm using MySQL.
In case there are synchronisation issues here, what is the best way to solve them? I have read about approaches like a background queue that processes the operations sequentially (as semantic units), but wouldn't this partly remove the benefit of being able to access the database asynchronously and thus hurt performance?
First of all, if the underlying database driver is blocking (the case with JDBC-based drivers) then Slick cannot deliver async performance in the truly non-blocking sense of the word (i.e. a thread will be consumed and blocked for however long it takes for a given query to complete).
There's been talk of implementing non-blocking drivers for Oracle and SQL Server (under a paid Typesafe subscription) but that's not happening any time soon AFAICT. There are a couple of projects that do provide non-blocking drivers for Postgres and MySQL, but YMMV; still early days.
With that out of the way, when you call transactionally Slick takes the batch of queries to execute and wraps them in a try-catch block with the underlying connection's autocommit flag set to false. Once the queries have executed successfully the transaction is committed by setting autocommit back to the default, true. In the event an Exception is thrown, the connection's rollback method is called. Just standard JDBC session boilerplate that Slick conveniently abstracts away.
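For comparison only, here is that boilerplate written out by hand against Go's database/sql (not Slick, just an illustration of the pattern being abstracted; the runInTransaction helper is made up):

package txdemo

import (
    "context"
    "database/sql"
    "fmt"
)

// runInTransaction spells out what transactionally hides: start a transaction
// (autocommit off), run the statements, commit on success, roll back on error.
func runInTransaction(ctx context.Context, db *sql.DB, fn func(*sql.Tx) error) (err error) {
    tx, err := db.BeginTx(ctx, nil)
    if err != nil {
        return err
    }
    defer func() {
        if err != nil {
            // Roll back on any failure; keep the original error.
            if rbErr := tx.Rollback(); rbErr != nil && rbErr != sql.ErrTxDone {
                err = fmt.Errorf("%v (rollback also failed: %v)", err, rbErr)
            }
        }
    }()
    if err = fn(tx); err != nil {
        return err
    }
    return tx.Commit()
}

You would call it as runInTransaction(ctx, db, func(tx *sql.Tx) error { /* run the inserts/deletes with tx.ExecContext */ return nil }).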
As for your scenario of a user being deleted mid-transaction and handling that correctly, that's the job of the underlying database/driver.
Perhaps the title is a little misleading, so I'll explain my question in further detail. Obviously the queries inside of the procedure are executed synchronously and in order, but are procedures themselves executed synchronously?
Let's say I have a procedure called "Register" which handles a couple of queries; for example, it looks like this:
BEGIN
  DECLARE account_inserted INT(11);
  INSERT INTO accounts (...) VALUES (...);
  SET account_inserted = LAST_INSERT_ID(); # <------
  QUERY using account_inserted...
  QUERY using account_inserted...
  QUERY using account_inserted...
  ...
END
Now let's say that numerous requests to register come in at the same time (for example's sake, around 200-300 requests). Would it execute all of the procedures in order? Or is it possible for LAST_INSERT_ID() to conflict with a row inserted by another procedure that is being executed in parallel?
You're muddling three things:
Whether MySQL executes procedures synchronously
This could be interpreted to mean either "does MySQL wait for each command within a procedure to complete before moving on to the next?" or "does MySQL wait for the entire procedure to complete before accepting further instructions from the connection that invoked the CALL command?". In both cases, the answer is "yes, it does".
Whether invocations of MySQL procedures are executed atomically
As with any other series of commands, commands within procedures are only atomic if performed within a transaction on tables that use a transactional storage engine. Thus a different connection may well execute another INSERT between the INSERT in your procedure and the command that follows.
Whether LAST_INSERT_ID() is guaranteed to return the value generated by the immediately preceding INSERT command in the procedure
Yes, it is. The most recent insertion ID is maintained on a per-connection basis, and as described above the connection waits for CALL to complete before further commands are accepted.
The ID that was generated is maintained in the server on a per-connection basis. This means that the value returned by the function to a given client is the first AUTO_INCREMENT value generated for the most recent statement affecting an AUTO_INCREMENT column by that client. This value cannot be affected by other clients, even if they generate AUTO_INCREMENT values of their own. This behavior ensures that each client can retrieve its own ID without concern for the activity of other clients, and without the need for locks or transactions.
https://dev.mysql.com/doc/refman/5.5/en/information-functions.html#function_last-insert-id
I will quote some text from another question here:
The PreparedStatement is a slightly more powerful version of a Statement, and should always be at least as quick and easy to handle as a Statement.
The Prepared Statement may be parametrized
Most relational databases handle a JDBC / SQL query in four steps:
Parse the incoming SQL query
Compile the SQL query
Plan/optimize the data acquisition path
Execute the optimized query / acquire and return data
A Statement will always proceed through the four steps above for each SQL query sent to the database. A Prepared Statement pre-executes steps (1) - (3) in the execution process above. Thus, when creating a Prepared Statement some pre-optimization is performed immediately. The effect is to lessen the load on the database engine at execution time.
Now here is my question:
If I use hundreds or thousands of Statements, will it cause performance problems in the database? (I don't mean that they will perform slower because there is more work to do each time.) Will all those statements be cached in the database, or will they be lost in space as soon as they are executed?
Since there are no restrictions on using prepared statements, you should work carefully with them.
As you said you need hundreds of prepared statements, think twice: maybe you are using them wrong.
The pattern they should be used in is an application doing heavy inserts/updates/selects hundreds or thousands of times a second, differing only in the variables. So in the real world it would look like: connect, create a session, send the statement, and then send bunches of variables to that statement (see the sketch after the list below).
But if your plan is to create a prepared statement for each single operation, it is just better to use plain queries.
On your questions:
Hundreds of statements will not kill MySQL or cause performance degradation
Prepared statements are stored in memory while the client session is up and running. As soon as you close the session, the prepared statements die.
To be sure you need them:
Your app is able to execute statements fast enough that you actually get the speed benefit of using them
Your query does not have a variable number of arguments; otherwise you can kill your app by creating objects and keeping them in memory for every statement
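As an illustration of the prepare-once / execute-many pattern described above, here is a minimal Go sketch (the measurements table, its column, and the insertMeasurements helper are made up):

package prepdemo

import (
    "context"
    "database/sql"
)

// insertMeasurements prepares one statement and reuses it for every row, so
// parse/compile/plan happen once and only the bound variables change.
func insertMeasurements(ctx context.Context, db *sql.DB, values []float64) error {
    stmt, err := db.PrepareContext(ctx, `INSERT INTO measurements (value) VALUES (?)`)
    if err != nil {
        return err
    }
    // The server-side prepared statement lives only as long as this handle
    // and its session; closing it frees the memory on the server.
    defer stmt.Close()

    for _, v := range values {
        if _, err := stmt.ExecContext(ctx, v); err != nil {
            return err
        }
    }
    return nil
}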
I have a table in a SQL Server database in which I record the latest activity time of users. Can somebody please confirm that SQL Server will automatically handle the scenario where multiple update requests are received simultaneously for different users? I am expecting 25-50 concurrent update requests on this table, but each request is responsible for updating different rows. Do I need something extra like connection pooling, etc.?
Yes, SQL Server will handle this scenario.
It is a DBMS and it expects scenarios like this one.
When you insert/update/delete a row, SQL Server will lock the table/row/page to guarantee that you will be able to do what you want. This lock is released when you are done inserting/updating/deleting the row.
Check this Link
And introduction-to-locking-in-sql-server
But there are a few things you should do:
1 - Make sure you do whatever you need to do quickly. Because of the lock issue, if you stay connected for too long, other requests to the same table may be blocked until you are done, and this can lead to a timeout.
2 - Always use a transaction.
3 - Make sure to adjust the fill factor of your indexes. Check Fill Factor on MSDN.
4 - Adjust the Isolation level according to what you want.
5 - Get rid of unused indexes to speed up your insert/update.
Connection pooling is not very related to your question. Connection pooling is a technique that avoids the extra overhead of creating a new connection to the database every time you send a request. In C# and other languages that use ADO, this is done automatically. Check this out: SQL Server Connection Pooling.
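For illustration (in Go here only because that is the language used elsewhere on this page; the same idea applies with ADO.NET in C#), a sketch where each request runs a single short UPDATE, so the pooled connection and the row lock are held only briefly. The table, columns, helper name, and parameter placeholders are all assumptions:

package activity

import (
    "context"
    "database/sql"
    "time"
)

// TouchUser records the latest activity time for one user. A single UPDATE is
// atomic on its own, and the driver's built-in connection pool hands each
// concurrent call its own connection, so 25-50 of these at once only contend
// on row locks.
func TouchUser(ctx context.Context, db *sql.DB, userID int64, at time.Time) error {
    // @p1/@p2 placeholders are what common SQL Server drivers expect; other
    // drivers use ? or named parameters.
    _, err := db.ExecContext(ctx,
        `UPDATE user_activity SET last_seen = @p1 WHERE user_id = @p2`,
        at, userID)
    return err
}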
Other links that may be useful:
best-practices-for-inserting-updating-large-amount-of-data-in-sql-2008
Speed Up Insert Performance