Perhaps the title is a little misleading, so I'll explain my question in further detail. Obviously the queries inside of the procedure are executed synchronously and in order, but are procedures themselves executed synchronously?
Let's say I have a procedure called "Register" which handles a couple of queries; for example, it looks like this:
BEGIN
DECLARE account_inserted INT(11);
INSERT INTO accounts (...) VALUES (...);
SET account_inserted = LAST_INSERT_ID(); # <------
QUERY using account_inserted...
QUERY using account_inserted...
QUERY using account_inserted...
...
END
Now let's say that there were numerous requests to register coming in at the same time (for example's sake, say around 200-300 requests). Would it execute all of the procedures in order, or could the value returned by LAST_INSERT_ID() conflict with a row inserted from another procedure that is being executed in parallel?
You're muddling three things:
Whether MySQL executes procedures synchronously
This could be interpreted to mean either "does MySQL wait for each command within a procedure to complete before moving on to the next?" or "does MySQL wait for the entire procedure to complete before accepting further instructions from the connection that invoked the CALL command?". In both cases, the answer is "yes, it does".
Whether invocations of MySQL procedures are executed atomically
As with any other series of commands, commands within procedures are only atomic if performed within a transaction on tables that use a transactional storage engine. Thus a different connection may well execute another INSERT between the INSERT in your procedure and the command that follows. (A minimal transaction sketch follows the documentation excerpt below.)
Whether LAST_INSERT_ID() is guaranteed to return the value generated by the immediately preceding INSERT command in the procedure
Yes, it is. The most recent insertion ID is maintained on a per-connection basis, and as described above the connection waits for CALL to complete before further commands are accepted.
The ID that was generated is maintained in the server on a per-connection basis. This means that the value returned by the function to a given client is the first AUTO_INCREMENT value generated for most recent statement affecting an AUTO_INCREMENT column by that client. This value cannot be affected by other clients, even if they generate AUTO_INCREMENT values of their own. This behavior ensures that each client can retrieve its own ID without concern for the activity of other clients, and without the need for locks or transactions.
https://dev.mysql.com/doc/refman/5.5/en/information-functions.html#function_last-insert-id
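To make the second point concrete, here is a minimal sketch of wrapping the procedure body in a transaction. It assumes the tables use a transactional engine such as InnoDB; the column names and the follow-up tables are hypothetical placeholders.

DELIMITER //
CREATE PROCEDURE Register(IN p_email VARCHAR(255))
BEGIN
    DECLARE account_inserted INT;

    START TRANSACTION;

    INSERT INTO accounts (email) VALUES (p_email);
    -- Per-connection value, so it cannot be affected by other clients
    SET account_inserted = LAST_INSERT_ID();

    -- Hypothetical follow-up statements using the new ID
    INSERT INTO account_settings (account_id) VALUES (account_inserted);
    INSERT INTO account_log (account_id) VALUES (account_inserted);

    COMMIT;
END //
DELIMITER ;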
The MySQL docs say that LAST_INSERT_ID() works on a "per-connection" basis, that is, the last insert ID value will not be overwritten by INSERT statements executed through other connections.
AFAIU, in Go (unlike PHP for example) we don't create separate DB connections on each client request. Instead, we are told to create just one instance of sql.DB object, which manages a pool of SQL connections under the hood. Consequently, they say, there is no guarantee that two consecutive SQL statements in a Go program (even in the same thread) will be executed through the same DB connection. Therefore, the opposite could be the case – two different threads could execute two different SQL statements on the same (reused) DB connection.
The question is: could this automatic connection management inside sql.DB affect the thread safety of sql.Result.LastInsertId()?
Consider the following case: Right after the INSERT statement in one thread, the sql.DB object reuses the connection in another thread and the other thread executes another INSERT statement on that same (reused) connection. Afterwards, the first thread queries the sql.Result.LastInsertId().
Will this return the row ID of the second INSERT or the first INSERT? Is the last insert ID cached at the moment the statement is executed, or does retrieving it send a separate statement over the DB connection?
The MySQL client-server protocol returns the value of LAST_INSERT_ID() in the response packet to a query performing an INSERT operation. Client APIs generally hand that back to client code through methods like sql.Result.LastInsertId() in Go's database/sql API. No round-trip query is required.
So the answer to your question is "the first INSERT."
To be clear, MySQL connections aren't thread safe in the broad sense. Instead, they are serially reusable resources. Multi-threaded client environments make them appear thread-safe by managing the serial reuse. You have described how that works for golang in your question.
For the purpose of this question, I am defining a complex stored procedure as 'one involving (at least) one cursor and (at least) one loop to insert data into a temporary table and then return the records from the said temporary table'.
When working on such a complex stored procedure, I was told that when two different users logged into the application perform operations which invoke the same procedure, then, because the procedure is complex and can take up to a few seconds (~10 seconds) to finish execution, the results may not be faithful on a per-user basis. That is, the results may get mixed up and one user may see the results intended for the other user, as they try to access the same temporary table.
The recommendation was to use a unique system-generated identifier for each user in order to distinguish the result sets for each user.
Now, I'd like to know the following:
Can this concurrency problem be avoided by using any table or database engine configuration settings?
Is this a violation of one or more ACID properties? How does using a full ACID compliant database engine (such as InnoDB, the one I am using) impact this question?
In the case of a simple stored procedure, one which involves only a single SELECT statement over a join of multiple tables, but no temporary tables, when the execution time is almost always under a second, is concurrency still a problem?
OK, "cursor" is irrelevant. You just want a long-running Stored Procedure running simultaneously in multiple connections by the same user.
A connection (alias a session) is the unit of "independence". An entity (such as a table) is either specific to the connection, or global to the world (aside from permission problems). There is nothing "specific to a user".
A CREATE TEMPORARY TABLE is unique to the connection creating it. Ditto for last_insert_id(). These are "thread-safe" in that the "connection" is the "thread".
If you want to have multiple connections (same user or not) access the same "temporary" table, then it is up to you to create a non-TEMPORARY table and somehow know the name of that table.
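To illustrate the first point, here is a minimal sketch; the table and column names are hypothetical. Each connection that CALLs the procedure gets its own private copy of the temporary table, so concurrent callers never see each other's rows.

DELIMITER //
CREATE PROCEDURE build_report(IN p_user_id INT)
BEGIN
    DROP TEMPORARY TABLE IF EXISTS tmp_results;
    -- Visible only to this connection, even if hundreds of connections CALL this at once
    CREATE TEMPORARY TABLE tmp_results (item_id INT, amount DECIMAL(10,2));

    INSERT INTO tmp_results (item_id, amount)
    SELECT id, amount FROM orders WHERE user_id = p_user_id;

    SELECT item_id, amount FROM tmp_results;

    DROP TEMPORARY TABLE tmp_results;
END //
DELIMITER ;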
Here is what the PHP manual says, along with one comment from that manual page:
Closes a prepared statement. mysqli_stmt_close() also deallocates the statement handle. If the current statement has pending or unread results, this function cancels them so that the next query can be executed.
Comment:
If you are repeating a statement in a loop, using bind_param and so on inside it for a larger operation, I thought it would be good to clean it up with stmt->close. But it always broke with an error after approx. 250 operations. When I tried stmt->reset instead, it worked for me.
What I don't understand here is the meaning of "prepared statement has pending or unread results".
An RDBMS that is running a query can return data before the entire dataset has been processed. There can also be records that it has not read yet.
Both the records that are already read and the ones that are pending must be saved in some resource in the database server, usually called a 'cursor'.
You execute the application code statement that reads these records from the server's cursor into your application's memory; with PHP's mysqli wrappers those are called the fetch methods.
So, after executing a query, you are not obliged to fetch any or all of these results. Either way, whether you have read the results of the query or not, executing mysqli_stmt_close() tells the server it can discard the cursor, i.e. remove the already-read records from its memory and cancel any part of the query that is still running.
So:
Unread results: rows already produced by the database, but not yet read by the client.
Pending results: records that will be included in the result set once the query runs to completion.
I have quite a simple question, I think, but I need a definitive answer. I can't think of a way to test out my ideas because they depend on a transient error (out of my control).
SQL Azure is prone to transient errors, timeouts, rejected connections etc. Sometimes on opening the connection, other times when executing a query.
I am inserting a row into SQL Azure via a stored procedure (it's basically a messaging system, so each message should be sent/inserted only once).
If a transient error occurs, my system waits a few seconds, then tries again ... repeating until the stored procedure executes without any errors.
I need the stored procedure to either insert the row and confirm to me that it has been inserted OR to fail completely and NOT insert it.
At the minute, I'm finding that when the database is going through a really bad patch, a message can end up being sent several times.
What would a system that deals with financial transactions do?
Since it's just one insert statement, am I right in thinking wrapping it in a transaction would have no effect?
Can anyone clarify for me or even point me to some documentation I should read to figure it out myself?
Suppose there is more than just one insert in the procedure (some selects after the insert?).
You can start a transaction in the application. That way, if you commit the transaction in the application and it succeeds, everything is done.
As per the suggestion by gbn, I added an extra key, not an identity, that I could use to identify duplicates and check on this before inserting
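For reference, here is a minimal sketch of that approach in T-SQL; the table, column, and procedure names are hypothetical. The client generates the key once per message and reuses it on every retry, so a duplicate attempt is detected before the insert.

CREATE TABLE dbo.Messages (
    MessageId   INT IDENTITY(1,1) PRIMARY KEY,
    MessageGuid UNIQUEIDENTIFIER NOT NULL UNIQUE,  -- client-generated, reused on retries
    Body        NVARCHAR(MAX)    NOT NULL
);
GO

CREATE PROCEDURE dbo.SendMessage
    @MessageGuid UNIQUEIDENTIFIER,
    @Body        NVARCHAR(MAX)
AS
BEGIN
    SET NOCOUNT ON;
    -- If a previous attempt already got through, do nothing; the UNIQUE
    -- constraint is the safety net if two retries race each other.
    IF NOT EXISTS (SELECT 1 FROM dbo.Messages WHERE MessageGuid = @MessageGuid)
        INSERT INTO dbo.Messages (MessageGuid, Body)
        VALUES (@MessageGuid, @Body);
END
GO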
I have a procedure (procedureA) that loops through a table and calls another procedure (procedureB) with variables derived from that table.
Each call to procedureB is independent of the last call.
When I run procedureA my system resources show a maximum CPU use of 50% (I assume that is 1 out of my 2 CPU cores).
However, if I open two instances of the mysql terminal and execute a query in both terminals, both CPU cores are used (CPU usage can reach close to 100%).
How can I achieve the same effect inside a stored procedure?
I want to do something like this:
BEGIN
CALL procedureB(var1); -> CPU CORE #1
SET var1 = var1+1;
CALL procedureB(var1); -> CPU CORE #2
END
I know it's not going to be that easy...
Any tips?
Within MySQL, to get something done asynchronously you'd have to use a CREATE EVENT, but I'm not sure whether creating one is allowed within a stored procedure. (On a side note: asynchronous inserts can of course be done with INSERT DELAYED, but that's 1 thread, period.)
Normally, you are much better off having a couple of processes/workers/daemons which can be accessed asynchronously by your program and have their own database connection, but that of course won't be in the same procedure.
You can write your own daemon as a stored procedure, and schedule multiple copies of it to run at regular intervals, say every 5 minutes, 1 minute, 1 second, etc.
If you only want up to N parallel copies running at a time, use GET_LOCK() with N well-defined lock names to abort the event execution if another copy of the event is still running.
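A minimal sketch of one such worker event; the lock name, schedule, and procedure name are hypothetical, and the event scheduler must be enabled (SET GLOBAL event_scheduler = ON).

DELIMITER //
CREATE EVENT worker_1
ON SCHEDULE EVERY 1 MINUTE
DO
BEGIN
    -- Timeout 0: if the previous copy of worker_1 is still busy, give up immediately
    IF GET_LOCK('worker_1', 0) = 1 THEN
        CALL process_jobs('worker_1');   -- hypothetical procedure that drains the job table
        DO RELEASE_LOCK('worker_1');
    END IF;
END //
DELIMITER ;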
Use a "job table" to list the jobs to execute, with an ID column to identify the execution order. Be sure to use good transaction and lock practices of course - this is re-entrant programming, after all.
Each row can define a stored procedure to execute and possibly the parameters. You can even have multiple types of jobs, job tables, and worker events for different tasks.
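A minimal sketch of such a job table; the columns are hypothetical and would be adapted to your procedures' parameters.

CREATE TABLE job_queue (
    job_id     INT AUTO_INCREMENT PRIMARY KEY,     -- also defines the execution order
    proc_name  VARCHAR(64)  NOT NULL,              -- stored procedure to CALL
    param1     VARCHAR(255) NULL,                  -- parameter value, if any
    status     ENUM('pending','running','done','error') NOT NULL DEFAULT 'pending',
    claimed_by VARCHAR(64)  NULL,                  -- which worker event picked the row up
    updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);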
Use PREPARE and EXECUTE with the CALL statement to dynamically call stored procedures whose names are stored in strings.
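For example, the dynamic CALL inside the worker might look like this sketch, where v_proc and v_param1 are hypothetical local variables fetched from the job row:

-- PREPARE only accepts a string literal or a user variable, hence @sql
SET @sql = CONCAT('CALL ', v_proc, '(', QUOTE(v_param1), ')');
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;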
Then just add rows as needed to the job table, even inserting in big batches, and let your worker events process them as fast as they can.
I've done this before, in both Oracle and MySQL, and it works well. Be sure to handle errors and log them somewhere, and log successes too, for debugging, auditing, and performance tuning. N = #CPUs may not be the best fit, depending on your data and the types of jobs. I've seen N = 2x #CPUs work best for data-intensive tasks, where lots of parallel disk I/O is more important than computational power.