iBatis selectKey and transactions - MySQL

I'm using iBatis with MySQL 5 in my Java app.
I have a persistent entity class:
public class MyEntity {
    private int id;
    private String property;
    // setters and getters omitted
}
Inserting a new entity is done as follows:
<insert id="insert" parameterClass="MyEntity">
    <selectKey resultClass="int" type="post" keyProperty="id">
        SELECT LAST_INSERT_ID() AS value
    </selectKey>
    {CALL insert_entity(#property#)}
</insert>
Transactions are managed inside the stored procedure as follows:
CREATE DEFINER=`user`@`%` PROCEDURE `insert_entity`(IN p_property VARCHAR(255))
BEGIN
    START TRANSACTION;
    INSERT INTO entities (property) VALUES (p_property);
    -- Do more stuff that requires the transaction: update more tables etc.
    COMMIT;
END;
What I'm trying to achieve is getting the newly inserted entity's id back into my Java code. With no concurrent DB updates, the setup above does exactly what I want. The unclear part is what happens under concurrent DB updates: what are the exact timing and context of iBatis executing the selectKey statement? I'd guess it will not be executed within the transaction defined in the stored procedure, so the id returned might not be the id of the entity I inserted.
The only alternative I can think of is to avoid selectKey altogether:
<insert id="insert" parameterClass="MyEntity">
    {CALL insert_entity(#property#, #id,mode=OUT#)}
</insert>
Returning the last inserted id from the stored procedure:
CREATE DEFINER=`user`@`%` PROCEDURE `insert_entity`(
    IN p_property VARCHAR(255),
    OUT p_id INTEGER
)
BEGIN
    START TRANSACTION;
    INSERT INTO entities (property) VALUES (p_property);
    SELECT LAST_INSERT_ID() INTO p_id;
    -- Do more stuff that requires the transaction: update more tables etc.
    COMMIT;
END;
Is there a better solution to this problem?
Edit: the MySQL documentation for LAST_INSERT_ID states:
The ID that was generated is maintained in the server on a per-connection basis. This means that the value returned by the function to a given client is the first AUTO_INCREMENT value generated for the most recent statement affecting an AUTO_INCREMENT column by that client. This value cannot be affected by other clients, even if they generate AUTO_INCREMENT values of their own. This behavior ensures that each client can retrieve its own ID without concern for the activity of other clients, and without the need for locks or transactions.
So it seems the original solution with selectKey will work in all cases. However, for complex stored procedures with multiple INSERT statements, the second approach is safer.

Firstly, I have to state the obvious: You should seriously try to avoid doing your own transaction management inside your stored procedure.
Assuming that this is your only option, I'd say that the latter solution would be my preference, as it is clear to any developer that the id is returned from within the transaction.
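For reference, here is roughly what the OUT-parameter variant looks like in plain JDBC, which is essentially what iBatis does under the hood. This is a minimal sketch: the connection URL and credentials are placeholders, and insert_entity is the procedure from the question.
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Types;

public class InsertEntityExample {
    public static void main(String[] args) throws Exception {
        // Placeholder connection parameters; adjust to your environment.
        try (Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/mydb", "user", "secret");
             CallableStatement cs = con.prepareCall("{CALL insert_entity(?, ?)}")) {
            cs.setString(1, "some property value");
            cs.registerOutParameter(2, Types.INTEGER);
            cs.execute();
            // The id was assigned inside the procedure's own transaction,
            // so it belongs to the row this call inserted.
            int id = cs.getInt(2);
            System.out.println("new entity id = " + id);
        }
    }
}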

Related

Committing transactions while executing a PostgreSQL function

I have a PostgreSQL function which has to INSERT about 1.5 million rows into a table. What I want is to see the table getting populated after every single record's insertion. Currently, when I try with about 1000 records, the table gets populated only after the complete function has executed. If I stop the function halfway through, no data gets populated. How can I make the records stay committed even if I stop the function after a certain number of records have been inserted?
This can be done using dblink. Below is an example with one insert being committed; you will need to add your loop logic and commit on every iteration. See http://www.postgresql.org/docs/9.3/static/contrib-dblink-connect.html for details on dblink_connect.
CREATE OR REPLACE FUNCTION log_the_dancing(ip_dance_entry text)
RETURNS INT AS
$BODY$
BEGIN
    PERFORM dblink_connect('dblink_trans','dbname=sandbox port=5433 user=postgres');
    PERFORM dblink('dblink_trans','INSERT INTO dance_log(dance_entry) SELECT ' || quote_literal(ip_dance_entry));
    PERFORM dblink('dblink_trans','COMMIT;');
    PERFORM dblink_disconnect('dblink_trans');
    RETURN 0;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
ALTER FUNCTION log_the_dancing(ip_dance_entry text)
OWNER TO postgres;
BEGIN TRANSACTION;
select log_the_dancing('The Flamingo');
select log_the_dancing('Break Dance');
select log_the_dancing('Cha Cha');
ROLLBACK TRANSACTION;
--Show records committed even though we rolled back outer transaction
select *
from dance_log;
What you're asking for is generally called an autonomous transaction.
PostgreSQL does not support autonomous transactions at this time (9.4).
To properly support them it really needs stored procedures, not just the user-defined functions it currently supports. It's also very complicated to implement autonomous transactions in PostgreSQL for a variety of internal reasons related to its session and process model.
For now, use dblink as suggested by Bob.
If you have the flexibility to change from a function to a procedure: from PostgreSQL 11 onwards you can do internal commits by using procedures instead of functions, invoked with the CALL command. Your function would therefore be changed to a procedure and invoked with CALL, e.g.:
CREATE PROCEDURE transaction_test2()
LANGUAGE plpgsql
AS $$
DECLARE
    r RECORD;
BEGIN
    FOR r IN SELECT * FROM test2 ORDER BY x LOOP
        INSERT INTO test1 (a) VALUES (r.x);
        COMMIT;
    END LOOP;
END;
$$;
CALL transaction_test2();
More details about transaction management in Postgres are available here: https://www.postgresql.org/docs/12/plpgsql-transactions.html
For PostgreSQL 9.5 or newer you can use dynamic background workers provided by the pg_background extension, which creates an autonomous transaction; please refer to the extension's GitHub page. This solution is better than dblink. There is a complete guide on autonomous transaction support in PostgreSQL. There is also a third way to start an autonomous transaction in Postgres, but it requires some patching: see Peter Eisentraut's patch proposal for Oracle-style autonomous transactions.
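As a rough illustration, launching such an autonomous statement from Java could look like the sketch below, assuming pg_background is installed (CREATE EXTENSION pg_background) and using the pg_background_launch/pg_background_result functions its README documents; the connection details and the dance_log table are placeholders.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PgBackgroundExample {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5433/sandbox", "postgres", "secret");
             Statement st = con.createStatement()) {
            con.setAutoCommit(false);
            // Launch the INSERT in a background worker: it commits on its own,
            // independently of this session's transaction.
            ResultSet pidRs = st.executeQuery(
                    "SELECT pg_background_launch("
                    + "'INSERT INTO dance_log(dance_entry) VALUES (''Tango'')')");
            pidRs.next();
            int pid = pidRs.getInt(1);
            // Wait for the worker and collect its result (the command tag).
            try (ResultSet res = st.executeQuery(
                    "SELECT * FROM pg_background_result(" + pid + ") AS (result text)")) {
                while (res.next()) {
                    System.out.println(res.getString(1));
                }
            }
            con.rollback(); // the background INSERT survives this rollback
        }
    }
}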

AUTO_INCREMENT and LAST_INSERT_ID

I'm using AUTO_INCREMENT and I would like to get the ID of the inserted row so that I can update another table, using that ID as the common field between the two tables.
I understand that LAST_INSERT_ID returns the last ID. However, my concern is that the database is accessed by many users at the same time, so another process might access the table and insert a new row at the same moment.
Does LAST_INSERT_ID return just the last ID regardless of the connection used, or only the last ID for the connection that I'm using?
Note that I'm accessing the MySQL database through a connection pool in a Tomcat server.
In summary, I need to insert a row in table A with auto increment, then insert a row in table B, which needs to be linked to table A using the AUTO_INCREMENT value.
SELECT max(employeeid) FROM Employee;
The above query returns the value of employeeid for the last inserted record in the Employee table, because employeeid is an auto increment column. This seems OK, but suppose two threads execute the insert operation simultaneously: there is a chance that you get the wrong id for the last inserted record!
Don’t worry: MySQL provides a function which returns the value of the auto increment column for the last inserted record.
SELECT LAST_INSERT_ID();
LAST_INSERT_ID() is always connection specific; even when insert operations are carried out simultaneously from different connections, it always returns the value of the current connection's own operation.
So you have to first insert the record in the Employee table, run the above query to get the id value, and use that to insert into the second table.
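In JDBC that sequence could look roughly like the sketch below. The employee and address tables are invented for illustration; Statement.RETURN_GENERATED_KEYS is the driver-level way to read the AUTO_INCREMENT value and, like LAST_INSERT_ID(), it is scoped to the connection, so the pattern is safe even with a Tomcat connection pool.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class LinkedInsertExample {
    public static void main(String[] args) throws Exception {
        // Placeholder connection parameters.
        try (Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/mydb", "user", "secret")) {
            con.setAutoCommit(false);
            long employeeId;
            // Insert into table A (employee) and fetch its AUTO_INCREMENT id.
            try (PreparedStatement ps = con.prepareStatement(
                    "INSERT INTO employee (name) VALUES (?)",
                    Statement.RETURN_GENERATED_KEYS)) {
                ps.setString(1, "Alice");
                ps.executeUpdate();
                try (ResultSet keys = ps.getGeneratedKeys()) {
                    keys.next();
                    employeeId = keys.getLong(1);
                }
            }
            // Insert into table B (address), linked via the id from table A.
            try (PreparedStatement ps = con.prepareStatement(
                    "INSERT INTO address (employee_id, city) VALUES (?, ?)")) {
                ps.setLong(1, employeeId);
                ps.setString(2, "Berlin");
                ps.executeUpdate();
            }
            con.commit();
        }
    }
}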
LAST_INSERT_ID() works in the context of the current connection: run it on the same connection that performed the INSERT, for example within the same transaction or inside a user-defined stored procedure or function.
LAST_INSERT_ID is connection specific. That's true, but you should be careful if you use connection pooling. It may be problematic when you perform successive INSERT IGNORE statements in a loop and the pool gives you the same connection at each iteration.
For example, assume that you receive the same (open) connection for each of the below:
INSERT IGNORE ... some-new-id      >>> LAST_INSERT_ID returns 100
INSERT IGNORE ... some-existing-id >>> LAST_INSERT_ID still returns 100 (the result of the previous operation)
It is always good to check whether the INSERT operation has in fact inserted any rows before calling LAST_INSERT_ID.
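A minimal JDBC sketch of that check; the items table and its code column are invented for illustration.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class InsertIgnoreExample {
    // Returns the generated id, or -1 if the row already existed
    // and INSERT IGNORE skipped it.
    static long insertIgnore(Connection con, String code) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(
                "INSERT IGNORE INTO items (code) VALUES (?)",
                Statement.RETURN_GENERATED_KEYS)) {
            ps.setString(1, code);
            int affected = ps.executeUpdate();
            if (affected == 0) {
                // Nothing was inserted: do NOT trust LAST_INSERT_ID() here;
                // it still holds the value from a previous INSERT on this
                // (possibly pooled) connection.
                return -1;
            }
            try (ResultSet keys = ps.getGeneratedKeys()) {
                return keys.next() ? keys.getLong(1) : -1;
            }
        }
    }
}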
LAST_INSERT_ID returns the last insert id for the current session.
As long as you don't perform another insert on the current connection before reading it, the value is valid.
Further information here: https://dba.stackexchange.com/questions/21181/is-mysqls-last-insert-id-function-guaranteed-to-be-correct
(I would link to mysql.com, but it's currently down for me.)

How can I be DRY in column names in this MySQL procedure?

I'm referencing the name, description and user_id columns of the meta table twice, and maybe more (who knows?) in the future. Those columns are used to compute the ETag of my meta resource.
Adding a column that contributes to the ETag computation in the future will force me to change the code in N places, and this is bad.
Is there any way to make this DRY and store these column names elsewhere? I'd also like to use these column names when an INSERT on meta is performed.
IF only = TRUE THEN
    -- Calculate ETag on meta fields only
    UPDATE meta
    SET etag = etag(CONCAT(name, description, user_id))
    WHERE id = meta_id;
ELSE
    -- Calculate ETag on meta fields and meta customers
    BEGIN
        DECLARE c_etags VARCHAR(32);
        -- Compute c_etags
        UPDATE meta
        SET etag = etag(CONCAT(etag(CONCAT(name, description, user_id)), c_etags))
        WHERE id = meta_id;
    END;
END IF;
Disclaimer: this code is untested; I'm pretty new to MySQL, apart from simple statements.
EDIT: etag is the MySQL MD5 function. Maybe this is one option:
CREATE PROCEDURE set_meta_etag(IN meta_id INT, IN related TEXT)
NOT DETERMINISTIC
BEGIN
    UPDATE meta
    SET etag = etag(CONCAT(name, description, user_id,
                           IF(related IS NOT NULL, related, '')))
    WHERE id = meta_id;
END //
-- First call
CALL set_meta_etag(meta_id, NULL);
-- Second call
CALL set_meta_etag(meta_id, c_etags);
But it won't work for INSERT statements.
The obvious approach (for each column, if it's one I want, use it to help make the ETag) doesn't work easily in SQL, because SQL doesn't, historically, contemplate column names stored in variables.
You could write a program in your favorite non-SQL programming language (Java, PHP, etc.) to create and then define your procedure.
You could also use so-called "dynamic SQL" to do this, if you are willing to do the work and take the slight performance hit. See
How To have Dynamic SQL in MySQL Stored Procedure
for information on how to PREPARE and EXECUTE statements in a stored procedure.
By the way, I have had good success building systems that keep various kinds of metadata in the column comments. For example, you could write code looking for the string '[etag]' in your column comments. The comments for columns are stored in
information_schema.COLUMNS.COLUMN_COMMENT
and are very easy to process when your program is starting up.
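To make the comment-based idea concrete, a startup routine could read the tagged columns once and build the CONCAT expression from them. A minimal JDBC sketch, assuming the ETag-contributing columns of meta carry the string '[etag]' in their comments:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;

public class EtagColumns {
    // Collects the columns of `meta` whose comment contains '[etag]'
    // and builds an expression like "MD5(CONCAT(name, description, user_id))".
    static String buildEtagExpression(Connection con) throws Exception {
        List<String> cols = new ArrayList<>();
        try (PreparedStatement ps = con.prepareStatement(
                "SELECT COLUMN_NAME FROM information_schema.COLUMNS " +
                "WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'meta' " +
                "AND COLUMN_COMMENT LIKE '%[etag]%' ORDER BY ORDINAL_POSITION");
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                cols.add(rs.getString(1));
            }
        }
        return "MD5(CONCAT(" + String.join(", ", cols) + "))";
    }
}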
If you know this is confined to one table, you could add a trigger. A BEFORE trigger that sets NEW.etag directly would cover both INSERT and UPDATE (an AFTER trigger cannot modify the row it was fired for). See MySQL Fire Trigger for both Insert and Update.

MySQL Stored Procedures and Last Inserted Row

I'm curious whether this is possible. I have a stored procedure which inserts a row and then retrieves the last insert id. What if two users both call the procedure at the same time: is something like the following possible?
User 1: Insert 1
User 2: Insert 2
User 1: gets last id 2
User 2: gets last id 2
Could the two calls of the stored procedure interleave the sequence of insert queries like this? Or will one take the lead?
Thank you!
That's not a problem. From the fine manual:
The ID that was generated is maintained in the server on a per-connection basis. This means that the value returned by the function to a given client is the first AUTO_INCREMENT value generated for the most recent statement affecting an AUTO_INCREMENT column by that client. This value cannot be affected by other clients, even if they generate AUTO_INCREMENT values of their own.
So the last_insert_id() value is always per-session (AKA per-connection), and since you have two sessions in play, they can't interfere with each other's last_insert_id() values.
That said, it is a good idea to grab the last_insert_id() value and store it in a variable as soon after the INSERT as possible. If something else does an INSERT behind your back (say, you call another procedure that gets logging added two months down the road, and that logging does an INSERT), you will lose the last_insert_id() value that you want.

MySQL trigger: is it possible to delete rows if a table becomes too large?

When inserting a new row into a table T, I would like to check whether the table has grown larger than a certain threshold, and if so, delete the oldest record (creating some kind of FIFO in the end).
I thought I could simply use a trigger, but apparently MySQL doesn't allow a trigger to modify the table on which it is defined:
Code: 1442 Msg: Can't update table 'amoreAgentTST01' in stored function/trigger because it is already used by statement which invoked this stored function/trigger.
Here is the trigger I tried:
DELIMITER $$
CREATE TRIGGER test
AFTER INSERT ON amoreAgentTST01
FOR EACH ROW
BEGIN
    DECLARE table_size INTEGER;
    DECLARE new_row_size INTEGER;
    DECLARE threshold INTEGER;
    DECLARE oldest_update_time TIMESTAMP;
    SELECT SUM(OCTET_LENGTH(data)) INTO table_size FROM amoreAgentTST01;
    SELECT OCTET_LENGTH(NEW.data) INTO new_row_size;
    SET threshold = 500000;
    SELECT MIN(updatetime) INTO oldest_update_time FROM amoreAgentTST01;
    IF (table_size + new_row_size) > threshold THEN
        DELETE FROM amoreAgentTST01 WHERE updatetime = oldest_update_time; -- and check it is not the current row
    END IF;
END$$
DELIMITER ;
Do you have any idea how to do this within the database?
Or is it clearly something to be done in my program?
Ideally you should have a dedicated archive strategy in a separate process that runs at off-peak times.
You could implement this either as a scheduled stored procedure (yuck) or an additional background worker thread within your application server, or a totally separate application service. This would be a good place to put other regular housekeeping jobs.
This has a few benefits. Apart from avoiding the trigger issue you're seeing, you should consider the performance implications of anything happening in a trigger. If you do many inserts, that trigger does its work on every one of them and can effectively halve your insert performance, not to mention the lock contention that will arise as other processes try to access the same table.
A separate process that does housekeeping work minimises lock contention, and allows the work to be carried out as a high-performance bulk operation, in a transaction.
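A minimal sketch of such a housekeeping worker in Java, using a scheduled executor; the table, column and threshold are taken from the question, while the schedule and batch size are assumptions.
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import javax.sql.DataSource;

public class HousekeepingWorker {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final DataSource dataSource;

    HousekeepingWorker(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    void start() {
        // Run once a day; pick a delay that lands in your off-peak window.
        scheduler.scheduleAtFixedRate(this::trimTable, 1, 24, TimeUnit.HOURS);
    }

    private void trimTable() {
        try (Connection con = dataSource.getConnection();
             Statement st = con.createStatement()) {
            ResultSet rs = st.executeQuery(
                    "SELECT SUM(OCTET_LENGTH(data)) FROM amoreAgentTST01");
            rs.next();
            long size = rs.getLong(1);
            if (size > 500000) {
                // Bulk-delete the oldest rows in a single statement.
                st.executeUpdate(
                        "DELETE FROM amoreAgentTST01 "
                        + "ORDER BY updatetime ASC LIMIT 100");
            }
        } catch (Exception e) {
            e.printStackTrace(); // replace with real logging
        }
    }
}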
One last thing - you should possibly consider archiving records to another table or database, rather than deleting them.