We have a system with a database-based queue for processing items in threads instead of in real time. It's currently implemented in MyBatis, calling this stored procedure in MySQL:
DROP PROCEDURE IF EXISTS pop_invoice_queue;
DELIMITER ;;
CREATE PROCEDURE pop_invoice_queue(IN compId int(11), IN limitRet int(11)) BEGIN
SELECT LAST_INSERT_ID(id) as value, InvoiceQueue.* FROM InvoiceQueue
WHERE companyid = compId
AND (lastPopDate is null OR lastPopDate < DATE_SUB(NOW(), INTERVAL 3 MINUTE)) LIMIT limitRet FOR UPDATE;
UPDATE InvoiceQueue SET lastPopDate=NOW() WHERE id=LAST_INSERT_ID();
END;;
DELIMITER ;
The problem is that this pops N items from the queue but only updates the lastPopDate value for the last item popped. So if we call this stored procedure with limitRet = 5, it will pop five items off the queue and start working on them, but only the fifth item will have lastPopDate set, so when the next thread comes along and pops from the queue it will get items 1-4 and item 6.
How can we get this to update all N records 'popped' off the database?
If you are willing to add a BIGINT field to the table via:
ALTER TABLE InvoiceQueue
ADD uuid BIGINT NULL DEFAULT NULL,
INDEX ix_uuid (uuid);
then you can do the update first, and select the records updated, via:
CREATE PROCEDURE pop_invoice_queue(IN compId int(11), IN limitRet int(11))
BEGIN
SET @uuid = UUID_SHORT();
UPDATE InvoiceQueue
SET uuid = @uuid,
lastPopDate = NOW()
WHERE companyid = compId
AND uuid IS NULL
AND (lastPopDate IS NULL OR lastPopDate < NOW() - INTERVAL 3 MINUTE)
ORDER BY
id
LIMIT limitRet;
SELECT *
FROM InvoiceQueue
WHERE uuid = @uuid
FOR UPDATE;
END;;
For UUID_SHORT() to return unique values, it must be called no more than 16 million times per second per machine; see the MySQL documentation on UUID_SHORT() for more details.
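As a quick usage sketch (the company id 42 and item id 123 below are placeholders), each worker would call the procedure and receive the batch it just claimed:
CALL pop_invoice_queue(42, 5);
Note that because the UPDATE filters on uuid IS NULL, a row can only ever be claimed once; if items should become poppable again after the 3-minute window expires, something would need to reset the column, e.g.:
UPDATE InvoiceQueue SET uuid = NULL WHERE id = 123;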
For performance, you may want to alter the lastPopDate field to be NOT NULL, as the OR clause will prevent your query from using an index even if one is available:
ALTER TABLE InvoiceQueue
MODIFY lastPopDate DATETIME NOT NULL DEFAULT '0000-00-00';
Then, if you do not already have one, you could add an index on the companyid/lastPopDate/uuid fields, as follows:
ALTER TABLE InvoiceQueue
ADD INDEX ix_company_lastpop (companyid, lastPopDate, uuid);
Then you can remove the OR clause from your UPDATE query:
UPDATE InvoiceQueue
SET uuid = @uuid,
lastPopDate = NOW()
WHERE companyid = compId
AND lastPopDate < NOW() - INTERVAL 3 MINUTE
ORDER BY
id
LIMIT limitRet;
which will use the index you just created.
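If you want to confirm the plan, MySQL 5.6 and later can EXPLAIN an UPDATE; a sketch (the literal values here are placeholders for the procedure parameters):
EXPLAIN UPDATE InvoiceQueue
SET uuid = @uuid,
lastPopDate = NOW()
WHERE companyid = 1
AND lastPopDate < NOW() - INTERVAL 3 MINUTE
ORDER BY id
LIMIT 5;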
Since MySQL has neither collections nor an OUTPUT/RETURNING clause, my suggestion is to use temporary tables. Something like:
CREATE TEMPORARY TABLE temp_data
SELECT LAST_INSERT_ID(id) as value, InvoiceQueue.* FROM InvoiceQueue
WHERE companyid = compId
AND (lastPopDate is null OR lastPopDate < DATE_SUB(NOW(), INTERVAL 3 MINUTE)) LIMIT limitRet FOR UPDATE;
UPDATE InvoiceQueue
INNER JOIN temp_data ON (InvoiceQueue.id = temp_data.id)
SET InvoiceQueue.lastPopDate = NOW();
SELECT * FROM temp_data ;
DROP TEMPORARY TABLE temp_data;
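Wrapped into the original procedure signature, a sketch might look like this (assuming id is the primary key, as in the original procedure's UPDATE):
DROP PROCEDURE IF EXISTS pop_invoice_queue;
DELIMITER ;;
CREATE PROCEDURE pop_invoice_queue(IN compId int(11), IN limitRet int(11))
BEGIN
CREATE TEMPORARY TABLE temp_data
SELECT * FROM InvoiceQueue
WHERE companyid = compId
AND (lastPopDate IS NULL OR lastPopDate < DATE_SUB(NOW(), INTERVAL 3 MINUTE))
LIMIT limitRet FOR UPDATE;
UPDATE InvoiceQueue
INNER JOIN temp_data ON (InvoiceQueue.id = temp_data.id)
SET InvoiceQueue.lastPopDate = NOW();
SELECT * FROM temp_data;
DROP TEMPORARY TABLE temp_data;
END;;
DELIMITER ;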
Also, I suspect such a SELECT ... FOR UPDATE can cause deadlocks (certainly if the procedure is called from different sessions): as far as I know, the order in which rows get locked is not guaranteed (even with an ORDER BY, rows might be locked in a different order). I'd recommend double-checking the documentation.
I have a table in which id is a primary key column with auto increment. It contains over 1,000 rows.
I need to get all primary keys that have been deleted, like:
1 xcgh fct
2 xxml fcy
5 ccvb fcc
6 tylu cvn
9 vvbh cvv
The result that I should get is:
3
4
7
8
Currently I count all records, insert the values 1 to count into another table, and then select the ids from that table that don't exist in the records table. But this method is very inefficient. Is there a direct query that I can use?
Please specify for MySQL.
See fiddle:
http://sqlfiddle.com/#!2/edf67/4/0
CREATE TABLE SomeTable (
id INT PRIMARY KEY
, mVal VARCHAR(32)
);
INSERT INTO SomeTable
VALUES (1, 'xcgh fct'),
(2, 'xxml fcy'),
(5, 'ccvb fcc'),
(6, 'tylu cvn'),
(9, 'vvbh cvv');
set @rank = (Select max(ID)+1 from sometable);
create table CompleteIDs as (Select @rank := @rank-1 as Rank
from sometable st1, sometable st2
where @rank > 1);
SELECT CompleteIDs.Rank
FROM CompleteIDs
LEFT JOIN someTable S1
on CompleteIDs.Rank = S1.ID
WHERE S1.ID is null
order by CompleteIDs.rank
There is one assumption here: that the number of records in someTable multiplied by the number of records in someTable is greater than the maximum ID in someTable. Otherwise this doesn't work.
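A quick sanity check for that assumption (returns 1 when the cross join can cover the whole ID range):
SELECT (SELECT COUNT(*) FROM SomeTable) * (SELECT COUNT(*) FROM SomeTable)
       >= (SELECT MAX(id) FROM SomeTable) AS assumption_holds;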
You can try to create a temp table and fill it with e.g. 1,000 values. You can do it using any scripting language, or try a procedure (this might be inefficient overall):
DELIMITER $$
CREATE PROCEDURE InsertRand(IN NumRows INT)
BEGIN
DECLARE i INT;
SET i = 1;
START TRANSACTION;
WHILE i <= NumRows DO
INSERT INTO rand VALUES (i);
SET i = i + 1;
END WHILE;
COMMIT;
END$$
DELIMITER ;
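Note that the procedure assumes a helper table named rand already exists; a minimal definition might be:
CREATE TABLE rand (id INT PRIMARY KEY);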
CALL InsertRand(5);
Then you just run this query:
SELECT id AS deleted_id FROM rand
WHERE id NOT IN
(SELECT id FROM main_table)
Please note that this should be an occasional (e.g. daily) action, because it is very memory-inefficient.
We have an Orders table with an identity column (OrderID), but our order number is composed of OrderType (2 chars), OrderYear (2 chars) and OrderID (6 chars), 10 chars in total (i.e. XX12123456).
This counter has a limitation: we can reach identity 999999 as OrderID, and the next order would have an ID composed of 7 chars. Obviously we cannot have duplicate order IDs.
So we have created a table prefilled with progressive OrderID and OrderYear values (OrderID from 100000 to 999999, OrderYear from 12 to 16, for instance): a stored procedure begins a transaction with SERIALIZABLE isolation level, takes the first order ID not yet used, marks it as used and commits the transaction.
Since this is our Orders table, I'm worried about deadlocks when executing the order-ID calculation stored procedure, or about duplicated order IDs.
I'll test this with a console application that creates multiple concurrent threads and tries to extract order IDs, simulating a production load.
My doubts are:
Is there another method to simulate an identity column safely?
Should I consider using triggers?
Should I consider a different isolation level?
Other ideas? :D
Thanks!
[EDIT]
After googling and reading a bunch of MSDN documentation, I've found many examples showing how to manage errors and deadlocks, and how to implement a kind of automatic retry directly in the SP, as follows:
CREATE PROCEDURE [dbo].[sp_Ordine_GetOrderID]
@AnnoOrdine AS NVARCHAR(2) = NULL OUTPUT,
@IdOrdine AS INT = NULL OUTPUT
AS
SET NOCOUNT ON
DECLARE @retry AS INT
SET @retry = 2
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
WHILE (@retry > 0)
BEGIN
BEGIN TRY
BEGIN TRANSACTION OrderID
SELECT TOP 1 @AnnoOrdine = AnnoOrdine, @IdOrdine = IdOrdine
FROM ORDINI_PROGRESSIVI --WITH (ROWLOCK)
WHERE Attivo = 1
--ORDER BY AnnoOrdine ASC, IDOrdine ASC
UPDATE ORDINI_PROGRESSIVI WITH (UPDLOCK)
SET Attivo = 0
WHERE AnnoOrdine = @AnnoOrdine AND IdOrdine = @IdOrdine
IF ISNULL(@IdOrdine, '') = '' OR ISNULL(@AnnoOrdine,'') = ''
BEGIN
RAISERROR('Deadlock', 1, 1205)
END
SET @retry = 0
COMMIT TRANSACTION OrderID
SELECT @AnnoOrdine AS AnnoOrdine, @IdOrdine AS IdOrdine
END TRY
BEGIN CATCH
IF (ERROR_NUMBER() = 1205)
SET @retry = @retry - 1;
ELSE
SET @retry = -1;
IF XACT_STATE() <> 0
ROLLBACK TRANSACTION;
END CATCH
END
This approach reduces deadlocks (none occurred at all), but sometimes I get an EMPTY output parameter.
Tested with 30 concurrent threads (i.e. 30 customer processes inserting orders at the same moment).
Here is a debug log with query durations, in milliseconds: http://nopaste.info/285f558758.html
Is this robust enough for production?
If you do discover that the current solution is creating problems (and it's possible that it won't), then an alternative would be to have a table for each ID type you want to create, with an identity column and a dummy field,
ie:
create table ABtypeID (ABID int identity(1,1), dummy varchar(1))
You can then insert a record into this table and use the built-in functions to retrieve an identity, ie:
insert ABTypeID (dummy) values (null)
select Scope_Identity()
You can delete from these tables as and when you like, and truncate them at year end to reset the ID counters.
You can even wrap the insert in a transaction that gets rolled back - the identity value is not affected by the rollback:
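For example (a minimal T-SQL sketch of that rollback trick, using the ABtypeID table above):
begin tran
insert ABtypeID (dummy) values (null)
select scope_identity() as NewID
rollback tran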
SELECT LAST_INSERT_ID() as id FROM table1
Why does this query sometimes return the last inserted id of another table other than table1?
I call it in Node.js (db-mysql plugin) and I can only do queries.
LAST_INSERT_ID() tells you the most recently auto-generated ID for the entire database connection, not for each individual table, which is also why the query should read only SELECT LAST_INSERT_ID() - without specifying a table.
As soon as you fire off another INSERT query on that connection, it gets overwritten. If you want the generated ID when you insert to some table, you must run SELECT LAST_INSERT_ID() immediately after doing that (or use some API function which does this for you).
If you want the newest ID currently in an arbitrary table, you have to do a SELECT MAX(id) on that table, where id is the name of your ID column. However, this is not necessarily the most recently generated ID, in case that row has been deleted, nor is it necessarily one generated from your connection, in case another connection manages to perform an INSERT between your own INSERT and your selection of the ID.
(For the record, your query actually returns N rows containing the most recently generated ID on that database connection, where N is the number of rows in table1.)
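To illustrate (table1 is from the question; table2 and the name column are hypothetical):
INSERT INTO table1 (name) VALUES ('a'); -- generates an id on this connection
SELECT LAST_INSERT_ID();                -- returns the id from the INSERT above
INSERT INTO table2 (name) VALUES ('b'); -- any later INSERT overwrites it
SELECT LAST_INSERT_ID();                -- now returns the id from table2's INSERT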
SELECT id FROM tableName ORDER BY id DESC LIMIT 1
I usually select the auto-incremented ID field, order by the field descending, and limit the results to 1. For example, in a WordPress database I can get the last ID of the wp_options table by doing:
SELECT option_id FROM wp_options ORDER BY option_id DESC LIMIT 1;
Hope that helps.
Edit - It may make sense to lock the table to avoid updates to it which could result in an incorrect ID being returned:
LOCK TABLES wp_options READ;
SELECT option_id FROM wp_options ORDER BY option_id DESC LIMIT 1;
UNLOCK TABLES;
Try this; it works:
select (auto_increment-1) as lastId
from information_schema.tables
where table_name = 'tableName'
and table_schema = 'dbName'
The easiest way:
select max(id) from table_name;
I only use auto_increment in MySQL or identity(1,1) in SQL Server if I know I'll never care about the generated id.
select last_insert_id() is the easy way out, but dangerous.
A way to handle correlative IDs is to store them in a utility table, something like:
create table correlatives(
last_correlative_used int not null,
table_identifier varchar(5) not null unique
);
You can also create a stored procedure to generate and return the next ID for a given table:
drop procedure if exists next_correlative;
DELIMITER //
create procedure next_correlative(
in in_table_identifier varchar(5)
)
BEGIN
declare next_correlative int default 1;
select last_correlative_used+1 into next_correlative from correlatives where table_identifier = in_table_identifier;
update correlatives set last_correlative_used = next_correlative where table_identifier = in_table_identifier;
select next_correlative from dual;
END //
DELIMITER ;
To use it
call next_correlative('SALES');
This allows you to reserve IDs before inserting a record. Sometimes you want to display the next ID in a form before completing the insertion, and this helps isolate it from other calls.
Here's a test script to mess around with:
create database testids;
use testids;
create table correlatives(
last_correlative_used int not null,
table_identifier varchar(5) not null unique
);
insert into correlatives values(1, 'SALES');
drop procedure if exists next_correlative;
DELIMITER //
create procedure next_correlative(
in in_table_identifier varchar(5)
)
BEGIN
declare next_correlative int default 1;
select last_correlative_used+1 into next_correlative from correlatives where table_identifier = in_table_identifier;
update correlatives set last_correlative_used = next_correlative where table_identifier = in_table_identifier;
select next_correlative from dual;
END //
DELIMITER ;
call next_correlative('SALES');
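One caveat (my own observation, not from the original): under concurrency, two sessions could read the same last_correlative_used before either UPDATE runs. Locking the row first avoids that; a sketch:
start transaction;
select last_correlative_used + 1 into @next
from correlatives
where table_identifier = 'SALES'
for update;
update correlatives set last_correlative_used = @next
where table_identifier = 'SALES';
commit;
select @next;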
If you want to use these workarounds:
SELECT id FROM tableName ORDER BY id DESC LIMIT 1
SELECT MAX(id) FROM tableName
It's recommended to scope them with a WHERE clause after inserting rows; without this, concurrent inserts from other sessions will give you inconsistent results.
In my table inv_id is auto-increment. For my purposes, this worked:
select `inv_id` from `tbl_invoice` ORDER BY `inv_id` DESC LIMIT 1;
I have a table that contains computer login and logoff events. Each row is a separate event with a timestamp, machine name, login or logoff event code and other details. I need to create a SQL procedure that goes through this table, locates each corresponding login and logoff event pair, and inserts new rows into another table containing the machine name, login time, logout time and duration.
So, should I use a cursor to do this, or is there a better way? The database is pretty huge, so efficiency is certainly a concern. Any suggested pseudocode would be great as well.
[edit : pulled from comment]
Source table:
History (
mc_id
, hs_opcode
, hs_time
)
Existing data interpretation:
Login_Event = unique mc_id, hs_opcode = 1, and hs_time is the timestamp
Logout_Event = unique mc_id, hs_opcode = 2, and hs_time is the timestamp
First, your query will be simpler (and faster) if you can order the data in such a way that you don't need a complex subquery to pair up the rows. Since MySQL doesn't support CTE to do this on-the-fly, you'll need to create a temporary table:
CREATE TABLE history_ordered (
seq INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
hs_id INT,
mc_id VARCHAR(255),
mc_loggedinuser VARCHAR(255),
hs_time DATETIME,
hs_opcode INT
);
Then, pull and sort from your original table into the new table:
INSERT INTO history_ordered (
hs_id, mc_id, mc_loggedinuser,
hs_time, hs_opcode)
SELECT
hs_id, mc_id, mc_loggedinuser,
hs_time, hs_opcode
FROM history ORDER BY mc_id, hs_time;
You can now use this query to correlate the data:
SELECT li.mc_id,
li.mc_loggedinuser,
li.hs_time as login_time,
lo.hs_time as logout_time
FROM history_ordered AS li
JOIN history_ordered AS lo
ON lo.seq = li.seq + 1
AND li.hs_opcode = 1;
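A slightly stricter variant (my own addition) also verifies that the paired row is a logout for the same machine, and computes the duration the question asks for:
SELECT li.mc_id,
li.mc_loggedinuser,
li.hs_time AS login_time,
lo.hs_time AS logout_time,
TIMESTAMPDIFF(SECOND, li.hs_time, lo.hs_time) AS duration_seconds
FROM history_ordered AS li
JOIN history_ordered AS lo
ON lo.seq = li.seq + 1
AND lo.mc_id = li.mc_id
AND li.hs_opcode = 1
AND lo.hs_opcode = 2;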
For future inserts, you can use a trigger like below to keep your duration table updated automatically:
DELIMITER $$
CREATE TRIGGER `match_login` AFTER INSERT ON `history`
FOR EACH ROW
BEGIN
DECLARE _user VARCHAR(255);
DECLARE _login DATETIME;
IF NEW.hs_opcode = 2 THEN
SELECT mc_loggedinuser, hs_time FROM history
WHERE hs_time = (
SELECT MAX(hs_time) FROM history
WHERE hs_opcode = 1
AND mc_id = NEW.mc_id
) INTO _user, _login;
INSERT INTO login_duration
SET machine = NEW.mc_id,
logout = NEW.hs_time,
user = _user,
login = _login;
END IF;
END$$
DELIMITER ;
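The trigger assumes a login_duration table exists; a minimal definition compatible with the INSERT above might be:
CREATE TABLE login_duration (
machine VARCHAR(255),
user VARCHAR(255),
login DATETIME,
logout DATETIME
);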
CREATE TABLE dummy (fields you'll select data into, + additional fields as needed)
INSERT INTO dummy (columns from your source)
SELECT * FROM <all the tables where you need data for your target data set>
UPDATE dummy SET col1 = CASE WHEN this = this THEN that END, etc
INSERT INTO targetTable
SELECT all columns FROM dummy
Without any code that you're working on, it'll be hard to tell whether this approach will be useful. There may be some instances where you really need to loop through things, and some instances where this approach can be used instead.
[EDIT: based on poster's comment]
Can you try executing this and see if you get the desired results?
INSERT INTO <your_target_table_here_with_the_three_columns_required>
SELECT li.mc_id, li.hs_time AS login_time, lo.hs_time AS logout_time
FROM
history AS li
INNER JOIN history AS lo
ON li.mc_id = lo.mc_id
AND li.hs_opcode = 1
AND lo.hs_opcode = 2
AND lo.hs_time = (
SELECT min(hs_time) AS hs_time
FROM history
WHERE hs_time > li.hs_time
AND mc_id = li.mc_id
)
Consider a structure where you have a many-to-one (or one-to-many) relationship with a condition (where, order by, etc.) on both tables. For example:
CREATE TABLE tableTwo (
id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
eventTime DATETIME NOT NULL,
INDEX (eventTime)
) ENGINE=InnoDB;
CREATE TABLE tableOne (
id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
tableTwoId INT UNSIGNED NOT NULL,
objectId INT UNSIGNED NOT NULL,
INDEX (objectID),
FOREIGN KEY (tableTwoId) REFERENCES tableTwo (id)
) ENGINE=InnoDB;
and for an example query:
select * from tableOne t1
inner join tableTwo t2 on t1.tableTwoId = t2.id
where objectId = '..'
order by eventTime;
Let's say you index tableOne.objectId and tableTwo.eventTime. If you then explain on the above query, it will show "Using filesort". Essentially, it first applies the tableOne.objectId index, but it can't apply the tableTwo.eventTime index because that index is for the entirety of tableTwo (not the limited result set), and thus it must do a manual sort.
Thus, is there a way to do a cross-table index so it wouldn't have to filesort each time results are retrieved? Something like:
create index ind_t1oi_t2et on tableOne t1
inner join tableTwo t2 on t1.tableTwoId = t2.id
(t1.objectId, t2.eventTime);
Also, I've looked into creating a view and indexing that, but indexing is not supported for views.
The solution I've been leaning towards if cross-table indexing isn't possible is replicating the conditional data in one table. In this case that means eventTime would be replicated in tableOne and a multi-column index would be set up on tableOne.objectId and tableOne.eventTime (essentially manually creating the index). However, I thought I'd seek out other people's experience first to see if that was the best way.
Thanks much!
Update:
Here are some procedures for loading test data and comparing results:
drop procedure if exists populate_table_two;
delimiter #
create procedure populate_table_two(IN numRows int)
begin
declare v_counter int unsigned default 0;
while v_counter < numRows do
insert into tableTwo (eventTime)
values (CURRENT_TIMESTAMP - interval 0 + floor(0 + rand()*1000) minute);
set v_counter=v_counter+1;
end while;
end #
delimiter ;
drop procedure if exists populate_table_one;
delimiter #
create procedure populate_table_one
(IN numRows int, IN maxTableTwoId int, IN maxObjectId int)
begin
declare v_counter int unsigned default 0;
while v_counter < numRows do
insert into tableOne (tableTwoId, objectId)
values (floor(1 +(rand() * maxTableTwoId)),
floor(1 +(rand() * maxObjectId)));
set v_counter=v_counter+1;
end while;
end #
delimiter ;
You can use these as follows to populate 10,000 rows in tableTwo and 20,000 rows in tableOne (with random references to tableTwo and random objectIds between 1 and 5), which took 26.2 and 70.77 seconds respectively to run for me:
call populate_table_two(10000);
call populate_table_one(20000, 10000, 5);
Update 2 (Tested Triggering SQL):
Below is the tried and tested SQL based on daniHp's triggering method. This keeps the dateTime in sync on tableOne when tableOne rows are added or tableTwo is updated. This method should also work for many-to-many relationships if the condition columns are copied to the joining table. In my testing with 300,000 rows in tableOne and 200,000 rows in tableTwo, the old query with similar limits took 0.12 sec, while the new query still shows as 0.00 seconds. Thus, there is a clear improvement, and this method should perform well into the millions of rows and beyond.
alter table tableOne add column tableTwo_eventTime datetime;
create index ind_t1_oid_t2et on tableOne (objectId, tableTwo_eventTime);
drop TRIGGER if exists t1_copy_t2_eventTime;
delimiter #
CREATE TRIGGER t1_copy_t2_eventTime
BEFORE INSERT ON tableOne
for each row
begin
set NEW.tableTwo_eventTime = (select eventTime
from tableTwo t2
where t2.id = NEW.tableTwoId);
end #
delimiter ;
drop TRIGGER if exists upd_t1_copy_t2_eventTime;
delimiter #
CREATE TRIGGER upd_t1_copy_t2_eventTime
BEFORE UPDATE ON tableTwo
for each row
begin
update tableOne
set tableTwo_eventTime = NEW.eventTime
where tableTwoId = NEW.id;
end #
delimiter ;
And the updated query:
select * from tableOne t1
inner join tableTwo t2 on t1.tableTwoId = t2.id
where t1.objectId = 1
order by t1.tableTwo_eventTime desc limit 0,10;
As you know, SQLServer achieves this with indexed views:
indexed views provide additional performance benefits that cannot be
achieved using standard indexes. Indexed views can increase query
performance in the following ways:
Aggregations can be precomputed and stored in the index to minimize
expensive computations during query execution.
Tables can be prejoined and the resulting data set stored.
Combinations of joins or aggregations can be stored.
In SQLServer, to take advantage of this technique, you must query over the view and not over the tables. That means that you should know about the view and indexes.
MySQL does not have indexed views, but you can simulate the behavior with table + triggers + indexes.
Instead of creating a view, you must create an indexed table, a trigger to keep the data table up to date, and then you must query your new table instead of your normalized tables.
You must evaluate if the overhead of write operations offsets the improvement in read operations.
Edited:
Note that it is not always necessary to create a new table. For example, in a 1:N (master-detail) relationship, a trigger can keep a copy of a field from the 'master' table in the 'detail' table. In your case:
CREATE TABLE tableOne (
id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
tableTwoId INT UNSIGNED NOT NULL,
objectId INT UNSIGNED NOT NULL,
desnormalized_eventTime DATETIME NOT NULL,
INDEX (objectID),
FOREIGN KEY (tableTwoId) REFERENCES tableTwo (id)
) ENGINE=InnoDB;
DELIMITER //
CREATE TRIGGER tableOne_desnormalized_eventTime
BEFORE INSERT ON tableOne
for each row
begin
DECLARE v_eventTime DATETIME;
SET v_eventTime =
(select eventTime
from tableTwo
where tableTwo.id = NEW.tableTwoId);
SET NEW.desnormalized_eventTime = v_eventTime;
end //
DELIMITER ;
Notice that this is a before insert trigger.
Now, the query is rewritten as follows:
select * from tableOne t1
inner join tableTwo t2 on t1.tableTwoId = t2.id
where t1.objectId = '..'
order by t1.desnormalized_eventTime;
Disclaimer: not tested.
Cross-table indexing is not possible in MySQL except via the now-defunct Akiban(?) Engine.
I have a rule: "Do not normalize 'continuous' values such as INTs, FLOATs, DATETIMEs, etc." The cost of the JOIN when you need to sort or range-test on the continuous value will kill performance.
DATETIME takes 5 bytes; INT takes 4. So any 'space' argument toward normalizing a datetime is rather poor. It is rare that you would need to 'normalize' a datetime in the off chance that all uses of a particular value were to change.
Maybe I'm wrong, but if this were my application I would not duplicate the data unless I needed to order by two columns in two different tables and the query were hot (required many times). Since there is no clear-cut solution to avoid the filesort, what about this little trick: force the optimizer to use the index on the column in the ORDER BY clause (eventTime)?
select * from tableOne t1
inner join tableTwo t2 use index (eventTime) on t1.tableTwoId = t2.id and t2.eventTime > 0
where t1.objectId = 1
order by t2.eventTime desc limit 0,10;
Notice the use index (eventTime) hint and the t2.eventTime > 0 condition.
Its EXPLAIN output shows that the optimizer used the index on eventTime instead of a filesort:
id select_type table type  possible_keys       key        key_len ref         rows Extra
1  SIMPLE      t2    range eventTime           eventTime  5       NULL        5000 Using where; Using index
1  SIMPLE      t1    ref   objectId,tableTwoId tableTwoId 4       tests.t2.id 1    Using where