MySQL: How to update a unique key on duplicate unique key

Given the following table structure, how can I change the value of primary to 0 when a duplicate unique index is found?
CREATE TABLE `ncur` (
`user_id` INT NOT NULL,
`rank_id` INT NOT NULL,
`primary` TINYINT DEFAULT NULL,
PRIMARY KEY (`user_id`, `rank_id`),
UNIQUE (`user_id`, `primary`)
);
So, when I run a query like this:
UPDATE `ncur` SET `primary` = 1 WHERE `user_id` = 4 AND `rank_id` = 5;
When the user_id-primary unique constraint would be violated, I want it to first set all primary values for that user_id to NULL, and then complete the update by updating the row it had found.

I am not as familiar with MySQL as I am with Oracle; however, I think this query should work for you:
UPDATE `ncur` a
SET `primary` = (
/* 1st Subquery */
SELECT 1 FROM (SELECT * FROM `ncur`) b
WHERE b.`user_id` = a.`user_id` AND b.`rank_id` = a.`rank_id`
AND a.`rank_id` = 5
UNION ALL
/* 2nd Subquery */
SELECT 0 FROM (SELECT * FROM `ncur`) b
WHERE b.`user_id` = a.`user_id` AND b.`rank_id` <> 5 AND a.`rank_id` <> 5
GROUP BY `user_id`
HAVING COUNT(*) = 1
)
WHERE `user_id` = 4
Justification:
The query updates all the records that have user_id = 4.
For each such record, primary is set to 1, 0, or NULL, depending on the value of rank_id in that record and on how many other records with the same user_id exist in the table.
The expression that supplies the value for primary consists of two subqueries, at most one of which returns a value in any given circumstance.
1st Subquery: returns 1 for the record with rank_id = 5; otherwise it returns NULL.
2nd Subquery: returns 0 for the records with rank_id != 5 if there is only one such record in the table; otherwise it returns NULL.
Please note: if the query is run while there are no records with rank_id = 5, it will still update the other records according to the rules specified above. If this is not desired, the condition in the parent query must be changed from:
WHERE `user_id` = 4
to:
WHERE `user_id` = 4 AND
EXISTS(SELECT * FROM (SELECT * FROM `ncur`) b WHERE b.`user_id` = 4 AND b.`rank_id` = 5)
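If two statements are acceptable, the behaviour the question describes (clear any existing flag, then set the new one) can also be done directly inside one transaction, without the UNION ALL subquery. A minimal sketch, using SQLite via Python's sqlite3 so it is runnable anywhere; the same two UPDATEs work in MySQL, since a UNIQUE index allows multiple NULLs in both engines:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE ncur (
    user_id   INTEGER NOT NULL,
    rank_id   INTEGER NOT NULL,
    "primary" INTEGER DEFAULT NULL,
    PRIMARY KEY (user_id, rank_id),
    UNIQUE (user_id, "primary"))""")
con.executemany("INSERT INTO ncur VALUES (?, ?, ?)",
                [(4, 3, 1), (4, 5, None), (7, 1, 1)])

with con:  # one transaction: clear any existing flag, then set the new one
    con.execute('UPDATE ncur SET "primary" = NULL WHERE user_id = ?', (4,))
    con.execute('UPDATE ncur SET "primary" = 1 '
                'WHERE user_id = ? AND rank_id = ?', (4, 5))

rows = con.execute('SELECT user_id, rank_id, "primary" FROM ncur '
                   'ORDER BY user_id, rank_id').fetchall()
print(rows)  # row (4, 5) now carries the flag; the old flag on (4, 3) is cleared
```

Because the NULL-ing and the flag-setting share a transaction, no other session ever observes a user with no flagged rank.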

MySQL - Select only the rows that have not been selected in the last read

Problem description
I have a table, say trans_flow:
CREATE TABLE trans_flow (
id BIGINT(20) AUTO_INCREMENT PRIMARY KEY,
card_no VARCHAR(50) DEFAULT NULL,
money INT(20) DEFAULT NULL
)
New data is inserted into this table constantly.
Now, I want to fetch only the rows that have not been fetched in the last query. For example, at 5:00, id ranges from 1 to 100, and I read the rows 80 - 100 and do some processing. Then, at 5:01, the id comes to 150, and I want to get exactly the rows 101 - 150. Otherwise, the processing program will read in old and already processed data. Note that such queries are committed continuously. From a certain perspective, I want to implement "streaming process" on MySQL.
A tentative idea
I have a simple but maybe ugly solution. I create an auxiliary table query_cursor which stores the beginning and end ids of one query:
CREATE TABLE query_cursor (
task_id VARCHAR(20) PRIMARY KEY COMMENT 'Specify which task is reading this table',
first_row_id BIGINT(20) DEFAULT NULL,
last_row_id BIGINT(20) DEFAULT NULL
)
During each query, I first update the query range stored in this table by:
UPDATE query_cursor
SET first_row_id = (SELECT last_row_id + 1 FROM query_cursor WHERE task_id = 'xxx'),
last_row_id = (SELECT MAX(id) FROM trans_flow)
WHERE task_id = 'xxx'
And then, doing query on table trans_flow using stored cursors:
SELECT * FROM trans_flow
WHERE id BETWEEN (SELECT first_row_id FROM query_cursor WHERE task_id = 'xxx')
AND (SELECT last_row_id FROM query_cursor WHERE task_id = 'xxx')
Question for help
Is there a simpler and more elegant implementation that can achieve the same effect (the best if no need to use an auxiliary table)? The version of MySQL is 5.7.
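For what it's worth, the cursor-table idea from the question does work end to end. A minimal sketch of the pattern, using SQLite via Python's sqlite3 so it is runnable anywhere (the SQL is the same shape as above, and the task id 'xxx' is just the question's placeholder):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE trans_flow ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, card_no TEXT, money INT)")
con.execute("CREATE TABLE query_cursor ("
            "task_id TEXT PRIMARY KEY, first_row_id INT, last_row_id INT)")
con.execute("INSERT INTO query_cursor VALUES ('xxx', NULL, 0)")

def fetch_new(task_id="xxx"):
    # Advance the window to cover everything inserted since the last call...
    con.execute("""
        UPDATE query_cursor
           SET first_row_id = last_row_id + 1,
               last_row_id  = (SELECT COALESCE(MAX(id), last_row_id)
                               FROM trans_flow)
         WHERE task_id = ?""", (task_id,))
    # ...then return only the rows inside that window.
    return con.execute("""
        SELECT id FROM trans_flow
        WHERE id BETWEEN (SELECT first_row_id FROM query_cursor WHERE task_id = ?)
                     AND (SELECT last_row_id  FROM query_cursor WHERE task_id = ?)""",
        (task_id, task_id)).fetchall()

con.executemany("INSERT INTO trans_flow (card_no, money) VALUES (?, ?)",
                [("c1", 10)] * 5)
first = [r[0] for r in fetch_new()]   # first batch: ids 1-5
con.executemany("INSERT INTO trans_flow (card_no, money) VALUES (?, ?)",
                [("c2", 20)] * 3)
second = [r[0] for r in fetch_new()]  # only the new ids, 6-8
print(first, second)
```

Each call advances the window before reading it, so rows are never delivered twice even when the UPDATE and SELECT are issued repeatedly.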

MySQL self join to flag duplicate rows - better way to do this?

I have a table that occasionally has duplicate row values, so I want to update anything except the first one and flag it as a duplicate. Currently I'm using this but it can be very slow:
UPDATE _gtemp X
JOIN _gtemp Y
ON CONCAT(X.gt_spid, "-", X.gt_cov) = CONCAT(Y.gt_spid, "-", Y.gt_cov)
AND Y.gt_dna = 0
AND Y.gt_gtid < X.gt_gtid
SET X.gt_dna = 1;
gt_spid is a numerical ID, and gt_cov is CHAR(3). I have an index on gt_spid and a 2nd index on gt_spid, gt_cov. At times this table can be upwards of 250,000 rows, but even at 30,000 it takes forever.
Is there a better way to accomplish this? I can change the table as needed.
CREATE TABLE `_gtemp` (
`gt_gtid` int(11) NOT NULL AUTO_INCREMENT,
`gt_group` varchar(10) DEFAULT NULL,
`gt_spid` int(11) DEFAULT NULL,
`gt_cov` char(3) DEFAULT NULL,
`gt_dna` tinyint(1) DEFAULT '0',
PRIMARY KEY (`gt_gtid`),
KEY `spid` (`gt_spid`),
KEY `spidcov` (`gt_spid`,`gt_cov`) USING HASH
)
The way you have used CONCAT makes the MySQL optimizer lose its indexes, resulting in a very slow query.
That's why you need to replace the CONCAT with plain AND conditions, like below:
UPDATE _gtemp X
JOIN _gtemp Y
  ON X.gt_spid = Y.gt_spid
  AND X.gt_cov = Y.gt_cov
  AND Y.gt_dna = 0
  AND Y.gt_gtid < X.gt_gtid
SET X.gt_dna = 1;
You can eliminate the CONCAT in the ON clause and replace it with separate AND conditions, as follows.
I have also moved one restriction from the ON clause to the WHERE clause.
Also add an index to gt_dna:
UPDATE _gtemp X
JOIN _gtemp Y
ON X.gt_spid = Y.gt_spid
AND X.gt_cov = Y.gt_cov
AND Y.gt_dna = 0
SET X.gt_dna = 1
WHERE Y.gt_gtid < X.gt_gtid
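The effect of the rewritten join can be checked on a small sample. A sketch in Python with SQLite, which has no multi-table UPDATE, so the same "flag every row that has an earlier twin" logic is expressed with a correlated EXISTS instead (column names follow the question):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE _gtemp ("
            "gt_gtid INTEGER PRIMARY KEY AUTOINCREMENT, "
            "gt_spid INT, gt_cov TEXT, gt_dna INT DEFAULT 0)")
con.executemany("INSERT INTO _gtemp (gt_spid, gt_cov) VALUES (?, ?)",
                [(1, "AAA"), (1, "AAA"), (2, "BBB"), (1, "AAA")])

# Flag every row for which an earlier row with the same (gt_spid, gt_cov)
# exists, i.e. everything except the first occurrence of each pair.
con.execute("""
    UPDATE _gtemp
       SET gt_dna = 1
     WHERE EXISTS (SELECT 1 FROM _gtemp Y
                   WHERE Y.gt_spid = _gtemp.gt_spid
                     AND Y.gt_cov  = _gtemp.gt_cov
                     AND Y.gt_gtid < _gtemp.gt_gtid)""")

flags = [r[0] for r in con.execute(
    "SELECT gt_dna FROM _gtemp ORDER BY gt_gtid")]
print(flags)  # only the duplicates (rows 2 and 4) are flagged
```

Note that MySQL itself rejects a subquery on the table being updated (error 1093), which is exactly why the answers above use the self-join form instead.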

How do I delete the first matching record in Table B for each record in Table A?

Table A contains multiple records that should be deleted from Table B. However, there can be multiple records in Table B that match a single record in Table A. I only want to delete the first matching record in Table B for each record in Table A. If there are 50 records in Table A then a maximum of 50 records should be deleted from Table B. I'm using the SQL statement below which is deleting more records from Table B than are listed in Table A due to multiple matches. I can not further restrict the matching criteria in my statement due to limitations in the data.
DELETE FROM [#DraftInvoiceRecords] FROM [#DraftInvoiceRecords]
INNER JOIN [#ReversedRecords]
ON [#DraftInvoiceRecords].employee = [#ReversedRecords].employee
and [#DraftInvoiceRecords].amount = [#ReversedRecords].amount
and [#DraftInvoiceRecords].units = [#ReversedRecords].units
You need some way to distinguish the rows to delete from the rows to keep. I've used someOtherColumn in the below to achieve this:
create table #DraftInvoiceRecords (
employee int not null,
amount int not null,
units int not null,
someOtherColumn int not null
)
create table #ReversedRecords (
employee int not null,
amount int not null,
units int not null
)
insert into #DraftInvoiceRecords (employee,amount,units,someOtherColumn)
select 1,1,1,1 union all
select 1,1,1,2
insert into #ReversedRecords (employee,amount,units)
select 1,1,1
delete from dir
from
#DraftInvoiceRecords dir
inner join
#ReversedRecords rr
on
dir.employee = rr.employee and
dir.amount = rr.amount and
dir.units = rr.units
left join
#DraftInvoiceRecords dir_anti
on
dir.employee = dir_anti.employee and
dir.amount = dir_anti.amount and
dir.units = dir_anti.units and
dir.someOtherColumn > dir_anti.someOtherColumn --It's this condition here that allows us to distinguish the rows
where
dir_anti.employee is null
select * from #DraftInvoiceRecords
drop table #DraftInvoiceRecords
drop table #ReversedRecords
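The same "at most one deletion per match" idea can be expressed without the extra column when the engine exposes a row identifier. A hedged sketch in Python with SQLite, using its implicit rowid in place of someOtherColumn (table names shortened from the question; like the example above, it deletes one Table B row per distinct matching key):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE draft (employee INT, amount INT, units INT, other INT)")
con.execute("CREATE TABLE reversed (employee INT, amount INT, units INT)")
con.executemany("INSERT INTO draft VALUES (?, ?, ?, ?)",
                [(1, 1, 1, 1), (1, 1, 1, 2)])
con.execute("INSERT INTO reversed VALUES (1, 1, 1)")

# For each matching (employee, amount, units) group, delete only the row
# with the smallest rowid -- i.e. the "first" matching record.
con.execute("""
    DELETE FROM draft
     WHERE rowid IN (SELECT MIN(d.rowid)
                     FROM draft d
                     JOIN reversed r ON d.employee = r.employee
                                    AND d.amount   = r.amount
                                    AND d.units    = r.units
                     GROUP BY d.employee, d.amount, d.units)""")

left = con.execute("SELECT employee, amount, units, other FROM draft").fetchall()
print(left)  # only the second duplicate survives
```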

MySQL: How to optimize this simple GROUP BY+ORDER BY query?

I have one mysql table:
CREATE TABLE IF NOT EXISTS `test` (
`Id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`SenderId` int(10) unsigned NOT NULL,
`ReceiverId` int(10) unsigned NOT NULL,
`DateSent` datetime NOT NULL,
`Notified` tinyint(1) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`Id`),
KEY `ReceiverId_SenderId` (`ReceiverId`,`SenderId`),
KEY `SenderId` (`SenderId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
The table is populated with 10,000 random rows for testing using the following procedure:
DELIMITER //
CREATE DEFINER=`root`@`localhost` PROCEDURE `FillTest`(IN `cnt` INT)
BEGIN
DECLARE i INT DEFAULT 1;
DECLARE intSenderId INT;
DECLARE intReceiverId INT;
DECLARE dtDateSent DATE;
DECLARE blnNotified INT;
WHILE (i<=cnt) DO
SET intSenderId = FLOOR(1 + (RAND() * 50));
SET intReceiverId = FLOOR(51 + (RAND() * 50));
SET dtDateSent = str_to_date(concat(floor(1 + rand() * (12-1)),'-',floor(1 + rand() * (28 -1)),'-','2008'),'%m-%d-%Y');
SET blnNotified = FLOOR(1 + (RAND() * 2))-1;
INSERT INTO test (SenderId, ReceiverId, DateSent, Notified)
VALUES(intSenderId,intReceiverId,dtDateSent, blnNotified);
SET i=i+1;
END WHILE;
END//
DELIMITER ;
CALL `FillTest`(10000);
The problem:
I need to write a query which will group by SenderId, ReceiverId and return, for each group, the row with the highest Id, taking the first 100 such rows ordered by Id in ascending order.
I played with GROUP BY, ORDER BY and MAX(Id), but the query was too slow, so I came up with this query:
SELECT SQL_NO_CACHE t1.*
FROM test t1
LEFT JOIN test t2 ON (t1.ReceiverId = t2.ReceiverId AND t1.SenderId = t2.SenderId AND t1.Id < t2.Id)
WHERE t2.Id IS NULL
ORDER BY t1.Id ASC
LIMIT 100;
The above query returns the correct data, but it becomes too slow when the test table has more than 150,000 rows. On 150,000 rows the above query needs 7 seconds to complete. I expect the test table to have between 500,000 and 1M rows, and the query needs to return the correct data in less than 3 seconds. If it's not possible to fetch the correct data in less than 3 seconds, then I need it to fetch the data using the fastest query possible.
So, how can the above query be optimized so that it runs faster?
Reasons why this query may be slow:
- It's a lot of data, and lots of it may be returned: it returns the last record for each SenderId/ReceiverId combination.
- The division of the data matters (many Sender/Receiver combinations, or relatively few of them, but with multiple 'versions').
- The whole result set must be sorted by MySQL, because you need the first 100 records, sorted by Id.
These make it hard to optimize this query without restructuring the data. A few suggestions to try:
- You could try using NOT EXISTS, though I doubt it will help.
SELECT SQL_NO_CACHE t1.*
FROM test t1
WHERE NOT EXISTS
(SELECT 'x'
FROM test t2
WHERE t1.ReceiverId = t2.ReceiverId AND t1.SenderId = t2.SenderId AND t1.Id < t2.Id)
ORDER BY t1.Id ASC
LIMIT 100;
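Whichever variant is used, the "keep the row with no higher Id in its group" logic is easy to sanity-check on a toy data set. A sketch in Python with SQLite (same query shape, minus the MySQL-specific SQL_NO_CACHE; the sample values are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test ("
            "Id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "SenderId INT, ReceiverId INT)")
con.executemany("INSERT INTO test (SenderId, ReceiverId) VALUES (?, ?)",
                [(1, 51), (1, 51), (2, 52), (1, 51)])

# Keep a row only if no row in the same (SenderId, ReceiverId) group
# has a higher Id, i.e. the last row of each group.
rows = con.execute("""
    SELECT t1.Id, t1.SenderId, t1.ReceiverId
    FROM test t1
    WHERE NOT EXISTS (SELECT 'x' FROM test t2
                      WHERE t1.ReceiverId = t2.ReceiverId
                        AND t1.SenderId   = t2.SenderId
                        AND t1.Id < t2.Id)
    ORDER BY t1.Id ASC
    LIMIT 100""").fetchall()
print(rows)  # the last Id of each group, ascending
```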
- You could try using proper indexes on ReceiverId, SenderId and Id. Experiment with creating a combined index on the three columns. Try two versions, one with Id being the first column, and one with Id being the last.
With slight database modifications:
- You could save a combination of SenderId/ReceiverId in a separate table with a LastId pointing to the record you want.
- You could save a 'PreviousId' with each record, keeping it NULL for the last record per Sender/Receiver. You only need to query the records where previousId is null.

How to add and populate a sequence column to a link table

Assuming I have a table like the one below:
create table filetype_filestatus (
id integer(11) not null auto_increment,
file_type_id integer(11) not null,
file_status_id integer(11) not null,
primary key (id)
)
I want to add a sequence column like so:
alter table filetype_filestatus add column sequence integer(11) not null;
alter table filetype_filestatus add unique key idx1 (file_type_id, file_status_id, sequence);
Now I want to add the column, which is straightforward, and populate it with some default values that satisfy the unique key.
The sequence column is to allow the user to arbitrarily order the display of file_status for a particular file_type. I'm not too concerned by the initial order since that can be revised in the application.
Ideally I would end up with something like:
FileType FileStatus Sequence
1 1 1
1 2 2
1 3 3
2 2 1
2 2 2
The best I can think of is something like:
update filetype_filestatus set sequence = file_type_id * 1000 + file_status_id;
Are there better approaches?
Hmm, I believe this should work:
UPDATE filetype_filestatus as a
SET sequence = (SELECT COALESCE(MAX(b.sequence), 0)
FROM filetype_filestatus as b
WHERE b.file_type_id = a.file_type_id) + 1
WHERE sequence = 0
I'd recommend adding the new column, running the alter table statement (and getting the default of 0), then running the update statement, then adding the constraint (well, you have to anyway). Anything that gets touched is updated to a sequence greater than 0, so this can safely be run multiple times, too.
EDIT:
As @Dems has pointed out, the subquery is run before the update, and so the above doesn't actually work for this purpose. It does work on single-row inserts (which doesn't help at all here).
EDIT:
Gah, you have an id column; this works just fine (and yes, I tested this one first).
UPDATE filetype_filestatus as a
SET sequence = (SELECT COALESCE(COUNT(*), 0)
FROM filetype_filestatus as b
WHERE b.file_type_id = a.file_type_id
AND b.id < a.id) + 1
WHERE sequence = 0
Don't know about the performance implications, though.
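The id-based ranking above can be verified on a toy table. A sketch in Python with SQLite, using the table name rather than the alias for the outer reference (the correlated COUNT is otherwise the same; the sample rows mirror the question's desired output):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE filetype_filestatus ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "file_type_id INT, file_status_id INT, sequence INT DEFAULT 0)")
con.executemany(
    "INSERT INTO filetype_filestatus (file_type_id, file_status_id) "
    "VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3)])

# Rank each row within its file_type_id group by counting rows
# with a smaller id, then adding 1.
con.execute("""
    UPDATE filetype_filestatus
       SET sequence = (SELECT COUNT(*)
                       FROM filetype_filestatus b
                       WHERE b.file_type_id = filetype_filestatus.file_type_id
                         AND b.id < filetype_filestatus.id) + 1
     WHERE sequence = 0""")

seqs = con.execute("SELECT file_type_id, file_status_id, sequence "
                   "FROM filetype_filestatus ORDER BY id").fetchall()
print(seqs)  # sequence restarts at 1 for each file_type_id
```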
If all you need are "some values that conform to idx1", why not just copy the id field? It is, after all, unique...
UPDATE
filetype_filestatus
SET
sequence = id;
EDIT
How to get sequential values, based on the OP's changes to the question being asked:
ROW_NUMBER() is not available in MySQL (prior to version 8.0), and it is also my understanding that you can't reference the table being updated in the source query.
create temporary table temp_filetype_filestatus (
id integer(11) not null auto_increment,
file_type_id integer(11) not null,
file_status_id integer(11) not null,
PRIMARY KEY (file_type_id, file_status_id),
KEY (id)
)
INSERT INTO temp_filetype_filestatus (
file_type_id,
file_status_id
)
SELECT
file_type_id,
file_status_id
FROM
filetype_filestatus
ORDER BY
file_type_id,
file_status_id
-- Update Option 1
------------------
UPDATE
filetype_filestatus
SET
sequence
=
(SELECT id FROM temp_filetype_filestatus
WHERE file_type_id = filetype_filestatus.file_type_id
AND file_status_id = filetype_filestatus.file_status_id)
-
(SELECT id FROM temp_filetype_filestatus
WHERE file_type_id = filetype_filestatus.file_type_id
ORDER BY file_status_id ASC LIMIT 1)
+
1
-- Update Option 2
------------------
UPDATE
filetype_filestatus
SET
sequence
=
(SELECT COUNT(*) FROM temp_filetype_filestatus
WHERE file_type_id = filetype_filestatus.file_type_id
AND file_status_id <= filetype_filestatus.file_status_id)