Duplicates in MySQL based on date and status

This is my current query to find duplicate phone numbers.
SELECT
id, cust_num, entry_date
FROM
nums
GROUP BY
cust_num
HAVING
count(*) >= 2
ORDER BY
id
DESC
However, now I'm looking to update them all in one go, based on some criteria.
I only want to update the newer ones, those with a higher id than the original.
And only if they have a certain status.
Here's an example of what I would update, based on a list of duplicates pulled from the database.
ID | Num | date   | status
1    555   Sep-12   NEW     (this row wouldn't update: first instance of 555)
33   555   Oct-12   NEW     (this row would update: duplicate with NEW as status)
42   333   Dec-12   NEW     (this row wouldn't update: first instance of 333)
5    555   Jan-13   ACTIVE  (this row wouldn't update: duplicate but wrong status)
66   333   Feb-14   NEW     (this row would update: duplicate with NEW as status)
6    555   Jan-13   NEW     (this row would update: duplicate with NEW as status)
77   333   Mar-15   ACTIVE  (this row wouldn't update: duplicate but wrong status)
So the real question is: what query would I use to pull all the duplicates like this, and then update them based on their status?

UPDATE nums n SET ... WHERE n.status='NEW' AND (select count(*) from nums where num = n.num and id < n.id and status = 'NEW') > 0;
Add in your SET statement for whatever you want to update.

Here is the select:
select * from mytable as m where `status` = 'NEW' and exists
(select * from mytable as mm where mm.`id` < m.`id` and mm.`num` = m.`num`)
To update the status to UPDATED:
update mytable as m set m.`status` = 'UPDATED' where `status` = 'NEW' and exists
(select * from mytable as mm where mm.`id` < m.`id` and mm.`num` = m.`num`)
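To check the EXISTS approach against the sample rows from the question, here is a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so the dialect differs slightly; the table name mytable and the column names are taken from the answer above, and the target status 'UPDATED' is just an example value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, num INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO mytable (id, num, status) VALUES (?, ?, ?)",
    [
        (1, 555, "NEW"), (33, 555, "NEW"), (42, 333, "NEW"), (5, 555, "ACTIVE"),
        (66, 333, "NEW"), (6, 555, "NEW"), (77, 333, "ACTIVE"),
    ],
)

# Mark a row only if it is NEW and an earlier row (lower id) has the same num.
conn.execute(
    """
    UPDATE mytable
       SET status = 'UPDATED'
     WHERE status = 'NEW'
       AND EXISTS (SELECT 1
                     FROM mytable AS mm
                    WHERE mm.id < mytable.id
                      AND mm.num = mytable.num)
    """
)

updated_ids = [row[0] for row in conn.execute(
    "SELECT id FROM mytable WHERE status = 'UPDATED' ORDER BY id")]
print(updated_ids)  # [6, 33, 66]
```

Rows 1 and 42 are left alone because they are the first instance of their number, and rows 5 and 77 are skipped because their status is ACTIVE.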

Related

MYSQL How can I update a row in a table after updating the same table?

Each row of my table has a child. For example, ID 1 is the parent of 11, and 11 is the parent of 111. Each row has a balance. If I update the balance of 111, I need the balance of 11 to update, and the balance of 1 too.
For example: UPDATE ACCOUNTS SET value = 100 WHERE ID = 111
In this case the value of 11 is going to be 100, and so is the value of 1.
Then I can do something like this: UPDATE ACCOUNTS SET value = value + 150 WHERE ID = 11; in this case the value of 11 is going to be 250, the value of 1 will be 250, and the value of 111 should stay 100. I need to do something like that.
I'm using MySQL.
As you mentioned in the comments, MySQL will generally not allow you to define an update trigger which itself would trigger more updates on the same table. One option here, assuming you are using MySQL 8+, would be to define a recursive CTE which targets all records intended for the update:
WITH RECURSIVE cte (id, value, parent_id) AS (
    -- anchor: the row being updated
    SELECT id, value, parent_id
    FROM ACCOUNTS
    WHERE id = 111
    UNION ALL
    -- recursive step: walk up to each parent
    SELECT t1.id, t1.value, t1.parent_id
    FROM ACCOUNTS t1
    INNER JOIN cte t2
        ON t1.id = t2.parent_id
)
UPDATE ACCOUNTS a1
INNER JOIN cte a2
    ON a1.id = a2.id
SET a1.value = 100;  -- qualified, because cte also has a value column
This assumes that you would want to do the same update logic for each matching id in the hierarchy. The CTE will generate all records starting from id = 111, and working backwards up the tree to the root.
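A quick way to see the upward walk in action is the sketch below. It runs on SQLite through Python's sqlite3 module rather than on MySQL 8, so the UPDATE is written with IN instead of a join; the parent_id column and the extra sibling row 12 are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, parent_id INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [(1, None, 0), (11, 1, 0), (111, 11, 0), (12, 1, 0)],
)

# Start at id 111 and follow parent_id up to the root, then update every row on that path.
conn.execute(
    """
    WITH RECURSIVE chain (id, parent_id) AS (
        SELECT id, parent_id FROM accounts WHERE id = 111
        UNION ALL
        SELECT a.id, a.parent_id
          FROM accounts AS a
          JOIN chain AS c ON a.id = c.parent_id
    )
    UPDATE accounts SET value = 100 WHERE id IN (SELECT id FROM chain)
    """
)

values = dict(conn.execute("SELECT id, value FROM accounts ORDER BY id"))
print(values)  # {1: 100, 11: 100, 12: 0, 111: 100}
```

The sibling account 12 keeps its value because it is not an ancestor of 111.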

MySQL to not return same results

I have a table with list of customers:
customer
c_id  c_name  c_email      c_role
1     abc1    a1@abc.com   Dev
2     abc2    a2@abc.com   Dev
3     abc3    a3@abc.com   Dev
4     abc4    a4@abc.com   Dev
5     abc5    a5@abc.com   Dev
6     abc6    a6@abc.com   Dev
7     abc7    a7@abc.com   Dev
8     abc8    a8@abc.com   Dev
9     abc9    a9@abc.com   Dev
I query the table in the following way:
select * from customer where c_role = 'Dev' order by c_id limit 2;
So, I get the results with:
c_id  c_name  c_email      c_role
1     abc1    a1@abc.com   Dev
2     abc2    a2@abc.com   Dev
The business requirement says that if any records were accessed by a set of users in the last 3 days, those records should not be returned by subsequent queries.
So, if the user runs the query again within the next 3 days:
select * from customer where c_role = 'Dev' order by c_id limit 2;
The result should be:
c_id  c_name  c_email      c_role
3     abc3    a3@abc.com   Dev
4     abc4    a4@abc.com   Dev
Can anyone help me create this kind of rule in MySQL?
Adding a new column to the current table is not going to help you.
You will have to create another table where you store every c_id a user has accessed, along with the datetime when the query was executed.
CREATE TABLE IF NOT EXISTS `access_record` (
    `id` INT(11) NOT NULL AUTO_INCREMENT,
    `c_id` INT(11) NOT NULL,     -- id of the record which the user accessed
    `user_id` INT(11) NOT NULL,  -- id of the user who accessed the record
    `accessed_at` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (`id`)
);
So whenever the user runs the next query you can use this table to know if user has already accessed a record or not and then use those c_ids to exclude them from next result set.
SELECT
    c.c_id, c.c_role, c.c_name, c.c_email
FROM
    customer AS c
WHERE
    c.c_role = 'Dev'
    AND c.c_id NOT IN (
        SELECT
            ar.c_id
        FROM
            access_record AS ar
        WHERE ar.user_id = 1  -- of course this will change with each user (the current user in your case, I assume)
            AND ar.accessed_at > DATE_SUB(NOW(), INTERVAL 3 DAY)
    )
ORDER BY c.c_id LIMIT 2;
This will give you all records which were not accessed by specific user within last 3 days.
I hope this helps.
Answering @dang's question in the comments:
How do I populate access_record when a query runs?
Once you have fetched the records, extract the c_ids from them and insert those c_ids into the access_record table.
In MySQL this query should do the trick:
INSERT INTO access_record (c_id, user_id)
SELECT
    c.c_id, 1  -- user_id of the user who is fetching records
FROM
    customer AS c
WHERE
    c.c_role = 'Dev'
    AND c.c_id NOT IN (
        SELECT
            ar.c_id
        FROM
            access_record AS ar
        WHERE ar.user_id = 1  -- of course this will change with each user (the current user in your case, I assume)
            AND ar.accessed_at > DATE_SUB(NOW(), INTERVAL 3 DAY)
    )
ORDER BY c.c_id LIMIT 2;
You can also fetch those c_ids with one query and then use a second query to insert them into the access_record table.
If you have all your records fetched in $records then
$c_ids = array_column($records, 'c_id'); // get all c_ids from the fetched record array
Now run a query to insert all those c_ids.
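The fetch-then-record flow described above can be sketched end to end as follows. This is only an illustration on SQLite via Python's sqlite3 module, not MySQL: datetime('now', '-3 days') stands in for DATE_SUB(NOW(), INTERVAL 3 DAY), and the helper name fetch_batch is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (c_id INTEGER PRIMARY KEY, c_name TEXT, c_role TEXT);
    CREATE TABLE access_record (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        c_id INTEGER NOT NULL,
        user_id INTEGER NOT NULL,
        accessed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    """
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, 'Dev')",
    [(i, f"abc{i}") for i in range(1, 10)],
)


def fetch_batch(user_id, limit=2):
    """Return the next customers this user has not seen in the last 3 days, and log them."""
    rows = conn.execute(
        """
        SELECT c.c_id, c.c_name
          FROM customer AS c
         WHERE c.c_role = 'Dev'
           AND c.c_id NOT IN (SELECT ar.c_id
                                FROM access_record AS ar
                               WHERE ar.user_id = ?
                                 AND ar.accessed_at > datetime('now', '-3 days'))
         ORDER BY c.c_id
         LIMIT ?
        """,
        (user_id, limit),
    ).fetchall()
    # Record what was handed out so the next call skips these rows.
    conn.executemany(
        "INSERT INTO access_record (c_id, user_id) VALUES (?, ?)",
        [(c_id, user_id) for c_id, _ in rows],
    )
    return [c_id for c_id, _ in rows]


first = fetch_batch(1)
second = fetch_batch(1)
other_user = fetch_batch(2)
print(first, second, other_user)  # [1, 2] [3, 4] [1, 2]
```

Because access is tracked per user_id, a different user still starts from the first two customers.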
I would add an extra table with users and accessdate. And make the business logic update those on access. For example:
user | accessdate | c_id
Your 'customer' table is data about customers. That is all it should have.
However, your selection criterion is not really what it appears to be. What the business requirements want you to implement is a feed, or pipeline, with the selection acting as a consumer that is fed un-accessed customers.
Each user (or group of users, i.e. 'set of users') needs its own feed, but that can be managed by a single table with a distinguishing field. So we need a user_group table to group your 'set of users'.
// user_group
g_id g_data
201 abc1
202 abc2
203 abc3
We will need to populate customer_feed with the access timestamps for each customer. We can add foreign keys so that feed rows are deleted when their customer or user_group is deleted, but we will need to update the customer feed ourselves when we use it.
create table customer_feed (
    c_id int(11) not null,
    g_id int(11) not null,
    at datetime not null,  -- datetime, not timestamp: the seed date used below predates 1970
    primary key (c_id, g_id),
    constraint customer_fk foreign key (c_id) references customer (c_id) on delete cascade,
    constraint user_group_fk foreign key (g_id) references user_group (g_id) on delete cascade
);
// customer_feed
c_id g_id at
101 201 2018-11-26 07:40:21
102 201 2018-11-26 07:40:21
103 201 2018-11-26 07:40:22
When we want to read the customer data, we must do three queries:
1. Update the feed for the current user-group.
2. Get the customers from the feed.
3. Mark the customers in the feed as consumed.
So, let's say we are using user_group 201.
When we update the feed, any customers new to the feed should be readable straight away, so we give them a very early timestamp. We can commemorate the Battle of Hastings...
-- 1. update the customer_feed for user_group 201
insert into customer_feed (c_id, g_id, at)
select c.c_id, 201, TIMESTAMP('1066-10-14')
from customer c left join customer_feed f
  on c.c_id = f.c_id and f.g_id = 201
where f.c_id is null;
We select from the customer table and the feed. We only accept records whose last access was more than three days ago. This is the query you originally had, but with the feed restrictions.
-- 2. read from feed for user_group 201
select c.* from customer c, customer_feed f where c.c_role = 'Dev'
  and f.g_id = 201 and f.c_id = c.c_id
  and f.at < date_sub(now(), interval 3 day)
order by c.c_id limit 2;
Now we need to mark the values from the feed as consumed. So we gather the c_ids we have selected into a list, e.g. 102, 103, and mark them as consumed.
-- 3. Mark the feed as consumed for user_group 201
update customer_feed set at = now() where g_id = 201 and c_id in (102, 103);
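The three feed steps can be tried out with the sketch below. It is an approximation on SQLite through Python's sqlite3 module, so timestamps are stored as text and datetime('now', '-3 days') replaces date_sub(now(), interval 3 day); the function name read_feed and the five sample customers are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (c_id INTEGER PRIMARY KEY, c_role TEXT);
    CREATE TABLE customer_feed (
        c_id INTEGER NOT NULL,
        g_id INTEGER NOT NULL,
        at TEXT NOT NULL,
        PRIMARY KEY (c_id, g_id)
    );
    """
)
conn.executemany("INSERT INTO customer VALUES (?, 'Dev')", [(i,) for i in range(101, 106)])


def read_feed(g_id, limit=2):
    # 1. Seed the feed: customers this group has never seen get a very old timestamp.
    conn.execute(
        """
        INSERT INTO customer_feed (c_id, g_id, at)
        SELECT c.c_id, ?, '1066-10-14 00:00:00'
          FROM customer AS c
          LEFT JOIN customer_feed AS f ON f.c_id = c.c_id AND f.g_id = ?
         WHERE f.c_id IS NULL
        """,
        (g_id, g_id),
    )
    # 2. Read customers whose last access by this group is more than 3 days old.
    ids = [row[0] for row in conn.execute(
        """
        SELECT c.c_id
          FROM customer AS c
          JOIN customer_feed AS f ON f.c_id = c.c_id
         WHERE c.c_role = 'Dev'
           AND f.g_id = ?
           AND f.at < datetime('now', '-3 days')
         ORDER BY c.c_id
         LIMIT ?
        """,
        (g_id, limit),
    )]
    # 3. Mark the rows just read as consumed.
    conn.executemany(
        "UPDATE customer_feed SET at = datetime('now') WHERE g_id = ? AND c_id = ?",
        [(g_id, c_id) for c_id in ids],
    )
    return ids


batches = [read_feed(201), read_feed(201), read_feed(201)]
print(batches)  # [[101, 102], [103, 104], [105]]
```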
Add a new column to your customer table, such as start_accessing, and then you can run the query:
SELECT *
FROM customer
WHERE c_role = 'Dev'
AND (start_accessing IS NULL
     OR DATE_ADD(start_accessing, INTERVAL 3 DAY) < CURDATE())
ORDER BY c_id
LIMIT 2;
start_accessing is the column that records when a user started accessing the row. Rows that were never accessed (NULL) or were last accessed more than 3 days ago are returned.
Add a datetime stamp to the table and query from that.
There might be a way to get a 3-day rotation without having to change the tables, by calculating batches of devs and picking the current batch based on the current date.
The example below is for MySQL 5.7 (no window functions).
set @date = current_date;
-- set @date = date('2020-07-04');
set @dayslimit = 3;
set @grouplimit = 2;
set @devcnt = (select count(*) cnt
               from test_customer
               where c_role = 'Dev');
set @curr_rnk = ((floor(datediff(@date, date('2000-01-01')) / @dayslimit)%floor(@devcnt / @dayslimit))+1);
select c_id, c_name, c_email, c_role
-- , rnk
from
(
    select t.*,
    case when @rn := @rn +1 then @rnk := ceil((@rn/@grouplimit)%(@devcnt+1/@grouplimit)) end as rnk
    from test_customer t
    cross join (select @rn:=0, @rnk:= null) vars
    where c_role = 'Dev'
    order by c_id
) q
where rnk = @curr_rnk
order by c_id
limit 2;

Update entire column with 1 to n

I have a table with a column named uid. It uses auto-increment and holds 1, 2, 3, etc. I have a cron job that deletes rows older than 2 days, so now my uid column runs from 2345 to n. I want to reset it to 1 to n again. I tried the code below:
UPDATE `tv` SET `uid` = ''
I was thinking of looping through all rows and updating uid via a PHP script. Is there an alternative with a single SQL command?
You can try something like this:
UPDATE `tv` t
set t.`uid` = (SELECT count(*)
from `tv` s
WHERE t.`uid` >= s.`uid`)
This counts how many uids are smaller than or equal to the one being updated. So when the first uid, let's say 2345, is updated, there is only 1 uid smaller than or equal to it, so it gets the value 1, and so on.
EDIT: Try this-
UPDATE `tv` t
INNER JOIN (SELECT s.`uid`, count(*) as cnt
            from `tv` s
            INNER JOIN `tv` ss
                ON (s.`uid` >= ss.`uid`)
            GROUP BY s.`uid`) tt
    ON (t.`uid` = tt.`uid`)
SET t.`uid` = tt.cnt
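To see what the counting trick produces, here is a small sketch on SQLite via Python's sqlite3 module (not MySQL). It deliberately splits the work in two: a read-only query computes the old-to-new mapping with the same correlated count, and the updates are then applied from Python in ascending order. The sample uids and the title column are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tv (uid INTEGER, title TEXT)")
conn.executemany(
    "INSERT INTO tv VALUES (?, ?)",
    [(2345, "a"), (2346, "b"), (2350, "c")],
)

# New uid = number of rows whose uid is less than or equal to this row's uid.
mapping = conn.execute(
    """
    SELECT t.uid,
           (SELECT COUNT(*) FROM tv AS s WHERE s.uid <= t.uid) AS new_uid
      FROM tv AS t
     ORDER BY t.uid
    """
).fetchall()

# Apply in ascending order; each new value is <= its old value, so nothing collides.
conn.executemany(
    "UPDATE tv SET uid = ? WHERE uid = ?",
    [(new_uid, old_uid) for old_uid, new_uid in mapping],
)

uids = [row[0] for row in conn.execute("SELECT uid FROM tv ORDER BY uid")]
print(mapping)  # [(2345, 1), (2346, 2), (2350, 3)]
print(uids)     # [1, 2, 3]
```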
Why not just decrease the uid:
update tv set uid = uid -1

How to assign Previous value in column for each record

I have a table in which the data looks like this:
Request Id   Field Id   Current Key
1213         11         1001
1213         12         1002
1213         12         103
1214         13         799
1214         13         899
1214         13         7
When the loop starts for the first Request ID, it should check all the Field IDs for that particular Request ID. The data should then look like this:
Request Id   Field Id   Previous Key   Current Key
1213         11         null           1001
1213         12         null           1002
1213         12         1002           103
1214         13         null           799
1214         13         799            899
1214         13         899            7
When the very first record for a Field ID within a particular Request ID arrives, it should get null in the Previous Key column, and the Current Key stays the same.
When the second record arrives for the same Field ID, it should take the first record's Current Key as its Previous Key; when the third record arrives, it should take the second record's value, and so on.
When a new Field ID arrives, the same thing should be repeated.
Please let me know if you need any more info. Your help is much needed.
You can check this.
Declare @t table (Request_Id int, Field_Id int, Current_Key int)
insert into @t values (1213, 11, 1001), (1213, 12, 1002), (1213, 12, 103), (1214, 13, 799), (1214, 13, 899), (1214, 13, 7)
;with cte
as (
    select 0 rowno, 0 Request_Id, 0 Field_Id, 0 Current_Key
    union
    select ROW_NUMBER() over(order by request_id) rowno, * from @t
)
select
    t1.Request_Id, t1.Field_Id,
    case when t1.Request_Id = t2.Request_Id and t1.Field_Id = t2.Field_Id
        then t2.Current_Key
        else null
    end previous_key,
    t1.Current_Key
from cte t1, cte t2
where t1.rowno = t2.rowno + 1
When the second record will come for same field ID...
Tables don't work this way: there is no way to tell that 1213,12,1002 is the "previous" record of 1213,12,103 as you assume in your example.
Do you have any data you can use to sort your records properly? Request id isn't enough because, even if you guarantee that it increments monotonically for each operation, each operation can include multiple values for the same item id which need to be sorted relative to each other.
In SQL Server 2008
You do not have the benefit of the LEAD and LAG functions. Instead you must use a subquery for the new column. Number the rows with row_number(), using the same order in both derived tables, then select the row with the greatest row_num that is smaller than the current row_num and has the same request_id and field_id.
select a.request_id,
       a.field_id,
       (select top 1 x.current_key
          from (select t.*, row_number() over (order by sort_col) as row_num
                  from your_table t) x
         where x.request_id = a.request_id
           and x.field_id = a.field_id
           and x.row_num < a.row_num
         order by x.row_num desc) as previous_key,
       a.current_key
  from (select t.*, row_number() over (order by sort_col) as row_num
          from your_table t) a
-- sort_col is a placeholder for a column (or columns) giving a unique, stable row order
In SQL Server 2012+
You can use the LAG or LEAD functions with the OVER clause to get the previous or next nth row value:
select
    Request_Id,
    Field_Id,
    lag(Current_Key, 1) over (partition by Request_ID, Field_ID order by timestampColumn) as Previous_Key,
    Current_Key
from your_table
You need to decide how your rows are ordered. SQL Server requires an ORDER BY inside the OVER clause for LAG, and without a column that defines the order (a datetime, for example) there is no well-defined "previous" row. With such a column, here called timestampColumn, the window is:
lag(Current_Key,1) over (partition by Request_ID, Field_ID order by timestampColumn)
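As a runnable illustration of the LAG approach, the sketch below uses SQLite through Python's sqlite3 module instead of SQL Server; it assumes a SQLite build with window functions (3.25 or later). The seq column is a hypothetical ordering column added for the example, since the question's table has nothing that defines row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE requests (
        seq INTEGER PRIMARY KEY,   -- assumed ordering column, not in the original table
        request_id INTEGER,
        field_id INTEGER,
        current_key INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO requests VALUES (?, ?, ?, ?)",
    [
        (1, 1213, 11, 1001), (2, 1213, 12, 1002), (3, 1213, 12, 103),
        (4, 1214, 13, 799), (5, 1214, 13, 899), (6, 1214, 13, 7),
    ],
)

# LAG looks one row back within each (request_id, field_id) partition, ordered by seq.
rows = conn.execute(
    """
    SELECT request_id,
           field_id,
           LAG(current_key) OVER (PARTITION BY request_id, field_id ORDER BY seq) AS previous_key,
           current_key
      FROM requests
     ORDER BY seq
    """
).fetchall()

for row in rows:
    print(row)
```

The previous_key column comes out as NULL, NULL, 1002, NULL, 799, 899, which matches the desired output in the question.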
Try this:
declare @tb table (RequestId int, FieldId int, CurrentKey int)
insert into @tb (RequestId, FieldId, CurrentKey) values
(1213, 11, 1001),
(1213, 12, 1002),
(1213, 12, 103),
(1214, 13, 799),
(1214, 13, 899),
(1214, 13, 7)
select t.RequestId, t.FieldId,
    case when t.FieldId = t1.FieldId then t1.CurrentKey end as PreviousKey, t.CurrentKey from
    (select *, ROW_NUMBER() over (order by RequestId, FieldId) as rno
     from @tb) t left join
    (select FieldId, CurrentKey,
     ROW_NUMBER() over (order by RequestId, FieldId) as rno from @tb) t1 on t.rno = t1.rno + 1

Delete duplicates from db

I have table like following
id | a_id | b_id | success
--------------------------
1 34 43 1
2 34 84 1
3 34 43 0
4 65 43 1
5 65 84 1
6 93 23 0
7 93 23 0
I want to delete duplicates with the same a_id and b_id, but keep one record. If possible, the record kept should be the one with success=1. So in the example table, the third and either the sixth or seventh record should be deleted. How can I do this?
I'm using MySQL 5.1.
The task is simple:
Find the minimum number of records that should not be deleted.
Delete the other records.
The Oracle way:
delete from sample_table where id not in (
    select id from
    (
        select id, success, row_number()
            over (partition by a_id, b_id order by success desc) rown
        from sample_table
    )
    where rown = 1)
The solution in MySQL:
This query gives you the ids that should not be deleted:
Select id from (SELECT * FROM report ORDER BY success desc) t
group by t.a_id, t.b_id
Output:
ID
1
2
4
5
6
You can delete the other rows.
delete from report where id not in (the above query)
The consolidated DML:
delete from report
where id not in (Select id
from (SELECT * FROM report
ORDER BY success desc) t
group by t.a_id, t.b_id)
Now doing a Select on report:
ID A_ID B_ID SUCCESS
1 34 43 1
2 34 84 1
4 65 43 1
5 65 84 1
6 93 23 0
You can check the documentation of how the group by clause works when no aggregation function is provided:
When using this feature, all rows in each group should have the same
values for the columns that are omitted from the GROUP BY part. The
server is free to return any value from the group, so the results are
indeterminate unless all values are the same.
So performing an ORDER BY success DESC before the GROUP BY usually gets us the first duplicate row with success = 1, although, as the quoted passage says, the server does not guarantee which row of each group is returned.
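If you would rather not depend on which row GROUP BY happens to return, a deterministic alternative (a different technique from the one above) is to delete every row for which a "better" row exists in the same (a_id, b_id) group: one with higher success, or the same success and a lower id. The sketch below tries this on the question's sample data, using SQLite via Python's sqlite3 module as a stand-in; MySQL may need the subquery wrapped in a derived table or rewritten as a self-join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE report (id INTEGER PRIMARY KEY, a_id INTEGER, b_id INTEGER, success INTEGER)")
conn.executemany(
    "INSERT INTO report VALUES (?, ?, ?, ?)",
    [
        (1, 34, 43, 1), (2, 34, 84, 1), (3, 34, 43, 0), (4, 65, 43, 1),
        (5, 65, 84, 1), (6, 93, 23, 0), (7, 93, 23, 0),
    ],
)

# Delete a row when the same (a_id, b_id) group contains a row that ranks ahead of it:
# higher success first, then lower id as the tie-breaker.
conn.execute(
    """
    DELETE FROM report
     WHERE EXISTS (SELECT 1
                     FROM report AS k
                    WHERE k.a_id = report.a_id
                      AND k.b_id = report.b_id
                      AND (k.success > report.success
                           OR (k.success = report.success AND k.id < report.id)))
    """
)

remaining = [row[0] for row in conn.execute("SELECT id FROM report ORDER BY id")]
print(remaining)  # [1, 2, 4, 5, 6]
```

The top-ranked row of each group is never deleted, because no row ranks ahead of it, so exactly one row per (a_id, b_id) pair survives.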
How about this:
CREATE TABLE new_table
AS (SELECT * FROM old_table WHERE 1 AND success = 1 GROUP BY a_id,b_id);
DROP TABLE old_table;
RENAME TABLE new_table TO old_table;
This method will create a new table with a temporary name, and copy all the deduped rows which have success = 1 from the old table. The old table is then dropped and the new table is renamed to the name of the old table.
If I understand your question correctly, this is probably the simplest solution. (though I don't know if it's really efficient or not)
This should work:
If procedural programming is available to you, e.g. PL/SQL, it is fairly simple. If on the other hand you are looking for a pure SQL solution, it might be possible but not very "nice". Below is an example in PL/SQL:
begin
for x in ( select a_id, b_id
from table
having count(*) > 1
group by a_id, b_id )
loop
for y in ( select *
from table
where a_id = x.a_id
and b_id = x.b_id
order by success desc )
loop
delete from table
where a_id = y.a_id
and b_id = y.b_id
and id != y.id;
exit; -- Only do the first row
end loop;
end loop;
end;
This is the idea: For each duplicated combination of a_id and b_id select all the instances ordered so that any with success=1 is up first. Delete all of that combination except the first - being the successful one if any.
or perhaps:
declare
l_a_id integer := -1;
l_b_id integer := -1;
begin
for x in ( select *
from table
order by a_id, b_id, success desc )
loop
if x.a_id = l_a_id and x.b_id = l_b_id
then
delete from table where id = x.id;
end if;
l_a_id := x.a_id;
l_b_id := x.b_id;
end loop;
end;
In MySQL, if you don't care which record is kept, a single ALTER TABLE will work.
ALTER IGNORE TABLE tbl_name
ADD UNIQUE INDEX(a_id, b_id)
It drops the duplicate records and keeps only the unique ones. Note that the IGNORE clause was removed from ALTER TABLE in MySQL 5.7, so this only works on older versions such as the 5.1 mentioned in the question.
A useful link:
MySQL: ALTER IGNORE TABLE ADD UNIQUE, what will be truncated?