update a column in sql using two conditions - mysql

I have a table in MySQL in which I want to update the records in a column called status to 'Duplicate'. I want to mark a record as 'Duplicate' on the basis of 2 conditions:
The records have duplicate customer id.
Those records which don't have the recent modified date will be marked duplicate.
I have tried the below code but it gives me an error:
UPDATE test_sql_duplicate
SET status = 'Duplicate'
WHERE test_sql_duplicate.modi_date NOT IN (
SELECT *, max(modi_date)
FROM test_sql_duplicate
GROUP BY cust_id
HAVING COUNT(cust_id > 1)

You could use a LEFT JOIN anti-join pattern to identify the records to update. Basically, we use a subquery to identify the latest record for each customer, then we use it to exclude those records from the update query:
UPDATE test_sql_duplicate t
LEFT JOIN (
SELECT cust_id, MAX(modi_date) modi_date FROM test_sql_duplicate GROUP BY cust_id
) m ON m.cust_id = t.cust_id and m.modi_date = t.modi_date
SET t.status = 'Duplicate'
WHERE m.cust_id IS NULL

I suspect we may be wanting a query something like this:
UPDATE TEST_SQL_DUPLICATE t
JOIN (
SELECT n.cust_id
, MAX(n.modi_date) AS max_modi_date
FROM TEST_SQL_DUPLICATE n
GROUP
BY n.cust_id
HAVING COUNT(n.cust_id) > 1
) d
ON d.cust_id = t.cust_id
AND d.max_modi_date > t.modi_date
SET t.status = 'Duplicate'
Given sample data:
_row cust_id modi_date
------- ----------
1 444 2019-10-28
2 444 2019-10-28
3 444 2019-10-29
4 444 2019-10-30
5 444 2019-10-30
The query in this answer would flag rows 1 through 3, setting the status column to 'Duplicate'. Rows 4 and 5 would not be marked, because they both have the same (maximum) modi_date.
We would also achieve the same result if we omitted the HAVING clause from the inline view query.
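To check the claim about rows 1 through 3, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained). SQLite does not accept the UPDATE ... JOIN form, so the same rule is written as a correlated subquery instead; the row_id column is a hypothetical stand-in for _row.

```python
import sqlite3

# Hypothetical in-memory copy of the sample data; row_id stands in for "_row".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_sql_duplicate ("
    "row_id INTEGER PRIMARY KEY, cust_id INTEGER, modi_date TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO test_sql_duplicate (row_id, cust_id, modi_date) VALUES (?, ?, ?)",
    [
        (1, 444, "2019-10-28"),
        (2, 444, "2019-10-28"),
        (3, 444, "2019-10-29"),
        (4, 444, "2019-10-30"),
        (5, 444, "2019-10-30"),
    ],
)

# Correlated-subquery form of the same rule: flag a row when a later
# modi_date exists for the same cust_id. ISO dates compare correctly as text.
conn.execute(
    """
    UPDATE test_sql_duplicate
       SET status = 'Duplicate'
     WHERE modi_date < (SELECT MAX(m.modi_date)
                          FROM test_sql_duplicate m
                         WHERE m.cust_id = test_sql_duplicate.cust_id)
    """
)

flagged = [
    row_id
    for (row_id,) in conn.execute(
        "SELECT row_id FROM test_sql_duplicate "
        "WHERE status = 'Duplicate' ORDER BY row_id"
    )
]
print(flagged)
```

Rows 4 and 5 tie on the maximum modi_date, so neither is flagged; if exactly one row per customer must survive, a tie-breaker column would be needed.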

Here is a quick and dirty way:
UPDATE test_sql_duplicate SET status = 'Duplicate'
WHERE (cust_id, modi_date) IN (
SELECT t.id, t.date FROM (
SELECT
modi_date date,
cust_id id,
MAX(modi_date) OVER(PARTITION BY cust_id) maxDate
FROM test_sql_duplicate
) t
WHERE t.date < t.maxDate);


SQL/MySQL DELETE all rows EXCEPT 2 of them

I have a database table setup like this:
id | code | group_id | status
---|-------|----------|-----------
1 | abcd1 | group_1 | available
2 | abcd2 | group_1 | available
3 | adsd3 | group_1 | available
4 | dfgd4 | group_1 | available
5 | vfcd5 | group_1 | available
6 | bgcd6 | group_2 | available
7 | abcd7 | group_2 | available
8 | ahgf8 | group_2 | available
9 | dfgd9 | group_2 | available
10 | qwer6 | group_2 | available
In the example above, each group_id has 5 rows (an arbitrary number for the example; the real totals are dynamic and vary). I need to remove every row with a status of available except for 2 of them (which 2 does not matter, as long as 2 remain).
Basically, every unique group_id should end up with only 2 rows with a status of available. I am able to write a simple SQL query to remove all of them, but I am struggling to come up with one that removes all except 2. Please help :)
If code is unique, you can use subqueries to keep the "min" and "max"
DELETE FROM t
WHERE t.status = 'available'
AND (t.group_id, t.code) NOT IN (
SELECT group_id, MAX(code)
FROM t
WHERE status = 'available'
GROUP BY group_id
)
AND (t.group_id, t.code) NOT IN (
SELECT group_id, MIN(code)
FROM t
WHERE status = 'available'
GROUP BY group_id
)
Similarly, with an auto increment id:
DELETE FROM t
WHERE t.status = 'available'
AND t.id NOT IN (
SELECT MAX(id) FROM t WHERE status = 'available' GROUP BY group_id
UNION
SELECT MIN(id) FROM t WHERE status = 'available' GROUP BY group_id
)
I reworked the subquery into a UNION in this version, but the "AND" format would work just as well. Also, if "code" were unique across the whole table, the NOT IN could be simplified by dropping group_id from the comparison (though it would still be needed in the subqueries' GROUP BY clauses).
Edit: MySQL doesn't like subqueries referencing tables being UPDATEd/DELETEd in the WHERE of the query doing the UPDATE/DELETE; in those cases, you can usually double-wrap the subquery to give it an alias, causing MySQL to treat it as a temporary table (behind the scenes).
DELETE FROM t
WHERE t.status = 'available'
AND t.id NOT IN (
SELECT * FROM (
SELECT MAX(id) FROM t WHERE status = 'available' GROUP BY group_id
UNION
SELECT MIN(id) FROM t WHERE status = 'available' GROUP BY group_id
) AS a
)
Another alternative, I don't recall if MySQL complains as much about joins in DELETE/UPDATE....
DELETE t
FROM t
LEFT JOIN (
SELECT MIN(id) AS minId, MAX(id) AS maxId, 1 AS keep_flag
FROM t
WHERE status = 'available'
GROUP BY group_id
) AS tKeep ON t.id IN (tKeep.minId, tKeep.maxId)
WHERE t.status = 'available'
AND tKeep.keep_flag IS NULL
To keep the min and max ids, I think a join is the simplest solution:
DELETE t
FROM t LEFT JOIN
(SELECT group_id, MIN(id) as min_id, MAX(id) as max_id
FROM t
WHERE t.status = 'available'
GROUP BY group_id
) tt
ON t.id IN (tt.min_id, tt.max_id)
WHERE t.status = 'available' AND
tt.group_id IS NULL;
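As a quick sanity check of the keep-the-min-and-max idea, here is a minimal sketch that runs the NOT IN / UNION variant against the question's sample rows. It uses Python's built-in sqlite3 module rather than MySQL (an assumption for the sake of a self-contained example); SQLite does not raise MySQL's error 1093, so no double-wrapping is needed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, code TEXT, group_id TEXT, status TEXT)"
)
sample = [
    (1, "abcd1", "group_1"), (2, "abcd2", "group_1"), (3, "adsd3", "group_1"),
    (4, "dfgd4", "group_1"), (5, "vfcd5", "group_1"), (6, "bgcd6", "group_2"),
    (7, "abcd7", "group_2"), (8, "ahgf8", "group_2"), (9, "dfgd9", "group_2"),
    (10, "qwer6", "group_2"),
]
conn.executemany("INSERT INTO t VALUES (?, ?, ?, 'available')", sample)

# Delete every 'available' row except the lowest and highest id per group.
conn.execute(
    """
    DELETE FROM t
     WHERE status = 'available'
       AND id NOT IN (
           SELECT MIN(id) FROM t WHERE status = 'available' GROUP BY group_id
           UNION
           SELECT MAX(id) FROM t WHERE status = 'available' GROUP BY group_id
       )
    """
)

remaining = [row_id for (row_id,) in conn.execute("SELECT id FROM t ORDER BY id")]
print(remaining)
```

A group with a single available row keeps that one row, since its minimum and maximum id coincide.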
If the column "id" is the PRIMARY KEY or a UNIQUE KEY, then we could use a correlated subquery to get the second lowest value for a particular group_id.
We could then use that to identify rows for group_id that have higher values of the "id" column.
A query something like this:
SELECT t.`id`
, t.`group_id`
FROM `setup_like_this` t
WHERE t.`status` = 'available'
AND t.`id`
> ( SELECT s.`id`
FROM `setup_like_this` s
WHERE s.`status` = 'available'
AND s.`group_id` = t.`group_id`
ORDER
BY s.`id`
LIMIT 1,1
)
We test that as a SELECT first, to examine the rows that are returned. When we are satisfied this query is returning the set of rows we want to delete, we can replace SELECT ... FROM with DELETE t.* FROM to convert it to a DELETE statement to remove the rows.
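Converting this directly to a DELETE statement raises MySQL error 1093 ("You can't specify target table ... for update in FROM clause"), because the subquery reads from the table being deleted from.
One workaround is to make the query above into an inline view, and then join to the target table: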
DELETE q.*
FROM `setup_like_this` q
JOIN ( -- inline view, query from above returns `id` of rows we want to delete
SELECT t.`id`
, t.`group_id`
FROM `setup_like_this` t
WHERE t.`status` = 'available'
AND t.`id`
> ( SELECT s.`id`
FROM `setup_like_this` s
WHERE s.`status` = 'available'
AND s.`group_id` = t.`group_id`
ORDER
BY s.`id`
LIMIT 1,1
)
) r
ON r.id = q.id
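Here is a minimal sketch of the "second lowest id" idea, run with Python's built-in sqlite3 module instead of MySQL (an assumption, so the example is self-contained). SQLite accepts the correlated subquery directly in the DELETE, so the inline-view workaround is not needed there; LIMIT 1 OFFSET 1 is the portable spelling of MySQL's LIMIT 1,1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE setup_like_this ("
    "id INTEGER PRIMARY KEY, code TEXT, group_id TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO setup_like_this VALUES (?, ?, ?, 'available')",
    [
        (1, "abcd1", "group_1"), (2, "abcd2", "group_1"), (3, "adsd3", "group_1"),
        (4, "dfgd4", "group_1"), (5, "vfcd5", "group_1"), (6, "bgcd6", "group_2"),
        (7, "abcd7", "group_2"), (8, "ahgf8", "group_2"), (9, "dfgd9", "group_2"),
        (10, "qwer6", "group_2"),
    ],
)

# For each row, look up the second-lowest available id in its group and
# delete the row if its own id is higher than that.
conn.execute(
    """
    DELETE FROM setup_like_this
     WHERE status = 'available'
       AND id > (SELECT s.id
                   FROM setup_like_this s
                  WHERE s.status = 'available'
                    AND s.group_id = setup_like_this.group_id
                  ORDER BY s.id
                  LIMIT 1 OFFSET 1)
    """
)

kept = [row_id for (row_id,) in conn.execute("SELECT id FROM setup_like_this ORDER BY id")]
print(kept)
```

If a group has only one available row, the subquery returns NULL, the comparison is never true, and that row is left alone.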
select id, code, group_id, status
from (
select id, code, group_id, status
, ROW_NUMBER() OVER (
PARTITION BY group_id
ORDER BY id DESC) rownum
from a
) q
where rownum < 3

Group by select based on OR condition

After using UNION with two select queries, I'm getting following results
UserId Name Status
------ ------ --------
1 User1 Active
2 User2 Active
1 User1 InActive
3 User3 InActive
But the expected result is
UserId Name Status
---------------------
1 User1 Active
2 User2 Active
3 User3 InActive
What I need is to group by the Id column and get the status as Active if any one of the results is active. How do I form a SQL query for this?
Can anyone suggest a query for any one of the following DBs?
MSSQL
Oracle
MySQL
PostgreSQL
Edit:
This is the query I've tried in PostgreSQL
(SELECT DISTINCT User.Id,User.DisplayName,AppAccessToUsers.IsActive='1' AND User.IsActive='1' AS IsStatusActive
FROM Applications Left JOIN AppAccessToUsers ON (Applications.Id=AppAccessToUsers.ApplicationId)
Left JOIN User ON (AppAccessToUsers.UserId=User.Id) WHERE Applications.ClientId='e7e66c1b-b3b8-4ffb-844b-fc4840803265')
UNION
(SELECT DISTINCT User.Id,User.DisplayName,AppAccessToGroups.IsActive='1' AND Group.IsActive='1' AND UserGroup.IsActive='1' AND User.IsActive='1' AS IsStatusActive
FROM Applications Left JOIN AppAccessToGroups ON (Applications.Id=AppAccessToGroups.ApplicationId)
Left JOIN Group ON (AppAccessToGroups.GroupId=Group.Id) Left JOIN UserGroup ON (Group.Id=UserGroup.GroupId)
Left JOIN User ON (UserGroup.UserId=User.Id) WHERE Applications.ClientId='e7e66c1b-b3b8-4ffb-844b-fc4840803265')
Use this query,
SELECT UserId
,Name
,CASE WHEN min(status) = 'Active' THEN 'Active' ELSE 'InActive' END
FROM users GROUP BY UserId,Name
I would do the following, assuming a) your tables are called t1 and t2 (amend as appropriate for your actual table names) and b) the names for each userid in both tables are the same - ie. for userid = 1, both tables have the same name:
SELECT userid,
NAME,
MIN(status)
FROM (SELECT userid, NAME, status FROM t1
UNION ALL
SELECT userid, NAME, status FROM t2) u
GROUP BY userid, NAME;
This works in Oracle, and I'm pretty sure it'll work in the other database platforms you mentioned.
N.B. I used MIN(status) since you appear to want a status of Active to override a status of Inactive, and A comes before I in the alphabet.
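A minimal sketch of the MIN(status) approach, run with Python's built-in sqlite3 module (an assumption; SQLite is not one of the databases in the question, but the query is plain SQL). The table names t1 and t2 and the way the four sample rows are split between them are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t1 (userid INTEGER, name TEXT, status TEXT);
    CREATE TABLE t2 (userid INTEGER, name TEXT, status TEXT);
    INSERT INTO t1 VALUES (1, 'User1', 'Active'), (2, 'User2', 'Active');
    INSERT INTO t2 VALUES (1, 'User1', 'InActive'), (3, 'User3', 'InActive');
    """
)

# 'Active' sorts before 'InActive', so MIN(status) picks 'Active'
# whenever at least one row for the user is active.
merged = conn.execute(
    """
    SELECT userid, name, MIN(status) AS status
      FROM (SELECT userid, name, status FROM t1
            UNION ALL
            SELECT userid, name, status FROM t2) AS u
     GROUP BY userid, name
     ORDER BY userid
    """
).fetchall()
print(merged)
```

This relies on the two status strings sorting in that order; with other status values, an explicit CASE expression would be safer than MIN.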
In Sql-server, you could use group by or Row_number like this
DECLARE @SampleData AS TABLE
(
UserId int,
Name varchar(20),
Status varchar(10)
)
INSERT INTO @SampleData
(
UserId,Name,Status
)
VALUES
(1,'User1', 'Active'),
(2,'User2', 'Active'),
(1,'User1', 'InActive'),
(3,'User3', 'InActive')
-- use row_number
;WITH temp AS
(
SELECT *, row_number() OVER(PARTITION BY sd.UserId ORDER BY sd.Status ) AS Rn
FROM @SampleData sd
)
SELECT t.UserId, t.Name, t.Status
FROM temp t WHERE t.Rn = 1
--or use group by
SELECT sd.UserId, sd.Name, min(sd.Status) AS status
FROM @SampleData sd
GROUP BY sd.UserId, sd.Name
Results:
UserId Name Status
1 User1 Active
2 User2 Active
3 User3 InActive
In case of MS Sql Server you can try row_number
;with cte as (
select top 1 with ties * from
( select * from #youruser
union all
select * from #youruser) a
order by row_number() over (partition by userid order by [status] desc)
) select * from cte where status = 'Active'
select your_table.* from your_table
inner join (
select UserId, min(Status) as st from your_table
group by UserId
) t
on your_table.UserId = t.UserId and your_table.Status = t.st
Note: if the same UserId can have the same Status more than once, this returns duplicate rows.
;With cte (UserId, Name,Status)
AS
(
SELECT 1,'User1','Active' Union all
SELECT 2,'User2','Active' Union all
SELECT 1,'User1','InActive' Union all
SELECT 3,'User3','InActive'
)
SELECT UserId
,NAME
,[Status]
FROM (
SELECT *
,ROW_NUMBER() OVER (
PARTITION BY UserId
,NAME ORDER BY STATUS
) AS Seq
FROM cte
) dt
WHERE dt.Seq = 1
Output
UserId Name Status
-----------------------
1 User1 Active
2 User2 Active
3 User3 InActive
For PostgreSQL you can use CASE and bool_or, e.g.:
t=# with a(i,n,b) as (
values (1,'a','active'), (1,'a','inactive'), (2,'b','inactive'), (2,'b','inactive')
)
select i,n,case when bool_or(b = 'active') then 'active' else 'inactive' end
from a
group by i,n
;
i | n | case
---+---+----------
1 | a | active
2 | b | inactive
(2 rows)
Another approach:
Note: the GROUP BY is there to remove duplicates.
select
A.USERID, A.NAME,A.STATUS
from TAB_1 A
LEFT JOIN
(SELECT * FROM TAB_1 WHERE STATUS='Active') B
ON A.USERID=B.USERID
WHERE
( B.STATUS IS NULL OR A.STATUS=B.STATUS)
GROUP BY A.USERID, A.NAME,A.STATUS
ORDER BY A.USERID
;

MySQL statement to fetch the rows if any particular column value has changed from previous entry

I have a table with date and count entries. I am using a MySQL database.
Table employees
updatedDate req_count error_count
14-03-2014 10:20:39 1 0
15-03-2014 11:10:00 1 0
15-03-2014 12:10:00 1 1
15-03-2014 16:12:00 1 1
16-03-2014 12:09:00 2 10
I would like to fetch the entries if anything has changed from previous entry.
For example sql query should return only changed entries of req_count and error_count
updatedDate req_count error_count
14-03-2014 10:20:39 1 0
15-03-2014 12:10:00 1 1
16-03-2014 12:09:00 2 10
I am using
SELECT updateDate,req_count,error_count FROM employees WHERE empId=10
AND req_count > 0 OR error_count > 0
This returns all 5 rows, whereas I want only 3 rows. What is the correct way to fetch the rows if any particular column value has changed?
My suggestion. Get the previous updated date for each employee. Then join back to the table to get the previous record so you can do the comparison. The following is one way to get the previous updated date:
select e.*,
(select max(e2.updatedDate)
from employees e2
where e2.empId = e.empId and e2.updatedDate < e.updatedDate
) as prev_updatedDate
from employees e;
Then the full query is:
select e.*
from (select e.*,
(select max(e2.updatedDate)
from employees e2
where e2.empId = e.empId and e2.updatedDate < e.updatedDate
) as prev_updatedDate
from employees e
) e left join
employees prev_e
on prev_e.empId = e.empId and prev_e.updatedDate = e.prev_updatedDate
where (prev_e.empId is null) or
(prev_e.req_count <> e.req_count or prev_e.error_count <> e.error_count);
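Here is a minimal sketch of the previous-row comparison, run with Python's built-in sqlite3 module as a stand-in for MySQL (an assumption). The dates are rewritten in ISO format so that they sort correctly as text, and empId = 10 is assumed for every sample row; neither detail is stated in the question's table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "empId INTEGER, updatedDate TEXT, req_count INTEGER, error_count INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (10, ?, ?, ?)",
    [
        ("2014-03-14 10:20:39", 1, 0),
        ("2014-03-15 11:10:00", 1, 0),
        ("2014-03-15 12:10:00", 1, 1),
        ("2014-03-15 16:12:00", 1, 1),
        ("2014-03-16 12:09:00", 2, 10),
    ],
)

# Step 1 (inner query): find each row's previous updatedDate per employee.
# Step 2 (left join): fetch that previous row and keep the current row when
# there is no previous row or when either counter differs.
changed = conn.execute(
    """
    SELECT e.updatedDate, e.req_count, e.error_count
      FROM (SELECT e1.*,
                   (SELECT MAX(e2.updatedDate)
                      FROM employees e2
                     WHERE e2.empId = e1.empId
                       AND e2.updatedDate < e1.updatedDate) AS prev_updatedDate
              FROM employees e1) e
      LEFT JOIN employees prev_e
        ON prev_e.empId = e.empId
       AND prev_e.updatedDate = e.prev_updatedDate
     WHERE prev_e.empId IS NULL
        OR prev_e.req_count <> e.req_count
        OR prev_e.error_count <> e.error_count
     ORDER BY e.updatedDate
    """
).fetchall()
print(changed)
```

The first row is kept because it has no predecessor, which matches the expected output in the question.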
You can try this
SELECT a.*
FROM employees AS a
WHERE
empId=10
AND
(
(
a.req_count <> (
SELECT
b.req_count
FROM
employees AS b
WHERE
a.id > b.id
ORDER BY
b.id DESC
LIMIT 1
)
)
OR (
a.error_count <> (
SELECT
b.error_count
FROM
employees AS b
WHERE
a.id > b.id
ORDER BY
b.id DESC
LIMIT 1
)
)
)
The query checks, for each record, whether either req_count or error_count has changed from the previous record. It finds the previous record through a field called id, which I am not sure you have in your table.

Select a row with least value of a column using where and group by

Sample table:
id  user_id  grade_id  time_stamp
--  -------  --------  -------------------
1   100      1001      2013-08-29 15:07:38
2   101      1002      2013-08-29 16:07:38
3   100      1001      2013-08-29 17:07:38
4   102      1003      2013-08-29 18:07:38
5   103      1004      2013-08-29 19:07:38
6   105      1002      2013-08-29 20:07:38
6   100      1002      2013-08-29 21:07:38
I want to select the rows with user_id = 100, grouped by grade_id, but only if the row's time_stamp is the lowest for that particular grade_id.
so, from the above table, it should be:
row 1 because its time_stamp is least for that value of grade_id(1001)
but not row 2 because I only want 1 row for a particular grade_id
also not row 6 because that particular grade_id has least value for user_id 105.
I tried few things, which are too basic and obviously not worth posting.
Thank You
You could try nested queries:
SELECT grade_id, COUNT(grade_id) FROM SAMPLE_TABLE ST WHERE time_stamp = (SELECT MIN(time_stamp) FROM SAMPLE_TABLE STT WHERE STT.grade_id = ST.grade_id) AND user_id = 100 GROUP BY grade_id;
In this case, the nested query will give you the minimum timestamp for each specific 'grade_id' and you can use it in your WHERE filter.
SELECT t.*
FROM tableX AS t
JOIN
( SELECT grade_id, MIN(time_stamp) AS time_stamp
FROM tableX
GROUP BY grade_id
) AS g
ON g.grade_id = t.grade_id
AND g.time_stamp = t.time_stamp
WHERE t.user_id = 100 ;
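A minimal sketch of the join-against-the-group-minimum query, run on the question's sample rows with Python's built-in sqlite3 module (an assumption; the question does not name a database). The table name tableX comes from the query above, and the ids are copied as given in the sample, including the repeated 6.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tableX (id INTEGER, user_id INTEGER, grade_id INTEGER, time_stamp TEXT)"
)
conn.executemany(
    "INSERT INTO tableX VALUES (?, ?, ?, ?)",
    [
        (1, 100, 1001, "2013-08-29 15:07:38"),
        (2, 101, 1002, "2013-08-29 16:07:38"),
        (3, 100, 1001, "2013-08-29 17:07:38"),
        (4, 102, 1003, "2013-08-29 18:07:38"),
        (5, 103, 1004, "2013-08-29 19:07:38"),
        (6, 105, 1002, "2013-08-29 20:07:38"),
        (6, 100, 1002, "2013-08-29 21:07:38"),
    ],
)

# The derived table g holds the earliest time_stamp per grade_id; joining
# back keeps only the rows that own that earliest time_stamp, and the outer
# WHERE then keeps those belonging to user 100.
earliest = conn.execute(
    """
    SELECT t.id, t.user_id, t.grade_id, t.time_stamp
      FROM tableX AS t
      JOIN (SELECT grade_id, MIN(time_stamp) AS time_stamp
              FROM tableX
             GROUP BY grade_id) AS g
        ON g.grade_id = t.grade_id
       AND g.time_stamp = t.time_stamp
     WHERE t.user_id = 100
    """
).fetchall()
print(earliest)
```

Only row 1 comes back: user 100 also has a row for grade_id 1002, but the earliest time_stamp for that grade belongs to user 101.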

Fetch 2nd Highest value from MySql DB with GROUP BY

I have a table tbl_patient and I want to fetch the last 2 visits of each patient in order to compare whether the patient's condition is improving or degrading.
tbl_patient
id | patient_ID | visit_ID | patient_result
1 | 1 | 1 | 5
2 | 2 | 1 | 6
3 | 2 | 3 | 7
4 | 1 | 2 | 3
5 | 2 | 3 | 2
6 | 1 | 3 | 9
I tried the query below to fetch the last visit of each patient as,
SELECT MAX(id), patient_result FROM `tbl_patient` GROUP BY `patient_ID`
Now I want to fetch the 2nd last visit of each patient with a query, but it gives me an error
(#1242 - Subquery returns more than 1 row)
SELECT id, patient_result FROM `tbl_patient` WHERE id <(SELECT MAX(id) FROM `tbl_patient` GROUP BY `patient_ID`) GROUP BY `patient_ID`
Where I'm wrong
select p1.patient_id, p2.maxid id1, max(p1.id) id2
from tbl_patient p1
join (select patient_id, max(id) maxid
from tbl_patient
group by patient_id) p2
on p1.patient_id = p2.patient_id and p1.id < p2.maxid
group by p1.patient_id
id1 is the ID of the last visit, id2 is the ID of the 2nd to last visit.
Your first query doesn't get the last visits, since it gives results 5 and 6 instead of 2 and 9.
You can try this query:
SELECT patient_ID,visit_ID,patient_result
FROM tbl_patient
where id in (
select max(id)
from tbl_patient
GROUP BY patient_ID)
union
SELECT patient_ID,visit_ID,patient_result
FROM tbl_patient
where id in (
select max(id)
from tbl_patient
where id not in (
select max(id)
from tbl_patient
GROUP BY patient_ID)
GROUP BY patient_ID)
order by 1,2
SELECT id, patient_result FROM `tbl_patient` t1
JOIN (SELECT MAX(id) as max, patient_ID FROM `tbl_patient` GROUP BY `patient_ID`) t2
ON t1.patient_ID = t2.patient_ID
WHERE id <max GROUP BY t1.`patient_ID`
There are a couple of approaches to getting the specified resultset returned in a single SQL statement.
Unfortunately, most of those approaches yield rather unwieldy statements.
The more elegant-looking statements tend to come with poor (or unbearable) performance when dealing with large sets, and the statements that tend to perform better are less elegant.
Three of the most common approaches make use of:
correlated subquery
inequality join (nearly a Cartesian product)
two passes over the data
Here's an approach that uses two passes over the data, using MySQL user variables, which basically emulates the analytic RANK() OVER(PARTITION ...) function available in other DBMS:
SELECT t.id
, t.patient_id
, t.visit_id
, t.patient_result
FROM (
SELECT p.id
, p.patient_id
, p.visit_id
, p.patient_result
, @rn := if(@prev_patient_id = patient_id, @rn + 1, 1) AS rn
, @prev_patient_id := patient_id AS prev_patient_id
FROM tbl_patient p
JOIN (SELECT @rn := 0, @prev_patient_id := NULL) i
ORDER BY p.patient_id DESC, p.id DESC
) t
WHERE t.rn <= 2
Note that this involves an inline view, which means there's going to be a pass over all the data in the table to create a "derived table". Then, the outer query will run against the derived table. So, this is essentially two passes over the data.
This query can be tweaked a bit to improve performance, by eliminating the duplicated value of the patient_id column returned by the inline view. But I show it as above, so we can better understand what is happening.
This approach can be rather expensive on large sets, but is generally MUCH more efficient than some of the other approaches.
Note also that this query will return a row for a patient_id even if only one id value exists for that patient; it does not restrict the result to just those patients that have at least two rows.
It's also possible to get an equivalent resultset with a correlated subquery:
SELECT t.id
, t.patient_id
, t.visit_id
, t.patient_result
FROM tbl_patient t
WHERE ( SELECT COUNT(1) AS cnt
FROM tbl_patient p
WHERE p.patient_id = t.patient_id
AND p.id >= t.id
) <= 2
ORDER BY t.patient_id ASC, t.id ASC
Note that this is making use of a "dependent subquery", which basically means that for each row returned from t, MySQL is effectively running another query against the database. So, this will tend to be very expensive (in terms of elapsed time) on large sets.
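A minimal sketch of the correlated-subquery approach on the question's sample rows, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, so the example is self-contained). The table is named tbl_patient, as in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_patient ("
    "id INTEGER PRIMARY KEY, patient_id INTEGER, visit_id INTEGER, patient_result INTEGER)"
)
conn.executemany(
    "INSERT INTO tbl_patient VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 5), (2, 2, 1, 6), (3, 2, 3, 7), (4, 1, 2, 3), (5, 2, 3, 2), (6, 1, 3, 9)],
)

# For each row, count the rows of the same patient with an id at least as
# high; a count of 1 or 2 means the row is the latest or second-latest visit.
last_two = conn.execute(
    """
    SELECT t.id, t.patient_id, t.visit_id, t.patient_result
      FROM tbl_patient t
     WHERE (SELECT COUNT(1)
              FROM tbl_patient p
             WHERE p.patient_id = t.patient_id
               AND p.id >= t.id) <= 2
     ORDER BY t.patient_id ASC, t.id ASC
    """
).fetchall()
print(last_two)
```

On this data the subquery runs once per outer row, which is fine for six rows but is exactly the cost the note above warns about on large tables.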
As another approach, if there are relatively few id values for each patient, you might be able to get by with an inequality join:
SELECT t.id
, t.patient_id
, t.visit_id
, t.patient_result
FROM tbl_patient t
LEFT
JOIN tbl_patient p
ON p.patient_id = t.patient_id
AND t.id < p.id
GROUP
BY t.id
, t.patient_id
, t.visit_id
, t.patient_result
HAVING COUNT(p.id) < 2
Note that this will create a nearly Cartesian product for each patient. For a limited number of id values for each patient, this won't be too bad. But if a patient has hundreds of id values, the intermediate result can be huge, on the order of O(n**2).
Try this..
SELECT id, patient_result FROM tbl_patient AS tp WHERE id < ((SELECT MAX(id) FROM tbl_patient AS tp_max WHERE tp_max.patient_ID = tp.patient_ID) - 1) GROUP BY patient_ID
Why not use simply...
GROUP BY `patient_ID` DESC LIMIT 2
... and do the rest in the next step?