SQL/MySQL DELETE all rows EXCEPT 2 of them - mysql

I have a database table setup like this:
id | code | group_id | status
---|-------|---------|------------
1 | abcd1 | group_1 | available
2 | abcd2 | group_1 | available
3 | adsd3 | group_1 | available
4 | dfgd4 | group_1 | available
5 | vfcd5 | group_1 | available
6 | bgcd6 | group_2 | available
7 | abcd7 | group_2 | available
8 | ahgf8 | group_2 | available
9 | dfgd9 | group_2 | available
10 | qwer6 | group_2 | available
In the example above, each group_id has 5 rows (an arbitrary number for the example; the real row counts are dynamic and vary). I need to remove every row whose status is available, except for 2 of them per group. Which 2 does not matter, as long as 2 remain.
Basically, every unique group_id should end up with only 2 rows with a status of available. I can write a simple SQL query to remove all of them, but I am struggling to come up with a query that removes all except 2. Please help :)

If code is unique, you can use subqueries to keep the "min" and "max"
DELETE FROM t
WHERE t.status = 'available'
AND (t.group_id, t.code) NOT IN (
SELECT group_id, MAX(code)
FROM t
WHERE status = 'available'
GROUP BY group_id
)
AND (t.group_id, t.code) NOT IN (
SELECT group_id, MIN(code)
FROM t
WHERE status = 'available'
GROUP BY group_id
)
Similarly, with an auto increment id:
DELETE FROM t
WHERE t.status = 'available'
AND t.id NOT IN (
SELECT MAX(id) FROM t WHERE status = 'available' GROUP BY group_id
UNION
SELECT MIN(id) FROM t WHERE status = 'available' GROUP BY group_id
)
I reworked the subquery into a UNION instead in this version, but the "AND" format would work just as well too. Also, if "code" was unique across the whole table, the NOT IN could be simplified down to excluding the group_id as well (though it would still be needed in the subqueries' GROUP BY clauses).
Edit: MySQL doesn't like subqueries referencing tables being UPDATEd/DELETEd in the WHERE of the query doing the UPDATE/DELETE; in those cases, you can usually double-wrap the subquery to give it an alias, causing MySQL to treat it as a temporary table (behind the scenes).
DELETE FROM t
WHERE t.status = 'available'
AND t.id NOT IN (
SELECT * FROM (
SELECT MAX(id) FROM t WHERE status = 'available' GROUP BY group_id
UNION
SELECT MIN(id) FROM t WHERE status = 'available' GROUP BY group_id
) AS a
)
Another alternative, I don't recall if MySQL complains as much about joins in DELETE/UPDATE....
DELETE t
FROM t
LEFT JOIN (
SELECT MIN(id) AS minId, MAX(id) AS maxId, 1 AS keep_flag
FROM t
WHERE status = 'available'
GROUP BY group_id
) AS tKeep ON t.id IN (tKeep.minId, tKeep.maxId)
WHERE t.status = 'available'
AND tKeep.keep_flag IS NULL
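To sanity-check the keep-MIN-and-MAX idea without touching real data, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL. SQLite does not raise MySQL's error for a subquery on the target table, so the single-wrapped UNION form is enough here. The table and data mirror the question.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, code TEXT, group_id TEXT, status TEXT)"
)
rows = [
    (1, "abcd1", "group_1"), (2, "abcd2", "group_1"), (3, "adsd3", "group_1"),
    (4, "dfgd4", "group_1"), (5, "vfcd5", "group_1"), (6, "bgcd6", "group_2"),
    (7, "abcd7", "group_2"), (8, "ahgf8", "group_2"), (9, "dfgd9", "group_2"),
    (10, "qwer6", "group_2"),
]
conn.executemany("INSERT INTO t VALUES (?, ?, ?, 'available')", rows)

# Keep the lowest and highest id of each group, delete every other available row.
conn.execute("""
    DELETE FROM t
    WHERE status = 'available'
      AND id NOT IN (
          SELECT MAX(id) FROM t WHERE status = 'available' GROUP BY group_id
          UNION
          SELECT MIN(id) FROM t WHERE status = 'available' GROUP BY group_id
      )
""")

remaining_ids = [r[0] for r in conn.execute("SELECT id FROM t ORDER BY id")]
print(remaining_ids)
```

Note that a group with only one available row keeps just that row, since MIN and MAX are then the same id.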

To keep the min and max ids, I think a join is the simplest solution:
DELETE t
FROM t LEFT JOIN
(SELECT group_id, MIN(id) as min_id, MAX(id) as max_id
FROM t
WHERE t.status = 'available'
GROUP BY group_id
) tt
ON t.id IN (tt.min_id, tt.max_id)
WHERE t.status = 'available' AND
tt.group_id IS NULL;

If the column "id" is the PRIMARY KEY or a UNIQUE KEY, then we could use a correlated subquery to get the second lowest value for a particular group_id.
We could then use that to identify rows for group_id that have higher values of the "id" column.
A query something like this:
SELECT t.`id`
, t.`group_id`
FROM `setup_like_this` t
WHERE t.`status` = 'available'
AND t.`id`
> ( SELECT s.`id`
FROM `setup_like_this` s
WHERE s.`status` = 'available'
AND s.`group_id` = t.`group_id`
ORDER
BY s.`id`
LIMIT 1,1
)
We test that as a SELECT first, to examine the rows that are returned. When we are satisfied this query is returning the set of rows we want to delete, we can replace SELECT ... FROM with DELETE t.* FROM to convert it to a DELETE statement to remove the rows.
MySQL raises error 1093 when this is converted to a DELETE statement, because the subquery reads from the table being deleted from.
One workaround is to make the query above into an inline view, and then join it to the target table:
DELETE q.*
FROM `setup_like_this` q
JOIN ( -- inline view, query from above returns `id` of rows we want to delete
SELECT t.`id`
, t.`group_id`
FROM `setup_like_this` t
WHERE t.`status` = 'available'
AND t.`id`
> ( SELECT s.`id`
FROM `setup_like_this` s
WHERE s.`status` = 'available'
AND s.`group_id` = t.`group_id`
ORDER
BY s.`id`
LIMIT 1,1
)
) r
ON r.id = q.id
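The "test as a SELECT first, then delete" workflow can be tried out like this. This is only a sketch, assuming Python's sqlite3 module in place of MySQL. SQLite has no multi-table DELETE, so the ids returned by the SELECT are deleted in a second step. LIMIT 1 OFFSET 1 is the same thing as LIMIT 1,1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE setup_like_this (id INTEGER PRIMARY KEY, group_id TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO setup_like_this (id, group_id, status) VALUES (?, ?, 'available')",
    [(i, "group_1" if i <= 5 else "group_2") for i in range(1, 11)],
)

# Rows whose id is greater than the second-lowest id of their group.
to_delete_sql = """
    SELECT t.id
    FROM setup_like_this t
    WHERE t.status = 'available'
      AND t.id > (SELECT s.id
                  FROM setup_like_this s
                  WHERE s.status = 'available'
                    AND s.group_id = t.group_id
                  ORDER BY s.id
                  LIMIT 1 OFFSET 1)
    ORDER BY t.id
"""
ids_to_delete = [r[0] for r in conn.execute(to_delete_sql)]
print(ids_to_delete)  # inspect before deleting anything

# Second step: delete exactly the ids that were inspected above.
conn.executemany(
    "DELETE FROM setup_like_this WHERE id = ?", [(i,) for i in ids_to_delete]
)
remaining_ids = [r[0] for r in conn.execute("SELECT id FROM setup_like_this ORDER BY id")]
print(remaining_ids)
```

If a group has fewer than two available rows, the subquery returns NULL for it and nothing in that group is selected for deletion.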

select id, code, group_id, status
from (
select id, code, group_id, status
, ROW_NUMBER() OVER (
PARTITION BY group_id
ORDER BY id DESC) AS rownum
from a
) q
where rownum < 3
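For what it's worth, the ROW_NUMBER() idea can be checked with a small sketch like the one below. It assumes Python's sqlite3 module with SQLite 3.25 or newer (window functions), standing in for MySQL 8+. The table name a comes from the query above; the data is the question's. The query returns the two rows to keep per group, so a delete would target every id that is not in this result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE a (id INTEGER PRIMARY KEY, code TEXT, group_id TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO a (id, group_id, status) VALUES (?, ?, 'available')",
    [(i, "group_1" if i <= 5 else "group_2") for i in range(1, 11)],
)

# Number the rows of each group from highest id to lowest, keep the first two.
kept = conn.execute("""
    SELECT id, group_id
    FROM (
        SELECT id, group_id,
               ROW_NUMBER() OVER (PARTITION BY group_id ORDER BY id DESC) AS rownum
        FROM a
    ) q
    WHERE rownum < 3
    ORDER BY id
""").fetchall()
print(kept)
```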

Related

How to query rows where only the rows with the highest value in a specific column appear?

Sorry if my phrasing is confusing, self learning PL/SQL. I am trying to query all the columns in rows that have the highest value based on one column.
example: I have a table with three rows and three columns
Table: PTest
Ptest_no | Test_id | Test_inst
------------------------------
ABC11 | 1 | 1
ABC11 | 2 | 1
ABC11 | 2 | 2
I need to get just the top and bottom row with all the columns it has (final table will have close to 10+ columns)
result:
ABC11 | 1 | 1
ABC11 | 2 | 2
I tried:
--but it only prints 3rd row.
select * from ptest
where test_inst = (select max(test_inst) from ptest);
--attempted self join thinking that a subquery could help specify the condition.
--but only prints 3rd row
select a.Ptest_no, a.test_id, a.test_inst
from PTest a
join (select max(test_inst) as max_insty
from PTest b
where PTest_no = 'ABC11') on max_insty = a.test_inst
where PTest_no = 'ABC11';
--results in invalid relational operator.
--I am unsure what that means.
select test_inst, ptest_no, test_id
from ptest
group by test_inst, ptest_no, test_id having max(test_inst);
Currently trying:
- Attempting again with a self join but using CASE. I am having a hard time with CASE and am unsure how to properly end it, or whether it is the best route. With the CASE commented out, the query prints only the 3rd row.
- Added a 4th column named ptest_snu with the value '69' on all rows. Unsure why I did this.
select a.Ptest_no, a.test_id, a.test_inst, a.ptest_snu
from PTest a
--case
--when a.test_id = b.test_id then select max(test_inst)
--else (select * from Ptest a) end
join (select max(test_inst) as max_insty
from PTest b
where PTest_no = 'ABC11') on max_insty = a.test_inst
where a.ptest_snu = '69';
I suspect that you want the row with the greatest test_inst for each test_id. If so, this is a greatest-n-per-group problem; one option is to filter with a correlated subquery:
select t.*
from ptest t
where t.test_inst = (
select max(t1.test_inst) from ptest t1 where t1.test_id = t.test_id
)
You can also use window functions:
select *
from (
select t.*, row_number() over(partition by test_id order by test_inst desc) rn
from ptest t
) t
where rn = 1
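Here is a small sketch of the correlated-subquery version run against the three sample rows, assuming Python's sqlite3 module as a stand-in for the asker's database. The important detail is that the inner query is correlated to the outer row through t.test_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ptest (ptest_no TEXT, test_id INTEGER, test_inst INTEGER)")
conn.executemany(
    "INSERT INTO ptest VALUES (?, ?, ?)",
    [("ABC11", 1, 1), ("ABC11", 2, 1), ("ABC11", 2, 2)],
)

# For each row, compare test_inst with the maximum test_inst of the same test_id.
result = conn.execute("""
    SELECT t.ptest_no, t.test_id, t.test_inst
    FROM ptest t
    WHERE t.test_inst = (
        SELECT MAX(t1.test_inst) FROM ptest t1 WHERE t1.test_id = t.test_id
    )
    ORDER BY t.test_id
""").fetchall()
print(result)
```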
I think this will return your desired result:
select * from ptest
where
(
test_inst = (select max(test_inst) from ptest)
and
test_id = (select max(test_id) from ptest)
)
or
(
test_inst = (select min(test_inst) from ptest)
and
test_id = (select min(test_id) from ptest)
)
So both columns have to be equal to the highest values in those columns or the lowest. Not just either one of them.

SQL Create Unique Value Flag

There are lots of questions/answers about selecting unique values in a MySQL query but I haven't seen any on creating a unique value flag.
I have a customer_ID that can appear more than once in a query output. I want to create a new column that flags whether the customer_ID is unique or not (0 or 1).
The output should look something like this:
ID | Customer ID | Unique_Flag
1 | 1234 | 1
2 | 2345 | 1
3 | 2345 | 0
4 | 5678 | 1
Please let me know if anybody needs clarifications.
You seem to want to mark the first occurrence as unique, but not others. So, let's join in the comparison value:
select t.*,
(id = min_id) as is_first_occurrence
from t join
(select customer_id, min(id) as min_id
from t
group by customer_id
) tt
on t.customer_id = tt.customer_id;
For most people, a "unique" flag would mean that the overall count is "1", not that this is merely the first appearance. If that is what you want, then you can use similar logic:
select t.*,
(id = min_id) as is_first_occurrence,
(cnt = 1) as is_unique
from t join
(select customer_id, min(id) as min_id, count(*) as cnt
from t
group by customer_id
) tt
on t.customer_id = tt.customer_id;
And, in MySQL 8+, you would use window functions:
select t.*,
(row_number() over (partition by customer_id order by id) = 1) as is_first_occurrence,
(count(*) over (partition by customer_id) = 1) as is_unique
from t;
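To see how the two flags differ, here is a minimal sketch of the join version on the sample data from the question. It assumes Python's sqlite3 module in place of MySQL; like MySQL, SQLite evaluates a comparison to 1 or 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)", [(1, 1234), (2, 2345), (3, 2345), (4, 5678)]
)

# is_first_occurrence: this row has the lowest id for its customer_id.
# is_unique: the customer_id appears exactly once in the whole table.
flags = conn.execute("""
    SELECT t.id, t.customer_id,
           (t.id = tt.min_id) AS is_first_occurrence,
           (tt.cnt = 1)       AS is_unique
    FROM t
    JOIN (SELECT customer_id, MIN(id) AS min_id, COUNT(*) AS cnt
          FROM t
          GROUP BY customer_id) tt
      ON t.customer_id = tt.customer_id
    ORDER BY t.id
""").fetchall()
for row in flags:
    print(row)
```

The is_first_occurrence column matches the Unique_Flag column in the question, while is_unique is 0 for both rows of customer 2345.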
You can try below
select id,a.customerid, case when cnt=1 then 1 else 0 end as Unique_Flag
from tablename a
left join
(select customerid, count(*) as cnt from tablename
group by customerid
)b on a.customerid=b.customerid
You can use lead function as given below to get the required output.
SELECT ID, CUSTOMER_ID,
CASE
WHEN CUSTOMER_ID != CUSTOMER_ID_NEXT THEN 1
ELSE 0
END AS UNIQUE_FLAG FROM
(SELECT ID, CUSTOMER_ID,LEAD(CUSTOMER_ID, 1, 0) OVER (ORDER BY CUSTOMER_ID) AS CUSTOMER_ID_NEXT FROM TABLE)T

MySQL join two tables with group by in each

I have two MySQL tables with part numbers and quantities. I want to sum each table's qty (sum(qty) ... group by partNumber), then join the two tables on the part number.
Sometimes table A will have part numbers that table b does not and vice versa. Below is an image of what I am expecting.
I've tried something like this, but this returns a row for each table and I want it to return 1 combined row
SELECT *, null as macroQty, sum(qty) as cardinalQty
FROM parts.cardinal where fileinfoid IN
(select cardinalFiles from parts.reports where fileinfoid = 418)
GROUP BY partNumber UNION ALL
SELECT *, sum(qty) as macroQty, null as cardinalQty
FROM parts.macro where fileinfoid IN
(select macroFiles from parts.reports where fileinfoid = 418 )
GROUP BY partNumber
I also tried wrapping it in an outer select and grouping by the part number from the outer select like this, but this results in the second inner select being null always
SELECT * FROM (
SELECT *, null as macroQty, sum(qty) as cardinalQty
FROM parts.cardinal where fileinfoid IN
(select cardinalFiles from parts.reports where fileinfoid = 418)
GROUP BY partNumber UNION ALL
SELECT *, sum(qty) as macroQty, null as cardinalQty
FROM parts.macro where fileinfoid IN
(select macroFiles from parts.reports where fileinfoid = 418 )
GROUP BY partNumber
) combined GROUP BY combined.partNumber
One approach would be to identify unique part numbers across the 2 tables (using a UNION with its implied DISTINCT) and then use correlated subqueries to get the sums. For example:
drop table if exists a,b;
create table a(id int,val int);
create table b(id int,val int);
insert into a values(1,10),(1,10),(3,10),(4,10);
insert into b values (2,10),(4,10),(4,10);
select (select sum(a.val) from a where a.id = s.id) aval,
(select sum(b.val) from b where b.id = s.id) bval,
s.id partno
from
(
select id from a
union select id from b
) s
order by s.id;
+------+------+--------+
| aval | bval | partno |
+------+------+--------+
| 20 | NULL | 1 |
| NULL | 10 | 2 |
| 10 | NULL | 3 |
| 10 | 20 | 4 |
+------+------+--------+
4 rows in set (0.00 sec)
I would phrase this as a join between two subqueries which each find the sum in their respective tables. However, since each table does not necessarily contain all part numbers, and in fact there may be part numbers unique to each table, we need a full outer join. MySQL does not support FULL OUTER JOIN directly, so it is emulated here with a LEFT JOIN and a RIGHT JOIN combined by UNION ALL.
SELECT
t1.partNumber,
t1.cardinalQty,
COALESCE(t2.macroQty, 0) AS macroQty
FROM
(
SELECT partNumber, SUM(qty) AS cardinalQty
FROM cardinal
GROUP BY partNumber
) t1
LEFT JOIN
(
SELECT partNumber, SUM(qty) AS macroQty
FROM macro
GROUP BY partNumber
) t2
ON t1.partNumber = t2.partNumber
UNION ALL
SELECT
t2.partNumber,
0 AS cardinalQty,
t2.macroQty
FROM
(
SELECT partNumber, SUM(qty) AS cardinalQty
FROM cardinal
GROUP BY partNumber
) t1
RIGHT JOIN
(
SELECT partNumber, SUM(qty) AS macroQty
FROM macro
GROUP BY partNumber
) t2
ON t1.partNumber = t2.partNumber
WHERE t1.partNumber IS NULL;
Keep in mind that under normal conditions, in a well designed database, you should rarely encounter a situation which requires using a full outer join. Actually, a full outer join screams out that there is a design problem. In this case, you don't have a single parts table containing all part numbers. That table should exist, so unless you enjoy big ugly queries, you should create a parts table where the partNumber is a primary key.
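For illustration, here is a sketch of the emulated full outer join on made-up data. It assumes Python's sqlite3 module rather than MySQL. The part numbers P1 to P4 and their quantities are hypothetical, and the second half uses a LEFT JOIN with the tables swapped instead of a RIGHT JOIN, because older SQLite versions have no RIGHT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cardinal (partNumber TEXT, qty INTEGER)")
conn.execute("CREATE TABLE macro (partNumber TEXT, qty INTEGER)")
# Hypothetical sample data: P1 and P3 only in cardinal, P2 only in macro, P4 in both.
conn.executemany(
    "INSERT INTO cardinal VALUES (?, ?)",
    [("P1", 10), ("P1", 10), ("P3", 10), ("P4", 10)],
)
conn.executemany(
    "INSERT INTO macro VALUES (?, ?)", [("P2", 10), ("P4", 10), ("P4", 10)]
)

combined = conn.execute("""
    -- Every part found in cardinal, with its macro total if there is one.
    SELECT c.partNumber, c.cardinalQty, COALESCE(m.macroQty, 0) AS macroQty
    FROM (SELECT partNumber, SUM(qty) AS cardinalQty
          FROM cardinal GROUP BY partNumber) c
    LEFT JOIN (SELECT partNumber, SUM(qty) AS macroQty
               FROM macro GROUP BY partNumber) m
      ON c.partNumber = m.partNumber
    UNION ALL
    -- Parts found only in macro.
    SELECT m.partNumber, 0 AS cardinalQty, m.macroQty
    FROM (SELECT partNumber, SUM(qty) AS macroQty
          FROM macro GROUP BY partNumber) m
    LEFT JOIN (SELECT partNumber, SUM(qty) AS cardinalQty
               FROM cardinal GROUP BY partNumber) c
      ON c.partNumber = m.partNumber
    WHERE c.partNumber IS NULL
    ORDER BY 1
""").fetchall()
for row in combined:
    print(row)
```

Each part number comes back exactly once, with 0 in the column of the table it is missing from.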

Mysql - Get the difference between two sequential values

I want to get the difference between two sequential values from my table.
| id | count |
| 1 | 1 |
| 2 | 7 |
| 3 | 9 |
| 4 | 3 |
| 5 | 7 |
| 6 | 9 |
For example the difference between
id2-id1 = 6,
id3-id2 = 2,
...
How can I do it? SELECT SUM(id(x+1) - id(x)) FROM table1
You can use a subquery to find count for the preceding id.
In case there are no gaps in the ID column:
SELECT CONCAT(t.`id` ,' - ', t.`id` - 1) AS `IDs`
, t.`count` - (SELECT `count`
FROM `tbl`
WHERE `id` = t.`id` - 1) AS `Difference`
FROM `tbl` t
WHERE t.`id` > 1
In case there are gaps in the ID column:
First solution, using ORDER BY <...> DESC with LIMIT 1:
SELECT CONCAT(t.id ,' - ', (SELECT `id` FROM tbl WHERE t.id > id ORDER BY id DESC LIMIT 1)) AS IDs
, t.`count` - (SELECT `count`
FROM tbl
WHERE t.id > id
ORDER BY id DESC
LIMIT 1) AS difference
FROM tbl t
WHERE t.id > 1;
Second solution, using another subquery to find count with the MAX(id) less than current id:
SELECT CONCAT(t.id ,' - ', (SELECT MAX(`id`) FROM tbl WHERE id < t.id)) AS IDs
, t.`count` - (SELECT `count`
FROM tbl
WHERE `id` = (SELECT MAX(`id`)
FROM tbl
WHERE id < t.id)
) AS difference
FROM tbl t
WHERE t.id > 1;
P.S. : First column, IDs, is just for readability, you can omit it or change completely, if it is necessary.
If you know that the ids have no gaps, then just use a join:
select t.*, (tnext.count - t.count) as diff
from table1 t join
table1 tnext
on t.id = tnext.id - 1;
If you just want the sum of the differences, then that is the same as the last value minus the first value (all the intermediate values cancel out in the summation). You can do this with limit:
select last.count - first.count
from (select t.* from table1 t order by id limit 1) as first cross join
(select t.* from table1 t order by id desc limit 1) as last;
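The cancellation argument is easy to verify on the sample data. This is a sketch, assuming Python's sqlite3 module in place of MySQL, with the table name table1 taken from the question: the self join produces the individual differences, and their sum equals the last count minus the first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE table1 (id INTEGER PRIMARY KEY, "count" INTEGER)')
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(1, 1), (2, 7), (3, 9), (4, 3), (5, 7), (6, 9)],
)

# Difference between each row and the next one (requires ids without gaps).
diffs = [r[0] for r in conn.execute("""
    SELECT tnext."count" - t."count" AS diff
    FROM table1 t
    JOIN table1 tnext ON t.id = tnext.id - 1
    ORDER BY t.id
""")]

# Last value minus first value, which is what the differences sum to.
last_minus_first = conn.execute("""
    SELECT (SELECT "count" FROM table1 ORDER BY id DESC LIMIT 1)
         - (SELECT "count" FROM table1 ORDER BY id LIMIT 1)
""").fetchone()[0]

print(diffs, sum(diffs), last_minus_first)
```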
Try this:
SELECT MAX(count)-MIN(count) diff FROM table1 WHERE id IN(1,2)
Or this way
SELECT 2*STD(count) diff FROM table1 WHERE id IN(1,2)
This works even if ids have distances between them:
SELECT *,
((SELECT value FROM example e2 WHERE e2.id > e1.id ORDER BY id ASC LIMIT 1) - value) as diff
FROM example e1;

Fetch 2nd Highest value from MySQL DB with GROUP BY

I have a table tbl_patient and I want to fetch the last 2 visits of each patient in order to compare whether the patient's condition is improving or degrading.
tbl_patient
id | patient_ID | visit_ID | patient_result
1 | 1 | 1 | 5
2 | 2 | 1 | 6
3 | 2 | 3 | 7
4 | 1 | 2 | 3
5 | 2 | 3 | 2
6 | 1 | 3 | 9
I tried the query below to fetch the last visit of each patient:
SELECT MAX(id), patient_result FROM `tbl_patient` GROUP BY `patient_ID`
Now I want to fetch the 2nd-to-last visit of each patient, but my query gives me an error:
(#1242 - Subquery returns more than 1 row)
SELECT id, patient_result FROM `tbl_patient` WHERE id <(SELECT MAX(id) FROM `tbl_patient` GROUP BY `patient_ID`) GROUP BY `patient_ID`
Where am I going wrong?
select p1.patient_id, p2.maxid id1, max(p1.id) id2
from tbl_patient p1
join (select patient_id, max(id) maxid
from tbl_patient
group by patient_id) p2
on p1.patient_id = p2.patient_id and p1.id < p2.maxid
group by p1.patient_id
id1 is the ID of the last visit, id2 is the ID of the 2nd to last visit.
Your first query doesn't get the last visits, since it gives results 5 and 6 instead of 2 and 9.
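A quick way to check the join is to run it on the sample rows. The sketch below assumes Python's sqlite3 module as a stand-in for MySQL. The only change is that p2.maxid is added to the GROUP BY so the statement does not rely on MySQL's relaxed grouping rules.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_patient (id INTEGER PRIMARY KEY, patient_ID INTEGER, "
    "visit_ID INTEGER, patient_result INTEGER)"
)
conn.executemany(
    "INSERT INTO tbl_patient VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 5), (2, 2, 1, 6), (3, 2, 3, 7), (4, 1, 2, 3), (5, 2, 3, 2), (6, 1, 3, 9)],
)

# id1 = highest id per patient, id2 = highest id below that one.
last_two = conn.execute("""
    SELECT p1.patient_ID, p2.maxid AS id1, MAX(p1.id) AS id2
    FROM tbl_patient p1
    JOIN (SELECT patient_ID, MAX(id) AS maxid
          FROM tbl_patient
          GROUP BY patient_ID) p2
      ON p1.patient_ID = p2.patient_ID AND p1.id < p2.maxid
    GROUP BY p1.patient_ID, p2.maxid
    ORDER BY p1.patient_ID
""").fetchall()
print(last_two)
```

A patient with a single visit does not appear in the result at all, because no row satisfies p1.id < p2.maxid.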
You can try this query:
SELECT patient_ID,visit_ID,patient_result
FROM tbl_patient
where id in (
select max(id)
from tbl_patient
GROUP BY patient_ID)
union
SELECT patient_ID,visit_ID,patient_result
FROM tbl_patient
where id in (
select max(id)
from tbl_patient
where id not in (
select max(id)
from tbl_patient
GROUP BY patient_ID)
GROUP BY patient_ID)
order by 1,2
SELECT id, patient_result FROM `tbl_patient` t1
JOIN (SELECT MAX(id) as max, patient_ID FROM `tbl_patient` GROUP BY `patient_ID`) t2
ON t1.patient_ID = t2.patient_ID
WHERE id <max GROUP BY t1.`patient_ID`
There are a couple of approaches to getting the specified resultset returned in a single SQL statement.
Unfortunately, most of those approaches yield rather unwieldy statements.
The more elegant looking statements tend to come with poor (or unbearable) performance when dealing with large sets. And the statements that tend to have better performance are less elegant looking.
Three of the most common approaches make use of:
correlated subquery
inequality join (nearly a Cartesian product)
two passes over the data
Here's an approach that uses two passes over the data, using MySQL user variables, which basically emulates the analytic RANK() OVER(PARTITION ...) function available in other DBMS:
SELECT t.id
, t.patient_id
, t.visit_id
, t.patient_result
FROM (
SELECT p.id
, p.patient_id
, p.visit_id
, p.patient_result
, @rn := if(@prev_patient_id = patient_id, @rn + 1, 1) AS rn
, @prev_patient_id := patient_id AS prev_patient_id
FROM tbl_patients p
JOIN (SELECT @rn := 0, @prev_patient_id := NULL) i
ORDER BY p.patient_id DESC, p.id DESC
) t
WHERE t.rn <= 2
Note that this involves an inline view, which means there's going to be a pass over all the data in the table to create a "derived table". Then, the outer query will run against the derived table. So, this is essentially two passes over the data.
This query can be tweaked a bit to improve performance, by eliminating the duplicated value of the patient_id column returned by the inline view. But I show it as above, so we can better understand what is happening.
This approach can be rather expensive on large sets, but is generally MUCH more efficient than some of the other approaches.
Note also that this query will return a row for a patient_id even if only one id value exists for that patient; it does not restrict the result to patients that have at least two rows.
It's also possible to get an equivalent resultset with a correlated subquery:
SELECT t.id
, t.patient_id
, t.visit_id
, t.patient_result
FROM tbl_patients t
WHERE ( SELECT COUNT(1) AS cnt
FROM tbl_patients p
WHERE p.patient_id = t.patient_id
AND p.id >= t.id
) <= 2
ORDER BY t.patient_id ASC, t.id ASC
Note that this is making use of a "dependent subquery", which basically means that for each row returned from t, MySQL is effectively running another query against the database. So, this will tend to be very expensive (in terms of elapsed time) on large sets.
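As a rough check of the correlated-subquery form, here is a sketch on the question's sample rows. It assumes Python's sqlite3 module in place of MySQL and keeps the table name tbl_patients used in this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_patients (id INTEGER PRIMARY KEY, patient_id INTEGER, "
    "visit_id INTEGER, patient_result INTEGER)"
)
conn.executemany(
    "INSERT INTO tbl_patients VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 5), (2, 2, 1, 6), (3, 2, 3, 7), (4, 1, 2, 3), (5, 2, 3, 2), (6, 1, 3, 9)],
)

# For each row, count the rows of the same patient with an equal or higher id.
# A count of 1 or 2 means the row is among that patient's last two visits.
last_two_visits = conn.execute("""
    SELECT t.id, t.patient_id
    FROM tbl_patients t
    WHERE (SELECT COUNT(1)
           FROM tbl_patients p
           WHERE p.patient_id = t.patient_id
             AND p.id >= t.id) <= 2
    ORDER BY t.patient_id ASC, t.id ASC
""").fetchall()
print(last_two_visits)
```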
As another approach, if there are relatively few id values for each patient, you might be able to get by with an inequality join:
SELECT t.id
, t.patient_id
, t.visit_id
, t.patient_result
FROM tbl_patients t
LEFT
JOIN tbl_patients p
ON p.patient_id = t.patient_id
AND t.id < p.id
GROUP
BY t.id
, t.patient_id
, t.visit_id
, t.patient_result
HAVING COUNT(p.id) < 2
Note that this will create a nearly Cartesian product for each patient. For a limited number of id values for each patient, this won't be too bad. But if a patient has hundreds of id values, the intermediate result can be huge, on the order of O(n**2).
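The inequality join can be sketched the same way. This assumes Python's sqlite3 module as a stand-in for MySQL and the table name tbl_patients. The sketch counts only the matched later rows with COUNT(p.id), so a row survives when fewer than two later visits exist for the same patient. Because of the LEFT JOIN, the latest visit still produces one row with a NULL p.id, and COUNT(p.id) treats that as zero.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_patients (id INTEGER PRIMARY KEY, patient_id INTEGER, "
    "visit_id INTEGER, patient_result INTEGER)"
)
conn.executemany(
    "INSERT INTO tbl_patients VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 5), (2, 2, 1, 6), (3, 2, 3, 7), (4, 1, 2, 3), (5, 2, 3, 2), (6, 1, 3, 9)],
)

# Pair every row with the later rows of the same patient, then keep the rows
# that have fewer than two later rows.
last_two_visits = conn.execute("""
    SELECT t.id, t.patient_id
    FROM tbl_patients t
    LEFT JOIN tbl_patients p
      ON p.patient_id = t.patient_id
     AND t.id < p.id
    GROUP BY t.id, t.patient_id
    HAVING COUNT(p.id) < 2
    ORDER BY t.patient_id, t.id
""").fetchall()
print(last_two_visits)
```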
Try this..
SELECT id, patient_result FROM tbl_patient AS tp WHERE id < ((SELECT MAX(id) FROM tbl_patient AS tp_max WHERE tp_max.patient_ID = tp.patient_ID) - 1) GROUP BY patient_ID
Why not use simply...
GROUP BY `patient_ID` DESC LIMIT 2
... and do the rest in the next step?