mySQL --> LIMIT selected rows, WHERE column value is same - mysql

This is my query... SELECT * FROM comments WHERE content_id in (525, 537) LIMIT 60 This is the SS of result:
here content_id = 537 is selected 5 times.
(comment_id is UNIQUE key )..
My question is: How to limit selected rows by 2, where values of content_id is same...
Maximum two duplicate records for each content_id... like in this picture:

If you are running MySQL 8.0, you can do this with row_number():
select comment_id, content_id
from (
select t.*, row_number() over(partition by content_id order by comment_id) rn
from mytable t
) t
where rn <= 2
In earlier versions, one solution is a correlated subquery:
select t.*
from mytable t
where (
select count(*)
from mytable t1
where t1.content_id = t.content_id and t1.comment_id < t.comment_id
) < 2

Related

Deleting rows using a limit and offset without using IN clause

I want to delete rows with an offset, so I am forced to use a nested query since its not support in the raw DELETE clause.
I know this would worked (ID is the primary key):
DELETE FROM customers
WHERE ID IN (
SELECT ID
FROM customers
WHERE register_date > '2012-01-01'
ORDER BY register_date ASC
LIMIT 5, 10
);
However, this is unsupported in my version as I get the error
This version of MariaDB doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'.
Server version: 10.4.22-MariaDB
What can I do to achieve the same result as above that is supported in my version.
CREATE TABLE customers (
ID INT PRIMARY KEY AUTO_INCREMENT,
NAME VARCHAR(32) NOT NULL,
REGISTER_DATE DATETIME NOT NULL
);
Join the table to a subquery that uses ROW_NUMBER() window function to sort the rows and filter the rows that you want to be deleted:
DELETE c
FROM customers c
INNER JOIN (
SELECT *, ROW_NUMBER() OVER (ORDER BY register_date) rn
FROM customers
WHERE register_date > '2012-01-01'
) t ON t.ID = c.ID
WHERE t.rn > 5 AND t.rn <= 15; -- get 10 rows with row numbers 6 - 15
See the demo.
If I did not miss something a simple delete with join will do the job...
delete customers
from (select *
from customers
WHERE register_date > '2012-01-01'
order by register_date asc
limit 5, 2) customers2
join customers on customers.id = customers2.id
Here is a demo for your version of MariaDB
You could try assigning a rank to your rows with the ROW_NUMBER window function, then catch those rows whose rank position is between 5 and 15.
DELETE FROM customers
WHERE ID IN (
SELECT *
FROM (SELECT ID,
ROW_NUMBER() OVER(
ORDER BY IF(register_date>'2012-01-01', 0, 1)
register_date ) AS rn
FROM customers) ranked_ids
WHERE rn > 4
AND rn < 16
);
This would safely avoid the use of LIMIT, though achieves the same result.
EDIT. Doing it with a join.
DELETE FROM customers c
INNER JOIN (SELECT ID,
ROW_NUMBER() OVER(
ORDER BY IF(register_date>'2012-01-01', 0, 1)
register_date ) AS rn
FROM customers) ranked_ids
WHERE
) ids_to_delete
ON c.ID = ids_to_delete.ID
AND ids_to_delete.rn > 4
AND ids_to_delete.rn < 16

get the most common value for each column

I'm attempting to create an SQL query that retrieves the total_cost for every row in a table. Alongside that, I also need to collect the most dominant value for both columnA and columnB, with their respective values.
For example, with the following table contents:
cost
columnA
columnB
target
250
Foo
Bar
XYZ
200
Foo
Bar
XYZ
150
Bar
Bar
ABC
250
Foo
Bar
ABC
The result would need to be:
total_cost
columnA_dominant
columnB_dominant
columnA_value
columnB_value
850
Foo
Bar
250
400
Now I can get as far as calculating the total cost - that's no issue. I can also get the most dominant value for columnA using this answer. But after this, I'm not sure how to also get the dominant value for columnB and the values too.
This is my current SQL:
SELECT
SUM(`cost`) AS `total_cost`,
COUNT(`columnA`) AS `columnA_dominant`
FROM `table`
GROUP BY `columnA_dominant`
ORDER BY `columnA_dominant` DESC
WHERE `target` = "ABC"
UPDATE: Thanks to #Barmar for the idea of using a subquery, I managed to get the dominant values for columnA and columnB:
SELECT
-- Retrieve total cost.
SUM(`cost`) AS `total_cost`,
-- Get dominant values.
(
SELECT `columnA`
FROM `table`
GROUP BY `columnA`
ORDER BY COUNT(*) DESC
LIMIT 1
) AS `columnA_dominant`,
(
SELECT `columnB`
FROM `table`
GROUP BY `columnB`
ORDER BY COUNT(*) DESC
LIMIT 1
) AS `columnB_dominant`
FROM `table`
WHERE `target` = "XYZ"
However, I'm still having issues figuring out how to calculate the respective values.
You might get close, if we want to get percentage values we can try to add COUNT(*) at subquery to get max count by columnA and columnB then do division by total count
SELECT
SUM(cost),
(
SELECT tt.columnA
FROM T tt
GROUP BY tt.columnA
ORDER BY COUNT(*) DESC
LIMIT 1
) AS columnA_dominant,
(
SELECT tt.columnB
FROM T tt
GROUP BY tt.columnB
ORDER BY COUNT(*) DESC
LIMIT 1
) AS columnB_dominant,
(
SELECT COUNT(*)
FROM T tt
GROUP BY tt.columnA
ORDER BY COUNT(*) DESC
LIMIT 1
) / COUNT(*) AS columnA_percentage,
(
SELECT COUNT(*)
FROM T tt
GROUP BY tt.columnB
ORDER BY COUNT(*) DESC
LIMIT 1
) / COUNT(*) AS columnB_percentage
FROM T t1
If your MySQL version supports the window function, there is another way which reduce table scan might get better performance than a correlated subquery
SELECT SUM(cost) OVER(),
FIRST_VALUE(columnA) OVER (ORDER BY counter1 DESC) columnA_dominant,
FIRST_VALUE(columnB) OVER (ORDER BY counter2 DESC) columnB_dominant,
FIRST_VALUE(counter1) OVER (ORDER BY counter1 DESC) / COUNT(*) OVER() columnA_percentage,
FIRST_VALUE(counter2) OVER (ORDER BY counter2 DESC) / COUNT(*) OVER() columnB_percentage
FROM (
SELECT *,
COUNT(*) OVER (PARTITION BY columnA) counter1,
COUNT(*) OVER (PARTITION BY columnB) counter2
FROM T
) t1
LIMIT 1
sqlfiddle
try this query
select sum(cost) as total_cost,p.columnA,q.columnB,p.columnA_percentage,q.columnB_percentage
from get_common,(
select top 1 columnA,columnA_percentage
from(
select columnA,count(columnA) as count_columnA,cast(count(columnA) as float)/(select count(columnA) from get_common) as columnA_percentage
from get_common
group by columnA)s
order by count_columnA desc
)p,
(select top 1 columnB,columnB_percentage
from (
select columnB,count(columnB) as count_columnB, cast(count(columnB) as float)/(select count(columnB) from get_common) as columnB_percentage
from get_common
group by columnB) t
order by count_columnB desc)q
group by p.columnA,q.columnB,p.columnA_percentage,q.columnB_percentage
so if you want to get the percent and dominant value you must make their own query like this
select top 1 columnA,columnA_percentage
from(
select columnA,count(columnA) as count_columnA,cast(count(columnA) as float)/(select count(columnA) from get_common) as columnA_percentage
from get_common
group by columnA)s
order by count_columnA desc
then you can join with the sum query to get all value you want
hope this can help you

SQL Query about percentage selection

I am trying to write a query for a condition:
If >=80 percent (4 or more rows as 4/5*100=80%) of the top 5 recent rows(by Date Column), for a KEY have Value =A or =B, then change the flag from fail to pass for the entire KEY.
Here is the input and output sample:
I have highlighted recent rows with green colour in the sample.
Can someone help me in this?
I tried till finding the top 5 recent rows by the foll code:
select * from(
select *, row_number() over (partition by "KEY") as 'RN' FROM (
select * from tb1
order by date desc))
where "RN"<=5
Couldnt figure what to be done after this
Test this:
WITH
-- enumerate rows per key group
cte1 AS ( SELECT *,
ROW_NUMBER() OVER (PARTITION BY `key` ORDER BY `date` DESC) rn
FROM sourcetable ),
-- take 5 recent rows only, check there are at least 4 rows with A/B
cte2 AS ( SELECT `key`
FROM cte1
WHERE rn <= 5
GROUP BY `key`
HAVING ( SUM(`value` = 'A') >= 4
OR SUM(`value` = 'B') >= 4 )
-- AND SUM(rn = 5) )
-- update rows with found key values
UPDATE sourcetable
JOIN cte2 USING (`key`)
SET flag = 'PASS';
5.7 version – Ayn76
Convert CTEs to subqueries. Emulate ROW_NUMBER() using user-defined variable.

Group by date and take the last one

This is my table :
What I'm trying to do, is to take the last disponibility of a user, by caserne. Example, I should have this result :
id id_user id_caserne id_dispo created_at
31 21 12 1 2019-10-24 01:21:46
33 21 13 1 2019-10-23 20:17:21
I've tried this sql, but it does not seems to work all the times :
SELECT * FROM
( SELECT id, id_dispo, id_user, id_caserne, MAX(created_at)
FROM disponibilites GROUP BY id_user, id_caserne, id_dispo
ORDER BY created_at desc ) AS sub
GROUP BY id_user, id_caserne
What am I doing wrong ?
I would simply use filtering in the where clause using a correlated subquery:
select d.*
from disponibilites d
where d.created_at = (select max(d2.created_at)
from disponibilites d2
where d2.id_user = d.id_user
);
EDIT:
Based on your comments:
select d.*
from disponibilites d
where d.created_at = (select max(d2.created_at)
from disponibilites d2
where d2.id_user = d.id_user and
d2.id_caserne = d.id_caserne
where date(d2.created_at) = date(d.created_at)
);
You can use a correlated subquery, as demonstrated by Gordon Linoff, or a window function if your RDBMS supports it:
select * from (
select
t.*,
rank() over(partition by id_caserne, id_user order by created_at desc) rn
from disponibilites t
) x
where rn = 1
Another option is to use a correlated subquery without aggregation, only with a sort and limit:
select *
from mytable t
where created_at = (
select created_at
from mytable t1
where t1.id_user = t.id_user and t1.id_caserne = t.id_caserne
order by created_at desc
limit 1
)
With an index on (id_user, id_caserne, created_at), this should be a very efficient option.
you can join your max(created_date) to your original table
select t1.* from disponibilites t1
inner join
(select max(created_at), id_caserne, id
from disponibilites
group by id_caserne, id) t2
on t2.id = t1.id

How to get rows with max(of a column) when using group by

When I tried:
select *
from some_table
group by table_id
order by timestamp desc;
I am gettings rows which have least timestamp values for that particular table_id(which I use for grouping)
How can I get the rows which have highest timestamp values for that particular table_id.
I also tried:
select *
from some_table
group by table_id
having max(timestamp)
order by timestamp desc;
which gives the same result as in the 1st case.
Thank You
select *
from your_table t
inner join
(
select table_id, max(created_timestamp) as mts
from your_table
group by table_id
) x on x.table_id = t.table_id
and x.mts = t.created_timestamp
In MySQL you can do
select *, max(created_timestamp) as mts
from your_table
group by table_id
but that will not make sure you get the corresponding data to your max(created_timestamp) but only to your table_id
SELECT * FROM (SELECT * FROM some_table ORDER BY timestamp) t1 GROUP BY t1.id