SQL Query about percentage selection

SQL Query about percentage selection - mysql

I am trying to write a query for a condition:
If >=80 percent (4 or more rows as 4/5*100=80%) of the top 5 recent rows(by Date Column), for a KEY have Value =A or =B, then change the flag from fail to pass for the entire KEY.
Here is the input and output sample:
I have highlighted recent rows with green colour in the sample.
Can someone help me in this?
I tried till finding the top 5 recent rows by the foll code:
select * from(
select *, row_number() over (partition by "KEY") as 'RN' FROM (
select * from tb1
order by date desc))
where "RN"<=5
Couldnt figure what to be done after this

Test this:
WITH
-- enumerate rows per key group
cte1 AS ( SELECT *,
ROW_NUMBER() OVER (PARTITION BY `key` ORDER BY `date` DESC) rn
FROM sourcetable ),
-- take 5 recent rows only, check there are at least 4 rows with A/B
cte2 AS ( SELECT `key`
FROM cte1
WHERE rn <= 5
GROUP BY `key`
HAVING ( SUM(`value` = 'A') >= 4
OR SUM(`value` = 'B') >= 4 )
-- AND SUM(rn = 5) )
-- update rows with found key values
UPDATE sourcetable
JOIN cte2 USING (`key`)
SET flag = 'PASS';
5.7 version – Ayn76
Convert CTEs to subqueries. Emulate ROW_NUMBER() using user-defined variable.

Related

Deleting rows using a limit and offset without using IN clause

I want to delete rows with an offset, so I am forced to use a nested query since its not support in the raw DELETE clause.
I know this would worked (ID is the primary key):
DELETE FROM customers
WHERE ID IN (
SELECT ID
FROM customers
WHERE register_date > '2012-01-01'
ORDER BY register_date ASC
LIMIT 5, 10
);
However, this is unsupported in my version as I get the error
This version of MariaDB doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'.
Server version: 10.4.22-MariaDB
What can I do to achieve the same result as above that is supported in my version.
CREATE TABLE customers (
ID INT PRIMARY KEY AUTO_INCREMENT,
NAME VARCHAR(32) NOT NULL,
REGISTER_DATE DATETIME NOT NULL
);

Join the table to a subquery that uses ROW_NUMBER() window function to sort the rows and filter the rows that you want to be deleted:
DELETE c
FROM customers c
INNER JOIN (
SELECT *, ROW_NUMBER() OVER (ORDER BY register_date) rn
FROM customers
WHERE register_date > '2012-01-01'
) t ON t.ID = c.ID
WHERE t.rn > 5 AND t.rn <= 15; -- get 10 rows with row numbers 6 - 15
See the demo.

If I did not miss something a simple delete with join will do the job...
delete customers
from (select *
from customers
WHERE register_date > '2012-01-01'
order by register_date asc
limit 5, 2) customers2
join customers on customers.id = customers2.id
Here is a demo for your version of MariaDB

You could try assigning a rank to your rows with the ROW_NUMBER window function, then catch those rows whose rank position is between 5 and 15.
DELETE FROM customers
WHERE ID IN (
SELECT *
FROM (SELECT ID,
ROW_NUMBER() OVER(
ORDER BY IF(register_date>'2012-01-01', 0, 1)
register_date ) AS rn
FROM customers) ranked_ids
WHERE rn > 4
AND rn < 16
);
This would safely avoid the use of LIMIT, though achieves the same result.
EDIT. Doing it with a join.
DELETE FROM customers c
INNER JOIN (SELECT ID,
ROW_NUMBER() OVER(
ORDER BY IF(register_date>'2012-01-01', 0, 1)
register_date ) AS rn
FROM customers) ranked_ids
WHERE
) ids_to_delete
ON c.ID = ids_to_delete.ID
AND ids_to_delete.rn > 4
AND ids_to_delete.rn < 16

get the most common value for each column

I'm attempting to create an SQL query that retrieves the total_cost for every row in a table. Alongside that, I also need to collect the most dominant value for both columnA and columnB, with their respective values.
For example, with the following table contents:
cost
columnA
columnB
target
250
Foo
Bar
XYZ
200
Foo
Bar
XYZ
150
Bar
Bar
ABC
250
Foo
Bar
ABC
The result would need to be:
total_cost
columnA_dominant
columnB_dominant
columnA_value
columnB_value
850
Foo
Bar
250
400
Now I can get as far as calculating the total cost - that's no issue. I can also get the most dominant value for columnA using this answer. But after this, I'm not sure how to also get the dominant value for columnB and the values too.
This is my current SQL:
SELECT
SUM(`cost`) AS `total_cost`,
COUNT(`columnA`) AS `columnA_dominant`
FROM `table`
GROUP BY `columnA_dominant`
ORDER BY `columnA_dominant` DESC
WHERE `target` = "ABC"
UPDATE: Thanks to #Barmar for the idea of using a subquery, I managed to get the dominant values for columnA and columnB:
SELECT
-- Retrieve total cost.
SUM(`cost`) AS `total_cost`,
-- Get dominant values.
(
SELECT `columnA`
FROM `table`
GROUP BY `columnA`
ORDER BY COUNT(*) DESC
LIMIT 1
) AS `columnA_dominant`,
(
SELECT `columnB`
FROM `table`
GROUP BY `columnB`
ORDER BY COUNT(*) DESC
LIMIT 1
) AS `columnB_dominant`
FROM `table`
WHERE `target` = "XYZ"
However, I'm still having issues figuring out how to calculate the respective values.

You might get close, if we want to get percentage values we can try to add COUNT(*) at subquery to get max count by columnA and columnB then do division by total count
SELECT
SUM(cost),
(
SELECT tt.columnA
FROM T tt
GROUP BY tt.columnA
ORDER BY COUNT(*) DESC
LIMIT 1
) AS columnA_dominant,
(
SELECT tt.columnB
FROM T tt
GROUP BY tt.columnB
ORDER BY COUNT(*) DESC
LIMIT 1
) AS columnB_dominant,
(
SELECT COUNT(*)
FROM T tt
GROUP BY tt.columnA
ORDER BY COUNT(*) DESC
LIMIT 1
) / COUNT(*) AS columnA_percentage,
(
SELECT COUNT(*)
FROM T tt
GROUP BY tt.columnB
ORDER BY COUNT(*) DESC
LIMIT 1
) / COUNT(*) AS columnB_percentage
FROM T t1
If your MySQL version supports the window function, there is another way which reduce table scan might get better performance than a correlated subquery
SELECT SUM(cost) OVER(),
FIRST_VALUE(columnA) OVER (ORDER BY counter1 DESC) columnA_dominant,
FIRST_VALUE(columnB) OVER (ORDER BY counter2 DESC) columnB_dominant,
FIRST_VALUE(counter1) OVER (ORDER BY counter1 DESC) / COUNT(*) OVER() columnA_percentage,
FIRST_VALUE(counter2) OVER (ORDER BY counter2 DESC) / COUNT(*) OVER() columnB_percentage
FROM (
SELECT *,
COUNT(*) OVER (PARTITION BY columnA) counter1,
COUNT(*) OVER (PARTITION BY columnB) counter2
FROM T
) t1
LIMIT 1
sqlfiddle

try this query
select sum(cost) as total_cost,p.columnA,q.columnB,p.columnA_percentage,q.columnB_percentage
from get_common,(
select top 1 columnA,columnA_percentage
from(
select columnA,count(columnA) as count_columnA,cast(count(columnA) as float)/(select count(columnA) from get_common) as columnA_percentage
from get_common
group by columnA)s
order by count_columnA desc
)p,
(select top 1 columnB,columnB_percentage
from (
select columnB,count(columnB) as count_columnB, cast(count(columnB) as float)/(select count(columnB) from get_common) as columnB_percentage
from get_common
group by columnB) t
order by count_columnB desc)q
group by p.columnA,q.columnB,p.columnA_percentage,q.columnB_percentage
so if you want to get the percent and dominant value you must make their own query like this
select top 1 columnA,columnA_percentage
from(
select columnA,count(columnA) as count_columnA,cast(count(columnA) as float)/(select count(columnA) from get_common) as columnA_percentage
from get_common
group by columnA)s
order by count_columnA desc
then you can join with the sum query to get all value you want
hope this can help you

mySQL --> LIMIT selected rows, WHERE column value is same

This is my query... SELECT * FROM comments WHERE content_id in (525, 537) LIMIT 60 This is the SS of result:
here content_id = 537 is selected 5 times.
(comment_id is UNIQUE key )..
My question is: How to limit selected rows by 2, where values of content_id is same...
Maximum two duplicate records for each content_id... like in this picture:

If you are running MySQL 8.0, you can do this with row_number():
select comment_id, content_id
from (
select t.*, row_number() over(partition by content_id order by comment_id) rn
from mytable t
) t
where rn <= 2
In earlier versions, one solution is a correlated subquery:
select t.*
from mytable t
where (
select count(*)
from mytable t1
where t1.content_id = t.content_id and t1.comment_id < t.comment_id
) < 2

Mysql - Accumulatively count the total on a row by row basis

I'm trying in MySql to count the number of users created each day and then get an accumulative figure on a row by row basis. I have followed other suggestions on here, but I cannot seem to get the accumulation to be correct.
The problem is that it keeps counting from the base number of 200 and not taking account of previous rows.
Where was I would expect it to return
My Sql is as follows;
SELECT day(created_at), count(*), (#something := #something+count(*)) as value
FROM myTable
CROSS JOIN (SELECT #something := 200) r
GROUP BY day(created_at);
To create the table and populate it you can use;
CREATE TABLE myTable (
id INT AUTO_INCREMENT,
created_at DATETIME,
PRIMARY KEY (id)
);
INSERT INTO myTable (created_at)
VALUES ('2018-04-01'),
('2018-04-01'),
('2018-04-01'),
('2018-04-01'),
('2018-04-02'),
('2018-04-02'),
('2018-04-02'),
('2018-04-03'),
('2018-04-03');
You can view this on SqlFiddle.

Use a subquery:
SELECT day, cnt, (#s := #s + cnt)
FROM (SELECT day(created_at) as day, count(*) as cnt
FROM myTable
GROUP BY day(created_at)
) d CROSS JOIN
(SELECT #s := 0) r;
GROUP BY and variables have not worked together for a long time. In more recent versions, ORDER BY also needs a subquery.

How many different ways are there to get the second row in a SQL search?

Let's say I was looking for the second most highest record.
Sample Table:
CREATE TABLE `my_table` (
`id` int(2) NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
`value` int(10),
PRIMARY KEY (`id`)
);
INSERT INTO `my_table` (`id`, `name`, `value`) VALUES (NULL, 'foo', '200'), (NULL, 'bar', '100'), (NULL, 'baz', '0'), (NULL, 'quux', '300');
The second highest value is foo. How many ways can you get this result?
The obvious example is:
SELECT name FROM my_table ORDER BY value DESC LIMIT 1 OFFSET 1;
Can you think of other examples?
I was trying this one, but LIMIT & IN/ALL/ANY/SOME subquery is not supported.
SELECT name FROM my_table WHERE value IN (
SELECT MIN(value) FROM my_table ORDER BY value DESC LIMIT 1
) LIMIT 1;

Eduardo's solution in standard SQL
select *
from (
select id,
name,
value,
row_number() over (order by value) as rn
from my_table t
) t
where rn = 1 -- can pick any row using this
This works on any modern DBMS except MySQL. This solution is usually faster than solutions using sub-selects. It also can easily return the 2nd, 3rd, ... row (again this is achievable with Eduardo's solution as well).
It can also be adjusted to count by groups (adding a partition by) so the "greatest-n-per-group" problem can be solved with the same pattern.
Here is a SQLFiddle to play around with: http://sqlfiddle.com/#!12/286d0/1

This only works for exactly the second highest:
SELECT * FROM my_table two
WHERE EXISTS (
SELECT * FROM my_table one
WHERE one.value > two.value
AND NOT EXISTS (
SELECT * FROM my_table zero
WHERE zero.value > one.value
)
)
LIMIT 1
;
This one emulates a window function rank() for platforms that don't have them. It can also be adapted for ranks <> 2 by altering one constant:
SELECT one.*
-- , 1+COALESCE(agg.rnk,0) AS rnk
FROM my_table one
LEFT JOIN (
SELECT one.id , COUNT(*) AS rnk
FROM my_table one
JOIN my_table cnt ON cnt.value > one.value
GROUP BY one.id
) agg ON agg.id = one.id
WHERE agg.rnk=1 -- the aggregate starts counting at zero
;
Both solutions need functional self-joins (I don't know if mysql allows them, IIRC it only disallows them if the table is the target for updates or deletes)
The below one does not need window functions, but uses a recursive query to enumerate the rankings:
WITH RECURSIVE agg AS (
SELECT one.id
, one.value
, 1 AS rnk
FROM my_table one
WHERE NOT EXISTS (
SELECT * FROM my_table zero
WHERE zero.value > one.value
)
UNION ALL
SELECT two.id
, two.value
, agg.rnk+1 AS rnk
FROM my_table two
JOIN agg ON two.value < agg.value
WHERE NOT EXISTS (
SELECT * FROM my_table nx
WHERE nx.value > two.value
AND nx.value < agg.value
)
)
SELECT * FROM agg
WHERE rnk = 2
;
(the recursive query will not work in mysql, obviously)

You can use inline initialization like this:
select * from (
select id,
name,
value,
#curRank := #curRank + 1 AS rank
from my_table t, (SELECT #curRank := 0) r
order by value desc
) tb
where tb.rank = 2

SELECT name
FROM my_table
WHERE value < (SELECT max(value) FROM my_table)
ORDER BY value DESC
LIMIT 1
SELECT name
FROM my_table
WHERE value = (
SELECT min(r.value)
FROM (
SELECT name, value
FROM my_table
ORDER BY value DESC
LIMIT 2
) r
)
LIMIT 1

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

SQL Query about percentage selection - mysql

Related

Deleting rows using a limit and offset without using IN clause

get the most common value for each column

mySQL --> LIMIT selected rows, WHERE column value is same

Mysql - Accumulatively count the total on a row by row basis

How many different ways are there to get the second row in a SQL search?

Categories

Resources