get the most common value for each column - mysql

I'm attempting to create an SQL query that retrieves the total_cost for every row in a table. Alongside that, I also need to collect the most dominant value for both columnA and columnB, with their respective values.
For example, with the following table contents:
cost
columnA
columnB
target
250
Foo
Bar
XYZ
200
Foo
Bar
XYZ
150
Bar
Bar
ABC
250
Foo
Bar
ABC
The result would need to be:
total_cost
columnA_dominant
columnB_dominant
columnA_value
columnB_value
850
Foo
Bar
250
400
Now I can get as far as calculating the total cost - that's no issue. I can also get the most dominant value for columnA using this answer. But after this, I'm not sure how to also get the dominant value for columnB and the values too.
This is my current SQL:
SELECT
SUM(`cost`) AS `total_cost`,
COUNT(`columnA`) AS `columnA_dominant`
FROM `table`
GROUP BY `columnA_dominant`
ORDER BY `columnA_dominant` DESC
WHERE `target` = "ABC"
UPDATE: Thanks to #Barmar for the idea of using a subquery, I managed to get the dominant values for columnA and columnB:
SELECT
-- Retrieve total cost.
SUM(`cost`) AS `total_cost`,
-- Get dominant values.
(
SELECT `columnA`
FROM `table`
GROUP BY `columnA`
ORDER BY COUNT(*) DESC
LIMIT 1
) AS `columnA_dominant`,
(
SELECT `columnB`
FROM `table`
GROUP BY `columnB`
ORDER BY COUNT(*) DESC
LIMIT 1
) AS `columnB_dominant`
FROM `table`
WHERE `target` = "XYZ"
However, I'm still having issues figuring out how to calculate the respective values.

You might get close, if we want to get percentage values we can try to add COUNT(*) at subquery to get max count by columnA and columnB then do division by total count
SELECT
SUM(cost),
(
SELECT tt.columnA
FROM T tt
GROUP BY tt.columnA
ORDER BY COUNT(*) DESC
LIMIT 1
) AS columnA_dominant,
(
SELECT tt.columnB
FROM T tt
GROUP BY tt.columnB
ORDER BY COUNT(*) DESC
LIMIT 1
) AS columnB_dominant,
(
SELECT COUNT(*)
FROM T tt
GROUP BY tt.columnA
ORDER BY COUNT(*) DESC
LIMIT 1
) / COUNT(*) AS columnA_percentage,
(
SELECT COUNT(*)
FROM T tt
GROUP BY tt.columnB
ORDER BY COUNT(*) DESC
LIMIT 1
) / COUNT(*) AS columnB_percentage
FROM T t1
If your MySQL version supports the window function, there is another way which reduce table scan might get better performance than a correlated subquery
SELECT SUM(cost) OVER(),
FIRST_VALUE(columnA) OVER (ORDER BY counter1 DESC) columnA_dominant,
FIRST_VALUE(columnB) OVER (ORDER BY counter2 DESC) columnB_dominant,
FIRST_VALUE(counter1) OVER (ORDER BY counter1 DESC) / COUNT(*) OVER() columnA_percentage,
FIRST_VALUE(counter2) OVER (ORDER BY counter2 DESC) / COUNT(*) OVER() columnB_percentage
FROM (
SELECT *,
COUNT(*) OVER (PARTITION BY columnA) counter1,
COUNT(*) OVER (PARTITION BY columnB) counter2
FROM T
) t1
LIMIT 1
sqlfiddle

try this query
select sum(cost) as total_cost,p.columnA,q.columnB,p.columnA_percentage,q.columnB_percentage
from get_common,(
select top 1 columnA,columnA_percentage
from(
select columnA,count(columnA) as count_columnA,cast(count(columnA) as float)/(select count(columnA) from get_common) as columnA_percentage
from get_common
group by columnA)s
order by count_columnA desc
)p,
(select top 1 columnB,columnB_percentage
from (
select columnB,count(columnB) as count_columnB, cast(count(columnB) as float)/(select count(columnB) from get_common) as columnB_percentage
from get_common
group by columnB) t
order by count_columnB desc)q
group by p.columnA,q.columnB,p.columnA_percentage,q.columnB_percentage
so if you want to get the percent and dominant value you must make their own query like this
select top 1 columnA,columnA_percentage
from(
select columnA,count(columnA) as count_columnA,cast(count(columnA) as float)/(select count(columnA) from get_common) as columnA_percentage
from get_common
group by columnA)s
order by count_columnA desc
then you can join with the sum query to get all value you want
hope this can help you

Related

SQL Query about percentage selection

I am trying to write a query for a condition:
If >=80 percent (4 or more rows as 4/5*100=80%) of the top 5 recent rows(by Date Column), for a KEY have Value =A or =B, then change the flag from fail to pass for the entire KEY.
Here is the input and output sample:
I have highlighted recent rows with green colour in the sample.
Can someone help me in this?
I tried till finding the top 5 recent rows by the foll code:
select * from(
select *, row_number() over (partition by "KEY") as 'RN' FROM (
select * from tb1
order by date desc))
where "RN"<=5
Couldnt figure what to be done after this
Test this:
WITH
-- enumerate rows per key group
cte1 AS ( SELECT *,
ROW_NUMBER() OVER (PARTITION BY `key` ORDER BY `date` DESC) rn
FROM sourcetable ),
-- take 5 recent rows only, check there are at least 4 rows with A/B
cte2 AS ( SELECT `key`
FROM cte1
WHERE rn <= 5
GROUP BY `key`
HAVING ( SUM(`value` = 'A') >= 4
OR SUM(`value` = 'B') >= 4 )
-- AND SUM(rn = 5) )
-- update rows with found key values
UPDATE sourcetable
JOIN cte2 USING (`key`)
SET flag = 'PASS';
5.7 version – Ayn76
Convert CTEs to subqueries. Emulate ROW_NUMBER() using user-defined variable.

mySQL --> LIMIT selected rows, WHERE column value is same

This is my query... SELECT * FROM comments WHERE content_id in (525, 537) LIMIT 60 This is the SS of result:
here content_id = 537 is selected 5 times.
(comment_id is UNIQUE key )..
My question is: How to limit selected rows by 2, where values of content_id is same...
Maximum two duplicate records for each content_id... like in this picture:
If you are running MySQL 8.0, you can do this with row_number():
select comment_id, content_id
from (
select t.*, row_number() over(partition by content_id order by comment_id) rn
from mytable t
) t
where rn <= 2
In earlier versions, one solution is a correlated subquery:
select t.*
from mytable t
where (
select count(*)
from mytable t1
where t1.content_id = t.content_id and t1.comment_id < t.comment_id
) < 2

How to select last and last but one records

I have a table with 3 columns id, type, value like in image below.
What I'm trying to do is to make a query to get the data in this format:
type previous current
month-1 666 999
month-2 200 15
month-3 0 12
I made this query but it gets just the last value
select *
from statistics
where id in (select max(id) from statistics group by type)
order
by type
EDIT: Live example http://sqlfiddle.com/#!9/af81da/1
Thanks!
I would write this as:
select s.*,
(select s2.value
from statistics s2
where s2.type = s.type
order by id desc
limit 1, 1
) value_prev
from statistics s
where id in (select max(id) from statistics s group by type) order by type;
This should be relatively efficient with an index on statistics(type, id).
select
type,
ifnull(max(case when seq = 2 then value end),0 ) previous,
max( case when seq = 1 then value end ) current
from
(
select *, (select count(*)
from statistics s
where s.type = statistics.type
and s.id >= statistics.id) seq
from statistics ) t
where seq <= 2
group by type

How to write sql query to get items from range

I would like to get values without the smallest and the biggest ones, so without entry with 2 and 29 in column NumberOfRepeating.
My query is:
SELECT Note, COUNT(*) as 'NumberOfRepeating'
WHERE COUNT(*) <> MAX(COUNT(*))AND COUNT(*) <> MIN(COUNT(*))
FROM Note GROUP BY Note;
SELECT Note, COUNT(*) as 'NumberOfRepeating'
FROM Notes
GROUP BY Note
HAVING count(*) <
(
SELECT max(t.maxi)
FROM (select
Note, COUNT(Note) maxi FROM Notes
GROUP BY Note
) as t
)
AND
count(*) >
(
SELECT min(t.min)
FROM (select
Note, COUNT(Note) min FROM Notes
GROUP BY Note
) as t
)
try this code.
One method would use order by and limit, twice:
select t.*
from (select t.*
from t
order by NumberOfRepeating asc
limit 99999999 offset 1
) t
order by NumberOfRepeating desc
limit 99999999 offset 1;
Try this code,
Select * from Note where NumberOfRepeating < (select MAX(NumberOfRepeating) from Note ) AND NumberOfRepeating > (select MIN(NumberOfRepeating) from Note );
Here in the code, as in your table Note is the name of the table, and NumberOfRepeating is the column name, as in your table.
Try this. It should work
SELECT *
FROM ( SELECT Note, COUNT(*) as 'NumberOfRepeating'
FROM Notes
GROUP BY Note
ORDER BY NumberOfRepeating DESC
LIMIT 1, 2147483647
) T1
ORDER BY T1.NumberOfRepeating
LIMIT 1, 2147483647

MySQL Sorting using order by not working using unio

I'm using an union statement in mysql but i've some problems sorting the results. The ORDER statement doesn't works at all, the results comes out always sorted by the id field.
Here an example query:
SELECT a.* FROM ( ( select * from ticket_ticket AS t1 WHERE ticket_active=1 ORDER BY t1.ticket_date_last_modified DESC )
UNION ( select * from ticket_ticket AS t2 WHERE ticket_active=0 ORDER BY t2.ticket_date_last_modified DESC, t2.ticket_status_id DESC ) )
AS a LIMIT 0,20;
I want to order the results of the first SELECT by last_modified time, and the second SELECT by time and status. But the ORDER statement get just skipped. The results always come out ordered by the ticket_id ( the PRIMARY KEY ).
What's wrong in this query ?
Thanks!
Ok, i've fixed it writing the query this way:
SELECT a.*
FROM
(SELECT *
FROM ticket_ticket
WHERE ticket_active=1
ORDER BY ticket_date_last_modified DESC) AS a
UNION ALL
SELECT b.*
FROM
(SELECT *
FROM ticket_ticket
WHERE ticket_active=0
ORDER BY ticket_date_last_modified DESC, ticket_status_id DESC) AS b LIMIT 0,
20;
You are using a UNION query that will return distinct values, and the order of the returned rows is not guaranteed.
But you don't need an union query for this:
select *
from ticket_ticket AS t1
ORDER BY
ticket_active!=1,
ticket_date_last_modified DESC,
ticket_status_id DESC
LIMIT 0,20;