SQL: aggregate over aggregate (max over sums) - mysql

I have a problem creating a valid query that aggregates over an aggregate subquery.
MySQL allows some non-ANSI constructs, but they give incorrect results.
CREATE TABLE `log` (
`id` int NOT NULL,
`id_user` varchar(32) NOT NULL,
`datastamp` datetime NOT NULL DEFAULT now(),
`processed` int NOT NULL DEFAULT '0',
PRIMARY KEY (`id`));
I want a result table consisting of the "best" user for every year (where "best" means having the highest total of the processed field), like:
source table:
2010 | u1 | 1
2010 | u1 | 3
2010 | u2 | 2
2011 | u1 | 1
2011 | u1 | 1
2011 | u2 | 5
result:
2010 | u1 | 4
2011 | u2 | 5
A simple query
select year(datastamp) as y, id_user, sum(processed) as ps from log group by id_user, y
gives all sums per user and year:
2010 | u1 | 4
2010 | u2 | 2
2011 | u1 | 2
2011 | u2 | 5
but I can't select the rows with the highest sum for every year.
Trying something like
select y, max(ps), id_user from(...) group by y
although accepted by MySQL, gives an incorrect id_user field. Other solutions I found on Stack Overflow suggest joining the base table with a subquery, but I cannot use aggregate results (sum(processed) as ps) inside the ON condition.

I think window functions might help you in this case (they require MySQL 8.0 or later). You can query the data using the query below:
select *
from
(
select y, id_user, ps, rank() over (partition by y order by ps desc) as ranks_per_year
from
(
select year(datastamp) as y, id_user, sum(processed) as ps
from log
group by 1,2
) A
) B
where ranks_per_year = 1
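rank() and dense_rank() both give every tied user rank 1, so either works if you want to keep ties; row_number() would keep only one of the tied rows.
An alternative is to use max() as a window function and keep the rows whose sum equals the yearly maximum. Note that this also relies on window-function support, so it does not help on an engine where rank() is unavailable. Here is the query, with the sample data inlined: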
with tbl as
(
select '2010' as year,'u1' as id_user,1 as processed union all
select '2010','u1',3 union all
select '2010','u2',2 union all
select '2011','u1',1 union all
select '2011','u1',1 union all
select '2011','u2',5
)
select *
from
(
select year, id_user, ps,
max(ps) over (partition by year) as max_ps_per_year
from
(
select year, id_user, sum(processed) as ps
from tbl
group by 1,2
) A
) B
where ps = max_ps_per_year
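If window functions are not available, the same result can be reached by joining the per-user sums to the per-year maximum of those sums. The sketch below is only an illustration under some assumptions: it runs the SQL through Python's sqlite3 module as a stand-in for MySQL, it stores the year in a plain integer column y (in MySQL that would be year(datastamp)), and the function name best_user_per_year is made up for this example.

```python
import sqlite3

def best_user_per_year(rows):
    """Return (year, id_user, total) for the top user(s) of each year."""
    conn = sqlite3.connect(":memory:")
    # Simplified stand-in for the log table: y replaces year(datastamp).
    conn.execute("CREATE TABLE log (y INTEGER, id_user TEXT, processed INTEGER)")
    conn.executemany("INSERT INTO log VALUES (?, ?, ?)", rows)
    query = """
        SELECT s.y, s.id_user, s.ps
        FROM (SELECT y, id_user, SUM(processed) AS ps
              FROM log GROUP BY y, id_user) AS s
        JOIN (SELECT y, MAX(ps) AS max_ps
              FROM (SELECT y, id_user, SUM(processed) AS ps
                    FROM log GROUP BY y, id_user) AS t
              GROUP BY y) AS m
          ON m.y = s.y AND m.max_ps = s.ps
        ORDER BY s.y, s.id_user
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Sample data from the question.
sample = [(2010, "u1", 1), (2010, "u1", 3), (2010, "u2", 2),
          (2011, "u1", 1), (2011, "u1", 1), (2011, "u2", 5)]
print(best_user_per_year(sample))
```

The aggregate ps never appears in the ON condition of a join against the base table; it is computed inside derived tables first, and the join compares two already-aggregated columns. Ties produce one row per tied user.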

Related

MySQL - Count unique users each day considering all previous days

I would like to count how many new unique users the database gets each day for all days recorded.
There will not be any duplicate ids per day, but there will be duplicates over multiple days.
If my table looks like this :
ID | DATE
---------
1 | 2022-05-21
1 | 2022-05-22
2 | 2022-05-22
1 | 2022-05-23
2 | 2022-05-23
1 | 2022-05-24
2 | 2022-05-24
3 | 2022-05-24
I would like the results to look like this :
DATE | NEW UNIQUE IDs
---------------------------
2022-05-21 | 1
2022-05-22 | 1
2022-05-23 | 0
2022-05-24 | 1
A query such as :
SELECT `date` , COUNT( DISTINCT id)
FROM tbl
GROUP BY DATE( `date` )
Will return the count per day and will not take into account previous days.
Any assistance would be appreciated.
Edit : Using MySQL 8
A user is new on the earliest date recorded for that user.
So you need something like
SELECT date, COUNT(new_users.id)
FROM calendar
LEFT JOIN ( SELECT id, MIN(date) date
FROM test
GROUP BY id ) new_users USING (date)
GROUP BY date
calendar is either a static or a dynamically generated table with the list of needed dates. It can even be a SELECT DISTINCT date FROM test subquery.
Start with a subquery showing the earliest date where each id appears.
SELECT MIN(`date`) `firstdate`, id
FROM tbl
GROUP BY id
Then do your count on that subquery.
SELECT firstdate, COUNT(*)
FROM (
SELECT MIN(`date`) `firstdate`, id
FROM tbl
GROUP BY id
) m
GROUP BY firstdate
That gives you what you want.
But it doesn't have rows for the dates where no new user ids first appeared.
Only count (and sum) the rows where the left join fails:
SELECT
m1.`DATE` ,
sum(CASE WHEN m2.id is null THEN 1 ELSE 0 END) as C
FROM mytable m1
LEFT JOIN mytable m2 ON m2.`DATE`<m1.`DATE` AND m2.ID=m1.ID
GROUP BY m1.`DATE`
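The "first date per id" idea and the "calendar" idea from the answers above can be combined so that days without new ids still appear with a count of 0. The following is a small sketch, assuming Python's sqlite3 module as a stand-in for MySQL 8 and using the distinct dates of tbl itself as the calendar; the function name new_users_per_day is hypothetical.

```python
import sqlite3

def new_users_per_day(rows):
    """Return (date, number of ids first seen on that date) for every date."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (id INTEGER, date TEXT)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?)", rows)
    query = """
        SELECT c.date, COUNT(f.id) AS new_ids
        FROM (SELECT DISTINCT date FROM tbl) AS c          -- calendar of known dates
        LEFT JOIN (SELECT id, MIN(date) AS firstdate       -- first appearance per id
                   FROM tbl GROUP BY id) AS f
          ON f.firstdate = c.date
        GROUP BY c.date
        ORDER BY c.date
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Sample data from the question.
sample = [(1, "2022-05-21"), (1, "2022-05-22"), (2, "2022-05-22"),
          (1, "2022-05-23"), (2, "2022-05-23"),
          (1, "2022-05-24"), (2, "2022-05-24"), (3, "2022-05-24")]
print(new_users_per_day(sample))
```

COUNT(f.id) ignores the NULLs produced by the LEFT JOIN, which is what turns a day without first appearances into 0 rather than dropping it.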

sum of count(*) for all rows in MySQL

I'm stuck with a sum() query where I want the sum of the count(*) values across all rows, together with a group by.
Here is the query:
select
u.user_type as user,
u.count,
sum(u.count)
FROM
(
select
DISTINCT
user_type,
count(*) as count
FROM
users
where
(user_type = "driver" OR user_type = "passenger")
GROUP BY
user_type
) u;
Current Output:
----------------------------------
| user | count | sum |
----------------------------------
| driver | 58 | 90 |
----------------------------------
Expected Output:
----------------------------------
| user | count | sum |
----------------------------------
| driver | 58 | 90 |
| passenger | 32 | 90 |
----------------------------------
If I remove sum(u.count) from the query, the output looks like:
--------------------------
| user | count |
--------------------------
| driver | 58 |
| passenger | 32 |
--------------------------
You need a subquery:
SELECT user_type,
Count(*) AS count,
(SELECT COUNT(*)
FROM users
WHERE user_type IN ("driver","passenger" )) as sum
FROM users
WHERE user_type IN ("driver","passenger" )
GROUP BY user_type ;
Note you don't need DISTINCT here.
OR
SELECT user_type,
Count(*) AS count,
c.sum
FROM users
CROSS JOIN (
SELECT COUNT(*) as sum
FROM users
WHERE user_type IN ("driver","passenger" )
) as c
WHERE user_type IN ("driver","passenger" )
GROUP BY user_type ;
You can use WITH ROLLUP modifier:
select coalesce(user_type, 'total') as user, count(*) as count
from users
where user_type in ('driver', 'passenger')
group by user_type with rollup
This will return the same information but in a different format:
user | count
----------|------
driver | 32
passenger | 58
total | 90
In MySQL 8 you can use COUNT() as window function:
select distinct
user_type,
count(*) over (partition by user_type) as count,
count(*) over () as sum
from users
where user_type in ('driver', 'passenger');
Result:
user_type | count | sum
----------|-------|----
driver | 32 | 90
passenger | 58 | 90
or use a CTE (common table expression):
with cte as (
select user_type, count(*) as count
from users
where user_type in ('driver', 'passenger')
group by user_type
)
select user_type, count, (select sum(count) from cte) as sum
from cte
I would be tempted to ask: are you sure you need this at the DB level?
Unless you are working purely in the database layer, any processing of these results will be built into an application layer and will presumably require some form of looping through the results.
It could be easier, simpler, and more readable to run
SELECT user_type,
COUNT(*) AS count
FROM users
WHERE user_type IN ("driver", "passenger")
GROUP BY user_type
... and simply add up the total count in the application layer.
As pointed out by Juan in another answer, the DISTINCT is redundant, as the GROUP BY ensures that each resultant row is different.
Like Juan, I also prefer an IN here, rather than an OR condition, for the user_type, as I find it more readable. It also reduces the likelihood of confusion if further AND conditions are combined in the future.
As an aside, I would consider moving the names of the user types, "driver" and "passenger", into a separate user_types table and referencing them by an ID column from your users table.
N.B. If you absolutely do need this at the DB level, I would advocate using one of Paul's excellent options, or the CROSS JOIN approach proffered by Tom Mac, and by Juan as his second suggested solution.
Try this. The inline view gets the overall total:
SELECT a.user_type,
count(*) AS count,
b.sum
FROM users a
JOIN (SELECT COUNT(*) as sum
FROM users
WHERE user_type IN ("driver","passenger" )
) b ON TRUE
WHERE a.user_type IN ("driver","passenger" )
GROUP BY a.user_type;
You could simply combine SUM() OVER() with COUNT(*):
SELECT user_type, COUNT(*) AS cnt, SUM(COUNT(*)) OVER() AS total
FROM users WHERE user_type IN ('driver', 'passenger') GROUP BY user_type;
Output:
+------------+------+-------+
| user_type | cnt | total |
+------------+------+-------+
| passenger | 58 | 90 |
| driver | 32 | 90 |
+------------+------+-------+
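To try the SUM(COUNT(*)) OVER() form without a MySQL server, it can be run through Python's sqlite3 module. This is only a sketch under some assumptions: SQLite (3.25 or later, for window functions) stands in for MySQL 8, the users table is reduced to a single user_type column, the extra "admin" rows are invented to show the WHERE filter, and counts_with_total is a hypothetical helper name.

```python
import sqlite3

def counts_with_total(user_types):
    """Return (user_type, count, overall total) for drivers and passengers."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (user_type TEXT)")
    conn.executemany("INSERT INTO users VALUES (?)", [(t,) for t in user_types])
    query = """
        SELECT user_type,
               COUNT(*) AS cnt,                 -- per-group count
               SUM(COUNT(*)) OVER () AS total   -- sum of the group counts
        FROM users
        WHERE user_type IN ('driver', 'passenger')
        GROUP BY user_type
        ORDER BY user_type
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Counts taken from the question: 58 drivers and 32 passengers.
sample = ["driver"] * 58 + ["passenger"] * 32 + ["admin"] * 5
print(counts_with_total(sample))
```

The window function is evaluated after the GROUP BY, so SUM(...) OVER () adds up the per-group counts and repeats the total on every row.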
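Adding a group by clause at the end for user_type returns both rows. Note, however, that sum(u.count) is then computed per group, so each row shows its own count (58 and 32) rather than the overall total of 90: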
select
u.user_type as user,
u.count,
sum(u.count)
FROM
(
select
DISTINCT
user_type,
count(*) as count
FROM
users
where
(user_type = "driver" OR user_type = "passenger")
GROUP BY
user_type
) u GROUP BY u.user_type;
Tom Mac explained his answer properly. Here is another way you can do that.
I checked the query performance and found no difference with 1000 records.
select user_type,Countuser,(SELECT COUNT(*)
FROM users
WHERE user_type IN ('driver','passenger') )as sum from (
select user_type,count(*) as Countuser from users a
where a.user_type='driver'
group by a.user_type
union
select user_type,count(*) as Countuser from users b
where b.user_type='passenger'
group by b.user_type
)c
group by user_type,Countuser
Try this:
WITH SUB_Q AS (
SELECT USER_TYPE, COUNT(*) AS CNT
FROM USERS
WHERE USER_TYPE = 'passenger' OR USER_TYPE = 'driver'
GROUP BY USER_TYPE
),
SUB_Q2 AS (
SELECT SUM(CNT) AS SUM_OF_COUNT
FROM SUB_Q
)
SELECT A.USER_TYPE, A.CNT AS COUNT, B.SUM_OF_COUNT AS SUM
FROM SUB_Q A JOIN SUB_Q2 B ON (TRUE);
I used the PostgreSQL dialect, but you can easily change it to a subquery.
select
u.user_type as user,
count(*) as count,
sum(count(*)) over () as sum
FROM users u
where u.user_type in ('driver', 'passenger')
group by u.user_type

SQL: Get the most frequent value for each group

Let's say that I have a table (MS Access / MySQL) with two columns (Time 'hh:mm:ss', Value) and I want to get the most frequent value for each group of rows.
For example, I have
Time | Value
4:35:49 | 122
4:35:49 | 122
4:35:50 | 121
4:35:50 | 121
4:35:50 | 111
4:35:51 | 122
4:35:51 | 111
4:35:51 | 111
4:35:51 | 132
4:35:51 | 132
And I want to get the most frequent value for each Time:
Time | Value
4:35:49 | 122
4:35:50 | 121
4:35:51 | 132
Thanks in advance
Remark
I need to get the same result as this Excel solution: Get the most frequent value for each group
** MySQL solution **
I found a solution that works fine with MySQL, but I can't get it to work in MS Access:
select cnt1.`Time`,MAX(cnt1.`Value`)
from (select COUNT(*) as total, `Time`,`Value`
from `my_table`
group by `Time`,`Value`) cnt1,
(select MAX(total) as maxtotal from (select COUNT(*) as total,
`Time`,`Value` from `my_table` group by `Time`,`Value`) cnt3 ) cnt2
where cnt1.total = cnt2.maxtotal GROUP BY cnt1.`Time`
Consider an INNER JOIN to match the two derived-table subqueries, rather than a comma-separated list of subqueries matched in the WHERE clause. This has been tested in MS Access:
SELECT MaxCountSub.`Time`, CountSub.`Value`
FROM
(SELECT myTable.`Time`, myTable.`Value`, Count(myTable.`Value`) AS CountOfValue
FROM myTable
GROUP BY myTable.`Time`, myTable.`Value`) As CountSub
INNER JOIN
(SELECT dT.`Time`, Max(CountOfValue) As MaxCountOfValue
FROM
(SELECT myTable.`Time`, myTable.`Value`, Count(myTable.`Value`) AS CountOfValue
FROM myTable
GROUP BY myTable.`Time`, myTable.`Value`) As dT
GROUP BY dT.`Time`) As MaxCountSub
ON CountSub.`Time` = MaxCountSub.`Time`
AND CountSub.CountOfValue = MaxCountSub.MaxCountOfValue
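The join above returns every value that reaches the maximum count for its Time, so a tie (111 and 132 at 4:35:51) yields two rows. Below is a small sketch of the same join with MAX(Value) added as a tie-breaker, which matches the expected output in the question. It is run through Python's sqlite3 module only as a convenient stand-in for MySQL or MS Access; the function name most_frequent_per_time is invented for the example.

```python
import sqlite3

def most_frequent_per_time(rows):
    """Return (Time, most frequent Value), taking the largest Value on ties."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (Time TEXT, Value INTEGER)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?)", rows)
    query = """
        SELECT c.Time, MAX(c.Value) AS Value       -- tie-breaker: largest value
        FROM (SELECT Time, Value, COUNT(*) AS n
              FROM my_table GROUP BY Time, Value) AS c
        JOIN (SELECT Time, MAX(n) AS max_n         -- highest count per Time
              FROM (SELECT Time, Value, COUNT(*) AS n
                    FROM my_table GROUP BY Time, Value) AS d
              GROUP BY Time) AS m
          ON m.Time = c.Time AND m.max_n = c.n
        GROUP BY c.Time
        ORDER BY c.Time
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Sample data from the question.
sample = [("4:35:49", 122), ("4:35:49", 122),
          ("4:35:50", 121), ("4:35:50", 121), ("4:35:50", 111),
          ("4:35:51", 122), ("4:35:51", 111), ("4:35:51", 111),
          ("4:35:51", 132), ("4:35:51", 132)]
print(most_frequent_per_time(sample))
```

Because the maximum count is computed per Time rather than once for the whole table, the query keeps working when different Time groups have different top counts.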
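In MySQL you can try a query like the one below. It relies on ONLY_FULL_GROUP_BY being disabled and on the derived table keeping its ORDER BY, neither of which is guaranteed, so treat the result as unreliable:
select time, value
from (select value, time from your_table
group by value, time
order by count(*) desc
) temp
group by time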

SELECT visitors that have visited more than one place in a day along with the details

My MySQL table looks like this:
+---------+---------+------------+-----------------------+---------------------+
| visitId | userId | locationId | comments | time |
+---------+---------+------------+-----------------------+---------------------+
| 1 | 3 | 12 | It's a good day here! | 2012-12-12 11:50:12 |
+---------+---------+------------+-----------------------+---------------------+
| 2 | 3 | 23 | very beautiful | 2012-12-12 12:50:12 |
+---------+---------+------------+-----------------------+---------------------+
| 3 | 3 | 52 | nice | 2012-12-12 13:50:12 |
+---------+---------+------------+-----------------------+---------------------+
which records visitors' trajectories and some comments on the places visited.
I want to find the visitors who visited more than one place in a day, along with the specific day and the places, not only the count.
I tried the subquery:
mysql> SELECT userId, locationId, time FROM visits
WHERE (userId, DATE(time)) IN (
SELECT userId, DATE(time) FROM visits GROUP BY userId, DATE(time)
HAVING COUNT(*) > 1);
And the join query:
mysql> select v2.userId, v1.locationId, v1.time from visits as v1, visits as v2
where v1.userId=v2.userId GROUP BY v2.userId, Date(v2.time)
HAVING COUNT(DISTINCT v2.locationId) > 1;
I am not sure whether the second one is correct. But both of them take too long. Any suggestions for what I should do?
UPDATE
mysql> SELECT t.userId, locationId, t.time FROM (
SELECT userId, time
FROM visits GROUP BY userId,Date(time)
HAVING COUNT(*) > 1) AS t, visits
WHERE t.userId=visits.userId AND t.time=visits.time;
I hope this makes my question clearer.
Your queries are including locationId, but your stated goal is to get user/date combos that had more than 1 visit in a day. Here's the sql to get that:
select userId, date(time), count(*)
from visits
group by userId, date(time)
having count(*) > 1;
Update:
To get all visits for the user/day combinations that have more than one visit:
select *
from visits
where (userId, date(time)) in (
select userId, date(time)
from visits
group by userId, date(time)
having count(*) > 1);
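Since the question asks for more than one place per day, counting distinct locations may be closer to the goal than counting visits. The sketch below expresses that as a join against the grouped subquery. It assumes Python's sqlite3 module as a stand-in for MySQL, leaves out the comments column, and adds invented rows for a second user (userId 4) who visits the same place twice; the function name multi_place_visits is hypothetical.

```python
import sqlite3

def multi_place_visits(rows):
    """Return all visits of users on days when they visited more than one place."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE visits (visitId INTEGER, userId INTEGER, "
        "locationId INTEGER, time TEXT)"
    )
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT v.visitId, v.userId, v.locationId, v.time
        FROM visits AS v
        JOIN (SELECT userId, DATE(time) AS day
              FROM visits
              GROUP BY userId, DATE(time)
              HAVING COUNT(DISTINCT locationId) > 1) AS d   -- more than one place
          ON d.userId = v.userId AND d.day = DATE(v.time)
        ORDER BY v.visitId
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

sample = [
    (1, 3, 12, "2012-12-12 11:50:12"),   # rows 1-3 come from the question
    (2, 3, 23, "2012-12-12 12:50:12"),
    (3, 3, 52, "2012-12-12 13:50:12"),
    (4, 4, 12, "2012-12-12 09:00:00"),   # invented: same place twice in a day
    (5, 4, 12, "2012-12-12 10:00:00"),
    (6, 3, 12, "2012-12-13 08:00:00"),   # invented: a single visit on another day
]
print(multi_place_visits(sample))
```

With COUNT(*) > 1 instead of COUNT(DISTINCT locationId) > 1, the two visits of user 4 to the same location would also be returned.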
I think you'd be better suited using a GROUP BY in a subquery with your count. From MySQL's count documentation, you can do something like:
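mysql> SELECT v.userId, v.locationId, v.time, c.visitCount
FROM visits v
JOIN (SELECT userId, DATE(time) AS visitDate, COUNT(*) AS visitCount
FROM visits
GROUP BY userId, DATE(time)) c
ON c.userId = v.userId AND c.visitDate = DATE(v.time)
WHERE c.visitCount > 1;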
I'd assume the slowness you're encountering comes from the HAVING and DISTINCT in your queries.

Query to Segment Results Based on Equal Sets of Column Value

I'd like to construct a single query (or as few as possible) to group a data set. So given a number of buckets, I'd like to return results based on a specific column.
So given a column called score which is a double which contains:
90.00
91.00
94.00
96.00
98.00
99.00
I'd like to be able to use a GROUP BY clause with a function like:
SELECT MIN(score), MAX(score), SUM(score) FROM table GROUP BY BUCKETS(score, 3)
Ideally this would return 3 rows (grouping the results into 3 buckets with as close to equal count in each group as is possible):
90.00, 91.00, 181.00
94.00, 96.00, 190.00
98.00, 99.00, 197.00
Is there some function that would do this? I'd like to avoid returning all the rows and figuring out the bucket segments myself.
Dave
create table test (
id int not null auto_increment primary key,
val decimal(4,2)
) engine = myisam;
insert into test (val) values
(90.00),
(91.00),
(94.00),
(96.00),
(98.00),
(99.00);
select min(val) as lower,max(val) as higher,sum(val) as total from (
select id,val,@row:=@row+1 as rn
from test,(select @row:=0) as r order by id
) as t
group by ceil(rn/2)
+-------+--------+--------+
| lower | higher | total |
+-------+--------+--------+
| 90.00 | 91.00 | 181.00 |
| 94.00 | 96.00 | 190.00 |
| 98.00 | 99.00 | 197.00 |
+-------+--------+--------+
3 rows in set (0.00 sec)
Unfortunately, MySQL before 8.0 has no analytic function such as row_number(), so you have to emulate it with a user variable. Once you have a row number, you can use the ceil() function to group every N rows as you like. Hope that helps, despite my English.
set @r = (select count(*) from test);
select min(val) as lower,max(val) as higher,sum(val) as total from (
select id,val,@row:=@row+1 as rn
from test,(select @row:=0) as r order by id
) as t
group by ceil(rn/ceil(@r/3))
or, with a single query
select min(val) as lower,max(val) as higher,sum(val) as total from (
select id,val,@row:=@row+1 as rn,tot
from test,(select count(*) as tot from test) as t2,(select @row:=0) as r order by id
) as t
group by ceil(rn/ceil(tot/3))
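On engines with window functions (MySQL 8.0 or later, for instance), NTILE(n) assigns rows to n buckets of near-equal size directly, which replaces the user-variable row counter. The sketch below runs such a query through Python's sqlite3 module (SQLite 3.25 or later) as a stand-in for MySQL. It orders by val rather than id, and the function name bucket_stats is invented for this illustration.

```python
import sqlite3

def bucket_stats(values, buckets):
    """Split values into near-equal buckets; return (min, max, sum) per bucket."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, val REAL)")
    conn.executemany("INSERT INTO test (val) VALUES (?)", [(v,) for v in values])
    # int() keeps the bucket count a plain integer literal inside the SQL text.
    query = f"""
        SELECT MIN(val) AS lower, MAX(val) AS higher, SUM(val) AS total
        FROM (SELECT val, NTILE({int(buckets)}) OVER (ORDER BY val) AS bucket
              FROM test) AS t
        GROUP BY bucket
        ORDER BY bucket
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Scores from the question, split into 3 buckets.
scores = [90.00, 91.00, 94.00, 96.00, 98.00, 99.00]
print(bucket_stats(scores, 3))
```

When the row count is not divisible by the number of buckets, NTILE makes the earlier buckets one row larger than the later ones.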