mysql/postgres window function limit result without subquery - mysql

Is it possible to limit the result of a window function, with partitioning, without a subquery? I'm looking for a solution that works in both MySQL and Postgres.
For example (the join is irrelevant to the point of the question):
select acct.name, we.channel, count(*) as cnt,
max(count(*)) over (partition by name order by count(*) desc) as max_cnt
from web_events we join accounts acct
on we.account_id=acct.id
group by acct.name, we.channel
order by name, max_cnt desc;
The result of this query gives:
I only want to show the first line of each of the window's partitions.
For example, the lines with cnt: [3M,19], [Abbott Laboratories,20]
I tried the following that doesn't work (added limit 1 to the window function):
select acct.name, we.channel, count(*) as cnt,
max(count(*)) over (partition by name order by count(*) desc limit 1) as max_cnt
from web_events we join accounts acct
on we.account_id=acct.id
group by acct.name, we.channel
order by name, max_cnt desc;

You don't actually need a window function here, since the first row's max_cnt will always equal cnt. Instead use DISTINCT ON in combination with the GROUP BY.
From the postgresql documentation
SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the “first row” of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first
SELECT DISTINCT ON(acct.name)
acct.name
, we.channel
, COUNT(*) cnt
FROM web_events we
JOIN accounts acct
ON we.account_id=acct.id
GROUP BY 1, 2
ORDER BY name, cnt DESC;
Here's a quick demo in sqlfiddle. http://sqlfiddle.com/#!17/57694/8
One thing I always got wrong when I first started using DISTINCT ON: the ORDER BY clause must start with the same expressions as the DISTINCT ON, in the same order. In the above example the ORDER BY starts with acct.name.
If there is a tie for first position, only one of the tied rows is returned, and which one is non-deterministic. You can add further expressions to the ORDER BY to control which row is returned in this situation.
example:
ORDER BY name, cnt DESC, channel = 'direct'
will return the row containing facebook if, for a given account, both facebook and direct yield the same cnt (false sorts before true, so the direct row comes last among the tied rows).
However, note that with this approach, it is not possible to return all the rows that are tied for first position, i.e. both rows containing facebook & direct (without using a subquery).
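To see why the boolean tie-breaker favours facebook, here is a minimal sketch using Python's built-in sqlite3 as a stand-in for Postgres. SQLite has no DISTINCT ON, so this only shows the sort order; the pre-aggregated counts table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for the grouped result (name, channel, cnt).
conn.execute("CREATE TABLE counts (name TEXT, channel TEXT, cnt INTEGER)")
conn.executemany(
    "INSERT INTO counts VALUES (?, ?, ?)",
    [("3M", "direct", 19), ("3M", "facebook", 19), ("3M", "organic", 4)],
)

# channel = 'direct' evaluates to false (0) or true (1);
# ascending order puts false first, so 'direct' loses the tie.
rows = conn.execute(
    "SELECT channel, cnt FROM counts "
    "ORDER BY name, cnt DESC, channel = 'direct'"
).fetchall()
first_row = rows[0]
print(rows)
```

With DISTINCT ON (name), the row kept for each account would be the one that sorts first, i.e. first_row here.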
DISTINCT ON may be combined in the same statement with GROUP BYs (above example) and WINDOW FUNCTIONS (example below). The DISTINCT ON clause is logically evaluated just before the LIMIT.
For instance, the following query (however pointless) shows off the combination of DISTINCT ON with a WINDOW FUNCTION. It will return one row per distinct mxcnt value.
SELECT DISTINCT ON(mxcnt)
acct.name
, we.channel
, COUNT(*) cnt
, MAX(COUNT(*)) OVER (PARTITION BY acct.name) mxcnt
FROM web_events we
JOIN accounts acct
ON we.account_id=acct.id
GROUP BY 1, 2
ORDER BY mxcnt, cnt DESC;

Use a subquery. If you want exactly one row (even if there are ties), then use row_number():
select name, channel, cnt
from (select acct.name, we.channel, count(*) as cnt,
row_number() over (partition by acct.name order by count(*) desc) as seqnum
from web_events we join
accounts acct
on we.account_id = acct.id
group by acct.name, we.channel
) wea
where seqnum = 1
order by name;
You can use rank() if you want multiple rows for an account, in the event of ties.
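A rough sketch of the difference between row_number() and rank() when there is a tie, run against SQLite (3.25 or newer, for window functions) through Python's sqlite3 rather than MySQL/Postgres. The counts table is a hypothetical stand-in for the grouped subquery, and the data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical pre-aggregated data: one row per (account name, channel).
conn.execute("CREATE TABLE counts (name TEXT, channel TEXT, cnt INTEGER)")
conn.executemany(
    "INSERT INTO counts VALUES (?, ?, ?)",
    [
        ("3M", "direct", 19),
        ("3M", "facebook", 19),  # tied with direct for first place
        ("3M", "organic", 4),
        ("Abbott Laboratories", "direct", 20),
        ("Abbott Laboratories", "twitter", 5),
    ],
)

def top_rows(func):
    """Return the rows numbered 1 per account by the given window function."""
    sql = f"""
        SELECT name, channel, cnt
        FROM (SELECT name, channel, cnt,
                     {func}() OVER (PARTITION BY name ORDER BY cnt DESC) AS seqnum
              FROM counts) ranked
        WHERE seqnum = 1
        ORDER BY name, channel
    """
    return conn.execute(sql).fetchall()

one_per_account = top_rows("row_number")  # exactly one row per account
with_ties = top_rows("rank")              # keeps every row tied for first
print(one_per_account)
print(with_ties)
```

Which of the two tied 3M rows row_number() keeps is arbitrary unless the window's ORDER BY gets an extra tie-breaker.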

Related

Mysql Order By count?

Actually I'm working with the following table fsa_areas:
Note that each area has a responsible
Now, what I need to do is order the same table as follows:
Note that the results are now ordered with the responsible who has the most areas first and the responsible with the fewest areas last.
Is there a way to order them in that way?
You can use a COUNT subquery in the ORDER BY clause:
select a.*
from fsa_areas a
order by (select count(*) from fsa_areas a1 where a1.Responsible = a.Responsible) desc
Another way is to get the count in a derived table and join the base table to it
select a.*
from (
select Responsible, count(*) as cnt
from fsa_areas
group by Responsible
) r
join fsa_areas a using(Responsible)
order by r.cnt desc
In MySQL 8 you can use COUNT() as window function:
select *, count(*) over (partition by Responsible) as cnt
from fsa_areas
order by cnt desc
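For anyone who wants to try the ordering without a MySQL server, here is a small sketch of the correlated-subquery variant using Python's sqlite3. The Area column and the sample rows are assumptions (the original table was only shown as an image), and the extra sort keys are there just to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal schema: only Responsible comes from the question.
conn.execute("CREATE TABLE fsa_areas (Area TEXT, Responsible TEXT)")
conn.executemany(
    "INSERT INTO fsa_areas VALUES (?, ?)",
    [("A1", "Ana"), ("A2", "Bob"), ("A3", "Ana"),
     ("A4", "Cy"), ("A5", "Ana"), ("A6", "Bob")],
)

# Sort each row by how many areas its responsible has, largest first.
ordered = conn.execute("""
    SELECT a.Area, a.Responsible
    FROM fsa_areas a
    ORDER BY (SELECT COUNT(*)
              FROM fsa_areas a1
              WHERE a1.Responsible = a.Responsible) DESC,
             a.Responsible, a.Area
""").fetchall()
responsibles = [row[1] for row in ordered]
print(ordered)
```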

MySQL Row more appear with multiple row same result

I have a query with a count and a group by, and I want to get the greatest count. I can do that with an order by and limit 1, but several groups can share the same count, and then that approach does not work for me.
How do I solve this problem?
With MySQL 8+, you can use a CTE to get the maximum count, and then retrieve all records with a count equal to the maximum count.
However, since CTEs are not available in MySQL 5.6, there you'd need to use a sub-query to get the maximum count, and then write the main query so that it compares the count of each group to the maximum count retrieved in the sub-query.
Here is a query I wrote. Maybe it's not the most efficient solution, but it gets the desired result.
SELECT
group_id, COUNT(record_id) c
FROM
table_name
GROUP BY group_id
HAVING c IN (
SELECT
MAX(sub_query.c)
FROM
(SELECT
group_id, COUNT(record_id) c
FROM
table_name
GROUP BY group_id) AS sub_query
)
If you are not going to do this using window functions, you can do:
SELECT group_id, COUNT(*) as cnt
FROM table_name
GROUP BY group_id
HAVING cnt = (SELECT COUNT(*)
FROM table_name
GROUP BY group_id
ORDER BY COUNT(*) DESC
LIMIT 1
) ;
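A quick way to check that this returns every group tied for the maximum is to run the same shape of query against a throwaway SQLite database from Python. This is only a sketch: the rows are invented, and the HAVING clause repeats COUNT(*) instead of using the alias to stay portable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (record_id INTEGER, group_id TEXT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?)",
    [(1, "a"), (2, "a"), (3, "b"), (4, "b"), (5, "c")],  # a and b tie at 2
)

# The scalar subquery yields the single largest count;
# HAVING keeps every group whose count matches it.
top_groups = conn.execute("""
    SELECT group_id, COUNT(*) AS cnt
    FROM table_name
    GROUP BY group_id
    HAVING COUNT(*) = (SELECT COUNT(*)
                       FROM table_name
                       GROUP BY group_id
                       ORDER BY COUNT(*) DESC
                       LIMIT 1)
    ORDER BY group_id
""").fetchall()
print(top_groups)
```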

The last value using GROUP BY

I need to take the last value from the table for each can_id.
So I've tried this SQL query
SELECT com.text, com.can_id
FROM (SELECT * FROM comments ORDER BY id DESC) as com
GROUP BY com.can_id
But whether I use ASC or DESC in the inner select, the outer select just groups without respecting the sort and takes the value with the first id.
This select will be used like left join in the query.
Example:
I need to get com.text with the value "text2" (the last one).
If you are on MySql 8, you can use row_number:
SELECT com.text, com.can_id
FROM (SELECT comments.*,
row_number() over (partition by can_id order by id desc) rn
FROM comments) as com
WHERE rn = 1;
If you are on MySQL 5.6+, you can (ab)use group_concat (note that this breaks if text itself contains a comma):
SELECT SUBSTRING_INDEX(group_concat(text order by id desc), ',', 1),
can_id
FROM comments
GROUP BY can_id;
In any version of MySQL, the following will work:
SELECT c.*
FROM comments c
WHERE c.id = (SELECT MAX(c2.id)
FROM comments c2
WHERE c2.can_id = c.can_id
);
With an index on comments(can_id, id), this should also have the best performance.
This is better than a group by approach because it can make use of an index and is not limited by some internal limitation on intermediate string lengths.
This should also perform better than row_number(), which has to assign a row number to every row only for most of them to be filtered out afterwards.
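Here is a minimal sketch of the MAX(id) approach, run on SQLite through Python's sqlite3 instead of MySQL. The sample rows and the index name are made up; the index columns follow the suggestion above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, can_id INTEGER, text TEXT)")
# Hypothetical index name; columns as suggested: (can_id, id).
conn.execute("CREATE INDEX idx_comments_can_id_id ON comments (can_id, id)")
conn.executemany(
    "INSERT INTO comments VALUES (?, ?, ?)",
    [(1, 10, "text1"), (2, 10, "text2"), (3, 20, "hello"), (4, 20, "a, b")],
)

# For each row, keep it only if its id is the largest id for its can_id.
latest = conn.execute("""
    SELECT c.can_id, c.text
    FROM comments c
    WHERE c.id = (SELECT MAX(c2.id)
                  FROM comments c2
                  WHERE c2.can_id = c.can_id)
    ORDER BY c.can_id
""").fetchall()
print(latest)
```

The comment "a, b" comes back whole here, whereas splitting a concatenated string on the first comma would cut it down to "a".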
The order by clause in the inner select is redundant since it's being used as a table, and tables in a relational database are unordered by nature.
While other databases such as SQL Server will treat it as an error, I guess MySQL simply ignores it.
I think you are looking for something like this:
SELECT text, can_id
FROM comments
ORDER BY id DESC
LIMIT 1
This way you get the text and can_id associated with the highest id value.

SQL Occurrence Comparison?

(Above is a picture of this particular table.)
I need to write a query for a database that lists the name of a department for the department that controls the most projects.
In my database, departments are identified by dnums.
So my question is, how can I write something that checks for the greatest occurrence of a Dnum in SQL? Because that's how I will identify the department that controls the most projects.
I've tried several different queries, but none of them work properly.
Could anyone explain a method that could compare occurrences?
You know already how to count per department:
select dnum, count(*) from project group by dnum;
In SQL Server it is easy to select the dnum(s) with the maximum occurrences; you order by count descending and take the top row(s) using TOP() WITH TIES.
select top(1) with ties dnum from project group by dnum order by count(*) desc;
(In standard SQL that would be order by count(*) desc fetch first 1 row with ties).
In standard SQL (and SQL Server) you also have the option of ranking your records per count:
select dnum
from (select dnum, rank() over (order by count(*) desc) as rnk from project group by dnum) ranked
where rnk = 1;
MySQL before version 8.0 doesn't give you any of these options, lacking both a WITH TIES clause and analytic functions such as RANK (window functions were only added in MySQL 8.0).
So in older MySQL versions you would not find the departments with the maximum count in one step, but only the maximum count alone first. You would get the corresponding department(s) only in a second step.
Two approaches here:
select count(*) from project group by dnum order by count(*) desc limit 1;
or
select max(cnt) from (select count(*) as cnt from project group by dnum) counted;
Then join the counted departments again:
select p.dnum
from
(
select count(*) as cnt
from project
group by dnum
order by count(*) desc limit 1
) m
join
(
select dnum, count(*) as cnt
from project
group by dnum
) p on p.cnt = m.cnt;
The last step is the same in both DBMS:
select dname
from departments
where dnumber in (select dnum ...);
(Or join the departments table instead, so you can show both name and count.)
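Putting the steps together, the following is a sketch of the whole pipeline (maximum count, departments with that count, then their names), run on SQLite via Python's sqlite3 rather than MySQL. The pnumber column and all sample data are assumptions, since the original table was only shown as a picture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE departments (dnumber INTEGER, dname TEXT)")
conn.execute("CREATE TABLE project (pnumber INTEGER, dnum INTEGER)")
conn.executemany(
    "INSERT INTO departments VALUES (?, ?)",
    [(1, "Research"), (4, "Administration"), (5, "Headquarters")],
)
conn.executemany(
    "INSERT INTO project VALUES (?, ?)",
    [(1, 5), (2, 5), (3, 4), (4, 4), (5, 1)],  # departments 4 and 5 tie
)

# m: the maximum count alone; p: count per department; d: names.
busiest = conn.execute("""
    SELECT d.dname, p.cnt
    FROM (SELECT MAX(cnt) AS cnt
          FROM (SELECT COUNT(*) AS cnt FROM project GROUP BY dnum) counted) m
    JOIN (SELECT dnum, COUNT(*) AS cnt FROM project GROUP BY dnum) p
      ON p.cnt = m.cnt
    JOIN departments d ON d.dnumber = p.dnum
    ORDER BY d.dname
""").fetchall()
print(busiest)
```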
You can use the COUNT function:
SELECT dnum, COUNT(*)
FROM project
GROUP BY dnum
ORDER BY COUNT(*) DESC
If you need the department_name, you'll have to join to your department table (assuming you have one). It could look something like this:
SELECT d.dnum, d.name, COUNT(p.pnumber)
FROM department d
INNER JOIN projects p ON d.dnum = p.dnum
GROUP BY d.dnum, d.name
ORDER BY COUNT(p.pnumber) DESC

GROUP BY in subquery to get accurate ranking

I'm trying to get the rank of a particular lap time of a specific track owned by a particular user.
There are multiple rows (laps) in this table for a specific user. So I'm trying to GROUP BY as seen in the subquery of FIND_IN_SET.
Right now MySQL (latest version) is complaining that my session_id, user_id, track_id and duration are not aggregated for the GROUP BY.
I don't understand why it's complaining about this, since the GROUP BY is in a subquery.
session_lap_times schema:
session_id, int
user_id, int
track_id, int
duration, decimal
This is what I've got so far.
SELECT
session_id
user_id,
track_id,
duration,
FIND_IN_SET( duration,
(SELECT GROUP_CONCAT( duration ORDER BY duration ASC ) FROM
(SELECT user_id,track_id,min(duration)
FROM session_lap_times GROUP BY user_id,track_id) AS aa WHERE track_id=s1.track_id)
) as ranking
FROM session_lap_times s1
WHERE user_id=1
It seems like it's trying to enforce the GROUP BY rules on the parent queries as well.
For reference, this is the error I'm getting: http://imgur.com/a/ILufE
Any help is greatly appreciated.
If I'm not mistaken, the problem is here (broken out for clarity):
SELECT user_id,track_id,any_value(duration)
FROM session_lap_times
GROUP BY user_id
The query is probably barfing because track_id is in the select and not in the group by. That means the subselect doesn't stand on its own and makes the whole thing fail.
Try adding track_id to your group by and adjust from there.
You are grouping by user_id but you do not do any aggregation in select or having in the following sub-query
SELECT
user_id,any_value(track_id),any_value(duration)
FROM session_lap_times GROUP BY user_id
You are using GROUP_CONCAT in the wrong context in the following sub-query, because you do not group by any column in the ranking temporary table:
(SELECT GROUP_CONCAT( duration ORDER BY duration ASC ) FROM
(SELECT user_id,track_id,any_value(duration)
FROM session_lap_times GROUP BY user_id,track_id) AS aa WHERE track_id=s1.track_id)
) as ranking
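As an alternative sketch (not the FIND_IN_SET approach from the question), the ranking can be computed by first grouping to each user's best lap per track, with every non-aggregated column in the GROUP BY, and then ranking those rows. This runs on SQLite 3.25+ through Python's sqlite3; MySQL 8 has RANK() as well. The sample laps and the best alias are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE session_lap_times "
    "(session_id INTEGER, user_id INTEGER, track_id INTEGER, duration REAL)"
)
conn.executemany(
    "INSERT INTO session_lap_times VALUES (?, ?, ?, ?)",
    [(1, 1, 7, 61.5), (2, 1, 7, 59.0), (3, 2, 7, 58.2),
     (4, 3, 7, 60.1), (5, 2, 7, 62.0)],
)

# Inner query: best lap per (user, track); both columns are in the GROUP BY.
# Middle query: rank those best laps within each track, fastest first.
# Outer query: keep only the user of interest.
ranking_rows = conn.execute("""
    SELECT user_id, track_id, best, ranking
    FROM (SELECT user_id, track_id, best,
                 RANK() OVER (PARTITION BY track_id ORDER BY best) AS ranking
          FROM (SELECT user_id, track_id, MIN(duration) AS best
                FROM session_lap_times
                GROUP BY user_id, track_id) best_laps) ranked
    WHERE user_id = 1
""").fetchall()
print(ranking_rows)
```

The user filter has to sit outside the ranking step; filtering earlier would rank user 1 only against themselves.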