mysql limit inside group? - mysql

I want to limit the size of records inside a group, and here is my trial, how to do it right?
mysql> select * from accounts limit 5 group by type;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual
that corresponds to your MySQL server version for the
right syntax to use near 'group by type' at line 1

The point of an aggregate function (and the GROUP BY it requires) is to turn many rows into one row. So if you really just want the top 5 savings accounts and the top 5 chequing accounts and the top 5 USD accounts etc., what you need is more like this:
criteria: top 5 of particular account type by account_balance
SELECT account_type, account_balance FROM accounts WHERE account_type='savings'
ORDER BY account_balance DESC LIMIT 5
UNION
SELECT account_type, account_balance FROM accounts WHERE account_type='chequing'
ORDER BY account_balance DESC LIMIT 5
UNION
SELECT account_type, account_balance FROM accounts WHERE account_type='USD'
ORDER BY account_balance DESC LIMIT 5;
It's not pretty, but if you construct the SQL with a script then subbing in the account_types and concatenating together a query is straightforward.
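A minimal sketch of that scripting approach, in Python. The helper name build_top_n_query and the `?` placeholder style are assumptions for illustration (MySQL drivers commonly use `%s` instead), and the demo runs against an in-memory SQLite database only so that it is self-contained. Each branch is wrapped in a derived table so its ORDER BY and LIMIT stay local to that branch, and UNION ALL is used so that two accounts with the same type and balance are not merged into one row.

```python
import sqlite3

def build_top_n_query(account_types, n=5, placeholder="?"):
    """Return (sql, params) selecting the top-n balances for each account type."""
    if not account_types:
        raise ValueError("need at least one account type")
    parts = []
    for i, _ in enumerate(account_types):
        # One derived table per account type keeps ORDER BY/LIMIT per branch.
        parts.append(
            "SELECT * FROM ("
            "SELECT account_type, account_balance FROM accounts "
            f"WHERE account_type = {placeholder} "
            f"ORDER BY account_balance DESC LIMIT {int(n)}"
            f") AS t{i}"
        )
    return " UNION ALL ".join(parts), list(account_types)

# Demo data (hypothetical): 7 savings accounts, 3 chequing accounts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (account_type TEXT, account_balance INTEGER)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [("savings", b) for b in range(1, 8)] + [("chequing", b) for b in range(10, 13)],
)

sql, params = build_top_n_query(["savings", "chequing"], n=5)
rows = conn.execute(sql, params).fetchall()
print(rows)
```

The account types are passed as bound parameters rather than concatenated into the string, so the generated SQL stays safe even if the type names come from user input.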

I've had some luck with using numbered rows:
set @type = '';
set @num = 0;
select
items.*,
@num := if(@type = item_type, @num + 1, 1) as dummy_1,
@type := item_type as dummy_2,
@num as row_number
from items
group by
item_type,
row_number
having row_number < 3;
This will give you 2 results per item_type. (One gotcha: make sure you re-run the first two set statements before each query; otherwise your row numbers will steadily get higher and higher and the row_number < 3 restriction won't work.)
I pieced this together from a couple of posts which have been linked in other answers on SO.

It appears you want to limit the number of rows returned within each group of your overall result set... this is difficult to do in a way that scales well. One technique is to perform N joins on the same table with the conditions such that the only rows that match are the top/bottom N that you want.
this page may offer some additional insight into your solution... although returning the top 5 in each group is going to get ugly fast.

Try placing the LIMIT clause after the GROUP BY clause.
EDIT: Try this:
SELECT *
FROM accounts a1
WHERE 5 >
(
SELECT COUNT(*)
FROM accounts a2
WHERE a2.type = a1.type
AND a2.balance > a1.balance
)
This returns the 5 accounts of each type with the biggest balances (possibly more than 5 for a type if several accounts tie on balance at the cutoff).

Group by is used for aggregate functions (sums, averages...)
It allows you to split the aggregate result into groups. You have not used one of these functions.

I am not sure you can use a limit in the group by. You can probably use it if your group by is a sub select that returns one row/value. For example:
select * from foo order by (select foo2.id from foo2 limit 1)
I am just guessing this would work.

This will probably do the trick, although if type isn't indexed, it'll be sloooowwww. And even with one, it's not especially fast:
SELECT a.*
FROM accounts a
LEFT JOIN accounts a2 ON (a2.type = a.type AND a2.id < a.id)
GROUP BY a.id
HAVING COUNT(a2.id) < 5;
A better bet would be to just order the list by type and then use a loop at the business layer to remove the rows you don't want.
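A sketch of that business-layer loop in Python, assuming the rows arrive already sorted by type (and, within each type, in the order you want to keep). The function name first_n_per_group and the sample tuples are made up for illustration.

```python
from itertools import groupby, islice

def first_n_per_group(rows, key, n):
    """Yield the first n rows of each run of rows sharing the same key.

    rows must already be sorted so that equal keys are adjacent.
    """
    for _, group in groupby(rows, key=key):
        # islice stops after n rows; groupby skips the rest of the group.
        yield from islice(group, n)

# Hypothetical result set: (type, balance), ordered by type, balance DESC.
rows = [
    ("savings", 900),
    ("savings", 700),
    ("savings", 100),
    ("usd", 50),
    ("usd", 20),
]
kept = list(first_n_per_group(rows, key=lambda r: r[0], n=2))
print(kept)
```

This reads every row from the database, so it trades network transfer for a much simpler query.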

@dnagirl's answer almost has it, but for some reason, my version of MySQL only returns the first LIMIT'd set. To get around that, I put each statement into a subquery:
SELECT * FROM (
SELECT account_type, account_balance FROM accounts WHERE account_type='savings'
ORDER BY account_balance DESC LIMIT 5
) as a
UNION
SELECT * FROM (
SELECT account_type, account_balance FROM accounts WHERE account_type='chequing'
ORDER BY account_balance DESC LIMIT 5
) as b
UNION
SELECT * FROM (
SELECT account_type, account_balance FROM accounts WHERE account_type='USD'
ORDER BY account_balance DESC LIMIT 5
) as c
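This gave me back each set's results in the final result set. Otherwise, I would have only gotten the first 5 from the first query and nothing else. This is probably not a quirk of my version: the MySQL documentation says that to apply ORDER BY or LIMIT to an individual SELECT in a UNION, that SELECT must be parenthesized, and wrapping each one in a subquery, as here, achieves the same thing.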

Related

How do I do a dynamic UNION query in MySQL?

mytable has an auto-incrementing id column which is an integer, and for all intents and purposes in this case you can safely assume that the higher ID represents a more recent value. mytable also has an indexed column called group_id which is a foreign key to the groups table.
I want a quick and dirty query to select the 5 most recent rows for each group_id from mytable.
If there were only three groups, this would be easy, as I could do this:
SELECT * FROM `mytable` WHERE `group_id` = 1 ORDER BY `id` DESC LIMIT 5
UNION ALL
SELECT * FROM `mytable` WHERE `group_id` = 2 ORDER BY `id` DESC LIMIT 5
UNION ALL
SELECT * FROM `mytable` WHERE `group_id` = 3 ORDER BY `id` DESC LIMIT 5
However, there is not a fixed number of groups. Groups are determined by what's in the groups table, so there is an indeterminate number of them.
My thoughts so far:
I could grab a CURSOR on the groups table and build a new SQL query string, then EXECUTE it. However, that seems really messy and I'm hoping there's a better way of doing it.
I could grab a CURSOR on the groups table and insert things into a temporary table, then select from that. However, that also seems really messy.
I don't know if I could just grab a CURSOR and then start returning rows directly from there. Is there perhaps something similar to SQL Server's @table type variables?
What I'm hoping most of all is that I'm overthinking this and there is a way to do this in a SELECT statement.
Getting the n most recent rows per group is best handled by window functions in other RDBMSs (SQL Server, PostgreSQL, Oracle, etc.). Unfortunately, MySQL before version 8.0 has no window functions, so an alternative is to use user-defined variables to assign a rank to the rows that belong to the same group. In this case ORDER BY group_id, id DESC is essential so that the rows are ranked in the right order within each group.
SELECT c.*
FROM (
SELECT *,
@r:= CASE WHEN @g = group_id THEN @r + 1 ELSE 1 END rownum,
@g:=group_id
FROM mytable
CROSS JOIN(SELECT @g:=NULL ,@r:=0) t
ORDER BY group_id,id desc
) c
WHERE c.rownum <=5
The query above returns the 5 most recent rows for each group_id. To get a different number of rows per group, change the filter in the outer query to the desired number: WHERE c.rownum <= n
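For readers on MySQL 8.0 or later, which does support window functions, the same result can be had with ROW_NUMBER(). The sketch below is an illustration under assumptions rather than a tested MySQL deployment: it runs the query against an in-memory SQLite database (3.25 or newer) so that it is self-contained, with a two-column stand-in for mytable and made-up rows. The query text uses only standard window-function syntax.

```python
import sqlite3

# Rank rows inside each group_id, newest id first, then keep ranks 1..n.
TOP_N_PER_GROUP = """
SELECT id, group_id
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY group_id ORDER BY id DESC) AS rn
    FROM mytable AS t
) AS ranked
WHERE rn <= ?
ORDER BY group_id, id DESC
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, group_id INTEGER)")
# Hypothetical data: ids 1-7 belong to group 1, ids 8-10 to group 2.
conn.executemany(
    "INSERT INTO mytable (id, group_id) VALUES (?, ?)",
    [(i, 1) for i in range(1, 8)] + [(i, 2) for i in range(8, 11)],
)

rows = conn.execute(TOP_N_PER_GROUP, (5,)).fetchall()
print(rows)
```

Unlike the user-variable version, this does not depend on the order in which the server evaluates expressions in the select list.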

sql query that selects the newest 100 articles then returns 30 articles of that result by top rated ordering

I am asking for a query that selects the newest 100 articles, then returns 30 articles of that result by top rated ordering. Something is wrong in my example.
SELECT *
FROM
(
SELECT articleid,articletitle,articleicon,timesviewed
FROM articles
WHERE articlestatus = 1 AND (articletype = 1 || articletype = 6)
ORDER BY articleid DESC LIMIT 100
)
ORDER BY (totalvotepoints/totalvotes) DESC LIMIT 30
Your inner SELECT statement does not include the columns totalvotepoints and totalvotes. Therefore the outer SELECT cannot reference them. Try
SELECT * FROM (
SELECT articleid,articletitle,articleicon,timesviewed,totalvotepoints,totalvotes
FROM articles
WHERE articlestatus = 1 AND articletype IN (1,6)
ORDER BY articleid DESC
LIMIT 100
) AS t
ORDER BY (totalvotepoints/totalvotes) DESC
LIMIT 30
I see two problems here:
The inner SELECT query needs to include the totalvotepoints and totalvotes columns (or even just the result of the division operation).
The inner query needs a name, even though the name is never used. I'm more of a sql server guy, so maybe mysql lets you get away with this, but I'd expect the query to fail without a name supplied after the sub query
Put it all together:
SELECT *
FROM
(
SELECT articleid,articletitle,articleicon,timesviewed,totalvotepoints/totalvotes As rating
FROM articles
WHERE articlestatus = 1 AND articletype IN (1,6)
ORDER BY articleid DESC LIMIT 100
) t
ORDER BY rating DESC LIMIT 30
Also as a Sql Server guy I was surprised to find that the result of dividing two integers is a floating-point type, instead of an integer division. Most systems will do integer division here unless you specifically cast one side to a floating point type, if for no other reason than that sometimes you need integer division. It seems that in MySql, the only way to do integer division is with the DIV operator, which is a non-standard extension to ansi sql.

Mysql: Order by max N values from subquery

I'm about to throw in the towel with this.
Preface: I want to make this work with any N, but for the sake of simplicity, I'll set N to be 3.
I've got a query (MySQL, specifically) that needs to pull in data from a table and sort based on top 3 values from that table and after that fallback to other sort criteria.
So basically I've got something like this:
SELECT maintable.id
FROM
tbl1 AS maintable
LEFT JOIN
tbl2 AS othertable
ON
maintable.id = othertable.id
ORDER BY
othertable.timestamp DESC,
maintable.timestamp DESC
Which is all basic textbook stuff. But the issue is I need the first ORDER BY clause to only get the three biggest values in othertable.timestamp and then fallback on maintable.timestamp.
Also, doing a LIMIT 3 subquery to othertable and join it is a no go as this needs to work with an arbitrary number of WHERE conditions applied to maintable.
I was almost able to make it work with a user variable based approach like this, but it fails since it doesn't take into account ordering, so it'll take the FIRST three othertable values it finds:
ORDER BY
(
IF(othertable.timestamp IS NULL, 0,
IF(
(@rank:=@rank+1) > 3, null, othertable.timestamp
)
)
) DESC
(with a @rank:=0 preceding the statement)
So... any tips on this? I'm losing my mind with the problem. Another parameter I have for this is that since I'm only altering an existing (vastly complicated) query, I can't do a wrapping outer query. Also, as noted, I'm on MySQL so any solutions using the ROW_NUMBER function are unfortunately out of reach.
Thanks to all in advance.
EDIT. Here's some sample data with timestamps dumbed down to simpler integers to illustrate what I need:
maintable
id timestamp
1 100
2 200
3 300
4 400
5 500
6 600
othertable
id timestamp
4 250
5 350
3 550
1 700
=>
1
3
5
6
4
2
And if for whatever reason we add WHERE NOT maintable.id = 5 to the query, here's what we should get:
1
3
4
6
2
...because now 4 is among the top 3 values in othertable referring to this set.
So as you see, the row with id 4 from othertable is not included in the ordering as it's the fourth in descending order of timestamp values, thus it falls back into getting ordered by the basic timestamp.
The real world need for this is this: I've got content in "maintable" and "othertable" is basically a marker for featured content with a timestamp of "featured date". I've got a view where I'm supposed to float the last 3 featured items to the top and the rest of the list just be a reverse chronologic list.
Maybe something like this.
SELECT
id
FROM
(SELECT
maintable.id,
CASE WHEN othertable.timestamp IS NULL THEN
0
ELSE
@i := @i + 1
END AS num,
othertable.timestamp as othertimestamp,
maintable.timestamp as maintimestamp
FROM
tbl1 AS maintable
CROSS JOIN (select @i := 0) i
LEFT JOIN tbl2 AS othertable
ON maintable.id = othertable.id
ORDER BY
othertable.timestamp DESC) t
ORDER BY
CASE WHEN num > 0 AND num <= 3 THEN
othertimestamp
ELSE
maintimestamp
END DESC
Modified answer:
select ilv.* from
(select sq.*, @i:=@i+1 rn from
(select @i := 0) i
CROSS JOIN
(select m.*, o.id o_id, o.timestamp o_t
from maintable m
left join othertable o
on m.id = o.id
where 1=1
order by o.timestamp desc) sq
) ilv
order by case when o_t is not null and rn <=3 then rn else 4 end,
timestamp desc
SQLFiddle here.
Amend where 1=1 condition inside subquery sq to match required complex selection conditions, and add appropriate limit criteria after the final order by for paging requirements.
Can you use a union query as below?
(SELECT id,timestamp,1 AS isFeatured FROM tbl2 ORDER BY timestamp DESC LIMIT 3)
UNION ALL
(SELECT id,timestamp,2 AS isFeatured FROM tbl1 WHERE id NOT IN (SELECT id FROM (SELECT id FROM tbl2 ORDER BY timestamp DESC LIMIT 3) AS top3))
ORDER BY isFeatured,timestamp DESC
This might be somewhat redundant, but it is semantically closer to the question you are asking. This would also allow you to parameterize the number of featured results you want to return.
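As a cross-check for any of the queries in this thread, the ordering the question asks for can be written in a few lines of application code and run against the sample data from the question. This is only a sketch of the intended result, not a replacement for the SQL; the function name featured_order and the dict-based inputs are assumptions for illustration.

```python
def featured_order(main, featured, n=3):
    """Return ids: the n most recently featured first, then the rest by timestamp.

    main:     {id: timestamp} for the rows that survived the WHERE conditions
    featured: {id: featured timestamp} (the question's "othertable")
    """
    # Only featured rows that are still in the filtered main set count.
    candidates = [i for i in featured if i in main]
    top = sorted(candidates, key=lambda i: featured[i], reverse=True)[:n]
    top_set = set(top)
    # Everything else falls back to the main timestamp, newest first.
    rest = sorted(
        (i for i in main if i not in top_set),
        key=lambda i: main[i],
        reverse=True,
    )
    return top + rest

# Sample data copied from the question.
maintable = {1: 100, 2: 200, 3: 300, 4: 400, 5: 500, 6: 600}
othertable = {4: 250, 5: 350, 3: 550, 1: 700}

full = featured_order(maintable, othertable)
without_5 = featured_order(
    {i: t for i, t in maintable.items() if i != 5}, othertable
)
print(full)
print(without_5)
```

Because the filter is applied before the top 3 are chosen, id 4 moves into the featured block once id 5 is excluded, which is the behaviour described in the question.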

SQL Distinct - Get all values

Thanks for looking, I'm trying to get 20 entries from the database randomly and unique, so the same one doesn't appear twice. But I also have a questionGroup field, which should also not appear twice. I want to make that field distinct, but then get the ID of the field selected.
Below is my NOT WORKING script; it fails because DISTINCT is applied to the ID as well:
SELECT DISTINCT `questionGroup`,`id`
FROM `questions`
WHERE `area`='1'
ORDER BY rand() LIMIT 20
Any advice is greatly appreciated!
Thanks
Try doing the group by/distinct first in a subquery:
select *
from (select distinct `questionGroup`,`id`
from `questions`
where `area`='1'
) qc
order by rand()
limit 20
I see . . . What you want is to select a random row from each group, and then limit it to 20 groups. This is a harder problem. I'm not sure if you can do this accurately with a single query in mysql, not using variables or outside tables.
Here is an approximation:
select *
from (select q.`questionGroup`,
coalesce(max(case when rand()*num < 1 then id end), min(id)) as id
from `questions` q join
(select questionGroup, count(*) as num
from questions
group by questionGroup
) qg
on qg.questionGroup = q.questionGroup
where `area`='1'
group by q.questionGroup
) qc
order by rand()
limit 20
This uses rand() to select ids, each row with probability 1/num, so on average one per grouping (but it is random, so sometimes 0, 1, 2, etc.). It chooses the max() of these. If none appear, then it takes the minimum.
This will be slightly biased away from the maximum id (or minimum, if you switch the min's and max's in the equation). For most applications, I'm not sure that this bias would make a big difference. In other databases that support ranking functions, you can solve the problem directly.
Something like this
SELECT DISTINCT *
FROM (
SELECT `questionGroup`,`id`
FROM `questions`
WHERE `area`='1'
ORDER BY rand()
) As q
LIMIT 20

mysql random rows

How do I form a query to select 'm' rows at random from a query result which has 'n' rows?
For example, 5 rows from a query result which has 50 rows.
I tried the following, but it gives an error:
select * from (select * from emp where alphabet='A' order by sal desc) order by rand() limit 5;
You may wonder why the subquery is needed: I need 5 different names from the set of top 50 returned by the inner query.
SELECT * FROM t
ORDER BY RAND() LIMIT 5
or from your query result:
SELECT * FROM ( SELECT * FROM t WHERE x=y ) tt
ORDER BY RAND() LIMIT 5
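If sorting by RAND() in the database is not wanted, the random pick can also be done in the application after fetching the top n rows. The sketch below is an illustration under assumptions: it uses an in-memory SQLite database so that it is self-contained, the emp, alphabet and sal names come from the question, and the name column, the helper sample_from_top and the generated rows are made up.

```python
import random
import sqlite3

def sample_from_top(conn, m, n):
    """Fetch the top n rows by sal, then pick m distinct rows at random."""
    top = conn.execute(
        "SELECT name, sal FROM emp WHERE alphabet = 'A' "
        "ORDER BY sal DESC LIMIT ?",
        (n,),
    ).fetchall()
    # random.sample never returns the same row twice.
    return random.sample(top, min(m, len(top)))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, alphabet TEXT, sal INTEGER)")
# Hypothetical data: 60 employees with salaries 1..60.
conn.executemany(
    "INSERT INTO emp VALUES (?, 'A', ?)",
    [(f"emp{i}", i) for i in range(1, 61)],
)

picked = sample_from_top(conn, m=5, n=50)
print(picked)
```

With n = 50 only the 50 highest salaries are candidates, so every picked row has a salary of 11 or more in this sample data.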
This will give you the number to use as 'm' (limit)
TRUNCATE((RAND()*50),0);
...substitute 50 with n.
To check it try the following:
SELECT TRUNCATE((RAND()*50),0);
I should warn that this could return 0 as a result, is this ok for you?
For example you could do something like this:
SELECT COUNT(*) FROM YOUR_TABLE
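...and store the result in a variable named totalRows, for example. MySQL's LIMIT does not accept an expression (only an integer constant or a prepared-statement placeholder), so compute the random number first:
SELECT TRUNCATE((RAND()*?),0);
where you substitute the '?' with the totalRows variable, according to the tech stack you are using. Then pass the result in as the limit:
SELECT * FROM YOUR_TABLE LIMIT ?;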
Is it clearer now? If not please add more information to your question.