mytable has an auto-incrementing id column which is an integer, and for all intents and purposes in this case you can safely assume that the higher ID represents a more recent value. mytable also has an indexed column called group_id which is a foreign key to the groups table.
I want a quick and dirty query to select the 5 most recent rows for each group_id from mytable.
If there were only three groups, this would be easy, as I could do this:
(SELECT * FROM `mytable` WHERE `group_id` = 1 ORDER BY `id` DESC LIMIT 5)
UNION ALL
(SELECT * FROM `mytable` WHERE `group_id` = 2 ORDER BY `id` DESC LIMIT 5)
UNION ALL
(SELECT * FROM `mytable` WHERE `group_id` = 3 ORDER BY `id` DESC LIMIT 5)
However, there is not a fixed number of groups. Groups are determined by what's in the groups table, so there is an indeterminate number of them.
My thoughts so far:
I could grab a CURSOR on the groups table and build a new SQL query string, then EXECUTE it. However, that seems really messy and I'm hoping there's a better way of doing it.
I could grab a CURSOR on the groups table and insert things into a temporary table, then select from that. However, that also seems really messy.
I don't know if I could just grab a CURSOR and then start returning rows directly from there. Is there perhaps something similar to SQL Server's @table type variables?
What I'm hoping most of all is that I'm overthinking this and there is a way to do this in a SELECT statement.
Getting the n most recent rows per group is best handled by window functions in other RDBMSs (SQL Server, PostgreSQL, Oracle, etc.). Unfortunately, MySQL versions before 8.0 don't have window functions, so as an alternative you can use user-defined variables to assign a rank to rows that belong to the same group. In this case ORDER BY group_id, id DESC is important to order the results properly per group.
SELECT c.*
FROM (
SELECT *,
@r:= CASE WHEN @g = group_id THEN @r + 1 ELSE 1 END rownum,
@g:=group_id
FROM mytable
CROSS JOIN(SELECT @g:=NULL ,@r:=0) t
ORDER BY group_id,id desc
) c
WHERE c.rownum <=5
The above query will give you the 5 most recent rows for each group_id. If you want more than 5 rows, just change the WHERE filter of the outer query to your desired number: WHERE c.rownum <= n
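On MySQL 8.0 or later, where window functions are available, the same ranking can be sketched with ROW_NUMBER(). This is only an illustration and not part of the answer above: the aliases m, ranked and rn are made up for the example, and mytable is assumed to have the id and group_id columns described in the question.

```sql
-- Sketch for MySQL 8.0+ (window functions required).
-- Number the rows of each group from newest (highest id) to oldest,
-- then keep only the first 5 rows of every group.
SELECT ranked.*
FROM (
    SELECT m.*,
           ROW_NUMBER() OVER (PARTITION BY m.group_id
                              ORDER BY m.id DESC) AS rn
    FROM mytable AS m
) AS ranked
WHERE ranked.rn <= 5
ORDER BY ranked.group_id, ranked.id DESC;
```

As with the variable-based query, changing the 5 in the outer WHERE changes how many rows are kept per group.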
Related
I am making a MySQL query of a table with thousands of records. What I'm really trying to do is find the next and previous rows that surround a particular ID. The issue is that when sorting the table in a specific way, there is no correlation between IDs (I can't just search for id > $current_id LIMIT 1, for example, because the needed ID in the next row might or might not actually be higher). Here is an example:
ID Name Date
4 Fred 1999-01-04
6 Bill 2002-04-02
7 John 2002-04-02
3 Sara 2002-04-02
24 Beth 2007-09-18
1 Dawn 2007-09-18
Say I know I want the records that come directly before and after John (ID = 7). In this case, the ID of the record after that row is actually a lower number. The table is sorted by date first and then by name, but there are many entries with the same date, so I can't just look for the next date, either. What is the best approach to find, in this case, the row before and (separately) the row after ID 7?
Thank you for any help.
As others have suggested you can use window functions for this, but I would use LEAD() and LAG() instead of ROW_NUMBER().
SELECT *
FROM (
SELECT
*,
LAG(ID) OVER (ORDER BY `Date` ASC, `Name` ASC) `prev`,
LEAD(ID) OVER (ORDER BY `Date` ASC, `Name` ASC) `next`
FROM `tbl`
) t
WHERE `ID` = 7;
With thousands of records (very small) this should be very fast, but if you expect it to grow to hundreds of thousands or even millions of rows, you should try to limit the amount of work being done in the inner query. Sorting millions of rows and assigning prev and next values to all of them, just to use one row, would be excessive.
Assuming your example of John (ID = 7) you could use the Date to constrain the inner query. If the adjacent records would always be within one month then you could do something like -
SELECT *
FROM (
SELECT
*,
LAG(ID) OVER (ORDER BY `Date` ASC, `Name` ASC) `prev`,
LEAD(ID) OVER (ORDER BY `Date` ASC, `Name` ASC) `next`
FROM `tbl`
WHERE `Date` BETWEEN '2002-04-02' - INTERVAL 1 MONTH AND '2002-04-02' + INTERVAL 1 MONTH
) t
WHERE `ID` = 7;
Without knowing more detail about the distribution of your data, I am only guessing but hopefully you get the idea.
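Another way to limit the work, which also runs without window functions, is to look only at the rows that sort after (or before) the known row. The following is a minimal sketch, not a tuned solution. It assumes the same `tbl` with `ID`, `Name` and `Date` columns, that `ID` is unique, and it adds `ID` as a final tie-breaker, which is an assumption because the question sorts only by date and name. The aliases n, p and cur are made up for the example.

```sql
-- Row that comes directly after ID 7 in (Date, Name, ID) order.
SELECT n.*
FROM `tbl` AS n
JOIN `tbl` AS cur ON cur.`ID` = 7
WHERE (n.`Date`, n.`Name`, n.`ID`) > (cur.`Date`, cur.`Name`, cur.`ID`)
ORDER BY n.`Date` ASC, n.`Name` ASC, n.`ID` ASC
LIMIT 1;

-- Row that comes directly before ID 7:
-- flip the comparison and the sort direction.
SELECT p.*
FROM `tbl` AS p
JOIN `tbl` AS cur ON cur.`ID` = 7
WHERE (p.`Date`, p.`Name`, p.`ID`) < (cur.`Date`, cur.`Name`, cur.`ID`)
ORDER BY p.`Date` DESC, p.`Name` DESC, p.`ID` DESC
LIMIT 1;
```

With the sample data from the question, the first query should return Sara (ID 3) and the second Bill (ID 6). Whether an index on (`Date`, `Name`) is actually used for the row comparison depends on the optimizer, so check with EXPLAIN.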
You can use a window function called ROW_NUMBER in this way: ROW_NUMBER() OVER (ORDER BY `Date`, `Name`). This will number every row in the table consecutively in that order. Now you search for your ID and you also get the row number. For example, you search for ID = 7 and you get row number 35. Now you can search for row numbers 34 and 36 to get the rows before and after the one with ID 7.
This is what comes to mind:
SELECT *
FROM (
SELECT *, ROW_NUMBER() OVER (ORDER BY `date`, `name`) AS row_num
FROM people
) p1
WHERE row_num > (SELECT row_num FROM (
SELECT *, ROW_NUMBER() OVER (ORDER BY `date`, `name`) AS row_num
FROM people
) p2 WHERE p2.id = 7)
ORDER BY row_num
LIMIT 1;
Using the row number window function, you can compare two views of the table with id = 7 and get the row you need. You can change the condition in the subquery to suit your needs, e.g., p2.name = 'John' and p2.date = '2002-04-02'.
Here's a dbfiddle demonstrating: https://www.db-fiddle.com/f/mpQBcijLFRWBBUcWa3UcFY/2
Alternately, you can simplify the syntax a bit and avoid the redundancy using a CTE like this:
WITH p AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY `date`, `name`) AS row_num
FROM people
)
SELECT *
FROM p
WHERE row_num > (SELECT row_num FROM p WHERE p.id = 7)
ORDER BY row_num
LIMIT 1;
Table
id user_id rank_solo lp
1 1 15 45
2 2 7 79
3 3 17 15
How can I write a ranking query that sorts on rank_solo (this ranges from 0 to 28) and, if rank_solo is equal, uses lp (0-100) to further determine the ranking?
(If lp is also equal, still assign distinct rankings so there are no ties.)
The query should give me the ranking from a certain random user_id. How is this performance wise on 5m+ rows?
So
User_id 1 would have ranking 2
User_id 2 would have ranking 3
User_id 3 would have ranking 1
You can get the ranking using variables:
select t.*, (@rn := @rn + 1) as ranking
from t cross join
(select @rn := 0) params
order by rank_solo desc, lp;
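To get the ranking of one particular user without numbering the whole table, one option is to count how many rows would sort ahead of that user. This is a sketch under assumptions: the table is called t as in the query above, a higher rank_solo ranks first, a higher lp ranks first among equal rank_solo (the direction for lp is a guess, flip the comparison if a lower lp should win), and the lower user_id wins any remaining tie. The aliases other and me are made up for the example.

```sql
-- Ranking of user_id 1 = number of rows that sort ahead of it, plus one.
SELECT COUNT(*) + 1 AS ranking
FROM t AS other
JOIN t AS me ON me.user_id = 1
WHERE other.rank_solo > me.rank_solo
   OR (other.rank_solo = me.rank_solo AND other.lp > me.lp)
   OR (other.rank_solo = me.rank_solo AND other.lp = me.lp
       AND other.user_id < me.user_id);
```

With the three sample rows this gives 2 for user_id 1, matching the expected result in the question. Note that it also returns 1 for a user_id that does not exist, so check for that separately. On millions of rows an index on (rank_solo, lp) may help, but that needs to be measured.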
You can use ORDER BY to sort your query:
SELECT *
FROM `Table`
ORDER BY rank_solo, lp
I'm not sure I quite understand what you're saying. With that many rows, create an index on the fields you're using to do your selects. For example, in the MySQL client use:
create index RANKINGS on mytablename(rank_solo,lp,user_id);
Depending on what you use in your query to select the data, you may change the index or add another index with a different field combination. This has improved performance on my tables by a factor of 10 or more.
As for the query, if you're selecting a specific user then could you not just use:
select rank_solo from table where user_id={user id}
If you want the highest ranking individual, you could:
select * from yourtable order by rank_solo,lp limit 1
Remove the limit 1 to list them all.
If I've misunderstood, please comment.
An alternative would be to use a 2nd table.
table2 would have the following fields:
rank (auto_increment)
user_id
rank_solo
lp
With the rank field as auto increment, as it's populated, it will automatically populate with values beginning with "1".
Once the 2nd table is ready, just do this when you want to update the rankings:
delete from table2;
insert into table2 (user_id, rank_solo, lp) select user_id, rank_solo, lp from table1 order by rank_solo, lp;
It may not be "elegant" but it gets the job done. Plus, if you create an index on both tables, this query would be very quick since the fields are numeric.
I have a table with 3 fields:
id
note
created_at
Is there a way in the SQL language especially Postgres that I can select the value of the last note without having to LIMIT 1?
Normal query:
select note from table order by created_at desc limit 1
I'm interested in something avoiding the limit since I'll need it as a subquery.
Simple version with EXISTS semi-join:
SELECT note FROM tbl t
WHERE NOT EXISTS
(SELECT 1 FROM tbl t1 WHERE t1.created_at > t.created_at);
"Find a note where no other note was created later."
This shares the weakness of @Hogan's version that it can return multiple rows if created_at is not UNIQUE, as @Ollie already pointed out. Unlike @Hogan's query (max() is only defined for simple types), this one can be improved easily:
Compare row types
SELECT note FROM tbl t
WHERE NOT EXISTS
(SELECT 1 FROM tbl t1
WHERE (t1.created_at, t1.id) > (t.created_at, t.id));
Assuming you want the greatest id in case of a tie with created_at, and id is the primary key, therefore unique. This works in PostgreSQL and MySQL.
SQL Fiddle.
Window function
The same can be achieved with a window function in PostgreSQL:
SELECT note
FROM (
SELECT note, row_number() OVER (ORDER BY created_at DESC, id DESC) AS rn
FROM tbl t
) x
WHERE rn = 1;
MySQL before version 8.0 lacks support for window functions. There you can substitute a variable like this:
SELECT note
FROM (
SELECT note, @rownum := @rownum + 1 AS rn
FROM tbl t
,(SELECT @rownum := 0) r
ORDER BY created_at DESC, id DESC
) x
WHERE rn = 1;
(SELECT @rownum := 0) r initializes the variable with 0 without an explicit SET command.
SQL Fiddle.
If your id column is an autoincrementing primary key field, it's pretty easy. This assumes the latest note has the highest id. (That might not be true; only you know that!)
select *
from note
where id = (select max(id) from note)
It's here: http://sqlfiddle.com/#!2/7478a/1/0 for MySQL and here http://sqlfiddle.com/#!1/6597d/1/0 for postgreSQL. Same SQL.
If your id column isn't set up so the latest note has the highest id, but still is a primary key (that is, still has unique values in each row), it's a little harder. We have to disambiguate identical dates; we'll do this by choosing, arbitrarily, the highest id.
select *
from note
where id = (
select max(id)
from note where created_at =
(select max(created_at)
from note
)
)
Here's an example: http://sqlfiddle.com/#!2/1f802/4/0 for MySQL.
Here it is for postgreSQL (the SQL is the same, yay!) http://sqlfiddle.com/#!1/bca8c/1/0
Another possibility: maybe you want both notes shown together in one row if they were both created at the same exact time. Again, only you know that.
select group_concat(note separator '; ')
from note
where created_at = (select max(created_at) from note)
In postgreSQL 9+, it's
select string_agg(note, '; ')
from note
where created_at = (select max(created_at) from note)
If you do have the possibility for duplicate created_at times and duplicate id values, and you don't want the group_concat effect, you are unfortunately stuck with LIMIT.
I'm not 100% on Postgres (actually never used it) but you can get the same effect with something like this - if the created_at is unique ... (or with any column which is unique):
SELECT note FROM table WHERE created_at = (
SELECT MAX(created_at) FROM table
)
I may not know how to answer on this platform but what I have suggested is working
SELECT * FROM table GROUP BY field ORDER BY max(field) DESC;
This way you can get the last value of the field without a LIMIT. In joined queries we usually get the last update time this way, such as the time of the last message, without limiting the output.
I have a database table that has two fields , date and name.
I want to have my query pull the first 20 by newest date first, then the rest of the query to pull the other elements by name alphabetically.
So that way the top 20 newest products would show first, then the rest would be ordered by name.
It's a bit ugly, but you can do it in one query:
SELECT name,
`date`
FROM ( SELECT @rank := @rank + 1 AS rank,
name,
`date`
FROM (SELECT @rank := 0) dummy
JOIN products
ORDER BY `date` DESC, name) dateranked
ORDER BY IF(rank <= 20, rank, 21), name;
The innermost query, dummy, initializes our @rank variable. The next derived table, dateranked, ranks all rows by recency (breaking ties by name). The outermost query then simply re-orders the rows by our computed rank, treating ranks greater than 20 as rank #21, and then by name.
UPDATE: This query version is more compact, puts the conditional ranking logic in the outermost ORDER BY, uses IF() rather than CASE/END.
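On MySQL 8.0 or later the same idea can be sketched with ROW_NUMBER() instead of a user variable. This is an illustration under assumptions, not the answer's own query: it uses the same products table with name and `date` columns, and the alias recency is made up for the example.

```sql
-- Sketch for MySQL 8.0+ (window functions required).
-- Rank every row by recency, then sort the 20 newest rows by that rank
-- and all remaining rows (treated as rank 21) by name.
SELECT name, `date`
FROM (
    SELECT name,
           `date`,
           ROW_NUMBER() OVER (ORDER BY `date` DESC, name) AS recency
    FROM products
) AS dateranked
ORDER BY CASE WHEN recency <= 20 THEN recency ELSE 21 END, name;
```

The outer ORDER BY does the same job as IF(rank <= 20, rank, 21) in the variable-based version.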
I'm afraid this has to be done by adding a special column to your table or creating a temporary table, TPup. If you let me know whether you are interested in those options, I'll tell you more.
The two queries option like the following might be a possibility, but my version of MySQL tells me LIMIT isn't available in sub-queries.
SELECT `date`, `name` from `table` ORDER BY `date` DESC LIMIT 0, 20;
SELECT `date`, `name` from `table` WHERE `id` NOT IN (SELECT `id` from `table` ORDER BY `date` DESC LIMIT 0, 20) ORDER BY `name`;
Use sql UNION operator to combine result of two SELECT queries.
According to MySQL docs:
use of ORDER BY for individual SELECT statements implies nothing
about the order in which the rows appear in the final result because
UNION by default produces an unordered set of rows.
...
To use an ORDER BY or LIMIT clause to sort or limit the entire UNION
result, parenthesize the individual SELECT statements and place the
ORDER BY or LIMIT after the last one. The following example uses both clauses:
(SELECT a FROM t1 WHERE a=10 AND B=1)
UNION
(SELECT a FROM t2 WHERE a=11 AND B=2)
ORDER BY a LIMIT 10;
Edit:
I missed the part that explains that the OP needs to sort one set of results by date and the other set alphabetically. I think you need to create a temporary field for sorting purposes. The SQL query would be something similar to this:
(SELECT *, 'firstset' as set_id FROM t1 ORDER BY date LIMIT 0, 20)
UNION
(SELECT *, 'secondset' as set_id FROM t1 ORDER BY date LIMIT 20, 18446744073709551615)
ORDER BY
CASE
WHEN set_id='firstset' THEN date
WHEN set_id='secondset' THEN name
END DESC ;
I want to limit the size of records inside a group, and here is my trial, how to do it right?
mysql> select * from accounts limit 5 group by type;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual
that corresponds to your MySQL server version for the
right syntax to use near 'group by type' at line 1
The point of an aggregate function (and the GROUP BY it requires) is to turn many rows into one row. So if you really just want the top 5 savings accounts and the top 5 chequing accounts and the top 5 USD accounts etc., what you need is more like this:
criteria: top 5 of particular account type by account_balance
SELECT account_type, account_balance FROM accounts WHERE account_type='savings'
ORDER BY account_balance DESC LIMIT 5
UNION
SELECT account_type, account_balance FROM accounts WHERE account_type='chequing'
ORDER BY account_balance DESC LIMIT 5
UNION
SELECT account_type, account_balance FROM accounts WHERE account_type='USD'
ORDER BY account_balance DESC LIMIT 5;
It's not pretty, but if you construct the SQL with a script then subbing in the account_types and concatenating together a query is straightforward.
I've had some luck with using numbered rows:
set @type = '';
set @num = 0;
select
items.*,
@num := if(@type = item_type, @num + 1, 1) as dummy_1,
@type := item_type as dummy_2,
@num as row_number
from items
group by
item_type,
row_number
having row_number < 3;
This will give you 2 results per item_type. (One gotcha: make sure you re-run the first two set statements, otherwise your row numbers will steadily get higher and higher and the row_number < 3 restriction won't work.)
I pieced this together from a couple of posts which have been linked in other answers on SO.
It appears you want to limit the number of rows returned within each group of your overall result set... this is difficult to do in a way that scales well. One technique is to perform N joins on the same table with the conditions such that the only rows that match are the top/bottom N that you want.
this page may offer some additional insight into your solution... although returning the top 5 in each group is going to get ugly fast.
Try placing the LIMIT clause after the GROUP BY clause.
EDIT: Try this:
SELECT *
FROM accounts a1
WHERE 5 >
(
SELECT COUNT(*)
FROM accounts a2
WHERE a2.type = a1.type
AND a2.balance > a1.balance
)
This returns at most 5 accounts of each type with the biggest balances.
Group by is used for aggregate functions (sums, averages...)
It allows you to split the aggregate result into groups. You have not used one of these functions.
I am not sure you can use a limit in the group by. You can probably use it if your group by is a sub select that returns one row/value. For example:
select * from foo order by (select foo2.id from foo2 limit 1)
I am just guessing this would work.
This will probably do the trick, although if type isn't indexed, it'll be sloooowwww. And even with one, it's not especially fast:
SELECT a.*
FROM accounts a
LEFT JOIN accounts a2 ON (a2.type = a.type AND a2.id < a.id)
GROUP BY a.id
HAVING COUNT(a2.id) < 5;
A better bet would be to just order the list by type and then use a loop at the business layer to remove the rows you don't want.
@dnagirl's answer almost has it, but for some reason, my version of MySQL only returns the first LIMIT'd set. To get around that, I put each statement into a subquery:
SELECT * FROM (
SELECT account_type, account_balance FROM accounts WHERE account_type='savings'
ORDER BY account_balance DESC LIMIT 5
) as a
UNION
SELECT * FROM (
SELECT account_type, account_balance FROM accounts WHERE account_type='chequing'
ORDER BY account_balance DESC LIMIT 5
) as b
UNION
SELECT * FROM (
SELECT account_type, account_balance FROM accounts WHERE account_type='USD'
ORDER BY account_balance DESC LIMIT 5
) as c
This gave me back each set's results in the final result set. Otherwise, I would have only gotten the first 5 from the first query and nothing else. I'm not sure if it's just some MySQL funk with my version.