I have a MySQL table with many rows. The table has a popularity column. If I sort by popularity, I can get the rank of each item. Is it possible to retrieve the rank of a particular item without sorting the entire table? I don't think so. Is that correct?
An alternative would be to create a new column for storing rank, sort the entire table, and then loop through all the rows and update the rank. That is extremely inefficient. Is there perhaps a way to do this in a single query?
There is no way to calculate the order (what you call rank) of something without first sorting the table or storing the rank.
If your table is properly indexed, however (an index on popularity), it is trivial for the database to sort, so you can get your rank. I'd suggest something like the following:
Select all, including rank
SET @rank := 0;
SELECT t.*, @rank := @rank + 1 AS rank
FROM `table` t
ORDER BY t.popularity;
To fetch the item with a specific id, you can simply wrap this in a subquery as follows:
Select one, including rank
SET @rank := 0;
SELECT * FROM (
    SELECT t.*, @rank := @rank + 1 AS rank
    FROM `table` t
    ORDER BY t.popularity
) t2
WHERE t2.id = 1;
You are right that the second approach is inefficient if the rank column is recomputed on every read. However, depending on how write-heavy your database is, you could recalculate the rank on every update and store it - a form of caching. You are then turning a calculated field into a fixed-value field.
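As a minimal sketch of that cached-rank idea (assuming a hypothetical items table that already has popularity and rank columns; MySQL permits ORDER BY in a single-table UPDATE):

SET @r := 0;
UPDATE items
SET rank = (@r := @r + 1)   -- store each row's position as its rank
ORDER BY popularity;

Run this from the update path (or a cron task) and reads become a simple indexed lookup on the rank column.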
This video covers caching in MySQL; although it is Rails-specific, and a slightly different form of caching, the caching strategy is very similar.
If you are using an InnoDB table, you may consider clustering on the popularity column (in InnoDB the clustered index is always the primary key, so this means making popularity part of the primary key), but only if ordering by popularity is a frequent query. The decision also depends on how varied the popularity column is: a narrow range such as 0-3 is a poor candidate.
You can look at this info on clustered indexes to see if it fits your case: http://msdn.microsoft.com/en-us/library/ms190639.aspx
It refers to SQL Server, but the concept is the same; also look up the MySQL documentation on the subject.
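Whichever way you go on clustering, a plain secondary index is the cheap first step, so the ORDER BY can be served from the index. A sketch, with the table name items assumed:

ALTER TABLE items ADD INDEX idx_popularity (popularity);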
If you're doing this using PDO, you need to fold everything into a single statement to get it to work properly. See PHP/PDO/MySQL: Convert Multiple Queries Into Single Query
So hobodave's answer becomes something like:
SELECT t.*, (@count := @count + 1) AS rank
FROM `table` t
CROSS JOIN (SELECT @count := 0) CONST
ORDER BY t.popularity;
hobodave's solution is very good. Alternatively, you could add a separate rank column and then, whenever a row's popularity is UPDATEd, query to determine whether that change altered its ranking relative to the rows directly above and below it, then UPDATE only the rows affected (see the sketch below). You'd have to profile to see which method is more efficient.
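A rough one-step sketch of that neighbour-swap idea (table and column names are hypothetical; run it repeatedly until no rows are affected, and mirror it for the case where popularity decreased):

UPDATE items AS a
JOIN items AS b ON b.rank = a.rank - 1    -- the row ranked directly above
SET a.rank = a.rank - 1,
    b.rank = b.rank + 1
WHERE a.id = 42                           -- the item whose popularity changed
  AND a.popularity > b.popularity;        -- it has overtaken its neighbour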
Related
I have an Eloquent query that is currently taking about 700ms to run and it will only increase as I add more websites to the user account. I'm trying to see what the best way to optimize it is so that it can run faster.
I really don't want to save the "results" of my calculations and then just fetch those in a smaller query later, because they could update at any moment, and then they would not be accurate 100% of the time. Although I am pretty sure that would speed up the query, I don't want to sacrifice accuracy for performance.
This is essentially the raw query that runs:
select *
from
    ( SELECT `positions`.*,
        @rank := IF(@group = keyword_id, @rank + 1, 1) as rank_e0686ae02a55b8ad75aec0c7aaec0a21,
        @group := keyword_id as group_e0686ae02a55b8ad75aec0c7aaec0a21
      from
        ( SELECT @rank := 0, @group := 0 ) as vars,
        positions
      order by `keyword_id` asc, `created_at` desc
    ) as positions
where `rank_e0686ae02a55b8ad75aec0c7aaec0a21` <= '2'
  and `positions`.`keyword_id` in ('hundreds of IDs listed here')
The query is generated using the solution mentioned here for getting N relations per record.
I've tried running a simpler query without the N-relations-per-record logic, and it actually ends up being even slower because it fetches much more data. So I think the problem is that there are too many IDs being matched in the IN clause of the query.
In my controller I have:
$user = auth()->user();
$websites = $user->websitesAndKeywords();
In my User model:
public function websitesAndKeywords() {
    $user = auth()->user();
    $websites = $user->websites()->orderBy('url')->get();
    $websites->load('keywords', 'keywords.latestPositions');
    return $websites;
}
I would appreciate any help anyone could provide in helping me speed this thing up.
EDIT: So I think I figured it out. The problem is the IN clause that Laravel uses every time eager loading is used to load relations. So I need to find a way to do a JOIN instead of eager loading.
I essentially need to convert this:
$websites->load('keywords', 'keywords.latestPositions');
Into:
$websites->load(['keywords' => function($query) {
    $query->join('positions', 'keywords.id', '=', 'positions.keyword_id');
}]);
That doesn't work, though, so I'm not sure of the best way to do a JOIN on the current collection. Ideally I would also fetch only the latest N positions, not all of the data.
Here are the indexes on the positions table: [screenshot omitted]
And here is what EXPLAIN returns for the query: [screenshot omitted]
You need, if you don't have it yet, an index on positions(keyword_id), or maybe positions(keyword_id, created_at), depending on your data, on whether you want to keep using the "lazy evaluate", and on whether you want to use the trigger solution.
You also have to, as Rick suggested, move your keyword_id IN (...) into the inner query; MySQL will not push it into the subquery on its own, since the optimizer doesn't understand that IF(@group = keyword_id, @rank + 1, 1) does not need the other keywords to work properly.
This should give results in under 700ms for tables with several million rows (provided you don't retrieve them all in IN), and might be improved further by removing the "lazy evaluate" as Rick also suggested (so you do fewer table lookups for columns not included in your index), depending on your data.
If you still have trouble, you could actually precalculate the data without any loss of accuracy by using triggers. This adds a (most likely small) overhead to your inserts/updates, so if you insert/update a lot and query only once in a while, you might not want to do it.
For this, you should really use the index positions(keyword_id, created_at).
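If it doesn't exist yet, that would be something like (names taken from the question):

ALTER TABLE positions ADD INDEX idx_keyword_created (keyword_id, created_at);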
Add another table keywordrank with the columns keyword_id, rank, and primarykeyofpositionstable, and primary key (keyword_id, rank). You need a separate table because, inside a trigger, MySQL can't update other rows of the table the trigger fires on.
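One possible definition; the type of primarykeyofpositionstable must match the primary key of positions (an integer is assumed here):

CREATE TABLE keywordrank (
    keyword_id int NOT NULL,
    rank int NOT NULL,
    primarykeyofpositionstable int NOT NULL,
    PRIMARY KEY (keyword_id, rank)
);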
Create a trigger that will update these ranks on every insert into your positions table:
delimiter $$
create trigger tr_positions_after_insert_updateranks after insert on positions
for each row
begin
    delete from keywordrank where keyword_id = NEW.keyword_id;
    insert into keywordrank (keyword_id, rank, primarykeyofpositionstable)
    select NEW.keyword_id, ranks.rank, ranks.position_pk
    from
        (select @rank := @rank + 1 as rank,
                `positions`.primarykeyofpositionstable as position_pk
         from
             (SELECT @rank := 0) as vars,
             positions
         where `positions`.keyword_id = NEW.keyword_id
         order by `created_at` desc
        ) as ranks
    where ranks.rank <= 2;
end$$
delimiter ;
If you want to be able to update or delete entries (or simply to be safe in case you ever do), add the same logic as update and delete triggers, running it for both OLD.keyword_id and NEW.keyword_id. To avoid duplicating the code, put it into a procedure: e.g. create a procedure fctname(kwid int) containing the trigger body with every NEW.keyword_id replaced by kwid, then call fctname(NEW.keyword_id) on insert, both fctname(NEW.keyword_id) and fctname(OLD.keyword_id) on update, and fctname(OLD.keyword_id) on delete.
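A sketch of that refactoring (untested, same assumptions as the insert trigger above):

delimiter $$
create procedure fctname(kwid int)
begin
    delete from keywordrank where keyword_id = kwid;
    insert into keywordrank (keyword_id, rank, primarykeyofpositionstable)
    select kwid, ranks.rank, ranks.position_pk
    from
        (select @rank := @rank + 1 as rank,
                `positions`.primarykeyofpositionstable as position_pk
         from
             (SELECT @rank := 0) as vars,
             positions
         where `positions`.keyword_id = kwid
         order by `created_at` desc
        ) as ranks
    where ranks.rank <= 2;
end$$

create trigger tr_positions_after_update_updateranks after update on positions
for each row
begin
    call fctname(NEW.keyword_id);
    call fctname(OLD.keyword_id);
end$$

create trigger tr_positions_after_delete_updateranks after delete on positions
for each row
begin
    call fctname(OLD.keyword_id);
end$$
delimiter ;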
You need to initialize that table once (and again if you e.g. decide you need more ranks or another order); you can use any version of your code, e.g.:
delete from keywordrank;
insert into keywordrank (keyword_id, rank, primarykeyofpositionstable)
select ranks.keyword_id, ranks.rank, ranks.position_pk
from
    ( SELECT `positions`.primarykeyofpositionstable as position_pk,
        @rank := IF(@group = keyword_id, @rank + 1, 1) as rank,
        @group := keyword_id as keyword_id
      from
        ( SELECT @rank := 0, @group := 0 ) as vars,
        positions
      order by `keyword_id` asc, `created_at` desc
    ) as ranks
where ranks.rank <= 2;
You can put both the trigger(s) and the init in your migration files (without the delimiter statements).
You can then just use a join to get your desired rows.
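For example, assuming positions has an id primary key stored in primarykeyofpositionstable:

select positions.*, keywordrank.rank
from keywordrank
join positions on positions.id = keywordrank.primarykeyofpositionstable
where keywordrank.keyword_id in ('hundreds of IDs listed here');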
Update: the code without the trigger, using the index on (keyword_id, created_at). The inner query can be computed entirely from the index, and only the found ids are then looked up in the table data. How much of an effect removing the "lazy evaluate" has depends on the number of rows in your result relative to the whole table.
select positions.*, poslist.rank, poslist.`group`
from positions
join
    ( SELECT `positions`.id,
        @rank := IF(@group = keyword_id, @rank + 1, 1) as rank,
        @group := keyword_id as `group`
      from
        ( SELECT @rank := 0, @group := 0 ) as vars,
        positions
      where `positions`.`keyword_id` in ('hundreds of IDs listed here')
      order by `keyword_id` asc, `created_at` desc
    ) as poslist
    on positions.id = poslist.id
where poslist.rank <= 2;
Check EXPLAIN to confirm it actually uses the correct index (keyword_id, created_at). If that is not fast enough, try the trigger solution (or add the new EXPLAIN output and SHOW PROFILE output so we can take a deeper look).
The code for finding the "top 2 in each grouping" is the best I have ever seen. It is essentially the same as what I have in my blog on the topic.
However, there are two other things that we may be able to improve on.
Move the keyword_id IN (...) clause from the outer query to the inner one. I assume you have an index starting with keyword_id?
"Lazy evaluate". That is, instead of doing SELECT positions.*, ..., select only SELECT id, ..., where id is the PRIMARY KEY of positions. Then, in the outer query, JOIN back to positions to get the rest of the columns. Without seeing SHOW CREATE TABLE and knowing what percentage of the table is in the IN list, I can't be sure how much this will help.
Can't think of the best way to do this.
So (example):
I have a table with 10 rows. In this table there is a column called 'Points'. Each row has a value in the points column. This much works fine.
I now want a column called 'Ranking'. The aim is to order all of the rows in the table by the points field, and then update each row's 'Ranking' field with its position in that ordering.
So the rows get ordered by points ascending, then I update each row with 1-10 depending on its rank.
How do i go about doing this?
I already use a cron job to update the points field, so I was going to include this in with it.
Thanks, Craig.
Example of how I would be ordering the rows:
SELECT * FROM blogs ORDER BY points ASC
For each row:
UPDATE blogs SET ranking = [row's position in the loop] WHERE blogid = [that row's blog ID]
Thanks. P.S. Those aren't the actual queries, just a plain-English explanation of how I imagine this working.
Perhaps this does what you want:
update blogs cross join
    (select @rn := 0) vars
set ranking = (@rn := @rn + 1)
order by points;
It uses a variable and ORDER BY to number the rows inside the update itself.
EDIT:
Note that MySQL does not allow ORDER BY in a multiple-table UPDATE, so if the version above is rejected, set the variable in a separate statement and use a single-table update:
set @rn := 0;
update blogs
set ranking = (@rn := @rn + 1)
order by points;
Have you considered RANK() in SQL?
http://msdn.microsoft.com/en-us/library/ms176102.aspx
I would imagine something similar to this (note that MySQL only supports window functions such as RANK() as of version 8.0):
SELECT name, points,
       RANK() OVER (ORDER BY points) AS rnk
FROM blogs
ORDER BY points;
You can perhaps store this in a temp table and then update the real ranking values from the rank numbers, as sketched below.
However, you might have to add logic if you don't want ties to show up as the same number.
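A sketch of that temp-table approach for the blogs table from this question (assuming MySQL 8.0+ and a blogid primary key; a temporary table also sidesteps MySQL's restriction on updating a table you select from in the same statement):

CREATE TEMPORARY TABLE blog_ranks AS
SELECT blogid, RANK() OVER (ORDER BY points) AS rnk
FROM blogs;

UPDATE blogs
JOIN blog_ranks ON blog_ranks.blogid = blogs.blogid
SET blogs.ranking = blog_ranks.rnk;

DROP TEMPORARY TABLE blog_ranks;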
Assume a houses table with lots of fields, related images tables, and 3 other related tables. I have an expensive query that retrieves all houses data, with all data from the related tables. Do I need to run the same expensive MySQL query twice in the case of pagination: once for the current result page and once to get the total number of records?
I'm using server-side pagination with Limit 0,10, and need to return the total number of houses along with the data. It doesn't make sense to me to run the same expensive query with the count(*) function, just because I'm limiting the result-set for pagination.
Is there another way to instruct MySQL to count the whole query, but bring back only the current pagination data?
I hope my question is clear...
thanks
I don't know MySQL, but for many databases I think you'll find that the cost of running it twice isn't as high as you'd suspect, if you do it in such a way that the database's optimization engine sees the two queries as having a lot in common.
Running
select count(1) from (
    select some_fields, row_number() over (order by field) as rownum
    from some_table
) t
and then
select * from (
    select some_fields, row_number() over (order by field) as rownum
    from some_table
) t
where rownum between :startRow and :endRow
order by rownum
This also has the advantage of letting you maintain the query in just one place with two different wrappers around it, one for paging and one for the total count.
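In MySQL terms, with the houses table from the question, the pair might look like this (the :city parameter is just a stand-in for whatever filters you apply):

SELECT COUNT(*)
FROM houses
WHERE city = :city;            -- hypothetical filter

SELECT *
FROM houses
WHERE city = :city             -- same filter
ORDER BY id
LIMIT 0, 10;                   -- current page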
Just as a side note, the best optimization you can do is to send exactly the same query to the db every time. In other words, if the user can change the sort, or change which fields they query on, bake it all into the same query. E.g.:
select some_fields,
    case
        when :sortField = 'ID' and :sortType = 'asc'
            then row_number() over (order by id)
        when :sortField = 'ID' and :sortType = 'desc'
            then row_number() over (order by id desc)
    end as rownum
from some_table
where (:searchType = 'name'
       and last_name like :lastName and first_name like :firstName)
   or (:searchType = 'customerType'
       and customer_type = :customer_type)
cfquery has a recordCount variable that might be useful. You can also use the startRow and maxRows attributes of cfoutput to control how many records get displayed. Finally, you can cache the query results in ColdFusion so you don't have to run the query against the database each time.
I'm developing a scoreboard of sorts. The table structure is ID, UID, points, with UID being linked to a user's account.
Now, I have this working somewhat, but I need one specific thing for this query to be pretty much perfect: picking a user based on rank.
I'll show you the SQL.
SELECT *, @rownum := @rownum + 1 AS `rank` FROM
    (SELECT * FROM `points_table` `p`
     ORDER BY `p`.`points` DESC
     LIMIT 1)
    `user_rank`,
    (SELECT @rownum := 0) `r`, `accounts_table` `a`, `points_table` `p`
WHERE `a`.`ID` = `p`.`UID`
It's simple to have it pick people out by UID, but that's no good. I need it to pull a user by their rank (which is a, um, fake field ^_^' created on the fly). This is a bit too complex for me, as my SQL knowledge covers only simple queries; I have never delved into aliases or nested queries, so you'll have to explain fairly simply so I can get a grasp.
I think there are two problems here. From what I can gather, you want to do a join on two tables, order them by points, and then return the nth record.
I've put together an UNTESTED query. The inner query joins the two tables, and the outer query restricts the result to a specific row.
This example returns the 4th row.
SELECT * FROM
    (SELECT *, @rownum := @rownum + 1 AS rank
     FROM `points_table` `p`
     JOIN `accounts_table` `a` ON a.ID = p.UID,
     (SELECT @rownum := 0) r
     ORDER BY `p`.`points` DESC) mytable
WHERE rank = 4
Hopefully this works for you!
I've made a change to the answer which should hopefully resolve that problem. Incidentally, whether you use PHP or MySQL to get the rank, you are still putting a heavy strain on resources. Before MySQL can calculate the rank, it must build a list of every user and then order them, so you are just moving the work from one area to another. As the number of users increases, so will the query execution time, regardless of your solution. MySQL will probably take slightly longer to perform the calculations, which is why PHP is probably the more suitable place. But I also know from experience that sometimes extraneous details prevent you from having a completely elegant solution. Hope the altered code works.
I need to get the last (newest) row in a table (using MySQL's natural order - i.e. what I get without any kind of ORDER BY clause), however there is no key I can ORDER BY on!
The only 'key' in the table is an indexed MD5 field, so I can't really ORDER BY on that. There's no timestamp, autoincrement value, or any other field that I could easily ORDER on either. This is why I'm left with only the natural sort order as my indicator of 'newest'.
And, unfortunately, changing the table structure to add a proper auto_increment is out of the question. :(
Anyone have any ideas on how this can be done w/ plain SQL, or am I SOL?
If it's MyISAM you can do it in two queries
SELECT COUNT(*) FROM yourTable;
SELECT * FROM yourTable LIMIT useTheCountHere - 1,1;
This is unreliable, however, because:
It assumes rows are only ever added to this table and never deleted.
It assumes no other writes are performed to this table in the meantime (you can lock the table).
MyISAM tables can be reordered using ALTER TABLE ... ORDER BY, so the insert order is not necessarily preserved.
It's not reliable at all in InnoDB, since this engine can reorder the table at will.
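If you do try the MyISAM approach anyway, note that LIMIT cannot take an expression directly, so the count has to be spliced in by the client or via a prepared statement, e.g.:

SELECT COUNT(*) - 1 INTO @offset FROM yourTable;
PREPARE stmt FROM 'SELECT * FROM yourTable LIMIT ?, 1';
EXECUTE stmt USING @offset;
DEALLOCATE PREPARE stmt;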
Can I ask why you need to do this?
In Oracle (and possibly MySQL too) the optimiser chooses the quickest access path to return your results. So, even if your data were static, there is potential to run the same query twice and get a different answer.
You can assign row numbers using the ROW_NUMBER window function (available in MySQL 8.0+) and then sort by this value using the ORDER BY clause. Note that without an ORDER BY inside OVER(), the numbering is not guaranteed to follow insertion order.
SELECT *,
       ROW_NUMBER() OVER () AS rn
FROM `table`
ORDER BY rn DESC
LIMIT 1;
Basically, you can't do that.
Normally I'd suggest adding a surrogate primary key with auto-increment and ORDER BY that:
SELECT *
FROM yourtable
ORDER BY id DESC
LIMIT 1
But in your question you write...
changing the table structure to add a proper auto_increment is out of the question.
So another, less pleasant option I can think of is simulating ROW_NUMBER using variables:
SELECT * FROM
(
    SELECT T1.*, @rownum := @rownum + 1 AS rn
    FROM yourtable T1, (SELECT @rownum := 0) T2
) T3
ORDER BY rn DESC
LIMIT 1
Please note that this has serious performance implications: it requires a full scan, and the rows are not guaranteed to be numbered in any particular order in the subquery - you might get them in insertion order, but then again you might not; when you don't specify the order, the server is free to choose any order it likes. In practice it will probably choose the order the rows are stored in on disk, in order to do as little work as possible, but relying on this is unwise.
Without an ORDER BY clause you have no guarantee of the order in which you will get your results; the SQL engine is free to choose any order.
But if for some reason you still want to rely on this order, then the following will indeed return the last record of the result (MySQL only):
select *
from (select *,
             @rn := @rn + 1 rn
      from mytable,
           (select @rn := 0) init
     ) numbered
where rn = @rn
In the subquery the records are retrieved without ORDER BY and are given sequential numbers. The outer query then selects only the record that received the last number.
If the table does have a numeric id column, you can also use HAVING for this kind of problem (this relies on MySQL's non-standard handling of non-aggregated columns):
SELECT MAX(id) AS last_id, column1, column2
FROM `table`
HAVING id = last_id;