I have an Eloquent query that is currently taking about 700ms to run and it will only increase as I add more websites to the user account. I'm trying to see what the best way to optimize it is so that it can run faster.
I really don't want to save the "results" of my calculations and then just fetch those in a smaller query later, because they could update at any moment, which would mean they would not be accurate 100% of the time. Although I'm fairly sure that would speed up the query, I don't want to sacrifice accuracy for performance.
This is essentially the raw query that runs:
select *
from
( SELECT `positions`.*,
         @rank := IF(@group = keyword_id, @rank+1,
         1) as rank_e0686ae02a55b8ad75aec0c7aaec0a21,
         @group := keyword_id as group_e0686ae02a55b8ad75aec0c7aaec0a21
  from
  ( SELECT @rank:=0, @group:=0 ) as vars,
  positions
  order by `keyword_id` asc, `created_at` desc
) as positions
where `rank_e0686ae02a55b8ad75aec0c7aaec0a21` <= '2'
and `positions`.`keyword_id` in ('hundreds of IDs listed here')
The query is generated using the solution mentioned here with regard to getting N relations per record.
I've tried running a simpler query without the N-relations-per-record limit, and it actually ends up being even slower because it fetches much more data. So I think the problem is that too many IDs are being matched in the query's IN clause.
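For what it's worth, on MySQL 8.0+ the whole user-variable trick can be replaced by `ROW_NUMBER() OVER (PARTITION BY ...)`, with the ID filter pushed into the inner query. Here is a minimal sketch of that shape, run against SQLite from Python (table and column names are assumed stand-ins, so treat it as an illustration rather than the exact production query):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE positions (
    id INTEGER PRIMARY KEY,
    keyword_id INTEGER NOT NULL,
    created_at TEXT NOT NULL,
    position INTEGER NOT NULL
);
CREATE INDEX idx_kw_created ON positions (keyword_id, created_at);
INSERT INTO positions (keyword_id, created_at, position) VALUES
    (1, '2024-01-01', 5), (1, '2024-01-02', 4), (1, '2024-01-03', 3),
    (2, '2024-01-01', 9), (2, '2024-01-02', 8);
""")

# latest 2 positions per keyword; note the IN filter sits in the inner query
rows = conn.execute("""
    SELECT id, keyword_id, created_at, position
    FROM (
        SELECT p.*,
               ROW_NUMBER() OVER (
                   PARTITION BY keyword_id
                   ORDER BY created_at DESC
               ) AS rn
        FROM positions AS p
        WHERE keyword_id IN (1, 2)
    ) AS ranked
    WHERE rn <= 2
""").fetchall()
print(rows)
```

The same query shape works on MySQL 8.0+; on 5.x you are stuck with the user-variable version above.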
In my controller I have:
$user = auth()->user();
$websites = $user->websitesAndKeywords();
In my User model:
public function websitesAndKeywords() {
    // this is a method on the User model, so $this is the user
    $websites = $this->websites()->orderBy('url')->get();
    $websites->load('keywords', 'keywords.latestPositions');
    return $websites;
}
I would appreciate any help anyone could provide in helping me speed this thing up.
EDIT: So I think I figured it out. The problem is the IN clause that Laravel uses every time eager loading is used to load relations. So I need to find a way to do a JOIN instead of eager loading.
Essentially need to convert this:
$websites->load('keywords', 'keywords.latestPositions');
Into:
$websites->load(['keywords' => function ($query) {
    $query->join('positions', 'keywords.id', '=', 'positions.keyword_id');
}]);
That doesn't work, so I'm not sure what's the best way to do a JOIN on a current collection. Ideally I would also only fetch the latest N positions too and not all data.
Here are the indexes on the positions table, and what EXPLAIN returns for the query (screenshots not reproduced here).
If you don't have one yet, you need an index on positions(keyword_id), or perhaps positions(keyword_id, created_at), depending on your data, on whether you keep using the "lazy evaluate" approach, and on whether you use the trigger solution.
You also have to, as Rick suggested, move your keyword_id IN (...) filter into the inner query: MySQL will not push it down into the subquery on its own, because the optimizer doesn't understand that IF(@group = keyword_id, @rank+1, 1) does not need the other keywords to work properly.
This should give results in less than 700ms for tables with several million rows (assuming the IN list doesn't retrieve them all), and might be improved further by removing the "lazy evaluate" as Rick also suggested (so you do fewer table lookups for columns not included in your index), depending on your data.
If you still have troubles, you could however actually precalculate the data without loss of accuracy by using triggers. It will add a (most likely small) overhead to your inserts/updates, so if you insert/update a lot and only query once in a while, you might not want to do it.
For this, you should really use the index positions (keyword_id,created_at).
Add another table keywordrank with the columns keyword_id, rank, and primarykeyofpositionstable, with a primary key on (keyword_id, rank). You need a separate table because, inside a trigger, MySQL can't update other rows of the table the trigger fires on.
Create a trigger that will update these ranks on every insert to your positions-table:
delimiter $$
create trigger tr_positions_after_insert_updateranks after insert on positions
for each row
begin
    delete from keywordrank where keyword_id = NEW.keyword_id;
    insert into keywordrank (keyword_id, rank, primarykeyofpositionstable)
    select NEW.keyword_id, ranks.rank, ranks.position_pk
    from
    ( select @rank := @rank + 1 as rank,
             `positions`.primarykeyofpositionstable as position_pk
      from
      ( SELECT @rank := 0 ) as vars,
      positions
      where `positions`.keyword_id = NEW.keyword_id
      order by `created_at` desc
    ) as ranks
    where ranks.rank <= 2;
end$$
delimiter ;
If you want to be able to update or delete entries (or just to be safe, which might be a good idea anyway), add the same logic as an update/delete trigger, only run it for both old.keyword_id and new.keyword_id. You might want to put the code into a procedure so you can reuse it: e.g. create a procedure fctname(kwid int), put the whole trigger body in it with every NEW.keyword_id replaced by kwid, and then call fctname(new.keyword_id) for insert, fctname(new.keyword_id) and fctname(old.keyword_id) for update, and fctname(old.keyword_id) for delete.
You need to initialize that table once (and again if you e.g. decide you need more ranks or another order). You can use any version of your code, e.g.
delete from keywordrank;
insert into keywordrank (keyword_id, rank, primarykeyofpositionstable)
select ranks.keyword_id, ranks.rank, ranks.position_pk
from
( SELECT `positions`.primarykeyofpositionstable as position_pk,
         @rank := IF(@group = keyword_id, @rank + 1, 1) as rank,
         @group := keyword_id as keyword_id
  from
  ( SELECT @rank := 0, @group := 0 ) as vars,
  positions
  order by `keyword_id` asc, `created_at` desc
) as ranks
where ranks.rank <= 2;
You can put both the trigger(s) and the init in your migration files (without the delimiter).
You then can just use a join to get your desired rows.
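As a sanity check of the precalculation idea, here is a toy version in SQLite driven from Python. The table and column names are simplified stand-ins, and since SQLite triggers can't use MySQL user variables, the rank is computed with a correlated COUNT instead; it is a sketch of the approach, not a drop-in replacement for the MySQL trigger above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE positions (
    id INTEGER PRIMARY KEY,
    keyword_id INTEGER NOT NULL,
    created_at TEXT NOT NULL
);
CREATE INDEX idx_kw_created ON positions (keyword_id, created_at);

-- rank table maintained by the trigger; (keyword_id, rank) is the primary key
CREATE TABLE keywordrank (
    keyword_id INTEGER NOT NULL,
    rank INTEGER NOT NULL,
    position_id INTEGER NOT NULL,
    PRIMARY KEY (keyword_id, rank)
);

-- after every insert, rebuild the top-2 rows for the affected keyword only;
-- the correlated COUNT says: rank = how many positions for this keyword
-- are at least as new as this row (assumes distinct created_at values)
CREATE TRIGGER tr_positions_ai AFTER INSERT ON positions
BEGIN
    DELETE FROM keywordrank WHERE keyword_id = NEW.keyword_id;
    INSERT INTO keywordrank (keyword_id, rank, position_id)
    SELECT p.keyword_id,
           (SELECT COUNT(*) FROM positions p2
             WHERE p2.keyword_id = p.keyword_id
               AND p2.created_at >= p.created_at),
           p.id
    FROM positions p
    WHERE p.keyword_id = NEW.keyword_id
      AND (SELECT COUNT(*) FROM positions p2
            WHERE p2.keyword_id = p.keyword_id
              AND p2.created_at >= p.created_at) <= 2;
END;
""")

for day in ("2024-01-01", "2024-01-02", "2024-01-03"):
    conn.execute("INSERT INTO positions (keyword_id, created_at) VALUES (1, ?)", (day,))

# the two newest positions for keyword 1, ranked 1 and 2
print(conn.execute("SELECT * FROM keywordrank ORDER BY rank").fetchall())
```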
Update: here is the code without a trigger, using the index on (keyword_id, created_at). The inner query can be computed entirely from the index, after which only the found ids are looked up in the table data. How much of an effect removing the lazy evaluate has depends on the number of rows in your result relative to your whole table.
select positions.*, poslist.rank, poslist.`group`
from positions
join
( SELECT `positions`.id,
         @rank := IF(@group = keyword_id, @rank + 1, 1) as rank,
         @group := keyword_id as `group`
  from
  ( SELECT @rank := 0, @group := 0 ) as vars,
  positions
  where `positions`.`keyword_id` in ('hundreds of IDs listed here')
  order by `keyword_id` asc, `created_at` desc
) as poslist
on positions.id = poslist.id
where poslist.rank <= 2;
Check explain if it actually uses the correct index (keyword_id, created_at). If that is not fast enough, you should try the trigger solution. (Or add the new explain-output and show profile-output to let us have a deeper look.)
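Rick's "lazy evaluate" suggestion, ranking only the primary keys and then joining back for the wide columns, can be sketched like this (SQLite from Python again, with a window function standing in for the user variables; the names and the `payload` column are assumptions representing the rest of `positions.*`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE positions (
    id INTEGER PRIMARY KEY,
    keyword_id INTEGER NOT NULL,
    created_at TEXT NOT NULL,
    payload TEXT            -- stands in for the many other columns of positions.*
);
CREATE INDEX idx_kw_created ON positions (keyword_id, created_at);
INSERT INTO positions (keyword_id, created_at, payload) VALUES
    (1, '2024-01-01', 'a'), (1, '2024-01-02', 'b'), (1, '2024-01-03', 'c'),
    (2, '2024-01-01', 'd'), (2, '2024-01-02', 'e');
""")

# the inner query touches only columns covered by the index plus the id;
# the outer join then fetches full rows for just the few surviving ids
rows = conn.execute("""
    SELECT positions.*, poslist.rank
    FROM positions
    JOIN (
        SELECT id,
               ROW_NUMBER() OVER (
                   PARTITION BY keyword_id ORDER BY created_at DESC
               ) AS rank
        FROM positions
        WHERE keyword_id IN (1, 2)
    ) AS poslist ON positions.id = poslist.id
    WHERE poslist.rank <= 2
""").fetchall()
print(rows)
```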
The code for finding the "top 2 in each grouping" is the best I have ever seen; it is essentially the same as what I have in my blog post on the subject.
However, there are two other things that we may be able to improve on.
Move keyword_id in... from the outer query to the inner. I assume you have an index starting with keyword_id?
"Lazy evaluate". That is, instead of doing SELECT positions.*, ..., do only SELECT id, ... where id is the PRIMARY KEY of positions. Then, in the outer query, JOIN back to positions to get the rest of the columns. Without seeing SHOW CREATE TABLE and knowing what percentage of the table is in the IN list, I can't be sure this will help much.
Related
Is there a more efficient way to grab the top X results from each group?
You can ignore any field in the sqlfiddle that is not used in the query.
The query:
SET @num := 0, @item_id := '';
SELECT `item_id`, `user_id`, total_hoarded FROM (
  SELECT `item_id`, `user_id`, total_hoarded,
    @num := IF(@item_id = x.`item_id`, @num + 1, 1) AS ROW_NUMBER,
    @item_id := x.`item_id` AS dummy
  FROM (
    SELECT `item_id`, `user_id`, COUNT(*) AS total_hoarded
    FROM `player_items`
    GROUP BY `item_id`, `user_id`
    ORDER BY `item_id`, total_hoarded DESC
  ) AS x
) AS y WHERE y.ROW_NUMBER <= 10;
The demo: http://sqlfiddle.com/#!2/75bc7/1
Description of query:
(Starting from the most nested query) It grabs and groups all of the rows by item_id and user_id so that we can use aggregate functions to figure out how many of each item each user has.
The next level up then attaches a row number to each row, so that the final query can simply grab all rows numbered X or lower (in this case the top 10 users for each grouping).
SQLFiddle is limited in terms of how big the sample can be, so it only shows data for two items, and a handful of users. Not enough to fully populate the top 10, but enough to show what I am doing.
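If upgrading is an option, the aggregate-plus-row-number combination maps directly onto window functions (MySQL 8.0+), which also sidesteps the evaluation-order traps of user variables. A small sketch, run in SQLite via Python with a toy player_items table (top 2 instead of top 10, just to keep the sample data short):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE player_items (item_id INTEGER, user_id INTEGER);
INSERT INTO player_items VALUES
    (1,1),(1,1),(1,2),(1,2),(1,2),(1,3),
    (2,1),(2,3),(2,3);
""")

# group to get each user's hoard per item, then rank the groups per item
rows = conn.execute("""
    SELECT item_id, user_id, total_hoarded
    FROM (
        SELECT item_id, user_id, COUNT(*) AS total_hoarded,
               ROW_NUMBER() OVER (
                   PARTITION BY item_id
                   ORDER BY COUNT(*) DESC
               ) AS rn
        FROM player_items
        GROUP BY item_id, user_id
    ) AS ranked
    WHERE rn <= 2
""").fetchall()
print(rows)
```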
The options (I am considering):
Leave query as is.
Do a standard query grouping and loop it through PHP to grab the top 10
Other? (Haven't thought of any others)
Notes:
I realise that I might not have given enough detail, so let me know what you need. I am only looking for a general way to approach this. The query above takes about 5 minutes to run on a table with 30 million rows. This is not a big deal though, as the query is only run once every hour.
Breaking the query into smaller parts might run faster, but the table gets written to a lot, so queries tend to get locked out.
The problem
I'm looking at the ranking use case in MySQL but I still haven't settled on a definite "best solution" for it. I have a table like this:
CREATE TABLE mytable (
item_id int unsigned NOT NULL,
# some other data fields,
item_score int unsigned NOT NULL,
PRIMARY KEY (item_id),
KEY item_score (item_score)
) ENGINE=MyISAM;
with some millions of records in it, and the most common write operation is to update item_score with a new value. Given an item_id and/or its score, I need to get its ranking, and I currently know two ways to accomplish that.
COUNT() items with higher scores
SELECT COUNT(*) FROM mytable WHERE item_score > $foo;
assign row numbers
SET @rownum := 0;
SELECT rank FROM (
  SELECT @rownum := @rownum + 1 AS rank, item_id
  FROM mytable ORDER BY item_score DESC ) AS result
WHERE item_id = $foo;
which one?
Do they perform the same or behave differently? If so, why are they different and which one should I choose?
any better idea?
Is there any better / faster approach? The only thing I can come up with is a separate table/memcache/NoSQL/whatever to store pre-calculated rankings, but I would still have to sort and read out mytable every time I update it. That makes me think it would be a good approach only if the number of "read rank" queries is (much?) greater than the number of updates; on the other hand, it should become less useful as the number of "read rank" queries approaches the number of update queries.
Since you have indexes on your table, the only queries that make sense to use are
-- findByScore
SELECT COUNT(*) FROM mytable WHERE item_score > :item_score;
-- findById
SELECT COUNT(*) FROM mytable WHERE item_score > (select item_score from mytable where item_id = :item_id);
For findById, since you only need the rank of one item_id, it is not much different from its join counterpart performance-wise.
If you need the rank of many items, then using a join is better.
Using "assign row numbers" can't compete here, because it won't make use of the indexes (in your query not at all, and even if we improved that, it still would not be as good).
There is also a hidden trap in the row-number approach: if multiple items have the same score, it will give you the rank of the last one.
Unrelated: please use PDO with prepared statements if possible, to be safe from SQL injection.
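A quick way to convince yourself the COUNT(*) approach behaves as expected is a toy SQLite session driven from Python (the `+ 1` makes the rank 1-based; drop it if you prefer the 0-based count used in the queries above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (
    item_id INTEGER PRIMARY KEY,
    item_score INTEGER NOT NULL
);
CREATE INDEX idx_score ON mytable (item_score);
INSERT INTO mytable VALUES (10, 500), (11, 900), (12, 700), (13, 300);
""")

def rank_of(item_id):
    # rank 1 = highest score: count rows strictly above, then add 1
    (rank,) = conn.execute("""
        SELECT COUNT(*) + 1
        FROM mytable
        WHERE item_score > (SELECT item_score FROM mytable WHERE item_id = ?)
    """, (item_id,)).fetchone()
    return rank

print([rank_of(i) for i in (11, 12, 10, 13)])  # items from highest to lowest score
```

On both SQLite and MySQL, the inner lookup and the range count can each be satisfied from the item_score index, which is why this scales so much better than numbering every row.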
I'm developing a scoreboard of sorts. The table structure is ID, UID, points with UID being linked to a users account.
Now, I have this working somewhat, but I need one specific thing for this query to be pretty much perfect. To pick a user based on rank.
I'll show you the SQL.
SELECT *, @rownum := @rownum + 1 AS `rank` FROM
(SELECT * FROM `points_table` `p`
ORDER BY `p`.`points` DESC
LIMIT 1)
`user_rank`,
(SELECT @rownum := 0) `r`, `accounts_table` `a`, `points_table` `p`
WHERE `a`.`ID` = `p`.`UID`
It's simple to have it pick people out by UID, but that's no good. I need this to pull the user by their rank (which is a, um, fake field ^_^' created on the fly). This is a bit too complex for me as my SQL knowledge is enough for simple queries, I have never delved into alias' or nested queries, so you'll have to explain fairly simply so I can get a grasp.
I think there are two problems here. From what I can gather, you want to join two tables, order them by points, and then return the nth record.
I've put together an UNTESTED query. The inner query does a join on the two tables and the outer query specifies that only a specific row is returned.
This example returns the 4th row.
SELECT * FROM
  (SELECT *, @rownum := @rownum + 1 AS rank
   FROM `points_table` `p`
   JOIN `accounts_table` `a` ON a.ID = p.UID,
     (SELECT @rownum := 0) r
   ORDER BY `p`.`points` DESC) mytable
WHERE rank = 4
Hopefully this works for you!
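If all that's needed is "the user at rank n", a plain ORDER BY with LIMIT/OFFSET is an alternative worth knowing, since skipping n-1 rows avoids numbering every row. A sketch in SQLite from Python, with simplified stand-ins for the two tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts_table (ID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE points_table (ID INTEGER PRIMARY KEY, UID INTEGER, points INTEGER);
INSERT INTO accounts_table VALUES (1,'a'),(2,'b'),(3,'c'),(4,'d'),(5,'e');
INSERT INTO points_table VALUES (1,1,50),(2,2,90),(3,3,70),(4,4,20),(5,5,60);
""")

def user_at_rank(rank):
    # LIMIT 1 OFFSET rank-1 skips the higher-ranked rows and returns the nth
    return conn.execute("""
        SELECT a.ID, a.name, p.points
        FROM points_table p
        JOIN accounts_table a ON a.ID = p.UID
        ORDER BY p.points DESC
        LIMIT 1 OFFSET ?
    """, (rank - 1,)).fetchone()

print(user_at_rank(1), user_at_rank(4))
```

The same LIMIT/OFFSET syntax works on MySQL; ties in points still need a tiebreaker column in the ORDER BY to make the ranks deterministic.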
I've made a change to the answer which should hopefully resolve that problem. Incidentally, whether you use PHP or MySQL to get the rank, you are still putting a heavy strain on resources: before MySQL can calculate the rank, it must build a result set of every user and then order them, so you are just moving the work from one area to another. As the number of users increases, so will the query execution time, regardless of your solution. MySQL will probably take slightly longer to perform the calculations, which is why PHP is probably the more practical choice. But I also know from experience that sometimes extraneous details prevent you from having a completely elegant solution. Hope the altered code works.
I need to get the last (newest) row in a table (using MySQL's natural order - i.e. what I get without any kind of ORDER BY clause), however there is no key I can ORDER BY on!
The only 'key' in the table is an indexed MD5 field, so I can't really ORDER BY on that. There's no timestamp, autoincrement value, or any other field that I could easily ORDER on either. This is why I'm left with only the natural sort order as my indicator of 'newest'.
And, unfortunately, changing the table structure to add a proper auto_increment is out of the question. :(
Anyone have any ideas on how this can be done w/ plain SQL, or am I SOL?
If it's MyISAM you can do it in two queries
SELECT COUNT(*) FROM yourTable;
SELECT * FROM yourTable LIMIT useTheCountHere - 1,1;
This is unreliable however because
It assumes rows are only added to this table and never deleted.
It assumes no other writes are performed to this table in the meantime (you can lock the table)
MyISAM tables can be reordered using ALTER TABLE, so that the insert order is no longer preserved.
It's not reliable at all in InnoDB, since this engine can reorder the table at will.
Can I ask why you need to do this?
In Oracle, and possibly MySQL too, the optimiser will choose the quickest access path and order in which to return your results. So even if your data were static, there is the potential to run the same query twice and get a different answer.
You can assign row numbers using the ROW_NUMBER() window function (available in MySQL 8.0+) and then sort by this value using the ORDER BY clause.
SELECT *,
ROW_NUMBER() OVER() AS rn
FROM table
ORDER BY rn DESC
LIMIT 1;
Basically, you can't do that.
Normally I'd suggest adding a surrogate primary key with auto-incrememt and ORDER BY that:
SELECT *
FROM yourtable
ORDER BY id DESC
LIMIT 1
But in your question you write...
changing the table structure to add a proper auto_increment is out of the question.
So another less pleasant option I can think of is using a simulated ROW_NUMBER using variables:
SELECT * FROM
(
SELECT T1.*, @rownum := @rownum + 1 AS rn
FROM yourtable T1, (SELECT @rownum := 0) T2
) T3
ORDER BY rn DESC
LIMIT 1
Please note that this has serious performance implications: it requires a full scan, and the results of the subquery are not guaranteed to be returned in any particular order. You might get them in insertion order, but then again you might not; when you don't specify an order, the server is free to choose any order it likes. It will probably choose the order the rows are stored in on disk, in order to do as little work as possible, but relying on this is unwise.
Without an order by clause you have no guarantee of the order in which you will get your result. The SQL engine is free to choose any order.
But if for some reason you still want to rely on this order, then the following will indeed return the last record from the result (MySql only):
select *
from (select *,
             @rn := @rn + 1 rn
      from mytable,
           (select @rn := 0) init
     ) numbered
where rn = @rn
In the subquery the records are retrieved without ORDER BY and are given a sequential number. The outer query then selects only the one that got the last attributed number.
We can use HAVING for this kind of problem:
SELECT MAX(id) as last_id,column1,column2 FROM table HAVING id=last_id;
I have a MySQL table with many rows. The table has a popularity column. If I sort by popularity, I can get the rank of each item. Is it possible to retrieve the rank of a particular item without sorting the entire table? I don't think so. Is that correct?
An alternative would be to create a new column for storing rank, sort the entire table, and then loop through all the rows and update the rank. That is extremely inefficient. Is there perhaps a way to do this in a single query?
There is no way to calculate the order (what you call rank) of something without first sorting the table or storing the rank.
If your table is properly indexed however (index on popularity) it is trivial for the database to sort this so you can get your rank. I'd suggest something like the following:
Select all, including rank
SET @rank := 0;
SELECT t.*, @rank := @rank + 1
FROM table t
ORDER BY t.popularity;
To fetch an item with a specific "id" then you can simply use a subquery as follows:
Select one, including rank
SET @rank := 0;
SELECT * FROM (
  SELECT t.*, @rank := @rank + 1
  FROM table t
  ORDER BY t.popularity
) t2
WHERE t2.id = 1;
You are right that the second approach is inefficient if the rank column is updated on every table read. However, depending on how many updates there are to the database, you could calculate the rank on every update and store that; it is a form of caching. You are then turning a calculated field into a fixed-value field.
This video covers caching in MySQL, and although it is Rails-specific and a slightly different form of caching, it demonstrates a very similar caching strategy.
If you are using an InnoDB table then you may consider building a clustered index on the popularity column (only if ordering by popularity is a frequent query). The decision also depends on how varied the popularity column is (a range of 0 to 3 is not so good).
You can look at this info on clustered index to see if this works for your case: http://msdn.microsoft.com/en-us/library/ms190639.aspx
This refers to SQL server but the concept is the same, also look up mysql documentation on this.
If you're doing this using PDO then you need to modify the query to all be within the single statement in order to get it to work properly. See PHP/PDO/MySQL: Convert Multiple Queries Into Single Query
So hobodave's answer becomes something like:
SELECT t.*, (@count := @count + 1) as rank
FROM table t
CROSS JOIN (SELECT @count := 0) CONST
ORDER BY t.popularity;
hobodave's solution is very good. Alternatively, you could add a separate rank column and then, whenever a row's popularity is UPDATEd, query to determine whether that popularity update changed its ranking relative to the row above and below it, then UPDATE the 3 rows affected. You'd have to profile to see which method is more efficient.