I'm developing a scoreboard of sorts. The table structure is ID, UID, points, with UID linked to a user's account.
Now, I have this working somewhat, but I need one specific thing for this query to be pretty much perfect: picking a user based on rank.
I'll show you the SQL.
SELECT *, @rownum := @rownum + 1 AS `rank` FROM
(SELECT * FROM `points_table` `p`
ORDER BY `p`.`points` DESC
LIMIT 1)
`user_rank`,
(SELECT @rownum := 0) `r`, `accounts_table` `a`, `points_table` `p`
WHERE `a`.`ID` = `p`.`UID`
It's simple to have it pick people out by UID, but that's no good. I need this to pull the user by their rank (which is a, um, fake field ^_^' created on the fly). This is a bit too complex for me, as my SQL knowledge only covers simple queries; I have never delved into aliases or nested queries, so you'll have to explain fairly simply so I can get a grasp.
I think there are two problems here. From what I can gather, you want to do a join on two tables, order them by points, and then return the nth record.
I've put together an UNTESTED query. The inner query does a join on the two tables and the outer query specifies that only a specific row is returned.
This example returns the 4th row.
SELECT * FROM
(SELECT *, @rownum := @rownum + 1 AS rank
FROM `points_table` `p`
JOIN `accounts_table` `a` ON a.ID = p.UID,
(SELECT @rownum := 0) r
ORDER BY `p`.`points` DESC) mytable
WHERE rank = 4
Hopefully this works for you!
I've made a change to the answer which should hopefully resolve that problem. Incidentally, whether you use PHP or MySQL to get the rank, you are still putting a heavy strain on resources. Before MySQL can calculate the rank it must build a table of every user and then order them, so you are just moving the work from one area to another. As the number of users increases, so too will the query execution time, regardless of your solution. MySQL will probably take slightly longer to perform the calculations, which is why PHP is probably the more suitable place for them. But I also know from experience that sometimes extraneous details prevent you from having a completely elegant solution. Hope the altered code works.
Related
I have an Eloquent query that currently takes about 700ms to run, and that will only increase as I add more websites to the user account. I'm trying to work out the best way to optimize it so that it runs faster.
I really don't want to save the "results" of my calculations and then just fetch those in a smaller query later, because they could update at any moment, meaning they would not be accurate 100% of the time. Although I am pretty sure that would speed up the query, I don't want to sacrifice accuracy for performance.
This is essentially the raw query that runs:
select *
from
( SELECT `positions`.*,
@rank := IF(@group = keyword_id, @rank+1,
1) as rank_e0686ae02a55b8ad75aec0c7aaec0a21,
@group := keyword_id as group_e0686ae02a55b8ad75aec0c7aaec0a21
from
( SELECT @rank:=0, @group:=0 ) as vars,
positions
order by `keyword_id` asc, `created_at` desc
) as positions
where `rank_e0686ae02a55b8ad75aec0c7aaec0a21` <= '2'
and `positions`.`keyword_id` in ('hundreds of IDs listed here')
The query is generated using the solution mentioned here for getting N relations per record.
I've tried running a simpler query without the N-relations-per-record logic, and it actually ends up being even slower because it fetches much more data. So I think the problem is that there are too many IDs being matched in the query's IN clause.
In my controller I have:
$user = auth()->user();
$websites = $user->websitesAndKeywords();
In my User model:
public function websitesAndKeywords() {
$user = auth()->user();
$websites = $user->websites()->orderBy('url')->get();
$websites->load('keywords', 'keywords.latestPositions');
return $websites;
}
I would appreciate any help anyone could provide in helping me speed this thing up.
EDIT: So I think I figured it out. The problem is the IN clause that Laravel uses every time eager loading is used to load relations. So I need to find a way to do a JOIN instead of eager loading.
Essentially I need to convert this:
$websites->load('keywords', 'keywords.latestPositions');
Into:
$websites->load(['keywords' => function($query)
{
$query->join('positions', 'keywords.id', '=', 'positions.keyword_id');
}]);
That doesn't work, so I'm not sure of the best way to do a JOIN on an existing collection. Ideally I would also only fetch the latest N positions, not all the data.
Here are the indexes on the positions table, and the EXPLAIN output for the query (screenshots not reproduced here).
You need, if you don't have it yet, an index on positions(keyword_id), or maybe positions(keyword_id, created_at); which one depends on your data, on whether you want to keep using the "lazy evaluate", and on whether you want to use the trigger solution.
You also have to, as Rick suggested, move your keyword_id IN (...) into the inner query: MySQL cannot push it into the subquery on its own, because the optimizer doesn't understand that IF(@group = keyword_id, @rank+1, 1) does not need the other keywords to work properly.
This should give results in under 700ms for tables with several million rows (provided you don't retrieve them all in the IN list), and might be improved further by removing the "lazy evaluate" as Rick also suggested (so you do fewer table lookups for columns not included in your index), depending on your data.
If you still have trouble, you could actually precalculate the data, without loss of accuracy, by using triggers. This adds a (most likely small) overhead to your inserts/updates, so if you insert/update a lot and only query once in a while, you might not want to do it.
For this, you should really use the index positions (keyword_id,created_at).
Add another table keywordrank with the columns keyword_id, rank, and primarykeyofpositionstable, with primary key (keyword_id, rank). You need a separate table because, inside a trigger, MySQL can't update other rows of the table the trigger fires on.
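As a concrete sketch, the helper table could look like this (the column types are assumptions, since the positions schema isn't shown; note that rank became a reserved word in MySQL 8.0 and would need backticks there):
create table keywordrank (
    keyword_id int not null,
    rank int not null,
    primarykeyofpositionstable int not null,
    primary key (keyword_id, rank)
);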
Create a trigger that will update these ranks on every insert into your positions table:
delimiter $$
create trigger tr_positions_after_insert_updateranks after insert on positions
for each row
begin
delete from keywordrank where keyword_id = NEW.keyword_id;
insert into keywordrank (keyword_id, rank, primarykeyofpositionstable)
select NEW.keyword_id, ranks.rank, ranks.position_pk
from
(select NEW.keyword_id,
@rank := @rank+1 as rank,
`positions`.primarykeyofpositionstable as position_pk
from
(SELECT @rank:=0, @group:=0 ) as vars,
positions
where `positions`.keyword_id = NEW.keyword_id
order by `keyword_id` asc, `created_at` desc
) as ranks
where ranks.rank <= 2;
end$$
delimiter ;
If you want to be able to update or delete entries (or just to be safe if you ever do), add the same logic as update and delete triggers, running it for both old.keyword_id and new.keyword_id. To avoid duplicating the code, you can put it into a procedure and reuse it: e.g. create a procedure fctname(kwid int), put the whole trigger body in it with NEW.keyword_id replaced by kwid, and then call fctname(new.keyword_id) on insert, fctname(new.keyword_id) and fctname(old.keyword_id) on update, and fctname(old.keyword_id) on delete.
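A sketch of that refactoring, reusing the fctname placeholder from above (untested; the rank cutoff of 2 matches the original trigger):
delimiter $$
create procedure fctname(kwid int)
begin
delete from keywordrank where keyword_id = kwid;
insert into keywordrank (keyword_id, rank, primarykeyofpositionstable)
select kwid, ranks.rank, ranks.position_pk
from
(select @rank := @rank + 1 as rank,
`positions`.primarykeyofpositionstable as position_pk
from
(select @rank := 0) as vars,
positions
where `positions`.keyword_id = kwid
order by `created_at` desc
) as ranks
where ranks.rank <= 2;
end$$
create trigger tr_positions_after_update_updateranks after update on positions
for each row
begin
call fctname(NEW.keyword_id);
call fctname(OLD.keyword_id);
end$$
create trigger tr_positions_after_delete_updateranks after delete on positions
for each row
call fctname(OLD.keyword_id)$$
delimiter ;
The insert trigger from above then shrinks to a single call fctname(NEW.keyword_id); in its body.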
You need to initialize that table once (and again if you e.g. decide you need more ranks or another order); you can use any version of your code, e.g.
delete from keywordrank;
insert into keywordrank (keyword_id, rank, primarykeyofpositionstable)
select ranks.keyword_id, ranks.rank, ranks.position_pk
from
( SELECT `positions`.primarykeyofpositionstable as position_pk,
@rank := IF(@group = keyword_id, @rank+1,
1) as rank,
@group := keyword_id as keyword_id
from
( SELECT @rank:=0, @group:=0 ) as vars,
positions
order by `keyword_id` asc, `created_at` desc
) as ranks
where ranks.rank <= 2;
You can put both the trigger(s) and the init in your migration files (without the delimiter).
You then can just use a join to get your desired rows.
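For example, to fetch the precalculated top 2 per keyword (a sketch; it assumes id is the primary key of positions that you stored in primarykeyofpositionstable):
select p.*, kr.rank
from keywordrank kr
join positions p on p.id = kr.primarykeyofpositionstable
where kr.keyword_id in ('hundreds of IDs listed here')
order by kr.keyword_id asc, kr.rank asc;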
Update: the code without the trigger, using the index on (keyword_id, created_at). MySQL can compute the inner query entirely from the index and then only has to look up the found ids in the table data. How much of an effect removing the lazy evaluate has depends on the number of rows in your result relative to the size of the whole table.
select positions.*, poslist.rank, poslist.`group`
from positions
join
( SELECT `positions`.id,
@rank := IF(@group = keyword_id, @rank+1,
1) as rank,
@group := keyword_id as `group`
from
( SELECT @rank:=0, @group:=0 ) as vars,
positions
where `positions`.`keyword_id` in ('hundreds of IDs listed here')
order by `keyword_id` asc, `created_at` desc
) as poslist
on positions.id = poslist.id
where poslist.rank <= 2;
Check EXPLAIN to confirm it actually uses the correct index (keyword_id, created_at). If that is not fast enough, you should try the trigger solution. (Or add the new EXPLAIN output and SHOW PROFILE output so we can take a deeper look.)
The code for finding the "top 2 in each grouping" is the best I have ever seen. It is essentially the same as what I have in my blog post on the subject.
However, there are two other things that we may be able to improve on.
Move keyword_id IN (...) from the outer query to the inner one. I assume you have an index starting with keyword_id?
"Lazy evaluate". That is, instead of doing SELECT positions.*, ..., select only SELECT id, ..., where id is the PRIMARY KEY of positions. Then, in the outer query, JOIN back to positions to get the rest of the columns. Without seeing SHOW CREATE TABLE and knowing what percentage of the table is in the IN list, I can't be sure this will help much.
tl;dr - lots of accepted stackoverflow answers suggest using a subquery to affect the row returned by a GROUP BY clause. While this works, is it the best advice?
I understand there are many questions already about how to retrieve a specific row in a GROUP BY statement. Most of them revolve around using a subquery in the FROM clause: the subquery orders the table appropriately, and the GROUP BY is then run against the now-ordered temporary table. Some examples:
MySQL order by before group by
MySQL "Group By" and "Order By"
PostgreSQL removes the need for the subquery with the distinct on() clause.
Postgresql DISTINCT ON with different ORDER BY
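For reference, the DISTINCT ON form looks roughly like this (a sketch; posts, uuid and created_at are assumed names based on the description below):
SELECT DISTINCT ON (uuid) *
FROM posts
ORDER BY uuid, created_at DESC;
DISTINCT ON keeps the first row of each uuid group under the given ORDER BY, i.e. the newest version of each post.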
However, what I'm not understanding in any of these cases is how badly I'm shooting myself in the foot trying to do something the system may not have originally been designed for. Take the following two examples in PostgreSQL and MySQL,
http://sqlfiddle.com/#!15/3b0f2/1
http://sqlfiddle.com/#!2/6d337/1
In both cases I have a table of posts that contains multiple versions of the same post (signified by its UUID). I want to select the most recently published version of each post, ordered by its created_at field.
My biggest concern is that given the MySQL approach a temporary table is necessary. Ratchet this up to "web scale" (lolz) and I'm wondering if I'm in for a world of hurt. Should I rethink my schema or are there ways to optimize the subquery-parentquery relationship enough that it'll be alright?
It is definitely not the best advice. SQL itself (and the MySQL documentation as far as I can tell) has little to say about the results from a subquery with an order by. Although they may be ordered in practice, they are not guaranteed to be.
The more important issue is the use of "hidden columns" in the aggregation. Consider this basic query:
select t.*
from (select t.* from table t order by datecol) t
group by t.col;
Everything except t.col in the select comes from an indeterminate row. The specific documentation is (emphasis is mine):
MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Sorting of the result set occurs after values have been chosen, and ORDER BY does not affect which values within each group the server chooses.
A safe way to write such a query is:
select t.*
from table t
where not exists (select 1
from table t2
where t2.col = t.col and t2.datecol < t.datecol
);
This is not exactly the same, because it will return multiple rows if the minimum is not unique. The logic is "get me all rows in the table where there are no rows with the same col value and a smaller datecol value."
EDIT:
The question in your comment doesn't make sense, because nothing is discussing two queries. In MySQL you can use order by with variables to solve this:
select t.*
from (select t.*,
@rn := if(@col = col, @rn + 1, 1) as rn,
@col := col
from table t cross join
(select @col := '', @rn := 0) vars
order by col, datecol) t
where rn = 1;
This method should be faster than the order by with group by.
I'm trying to write a query that will find the most consecutive of something in my database. This has led to me trying out variables, which I've never really used before.
The problem I have is my query is giving me exactly the result I think it should, but when I use it as a subquery inside another query, it all seems to go to pot when I add the group by/order by clauses.
Is this normal, and if so what tends to be the solution? Or have I made a simple mistake?
The results of my subquery are perfect, and all I'm trying to do in the outer query is select the maximum of the "consecutive" column that I've created. This column takes the form of
@r := IF(nFound=nThis, @r + 1, 0)
I.e. it simply counts up 1 for each row that fits my where/order arrangement, and resets to 0 if a match isn't found.
I was hoping that the subquery results would be "set" and simply used as the values before being used in the main query.
I liken this to Excel: sometimes you want to "paste as values" rather than copying all of the formulas across, if you get what I mean. Is there a simple way to do that in MySQL?
I wondered if creating a view might "solidify" the data set, but then found out variables aren't allowed in views!
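One way to get that "paste as values" effect, since variables are ruled out in views, is to materialize the subquery into a temporary table and run the outer query against that (a sketch only; the commented placeholder stands in for the real inner query):
SET @r = 0;
CREATE TEMPORARY TABLE tmp_consec AS
SELECT @r := IF(nFound = nThis, @r + 1, 0) AS nConsec, src.*
FROM ( /* your ordered subquery here */ ) AS src;
SELECT MAX(nConsec) FROM tmp_consec;
DROP TEMPORARY TABLE tmp_consec;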
EDIT
OK, here's the query. It's not pretty, but I've been hacking around and trying lots of things. If you remove the last two lines and the MAX() function it works fine; with them it only returns a single row rather than 10 rows.
I've never used a cross join before today either; virtually everything I do normally seems to be just "JOIN" or "LEFT JOIN"s, but today it seemed necessary.
Basically the idea is to retrieve the maximum number of chronologically consecutive events that each person has been present at. Feel free to amend as you see fit!
The "P.person < 10" was just a test. There are in fact thousands of people, but if I tried to do it on everyone at once it was sitting and doing nothing for ages - the crossjoin getting too big, I assume?
SET @r=0;
SELECT person, MAX(nConsec) FROM (
SELECT @r := IF(nFound=person, @r + 1, 0) AS nConsec,
test.*
FROM (SELECT P.person, event, tDate, MAX(C.person) AS nFound
FROM PEOPLE P
CROSS JOIN EVENTS E
LEFT JOIN COMPETITORS C ON C.event=E.event AND C.person = P.person
WHERE P.person < 10
AND tDate < NOW()
GROUP BY P.person, event, tDate
ORDER BY P.person ASC, tDate ASC
) test
) test2
GROUP BY person
ORDER BY MAX(nConsec) DESC
EDIT 2
OK I've no idea what, but while changing some things to preserve a bit of anonymity, I seem to have inadvertently fixed my own code... A pleasant surprise, but annoying that no amount of ctrl-Z and ctrl-shift-Zing seems to be showing me what I was doing wrong in the first place!
Any opinion/advice on the mess I've got is still appreciated. I'm sure I can do something cleverer that doesn't use a cross join. There are about 30,000 rows in "people" and 1,000 in "events", and about 500 competitors per event, so I can see why a cross join gives me issues (15 billion rows, I make that...). The query takes 0.6 seconds for those 10 IDs that I picked out, and 34 seconds if I raise it to 1,000 IDs.
What does this do for you:
SELECT person, MAX(nConsec) AS numConsecutive FROM (
SELECT person, COUNT(*) AS nConsec FROM (
SELECT @r := @r + (COALESCE(@person, P.person) <> P.person) AS consecutive, @person := P.person AS person FROM (
SELECT @r := 0, @person := NULL
) vars
JOIN PEOPLE P
JOIN EVENTS E
LEFT JOIN COMPETITORS C
ON C.person = P.person
AND C.event = E.event
ORDER BY tDate
) runs
GROUP BY consecutive
) t
GROUP BY person
Modified from code found at http://www.dancewithgrenades.com/blog/mysql-consecutive-row-streaks.
Note that if you're counting across multiple people, you need to keep track of the person you're counting for (the @person variable). I think this should run quicker, though, mostly due to the lack of GROUPing in the innermost subquery, which was probably having a large impact on performance. If performance still isn't good enough, then I'd suggest creating a column in PEOPLE to hold this consecutive-attendance value, modifying the query to work on only one person at a time, and running the query for different sets of users at different times to update the value in PEOPLE.
Oh, and as far as CROSS JOINs go: in MySQL, CROSS JOIN is equivalent to INNER JOIN, which is equivalent to JOIN. You've used cross joins before, you just didn't realize it. ;)
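In other words, MySQL parses all three of these identically (unlike standard SQL, where CROSS JOIN takes no ON clause); t1 and t2 are just placeholder tables:
SELECT * FROM t1 JOIN t2 ON t1.id = t2.t1_id;
SELECT * FROM t1 INNER JOIN t2 ON t1.id = t2.t1_id;
SELECT * FROM t1 CROSS JOIN t2 ON t1.id = t2.t1_id;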
I need to get the last (newest) row in a table (using MySQL's natural order - i.e. what I get without any kind of ORDER BY clause), however there is no key I can ORDER BY on!
The only 'key' in the table is an indexed MD5 field, so I can't really ORDER BY on that. There's no timestamp, autoincrement value, or any other field that I could easily ORDER on either. This is why I'm left with only the natural sort order as my indicator of 'newest'.
And, unfortunately, changing the table structure to add a proper auto_increment is out of the question. :(
Anyone have any ideas on how this can be done w/ plain SQL, or am I SOL?
If it's MyISAM you can do it in two queries
SELECT COUNT(*) FROM yourTable;
SELECT * FROM yourTable LIMIT useTheCountHere - 1,1;
This is unreliable however because
It assumes rows are only added to this table and never deleted.
It assumes no other writes are performed to this table in the meantime (you can lock the table, as sketched below).
MyISAM tables can be reordered using ALTER TABLE, so that the insert order is no longer preserved.
It's not reliable at all in InnoDB, since this engine can reorder the table at will.
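If you do use the MyISAM approach, a read lock keeps the two statements consistent. A sketch (LIMIT does not accept expressions, so your application has to substitute the actual count):
LOCK TABLES yourTable READ;
SELECT COUNT(*) FROM yourTable;
-- suppose the count came back as 42: offsets are zero-based, so the last row is
SELECT * FROM yourTable LIMIT 41, 1;
UNLOCK TABLES;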
Can I ask why you need to do this?
In Oracle, and possibly in MySQL too, the optimiser will choose the quickest access path to return your results. So even if your data were static, you could potentially run the same query twice and get a different answer.
You can assign row numbers using the ROW_NUMBER() window function (available since MySQL 8.0) and then sort by this value using the ORDER BY clause.
SELECT *,
ROW_NUMBER() OVER() AS rn
FROM table
ORDER BY rn DESC
LIMIT 1;
Basically, you can't do that.
Normally I'd suggest adding a surrogate primary key with auto-increment and ORDER BY that:
SELECT *
FROM yourtable
ORDER BY id DESC
LIMIT 1
But in your question you write...
changing the table structure to add a proper auto_increment is out of the question.
So another less pleasant option I can think of is using a simulated ROW_NUMBER using variables:
SELECT * FROM
(
SELECT T1.*, @rownum := @rownum + 1 AS rn
FROM yourtable T1, (SELECT @rownum := 0) T2
) T3
ORDER BY rn DESC
LIMIT 1
Please note that this has serious performance implications: it requires a full scan, and the results are not guaranteed to be returned in any particular order by the subquery - you might get them in insertion order, but then again you might not. When you don't specify the order, the server is free to choose any order it likes. It will probably choose the order the rows are stored in on disk, in order to do as little work as possible, but relying on this is unwise.
Without an order by clause you have no guarantee of the order in which you will get your result. The SQL engine is free to choose any order.
But if for some reason you still want to rely on this order, then the following will indeed return the last record from the result (MySql only):
select *
from (select *,
@rn := @rn + 1 rn
from mytable,
(select @rn := 0) init
) numbered
where rn = @rn
In the subquery the records are retrieved without an order by and are given a sequential number. The outer query then selects only the one that got the last attributed number.
We can use HAVING for that kind of problem:
SELECT MAX(id) AS last_id, column1, column2 FROM table HAVING id = last_id;
I have a MySQL table with many rows. The table has a popularity column. If I sort by popularity, I can get the rank of each item. Is it possible to retrieve the rank of a particular item without sorting the entire table? I don't think so. Is that correct?
An alternative would be to create a new column for storing rank, sort the entire table, and then loop through all the rows and update the rank. That is extremely inefficient. Is there perhaps a way to do this in a single query?
There is no way to calculate the order (what you call rank) of something without first sorting the table or storing the rank.
If your table is properly indexed however (index on popularity) it is trivial for the database to sort this so you can get your rank. I'd suggest something like the following:
Select all, including rank
SET @rank := 0;
SELECT t.*, @rank := @rank + 1
FROM table t
ORDER BY t.popularity;
To fetch an item with a specific "id" then you can simply use a subquery as follows:
Select one, including rank
SET @rank := 0;
SELECT * FROM (
SELECT t.*, @rank := @rank + 1
FROM table t
ORDER BY t.popularity
) t2
WHERE t2.id = 1;
You are right that the second approach is inefficient if the rank column is updated on every table read. However, depending on how many updates there are to the database, you could calculate the rank on every update and store it - a form of caching. You are then turning a calculated field into a fixed-value field.
This video covers caching in MySQL, and although it is Rails-specific and covers a slightly different form of caching, it describes a very similar caching strategy.
If you are using an InnoDB table then you may consider building a clustered index on the popularity column (only if ordering by popularity is a frequent query). The decision also depends on how varied the popularity column is (a range of only 0-3 is not so good).
You can look at this info on clustered index to see if this works for your case: http://msdn.microsoft.com/en-us/library/ms190639.aspx
This refers to SQL Server, but the concept is the same; also look up the MySQL documentation on this.
If you're doing this using PDO then you need to modify the query so it's all within a single statement in order to get it to work properly. See PHP/PDO/MySQL: Convert Multiple Queries Into Single Query
So hobodave's answer becomes something like:
SELECT t.*, (@count := @count + 1) as rank
FROM table t
CROSS JOIN (SELECT @count := 0) CONST
ORDER BY t.popularity;
hobodave's solution is very good. Alternatively, you could add a separate rank column and then, whenever a row's popularity is UPDATEd, query to determine whether that popularity update changed its ranking relative to the row above and below it, then UPDATE the 3 rows affected. You'd have to profile to see which method is more efficient.
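If the row-above/row-below bookkeeping turns out to be fiddly, a blunter variant of the same caching idea is to recompute the whole stored column in one statement whenever popularity changes (a sketch; table, id, popularity and cached_rank are placeholder names):
SET @rank := 0;
UPDATE table t
JOIN (
SELECT id, (@rank := @rank + 1) AS new_rank
FROM table
ORDER BY popularity
) ranked ON ranked.id = t.id
SET t.cached_rank = ranked.new_rank;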