Analysing what to improve via an EXPLAIN query on MySQL

I'm not that well versed in indexing in MySQL, and am having a hard time trying to understand how the EXPLAIN output works, and how it can be read to know if my query is optimised or not.
I have a fairly large table (1.1M records) and I am executing the below query:
SELECT * FROM `Member` this_
WHERE (this_._Temporary_Flag = 0 or this_._Temporary_Flag is null)
  and (this_._Deleted = 0 or this_._Deleted is null)
  and (this_.Username = 'XXXXXXXX' or this_.Email = 'XXXXXXXX')
ORDER BY this_.Priority asc;
It takes a very long time to execute, between 30 and 60 seconds most of the time. The output of the EXPLAIN query is as below:
id select_type table type possible_keys key key_len ref rows Extra
----------------------------------------------------------------------------------------------------------------------------------
1 SIMPLE this_ ref_or_null _Temporary_Flag,_Deleted,username,email _Temporary_Flag 2 const 33735 Using where; Using filesort
What does this statement exactly mean? Does it mean that this query can be optimised? The table has mostly single-column indexes. Which parts of the EXPLAIN output are the important ones to look at?

It is saying that the index it has chosen to use is the one called _Temporary_Flag (which I assume is on the _Temporary_Flag column). This is not a great index to use (it still leaves it looking at 33k records), but the best it can use in the situation. It might be worth adding an index covering both the _Temporary_Flag and the _Deleted columns.
However, I doubt that narrows things down much.
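If you want to try it, such a composite index could be added along these lines (the index name here is just a placeholder; how much it helps depends on how selective the two flag columns actually are):
ALTER TABLE `Member`
  ADD INDEX idx_tempflag_deleted (_Temporary_Flag, _Deleted);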
One issue is that MySQL can only use a single index on a table within a query. It is likely that the best indexes to use would be one on Username and another on Email, but as your query has an OR there, it would have to choose one or the other.
A way around this restriction on indexes is to use two queries UNIONed together, something like this:
SELECT *
FROM `Member` this_
WHERE (this_._Temporary_Flag = 0 or this_._Temporary_Flag is null)
  and (this_._Deleted = 0 or this_._Deleted is null)
  and this_.Email = 'XXXXXXXX'
UNION
SELECT *
FROM `Member` this_
WHERE (this_._Temporary_Flag = 0 or this_._Temporary_Flag is null)
  and (this_._Deleted = 0 or this_._Deleted is null)
  and this_.Username = 'XXXXXXXX'
ORDER BY this_.Priority asc;

http://dev.mysql.com/doc/refman/5.5/en/explain-output.html
EXPLAIN tells you what MySQL is doing; it doesn't necessarily tell you, or even imply, what can be done to make things better.
That said, there are a few warning signs that generally imply that you can optimize a query; the biggest one in this case is the occurrence of Using filesort in the Extra column.
The documentation explains what happens in that case:
MySQL must do an extra pass to find out how to retrieve the rows in
sorted order. The sort is done by going through all rows according to
the join type and storing the sort key and pointer to the row for all
rows that match the WHERE clause. The keys then are sorted and the
rows are retrieved in sorted order.
Another warning sign here is the key that is used. While not necessarily true in your case, a well-normalized structure will generally require unique values for Username and Email.
So, why does it take so long when you are specifying those two things? Shouldn't the optimizer be able to just go straight to those rows? Probably not, because you are specifying them with an OR, which makes it difficult for the optimizer to use indexes to find those rows.
Instead, the optimizer decided to use the _Temporary_Flag index to look through all the results, which probably didn't narrow the result set much, especially given that the EXPLAIN says that approximately 33735 rows were looked at.
So, working on the assumption that email and username will be much more selective than this key, you could try rewriting your query as a UNION.
SELECT * FROM `Member` this_
WHERE (this_._Temporary_Flag = 0 or this_._Temporary_Flag is null)
  and (this_._Deleted = 0 or this_._Deleted is null)
  and this_.Email = 'XXXXXXXX'
UNION
SELECT * FROM `Member` this_
WHERE (this_._Temporary_Flag = 0 or this_._Temporary_Flag is null)
  and (this_._Deleted = 0 or this_._Deleted is null)
  and this_.Username = 'XXXXXXXX'
ORDER BY this_.Priority asc;
So, those are a couple of warning signs from EXPLAIN: Look for Using filesort and strange key choices as indicators that you can probably improve things.

Related

SQL gets slow on a simple query with ORDER BY

I have a problem with MySQL ORDER BY: it slows down the query and I really don't know why. My query was a little more complex, so I simplified it to a light query with no joins, but it still runs really slowly.
Query:
SELECT
W.`oid`
FROM
`z_web_dok` AS W
WHERE
W.`sent_eRacun` = 1 AND W.`status` IN(8, 9) AND W.`Drzava` = 'BiH'
ORDER BY W.`oid` ASC
LIMIT 0, 10
The table has 946,566 rows and takes about 500 MB of memory. The fields I am selecting are all indexed, as follows:
oid - INT PRIMARY KEY AUTOINCREMENT
status - INT INDEXED
sent_eRacun - TINYINT INDEXED
Drzava - VARCHAR(3) INDEXED
I am posting screenshots of the EXPLAIN output, of the query executed against the database, and of the timing after I remove the ORDER BY (the screenshots are not reproduced here).
I have also tried sorting by a DATETIME field which is also indexed, but I get the same slow query as when ordering by the primary key. This started today; it has always been fast and light before.
What can cause something like this?
The kind of query you use here calls for a composite covering index. This one should handle your query very well.
CREATE INDEX someName ON z_web_dok (Drzava, sent_eRacun, status, oid);
Why does this work? You're looking for equality matches on the first three columns, and sorting on the fourth column. The query planner will use this index to satisfy the entire query. It can random-access the index to find the first row matching your query, then scan through the index in order to get the rows it needs.
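One way to check that the new index is actually being picked up is to re-run EXPLAIN and look for the index name in the key column and for "Using index" in Extra (a sketch; exact output varies by MySQL version):
EXPLAIN
SELECT W.`oid`
FROM `z_web_dok` AS W
WHERE W.`sent_eRacun` = 1 AND W.`status` IN (8, 9) AND W.`Drzava` = 'BiH'
ORDER BY W.`oid` ASC
LIMIT 0, 10;
-- Expected: key = someName and Extra containing "Using index".
-- Whether "Using filesort" also disappears depends on how the optimizer
-- handles the IN(8, 9); see the next answer for a workaround.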
Pro tip: indexes on single columns are generally not helpful to performance (and they add write overhead) unless they happen to match the requirements of particular queries in your application, or are used for primary or foreign keys. You generally choose your indexes to match your most active, or your slowest, queries. Edit: You asked whether it's better to create specific indexes for each query in your application. The answer is yes.
There may be an even faster way. (Or it may not be any faster.)
The IN(8, 9) gets in the way of handling the WHERE..ORDER BY..LIMIT completely efficiently. A possible solution is to treat it as an OR, then convert to a UNION and do some tricks with the LIMIT, especially if you are also using OFFSET.
( SELECT ... WHERE .. = 8 AND ... ORDER BY oid LIMIT 10 )
UNION ALL
( SELECT ... WHERE .. = 9 AND ... ORDER BY oid LIMIT 10 )
ORDER BY oid LIMIT 10
This will allow the covering index described by OJones to be fully used in each of the subqueries. Furthermore, each will provide up to 10 rows without any temp table or filesort. Then the outer part will sort up to 20 rows and deliver the 'correct' 10.
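Spelled out for this particular query, the rewrite might look roughly like this (a sketch, assuming the covering index suggested above is in place):
( SELECT W.`oid`
  FROM `z_web_dok` AS W
  WHERE W.`sent_eRacun` = 1 AND W.`status` = 8 AND W.`Drzava` = 'BiH'
  ORDER BY W.`oid` ASC
  LIMIT 10 )
UNION ALL
( SELECT W.`oid`
  FROM `z_web_dok` AS W
  WHERE W.`sent_eRacun` = 1 AND W.`status` = 9 AND W.`Drzava` = 'BiH'
  ORDER BY W.`oid` ASC
  LIMIT 10 )
ORDER BY `oid` ASC
LIMIT 10;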
For OFFSET, see http://mysql.rjweb.org/doc.php/index_cookbook_mysql#or

PHP; MySQL JOIN query on large datasets gets slower as WHERE conditions update

So this might be a bit silly, but the alternative I was using is worse. I am trying to write an excel sheet using data from my database and a PHP tool called Box/Spout. The thing is that Box/Spout reads rows one at a time, and they are not retrieved via index ( e.g. rows[10], rows[42], rows[156] )
I need to retrieve data from the database in the order the rows come out. I have a database with a list of customers that came in via an import, and I have to write them into the Excel spreadsheet. They have phone numbers, emails, and an address. Sorry for the confusion... :/ So I compiled this fairly complex query:
SELECT
`Import`.`UniqueID`,
`Import`.`RowNum`,
`People`.`PeopleID`,
`People`.`First`,
`People`.`Last`,
GROUP_CONCAT(
DISTINCT CONCAT_WS(',', `PhonesTable`.`Phone`, `PhonesTable`.`Type`)
ORDER BY `PhonesTable`.`PhoneID` DESC
SEPARATOR ';'
) AS `Phones`,
GROUP_CONCAT(
DISTINCT CONCAT_WS(',', `EmailsTable`.`Email`)
ORDER BY `EmailsTable`.`EmailID` DESC
SEPARATOR ';'
) AS `Emails`,
`Properties`.`Address1`,
`Properties`.`city`,
`Properties`.`state`,
`Properties`.`PostalCode5`,
...(17 more `People` Columns)...,
FROM `T_Import` AS `Import`
LEFT JOIN `T_CustomerStorageJoin` AS `CustomerJoin`
ON `Import`.`UniqueID` = `CustomerJoin`.`ImportID`
LEFT JOIN `T_People` AS `People`
ON `CustomerJoin`.`PersID`=`People`.`PeopleID`
LEFT JOIN `T_JoinPeopleIDPhoneID` AS `PeIDPhID`
ON `People`.`PeopleID` = `PeIDPhID`.`PeopleID`
LEFT JOIN `T_Phone` AS `PhonesTable`
ON `PeIDPhID`.`PhoneID`=`PhonesTable`.`PhoneID`
LEFT JOIN `T_JoinPeopleIDEmailID` AS `PeIDEmID`
ON `People`.`PeopleID` = `PeIDEmID`.`PeopleID`
LEFT JOIN `T_Email` AS `EmailsTable`
ON `PeIDEmID`.`EmailID`=`EmailsTable`.`EmailID`
LEFT JOIN `T_JoinPeopleIDPropertyID` AS `PeIDPrID`
ON `People`.`PeopleID` = `PeIDPrID`.`PeopleID`
AND `PeIDPrID`.`PropertyCP`='CurrentImported'
LEFT JOIN `T_Property` AS `Properties`
ON `PeIDPrID`.`PropertyID`=`Properties`.`PropertyID`
WHERE `Import`.`CustomerCollectionID`=$ccID
AND `RowNum` >= $rnOffset
AND `RowNum` < $rnLimit
GROUP BY `RowNum`;
So I have indexes on every ON segment and on the WHERE segment. When RowNumber is around 0->2500 in value, the query runs great and executes within a couple of seconds. But the query execution time seems to grow exponentially the larger RowNumber gets.
I have an EXPLAIN here, and on pastebin ( https://pastebin.com/PksYB4n2 ):
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE Import NULL ref CustomerCollectionID,RowNumIndex CustomerCollectionID 4 const 48108 8.74 Using index condition; Using where; Using filesort;
1 SIMPLE CustomerJoin NULL ref ImportID ImportID 4 MyDatabase.Import.UniqueID 1 100 NULL
1 SIMPLE People NULL eq_ref PRIMARY,PeopleID PRIMARY 4 MyDatabase.CustomerJoin.PersID 1 100 NULL
1 SIMPLE PeIDPhID NULL ref PeopleID PeopleID 5 MyDatabase.People.PeopleID 8 100 NULL
1 SIMPLE PhonesTable NULL eq_ref PRIMARY,PhoneID,PhoneID_2 PRIMARY 4 MyDatabase.PeIDPhID.PhoneID 1 100 NULL
1 SIMPLE PeIDEmID NULL ref PeopleID PeopleID 5 MyDatabase.People.PeopleID 5 100 NULL
1 SIMPLE EmailsTable NULL eq_ref PRIMARY,EmailID,DupeDeleteSelect PRIMARY 4 MyDatabase.PeIDEmID.EmailID 1 100 NULL
1 SIMPLE PeIDPrID NULL ref PeopleMSCP,PeopleID,PropertyCP PeopleMSCP 5 MyDatabase.People.PeopleID 4 100 Using where
1 SIMPLE Properties NULL eq_ref PRIMARY,PropertyID PRIMARY 4 MyDatabase.PeIDPrID.PropertyID 1 100 NULL
I apologize if the formatting is absolutely terrible. I'm not sure what good formatting looks like so I may have jumbled it a bit on accident, plus the tabs got screwed up.
What I want to know is how to speed up the query time. The databases are very large, like in the 10s of millions of rows. And they aren't always like this as our tables are constantly changing, however I would like to be able to handle it when they are.
I tried using LIMIT 2000, 1000 for example, but I know that it's less efficient than using an indexed column, so I switched over to RowNumber. I feel like this was a good decision, but it seems like MySQL is still looping over every single row before the offset, which kind of defeats the purpose of my index... I think? I'm not sure. I also basically split this particular query into about 10 singular queries and ran them one by one, for each row of the Excel file. That takes a LONG time... TOO LONG. The combined query above is fast by comparison, but obviously I'm still having a problem.
Any help would be greatly appreciated, and thank you ahead of time. I'm sorry again for my lack of post organization.
The order of the columns in an index matters. The order of the clauses in WHERE does not matter (usually).
INDEX(a), INDEX(b) is not the same as the "composite" INDEX(a,b). I deliberately made composite indexes where they seemed useful.
INDEX(a,b) and INDEX(b,a) are not interchangeable unless both a and b are tested with =. (Plus a few exceptions.)
A "covering" index is one where all the columns for the one table are found in the one index. This sometimes provides an extra performance boost. Some of my recommended indexes are "covering". It implies that only the index BTree need be accessed, not also the data BTree; this is where it picks up some speed.
In EXPLAIN SELECT ... a "covering" index is indicated by "Using index" (which is not the same as "Using index condition"). (Your Explain shows no covering indexes currently.)
An index 'should not' have more than 5 columns. (This is not a hard and fast rule.) Some of the indexes suggested below have several columns in order to be covering; for other tables a covering index was not practical.
When JOINing, the order of the tables does not matter; the Optimizer is free to shuffle them around. However, these "rules" apply:
A LEFT JOIN may force ordering of the tables. (I think it does in this case.) (I ordered the columns based on what I think the Optimizer wants; there may be some flexibility.)
The WHERE clause usually determines which table to "start with". (You filter only on Import, so obviously it will start with Import.)
The "next table" to be referenced (via NLJ - Nested Loop Join) is determined by a variety of things. (In your case it is pretty obvious -- namely the ON column(s).)
More on indexing: http://mysql.rjweb.org/doc.php/index_cookbook_mysql
Revised Query
1. Import: (CustomerCollectionID,   -- '=' comes first
            RowNum,                 -- 'range'
            UniqueID)               -- 'covering'
   -- Import shows up in WHERE, so is first in Explain; also due to the LEFTs
   Properties: (PropertyID)         -- is that the PK?
   PeIDPrID:   (PropertyCP, PeopleID, PropertyID)
3. People: (PeopleID)
   -- I assume that is the `PRIMARY KEY`? (Too many columns for "covering")
   -- (Since `People` leads to 3 other tables, I won't number the rest.)
   EmailsTable: (EmailID, Email)
   PeIDEmID:    (PeopleID,          -- JOIN from People
                 EmailID)           -- covering
   PhonesTable: (PhoneID, Type, Phone)
   PeIDPhID:    (PeopleID, PhoneID)
2. CustomerJoin: (ImportID,         -- coming from `Import` (see ON...)
                  PersID)           -- covering
After adding those, I expect most lines of EXPLAIN to say Using index.
The lack of at least a composite index on Import is the main problem leading to your performance complaint.
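As DDL, the recommendations above might look roughly like this (the index names are illustrative, not from the original schema; skip any that duplicate an existing index or primary key):
ALTER TABLE T_Import
  ADD INDEX ix_import_cc_rownum    (CustomerCollectionID, RowNum, UniqueID);
ALTER TABLE T_CustomerStorageJoin
  ADD INDEX ix_csj_import_pers     (ImportID, PersID);
ALTER TABLE T_JoinPeopleIDPhoneID
  ADD INDEX ix_jpph_people_phone   (PeopleID, PhoneID);
ALTER TABLE T_JoinPeopleIDEmailID
  ADD INDEX ix_jpem_people_email   (PeopleID, EmailID);
ALTER TABLE T_JoinPeopleIDPropertyID
  ADD INDEX ix_jppr_cp_people_prop (PropertyCP, PeopleID, PropertyID);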
Bad GROUP BY
When there is a GROUP BY that does not include all the non-aggregated columns that are not directly dependent on the group by column(s), you get random values for the extras. I see from the EXPLAIN ("Rows") that several tables probably have multiple rows. You really ought to think about the garbage being generated by this query.
Curiously, Phones and Emails are fed into GROUP_CONCAT(), thereby avoiding the above issue, but the "Rows" is only 1.
(Read about ONLY_FULL_GROUP_BY; it might explain the issue better.)
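For instance, turning that mode on for your session makes MySQL reject such a query outright instead of silently returning arbitrary values (note this replaces any other modes currently set for the session):
SET SESSION sql_mode = 'ONLY_FULL_GROUP_BY';
-- The original query will now fail with an error along the lines of
-- "Expression #N of SELECT list is not in GROUP BY clause and contains
-- nonaggregated column ...", until the select list and GROUP BY agree.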
(I'm listing this as a separate Answer since it is orthogonal to my other Answer.)
I call this the "explode-implode" syndrome. The JOINs explode the row count, putting many rows into an intermediate table; then the GROUP BY implodes it back down to the original set of rows.
Let me focus on a portion of the query that could be reformulated to provide a performance improvement:
SELECT ...
GROUP_CONCAT(
DISTINCT CONCAT_WS(',', `EmailsTable`.`Email`)
ORDER BY `EmailsTable`.`EmailID` DESC
SEPARATOR ';'
) AS `Emails`,
...
FROM ...
LEFT JOIN `T_Email` AS `EmailsTable`
ON `PeIDEmID`.`EmailID`=`EmailsTable`.`EmailID`
...
GROUP BY `RowNum`;
Instead, move the table and the aggregation function into a correlated subquery:
SELECT ...
( SELECT GROUP_CONCAT(
DISTINCT CONCAT_WS(',', `Email`)
ORDER BY `EmailID` DESC
SEPARATOR ';' )
FROM T_Email
WHERE `PeIDEmID`.`EmailID` = `EmailID`
) AS `Emails`,
...
FROM ...
-- and Remove: LEFT JOIN `T_Email` ON ...
...
-- and possibly Remove: GROUP BY ...;
Ditto for PhonesTable.
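Following the same pattern, the Phones column would become something like this (a sketch of the same transformation applied to T_Phone):
SELECT ...
       ( SELECT GROUP_CONCAT(
                  DISTINCT CONCAT_WS(',', `Phone`, `Type`)
                  ORDER BY `PhoneID` DESC
                  SEPARATOR ';' )
         FROM T_Phone
         WHERE `PeIDPhID`.`PhoneID` = `PhoneID`
       ) AS `Phones`,
...
FROM ...
-- and Remove: LEFT JOIN `T_Phone` ON ...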
(It is unclear whether the GROUP BY can be removed; other things may need it.)

MySQL LIKE query does not show full table scan in EXPLAIN

I have a query running in MySQL like so (obfuscated names):
explain
select
    this_.id as id1_0_,
    this_.column1 as column1,
    this_.column2 as column2,
    this_.column3 as column3,
    this_.column4 as column4,
    this_.column5 as column5
from
    tablename this_
where
    this_.column1 like '/blah%'
    and this_.column2 = 'a9b51a14-4338-94f7-f23dbf9d539e'
    and this_.column3 <> 'DUH'
    and this_.column4 = 0
    and this_.column5 like '%somename%'
order by this_.created desc
limit 20
Edit: column1 has a BTREE index; column2, column3, column4, column5, and created all have HASH indexes.
Table has one foreign key which is selected in the select clause but not the WHERE clause.
I'm told and read that the
like %somename%
will result in a full table scan. However, when i ran the explain, the output of the explain is
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE this_ ref somecolumnandinexnames 111 const 30414 Using where
The explain output looks exactly the same if I take away the like clause.
Based on this, we decided to put the query in production only to discover that in practice the query with like took way longer to execute (some seconds compared to some millis without the like).
Is there an explanation as to why the explain didn't warn me beforehand about this?
Edit: Observations
Taking away the order by makes the query go fast again even with the LIKE still in there.
Splitting into a subquery with the like in the outer query as mentioned below in the post actually works!
As #Uueerdo says, moving the rest of the conditions into a subquery actually speeds up performance! I'm therefore tempted to conclude that one of the things that could be happening is that the WHERE clause with the LIKE is executed before the other conditions, leading to a large result set. However, I still don't have an explanation of why removing the ORDER BY speeds up performance. The query selects all of 10 rows, so the ORDER BY should be quite fast.
Is there a way I can see the order in which MySQL evaluates the query? I think I remember seeing some sort of graphical representation in MS SQL Server explain plans once. I don't remember if it was quite the same.
Even if it does not require a table scan it can still be expensive; what is likely happening is MySQL is using other conditions in your where for initial candidate row selection, and then reducing those results with the rest of the conditions.
If there are a large number of candidates, and/or column5 values are long, that condition could take some time to evaluate. Keep in mind the LIMIT occurs after the WHERE, so that does not reduce the amount of work needed.
You might see some improvement if you put most of the query in a subquery, and filter its results by the like '%somename%' condition in an outer query.
Something like:
SELECT * FROM (
SELECT t.id as id1_0_
, t.column1, t.column2, t.column3, t.column4, t.column5
, t.created
FROM tablename AS t
WHERE t.column2 = 'a9b51a14-4338-94f7-f23dbf9d539e'
AND t.column3 <> 'DUH'
AND t.column4=0
AND t.column1 like '/blah%'
) AS subQ
WHERE subQ.column5 like '%somename%'
ORDER BY subQ.created DESC
LIMIT 20
If you have an index that includes all of the columns being read (in the SELECT or WHERE clause), MySQL will be able to read all of these values out of the index without scanning the table.
Also note that all primary key columns will be in the index as well.
In that case, even though it isn't scanning the full table, it will be scanning every row in the index in order to handle the LIKE '%somename%' query, so it's not going to be especially more efficient.

MySQL join query performance issue

I am running the below query:
SELECT packages.id, packages.title, subcat.id, packages.weight
FROM packages ,provider, packagestosubcat,
packagestocity, subcat, usertosubcat,
usertocity, usertoprovider
WHERE packages.endDate >'2011-03-11 06:00:00' AND
usertosubcat.userid = 1 AND
usertocity.userid = 1 AND
packages.providerid = provider.id AND
packages.id = packagestosubcat.packageid AND
packages.id = packagestocity.packageid AND
packagestosubcat.subcatid = subcat.id AND
usertosubcat.subcatid = packagestosubcat.subcatid AND
usertocity.cityid = packagestocity.cityid AND
(
provider.providertype = 'reg' OR
(
usertoprovider.userid = 1 AND
provider.providertype != 'reg' AND
usertoprovider.providerid = provider.ID
)
)
GROUP BY packages.title
ORDER BY subcat.id, packages.weight DESC
When I run EXPLAIN, everything seems to look OK except for the scan on the usertoprovider table, which doesn't seem to be using the table's keys:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE usertocity ref user,city user 4 const 4 Using temporary; Using filesort
1 SIMPLE packagestocity ref city,packageid city 4 usertocity.cityid 419
1 SIMPLE packages eq_ref PRIMARY,enddate PRIMARY 4 packagestocity.packageid 1 Using where
1 SIMPLE provider eq_ref PRIMARY,providertype PRIMARY 4 packages.providerid 1 Using where
1 SIMPLE packagestosubcat ref subcatid,packageid packageid 4 packages.id 1 Using where
1 SIMPLE subcat eq_ref PRIMARY PRIMARY 4 packagestosubcat.subcatid 1
1 SIMPLE usertosubcat ref userid,subcatid subcatid 4 const 12 Using where
1 SIMPLE usertoprovider ALL userid,providerid NULL NULL NULL 3735 Using where
As you can see in the above query, the condition itself is:
provider.providertype = 'reg' OR
(
usertoprovider.userid = 1 AND
provider.providertype != 'reg' AND
usertoprovider.providerid = provider.ID
)
Both tables, provider and usertoprovider, are indexed. provider has indexes on providerid and providertype while usertoprovider has indexes on userid and providerid
The cardinality of the keys is:
provider.id=47, provider.type=1, usertoprovider.userid=1245, usertoprovider.providerid=6
So it's quite obvious that the indexes are not used.
Furthermore, to test it out, I went ahead and:
Duplicated the usertoprovider table
Inserted all the provider values that have providertype='reg' into the cloned table
Simplified the condition to (usertoprovider.userid = 1 AND usertoprovider.providerid = provider.ID)
The query execution time changed from 8.1317 sec. to 0.0387 sec.
Still, provider values that have providertype='reg' are valid for all the users, and I would like to avoid inserting these values into the usertoprovider table for every user since this data is redundant.
Can someone please explain why MySQL still runs a full scan and doesn't use the keys? What can be done to avoid it?
It seems that provider.providertype != 'reg' is redundant (always true) unless provider.providertype is nullable and you want the query to fail on NULL.
And shouldn't != be <> instead to be standard SQL, although MySQL may allow !=?
On cost of table scans
A full table scan is not necessarily more expensive than walking an index, because walking an index still requires multiple page accesses. In many database engines, if your table is small enough to fit inside a few pages, and the number of rows is small enough, it will be cheaper to do a table scan. Database engines make this type of decision based on the data and index statistics of the table.
This case
However, in your case, it might also be because of the other leg in your OR clause: provider.providertype = 'reg'. If providertype is "reg", then this query joins in ALL the rows of usertoprovider (most likely not what you want) since it is a multi-table cross join.
The database engine is correct in determining that you'll likely need all the table rows in usertoprovider anyway (unless none of the providertypes is "reg", but the engine may well know that too!).
The query hides this fact because you are grouping on the (MASSIVE!) result set later on and just returning the package ID, so you won't see how many usertoprovider rows have been returned. But it will run very slowly. Get rid of the GROUP BY clause to find out how many rows you are actually forcing the database engine to work on!!!
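One concrete way to see that intermediate row count is to keep the same FROM and WHERE but strip the select list, GROUP BY and ORDER BY down to a plain count (a sketch built from the original query):
SELECT COUNT(*)
FROM packages, provider, packagestosubcat,
     packagestocity, subcat, usertosubcat,
     usertocity, usertoprovider
WHERE packages.endDate > '2011-03-11 06:00:00' AND
      usertosubcat.userid = 1 AND
      usertocity.userid = 1 AND
      packages.providerid = provider.id AND
      packages.id = packagestosubcat.packageid AND
      packages.id = packagestocity.packageid AND
      packagestosubcat.subcatid = subcat.id AND
      usertosubcat.subcatid = packagestosubcat.subcatid AND
      usertocity.cityid = packagestocity.cityid AND
      (
        provider.providertype = 'reg' OR
        (
          usertoprovider.userid = 1 AND
          provider.providertype != 'reg' AND
          usertoprovider.providerid = provider.ID
        )
      );
-- This count is the number of rows the engine has to build before
-- the GROUP BY collapses them.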
The reason you see a massive speed improvement if you fill out the usertoprovider table is because then every row participates in a join, and there is no full cross join happening in the case of "reg". Before, if you have 1,000 rows in usertoprovider, every row with type="reg" expands the result set 1,000 times. Now, that row joins with only one row in usertoprovider, and the result set is not expanded.
If you really want to let through anything with providertype='reg' even when it is not in your many-to-many mapping table, then the easiest way may be to use a sub-query:
Remove usertoprovider from your FROM clause
Do the following:
provider.providertype='reg' OR EXISTS (SELECT * FROM usertoprovider WHERE userid=1 AND providerid = provider.ID)
Another method is to use an OUTER JOIN on the usertoprovider -- any row with "reg" which is not in the table will come back with one row of NULL instead of expanding the result set.
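A sketch of that OUTER JOIN variant, rewritten with explicit JOIN syntax so that the LEFT JOIN's ON clause can reference provider (the join and filter logic is otherwise lifted from the original query):
SELECT packages.id, packages.title, subcat.id, packages.weight
FROM packages
JOIN provider          ON packages.providerid = provider.id
JOIN packagestosubcat  ON packages.id = packagestosubcat.packageid
JOIN subcat            ON packagestosubcat.subcatid = subcat.id
JOIN usertosubcat      ON usertosubcat.subcatid = packagestosubcat.subcatid
JOIN packagestocity    ON packages.id = packagestocity.packageid
JOIN usertocity        ON usertocity.cityid = packagestocity.cityid
LEFT JOIN usertoprovider
                       ON usertoprovider.providerid = provider.id
                      AND usertoprovider.userid = 1
WHERE packages.endDate > '2011-03-11 06:00:00'
  AND usertosubcat.userid = 1
  AND usertocity.userid = 1
  AND ( provider.providertype = 'reg'
        OR usertoprovider.providerid IS NOT NULL )
GROUP BY packages.title
ORDER BY subcat.id, packages.weight DESC;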
Hmm, I know that MySQL does funny things with grouping. In any other RDBMS, your query won't even be executed. What does that even mean,
SELECT packages.id
[...]
GROUP BY packages.title
ORDER BY subcat.id, packages.weight DESC
You want to group by title. In standard SQL syntax, this means you can only select title and aggregate functions of the other columns. MySQL magically tries to guess what you may have meant and executes that. So what would you expect to be selected as packages.id? The first matching package ID for every title? Or the last? And what would the ORDER BY clause mean with respect to the grouping? How can you order by columns that are not part of the result set (because only packages.title really is)?
There are two possibilities, as far as I can see:
Either you're on the right track with your query, in which case you should remove the ORDER BY clause; I don't think it will affect your result, but it may severely slow down your query.
Or you have a SQL problem, not a performance problem.

How is using Join faster than using just Rand() in MySQL

How is
SELECT t.id
FROM table t
JOIN (SELECT(FLOOR(max(id) * rand())) AS maxid FROM table)
AS tt
ON t.id >= tt.maxid
LIMIT 1
faster than
SELECT * FROM `table` ORDER BY RAND() LIMIT 1
I am actually having trouble understanding the first. Maybe if I knew why one is faster than the other I would have a better understanding.
(Original post: Difficult MySQL self-join please explain)
You can use EXPLAIN on the queries, but basically:
In the first you're getting a random number (which isn't very slow), based on the maximum of an (I presume) indexed field. This is quite quick, I'd say maybe even near-constant time (depends on the implementation of the index?).
Then you're joining on that number and returning only the first row whose id is greater than or equal to it, and because you're using an index again, this is lightning quick.
The second orders by a random function. This has to do a FULL TABLE scan (you'd need to look at the EXPLAIN for it to confirm), and then return the first row. This is of course VERY expensive. You're not using any indexes at all because of that RAND().
(the explain will look like this, showing that you're not using keys)
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE table ALL NULL NULL NULL NULL 14 Using temporary; Using filesort