I have 50,000 rows in a table and I am running the following query. I heard it is a bad idea, but how do I make it work in a better way?
mysql> SELECT t_dnis,account_id FROM mytable WHERE o_dnis = '15623157085' AND enabled = 1 ORDER BY RAND() LIMIT 1;
+------------+------------+
| t_dnis     | account_id |
+------------+------------+
| 5623157085 |       1127 |
+------------+------------+
Is there any other way I can make this query faster, or should I use other options?
I am not a DBA, so sorry if this question has been asked before :(
Note: currently we are not seeing a performance issue, but we are growing, so there could be an impact in the future. I just want to know the plus and minus points before we are out of the woods.
This query:
SELECT t_dnis, account_id
FROM mytable
WHERE o_dnis = '15623157085' AND enabled = 1
ORDER BY RAND()
LIMIT 1;
is not sorting 50,000 rows. It is sorting the number of rows that match the WHERE clause. As you state in the comments, this is in the low double digits. On a handful of rows, the use of ORDER BY rand() should not have much impact on performance.
You do want an index. The best index would be mytable(o_dnis, enabled, t_dnis, account_id). This is a covering index for the query, so the original data pages do not need to be accessed.
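For concreteness, a minimal sketch of creating that index (the index name here is just an illustration):

-- The WHERE columns come first, then the selected columns, so the
-- whole query can be answered from the index alone.
CREATE INDEX idx_mytable_covering
    ON mytable (o_dnis, enabled, t_dnis, account_id);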
Under most circumstances, I would expect the ORDER BY to be fine up to at least a few hundred rows, if not several thousand. Of course, this depends on lots of factors, such as your response-time requirements, the hardware you are running on, and how many concurrent queries are running. My guess is that your current data/configuration does not pose a performance problem, and there is ample room for growth in the data without an issue arising.
Unless you are running on very slow hardware, you should not experience problems in sorting (much) fewer than 50,000 rows. So if you still ask the question, this makes me suspect that your problem does not lie in the RAND().
For example, one possible cause of slowness could be not having a proper index - in this case you can go for a covering index:
CREATE INDEX mytable_ndx ON mytable (enabled, o_dnis, t_dnis, account_id);
or the basic
CREATE INDEX mytable_ndx ON mytable (enabled, o_dnis);
At this point you should already have good performance.
Otherwise you can run the query twice, either by counting the rows or just priming a cache. Which to choose depends on the data structure and how many rows are returned; usually, the COUNT option is the safest bet.
SELECT COUNT(1) AS n FROM mytable WHERE ...
which gives you n, which allows you to generate a random number k between 0 and n - 1, followed by
SELECT ... FROM mytable LIMIT k, 1
which ought to be really fast. Again, the index will help you speed up the counting operation.
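A minimal sketch of the whole pattern in plain MySQL, using the table and filter from the question (a prepared statement is used because LIMIT does not accept a bare expression):

-- Step 1: count the matching rows.
SELECT COUNT(*) INTO @n
FROM mytable
WHERE o_dnis = '15623157085' AND enabled = 1;
-- Step 2: pick a random offset k in [0, n) and fetch the row at it.
SET @k = CAST(FLOOR(RAND() * @n) AS UNSIGNED);
PREPARE pick FROM 'SELECT t_dnis, account_id FROM mytable
                   WHERE o_dnis = ''15623157085'' AND enabled = 1
                   LIMIT ?, 1';
EXECUTE pick USING @k;
DEALLOCATE PREPARE pick;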
In some cases (MySQL only) you could perhaps do better with
SELECT SQL_CACHE SQL_CALC_FOUND_ROWS ... FROM mytable WHERE ...
using the FOUND_ROWS() function to recover n, then running the second query, which should take advantage of the cache. It's best if you experiment first, though. And changes in the table demographics might cause performance to fall.
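As a sketch (note that SQL_CACHE and SQL_CALC_FOUND_ROWS are deprecated or removed in MySQL 8.0, so this only applies to older servers):

SELECT SQL_CACHE SQL_CALC_FOUND_ROWS t_dnis, account_id
FROM mytable
WHERE o_dnis = '15623157085' AND enabled = 1
LIMIT 1;
SELECT FOUND_ROWS() AS n;  -- the match count, ignoring the LIMIT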
The problem with ORDER BY RAND() LIMIT 1 is that MySQL gives each row a random value, sorts by it (performing a full table scan), and then drops all the results but one.
This is especially bad on a table with a lot of rows, doing a query like
SELECT * FROM foo ORDER BY RAND() LIMIT 1
However in your case the query is already filtering on o_dnis and enabled. If there are only a limited number of rows that match (like a few hundred), doing an ORDER BY RAND() shouldn't cause a performance issue.
The alternative requires two queries: one to count and the other to fetch.
In pseudo code:
count = query("SELECT COUNT(*) FROM mytable WHERE o_dnis = '15623157085' AND enabled = 1").value
offset = random(0, count - 1)
result = query("SELECT t_dnis, account_id FROM mytable WHERE o_dnis = '15623157085' AND enabled = 1 LIMIT 1 OFFSET " + offset).row
Note: For the pseudo code to perform well, there needs to be a (multi-column) index on o_dnis, enabled.
Related
If I have a MySQL limited query:
SELECT * FROM my_table WHERE date > '2020-12-12' LIMIT 1,16;
Is there a faster way to check and see how many results are left after my limit?
I was trying to do a count with limit, but that wasn't working, i.e.
SELECT count(ID) AS count FROM my_table WHERE date > '2020-12-12' LIMIT 16,32;
The ultimate goal here is just to determine if there ARE any other rows to be had beyond the current result set, so if there is another faster way to do this that would be fine too.
It's best to do this by counting the rows:
SELECT count(*) AS count FROM my_table WHERE date > '2020-12-12'
That tells you how many total rows match the condition. Then you can compare that to the size of the result you got with your query using LIMIT. It's just arithmetic.
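For example, with the question's page of LIMIT 1,16 (offset 1, 16 rows), a sketch of the arithmetic:

SELECT COUNT(*) INTO @total FROM my_table WHERE date > '2020-12-12';
-- rows remaining beyond the current result set:
SELECT GREATEST(@total - (1 + 16), 0) AS rows_left;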
Past versions of MySQL had a function FOUND_ROWS() which would report how many rows would have matched if you didn't use LIMIT. But it turns out this had worse performance than running two queries, one to count rows and one to do your limit. So they deprecated this feature.
For details read:
https://www.percona.com/blog/2007/08/28/to-sql_calc_found_rows-or-not-to-sql_calc_found_rows/
https://dev.mysql.com/worklog/task/?id=12615
(You probably want OFFSET 0, not 1.)
It's simple to test whether there ARE more rows. Assuming you want 16 rows, use 1 more:
SELECT ... WHERE ... ORDER BY ... LIMIT 0,17
Then programmatically see whether it returned only 16 rows (no more available) or 17 (there ARE more).
Because it is piggybacking on the fetch you are already doing and not doing much extra work, it is very efficient.
The second 'page' would use LIMIT 16, 17; 3rd: LIMIT 32,17, etc. Each time, you are potentially getting and tossing an extra row.
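A sketch against the question's table, assuming a 16-row page and some deterministic ORDER BY (here date, ID):

SELECT * FROM my_table
WHERE date > '2020-12-12'
ORDER BY date, ID
LIMIT 0, 17;  -- show 16 rows; a 17th row only signals that more pages exist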
I discuss this and other tricks where I point out the evils of OFFSET: Pagination
COUNT(x) checks x for being NOT NULL. This is [usually] unnecessary. The pattern COUNT(*) (or COUNT(1)) simply counts rows; the * or 1 has no significance.
SELECT COUNT(*) FROM t is not free. It will actually do a full index scan, which is slow for a large table. WHERE and ORDER BY are likely to add to that slowness. LIMIT is useless since the result is always 1 row. (That is, the LIMIT is applied to the result, not to the counting.)
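A quick illustration, using a hypothetical nullable column note; the two counts differ only when NULLs are present:

SELECT COUNT(*)    AS all_rows,    -- counts every row
       COUNT(note) AS with_note    -- skips rows where note IS NULL
FROM my_table;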
I was wondering what would be faster and what's the tradeoffs of using one or the other query?
SELECT * FROM table WHERE somecolumn = 'something' LIMIT 999;
vs.
SELECT * FROM table WHERE somecolumn = 'something';
Now, considering that the results of the query will never return more than a couple of hundred rows, does using LIMIT 999 make any significant performance impact or not?
I'm looking into this option because in my project users will have some kind of option to limit results as they'd like, and they can leave the limit empty to show all, so it's easier for me to keep the LIMIT part of the query and just change the number.
Now, the table is really big, ranging from a couple of hundred thousand to a couple of million rows.
The exact query looks something like:
SELECT SUM(revenue) AS cost,
       IF(ISNULL(headline) OR headline = '', 'undefined', headline) AS headline
FROM `some_table`
WHERE ((date >= '2017-01-01')
  AND (date <= '2017-12-31'))
  AND -- (sic)
GROUP BY `headline`
ORDER BY `cost` DESC
As I said before, this query will never return more than about a hundred rows.
Disk I/O, if any, is by far the most costly part of a query.
Fetching each row ranks next.
Almost everything else is insignificant.
However, if the existence of LIMIT can change what the Optimizer does, then there could be a significant difference.
In most cases, including the queries you gave, a too-big LIMIT has no impact.
In certain subqueries, a LIMIT will prevent the elimination of ORDER BY. A subquery is, by definition, a set, not an ordered set. So LIMIT is a kludge to prevent the optimization of removing ORDER BY.
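A sketch of that kludge with a hypothetical table t: without the LIMIT, MySQL is free to drop the inner ORDER BY from the derived table entirely:

SELECT * FROM (
    SELECT * FROM t
    ORDER BY score DESC
    LIMIT 18446744073709551615  -- huge LIMIT forces the sort to be kept
) AS top_t;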
If there is a composite index that includes all the columns needed for WHERE, GROUP BY, and ORDER BY, then the Optimizer can stop when the LIMIT is reached. Other situations go through tmp tables and sorts for GROUP BY and ORDER BY and can do the LIMIT only against a full set of rows.
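For instance, a hypothetical shape where that early stop works (equality on the leading index column, then GROUP BY and ORDER BY following index order):

CREATE INDEX idx_cat_day ON events (category, day);
SELECT day, COUNT(*)
FROM events
WHERE category = 'x'  -- equality on the leading index column
GROUP BY day          -- groups arrive in index order, no tmp table
ORDER BY day          -- already sorted, no filesort
LIMIT 10;             -- so the scan can stop after 10 groups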
Two caches were alluded to in the Comments so far.
"Query cache" -- This records exact queries and their result sets. If it is turned on and if it applicable, then the query comes back "instantly". By "exact", I include the existence and value of LIMIT.
To speed up all queries, data and indexes blocks are "cached" in RAM (see innodb_buffer_pool_size). This avoids disk I/O when a similar (not necessarily exact) query is run. See my first sentence, above.
Here's the use case:
I have a table with a bunch of unique codes which are either available or not available. As part of a transaction, I want to select a code that is available from the table, then update that row later in the transaction. Since this can happen concurrently for a lot of sessions at the same time, I ideally want to select a random record and use row-level locking on the table, so that other transactions aren't blocked by the query that is selecting a row from the table.
I am using InnoDB for the storage engine, and my query looks something like this:
select * from tbl_codes where available = 1 order by rand() limit 1 for update
However, rather than locking just one row from the table, it ends up locking the whole table. Can anyone give me some pointers on how to make it so that this query doesn't lock the whole table but just the row?
Update
Addendum: I was able to achieve row-level locking by specifying an explicit key in my select rather than doing the rand(). When my queries look like this:
Query 1:
select * from tbl_codes where available = 1 and id=5 limit 1 for update
Query 2:
select * from tbl_codes where available = 1 and id=10 limit 1 for update
However, that doesn't really help solve the problem.
Addendum 2: Final Solution I went with
Given that rand() has some issues in MySQL, the strategy I chose is:
I select 50 code IDs where available = 1, then I shuffle the array in the application layer to add a level of randomness to the order.
select id from tbl_codes where available = 1 limit 50
I start popping codes from my shuffled array in a loop until I am able to select one with a lock
select * from tbl_codes where available = 1 and id = :id for update
It may be useful to look at how this query is actually executed by MySQL:
select * from tbl_codes where available = 1 order by rand() limit 1 for update
This will read all rows that match the WHERE condition, generate a random number using rand() into a virtual column for each row, sort all rows (in a temporary table) based on that virtual column, and then return rows to the client from the sorted set until the LIMIT is reached (in this case just one). The FOR UPDATE affects locking done by the entire statement while it is executing, and as such the clause is applied as rows are read within InnoDB, not as they are returned to the client.
Putting aside the obvious performance implications of the above (it's terrible), you're never going to get reasonable locking behavior from it.
Short answer:
Select the row you want, using RAND() or any other strategy you like, in order to find the PRIMARY KEY value of that row. E.g.: SELECT id FROM tbl_codes WHERE available = 1 ORDER BY rand() LIMIT 1
Lock the row you want using its PRIMARY KEY only. E.g.: SELECT * FROM tbl_codes WHERE id = N
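A hedged sketch of that two-step pattern in a single transaction; the re-check of available guards against another session claiming the code between the two statements:

START TRANSACTION;
-- Step 1: pick a candidate without locking anything.
SELECT id INTO @id FROM tbl_codes
WHERE available = 1 ORDER BY RAND() LIMIT 1;
-- Step 2: lock only that row by PRIMARY KEY, re-checking availability.
-- (If this returns no row, another session won the race: retry Step 1.)
SELECT * FROM tbl_codes
WHERE id = @id AND available = 1 FOR UPDATE;
UPDATE tbl_codes SET available = 0 WHERE id = @id;
COMMIT;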
Hopefully that helps.
Even if not exactly mapping to your question, the problem is somewhat discussed here: http://akinas.com/pages/en/blog/mysql_random_row/
The problem with this method is that it is very slow. The reason for it being so slow is that MySQL creates a temporary table with all the result rows and assigns each one of them a random sorting index. The results are then sorted and returned.
The article does not deal with locks. However, maybe MySQL locks all the rows having available = 1 and does not release them until the end of the transaction!
That article proposes some solutions, none of which seems to be good for you, except this one which is, unfortunately, very hacky, and I didn't verify its correctness.
SELECT * FROM table
WHERE id >= (SELECT FLOOR(MAX(id) * RAND()) FROM table)
ORDER BY id LIMIT 1;
This is the best I can do for you since I don't command MySQL internals. Moreover, the article is pretty old.
I have a situation where I need to use a huge number for the limit. For example,
"select * from a table limit 15824293949,1";
This is... really, really slow. Sometimes my home MySQL server just dies.
Is it possible to make it faster?
sorry the number was 15824293949, not 38975901200
Added:
Table 'photos' (sample)

img_id  img_filename
------  ------------
     1  a.jpg
     2  b.jpg
     3  c.jpg
     4  d.jpg
     5  e.jpg
...and so on
select cp1.img_id,cp2.img_id from photos as cp1 cross join photos as cp2 limit ?,1
How did I get 15824293949?
I have 177901 rows in my photos table. I can get the total number of possible combinations by using
((total # of rows * total # of rows) - total # of rows) / 2
MySQL has issues with huge LIMIT offsets, mostly with the MyISAM engine, whereas InnoDB optimizes that. There are various techniques to get a MyISAM LIMIT to behave faster; in any case, add EXPLAIN before your SELECT statement to see what's actually going on. The billions of rows generated by the cross join indicate that the issue lies within the join itself, not the LIMIT clause.
If you're interested in how to make LIMIT behave faster, this link should provide you with enough information.
Try limiting the query with a WHERE clause on a column with an index on it. E.g.:
SELECT * FROM table WHERE id >= 38975901200 LIMIT 1
Update: I think perhaps you don't even need the database? You can find the nth combination of two images by calculating something like 15824293949 / 177901 and 15824293949 % 177901. I suppose you could write a query:
SELECT (15824293949 DIV 177901) AS img_id1, (15824293949 MOD 177901) AS img_id2
(Note: DIV is MySQL's integer division; a plain / would return a fractional value.)
If you're trying to get them from the natural order that they're in the database (and it doesn't happen to be their img_id) then you might have some trouble. Does it matter? It's not clear what you're trying to do here.
Presumably you have this in some sort of script, where the reason you are looking at that specific point is because it is where you left off last.
Ideally, if you also have an auto_increment primary key field (an id), you can store that number. Then just do select * from table where id > last_seen_id limit 1 (maybe do more than 1 at a time :P)
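A sketch of that "seek" approach, assuming an auto_increment key like img_id in the photos table above:

SELECT * FROM photos
WHERE img_id > 12345  -- the last id seen in the previous batch
ORDER BY img_id
LIMIT 100;            -- fetch a batch, then remember the largest img_id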
Generally speaking, what you are asking it to do should be slow. Give it something to search for, rather than everything with a limit.
I'd like to know which of the following would execute faster in a MySQL database. The table would have 200-1000 entries.
SELECT id
from TABLE
order by id desc
limit 1
or
SELECT count(id)
from TABLE
The story is that the table is cached. So this query is to be executed every time before cache retrieval, to determine whether the cached data is invalid by comparing with the previous value.
So if there exists an even less expensive query, please kindly let me know. Thanks.
If you
start from 1
never have any gaps
use the MyISAM engine
id is not nullable
Then the 2nd could run [ever so marginally] faster due to not having to visit table data at all (count is stored in metadata).
Otherwise,
if the table has NO index on ID (causing a SCAN), the 2nd one is faster
Barring both the above
the first one is faster
And if you actually meant to ask SELECT .. LIMIT 1 vs SELECT MAX(id).. then the answer is actually that they are the same for MySQL and most sane DBMS, whether or not there is an index.
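A sketch of that equivalence (writing t for the table), assuming id is the indexed primary key; both resolve with a single dive to the end of the index:

SELECT id FROM t ORDER BY id DESC LIMIT 1;  -- read the last index entry
SELECT MAX(id) FROM t;                      -- optimized to the same read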
I think the first query will run faster, as it is limited to one row only; 200-1000 rows may not matter that much in this case.
As already pointed out in the comments, your table is so small that it really doesn't matter what your solution will be. For this reason, the select count(id) should be used, as it expresses the intent and doesn't need any further processing.
Now select count(id) comes with an alternative, select count(*). These two are not synonyms. select count(*) will count the number of rows and use a cached value if possible, while select count(id) counts the number of non-null values in the column id. If the id column is defined as not null, then the cached row count may be used.
The selection between count(*) and count(id) depends once again on your intent. In the general case, count(*) describes the intent better.
Then there is the possibility of count(1), which is actually a synonym of count(*) when using MySQL, but the interpretation may vary if you end up using a different RDBMS.
The performance of each type of count also varies depending on whether you are using MyISAM or InnoDB. The row counts are cached on the former but not on the latter, if I've understood correctly.
In the end, you should rely on query plans and running tests and measuring their performance rather than these general ramblings.