MySQL Select & Limit computational complexity

Let's say I have a mysql table defined like this:
create table test_table(
id int(10) unsigned auto_increment primary key
/*, other attributes...*/
);
Given that table, I want to fetch the last record from it like this:
select * from test_table order by id desc limit 1;
It works, but it feels a bit sketchy. What is its complexity?
Is it O(log(n)), given that "limit" and "order by" are executed after the select?
Is there a better way to select the last record from an auto incremented table?

I think I figured it out.
My initial question was linked to "Select & Limit", but really this applies to all queries.
MariaDB provides the "analyze" statement prefix (MySQL 8.0.18 and later offer "explain analyze" instead).
You put it in front of your query in the terminal; the query is then executed and some metadata about the details of the execution is printed.
Here's an example using the table in my question (I changed its name to "comment" and its PK to "commentid" to give it some context):
analyze
select * from comment order by commentid desc limit 1;
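And the following is the output:
"rows" is the optimizer's estimate of how many rows the query will examine, and "r_rows" is the number of rows actually read while it ran.
This is what I was looking for.
I was living under the impression that the "limit" keyword would somehow optimize the query. Going by the "rows" estimate it doesn't look that way, although "r_rows" is the figure that reflects what was really read.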
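A rough way to check what the server actually does with the LIMIT query, assuming the "comment" table from this answer. Which statement is available depends on the server: ANALYZE as a statement prefix is MariaDB syntax, EXPLAIN ANALYZE needs MySQL 8.0.18 or later, and plain EXPLAIN works on both but only shows estimates.

```sql
-- Plan only, without running the query (MySQL and MariaDB).
EXPLAIN
SELECT * FROM comment ORDER BY commentid DESC LIMIT 1;

-- Runs the query and reports actual row counts (MySQL 8.0.18 or later).
EXPLAIN ANALYZE
SELECT * FROM comment ORDER BY commentid DESC LIMIT 1;

-- MariaDB form; adds the r_rows column next to the rows estimate.
ANALYZE
SELECT * FROM comment ORDER BY commentid DESC LIMIT 1;
```

If the plan's "key" column shows PRIMARY and the actual row count is 1, the server walked the primary key index from its end and stopped after the first entry.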
On the other hand, you can also use MAX() to get the last row:
analyze
select * from comment where commentid=(select max(commentid) from comment);
The outer query obviously reads just 1 row, but the subquery should be the more expensive select of the two, so I analyzed it:
analyze
select max(commentid) from comment;
gave me:
This doesn't tell me much, except for the "extra" description, which says: "Select tables optimized away".
I looked that up and it's an already answered question on Stack Overflow.
From what I've gathered so far, that description means MAX() doesn't scan the rows of your table; instead the optimizer resolves the value while planning the query, by reading a single entry from the end of the index on that column.
It works because the column is indexed (here it is the primary key); "auto_increment" itself is not required.
The accepted answer also says it only works on MyISAM tables, but I'm running these tests on an InnoDB table, and the optimization seems to be working.
Here are the details:
SELECT PLUGIN_NAME, PLUGIN_VERSION, PLUGIN_TYPE_VERSION, PLUGIN_LIBRARY, PLUGIN_LIBRARY_VERSION, PLUGIN_AUTHOR
FROM information_schema.PLUGINS
WHERE PLUGIN_NAME = 'innodb';
PS: You might be wondering if doing this:
ALTER TABLE comment AUTO_INCREMENT = 999;
messes up the optimization.
The answer is no, it doesn't. Setting AUTO_INCREMENT to a certain value only affects the next inserted row. Try it yourself: modify the AUTO_INCREMENT value and then run
select max(commentid) from comment;
you will still get the correct value.
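The experiment above can be sketched end to end on a throwaway table. The table name "comment_demo" and its "body" column are made up for this example, and the values in the comments assume the default auto_increment settings (start at 1, step 1).

```sql
-- Scratch table, not part of the original schema.
CREATE TABLE comment_demo (
    commentid INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    body VARCHAR(100)
);

INSERT INTO comment_demo (body) VALUES ('a'), ('b'), ('c');

-- Move the counter far ahead of the existing rows.
ALTER TABLE comment_demo AUTO_INCREMENT = 999;

-- Still reports the highest id actually stored (3 here), not the counter.
SELECT MAX(commentid) FROM comment_demo;

-- Only the next insert picks up the new counter value.
INSERT INTO comment_demo (body) VALUES ('d');
SELECT MAX(commentid) FROM comment_demo;  -- 999 with the defaults assumed above

DROP TABLE comment_demo;
```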

You can get the desired output with this approach as well:
SELECT * FROM test_table where id=(select max(id) from test_table);
Hope this helps.

Related

MySQL: where exists VS where id in [performance]

This question also exists here: Poor whereHas performance in Laravel
... but without an answer.
I ran into the same situation as the author of that question:
replays table has 4M rows
players table has 40M rows
This query uses where exists and it takes a lot of time (70s) to finish:
select * from `replays`
where exists (
select * from `players`
where `replays`.`id` = `players`.`replay_id`
and `battletag_name` = 'test')
order by `id` asc
limit 100;
but when it's changed to use where id in instead of where exists - it's much faster (0.4s):
select * from `replays`
where id in (
select replay_id from `players`
where `battletag_name` = 'test')
order by `id` asc
limit 100;
MySQL (InnoDB) is being used.
I would like to understand why there is such a big difference in performance between where exists and where id in. Is it because of the way MySQL works? I expected the "exists" variant to be faster because MySQL would just check whether relevant rows exist... but I was wrong (I probably don't understand how "exists" works in this case).

You should show the execution plans.
To optimize the exists, you want an index on players(replay_id, battletag_name). An index on replays(id) should also help -- but if id is a primary key there is already an index.
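A minimal sketch of that suggestion, using the table and column names from the question. The index name is made up, and whether the optimizer picks the index still has to be confirmed from the EXPLAIN output on your own data.

```sql
-- Composite index so each EXISTS probe becomes an index lookup
-- on (replay_id, battletag_name) instead of a scan of players.
CREATE INDEX idx_players_replay_battletag
    ON players (replay_id, battletag_name);

-- Execution plan of the EXISTS variant.
EXPLAIN
SELECT * FROM replays
WHERE EXISTS (
    SELECT * FROM players
    WHERE replays.id = players.replay_id
      AND battletag_name = 'test')
ORDER BY id ASC
LIMIT 100;

-- Execution plan of the IN variant, for comparison.
EXPLAIN
SELECT * FROM replays
WHERE id IN (
    SELECT replay_id FROM players
    WHERE battletag_name = 'test')
ORDER BY id ASC
LIMIT 100;
```

Comparing the "type", "key" and "rows" columns of the two plans should show where the 70s version spends its time.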

Gordon has a good answer. The fact is that performance depends on a lot of different factors including database design/schema and volume of data.
As a rough guide, the exists sub-query is going to execute once for every row in replays and the in sub-query is going to execute once to get the results of the sub-query and then those results will be searched for every row in replays.
So with the exists, the better the indexing/access path the faster it will run. Without relevant index(es) it will just read through all rows until it finds a match. For every single row in replays. For the rows with no matches it would end up reading the entire players table each time. Even the rows with matches could read through a significant number of players before finding a match.
With the in, the smaller the result set from the sub-query, the faster it will run. For rows without a match it only needs to quickly check the small set of sub-query rows to reach that answer. That said, you don't get the benefit of indexes (if it works this way), so for a large result set from the sub-query it has to read every row of that result before deciding that there is no match.
That said, database optimisers are pretty clever and don't always evaluate queries exactly the way you write them, which is why checking execution plans and testing yourself is important to figure out the best approach. It's not unusual to expect a certain execution path only to find that the optimiser has chosen a different method of execution based on how it expects the data to look.

Check if MySQL Table is empty: COUNT(*) is zero vs. LIMIT(0,1) has a result?

This is a simple question about efficiency specifically related to the MySQL implementation. I want to just check if a table is empty (and if it is empty, populate it with the default data). Would it be best to use a statement like SELECT COUNT(*) FROM `table` and then compare to 0, or would it be better to do a statement like SELECT `id` FROM `table` LIMIT 0,1 then check if any results were returned (the result set has next)?
Although I need this for a project I am working on, I am also interested in how MySQL works with those two statements and whether the reason people seem to suggest using COUNT(*) is because the result is cached or whether it actually goes through every row and adds to a count as it would intuitively seem to me.

You should definitely go with the second query rather than the first.
When using COUNT(*), MySQL is scanning at least an index and counting the records. Even if you would wrap the call in a LEAST() (SELECT LEAST(COUNT(*), 1) FROM table;) or an IF(), MySQL will fully evaluate COUNT() before evaluating further. I don't believe MySQL caches the COUNT(*) result when InnoDB is being used.
Your second query results in only one row being read, furthermore an index is used (assuming id is part of one). Look at the documentation of your driver to find out how to check whether any rows have been returned.
By the way, the id field may be omitted from the query (MySQL will use an arbitrary index):
SELECT 1 FROM table LIMIT 1;
However, I think the simplest and most performant solution is the following (as indicated in Gordon's answer):
SELECT EXISTS (SELECT 1 FROM table);
EXISTS returns 1 if the subquery returns any rows, otherwise 0. Because of these semantics, MySQL can optimize the execution properly.
Any fields listed in the subquery are ignored, thus 1 or * is commonly written.
See the MySQL Manual for more info on the EXISTS keyword and its use.
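For reference, the three ways of checking for an empty table discussed here, side by side. The table name "app_settings" is a placeholder, not something from the question.

```sql
-- Counts every row (via some index) just to compare the total with 0.
SELECT COUNT(*) FROM app_settings;

-- Reads at most one row; an empty result set means the table is empty.
SELECT 1 FROM app_settings LIMIT 1;

-- Always returns exactly one row: 1 if the table has rows, 0 if it is empty.
SELECT EXISTS (SELECT 1 FROM app_settings) AS has_rows;
```

The last form is convenient from application code because there is always a single value to read, with no need to test whether the result set is empty.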

It is better to do the second method or just exists. Specifically, something like:
if exists (select id from table)
should be the fastest way to do what you want. You don't need the limit; the SQL engine takes care of that for you.
By the way, never put identifiers (table and column names) in single quotes.

get last record in file

I have a table (rather badly designed, but anyway) which consists only of strings. The worst part is that there is a script which adds records from time to time. Records will never be deleted.
I believe that MySQL stores records in a random access file, and that I could get the last or any other record using C or something similar, since I know the max length of a record and I can find EOF.
When I do something like "SELECT * FROM table" in MySQL I get all the records in the right order, because MySQL reads this file from the beginning to the end. I need only the last one(s).
Is there a way to get the LAST record (or records) using MySQL query only, without ORDER BY?
Well, I suppose I've found a solution here, so my current query is
SELECT
@i:=@i+1 AS iterator,
t.*
FROM
table t,
(SELECT @i:=0) i
ORDER BY
iterator DESC
LIMIT 5
If there's a better solution, please let me know!

The order is not guaranteed unless you use an ORDER BY. It just happens that the records you're getting back are sorted the way you need them.

This is where keys (a primary key, for example) become important.
You can modify your table by adding a primary key column with auto_increment.
Then you can query
select * from your_table where id =(select max(id) from your_table);
and get the last inserted row.
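A sketch of that modification, assuming the table is called "your_table" as above and has no primary key yet. One caveat worth keeping in mind: the rows that already exist are numbered in whatever order the server reads them during the ALTER, so only rows inserted afterwards are reliably ordered by id.

```sql
-- Add an auto-incrementing primary key as the first column.
ALTER TABLE your_table
    ADD COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;

-- The last inserted row.
SELECT * FROM your_table WHERE id = (SELECT MAX(id) FROM your_table);

-- The last five rows, newest first.
SELECT * FROM your_table ORDER BY id DESC LIMIT 5;
```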

MySql Explain ignoring the unique index in a particular query

I started looking into indexes in depth for the first time and began analyzing our db, starting with the users table. I searched SO for a similar question but was not able to frame my search well, I guess.
While going through a particular concept, this first observation left me wondering: the difference between these two EXPLAINs. (The only difference is that the first query uses 'a%' while the second query uses 'ab%'.)
[Total number of rows in users table = 9193]:
1) explain select * from users where email_address like 'a%';
(Actual matching rows = 1240)
2) explain select * from users where email_address like 'ab%';
(Actual matching rows = 109)
The index looks like this:
My question:
Why is the index totally ignored in the first query? Does MySQL think that it is a better idea not to use the index in case 1? If yes, why?

If, based on the statistics MySQL collects on the distribution of the values, the estimated share of matching rows is above a certain ratio of the total (typically 1/11), MySQL deems it more efficient to simply scan the whole table, reading the disk pages sequentially, rather than use the index and jump around the disk pages in random order.
You could try your luck with this query, which may use the index:
where email_address between 'a' and 'az'
Although doing the full scan may actually be faster.
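Written out as full statements, this might look as follows. Two things here are my own assumptions: the index name "idx_email_address" is hypothetical (the real name is not shown in the question), and the half-open range is used instead of between 'a' and 'az' because the latter would skip addresses such as 'azure@...' that sort after 'az'.

```sql
-- Range form of the prefix search: everything from 'a' up to, but not including, 'b'.
EXPLAIN
SELECT * FROM users
WHERE email_address >= 'a' AND email_address < 'b';

-- Index hint: tells the optimizer to treat a table scan as very expensive
-- whenever the named index can be used.
EXPLAIN
SELECT * FROM users FORCE INDEX (idx_email_address)
WHERE email_address LIKE 'a%';
```

Timing both against the plain query is the only way to tell whether the forced index access actually beats the full scan on this table.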

This is not a direct answer to your question, but I still want to point it out (in case you don't already know):
Try:
explain select email_address from users where email_address like 'a%';
explain select email_address from users where email_address like 'ab%';
MySQL would now use the index in both of the queries above, since the column of interest is available directly from the index.
In the "select *" case, index access is probably more costly because the optimizer has to go through the index records, find the row ids, and then go back to the table to retrieve the other column values.
But in the queries above, where you only "select email_address", the optimizer knows that all the information it needs is available right in the index, and hence it would use the index irrespective of the 30% rule.
Experts, please correct me if I am wrong.

Queries that do not select any table columns - do I understand this correctly?

I just read this article:
http://use-the-index-luke.com/sql/clustering/index-only-scan-covering-index
And at the bottom is this statement:
Queries that do not select any table columns are often executed as index-only scan.
Can you think of a meaningful example?
Problem is, there is no comments section, so I just want to verify, this is one example, correct?
SELECT 1 FROM `table_name` WHERE `indexed_column` = ?
This is to check whether a specified row exists.
So the questions:
Are there any more practical uses for that?
As a side note, I read somewhere that the above query might be more performant if encapsulated in EXISTS, I'm not sure how to check if it's true:
SELECT EXISTS(SELECT 1 FROM `table_name` WHERE `indexed_column` = ? LIMIT 1)
Is it?

Well, possibly the canonical example would be select count(*) from mytable to get a row count.
That selects no data from the table and would most likely be satisfied by the primary key index, if available.
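One way to check whether such queries really run as index-only scans is to look at the Extra column of EXPLAIN, where MySQL reports "Using index" when the table rows themselves are not touched. The names come from the question; the literal 42 is just a placeholder value.

```sql
-- Existence check that needs nothing but the indexed column.
EXPLAIN
SELECT 1 FROM table_name WHERE indexed_column = 42;

-- The same check wrapped in EXISTS.
EXPLAIN
SELECT EXISTS (SELECT 1 FROM table_name WHERE indexed_column = 42);

-- Row count restricted by the indexed column.
EXPLAIN
SELECT COUNT(*) FROM table_name WHERE indexed_column = 42;
```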
