Debugging a MySQL query permanently affects the execution time - mysql

I am trying to debug a simple but very slow running MySQL query on a table with a JOIN to a very large table (13m rows), the large table has multiple indexes.
The join is very basic, just a join from ID on the small table to foreign_ID on the big table.
This query has been fast to run in the past, however a lot of new data has been added since then. It took 30ms to run previously, it now takes 5 minutes.
On live, I tried repairing the large table by using an alter command to set it to InnoDb. But this made no difference.
So to debug the query, I run EXPLAIN and try removing the joins etc until the query runs very quickly again.
The join types started out as ALL, eq_ref, ref and ref.
Then as I re-enable joins and to try to find a way of making it work in a performant way I find that actually now, the ORIGINAL QUERY now works quickly again.
The only thing that has changed is the query execution plan.
The join types are now range, eq_ref, eq_ref and ref.
What happened? Why is MySQL now treating this same query differently to how it did before?
And how can I make my live server do this too? And how can I stop this from happening again in the future?
EDIT: MySQL version on prod and locally is 5.7

You seem to be falling foul of a query planner bug that frequently manifests on MySQL 5.7 and later. What happens is that the query planner will decide on the wrong execution plan (indexes, join order), which results in the same query on the same data set sometimes running quickly (with the correct execution plan) or slowly (with the wrong execution plan, often resulting in a full table scan). I have seen this happen on every MySQL 5.7 and 8.0 deployment I have worked on. On MySQL 5.6 and earlier and MariaDB, this sort of behaviour from the query planner is only provocable by having an unusually large number of indexes on a table (10+). So if you have a lot of indexes on one of the tables involved, it my be worth trying to rationalize the number of them down.
Apart from keeping the number of indexes on each table as low as you reasonably can, you have two options to address this:
1) When you identify queries that encounter this bug, constrain them using index hints (USE/FORCE INDEX (index_name)) and, if necessary, STRAIGHT_JOIN to force the JOIN ordering.
2) Switch to MariaDB which doesn't seem to suffer from this problem.

Related

Join order differs for between two instances of the same Mysql DB

There is a query that I want to optimize. To make some tests, I took a snapshot of the production database and create a new test instance of this database. Using the explain clause, I can see that the order of the joins differ between the two databases. The two databases have the same version (MySQL 5.6.19a), the same engine (InnoDB), the same schema, the same indexes, the same data, and are executed on the same material. The only difference, is that the production database use more memory (obviously) because it has more connections to it.
What may cause the join order to be different?
The memory usage?
The indexes are still building in the test instance?
The indexes of the production database are fragmented?
This is rare but quite feasible. InnoDB has "statistics" about each index on each table; it uses them to decide what it the best way to perform the query, including what order to look at the tables.
The statistics used to come from 8 'random' dives into the BTree to get a crude feel for the number of rows and the distribution of the data. The timing of the dives, the number '8', and the randomness have all been criticized, and gradually they have been improved. Only some improvements exist in 5.6.19.
Also the "cost" model of deciding how to perform the query has recently had an overhaul (5.7 / 8.0). 8.0 and MariaDB 10.0 have "histograms", which should lead to better query plan choices. Not yet implemented (as of 8.0.0): Noticing which blocks are already cached; this could picking a 'worse' index because more of it is cached, hence faster.
Because of the complexity of the optimization problem and the huge number of possibilities, there are even some cases where a newer version picks a worse query plan.
Even if you are running the same query on the same machine, the query plan could be different.
I presume you already knew that changing a constant in the query can change the query plan -- and do it for the better. I have seen the same query come up with 6 different query plans, presumably due to different constants. This can be annoying if you are doing EXPLAIN on a query found in the slowlog -- you can't be sure that that query plan was used when it was "slow".
We simply have to live with all this.
You could do ANALYZE TABLE to recompute the statistics. But that can make things worse or better, depending on the phase of the moon. It might even (coincidentally) make your two instances perform the query the same.
The real question is "did one server run your query significantly faster than the other?" (After accounting for caching, other activity, etc, etc.)
When both of two tables in a JOIN are being filtered (something in WHERE), it is very difficult for the Optimizer to decide. If there is also ORDER BY and LIMIT, it becomes even harder to decide.
If you would like to provide your SELECT, its EXPLAIN, and SHOW CREATE TABLE, we can discuss details. (But start a new question.)

MySQL EXPLAIN - it keeps giving me different explanations each time

I have a really large, complex query I'm trying to optimise using MySQL EXPLAIN SELECT or EXPLAIN EXTENDED SELECT.
If I run it against the query, I'll see every table in the query is using Using where in the Extra column, which is great.
No data will be changed at all, I'll go off and make a cup of tea or something, come back and re-run EXPLAIN.
This time, just a few minutes later, only 20% of the tables are Using where, the primary table is now Using index; Using temporary; Using filesort, and my day becomes a nightmare trying to debug this.
I am aware that sometimes things like temporary tables and filesorts are more efficient than using where clauses and indexes. But not in the case of this database, which is 10GB in size, and creating temporary tables and filesorts kills the server completely.
Any ideas why this would be happening? Is there logic or reason behind such a thing?!
You are using InnoDB, correct? You are using a version older than 5.6.6, correct?
You have encountered an interesting variant on InnoDB's lack of "persistent statistics". Several things used to trigger re-computing the statistics for InnoDB tables. And those statistics are used for deciding how to execute the query.
Probably your particular query was "on the fence" -- a slight change in some statistic would lead to a different query plan.
If you would like, we could dig deeper. But we need to see
SHOW CREATE TABLE
SHOW TABLE STATUS (for clues of table size)
EXPLAIN EXTENDED SELECT...
EXPLAIN FORMAT=JSON SELECT... (5.6.5 or later)
And we may be able to suggest ways to speed up the query.

Determining which columns to index in MySQL in CakePHP

I have a web app, which has quite a few queries being fired from every page. As more data was added to the DB, we noticed that the pages were taking longer and longer to load.
On examining PhpMyAdmin -> Status -> Joins, we noticed this (with the number in red):
Select_full_join 348.6 k The number of joins that do not use indexes. If this value is not 0, you should carefully check the indexes of your tables.
How do I determine which joins are causing the problems? Are all the joins equally to be blamed?
How do I determine which columns should be indexed, for the performance to be proper?
We are using CakePHP + MySQL, and the queries are all auto-generated.
The rule of thumb that I have always used, is that if I am using join, the fields that I am joining on need to be indexed.
For instance, if you have a query like the following:
SELECT t1.name, t2.salary
FROM employee AS t1
INNER JOIN info AS t2 ON t1.name = t2.name;
Both t1.name and t2.name should be indexed.
Below are some good reads for this as well:
Optimizing MySQL: Importance of JOIN Order
How to optimize MySQL JOIN queries through indexing
And in general, this guy's site has some good info as well.
MySQL Optimizer Team
Edit: This is always helpful.
And if you have access to your server settings, check out:
MySQL Slow Server Logs
Once you have a log of slow queries, you can use explain on them to see what needs indexing.
If you don't know which queries are running inefficiently, you have a couple of choices.
You could try this:
Try issuing the command SHOW FULL PROCESSLIST from phpmyadmin while your web site is active. It will show you, hopefully, a bunch of slow running queries. The FULL processlist should give you the entire query. You could then use the EXPLAIN command to figure out what it's doing.
You should also try this:
Think through the work your application is doing on behalf of your users. Think through which of your queries have to romp through lots of data to deliver value to the users. Think through which tables are growing as your application gets used more and more.
Then, find your queries that deliver that value, and that access your growing tables. Again, use the EXPLAIN command to see how MySQL is processing them, and add indexes as needed.
I suspect it will be very obvious which indexes you should add. Add the obvious ones, then let your system stabilize for a couple of workdays, then remeasure.
Notice that this is a normal part of bringing a new application into production.

MySQL query slowing down until restart

I have a service that sits on top of a MySQL 5.5 database (INNODB). The service has a background job that is supposed to run every week or so. On a high level the background job does the following:
Do some initial DB read and write in one transaction
Execute UMQ (described below) with a set of parameters in one transaction.
If no records are returned we are done!
Process the result from UMQ (this is a bit heavy so it is done outside of any DB
transaction)
Write the outcome of the previous step to DB in one transaction (this
writes to tables queried by UMQ and ensures that the same records are not found again by UMQ).
Goto step 2.
UMQ - Ugly Monster Query: This is a nasty database query that joins a bunch of tables, has conditions on columns in several of these tables and includes a NOT EXISTS subquery with some more joins and conditions. UMQ includes ORDER BY also has LIMIT 1000. Even though the query is bad I have done what I can here - there are indexes on all columns filtered on and the joins are all over foreign key relations.
I do expect UMQ to be heavy and take some time, which is why it's executed in a background job. However, what I'm seeing is rapidly degrading performance until it eventually causes a timeout in my service (maybe 50 times slower after 10 iterations).
First I thought that it was because the data queried by UMQ changes (see step 4 above) but that wasn't it because if I took the last query (the one that caused the timeout) from the slow query log and executed it myself directly I got the same behavior only until I restated the MySQL service. After restart the exact query on the exact same data that took >30 seconds before restart now took <0.5 seconds. I can reproduce this behavior every time by restoring the database to it's initial state and restarting the process.
Also, using the trick described in this question I could see that the query scans around 60K rows after restart as opposed to 18M rows before. EXPLAIN tells me that around 10K rows should be scanned and the result of EXPLAIN is always the same. No other processes are accessing the database at the same time and the lock_time in the slow query log is always 0. SHOW ENGINE INNODB STATUS before and after restart gives me no hints.
So finally the question: Does anybody have any clue of why I'm seeing this behavior? And how can I analyze this further?
I have the feeling that I need to configure MySQL differently in some way but I have searched and tested like crazy without coming up with anything that makes a difference.
Turns out that the behavior I saw was the result of how the MySQL optimizer uses InnoDB statistics to decide on an execution plan. This article put me on the right track (even though it does not exactly discuss my problem). The most important thing I learned from this is that MySQL calculates statistics on startup and then once in a while. This statistics is then used to optimize queries.
The way I had set up the test data the table T where most writes are done in step 4 started out as empty. After each iteration T would contain more and more records but the InnoDB statistics had not yet been updated to reflect this. Because of this the MySQL optimizer always chose an execution plan for UMQ (which includes a JOIN with T) that worked well when T was empty but worse and worse the more records T contained.
To verify this I added an ANALYZE TABLE T; before every execution of UMQ and the rapid degradation disappeared. No lightning performance but acceptable. I also saw that leaving the database for half an hour or so (maybe a bit shorter but at least more than a couple of minutes) would allow the InnoDB statistics to refresh automatically.
In a real scenario the relative difference in index cardinality for the tables involved in UMQ will look quite different and will not change as rapidly so I have decided that I don't really need to do anything about it.
thank you very much for the analysis and answer. I've been searching this issue for several days during ci on mariadb 10.1 and bacula server 9.4 (debian buster).
The situation was that after fresh server installation during a CI cycle, the first two tests (backup and restore) runs smoothly on unrestarted mariadb server and only the third test showed that one particular UMQ took about 20 minutes (building directory tree during restore process from the table with about 30k rows).
Unless the mardiadb server was restarted or table has been analyzed the problem would not go away. ANALYZE TABLE or the restart changed the cardinality of the fields and internal query processing exactly as stated in the linked article.

MySQL queries testing WHERE clause search times

Recently I was pulled into the boss-man's office and told that one of my queries was slowing down the system. I then was told that it was because my WHERE clause began with 1 = 1. In my script I was just appending each of the search terms to the query so I added the 1 = 1 so that I could just append AND before each search term. I was told that this is causing the query to do a full table scan before proceeding to narrow the results down.
I decided to test this. We have a user table with around 14,000 records. The queries were ran five times each using both phpmyadmin and PuTTY. In phpmyadmin I limited the queries to 500 but in PuTTY there was no limit. I tried a few different basic queries and tried clocking the times on them. I found that the 1 = 1 seemed to cause the query to be faster than just a query with no WHERE clause at all. This is on a live database but it seemed the results were fairly consistent.
I was hoping to post on here and see if someone could either break down the results for me or explain to me the logic for either side of this.
Well, your boss-man and his information source are both idiots. Adding 1=1 to a query does not cause a full table scan. The only thing it does is make query parsing take a miniscule amount longer. Any decent query plan generator (including the mysql one) will realize this condition is a NOP and drop it.
I tried this on my own database (solar panel historical data), nothing interesting out of the noise.
mysql> select sum(KWHTODAY) from Samples where Timestamp >= '2010-01-01';
seconds: 5.73, 5.54, 5.65, 5.95, 5.49
mysql> select sum(KWHTODAY) from Samples where Timestamp >= '2010-01-01' and 1=1;
seconds: 6.01, 5.74, 5.83, 5.51, 5.83
Note I used ajreal's query cache disabling.
First at all, did you set session query_cache_type=off; during both testing?
Secondly, both your testing queries on PHPmyadmin and Putty (mysql client) are so different, how to verify?
You should apply same query on both site.
Also, you can not assume PHPmyadmin is query cache off. The time display on the phpmyadmin is including PHP processing, which you should avoid as well.
Therefore, you should just do the testing on mysql client instead.
This isn't a really accurate way to determine what's going on inside MySQL. Things like caching and network variations could skew your results.
You should look into using "explain" to find out what query plan MySQL is using for your queries with and without your 1=1. A DBA will be more interested in those results. Also, if your 1=1 is causing a full table scan, you will know for sure.
The explain syntax is here: http://dev.mysql.com/doc/refman/5.0/en/explain.html
How to interpret the results are here: http://dev.mysql.com/doc/refman/5.0/en/explain-output.html