How do I clear the MySQL Query Planner's statistics

I have a number of complex queries that I'm trying to benchmark. We discovered that on one production box the query planner's statistics hadn't been updated, which is likely the cause of some of the poor performance we were seeing (MyISAM tables). To be clear, all indexes on the table show NULL cardinality.
Of course, I need to perform an ANALYZE TABLE on my production boxes, but I'd like to somehow benchmark the performance of my queries in a dev environment before I do that. My dev environment shows good, usable indexes on the table. I'd like to.. "UNANALYZE" the table so I can compare the performance of the broken indexes we have in production versus what we should expect with proper indexes. Would just deleting the index give me the same results, or is there a better way to just flush the statistics?
BTW, I recognize that the NULL cardinality is an obvious problem and easy to fix. However, I'd like to quantify how much this has been hurting the performance. You know.. for science!

You can't, AFAIK, but you can play with a FORCE INDEX (...) hint on your SQL statement to force use of an inadequate index, and MySQL will then fall back to a table scan ;)
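For example (the table and index names here are hypothetical, just to illustrate the idea), you could compare the plan the optimizer picks on its own against one where you deliberately steer it onto a poor index, or hide the good one:
-- let the optimizer choose (good statistics / good index)
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- force a poorly matching index to mimic a bad plan
EXPLAIN SELECT * FROM orders FORCE INDEX (idx_created_at) WHERE customer_id = 42;
-- or simply hide the good index from the optimizer
EXPLAIN SELECT * FROM orders IGNORE INDEX (idx_customer_id) WHERE customer_id = 42;
This won't reproduce the exact behaviour of NULL cardinality statistics, but it lets you benchmark the "wrong plan" versus the "right plan" on the dev box.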

Related

Understanding the MySQL Query Optimizer

We use two MySQL servers with the same setup and master-master replication. They are located behind a loadbalancer and get almost the same traffic, statements etc.
On server1, there are some cron jobs running additionally, which is the only difference.
However, we've seen that in some cases the query optimizer behaves differently with the same query on the two servers.
In those cases we have to use FORCE INDEX to get the best results on both servers.
The main questions are:
Is there any metadata that is stored somewhere on the server that is used by the Query Optimizer?
If we have to backup and restore a database (with XtraBackup), will the query optimizer behave in the same way or is it built from scratch?
Thanks for any reply
Joachim
The Optimizer bases its actions partially on "statistics". Those statistics come from explicit ANALYZE TABLE or from certain changes to the table, such as "adding more than 10% to the table".
There is nothing to sync the stats between the two Masters, so they can drift apart. Even running ANALYZE TABLE on both will not necessarily bring them in sync. This is because of the "random probes" that are made into the table to concoct the stats.
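If you want to see how far the two masters have drifted, you can compare the estimates directly (hypothetical table name; run the same statements on both servers):
SHOW INDEX FROM mydb.mytable;
-- the Cardinality column is the estimate the Optimizer works from
ANALYZE TABLE mydb.mytable;
-- recomputes the estimates, but because of the random probes the two
-- servers can still end up with slightly different numbers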
FORCE INDEX is risky because "what helps today may hurt tomorrow".
Probably this is what is happening: You have a query that is on the borderline between picking one query plan and another. The Optimizer's analysis says they are about equivalent, but (for various reasons) they are not. This could lead to one server doing one thing (fast query plan) and the other doing something else (poor query plan).
There is no 'reliable' and 'consistent' way to solve your problem.
Let's try to tackle it a different way. Provide the query, the EXPLAIN, and SHOW CREATE TABLE. There may be a better index and/or a reformulation of the query that avoids the problem completely, and possibly runs faster than either of your current query plans.

Join order differs between two instances of the same MySQL DB

There is a query that I want to optimize. To run some tests, I took a snapshot of the production database and created a new test instance from it. Using the EXPLAIN clause, I can see that the order of the joins differs between the two databases. The two databases have the same version (MySQL 5.6.19a), the same engine (InnoDB), the same schema, the same indexes, the same data, and run on the same hardware. The only difference is that the production database uses more memory (obviously) because it has more connections to it.
What may cause the join order to be different?
The memory usage?
The indexes are still building in the test instance?
The indexes of the production database are fragmented?
This is rare but quite feasible. InnoDB has "statistics" about each index on each table; it uses them to decide what is the best way to perform the query, including what order to look at the tables in.
The statistics used to come from 8 'random' dives into the BTree to get a crude feel for the number of rows and the distribution of the data. The timing of the dives, the number '8', and the randomness have all been criticized, and gradually they have been improved. Only some improvements exist in 5.6.19.
Also the "cost" model of deciding how to perform the query has recently had an overhaul (5.7 / 8.0). 8.0 and MariaDB 10.0 have "histograms", which should lead to better query plan choices. Not yet implemented (as of 8.0.0): Noticing which blocks are already cached; this could picking a 'worse' index because more of it is cached, hence faster.
Because of the complexity of the optimization problem and the huge number of possibilities, there are even some cases where a newer version picks a worse query plan.
Even if you are running the same query on the same machine, the query plan could be different.
I presume you already knew that changing a constant in the query can change the query plan -- and do it for the better. I have seen the same query come up with 6 different query plans, presumably due to different constants. This can be annoying if you are doing EXPLAIN on a query found in the slowlog -- you can't be sure that that query plan was used when it was "slow".
We simply have to live with all this.
You could do ANALYZE TABLE to recompute the statistics. But that can make things worse or better, depending on the phase of the moon. It might even (coincidentally) make your two instances perform the query the same.
The real question is "did one server run your query significantly faster than the other?" (After accounting for caching, other activity, etc, etc.)
When both of two tables in a JOIN are being filtered (something in WHERE), it is very difficult for the Optimizer to decide. If there is also ORDER BY and LIMIT, it becomes even harder to decide.
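As a concrete illustration of that shape (hypothetical tables and columns), a query like the following gives the Optimizer two plausible starting tables plus a sort to worry about, which is exactly where plans tend to flip between servers:
SELECT o.*
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE c.country = 'DE'
  AND o.created_at > '2015-01-01'
ORDER BY o.created_at DESC
LIMIT 20;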
If you would like to provide your SELECT, its EXPLAIN, and SHOW CREATE TABLE, we can discuss details. (But start a new question.)

MySQL EXPLAIN - it keeps giving me different explanations each time

I have a really large, complex query I'm trying to optimise using MySQL EXPLAIN SELECT or EXPLAIN EXTENDED SELECT.
If I run EXPLAIN against the query, I'll see every table in the query showing Using where in the Extra column, which is great.
No data will be changed at all, I'll go off and make a cup of tea or something, come back and re-run EXPLAIN.
This time, just a few minutes later, only 20% of the tables are Using where, the primary table is now Using index; Using temporary; Using filesort, and my day becomes a nightmare trying to debug this.
I am aware that sometimes things like temporary tables and filesorts are more efficient than using where clauses and indexes. But not in the case of this database, which is 10GB in size, and creating temporary tables and filesorts kills the server completely.
Any ideas why this would be happening? Is there logic or reason behind such a thing?!
You are using InnoDB, correct? You are using a version older than 5.6.6, correct?
You have encountered an interesting variant on InnoDB's lack of "persistent statistics". Several things used to trigger re-computing the statistics for InnoDB tables. And those statistics are used for deciding how to execute the query.
Probably your particular query was "on the fence" -- a slight change in some statistic would lead to a different query plan.
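If you can upgrade to 5.6.6 or later, you can make those statistics stick instead of being silently recomputed. A sketch of the relevant knobs (server-wide variables plus per-table options; the table name is hypothetical):
-- keep statistics on disk and stop automatic recalculation
SET GLOBAL innodb_stats_persistent = ON;
SET GLOBAL innodb_stats_auto_recalc = OFF;
-- or control it per table
ALTER TABLE mytable STATS_PERSISTENT = 1, STATS_AUTO_RECALC = 0;
-- then refresh the statistics only when you choose to
ANALYZE TABLE mytable;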
If you would like, we could dig deeper. But we need to see
SHOW CREATE TABLE
SHOW TABLE STATUS (for clues of table size)
EXPLAIN EXTENDED SELECT...
EXPLAIN FORMAT=JSON SELECT... (5.6.5 or later)
And we may be able to suggest ways to speed up the query.

Does using union or joins in a query cause slower performance in Sphinx?

I am planning to use Sphinx with MySQL for my current project.
MyISAM will be the database engine, as this DB is going to be read-only, with 10-25 million records.
So I would like to know:
Does using UNION or joins in a query cause performance issues in Sphinx?
I am about to design the database, and if UNION/joins are going to cause slower performance, then I can go for a design optimized for Sphinx.
Maybe something like creating one big table with all fields and data, and then creating separate indexes in Sphinx depending on the data to be searched.
Please guide me in the correct direction.
Thanks for your time.
Sphinx can't do joins anyway. It can do unions, by just searching multiple indexes at once.
Or do you mean in the query used to build the Sphinx index (i.e. in sql_query)? indexer will only run those queries to build the indexes in the first place.
As you say it's read-only, hence no updates, the indexes should never need rebuilding, so it doesn't really matter how slow those queries are.
In general a Sphinx index will perform very similarly regardless of how many fields it has, so you shouldn't need to split it into different indexes. Just have one multi-purpose index (if that's possible).
You can, however, shard the index into pieces, so it can be distributed to multiple servers if performance becomes an issue.
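For reference, the query indexer runs is just ordinary SQL, so joining your normalized tables into one flat document set at indexing time is perfectly fine. A hypothetical sql_query (table and column names invented for illustration):
SELECT p.id, p.title, p.body, c.name AS category, u.username AS author
FROM posts p
JOIN categories c ON c.id = p.category_id
JOIN users u ON u.id = p.user_id;
Since this runs only when the index is (re)built, its cost doesn't affect search-time performance.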

How to predict MySQL tipping points?

I work on a big web application that uses a MySQL 5.0 database with InnoDB tables. Twice over the last couple of months, we have experienced the following scenario:
The database server runs fine for weeks, with low load and few slow queries.
A frequently-executed query that previously ran quickly will suddenly start running very slowly.
Database load spikes and the site hangs.
The solution in both cases was to find the slow query in the slow query log and create a new index on the table to speed it up. After applying the index, database performance returned to normal.
What's most frustrating is that, in both cases, we had no warning about the impending doom; all of our monitoring systems (e.g., graphs of system load, CPU usage, query execution rates, slow queries) told us that the database server was in good health.
Question #1: How can we predict these kinds of tipping points or avoid them altogether?
One thing we are not doing with any regularity is running OPTIMIZE TABLE or ANALYZE TABLE. We've had a hard time finding a good rule of thumb about how often (if ever) to manually do these things. (Since these commands LOCK tables, we don't want to run them indiscriminately.) Do these scenarios sound like the result of unoptimized tables?
Question #2: Should we be manually running OPTIMIZE or ANALYZE? If so, how often?
More details about the app: database usage pattern is approximately 95% reads, 5% writes; database executes around 300 queries/second; the table used in the slow queries was the same in both cases, and has hundreds of thousands of records.
The MySQL Performance Blog is a fantastic resource. Namely, this post covers the basics of properly tuning InnoDB-specific parameters.
I've also found the PDF version of the MySQL Reference Manual to be essential. Chapter 7 covers general optimization, and section 7.5 covers server-specific optimizations you can toy with.
From the sound of your server, the query cache may be of IMMENSE value to you.
The reference manual also gives you some great detail concerning slow queries, caches, query optimization, and even disk seek analysis with indexes.
It may be worth your time to look into multi-master replication, allowing you to lock one server entirely and run OPTIMIZE/ANALYZE, without taking a performance hit (as 95% of your queries are reads, the other server could manage the writes just fine).
Section 12.5.2.5 covers OPTIMIZE TABLE in detail, and 12.5.2.1 covers ANALYZE TABLE in detail.
Update for your edits/emphasis:
Question #2 is easy to answer. From the reference manual:
OPTIMIZE:
OPTIMIZE TABLE should be used if you have deleted a large part of a table or if you have made many changes to a table with variable-length rows. [...] You can use OPTIMIZE TABLE to reclaim the unused space and to defragment the data table.
And ANALYZE:
ANALYZE TABLE analyzes and stores the key distribution for a table. [...] MySQL uses the stored key distribution to decide the order in which tables should be joined when you perform a join on something other than a constant. In addition, key distributions can be used when deciding which indexes to use for a specific table within a query.
OPTIMIZE is good to run when you have the free time. MySQL optimizes well around deleted rows, but if you go and delete 20GB of data from a table, it may be a good idea to run this. It is definitely not required for good performance in most cases.
ANALYZE is much more critical. As noted, having the needed table data available to MySQL (provided with ANALYZE) is very important when it comes to pretty much any query. It is something that should be run on a common basis.
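In practice that boils down to something like this (the table name is hypothetical; as you note, both statements lock the table while they run, so schedule them in a quiet window):
-- refresh key distribution statistics; cheap, worth doing regularly
ANALYZE TABLE hot_table;
-- rebuild and defragment; only worthwhile after large deletes or many
-- changes to variable-length rows
OPTIMIZE TABLE hot_table;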
Question #1 is a bit more of a trick. I would watch the server very carefully when this happens, namely disk I/O. My bet would be that your server is thrashing either your swap or the (InnoDB) caches. In either case, it may be query, tuning, or load related. Unoptimized tables could cause this. As mentioned, running ANALYZE can immensely help performance, and will likely help out too.
I haven't found any good way of predicting MySQL "tipping points" -- and I've run into a few.
Having said that, I've found tipping points are related to table size. But not merely raw table size, rather how big the "area of interest" is to a query. For example, in a table of over 3 million rows and about 40 columns, about three-quarters integers, most queries that would easily select a portion of them based on indices are fast. However, when one value in a query on one indexed column means two-thirds of the rows are now "interesting", the query is now about 5-times slower than normal. Lesson: try to arrange your data so such a scan isn't necessary.
However, such behaviour now gives you a size to look for. This size will be heavily dependent on your server setup, the MySQL server variables, and the table's schema and data.
Similarly, I've seen reporting queries run in reasonable time (~45 seconds) if the period is two weeks, but take half-an-hour if the period is extended to four weeks.
Use the slow query log; it will help you narrow down the queries you want to optimize.
For time-critical queries it is sometimes better to keep a stable plan by using hints.
It sounds like you have a frustrating situation and maybe not the best code review process and development environment.
Whenever you add a new query to your code you need to check that it has the appropriate indexes ready and add those with the code release.
If you don't do that your second option is to constantly monitor the slow query log and then go beat the developers; I mean go add the index.
There's an option to enable logging of queries that didn't use an index, which would be useful to you.
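If it helps, on newer MySQL versions those logging options can be turned on at runtime (the threshold value here is just an example):
-- log statements slower than 1 second
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 1;
-- also log statements that did not use any index
SET GLOBAL log_queries_not_using_indexes = ON;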
If there are some queries that "work and stop working" (but are "using an index") then it's likely that the query wasn't very good in the first place (low cardinality in the index; inefficient join; ...) and the first rule of evaluating the query carefully when it's added would apply.
For question #2 - On InnoDB "analyze table" is basically free to run, so if you have bad join performance it doesn't hurt to run it. Unless the balance of the keys in the table is changing a lot, it's unlikely to help, though. It almost always comes down to bad queries. "optimize table" rebuilds the InnoDB table; in my experience it's relatively rare that it helps enough to be worth the hassle of having the table unavailable for the duration (or doing the master-master failover dance while it's running).