MySQL queries testing WHERE clause search times - mysql

Recently I was pulled into the boss-man's office and told that one of my queries was slowing down the system. I then was told that it was because my WHERE clause began with 1 = 1. In my script I was just appending each of the search terms to the query so I added the 1 = 1 so that I could just append AND before each search term. I was told that this is causing the query to do a full table scan before proceeding to narrow the results down.
I decided to test this. We have a user table with around 14,000 records. The queries were ran five times each using both phpmyadmin and PuTTY. In phpmyadmin I limited the queries to 500 but in PuTTY there was no limit. I tried a few different basic queries and tried clocking the times on them. I found that the 1 = 1 seemed to cause the query to be faster than just a query with no WHERE clause at all. This is on a live database but it seemed the results were fairly consistent.
I was hoping to post on here and see if someone could either break down the results for me or explain to me the logic for either side of this.

Well, your boss-man and his information source are both idiots. Adding 1=1 to a query does not cause a full table scan. The only thing it does is make query parsing take a miniscule amount longer. Any decent query plan generator (including the mysql one) will realize this condition is a NOP and drop it.
I tried this on my own database (solar panel historical data), nothing interesting out of the noise.
mysql> select sum(KWHTODAY) from Samples where Timestamp >= '2010-01-01';
seconds: 5.73, 5.54, 5.65, 5.95, 5.49
mysql> select sum(KWHTODAY) from Samples where Timestamp >= '2010-01-01' and 1=1;
seconds: 6.01, 5.74, 5.83, 5.51, 5.83
Note I used ajreal's query cache disabling.

First at all, did you set session query_cache_type=off; during both testing?
Secondly, both your testing queries on PHPmyadmin and Putty (mysql client) are so different, how to verify?
You should apply same query on both site.
Also, you can not assume PHPmyadmin is query cache off. The time display on the phpmyadmin is including PHP processing, which you should avoid as well.
Therefore, you should just do the testing on mysql client instead.

This isn't a really accurate way to determine what's going on inside MySQL. Things like caching and network variations could skew your results.
You should look into using "explain" to find out what query plan MySQL is using for your queries with and without your 1=1. A DBA will be more interested in those results. Also, if your 1=1 is causing a full table scan, you will know for sure.
The explain syntax is here: http://dev.mysql.com/doc/refman/5.0/en/explain.html
How to interpret the results are here: http://dev.mysql.com/doc/refman/5.0/en/explain-output.html

Related

Debugging a MySQL query permanently affects the execution time

I am trying to debug a simple but very slow running MySQL query on a table with a JOIN to a very large table (13m rows), the large table has multiple indexes.
The join is very basic, just a join from ID on the small table to foreign_ID on the big table.
This query has been fast to run in the past, however a lot of new data has been added since then. It took 30ms to run previously, it now takes 5 minutes.
On live, I tried repairing the large table by using an alter command to set it to InnoDb. But this made no difference.
So to debug the query, I run EXPLAIN and try removing the joins etc until the query runs very quickly again.
The join types started out as ALL, eq_ref, ref and ref.
Then as I re-enable joins and to try to find a way of making it work in a performant way I find that actually now, the ORIGINAL QUERY now works quickly again.
The only thing that has changed is the query execution plan.
The join types are now range, eq_ref, eq_ref and ref.
What happened? Why is MySQL now treating this same query differently to how it did before?
And how can I make my live server do this too? And how can I stop this from happening again in the future?
EDIT: MySQL version on prod and locally is 5.7
You seem to be falling foul of a query planner bug that frequently manifests on MySQL 5.7 and later. What happens is that the query planner will decide on the wrong execution plan (indexes, join order), which results in the same query on the same data set sometimes running quickly (with the correct execution plan) or slowly (with the wrong execution plan, often resulting in a full table scan). I have seen this happen on every MySQL 5.7 and 8.0 deployment I have worked on. On MySQL 5.6 and earlier and MariaDB, this sort of behaviour from the query planner is only provocable by having an unusually large number of indexes on a table (10+). So if you have a lot of indexes on one of the tables involved, it my be worth trying to rationalize the number of them down.
Apart from keeping the number of indexes on each table as low as you reasonably can, you have two options to address this:
1) When you identify queries that encounter this bug, constrain them using index hints (USE/FORCE INDEX (index_name)) and, if necessary, STRAIGHT_JOIN to force the JOIN ordering.
2) Switch to MariaDB which doesn't seem to suffer from this problem.

MySQL JOIN Keeps Timing Out

I am currently trying to run a JOIN between two tables in a local MySQL database and it's not working. Below is the query, I am even limiting the query to 10 rows just to run a test. After running this query for 15-20 minutes, it tells me "Error Code" 2013. Lost connection to MySQL server during query". My computer is not going to sleep, and I'm not doing anything to interrupt the connection.
SELECT rd_allid.CreateDate, rd_allid.SrceId, adobe.Date, adobe.Id
FROM rd_allid JOIN adobe
ON rd_allid.SrceId = adobe.Id
LIMIT 10
The rd_allid table has 17 million rows of data and the adobe table has 10 million. I know this is a lot, but I have a strong computer. My processor is an i7 6700 3.4GHz and I have 32GB of ram. I'm also running this on a solid state drive.
Any ideas why I cannot run this query?
"Why I cannot run this query?"
There's not enough information to determine definitively what is happening. We can only make guesses and speculations. And offer some suggestions.
I suspect MySQL is attempting to materialize the entire resultset before the LIMIT 10 clause is applied. For this query, there's no optimization for the LIMIT clause.
And we might guess that there is not a suitable index for the JOIN operation, which is causing MySQL to perform a nested loops join.
We also suspect that MySQL is encountering some resource limitation which is causing the session to be terminated. Possibly filling up all space in /tmp (that usually throws an error, something like "invalid/corrupted myisam table '#tmpNNN'", something of that ilk. Or it could be some other resource constraint. Without doing an analysis, we're just guessing.
It's possible MySQL wrote something to the error log (hostname.err). I'd check there.
But whatever condition MySQL is running into (the answer to the question "Why I cannot run this query")
I'm seriously questioning the purpose of the query. Why is that query being run? Why is returning that particular resultset important?
There are several possible queries we could execute. Some of those will run a long time, and some will be much more performant.
One of the best ways to investigate query performance is to use MySQL EXPLAIN. That will show us the query execution plan, revealing the operations that MySQL will perform, and in what order, and indexes will be used.
We can make some suggestions as to some possible indexes to add, based on the query shown e.g. on adobe (id, date).
And we can make some suggestions about modifications to the query (e.g. adding a WHERE clause, using a LEFT JOIN, incorporate inline views, etc. But we don't have enough of a specification to recommend a suitable alternative.
You can try something like:
SELECT rd_allidT.CreateDate, rd_allidT.SrceId, adobe.Date, adobe.Id
FROM
(SELECT CreateDate, SrceId FROM rd_allid ORDER BY SrceId LIMIT 1000) rd_allidT
INNER JOIN
(SELECT Id FROM adobe ORDER BY Id LIMIT 1000) adobeT ON adobeT.id = rd_allidT.SrceId;
This may help you get a faster response times.
Also if you are not interested in all the relation you can also put some WHERE clauses that will be executed before the INNER JOIN making the query faster also.

I am running a large number of SELECT queries at once and it takes seconds to run. Why isn't it quicker due to MySQL caching?

I have a table with 70 rows. For learning/testing purposes I wrote out a query for each row. So I wrote:
SELECT * FROM MyTable WHERE id="id1";
SELECT * FROM MyTable WHERE id="id2";
/*etc*/
SELECT * FROM MyTable WHERE id="id70";
And ran it in Sequel Pro. All of the queries took a total of 5 seconds. This seems like a really long time since I had read that MySQL has a feature called The MySQL Query Cache. It seems like a query cache, if it is this slow, is pretty useless and I might as well write my own layer of query caching between the database layer and the frontend.
Is it correct that the MySQL query cache is this slow? Or do I need to activate something or fix something to get it to work?
Per the cache documentation, it maps the text of a select statement to the returned result. Since all of those are different, the result wouldn't be cached until they have all been executed once. Does it take just as long the second time?
5 seconds seems slow even without the cache for a normal case though. How big is the table? Is id the primary key? If it is not the PK, then the server is reading every row, and just returning the one that met the criteria you asked for.
Edit - Since you're using a hosted solution, are you running the query from something on the host network, or across the internet? If it's across the internet, then the problem is almost certainly network latency rather than execution time. Especially running the queries individually, since you'll incur transit time for each select.
If you query just based on primary key, you might as well use the memcached interface.
https://dev.mysql.com/doc/refman/5.6/en/innodb-memcached.html

Determining which columns to index in MySQL in CakePHP

I have a web app, which has quite a few queries being fired from every page. As more data was added to the DB, we noticed that the pages were taking longer and longer to load.
On examining PhpMyAdmin -> Status -> Joins, we noticed this (with the number in red):
Select_full_join 348.6 k The number of joins that do not use indexes. If this value is not 0, you should carefully check the indexes of your tables.
How do I determine which joins are causing the problems? Are all the joins equally to be blamed?
How do I determine which columns should be indexed, for the performance to be proper?
We are using CakePHP + MySQL, and the queries are all auto-generated.
The rule of thumb that I have always used, is that if I am using join, the fields that I am joining on need to be indexed.
For instance, if you have a query like the following:
SELECT t1.name, t2.salary
FROM employee AS t1
INNER JOIN info AS t2 ON t1.name = t2.name;
Both t1.name and t2.name should be indexed.
Below are some good reads for this as well:
Optimizing MySQL: Importance of JOIN Order
How to optimize MySQL JOIN queries through indexing
And in general, this guy's site has some good info as well.
MySQL Optimizer Team
Edit: This is always helpful.
And if you have access to your server settings, check out:
MySQL Slow Server Logs
Once you have a log of slow queries, you can use explain on them to see what needs indexing.
If you don't know which queries are running inefficiently, you have a couple of choices.
You could try this:
Try issuing the command SHOW FULL PROCESSLIST from phpmyadmin while your web site is active. It will show you, hopefully, a bunch of slow running queries. The FULL processlist should give you the entire query. You could then use the EXPLAIN command to figure out what it's doing.
You should also try this:
Think through the work your application is doing on behalf of your users. Think through which of your queries have to romp through lots of data to deliver value to the users. Think through which tables are growing as your application gets used more and more.
Then, find your queries that deliver that value, and that access your growing tables. Again, use the EXPLAIN command to see how MySQL is processing them, and add indexes as needed.
I suspect it will be very obvious which indexes you should add. Add the obvious ones, then let your system stabilize for a couple of workdays, then remeasure.
Notice that this is a normal part of bringing a new application into production.

Is there an effect on the speed of a query when using SQL_CALC_FOUND_ROWS in MySQL?

The other day I found the FOUND_ROWS() (here) function in MySQL and it's corresponding SQL_CALC_FOUND_ROWS option. The later looks especially useful (instead of running a second query to get the row count).
I'm wondering what speed impact there is by adding SQL_CALC_FOUND_ROWS to a query?
I'm guessing it will be much faster than runnning a second query to count the rows, but will it be a lot different. Also, I have found limiting a query to make it much faster (for example when you get the first 10 rows of 1000). Will adding SQL_CALC_FOUND_ROWS to a query with a small limit cause the query to run much slower?
I know I can test this, but I'm wondering about general practices here.
When I was at the MySQL Conference in 2008, part of one session was dedicated to exactly this - benchmarks between SQL_CALC_FOUND_ROWS and doing a separate SELECT.
I believe the result was that there was no benefit to SQL_CALC_FOUND_ROWS - it wasn't faster, in fact it may have been slower. There was also a 3rd way.
Additionally, you don't always need this information, so I would go the extra query route.
I'll try to find the slides...
Edit: Hrm, google tells me that I actually liveblogged from that session: http://beerpla.net/2008/04/16/mysql-conference-liveblogging-mysql-performance-under-a-microscope-the-tobias-and-jay-show-wednesday-200pm/. Google wins when memory fails.
To calculate SQL_CALC_FOUND_ROWS the query will be execute as if no LIMIT was set, but the result set sent to the client will obey the LIMIT.
Update: for COUNT(*) operations which would be using only the index, SQL_CALC_FOUND_ROWS is slower (reference).
I assume it would be slightly faster for queries that you need the number of rows know, but would incur and overhead for queries that you don't need to know.
The best advice I could give is to try it out on your development server and benchmark the difference. Every setup is different.
I would advise to use as few proprietary SQL extensions as possible when developing an application (or actually not using SQL queries at all). Doing a separate query is portable, and actually I don't think MySql could do better at getting the actual information than re-querying. Btw. as the page mentions the command has some drawbacks too when used in replicated environments.