I have a web app that fires quite a few queries from every page. As more data was added to the DB, we noticed that the pages were taking longer and longer to load.
On examining phpMyAdmin -> Status -> Joins, we noticed this (with the number in red):
Select_full_join 348.6 k The number of joins that do not use indexes. If this value is not 0, you should carefully check the indexes of your tables.
How do I determine which joins are causing the problems? Are all the joins equally to be blamed?
How do I determine which columns should be indexed, for the performance to be proper?
We are using CakePHP + MySQL, and the queries are all auto-generated.
The rule of thumb I have always used is that if I am using a JOIN, the fields I am joining on need to be indexed.
For instance, if you have a query like the following:
SELECT t1.name, t2.salary
FROM employee AS t1
INNER JOIN info AS t2 ON t1.name = t2.name;
Both employee.name and info.name should be indexed.
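A minimal sketch of adding those indexes (the index names are arbitrary):
ALTER TABLE employee ADD INDEX idx_employee_name (name);
ALTER TABLE info ADD INDEX idx_info_name (name);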
Below are some good reads for this as well:
Optimizing MySQL: Importance of JOIN Order
How to optimize MySQL JOIN queries through indexing
And in general, this guy's site has some good info as well.
MySQL Optimizer Team
Edit: This is always helpful.
And if you have access to your server settings, check out:
MySQL Slow Server Logs
Once you have a log of slow queries, you can use EXPLAIN on them to see what needs indexing.
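A sketch of enabling the slow query log at runtime, assuming you have privileges to change global settings (the threshold value is illustrative):
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;  -- log statements slower than 1 second
SHOW VARIABLES LIKE 'slow_query_log_file';  -- where the log is written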
If you don't know which queries are running inefficiently, you have a couple of choices.
You could try this:
Try issuing the command SHOW FULL PROCESSLIST from phpMyAdmin while your web site is active. It will show you, hopefully, a bunch of slow-running queries. The FULL processlist should give you the entire query. You could then use the EXPLAIN command to figure out what it's doing.
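For example (the tables in this SELECT are illustrative; substitute a query captured from the process list):
SHOW FULL PROCESSLIST;
-- Pick a long-running statement from the output, then ask MySQL how it executes it:
EXPLAIN SELECT o.id, c.name
FROM orders AS o
INNER JOIN customers AS c ON c.id = o.customer_id;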
You should also try this:
Think through the work your application is doing on behalf of your users. Think through which of your queries have to romp through lots of data to deliver value to the users. Think through which tables are growing as your application gets used more and more.
Then, find your queries that deliver that value, and that access your growing tables. Again, use the EXPLAIN command to see how MySQL is processing them, and add indexes as needed.
I suspect it will be very obvious which indexes you should add. Add the obvious ones, then let your system stabilize for a couple of workdays, then remeasure.
Notice that this is a normal part of bringing a new application into production.
Related
I am trying to debug a simple but very slow-running MySQL query on a table with a JOIN to a very large table (13M rows); the large table has multiple indexes.
The join is very basic, just a join from ID on the small table to foreign_ID on the big table.
This query was fast to run in the past, but a lot of new data has been added since then. It previously took 30 ms to run; it now takes 5 minutes.
On the live server, I tried rebuilding the large table by using an ALTER TABLE command to convert it to InnoDB, but this made no difference.
So to debug the query, I run EXPLAIN and try removing the joins, etc., until the query runs very quickly again.
The join types started out as ALL, eq_ref, ref and ref.
Then, as I re-enable the joins to try to find a way of making the query performant, I find that the ORIGINAL QUERY now runs quickly again.
The only thing that has changed is the query execution plan.
The join types are now range, eq_ref, eq_ref and ref.
What happened? Why is MySQL now treating this same query differently to how it did before?
And how can I make my live server do this too? And how can I stop this from happening again in the future?
EDIT: MySQL version on prod and locally is 5.7
You seem to be falling foul of a query planner bug that frequently manifests on MySQL 5.7 and later. What happens is that the query planner decides on the wrong execution plan (indexes, join order), which results in the same query on the same data set sometimes running quickly (with the correct execution plan) and sometimes slowly (with the wrong execution plan, often resulting in a full table scan). I have seen this happen on every MySQL 5.7 and 8.0 deployment I have worked on. On MySQL 5.6 and earlier, and on MariaDB, this sort of behaviour from the query planner is only provoked by having an unusually large number of indexes on a table (10+). So if you have a lot of indexes on one of the tables involved, it may be worth trying to rationalize the number of them down.
Apart from keeping the number of indexes on each table as low as you reasonably can, you have two options to address this:
1) When you identify queries that encounter this bug, constrain them using index hints (USE/FORCE INDEX (index_name)) and, if necessary, STRAIGHT_JOIN to force the JOIN ordering, as sketched after this list.
2) Switch to MariaDB which doesn't seem to suffer from this problem.
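A sketch of both hints from option 1; the table, column, and index names are illustrative:
SELECT STRAIGHT_JOIN s.id, b.value
FROM small_table AS s
INNER JOIN big_table AS b FORCE INDEX (idx_foreign_id)
  ON b.foreign_id = s.id;
Here STRAIGHT_JOIN forces MySQL to join the tables in the order written, and FORCE INDEX makes it use the named index on big_table, taking both decisions away from the planner.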
I am currently trying to run a JOIN between two tables in a local MySQL database and it's not working. Below is the query; I am even limiting the query to 10 rows just to run a test. After running this query for 15-20 minutes, it tells me "Error Code: 2013. Lost connection to MySQL server during query". My computer is not going to sleep, and I'm not doing anything to interrupt the connection.
SELECT rd_allid.CreateDate, rd_allid.SrceId, adobe.Date, adobe.Id
FROM rd_allid JOIN adobe
ON rd_allid.SrceId = adobe.Id
LIMIT 10
The rd_allid table has 17 million rows of data and the adobe table has 10 million. I know this is a lot, but I have a strong computer. My processor is an i7 6700 3.4GHz and I have 32GB of ram. I'm also running this on a solid state drive.
Any ideas why I cannot run this query?
"Why I cannot run this query?"
There's not enough information to determine definitively what is happening. We can only make guesses and speculations. And offer some suggestions.
I suspect MySQL is attempting to materialize the entire resultset before the LIMIT 10 clause is applied. For this query, there's no optimization for the LIMIT clause.
And we might guess that there is not a suitable index for the JOIN operation, which is causing MySQL to perform a nested loops join.
We also suspect that MySQL is encountering some resource limitation which is causing the session to be terminated. Possibly it is filling up all the space in /tmp (that usually throws an error, something like "invalid/corrupted myisam table '#tmpNNN'", or something of that ilk). Or it could be some other resource constraint. Without doing an analysis, we're just guessing.
It's possible MySQL wrote something to the error log (hostname.err). I'd check there.
But whatever condition MySQL is running into (the answer to the question "Why I cannot run this query"), I'm seriously questioning the purpose of the query. Why is that query being run? Why is returning that particular resultset important?
There are several possible queries we could execute. Some of those will run a long time, and some will be much more performant.
One of the best ways to investigate query performance is to use MySQL EXPLAIN. That will show us the query execution plan, revealing the operations that MySQL will perform, in what order, and which indexes will be used.
We can make some suggestions as to some possible indexes to add, based on the query shown, e.g. on adobe (Id, Date), as sketched below.
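A sketch of that suggested index (the index name is arbitrary):
ALTER TABLE adobe ADD INDEX idx_adobe_id_date (Id, Date);
With both referenced columns in the index, MySQL can satisfy the adobe side of the join from the index alone (a covering index).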
And we can make some suggestions about modifications to the query (e.g. adding a WHERE clause, using a LEFT JOIN, incorporating inline views, etc.). But we don't have enough of a specification to recommend a suitable alternative.
You can try something like:
SELECT rd_allidT.CreateDate, rd_allidT.SrceId, adobeT.Date, adobeT.Id
FROM
(SELECT CreateDate, SrceId FROM rd_allid ORDER BY SrceId LIMIT 1000) rd_allidT
INNER JOIN
(SELECT Id, Date FROM adobe ORDER BY Id LIMIT 1000) adobeT ON adobeT.Id = rd_allidT.SrceId;
This may help you get faster response times.
Also, if you are not interested in the entire relation, you can add WHERE clauses inside the derived tables; they will be applied before the INNER JOIN, which also makes the query faster.
The two queries below do the same thing. Basically they show all the IDs of table 1 which are present in table 2. The thing which puzzles me is that the simple select is way, way faster than the JOIN. I would have expected the JOIN to be a bit slower, but not by that much... 5 seconds vs. 0.2.
Can anyone elaborate on this ?
SELECT table1.id FROM
table1,table2 WHERE
table1.id=table2.id
Duration/Fetch 0.295/0.028 (MySql Workbench 5.2.47)
SELECT table1.id
FROM table1
INNER JOIN table2
ON table1.id=table2.id
Duration/Fetch 5.035/0.027 (MySql Workbench 5.2.47)
Q: Can anyone elaborate on this?
A: Before we go the "a bug in MySQL" route that @a_horse_with_no_name seems impatient to race down, we'd really need to ensure that this is repeatable behavior, and isn't just a quirk.
And to do that, we'd really need to see the elapsed time result from more than one run of the query.
If the query cache is enabled on the server, we want to run the queries with the SQL_NO_CACHE hint added (SELECT SQL_NO_CACHE table1.id ...) so we know we aren't retrieving cached results.
I'd repeat the execution of each query at least three times, and throw out the result from the first run, and average the other runs. (The purpose of this is to eliminate the impact of the table data not being in the cache, either InnoDB buffer, or the filesystem cache.)
Also, run an EXPLAIN SELECT ... for each query. And compare the access plans.
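Putting that together, a sketch of the test procedure (run each timed SELECT several times, discard the first run, and average the rest):
-- Timed runs, bypassing the query cache:
SELECT SQL_NO_CACHE table1.id FROM table1, table2 WHERE table1.id = table2.id;
SELECT SQL_NO_CACHE table1.id FROM table1 INNER JOIN table2 ON table1.id = table2.id;
-- Compare the access plans:
EXPLAIN SELECT table1.id FROM table1, table2 WHERE table1.id = table2.id;
EXPLAIN SELECT table1.id FROM table1 INNER JOIN table2 ON table1.id = table2.id;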
If either of these tables is MyISAM storage engine, note that MyISAM tables are subject to locking by DML operations; while an INSERT, UPDATE or DELETE operation is run on the table, the SELECT statements will be blocked from accessing the table. (But five seconds seems a bit much for that, unless these are really large tables, or really inefficient DML statements).
With InnoDB, the SELECT queries won't be blocked by DML operations.
Elapsed time is also going to depend on what else is going on on the system.
But the total elapsed time is going to include more than just the time in the MySQL server. Temporarily turning on the MySQL general_log would allow you to capture the statements that are actually being processed by the server.
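A sketch of toggling it at runtime, assuming you have privileges to change global settings:
SET GLOBAL general_log = 'ON';
-- ... rerun the two queries from the client ...
SET GLOBAL general_log = 'OFF';
SHOW VARIABLES LIKE 'general_log_file';  -- where the statements were captured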
This looks like something that could be further optimized by the database engine if indeed you are running both queries under the exact same context.
SQL is declarative. By successfully declaring what you want, the engine has free rein to restructure the "How" of your request to bring back the fastest result.
The earliest versions of SQL didn't even have the keyword JOIN. There was only the comma.
There are many coding constructs in SQL that imperatively force one inferior methodology over another, and they should be avoided. JOIN shouldn't be avoided, though. Something seems amiss. JOIN is the core element of SQL. It would be a shame to always have to use commas.
There are a zillion factors that go into the performance of a JOIN, all based on your environment, schema, and data. Chances are that your table1 and table2 represent a fringe case that may have gotten past the optimization algorithms.
The SQL_NO_CACHE worked; the new results are:
Duration/Fetch 5.065 / 0.027 for the select-where, and
Duration/Fetch 5.050 / 0.027 for the join.
I would have thought that the "select where" would be faster, but the join was actually a tad swifter. The difference is negligible, though.
I would like to thank everyone for their response.
I have two tables TABLE A and TABLE B.
TABLE A contains 1 million (1,000,000) records and 4 fields, while TABLE B contains 60,000 records and 3 fields.
I am running a query which joins these two tables and uses a WHERE clause to find specific products, like WHERE product LIKE '%Bags%' and product LIKE 'Bags%', etc.
When I run the query directly in phpMyAdmin it returns records in around 1 or 2 seconds. But when the queries are run from the website, they sometimes take 9 or 10 seconds according to the MySQL slow query log. My website's response was very slow at times, so upon investigation I found it was due to MySQL; that is how I came to know about the slow query log.
The slow query log consists of all SQL statements that took more than long_query_time seconds to execute and required at least min_examined_row_limit rows to be examined.
So according to that log, the "query_time" for the above query was 13 seconds, while in some cases it even exceeded 50 seconds.
Both my tables have PRIMARY KEYs as well as indexes. So I want to know how I can optimize them more, or whether there is any way to optimize MySQL settings in general?
This slowness of the website doesn't happen all the time, but sometimes (maybe once a week), and it lasts for around 1 or 2 minutes. The site gets a decent amount of traffic and there are many other queries too; the one I posted above is just one example.
Thanks
For all things MySQL and performance related, check out http://www.mysqlperformanceblog.com/
Check your queries with EXPLAIN, see here and here for info on how to use EXPLAIN as query diagnostic tool.
It's not enough to just have indexes. Do you have indexes for the fields used in the WHERE clause (including the fields you mention in ORDER BY, GROUP BY, and HAVING clauses, as well as in JOINs)? If you have grouped several fields in a single composite index, that index won't be hit unless your query filters on those fields together (MySQL uses a leftmost prefix of a composite index). If you group fields in an index, make sure the index will actually be used in your query (EXPLAIN is your friend).
That said, it could be many other things as well: poorly configured MySQL server, poorly tuned server, bad schema. But your queries and your indexes are good place to start your investigation.
Here is a nice summary of performance best practices from Jay Pipes of MySQL.
A LIKE '%Bags%' query (with a leading wildcard) cannot be optimized using ordinary indexes.
The only way to improve performance here is to use FULLTEXT indexes, or an external search engine such as Sphinx.
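A minimal sketch of the FULLTEXT approach, assuming a hypothetical products table (note that before MySQL 5.6, FULLTEXT indexes require the MyISAM storage engine):
ALTER TABLE products ADD FULLTEXT INDEX ft_product (product);  -- illustrative names
SELECT id, product FROM products
WHERE MATCH(product) AGAINST('Bags' IN NATURAL LANGUAGE MODE);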
It may be because other queries run at the same time the page of your website is refreshed. If, for example, your website runs 8-10 queries on a page refresh, that will take more time than running a single query in phpMyAdmin. And if it takes 1-1.5 minutes to execute, it may not be a query problem at all; it may be a problem with the server's speed.
You can also use a MATCH() ... AGAINST() statement to optimize this type of search query.
Otherwise, you are already using PRIMARY KEYs, indexes, and JOINs, so there is not much else to worry about. Just check it out.
Thanks.
There are many ways to optimize Databases and queries. My method is the following.
Look at the DB Schema and see if it makes sense
Most often, Databases have bad designs and are not normalized. This can greatly affect the speed of your Database. As a general case, learn the 3 Normal Forms and apply them at all times. (The forms beyond the 3rd Normal Form are stricter still; de-normalization, by contrast, deliberately breaks some of these rules to make the Database faster.)
What I suggest is to stick to the 3rd normal form unless you are a DBA (which means you know the subsequent forms and know what you're doing). De-normalization is often done at a later time, not during design.
Only query what you really need
Filter as much as possible
Your Where Clause is the most important part for optimization.
Select only the fields you need
Never use "Select *" -- Specify only the fields you need; it will be faster and will use less bandwidth.
Be careful with joins
Joins are expensive in terms of time. Make sure that you use all the keys that relate the two tables together and don't join to unused tables -- always try to join on indexed fields. The join type is important as well (INNER, OUTER,... ).
Optimize queries and stored procedures (Most Run First)
Queries are normally very fast. Generally, you can retrieve many records in less than a second, even with joins, sorting and calculations. As a rule of thumb, if your query takes longer than a second, you can probably optimize it.
Start with the Queries that are most often used as well as the Queries that take the most time to execute.
Add, remove or modify indexes
If your query does Full Table Scans, indexes and proper filtering can solve what is normally a very time-consuming process. All primary keys need indexes because they make joins faster. This also means that all tables need a primary key. You can also add indexes on fields you often use for filtering in the Where Clauses.
You especially want to use Indexes on Integers, Booleans, and Numbers. On the other hand, you probably don't want to use indexes on Blobs, VarChars and Long Strings.
Be careful with adding indexes because they need to be maintained by the database. If you do many updates on that field, maintaining indexes might take more time than it saves.
In the Internet world, read-only tables are very common. When a table is read-only, you can add indexes with less negative impact because indexes don't need to be maintained (or only rarely need maintenance).
Move Queries to Stored Procedures (SP)
Stored Procedures are usually better and faster than plain queries, for the following reasons (a minimal sketch follows this list):
Stored Procedures are compiled (SQL Code is not), making them faster than SQL code.
SPs don't use as much bandwidth because you can do many queries in one SP. SPs also stay on the server until the final results are returned.
Stored Procedures are run on the server, which is typically faster.
Calculations in code (VB, Java, C++, ...) are not as fast as SP in most cases.
It keeps your DB access code separate from your presentation layer, which makes it easier to maintain (3 tiers model).
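As referenced above, a minimal sketch of moving a query into a stored procedure; the table, column, and procedure names are all illustrative:
-- Run in the mysql client; DELIMITER lets the procedure body contain semicolons.
DELIMITER //
CREATE PROCEDURE get_products(IN p_term VARCHAR(50))
BEGIN
  SELECT id, product FROM products WHERE product LIKE CONCAT(p_term, '%');
END //
DELIMITER ;
CALL get_products('Bags');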
Remove unneeded Views
Views are a special type of Query -- they are not tables. They are logical and not physical, so every time you run SELECT * FROM MyView, you run the query that makes the view plus your query on the view.
If you always need the same information, views could be good.
If you have to filter the View, it's like running a query on a query -- it's slower.
Tune DB settings
You can tune the DB in many ways. Update statistics used by the optimizer, run optimization options, make the DB read-only, etc... That takes a broader knowledge of the DB you work with and is mostly done by the DBA.
Use Query Analysers
In many Databases, there is a tool for running and optimizing queries. SQL Server has a tool called the Query Analyser, which is very useful for optimizing. You can write queries, execute them and, more importantly, see the execution plan. You use the execution plan to understand what SQL Server does with your query.
Recently I was pulled into the boss-man's office and told that one of my queries was slowing down the system. I then was told that it was because my WHERE clause began with 1 = 1. In my script I was just appending each of the search terms to the query so I added the 1 = 1 so that I could just append AND before each search term. I was told that this is causing the query to do a full table scan before proceeding to narrow the results down.
I decided to test this. We have a user table with around 14,000 records. The queries were run five times each, using both phpMyAdmin and PuTTY. In phpMyAdmin I limited the queries to 500, but in PuTTY there was no limit. I tried a few different basic queries and tried clocking their times. I found that the 1 = 1 seemed to make the query faster than a query with no WHERE clause at all. This is on a live database, but the results seemed fairly consistent.
I was hoping to post on here and see if someone could either break down the results for me or explain to me the logic for either side of this.
Well, your boss-man and his information source are both idiots. Adding 1=1 to a query does not cause a full table scan. The only thing it does is make query parsing take a minuscule amount longer. Any decent query plan generator (including the MySQL one) will realize this condition is a NOP and drop it.
I tried this on my own database (solar panel historical data), nothing interesting out of the noise.
mysql> select sum(KWHTODAY) from Samples where Timestamp >= '2010-01-01';
seconds: 5.73, 5.54, 5.65, 5.95, 5.49
mysql> select sum(KWHTODAY) from Samples where Timestamp >= '2010-01-01' and 1=1;
seconds: 6.01, 5.74, 5.83, 5.51, 5.83
Note I used ajreal's query cache disabling.
First of all, did you SET SESSION query_cache_type = OFF; during both tests?
Secondly, your test queries in phpMyAdmin and PuTTY (the mysql client) are different, so how can you compare them? You should run the same query in both places.
Also, you cannot assume phpMyAdmin has the query cache off. The time displayed in phpMyAdmin includes PHP processing, which you should avoid as well.
Therefore, you should just do the testing in the mysql client instead.
This isn't a really accurate way to determine what's going on inside MySQL. Things like caching and network variations could skew your results.
You should look into using EXPLAIN to find out what query plan MySQL is using for your queries, with and without your 1=1. A DBA will be more interested in those results. Also, if your 1=1 is causing a full table scan, you will know for sure.
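For example, using the Samples query from the answer above (identical EXPLAIN output for both would show the optimizer discarding the 1=1 predicate):
EXPLAIN SELECT SUM(KWHTODAY) FROM Samples WHERE Timestamp >= '2010-01-01';
EXPLAIN SELECT SUM(KWHTODAY) FROM Samples WHERE Timestamp >= '2010-01-01' AND 1=1;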
The explain syntax is here: http://dev.mysql.com/doc/refman/5.0/en/explain.html
How to interpret the results are here: http://dev.mysql.com/doc/refman/5.0/en/explain-output.html