I have a set of quite complex SELECT queries which use a lot of disk space (I see this from df -h while running). Is there a way to estimate the temporary disk space required for a query before starting it?
You can use the EXPLAIN keyword to describe how your joins will effect the number of rows that will be joined together. This will also assist you in properly using keys if they are not already present. Explain will tell you when it thinks it will need to use temp tables (disk space). Based on the size of the rows being joined, you can then roughly estimate your disk space need.
See the docs on explain here:
http://dev.mysql.com/doc/refman/5.0/en/explain.html
Basically though, just prepend "Explain" to your select query to get info output. I believe you can also do this programatically if needed and use the results in your actual code, say for instance you needed to calculate(estimate) a large query run time and display it to the user before proceeding.
Related
I am trying to speed up a simple SELECT query on a table that has around 2 million entries, in a MariaDB MySQL database. It took over 1.5s until I created an index for the columns that I need, and running it through PhpMyAdmin showed a significant boost in speed (now takes around 0.09s).
The problem is, when I run it through my PHP server (mysqli), the execution time does not change at all. I'm logging my execution time by running microtime() before and after the query, and it takes ~1.5s to run it, regardless of having the index or not (tried removing/readding it to see the difference).
Query example:
SELECT `pair`, `price`, `time` FROM `live_prices` FORCE INDEX
(pairPriceTime) WHERE `time` = '2022-08-07 03:01:59';
Index created:
ALTER TABLE `live_prices` ADD INDEX pairPriceTime (pair, price, time);
Any thoughts on this? Does PHP PDO ignore indexes? Do I need to restart the server in order for it to "acknowledge" that there is a new index? (Which is a problem since I'm using a shared hosting service...)
If that is really the query, then it needs an INDEX starting with the value tested in the WHERE:
INDEX(time)
Or, to make a "covering index":
INDEX(time, pair, price)
However, I suspect that most of your accesses involve pair? If so, then other queries may need
INDEX(pair, time)
especially if you as for a range of times.
To discuss various options further, please provide EXPLAIN SELECT ...
PDO, mysqli, phpmyadmin -- These all work the same way. (A possible exception deals with an implicit LIMIT on phpmyadmin.)
Try hard to avoid the use of FORCE INDEX -- what helps on today's query and dataset may hurt on tomorrow's.
When you see puzzling anomalies in timings, run the query twice. Caching may be the explanation.
The mysql documenation says
The FORCE INDEX hint acts like USE INDEX (index_list), with the addition that a table scan is assumed to be very expensive. In other words, a table scan is used only if there is no way to use one of the named indexes to find rows in the table.
MariaDB documentation Force Index here says this
FORCE INDEX works by only considering the given indexes (like with USE_INDEX) but in addition, it tells the optimizer to regard a table scan as something very expensive. However, if none of the 'forced' indexes can be used, then a table scan will be used anyway.
Use of the index is not mandatory. Since you have only specified one condition - the time, it can choose to use some other index for the fetch. I would suggest that you use another condition for the select in the where clause or add an order by
order by pair, price, time
I ended up creating another index (just for the time column) and it did the trick, running at ~0.002s now. Setting the LIMIT clause had no effect since I was always getting 423 rows (for 423 coin pairs).
Bottom line, I probably needed a more specific index, although the weird part is that the first index worked great on PMA but not through PHP, but the second one now applies to both approaches.
Thank you all for the kind replies :)
I have been learning query optimization, increase query performance and all but in general if we create a query how can we know if this is a wise query.
I know we can see the execution time below, But this time will not give a clear indication without a good amount of data. And usually, when we create a new query we don't have much data to check.
I have learned about clauses and commands performance. But is there is anything by which we can check the performance of the query? Performance here is not execution time, it means that whether a query is "ok" or not, without data dependency.
As we cannot create that much data that would be in live database.
General performance of a query can be checked using the EXPLAIN command in MySQL. See https://dev.mysql.com/doc/refman/5.7/en/using-explain.html
It shows you how MySQL engine plans to execute the query and allows you to do some basic sanity checks i.e. if the engine will use keys and indexes to execute the query, see how MySQL will execute the joins (i.e. if foreign keys aren't missing) and many more.
You can find some general tips about how to use EXPLAIN for optimizing queries here (along with some nice samples): http://www.sitepoint.com/using-explain-to-write-better-mysql-queries/
As mentioned above, Right query is always data-dependent. Up to some level you can use the below methods to check the performance
You can use Explain to understand the Query Execution Plan and that may help you to correct some stuffs. For more info :
Refer Documentation Optimizing Queries with EXPLAIN
You can use Query Analyzer. Refer MySQL Query Analyzer
I like to throw my cookbook at Newbies because they often do not understand how important INDEXes are, or don't know some of the subtleties.
When experimenting with multiple choices of query/schema, I like to use
FLUSH STATUS;
SELECT ...;
SHOW SESSION STATUS LIKE 'Handler%';
That counts low level actions, such as "read next record". It essentially eliminates caching issues, disk speed, etc, and is very reproducible. Often there is a counter in that output (or multiple counters) that match the number of rows in the table (sometimes +/-1) -- that tells me there are table scan(s). This is usually not as good as if some INDEX were being used. If the query has a LIMIT, that value may show up in some Handler.
A really bad query, such as a CROSS JOIN, would show a value of N*M, where N and M are the row counts for the two tables.
I used the Handler technique to 'prove' that virtually all published "get me a random row" techniques require a table scan. Then I could experiment with small tables and Handlers to come up with a list of faster random routines.
Another tip when timing... Turn off the Query_cache (or use SELECT SQL_NO_CACHE).
How to find temp space used by a query? Tried with Explain and Explain extended but that doesn't show any information.
Well, one thing you could do is monitor the temp directory while you are running a query.
Howerver, there are specific types of queries/situations that will cause temp files to be created (not all).
The answer at https://dba.stackexchange.com/a/30635 tells you which kinds.
Recently I was pulled into the boss-man's office and told that one of my queries was slowing down the system. I then was told that it was because my WHERE clause began with 1 = 1. In my script I was just appending each of the search terms to the query so I added the 1 = 1 so that I could just append AND before each search term. I was told that this is causing the query to do a full table scan before proceeding to narrow the results down.
I decided to test this. We have a user table with around 14,000 records. The queries were ran five times each using both phpmyadmin and PuTTY. In phpmyadmin I limited the queries to 500 but in PuTTY there was no limit. I tried a few different basic queries and tried clocking the times on them. I found that the 1 = 1 seemed to cause the query to be faster than just a query with no WHERE clause at all. This is on a live database but it seemed the results were fairly consistent.
I was hoping to post on here and see if someone could either break down the results for me or explain to me the logic for either side of this.
Well, your boss-man and his information source are both idiots. Adding 1=1 to a query does not cause a full table scan. The only thing it does is make query parsing take a miniscule amount longer. Any decent query plan generator (including the mysql one) will realize this condition is a NOP and drop it.
I tried this on my own database (solar panel historical data), nothing interesting out of the noise.
mysql> select sum(KWHTODAY) from Samples where Timestamp >= '2010-01-01';
seconds: 5.73, 5.54, 5.65, 5.95, 5.49
mysql> select sum(KWHTODAY) from Samples where Timestamp >= '2010-01-01' and 1=1;
seconds: 6.01, 5.74, 5.83, 5.51, 5.83
Note I used ajreal's query cache disabling.
First at all, did you set session query_cache_type=off; during both testing?
Secondly, both your testing queries on PHPmyadmin and Putty (mysql client) are so different, how to verify?
You should apply same query on both site.
Also, you can not assume PHPmyadmin is query cache off. The time display on the phpmyadmin is including PHP processing, which you should avoid as well.
Therefore, you should just do the testing on mysql client instead.
This isn't a really accurate way to determine what's going on inside MySQL. Things like caching and network variations could skew your results.
You should look into using "explain" to find out what query plan MySQL is using for your queries with and without your 1=1. A DBA will be more interested in those results. Also, if your 1=1 is causing a full table scan, you will know for sure.
The explain syntax is here: http://dev.mysql.com/doc/refman/5.0/en/explain.html
How to interpret the results are here: http://dev.mysql.com/doc/refman/5.0/en/explain-output.html
I have a table of about 800 000 records. Its basically a log which I query often.
I gave condition to query only queries that were entered last month in attempt to reduce the load on a database.
My thinking is
a) if the database goes only through the first month and then returns entries, its good.
b) if the database goes through the whole database + checking the condition against every single record, it's actually worse than no condition.
What is your opinion?
How would you go about reducing load on a dbf?
If the field containing the entry date is keyed/indexed, and is used by the DB software to optimize the query, that should reduce the set of rows examined to the rows matching that date range.
That said, it's a commonly understood that you are better off optimizing queries, indexes, database server settings and hardware, in that order. Changing how you query your data can reduce the impact of a query a millionfold for a query that is badly formulated in the first place, depending on the dataset.
If there are no obvious areas for speedup in how the query itself is formulated (joins done correctly or no joins needed, or effective use of indexes), adding indexes to help your common queries would by a good next step.
If you want more information about how the database is going to execute your query; you can use the MySQL EXPLAIN command to find out. For example, that will tell you if it's able to use an index for the query.