In an application, sometimes queries are slow, and I "explain" them after the fact (if they are slow) and log them, so I can improve the application over time.
However, if I run an EXPLAIN afterward, ROW_COUNT() no longer reflects the number of rows affected by the original query, which I don't want. Is there a way to run an EXPLAIN (or perhaps any query) without changing ROW_COUNT()?
Note: What I am currently doing is to open a separate link to the database, and explain using that link. That works, but I am unable to explain queries using temporary tables in that way. I am looking for a different solution that will preserve row_count() and work with temporary tables.
Capture row_count() into a variable, if you need it later. You should probably be doing this anyway, since the scope of validity of this value is very limited.
The value is tied to the specific connection, and is reset with each query you execute... and EXPLAIN ... is a query.
There's not a way to change this behavior.
Rearrange your code...
SELECT ...
get ROW_COUNT ...
EXPLAIN SELECT ...
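In SQL, that rearrangement might look like the following sketch (table and column names are placeholders; EXPLAIN on DML statements requires MySQL 5.6.3 or later):

```sql
-- Run the original query first
UPDATE my_table SET status = 'seen' WHERE status = 'new';

-- Capture the affected-row count immediately, before anything resets it
SET @rc = ROW_COUNT();

-- EXPLAIN is itself a query and resets ROW_COUNT(), but @rc survives
EXPLAIN UPDATE my_table SET status = 'seen' WHERE status = 'new';

SELECT @rc AS original_row_count;
```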
Note also that EXPLAIN's "rows" column is an approximation; it will rarely match ROW_COUNT().
Related
I am generating some MySQL queries using PHP. In some cases, my code generates duplicate conditions in some of the queries, as a security precaution. For example, say I have a table UploadedImages, which contains images uploaded by a user and is connected to the User table via a reference. When a plain user without admin rights queries that table, I forcefully add a WHERE condition so that the query only retrieves images which belong to him.
Because of this forceful inclusion, the query I generate sometimes ends up with duplicate WHERE conditions:
SELECT * FROM UploadedImages WHERE
accounts_AccountId = '143' AND
DateUploaded > '2017-10-11 21:42:32' AND
accounts_AccountId = '143'
Should I bother cleaning up this query before running it, or will MariaDB clean it up for me? (i.e., will this query run any slower, or could it return erroneous results, if I don't remove the identical duplicate conditions beforehand?)
If your question is "Should I bother cleaning it up?", then yes: you should clean up the code that produces this, because the fact that it can include the same clause multiple times suggests the database layer is not abstracted to a particularly modern level. A well-abstracted database layer could be rewritten to use a different database provider without changing the code that depends on it; that appears not to be the case here.
If your question is "Does adding the same restriction twice slow the query?" then the answer is no, not significantly.
You can answer the question for yourself: Run EXPLAIN SELECT ... on both queries. If the output is the same, then the dup was cleaned up.
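For this particular case, that check might look like the following, using the query from the question:

```sql
-- With the duplicate condition
EXPLAIN SELECT * FROM UploadedImages
WHERE accounts_AccountId = '143'
  AND DateUploaded > '2017-10-11 21:42:32'
  AND accounts_AccountId = '143';

-- Without it
EXPLAIN SELECT * FROM UploadedImages
WHERE accounts_AccountId = '143'
  AND DateUploaded > '2017-10-11 21:42:32';

-- If both plans show the same type/key/rows values, the optimizer
-- has folded away the duplicate condition.
```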
I have been learning about query optimization and improving query performance, but in general, when we write a query, how can we tell whether it is a good one?
I know we can look at the execution time, but execution time gives no clear indication without a good amount of data, and when we create a new query we usually don't have much data to test against. We cannot realistically generate as much data as there would be in the live database.
I have learned about the performance of various clauses and commands. But is there anything that lets us check the performance of a query? Performance here does not mean execution time; it means whether a query is "OK" or not, independent of the data.
General performance of a query can be checked using the EXPLAIN command in MySQL. See https://dev.mysql.com/doc/refman/5.7/en/using-explain.html
It shows you how the MySQL engine plans to execute the query and allows you to do some basic sanity checks, e.g., whether the engine will use keys and indexes to execute the query, how MySQL will execute the joins (e.g., whether foreign keys are missing), and more.
You can find some general tips about how to use EXPLAIN for optimizing queries here (along with some nice samples): http://www.sitepoint.com/using-explain-to-write-better-mysql-queries/
As mentioned above, what counts as the right query is always data-dependent. Up to a point, you can use the methods below to check performance.
You can use EXPLAIN to understand the query execution plan, which may help you fix some issues. For more info:
Refer Documentation Optimizing Queries with EXPLAIN
You can use Query Analyzer. Refer MySQL Query Analyzer
I like to throw my cookbook at Newbies because they often do not understand how important INDEXes are, or don't know some of the subtleties.
When experimenting with multiple choices of query/schema, I like to use
FLUSH STATUS;
SELECT ...;
SHOW SESSION STATUS LIKE 'Handler%';
That counts low-level actions, such as "read next record". It essentially eliminates caching issues, disk speed, etc., and is very reproducible. Often a counter in that output (or several counters) matches the number of rows in the table (sometimes +/-1); that tells me there are table scan(s). This is usually not as good as if some INDEX were being used. If the query has a LIMIT, that value may show up in some Handler counter.
A really bad query, such as a CROSS JOIN, would show a value of N*M, where N and M are the row counts for the two tables.
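A concrete session following this recipe might look like the following (t1 and t2 are hypothetical tables; the exact counter values depend on your data):

```sql
FLUSH STATUS;                     -- zero the session counters

-- A deliberate cross join: no ON clause, so every row pairs with every row
SELECT COUNT(*) FROM t1 JOIN t2;

SHOW SESSION STATUS LIKE 'Handler%';
-- Expect Handler_read_rnd_next (or a similar counter) on the order of
-- N*M, where N and M are the row counts of t1 and t2.
```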
I used the Handler technique to 'prove' that virtually all published "get me a random row" techniques require a table scan. Then I could experiment with small tables and Handlers to come up with a list of faster random routines.
Another tip when timing... Turn off the Query_cache (or use SELECT SQL_NO_CACHE).
I am using MySQL as the database. I need to update some data; however, the data may not have changed, in which case I don't need to update the row.
I wanted to know which one will be better (performance wise):
a) Search the table to determine if the data has changed. For example I can search by primary key and then see if the value of remaining fields have changed or not. If yes, then continue with update statement and if not then leave it.
b) Use UPDATE query directly. If there are no changes in the data, MySQL will automatically ignore it and not process updating the data.
So which one will perform better in such a case?
From the MySQL manual:
If you set a column to the value it currently has, MySQL notices this
and does not update it.
So save yourself the latency and leave that task to MySQL. It will even tell you how many rows were actually affected.
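A quick sketch of this behavior (table and column names are hypothetical):

```sql
-- Suppose users.email is already 'a@example.com' for id = 1
UPDATE users SET email = 'a@example.com' WHERE id = 1;
-- The client reports: Rows matched: 1  Changed: 0

SELECT ROW_COUNT();  -- 0: no row was actually changed
```

Note that ROW_COUNT() reports rows changed, not rows matched, unless the connection was opened with the CLIENT_FOUND_ROWS flag.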
The first option seems better to me, but only in a specific scenario:
Select all or some rows from the table and get them into a result set.
Traverse the result set (in-memory traversal is fast enough), pick out the primary keys of the records you want to update, and then execute the update queries.
This seems comparatively efficient to me, compared to executing an update query on every row regardless of whether it needs one.
If you use the select-then-update approach, you need to lock the row (e.g. SELECT ... FOR UPDATE); otherwise you are in a race condition: the row can be changed after you selected it and checked that it hadn't been changed.
As @AndreKR pointed out, MySQL won't perform any write operation if the values are the same, so using UPDATE directly is faster than using two queries.
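The locking approach mentioned above can be sketched as follows (assuming a hypothetical users table):

```sql
START TRANSACTION;

-- Lock the row so it cannot change between the check and the update
SELECT email FROM users WHERE id = 1 FOR UPDATE;

-- ...compare the value in application code, then update only if needed:
UPDATE users SET email = 'new@example.com' WHERE id = 1;

COMMIT;  -- releases the row lock
```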
Recently I was pulled into the boss-man's office and told that one of my queries was slowing down the system. I then was told that it was because my WHERE clause began with 1 = 1. In my script I was just appending each of the search terms to the query so I added the 1 = 1 so that I could just append AND before each search term. I was told that this is causing the query to do a full table scan before proceeding to narrow the results down.
I decided to test this. We have a user table with around 14,000 records. The queries were run five times each, using both phpMyAdmin and PuTTY. In phpMyAdmin I limited the queries to 500 rows, but in PuTTY there was no limit. I tried a few different basic queries and clocked their times. I found that 1 = 1 seemed to make the query faster than a query with no WHERE clause at all. This was on a live database, but the results seemed fairly consistent.
I was hoping to post on here and see if someone could either break down the results for me or explain to me the logic for either side of this.
Well, your boss-man and his information source are both idiots. Adding 1=1 to a query does not cause a full table scan. The only thing it does is make query parsing take a minuscule amount longer. Any decent query-plan generator (including MySQL's) will realize this condition is a no-op and drop it.
I tried this on my own database (solar panel historical data), nothing interesting out of the noise.
mysql> select sum(KWHTODAY) from Samples where Timestamp >= '2010-01-01';
seconds: 5.73, 5.54, 5.65, 5.95, 5.49
mysql> select sum(KWHTODAY) from Samples where Timestamp >= '2010-01-01' and 1=1;
seconds: 6.01, 5.74, 5.83, 5.51, 5.83
Note I used ajreal's query cache disabling.
First of all, did you SET SESSION query_cache_type = OFF; during both tests?
Secondly, the test queries you ran in phpMyAdmin and PuTTY (the mysql client) were different, so how can you compare them? You should run the same query in both places.
Also, you cannot assume phpMyAdmin has the query cache off, and the time phpMyAdmin displays includes PHP processing, which you should avoid as well.
Therefore, you should just do the testing in the mysql client instead.
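A cache-free timing run in the mysql client, reusing the Samples query from the other answer, might look like this (the query cache exists only in MySQL 5.x; it was removed in 8.0):

```sql
SET SESSION query_cache_type = OFF;   -- don't serve results from the cache

SELECT SQL_NO_CACHE sum(KWHTODAY)     -- also don't store this result in it
FROM Samples
WHERE Timestamp >= '2010-01-01';
```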
This isn't a really accurate way to determine what's going on inside MySQL. Things like caching and network variations could skew your results.
You should look into using "explain" to find out what query plan MySQL is using for your queries with and without your 1=1. A DBA will be more interested in those results. Also, if your 1=1 is causing a full table scan, you will know for sure.
The explain syntax is here: http://dev.mysql.com/doc/refman/5.0/en/explain.html
How to interpret the results are here: http://dev.mysql.com/doc/refman/5.0/en/explain-output.html
The other day I found the FOUND_ROWS() function in MySQL (here) and its corresponding SQL_CALC_FOUND_ROWS option. The latter looks especially useful (instead of running a second query to get the row count).
I'm wondering what speed impact there is from adding SQL_CALC_FOUND_ROWS to a query.
I'm guessing it will be much faster than running a second query to count the rows, but will the difference be significant? Also, I have found that limiting a query makes it much faster (for example, when you fetch the first 10 rows of 1000). Will adding SQL_CALC_FOUND_ROWS to a query with a small LIMIT make it run much slower?
I know I can test this, but I'm wondering about general practices here.
When I was at the MySQL Conference in 2008, part of one session was dedicated to exactly this - benchmarks between SQL_CALC_FOUND_ROWS and doing a separate SELECT.
I believe the result was that there was no benefit to SQL_CALC_FOUND_ROWS - it wasn't faster, in fact it may have been slower. There was also a 3rd way.
Additionally, you don't always need this information, so I would go the extra query route.
I'll try to find the slides...
Edit: Hrm, google tells me that I actually liveblogged from that session: http://beerpla.net/2008/04/16/mysql-conference-liveblogging-mysql-performance-under-a-microscope-the-tobias-and-jay-show-wednesday-200pm/. Google wins when memory fails.
With SQL_CALC_FOUND_ROWS, the query will be executed as if no LIMIT were set, but the result set sent to the client will obey the LIMIT.
Update: for COUNT(*) operations which would be using only the index, SQL_CALC_FOUND_ROWS is slower (reference).
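The two approaches being compared look like this (the table name is a placeholder; note that SQL_CALC_FOUND_ROWS and FOUND_ROWS() are deprecated as of MySQL 8.0.17):

```sql
-- Approach 1: one query plus FOUND_ROWS()
SELECT SQL_CALC_FOUND_ROWS * FROM items ORDER BY id LIMIT 10;
SELECT FOUND_ROWS();        -- total rows the query would have returned

-- Approach 2: two separate queries
SELECT * FROM items ORDER BY id LIMIT 10;
SELECT COUNT(*) FROM items; -- can often be satisfied from an index alone
```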
I assume it would be slightly faster for queries where you need to know the number of rows, but would incur overhead for queries where you don't.
The best advice I could give is to try it out on your development server and benchmark the difference. Every setup is different.
I would advise using as few proprietary SQL extensions as possible when developing an application (or not using raw SQL queries at all). Doing a separate query is portable, and I don't actually think MySQL can get that information any better than by re-querying. By the way, as the linked page mentions, the command also has some drawbacks when used in replicated environments.