phpMyAdmin - Query Execution Time - mysql

When I execute a simple statement in phpMyAdmin like
SELECT *
FROM a
where "a" has 500'000 rows, it gives me a time of a few milliseconds on my localhost.
Some complex queries report times that are way longer (as expected), but some queries report also very fast times < 1/100s but the result page in phpMyAdmin takes way longer to display.
So I'm unsure, is the displayed execution time really true and accurate in phpMyAdmin? How is it measured? Does it measure the whole query with all subselects, joins etc?
Thanks!
UPDATE
I thought it'd be a good idea to test from my own PHP-script like:
$start = microtime(true);
$sql = "same statement as in phpMyAdmin";
$db = new PDO('mysql:host=localhost;dbname=mydb', 'root', 'lala');
$statement = $db->prepare($sql);
$statement->execute();
echo microtime(true) - $start . ' seconds';
and that takes more than 7 seconds, compared to a reported time of 0.005 s in phpMyAdmin for the same statement.
The query returns 300,000 rows; if I restrict it to 50 with "LIMIT 0,50" it takes under 1/100 s. Where does that difference come from? I don't iterate over the returned objects or anything...

The displayed execution time is how long the query took to run on the server - it's accurate and comes from the MySQL engine itself. Unfortunately, the results have to then be sent over the web to your browser to be displayed, which takes a lot longer.
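If you want to see that server-side time outside of phpMyAdmin, MySQL's session profiling reports it directly. A quick sketch (SHOW PROFILES is deprecated in favor of the Performance Schema from MySQL 5.6.7 on, but still works):
SET profiling = 1;  -- enable profiling for this session
SELECT * FROM a;    -- run the statement under test
SHOW PROFILES;      -- the Duration column is the server-side execution time only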

phpMyAdmin automatically appends a LIMIT clause to your statement, so it has a smaller result set to return, which makes it faster.
Even if you need all 300,000 or 500,000 results, you should really use a LIMIT. Multiple smaller queries do not necessarily take the same total time as a single big query.
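For example, instead of pulling all 300,000 rows in one go, you could walk the result in chunks (a sketch; the chunk size of 10,000 is arbitrary):
SELECT * FROM a LIMIT 0, 10000;      -- first chunk
SELECT * FROM a LIMIT 10000, 10000;  -- second chunk, and so on
Note that large OFFSET values get progressively slower, so for very deep scans, paging on an indexed key column scales better.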

Besides splash21's answer, it is a good idea to use SQL_NO_CACHE when testing for execution time. This makes sure that you are looking at the real time to do the query, not just grabbing a cached result.
SELECT SQL_NO_CACHE *
FROM a

No, phpMyAdmin is not telling the truth.
Imagine you have a pretty big table, let's say a million rows. Now you do a select, something you know is going to take a while:
SELECT * FROM bigTable WHERE value > 1234
...and PMA will report (after some waiting) that the query took some 0.0045 seconds. This is not the whole query time, this is the time of getting the first 25 hits. Or 50, or whatever you set the page size to. So it's obviously fast - it stops as soon as you get the first screenful of rows. But you will notice that it gives you this deceptive result after looong seconds; it's because MySQL needs to really do the job, in order to determine which rows to return. It runs the whole query, and then it takes another look and returns only a few rows. That's what you get the time of.
How to get the real time?
Do a count() with the same conditions.
SELECT COUNT(1) FROM bigTable WHERE value > 1234
You will get ONE row telling you the total number of rows, and naturally, PMA will display the exact time needed for this. It has to, because now the first page and the whole result are the same thing.

Related

MySQL query execution time is fast, but fetching rows is slow?

When I run this query on phpmyadmin, it takes about 30 seconds to fetch the results, but once the results have successfully loaded, it says "Query took 0.5029 seconds".
Why does it say 'Query took 0.50 seconds' if the results take 30 seconds to load?
My Query:
SELECT * FROM `documents` WHERE disable=0 AND author=7 AND MATCH(text) AGAINST('"chocolate"')
The field I am searching (named "text") has a field type of "mediumtext", and each text row contains about 200kb of text. The total size of the table is 15,000 rows and 1.5GB of text.
Does anyone know what causes this to happen?
I am going to expand on my comment.
When a database reports on the time to complete a query, that is generally the time only within the database. It might or might not include the time to compile the query. It does not include the time to return the results.
Your data rows are quite wide because of the text column. So you have a situation where running the query in the database is quite fast, but the resulting rows are very big, so it takes a lot of time to return them to the user.
Perhaps further complicating the timing is that you might be looking at when all the rows are returned rather than just for the first row to return (that is also a confusion with timings sometimes).
In any case, if you don't need the wide columns, just select the columns that you do need. That has little effect on the query processing time, but it could have a big impact on the time to return the results.
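For example, for the query above you could select only the narrow columns and leave the wide text column out (a sketch; the id column name is an assumption about the table):
SELECT id, author
FROM `documents`
WHERE disable=0 AND author=7 AND MATCH(text) AGAINST('"chocolate"')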
Rule #1: SELECT * FROM anytable is a BAD idea.
Just ask #supercoolville for clarification.

Does using LIMIT on the main query affect the sub-queries?

I have this query :
SELECT *, (SELECT COUNT(id) FROM riverLikes
           WHERE riverLikes.river_id = River.id) AS likeCounts
FROM River
WHERE user_id IN (1,2,3)
LIMIT 10
My question is: does my sub-query run only 10 times (once for each row that is fetched), or does it run for every row in the "River" table?
My "River" table has lots of records and I'd like the best performance when fetching the rivers.
Thanks.
In general, calculated data (either subqueries or functions) is computed only for the rows that matter: rows that are returned, or rows for which the outcome of the calculation is relevant to further filtering or grouping.
In addition, the query optimizer may do all kinds of magic, and it is unlikely that it will run the subquery many times as such. It can be transformed in such a way that all relevant information is fetched at once.
And even if it didn't do that, it all takes place within the same operation in the database SQL engine, so executing this subselect 10 times inside one query is way, way faster than executing it as a separate SELECT 10 times: the SQL engine only has to parse and prepare it once, and it doesn't suffer from round-trip times.
A simple select like that could easily take 30 milliseconds or so when executed from PHP, so quick math would suggest that it'd take 300ms extra to have this subselect in a 10-row query, but that's not the case, because the lion's share of those 30ms is overhead of communication between PHP and the database.
Because of the reasons mentioned above, this subselect is possibly way faster than a join, and it's a common misconception that a join is (almost) always faster.
So, to get back to your example, the subquery won't be executed for all rows in River, but will only be executed, probably in optimized form, for those 10 records of Rivers 1, 2 and 3.
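For comparison, the join the answer refers to would look something like this (a sketch only; it assumes River.id is the table's primary key, and per the above it won't necessarily be faster):
SELECT River.*, COUNT(riverLikes.id) AS likeCounts
FROM River
LEFT JOIN riverLikes ON riverLikes.river_id = River.id
WHERE River.user_id IN (1,2,3)
GROUP BY River.id
LIMIT 10;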
In most production-ready RDBMSs, the subquery will be run only for rows included in the result set, i.e. only 10 times in your case. I think this is true for MySQL too.
EDIT:
To make sure, run
EXPLAIN <your query>
and view the execution plan of your query.
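For the query in the question, that would be (the select_type column of the output shows how MySQL classifies the subquery, e.g. DEPENDENT SUBQUERY):
EXPLAIN SELECT *, (SELECT COUNT(id) FROM riverLikes
                   WHERE riverLikes.river_id = River.id) AS likeCounts
FROM River
WHERE user_id IN (1,2,3)
LIMIT 10;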
The subquery in the SELECT statement runs one time per row returned; in your sample, 10 times.

Indexes Show No Improvement In Speed

I have a table with about 22 million rows and about 20 columns containing property data. Currently a query like:
SELECT * FROM fulldataset WHERE county = 'MIDDLESBROUGH'
takes an average of 42 seconds to run. To try and improve this, I created an index on the county column like this:
ALTER TABLE fulldataset ADD INDEX county (county)
There has been no improvement at all in the speed of the same query.
So I used EXPLAIN SELECT to try and find out what was happening. If I SELECT * for county A, it returns around 85k entries after ~42 seconds. If I EXPLAIN SELECT the same query, it says it's using the county index I created and estimates around 167k rows, which is wrong but better than searching all 22 million.
Likewise, if I SELECT * for countyB I get around 48k results and EXPLAIN SELECT tells me there are around 91k rows. The EXPLAIN SELECT statement returns the result instantly, so it's able to instantly tell that there are around half as many entries for countyB as there are for countyA. The problem is the queries don't execute any faster. If it's only checking 91k rows shouldn't it be very quick?
EDIT: As pointed out, the query itself is not what is taking time. In answer to my own question in the comments, a multiple column index worked wonders.
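For reference, a multiple-column (composite) index simply lists several columns in one index. A hypothetical example, assuming the query also filters or sorts on a price column (the column name is invented here):
ALTER TABLE fulldataset ADD INDEX county_price (county, price);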
The query is not the problem. If you look closely at the output of your program you will see that the query execution took less than 1s, but fetching all the rows took 42s.
If you have to wait 42s before you see anything then I recommend to use another querying tool which only fetches the first X rows and displays them in pages.
EXPLAIN is designed to be fast. In doing so, the calculation of "Rows" is only a crude estimate. It can often be off by a factor of 2. So don't read too much into 85K vs 167K.
Since EXPLAIN is delivering only a single row (or a small number of rows), the "fetch" time is very low.
If you are selecting the AVG() of some column, it has to first read all the relevant rows, doing the computation as it goes. It cannot even start to deliver data until it has finished all the reading.
If you are reading all the rows, it can (but I am not sure that it does) start delivering rows starting with the first row.
If you do something like SELECT * FROM tbl ORDER BY x (and x is not indexed), then you get the worst of both worlds. First it has to read all the rows and write them to a temp table, then it sorts that temp table; only then can it begin to fetch the rows.
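In that worst case, an index on the sort column can, in principle, let MySQL read rows already in order and start streaming them immediately, though the optimizer may still prefer a filesort for a full-table scan. A hypothetical sketch:
ALTER TABLE tbl ADD INDEX idx_x (x);
EXPLAIN SELECT * FROM tbl ORDER BY x;  -- check whether "Using filesort" disappears from the Extra column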
I think "duration" and "fetch" are not very useful; the sum of the two is more useful. Here's another example of it: Mysql same querys one with index second without getting 10000xFetch time?
Notice how the sum is consistent, but the separation is not.

MySQL: SHOW PROFILES - Want to see more than 100 records

I am performance profiling my queries and need to be able to see a lot more than the last 100 queries. Max profiling_history_size is 100, but I've seen debug tools that somehow manage to save more than the last 100 queries (for example, django debug tool).
I do
SET profiling=1;
SET profiling_history_size = 100;
SHOW PROFILES;
It would be fine if I could move the records to another table. Or maybe I need to be looking somewhere else altogether?
My program runs the same queries a lot of times, so what I really want is an aggregate of all the times that a particular query was executed. I was going to do some kind of GROUP BY once I had all the queries, but maybe there is some other place to look? (I don't mean to ask 2 questions, but maybe knowing what I eventually need will change the answer to the above question.)
With MySQL 5.6, the statements (queries) are instrumented by the performance schema.
You get the time a query takes, plus many more attributes such as the number of rows returned, etc, and you get many different aggregations.
See
http://dev.mysql.com/doc/refman/5.6/en/statement-summary-tables.html
http://dev.mysql.com/doc/refman/5.6/en/performance-schema-statements-tables.html
An aggregation that is particularly useful is the aggregation by digest, to group several "similar queries" together, where similar means the same structure but possibly different literal values ("select * from t1 where id = 1" and "select * from t1 where id = 2" will be aggregated together as "select * from t1 where id = ?")
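For example, the digest summary table can answer the "aggregate of all the times a particular query was executed" need directly (a sketch; the table and column names are documented in the 5.6 manual pages above):
SELECT DIGEST_TEXT, COUNT_STAR, SUM_TIMER_WAIT
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;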
If you need information for more than the last 100 queries, you will simply need to collect the profiling data every 100 queries. I have implemented this kind of collection in a PHP PDO wrapper:
https://github.com/gajus/doll
This is an application layer solution.

MySQL pagination without double-querying?

I was wondering if there was a way to get the number of results from a MySQL query, and at the same time limit the results.
The way pagination works (as I understand it) is to first do something like:
query = SELECT COUNT(*) FROM `table` WHERE `some_condition`
After I get the num_rows(query), I have the number of results. But then to actually limit my results, I have to do a second query:
query2 = SELECT * FROM `table` WHERE `some_condition` LIMIT 0, 10
Is there any way to both retrieve the total number of results that would be given, AND limit the results returned in a single query? Or are there any other efficient ways of achieving this?
I almost never do two queries.
Simply return one more row than is needed, only display 10 on the page, and if there are more than are displayed, display a "Next" button.
SELECT x, y, z FROM `table` WHERE `some_condition` LIMIT 0, 11
// Iterate through and display 10 rows.
// if there were 11 rows, display a "Next" button.
Your query should return results in order of relevance first; chances are most people aren't going to care about going to page 236 out of 412.
When you do a google search and your results aren't on the first page, you likely go to page two, not nine.
No, that's how many applications that want to paginate have to do it. It's reliable and bullet-proof, albeit it runs the query twice; but you can cache the count for a few seconds and that will help a lot.
The other way is to use the SQL_CALC_FOUND_ROWS modifier and then call SELECT FOUND_ROWS(). Apart from the fact that you have to put the FOUND_ROWS() call afterwards, there is a problem with this: there is a bug in MySQL that this tickles, affecting ORDER BY queries and making it much slower on large tables than the naive approach of two queries.
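For reference, the pattern looks like this (note that a later answer points out SQL_CALC_FOUND_ROWS is deprecated as of MySQL 8.0.17):
SELECT SQL_CALC_FOUND_ROWS * FROM `table` WHERE `some_condition` LIMIT 0, 10;
SELECT FOUND_ROWS();  -- total rows the first query would have matched without the LIMIT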
Another approach to avoiding double-querying is to fetch all the rows for the current page using a LIMIT clause first, then only do a second COUNT(*) query if the maximum number of rows were retrieved.
In many applications, the most likely outcome will be that all of the results fit on one page, and having to do pagination is the exception rather than the norm. In these cases, the first query will not retrieve the maximum number of results.
For example, answers on a Stackoverflow question rarely spill onto a second page. Comments on an answer rarely spill over the limit of 5 or so required to show them all.
So in these applications you can simply just do a query with a LIMIT first, and then as long as that limit is not reached, you know exactly how many rows there are without the need to do a second COUNT(*) query - which should cover the majority of situations.
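A sketch of that pattern (the row-count check itself happens in application code):
-- Page query: if it returns fewer than 10 rows, the total is offset + rows fetched.
SELECT * FROM `table` WHERE `some_condition` LIMIT 0, 10;
-- Only if all 10 rows came back:
SELECT COUNT(*) FROM `table` WHERE `some_condition`;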
In most situations it is much faster and less resource intensive to do it in two separate queries than to do it in one, even though that seems counter-intuitive.
If you use SQL_CALC_FOUND_ROWS, then for large tables it makes your query much slower, significantly slower even than executing two queries, the first with a COUNT(*) and the second with a LIMIT. The reason for this is that SQL_CALC_FOUND_ROWS causes the LIMIT clause to be applied after fetching the rows instead of before, so it fetches the entire row for all possible results before applying the limits. This can't be satisfied by an index because it actually fetches the data.
If you take the two-queries approach, the first one only fetches COUNT(*) and no actual data, so it can be satisfied much more quickly because it can usually use indexes and doesn't have to fetch the actual row data for every row it looks at. Then, the second query only needs to look at the first $offset + $limit rows and then return.
This post from the MySQL performance blog explains this further:
http://www.mysqlperformanceblog.com/2007/08/28/to-sql_calc_found_rows-or-not-to-sql_calc_found_rows/
For more information on optimising pagination, check this post and this post.
For anyone looking for an answer in 2020. As per MySQL documentation:
The SQL_CALC_FOUND_ROWS query modifier and accompanying FOUND_ROWS() function are deprecated as of MySQL 8.0.17 and will be removed in a future MySQL version. As a replacement, consider executing your query with LIMIT, and then a second query with COUNT(*) and without LIMIT to determine whether there are additional rows.
I guess that settles that.
My answer may be late, but you can skip the second query (with the limit) and just filter the info through your back end script. In PHP for instance, you could do something like:
if (count($queryResult) > 0) {
    $counter = 0;
    foreach ($queryResult as $result) {
        // only act on rows inside the current page window
        if ($counter >= $startAt && $counter < $startAt + $numOfRows) {
            // do what you want here
        }
        $counter++;
    }
}
But of course, when you have thousands of records to consider, it becomes inefficient very fast. A pre-calculated count may be a good idea to look into.
Here's a good read on the subject:
http://www.percona.com/ppc2009/PPC2009_mysql_pagination.pdf
SELECT col, col2, (SELECT COUNT(*) FROM `table` WHERE `some_condition`) / 10 AS total FROM `table` WHERE `some_condition` LIMIT 0, 10
Where 10 is the page size and 0 is the offset; for page N, use (N - 1) * 10 as the offset.
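For example, page 3 with a page size of 10 uses offset (3 - 1) * 10 = 20:
SELECT col, col2, (SELECT COUNT(*) FROM `table` WHERE `some_condition`) / 10 AS total FROM `table` WHERE `some_condition` LIMIT 20, 10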
You can reuse most of the query in a subquery and alias it to an identifier. For example, a movie query that finds movies containing the letter 's', ordered by runtime, looks like this on my site.
SELECT Movie.*, (
SELECT Count(1) FROM Movie
INNER JOIN MovieGenre
ON MovieGenre.MovieId = Movie.Id AND MovieGenre.GenreId = 11
WHERE Title LIKE '%s%'
) AS Count FROM Movie
INNER JOIN MovieGenre
ON MovieGenre.MovieId = Movie.Id AND MovieGenre.GenreId = 11
WHERE Title LIKE '%s%' LIMIT 8;
Do note that I'm not a database expert, and am hoping someone will be able to optimize that a bit better. As it stands running it straight from the SQL command line interface they both take ~0.02 seconds on my laptop.
SELECT *
FROM table
WHERE some_condition
ORDER BY RAND()
LIMIT 0, 10