Why does the following query take so much time in MySQL but not in Oracle?
select * from (select * from employee) as a limit 1000
I tested this query in Oracle and MySQL databases with 5,000,000 records in the table.
I know that this query should be written like
select * from employee limit 1000
But for displaying data and the total number of rows in our custom dynamic grid, we have only one query: we use a simple select * from employee query and then add limit or other conditions. That is how we worked around this problem.
But my question is: why does such a query take so much time in MySQL?
Well, because to perform this query MySQL has to go over all 5,000,000 rows in the table, copy them to a temporary table (in memory or on disk, depending on size), and then take the first 1000.
In MySQL, if you want to optimize queries with LIMIT, you need to follow some practices that prevent scanning the full data set (mainly indexes on the column you sort by).
See here: http://dev.mysql.com/doc/refman/5.0/en/limit-optimization.html
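A quick way to see the difference is to compare the two plans (a sketch; the DERIVED step appears on MySQL versions before 5.7, which later merge such subqueries away):

EXPLAIN SELECT * FROM (SELECT * FROM employee) AS a LIMIT 1000;
-- Older MySQL (5.0-5.6) shows a DERIVED step here: the inner SELECT is
-- materialized into a temporary table before LIMIT is applied.

EXPLAIN SELECT * FROM employee LIMIT 1000;
-- A plain scan that can stop as soon as 1000 rows have been produced.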
Related
I have a view with a duration of ~0.2 seconds when I do a simple SELECT * from it, but ~25 seconds when I simply do SELECT COUNT(*) from it. What would cause this? It seems that if it takes 0.2 seconds to compute the output data, then a simple length calculation on that dataset should take a trivial amount of time. MySQL 5.7. Details below.
mysql> select count(*) from Lots;
+----------+
| count(*) |
+----------+
|  4136666 |
+----------+
1 row in set (25.29 sec)
In MySQL Workbench, the following query produces durations like 0.217 sec:
select * from Lots;
The fetch time is significant given the amount of data, but my understanding is the "Duration" is how long it takes to compute the output dataset of the view.
Definition of Lots view:
select
lot.*,
coalesce(overrides.streetNumber, address.streetNumber, lot.rawStreetNumber) as streetNumber,
coalesce(overrides.street, address.street, lot.rawStreet) as street,
coalesce(overrides.postalCode, address.postalCode, lot.rawPostalCode) as postalCode,
coalesce(overrides.city, address.city, lot.rawCity) as city
from LotsData lot
left join Address address on address.lotNumber = lot.lotNumber
left join Override overrides on overrides.lotId = lot.lotNumber
The data in VIEW objects isn't materialized. That is, it doesn't exist in any sort of tabular form on your database server. Rather, the server pulls it together from its tables when a query (like your COUNT query) references the VIEW. So, there's no simple metadata hanging around in the server that can satisfy your COUNT query instantaneously. The server has to pull together all your joined tables to generate a row count. That takes a while. Remember, your database server may have other clients concurrently INSERTing or DELETEing rows into one or more of the tables in your view.
It's worse than that. In the InnoDB storage engine, even COUNTing the rows of a table is slow. To achieve high concurrency InnoDB doesn't attempt to store any kind of precise row count. So the database server has to count those rows one-by-one as well. (The older MyISAM storage engine does maintain precise row count metadata for tables, but it offers less concurrency.)
Wise data programmers avoid using COUNT(*) on whole tables or views composed from them in production for those reasons.
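If an approximate figure is acceptable, the optimizer's statistics can be read back instantly instead (a sketch; this reads the estimate for the base table LotsData, not the view, and on InnoDB the estimate can be off by a sizeable margin):

SELECT TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'your_schema'  -- placeholder schema name
  AND TABLE_NAME = 'LotsData';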
The real question is why your SELECT * FROM view is so fast. It's unlikely that your database server can compose and deliver a 4-megarow view from its JOINs in less than a second, nor is it likely that Workbench can absorb that many rows in that time. As @ysth said, many GUI-based SQL client programs, like Workbench and HeidiSQL, sometimes silently append something like LIMIT 1000 to interactive operations calling for the display of whole tables or views. You might look for evidence of that.
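One quick way to test that theory is to make the limit explicit and compare durations (a sketch):

-- If this also finishes in ~0.2 sec, the earlier "fast" SELECT * was
-- almost certainly being limited silently by the client.
SELECT * FROM Lots LIMIT 1000;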
I had a problem with ORDER BY when joining multiple tables that have millions of rows, but I found a solution in the following question: instead of a join with DISTINCT, using EXISTS improves performance.
How to improve order by performance with joins in mysql
SELECT
`tracked_twitter` . *,
COUNT( * ) AS twitterContentCount,
retweet_count + favourite_count + reply_count AS engagement
FROM
`tracked_twitter`
INNER JOIN
`twitter_content`
ON `tracked_twitter`.`id` = `twitter_content`.`tracked_twitter_id`
INNER JOIN
`tracker_twitter_content`
ON `twitter_content`.`id` = `tracker_twitter_content`.`twitter_content_id`
WHERE
`tracker_twitter_content`.`tracker_id` = '88'
GROUP BY
`tracked_twitter`.`id`
ORDER BY
twitterContentCount DESC LIMIT 20 OFFSET 0
But that method works only if I need the result set from the parent table. What if I want to execute grouped counts and other aggregate functions on tables other than the parent table? I wrote a query that meets my criteria, but it takes 20 seconds to execute. How can I optimize it?
Thanks in advance
Given the query is already fairly simple, the options I'd look into are ...
Execution plan (to find any missing indexes you could add; see the EXPLAIN sketch below)
Caching (to ensure SQL already has all the data in RAM)
De-normalisation (to turn the query into a flat select)
Cache the data in the application (so you could use something like PLINQ on it)
Use a RAM-based store (Redis, Elastic)
File group adjustments (physically move the db to faster discs)
Partition your tables (to spread the raw data over multiple physical discs)
The further you go down this list the more involved the solutions become.
I guess it depends how fast you need the query to be and how much you need your solution to scale.
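For the first item, a minimal sketch using the query from the question (the engagement column is omitted for brevity, and the index below is a hypothetical candidate; let the actual plan guide the choice):

EXPLAIN
SELECT tracked_twitter.*, COUNT(*) AS twitterContentCount
FROM tracked_twitter
INNER JOIN twitter_content
    ON tracked_twitter.id = twitter_content.tracked_twitter_id
INNER JOIN tracker_twitter_content
    ON twitter_content.id = tracker_twitter_content.twitter_content_id
WHERE tracker_twitter_content.tracker_id = '88'
GROUP BY tracked_twitter.id;

-- Rows with type = ALL or key = NULL are full scans. A composite index
-- like this one (hypothetical) often helps the WHERE + join above:
CREATE INDEX idx_tracker_content
    ON tracker_twitter_content (tracker_id, twitter_content_id);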
I recently learned that I can search in a MySQL table across multiple columns by using the following select statement with OR:
SELECT * FROM data WHERE TEMP = "3000" OR X = "3000" OR Y = "3000";
Which returns the results needed, but it takes approximately 1.7 s to return them, in a table that has only ~260k rows. I have also already added indexes for each of the columns being searched.
Is there a way to optimize this query? Or is there another one which is faster but returns the same results?
Another option is to use UNION...
SELECT * FROM data WHERE TEMP = "3000"
UNION
SELECT * FROM data WHERE X ="3000"
UNION
SELECT * FROM data WHERE Y="3000";
...however the real key to improving performance is firstly indexes and secondly the query analyser. Often the data determines which is faster: TEMP may be a hundred times less likely than Y to be "3000", so that condition should come first in your original OR statement, for example.
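For either form, per-column indexes are what let the server avoid a full scan (a sketch; the question says these may already exist, so check the plan first):

CREATE INDEX idx_data_temp ON data (TEMP);
CREATE INDEX idx_data_x ON data (X);
CREATE INDEX idx_data_y ON data (Y);

-- Each UNION branch can then do an index lookup; for the OR form,
-- MySQL can combine the three indexes via its index_merge strategy.
EXPLAIN SELECT * FROM data WHERE TEMP = '3000' OR X = '3000' OR Y = '3000';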
I'm in a dilemma about which of these methods is most efficient.
Suppose you have a query joining multiple tables and returning thousands of records. Then you have to get the total in order to paginate through all these results.
Is it faster to:
1) Do a complete select (suppose you have to select 50-odd columns), count the rows, and then run another query with limits? (Will the MySQL cache already help in this case, since the first query, used for counting, selects all the columns you need?)
2) First do the query using the COUNT function, and then do the query to select the results you need.
3) Instead of using the MySQL COUNT function, do the query selecting only the IDs, for example, and use the PHP function mysql_num_rows?
I think number 2 is the best option, using MySQL's built-in COUNT function, but I know MySQL uses a cache, so would selecting all the results in the first query be faster?
Thanks,
Have a look at FOUND_ROWS()
A SELECT statement may include a LIMIT clause to restrict the number
of rows the server returns to the client. In some cases, it is
desirable to know how many rows the statement would have returned
without the LIMIT, but without running the statement again. To obtain
this row count, include a SQL_CALC_FOUND_ROWS option in the SELECT
statement, and then invoke FOUND_ROWS() afterward:
mysql> SELECT SQL_CALC_FOUND_ROWS * FROM tbl_name WHERE id > 100 LIMIT 10;
mysql> SELECT FOUND_ROWS();
The second SELECT returns a number indicating how many rows the first SELECT
would have returned had it been written without the LIMIT clause.
My guess is number 2, but the truth is that it will depend entirely on data size, tables, indexing, MySQL version, etc.
The only way of finding the answer to this is to try each one and measure how long they take. But like I say, my hunch would be number 2.
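A minimal sketch of option 2, with placeholder table and column names:

-- Query 1: total for the pager; with a suitable index this can be a
-- cheap index-only count.
SELECT COUNT(*)
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'open';

-- Query 2: only the page actually being displayed.
SELECT o.*, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'open'
ORDER BY o.created_at DESC
LIMIT 50 OFFSET 0;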
I need to query MySQL with some condition and get five random, different rows from the result.
Say, I have a table named 'user', and a field named 'cash'. I can compose a SQL like:
SELECT * FROM user where cash < 1000 order by RAND() LIMIT 5.
The result is good: totally random, unsorted, and different from each other, exactly what I want.
But I learned from Google that the efficiency is bad when the table gets large, because MySQL creates a temporary table with all the result rows and assigns each one of them a random sorting index. The results are then sorted and returned.
Then I go on searching and got a solution like:
SELECT * FROM `user` AS t1 JOIN (SELECT ROUND(RAND() * ((SELECT MAX(id) FROM `user`)- (SELECT MIN(id) FROM `user`))+(SELECT MIN(id) FROM `user`)) AS id) AS t2 WHERE t1.id >= t2.id AND cash < 1000 ORDER BY t1.id LIMIT 5;
This method uses JOIN and MAX(id), and its efficiency is better than the first one according to my testing. However, there is a problem. Since I also need the condition "cash < 1000", if the RAND() value is so big that no row after it has cash < 1000, then no result will be returned.
Does anyone have a good idea of how to compose SQL that has the same effect as the first query but with better efficiency?
Or, shall I just do simple query in MYSQL and let PHP randomly pick 5 different rows from the query result?
Your help is appreciated.
To make the first query faster, just SELECT id - that will make the temporary table rather small (it will contain only IDs and not all fields of each row) and maybe it will fit in memory (temp tables with text/blob columns are always created on disk, for example). Then when you get the result, run another query: SELECT * FROM xy WHERE id IN (a,b,c,d,...). As you mentioned, this approach is not very efficient, but as a quick fix this modification will make it several times faster.
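A sketch of that two-step version, using the user/cash schema from the question (the IN list values are examples):

-- Step 1: sort ids only, so the temporary table stays small.
SELECT id FROM user WHERE cash < 1000 ORDER BY RAND() LIMIT 5;

-- Step 2: fetch the full rows for the ids returned above.
SELECT * FROM user WHERE id IN (17, 42, 1003, 58211, 90742);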
One of the best approaches seems to be getting the total number of rows, choosing random numbers, and for each one running a new query: SELECT * FROM xy WHERE abc LIMIT $random,1. This should be quite efficient for 3-5 random rows, but not good if you want 100 random rows each time :)
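A sketch of the offset approach (the random offsets in [0, count) would be computed in application code; 123456 is an example value):

-- Step 1: how many rows match the condition.
SELECT COUNT(*) FROM user WHERE cash < 1000;

-- Step 2: run once per random row wanted, each time with its own offset.
SELECT * FROM user WHERE cash < 1000 LIMIT 123456, 1;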
Also consider caching your results. Often you don't need different random rows to be displayed on each page load. Generate your random rows only once per minute. If you generate the data via cron, for example, you can also live with a query that takes several seconds, as users will see the old data while the new data is being generated.
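A sketch of that caching idea (the cache table name is hypothetical; the rebuild would run from cron, e.g. once per minute):

-- One-time setup.
CREATE TABLE random_user_cache LIKE user;

-- Cron job: the expensive ORDER BY RAND() runs out of band.
TRUNCATE random_user_cache;
INSERT INTO random_user_cache
SELECT * FROM user WHERE cash < 1000 ORDER BY RAND() LIMIT 5;

-- Page load: a cheap read of at most five rows.
SELECT * FROM random_user_cache;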
Here are some of my bookmarks for this problem for reference:
http://jan.kneschke.de/projects/mysql/order-by-rand/
http://www.titov.net/2005/09/21/do-not-use-order-by-rand-or-how-to-get-random-rows-from-table/