I was wondering which would be faster, and what the tradeoffs are of using one query or the other:
SELECT * FROM table WHERE somecolumn = 'something' LIMIT 999;
vs.
SELECT * FROM table WHERE somecolumn = 'something';
Now, considering that the results of the query will never return more than a couple of hundred rows, does using LIMIT 999 make any significant performance impact or not?
I'm looking into this option because my project will offer users a way to limit results as they like, and leaving the limit empty shows everything, so it's easier for me to keep the LIMIT clause in the query and just change the number.
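One way to handle the optional user limit is to append the LIMIT clause only when a value was actually given, binding it as a parameter. A minimal sketch using Python's sqlite3 as a stand-in for MySQL; the table and column names are placeholders:

```python
import sqlite3

def fetch_rows(conn, value, limit=None):
    # Append LIMIT only when the user supplied a cap; the cap is bound
    # as a parameter so user input never lands in the SQL string.
    sql = "SELECT * FROM t WHERE somecolumn = ?"
    params = [value]
    if limit is not None:
        sql += " LIMIT ?"
        params.append(limit)
    return conn.execute(sql, params).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (somecolumn TEXT, v INT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [("something", i) for i in range(5)])

all_rows = fetch_rows(conn, "something")     # no cap: all 5 rows
capped = fetch_rows(conn, "something", 3)    # LIMIT 3: 3 rows
```

This avoids keeping a dummy `LIMIT 999` in the query at all when the user wants everything.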
Now, the table is really big, ranging from a few hundred thousand to a few million rows.
The exact query looks something like:
SELECT SUM(revenue) AS cost,
IF(ISNULL(headline) OR headline = '', 'undefined', headline
) AS headline
FROM `some_table`
WHERE ((date >= '2017-01-01')
AND (date <= '2017-12-31')
)
AND -- (sic)
GROUP BY `headline`
ORDER BY `cost` DESC
As I said before, this query will never return more than about a hundred rows.
Disk I/O, if any, is by far the most costly part of a query.
Fetching each row ranks next.
Almost everything else is insignificant.
However, if the existence of LIMIT can change what the Optimizer does, then there could be a significant difference.
In most cases, including the queries you gave, a too-big LIMIT has no impact.
In certain subqueries, a LIMIT will prevent the elimination of ORDER BY. A subquery is, by definition, a set, not an ordered set. So LIMIT is a kludge to prevent the optimization of removing ORDER BY.
If there is a composite index that includes all the columns needed for WHERE, GROUP BY, and ORDER BY, then the Optimizer can stop when the LIMIT is reached. Other situations go through tmp tables and sorts for GROUP BY and ORDER BY and can do the LIMIT only against a full set of rows.
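The covering-index situation can be observed even in SQLite, whose planner reports when a query is satisfied entirely from an index (sqlite3 here is only a stand-in for MySQL, and the table is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INT, b INT, c INT)")
# Composite index covering the WHERE (a), ORDER BY (b), and selected (c) columns
conn.execute("CREATE INDEX idx_abc ON t (a, b, c)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT b, c FROM t WHERE a = 1 ORDER BY b LIMIT 10"
).fetchall()
# The plan's detail column (last field) mentions the covering index,
# meaning no table rows need to be read and rows arrive pre-sorted.
details = " ".join(row[3] for row in plan)
```

With rows arriving from the index already in ORDER BY order, the engine can stop as soon as the LIMIT is satisfied.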
Two caches were alluded to in the Comments so far.
"Query cache" -- This records exact queries and their result sets. If it is turned on and if it applicable, then the query comes back "instantly". By "exact", I include the existence and value of LIMIT.
To speed up all queries, data and indexes blocks are "cached" in RAM (see innodb_buffer_pool_size). This avoids disk I/O when a similar (not necessarily exact) query is run. See my first sentence, above.
There are two examples.
In the first example, the query returns results faster when using ORDER BY (according to the phpMyAdmin speed report).
In the second example, I don't use ORDER BY, and it returns results more slowly (according to the phpMyAdmin speed report).
Isn't it counterintuitive that it returns results faster when using ORDER BY?
The ordering doesn't matter to me; it's the speed that matters.
select bayi,tutar
from siparisler
where durum='1' and MONTH(tarih) = MONTH(CURDATE()) and YEAR(tarih) = YEAR(CURRENT_DATE())
order by id desc
Speed: 0.0006
select bayi,tutar
from siparisler
where durum='1' and MONTH(tarih) = MONTH(CURDATE()) and YEAR(tarih) = YEAR(CURRENT_DATE())
Speed: 0.7785
An ORDER BY query will never execute faster than the same query without the ORDER BY clause. Sorting rows incurs more work for the database. In the best-case scenario, the sorting becomes a no-op because MySQL fetched the rows in the correct order in the first place: but that just makes the two queries equivalent in terms of performance (it does not make the sorting query faster).
Possibly, the results of the ORDER BY were cached already, so MySQL gives you the result directly from the cache rather than actually executing the query.
If performance is what matters most to you, let me suggest changing the WHERE predicate so as not to use date functions on the tarih column: such a construct prevents the database from taking advantage of an index (we say the predicate is non-SARGable). Consider:
select bayi, tutar
from siparisler
where
durum = 1
and tarih >= date_format(current_date, '%Y-%m-01')
and tarih < date_format(current_date, '%Y-%m-01') + interval 1 month
order by id desc
For performance with this query, consider an index on (durum, tarih, id desc, bayi, tutar): it should behave as a covering index, which MySQL can use to execute the entire query without even looking at the actual table data.
At 0.0006s, you are almost certainly measuring the performance of the query_cache rather than the execution time. Try both queries again with SELECT SQL_NO_CACHE and see what the performance difference is.
First, I recommend writing the query as:
select bayi, tutar
from siparisler p
where durum = 1 and -- no quotes, assuming this is an integer
      tarih >= curdate() - interval (day(curdate()) - 1) day;
This can take advantage of an index on (durum, tarih).
But that isn't your question. It is possible that the order by could result in a radically different execution plan. This is hypothetical, but the intention is to explain how this might occur.
Let me assume the following:
The table only has an index on (id desc, durum, tarih).
The where clause matches few rows.
The rows are quite wide.
The query without the order by would probably generate an execution plan that is a full table scan. Because the rows are wide, lots of unnecessary data would be read.
The query with the order by could read the data in order and then apply the where conditions. This would be faster than the other version, because only the rows that match the where conditions would be read in.
I cannot guarantee that this is happening. But there are some counterintuitive situations that arise with queries.
You can analyze it with the EXPLAIN command, and then check the value of the type field: index or ALL.
Example:
EXPLAIN SELECT bayi,tutar
FROM siparisler
WHERE durum='1' AND MONTH(tarih) = MONTH(CURDATE()) AND YEAR(tarih) = YEAR(CURRENT_DATE())
ORDER BY id DESC;
I have a 500,000-record table on my MySQL server. When running a query, it takes a long time to execute; sometimes it goes beyond a minute.
Below I have added my MySQL machine detail.
RAM-16GB
Processor: Intel(R) Core(TM) i5-4460M CPU @ 3.20GHz
OS: Windows server 64 bit
I know there is no problem with my machine since it is a standalone machine and no other applications there.
Maybe the problem is with my query. I have gone through the MySQL site and found that I have used proper syntax. But I don't know exactly the reason for the delay in the result.
SELECT SUM(`samplesalesdata50000`.`UnitPrice`) AS `UnitPrice`, `samplesalesdata50000`.`SalesChannel` AS `Channel`
FROM `samplesalesdata50000` AS `samplesalesdata50000`
GROUP BY `samplesalesdata50000`.`SalesChannel`
ORDER BY 2 ASC
LIMIT 200 OFFSET 0
Can anyone please let me know whether the duration depends on the table or on the query that I have used?
Note: Even if I try with indexing, there is not much difference in result time.
Thanks
Two approaches to this:
One approach is to create a covering index on the columns needed to satisfy your query. The correct index for your query contains these columns in this order: (SalesChannel, UnitPrice).
Why does this help? For one thing, the index itself contains all data needed to satisfy your query, and nothing else. This means your server does less work.
For another thing, MySQL's indexes are BTREE-organized. That means they're accessible in order. So your query can be satisfied one SalesChannel at a time, and MySQL doesn't need an internal temporary table. That's faster.
A second approach involves recognizing that ORDER BY ... LIMIT is a notorious performance antipattern. You require MySQL to sort a big mess of data, and then discard most of it.
You could try this:
SELECT SUM(UnitPrice) UnitPrice,
SalesChannel Channel
FROM samplesalesdata50000
WHERE SalesChannel IN (
    SELECT SalesChannel FROM (
        SELECT DISTINCT SalesChannel
        FROM samplesalesdata50000
        ORDER BY SalesChannel LIMIT 200 OFFSET 0
    ) AS first_channels
)
GROUP BY SalesChannel
ORDER BY SalesChannel
LIMIT 200 OFFSET 0
If you have an index on SalesChannel (the covering index mentioned above works) this should speed you up a lot, because your aggregate (GROUP BY) query need only consider a subset of your table.
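Here is the idea in runnable form, using sqlite3 as a small stand-in for MySQL (note that MySQL rejects a LIMIT placed directly inside an IN subquery, so the limited subquery is wrapped in a derived table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (SalesChannel TEXT, UnitPrice REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("Online", 10.0), ("Online", 5.0),
                  ("Offline", 2.0), ("Retail", 1.0)])

# Aggregate only the first N channels instead of grouping the whole table.
rows = conn.execute("""
    SELECT SUM(UnitPrice) AS UnitPrice, SalesChannel AS Channel
    FROM sales
    WHERE SalesChannel IN (
        SELECT SalesChannel FROM (
            SELECT DISTINCT SalesChannel FROM sales
            ORDER BY SalesChannel LIMIT 2 OFFSET 0
        ) AS first_channels
    )
    GROUP BY SalesChannel
    ORDER BY SalesChannel
    LIMIT 2 OFFSET 0
""").fetchall()
# 'Offline' and 'Online' are the first two channels alphabetically,
# so only their rows participate in the SUM.
```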
Your problem is with "ORDER BY 2 ASC". Try "ORDER BY Channel" instead.
If it were MS SQL Server, you would use the WITH (NOLOCK) hint,
and the MySQL equivalent is
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED ;
SELECT SUM(`samplesalesdata50000`.`UnitPrice`) AS `UnitPrice`, `samplesalesdata50000`.`SalesChannel` AS `Channel`
FROM `samplesalesdata50000` AS `samplesalesdata50000`
GROUP BY `samplesalesdata50000`.`SalesChannel`
ORDER BY SalesChannel ASC
LIMIT 200 OFFSET 0
COMMIT ;
To improve on OJones's answer, note that
SELECT SalesChannel FROM samplesalesdata50000
ORDER BY SalesChannel LIMIT 200, 1
will quickly (assuming the index given) find the end of the desired list. Then adding this limits the main query to only the rows needed:
WHERE SalesChannel < (that-select)
There is, however, a problem. If there are fewer than 200 rows in the table, the subquery will return nothing.
You seem to be setting up for "paginating"? In that case, a similar technique can be used to find the starting value:
WHERE SalesChannel >= ...
AND SalesChannel < ...
This also avoids using the inefficient OFFSET, which has to read, then toss, all the rows being skipped over. More
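Seeking by value instead of OFFSET can be sketched like this (sqlite3 stands in for MySQL, and the single-column table is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (SalesChannel TEXT)")
conn.executemany("INSERT INTO sales VALUES (?)",
                 [(c,) for c in "ABCDE" for _ in range(2)])

def channel_page(conn, last_seen, size):
    # Seek past the last channel of the previous page instead of using
    # OFFSET, which reads and then discards all the skipped rows.
    return [r[0] for r in conn.execute(
        "SELECT DISTINCT SalesChannel FROM sales "
        "WHERE SalesChannel > ? ORDER BY SalesChannel LIMIT ?",
        (last_seen, size))]

first = channel_page(conn, "", 2)          # page 1
second = channel_page(conn, first[-1], 2)  # page 2 starts after page 1's last value
```

With an index on SalesChannel, each page is a short index range scan regardless of how deep into the data it is.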
But the real solution may be to build and maintain a Summary Table of the data. It would contain subtotals for each, say, month. Then run the query against the Summary table -- it might be 10x faster. More
I've got a complex query I have to run in an application that is giving me some performance trouble. I've simplified it here. The database is MySQL 5.6.35 on CentOS.
SELECT a.`po_num`,
Count(*) AS item_count,
Sum(b.`quantity`) AS total_quantity,
Group_concat(`web_sku` SEPARATOR ' ') AS web_skus
FROM `order` a
INNER JOIN `order_item` b
ON a.`order_id` = b.`order_key`
WHERE `store` LIKE '%foobar%'
LIMIT 200 offset 0;
The key part of this query is where I've placed "foobar" as a placeholder. If this value is something like big_store, the query takes much longer (roughly 0.4 seconds in the query provided here, much longer in the query I'm actually using) than if the value is small_store (roughly 0.1 seconds in the query provided). big_store would return significantly more results if there were no limit.
But there is a limit, and that's what surprises me. Both datasets have more rows than the LIMIT, which is only 200. It appears to me that MySQL is performing the select functions COUNT, SUM, and GROUP_CONCAT for all big_store/small_store rows and then applying the LIMIT retroactively. I would imagine that it'd be best to stop when you get to 200.
Could it not perform the COUNT, SUM, and GROUP_CONCAT functions after grabbing the 200 rows it will use, making my query much quicker? This seems feasible to me except in cases where there's an ORDER BY on one of those columns.
Does MySQL not use LIMIT to optimize a query's select functions? If not, is there a good reason for that? If so, did I make a mistake in my thinking above?
It can stop short due to the LIMIT, but that is not a reasonable query since there is no ORDER BY.
Without ORDER BY, it will pick whatever 200 rows it feels like and stop short.
With an ORDER BY, it will have to scan the entire table that contains store (please qualify columns with which table they come from!). This is because of the leading wildcard. Only then can it trim to 200 rows.
Another problem -- Without a GROUP BY, aggregates (SUM, etc) are performed across the entire table (or at least those that remain after filtering). The LIMIT does not apply until after that.
Perhaps what you are asking about is MariaDB 5.5.21's "LIMIT_ROWS_EXAMINED".
Think of it this way ... All of the components of a SELECT are done in the order specified by the syntax. Since LIMIT is last, it does not apply until after the other stuff is performed.
(There are a couple of exceptions: (1) SELECT col... must be done after FROM ..., since otherwise it would not know which table(s) to use; (2) The optimizer readily reorders JOINed tables and the clauses in WHERE ... AND ....)
More details on that query.
The optimizer peeks ahead, and sees that the WHERE is filtering on order (that is where store is, yes?), so it decides to start with the table order.
It fetches all rows from order that match %foobar%.
For each such row, find the row(s) in order_item. Now it has some number of rows (possibly more than 200) with which to do the aggregates.
Perform the aggregates - COUNT, SUM, GROUP_CONCAT. (Actually this will probably be done as it gathers the rows -- another optimization.)
There is now 1 row (with an unpredictable value for a.po_num).
Skip 0 rows for the OFFSET part of the LIMIT. (OK, another out-of-order thingie.)
Deliver up to 200 rows. (There is only 1.)
Add ORDER BY (but no GROUP BY) -- big deal, sort the 1 row.
Add GROUP BY (but no ORDER BY) in, now you may have more than 200 rows coming out, and it can stop short.
Add GROUP BY and ORDER BY and they are identical, then it may have to do a sort for the grouping, but not for the ordering, and it may stop at 200.
Add GROUP BY and ORDER BY and they are not identical, then it may have to do a sort for the grouping, and will have to re-sort for the ordering, and cannot stop at 200 until after the ORDER BY. That is, virtually all the work is performed on all the data.
Oh, and all of this gets worse if you don't have the optimal index. Oh, did I fail to insist on providing SHOW CREATE TABLE?
I apologize for my tone. I have thrown quite a few tips in your direction; please learn from them.
I have 50,000 rows in a table and I am running the following query, but I heard it is a bad idea. How do I make it work in a better way?
mysql> SELECT t_dnis,account_id FROM mytable WHERE o_dnis = '15623157085' AND enabled = 1 ORDER BY RAND() LIMIT 1;
+------------+------------+
| t_dnis | account_id |
+------------+------------+
| 5623157085 | 1127 |
+------------+------------+
Is there any other way I can make this query faster, or are there other options I should use?
I am not a DBA, so sorry if this question has been asked before :(
Note: currently we are not seeing a performance issue, but we are growing, so it could have an impact in the future; I just want to know the plus and minus points before we are out of the woods.
This query:
SELECT t_dnis, account_id
FROM mytable
WHERE o_dnis = '15623157085' AND enabled = 1
ORDER BY RAND()
LIMIT 1;
is not sorting 50,000 rows. It is sorting the number of rows that match the WHERE clause. As you state in the comments, this is in the low double digits. On a handful of rows, the use of ORDER BY rand() should not have much impact on performance.
You do want an index. The best index would be mytable(o_dnis, enabled, t_dnis, account_id). This is a covering index for the query, so the original data pages do not need to be accessed.
Under most circumstances, I would expect the ORDER BY to be fine up to at least a few hundred rows, if not several thousand. Of course, this depends on lots of factors, such as your response-time requirements, the hardware you are running on, and how many concurrent queries are running. My guess is that your current data/configuration does not pose a performance problem, and there is ample room for growth in the data without an issue arising.
Unless you are running on very slow hardware, you should not experience problems sorting (far fewer than) 50,000 rows. So if you are still asking the question, this makes me suspect that your problem does not lie in the RAND().
For example one possible cause of slowness could be not having a proper index - in this case you can go for a covering index:
CREATE INDEX mytable_ndx ON mytable (enabled, o_dnis, t_dnis, account_id);
or the basic
CREATE INDEX mytable_ndx ON mytable (enabled, o_dnis);
At this point you should already have good performance.
Otherwise you can run the query twice, either by counting the rows or just priming a cache. Which to choose depends on the data structure and how many rows are returned; usually, the COUNT option is the safest bet.
SELECT COUNT(1) AS n FROM mytable WHERE ...
which gives you n, which allows you to generate a random number k in the same range as n, followed by
SELECT ... FROM mytable LIMIT k, 1
which ought to be really fast. Again, the index will help you speeding up the counting operation.
In some cases (MySQL only) you could perhaps do better with
SELECT SQL_CACHE SQL_CALC_FOUND_ROWS ... FROM mytable WHERE ...
using the FOUND_ROWS() function to recover n, then running the second query, which should take advantage of the cache. It's best if you experiment first, though. And changes in the table demographics might cause performance to fall.
The problem with ORDER BY RAND() LIMIT 1 is that MySQL will give each row a random value and then sort, performing a full table scan and then dropping all the results but one.
This is especially bad on a table with a lot of rows, doing a query like
SELECT * FROM foo ORDER BY RAND() LIMIT 1
However in your case the query is already filtering on o_dnis and enabled. If there are only a limited number of rows that match (like a few hundred), doing an ORDER BY RAND() shouldn't cause a performance issue.
The alternative requires two queries: one to count and the other to fetch.
In pseudo code:
count = query("SELECT COUNT(*) FROM mytable WHERE o_dnis = '15623157085' AND enabled = 1").value
offset = random(0, count - 1)
result = query("SELECT t_dnis, account_id FROM mytable WHERE o_dnis = '15623157085' AND enabled = 1 LIMIT 1 OFFSET " + offset).row
Note: For the pseudo code to perform well, there needs to be a (multi-column) index on o_dnis, enabled.
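The pseudo code above can be run almost verbatim against sqlite3 (standing in for MySQL here); the random offset is bound as a parameter rather than concatenated:

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (o_dnis TEXT, enabled INT, t_dnis TEXT, account_id INT)")
conn.executemany("INSERT INTO mytable VALUES (?, 1, ?, ?)",
                 [("15623157085", str(5623157000 + i), 1000 + i)
                  for i in range(20)])

where = "o_dnis = ? AND enabled = 1"
# Query 1: count the matching rows.
count = conn.execute(f"SELECT COUNT(*) FROM mytable WHERE {where}",
                     ("15623157085",)).fetchone()[0]
# Query 2: fetch one row at a random offset within the matching set.
offset = random.randrange(count)
row = conn.execute(
    f"SELECT t_dnis, account_id FROM mytable WHERE {where} LIMIT 1 OFFSET ?",
    ("15623157085", offset)).fetchone()
```

Both statements hit the same WHERE clause, so the multi-column index mentioned above serves both.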
I have a MyISAM table with 28,900 entries. I'm processing it in chunks of 1500, with a query like this:
SELECT * FROM table WHERE id>0 LIMIT $iStart,1500
Then I loop over this and increment $iStart by 1500 each time.
The problem is that the queries are returning the same rows in some cases. For example the LIMIT 0,1500 query returns some of the same rows as the LIMIT 28500,1500 query does.
If I don't ORDER the rows, can I not expect to use LIMIT for pagination?
(The table is static while these queries are happening, no other queries are going on that would alter its rows).
Like pretty much every other SQL engine out there, MySQL MyISAM tables make no guarantees at all about the order in which rows are returned unless you specify an ORDER BY clause. Typically the order they are returned in will be the order they were read off the filesystem in, which can change from query to query depending on updates, deletes, and even the state of cached selects.
If you want to avoid having the same row returned more than once then you must order by something, the primary key being the most obvious candidate.
You should use ORDER BY to ensure a consistent order. Otherwise the DBMS may return rows in an arbitrary order (one would assume, however, that the order is arbitrary but consistent between queries if no rows are modified).
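The fix in runnable form (sqlite3 as a stand-in for MySQL): ordering by the primary key makes the chunk boundaries deterministic, so every row appears in exactly one chunk:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO t (payload) VALUES (?)", [("row",)] * 10)

def chunk(conn, start, size):
    # Without the ORDER BY, the engine may return rows in any order,
    # so consecutive OFFSET windows could overlap or skip rows.
    return [r[0] for r in conn.execute(
        "SELECT id FROM t ORDER BY id LIMIT ? OFFSET ?", (size, start))]

# Paginate in chunks of 4: no duplicates, nothing skipped.
ids = chunk(conn, 0, 4) + chunk(conn, 4, 4) + chunk(conn, 8, 4)
```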