MySQL ORDER BY Speed

There are two examples.
In the first example, which uses ORDER BY, the query is faster (according to the phpMyAdmin timing).
In the other example, which does not use ORDER BY, the query is slower (according to the phpMyAdmin timing).
Isn't it strange that the query is faster when it uses ORDER BY?
The ordering doesn't matter to me; only the speed does.
select bayi,tutar
from siparisler
where durum='1' and MONTH(tarih) = MONTH(CURDATE()) and YEAR(tarih) = YEAR(CURRENT_DATE())
order by id desc
Speed: 0.0006
select bayi,tutar
from siparisler
where durum='1' and MONTH(tarih) = MONTH(CURDATE()) and YEAR(tarih) = YEAR(CURRENT_DATE())
Speed: 0.7785

An order by query will never execute faster than the same query without the order by clause. Sorting rows is extra work for the database. In the best case, the sort becomes a no-op because MySQL fetched the rows in the correct order in the first place, but that just makes the two queries equivalent in performance; it does not make the query that sorts faster.
Possibly, the results of the order by query were already cached, so MySQL returned the result directly from the cache rather than actually executing the query.
If performance is what matters most to you, I suggest changing the where predicate so that it does not apply date functions to the tarih column: such a construct prevents the database from taking advantage of an index (we say the predicate is non-SARGable). Consider:
select bayi, tutar
from siparisler
where
durum = 1
and tarih >= date_format(current_date, '%Y-%m-01')
and tarih < date_format(current_date, '%Y-%m-01') + interval 1 month
order by id desc
For performance with this query, consider an index on (durum, tarih, id desc, bayi, tutar): it should behave as a covering index, which MySQL can use to execute the entire query without even reading the table rows.
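To see the SARGability point without a MySQL server, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. SQLite's planner and its EXPLAIN QUERY PLAN wording differ from MySQL's EXPLAIN, so treat this as an analogy rather than a MySQL measurement; the table contents and the index name idx_cover are hypothetical. Both queries can use the durum prefix of the index, but only the plain range also lets the planner constrain tarih:

```python
import sqlite3

def plan(conn, sql, params=()):
    """Return the human-readable detail strings from SQLite's EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the plan text

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE siparisler (
    id INTEGER PRIMARY KEY, durum INTEGER, tarih TEXT, bayi TEXT, tutar REAL)""")
# Hypothetical covering index shaped like (durum, tarih, id, bayi, tutar).
conn.execute("CREATE INDEX idx_cover ON siparisler (durum, tarih, id, bayi, tutar)")

# Function applied to tarih: only the durum part of the index can be searched.
wrapped = """SELECT bayi, tutar FROM siparisler
    WHERE durum = 1 AND strftime('%Y-%m', tarih) = strftime('%Y-%m', 'now')
    ORDER BY id DESC"""
# Plain half-open range on tarih: durum and tarih can both be searched.
ranged = """SELECT bayi, tutar FROM siparisler
    WHERE durum = 1 AND tarih >= ? AND tarih < ?
    ORDER BY id DESC"""

wrapped_plan = plan(conn, wrapped)
ranged_plan = plan(conn, ranged, ("2024-05-01", "2024-06-01"))
print(wrapped_plan)
print(ranged_plan)
```

In the ranged plan the index search lists both durum and the tarih bounds; in the wrapped plan tarih never appears as an index constraint.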

At 0.0006s, you are almost certainly measuring the performance of the query cache rather than the execution time. Try both queries again with SELECT SQL_NO_CACHE and see what the performance difference is. (The query cache exists only up to MySQL 5.7; it was removed in MySQL 8.0.)

First, I recommend writing the query as:
select bayi, tutar
from siparisler p
where durum = 1 and -- no quotes, assuming this is an integer
tarih >= curdate() - interval (day(curdate()) - 1) day;
This can take advantage of an index on (durum, tarih).
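The lower bound in that predicate is simply "today minus (day of month - 1) days". If you would rather compute the boundaries in the application and pass them as parameters, a minimal Python sketch could look like this (the helper names month_start and next_month_start are hypothetical). The second helper gives an exclusive upper bound, which the rewritten query omits and which only matters if tarih can hold future dates:

```python
from datetime import date, timedelta

def month_start(today: date) -> date:
    """Same value as CURDATE() - INTERVAL (DAY(CURDATE()) - 1) DAY."""
    return today - timedelta(days=today.day - 1)

def next_month_start(today: date) -> date:
    """First day of the following month (exclusive upper bound)."""
    # Day 1 plus 31 days always lands in the next month; snap back to its day 1.
    return month_start(month_start(today) + timedelta(days=31))

today = date(2024, 5, 17)
print(month_start(today), next_month_start(today))  # 2024-05-01 2024-06-01
```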
But that isn't your question. It is possible that the order by could result in a radically different execution plan. This is hypothetical, but the intention is to explain how this might occur.
Let me assume the following:
The table only has an index on (id desc, durum, tarih).
The where clause matches few rows.
The rows are quite wide.
The query without the order by would probably generate an execution plan that is a full table scan. Because the rows are wide, lots of unnecessary data would be read.
The query with the order by could read the data in order and then apply the where conditions. This would be faster than the other version, because only the rows that match the where conditions would be read in.
I cannot guarantee that this is happening. But there are some counterintuitive situations that arise with queries.
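A toy model can make this hypothetical concrete. The Python sketch below uses synthetic data; the Row class and both functions are hypothetical, and "rows fetched" is only a stand-in for real I/O. It counts how many wide rows each strategy has to read when only a few rows match:

```python
from dataclasses import dataclass

@dataclass
class Row:
    id: int
    durum: int
    tarih: str
    payload: str  # stands in for the many wide columns

def wanted(durum, tarih):
    """The WHERE conditions, evaluated on narrow column values only."""
    return durum == 1 and tarih >= "2024-05-01"

def full_scan(rows):
    """Table scan: every wide row is read, then the WHERE test is applied."""
    fetched, out = 0, []
    for row in rows:
        fetched += 1
        if wanted(row.durum, row.tarih):
            out.append(row.id)
    return out, fetched

def index_order_scan(index, table):
    """Walk a narrow (id DESC, durum, tarih) index; fetch a wide row only on a match."""
    fetched, out = 0, []
    for id_, durum, tarih in index:
        if wanted(durum, tarih):
            fetched += 1
            out.append(table[id_].id)
    return out, fetched

rows = [Row(i, 1 if i % 100 == 0 else 0, "2024-05-01", "x" * 500) for i in range(1, 1001)]
table = {row.id: row for row in rows}
index = sorted(((r.id, r.durum, r.tarih) for r in rows), reverse=True)  # id DESC order

scan_ids, scan_fetched = full_scan(rows)
index_ids, index_fetched = index_order_scan(index, table)
print(scan_fetched, index_fetched)  # 1000 10
```

Both strategies find the same ten rows; the index walk also returns them already in id desc order.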

You can analyze it with the EXPLAIN command and then check the value in the type column (for example, index or ALL).
Example:
EXPLAIN SELECT bayi,tutar
FROM siparisler
WHERE durum='1' AND MONTH(tarih) = MONTH(CURDATE()) AND YEAR(tarih) = YEAR(CURRENT_DATE())
ORDER BY id DESC;

Related

MySQL big limit number versus no limit

I was wondering which would be faster, and what the tradeoffs are of using one query or the other.
SELECT * FROM table WHERE somecolumn = 'something' LIMIT 999;
vs.
SELECT * FROM table WHERE somecolumn = 'something';
Considering that the query will never return more than a couple of hundred rows, does using LIMIT 999 have a significant performance impact or not?
I'm looking into this option because my project will let the user limit the results as they like, or leave the limit empty to show everything, so it's easier for me to keep the LIMIT part of the query and just change the number.
The table is really big, ranging from a couple of hundred thousand to a couple of million rows.
The exact query looks something like:
SELECT SUM(revenue) AS cost,
IF(ISNULL(headline) OR headline = '', 'undefined', headline
) AS headline
FROM `some_table`
WHERE ((date >= '2017-01-01')
AND (date <= '2017-12-31')
)
AND -- (sic)
GROUP BY `headline`
ORDER BY `cost` DESC
As I said before, this query will never return more than about a hundred rows.
Disk I/O, if any, is by far the most costly part of a query.
Fetching each row ranks next.
Almost everything else is insignificant.
However, if the existence of LIMIT can change what the Optimizer does, then there could be a significant difference.
In most cases, including the queries you gave, a too-big LIMIT has no impact.
In certain subqueries, a LIMIT will prevent the elimination of ORDER BY. A subquery is, by definition, a set, not an ordered set, so LIMIT is a kludge to prevent the optimization that removes the ORDER BY.
If there is a composite index that includes all the columns needed for WHERE, GROUP BY, and ORDER BY, then the Optimizer can stop when the LIMIT is reached. Other situations go through tmp tables and sorts for GROUP BY and ORDER BY and can do the LIMIT only against a full set of rows.
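The difference between stopping at the LIMIT and sorting a full set can be sketched in Python. This is a hypothetical model with synthetic data, not MySQL internals: when rows already arrive in ORDER BY order, as they would from a suitable index, reading can stop after n rows; otherwise every row has to be read before the top n are known.

```python
import heapq
import random
from itertools import islice

class ReadCounter:
    """Counts how many rows a strategy pulls from its source."""
    def __init__(self):
        self.rows_read = 0

    def wrap(self, rows):
        for row in rows:
            self.rows_read += 1
            yield row

def top_n_from_index(rows_in_desc_order, n, counter):
    # Rows already come out in ORDER BY ... DESC order: stop once n are taken.
    return list(islice(counter.wrap(rows_in_desc_order), n))

def top_n_by_sorting(unordered_rows, n, counter):
    # No ordered access path: every row must be read before the top n are known.
    return heapq.nlargest(n, counter.wrap(unordered_rows))

values = list(range(10_000))
random.Random(0).shuffle(values)

via_index, via_sort = ReadCounter(), ReadCounter()
print(top_n_from_index(sorted(values, reverse=True), 5, via_index), via_index.rows_read)
print(top_n_by_sorting(values, 5, via_sort), via_sort.rows_read)
```

Both calls return the same five values; the first reads 5 rows, the second reads all 10,000.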
Two caches were alluded to in the Comments so far.
"Query cache" -- This records exact queries and their result sets. If it is turned on and applicable, the query comes back "instantly". By "exact", I include the existence and value of LIMIT.
To speed up all queries, data and index blocks are "cached" in RAM (see innodb_buffer_pool_size). This avoids disk I/O when a similar (not necessarily identical) query is run. See my first sentence, above.
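Why "exact" matters can be shown with a toy model of the query cache. The ToyQueryCache class below is hypothetical and far simpler than MySQL's real query cache (which, among other things, also discards cached results whenever a table they depend on is modified). It only shows that the lookup key is the literal query text, so a different LIMIT value, or even different letter case, is a miss:

```python
class ToyQueryCache:
    """Hypothetical model: results are stored under the exact query text."""

    def __init__(self, execute):
        self._execute = execute  # function that really runs the query
        self._results = {}
        self.hits = 0
        self.misses = 0

    def run(self, sql):
        if sql in self._results:  # byte-for-byte identical text only
            self.hits += 1
            return self._results[sql]
        self.misses += 1
        result = self._execute(sql)
        self._results[sql] = result
        return result

cache = ToyQueryCache(lambda sql: f"rows for: {sql}")
cache.run("SELECT * FROM t WHERE c = 'x' LIMIT 999")  # miss: executed
cache.run("SELECT * FROM t WHERE c = 'x' LIMIT 999")  # hit: served from the dict
cache.run("SELECT * FROM t WHERE c = 'x' LIMIT 100")  # miss: different LIMIT
cache.run("select * from t where c = 'x' limit 999")  # miss: different case
print(cache.hits, cache.misses)  # 1 3
```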

How would using an ORDER BY clause both increase and decrease performance?

I have a MySQL table called devicelog with its PK on id, plus separate indexes on device_id (INT), field_id (INT), and unixtime (BIGINT). They are just the default InnoDB indexes.
I'm trying to get the ID closest to a certain time, and I get wildly different performance with different values and different ORDER BYs. IDs and unixtimes are positively correlated, since both increase as more data is inserted, so it seems it should be safe to omit ordering on unixtime. My table has around 25 million records and performance is extremely important.
This query is fairly slow (~0.5 seconds). Edit: after adding USE INDEX(unixtime), I was able to improve performance a lot (<0.01 seconds!).
SELECT
id
FROM
devicelog
USE INDEX(unixtime) /* edit: looking at the EXPLAIN, I can use this index and it sped things up a bit */
WHERE
device_id = 26
AND field_id = 64
AND unixtime >= 1397166634707 /* a fairly recent time */
/* with no ORDER BY clause, this query is surprisingly slow */
LIMIT 1
EXPLAIN:
id=1 | select_type=SIMPLE | table=devicelog | type=index_merge | possible_keys=device_id,field_id,field_id_2,unixtime | key=field_id,device_id | key_len=8,8 | ref=NULL | rows=6667 | Extra=Using intersect(field_id,device_id); Using where
This query is extremely fast (<0.01 seconds):
SELECT
id
FROM
devicelog
WHERE
device_id = 26
AND field_id = 64
AND unixtime >= 1397166634707 /* a fairly recent time */
ORDER BY unixtime ASC /* <- using unixtime to order */
LIMIT 1
EXPLAIN:
id=1 | select_type=SIMPLE | table=devicelog | type=range | possible_keys=device_id,field_id,field_id_2,unixtime | key=unixtime | key_len=9 | ref=NULL | rows=897776 | Extra=Using index condition; Using where
How would omitting an ORDER BY decrease performance? It seems logical to think that it would increase speed.
Yet if I change the unixtime to something far back, such as 1, the query slows down dramatically when I use ORDER BY unixtime. I believe the unixtime index is in ascending order, so this doesn't make much sense either.
These queries behave the opposite way from the queries above.
Extremely fast (<0.01 seconds):
SELECT
id
FROM
devicelog
WHERE
device_id = 26
AND field_id = 64
AND unixtime >= 1 /* a long time ago */
LIMIT 1
EXPLAIN:
id=1 | select_type=SIMPLE | table=devicelog | type=index_merge | possible_keys=device_id,field_id,field_id_2,unixtime | key=field_id,device_id | key_len=8,8 | ref=NULL | rows=6742 | Extra=Using intersect(field_id,device_id); Using where
This query is identical to the fast ORDER BY query above, except that it uses an older time:
EXTREMELY slow (~7 seconds):
SELECT
id
FROM
devicelog
WHERE
device_id = 26
AND field_id = 64
AND unixtime >= 1 /* a long time ago */
ORDER BY unixtime ASC /* <- using unixtime to order */
LIMIT 1
EXPLAIN:
id=1 | select_type=SIMPLE | table=devicelog | type=index | possible_keys=device_id,field_id,field_id_2,unixtime | key=unixtime | key_len=9 | ref=NULL | rows=3504 | Extra=Using where
Does anyone have any insight on the vast performance differences?
It's hard to make clear suggestions about performance without knowing stuff like the number of rows in your table, and the exact structure of the table.
You might try a compound covering index on (unixtime, device_id, field_id, id). (Look up covering index if you don't know that term.)
This will allow the unixtime part of your query to be satisfied with a BTREE lookup; the rest of your query can then be satisfied with an index scan.
If you specify ORDER BY unixtime ASC LIMIT 1, you're telling the query engine to stop scanning that index (which is ordered by unixtime) as soon as it gets a single hit.
I don't know why it sometimes keeps scanning for seven seconds (the ORDER BY query with unixtime >= 1). It's possible it has to hunt through many index entries for the matching device_id and field_id values.
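The seven-second case can be reproduced in a toy model. With ORDER BY unixtime ... LIMIT 1, the plan walks the unixtime index from the start value and checks device_id and field_id on each entry until one matches. In the hypothetical Python sketch below (synthetic data in which device 26 / field 64 rows only appear near the end of the index), a recent start time finds a match immediately, while a start time of 1 has to examine almost every entry:

```python
import bisect

def first_match_via_index(index, start, device_id, field_id):
    """index: (unixtime, device_id, field_id, id) tuples sorted by unixtime.
    Returns (id, entries_examined) for ORDER BY unixtime ASC LIMIT 1."""
    pos = bisect.bisect_left(index, (start,))  # first entry with unixtime >= start
    examined = 0
    for i in range(pos, len(index)):
        _, dev, fld, row_id = index[i]
        examined += 1
        if dev == device_id and fld == field_id:
            return row_id, examined  # LIMIT 1: stop at the first hit
    return None, examined

# Synthetic data: matching rows exist only after unixtime 99_000.
index = [(t, 26, 64, t) if t > 99_000 and t % 10 == 0 else (t, 1, 1, t)
         for t in range(1, 100_001)]

print(first_match_via_index(index, 99_500, 26, 64))  # (99500, 1)
print(first_match_via_index(index, 1, 26, 64))       # (99010, 99010)
```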
I think it's documented behavior of LIMIT optimization, see http://dev.mysql.com/doc/refman/5.5/en/limit-optimization.html
Optimizing LIMIT Queries
MySQL sometimes optimizes a query that has a LIMIT row_count clause and no HAVING clause:
[...]
If you use LIMIT row_count with ORDER BY, MySQL ends the sorting as soon as it has found the first row_count rows of the sorted result, rather than sorting the entire result. If ordering is done by using an index, this is very fast. If a filesort must be done, all rows that match the query without the LIMIT clause are selected, and most or all of them are sorted, before the first row_count are found. After the initial rows have been found, MySQL does not sort any remainder of the result set.
[...]
As soon as MySQL has sent the required number of rows to the client, it aborts the query unless you are using SQL_CALC_FOUND_ROWS.
Because you're trying to get the ID next to a certain date, I would think ordering the result is vital; otherwise you can get an arbitrary value. Alternatively, you can use MIN(id) with your conditions to get the desired id value.
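Both options can be compared with a small sketch. It uses Python's sqlite3 module as a stand-in for MySQL, with synthetic data in which id and unixtime increase together, which is the assumption stated in the question; only under that assumption do the two queries return the same id:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE devicelog (
    id INTEGER PRIMARY KEY, device_id INTEGER, field_id INTEGER, unixtime INTEGER)""")
# Synthetic rows: unixtime grows with id; every third row belongs to device 26.
conn.executemany(
    "INSERT INTO devicelog VALUES (?, ?, ?, ?)",
    [(i, 26 if i % 3 == 0 else 7, 64, 1_000 + 10 * i) for i in range(1, 101)],
)

where = "device_id = ? AND field_id = ? AND unixtime >= ?"
params = (26, 64, 1_500)

by_order = conn.execute(
    f"SELECT id FROM devicelog WHERE {where} ORDER BY unixtime ASC LIMIT 1", params
).fetchone()[0]
by_min = conn.execute(f"SELECT MIN(id) FROM devicelog WHERE {where}", params).fetchone()[0]
print(by_order, by_min)  # 51 51
```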

Optimize slow SQL query using indexes

I have a problem optimizing a really slow SQL query. I think it's an index problem, but I can't find which index I have to apply.
This is the query:
SELECT
cl.ID, cl.title, cl.text, cl.price, cl.URL, cl.ID AS ad_id, cl.cat_id,
pix.file_name, area.area_name, qn.quarter_name
FROM classifieds cl
/*FORCE INDEX (date_created) */
INNER JOIN classifieds_pix pix ON cl.ID = pix.classified_id AND pix.picture_no = 0
INNER JOIN zip_codes zip ON cl.zip_id = zip.zip_id AND zip.area_id = 132
INNER JOIN area_names area ON zip.area_id = area.id
LEFT JOIN quarter_names qn ON zip.quarter_id = qn.id
WHERE
cl.confirmed = 1
AND cl.country = 'DE'
AND cl.date_created <= NOW() - INTERVAL 1 DAY
ORDER BY
cl.date_created
desc LIMIT 7
MySQL takes about 2 seconds to get the result and starts working on pix.picture_no, but if I force the date_created index, the query runs much faster and takes only 0.030 s. The problem is that the "INNER JOIN zip_codes..." is not always in the query, and when it is not, the forced index makes the query slow again.
I've been thinking about solving this with conditions in PHP, but I would like to know what the problem with the indexes is.
These are several suggestions on how to optimize your query.
NOW Function - You're using the NOW() function in your WHERE clause. Instead, I recommend using a constant date / timestamp, so that the query can be cached. (NOW() is evaluated once per statement, not once per row, but MySQL's query cache never caches a query that contains NOW().) An alternative to a constant value, in case you need a dynamic value, is to add the value from the application (for example, calculate the current timestamp and inject it into the query as a constant before executing the query).
To test this recommendation before implementing this change, just replace NOW() with a constant timestamp and check for performance improvements.
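A minimal sketch of computing the value in the application, assuming a Python client: sqlite3 stands in for a MySQL driver here (most Python MySQL drivers use %s placeholders rather than ?), and the function name day_ago and the table contents are hypothetical. The cutoff is computed once and passed to the query as a plain value:

```python
import sqlite3
from datetime import datetime, timedelta

def day_ago(now: datetime) -> str:
    """Same value as NOW() - INTERVAL 1 DAY, formatted like a DATETIME literal."""
    return (now - timedelta(days=1)).strftime("%Y-%m-%d %H:%M:%S")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE classifieds (ID INTEGER PRIMARY KEY, date_created TEXT)")
conn.executemany(
    "INSERT INTO classifieds (date_created) VALUES (?)",
    [("2024-05-15 09:00:00",), ("2024-05-16 11:59:59",), ("2024-05-17 08:00:00",)],
)

cutoff = day_ago(datetime(2024, 5, 17, 12, 0, 0))  # fixed "now" for a repeatable example
old_ids = [row[0] for row in conn.execute(
    "SELECT ID FROM classifieds WHERE date_created <= ? ORDER BY date_created DESC",
    (cutoff,),
)]
print(cutoff, old_ids)  # 2024-05-16 12:00:00 [2, 1]
```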
Indexes - in general, I would suggest adding an index that contains all columns of your WHERE clause, in this case: confirmed, country, date_created. Put the equality columns first, starting with the one that cuts the amount of data the most, and the range column (date_created) last. The order of the conditions in the WHERE clause does not need to match the index; MySQL's optimizer matches conditions to index columns regardless of the order in which they are written.
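Here is a quick way to check an index of that shape, sketched with Python's sqlite3 module as a stand-in for MySQL (SQLite's planner is not MySQL's, and the table and the index name idx_ccd are hypothetical). The WHERE clause deliberately lists confirmed before country, the opposite of the index order, and the index is still used. Because both equality columns come first, the same index also delivers rows in date_created order, so no separate sort step should appear in the plan:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE classifieds (
    ID INTEGER PRIMARY KEY, title TEXT, confirmed INTEGER,
    country TEXT, date_created TEXT)""")
# Equality columns first, range/sort column last.
conn.execute("CREATE INDEX idx_ccd ON classifieds (country, confirmed, date_created)")

sql = """SELECT ID, title FROM classifieds
    WHERE confirmed = 1 AND country = 'DE' AND date_created <= ?
    ORDER BY date_created DESC LIMIT 7"""
details = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, ("2024-05-16",))]
print(details)
```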
I used EverSQL SQL Query Optimizer to get these recommendations (disclaimer: I'm a co-founder of EverSQL and humbly provide these suggestions).
I would actually have a compound index on all elements of your where such as
(country, confirmed, date_created)
Having the country first would keep your optimized index subset to one country first, then within that, those that are confirmed, and finally the date range itself. Don't query on just the date index alone. Since you are ordering by date, the index should be able to optimize it too.
Add explain in front of the query and run it again. This will show you the indexes that are being used.
See: 13.8.2 EXPLAIN Statement
And for an explanation of explain see MySQL Explain Explained. Or: Optimizing MySQL: Queries and Indexes

Optimising sql query performing internal comparison

I have the following query which is a little expensive (currently 500ms):
SELECT * FROM events AS e, event_dates AS ed
WHERE e.id=ed.event_id AND ed.start >= DATE(NOW())
GROUP BY e.modified_datetime, e.id
ORDER BY e.modified_datetime DESC,e.created_datetime DESC
LIMIT 0,4
I have been trying to figure out how to speed it up and noticed that changing ed.start >= DATE(NOW()) to ed.start = DATE(NOW()) makes the query run in 20ms. Can anyone help me with ways to speed up this date comparison? Would it help to calculate DATE(NOW()) before running the query?
EDIT: here is the EXPLAIN output, in case it helps:
BEFORE
table=event_dates
type=range
rows=25962
ref=null
extra=using where; Using temporary; Using filesort
AFTER
table=event_dates
type=ref
rows=211
ref=const
extra=Using temporary; Using filesort
SELECT * FROM events AS e
INNER JOIN event_dates AS ed ON (e.id=ed.event_id)
WHERE ed.start >= DATE(NOW())
GROUP BY e.modified_datetime, e.id
ORDER BY e.modified_datetime DESC,e.created_datetime DESC
LIMIT 0,4
Remarks
Please don't use the implicit SQL-89 join syntax; it is an SQL anti-pattern.
Make sure you have an index on all fields used in the join, in the where, in the group by and the order by clauses.
Don't do select * (another anti-pattern), explicitly state the fields you need instead.
Try using InnoDB instead of MyISAM; InnoDB has more optimization tricks for select statements, especially if you only select indexed fields.
For MyISAM tables try using REPAIR TABLE tablename.
For InnoDB that's not an option, but converting the table from MyISAM to InnoDB will obviously force a full rebuild of the table and all its indexes.
GROUP BY implicitly sorts the rows in ASC order; try changing the group by to group by -e.modified_datetime, e.id to minimize the reordering needed by the order by clause (not sure about this point, would like to know the result).
For reference, using , notation for joins is poor practice AND has been a cause for poor execution plans.
SELECT
*
FROM
events AS e
INNER JOIN
event_dates AS ed
ON e.id=ed.event_id
WHERE
ed.start >= DATE(NOW())
GROUP BY
e.modified_datetime,
e.id
ORDER BY
e.modified_datetime DESC,
e.created_datetime DESC
LIMIT 0,4
The reason = is faster than >= is simply that >= is a range of values, not one specific value. It's like saying "get me every page in the book from page 101 onwards" instead of "get me page 101". It's more intensive by definition, especially as your query then involves aggregating and sorting many more records.
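The book analogy can be put in numbers. In the hypothetical Python sketch below, a sorted list of dates stands in for an index on event_dates.start (synthetic data: ten events per day throughout 2024). The equality touches one day's worth of entries, while the range touches everything from that day to the end:

```python
import bisect
from datetime import date, timedelta

def count_equal(sorted_starts, day):
    """Entries matching start = day: one narrow slice of the index."""
    return bisect.bisect_right(sorted_starts, day) - bisect.bisect_left(sorted_starts, day)

def count_from(sorted_starts, day):
    """Entries matching start >= day: everything from that point to the end."""
    return len(sorted_starts) - bisect.bisect_left(sorted_starts, day)

starts = sorted(date(2024, 1, 1) + timedelta(days=d) for d in range(366) for _ in range(10))
today = date(2024, 7, 1)
print(count_equal(starts, today), count_from(starts, today))  # 10 1840
```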
In terms of optimisation, your best option is to ensure relevant indexes...
event_dates:
- an index just on start should be sufficient
events:
- an index on id will dramatically improve the join performance
- adding modified_datetime and created_datetime to that index may help
Probably missing indexes on fields you are grouping and searching. Please provide us with: SHOW INDEXES FROM events and SHOW INDEXES FROM event_dates
If there are no indexes then you can add them:
ALTER TABLE events ADD INDEX(modified_datetime);
ALTER TABLE events ADD INDEX(created_datetime);
ALTER TABLE event_dates ADD INDEX(start);
Also be sure you have them on id fields. But here you would probably like to have them as primary keys.
Calculating DATE(NOW()) in advance will not have any impact on performance. It's computed only once (not for each row). But you have 2 different queries (one with >=, another with =). It seems natural that the first one (>=) takes longer to execute, since it returns many more rows. Also, it may choose a different execution plan from the query with =, for example, a full table scan instead of an index seek/scan.
You can do something like this (in MySQL, with a user variable):
SET @currentdate = CURDATE();
then change your code to use the
@currentdate variable: ed.start >= @currentdate

Will forcing `group by` to sort DESC speed the code up, or slow it down?

In MySQL group by does an implicit order by ASC.
This is great if you wanted to add an ORDER BY ASC because then the results are already ordered.
But if you want to ORDER BY .. DESC MySQL has to order the resultset exactly the other way round.
Will this trick speed up the select, slow it down, or do nothing at all?
SELECT field1, field2 FROM atable
GROUP BY -mydate -- group by trick to force a `group by ... desc`
ORDER BY mydate DESC
I know I can just time the code, but I'm looking to gain some deeper insight into the issues at hand.
All the relevant indexes are in place naturally, because it would be silly to optimize without indexes.
From my tests, adding any sort of modifier to group by, like - to change the sort order, slows things down.
However you are allowed to specify:
SELECT id, name, sum(amount) FROM customers GROUP BY id DESC
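And MySQL will happily order the results in DESC order without needing an extra order by clause. This will not incur the extra runtime that adding the - does. (Later versions dropped this behavior: MySQL 8.0 no longer sorts GROUP BY results implicitly, and MySQL 8.0.13 removed the GROUP BY ... DESC syntax.)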
I think you're mistaken: GROUP BY doesn't sort data. It's the default MySQL behaviour that does, as MySQL adds the same ORDER BY as the GROUP BY you've set, as you've mentioned in your first sentence.
So, if you disable the sort, by using ORDER BY NULL, there's no sorting at all. The GROUP BY will only group rows together, using indexes if possible. Hence the «trick» is wrong, as you'll remove the ability to use an index on mydate. GROUP BY performs great as long as the index is good for it.
So:
SELECT field1, field2 FROM atable
GROUP BY mydate
ORDER BY NULL
should be really fast if you have an index on (mydate), and
SELECT field1, field2 FROM atable
GROUP BY mydate
ORDER BY mydate DESC
should be as fast (depending on the table structure, MyISAM is a little bit slower in reverse order).
If you have a WHERE clause, check that you've added the columns in the index, for example:
SELECT field1, field2 FROM atable
WHERE field1 = 5
GROUP BY mydate
ORDER BY mydate DESC
will need an index on (field1,mydate).
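Why an index on (field1, mydate) is enough can be sketched in Python (a hypothetical model, not MySQL's implementation): entries read from such an index are contiguous for field1 = 5 and already ordered by mydate, so grouping needs no sort, and DESC only means reading the same slice backwards.

```python
import bisect
from itertools import groupby

def grouped_for(index, field1, descending=False):
    """index: (field1, mydate, row_id) tuples sorted like an index on (field1, mydate).
    Returns [(mydate, [row_id, ...]), ...] for one integer field1 value, with no sort step."""
    lo = bisect.bisect_left(index, (field1,))
    hi = bisect.bisect_left(index, (field1 + 1,))  # assumes integer field1
    part = index[lo:hi]              # contiguous because field1 is the index prefix
    if descending:
        part = part[::-1]            # ORDER BY mydate DESC: read the slice backwards
    return [(mydate, [row_id for _, _, row_id in group])
            for mydate, group in groupby(part, key=lambda entry: entry[1])]

# Building the simulated index is the only sort; querying it needs none.
index = sorted([(5, "2024-05-02", 3), (5, "2024-05-01", 1), (4, "2024-05-01", 9),
                (5, "2024-05-01", 2), (6, "2024-05-03", 7)])
print(grouped_for(index, 5))
print(grouped_for(index, 5, descending=True))
```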
Slow It Down
What's happening here is that you are asking MySQL to sort the records based on a (probably) non-indexed column mydate.
Any sort takes time, but sorts on indexed columns are blazing fast compared to non-indexed ones.
Here's some additional reading: http://www.mysqlperformanceblog.com/2006/09/01/order-by-limit-performance-optimization/