I have the following query which is a little expensive (currently 500ms):
SELECT * FROM events AS e, event_dates AS ed
WHERE e.id=ed.event_id AND ed.start >= DATE(NOW())
GROUP BY e.modified_datetime, e.id
ORDER BY e.modified_datetime DESC,e.created_datetime DESC
LIMIT 0,4
I have been trying to figure out how to speed it up and noticed that changing ed.start >= DATE(NOW()) to ed.start = DATE(NOW()) runs the query in 20ms. Can anyone help me with ways to speed up this date comparison? Would it help to calculate DATE(NOW()) before running the query?
EDIT: Does this help? Here is the EXPLAIN output for the event_dates table:
BEFORE (with >=):
table=event_dates
type=range
rows=25962
ref=NULL
extra=Using where; Using temporary; Using filesort
AFTER (with =):
table=event_dates
type=ref
rows=211
ref=const
extra=Using temporary; Using filesort
SELECT * FROM events AS e
INNER JOIN event_dates AS ed ON (e.id=ed.event_id)
WHERE ed.start >= DATE(NOW())
GROUP BY e.modified_datetime, e.id
ORDER BY e.modified_datetime DESC,e.created_datetime DESC
LIMIT 0,4
Remarks
Please don't use implicit SQL-89 join syntax; it is an SQL anti-pattern.
Make sure you have an index on all fields used in the join, the WHERE clause, the GROUP BY, and the ORDER BY (see the sketch after these remarks).
Don't do SELECT * (another anti-pattern); explicitly state the fields you need instead.
Try using InnoDB instead of MyISAM; InnoDB has more optimization tricks for SELECT statements, especially if you only select indexed fields.
For MyISAM tables, try using REPAIR TABLE tablename.
For InnoDB that's not an option, but forcing the table type from MyISAM to InnoDB will obviously force a full rebuild of the table and all its indexes.
GROUP BY implicitly sorts the rows in ASC order; try changing it to GROUP BY -e.modified_datetime, e.id to minimize the reordering needed by the ORDER BY clause. (Not sure about this point; would like to know the result.)
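As a sketch of the indexing remark above, the DDL below could be a starting point; the index names are made up, and the exact column choices should be validated with EXPLAIN:

-- Hypothetical index names; adjust to your schema.
ALTER TABLE event_dates ADD INDEX idx_event_start (event_id, `start`);
ALTER TABLE events ADD INDEX idx_mod_created (modified_datetime, created_datetime);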
For reference, using comma notation for joins is poor practice AND has been a cause of poor execution plans.
SELECT
*
FROM
events AS e
INNER JOIN
event_dates AS ed
ON e.id=ed.event_id
WHERE
ed.start >= DATE(NOW())
GROUP BY
e.modified_datetime,
e.id
ORDER BY
e.modified_datetime DESC,
e.created_datetime DESC
LIMIT 0,4
Why = is faster than >= is simply because >= matches a range of values, not one specific value. It's like saying "get me every page in the book from page 101 onwards" instead of "get me page 101". It's more work by definition, especially as your query then involves aggregating and sorting many more rows.
In terms of optimisation, your best option is to ensure relevant indexes...
event_dates:
- an index just on start should be sufficient
events:
- an index on id will dramatically improve the join performance
- adding modified_datetime and created_datetime to that index may help
You are probably missing indexes on the fields you are grouping and searching on. Please provide us with the output of SHOW INDEXES FROM events and SHOW INDEXES FROM event_dates.
If there are no indexes then you can add them:
ALTER TABLE events ADD INDEX(modified_datetime);
ALTER TABLE events ADD INDEX(created_datetime);
ALTER TABLE event_dates ADD INDEX(start);
Also be sure you have indexes on the id fields; here you would probably want them to be primary keys.
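If they are not already primary keys, a minimal sketch (assuming id is the identifier column of events; the key layout of event_dates itself is unknown, so only the foreign-key index is shown):

ALTER TABLE events ADD PRIMARY KEY (id);
ALTER TABLE event_dates ADD INDEX (event_id);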
Calculating DATE(NOW()) in advance will not have any impact on performance; it's computed only once per statement (not for each row). But you have two different queries (one with >=, another with =). It seems natural that the first one (>=) takes longer to execute, since it returns many more rows. It may also get a different execution plan than the query with =, for example a full table scan instead of an index seek/scan.
You can do something like this (note: this suggestion uses SQL Server syntax, and the # characters in the original were mangled @ signs):
DECLARE @CURRENTDATE AS DATETIME
SET @CURRENTDATE = GETDATE()
then change your code to use the @CURRENTDATE variable: ed.start >= @CURRENTDATE
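Translated to MySQL, the same idea can be sketched with a user variable; as noted above, DATE(NOW()) is already evaluated only once per statement, so don't expect a large gain from this:

SET @currentdate = DATE(NOW());

SELECT e.*, ed.*
FROM events AS e
INNER JOIN event_dates AS ed ON e.id = ed.event_id
WHERE ed.start >= @currentdate
GROUP BY e.modified_datetime, e.id
ORDER BY e.modified_datetime DESC, e.created_datetime DESC
LIMIT 0, 4;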
Related
There are two examples.
In the first example, the query is faster when using ORDER BY (according to phpMyAdmin's speed report).
In the other example, without ORDER BY, the query is slower (according to phpMyAdmin's speed report).
Isn't it strange that the query is faster when using ORDER BY?
The ordering doesn't matter to me; it's the speed that matters.
select bayi,tutar
from siparisler
where durum='1' and MONTH(tarih) = MONTH(CURDATE()) and YEAR(tarih) = YEAR(CURRENT_DATE())
order by id desc
Speed: 0.0006
select bayi,tutar
from siparisler
where durum='1' and MONTH(tarih) = MONTH(CURDATE()) and YEAR(tarih) = YEAR(CURRENT_DATE())
Speed: 0.7785
An ORDER BY query will never execute faster than the same query without the ORDER BY clause; sorting rows is extra work for the database. In the best-case scenario, the sorting becomes a no-op because MySQL fetched the rows in the correct order in the first place: but that just makes the two queries equivalent in terms of performance (it does not make the query that sorts faster).
Possibly, the result of the ORDER BY query was already cached, so MySQL gives you the result directly from the cache rather than actually executing the query.
If performance is what matters most to you, let me suggest changing the WHERE predicate so it does not apply date functions to the tarih column: such a construct prevents the database from taking advantage of an index (we say the predicate is non-SARGable). Consider:
select bayi, tutar
from siparisler
where
durum = 1
and tarih >= date_format(current_date, '%Y-%m-01')
and tarih < date_format(current_date, '%Y-%m-01') + interval 1 month
order by id desc
For performance with this query, consider an index on (durum, tarih, id desc, bayi, tutar): it should behave as a covering index that MySQL can use to execute the entire query without even looking at the actual table rows.
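A sketch of that covering index (the index name is made up; descending key parts require MySQL 8.0+, earlier versions parse but ignore the DESC):

ALTER TABLE siparisler
    ADD INDEX idx_durum_tarih_cover (durum, tarih, id DESC, bayi, tutar);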
At 0.0006s, you are almost certainly measuring the performance of the query_cache rather than the execution time. Try both queries again with SELECT SQL_NO_CACHE and see what the performance difference is.
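For example (note that the query cache was removed in MySQL 8.0, so SQL_NO_CACHE is only meaningful on 5.7 and earlier):

SELECT SQL_NO_CACHE bayi, tutar
FROM siparisler
WHERE durum = '1'
  AND MONTH(tarih) = MONTH(CURDATE())
  AND YEAR(tarih) = YEAR(CURRENT_DATE())
ORDER BY id DESC;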
First, I recommend writing the query as:
select bayi, tutar
from siparisler p
where durum = 1 and -- no quotes, assuming this is an integer
      tarih >= curdate() - interval (day(curdate()) - 1) day;
This can take advantage of an index on (durum, tarih).
But that isn't your question. It is possible that the order by could result in a radically different execution plan. This is hypothetical, but the intention is to explain how this might occur.
Let me assume the following:
The table only has an index on (id desc, durum, tarih).
The where clause matches few rows.
The rows are quite wide.
The query without the order by would probably generate an execution plan that is a full table scan. Because the rows are wide, lots of unnecessary data would be read.
The query with the order by could read the data in order and then apply the where conditions. This would be faster than the other version, because only the rows that match the where conditions would be read in.
I cannot guarantee that this is happening. But there are some counterintuitive situations that arise with queries.
You can analyze it with the EXPLAIN command, then check the value of the type field (for example index or ALL):
Example:
EXPLAIN SELECT bayi,tutar
FROM siparisler
WHERE durum='1' AND MONTH(tarih) = MONTH(CURDATE()) AND YEAR(tarih) = YEAR(CURRENT_DATE())
ORDER BY id DESC;
I have a query that works, but it is slow. Is there a way to speed this up? Basically I have a table with timecard entries, and then a second table with time breakdowns of each entry, related by the TimecardID. What I am looking for is timecards for which there are no breakdowns. I thought cutting the criteria down to 2 months would speed it up. Thanks for your help.
SELECT * FROM Timecards
WHERE NOT EXISTS (SELECT TimeCardID FROM TimecardBreakdown WHERE Timecards.ID = TimecardBreakdown.TimeCardID)
AND Status <> 0
AND DateIn >= CURRENT_DATE() - INTERVAL 2 MONTH
It seems you want to know the TimecardIDs which do not exist in the TimecardBreakdown table, in which case you can use the left outer join.
SELECT a.*
FROM Timecards a
LEFT OUTER JOIN TimecardBreakdown b ON a.ID = b.TimeCardID
WHERE b.TimeCardID IS NULL
This gets rid of the subquery (which is expensive) and uses a join (which is more efficient) instead.
MySQL is not good at executing correlated subqueries quickly. Try to make your subqueries independent and join them instead. You can use the LEFT JOIN ... IS NULL pattern to replace WHERE NOT EXISTS.
SELECT tc.*
FROM Timecards tc
LEFT JOIN TimecardBreakdown tcb ON tc.ID = tcb.TimeCardId
WHERE tc.DateIn >= CURRENT_DATE() - INTERVAL 2 MONTH
AND tc.Status <> 0
AND tcb.TimeCardId IS NULL
Some optimization points.
First, if you can change tc.Status <> 0 to tc.Status > 0 it makes an index range scan possible on that column.
Second, when you're optimizing stuff, SELECT * is considered harmful. Instead, if you can give the names of just the columns you need, things will be quicker. The database server has to sling around all the data you ask for; it can't tell if you're going to ignore some of it.
Third, this query will be helped by a compound index on Timecards (DateIn, Status, ID). That compound index can be used to do the heavy lifting of satisfying your query conditions.
That's called a covering index; it contains the data needed to satisfy much of your query. If you were to index just the DateIn column, then the query handler would have to bounce back to the main table to find the values of Status and ID. When those columns appear in the index, it saves that extra operation.
If you SELECT a certain set of columns rather than doing SELECT *, including those columns in the covering index can dramatically improve query performance. That's one of several reasons SELECT * is considered harmful.
(Some makes and models of DBMS have ways to let columns ride along on an index without actually indexing them. MySQL requires you to index them, but covering indexes still help; see the sketch below.)
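As a sketch, the compound index suggested above could be created like this (the index name is hypothetical):

ALTER TABLE Timecards ADD INDEX idx_datein_status_id (DateIn, Status, ID);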
Read this: http://use-the-index-luke.com/
I have this query which basically goes through a bunch of tables to get me some formatted results, but I can't seem to find the bottleneck. The most obvious bottleneck was the ORDER BY RAND(), but performance is still bad without it.
The query takes from 10 to 20 seconds without ORDER BY RAND():
SELECT
c.prix AS prix,
ST_X(a.point) AS X,
ST_Y(a.point) AS Y,
s.sizeFormat AS size,
es.name AS estateSize,
c.title AS title,
DATE_FORMAT(c.datePub, '%m-%d-%y') AS datePub,
dbr.name AS dateBuiltRange,
m.myId AS meuble,
c.rawData_id AS rawData_id,
GROUP_CONCAT(img.captionWebPath) AS paths
FROM
immobilier_ad_blank AS c
LEFT JOIN PropertyFeature AS pf ON (c.propertyFeature_id = pf.id)
LEFT JOIN Adresse AS a ON (c.adresse_id = a.id)
LEFT JOIN Size AS s ON (pf.size_id = s.id)
LEFT JOIN EstateSize AS es ON (pf.estateSize_id = es.id)
LEFT JOIN Meuble AS m ON (pf.meuble_id = m.id)
LEFT JOIN DateBuiltRange AS dbr ON (pf.dateBuiltRange_id = dbr.id)
LEFT JOIN ImageAd AS img ON (img.commonAd_id = c.rawData_id)
WHERE
c.prix != 0
AND pf.subCatMyId = 1
AND (
(
c.datePub > STR_TO_DATE('01-04-2016', '%d-%m-%Y')
AND c.datePub < STR_TO_DATE('30-04-2016', '%d-%m-%Y')
)
OR date_format(c.datePub, '%d-%m-%Y') = '30-04-2016'
)
AND a.validPoint = 1
GROUP BY
c.id
#ORDER BY
# RAND()
LIMIT
5000
Here is the EXPLAIN output, its visual portion, and a screenshot of mysqltuner (attached as images, not reproduced here).
EDIT 1: I have many indexes; they are listed in an attached screenshot (not reproduced here).
EDIT 2: So you guys did it: down to 0.5 to 2.5 secs. I mostly followed all of your advice, changed some settings in my.cnf, and ran OPTIMIZE on my tables.
You're searching for dates in a very suboptimal way. Try this.
... c.datePub >= STR_TO_DATE('01-04-2016', '%d-%m-%Y')
AND c.datePub < STR_TO_DATE('30-04-2016', '%d-%m-%Y') + INTERVAL 1 DAY
That allows a range scan on an index on the datePub column. You should create a compound index for that table on (datePub, prix, adresse_id, rawData_id) and see if it helps.
Also try an index on Adresse (validPoint). Notice that your use of a geometry data type in that table is probably not helping anything.
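A sketch of both suggestions (the index names are made up; the compound column list is the one proposed above):

ALTER TABLE immobilier_ad_blank
    ADD INDEX idx_datepub_cover (datePub, prix, adresse_id, rawData_id);
ALTER TABLE Adresse ADD INDEX idx_validpoint (validPoint);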
To begin with, you have quite a lot of indexes, but many of them are not useful. Remember: more indexes mean slower INSERTs and UPDATEs. Also, MySQL is not good at using more than one index per table in complex queries. The following indexes have a cardinality < 10 and should probably be dropped.
IDX_...E88B
IDX....62AF
IDX....7DEE
idx2
UNIQ...F210
UNIQ...F210..
IDX....0C00
IDX....A2F1
At this point I got tired of the exercise; there are many more.
Then you have some duplicated data.
point
lat
lng
The point field has the lat and lng in it. So the latter two are not needed. That means you can lose two more indexes idxlat and idxlng. I am not quite sure how idxlng appears twice in the index list for the same table.
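If nothing else reads lat and lng directly, the cleanup could be sketched as below, assuming the columns and the idxlat/idxlng indexes live on the Adresse table (verify with SHOW CREATE TABLE before dropping anything):

ALTER TABLE Adresse
    DROP INDEX idxlat,
    DROP INDEX idxlng,
    DROP COLUMN lat,
    DROP COLUMN lng;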
These optimizations will lead to an overall increase in performance for INSERTS and UPDATES and possibly for all SELECTs as well because the query planner needs to spend less time deciding which index to use.
Then we notice from your EXPLAIN that the query does not use any index on the table Adresse (a). But your WHERE clause has a.validPoint = 1, so clearly you need an index on it, as suggested by @Ollie-Jones.
However I suspect that this index may have low cardinality. In that case I recommend that you create a composite index on this column + another.
The problem is your join with (a). The table has an index, but the index can't be used, more than likely due to the sort (/ GROUP BY), or possibly incompatible types. The EXPLAIN shows three quarters of a million rows examined, which means that an index lookup was not possible.
When designing a query, look for the smallest possible result set - search by that index, and then join from there. Perhaps "c" isn't the best table for the primary query.
(You could try using FORCE INDEX (id) on table a; if it doesn't work, the error may give you more information. A sketch of the syntax follows below.)
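As an illustration of the hint syntax, assuming the join goes through the primary key of Adresse (in that case the index name is PRIMARY; the column list here is trimmed to the essentials):

SELECT c.id
FROM immobilier_ad_blank AS c
LEFT JOIN Adresse AS a FORCE INDEX (PRIMARY)
    ON (c.adresse_id = a.id)
WHERE a.validPoint = 1;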
As others have pointed out, you need an index on a.validPoint, but what about c.datePub, which is also used in the WHERE clause? Why not a multiple-column index on (datePub, adresse_id)? The index on adresse_id is already used, so a multiple-column index will be better here.
I'm building a complex multi-table MySQL query, and even though it works, I'm wondering could I make it more simple.
The idea behind it is this, using the Events table that logs all site interaction, select the ID, Title, and Slug of the 10 most popular blog posts, and order by the most hits descending.
SELECT content.id, content.title, content.slug, COUNT(events.id) AS hits
FROM content, events
WHERE events.created >= DATE_SUB(NOW(), INTERVAL 1 MONTH)
AND events.page_url REGEXP '^/posts/[0-9]'
AND content.id = events.content_id
GROUP BY content.id
ORDER BY hits DESC
LIMIT 10
Blog post URLs have the following format:
/posts/2013-05-16-hello-world
As I mentioned it seems to work, but I'm sure I could be doing this cleaner.
Thanks,
The condition on created and the condition on page_url are both range conditions. You can get index-assistance for only one range condition per table in a SQL query, so you have to pick one or the other to index.
I would create an index on the events table over two columns (content_id, created).
ALTER TABLE events ADD KEY (content_id, created);
I'm assuming that restricting by created date is more selective than restricting by page_url, because I assume "/posts/" is going to match a large majority of the events.
After narrowing down the matching rows by created date, the page-url condition will have to be handled by the SQL layer, but hopefully that won't be too inefficient.
There is no performance difference between SQL-89 ("comma-style") join syntax and SQL-92 JOIN syntax. I do recommend SQL-92 syntax because it's more clear and it supports outer joins, but performance is not a reason to use it. The SQL query optimizer supports both join styles.
Temporary table and filesort are often costly for performance. This query is bound to create a temporary table and use a filesort, because you're using GROUP BY and ORDER BY against different columns. You can only hope that the temp table will be small enough to fit within your tmp_table_size limit (or increase that value); see the sketch below. But that won't help if content.title or content.slug are BLOB/TEXT columns: the temp table will then be forced to spool to disk anyway.
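To check and, if needed, raise that limit for the current session, a sketch (64 MB is an arbitrary example value):

SHOW VARIABLES LIKE 'tmp_table_size';
-- The effective in-memory limit is the smaller of tmp_table_size and
-- max_heap_table_size, so raise both together.
SET SESSION tmp_table_size      = 64 * 1024 * 1024;
SET SESSION max_heap_table_size = 64 * 1024 * 1024;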
Instead of a regular expression, you can use the left function:
SELECT content.id, content.title, content.slug, COUNT(events.id) AS hits
FROM content
JOIN events ON content.id = events.content_id
WHERE events.created >= DATE_SUB(NOW(), INTERVAL 1 MONTH)
AND LEFT(events.page_url, 7) = '/posts/'
GROUP BY content.id
ORDER BY hits DESC
LIMIT 10;
But that's just off the top of my head, and without a fiddle, untested. The JOIN suggestion, made in the comment, is also good and has been reflected in my answer.
I have a problem optimizing a really slow SQL query. I think it is an index problem, but I can't find which index I have to apply.
This is the query:
SELECT
cl.ID, cl.title, cl.text, cl.price, cl.URL, cl.ID AS ad_id, cl.cat_id,
pix.file_name, area.area_name, qn.quarter_name
FROM classifieds cl
/*FORCE INDEX (date_created) */
INNER JOIN classifieds_pix pix ON cl.ID = pix.classified_id AND pix.picture_no = 0
INNER JOIN zip_codes zip ON cl.zip_id = zip.zip_id AND zip.area_id = 132
INNER JOIN area_names area ON zip.area_id = area.id
LEFT JOIN quarter_names qn ON zip.quarter_id = qn.id
WHERE
cl.confirmed = 1
AND cl.country = 'DE'
AND cl.date_created <= NOW() - INTERVAL 1 DAY
ORDER BY
    cl.date_created DESC
LIMIT 7
MySQL takes about 2 seconds to get the result, starting the plan on pix.picture_no, but if I force the index on date_created the query goes much faster and takes only 0.030 s. The problem is that the INNER JOIN zip_codes ... is not always in the query, and when it is not, the forced index makes the query slow again.
I've been thinking of working around this with PHP conditions, but I would like to know what the problem with the indexes is.
These are several suggestions on how to optimize your query.
NOW() function - You're using the NOW() function in your WHERE clause. Instead, I recommend using a constant date/timestamp, to allow the value to be cached and the plan to be optimized. If you need a dynamic value, calculate the current timestamp in the application and inject it into the query as a constant before executing the query.
To test this recommendation before implementing the change, just replace NOW() with a constant timestamp and check for performance improvements; see the example below.
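For example, freezing the date condition with a made-up literal in place of NOW():

SELECT cl.ID, cl.title
FROM classifieds cl
WHERE cl.confirmed = 1
  AND cl.country = 'DE'
  AND cl.date_created <= '2016-04-30 12:00:00'  -- constant instead of NOW() - INTERVAL 1 DAY
ORDER BY cl.date_created DESC
LIMIT 7;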
Indexes - in general, I would suggest adding an index that contains all columns of your WHERE clause, in this case: confirmed, country, date_created. Start with the column that will cut down the data the most and move forward from there. Make sure the leading columns of the index match the columns your WHERE clause filters on with equality, otherwise the index won't be used.
I used EverSQL SQL Query Optimizer to get these recommendations (disclaimer: I'm a co-founder of EverSQL and humbly provide these suggestions).
I would actually have a compound index on all elements of your WHERE clause, such as (country, confirmed, date_created).
Having country first keeps the index subset to one country, then within that to the rows that are confirmed, and finally to the date range itself. Don't query on just the date index alone. Since you are ordering by date, the index should be able to optimize that too; see the sketch below.
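A sketch of that index (the name is hypothetical):

ALTER TABLE classifieds
    ADD INDEX idx_country_confirmed_created (country, confirmed, date_created);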
Add EXPLAIN in front of the query and run it again. This will show you the indexes that are being used.
See: 13.8.2 EXPLAIN Statement
And for an explanation of explain see MySQL Explain Explained. Or: Optimizing MySQL: Queries and Indexes