Optimize slow SQL query using indexes

Optimize slow SQL query using indexes - mysql

I have a problem optimizing a really slow SQL query. I think is an index problem, but I can´t find which index I have to apply.
This is the query:
SELECT
cl.ID, cl.title, cl.text, cl.price, cl.URL, cl.ID AS ad_id, cl.cat_id,
pix.file_name, area.area_name, qn.quarter_name
FROM classifieds cl
/*FORCE INDEX (date_created) */
INNER JOIN classifieds_pix pix ON cl.ID = pix.classified_id AND pix.picture_no = 0
INNER JOIN zip_codes zip ON cl.zip_id = zip.zip_id AND zip.area_id = 132
INNER JOIN area_names area ON zip.area_id = area.id
LEFT JOIN quarter_names qn ON zip.quarter_id = qn.id
WHERE
cl.confirmed = 1
AND cl.country = 'DE'
AND cl.date_created <= NOW() - INTERVAL 1 DAY
ORDER BY
cl.date_created
desc LIMIT 7
MySQL takes about 2 seconds to get the result, and start working in pix.picture_no, but if I force index to "date_created" the query goes much faster, and takes only 0.030 s. But the problem is that the "INNER JOIN zip_codes..." is not always in the query, and when is not, the forced index make the query slow again.
I've been thinking in make a solution by PHP conditions, but I would like to know what is the problem with indexes.

These are several suggestions on how to optimize your query.
NOW Function - You're using the NOW() function in your WHERE clause. Instead, I recommend to use a constant date / timestamp, to allow the value to be cached and optimized. Otherwise, the value of NOW() will be evaluated for each row in the WHERE clause. An alternative to a constant value in case you need a dynamic value, is to add the value from the application (for example calculate the current timestamp and inject it to the query as a constant in the application before executing the query.
To test this recommendation before implementing this change, just replace NOW() with a constant timestamp and check for performance improvements.
Indexes - in general, I would suggest adding an index the contains all columns of your WHERE clause, in this case: confirmed, country, date_created. Start with the column that will cut the amount of data the most and move forward from there. Make sure you adjust the WHERE clause to the same order of the index, otherwise the index won't be used.
I used EverSQL SQL Query Optimizer to get these recommendations (disclaimer: I'm a co-founder of EverSQL and humbly provide these suggestions).

I would actually have a compound index on all elements of your where such as
(country, confirmed, date_created)
Having the country first would keep your optimized index subset to one country first, then within that, those that are confirmed, and finally the date range itself. Don't query on just the date index alone. Since you are ordering by date, the index should be able to optimize it too.

Add explain in front of the query and run it again. This will show you the indexes that are being used.
See: 13.8.2 EXPLAIN Statement
And for an explanation of explain see MySQL Explain Explained. Or: Optimizing MySQL: Queries and Indexes

Related

How MYSQL Engine works in Query

This is question is simple i know but i would like to somebody explain me to see if i am getting right.
This simple query does MYSQL always executes the semantics from left to right?
SELECT c03,c04,c05,count(*) as c FROM table
where status='Active' and c04 is not null and c05 is not null
group by c03,c04,c05
having c>1 order by c desc limit 10;
The engine starts from left to right filtering every record by
* Status later comparing c04 and later c05
* Later groups the results
* later filters again the result applying the c>1 filter
* Later sorts and fetch the first 10 results and disposing the others
Or have some other optimizations suppose not index are using...

I guess Status later comparing c04 and later c05 depends on the available indexes and the cardinality of the columns, and is not always evaluated in that order. MySQL will try to evaluate the constraint first, which promises to reduce the result set the most at the lowest cost, e.g. constraints using columns without indexes might be evaluated last, because the evaluation is expensive.
The rest leaves little room for other orderings, but some steps may gain benefit from appropriate indexes.

Query Speed Issue with NOT EXISTS condition

I have a query that works, but it is slow. Is there a way to speed this up? Basically I have a table with timecard entries, and then a second table with time breakdowns of that entry, related by the TimecardID. What I am looking for is timeblocks that there are no breakdowns for. I thought if I cut the criteria down to 2 months that it would speed it up. Thanks for your help
SELECT * FROM Timecards
WHERE NOT EXISTS (SELECT TimeCardID FROM TimecardBreakdown WHERE Timecards.ID = TimecardBreakdown.TimeCardID)
AND Status <> 0
AND DateIn >= CURRENT_DATE() - INTERVAL 2 MONTH

It seems you want to know the TimecardIDs which do not exist in the TimecardBreakdown table, in which case you can use the left outer join.
SELECT a.*
FROM Timecards a
LEFT OUTER JOIN TimecardBreakdown b ON a.TimecardID = b.TimecardID
WHERE b.TimecardID IS NULL
This would get rid of the subquery (which is expensive) and use join (which is more efficient).

MySQL stinks doing correlated subqueries fast. Try to make your subqueries independent and join them. You can use the LEFT JOIN ... IS NULL pattern to replace WHERE NOT EXISTS.
SELECT tc.*
FROM Timecards tc
LEFT JOIN TimecardBreakdown tcb ON tc.ID = tcb.TimeCardId
WHERE tc.DateIn >= CURRENT_DATE() - INTERVAL 2 MONTH
AND tc.Status <> 0
AND tcb.TimeCardId IS NULL
Some optimization points.
First, if you can change tc.Status <> 0 to tc.Status > 0 it makes an index range scan possible on that column.
Second, when you're optimizing stuff, SELECT * is considered harmful. Instead, if you can give the names of just the columns you need, things will be quicker. The database server has to sling around all the data you ask for; it can't tell if you're going to ignore some of it.
Third, this query will be helped by a compound index on Timecards (DateIn, Status, ID). That compound index can be used to do the heavy lifing of satisfying your query conditions.
That's called a covering index; it contains the data needed to satisfy much of your query. If you were to index just the DateIn column, then the query handler would have to bounce back to the main table to find the values of Status and ID. When those columns appear in the index, it saves that extra operation.
If you SELECT a certain set of columns rather than doing SELECT *, including those columns in the covering index can dramatically improve query performance. That's one of several reasons SELECT * is considered harmful.
(Some makes and model of DBMS have ways to specify lists of columns to ride along on indexes without actually indexing them. MySQL requires you to index them. But covering indexes still help.)
Read this: http://use-the-index-luke.com/

MySQL Slow query ~ 10 seconds

I have this query which basically goes through a bunch of tables to get me some formatted results but I can't seem to find the bottleneck. The easiest bottleneck was the ORDER BY RAND() but the performance are still bad.
The query takes from 10 sec to 20 secs without ORDER BY RAND();
SELECT
c.prix AS prix,
ST_X(a.point) AS X,
ST_Y(a.point) AS Y,
s.sizeFormat AS size,
es.name AS estateSize,
c.title AS title,
DATE_FORMAT(c.datePub, '%m-%d-%y') AS datePub,
dbr.name AS dateBuiltRange,
m.myId AS meuble,
c.rawData_id AS rawData_id,
GROUP_CONCAT(img.captionWebPath) AS paths
FROM
immobilier_ad_blank AS c
LEFT JOIN PropertyFeature AS pf ON (c.propertyFeature_id = pf.id)
LEFT JOIN Adresse AS a ON (c.adresse_id = a.id)
LEFT JOIN Size AS s ON (pf.size_id = s.id)
LEFT JOIN EstateSize AS es ON (pf.estateSize_id = es.id)
LEFT JOIN Meuble AS m ON (pf.meuble_id = m.id)
LEFT JOIN DateBuiltRange AS dbr ON (pf.dateBuiltRange_id = dbr.id)
LEFT JOIN ImageAd AS img ON (img.commonAd_id = c.rawData_id)
WHERE
c.prix != 0
AND pf.subCatMyId = 1
AND (
(
c.datePub > STR_TO_DATE('01-04-2016', '%d-%m-%Y')
AND c.datePub < STR_TO_DATE('30-04-2016', '%d-%m-%Y')
)
OR date_format(c.datePub, '%d-%m-%Y') = '30-04-2016'
)
AND a.validPoint = 1
GROUP BY
c.id
#ORDER BY
# RAND()
LIMIT
5000
Here is the explain query:
Visual Portion:
And here is a screenshot of mysqltuner
EDIT 1
I have many indexes Here they are:
EDIT 2:
So you guys did it. Down to .5 secs to 2.5 secs.
I mostly followed all of your advices and changed some of my.cnf + runned optimized on my tables.

You're searching for dates in a very suboptimal way. Try this.
... c.datePub >= STR_TO_DATE('01-04-2016', '%d-%m-%Y')
AND c.datePub < STR_TO_DATE('30-04-2016', '%d-%m-%Y') + INTERVAL 1 DAY
That allows a range scan on an index on the datePub column. You should create a compound index for that table on (datePub, prix, addresse_id, rawData_id) and see if it helps.
Also try an index on a (valid_point). Notice that your use of a geometry data type in that table is probably not helping anything.

To begin with you have quite a lot of indexes but many of them are not useful. Remember more indexes means slower inserts and updates. Also mysql is not good at using more than one index per table in complex queries. The following indexes have a cardinality < 10 and probably should be dropped.
IDX_...E88B
IDX....62AF
IDX....7DEE
idx2
UNIQ...F210
UNIQ...F210..
IDX....0C00
IDX....A2F1
At this point I got tired of the excercise, there are many more
Then you have some duplicated data.
point
lat
lng
The point field has the lat and lng in it. So the latter two are not needed. That means you can lose two more indexes idxlat and idxlng. I am not quite sure how idxlng appears twice in the index list for the same table.
These optimizations will lead to an overall increase in performance for INSERTS and UPDATES and possibly for all SELECTs as well because the query planner needs to spend less time deciding which index to use.
Then we notice from your explain that the query does not use any index on table Adresse (a). But your where clause has a.validPoint = 1 clearly you need an index on it as suggested by #Ollie-Jones
However I suspect that this index may have low cardinality. In that case I recommend that you create a composite index on this column + another.

The problem is your join with (a). The table has an index, but the index can't be used, more than likely due to the sort (/group by), or possibly incompatible types. The EXPLAIN shows three quarters of a million rows examined, this means that index lookup was not possible.
When designing a query, look for the smallest possible result set - search by that index, and then join from there. Perhaps "c" isn't the best table for the primary query.
(You could try using FORCE INDEX (id) on table a, if it doesn't work, the error may give you more information).

As others have pointed out, you need an index on a.validPoint but what about c.datePub that is also used in the WHERE clause. Why not a multiple column index on datePub, address_id the index on address_id is already used, so a multiple column index will be better here.

Mysql query - faster?

So I have this MySQL query, and as I have lots of records this gets very slow, the computers that use the software (cash registers) aren't that powerful either.
Is there a way to get the same result, but faster? Would really appreciate help!
SELECT d.sifra, COUNT(d.sifra) AS pogosti, c.*, s.Stevilka as Stev_sk FROM Cenik c, dnevna d, Podskupina s
WHERE d.sifra = c.Sifra AND d.datum >= DATE(DATE_SUB(NOW(),INTERVAL 3 DAY))
GROUP BY d.sifra ORDER BY pogosti DESC limit 27

Have you tried indexing?
You are using c.Sifra in the WHERE, so you probably want
CREATE INDEX Cenik_Sifra ON Cenik(Sifra);
Also you use datum and sifra from dnevna, and datum is your SELECT, so
CREATE INDEX dnevna_ndx ON dnevna(datum, sifra);
Finally there's no JOIN condition on Podskupina, whence you draw Stevilka. Is this a constant table? As it is, you're just counting rows in Podskupina and/or getting an unspecified value out of it, unless it only has the one row.
On some versions of MySQL you might also find benefit in pre-calculating the datum:
SELECT #datum := DATE(DATE_SUB(NOW(), INTERVAL 3 DAY))
and then use #datum in your query. This might improve its chances of a good indexed performance.
Without knowing more about the structure and cardinality of the involved tables, though, there's little that can be done.
At the very least you should post the result of
EXPLAIN SELECT...(your select)
in the question.

you don't have condition to join Podskupina s, and you get cross join (all to all), so you get x rows from join "d.sifra = c.Sifra" multiplicate by y rows of Podskupina s

This looks like a very problematic query. Do you really need to return all of c.* ? And where's the join or filter on Podskupina? Once you tighten the query, make sure you've created good indexes on the tables. For example, presuming you've already got a clustered index on a unique ID as a primary key in dnevna, performance would typically benefit by putting a secondary index on the sifra and datum columns.

Instructing MySQL to apply WHERE clause to rows returned by previous WHERE clause

I have the following query:
SELECT dt_stamp
FROM claim_notes
WHERE type_id = 0
AND dt_stamp >= :dt_stamp
AND DATE( dt_stamp ) = :date
AND user_id = :user_id
AND note LIKE :click_to_call
ORDER BY dt_stamp
LIMIT 1
The claim_notes table has about half a million rows, so this query runs very slowly since it has to search against the unindexed note column (which I can't do anything about). I know that when the type_id, dt_stamp, and user_id conditions are applied, I'll be searching against about 60 rows instead of half a million. But MySQL doesn't seem to apply these in order. What I'd like to do is to see if there's a way to tell MySQL to only apply the note LIKE :click_to_call condition to the rows that meet the former conditions so that it's not searching all rows with this condition.
What I've come up with is this:
SELECT dt_stamp
FROM (
SELECT *
FROM claim_notes
WHERE type_id = 0
AND dt_stamp >= :dt_stamp
AND DATE( dt_stamp ) = :date
AND user_id = :user_id
)
AND note LIKE :click_to_call
ORDER BY dt_stamp
LIMIT 1
This works and is extremely fast. I'm just wondering if this is the right way to do this, or if there is a more official way to handle it.

It shouldn't be necessary to do this. The MySQL optimizer can handle it if you have multiple terms in your WHERE clause separated by AND. Basically, it knows how to do "apply all the conditions you can using indexes, then apply unindexed expressions only to the remaining rows."
But choosing the right index is important. A multi-column index is best for a series of AND terms than individual indexes. MySQL can apply index intersection, but that's much less effective than finding the same rows with a single index.
A few logical rules apply to creating multi-column indexes:
Conditions on unique columns are preferred over conditions on non-unique columns.
Equality conditions (=) are preferred over ranges (>=, IN, BETWEEN, !=, etc.).
After the first column in the index used for a range condition, subsequent columns won't use an index.
Most of the time, searching the result of a function on a column (e.g. DATE(dt_stamp)) won't use an index. It'd be better in that case to store a DATE data type and use = instead of >=.
If the condition matches > 20% of the table, MySQL probably will decide to skip the index and do a table-scan anyway.
Here are some webinars by myself and my colleagues at Percona to help explain index design:
Tools and Techniques for Index Design
MySQL Indexing: Best Practices
Advanced MySQL Query Tuning
Really Large Queries: Advanced Optimization Techniques
You can get the slides for these webinars for free, and view the recording for free, but the recording requires registration.

Don't go for the derived table solution as it is not performant. I'm surprised about the fact that having = and >= operators MySQL is going for the LIKE first.
Anyway, I'd say you could try adding some indexes on those fields and see what happens:
ALTER TABLE claim_notes ADD INDEX(type_id, user_id);
ALTER TABLE claim_notes ADD INDEX(dt_stamp);
The latter index won't actually improve the search on the indexes but rather the sorting of the results.
Of course, having an EXPLAIN of the query would help.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008