Optimize MySQL query

I have one query that is comparatively slow. I have tried to rewrite it many times, but I can't find a better solution. So I want to ask whether it is written the wrong way from the start, or whether it is OK.
SELECT sql_calc_found_rows
present_id, present_url, present_name, present_text_short, foto_name, price_id, price_price, price_amount, price_dis
FROM a_present
LEFT JOIN
(SELECT price_id, price_present_id, price_supplier_id, price_dis, price_amount,
(CASE WHEN price_dis <> 0 THEN price_dis ELSE price_amount END) as price_price
FROM a_price
WHERE
price_visibility = 1 AND price_deleted <> 1
GROUP BY price_id ) pri
ON pri.price_present_id = present_id
LEFT JOIN _present_fotos ON foto_id = present_title_foto
LEFT JOIN _cate_pres ON cp_present = present_id
WHERE present_visibility = 1 AND present_deleted <> 1 AND price_price > 0 AND present_out <> 1 AND cp_category IN (30,31,232,32)
GROUP BY present_id
ORDER BY price_price
LIMIT 8
Description: price_dis is the price after discount, and price_amount is the price before discount. Each product (present) has more than one price. Is there a faster way to select the final price?
If you find the table structure bad, I will be in trouble :)
Thank you very much!
EDIT: EXPLAIN SELECT output:

OK, so I see a couple of things that could be improved.
First, you are JOINing with a derived table (a subquery in the FROM clause). MySQL, at least in older versions, materializes such a derived table into a temporary table that has no indexes, so the join cannot use them (hence the slowdown). Instead of joining with a subquery, try JOINing with the a_price table itself and moving the CASE expression into the outer (parent) SELECT. This lets MySQL use indexes for the JOIN, which is really important when the subquery returns many rows.
It should look something like this (using MIN() and GROUP BY, since you need the minimum price):
SELECT (...), price_amount, price_dis, MIN((CASE WHEN pri.price_dis <> 0 THEN pri.price_dis ELSE pri.price_amount END)) as price_price
FROM a_present
LEFT JOIN a_price pri
ON pri.price_present_id = present_id AND price_visibility = 1 AND price_deleted <> 1
(...)
GROUP BY present_id
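A minimal sketch of this rewrite, run through Python's built-in sqlite3 module as a stand-in for MySQL. This is an assumption for illustration only: SQLite's optimizer differs from MySQL's, so the sketch checks the query logic, not its speed. Table and column names come from the question; the sample rows, the idx_price_present index and the cheapest_prices helper are hypothetical, and the photo and category joins are left out. Because price_price is now an aggregate, the original WHERE price_price > 0 filter has to move into HAVING.

```python
import sqlite3


def cheapest_prices(conn):
    """Return {present_id: lowest final price}, joining a_price directly with MIN(CASE ...)."""
    rows = conn.execute(
        """
        SELECT p.present_id,
               MIN(CASE WHEN pri.price_dis <> 0 THEN pri.price_dis
                        ELSE pri.price_amount END) AS price_price
        FROM a_present p
        LEFT JOIN a_price pri
               ON pri.price_present_id = p.present_id
              AND pri.price_visibility = 1
              AND pri.price_deleted <> 1
        WHERE p.present_visibility = 1 AND p.present_deleted <> 1
        GROUP BY p.present_id
        HAVING price_price > 0
        ORDER BY price_price
        """
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE a_present (present_id INTEGER PRIMARY KEY,
                            present_visibility INT, present_deleted INT);
    CREATE TABLE a_price (price_id INTEGER PRIMARY KEY, price_present_id INT,
                          price_dis REAL, price_amount REAL,
                          price_visibility INT, price_deleted INT);
    -- Hypothetical index supporting the join condition.
    CREATE INDEX idx_price_present
        ON a_price (price_present_id, price_visibility, price_deleted);
    INSERT INTO a_present VALUES (1, 1, 0), (2, 1, 0);
    INSERT INTO a_price VALUES
        (10, 1, 0,  50, 1, 0),   -- no discount: final price 50
        (11, 1, 40, 60, 1, 0),   -- discounted: final price 40
        (12, 1, 5,  10, 1, 1),   -- deleted, must be ignored
        (20, 2, 0,  30, 1, 0);
    """
)
print(cheapest_prices(conn))
```

Presents whose prices are all hidden or deleted get a NULL minimum from the LEFT JOIN and are dropped by the HAVING clause, which matches the intent of the original price_price > 0 filter.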
Second, as EXPLAIN SELECT shows, MySQL does not use an index on the _cate_pres table. You should make it use an index both for the JOIN and for selecting the categories you need (the ones listed in the IN (...) clause).
Try adding an index on _cate_pres.cp_category, and/or a composite index on this table over two columns, (cp_category, cp_present).
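To check that a composite index is actually picked up, here is a sketch using SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. It only stands in for MySQL's EXPLAIN (an assumption: the output format and optimizer decisions differ between the two engines), and the index name idx_cp_category_present and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE _cate_pres (cp_present INT, cp_category INT);
    -- Composite index: cp_category first for the IN (...) filter,
    -- cp_present second so the join column can be read from the index too.
    CREATE INDEX idx_cp_category_present ON _cate_pres (cp_category, cp_present);
    """
)
# Hypothetical sample data: every present belongs to four categories.
conn.executemany(
    "INSERT INTO _cate_pres VALUES (?, ?)",
    [(present, category) for present in range(1, 200) for category in (30, 31, 40, 232)],
)

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT cp_present FROM _cate_pres WHERE cp_category IN (30, 31, 232, 32)"
).fetchall()
# The last column of each plan row is the human-readable description of that step.
details = [row[-1] for row in plan]
print(details)
```

If a plan step mentions the index, the IN list is being resolved through it rather than by scanning the whole table.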
Generally, the goal (not always possible, but in your case I'm fairly sure it is) is to make the following disappear from the EXPLAIN SELECT output:
[key] NULL - this means no index is used for that table
[Extra] Using temporary - this means a temporary table is created to retrieve the results, which is usually bad for performance
[Extra] Using filesort - this means the rows cannot be read in index order and need an extra sorting pass, which can be slow.
Read more about indexes in the MySQL documentation, and pay close attention to EXPLAIN output.

Related

Slow count query with group

I have a common aggregation query:
SELECT
products.type,
count(products.id)
FROM
products
INNER JOIN product_colors
ON products.id = product_colors.product_id
AND product_colors.is_active = 1
AND product_colors.is_archive = 0
WHERE
(products.is_active = 1
AND product_colors.is_individual = 0
AND product_colors.is_visible = 1)
GROUP BY
type
It takes on the order of 0.1 seconds. The indexes look fine, and tmp_table_size = 128M and max_heap_table_size = 128M. Why is it so slow? Plain SELECTs are fast, but with GROUP BY and COUNT they are not.
Indexes on products table:
Indexes on product_colors table:
Explain SQL:
EDIT Product indexes:
Your indexing is not optimal for what you are asking. Instead of just having an index on each column individually (which can be a big waste), you should have composite indexes that better match what you are querying and that cover enough columns to handle any GROUP BY or ORDER BY.
In this case, your primary query is on ACTIVE products, grouped by type. So I would have a SINGLE index on your primary table on (is_active, type, id). This way, your WHERE criterion comes first via is_active, then your GROUP BY via type, and finally the ID that qualifies the record. Your query can then get everything it needs from the INDEX without going to the raw data pages.
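A rough way to see the covering-index effect, again using SQLite through Python's sqlite3 module as a stand-in for MySQL (an assumption; the two engines do not produce identical plans). The index name and sample rows are hypothetical; the extra name column exists only so that the table holds data the query does not need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products (id INTEGER PRIMARY KEY, type TEXT, is_active INT, name TEXT);
    -- WHERE column first, GROUP BY column second, counted column last.
    CREATE INDEX idx_products_active_type_id ON products (is_active, type, id);
    """
)
conn.executemany(
    "INSERT INTO products (type, is_active, name) VALUES (?, ?, ?)",
    [("shirt" if i % 3 else "shoe", i % 2, f"product {i}") for i in range(300)],
)

query = "SELECT type, COUNT(id) FROM products WHERE is_active = 1 GROUP BY type"
# SQLite typically reports a "SEARCH ... USING COVERING INDEX" step here,
# meaning the table rows themselves are never read.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
print(plan)
print(conn.execute(query).fetchall())
```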
Now, your secondary table. It should similarly have a composite index: first on the join criteria between the tables, THEN on the restrictions you are looking for, thus: (product_id, is_active, is_archive). Why you have one column for is_active and another for is_archive, I don't know. I would think that if something were archived, it would not be active to begin with, but that's just a guess.
Anyhow, these optimized indexes should help.
One last consideration on your count(products.id): do you intend to count DISTINCT products, or all records found? If one product has 8 colors, do you want its ID counted as 1 or 8?
count(*) would give 8
count(DISTINCT products.id) would give 1
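A small sketch of that difference, running the question's join through Python's sqlite3 module (the single product and its 8 colors are made-up sample data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products (id INTEGER PRIMARY KEY, type TEXT, is_active INT);
    CREATE TABLE product_colors (product_id INT, is_active INT, is_archive INT,
                                 is_individual INT, is_visible INT);
    INSERT INTO products VALUES (1, 'shirt', 1);
    """
)
# Hypothetical data: one active product with 8 active, visible colors.
conn.executemany("INSERT INTO product_colors VALUES (?, 1, 0, 0, 1)", [(1,)] * 8)

QUERY = """
    SELECT products.type, {aggregate}
    FROM products
    INNER JOIN product_colors
            ON products.id = product_colors.product_id
           AND product_colors.is_active = 1
           AND product_colors.is_archive = 0
    WHERE products.is_active = 1
      AND product_colors.is_individual = 0
      AND product_colors.is_visible = 1
    GROUP BY products.type
"""

per_row = conn.execute(QUERY.format(aggregate="COUNT(*)")).fetchall()
per_product = conn.execute(QUERY.format(aggregate="COUNT(DISTINCT products.id)")).fetchall()
print(per_row)      # [('shirt', 8)]
print(per_product)  # [('shirt', 1)]
```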
Give these a try:
products: INDEX(is_active, type)
product_colors: INDEX(product_id, is_individual, is_visible, is_active, is_archive)
Since products.id cannot be NULL, you may as well say COUNT(*) instead of count(products.id). (Or, as DRapp points out, maybe you need COUNT(DISTINCT products.id).)

Query Speed Issue with NOT EXISTS condition

I have a query that works, but it is slow. Is there a way to speed it up? Basically, I have a table with timecard entries and a second table with time breakdowns of each entry, related by TimecardID. What I am looking for is the timecards that have no breakdowns. I thought that limiting the criteria to the last 2 months would speed it up. Thanks for your help!
SELECT * FROM Timecards
WHERE NOT EXISTS (SELECT TimeCardID FROM TimecardBreakdown WHERE Timecards.ID = TimecardBreakdown.TimeCardID)
AND Status <> 0
AND DateIn >= CURRENT_DATE() - INTERVAL 2 MONTH
It seems you want the timecards whose IDs do not exist in the TimecardBreakdown table, in which case you can use a LEFT OUTER JOIN.
SELECT a.*
FROM Timecards a
LEFT OUTER JOIN TimecardBreakdown b ON a.ID = b.TimeCardID
WHERE b.TimeCardID IS NULL
This gets rid of the correlated subquery (which MySQL tends to execute expensively, once per outer row) and uses a join instead, which MySQL can usually execute more efficiently.
MySQL stinks at doing correlated subqueries fast. Try to make your subqueries independent and join them. You can use the LEFT JOIN ... IS NULL pattern to replace WHERE NOT EXISTS.
SELECT tc.*
FROM Timecards tc
LEFT JOIN TimecardBreakdown tcb ON tc.ID = tcb.TimeCardId
WHERE tc.DateIn >= CURRENT_DATE() - INTERVAL 2 MONTH
AND tc.Status <> 0
AND tcb.TimeCardId IS NULL
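A quick sanity check, in Python with sqlite3 standing in for MySQL, that the NOT EXISTS query and the LEFT JOIN ... IS NULL rewrite return the same rows. Assumptions: a fixed cutoff date replaces CURRENT_DATE() - INTERVAL 2 MONTH (MySQL syntax that SQLite does not support), and the Minutes column, the index and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Timecards (ID INTEGER PRIMARY KEY, Status INT, DateIn TEXT);
    CREATE TABLE TimecardBreakdown (TimeCardID INT, Minutes INT);
    CREATE INDEX idx_breakdown_card ON TimecardBreakdown (TimeCardID);
    INSERT INTO Timecards VALUES
        (1, 1, '2024-05-01'),  -- has breakdowns
        (2, 1, '2024-05-02'),  -- no breakdowns: expected in the result
        (3, 0, '2024-05-03'),  -- no breakdowns, but Status = 0
        (4, 2, '2024-01-01');  -- no breakdowns, but older than the cutoff
    INSERT INTO TimecardBreakdown VALUES (1, 30), (1, 45);
    """
)
CUTOFF = "2024-04-01"  # stands in for CURRENT_DATE() - INTERVAL 2 MONTH

not_exists = conn.execute(
    """
    SELECT tc.ID FROM Timecards tc
    WHERE NOT EXISTS (SELECT 1 FROM TimecardBreakdown tcb WHERE tcb.TimeCardID = tc.ID)
      AND tc.Status <> 0 AND tc.DateIn >= ?
    ORDER BY tc.ID
    """,
    (CUTOFF,),
).fetchall()

# Anti-join: keep only timecards for which the LEFT JOIN found no breakdown row.
left_join = conn.execute(
    """
    SELECT tc.ID FROM Timecards tc
    LEFT JOIN TimecardBreakdown tcb ON tc.ID = tcb.TimeCardID
    WHERE tc.DateIn >= ? AND tc.Status <> 0 AND tcb.TimeCardID IS NULL
    ORDER BY tc.ID
    """,
    (CUTOFF,),
).fetchall()
print(not_exists, left_join)  # [(2,)] [(2,)]
```

The IS NULL test has to be on the join column (or another column that is never NULL in TimecardBreakdown); otherwise matched rows with a NULL value would be mistaken for missing ones.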
Some optimization points.
First, if Status is never negative, you can change tc.Status <> 0 to tc.Status > 0; that makes an index range scan possible on that column.
Second, when you're optimizing stuff, SELECT * is considered harmful. Instead, if you can give the names of just the columns you need, things will be quicker. The database server has to sling around all the data you ask for; it can't tell if you're going to ignore some of it.
Third, this query will be helped by a compound index on Timecards (DateIn, Status, ID). That compound index can do the heavy lifting of satisfying your query conditions.
That's called a covering index; it contains the data needed to satisfy much of your query. If you were to index just the DateIn column, then the query handler would have to bounce back to the main table to find the values of Status and ID. When those columns appear in the index, it saves that extra operation.
If you SELECT a certain set of columns rather than doing SELECT *, including those columns in the covering index can dramatically improve query performance. That's one of several reasons SELECT * is considered harmful.
(Some makes and models of DBMS, for example SQL Server with the INCLUDE clause of CREATE INDEX, let you specify columns that ride along in an index without actually being indexed. MySQL requires you to index them. But covering indexes still help.)
Read this: http://use-the-index-luke.com/

Optimize performance of MySQL UPDATE query containing EXISTS

Can anybody please give me a hint on how to optimize this MySQL UPDATE query? It takes about a minute to run.
UPDATE store s
SET reservation=1
WHERE EXISTS (
SELECT 1
FROM item i
WHERE s.reservation=0
AND s.status!=9
AND s.id=i.store_id
AND i.store_id!=0
)
I need to update (set reservation=1) all rows in the "store" table (which is very large) that currently have reservation=0 and whose id exists in another table, "item". The "item" table is also large, but not as large as "store".
I'm not an expert on writing efficient queries, so forgive me if this is a completely wrong approach and the whole thing has a simple solution.
Thanks for any ideas.
It looks like some of the predicates in the correlated subquery could be moved to the outer query. For example, I believe this is equivalent:
UPDATE store s
SET s.reservation = 1
WHERE s.reservation = 0
AND s.status != 9
AND s.id != 0
AND EXISTS ( SELECT 1
FROM item i
WHERE i.store_id = s.id
)
For best performance of that, at a minimum, we'd want an index on store that has reservation as the leading column. Also including the status and id columns would mean those conditions could be checked from the index page, without a lookup of the underlying page in the table.
And for that correlated subquery (dependent subquery), we'd want an index on item with store_id as the leading column.
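A sketch that checks the moved predicates really are equivalent, using Python's sqlite3 module as a stand-in for MySQL. This is an assumption: SQLite does not support MySQL's multi-table UPDATE ... JOIN syntax, so only the two EXISTS forms are compared. It also creates the indexes described above; the index names and sample rows are hypothetical.

```python
import sqlite3

SCHEMA = """
    CREATE TABLE store (id INTEGER PRIMARY KEY, reservation INT, status INT);
    CREATE TABLE item (id INTEGER PRIMARY KEY, store_id INT);
    INSERT INTO store VALUES (0, 0, 1), (1, 0, 1), (2, 0, 9), (3, 1, 1), (4, 0, 1);
    INSERT INTO item (store_id) VALUES (0), (1), (1), (2), (3);
"""

# The question's version: every condition lives inside the correlated subquery.
ORIGINAL = """
    UPDATE store SET reservation = 1
    WHERE EXISTS (SELECT 1 FROM item i
                  WHERE store.reservation = 0 AND store.status != 9
                    AND store.id = i.store_id AND i.store_id != 0)
"""

# The answer's version: conditions on store moved to the outer WHERE.
REWRITTEN = """
    UPDATE store SET reservation = 1
    WHERE reservation = 0 AND status != 9 AND id != 0
      AND EXISTS (SELECT 1 FROM item i WHERE i.store_id = store.id)
"""

# Hypothetical indexes matching the advice above.
INDEXES = """
    CREATE INDEX idx_store_reservation ON store (reservation, status, id);
    CREATE INDEX idx_item_store ON item (store_id);
"""


def run(update_sql, extra_ddl=""):
    """Apply one UPDATE to a fresh copy of the sample data and return (id, reservation) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA + extra_ddl)
    conn.execute(update_sql)
    return conn.execute("SELECT id, reservation FROM store ORDER BY id").fetchall()


print(run(ORIGINAL))
print(run(REWRITTEN, INDEXES))
```

Only store 1 is updated in both cases: store 0 is excluded by the != 0 condition, store 2 by its status, store 3 is already reserved, and store 4 has no items.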
As another option, consider re-writing the correlated subquery as a JOIN operation, for example:
UPDATE store s
JOIN item i
ON i.store_id = s.id
SET s.reservation = 1
WHERE s.reservation = 0
AND s.status != 9
AND s.id != 0
If you're running MySQL 5.5 or earlier, you can't get an EXPLAIN on an UPDATE statement. The closest we can get is rewriting the query as a SELECT, and getting an EXPLAIN on that. MySQL 5.6 does support EXPLAIN on an UPDATE statement.
You can try to use:
UPDATE store s INNER JOIN item i ON s.id=i.store_id SET reservation=1 WHERE i.store_id!=0 AND s.reservation=0 AND s.status != 9;
This should work faster because MySQL does not have to search the item table separately for every store row it checks.

MySQL join query performance issue

I am running the query below:
SELECT packages.id, packages.title, subcat.id, packages.weight
FROM packages ,provider, packagestosubcat,
packagestocity, subcat, usertosubcat,
usertocity, usertoprovider
WHERE packages.endDate >'2011-03-11 06:00:00' AND
usertosubcat.userid = 1 AND
usertocity.userid = 1 AND
packages.providerid = provider.id AND
packages.id = packagestosubcat.packageid AND
packages.id = packagestocity.packageid AND
packagestosubcat.subcatid = subcat.id AND
usertosubcat.subcatid = packagestosubcat.subcatid AND
usertocity.cityid = packagestocity.cityid AND
(
provider.providertype = 'reg' OR
(
usertoprovider.userid = 1 AND
provider.providertype != 'reg' AND
usertoprovider.providerid = provider.ID
)
)
GROUP BY packages.title
ORDER BY subcat.id, packages.weight DESC
When I run EXPLAIN, everything seems OK except for the scan on the usertoprovider table, which doesn't seem to use the table's keys:
id  select_type  table             type    possible_keys         key        key_len  ref                        rows  Extra
1   SIMPLE       usertocity        ref     user,city             user       4        const                      4     Using temporary; Using filesort
1   SIMPLE       packagestocity    ref     city,packageid        city       4        usertocity.cityid          419
1   SIMPLE       packages          eq_ref  PRIMARY,enddate       PRIMARY    4        packagestocity.packageid   1     Using where
1   SIMPLE       provider          eq_ref  PRIMARY,providertype  PRIMARY    4        packages.providerid        1     Using where
1   SIMPLE       packagestosubcat  ref     subcatid,packageid    packageid  4        packages.id                1     Using where
1   SIMPLE       subcat            eq_ref  PRIMARY               PRIMARY    4        packagestosubcat.subcatid  1
1   SIMPLE       usertosubcat      ref     userid,subcatid       subcatid   4        const                      12    Using where
1   SIMPLE       usertoprovider    ALL     userid,providerid     NULL       NULL     NULL                       3735  Using where
As you can see in the above query, the condition itself is:
provider.providertype = 'reg' OR
(
usertoprovider.userid = 1 AND
provider.providertype != 'reg' AND
usertoprovider.providerid = provider.ID
)
Both tables, provider and usertoprovider, are indexed: provider has indexes on providerid and providertype, while usertoprovider has indexes on userid and providerid.
The cardinality of the keys is:
provider.id=47, provider.type=1, usertoprovider.userid=1245, usertoprovider.providerid=6
So it's quite obvious that the indexes are not used.
Furthermore, to test it, I went ahead and:
Duplicated the usertoprovider table
Inserted all the provider values that have providertype='reg' into the cloned table
Simplified the condition to (usertoprovider.userid = 1 AND usertoprovider.providerid = provider.ID)
The query execution time changed from 8.1317 sec. to 0.0387 sec.
Still, provider values that have providertype='reg' are valid for all users, and I would like to avoid inserting these values into the usertoprovider table for every user, since this data is redundant.
Can someone please explain why MySQL still runs a full scan and doesn't use the keys? What can be done to avoid it?
It seems that provider.providertype != 'reg' is redundant (it is always true whenever the first branch of the OR is false), unless provider.providertype is nullable and you want rows where it is NULL to be excluded.
Also, <> is the standard SQL inequality operator; MySQL accepts != as well, but <> is more portable.
On the cost of table scans
A full table scan is not necessarily more expensive than walking an index, because walking an index still requires multiple page accesses. In many database engines, if your table is small enough to fit in a few pages and has few enough rows, a table scan will be cheaper. Database engines make this kind of decision based on the data and index statistics of the table.
This case
However, in your case, it might also be because of the other leg in your OR clause: provider.providertype = 'reg'. If providertype is "reg", then this query joins in ALL the rows of usertoprovider (most likely not what you want) since it is a multi-table cross join.
The database engine is correct in determining that you'll likely need all the rows of usertoprovider anyway (unless no provider has providertype "reg", which the engine may also know!).
The query hides this fact because you group the (MASSIVE!) result set afterwards and return only the package ID, so you don't see how many usertoprovider rows were produced. But it will run very slowly. Remove the GROUP BY clause to see how many rows you are actually forcing the database engine to process!
The reason you see a massive speed improvement if you fill out the usertoprovider table is because then every row participates in a join, and there is no full cross join happening in the case of "reg". Before, if you have 1,000 rows in usertoprovider, every row with type="reg" expands the result set 1,000 times. Now, that row joins with only one row in usertoprovider, and the result set is not expanded.
If you really want to include every provider with providertype='reg' without listing it in your many-to-many mapping table, the easiest way may be to use a subquery:
Remove usertoprovider from your FROM clause
Replace the OR condition with the following:
provider.providertype='reg' OR EXISTS (SELECT * FROM usertoprovider WHERE userid=1 AND providerid = provider.ID)
Another method is to use an OUTER JOIN on usertoprovider, with the userid and providerid conditions in the ON clause: any "reg" provider that is not in the table comes back with a single row of NULLs instead of expanding the result set.
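A small sketch of the row explosion and of both fixes, using Python's sqlite3 module with a cut-down version of the schema (only packages, provider and usertoprovider; the sample rows are hypothetical). The original OR condition over the comma join produces one row per usertoprovider row for the "reg" provider, while the EXISTS and LEFT JOIN versions return each package once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE provider (id INTEGER PRIMARY KEY, providertype TEXT);
    CREATE TABLE usertoprovider (userid INT, providerid INT);
    CREATE TABLE packages (id INTEGER PRIMARY KEY, providerid INT);
    INSERT INTO provider VALUES (1, 'reg'), (2, 'vip'), (3, 'vip');
    INSERT INTO packages VALUES (100, 1), (200, 2), (300, 3);
    """
)
# 1,000 mapping rows for other users, plus user 1 mapped to provider 2 only.
conn.executemany("INSERT INTO usertoprovider VALUES (?, ?)",
                 [(u, 3) for u in range(2, 1002)])
conn.execute("INSERT INTO usertoprovider VALUES (1, 2)")

# Original shape: the 'reg' branch matches every usertoprovider row.
cross_rows = conn.execute(
    """
    SELECT COUNT(*) FROM packages, provider, usertoprovider
    WHERE packages.providerid = provider.id
      AND (provider.providertype = 'reg'
           OR (usertoprovider.userid = 1 AND provider.providertype != 'reg'
               AND usertoprovider.providerid = provider.id))
    """
).fetchone()[0]

# Fix 1: usertoprovider moved into an EXISTS subquery.
exists_ids = [r[0] for r in conn.execute(
    """
    SELECT packages.id FROM packages
    JOIN provider ON packages.providerid = provider.id
    WHERE provider.providertype = 'reg'
       OR EXISTS (SELECT 1 FROM usertoprovider
                  WHERE userid = 1 AND providerid = provider.id)
    ORDER BY packages.id
    """
)]

# Fix 2: LEFT JOIN with the user condition in ON; unmatched providers get one NULL row.
left_join_ids = [r[0] for r in conn.execute(
    """
    SELECT packages.id FROM packages
    JOIN provider ON packages.providerid = provider.id
    LEFT JOIN usertoprovider utp
           ON utp.providerid = provider.id AND utp.userid = 1
    WHERE provider.providertype = 'reg' OR utp.providerid IS NOT NULL
    ORDER BY packages.id
    """
)]
print(cross_rows, exists_ids, left_join_ids)  # 1002 [100, 200] [100, 200]
```

Here 1,001 of the 1,002 rows from the original shape come from the single "reg" package, which is exactly the expansion described above.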
Hmm, I know that MySQL does funny things with grouping. Most other RDBMSs would not even execute this query (nor would MySQL itself with ONLY_FULL_GROUP_BY enabled, the default since 5.7.5). What does this even mean:
SELECT packages.id
[...]
GROUP BY packages.title
ORDER BY subcat.id, packages.weight DESC
You want to group by title. In standard SQL, this means you can only select title and aggregate functions of the other columns. MySQL executes the query anyway and, for every non-grouped column, returns an indeterminate value from each group. So what would you expect to be selected as packages.id? The first matching package ID for every title? Or the last? And what would the ORDER BY clause mean with respect to the grouping? How can you order by columns that are not part of the result set (only packages.title really is)?
There are two solutions, as far as I can see:
If you're on the right track with your query, remove the ORDER BY clause: I don't think it affects your result, but it may severely slow down your query.
Otherwise, you have an SQL problem, not a performance problem.

Order a query by two keys in SQL Server 2008

I am trying to order a query by two keys. The query is built from several subqueries. Besides columns with other data, the table contains two columns, Key and Key_Father. I need to order the results in SQL so I can print them in a report. Here is an example:
Key Key_Father
4 NULL
1 4
2 4
7 NULL
1 7
2 7
As you can see, this is a father-son structure: a row is a father if Key_Father is NULL, and the Key column starts from 1 again for the sons of each father.
The first subquery returns the data in order, because that is the order in which it is stored in the table, but the second subquery, which uses a GROUP BY, does not. So I tried adding an extra ROW_NUMBER column in the first subquery to preserve that order, but the second subquery still loses it.
This is the query:
SELECT Orden, INV_Key, Key_Padre, INV.INV_ID, INV.BOD_Bodega_ID,
       CASE WHEN MAX(HIS_Ventas) > 0 OR MAX(HIS_Disponible) > 0 THEN 1 ELSE 0 END AS Participacion,
       MAX(ISNULL(HIS_Ventas, 0)) AS Ventas
FROM (SELECT ROW_NUMBER() OVER (ORDER BY C.INV_Compra_ID) Orden, C.BOD_Bodega_ID, INV_Key, Key_Padre, CD.INV_ID
      FROM dbo.INV_COMPRAS_USADOS C
      INNER JOIN dbo.INV_COMPRAS_USADOS_DET CD ON C.INV_Compra_ID = CD.INV_Compra_ID
      WHERE C.INV_Compra_ID = @Compra_ID
        AND ((Key_Padre IS NULL
              AND CD.INV_Catalogo_Codigo = ISNULL(@Cod_Catalogo, CD.INV_Catalogo_Codigo)
              AND INV_Key IN (SELECT DISTINCT Key_Padre
                              FROM dbo.INV_COMPRAS_USADOS_DET
                              WHERE INV_Compra_ID = @Compra_ID AND Key_Padre IS NOT NULL))
             OR Key_Padre IN (SELECT DISTINCT INV_Key
                              FROM dbo.INV_COMPRAS_USADOS_DET
                              WHERE INV_Compra_ID = @Compra_ID
                                AND (Key_Padre IS NULL
                                     AND CD.INV_Catalogo_Codigo = ISNULL(@Cod_Catalogo, CD.INV_Catalogo_Codigo))))) INV
LEFT JOIN DBO.HIS_HISTORICO_DETALLE HD ON INV.INV_ID = HD.INV_ID AND HD.BOD_Bodega_ID = INV.BOD_Bodega_ID
LEFT JOIN DBO.HIS_HISTORICO_INVENTARIO H ON H.HIS_Historico_ID = HD.HIS_Historico_ID
    AND (CONVERT(datetime, (CONVERT(varchar(20), HIS_Historico_Ano) + '/' + CONVERT(varchar(20), HIS_Historico_Mes) + '/01')) BETWEEN @FechaDesde AND @FechaHasta)
WHERE H.HIS_Historico_Mes IS NOT NULL OR INV.INV_ID IS NULL
GROUP BY Orden, INV_Key, Key_Padre, INV.INV_ID, INV.BOD_Bodega_ID, HIS_Historico_Ano, HIS_Historico_Mes
Another interesting thing (at least for me) is that when I replace the variables with constant values, the second query keeps the correct order, even when the constants are the same values as the variables. This is just a portion of the whole query: it is a subquery used by another two SELECTs, and I need to keep the order from those SELECTs too.
I hope someone can help me with this. Thanks!
To order the results you need to place an ORDER BY clause on the outermost SELECT statement. Using ORDER BY in a nested SELECT is generally not permitted but even if you work around it (e.g. by using TOP), you can't rely on the results being ordered in any particular way.
Without an ORDER BY, the results may appear to come out in the order you want, but this cannot be relied upon. Running the same query on a different server, or at some point in the future, may produce a different order, because differences in statistics, server load, etc. can affect how the query optimizer actually executes the statement. This is most likely also why swapping the variables for constants changed the order: with literal values, the optimizer can pick a different execution plan.
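The portion of the query you've provided outputs the following columns. Which ones do you want to order by?
Orden (as far as ordering is concerned, this just follows INV_Compra_ID; and since the WHERE clause restricts INV_Compra_ID to a single value, the ROW_NUMBER() ties between rows are broken arbitrarily)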
INV_Key
Key_Padre
INV_ID
BOD_Bodega_ID
Participacion
Ventas
Let's say you want to order by just three of them; then you need to append the following clause to the outermost SELECT:
ORDER BY
Orden,
INV_Key,
Key_Padre
This should do it. I'm not sure if I'm missing an obvious simplification though.
ORDER BY ISNULL(Key_Father,[Key]), ISNULL(Key_Father,-1),[Key]
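To see why this ORDER BY works, here is a sketch that runs it on the example data from the question through Python's sqlite3 module. Assumptions: SQLite has no two-argument ISNULL function, so the standard COALESCE is used in its place, the table name t is hypothetical, and all keys are positive so that -1 sorts each father before its sons.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t ("Key" INT, Key_Father INT)')
# Rows inserted out of order on purpose.
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(2, 7), (1, 4), (7, None), (2, 4), (4, None), (1, 7)],
)
rows = conn.execute(
    """
    SELECT "Key", Key_Father FROM t
    ORDER BY COALESCE(Key_Father, "Key"),  -- family: the father's key
             COALESCE(Key_Father, -1),     -- father row (-1) before its sons
             "Key"                         -- sons in key order
    """
).fetchall()
print(rows)  # [(4, None), (1, 4), (2, 4), (7, None), (1, 7), (2, 7)]
```

The first expression groups each father with its sons, the second puts the father first within the group, and the third orders the sons.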