Found Problem:
I was able to significantly reduce the response time to less than 0.01 seconds from 1.1 seconds simply by removing the ORDER BY clause.
I was doing a USORT in PHP on dates anyway since the post-processing would reorder the posts by type of match, so the ORDER BY clause was unnecessary.
The query described by Larry Lustig is very fast for integers. It was ordering by string dates that caused the significant performance problem.
Hope this serves as a good example of how to apply conditions to multiple rows and why you want to watch out for database operations involving strings.
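To illustrate the application-side ordering described above, here is a minimal sketch in Python rather than PHP. The field names `match_rank` and `post_date` and the sample rows are assumptions made up for the example; the real post-processing is not shown in this post. It relies on the sort being stable: sort by date first, then by type of match.

```python
# Hypothetical rows as they might come back from the query without ORDER BY.
posts = [
    {"id": 1, "post_date": "2019-03-01 10:00:00", "match_rank": 2},
    {"id": 2, "post_date": "2020-07-15 08:30:00", "match_rank": 1},
    {"id": 3, "post_date": "2018-11-20 17:45:00", "match_rank": 1},
    {"id": 4, "post_date": "2021-01-05 12:00:00", "match_rank": 2},
]

# Newest first. 'YYYY-MM-DD HH:MM:SS' strings sort chronologically as text.
by_date = sorted(posts, key=lambda p: p["post_date"], reverse=True)

# Python's sort is stable, so posts with the same match_rank keep their date order.
ordered = sorted(by_date, key=lambda p: p["match_rank"])
ordered_ids = [p["id"] for p in ordered]
```

Since the application reorders the rows anyway, the database-side ORDER BY adds cost without changing the final result.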
Original Question (Edited for Clarity):
I'm working on an indexed search with keyword stemming. I'm looking to optimize it more.
Larry Lustig's answer helped me a lot here on how to create the query for easy scaling:
SQL for applying conditions to multiple rows in a join
Any suggestions on how to optimize this example query to run faster? I am guessing there could be a way to do one join for all the "s{x}.post_id = p.ID" conditions.
SELECT p.ID, p.post_title, LEFT ( p.post_content, 800 ), p.post_date
FROM wp_posts AS p
WHERE p.post_status = 'publish' AND ( ( (
( EXISTS ( SELECT s0.post_id FROM wp_rama_search_index AS s0
WHERE s0.post_id = p.ID AND s0.hash = -617801035 ) )
) OR (
( EXISTS ( SELECT s1.post_id FROM wp_rama_search_index AS s1
WHERE s1.post_id = p.ID AND s1.hash = 1805184399 )
AND EXISTS ( SELECT s2.post_id FROM wp_rama_search_index AS s2
WHERE s2.post_id = p.ID AND s2.hash = 1823159221 )
AND EXISTS ( SELECT s3.post_id FROM wp_rama_search_index AS s3
WHERE s3.post_id = p.ID AND s3.hash = 1692658528 ) )
) OR (
( EXISTS ( SELECT s4.post_id FROM wp_rama_search_index AS s4
WHERE s4.post_id = p.ID AND s4.hash = 332583789 ) )
) OR (
( EXISTS ( SELECT s5.post_id FROM wp_rama_search_index AS s5
WHERE s5.post_id = p.ID AND s5.hash = 623525713 ) )
) OR (
( EXISTS ( SELECT s6.post_id FROM wp_rama_search_index AS s6
WHERE s6.post_id = p.ID AND s6.hash = -2064050708 )
AND EXISTS ( SELECT s7.post_id FROM wp_rama_search_index AS s7
WHERE s7.post_id = p.ID AND s7.hash = 1692658528 ) )
) OR (
( EXISTS ( SELECT s8.post_id FROM wp_rama_search_index AS s8
WHERE s8.post_id = p.ID AND s8.hash = 263456517 )
AND EXISTS ( SELECT s9.post_id FROM wp_rama_search_index AS s9
WHERE s9.post_id = p.ID AND s9.hash = -1214274178 ) )
) OR (
( EXISTS ( SELECT s10.post_id FROM wp_rama_search_index AS s10
WHERE s10.post_id = p.ID AND s10.hash = -2064050708 )
AND EXISTS ( SELECT s11.post_id FROM wp_rama_search_index AS s11
WHERE s11.post_id = p.ID AND s11.hash = -1864773421 ) )
) OR (
( EXISTS ( SELECT s12.post_id FROM wp_rama_search_index AS s12
WHERE s12.post_id = p.ID AND s12.hash = -1227797236 ) )
) OR (
( EXISTS ( SELECT s13.post_id FROM wp_rama_search_index AS s13
WHERE s13.post_id = p.ID AND s13.hash = 1823159221 )
AND EXISTS ( SELECT s14.post_id FROM wp_rama_search_index AS s14
WHERE s14.post_id = p.ID AND s14.hash = -1214274178 ) )
) OR (
( EXISTS ( SELECT s15.post_id FROM wp_rama_search_index AS s15
WHERE s15.post_id = p.ID AND s15.hash = 323592937 ) )
) OR (
( EXISTS ( SELECT s16.post_id FROM wp_rama_search_index AS s16
WHERE s16.post_id = p.ID AND s16.hash = 322413837 ) )
) OR (
( EXISTS ( SELECT s17.post_id FROM wp_rama_search_index AS s17
WHERE s17.post_id = p.ID AND s17.hash = 472301092 ) ) ) ) )
ORDER BY p.post_date DESC
This query runs in about 1.1s from phpMyAdmin connecting to a large AWS Aurora database instance.
There are 35k published posts. Words in post titles and excerpts of post content are inflected and hashed using FNV1A32.
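For readers unfamiliar with the hash, here is a minimal sketch of 32-bit FNV-1a in Python. The `to_signed32` helper is an assumption: the negative values in the query above suggest the hashes are stored as signed 32-bit integers, but the actual storage code is not shown here.

```python
FNV32_OFFSET_BASIS = 0x811C9DC5  # standard 32-bit FNV offset basis
FNV32_PRIME = 0x01000193         # standard 32-bit FNV prime


def fnv1a32(data: bytes) -> int:
    """Return the unsigned 32-bit FNV-1a hash of data."""
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h ^= byte                          # FNV-1a: XOR first ...
        h = (h * FNV32_PRIME) & 0xFFFFFFFF  # ... then multiply, keeping 32 bits
    return h


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed INT (assumed storage format)."""
    return value - 0x100000000 if value >= 0x80000000 else value


word_hash = to_signed32(fnv1a32("a".encode("utf-8")))
```

Integer hashes like these compare much faster than strings in the index, which is the point made in the "Found Problem" note.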
The "AND" operations represent phrases where both hashes must be matched. There are several hashes because keyword stemming is being used and where aliases of keywords can also be phrases.
I think this would do the trick for you. It is basically a question of Relational Division, with multiple possible divisors.
I must say, I'm not entirely familiar with MySQL so I may have some slight syntax errors. But I'm sure you will get the idea.
Put the different search conditions in a temporary table, each group having a group number.
We select all rows in our main table, where our temp table has a group for which the total number of hashes in that group is the same as the number of matches on those hashes. In other words, every requested hash in the group matches.
CREATE TABLE #temp (grp int not null, hash int not null, primary key (grp, hash));
SELECT p.ID, p.post_title, LEFT ( p.post_content, 800 ), p.post_date
FROM wp_posts AS p
WHERE p.post_status = 'publish' AND
EXISTS ( SELECT 1
FROM #temp t
LEFT JOIN wp_rama_search_index AS s0
ON s0.hash = t.hash AND s0.post_id = p.ID
GROUP BY grp
HAVING COUNT(*) = COUNT(s0.hash)
);
There are other ways to slice this, for example the following, which may be more efficient:
SELECT DISTINCT p.ID, p.post_title, LEFT ( p.post_content, 800 ), p.post_date
FROM wp_posts AS p
CROSS JOIN #temp t
LEFT JOIN wp_rama_search_index AS s0
ON s0.hash = t.hash AND s0.post_id = p.ID
WHERE p.post_status = 'publish'
GROUP BY p.ID, t.grp -- one row per (post, group); the other columns depend on p.ID
HAVING COUNT(*) = COUNT(s0.hash);
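To check the grouped relational-division idea on a toy dataset, here is a small sketch using Python's built-in sqlite3 module instead of MySQL. The table names `posts`, `search_index` and `terms` and all the data are assumptions for illustration. It groups per (post, group), so a post qualifies as soon as every hash of at least one group matches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE search_index (post_id INTEGER, hash INTEGER, PRIMARY KEY (post_id, hash));
-- One row per requested hash; hashes in the same grp must ALL match (a phrase).
CREATE TABLE terms (grp INTEGER NOT NULL, hash INTEGER NOT NULL, PRIMARY KEY (grp, hash));

INSERT INTO posts VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO search_index VALUES (1, 10), (2, 20), (2, 30), (3, 20);
INSERT INTO terms VALUES (0, 10), (1, 20), (1, 30);
""")

rows = conn.execute("""
SELECT DISTINCT p.id
FROM posts AS p
CROSS JOIN terms AS t
LEFT JOIN search_index AS s ON s.hash = t.hash AND s.post_id = p.id
GROUP BY p.id, t.grp
HAVING COUNT(*) = COUNT(s.hash)   -- no unmatched hash in this group
ORDER BY p.id
""").fetchall()

matched_ids = [r[0] for r in rows]
```

Post 3 has hash 20 but not 30, so group 1 is incomplete for it and it is left out; posts 1 and 2 each satisfy one full group.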
See also these excellent articles on SimpleTalk: Divided We Stand: The SQL of Relational Division and High Performance Relational Division in SQL Server. The basic principles should be the same in MySQL.
I would put a compound index on the temp table if you can; I'm not sure which column order is best.
SELECT s15.post_id FROM wp_rama_search_index AS s15
WHERE s15.post_id = p.ID AND s15.hash = 323592937
What is the real name of s15? That may give us some clues of how to improve the query. Also, what is hash about?
Change to simply SELECT 1 FROM ...; there is no advantage (and maybe a disadvantage) in listing any columns when doing EXISTS.
This composite index, with the columns in either order, should help:
INDEX(post_id, hash)
OR is deadly to optimization. However, in your query, it may be advantageous to sort the EXISTS(...) clauses with the most likely ones first.
This is WordPress? It may help to collect the attributes that you are searching among into a single column (in wp_rama_search_index?), apply a FULLTEXT index to that column, then use MATCH(ft_column) AGAINST ("foo bar baz ..."). This may do most of the ORs without needing hashes and lots of EXISTS. (Again, I am guessing what is going on with s15 and hash.)
Your preprocessing of the words would still be useful for synonyms, but unnecessary for inflections (since FULLTEXT handles that mostly). Phrases can be handled with quotes. That may obviate your current use of AND.
If page-load takes 1.5 seconds aside from the SELECT, I suspect there are other things that need optimizing.
danblock's suggestion about replacing OR with IN works like this:
EXISTS ( SELECT s15.post_id FROM wp_rama_search_index AS s15
WHERE s15.post_id = p.ID AND s15.hash = 323592937
)
OR
EXISTS ( SELECT s16.post_id FROM wp_rama_search_index AS s16
WHERE s16.post_id = p.ID AND s16.hash = 322413837
)
-->
OR EXISTS( SELECT 1 FROM wp_rama_search_index
WHERE post_id = p.ID
AND hash IN ( 323592937, 322413837 )
)
(LIMIT 1 is not needed.) My suggestion of FULLTEXT supersedes this.
Using FULLTEXT will probably eliminate the speed variation due to having (or not) the ORDER BY.
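As a quick sanity check of the OR-to-IN rewrite, here is a sketch on SQLite via Python's sqlite3 module (not MySQL). The table names and rows are made up for the example. It only shows that both forms select the same posts; it says nothing about timing on Aurora.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY);
CREATE TABLE search_index (post_id INTEGER, hash INTEGER, PRIMARY KEY (post_id, hash));
INSERT INTO posts VALUES (1), (2), (3);
INSERT INTO search_index VALUES (1, 323592937), (2, 322413837), (3, 472301092);
""")

# Original shape: one EXISTS per single-hash alternative, chained with OR.
with_or = conn.execute("""
SELECT p.id FROM posts AS p
WHERE EXISTS (SELECT 1 FROM search_index AS s
              WHERE s.post_id = p.id AND s.hash = 323592937)
   OR EXISTS (SELECT 1 FROM search_index AS s
              WHERE s.post_id = p.id AND s.hash = 322413837)
ORDER BY p.id
""").fetchall()

# Rewritten shape: a single EXISTS with the alternatives folded into IN.
with_in = conn.execute("""
SELECT p.id FROM posts AS p
WHERE EXISTS (SELECT 1 FROM search_index AS s
              WHERE s.post_id = p.id AND s.hash IN (323592937, 322413837))
ORDER BY p.id
""").fetchall()
```

Note that this folding only applies to the single-hash alternatives; the AND groups (phrases) still need their own EXISTS clauses.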
Related
I'm not sure how to make the following SQL query more efficient. Right now, the query is taking 8 - 12 seconds on a pretty fast server, but that's not close to fast enough for a Website when users are trying to load a page with this code on it. It's looking through tables with many rows, for instance the "Post" table has 717,873 rows. Basically, the query lists all Posts related to what the user is following (newest to oldest).
Is there a way to make it faster by only getting the last 20 results total based on PostTimeOrder?
Any help would be much appreciated or insight on anything that can be done to improve this situation. Thank you.
Here's the full SQL query (lots of nesting):
SELECT DISTINCT p.Id, UNIX_TIMESTAMP(p.PostCreationTime) AS PostCreationTime, p.Content AS Content, p.Bu AS Bu, p.Se AS Se, UNIX_TIMESTAMP(p.PostCreationTime) AS PostTimeOrder
FROM Post p
WHERE (p.Id IN (SELECT pc.PostId
FROM PostCreator pc
WHERE (pc.UserId IN (SELECT uf.FollowedId
FROM UserFollowing uf
WHERE uf.FollowingId = '100')
OR pc.UserId = '100')
))
OR (p.Id IN (SELECT pum.PostId
FROM PostUserMentions pum
WHERE (pum.UserId IN (SELECT uf.FollowedId
FROM UserFollowing uf
WHERE uf.FollowingId = '100')
OR pum.UserId = '100')
))
OR (p.Id IN (SELECT ssp.PostId
FROM SStreamPost ssp
WHERE (ssp.SStreamId IN (SELECT ssf.SStreamId
FROM SStreamFollowing ssf
WHERE ssf.UserId = '100'))
))
OR (p.Id IN (SELECT psm.PostId
FROM PostSMentions psm
WHERE (psm.StockId IN (SELECT sf.StockId
FROM StockFollowing sf
WHERE sf.UserId = '100' ))
))
UNION ALL
SELECT DISTINCT p.Id AS Id, UNIX_TIMESTAMP(p.PostCreationTime) AS PostCreationTime, p.Content AS Content, p.Bu AS Bu, p.Se AS Se, UNIX_TIMESTAMP(upe.PostEchoTime) AS PostTimeOrder
FROM Post p
INNER JOIN UserPostE upe
on p.Id = upe.PostId
INNER JOIN UserFollowing uf
on (upe.UserId = uf.FollowedId AND (uf.FollowingId = '100' OR upe.UserId = '100'))
ORDER BY PostTimeOrder DESC;
Changing your p.Id IN (...) predicates to existence predicates with correlated subqueries may help. Also, since both halves of your UNION ALL query pull from the Post table and possibly return nearly identical records, you might be able to combine the two into one query by left outer joining to UserPostE and adding upe.PostId IS NOT NULL as an OR condition in the WHERE clause. UserFollowing will still inner join to UserPostE. If you want the same Post record twice, once with upe.PostEchoTime and once with p.PostCreationTime as the PostTimeOrder, you'll need to keep the UNION ALL.
SELECT
DISTINCT -- <<=- May not be needed
p.Id
, UNIX_TIMESTAMP(p.PostCreationTime) AS PostCreationTime
, p.Content AS Content
, p.Bu AS Bu
, p.Se AS Se
, UNIX_TIMESTAMP(coalesce( upe.PostEchoTime
, p.PostCreationTime)) AS PostTimeOrder
FROM Post p
LEFT JOIN UserPostE upe
INNER JOIN UserFollowing uf
on (upe.UserId = uf.FollowedId AND
(uf.FollowingId = '100' OR
upe.UserId = '100'))
on p.Id = upe.PostId
WHERE upe.PostID is not null
or exists (SELECT 1
FROM PostCreator pc
WHERE pc.PostId = p.ID
and ( pc.UserId = '100'
or exists (SELECT 1
FROM UserFollowing uf
WHERE uf.FollowedId = pc.UserID
and uf.FollowingId = '100'))
)
OR exists (SELECT 1
FROM PostUserMentions pum
WHERE pum.PostId = p.ID
and ( pum.UserId = '100'
or exists (SELECT 1
FROM UserFollowing uf
WHERE uf.FollowedId = pum.UserId
and uf.FollowingId = '100'))
)
OR exists (SELECT 1
FROM SStreamPost ssp
WHERE ssp.PostId = p.ID
and exists (SELECT 1
FROM SStreamFollowing ssf
WHERE ssf.SStreamId = ssp.SStreamId
and ssf.UserId = '100')
)
OR exists (SELECT 1
FROM PostSMentions psm
WHERE psm.PostId = p.ID
and exists (SELECT 1
FROM StockFollowing sf
WHERE sf.StockId = psm.StockId
and sf.UserId = '100' )
)
ORDER BY PostTimeOrder DESC
The from section could alternatively be rewritten to also use an existence clause with a correlated sub query:
FROM Post p
LEFT JOIN UserPostE upe
on p.Id = upe.PostId
and ( upe.UserId = '100'
or exists (select 1
from UserFollowing uf
where uf.FollowedId = upe.UserId
and uf.FollowingId = '100'))
Turn IN ( SELECT ... ) into a JOIN .. ON ... (see below)
Turn OR into UNION (see below)
Some of the tables are many:many mappings? Such as SStreamFollowing? Follow the tips in http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
Example of IN:
SELECT ssp.PostId
FROM SStreamPost ssp
WHERE (ssp.SStreamId IN (
SELECT ssf.SStreamId
FROM SStreamFollowing ssf
WHERE ssf.UserId = '100' ))
-->
SELECT ssp.PostId
FROM SStreamPost ssp
JOIN SStreamFollowing ssf ON ssp.SStreamId = ssf.SStreamId
WHERE ssf.UserId = '100'
The big WHERE with all the INs becomes something like
JOIN ( ( SELECT pc.PostId AS id ... )
UNION ( SELECT pum.PostId ... )
UNION ( SELECT ssp.PostId ... )
UNION ( SELECT psm.PostId ... ) )
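Here is a runnable sketch of that JOIN-on-UNION shape, using SQLite through Python's sqlite3 module with only two of the four branches. The table contents are invented for the example and the column types are assumptions; the real schema is not shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Post (Id INTEGER PRIMARY KEY, Content TEXT);
CREATE TABLE PostCreator (PostId INTEGER, UserId INTEGER);
CREATE TABLE SStreamPost (PostId INTEGER, SStreamId INTEGER);
CREATE TABLE SStreamFollowing (UserId INTEGER, SStreamId INTEGER);

INSERT INTO Post VALUES (1, 'own post'), (2, 'stream post'), (3, 'unrelated'), (4, 'both');
INSERT INTO PostCreator VALUES (1, 100), (3, 200), (4, 100);
INSERT INTO SStreamPost VALUES (2, 7), (3, 8), (4, 7);
INSERT INTO SStreamFollowing VALUES (100, 7);
""")

rows = conn.execute("""
SELECT p.Id, p.Content
FROM Post AS p
JOIN ( SELECT pc.PostId AS id            -- branch 1: posts created by user 100
       FROM PostCreator AS pc
       WHERE pc.UserId = 100
       UNION                             -- UNION (not UNION ALL) removes duplicate ids
       SELECT ssp.PostId                 -- branch 2: IN (SELECT ...) turned into a JOIN
       FROM SStreamPost AS ssp
       JOIN SStreamFollowing AS ssf ON ssp.SStreamId = ssf.SStreamId
       WHERE ssf.UserId = 100
     ) AS ids ON ids.id = p.Id
ORDER BY p.Id
""").fetchall()

post_ids = [r[0] for r in rows]
```

Post 4 qualifies through both branches but appears once, because UNION deduplicates the ids before the join to Post.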
Get what you can done from those suggestions, then come back for more advice if you still need it. And bring SHOW CREATE TABLE with you.
A less common case is a one-to-one relation where the second table can hold millions of rows for each row of the first. For example, I have a 'radcliente' table with millions of related 'radacct' rows, but I need to join only the latest acct row. The following examples explain it better:
This is criteria:
$criteria = new CDbCriteria();
$criteria->with = [
'acct', // slow because it will take millions of lines to have only the last
];
$criteria->together = true;
$clientes = Cliente::model()->findAll($criteria);
This is the query generated by Yii (very slow, more than 40 seconds; it returns millions of rows to use only one in AR):
SELECT
`t`.`id` AS `t0_c0`,
-- ...
`t`.`spc_serasa` AS `t0_c56`,
`acct`.`radacctid` AS `t1_c0`,
-- ...
`acct`.`cliente_id` AS `t1_c27`
FROM
`radcliente` `t`
LEFT OUTER JOIN `radacct` `acct` ON (`acct`.`cliente_id`=`t`.`id`)
ORDER BY
radacctid DESC
After applying my solution of limiting the join to one row (this is fast, about 200 ms):
SELECT
`t`.`id` AS `t0_c0`,
-- ...
`t`.`spc_serasa` AS `t0_c56`,
`acct`.`radacctid` AS `t1_c0`,
-- ...
`acct`.`cliente_id` AS `t1_c27`
FROM
`radcliente` `t`
LEFT OUTER JOIN `radacct` `acct` ON (
acct.radacctid = (
SELECT radacctid
FROM `radacct` `acct`
WHERE (acct.cliente_id = t.id)
ORDER BY radacctid DESC
LIMIT 1
)
)
This is the generated query by CActiveDataProvider to total item count with my solution of limit join to one (slow, 10 seconds to count):
SELECT
COUNT(*)
FROM (
SELECT
`t`.`id` AS `t0_c0`,
-- ...
`t`.`spc_serasa` AS `t0_c56`,
`endereco_instalacao`.`id` AS `t1_c0`,
`telefones`.`id` AS `t2_c0`,
`telefones`.`telefone` AS `t2_c3`,
`emails`.`id` AS `t3_c0`,
`emails`.`email` AS `t3_c3`,
`metodo_cobranca`.`id` AS `t4_c0`,
`acct`.`radacctid` AS `t5_c0`,
`acct`.`framedipaddress` AS `t5_c22`
FROM
`radcliente` `t`
LEFT OUTER JOIN `radcliente_endereco_instalacao` `endereco_instalacao` ON (
endereco_instalacao.id = (
SELECT id
FROM `radcliente_endereco_instalacao` `endereco_instalacao`
WHERE (
endereco_instalacao.cliente_id = t.id
)
LIMIT 1
)
)
LEFT OUTER JOIN `radcliente_telefone` `telefones` ON (`telefones`.`cliente_id`=`t`.`id`)
LEFT OUTER JOIN `radcliente_email` `emails` ON (`emails`.`cliente_id`=`t`.`id`)
LEFT OUTER JOIN `radmetodo_cobranca` `metodo_cobranca` ON (
metodo_cobranca.id = (
SELECT id
FROM `radmetodo_cobranca` `metodo_cobranca`
WHERE (metodo_cobranca.cliente_id = t.id)
AND (metodo_cobranca.arquivo = 'nao')
ORDER BY metodo_cobranca.id DESC
LIMIT 1
)
)
LEFT OUTER JOIN `radacct` `acct` ON (
acct.radacctid = (
SELECT radacctid
FROM `radacct` `acct`
WHERE (acct.cliente_id = t.id)
ORDER BY radacctid DESC
LIMIT 1
)
)
GROUP BY t.id
) sq
But the problem is in the count generated by CActiveDataProvider, which takes about 10 seconds to return the result. Is there a way to optimize it without losing the relationship (because I need to filter by a relationship in the future)?
UPDATE
Thank you for your response. I've been doing some tests and noticed that it is slow in all cases; the 'radacct' table exacerbates the problem by its size, which it should not, given the LIMIT 1 in the subquery. The models and a link to access the system follow; if you need to authenticate, the credentials are:
To access:
http://177.86.111.30/dev2/teste
username: help
password: 1
To download models and schema of radcliente and radacct: http://177.86.111.30/files.zip
Instead of ON id = ( SELECT ... LIMIT 1 ) try adding another JOIN (not LEFT JOIN):
JOIN ( SELECT ... LIMIT 1 ) x ON ...
The fear I have with your code is that it will be evaluating that subquery repeatedly, whenever it needs to check the ON clause. My rewrite will cause the subquery to happen only once.
Your query looks like a "correlated" subquery, so you would need to rephrase it to be non-correlated, if possible.
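One possible non-correlated rephrasing is a derived table that computes the latest radacctid per cliente once, then joins back to radacct. The sketch below runs on SQLite via Python's sqlite3 module, not on MySQL through Yii; the `nome` column, the index name and the sample rows are assumptions. It keeps LEFT JOIN rather than the plain JOIN suggested above, purely so that clients without any acct row still appear, as in the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE radcliente (id INTEGER PRIMARY KEY, nome TEXT);
CREATE TABLE radacct (radacctid INTEGER PRIMARY KEY, cliente_id INTEGER);
-- Lets the per-client MAX() be answered from the index.
CREATE INDEX idx_radacct_cliente ON radacct (cliente_id, radacctid);

INSERT INTO radcliente VALUES (1, 'ana'), (2, 'bruno'), (3, 'carla');
INSERT INTO radacct VALUES (10, 1), (11, 1), (12, 2), (13, 1);
""")

rows = conn.execute("""
SELECT t.id, acct.radacctid
FROM radcliente AS t
LEFT JOIN ( SELECT cliente_id, MAX(radacctid) AS last_id   -- evaluated once, not per row
            FROM radacct
            GROUP BY cliente_id ) AS x ON x.cliente_id = t.id
LEFT JOIN radacct AS acct ON acct.radacctid = x.last_id
ORDER BY t.id
""").fetchall()
```

Each client gets at most one acct row, so a COUNT(*) over this result no longer needs the outer GROUP BY t.id to collapse duplicates caused by radacct.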
I have the following query which is actually within a stored procedure, but I removed it as there is too much going on inside the stored procedure. Basically this is the end result which takes ages (more than a minute) to run and I know the reason why - as you will also see from looking at the result of the explain - but I just cannot get it sorted.
Just to quickly explain what this query is doing. It is fetching all products from companies that are "connected" to the company where li.nToObjectID = 37. The result also returns some other information about the other companies like its name, company id, etc.
SELECT DISTINCT
SQL_CALC_FOUND_ROWS
p.id,
p.sTitle,
p.sTeaser,
p.TimeStamp,
p.ExpiryDate,
p.InStoreDate,
p.sCreator,
p.sProductCode,
p.nRetailPrice,
p.nCostPrice,
p.bPublic,
c.id as nCompanyID,
c.sName as sCompany,
m.id as nMID,
m.sFileName as sHighResFileName,
m.nSize,
(
Select sName
FROM tblBrand
WHERE id = p.nBrandID
) as sBrand,
(
Select t.sFileName
FROM tblThumbnail t
where t.nMediaID = m.id AND
t.sType = "thumbnail"
) as sFileName,
(
Select t.nWidth
FROM tblThumbnail t
where t.nMediaID = m.id AND
t.sType = "thumbnail"
) as nWidth,
(
Select t.nHeight
FROM tblThumbnail t
where t.nMediaID = m.id AND
t.sType = "thumbnail"
) as nHeight,
IF (
(
SELECT COUNT(id) FROM tblLink
WHERE
sType = "company"
AND sStatus = "active"
AND nToObjectID = 37
AND nFromObjectID = u.nCompanyID
),
1,
0
) AS bLinked
FROM tblProduct p
INNER JOIN tblMedia m
ON (
m.nTypeID = p.id AND
m.sType = "product"
)
INNER JOIN tblUser u
ON u.id = p.nUserID
INNER JOIN tblCompany c
ON u.nCompanyID = c.id
LEFT JOIN tblLink li
ON (
li.sType = "company"
AND li.sStatus = "active"
AND li.nToObjectID = 37
AND li.nFromObjectID = u.nCompanyID
)
WHERE c.bActive = 1
AND p.bArchive = 0
AND p.bActive = 1
AND NOW() <= p.ExpiryDate
AND (
li.id IS NOT NULL
OR (
li.id IS NULL
AND p.bPublic = 1
)
)
ORDER BY p.TimeStamp DESC
LIMIT 0, 52
Click here to see the output for EXPLAIN. Sorry, just couldn't get the formatting correct.
http://i60.tinypic.com/2hdqjgj.png
And lastly the number of rows for all the tables in this query:
tblProducts
Count: 5392
tblBrand
Count: 194
tblCompany
Count: 368
tblUser
Count: 416
tblMedia
Count: 5724
tblLink
Count: 24800
tblThumbnail
Count: 22207
So I have 2 questions:
1. Is there another way of writing this query which might potentially speed it up?
2. What index combination do I need for tblProducts so that not all the rows are searched through?
UPDATE 1
This is the new query after removing the subqueries and making use of left joins instead:
SELECT DISTINCT
SQL_CALC_FOUND_ROWS
p.id,
p.sTitle,
p.sTeaser,
p.TimeStamp,
p.ExpiryDate,
p.InStoreDate,
p.sCreator,
p.sProductCode,
p.nRetailPrice,
p.nCostPrice,
p.bPublic,
c.id as nCompanyID,
c.sName as sCompany,
m.id as nMID,
m.sFileName as sHighResFileName,
m.nSize,
brand.sName as sBrand,
thumb.sFilename,
thumb.nWidth,
thumb.nHeight,
IF (
(
SELECT COUNT(id) FROM tblLink
WHERE
sType = "company"
AND sStatus = "active"
AND nToObjectID = 37
AND nFromObjectID = u.nCompanyID
),
1,
0
) AS bLinked
FROM tblProduct p
INNER JOIN tblMedia m
ON (
m.nTypeID = p.id AND
m.sType = "product"
)
INNER JOIN tblUser u
ON u.id = p.nUserID
INNER JOIN tblCompany c
ON u.nCompanyID = c.id
LEFT JOIN tblLink li
ON (
li.sType = "company"
AND li.sStatus = "active"
AND li.nToObjectID = 37
AND li.nFromObjectID = u.nCompanyID
)
LEFT JOIN tblBrand AS brand
ON brand.id = p.nBrandID
LEFT JOIN tblThumbnail AS thumb
ON (
thumb.nMediaID = m.id
AND thumb.sType = 'thumbnail'
)
WHERE c.bActive = 1
AND p.bArchive = 0
AND p.bActive = 1
AND NOW() <= p.ExpiryDate
AND (
li.id IS NOT NULL
OR (
li.id IS NULL
AND p.bPublic = 1
)
)
ORDER BY p.TimeStamp DESC
LIMIT 0, 52;
UPDATE 2
ALTER TABLE tblThumbnail ADD INDEX (nMediaID,sType) USING BTREE;
ALTER TABLE tblMedia ADD INDEX (nTypeID,sType) USING BTREE;
ALTER TABLE tblProduct ADD INDEX (bArchive,bActive,ExpiryDate,bPublic,TimeStamp) USING BTREE;
After doing the above changes the explain showed that it is now only searching through 1464 rows on tblProduct instead of 5392.
That's a big query with a lot going on. It's going to take a few steps of work to optimize it. I will take the liberty of just presenting a couple of steps.
First step. Can you get rid of SQL_CALC_FOUND_ROWS and still have your program work correctly? If so, do that. When you specify SQL_CALC_FOUND_ROWS it sometimes means the server has to delay sending you the first row of your resultset until the last row is available.
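If the program does need the total row count, one common alternative is to run the page query with LIMIT and a separate COUNT(*) with the same WHERE clause; the MySQL manual recommends this approach, and SQL_CALC_FOUND_ROWS is deprecated as of MySQL 8.0.17. A minimal sketch with Python's sqlite3 module, using a made-up `product` table rather than the real tblProduct schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, bActive INTEGER)")
# 200 hypothetical products; odd ids are active.
conn.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [(i, 1 if i % 2 else 0) for i in range(1, 201)],
)

# Query 1: only the page being displayed.
page = conn.execute(
    "SELECT id FROM product WHERE bActive = 1 ORDER BY id DESC LIMIT 52 OFFSET 0"
).fetchall()

# Query 2: the total for the pager, same WHERE clause, no ORDER BY or LIMIT.
total = conn.execute(
    "SELECT COUNT(*) FROM product WHERE bActive = 1"
).fetchone()[0]
```

The count query can often skip the joins and columns that only exist for display, which is where the saving comes from.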
Second step. Refactor the dependent subqueries to be JOINs instead.
Here's how you might approach that. Part of your query looks like this...
SELECT DISTINCT SQL_CALC_FOUND_ROWS
p.id,
...
c.id as nCompanyID,
...
m.id as nMID,
...
( /* dependent subquery to be removed */
Select sName
FROM tblBrand
WHERE id = p.nBrandID
) as sBrand,
( /* dependent subquery to be removed */
Select t.sFileName
FROM tblThumbnail t
where t.nMediaID = m.id AND
t.sType = "thumbnail"
) as sFileName,
( /* dependent subquery to be removed */
Select t.nWidth
FROM tblThumbnail t
where t.nMediaID = m.id AND
t.sType = "thumbnail"
) as nWidth,
( /* dependent subquery to be removed */
Select t.nHeight
FROM tblThumbnail t
where t.nMediaID = m.id AND
t.sType = "thumbnail"
) as nHeight,
...
Try this instead. Notice how the brand and thumbnail dependent subqueries disappear. You had three dependent subqueries for the thumbnail; they can disappear into a single JOIN.
SELECT DISTINCT SQL_CALC_FOUND_ROWS
p.id,
...
brand.sName,
thumb.sFilename,
thumb.nWidth,
thumb.nHeight,
...
FROM tblProduct p
INNER JOIN tblMedia AS m ON (m.nTypeID = p.id AND m.sType = 'product')
... (other table joins) ...
LEFT JOIN tblBrand AS brand ON brand.id = p.nBrandID
LEFT JOIN tblThumbnail AS thumb ON (thumb.nMediaID = m.id AND thumb.sType = 'thumbnail')
I used LEFT JOIN rather than INNER JOIN so MySQL will present NULL values if the joined rows are missing.
Edit
You're using a join pattern that looks like this:
JOIN sometable AS s ON (s.someID = m.id AND s.sType = 'string')
You seem to do this for a few tables. You probably can speed up the JOIN operations by creating compound indexes in those tables. For example, try adding the following index to tblThumbnail: (sType, nMediaID). You can do that with this DDL statement.
ALTER TABLE tblThumbnail ADD INDEX (sType, nMediaID) USING BTREE
You can do similar things to other tables with the same join pattern.
I'm using a database that, imho, wasn't designed well, but maybe it's just me not understanding it. Anyways, I have a query that pulls the correct information, but it is really slowing down my php script. I was hoping someone could take a look at this and let me know if nesting queries to this depth is bad, and whether or not there is a way to simplify the query from the relationships depicted in the sql statement below.
SELECT name
FROM groups
WHERE id = (SELECT DISTINCT immediateparentid
FROM cachedgroupmembers
WHERE groupid = (SELECT g.id AS AdminCc
FROM Tickets t, groups g
WHERE t.Id = 124 AND t.id = g.instance AND g.type = 'AdminCc')
AND immediateparentid <> (SELECT g.id AS AdminCc
FROM Tickets t, groups g
WHERE t.Id = 124 AND t.id = g.instance AND g.type = 'AdminCc'))
Please help
Update:
Here is the output from using Explain
You may need to right click and select "View Image" for the text to be clear.
From what I can tell, you can eliminate one sub-select.
SELECT name
FROM groups
WHERE id = (
SELECT DISTINCT immediateparentid
FROM cachedgroupmembers
WHERE groupid = (
SELECT g.id
FROM Tickets t, groups g
WHERE t.Id = 124 AND t.id = g.instance AND g.type = 'AdminCc'
) AND immediateparentid != groupid
)
I'm much more used to PL/SQL on Oracle but I'll give it a try.
Get rid of aliases, you don't need them here.
Make sure columns used in the where clause are indexed (t.Id and g.type).
Don't know if MySQL indexes foreign keys by default but worth the check.
You can shorten your SQL code like this:
SELECT name
FROM groups
WHERE id = (
SELECT DISTINCT immediateparentid
FROM cachedgroupmembers
WHERE groupid = (
SELECT g.id
FROM Tickets t, groups g
WHERE t.Id = 124 AND t.id = g.instance AND g.type = 'AdminCc'
) AND immediateparentid != groupid
)
or:
SELECT name
FROM groups
WHERE id = (
SELECT DISTINCT immediateparentid
FROM cachedgroupmembers
WHERE groupid = (
SELECT g.id
FROM Tickets t inner join groups g on t.id = g.instance
WHERE t.Id = 124 AND g.type = 'AdminCc'
) AND immediateparentid != groupid
)
If your tickets table is big, you may consider a temp table instead of querying it twice.
I have a sql query that is running really slow and I am perplexed as to why. The query is:
SELECT DISTINCT(c.ID),c.* FROM `content` c
LEFT JOIN `content_meta` cm1 ON c.id = cm1.content_id
WHERE 1=1
AND c.site_id IN (14)
AND c.type IN ('a','t')
AND c.status = 'visible'
AND (c.lock = 0 OR c.site_id = 14)
AND c.level = 0
OR
(
( c.site_id = 14
AND cm1.meta_key = 'g_id'
AND cm1.meta_value IN ('12','13','7')
)
OR
( c.status = 'visible'
AND (
(c.type = 'topic' AND c.parent_id IN (628,633,624))
)
)
)
ORDER BY c.date_updated DESC LIMIT 20
The content table has about 1250 rows and the content meta table has about 3000 rows. This isn't a lot of data and I'm not quite sure what may be causing it to run so slow. Any thoughts/opinions would be greatly appreciated.
Thanks!
Is your WHERE clause correct? You are making a series of AND conditions and then adding an OR at the end.
Wouldn't the correct form be something like:
AND (c.lock = 0 OR c.site_id = 14)
AND (
( ... )
OR
( ... )
)
If it is indeed correct, you could think about changing the structure or processing the result in a script or procedure.
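To make the precedence point concrete: in SQL, AND binds tighter than OR, so a trailing OR applies to the whole preceding AND chain. A tiny sketch with Python's sqlite3 module (SQLite and MySQL agree on this precedence); the values a, b and c are stand-ins for the conditions, not real columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# a, b stand for two of the AND-ed conditions (both false here),
# c stands for the trailing OR branch (true here).
a, b, c = 0, 0, 1

unparenthesized = conn.execute("SELECT ? AND ? OR ?", (a, b, c)).fetchone()[0]
and_first = conn.execute("SELECT (? AND ?) OR ?", (a, b, c)).fetchone()[0]
or_grouped = conn.execute("SELECT ? AND (? OR ?)", (a, b, c)).fetchone()[0]
```

Here `unparenthesized` equals `and_first` (both 1) while `or_grouped` is 0, so in the original query a row can pass through the OR branch even when the site_id, type and level conditions fail.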
It might have to do with your "OR" clause at the end. Your up-front conditions can utilize indexes where possible, but then you throw this huge OR condition at the end that can be either one or the other. Not knowing more of the underlying content, I would adjust to have a UNION on the inside so each part can utilize its own indexes, get your qualified IDs, THEN join to the final results.
select
c2.*
from
( select distinct
c.ID
from
`content` c
where
c.site_id in (14)
and c.type in ('a', 't' )
and c.status = 'visible'
and ( c.lock = 0 or c.site_id = 14 )
and c.level = 0
UNION
select
c.ID
from
`content` c
where
c.status = 'visible'
and c.type = 'topic'
and c.parent_id in ( 628, 633, 624 )
UNION
select
c.ID
from
`content` c
join `content_meta` cm1
on c.id = cm1.content_id
AND cm1.meta_key = 'g_id'
AND cm1.meta_value in ( '12', '13', '7' )
where
c.site_id = 14 ) PreQuery
JOIN `content` c2
on PreQuery.ID = c2.ID
order by
c2.date_updated desc
limit
20
I would ensure the content table has an index on ( site_id, type, status ) and another on ( parent_id, type, status ),
and that the meta table has an index on ( content_id, meta_key, meta_value ).