After learning how to do MySQL Full-Text search, the recommended solution for multiple tables was OR MATCH and then do the other database call. You can see that in my query below.
When I do this, it just gets stuck in a "busy" state, and I can't access the MySQL database.
SELECT
a.`product_id`, a.`name`, a.`slug`, a.`description`, b.`list_price`, b.`price`, c.`image`, c.`swatch`, e.`name` AS industry,
MATCH( a.`name`, a.`sku`, a.`description` ) AGAINST ( '%s' IN BOOLEAN MODE ) AS relevance
FROM
`products` AS a LEFT JOIN `website_products` AS b
ON (a.`product_id` = b.`product_id`)
LEFT JOIN ( SELECT `product_id`, `image`, `swatch` FROM `product_images` WHERE `sequence` = 0) AS c
ON (a.`product_id` = c.`product_id`)
LEFT JOIN `brands` AS d
ON (a.`brand_id` = d.`brand_id`)
INNER JOIN `industries` AS e ON (a.`industry_id` = e.`industry_id`)
WHERE
b.`website_id` = %d
AND b.`status` = %d
AND b.`active` = %d
AND MATCH( a.`name`, a.`sku`, a.`description` ) AGAINST ( '%s' IN BOOLEAN MODE )
OR MATCH ( d.`name` ) AGAINST ( '%s' IN BOOLEAN MODE )
GROUP BY a.`product_id`
ORDER BY relevance DESC
LIMIT 0, 9
Any help would be greatly appreciated.
EDIT
All the tables involved are MyISAM, utf8_general_ci.
Here's the EXPLAIN SELECT statement:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY a ALL NULL NULL NULL NULL 16076 Using temporary; Using filesort
1 PRIMARY b ref product_id product_id 4 database.a.product_id 2
1 PRIMARY e eq_ref PRIMARY PRIMARY 4 database.a.industry_id 1
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 23261
1 PRIMARY d eq_ref PRIMARY PRIMARY 4 database.a.brand_id 1 Using where
2 DERIVED product_images ALL NULL NULL NULL NULL 25933 Using where
I don't know how to make that look neater -- sorry about that
UPDATE
it returns the query after 196 seconds (I think correctly). The query without multiple tables takes about .56 seconds (which I know is really slow, we plan on changing to solr or sphinx soon), but 196 seconds??
If we could add a number to the relevance if it was in the brand name ( d.name ), that would also work
I found 2 things slowing down my query drastically and fixed them.
To answer the first problem, it needed parentheses around the entire "MATCH AGAINST OR MATCH AGAINST":
WHERE
b.`website_id` = %d
AND b.`status` = %d
AND b.`active` = %d
AND (
MATCH( a.`name`, a.`sku`, a.`description` ) AGAINST ( '%s' IN BOOLEAN MODE )
OR MATCH ( d.`name` ) AGAINST ( '%s' IN BOOLEAN MODE )
)
I didn't understand how to use EXPLAIN SELECT, but it helped quite a bit, so thank you! This reduced that first number 16076 rows to 143. I then noticed the other two with over 23 and 25 thousand rows. That was cause from this line:
LEFT JOIN ( SELECT `product_id`, `image`, `swatch` FROM `product_images` WHERE `sequence` = 0) AS c
ON (a.`product_id` = c.`product_id`)
There was a reason I was doing this in the first place, which then changed. When I changed it, I didn't realize I could do a normal LEFT JOIN:
LEFT JOIN `product_images` AS c
ON (a.`product_id` = c.`product_id`)
This makes my final query like this: (and MUCH faster went from the 196 seconds to 0.0084 or so)
SELECT
a.`product_id`, a.`name`, a.`slug`, a.`description`, b.`list_price`, b.`price`,
c.`image`, c.`swatch`, e.`name` AS industry,
MATCH( a.`name`, a.`sku`, a.`description` ) AGAINST ( '%s' IN BOOLEAN MODE ) AS relevance
FROM
`products` AS a LEFT JOIN `website_products` AS b
ON (a.`product_id` = b.`product_id`)
LEFT JOIN `product_images` AS c
ON (a.`product_id` = c.`product_id`)
LEFT JOIN `brands` AS d
ON (a.`brand_id` = d.`brand_id`)
INNER JOIN `industries` AS e
ON (a.`industry_id` = e.`industry_id`)
WHERE
b.`website_id` = %d
AND b.`status` = %d
AND b.`active` = %d
AND c.`sequence` = %d
AND (
MATCH( a.`name`, a.`sku`, a.`description` ) AGAINST ( '%s' IN BOOLEAN MODE )
OR MATCH( d.`name` ) AGAINST( '%s' IN BOOLEAN MODE )
)
GROUP BY a.`product_id`
ORDER BY relevance DESC
LIMIT 0, 9
Oh, and even before I was doing a full text search with multiple tables, it was taking about 1/2 a second. This is much improved.
Related
Found Problem:
I was able to significantly reduce the response time to less than 0.01 seconds from 1.1 seconds simply by removing the ORDER BY clause.
I was doing a USORT in PHP on dates anyway since the post-processing would reorder the posts by type of match, so the ORDER BY clause was unnecessary.
The query described by Larry Lustig is very fast for integers. It was the ordering by a string dates which caused significant performance problems.
Hope this servers as a good example of how to apply conditions to multiple rows and why you want to watch out for database operations involving strings.
Original Question (Edited for Clarity):
I'm working on an indexed search with keyword stemming. I'm looking to optimize it more.
Larry Lustig's answer helped me a lot here on how to create the query for easy scaling:
SQL for applying conditions to multiple rows in a join
Any suggestions of how to optimize this example query to run faster? I am guessing there could be a way to do one join for all the "s{x).post_id = p.ID" conditions.
SELECT p.ID, p.post_title, LEFT ( p.post_content, 800 ), p.post_date
FROM wp_posts AS p
WHERE p.post_status = 'publish' AND ( ( (
( EXISTS ( SELECT s0.post_id FROM wp_rama_search_index AS s0
WHERE s0.post_id = p.ID AND s0.hash = -617801035 ) )
) OR (
( EXISTS ( SELECT s1.post_id FROM wp_rama_search_index AS s1
WHERE s1.post_id = p.ID AND s1.hash = 1805184399 )
AND EXISTS ( SELECT s2.post_id FROM wp_rama_search_index AS s2
WHERE s2.post_id = p.ID AND s2.hash = 1823159221 )
AND EXISTS ( SELECT s3.post_id FROM wp_rama_search_index AS s3
WHERE s3.post_id = p.ID AND s3.hash = 1692658528 ) )
) OR (
( EXISTS ( SELECT s4.post_id FROM wp_rama_search_index AS s4
WHERE s4.post_id = p.ID AND s4.hash = 332583789 ) )
) OR (
( EXISTS ( SELECT s5.post_id FROM wp_rama_search_index AS s5
WHERE s5.post_id = p.ID AND s5.hash = 623525713 ) )
) OR (
( EXISTS ( SELECT s6.post_id FROM wp_rama_search_index AS s6
WHERE s6.post_id = p.ID AND s6.hash = -2064050708 )
AND EXISTS ( SELECT s7.post_id FROM wp_rama_search_index AS s7
WHERE s7.post_id = p.ID AND s7.hash = 1692658528 ) )
) OR (
( EXISTS ( SELECT s8.post_id FROM wp_rama_search_index AS s8
WHERE s8.post_id = p.ID AND s8.hash = 263456517 )
AND EXISTS ( SELECT s9.post_id FROM wp_rama_search_index AS s9
WHERE s9.post_id = p.ID AND s9.hash = -1214274178 ) )
) OR (
( EXISTS ( SELECT s10.post_id FROM wp_rama_search_index AS s10
WHERE s10.post_id = p.ID AND s10.hash = -2064050708 )
AND EXISTS ( SELECT s11.post_id FROM wp_rama_search_index AS s11
WHERE s11.post_id = p.ID AND s11.hash = -1864773421 ) )
) OR (
( EXISTS ( SELECT s12.post_id FROM wp_rama_search_index AS s12
WHERE s12.post_id = p.ID AND s12.hash = -1227797236 ) )
) OR (
( EXISTS ( SELECT s13.post_id FROM wp_rama_search_index AS s13
WHERE s13.post_id = p.ID AND s13.hash = 1823159221 )
AND EXISTS ( SELECT s14.post_id FROM wp_rama_search_index AS s14
WHERE s14.post_id = p.ID AND s14.hash = -1214274178 ) )
) OR (
( EXISTS ( SELECT s15.post_id FROM wp_rama_search_index AS s15
WHERE s15.post_id = p.ID AND s15.hash = 323592937 ) )
) OR (
( EXISTS ( SELECT s16.post_id FROM wp_rama_search_index AS s16
WHERE s16.post_id = p.ID AND s16.hash = 322413837 ) )
) OR (
( EXISTS ( SELECT s17.post_id FROM wp_rama_search_index AS s17
WHERE s17.post_id = p.ID AND s17.hash = 472301092 ) ) ) ) )
ORDER BY p.post_date DESC
This query runs in about 1.1s from phpMyAdmin connecting to a large AWS Aurora database instance.
There are 35k published posts. Words in post titles and excerpts of post content are inflected and hashed using FVN1A32.
The "AND" operations represent phrases where both hashes must be matched. There are several hashes because keyword stemming is being used and where aliases of keywords can also be phrases.
I think this would do the trick for you. It is basically a question of Relational Division, with multiple possible divisors.
I must say, I'm not entirely familiar with MySQL so I may have some slight syntax errors. But I'm sure you will get the idea.
Put the different search conditions in a temporary table, each group having a group number.
We select all rows in our main table, where our temp table has a group for which the total number of hashes in that group is the same as the number of matches on those hashes. In other words, every requested hash in the group matches.
CREATE TABLE #temp (grp int not null, hash int not null, primary key (grp, hash));
SELECT p.ID, p.post_title, LEFT ( p.post_content, 800 ), p.post_date
FROM wp_posts AS p
WHERE p.post_status = 'publish' AND
EXISTS ( SELECT 1
FROM #temp t
LEFT JOIN wp_rama_search_index AS s0
ON s0.hash = t.hash AND s0.post_id = p.ID
GROUP BY grp
HAVING COUNT(*) = COUNT(s0.hash)
);
There are other ways to slice this, for example the following, which may be more efficient:
SELECT p.ID, p.post_title, LEFT ( p.post_content, 800 ), p.post_date
FROM wp_posts AS p
CROSS JOIN #temp t
LEFT JOIN wp_rama_search_index AS s0
ON s0.hash = t.hash AND s0.post_id = p.ID
WHERE p.post_status = 'publish'
GROUP BY p.ID -- functional dependency??
HAVING COUNT(*) = COUNT(s0.hash);
See also these excellent articles on SimpleTalk: Divided We Stand: The SQL of Relational Division and High Performance Relational Division in SQL Server. The basic principles should be the same in MySQL.
I would put an compound index on the temp table if you can, not sure which order.
SELECT s15.post_id FROM wp_rama_search_index AS s15
WHERE s15.post_id = p.ID AND s15.hash = 323592937
What is the real name of s15? That may give us some clues of how to improve the query. Also, what is hash about?
Change to simply SELECT 1 FROM ...; there is not an advantage (and maybe a disadvantage) in listing any columns when doing EXISTS).
This composite index, with the columns in either order, should help:
INDEX(post_id, hash)
OR is deadly to optimization. However, in your query, it may be advantageous to sort the EXISTS(...) clauses with the most likely ones first.
This is WordPress? It may be that collecting the attributes that you are searching among into a single column (in wp_rama_search_index?), applying a FULLTEXT index to the column, then using MATCH(ft_column) AGAINST ("foo bar baz ..."). This may do most of the ORs without needing hashes and lots of EXISTS. (Again, I am guessing what is going on with s15andhash`.)
Your preprocessing of the words would still be useful for synonyms, but unnecessary for inflections (since FULLTEXT handles that mostly). Phrases can be handled with quotes. That may obviate your current use of AND.
If page-load takes 1.5 seconds aside from the SELECT, I suspect there are other things that need optimizing.
danblock's suggestion about replacing OR with IN works like this:
EXISTS ( SELECT s15.post_id FROM wp_rama_search_index AS s15
WHERE s15.post_id = p.ID AND s15.hash = 323592937
)
OR
EXISTS ( SELECT s16.post_id FROM wp_rama_search_index AS s16
WHERE s16.post_id = p.ID AND s16.hash = 322413837
)
-->
OR EXISTS( SELECT 1 FROM wp_rama_search_index
WHERE post_id = p.ID
AND hash IN ( 323592937, 322413837 )
(LIMIT 1 is not needed.) My suggestion of FULLTEXT supersedes this.
Using FULLTEXT will probably eliminate the speed variation due to having (or not) the ORDER BY.
This query:
SELECT `subscribers`.`email_address`, `subscribers`.`first_name`, `subscribers`.`last_name`,
GROUP_CONCAT( DISTINCT t1.value SEPARATOR '|' ) AS 'Colors', GROUP_CONCAT( DISTINCT t2.value SEPARATOR '|' ) AS 'Languages'
FROM `subscribers`
LEFT JOIN `subscribers_multivalued` AS `t1` ON subscribers.subscriber_id = t1.subscriber_id AND t1.field_id = 112
LEFT JOIN `subscribers_multivalued` AS `t2` ON subscribers.subscriber_id = t2.subscriber_id AND t2.field_id = 111
WHERE (list_id = 40) AND (state = 1)
GROUP BY `subscribers`.`email_address` , `subscribers`.`first_name` , `subscribers`.`last_name`
With execute plan:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE subscribers ref FK_list_id,state_date_added FK_list_id 4 const 1753610 Using where; Using filesort
1 SIMPLE t1 ref subscriber_fk,field_fk subscriber_fk 4 chad0598_mailablel.subscribers.subscriber_id 1
1 SIMPLE t2 ref subscriber_fk,field_fk subscriber_fk 4 chad0598_mailablel.subscribers.subscriber_id 1
Caused error for uknown reason: General error: 3 Error writing file '/tmp/MYzamaNT' (Errcode: 28) in mysql.
I rewrote it like this groupy by subscriber_id insead:
SELECT `subscribers`.`email_address`, `subscribers`.`first_name`, `subscribers`.`last_name`,
GROUP_CONCAT( DISTINCT t1.value SEPARATOR '|' ) AS 'Colors', GROUP_CONCAT( DISTINCT t2.value SEPARATOR '|' ) AS 'Languages'
FROM `subscribers`
LEFT JOIN `subscribers_multivalued` AS `t1` ON subscribers.subscriber_id = t1.subscriber_id AND t1.field_id = 112
LEFT JOIN `subscribers_multivalued` AS `t2` ON subscribers.subscriber_id = t2.subscriber_id AND t2.field_id = 111
WHERE (list_id = 40) AND (state = 1)
GROUP BY `subscribers`.`subscriber_id`
The plan of this query is better (not filesort):
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE subscribers ref FK_list_id,state_date_added FK_list_id 4 const 1753610 Using where
1 SIMPLE t1 ref subscriber_fk,field_fk subscriber_fk 4 chad0598_mailablel.subscribers.subscriber_id 1
1 SIMPLE t2 ref subscriber_fk,field_fk subscriber_fk 4 chad0598_mailablel.subscribers.subscriber_id 1
It seems that it works in mysql and it much faster that previous query.
Although it works in mysql it's not compliant sql standard there shouldn't be aggregated fields in fields' list that are not present in GROUP BY. I there a way to make the query as fast as the last one and make it sql compliant?
I tried to rewrite it this way too:
SELECT `subscribers`.`email_address`, `subscribers`.`first_name`, `subscribers`.`last_name`,
t1.colors AS 'Colors', t2.languages AS 'Languages'
FROM `subscribers`
LEFT JOIN (SELECT subscriber_id, GROUP_CONCAT( DISTINCT value SEPARATOR '|' ) AS 'colors'
FROM subscribers_multivalued
WHERE field_id = 112
GROUP BY subscriber_id) t1 ON t1.subscriber_id = `subscribers`.`subscriber_id`
LEFT JOIN (SELECT subscriber_id, GROUP_CONCAT( DISTINCT value SEPARATOR '|' ) AS 'languages'
FROM subscribers_multivalued
WHERE field_id = 111
GROUP BY subscriber_id) t2 ON t2.subscriber_id = `subscribers`.`subscriber_id`
The plan of this query is much worse:
1 PRIMARY subscribers ALL NULL NULL NULL NULL 23358546
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 900000
1 PRIMARY <derived3> ALL NULL NULL NULL NULL 900000
3 DERIVED subscribers_multivalued ALL field_fk field_fk 4 20621115 Using filesort
2 DERIVED subscribers_multivalued ALL field_fk field_fk 4 20621115 Using filesort
and I couldn't wait untill it returns data;
I'd keep the faster one. You can also test/benchmark this variation:
SELECT s.email_address,
s.first_name,
s.last_name,
COALESCE(t1.Colors, '') AS Colors,
COALESCE(t2.Languages, '') AS Languages
FROM subscribers AS s
LEFT JOIN
( SELECT t1.subscriber_id,
GROUP_CONCAT( DISTINCT t1.value SEPARATOR '|' ) AS Colors
FROM subscribers AS s
JOIN subscribers_multivalued AS t1
ON s.subscriber_id = t1.subscriber_id
WHERE t1.field_id = 112
AND s.list_id = 40
AND s.state = 1
GROUP BY t1.subscriber_id
) AS t1
ON s.subscriber_id = t1.subscriber_id
LEFT JOIN
( SELECT t2.subscriber_id,
GROUP_CONCAT( DISTINCT t2.value SEPARATOR '|' ) AS Languages
FROM subscribers AS s
JOIN subscribers_multivalued AS t2
ON s.subscriber_id = t2.subscriber_id
WHERE t2.field_id = 111
AND s.list_id = 40
AND s.state = 1
GROUP BY t2.subscriber_id
) AS t2
ON s.subscriber_id = t2.subscriber_id
WHERE s.list_id = 40
AND s.state = 1 ;
If (field_id, subscriber_id, value) is unique in table subscribers_multivalued, you can also drop the two DISTINCT.
Regarding speed and efficiency, check the indexes you have. These would help both your version and this one:
in the subscribers table, an index on either (list_id, state, subscriber_id) or (state, list_id, subscriber_id)
in the subscribers_multivalued table, an index on (field_id, subscriber_id, value) or (second best) on (field_id, subscriber_id).
I have a sql query that is running really slow and I am perplexed as to why. The query is:
SELECT DISTINCT(c.ID),c.* FROM `content` c
LEFT JOIN `content_meta` cm1 ON c.id = cm1.content_id
WHERE 1=1
AND c.site_id IN (14)
AND c.type IN ('a','t')
AND c.status = 'visible'
AND (c.lock = 0 OR c.site_id = 14)
AND c.level = 0
OR
(
( c.site_id = 14
AND cm1.meta_key = 'g_id'
AND cm1.meta_value IN ('12','13','7')
)
OR
( c.status = 'visible'
AND (
(c.type = 'topic' AND c.parent_id IN (628,633,624))
)
)
)
ORDER BY c.date_updated DESC LIMIT 20
The content table has about 1250 rows and the content meta table has about 3000 rows. This isn't a lot of data and I'm not quite sure what may be causing it to run so slow. Any thoughts/opinions would be greatly appreciated.
Thanks!
Is you where clause correct? You are making a series of AND statements and later you are executing a OR.
Wouldn't the correct be something like:
AND (c.lock = 0 OR c.site_id = 14)
AND (
( ... )
OR
( ... )
)
If it is indeed correct, you could think on changing the structure or treating the result in an script or procedure.
It might have to do with your "OR" clause at the end... your up-front are being utilized by the indexes where possible, but then you throw this huge OR condition at the end that can be either one or the other. Not knowing more of the underlying content, I would adjust to have a UNION on the inside so each entity can utilize its own indexes, get you qualified CIDs, THEN join to final results.
select
c2.*
from
( select distinct
c.ID
from
`content` c
where
c.site_id in (14)
and c.type in ('a', 't' )
and c.status = 'visible'
and c.lock in ( 0, 14 )
and c.level = 0
UNION
select
c.ID
from
`content` c
where
c.status = 'visible'
and c.type = 'topic'
and c.parent_id in ( 628, 633, 624 )
UNION
select
c.ID
from
`content` c
join `content_meta` cm1
on c.id = cm1.content_id
AND cm1.meta_key = 'g_id'
AND cm1.meta_value in ( '12', '13', '7' )
where
c.site_id = 14 ) PreQuery
JOIN `content` c2
on PreQuery.cID = c2.cID
order by
c2.date_updated desc
limit
20
I would ensure content table has an index on ( site_id, type, status ) another on (parent_id, type, status)
and the meta table, an index on ( content_id, meta_key, meta_value )
I'm concerned about the performance of the query below once the tables are fully populated. So far it's under development and performs well with dummy data.
The table "adress_zoo" will contain about 500 million records once fully populated. "adress_zoo" table looks like this:
CREATE TABLE `adress_zoo`
( `adress_id` int(11) NOT NULL, `zoo_id` int(11) NOT NULL,
UNIQUE KEY `pk` (`adress_id`,`zoo_id`),
KEY `adress_id` (`adress_id`) )
ENGINE=InnoDB DEFAULT CHARSET=latin1;
The other tables will contain maximum 500 records each.
The full query looks like this:
SELECT a.* FROM jos_zoo_item AS a
JOIN jos_zoo_search_index AS zsi2 ON zsi2.item_id = a.id
WHERE a.id IN (
SELECT r.id FROM (
SELECT zi.id AS id, Max(zi.priority) as prio
FROM jos_zoo_item AS zi
JOIN jos_zoo_search_index AS zsi ON zsi.item_id = zi.id
LEFT JOIN jos_zoo_tag AS zt ON zt.item_id = zi.id
JOIN jos_zoo_category_item AS zci ON zci.item_id = zi.id
**JOIN adress_zoo AS az ON az.zoo_id = zi.id**
WHERE 1=1
AND ( (zci.category_id != 0 AND ( zt.name != 'prolong' OR zt.name is NULL))
OR (zci.category_id = 0 AND zt.name = 'prolong') )
AND zi.type = 'telefoni'
AND zsi.element_id = '44d3b1fd-40f6-4fd7-9444-7e11643e2cef'
AND zsi.value = 'Small'
AND zci.category_id > 15
**AND az.adress_id = 5**
GROUP BY zci.category_id ) AS r
)
AND a.application_id = 6
AND a.access IN (1,1)
AND a.state = 1
AND (a.publish_up = '0000-00-00 00:00:00' OR a.publish_up <= '2012-06-07 07:51:26')
AND (a.publish_down = '0000-00-00 00:00:00' OR a.publish_down >= '2012-06-07 07:51:26')
AND zsi2.element_id = '1c3cd26e-666d-4f8f-a465-b74fffb4cb14'
GROUP BY a.id
ORDER BY zsi2.value ASC
The query will usually return about 25 records.
Based on your experience, will this query perform acceptable (respond within say 3 seconds)?
What can I do to optimise this?
As adviced by #Jack I ran the query with EXPLAIN and got this:
This part is an important limiter:
az.adress_id = 5
MySQL will limit the table to only those records where adress_id matches before joining it with the rest of the statement, so it will depend on how big you think that result set might be.
Btw, you have a UNIQUE(adress_id, zoo_id) and a separate INDEX. Is there a particular reason? Because the first part of a spanning key can be used by MySQL to select with as well.
What's also important is to use EXPLAIN to understand how MySQL will "attack" your query and return the results. See also: http://dev.mysql.com/doc/refman/5.5/en/execution-plan-information.html
To avoid subquery you can try to rewrite your query as:
SELECT a.* FROM jos_zoo_item AS a
JOIN jos_zoo_search_index AS zsi2 ON zsi2.item_id = a.id
INNER JOIN
(
SELECT ** distinct ** r.id FROM (
SELECT zi.id AS id, Max(zi.priority) as prio
FROM jos_zoo_item AS zi
JOIN jos_zoo_search_index AS zsi ON zsi.item_id = zi.id
LEFT JOIN jos_zoo_tag AS zt ON zt.item_id = zi.id
JOIN jos_zoo_category_item AS zci ON zci.item_id = zi.id
**JOIN adress_zoo AS az ON az.zoo_id = zi.id**
WHERE 1=1
AND ( (zci.category_id != 0 AND ( zt.name != 'prolong' OR zt.name is NULL))
OR (zci.category_id = 0 AND zt.name = 'prolong') )
AND zi.type = 'telefoni'
AND zsi.element_id = '44d3b1fd-40f6-4fd7-9444-7e11643e2cef'
AND zsi.value = 'Small'
AND zci.category_id > 15
**AND az.adress_id = 5**
GROUP BY zci.category_id ) AS r
) T
on a.id = T.id
where
AND a.application_id = 6
AND a.access IN (1,1)
AND a.state = 1
AND (a.publish_up = '0000-00-00 00:00:00' OR a.publish_up <= '2012-06-07 07:51:26')
AND (a.publish_down = '0000-00-00 00:00:00' OR a.publish_down >= '2012-06-07 07:51:26')
AND zsi2.element_id = '1c3cd26e-666d-4f8f-a465-b74fffb4cb14'
GROUP BY a.id
ORDER BY zsi2.value ASC
This approach don't perform subquery for each candidate row. Performance may be increased only if T is calculated in few milliseconds.
I have the following tables:
CREATE TABLE `data` (
`date_time` decimal(26,6) NOT NULL,
`channel_id` mediumint(8) unsigned NOT NULL,
`value` varchar(40) DEFAULT NULL,
`status` tinyint(3) unsigned DEFAULT NULL,
`connected` tinyint(1) unsigned NOT NULL,
PRIMARY KEY (`channel_id`,`date_time`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `channels` (
`channel_id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
`channel_name` varchar(40) NOT NULL,
PRIMARY KEY (`channel_id`),
UNIQUE KEY `channel_name` (`channel_name`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
I was wondering if anyone could give me some advice on how to optimize or rewrite the following query:
SELECT channel_name, t0.date_time, t0.value, t0.status, t0.connected, t1.date_time, t1.value, t1.status, t1.connected FROM channels,
(SELECT MAX(date_time) AS date_time, channel_id, value, status, connected FROM data
WHERE date_time <= 1300818330
GROUP BY channel_id) AS t0
RIGHT JOIN
(SELECT MAX(date_time) AS date_time, channel_id, value, status, connected FROM data
WHERE date_time <= 1300818334
GROUP BY channel_id) AS t1
ON t0.channel_id = t1.channel_id
WHERE channels.channel_id = t1.channel_id
Basically I am getting the value, status and connected fields for each channel_name at two different times. Since t0 is always <= t1, the fields could exist for t1, but not t0, and I want that to be shown. That is why I am using the RIGHT JOIN. If it does not exist for t1, then it won't exist for t0, so no row should be returned.
The problem seems to be that since I am joining sub queries, no index can be used? I tried rewriting it to do a self join on the channel_id of the data table first but that is millions of rows.
It would also be nice to be able to add a boolean field to each of the final rows that is true when t0.value = t1.value & t0.status = t1.status & t0.connected = t1.connected.
Thank you very much for your time.
You can reduce the two sub-queries to one
SELECT channel_id,
MAX(date_time) AS t1_date_time,
MAX(case when date_time <= {$p1} then date_time end) AS t0_date_time
FROM data
WHERE date_time <= {$p2}
GROUP BY channel_id
GROUP BY is notoriously misleading in MySQL. Imagine if you had MIN() and MAX() in the same select, which row should the non-grouped columns come from? Once you understand this, you will see why it is not deterministic.
To get the full t0 and t1 rows
SELECT x.channel_id,
t0.date_time, t0.value, t0.status, t0.connected,
t1.date_time, t1.value, t1.status, t1.connected
FROM (
SELECT channel_id,
MAX(date_time) AS t1_date_time,
MAX(case when date_time <= {$p1} then date_time end) AS t0_date_time
FROM data
WHERE date_time <= {$p2}
GROUP BY channel_id
) x
INNER JOIN data t1 on t1.channel_id = x.channel_id and t1.date_time = x.t1_date_time
LEFT JOIN data t0 on t0.channel_id = x.channel_id and t0.date_time = x.t0_date_time
And finally a join to get the channel name
SELECT c.channel_name,
t0.date_time, t0.value, t0.status, t0.connected,
t1.date_time, t1.value, t1.status, t1.connected,
t0.value=t1.value AND t1.status=t0.status
AND t0.connected=t1.connected name_me
FROM (
SELECT channel_id,
MAX(date_time) AS t1_date_time,
MAX(case when date_time <= {$p1} then date_time end) AS t0_date_time
FROM data
WHERE date_time <= {$p2}
GROUP BY channel_id
) x
INNER JOIN channels c on c.channel_id = x.channel_id
INNER JOIN data t1 on t1.channel_id = x.channel_id and t1.date_time = x.t1_date_time
LEFT JOIN data t0 on t0.channel_id = x.channel_id and t0.date_time = x.t0_date_time
EDIT
To perform an RLIKE on channel name, it looks simple enough to add a WHERE clause at the end of the query on c.channel_name. It may however perform better to filter it at the subquery, making use of MySQL feature of processing comma-notation joins left to right.
SELECT x.channel_name,
t0.date_time, t0.value, t0.status, t0.connected,
t1.date_time, t1.value, t1.status, t1.connected,
t0.value=t1.value AND t1.status=t0.status
AND t0.connected=t1.connected name_me
(
SELECT c.channel_id, c.channel_name,
MAX(d.date_time) AS t1_date_time,
MAX(case when d.date_time <= {$p1} then d.date_time end) AS t0_date_time
FROM channels c, data d
WHERE c.channel_name RLIKE {$expr}
AND c.channel_id = d.channel_id
AND d.date_time <= {$p2}
GROUP BY c.channel_id
) x
INNER JOIN data t1 on t1.channel_id = x.channel_id and t1.date_time = x.t1_date_time
LEFT JOIN data t0 on t0.channel_id = x.channel_id and t0.date_time = x.t0_date_time