Rebuild MySQL query to stay below MAX_JOIN_SIZE rows

Rebuild MySQL query to stay below MAX_JOIN_SIZE rows - mysql

I have a SQL query which fails (most of the times) because of too many joined rows. The error provided by MySQL is The SELECT would examine more than MAX_JOIN_SIZE rows; check your WHERE and use SET SQL_BIG_SELECTS=1 or SET MAX_JOIN_SIZE=# if the SELECT is okay. I know I can avoid the error by setting the mentioned variables SQL_BIG_SELECTS and MAX_JOIN_SIZE, but I feel like this isn't the right way and pushes the problem only a bit in the future, because the join count might grow in the future.
The facts: I have an event planning tool which assigns users (=workers) to certain tasks. The tables are users (userid,username) [ID and name], tasks (taskid,task,start,end) [ID, task name, start as timestamp, end as timestamp] and userassignment (id,userid,taskid,deleted) [ID, user assigned to a task, the task, is the assignment still valid).
The exact table definition is like this:
CREATE TABLE users (
userid INT NOT NULL AUTO_INCREMENT,
username VARCHAR(250),
PRIMARY KEY (userid)
);
CREATE TABLE tasks (
taskid INT NOT NULL AUTO_INCREMENT,
task VARCHAR(250),
start INT,
end INT,
PRIMARY KEY (taskid),
INDEX USING BTREE (start),
INDEX USING BTREE (end)
);
CREATE TABLE userassignment (
id INT NOT NULL AUTO_INCREMENT,
userid INT,
taskid INT,
deleted TINYINT,
PRIMARY KEY (id),
INDEX USING BTREE (userid),
INDEX USING BTREE (userid),
UNIQUE KEY `usertasks` ( `userid` , `taskid` )
);
I need to know, which users are assigned and on which main days of the event (day 1, day 2, day 3) they're assigned.
My query looks like this:
SELECT
u.userid,
u.username,
COUNT(ua.id) AS count_all,
dayone.c AS count_one,
daytwo.c AS count_two,
daythree.c AS count_three
FROM
users AS u
INNER JOIN
userassignment AS ua ON ua.userid = u.userid AND ua.deleted = 0
INNER JOIN
tasks AS t ON ua.taskid = t.taskid
LEFT JOIN (
SELECT
u.userid,
COUNT(ua.id) AS c
FROM
users AS u
INNER JOIN
userassignment AS ua ON
ua.userid = u.userid AND
ua.deleted = 0
INNER JOIN
tasks AS t ON
ua.taskid = t.taskid
WHERE
t.start > UNIX_TIMESTAMP("2014-08-01 00:00:00") AND
t.start < UNIX_TIMESTAMP("2014-08-02 00:00:00")
GROUP BY
u.userid
) AS dayone ON dayone.userid = u.userid
LEFT JOIN (
SELECT
u.userid,
COUNT(ua.id) AS c
FROM
users AS u
INNER JOIN
userassignment AS ua ON
ua.userid = u.userid AND
ua.deleted = 0
INNER JOIN
tasks AS t ON
ua.taskid = t.taskid
WHERE
t.start > UNIX_TIMESTAMP("2014-07-31 00:00:00") AND
t.start < UNIX_TIMESTAMP("2014-08-01 00:00:00")
GROUP BY
u.userid
) AS daytwo ON daytwo.userid = u.userid
LEFT JOIN (
SELECT
u.userid,
COUNT(ua.id) AS c
FROM
users AS u
INNER JOIN
userassignment AS ua ON
ua.userid = u.userid AND
ua.deleted = 0
INNER JOIN
tasks AS t ON
ua.taskid = t.taskid
WHERE
t.start > UNIX_TIMESTAMP("2014-08-02 00:00:00") AND
t.start < UNIX_TIMESTAMP("2014-08-04 00:00:00")
GROUP BY
u.userid
) AS daythree ON daythree.userid = u.userid
WHERE
t.start > UNIX_TIMESTAMP("2014-07-31 00:00:00") AND
t.start < UNIX_TIMESTAMP("2014-08-04 00:00:00")
GROUP BY
u.userid
ORDER BY
username ASC
First I select all users which have an assignment in one of the three days (there are about six time more users in the DB than assigned to a task), then I left join the assigned users of every of the three days.
So, is there a way to rebuild the query to join fewer rows? I only need to know, who is assigned on which day, not the number of assignments.
I already tried to UNION several queries but this was unsuccessful.
SQL Fiddle
An EXPLAIN of the real query (not in the SQL Fiddle) is:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY t range PRIMARY,start start 5 NULL 120 100.00 Using where; Using index; Using temporary; Using filesort
1 PRIMARY ua ref usertasks,userid,taskid taskid 2 db1154575-helfer.t.id 2 100.00 Using where
1 PRIMARY u eq_ref userid userid 2 db1154575-helfer.ua.userid 1 100.00
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 152 100.00
1 PRIMARY <derived3> ALL NULL NULL NULL NULL 94 100.00
1 PRIMARY <derived4> ALL NULL NULL NULL NULL 147 100.00
4 DERIVED t range PRIMARY,start start 5 NULL 53 100.00 Using where; Using index; Using temporary; Using filesort
4 DERIVED ua ref usertasks,userid,taskid taskid 2 db1154575-helfer.t.id 2 100.00 Using where
4 DERIVED u eq_ref userid userid 2 db1154575-helfer.ua.userid 1 100.00 Using index
3 DERIVED t range PRIMARY,start start 5 NULL 21 100.00 Using where; Using index; Using temporary; Using filesort
3 DERIVED ua ref usertasks,userid,taskid taskid 2 db1154575-helfer.t.id 2 100.00 Using where
3 DERIVED u eq_ref userid userid 2 db1154575-helfer.ua.userid 1 100.00 Using index
2 DERIVED t range PRIMARY,start start 5 NULL 44 100.00 Using where; Using index; Using temporary; Using filesort
2 DERIVED ua ref usertasks,userid,taskid taskid 2 db1154575-helfer.t.id 2 100.00 Using where
2 DERIVED u eq_ref userid userid 2 db1154575-helfer.ua.userid 1 100.00 Using index

So, is all that really just a long-winded way of saying this...
SELECT u.*
, DATE(FROM_UNIXTIME(t.start)) dt
, COUNT(t.taskid) total
FROM users u
LEFT
JOIN userassignment ut
ON ut.userid = u.userid
AND ut.deleted = 0
LEFT
JOIN tasks t
ON t.taskid = ut.taskid
GROUP
BY u.userid
, DATE(FROM_UNIXTIME(t.start))
In the example above, you can change COUNT(t.taskid) to COUNT(CASE WHEN x = 'y' THEN z END) or SUM(CASE...

This should return the same result set:
SELECT u.userid, u.username,
COUNT(ua.id) AS count_all,
SUM(case when t.start > UNIX_TIMESTAMP('2014-08-01 00:00:00') AND
t.start < UNIX_TIMESTAMP('2014-08-02 00:00:00')
then 1 else 0
end) as count_one,
SUM(case when t.start > UNIX_TIMESTAMP('2014-07-31 00:00:00') AND
t.start < UNIX_TIMESTAMP('2014-08-01 00:00:00')
then 1 else 0
end) as count_two,
SUM(case when t.start > UNIX_TIMESTAMP('2014-08-02 00:00:00') AND
t.start < UNIX_TIMESTAMP('2014-08-04 00:00:00')
then 1 else 0
end) as count_three
FROM users u LEFT JOIN
userassignment ua
ON ua.userid = u.userid AND
ua.deleted = 0 LEFT JOIN
tasks t
ON ua.taskid = t.taskid
WHERE ua.deleted = 0 AND
t.start > UNIX_TIMESTAMP('2014-07-31 00:00:00') AND
t.start < UNIX_TIMESTAMP('2014-08-04 00:00:00')
GROUP BY u.userid
ORDER BY u.username;
Your formulation is a bit tricky. The outer joins are filter out any user whose assignments are always deleted, for instance. And the date periods are overlapping (I'm not sure if that is intentional, but it is how the query is structured).
Perhaps this simpler query will not exceed internal limits.

Related

Optimizing a Join

I have a join that looks like this:
LEFT JOIN abc__emails.my208 my3
ON my3.eid = my.eid
AND (
my3.my_id = 417
AND my3.data = 'AB Deleted Account'
)
This query is bordeline runnable. I'm trying to save every scrap of memory. My question then:
LEFT JOIN abc__emails.my208 my3
ON my3.eid = my.eid
AND my3.data = 'AB Deleted Account'
I removed the my3.my_id = 417 part.
Records where my_id = 417 include 'AB Deleted Account' but also a couple of other strings. I included it because I'm thinking I may help SQL narrow it down. Is that wrong?
I really just want the join to be on records where my3.data = 'AB Deleted Account'.
Is it best to add on as many "filters" as possible or to remove as many as possible?
eid is already the key on both tables.
Following feedback here is the output of EXPLAIN
1 SIMPLE co ALL 198784 Using temporary; Using filesort
1 SIMPLE my eq_ref PRIMARY PRIMARY 8 abc_emails.co.id,const 1
1 SIMPLE my1 eq_ref PRIMARY PRIMARY 8 abc_emails.my.eid,const 1
1 SIMPLE my2 eq_ref PRIMARY PRIMARY 8 abc_emails.my.eid,const 1
1 SIMPLE my3 ref ver_eid ver_eid 4 abc_emails.my.eid 246
1 SIMPLE i ref eid_iid eid_iid 10 abc_emails.my.eid,const 1 Using where; Using index
1 SIMPLE nvk eq_ref PRIMARY PRIMARY 4 abc_emails.co.id 1
And here is the query itself
SELECT
co.id AS ID_Eid,
co.email,
def.def_medium AS def_Medium_def208,
co.created AS Contact_Create_Date,
my1.data AS PAW_ID,
my.data AS Trial_Date,
my2.data AS Upgrade_Date,
MIN(my3.lastmod) AS Cancel_Date
FROM
abc_emails.cid208 co
LEFT JOIN abc_emails.my208 my ON my.eid = co.id AND my.my_id = 581
LEFT JOIN abc_emails.my208 my1 ON my1.eid = my.eid AND my1.my_id = 2765
LEFT JOIN abc_emails.my208 my2 ON my2.eid = my.eid AND my2.my_id = 3347
LEFT JOIN abc__emails.my208 my3 ON my3.eid = my.eid AND (my3.my_id = 417 AND my3.data = 'AB Deleted Account')
LEFT JOIN abc_emails.i208 i ON i.eid = my.eid AND i.iid = 22467
LEFT JOIN abc_emails.def208 def ON def.eid = co.id
WHERE i.iid IS NULL
GROUP BY ID_Eid, email, def_Medium_def208, Contact_Create_Date, PAW_ID, Upgrade_Date

Avoid full table scan in inner select

I have the following select in MySQL, which produces the right results but it takes unnecessarily long to execute:
SELECT tblGPSDevices.Email, tblLoc.Lat, tblLoc.Lon, tblLoc.Radius, tblLoc.CreationTimeStamp, tblTrackedUsers.ID, tblTrackedUsers.TrackerDeviceID, tblTrackedUsers.TrackedDeviceID
FROM tblTrackedUsers
INNER JOIN tblGPSDevices ON tblTrackedUsers.TrackedDeviceID = tblGPSDevices.ID
LEFT OUTER JOIN (
SELECT A.DeviceID, A.Lat, A.Lon, A.Radius, A.CreationTimeStamp, A.ID
FROM tblLocations A
INNER JOIN (
SELECT DeviceID, MAX(CreationTimeStamp) AS CreationTimeStamp, MAX(ID) AS ID
FROM tblLocations
GROUP BY DeviceID
) AS B ON A.DeviceID = B.DeviceID
AND A.CreationTimeStamp = B.CreationTimeStamp
AND A.ID = B.ID
) AS tblLoc ON tblLoc.DeviceID = tblGPSDevices.ID
WHERE tblGPSDevices.Validated = 0x01
AND tblGPSDevices.Enabled = 0x01
AND tblTrackedUsers.Validated = 0x01
AND tblTrackedUsers.TrackerDeviceID = 1
ORDER BY tblTrackedUsers.ID;
This query runs much slower than it should because it does a full table scan on tblLocations.
This is the part that really slows down the query:
SELECT A.DeviceID, A.Lat, A.Lon, A.Radius, A.CreationTimeStamp, A.ID
FROM tblLocations A
INNER JOIN (
SELECT DeviceID, MAX(CreationTimeStamp) AS CreationTimeStamp, MAX(ID) AS ID
FROM tblLocations
GROUP BY DeviceID
) AS B ON A.DeviceID = B.DeviceID
AND A.CreationTimeStamp = B.CreationTimeStamp
AND A.ID = B.ID
Here is the explain plan:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY tblTrackedUsers ref TrackerDeviceID,TrackedDeviceID TrackerDeviceID 9 const 14 Using where; Using temporary; Using filesort
1 PRIMARY tblGPSDevices eq_ref PRIMARY PRIMARY 8 tblTrackedUsers.TrackedDeviceID 1 Using where
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 2073
2 DERIVED <derived3> ALL NULL NULL NULL NULL 2073
2 DERIVED A eq_ref PRIMARY,DeviceID,CreationTimeStampIndex PRIMARY 8 B.ID 1 Using where
3 DERIVED tblLocations index NULL DeviceID 8 NULL 174058
It does a full table scan on tblLocations, even though I only need s small subset of DeviceID's in that table.
I just need to look at the DeviceID's that are returned from this part:
INNER JOIN tblGPSDevices ON tblTrackedUsers.TrackedDeviceID = tblGPSDevices.ID
WHERE tblTrackedUsers.TrackerDeviceID = 1
But unfortunately tblTrackedUsers.TrackedDeviceID is not visible in the inner select. So if I add
WHERE DeviceID = tblTrackedUsers.TrackedDeviceID
right above
GROUP BY DeviceID
It does not work.
How can I go about optimizing this query?
Structure of the tables involved with the relevant fields only:
tblGPSDevices:
ID | Email | Validated | Enabled
tblLocations:
ID | DeviceID | Lat | Lon | Radius | CreationTimeStamp
tblTrackedUsers:
ID | TrackerDeviceID | TrackedDeviceID | Validated
tblLocations.DeviceID, tblTrackedUsers.TrackerDeviceID and tblTrackedUsers.TrackedDeviceID are foreign keys pointing to tblGPSDevices.ID
What this query does:
The query should return all devices from tblGPSDevices that are being tracked by the user and their last location from tblLocations. The way to determine which devices are being tracked by a user is determined by tblTrackedUsers: select TrackedDeviceID from tblTrackedUsers where TrackerDeviceID = some_value

I did find the answer myself and I will post for future reference. This sped up the query from 2 sec. to 0.0179 sec. That is an enormous gain.
The key was to add one more inner select within:
SELECT DeviceID, MAX(CreationTimeStamp) AS CreationTimeStamp, MAX(ID) AS ID
FROM tblLocations
GROUP BY DeviceID
in order to avoid a full table scan, since we are only interested in DeviceID's that are = to tblTrackedUsers.TrackedDeviceID. Now this select looks like this:
SELECT C.DeviceID, MAX(C.CreationTimeStamp) AS CreationTimeStamp, MAX(C.ID) AS ID
FROM tblLocations C
INNER JOIN (
SELECT ID, TrackerDeviceID, TrackedDeviceID, TrackedName, AccessCode, Validated FROM tblAllowedUsers WHERE TrackerDeviceID = 1 AND Validated=0x01
) AS D ON D.TrackedDeviceID = C.DeviceID
GROUP BY DeviceID
Here is the full select now:
SELECT tblGPSDevices.Email, tblLoc.Lat, tblLoc.Lon, tblLoc.Radius, tblLoc.CreationTimeStamp, tblTrackedUsers.ID, tblTrackedUsers.TrackerDeviceID, tblTrackedUsers.TrackedDeviceID
FROM tblTrackedUsers
INNER JOIN tblGPSDevices ON tblTrackedUsers.TrackedDeviceID = tblGPSDevices.ID
LEFT OUTER JOIN (
SELECT A.DeviceID, A.Lat, A.Lon, A.Radius, A.CreationTimeStamp, A.ID
FROM tblLocations A
INNER JOIN (
SELECT C.DeviceID, MAX(C.CreationTimeStamp) AS CreationTimeStamp, MAX(C.ID) AS ID
FROM tblLocations C
INNER JOIN (
SELECT ID, TrackerDeviceID, TrackedDeviceID, TrackedName, AccessCode, Validated FROM tblAllowedUsers WHERE TrackerDeviceID = 1 AND Validated=0x01
) AS D ON D.TrackedDeviceID = C.DeviceID
GROUP BY DeviceID
) AS B ON A.DeviceID = B.DeviceID
AND A.CreationTimeStamp = B.CreationTimeStamp
AND A.ID = B.ID
) AS tblLoc ON tblLoc.DeviceID = tblGPSDevices.ID
WHERE tblGPSDevices.Validated = 0x01
AND tblGPSDevices.Enabled = 0x01
AND tblTrackedUsers.Validated = 0x01
AND tblTrackedUsers.TrackerDeviceID = 1
ORDER BY tblTrackedUsers.ID;

OR in left join

billboards table 140000 rows, regions 1000 rows.
SELECT
r.id,
SUM(IF(bb.r1_id = r.id, 1, 0)) AS count,
SUM(IF(bb.r2_id = r.id, 1, 0)) AS count2
FROM
tmp_regions AS r
LEFT JOIN
tmp_billboards AS bb
ON (r.id = bb.r1_id OR r.id = bb.r2_id)
WHERE
bb.deleted = 0
AND
bb.x != 0
AND
bb.y != 0
GROUP BY r.id
ORDER BY r.capital DESC , r.other , r.name
execution time is 8 sec
Explain
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE bb ref bb_r,bb_deleted,bb_x,bb_y,deleted_x_y,bb_r2 bb_deleted 1 const 66396 Using where; Using temporary; Using filesort
1 SIMPLE r ALL PRIMARY NULL NULL NULL 1000 Using where; Using join buffer
how can i change OR in join to improve perfomance?

Add indexes. The output of explain shows you which fields need them.

Assuming that tmp_regions (id) is the primary key, you could rewrite the query and the OR is converted to 2 joins:
SELECT
r.id,
COALESCE(bb1.cnt, 0) AS count,
COALESCE(bb2.cnt, 0) AS count2
FROM
tmp_regions AS r
LEFT JOIN
( SELECT r1_id, COUNT(*) AS cnt
FROM tmp_billboards
WHERE deleted = 0
AND x <> 0
AND y <> 0
GROUP BY r1_id
) AS bb1
ON r.id = bb1.r1_id
LEFT JOIN
( SELECT r2_id, COUNT(*) AS cnt
FROM tmp_billboards
WHERE deleted = 0
AND x <> 0
AND y <> 0
GROUP BY r2_id
) AS bb2
ON r.id = bb2.r2_id
ORDER BY r.capital DESC , r.other , r.name ;
For efficiency, indexes on (deleted, r1_id, x, y) and (deleted, r2_id, x, y) would help to avoid table scans on the tmp_billboards.

how to rewrite the query to make it faster and sql compliant

This query:
SELECT `subscribers`.`email_address`, `subscribers`.`first_name`, `subscribers`.`last_name`,
GROUP_CONCAT( DISTINCT t1.value SEPARATOR '|' ) AS 'Colors', GROUP_CONCAT( DISTINCT t2.value SEPARATOR '|' ) AS 'Languages'
FROM `subscribers`
LEFT JOIN `subscribers_multivalued` AS `t1` ON subscribers.subscriber_id = t1.subscriber_id AND t1.field_id = 112
LEFT JOIN `subscribers_multivalued` AS `t2` ON subscribers.subscriber_id = t2.subscriber_id AND t2.field_id = 111
WHERE (list_id = 40) AND (state = 1)
GROUP BY `subscribers`.`email_address` , `subscribers`.`first_name` , `subscribers`.`last_name`
With execute plan:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE subscribers ref FK_list_id,state_date_added FK_list_id 4 const 1753610 Using where; Using filesort
1 SIMPLE t1 ref subscriber_fk,field_fk subscriber_fk 4 chad0598_mailablel.subscribers.subscriber_id 1
1 SIMPLE t2 ref subscriber_fk,field_fk subscriber_fk 4 chad0598_mailablel.subscribers.subscriber_id 1
Caused error for uknown reason: General error: 3 Error writing file '/tmp/MYzamaNT' (Errcode: 28) in mysql.
I rewrote it like this groupy by subscriber_id insead:
SELECT `subscribers`.`email_address`, `subscribers`.`first_name`, `subscribers`.`last_name`,
GROUP_CONCAT( DISTINCT t1.value SEPARATOR '|' ) AS 'Colors', GROUP_CONCAT( DISTINCT t2.value SEPARATOR '|' ) AS 'Languages'
FROM `subscribers`
LEFT JOIN `subscribers_multivalued` AS `t1` ON subscribers.subscriber_id = t1.subscriber_id AND t1.field_id = 112
LEFT JOIN `subscribers_multivalued` AS `t2` ON subscribers.subscriber_id = t2.subscriber_id AND t2.field_id = 111
WHERE (list_id = 40) AND (state = 1)
GROUP BY `subscribers`.`subscriber_id`
The plan of this query is better (not filesort):
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE subscribers ref FK_list_id,state_date_added FK_list_id 4 const 1753610 Using where
1 SIMPLE t1 ref subscriber_fk,field_fk subscriber_fk 4 chad0598_mailablel.subscribers.subscriber_id 1
1 SIMPLE t2 ref subscriber_fk,field_fk subscriber_fk 4 chad0598_mailablel.subscribers.subscriber_id 1
It seems that it works in mysql and it much faster that previous query.
Although it works in mysql it's not compliant sql standard there shouldn't be aggregated fields in fields' list that are not present in GROUP BY. I there a way to make the query as fast as the last one and make it sql compliant?
I tried to rewrite it this way too:
SELECT `subscribers`.`email_address`, `subscribers`.`first_name`, `subscribers`.`last_name`,
t1.colors AS 'Colors', t2.languages AS 'Languages'
FROM `subscribers`
LEFT JOIN (SELECT subscriber_id, GROUP_CONCAT( DISTINCT value SEPARATOR '|' ) AS 'colors'
FROM subscribers_multivalued
WHERE field_id = 112
GROUP BY subscriber_id) t1 ON t1.subscriber_id = `subscribers`.`subscriber_id`
LEFT JOIN (SELECT subscriber_id, GROUP_CONCAT( DISTINCT value SEPARATOR '|' ) AS 'languages'
FROM subscribers_multivalued
WHERE field_id = 111
GROUP BY subscriber_id) t2 ON t2.subscriber_id = `subscribers`.`subscriber_id`
The plan of this query is much worse:
1 PRIMARY subscribers ALL NULL NULL NULL NULL 23358546
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 900000
1 PRIMARY <derived3> ALL NULL NULL NULL NULL 900000
3 DERIVED subscribers_multivalued ALL field_fk field_fk 4 20621115 Using filesort
2 DERIVED subscribers_multivalued ALL field_fk field_fk 4 20621115 Using filesort
and I couldn't wait untill it returns data;

I'd keep the faster one. You can also test/benchmark this variation:
SELECT s.email_address,
s.first_name,
s.last_name,
COALESCE(t1.Colors, '') AS Colors,
COALESCE(t2.Languages, '') AS Languages
FROM subscribers AS s
LEFT JOIN
( SELECT t1.subscriber_id,
GROUP_CONCAT( DISTINCT t1.value SEPARATOR '|' ) AS Colors
FROM subscribers AS s
JOIN subscribers_multivalued AS t1
ON s.subscriber_id = t1.subscriber_id
WHERE t1.field_id = 112
AND s.list_id = 40
AND s.state = 1
GROUP BY t1.subscriber_id
) AS t1
ON s.subscriber_id = t1.subscriber_id
LEFT JOIN
( SELECT t2.subscriber_id,
GROUP_CONCAT( DISTINCT t2.value SEPARATOR '|' ) AS Languages
FROM subscribers AS s
JOIN subscribers_multivalued AS t2
ON s.subscriber_id = t2.subscriber_id
WHERE t2.field_id = 111
AND s.list_id = 40
AND s.state = 1
GROUP BY t2.subscriber_id
) AS t2
ON s.subscriber_id = t2.subscriber_id
WHERE s.list_id = 40
AND s.state = 1 ;
If (field_id, subscriber_id, value) is unique in table subscribers_multivalued, you can also drop the two DISTINCT.
Regarding speed and efficiency, check the indexes you have. These would help both your version and this one:
in the subscribers table, an index on either (list_id, state, subscriber_id) or (state, list_id, subscriber_id)
in the subscribers_multivalued table, an index on (field_id, subscriber_id, value) or (second best) on (field_id, subscriber_id).

MySQL "OR MATCH" hangs (very slow) on multiple tables

After learning how to do MySQL Full-Text search, the recommended solution for multiple tables was OR MATCH and then do the other database call. You can see that in my query below.
When I do this, it just gets stuck in a "busy" state, and I can't access the MySQL database.
SELECT
a.`product_id`, a.`name`, a.`slug`, a.`description`, b.`list_price`, b.`price`, c.`image`, c.`swatch`, e.`name` AS industry,
MATCH( a.`name`, a.`sku`, a.`description` ) AGAINST ( '%s' IN BOOLEAN MODE ) AS relevance
FROM
`products` AS a LEFT JOIN `website_products` AS b
ON (a.`product_id` = b.`product_id`)
LEFT JOIN ( SELECT `product_id`, `image`, `swatch` FROM `product_images` WHERE `sequence` = 0) AS c
ON (a.`product_id` = c.`product_id`)
LEFT JOIN `brands` AS d
ON (a.`brand_id` = d.`brand_id`)
INNER JOIN `industries` AS e ON (a.`industry_id` = e.`industry_id`)
WHERE
b.`website_id` = %d
AND b.`status` = %d
AND b.`active` = %d
AND MATCH( a.`name`, a.`sku`, a.`description` ) AGAINST ( '%s' IN BOOLEAN MODE )
OR MATCH ( d.`name` ) AGAINST ( '%s' IN BOOLEAN MODE )
GROUP BY a.`product_id`
ORDER BY relevance DESC
LIMIT 0, 9
Any help would be greatly appreciated.
EDIT
All the tables involved are MyISAM, utf8_general_ci.
Here's the EXPLAIN SELECT statement:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY a ALL NULL NULL NULL NULL 16076 Using temporary; Using filesort
1 PRIMARY b ref product_id product_id 4 database.a.product_id 2
1 PRIMARY e eq_ref PRIMARY PRIMARY 4 database.a.industry_id 1
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 23261
1 PRIMARY d eq_ref PRIMARY PRIMARY 4 database.a.brand_id 1 Using where
2 DERIVED product_images ALL NULL NULL NULL NULL 25933 Using where
I don't know how to make that look neater -- sorry about that
UPDATE
it returns the query after 196 seconds (I think correctly). The query without multiple tables takes about .56 seconds (which I know is really slow, we plan on changing to solr or sphinx soon), but 196 seconds??
If we could add a number to the relevance if it was in the brand name ( d.name ), that would also work

I found 2 things slowing down my query drastically and fixed them.
To answer the first problem, it needed parentheses around the entire "MATCH AGAINST OR MATCH AGAINST":
WHERE
b.`website_id` = %d
AND b.`status` = %d
AND b.`active` = %d
AND (
MATCH( a.`name`, a.`sku`, a.`description` ) AGAINST ( '%s' IN BOOLEAN MODE )
OR MATCH ( d.`name` ) AGAINST ( '%s' IN BOOLEAN MODE )
)
I didn't understand how to use EXPLAIN SELECT, but it helped quite a bit, so thank you! This reduced that first number 16076 rows to 143. I then noticed the other two with over 23 and 25 thousand rows. That was cause from this line:
LEFT JOIN ( SELECT `product_id`, `image`, `swatch` FROM `product_images` WHERE `sequence` = 0) AS c
ON (a.`product_id` = c.`product_id`)
There was a reason I was doing this in the first place, which then changed. When I changed it, I didn't realize I could do a normal LEFT JOIN:
LEFT JOIN `product_images` AS c
ON (a.`product_id` = c.`product_id`)
This makes my final query like this: (and MUCH faster went from the 196 seconds to 0.0084 or so)
SELECT
a.`product_id`, a.`name`, a.`slug`, a.`description`, b.`list_price`, b.`price`,
c.`image`, c.`swatch`, e.`name` AS industry,
MATCH( a.`name`, a.`sku`, a.`description` ) AGAINST ( '%s' IN BOOLEAN MODE ) AS relevance
FROM
`products` AS a LEFT JOIN `website_products` AS b
ON (a.`product_id` = b.`product_id`)
LEFT JOIN `product_images` AS c
ON (a.`product_id` = c.`product_id`)
LEFT JOIN `brands` AS d
ON (a.`brand_id` = d.`brand_id`)
INNER JOIN `industries` AS e
ON (a.`industry_id` = e.`industry_id`)
WHERE
b.`website_id` = %d
AND b.`status` = %d
AND b.`active` = %d
AND c.`sequence` = %d
AND (
MATCH( a.`name`, a.`sku`, a.`description` ) AGAINST ( '%s' IN BOOLEAN MODE )
OR MATCH( d.`name` ) AGAINST( '%s' IN BOOLEAN MODE )
)
GROUP BY a.`product_id`
ORDER BY relevance DESC
LIMIT 0, 9
Oh, and even before I was doing a full text search with multiple tables, it was taking about 1/2 a second. This is much improved.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

Rebuild MySQL query to stay below MAX_JOIN_SIZE rows - mysql

Related

Optimizing a Join

Avoid full table scan in inner select

OR in left join

how to rewrite the query to make it faster and sql compliant

MySQL "OR MATCH" hangs (very slow) on multiple tables

Categories

Resources