OR in left join

OR in left join - mysql

billboards table 140000 rows, regions 1000 rows.
SELECT
r.id,
SUM(IF(bb.r1_id = r.id, 1, 0)) AS count,
SUM(IF(bb.r2_id = r.id, 1, 0)) AS count2
FROM
tmp_regions AS r
LEFT JOIN
tmp_billboards AS bb
ON (r.id = bb.r1_id OR r.id = bb.r2_id)
WHERE
bb.deleted = 0
AND
bb.x != 0
AND
bb.y != 0
GROUP BY r.id
ORDER BY r.capital DESC , r.other , r.name
execution time is 8 sec
Explain
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE bb ref bb_r,bb_deleted,bb_x,bb_y,deleted_x_y,bb_r2 bb_deleted 1 const 66396 Using where; Using temporary; Using filesort
1 SIMPLE r ALL PRIMARY NULL NULL NULL 1000 Using where; Using join buffer
how can i change OR in join to improve perfomance?

Add indexes. The output of explain shows you which fields need them.

Assuming that tmp_regions (id) is the primary key, you could rewrite the query and the OR is converted to 2 joins:
SELECT
r.id,
COALESCE(bb1.cnt, 0) AS count,
COALESCE(bb2.cnt, 0) AS count2
FROM
tmp_regions AS r
LEFT JOIN
( SELECT r1_id, COUNT(*) AS cnt
FROM tmp_billboards
WHERE deleted = 0
AND x <> 0
AND y <> 0
GROUP BY r1_id
) AS bb1
ON r.id = bb1.r1_id
LEFT JOIN
( SELECT r2_id, COUNT(*) AS cnt
FROM tmp_billboards
WHERE deleted = 0
AND x <> 0
AND y <> 0
GROUP BY r2_id
) AS bb2
ON r.id = bb2.r2_id
ORDER BY r.capital DESC , r.other , r.name ;
For efficiency, indexes on (deleted, r1_id, x, y) and (deleted, r2_id, x, y) would help to avoid table scans on the tmp_billboards.

Related

MYSQL LEFT JOIN returns unexpected results

I have two tables talk_comments and talk_comment_votes.
I run the following code to select, commentId, numberOfUpvotes, whetherUserUpvoted, numberOfDownvotes, whetherUserDownvoted usin LEFT JOINs to the same table.
SELECT c.id, COUNT(v1.id) as upvotes, COUNT(v2.id) as userUpvoted, COUNT(v3.id) as downvotes, COUNT(v4.id) as userDownvoted FROM talk_comments c
LEFT JOIN talk_comment_votes v1 ON v1.comment_id = c.id AND v1.status = 1
LEFT JOIN talk_comment_votes v2 ON v2.comment_id = c.id AND v2.status = 1 AND v2.user_id = 1 AND v2.is_anonymous = 0
LEFT JOIN talk_comment_votes v3 ON c.id = v3.comment_id AND v3.status = 2
LEFT JOIN talk_comment_votes v4 ON c.id = v4.comment_id AND v4.status = 2 AND v4.user_id = 1 AND v4.is_anonymous = 0
WHERE c.id = 2 GROUP BY c.id
I have the following data in my talk_comment_votes table
So, according to the query, it should select values 2,2,0,1,1 respectively. When I break those JOIN statements and do the queries, it returns the expected results. But, with JOINs, it returns something like the follows.
Can I get some help on fixing this?
Thanks.

I ran a benchmark on queries based on #spencer7593 and #RaymondNijland's 2 answers.
LEFT JOINS wins!
1. Using LEFT JOINS
SELECT c.id, COUNT(DISTINCT v1.id) as upvotes, COUNT(DISTINCT v2.id) as userUpvoted, COUNT(DISTINCT v3.id) as downvotes, COUNT(DISTINCT v4.id) as userDownvoted FROM talk_comments c
LEFT JOIN talk_comment_votes v1 ON v1.comment_id = c.id AND v1.status = 1
LEFT JOIN talk_comment_votes v2 ON v2.comment_id = c.id AND v2.status = 1 AND v2.user_id = 1 AND v2.is_anonymous = 0
LEFT JOIN talk_comment_votes v3 ON c.id = v3.comment_id AND v3.status = 2
LEFT JOIN talk_comment_votes v4 ON c.id = v4.comment_id AND v4.status = 2 AND v4.user_id = 1 AND v4.is_anonymous = 0
WHERE c.id = 2 GROUP BY c.id
Time for 1000 queries: 0.55000805854797s
2. Using Sub Queries
SELECT c.id,c.user_id, c.time,c.body, c.reply_to,
(SELECT COUNT(v1.id) FROM talk_comment_votes v1 WHERE v1.comment_id = c.id AND v1.status = 1 LIMIT 1) as upvotes,
(SELECT COUNT(v2.id) FROM talk_comment_votes v2 WHERE v2.comment_id = c.id AND v2.status = 1 AND v2.user_id = 1 LIMIT 1) as clientUpvoted,
(SELECT COUNT(v3.id) FROM talk_comment_votes v3 WHERE v3.comment_id = c.id AND v3.status = 2 LIMIT 1) as downvotes,
(SELECT COUNT(v4.id) FROM talk_comment_votes v4 WHERE v4.comment_id = c.id AND v4.status = 2 AND v4.user_id = 1 LIMIT 1) as clientDownvoted
FROM talk_comments c
WHERE c.id = 2 GROUP BY c.id
Time for 1000 queries: 0.95499300956726s
3. Using SUM, IF
SELECT c.id
, SUM(IF(v.status = 1 ,1,0)) AS upvotes
, SUM(IF(v.status = 1 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userUpvoted
, SUM(IF(v.status = 2 ,1,0)) AS downvotes
, SUM(IF(v.status = 2 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userDownvoted
FROM talk_comments c
LEFT
JOIN talk_comment_votes v
ON v.comment_id = c.id
WHERE c.id = 2
GROUP BY c.id
Time for 1000 queries: 1.2266919612885s
Thank you for all the answers.

I'd use conditional aggregation. A join to a single reference to tall_comment_votes, and then check conditions in expressions.
SELECT c.id
, SUM(IF(v.status = 1 ,1,0)) AS upvotes
, SUM(IF(v.status = 1 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userUpvoted
, SUM(IF(v.status = 2 ,1,0)) AS downvotes
, SUM(IF(v.status = 2 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userDownvoted
FROM talk_comments c
LEFT
JOIN talk_comment_votes v
ON v.comment_id = c.id
WHERE c.id = 2
GROUP
BY c.id
This avoids the problem of the partial cross product, when there are multiple rows returned from v1, v2, v3 and v4.
The MySQL IF() expression could replaced with a more ANSI standards compliant CASE expression, e.g.
, SUM(CASE WHEN v.status = 1 THEN 1 ELSE 0 END) AS upvotes
FOLLOWUP
setup test case and observe execution plans and performance
populate tables
CREATE TABLE talk_comments (id INT NOT NULL PRIMARY KEY AUTO_INCREMENT);
CREATE TABLE talk_comment_votes (id INT NOT NULL PRIMARY KEY AUTO_INCREMENT, comment_id INT UNSIGNED NOT NULL, user_id INT UNSIGNED NOT NULL, is_anonymous TINYINT(1) UNSIGNED NOT NULL, STATUS TINYINT UNSIGNED, time_ INT UNSIGNED);
CREATE INDEX talk_comment_votes_IX1 ON talk_comment_votes (comment_id, STATUS, user_id, is_anonymous) ;
INSERT INTO talk_comments (id) VALUES (1),(2),(3);
INSERT INTO talk_comment_votes (id, comment_id, user_id, is_anonymous, STATUS, time_) VALUES (1,2,2,0,1,0),(2,1,1,0,1,0),(3,2,1,0,2,NULL),(4,7,1,0,2,NULL),(5,1,14,1,1,NULL),(6,2,14,1,1,NULL);
query execution plans
EXPLAIN
SELECT c.id, COUNT(DISTINCT v1.id) AS upvotes, COUNT(DISTINCT v2.id) AS userUpvoted, COUNT(DISTINCT v3.id) AS downvotes, COUNT(DISTINCT v4.id) AS userDownvoted FROM talk_comments c
LEFT JOIN talk_comment_votes v1 ON v1.comment_id = c.id AND v1.status = 1
LEFT JOIN talk_comment_votes v2 ON v2.comment_id = c.id AND v2.status = 1 AND v2.user_id = 1 AND v2.is_anonymous = 0
LEFT JOIN talk_comment_votes v3 ON c.id = v3.comment_id AND v3.status = 2
LEFT JOIN talk_comment_votes v4 ON c.id = v4.comment_id AND v4.status = 2 AND v4.user_id = 1 AND v4.is_anonymous = 0
WHERE c.id = 2 GROUP BY c.id
;
EXPLAIN
SELECT c.id
, SUM(IF(v.status = 1 ,1,0)) AS upvotes
, SUM(IF(v.status = 1 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userUpvoted
, SUM(IF(v.status = 2 ,1,0)) AS downvotes
, SUM(IF(v.status = 2 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userDownvoted
FROM talk_comments c
LEFT
JOIN talk_comment_votes v
ON v.comment_id = c.id
WHERE c.id = 2
GROUP BY c.id
;
output from explain
-- id select_type table type possible_keys key key_len ref rows Extra
-- ------ ----------- ------ ------ ---------------------- ---------------------- ------- ----------------------- ------ -------------
-- 1 SIMPLE c const PRIMARY PRIMARY 4 const 1 Using index
-- 1 SIMPLE v1 ref talk_comment_votes_IX1 talk_comment_votes_IX1 6 const,const 2 Using index
-- 1 SIMPLE v2 ref talk_comment_votes_IX1 talk_comment_votes_IX1 11 const,const,const,const 1 Using index
-- 1 SIMPLE v3 ref talk_comment_votes_IX1 talk_comment_votes_IX1 6 const,const 1 Using index
-- 1 SIMPLE v4 ref talk_comment_votes_IX1 talk_comment_votes_IX1 11 const,const,const,const 1 Using index
-- id select_type table type possible_keys key key_len ref rows Extra
-- ------ ----------- ------ ------ ---------------------- ---------------------- ------- ------ ------ -------------
-- 1 SIMPLE c const PRIMARY PRIMARY 4 const 1 Using index
-- 1 SIMPLE v ref talk_comment_votes_IX1 talk_comment_votes_IX1 4 const 3 Using index
measured performance:
100 executions round 1 round 2 round 3
------------------------------------ ---------- ---------- ---------
multiple left join, count(distinct 0.123 secs 0.130 secs 0.125 secs
conditional aggregation sum(if 0.113 secs 0.114 secs 0.111 secs

MySQL query too slow about 25 seconds

I have a MySQL query and it takes about 25 sec. There are not many rows (just about 200) but I don't understand why it takes long time.
Query:
SELECT *
, c.id c_id
FROM campaign c
JOIN campaign_category cc
ON c.campaign_type = cc.id
WHERE c.is_deleted = 0
AND c.status = 1
AND c.id NOT IN (SELECT campaign_id FROM user_reviews WHERE user_id = 4)
AND c.amt_req > (SELECT COUNT(id)
FROM reserved_reviews
WHERE camping_id = c.id
AND user_id != 4)
+ (SELECT COUNT(id)
FROM user_reviews
WHERE campaign_id = c.id)
Edit:
I tried with JOIN like this but i got no result:
SELECT
*, `c`.`id` as `c_id`,COUNT(`ur`.`id`) as `total_reviewed`, COUNT(`rr`.`id`) as `total_reserved`
FROM
`campaign` `c`
JOIN `campaign_category` `cc` ON `c`.`campaign_type`=`cc`.`id`
JOIN `user_reviews` `ur` ON `ur`.`campaign_id`=`c`.`id`
JOIN `reserved_reviews` `rr` ON `rr`.`camping_id`=`c`.`id`
WHERE
`c`.`is_deleted` =0
AND
`c`.`status` = 1
AND
`ur`.`user_id` != 4
GROUP BY `c`.`id`
HAVING `c`.`amt_req` > COUNT(`ur`.`id`) + COUNT(`rr`.`id`)
Edit: Table structures: First Image - user_reviews Table, Second image campagin Table, Third image: reserved_reviews Table.
http://imgur.com/GI4817B,SdnSxuz,truxHM6#0

You can improve this query with indexes;
SELECT *, c.id c_id
FROM campaign c JOIN
campaign_category cc
ON c.campaign_type = cc.id
WHERE c.is_deleted = 0 AND
c.status = 1 AND
c.id NOT IN (SELECT campaign_id FROM user_reviews WHERE user_id = 4)
c.amt_req > (SELECT COUNT(*)
FROM reserved_reviews
WHERE campaign_id = c.id AND user_id <> 4)
) +
(SELECT COUNT(id)
FROM user_reviews
WHERE campaign_id = c.id
) ;
For the outer query and joins: campaign(status, is_deleted, id, amt_req) and campaign_category(id) (you should have the latter if it is defined as a primary key.
Then: user_reviews(user_id, campaign_id), reserved_reviews(campaign_id, user_id), and user_reviews(campaign_id).

Avoid full table scan in inner select

I have the following select in MySQL, which produces the right results but it takes unnecessarily long to execute:
SELECT tblGPSDevices.Email, tblLoc.Lat, tblLoc.Lon, tblLoc.Radius, tblLoc.CreationTimeStamp, tblTrackedUsers.ID, tblTrackedUsers.TrackerDeviceID, tblTrackedUsers.TrackedDeviceID
FROM tblTrackedUsers
INNER JOIN tblGPSDevices ON tblTrackedUsers.TrackedDeviceID = tblGPSDevices.ID
LEFT OUTER JOIN (
SELECT A.DeviceID, A.Lat, A.Lon, A.Radius, A.CreationTimeStamp, A.ID
FROM tblLocations A
INNER JOIN (
SELECT DeviceID, MAX(CreationTimeStamp) AS CreationTimeStamp, MAX(ID) AS ID
FROM tblLocations
GROUP BY DeviceID
) AS B ON A.DeviceID = B.DeviceID
AND A.CreationTimeStamp = B.CreationTimeStamp
AND A.ID = B.ID
) AS tblLoc ON tblLoc.DeviceID = tblGPSDevices.ID
WHERE tblGPSDevices.Validated = 0x01
AND tblGPSDevices.Enabled = 0x01
AND tblTrackedUsers.Validated = 0x01
AND tblTrackedUsers.TrackerDeviceID = 1
ORDER BY tblTrackedUsers.ID;
This query runs much slower than it should because it does a full table scan on tblLocations.
This is the part that really slows down the query:
SELECT A.DeviceID, A.Lat, A.Lon, A.Radius, A.CreationTimeStamp, A.ID
FROM tblLocations A
INNER JOIN (
SELECT DeviceID, MAX(CreationTimeStamp) AS CreationTimeStamp, MAX(ID) AS ID
FROM tblLocations
GROUP BY DeviceID
) AS B ON A.DeviceID = B.DeviceID
AND A.CreationTimeStamp = B.CreationTimeStamp
AND A.ID = B.ID
Here is the explain plan:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY tblTrackedUsers ref TrackerDeviceID,TrackedDeviceID TrackerDeviceID 9 const 14 Using where; Using temporary; Using filesort
1 PRIMARY tblGPSDevices eq_ref PRIMARY PRIMARY 8 tblTrackedUsers.TrackedDeviceID 1 Using where
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 2073
2 DERIVED <derived3> ALL NULL NULL NULL NULL 2073
2 DERIVED A eq_ref PRIMARY,DeviceID,CreationTimeStampIndex PRIMARY 8 B.ID 1 Using where
3 DERIVED tblLocations index NULL DeviceID 8 NULL 174058
It does a full table scan on tblLocations, even though I only need s small subset of DeviceID's in that table.
I just need to look at the DeviceID's that are returned from this part:
INNER JOIN tblGPSDevices ON tblTrackedUsers.TrackedDeviceID = tblGPSDevices.ID
WHERE tblTrackedUsers.TrackerDeviceID = 1
But unfortunately tblTrackedUsers.TrackedDeviceID is not visible in the inner select. So if I add
WHERE DeviceID = tblTrackedUsers.TrackedDeviceID
right above
GROUP BY DeviceID
It does not work.
How can I go about optimizing this query?
Structure of the tables involved with the relevant fields only:
tblGPSDevices:
ID | Email | Validated | Enabled
tblLocations:
ID | DeviceID | Lat | Lon | Radius | CreationTimeStamp
tblTrackedUsers:
ID | TrackerDeviceID | TrackedDeviceID | Validated
tblLocations.DeviceID, tblTrackedUsers.TrackerDeviceID and tblTrackedUsers.TrackedDeviceID are foreign keys pointing to tblGPSDevices.ID
What this query does:
The query should return all devices from tblGPSDevices that are being tracked by the user and their last location from tblLocations. The way to determine which devices are being tracked by a user is determined by tblTrackedUsers: select TrackedDeviceID from tblTrackedUsers where TrackerDeviceID = some_value

I did find the answer myself and I will post for future reference. This sped up the query from 2 sec. to 0.0179 sec. That is an enormous gain.
The key was to add one more inner select within:
SELECT DeviceID, MAX(CreationTimeStamp) AS CreationTimeStamp, MAX(ID) AS ID
FROM tblLocations
GROUP BY DeviceID
in order to avoid a full table scan, since we are only interested in DeviceID's that are = to tblTrackedUsers.TrackedDeviceID. Now this select looks like this:
SELECT C.DeviceID, MAX(C.CreationTimeStamp) AS CreationTimeStamp, MAX(C.ID) AS ID
FROM tblLocations C
INNER JOIN (
SELECT ID, TrackerDeviceID, TrackedDeviceID, TrackedName, AccessCode, Validated FROM tblAllowedUsers WHERE TrackerDeviceID = 1 AND Validated=0x01
) AS D ON D.TrackedDeviceID = C.DeviceID
GROUP BY DeviceID
Here is the full select now:
SELECT tblGPSDevices.Email, tblLoc.Lat, tblLoc.Lon, tblLoc.Radius, tblLoc.CreationTimeStamp, tblTrackedUsers.ID, tblTrackedUsers.TrackerDeviceID, tblTrackedUsers.TrackedDeviceID
FROM tblTrackedUsers
INNER JOIN tblGPSDevices ON tblTrackedUsers.TrackedDeviceID = tblGPSDevices.ID
LEFT OUTER JOIN (
SELECT A.DeviceID, A.Lat, A.Lon, A.Radius, A.CreationTimeStamp, A.ID
FROM tblLocations A
INNER JOIN (
SELECT C.DeviceID, MAX(C.CreationTimeStamp) AS CreationTimeStamp, MAX(C.ID) AS ID
FROM tblLocations C
INNER JOIN (
SELECT ID, TrackerDeviceID, TrackedDeviceID, TrackedName, AccessCode, Validated FROM tblAllowedUsers WHERE TrackerDeviceID = 1 AND Validated=0x01
) AS D ON D.TrackedDeviceID = C.DeviceID
GROUP BY DeviceID
) AS B ON A.DeviceID = B.DeviceID
AND A.CreationTimeStamp = B.CreationTimeStamp
AND A.ID = B.ID
) AS tblLoc ON tblLoc.DeviceID = tblGPSDevices.ID
WHERE tblGPSDevices.Validated = 0x01
AND tblGPSDevices.Enabled = 0x01
AND tblTrackedUsers.Validated = 0x01
AND tblTrackedUsers.TrackerDeviceID = 1
ORDER BY tblTrackedUsers.ID;

Rebuild MySQL query to stay below MAX_JOIN_SIZE rows

I have a SQL query which fails (most of the times) because of too many joined rows. The error provided by MySQL is The SELECT would examine more than MAX_JOIN_SIZE rows; check your WHERE and use SET SQL_BIG_SELECTS=1 or SET MAX_JOIN_SIZE=# if the SELECT is okay. I know I can avoid the error by setting the mentioned variables SQL_BIG_SELECTS and MAX_JOIN_SIZE, but I feel like this isn't the right way and pushes the problem only a bit in the future, because the join count might grow in the future.
The facts: I have an event planning tool which assigns users (=workers) to certain tasks. The tables are users (userid,username) [ID and name], tasks (taskid,task,start,end) [ID, task name, start as timestamp, end as timestamp] and userassignment (id,userid,taskid,deleted) [ID, user assigned to a task, the task, is the assignment still valid).
The exact table definition is like this:
CREATE TABLE users (
userid INT NOT NULL AUTO_INCREMENT,
username VARCHAR(250),
PRIMARY KEY (userid)
);
CREATE TABLE tasks (
taskid INT NOT NULL AUTO_INCREMENT,
task VARCHAR(250),
start INT,
end INT,
PRIMARY KEY (taskid),
INDEX USING BTREE (start),
INDEX USING BTREE (end)
);
CREATE TABLE userassignment (
id INT NOT NULL AUTO_INCREMENT,
userid INT,
taskid INT,
deleted TINYINT,
PRIMARY KEY (id),
INDEX USING BTREE (userid),
INDEX USING BTREE (userid),
UNIQUE KEY `usertasks` ( `userid` , `taskid` )
);
I need to know, which users are assigned and on which main days of the event (day 1, day 2, day 3) they're assigned.
My query looks like this:
SELECT
u.userid,
u.username,
COUNT(ua.id) AS count_all,
dayone.c AS count_one,
daytwo.c AS count_two,
daythree.c AS count_three
FROM
users AS u
INNER JOIN
userassignment AS ua ON ua.userid = u.userid AND ua.deleted = 0
INNER JOIN
tasks AS t ON ua.taskid = t.taskid
LEFT JOIN (
SELECT
u.userid,
COUNT(ua.id) AS c
FROM
users AS u
INNER JOIN
userassignment AS ua ON
ua.userid = u.userid AND
ua.deleted = 0
INNER JOIN
tasks AS t ON
ua.taskid = t.taskid
WHERE
t.start > UNIX_TIMESTAMP("2014-08-01 00:00:00") AND
t.start < UNIX_TIMESTAMP("2014-08-02 00:00:00")
GROUP BY
u.userid
) AS dayone ON dayone.userid = u.userid
LEFT JOIN (
SELECT
u.userid,
COUNT(ua.id) AS c
FROM
users AS u
INNER JOIN
userassignment AS ua ON
ua.userid = u.userid AND
ua.deleted = 0
INNER JOIN
tasks AS t ON
ua.taskid = t.taskid
WHERE
t.start > UNIX_TIMESTAMP("2014-07-31 00:00:00") AND
t.start < UNIX_TIMESTAMP("2014-08-01 00:00:00")
GROUP BY
u.userid
) AS daytwo ON daytwo.userid = u.userid
LEFT JOIN (
SELECT
u.userid,
COUNT(ua.id) AS c
FROM
users AS u
INNER JOIN
userassignment AS ua ON
ua.userid = u.userid AND
ua.deleted = 0
INNER JOIN
tasks AS t ON
ua.taskid = t.taskid
WHERE
t.start > UNIX_TIMESTAMP("2014-08-02 00:00:00") AND
t.start < UNIX_TIMESTAMP("2014-08-04 00:00:00")
GROUP BY
u.userid
) AS daythree ON daythree.userid = u.userid
WHERE
t.start > UNIX_TIMESTAMP("2014-07-31 00:00:00") AND
t.start < UNIX_TIMESTAMP("2014-08-04 00:00:00")
GROUP BY
u.userid
ORDER BY
username ASC
First I select all users which have an assignment in one of the three days (there are about six time more users in the DB than assigned to a task), then I left join the assigned users of every of the three days.
So, is there a way to rebuild the query to join fewer rows? I only need to know, who is assigned on which day, not the number of assignments.
I already tried to UNION several queries but this was unsuccessful.
SQL Fiddle
An EXPLAIN of the real query (not in the SQL Fiddle) is:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY t range PRIMARY,start start 5 NULL 120 100.00 Using where; Using index; Using temporary; Using filesort
1 PRIMARY ua ref usertasks,userid,taskid taskid 2 db1154575-helfer.t.id 2 100.00 Using where
1 PRIMARY u eq_ref userid userid 2 db1154575-helfer.ua.userid 1 100.00
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 152 100.00
1 PRIMARY <derived3> ALL NULL NULL NULL NULL 94 100.00
1 PRIMARY <derived4> ALL NULL NULL NULL NULL 147 100.00
4 DERIVED t range PRIMARY,start start 5 NULL 53 100.00 Using where; Using index; Using temporary; Using filesort
4 DERIVED ua ref usertasks,userid,taskid taskid 2 db1154575-helfer.t.id 2 100.00 Using where
4 DERIVED u eq_ref userid userid 2 db1154575-helfer.ua.userid 1 100.00 Using index
3 DERIVED t range PRIMARY,start start 5 NULL 21 100.00 Using where; Using index; Using temporary; Using filesort
3 DERIVED ua ref usertasks,userid,taskid taskid 2 db1154575-helfer.t.id 2 100.00 Using where
3 DERIVED u eq_ref userid userid 2 db1154575-helfer.ua.userid 1 100.00 Using index
2 DERIVED t range PRIMARY,start start 5 NULL 44 100.00 Using where; Using index; Using temporary; Using filesort
2 DERIVED ua ref usertasks,userid,taskid taskid 2 db1154575-helfer.t.id 2 100.00 Using where
2 DERIVED u eq_ref userid userid 2 db1154575-helfer.ua.userid 1 100.00 Using index

So, is all that really just a long-winded way of saying this...
SELECT u.*
, DATE(FROM_UNIXTIME(t.start)) dt
, COUNT(t.taskid) total
FROM users u
LEFT
JOIN userassignment ut
ON ut.userid = u.userid
AND ut.deleted = 0
LEFT
JOIN tasks t
ON t.taskid = ut.taskid
GROUP
BY u.userid
, DATE(FROM_UNIXTIME(t.start))
In the example above, you can change COUNT(t.taskid) to COUNT(CASE WHEN x = 'y' THEN z END) or SUM(CASE...

This should return the same result set:
SELECT u.userid, u.username,
COUNT(ua.id) AS count_all,
SUM(case when t.start > UNIX_TIMESTAMP('2014-08-01 00:00:00') AND
t.start < UNIX_TIMESTAMP('2014-08-02 00:00:00')
then 1 else 0
end) as count_one,
SUM(case when t.start > UNIX_TIMESTAMP('2014-07-31 00:00:00') AND
t.start < UNIX_TIMESTAMP('2014-08-01 00:00:00')
then 1 else 0
end) as count_two,
SUM(case when t.start > UNIX_TIMESTAMP('2014-08-02 00:00:00') AND
t.start < UNIX_TIMESTAMP('2014-08-04 00:00:00')
then 1 else 0
end) as count_three
FROM users u LEFT JOIN
userassignment ua
ON ua.userid = u.userid AND
ua.deleted = 0 LEFT JOIN
tasks t
ON ua.taskid = t.taskid
WHERE ua.deleted = 0 AND
t.start > UNIX_TIMESTAMP('2014-07-31 00:00:00') AND
t.start < UNIX_TIMESTAMP('2014-08-04 00:00:00')
GROUP BY u.userid
ORDER BY u.username;
Your formulation is a bit tricky. The outer joins are filter out any user whose assignments are always deleted, for instance. And the date periods are overlapping (I'm not sure if that is intentional, but it is how the query is structured).
Perhaps this simpler query will not exceed internal limits.

how to rewrite the query to make it faster and sql compliant

This query:
SELECT `subscribers`.`email_address`, `subscribers`.`first_name`, `subscribers`.`last_name`,
GROUP_CONCAT( DISTINCT t1.value SEPARATOR '|' ) AS 'Colors', GROUP_CONCAT( DISTINCT t2.value SEPARATOR '|' ) AS 'Languages'
FROM `subscribers`
LEFT JOIN `subscribers_multivalued` AS `t1` ON subscribers.subscriber_id = t1.subscriber_id AND t1.field_id = 112
LEFT JOIN `subscribers_multivalued` AS `t2` ON subscribers.subscriber_id = t2.subscriber_id AND t2.field_id = 111
WHERE (list_id = 40) AND (state = 1)
GROUP BY `subscribers`.`email_address` , `subscribers`.`first_name` , `subscribers`.`last_name`
With execute plan:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE subscribers ref FK_list_id,state_date_added FK_list_id 4 const 1753610 Using where; Using filesort
1 SIMPLE t1 ref subscriber_fk,field_fk subscriber_fk 4 chad0598_mailablel.subscribers.subscriber_id 1
1 SIMPLE t2 ref subscriber_fk,field_fk subscriber_fk 4 chad0598_mailablel.subscribers.subscriber_id 1
Caused error for uknown reason: General error: 3 Error writing file '/tmp/MYzamaNT' (Errcode: 28) in mysql.
I rewrote it like this groupy by subscriber_id insead:
SELECT `subscribers`.`email_address`, `subscribers`.`first_name`, `subscribers`.`last_name`,
GROUP_CONCAT( DISTINCT t1.value SEPARATOR '|' ) AS 'Colors', GROUP_CONCAT( DISTINCT t2.value SEPARATOR '|' ) AS 'Languages'
FROM `subscribers`
LEFT JOIN `subscribers_multivalued` AS `t1` ON subscribers.subscriber_id = t1.subscriber_id AND t1.field_id = 112
LEFT JOIN `subscribers_multivalued` AS `t2` ON subscribers.subscriber_id = t2.subscriber_id AND t2.field_id = 111
WHERE (list_id = 40) AND (state = 1)
GROUP BY `subscribers`.`subscriber_id`
The plan of this query is better (not filesort):
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE subscribers ref FK_list_id,state_date_added FK_list_id 4 const 1753610 Using where
1 SIMPLE t1 ref subscriber_fk,field_fk subscriber_fk 4 chad0598_mailablel.subscribers.subscriber_id 1
1 SIMPLE t2 ref subscriber_fk,field_fk subscriber_fk 4 chad0598_mailablel.subscribers.subscriber_id 1
It seems that it works in mysql and it much faster that previous query.
Although it works in mysql it's not compliant sql standard there shouldn't be aggregated fields in fields' list that are not present in GROUP BY. I there a way to make the query as fast as the last one and make it sql compliant?
I tried to rewrite it this way too:
SELECT `subscribers`.`email_address`, `subscribers`.`first_name`, `subscribers`.`last_name`,
t1.colors AS 'Colors', t2.languages AS 'Languages'
FROM `subscribers`
LEFT JOIN (SELECT subscriber_id, GROUP_CONCAT( DISTINCT value SEPARATOR '|' ) AS 'colors'
FROM subscribers_multivalued
WHERE field_id = 112
GROUP BY subscriber_id) t1 ON t1.subscriber_id = `subscribers`.`subscriber_id`
LEFT JOIN (SELECT subscriber_id, GROUP_CONCAT( DISTINCT value SEPARATOR '|' ) AS 'languages'
FROM subscribers_multivalued
WHERE field_id = 111
GROUP BY subscriber_id) t2 ON t2.subscriber_id = `subscribers`.`subscriber_id`
The plan of this query is much worse:
1 PRIMARY subscribers ALL NULL NULL NULL NULL 23358546
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 900000
1 PRIMARY <derived3> ALL NULL NULL NULL NULL 900000
3 DERIVED subscribers_multivalued ALL field_fk field_fk 4 20621115 Using filesort
2 DERIVED subscribers_multivalued ALL field_fk field_fk 4 20621115 Using filesort
and I couldn't wait untill it returns data;

I'd keep the faster one. You can also test/benchmark this variation:
SELECT s.email_address,
s.first_name,
s.last_name,
COALESCE(t1.Colors, '') AS Colors,
COALESCE(t2.Languages, '') AS Languages
FROM subscribers AS s
LEFT JOIN
( SELECT t1.subscriber_id,
GROUP_CONCAT( DISTINCT t1.value SEPARATOR '|' ) AS Colors
FROM subscribers AS s
JOIN subscribers_multivalued AS t1
ON s.subscriber_id = t1.subscriber_id
WHERE t1.field_id = 112
AND s.list_id = 40
AND s.state = 1
GROUP BY t1.subscriber_id
) AS t1
ON s.subscriber_id = t1.subscriber_id
LEFT JOIN
( SELECT t2.subscriber_id,
GROUP_CONCAT( DISTINCT t2.value SEPARATOR '|' ) AS Languages
FROM subscribers AS s
JOIN subscribers_multivalued AS t2
ON s.subscriber_id = t2.subscriber_id
WHERE t2.field_id = 111
AND s.list_id = 40
AND s.state = 1
GROUP BY t2.subscriber_id
) AS t2
ON s.subscriber_id = t2.subscriber_id
WHERE s.list_id = 40
AND s.state = 1 ;
If (field_id, subscriber_id, value) is unique in table subscribers_multivalued, you can also drop the two DISTINCT.
Regarding speed and efficiency, check the indexes you have. These would help both your version and this one:
in the subscribers table, an index on either (list_id, state, subscriber_id) or (state, list_id, subscriber_id)
in the subscribers_multivalued table, an index on (field_id, subscriber_id, value) or (second best) on (field_id, subscriber_id).

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

OR in left join - mysql

Add indexes. The output of explain shows you which fields need them.

Related

MYSQL LEFT JOIN returns unexpected results

MySQL query too slow about 25 seconds

Avoid full table scan in inner select

Rebuild MySQL query to stay below MAX_JOIN_SIZE rows

how to rewrite the query to make it faster and sql compliant

Categories

Resources