Optimizing a Join - mysql

I have a join that looks like this:
LEFT JOIN abc_emails.my208 my3
ON my3.eid = my.eid
AND (
my3.my_id = 417
AND my3.data = 'AB Deleted Account'
)
This query is borderline runnable, so I'm trying to save every scrap of memory. My question, then, is about this variant:
LEFT JOIN abc_emails.my208 my3
ON my3.eid = my.eid
AND my3.data = 'AB Deleted Account'
I removed the my3.my_id = 417 part.
Records where my_id = 417 include 'AB Deleted Account' but also a couple of other strings. I included that condition because I thought it might help MySQL narrow the rows down. Is that wrong?
I really just want the join to be on records where my3.data = 'AB Deleted Account'.
Is it best to add on as many "filters" as possible or to remove as many as possible?
eid is already the key on both tables.
Following feedback, here is the output of EXPLAIN:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | co | ALL | | | | | 198784 | Using temporary; Using filesort
1 | SIMPLE | my | eq_ref | PRIMARY | PRIMARY | 8 | abc_emails.co.id,const | 1 |
1 | SIMPLE | my1 | eq_ref | PRIMARY | PRIMARY | 8 | abc_emails.my.eid,const | 1 |
1 | SIMPLE | my2 | eq_ref | PRIMARY | PRIMARY | 8 | abc_emails.my.eid,const | 1 |
1 | SIMPLE | my3 | ref | ver_eid | ver_eid | 4 | abc_emails.my.eid | 246 |
1 | SIMPLE | i | ref | eid_iid | eid_iid | 10 | abc_emails.my.eid,const | 1 | Using where; Using index
1 | SIMPLE | nvk | eq_ref | PRIMARY | PRIMARY | 4 | abc_emails.co.id | 1 |
And here is the query itself
SELECT
co.id AS ID_Eid,
co.email,
def.def_medium AS def_Medium_def208,
co.created AS Contact_Create_Date,
my1.data AS PAW_ID,
my.data AS Trial_Date,
my2.data AS Upgrade_Date,
MIN(my3.lastmod) AS Cancel_Date
FROM
abc_emails.cid208 co
LEFT JOIN abc_emails.my208 my ON my.eid = co.id AND my.my_id = 581
LEFT JOIN abc_emails.my208 my1 ON my1.eid = my.eid AND my1.my_id = 2765
LEFT JOIN abc_emails.my208 my2 ON my2.eid = my.eid AND my2.my_id = 3347
LEFT JOIN abc_emails.my208 my3 ON my3.eid = my.eid AND (my3.my_id = 417 AND my3.data = 'AB Deleted Account')
LEFT JOIN abc_emails.i208 i ON i.eid = my.eid AND i.iid = 22467
LEFT JOIN abc_emails.def208 def ON def.eid = co.id
WHERE i.iid IS NULL
GROUP BY ID_Eid, email, def_Medium_def208, Contact_Create_Date, PAW_ID, Upgrade_Date

MYSQL LEFT JOIN returns unexpected results

I have two tables talk_comments and talk_comment_votes.
I run the following code to select commentId, numberOfUpvotes, whetherUserUpvoted, numberOfDownvotes and whetherUserDownvoted using LEFT JOINs to the same table.
SELECT c.id, COUNT(v1.id) as upvotes, COUNT(v2.id) as userUpvoted, COUNT(v3.id) as downvotes, COUNT(v4.id) as userDownvoted FROM talk_comments c
LEFT JOIN talk_comment_votes v1 ON v1.comment_id = c.id AND v1.status = 1
LEFT JOIN talk_comment_votes v2 ON v2.comment_id = c.id AND v2.status = 1 AND v2.user_id = 1 AND v2.is_anonymous = 0
LEFT JOIN talk_comment_votes v3 ON c.id = v3.comment_id AND v3.status = 2
LEFT JOIN talk_comment_votes v4 ON c.id = v4.comment_id AND v4.status = 2 AND v4.user_id = 1 AND v4.is_anonymous = 0
WHERE c.id = 2 GROUP BY c.id
I have the following data in my talk_comment_votes table
So, according to the query, it should select the values 2,2,0,1,1 respectively. When I break those JOINs apart and run the queries separately, I get the expected results. But with the JOINs combined, it returns something like the following.
Can I get some help on fixing this?
Thanks.
I ran a benchmark on queries based on #spencer7593 and #RaymondNijland's 2 answers.
LEFT JOINS wins!
1. Using LEFT JOINS
SELECT c.id, COUNT(DISTINCT v1.id) as upvotes, COUNT(DISTINCT v2.id) as userUpvoted, COUNT(DISTINCT v3.id) as downvotes, COUNT(DISTINCT v4.id) as userDownvoted FROM talk_comments c
LEFT JOIN talk_comment_votes v1 ON v1.comment_id = c.id AND v1.status = 1
LEFT JOIN talk_comment_votes v2 ON v2.comment_id = c.id AND v2.status = 1 AND v2.user_id = 1 AND v2.is_anonymous = 0
LEFT JOIN talk_comment_votes v3 ON c.id = v3.comment_id AND v3.status = 2
LEFT JOIN talk_comment_votes v4 ON c.id = v4.comment_id AND v4.status = 2 AND v4.user_id = 1 AND v4.is_anonymous = 0
WHERE c.id = 2 GROUP BY c.id
Time for 1000 queries: 0.55000805854797s
2. Using Sub Queries
SELECT c.id,c.user_id, c.time,c.body, c.reply_to,
(SELECT COUNT(v1.id) FROM talk_comment_votes v1 WHERE v1.comment_id = c.id AND v1.status = 1 LIMIT 1) as upvotes,
(SELECT COUNT(v2.id) FROM talk_comment_votes v2 WHERE v2.comment_id = c.id AND v2.status = 1 AND v2.user_id = 1 LIMIT 1) as clientUpvoted,
(SELECT COUNT(v3.id) FROM talk_comment_votes v3 WHERE v3.comment_id = c.id AND v3.status = 2 LIMIT 1) as downvotes,
(SELECT COUNT(v4.id) FROM talk_comment_votes v4 WHERE v4.comment_id = c.id AND v4.status = 2 AND v4.user_id = 1 LIMIT 1) as clientDownvoted
FROM talk_comments c
WHERE c.id = 2 GROUP BY c.id
Time for 1000 queries: 0.95499300956726s
3. Using SUM, IF
SELECT c.id
, SUM(IF(v.status = 1 ,1,0)) AS upvotes
, SUM(IF(v.status = 1 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userUpvoted
, SUM(IF(v.status = 2 ,1,0)) AS downvotes
, SUM(IF(v.status = 2 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userDownvoted
FROM talk_comments c
LEFT
JOIN talk_comment_votes v
ON v.comment_id = c.id
WHERE c.id = 2
GROUP BY c.id
Time for 1000 queries: 1.2266919612885s
Thank you for all the answers.
I'd use conditional aggregation: join to a single reference to talk_comment_votes, and then check the conditions in expressions.
SELECT c.id
, SUM(IF(v.status = 1 ,1,0)) AS upvotes
, SUM(IF(v.status = 1 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userUpvoted
, SUM(IF(v.status = 2 ,1,0)) AS downvotes
, SUM(IF(v.status = 2 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userDownvoted
FROM talk_comments c
LEFT
JOIN talk_comment_votes v
ON v.comment_id = c.id
WHERE c.id = 2
GROUP
BY c.id
This avoids the problem of the partial cross product, when there are multiple rows returned from v1, v2, v3 and v4.
The MySQL IF() expression could be replaced with a more ANSI-standard CASE expression, e.g.
, SUM(CASE WHEN v.status = 1 THEN 1 ELSE 0 END) AS upvotes
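To make the partial cross product concrete, here is a minimal sketch that loads the sample vote rows from the follow-up test case into an in-memory SQLite database and runs both shapes of query for comment 2. SQLite (through Python's sqlite3 module) is only a stand-in for MySQL here, chosen so the example runs anywhere; the time_ column is left out because it plays no part in the counts.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE talk_comments (id INTEGER PRIMARY KEY);
CREATE TABLE talk_comment_votes (
    id INTEGER PRIMARY KEY,
    comment_id INTEGER NOT NULL,
    user_id INTEGER NOT NULL,
    is_anonymous INTEGER NOT NULL,
    status INTEGER
);
INSERT INTO talk_comments (id) VALUES (1),(2),(3);
INSERT INTO talk_comment_votes (id, comment_id, user_id, is_anonymous, status) VALUES
    (1,2,2,0,1),(2,1,1,0,1),(3,2,1,0,2),(4,7,1,0,2),(5,1,14,1,1),(6,2,14,1,1);
""")

# Four joins to the same table, plain COUNT (no DISTINCT): every v1 row is
# paired with every v3 and v4 row, so the counts are multiplied.
naive_sql = """
SELECT c.id, COUNT(v1.id), COUNT(v2.id), COUNT(v3.id), COUNT(v4.id)
FROM talk_comments c
LEFT JOIN talk_comment_votes v1 ON v1.comment_id = c.id AND v1.status = 1
LEFT JOIN talk_comment_votes v2 ON v2.comment_id = c.id AND v2.status = 1
     AND v2.user_id = 1 AND v2.is_anonymous = 0
LEFT JOIN talk_comment_votes v3 ON v3.comment_id = c.id AND v3.status = 2
LEFT JOIN talk_comment_votes v4 ON v4.comment_id = c.id AND v4.status = 2
     AND v4.user_id = 1 AND v4.is_anonymous = 0
WHERE c.id = 2
GROUP BY c.id
"""

# One join, conditions moved into the aggregate expressions.
conditional_sql = """
SELECT c.id,
       SUM(CASE WHEN v.status = 1 THEN 1 ELSE 0 END),
       SUM(CASE WHEN v.status = 1 AND v.user_id = 1 AND v.is_anonymous = 0 THEN 1 ELSE 0 END),
       SUM(CASE WHEN v.status = 2 THEN 1 ELSE 0 END),
       SUM(CASE WHEN v.status = 2 AND v.user_id = 1 AND v.is_anonymous = 0 THEN 1 ELSE 0 END)
FROM talk_comments c
LEFT JOIN talk_comment_votes v ON v.comment_id = c.id
WHERE c.id = 2
GROUP BY c.id
"""

naive = conn.execute(naive_sql).fetchone()
conditional = conn.execute(conditional_sql).fetchone()
print("multiple joins:         ", naive)
print("conditional aggregation:", conditional)
```

With this data the two upvote rows are each paired with the single downvote row, so the multi-join version counts the downvote twice, while the single-join version sees each vote exactly once.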
FOLLOWUP
setup test case and observe execution plans and performance
populate tables
CREATE TABLE talk_comments (id INT NOT NULL PRIMARY KEY AUTO_INCREMENT);
CREATE TABLE talk_comment_votes (id INT NOT NULL PRIMARY KEY AUTO_INCREMENT, comment_id INT UNSIGNED NOT NULL, user_id INT UNSIGNED NOT NULL, is_anonymous TINYINT(1) UNSIGNED NOT NULL, STATUS TINYINT UNSIGNED, time_ INT UNSIGNED);
CREATE INDEX talk_comment_votes_IX1 ON talk_comment_votes (comment_id, STATUS, user_id, is_anonymous) ;
INSERT INTO talk_comments (id) VALUES (1),(2),(3);
INSERT INTO talk_comment_votes (id, comment_id, user_id, is_anonymous, STATUS, time_) VALUES (1,2,2,0,1,0),(2,1,1,0,1,0),(3,2,1,0,2,NULL),(4,7,1,0,2,NULL),(5,1,14,1,1,NULL),(6,2,14,1,1,NULL);
query execution plans
EXPLAIN
SELECT c.id, COUNT(DISTINCT v1.id) AS upvotes, COUNT(DISTINCT v2.id) AS userUpvoted, COUNT(DISTINCT v3.id) AS downvotes, COUNT(DISTINCT v4.id) AS userDownvoted FROM talk_comments c
LEFT JOIN talk_comment_votes v1 ON v1.comment_id = c.id AND v1.status = 1
LEFT JOIN talk_comment_votes v2 ON v2.comment_id = c.id AND v2.status = 1 AND v2.user_id = 1 AND v2.is_anonymous = 0
LEFT JOIN talk_comment_votes v3 ON c.id = v3.comment_id AND v3.status = 2
LEFT JOIN talk_comment_votes v4 ON c.id = v4.comment_id AND v4.status = 2 AND v4.user_id = 1 AND v4.is_anonymous = 0
WHERE c.id = 2 GROUP BY c.id
;
EXPLAIN
SELECT c.id
, SUM(IF(v.status = 1 ,1,0)) AS upvotes
, SUM(IF(v.status = 1 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userUpvoted
, SUM(IF(v.status = 2 ,1,0)) AS downvotes
, SUM(IF(v.status = 2 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userDownvoted
FROM talk_comments c
LEFT
JOIN talk_comment_votes v
ON v.comment_id = c.id
WHERE c.id = 2
GROUP BY c.id
;
output from explain
-- id select_type table type possible_keys key key_len ref rows Extra
-- ------ ----------- ------ ------ ---------------------- ---------------------- ------- ----------------------- ------ -------------
-- 1 SIMPLE c const PRIMARY PRIMARY 4 const 1 Using index
-- 1 SIMPLE v1 ref talk_comment_votes_IX1 talk_comment_votes_IX1 6 const,const 2 Using index
-- 1 SIMPLE v2 ref talk_comment_votes_IX1 talk_comment_votes_IX1 11 const,const,const,const 1 Using index
-- 1 SIMPLE v3 ref talk_comment_votes_IX1 talk_comment_votes_IX1 6 const,const 1 Using index
-- 1 SIMPLE v4 ref talk_comment_votes_IX1 talk_comment_votes_IX1 11 const,const,const,const 1 Using index
-- id select_type table type possible_keys key key_len ref rows Extra
-- ------ ----------- ------ ------ ---------------------- ---------------------- ------- ------ ------ -------------
-- 1 SIMPLE c const PRIMARY PRIMARY 4 const 1 Using index
-- 1 SIMPLE v ref talk_comment_votes_IX1 talk_comment_votes_IX1 4 const 3 Using index
measured performance:
100 executions round 1 round 2 round 3
------------------------------------ ---------- ---------- ---------
multiple left join, count(distinct 0.123 secs 0.130 secs 0.125 secs
conditional aggregation sum(if 0.113 secs 0.114 secs 0.111 secs

how to fetch record associated with all values in IN() Operator

I want to fetch a record only if it is associated with all of the values in an array. The problem is that the query fetches the record if any one of the values in IN() is present in the database. I want to fetch the record ONLY if all of the values are present.
SELECT J.ID , J.U_POST_ID,
J.TITLE,J.CREATION_DATE,J.STATUS,
R.FIRST_NAME, R.LAST_NAME,R.CLINICAL_CLINIC_NAME,
J.REQUIREMENT,J.STATE,J.CITY,J.DESCRIPTION,
J.CALL_DUR,J.USER_ID
FROM df_job_meta M
LEFT OUTER JOIN df_job_post J ON M.JOB_ID = J.ID
LEFT OUTER JOIN df_register_users R ON R.ID = J.USER_ID
WHERE
J.STATUS='ACTIVE' AND
R.OCCUPATION !='student' AND
J.STATE IN ('Maharashtra') AND
J.CITY IN ('Nagpur') and
M.VALUE IN ('Clinical','Fresher','BDS Intern','Full Time')
table df_job_meta
VALUE | META_KEY | JOB_ID
Part Time | work_hour | 103
BDS Intern | qualification | 103
Clinical | profile | 103
1 | num_vacancy | 103
1 to 3 Years | experience | 103
I think you'll need to generate some Dynamic SQL. For the set of coded values you gave:
SELECT J.ID , J.U_POST_ID,
J.TITLE,J.CREATION_DATE,J.STATUS,
R.FIRST_NAME, R.LAST_NAME,R.CLINICAL_CLINIC_NAME,
J.REQUIREMENT,J.STATE,J.CITY,J.DESCRIPTION,
J.CALL_DUR,J.USER_ID
FROM (
SELECT M1.JOB_ID
FROM df_job_meta M1
INNER JOIN df_job_meta M2 ON M1.JOB_ID = M2.JOB_ID
INNER JOIN df_job_meta M3 ON M1.JOB_ID = M3.JOB_ID
INNER JOIN df_job_meta M4 ON M1.JOB_ID = M4.JOB_ID
WHERE M1.VALUE = 'Clinical'
AND M2.VALUE = 'Fresher'
AND M3.VALUE = 'BDS Intern'
AND M4.VALUE = 'Full Time'
) AS JOBS_MATCHING_META
LEFT OUTER JOIN df_job_post J ON JOBS_MATCHING_META.JOB_ID = J.ID
LEFT OUTER JOIN df_register_users R ON R.ID = J.USER_ID
WHERE
J.STATUS='ACTIVE' AND
R.OCCUPATION !='student' AND
J.STATE IN ('Maharashtra') AND
J.CITY IN ('Nagpur')
;
(untested)
Looking at your sample data for df_job_meta, I wonder whether you need to check the META_KEY column too, i.e. a value is only valid if it belongs to the right key, for example:
WHERE M1.META_KEY = 'qualification' AND M1.VALUE = 'BDS Intern'
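An alternative to writing one self-join per value is to group the metadata rows by job and keep only the jobs that match every requested value. The sketch below shows that swapped-in technique (GROUP BY with HAVING COUNT(DISTINCT ...)), not the self-join query above. It runs against SQLite through Python as a stand-in for MySQL, and job 104 is made-up data added so that one job satisfies all four values.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE df_job_meta (VALUE TEXT, META_KEY TEXT, JOB_ID INTEGER);
INSERT INTO df_job_meta (VALUE, META_KEY, JOB_ID) VALUES
    -- rows for job 103 are taken from the question
    ('Part Time',    'work_hour',     103),
    ('BDS Intern',   'qualification', 103),
    ('Clinical',     'profile',       103),
    ('1',            'num_vacancy',   103),
    ('1 to 3 Years', 'experience',    103),
    -- job 104 is hypothetical: it carries all four requested values
    ('Full Time',    'work_hour',     104),
    ('BDS Intern',   'qualification', 104),
    ('Clinical',     'profile',       104),
    ('Fresher',      'experience',    104);
""")

wanted = ["Clinical", "Fresher", "BDS Intern", "Full Time"]
placeholders = ", ".join("?" for _ in wanted)

# Keep a job only if the number of distinct matching values equals the
# number of values asked for.
sql = f"""
SELECT JOB_ID
FROM df_job_meta
WHERE VALUE IN ({placeholders})
GROUP BY JOB_ID
HAVING COUNT(DISTINCT VALUE) = ?
ORDER BY JOB_ID
"""

matching_jobs = [row[0] for row in conn.execute(sql, wanted + [len(wanted)])]
print(matching_jobs)
```

Job 103 matches only two of the four values and is dropped. The resulting list of JOB_IDs could then be joined to df_job_post in place of the derived table; the query text no longer grows with the number of values, only the parameter list does.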
As a final observation, it looks like you are trying to implement some form of faceted search; perhaps it might be worth looking at some alternative data stores that implement that directly?

Avoid full table scan in inner select

I have the following select in MySQL, which produces the right results but it takes unnecessarily long to execute:
SELECT tblGPSDevices.Email, tblLoc.Lat, tblLoc.Lon, tblLoc.Radius, tblLoc.CreationTimeStamp, tblTrackedUsers.ID, tblTrackedUsers.TrackerDeviceID, tblTrackedUsers.TrackedDeviceID
FROM tblTrackedUsers
INNER JOIN tblGPSDevices ON tblTrackedUsers.TrackedDeviceID = tblGPSDevices.ID
LEFT OUTER JOIN (
SELECT A.DeviceID, A.Lat, A.Lon, A.Radius, A.CreationTimeStamp, A.ID
FROM tblLocations A
INNER JOIN (
SELECT DeviceID, MAX(CreationTimeStamp) AS CreationTimeStamp, MAX(ID) AS ID
FROM tblLocations
GROUP BY DeviceID
) AS B ON A.DeviceID = B.DeviceID
AND A.CreationTimeStamp = B.CreationTimeStamp
AND A.ID = B.ID
) AS tblLoc ON tblLoc.DeviceID = tblGPSDevices.ID
WHERE tblGPSDevices.Validated = 0x01
AND tblGPSDevices.Enabled = 0x01
AND tblTrackedUsers.Validated = 0x01
AND tblTrackedUsers.TrackerDeviceID = 1
ORDER BY tblTrackedUsers.ID;
This query runs much slower than it should because it does a full table scan on tblLocations.
This is the part that really slows down the query:
SELECT A.DeviceID, A.Lat, A.Lon, A.Radius, A.CreationTimeStamp, A.ID
FROM tblLocations A
INNER JOIN (
SELECT DeviceID, MAX(CreationTimeStamp) AS CreationTimeStamp, MAX(ID) AS ID
FROM tblLocations
GROUP BY DeviceID
) AS B ON A.DeviceID = B.DeviceID
AND A.CreationTimeStamp = B.CreationTimeStamp
AND A.ID = B.ID
Here is the explain plan:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY tblTrackedUsers ref TrackerDeviceID,TrackedDeviceID TrackerDeviceID 9 const 14 Using where; Using temporary; Using filesort
1 PRIMARY tblGPSDevices eq_ref PRIMARY PRIMARY 8 tblTrackedUsers.TrackedDeviceID 1 Using where
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 2073
2 DERIVED <derived3> ALL NULL NULL NULL NULL 2073
2 DERIVED A eq_ref PRIMARY,DeviceID,CreationTimeStampIndex PRIMARY 8 B.ID 1 Using where
3 DERIVED tblLocations index NULL DeviceID 8 NULL 174058
It does a full table scan on tblLocations, even though I only need a small subset of the DeviceIDs in that table.
I just need to look at the DeviceID's that are returned from this part:
INNER JOIN tblGPSDevices ON tblTrackedUsers.TrackedDeviceID = tblGPSDevices.ID
WHERE tblTrackedUsers.TrackerDeviceID = 1
But unfortunately tblTrackedUsers.TrackedDeviceID is not visible in the inner select. So if I add
WHERE DeviceID = tblTrackedUsers.TrackedDeviceID
right above
GROUP BY DeviceID
It does not work.
How can I go about optimizing this query?
Structure of the tables involved with the relevant fields only:
tblGPSDevices:
ID | Email | Validated | Enabled
tblLocations:
ID | DeviceID | Lat | Lon | Radius | CreationTimeStamp
tblTrackedUsers:
ID | TrackerDeviceID | TrackedDeviceID | Validated
tblLocations.DeviceID, tblTrackedUsers.TrackerDeviceID and tblTrackedUsers.TrackedDeviceID are foreign keys pointing to tblGPSDevices.ID
What this query does:
The query should return all devices from tblGPSDevices that are being tracked by the user, together with their last location from tblLocations. Which devices a user is tracking is recorded in tblTrackedUsers: select TrackedDeviceID from tblTrackedUsers where TrackerDeviceID = some_value
I did find the answer myself and I will post for future reference. This sped up the query from 2 sec. to 0.0179 sec. That is an enormous gain.
The key was to add one more inner select within:
SELECT DeviceID, MAX(CreationTimeStamp) AS CreationTimeStamp, MAX(ID) AS ID
FROM tblLocations
GROUP BY DeviceID
in order to avoid a full table scan, since we are only interested in DeviceIDs that are equal to tblTrackedUsers.TrackedDeviceID. Now this select looks like this:
SELECT C.DeviceID, MAX(C.CreationTimeStamp) AS CreationTimeStamp, MAX(C.ID) AS ID
FROM tblLocations C
INNER JOIN (
SELECT ID, TrackerDeviceID, TrackedDeviceID, TrackedName, AccessCode, Validated FROM tblAllowedUsers WHERE TrackerDeviceID = 1 AND Validated=0x01
) AS D ON D.TrackedDeviceID = C.DeviceID
GROUP BY DeviceID
Here is the full select now:
SELECT tblGPSDevices.Email, tblLoc.Lat, tblLoc.Lon, tblLoc.Radius, tblLoc.CreationTimeStamp, tblTrackedUsers.ID, tblTrackedUsers.TrackerDeviceID, tblTrackedUsers.TrackedDeviceID
FROM tblTrackedUsers
INNER JOIN tblGPSDevices ON tblTrackedUsers.TrackedDeviceID = tblGPSDevices.ID
LEFT OUTER JOIN (
SELECT A.DeviceID, A.Lat, A.Lon, A.Radius, A.CreationTimeStamp, A.ID
FROM tblLocations A
INNER JOIN (
SELECT C.DeviceID, MAX(C.CreationTimeStamp) AS CreationTimeStamp, MAX(C.ID) AS ID
FROM tblLocations C
INNER JOIN (
SELECT ID, TrackerDeviceID, TrackedDeviceID, TrackedName, AccessCode, Validated FROM tblAllowedUsers WHERE TrackerDeviceID = 1 AND Validated=0x01
) AS D ON D.TrackedDeviceID = C.DeviceID
GROUP BY DeviceID
) AS B ON A.DeviceID = B.DeviceID
AND A.CreationTimeStamp = B.CreationTimeStamp
AND A.ID = B.ID
) AS tblLoc ON tblLoc.DeviceID = tblGPSDevices.ID
WHERE tblGPSDevices.Validated = 0x01
AND tblGPSDevices.Enabled = 0x01
AND tblTrackedUsers.Validated = 0x01
AND tblTrackedUsers.TrackerDeviceID = 1
ORDER BY tblTrackedUsers.ID;
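The effect of the extra inner select can be tried on a toy dataset. The sketch below is a simplified version of the same pattern, run against SQLite through Python as a stand-in for MySQL: the MAX() aggregation only ever sees locations of devices tracked by tracker 1. The sample rows are invented, the table name tblTrackedUsers follows the question, and Validated is stored as a plain integer 1 rather than 0x01.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblTrackedUsers (
    ID INTEGER PRIMARY KEY,
    TrackerDeviceID INTEGER,
    TrackedDeviceID INTEGER,
    Validated INTEGER
);
CREATE TABLE tblLocations (
    ID INTEGER PRIMARY KEY,
    DeviceID INTEGER,
    Lat REAL,
    Lon REAL,
    CreationTimeStamp INTEGER
);
CREATE INDEX idx_loc_device ON tblLocations (DeviceID);

-- Invented sample data: tracker 1 follows devices 10 and 11, tracker 2 follows 12.
INSERT INTO tblTrackedUsers VALUES (1, 1, 10, 1), (2, 1, 11, 1), (3, 2, 12, 1);
INSERT INTO tblLocations VALUES
    (1, 10, 1.0, 1.0, 100),
    (2, 10, 1.5, 1.5, 200),
    (3, 11, 2.0, 2.0, 150),
    (4, 12, 3.0, 3.0, 300),
    (5, 12, 3.5, 3.5, 400);
""")

# The innermost select (D) narrows the devices first, so the GROUP BY in B
# only aggregates locations that belong to those devices.
sql = """
SELECT A.DeviceID, A.Lat, A.Lon, A.CreationTimeStamp
FROM tblLocations A
INNER JOIN (
    SELECT C.DeviceID,
           MAX(C.CreationTimeStamp) AS CreationTimeStamp,
           MAX(C.ID) AS ID
    FROM tblLocations C
    INNER JOIN (
        SELECT TrackedDeviceID
        FROM tblTrackedUsers
        WHERE TrackerDeviceID = ? AND Validated = 1
    ) AS D ON D.TrackedDeviceID = C.DeviceID
    GROUP BY C.DeviceID
) AS B ON A.DeviceID = B.DeviceID
      AND A.CreationTimeStamp = B.CreationTimeStamp
      AND A.ID = B.ID
ORDER BY A.DeviceID
"""

latest = conn.execute(sql, (1,)).fetchall()
print(latest)
```

Device 12 never reaches the aggregation because it is not tracked by tracker 1. Whether a real server uses an index on DeviceID for the inner join depends on its optimizer and statistics, so it is worth confirming with EXPLAIN on the actual tables.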

Rebuild MySQL query to stay below MAX_JOIN_SIZE rows

I have a SQL query which fails (most of the time) because of too many joined rows. The error provided by MySQL is: The SELECT would examine more than MAX_JOIN_SIZE rows; check your WHERE and use SET SQL_BIG_SELECTS=1 or SET MAX_JOIN_SIZE=# if the SELECT is okay. I know I can avoid the error by setting the mentioned variables SQL_BIG_SELECTS and MAX_JOIN_SIZE, but I feel like this isn't the right way and only postpones the problem, because the number of joined rows might grow in the future.
The facts: I have an event planning tool which assigns users (=workers) to certain tasks. The tables are users (userid,username) [ID and name], tasks (taskid,task,start,end) [ID, task name, start as timestamp, end as timestamp] and userassignment (id,userid,taskid,deleted) [ID, user assigned to a task, the task, is the assignment still valid].
The exact table definition is like this:
CREATE TABLE users (
userid INT NOT NULL AUTO_INCREMENT,
username VARCHAR(250),
PRIMARY KEY (userid)
);
CREATE TABLE tasks (
taskid INT NOT NULL AUTO_INCREMENT,
task VARCHAR(250),
start INT,
end INT,
PRIMARY KEY (taskid),
INDEX USING BTREE (start),
INDEX USING BTREE (end)
);
CREATE TABLE userassignment (
id INT NOT NULL AUTO_INCREMENT,
userid INT,
taskid INT,
deleted TINYINT,
PRIMARY KEY (id),
INDEX USING BTREE (userid),
INDEX USING BTREE (taskid),
UNIQUE KEY `usertasks` ( `userid` , `taskid` )
);
I need to know, which users are assigned and on which main days of the event (day 1, day 2, day 3) they're assigned.
My query looks like this:
SELECT
u.userid,
u.username,
COUNT(ua.id) AS count_all,
dayone.c AS count_one,
daytwo.c AS count_two,
daythree.c AS count_three
FROM
users AS u
INNER JOIN
userassignment AS ua ON ua.userid = u.userid AND ua.deleted = 0
INNER JOIN
tasks AS t ON ua.taskid = t.taskid
LEFT JOIN (
SELECT
u.userid,
COUNT(ua.id) AS c
FROM
users AS u
INNER JOIN
userassignment AS ua ON
ua.userid = u.userid AND
ua.deleted = 0
INNER JOIN
tasks AS t ON
ua.taskid = t.taskid
WHERE
t.start > UNIX_TIMESTAMP("2014-08-01 00:00:00") AND
t.start < UNIX_TIMESTAMP("2014-08-02 00:00:00")
GROUP BY
u.userid
) AS dayone ON dayone.userid = u.userid
LEFT JOIN (
SELECT
u.userid,
COUNT(ua.id) AS c
FROM
users AS u
INNER JOIN
userassignment AS ua ON
ua.userid = u.userid AND
ua.deleted = 0
INNER JOIN
tasks AS t ON
ua.taskid = t.taskid
WHERE
t.start > UNIX_TIMESTAMP("2014-07-31 00:00:00") AND
t.start < UNIX_TIMESTAMP("2014-08-01 00:00:00")
GROUP BY
u.userid
) AS daytwo ON daytwo.userid = u.userid
LEFT JOIN (
SELECT
u.userid,
COUNT(ua.id) AS c
FROM
users AS u
INNER JOIN
userassignment AS ua ON
ua.userid = u.userid AND
ua.deleted = 0
INNER JOIN
tasks AS t ON
ua.taskid = t.taskid
WHERE
t.start > UNIX_TIMESTAMP("2014-08-02 00:00:00") AND
t.start < UNIX_TIMESTAMP("2014-08-04 00:00:00")
GROUP BY
u.userid
) AS daythree ON daythree.userid = u.userid
WHERE
t.start > UNIX_TIMESTAMP("2014-07-31 00:00:00") AND
t.start < UNIX_TIMESTAMP("2014-08-04 00:00:00")
GROUP BY
u.userid
ORDER BY
username ASC
First I select all users who have an assignment on one of the three days (there are about six times more users in the DB than are assigned to a task), then I left join the assigned users for each of the three days.
So, is there a way to rebuild the query to join fewer rows? I only need to know, who is assigned on which day, not the number of assignments.
I already tried to UNION several queries but this was unsuccessful.
SQL Fiddle
An EXPLAIN of the real query (not in the SQL Fiddle) is:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY t range PRIMARY,start start 5 NULL 120 100.00 Using where; Using index; Using temporary; Using filesort
1 PRIMARY ua ref usertasks,userid,taskid taskid 2 db1154575-helfer.t.id 2 100.00 Using where
1 PRIMARY u eq_ref userid userid 2 db1154575-helfer.ua.userid 1 100.00
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 152 100.00
1 PRIMARY <derived3> ALL NULL NULL NULL NULL 94 100.00
1 PRIMARY <derived4> ALL NULL NULL NULL NULL 147 100.00
4 DERIVED t range PRIMARY,start start 5 NULL 53 100.00 Using where; Using index; Using temporary; Using filesort
4 DERIVED ua ref usertasks,userid,taskid taskid 2 db1154575-helfer.t.id 2 100.00 Using where
4 DERIVED u eq_ref userid userid 2 db1154575-helfer.ua.userid 1 100.00 Using index
3 DERIVED t range PRIMARY,start start 5 NULL 21 100.00 Using where; Using index; Using temporary; Using filesort
3 DERIVED ua ref usertasks,userid,taskid taskid 2 db1154575-helfer.t.id 2 100.00 Using where
3 DERIVED u eq_ref userid userid 2 db1154575-helfer.ua.userid 1 100.00 Using index
2 DERIVED t range PRIMARY,start start 5 NULL 44 100.00 Using where; Using index; Using temporary; Using filesort
2 DERIVED ua ref usertasks,userid,taskid taskid 2 db1154575-helfer.t.id 2 100.00 Using where
2 DERIVED u eq_ref userid userid 2 db1154575-helfer.ua.userid 1 100.00 Using index
So, is all that really just a long-winded way of saying this...
SELECT u.*
, DATE(FROM_UNIXTIME(t.start)) dt
, COUNT(t.taskid) total
FROM users u
LEFT
JOIN userassignment ut
ON ut.userid = u.userid
AND ut.deleted = 0
LEFT
JOIN tasks t
ON t.taskid = ut.taskid
GROUP
BY u.userid
, DATE(FROM_UNIXTIME(t.start))
In the example above, you can change COUNT(t.taskid) to COUNT(CASE WHEN x = 'y' THEN z END) or SUM(CASE...
This should return the same result set:
SELECT u.userid, u.username,
COUNT(ua.id) AS count_all,
SUM(case when t.start > UNIX_TIMESTAMP('2014-08-01 00:00:00') AND
t.start < UNIX_TIMESTAMP('2014-08-02 00:00:00')
then 1 else 0
end) as count_one,
SUM(case when t.start > UNIX_TIMESTAMP('2014-07-31 00:00:00') AND
t.start < UNIX_TIMESTAMP('2014-08-01 00:00:00')
then 1 else 0
end) as count_two,
SUM(case when t.start > UNIX_TIMESTAMP('2014-08-02 00:00:00') AND
t.start < UNIX_TIMESTAMP('2014-08-04 00:00:00')
then 1 else 0
end) as count_three
FROM users u LEFT JOIN
userassignment ua
ON ua.userid = u.userid AND
ua.deleted = 0 LEFT JOIN
tasks t
ON ua.taskid = t.taskid
WHERE ua.deleted = 0 AND
t.start > UNIX_TIMESTAMP('2014-07-31 00:00:00') AND
t.start < UNIX_TIMESTAMP('2014-08-04 00:00:00')
GROUP BY u.userid
ORDER BY u.username;
Your formulation is a bit tricky. The joins filter out any user whose assignments are always deleted, for instance. And the date periods are overlapping (I'm not sure if that is intentional, but it is how the query is structured).
Perhaps this simpler query will not exceed internal limits.
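The idea shared by both answers, one pass over the joined rows with one conditional SUM per day instead of one derived table per day, can be tried on a toy dataset. This is a sketch under assumptions: SQLite through Python stands in for MySQL, the timestamps are computed in Python as UTC instead of with UNIX_TIMESTAMP, the users and tasks are invented, the end column is omitted, and INNER JOINs are used as in the outer query of the question. The day ranges are copied from the question.

```python
import sqlite3
from calendar import timegm
from datetime import datetime


def ts(text):
    """Unix timestamp for a 'YYYY-MM-DD HH:MM:SS' string, read as UTC."""
    return timegm(datetime.strptime(text, "%Y-%m-%d %H:%M:%S").timetuple())


# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (userid INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE tasks (taskid INTEGER PRIMARY KEY, task TEXT, start INTEGER);
CREATE TABLE userassignment (
    id INTEGER PRIMARY KEY, userid INTEGER, taskid INTEGER, deleted INTEGER
);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
""")
# Invented tasks, one per day from 31 July to 3 August 2014.
conn.executemany("INSERT INTO tasks VALUES (?, ?, ?)", [
    (1, "setup", ts("2014-07-31 09:00:00")),
    (2, "doors", ts("2014-08-01 09:00:00")),
    (3, "stage", ts("2014-08-02 09:00:00")),
    (4, "teardown", ts("2014-08-03 09:00:00")),
])
# carol only has a deleted assignment; bob has one deleted assignment.
conn.executemany("INSERT INTO userassignment VALUES (?, ?, ?, ?)", [
    (1, 1, 1, 0), (2, 1, 2, 0), (3, 1, 3, 0),
    (4, 2, 2, 0), (5, 2, 4, 0), (6, 2, 3, 1),
    (7, 3, 1, 1),
])

sql = """
SELECT u.userid, u.username,
       COUNT(ua.id) AS count_all,
       SUM(CASE WHEN t.start > :d1 AND t.start < :d2 THEN 1 ELSE 0 END) AS count_one,
       SUM(CASE WHEN t.start > :d0 AND t.start < :d1 THEN 1 ELSE 0 END) AS count_two,
       SUM(CASE WHEN t.start > :d2 AND t.start < :d4 THEN 1 ELSE 0 END) AS count_three
FROM users u
INNER JOIN userassignment ua ON ua.userid = u.userid AND ua.deleted = 0
INNER JOIN tasks t ON t.taskid = ua.taskid
WHERE t.start > :d0 AND t.start < :d4
GROUP BY u.userid, u.username
ORDER BY u.username
"""
params = {
    "d0": ts("2014-07-31 00:00:00"),
    "d1": ts("2014-08-01 00:00:00"),
    "d2": ts("2014-08-02 00:00:00"),
    "d4": ts("2014-08-04 00:00:00"),
}

rows = conn.execute(sql, params).fetchall()
for row in rows:
    print(row)
```

Each user row is joined to its assignments only once, so the number of rows examined grows with the number of assignments rather than with the product of several derived tables. If only "assigned on that day or not" is needed, the counts can be compared with zero in the application or wrapped in a comparison in the SELECT list.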

Optimizing a MySql database for large queries

I am building a big database. One of my tables has 300K records and another one has 5 million records.
I currently have all foreign keys and the column named "ph.trigger_on" indexed.
My question is: how can I optimize my tables/query to get faster results? I tried to create a view with this code, so that I can get all the information I need by querying the view.
But the query is still slow, and I am having difficulty understanding the results that EXPLAIN is showing.
This is my current query:
EXPLAIN SELECT
ac.account_name AS accountName,
tm.name AS teamName,
cp.name AS campaignName,
cc.call_code_name AS callCode,
rc.result_code_name AS resultCode,
zn.name AS zoneName,
ind.name AS industry,
CONCAT_WS(' ', su.first_name, su.middle_name, su.last_name) AS owner_name,
su.login_user AS ownerLoginUser,
CONCAT_WS(' ', su1.first_name, su1.middle_name, su1.last_name) AS firstAttemptBy,
CONCAT_WS(' ', su2.first_name, su2.middle_name, su2.last_name) AS lastAttemptBy,
CONCAT_WS(' ', su3.first_name, su3.middle_name, su3.last_name) AS modifiedBy,
ci.name AS clientName,
ph.trigger_on AS triggerOn,
ph.created_on AS createdOn,
ph.first_attempt_on AS firstAttemptOn,
ph.call_subject AS callSubject,
ph.status,
ph.last_attempt_on AS lastAttemptOn,
ph.total_attempts AS totalAttempts,
ph.call_direction AS callDirection,
ph.call_notes AS callNotes,
ph.call_duration AS callDuration,
ph.modified_on AS modifiedOn
FROM phone_calls AS ph
INNER JOIN accounts AS ac ON ph.account_id = ac.account_id
INNER JOIN clients AS ci ON ac.client_id = ci.client_id
INNER JOIN industries AS ind ON ac.industry_id = ind.industry_id
INNER JOIN call_codes AS cc ON ph.call_code_id = cc.call_code_id
INNER JOIN time_zones AS zn ON ph.time_zone_id = zn.time_zone_id
INNER JOIN users AS su ON ph.owner_id = su.user_id
LEFT JOIN teams AS tm ON ph.team_id = tm.team_id
LEFT JOIN result_codes AS rc ON ph.result_code_id = rc.result_code_id
LEFT JOIN campaigns AS cp ON ph.campaign_id = cp.campaign_id
LEFT JOIN users AS su1 ON ph.first_attempt_by = su1.user_id
LEFT JOIN users AS su2 ON ph.last_attempt_by = su2.user_id
LEFT JOIN users AS su3 ON ph.modified_by = su3.user_id
WHERE ph.trigger_on < now()
LIMIT 1000
This is my current output:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | ci | ALL | PRIMARY | | | | 1 |
1 | SIMPLE | zn | ALL | PRIMARY | | | | 1 | Using join buffer (Block Nested Loop)
1 | SIMPLE | su | ALL | PRIMARY | | | | 1 | Using join buffer (Block Nested Loop)
1 | SIMPLE | ac | ref | PRIMARY,client_id,industry_id | client_id | 4 | rdi_cms.ci.client_id | 95917 |
1 | SIMPLE | ind | eq_ref | PRIMARY | PRIMARY | 4 | rdi_cms.ac.industry_id | 1 |
1 | SIMPLE | ph | ref | owner_id,call_code_id,account_id,time_zone_id,trigger_on | account_id | 4 | rdi_cms.ac.account_id | 11 | Using where
1 | SIMPLE | tm | ALL | PRIMARY | | | | 1 | Using where; Using join buffer (Block Nested Loop)
1 | SIMPLE | rc | eq_ref | PRIMARY | PRIMARY | 4 | rdi_cms.ph.result_code_id | 1 |
1 | SIMPLE | cc | eq_ref | PRIMARY | PRIMARY | 4 | rdi_cms.ph.call_code_id | 1 |
1 | SIMPLE | cp | ALL | PRIMARY | | | | 1 | Using where; Using join buffer (Block Nested Loop)
1 | SIMPLE | su1 | ALL | PRIMARY | | | | 1 | Using where; Using join buffer (Block Nested Loop)
1 | SIMPLE | su2 | ALL | PRIMARY | | | | 1 | Using where; Using join buffer (Block Nested Loop)
1 | SIMPLE | su3 | ALL | PRIMARY | | | | 1 | Using where; Using join buffer (Block Nested Loop)
What can I do to improve my tables or my query?
It could make a difference if you push a join into a subquery in the SELECT part of your query, like this:
SELECT
ac.account_name AS accountName,
tm.name AS teamName,
cp.name AS campaignName,
cc.call_code_name AS callCode,
rc.result_code_name AS resultCode,
(SELECT zn.name FROM time_zones AS zn WHERE ph.time_zone_id = zn.time_zone_id) AS zoneName,
(SELECT ind.name FROM industries AS ind WHERE ac.industry_id = ind.industry_id) AS industry,
(SELECT CONCAT_WS(' ', su.first_name, su.middle_name, su.last_name) FROM users AS su WHERE ph.owner_id = su.user_id) AS owner_name,
(SELECT su.login_user FROM users AS su WHERE ph.owner_id = su.user_id) AS ownerLoginUser,
CONCAT_WS(' ', su1.first_name, su1.middle_name, su1.last_name) AS firstAttemptBy,
CONCAT_WS(' ', su2.first_name, su2.middle_name, su2.last_name) AS lastAttemptBy,
CONCAT_WS(' ', su3.first_name, su3.middle_name, su3.last_name) AS modifiedBy,
ci.name AS clientName,
ph.trigger_on AS triggerOn,
ph.created_on AS createdOn,
ph.first_attempt_on AS firstAttemptOn,
ph.call_subject AS callSubject,
ph.status,
ph.last_attempt_on AS lastAttemptOn,
ph.total_attempts AS totalAttempts,
ph.call_direction AS callDirection,
ph.call_notes AS callNotes,
ph.call_duration AS callDuration,
ph.modified_on AS modifiedOn
FROM phone_calls AS ph
INNER JOIN accounts AS ac ON ph.account_id = ac.account_id
INNER JOIN clients AS ci ON ac.client_id = ci.client_id
INNER JOIN call_codes AS cc ON ph.call_code_id = cc.call_code_id
LEFT JOIN teams AS tm ON ph.team_id = tm.team_id
LEFT JOIN result_codes AS rc ON ph.result_code_id = rc.result_code_id
LEFT JOIN campaigns AS cp ON ph.campaign_id = cp.campaign_id
LEFT JOIN users AS su1 ON ph.first_attempt_by = su1.user_id
LEFT JOIN users AS su2 ON ph.last_attempt_by = su2.user_id
LEFT JOIN users AS su3 ON ph.modified_by = su3.user_id
WHERE ph.trigger_on < now()
LIMIT 1000
Here I pushed three joins (time_zones, industries and the owner's users row) into the SELECT part of your query.