Avoid full table scan in inner select

Avoid full table scan in inner select - mysql

I have the following select in MySQL, which produces the right results but it takes unnecessarily long to execute:
SELECT tblGPSDevices.Email, tblLoc.Lat, tblLoc.Lon, tblLoc.Radius, tblLoc.CreationTimeStamp, tblTrackedUsers.ID, tblTrackedUsers.TrackerDeviceID, tblTrackedUsers.TrackedDeviceID
FROM tblTrackedUsers
INNER JOIN tblGPSDevices ON tblTrackedUsers.TrackedDeviceID = tblGPSDevices.ID
LEFT OUTER JOIN (
SELECT A.DeviceID, A.Lat, A.Lon, A.Radius, A.CreationTimeStamp, A.ID
FROM tblLocations A
INNER JOIN (
SELECT DeviceID, MAX(CreationTimeStamp) AS CreationTimeStamp, MAX(ID) AS ID
FROM tblLocations
GROUP BY DeviceID
) AS B ON A.DeviceID = B.DeviceID
AND A.CreationTimeStamp = B.CreationTimeStamp
AND A.ID = B.ID
) AS tblLoc ON tblLoc.DeviceID = tblGPSDevices.ID
WHERE tblGPSDevices.Validated = 0x01
AND tblGPSDevices.Enabled = 0x01
AND tblTrackedUsers.Validated = 0x01
AND tblTrackedUsers.TrackerDeviceID = 1
ORDER BY tblTrackedUsers.ID;
This query runs much slower than it should because it does a full table scan on tblLocations.
This is the part that really slows down the query:
SELECT A.DeviceID, A.Lat, A.Lon, A.Radius, A.CreationTimeStamp, A.ID
FROM tblLocations A
INNER JOIN (
SELECT DeviceID, MAX(CreationTimeStamp) AS CreationTimeStamp, MAX(ID) AS ID
FROM tblLocations
GROUP BY DeviceID
) AS B ON A.DeviceID = B.DeviceID
AND A.CreationTimeStamp = B.CreationTimeStamp
AND A.ID = B.ID
Here is the explain plan:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY tblTrackedUsers ref TrackerDeviceID,TrackedDeviceID TrackerDeviceID 9 const 14 Using where; Using temporary; Using filesort
1 PRIMARY tblGPSDevices eq_ref PRIMARY PRIMARY 8 tblTrackedUsers.TrackedDeviceID 1 Using where
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 2073
2 DERIVED <derived3> ALL NULL NULL NULL NULL 2073
2 DERIVED A eq_ref PRIMARY,DeviceID,CreationTimeStampIndex PRIMARY 8 B.ID 1 Using where
3 DERIVED tblLocations index NULL DeviceID 8 NULL 174058
It does a full table scan on tblLocations, even though I only need s small subset of DeviceID's in that table.
I just need to look at the DeviceID's that are returned from this part:
INNER JOIN tblGPSDevices ON tblTrackedUsers.TrackedDeviceID = tblGPSDevices.ID
WHERE tblTrackedUsers.TrackerDeviceID = 1
But unfortunately tblTrackedUsers.TrackedDeviceID is not visible in the inner select. So if I add
WHERE DeviceID = tblTrackedUsers.TrackedDeviceID
right above
GROUP BY DeviceID
It does not work.
How can I go about optimizing this query?
Structure of the tables involved with the relevant fields only:
tblGPSDevices:
ID | Email | Validated | Enabled
tblLocations:
ID | DeviceID | Lat | Lon | Radius | CreationTimeStamp
tblTrackedUsers:
ID | TrackerDeviceID | TrackedDeviceID | Validated
tblLocations.DeviceID, tblTrackedUsers.TrackerDeviceID and tblTrackedUsers.TrackedDeviceID are foreign keys pointing to tblGPSDevices.ID
What this query does:
The query should return all devices from tblGPSDevices that are being tracked by the user and their last location from tblLocations. The way to determine which devices are being tracked by a user is determined by tblTrackedUsers: select TrackedDeviceID from tblTrackedUsers where TrackerDeviceID = some_value

I did find the answer myself and I will post for future reference. This sped up the query from 2 sec. to 0.0179 sec. That is an enormous gain.
The key was to add one more inner select within:
SELECT DeviceID, MAX(CreationTimeStamp) AS CreationTimeStamp, MAX(ID) AS ID
FROM tblLocations
GROUP BY DeviceID
in order to avoid a full table scan, since we are only interested in DeviceID's that are = to tblTrackedUsers.TrackedDeviceID. Now this select looks like this:
SELECT C.DeviceID, MAX(C.CreationTimeStamp) AS CreationTimeStamp, MAX(C.ID) AS ID
FROM tblLocations C
INNER JOIN (
SELECT ID, TrackerDeviceID, TrackedDeviceID, TrackedName, AccessCode, Validated FROM tblAllowedUsers WHERE TrackerDeviceID = 1 AND Validated=0x01
) AS D ON D.TrackedDeviceID = C.DeviceID
GROUP BY DeviceID
Here is the full select now:
SELECT tblGPSDevices.Email, tblLoc.Lat, tblLoc.Lon, tblLoc.Radius, tblLoc.CreationTimeStamp, tblTrackedUsers.ID, tblTrackedUsers.TrackerDeviceID, tblTrackedUsers.TrackedDeviceID
FROM tblTrackedUsers
INNER JOIN tblGPSDevices ON tblTrackedUsers.TrackedDeviceID = tblGPSDevices.ID
LEFT OUTER JOIN (
SELECT A.DeviceID, A.Lat, A.Lon, A.Radius, A.CreationTimeStamp, A.ID
FROM tblLocations A
INNER JOIN (
SELECT C.DeviceID, MAX(C.CreationTimeStamp) AS CreationTimeStamp, MAX(C.ID) AS ID
FROM tblLocations C
INNER JOIN (
SELECT ID, TrackerDeviceID, TrackedDeviceID, TrackedName, AccessCode, Validated FROM tblAllowedUsers WHERE TrackerDeviceID = 1 AND Validated=0x01
) AS D ON D.TrackedDeviceID = C.DeviceID
GROUP BY DeviceID
) AS B ON A.DeviceID = B.DeviceID
AND A.CreationTimeStamp = B.CreationTimeStamp
AND A.ID = B.ID
) AS tblLoc ON tblLoc.DeviceID = tblGPSDevices.ID
WHERE tblGPSDevices.Validated = 0x01
AND tblGPSDevices.Enabled = 0x01
AND tblTrackedUsers.Validated = 0x01
AND tblTrackedUsers.TrackerDeviceID = 1
ORDER BY tblTrackedUsers.ID;

Related

How can I optimize my sql code?

I have following tables
contacts
contact_id | contact_slug | contact_first_name | contact_email | contact_date_added | company_id | contact_is_active | contact_subscribed | contact_last_name | contact_company | contact_twitter
contact_campaigns
contact_campaign_id | contact_id | contact_campaign_created | company_id | contact_campaign_sent
bundle_feedback
bundle_feedback_id | bundle_id, contact_id | company_id | bundle_feedback_rating | bundle_feedback_favorite_track_id | bundle_feedback_supporting | campaign_id
bundles
bundle_id | bundle_name | bundle_created | company_id | bundle_is_active
tracks
track_id | company_id | track_title
I wrote this query, but it works slowly, how can I optimize this query to make it faster ?
SELECT SQL_CALC_FOUND_ROWS c.contact_id,
c.contact_first_name,
c.contact_last_name,
c.contact_email,
c.contact_date_added,
c.contact_company,
c.contact_twitter,
concat(c.contact_first_name," ", c.contact_last_name) AS fullname,
c.contact_subscribed,
ifnull(icc.sendCampaignsCount, 0) AS sendCampaignsCount,
ifnull(round((ibf.countfeedbacks/sendCampaignsCount * 100),2), 0) AS percentFeedback,
ifnull(ibf.bundle_feedback_supporting, 0) AS feedbackSupporting
FROM contacts AS c
LEFT JOIN
(SELECT c.contact_id,
count(cc.contact_campaign_id) AS sendCampaignsCount
FROM contacts AS c
LEFT JOIN contact_campaigns AS cc ON cc.contact_id = c.contact_id
WHERE c.company_id = '876'
AND c.contact_is_active = '1'
AND cc.contact_campaign_sent = '1'
GROUP BY c.contact_id) AS icc ON icc.contact_id = c.contact_id
LEFT JOIN
(SELECT bf.contact_id,
count(*) AS countfeedbacks,
bf.bundle_feedback_supporting
FROM bundle_feedback bf
JOIN bundles b
JOIN contacts c
LEFT JOIN tracks t ON bf.bundle_feedback_favorite_track_id = t.track_id
WHERE bf.bundle_id = b.bundle_id
AND bf.contact_id = c.contact_id
AND bf.company_id='876'
GROUP BY bf.contact_id) AS ibf ON ibf.contact_id = c.contact_id
WHERE c.company_id = '876'
AND contact_is_active = '1'
ORDER BY percentFeedback DESC LIMIT 0, 25;

I have done 2 improvements
1) Removed the contacts which is getting joined unnecessarily twice and put the condition at the final where condition.
2) Removed as per SQL_CALC_FOUND_ROWS
Which is fastest? SELECT SQL_CALC_FOUND_ROWS FROM `table`, or SELECT COUNT(*)
SELECT c.contact_id,
c.contact_first_name,
c.contact_last_name,
c.contact_email,
c.contact_date_added,
c.contact_company,
c.contact_twitter,
concat(c.contact_first_name," ", c.contact_last_name) AS fullname,
c.contact_subscribed,
ifnull(icc.sendCampaignsCount, 0) AS sendCampaignsCount,
ifnull(round((ibf.countfeedbacks/sendCampaignsCount * 100),2), 0) AS percentFeedback,
ifnull(ibf.bundle_feedback_supporting, 0) AS feedbackSupporting
FROM contacts AS c
LEFT JOIN
(SELECT cc.contact_id,
count(cc.contact_campaign_id) AS sendCampaignsCount
FROM contact_campaigns
WHERE cc.contact_campaign_sent = '1'
GROUP BY cc.contact_id) AS icc ON icc.contact_id = c.contact_id
LEFT JOIN
(SELECT bf.contact_id,
count(*) AS countfeedbacks,
bf.bundle_feedback_supporting
FROM bundle_feedback bf
JOIN bundles b
LEFT JOIN tracks t ON bf.bundle_feedback_favorite_track_id = t.track_id
WHERE bf.bundle_id = b.bundle_id
GROUP BY bf.contact_id) AS ibf ON ibf.contact_id = c.contact_id
WHERE c.company_id = '876' and c.contact_is_active = '1'

First, you are not identifying any indexes you have to optimize the query. That said, I would ensure you have at least the following composite / covering indexes.
table index
contacts ( company_id, contact_is_active )
contact_campaigns ( contact_id, contact_campaign_sent )
bundle_feedback ( contact_id, bundle_feedback_supporting )
Next, as noted in other answer, unless you really need how many rows qualified, remove the "SQL_CALC_FOUND_ROWS".
In your first left-join (icc), you do a left-join on contact_campaigns (cc), but then throw into your WHERE clause an "AND cc.contact_campaign_sent = '1'" which turns that into an INNER JOIN. At the outer query level, these would result in no matching record and thus NULL for your percentage calculations.
In your second left-join (ibf), you are doing a join to the tracks table, but not utilizing anything from it. Also, you are joining to the bundles table but not using anything from there either -- unless you are getting multiple rows in the bundles and tracks tables which would result in a Cartesian result and possibly overstate your "CountFeedbacks" value. You also do not need the contacts table as you are not doing anything else with it, and the feedback table has the contact ID basis your are querying for. Since that is only grouped by the contact_id, your "bf.bundle_feedback_supporting" is otherwise wasted. If you want counts of feedback, just count from that table per contact ID and remove the rest. (also, the joins should have the "ON" clauses instead of within the WHERE clause for consistency)
Also, for your supporting feedback, the data type and value are unclear, so I implied as a Yes or No and have a SUM() based on how many are supporting. So, a given contact may have 100 records but only 37 are supporting. This gives you 1 record for the contact having BOTH values 100 and 37 respectively and not lost in a group by based on the first entry found for the contact.
I would try to summarize your query to below:
SELECT
c.contact_id,
c.contact_first_name,
c.contact_last_name,
c.contact_email,
c.contact_date_added,
c.contact_company,
c.contact_twitter,
concat(c.contact_first_name," ", c.contact_last_name) AS fullname,
c.contact_subscribed,
ifnull(icc.sendCampaignsCount, 0) AS sendCampaignsCount,
ifnull(round((ibf.countfeedbacks / icc.sendCampaignsCount * 100),2), 0) AS percentFeedback,
ifnull(ibf.SupportCount, 0) AS feedbackSupporting
FROM
contacts AS c
LEFT JOIN
( SELECT
c.contact_id,
count(*) AS sendCampaignsCount
FROM
contacts AS c
JOIN contact_campaigns AS cc
ON c.contact_id = cc.contact_id
AND cc.contact_campaign_sent = '1'
WHERE
c.company_id = '876'
AND c.contact_is_active = '1'
GROUP BY
c.contact_id) AS icc
ON c.contact_id = icc.contact_id
LEFT JOIN
( SELECT
bf.contact_id,
count(*) AS countfeedbacks,
SUM( case when bf.bundle_feedback_supporting = 'Y'
then 1 else 0 end ) as SupportCount
FROM
contacts AS c
JOIN bundle_feedback bf
ON c.contact_id = bf.contact_id
WHERE
c.company_id = '876'
AND c.contact_is_active = '1'
GROUP BY
bf.contact_id) AS ibf
ON c.contact_id = ibf.contact_id
WHERE
c.company_id = '876'
AND c.contact_is_active = '1'
ORDER BY
percentFeedback DESC LIMIT 0, 25;

Rebuild MySQL query to stay below MAX_JOIN_SIZE rows

I have a SQL query which fails (most of the times) because of too many joined rows. The error provided by MySQL is The SELECT would examine more than MAX_JOIN_SIZE rows; check your WHERE and use SET SQL_BIG_SELECTS=1 or SET MAX_JOIN_SIZE=# if the SELECT is okay. I know I can avoid the error by setting the mentioned variables SQL_BIG_SELECTS and MAX_JOIN_SIZE, but I feel like this isn't the right way and pushes the problem only a bit in the future, because the join count might grow in the future.
The facts: I have an event planning tool which assigns users (=workers) to certain tasks. The tables are users (userid,username) [ID and name], tasks (taskid,task,start,end) [ID, task name, start as timestamp, end as timestamp] and userassignment (id,userid,taskid,deleted) [ID, user assigned to a task, the task, is the assignment still valid).
The exact table definition is like this:
CREATE TABLE users (
userid INT NOT NULL AUTO_INCREMENT,
username VARCHAR(250),
PRIMARY KEY (userid)
);
CREATE TABLE tasks (
taskid INT NOT NULL AUTO_INCREMENT,
task VARCHAR(250),
start INT,
end INT,
PRIMARY KEY (taskid),
INDEX USING BTREE (start),
INDEX USING BTREE (end)
);
CREATE TABLE userassignment (
id INT NOT NULL AUTO_INCREMENT,
userid INT,
taskid INT,
deleted TINYINT,
PRIMARY KEY (id),
INDEX USING BTREE (userid),
INDEX USING BTREE (userid),
UNIQUE KEY `usertasks` ( `userid` , `taskid` )
);
I need to know, which users are assigned and on which main days of the event (day 1, day 2, day 3) they're assigned.
My query looks like this:
SELECT
u.userid,
u.username,
COUNT(ua.id) AS count_all,
dayone.c AS count_one,
daytwo.c AS count_two,
daythree.c AS count_three
FROM
users AS u
INNER JOIN
userassignment AS ua ON ua.userid = u.userid AND ua.deleted = 0
INNER JOIN
tasks AS t ON ua.taskid = t.taskid
LEFT JOIN (
SELECT
u.userid,
COUNT(ua.id) AS c
FROM
users AS u
INNER JOIN
userassignment AS ua ON
ua.userid = u.userid AND
ua.deleted = 0
INNER JOIN
tasks AS t ON
ua.taskid = t.taskid
WHERE
t.start > UNIX_TIMESTAMP("2014-08-01 00:00:00") AND
t.start < UNIX_TIMESTAMP("2014-08-02 00:00:00")
GROUP BY
u.userid
) AS dayone ON dayone.userid = u.userid
LEFT JOIN (
SELECT
u.userid,
COUNT(ua.id) AS c
FROM
users AS u
INNER JOIN
userassignment AS ua ON
ua.userid = u.userid AND
ua.deleted = 0
INNER JOIN
tasks AS t ON
ua.taskid = t.taskid
WHERE
t.start > UNIX_TIMESTAMP("2014-07-31 00:00:00") AND
t.start < UNIX_TIMESTAMP("2014-08-01 00:00:00")
GROUP BY
u.userid
) AS daytwo ON daytwo.userid = u.userid
LEFT JOIN (
SELECT
u.userid,
COUNT(ua.id) AS c
FROM
users AS u
INNER JOIN
userassignment AS ua ON
ua.userid = u.userid AND
ua.deleted = 0
INNER JOIN
tasks AS t ON
ua.taskid = t.taskid
WHERE
t.start > UNIX_TIMESTAMP("2014-08-02 00:00:00") AND
t.start < UNIX_TIMESTAMP("2014-08-04 00:00:00")
GROUP BY
u.userid
) AS daythree ON daythree.userid = u.userid
WHERE
t.start > UNIX_TIMESTAMP("2014-07-31 00:00:00") AND
t.start < UNIX_TIMESTAMP("2014-08-04 00:00:00")
GROUP BY
u.userid
ORDER BY
username ASC
First I select all users which have an assignment in one of the three days (there are about six time more users in the DB than assigned to a task), then I left join the assigned users of every of the three days.
So, is there a way to rebuild the query to join fewer rows? I only need to know, who is assigned on which day, not the number of assignments.
I already tried to UNION several queries but this was unsuccessful.
SQL Fiddle
An EXPLAIN of the real query (not in the SQL Fiddle) is:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY t range PRIMARY,start start 5 NULL 120 100.00 Using where; Using index; Using temporary; Using filesort
1 PRIMARY ua ref usertasks,userid,taskid taskid 2 db1154575-helfer.t.id 2 100.00 Using where
1 PRIMARY u eq_ref userid userid 2 db1154575-helfer.ua.userid 1 100.00
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 152 100.00
1 PRIMARY <derived3> ALL NULL NULL NULL NULL 94 100.00
1 PRIMARY <derived4> ALL NULL NULL NULL NULL 147 100.00
4 DERIVED t range PRIMARY,start start 5 NULL 53 100.00 Using where; Using index; Using temporary; Using filesort
4 DERIVED ua ref usertasks,userid,taskid taskid 2 db1154575-helfer.t.id 2 100.00 Using where
4 DERIVED u eq_ref userid userid 2 db1154575-helfer.ua.userid 1 100.00 Using index
3 DERIVED t range PRIMARY,start start 5 NULL 21 100.00 Using where; Using index; Using temporary; Using filesort
3 DERIVED ua ref usertasks,userid,taskid taskid 2 db1154575-helfer.t.id 2 100.00 Using where
3 DERIVED u eq_ref userid userid 2 db1154575-helfer.ua.userid 1 100.00 Using index
2 DERIVED t range PRIMARY,start start 5 NULL 44 100.00 Using where; Using index; Using temporary; Using filesort
2 DERIVED ua ref usertasks,userid,taskid taskid 2 db1154575-helfer.t.id 2 100.00 Using where
2 DERIVED u eq_ref userid userid 2 db1154575-helfer.ua.userid 1 100.00 Using index

So, is all that really just a long-winded way of saying this...
SELECT u.*
, DATE(FROM_UNIXTIME(t.start)) dt
, COUNT(t.taskid) total
FROM users u
LEFT
JOIN userassignment ut
ON ut.userid = u.userid
AND ut.deleted = 0
LEFT
JOIN tasks t
ON t.taskid = ut.taskid
GROUP
BY u.userid
, DATE(FROM_UNIXTIME(t.start))
In the example above, you can change COUNT(t.taskid) to COUNT(CASE WHEN x = 'y' THEN z END) or SUM(CASE...

This should return the same result set:
SELECT u.userid, u.username,
COUNT(ua.id) AS count_all,
SUM(case when t.start > UNIX_TIMESTAMP('2014-08-01 00:00:00') AND
t.start < UNIX_TIMESTAMP('2014-08-02 00:00:00')
then 1 else 0
end) as count_one,
SUM(case when t.start > UNIX_TIMESTAMP('2014-07-31 00:00:00') AND
t.start < UNIX_TIMESTAMP('2014-08-01 00:00:00')
then 1 else 0
end) as count_two,
SUM(case when t.start > UNIX_TIMESTAMP('2014-08-02 00:00:00') AND
t.start < UNIX_TIMESTAMP('2014-08-04 00:00:00')
then 1 else 0
end) as count_three
FROM users u LEFT JOIN
userassignment ua
ON ua.userid = u.userid AND
ua.deleted = 0 LEFT JOIN
tasks t
ON ua.taskid = t.taskid
WHERE ua.deleted = 0 AND
t.start > UNIX_TIMESTAMP('2014-07-31 00:00:00') AND
t.start < UNIX_TIMESTAMP('2014-08-04 00:00:00')
GROUP BY u.userid
ORDER BY u.username;
Your formulation is a bit tricky. The outer joins are filter out any user whose assignments are always deleted, for instance. And the date periods are overlapping (I'm not sure if that is intentional, but it is how the query is structured).
Perhaps this simpler query will not exceed internal limits.

Optimizing a MySql database for large queries

I am building a big database that. one of my tables have 300K records and another once has 5 million record.
I currently have all foreign keys and the column named "ph.trigger_on" indexed.
My question is how can I optimize my table/query to get a faster results? I tried to create a view with this code then from this view I can get all the information that I need when I do a query to this view.
By the query is still slow and I am having difficulties understanding the results that EXPLAIN is showing.
This is my current query
EXPLAIN SELECT
ac.account_name AS accountName,
tm.name AS teamName,
cp.name AS campaignName,
cc.call_code_name AS callCode,
rc.result_code_name AS resultCode,
zn.name AS zoneName,
ind.name AS industry,
(su.first_name + su.middle_name + su.last_name) AS owner_name,
su.login_user AS ownerLoginUser,
(su1.first_name + su1.middle_name + su1.last_name) AS firstAttemptBy,
(su2.first_name + su2.middle_name + su2.last_name) AS lastAttemptBy,
(su3.first_name + su3.middle_name + su.last_name) AS modifiedBy,
ci.name AS clientName,
ph.trigger_on AS triggerOn,
ph.created_on AS createdOn,
ph.first_attempt_on AS firstAttemptOn,
ph.call_subject AS callSubject,
ph.status,
ph.last_attempt_on AS lastAttemptOn,
ph.total_attempts AS totalAttempts,
ph.call_direction AS callDirection,
ph.call_notes AS callNotes,
ph.call_duration AS callDuration,
ph.modified_on AS modifiedOn
FROM phone_calls AS ph
INNER JOIN accounts AS ac ON ph.account_id = ac.account_id
INNER JOIN clients AS ci ON ac.client_id = ci.client_id
INNER JOIN industries AS ind ON ac.industry_id = ind.industry_id
INNER JOIN call_codes AS cc ON ph.call_code_id = cc.call_code_id
INNER JOIN time_zones AS zn ON ph.time_zone_id = zn.time_zone_id
INNER JOIN users AS su ON ph.owner_id = su.user_id
LEFT JOIN teams AS tm ON ph.team_id = tm.team_id
LEFT JOIN result_codes AS rc ON ph.result_code_id = rc.result_code_id
LEFT JOIN campaigns AS cp ON ph.campaign_id = cp.campaign_id
LEFT JOIN users AS su1 ON ph.first_attempt_by = su1.user_id
LEFT JOIN users AS su2 ON ph.last_attempt_by = su2.user_id
LEFT JOIN users AS su3 ON ph.modified_by = su3.user_id
WHERE ph.trigger_on < now()
LIMIT 1000
this is my current output.
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE ci ALL PRIMARY 1
1 SIMPLE zn ALL PRIMARY 1 Using join buffer (Block Nested Loop)
1 SIMPLE su ALL PRIMARY 1 Using join buffer (Block Nested Loop)
1 SIMPLE ac ref PRIMARY,client_id,industry_id client_id 4 rdi_cms.ci.client_id 95917
1 SIMPLE ind eq_ref PRIMARY PRIMARY 4 rdi_cms.ac.industry_id 1
1 SIMPLE ph ref owner_id,call_code_id,account_id,time_zone_id,trigger_on account_id 4 rdi_cms.ac.account_id 11 Using where
1 SIMPLE tm ALL PRIMARY 1 Using where; Using join buffer (Block Nested Loop)
1 SIMPLE rc eq_ref PRIMARY PRIMARY 4 rdi_cms.ph.result_code_id 1
1 SIMPLE cc eq_ref PRIMARY PRIMARY 4 rdi_cms.ph.call_code_id 1
1 SIMPLE cp ALL PRIMARY 1 Using where; Using join buffer (Block Nested Loop)
1 SIMPLE su1 ALL PRIMARY 1 Using where; Using join buffer (Block Nested Loop)
1 SIMPLE su2 ALL PRIMARY 1 Using where; Using join buffer (Block Nested Loop)
1 SIMPLE su3 ALL PRIMARY 1 Using where; Using join buffer (Block Nested Loop)
What can I do to improve my tables or my query.

it could make a difference if you push a join into a subquery in the SELECT part of your query like this:
SELECT
ac.account_name AS accountName,
tm.name AS teamName,
cp.name AS campaignName,
cc.call_code_name AS callCode,
rc.result_code_name AS resultCode,
(SELECT zn.name FROM time_zones AS zn WHERE ph.time_zone_id = zn.time_zone_id) AS zoneName,
(SELECT ind.name FROM industries AS ind WHERE ac.industry_id = ind.industry_id) AS industry,
(SELECT su.first_name + su.middle_name + su.last_name users AS su WHERE ph.owner_id = su.user_id) AS owner_name,
su.login_user AS ownerLoginUser,
(su1.first_name + su1.middle_name + su1.last_name) AS firstAttemptBy,
(su2.first_name + su2.middle_name + su2.last_name) AS lastAttemptBy,
(su3.first_name + su3.middle_name + su.last_name) AS modifiedBy,
ci.name AS clientName,
ph.trigger_on AS triggerOn,
ph.created_on AS createdOn,
ph.first_attempt_on AS firstAttemptOn,
ph.call_subject AS callSubject,
ph.status,
ph.last_attempt_on AS lastAttemptOn,
ph.total_attempts AS totalAttempts,
ph.call_direction AS callDirection,
ph.call_notes AS callNotes,
ph.call_duration AS callDuration,
ph.modified_on AS modifiedOn
FROM phone_calls AS ph
INNER JOIN accounts AS ac ON ph.account_id = ac.account_id
INNER JOIN clients AS ci ON ac.client_id = ci.client_id
INNER JOIN call_codes AS cc ON ph.call_code_id = cc.call_code_id
INNER JOIN time_zones AS zn ON ph.time_zone_id = zn.time_zone_id
LEFT JOIN teams AS tm ON ph.team_id = tm.team_id
LEFT JOIN result_codes AS rc ON ph.result_code_id = rc.result_code_id
LEFT JOIN campaigns AS cp ON ph.campaign_id = cp.campaign_id
LEFT JOIN users AS su1 ON ph.first_attempt_by = su1.user_id
LEFT JOIN users AS su2 ON ph.last_attempt_by = su2.user_id
LEFT JOIN users AS su3 ON ph.modified_by = su3.user_id
WHERE ph.trigger_on < now()
LIMIT 1000
here i pushed 3 joins into your SELECT part.

OR in left join

billboards table 140000 rows, regions 1000 rows.
SELECT
r.id,
SUM(IF(bb.r1_id = r.id, 1, 0)) AS count,
SUM(IF(bb.r2_id = r.id, 1, 0)) AS count2
FROM
tmp_regions AS r
LEFT JOIN
tmp_billboards AS bb
ON (r.id = bb.r1_id OR r.id = bb.r2_id)
WHERE
bb.deleted = 0
AND
bb.x != 0
AND
bb.y != 0
GROUP BY r.id
ORDER BY r.capital DESC , r.other , r.name
execution time is 8 sec
Explain
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE bb ref bb_r,bb_deleted,bb_x,bb_y,deleted_x_y,bb_r2 bb_deleted 1 const 66396 Using where; Using temporary; Using filesort
1 SIMPLE r ALL PRIMARY NULL NULL NULL 1000 Using where; Using join buffer
how can i change OR in join to improve perfomance?

Add indexes. The output of explain shows you which fields need them.

Assuming that tmp_regions (id) is the primary key, you could rewrite the query and the OR is converted to 2 joins:
SELECT
r.id,
COALESCE(bb1.cnt, 0) AS count,
COALESCE(bb2.cnt, 0) AS count2
FROM
tmp_regions AS r
LEFT JOIN
( SELECT r1_id, COUNT(*) AS cnt
FROM tmp_billboards
WHERE deleted = 0
AND x <> 0
AND y <> 0
GROUP BY r1_id
) AS bb1
ON r.id = bb1.r1_id
LEFT JOIN
( SELECT r2_id, COUNT(*) AS cnt
FROM tmp_billboards
WHERE deleted = 0
AND x <> 0
AND y <> 0
GROUP BY r2_id
) AS bb2
ON r.id = bb2.r2_id
ORDER BY r.capital DESC , r.other , r.name ;
For efficiency, indexes on (deleted, r1_id, x, y) and (deleted, r2_id, x, y) would help to avoid table scans on the tmp_billboards.

how to rewrite the query to make it faster and sql compliant

This query:
SELECT `subscribers`.`email_address`, `subscribers`.`first_name`, `subscribers`.`last_name`,
GROUP_CONCAT( DISTINCT t1.value SEPARATOR '|' ) AS 'Colors', GROUP_CONCAT( DISTINCT t2.value SEPARATOR '|' ) AS 'Languages'
FROM `subscribers`
LEFT JOIN `subscribers_multivalued` AS `t1` ON subscribers.subscriber_id = t1.subscriber_id AND t1.field_id = 112
LEFT JOIN `subscribers_multivalued` AS `t2` ON subscribers.subscriber_id = t2.subscriber_id AND t2.field_id = 111
WHERE (list_id = 40) AND (state = 1)
GROUP BY `subscribers`.`email_address` , `subscribers`.`first_name` , `subscribers`.`last_name`
With execute plan:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE subscribers ref FK_list_id,state_date_added FK_list_id 4 const 1753610 Using where; Using filesort
1 SIMPLE t1 ref subscriber_fk,field_fk subscriber_fk 4 chad0598_mailablel.subscribers.subscriber_id 1
1 SIMPLE t2 ref subscriber_fk,field_fk subscriber_fk 4 chad0598_mailablel.subscribers.subscriber_id 1
Caused error for uknown reason: General error: 3 Error writing file '/tmp/MYzamaNT' (Errcode: 28) in mysql.
I rewrote it like this groupy by subscriber_id insead:
SELECT `subscribers`.`email_address`, `subscribers`.`first_name`, `subscribers`.`last_name`,
GROUP_CONCAT( DISTINCT t1.value SEPARATOR '|' ) AS 'Colors', GROUP_CONCAT( DISTINCT t2.value SEPARATOR '|' ) AS 'Languages'
FROM `subscribers`
LEFT JOIN `subscribers_multivalued` AS `t1` ON subscribers.subscriber_id = t1.subscriber_id AND t1.field_id = 112
LEFT JOIN `subscribers_multivalued` AS `t2` ON subscribers.subscriber_id = t2.subscriber_id AND t2.field_id = 111
WHERE (list_id = 40) AND (state = 1)
GROUP BY `subscribers`.`subscriber_id`
The plan of this query is better (not filesort):
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE subscribers ref FK_list_id,state_date_added FK_list_id 4 const 1753610 Using where
1 SIMPLE t1 ref subscriber_fk,field_fk subscriber_fk 4 chad0598_mailablel.subscribers.subscriber_id 1
1 SIMPLE t2 ref subscriber_fk,field_fk subscriber_fk 4 chad0598_mailablel.subscribers.subscriber_id 1
It seems that it works in mysql and it much faster that previous query.
Although it works in mysql it's not compliant sql standard there shouldn't be aggregated fields in fields' list that are not present in GROUP BY. I there a way to make the query as fast as the last one and make it sql compliant?
I tried to rewrite it this way too:
SELECT `subscribers`.`email_address`, `subscribers`.`first_name`, `subscribers`.`last_name`,
t1.colors AS 'Colors', t2.languages AS 'Languages'
FROM `subscribers`
LEFT JOIN (SELECT subscriber_id, GROUP_CONCAT( DISTINCT value SEPARATOR '|' ) AS 'colors'
FROM subscribers_multivalued
WHERE field_id = 112
GROUP BY subscriber_id) t1 ON t1.subscriber_id = `subscribers`.`subscriber_id`
LEFT JOIN (SELECT subscriber_id, GROUP_CONCAT( DISTINCT value SEPARATOR '|' ) AS 'languages'
FROM subscribers_multivalued
WHERE field_id = 111
GROUP BY subscriber_id) t2 ON t2.subscriber_id = `subscribers`.`subscriber_id`
The plan of this query is much worse:
1 PRIMARY subscribers ALL NULL NULL NULL NULL 23358546
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 900000
1 PRIMARY <derived3> ALL NULL NULL NULL NULL 900000
3 DERIVED subscribers_multivalued ALL field_fk field_fk 4 20621115 Using filesort
2 DERIVED subscribers_multivalued ALL field_fk field_fk 4 20621115 Using filesort
and I couldn't wait untill it returns data;

I'd keep the faster one. You can also test/benchmark this variation:
SELECT s.email_address,
s.first_name,
s.last_name,
COALESCE(t1.Colors, '') AS Colors,
COALESCE(t2.Languages, '') AS Languages
FROM subscribers AS s
LEFT JOIN
( SELECT t1.subscriber_id,
GROUP_CONCAT( DISTINCT t1.value SEPARATOR '|' ) AS Colors
FROM subscribers AS s
JOIN subscribers_multivalued AS t1
ON s.subscriber_id = t1.subscriber_id
WHERE t1.field_id = 112
AND s.list_id = 40
AND s.state = 1
GROUP BY t1.subscriber_id
) AS t1
ON s.subscriber_id = t1.subscriber_id
LEFT JOIN
( SELECT t2.subscriber_id,
GROUP_CONCAT( DISTINCT t2.value SEPARATOR '|' ) AS Languages
FROM subscribers AS s
JOIN subscribers_multivalued AS t2
ON s.subscriber_id = t2.subscriber_id
WHERE t2.field_id = 111
AND s.list_id = 40
AND s.state = 1
GROUP BY t2.subscriber_id
) AS t2
ON s.subscriber_id = t2.subscriber_id
WHERE s.list_id = 40
AND s.state = 1 ;
If (field_id, subscriber_id, value) is unique in table subscribers_multivalued, you can also drop the two DISTINCT.
Regarding speed and efficiency, check the indexes you have. These would help both your version and this one:
in the subscribers table, an index on either (list_id, state, subscriber_id) or (state, list_id, subscriber_id)
in the subscribers_multivalued table, an index on (field_id, subscriber_id, value) or (second best) on (field_id, subscriber_id).

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

Avoid full table scan in inner select - mysql

Related

How can I optimize my sql code?

Rebuild MySQL query to stay below MAX_JOIN_SIZE rows

Optimizing a MySql database for large queries

OR in left join

how to rewrite the query to make it faster and sql compliant

Categories

Resources