MySQL UNION: Which method is quickest - mysql

Normally I go for the smallest code, but I'm currently building a payment gateway that could potentially handle millions of connections a day, so I build in every tweak that helps performance. I have a question that can only really be answered by the MySQL gurus.
With a union, is it best to filter each side of the union with WHERE clauses, or to collate the records and then filter the combined set?
Which one is quickest:
1)
select * from (
select * from payment_routes
where channel_id is null
and currency_id != v_currency_id
and card_id = v_card_id
and enabled = 1
union
select * from payment_routes
where channel_id = p_channel_id
and currency_id = v_currency_id
and card_id = v_card_id
and enabled = 1
) t1
join gateways g on t1.gateway_id = g.id
where t1.currency_id = v_currency_id
;
2) Or is this shorter code just as quick, or quicker (but not slower)?
select * from (
select * from payment_routes
where channel_id is null
and currency_id != v_currency_id
union
select * from payment_routes
where channel_id = p_channel_id
and currency_id = v_currency_id
) t1
join gateways g on t1.gateway_id = g.id
where t1.currency_id = v_currency_id
and t1.card_id = v_card_id
and t1.enabled = 1
;
Logically, I would say (1) is quicker, as each side of the union is filtered first, so fewer records need to be combined.
UPDATE
I found a better way to write this without a union
select r.*, g.`name` 'gateway_name'
from payment_routes r
join gateways g on r.gateway_id = g.id
where (
(channel_id is null and currency_id != v_sell_currency_id)
or
(channel_id = p_channel_id and currency_id = v_sell_currency_id)
)
and card_id = v_card_id
and currency_id = v_pay_currency_id
and enabled = 1
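Before timing variants (1) and (2), it can help to confirm that they return the same rows. The following is a minimal sketch, not a benchmark: it uses Python's sqlite3 module as a stand-in for MySQL, and the sample rows and the parameter values (cur, chan, card) are made up for illustration. It also shows that, with the outer filter on currency_id, the first branch of the union cannot contribute any rows in either variant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gateways (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE payment_routes (
    id INTEGER PRIMARY KEY, gateway_id INT, channel_id INT,
    currency_id INT, card_id INT, enabled INT);
INSERT INTO gateways VALUES (1, 'gw_a'), (2, 'gw_b');
INSERT INTO payment_routes VALUES
    (1, 1, NULL, 2, 7, 1),
    (2, 1, 5,    1, 7, 1),
    (3, 2, 5,    1, 7, 0),
    (4, 2, 5,    1, 8, 1),
    (5, 2, NULL, 1, 7, 1);
""")
# Hypothetical values standing in for v_currency_id, p_channel_id, v_card_id.
params = {"cur": 1, "chan": 5, "card": 7}

# Variant (1): card_id and enabled are filtered inside each branch.
filter_inside = """
SELECT t1.id FROM (
    SELECT * FROM payment_routes
    WHERE channel_id IS NULL AND currency_id != :cur
      AND card_id = :card AND enabled = 1
    UNION
    SELECT * FROM payment_routes
    WHERE channel_id = :chan AND currency_id = :cur
      AND card_id = :card AND enabled = 1
) t1 JOIN gateways g ON t1.gateway_id = g.id
WHERE t1.currency_id = :cur
ORDER BY t1.id"""

# Variant (2): card_id and enabled are filtered on the combined set.
filter_outside = """
SELECT t1.id FROM (
    SELECT * FROM payment_routes
    WHERE channel_id IS NULL AND currency_id != :cur
    UNION
    SELECT * FROM payment_routes
    WHERE channel_id = :chan AND currency_id = :cur
) t1 JOIN gateways g ON t1.gateway_id = g.id
WHERE t1.currency_id = :cur AND t1.card_id = :card AND t1.enabled = 1
ORDER BY t1.id"""

rows_inside = conn.execute(filter_inside, params).fetchall()
rows_outside = conn.execute(filter_outside, params).fetchall()
print(rows_inside, rows_outside)
```

Once the results agree, running EXPLAIN on both variants in MySQL itself is the way to see which one materializes fewer rows; SQLite's planner says nothing about that.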

Related

MySQL group by kills the query performance

I have a MySQL query that currently selects from and joins 13 tables and finally groups ~60k rows. Without grouping the query takes ~0 ms, but with grouping the query time increases to ~1.7 s. The field used for grouping is the primary key and is indexed. Where could the issue be?
I know GROUP BY without an aggregate is considered an invalid query and bad practice, but I need distinct base table rows and cannot use the DISTINCT syntax.
The query itself looks like this:
SELECT `table_a`.*
FROM `table_a`
LEFT JOIN `table_b`
ON `table_b`.`invoice` = `table_a`.`id`
LEFT JOIN `table_c` AS `r1`
ON `r1`.`invoice_1` = `table_a`.`id`
LEFT JOIN `table_c` AS `r2`
ON `r2`.`invoice_2` = `table_a`.`id`
LEFT JOIN `table_a` AS `i1`
ON `i1`.`id` = `r1`.`invoice_2`
LEFT JOIN `table_a` AS `i2`
ON `i2`.`id` = `r2`.`invoice_1`
JOIN `table_d` AS `_u0`
ON `_u0`.`id` = 1
LEFT JOIN `table_e` AS `_ug0`
ON `_ug0`.`user` = `_u0`.`id`
JOIN `table_f` AS `_p0`
ON ( `_p0`.`enabled` = 1
AND ( ( `_p0`.`role` < 2
AND `_p0`.`who` IS NULL )
OR ( `_p0`.`role` = 2
AND ( `_p0`.`who` = '0'
OR `_p0`.`who` = `_u0`.`id` ) )
OR ( `_p0`.`role` = 3
AND ( `_p0`.`who` = '0'
OR `_p0`.`who` = `_ug0`.`group` ) ) ) )
AND ( `_p0`.`action` = '*'
OR `_p0`.`action` = 'read' )
AND ( `_p0`.`related_table` = '*'
OR `_p0`.`related_table` = 'table_name' )
JOIN `table_a` AS `_e0`
ON ( ( `_p0`.`related_id` = 0
OR `_p0`.`related_id` = `_e0`.`id`
OR `_p0`.`related_user` = `_e0`.`user`
OR `_p0`.`related_group` = `_e0`.`group` )
OR ( `_p0`.`role` = 0
AND `_e0`.`user` = `_u0`.`id` )
OR ( `_p0`.`role` = 1
AND `_e0`.`group` = `_ug0`.`group` ) )
AND `_e0`.`id` = `table_a`.`id`
JOIN `table_d` AS `_u1`
ON `_u1`.`id` = 1
LEFT JOIN `table_e` AS `_ug1`
ON `_ug1`.`user` = `_u1`.`id`
JOIN `table_f` AS `_p1`
ON ( `_p1`.`enabled` = 1
AND ( ( `_p1`.`role` < 2
AND `_p1`.`who` IS NULL )
OR ( `_p1`.`role` = 2
AND ( `_p1`.`who` = '0'
OR `_p1`.`who` = `_u1`.`id` ) )
OR ( `_p1`.`role` = 3
AND ( `_p1`.`who` = '0'
OR `_p1`.`who` = `_ug1`.`group` ) ) ) )
AND ( `_p1`.`action` = '*'
OR `_p1`.`action` = 'read' )
AND ( `_p1`.`related_table` = '*'
OR `_p1`.`related_table` = 'table_name' )
JOIN `table_g` AS `_e1`
ON ( ( `_p1`.`related_id` = 0
OR `_p1`.`related_id` = `_e1`.`id`
OR `_p1`.`related_user` = `_e1`.`user`
OR `_p1`.`related_group` = `_e1`.`group` )
OR ( `_p1`.`role` = 0
AND `_e1`.`user` = `_u1`.`id` )
OR ( `_p1`.`role` = 1
AND `_e1`.`group` = `_ug1`.`group` ) )
AND `_e1`.`id` = `table_a`.`company`
WHERE `table_a`.`date_deleted` IS NULL
AND `table_a`.`company` = 4
AND `table_a`.`type` = 1
AND `table_a`.`date_composed` >= '2016-05-04 14:43:55'
GROUP BY `table_a`.`id`
The ORs kill performance.
This composite index may help: INDEX(company, type, date_deleted, date_composed).
LEFT JOIN table_b ON table_b.invoice = table_a.id seems to do absolutely nothing other than slow down the processing. No fields of table_b are used or SELECTed. Since it is a LEFT JOIN, it does not limit the output. Get rid of it, or justify it.
Ditto for other joins.
What happens with JOIN and GROUP BY: First, all the joins are performed; this explodes the number of rows in the intermediate 'table'. Then the GROUP BY implodes the set of rows.
One technique for avoiding this explode-implode sluggishness is to do
SELECT ...,
( SELECT ... ) AS ...,
...
instead of a JOIN or LEFT JOIN. However, that works only if there is zero or one row in the subquery. Usually this is beneficial when an aggregate (such as SUM) can be moved into the subquery.
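The explode-implode effect is easy to reproduce on a toy schema. This is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL; the invoice, line and note tables are hypothetical and not taken from the question. It counts the rows in the joined intermediate result, then compares a JOIN plus GROUP BY against a correlated subquery in the SELECT list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoice (id INTEGER PRIMARY KEY);
CREATE TABLE line    (invoice_id INT, amount INT);
CREATE TABLE note    (invoice_id INT, body TEXT);
INSERT INTO invoice VALUES (1), (2);
INSERT INTO line VALUES (1, 10), (1, 20), (1, 30), (2, 5);
INSERT INTO note VALUES (1, 'a'), (1, 'b'), (2, 'c');
""")

# Explode: every line pairs with every note of the same invoice.
exploded = conn.execute("""
    SELECT COUNT(*) FROM invoice i
    JOIN line l ON l.invoice_id = i.id
    JOIN note n ON n.invoice_id = i.id
""").fetchone()[0]

# Implode: GROUP BY collapses the rows again, but SUM sees the duplicates.
joined = conn.execute("""
    SELECT i.id, SUM(l.amount) FROM invoice i
    JOIN line l ON l.invoice_id = i.id
    JOIN note n ON n.invoice_id = i.id
    GROUP BY i.id ORDER BY i.id
""").fetchall()

# Correlated subquery: one output row per invoice, no intermediate blow-up.
subquery = conn.execute("""
    SELECT i.id,
           (SELECT SUM(l.amount) FROM line l WHERE l.invoice_id = i.id) AS total
    FROM invoice i ORDER BY i.id
""").fetchall()

print(exploded, joined, subquery)
```

Besides the extra work, the joined form also inflates the sum for invoice 1, which is another reason to move aggregates into a subquery when several one-to-many joins are involved.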
For further discussion, please include SHOW CREATE TABLE.

How to use both SELECT and UPDATE statements in one query?

I have this update query, which works well:
UPDATE tbname t CROSS JOIN ( SELECT related FROM tbname WHERE id = 5 ) x
SET AcceptedAnswer = ( id = 5 )
WHERE t.related = x.related
I also have two SELECT statements that validate some things. I want to check these two conditions before updating:
Condition1:
(SELECT 1 FROM tbname
WHERE id = x.related AND
author_id = 29
)
Condition2:
(SELECT 1 FROM tbname
WHERE id = x.related AND
(
( amount IS NOT NULL AND
NOT EXISTS ( SELECT 1 FROM tbname
WHERE related = x.related AND
AcceptedAnswer = 1 )
) OR amount IS NULL
)
)
How can I combine those two conditions with that updating query?
Here is what I've tried so far but it doesn't work and throws this error:
UPDATE tbname CROSS JOIN ( SELECT related FROM tbname WHERE id = 5 ) x
SET AcceptedAnswer = ( id = 5 )
WHERE q.related = x.related
AND
(SELECT 1 FROM tbname
WHERE id = x.related AND
author_id = 29
) AND
(SELECT 1 FROM tbname
WHERE id = x.related AND
(
( amount IS NOT NULL AND
NOT EXISTS ( SELECT 1 FROM tbname
WHERE related = x.related AND
AcceptedAnswer = 1 )
) OR amount IS NULL
)
)
#1093 - You can't specify target table 'tbname' for update in FROM clause
Your update seems to be equivalent to this:
update tbname as a
inner join tbname as b on a.related = b.related and b.id = 5
set a.AcceptedAnswer = (a.id = 5)
Your query seems to set AcceptedAnswer to true (1) on the row with id = 5 and to false (0) on every other row that has the same related value as the row with id = 5.
To test, use:
select * from tbname as a
inner join tbname as b on a.related = b.related and b.id = 5
and (b.related = a.id and a.author_id = 29)
and (b.related = a.id and
(a.amount is not null and (a.related = b.related and a.AcceptedAnswer = 1)))
I'm not quite sure what the purpose of the SET clause (id = 5) is.
Anyway, this approach avoids the CROSS JOIN, provided that you
don't use the table "x" for anything beyond the "related" items.
UPDATE tbname
SET
AcceptedAnswer = ( id = 5 )
WHERE
#THIS IS EQUIVALENT TO THE JOIN CLAUSE
id IN ( SELECT related FROM tbname WHERE id = 5 )
#THIS IS THE CONDITION 1 POINTING tbname
AND author_id = 29
#THIS IS THE CONDITION 2 POINTING tbname
AND (
( amount IS NOT NULL
AND NOT AcceptedAnswer = 1
) OR amount IS NULL
)
;

MySQL 500 million rows table in select query with join

I'm concerned about the performance of the query below once the tables are fully populated. So far it's under development and performs well with dummy data.
The table "adress_zoo" will contain about 500 million records once fully populated. "adress_zoo" table looks like this:
CREATE TABLE `adress_zoo`
( `adress_id` int(11) NOT NULL, `zoo_id` int(11) NOT NULL,
UNIQUE KEY `pk` (`adress_id`,`zoo_id`),
KEY `adress_id` (`adress_id`) )
ENGINE=InnoDB DEFAULT CHARSET=latin1;
The other tables will contain maximum 500 records each.
The full query looks like this:
SELECT a.* FROM jos_zoo_item AS a
JOIN jos_zoo_search_index AS zsi2 ON zsi2.item_id = a.id
WHERE a.id IN (
SELECT r.id FROM (
SELECT zi.id AS id, Max(zi.priority) as prio
FROM jos_zoo_item AS zi
JOIN jos_zoo_search_index AS zsi ON zsi.item_id = zi.id
LEFT JOIN jos_zoo_tag AS zt ON zt.item_id = zi.id
JOIN jos_zoo_category_item AS zci ON zci.item_id = zi.id
JOIN adress_zoo AS az ON az.zoo_id = zi.id
WHERE 1=1
AND ( (zci.category_id != 0 AND ( zt.name != 'prolong' OR zt.name is NULL))
OR (zci.category_id = 0 AND zt.name = 'prolong') )
AND zi.type = 'telefoni'
AND zsi.element_id = '44d3b1fd-40f6-4fd7-9444-7e11643e2cef'
AND zsi.value = 'Small'
AND zci.category_id > 15
AND az.adress_id = 5
GROUP BY zci.category_id ) AS r
)
AND a.application_id = 6
AND a.access IN (1,1)
AND a.state = 1
AND (a.publish_up = '0000-00-00 00:00:00' OR a.publish_up <= '2012-06-07 07:51:26')
AND (a.publish_down = '0000-00-00 00:00:00' OR a.publish_down >= '2012-06-07 07:51:26')
AND zsi2.element_id = '1c3cd26e-666d-4f8f-a465-b74fffb4cb14'
GROUP BY a.id
ORDER BY zsi2.value ASC
The query will usually return about 25 records.
Based on your experience, will this query perform acceptably (respond within, say, 3 seconds)?
What can I do to optimise this?
As advised by @Jack, I ran the query with EXPLAIN and got this:
This part is an important limiter:
az.adress_id = 5
MySQL will limit the table to only those records where adress_id matches before joining it with the rest of the statement, so it will depend on how big you think that result set might be.
Btw, you have a UNIQUE(adress_id, zoo_id) and a separate INDEX on adress_id. Is there a particular reason? MySQL can also use the leftmost part of a composite key for lookups, so the separate index looks redundant.
What's also important is to use EXPLAIN to understand how MySQL will "attack" your query and return the results. See also: http://dev.mysql.com/doc/refman/5.5/en/execution-plan-information.html
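To see the leftmost-prefix point in action, here is a minimal sketch using Python's sqlite3 module as a stand-in. It is an assumption that SQLite's planner is a fair illustration here; in MySQL the equivalent check is EXPLAIN SELECT ... and reading the key column. Only the composite unique index is created, and the plan for a lookup on adress_id alone is inspected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE adress_zoo (adress_id INT NOT NULL, zoo_id INT NOT NULL);
-- Only the composite index; no separate index on adress_id.
CREATE UNIQUE INDEX pk ON adress_zoo (adress_id, zoo_id);
""")

# Ask the planner how it would answer a lookup on the first column only.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT zoo_id FROM adress_zoo WHERE adress_id = 5"
).fetchall()

# The last column of each plan row is the human-readable detail text.
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```

If the plan text names the composite index, the lookup on adress_id did not need an index of its own.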
To avoid the subquery, you can try rewriting your query as:
SELECT a.* FROM jos_zoo_item AS a
JOIN jos_zoo_search_index AS zsi2 ON zsi2.item_id = a.id
INNER JOIN
(
SELECT DISTINCT r.id FROM (
SELECT zi.id AS id, Max(zi.priority) as prio
FROM jos_zoo_item AS zi
JOIN jos_zoo_search_index AS zsi ON zsi.item_id = zi.id
LEFT JOIN jos_zoo_tag AS zt ON zt.item_id = zi.id
JOIN jos_zoo_category_item AS zci ON zci.item_id = zi.id
JOIN adress_zoo AS az ON az.zoo_id = zi.id
WHERE 1=1
AND ( (zci.category_id != 0 AND ( zt.name != 'prolong' OR zt.name is NULL))
OR (zci.category_id = 0 AND zt.name = 'prolong') )
AND zi.type = 'telefoni'
AND zsi.element_id = '44d3b1fd-40f6-4fd7-9444-7e11643e2cef'
AND zsi.value = 'Small'
AND zci.category_id > 15
AND az.adress_id = 5
GROUP BY zci.category_id ) AS r
) T
on a.id = T.id
WHERE a.application_id = 6
AND a.access IN (1,1)
AND a.state = 1
AND (a.publish_up = '0000-00-00 00:00:00' OR a.publish_up <= '2012-06-07 07:51:26')
AND (a.publish_down = '0000-00-00 00:00:00' OR a.publish_down >= '2012-06-07 07:51:26')
AND zsi2.element_id = '1c3cd26e-666d-4f8f-a465-b74fffb4cb14'
GROUP BY a.id
ORDER BY zsi2.value ASC
This approach doesn't run the subquery for each candidate row. Performance may improve only if T can be calculated in a few milliseconds.

mysql: please help me get rid of nested clauses

I have written the following query (the repeated parts put together from string constants) that tries to do the following in a shared calendar app.
Search status_relation for all users with status not equal to 0 at the current date and time.
Search default_status_relation for all users with status not equal to 0 at the current weekday and time. This gives default weekly statuses when nothing is available in status_relation.
Full outer join the two.
Then join on usernames from userid.
This then gets processed by PHP, which displays either the status (or the default status, if a status doesn't exist) of all users on that day who have status != 0.
The PHP needs to know whether it was returned a status or a default status.
Currently the query works (with a UNION to simulate an outer join). However, I would like to optimize it - I understand removing the subqueries might help. How can that be done?
SELECT q.*,
users.username,
users.userid AS uid
FROM (SELECT *
FROM ((SELECT sr.status AS srstatus,
dsr.status AS dsrstatus,
sr.userid AS sruserid,
dsr.userid AS dsruserid
FROM (SELECT *
FROM status_relation
WHERE DATE = '2012-03-19'
AND TIME = '0'
) sr
LEFT JOIN
(SELECT *
FROM default_status_relation
WHERE weekday = '0'
AND TIME = '0'
) dsr
ON sr.userid = dsr.userid)
UNION
(SELECT sr.status AS srstatus,
dsr.status AS dsrstatus,
sr.userid AS sruserid,
dsr.userid AS dsruserid
FROM (SELECT *
FROM status_relation
WHERE DATE = '2012-03-19'
AND TIME = '0') sr
RIGHT JOIN (SELECT *
FROM default_status_relation
WHERE weekday = '0'
AND TIME = '0') dsr
ON sr.userid = dsr.userid)
) myjoin
WHERE ( ( sruserid IS NOT NULL
AND srstatus != '0' )
OR ( sruserid IS NULL
AND dsrstatus != '0' ) )) q
LEFT JOIN users
ON ( q.sruserid = users.userid
OR q.dsruserid = users.userid )
You should be able to replace this part:
(SELECT sr.status AS srstatus,
dsr.status AS dsrstatus,
sr.userid AS sruserid,
dsr.userid AS dsruserid
FROM (SELECT *
FROM status_relation
WHERE DATE = '2012-03-19'
AND TIME = '0'
) sr
LEFT JOIN
(SELECT *
FROM default_status_relation
WHERE weekday = '0'
AND TIME = '0'
) dsr
ON sr.userid = dsr.userid)
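with
SELECT sr.status AS srstatus,
dsr.status AS dsrstatus,
sr.userid AS sruserid,
dsr.userid AS dsruserid
FROM status_relation sr
LEFT JOIN default_status_relation dsr
ON sr.userid = dsr.userid
AND dsr.weekday = '0'
AND dsr.time = '0'
WHERE sr.date = '2012-03-19'
AND sr.time = '0'
Note that in a LEFT JOIN only the dsr filters belong in the ON clause. The sr filters must go in WHERE, otherwise status_relation rows for other dates are still returned (with NULLs on the dsr side).
You can do the same for the RIGHT JOIN branch of the UNION, with the roles reversed.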
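The effect of moving a filter between ON and WHERE in a LEFT JOIN can be checked on a tiny data set. This is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL; the table and column names follow the question, but the column types and sample rows are made up. It compares the original derived-table form, a flattened form with the left-side filters in WHERE, and a flattened form with every filter in ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE status_relation (userid INT, date TEXT, time TEXT, status TEXT);
CREATE TABLE default_status_relation (userid INT, weekday TEXT, time TEXT, status TEXT);
INSERT INTO status_relation VALUES
    (1, '2012-03-19', '0', '2'),
    (2, '2012-03-18', '0', '3');
INSERT INTO default_status_relation VALUES
    (1, '0', '0', '1'),
    (1, '1', '0', '4');
""")

# Original shape: both sides are filtered in derived tables before the join.
derived = conn.execute("""
    SELECT sr.userid, dsr.status
    FROM (SELECT * FROM status_relation
          WHERE date = '2012-03-19' AND time = '0') sr
    LEFT JOIN (SELECT * FROM default_status_relation
               WHERE weekday = '0' AND time = '0') dsr
      ON sr.userid = dsr.userid
    ORDER BY sr.userid
""").fetchall()

# Flattened: right-side filters in ON, left-side filters in WHERE.
where_and_on = conn.execute("""
    SELECT sr.userid, dsr.status
    FROM status_relation sr
    LEFT JOIN default_status_relation dsr
      ON sr.userid = dsr.userid AND dsr.weekday = '0' AND dsr.time = '0'
    WHERE sr.date = '2012-03-19' AND sr.time = '0'
    ORDER BY sr.userid
""").fetchall()

# Flattened with every filter in ON: left-side rows are no longer removed.
all_in_on = conn.execute("""
    SELECT sr.userid, dsr.status
    FROM status_relation sr
    LEFT JOIN default_status_relation dsr
      ON sr.userid = dsr.userid AND dsr.weekday = '0' AND dsr.time = '0'
     AND sr.date = '2012-03-19' AND sr.time = '0'
    ORDER BY sr.userid
""").fetchall()

print(derived, where_and_on, all_in_on)
```

Only the second form matches the derived-table version; the third keeps user 2's row from the wrong date and pads it with NULL.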

Sum of rows with join

This is the current table layout.
There are 3 legs.
Each leg has 2 points, where is_start = 1 is the start of the leg and is_start = 0 is the end of the leg.
When the user checks in at a point, an entry in points_users is created.
In this application you have multiple legs, each with 2 points: one marks the start of the leg and the other marks the end. So the time for a user (with id = 2) on a leg (with id = 1) is points_users.created where points_users.leg_id = 1 and points_users.user_id = 2 and is_start = 0, minus points_users.created where is_start = 1 (and the other parameters stay the same). And that's for just one leg.
What I would like is to sum all the time differences for each leg, so that we get data like this:
| User.id | User.name | total_time |
| 1 | John | 129934 |
Anyone know how I can join these tables and sum it up grouped by user?
(No, this is not homework)
As far as I got:
SELECT
( `end_time` - `start_time` ) AS `diff`
FROM
(
SELECT SUM(UNIX_TIMESTAMP(`p1`.`created`)) AS `start_time`
FROM `points_users` AS `pu1`
LEFT JOIN `points` AS `p1` ON `pu1`.`point_id` = `p1`.`id`
WHERE `p1`.`is_start` = 1
) AS `start_time`,
(
SELECT SUM(UNIX_TIMESTAMP(`pu2`.`created`)) AS `end_time`
FROM `points_users` AS `pu2`
LEFT JOIN `points` AS `p2` ON `pu2`.`point_id` = `p2`.`id`
WHERE `p2`.`is_start` = 0
) AS `end_time`
Try this:
select users.user_id,
users.user_name,
SUM(timeDuration) totalTime
from users
join (
select
pStart.User_id,
pStart.leg_id,
(pEnd.created - pStart.created) timeDuration
from (select pu.user_id, pu.leg_id, pu.created
from points_users pu
join points p on pu.point_id = p.id and pu.leg_id = p.leg_id
where p.is_start = 1 ) pStart
join (select pu.user_id, pu.leg_id, pu.created
from points_users pu
join points p on pu.point_id = p.id and pu.leg_id = p.leg_id
where p.is_start = 0 ) pEnd
on pStart.user_id = pEnd.user_id
and pStart.leg_id = pEnd.leg_id
) tt
on users.user_id = tt.user_id
group by users.user_id, users.user_name
The subquery gets the time duration for each user/leg, and the main query then sums them over all the legs of each user.
EDIT: Added the points table now that I can see your attempt at a query.
The simplest way is to join points_users to itself:
select leg_start.user_id, sum(leg_end.created - leg_start.created)
from points_users leg_start
join points_users leg_end on leg_start.user_id = leg_end.user_id
and leg_start.leg_id = leg_end.leg_id
join points point_start on leg_start.point_id = point_start.id
join points point_end on leg_end.point_id = point_end.id
where point_start.is_start = 1 and point_end.is_start = 0
group by leg_start.user_id
Some people prefer to put those is_start filters in the join condition, but since it's an inner join that's mainly just a point of style. If it were an outer join, then moving them from the WHERE to the JOIN could have an effect on the results.
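The self-join can be tried out on a few sample rows. This is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL. The schema is an assumption pieced together from the queries in this thread (the original table layout is not shown), and created is stored as integer seconds to keep the arithmetic obvious.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed schema: each leg has one start point and one end point.
CREATE TABLE points (id INTEGER PRIMARY KEY, leg_id INT, is_start INT);
CREATE TABLE points_users (user_id INT, leg_id INT, point_id INT, created INT);
INSERT INTO points VALUES (1, 1, 1), (2, 1, 0), (3, 2, 1), (4, 2, 0);
INSERT INTO points_users VALUES
    (2, 1, 1, 100), (2, 1, 2, 160),
    (2, 2, 3, 200), (2, 2, 4, 230),
    (3, 1, 1, 100), (3, 1, 2, 110);
""")

# Pair each user's start check-in with the end check-in of the same leg,
# then sum the per-leg durations for each user.
totals = conn.execute("""
    SELECT leg_start.user_id, SUM(leg_end.created - leg_start.created)
    FROM points_users leg_start
    JOIN points_users leg_end ON leg_start.user_id = leg_end.user_id
                             AND leg_start.leg_id = leg_end.leg_id
    JOIN points point_start ON leg_start.point_id = point_start.id
    JOIN points point_end   ON leg_end.point_id = point_end.id
    WHERE point_start.is_start = 1 AND point_end.is_start = 0
    GROUP BY leg_start.user_id
    ORDER BY leg_start.user_id
""").fetchall()

print(totals)
```

If created is a DATETIME column in MySQL, subtracting the raw values will likely not give seconds; wrapping each side in UNIX_TIMESTAMP(), as in the question's attempt, or using TIMESTAMPDIFF(SECOND, ...) is probably needed.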