MySQL - JOIN with OR condition

I have 3 tables: CompanyMaster (which has 3 million rows), Token1, and Token2. The table structures are:
CompanyMaster
CREATE TABLE `CompanyMaster` (
`CompanyUID` int(11) NOT NULL AUTO_INCREMENT,
`WebDomain` varchar(150) DEFAULT NULL,
`CompanyPrimaryName` varchar(200) DEFAULT NULL,
PRIMARY KEY (`CompanyUID`)
) ENGINE=InnoDB AUTO_INCREMENT=3941244 DEFAULT CHARSET=latin1
Token1
CREATE TABLE `Token1`(
`CompanyUID` int(11) NOT NULL,
`Token` varchar(50) NOT NULL,
KEY `Token` (`Token`),
KEY `CompanyUID` (`CompanyUID`),
CONSTRAINT `CompanyAlias4_ibfk_1` FOREIGN KEY (`CompanyUID`) REFERENCES `CompanyMaster` (`CompanyUID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
Token2
CREATE TABLE `Token2` (
`CompanyUID` int(11) NOT NULL,
`Token` varchar(100) NOT NULL,
KEY `Token` (`Token`),
KEY `CompanyUID` (`CompanyUID`),
CONSTRAINT `CompanyAlias5_ibfk_1` FOREIGN KEY (`CompanyUID`) REFERENCES `CompanyMaster` (`CompanyUID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
I want to get the WebDomain from the CompanyMaster table using the Token1 and Token2 tables.
The query I am using is:
SELECT WebDomain FROM CompanyMaster WHERE CompanyUID IN (
SELECT CompanyUID FROM Token1 WHERE Token='appleinc'
UNION
SELECT CompanyUID FROM Token2 WHERE Token='d012233:q122100:')
This query takes almost 30 seconds to get the result. I executed the sub-query alone, and it takes < 100 milliseconds, so the problem is with the IN condition.
I replaced the query with a join and it executes in < 200 ms:
SELECT c.CompanyUID FROM `CompanyMaster` c
JOIN `Token1` tk1
ON tk1.CompanyUID = c.CompanyUID AND tk1.Token= 'appleinc'
JOIN `Token2` tk2
ON tk2.CompanyUID = c.CompanyUID AND tk2.Token= 'd012233:q122100:'
But the problem with the above query is that if either tk1.Token = 'appleinc' or tk2.Token = 'd012233:q122100:' fails to match, it returns an empty result. I want the matched rows even if only one of the conditions matches.
Please help me solve this. I also want the query to execute in less than 10 milliseconds. Is that achievable?

You should certainly get better performance with UNION ALL than with UNION: it makes no difference to the output in your case, but it does not need to filter out duplicates the way UNION does:
SELECT WebDomain
FROM CompanyMaster
WHERE CompanyUID IN
( SELECT CompanyUID
FROM Token1
WHERE Token = 'appleinc'
UNION ALL
SELECT CompanyUID
FROM Token2
WHERE Token = 'd012233:q122100:')
However, moving the UNION to the outer query might give even better performance, like this:
SELECT WebDomain
FROM CompanyMaster m
INNER JOIN Token1 t ON t.CompanyUID = m.CompanyUID
WHERE Token = 'appleinc'
UNION
SELECT WebDomain
FROM CompanyMaster m
INNER JOIN Token2 t ON t.CompanyUID = m.CompanyUID
WHERE Token = 'd012233:q122100:'
Here it is probably important to return only unique values, so you need UNION without ALL.
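If the 10 ms target matters, it may also help to turn the single-column Token indexes into composite indexes on (Token, CompanyUID), so each branch of the UNION can be satisfied from the index alone before joining to CompanyMaster by primary key. A minimal sketch; the index names are only illustrative:
ALTER TABLE Token1 ADD INDEX Token_CompanyUID (Token, CompanyUID);
ALTER TABLE Token2 ADD INDEX Token_CompanyUID (Token, CompanyUID);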

You can use a WHERE clause to filter your records on the basis of Token1 and Token2, and you can change that clause to match your requirement.
Please check the following SQL. I hope it will solve your problem.
SELECT
c.CompanyUID, c.WebDomain
FROM
CompanyMaster c
LEFT JOIN Token1 tk1 ON tk1.CompanyUID = c.CompanyUID
LEFT JOIN Token2 tk2 ON tk2.CompanyUID = c.CompanyUID
WHERE
tk1.Token = '123' OR tk2.Token = 'xyz';

Related

Query taking too long, while split into two queries it takes 0.2 sec

I have the current query:
select m.id, ms.severity, ms.risk_score, count(distinct si.id), boarding_date_tbl.boarding_date
from merchant m
join merchant_has_scan ms on m.last_scan_completed_id = ms.id
join scan_item si on si.merchant_has_scan_id = ms.id and si.is_registered = true
join (select m.id merchant_id, min(s_for_boarding.scan_date) boarding_date
from merchant m
left join merchant_has_scan ms on m.id = ms.merchant_id
left join scan s_for_boarding on s_for_boarding.id = ms.scan_id and s_for_boarding.scan_type = 1
group by m.id) boarding_date_tbl on boarding_date_tbl.merchant_id = m.id
group by m.id
limit 100;
When I run it on a big schema (about 2 million "merchant" rows) it takes more than 20 seconds.
But if I split it into:
select m.legal_name, m.unique_id, m.merchant_status, s_for_boarding.scan_date
from merchant m
join merchant_has_scan ms on m.id = ms.merchant_id
join scan s_for_boarding on s_for_boarding.id = ms.scan_id and s_for_boarding.scan_type = 1
group by m.id
limit 100;
and
select m.id, ms.severity, ms.risk_score, count(distinct si.id)
from merchant m
join merchant_has_scan ms on m.last_scan_completed_id = ms.id
join scan_item si on si.merchant_has_scan_id = ms.id and si.is_registered = true
group by m.id
limit 100;
both will take about 0.1 sec.
The reason for that is clear: the low limit means it doesn't need to do much to get the first 100 rows. It is also clear that the inner select is what makes the first query run as long as it does.
My question: is there a way to do the inner select only on the relevant merchants and not on the entire table?
Update
Making a LEFT JOIN instead of a JOIN before the inner query helped reduce it to 6 seconds, but it is still a lot more than what I can get if I do the 2 queries.
UPDATE 2
create table for merchant:
CREATE TABLE `merchant` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`last_scan_completed_id` bigint(20) DEFAULT NULL,
`last_updated` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
CONSTRAINT `FK_9lhkm7tb4bt87qy4j3fjayec5` FOREIGN KEY (`last_scan_completed_id`) REFERENCES `merchant_has_scan` (`id`)
)
merchant_has_scan:
CREATE TABLE `merchant_has_scan` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`merchant_id` bigint(20) NOT NULL,
`risk_score` int(11) DEFAULT NULL,
`scan_id` bigint(20) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unique_merchant_id` (`scan_id`,`merchant_id`),
CONSTRAINT `FK_3d8f81ts5wj2u99ddhinfc1jp` FOREIGN KEY (`scan_id`) REFERENCES `scan` (`id`),
CONSTRAINT `FK_e7fhioqt9b9rp9uhvcjnk31qe` FOREIGN KEY (`merchant_id`) REFERENCES `merchant` (`id`)
)
scan_item:
CREATE TABLE `scan_item` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`is_registered` bit(1) NOT NULL,
`merchant_has_scan_id` bigint(20) NOT NULL,
PRIMARY KEY (`id`),
CONSTRAINT `FK_avcc5q3hkehgreivwhoc5h7rb` FOREIGN KEY (`merchant_has_scan_id`) REFERENCES `merchant_has_scan` (`id`)
)
scan:
CREATE TABLE `scan` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`scan_date` datetime DEFAULT NULL,
`scan_type` int(11) NOT NULL,
PRIMARY KEY (`id`)
)
and the explain:
You don't have the latest version of MySQL, which would be able to create an index for the derived table. (What version are you running?)
The "derived table" (the subquery) will be the first table in the EXPLAIN because, well, it has to be.
merchant_has_scan is a many:many table, but without the optimization tips here -- fixing this may be the biggest factor in speeding it up. Caveat: The tips suggest getting rid of id, but you seem to have a use for id, so keep it.
The COUNT(DISTINCT si.id) and the JOIN to si can be replaced by ( SELECT COUNT(*) FROM scan_item WHERE ... ), thereby eliminating one of the JOINs and possibly diminishing the Explode-Implode (see the sketch after this list).
LEFT JOIN -- are you sometimes expecting to get NULL for boarding_date? If not, please use JOIN, not LEFT JOIN. (It is better to state your intention than to leave the query open to multiple interpretations.)
If you can remove the LEFTs, then since m.id and merchant_id are specified to be equal, why list them both in the SELECT? (This is a confusion factor, not a speed question).
You say you split it into two -- but you did not. You added LIMIT 100 to the inner query when you pulled it out. If you need that, add it to the derived table, too. Then you may be able to remove GROUP BY m.id LIMIT 100 from the outer query.
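Putting these tips together (the correlated count, plain JOINs instead of LEFT JOINs, and the simplified derived table), a minimal sketch of the rewritten query might look like this; registered_items and the alias bd are illustrative names, and whether the outer GROUP BY is still needed depends on last_scan_completed_id pointing at a single merchant_has_scan row:
SELECT m.id, ms.severity, ms.risk_score,
( SELECT COUNT(*)
FROM scan_item si
WHERE si.merchant_has_scan_id = ms.id AND si.is_registered = TRUE ) AS registered_items,
bd.boarding_date
FROM merchant m
JOIN merchant_has_scan ms ON m.last_scan_completed_id = ms.id
JOIN ( SELECT ms2.merchant_id, MIN(s.scan_date) AS boarding_date
FROM merchant_has_scan ms2
JOIN scan s ON s.id = ms2.scan_id AND s.scan_type = 1
GROUP BY ms2.merchant_id ) bd ON bd.merchant_id = m.id
LIMIT 100;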

MySQL query optimisation to transform a subquery into a join without DISTINCT

I have tables:
CREATE TABLE IF NOT EXISTS `bk_cart_rule` (
`id_cart_rule` int(10) unsigned NOT NULL DEFAULT '0',
`cart_rule_restriction` tinyint(1) unsigned NOT NULL DEFAULT '0',
KEY `id_cart_rule` (`id_cart_rule`),
KEY `cart_rule_restriction` (`cart_rule_restriction`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `bk_cart_rule_combination` (
`id_cart_rule_1` int(10) unsigned NOT NULL,
`id_cart_rule_2` int(10) unsigned NOT NULL,
KEY `id_cart_rule_1` (`id_cart_rule_1`),
KEY `id_cart_rule_2` (`id_cart_rule_2`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `bk_cart_rule_lang` (
`id_cart_rule` int(10) unsigned NOT NULL,
`id_lang` int(10) unsigned NOT NULL,
KEY `id_cart_rule` (`id_cart_rule`),
KEY `id_lang` (`id_lang`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
And a query:
SELECT SQL_NO_CACHE cr.*, crl.*, 1 as selected FROM bk_cart_rule cr
LEFT JOIN bk_cart_rule_lang crl ON (cr.id_cart_rule = crl.id_cart_rule AND crl.id_lang = 2)
WHERE cr.id_cart_rule != 375 AND
( cr.cart_rule_restriction = 0 OR
cr.id_cart_rule IN (
SELECT IF(id_cart_rule_1 = 375, id_cart_rule_2, id_cart_rule_1) FROM bk_cart_rule_combination WHERE 375 = id_cart_rule_1 OR 375 = id_cart_rule_2 ) )
Obvious optimization is:
SELECT SQL_NO_CACHE DISTINCT cr.*, crl.*, 1 as selected FROM bk_cart_rule cr
LEFT JOIN bk_cart_rule_lang crl ON (cr.id_cart_rule = crl.id_cart_rule AND crl.id_lang = 2)
LEFT JOIN bk_cart_rule_combination crc ON (375 = crc.id_cart_rule_1 AND cr.id_cart_rule = crc.id_cart_rule_2) OR (375 = crc.id_cart_rule_2 AND cr.id_cart_rule = crc.id_cart_rule_1)
WHERE cr.id_cart_rule != 375 AND (cr.cart_rule_restriction = 0 OR NOT ISNULL(crc.id_cart_rule_1))
But how can I get rid of the DISTINCT? (In bk_cart_rule_combination I have two-way combinations:)
id_cart_rule_1 id_cart_rule_2
375 776
776 375
Or maybe there is a better optimization possible?
If the ordering of the cart rules is not important, then add the constraint that the id for the first one is less than the id of the second one. That is, put them in the table in order.
Sadly, MySQL doesn't allow simple check constraints. Instead, you have to implement it in some other way. Here are three:
Implement an insert/update trigger to maintain the ordering (and prevent duplicates) -- see the sketch at the end of this answer.
Implement the logic on the application side.
Wrap all data modifications in stored procedures and implement the logic in the stored procedure.
If you don't want to go through all that trouble (which would probably help with other issues), you can replace the select distinct with:
group by least(id_cart_rule_1, id_cart_rule_2), greatest(id_cart_rule_1, id_cart_rule_2)
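For the first option, a minimal sketch of such an insert trigger, assuming you also add a unique key over the ordered pair to reject duplicates (the trigger and key names are illustrative):
ALTER TABLE bk_cart_rule_combination
ADD UNIQUE KEY uk_cart_rule_pair (id_cart_rule_1, id_cart_rule_2);
DELIMITER //
CREATE TRIGGER bk_cart_rule_combination_bi
BEFORE INSERT ON bk_cart_rule_combination
FOR EACH ROW
BEGIN
DECLARE tmp INT UNSIGNED;
-- store every pair in canonical (low, high) order
IF NEW.id_cart_rule_1 > NEW.id_cart_rule_2 THEN
SET tmp = NEW.id_cart_rule_1;
SET NEW.id_cart_rule_1 = NEW.id_cart_rule_2;
SET NEW.id_cart_rule_2 = tmp;
END IF;
END//
DELIMITER ;
An analogous BEFORE UPDATE trigger would keep the ordering if the ids are ever changed.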

Update statement causes fields to be updated with NULL or maximum value

If you had to pick one of the two following queries, which would you choose and why:
UPDATE `table1` AS e
SET e.points = e.points+(
SELECT points FROM `table2` AS ep WHERE e.cardnbr=ep.cardnbr);
or:
UPDATE `table1` AS e
INNER JOIN
(
SELECT points, cardnbr
FROM `table2`
) AS ep ON (e.cardnbr=ep.cardnbr)
SET e.points = e.points+ep.points;
Tables' definitions:
CREATE TABLE `table1` (
`cardnbr` int(10) DEFAULT NULL,
`name` varchar(50) DEFAULT NULL,
`points` decimal(7,3) DEFAULT '0.000',
`email` varchar(50) NOT NULL DEFAULT 'user#company.com',
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=25205 DEFAULT CHARSET=latin1$$
CREATE TABLE `table2` (
`cardnbr` int(10) DEFAULT NULL,
`id` int(11) NOT NULL AUTO_INCREMENT,
`points` decimal(7,3) DEFAULT '0.000',
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=4 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci$$
UPDATE: BOTH are causing problems: the first causes non-matched rows to be updated to NULL.
The second causes them to be updated to the maximum value 999.9999 (decimal(7,3)).
P.S. The cardnbr field is NOT a key.
I prefer the second one. The reason is that when using a JOIN the database can create an execution plan that is better for your query and saves time, whereas subqueries (like your first one) will run all the queries and load all the data, which may take time.
I think subqueries are easier to read, but performance-wise JOIN is faster.
First, the two statements are not equivalent, as you found out yourself. The first one will update all rows of table1, putting NULL values for those rows that have no related rows in table2.
So the second query looks better because it doesn't update all rows of table1. It could be written in a simpler way though, like this:
UPDATE table1 AS e
INNER JOIN table2 AS ep
ON e.cardnbr = ep.cardnbr
SET e.points = e.points + ep.points ;
So, the 2nd query would be the best to use, if cardnbr was the primary key of table2. Is it?
If it isn't, then which values from table2 should be used for the update of table1 (added to points)? All of them? You could use this:
UPDATE table1 AS e
INNER JOIN
( SELECT SUM(points) AS points, cardnbr
FROM table2
GROUP BY cardnbr
) AS ep ON e.cardnbr = ep.cardnbr
SET
e.points = e.points + ep.points ;
Just one of them? That would require some other derived table, depending on what you want.
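For example, if "just one of them" meant the most recently inserted row per cardnbr (the one with the highest id), a sketch of such a derived table could look like this; it only illustrates the idea, and you may want a different rule:
UPDATE table1 AS e
INNER JOIN
( SELECT t2.cardnbr, t2.points
FROM table2 AS t2
INNER JOIN
( SELECT cardnbr, MAX(id) AS max_id
FROM table2
GROUP BY cardnbr
) AS latest ON latest.cardnbr = t2.cardnbr AND latest.max_id = t2.id
) AS ep ON e.cardnbr = ep.cardnbr
SET e.points = e.points + ep.points ;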

MySQL: Update rows in table by iterating and joining with another one

I have a table papers
CREATE TABLE `papers` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`title` varchar(1000) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`my_count` int(11) NOT NULL,
PRIMARY KEY (`id`),
FULLTEXT KEY `title_fulltext` (`title`)
) ENGINE=MyISAM AUTO_INCREMENT=1617432 DEFAULT CHARSET=utf8 COLLATE=utf8_bin
and another table link_table
CREATE TABLE `auth2paper2loc` (
`auth_id` int(11) NOT NULL,
`paper_id` int(11) NOT NULL,
`loc_id` int(11) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
The papers.id from the upper table is the same as link_table.paper_id in the second table. I want to iterate through every row in the upper table, count how many times its id appears in the second table, and store that count in the column "my_count" of the upper table.
Example: If the paper with id = 1 appears 5 times as paper_id in the table link_table, then my_count = 5.
I can do that with a Python script, but it results in too many queries and I have millions of entries, so it is really slow. And I can't figure out the right syntax to do this directly inside MySQL.
This is what I am iterating over in a for loop in Python (too slow):
SELECT count(link_table.auth_id) FROM link_table
WHERE link_table.paper_id = %s
UPDATE papers SET auth_count = %s WHERE id = %s
Could someone please tell me how to create this one? There must be a way to nest this and put it directly in MySQL so it is faster, isn't there?
How does this perform for you?
update papers a
set my_count = (select count(*)
from auth2paper2loc b
where b.paper_id = a.id);
Use either:
UPDATE PAPERS
SET my_count = (SELECT COUNT(b.paper_id)
FROM AUTH2PAPER2LOC b
WHERE b.paper_id = PAPERS.id)
...or:
UPDATE PAPERS
LEFT JOIN (SELECT b.paper_id,
COUNT(b.paper_id) AS numCount
FROM AUTH2PAPER2LOC b
GROUP BY b.paper_id) x ON x.paper_id = PAPERS.id
SET my_count = COALESCE(x.numCount, 0)
The COALESCE is necessary to convert the NULL to a zero when there aren't any instances of PAPERS.id in the AUTH2PAPERLOC table.
update papers left join
(select paper_id, count(*) total from auth2paper2loc group by paper_id) X
on papers.id = X.paper_id
set papers.my_count = IFNULL(X.total, 0)
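Whichever form you use, the count has to locate rows in auth2paper2loc by paper_id, and the posted CREATE TABLE shows no index on that column, so each count becomes a scan of the link table. A sketch of the missing index (the name is illustrative):
ALTER TABLE auth2paper2loc ADD INDEX idx_paper_id (paper_id);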

MySQL query killing my server

Looking at this query there's got to be something bogging it down that I'm not noticing. I ran it for 7 minutes and it only updated 2 rows.
//set product count for makes
$tru->query->run(array(
'name' => 'get-make-list',
'sql' => 'SELECT id, name FROM vehicle_make',
'connection' => 'core'
));
while($tempMake = $tru->query->getArray('get-make-list')) {
$tru->query->run(array(
'name' => 'update-product-count',
'sql' => 'UPDATE vehicle_make SET product_count = (
SELECT COUNT(product_id) FROM taxonomy_master WHERE v_id IN (
SELECT id FROM vehicle_catalog WHERE make_id = '.$tempMake['id'].'
)
) WHERE id = '.$tempMake['id'],
'connection' => 'core'
));
}
I'm sure this query can be optimized to perform better, but I can't think of how to do it.
vehicle_make = 45 rows
taxonomy_master = 11,223 rows
vehicle_catalog = 5,108 rows
All tables have appropriate indexes
UPDATE: I should note that this is a 1-time script so overhead isn't a big deal as long as it runs.
CREATE TABLE IF NOT EXISTS `vehicle_make` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(32) NOT NULL,
`product_count` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=46 ;
CREATE TABLE IF NOT EXISTS `taxonomy_master` (
`product_id` int(10) NOT NULL,
`v_id` int(10) NOT NULL,
`vehicle_requirement` varchar(255) DEFAULT NULL,
`is_sellable` enum('True','False') DEFAULT 'True',
`programming_override` varchar(25) DEFAULT NULL,
PRIMARY KEY (`product_id`,`v_id`),
KEY `idx2` (`product_id`),
KEY `idx3` (`v_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE IF NOT EXISTS `vehicle_catalog` (
`v_id` int(10) NOT NULL,
`id` int(11) NOT NULL,
`v_make` varchar(255) NOT NULL,
`make_id` int(11) NOT NULL,
`v_model` varchar(255) NOT NULL,
`model_id` int(11) NOT NULL,
`v_year` varchar(255) NOT NULL,
PRIMARY KEY (`v_id`,`v_make`,`v_model`,`v_year`),
UNIQUE KEY `idx` (`v_make`,`v_model`,`v_year`),
UNIQUE KEY `idx2` (`v_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Update: The successful query to get what I needed is here....
SELECT
m.id,COUNT(t.product_id) AS CountOf
FROM taxonomy_master t
INNER JOIN vehicle_catalog v ON t.v_id=v.id
INNER JOIN vehicle_make m ON v.make_id=m.id
GROUP BY m.id;
Without the tables/columns, this is my best guess from reverse-engineering the given queries:
UPDATE vehicle_make m
JOIN ( SELECT v.make_id, COUNT(t.product_id) AS cnt
FROM taxonomy_master t
INNER JOIN vehicle_catalog v ON t.v_id = v.id
GROUP BY v.make_id ) x ON x.make_id = m.id
SET m.product_count = x.cnt;
The given code loops over each make and then runs a query that counts for each one. My answer just does them all in one query and should be a lot faster.
Have an index for each of these:
vehicle_make.id covering name
vehicle_catalog.id covering make_id
taxonomy_master.v_id
EDIT
give this a try:
CREATE TEMPORARY TABLE CountsOf (
id int(11) NOT NULL
, CountOf int(11) NOT NULL DEFAULT 0
);
INSERT INTO CountsOf
(id, CountOf )
SELECT
m.id,COUNT(t.product_id) AS CountOf
FROM taxonomy_master t
INNER JOIN vehicle_catalog v ON t.v_id=v.id
INNER JOIN vehicle_make m ON v.make_id=m.id
GROUP BY m.id;
UPDATE vehicle_make, CountsOf
SET vehicle_make.product_count = CountsOf.CountOf
WHERE vehicle_make.id = CountsOf.id;
Instead of using a nested query, you can split this query into 2 or 3 queries and, in PHP, insert the result of the inner query into the outer query. It's faster!
#haim-evgi Separating the queries will not increase the speed significantly; it will just shift the load from the DB server to the web server and create overhead moving data between the two servers.
I am not sure that, with appropriate indexes, such a query runs for 7 minutes. Could you please show the structure of the tables involved in these queries?
Seems like you need the following indices:
INDEX BTREE('make_id') on vehicle_catalog
INDEX BTREE('v_id') on taxonomy_master
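The posted taxonomy_master DDL already has a key on v_id (idx3), so the one most likely missing is on vehicle_catalog.make_id. A sketch of the corresponding DDL, with an illustrative index name:
ALTER TABLE vehicle_catalog ADD INDEX idx_make_id (make_id);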