Mysql query optimisation to transform subquery in join without DISTINCT - mysql

I have tables:
CREATE TABLE IF NOT EXISTS `bk_cart_rule` (
`id_cart_rule` int(10) unsigned NOT NULL DEFAULT '0',
`cart_rule_restriction` tinyint(1) unsigned NOT NULL DEFAULT '0',
KEY `id_cart_rule` (`id_cart_rule`),
KEY `cart_rule_restriction` (`cart_rule_restriction`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `bk_cart_rule_combination` (
`id_cart_rule_1` int(10) unsigned NOT NULL,
`id_cart_rule_2` int(10) unsigned NOT NULL,
KEY `id_cart_rule_1` (`id_cart_rule_1`),
KEY `id_cart_rule_2` (`id_cart_rule_2`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `bk_cart_rule_lang` (
`id_cart_rule` int(10) unsigned NOT NULL,
`id_lang` int(10) unsigned NOT NULL,
KEY `id_cart_rule` (`id_cart_rule`),
KEY `id_lang` (`id_lang`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
And a query :
SELECT SQL_NO_CACHE cr.*, crl.*, 1 as selected FROM bk_cart_rule cr
LEFT JOIN bk_cart_rule_lang crl ON (cr.id_cart_rule = crl.id_cart_rule AND crl.id_lang = 2)
WHERE cr.id_cart_rule != 375 AND
( cr.cart_rule_restriction = 0 OR
cr.id_cart_rule IN (
SELECT IF(id_cart_rule_1 = 375, id_cart_rule_2, id_cart_rule_1) FROM bk_cart_rule_combination WHERE 375 = id_cart_rule_1 OR 375 = id_cart_rule_2 ) )
Obvious optimization is:
SELECT SQL_NO_CACHE DISTINCT cr.*, crl.* 1 as selected FROM bk_cart_rule cr
LEFT JOIN bk_cart_rule_lang crl ON (cr.id_cart_rule = crl.id_cart_rule AND crl.id_lang = 2)
LEFT JOIN bk_cart_rule_combination crc ON (375 = crc.id_cart_rule_1 AND cr.id_cart_rule = crc.id_cart_rule_2) OR (375 = crc.id_cart_rule_2 AND cr.id_cart_rule = crc.id_cart_rule_1)
WHERE cr.id_cart_rule != 375 AND (cr.cart_rule_restriction = 0 OR NOT ISNULL(crc.id_cart_rule_1))
But how can i get rid off DISTINCT (in bk_cart_rule_combination I've two-way combinations : )
id_cart_rule_1 id_cart_rule_2
375 776
776 375
Or maybe there is a better optimization possible?

If the ordering of the cart rules is not important, then add the constraint that the id for the first one is less than the id of the second one. That is, put them in the table in order.
Sadly, MySQL doesn't allow simple check constraints. Instead, you have to implement it in some other way. Here are three:
Implement an insert/update trigger to maintain the ordering (and prevent duplicates).
Implement the logic on the application side.
Wrap all data modifications in stored procedures and implement the logic in the stored procedure.
If you don't want to go through all that trouble (which would probably help with other issues), you can replace the select distinct with:
group by least(id_cart_rule_1, id_cart_rule_2), greatest(id_cart_rule_1, id_cart_rule_2)

Related

Use Indexes For Join on Indexed DATETIME and Indexed DATE columns

EDIT
I misread my initial error and blamed the INDEX not being used on the wrong columns.
I was able to recreate the issue that I saw and the solution that ysth suggested worked.
Below are the create tables statements, inserts to the tables, and two queries - one that has the error and another with the solution which does not have it.
# Make tables and indices
DROP TABLE a;
DROP TABLE b;
create table a
(
DT DATE,
USER INT,
COMMENT_SENTIMENT INT,
PRIMARY KEY (USER, DT));
CREATE INDEX a_DT_USER_IDX ON a (DT,USER);
create table b
(
id int auto_increment primary key,
DT DATETIME(6),
USER mediumtext,
COMMENT_SENTIMENT INT);
CREATE INDEX b_DT_USER_IDX ON b (DT);
CREATE UNIQUE INDEX b_DT_USER ON b (USER(16), DT);
# Insert some dummy data
INSERT INTO a VALUES('2023-01-01', 5, 4);
INSERT INTO b VALUES(NULL, '2023-01-01 00:00:00', 5, 4);
# Explain that shows the issue I was seeing.
EXPLAIN
SELECT *
FROM a
JOIN b
ON a.DT = b.DT
AND a.USER = b.USER;
# Out
# 1,SIMPLE,a,,ALL,"PRIMARY,a_DT_USER_IDX",,,,1,100,
# 1,SIMPLE,b,,ref,"b_DT_USER,b_DT_USER_IDX",b_DT_USER_IDX,9,a.DT,1,100,Using index condition; Using where
[2023-01-24 18:00:14] [HY000][1739] Cannot use ref access on index 'b_DT_USER' due to type or collation conversion on field 'USER'
[2023-01-24 18:00:14] [HY000][1003] /* select#1 */ select `a`.`DT` AS `DT`, `a`.`USER` AS `USER`,`a`.`COMMENT_SENTIMENT` AS `COMMENT_SENTIMENT`,`b`.`id` AS `id`,`b`.`DT` AS `DT`,`b`.`USER` AS `USER`,`b`.`COMMENT_SENTIMENT` AS `COMMENT_SENTIMENT` from `a` join `b` where ((`a`.`DT` = `b`.`DT`) and (`a`.`USER` = `b`.`USER`))
# Explain with the fix ysth suggested
EXPLAIN
SELECT *
FROM a
JOIN b
ON a.DT = b.DT
AND a.USER = CAST(b.USER AS DECIMAL );
# 1,SIMPLE,a,,ALL,"PRIMARY,a_DT_USER_IDX",,,,1,100,
# 1,SIMPLE,b,,ref,b_DT_USER_IDX,b_DT_USER_IDX,9,a.DT,1,100,Using index condition; Using where
# [2023-01-24 18:04:24] [HY000][1003] /* select#1 */ select `a`.`DT` AS `DT`,`a`.`USER` AS `USER`,`a`.`COMMENT_SENTIMENT` AS `COMMENT_SENTIMENT`,`b`.`id` AS `id`,`b`.`DT` AS `DT`,`b`.`USER` AS `USER`,`b`.`COMMENT_SENTIMENT` AS `COMMENT_SENTIMENT` from `a` join `b` where ((`a`.`DT` = `b`.`DT`) and (`a`.`USER` = cast(`b`.`USER` as decimal(10,0))))
# [2023-01-24 18:04:24] 2 rows retrieved starting from 1 in 359 ms (execution: 250 ms, fetching: 109 ms)
__
The below information is incorrect. Please use the edit to see the issue I was having and it's solution.
I have three tables a, b, and c in my MySQL 5.7 database. SHOW CREATE statements for each table are:
CREATE TABLE `a` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`DT` date DEFAULT NULL,
`USER` int(11) DEFAULT NULL,
`COMMENT_SENTIMENT` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `a_DT_USER_IDX` (`DT`,`USER`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `b` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`DT` datetime DEFAULT NULL,
`USER` int(11) DEFAULT NULL,
`COMMENT_SENTIMENT` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `b_DT_USER_IDX` (`DT`,`USER`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `c` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`DT` date DEFAULT NULL,
`USER` int(11) DEFAULT NULL,
`COMMENT_SENTIMENT` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `b_DT_USER_IDX` (`DT`,`USER`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
Table a has a DATE column a.DT, table b has a DATETIME column b.DT, and table c has a DATE column c.DT.
All of these DT columns are indexed.
As a caveat, while b.DT is a DATETIME, all of the 'time' portions in it are 00:00:00 and they always will be. It probably should be a DATE, but I cannot change it.
I want to join table a and table b on their DT columns, but explain tells me that their indices are not used:
Cannot use ref access on index 'b.DT_datetime_index' due to type or collation conversion on field 'DT'
When I join table a and b on a.DT and b.DT
SELECT *
FROM a
JOIN b
ON a.DT = b.DT;
The result is much slower than when I do the same with a and c
SELECT *
FROM a
JOIN c
ON a.DT = c.DT;
Is there a way to use the indices in join from the first query on a.DT = b.DT, specifically without altering the tables? I'm not sure if b.DT having only 00:00:00 for the time portion could be relevant in a solution.
The end goal is a faster select using this join.
Thank you!
-- What I've done section --
I compared the joins between a.DT = b.DT and a.DT = c.DT, and saw the time difference.
I also tried wrapping b's DT column with DATE(b.DT), but explain gave the same issue, which is pretty expected.
MySQL won't use an index to join DATE and DATETIME columns.
You can create a virtual column with the corresponding DATE and use that.
CREATE TABLE `b` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`DT` datetime DEFAULT NULL,
`USER` int(11) DEFAULT NULL,
`COMMENT_SENTIMENT` int(11) DEFAULT NULL,
`DT_DATE` DATE AS (DATE(DT)),
PRIMARY KEY (`id`),
KEY `b_DT_USER_IDX` (`DT_DATE`,`USER`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
SELECT *
FROM a
JOIN b
ON a.DT = b.DT_DATE;
Assuming you want to read a and join b rows, you can just do
SELECT *
FROM a
JOIN b
ON b.DT = timestamp(a.DT);
If the other way around, then
SELECT *
FROM b
JOIN a
ON a.DT = date(b.DT);
No need for a virtual column.
Virtually any function call is "not sargable " That is, CAST(b.USER AS DECIMAL ) prevents the use of an index.
Do not mix strings and ints in comparisons. The string will be converted to numeric. If the string is a literal, such as '123' then the Optimizer is smart enough to do that once. If it is a column name, it must check every row.
Tip: If you are likely to test for one user and a range of dates, then this works better than the opposite order.
INDEX(user, dt)`
(You may need an index starting with dt for other queries.)

Can I improve my movie selecting SQL query

I've created a database to store movies data. My tables are the following:
movies:
CREATE TABLE IF NOT EXISTS `movies` (
`movieId` int(11) NOT NULL AUTO_INCREMENT,
`imdbId` varchar(255) DEFAULT NULL,
`imdbRating` float DEFAULT NULL,
`movieTitle` varchar(255) NOT NULL,
`movieLength` varchar(255) NOT NULL,
`imdbRatingCount` varchar(255) NOT NULL,
`poster` varchar(255) NOT NULL,
`year` varchar(255) NOT NULL,
PRIMARY KEY (`movieId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
I have a table in which i store movie actors:
CREATE TABLE IF NOT EXISTS `actors` (
`actorId` int(10) NOT NULL AUTO_INCREMENT,
`actorName` varchar(255) NOT NULL,
PRIMARY KEY (`actorId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
And one other in which i store the relation between the movies and actors: (movieActor)
CREATE TABLE IF NOT EXISTS `movieActor` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`movieId` int(10) NOT NULL,
`actorId` int(10) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Now when i want to select a list of movies in which are the selected actors my query is:
SELECT *
FROM movies m inner join
(SELECT movieId FROM movieActor WHERE actorId IN(1,2,3) GROUP BY movieId having count(*) = 3) ma ON m.movieId = ma.movieId
WHERE imdbRating IS NOT NULL ORDER BY imdbRating DESC
This is working perfectly, but i don't know that this is the optimal table structure and query to accomplish this. Are there any better table structure to store data or query the list?
First of all, use indexes on your tables. In my opinion it should be useful to have 3 indexes on movieActor. MovieId - ActorID - MovieIdActorId.
Second try tu use foreign keys. These help to identify the best execution plan for your dbs.
Third try to avoid generating temp tables in your execution plan of your query. Subselects often creates temp tables which are used when the database has to temporarily save something in the RAM. To check this, write EXPLAIN in front of goer query.
I would write it like this:
SELECT m.*, movieActor
FROM movies m inner join
movieActor ma ON m.movieId = ma.movieId
WHERE imdbRating IS NOT NULL
and actorId IN(1,2,3)
GROUP BY movieId
having count(*) = 3)
ORDER BY imdbRating DESC
(Not tested)
Just try to optimize it with the EXPLAIN keyword. It also can help you to create the right indexes.

MySQL: Update rows in table by iterating and joining with another one

I have a table papers
CREATE TABLE `papers` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`title` varchar(1000) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`my_count` int(11) NOT NULL,
PRIMARY KEY (`id`),
FULLTEXT KEY `title_fulltext` (`title`),
) ENGINE=MyISAM AUTO_INCREMENT=1617432 DEFAULT CHARSET=utf8 COLLATE=utf8_bin
and another table link_table
CREATE TABLE `auth2paper2loc` (
`auth_id` int(11) NOT NULL,
`paper_id` int(11) NOT NULL,
`loc_id` int(11) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
The id papers.id from the upper table is the same one like the link_table.paper_id in the second table. I want to iterate through every row in the upper table and count how many times this its id appears in the second table and store the "count" into the column "my_count" in the upper table.
Example: If The paper with tid = 1 = paper_id appears 5 times in the table link_table, then my_count = 5.
I can do that by a Python script but it results in too many querys and I have millions of entrys so it is really slow. And I can't figure out the right syntax to make this right inside of MySQL.
This is what I am iterating about in a for-loop in Python (too slow):
SELECT count(link_table.auth_id) FROM link_table
WHERE link_table.paper_id = %s
UPDATE papers SET auth_count = %s WHERE id = %s
Could someone please tell me how to create this one? There must be a way to nest this and put it directly in MySQL so it is faster, isn't there?
How does this perform for you?
update papers a
set my_count = (select count(*)
from auth2paper2loc b
where b.paper_id = a.id);
Use either:
UPDATE PAPERS
SET my_count = (SELECT COUNT(b.paper_id)
FROM AUTH2PAPERLOC b
WHERE b.paper_id = PAPERS.id)
...or:
UPDATE PAPERS
LEFT JOIN (SELECT b.paper_id,
COUNT(b.paper_id) AS numCount
FROM AUTH2PAPERLOC b
GROUP BY b.paper_id) x ON x.paper_id = PAPERS.id
SET my_count = COALESCE(x.numCount, 0)
The COALESCE is necessary to convert the NULL to a zero when there aren't any instances of PAPERS.id in the AUTH2PAPERLOC table.
update papers left join
(select paper_id, count(*) total from auth2paper2loc group by paper_id) X
on papers.id = X.paper_id
set papers.my_count = IFNULL(X.total, 0)

Some help needed with a SQL query

I need some help with a MySQL query. I have two tables, one with offers and one with statuses. An offer can has one or more statuses. What I would like to do is get all the offers and their latest status. For each status there's a table field named 'added' which can be used for sorting.
I know this can be easily done with two queries, but I need to make it with only one because I also have to apply some filters later in the project.
Here's my setup:
CREATE TABLE `test`.`offers` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`client` TEXT NOT NULL ,
`products` TEXT NOT NULL ,
`contact` TEXT NOT NULL
) ENGINE = MYISAM ;
CREATE TABLE `statuses` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`offer_id` int(11) NOT NULL,
`options` text NOT NULL,
`deadline` date NOT NULL,
`added` datetime NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
Should work but not very optimal imho :
SELECT *
FROM offers
INNER JOIN statuses ON (statuses.offer_id = offers.id
AND statuses.id =
(SELECT allStatuses.id
FROM statuses allStatuses
WHERE allStatuses.offer_id = offers.id
ORDER BY allStatuses.added DESC LIMIT 1))
Try this:
SELECT
o.*
FROM offers o
INNER JOIN statuses s ON o.id = s.offer_id
ORDER BY s.added
LIMIT 1

MySQL query killing my server

Looking at this query there's got to be something bogging it down that I'm not noticing. I ran it for 7 minutes and it only updated 2 rows.
//set product count for makes
$tru->query->run(array(
'name' => 'get-make-list',
'sql' => 'SELECT id, name FROM vehicle_make',
'connection' => 'core'
));
while($tempMake = $tru->query->getArray('get-make-list')) {
$tru->query->run(array(
'name' => 'update-product-count',
'sql' => 'UPDATE vehicle_make SET product_count = (
SELECT COUNT(product_id) FROM taxonomy_master WHERE v_id IN (
SELECT id FROM vehicle_catalog WHERE make_id = '.$tempMake['id'].'
)
) WHERE id = '.$tempMake['id'],
'connection' => 'core'
));
}
I'm sure this query can be optimized to perform better, but I can't think of how to do it.
vehicle_make = 45 rows
taxonomy_master = 11,223 rows
vehicle_catalog = 5,108 rows
All tables have appropriate indexes
UPDATE: I should note that this is a 1-time script so overhead isn't a big deal as long as it runs.
CREATE TABLE IF NOT EXISTS `vehicle_make` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(32) NOT NULL,
`product_count` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=46 ;
CREATE TABLE IF NOT EXISTS `taxonomy_master` (
`product_id` int(10) NOT NULL,
`v_id` int(10) NOT NULL,
`vehicle_requirement` varchar(255) DEFAULT NULL,
`is_sellable` enum('True','False') DEFAULT 'True',
`programming_override` varchar(25) DEFAULT NULL,
PRIMARY KEY (`product_id`,`v_id`),
KEY `idx2` (`product_id`),
KEY `idx3` (`v_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE IF NOT EXISTS `vehicle_catalog` (
`v_id` int(10) NOT NULL,
`id` int(11) NOT NULL,
`v_make` varchar(255) NOT NULL,
`make_id` int(11) NOT NULL,
`v_model` varchar(255) NOT NULL,
`model_id` int(11) NOT NULL,
`v_year` varchar(255) NOT NULL,
PRIMARY KEY (`v_id`,`v_make`,`v_model`,`v_year`),
UNIQUE KEY `idx` (`v_make`,`v_model`,`v_year`),
UNIQUE KEY `idx2` (`v_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Update: The successful query to get what I needed is here....
SELECT
m.id,COUNT(t.product_id) AS CountOf
FROM taxonomy_master t
INNER JOIN vehicle_catalog v ON t.v_id=v.id
INNER JOIN vehicle_make m ON v.make_id=m.id
GROUP BY m.id;
without the tables/columns this is my best guess from reverse engineering the given queries:
UPDATE m
SET product_count =COUNT(t.product_id)
FROM taxonomy_master t
INNER JOIN vehicle_catalog v ON t.v_id=v.id
INNER JOIN vehicle_make m ON v.make_id=m.id
GROUP BY m.name
The given code loops over each make, and then runs a query the counts for each. My answer just does them all in one query and should be a lot faster.
have an index for each of these:
vehicle_make.id cover on name
vehicle_catalog.id cover make_id
taxonomy_master.v_id
EDIT
give this a try:
CREATE TEMPORARY TABLE CountsOf (
id int(11) NOT NULL
, CountOf int(11) NOT NULL DEFAULT 0.00
);
INSERT INTO CountsOf
(id, CountOf )
SELECT
m.id,COUNT(t.product_id) AS CountOf
FROM taxonomy_master t
INNER JOIN vehicle_catalog v ON t.v_id=v.id
INNER JOIN vehicle_make m ON v.make_id=m.id
GROUP BY m.id;
UPDATE taxonomy_master,CountsOf
SET taxonomy_master.product_count=CountsOf.CountOf
WHERE taxonomy_master.id=CountsOf.id;
instead of using nested query ,
you can separated this query to 2 or 3 queries,
and in php insert the result of the inner query to the out query ,
its faster !
#haim-evgi Separating the queries will not increase the speed significantly, it will just shift the load from the DB server to the Web server and create overhead of moving data between the two servers.
I am not sure with the appropriate indexes you run such query 7 minutes. Could you please show the table structure of the tables involved in these queries.
Seems like you need the following indices:
INDEX BTREE('make_id') on vehicle_catalog
INDEX BTREE('v_id') on taxonomy_master