mysql table join with max() - mysql

I have problem with query using JOIN and MAX/MIN. For Example:
SELECT Min(a.date), Max(a.date)
FROM a
INNER JOIN b ON b.ID = a.ID AND b.cID = 5
Its possible to add index or change this query result was better?
Below the result of explain
+----+-------------+----------+------+-----------------+-----+---------+-----------+--------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+------+-----------------+-----+---------+-----------+--------+-----------------------+
| 1 | SIMPLE | b | ref | PRIMARY,cID | cID | 5 | const | 680648 | Using index |
| 1 | SIMPLE | a | ref | ID | ID | 5 | base.b.ID | 1 | Using index condition |
+----+-------------+----------+------+-----------------+-----+---------+-----------+--------+-----------------------+
Sorry, but I would not put here the whole table, and could make a lot of confusion.
CREATE TABLE `a` (
`ID` int(11) NOT NULL,
`date` datetime DEFAULT,
PRIMARY KEY (`ID`),
KEY `date` (`date`),
)
CREATE TABLE `b` (
`bID` int(11) NOT NULL,
`ID` int(11) NOT NULL,
`cID` int(11) DEFAULT,
PRIMARY KEY (`bID`),
KEY `cID` (`cID`),
)

b: INDEX(cID, ID)
will make that a "covering" index, so it will probably get through the 680648 rows faster. It should replace the current KEY(cID).
Key_len for b is 5. That disagrees with the table definition; something got simplified too much.

Related

How to use correct indexes with a double inner join query?

I have a query with 2 INNER JOIN statements, and only fetching a few column, but it is very slow even though I have indexes on all required columns.
My query
SELECT
dysfonctionnement,
montant,
listRembArticles,
case when dys.reimputation is not null then dys.reimputation else dys.responsable end as responsable_final
FROM
db.commandes AS com
INNER JOIN db.dysfonctionnements AS dys ON com.id_commande = dys.id_commande
INNER JOIN db.pe AS pe ON com.code_pe = pe.pe_id
WHERE
com.prestataireLAD REGEXP '.*'
AND pe_nom REGEXP 'bordeaux|chambéry-annecy|grenoble|lyon|marseille|metz|montpellier|nancy|nice|nimes|rouen|strasbourg|toulon|toulouse|vitry|vitry bis 1|vitry bis 2|vlg'
AND com.date_livraison BETWEEN '2022-06-11 00:00:00'
AND '2022-07-08 00:00:00';
It takes around 20 seconds to compute and fetch 4123 rows.
The problem
In order to find what's wrong and why is it so slow, I've used the EXPLAIN statement, here is the output:
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|----|-------------|-------|------------|--------|----------------------------|-------------|---------|------------------------|--------|----------|-------------|
| 1 | SIMPLE | dys | | ALL | id_commande,id_commande_2 | | | | 878588 | 100.00 | Using where |
| 1 | SIMPLE | com | | eq_ref | id_commande,date_livraison | id_commande | 110 | db.dys.id_commande | 1 | 7.14 | Using where |
| 1 | SIMPLE | pe | | ref | pe_id | pe_id | 5 | db.com.code_pe | 1 | 100.00 | Using where |
I can see that the dysfonctionnements JOIN is rigged, and doesn't use a key even though it could...
Table definitions
commandes (included relevant columns only)
CREATE TABLE `commandes` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`id_commande` varchar(36) NOT NULL DEFAULT '',
`date_commande` datetime NOT NULL,
`date_livraison` datetime NOT NULL,
`code_pe` int(11) NOT NULL,
`traitement_dysfonctionnement` tinyint(4) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id_commande` (`id_commande`),
KEY `date_livraison` (`date_livraison`),
KEY `traitement_dysfonctionnement` (`traitement_dysfonctionnement`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
dysfonctionnements (again, relevant columns only)
CREATE TABLE `dysfonctionnements` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`id_commande` varchar(36) DEFAULT NULL,
`dysfonctionnement` varchar(150) DEFAULT NULL,
`responsable` varchar(50) DEFAULT NULL,
`reimputation` varchar(50) DEFAULT NULL,
`montant` float DEFAULT NULL,
`listRembArticles` text,
PRIMARY KEY (`id`),
UNIQUE KEY `id_commande` (`id_commande`,`dysfonctionnement`),
KEY `id_commande_2` (`id_commande`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
pe (again, relevant columns only)
CREATE TABLE `pe` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`pe_id` int(11) DEFAULT NULL,
`pe_nom` varchar(30) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `pe_nom` (`pe_nom`),
KEY `pe_id` (`pe_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Investigation
If I remove the db.pe table from the query and the WHERE clause on pe_nom, the query takes 1.7 seconds to fetch 7k rows, and with the EXPLAIN statement, I can see it is using keys as I expect it to do:
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|----|-------------|-------|------------|-------|----------------------------|----------------|---------|------------------------|--------|----------|-----------------------------------------------|
| 1 | SIMPLE | com | | range | id_commande,date_livraison | date_livraison | 5 | | 389558 | 100.00 | Using index condition; Using where; Using MRR |
| 1 | SIMPLE | dys | | ref | id_commande,id_commande_2 | id_commande_2 | 111 | ooshop.com.id_commande | 1 | 100.00 | |
I'm open to any suggestions, I see no reason not to use the key when it does on a very similar query and it definitely makes it faster...
I had a similar experience when MySQL optimiser selected a joined table sequence far from optimal. At that time I used MySQL specific STRAIGHT_JOIN operator to overcome default optimiser behaviour. In your case I would try this:
SELECT
dysfonctionnement,
montant,
listRembArticles,
case when dys.reimputation is not null then dys.reimputation else dys.responsable end as responsable_final
FROM
db.commandes AS com
STRAIGHT_JOIN db.dysfonctionnements AS dys ON com.id_commande = dys.id_commande
INNER JOIN db.pe AS pe ON com.code_pe = pe.pe_id
Also, in your WHERE clause one of the REGEXP probably might be changed to IN operator, I assume it can use index.
Remove com.prestataireLAD REGEXP '.*'. The Optimizer probably won't realize that this has no impact on the resultset. If you are dynamically building the WHERE clause, then eliminate anything else you can.
id_commande_2 is redundant. In queries where it might be useful, the UNIQUE can take care of it.
These indexes might help:
com: INDEX(date_livraison, id_commande, code_pe)
pe: INDEX(pe_nom, pe_id)

can't improve query performance of query

I have the following 2 tables:
CREATE TABLE table1 (
ID INT(11) NOT NULL AUTO_INCREMENT,
AccountID INT NOT NULL,
Type VARCHAR(50) NOT NULL,
ValidForBilling BOOLEAN NULL DEFAULT false,
MerchantCreationTime TIMESTAMP NOT NULL,
PRIMARY KEY (ID),
UNIQUE KEY (OrderID, Type)
);
with the index:
INDEX accID_type_merchCreatTime_vfb (AccountID, Type, MerchantCreationTime, ValidForBilling);
CREATE TABLE table2 (
OrderID INT NOT NULL,
AccountID INT NOT NULL,
LineType VARCHAR(256) NOT NULL,
CreationDate TIMESTAMP NOT NULL,
CalculatedAmount NUMERIC(4,4) NULL,
table1ID INT(11) NOT NULL
);
I'm running the following query:
SELECT COALESCE(SUM(CalculatedAmount), 0.0) AS CalculatedAmount
FROM table2
INNER JOIN table1 ON table1.ID = table2.table1ID
WHERE table1.ValidForBilling is TRUE
AND table1.AccountID = 388
AND table1.Type = 'TPG_DISCOUNT'
AND table1.MerchantCreationTime >= '2018-11-01T05:00:00'
AND table1.MerchantCreationTime < '2018-12-01T05:00:00';
And it takes about 2 minutes to complete.
I did EXPLAIN in order to try and improve the query performance and got the following output:
+----+-------------+------------------+------------+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------+---------+----------------------+-------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------------+------------+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------+---------+----------------------+-------+----------+--------------------------+
| 1 | SIMPLE | table1 | NULL | range | PRIMARY,i_fo_merchant_time_account,FO_AccountID_MerchantCreationTime,FO_AccountID_ExecutionTime,FO_AccountID_Type_ExecutionTime,FO_AccountID_Type_MerchantCreationTime,accID_type_merchCreatTime_vfb | accID_type_merchCreatTime_vfb | 61 | NULL | 71276 | 100.00 | Using where; Using index |
| 1 | SIMPLE | table2 | NULL | eq_ref | table1ID,i_oc_fo_id | table1ID | 4 | finance.table1.ID | 1 | 100.00 | NULL |
+----+-------------+------------------+------------+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------+---------+----------------------+-------+----------+--------------------------+
I see that I scan 71276 rows in table1 and I can't seem to make this number lower.
Is there an index I can create to improve this query performance?
Move ValidForBilling before MerchantCreationTime in accID_type_merchCreatTime_vfb. You need to do ref lookups =TRUE before range uses in an index.
For table 2, seems to be a table1ID index already and appending CalculatedAmount will be able to be used in the result:
CREATE INDEX tbl1IDCalcAmount (table1ID,CalculatedAmount) ON table2

Horrible MySQL index behavior with a simplest IN statement

I have found that MySQL (Win 7 64, 5.6.14) does not use index properly if I specify table output for IN statement. USER table contains 900k records.
If I use IN (_SOME_TABLE_OUTPUT_) syntax - I get fullscan for all 900k users. Query runs forever.
If I use IN ('CONCRETE','VALUES') syntax - I get a correct index usage.
How can I make MySQL finally USE the index?
1st case:
explain SELECT gu.id FROM USER gu WHERE gu.uuid in
(select '11b6a540-0dc5-44e0-877d-b3b83f331231' union
select '11b6a540-0dc5-44e0-877d-b3b83f331232');
+----+--------------------+------------+-------+---------------+------+---------+------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------+-------+---------------+------+---------+------+--------+--------------------------+
| 1 | PRIMARY | gu | index | NULL | uuid | 257 | NULL | 829930 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
| 3 | DEPENDENT UNION | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
| NULL | UNION RESULT | <union2,3> | ALL | NULL | NULL | NULL | NULL | NULL | Using temporary |
+----+--------------------+------------+-------+---------------+------+---------+------+--------+--------------------------+
2nd case:
explain SELECT gu.id FROM USER gu WHERE gu.uuid in
('11b6a540-0dc5-44e0-877d-b3b83f331231');
+----+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
| 1 | SIMPLE | gu | ref | uuid | uuid | 257 | const | 1 | Using where; Using index |
+----+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
Table structure:
CREATE TABLE `USER` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`version` bigint(20) NOT NULL,
`email` varchar(255) DEFAULT NULL,
`uuid` varchar(255) NOT NULL,
`partner_id` bigint(20) NOT NULL,
`password` varchar(255) DEFAULT NULL,
`date_created` datetime DEFAULT NULL,
`last_updated` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unique-email` (`partner_id`,`email`),
KEY `uuid` (`uuid`),
CONSTRAINT `fk_USER_partner` FOREIGN KEY (`partner_id`) REFERENCES `partner` (`id`) ON DELETE CASCADE,
CONSTRAINT `FKB2D9FEBE725C505E` FOREIGN KEY (`partner_id`) REFERENCES `partner` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3315452 DEFAULT CHARSET=latin1
FORCE INDEX and USE INDEX statements don't change anything.
Demonstration SQLfiddle: http://sqlfiddle.com/#!2/c607e1/2
In fact I faced such problem before and it happened that I had one table that had a single column set as UTF-8 and the other tables where latin1. It did not matter what I did, MySQL insisted on using no indexes. The problem is quite well described on this blog post Slow queries in MySQL due to collation problems. Once you manage to fix the character set, I believe any of the queries will work.
An inner join on your virtual table might give you better performance. Try something along these lines.
SELECT gu.id
FROM USER gu
INNER JOIN (
select '11b6a540-0dc5-44e0-877d-b3b83f331231' uuid
union all
select '11b6a540-0dc5-44e0-877d-b3b83f331232') ids
on gu.uuid = ids.uuid;

Excluding large sets of objects from a query on a table with fast changing order

I have a table of products with a score column, which has a B-Tree Index on it. I have a query which returns products that have not been shown to the user in the current session. I can't simply use simple pagination with LIMIT for it, because the result should be ordered by the score column, which can change between query calls.
My current solution works like this:
SELECT *
FROM products p
LEFT JOIN product_seen ps
ON (ps.session_id = ? AND p.product_id = ps.product_id )
WHERE ps.product_id is null
ORDER BY p.score DESC
LIMIT 30;
This works fine for the first few pages, but the response time grows linear to the number of products already shown in the session and hits the second mark by the time this number reaches ~300. Is there a way to fasten this up in MySQL? Or should I solve this problem in an entirely other way?
Edit:
These are the two tables:
CREATE TABLE `products` (
`product_id` int(15) NOT NULL AUTO_INCREMENT,
`shop` varchar(15) NOT NULL,
`shop_id` varchar(25) NOT NULL,
`shop_category_id` varchar(20) DEFAULT NULL,
`shop_subcategory_id` varchar(20) DEFAULT NULL,
`shop_designer_id` varchar(20) DEFAULT NULL,
`shop_designer_name` varchar(40) NOT NULL,
`created_at` timestamp NULL DEFAULT NULL,
`product_url` varchar(255) NOT NULL,
`name` varchar(255) NOT NULL,
`description` mediumtext NOT NULL,
`price_cents` int(10) NOT NULL,
`list_image_url` varchar(255) NOT NULL,
`list_image_height` int(4) NOT NULL,
`ending` timestamp NULL DEFAULT NULL,
`category_id` int(5) NOT NULL,
`last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`included_at` timestamp NULL DEFAULT NULL,
`hearts` int(5) NOT NULL,
`score` decimal(10,5) NOT NULL,
`rand_field` decimal(16,15) NOT NULL,
`last_score_update` timestamp NULL DEFAULT NULL,
`active` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`product_id`),
UNIQUE KEY `unique_shop_id` (`shop`,`shop_id`),
KEY `score_index` (`active`,`score`),
KEY `included_at_index` (`included_at`),
KEY `active_category_score` (`active`,`category_id`,`score`),
KEY `active_category` (`active`,`category_id`,`product_id`),
KEY `active_products` (`active`,`product_id`),
KEY `active_rand` (`active`,`rand_field`),
KEY `active_category_rand` (`active`,`category_id`,`rand_field`)
) ENGINE=InnoDB AUTO_INCREMENT=55985 DEFAULT CHARSET=utf8
CREATE TABLE `product_seen` (
`seenby_id` int(20) NOT NULL AUTO_INCREMENT,
`session_id` varchar(25) NOT NULL,
`product_id` int(15) NOT NULL,
`last_seen` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`sorting` varchar(10) NOT NULL,
`in_category` int(3) DEFAULT NULL,
PRIMARY KEY (`seenby_id`),
KEY `last_seen_index` (`last_seen`),
KEY `session_id` (`session_id`,`seenby_id`),
KEY `session_id_2` (`session_id`,`sorting`,`seenby_id`)
) ENGINE=InnoDB AUTO_INCREMENT=17431 DEFAULT CHARSET=utf8
Edit 2:
The query above is a simplification, this is the real query with EXPLAIN:
EXPLAIN SELECT
DISTINCT p.product_id AS id,
p.list_image_url AS image,
p.list_image_height AS list_height,
hearts,
active AS available,
(UNIX_TIMESTAMP( ) - ulp.last_action) AS last_loved
FROM `looksandgoods`.`products` p
LEFT JOIN `looksandgoods`.`user_likes_products` ulp
ON ( p.product_id = ulp.product_id AND ulp.user_id =1 )
LEFT JOIN `looksandgoods`.`product_seen` sb
ON (sb.session_id = 'y7lWunZKKABgMoDgzjwDjZw1'
AND sb.sorting = 'trend'
AND p.product_id = sb.product_id )
WHERE p.active =1
AND sb.product_id IS NULL
ORDER BY p.score DESC
LIMIT 30 ;
Explain output, there is still a temp table and filesort, although the keys for the join exist:
+----+-------------+-------+-------+----------------------------------------------------------------------------------------------------+------------------+---------+----------------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+----------------------------------------------------------------------------------------------------+------------------+---------+----------------------------------+------+----------------------------------------------+
| 1 | SIMPLE | p | range | score_index,active_category_score,active_category,active_products,active_rand,active_category_rand | score_index | 1 | NULL | 2299 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | ulp | ref | love_count_index,user_to_product_index,product_id | love_count_index | 9 | looksandgoods.p.product_id,const | 1 | |
| 1 | SIMPLE | sb | ref | session_id,session_id_2 | session_id | 77 | const | 711 | Using where; Not exists; Distinct |
+----+-------------+-------+-------+----------------------------------------------------------------------------------------------------+------------------+---------+----------------------------------+------+----------------------------------------------+
New answer
I think the problem with the real query is the DISTINCT clause. The implication is that either or both of the product_seen and user_likes_products tables can join multiple rows for each product_id which could potentially appear in the result set (given the somewhat disturbing lack of UNIQUE KEYs on the product_seen table), and this is the reason you've included the DISTINCT clause. Unfortunately, it also means MySQL will have to create a temp table to process the query.
Before I go any further, if it's possible to do...
ALTER TABLE product_seen ADD UNIQUE KEY (session_id, product_id, sorting);
...and...
ALTER TABLE user_likes_products ADD UNIQUE KEY (user_id, product_id);
...then the DISTINCT clause is redundant, and removing it should eliminate the problem. N.B. I'm not suggesting you necessarily need to add these keys, but rather just to confirm that these fields are always unique.
If it's not possible, then there may be another solution, but I'd need to know a lot more about the tables involved in the joins.
Old answer
An EXPLAIN for your query yields...
+----+-------------+-------+------+---------------+------------+---------+-------+------+-------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------------+---------+-------+------+-------------------------+
| 1 | SIMPLE | p | ALL | NULL | NULL | NULL | NULL | 10 | Using filesort |
| 1 | SIMPLE | ps | ref | session_id | session_id | 27 | const | 1 | Using where; Not exists |
+----+-------------+-------+------+---------------+------------+---------+-------+------+-------------------------+
...which shows it's not using an index on the products table, so it's having to do a table scan and a filesort, which is why it's slow.
I noticed there's an index on (active, score) which you could use by changing the query to only show active products...
SELECT *
FROM products p
LEFT JOIN product_seen ps
ON (ps.session_id = ? AND p.product_id = ps.product_id )
WHERE p.active=TRUE AND ps.product_id is null
ORDER BY p.score DESC
LIMIT 30;
...which changes the EXPLAIN to...
+----+-------------+-------+-------+-----------------------------+-------------+---------+-------+------+-------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+-----------------------------+-------------+---------+-------+------+-------------------------+
| 1 | SIMPLE | p | range | score_index,active_products | score_index | 1 | NULL | 10 | Using where |
| 1 | SIMPLE | ps | ref | session_id | session_id | 27 | const | 1 | Using where; Not exists |
+----+-------------+-------+-------+-----------------------------+-------------+---------+-------+------+-------------------------+
...which is now doing a range scan and no filesort, which should be much faster.
Or if you want it to also return inactive products, then you'll need to add an index on score only, with...
ALTER TABLE products ADD KEY (score);

How to prevent a full table scan when doing a simple JOIN?

I have two tables, TableA and TableB:
CREATE TABLE `TableA` (
`shared_id` int(10) unsigned NOT NULL default '0',
`foo` int(10) unsigned NOT NULL,
PRIMARY KEY (`shared_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
CREATE TABLE `TableB` (
`shared_id` int(10) unsigned NOT NULL auto_increment,
`bar` int(10) unsigned NOT NULL,
KEY `shared_id` (`shared_id`)
) ENGINE=MyISAM AUTO_INCREMENT=1001 DEFAULT CHARSET=latin1
Here's my query:
SELECT TableB.bar
FROM TableB, TableA
WHERE TableA.foo = 1000
AND TableA.shared_id = TableB.shared_id;
Here's the problem:
mysql> explain SELECT TableB.bar FROM TableB, TableA WHERE TableA.foo = 1000 AND TableA.shared_id = TableB.shared_id;
+----+-------------+--------------+--------+---------------+---------+---------+------------------------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+--------+---------------+---------+---------+------------------------------------------+------+-------------+
| 1 | SIMPLE | TableB | ALL | shared_id | NULL | NULL | NULL | 1000 | |
| 1 | SIMPLE | TableA | eq_ref | PRIMARY | PRIMARY | 4 | MyDatabase.TableB.shared_id | 1 | Using where |
+----+-------------+--------------+--------+---------------+---------+---------+------------------------------------------+------+-------------+
Is there an index that I can add that will prevent the full table scan of TableB?
Runcible, your query could use some rewriting. You should always specify your JOIN conditions in an ON clause and not in a WHERE.
Your query would become:
SELECT TableB.bar
FROM TableB
JOIN TableA
ON TableB.shared_id = TableA.shared_id
AND TableA.foo = 1000;
Not only do you want to do this:
ALTER TABLE TableB ADD INDEX (shared_id,bar);
You'll want to add an index to A as follows:
ALTER TABLE TableA ADD INDEX (foo, shared_id);
Do this, and provide the EXPLAIN output please.
Also note that by adding an index on (shared_id, bar) you just made your (shared_id) index redundant. Drop it.