MySQL: How to optimize this query

I want to reduce the time it takes to query data through a view.
My tables have the following structure:
The Rings table contains individual rings. Each ring has a unique combination of ID_RingType and Number, but also an ID, which is used as a foreign key elsewhere.
-- RINGS
CREATE TABLE `Rings` (
ID INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
ID_RingType CHAR(2) NOT NULL,
Number MEDIUMINT UNSIGNED NOT NULL,
ID_RingStatus TINYINT DEFAULT 1,
ID_User INT(11),
DateLastChange TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
FOREIGN KEY (ID_RingType) REFERENCES RingType(Code),
FOREIGN KEY (ID_RingStatus) REFERENCES RingStatus(ID),
FOREIGN KEY (ID_User) REFERENCES `848-cso`.`Users`(UID)
);
-- create index on the triple ID_User, ID_RingType, Number
CREATE INDEX idx_rings ON `Rings` (ID_User, ID_RingType, Number);
CREATE INDEX idx_rings_overview ON `Rings` (ID_RingType, Number, ID_RingStatus);
CREATE INDEX idx_rings_numbers ON `Rings` (ID_RingStatus, ID_User, ID_RingType, Number);
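Note: the uniqueness of (ID_RingType, Number) described above is not enforced by this DDL. A minimal sketch of the constraint, if one wanted to enforce it (the key name is only illustrative):
-- sketch only: enforce the uniqueness of (ID_RingType, Number); key name is illustrative
ALTER TABLE `Rings`
    ADD UNIQUE KEY uq_rings_type_number (ID_RingType, Number);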
RingStatus contains only 4 status values and their meanings:
-- RING STATUS
CREATE TABLE `RingStatus` (
ID TINYINT NOT NULL PRIMARY KEY,
Name VARCHAR(20) UNIQUE COLLATE utf8_czech_ci,
NameEng VARCHAR(20)
);
RingType is identified by a two-letter Code:
-- RING TYPE
CREATE TABLE `RingType` (
Code CHAR(2) NOT NULL PRIMARY KEY,
Material VARCHAR(30) COLLATE utf8_czech_ci,
Radius DOUBLE UNSIGNED,
MaxVal MEDIUMINT UNSIGNED NOT NULL
);
Moreover, I use the following function:
/*
Function returns a tinyint(1) specifying whether the ring was assigned
*/
CREATE FUNCTION fn_isRingAssigned (idRingStatus TINYINT)
RETURNS TINYINT(1) DETERMINISTIC
RETURN IF(idRingStatus = 1,1,2);
The query I am trying to optimize is stored in the following VIEW:
/*
View finds contiguous ranges of rings grouped by type, radius and status
*/
ALTER VIEW vw_rings_overview AS SELECT
a.ID_RingType,
rt.Radius,
fn_isRingAssigned(a.ID_RingStatus) AS status,
rs.Name,
a.Number AS min,
MIN(b.Number) AS max
FROM
RingStatus AS rs, Rings AS a
JOIN RingType AS rt ON a.ID_RingType = rt.Code
JOIN Rings AS b
ON a.ID_RingType = b.ID_RingType
AND fn_isRingAssigned(a.ID_RingStatus) = fn_isRingAssigned(b.ID_RingStatus)
AND a.Number <= b.Number
WHERE NOT EXISTS
( SELECT 1
FROM Rings AS c
WHERE c.ID_RingType = a.ID_RingType
AND fn_isRingAssigned(c.ID_RingStatus) = fn_isRingAssigned(a.ID_RingStatus)
AND c.Number = a.Number - 1
)
AND NOT EXISTS
( SELECT 1
FROM Rings AS d
WHERE d.ID_RingType = b.ID_RingType
AND fn_isRingAssigned(d.ID_RingStatus) = fn_isRingAssigned(b.ID_RingStatus)
AND d.Number = b.Number + 1
)
AND fn_isRingAssigned(a.ID_RingStatus) = rs.ID
GROUP BY
a.ID_RingType,
fn_isRingAssigned(a.ID_RingStatus),
a.Number
ORDER BY
a.ID_RingType,
a.Number;
The data in the Rings table looks as follows:
+----+-------------+--------+---------------+---------+---------------------+
| ID | ID_RingType | Number | ID_RingStatus | ID_User | DateLastChange |
+----+-------------+--------+---------------+---------+---------------------+
| 1 | A | 1 | 4 | 2 | 2015-12-02 19:02:50 |
| 2 | A | 2 | 4 | 2 | 2015-12-02 19:02:56 |
| 3 | A | 3 | 4 | 2 | 2015-12-02 19:22:29 |
| 4 | A | 4 | 4 | 2 | 2015-12-21 20:32:24 |
| 5 | A | 5 | 4 | 2 | 2015-12-21 20:52:08 |
| 6 | A | 6 | 4 | 2 | 2015-12-21 20:52:22 |
| 7 | A | 7 | 1 | 2 | 2015-12-02 19:00:23 |
| 8 | A | 8 | 1 | 2 | 2015-12-02 19:00:23 |
| 9 | A | 9 | 1 | 2 | 2015-12-02 19:00:23 |
| 10 | A | 10 | 1 | 2 | 2015-12-02 19:00:23 |
+----+-------------+--------+---------------+---------+---------------------+
And the results of the query look like this:
mysql> select * from vw_rings_overview;
+-------------+--------+--------+----------------+-----+-------+
| ID_RingType | Radius | status | Name | min | max |
+-------------+--------+--------+----------------+-----+-------+
| A | 20 | 2 | Assigned | 1 | 6 |
| A | 20 | 1 | Not assigned | 7 | 10 |
+-------------+--------+--------+----------------+-----+-------+
The view finds contiguous ranges of rings that have the same ring type, status and radius.
The Rings table currently contains fewer than 30,000 rows, and querying takes approx. 2 seconds. It is expected to contain a few million rows, so I wish to optimize the design of the tables, the indexes and the view.
Here is the result of EXPLAIN:
mysql> explain select * from vw_rings_overview;
+----+--------------------+------------+--------+--------------------+--------------------+---------+-----------------------------+-------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------+--------+--------------------+--------------------+---------+-----------------------------+-------+-----------------------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 19 | |
| 2 | DERIVED | a | index | idx_rings_overview | idx_rings_overview | 7 | NULL | 25173 | Using where; Using index; Using temporary; Using filesort |
| 2 | DERIVED | rt | eq_ref | PRIMARY | PRIMARY | 2 | 848-avi2.a.ID_RingType | 1 | |
| 2 | DERIVED | rs | eq_ref | PRIMARY | PRIMARY | 1 | func | 1 | Using where |
| 2 | DERIVED | b | ref | idx_rings_overview | idx_rings_overview | 2 | 848-avi2.rt.Code | 1573 | Using where; Using index |
| 4 | DEPENDENT SUBQUERY | d | ref | idx_rings_overview | idx_rings_overview | 5 | 848-avi2.b.ID_RingType,func | 1 | Using where; Using index |
| 3 | DEPENDENT SUBQUERY | c | ref | idx_rings_overview | idx_rings_overview | 5 | 848-avi2.a.ID_RingType,func | 1 | Using where; Using index |
+----+--------------------+------------+--------+--------------------+--------------------+---------+-----------------------------+-------+-----------------------------------------------------------+
Here are some sample data: http://sqlfiddle.com/#!9/b8b489/1
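For reference, a minimal sketch of an alternative gaps-and-islands formulation, assuming window functions are available (MySQL 8.0+); Radius and the status Name would be joined back afterwards:
-- Sketch, not the original view: Number - ROW_NUMBER() stays constant
-- within each contiguous run of the same ring type and status.
SELECT ID_RingType,
       status,
       MIN(Number) AS min,
       MAX(Number) AS max
FROM (
    SELECT ID_RingType,
           Number,
           fn_isRingAssigned(ID_RingStatus) AS status,
           Number - ROW_NUMBER() OVER (
               PARTITION BY ID_RingType, fn_isRingAssigned(ID_RingStatus)
               ORDER BY Number
           ) AS grp
    FROM Rings
) AS runs
GROUP BY ID_RingType, status, grp
ORDER BY ID_RingType, MIN(Number);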

Related

How to perform a sum for all previous records

I've been trying to implement the solution here with the added flavour of updating existing records. As an MRE, I'm looking to populate the sum_date_diff column in a table with the sum of all the differences between the current row's date and the date of every previous row where the current row's p1_id matches the previous row's p1_id or p2_id. I have already filled out the expected result below:
+-----+------------+-------+-------+---------------+
| id_ | date_time | p1_id | p2_id | sum_date_diff |
+-----+------------+-------+-------+---------------+
| 1 | 2000-01-01 | 1 | 2 | Null |
| 2 | 2000-01-02 | 2 | 4 | 1 |
| 3 | 2000-01-04 | 1 | 3 | 3 |
| 4 | 2000-01-07 | 2 | 5 | 11 |
| 5 | 2000-01-15 | 2 | 3 | 35 |
| 6 | 2000-01-20 | 1 | 3 | 35 |
| 7 | 2000-01-31 | 1 | 3 | 68 |
+-----+------------+-------+-------+---------------+
My query so far looks like:
UPDATE test.sum_date_diff AS sdd0
JOIN
(SELECT
id_,
SUM(DATEDIFF(sdd1.date_time, sq.date_time)) AS sum_date_diff
FROM
test.sum_date_diff AS sdd1
LEFT OUTER JOIN (SELECT
sdd2.date_time AS date_time, sdd2.p1_id AS player_id
FROM
test.sum_date_diff AS sdd2 UNION ALL SELECT
sdd3.date_time AS date_time, sdd3.p2_id AS player_id
FROM
test.sum_date_diff AS sdd3) AS sq ON sq.date_time < sdd1.date_time
AND sq.player_id = sdd1.p1_id
GROUP BY sdd1.id_) AS master_sq ON master_sq.id_ = sdd0.id_
SET
sdd0.sum_date_diff = master_sq.sum_date_diff
This works as shown here.
However, on a table of 1.5M records the query has been hanging for the last hour. Even when I add a WHERE clause at the bottom to restrict the update to a single record, it still hangs for 5+ minutes.
Here is the EXPLAIN statement for the query on the full table:
+----+-------------+---------------+------------+-------+-----------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------+---------+-------+---------+----------+--------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------------+------------+-------+-----------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------+---------+-------+---------+----------+--------------------------------------------+
| 1 | UPDATE | sum_date_diff | NULL | const | PRIMARY | PRIMARY | 4 | const | 1 | 100 | NULL |
| 1 | PRIMARY | <derived2> | NULL | ref | <auto_key0> | <auto_key0> | 4 | const | 10 | 100 | NULL |
| 2 | DERIVED | sum_date_diff | NULL | index | PRIMARY,ix__match_oc_history__date_time,ix__match_oc_history__p1_id,ix__match_oc_history__p2_id,ix__match_oc_history__date_time_players | ix__match_oc_history__date_time_players | 14 | NULL | 1484288 | 100 | Using index; Using temporary |
| 2 | DERIVED | <derived3> | NULL | ALL | NULL | NULL | NULL | NULL | 2968576 | 100 | Using where; Using join buffer (hash join) |
| 3 | DERIVED | sum_date_diff | NULL | index | NULL | ix__match_oc_history__date_time_players | 14 | NULL | 1484288 | 100 | Using index |
| 4 | UNION | sum_date_diff | NULL | index | NULL | ix__match_oc_history__date_time_players | 14 | NULL | 1484288 | 100 | Using index |
+----+-------------+---------------+------------+-------+-----------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------+---------+-------+---------+----------+--------------------------------------------+
Here is the CREATE TABLE statement:
CREATE TABLE `sum_date_diff` (
`id_` int NOT NULL AUTO_INCREMENT,
`date_time` datetime DEFAULT NULL,
`p1_id` int NOT NULL,
`p2_id` int NOT NULL,
`sum_date_diff` int DEFAULT NULL,
PRIMARY KEY (`id_`),
KEY `ix__sum_date_diff__date_time` (`date_time`),
KEY `ix__sum_date_diff__p1_id` (`p1_id`),
KEY `ix__sum_date_diff__p2_id` (`p2_id`),
KEY `ix__sum_date_diff__date_time_players` (`date_time`,`p1_id`,`p2_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1822120 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
MySQL version is 8.0.26, running on a 2016 MacBook Pro with Monterey and 16 GB RAM.
After reading around about boosting the RAM available to MySQL I've added the following to the standard my.cnf file:
innodb_buffer_pool_size = 8G
tmp_table_size=2G
max_heap_table_size=2G
I'm wondering if:
1. I've done something wrong
2. This is just a very slow task no matter what I do
3. There is a faster method
I'm hoping someone could enlighten me!
Whereas it is possible to do calculations like this in SQL, it is messy. If the number of rows is not in the millions, I would fetch the necessary columns into my application and do the arithmetic there. (Loops are easier and faster in PHP/Java/etc than in SQL.)
LEAD() and LAG() are possible, but they are not optimized well (at least in my experience). In an app language, it is easy and efficient to look up things in arrays.
The SELECT can (easily and efficiently) do any filtering and sorting so that the app only receives the necessary data.
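For reference, here is a set-based sketch of the same calculation using MySQL 8.0 window functions (one of the in-SQL options), assuming date_time values are distinct within each player's history, since the original uses a strict < comparison. Whether it beats app-side arithmetic at 1.5M rows would need measuring.
-- Sketch only, not the original query. Unpivot (date_time, player) events,
-- keep a running count and running sum of TO_DAYS(date_time) per player,
-- then sum_date_diff = prev_count * TO_DAYS(current date) - prev_sum_of_days.
WITH events AS (
    SELECT id_, date_time, p1_id AS player_id FROM sum_date_diff
    UNION ALL
    SELECT id_, date_time, p2_id FROM sum_date_diff
),
running AS (
    SELECT id_,
           player_id,
           (COUNT(*) OVER w) - 1                                  AS prev_cnt,
           (SUM(TO_DAYS(date_time)) OVER w) - TO_DAYS(date_time)  AS prev_days
    FROM events
    WINDOW w AS (PARTITION BY player_id
                 ORDER BY date_time
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
)
SELECT s.id_,
       CASE WHEN r.prev_cnt = 0 THEN NULL
            ELSE r.prev_cnt * TO_DAYS(s.date_time) - r.prev_days
       END AS sum_date_diff
FROM sum_date_diff AS s
JOIN running AS r
  ON r.id_ = s.id_ AND r.player_id = s.p1_id;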

Query on the same database runs in 0.4 seconds in MariaDB and 83 seconds in MySQL 5.6, 5.7 and 8

I have the following query that runs really slowly on MySQL (83 seconds) but really fast on MariaDB (0.4 seconds).
I verified that both databases have the same indexes and data. The MariaDB server has less CPU (1 vCPU) and memory (2 GB);
the MySQL servers have 8-32 GB RAM and full quad-core processors (I tried 5.6, 5.7 and 8.0 with similar results).
The phppos_inventory table has ~170,000 rows and the phppos_items table has ~3,000 rows.
Here are the query, the tables and the EXPLAIN output:
SELECT /*+ SEMIJOIN(#subq MATERIALIZATION) */ SQL_CALC_FOUND_ROWS
1 AS _h,
`phppos_location_items`.`location_id` AS `location_id`,
`phppos_items`.`item_id`,
`phppos_items`.`name`,
`phppos_categories`.`id` AS `category_id`,
`phppos_categories`.`name` AS `category`,
`location`,
`company_name`,
`phppos_items`.`item_number`,
`size`,
`product_id`,
Coalesce(phppos_location_item_variations.cost_price,
phppos_item_variations.cost_price, phppos_location_items.cost_price,
phppos_items.cost_price, 0) AS cost_price,
Coalesce(phppos_location_item_variations.unit_price,
phppos_item_variations.unit_price, phppos_location_items.unit_price,
phppos_items.unit_price, 0) AS unit_price,
Sum(Coalesce(inv.trans_current_quantity, 0)) AS quantity,
Coalesce(phppos_location_item_variations.reorder_level,
phppos_item_variations.reorder_level, phppos_location_items.reorder_level,
phppos_items.reorder_level) AS reorder_level,
Coalesce(phppos_location_item_variations.replenish_level,
phppos_item_variations.replenish_level, phppos_location_items.replenish_level,
phppos_items.replenish_level) AS replenish_level,
description
FROM `phppos_inventory` `inv`
LEFT JOIN `phppos_items`
ON `phppos_items`.`item_id` = `inv`.`trans_items`
LEFT JOIN `phppos_location_items`
ON `phppos_location_items`.`item_id` = `phppos_items`.`item_id`
AND `phppos_location_items`.`location_id` = `inv`.`location_id`
LEFT JOIN `phppos_item_variations`
ON `phppos_items`.`item_id` = `phppos_item_variations`.`item_id`
AND `phppos_item_variations`.`id` = `inv`.`item_variation_id`
AND `phppos_item_variations`.`deleted` = 0
LEFT JOIN `phppos_location_item_variations`
ON `phppos_location_item_variations`.`item_variation_id` =
`phppos_item_variations`.`id`
AND `phppos_location_item_variations`.`location_id` =
`inv`.`location_id`
LEFT OUTER JOIN `phppos_suppliers`
ON `phppos_items`.`supplier_id` =
`phppos_suppliers`.`person_id`
LEFT OUTER JOIN `phppos_categories`
ON `phppos_items`.`category_id` = `phppos_categories`.`id`
WHERE inv.trans_id = (SELECT Max(inv1.trans_id)
FROM phppos_inventory inv1
WHERE inv1.trans_items = inv.trans_items
AND ( inv1.item_variation_id =
phppos_item_variations.id
OR phppos_item_variations.id IS NULL )
AND inv1.location_id = inv.location_id
AND inv1.trans_date < '2019-12-31 23:59:59')
AND inv.location_id IN( 1 )
AND `phppos_items`.`system_item` = 0
AND `phppos_items`.`deleted` = 0
AND `is_service` != 1
GROUP BY `phppos_items`.`item_id`
LIMIT 20
EXPLAIN on MySQL (slightly different from MariaDB, but I tried USE INDEX to match the execution plan and it was still slow):
+------------------------------------------+-------+----------+------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+---------------------------------+------------+--------+------------------------------+-------+----------+------------------------------------+
| 1 | PRIMARY | phppos_items | NULL | ref | PRIMARY,item_number,product_id,phppos_items_ibfk_1,deleted,phppos_items_ibfk_3,phppos_items_ibfk_4,phppos_items_ibfk_5,description,size,reorder_level,cost_price,unit_price,promo_price,last_modified,name,phppos_items_ibfk_6,deleted_system_item,custom_field_1_value,custom_field_2_value,custom_field_3_value,custom_field_4_value,custom_field_5_value,custom_field_6_value,custom_field_7_value,custom_field_8_value,custom_field_9_value,custom_field_10_value,verify_age,phppos_items_ibfk_7,item_inactive_index,tags,full_search,name_search,item_number_search,product_id_search,description_search,size_search,custom_field_1_value_search,custom_field_2_value_search,custom_field_3_value_search,custom_field_4_value_search,custom_field_5_value_search,custom_field_6_value_search,custom_field_7_value_search,custom_field_8_value_search,custom_field_9_value_search,custom_field_10_value_search | deleted | 4 | const | 21188 | 9.00 | Using index condition; Using where |
| 1 | PRIMARY | inv | NULL | ref | phppos_inventory_ibfk_1,location_id,phppos_inventory_custom | phppos_inventory_custom | 8 | pos.phppos_items.item_id,const | 3 | 100.00 | NULL |
| 1 | PRIMARY | phppos_location_items | NULL | eq_ref | PRIMARY,phppos_location_items_ibfk_2 | PRIMARY | 8 | const,pos.phppos_items.item_id | 1 | 100.00 | NULL |
| 1 | PRIMARY | phppos_item_variations | NULL | eq_ref | PRIMARY,phppos_item_variations_ibfk_1 | PRIMARY | 4 | pos.inv.item_variation_id | 1 | 100.00 | Using where |
| 1 | PRIMARY | phppos_location_item_variations | NULL | eq_ref | PRIMARY,phppos_item_attribute_location_values_ibfk_2 | PRIMARY | 8 | pos.phppos_item_variations.id,const | 1 | 100.00 | NULL |
| 1 | PRIMARY | phppos_suppliers | NULL | ref | person_id | person_id | 4 | pos.phppos_items.supplier_id | 1 | 100.00 | NULL |
| 1 | PRIMARY | phppos_categories | NULL | eq_ref | PRIMARY | PRIMARY | 4 | pos.phppos_items.category_id | 1 | 100.00 | NULL |
| 2 | DEPENDENT SUBQUERY | inv1 | NULL | ref | phppos_inventory_ibfk_1,location_id,trans_date,phppos_inventory_ibfk_4,phppos_inventory_custom | phppos_inventory_custom | 8 | pos.inv.trans_items,pos.inv.location_id | 3 | 50.00 | Using where; Using index |
+----+--------------------+---------------------------------+------------+--------+---------------------------------------------------------------------------------------------------------
EXPLAIN on MariaDB:
+------+---------------------------------------------+-------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------------+---------------------------------+--------+------------------------------+
| 1 | PRIMARY | phppos_items | ref | PRIMARY,deleted,deleted_system_item | deleted | 4 | const | 23955 | Using where |
| 1 | PRIMARY | inv | ref | phppos_inventory_ibfk_1,location_id,phppos_inventory_custom | phppos_inventory_ibfk_1 | 4 | freelance_pos5.phppos_items.item_id | 2 | Using where |
| 1 | PRIMARY | phppos_location_items | eq_ref | PRIMARY,phppos_location_items_ibfk_2 | PRIMARY | 8 | const,freelance_pos5.phppos_items.item_id | 1 | |
| 1 | PRIMARY | phppos_item_variations | eq_ref | PRIMARY,phppos_item_variations_ibfk_1 | PRIMARY | 4 | freelance_pos5.inv.item_variation_id | 1 | Using where |
| 1 | PRIMARY | phppos_location_item_variations | eq_ref | PRIMARY,phppos_item_attribute_location_values_ibfk_2 | PRIMARY | 8 | freelance_pos5.phppos_item_variations.id,const | 1 | Using where |
| 1 | PRIMARY | phppos_suppliers | ref | person_id | person_id | 4 | freelance_pos5.phppos_items.supplier_id | 1 | Using where |
| 1 | PRIMARY | phppos_categories | eq_ref | PRIMARY | PRIMARY | 4 | freelance_pos5.phppos_items.category_id | 1 | Using where |
| 2 | DEPENDENT SUBQUERY | inv1 | ref | phppos_inventory_ibfk_1,location_id,trans_date,phppos_inventory_ibfk_4,phppos_inventory_custom | phppos_inventory_custom | 8 | freelance_pos5.inv.trans_items,freelance_pos5.inv.location_id | 2 | Using where; Using index |
+------+--------------------+---------------------------------+--------+------------------------------------------------------------------------------------------------+-------------------------+---------+---------------------------------------------------------------+-------+--------------------------+
Tables described (Reached StackOverflow char limit)
https://pastebin.com/nhngSHb8
Create tables:
https://pastebin.com/aWMeriqt
MYSQL (DEV BOX)
mysql> SHOW GLOBAL STATUS LIKE '%thread%';
+------------------------------------------+-------+
| Variable_name | Value |
+------------------------------------------+-------+
| Delayed_insert_threads | 0 |
| Performance_schema_thread_classes_lost | 0 |
| Performance_schema_thread_instances_lost | 0 |
| Slow_launch_threads | 0 |
| Threads_cached | 4 |
| Threads_connected | 1 |
| Threads_created | 5 |
| Threads_running | 1 |
+------------------------------------------+-------+
8 rows in set (0.06 sec)
MARIA DB
MariaDB [freelance_pos5]> SHOW GLOBAL STATUS LIKE '%thread%';
+------------------------------------------+-------+
| Variable_name | Value |
+------------------------------------------+-------+
| Delayed_insert_threads | 0 |
| Performance_schema_thread_classes_lost | 0 |
| Performance_schema_thread_instances_lost | 0 |
| Slow_launch_threads | 0 |
| Threadpool_idle_threads | 0 |
| Threadpool_threads | 0 |
| Threads_cached | 3 |
| Threads_connected | 2 |
| Threads_created | 5 |
| Threads_running | 1 |
| wsrep_applier_thread_count | 0 |
| wsrep_rollbacker_thread_count | 0 |
| wsrep_thread_count | 0 |
+------------------------------------------+-------+
13 rows in set (0.00 sec)
Moving the correlated subquery
WHERE inv.trans_id = (SELECT Max(inv1.trans_id) ...)
into an INNER JOIN against a derived table is the game changer: the per-group MAX(trans_id) is materialized once instead of being re-evaluated for every row.
INNER JOIN (
SELECT inv1.trans_items, inv1.item_variation_id, inv1.location_id, MAX(inv1.trans_id) as trans_id
FROM phppos_inventory inv1
WHERE inv1.trans_date < '2019-12-31 23:59:59'
GROUP BY inv1.trans_items, inv1.item_variation_id, inv1.location_id
ORDER BY inv1.trans_items, inv1.item_variation_id, inv1.location_id
) inv1 on inv1.trans_id = inv.trans_id
AND inv1.trans_items = inv.trans_items
AND (inv1.item_variation_id = phppos_item_variations.id OR phppos_item_variations.id IS NULL)
AND inv1.location_id = inv.location_id
Execution time is reduced from 80+ seconds down to under 0.4 seconds on MySQL 8.0.
MariaDB's and MySQL's optimizers started diverging significantly at 5.6. Certain queries will run faster in one than in the other.
I think I see a way to speed up the query, perhaps on both versions.
Don't use LEFT JOIN when it is the same as JOIN, which seems to be the case for at least phppos_items, which has predicates in the WHERE clause that override the LEFT.
Please provide SHOW CREATE TABLE; meanwhile, I will guess at what indexes you have/don't have, and assume that each table has PRIMARY KEY(id).
Use composite indexes where appropriate. (More below.)
Get the 20 rows before JOINing to the rest of the tables:
SELECT ...
FROM ( SELECT inv.id AS inv_id, pi.id AS item_id
FROM `phppos_inventory` AS inv
JOIN `phppos_items` AS pi
ON pi.`item_id` = `inv`.`trans_items`
AND inv.location_id IN( 1 )
AND pi.`system_item` = 0
AND pi.`deleted` = 0
AND `is_service` != 1 -- Which table is this in???
GROUP BY pi.`item_id`
LIMIT 20 ) AS top20
LEFT JOIN .... (( all the other tables ))
-- no GROUP BY or LIMIT needed (I think)
phppos_items: INDEX(item_id, deleted, system_item, is_service)
phppos_items: INDEX(deleted, system_item, is_service)
phppos_inventory: INDEX(trans_items, location_id, item_variation_id, trans_date, trans_id)
phppos_inventory: INDEX(location_id)
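The shorthand above written out as DDL (index names are only illustrative, and is_service is assumed to live in phppos_items, as in the suggestion above):
-- Illustrative DDL for the suggested composite indexes; index names are made up
ALTER TABLE phppos_items
    ADD INDEX idx_items_id_filter (item_id, deleted, system_item, is_service),
    ADD INDEX idx_items_filter    (deleted, system_item, is_service);
ALTER TABLE phppos_inventory
    ADD INDEX idx_inv_latest   (trans_items, location_id, item_variation_id, trans_date, trans_id),
    ADD INDEX idx_inv_location (location_id);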
Aside from the fact that the query is misleading since the outer joins are effectively discarded, the main difference is that the second engine operation in MariaDB is an index lookup (ref) using the phppos_inventory_ibfk_1 index, while MySQL chose phppos_inventory_custom.
However, without the definition of these two indexes it's difficult to assess why the engines may have chosen different paths.
Please add to your question the definition of these indexes, and also their selectivity (percent of estimated rows selected / total table rows), to elaborate more.

How to avoid temporary table on group by with join?

I have two tables, say (for example) Department and Members.
Department table description:
CREATE TABLE `Department` (
`code` int(10) DEFAULT NULL,
`name` char(100) DEFAULT NULL,
KEY `code_index` (`code`),
KEY `name_index` (`name`)
)
Department table values:
+------+-------------+
| code | name |
+------+-------------+
| 1 | Production |
| 2 | Development |
| 3 | Management |
+------+-------------+
Members table description:
CREATE TABLE `Members` (
`department_code` int(10) DEFAULT NULL,
`name` char(100) DEFAULT NULL,
KEY `department_code_index` (`department_code`),
KEY `name_index` (`name`)
)
Members table values:
+-----------------+----------------+
| department_code | name |
+-----------------+----------------+
| 1 | Ross Geller |
| 1 | Monica Geller |
| 1 | Phoebe Buffay |
| 1 | Rachel Green |
| 1 | Chandler Bing |
| 1 | Joey Tribianni |
| 2 | Janice |
| 2 | Gunther |
| 2 | Cathy |
| 2 | Emily |
| 2 | Fun Bobby |
| 2 | Heckles |
| 3 | Paolo |
| 3 | Mike Hannigan |
| 3 | Carol |
| 3 | Susan |
| 3 | Richard |
| 3 | Tag |
+-----------------+----------------+
I want to get all the department codes and names for the given set of users. As I just want the department names alone, I used the below query.
mysql> select Department.code, Department.name, Members.department_code from Department left join Members on (Department.code=Members.department_code) where Members.name in ('Rachel Green', 'Gunther', 'Paolo') group by Department.code;
+------+-------------+-----------------+
| code | name | department_code |
+------+-------------+-----------------+
| 1 | Production | 1 |
| 2 | Development | 2 |
| 3 | Management | 3 |
+------+-------------+-----------------+
This works fine and EXPLAIN gives me the below execution plan.
+----+-------------+------------+------------+------+----------------------------------+-----------------------+---------+----------------------+------+----------+---------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+------+----------------------------------+-----------------------+---------+----------------------+------+----------+---------------------------------+
| 1 | SIMPLE | Department | NULL | ALL | code_index | NULL | NULL | NULL | 3 | 100.00 | Using temporary; Using filesort |
| 1 | SIMPLE | Members | NULL | ref | department_code_index,name_index | department_code_index | 5 | test.Department.code | 1 | 16.67 | Using where |
+----+-------------+------------+------------+------+----------------------------------+-----------------------+---------+----------------------+------+----------+---------------------------------+
But the "group by" uses temporary table which may degrade the performance if the Members table contains a lot of rows. Though I guess some ideal indexing would help out here, i can't get the proper idea. Any help will be appreciated.
Thanks in advance!
You can avoid the group by over all the data using a subquery:
select d.code, d.name
from Department d
where exists (select 1
from Members m
where d.code = m.department_code and
m.name in ('Rachel Green', 'Gunther', 'Paolo')
);
With an index on members(department_code, name), this should be much faster.
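A sketch of that index against the posted schema (the index name is only illustrative):
-- composite index suggested above; the name is illustrative
ALTER TABLE `Members`
    ADD INDEX idx_members_dept_name (department_code, name);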

How to create an index on a CONCAT("string", column) in MySQL?

I have a table where id is the primary key.
CREATE TABLE t1 (
id INT NOT NULL AUTO_INCREMENT,
col1 VARCHAR(45) NULL,
PRIMARY KEY (id));
I have another table t2 which joins to table t1 as:
t2 LEFT JOIN t1 ON CONCAT("USER_", t1.id) = t2.user_id
I want to create an index which has CONCAT("USER_", t1.id) values indexed in any order.
I tried
ALTER TABLE t1 ADD INDEX ((CONCAT('user_',id) DESC);
but it gives an error.
I have followed the official MySQL documentation:
Note : I do not want to create a new CONCAT("user_", id) column.
https://dev.mysql.com/doc/refman/8.0/en/create-index.html#create-index-column-prefixes
InnoDB supports secondary indexes on virtual generated columns.
https://dev.mysql.com/doc/refman/5.7/en/create-table-secondary-indexes.html
In 5.7 (onward) you can use a generated column, then index that column, e.g.
Here is an example of taking the integer out of the string to create an efficient join:
CREATE TABLE myusers (
id mediumint(8) unsigned NOT NULL auto_increment
, name varchar(255) default NULL,
PRIMARY KEY (`id`)
) AUTO_INCREMENT=1
;
INSERT INTO myusers (`name`) VALUES ('Imelda'),('Hamish'),('Brandon'),('Amity'),('Jillian'),('Lionel'),('Faith'),('Dai'),('Reed'),('Molly');
CREATE TABLE mytable (
id mediumint(8) unsigned NOT NULL auto_increment
, user_id VARCHAR(20)
, ex_user_id integer GENERATED ALWAYS AS (0+substring(user_id,6,20))
, password varchar(255)
, PRIMARY KEY (`id`)
, INDEX idx_ex_user_id (ex_user_id)
) AUTO_INCREMENT=1
;
INSERT INTO mytable (`user_id`,`password`) VALUES
('user_1','PYX68BIC9RD')
,('user_2','LPY07EIN0UA')
,('user_3','UGC24TKI3JL')
,('user_4','YQU18ALB8YA')
,('user_5','DEL56AGR6AD')
,('user_6','YQN87UOB0PO')
,('user_7','CPC15JFU6MC')
,('user_8','MWC40ZWD2EE')
,('user_9','HEB34QQH0UM')
,('user_10','GVP36PLP5PW')
;
select
*
from myusers
inner join mytable on myusers.id = mytable.ex_user_id
;
id | name | id | user_id | ex_user_id | password
-: | :------ | -: | :------ | ---------: | :----------
1 | Imelda | 1 | user_1 | 1 | PYX68BIC9RD
2 | Hamish | 2 | user_2 | 2 | LPY07EIN0UA
3 | Brandon | 3 | user_3 | 3 | UGC24TKI3JL
4 | Amity | 4 | user_4 | 4 | YQU18ALB8YA
5 | Jillian | 5 | user_5 | 5 | DEL56AGR6AD
6 | Lionel | 6 | user_6 | 6 | YQN87UOB0PO
7 | Faith | 7 | user_7 | 7 | CPC15JFU6MC
8 | Dai | 8 | user_8 | 8 | MWC40ZWD2EE
9 | Reed | 9 | user_9 | 9 | HEB34QQH0UM
10 | Molly | 10 | user_10 | 10 | GVP36PLP5PW
explain select
*
from myusers
inner join mytable on myusers.id = mytable.ex_user_id
;
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
-: | :---------- | :------ | :--------- | :--- | :------------- | :------------- | :------ | :------------------------------------- | ---: | -------: | :----------
1 | SIMPLE | myusers | null | ALL | PRIMARY | null | null | null | 10 | 100.00 | null
1 | SIMPLE | mytable | null | ref | idx_ex_user_id | idx_ex_user_id | 5 | fiddle_HNTHMETRTFAHHKBIGWZM.myusers.id | 1 | 100.00 | Using where
db<>fiddle here
Note that the conversion of user_id from string to integer is "implicit":
To cast a string to a number, you normally need do nothing other than use the string value in numeric context:
https://dev.mysql.com/doc/refman/5.7/en/create-table-secondary-indexes.html

TPCH Query Optimization

The following query has been running for 5 hours so far:
INSERT $LINEITEM_PUBLIC SELECT *
FROM LINEITEM
WHERE L_PARTKEY IN ( SELECT P_PARTKEY FROM $PART_PUBLIC )
AND L_SUPPKEY IN ( SELECT S_SUPPKEY FROM $SUPPLIER_PUBLIC )
AND L_ORDERKEY IN ( SELECT O_ORDERKEY FROM $ORDERS_PUBLIC );
I added all the required indexes but nothing seems to help. The EXPLAIN plan prints the following:
+----+-------------+------------------+------------+--------+--------------------------------+-------------+---------+--------------------------------+----------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------------+------------+--------+--------------------------------+-------------+---------+--------------------------------+----------+----------+-------------+
| 1 | INSERT | $LINEITEM_PUBLIC | NULL | ALL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 1 | SIMPLE | $ORDERS_PUBLIC | NULL | index | PRIMARY | O_ORDERDATE | 3 | NULL | 12826617 | 100.00 | Using index |
| 1 | SIMPLE | LINEITEM | NULL | ref | PRIMARY,LINEITEM_FK2,L_SUPPKEY | PRIMARY | 4 | TPCH.$ORDERS_PUBLIC.O_ORDERKEY | 3 | 100.00 | NULL |
| 1 | SIMPLE | $SUPPLIER_PUBLIC | NULL | eq_ref | PRIMARY | PRIMARY | 4 | TPCH.LINEITEM.L_SUPPKEY | 1 | 100.00 | Using index |
| 1 | SIMPLE | $PART_PUBLIC | NULL | eq_ref | PRIMARY | PRIMARY | 4 | TPCH.LINEITEM.L_PARTKEY | 1 | 100.00 | Using index |
+----+-------------+------------------+------------+--------+--------------------------------+-------------+---------+--------------------------------+----------+----------+-------------+
Any recommendations on how this query can be optimized?
Update:
The size of the tables in the previous query is as follows:
LINEITEM: 60M records
$ORDERS_PUBLIC: 13M records
$SUPPLIER_PUBLIC: 92K records
$PART_PUBLIC: 2M records
Make sure there is an index starting with O_ORDERKEY.
IN (SELECT ...) may be optimized poorly (depending on version); try this:
INSERT $LINEITEM_PUBLIC
SELECT l.*
FROM LINEITEM AS l
WHERE EXISTS( SELECT * FROM $PART_PUBLIC WHERE P_PARTKEY = L_PARTKEY )
AND EXISTS( SELECT * FROM $SUPPLIER_PUBLIC WHERE S_SUPPKEY = L_SUPPKEY )
AND EXISTS( SELECT * FROM $ORDERS_PUBLIC WHERE O_ORDERKEY = L_ORDERKEY );
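Assuming the index advice above refers to $ORDERS_PUBLIC (the table the last EXISTS probes by O_ORDERKEY), a sketch of it as DDL, only needed if O_ORDERKEY is not already the leading column of that table's primary key (the index name is illustrative):
-- only needed if O_ORDERKEY is not already (the leading column of) the primary key
ALTER TABLE $ORDERS_PUBLIC
    ADD INDEX idx_orders_public_orderkey (O_ORDERKEY);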