It seems that MySQL is only selecting data from the first and last partitions when you use a date range.
| sales | CREATE TABLE `sales` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`quantity_sold` int(11) NOT NULL,
`prod_id` int(11) NOT NULL,
`store_id` int(11) NOT NULL,
`date` date NOT NULL,
KEY `prod_id` (`prod_id`),
KEY `date` (`date`),
KEY `store_id` (`store_id`),
KEY `id` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=577574322 DEFAULT CHARSET=utf8
/*!50100 PARTITION BY RANGE (to_days(date))
(PARTITION p0 VALUES LESS THAN (0) ENGINE = InnoDB,
PARTITION p201211 VALUES LESS THAN (735203) ENGINE = InnoDB,
PARTITION p201212 VALUES LESS THAN (735234) ENGINE = InnoDB,
PARTITION p201301 VALUES LESS THAN (735265) ENGINE = InnoDB,
PARTITION p201302 VALUES LESS THAN (735293) ENGINE = InnoDB,
PARTITION p201303 VALUES LESS THAN (735324) ENGINE = InnoDB,
PARTITION p201304 VALUES LESS THAN (735354) ENGINE = InnoDB,
PARTITION p201305 VALUES LESS THAN (735385) ENGINE = InnoDB,
PARTITION p201306 VALUES LESS THAN (735415) ENGINE = InnoDB,
PARTITION p201307 VALUES LESS THAN (735446) ENGINE = InnoDB,
PARTITION p201308 VALUES LESS THAN (735477) ENGINE = InnoDB,
PARTITION p201309 VALUES LESS THAN (735507) ENGINE = InnoDB,
PARTITION p201310 VALUES LESS THAN (735538) ENGINE = InnoDB,
PARTITION p201311 VALUES LESS THAN (735568) ENGINE = InnoDB,
PARTITION p201312 VALUES LESS THAN (735599) ENGINE = InnoDB,
PARTITION p201401 VALUES LESS THAN (735630) ENGINE = InnoDB,
PARTITION p201402 VALUES LESS THAN (735658) ENGINE = InnoDB,
PARTITION p201403 VALUES LESS THAN (735689) ENGINE = InnoDB,
PARTITION p201404 VALUES LESS THAN (735719) ENGINE = InnoDB,
PARTITION p201405 VALUES LESS THAN (735750) ENGINE = InnoDB,
PARTITION p201406 VALUES LESS THAN (735780) ENGINE = InnoDB,
PARTITION p201407 VALUES LESS THAN (735811) ENGINE = InnoDB,
PARTITION p201408 VALUES LESS THAN (735842) ENGINE = InnoDB,
PARTITION p201409 VALUES LESS THAN (735872) ENGINE = InnoDB,
PARTITION p201410 VALUES LESS THAN (735903) ENGINE = InnoDB,
PARTITION p201411 VALUES LESS THAN (735933) ENGINE = InnoDB,
PARTITION p201412 VALUES LESS THAN (735964) ENGINE = InnoDB,
PARTITION P201501 VALUES LESS THAN (735995) ENGINE = InnoDB,
PARTITION P201502 VALUES LESS THAN (736023) ENGINE = InnoDB,
PARTITION p1 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */ |
Selecting the sales (this should get data from all of the partitions), but it only returns rows from the first and last:
mysql> select * from sales where prod_id = 232744 and store_id = 300;
+-----------+---------------+---------+----------+------------+
| id | quantity_sold | prod_id | store_id | date |
+-----------+---------------+---------+----------+------------+
| 2309 | 1 | 232744 | 300 | 2012-11-26 |
| 2484 | 10 | 232744 | 300 | 2012-11-27 |
| 2837 | 7 | 232744 | 300 | 2012-11-29 |
| 3001 | 9 | 232744 | 300 | 2012-11-30 |
| 571930074 | 4 | 232744 | 300 | 2014-12-02 |
| 573051350 | 13 | 232744 | 300 | 2014-12-03 |
| 574181358 | 5 | 232744 | 300 | 2014-12-04 |
| 575322316 | 9 | 232744 | 300 | 2014-12-05 |
| 576455102 | 4 | 232744 | 300 | 2014-12-06 |
| 577545446 | 2 | 232744 | 300 | 2014-12-07 |
+-----------+---------------+---------+----------+------------+
EXPLAIN PARTITIONS shows that it is scanning all of the partitions:
mysql> explain partitions select * from sales where prod_id = 232744 and store_id =300\G;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: sales
partitions: p0,p201211,p201212,p201301,p201302,p201303,p201304,p201305,p201306,p201307,p201308,p201309,p201310,p201311,p201312,p201401,p201402,p201403,p201404,p201405,p201406,p201407,p201408,p201409,p201410,p201411,p201412,P201501,P201502,p1
type: index_merge
possible_keys: prod_id,store_id
key: prod_id,store_id
key_len: 4,4
ref: NULL
rows: 20
Extra: Using intersect(prod_id,store_id); Using where
1 row in set (0.00 sec)
If I manually select from a partition, we can see there is data there that should have appeared in the result above:
mysql> select * from sales PARTITION (p201410) where prod_id = 232744 and store_id = 300;
+-----------+---------------+---------+----------+------------+
| id | quantity_sold | prod_id | store_id | date |
+-----------+---------------+---------+----------+------------+
| 509534154 | 2 | 232744 | 300 | 2014-10-01 |
| 510606312 | 10 | 232744 | 300 | 2014-10-02 |
| 511682398 | 4 | 232744 | 300 | 2014-10-03 |
| 512752933 | 2 | 232744 | 300 | 2014-10-04 |
| 514812731 | 3 | 232744 | 300 | 2014-10-06 |
| 515862308 | 6 | 232744 | 300 | 2014-10-07 |
| 516922728 | 5 | 232744 | 300 | 2014-10-08 |
| 517990349 | 19 | 232744 | 300 | 2014-10-09 |
| 519066761 | 17 | 232744 | 300 | 2014-10-10 |
| 520136175 | 3 | 232744 | 300 | 2014-10-11 |
| 522185901 | 1 | 232744 | 300 | 2014-10-14 |
| 523238559 | 3 | 232744 | 300 | 2014-10-15 |
| 524294166 | 7 | 232744 | 300 | 2014-10-16 |
| 525354982 | 3 | 232744 | 300 | 2014-10-17 |
| 526412605 | 1 | 232744 | 300 | 2014-10-18 |
| 527444329 | 1 | 232744 | 300 | 2014-10-19 |
| 528452608 | 1 | 232744 | 300 | 2014-10-20 |
| 529488414 | 2 | 232744 | 300 | 2014-10-21 |
| 530541002 | 3 | 232744 | 300 | 2014-10-22 |
| 531603714 | 4 | 232744 | 300 | 2014-10-23 |
| 532672667 | 6 | 232744 | 300 | 2014-10-24 |
| 534793524 | 1 | 232744 | 300 | 2014-10-26 |
| 535819138 | 1 | 232744 | 300 | 2014-10-27 |
| 537957232 | 1 | 232744 | 300 | 2014-10-29 |
| 539037254 | 1 | 232744 | 300 | 2014-10-30 |
| 540125545 | 2 | 232744 | 300 | 2014-10-31 |
+-----------+---------------+---------+----------+------------+
26 rows in set (0.03 sec)
If you do a SELECT * FROM sales WHERE prod_id = 232744; it returns all of the data. It seems to be only when you add a store_id condition that it does not return the correct data.
I'm stumped. I've tried:
Restarting MySQL
I'm about to try an OPTIMIZE TABLE (I have to move the databases first because of space constraints)
It seems to me there is something wrong with the keys. A corrupt table?
Thanks!
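A few things worth trying to narrow this down, reusing the values from the query above (a sketch only: the optimizer_switch flag has existed since MySQL 5.1, and the composite index is a guess at a cleaner access path rather than a confirmed fix):

SELECT * FROM sales IGNORE INDEX (store_id)
WHERE prod_id = 232744 AND store_id = 300;

SET SESSION optimizer_switch = 'index_merge_intersection=off';
SELECT * FROM sales WHERE prod_id = 232744 AND store_id = 300;

-- A composite index satisfies both predicates with a single lookup, so no intersect plan is needed:
ALTER TABLE sales ADD INDEX prod_store (prod_id, store_id);

If either of the first two statements brings back the missing rows, the index_merge intersection plan, rather than the partitioning or a corrupt table, is the thing to investigate.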
Related
I've been trying to implement the solution here, with the added flavour of updating existing records. As an MRE, I'm looking to populate the sum_date_diff column in a table with the sum of all the differences between the current row's date and the date of every previous row where the current row's p1_id matches the previous row's p1_id or p2_id. I have already filled out the expected result below:
+-----+------------+-------+-------+---------------+
| id_ | date_time | p1_id | p2_id | sum_date_diff |
+-----+------------+-------+-------+---------------+
| 1 | 2000-01-01 | 1 | 2 | Null |
| 2 | 2000-01-02 | 2 | 4 | 1 |
| 3 | 2000-01-04 | 1 | 3 | 3 |
| 4 | 2000-01-07 | 2 | 5 | 11 |
| 5 | 2000-01-15 | 2 | 3 | 35 |
| 6 | 2000-01-20 | 1 | 3 | 35 |
| 7 | 2000-01-31 | 1 | 3 | 68 |
+-----+------------+-------+-------+---------------+
My query so far looks like:
UPDATE test.sum_date_diff AS sdd0
JOIN
(SELECT
id_,
SUM(DATEDIFF(sdd1.date_time, sq.date_time)) AS sum_date_diff
FROM
test.sum_date_diff AS sdd1
LEFT OUTER JOIN (SELECT
sdd2.date_time AS date_time, sdd2.p1_id AS player_id
FROM
test.sum_date_diff AS sdd2 UNION ALL SELECT
sdd3.date_time AS date_time, sdd3.p2_id AS player_id
FROM
test.sum_date_diff AS sdd3) AS sq ON sq.date_time < sdd1.date_time
AND sq.player_id = sdd1.p1_id
GROUP BY sdd1.id_) AS master_sq ON master_sq.id_ = sdd0.id_
SET
sdd0.sum_date_diff = master_sq.sum_date_diff
This works as shown here.
However, on a table of 1.5m records the query has been hanging for the last hour. Even when I add a WHERE clause at the bottom to restrict the update to a single record, it hangs for 5+ minutes.
Here is the EXPLAIN statement for the query on the full table:
+----+-------------+---------------+------------+-------+-----------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------+---------+-------+---------+----------+--------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------------+------------+-------+-----------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------+---------+-------+---------+----------+--------------------------------------------+
| 1 | UPDATE | sum_date_diff | NULL | const | PRIMARY | PRIMARY | 4 | const | 1 | 100 | NULL |
| 1 | PRIMARY | <derived2> | NULL | ref | <auto_key0> | <auto_key0> | 4 | const | 10 | 100 | NULL |
| 2 | DERIVED | sum_date_diff | NULL | index | PRIMARY,ix__match_oc_history__date_time,ix__match_oc_history__p1_id,ix__match_oc_history__p2_id,ix__match_oc_history__date_time_players | ix__match_oc_history__date_time_players | 14 | NULL | 1484288 | 100 | Using index; Using temporary |
| 2 | DERIVED | <derived3> | NULL | ALL | NULL | NULL | NULL | NULL | 2968576 | 100 | Using where; Using join buffer (hash join) |
| 3 | DERIVED | sum_date_diff | NULL | index | NULL | ix__match_oc_history__date_time_players | 14 | NULL | 1484288 | 100 | Using index |
| 4 | UNION | sum_date_diff | NULL | index | NULL | ix__match_oc_history__date_time_players | 14 | NULL | 1484288 | 100 | Using index |
+----+-------------+---------------+------------+-------+-----------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------+---------+-------+---------+----------+--------------------------------------------+
Here is the CREATE TABLE statement:
CREATE TABLE `sum_date_diff` (
`id_` int NOT NULL AUTO_INCREMENT,
`date_time` datetime DEFAULT NULL,
`p1_id` int NOT NULL,
`p2_id` int NOT NULL,
`sum_date_diff` int DEFAULT NULL,
PRIMARY KEY (`id_`),
KEY `ix__sum_date_diff__date_time` (`date_time`),
KEY `ix__sum_date_diff__p1_id` (`p1_id`),
KEY `ix__sum_date_diff__p2_id` (`p2_id`),
KEY `ix__sum_date_diff__date_time_players` (`date_time`,`p1_id`,`p2_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1822120 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
MySQL version is 8.0.26, running on a 2016 MacBook Pro with Monterey and 16 GB RAM.
After reading around about boosting the RAM available to MySQL I've added the following to the standard my.cnf file:
innodb_buffer_pool_size = 8G
tmp_table_size=2G
max_heap_table_size=2G
I'm wondering if:
I've done something wrong
This is just a very slow task no matter what I do
There is a faster method
I'm hoping someone could enlighten me!
Whereas it is possible to do calculations like this in SQL, it is messy. If the number of rows is not in the millions, I would fetch the necessary columns into my application and do the arithmetic there. (Loops are easier and faster in PHP/Java/etc than in SQL.)
LEAD() and LAG() are possible, but they are not optimized well (or so is my experience). In an APP language, it is easy and efficient to look up things in arrays.
The SELECT can (easily and efficiently) do any filtering and sorting so that the app only receives the necessary data.
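As an illustration of the window-function route mentioned above, here is an untested sketch for MySQL 8.0. It assumes date_time always holds whole days (midnight), as in the MRE, and it relies on the identity SUM(d_i - d_j) = n * d_i - SUM(d_j), so each row is read once instead of once per earlier row:

UPDATE test.sum_date_diff AS t
JOIN (
    SELECT DISTINCT
           player_id,
           date_time,
           COUNT(*)                OVER w AS n_prev,
           SUM(TO_DAYS(date_time)) OVER w AS days_prev
    FROM (
        SELECT date_time, p1_id AS player_id FROM test.sum_date_diff
        UNION ALL
        SELECT date_time, p2_id AS player_id FROM test.sum_date_diff
    ) AS unpivoted
    -- running totals over strictly earlier days only
    WINDOW w AS (PARTITION BY player_id
                 ORDER BY TO_DAYS(date_time)
                 RANGE BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
) AS r ON r.player_id = t.p1_id AND r.date_time = t.date_time
SET t.sum_date_diff = r.n_prev * TO_DAYS(t.date_time) - r.days_prev;

When a row has no earlier matching rows, SUM() over the empty frame is NULL, so sum_date_diff stays NULL as in the expected output. Whether this beats pulling the rows into application code would still have to be measured; the derived table is materialized over the whole table.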
I have the following query that runs really slowly on MySQL (83 seconds) but really fast on MariaDB (0.4 seconds).
I verified that both databases have the same indexes and data. The MariaDB server has less CPU (1 vCPU) and memory (2 GB).
The MySQL servers have 8-32 GB of RAM and full quad-core processors (I tried 5.6, 5.7, and 8.0 with similar results).
The phppos_inventory table has ~170,000 rows and the phppos_items table has ~3,000 rows.
Here are the query, the tables, and the EXPLAIN outputs.
SELECT /*+ SEMIJOIN(#subq MATERIALIZATION) */ SQL_CALC_FOUND_ROWS
1 AS _h,
`phppos_location_items`.`location_id` AS `location_id`,
`phppos_items`.`item_id`,
`phppos_items`.`name`,
`phppos_categories`.`id` AS `category_id`,
`phppos_categories`.`name` AS `category`,
`location`,
`company_name`,
`phppos_items`.`item_number`,
`size`,
`product_id`,
Coalesce(phppos_location_item_variations.cost_price,
phppos_item_variations.cost_price, phppos_location_items.cost_price,
phppos_items.cost_price, 0) AS cost_price,
Coalesce(phppos_location_item_variations.unit_price,
phppos_item_variations.unit_price, phppos_location_items.unit_price,
phppos_items.unit_price, 0) AS unit_price,
Sum(Coalesce(inv.trans_current_quantity, 0)) AS quantity,
Coalesce(phppos_location_item_variations.reorder_level,
phppos_item_variations.reorder_level, phppos_location_items.reorder_level,
phppos_items.reorder_level) AS reorder_level,
Coalesce(phppos_location_item_variations.replenish_level,
phppos_item_variations.replenish_level, phppos_location_items.replenish_level,
phppos_items.replenish_level) AS replenish_level,
description
FROM `phppos_inventory` `inv`
LEFT JOIN `phppos_items`
ON `phppos_items`.`item_id` = `inv`.`trans_items`
LEFT JOIN `phppos_location_items`
ON `phppos_location_items`.`item_id` = `phppos_items`.`item_id`
AND `phppos_location_items`.`location_id` = `inv`.`location_id`
LEFT JOIN `phppos_item_variations`
ON `phppos_items`.`item_id` = `phppos_item_variations`.`item_id`
AND `phppos_item_variations`.`id` = `inv`.`item_variation_id`
AND `phppos_item_variations`.`deleted` = 0
LEFT JOIN `phppos_location_item_variations`
ON `phppos_location_item_variations`.`item_variation_id` =
`phppos_item_variations`.`id`
AND `phppos_location_item_variations`.`location_id` =
`inv`.`location_id`
LEFT OUTER JOIN `phppos_suppliers`
ON `phppos_items`.`supplier_id` =
`phppos_suppliers`.`person_id`
LEFT OUTER JOIN `phppos_categories`
ON `phppos_items`.`category_id` = `phppos_categories`.`id`
WHERE inv.trans_id = (SELECT Max(inv1.trans_id)
FROM phppos_inventory inv1
WHERE inv1.trans_items = inv.trans_items
AND ( inv1.item_variation_id =
phppos_item_variations.id
OR phppos_item_variations.id IS NULL )
AND inv1.location_id = inv.location_id
AND inv1.trans_date < '2019-12-31 23:59:59')
AND inv.location_id IN( 1 )
AND `phppos_items`.`system_item` = 0
AND `phppos_items`.`deleted` = 0
AND `is_service` != 1
GROUP BY `phppos_items`.`item_id`
LIMIT 20
EXPLAIN on MySQL (slightly different from MariaDB's, but I tried USE INDEX to match the execution plan and it was still slow):
+------------------------------------------+-------+----------+------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+---------------------------------+------------+--------+------------------------------+-------+----------+------------------------------------+
| 1 | PRIMARY | phppos_items | NULL | ref | PRIMARY,item_number,product_id,phppos_items_ibfk_1,deleted,phppos_items_ibfk_3,phppos_items_ibfk_4,phppos_items_ibfk_5,description,size,reorder_level,cost_price,unit_price,promo_price,last_modified,name,phppos_items_ibfk_6,deleted_system_item,custom_field_1_value,custom_field_2_value,custom_field_3_value,custom_field_4_value,custom_field_5_value,custom_field_6_value,custom_field_7_value,custom_field_8_value,custom_field_9_value,custom_field_10_value,verify_age,phppos_items_ibfk_7,item_inactive_index,tags,full_search,name_search,item_number_search,product_id_search,description_search,size_search,custom_field_1_value_search,custom_field_2_value_search,custom_field_3_value_search,custom_field_4_value_search,custom_field_5_value_search,custom_field_6_value_search,custom_field_7_value_search,custom_field_8_value_search,custom_field_9_value_search,custom_field_10_value_search | deleted | 4 | const | 21188 | 9.00 | Using index condition; Using where |
| 1 | PRIMARY | inv | NULL | ref | phppos_inventory_ibfk_1,location_id,phppos_inventory_custom | phppos_inventory_custom | 8 | pos.phppos_items.item_id,const | 3 | 100.00 | NULL |
| 1 | PRIMARY | phppos_location_items | NULL | eq_ref | PRIMARY,phppos_location_items_ibfk_2 | PRIMARY | 8 | const,pos.phppos_items.item_id | 1 | 100.00 | NULL |
| 1 | PRIMARY | phppos_item_variations | NULL | eq_ref | PRIMARY,phppos_item_variations_ibfk_1 | PRIMARY | 4 | pos.inv.item_variation_id | 1 | 100.00 | Using where |
| 1 | PRIMARY | phppos_location_item_variations | NULL | eq_ref | PRIMARY,phppos_item_attribute_location_values_ibfk_2 | PRIMARY | 8 | pos.phppos_item_variations.id,const | 1 | 100.00 | NULL |
| 1 | PRIMARY | phppos_suppliers | NULL | ref | person_id | person_id | 4 | pos.phppos_items.supplier_id | 1 | 100.00 | NULL |
| 1 | PRIMARY | phppos_categories | NULL | eq_ref | PRIMARY | PRIMARY | 4 | pos.phppos_items.category_id | 1 | 100.00 | NULL |
| 2 | DEPENDENT SUBQUERY | inv1 | NULL | ref | phppos_inventory_ibfk_1,location_id,trans_date,phppos_inventory_ibfk_4,phppos_inventory_custom | phppos_inventory_custom | 8 | pos.inv.trans_items,pos.inv.location_id | 3 | 50.00 | Using where; Using index |
+----+--------------------+---------------------------------+------------+--------+---------------------------------------------------------------------------------------------------------
EXPLAIN on MariaDB:
+------+---------------------------------------------+-------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------------+---------------------------------+--------+------------------------------+
| 1 | PRIMARY | phppos_items | ref | PRIMARY,deleted,deleted_system_item | deleted | 4 | const | 23955 | Using where |
| 1 | PRIMARY | inv | ref | phppos_inventory_ibfk_1,location_id,phppos_inventory_custom | phppos_inventory_ibfk_1 | 4 | freelance_pos5.phppos_items.item_id | 2 | Using where |
| 1 | PRIMARY | phppos_location_items | eq_ref | PRIMARY,phppos_location_items_ibfk_2 | PRIMARY | 8 | const,freelance_pos5.phppos_items.item_id | 1 | |
| 1 | PRIMARY | phppos_item_variations | eq_ref | PRIMARY,phppos_item_variations_ibfk_1 | PRIMARY | 4 | freelance_pos5.inv.item_variation_id | 1 | Using where |
| 1 | PRIMARY | phppos_location_item_variations | eq_ref | PRIMARY,phppos_item_attribute_location_values_ibfk_2 | PRIMARY | 8 | freelance_pos5.phppos_item_variations.id,const | 1 | Using where |
| 1 | PRIMARY | phppos_suppliers | ref | person_id | person_id | 4 | freelance_pos5.phppos_items.supplier_id | 1 | Using where |
| 1 | PRIMARY | phppos_categories | eq_ref | PRIMARY | PRIMARY | 4 | freelance_pos5.phppos_items.category_id | 1 | Using where |
| 2 | DEPENDENT SUBQUERY | inv1 | ref | phppos_inventory_ibfk_1,location_id,trans_date,phppos_inventory_ibfk_4,phppos_inventory_custom | phppos_inventory_custom | 8 | freelance_pos5.inv.trans_items,freelance_pos5.inv.location_id | 2 | Using where; Using index |
+------+--------------------+---------------------------------+--------+------------------------------------------------------------------------------------------------+-------------------------+---------+---------------------------------------------------------------+-------+--------------------------+
Tables described (Reached StackOverflow char limit)
https://pastebin.com/nhngSHb8
Create tables:
https://pastebin.com/aWMeriqt
MYSQL (DEV BOX)
mysql> SHOW GLOBAL STATUS LIKE '%thread%';
+------------------------------------------+-------+
| Variable_name | Value |
+------------------------------------------+-------+
| Delayed_insert_threads | 0 |
| Performance_schema_thread_classes_lost | 0 |
| Performance_schema_thread_instances_lost | 0 |
| Slow_launch_threads | 0 |
| Threads_cached | 4 |
| Threads_connected | 1 |
| Threads_created | 5 |
| Threads_running | 1 |
+------------------------------------------+-------+
8 rows in set (0.06 sec)
MARIA DB
MariaDB [freelance_pos5]> SHOW GLOBAL STATUS LIKE '%thread%';
+------------------------------------------+-------+
| Variable_name | Value |
+------------------------------------------+-------+
| Delayed_insert_threads | 0 |
| Performance_schema_thread_classes_lost | 0 |
| Performance_schema_thread_instances_lost | 0 |
| Slow_launch_threads | 0 |
| Threadpool_idle_threads | 0 |
| Threadpool_threads | 0 |
| Threads_cached | 3 |
| Threads_connected | 2 |
| Threads_created | 5 |
| Threads_running | 1 |
| wsrep_applier_thread_count | 0 |
| wsrep_rollbacker_thread_count | 0 |
| wsrep_thread_count | 0 |
+------------------------------------------+-------+
13 rows in set (0.00 sec)
Moving the
WHERE inv.trans_id = (SELECT Max(inv1.trans_id)
into the INNER JOIN is the game changer.
INNER JOIN (
SELECT inv1.trans_items, inv1.item_variation_id, inv1.location_id, MAX(inv1.trans_id) as trans_id
FROM phppos_inventory inv1
WHERE inv1.trans_date < '2019-12-31 23:59:59'
GROUP BY inv1.trans_items, inv1.item_variation_id, inv1.location_id
ORDER BY inv1.trans_items, inv1.item_variation_id, inv1.location_id
) inv1 on inv1.trans_id = inv.trans_id
AND inv1.trans_items = inv.trans_items
AND (inv1.item_variation_id = phppos_item_variations.id OR phppos_item_variations.id IS NULL)
AND inv1.location_id = inv.location_id
The execution time is reduced from 80+ seconds down to under 0.4 seconds on MySQL 8.0.
MariaDB's and MySQL's optimizers started diverging significantly at 5.6; certain queries will run faster in one than in the other.
I think I see a way to speed up the query, perhaps on both versions.
Don't use LEFT JOIN when it is the same as JOIN, which seems to be the case for at least phppos_items, since it has conditions in the WHERE that override the LEFT.
Please provide SHOW CREATE TABLE; meanwhile, I will guess at what indexes you have/don't have, and assume that each table has PRIMARY KEY(id).
Use composite indexes where appropriate. (More below.)
Get the 20 rows before JOINing to the rest of the tables:
SELECT ...
FROM ( SELECT inv.id, pi.item_id
         FROM `phppos_inventory` AS inv
         JOIN `phppos_items` AS pi
              ON pi.`item_id` = `inv`.`trans_items`
             AND inv.location_id IN( 1 )
             AND pi.`system_item` = 0
             AND pi.`deleted` = 0
             AND `is_service` != 1 -- Which table is this in???
         GROUP BY pi.`item_id`
         LIMIT 20 ) AS top20
LEFT JOIN .... (( all the other tables ))
-- no GROUP BY or LIMIT needed (I think)
phppos_items: INDEX(item_id, deleted, system_item, is_service)
phppos_items: INDEX(deleted, system_item, is_service)
phppos_inventory: INDEX(trans_items, location_id, item_variation_id, trans_date, trans_id)
phppos_inventory: INDEX(location_id)
Aside from the fact that the query is misleading since the outer join is discarded, the main difference is that the second engine operation in MariaDB is an index lookup (ref) using the phppos_inventory_ibfk_1 index, while MySQL chose the phppos_inventory_custom index for the same step.
However, without the definition of these two indexes it's difficult to assess why the engines may have chosen different paths.
Please add the definition of these indexes to your question, and also their selectivity (estimated rows selected / total table rows), to elaborate more.
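For example, a rough way to report that selectivity (an illustrative query only; the real index columns are exactly what is being asked for, so trans_items and location_id here are just the columns visible in the ref column of the plans):

SELECT COUNT(DISTINCT trans_items, location_id) / COUNT(*) AS selectivity,
       COUNT(*) AS total_rows
FROM phppos_inventory;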
I did a MySQL performance optimization test, but the test results surprised me.
First of all, I prepared several tables for my test: t_worker_attendance_300w (3 million rows), t_worker_attendance_1000w (10 million rows), t_worker_attendance_1y (100 million rows), and t_worker_attendance_4y (400 million rows).
Each table has the same fields and the same indexes; they are copies of each other, and even the 400-million-row table was grown from the 3-million-row data.
In my understanding, MySQL's performance is bound to be severely affected by the data volume, but the results have puzzled me for a whole week. I've tested almost every scenario I can think of, but the execution times are the same!
This is a new MySQL 5.6.16 server, and I tested every scenario I could think of, including INNER JOINs.
A) SHOW CREATE TABLE t_worker_attendance_4y
CREATE TABLE `t_worker_attendance_4y` (
`id` bigint(20) NOT NULL ,
`attendance_id` char(32) NOT NULL,
`worker_id` char(32) NOT NULL,
`subcontractor_id` char(32) NOT NULL ,
`project_id` char(32) NOT NULL ,
`sign_date` date NOT NULL ,
`sign_type` char(2) NOT NULL ,
`latitude` double DEFAULT NULL,
`longitude` double DEFAULT NULL ,
`sign_wages` decimal(16,2) DEFAULT NULL ,
`confirm_wages` decimal(16,2) DEFAULT NULL ,
`work_content` varchar(60) DEFAULT NULL ,
`team_leader_id` char(32) DEFAULT NULL,
`sign_state` char(2) NOT NULL ,
`confirm_date` date DEFAULT NULL ,
`sign_mode` char(2) DEFAULT NULL ,
`checkin_time` datetime DEFAULT NULL ,
`checkout_time` datetime DEFAULT NULL ,
`sign_hours` decimal(6,1) DEFAULT NULL ,
`overtime` decimal(6,1) DEFAULT NULL ,
`confirm_hours` decimal(6,1) DEFAULT NULL ,
`signimg` varchar(200) DEFAULT NULL ,
`signoutimg` varchar(200) DEFAULT NULL ,
`photocheck` char(2) DEFAULT NULL ,
`machine_type` varchar(2) DEFAULT '1' ,
`project_coordinate` text ,
`floor_num` varchar(200) DEFAULT NULL ,
`device_serial_no` varchar(32) DEFAULT NULL ,
KEY `checkin_time` (`checkin_time`),
KEY `worker_id` (`worker_id`),
KEY `project_id` (`project_id`),
KEY `subcontractor_id` (`subcontractor_id`),
KEY `sign_date` (`sign_date`),
KEY `project_id_2` (`project_id`,`sign_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
B) SHOW INDEX FROM t_worker_attendance_4y
+------------------------+------------+------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------------------+------------+------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| t_worker_attendance_4y | 1 | checkin_time | 1 | checkin_time | A | 5017494 | NULL | NULL | YES | BTREE | | |
| t_worker_attendance_4y | 1 | worker_id | 1 | worker_id | A | 1686552 | NULL | NULL | | BTREE | | |
| t_worker_attendance_4y | 1 | project_id | 1 | project_id | A | 102450 | NULL | NULL | | BTREE | | |
| t_worker_attendance_4y | 1 | subcontractor_id | 1 | subcontractor_id | A | 380473 | NULL | NULL | | BTREE | | |
| t_worker_attendance_4y | 1 | sign_date | 1 | sign_date | A | 512643 | NULL | NULL | | BTREE | | |
| t_worker_attendance_4y | 1 | project_id_2 | 1 | project_id | A | 102059 | NULL | NULL | | BTREE | | |
| t_worker_attendance_4y | 1 | project_id_2 | 2 | sign_date | A | 1776104 | NULL | NULL | | BTREE | | |
+------------------------+------------+------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
C) EXPLAIN SELECT SQL_NO_CACHE tw.project_id, tw.sign_date FROM t_worker_attendance_4y tw WHERE tw.project_id = '39235664ba734887b298ee568fbb66fb' AND sign_date >= '07/01/2018' AND sign_date < '08/01/2018' ;
+----+-------------+-------+------+-----------------------------------+--------------+---------+-------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+-----------------------------------+--------------+---------+-------+----------+--------------------------+
| 1 | SIMPLE | tw | ref | project_id,sign_date,project_id_2 | project_id_2 | 96 | const | 54134596 | Using where; Using index |
+----+-------------+-------+------+-----------------------------------+--------------+---------+-------+----------+--------------------------+
They all used the same composite index.
SELECT tw.project_id, tw.sign_date FROM t_worker_attendance_300w tw
WHERE tw.project_id = '39235664ba734887b298ee568fbb66fb'
AND sign_date >= '07/01/2018'
AND sign_date < '08/01/2018' LIMIT 0,10000;
Execution time: 0.02 sec
SELECT tw.project_id, tw.sign_date FROM t_worker_attendance_1000w tw
WHERE tw.project_id = '39235664ba734887b298ee568fbb66fb'
AND sign_date >= '07/01/2018'
AND sign_date < '08/01/2018' LIMIT 0,10000;
Execution time: 0.01 sec
SELECT tw.project_id, tw.sign_date FROM t_worker_attendance_1y tw
WHERE tw.project_id = '39235664ba734887b298ee568fbb66fb'
AND sign_date >= '07/01/2018'
AND sign_date < '08/01/2018' LIMIT 0,10000;
Execution time: 0.02 sec
SELECT tw.project_id, tw.sign_date FROM t_worker_attendance_4y tw
WHERE tw.project_id = '39235664ba734887b298ee568fbb66fb'
AND sign_date >= '07/01/2018'
AND sign_date < '08/01/2018' LIMIT 0,10000;
Execution time: 0.02 sec
......
My guess was that MySQL's query performance would decline dramatically as the data volume increases, but the results are not much different, so I have no basis for optimizing my query. I don't know when to implement a table partitioning plan or a sharding (splitting into multiple databases/tables) plan.
What I want to know is why an indexed query on a small data volume runs at the same speed as on a large data volume. If you can help me, I would be very grateful.
The search performance is the same on large data volumes because of the BTREE index: lookup is O(log(n)). Relatively speaking, that means the search algorithm has to complete roughly:
6 operations on 3m of data
7 operations on 10m of data
8 operations on 100m of data
8 operations on 400m of data
As you can see, the number of operations is almost the same.
My guess is that MySQL's query performance will decline dramatically with the increase of data volume
This is true for full table scan cases.
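For contrast, a hypothetical query that no index on these tables can serve has to read every row, so its runtime would grow roughly linearly with the row count (work_content is unindexed, and the search string here is made up):

SELECT COUNT(*)
FROM t_worker_attendance_4y
WHERE work_content LIKE '%cement%';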
I have an update. Someone told me: "Because your query is covered by the index, the query time is really just the time to search the index. MySQL indexes use a B+tree structure, and under the same tree height the query time is basically the same. You can calculate whether the height of the index trees of these tables is the same."
So I ran the checks as suggested.
mysql> SELECT b.name, a.name, index_id, type, a.space, a.PAGE_NO
-> FROM information_schema.INNODB_SYS_INDEXES a,
-> information_schema.INNODB_SYS_TABLES b
-> WHERE a.table_id = b.table_id AND a.space <> 0;
+-------------------------------------------------+---------------------+----------+------+-------+---------+
| name | name | index_id | type | space | PAGE_NO |
+-------------------------------------------------+---------------------+----------+------+-------+---------+
| mysql/innodb_index_stats | PRIMARY | 18 | 3 | 2 | 3 |
| mysql/innodb_table_stats | PRIMARY | 17 | 3 | 1 | 3 |
| mysql/slave_master_info | PRIMARY | 20 | 3 | 4 | 3 |
| mysql/slave_relay_log_info | PRIMARY | 19 | 3 | 3 | 3 |
| mysql/slave_worker_info | PRIMARY | 21 | 3 | 5 | 3 |
| test_gomeet/t_worker_attendance_1y | GEN_CLUST_INDEX | 45 | 1 | 12 | 3 |
| test_gomeet/t_worker_attendance_1y | checkin_time | 46 | 0 | 12 | 16389 |
| test_gomeet/t_worker_attendance_1y | project_id | 50 | 0 | 12 | 32775 |
| test_gomeet/t_worker_attendance_1y | worker_id | 53 | 0 | 12 | 49161 |
| test_gomeet/t_worker_attendance_1y | subcontractor_id | 54 | 0 | 12 | 65547 |
| test_gomeet/t_worker_attendance_1y | sign_date | 66 | 0 | 12 | 81933 |
| test_gomeet/t_worker_attendance_1y | project_id_2 | 408 | 0 | 12 | 98319 |
| test_gomeet/t_worker_attendance_300w | GEN_CLUST_INDEX | 56 | 1 | 13 | 3 |
| test_gomeet/t_worker_attendance_300w | checkin_time | 58 | 0 | 13 | 16389 |
| test_gomeet/t_worker_attendance_300w | project_id | 59 | 0 | 13 | 16427 |
| test_gomeet/t_worker_attendance_300w | worker_id | 60 | 0 | 13 | 16428 |
| test_gomeet/t_worker_attendance_300w | subcontractor_id | 61 | 0 | 13 | 16429 |
| test_gomeet/t_worker_attendance_300w | sign_date | 67 | 0 | 13 | 65570 |
| test_gomeet/t_worker_attendance_300w | project_id_2 | 397 | 0 | 13 | 81929 |
| test_gomeet/t_worker_attendance_4y | GEN_CLUST_INDEX | 42 | 1 | 9 | 3 |
| test_gomeet/t_worker_attendance_4y | checkin_time | 47 | 0 | 9 | 16389 |
| test_gomeet/t_worker_attendance_4y | worker_id | 49 | 0 | 9 | 32775 |
| test_gomeet/t_worker_attendance_4y | project_id | 52 | 0 | 9 | 49161 |
| test_gomeet/t_worker_attendance_4y | subcontractor_id | 55 | 0 | 9 | 65547 |
| test_gomeet/t_worker_attendance_4y | sign_date | 69 | 0 | 9 | 81933 |
| test_gomeet/t_worker_attendance_4y | project_id_2 | 412 | 0 | 9 | 98319 |
+-------------------------------------------------+---------------------+----------+------+-------+---------+
mysql> SHOW GLOBAL STATUS LIKE 'Innodb_page_size';
+------------------+-------+
| Variable_name | Value |
+------------------+-------+
| Innodb_page_size | 16384 |
+------------------+-------+
root#localhost:/usr/local/mysql/data/test_gomeet# hexdump -s 49216 -n 02 t_worker_attendance_300w.ibd
000c040 0200
000c042
root#localhost:/usr/local/mysql/data/test_gomeet# hexdump -s 49216 -n 02 t_worker_attendance_1y.ibd
000c040 0300
000c042
root#localhost:/usr/local/mysql/data/test_gomeet# hexdump -s 49216 -n 02 t_worker_attendance_4y.ibd
000c040 0300
000c042
The calculation gives a height of about 3.34 for 100 million rows and about 3.589 for 400 million rows. They're almost the same. Is it because of this?
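Putting rough numbers on it, those heights are what you would expect from a B+tree with a fan-out of roughly 250 entries per 16 KB page (an assumed figure, chosen here to match the calculation above):

\[
h(N) \approx \left\lceil \log_{f} N \right\rceil,\qquad
\log_{250}(3\times10^{6}) \approx 2.7,\quad
\log_{250}(10^{8}) \approx 3.34,\quad
\log_{250}(4\times10^{8}) \approx 3.59
\]

So the 3-million-row table fits in a 3-level tree (the 0200, i.e. PAGE_LEVEL 2, read from its .ibd file), while both the 100-million and 400-million-row tables round up to 4 levels (0300), which is why the covered index lookups cost essentially the same number of page reads.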
I am trying to improve the performance of a SQL query on MariaDB 10.1.18 (Linux, Debian Jessie).
The server has a large amount of RAM (192GB) and SSD disks.
The real table has hundreds of millions of rows but I can reproduce my performance issue on a subset of the data and a simplified layout.
Here is the (simplified) table definition:
CREATE TABLE `data` (
`uri` varchar(255) NOT NULL,
`category` tinyint(4) NOT NULL,
`value` varchar(255) NOT NULL,
PRIMARY KEY (`uri`,`category`),
KEY `cvu` (`category`,`value`,`uri`),
KEY `cu` (`category`,`uri`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
To reproduce the actual distribution of my content, I insert about 200'000 rows like this (bash script):
#!/bin/bash
for i in `seq 1 100000`;
do
mysql mydb -e "INSERT INTO data (uri, category, value) VALUES ('uri${i}', 1, 'foo');"
done
for i in `seq 99981 200000`;
do
mysql mydb -e "INSERT INTO data (uri, category, value) VALUES ('uri${i}', 2, '$(($i % 5))');"
done
So, we insert about:
100'000 rows in category 1 with a static string ("foo") as the value
100'000 rows in category 2 with a number between 0 and 4 as the value
20 "uri" values that are shared between the two datasets (category 1 / category 2)
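As an aside, an untested way to load the same distribution in two statements instead of 200'000 separate client invocations, assuming MariaDB's Sequence engine (the seq_* virtual tables) is available:

INSERT INTO data (uri, category, value)
SELECT CONCAT('uri', seq), 1, 'foo' FROM seq_1_to_100000;

INSERT INTO data (uri, category, value)
SELECT CONCAT('uri', seq), 2, seq % 5 FROM seq_99981_to_200000;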
I always run an ANALYZE TABLE before querying.
Here is the explain output of the query I run:
MariaDB [mydb]> EXPLAIN EXTENDED
-> SELECT d2.uri, d2.value
-> FROM data as d1
-> INNER JOIN data as d2 ON d1.uri = d2.uri AND d2.category = 2
-> WHERE d1.category = 1 and d1.value = 'foo';
+------+-------------+-------+--------+----------------+---------+---------+-------------------+-------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+-------+--------+----------------+---------+---------+-------------------+-------+----------+-------------+
| 1 | SIMPLE | d1 | ref | PRIMARY,cvu,cu | cu | 1 | const | 92964 | 100.00 | Using where |
| 1 | SIMPLE | d2 | eq_ref | PRIMARY,cvu,cu | PRIMARY | 768 | mydb.d1.uri,const | 1 | 100.00 | |
+------+-------------+-------+--------+----------------+---------+---------+-------------------+-------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)
MariaDB [mydb]> SHOW WARNINGS;
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | select `mydb`.`d2`.`uri` AS `uri`,`mydb`.`d2`.`value` AS `value` from `mydb`.`data` `d1` join `mydb`.`data` `d2` where ((`mydb`.`d1`.`category` = 1) and (`mydb`.`d2`.`uri` = `mydb`.`d1`.`uri`) and (`mydb`.`d2`.`category` = 2) and (`mydb`.`d1`.`value` = 'foo')) |
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
MariaDB [mydb]> SELECT d2.uri, d2.value FROM data as d1 INNER JOIN data as d2 ON d1.uri = d2.uri AND d2.category = 2 WHERE d1.category = 1 and d1.value = 'foo';
+-----------+-------+
| uri | value |
+-----------+-------+
| uri100000 | 0 |
| uri99981 | 1 |
| uri99982 | 2 |
| uri99983 | 3 |
| uri99984 | 4 |
| uri99985 | 0 |
| uri99986 | 1 |
| uri99987 | 2 |
| uri99988 | 3 |
| uri99989 | 4 |
| uri99990 | 0 |
| uri99991 | 1 |
| uri99992 | 2 |
| uri99993 | 3 |
| uri99994 | 4 |
| uri99995 | 0 |
| uri99996 | 1 |
| uri99997 | 2 |
| uri99998 | 3 |
| uri99999 | 4 |
+-----------+-------+
20 rows in set (0.35 sec)
This query returns 20 rows in ~350ms.
It seems quite slow to me.
Is there a way to improve the performance of such a query? Any advice?
Can you try the following query?
SELECT dd.uri, max(case when dd.category=2 then dd.value end) v2
FROM data as dd
GROUP by 1
having max(case when dd.category=1 then dd.value end)='foo' and v2 is not null;
I cannot repeat your test at the moment, but my hope is that having to scan the table just once could compensate for the use of the aggregate functions.
Edited
Created a test environment and tested some hypotheses.
As of today, the best performance (for 1 million rows) has been:
1 - Adding an index on the uri column
2 - Using the following query
select d2.uri, d2.value
FROM data as d2
where exists (select 1
from data d1
where d1.uri = d2.uri
AND d1.category = 1
and d1.value='foo')
and d2.category=2
and d2.uri in (select uri from data group by 1 having count(*) > 1);
The ironic thing is that in the first proposal I tried to minimize the access to the table and now I'm proposing three accesses.
Edited: 30/10
Ok, so I've done some other experiments and I would like to summarize the outcomes.
First, I'd like to expand a bit on Aruna's answer:
What I found interesting in the OP's question is that it is an exception to a classic "rule of thumb" in database optimization: if the number of desired results is very small compared to the size of the tables involved, it should be possible, with the correct indexes, to get very good performance.
Why can't we simply add a "magic index" to get our 20 rows? Because we don't have any clear "attack vector". I mean, there is no clearly selective criterion we can apply to a record to significantly reduce the number of target rows.
Think about it: the fact that the value must be "foo" only removes 50% of the table from the equation. The category is not selective at all either: the only interesting thing is that, for 20 uris, they appear in records with both category 1 and category 2.
But here lies the issue: the condition involves comparing two rows, and unfortunately, to my knowledge, there is no way an index (not even Oracle's function-based indexes) can narrow down a condition that depends on information from multiple rows.
The conclusion might be: if this kind of query is what you need, you should revise your data model. For example, if you have a finite and small number of categories (let's say three), your table might be written as:
uri, value_category1, value_category2, value_category3
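A concrete sketch of that model (data_pivoted is a hypothetical name, and the types are simply copied from the original table):

CREATE TABLE data_pivoted (
  uri varchar(255) NOT NULL,
  value_category1 varchar(255) DEFAULT NULL,
  value_category2 varchar(255) DEFAULT NULL,
  value_category3 varchar(255) DEFAULT NULL,
  PRIMARY KEY (uri),
  KEY vc1 (value_category1)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;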
The query would then be:
select uri, value_category2
from data_pivoted -- the hypothetical pivoted table sketched above
where value_category1='foo' and value_category2 is not null;
By the way, let's go back to the original question.
I've created a slightly more efficient test data generator (http://pastebin.com/DP8Uaj2t).
I've used this table:
use mydb;
DROP TABLE IF EXISTS data2;
CREATE TABLE data2
(
uri varchar(255) NOT NULL,
category tinyint(4) NOT NULL,
value varchar(255) NOT NULL,
PRIMARY KEY (uri,category),
KEY cvu (category,value,uri),
KEY ucv (uri,category,value),
KEY u (uri),
KEY cu (category,uri)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The outcome is:
+--------------------------+----------+----------+----------+
| query_descr | num_rows | num | num_test |
+--------------------------+----------+----------+----------+
| exists_plus_perimeter | 10000 | 0.0000 | 5 |
| exists_plus_perimeter | 50000 | 0.0000 | 5 |
| exists_plus_perimeter | 100000 | 0.0000 | 5 |
| exists_plus_perimeter | 500000 | 2.0000 | 5 |
| exists_plus_perimeter | 1000000 | 4.8000 | 5 |
| exists_plus_perimeter | 5000000 | 26.7500 | 8 |
| max_based | 10000 | 0.0000 | 5 |
| max_based | 50000 | 0.0000 | 5 |
| max_based | 100000 | 0.0000 | 5 |
| max_based | 500000 | 3.2000 | 5 |
| max_based | 1000000 | 7.0000 | 5 |
| max_based | 5000000 | 49.5000 | 8 |
| max_based_with_ucv | 10000 | 0.0000 | 5 |
| max_based_with_ucv | 50000 | 0.0000 | 5 |
| max_based_with_ucv | 100000 | 0.0000 | 5 |
| max_based_with_ucv | 500000 | 2.6000 | 5 |
| max_based_with_ucv | 1000000 | 7.0000 | 5 |
| max_based_with_ucv | 5000000 | 36.3750 | 8 |
| standard_join | 10000 | 0.0000 | 5 |
| standard_join | 50000 | 0.4000 | 5 |
| standard_join | 100000 | 2.4000 | 5 |
| standard_join | 500000 | 13.4000 | 5 |
| standard_join | 1000000 | 33.2000 | 5 |
| standard_join | 5000000 | 205.2500 | 8 |
| standard_join_plus_perim | 5000000 | 155.0000 | 2 |
+--------------------------+----------+----------+----------+
The queries used are:
- query_exists_plus_perimeter.sql
- query_max_based.sql
- query_max_based_with_ucv.sql
- query_standard_join.sql
- query_standard_join_plus_perim.sql
The best query is still the "query_exists_plus_perimeter" one that I posted after creating the first test environment.
It is mainly due to the number of rows analysed. Even though the tables are indexed, the main decision-making condition "WHERE d1.category = 1 and d1.value = 'foo'" still matches a huge number of rows:
+------+-------------+-------+-.....-+-------+----------+-------------+
| id | select_type | table | | rows | filtered | Extra |
+------+-------------+-------+-.....-+-------+----------+-------------+
| 1 | SIMPLE | d1 | ..... | 92964 | 100.00 | Using where |
For each and every matching row it has to read the table again, this time for category 2. Since it is reading on the primary key, it can fetch the matching row directly.
On your original table, check the cardinality of the combination of category and value, as in the example below. If it is closer to unique, you can add an index on (category, value) and that should improve performance. If it is like the example given, you may not get any performance improvement.
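For example (an illustrative query against the simplified table; substitute the real table name in your schema):

SELECT category, value, COUNT(*) AS rows_per_pair
FROM data
GROUP BY category, value
ORDER BY rows_per_pair DESC
LIMIT 10;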
I have created a MySQL table and hash-partitioned it as below.
mysql> CREATE TABLE employees (
id INT NOT NULL,
fname VARCHAR(30),
lname VARCHAR(30),
hired DATE NOT NULL DEFAULT '1970-01-01',
separated DATE NOT NULL DEFAULT '9999-12-31',
job_code INT,
store_id INT,
PRIMARY KEY(id)
)
PARTITION BY HASH(id)
PARTITIONS 10;
After I created the table successfully, I inserted the value 1 (into store_id) as shown below:
mysql>INSERT INTO employees (store_id) values (1);
Now I don't understand which partition this row went into. Into which partition (p0, p1, p2, ..., p9) does a store_id value of 1 go? I thought it would go into p0, but it did not. I checked it like this:
mysql>SELECT TABLE_NAME, PARTITION_NAME, TABLE_ROWS, AVG_ROW_LENGTH,DATA_LENGTH FROM INFORMATION_SCHEMA.PARTITIONS WHERE TABLE_NAME LIKE 'employees';
It shows the value went into p1; see below:
mysql>
+------------+----------------+------------+----------------+-------------+
| TABLE_NAME | PARTITION_NAME | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH |
+------------+----------------+------------+----------------+-------------+
| employees | p0 | 0 | 0 | 16384 |
| employees | p1 | 1 | 16384 | 16384 |
| employees | p2 | 0 | 0 | 16384 |
| employees | p3 | 0 | 0 | 16384 |
| employees | p4 | 0 | 0 | 16384 |
| employees | p5 | 0 | 0 | 16384 |
| employees | p6 | 0 | 0 | 16384 |
| employees | p7 | 0 | 0 | 16384 |
| employees | p8 | 0 | 0 | 16384 |
| employees | p9 | 0 | 0 | 16384 |
+------------+----------------+------------+----------------+-------------+
I don't know why it got inserted into p1, so I tested it again. I inserted the value 2 this time:
mysql> INSERT INTO employees (store_id) values (2);
It went into p2.
+------------+----------------+------------+----------------+-------------+
| TABLE_NAME | PARTITION_NAME | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH |
+------------+----------------+------------+----------------+-------------+
| employees | p0 | 0 | 0 | 16384 |
| employees | p1 | 1 | 16384 | 16384 |
| employees | p2 | 1 | 16384 | 16384 |
| employees | p3 | 0 | 0 | 16384 |
| employees | p4 | 0 | 0 | 16384 |
| employees | p5 | 0 | 0 | 16384 |
| employees | p6 | 0 | 0 | 16384 |
| employees | p7 | 0 | 0 | 16384 |
| employees | p8 | 0 | 0 | 16384 |
| employees | p9 | 0 | 0 | 16384 |
+------------+----------------+------------+----------------+-------------+
Why are values getting inserted into different partitions? Is there any rule that hash partitioning follows? Interestingly, it skipped p0 and started inserting at p1. Can someone explain?
If this explanation holds true for your MySQL version, the partition number is found this way: MOD([partitioning expression], [number of partitions]).
In your case the first row probably has id = 1, so the calculation is MOD(1, 10) = 1 and the row goes to partition p1 (id = 2 goes to p2).
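A quick way to confirm the rule (a sketch; 37 is an arbitrary example id, and on MySQL 8.0 the PARTITIONS keyword is dropped because EXPLAIN always shows the partitions column):

-- HASH(id) with 10 partitions means partition number = MOD(id, 10)
SELECT MOD(1, 10) AS p_for_id_1, MOD(2, 10) AS p_for_id_2, MOD(37, 10) AS p_for_id_37;

-- ask the optimizer which partition it would read for a given id
EXPLAIN PARTITIONS SELECT * FROM employees WHERE id = 37;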