I have created a MySQL table and hash-partitioned it as shown below.
mysql> CREATE TABLE employees (
id INT NOT NULL,
fname VARCHAR(30),
lname VARCHAR(30),
hired DATE NOT NULL DEFAULT '1970-01-01',
separated DATE NOT NULL DEFAULT '9999-12-31',
job_code INT,
store_id INT,
PRIMARY KEY(id)
)
PARTITION BY HASH(id)
PARTITIONS 10;
After the table was created successfully, I inserted the value 1 (into store_id), as shown below:
mysql>INSERT INTO employees (store_id) values (1);
Now I don't understand which partition (p0, p1, p2, ..., p9) the row with store_id 1 went into. I thought it would go into p0, but it did not. I checked it like this:
mysql>SELECT TABLE_NAME, PARTITION_NAME, TABLE_ROWS, AVG_ROW_LENGTH,DATA_LENGTH FROM INFORMATION_SCHEMA.PARTITIONS WHERE TABLE_NAME LIKE 'employees';
It showed that the value went into p1. See below:
mysql>
+------------+----------------+------------+----------------+-------------+
| TABLE_NAME | PARTITION_NAME | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH |
+------------+----------------+------------+----------------+-------------+
| employees | p0 | 0 | 0 | 16384 |
| employees | p1 | 1 | 16384 | 16384 |
| employees | p2 | 0 | 0 | 16384 |
| employees | p3 | 0 | 0 | 16384 |
| employees | p4 | 0 | 0 | 16384 |
| employees | p5 | 0 | 0 | 16384 |
| employees | p6 | 0 | 0 | 16384 |
| employees | p7 | 0 | 0 | 16384 |
| employees | p8 | 0 | 0 | 16384 |
| employees | p9 | 0 | 0 | 16384 |
+------------+----------------+------------+----------------+-------------+
I don't know why it got inserted into p1, so I tested it again. I inserted the value 2 this time:
mysql> INSERT INTO employees (store_id) values (2);
It went into p2:
+------------+----------------+------------+----------------+-------------+
| TABLE_NAME | PARTITION_NAME | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH |
+------------+----------------+------------+----------------+-------------+
| employees | p0 | 0 | 0 | 16384 |
| employees | p1 | 1 | 16384 | 16384 |
| employees | p2 | 1 | 16384 | 16384 |
| employees | p3 | 0 | 0 | 16384 |
| employees | p4 | 0 | 0 | 16384 |
| employees | p5 | 0 | 0 | 16384 |
| employees | p6 | 0 | 0 | 16384 |
| employees | p7 | 0 | 0 | 16384 |
| employees | p8 | 0 | 0 | 16384 |
| employees | p9 | 0 | 0 | 16384 |
+------------+----------------+------------+----------------+-------------+
Why are the values getting inserted into different partitions? Is there any rule that hash partitioning follows? Interestingly, it skipped p0 and started inserting at p1. Can someone explain?
If this explanation holds true for your MySQL version, the partition number is found this way: MOD(<value of the partitioning expression>, <number of partitions>).
In your case the first row probably has id = 1, so the calculation is MOD(1,10) = 1 and the row goes to partition p1 (a row with id = 2 goes to p2).
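If this holds for your version, you can check it directly (a quick sketch; EXPLAIN PARTITIONS works on MySQL 5.1-5.6, and from 5.7 on a plain EXPLAIN already shows a partitions column):
mysql> SELECT id, MOD(id, 10) AS expected_partition FROM employees;
mysql> EXPLAIN PARTITIONS SELECT * FROM employees WHERE id = 1;
The partitions column of the second statement should show only p1, matching MOD(1,10) = 1.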
I have the following query that runs really slowly on MySQL (83 seconds) but really fast on MariaDB (0.4 seconds).
I verified that both databases have the same indexes and data. The MariaDB server has less CPU (1 vCPU) and memory (2 GB).
The MySQL servers have 8-32 GB of RAM and full quad-core processors (I tried 5.6, 5.7, and 8.0 with similar results).
The phppos_inventory table has ~170,000 rows and the phppos_items table has ~3,000 rows.
Here are the query, the tables, and the EXPLAIN outputs:
SELECT /*+ SEMIJOIN(#subq MATERIALIZATION) */ SQL_CALC_FOUND_ROWS
1 AS _h,
`phppos_location_items`.`location_id` AS `location_id`,
`phppos_items`.`item_id`,
`phppos_items`.`name`,
`phppos_categories`.`id` AS `category_id`,
`phppos_categories`.`name` AS `category`,
`location`,
`company_name`,
`phppos_items`.`item_number`,
`size`,
`product_id`,
Coalesce(phppos_location_item_variations.cost_price,
phppos_item_variations.cost_price, phppos_location_items.cost_price,
phppos_items.cost_price, 0) AS cost_price,
Coalesce(phppos_location_item_variations.unit_price,
phppos_item_variations.unit_price, phppos_location_items.unit_price,
phppos_items.unit_price, 0) AS unit_price,
Sum(Coalesce(inv.trans_current_quantity, 0)) AS quantity,
Coalesce(phppos_location_item_variations.reorder_level,
phppos_item_variations.reorder_level, phppos_location_items.reorder_level,
phppos_items.reorder_level) AS reorder_level,
Coalesce(phppos_location_item_variations.replenish_level,
phppos_item_variations.replenish_level, phppos_location_items.replenish_level,
phppos_items.replenish_level) AS replenish_level,
description
FROM `phppos_inventory` `inv`
LEFT JOIN `phppos_items`
ON `phppos_items`.`item_id` = `inv`.`trans_items`
LEFT JOIN `phppos_location_items`
ON `phppos_location_items`.`item_id` = `phppos_items`.`item_id`
AND `phppos_location_items`.`location_id` = `inv`.`location_id`
LEFT JOIN `phppos_item_variations`
ON `phppos_items`.`item_id` = `phppos_item_variations`.`item_id`
AND `phppos_item_variations`.`id` = `inv`.`item_variation_id`
AND `phppos_item_variations`.`deleted` = 0
LEFT JOIN `phppos_location_item_variations`
ON `phppos_location_item_variations`.`item_variation_id` =
`phppos_item_variations`.`id`
AND `phppos_location_item_variations`.`location_id` =
`inv`.`location_id`
LEFT OUTER JOIN `phppos_suppliers`
ON `phppos_items`.`supplier_id` =
`phppos_suppliers`.`person_id`
LEFT OUTER JOIN `phppos_categories`
ON `phppos_items`.`category_id` = `phppos_categories`.`id`
WHERE inv.trans_id = (SELECT Max(inv1.trans_id)
FROM phppos_inventory inv1
WHERE inv1.trans_items = inv.trans_items
AND ( inv1.item_variation_id =
phppos_item_variations.id
OR phppos_item_variations.id IS NULL )
AND inv1.location_id = inv.location_id
AND inv1.trans_date < '2019-12-31 23:59:59')
AND inv.location_id IN( 1 )
AND `phppos_items`.`system_item` = 0
AND `phppos_items`.`deleted` = 0
AND `is_service` != 1
GROUP BY `phppos_items`.`item_id`
LIMIT 20
EXPLAIN on MySQL (slightly different from MariaDB; I tried USE INDEX to match its execution plan and it was still slow):
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
| 1 | PRIMARY | phppos_items | NULL | ref | PRIMARY,item_number,product_id,phppos_items_ibfk_1,deleted,phppos_items_ibfk_3,phppos_items_ibfk_4,phppos_items_ibfk_5,description,size,reorder_level,cost_price,unit_price,promo_price,last_modified,name,phppos_items_ibfk_6,deleted_system_item,custom_field_1_value,custom_field_2_value,custom_field_3_value,custom_field_4_value,custom_field_5_value,custom_field_6_value,custom_field_7_value,custom_field_8_value,custom_field_9_value,custom_field_10_value,verify_age,phppos_items_ibfk_7,item_inactive_index,tags,full_search,name_search,item_number_search,product_id_search,description_search,size_search,custom_field_1_value_search,custom_field_2_value_search,custom_field_3_value_search,custom_field_4_value_search,custom_field_5_value_search,custom_field_6_value_search,custom_field_7_value_search,custom_field_8_value_search,custom_field_9_value_search,custom_field_10_value_search | deleted | 4 | const | 21188 | 9.00 | Using index condition; Using where |
| 1 | PRIMARY | inv | NULL | ref | phppos_inventory_ibfk_1,location_id,phppos_inventory_custom | phppos_inventory_custom | 8 | pos.phppos_items.item_id,const | 3 | 100.00 | NULL |
| 1 | PRIMARY | phppos_location_items | NULL | eq_ref | PRIMARY,phppos_location_items_ibfk_2 | PRIMARY | 8 | const,pos.phppos_items.item_id | 1 | 100.00 | NULL |
| 1 | PRIMARY | phppos_item_variations | NULL | eq_ref | PRIMARY,phppos_item_variations_ibfk_1 | PRIMARY | 4 | pos.inv.item_variation_id | 1 | 100.00 | Using where |
| 1 | PRIMARY | phppos_location_item_variations | NULL | eq_ref | PRIMARY,phppos_item_attribute_location_values_ibfk_2 | PRIMARY | 8 | pos.phppos_item_variations.id,const | 1 | 100.00 | NULL |
| 1 | PRIMARY | phppos_suppliers | NULL | ref | person_id | person_id | 4 | pos.phppos_items.supplier_id | 1 | 100.00 | NULL |
| 1 | PRIMARY | phppos_categories | NULL | eq_ref | PRIMARY | PRIMARY | 4 | pos.phppos_items.category_id | 1 | 100.00 | NULL |
| 2 | DEPENDENT SUBQUERY | inv1 | NULL | ref | phppos_inventory_ibfk_1,location_id,trans_date,phppos_inventory_ibfk_4,phppos_inventory_custom | phppos_inventory_custom | 8 | pos.inv.trans_items,pos.inv.location_id | 3 | 50.00 | Using where; Using index |
EXPLAIN on MariaDB:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | PRIMARY | phppos_items | ref | PRIMARY,deleted,deleted_system_item | deleted | 4 | const | 23955 | Using where |
| 1 | PRIMARY | inv | ref | phppos_inventory_ibfk_1,location_id,phppos_inventory_custom | phppos_inventory_ibfk_1 | 4 | freelance_pos5.phppos_items.item_id | 2 | Using where |
| 1 | PRIMARY | phppos_location_items | eq_ref | PRIMARY,phppos_location_items_ibfk_2 | PRIMARY | 8 | const,freelance_pos5.phppos_items.item_id | 1 | |
| 1 | PRIMARY | phppos_item_variations | eq_ref | PRIMARY,phppos_item_variations_ibfk_1 | PRIMARY | 4 | freelance_pos5.inv.item_variation_id | 1 | Using where |
| 1 | PRIMARY | phppos_location_item_variations | eq_ref | PRIMARY,phppos_item_attribute_location_values_ibfk_2 | PRIMARY | 8 | freelance_pos5.phppos_item_variations.id,const | 1 | Using where |
| 1 | PRIMARY | phppos_suppliers | ref | person_id | person_id | 4 | freelance_pos5.phppos_items.supplier_id | 1 | Using where |
| 1 | PRIMARY | phppos_categories | eq_ref | PRIMARY | PRIMARY | 4 | freelance_pos5.phppos_items.category_id | 1 | Using where |
| 2 | DEPENDENT SUBQUERY | inv1 | ref | phppos_inventory_ibfk_1,location_id,trans_date,phppos_inventory_ibfk_4,phppos_inventory_custom | phppos_inventory_custom | 8 | freelance_pos5.inv.trans_items,freelance_pos5.inv.location_id | 2 | Using where; Using index |
Tables described (Reached StackOverflow char limit)
https://pastebin.com/nhngSHb8
Create tables:
https://pastebin.com/aWMeriqt
MYSQL (DEV BOX)
mysql> SHOW GLOBAL STATUS LIKE '%thread%';
+------------------------------------------+-------+
| Variable_name | Value |
+------------------------------------------+-------+
| Delayed_insert_threads | 0 |
| Performance_schema_thread_classes_lost | 0 |
| Performance_schema_thread_instances_lost | 0 |
| Slow_launch_threads | 0 |
| Threads_cached | 4 |
| Threads_connected | 1 |
| Threads_created | 5 |
| Threads_running | 1 |
+------------------------------------------+-------+
8 rows in set (0.06 sec)
MARIA DB
MariaDB [freelance_pos5]> SHOW GLOBAL STATUS LIKE '%thread%';
+------------------------------------------+-------+
| Variable_name | Value |
+------------------------------------------+-------+
| Delayed_insert_threads | 0 |
| Performance_schema_thread_classes_lost | 0 |
| Performance_schema_thread_instances_lost | 0 |
| Slow_launch_threads | 0 |
| Threadpool_idle_threads | 0 |
| Threadpool_threads | 0 |
| Threads_cached | 3 |
| Threads_connected | 2 |
| Threads_created | 5 |
| Threads_running | 1 |
| wsrep_applier_thread_count | 0 |
| wsrep_rollbacker_thread_count | 0 |
| wsrep_thread_count | 0 |
+------------------------------------------+-------+
13 rows in set (0.00 sec)
Moving the
WHERE inv.trans_id = (SELECT Max(inv1.trans_id) ... )
condition into an INNER JOIN on a derived table is the game changer:
INNER JOIN (
SELECT inv1.trans_items, inv1.item_variation_id, inv1.location_id, MAX(inv1.trans_id) as trans_id
FROM phppos_inventory inv1
WHERE inv1.trans_date < '2019-12-31 23:59:59'
GROUP BY inv1.trans_items, inv1.item_variation_id, inv1.location_id
ORDER BY inv1.trans_items, inv1.item_variation_id, inv1.location_id
) inv1 on inv1.trans_id = inv.trans_id
AND inv1.trans_items = inv.trans_items
AND (inv1.item_variation_id = phppos_item_variations.id OR phppos_item_variations.id IS NULL)
AND inv1.location_id = inv.location_id
The execution time is reduced from 80+ seconds down to under 0.4 seconds on MySQL 8.0.
MariaDB's and MySQL's optimizers started diverging significantly at 5.6; certain queries will run faster in one than in the other.
I think I see a way to speed up the query, perhaps on both versions.
Don't use LEFT JOIN when it is effectively the same as JOIN, which seems to be the case for at least phppos_items, since its columns appear in the WHERE clause and override the LEFT.
Please provide SHOW CREATE TABLE output; meanwhile, I will guess at what indexes you do and don't have, and assume that each table has PRIMARY KEY(id).
Use composite indexes where appropriate. (More below.)
Get the 20 rows before JOINing to the rest of the tables:
SELECT ...
FROM ( SELECT inv.id AS inv_id, pi.id AS pi_id
FROM `phppos_inventory` AS `inv`
JOIN `phppos_items` AS pi
ON pi.`item_id` = `inv`.`trans_items`
AND inv.location_id IN( 1 )
AND pi.`system_item` = 0
AND pi.`deleted` = 0
AND `is_service` != 1 -- Which table is this in???
GROUP BY pi.`item_id`
LIMIT 20 ) AS top20  -- the derived table needs an alias
LEFT JOIN .... (( all the other tables ))
-- no GROUP BY or LIMIT needed (I think)
phppos_items: INDEX(item_id, deleted, system_item, is_service)
phppos_items: INDEX(deleted, system_item, is_service)
phppos_inventory: INDEX(trans_items, location_id, item_variation_id, trans_date, trans_id)
phppos_inventory: INDEX(location_id)
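If those guesses hold, the suggested indexes could be added along these lines (a sketch only; the index names are placeholders and the column choices are assumptions to be checked against the real schema):
ALTER TABLE phppos_items
    ADD INDEX idx_items_id_filter (item_id, deleted, system_item, is_service),
    ADD INDEX idx_items_filter (deleted, system_item, is_service);
ALTER TABLE phppos_inventory
    ADD INDEX idx_inv_trans (trans_items, location_id, item_variation_id, trans_date, trans_id),
    ADD INDEX idx_inv_location (location_id);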
Aside from the fact that the query is misleading since the outer join is effectively discarded, the main difference is that the second engine operation in MariaDB is an index lookup (ref) using the phppos_inventory_ibfk_1 index, while MySQL chose the phppos_inventory_custom index instead.
However, without the definitions of these two indexes it's difficult to assess why the engines chose different paths.
Please add the definitions of these indexes to your question, and also their selectivity (estimated rows selected / total table rows), so we can elaborate more.
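In the meantime, a rough way to gather that information yourself (assuming the index and column names visible in the EXPLAIN output above):
SHOW INDEX FROM phppos_inventory;  -- lists each index's columns and cardinality estimates
SELECT COUNT(DISTINCT trans_items) / COUNT(*) AS sel_trans_items,
       COUNT(DISTINCT location_id) / COUNT(*) AS sel_location_id
FROM phppos_inventory;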
MySQL gurus,
We have a read-only dataset that contains around 4 million rows. It is represented as one InnoDB table with a bunch of indexes, like this:
CREATE TABLE IF NOT EXISTS dim_users (
ds varchar(10),
user_id varchar(32),
date_joined varchar(10),
payer boolean,
source varchar(12),
country varchar(2),
is1dayactive boolean,
is7dayactive boolean,
is28dayactive boolean,
istest boolean
) ENGINE = InnoDB;
ALTER TABLE dim_users ADD INDEX (ds);
ALTER TABLE dim_users ADD INDEX (user_id);
ALTER TABLE dim_users ADD INDEX (date_joined);
ALTER TABLE dim_users ADD INDEX (source);
ALTER TABLE dim_users ADD INDEX (country);
MySQL 8 is running on Ubuntu 18.04 at the servers.com provider, on a cloud server with 8 GB of RAM and swap turned off.
The database has the default config, except:
[mysqld]
innodb-buffer-pool-size=6G
read_only=1
The problem is that aggregation queries like select count(*) from dim_users or select country,count(*) from dim_users group by 1 order by 1 desc; take too much time (up to 12 seconds) compared to a default Postgres 11 setup on the same vhost.
A default Postgres 11 runs the last query in less than 2 seconds.
We tried to profile and analyze this query on MySQL:
Explain:
mysql> explain analyze select country,count(*) from dim_users group by 1 order by 1 desc;
+---------------+
| EXPLAIN
+---------------+
| -> Group aggregate: count(0) (actual time=241.091..14777.679 rows=11 loops=1)
| -> Index scan on dim_users using country (reverse) (cost=430171.90 rows=4230924) (actual time=0.054..11940.166 rows=4228692 loops=1)
|
+---------------+
1 row in set (14.78 sec)
Profile:
mysql> SELECT SEQ,STATE,DURATION,SWAPS,SOURCE_FUNCTION,SOURCE_FILE,SOURCE_LINE FROM INFORMATION_SCHEMA.PROFILING WHERE QUERY_ID=1;
+-----+--------------------------------+-----------+-------+-------------------------+----------------------+-------------+
| SEQ | STATE | DURATION | SWAPS | SOURCE_FUNCTION | SOURCE_FILE | SOURCE_LINE |
+-----+--------------------------------+-----------+-------+-------------------------+----------------------+-------------+
| 2 | starting | 0.000115 | 0 | NULL | NULL | NULL |
| 3 | Executing hook on transaction | 0.000008 | 0 | launch_hook_trans_begin | rpl_handler.cc | 1119 |
| 4 | starting | 0.000013 | 0 | launch_hook_trans_begin | rpl_handler.cc | 1121 |
| 5 | checking permissions | 0.000010 | 0 | check_access | sql_authorization.cc | 2176 |
| 6 | Opening tables | 0.000050 | 0 | open_tables | sql_base.cc | 5591 |
| 7 | init | 0.000012 | 0 | execute | sql_select.cc | 677 |
| 8 | System lock | 0.000020 | 0 | mysql_lock_tables | lock.cc | 331 |
| 9 | optimizing | 0.000007 | 0 | optimize | sql_optimizer.cc | 282 |
| 10 | statistics | 0.000046 | 0 | optimize | sql_optimizer.cc | 502 |
| 11 | preparing | 0.000035 | 0 | optimize | sql_optimizer.cc | 583 |
| 12 | executing | 13.552128 | 0 | ExecuteIteratorQuery | sql_union.cc | 1409 |
| 13 | end | 0.000037 | 0 | execute | sql_select.cc | 730 |
| 14 | query end | 0.000008 | 0 | mysql_execute_command | sql_parse.cc | 4606 |
| 15 | waiting for handler commit | 0.000018 | 0 | ha_commit_trans | handler.cc | 1589 |
| 16 | closing tables | 0.000015 | 0 | mysql_execute_command | sql_parse.cc | 4657 |
| 17 | freeing items | 0.000039 | 0 | mysql_parse | sql_parse.cc | 5330 |
| 18 | cleaning up | 0.000020 | 0 | dispatch_command | sql_parse.cc | 2184 |
+-----+--------------------------------+-----------+-------+-------------------------+----------------------+-------------+
As you can see, the GROUP BY query takes ages to complete.
So, my questions are the following.
I've never used MySQL before; my main DB has been PostgreSQL for years, and I don't know whether it's normal behaviour for this DB to be very slow when executing GROUP BY queries.
Otherwise, if I'm missing something important here, e.g. DB or OS configuration, could you please give me some advice on how to properly set up MySQL for read-only purposes?
Thank you!
I have a nano AWS server running MySQL 5.5 for testing purposes. So, keep in mind that the server has limited resources (RAM, CPU, ...).
I have a table called "gpslocations". There is a primary index on its primary key "GPSLocationID". There is another secondary index on one of its fields "userID". The table has 6583 records.
When I run this query:
select * from gpslocations where GPSLocationID in (select max(GPSLocationID) from gpslocations where userID in (1,9) group by userID);
I get two rows and it takes a lot of time:
+---------------+---------------------+------------+-----------+--------------------------------------+--------+--------------------------------------+-------+-----------+----------+---------------------+----------------+----------+-----------+-----------+
| GPSLocationID | lastUpdate | latitude | longitude | phoneNumber | userID | sessionID | speed | direction | distance | gpsTime | locationMethod | accuracy | extraInfo | eventType |
+---------------+---------------------+------------+-----------+--------------------------------------+--------+--------------------------------------+-------+-----------+----------+---------------------+----------------+----------+-----------+-----------+
| 4107 | 2018-09-25 16:38:44 | 58.7641435 | 7.4868510 | e5d6fdff-9afe-44bb-a53a-3b454b12c9c6 | 9 | 77385f89-6b72-4b9e-b937-d2927959e0bd | 0 | 0 | 2.9 | 2018-09-25 18:38:43 | fused | 455 | 0 | android |
| 9822 | 2018-10-22 10:29:43 | 58.7794353 | 7.1952995 | 5240853e-2c36-4563-9dc3-238039de411e | 1 | 1fcad5af-c6ef-4bda-8fb2-d6e5688cf08a | 0 | 0 | 185.6 | 2018-10-22 12:29:41 | fused | 129 | 0 | android |
+---------------+---------------------+------------+-----------+--------------------------------------+--------+--------------------------------------+-------+-----------+----------+---------------------+----------------+----------+-----------+-----------+
2 rows in set (14.96 sec)
When I just execute the inner select:
select max(GPSLocationID) from gpslocations where userID in (1,9) group by userID;
I get two values very fast:
+--------------------+
| max(GPSLocationID) |
+--------------------+
| 9822 |
| 4107 |
+--------------------+
2 rows in set (0.00 sec)
When I take these two values and write them manually in the outer select:
select * from gpslocations where GPSLocationID in (9822,4107);
I get exactly the same result as the first query but in no time!
+---------------+---------------------+------------+-----------+--------------------------------------+--------+--------------------------------------+-------+-----------+----------+---------------------+----------------+----------+-----------+-----------+
| GPSLocationID | lastUpdate | latitude | longitude | phoneNumber | userID | sessionID | speed | direction | distance | gpsTime | locationMethod | accuracy | extraInfo | eventType |
+---------------+---------------------+------------+-----------+--------------------------------------+--------+--------------------------------------+-------+-----------+----------+---------------------+----------------+----------+-----------+-----------+
| 4107 | 2018-09-25 16:38:44 | 58.7641435 | 7.4868510 | e5d6fdff-9afe-44bb-a53a-3b454b12c9c6 | 9 | 77385f89-6b72-4b9e-b937-d2927959e0bd | 0 | 0 | 2.9 | 2018-09-25 18:38:43 | fused | 455 | 0 | android |
| 9822 | 2018-10-22 10:29:43 | 58.7794353 | 7.1952995 | 5240853e-2c36-4563-9dc3-238039de411e | 1 | 1fcad5af-c6ef-4bda-8fb2-d6e5688cf08a | 0 | 0 | 185.6 | 2018-10-22 12:29:41 | fused | 129 | 0 | android |
+---------------+---------------------+------------+-----------+--------------------------------------+--------+--------------------------------------+-------+-----------+----------+---------------------+----------------+----------+-----------+-----------+
2 rows in set (0.00 sec)
Can anybody explain this huge performance degradation when the two simple and fast queries are combined in one?
EDIT
Here is the output of explain:
+----+--------------------+--------------+-------+----------------------+--------+---------+------+------+---------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+--------------+-------+----------------------+--------+---------+------+------+---------------------------------------+
| 1 | PRIMARY | gpslocations | ALL | NULL | NULL | NULL | NULL | 6648 | Using where |
| 2 | DEPENDENT SUBQUERY | gpslocations | range | userNameIndex,userID | userID | 5 | NULL | 11 | Using where; Using index for group-by |
+----+--------------------+--------------+-------+----------------------+--------+---------+------+------+---------------------------------------+
2 rows in set (0.00 sec)
IN can have really bad optimization characteristics. In your version of MySQL, the subquery is probably being run once for every row in gpslocations. I think this performance problem was fixed in later versions.
I recommend using a correlated subquery instead:
select l.*
from gpslocations l
where l.GPSLocationID = (select max(l2.GPSLocationID)
from gpslocations l2
where l2.userID = l.userId
) and
l.userID in (1, 9);
And for this, you want an index on gpslocations(userID, GPSLocationID).
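A minimal sketch of adding that index (the index name here is just a placeholder):
ALTER TABLE gpslocations ADD INDEX idx_user_gps (userID, GPSLocationID);
Note that if the table is InnoDB, the existing secondary index on userID already carries the primary key GPSLocationID implicitly, which is why the EXPLAIN above shows "Using index for group-by" on the subquery even without this composite index.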
Another alternative is the join approach:
select l.*
from gpslocations l join
(select l2.userID, max(l2.GPSLocationID) as maxGPSLocationID
from gpslocations l2
where l2.userID in (1, 9)
group by l2.userID
) l2
on l2.userID = l.userID
and l.GPSLocationID = l2.maxGPSLocationID
where l.userID in (1, 9);
It seems that MySQL is only selecting data from the first and last partitions when the table is partitioned by a date range.
| sales | CREATE TABLE `sales` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`quantity_sold` int(11) NOT NULL,
`prod_id` int(11) NOT NULL,
`store_id` int(11) NOT NULL,
`date` date NOT NULL,
KEY `prod_id` (`prod_id`),
KEY `date` (`date`),
KEY `store_id` (`store_id`),
KEY `id` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=577574322 DEFAULT CHARSET=utf8
/*!50100 PARTITION BY RANGE (to_days(date))
(PARTITION p0 VALUES LESS THAN (0) ENGINE = InnoDB,
PARTITION p201211 VALUES LESS THAN (735203) ENGINE = InnoDB,
PARTITION p201212 VALUES LESS THAN (735234) ENGINE = InnoDB,
PARTITION p201301 VALUES LESS THAN (735265) ENGINE = InnoDB,
PARTITION p201302 VALUES LESS THAN (735293) ENGINE = InnoDB,
PARTITION p201303 VALUES LESS THAN (735324) ENGINE = InnoDB,
PARTITION p201304 VALUES LESS THAN (735354) ENGINE = InnoDB,
PARTITION p201305 VALUES LESS THAN (735385) ENGINE = InnoDB,
PARTITION p201306 VALUES LESS THAN (735415) ENGINE = InnoDB,
PARTITION p201307 VALUES LESS THAN (735446) ENGINE = InnoDB,
PARTITION p201308 VALUES LESS THAN (735477) ENGINE = InnoDB,
PARTITION p201309 VALUES LESS THAN (735507) ENGINE = InnoDB,
PARTITION p201310 VALUES LESS THAN (735538) ENGINE = InnoDB,
PARTITION p201311 VALUES LESS THAN (735568) ENGINE = InnoDB,
PARTITION p201312 VALUES LESS THAN (735599) ENGINE = InnoDB,
PARTITION p201401 VALUES LESS THAN (735630) ENGINE = InnoDB,
PARTITION p201402 VALUES LESS THAN (735658) ENGINE = InnoDB,
PARTITION p201403 VALUES LESS THAN (735689) ENGINE = InnoDB,
PARTITION p201404 VALUES LESS THAN (735719) ENGINE = InnoDB,
PARTITION p201405 VALUES LESS THAN (735750) ENGINE = InnoDB,
PARTITION p201406 VALUES LESS THAN (735780) ENGINE = InnoDB,
PARTITION p201407 VALUES LESS THAN (735811) ENGINE = InnoDB,
PARTITION p201408 VALUES LESS THAN (735842) ENGINE = InnoDB,
PARTITION p201409 VALUES LESS THAN (735872) ENGINE = InnoDB,
PARTITION p201410 VALUES LESS THAN (735903) ENGINE = InnoDB,
PARTITION p201411 VALUES LESS THAN (735933) ENGINE = InnoDB,
PARTITION p201412 VALUES LESS THAN (735964) ENGINE = InnoDB,
PARTITION P201501 VALUES LESS THAN (735995) ENGINE = InnoDB,
PARTITION P201502 VALUES LESS THAN (736023) ENGINE = InnoDB,
PARTITION p1 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */ |
Selecting the sales (this should return data from all of the partitions), but I only get rows from the first and last:
mysql> select * from sales where prod_id = 232744 and store_id = 300;
+-----------+---------------+---------+----------+------------+
| id | quantity_sold | prod_id | store_id | date |
+-----------+---------------+---------+----------+------------+
| 2309 | 1 | 232744 | 300 | 2012-11-26 |
| 2484 | 10 | 232744 | 300 | 2012-11-27 |
| 2837 | 7 | 232744 | 300 | 2012-11-29 |
| 3001 | 9 | 232744 | 300 | 2012-11-30 |
| 571930074 | 4 | 232744 | 300 | 2014-12-02 |
| 573051350 | 13 | 232744 | 300 | 2014-12-03 |
| 574181358 | 5 | 232744 | 300 | 2014-12-04 |
| 575322316 | 9 | 232744 | 300 | 2014-12-05 |
| 576455102 | 4 | 232744 | 300 | 2014-12-06 |
| 577545446 | 2 | 232744 | 300 | 2014-12-07 |
+-----------+---------------+---------+----------+------------+
The EXPLAIN PARTITIONS output shows that it is scanning all of the partitions:
mysql> explain partitions select * from sales where prod_id = 232744 and store_id =300\G;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: sales
partitions: p0,p201211,p201212,p201301,p201302,p201303,p201304,p201305,p201306,p201307,p201308,p201309,p201310,p201311,p201312,p201401,p201402,p201403,p201404,p201405,p201406,p201407,p201408,p201409,p201410,p201411,p201412,P201501,P201502,p1
type: index_merge
possible_keys: prod_id,store_id
key: prod_id,store_id
key_len: 4,4
ref: NULL
rows: 20
Extra: Using intersect(prod_id,store_id); Using where
1 row in set (0.00 sec)
If I manually select from one of the middle partitions, we can see there is data there that should have appeared in the result above:
mysql> select * from sales PARTITION (p201410) where prod_id = 232744 and store_id = 300;
+-----------+---------------+---------+----------+------------+
| id | quantity_sold | prod_id | store_id | date |
+-----------+---------------+---------+----------+------------+
| 509534154 | 2 | 232744 | 300 | 2014-10-01 |
| 510606312 | 10 | 232744 | 300 | 2014-10-02 |
| 511682398 | 4 | 232744 | 300 | 2014-10-03 |
| 512752933 | 2 | 232744 | 300 | 2014-10-04 |
| 514812731 | 3 | 232744 | 300 | 2014-10-06 |
| 515862308 | 6 | 232744 | 300 | 2014-10-07 |
| 516922728 | 5 | 232744 | 300 | 2014-10-08 |
| 517990349 | 19 | 232744 | 300 | 2014-10-09 |
| 519066761 | 17 | 232744 | 300 | 2014-10-10 |
| 520136175 | 3 | 232744 | 300 | 2014-10-11 |
| 522185901 | 1 | 232744 | 300 | 2014-10-14 |
| 523238559 | 3 | 232744 | 300 | 2014-10-15 |
| 524294166 | 7 | 232744 | 300 | 2014-10-16 |
| 525354982 | 3 | 232744 | 300 | 2014-10-17 |
| 526412605 | 1 | 232744 | 300 | 2014-10-18 |
| 527444329 | 1 | 232744 | 300 | 2014-10-19 |
| 528452608 | 1 | 232744 | 300 | 2014-10-20 |
| 529488414 | 2 | 232744 | 300 | 2014-10-21 |
| 530541002 | 3 | 232744 | 300 | 2014-10-22 |
| 531603714 | 4 | 232744 | 300 | 2014-10-23 |
| 532672667 | 6 | 232744 | 300 | 2014-10-24 |
| 534793524 | 1 | 232744 | 300 | 2014-10-26 |
| 535819138 | 1 | 232744 | 300 | 2014-10-27 |
| 537957232 | 1 | 232744 | 300 | 2014-10-29 |
| 539037254 | 1 | 232744 | 300 | 2014-10-30 |
| 540125545 | 2 | 232744 | 300 | 2014-10-31 |
+-----------+---------------+---------+----------+------------+
26 rows in set (0.03 sec)
If you do a select * from sales where prod_id = 232744; it returns all of the data. It seems to be only when you add the store_id condition that it doesn't return the correct data.
I'm stumped. I've tried:
Restarting MySQL
I'm about to try an OPTIMIZE TABLE (I have to move the databases because of space constraints)
It seems to me there is something wrong with the keys? A corrupt table?
Thanks!
I have a table with about 1 billion rows that looks like this:
CREATE TABLE `ghcnddata` (
`date` date NOT NULL ,
`TMIN` float(6,2) NULL DEFAULT NULL ,
`TMAX` float(6,2) NULL DEFAULT NULL ,
`PRCP` float(6,2) NULL DEFAULT NULL ,
`SNOW` float(6,2) NULL DEFAULT NULL ,
`SNWD` float(6,2) NULL DEFAULT NULL ,
`station` varchar(30),
PRIMARY KEY (`station`, `date`),
INDEX `date` (`date`) USING BTREE ,
INDEX `station` (`station`) USING BTREE
) ENGINE=InnoDB
All of the queries I run have a line that looks like this:
WHERE `station` = "ABSUXNNSDIA3"
and a line that looks like this:
AND `date` BETWEEN "1990-01-01" AND "2010-01-01"
There are about 30,000 unique values for the station field, and no query refers to more than one station. Ideally I would like to simulate having one table per station, each holding roughly 33,333 rows (1 billion / 30,000 ≈ 33,333).
Initially I thought I could accomplish this by setting a HASH index on station, but apparently that is only for MEMORY tables. Then I thought I could PARTITION BY KEY (station) PARTITIONS 33333, but it seems that this is far too many partitions.
What should I do in this scenario? I can't really experiment because the table is so large that any modifications take a very long time.
There is no master/slave or replication or clustering or anything fancy like that.
You don't necessarily need one partition per station. The point of HASH or KEY partitioning is that you define a fixed number of partitions, and multiple values are mapped into each partition.
mysql> alter table ghcnddata partition by key(station) partitions 31;
I chose a prime number for the number of partitions just out of habit, because it helps distribute data over the partitions more evenly if the data follows a pattern (like only odd values).
mysql> insert into ghcnddata (station, date) values ('abc', now());
mysql> insert into ghcnddata (station, date) values ('def', now());
mysql> insert into ghcnddata (station, date) values ('ghi', now());
mysql> insert into ghcnddata (station, date) values ('jkl', now());
mysql> insert into ghcnddata (station, date) values ('mno', now());
mysql> insert into ghcnddata (station, date) values ('qrs', now());
mysql> insert into ghcnddata (station, date) values ('tuv', now());
mysql> insert into ghcnddata (station, date) values ('wxyz', now());
When I run a query with EXPLAIN PARTITIONS it tells me which partition(s) it must read.
mysql> explain partitions select * from ghcnddata where station='tuv';
+----+-------------+-----------+------------+------+-----------------+---------+---------+-------+------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------------+------+-----------------+---------+---------+-------+------+-------------+
| 1 | SIMPLE | ghcnddata | p21 | ref | PRIMARY,station | PRIMARY | 122 | const | 1 | Using where |
+----+-------------+-----------+------------+------+-----------------+---------+---------+-------+------+-------------+
We can see in this case that only partition 21 was read when I reference station 'tuv'.
Note that partitioning is not a panacea. It only helps to reduce the work of the query if you search for a constant value (not a variable, or a join condition, etc.) in the same column that you defined as the partitioning key.
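For example, something like this (a rough sketch against ghcnddata as partitioned above; on MySQL 5.7+ a plain EXPLAIN already includes the partitions column):
mysql> explain partitions select * from ghcnddata where station = 'tuv';        -- prunes to a single partition
mysql> explain partitions select * from ghcnddata where `date` = '2010-01-01';  -- no constant on station, so all 31 partitions are listed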
The rows I just inserted should be roughly evenly distributed, but not perfectly evenly distributed. And there's no guarantee it's one station value per partition.
mysql> select table_name, partition_name, table_rows
from information_schema.partitions where table_name='ghcnddata';
+------------+----------------+------------+
| table_name | partition_name | table_rows |
+------------+----------------+------------+
| ghcnddata | p0 | 1 |
| ghcnddata | p1 | 2 |
| ghcnddata | p2 | 0 |
| ghcnddata | p3 | 0 |
| ghcnddata | p4 | 0 |
| ghcnddata | p5 | 0 |
| ghcnddata | p6 | 0 |
| ghcnddata | p7 | 0 |
| ghcnddata | p8 | 0 |
| ghcnddata | p9 | 0 |
| ghcnddata | p10 | 0 |
| ghcnddata | p11 | 0 |
| ghcnddata | p12 | 0 |
| ghcnddata | p13 | 0 |
| ghcnddata | p14 | 0 |
| ghcnddata | p15 | 0 |
| ghcnddata | p16 | 0 |
| ghcnddata | p17 | 0 |
| ghcnddata | p18 | 0 |
| ghcnddata | p19 | 0 |
| ghcnddata | p20 | 0 |
| ghcnddata | p21 | 2 |
| ghcnddata | p22 | 1 |
| ghcnddata | p23 | 1 |
| ghcnddata | p24 | 1 |
| ghcnddata | p25 | 0 |
| ghcnddata | p26 | 0 |
| ghcnddata | p27 | 0 |
| ghcnddata | p28 | 0 |
| ghcnddata | p29 | 0 |
| ghcnddata | p30 | 0 |
+------------+----------------+------------+
P.S.: Your table's index on station is redundant, because that's the leftmost column of your primary key already.
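If nothing depends on it, it could be dropped with something like this (assuming the index kept the name station from the CREATE TABLE above):
ALTER TABLE ghcnddata DROP INDEX `station`;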