Why is a prefix index slower than a full index in MySQL?

Table (about 21 million rows):
CREATE TABLE `prefix` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`number` int(11) NOT NULL,
`string` varchar(750) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `idx_string_prefix10` (`string`(10)),
KEY `idx_string` (`string`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
Selectivity of the 10-character prefix (distinct prefixes / total rows):
select count(distinct(left(string,10)))/count(*) from prefix;
+-------------------------------------------+
| count(distinct(left(string,10)))/count(*) |
+-------------------------------------------+
| 0.9999 |
+-------------------------------------------+
Results:
select sql_no_cache count(*) from prefix force index(idx_string_prefix10)
where string < "1505d28b";
Timings: 243.96s, 241.88s
select sql_no_cache count(*) from prefix force index(idx_string)
where string < "1505d28b";
Timings: 7.96s, 7.21s, 7.53s
Why is the prefix index so much slower than the full index? (Forgive my broken English.)
explain select sql_no_cache count(*) from prefix force index(idx_string_prefix10)
where string < "1505d28b";
+----+-------------+--------+------------+-------+---------------------+---------------------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+-------+---------------------+---------------------+---------+------+---------+----------+-------------+
| 1 | SIMPLE | prefix | NULL | range | idx_string_prefix10 | idx_string_prefix10 | 42 | NULL | 3489704 | 100.00 | Using where |
+----+-------------+--------+------------+-------+---------------------+---------------------+---------+------+---------+----------+-------------+

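When you use a prefix index, MySQL reads each entry in the range from the index, but the index holds only the first 10 characters of string. To find out whether the complete value satisfies the WHERE condition, it then has to read the row itself; in InnoDB that is a second lookup through the clustered (primary key) index for every entry in the range. That's two reads per row, and a lot more data scanned.
When you use a non-prefix index, MySQL can read the whole string value from the index, so it knows immediately whether the value satisfies the condition or can be skipped. For this COUNT(*) it never has to touch the rows at all, because idx_string covers the query.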
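A toy Python model of the two plans may make the difference concrete. It is a sketch under stated assumptions, not how InnoDB works internally: sorted lists stand in for the two indexes, a dict stands in for the table rows, and Python's string ordering stands in for the utf8mb4 collation. It counts how many row fetches each plan needs to answer count(*) ... where string < bound.

```python
import bisect

PREFIX_LEN = 10  # mirrors idx_string_prefix10 = string(10)


def build_indexes(rows):
    """rows: dict of row_id -> full string value (stands in for the table)."""
    full = sorted((value, rid) for rid, value in rows.items())
    prefix = sorted((value[:PREFIX_LEN], rid) for rid, value in rows.items())
    return full, prefix


def count_with_full_index(full, bound):
    """Count rows with value < bound using the full-value index.

    Each entry holds the complete value, so the range end is exact and the
    answer comes from the index alone: zero row fetches.
    """
    keys = [value for value, _ in full]
    return bisect.bisect_left(keys, bound), 0


def count_with_prefix_index(prefix, rows, bound):
    """Count rows with value < bound using the prefix index.

    Any value < bound has a prefix <= bound[:PREFIX_LEN], so every such entry
    is a candidate whose full value must be fetched from the row and rechecked.
    """
    upper = bound[:PREFIX_LEN]
    count = fetches = 0
    for pfx, rid in prefix:
        if pfx > upper:
            break  # sorted index: no later entry can qualify
        fetches += 1  # read the row to see the whole string
        if rows[rid] < bound:
            count += 1
    return count, fetches


rows = {1: "abcdefghijAAA", 2: "abcdefghijZZZ", 3: "aaaa", 4: "zzzz"}
full, prefix = build_indexes(rows)
print(count_with_full_index(full, "abcdefghijM"))            # (2, 0)
print(count_with_prefix_index(prefix, rows, "abcdefghijM"))  # (2, 3)
```

Both plans return the same count; the prefix plan pays one row fetch per candidate. On the real table those fetches are random reads, one per entry in the range (EXPLAIN estimated about 3.5 million), which is likely where the extra time goes.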

Related

How to use correct indexes with a double inner join query?

I have a query with two INNER JOINs that fetches only a few columns, but it is very slow even though I have indexes on all the required columns.
My query
SELECT
dysfonctionnement,
montant,
listRembArticles,
case when dys.reimputation is not null then dys.reimputation else dys.responsable end as responsable_final
FROM
db.commandes AS com
INNER JOIN db.dysfonctionnements AS dys ON com.id_commande = dys.id_commande
INNER JOIN db.pe AS pe ON com.code_pe = pe.pe_id
WHERE
com.prestataireLAD REGEXP '.*'
AND pe_nom REGEXP 'bordeaux|chambéry-annecy|grenoble|lyon|marseille|metz|montpellier|nancy|nice|nimes|rouen|strasbourg|toulon|toulouse|vitry|vitry bis 1|vitry bis 2|vlg'
AND com.date_livraison BETWEEN '2022-06-11 00:00:00'
AND '2022-07-08 00:00:00';
It takes around 20 seconds to compute and fetch 4123 rows.
The problem
To find out what's wrong and why it is so slow, I ran EXPLAIN; here is the output:
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|----|-------------|-------|------------|--------|----------------------------|-------------|---------|------------------------|--------|----------|-------------|
| 1 | SIMPLE | dys | | ALL | id_commande,id_commande_2 | | | | 878588 | 100.00 | Using where |
| 1 | SIMPLE | com | | eq_ref | id_commande,date_livraison | id_commande | 110 | db.dys.id_commande | 1 | 7.14 | Using where |
| 1 | SIMPLE | pe | | ref | pe_id | pe_id | 5 | db.com.code_pe | 1 | 100.00 | Using where |
I can see that the join on dysfonctionnements is the problem: it reads the whole table (type ALL) instead of using a key, even though one is available...
Table definitions
commandes (relevant columns only)
CREATE TABLE `commandes` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`id_commande` varchar(36) NOT NULL DEFAULT '',
`date_commande` datetime NOT NULL,
`date_livraison` datetime NOT NULL,
`code_pe` int(11) NOT NULL,
`traitement_dysfonctionnement` tinyint(4) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id_commande` (`id_commande`),
KEY `date_livraison` (`date_livraison`),
KEY `traitement_dysfonctionnement` (`traitement_dysfonctionnement`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
dysfonctionnements (again, relevant columns only)
CREATE TABLE `dysfonctionnements` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`id_commande` varchar(36) DEFAULT NULL,
`dysfonctionnement` varchar(150) DEFAULT NULL,
`responsable` varchar(50) DEFAULT NULL,
`reimputation` varchar(50) DEFAULT NULL,
`montant` float DEFAULT NULL,
`listRembArticles` text,
PRIMARY KEY (`id`),
UNIQUE KEY `id_commande` (`id_commande`,`dysfonctionnement`),
KEY `id_commande_2` (`id_commande`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
pe (again, relevant columns only)
CREATE TABLE `pe` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`pe_id` int(11) DEFAULT NULL,
`pe_nom` varchar(30) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `pe_nom` (`pe_nom`),
KEY `pe_id` (`pe_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Investigation
If I remove the db.pe table from the query, along with the condition on pe_nom, the query takes 1.7 seconds to fetch 7k rows, and EXPLAIN shows it using the keys as I expect:
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|----|-------------|-------|------------|-------|----------------------------|----------------|---------|------------------------|--------|----------|-----------------------------------------------|
| 1 | SIMPLE | com | | range | id_commande,date_livraison | date_livraison | 5 | | 389558 | 100.00 | Using index condition; Using where; Using MRR |
| 1 | SIMPLE | dys | | ref | id_commande,id_commande_2 | id_commande_2 | 111 | ooshop.com.id_commande | 1 | 100.00 | |
I'm open to any suggestions. I see no reason for MySQL not to use the key here, when it does use it on a very similar query and it clearly makes that query faster...
I had a similar experience when the MySQL optimizer chose a join order that was far from optimal. At that time I used the MySQL-specific STRAIGHT_JOIN operator, which makes MySQL read the left table before the right one, to override the optimizer's default behaviour. In your case I would try this:
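SELECT
dysfonctionnement,
montant,
listRembArticles,
case when dys.reimputation is not null then dys.reimputation else dys.responsable end as responsable_final
FROM
db.commandes AS com
STRAIGHT_JOIN db.dysfonctionnements AS dys ON com.id_commande = dys.id_commande
INNER JOIN db.pe AS pe ON com.code_pe = pe.pe_id
WHERE
com.prestataireLAD REGEXP '.*'
AND pe_nom REGEXP 'bordeaux|chambéry-annecy|grenoble|lyon|marseille|metz|montpellier|nancy|nice|nimes|rouen|strasbourg|toulon|toulouse|vitry|vitry bis 1|vitry bis 2|vlg'
AND com.date_livraison BETWEEN '2022-06-11 00:00:00'
AND '2022-07-08 00:00:00';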
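Why the join order matters can be sketched with a toy nested-loop join in Python. The rows are hypothetical and the probe counter stands in for one index lookup; the sketch compares driving from commandes, filtered first by the date range (the order STRAIGHT_JOIN forces here), with driving from dysfonctionnements (the plan in the question's EXPLAIN). It ignores the cost of the full scan itself, which only makes the second plan look better than it is.

```python
from collections import defaultdict


def join_com_first(com, dys, in_range):
    """Drive from commandes: apply the date filter first, then probe
    dysfonctionnements by id_commande once per surviving order."""
    dys_by_order = defaultdict(list)  # stands in for the index on dys.id_commande
    for d in dys:
        dys_by_order[d["id_commande"]].append(d)
    probes, out = 0, []
    for c in com:
        if not in_range(c["date_livraison"]):
            continue  # in MySQL a range scan on date_livraison never visits these
        probes += 1
        for d in dys_by_order.get(c["id_commande"], []):
            out.append((c["id_commande"], d["dysfonctionnement"]))
    return sorted(out), probes


def join_dys_first(com, dys, in_range):
    """Drive from dysfonctionnements: read every row, probe commandes by
    id_commande, and only then apply the date filter."""
    com_by_order = {c["id_commande"]: c for c in com}  # stands in for UNIQUE id_commande
    probes, out = 0, []
    for d in dys:
        probes += 1
        c = com_by_order.get(d["id_commande"])
        if c is not None and in_range(c["date_livraison"]):
            out.append((c["id_commande"], d["dysfonctionnement"]))
    return sorted(out), probes


def in_range(day):
    # ISO date strings compare correctly as plain strings
    return "2022-06-11" <= day <= "2022-07-08"


com = [
    {"id_commande": "A", "date_livraison": "2022-06-15"},
    {"id_commande": "B", "date_livraison": "2022-01-01"},
    {"id_commande": "C", "date_livraison": "2022-01-02"},
]
dys = [
    {"id_commande": "A", "dysfonctionnement": "retard"},
    {"id_commande": "B", "dysfonctionnement": "casse"},
    {"id_commande": "C", "dysfonctionnement": "casse"},
    {"id_commande": "C", "dysfonctionnement": "retard"},
]
print(join_com_first(com, dys, in_range))  # ([('A', 'retard')], 1)
print(join_dys_first(com, dys, in_range))  # ([('A', 'retard')], 4)
```

Both orders produce the same rows; the difference is how much work is done on rows the date filter would have discarded anyway.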
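Also, the REGEXP on pe_nom in your WHERE clause could probably be changed to an IN (...) list, which can use the pe_nom index. The two are only equivalent if pe_nom holds exactly those names, though: REGEXP without anchors matches a name anywhere inside the value, while IN requires an exact match.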
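The difference between the two conditions can be checked with a small Python sketch. Python's re.search stands in for MySQL REGEXP (both match anywhere in the string unless the pattern is anchored); the sketch compares case-sensitively, whereas in MySQL case sensitivity depends on the version and the column's collation. The names "lyon sud" and "venice" are hypothetical values, not from the question's data.

```python
import re

# City list copied from the question's WHERE clause.
CITIES = [
    "bordeaux", "chambéry-annecy", "grenoble", "lyon", "marseille", "metz",
    "montpellier", "nancy", "nice", "nimes", "rouen", "strasbourg", "toulon",
    "toulouse", "vitry", "vitry bis 1", "vitry bis 2", "vlg",
]
CITY_PATTERN = re.compile("|".join(CITIES))  # same alternation as the REGEXP
CITY_SET = frozenset(CITIES)


def matches_regexp(pe_nom):
    """Like pe_nom REGEXP '...': unanchored, so a match anywhere counts."""
    return CITY_PATTERN.search(pe_nom) is not None


def matches_in(pe_nom):
    """Like pe_nom IN ('bordeaux', ...): only an exact value counts."""
    return pe_nom in CITY_SET


for name in ["lyon", "vitry bis 1", "lyon sud", "venice", "paris"]:
    print(name, matches_regexp(name), matches_in(name))
```

"venice" matches the REGEXP only because it contains "nice"; if values like that cannot occur in pe, the IN list is a safe replacement.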
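Remove com.prestataireLAD REGEXP '.*'. The optimizer probably won't realize that this has no impact on the result set. (Strictly, it does drop rows where prestataireLAD is NULL, because NULL REGEXP '.*' is NULL rather than true; if the column is NOT NULL, the condition is a no-op.) If you are building the WHERE clause dynamically, leave out anything else you can.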
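If the query is assembled in application code, a minimal Python sketch of leaving out conditions that don't filter might look like this. The function name, the (column, operator, value) filter format, and the convention that None means "no filter" are all assumptions for illustration. Values go through %s placeholders (the style PyMySQL and MySQL Connector/Python accept); column names and operators must come from your own code, never from user input.

```python
def build_where(filters):
    """Build a WHERE clause from (column, operator, value) triples.

    A value of None means "no filter requested", and the condition is left
    out entirely instead of being replaced by a match-everything test such
    as REGEXP '.*'. Column names and operators must come from trusted code.
    """
    clauses, params = [], []
    for column, operator, value in filters:
        if value is None:
            continue  # nothing to filter on: emit no condition at all
        clauses.append(f"{column} {operator} %s")
        params.append(value)
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    return where, params


where, params = build_where([
    ("com.prestataireLAD", "=", None),  # hypothetical: no provider selected
    ("com.date_livraison", ">=", "2022-06-11 00:00:00"),
    ("com.date_livraison", "<=", "2022-07-08 00:00:00"),
])
print(where, params)
```

The resulting string is appended to the SELECT ... FROM ... JOIN part and executed with params passed separately to the driver's execute call.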
id_commande_2 is redundant. Any query that could use it can use the UNIQUE KEY id_commande instead, because id_commande is that index's leftmost column.
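The reason is the leftmost-prefix rule: in a sorted composite index, all entries sharing the same first column are adjacent, so a lookup on id_commande alone is one contiguous range of UNIQUE KEY id_commande (id_commande, dysfonctionnement). A toy Python model, with a sorted list of tuples standing in for the B-tree and hypothetical data:

```python
import bisect


def lookup_leading_column(index, id_commande):
    """index: sorted list of (id_commande, dysfonctionnement, row_id) tuples,
    modelling UNIQUE KEY id_commande (id_commande, dysfonctionnement).

    Entries sharing the first column are adjacent, so one seek plus a short
    scan finds them all, which is all the single-column id_commande_2 offers.
    """
    start = bisect.bisect_left(index, (id_commande,))  # (x,) sorts before (x, ...)
    row_ids = []
    for key, _, row_id in index[start:]:
        if key != id_commande:
            break
        row_ids.append(row_id)
    return row_ids


index = sorted([("C1", "retard", 10), ("C1", "casse", 11), ("C2", "retard", 12)])
print(lookup_leading_column(index, "C1"))  # [11, 10]
```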
These indexes might help:
com: INDEX(date_livraison, id_commande, code_pe)
pe: INDEX(pe_nom, pe_id)

Implementation of a composite clustered index in MySQL

I need to create a composite clustered index like (username, name, id). Is it possible to implement such a thing? I need to boost the performance of queries like where username = ? and name = ? by using a clustered index in InnoDB. But I think it won't work, because id is in 3rd place and won't be used.
It's fine to define a clustered index with multiple columns.
CREATE TABLE mytable (
username VARCHAR(64) NOT NULL,
name VARCHAR(64) NOT NULL,
id BIGINT NOT NULL,
PRIMARY KEY (username, name, id)
);
If you query against the first two columns, it will use the clustered index, so it will avoid the overhead of lookups via secondary indexes.
But if you use EXPLAIN to report the optimizer's plan for the query, you'll see that the access is type: ref which means an index lookup, but not a unique index lookup. That is, it will potentially match multiple rows.
mysql> explain select * from mytable where username = 'user' and name = 'name';
+----+-------------+---------+------------+------+---------------+---------+---------+-------------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+---------+---------+-------------+------+----------+-------+
| 1 | SIMPLE | mytable | NULL | ref | PRIMARY | PRIMARY | 516 | const,const | 1 | 100.00 | NULL |
+----+-------------+---------+------------+------+---------------+---------+---------+-------------+------+----------+-------+
When doing lookups against a PRIMARY KEY, we'd like to see type: eq_ref or type: const which means it is doing a unique lookup, and the query is guaranteed to match either 0 or 1 row.
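The difference between the two plans comes down to how many rows a key can match. A toy Python sketch, with a dict keyed by hypothetical (username, name, id) tuples standing in for the clustered index (it scans the dict for simplicity, where a B-tree would seek):

```python
# Hypothetical rows, keyed by the full PRIMARY KEY (username, name, id).
clustered = {
    ("user", "name", 1): {"x": 10},
    ("user", "name", 2): {"x": 20},
    ("user", "other", 3): {"x": 30},
}


def lookup_two_columns(username, name):
    """WHERE username = ? AND name = ?: only a key prefix, so several rows
    may match (EXPLAIN type: ref)."""
    return sorted(k for k in clustered if k[:2] == (username, name))


def lookup_full_key(username, name, id_):
    """WHERE username = ? AND name = ? AND id = ?: the whole primary key,
    so at most one row can match (EXPLAIN type: const)."""
    key = (username, name, id_)
    return [key] if key in clustered else []


print(lookup_two_columns("user", "name"))  # [('user', 'name', 1), ('user', 'name', 2)]
print(lookup_full_key("user", "name", 1))  # [('user', 'name', 1)]
```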
mysql> explain select * from mytable where username = 'user' and name = 'name' and id = 1;
+----+-------------+---------+------------+-------+---------------+---------+---------+-------------------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+-------+---------------+---------+---------+-------------------+------+----------+-------+
| 1 | SIMPLE | mytable | NULL | const | PRIMARY | PRIMARY | 524 | const,const,const | 1 | 100.00 | NULL |
+----+-------------+---------+------------+-------+---------------+---------+---------+-------------------+------+----------+-------+
Both queries are using the clustered index.
Re your comment:
InnoDB requires the auto-increment column to be the first column of some index in the table. It doesn't have to be the primary key. So you can do this, for example:
CREATE TABLE `mytable` (
`username` varchar(64) NOT NULL,
`name` varchar(64) NOT NULL,
`id` bigint NOT NULL AUTO_INCREMENT,
`x` int DEFAULT NULL,
PRIMARY KEY (`username`,`name`,`id`),
KEY (`id`)
) ENGINE=InnoDB;
Notice I added an extra KEY (id) to satisfy InnoDB's requirement. But in the primary key, id is still at the end.

Slow query when searching nearby coordinates

My query for finding nearby coordinates is slow (for now it only searches by latitude). This is the MySQL query:
select ABS(propertyCoordinatesLat - 3.33234) as diff from tablename order by diff asc limit 0,20
Is there a way to improve this besides doing the sorting in server-side code?
Table dump:
CREATE TABLE `property` (
`propertyID` bigint(20) NOT NULL,
`propertyName` varchar(100) NOT NULL,
`propertyCoordinatesLat` varchar(100) NOT NULL,
`propertyCoordinatesLng` varchar(100) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
--
-- Indexes for dumped tables
--
--
-- Indexes for table `property`
--
ALTER TABLE `property`
ADD PRIMARY KEY (`propertyID`),
ADD KEY `propertyCoordinatesLat` (`propertyCoordinatesLat`,`propertyCoordinatesLng`),
ADD KEY `propertyCoordinatesLat_2` (`propertyCoordinatesLat`),
ADD KEY `propertyCoordinatesLng` (`propertyCoordinatesLng`);
--
-- AUTO_INCREMENT for dumped tables
--
--
-- AUTO_INCREMENT for table `property`
--
ALTER TABLE `property`
MODIFY `propertyID` bigint(20) NOT NULL AUTO_INCREMENT;
COMMIT;
The query is ordering by the difference between a string and a float. This odd calculation confuses and angers MySQL: it has to convert every string to a number, compute the difference, and sort all the results, which shows up as a slow filesort.
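The underlying problem with VARCHAR coordinates can be seen in a few lines of Python: strings sort character by character, not by value, so an index on them is in the wrong order for any numeric comparison. Python's ordering only stands in for a MySQL collation; the exact order may differ, but it is still not numeric. The sample values are hypothetical.

```python
lats = ["10.5", "9.25", "-3.1", "3.33234", "-12.0"]  # hypothetical VARCHAR values

as_strings = sorted(lats)             # character by character, like a VARCHAR index
as_numbers = sorted(lats, key=float)  # by value, like a NUMERIC/DECIMAL index

print(as_strings)  # ['-12.0', '-3.1', '10.5', '3.33234', '9.25']
print(as_numbers)  # ['-12.0', '-3.1', '3.33234', '9.25', '10.5']
```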
mysql> explain select ABS(propertyCoordinatesLat - 3.33234) as diff from property order by diff
+----+-------------+----------+------------+-------+---------------+--------------------------+---------+------+------+----------+-----------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+-------+---------------+--------------------------+---------+------+------+----------+-----------------------------+
| 1 | SIMPLE | property | NULL | index | NULL | propertyCoordinatesLat_2 | 302 | NULL | 1 | 100.00 | Using index; Using filesort |
+----+-------------+----------+------------+-------+---------------+--------------------------+---------+------+------+----------+-----------------------------+
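Changing propertyCoordinatesLat and propertyCoordinatesLng to a more sensible numeric type lets MySQL optimize better: the index becomes much smaller (key_len 5 instead of 302) and is ordered by value. In the EXPLAIN below there is no more filesort, but note that it orders by propertyCoordinatesLat itself rather than by diff; ordering by an expression like ABS(propertyCoordinatesLat - 3.33234) still needs a filesort, so to get the full benefit you also need a range condition on the column that limits how many rows are sorted. This should perform much better.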
alter table property change propertyCoordinatesLat propertyCoordinatesLat numeric(10,8) not null;
alter table property change propertyCoordinatesLng propertyCoordinatesLng numeric(11,8) not null;
mysql> explain select ABS(propertyCoordinatesLat - 3.33234) as diff from property order by propertyCoordinatesLat asc limit 0,20;
+----+-------------+----------+------------+-------+---------------+--------------------------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+-------+---------------+--------------------------+---------+------+------+----------+-------------+
| 1 | SIMPLE | property | NULL | index | NULL | propertyCoordinatesLat_2 | 5 | NULL | 1 | 100.00 | Using index |
+----+-------------+----------+------------+-------+---------------+--------------------------+---------+------+------+----------+-------------+
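Even with a numeric column, ORDER BY ABS(propertyCoordinatesLat - 3.33234) sorts a computed expression, so the order cannot be read straight off the index. One common workaround (not part of the answer above) is to read only an index range around the target, e.g. WHERE propertyCoordinatesLat BETWEEN 3.33234 - d AND 3.33234 + d, and sort just those rows, widening d if fewer than 20 come back. A Python sketch of that idea, with a sorted list standing in for the index and an arbitrary starting window:

```python
import bisect


def nearest_by_lat(sorted_lats, target, k, window=0.5):
    """Return the k latitudes closest to target.

    sorted_lats stands in for an index on a numeric propertyCoordinatesLat.
    Only the range [target - window, target + window] is read; if it holds
    fewer than k values, the window is doubled and the range read again.
    Every value outside the window is farther away than every value inside
    it, so once the window holds k values the k nearest are all in it.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    k = min(k, len(sorted_lats))
    while True:
        lo = bisect.bisect_left(sorted_lats, target - window)
        hi = bisect.bisect_right(sorted_lats, target + window)
        if hi - lo >= k:
            break
        window *= 2  # too few candidates: widen the range and retry
    candidates = sorted_lats[lo:hi]
    return sorted(candidates, key=lambda lat: abs(lat - target))[:k]


lats = sorted([3.1, 3.4, 3.35, 10.0, -5.0, 3.9])
print(nearest_by_lat(lats, 3.33234, 3))  # [3.35, 3.4, 3.1]
```

Only the candidates inside the window are sorted, which in MySQL terms keeps the filesort small.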
If you want to get fancy, look into MySQL's spatial types. These will probably perform better, and definitely be more accurate.
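On accuracy: ranking by latitude alone ignores longitude entirely, and a degree of longitude shrinks with latitude. A Python sketch of the haversine great-circle distance, assuming a spherical Earth of mean radius 6371 km, shows the effect; on MySQL 5.7 and later, ST_Distance_Sphere() computes a spherical distance between POINT values on the server.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; a spherical Earth is itself an approximation


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Same latitude, so ABS(lat - 3.33234) ranks both points as "distance 0",
# yet one degree of longitude apart near the equator is roughly 111 km.
print(round(haversine_km(3.33234, 100.0, 3.33234, 101.0), 1))
```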

Impossible to avoid 'using filesort' in a very simple query with order

I'm trying to write a SELECT query, but MySQL always shows "Using filesort" in the Extra column when I run EXPLAIN.
I tried the simplest possible query, but the problem doesn't go away.
The structure of my table 'Partidas' is:
CREATE TABLE IF NOT EXISTS `Partidas` (
`IdUsuario` int(11) NOT NULL,
`IdPartida` int(11) NOT NULL,
`TipoPartida` tinyint(4) NOT NULL,
`Facil` tinyint(1) NOT NULL DEFAULT '0',
`Normal` tinyint(1) NOT NULL DEFAULT '0',
`Dificil` tinyint(1) NOT NULL DEFAULT '0',
`FchPartida` date NOT NULL,
`PuntosPartida` mediumint(9) NOT NULL,
`IdPartidaTemp` bigint(20) NOT NULL,
`ComplPers` tinyint(1) NOT NULL,
`SoloMulti` tinyint(2) NOT NULL,
PRIMARY KEY (`IdUsuario`,`IdPartida`),
KEY `IX_PARTIDAS_RECORDS` (`TipoPartida`,`FchPartida`,`PuntosPartida`),
KEY `IX_PARTIDAS_ORDEN2` (`FchPartida`),
KEY `IX_PARTIDAS_COMPLPERS` (`ComplPers`,`FchPartida`,`PuntosPartida`),
KEY `IX_PARTIDAS_SOLOMULTI` (`SoloMulti`,`FchPartida`,`PuntosPartida`),
KEY `IX_PARTIDAS_DIFICULTAD` (`Facil`,`Normal`,`Dificil`,`SoloMulti`,`FchPartida`,`PuntosPartida`),
KEY `IX_PARTIDAS_COMPMULTI` (`ComplPers`,`SoloMulti`,`FchPartida`,`PuntosPartida`),
KEY `IX_PARTIDAS_COMPLPERS_SIMPLE` (`ComplPers`,`PuntosPartida`),
KEY `IX_PARTIDAS_SOLOMULTI_SIMPLE` (`SoloMulti`,`PuntosPartida`),
KEY `IX_PARTIDAS_FECHA` (`FchPartida`),
KEY `IX_PARTIDAS_PUNTOS` (`PuntosPartida`),
KEY `PRUEBA_PARTIDAS` (`PuntosPartida`,`TipoPartida`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
The table has only about 1000-5000 rows (very little data), but MySQL always uses a filesort. The query I'm testing with is:
explain select *
from Partidas
order by PuntosPartida
limit 0, 50;
and the result is:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | Partidas | ALL | NULL | NULL | NULL |NULL | 1041 | Using filesort |
But if I change the limit in the query, for example to limit 0, 5, the result changes too:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | Partidas | index | NULL | IX_PARTIDAS_PUNTOS | 3 |NULL | 5 | |
In my MySQL configuration, the buffer and sort variables are:
- myisam_sort_buffer_size: 2MB
- sort_buffer_size: 2MB
- key_buffer_size: 1GB
I tried changing these values (increasing them up to 8MB), but the result is the same.
Thank you for your help.
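My guess is that this is the query optimizer doing its job. This article here shows that "the optimizer preferred a full table scan, and it did not even consider scanning the index as a relevant choice (possible_keys: NULL)". Presumably, with LIMIT 50 the optimizer estimates that reading the small table sequentially and sorting it is cheaper than walking the index and fetching 50 rows one by one; with LIMIT 5 the estimate tips the other way.
You can force it to use an index, but the execution may be slower (as the article also mentions).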
select *
from Partidas FORCE INDEX(IX_PARTIDAS_PUNTOS)
order by PuntosPartida
limit 0, 50;
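What LIMIT changes can be modelled roughly in Python. Both plans below return the same rows: one walks an already-sorted index and stops after n entries but has to fetch each row individually, the other reads every row and sorts them. The data is hypothetical, and the real choice depends on the optimizer's cost estimates, which this sketch does not reproduce; it only counts rows read.

```python
import heapq


def top_n_via_index(index, n):
    """index: (PuntosPartida, row_id) pairs already in order, like IX_PARTIDAS_PUNTOS.
    Stops after n entries, but each needs its own row fetch (SELECT *)."""
    entries = index[:n]
    return [row_id for _, row_id in entries], len(entries)


def top_n_via_scan(table, n):
    """table: row_id -> PuntosPartida. Reads every row, then sorts (a "filesort")."""
    best = heapq.nsmallest(n, table.items(), key=lambda item: (item[1], item[0]))
    return [row_id for row_id, _ in best], len(table)


table = {1: 30, 2: 10, 3: 20, 4: 40}  # hypothetical rows
index = sorted((points, row_id) for row_id, points in table.items())
print(top_n_via_index(index, 2))  # ([2, 3], 2)
print(top_n_via_scan(table, 2))   # ([2, 3], 4)
```

The index plan reads fewer rows but each read is a random lookup; the scan reads everything sequentially. As n grows toward the table size, the index plan loses its advantage.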
You can also read more here on how to avoid full table scans and "Using filesort".

How to make MySQL intersect keys of INTEGER field and FLOAT field?

I have the following MySQL table (table size - around 10K records):
CREATE TABLE `tmp_index_test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`m_id` int(11) DEFAULT NULL,
`r_id` int(11) DEFAULT NULL,
`price` float DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `m_key` (`m_id`),
KEY `r_key` (`r_id`),
KEY `price_key` (`price`)
) ENGINE=InnoDB AUTO_INCREMENT=16390 DEFAULT CHARSET=utf8;
As you can see, I have two INTEGER fields (r_id and m_id) and one FLOAT field (price).
For each of these fields I have an index.
Now, when I run a query with condition on the first integer AND on the second one, everything is fine:
mysql> explain select * from tmp_index_test where m_id=1 and r_id=2;
+----+-------------+----------------+-------------+---------------+-------------+---------+------+------+-------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------+-------------+---------------+-------------+---------+------+------+-------------------------------------------+
| 1 | SIMPLE | tmp_index_test | index_merge | m_key,r_key | r_key,m_key | 5,5 | NULL | 1 | Using intersect(r_key,m_key); Using where |
+----+-------------+----------------+-------------+---------------+-------------+---------+------+------+-------------------------------------------+
It seems MySQL handles this well, since the Extra field shows Using intersect(r_key,m_key).
I'm not a MySQL expert, but as I understand it, MySQL first intersects the indexes, and only then fetches the rows in the intersection from the table itself.
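That understanding is right in outline. A Python sketch of the merge step: for an equality lookup, a non-unique InnoDB secondary index returns its matching entries in primary-key order, so the two lists of row ids can be intersected in one pass before any row is read. The row ids are hypothetical.

```python
def intersect_sorted(a, b):
    """Merge-intersect two ascending lists of row ids in a single pass."""
    i = j = 0
    common = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return common


m_key_ids = [1, 4, 7, 9]   # hypothetical row ids where m_id = 1
r_key_ids = [2, 4, 9, 10]  # hypothetical row ids where r_id = 2
print(intersect_sorted(m_key_ids, r_key_ids))  # [4, 9]
```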
HOWEVER, when I run a very similar query with the condition on an integer and a float instead of on two integers, MySQL refuses to intersect the results on the indexes:
mysql> explain select * from tmp_index_test where m_id=3 and price=100;
+----+-------------+----------------+------+-----------------+-----------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------+------+-----------------+-----------+---------+-------+------+-------------+
| 1 | SIMPLE | tmp_index_test | ref | m_key,price_key | price_key | 5 | const | 1 | Using where |
+----+-------------+----------------+------+-----------------+-----------+---------+-------+------+-------------+
As you can see, MySQL decides to use the index of price only.
My first question is why, and how to fix it?
In addition, I need to run queries with a greater-than sign (>) instead of the equal sign (=) on price. Currently EXPLAIN shows that for such queries, MySQL uses only the integer key.
mysql> explain select * from tmp_index_test where m_id=3 and price > 100;
+----+-------------+----------------+------+-----------------+-------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------+------+-----------------+-------+---------+-------+------+-------------+
| 1 | SIMPLE | tmp_index_test | ref | m_key,price_key | m_key | 5 | const | 2 | Using where |
+----+-------------+----------------+------+-----------------+-------+---------+-------+------+-------------+
I need to somehow make MySQL do the intersection on the indexes first. Does anybody have an idea how?
Thanks a lot in advance!
From the MySQL manual:
ref is used if the join uses only a leftmost prefix of the key or if
the key is not a PRIMARY KEY or UNIQUE index (in other words, if the
join cannot select a single row based on the key value). If the key
that is used matches only a few rows, this is a good join type.
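price_key is neither unique nor a primary key, so the access type is ref. I don't believe you can force an intersect here. According to the manual's Index Merge section, the intersection algorithm applies only when each condition is an equality covering all parts of its index (or a range on an InnoDB primary key), so price > 100 cannot take part in it, and for price = 100 the optimizer simply estimated that one index was cheaper.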
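A different approach from index merge, and usually the simpler fix, is a single composite index with the equality column first, for example ALTER TABLE tmp_index_test ADD INDEX m_price (m_id, price) (the index name is hypothetical). Then m_id = 3 AND price > 100 becomes one contiguous range of that index, with no intersection needed. A Python sketch with a sorted list of tuples standing in for the index and hypothetical rows:

```python
import bisect

INF = float("inf")


def m_id_and_price_above(index, m_id, price):
    """index: sorted (m_id, price, row_id) tuples, modelling a hypothetical
    KEY (m_id, price). Rows with m_id = X AND price > Y are one contiguous
    run of the index, bounded by two binary searches."""
    start = bisect.bisect_right(index, (m_id, price, INF))  # first entry with m_id = X, price > Y
    end = bisect.bisect_left(index, (m_id, INF))            # first entry past m_id = X
    return [row_id for _, _, row_id in index[start:end]]


index = sorted([(3, 50.0, 1), (3, 150.0, 2), (3, 100.0, 3),
                (4, 200.0, 4), (2, 500.0, 5), (3, 300.0, 6)])
print(m_id_and_price_above(index, 3, 100.0))  # [2, 6]
```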