Usage of MySQL indexes

I have a very simple table with five columns,
CREATE TABLE notification_tag (
_id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
notification_id INT NOT NULL,
tag_value CHAR(11) NOT NULL,
recipient CHAR(11) NOT NULL,
brand CHAR(11),
INDEX tag_value (tag_value),
INDEX notification_id_tag_value_recipient_brand (notification_id, tag_value, recipient, brand)
) CHARACTER SET ascii COLLATE ascii_bin;
Explain shows that MySQL is using the key tag_value when I run the below query,
select * from notification_tag
where recipient='user' and tag_value='doc1' and (brand='brand' or brand is null);
+----+-------------+------------------+------------+-----------------------------------------------------+------+-----------+---------+-------+------+----------+-------------+
| id | select_type | table            | partitions | possible_keys                                       | type | key       | key_len | ref   | rows | filtered | Extra       |
+----+-------------+------------------+------------+-----------------------------------------------------+------+-----------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | notification_tag | NULL       | tag_value,notification_id_tag_value_recipient_brand | ref  | tag_value | 11      | const |    1 |      100 | Using where |
+----+-------------+------------------+------------+-----------------------------------------------------+------+-----------+---------+-------+------+----------+-------------+
Is there any reason not to use the notification_id_tag_value_recipient_brand index?

Your query will not use the index on (notification_id, tag_value, recipient, brand) because the query doesn't have any term to search the leftmost column of the index.
Think of a telephone book. It helps if you search people by last name, or by last name and first name. But if you search only by first name, the order of the entries in the book doesn't help.
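To make the composite index usable here, its leftmost columns would have to match the columns the WHERE clause filters on. A sketch (the index name is just illustrative):

```sql
-- Hypothetical index whose leftmost columns match the WHERE clause
-- (tag_value, recipient); MySQL can use both for the lookup and then
-- check the brand condition on the already-narrowed rows.
ALTER TABLE notification_tag
  ADD INDEX tag_value_recipient_brand (tag_value, recipient, brand);
```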
You might also like my presentation How to Design Indexes, Really, or the video.

Related

MySQL Date Range Query Optimization

I have a MySQL table structured like this:
CREATE TABLE `messages` (
`id` int NOT NULL AUTO_INCREMENT,
`author` varchar(250) COLLATE utf8mb4_unicode_ci NOT NULL,
`message` varchar(2000) COLLATE utf8mb4_unicode_ci NOT NULL,
`serverid` varchar(200) COLLATE utf8mb4_unicode_ci NOT NULL,
`date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`guildname` varchar(1000) COLLATE utf8mb4_unicode_ci NOT NULL,
PRIMARY KEY (`id`,`date`)
) ENGINE=InnoDB AUTO_INCREMENT=27769461 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
I need to query this table for various statistics using date ranges for Grafana graphs, however all of those queries are extremely slow, despite the table being indexed using a composite key of id and date.
"id" is auto-incrementing and date is also always increasing.
The queries generated by Grafana look like this:
SELECT
UNIX_TIMESTAMP(date) DIV 120 * 120 AS "time",
count(DISTINCT(serverid)) AS "servercount"
FROM messages
WHERE
date BETWEEN FROM_UNIXTIME(1615930154) AND FROM_UNIXTIME(1616016554)
GROUP BY 1
ORDER BY UNIX_TIMESTAMP(date) DIV 120 * 120
This query takes over 30 seconds to complete with 27 million records in the table.
Explaining the query results in this output:
+----+-------------+----------+------------+------+---------------+------+---------+------+----------+----------+-----------------------------+
| id | select_type | table    | partitions | type | possible_keys | key  | key_len | ref  | rows     | filtered | Extra                       |
+----+-------------+----------+------------+------+---------------+------+---------+------+----------+----------+-----------------------------+
|  1 | SIMPLE      | messages | NULL       | ALL  | PRIMARY       | NULL | NULL    | NULL | 26952821 |    11.11 | Using where; Using filesort |
+----+-------------+----------+------------+------+---------------+------+---------+------+----------+----------+-----------------------------+
This output lists the composite primary key under possible_keys, yet key is NULL and MySQL still scans almost the entire table, which I do not understand. How can I optimize this table for date range queries?
Plan A:
PRIMARY KEY(date, id), -- to cluster by date
INDEX(id) -- needed to keep AUTO_INCREMENT happy
Assuming the table is quite big, having date at the beginning of the PK puts the rows in a given date range all next to each other. This minimizes (somewhat) the I/O.
Plan B:
PRIMARY KEY(id),
INDEX(date, serverid)
Now the secondary index is exactly what is needed for the one query you have provided. It is optimized for searching by date, and it is smaller than the whole table, hence even faster (I/O-wise) than Plan A.
But, if you have a lot of different queries like this, adding a lot more indexes gets impractical.
Plan C: There may be an even better way:
PRIMARY KEY(id),
INDEX(serverid, date)
In theory, it could hop through that secondary index, checking the date range for each serverid. But I am not sure that such an optimization exists.
Plan D: Do you need id for anything other than providing a unique PRIMARY KEY? If not, there may be other options.
The index on (id, date) doesn't help because the first key is id not date.
You can either
(a) drop the current index and index (date, id) instead -- with date in the leading position, the index can be used to filter by date regardless of the columns that follow -- or
(b) just create an additional index only on (date) to support the query.
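Option (b) as a statement, sketched under the assumption that the table otherwise stays as posted (the index name is illustrative):

```sql
-- A dedicated secondary index on date lets the predicate
-- `date BETWEEN ... AND ...` use an index range scan instead of a
-- full table scan.
ALTER TABLE messages ADD INDEX idx_date (`date`);
```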

MySQL Copy one column to another column in the same table for billions of rows takes too long

In MySQL (MariaDB actually) I have the following table:
table1:
id | val1 | val2 | val3 | val4 | val5
----------------------------------------------------
I am trying to copy val3 to val1 with the following statement:
UPDATE table1 SET val1=val3 where id=some_id;
The UPDATE command works but takes WAY too long, it takes 813 seconds for 15 Million rows. I have ~200 Billion rows to update, so it will take FOREVER...I think about 118 days.
Any tricks / suggestions on how to do this faster?
SHOW CREATE TABLE table1;
CREATE TABLE `table1` (
`id` int(10) unsigned NOT NULL,
`val1` smallint(5) unsigned NOT NULL,
`val2` mediumint(7) unsigned NOT NULL,
`val3` smallint(5) unsigned NOT NULL,
`val4` binary(1) NOT NULL DEFAULT '\0',
`val5` float DEFAULT NULL,
PRIMARY KEY (`id`,`val1`,`val2`,`val3`)
) ENGINE=TokuDB DEFAULT CHARSET=latin1 `COMPRESSION`=TOKUDB_LZMA
Updating this column forces the entire row to be rewritten, which takes time and is I/O-expensive. Two options:
1. Use a conditional expression when selecting the data instead of updating it: SELECT IF(id = some_id, val3, val1) ...
2. Split the work among several updates: UPDATE table1 SET val1 = val3 WHERE val1 <> val3 AND id >= x AND id <= x + 1000000. Running the same query with different values of x (1, 1000001, 2000001, ...) makes better use of your server's cores instead of a single one, and as each query finishes you know that part of the job is done. Your bottleneck will be I/O and the number of cores you can use.
2.1. Important point: to rewrite as little as possible, make sure you only update rows where val1 <> val3.
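Option 2 as concrete statements, with hypothetical batch boundaries:

```sql
-- Batch 1: rows with id in [1, 1000000], skipping rows already equal.
UPDATE table1 SET val1 = val3
WHERE id >= 1 AND id <= 1000000 AND val1 <> val3;

-- Batch 2 (can run concurrently on another connection):
UPDATE table1 SET val1 = val3
WHERE id >= 1000001 AND id <= 2000000 AND val1 <> val3;
```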
Try dropping the primary key, and replacing it with a non-unique index on the id field (this is needed to make the WHERE clause efficient). Do all the updates. Then remove the id index and add back the primary key.
ALTER TABLE table1 DROP PRIMARY KEY, ADD KEY (id);
do all the updates
ALTER TABLE table1 DROP KEY id, ADD PRIMARY KEY (id, val1, val2, val3);

Can I create a MySQL index for LIKE searches with both left and right wildcards?

I’m using MySQL 5.5.37. I have a table with a column
`NAME` varchar(100) COLLATE utf8_bin NOT NULL
and I intend to have partial searches on the name column like
select * FROM organnization where name like '%abc%'
Note that I want to search for the string "abc" occurring anywhere, not necessarily at the beginning. Given this, is there any index I can use on the column to optimize query execution?
If you expect only a few matching results, you can still create an index on the name column to speed up queries, with the help of a primary key.
If your table has a primary key, like
org_id int not null auto_increment primary key,
name varchar(100) COLLATE utf8_bin NOT NULL,
desc varchar(200) COLLATE utf8_bin NOT NULL,
size int,
....
you can create an index on (name, org_id)
and do your query like this:
select * from orgnizations o1 join (select org_id from orgnizations where name like '%abcd%' ) o2 using (org_id)
This should be faster than your original query, because the subquery scans only the narrow (name, org_id) index rather than the whole table.
If you only need one or two other columns for the name search, you can include those columns in your name index and run queries like
select org_id, name, size from orgnizations where name like '%abcd%'
which will still be much faster than a full table scan.
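The index described above, as DDL (a sketch; the table and index names follow the answer's example schema):

```sql
-- Covering index: the LIKE '%abcd%' still has to examine every entry,
-- but it scans this narrow index rather than the whole table.
ALTER TABLE orgnizations ADD INDEX name_org_id (name, org_id);

-- Variant with an extra column included, to also cover the second query:
ALTER TABLE orgnizations ADD INDEX name_org_id_size (name, org_id, size);
```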

How to create index in SQL to increase performance

I have around 200,000 rows in a database table. When I execute my search query, it takes around 4-5 seconds to return results on the next page. I want execution to be fast, with results loading in under 2 seconds. My table has around 16 columns.
Following is my query for creation of table
Create table xml(
PID int not null,
Percentdisc int not null,
name varchar(100) not null,
brand varchar(30) not null,
store varchar(30) not null,
price int not null,
category varchar(20) not null,
url1 varchar(300) not null,
emavail varchar(100) not null,
dtime varchar(100) not null,
stock varchar(30) not null,
description varchar(200) not null,
avail varchar(20) not null,
tags varchar(30) not null,
dprice int not null,
url2 varchar(300),
url3 varchar(300),
sid int primary key auto_increment);
Select query which I'm using
select * from feed where (name like '%Baby%' And NAME like '%Bassinet%')
I don't have much knowledge of indexing databases to increase performance. Please guide me on which index to use.
Indexes aren't going to help here. LIKE with a leading wildcard is a non-sargable predicate: http://en.wikipedia.org/wiki/Sargable
A % wildcard at the start of the match string renders any index on the column useless.
The more characters there are before the first wildcard, the more the index range scan can narrow the search.
Anyway, you can add an index to the existing table:
ALTER TABLE feed ADD INDEX (NAME);
Even with an index on the NAME column, this query makes no use of it, because the pattern starts with %:
select * from feed where (name like '%Baby%' And NAME like '%Bassinet%')
This one will use the index, since the leading % is removed:
select * from feed where (name like 'Baby%' And NAME like 'Bassinet%')
There's a good read here.
LIKE does not use full-text indexing. If you want full-text searching, you can use MySQL's full-text search functions; see the MySQL documentation for details.
Here's the syntax for adding INDEX in MySQL:
ALTER TABLE `feed`
ADD INDEX (`Name`);
MySQL Match example:
Word-prefix matches with the * operator (matches Babylonian, Bassinette, etc.):
SELECT * FROM `feed` WHERE MATCH (NAME) AGAINST ("+Baby* +Bassinett*" IN BOOLEAN MODE);
Exact matches:
SELECT * FROM `feed` WHERE MATCH (NAME) AGAINST ("+Baby +Bassinett" IN BOOLEAN MODE);
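Note that MATCH ... AGAINST requires a FULLTEXT index on the column first. A sketch (on MySQL 5.5, FULLTEXT indexes are only supported on MyISAM tables; InnoDB gained support in 5.6):

```sql
-- Create the full-text index the MATCH queries above depend on.
ALTER TABLE feed ADD FULLTEXT INDEX ft_name (NAME);
```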
In your case a plain index is not useful. A LIKE with a leading wildcard cannot use an index; a direct equality search (e.g. columnname = 'Ajay') can, because a B-tree index is ordered from the start of the value, which a leading wildcard defeats.
You can use full-text search (FTS) for this, defining it only on the columns you need to search. FTS is useful and stays fast as your data grows.
For how to enable FTS, see the link (note it describes SQL Server's full-text catalogs; the idea carries over):
http://blog.sqlauthority.com/2008/09/05/sql-server-creating-full-text-catalog-and-index/

mySql Join optimisation query

I am trying to optimise my site and would be grateful for some help.
The site is a mobile phone comparison site and the query is to show the offers for a particular phone. The data is on 2 tables, the cmp_deals table has 931000 entries and the cmp_tariffs table has 2600 entries. The common field is the tariff_id.
###The deals table###
id int(11)
hs_id varchar(30)
tariff_id varchar(30)
hs_price decimal(4,2)
months_free int(2)
months_half int(2)
cash_back decimal(4,2)
free_gift text
retailer_id varchar(20)
###Deals Indexes###
hs_id INDEX (cardinality 430)
months_half INDEX (cardinality 33)
months_free INDEX (cardinality 25)
hs_price INDEX (cardinality 2223)
cash_back INDEX
###The tariff table###
ID int(11)
tariff_id varchar(30)
tariff_name varchar(255)
tariff_desc text
anytime_mins int(5)
offpeak_mins int(5)
texts int(5)
line_rental decimal(4,2)
cost_offset decimal(4,2)
No Indexes
The initial query is
SELECT * FROM `cmp_deals`
LEFT JOIN `cmp_tariffs` USING (tariff_id)
WHERE hs_id = 'iphone432gbwhite'
and then the results can be sorted by various items in the cmp_deals table such as cash_back or free_gift
SELECT * FROM `cmp_deals`
LEFT JOIN `cmp_tariffs` USING (tariff_id)
WHERE hs_id = 'iphone432gbwhite'
ORDER BY hs_price DESC
LIMIT 30
This is the result when I run an "EXPLAIN" command, but I don't really understand the results.
id | select_type | table       | type | possible_keys | key   | key_len | ref   | rows | Extra
1  | SIMPLE      | cmp_deals   | ref  | hs_id         | hs_id | 92      | const | 179  | Using where; Using temporary; Using filesort
1  | SIMPLE      | cmp_tariffs | ALL  | NULL          | NULL  | NULL    | NULL  | 2582 |
I would like to know please if I am doing these queries in the most efficient way as the queries are averaging at 2 seconds plus.
Thanks in advance
Can't say I'm a fan of all those double IDs (numeric and human-readable). If you don't actually need the varchar versions, drop them.
Change the foreign key cmp_deals.tariff_id to reference cmp_tariffs.ID (i.e., make it an INT and, if using InnoDB, actually create the foreign key constraint).
At the very least make cmp_tariffs.ID a primary key and optionally cmp_tariffs.tariff_id a unique index.
Having zero indexes on the tariffs table means it has to do a table scan to complete the join.
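A sketch of the minimal indexing suggested above (index name is illustrative):

```sql
-- Give cmp_tariffs a primary key so the join can do a keyed lookup
-- per deal instead of scanning all ~2600 tariff rows each time.
ALTER TABLE cmp_tariffs
  ADD PRIMARY KEY (ID),
  ADD UNIQUE INDEX tariff_id_idx (tariff_id);
```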