Simple SQL query runs forever - MySQL

I am using MySQL Workbench and MySQL Server on an Ubuntu 18 machine with 16 GB RAM.
I have a schema named ips and two tables, say table1 and table2.
In table1 and table2 there are two fields, ip and description, both of type string. I have a lot of records: table1 has 779938 records and table2 has 136657 records.
I need to write a join query to find the number of ips in table2 that have a description starting with str1% and containing neither str2 nor str3, while at the same time those ips have a description in table1 that does not start with str1% and contains either str2 or str3.
This is my query:
SELECT COUNT(`table2`.`ip`)
FROM `ips`.`table2`, `ips`.`table1`
WHERE `table2`.`ip` = `table1`.`ip`
AND (LOWER(`table1`.`description`) NOT LIKE 'str1%'
AND (LOWER(`table1`.`description`) LIKE '%-str2-%'
OR LOWER(`table1`.`description`) LIKE '%-str3-%'
)
)
AND (LOWER(`table2`.`description`) LIKE 'str1%'
AND LOWER(`table2`.`description`) NOT LIKE '%-str2-%'
AND LOWER(`table2`.`description`) NOT LIKE '%-str3-%'
);
However, the query never ends: the duration shows ? and I never get a result. Can you please help?
EDIT:
Here are the SHOW CREATE TABLE statements and the EXPLAIN output:
1) SHOW CREATE TABLE ips.table2;
CREATE TABLE `table2` (
`ip` varchar(500) DEFAULT NULL,
`description` varchar(500) DEFAULT NULL,
`type` varchar(500) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
2) SHOW CREATE TABLE ips.table1;
CREATE TABLE `table1` (
`ip` varchar(500) DEFAULT NULL,
`description` varchar(500) DEFAULT NULL,
`type` varchar(500) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
3) EXPLAIN <query>
# id, select_type, table, partitions, type, possible_keys, key, key_len, ref, rows, filtered, Extra
1, SIMPLE, table2, , ALL, , , , , 136109, 100.00, Using where
1, SIMPLE, table1, , ALL, , , , , 786072, 10.00, Using where; Using join buffer (Block Nested Loop)
EDIT 2:
The data in the ip field are strings in this format: str.str.str.str
The description field is in this format: str1-str2-str3-str4

The previous answer regarding indexing might optimise the query, and it may well be correct, but I have to accept the answer I actually used to solve the problem. Thanks to @Raymond Nijland for being the first to point out the indexing issue, which reminded me of the primary keys.
The source of the problem is that neither table in the query had a primary key. The primary key must be a column that is unique and not null; in my case the ip field was already suitable to serve as the primary key. Since I use MySQL Workbench, I right-clicked each table, clicked Alter Table, and then checked the primary key box for the appropriate field.
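The equivalent in plain SQL would be something like this (a sketch, assuming every ip value really is unique and non-NULL):
ALTER TABLE `ips`.`table1` MODIFY `ip` VARCHAR(500) NOT NULL, ADD PRIMARY KEY (`ip`);
ALTER TABLE `ips`.`table2` MODIFY `ip` VARCHAR(500) NOT NULL, ADD PRIMARY KEY (`ip`);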
That solved my problem.

You are getting the ALL access type in the execution plan because the optimizer is not using any index; it is performing a full table scan on both tables.
A full table scan can be optimal when you are selecting more than roughly 5% of the rows. In your case that could be fine if your string prefix "str1" were a single letter; if it has more than one character, using an index could greatly improve performance.
Now, the comparison you are performing is not a simple one: you are not comparing the value of a column but the result of an expression, LOWER(table1.description). Therefore you need to create virtual columns and index them if you want this query to be fast. This is available in MySQL 5.7 and newer:
alter table table1 add lower_desc varchar(500)
generated always as (LOWER(description)) virtual;
create index ix1 on table1 (lower_desc);
alter table table2 add lower_desc varchar(500)
generated always as (LOWER(description)) virtual;
create index ix2 on table2 (lower_desc);
These indexes will make your queries faster when the prefix has two or more characters. Get the execution plan again: the ALL access types should no longer be there (index access should show up in their place).
Incidentally, I think you missed a join in the query. I think it should look like this (I added the third line):
SELECT COUNT(`table2`.`ip`)
FROM `ips`.`table2`
JOIN `ips`.`table1` on `ips`.`table1`.ip = `ips`.`table2`.ip
WHERE `table2`.`ip` = `table1`.`ip`
AND (LOWER(`table1`.`description`) NOT LIKE 'str1%'
AND (LOWER(`table1`.`description`) LIKE '%-str2-%'
OR LOWER(`table1`.`description`) LIKE '%-str3-%'
)
)
AND (LOWER(`table2`.`description`) LIKE 'str1%'
AND LOWER(`table2`.`description`) NOT LIKE '%-str2-%'
AND LOWER(`table2`.`description`) NOT LIKE '%-str3-%'
);
Also, to optimize the join performance you'll need one (or both) of the indexes shown below:
create index ix3 on table1 (ip);
create index ix4 on table2 (ip);
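For reference, once the generated columns above are in place, the query can reference them directly so the indexes apply (a sketch based only on the columns and indexes created above):
SELECT COUNT(`table2`.`ip`)
FROM `ips`.`table2`
JOIN `ips`.`table1` ON `table1`.`ip` = `table2`.`ip`
WHERE `table1`.`lower_desc` NOT LIKE 'str1%'
AND (`table1`.`lower_desc` LIKE '%-str2-%' OR `table1`.`lower_desc` LIKE '%-str3-%')
AND `table2`.`lower_desc` LIKE 'str1%'
AND `table2`.`lower_desc` NOT LIKE '%-str2-%'
AND `table2`.`lower_desc` NOT LIKE '%-str3-%';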

Related

Optimize mysql subquery containing self-join to reduce CPU Usage

I understand that a MySQL query involving a self-joined table might lead to a slow query and/or a CPU spike, but I have been struggling to come up with ways to improve it.
CREATE TABLE `tool` (
`tool_id` char(32) NOT NULL,
`provider` varchar(36) NOT NULL,
PRIMARY KEY (`tool_id`)
)
CREATE TABLE `edata` (
`e_data_id` char(32) NOT NULL,
`tool_id` char(32) DEFAULT NULL,
`ref_e_data_id` char(32) DEFAULT NULL,
PRIMARY KEY (`e_data_id`),
KEY `e_ref_e_data__06a0c1a7_fk` (`ref_e_data_id`),
KEY `edata_tool_id_61d6bb9b` (`tool_id`),
CONSTRAINT `e_tool_id_61d6bb9b` FOREIGN KEY (`tool_id`) REFERENCES `tool` (`tool_id`)
)
Here is the query in question:
mutdata
LEFT JOIN (SELECT e1.edata_id as m_id, a1.provider as m_cp from edata e1 INNER JOIN tool a1 on e1.tool_id=a1.tool_id WHERE a1.deleted=0) as mapping
on mutdata.ref_e_data_id=mapping.m_id or mutdata.e_data_id=map.m_id
In short, the subquery is first constructed as a lookup table, like a dictionary or map, and then mutdata tries to use that lookup table to determine the corresponding provider (this query is part of an even larger query). Is there a way to optimize this part?
These indexes may help:
mutdata: INDEX(ref_e_data_id, e_data_id)
map: INDEX(m_id)
e1: INDEX(tool_id, edata_id)
a1: INDEX(deleted, tool_id, provider)
Try not to use the construct JOIN ( SELECT ... ); instead try to bump that up a level.
Do you really need LEFT in either place?
OR is terrible for performance. Sometimes it is practical to use two SELECTs connected by UNION DISTINCT as a workaround (see the sketch after these notes); that way, each SELECT may be able to take advantage of a different index.
Where is map in the query?
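To illustrate the UNION DISTINCT workaround mentioned above, the OR'ed join condition could be split into two inner-join SELECTs (a rough, untested sketch using the tables and the deleted column from the question; note that the original LEFT JOIN also keeps mutdata rows with no match, which this form drops):
SELECT m.*, a1.provider AS m_cp
FROM mutdata m
JOIN edata e1 ON m.ref_e_data_id = e1.e_data_id
JOIN tool a1 ON e1.tool_id = a1.tool_id AND a1.deleted = 0
UNION DISTINCT
SELECT m.*, a1.provider AS m_cp
FROM mutdata m
JOIN edata e1 ON m.e_data_id = e1.e_data_id
JOIN tool a1 ON e1.tool_id = a1.tool_id AND a1.deleted = 0;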

MySQL: how to speedup a query fetching a large quantity of data and using LIKE

I have a table of 3,666,058 records and 6 columns, defined as follows:
CREATE TABLE IF NOT EXISTS `annoyance` (
`a` varchar(50) NOT NULL default '',
`b` varchar(50) NOT NULL default '',
`c` longtext,
`d` varchar(21) NOT NULL,
`e` float default NULL,
`f` smallint(6) NOT NULL default '0',
KEY `ab` (`a`,`b`),
KEY `b` (`b`),
KEY `d` (`d`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
I am trying to fetch the content of columns a, b, and d when a starts with a certain prefix (3 letters long), let it be aaa. So I am running the following query: SELECT a,b,c from annoyance where a like 'aaa%';. This should fetch ~1,835,000 records from the table.
My problem is that this query is very slow (when not cached, of course) and sometimes takes up to a few minutes.
So, how can I make things faster for this particular query? Something I tried, without any success, was to create an index on a (with a prefix size of 3, or without specifying a size): MySQL would not even bother using the index unless I forced it with FORCE INDEX (index hints), and it did not seem to speed up the query execution.
Fetching 1.8 million rows out of 3.6 million basically requires scanning the entire table. There is not much that you can do to improve performance.
Indexes will not help. If you were fetching, say, 1000 rows from the table, then indexes could help, and an index on a would be used for the LIKE. You could also phrase this as:
where a >= 'aaa' and a < 'aab'
if you wanted to make it even easier for the optimizer to choose the index.
Have you tried LEFT()?
According to this test, it is faster than LIKE:
http://cc.davelozinski.com/sql/like-vs-substring-vs-leftright-vs-charindex
SELECT a,b,c from annoyance where LEFT(a,3) = 'aaa'
INDEX(a, b, d) (or (a, d, b)) would run faster because it would be a "covering" index.
(Change d to c if c is really what you are fetching.)
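For example (a sketch; the index name is arbitrary, and note that if c is really what you fetch, a longtext column can only be indexed with a prefix length, which rules out a fully covering index):
ALTER TABLE annoyance ADD INDEX ix_a_b_d (a, b, d);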

MySQL - multiple column index

I'm learning MySQL index and found that index should be applied to any column named in the WHERE clause of a SELECT query.
Then I found Multiple Column Index vs Multiple Indexes.
First Q: I was wondering, what is a multiple column index? I found the code below from Joomla; is this a multiple column index?
CREATE TABLE `extensions` (
`extension_id` INT(11) NOT NULL AUTO_INCREMENT,
`name` VARCHAR(100) NOT NULL,
`type` VARCHAR(20) NOT NULL,
`element` VARCHAR(100) NOT NULL,
`folder` VARCHAR(100) NOT NULL,
`client_id` TINYINT(3) NOT NULL,
... ...
PRIMARY KEY (`extension_id`),
-- is the code below a multiple column index?
INDEX `element_clientid` (`element`, `client_id`),
INDEX `element_folder_clientid` (`element`, `folder`, `client_id`),
INDEX `extension` (`type`, `element`, `folder`, `client_id`)
)
Second Q: am I correct in thinking that one multiple column index is used per SELECT?
SELECT column_x FROM extensions WHERE element=y AND client_id=y; -- index: element_clientid
SELECT ex.col_a, tb.col_b
FROM extensions ex
LEFT JOIN table2 tb
ON (ex.ext_id = tb.ext_id)
WHERE ex.element=x AND ex.folder=y AND ex.client_id=z; -- index: element_folder_clientid
General rule of thumb for indexes is to slap one onto any field used in a WHERE or JOIN clause.
That being said, there are some optimizations you can do. If you KNOW that a certain combination of fields is the only one that will ever be used in a WHERE on a particular table, then you can create a single multi-field key on just those fields, e.g.
INDEX (field1, field2, field5)
v.s.
INDEX (field1),
INDEX (field2),
INDEX (field5)
A multi-field index can be more efficient in many cases vs. having to scan multiple indexes. The downside is that the multi-field index is only usable if the fields in question are actually used in the WHERE clause.
With your sample queries, since element and client_id are in all three indexes, you might be better off splitting them off into their own dedicated indexes. If these are changeable fields, then it's better to keep them in their own dedicated index: e.g. if you ever have to change client_id in bulk, the DB has to update 3 different indexes vs. updating just one dedicated one.
But it all comes down to benchmarking: test your particular setup with various index setups and see which performs best. Rules of thumb are handy, but they don't work 100% of the time.
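A quick way to run that comparison is to check EXPLAIN for your sample queries and see which index the optimizer actually picks (a sketch using the table and columns from the CREATE TABLE above; the values are placeholders):
EXPLAIN SELECT extension_id FROM extensions WHERE element = 'x' AND client_id = 1;
EXPLAIN SELECT extension_id FROM extensions WHERE element = 'x' AND folder = 'y' AND client_id = 1;
-- the key column of the output names the index chosen (e.g. element_clientid or element_folder_clientid)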

Select rows where column LIKE dictionary word

I have 2 tables:
Dictionary - Contains roughly 36,000 words
CREATE TABLE IF NOT EXISTS `dictionary` (
`word` varchar(255) NOT NULL,
PRIMARY KEY (`word`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Datas - Contains roughly 100,000 rows
CREATE TABLE IF NOT EXISTS `datas` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`hash` varchar(32) NOT NULL,
`data` varchar(255) NOT NULL,
`length` int(11) NOT NULL,
`time` int(11) NOT NULL,
PRIMARY KEY (`ID`),
UNIQUE KEY `hash` (`hash`),
KEY `data` (`data`),
KEY `length` (`length`),
KEY `time` (`time`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=105316 ;
I would like to somehow select all the rows from datas where the column data contains 1 or more of the dictionary words.
I understand this is a big ask: it would need to match all of these rows together in every combination possible, so it needs the best optimization possible.
I have tried the below query, but it just hangs for ages:
SELECT `datas`.*, `dictionary`.`word`
FROM `datas`, `dictionary`
WHERE `datas`.`data` LIKE CONCAT('%', `dictionary`.`word`, '%')
AND LENGTH(`dictionary`.`word`) > 3
ORDER BY `length` ASC
LIMIT 15
I have also tried something similar to the above with a LEFT JOIN and an ON clause that specified the LIKE condition.
This is actually not an easy problem. What you are trying to perform is called full-text search, and relational databases are not the best tools for such a task. If this is some kind of core functionality, consider using a solution dedicated to this kind of operation, like Sphinx Search Server.
If this is not a "mission critical" system, you can try something else. I can see that the datas.data column isn't really long, so you can create a structure dedicated to your task and keep maintaining it during operational use. For example, create a table:
CREATE TABLE dictionary_datas (
datas_id INT(11) NOT NULL,     -- FK (datas.ID)
word VARCHAR(255) NOT NULL     -- FK (dictionary.word)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Now, any time you insert, delete, or simply modify the datas or dictionary tables, you update dictionary_datas, recording which datas_id contains which words (basically a many-to-many relation). Of course this will degrade your performance, so if you have a high transactional load on your system, you can do it periodically instead. For example, set up a cron job which runs every night at 03:00 and refreshes the table. To simplify the task you can add a TO_CHECK flag to the datas table and refresh the data only for the records having it set to 1 (after you refresh dictionary_datas you switch the value back to 0). Remember, by the way, to refresh the whole datas table after an update to the dictionary table. 36,000 and 100,000 aren't big numbers in terms of data processing.
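A minimal sketch of such a refresh job, assuming the TO_CHECK flag described above has been added to datas (the other table and column names are taken from the question):
-- rebuild the mapping for the rows flagged as changed
DELETE FROM dictionary_datas
WHERE datas_id IN (SELECT ID FROM datas WHERE TO_CHECK = 1);
INSERT INTO dictionary_datas (datas_id, word)
SELECT d.ID, w.word
FROM datas d
JOIN dictionary w ON d.data LIKE CONCAT('%', w.word, '%')
WHERE d.TO_CHECK = 1;
UPDATE datas SET TO_CHECK = 0 WHERE TO_CHECK = 1;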
Once you have this table you can just query it like:
SELECT datas_id, count(*) AS words_num FROM dictionary_datas GROUP BY datas_id HAVING count(*) > 3;
To speed up the query (and slow down its updates) you can create a composite index on its columns datas_id, word (in EXACTLY that order). If you decide to refresh the data periodically, you should remove the index before the refresh, then refresh the data, and finally re-create the index afterwards; this way it will be faster.
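For instance (a sketch; the index name is arbitrary):
CREATE INDEX ix_datas_word ON dictionary_datas (datas_id, word);
-- around a bulk refresh:
ALTER TABLE dictionary_datas DROP INDEX ix_datas_word;
-- ... refresh the data ...
ALTER TABLE dictionary_datas ADD INDEX ix_datas_word (datas_id, word);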
I'm not sure if I understood your problem, but I think this could be a solution. Also, I know people don't like regular expressions, but this works for me to select rows whose value has more than one word:
SELECT * FROM datas WHERE data REGEXP "([a-z] )+"
Have you tried this?
select *
from dictionary, datas
where position(word in data) > 0
;
This is very inefficient, but might be good enough for you. Here is a fiddle.
For better performance, you could try placing a full-text index on your text column data and then using full-text search instead of POSITION.
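A minimal sketch of that in MySQL, assuming a FULLTEXT index can be added to datas.data (note that AGAINST() only accepts a constant search string, so matching against every dictionary word still means iterating over the words, e.g. from application code or with a prepared statement):
ALTER TABLE datas ADD FULLTEXT INDEX ft_data (data);
-- one dictionary word at a time:
SELECT * FROM datas WHERE MATCH(data) AGAINST ('someword' IN BOOLEAN MODE);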

Which index should I choose?(Mysql)

Table:
CREATE TABLE `table1` (
`f1` int(11) NOT NULL default '0',
`f2` int(4) NOT NULL default '0',
`f3` bigint(20) NOT NULL default '0',
PRIMARY KEY (`f1`)
) ENGINE=MyISAM
Query:
select `f1` from table1 where `f2`=123 order by `f3` desc;
I want to create a "covering index" for this query:
ALTER TABLE `table1` ADD INDEX (`f2`,`f3`,`f1`);
or
ALTER TABLE `table1` ADD INDEX (`f2`,`f1`,`f3`);
which should I choose?
The first one. MySQL can use either index to obtain the result set without needing to read from the actual table. The first index is slightly more efficient because it is not necessary to perform the extra step of re-ordering the rows.
For your query you need an index on f2 only.
If you have a query with a WHERE clause like "where f1=12 and f2=15", you might want an index on f1 and f2 too. However, it might be that the primary key will give you results faster, depending on the data and the complete query.
You (might) need an index covering the 3 fields if you have queries ranging over all 3 (in the where clause).
In 15 years, I have never faced the need to create an index for ordering results only. That operation is quite fast; what is slow is finding the rows (the where clause) and the different set matches (joins).
Now, if you are not sure, create both. Run your query and check with EXPLAIN which one MySQL uses, then drop the other ^^.
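Something like this, if you want to try that comparison (a sketch; the index names are arbitrary):
ALTER TABLE table1 ADD INDEX ix_f2_f3_f1 (f2, f3, f1);
ALTER TABLE table1 ADD INDEX ix_f2_f1_f3 (f2, f1, f3);
EXPLAIN SELECT f1 FROM table1 WHERE f2 = 123 ORDER BY f3 DESC;
-- the key column of the EXPLAIN output names the index that was chosen; drop the other, e.g.:
ALTER TABLE table1 DROP INDEX ix_f2_f1_f3;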