I have been playing around with indexes on MySQL (5.5.24, WinXP), but I can't find the reason of why the server is not using one index when a LIKE is used.
The example is this:
I have created a test table:
create table testTable (
id varchar(50) primary key,
text1 varchar(50) not null,
startDate varchar(50) not null
) ENGINE = innodb;
Then, I added an index to startDate. (Please, do not ask why the column is a text and not a date time.. this is just a simple test):
create index jeje on testTable(startdate);
analyze table testTable;
After that, I added almost 200,000 rows of that where startDate had 3 possible values. (One third of appearences for each one..near 70,000 times)
So, if I run an EXPLAIN command like this:
explain select * from testTable use index (jeje) where startDate = 'aaaaaaaaa';
The answer is the following:
id = 1
select_type = SIMPLE
type = ref
possible_keys = jeje
key = jeje
rows = 88412
extra = Using where
So, the key is used, and the rows amount is near to 200,000/3 so all is ok.
The poblem is that if I change the query to: (just chaning '=' to 'LIKE'):
explain select * from testTable use index(jeje) where startDate LIKE 'aaaaaaaaa';
In this case, the answer is:
id = 1
select_type = SIMPLE
type = ALL
possible_keys = jeje
key = null
rows = 176824
extra = Using where
So, the index is not being used now(key is null, and rows near to the full table..as the type=all suggests).
MySQL documentation says that LIKE DOES make use of indexes.
So, what am i not seeing here? Where is the problem?
Thanks for your help.
MySql can ignore index if it index incurs access to more than 30% of table rows.
You could try FORCE INDEX [index_name], it will use index in any case.
The value of sysvar_max_seeks_for_key also affects whether the index is used or not:
http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html#sysvar_max_seeks_for_key
Try changing this value to a smaller number.
Search for similar requests on SO.
Based on Ubik comment, and data changes, I found that:
The Index IS used in these cases:
- explain select * from testTable force index jeje where startDate like 'aaaaaaadsfadsfadsfasafsafsasfsadsfa%';
- explain select * from testTable force index jeje where startDate like 'aaaaaaadsfadsfadsfasafsafsasfsadsfa%';
- explain select * from testTable force index jeje where startDate like 'aaa';
But the index is NOT being used when I use this query:
- explain select * from testTable force index jeje where startDate like 'aaaaaaaaa';
Based on the fact that in startDate column all the values have the same length (9 characters), when I use a query using a LIKE command and a 9 characters constant, PERHAPS MySQL prefer to not use the reason because of some performance algorithm, and goes to the table.
My concern was to see if I was making some kind of mistake on my original tests, but now I think that the index and tests are correct, and that MySQL in some cases decides to not use the index... and I will relay on this.
For me, this is a closed task.
If somebody want to add something to the thread, you are welcome.
Related
I have following query which is taking time.
Mytable type is innodb and have primary key on field tsh_id
I have also added index on transaction_Id field
following is implementation inside my database stored procedure.
DECLARE lv_timestamp DATETIME(3);
SET #lv_timestamp = NOW(3);
IF(mycondition) then
SET #lv_Duration :=( SELECT UNIX_TIMESTAMP (#lv_timestamp) - UNIX_TIMESTAMP ( `changedon` )
FROM `MyTable`
WHERE transaction_Id = _transaction_Id
ORDER BY tsh_id DESC
LIMIT 1)
End if;
Please suggest any sort of improvement
Edit:
Explain to query says
"select_type":"SIMPLE",
"table":"MyTable",
"type":"ref",
"possible_keys":"IX_MyTable_Transaction",
"key":"IX_MyTable_Transaction",
"key_len":"98",
"ref":"const",
"rows":1,
"Extra":"Using where"
Make sure you have an index on MyTable.transaction_Id
Make sure you have the innodb_buffer_pool_size set to a decent value in your MySQL config
I am fairly certain your primary key is not a clustered key (its might be null or not unique or you changed it at one point), because that would explain this behaviour, at least if transaction_Id isn't unique either (and otherwise you wouldn't need limit 1).
To improve this specific query, you can create the following index:
create index ix_mytable_transaction_id_tsh_id on MyTable (transaction_id, tsh_id desc);
Use explain again.
If it doesn't use the new key, force it:
SELECT ... FROM `MyTable` force index(ix_mytable_transaction_id_tsh_id) ...
I have the following table in PostgreSQL:
CREATE TABLE index_test
(
id int PRIMARY KEY NOT NULL,
text varchar(2048) NOT NULL,
last_modified timestamp NOT NULL,
value int,
item_type varchar(2046)
);
CREATE INDEX idx_index_type ON index_test ( item_type );
CREATE INDEX idx_index_value ON index_test ( value )
I make the following selects:
explain select * from index_test r where r.item_type='B';
explain select r.value from index_test r where r.value=56;
The explanation of execution plan looks like this:
Seq Scan on index_test r (cost=0.00..1.04 rows=1 width=1576)
Filter: ((item_type)::text = 'B'::text)'
As far as I understand, this is a full table scan. The question is: why my indexes are not used?
May be, the reason is that I have too few rows in my table? I have only 20 of them. Could you please provide me with a SQL statement to easily populate my table with random data to check the indexes issue?
I have found this article: http://it.toolbox.com/blogs/db2luw/how-to-easily-populate-a-table-with-random-data-7888, but it doesn't work for me. The efficiency of the statement does not matter, only the simplicity.
Maybe, the reason is that I have too few rows in my table?
Yes. For a total of 20 rows in a table a seq scan is always going to be faster than an index scan. Chances are that those rows are located in a single database block anyway, so the seq scan would only need a single I/O operation.
If you use
explain (analyze true, verbose true, buffers true) select ....
you can see a bit more details about what is really going on.
Btw: you shouldn't use text as a column name, as that is also a datatype in Postgres (and thus a reserved word).
The example you have found is for DB2, in pg you can use generate_series to do it.
For example like this:
INSERT INTO index_test(data,last_modified,value,item_type)
SELECT
md5(random()::text),now(),floor(random()*100),md5(random()::text)
FROM generate_series(1,1000);
SELECT max(value) from index_test;
http://sqlfiddle.com/#!12/52641/3
The second query in above fiddle should use index only scan.
I started with this question: is my large mysql table destined for failure?
The answer that I found from that question was satisfactory. I have a table with 22 million rows that I would like to grow to about 100 million. At this time, the table minute_data structure is like this:
A problem that I am having is as follows. I need to execute this query:
select datediff(date,now()) from minute_data where symbol = "CSCO" order by date desc limit 1;
Which is very fast ( < 1 sec ) when the table contains the value "CSCO". The problem is, sometimes I will query for a symbol that is not in the table already. When I execute a query like this for, say, symbol = "ABCD":
select datediff(date,now()) from minute_data where symbol = "ABCD" order by date desc limit 1;
Then the query takes a LONG TIME... like forever ( 180 seconds ).
A way I can get around this is by making sure that the table contains the symbol I am looking for before I execute the query. The fastest way I found to do this is with the follow query, which I just need to use to check to see if the table minute_data contains the symbol I am looking for or not. Basically I just need it to return a boolean value so I know if the symbol is in the table or not:
select count(1) from minute_data where symbol = "CSCO";
This query takes over 30 seconds to return 1 value, way too long for my liking, since the query above, which actually returns a datediff calculation only takes less than 1 second.
symbol column is part of the pri key, I thought it should be able to figure out if a value exists there very quickly.
What am I doing wrong? Is there a fast way to do what I want to do? Should I change the structure of the data to optimize performance?
Thank You!
UPDATE
I think I found a good solution to this problem. From the answer below by LastCoder, I did the following:
1) Created a new table called minute_data_2 with the exact same definition as minute_data.
2)
ALTER TABLE minute_data_2 ADD PRIMARY KEY (symbol, date);
3)
INSERT IGNORE INTO minute_data_2 SELECT * FROM minute_data;
4)
DROP TABLE minute_data;
5) Rename minute_data_2 to minute_data
Now I am seeing blindingly fast speed for the same query which I described above as taking more than 180 second, now completes in .001 seconds. Amazing.
Did you try using EXISTS (...)
select datediff(date,now()) from minute_data
where EXISTS(SELECT * FROM minute_data WHERE symbol = "CSCO")
AND symbol = "CSCO" order by date desc limit 1;
Even though symbol is a primary key, it seems you have the timestamp as a PK as well which makes me think you are using a COMPOSITE pk which means the ordering is by timestamp then symbol. You may want to put separate index on symbol, if all you have is a composite one where timestamp is first.
I think is better to make a table named symbols and add a reference to that table in your minute_data table:
symbols:
symbol_id (INT, Primary Key, Auto Increment)
symbol_text (VARCHAR)
minute_data:
key_col (BIGINT, Primary Key, Auto Increment)
symbol_id (INT, Index)
other_field
Use InnoDB as table type for adding references.
Try to avoid duplicate entries into your tables..
Query
SELECT id FROM `user_tmp`
WHERE `code` = '9s5xs1sy'
AND `go` NOT REGEXP 'http://www.xxxx.example.com/aflam/|http://xx.example.com|http://www.xxxxx..example.com/aflam/|http://www.xxxxxx.example.com/v/|http://www.xxxxxx.example.com/vb/'
AND check='done'
AND `dataip` <1319992460
ORDER BY id DESC
LIMIT 50
MySQL returns:
Showing rows 0 - 29 ( 50 total, Query took 21.3102 sec) [id: 2622270 - 2602288]
Query took 21.3102 sec
if i remove
AND dataip <1319992460
MySQL returns
Showing rows 0 - 29 ( 50 total, Query took 0.0859 sec) [id: 3637556 - 3627005]
Query took 0.0859 sec
and if no data, MySQL returns
MySQL returned an empty result set (i.e. zero rows). ( Query took 21.7332 sec )
Query took 21.7332 sec
Explain plan:
SQL query: Explain SELECT * FROM `user_tmp` WHERE `code` = '93mhco3s5y' AND `too` NOT REGEXP 'http://www.10neen.com/aflam/|http://3ltool.com|http://www.10neen.com/aflam/|http://www.10neen.com/v/|http://www.m1-w3d.com/vb/' and checkopen='2010' and `dataip` <1319992460 ORDER BY id DESC LIMIT 50;
Rows: 1
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE user_tmp index NULL PRIMARY 4 NULL 50 Using where
Example of the database used
CREATE TABLE IF NOT EXISTS user_tmp ( id int(9) NOT NULL
AUTO_INCREMENT, ip text NOT NULL, dataip bigint(20) NOT NULL,
ref text NOT NULL, click int(20) NOT NULL, code text NOT
NULL, too text NOT NULL, name text NOT NULL, checkopen
text NOT NULL, contry text NOT NULL, vOperation text NOT NULL,
vBrowser text NOT NULL, iconOperation text NOT NULL,
iconBrowser text NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=4653425 ;
--
-- Dumping data for table user_tmp
INSERT INTO `user_tmp` (`id`, `ip`, `dataip`, `ref`, `click`, `code`, `too`, `name`, `checkopen`, `contry`, `vOperation`, `vBrowser`, `iconOperation`, `iconBrowser`) VALUES
(1, '54.125.78.84', 1319506641, 'http://xxxx.example.com/vb/showthread.php%D8%AA%D8%AD%D9%85%D9%8A%D9%84-%D8%A7%D8%BA%D9%86%D9%8A%D8%A9-%D8%A7%D9%84%D8%A8%D9%88%D9%85-giovanni-marradi-lovers-rendezvous-3cd-1999-a-155712.html', 0, '4mxxxxx5', 'http://www.xxx.example.com/aflam/', 'xxxxe', '2010', 'US', 'Linux', 'Chrome 12.0.742 ', 'linux.png', 'chrome.png');
I want the correct way to do the query and optimize database
You don't have any indexes besides the primary key. You need to make index on fields that you use in your WHERE statement. If you need to index only 1 field or a combination of several fields depends on the other SELECTs you will be running against that table.
Keep in mind that REGEXP cannot use indexes at all, LIKE can use index only when it does not begin with wildcard (so LIKE 'a%' can use index, but LIKE '%a' cannot), bigger than / smaller than (<>) usually don't use indexes also.
So you are left with the code and check fields. I suppose many rows will have the same value for check, so I would begin the index with code field. Multi-field indexes can be used only in the order in which they are defined...
Imagine index created for fields code, check. This index can be used in your query (where the WHERE clause contains both fields), also in the query with only code field, but not in query with only check field.
Is it important to ORDER BY id? If not, leave it out, it will prevent the sort pass and your query will finish faster.
I will assume you are using mysql <= 5.1
The answers above fall into two basic categories:
1. You are using the wrong column type
2. You need indexes
I will deal with each as both are relevant for performance which is ultimately what I take your questions to be about:
Column Types
The difference between bigint/int or int/char for the dataip question is basically not relevant to your issue. The fundamental issue has more to do with index strategy. However when considering performance holistically, the fact that you are using MyISAM as your engine for this table leads me to ask if you really need "text" column types. If you have short (less than 255 say) character columns, then making them fixed length columns will most likely increase performance. Keep in mind that if any one column is of variable length (varchar, text, etc) then this is not worth changing any of them.
Vertical Partitioning
The fact to keep in mind here is that even though you are only requesting the id column from the standpoint of disk IO and memory you are getting the entire row back. Since so many of the rows are text, this could mean a massive amount of data. Any of these rows that are not used for lookups of users or are not often accessed could be moved into another table where the foreign key has a unique key placed on it keeping the relationship 1:1.
Index Strategy
Most likely the problem is simply indexing as is noted above. The reason that your current situation is caused by adding the "AND dataip <1319992460" condition is that it forces a full table scan.
As stated above placing all the columns in the where clause in a single, composite index will help. The order of the columns in the index will no matter so long as all of them appear in the where clause.
However, the order could matter a great deal for other queries. A quick example would be an index made of (colA, colB). A query with "where colA = 'foo'" will use this index. But a query with "where colB = 'bar'" will not because colB is not the left most column in the index definition. So, if you have other queries that use these columns in some combination it is worth minimizing the number of indexes created on the table. This is b/c every index increases the cost of a write and uses disk space. Writes are expensive b/c of necessary disk activity. Don't make them more expensive.
You need to add index like this:
ALTER TABLE `user_tmp` ADD INDEX(`dataip`);
And if your column 'dataip' contains only unique values you can add unique key like this:
ALTER TABLE `user_tmp` ADD UNIQUE(`dataip`);
Keep in mind, that adding index can take long time on a big table, so don't do it on production server with out testing.
You need to create index on fields in the same order that that are using in where clause. Otherwise index is not be used. Index fields of your where clause.
does dataip really need to be a bigint? According to mysql The signed range is -9223372036854775808 to 9223372036854775807 ( it is a 64bit number ).
You need to choose the right column type for the job, and add the right type of index too. Else these queries will take forever.
If I have the following table:
CREATE TABLE `mytable` (
`id` INT NOT NULL AUTO_INCREMENT,
`name` VARCHAR(64) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `name_first_letter` (`name`(1)),
KEY `name_all` (`name`)
)
Will MySQL ever choose to use the name_first_letter index over the name_all index? If so, under what conditions would this happen?
I have done some quick tests and I'm not sure if MySQL will choose the name_first_letter index even when using index hints:
-- This uses name_all
EXPLAIN SELECT name FROM mytable
WHERE SUBSTRING(name FROM 1 FOR 1) = 'T';
-- This uses no index at all
EXPLAIN SELECT name FROM mytable USE INDEX (name_first_letter)
WHERE SUBSTRING(name FROM 1 FOR 1) = 'T';
Can any MySQL gurus shed some light on this? Is there even a point to having name_first_letter on this column?
Edit: Question title wasn't quite right.
It will not make sense to use the index for your query, because you are selecting the full name column. That means that MySQL cannot use the index alone to satisfy the query.
Further, I believe that MySQL cannot understand that the SUBSTRING(name FROM 1 FOR 1) expression is equivalent to the index.
MySQL might, however, use the index if the index alone can satisfy the query. For example:
select count(*)
from mytable
where name like 'T%';
But even that depends on you statistics (hinting should work).
MySQLs partial index feature is intended to save space. It does (usually) not make sense to have both, the partial and the full column index. You would typically drop the shorter one. There might be a rare case where it makes sense, but doesn't make sense in general.