Optimal MySQL index for search query within date ranges - mysql

I have a MySQL table of the form
CREATE TABLE `myTable` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`timestamp` datetime NOT NULL,
`fieldA` int(11) NOT NULL,
`fieldB` int(11) NOT NULL,
....
)
The table will have around 500,000,000 rows, with the remaining fields being floats.
The queries I will be using will be of the form:
SELECT * FROM myTable
WHERE fieldA= AND fieldB= AND timestamp>'' and timestamp<=''
ORDER BY timestamp;
At the moment I have two indices: a primary key on id, and a unique key on timestamp,fieldA,fieldB (hashed). At the moment, a select query like the above takes around 6 minutes on a reasonably powerful desktop PC.
What would be the optimal index to apply? Does the ordering of the 3 fields in the key matter, and should I be using a B-tree instead of a hash index? Is there a conflict between my primary key and the second index? Or do I have the best performance I can expect for such a large db without more serious hardware?
Thanks!

For that particular query, an index starting with fieldA and fieldB would probably be optimal. The order of the columns in the index does matter.
Index Order
In order for MySQL to even consider using a particular index for a query, the first column of the index must appear in the query's conditions. For example:
alter table mytable add index a_b_index(a, b);
select * from mytable where a = 1 and b = 2;
The above query should use the index a_b_index. Now take this next example:
alter table mytable add index a_b_index(a, b);
select * from mytable where b = 2;
This will not use the index: the index starts with a, but a is never used in the query, so MySQL cannot use it.
Comparison
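MySQL can use a B-tree index for equality comparisons and also for range comparisons (<, >, <=, >=, BETWEEN). In a multi-column index, however, only the columns up to and including the first range condition narrow the search, so put the equality columns first and the range column last. Hash indexes are the exception: they support equality comparisons only.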
LIKE
MySQL can use an index for LIKE, but only when the pattern does not start with a wildcard, i.e. when the % is at the end, like this:
select * from mytable where cola like 'hello%';
Whereas these will not use an index:
select * from mytable where cola like '%hello';
select * from mytable where cola like '%hello%';

Hashed indexes are not used for ranges. They are used for equality comparisons only. Therefore, a hashed index cannot be used for the range portion of your query.
Since you have a range in your query, you should use a standard b-tree index. Ensure that fielda and fieldb are the first columns in the index, then timestamp. MySQL cannot utilize the index for searches beyond the first range.
Consider a multi-column index on (fielda, fieldb, timestamp).
The index should also be able to satisfy the ORDER BY.
To improve the query further, select only those three columns or consider a larger "covering" index.
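The recommendation above could be sketched as follows. The index name idx_a_b_ts and the literal values are placeholders invented for illustration; only the column order comes from the answer.

```sql
-- Hypothetical index name. Equality columns first, range column last.
ALTER TABLE myTable
  ADD INDEX idx_a_b_ts (fieldA, fieldB, `timestamp`);

-- Placeholder values. In the output, `key` should show idx_a_b_ts and
-- `Extra` should not contain "Using filesort".
EXPLAIN
SELECT * FROM myTable
WHERE fieldA = 1
  AND fieldB = 2
  AND `timestamp` >  '2012-01-01 00:00:00'
  AND `timestamp` <= '2012-02-01 00:00:00'
ORDER BY `timestamp`;
```

On a table of this size, building the index will itself take a while, so it is worth trying on a copy first.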

Related

Do composite key indices improve performance of or clauses

I have a table in MySQL with two columns
id int(11) unsigned NOT NULL AUTO_INCREMENT,
B varchar(191) CHARACTER SET utf8mb4 DEFAULT NULL,
The id being the PK.
I need to do a lookup in a query using either one of these. id in (:idList) or B in (:bList)
Would this query perform better if there were a composite index containing these two columns?
No, it will not.
Indexes can be used to look up values from the leftmost columns in an index:
MySQL can use multiple-column indexes for queries that test all the columns in the index, or queries that test just the first column, the first two columns, the first three columns, and so on. If you specify the columns in the right order in the index definition, a single composite index can speed up several kinds of queries on the same table.
So, if you have a composite index on the id, B fields (in this order), then the index can be used to look up values based on their id, or on a combination of id and B values, but it cannot be used to look up values based on B only. However, in the case of an OR condition that is exactly what you need to do: look up values based on B only.
If each field in the OR condition is the leftmost field of some index, then MySQL attempts an index merge optimisation, so you may actually be better off having separate indexes for these two fields.
Note: if you use the InnoDB table engine, then there is no point in adding the primary key to any multi-column index, because InnoDB silently adds the PK to every index.
For OR, I don't think so.
The optimizer has to evaluate each side of the OR separately, so an individual index for each column will be better.
For AND, a composite index will help.
MySQL index TIPS
Of course you can always add the index and compare the explain plan.
MySQL Explain Plan
The trick for optimizing OR is to use UNION. (At least, it works well in some cases.)
( SELECT ... FROM ... WHERE id IN (...) )
UNION DISTINCT
( SELECT ... FROM ... WHERE B IN (...) )
Notes:
Need separate indexes on id and B.
No benefit from any composite index (unless it is also "covering").
Change DISTINCT to ALL if you know that there won't be any rows found by both the id and B tests. (This avoids a de-dup pass.)
If you need ORDER BY, add it after the SQL above.
If you need LIMIT, it gets messier. (This is probably not relevant for IN, but it often is with ORDER BY.)
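A concrete version of the UNION rewrite might look like this. The table name t matches the later example in this answer; the index name and the literal values are made up, and id is assumed to be the primary key already.

```sql
-- id is covered by the PRIMARY KEY; B needs its own index (hypothetical name).
ALTER TABLE t ADD INDEX idx_B (B);

(SELECT id, B FROM t WHERE id IN (1, 2, 3))
UNION DISTINCT
(SELECT id, B FROM t WHERE B IN ('foo', 'bar'))
ORDER BY id;   -- applies to the combined result
```

Running EXPLAIN on each branch separately should show the primary key used for the first and idx_B for the second.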
If the rows are 'wide' and the result set has very few rows, it may be further beneficial to do something like this:
SELECT t...
FROM t
JOIN (
( SELECT id FROM t WHERE id IN (...) )
UNION DISTINCT
( SELECT id FROM t WHERE B IN (...) )
) AS u USING(id);
Notes:
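This needs PRIMARY KEY(id) and INDEX(B, id). (Actually, as Michael pointed out, with InnoDB there is no difference between INDEX(B) and INDEX(B, id), because the PK is appended to every secondary index.)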
The UNION is cheaper here because of collecting only id, not the bulky columns.
The SELECTs in the UNION are faster because you should be able to provide "covering" indexes.
ORDER BY would go at the very end.

JSONB Index using GIN doesn't work on postgres [duplicate]

I have the following table in PostgreSQL:
CREATE TABLE index_test
(
id int PRIMARY KEY NOT NULL,
text varchar(2048) NOT NULL,
last_modified timestamp NOT NULL,
value int,
item_type varchar(2046)
);
CREATE INDEX idx_index_type ON index_test ( item_type );
CREATE INDEX idx_index_value ON index_test ( value );
I make the following selects:
explain select * from index_test r where r.item_type='B';
explain select r.value from index_test r where r.value=56;
The explanation of execution plan looks like this:
Seq Scan on index_test r (cost=0.00..1.04 rows=1 width=1576)
Filter: ((item_type)::text = 'B'::text)
As far as I understand, this is a full table scan. The question is: why are my indexes not used?
Maybe the reason is that I have too few rows in my table? I have only 20 of them. Could you please provide me with a SQL statement to easily populate my table with random data to check the indexes issue?
I have found this article: http://it.toolbox.com/blogs/db2luw/how-to-easily-populate-a-table-with-random-data-7888, but it doesn't work for me. The efficiency of the statement does not matter, only the simplicity.
Maybe, the reason is that I have too few rows in my table?
Yes. For a total of 20 rows in a table a seq scan is always going to be faster than an index scan. Chances are that those rows are located in a single database block anyway, so the seq scan would only need a single I/O operation.
If you use
explain (analyze true, verbose true, buffers true) select ....
you can see a bit more detail about what is really going on.
Btw: you shouldn't use text as a column name, as that is also the name of a data type in Postgres, which makes statements harder to read.
The example you have found is for DB2; in Postgres you can use generate_series to do it.
For example like this:
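-- Assumes the "text" column has been renamed to data and that id is
-- filled in automatically (e.g. declared as serial).
INSERT INTO index_test (data, last_modified, value, item_type)
SELECT md5(random()::text), now(), floor(random() * 100), md5(random()::text)
FROM generate_series(1, 1000);
-- A query that can be answered from the index on value alone.
SELECT max(value) FROM index_test;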
http://sqlfiddle.com/#!12/52641/3
The second query in the above fiddle should use an index-only scan.
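To check the plan again once the table holds more rows, something along these lines should work. It only uses the table and index from the question; whether the planner actually picks the index still depends on the row count and data distribution.

```sql
-- Refresh planner statistics after the bulk insert.
ANALYZE index_test;

-- Show the chosen plan together with actual timings and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT r.value FROM index_test r WHERE r.value = 56;
```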

MySQL index for ORDER BY A*B

I have a MySQL InnoDB table with two INT columns, say col1 and col2. I'd like to add an index that will allow me to:
SELECT * from myTable WHERE col0=5 ORDER BY col1*col2 DESC
Is it possible to have an index that will support such a sorting or will i need to add a column that keeps that value (col1*col2) ?
Noam, see ORDER BY Optimization. If you want to use an index for sorting, it should be the same index that is used for the WHERE clause, and of course the value for sorting needs to be stored in its own column. Here I generated a test table with 100k rows that should match your situation.
1.) Adding ONE INDEX on two columns (this works for utilizing an index for both select and sort):
ALTER TABLE `test_data` ADD INDEX super_sort (`col0`,`sort_col`);
EXPLAIN SELECT * FROM `test_data` WHERE col0 = 50 ORDER BY sort_col;
key -> super_sort; Extra -> using where
(index is used for WHERE and SORT)
2.) Adding two indexes, one for WHERE and one for SORT (won't work)
ALTER TABLE `test_data` DROP INDEX `super_sort`;
ALTER TABLE `test_data` ADD INDEX (`col0`);
ALTER TABLE `test_data` ADD INDEX (`sort_col`);
EXPLAIN SELECT * FROM `test_data` WHERE col0 = 50 ORDER BY sort_col;
key -> col0; Extra -> Using where; Using filesort
(an index is used for WHERE, BUT NOT for sorting)
So the answer is: yes, you will need a column that keeps that value (col1*col2), AND you need ONE index on both columns: col0 (for the WHERE clause) + sort_col (for sorting), as in the first example. As soon as you ORDER BY a calculation (e.g. col1*col2), no index can be used for sorting.
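One way to maintain such a column, assuming MySQL 5.7 or later (which supports generated columns), is sketched below. The names sort_col and idx_col0_sort are illustrative, and BIGINT is chosen on the assumption that the product of two INT values may not fit in an INT. On older versions the column would have to be a regular one kept up to date by the application or by triggers.

```sql
-- Stored generated column holding the product (hypothetical name).
ALTER TABLE myTable
  ADD COLUMN sort_col BIGINT AS (col1 * col2) STORED;

-- One index serving both the WHERE clause and the sort.
ALTER TABLE myTable
  ADD INDEX idx_col0_sort (col0, sort_col);

-- Extra should no longer show "Using filesort".
EXPLAIN SELECT * FROM myTable WHERE col0 = 5 ORDER BY sort_col DESC;
```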
You can add a new column that contains the value of col1*col2 and use it for sorting. Otherwise you can keep using SELECT * from myTable WHERE col0=5 ORDER BY col1*col2 DESC, but the sort will not use an index.

How to optimize this query in a large database?

Query
SELECT id FROM `user_tmp`
WHERE `code` = '9s5xs1sy'
AND `go` NOT REGEXP 'http://www.xxxx.example.com/aflam/|http://xx.example.com|http://www.xxxxx..example.com/aflam/|http://www.xxxxxx.example.com/v/|http://www.xxxxxx.example.com/vb/'
AND check='done'
AND `dataip` <1319992460
ORDER BY id DESC
LIMIT 50
MySQL returns:
Showing rows 0 - 29 ( 50 total, Query took 21.3102 sec) [id: 2622270 - 2602288]
if i remove
AND dataip <1319992460
MySQL returns
Showing rows 0 - 29 ( 50 total, Query took 0.0859 sec) [id: 3637556 - 3627005]
and if no data, MySQL returns
MySQL returned an empty result set (i.e. zero rows). ( Query took 21.7332 sec )
Explain plan:
SQL query: Explain SELECT * FROM `user_tmp` WHERE `code` = '93mhco3s5y' AND `too` NOT REGEXP 'http://www.10neen.com/aflam/|http://3ltool.com|http://www.10neen.com/aflam/|http://www.10neen.com/v/|http://www.m1-w3d.com/vb/' and checkopen='2010' and `dataip` <1319992460 ORDER BY id DESC LIMIT 50;
Rows: 1
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE user_tmp index NULL PRIMARY 4 NULL 50 Using where
Example of the database used
CREATE TABLE IF NOT EXISTS user_tmp (
  id int(9) NOT NULL AUTO_INCREMENT,
  ip text NOT NULL,
  dataip bigint(20) NOT NULL,
  ref text NOT NULL,
  click int(20) NOT NULL,
  code text NOT NULL,
  too text NOT NULL,
  name text NOT NULL,
  checkopen text NOT NULL,
  contry text NOT NULL,
  vOperation text NOT NULL,
  vBrowser text NOT NULL,
  iconOperation text NOT NULL,
  iconBrowser text NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=4653425 ;
--
-- Dumping data for table user_tmp
INSERT INTO `user_tmp` (`id`, `ip`, `dataip`, `ref`, `click`, `code`, `too`, `name`, `checkopen`, `contry`, `vOperation`, `vBrowser`, `iconOperation`, `iconBrowser`) VALUES
(1, '54.125.78.84', 1319506641, 'http://xxxx.example.com/vb/showthread.php%D8%AA%D8%AD%D9%85%D9%8A%D9%84-%D8%A7%D8%BA%D9%86%D9%8A%D8%A9-%D8%A7%D9%84%D8%A8%D9%88%D9%85-giovanni-marradi-lovers-rendezvous-3cd-1999-a-155712.html', 0, '4mxxxxx5', 'http://www.xxx.example.com/aflam/', 'xxxxe', '2010', 'US', 'Linux', 'Chrome 12.0.742 ', 'linux.png', 'chrome.png');
I want the correct way to do the query and optimize database
You don't have any indexes besides the primary key. You need to create an index on the fields that you use in your WHERE clause. Whether you need to index only one field or a combination of several fields depends on the other SELECTs you will be running against that table.
Keep in mind that REGEXP cannot use indexes at all, and LIKE can use an index only when the pattern does not begin with a wildcard (so LIKE 'a%' can use an index, but LIKE '%a' cannot). Range comparisons (< and >) can use a B-tree index, but in a multi-field index only when all the fields before them are compared with equality.
So the equality fields code and check are the place to start. I suppose many rows will have the same value for check, so I would begin the index with the code field. Multi-field indexes can be used only in the order in which they are defined...
Imagine an index created on the fields code, check. This index can be used in your query (where the WHERE clause contains both fields), and also in a query with only the code field, but not in a query with only the check field.
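As a sketch of such an index on the table from the question: code and checkopen are TEXT columns there, and MySQL requires a prefix length to index a TEXT column, so the lengths below (and the index name) are guesses to be adjusted to the real data. The column checkopen is assumed to be what the query calls check, and dataip is appended last as the range column; it can be dropped again if EXPLAIN shows no benefit.

```sql
-- Hypothetical index name and prefix lengths.
ALTER TABLE user_tmp
  ADD INDEX idx_code_check_dataip (code(16), checkopen(8), dataip);

-- Compare the plan before and after adding the index.
EXPLAIN
SELECT id FROM user_tmp
WHERE code = '9s5xs1sy'
  AND checkopen = 'done'
  AND dataip < 1319992460
ORDER BY id DESC
LIMIT 50;
```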
Is it important to ORDER BY id? If not, leave it out, it will prevent the sort pass and your query will finish faster.
I will assume you are using mysql <= 5.1
The answers above fall into two basic categories:
1. You are using the wrong column type
2. You need indexes
I will deal with each as both are relevant for performance which is ultimately what I take your questions to be about:
Column Types
The difference between bigint/int or int/char for the dataip question is basically not relevant to your issue. The fundamental issue has more to do with index strategy. However, when considering performance holistically, the fact that you are using MyISAM as your engine for this table leads me to ask if you really need "text" column types. If you have short (less than 255, say) character columns, then making them fixed-length columns will most likely increase performance. Keep in mind that this only pays off if every column is fixed length: if even one column remains variable length (varchar, text, etc.), changing the others is not worth it.
Vertical Partitioning
The fact to keep in mind here is that even though you are only requesting the id column, from the standpoint of disk IO and memory you are getting the entire row back. Since so many of the columns are text, this could mean a massive amount of data. Any of these columns that are not used for lookups of users or are not often accessed could be moved into another table whose foreign key has a unique key on it, keeping the relationship 1:1.
Index Strategy
Most likely the problem is simply indexing, as is noted above. Adding the "AND dataip <1319992460" condition makes the query slow because, with no usable index, it forces a full table scan.
As stated above, placing all the columns in the WHERE clause in a single, composite index will help. For this query, the order of the columns in the index will not matter among the columns compared with equality, so long as all of them appear in the WHERE clause; a range column such as dataip should come last.
However, the order could matter a great deal for other queries. A quick example would be an index made of (colA, colB). A query with "where colA = 'foo'" will use this index. But a query with "where colB = 'bar'" will not because colB is not the left most column in the index definition. So, if you have other queries that use these columns in some combination it is worth minimizing the number of indexes created on the table. This is b/c every index increases the cost of a write and uses disk space. Writes are expensive b/c of necessary disk activity. Don't make them more expensive.
You need to add index like this:
ALTER TABLE `user_tmp` ADD INDEX(`dataip`);
And if your column 'dataip' contains only unique values you can add unique key like this:
ALTER TABLE `user_tmp` ADD UNIQUE(`dataip`);
Keep in mind that adding an index can take a long time on a big table, so don't do it on a production server without testing.
Index the fields used in your WHERE clause. The order of the conditions in the WHERE clause does not matter, but the order of the fields in a multi-field index does: the index is used only if its leading fields appear in the WHERE clause.
Does dataip really need to be a bigint? According to the MySQL documentation, the signed range is -9223372036854775808 to 9223372036854775807 (it is a 64-bit number).
You need to choose the right column type for the job, and add the right type of index too. Else these queries will take forever.

Using filesort to sort by datetime column in MySQL

I have a table Cars with datetime (DATE) and bit (PUBLIC).
Now I would like to fetch the rows with PUBLIC = 1, ordered by DATE, so I use:
select
c.*
from
Cars c
WHERE
c.PUBLIC = 1
ORDER BY
DATE DESC
But unfortunately when I use explain to see what is going on I have this:
1 SIMPLE a ALL IDX_PUBLIC,DATE NULL NULL NULL 103 Using where; Using filesort
And it takes 0,3 ms to fetch this data while I have only 100 rows. Is there any way to avoid the filesort?
When I look at the indexes, I have a non-unique index on (PUBLIC, DATE).
Table def:
CREATE TABLE IF NOT EXISTS `Cars` (
`ID` int(11) NOT NULL auto_increment,
`DATE` datetime NOT NULL,
`PUBLIC` binary(1) NOT NULL default '0',
PRIMARY KEY (`ID`),
KEY `IDX_PUBLIC` (`PUBLIC`),
KEY `DATE` (`PUBLIC`,`DATE`)
) ENGINE=MyISAM AUTO_INCREMENT=186 ;
You need to have a composite index on (public, date)
This way, MySQL will filter on public and sort on date.
From your EXPLAIN I see that you don't have a composite index on (public, date).
Instead you have two different indexes on public and on date. At least, that's what their names IDX_PUBLIC and DATE tell.
Update:
Your public column is not a BIT, it's a BINARY(1). It's a character type and uses character comparison.
When comparing integers to characters, MySQL converts the latter to the former, not vice versa, so the index on the character column cannot be used for a comparison with an integer.
These queries return different results:
CREATE TABLE t_binary (val BINARY(2) NOT NULL);
INSERT
INTO t_binary
VALUES
(1),
(2),
(3),
(10);
SELECT *
FROM t_binary
WHERE val <= 10;
---
1
2
3
10
SELECT *
FROM t_binary
WHERE val <= '10';
---
1
10
Either change your public column to be a bit or rewrite your query as this:
SELECT c.*
FROM Cars c
WHERE c.PUBLIC = '1'
ORDER BY
DATE DESC
, i. e. compare characters with characters, not integers.
If you are ordering by date, a sort will be required. If there isn't an index by date, then a filesort will be used. The only way to get rid of that would be to either add an index on date or not do the order by.
Also, a filesort does not always imply that the file will be sorted on disk. It could be sorting it in memory if the table is small enough or the sort buffer is large enough. It just means that the table itself has to be sorted.
Looks like you have an index on date already, and since you are using PUBLIC in your where clause, MySQL should be able to use that index. However, the optimizer may have decided that since you have so few rows it isn't worth bothering with the index. Try adding 10,000 or so rows to the table, re-analyze it, and see if that changes the plan.
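The experiment suggested above can be run roughly like this once more rows have been added. The index name `DATE` comes from the table definition in the question; FORCE INDEX is used here only to compare plans, not as a permanent fix.

```sql
-- Refresh the index statistics after loading more rows.
ANALYZE TABLE Cars;

-- Plan chosen by the optimizer on its own.
EXPLAIN
SELECT c.* FROM Cars c
WHERE c.PUBLIC = '1'
ORDER BY c.`DATE` DESC;

-- Plan when the (PUBLIC, DATE) index is forced, for comparison.
EXPLAIN
SELECT c.* FROM Cars c FORCE INDEX (`DATE`)
WHERE c.PUBLIC = '1'
ORDER BY c.`DATE` DESC;
```

If the forced plan shows no "Using filesort" in Extra, the index is able to satisfy the ORDER BY and the optimizer was simply choosing not to use it on the small table.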