I have a table with 8 columns of mixed datatypes (int, text, varchar, etc.).
This table has only one query that operates on it (a SELECT), and that query uses 4 AND clauses to filter the results.
My question is: what are some guidelines for indexing in this situation?
In my case, read times have a higher priority than write times, so should I index all the columns that appear in an AND clause? Should I only index the integers?
For just one query, make an index with all of the columns affected.
MySQL's indexes work from left to right, meaning that if you have a table with the columns:
A varchar(255)
B varchar(50)
C int(11)
D datetime
With an index defined as KEY index (A,B,C,D), MySQL matches the columns in your query against the index from left to right. If your query filters on all four columns, it will use them all. If it filters only on A and B, MySQL will use those two to narrow down the results before applying the rest of the conditions to the remaining result set (assuming you're using AND).
However if you only include C and D, it will not use this index at all since it doesn't know what to look for in A.
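A minimal sketch of the left-to-right rule, assuming InnoDB with the default DYNAMIC row format. The table name t, the index name idx_abcd, and the extra id and payload columns are made up for illustration; payload is only there so that SELECT * cannot be answered from the index alone.

```sql
-- Hypothetical table built from the four columns listed above.
CREATE TABLE t (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  A VARCHAR(255),
  B VARCHAR(50),
  C INT,
  D DATETIME,
  payload TEXT,
  KEY idx_abcd (A, B, C, D)
);

-- Filters on all four columns: the whole index can be used.
EXPLAIN SELECT * FROM t
WHERE A = 'x' AND B = 'y' AND C = 1 AND D = '2020-01-01 00:00:00';

-- Filters on the leftmost prefix (A, B): the index can still be used.
EXPLAIN SELECT * FROM t WHERE A = 'x' AND B = 'y';

-- Skips A and B: idx_abcd offers no usable prefix for this lookup.
EXPLAIN SELECT * FROM t WHERE C = 1 AND D = '2020-01-01 00:00:00';
```

Compare the key and key_len columns of the three plans. The exact output depends on your data and MySQL version, so run it against a table with realistic contents.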
If you take a quick look at how MySQL uses its indexes you should be able to work out the best columns to index.
I think you should create an index on every column you need for your SELECT statements. Note that MySQL limits the length of a field you can index: the index key cannot exceed 1000 bytes for MyISAM and 767 bytes for InnoDB (thx OMG Ponies). Here you can find a discussion about estimating the size of an index on varchar columns. If you don't have writes, indexing should significantly boost the performance of SELECT statements.
Related
I have a table with 32 columns, of which 6 columns make up the primary key and 2 more columns are indexed.
Explain statement provides the below output
I have observed that whenever the number of rows in the EXPLAIN output increases, the select query takes several seconds to retrieve data from the DB. The above select query returned only 310 rows but it had to scan 382546 rows.
Time taken was calculated by enabling mariadb's slow query log.
Create table query
I would like to understand what is wrong with the table or the query that is slowing down the select query so considerably.
Your row is relatively large (around 300 bytes, depending on the content of your varchar columns). Using the primary key means (for InnoDB) that MySQL will read the whole row. Assuming the estimate of 400k rows is right (which it probably isn't, but you can check by removing the and country_code = 1506 condition from your query to get a better count), MySQL may end up reading more than 100 MB from disk, which can reasonably take several seconds.
Adding a proper index should fix this. In your case I would suggest (country_code, lcr_run_id, tier_type) (which would, with your primary key, actually be the same as just (country_code)).
If most of your queries have that form (i.e. they use at least these three columns for lookup), you could think about changing the order of your primary key to start with those three columns, which should give you another speed boost. That operation will take some time, though.
Hash partitioning is useless for performance, get rid of it. Ditto for subpartitioning.
Specifying which partition to use defeats the purpose of letting the Optimizer do it for you.
You simply need INDEX(tier_type, lcr_run_id, country_code) with the columns in any desired order.
Plan A: Have the PRIMARY KEY start with those 3 columns (again, the order is not important).
Plan B: Have a "secondary" index with those 3 columns, but not being the same as the start of the PK. (This index could have more columns on the end; let's see some more queries to advise further.)
Either way, it will scan only 310 rows if you also get rid of all partitioning. (Hence, resolving your "returned only 310 rows but it had to scan 382546 rows". Anyway, the '382546' may have been a poor estimate by Explain.)
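Plan B could look roughly like the sketch below. The table name lcr_table and the index name idx_tier_run_country are hypothetical, since the CREATE TABLE statement is not shown here; only the three column names come from the discussion above.

```sql
-- Hypothetical table name; substitute the real one.
-- Drops all partitioning and subpartitioning (this rebuilds the table).
ALTER TABLE lcr_table REMOVE PARTITIONING;

-- Plan B: a secondary index on the three columns tested with "=".
-- The index name is made up for illustration.
ALTER TABLE lcr_table
  ADD INDEX idx_tier_run_country (tier_type, lcr_run_id, country_code);
```

After that, rerun EXPLAIN on the original SELECT (without naming a partition) and check whether the rows estimate drops.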
The important issue here is that indexing works with the leftmost columns in the INDEX. (The PK is an index.) Your SELECT had a match on the first 2 columns, but country_code came later in the list, and the intervening columns were not tested with =.
The three 35M values make me wonder if the PK is over-specified. For example, if a "zone" is comprised of several "countries", then "zone" is irrelevant in specifying the PK.
The table has only 382K rows, but it is much fatter than it needs to be. Partitioning has a lot of overhead. Also, most columns have (I think) much bigger datatypes than needed. BIGINT takes 8 bytes; INT takes 4 bytes. For example, if there are only a small number of "zones", use TINYINT UNSIGNED, which takes only 1 byte (and allows values 0..255). (See also other 'int' variants.)
Oops, I missed something else. Since zone is not in the WHERE, it can't even get past the primary partitioning.
I have a table with an index on an int column.
Create table sample(
col1 varchar(50), -- length assumed; MySQL requires one for varchar
col2 int);
Create index idx1 on sample(col2);
When I explain the following query
Select * from sample where col2>2;
It does a full table scan.
Why doesn't the indexing work here?
How can I optimize such queries when the table has around 20 million records?
Just because you create an index does not mean MySQL will always use it. According to the docs, here are several reasons why it may choose a full table scan over the index:
The table is so small that it is faster to perform a table scan than to bother with a key lookup. This is common for tables with fewer than 10 rows and a short row length.
There are no usable restrictions in the ON or WHERE clause for indexed columns.
You are comparing indexed columns with constant values and MySQL has calculated (based on the index tree) that the constants cover too large a part of the table and that a table scan would be faster. See Section 8.2.1.1, “WHERE Clause Optimization”.
You are using a key with low cardinality (many rows match the key value) through another column. In this case, MySQL assumes that by using the key it probably will do many key lookups and that a table scan would be faster.
You can use FORCE INDEX to ensure your query uses the index instead of allowing the optimizer to determine the appropriate path, although usually MySQL will take the most efficient approach.
SELECT * FROM t1, t2 FORCE INDEX (index_for_column) WHERE t1.col_name=t2.col_name;
Reference: https://dev.mysql.com/doc/refman/8.0/en/table-scan-avoidance.html
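As a rough way to check which of these reasons applies, you could compare a few plans for the sample table from the question. This is only a sketch; the plans you see depend on how many of the 20 million rows actually have col2 > 2.

```sql
-- Refresh index statistics so the optimizer's estimates are current.
ANALYZE TABLE sample;

-- The optimizer's own choice.
EXPLAIN SELECT * FROM sample WHERE col2 > 2;

-- The same query with the index forced, for comparison.
EXPLAIN SELECT * FROM sample FORCE INDEX (idx1) WHERE col2 > 2;

-- Selecting only the indexed column lets idx1 answer the query by itself.
EXPLAIN SELECT col2 FROM sample WHERE col2 > 2;
```

If most rows satisfy col2 > 2, a full scan is probably the cheaper plan and forcing the index may make the query slower, so time both variants before keeping the hint.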
I am working on a database table with a large number of rows (6 million+).
This table has a composite primary key on two columns.
It also has separate index on each of those fields as there are queries that require this. Obviously, one of those indexes (indices?) is redundant and slowing down performance for write operations.
How do I find out which one is redundant? I understand the first column of a primary key is already indexed and need not be indexed separately. Is that correct? If so, is there a query I can run to find out which is the first one in the list?
SHOW INDEXES FROM tablename will include a Seq_in_index column, which tells you which column is first (i.e. leftmost), second, and so on within each index.
Therefore, whichever column of the PRIMARY index is listed with a value of 1 for Seq_in_index is the column that does not need its own single-column index.
You can also use SHOW CREATE TABLE tablename to see the index listed from left to right, and that order displayed correctly represents the order of columns in the index.
SHOW CREATE TABLE tablename gives you all the indexes, in their established order.
You don't need INDEX(a) because the column(s) in it are the first column(s) in INDEX(a,b).
That applies to INDEX / UNIQUE / PRIMARY KEY in (a,b).
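The same information is available from information_schema, which can be easier to sort and filter. A small sketch follows; tablename is the placeholder used above and idx_a is a hypothetical name for the redundant single-column index.

```sql
-- List every index on the table with its columns in index order.
SELECT INDEX_NAME, SEQ_IN_INDEX, COLUMN_NAME
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'tablename'
ORDER BY INDEX_NAME, SEQ_IN_INDEX;

-- If a single-column index covers the same column that has
-- SEQ_IN_INDEX = 1 in PRIMARY, it is redundant and can be dropped.
-- idx_a is a made-up index name.
ALTER TABLE tablename DROP INDEX idx_a;
```

Keep the single-column index on the second column of the primary key, since the primary key cannot serve lookups on that column alone.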
I understand the first column of a primary key is already indexed
Erm, no. All the columns in the primary key are indexed.
An explanation of how indexes work is stretching the scope of a post here, and the question of which indexes to put on your table is way too broad.
Suppose you have a primary key defined on attributes a,b,c. This index can be used for queries with predicates
a
a and b
a and b and c
But (at least, the last time I checked) it would not be used for a query with predicates
b
b and c
The optimizer will usually use only one index for each table in a query (the index merge optimization is the main exception).
The right indexes depend on the volume of data, the cardinality of the data, and the frequency and combination of predicates in your queries. There are execution and storage overheads when you start adding indexes. Even for select operations alone, badly designed indexes can make your query slower than it would run without indexes.
I have a table with 150k rows of data, and one column has a UNIQUE INDEX. It is of type VARCHAR(10) and stores 10-digit account numbers.
Now whenever I query, like a simple one:
SELECT * FROM table WHERE account_number LIKE '0103%'
It returns 30,000+ rows, and when I run EXPLAIN on my query it shows that no index is used.
But when I do:
SELECT * FROM table WHERE account_number LIKE '0104%'
It returns 4,000+ rows, with the index used.
Can anyone explain this?
I'm using MySQL 5.7 Percona XtraDB.
30k+ out of 150k rows is more than 20% of the table, so I guess the optimizer decides a table scan is faster. From 8.2.1.19 Avoiding Full Table Scans:
The output from EXPLAIN shows ALL in the type column when MySQL uses a full table scan to resolve a query. This usually happens under the following conditions:
You are using a key with low cardinality (many rows match the key value) through another column. In this case, MySQL assumes that by using the key it probably will do many key lookups and that a table scan would be faster.
If you don't need all values try to use:
SELECT account_number FROM table WHERE account_number LIKE '0103%'
instead of SELECT *. Then your index becomes a covering index, and the optimizer should always use it (as long as the WHERE condition is SARGable).
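To see the difference, you could compare the plans side by side. In this sketch accounts is a hypothetical stand-in for the real table name (TABLE itself is a reserved word in MySQL, so the queries above will not run literally as written).

```sql
-- Needs every column, so each index match costs an extra row lookup.
EXPLAIN SELECT * FROM accounts WHERE account_number LIKE '0103%';

-- Needs only the indexed column; look for "Using index" in Extra.
EXPLAIN SELECT account_number FROM accounts
WHERE account_number LIKE '0103%';

-- A leading wildcard is not SARGable: no range lookup on the index
-- is possible, so at best the whole index is scanned.
EXPLAIN SELECT account_number FROM accounts
WHERE account_number LIKE '%0103';
```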
Most databases use B-trees for indexing. In this case the database optimizer doesn't use the index because it is faster to scan without it, as lad2025 explained.
Your column is unique, so I think the cardinality of your index is high. But since your query uses a LIKE filter, the optimizer decides not to use the index.
You can try FORCE INDEX to see the result. You are using a varchar column with a unique index. I would choose another data type or change your index type. If the column only contains numbers, change it to a numeric type. This will help optimize your query a lot.
In some cases where you have to use LIKE, you can use a full-text index.
If you need help optimizing your query and table, give us more information, including what you want to fetch from your table.
lad2025 is correct. The database is attempting to make an intelligent optimization.
Benchmark with:
SELECT * FROM table FORCE INDEX(table_index) WHERE account_number LIKE '0103%'
and see who is smarter :-) You can always try your hand at questioning the optimizer. That's what index hints are for...
https://dev.mysql.com/doc/refman/5.7/en/index-hints.html
I'm trying to understand if it's possible to use an index on a join if there is no limiting where on the first table.
Note: this is not a line-by-line real-world case, just something I drafted for the purpose of understanding. Please don't point out the obvious "what are you trying to obtain with this schema?", "you should use UNSIGNED" or the like, because that's not the question.
Note 2: the question "MySQL JOINS without where clause" is somewhat related but not the same.
Schema:
CREATE TABLE posts (
id_post INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
text VARCHAR(100)
);
CREATE TABLE related (
id_relation INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
id_post1 INT NOT NULL,
id_post2 INT NOT NULL
);
CREATE INDEX related_join_index ON related(id_post1) using BTREE;
Query:
EXPLAIN SELECT * FROM posts FORCE INDEX FOR JOIN(PRIMARY) INNER JOIN related ON id_post=id_post1 LIMIT 0,10;
SQL Fiddle: http://sqlfiddle.com/#!2/84597/3
As you can see, the index is being used on the second table, but the engine is doing a full table scan on the first one (the FORCE INDEX is there just to highlight the general question).
I'd like to understand if it's possible to get a "ref" on the left side too.
Thanks!
Update: if the first table has significantly more records than the second, things swap: the engine uses an index for the first one and a full table scan for the second http://sqlfiddle.com/#!2/3a3bb/1 Still, there is no way to get indexes used on both.
The DBMS has an optimizer to figure out the best plan to execute a query. It's up to the optimizer to decide whether to use an index or simply read the table directly.
An index makes sense when the DBMS expects to read only a few records from a table (say only 1% of all rows). But once it expects to read many records (say 99% of all rows) it will not use the index. The threshold may lie as low as 5% (i.e. <= 5% -> index; > 5% -> table scan).
There are exceptions. One is when an index holds all the columns needed. Then the table itself doesn't have to be read at all. Another may be when the optimizer thinks index access will be faster in spite of having to read many rows. It's also always possible that the optimizer simply guesses wrong.
There is a page on the MySQL documentation about this subject.
Regarding the possibility to get a ref on the first table from the query, the short answer is NO.
The reason is obvious: because there is no WHERE clause ALL the rows from table posts are analyzed because they could be included in the result set. There is no reason to use an index for that, a full table scan is better because it gets all the rows; and because the order doesn't matter, the access is (more or less) sequential. Using an index requires reading more information from the storage (index and data).
MySQL will use the join type index if all the columns that appear in the SELECT clause are present in an index. In this case MySQL will perform a full index scan (join type index) instead of a full table scan (join type ALL) because it requires reading less information from the storage (an index is usually smaller than the entire table data).
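A small sketch of that last point, using the posts and related tables from the question. Selecting only indexed columns gives the optimizer the option of index scans; which table it reads first and which join types it reports depend on table sizes and the MySQL version.

```sql
-- Only indexed columns are selected: id_post is the primary key of posts
-- and id_post1 is covered by related_join_index.
EXPLAIN SELECT posts.id_post, related.id_post1
FROM posts
INNER JOIN related ON posts.id_post = related.id_post1
LIMIT 0,10;

-- For comparison: selecting everything forces reads of the full rows.
EXPLAIN SELECT *
FROM posts
INNER JOIN related ON posts.id_post = related.id_post1
LIMIT 0,10;
```

Note that on InnoDB the primary key is the clustered index, so an index scan over PRIMARY reads about as much data as a table scan. The saving mainly shows up on secondary indexes such as related_join_index.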