I have a client running a php photo gallery (on php 5.5, mysql 5.5, using myisam tables) that uses the directory tree method. Unfortunately, some of the queries in their gallery application is demanding horribly long filesorts. The offending query:
SELECT `name`, `slug`
FROM `db_table`
WHERE `left_ptr` <= '914731'
AND `right_ptr` >= '914734'
AND `id` <> 1
ORDER BY `left_ptr` ASC
There are indexes on id, left_ptr and right_ptr, but according to the EXPLAIN, none of them are being used in the query.
I heard that creating a composite index (on the 'condition' columns) would make things faster, but does that apply to this case? The last condition is really but an 'anything but 1' clause, so would a composite index apply to that, too? Thanks for any insight into this.
Yes, a composite index on (left_ptr, right_ptr) should make this query run better.
MySQL will only use one index per query. It's likely not using any single index because it's determined no single index would be much faster than a full table scan. For example, id <> 1 is every row but the first, so just do a full table scan. The other two filters depend on how the data is distributed, but if it doesn't filter a significant portion of the table it won't use an index.
A composite index on (left_ptr, right_ptr) should make this query run better. Don't bother with id, as above id <> 1 only filters one row.
MySQL can use the first column of a composite index alone, so this composite index also replaces the one on left_ptr alone
Related
I want a query that does a fulltext search on one field and then a sort on a different field (imagine searching some text document and order by publication date). The table has about 17M rows and they are more or less uniformly distributed in dates. This is to be used in a webapp request/response cycle, so the query has to finish in at most 200ms.
Schematically:
SELECT * FROM table WHERE MATCH(text) AGAINST('query') ORDER BY date=my_date DESC LIMIT 10;
One possibility is having a fulltext index on the text field and a btree on the publication date:
ALTER TABLE table ADD FULLTEXT index_name(text);
CREATE INDEX index_name ON table (date);
This doesn't work very well in my case. What happens is that MySQL evaluates two execution paths. One is using the fulltext index to find the relevant rows, and once they are selected use a FILESORT to sort those rows. The second is using the BTREE index to sort the entire table and then look for matches using a FULL TABLE SCAN. They're both bad. In my case MySQL chooses the former. The problem is that the first step can select some 30k results which it then has to sort, which means the entire query might take of the order 10 seconds.
So I was thinking: do composite indexes of FULLTEXT+BTREE exist? If you know how a FULLTEXT index works, it first tokenizes the column you're indexing and then builds an index for the tokens. It seems reasonable to me to imagine a composite index such that the second index is a BTREE in dates for each token. Does this exist in MySQL and if so what's the syntax?
BONUS QUESTION: If it doesn't exist in MySQL, would PostgreSQL perform better in this situation?
Use IN BOOLEAN MODE.
The date index is not useful. There is no way to combine the two indexes.
Beware, if a user searches for something that shows up in 30K rows, the query will be slow. There is no straightforward away around it.
I suspect you have a TEXT column in the table? If so, there is hope. Instead of blindly doing SELECT *, let's first find the ids and get the LIMIT applied, then do the *.
SELECT a.*
FROM tbl AS a
JOIN ( SELECT date, id
FROM tbl
WHERE MATCH(...) AGAINST (...)
ORDER BY date DESC
LIMIT 10 ) AS x
USING(date, id)
ORDER BY date DESC;
Together with
PRIMARY KEY(date, id),
INDEX(id),
FULLTEXT(...)
This formulation and indexing should work like this:
Use FULLTEXT to find 30K rows, deliver the PK.
With the PK, sort 30K rows by date.
Pick the last 10, delivering date, id
Reach back into the table 10 times using the PK.
Sort again. (Yeah, this is necessary.)
More (Responding to a plethora of Comments):
The goal behind my reformulation is to avoid fetching all columns of 30K rows. Instead, it fetches only the PRIMARY KEY, then whittles that down to 10, then fetches * only 10 rows. Much less stuff shoveled around.
Concerning COUNT on an InnoDB table:
INDEX(col) makes it so that an index scan works for SELECT COUNT(*) or SELECT COUNT(col) without a WHERE.
Without INDEX(col),SELECT COUNT(*)will use the "smallest" index; butSELECT COUNT(col)` will need a table scan.
A table scan is usually slower than an index scan.
Be careful of timing -- It is significantly affected by whether the index and/or table is already cached in RAM.
Another thing about FULLTEXT is the + in front of words -- to say that each word must exist, else there is no match. This may cut down on the 30K.
The FULLTEXT index will deliver the date, id is random order, not PK order. Anyway, it is 'wrong' to assume any ordering, hence it is 'right' to add ORDER BY, then let the Optimizer toss it if it knows that it is redundant. And sometimes the Optimizer can take advantage of the ORDER BY (not in your case).
Removing just the ORDER BY, in many cases, makes a query run much faster. This is because it avoids fetching, say, 30K rows and sorting them. Instead it simply delivers "any" 10 rows.
(I have not experience with Postgres, so I cannot address that question.)
I have a table with 150k rows of data, and I have column with a UNIQUE INDEX, It has a type of VARCHAR(10) and stores 10 digit account numbers.
Now whenever I query, like a simple one:
SELECT * FROM table WHERE account_number LIKE '0103%'
It results 30,000+ ROWS, and when I run a EXPLAIN on my query It shows no INDEX is used.
But when I do:
SELECT * FROM table WHERE account_number LIKE '0104%'
It results 4,000+ ROWS, with the INDEX used.
Anyone can explain this?
I'm using MySQL 5.7 Percona XtraDB.
30k+/150k > 20% and I guess it is faster to do table scan. From 8.2.1.19 Avoiding Full Table Scans:
The output from EXPLAIN shows ALL in the type column when MySQL uses a full table scan to resolve a query. This usually happens under the following conditions:
You are using a key with low cardinality (many rows match the key value) through another column. In this case, MySQL assumes that by using the key it probably will do many key lookups and that a table scan would be faster.
If you don't need all values try to use:
SELECT account_number FROM table WHERE account_number LIKE '0103%'
instead of SELECT *. Then your index will become covering index and optimizer should always use it (as long as WHERE condition is SARGable).
The most database uses B tree for indexing. In this case the database optimizer don't use the index because its faster to scan without index. Like #lad2025 explained.
Your database column is unique and i think your cardinality of your index is high. But since your query using the like filter the database optimizer decides for you to choose not to use the index.
You can use try force index to see the result. Your using varchar with unique index. I would choose another data type or change your index type. If your table only contains numbers change it to numbers. This will help to optimize you query a lot.
In some cases when you have to use like you can use full text index.
If you need help with optimizing your query and table. Provide us more info and which info you want to fetch from your table.
lad2025 is correct. The database is attempting to make an intelligent optimization.
Benchmark with:
SELECT * FROM table FORCE INDEX(table_index) WHERE account_number LIKE '0103%'
and see who is smarter :-) You can always try your hand at questioning the optimizer. That's what index hints are for...
https://dev.mysql.com/doc/refman/5.7/en/index-hints.html
I've just heard the term covered index in some database discussion - what does it mean?
A covering index is an index that contains all of, and possibly more, the columns you need for your query.
For instance, this:
SELECT *
FROM tablename
WHERE criteria
will typically use indexes to speed up the resolution of which rows to retrieve using criteria, but then it will go to the full table to retrieve the rows.
However, if the index contained the columns column1, column2 and column3, then this sql:
SELECT column1, column2
FROM tablename
WHERE criteria
and, provided that particular index could be used to speed up the resolution of which rows to retrieve, the index already contains the values of the columns you're interested in, so it won't have to go to the table to retrieve the rows, but can produce the results directly from the index.
This can also be used if you see that a typical query uses 1-2 columns to resolve which rows, and then typically adds another 1-2 columns, it could be beneficial to append those extra columns (if they're the same all over) to the index, so that the query processor can get everything from the index itself.
Here's an article: Index Covering Boosts SQL Server Query Performance on the subject.
Covering index is just an ordinary index. It's called "covering" if it can satisfy query without necessity to analyze data.
example:
CREATE TABLE MyTable
(
ID INT IDENTITY PRIMARY KEY,
Foo INT
)
CREATE NONCLUSTERED INDEX index1 ON MyTable(ID, Foo)
SELECT ID, Foo FROM MyTable -- All requested data are covered by index
This is one of the fastest methods to retrieve data from SQL server.
Covering indexes are indexes which "cover" all columns needed from a specific table, removing the need to access the physical table at all for a given query/ operation.
Since the index contains the desired columns (or a superset of them), table access can be replaced with an index lookup or scan -- which is generally much faster.
Columns to cover:
parameterized or static conditions; columns restricted by a parameterized or constant condition.
join columns; columns dynamically used for joining
selected columns; to answer selected values.
While covering indexes can often provide good benefit for retrieval, they do add somewhat to insert/ update overhead; due to the need to write extra or larger index rows on every update.
Covering indexes for Joined Queries
Covering indexes are probably most valuable as a performance technique for joined queries. This is because joined queries are more costly & more likely then single-table retrievals to suffer high cost performance problems.
in a joined query, covering indexes should be considered per-table.
each 'covering index' removes a physical table access from the plan & replaces it with index-only access.
investigate the plan costs & experiment with which tables are most worthwhile to replace by a covering index.
by this means, the multiplicative cost of large join plans can be significantly reduced.
For example:
select oi.title, c.name, c.address
from porderitem poi
join porder po on po.id = poi.fk_order
join customer c on c.id = po.fk_customer
where po.orderdate > ? and po.status = 'SHIPPING';
create index porder_custitem on porder (orderdate, id, status, fk_customer);
See:
http://literatejava.com/sql/covering-indexes-query-optimization/
Lets say you have a simple table with the below columns, you have only indexed Id here:
Id (Int), Telephone_Number (Int), Name (VARCHAR), Address (VARCHAR)
Imagine you have to run the below query and check whether its using index, and whether performing efficiently without I/O calls or not. Remember, you have only created an index on Id.
SELECT Id FROM mytable WHERE Telephone_Number = '55442233';
When you check for performance on this query you will be dissappointed, since Telephone_Number is not indexed this needs to fetch rows from table using I/O calls. So, this is not a covering indexed since there is some column in query which is not indexed, which leads to frequent I/O calls.
To make it a covered index you need to create a composite index on (Id, Telephone_Number).
For more details, please refer to this blog:
https://www.percona.com/blog/2006/11/23/covering-index-and-prefix-indexes/
I have a table with two partitions. Partitions are pactive = 1 and pinactive = 0. I understand that two partitions does not make so much of a gain, but I have used it to truncate and load in one partition and plain inserts in another partition.
The problem comes when I create indexes.
Query goes this way
select partitionflag,companyid,activityname
from customformattributes
where companyid=47
and activityname = 'Activity 1'
and partitionflag=0
Created index -
create index idx_try on customformattributes(partitionflag,companyid,activityname,completiondate,attributename,isclosed)
there are around 200000 records that will be retreived from the above query. But the query along with the mentioned index takes 30+ seconds. What is the reason for such a long time? Also, if remove the partitionflag from the mentioned index, the index is not even used.
And is the understanding that,
Even with the partitions available, the optimizer needs to have the required partition mentioned in the index definition, so that it only hits the required partition ---- Correct?
Any ideas on understanding this would be very helpful
You can optimize your index by reordering the columns in it. Usually the columns in the index are ordered by its cardinality (starting from the highest and go down to the lowest). Cardinality is the uniqueness of data in the given column. So in your case I suppose there are many variations of companyid in customformattributes table while partitionflag will have cardinality of 2 (if all the options for this column are 1 and 0).
Your query will first filter all the rows with partitionflag=0, then it will filter by company id and so on.
When you remove partitionflag from the index the query did not used the index because may be the optimizer decides that it will be faster to make full table scan instead of using the index (in most of the cases the optimizer is right)
For the given query:
select partitionflag,companyid,activityname
from customformattributes
where companyid=47
and activityname = 'Activity 1'
and partitionflag=0
the following index may be would be better (but of course :
create index idx_try on customformattributes(companyid,activityname, completiondate,attributename, partitionflag, isclosed)
For the query to use index the following rule must be met - the left most column in the index should be present in the where clause ... and depending on the mysql version you are using additional query requirements may be needed. For example if you are using old version of mysql - you may need to order the columns in the where clause in the same order they are listed in the index. In the last versions of mysql the query optimizer is responsible for ordering the columns in the where clause in the correct order.
Your SELECT query took 30+ seconds because it returns 200k rows and because the index might not be the optimal for the given query.
For the second question about the partitioning: the common rule is that the column you are partitioning by must be part of all the UNIQUE keys in a table (Primary key is also unique key by definition so the column should be added to the PK also). If table structure and logic allows you to add the partitioning column to all the UNIQUE indexes in the table then you add it and partition the table.
When the partitioning is made correctly you can take the advantage of partitioning pruning - this is when SELECT query searches the data only in the partitions where given data is stored (otherwise it looks in all partitions)
You can read more about partitioning here:
https://dev.mysql.com/doc/refman/5.6/en/partitioning-overview.html
The query is slow simply because disks are slow.
Cardinality is not important when designing an index.
The optimal index for that query is
INDEX(companyid, activityname, partitionflag) -- in any order
It is "covering" since it includes all the columns mentioned anywhere in the SELECT. This is indicated by "Using index" in the EXPLAIN.
Leaving off the other 3 columns makes the query faster because it will have to read less off the disk.
If you make any changes to the query (add columns, change from '=' to '>', add ORDER BY, etc), then the index may no longer be optimal.
"Also, if remove the partitionflag from the mentioned index, the index is not even used." -- That is because it was no longer "covering".
Keep in mind that there are two ways an index may be used -- "covering" versus being a way to look up the data. When you don't have a "covering" index, the optimizer chooses between using the index and bouncing between the index and the data versus simply ignoring the index and scanning the table.
Making my concept more clear about indexes in Mysql. What i know is indexes are used to make your query faster. except that i have couple of questions to know about.
Let's say i'm having a query:
SELECT books.name, books.name2, books.id, books.image, books.faith, books.topic, books.downloaded, books.viewed, books.language, books.size, books.author as author_id, authors.name as author_name, authors.aid from books LEFT JOIN authors ON books.author = authors.aid WHERE books.id = '".$id."' AND status = 1
Is any index applicable for this select query while it has JOIN?
After making an index on a column query for that will be optimized
or changed ?
How to make index for this query and against which column?
What's the other benefits or disadvantages of using indexes?
On which case indexes should be avoided and where should use indexes
for more?
Do indexes applicable on random queries ?
Are indexes more efficient on IDS ?
Please apprise, thank you in advanced !
You can check details on below links.
https://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
http://mydbsolutions.in/query-optimization-2/
Your queries are here-
Is any index applicable for this select query while it has JOIN?
It will depend on various factors like if your table have very less data or approx. 70% data is same in index column then mysql will prefer to scan table instead of index. In simple all your join columns should be indexed (will be indexed if you use foreign key concept other wise you should indexed them). Also on which column your query is filtering most data that field should be indexed. In your case you are filtering data on books.id which should be primary key so already indexed.
After making an index on a column query for that will be optimized or changed?
It will be automatically start to use index but in some cases may be you need to change your query. Suppose you are using a filter condition as "date(order_date)='2015-10-15'", even after creating index on order_date it will not be used so you have to change your query as "order_date>='2015-10-15 00:00:00' and order_date<='2015-10-15 23:59:59'" if you order_date column data type is datetime or timestamp.
How to make index for this query and against which column?
Here I am not seeing any need of making index as your condition is on books table primary key and it will be already indexed.
What's the other benefits or disadvantages of using indexes?
If you create index blindly then at the time of record insertion/updation etc each time index will be updated and will slow the process. Even heavy index will perform slow. Also will consume more disk space.
On which case indexes should be avoided and where should use indexes for more?
If more than 70% data is same for any column then no need to create index on them like status or is_deleted type columns as mostly data will be active.
Do indexes applicable on random queries ?
Yes index work on random queries, for repeatable queries you can use query cache which will be more efficient.
Are indexes more efficient on IDS ?
Yes.