Optimizing query with fts + composite index - mysql

I have the following query:
SELECT *
FROM table
WHERE
structural_type=1
AND parent_id='167F2-F'
AND points_to_id=''
# AND match(search) against ('donotmatch124213123123')
The query takes about 10ms to run using the composite index (structural_type, parent_id, points_to_id). However, when I add the fts condition, the query balloons to ~1s, regardless of what the match criteria contain. Basically, it seems to 'skip the composite index' whenever an fts search is applied.
What would be the best way to optimize this query?
Update: a few explains:
EXPLAIN SELECT... # without fts
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE table NULL ref structural_type structural_type 209 const,const,const 2 100.00 NULL
With fts (also adding 'force index'):
explain SELECT ... force INDEX (structural_type) AND match...
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE table NULL fulltext structural_type,search search 0 const 1 5.00 Using where; Ft_hints: sorted
The only thing I can think of, which would be incredibly hack-ish, is to add an additional term to the fts so it does the filter 'within' that. For example:
fts_term += " StructuralType1ParentID167F2FPointsToID"
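One way the 'extra term' hack might look in SQL is sketched below. It assumes the table is called my_table (a hypothetical name, since `table` in the question is a placeholder), that the synthetic token is stored in the search column itself, and that the FULLTEXT index is queried in boolean mode. None of this is confirmed by the question, so treat it as an illustration only.

```sql
-- Hypothetical sketch: my_table stands in for the real table name.
-- Append a synthetic token encoding the three filter columns to the
-- full-text column, so the FULLTEXT index itself can filter on it.
-- Run once; the token must be kept in sync when the columns change.
-- Note: CONCAT returns NULL if search is NULL.
UPDATE my_table
SET search = CONCAT(
    search, ' ',
    'StructuralType', structural_type,
    'ParentID', REPLACE(parent_id, '-', ''),
    'PointsToID', points_to_id
);

-- Require both the user's term and the synthetic token.
SELECT *
FROM my_table
WHERE MATCH(search) AGAINST (
    '+donotmatch124213123123 +StructuralType1ParentID167F2FPointsToID'
    IN BOOLEAN MODE
);
```

The trade-off is that the filter values are duplicated inside the text column, so every write path has to maintain the token.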

The MySQL optimizer can generally use only one index per table for your WHERE clause (index merge does not apply to FULLTEXT indexes), so it has to choose between the composite one and the FULLTEXT one.
Since it can't run both plans to benchmark which one is faster, it estimates how fast each execution plan will be.
To do so, MySQL uses internal statistics it keeps about each table. Those statistics can differ substantially from reality if the data in the table has changed and they haven't been updated.
Running an OPTIMIZE TABLE query on the table lets MySQL refresh its table statistics, so it can produce better estimates and choose the better index.
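As a rough sketch of that workflow: ANALYZE TABLE is a lighter statement that refreshes the key distribution statistics without rebuilding the table, after which the plan can be checked again with EXPLAIN. The name my_table is a hypothetical stand-in for the table in the question, and whether the plan actually changes depends on the data.

```sql
-- my_table is a hypothetical stand-in for the table in the question.
-- Refresh the index statistics the optimizer relies on.
ANALYZE TABLE my_table;

-- Re-check which index the optimizer now picks.
EXPLAIN
SELECT *
FROM my_table
WHERE structural_type = 1
  AND parent_id = '167F2-F'
  AND points_to_id = ''
  AND MATCH(search) AGAINST ('donotmatch124213123123');
```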

Try expressing this without the full text logic, using like:
SELECT *
FROM table
WHERE structural_type = 1 AND
parent_id ='167F2-F' AND
points_to_id = '' AND
search not like '%donotmatch124213123123%';
The index should still be used for the first three columns. LIKE might be slow, but if not many rows match the first three, this might not be as bad as using the full text index.

Related

MySQL (MyISAM) SELECT query takes too long with join

I have a pretty long insert query that inserts data from a select query into a table. The problem is that the select query takes too long to execute. The table is MyISAM, and the select locks the table, which affects other users who also use the table. I have found that the problem with the query is a join.
When I remove this part of the query, it takes less than a second to execute, but when I leave this part in, the query takes more than 15 minutes:
LEFT JOIN enq_217 Pex_217
ON e.survey_panelId = Pex_217.survey_panelId
AND e.survey_respondentId = Pex_217.survey_respondentId
AND Pex_217.survey_respondentId != 0
db.table_1 contains 5,90,145 rows and e contains 4,703 rows.
Explain Output:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY e ALL survey_endTime,survey_type NULL NULL NULL 4703 Using where
1 PRIMARY Pex_217 ref survey_respondentId,idx_table_1 idx_table_1 8 e.survey_panelId,e.survey_respondentId 2 Using index
2 DEPENDENT SUBQUERY enq_11525_timing eq_ref code code 80 e.code 1
How can I edit this part of the query to be faster?
I suggest creating an index on the table db.table_1 for the fields panelId and respondentId.
You want an index on the table. The best index for this logic is:
create index idx_table_1 on table_1(panelId, respondentId)
The order of these two columns in the index should not matter.
You might want to include other columns in the index, depending on what the rest of the query is doing.
Note: a single index with both columns is different from two indexes with each column.
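To make the last note concrete, here is a sketch of the two alternatives using the table and column names from the question's join (enq_217, survey_panelId, survey_respondentId). The index names are hypothetical, and in practice you would create one variant or the other, not both.

```sql
-- Variant A: one composite index. A single index lookup can satisfy
-- both equality conditions of the join at once.
CREATE INDEX idx_panel_respondent
    ON enq_217 (survey_panelId, survey_respondentId);

-- Variant B: two single-column indexes. The optimizer typically picks
-- one of them and filters the remaining condition row by row
-- (or attempts an index merge), which is usually less efficient.
CREATE INDEX idx_panel      ON enq_217 (survey_panelId);
CREATE INDEX idx_respondent ON enq_217 (survey_respondentId);
```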
Why is it a LEFT join?
How many rows in Pex_217?
Run ANALYZE TABLE on each table used. (This sometimes helps MyISAM; rarely is needed for InnoDB.)
Since the 'real problem' seems to be that the query "holds up other users", switch to InnoDB.
Tips on conversion
The JOIN is not that bad (with the new index -- note Using index): 4703 rows scanned, then reach into the other table's index about 2 times each.
Perhaps the "Dependent subquery" is the costly part. Let's see that.
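The two maintenance suggestions above could be sketched as follows. Only enq_217 is named in the question, so the same statements would need to be repeated for the other tables involved; converting a large table rebuilds it and can take a while, so this is best tried on a copy first.

```sql
-- Refresh index statistics (repeat for each table used in the query).
ANALYZE TABLE enq_217;

-- Convert from MyISAM to InnoDB so that a long-running SELECT no
-- longer blocks other users with a table-level lock.
ALTER TABLE enq_217 ENGINE=InnoDB;
```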

InnoDB has index problems when using COUNT() + WHERE

Recently, we switched from MyISAM to InnoDB. I tested the whole application and there are generally no problems except for one thing - using COUNT(*) in combination with 2 or more WHERE conditions.
So, here's the problem. The query below takes half a second which is not acceptable. After all InnoDB shouldn't be slower than MyISAM when using COUNT() + WHERE, but that's exactly what is happening here.
Both project_id and status_id are indexed columns. The table has 350K records.
SELECT COUNT(*) FROM respondents WHERE project_id='366' AND status_id='42'
And here is what EXPLAIN says:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE respondents index_merge project_id,status_id project_id,status_id 4,1 NULL 8343 Using intersect(project_id,status_id); Using where...
When I use only one condition after WHERE (either project_id='366' or status_id='42'), it works fine.
I'm thinking, this whole intersecting thing could be the root of the problem. But then what can I do about it? What do you think?
The index merge can be fixed by a compound index:
ALTER TABLE respondents ADD KEY(project_id,status_id)
This assumes the data distribution is not very skewed, so that the index is useful (i.e. project_id='366' AND status_id='42' does not match more than about 50% of the rows).
Also make sure that your column types match the search values. Are project_id and status_id really VARCHAR? If not, remove the quotes.
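Putting both suggestions together, a sketch might look like this. The index name idx_project_status is hypothetical, and the unquoted literals assume both columns are numeric (the key_len values of 4 and 1 in the EXPLAIN output hint at that, but the question does not confirm it).

```sql
-- Compound index covering both WHERE conditions.
ALTER TABLE respondents
    ADD KEY idx_project_status (project_id, status_id);

-- Numeric literals, assuming numeric column types.
-- With the compound index in place, the plan should no longer need
-- to intersect two separate indexes.
EXPLAIN
SELECT COUNT(*)
FROM respondents
WHERE project_id = 366
  AND status_id = 42;
```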

What's the difference between "Using index" and "Using where; Using index" in the EXPLAIN

In the extra field of the explain in mysql you can get:
Using index
Using where; Using index
What's the difference between the two?
To explain my question better I'm going to use the following table:
CREATE TABLE `test` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`another_field` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8;
INSERT INTO test() VALUES(),(),(),(),();
Which ends up with the content like:
SELECT * FROM `test`;
id another_field
1 0
2 0
3 0
4 0
5 0
In my research I found
Why is this query using where instead of index?
The output of EXPLAIN can sometimes be misleading.
For instance, filesort has nothing to do with files, using where
does not mean you are using a WHERE clause, and using index can
show up on the tables without a single index defined.
Using where just means there is some restricting clause on the table
(WHERE or ON), and not all record will be returned. Note that
LIMIT does not count as a restricting clause (though it can be).
Using index means that all information is returned from the index,
without seeking the records in the table. This is only possible if all
fields required by the query are covered by the index.
Since you are selecting *, this is impossible. Fields other than
category_id, board_id, display and order are not covered by
the index and should be looked up.
and I also found
https://dev.mysql.com/doc/refman/5.1/en/explain-output.html#explain-extra-information
Using index
The column information is retrieved from the table using only
information in the index tree without having to do an additional seek
to read the actual row. This strategy can be used when the query uses
only columns that are part of a single index.
If the Extra column also says Using where, it means the index is being
used to perform lookups of key values. Without Using where, the
optimizer may be reading the index to avoid reading data rows but not
using it for lookups. For example, if the index is a covering index
for the query, the optimizer may scan it without using it for lookups.
For InnoDB tables that have a user-defined clustered index, that index
can be used even when Using index is absent from the Extra column.
This is the case if type is index and key is PRIMARY.
(Look at the second paragraph)
My problem with this:
First: I didn't understand the second paragraph the way it's written.
Second:
The following query returns
EXPLAIN SELECT id FROM test WHERE id = 5;
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE test const PRIMARY PRIMARY 8 const 1 Using index
(Scroll to the right)
And this other query returns:
EXPLAIN SELECT id FROM test WHERE id > 5;
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE test range PRIMARY PRIMARY 8 NULL 1 Using where; Using index
(Scroll to the right)
Other than the fact that one query uses a range search and the other uses a constant search, both queries are using some restricting clause on the table (WHERE or ON), and not all records will be returned.
What does Using where; mean on the second query, and what does its absence from the first query mean?
EXTRA
What is the difference with Using index condition; Using where?
(I'm not adding an example of this because I have not been able to reproduce it in a small self-contained piece of code)
When you see Using index in the Extra column of an EXPLAIN, it means that the (covering) index is sufficient for the query.
In your example, SELECT id FROM test WHERE id = 5;, the server doesn't need to access the actual table, as it can satisfy the query (you only select id) using only the index, as the EXPLAIN says. In case you are not aware, the PK is implemented via a unique index.
When you see Using where; Using index, it means that the index is first used to retrieve the records (no access to the actual table is needed), and then the filtering of the WHERE clause is applied on top of that result set.
In this example, SELECT id FROM test WHERE id > 5;, you still fetch id from the index and then apply the greater-than condition to filter out the records that don't match.

Distinct (or group by) using filesort and temp table

I know there are similar questions on this but I've got a specific query / question around why this query
EXPLAIN SELECT DISTINCT RSubdomain FROM R_Subdomains WHERE EmploymentState IN (0,1) AND RPhone='7853932120'
gives me this EXPLAIN output
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE RSubdomains index NULL RSubdomain 767 NULL 3278 Using where
with an index on RSubdomain,
but if I add a composite index on EmploymentState/RPhone,
I get this output from EXPLAIN
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE RSubdomains range EmploymentState EmploymentState 67 NULL 2 Using where; Using temporary
If I take away the DISTINCT on RSubdomain, Using temporary drops out of the EXPLAIN output. What I don't get is why adding the composite key (while keeping the key on RSubdomain) makes the DISTINCT use a temp table. And which index schema is better here? I see that the number of rows scanned with the composite key is far lower, but the query is of type range and it's also slower.
Q: why ... does the distinct end up using a temp table?
MySQL is doing a range scan on the index (i.e. reading index blocks) to locate the rows that satisfy the predicates (WHERE clause). Then MySQL has to look up the value of the RSubdomain column from the underlying table (it's not available in the index). To eliminate duplicates, MySQL needs to scan the values of RSubdomain that were retrieved. "Using temporary" indicates that MySQL is materializing a result set, which is processed in a subsequent step. (Likely, that's the set of RSubdomain values that was retrieved; given the DISTINCT, it's likely that MySQL is actually creating a temporary table with RSubdomain as a primary or unique key, and only inserting non-duplicate values.)
In the first case, it looks like the rows are being retrieved in order by RSubdomain (likely, that's the first column in the cluster key). That means that MySQL needn't compare all the RSubdomain values against each other; it only needs to check whether the last retrieved value matches the currently retrieved value to determine whether the value can be "skipped."
Q: which index schema is better here?
The optimum index for your query is likely a covering index:
... ON R_Subdomains (RPhone, EmploymentState, RSubdomain)
But with only 3278 rows, you aren't likely to see any performance difference.
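Spelled out as a complete statement, the covering index above might be created like this. The index name is hypothetical, and since the existing RSubdomain key has a key_len of 767, the column is long enough that index key-length limits could apply on some MySQL versions and row formats; that is an assumption to verify, not something the question states.

```sql
-- Hypothetical index name. All three columns the query touches are in
-- the index, so the rows themselves need not be read.
CREATE INDEX idx_phone_state_subdomain
    ON R_Subdomains (RPhone, EmploymentState, RSubdomain);

-- Re-check the plan for the original query.
EXPLAIN
SELECT DISTINCT RSubdomain
FROM R_Subdomains
WHERE EmploymentState IN (0,1)
  AND RPhone = '7853932120';
```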
FOLLOWUP
Unfortunately, MySQL does not provide the type of instrumentation provided in other RDBMS (like the Oracle event 10046 sql trace, which gives actual timings for resources and waits.)
Since MySQL is choosing to use the index when it is available, that is probably the most efficient plan. For the best efficiency, I'd perform an OPTIMIZE TABLE operation (for InnoDB tables and MyISAM tables with dynamic format, if there have been a significant number of DML changes, especially DELETEs and UPDATEs that modify the length of the row...) At the very least, it would ensure that the index statistics are up to date.
You might want to compare the plan of an equivalent statement that does a GROUP BY instead of a DISTINCT, i.e.
SELECT r.RSubdomain
FROM R_Subdomains r
WHERE r.EmploymentState IN (0,1)
AND r.RPhone='7853932120'
GROUP BY r.RSubdomain
For optimum performance, I'd go with a covering index with RPhone as the leading column; that's based on an assumption about the cardinality of the RPhone column (close to unique values), as opposed to only a few different values in the EmploymentState column. That covering index will give the best performance, i.e. the quickest elimination of rows that need to be examined.
But again, with only a couple thousand rows, it's going to be hard to see any performance difference. If the query was examining millions of rows, that's when you'd likely see a difference, and the key to good performance will be limiting the number of rows that need to be inspected.

MySQL not using index when checking = 1 , but using it with = 0

Here is a perplexing issue I am having:
Query:
EXPLAIN SELECT id,hostname FROM queue_servers WHERE live=1
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE queue_servers ALL live NULL NULL NULL 6 Using where
Query:
EXPLAIN SELECT id,hostname FROM queue_servers WHERE live=0
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE queue_servers ref live live 1 const 1
SHOW INDEXES FROM queue_servers
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type
queue_servers 1 live 1 live A 6 NULL NULL BTREE
Any ideas? This is driving me crazy. If I just try selecting a single column like this:
EXPLAIN SELECT id FROM queue_servers WHERE live=1
It works just fine. But if I try to select the column "hostname", or add it to the select column list, it won't use the live index unless I am searching for live=0. Why is this?
Why doesn't MySQL use an index?
MySQL will not use an index if a large percentage of the rows have that value.
Why will adding use index to the query not work here?
Adding a use index clause will have no effect, because use index only suggests which index to use; it does not determine whether an index is used at all.
Caveat when using test tables with few rows
This is especially vexing when using test tables with few rows as MySQL will refuse to use an index, and it's hard to see what's wrong with your query.
So make sure you add enough rows to a test table to make it a realistic test.
Is using an index on low cardinality columns useless?
Indexing boolean columns is not as useful as you might have thought before asking this question.
However, it is not useless either.
With InnoDB, MySQL will try to retrieve data using only the indexes if possible. If your boolean field has an index, the query:
SELECT id, bool_field FROM table_with_many_columns_and_rows WHERE bool_field = 1
can read all the data from the covering index on bool_field, because secondary indexes in InnoDB always include the primary key (id) as well.
This is faster because MySQL does not have to read the entire table into memory.
In MyISAM this doesn't work and MySQL will examine the whole table.
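Applied to the question's table, the same idea could be sketched as below. It assumes queue_servers is an InnoDB table with id as its primary key (not stated in the question), and the index name is hypothetical. With hostname included in the index, the query that selects id and hostname can be answered from the index alone.

```sql
-- Hypothetical index name; assumes InnoDB with id as the primary key,
-- so id is implicitly part of every secondary index.
CREATE INDEX idx_live_hostname
    ON queue_servers (live, hostname);

-- All selected columns (id, hostname) and the filter column (live)
-- are now available in the index.
EXPLAIN
SELECT id, hostname
FROM queue_servers
WHERE live = 1;
```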
But I can use force index
You can, but on low cardinality indexes it will make your query slower, not faster.
Only override the indexes on complex queries and only if you know the rules MySQL uses to select indexes.
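For reference, the two hint forms discussed above look like this on the question's query. This is only a syntax sketch; as noted, forcing a low-cardinality index may well make the query slower, so compare the plans and timings before keeping a hint.

```sql
-- USE INDEX: restricts the optimizer's choice to the listed index,
-- but a full table scan is still allowed if judged cheaper.
EXPLAIN
SELECT id, hostname
FROM queue_servers USE INDEX (live)
WHERE live = 1;

-- FORCE INDEX: a table scan is treated as very expensive, so the
-- index is used whenever it can be used at all.
EXPLAIN
SELECT id, hostname
FROM queue_servers FORCE INDEX (live)
WHERE live = 1;
```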
Links:
See: http://dev.mysql.com/doc/refman/5.1/en/mysql-indexes.html
and: http://www.xaprb.com/blog/2006/07/04/how-to-exploit-mysql-index-optimizations/
If you want a book on this subject: read
High performance MySQL: http://oreilly.com/catalog/9780596003067
Have you tried enforcing a particular index for the search? http://dev.mysql.com/doc/refman/5.0/en/index-hints.html
You can tell the DBMS to use a proper index for the query. That should give you predictable behaviour.