SQL optimisation - mysql

I have this query on my database. It pulls the businesses that are within a 10-mile radius of a given coordinate.
SELECT id
FROM business
WHERE ( coordinates!=''
AND getDistance('-2.1032155,49.1801863', coordinates)<10 )
AND id NOT IN ('6', '4118') ORDER BY rand() LIMIT 0,5
When I profile this query, I get this:
JOIN SIZE: 3956 (BAD)
USING TEMPORARY (BAD)
USING FILESORT (BAD)
Can you help me optimise this query?
Thanks in advance

You will always need FILESORT (sorting the resulting data in temporary memory) as you "order by" something that is not indexed (rand()).
Depending on the optimizing capabilities of your DBMS, you might be better off using "AND id != '6' AND id != '4118'" instead of a "NOT IN" clause.
You should always state the fixed information first in queries, although this depends on the capabilities of the query optimizer as well. Also, the criteria should align with an index, meaning that the criteria should appear in the same order as the columns of the index you want your DBMS to use. There's usually an option to state which index to use (the keyword is "index hint"), but most DBMSs know best which index to use in most cases anyway (guessing from the query itself).
And you will never get around USING TEMPORARY while you are using a criterion that is computed at query runtime (your function).
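As a rough sketch of the two suggestions above (explicit != comparisons and an index hint), the query could be written as follows. The index name idx_coordinates is hypothetical and assumed to exist on business(coordinates). Whether the hint helps at all depends on the data, since getDistance() still has to be evaluated per row.

```sql
-- Sketch only: idx_coordinates is a hypothetical index on business(coordinates).
SELECT id
FROM business USE INDEX (idx_coordinates)
WHERE coordinates != ''
  AND getDistance('-2.1032155,49.1801863', coordinates) < 10
  AND id != '6'
  AND id != '4118'
ORDER BY RAND()   -- ordering by RAND() still forces a sort of the matching rows
LIMIT 0, 5;
```

Comparing EXPLAIN output with and without the hint is the only reliable way to tell whether it changes the plan.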

Related

Index when using OR in query

What is the best way to create index when I have a query like this?
... WHERE (user_1 = '$user_id' OR user_2 = '$user_id') ...
I know that only one index can be used in a query so I can't create two indexes, one for user_1 and one for user_2.
Also, could the solution for this type of query be used for the following query as well?
WHERE ((user_1 = '$user_id' AND user_2 = '$friend_id') OR (user_1 = '$friend_id' AND user_2 = '$user_id'))
MySQL has a hard time with OR conditions. In theory, there's an index merge optimization that @duskwuff mentions, but in practice, it doesn't kick in when you think it should. Besides, even when it does, it doesn't perform as well as a single index.
The solution most people use to work around this is to split up the query:
SELECT ... WHERE user_1 = ?
UNION
SELECT ... WHERE user_2 = ?
That way each query will be able to use its own choice for index, without relying on the unreliable index merge feature.
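A fuller sketch of the split query, assuming a hypothetical table named friendships with one single-column index per searched column (the table name, index names, and the literal 42 are illustrative only):

```sql
-- Hypothetical table and index names, for illustration.
ALTER TABLE friendships ADD INDEX idx_user_1 (user_1);
ALTER TABLE friendships ADD INDEX idx_user_2 (user_2);

-- Each branch can pick its own index; UNION removes duplicate rows.
SELECT user_1, user_2 FROM friendships WHERE user_1 = 42
UNION
SELECT user_1, user_2 FROM friendships WHERE user_2 = 42;
```

If the two branches cannot return the same row, or duplicates are acceptable, UNION ALL avoids the de-duplication step.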
Your second query is optimizable more simply. It's just a tuple comparison. It can be written this way:
WHERE (user_1, user_2) IN (('$user_id', '$friend_id'), ('$friend_id', '$user_id'))
In old versions of MySQL, tuple comparisons would not use an index, but since 5.7.3 they do (see https://dev.mysql.com/doc/refman/5.7/en/row-constructor-optimization.html).
P.S.: Don't interpolate application code variables directly into your SQL expressions. Use query parameters instead.
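To illustrate query parameters without any application code, here is a minimal sketch using MySQL's server-side prepared statements. The table name friendships, the statement name, and the value 42 are assumptions for the example. In practice you would normally use your client library's parameter binding instead.

```sql
-- Sketch: friendships is a hypothetical table; ? marks a bound parameter.
PREPARE find_user_rows FROM
  'SELECT user_1, user_2 FROM friendships WHERE user_1 = ?
   UNION
   SELECT user_1, user_2 FROM friendships WHERE user_2 = ?';

SET @user_id = 42;

-- The same value is bound to both placeholders.
EXECUTE find_user_rows USING @user_id, @user_id;

DEALLOCATE PREPARE find_user_rows;
```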
I know that only one index can be used in a query…
This is incorrect. Under the right circumstances, MySQL will routinely use multiple indexes in a query. (For example, a query JOINing multiple tables will almost always use at least one index on each table involved.)
In the case of your first query, MySQL will use an index merge union optimization. If both columns are indexed, the EXPLAIN output will give an explanation along the lines of:
Using union(index_on_user_1,index_on_user_2); Using where
The query shown in your second example is covered by an index on (user_1, user_2). Create that index if you plan on running those queries routinely.
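A minimal sketch of that composite index and a check of the plan, again assuming a hypothetical table named friendships and illustrative ids 42 and 99:

```sql
-- Hypothetical table and index names.
ALTER TABLE friendships ADD INDEX idx_user_pair (user_1, user_2);

-- Inspect the "key" column of the output to see whether idx_user_pair is chosen.
EXPLAIN
SELECT user_1, user_2
FROM friendships
WHERE (user_1 = 42 AND user_2 = 99)
   OR (user_1 = 99 AND user_2 = 42);
```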
The two cases are different.
In the first case, both columns need to be searched for the same value. A two-column index (u1, u2) can be used for the condition on u1 but not for the one on u2. If you have two separate indexes on u1 and u2, both of them will probably be used. The choice depends on statistics about how many rows are expected to be returned: if few rows are expected, an index seek will be selected when an appropriate index is available; if the number is high, a scan (of either the table or the index) is preferable.
In the second case, both columns need to be checked again, but within each search there are two sub-searches, and the second sub-search runs on the results of the first because of the AND condition. Here two indexes on u1 and u2 matter more, since whichever field is searched first will have an index. The choice of whether to use an index is made as described above.
In either case, however, every OR forces one more search or set of searches. So the proposed solution of splitting the query with UNION does not cost more: the table will be searched x times whether you use one SELECT with ORs or x SELECTs with UNION, regardless of index selection and type of search (seek or scan). Because each SELECT in the UNION gets its own part of the execution plan, it is more likely that the (single-column) indexes will be used, and the row sets from all parts of the OR are then combined. If you do not want to copy a large SELECT statement into many UNION branches, you can fetch the primary key values first and then select those rows, or use a view so that the bulk of the statement lives in one place.
Finally, if you exclude the union option, there is a way to trick the optimizer to use a single index. Create a double index u1,u2 (or u2,u1 - whatever column has higher cardinality goes first) and modify your statement so all OR parts use all columns:
... WHERE (user_1 = '$user_id' OR user_2 = '$user_id') ...
will be converted to:
... WHERE ((user_1 = '$user_id' and user_2=user_2) OR (user_1=user_1 and user_2 = '$user_id')) ...
This way a double index (u1,u2) will be used at all times. Please note that this will not work if the columns are nullable (a comparison such as user_2=user_2 is not true for NULL values), and bypassing this with isnull or coalesce may cause the index not to be selected. It will work with ansi nulls off, however.

Instructing MySQL to apply WHERE clause to rows returned by previous WHERE clause

I have the following query:
SELECT dt_stamp
FROM claim_notes
WHERE type_id = 0
AND dt_stamp >= :dt_stamp
AND DATE( dt_stamp ) = :date
AND user_id = :user_id
AND note LIKE :click_to_call
ORDER BY dt_stamp
LIMIT 1
The claim_notes table has about half a million rows, so this query runs very slowly since it has to search against the unindexed note column (which I can't do anything about). I know that when the type_id, dt_stamp, and user_id conditions are applied, I'll be searching against about 60 rows instead of half a million. But MySQL doesn't seem to apply these in order. What I'd like to do is to see if there's a way to tell MySQL to only apply the note LIKE :click_to_call condition to the rows that meet the former conditions so that it's not searching all rows with this condition.
What I've come up with is this:
SELECT dt_stamp
FROM (
SELECT *
FROM claim_notes
WHERE type_id = 0
AND dt_stamp >= :dt_stamp
AND DATE( dt_stamp ) = :date
AND user_id = :user_id
) AS filtered
WHERE note LIKE :click_to_call
ORDER BY dt_stamp
LIMIT 1
This works and is extremely fast. I'm just wondering if this is the right way to do this, or if there is a more official way to handle it.
It shouldn't be necessary to do this. The MySQL optimizer can handle it if you have multiple terms in your WHERE clause separated by AND. Basically, it knows how to do "apply all the conditions you can using indexes, then apply unindexed expressions only to the remaining rows."
But choosing the right index is important. A multi-column index is better for a series of AND terms than individual indexes. MySQL can apply index intersection, but that's much less effective than finding the same rows with a single index.
A few logical rules apply to creating multi-column indexes:
Conditions on unique columns are preferred over conditions on non-unique columns.
Equality conditions (=) are preferred over ranges (>=, IN, BETWEEN, !=, etc.).
After the first column in the index used for a range condition, subsequent columns won't use an index.
Most of the time, searching the result of a function on a column (e.g. DATE(dt_stamp)) won't use an index. It'd be better in that case to store a DATE data type and use = instead of >=.
If the condition matches > 20% of the table, MySQL probably will decide to skip the index and do a table-scan anyway.
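Applied to the claim_notes query, these rules suggest an index with the equality columns first and the range column last. The sketch below also replaces DATE(dt_stamp) = :date with an upper bound on dt_stamp; that is a different technique from storing a separate DATE column, and it assumes :dt_stamp falls on the day given by :date. The index name and the literal values are hypothetical.

```sql
-- Equality columns (type_id, user_id) first, range column (dt_stamp) last.
ALTER TABLE claim_notes
  ADD INDEX idx_type_user_stamp (type_id, user_id, dt_stamp);

-- Illustrative literals stand in for :dt_stamp, :date, :user_id, :click_to_call.
SELECT dt_stamp
FROM claim_notes
WHERE type_id = 0
  AND user_id = 123
  AND dt_stamp >= '2013-08-14 09:00:00'   -- :dt_stamp
  AND dt_stamp <  '2013-08-15 00:00:00'   -- start of the day after :date
  AND note LIKE '%click to call%'         -- applied only to rows the index returns
ORDER BY dt_stamp
LIMIT 1;
```

Because the range column is also the ORDER BY column, the index can return rows already in the requested order.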
Here are some webinars by myself and my colleagues at Percona to help explain index design:
Tools and Techniques for Index Design
MySQL Indexing: Best Practices
Advanced MySQL Query Tuning
Really Large Queries: Advanced Optimization Techniques
You can get the slides for these webinars for free, and view the recording for free, but the recording requires registration.
Don't go for the derived table solution, as it is not performant. I'm surprised that, with = and >= conditions available, MySQL evaluates the LIKE first.
Anyway, I'd say you could try adding some indexes on those fields and see what happens:
ALTER TABLE claim_notes ADD INDEX(type_id, user_id);
ALTER TABLE claim_notes ADD INDEX(dt_stamp);
The latter index won't actually speed up the search itself, but rather the sorting of the results.
Of course, having an EXPLAIN of the query would help.
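For reference, an EXPLAIN of the original query could be obtained like this. The literal values are placeholders standing in for the bound parameters.

```sql
-- Illustrative literals replace :dt_stamp, :date, :user_id and :click_to_call.
EXPLAIN
SELECT dt_stamp
FROM claim_notes
WHERE type_id = 0
  AND dt_stamp >= '2013-08-14 09:00:00'
  AND DATE(dt_stamp) = '2013-08-14'
  AND user_id = 123
  AND note LIKE '%click to call%'
ORDER BY dt_stamp
LIMIT 1;
```

The key, rows, and Extra columns of the output show which index was chosen, roughly how many rows MySQL expects to examine, and whether a filesort is needed.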

Best way to use indexes on large mysql like query

This MySQL query is run on a large (about 200,000 records, 41 columns) MyISAM table:
select t1.* from table t1 where 1 and t1.inactive = '0' and (t1.code like '%searchtext%' or t1.name like '%searchtext%' or t1.ext like '%searchtext%' ) order by t1.id desc LIMIT 0, 15
id is the primary index.
I tried adding a multiple-column index on all 3 searched (LIKE) columns. It works OK, but the results are served to an auto-filled AJAX table on a website, and the 2-second return delay is a bit too slow.
I also tried adding separate indexes on all 3 columns and a fulltext index on all 3 columns, without significant improvement.
What would be the best way to optimize this type of query? I would like to achieve under 1 sec performance, is it doable?
The best thing you can do is implement paging. No matter what you do, that IO cost is going to be huge. If you only return one page of records (10, 25, or whatever), that will help a lot.
As for the index, you need to check the plan to see if your index is actually being used. A full-text index might help, but that depends on how many rows you return and what you pass in. Wildcards such as % really drain performance. You can still use an index if the pattern ends with % but does not start with %. If you put % on both sides of the text you are searching for, indexes can't help much.
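The difference between a trailing and a leading wildcard can be checked with EXPLAIN. This is a sketch against a hypothetical table named products with a hypothetical single-column index on name. The actual plans depend on the storage engine and data, so no output is shown.

```sql
-- Hypothetical table and index names.
ALTER TABLE products ADD INDEX idx_name (name);

-- Trailing wildcard only: the prefix 'searchtext' can drive a range lookup on idx_name.
EXPLAIN SELECT * FROM products WHERE name LIKE 'searchtext%';

-- Leading wildcard: there is no fixed prefix, so idx_name cannot narrow the search.
EXPLAIN SELECT * FROM products WHERE name LIKE '%searchtext%';
```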
You can create a full-text index that covers the three columns: code, name, and ext. Then perform a full-text query using the MATCH() AGAINST () function:
select t1.*
from table t1
where match(code, name, ext) against ('searchtext')
order by t1.id desc
limit 0, 15
If you omit the ORDER BY clause the rows are sorted by default using the MATCH function result relevance value. For more information read the Full-Text Search Functions documentation.
As @Vulcronos notes, the query optimizer is not able to use the index when the LIKE operator is used with a pattern that starts with the wildcard %.
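A sketch of the full-text setup for an autocomplete-style search, assuming a hypothetical table named products. Note that full-text search matches whole words, or word prefixes when the * operator is used in boolean mode. It does not match arbitrary substrings, so it is not an exact replacement for LIKE '%searchtext%'.

```sql
-- Hypothetical table and index names. The MATCH() column list must match
-- the column list of the FULLTEXT index.
ALTER TABLE products ADD FULLTEXT INDEX ft_code_name_ext (code, name, ext);

SELECT t1.*
FROM products t1
WHERE t1.inactive = '0'
  AND MATCH(t1.code, t1.name, t1.ext) AGAINST ('searchtext*' IN BOOLEAN MODE)
ORDER BY t1.id DESC
LIMIT 0, 15;
```

On MyISAM, words shorter than ft_min_word_len (4 by default) are not indexed, which matters when users type only a few characters.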

How to create Effective Indexing on MySQL

Can anyone help me create an index so that my query will execute smoothly?
Currently, I have the query below, which returns 8k+ records but takes 2 seconds or more to complete. tblproduction currently holds 16 million+ records.
SELECT COUNT(fldglobalid) AS PackagesDone
FROM tblproduction
WHERE fldemployeeno = 'APD100401'
AND fldstarttime BETWEEN '2013-08-14 07:18:06' AND '2013-08-14 16:01:58'
AND fldshift = 'B'
AND fldprojectgroup = 'FTO'
AND fldGlobalID <> 0;
I currently have the indexes below, but the query still takes a long time to execute:
Index_1: fldEmployeeNo, fldStartTime
Index_2: fldEmployeeNo, fldTask, fldTaskStatus
Index_3: fldGlobalId, fldProjectGroup
Index_4: fldGlobalId
I have tried all of these indexes using FORCE INDEX, but the query still takes a long time to execute.
Please advise, thanks!
This started as a comment on Gordon Linoff's answer but is getting too long.
"It would be better to include fldGlobalId in the index as well" - no, it would not. This is counterproductive for performance: it won't improve the speed of retrieving the data (indexes are not used for inequalities) but will lead to more frequent index updates, hence increased index fragmentation (potentially worsening the performance of SELECT) and reduced performance for inserts and updates.
Ideally you should design your schema to optimize all the queries - which is rather a large task, but since you've only provided one....
The query as it stands will only use a single index for resolution, hence the index should include all the fields which have predicates in the query except for non-matches (i.e. fldGlobalID).
The order of the fields is important: in the absence of other queries with different sets of predicates, then the fields with the highest relative cardinality should come first. It's rather hard to know what this is without analysing the data (SELECT COUNT(DISTINCT field)/COUNT(*) FROM yourtable) but at a guess the order should be fldstarttime, fldemployeeno, fldprojectgroup, fldshift
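The relative cardinality check mentioned above can be run for all four candidate columns in one pass. This is a sketch; on a 16-million-row table it performs a full scan, so it is best run off-peak or against a copy.

```sql
-- Values close to 1 mean highly selective columns; values close to 0 mean few distinct values.
SELECT
  COUNT(DISTINCT fldstarttime)    / COUNT(*) AS starttime_selectivity,
  COUNT(DISTINCT fldemployeeno)   / COUNT(*) AS employeeno_selectivity,
  COUNT(DISTINCT fldprojectgroup) / COUNT(*) AS projectgroup_selectivity,
  COUNT(DISTINCT fldshift)        / COUNT(*) AS shift_selectivity
FROM tblproduction;
```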
If there is a dependency of fldshift on fldemployeeno (i.e. employees always, or at least more than around 90% of the time, work the same shift) then including fldshift in the index merely increases its size without making it any more efficient.
You didn't say what type of index you're using - btrees work well with ranges, hashes with equalities. Since the highest cardinality predicate here uses a range, a btree index will be much more efficient than a hash-based index.
You can use one index. Here is the query, slightly rearranged:
SELECT COUNT(fldglobalid) AS PackagesDone
FROM tblproduction
WHERE fldemployeeno = 'APD100401'
AND fldshift = 'B'
AND fldprojectgroup = 'FTO'
AND fldstarttime BETWEEN '2013-08-14 07:18:06' AND '2013-08-14 16:01:58'
AND fldGlobalID <> 0;
(I just moved the equality conditions together to the top of the where clause).
The query should make use of an index on tblproduction(fldemployeeno, fldshift, fldprojectgroup, fldstarttime). It would be better to include fldGlobalId in the index as well, so the index "covers" the query (all columns in the query are in the index). So, try this index:
tblproduction(fldemployeeno, fldshift, fldprojectgroup, fldstarttime, fldGlobalID)
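As a sketch, the index could be created and checked like this. The index name idx_emp_shift_group_start is hypothetical.

```sql
-- Equality columns first, then the range column, then fldGlobalID so the index covers the query.
CREATE INDEX idx_emp_shift_group_start
  ON tblproduction (fldemployeeno, fldshift, fldprojectgroup, fldstarttime, fldGlobalID);

-- If the index covers the query, the Extra column should include "Using index".
EXPLAIN
SELECT COUNT(fldglobalid) AS PackagesDone
FROM tblproduction
WHERE fldemployeeno = 'APD100401'
  AND fldshift = 'B'
  AND fldprojectgroup = 'FTO'
  AND fldstarttime BETWEEN '2013-08-14 07:18:06' AND '2013-08-14 16:01:58'
  AND fldGlobalID <> 0;
```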

How do I get MySQL to use an INDEX for view query?

I'm working on a web project with a MySQL database on Java EE. We needed a view to summarize data from 3 tables with over 3M rows overall. Each table was created with an index. But I haven't found a way to take advantage of the indexes when running a conditional SELECT against the view, which we created with [group by].
I've been getting suggestions from people that using views in MySQL is not a good idea, because you can't create an index on a view in MySQL like you can in Oracle. But in some tests that I ran, indexes can be used in a SELECT against a view. Maybe I've created those views the wrong way.
I'll use a example to describe my problem.
We have a table that records data for high scores in NBA games, with an index on column [happened_in]:
CREATE TABLE `highscores` (
`tbl_id` int(11) NOT NULL auto_increment,
`happened_in` int(4) default NULL,
`player` int(3) default NULL,
`score` int(3) default NULL,
PRIMARY KEY (`tbl_id`),
KEY `index_happened_in` (`happened_in`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
insert data(8 rows)
INSERT INTO highscores(happened_in, player, score)
VALUES (2006, 24, 61),(2006, 24, 44),(2006, 24, 81),
(1998, 23, 51),(1997, 23, 46),(2006, 3, 55),(2007, 24, 34), (2008, 24, 37);
then I create a view to see the highest score that Kobe Bryant got in each year
CREATE OR REPLACE VIEW v_kobe_highScores
AS
SELECT player, max(score) AS highest_score, happened_in
FROM highscores
WHERE player = 24
GROUP BY happened_in;
I wrote a conditional statement to see the highest score that kobe got in 2006;
select * from v_kobe_highscores where happened_in = 2006;
When I explain it in Toad for MySQL, I find that MySQL scans all rows to form the view and then applies the condition to the result, without using the index on [happened_in].
explain select * from v_kobe_highscores where happened_in = 2006;
The view that we use in our project is built among tables with millions of rows. Scanning all the rows from table in every view data retrieval is unacceptable. Please help! Thanks!
@zerkms Here is the result I tested on real-life data. I don't see much difference. I think @spencer7593 has the right point: the MySQL optimizer doesn't "push" that predicate down into the view query.
How do you get MySQL to use an index for a view query? The short answer, provide an index that MySQL can use.
In this case, the optimum index is likely a "covering" index:
... ON highscores (player, happened_in, score)
It's likely that MySQL will use that index, and the EXPLAIN will show "Using index", due to the WHERE player = 24 (an equality predicate on the leading column in the index). The GROUP BY happened_in (the second column in the index) may allow MySQL to optimize that using the index to avoid a sort operation. Including the score column in the index will allow the query to be satisfied entirely from the index, without having to visit (look up) the data pages referenced by the index.
That's the quick answer. The longer answer is that MySQL is very unlikely to use an index with a leading column of happened_in for the view query.
Why the view causes a performance issue
One of the issues you have with the MySQL view is that MySQL does not "push" the predicate from the outer query down into the view query.
Your outer query specifies WHERE happened_in = 2006. The MySQL optimizer does not consider the predicate when it runs the inner "view query". That query for the view gets executed separately, before the outer query. The resultset from the execution of that query gets "materialized"; that is, the results are stored as an intermediate MyISAM table. (MySQL calls it a "derived table", and that name makes sense when you understand the operations that MySQL performs.)
The bottom line is that the index you have defined on happened_in is not being used by MySQL when it runs the query that forms the view definition.
After the intermediate "derived table" is created, THEN the outer query is executed, using that "derived table" as a rowsource. It's when that outer query runs that the happened_in = 2006 predicate is evaluated.
Note that all of the rows from the view query are stored, which (in your case) is a row for EVERY value of happened_in, not just the one you specify an equality predicate on in the outer query.
The way that view queries are processed may be "unexpected" by some, and this is one reason that using "views" in MySQL can lead to performance problems, as compared to the way view queries are processed by other relational databases.
Improving performance of the view query with a suitable covering index
Given your view definition and your query, about the best you are going to get would be a "Using index" access method for the view query. To get that, you'd need a covering index, e.g.
... ON highscores (player, happened_in, score).
That's likely to be the most beneficial index (performance wise) for your existing view definition and your existing query. The player column is the leading column because you have an equality predicate on that column in the view query. The happened_in column is next, because you've got a GROUP BY operation on that column, and MySQL is going to be able to use this index to optimize the GROUP BY operation. We also include the score column, because that is the only other column referenced in your query. That makes the index a "covering" index, because MySQL can satisfy that query directly from index pages, without a need to visit any pages in the underlying table. And that's as good as we're going to get out of that query plan: "Using index" with no "Using filesort".
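A sketch of creating that covering index and checking the plan of the view's defining query. The index name is hypothetical, and player is added to the GROUP BY so the statement also runs when ONLY_FULL_GROUP_BY is enabled.

```sql
-- Hypothetical index name; column order follows the reasoning above.
ALTER TABLE highscores
  ADD INDEX idx_player_year_score (player, happened_in, score);

-- The Extra column is the place to look for "Using index" and the absence of "Using filesort".
EXPLAIN
SELECT player, MAX(score) AS highest_score, happened_in
FROM highscores
WHERE player = 24
GROUP BY player, happened_in;
```

As noted elsewhere in this thread, an 8-row test table may lead the optimizer to prefer a full scan, so the plan is only meaningful with a realistic amount of data.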
Compare performance to standalone query with no derived table
You could compare the execution plan for your query against the view vs. an equivalent standalone query:
SELECT player
, MAX(score) AS highest_score
, happened_in
FROM highscores
WHERE player = 24
AND happened_in = 2006
GROUP
BY player
, happened_in
The standalone query can also make use of a covering index e.g.
... ON highscores (player, happened_in, score)
but without a need to materialize an intermediate MyISAM table.
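The two plans can be put side by side as follows. On MySQL versions that materialize the view as described above, the first plan would be expected to contain a row with select_type DERIVED, while the second should not.

```sql
-- Plan for the query against the view.
EXPLAIN
SELECT * FROM v_kobe_highScores WHERE happened_in = 2006;

-- Plan for the equivalent standalone query, with the predicate applied to the base table.
EXPLAIN
SELECT player, MAX(score) AS highest_score, happened_in
FROM highscores
WHERE player = 24
  AND happened_in = 2006
GROUP BY player, happened_in;
```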
I am not sure that any of the previous provides a direct answer to the question you were asking.
Q: How do I get MySQL to use an INDEX for view query?
A: Define a suitable INDEX that the view query can use.
The short answer is to provide a "covering index" (an index that includes all columns referenced in the view query). The leading columns in that index should be the columns that are referenced with equality predicates (in your case, the column player would be a leading column because you have a player = 24 predicate in the query). Also, the columns referenced in the GROUP BY should be leading columns in the index, which allows MySQL to optimize the GROUP BY operation by making use of the index rather than using a sort operation.
The key point here is that the view query is basically a standalone query; the results from that query get stored in an intermediate "derived" table (a MyISAM table that gets created when a query against the view gets run).
Using views in MySQL is not necessarily a "bad idea", but I would strongly caution those who choose to use views within MySQL to be AWARE of how MySQL processes queries that reference those views. And the way MySQL processes view queries differs (significantly) from the way view queries are handled by other databases (e.g. Oracle, SQL Server).
Creating the composite index with player + happened_in (in this particular order) columns is the best you can do in this case.
PS: don't test MySQL optimizer behaviour on such a small number of rows, because it's likely to prefer a full scan over indexes. If you want to see what will happen in real life, fill the table with a realistic amount of data.
This doesn't directly answer the question, but it is a directly related workaround for others running into this issue. This achieves the same benefits of using a view, while minimizing the disadvantages.
I set up a PHP function to which I can send parameters: conditions to push into the inner query to maximize index usage, rather than applying them in a join or WHERE clause outside a view. In the function you can formulate the SQL syntax for a derived table and return that syntax. Then in the calling program, you can do something like this:
$table = tablesyntax(parameters);
select field1, field2 from {$table} as x... + other SQL
Thus you get the encapsulation benefits of the view, the ability to call it as if it is a view, but not the index limitations.
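For the highscores example from the question, the SQL that such a function might generate could look like the sketch below. The parameter values 24 and 2006 are shown as literals for illustration. The point is that the happened_in predicate sits inside the derived table, where the index can be used.

```sql
-- The derived table x plays the role of the view, with the filter pushed inside it.
SELECT x.player, x.highest_score, x.happened_in
FROM (
  SELECT player, MAX(score) AS highest_score, happened_in
  FROM highscores
  WHERE player = 24
    AND happened_in = 2006      -- pushed-down predicate
  GROUP BY player, happened_in
) AS x;
```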