How to create Effective Indexing on MySQL - mysql

Can anyone help me on how could i create an index so that my query will execute smoothly.
Currently, I have the below query that returns 8k+ or records. But it takes 2 sec or more to complete. The current records on tblproduction is 16million+
SELECT COUNT(fldglobalid) AS PackagesDone
FROM tblproduction
WHERE fldemployeeno = 'APD100401'
AND fldstarttime BETWEEN '2013-08-14 07:18:06' AND '2013-08-14 16:01:58'
AND fldshift = 'B'
AND fldprojectgroup = 'FTO'
AND fldGlobalID <> 0;
I have below current indexes but it still query executes longer
Index_1
fldEmployeeNo
fldStartTime
Index_2
fldEmployeeNo
fldTask
fldTaskStatus
Index_3
fldGlobalId
fldProjectGroup
Index_4
fldGlobalId
I have used all of this indexes using FORCE_Index but still the query executes longer.
Please advise, thanks!

This started as a comment Gordon Linoff's answer but is getting too long.
It would be better to include fldGlobalId in the index as well - no it would not - this is counter productive for performance - it won't improve the speed of retrieving the data (queries are not used for inequalities) but will lead to more frequent index updates, hence increased index fragmentation (hence potentially worsening the performance of SELECT) and reduced performance for inserts and updates.
Ideally you should design your schema to optimize all the queries - which is rather a large task, but since you've only provided one....
The query as it stands will only use a single index for resolution, hence the index should include all the fields which have predicates in the query except for non-matches (i.e. fldGlobalID).
The order of the fields is important: in the absence of other queries with different sets of predicates, then the fields with the highest relative cardinality should come first. It's rather hard to know what this is without analysing the data (SELECT COUNT(DISTINCT field)/COUNT(*) FROM yourtable) but at a guess the order should be fldstarttime, fldemployeeno, fldprojectgroup, fldshift
If there is a dependency on fldshift from fldemployeeno (i.e. employees always, or at least more than around 90% of the time) then including fldshift in the index is merely increasing it's size and not making it any more efficient.
You didn't say what type of index you're using - btrees work will with ranges, hashes with inequalities. Since the highest cardinality predicate here is using a range, then a btree index will be much more efficient than a hash based index.

You can use one index. Here is the query, slightly rearranged:
SELECT COUNT(fldglobalid) AS PackagesDone
FROM tblproduction
WHERE fldemployeeno = 'APD100401'
AND fldshift = 'B'
AND fldprojectgroup = 'FTO'
AND fldstarttime BETWEEN '2013-08-14 07:18:06' AND '2013-08-14 16:01:58'
AND fldGlobalID <> 0;
(I just moved the equality conditions together to the top of the where clause).
The query should make use of an index on tblproduction(fldemployeeno, fldshift, fldprojectgroup, fldstarttime). It would be better to include fldGlobalId in the index as well, so the index "covers" the query (all columns in the query are in the index). So, try this index:
tblproduction(fldemployeeno, fldshift, fldprojectgroup, fldstarttime, fldGlobalID)

Related

SQL gets slow on a simple query with ORDER BY

I have problem with MySQL ORDER BY, it slows down query and I really don't know why, my query was a little more complex so I simplified it to a light query with no joins, but it stills works really slow.
Query:
SELECT
W.`oid`
FROM
`z_web_dok` AS W
WHERE
W.`sent_eRacun` = 1 AND W.`status` IN(8, 9) AND W.`Drzava` = 'BiH'
ORDER BY W.`oid` ASC
LIMIT 0, 10
The table has 946,566 rows, with memory taking 500 MB, those fields I selecting are all indexed as follow:
oid - INT PRIMARY KEY AUTOINCREMENT
status - INT INDEXED
sent_eRacun - TINYINT INDEXED
Drzava - VARCHAR(3) INDEXED
I am posting screenshoots of explain query first:
The next is the query executed to database:
And this is speed after I remove ORDER BY.
I have also tried sorting with DATETIME field which is also indexed, but I get same slow query as with ordering with primary key, this started from today, usually it was fast and light always.
What can cause something like this?
The kind of query you use here calls for a composite covering index. This one should handle your query very well.
CREATE INDEX someName ON z_web_dok (Drzava, sent_eRacun, status, oid);
Why does this work? You're looking for equality matches on the first three columns, and sorting on the fourth column. The query planner will use this index to satisfy the entire query. It can random-access the index to find the first row matching your query, then scan through the index in order to get the rows it needs.
Pro tip: Indexes on single columns are generally harmful to performance unless they happen to match the requirements of particular queries in your application, or are used for primary or foreign keys. You generally choose your indexes to match your most active, or your slowest, queries. Edit You asked whether it's better to create specific indexes for each query in your application. The answer is yes.
There may be an even faster way. (Or it may not be any faster.)
The IN(8, 9) gets in the way of easily handling the WHERE..ORDER BY..LIMIT completely efficiently. The possible solution is to treat that as OR, then convert to UNION and do some tricks with the LIMIT, especially if you might also be using OFFSET.
( SELECT ... WHERE .. = 8 AND ... ORDER BY oid LIMIT 10 )
UNION ALL
( SELECT ... WHERE .. = 9 AND ... ORDER BY oid LIMIT 10 )
ORDER BY oid LIMIT 10
This will allow the covering index described by OJones to be fully used in each of the subqueries. Furthermore, each will provide up to 10 rows without any temp table or filesort. Then the outer part will sort up to 20 rows and deliver the 'correct' 10.
For OFFSET, see http://mysql.rjweb.org/doc.php/index_cookbook_mysql#or

Optimize query through the order of columns in index

I had a table that is holding a domain and id
the query is
select distinct domain
from user
where id = '1'
the index is using the order idx_domain_id is faster than idx_id_domain
if the order of the execution is
(FROM clause,WHERE clause,GROUP BY clause,HAVING clause,SELECT
clause,ORDER BY clause)
then the query should be faster if it use the sorted where columns than the select one.
at 15:00 to 17:00 it show the same query i am working on
https://serversforhackers.com/laravel-perf/mysql-indexing-three
the table has a 4.6 million row.
time using idx_domain_id
time after change the order
This is your query:
select distinct first_name
from user
where id = '1';
You are observing that user(first_name, id) is faster than user(id, firstname).
Why might this be the case? First, this could simply be an artifact of how your are doing the timing. If your table is really small (i.e. the data fits on a single data page), then indexes are generally not very useful for improving performance.
Second, if you are only running the queries once, then the first time you run the query, you might have a "cold cache". The second time, the data is already stored in memory, so it runs faster.
Other issues can come up as well. You don't specify what the timings are. Small differences can be due to noise and might be meaningless.
You don't provide enough information to give a more definitive explanation. That would include:
Repeated timings run on cold caches.
Size information on the table and the number of matching rows.
Layout information, particularly the type of id.
Explain plans for the two queries.
select distinct domain
from user
where id = '1'
Since id is the PRIMARY KEY, there is at most one row involved. Hence, the keyword DISTINCT is useless.
And the most useful index is what you already have, PRIMARY KEY(id). It will drill down the BTree to find id='1' and deliver the value of domain that is sitting right there.
On the other hand, consider
select distinct domain
from user
where something_else = '1'
Now, the obvious index is INDEX(something_else, domain). This is optimal for the WHERE clause, and it is "covering" (meaning that all the columns needed by the query exist in the index). Swapping the columns in the index will be slower. Meanwhile, since there could be multiple rows, DISTINCT means something. However, it is not the logical thing to use.
Concerning your title question (order of columns): The = columns in the WHERE clause should come first. (More details in the link below.)
DISTINCT means to gather all the rows, then de-duplicate them. Why go to that much effort when this gives the same answer:
select domain
from user
where something_else = '1'
LIMIT 1
This hits only one row, not all the 1s.
Read my Indexing Cookbook.
(And, yes, Gordon has a lot of good points.)

MySQL efficient test if count w/ where is greater than a value

Is there a way to optimize the following query?
SELECT count(*)>1000 FROM table_with_lot_of_rows WHERE condition_on_index;
Using this query, MySQL first performs the count(*) and then the comparison. This is is fast when only few rows satisfy the condition, but can take forever if a lot of rows satisfy it. Is there a way to stop counting as soon as 1000 items are found, instead of going through all the results?
In particular, I'm interested in MyISAM table with full-text condition, but any answer for InnoDB and/or basic WHERE clause will help.
SELECT 1
FROM table_with_lot_of_rows
WHERE condition_on_index
LIMIT 1000, 1;
Works this way:
Using the index (which is presumably faster than using the data)
Skip over 1000 rows, collecting nothing. (This is better than other answers.)
If you make it this far, fetch 1 row, containing only the literal 1 (in the SELECT).
Now you either have an empty result set (<= 1000 rows) or a row of 1 (at least 1001 rows).
Then, depending on your application language, it is easy to distinguish between the two cases.
Another note: If this is to be a subquery in a bigger query, then do
EXISTS ( SELECT 1
FROM table_with_lot_of_rows
WHERE condition_on_index
LIMIT 1000, 1 )
Which returns TRUE/FALSE (which are synonymous with 1 or 0).
Face it, scanning 1001 rows, even of the index, will take some time. I think my formulation is the fastest possible.
Other things to check: Is this InnoDB? Does EXPLAIN say "Using index"? How much RAM? What is the setting of innodb_buffer_pool_size?
Note that InnoDB now has FULLTEXT, so there is no reason to stick with MyISAM.
If you are using MyISAM and the WHERE is MATCH..., then most of what I said is likely not to be applicable. FULLTEXT probably fetches all results before giving the rest of the engine to chance to do these games with ORDER BY and LIMIT.
Please show us the actual query, its EXPLAIN, and SHOW CREATE TABLE. And what is the real goal? To see if a query will deliver "too many" results?
Possible improvement (depending on context)
Since my initial SELECT returns scalar 1 or NULL, it can be used in any boolean context such as WHERE. 1 is TRUE, NULL will be treated as FALSE. Hence EXISTS is probably redundant.
Also, 1/NULL can be turned into 1/0 thus. Note: the extra parens are required.
IFNULL( ( SELECT ... LIMIT 1000,1 ), 0)
You can optimize the query using a sub-query with a LIMIT:
SELECT count(*)>1000 FROM (
SELECT 0 table_with_lot_of_rows
WHERE condition_on_index
LIMIT 1001
) as truncated_count;
In that case, MySQL stops as soon as enough rows satisfy the condition.

MySQL - Poor performance in a select from a simple table

I have a very simple table with three columns:
- A BigINT,
- Another BigINT,
- A string.
The first two columns are defined as INDEX and there are no repetitions. Moreover, both columns have values in a growing order.
The table has nearly 400K records.
I need to select the string when a value is within those of column 1 and two, in order words:
SELECT MyString
FROM MyTable
WHERE Col_1 <= Test_Value
AND Test_Value <= Col_2 ;
The result may be either a NOT FOUND or a single value.
The query takes nearly a whole second while, intuitively (imagining a binary search throughout an array), it should take just a small fraction of a second.
I checked the index type and it is BTREE for both columns (1 and 2).
Any idea how to improve performance?
Thanks in advance.
EDIT:
The explain reads:
Select type: Simple,
Type: Range,
Possible Keys: PRIMARY
Key: Primary,
Key Length: 8,
Rows: 441,
Filtered: 33.33,
Extra: Using where.
If I understand your obfuscation correctly, you have a start and end value such as a datetime or an ip address in a pair of columns? And you want to see if your given datetime/ip is in the given range?
Well, there is no way to generically optimize such a query on such a table. The optimizer does not know whether a given value could be in multiple ranges. Or, put another way, whether the ranges are disjoint.
So, the optimizer will, at best, use an index starting with either start or end and scan half the table. Not efficient.
Are the ranges non-overlapping? IP Addresses
What can you say about the result? Perhaps a kludge like this will work: SELECT ... WHERE Col_1 <= Test_Value ORDER BY Col_1 DESC LIMIT 1.
Your query, rewritten with shorter identifiers, is this
SELECT s FROM t WHERE t.low <= v AND v <= t.high
To satisfy this query using indexes would go like this: First we must search a table or index for all rows matching the first of these criteria
t.low <= v
We can think of that as a half-scan of a BTREE index. It starts at the beginning and stops when it gets to v.
It requires another half-scan in another index to satisfy v <= t.high. It then requires a merge of the two resultsets to identify the rows matching both criteria. The problem is, the two resultsets to merge are large, and they're almost entirely non-overlapping.
So, the query planner probably should just choose a full table scan instead to satisfy your criteria. That's especially true in the case of MySQL, where the query planner isn't very good at using more than one index.
You may, or may not, be able to speed up this exact query with a compound index on (low, high, s) -- with your original column names (Col_1, Col_2, MyString). This is called a covering index and allows MySQL to satisfy the query completely from the index. It sometimes helps performance. (It would be easier to guess whether this will help if the exact definition of your table were available; the efficiency of covering indexes depends on stuff like other indexes, primary keys, column size, and so forth. But you've chosen minimal disclosure for that information.)
What will really help here? Rethinking your algorithm could do you a lot of good. It seems you're trying to retrieve rows where a test point v lies in the range [t.low, t.high]. Does your application offer an a-priori limit on the width of the range? That is, is there a known maximum value of t.high - t.low? If so, let's call that value maxrange. Then you can rewrite your query like this:
SELECT s
FROM t
WHERE t.low BETWEEN v-maxrange AND v
AND t.low <= v AND v <= t.high
When maxrange is available we can add the col BETWEEN const1 AND const2 clause. That turns into an efficient range scan on an index on low. In that case, the covering index I mentioned above will certainly accelerate this query.
Read this. http://use-the-index-luke.com/
Well... I found a suitable solution for me (not sure your guys will like it but, as stated, it works for me).
I simply partitioned my 400K records into a number of tables and created a simple table that serves as a selector:
The selector table holds the minimal value of the first column for each partition along with a simple index (i.e. 1, 2, ,...).
I then user the following to get the index of the table that is supposed to contain the searched for range like:
SELECT Table_Index
FROM tbl_selector
WHERE start_range <= Test_Val
ORDER BY start_range DESC LIMIT 1 ;
This will give me the Index of the table I wish to select from.
I then have a CASE on the retrieved Index to select the correct partition table from perform the actual search.
(I guess that more elegant would be to use Dynamic SQL, but will take care of that later; for now just wanted to test the approach).
The result is that I get the response well below a second (~0.08) and it is uniform regardless of the number being used for test. This, by the way, was not the case with the previous approach: There, if the number was "close" to the beginning of the table, the result was produced quite fast; if, on the other hand, the record was near the end of the table, it would take several seconds to complete).
[By the way, I assume you understand what I mean by beginning and end of the table]
Again, I'm sure people might dislike this, but it does the job for me.
Thank you all for the effort to assist!!

Instructing MySQL to apply WHERE clause to rows returned by previous WHERE clause

I have the following query:
SELECT dt_stamp
FROM claim_notes
WHERE type_id = 0
AND dt_stamp >= :dt_stamp
AND DATE( dt_stamp ) = :date
AND user_id = :user_id
AND note LIKE :click_to_call
ORDER BY dt_stamp
LIMIT 1
The claim_notes table has about half a million rows, so this query runs very slowly since it has to search against the unindexed note column (which I can't do anything about). I know that when the type_id, dt_stamp, and user_id conditions are applied, I'll be searching against about 60 rows instead of half a million. But MySQL doesn't seem to apply these in order. What I'd like to do is to see if there's a way to tell MySQL to only apply the note LIKE :click_to_call condition to the rows that meet the former conditions so that it's not searching all rows with this condition.
What I've come up with is this:
SELECT dt_stamp
FROM (
SELECT *
FROM claim_notes
WHERE type_id = 0
AND dt_stamp >= :dt_stamp
AND DATE( dt_stamp ) = :date
AND user_id = :user_id
)
AND note LIKE :click_to_call
ORDER BY dt_stamp
LIMIT 1
This works and is extremely fast. I'm just wondering if this is the right way to do this, or if there is a more official way to handle it.
It shouldn't be necessary to do this. The MySQL optimizer can handle it if you have multiple terms in your WHERE clause separated by AND. Basically, it knows how to do "apply all the conditions you can using indexes, then apply unindexed expressions only to the remaining rows."
But choosing the right index is important. A multi-column index is best for a series of AND terms than individual indexes. MySQL can apply index intersection, but that's much less effective than finding the same rows with a single index.
A few logical rules apply to creating multi-column indexes:
Conditions on unique columns are preferred over conditions on non-unique columns.
Equality conditions (=) are preferred over ranges (>=, IN, BETWEEN, !=, etc.).
After the first column in the index used for a range condition, subsequent columns won't use an index.
Most of the time, searching the result of a function on a column (e.g. DATE(dt_stamp)) won't use an index. It'd be better in that case to store a DATE data type and use = instead of >=.
If the condition matches > 20% of the table, MySQL probably will decide to skip the index and do a table-scan anyway.
Here are some webinars by myself and my colleagues at Percona to help explain index design:
Tools and Techniques for Index Design
MySQL Indexing: Best Practices
Advanced MySQL Query Tuning
Really Large Queries: Advanced Optimization Techniques
You can get the slides for these webinars for free, and view the recording for free, but the recording requires registration.
Don't go for the derived table solution as it is not performant. I'm surprised about the fact that having = and >= operators MySQL is going for the LIKE first.
Anyway, I'd say you could try adding some indexes on those fields and see what happens:
ALTER TABLE claim_notes ADD INDEX(type_id, user_id);
ALTER TABLE claim_notes ADD INDEX(dt_stamp);
The latter index won't actually improve the search on the indexes but rather the sorting of the results.
Of course, having an EXPLAIN of the query would help.