MySQL not using index instead using where - mysql

I am using mysql for my db. I have 200,000 records with 30 columns. I am creating a composite index using 6 columns(txn_date,v_name,transaction_status, sid, pnum, txn_num) . When I do a explain on the following query having those 6 columns in where clause, the explain is using index till certain txn_date and then its using the where condition based on output of explain command
SELECT * FROM transactions
WHERE txn_date between '2021-01-10' and '2021-01-19'
and v_name ='Vo'
AND transaction_status = 'failed'
AND sid = '566'
AND txn_num = 100
AND p_num = 5;
In the above query when the txn_date is date from 10 Jan to 18 Jan, its using index and above that its using where condition. Please help me out to use the index effectively so it uses index always

End with the date; start with columns tested with '='.
The columns of an index will be used from the left, but won't be used past the range test, so your index was no better than a 1-column index with just the date. Given that, the Optimizer probably saw that more than about 20% of the table would need to be used (based on the date range), and punted. That is, it decided that it would probably be faster to simply scan the table.
This discussion applies to any size of table.
FORCE INDEX will force it to use the index, but so what? The Optimizer is pretty good at deciding that a small date range can effectively use the index, but a large range cannot. If you add a FORCE, it may help some of the time but hurt badly in other cases.
By having all the = tests first in the index, obviates much of the discussion about how many days are in the date range.
More on index building: http://mysql.rjweb.org/doc.php/index_cookbook_mysql

Related

Query time suddenly increased

I have MariaDB 10.1.14, For a long time I'm doing the following query without problems (it tooks about 3 seconds):
SELECT
sum(transaction_total) as sum_total,
count(*) as count_all,
transaction_currency
FROM
transactions
WHERE
DATE(transactions.created_at) = DATE(CURRENT_DATE)
AND transaction_type = 1
AND transaction_status = 2
GROUP BY
transaction_currency
Suddenly, I'm not sure exactly why, this query take about 13 seconds.
This is the EXPLAIN:
And those are the all indexes of transactions table:
What is the reason for the sudden query time increase? and how can I decrease it?
If you are adding more data to your table the query time will increase.
But you can do a few things to improve the performance.
Create a composite index for ( transaction_type, transaction_status, created_at)
Remove the DATE() functions (or any function) from your fields, because that doesn't allow engine use the index. CURRENT_DATE is a constant so there doesn't matter, but isn't necessary because already return DATE
if created_at isnt date you can use
created_at >= CURRENT_DATE and created_at < CURRENT_DATE + 1
or create a different field to only save the date part.
+1 to answer from #JuanCarlosOropeza, but you can go a little further with the index.
ALTER TABLE transactions ADD INDEX (
transaction_type,
transaction_status,
created_at,
transaction_currency,
transaction_total
);
As #RickJames mentioned in comments, the order of columns is important.
First, columns in equality comparisons
Next, you can index one column that is used for a range comparison (which is anything besides equality), or GROUP BY or ORDER BY. You have both range comparison and GROUP BY, but you can only get the index to help with one of these.
Last, other columns needed for the query, if you think you can get a covering index.
I describe more detail about index design in my presentation How to Design Indexes, Really (video: https://www.youtube.com/watch?v=ELR7-RdU9XU).
You're probably stuck with the "using temporary" since you have a range condition and also a GROUP BY referencing different columns. But you can at least eliminate the "using filesort" by this trick:
...
GROUP BY
transaction_currency
ORDER BY NULL
Supposing that it's not important to you which order the rows of the query results return in.
I don't know what has made your query slower. More data? Fragmentation? New DB version?
However, I am surprised to see that there is no index really supporting the query. You should have a compound index starting with the column with highest cardinality (the date? well, you can try different column orders and see which index the DBMS picks for the query).
create index idx1 on transactions(created_at, transaction_type, transaction_status);
If created_at contains a date part, then you may want to create a computed column created_on only containing the date and index that instead.
You can even extend this index to a covering index (where clause fields followed by group by clause fields followed by select clause fields):
create index idx2 on transactions(created_at, transaction_type, transaction_status,
transaction_currency, transaction_total);

MySQL - Poor performance in a select from a simple table

I have a very simple table with three columns:
- A BigINT,
- Another BigINT,
- A string.
The first two columns are defined as INDEX and there are no repetitions. Moreover, both columns have values in a growing order.
The table has nearly 400K records.
I need to select the string when a value is within those of column 1 and two, in order words:
SELECT MyString
FROM MyTable
WHERE Col_1 <= Test_Value
AND Test_Value <= Col_2 ;
The result may be either a NOT FOUND or a single value.
The query takes nearly a whole second while, intuitively (imagining a binary search throughout an array), it should take just a small fraction of a second.
I checked the index type and it is BTREE for both columns (1 and 2).
Any idea how to improve performance?
Thanks in advance.
EDIT:
The explain reads:
Select type: Simple,
Type: Range,
Possible Keys: PRIMARY
Key: Primary,
Key Length: 8,
Rows: 441,
Filtered: 33.33,
Extra: Using where.
If I understand your obfuscation correctly, you have a start and end value such as a datetime or an ip address in a pair of columns? And you want to see if your given datetime/ip is in the given range?
Well, there is no way to generically optimize such a query on such a table. The optimizer does not know whether a given value could be in multiple ranges. Or, put another way, whether the ranges are disjoint.
So, the optimizer will, at best, use an index starting with either start or end and scan half the table. Not efficient.
Are the ranges non-overlapping? IP Addresses
What can you say about the result? Perhaps a kludge like this will work: SELECT ... WHERE Col_1 <= Test_Value ORDER BY Col_1 DESC LIMIT 1.
Your query, rewritten with shorter identifiers, is this
SELECT s FROM t WHERE t.low <= v AND v <= t.high
To satisfy this query using indexes would go like this: First we must search a table or index for all rows matching the first of these criteria
t.low <= v
We can think of that as a half-scan of a BTREE index. It starts at the beginning and stops when it gets to v.
It requires another half-scan in another index to satisfy v <= t.high. It then requires a merge of the two resultsets to identify the rows matching both criteria. The problem is, the two resultsets to merge are large, and they're almost entirely non-overlapping.
So, the query planner probably should just choose a full table scan instead to satisfy your criteria. That's especially true in the case of MySQL, where the query planner isn't very good at using more than one index.
You may, or may not, be able to speed up this exact query with a compound index on (low, high, s) -- with your original column names (Col_1, Col_2, MyString). This is called a covering index and allows MySQL to satisfy the query completely from the index. It sometimes helps performance. (It would be easier to guess whether this will help if the exact definition of your table were available; the efficiency of covering indexes depends on stuff like other indexes, primary keys, column size, and so forth. But you've chosen minimal disclosure for that information.)
What will really help here? Rethinking your algorithm could do you a lot of good. It seems you're trying to retrieve rows where a test point v lies in the range [t.low, t.high]. Does your application offer an a-priori limit on the width of the range? That is, is there a known maximum value of t.high - t.low? If so, let's call that value maxrange. Then you can rewrite your query like this:
SELECT s
FROM t
WHERE t.low BETWEEN v-maxrange AND v
AND t.low <= v AND v <= t.high
When maxrange is available we can add the col BETWEEN const1 AND const2 clause. That turns into an efficient range scan on an index on low. In that case, the covering index I mentioned above will certainly accelerate this query.
Read this. http://use-the-index-luke.com/
Well... I found a suitable solution for me (not sure your guys will like it but, as stated, it works for me).
I simply partitioned my 400K records into a number of tables and created a simple table that serves as a selector:
The selector table holds the minimal value of the first column for each partition along with a simple index (i.e. 1, 2, ,...).
I then user the following to get the index of the table that is supposed to contain the searched for range like:
SELECT Table_Index
FROM tbl_selector
WHERE start_range <= Test_Val
ORDER BY start_range DESC LIMIT 1 ;
This will give me the Index of the table I wish to select from.
I then have a CASE on the retrieved Index to select the correct partition table from perform the actual search.
(I guess that more elegant would be to use Dynamic SQL, but will take care of that later; for now just wanted to test the approach).
The result is that I get the response well below a second (~0.08) and it is uniform regardless of the number being used for test. This, by the way, was not the case with the previous approach: There, if the number was "close" to the beginning of the table, the result was produced quite fast; if, on the other hand, the record was near the end of the table, it would take several seconds to complete).
[By the way, I assume you understand what I mean by beginning and end of the table]
Again, I'm sure people might dislike this, but it does the job for me.
Thank you all for the effort to assist!!

Two different queries on the same table with the same WHERE clause

I have two different queries. But they are both on the same table and have both the same WHERE clause. So they are selecting the same row.
Query 1:
SELECT HOUR(timestamp), COUNT(*) as hits
FROM hits_table
WHERE timestamp >= CURDATE()
GROUP BY HOUR(timestamp)
Query 2:
SELECT country, COUNT(*) as hits
FROM hits_table
WHERE timestamp >= CURDATE()
GROUP BY country
How can I make this more efficient?
If this table is indexed correctly, it honestly doesn't matter how big the entire table is because you're only looking at today's rows.
If the table is indexed incorrectly the performance of these queries will be terrible no matter what you do.
Your WHERE timestamp >= CURDATE() clause means you need to have an index on the timestamp column. In one of your queries the GROUP BY country shows that a compound covering index on (timestamp, country) will be a great help.
So, a single compound index (timestamp, country) will satisfy both the queries in your question.
Let's explain how that works. To look for today's records (or indeed any records starting and ending with particular timestamp values) and group them by country, and count them, MySQL can satisfy the query by doing these steps:
random-access the index to the first record that matches the timestamp. O(log n).
grab the first country value from the index.
scan to the next country value in the index and count. O(n).
repeat step three until the end of the timestamp range.
This index scan operation is about as fast as a team of ace developers (the MySQL team) can get it to be with a decade of hard work. (You may not be able to outdo them on a Saturday afternoon.) MySQL satisfies the whole query with a small subset of the index, so it doesn't really matter how big the table behind it is.
If you run one of these queries right after the other, it's possible that MySQL will still have some or all the index data blocks in a RAM cache, so it might not have to re-fetch them from disk. That will help even more.
Do you see how your example queries lead with timestamp? The most important WHERE criterion chooses a timestamp range. That's why the compound index I suggested has timestamp as its first column. If you don't have any queries that lead with country your simple index on that column probably is useless.
You asked whether you really need compound covering indexes. You probably should read about how they work and make that decision for yourself.
There's obviously a tradeoff in choosing indexes. Each index slows the process of INSERT and UPDATE a little, and can speed up queries a lot. Only you can sort out the tradeoffs for your particular application.
Since both queries have different GROUP BY clauses they are inherently different and cannot be combined. Assuming there already is an index present on the timestamp field there is no straightforward way to make this more efficient.
If the dataset is huge (10 million or more rows) you might get a little extra efficiency out of making an extra combined index on country, timestamp, but that's unlikely to be measurable, and the lack of it will usually be mitigated by in-memory buffering of MySQL itself if these 2 queries are executed directly after another.

MySQL Secondary Indexes

If I'm trying to increase the performance of a query that uses 4 different columns from a specific table, should I create 4 different indexes (one with each column individually) or should I create 1 index with all columns included?
One index with all 4 values is by my experience the fastest. If you use a where, try to put the columns in an order that makes it useful for the where.
An index with all four columns; the columns used in the WHERE should go first, and those for which you do == compare should go first of all.
Sometimes, giving priority to integer columns gives better results; YMMV.
So for example,
SELECT title, count(*) FROM table WHERE class = 'post' AND topic_id = 17
AND date > ##BeginDate and date < ##EndDate;
would have an index on: topic_id, post, date, and title, in this order.
The "title" in the index is only used so that the DB may find the value of "title" for those records matching the query, without the extra access to the data table.
The more balanced the distribution of the records on the first fields, the best results you will have (in this example, say 10% of the rows have topic_id = 17, you would discard the other 90% without ever having to run a string comparison with 'post' -- not that string comparisons are particularly costly. Depending on the data, you might find it better to index date first and post later, or even use date first as a MySQL PARTITION.
Single index is usually more effective than index merge, so if you have condition like f1 = 1 AND f2 = 2 AND f3 = 3 AND f4 = 4 single index would right decision.
To achieve best performance enumerate index fields in descending order of cardinality (count of distinct values), this will help to reduce analyzed rows count.
Index of less than 4 fields can be more effective, as it requires less memory.
http://www.mysqlperformanceblog.com/2008/08/22/multiple-column-index-vs-multiple-indexes/

How to prevent MySQL selecting one index when a better one is available?

I have a table with 30,000 rows (and growing), which I join with another table. One some pages, I need to run a some 100+ of those queries, and things get slow. If I EXPLAIN the query, I notice that one table uses a primary key and is fast, but another table using one of its indexes, which is not the best one. Here's an overview:
SIMPLE | acc_entries | ref | ledger,date,type,status,status_ledger_date_type | type | 1 | const | 15359 | Using where
This is a sample query:
SELECT SUM(usd) AS total FROM acc_entries
LEFT JOIN acc_ledgers ON acc_entries.ledger = acc_ledgers.id
WHERE acc_entries.status = 1 AND
acc_ledgers.account = 3004 AND
date >= '2011-01-01' AND
date <= '2011-08-30' AND
type = 'credit'
As you can see, I am using in my WHERE the fields status, ledger (which is the field that joins with acc_ledgers.account), date and type. All of these fields have indexes. However, there is also a specific index that is used for all of them, in that same order. It is called status_ledger_data_type, and as you can see it is one of the indexes that MySQL considers using. However, at the end MySQL opts to use type as an index. This has some 15,000 possible rows (half of the table), whereas the other combined index only features a fraction of this. So my questions is: why does MySQL selects this index when a better one is available, and how can I prevent this?
You can try using index hints to force the use of your desired index.
MySql docs on Index Hints
The Battle Between Force Index and the Query Optimizer
7 ways to convince MySQL to use the right index
Actually, you want your index based on your smaller granularity. The Ledger from your Acc_Entries table will join to your ACC_Ledgers table on ITS primary index of ID, so the Acc_Ledgers is not really utilizing the Ledger portion for the WHERE clause. Your index should match as closely to the WHERE clause of your common queries. In this case, I would have an index on
(Account, Status, Type, Date)
The reason for Account being first, smaller result set. You could have 5,000 entries. Of those, 300 entries for the one account accounts, so you've already eliminated a huge amount of data to go through. Then, the Status... of the 300, you could have 100 # status 1, 100 # status 2, 100 # status 3, so you've now reduced the set even more, etc by other criteria of type and date.
Your query otherwise is completely fine... just a personal style in writing, I try to write my queries with the WHERE conditions as closely matching the index in same sequence too, so I would just have the Account clause first, then Status, Type and Date... but again, thats a personal style in writing queries.