Adding an index increases query execution time - MySQL

How is it possible that adding an index to a column slowed down query execution?
I'm trying to get this query out of the slow query log.
My slow-query settings:
slow_query_log = 1
long_query_time = 1 # seconds
log_queries_not_using_indexes = 1
slow_query_log_file = /var/log/mysql-slow.log

Indexes do not always speed up execution. The effect of an index depends primarily on the "selectivity" of the query: how many rows are processed by the overall query.
In general, reading a table sequentially (a "full table scan") is an efficient operation. The database engine knows which pages it needs to read and can read ahead to fetch them. Such I/O often occurs in the background while processing the pages happens in the foreground, so when the next page is needed, there is a good chance it is already in the page cache.
The performance issue with full table scans is that tables are big. So even efficient reads take time. When you are looking for one row in a million ("needle-in-the-haystack" queries), the reads are a waste of time. This is where indexes fix things.
However, say you have 100 records per page and you are reading more than 1% of the records. On average, every page will need to be read -- whether you are using an index or a full-table scan. The problem is that index reads are less efficient than scan reads. A read-ahead mechanism doesn't help them, because the reads are random.
This problem can be further exacerbated through something called thrashing. If the table does not fit into memory, then each random read is likely to be a "cache miss", incurring the overhead of a read from disk. The full table scan would just read the data, and with a decent look-ahead system, there would be no cache misses.
In your example, you could increase the selectivity of the index by including both banner and event in the index (these are compared using equality) and one of the other fields.
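A hedged sketch of that suggestion (the table name and the third column are placeholders, since the original schema isn't shown): put the equality-compared columns first, then add one more selective column.

CREATE INDEX idx_banner_event_extra ON your_table (banner, event, some_other_col);

With both equality columns leading the index, the filter on the third column only has to examine the narrow slice of index entries that already match banner and event.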

Depending on the structure of the data on disk, it might be faster to load the entire table/column and sort/filter it in RAM (which is likely what happens when no index exists) than to traverse a sparse index on disk. I don't know whether this applies to your specific case or whether you have another issue here, though.

Related

MySQL Table with ~20Mil rows - Queries getting slow

I have a table in my MySQL (5.7.32) database which currently has 20 million rows. I have a few fairly complex queries written against that table, where I carry out full-text searches and join to other tables. The queries on the table are getting slow (they use appropriate indexes).
I understand that 20 Mil rows are not a lot for a DB table to handle, and would like to understand what are the factors (other than indexes) that I should consider for performance improvements. For example, any DB defaults that I should consider changing that impact performance.
NOTE: Since the table has FTS indexes, partitioning is not an option.
There are a lot of factors that could hurt performance:
Buffer pool not large enough to hold the index. So as a query searches the index, it has to keep swapping parts of the index into RAM and back out. You may need to increase the innodb_buffer_pool_size.
I'd monitor the ratio of the two numbers reported by SHOW GLOBAL STATUS LIKE 'innodb_buffer_pool_read%s' (Innodb_buffer_pool_reads vs. Innodb_buffer_pool_read_requests); see the sketch below.
CPU is too slow. Each query is single-threaded, so CPU speed is more important than number of cores.
Concurrent load. If you have many queries running at the same time, they compete with each other for CPU, buffer pool, and I/O. Check SHOW PROCESSLIST or SHOW GLOBAL STATUS LIKE 'Threads_running'.
Server is overloaded, either by MySQL or by other apps or processes. Use top to find out if the system load average is high (I would consider anything over 10 to be too high), or if the system is using swap space instead of RAM.
Is the query using indexes like you expect? Did you analyze them with EXPLAIN?
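A rough sketch of the checks mentioned above (substitute your own query; all of these are read-only):

SHOW GLOBAL STATUS LIKE 'innodb_buffer_pool_read%s';   -- reads = went to disk, read_requests = logical
SHOW GLOBAL STATUS LIKE 'Threads_running';             -- concurrent load
SHOW PROCESSLIST;                                      -- what those threads are doing right now
EXPLAIN SELECT ... ;                                   -- put the slow query here to verify index usage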

Resources consumed by a simple SELECT query in MySql

There are a few large tables in one of the databases of a customer (each table is ~50M rows in size and is not too wide). The intent is to infrequently read these tables (completely). As there are no reasonable CDC indices present, the plan is to read the tables by querying them:
SELECT * from large_table;
The reads will be performed using a jdbc driver. With the following fetch configuration present, the intent is to read the data approximately one record at a time (it may require a significant amount of time) so that the client code is never overwhelmed.
PreparedStatement stmt = connection.prepareStatement(queryString, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE); // with MySQL Connector/J this enables streaming: rows are read one at a time instead of buffering the whole result set
I was going through the execution path of a query in High Performance MySQL, however some questions seemed unanswered:
With no temporary tables explicitly created and the query cache not in use, how are the streamed reads tracked on the server?
Is any temporary data created (in main memory or files on disk) whatsoever? If so, where is it created and how much?
If temporary data is not created, how are the rows to be returned tracked? Does the query engine keep track of all the page files to be read for this query on this connection? In case there are several such queries running on the server, are the earliest "Tracked" files purged in favor of queries submitted recently?
PS: I want to understand the effect of this approach on the MySql server (not saying that there aren't better ways of reading the tables)
That simple query will not use a temp table. It will simply fetch the rows and transfer them to the client until it finishes. Nor would any possible index be useful. (If the real query is more complex, let's see it.)
The client may wait for all the rows (faster, but memory intensive) before it hands any to the user code, or it may hand them off one at a time (much slower).
I don't know the details of how to specify that in JDBC.
You may want to page through the table. If so, don't use OFFSET, but use the PRIMARY KEY and "remember where you left off". More discussion: http://mysql.rjweb.org/doc.php/pagination
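A minimal sketch of that "remember where you left off" pattern, assuming an auto-increment primary key named id (adjust to your schema):

SELECT * FROM large_table WHERE id > ? ORDER BY id LIMIT 1000;   -- bind the last id seen on the previous page
-- save the largest id from the returned batch and use it as the bind value for the next call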
Your Question #3 leads to a complex answer...
Every query brings all the relevant data (and index entries) into RAM. The data/index is read in chunks ("blocks") of 16KB from the BTree structure that is persisted on disk. For a simple select like that, it will read the blocks 'sequentially' until finished.
But, be aware of "caching":
If a block is already in RAM, no I/O is needed.
If a block is not in the cache ("buffer_pool"), it will, if necessary, bump some block out and read the desired block in. This is very normal, and very common. Do not fear it.
Because of the simplicity of the query, only a few blocks ever need to be in RAM at any moment. Hence, if your buffer pool were only a few megabytes, it could still handle, say, a 1TB table. There would be a lot of I/O, and that would impact other operations.
As for "tracking", let me use the analogy of reading a long book in a single sitting. There is nothing to track, you are simply turning pages ('blocks'). You don't even need a 'bookmark' for tracking, it is next-next-next...
Another note: InnoDB uses "B+Tree", which includes a link from one block to the "next", thereby making the page turning efficient.
Another interpretation of tracking... "Transactions" and "ACID". When any query (read or write) touches a table, there is some form of lock applied to each row touched. For SELECT the lock is rather light-weight. For writes it can cause delays or even a "deadlock". The locks are unavoidable, but sometimes actions can be taken to minimize their impact.
Logically (but not actually), a "snapshot" of all rows in all tables is taken at the instant you start a transaction. This allows you to see a consistent view of everything, even if other connections are changing rows. The underlying mechanism is very lightweight on reading, but heavier for writes. Writes will make a copy of the row so that each connection sees the snapshot that it 'should' see. Also, the copy allows for ROLLBACK and recovery from a crash (eg power failure).
(Transaction "isolation" mode allows some control over the snapshot.) To get the optimal performance for your case, do nothing special.
Here's a way to conceptualize the handling of transactions: Each row has a timestamp associated with it. Each query saves the start time of the query. The query can "see" only rows that are older than that start time. A subsequent write in another connection will be creating copies of rows with a later timestamp, hence not visible to the SELECT. Hence, the onus is on writes to do extra work; reads are cheap.
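A small sketch of that snapshot behaviour (REPEATABLE READ is InnoDB's default, so "do nothing special" already gives you this):

SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;   -- the default; shown only for clarity
START TRANSACTION WITH CONSISTENT SNAPSHOT;
SELECT * FROM large_table;   -- sees the data as of the snapshot, even while other connections write
COMMIT;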

Drop and recreate index on MySQL table - will it improve performance?

I executed the query on a newly imported MySQL database and it took 68 seconds to complete. Then I dropped and recreated the same indexes on the 2 main tables, and now it takes only 24 seconds.
Why did this happen? Is it good practice or not?
Thanks in advance
You are misinterpreting the results and the cause. Dropping and re-creating the indexes isn't what makes it go faster. There are two things that could be going on:
1) The DB doesn't fit into RAM, so recreating the two indexes left most of their pages sitting in the buffer pool by the time you ran the query.
2) The table was fragmented or had very lightly filled blocks. Recreating the indexes probably rebuilt the table, which may have improved page occupancy. If your query requires a full table scan, that would mean fewer GBs of table to scan and less fragmentation (which can matter on spinning rust).
As a general rule you should never need to do that. If you disable the query cache (query_cache_type=0, query_cache_size=0 on MySQL < 8) and run the query twice, the second run shows the speed you can expect with a warm buffer pool.
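A sketch of that experiment plus an in-place alternative to dropping and re-creating indexes (your_table is a placeholder; the query cache settings only exist on MySQL < 8):

SET GLOBAL query_cache_type = OFF;
SET GLOBAL query_cache_size = 0;
-- run the 68-second query twice; the second, warm-buffer-pool run is the realistic time
OPTIMIZE TABLE your_table;   -- rebuilds the table and its indexes, addressing fragmentation (case 2)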

Will this result in two full table scans?

SELECT P_CODE, P_PRICE
FROM PRODUCT
WHERE P_PRICE >= (SELECT AVG(P_PRICE) FROM PRODUCT);
Will this query (under MySQL) result in two full table scans (from disk), or will the optimizer understand that it's faster (if there is enough RAM to hold the result set) to do only one full table scan? The table has no indexes.
Is it possible to read this information (somehow) from the output of the EXPLAIN command in MySQL?
The question is flawed based on a misunderstanding of what a table scan actually is:
A table scan iterates over all rows in the table (irrespective of how it obtains those rows).
It also differs slightly from an index scan in that it works with the "full row". Whereas an index scan has less overall data to process, because it works with a subset of columns.
But the question is actually asking about difference between physical and logical IO.
(from disk), or will the optimizer understand that it's faster (if there is enough RAM to hold the result set)
Yes the query will do 2 table scans. That cannot be avoided:
the server has to process the full set of prices twice.
and it has to finish processing for AVG(PRICE) before it can start processing for the WHERE filter.
However, a "logical" table scan does not necessarily require reading the data from disk twice. If all the data is in memory, the server can perform the table scan in memory. So although the second stage of processing must still perform a table scan, it can be more efficient by avoiding secondary disk access.
Take a look at this question to see how to distinguish logical and physical IO on mysql:
For a MySQL query, how do you determine physical and logical I/O?
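One rough way to see that split on InnoDB (these counters are server-wide, so run this on an otherwise idle server and compare the values before and after the query):

SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests';   -- logical page reads
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';           -- reads that actually went to disk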
I'll add that in theory a server could choose to keep only the Price column in memory on the first pass, in which case it wouldn't need to perform a "full table scan" on the second pass.
However, this is unlikely in practice, as there's a benefit to keeping all the data in memory for future queries, whatever columns they may wish to process.
Re your comment:
my assumption, when looking at the query, is than an optimizer should/would be able to determine that "this query reads the same data twice, after the first read i will put it into memory(if there is space) and use the in-memory data for the next part of the query, instead of asking the disk for it twice"
Well, at least in MySQL's InnoDB engine, something sort of like this happens. InnoDB never operates on pages directly on disk; it loads every requested page into RAM before doing data operations on it. The RAM is a preallocated area called the InnoDB buffer pool. This stores byte-for-byte copies of the pages from the on-disk tablespace, plus some metadata about them.
After reading a page, the buffer pool has no immediate need to evict it from RAM, unless other pages are requested and there's no space left in the buffer pool for them. So subsequent requests for the same pages may find the pages already residing in RAM. The more this happens, the better your performance overall.
You might have more data pages in your product table than can fit in your buffer pool. During a table scan, InnoDB will evict pages as needed to load the remaining set of pages for the table. If you have a table that is many times larger than your buffer pool, you can imagine that this results in quite a bit of "churn" as pages come in and out. If you can afford it, allocating more RAM to the buffer pool is a good way to improve performance.
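For example (the 4 GB figure is purely illustrative; size it to your RAM and working set):

SELECT @@innodb_buffer_pool_size / 1024 / 1024 AS buffer_pool_mb;   -- current size
SET GLOBAL innodb_buffer_pool_size = 4 * 1024 * 1024 * 1024;        -- online resize (MySQL 5.7.5+); older versions need my.cnf and a restart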
All these facts about the buffer pool don't change the fact that your query will perform two table-scans. It is true that it will be faster to read the pages from the buffer pool than reading pages from disk. You can experiment:
Shutdown your MySQL Server and start it back up again. The buffer pool should be empty at this point (unless you are using the feature to save the buffer pool on shutdown).
Run your query. It might take many seconds, because each page requested has to be read from disk before it can be used.
Run the same query again. It's faster! I've seen cases where this difference makes the performance about 4x faster in tests. I understand that RAM is typically thousands of times faster than disk, but I/O is not the only work a query does. It also depends on what other requests are occupying the disk bandwidth, and other factors.
The difference between disk speed and RAM speed is (more or less) an arithmetic factor. No matter how large your dataset, the speed difference gives the same advantage.
Indexes are much more important, because they turn a linear search O(n) into a B-tree search O(log₂ n). As your dataset gets larger, the advantage of this becomes more dramatic. This is why there is so much emphasis on analyzing complexity of algorithms in computer science.
Please explain how you could do this with only one table scan. It is not obvious.
The use of the AVG() function would typically result in two full scans. If you have an index, then one or both scans might use the index.
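As a sketch of that last point (the index name is a placeholder): an index whose first column is P_PRICE lets the AVG() pass scan the much smaller index, and including P_CODE makes it covering for the outer query as well. Whether the optimizer uses it for the >= filter depends on how many rows qualify.

CREATE INDEX idx_price_code ON PRODUCT (P_PRICE, P_CODE);
EXPLAIN
SELECT P_CODE, P_PRICE
FROM PRODUCT
WHERE P_PRICE >= (SELECT AVG(P_PRICE) FROM PRODUCT);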

Which is faster, key_cache or OS cache?

In a table with 1 million rows, if I do (after I restart the computer, so nothing is cached):
1. SELECT price,city,state FROM tb1 WHERE zipId=13458;
the result is 23rows in 0.270s
after I run 'LOAD INDEX INTO CACHE tb1' (key_buffer_size=128M and total index size for tb is 82M):
2. SELECT price,city,state FROM tb1 WHERE zipId=24781;
the result is 23rows in 0.252s, Key_reads remains constant, Key_read_requests is incremented with 23
BUT after I load 'zipId' into OS cache, if I run again the query:
2. SELECT price,city,state FROM tb1 WHERE zipId=20548;
the result is 22rows in 0.006s
This is just a simple example, but I have run tens of tests and combinations, and the results are always the same.
I use MySQL with MyISAM, Windows 7 64-bit, and the query_cache is 0.
zipId is a regular index (not the primary key).
SHOULDN'T key_cache be faster than the OS cache?
SHOULDN'T there be a huge difference in speed after I load the index into the cache?
(In my tests there is almost no difference.)
I've read a lot of websites,tutorials and blogs on this matter but none of them really discuss the difference in speed. So, any ideas or links will be greatly appreciated.
Thank you.
Under normal query processing, MySQL will scan the index for the WHERE clause value (i.e. zipId = 13458), then use the index entries to look up the corresponding rows in the MyISAM data file (a second disk access). When the table is already cached in memory, those accesses are served from memory rather than from a real disk read.
The slow part of the query is the lookup from the index into the main table. So loading the index into memory may not improve the query speed.
One thing to try is Explain Select on your queries to see how the index is being used.
Edit: Since I don't think the answers to your comments will fit in a comment space. I'll answer them here.
MyISAM in and of itself does not have a data cache. It relies upon the OS to do the disk caching. How much of your table is cached depends upon what else is running on the system and how much data you are reading through. Windows in particular does not give the user much control over what data is cached and for how long.
The OS caches disk blocks (either 4K or 8K chunks) of the index file or the full table file.
SELECT indexed_col FROM tb1 WHERE zipId+0>1
Queries like this where you use functions on the predicate (Where clause) can cause MySQL to do full table scans rather than using any index. As I suggested above, use EXPLAIN SELECT to see what MySQL is doing.
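For example (a sketch; the exact plan depends on data distribution, and a range that matches most of the table may still be scanned):

EXPLAIN SELECT price, city, state FROM tb1 WHERE zipId + 0 > 1;   -- expression on the column: index unusable, full table scan
EXPLAIN SELECT price, city, state FROM tb1 WHERE zipId > 1;       -- bare column: the zipId index can be used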
If you want more control over the cache, try using an InnoDB table. The InnoDB engine creates its own cache, which you can size, and it does a better job of keeping the most recently used data in it.
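A sketch of that suggestion (the pool size is a placeholder to adjust):

ALTER TABLE tb1 ENGINE=InnoDB;
-- then in my.cnf: innodb_buffer_pool_size = 1G   (large enough to hold your working set)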