I have a query on an InnoDB item table which contains (only) 400k records. I need to page the results for the presentation layer (60 per page), so I use LIMIT with values depending on the page to display.
The query is (the 110000 offset is just an example):
SELECT i.id, sale_type, property_type, title, property_name, latitude,
longitude,street_number, street_name, post_code,picture, url,
score, dw_id, post_date
FROM item i WHERE picture IS NOT NULL AND picture != ''
AND sale_type = 0
ORDER BY score DESC LIMIT 110000, 60;
Running this query on my machine takes about 1s.
Running this query on our test server is 45-50s.
The EXPLAIN output is the same on both:
+----+-------------+-------+-------+---------------+-----------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+-----------+---------+------+--------+-------------+
| 1 | SIMPLE | i | index | NULL | IDX_SCORE | 5 | NULL | 110060 | Using where |
+----+-------------+-------+-------+---------------+-----------+---------+------+--------+-------------+
The only configuration differences shown by SHOW VARIABLES are:
innodb_use_native_aio: enabled on the Test server, not on my machine. I tried disabling it and saw no significant change.
innodb_buffer_pool_size: 1G on the Test server, 2G on my machine.
The Test server has 2 GB of RAM and a 2-core CPU:
mysqld uses more than 65% of RAM at all times, but that only increases by 1-2% while running the above query
mysqld uses 14% of CPU while running the above query, none when idle
My local machine has 8 GB of RAM and an 8-core CPU:
mysqld uses 28% of RAM at all times, and that doesn't really increase while running the above query (or it does so too briefly for me to see)
mysqld uses 48% of CPU while running the above query, none when idle
What can I do to get the same performance on the Test server? Is the RAM and/or CPU too low?
UPDATE
I have set up a new Test server with the same specs but 8 GB of RAM and a 4-core CPU, and performance jumped to values similar to my machine. The original server didn't seem to use all of its RAM/CPU, so why was performance so much worse?
One of the surest ways to kill performance is to make MySQL scan an index that doesn't fit in memory. So during a query, it has to load part of the index into the buffer pool, then evict that part and load the other part of the index. Causing churn in the buffer pool like this during a query will cause a lot of I/O load, and that makes it very slow. Disk I/O is about 100,000 times slower than RAM.
So there's a big difference between 1GB of buffer pool and 2GB of buffer pool, if your index is, say 1.5GB.
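A rough way to check whether this applies is to compare the table and index sizes with the configured buffer pool. The sketch below assumes the table is called item and lives in the current schema; the sizes reported by information_schema are estimates for InnoDB, so treat the result as an indication only.

```sql
-- Approximate size of the table and its secondary indexes, in MB.
-- For InnoDB, data_length is the clustered index (the rows themselves)
-- and index_length covers all secondary indexes, such as IDX_SCORE.
SELECT table_name,
       ROUND(data_length  / 1024 / 1024) AS data_mb,
       ROUND(index_length / 1024 / 1024) AS index_mb
FROM information_schema.tables
WHERE table_schema = DATABASE()
  AND table_name = 'item';

-- Configured buffer pool size, in MB, for comparison.
SELECT @@innodb_buffer_pool_size / 1024 / 1024 AS buffer_pool_mb;
```

Because the query in the question is not covered by IDX_SCORE, each index entry also triggers a lookup in the clustered index, so both figures matter, not just index_mb.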
Another tip: you really don't want to use LIMIT 110000, 60. That causes MySQL to read 110000 rows from the buffer pool (possibly loading them from disk if necessary) just to discard them. There are other ways to page through result sets much more efficiently.
See articles such as Optimized Pagination using MySQL.
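One such approach is to remember the sort key of the last row shown and continue from there instead of using an offset. This is only a sketch: it assumes id is the primary key, that score is never NULL for the rows of interest, and that the application keeps the last row's score and id (the variable names and values here are made up).

```sql
-- Hypothetical values: score and id of the last row on the previous page.
SET @last_score = 57, @last_id = 123456;

SELECT i.id, sale_type, property_type, title, property_name, latitude,
       longitude, street_number, street_name, post_code, picture, url,
       score, dw_id, post_date
FROM item i
WHERE picture IS NOT NULL AND picture != ''
  AND sale_type = 0
  -- Continue strictly after the last row seen; id breaks ties on score.
  AND (score < @last_score
       OR (score = @last_score AND i.id < @last_id))
ORDER BY score DESC, i.id DESC
LIMIT 60;
```

The trade-off is that this supports "next page" navigation rather than jumping straight to an arbitrary page number.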
I have a db with around 600 000 listings, while browsing these on a page with pagination, I use this query to limit records:
SELECT file_id, file_category FROM files ORDER BY file_edit_date DESC LIMIT 290580, 30
On the first pages (LIMIT 0, 30) it loads in a few ms, and the same goes for LIMIT 30,30, LIMIT 60,30, LIMIT 90,30, etc. But as I move toward the last pages, the query takes around 1 second to execute.
Indexes are probably not the cause, since it also happens if I run this:
SELECT * FROM `files` LIMIT 400000,30
Not sure why.
Is there a way to improve this ?
Unless there is a better solution, would it be a bad practice to just load all records and loop over them in the PHP page to see if the record is inside the pagination range and print it ?
Server is an i7 with 16GB ram;
MySQL Community Server 5.7.28;
files table is around 200 MB
here is the my.cnf if it matters
query_cache_type = 1
query_cache_size = 1G
sort_buffer_size = 1G
thread_cache_size = 256
table_open_cache = 2500
query_cache_limit = 256M
innodb_buffer_pool_size = 2G
innodb_log_buffer_size = 8M
tmp_table_size=2G
max_heap_table_size=2G
You may find that adding the following index will help performance:
CREATE INDEX idx ON files (file_edit_date DESC, file_id, file_category);
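If used, MySQL would need only a single index scan to reach the records at a given offset. Note that the index includes every column in the select clause, so that it may cover the entire query. (On MySQL 5.7, the DESC keyword in an index definition is parsed but ignored; the index can still be read in reverse to satisfy ORDER BY ... DESC. True descending indexes arrived in MySQL 8.0.)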
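To see whether the optimizer actually uses the new index as a covering index, the query from the question can be run through EXPLAIN. This is a quick check, not a guarantee of speed, since the offset still has to be stepped over.

```sql
-- If the index covers the query, the Extra column should show "Using index"
-- and the key column should name the new index (idx).
EXPLAIN
SELECT file_id, file_category
FROM files
ORDER BY file_edit_date DESC
LIMIT 290580, 30;
```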
LIMIT was invented to reduce the size of the result set; the optimizer can take advantage of it if you order the result set using an index.
When using LIMIT x,n the server needs to process x+n rows to deliver a result. The higher the value for x, the more rows have to be processed.
Here is the explain output from a simple table, having a unique index on column a:
MariaDB [test]> explain select a,b from t1 order by a limit 0, 2;
+------+-------------+-------+-------+---------------+---------+---------+------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+-------+---------------+---------+---------+------+------+-------+
| 1 | SIMPLE | t1 | index | NULL | PRIMARY | 4 | NULL | 2 | |
+------+-------------+-------+-------+---------------+---------+---------+------+------+-------+
1 row in set (0.00 sec)
MariaDB [test]> explain select a,b from t1 order by a limit 400000, 2;
+------+-------------+-------+-------+---------------+---------+---------+------+--------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+-------+---------------+---------+---------+------+--------+-------+
| 1 | SIMPLE | t1 | index | NULL | PRIMARY | 4 | NULL | 400002 | |
+------+-------------+-------+-------+---------------+---------+---------+------+--------+-------+
1 row in set (0.00 sec)
When running the statements above (without EXPLAIN) the execution time for LIMIT 0 is 0.01 secs, for LIMIT 400000 0.6 secs.
Since MariaDB doesn't support LIMIT inside an IN subquery, you could split your SQL statement into two statements:
The first statement retrieves the ids (and needs to read the index file only), the second statement uses the ids retrieved by the first statement:
MariaDB [test]> select a from t1 order by a limit 400000, 2;
+--------+
| a |
+--------+
| 595312 |
| 595313 |
+--------+
2 rows in set (0.08 sec)
MariaDB [test]> select a,b from t1 where a in (595312,595313);
+--------+------+
| a | b |
+--------+------+
| 595312 | foo |
| 595313 | foo |
+--------+------+
2 rows in set (0.00 sec)
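If a single statement is preferred, note that the LIMIT restriction applies to IN/ANY/ALL subqueries; a LIMIT inside a derived table (a subquery in the FROM clause) is accepted. A sketch of the same two steps combined into one join, using the t1 table from above (the alias ids is arbitrary):

```sql
SELECT t1.a, t1.b
FROM t1
JOIN (
    -- Inner query reads only column a to find the keys of the wanted page.
    SELECT a
    FROM t1
    ORDER BY a
    LIMIT 400000, 2
) AS ids ON ids.a = t1.a
ORDER BY t1.a;
```

The inner query still steps over 400000 index entries, so this reduces the cost per skipped row rather than removing it.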
Caution: I am about to use some strong language. Computers are big and fast, and they can handle bigger stuff than they could even a decade ago. But, as you are finding out, there are limits. I'm going to point out multiple limits that you have threatened; I will try to explain why the limits may be a problem.
Settings
query_cache_size = 1G
is terrible. Whenever a table is written to, the QC scans the 1GB looking for any references to that table in order to purge entries in the QC. Decrease that to 50M. This, alone, will speed up the entire system.
sort_buffer_size = 1G
tmp_table_size=2G
max_heap_table_size=2G
are bad for a different reason. If you have multiple connections performing complex queries, lots of RAM could be allocated for each, thereby chewing up RAM, leading to swapping, and possibly crashing. Don't set them higher than about 1% of RAM.
In general, do not blindly change values in my.cnf. The most important setting is innodb_buffer_pool_size, which should be bigger than your dataset, but no bigger than 70% of available RAM.
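As a sketch of what more conservative values could look like on the 16 GB server from the question: the query cache figure is the one suggested above, while the other numbers are illustrative assumptions chosen to stay well below 1% of RAM (about 160M), not tuned recommendations.

```sql
-- Shrink the query cache to the size suggested above.
SET GLOBAL query_cache_size    = 50 * 1024 * 1024;

-- These can be allocated per connection or per query, so keep them small.
-- Illustrative values only.
SET GLOBAL sort_buffer_size    = 2  * 1024 * 1024;
SET GLOBAL tmp_table_size      = 64 * 1024 * 1024;
SET GLOBAL max_heap_table_size = 64 * 1024 * 1024;
```

SET GLOBAL only affects connections opened afterwards and, on MySQL 5.7, is lost at restart, so the same values would also need to go into my.cnf.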
load all records
Ouch! The cost of shoveling all that data from MySQL to PHP is non-trivial. Once it gets to PHP, it will be stored in structures that are not designed for huge amounts of data -- 400030 (or 600000) rows might take 1GB inside PHP; this would probably blow out its "memory_limit", leading to PHP crashing. (OK, just dying with an error message.) It is possible to raise that limit, but then PHP might push MySQL out of memory, leading to swapping, or maybe running out of swap space. What a mess!
OFFSET
As for the large OFFSET, why? Do you have a user paging through the data? And he is almost to page 10,000? Are there cobwebs covering him?
OFFSET must read and step over 290580 rows in your example. That is costly.
For a way to paginate without that overhead, see http://mysql.rjweb.org/doc.php/pagination .
If you have a program 'crawling' through all 600K rows, 30 at a time, then the tip about "remember where you left off" in that link will work very nicely for such use. It does not "slow down".
If you are doing something different; what is it?
Pagination and gaps
Not a problem. See also: http://mysql.rjweb.org/doc.php/deletebig#deleting_in_chunks which is more aimed at walking through an entire table. It focuses on an efficient way to find the 30th row going forward. (This is not necessarily any better than remembering the last id.)
That link is aimed at DELETEing, but can easily be revised to SELECT.
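A minimal sketch of that chunk-walking idea revised for SELECT, assuming file_id is the PRIMARY KEY of files (the question does not say so) and that the client repeats these steps until @next comes back NULL. The variable names are made up.

```sql
-- Where the current chunk starts; 0 for the very first chunk.
SET @left_off = 0;

-- Peek forward: find the key that begins the next chunk of 30 rows.
-- If fewer than 31 rows remain, no row is returned and @next stays NULL.
SET @next = NULL;
SELECT file_id INTO @next
FROM files
WHERE file_id >= @left_off
ORDER BY file_id
LIMIT 30, 1;

-- Fetch the current chunk; on the last chunk there is no upper bound.
SELECT file_id, file_category
FROM files
WHERE file_id >= @left_off
  AND (@next IS NULL OR file_id < @next)
ORDER BY file_id;

-- Remember where you left off, then repeat from the "peek forward" step.
SET @left_off = @next;
```

Gaps in file_id do not matter here, because the chunk boundaries come from the data itself rather than from arithmetic on the ids.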
Some math for scanning a 600K-row table 30 rows at a time:
My links: 600K rows are touched. Or twice that, if you peek forward with LIMIT 30,1 as suggested in the second link.
OFFSET ..., 30 must touch (600K/30)*600K/2 rows -- about 6 billion rows.
(Corollary: changing 30 to 100 would speed up your query, though it would still be painfully slow. It would not speed up my approach, but it is already quite fast.)
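The arithmetic behind those figures can be checked directly; this only restates the numbers above (600K rows, 30 per page) and uses the average offset of half the table as an approximation.

```sql
SELECT 600000 / 30                AS pages,                   -- 20,000 pages
       600000 / 2                 AS avg_rows_skipped,        -- average OFFSET per page
       (600000 / 30) * 600000 / 2 AS rows_touched_with_offset, -- about 6 billion
       600000                     AS rows_touched_remembering_position;
```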
I am trying to tune the InnoDB buffer pool flushing parameters.
The MySQL 5.7 manual says:
innodb_lru_scan_depth * innodb_buffer_pool_instances = amount of work performed by the page cleaner thread each second
My question is : How can I calculate the amount of work performed by the page cleaner thread each second?
Run the SQL command:
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_flushed'
Once every second. Compare the value to the previous second.
The difference of that value from one second to the next is the number of dirty pages the page cleaner requested to flush to disk.
Example:
mysql> SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_flushed';
+----------------------------------+-----------+
| Variable_name | Value |
+----------------------------------+-----------+
| Innodb_buffer_pool_pages_flushed | 496786650 |
+----------------------------------+-----------+
...wait a moment...
mysql> SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_flushed';
+----------------------------------+-----------+
| Variable_name | Value |
+----------------------------------+-----------+
| Innodb_buffer_pool_pages_flushed | 496787206 |
+----------------------------------+-----------+
So in the moment I waited, the page cleaner flushed 556 pages.
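The same measurement can be scripted in one session. This sketch assumes MySQL 5.7 with performance_schema.global_status available; on older versions, or with show_compatibility_56 enabled, information_schema.GLOBAL_STATUS would be the table to read instead. The second output column is simply the product quoted from the manual in the question, shown for comparison.

```sql
-- Read the counter, wait one second, read it again.
SELECT CAST(VARIABLE_VALUE AS UNSIGNED) INTO @before
FROM performance_schema.global_status
WHERE VARIABLE_NAME = 'Innodb_buffer_pool_pages_flushed';

DO SLEEP(1);

SELECT CAST(VARIABLE_VALUE AS UNSIGNED) INTO @after
FROM performance_schema.global_status
WHERE VARIABLE_NAME = 'Innodb_buffer_pool_pages_flushed';

-- Pages flushed during the interval, next to the configured scan depth product.
SELECT @after - @before AS pages_flushed_in_interval,
       @@innodb_lru_scan_depth * @@innodb_buffer_pool_instances AS lru_scan_depth_x_instances;
```

A single one-second sample is noisy, so repeating it several times under typical load gives a more useful picture.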
The upper limit of this work is a complex calculation, involving several InnoDB configuration options. Read my answer to How to solve mysql warning: "InnoDB: page_cleaner: 1000ms intended loop took XXX ms. The settings might not be optimal "? for a description of how it works.
I have a database of just 5 million rows, but inner joins and IN queries are taking too much time (55 to 60 seconds), so I am checking whether there is a problem with my MyISAM settings.
Query: SHOW STATUS LIKE 'key%'
+------------------------+-------------+
| Variable_name | Value |
+------------------------+-------------+
| Key_blocks_not_flushed | 0 |
| Key_blocks_unused | 275029 |
| Key_blocks_used | 3316428 |
| Key_read_requests | 11459264178 |
| Key_reads | 3385967 |
| Key_write_requests | 91281692 |
| Key_writes | 27930218 |
+------------------------+-------------+
Please give me your suggestions for increasing MyISAM performance.
I have worked with a database of more than 45GB and also faced performance issues.
Here are some of the steps I took to improve performance.
(1) Remove any unnecessary indexes on the table, paying particular attention to UNIQUE indexes as these disable change buffering. Don't use a UNIQUE index if you have no reason for that constraint; prefer a regular INDEX.
(2) Inserting in order will result in fewer page splits (which will perform worse on tables not in memory), and the bulk loading is not specifically related to the table size, but it will help reduce redo log pressure.
(3) If bulk loading a fresh table, delay creating any indexes besides the PRIMARY KEY. If you create them once all data is loaded, then InnoDB is able to apply a pre-sort and bulk load process which is both faster and results in typically more compact indexes. This optimization became true in MySQL 5.5.
(4) Make sure to use InnoDB instead of MyISAM. MyISAM can be faster at inserts to the end of a table, but InnoDB uses row-level locking while MyISAM uses table-level locking.
(5) Try to avoid complex SELECT queries on MyISAM tables that are updated frequently, and write queries so that the first condition already narrows the result down to as few rows as possible.
(6) For MyISAM tables that change frequently, try to avoid all variable-length columns (VARCHAR, BLOB, and TEXT). The table uses dynamic row format if it includes even a single variable-length column
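The counters posted in the question can also be turned into a few ratios before changing any settings. The numbers below are copied from that SHOW STATUS output; the last column assumes the default key cache is the one in use.

```sql
SELECT 3385967  / 11459264178 AS key_cache_miss_rate,  -- Key_reads / Key_read_requests, about 0.0003
       27930218 / 91281692    AS key_write_ratio,      -- Key_writes / Key_write_requests, about 0.31
       275029 * @@key_cache_block_size / 1024 / 1024 AS unused_key_cache_mb;  -- Key_blocks_unused in MB
```

A miss rate this far below the commonly cited guideline of 0.01 suggests the key cache itself is probably not the bottleneck, so the slow joins are more likely a matter of indexing or query shape.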
We have a big table with the following table structure:
CREATE TABLE `location_data` (
`id` int(20) NOT NULL AUTO_INCREMENT,
`dt` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`device_sn` char(30) NOT NULL,
`data` char(20) NOT NULL,
`gps_date` datetime NOT NULL,
`lat` double(30,10) DEFAULT NULL,
`lng` double(30,10) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `dt` (`dt`),
KEY `data` (`data`),
KEY `device_sn` (`device_sn`,`data`,`dt`),
KEY `device_sn_2` (`device_sn`,`dt`)
) ENGINE=MyISAM AUTO_INCREMENT=721453698 DEFAULT CHARSET=latin1
We frequently run queries such as the following:
SELECT * FROM location_data WHERE device_sn = 'XXX' AND data = 'location' ORDER BY dt DESC LIMIT 1;
OR
SELECT * FROM location_data WHERE device_sn = 'XXX' AND data = 'location' AND dt >= '2014-01-01 00:00:00 ' AND dt <= '2014-01-01 23:00:00' ORDER BY dt DESC;
We have been optimizing this in a few ways:
By adding index and using FORCE INDEX on device_sn.
Separating the table into multiple tables based on the date (e.g. location_data_20140101). We pre-check whether there is data for a given date and, if so, query only that particular table. These tables are created by cron once a day, and the data for that date is then deleted from location_data.
The table location_data is HIGH WRITE and LOW READ.
However, the query sometimes runs really slowly. I wonder if there are other methods, or ways to restructure the data, that would allow us to read data sequentially by date for a given device_sn.
Any tips are more than welcomed.
EXPLAIN STATEMENT 1ST QUERY:
+----+-------------+--------------+------+----------------------------+-----------+---------+-------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+----------------------------+-----------+---------+-------------+------+-------------+
| 1 | SIMPLE | location_dat | ref | data,device_sn,device_sn_2 | device_sn | 50 | const,const | 1 | Using where |
+----+-------------+--------------+------+----------------------------+-----------+---------+-------------+------+-------------+
EXPLAIN STATEMENT 2nd QUERY:
+----+-------------+--------------+-------+-------------------------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+-------+-------------------------------+------+---------+------+------+-------------+
| 1 | SIMPLE | test_udp_new | range | dt,data,device_sn,device_sn_2 | dt | 4 | NULL | 1 | Using where |
+----+-------------+--------------+-------+-------------------------------+------+---------+------+------+-------------+
The index device_sn (device_sn, data, dt) is good. MySQL should use it without needing any FORCE INDEX. You can verify this by running "explain select ..."
However, your table is MyISAM, which supports only table-level locks. If the table is write-heavy, reads may be slow. I would suggest converting it to InnoDB.
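Both suggestions can be tried with statements along these lines. This is only a sketch: the ALTER rebuilds the entire table, which for a table with an AUTO_INCREMENT value above 700 million is likely to take a long time and needs spare disk space, so it is worth trying on a copy first.

```sql
-- Check which index the optimizer picks on its own, without FORCE INDEX.
EXPLAIN
SELECT * FROM location_data
WHERE device_sn = 'XXX' AND data = 'location'
ORDER BY dt DESC
LIMIT 1;

-- Convert the table to InnoDB (full table rebuild).
ALTER TABLE location_data ENGINE = InnoDB;
```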
Ok, I'll provide info that I know and this might not answer your question but could provide some insight.
There exist certain differences between InnoDB and MyISAM. Forget about full text indexing or spatial indexes, the huge difference is in how they operate.
InnoDB has several great features compared to MyISAM.
First off, it can store the data set it works with in RAM. This is why database servers come with a lot of RAM - so that I/O operations can be done quickly. For example, an index scan is faster if you have indexes in RAM rather than on HDD because finding data on HDD is several orders of magnitude slower than doing it in RAM. The same applies to full table scans.
The variable that controls this when using InnoDB is called innodb_buffer_pool_size. The default is small: 8 MB in old versions and 128 MB since MySQL 5.5. I personally set this value high, sometimes even up to 90% of available RAM. Usually, when this value is optimized, a lot of people experience incredible speed gains.
The other thing is that InnoDB is a transactional engine. That means it will tell you that a write to disk succeeded or failed and that will be 100% correct. MyISAM won't do that because it doesn't force OS to force HDD to commit data permanently. That's why sometimes records are lost when using MyISAM, it thinks data is written because OS said it was when in reality OS tried to optimize the write and HDD might lose buffer data, thus not writing it down. OS tries to optimize the write operation and uses HDD's buffers to store larger chunks of data and then it flushes it in a single I/O. What happens then is that you don't have control over how data is being written.
With InnoDB you can start a transaction, execute say 100 INSERT queries and then commit. That will effectively force the hard drive to flush all 100 queries at once, using 1 I/O. If each INSERT is 4 KB long, 100 of them is 400 KB. That means you'll utilize 400 KB of your disk's bandwidth with 1 I/O operation and the remainder of the I/O capacity will be available for other uses. This is how inserts are being optimized.
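A minimal sketch of that pattern, assuming location_data has already been converted to InnoDB; the device serial numbers and coordinates are made-up sample values, and only three of the hundred inserts are shown.

```sql
START TRANSACTION;

-- Each INSERT is buffered; nothing has to be flushed to disk yet.
INSERT INTO location_data (device_sn, data, gps_date, lat, lng)
VALUES ('SN-0001', 'location', '2014-01-01 10:00:00', 1.3000000000, 103.8000000000);
INSERT INTO location_data (device_sn, data, gps_date, lat, lng)
VALUES ('SN-0002', 'location', '2014-01-01 10:00:05', 1.3100000000, 103.8100000000);
INSERT INTO location_data (device_sn, data, gps_date, lat, lng)
VALUES ('SN-0003', 'location', '2014-01-01 10:00:10', 1.3200000000, 103.8200000000);

-- The durable flush happens here, once for the whole batch.
COMMIT;
```

With autocommit on and no explicit transaction, each of these statements would be committed, and flushed, individually.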
Next are indexes with low cardinality. Cardinality is the number of unique values in an indexed column. For a primary key every value is unique, so the cardinality equals the row count, which is the highest it can be. Indexes with low cardinality are on columns where you have only a few distinct values, such as yes or no or similar. If an index is too low in cardinality, MySQL will prefer a full table scan - it's MUCH quicker. Also, forcing an index that MySQL doesn't want to use could (and probably will) slow things down - this is because when using an indexed search, MySQL processes records one by one. When it does a table scan, it can read multiple records at once and avoid processing them. If those records were written sequentially on a mechanical disk, further optimizations are possible.
TL;DR:
use InnoDB on a server where you can allocate sufficient RAM
set the value of innodb_buffer_pool_size large enough so you can allocate more resources for faster querying
use an SSD if possible
try to wrap multiple INSERTs into transactions so you can better utilize your hard drive's bandwidth and I/O
avoid indexing columns that have low unique value count compared to row count - they just waste space (though there are exceptions to this)
I'm currently using MySQL workbench. I want to see the difference in performance as the number of rows in a table increases. I want to specifically test and compare 1000 rows, 10,000 rows, 100,000 rows, 1,000,000 rows and 10,000,000 rows.
So, are there any tools that will allow me to do this and provide statistics on disk I/O, memory usage, CPU usage and time to complete query?
Yes. BENCHMARK() is probably your best option for some of these.
You can run simple queries like:
jcho360> select benchmark (10000000,1+1);
+--------------------------+
| benchmark (10000000,1+1) |
+--------------------------+
| 0 |
+--------------------------+
1 row in set (0.18 sec)
jcho360> select benchmark (10000000,1/1);
+--------------------------+
| benchmark (10000000,1/1) |
+--------------------------+
| 0 |
+--------------------------+
1 row in set (1.30 sec)
A sum is faster than a division (you can do this with anything you can imagine).
I'd recommend taking a look at these programs, which will help with this kind of performance testing:
Mysqlslap (it's like benchmark, but you can customize the result more).
SysBench (tests CPU performance, I/O performance, mutex contention, memory speed, database performance).
Mysqltuner (with this you can analyze general statistics, storage engine statistics, performance metrics).
mk-query-profiler (performs analysis of a SQL statement).
mysqldumpslow (good for finding out which queries are causing problems).
Some of them are third party, but I'm pretty sure that you can find tons of info by googling the name of the app.