I am facing a serious performance issue when inserting, selecting and updating rows in a table in MySQL.
The table structure I am using is:
CREATE TABLE `sessions` (
`sessionid` varchar(40) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
`expiry` datetime NOT NULL,
`value` text NOT NULL,
`data` text,
PRIMARY KEY (`sessionid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Sessions';
The queries I have the problem with are:
INSERT INTO sessions (SESSIONID, EXPIRY, DATA, VALUE) VALUES ('b8c10810c505ba170dd9403072b310ed', '2019-05-01 17:25:50', 'PFJlc3BvbnNlIHhtbG5zPSJ1cm46b2FzaXM6bmFtZXM', '7bKDofc/pyFSQhm7QE5jb6951Ahg6Sk8OCVZI7AcbUPb4jZpHdrCAKuCPupJO14DNY3jULxKppLadGlpsKBifiJavZ/');
UPDATE sessions SET EXPIRY = '2019-05-01 17:26:07' WHERE (SESSIONID = 'e99a0889437448091a06a43a44d0f170');
SELECT SESSIONID, EXPIRY, DATA, VALUE FROM sessions WHERE (SESSIONID = '507a752c48fc9cc3043a3dbe889c52eb');
I tried EXPLAIN on the queries but was not able to infer much about optimizing the table or the queries.
From the slow query report, the average time taken (in seconds) is 23.45 for the select, 15.93 for the update and 22.31 for the insert.
Any help in identifying the issue is much appreciated.
How many queries per second?
How big is the table?
How much RAM?
What is the value of innodb_buffer_pool_size?
UUIDs are terrible for performance. (Is that a SHA1?) This is because they are so random that the 'next' query (any of those you mentioned) is likely not to be in cache, hence necessitating a disk hit.
So, with a table that is much larger than the buffer_pool, you won't be able to sustain more than about 100 queries per second with a spinning drive. SSD would be faster.
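A back-of-envelope sketch of that ceiling, assuming uniformly random keys, exactly one disk read per cache miss, and roughly 100 random reads per second for a spinning drive (real drives vary). The function name and parameters are illustrative, not from any tool.

```python
def disk_bound_qps(buffer_pool_gb: float, table_gb: float, disk_iops: float = 100.0) -> float:
    """Estimate sustainable queries/second when each query touches one random key.

    With uniformly random keys, the chance that the needed page is already in
    the buffer pool is roughly buffer_pool / table size; every miss costs one
    random disk read, and the disk can only do `disk_iops` of those per second.
    """
    hit_ratio = min(1.0, buffer_pool_gb / table_gb)
    miss_ratio = 1.0 - hit_ratio
    if miss_ratio == 0.0:
        return float("inf")  # the whole table fits in RAM: not disk-bound
    return disk_iops / miss_ratio


# A 0.8 GB buffer pool in front of 20 GB of data, as reported further down
print(round(disk_bound_qps(0.8, 20), 1))  # 104.2
```

With only 4% of the data cached, nearly every lookup goes to disk, so the estimate barely rises above the drive's own ~100 reads per second.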
More on the evils of UUIDs (SHA1 has the same unfortunate properties, but no solution like the one for uuids): http://mysql.rjweb.org/doc.php/uuid
One minor thing you can do is to shrink the table:
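sessionid BINARY(16)
(enough for the 32-hex-digit IDs in the question; a 40-digit SHA1 needs BINARY(20)), and use UNHEX() when inserting/updating/deleting and HEX() when selecting. If ID lengths vary, use VARBINARY instead: BINARY pads shorter values with 0x00 bytes, and those bytes count in comparisons.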
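Python's bytes.fromhex() and bytes.hex() behave much like MySQL's UNHEX() and HEX() (except that HEX() returns upper-case digits), so they make it easy to check what packing saves. The helper names below are illustrative. The sample IDs in the question are 32 hex digits, which pack to 16 bytes rather than the 20 bytes of a SHA1.

```python
def pack_id(hex_id: str) -> bytes:
    """Pack a hex string into raw bytes, like MySQL's UNHEX()."""
    return bytes.fromhex(hex_id)


def unpack_id(raw: bytes) -> str:
    """Turn raw bytes back into upper-case hex, like MySQL's HEX()."""
    return raw.hex().upper()


sid = "b8c10810c505ba170dd9403072b310ed"  # sample session ID from the question
packed = pack_id(sid)
print(len(sid), "->", len(packed), "bytes")  # 32 -> 16 bytes
assert unpack_id(packed) == sid.upper()      # the round trip is lossless
```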
More
51KB avg row len --> The TEXT columns are big, and "off-record", hence multiple blocks needed to work with a row.
0.8GB buffer_pool, but 20GB of data, and 'random' PRIMARY KEY --> The cache is virtually useless.
These mean that there will be multiple disk hits for each query, but probably under 10.
300ms (a fast time) --> about 30 disk hits on HDD (more on SSD; which do you have?).
So, I must guess that 20s for a query happened when there was a burst of activity that had the queries stumbling over each other, leading to lots of I/O contention.
What to do? The session IDs look like hex. Packing hex with UNHEX() and storing it in BINARY(..) or BLOB cuts its disk footprint in half (and cuts back some on the disk hits needed). The DATA and VALUE samples, however, look like base64 rather than hex, so UNHEX() would return NULL for them; FROM_BASE64() is the matching function for those columns.
INSERT INTO sessions (SESSIONID, EXPIRY, DATA, VALUE)
VALUES (UNHEX('b8c10810c505ba170dd9403072b310ed'),
'2019-05-01 17:25:50',
FROM_BASE64('PFJlc3BvbnNlIHhtbG5zPSJ1cm46b2FzaXM6bmFtZXM'),
FROM_BASE64('7bKDofc/pyFSQhm7QE5jb6951Ahg6Sk8OCVZI7AcbUPb4jZpHdrCAKuCPupJO14DNY3jULxKppLadGlpsKBifiJavZ/'));
UPDATE sessions SET EXPIRY = '2019-05-01 17:26:07'
WHERE SESSIONID = UNHEX('e99a0889437448091a06a43a44d0f170');
SELECT SESSIONID, EXPIRY, DATA, VALUE FROM sessions
WHERE SESSIONID = UNHEX('507a752c48fc9cc3043a3dbe889c52eb');
and, in the table definition:
`sessionid` VARBINARY(20) NOT NULL,
`expiry` datetime NOT NULL,
`value` BLOB NOT NULL,
`data` BLOB,
And ROW_FORMAT=DYNAMIC might be optimal (but this is not critical).
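A quick check with Python's base64 module suggests the DATA and VALUE columns hold base64 rather than hex: the DATA sample decodes to the start of an XML document. The sample in the question looks truncated, so this sketch restores the missing '=' padding. Base64 packs 3 bytes into 4 characters, so storing the decoded bytes in a BLOB saves about a quarter of the space rather than half; in MySQL 5.6 and later, FROM_BASE64() and TO_BASE64() play the roles that UNHEX() and HEX() play for hex.

```python
import base64


def pack_base64(text: str) -> bytes:
    """Decode base64 text to raw bytes, restoring '=' padding if it was cut off."""
    padded = text + "=" * (-len(text) % 4)
    return base64.b64decode(padded, validate=True)


data = "PFJlc3BvbnNlIHhtbG5zPSJ1cm46b2FzaXM6bmFtZXM"  # DATA sample from the question
raw = pack_base64(data)
print(raw)                        # b'<Response xmlns="urn:oasis:names'
print(len(data), "->", len(raw))  # 43 -> 32
```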
Your queries look good; the problem is with your server. It may not have enough memory to handle such requests. You can increase the memory of your database server to get faster responses.
I have a table like this:
create table test (
id int primary key auto_increment,
idcard varchar(30),
name varchar(30),
custom_value varchar(50),
index i1(idcard)
)
I inserted 30,000,000 rows into the table
and then executed:
select * from test where idcard='?'
The statement took 12 seconds to return.
When I use iostat to monitor the disk,
the read speed is about 6 MB/s while %util is 94%.
Is there any way to optimize it?
12 seconds may be realistic.
Assumptions about the question:
A total of 30M rows, but only 3000 rows in the resultset.
Not enough room to cache things in RAM or you are running from a cold start.
InnoDB or MyISAM (the analysis is the same; the details are radically different).
Any CHARACTER SET and COLLATION for idcard.
INDEX(idcard) exists and is used in the query.
HDD disk drive, not SSD.
Here's a breakdown of the processing:
Go to the index, find the first entry with ?, scan forward until hitting an entry that is not ? (about 3K rows later).
For each of those 3K items, reach into the table to find all the columns (cf. SELECT *).
Deliver them.
Step 1: Fast.
Step 2: This is (based on the assumption of not being cached) costly. It may involve about 3K disk hits. For an HDD, that would be about 30 seconds. So, 12 seconds could imply some of the stuff was cached or happened to be near each other.
Step 3: This is a network cost, which I am not considering.
Run the query a second time. It may take only 1 second this time -- because all 3K blocks are cached in RAM! And iostat will show zero activity!
Is there any way to optimize it?
Well...
You already have the best index.
What are you going to do with 3000 rows all at once? Is this a one-time task?
When using InnoDB, innodb_buffer_pool_size should be about 70% of available RAM, but not so big that it leads to swapping. What is its setting, and how much RAM do you have and what else is running on the machine?
Could you do more of the task while you are fetching the 3K rows?
Switching to SSDs would help, but I don't like hardware bandaids; they are not reusable.
How big is the table (in GB) -- perhaps 3GB data plus index? (SHOW TABLE STATUS.) If you can't make the buffer_pool big enough for it, and you have a variety of queries that compete for different parts of this (and other) tables, then more RAM may be beneficial.
Seems more like an I/O limitation than something that could be solved by adding indices. What will improve the speed is changing the collation of the idcard column to latin1_bin. This uses only 1 byte per character. It also uses binary comparison, which is faster than case-insensitive comparison.
Only do this if you have no special characters in the idcard column, because the character set of latin1 is quite limited.
ALTER TABLE `test` CHANGE COLUMN `idcard` `idcard` VARCHAR(30) COLLATE 'latin1_bin' AFTER `id`;
Furthermore the ROW_FORMAT=FIXED also improves the speed. ROW_FORMAT=FIXED is not available using the InnoDB engine, but it is with MyISAM. The resulting table I now have is shown below. It's 5 times quicker (80% less time) with select statements than the initial table.
Note that I also changed the collation for 'name' and 'custom_value' to latin1_bin. This does make quite a difference in speed in my test setup, and I'm still figuring out why.
CREATE TABLE `test` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`idcard` VARCHAR(30) COLLATE 'latin1_bin',
`name` VARCHAR(30) COLLATE 'latin1_bin',
`custom_value` VARCHAR(50) COLLATE 'latin1_bin',
PRIMARY KEY (`id`),
INDEX `i1` (`idcard`)
)
ENGINE=MyISAM
ROW_FORMAT=FIXED ;
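One plausible explanation for why changing name and custom_value matters too: in a fixed-width row every VARCHAR takes its full declared width, and a utf8 (utf8mb3) column reserves 3 bytes per character even for plain ASCII data. This sketch approximates the data bytes per row under those assumptions; it is not MySQL's exact storage formula.

```python
# Maximum bytes per character (MySQL's utf8 is utf8mb3: up to 3 bytes).
BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}


def fixed_row_bytes(varchar_chars, charset: str, int_columns: int = 1) -> int:
    """Approximate data bytes per row if every VARCHAR is stored at full width.

    Ignores MyISAM's row header and NULL bitmap; assumes a 1-byte length
    prefix for columns up to 255 bytes and 2 bytes beyond that.
    """
    total = 4 * int_columns  # INT columns are 4 bytes each
    for n_chars in varchar_chars:
        width = n_chars * BYTES_PER_CHAR[charset]
        total += width + (1 if width <= 255 else 2)
    return total


columns = [30, 30, 50]  # idcard, name, custom_value
print(fixed_row_bytes(columns, "latin1"))  # 117
print(fixed_row_bytes(columns, "utf8"))    # 337
```

Under these assumptions the utf8 rows are almost three times wider, so reading the same rows means reading almost three times as many bytes.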
You may try adding the three other columns in the select clause to the index:
CREATE INDEX idx ON test (idcard, id, name, custom_value);
The three columns other than idcard are being added to allow the index to cover everything being selected. The problem with your current index is that it is only on idcard. This means that once MySQL has traversed down to each leaf node in the index, it would have to do another seek back to the clustered index to lookup the values of all columns mentioned in the select *. As a result of this, MySQL may choose to ignore the index completely. The suggestion I made above avoids this additional seek.
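The effect is easy to see with SQLite's EXPLAIN QUERY PLAN, used here only as a stand-in because it runs without a server. MySQL reports the same situation as "Using index" in the Extra column of EXPLAIN. The table mirrors the question's, with SQLite types.

```python
import sqlite3


def plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return the detail text of SQLite's EXPLAIN QUERY PLAN for a statement."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, idcard TEXT, name TEXT, custom_value TEXT)")
query = "SELECT * FROM test WHERE idcard = 'X123'"

conn.execute("CREATE INDEX i1 ON test (idcard)")
before = plan(conn, query)  # finds matches in i1, then fetches each row from the table

conn.execute("DROP INDEX i1")
conn.execute("CREATE INDEX idx ON test (idcard, id, name, custom_value)")
after = plan(conn, query)   # every selected column is in idx: no table lookup

print(before)
print(after)
```

In InnoDB every secondary index already carries the primary key, so listing id in the index is harmless but not required.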
I would appreciate it if someone could explain how it is possible that MySQL is not churning on a large table with the default config.
Note: I don't need advice on how to increase the memory, improve the performance, migrate, etc. I want to understand why it is working and performing well.
I have the following table:
CREATE TABLE `daily_reads` (
`a` varchar(32) NOT NULL DEFAULT '',
`b` varchar(50) NOT NULL DEFAULT '',
`c` varchar(20) NOT NULL DEFAULT '',
`d` varchar(20) NOT NULL DEFAULT '',
`e` varchar(20) NOT NULL DEFAULT '',
`f` varchar(10) NOT NULL DEFAULT 'Wh',
`g` datetime NOT NULL,
`PERIOD_START` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`i` decimal(16,3) NOT NULL,
`j` decimal(16,3) NOT NULL DEFAULT '0.000',
`k` decimal(16,2) NOT NULL DEFAULT '0.00',
`l` varchar(1) NOT NULL DEFAULT 'N',
`m` varchar(1) NOT NULL DEFAULT 'N',
PRIMARY KEY (`a`,`b`,`c`,`PERIOD_START`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
It is running on a VM with 1 CPU Core, 6GB RAM, CentOS 7 (have very limited access to that VM).
It is running on a default MySQL config with a 128MB buffer pool (SELECT @@innodb_buffer_pool_size/1024/1024).
DB size is ~96GB, ~560M rows in the 'reads' table, ~710M rows with other tables.
select database_name, table_name, index_name, stat_value*@@innodb_page_size
from mysql.innodb_index_stats where stat_name='size';
PRIMARY: 83,213,500,416 (no other indexes)
I get about 500K reads/month, and writes are done only as part of an ETL process, directly from Informatica to the DB (~75M writes/month).
The read queries are called only via a stored procedure:
CALL sp_get_meter_data('678912345678', '1234567765432', '2017-01-13 00:00:00', '2017-05-20 00:00:00');
-- stripped out the unimportant bits:
...
SET daily_from_date = DATE_FORMAT(FROM_DATE_TIME, '%Y-%m-%d 00:00:00');
SET daily_to_date = DATE_FORMAT(TO_DATE_TIME, '%Y-%m-%d 23:59:59');
...
SELECT
*
FROM
daily_reads
WHERE
A = FRIST_NUMBER
AND
B = SECOND_NUMBER
AND
daily_from_date <= PERIOD_START
AND
daily_to_date >= PERIOD_START
ORDER BY
PERIOD_START ASC;
My understanding of InnoDB is quite limited, but I thought I needed to fit all indexes into memory to get fast queries. The read procedure takes only a few milliseconds. I thought it was not technically possible to query a 500M+ row table that fast on a default MySQL config...?
What am I missing?
Long answer: Your primary key is a composite of several columns starting with a and b.
Your WHERE clause says this.
WHERE a = FRIST_NUMBER
AND b = SECOND_NUMBER
AND etc etc.
This WHERE clause exploits the index associated with your primary key very efficiently indeed. It random-accesses the index to precisely the first row it needs, and then scans it sequentially. So it doesn't actually have to page in much of your index or your table to satisfy your query.
Short answer: When queries exploit indexes, MySQL is fast and cheap.
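A toy model of that access pattern, using a sorted Python list in place of the clustered index and bisect for the B-tree descent. The data and the function name are made up; the point is that the number of rows examined depends on how many rows share the (a, b) prefix, not on the table size. Because the key has c between b and PERIOD_START, the sketch reads every row for the (a, b) pair and filters the dates.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def range_scan(rows, a, b, start, end):
    """Emulate a primary-key lookup on (a, b, c, period_start).

    `rows` must be sorted like the clustered index. One binary search (the
    B-tree descent) finds the first entry with the (a, b) prefix; the loop
    then reads forward only while the prefix still matches.
    """
    i = bisect_left(rows, (a, b))  # (a, b) sorts before every (a, b, ...)
    examined, hits = 0, []
    while i < len(rows) and rows[i][:2] == (a, b):
        examined += 1
        if start <= rows[i][3] <= end:
            hits.append(rows[i])
        i += 1
    hits.sort(key=lambda r: r[3])  # ORDER BY PERIOD_START
    return hits, examined


# Synthetic data: 3 values of a x 2 values of b x 100 daily reads = 600 rows
base = datetime(2017, 1, 1)
rows = sorted(
    (a, b, "c0", base + timedelta(days=d))
    for a in ("111", "222", "333")
    for b in ("x", "y")
    for d in range(100)
)
hits, examined = range_scan(rows, "222", "y", datetime(2017, 1, 13), datetime(2017, 1, 22, 23, 59, 59))
print(len(hits), examined, len(rows))  # 10 100 600
```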
If you wanted an index that was perfect for this query, it would be a composite index on (a, b, PERIOD_START). This would use equality matching to hit the first matching row in the index, then range scan the index for your chosen date range. (Your primary key has c between b and PERIOD_START, so it has to scan all the rows for a given a and b.) But the performance you have now is pretty good.
You asked whether the index must fit entirely in memory. No. The entire purpose of DBMS software is to handle volumes of data that can't possibly fit in memory at once. Good DBMS implementations do a good job of maintaining memory caches, and refreshing those caches from mass storage, when needed. The innodb buffer pool is one such cache. Keep in mind that any insertions or updates to a table require both the table data and the index data to be written to mass storage eventually.
Performance can be improved with an index.
In your specific case, you are filtering on 3 columns: A, B, and PERIOD_START.
To speed up the query you can use an index on these columns.
Adding an index on PERIOD_START can be inefficient because this type also stores time information, so you have a lot of different values within the same day.
You can add a new column to store the DATE part of PERIOD_START in the correct type (DATE) (something like PERIOD_START_DATE) and add an index on this column.
This makes the indexing more effective and can improve performance, because the index acts as a lookup table (key -> values).
If you do not want to change your client code, you can use a stored generated column (see the MySQL manual).
Best regards
It's possible your index is getting used (probably not, given that the leading edge doesn't match the columns in your query), but even if it isn't, you'd only ever read through the table once, because the query doesn't have any joins and subsequent runs would pick up the cached results.
Since you're using Informatica to load the data (it's a Swiss Army knife of data loading), it may be doing a lot more than you realise: e.g., assuming the data load is all inserts, it may drop and recreate indexes and run in bulk mode to load the data really quickly. It may even pre-run the query to prime your cache with the first post-load run.
Doesn't the index have to fit in memory?
No, the entire index does not have to fit in memory. Only the part of the index that needs to be examined during the query execution.
Since you have conditions on the left-most columns of your primary key (which is your clustered index), the query only examines rows that match the values you search for. The rest of the table is not examined at all.
You can try using EXPLAIN with your query and see an estimate of the number of rows examined. This is only a rough estimate calculated by the optimizer, but it should show that your query only needs to examine a small subset of the 550 million rows.
The InnoDB buffer pool keeps copies of frequently-used pages in RAM. The more frequently a page is used, the more likely it is to stay in the buffer pool and not get kicked out. Over time, as you run queries, your buffer pool gradually stabilizes with the set of pages that is most worth keeping in RAM.
If your query workload were to really scan your entire table frequently, then the small buffer pool would churn a lot more. But it's likely that your queries request the same small subset of the table repeatedly. A phenomenon called the Pareto Principle applies in many real-world applications: the majority of the requests are satisfied by a small minority of data.
This principle tends to fail when we run complex analytical queries, because those queries are more likely to scan the entire table.
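A small simulation shows the effect. It replays page requests through a plain LRU cache holding 1% of the "table"; InnoDB's buffer pool actually uses a midpoint-insertion variant of LRU, and the page counts and the 90/10 skew are invented for illustration.

```python
import random
from collections import OrderedDict


def lru_hit_ratio(accesses, capacity: int) -> float:
    """Replay page accesses through a plain LRU cache and return the hit ratio."""
    cache = OrderedDict()
    hits = 0
    for page in accesses:
        if page in cache:
            hits += 1
            cache.move_to_end(page)        # mark as most recently used
        else:
            cache[page] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used page
    return hits / len(accesses)


rng = random.Random(42)
PAGES, CAPACITY, N = 10_000, 100, 100_000  # the cache holds 1% of the pages
uniform = [rng.randrange(PAGES) for _ in range(N)]
# 90% of requests go to a hot set of 50 pages, the rest anywhere in the table
skewed = [rng.randrange(50) if rng.random() < 0.9 else rng.randrange(PAGES) for _ in range(N)]

uniform_ratio = lru_hit_ratio(uniform, CAPACITY)
skewed_ratio = lru_hit_ratio(skewed, CAPACITY)
print(f"uniform: {uniform_ratio:.2f}  skewed: {skewed_ratio:.2f}")
```

The uniform workload hits the cache about 1% of the time, while the skewed one stays mostly in RAM even though the cache is just as small.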
In my database I have a table named fo_image_guestimage. It contains just over 263,000 rows, but when I try to update only one value in it, it takes too much time (121.683ms).
My Table structure-
my query execution and its time
How can I minimize the query time in MySQL? The table type is InnoDB.
EDIT 1-
Database size: 3.5 GB; fo_guest_image table size: 2.8 GB
Table Structure
CREATE TABLE `fo_guest_image` (
`Fo_Image_Id` INT(10) NOT NULL AUTO_INCREMENT,
`Fo_Image_Regno` VARCHAR(10) NULL,
`Fo_Image_GuestHistoryId` INT(10) NOT NULL,
`Fo_Image_Photo` LONGBLOB NOT NULL,
`Fo_Image_Doc1` LONGBLOB NOT NULL,
`Fo_Image_Doc2` LONGBLOB NOT NULL,
`Fo_Image_Doc3` LONGBLOB NOT NULL,
`Fo_Image_Doc4` LONGBLOB NOT NULL,
`Fo_Image_Doc5` LONGBLOB NOT NULL,
`Fo_Image_Doc6` LONGBLOB NOT NULL,
`Fo_Image_Billno` VARCHAR(10) NULL,
PRIMARY KEY (`Fo_Image_Id`)
)
ENGINE=InnoDB
ROW_FORMAT=DEFAULT
AUTO_INCREMENT=36857
Query With Execution Time -
select COUNT(Fo_Image_Regno) from fo_guest_image; Time: 11.483ms
select * from fo_guest_image where Fo_Image_Regno='G13603'; Time: 101.381ms
update fo_guest_image set Fo_Image_Regno='T13603' where Fo_Image_Regno='G13603'; Time: 144.360ms
For comparison, I tried a non-BLOB table, fo_daybook (size 400 KB):
Query With Execution Time -
select * from fo_daybook; Time: 0.144ms
select fo_daybok_Regno from fo_daybook; Time: 0.004ms
update fo_daybook set fo_daybok_Regno ='T13603' where fo_daybok_Regno ='G13603'; Time: 0.011ms
My client adds 1000 rows per day to fo_guest_image. The table is now 2.8 GB, and it will surely keep growing day by day. I am scared of what will happen to performance when the table one day reaches 10 GB.
Short solution: add an index on the column "Fo_Image_Regno".
Longer, but much better, solution: do not store images in the tables. Store only links to the images in the table, and store the images themselves in local folders/directories on the system. It will be much better now and in the future.
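A minimal sketch of that layout, using Python's built-in sqlite3 as a stand-in for MySQL. The table guest_image, its columns, and naming files by their SHA-256 hash are illustrative choices, not part of the original answer.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def store_image(conn: sqlite3.Connection, root: Path, regno: str, data: bytes) -> Path:
    """Write the image bytes to a file and keep only the path in the table."""
    digest = hashlib.sha256(data).hexdigest()
    path = root / digest[:2] / f"{digest}.bin"  # spread files over subfolders
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    conn.execute(
        "INSERT INTO guest_image (regno, photo_path) VALUES (?, ?)",
        (regno, str(path)),
    )
    return path


def load_image(conn: sqlite3.Connection, regno: str):
    """Look up the path by registration number and read the file back."""
    row = conn.execute(
        "SELECT photo_path FROM guest_image WHERE regno = ?", (regno,)
    ).fetchone()
    return Path(row[0]).read_bytes() if row else None


conn = sqlite3.connect(":memory:")
# Hypothetical slimmed-down table: each row holds a short path, not a LONGBLOB
conn.execute("CREATE TABLE guest_image (id INTEGER PRIMARY KEY, regno TEXT, photo_path TEXT)")
conn.execute("CREATE INDEX idx_regno ON guest_image (regno)")
with tempfile.TemporaryDirectory() as tmp:
    store_image(conn, Path(tmp), "G13603", b"fake image bytes")
    loaded = load_image(conn, "G13603")
    missing = load_image(conn, "NOPE")
print(loaded)  # b'fake image bytes'
```

With the images on disk, the rows stay small, so scans and index lookups on columns like the registration number touch far less data.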
My apologies for the incorrect first answer. I did not realise that you are updating a non-BLOB column. However, all my reservations about using BLOB columns to store documents still stand.
Re: this UPDATE statement updated 14,119 rows. It had to read all 2.9 million rows to find the 14k that match the WHERE clause, and only then update them.
Check how long an equivalent SELECT query runs. I'm sure it will be pretty close to the timing of the UPDATE statement. 2.9 million rows is not a small dataset.
Adding an index on Fo_Image_Regno will speed up this query, but slightly slow down inserts into the table. Also, it will take some time to create the index.
An index will speed it up, but there are costs to having an index. When you are adding 1000 rows a day, the benefits of the index should outweigh its costs.
I'm trying to run what I believe to be a simple query on a fairly large dataset, and it's taking a very long time to execute -- it stalls in the "Sending data" state for 3-4 hours or more.
The table looks like this:
CREATE TABLE `transaction` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`uuid` varchar(36) NOT NULL,
`userId` varchar(64) NOT NULL,
`protocol` int(11) NOT NULL,
-- ... a few other fields: ints and small varchars
`created` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `uuid` (`uuid`),
KEY `userId` (`userId`),
KEY `protocol` (`protocol`),
KEY `created` (`created`)
) ENGINE=InnoDB AUTO_INCREMENT=61 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4 COMMENT='Transaction audit table'
And the query is here:
select protocol, count(distinct userId) as count from transaction
where created > '2012-01-15 23:59:59' and created <= '2012-02-14 23:59:59'
group by protocol;
The table has approximately 222 million rows, and the where clause in the query filters down to about 20 million rows. The distinct option will bring it down to about 700,000 distinct rows, and then after grouping, (and when the query finally finishes), 4 to 5 rows are actually returned.
I realize that it's a lot of data, but it seems that 4-5 hours is an awfully long time for this query.
Thanks.
Edit: For reference, this is running on AWS on a db.m2.4xlarge RDS database instance.
Why don't you profile the query and see what exactly is happening?
SET PROFILING = 1;
-- dropping the history size to 0 and back to 15 clears earlier profiles
SET profiling_history_size = 0;
SET profiling_history_size = 15;
/* Your query should be here */
SHOW PROFILES;
-- replace 3 with the Query_ID that SHOW PROFILES lists for your query
SELECT state, ROUND(SUM(duration),5) AS `duration (summed) in sec` FROM information_schema.profiling WHERE query_id = 3 GROUP BY state ORDER BY `duration (summed) in sec` DESC;
SET PROFILING = 0;
EXPLAIN /* Your query again should appear here */;
This should help you see exactly where the query spends its time, and based on the result you can decide what to optimize.
This is a really heavy query. To understand why it takes so long you should understand the details.
You have a range condition on the indexed field. That is, MySQL finds the first created value in the range in the index, and for each entry it takes the corresponding primary key, retrieves the row from disk, fetches the required fields missing from the index record (protocol, userId), and puts them in a temporary table, where the grouping on those 700,000 rows is done. The index can actually be used, and is used here, only for speeding up the range condition.
The only way to speed it up is to have an index that contains all the necessary data, so that MySQL does not need to make on-disk lookups for the rows. That is called a covering index. But you should understand that the index will reside in memory and will contain ~ sizeOf(created+protocol+userId+PK)*rowCount bytes, which may itself become a burden for the queries that update the table and for other indexes. It is easier to create a separate aggregates table and periodically update it using your query.
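Plugging rough numbers into that formula for the table in the question gives a sense of scale. The column widths are assumptions (notably the average userId length, which the question doesn't give), and real InnoDB indexes are larger because of per-record headers and partly empty pages, while ROW_FORMAT=COMPRESSED pulls the other way.

```python
def index_size_gb(row_count: int, bytes_per_entry: int) -> float:
    """Raw size of the index entries, ignoring record headers and page fill."""
    return row_count * bytes_per_entry / 1024**3


# Assumed widths: DATETIME is 5 bytes in MySQL 5.6.4+ (8 before), INT is 4,
# BIGINT is 8; userId is guessed at 20 ASCII characters plus a 1-byte length.
created, protocol, user_id, pk = 5, 4, 20 + 1, 8
entry = created + protocol + user_id + pk
print(entry, round(index_size_gb(222_000_000, entry), 1))  # 38 7.9
```

Roughly 8 GB of raw entries, before overhead, is why the answer suggests an aggregates table as the easier route.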
Both distinct and group by will need to sort and store temporary data on the server. With that much data that might take a while.
Indexing different combinations of userId, created and protocol will help, but I can't say how much or what index will help the most.
Starting from a certain version of MariaDB (maybe since 10.5), I noticed that after importing a dump with
mysql dbname < dump.sql
the optimizer's statistics no longer match the actual data, and it makes the wrong decisions about indexes.
In general, even listing InnoDB tables in phpMyAdmin becomes very, very slow.
I noticed that running
ANALYZE TABLE myTable;
fixes it.
So after each import I run the following, which is equivalent to running ANALYZE TABLE on every table:
mysqlcheck -aA
I'm trying to implement a very fast table meant to store relationships between users.
CREATE TABLE IF NOT EXISTS `friends_ram` (
`a` varchar(16) CHARACTER SET latin1 COLLATE latin1_general_ci NOT NULL,
`b` varchar(16) CHARACTER SET latin1 COLLATE latin1_general_ci NOT NULL
) ENGINE=MEMORY DEFAULT CHARSET=latin1;
INSERT INTO friends_ram (a, b)
I ran some tests with about 5M relations; it's blazing fast and occupies about 134MB of RAM. My question is, since the queries will be:
SELECT a FROM friends_ram WHERE b = 'foo';
or
SELECT b FROM friends_ram WHERE a = 'baar';
I'd like to know whether I should add proper indexes (which increases the RAM required).
I'm actually ashamed of the results;
the first time I ran the tests I probably misread the output.
It turns out that 1000 random queries without an index on a or b take 1000 times as long as with proper indexing. Ahem...
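A rough model of why the index matters so much: MEMORY tables use HASH indexes by default, and a hash index on b behaves like a Python dict from each b value to its rows. The relation data here is synthetic and the helper names are illustrative. Note that the two queries need two separate indexes, one on a and one on b.

```python
from collections import defaultdict


def build_index(pairs):
    """Map each b value to the a values that go with it (a hash index on b)."""
    index = defaultdict(list)
    for a, b in pairs:
        index[b].append(a)
    return index


def scan_lookup(pairs, b_value):
    """No index: every row is examined. Returns (matches, rows examined)."""
    return [a for a, b in pairs if b == b_value], len(pairs)


def index_lookup(index, b_value):
    """Hash index: only the matching rows are touched."""
    matches = index.get(b_value, [])
    return list(matches), len(matches)


pairs = [(f"user{i}", f"user{(i * 7) % 1000}") for i in range(5000)]
idx = build_index(pairs)
print(scan_lookup(pairs, "user3")[1], index_lookup(idx, "user3")[1])  # 5000 5
```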
Another very important thing to note is that I also tried memcached. While it takes a little longer to store the data, it's faster for retrieval. It also consumes far less memory.
MySQL MEMORY engine (192MB): 0.50138092041016 seconds
memcached (76MB): 0.34592795372009 seconds
memcached compressed (45.4 MB): 0.31583189964294 seconds
So, if you need to store simple things such as these, I'd recommend memcached (compressed).