For clarity: all other tables in the DB work as expected and load ~2 million rows in a fraction of a second. The one table of just ~600 rows takes 10+ minutes to load in Navicat.
I can't think of any possible reason for this. There are just 4 columns. One of them is a large text field, but I've worked with large text fields before and they've never been this slow.
Running EXPLAIN SELECT * FROM parser_queue I get:
| id | select_type | table        | type | possible_keys | key | key_len | ref | rows | Extra |
| 1  | SIMPLE      | parser_queue | ALL  | -             | -   | -       | -   | 658  | -     |
The profile tells me that 453 seconds are spent in 'Sending data'.
I've also got this in the 'Status' tab. I don't understand most of it, but these numbers are much higher than my other tables.
Bytes_received 31
Bytes_sent 32265951
Com_select 1
Created_tmp_files 16
Handler_read_rnd_next 659
Key_read_requests 9018487
Key_reads 3928
Key_write_requests 310431
Key_writes 4290
Qcache_hits 135077
Qcache_inserts 14289
Qcache_lowmem_prunes 4133
Qcache_queries_in_cache 983
Questions 1
Select_scan 1
Table_locks_immediate 31514
The data stored in the text field is about 12000 chars on average.
There is a primary-key auto-increment int id field, a tinyint status field, a text field, and a timestamp field with ON UPDATE CURRENT_TIMESTAMP.
OK I will try out both answers, but I can answer the questions quickly first:
Primary key on the ID field is the only key. This table is used for queuing, with ~50 records added/deleted per hour, but I only created it yesterday. Could it become corrupted in such a short time?
It is MyISAM
More work trying to isolate the problem:
REPAIR TABLE did nothing.
OPTIMIZE TABLE did nothing.
Created a temp table; queries were about 50% slower on the temp table.
Deleted the table and rebuilt it. SELECT * takes 18 seconds with just 4 rows.
Here is the SQL I used to create the table:
CREATE TABLE IF NOT EXISTS `parser_queue` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`status` tinyint(4) NOT NULL DEFAULT '1',
`data` text NOT NULL,
`last_updated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=MyISAM;
Stranger still, everything seems fine on my local box. The slowness only happens on the dev site.
For clarity: there are more than 100 tables on the dev site and this is the only one that is funky.
OK I have disabled all cron jobs which use this table. SHOW PROCESSLIST does not reveal any locks on the table.
Changing the engine to InnoDB did not produce any significant improvement (86 seconds vs 94 for MyISAM)
Any other ideas?
Running SHOW PROCESSLIST during the query reveals that it spends most of its time in the 'Writing to net' state.
If you suspect corruption somewhere, you can try either (or both) of the following:
CREATE TABLE temp SELECT * FROM parser_queue;
This will create a new table "identical" to the previous one, except it will be recreated. Alternatively (or maybe after you've made a copy), you can try:
REPAIR TABLE parser_queue;
You may also want to try optimizing the table; it might have gotten fragmented since you are using it as a queue.
OPTIMIZE TABLE parser_queue;
You can determine whether the table is fragmented by running SHOW TABLE STATUS LIKE 'parser_queue' and checking whether the Data_free column shows a high number.
Update
You say you are storing gzcompressed data in the TEXT columns. Try changing the TEXT column to BLOB instead, which is meant to handle binary data, such as compressed text.
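A minimal sketch of that change, assuming the gzcompressed payloads stay under the 16 MB MEDIUMBLOB limit (a plain BLOB tops out at 64 KB, which may be too tight for some rows):
ALTER TABLE parser_queue MODIFY `data` MEDIUMBLOB NOT NULL;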
The name gives away that you are using the table for queueing (lots of inserts and deletes, maybe?). Maybe you have had the table a while and it's heavily fragmented. If my assumptions are correct, try OPTIMIZE TABLE parser_queue;
You can read more about this in the manual:
http://dev.mysql.com/doc/refman/5.1/en/optimize-table.html
Right, the problem seems to have been only this: the text fields were simply too large.
Running
SELECT id, status, last_updated FROM parser_queue
takes less time than
SELECT data FROM parser_queue WHERE id = 6
Since all the queries I will be running return only one row, the slowdown will not affect me so much. I'm already using gzcompress on the data stored, so I don't think there is much more I could do anyway.
Related
I have a table like this:
create table test (
id int primary key auto_increment,
idcard varchar(30),
name varchar(30),
custom_value varchar(50),
index i1(idcard)
)
I insert 30,000,000 rows into the table
and then I execute:
select * from test where idcard='?'
The statement takes 12 seconds to return.
When I use iostat to monitor the disk, the read speed is about 6 MB/s while the utilization is 94%.
Is there any way to optimize it?
12 seconds may be realistic.
Assumptions about the question:
A total of 30M rows, but only 3000 rows in the resultset.
Not enough room to cache things in RAM or you are running from a cold start.
InnoDB or MyISAM (the analysis is the same; the details are radically different).
Any CHARACTER SET and COLLATION for idcard.
INDEX(idcard) exists and is used in the query.
HDD disk drive, not SSD.
Here's a breakdown of the processing:
Go to the index, find the first entry with ?, scan forward until hitting an entry that is not ? (about 3K rows later).
For each of those 3K items, reach into the table to find all the columns (cf. SELECT *).
Deliver them.
Step 1: Fast.
Step 2: This is (based on the assumption of nothing being cached) costly. It may involve about 3K disk hits; at roughly 100 random reads per second for an HDD, that works out to about 30 seconds. So, 12 seconds could imply that some of the blocks were cached or happened to be near each other.
Step 3: This is a network cost, which I am not considering.
Run the query a second time. It may take only 1 second this time -- because all 3K blocks are cached in RAM! And iostat will show zero activity!
Is there any way to optimize it?
Well...
You already have the best index.
What are you going to do with 3000 rows all at once? Is this a one-time task?
When using InnoDB, innodb_buffer_pool_size should be about 70% of available RAM, but not so big that it leads to swapping. What is its setting, and how much RAM do you have and what else is running on the machine?
Could you do more of the task while you are fetching the 3K rows?
Switching to SSDs would help, but I don't like hardware bandaids; they are not reusable.
How big is the table (in GB) -- perhaps 3GB data plus index? (SHOW TABLE STATUS.) If you can't make the buffer_pool big enough for it, and you have a variety of queries that compete for different parts of this (and other) tables, then more RAM may be beneficial.
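As a sketch of how to get that size in GB directly (assuming the table is named test and you are connected to the right schema):
SELECT ROUND((data_length + index_length) / 1024 / 1024 / 1024, 2) AS size_gb
FROM information_schema.tables
WHERE table_schema = DATABASE() AND table_name = 'test';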
This seems more like an I/O limitation than something that could be solved by adding indices. What will improve the speed is changing the collation of the idcard column to latin1_bin. This uses only 1 byte per character, and it uses binary comparison, which is faster than case-insensitive comparison.
Only do this if you have no special characters in the idcard column, because the character set of latin1 is quite limited.
ALTER TABLE `test` CHANGE COLUMN `idcard` `idcard` VARCHAR(30) COLLATE 'latin1_bin' AFTER `id`;
Furthermore, ROW_FORMAT=FIXED also improves the speed. ROW_FORMAT=FIXED is not available with the InnoDB engine, but it is with MyISAM. The resulting table I now have is shown below. It's 5 times quicker (80% less time) on select statements than the initial table.
Note that I also changed the collation for 'name' and 'custom_value' to latin1_bin. This does make quite a difference in speed in my test setup, and I'm still figuring out why.
CREATE TABLE `test` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`idcard` VARCHAR(30) COLLATE 'latin1_bin',
`name` VARCHAR(30) COLLATE 'latin1_bin',
`custom_value` VARCHAR(50) COLLATE 'latin1_bin',
PRIMARY KEY (`id`),
INDEX `i1` (`idcard`)
)
ENGINE=MyISAM
ROW_FORMAT=FIXED ;
You may try adding the three other columns in the select clause to the index:
CREATE INDEX idx ON test (idcard, id, name, custom_value);
The three columns other than idcard are being added to allow the index to cover everything being selected. The problem with your current index is that it is only on idcard. This means that once MySQL has traversed down to each leaf node in the index, it would have to do another seek back to the clustered index to lookup the values of all columns mentioned in the select *. As a result of this, MySQL may choose to ignore the index completely. The suggestion I made above avoids this additional seek.
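If the optimizer does use the covering index, EXPLAIN should report it in the key column and show 'Using index' in Extra, roughly like this (the idcard value is just a placeholder):
EXPLAIN SELECT * FROM test WHERE idcard = '1234567890';
-- expect: key = idx, Extra containing 'Using index'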
I have a table like below,
| Field    | Type       | Null | Key | Default             | Extra          |
| id       | bigint(11) | NO   | PRI | NULL                | auto_increment |
| deviceId | bigint(11) | NO   | MUL | NULL                |                |
| value    | double     | NO   |     | NULL                |                |
| time     | timestamp  | YES  | MUL | 0000-00-00 00:00:00 |                |
It has more than 2 million rows. When I run select * from tableName; it takes more than 15 minutes.
When I run select value,time from sensor_value where time > '2017-05-21 04:47:48' and deviceId>=812; It takes more than 45 sec to load.
Note: deviceId 812 alone has more than 92,514 rows.
I have even added an index on these columns, as below:
ALTER TABLE `sensor_value`
ADD INDEX `IDX_FIELDS1_2` (`time`, `deviceId`) ;
How do I make the select query fast (load in 1 sec)? Am I doing the indexing wrong?
Only 4 columns? Sounds like you have very little RAM, or innodb_buffer_pool_size is set too low. Hence, you were seriously I/O-bound and/or swapping.
WHERE time > '2017-05-21 04:47:48'
AND deviceId >= 812
is two range conditions. There is no straightforward way to optimize that. Either of these would help (concrete statements are sketched after this list); if you have both, the Optimizer might pick the better one:
INDEX(time)
INDEX(deviceId)
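As concrete statements (a sketch; the index names are arbitrary):
ALTER TABLE sensor_value ADD INDEX idx_time (`time`);
ALTER TABLE sensor_value ADD INDEX idx_device (deviceId);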
When using a 'secondary' index in InnoDB, the query first looks in the index BTree; when there is a match there, it has to look up in the 'data' BTree (using the PRIMARY KEY for lookup).
Some of the anomalous times you saw when trying INDEX(time, deviceId) were because the extra filtering in the index kept the query from having to reach over into the data as often.
Do you use id for anything other than uniqueness? Is the pair deviceId & time unique? If the answers are 'no' and 'yes', then get rid of id and change to PRIMARY KEY(deviceId, time). Or you could swap those two columns. What other queries do you have?
Getting rid of id shrinks the table some, thereby cutting down on I/O.
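A sketch of that change, assuming (deviceId, time) really is unique, time contains no NULLs, and nothing else references id; test it on a copy first, since rebuilding the primary key rewrites the whole table:
ALTER TABLE sensor_value
DROP COLUMN id,
ADD PRIMARY KEY (deviceId, `time`);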
When using a combined index, you should usually apply an equality operator on the first column; you can then use range criteria on the second column. So I recommend you change the order of the columns in your index like this:
ALTER TABLE `sensor_value`
ADD INDEX `IDX_FIELDS1_2` (`deviceId`, `time`) ;
Then use an equals sign for deviceId (deviceId = 812, not deviceId >= 812):
select value,time from sensor_value where time > '2017-05-21 04:47:48' and deviceId=812;
I hope this helps.
Two million records is not much for MySQL; it is normal to get results in less than 1 second even for 1 billion records if you do the right things.
Hi everyone. Here is a problem with my MySQL server.
I have a table with about 40,000,000 rows and 10 columns.
Its size is about 4 GB and the engine is InnoDB.
It is a master database, and it executes only one kind of SQL statement, like this:
insert into mytable ... on duplicate key update ...
About 99% of these statements take the UPDATE path.
Now the server is becoming slower and slower.
I heard that splitting the table might improve performance, so I tried it on my personal computer: I split the data into 10 tables, which didn't help, and then into 100, which didn't help either; the speed actually became slower. So I wonder why splitting the table didn't improve performance?
Thanks in advance.
more details:
CREATE TABLE my_table (
id BIGINT AUTO_INCREMENT,
user_id BIGINT,
identifier VARCHAR(64),
account_id VARCHAR(64),
top_speed INT UNSIGNED NOT NULL,
total_chars INT UNSIGNED NOT NULL,
total_time INT UNSIGNED NOT NULL,
keystrokes INT UNSIGNED NOT NULL,
avg_speed INT UNSIGNED NOT NULL,
country_code VARCHAR(16),
update_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY(id), UNIQUE KEY(user_id)
);
PS:
I also tried different computers, with both a solid-state drive and a hard disk drive, but that didn't help either.
Splitting up a table is unlikely to help at all. Ditto for PARTITIONing.
Let's count the disk hits. I will skip counting non-leaf nodes in BTrees; they tend to be cached; I will count leaf nodes in the data and indexes; they tend not to be cached.
IODKU does:
Read the index block containing the entry for each UNIQUE key. In your case, that is probably user_id. (Please provide a sample SQL statement.) 1 read.
If the user_id entry is found in the index, read the record from the data as indexed by the PK(id) and do the UPDATE, and leave this second block in the buffer_pool for eventual rewrite to disk. 1 read now, 1 write later.
If the record is not found, do INSERT. The index block that needs the new row was already read, so it is ready to have a new entry inserted. Meanwhile, the "last" block in the table (due to id being AUTO_INCREMENT) is probably already cached. Add the new row to it. 0 reads now, 1 write later (UNIQUE). (Rewriting the "last" block is amortized over, say, 100 rows, so I am ignoring it.)
Eventually do the write(s).
Total, assuming essentially all take the UPDATE path: 2 reads and 1 write. Assuming the user_id follows no simple pattern, I will assume that all 3 I/Os are "random".
Let's consider a variation... What if you got rid of id? Do you need id anywhere else? Since you have a UNIQUE key, it could be the PK. That is, replace your two indexes with just PRIMARY KEY(user_id). Now the counts are:
1 read
If UPDATE, 0 read, 1 write
If INSERT, 0 read, 0 write
Total: 1 read, 1 write. 2/3 as many as before. Better, but still not great.
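A sketch of that rewrite, assuming id is not referenced anywhere else (MySQL will have auto-named the unnamed UNIQUE key user_id):
ALTER TABLE my_table
DROP COLUMN id,
DROP INDEX user_id,
ADD PRIMARY KEY (user_id);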
Caching
How much RAM do you have?
What is the value of innodb_buffer_pool_size?
SHOW TABLE STATUS -- What are Data_length and Index_length?
I suspect that the buffer_pool is not big enough, and possibly could be raised. If you have more than 4GB of RAM, make it about 70% of RAM.
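A quick way to gather those numbers (table name assumed to be my_table):
SELECT @@innodb_buffer_pool_size / 1024 / 1024 AS buffer_pool_mb;
SHOW TABLE STATUS LIKE 'my_table'; -- compare Data_length + Index_length against the pool size
-- the pool itself is raised via innodb_buffer_pool_size in my.cnf (about 70% of RAM on a dedicated DB box)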
Others
SSDs should have helped significantly, since you appear to be I/O bound. Can you tell whether you are I/O-bound or CPU-bound?
How many rows are you updating at once? How long does it take? Is it batched, or one at a time? There may be a significant improvement possible here; see the batched sketch after this list.
Do you really need BIGINT (8 bytes)? INT UNSIGNED is only 4 bytes.
Is a transaction involved?
Is the Master having a problem? The Slave? Both? I don't want to fix the Master in such a way that it messes up the Slave.
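On the batching point above: sending many rows per statement amortizes the per-statement overhead and the I/O. A hedged sketch using the columns from your CREATE TABLE (the values are placeholders):
INSERT INTO my_table
(user_id, identifier, account_id, top_speed, total_chars, total_time, keystrokes, avg_speed, country_code)
VALUES
(101, 'abc', 'acct-1', 120, 5000, 600, 5200, 95, 'US'),
(102, 'def', 'acct-2', 110, 4200, 550, 4300, 88, 'DE')
ON DUPLICATE KEY UPDATE
top_speed = VALUES(top_speed),
total_chars = VALUES(total_chars),
total_time = VALUES(total_time),
keystrokes = VALUES(keystrokes),
avg_speed = VALUES(avg_speed);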
Try splitting your database across several MySQL instances behind a proxy such as mysql-proxy or HAProxy, instead of a single MySQL instance. That may give you better performance.
I have a table 'logging' in which we log visitor history. We get 14 million pageviews in a day, so we insert 14 million records into the table per day, and traffic is highest in the afternoon. For some days now we have been getting duplicate key errors on 'id', which according to me should not happen, since id is an auto-incremented field and we are not explicitly passing id in the insert query. Following are the details:
logging (MyISAM)
----------------------------------------
| id | int(20) |
| virtual_user_id | varchar(1000) |
| visited_page | varchar(255) |
| /* More such columns are there */ |
----------------------------------------
Please let me know what the problem is here. Is keeping the table in MyISAM a problem?
Problem 1: size of your primary key
http://dev.mysql.com/doc/refman/5.0/en/integer-types.html
The maximum value of an INT, regardless of the display width you give it, is 2,147,483,647 (roughly twice that if unsigned).
At 14 million inserts per day, that means you hit the signed limit after roughly 153 days (2,147,483,647 / 14,000,000 ≈ 153).
To prevent that you might want to change the datatype to an unsigned bigint.
Or, for even more ridiculously large volumes, even a Unix timestamp + microtime as a composite key. Or a different DB solution altogether.
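A sketch of the BIGINT UNSIGNED change suggested above, assuming id is the primary key (on a table this busy the ALTER locks and rewrites the table, so schedule it off-peak):
ALTER TABLE logging MODIFY id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT;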
Problem 2: the actual error
It might be concurrency, even though I don't find that very plausible.
You'll have to provide the insert IDs / errors for that. Do you use transactions?
Another possibility is a corrupt table.
Don't know your mysql version, but this might work:
CHECK TABLE tablename
See if that has any complaints.
REPAIR TABLE tablename
General advice:
Is this a sensible amount of data to be inserting into a database, and doesn't it slow everything down too much anyhow?
I wonder how your DB performs with all the locking, for example during deletes or during an ALTER TABLE.
The right way to do it totally depends on the goals and requirements of your system which I don't know, but here's an idea:
Log lines to a file and import the log files at your own pace. Don't bother your visitors with errors or delays when your DB is having trouble or when you need to do some big operation that locks everything.
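A minimal sketch of the import side, assuming the application writes tab-separated lines and local_infile is enabled; the file path here is hypothetical:
LOAD DATA LOCAL INFILE '/var/log/app/pageviews.tsv'
INTO TABLE logging
(virtual_user_id, visited_page);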
So I have a table that's being used basically like a NoSQL setup. The structure is:
id bigint primary key
data mediumblob
modified timestamp
It has around 350k rows. The queries that run on it are all structured as follows:
select data from table where id=XXX;
The table engine is InnoDB. I'm noticing that sometimes queries run against this table are rather slow. Sometimes they take 3 seconds to run. The table is 3 GB on disk and I gave the innodb_buffer_pool_size 4G.
Is there anything I'm missing here? Are there any settings I can tweak to improve performance?
Edit: As requested explain output:
+----+-------------+----------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+---------+---------+-------+------+-------+
| 1 | SIMPLE | cache | const | PRIMARY | PRIMARY | 8 | const | 1 | |
+----+-------------+----------+-------+---------------+---------+---------+-------+------+-------+
create table:
CREATE TABLE `cache` (
`id` bigint(20) unsigned NOT NULL DEFAULT '0',
`data` mediumblob,
`modified` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
There are two issues that I see here initially. First, you have a query against a BLOB data type, which will cause speed issues when it comes to data retrieval. Second, you are using InnoDB, which is optimized for writing; while it is probably the best choice overall, in extreme read situations it might be less performant than MyISAM. Neither of these issues is necessarily a deal-killer, but they each add a performance hit. Beyond this, however, I'm not sure I can give you a good answer without first having you do profiling. That is what I would recommend you do first: profile your query to figure out what the execution plan is, and then identify why that plan is so slow.
Here is a good "Top 10" list of MySQL optimizations. At least a couple apply in your situation directly:
http://20bits.com/articles/10-tips-for-optimizing-mysql-queries-that-dont-suck/
Here is another good optimization article that goes into server settings as well (for InnoDB specifically):
http://www.mysqlperformanceblog.com/2007/11/01/innodb-performance-optimization-basics/
Based on the CREATE TABLE statement you provided, I did think of another thing that you should address (again, not a query-killer but it is another performance hit). Unless there is a business case for using a bigint for your ID field, choose an int instead. An int will allow 2.1 billion rows so you shouldn't run out of numbers. Making this switch will save you disk space and it will improve query performance. Here is an article about it:
http://ronaldbradford.com/blog/bigint-v-int-is-there-a-big-deal-2008-07-18/
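If nothing outside this table depends on the BIGINT range and the existing ids fit in 32 bits, the change is a one-liner (it rebuilds the table, so expect some downtime on 3 GB of data):
ALTER TABLE cache MODIFY id INT UNSIGNED NOT NULL DEFAULT '0';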
Try using the smallest id type possible. If it's a numeric key that you know will never be larger than a few million, you could use a MEDIUMINT UNSIGNED and save yourself a byte per record compared to an INT, which might speed up searches a little. Still, 3 GB is an awful lot for just 350,000 rows.
It sounds like you might also get some bang for your buck by using the partitioning feature to split your table up into logical units. You might want to Google "mysql vertical partitioning" in particular; if there are large columns that you don't access frequently, it would be much more efficient to move them out into a separate table and only query it when you need it.
Could you post your CREATE TABLE statement as well as the output of EXPLAIN select data from table where id=XXX? How is the io wait on the system?
My best guess is that you're IO bound and because the rows aren't all the same size, it's having to search through the data. You have enough memory that it should be able to keep the data cached. This link describes some low level profiling in MySQL that might be helpful.
http://dev.mysql.com/tech-resources/articles/using-new-query-profiler.html
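A quick way to apply that profiler to one of the slow lookups (the id value is just a placeholder):
SET profiling = 1;
SELECT data FROM cache WHERE id = 12345;
SHOW PROFILES;
SHOW PROFILE FOR QUERY 1; -- breaks the time down by execution stage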
Things I would look for:
when are the slow queries appearing?
is it after a fresh start of the DB? then this might be just a temporary problem - queries hitting in a cold cache
is it during DB dump/load? then change your backup policies: use replication, for example, or add more disk I/O capacity (add more disks in RAID, change disks to SSDs, repartition your system across multiple disks, etc.)
is it during peak read/write times? replication might also help here - write into master and load balance the reads between master and slaves
Also - is that mediumblob really necessary there?