MySQL: Duplicate key error with auto-increment primary key

I have a table 'logging' in which we log visitor history. We get 14 million pageviews a day, so we insert 14 million records into this table daily, and traffic peaks in the afternoon. For some days now we have been getting duplicate key errors on 'id', which should not happen, since id is an auto-increment field and we do not pass id explicitly in the insert query. The details follow:
logging (MyISAM)
----------------------------------------
| id | int(20) |
| virtual_user_id | varchar(1000) |
| visited_page | varchar(255) |
| /* More such columns are there */ |
----------------------------------------
Please let me know what the problem is here. Is keeping the table in MyISAM a problem?

Problem 1: size of your primary key
http://dev.mysql.com/doc/refman/5.0/en/integer-types.html
The maximum value of a signed INT, regardless of the display width you give it, is 2147483647, or roughly twice that if unsigned.
At 14 million inserts a day, 2147483647 / 14000000 ≈ 153, so you hit that ceiling in about 153 days.
To prevent that you might want to change the datatype to an unsigned BIGINT.
Or, for even more ridiculously large volumes, a Unix timestamp plus microtime as a composite key. Or a different DB solution altogether.
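A minimal sketch of that change, using the table and column names from the question. Note that on a table this size the ALTER rebuilds and locks the table, so it should run off-peak:
ALTER TABLE logging
MODIFY id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT;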
Problem 2: the actual error
It might be concurrency, even though I don't find that very plausible.
You'll have to provide the insert IDs / errors for that. Do you use transactions?
Another possibility is a corrupt table.
I don't know your MySQL version, but this might work:
CHECK TABLE tablename
See if that has any complaints.
REPAIR TABLE tablename
General advice:
Is this a sensible amount of data to be inserting into a database, and doesn't it slow everything down too much anyhow?
I wonder how your DB performs with locking during, for example, a DELETE or an ALTER TABLE.
The right way to do it totally depends on the goals and requirements of your system which I don't know, but here's an idea:
Log lines to a file. Import the log files at your own pace. Don't bother your visitors with errors or delays when your DB is having trouble or when you need to run some big operation that locks everything.
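For illustration, a hedged sketch of that approach: the application appends rows to a flat file, and a periodic job bulk-loads it during quiet hours. The file path, delimiter, and column list here are assumptions.
LOAD DATA INFILE '/var/log/app/pageviews.csv'
INTO TABLE logging
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(virtual_user_id, visited_page);
One LOAD DATA of many rows is far cheaper than the same rows as individual INSERTs, and it can be scheduled away from the afternoon traffic peak.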

Related

Split table performance in MySQL

Hi everyone. Here is a problem on my MySQL server.
I have a table with about 40,000,000 rows and 10 columns.
Its size is about 4 GB and the engine is InnoDB.
It is a master database, and it executes only one kind of SQL statement:
insert into mytable ... on duplicate key update ...
About 99% of these statements take the UPDATE path.
Now the server is becoming slower and slower.
I heard that splitting the table may improve performance, so I tried it on my personal computer. I split it into 10 tables and that failed; I also tried 100, which failed too. The speed became slower instead. So I wonder why splitting tables didn't improve the performance?
Thanks in advance.
More details:
CREATE TABLE my_table (
id BIGINT AUTO_INCREMENT,
user_id BIGINT,
identifier VARCHAR(64),
account_id VARCHAR(64),
top_speed INT UNSIGNED NOT NULL,
total_chars INT UNSIGNED NOT NULL,
total_time INT UNSIGNED NOT NULL,
keystrokes INT UNSIGNED NOT NULL,
avg_speed INT UNSIGNED NOT NULL,
country_code VARCHAR(16),
update_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY(id), UNIQUE KEY(user_id)
);
PS:
I also tried different computers, one with a solid-state drive and one with a hard disk drive, but that didn't help either.
Splitting up a table is unlikely to help at all. Ditto for PARTITIONing.
Let's count the disk hits. I will skip counting non-leaf nodes in BTrees; they tend to be cached; I will count leaf nodes in the data and indexes; they tend not to be cached.
IODKU does:
Read the index block containing the entry for any UNIQUE key. In your case, that is probably user_id. (Please provide a sample SQL statement.) 1 read.
If the user_id entry is found in the index, read the record from the data as indexed by the PK(id) and do the UPDATE, and leave this second block in the buffer_pool for eventual rewrite to disk. 1 read now, 1 write later.
If the record is not found, do INSERT. The index block that needs the new row was already read, so it is ready to have a new entry inserted. Meanwhile, the "last" block in the table (due to id being AUTO_INCREMENT) is probably already cached. Add the new row to it. 0 reads now, 1 write later (UNIQUE). (Rewriting the "last" block is amortized over, say, 100 rows, so I am ignoring it.)
Eventually do the write(s).
Total, assuming essentially all take the UPDATE path: 2 reads and 1 write. Assuming the user_id follows no simple pattern, I will assume that all 3 I/Os are "random".
Let's consider a variation... What if you got rid of id? Do you need id anywhere else? Since you have a UNIQUE key, it could be the PK. That is, replace your two indexes with just PRIMARY KEY(user_id). Now the counts are:
1 read
If UPDATE, 0 read, 1 write
If INSERT, 0 read, 0 write
Total: 1 read, 1 write. 2/3 as many as before. Better, but still not great.
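A hedged sketch of that variation against the CREATE TABLE above; verify first that nothing else references id. MySQL auto-names the unnamed UNIQUE index after its first column, hence DROP KEY user_id:
ALTER TABLE my_table
MODIFY id BIGINT NOT NULL,  -- remove AUTO_INCREMENT so the PK can be dropped
DROP PRIMARY KEY,
DROP KEY user_id,
ADD PRIMARY KEY (user_id);
ALTER TABLE my_table DROP COLUMN id;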
Caching
How much RAM do you have?
What is the value of innodb_buffer_pool_size?
SHOW TABLE STATUS -- What are Data_length and Index_length?
I suspect that the buffer_pool is not big enough and possibly could be raised. If you have more than 4GB of RAM, make it about 70% of RAM.
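As a sketch, assuming a dedicated server with 16 GB of RAM (the 70% guideline gives roughly 11 GB), the my.cnf entry would look like:
[mysqld]
innodb_buffer_pool_size = 11G
Before MySQL 5.7 a server restart is required for this to take effect; from 5.7 on, the buffer pool can be resized online.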
Others
SSDs should have helped significantly, since you appear to be I/O-bound. Can you tell whether you are I/O-bound or CPU-bound?
How many rows are you updating at once? How long does it take? Is it batched, or one at a time? There may be a significant improvement possible here; see the batching sketch after this list.
Do you really need BIGINT (8 bytes)? INT UNSIGNED is only 4 bytes.
Is a transaction involved?
Is the Master having a problem? The Slave? Both? I don't want to fix the Master in such a way that it messes up the Slave.
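On the batching point above, a hedged sketch of a multi-row IODKU; the values and the update expressions are illustrative, not taken from the question:
INSERT INTO my_table
(user_id, identifier, account_id, top_speed, total_chars, total_time, keystrokes, avg_speed, country_code)
VALUES
(101, 'a1', 'acct1', 95, 500, 60, 520, 80, 'US'),
(102, 'b2', 'acct2', 88, 450, 55, 470, 75, 'CN')
ON DUPLICATE KEY UPDATE
top_speed = GREATEST(top_speed, VALUES(top_speed)),
total_chars = total_chars + VALUES(total_chars),
total_time = total_time + VALUES(total_time),
keystrokes = keystrokes + VALUES(keystrokes),
avg_speed = VALUES(avg_speed);
Batching 100-1000 rows per statement amortizes the parse, network, and commit overhead over the whole batch.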
Try splitting your database across several MySQL instances behind a proxy such as mysql-proxy or HAProxy, instead of running a single instance. That may give you much better performance.

Scale Large MySQL Table

I have a table which is growing very quickly; currently it has 47,000,000+ rows.
Even very simple queries such as the following sometimes take 46 seconds:
SELECT id, userId, visitorId, date FROM user_views LIMIT 20000000, 1;
Table structure is :
Field      Type              Null  Key  Default  Extra
id         int(11) unsigned  NO    PRI  NULL     auto_increment
userId     int(11) unsigned  NO    MUL  NULL
visitorId  int(11)           NO    MUL  NULL
date       datetime          NO    MUL  NULL
The application is already running with 1 master and 6 slaves; we can't afford more instances.
There is a BTREE index on id.
Is there any way to make it faster?
Thanks
First of all, you should consider a different storage approach. Depending on your use cases, a relational database might not be the best choice. E.g. if 99% of all operations write to the table but never update existing records (which is what your column names suggest), a NoSQL database might perform far better.
Secondly, skipping 20000000 rows without any specific order criterion (based on an index, of course) leaves the DBMS free to apply an arbitrary order, which may be suboptimal.
I don't know MySQL's internal optimization mechanisms, but LIMIT is only applied after the whole result set has been built, which means the whole table ends up loaded in memory. So try to reduce the size of the result set with WHERE clauses before LIMITing it.
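For example, a hedged sketch of "seek" pagination on the primary key, replacing the huge OFFSET with a WHERE range (here 20000000 stands for the last id already seen):
SELECT id, userId, visitorId, date
FROM user_views
WHERE id > 20000000
ORDER BY id
LIMIT 1;
This lets the BTREE index on id jump straight to the target row instead of scanning and discarding 20 million rows.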

Improve performance on write-only table?

I am using MySQL 5.5 with InnoDB. My server is built on Netty, JDBC and BoneCP.
I have a log table that contains user inputs (HTTP headers, request bodies, etc.). This table will be read only very rarely, for reasons like security and data recovery, so read performance is not something we care about.
There are five columns in this table.
Name       | Type
-----------|--------------------------------------
logID      | BIGINT (auto-increment, primary key)
userNumber | MEDIUMINT
logTime    | TIMESTAMP
header     | VARCHAR(100)
body       | VARCHAR(200)
What are some tips that will improve the insert performance?
Also, is the logID necessary in this case?
If the table is never referenced, why use any key at all? It isn't necessary; it only adds to the insert time and serves no purpose.
My suggestion would be to drop the logID and not create any indexes on the table at all.
Another optimization would be to change the storage engine to MyISAM. When you only insert and have no constraints on the table, InnoDB makes you pay for ACID compliance, while MyISAM doesn't care about that.
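A minimal sketch of the suggested schema, keeping the column types from the question; the table name here is hypothetical:
CREATE TABLE user_input_log (
userNumber MEDIUMINT,
logTime TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
header VARCHAR(100),
body VARCHAR(200)
) ENGINE=MyISAM;
No primary key and no secondary indexes means each insert is a pure append.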

MySQL query slow when querying table on primary key

So I have a table that's being used basically like a NoSQL setup. The structure is:
id bigint primary key
data mediumblob
modified timestamp
It has around 350k rows. The queries that run on it are all structured as follows:
select data from table where id=XXX;
The table engine is InnoDB. I'm noticing that sometimes queries run against this table are rather slow. Sometimes they take 3 seconds to run. The table is 3 GB on disk and I gave the innodb_buffer_pool_size 4G.
Is there anything I'm missing here? Are there any settings I can tweak to improve performance?
Edit: As requested, the EXPLAIN output:
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
|  1 | SIMPLE      | cache | const | PRIMARY       | PRIMARY | 8       | const |    1 |       |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
create table:
CREATE TABLE `cache` (
`id` bigint(20) unsigned NOT NULL DEFAULT '0',
`data` mediumblob,
`modified` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
There are two issues that I see here initially. First, you have a query against a BLOB data type, which will cause speed issues when it comes to data retrieval. Second, you are using InnoDB, which is optimized for writing; while it is probably the best choice overall, in extreme read situations it can be less performant than MyISAM. Neither of these issues is necessarily a deal-killer, but each adds a performance hit. Beyond this, however, I'm not sure I can give you a good answer without first having you do profiling, and that is what I would recommend you do first: profile your query to figure out the execution plan, and then identify why that plan is so slow.
Here is a good "Top 10" list of MySQL optimizations. At least a couple apply in your situation directly:
http://20bits.com/articles/10-tips-for-optimizing-mysql-queries-that-dont-suck/
Here is another good optimization article that goes into server settings as well (for InnoDB specifically):
http://www.mysqlperformanceblog.com/2007/11/01/innodb-performance-optimization-basics/
Based on the CREATE TABLE statement you provided, I did think of another thing you should address (again, not a query-killer, but another performance hit). Unless there is a business case for using a BIGINT for your ID field, choose an INT instead. A signed INT allows about 2.1 billion rows, so you shouldn't run out of numbers. Making this switch will save you disk space and improve query performance. Here is an article about it:
http://ronaldbradford.com/blog/bigint-v-int-is-there-a-big-deal-2008-07-18/
Try using the smallest id size possible. If it's a numeric key that you know will never grow beyond a few million, you could use a MEDIUMINT UNSIGNED and save yourself a byte per record compared to an INT, which might speed up searches a little. Still, 3 GB is an awful lot for just 350,000 rows.
It sounds like you might also get some bang for your buck by using the partitioning feature to split your table into logical units. You might want to Google "mysql vertical partitioning" in particular; if there are large columns that you don't access frequently, it is much more efficient to move them out into a separate table and query it only when you need it.
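A hedged sketch of that vertical split; the side-table name is made up, and the approach only pays off if your hot queries can avoid the moved column:
CREATE TABLE cache_blob (
id BIGINT UNSIGNED NOT NULL,
data MEDIUMBLOB,
PRIMARY KEY (id)
) ENGINE=InnoDB;
-- the hot path touches only the slim base table:
SELECT modified FROM cache WHERE id = 42;
-- the blob is fetched only when actually needed:
SELECT data FROM cache_blob WHERE id = 42;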
Could you post your CREATE TABLE statement as well as the output of EXPLAIN select data from table where id=XXX? How is the I/O wait on the system?
My best guess is that you're I/O-bound, and because the rows aren't all the same size, it has to search through the data. You have enough memory that it should be able to keep the data cached. This link describes some low-level profiling in MySQL that might be helpful.
http://dev.mysql.com/tech-resources/articles/using-new-query-profiler.html
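A hedged sketch of that profiling workflow using the SHOW PROFILE interface in MySQL 5.x (the id value is illustrative):
SET profiling = 1;
SELECT data FROM cache WHERE id = 42;
SHOW PROFILES;              -- lists recent statements with their durations
SHOW PROFILE FOR QUERY 1;   -- stage-by-stage time breakdown for statement 1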
Things I would look for:
when are the slow queries appearing?
is it after a fresh start of the DB? then this might be just a temporary problem - queries hitting in a cold cache
is it during DB dump/load? - then change your backup policies - use replication for example, or add more disk I/O capacity (more disks in RAID, switching to SSDs, repartitioning the system across multiple disks, etc.)
is it during peak read/write times? replication might also help here - write into master and load balance the reads between master and slaves
Also - is that mediumblob really necessary there?

Loading one table in MySQL is ridiculously slow

For clarity, all other tables in the DB work as expected and load ~2 million rows in a fraction of a second. The one table of just ~600 rows takes 10+ minutes to load in Navicat.
I can't think of any possible reason for this. There are just 4 columns. One of them is a large text field, but I've worked with large text fields before and they've never been this slow.
Running EXPLAIN SELECT * FROM parser_queue I get:
id  select_type  table         type  possible_keys  key  key_len  ref  rows  Extra
1   SIMPLE       parser_queue  ALL   -              -    -        -    658   -
The profile tells me that 453 seconds are spent 'sending data'
I've also got this in the 'Status' tab. I don't understand most of it, but these numbers are much higher than my other tables.
Bytes_received 31
Bytes_sent 32265951
Com_select 1
Created_tmp_files 16
Handler_read_rnd_next 659
Key_read_requests 9018487
Key_reads 3928
Key_write_requests 310431
Key_writes 4290
Qcache_hits 135077
Qcache_inserts 14289
Qcache_lowmem_prunes 4133
Qcache_queries_in_cache 983
Questions 1
Select_scan 1
Table_locks_immediate 31514
The data stored in the text field is about 12000 chars on average.
There is a primary, auto increment int id field, a tinyint status field, a text field, and a timestamp field with on update current timestamp.
OK I will try out both answers, but I can answer the questions quickly first:
Primary key on the ID field is the only key. This table is used for queuing, with ~50 records added/deleted per hour, but I only created it yesterday. Could it become corrupted in such a short time?
It is MyISAM
More work trying to isolate the problem:
repair table did nothing
optimize table did nothing
created a temp table. queries were about 50% slower on the temp table.
Deleted the table and rebuilt it. SELECT * takes 18 seconds with just 4 rows.
Here is the SQL I used to create the table:
CREATE TABLE IF NOT EXISTS `parser_queue` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`status` tinyint(4) NOT NULL DEFAULT '1',
`data` text NOT NULL,
`last_updated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=MyISAM;
Stranger still, everything seems fine on my local box. The slowness only happens on the dev site.
For clarity: there are more than 100 tables on the dev site and this is the only one that is funky.
OK I have disabled all cron jobs which use this table. SHOW PROCESSLIST does not reveal any locks on the table.
Changing the engine to InnoDB did not produce any significant improvement (86 seconds vs 94 for MyISAM)
Any other ideas?
Running SHOW PROCESSLIST during the query reveals it spends most of its time in the 'Writing to net' state.
If you suspect corruption somewhere, you can try either (or both) of the following:
CREATE TABLE temp SELECT * FROM parser_queue;
This will create a new table "identical" to the previous one, except it is rebuilt from scratch. Alternatively (or maybe after you've made a copy), you can try:
REPAIR TABLE parser_queue;
You may also want to try optimizing the table; it might have gotten fragmented since you are using it as a queue.
OPTIMIZE TABLE parser_queue;
You can determine whether the table is fragmented by running SHOW TABLE STATUS LIKE 'parser_queue' and checking whether the Data_free column shows a high number. (The LIKE pattern matches table names, not columns.)
Update
You say you are storing gzcompressed data in the TEXT column. Try changing the TEXT column to a BLOB instead, which is meant to hold binary data such as compressed text.
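A minimal sketch of that change, using the table definition posted above:
ALTER TABLE parser_queue MODIFY data BLOB NOT NULL;
BLOB holds up to 65535 bytes, the same cap as TEXT; if the compressed payloads could ever exceed that, use MEDIUMBLOB instead.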
The name gives away that you are using the table for queueing (lots of inserts and deletes, maybe?). Maybe you have had the table for a while and it's heavily fragmented. If my assumptions are correct, try OPTIMIZE TABLE parser_queue;
You can read more about this in the manual:
http://dev.mysql.com/doc/refman/5.1/en/optimize-table.html
Right, the problem seems to have been only this: the text fields were too huge.
Running
SELECT id, status, last_updated FROM parser_queue
takes less time than
SELECT data FROM parser_queue WHERE id = 6
Since all the queries I will be running return only one row, the slowdown will not affect me so much. I'm already using gzcompress on the data stored, so I don't think there is much more I could do anyway.