So I have a table that's being used basically like a NoSQL setup. The structure is:
id bigint primary key
data mediumblob
modified timestamp
It has around 350k rows. The queries that run on it are all structured as follows:
select data from table where id=XXX;
The table engine is InnoDB. I'm noticing that queries against this table are sometimes rather slow; some take 3 seconds to run. The table is 3 GB on disk, and I set innodb_buffer_pool_size to 4G.
Is there anything I'm missing here? Are there any settings I can tweak to improve performance?
Edit: As requested, EXPLAIN output:
+----+-------------+----------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+---------+---------+-------+------+-------+
| 1 | SIMPLE | cache | const | PRIMARY | PRIMARY | 8 | const | 1 | |
+----+-------------+----------+-------+---------------+---------+---------+-------+------+-------+
create table:
CREATE TABLE `cache` (
`id` bigint(20) unsigned NOT NULL DEFAULT '0',
`data` mediumblob,
`modified` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
There are two issues that I see here initially. First, your query returns a BLOB column, which will cause speed issues when it comes to data retrieval. Second, you are using InnoDB, which is optimized for writing. While it is probably the best choice overall, in extremely read-heavy situations it might be less performant than MyISAM. Neither of these issues is necessarily a deal-killer, but each adds a performance hit. Beyond this, I can't give you a good answer about how to optimize further without you first doing some profiling, so that is what I would recommend. Profile your query to find out what the execution plan is, and then identify why that plan is so slow.
Here is a good "Top 10" list of MySQL optimizations. At least a couple apply in your situation directly:
http://20bits.com/articles/10-tips-for-optimizing-mysql-queries-that-dont-suck/
Here is another good optimization article that goes into server settings as well (for InnoDB specifically):
http://www.mysqlperformanceblog.com/2007/11/01/innodb-performance-optimization-basics/
Based on the CREATE TABLE statement you provided, I thought of another thing you should address (again, not a query-killer, but another performance hit). Unless there is a business case for using a BIGINT for your ID field, choose an INT instead. An INT allows about 2.1 billion values (about 4.29 billion if unsigned), so you shouldn't run out of numbers. Making this switch will save you disk space and can improve query performance. Here is an article about it:
http://ronaldbradford.com/blog/bigint-v-int-is-there-a-big-deal-2008-07-18/
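To put numbers on the size trade-off, here is a small Python sketch (an illustration, not part of the original answer) that derives each MySQL integer type's range from its storage size and estimates how much the id values alone would shrink on the 350k-row cache table. It deliberately ignores row and page overhead, so the savings figure is only a rough estimate.

```python
from typing import Dict, Tuple

# Storage size in bytes of each MySQL integer type; the (11)/(20) display width does not change this.
INT_BYTES: Dict[str, int] = {"TINYINT": 1, "SMALLINT": 2, "MEDIUMINT": 3, "INT": 4, "BIGINT": 8}


def int_range(type_name: str, unsigned: bool = False) -> Tuple[int, int]:
    """Return the (min, max) values a MySQL integer type can hold."""
    bits = 8 * INT_BYTES[type_name]
    if unsigned:
        return 0, 2**bits - 1
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def id_bytes_saved(rows: int, old_type: str, new_type: str) -> int:
    """Bytes saved in the id values alone; ignores row headers, page overhead and fill factor."""
    return rows * (INT_BYTES[old_type] - INT_BYTES[new_type])


if __name__ == "__main__":
    for name in INT_BYTES:
        print(f"{name:<9} unsigned max = {int_range(name, unsigned=True)[1]:,}")
    print(id_bytes_saved(350_000, "BIGINT", "INT"), "bytes saved for 350k rows")
```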
Use the smallest id type you can. If it's a numeric key that you know will never be larger than a few million, you could use a MEDIUMINT UNSIGNED (maximum 16,777,215) and save a byte per record over an INT, which might speed up searches a little. Still, 3 GB is an awful lot for just 350,000 rows.
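A quick back-of-the-envelope check of that last point, assuming "3 GB" means 3 GiB of total on-disk size (data, indexes and overhead together): the average row comes out at roughly 9 KB, so the few bytes saved on the id are a tiny fraction of each row, and the mediumblob payload presumably dominates.

```python
GIB = 1024**3


def avg_row_bytes(table_bytes: int, rows: int) -> float:
    """Average on-disk bytes per row (data, indexes and page overhead together)."""
    return table_bytes / rows


if __name__ == "__main__":
    avg = avg_row_bytes(3 * GIB, 350_000)
    print(f"{avg:,.0f} bytes per row on average")  # 9,204
    print(f"INT saves {4 / avg:.4%} of that per row over BIGINT")
```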
It sounds like you might also get some bang for your buck by using the partitioning feature to split your table up into logical units. You might want to Google "mysql vertical partitioning" in particular; if there are large columns that you don't access frequently, it would be much more efficient to move them out into a separate table and only query it when you need it.
Could you post your CREATE TABLE statement as well as the output of EXPLAIN select data from table where id=XXX? What does I/O wait look like on the system?
My best guess is that you're I/O-bound, and because the rows aren't all the same size, it's having to search through the data. You have enough memory that it should be able to keep the data cached. This link describes some low-level profiling in MySQL that might be helpful:
http://dev.mysql.com/tech-resources/articles/using-new-query-profiler.html
Things I would look for:
When are the slow queries appearing?
Is it after a fresh start of the DB? Then this might be just a temporary problem: queries hitting a cold cache.
Is it during a DB dump/load? Then change your backup policies: use replication, for example, or add more disk I/O (more disks in RAID, SSDs instead of spinning disks, repartitioning your system across multiple disks, etc.).
Is it during peak read/write times? Replication might also help here: write to the master and load-balance the reads between the master and slaves.
Also, is that mediumblob really necessary there?
Here is a problem with my MySQL server.
I have a table with about 40,000,000 rows and 10 columns.
Its size is about 4 GB, and the engine is InnoDB.
It is a master database, and it only executes one SQL statement, like this:
insert into mytable ... on duplicate key update ...
About 99% of these statements take the UPDATE path.
The server is now becoming slower and slower.
I heard that splitting the table may improve its performance, so I tried it on my personal computer: splitting into 10 tables didn't help, and neither did splitting into 100. It actually became slower. Why didn't splitting the table improve performance?
Thanks in advance.
More details:
CREATE TABLE my_table (
id BIGINT AUTO_INCREMENT,
user_id BIGINT,
identifier VARCHAR(64),
account_id VARCHAR(64),
top_speed INT UNSIGNED NOT NULL,
total_chars INT UNSIGNED NOT NULL,
total_time INT UNSIGNED NOT NULL,
keystrokes INT UNSIGNED NOT NULL,
avg_speed INT UNSIGNED NOT NULL,
country_code VARCHAR(16),
update_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY(id), UNIQUE KEY(user_id)
);
PS:
I also tried different computers with a solid-state drive and a hard disk drive, but that didn't help either.
Splitting up a table is unlikely to help at all. Ditto for PARTITIONing.
Let's count the disk hits. I will skip counting non-leaf nodes in BTrees, since they tend to be cached, and count leaf nodes in the data and indexes, since they tend not to be cached.
IODKU (INSERT ... ON DUPLICATE KEY UPDATE) does:
Read the index block containing the entry for any UNIQUE key. In your case, that is probably user_id. Please provide a sample SQL statement. 1 read.
If the user_id entry is found in the index, read the record from the data as indexed by the PK(id) and do the UPDATE, and leave this second block in the buffer_pool for eventual rewrite to disk. 1 read now, 1 write later.
If the record is not found, do INSERT. The index block that needs the new row was already read, so it is ready to have a new entry inserted. Meanwhile, the "last" block in the table (due to id being AUTO_INCREMENT) is probably already cached. Add the new row to it. 0 reads now, 1 write later (UNIQUE). (Rewriting the "last" block is amortized over, say, 100 rows, so I am ignoring it.)
Eventually do the write(s).
Total, assuming essentially all take the UPDATE path: 2 reads and 1 write. Assuming the user_id follows no simple pattern, I will assume that all 3 I/Os are "random".
Let's consider a variation... What if you got rid of id? Do you need id anywhere else? Since you have a UNIQUE key, it could be the PK. That is, replace your two indexes with just PRIMARY KEY(user_id). Now the counts are:
1 read
If UPDATE, 0 read, 1 write
If INSERT, 0 read, 0 write
Total: 1 read, 1 write. 2/3 as many as before. Better, but still not great.
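The counting above can be written down as a tiny expected-cost model. This Python sketch is only an illustration: it uses the per-path counts from this answer (non-leaf blocks cached, leaf blocks not), plugs in the question's 99% UPDATE share, and treats every counted I/O as random.

```python
from dataclasses import dataclass


@dataclass
class IoCost:
    reads: float   # random leaf-block reads per statement
    writes: float  # eventual leaf-block writes per statement

    @property
    def total(self) -> float:
        return self.reads + self.writes


def iodku_cost(update_fraction: float, user_id_is_pk: bool) -> IoCost:
    """Expected I/Os per IODKU statement, using the per-path counts from the answer."""
    insert_fraction = 1.0 - update_fraction
    if user_id_is_pk:
        # PRIMARY KEY(user_id): 1 read; UPDATE adds 1 write, INSERT adds nothing.
        return IoCost(reads=1.0, writes=update_fraction * 1.0)
    # PRIMARY KEY(id) + UNIQUE(user_id): 1 read of the UNIQUE index block, then
    # UPDATE: 1 read + 1 write of the data block; INSERT: 1 write of the UNIQUE block.
    return IoCost(
        reads=1.0 + update_fraction * 1.0,
        writes=update_fraction * 1.0 + insert_fraction * 1.0,
    )


if __name__ == "__main__":
    before = iodku_cost(0.99, user_id_is_pk=False)
    after = iodku_cost(0.99, user_id_is_pk=True)
    print(before.total, after.total, after.total / before.total)  # roughly 2.99, 1.99, 0.67
```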
Caching
How much RAM do you have?
What is the value of innodb_buffer_pool_size?
SHOW TABLE STATUS -- What are Data_length and Index_length?
I suspect that the buffer_pool is not big enough and could possibly be raised. If you have more than 4 GB of RAM, make it about 70% of RAM.
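A small Python sketch of that rule of thumb, for checking a server quickly. The 4 GB threshold and the 70% figure are the ones given above; the RAM size and the Data_length/Index_length values in the example are hypothetical placeholders for what you would read from the machine and from SHOW TABLE STATUS.

```python
from typing import Optional

GiB = 1024**3


def suggested_buffer_pool(ram_bytes: int, fraction: float = 0.70) -> Optional[int]:
    """About 70% of RAM, per the rule of thumb above; None when RAM is not above 4 GB,
    since the answer gives no figure for smaller machines."""
    if ram_bytes <= 4 * GiB:
        return None
    return int(ram_bytes * fraction)


def table_fits(data_length: int, index_length: int, pool_bytes: int) -> bool:
    """Compare Data_length + Index_length (from SHOW TABLE STATUS) with the pool size."""
    return data_length + index_length <= pool_bytes


if __name__ == "__main__":
    pool = suggested_buffer_pool(16 * GiB)  # hypothetical 16 GB server
    print(pool / GiB)  # about 11.2
    print(table_fits(4 * GiB, 1 * GiB, pool))  # hypothetical table sizes
```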
Others
SSDs should have helped significantly, since you appear to be I/O bound. Can you tell whether you are I/O-bound or CPU-bound?
How many rows are you updating at once? How long does it take? Is it batched, or one at a time? There may be a significant improvement possible here.
Do you really need BIGINT (8 bytes)? INT UNSIGNED is only 4 bytes.
Is a transaction involved?
Is the Master having a problem? The Slave? Both? I don't want to fix the Master in such a way that it messes up the Slave.
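On the batching question: one common way to batch IODKU is to send many rows in a single multi-row statement. The Python sketch below only builds the SQL text with %s placeholders (the paramstyle used by drivers such as PyMySQL and mysqlclient). The question's actual UPDATE clause is elided, so the column names and the "overwrite with the new value" semantics here are hypothetical. VALUES(col) in this position is deprecated as of MySQL 8.0.20 in favor of a row alias, but still works. Table and column names are interpolated directly, so they must come from your own code, never from user input.

```python
from typing import Any, List, Sequence


def build_batched_iodku(table: str, columns: Sequence[str], update_columns: Sequence[str], n_rows: int) -> str:
    """Build one multi-row INSERT ... ON DUPLICATE KEY UPDATE with %s placeholders."""
    row = "(" + ", ".join(["%s"] * len(columns)) + ")"
    values = ", ".join([row] * n_rows)
    # VALUES(col) refers to the value that would have been inserted for that row.
    updates = ", ".join(f"{c} = VALUES({c})" for c in update_columns)
    return (
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES {values} "
        f"ON DUPLICATE KEY UPDATE {updates}"
    )


def flatten(rows: Sequence[Sequence[Any]]) -> List[Any]:
    """Flatten row tuples into the single parameter list the statement expects."""
    return [value for row in rows for value in row]


if __name__ == "__main__":
    batch = [(101, 55), (102, 61), (103, 48)]  # hypothetical (user_id, avg_speed) rows
    sql = build_batched_iodku("my_table", ["user_id", "avg_speed"], ["avg_speed"], len(batch))
    print(sql)
    print(flatten(batch))  # pass both to cursor.execute(sql, params) with a %s-style driver
```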
Try splitting your database across several MySQL instances behind a proxy such as mysql-proxy or HAProxy instead of running a single instance. That may give you much better performance.
I have a table which is growing very quickly; currently it has 47,000,000+ rows.
Even very simple queries such as this one take 46 seconds at times:
SELECT id, userId, visitorId, date FROM user_views LIMIT 20000000, 1;
Table structure:
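+-----------+------------------+------+-----+---------+----------------+
| Field     | Type             | Null | Key | Default | Extra          |
+-----------+------------------+------+-----+---------+----------------+
| id        | int(11) unsigned | NO   | PRI | NULL    | auto_increment |
| userId    | int(11) unsigned | NO   | MUL | NULL    |                |
| visitorId | int(11)          | NO   | MUL | NULL    |                |
| date      | datetime         | NO   | MUL | NULL    |                |
+-----------+------------------+------+-----+---------+----------------+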
The application is already running with 1 master and 6 slaves; I can't afford more instances.
There is a B-tree index on id.
Is there any way to make it faster?
Thanks
First of all, you should consider a different storage approach. Depending on your use cases, a relational database might not be the best choice. For example, if 99% of all operations write new rows to the table rather than updating existing records (as your column names suggest), a NoSQL database might perform much better.
Second, skipping 20,000,000 rows without any specific ordering criterion (ideally one backed by an index) leaves the DBMS free to apply an arbitrary order, which might be suboptimal.
I don't know MySQL's internal optimization mechanisms in detail, but an offset is not free: to answer LIMIT 20000000, 1 the server still has to produce and discard the first 20,000,000 rows before it can return one. So try to reduce the size of the result set with WHERE conditions before applying LIMIT.
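The usual way to "use WHERE before LIMIT" for paging is keyset (seek) pagination: remember the last id you returned and ask for the rows after it. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, with made-up table contents. In MySQL the equivalent would be SELECT id, userId, visitorId, date FROM user_views WHERE id > 20000000 ORDER BY id LIMIT 1, which returns the same row as ORDER BY id LIMIT 20000000, 1 only if the ids have no gaps; in general you pass the last id seen on the previous page.

```python
import sqlite3
from typing import List, Tuple

Row = Tuple[int, int, int, str]


def make_demo_table(n: int) -> sqlite3.Connection:
    """In-memory stand-in for user_views with ids 1..n and made-up column values."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_views (id INTEGER PRIMARY KEY, userId INTEGER, visitorId INTEGER, date TEXT)"
    )
    conn.executemany(
        "INSERT INTO user_views VALUES (?, ?, ?, ?)",
        ((i, i % 97, i % 89, "2014-01-01 00:00:00") for i in range(1, n + 1)),
    )
    return conn


def page_by_offset(conn: sqlite3.Connection, offset: int, size: int) -> List[Row]:
    # The engine walks past `offset` rows before it can return anything.
    sql = "SELECT id, userId, visitorId, date FROM user_views ORDER BY id LIMIT ? OFFSET ?"
    return conn.execute(sql, (size, offset)).fetchall()


def page_by_key(conn: sqlite3.Connection, last_id: int, size: int) -> List[Row]:
    # Keyset ("seek") pagination: the primary key index jumps straight past last_id.
    sql = "SELECT id, userId, visitorId, date FROM user_views WHERE id > ? ORDER BY id LIMIT ?"
    return conn.execute(sql, (last_id, size)).fetchall()


if __name__ == "__main__":
    conn = make_demo_table(100_000)
    print(page_by_offset(conn, 50_000, 1) == page_by_key(conn, 50_000, 1))  # True: ids have no gaps
```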
I have a table 'logging' in which we log visitor history. We get 14 million pageviews a day, so we insert 14 million records into the table per day, and traffic is highest in the afternoon. For the past few days we have been getting duplicate key errors on 'id', which I think should not happen, since id is an auto-increment field and we never pass id explicitly in the insert query. Here are the details:
logging (MyISAM)
----------------------------------------
| id | int(20) |
| virtual_user_id | varchar(1000) |
| visited_page | varchar(255) |
| /* More such columns are there */ |
----------------------------------------
Please let me know what the problem is. Is keeping the table in MyISAM a problem here?
Problem 1: size of your primary key
http://dev.mysql.com/doc/refman/5.0/en/integer-types.html
The maximum value of an INT, regardless of the display width you give it, is 2147483647, or 4294967295 if unsigned.
At 14 million inserts a day, that means a signed INT runs out after about 153 days.
To prevent that, you might want to change the data type to BIGINT UNSIGNED.
Or for even more ridiculously large volumes even a unix timestamp + microtime as a composite key. Or a different DB solution altogether.
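The arithmetic behind the 153-day figure, as a quick Python check. It assumes a steady 14 million inserts per day (from the question) and ids starting at 1 with no gaps; real auto-increment sequences can skip values, which would only shorten these times.

```python
INSERTS_PER_DAY = 14_000_000  # from the question

# Largest auto-increment value each type can hold.
MAX_ID = {
    "INT": 2**31 - 1,
    "INT UNSIGNED": 2**32 - 1,
    "BIGINT UNSIGNED": 2**64 - 1,
}


def days_until_exhausted(max_id: int, per_day: int = INSERTS_PER_DAY) -> float:
    """Days until ids 1..max_id are used up at a constant insert rate."""
    return max_id / per_day


if __name__ == "__main__":
    for name, limit in MAX_ID.items():
        days = days_until_exhausted(limit)
        print(f"{name:<16} {days:>20,.1f} days  ({days / 365.25:,.1f} years)")
```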
Problem 2: the actual error
It might be concurrency, even though I don't find that very plausible.
You'll have to provide the insert IDs / errors for that. Do you use transactions?
Another possibility is a corrupt table.
I don't know your MySQL version, but this might work:
CHECK TABLE tablename
See if that reports any problems. If it does, try:
REPAIR TABLE tablename
General advice:
Is this a sensible amount of data to be inserting into a database, and doesn't it slow everything down too much anyhow?
I also wonder how your DB copes with locking during big operations such as an ALTER TABLE.
The right way to do it totally depends on the goals and requirements of your system which I don't know, but here's an idea:
Write log lines to a log file, and import the log files in your own time. Don't bother your visitors with errors or delays when your DB is having trouble or when you need to do some big operation that locks everything.
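A minimal Python sketch of that idea, assuming a tab-separated log that holds only two of the logging columns. Python's sqlite3 module stands in for MySQL here; with MySQL you would more likely bulk-load the file with LOAD DATA INFILE. File names are hypothetical, and with several web server processes you would probably want one log file per process (or a log shipper) rather than many processes appending to the same file.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path

INSERT_SQL = "INSERT INTO logging (virtual_user_id, visited_page) VALUES (?, ?)"


def log_visit(log_path: Path, virtual_user_id: str, visited_page: str) -> None:
    """Append one visit to a tab-separated log file instead of writing to the database."""
    with log_path.open("a", newline="", encoding="utf-8") as f:
        csv.writer(f, delimiter="\t").writerow([virtual_user_id, visited_page])


def import_log(log_path: Path, conn: sqlite3.Connection, batch_size: int = 10_000) -> int:
    """Load the log file in batches at a time of your choosing; returns the rows imported."""
    imported = 0
    batch = []
    with log_path.open(newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            batch.append(row)
            if len(batch) >= batch_size:
                conn.executemany(INSERT_SQL, batch)
                imported += len(batch)
                batch = []
    if batch:
        conn.executemany(INSERT_SQL, batch)
        imported += len(batch)
    conn.commit()
    return imported


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "visits.log"
        log_visit(log, "visitor-1", "/home")
        log_visit(log, "visitor-2", "/pricing")
        db = sqlite3.connect(":memory:")
        db.execute(
            "CREATE TABLE logging (id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "virtual_user_id TEXT, visited_page TEXT)"
        )
        print(import_log(log, db))  # 2
```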
I have a table "Words" in a MySQL database. This table contains 2 fields: word (VARCHAR(256)) and p_id (INTEGER).
Create table statement for the table:
CREATE TABLE `Words` (
`word` varchar(256) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
`p_id` int(11) NOT NULL DEFAULT '0',
KEY `word_i` (`word`(255))
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Sample entries in the table are:
+------+------+
| word | p_id |
+------+------+
| a | 1 |
| a | 2 |
| b | 1 |
| a | 4 |
+------+------+
This table contains 30+ million entries. I am running a GROUP BY query, and it takes 90+ minutes to run. The query is:
SELECT word,group_concat(p_id) FROM Words group by word;
To work around this, I exported all the data in the table to a text file using the following query:
SELECT p_id,word FROM Words INTO OUTFILE "/tmp/word_map.txt";
After that, I wrote a Perl script to read the file, parse it, and build a hash from it. That took far less time than the GROUP BY query (<3 min). In the end the hash has 14 million keys (words), and it occupies a lot of memory. Is there any way to improve the performance of the GROUP BY query so that I don't need to go through all the steps above?
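For comparison, here is a Python sketch of the same hash-based approach (the Perl script itself isn't shown, so this is not a translation of it). It relies on INTO OUTFILE's default format of tab-separated fields and newline-terminated lines, with p_id first as in the query above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_pids_by_word(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Map word -> [p_id, ...] from 'p_id<TAB>word' lines, like GROUP BY word + GROUP_CONCAT(p_id).

    Assumes no word contains a tab, newline or backslash (INTO OUTFILE escapes those).
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        p_id, word = line.rstrip("\n").split("\t", 1)
        groups[word].append(int(p_id))
    return dict(groups)


# Usage against the exported file:
#   with open("/tmp/word_map.txt", encoding="utf-8") as f:
#       groups = group_pids_by_word(f)
#   print(",".join(map(str, groups["a"])))  # same shape as GROUP_CONCAT(p_id)
sample = ["1\ta\n", "2\ta\n", "1\tb\n", "4\ta\n"]  # the sample rows from the question
print(group_pids_by_word(sample))  # {'a': [1, 2, 4], 'b': [1]}
```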
EDIT: Adding the my.cnf entries below.
[mysqld]
datadir=/media/data/.mysql_data/mysql
tmpdir=/media/data/.mysql_tmp_data
innodb_log_file_size=5M
socket=/var/lib/mysql/mysql.sock
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
group_concat_max_len=4M
max_allowed_packet=20M
[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
tmpdir=/media/data/.mysql_tmp_data/
Thanks,
Vinod
I think the index you want is:
CREATE INDEX words_word_pid ON Words(word, p_id);
This does two things. First, the GROUP BY can be handled by an index scan rather than by loading the original table and sorting the results.
Second, this index also eliminates the need to load the original data.
My guess is that the original data does not fit into memory. So, the processing goes through the index (efficiently), finds the word, and then needs to load the pages with the word on it. Well, eventually memory fills up and the page with the word is not in memory. The page is loaded from disk. And the next page is probably not in memory, and that page is loaded from disk. And so on.
You can fix this problem by increasing the memory size. You can also fix the problem by having an index that covers all the columns used in the query.
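One way to see the covering-index effect without a 30M-row MySQL table is Python's built-in sqlite3 module, used here purely as a stand-in. With an index on (word, p_id), SQLite's EXPLAIN QUERY PLAN should report a scan "USING COVERING INDEX", meaning the base table is never touched; in MySQL the corresponding sign is "Using index" in the Extra column of EXPLAIN. The data is the four sample rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for MySQL
conn.execute("CREATE TABLE Words (word TEXT NOT NULL, p_id INTEGER NOT NULL DEFAULT 0)")
conn.executemany(
    "INSERT INTO Words (word, p_id) VALUES (?, ?)",
    [("a", 1), ("a", 2), ("b", 1), ("a", 4)],  # the sample rows from the question
)
conn.execute("CREATE INDEX words_word_pid ON Words (word, p_id)")

QUERY = "SELECT word, group_concat(p_id) FROM Words GROUP BY word"

# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail text.
plan = " | ".join(str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY))
print(plan)  # expected to mention USING COVERING INDEX words_word_pid

for word, p_ids in conn.execute(QUERY):
    print(word, p_ids)
```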
The problem is that outputting an entire 30M-row table to a file is hardly a frequent use case for a database. The advantage of your approach with the Perl script is that it needs no random disk I/O. To simulate that behaviour in MySQL, you would need to load everything into an index on (word, p_id) (the whole word, not a prefix), which might turn out to be overkill for the database.
You could put only word into an index; this will speed up grouping, but it will require a lot of random disk I/O to fetch p_id for each row.
By the way, the covering index would take about (4+4+3*256)*30M bytes, which is more than 23 GB of memory. It seems that the Perl script solution is the best you can do.
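The size estimate, spelled out in Python. As in the answer, it is a worst case: the 4 + 4 fixed bytes from the answer's formula plus every word at the full 256 characters and 3 bytes per utf8 character. VARCHAR stores only the actual length, so the real index size depends on the average word length and is likely smaller.

```python
def covering_index_upper_bound(rows: int, chars: int = 256, bytes_per_char: int = 3, fixed: int = 4 + 4) -> int:
    """Worst-case index size: every word at full length plus the answer's 4 + 4 fixed bytes."""
    return rows * (fixed + chars * bytes_per_char)


if __name__ == "__main__":
    size = covering_index_upper_bound(30_000_000)
    print(f"{size:,} bytes = {size / 1e9:.2f} GB")  # 23,280,000,000 bytes = 23.28 GB
```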
Another thing you should be aware of is that you would need to pull more than 20 GB of results through a MySQL connection, and those 20 GB would have to be collected into a temporary table (and sorted by word unless you append ORDER BY NULL). If you are going to download them through a MySQL binding for a programming language, you will need to force the binding to use streaming (by default, bindings usually fetch the whole result set).
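A minimal Python sketch of streaming with the DB-API: fetchmany() in batches instead of fetchall(). sqlite3 is used so the example runs anywhere. One caveat for MySQL: with the default (buffered) cursors of PyMySQL or mysqlclient, the whole result is still pulled into client memory on execute(), so you would pass an unbuffered cursor class such as pymysql.cursors.SSCursor or MySQLdb.cursors.SSCursor to get true streaming.

```python
import sqlite3
from typing import Any, Iterator, Tuple


def stream_rows(conn: Any, query: str, batch_size: int = 10_000) -> Iterator[Tuple[Any, ...]]:
    """Yield rows batch by batch via the DB-API fetchmany() instead of fetchall()."""
    cur = conn.cursor()
    try:
        cur.execute(query)
        while True:
            batch = cur.fetchmany(batch_size)
            if not batch:
                break
            yield from batch
    finally:
        cur.close()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Words (word TEXT, p_id INTEGER)")
    conn.executemany("INSERT INTO Words VALUES (?, ?)", [("a", 1), ("a", 2), ("b", 1), ("a", 4)])
    for word, p_id in stream_rows(conn, "SELECT word, p_id FROM Words", batch_size=2):
        print(word, p_id)
```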
Index the table on the word column. This will accelerate the grouping substantially as the SQL engine can locate the records for grouping with minimal searching through the table.
CREATE INDEX word_idx ON Words(word);
I'm currently trying to optimize a query generated by Doctrine 2 on this table:
CREATE TABLE `publication` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`global_order` int(11) NOT NULL,
`title` varchar(63) COLLATE utf8_unicode_ci NOT NULL,
`slug` varchar(63) COLLATE utf8_unicode_ci NOT NULL,
`type` varchar(7) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `UNIQ_AF3C6779B12CE9DB` (`global_order`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
The query is
SELECT *
FROM publication
WHERE type IN ('article', 'event', 'work')
ORDER BY global_order DESC
type is a discriminator column added by Doctrine. Although the WHERE clause is useless as type is always one of the IN values, I cannot remove it.
EXPLAIN shows me
+------+---------------+------+------+-----------------------------+
| type | possible_keys | key | rows | Extra |
+------+---------------+------+------+-----------------------------+
| ALL | NULL | NULL | 562 | Using where; Using filesort |
+------+---------------+------+------+-----------------------------+
(rows is different each time I execute the query)
After some reading, I found I can force index usage like this:
ALTER TABLE `publication` DROP INDEX `UNIQ_AF3C6779B12CE9DB` ,
ADD UNIQUE `UNIQ_AF3C6779B12CE9DB` ( `global_order` , `type` )
and
SELECT *
FROM publication
FORCE INDEX(UNIQ_AF3C6779B12CE9DB)
WHERE global_order > 0
AND type IN ('article', 'event', 'work')
ORDER BY global_order DESC
The WHERE clause is always useless, but this time EXPLAIN shows me
+-------+-----------------------+-----------------------+------+-------------+
| type | possible_keys | key | rows | Extra |
+-------+-----------------------+-----------------------+------+-------------+
| range | UNIQ_AF3C6779B12CE9DB | UNIQ_AF3C6779B12CE9DB | 499 | Using where |
+-------+-----------------------+-----------------------+------+-------------+
It seems better to me, but having to force an index doesn't seem common, so I wonder whether it's really efficient for such a simple query.
Does anyone know what is the better way to perform this query?
Thanks!
If your query really is:
SELECT *
FROM publication
WHERE type IN ('article', 'event', 'work')
ORDER BY global_order DESC
... and all entries (or nearly all) will match the IN clause, you're actually better off with no index at all. If you add a LIMIT clause, the index you'll want is on global_order alone, without the type field. The reason is that reading an index has a cost of its own.
If you're going for the entire table, sequentially reading the table and sorting its rows in memory will be your cheapest plan. If you only need a few rows and most will match the where clause, going for the smallest index will do the trick.
To understand why, picture the disk IO involved.
Suppose you want the whole table without an index. To do this, you read data_page1, data_page2, data_page3, etc., visiting the disk pages in order until you reach the end of the table. You then sort and return.
If you want the top 5 rows without an index, you'd sequentially read the entire table as before, while heap-sorting the top 5 rows. Admittedly, that's a lot of reading and sorting for a handful of rows.
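The "heap-sort the top 5 while scanning" strategy can be sketched in Python with heapq.nlargest, which keeps a heap of only k items during a single pass. The rows are hypothetical (global_order, title) pairs; this illustrates the idea rather than how MySQL implements it internally.

```python
import heapq
from typing import Iterable, List, Tuple

Row = Tuple[int, str]  # hypothetical (global_order, title) pairs


def top_k_by_global_order(rows: Iterable[Row], k: int = 5) -> List[Row]:
    """Single pass over the rows, keeping only the k largest global_order values.

    heapq.nlargest maintains a heap of at most k items while scanning an iterator,
    and returns the result sorted in descending order.
    """
    return heapq.nlargest(k, rows, key=lambda row: row[0])


if __name__ == "__main__":
    table = [(order, f"title {order}") for order in (5, 1, 9, 3, 7, 2, 8)]  # "table scan" order
    print(top_k_by_global_order(table, k=3))  # [(9, 'title 9'), (8, 'title 8'), (7, 'title 7')]
```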
Suppose, now, that you want the whole table with an index. To do this, you read index_page1, index_page2, etc., sequentially. This then leads you to visit, say, data_page3, then data_page1, then data_page3 again, then data_page2, etc., in a completely random order (that by which the sorted rows appear in the data). The IO involved makes it cheaper to just read the whole mess sequentially and sort the grab bag in memory.
If you merely want the top 5 rows of an indexed table, in contrast, using the index becomes the correct strategy. In the worst case scenario you load 5 data pages in memory and move on.
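A toy Python simulation of this I/O argument, under strong simplifying assumptions: rows are stored in an order unrelated to the index, each page holds 20 rows, only one data page is cached at a time, and index page reads are ignored. It counts data-page loads for a full sequential scan, for reading the whole table in index order, and for reading just the first 5 rows in index order.

```python
import random
from typing import Dict, List, Optional, Tuple


def page_loads(visit_order: List[int], page_of: Dict[int, int]) -> int:
    """Count data-page loads when visiting rows in order with a one-page cache."""
    loads = 0
    cached: Optional[int] = None
    for row in visit_order:
        page = page_of[row]
        if page != cached:
            loads += 1
            cached = page
    return loads


def simulate(n_rows: int = 1000, rows_per_page: int = 20, seed: int = 0) -> Tuple[int, int, int]:
    rng = random.Random(seed)
    index_order = list(range(n_rows))  # row keys sorted by the indexed column
    storage_order = index_order[:]
    rng.shuffle(storage_order)  # physical placement unrelated to the index
    page_of = {row: pos // rows_per_page for pos, row in enumerate(storage_order)}
    sequential = page_loads(storage_order, page_of)  # full table scan, then sort in memory
    via_index = page_loads(index_order, page_of)  # whole table fetched in index order
    top5 = page_loads(index_order[:5], page_of)  # only the first 5 rows via the index
    return sequential, via_index, top5


if __name__ == "__main__":
    print(simulate())  # sequential is n_rows / rows_per_page = 50; via_index is far higher
```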
A good SQL query planner, btw, will make its decision on whether to use an index or not based on how fragmented your data is. If fetching rows in order means zooming back and forth across the table, a good planner may decide that it's not worth using the index. In contrast, if the table is clustered using that same index, the rows are guaranteed to be in order, increasing the likelihood that it'll get used.
But then, if you join the same query with another table and that other table has an extremely selective where clause that can use a small index, the planner might decide it's actually better to, e.g. fetch all IDs of rows that are tagged as foo, hash join them with publications, and heap sort them in memory.
MySQL tries to determine the best way to run a given query, and decides whether or not to use indexes based on what it thinks is the best.
It isn't always correct. Sometimes manually forcing a query to use an index is faster; sometimes it's not.
If you run some testing with sample data in your specific situation, you should be able to see which method performs faster, and stick with that one.
Make sure you take query caching into account to get an accurate performance benchmark.
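A rough Python benchmarking harness for that kind of test. It uses sqlite3 so it runs anywhere; SQLite's INDEXED BY clause plays the role of MySQL's FORCE INDEX here, and the table, index name and 562 rows of data are made up. It discards warm-up runs and reports the median of several runs. Against MySQL 5.x you could add SQL_NO_CACHE to the SELECT to bypass the query cache (the query cache was removed in MySQL 8.0).

```python
import sqlite3
import statistics
import time
from typing import Any

TYPES = ("article", "event", "work")
PLAIN = (
    "SELECT * FROM publication WHERE type IN ('article', 'event', 'work') "
    "ORDER BY global_order DESC"
)
# SQLite's INDEXED BY is used here as a rough analogue of MySQL's FORCE INDEX.
FORCED = (
    "SELECT * FROM publication INDEXED BY uniq_order_type "
    "WHERE global_order > 0 AND type IN ('article', 'event', 'work') "
    "ORDER BY global_order DESC"
)


def make_publication_db(n_rows: int = 562) -> sqlite3.Connection:
    """In-memory publication table with made-up rows and a (global_order, type) index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE publication (id INTEGER PRIMARY KEY, global_order INTEGER NOT NULL, "
        "title TEXT NOT NULL, type TEXT NOT NULL)"
    )
    conn.execute("CREATE UNIQUE INDEX uniq_order_type ON publication (global_order, type)")
    conn.executemany(
        "INSERT INTO publication VALUES (?, ?, ?, ?)",
        [(i, i, f"title {i}", TYPES[i % 3]) for i in range(1, n_rows + 1)],
    )
    return conn


def median_runtime(conn: Any, sql: str, runs: int = 7, warmup: int = 2) -> float:
    """Median wall-clock seconds over `runs` executions, after `warmup` discarded runs."""
    for _ in range(warmup):
        conn.execute(sql).fetchall()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(sql).fetchall()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


if __name__ == "__main__":
    db = make_publication_db()
    assert db.execute(PLAIN).fetchall() == db.execute(FORCED).fetchall()  # same rows either way
    for label, sql in (("plain", PLAIN), ("forced", FORCED)):
        print(label, f"{median_runtime(db, sql) * 1e3:.3f} ms")
```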
Forcing the use of an index is rarely the best answer. In general it is better to create and/or optimize the indices (indexes) so that MySQL chooses to use them. (It is even better to optimize the queries, but I understand you cannot do that here.)
When you are using something like Doctrine where you cannot optimize the queries and the indices don't help, your best bet is to focus on query caching. :-)