Table partitioning takes too long with percona mysql - mysql

I'm trying to create a new partition (LIST) on an existing table (innodb) with 30million records. the query is running almost 2 hours and it still on "copy to tmp table" state.
I have used the percona mysql performance wizard to improve and I don't see any differences. The server is with no traffic at all.
Running on Ubuntu server with 16cores, 30GB memory and SSD 300/3000 IO. it looks like the mysql is not using all resources. the memory usage is on 9GB and only 3 cores are running on very low load.
Is there a way to improve setting to use more resources and speed up the query?

First of all, PARTITION BY LIST is virtually useless. Why do you think it might be worth doing?
Let's see SHOW CREATE TABLE. If there are a lot of secondary indexes, that could be the issue.
How big is innodb_buffer_pool_size? Sounds like it is not as big as it should be. (Recommend about 70% of available RAM.)
Let's see the SQL that is taking so long. There may be something subtle in it.

Related

MySQL configuration for a data science workload?

All of what I find on the web for advice on tuning MySQL for performance deals with production databases that have a high number of connections and many repeated queries. This is not my workload, instead, I'm doing data investigation with MySQL where I am the only user, the data doesn't change very often (bulk imports only), and the number of connections I might have at any given time is < 20. The data I have is largish (several hundred gigs, tables with 50M rows with a bunch of strings in them), but the queries I write are rarely run more than a few times each.
I have the O'Reilly Schwartz et al. book on MySQL and it has been a godsend for understanding how to make some things (like indices) work to my advantage. Yet I feel much less comfortable with the server parameters for this kind of workload, as I can find few examples on the web. Here are the non-stock (MySQL 5.5, Ubuntu) parameters I am running with:
max_heap_table_size=32G
tmp_table_size=32G
join_buffer_size=6G
innodb_buffer_pool_size=10G
innodb_buffer_pool_instances=2
sort_buffer_size=100M
My server is a multi-core (quad, seems wasted on MySQL but sometimes I'll hit up a couple of queries at once) 32GB of RAM machine. Right now it looks like MySQL is limiting itself to 12GB of ram, likely because of the innodb_buffer_pool size. I set tmp_table_size and heap size to be just fantastical because I had been doing some queries where I stored a lot in memory.
Are there any good resources to tune MySQL to this kind of workload? Are there suggestions on what parameters I should set for innodb?
I don't think you have to tune your InnoDB engine performance any more. The real performance gain will be in the way you structure tables, and the queries you write. Be sure that the columns you select on are indexed, sensible primary keys are chosen, etc. Tables with 50M rows shouldn't be a problem as long as you have a good primary key.
If you haven't run into any performance bottlenecks yet, then I think there is no reason to worry.

MySQL memory usage for large database

We have a MySQL DB (OLD_DATA) where many partitioned table size has gone over 100GB. To improve server performance we thought of creating a parallel DB (NEW_DATA) and start collecting new data in NEW_DATA.
The DBs are MyISAM and the server has 96GB RAM.
After this the OLD_DATA will not be accessed.
Will this approach help in improving server performance in terms of RAM and CPU usage?
Will the data from OLD_DATA be loaded in memory?
Please provide SHOW CREATE TABLE. If it is, as you hinted, PARTITIONed, then we need to factor that into the analysis.
Generally if "old" rows are no longer accessed in any way, they do not hurt, and there would be no need to do what you did. Please elaborate on the queries that you feel are "slow", preferably by providing EXPLAIN SELECT.
MyISAM caches index blocks in the "key_buffer"; how big are the indexes? What is key_buffer_size set to? Data blocks are cached by the OS.
It is quite 'normal' for a system to have a dataset that are many times as big as RAM. Performance depends on the 'working set' of the dataset and on how actively you are querying the data. You have implied that "old" rows are not used, hence not part of the 'working set'.
On the other hand, if you have "table scans", the entire table is part of the working set. It is quite possible that we can advise on how to change those queries to be nicer.

MySQL ALTER TABLE on very large table - is it safe to run it?

I have a MySQL database with a MyISAM table with 4 million rows. I update this table about once a week with about 2000 new rows. After updating, I then alter the table like this:
ALTER TABLE x ORDER BY PK DESC
I order the table by the primary key field in descending order. This has not given me any problems on my development machine (Windows with 3GB memory). Three times I have tried it successfully on the production Linux server (with 512MB RAM - and achieving the resulted sorted table in about 6 minutes each time), the last time I tried it I had to stop the query after about 30 minutes and rebuild the database from a backup.
Can a 512MB server cope with that alter statement on such a large table? I have read that a temporary table is created to perform the ALTER TABLE command.
Question: Can this alter command be safely run? What should be the expected time for the alteration of the table?
As I have just read, the ALTER TABLE ... ORDER BY ... query is useful to improve performance in certain scenarios. I am surprised that the PK Index does not help with this. But, from the MySQL docs, it seems that InnoDB does use the index. However InnoDB tends to be slower as MyISAM. That said, with InnoDB you wouldn't need to re-order the table but you would lose the blazing speed of MyISAM. It still may be worth a shot.
The way you explain the problems, it seems that there is too much data loaded into memory (maybe there is even swapping going on?). You could easily check that with monitoring your memory usage. It's hard to say as I do not know MySQL all that well.
On the other hand, I think your problem lies at a very different place: You are using a machine with only 512 Megs of RAM as Database server with a table containing more than 4Mio rows... And you are performing a very memory-heavy operation on the whole table on that machine. It seems that 512Megs will not nearly be enough for that.
A much more fundamental issue I am seeing here: You are doing development (and quite likely testing as well) in an environment that is very different to the production environment. The kind of problem you are explaining is to be expected. Your development machine has six times as much memory as your production machine. I believe I can safely say, that the processor is much faster as well. In that case, I suggest you create a virtual machine mimicking your production site. That way you can easily test your project without disrupting the production site.
What you're asking it to do is rebuild the entire table and all its indexes; this is an expensive operation particularly if the data doesn't fit in ram. It will complete, but it will be vastly slower if the data doesn't fit in ram, particularly if you have lots of indexes.
I question your judgement when choosing to run a machine with such tiny memory in production. Anyway:
Is this ALTER TABLE really necessary; what specific query are you trying to speed up, and have you tried it without?
Have you considered making your development machine more like production? I mean, using a dev box with MORE memory is never a good idea, and using a different OS is definitely not either.
There is probably also some tuning you can do to try to help; it largely depends on your schema (indexes in particular). 4M rows is not very many (for a machine with normal amounts of ram).
is the primary key auto_increment? if so, then doing ALTER TABLE ... ORDER BY isn't going to improve anything since everything will be inserted in order anyway.
(unless you have lots of deletes)
I'd probably create a View instead which is ordered by the PK value, so that for one thing you don't need to lock up that huge table while the ALTER is being performed.
If you're using InnoDB, you shouldn't have to explicitly perform the ORDER BY either post-insert or at query time. According to the MySQL 5.0 manual, InnoDB already defaults to primary key ordering for query results:
http://dev.mysql.com/doc/refman/5.0/en/alter-table.html#id4052480
MyISAM tables return records in insertion order by default, instead, which may work as well if you only ever append to the table, rather than using an UPDATE query to modify any rows in-place.

General mysql freeze

Our MySQL server is behaving strangely - doing a heavy query on one table will pretty much lock it, no other query will be responded to, even if it's a trivial one on completely unrelated table (like SELECT by primary key, INSERT, or even EXPLAIN) - it will just wait for the heavy one to finish. I'm really out of ideas what could be causing that - there's plenty of free threads, CPU is in idle/wait state, network is free, there's plenty of memory ... etc. Anybody has any ideas what to look for?
How much memory does MySQL have? My guess is that it's having to go to disk for temporary tables and that can be killer. How big is your innodb_buffer_pool_size in your my.cnf/my.ini file? By increasing that (which ups memory usage) smaller queries (and queries that need to do lots of sorting) won't have to hit disk.
The next time a query starts to do this to you, log onto MySQL, do "show full processlist" and record what's there. If you issue an EXPLAIN or a DESCRIBE on the query that's going slow, what does it is? Does it include the note that it's going to a temporary table?
When MySQL starts using temp tables, things can feel like they are dying, especially if you don't have a fast disk.

How to predict MySQL tipping points?

I work on a big web application that uses a MySQL 5.0 database with InnoDB tables. Twice over the last couple of months, we have experienced the following scenario:
The database server runs fine for weeks, with low load and few slow queries.
A frequently-executed query that previously ran quickly will suddenly start running very slowly.
Database load spikes and the site hangs.
The solution in both cases was to find the slow query in the slow query log and create a new index on the table to speed it up. After applying the index, database performance returned to normal.
What's most frustrating is that, in both cases, we had no warning about the impending doom; all of our monitoring systems (e.g., graphs of system load, CPU usage, query execution rates, slow queries) told us that the database server was in good health.
Question #1: How can we predict these kinds of tipping points or avoid them altogether?
One thing we are not doing with any regularity is running OPTIMIZE TABLE or ANALYZE TABLE. We've had a hard time finding a good rule of thumb about how often (if ever) to manually do these things. (Since these commands LOCK tables, we don't want to run them indiscriminately.) Do these scenarios sound like the result of unoptimized tables?
Question #2: Should we be manually running OPTIMIZE or ANALYZE? If so, how often?
More details about the app: database usage pattern is approximately 95% reads, 5% writes; database executes around 300 queries/second; the table used in the slow queries was the same in both cases, and has hundreds of thousands of records.
The MySQL Performance Blog is a fantastic resource. Namely, this post covers the basics of properly tuning InnoDB-specific parameters.
I've also found that the PDF version of the MySQL Reference Manual to be essential. Chapter 7 covers general optimization, and section 7.5 covers server-specific optimizations you can toy with.
From the sound of your server, the query cache may be of IMMENSE value to you.
The reference manual also gives you some great detail concerning slow queries, caches, query optimization, and even disk seek analysis with indexes.
It may be worth your time to look into multi-master replication, allowing you to lock one server entirely and run OPTIMIZE/ANALYZE, without taking a performance hit (as 95% of your queries are reads, the other server could manage the writes just fine).
Section 12.5.2.5 covers OPTIMIZE TABLE in detail, and 12.5.2.1 covers ANALYZE TABLE in detail.
Update for your edits/emphasis:
Question #2 is easy to answer. From the reference manual:
OPTIMIZE:
OPTIMIZE TABLE should be used if you have deleted a large part of a table or if you have made many changes to a table with variable-length rows. [...] You can use OPTIMIZE TABLE to reclaim the unused space and to defragment the data table.
And ANALYZE:
ANALYZE TABLE analyzes and stores the key distribution for a table. [...] MySQL uses the stored key distribution to decide the order in which tables should be joined when you perform a join on something other than a constant. In addition, key distributions can be used when deciding which indexes to use for a specific table within a query.
OPTIMIZE is good to run when you have the free time. MySQL optimizes well around deleted rows, but if you go and delete 20GB of data from a table, it may be a good idea to run this. It is definitely not required for good performance in most cases.
ANALYZE is much more critical. As noted, having the needed table data available to MySQL (provided with ANALYZE) is very important when it comes to pretty much any query. It is something that should be run on a common basis.
Question #1 is a bit more of a trick. I would watch the server very carefully when this happens, namely disk I/O. My bet would be that your server is thrashing either your swap or the (InnoDB) caches. In either case, it may be query, tuning, or load related. Unoptimized tables could cause this. As mentioned, running ANALYZE can immensely help performance, and will likely help out too.
I haven't found any good way of predicting MySQL "tipping points" -- and I've run into a few.
Having said that, I've found tipping points are related to table size. But not merely raw table size, rather how big the "area of interest" is to a query. For example, in a table of over 3 million rows and about 40 columns, about three-quarters integers, most queries that would easily select a portion of them based on indices are fast. However, when one value in a query on one indexed column means two-thirds of the rows are now "interesting", the query is now about 5-times slower than normal. Lesson: try to arrange your data so such a scan isn't necessary.
However, such behaviour now gives you a size to look for. This size will be heavily dependant on your server setup, the MySQL server variables and the table's schema and data.
Similarly, I've seen reporting queries run in reasonable time (~45 seconds) if the period is two weeks, but take half-an-hour if the period is extended to four weeks.
Use slow query log that will help you to narrow down the queries you want to optimize.
For time critical queries it sometimes better to keep stable plan by using hints.
It sounds like you have a frustrating situation and maybe not the best code review process and development environment.
Whenever you add a new query to your code you need to check that it has the appropriate indexes ready and add those with the code release.
If you don't do that your second option is to constantly monitor the slow query log and then go beat the developers; I mean go add the index.
There's an option to enable logging of queries that didn't use an index which would be useful to you.
If there are some queries that "works and stops working" (but are "using and index") then it's likely that the query wasn't very good in the first place (low cardinality in the index; inefficient join; ...) and the first rule of evaluating the query carefully when it's added would apply.
For question #2 - On InnoDB "analyze table" is basically free to run, so if you have bad join performance it doesn't hurt to run it. Unless the balance of the keys in the table are changing a lot it's unlikely to help though. It almost always comes down to bad queries. "optimize table" rebuilds the InnoDB table; in my experience it's relatively rare that it helps enough to be worth the hassle of having the table unavailable for the duration (or doing the master-master failover stuff while it's running).