MySQL - Alter a few tables at the same time

I have a MySQL database with InnoDB tables.
I need to alter a couple of big tables (~50M records). Since altering locks the tables, I want to make the process as fast as possible.
Which is best in terms of speed:
1. altering one table at a time
2. altering both tables at the same time (simultaneously)
Any ideas?

I did a test.
I created a table with 4 million rows. It is a very simple table: one column, with the value "dude" in every row. I then duplicated that table into big_2, containing exactly the same data.
My computer is a 13.3" MacBook Pro from mid-2010, so all timings are relative to that machine.
I then did three things.
I ran an alter on both tables in serial; it took 34 and 33 seconds to add the column (67s total).
I ran the alter on both tables in parallel; it took 1.1 min before they returned (basically at the same time) (61s total).
I redid the first test, and this time it took 35 + 35 seconds (70s in total).
This confirms my suspicion that it won't be any faster in parallel. The most likely reason is that this is almost entirely a disk operation, and that cannot be parallelized at all.
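To put the three runs side by side, here is a small arithmetic sketch using only the timings quoted above (the 61s figure is the total reported for the parallel run). With just three measurements this is indicative at best, not a benchmark.

```python
# Timings quoted in the test above, in seconds.
serial_runs = [34 + 33, 35 + 35]   # two serial runs: 67 s and 70 s
parallel_run = 61                  # total reported for the parallel run

serial_mean = sum(serial_runs) / len(serial_runs)
saving = serial_mean - parallel_run
saving_pct = 100 * saving / serial_mean

print(f"serial mean: {serial_mean:.1f} s")
print(f"parallel:    {parallel_run} s")
print(f"difference:  {saving:.1f} s ({saving_pct:.0f}% of the serial time)")
```

The gap is a few seconds on a run of roughly a minute, nowhere near the halving you would hope for if the two alters did not compete for the same disk.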

Really depends on how much memory you have in your server.
When you're doing ALTER TABLE, you really want the table and its largest secondary index (remember that InnoDB clusters the primary key, so the PK is stored with the rows) to fit into memory. If it doesn't, it's going to be slow (NB: this discussion assumes the table is not partitioned).
As your table has a tiny 50M rows, the chances are it fits in RAM trivially (you have 32G+ on your server, right?) with all its secondary indexes.
If it all fits in the InnoDB buffer pool, do them in parallel. If it doesn't, do them in series.
Try it on a development server which has the same spec as production (obviously configure both with the same innodb_buffer_pool_size).
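The rule of thumb above can be sketched as a small check. This is only an illustration: the function name and the table sizes are hypothetical, and it conservatively counts each table's data plus all of its indexes (the kind of figures reported in the DATA_LENGTH and INDEX_LENGTH columns of information_schema.TABLES) rather than just the largest secondary index.

```python
GIB = 1024 ** 3

def alter_strategy(table_sizes, buffer_pool_bytes):
    """Suggest 'parallel' if all tables fit in the buffer pool together, else 'serial'.

    table_sizes: list of (data_bytes, index_bytes) pairs, one per table.
    Hypothetical helper; the threshold is a rule of thumb, not a guarantee.
    """
    needed = sum(data + index for data, index in table_sizes)
    return "parallel" if needed <= buffer_pool_bytes else "serial"

# Made-up sizes for two large tables: 8 GiB each including indexes.
tables = [(6 * GIB, 2 * GIB), (5 * GIB, 3 * GIB)]

print(alter_strategy(tables, 24 * GIB))  # both tables fit in a 24 GiB pool
print(alter_strategy(tables, 8 * GIB))   # 16 GiB needed, only 8 GiB available
```

Whatever the check says, timing both approaches on a test server with the production buffer pool size remains the real answer.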

Doing it simultaneously won't give you much gain. It still has to wait until the first is finished to do the second one.
You may prefer to run the queries with a short delay between them, so that other queries that have been waiting for the lock since the beginning of the first update don't have to wait for the second as well. If your database is serving a website, for example, two 15-second hangs are better than a single 30-second one.

Related

Drop and recreate index on MySQL table: will it improve performance?

I executed a query on a newly imported MySQL database, and it took 68 seconds to complete. I then dropped and recreated the same indexes on the 2 main tables, and it took only 24 seconds.
Why did this happen? Is it good practice or not?
Thanks in Advance
You are misinterpreting the results and the cause. Dropping and re-creating the indexes isn't what makes it go faster. There are two things that could be going on:
1) The DB doesn't fit into RAM, so recreating the two indexes left most of them in the buffer pool by the time you ran the query.
2) The table was fragmented or had very lightly filled blocks. Recreating the indexes probably rebuilt the table, which may have improved page occupancy. If your query requires a full table scan, this would have meant fewer GBs of table to scan, and possibly less fragmentation (which can matter on spinning rust).
As a general rule you should never need to do that. If you disable the query cache (query_cache_type=0, query_cache_size=0 on MySQL < 8) and run the query twice, the second run shows the speed you can expect with a warm buffer pool.

Queries fast after creating an index but slow after a few minutes MySQL

I have several tables with ~15 million rows. When I create an index on the id column and then execute a simple query like SELECT * FROM my_table WHERE id = 1, I retrieve the data within one second. But then, after a few minutes, if I execute the query with a different id it takes over 15 seconds.
I'm sure it is not the query cache, because I'm trying different ids all the time to make sure I'm not retrieving from the cache. Also, I used EXPLAIN to make sure the index is being used.
The specs of the server are:
CPU: Intel Dual Xeon 5405 Harpertown 2.0Ghz Quad Core
RAM: 8GB
Hard drive 2: 146GB SAS (15k rpm)
Another thing I noticed is that if I execute REPAIR TABLE my_table, the queries return within one second again. I assume something is being cached, either the table or the index. If so, is there any way to tell MySQL to keep it cached? Is it normal, given the specs of the server, to take around 13 seconds on an indexed table? The index is not unique and each query returns around 3000 rows.
NOTE: I'm using MyISAM and I know there won't be any write in these tables, all the queries will be to read data.
SOLVED: Thank you for your answers. As many of you pointed out, it was the key_buffer_size. I also reordered the tables using the same column as the index so the records are not scattered, and now the queries consistently execute in under 1 second.
Please provide
SHOW CREATE TABLE
SHOW VARIABLES LIKE '%buffer%';
Likely causes:
key_buffer_size (when using MyISAM) is not 20% of RAM; or innodb_buffer_pool_size is not 70% of available RAM (when using InnoDB).
Another query (or group of queries) is coming in and "blowing out the cache" (key_buffer or buffer_pool). Look for such queries.
When using InnoDB, you don't have a PRIMARY KEY. (It is really important to have such.)
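As a quick illustration of the percentages mentioned in the list above, here is a sketch that turns them into byte values for the 8GB server from the question. The function name is hypothetical, and the 20% / 70% figures are the rule of thumb from this answer, not hard limits.

```python
GIB = 1024 ** 3

def suggested_cache_size(ram_bytes, engine):
    """Return (variable_name, bytes) using the 20% / 70% rule of thumb quoted above."""
    if engine == "MyISAM":
        return "key_buffer_size", ram_bytes * 20 // 100
    if engine == "InnoDB":
        return "innodb_buffer_pool_size", ram_bytes * 70 // 100
    raise ValueError(f"no rule of thumb for engine {engine!r}")

for engine in ("MyISAM", "InnoDB"):
    name, size = suggested_cache_size(8 * GIB, engine)
    print(f"{engine}: {name} = {size / GIB:.1f} GiB")
```

Compare these against the output of SHOW VARIABLES LIKE '%buffer%'; a default install is often far below either figure.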
For 3000 rows to take 15 seconds to load, I deduce:
The cache for the table (not necessarily for the index) was blown out, and
The 3000 rows were scattered around the table (hence fetching one row does not help much in finding subsequent rows).
Memory allocation blog: http://mysql.rjweb.org/doc.php/memory
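The deduction above (uncached, scattered rows) can be sanity-checked with back-of-the-envelope arithmetic for the 15k rpm drive in the question. The rotational latency follows from the spindle speed; the average seek time is an assumed figure, and the sketch assumes one random disk read per row with nothing cached.

```python
RPM = 15_000                       # from the server specs in the question
ROWS = 3000                        # rows returned per query, from the question
ASSUMED_AVG_SEEK_MS = 3.0          # assumption: typical order of magnitude for a 15k SAS drive

rotation_ms = 60_000 / RPM                     # time for one full revolution
avg_rotational_latency_ms = rotation_ms / 2    # on average, wait half a revolution

ms_per_random_read = ASSUMED_AVG_SEEK_MS + avg_rotational_latency_ms
total_s = ROWS * ms_per_random_read / 1000

print(f"{ms_per_random_read:.1f} ms per random read")
print(f"{total_s:.1f} s for {ROWS} scattered rows")
```

Under these assumptions the estimate lands in the same range as the 13 to 15 seconds observed, which is consistent with each row costing its own disk seek.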
Is it normal, given the specs of the server, to take around 13 seconds on an indexed table?
The high variance in response time indicates that something is amiss. With only 8 GB of RAM and 15 million rows, you might not have enough RAM to keep the index in memory.
Is swap enabled on the server? This could explain the extreme jump in response time.
Investigate the memory situation with a tool like top, htop or glances.

Creating indexes on large tables in MySQL (MariaDB) takes a verrry looong time

I have a table with a few billion rows of data and I am trying to build 5 indexes on it at once. The table format is MyISAM to save space. Once I build the indexes this will be a static table, I just need it to be read only.
I created the indexes using this command:
alter table links8 add index(uid,tid), add index (date), add index (tid), add index (userid), add index (updated,uid,tid,userid,date);
The command has been running for over 45 days. You read that right: 45 DAYS. I can see that the temp files are still being accessed, it isn't a dead query.
My question is: wtf? Seems like it should take a few hours at most to sort and build an index even with a few billion rows.
Since I have a static table, is there another storage engine that makes sense to use? InnoDB takes up way too much space.
45 days doesn't seem right, because in that time, MySQL is bound to do something, and that something is likely either consuming RAM or storage, likely both, which means that you should have run out of either at some point.
I'd assume it's RAM, because that usually is where things get sparse ;)
Now, you're absolutely right, sorting a few billion values in memory shouldn't take ages. Sorting a few billion values that are the concatenated values in (updated,uid,tid,userid,date), though, most likely doesn't happen in RAM. Assuming updated and date are of type datetime, they take 8 bytes each; uid, tid and userid would normally be 32-bit ints, but since your table has > 2**32 entries (I'm assuming that), unique IDs would be 8 bytes long, too. So one value of type (updated,uid,tid,userid,date) would be 40B long.
Now throw in, let's say, 5 billion of these; you get 200 GB of pure row data that you'll need to sort to build an index. Assuming you're not doing this on some huge machine, you obviously need to swap out parts of these values to disk -- since you see temporary files appear, my wild guess is that this is happening, and MySQL is actively doing that itself. Now, sorting algorithms that work on parts of the rows iteratively are much slower, because first you sort all parts, then you mix up the parts in a manner that's better sorted than before, then you re-partition your data, then you sort your parts again ... with storing and loading from disk in between.
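For readers who have not seen a disk-based sort, here is a minimal sketch of a textbook external merge sort: sort memory-sized chunks, write each as a sorted run to a temp file, then merge the runs. It illustrates why temp files and extra I/O appear; it is not a description of what MySQL or MyISAM actually does internally, and the function name is made up.

```python
import heapq
import os
import tempfile

def external_sort(values, chunk_size):
    """Yield ints in sorted order while holding at most chunk_size of them in memory."""
    run_paths = []
    chunk = []

    def flush():
        # Sort the in-memory chunk and write it out as one sorted run.
        chunk.sort()
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "w") as f:
            f.writelines(f"{v}\n" for v in chunk)
        run_paths.append(path)
        chunk.clear()

    for v in values:
        chunk.append(v)
        if len(chunk) >= chunk_size:
            flush()
    if chunk:
        flush()

    files = [open(p) for p in run_paths]
    try:
        # k-way merge: reads the runs lazily, one line per run at a time.
        yield from heapq.merge(*((int(line) for line in f) for f in files))
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)

print(list(external_sort([5, 3, 9, 1, 7, 2, 8], chunk_size=3)))
```

Every value is written to disk once and read back once, so with hundreds of GB of keys the job becomes I/O-bound, and it gets worse if the merge itself has to be done in several passes.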
By the way, a 45 day lasting memory operation is likely to be prone to memory bit errors, if no correctional measures are taken (basically, use ECC for this kind of task, or you end up with indexed, corrupted data).
MySQL themselves suggest that you just build a special MD5 index that takes the hash of your search tuple and looks for that, since sorting 128bit (==16 byte) MD5 hashes might be easier than sorting 5*8Byte == 40*8 bit == 320bit long composite rows.
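The size argument can be checked with a short sketch. It packs the five fields as 8-byte integers (the answer's assumption about column widths), hashes them with MD5, and scales both sizes to the answer's hypothetical 5 billion rows. The field values and helper names are made up for illustration.

```python
import hashlib
import struct

ROWS = 5_000_000_000  # the "let's say 5 billion" figure from the answer

def packed_key(updated, uid, tid, userid, date):
    """Concatenate the five columns as 8-byte big-endian integers (assumed widths)."""
    return struct.pack(">5q", updated, uid, tid, userid, date)

def md5_key(*fields):
    """16-byte MD5 digest of the packed composite key."""
    return hashlib.md5(packed_key(*fields)).digest()

fields = (1199235602, 42, 7, 42, 1199235602)  # made-up sample row
key = packed_key(*fields)
digest = md5_key(*fields)

print(len(key), "bytes per composite key ->", len(key) * ROWS / 1e9, "GB")
print(len(digest), "bytes per MD5 key     ->", len(digest) * ROWS / 1e9, "GB")
```

Keep in mind that a hash of the tuple only helps exact-match lookups on all five columns; range scans or lookups on a leading subset of the columns would still need an ordinary index.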
I found a better solution.
I created a new table with the indexes already in place then issued an insert from one table to the other. The way this works is it fills the MYD (raw data file) up and then creates the indexes after that. Once it has started creating the indexes I killed the query. Then on the filesystem I used myisamchk to repair the table manually.
That command looked like this:
myisamchk --force --fast --update-state --key_buffer_size=2000M --sort_buffer_size=2000M --read_buffer_size=10M --write_buffer_size=10M TABLE.MYI
And the whole thing took less than 12 hours and the data looks good!
UPDATE:
Here is the flow summarized.
create table2 identical to table1, with indexes;
insert into table2 select * from table1;
once the MYD file is full and it starts on the MYI file, kill the query
then shut down mysql, run the myisamchk command, and restart mysql
OR
copy table2.MYD and table2.MYI to table3.MYD and table3.MYI, then run myisamchk, then copy table2.frm to table3.frm and change the permissions. When it's all done, you should be able to access table3 without a restart of mysql.

2273 msec to insert 7679 records is this fast or slow?

I'm planning on inserting a large number of rows into my MySQL database. At the moment I'm inserting about 8000 records into an almost empty table with no indexes (only a primary key with auto-increment), using batches, on a MySQL server (default install) on localhost (i7, 6 GB, fast HD).
It currently takes about 2273 msec to insert 7679 records.
A single record looks like:
39492, 1.4618, 1.4619, 1.4606, 1.4613, 1199235602000, 0, 133
I was wondering whether this is a normal average speed, or whether I should be worried because it is extremely slow.
I ask because I have no reference point when it comes to speed, so I don't know whether my code is good or might be buggy.
0.3 milliseconds per row is respectable performance, especially if you haven't yet done anything to make your code run fast. If you have any indexes on your table, the insertion rate may slow down once there are many thousands of rows already in the database. Then you'll need to look at disabling constraints, loading the table, and then re-enabling constraints. But you can cross that bridge if you come to it.
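For reference, the per-row figure comes straight from the numbers in the question:

```python
elapsed_ms = 2273   # total insert time from the question
rows = 7679         # rows inserted

ms_per_row = elapsed_ms / rows
rows_per_second = rows / (elapsed_ms / 1000)

print(f"{ms_per_row:.3f} ms per row")
print(f"{rows_per_second:.0f} rows per second")
```

That works out to roughly 3,400 rows per second on a default install, which is a useful baseline to compare against once indexes are added.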

Generating a massive 150M-row MySQL table

I have a C program that mines a huge data source (20GB of raw text) and generates loads of INSERTs to execute on simple blank table (4 integer columns with 1 primary key). Setup as a MEMORY table, the entire task completes in 8 hours. After finishing, about 150 million rows exist in the table. Eight hours is a completely-decent number for me. This is a one-time deal.
The problem comes when trying to convert the MEMORY table back into MyISAM so that (A) I'll have the memory freed up for other processes and (B) the data won't be killed when I restart the computer.
ALTER TABLE memtable ENGINE = MyISAM
I've let this ALTER TABLE query run for over two days now, and it's not done. I've now killed it.
If I create the table initially as MyISAM, the write speed seems terribly poor (especially due to the fact that the query requires the use of the ON DUPLICATE KEY UPDATE technique). I can't temporarily turn off the keys. The table would become over 1000 times larger if I were to and then I'd have to reprocess the keys and essentially run a GROUP BY on 150,000,000,000 rows. Umm, no.
One of the key constraints to realize: The INSERT query UPDATEs records if the primary key (a hash) exists in the table already.
At the very beginning of an attempt at strictly using MyISAM, I'm getting a rough speed of 1,250 rows per second. Once the index grows, I imagine this rate will tank even more.
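A rough projection from the figures above, as a sketch: it treats 1,250 rows per second as a constant rate applied to the 150 million final rows. That is optimistic, since the rate is expected to drop as the index grows and the number of INSERT statements is much larger than the final row count.

```python
final_rows = 150_000_000     # rows in the finished table
myisam_rows_per_s = 1250     # initial MyISAM rate observed
memory_hours = 8             # time the MEMORY-table run took

myisam_hours = final_rows / myisam_rows_per_s / 3600
memory_rows_per_s = final_rows / (memory_hours * 3600)

print(f"MyISAM, best case: {myisam_hours:.1f} hours")
print(f"MEMORY run:        {memory_rows_per_s:.0f} final rows per second")
```

Even in this best case the on-disk run takes several times longer than the 8-hour MEMORY run.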
I have 16GB of memory installed in the machine. What's the best way to generate a massive table that ultimately ends up as an on-disk, indexed MyISAM table?
Clarification: There are many, many UPDATEs going on from the query (INSERT ... ON DUPLICATE KEY UPDATE val=val+whatever). This isn't, by any means, a raw dump problem. My reasoning for trying a MEMORY table in the first place was for speeding-up all the index lookups and table-changes that occur for every INSERT.
If you intend to make it a MyISAM table, why are you creating it in memory in the first place? If it's only for speed, I think the conversion to a MyISAM table is going to negate any speed improvement you get by creating it in memory to start with.
You say inserting directly into an "on disk" table is too slow (though I'm not sure how you're deciding it is when your current method is taking days). You may be able to turn off or remove the uniqueness constraints, then use a DELETE query later to re-establish uniqueness, then re-enable/add the constraints. I have used this technique when importing into an InnoDB table in the past, and found that even with the later delete it was much faster overall.
Another option might be to create a CSV file instead of the INSERT statements, and either load it into the table using LOAD DATA INFILE (I believe that is faster than the inserts, but I can't find a reference at present) or use it directly via the CSV storage engine, depending on your needs.
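One way the CSV route could look, as a sketch: because the question's INSERT ... ON DUPLICATE KEY UPDATE val=val+whatever just sums values per key, the sums can be accumulated before anything reaches MySQL, and the result written as one CSV row per key. Pre-aggregating is an assumption layered on top of this answer, the two-column layout and sample records are made up, and holding 150 million keys in a Python dict would need a lot of memory, so treat this as an illustration of the idea rather than a drop-in tool.

```python
import csv
import io
from collections import defaultdict

def aggregate(records):
    """Sum val per key, mimicking INSERT ... ON DUPLICATE KEY UPDATE val = val + x."""
    totals = defaultdict(int)
    for key, val in records:
        totals[key] += val
    return totals

def write_csv(totals, out):
    """Write one 'key,total' row per key, in key order."""
    writer = csv.writer(out, lineterminator="\n")
    for key in sorted(totals):
        writer.writerow([key, totals[key]])

# Made-up (key, increment) pairs standing in for the mined data.
records = [(101, 1), (202, 5), (101, 3), (303, 2), (202, 1)]

buf = io.StringIO()   # stand-in for a real file opened with newline=""
write_csv(aggregate(records), buf)
print(buf.getvalue(), end="")
```

The resulting file contains each key once, so it can be loaded with LOAD DATA INFILE without any duplicate-key handling.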
Sorry to keep throwing comments at you (last one, probably).
I just found this article, which provides an example of converting a large table from MyISAM to InnoDB. While this isn't what you are doing, the author uses an intermediate MEMORY table and describes going from memory to InnoDB in an efficient way: ordering the table in memory the way that InnoDB expects it to be ordered in the end. If you aren't tied to MyISAM it might be worth a look, since you already have a "correct" memory table built.
I don't use MySQL but use SQL Server, and this is the process I use to handle a file of similar size. First I dump the file into a staging table that has no constraints. Then I identify and delete the dups from the staging table. Then I search for existing records that might match and put the id field into a column in the staging table. Then I update where the id field column is not null and insert where it is null. One of the reasons I do all the work of getting rid of the dups in the staging table is that it means less impact on the prod table when I run it, and thus it is faster in the end. My whole process runs in less than an hour (and actually does much more than I describe, as I also have to denormalize and clean the data) and affects production tables for less than 15 minutes of that time. I don't have to worry about adjusting any constraints or dropping indexes or any of that, since I do most of my processing before I hit the prod table.
Consider whether a similar process might work better for you. Also, could you use some sort of bulk import to get the raw data into the staging table (I pull the 22 gig file I have into staging in around 16 minutes) instead of working row-by-row?
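The staging workflow described above can be sketched in plain Python, with dicts and lists standing in for the staging and production tables. The function name, the sample data, and the "last occurrence wins" dedup rule are all assumptions made for illustration; the real process would run as set-based SQL inside the database.

```python
def plan_merge(staging_rows, existing_ids):
    """Split staging rows into updates and inserts for the production table.

    staging_rows: iterable of (key, val) pairs loaded without constraints.
    existing_ids: dict mapping key -> id of the matching production row.
    """
    # Step 1: remove duplicates in staging (assumed rule: last occurrence wins).
    deduped = {}
    for key, val in staging_rows:
        deduped[key] = val

    # Step 2: look up matching production ids, then split on whether one was found.
    updates = [(existing_ids[k], v) for k, v in deduped.items() if k in existing_ids]
    inserts = [(k, v) for k, v in deduped.items() if k not in existing_ids]
    return updates, inserts

staging = [("a", 10), ("b", 20), ("a", 11), ("d", 40)]
production_ids = {"a": 1, "c": 3}

updates, inserts = plan_merge(staging, production_ids)
print("update by id:", updates)
print("insert new:  ", inserts)
```

All the deduplication and matching happens before the production table is touched, which is why the window affecting production stays short.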