Very long-running MySQL process

I have a long-running mysql process to update a very large table where the auto-incrementing ID field needs to be changed from INT32 to INT64 (BIGINT). The data size is about 150GB and there are ~2.4B rows.
This is straightforward enough, and the ALTER TABLE has been running for about 3 days now and is still going.
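The exact statement isn't shown above, but for a change like this it would be something along the lines of (the table name here is just a placeholder):
ALTER TABLE my_big_table MODIFY COLUMN id BIGINT NOT NULL AUTO_INCREMENT;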
Is there any way to either (1) track progress here (or maybe even guesstimate how long it will take), or (2) do anything to expedite this, other than crossing my fingers?

Have you seen this: https://dev.mysql.com/doc/refman/5.7/en/monitor-alter-table-performance-schema.html
It might be too late to get the progress for the alter that is currently running. You have to enable a performance_schema instrument, and I don't know offhand if it would apply retroactively if you enable it now.
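For reference, the approach that page describes is roughly the following (a sketch; check the linked manual page for the exact instrument and consumer names on your version):
UPDATE performance_schema.setup_instruments SET ENABLED = 'YES' WHERE NAME LIKE 'stage/innodb/alter%';
UPDATE performance_schema.setup_consumers SET ENABLED = 'YES' WHERE NAME LIKE '%stages%';
-- then, while the ALTER TABLE is running:
SELECT EVENT_NAME, WORK_COMPLETED, WORK_ESTIMATED FROM performance_schema.events_stages_current;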
Another suggestion I have, also for future alterations, not for the one currently running on your server, is to use pt-online-schema-change. At my last job, we used this for dozens of alterations on large tables like yours every week. It has an option to report progress, and even better, it does not block access to the table while it's running.

Related

Can I INSERT into a table while UPDATING multiple different rows with MariaDB or MySQL?

I am creating a custom analytics system and am currently in the database design process. I'm planning to use MariaDB with the InnoDB engine to be able to handle big loads.
The data I'm expecting could be around 500k clicks/day. I will need to insert these rows into the database, which means that I'll have around 5.8 inserts/sec on average. However, at the same time, I want to record if someone visited a page associated with that click. (basically to record funnels)
So what I'm planning to do is to create additional columns, look up the ID of the specific row, and then update that column with the exact time of the visit.
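In other words, something like this (the table and column names are made up for illustration):
UPDATE clicks SET page_visited_at = NOW() WHERE id = 12345;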
My first question: is designing the database like this generally a recommended approach? If not, how would it be better to design it?
My only concern is that while rows are being updated the table will be locked and inserts won't be possible, slowing down the user experience.
My second question: should I worry about the table being locked during updates and thus slowing down inserts? Does it hurt performance?
InnoDB doesn't lock the table for inserts while you're performing an update. Your users won't experience any weird hanging.
It's an MVCC-compliant engine, designed to handle concurrent access to the underlying tables.
You can control the engine's behavior by choosing an appropriate isolation level; however, the default (REPEATABLE READ) is excellent and does the job more than well.
If a table is being modified by multiple users (not users who connect to your site, but connections established to MySQL via a scripting language or some other service) and there are many inserts/updates/deletes, MySQL can throw an error saying a deadlock occurred.
A deadlock isn't fatal; it means more than one thread tried to access an occupied resource (for example, two threads tried to update the same row at the same time, but only one is allowed to do so). It's an indication that you should retry the query.
I'm suggesting that you handle all of these scenarios in the language of your choice when dealing with a MySQL server that's under heavier I/O.
~6 inserts a second isn't a lot; just make sure you're allowing MySQL access to sufficient system resources. For InnoDB, check the value of innodb_buffer_pool_size, or google a bit to see what it is and how to use it to make your database run fast.
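For example, to check what the buffer pool is currently set to, and to change it (the 4 GB figure is only an illustration; the SET GLOBAL form needs MySQL 5.7.5+, while older versions need a my.cnf change and a restart):
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SET GLOBAL innodb_buffer_pool_size = 4294967296; -- 4 GB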
Good luck!
At a mere 5.6 per second, there won't be much of a problem.
I do, however, suggest vertical partitioning for "Likes", "Upvotes", "Clicks", and similar things. These tend to have a lot of UPDATEs of random single rows, and may interfere with other activity.
That is, have a separate table with (perhaps) just 2 columns:
The id of the item being Liked/Clicked/etc.
A counter.
It is simple enough (and fast enough) to JOIN via that id when you want to display info including the counter.
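A minimal sketch of what that might look like (the table and column names are invented):
CREATE TABLE item_clicks (
    item_id INT UNSIGNED NOT NULL PRIMARY KEY,
    clicks  INT UNSIGNED NOT NULL DEFAULT 0
) ENGINE=InnoDB;
-- bump the counter for one item
INSERT INTO item_clicks (item_id, clicks) VALUES (123, 1)
    ON DUPLICATE KEY UPDATE clicks = clicks + 1;
-- pull the counter in when displaying the item
SELECT i.*, COALESCE(c.clicks, 0) AS clicks
    FROM items AS i
    LEFT JOIN item_clicks AS c ON c.item_id = i.id
    WHERE i.id = 123;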
As already pointed out, the row is locked, not the table.

Will a cron job for creating a db affect the user?

I have a website where I need to create a temporary database table which is recreated every 5 hours. It takes about 0.5 sec to complete the action.
Site analytics shows about 5 hits/second. This figure may gradually increase.
Question
The cron job empties the db and then recreates it. Does that mean that someone accessing a page which populates data from the temporary db while the cron job is active may get no data or incomplete data?
Or
Is this scenario taken care of by MySQL's locking?
From my tests, if one MySQL client attempts to drop a database while another client has one of its tables locked, the first client will wait until the table is unlocked.
However, the client dropping the database cannot itself hold a lock on any of that database's tables. So depending on what you are doing, you may need to use some other method to serialise requests and so on. For example, if the job needs to drop and re-create the database, create the table(s), and populate them before other clients use them, table locking will be a problem because there won't always be a table to lock.
Consider using explicit locking with GET_LOCK() to coordinate operations on the "temporary" database.
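A rough sketch of how that could look (the lock name and timeouts are arbitrary):
-- in the cron job:
SELECT GET_LOCK('rebuild_tmp_tables', 30);
-- drop / recreate / repopulate the tables here, then:
SELECT RELEASE_LOCK('rebuild_tmp_tables');
-- in the web application, before reading:
SELECT GET_LOCK('rebuild_tmp_tables', 5);
-- ... run the SELECTs ...
SELECT RELEASE_LOCK('rebuild_tmp_tables');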
Also consider rethinking your strategy. Dropping and re-creating entire databases on a regular basis is not a common practice.
Instead of dropping and recreating, you might want to create the new table first under a temporary name, populate it, and then drop the old one while renaming the new one into place.
Additionally, you should either make your web app fit for retrying if the table was not found, in order to cope with the small time window where the table is not there, or operate on a view instead of renaming tables.
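A minimal sketch of that create-then-swap approach (table names are placeholders); the old and new tables are swapped in a single RENAME statement:
CREATE TABLE report_new LIKE report;
-- repopulate report_new here ...
RENAME TABLE report TO report_old, report_new TO report;
DROP TABLE report_old;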
As far as I know, when you lock the table, others can't access it until you unlock it, but the other connections will only wait about 0.5 seconds, so your users may have to wait an extra 0.5 seconds while you recreate the table.
Don't worry about missing data, just about an occasional delay.

Converting a big MyISAM to InnoDB

I'm trying to convert a 10-million-row MySQL MyISAM table to InnoDB.
I tried ALTER TABLE but that made my server get stuck, so I killed the mysql process manually. What is the recommended way to do this?
Options I've thought about:
1. Making a new table which is InnoDB and inserting parts of the data each time.
2. Dumping the table into a text file and then doing LOAD DATA INFILE
3. Trying again and just keeping the server non-responsive until it finishes (I tried for 2 hours, and since it is a production server I'd prefer to keep it running)
4. Duplicating the table, removing its indexes, then converting, and then adding the indexes back
Changing the engine of a table requires a rewrite of the table, and that's why the table is unavailable for so long. Removing the indexes, converting, and then adding the indexes back may speed up the initial conversion, but adding an index creates a read lock on your table, so the effect in the end will be the same.
Making a new table and transferring the data is the way to go. Usually this is done in two parts: first copy the records, then replay any changes that were made while the records were being copied. If you can afford to disable inserts/updates on the table while leaving reads enabled, this is not a problem. If not, there are several possible solutions. One of them is to use Facebook's online schema change tool. Another option is to set the application to write to both tables while migrating the records, then switch over to the new table only. This depends on the application code, and the crucial part is handling unique keys / duplicates, since in the old table you may update a record while in the new one you need to insert it (here the transaction isolation level may also play a crucial role; lower it as much as you can).
The "classic" way is to use replication, which, as far as I know, is also done in two parts: you start by recording the master position, import a dump of the database on the second server, then start it as a slave to catch up with the changes.
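Leaving out the "replay the changes" part, the basic copy step could look something like this (the table name and id ranges are placeholders):
CREATE TABLE mytable_new LIKE mytable;
ALTER TABLE mytable_new ENGINE=InnoDB;
-- copy in manageable chunks rather than one huge INSERT ... SELECT
INSERT INTO mytable_new SELECT * FROM mytable WHERE id BETWEEN 1 AND 1000000;
INSERT INTO mytable_new SELECT * FROM mytable WHERE id BETWEEN 1000001 AND 2000000;
-- ... and so on, then swap the tables:
RENAME TABLE mytable TO mytable_old, mytable_new TO mytable;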
Have you tried ordering your data by the PK first? e.g.:
ALTER TABLE tablename ORDER BY PK_column;
That should speed up the conversion.

Is Postgres better than MySQL when one needs to add a column to a table with millions of rows?

We're having problems with MySQL. When I search around, I see many people having the same problem.
I have joined a product where the database has some tables with as many as 150 million rows. One example of our problem is that one of these tables has over 30 columns, and about half of them are no longer used. When trying to remove or rename columns, MySQL wants to copy the entire table and then rename it. With this amount of data, it would take many hours, and the site would be offline pretty much the whole time. This is just the first of several large migrations to improve the schema. They aren't intended to be a regular thing, just a lot of cleanup I inherited.
I tried searching to see whether people have the same problem with Postgres, and I found almost nothing discussing this issue. Is this because Postgres is a lot better at it, or just because fewer people are using Postgres?
In PostgreSQL, adding a new column without default value to a table is instantaneous, because the new column is only registered in the system catalog, not actually added on disk.
When the only tool you know is a hammer, all your problems look like nails. For this problem, PostgreSQL is much, much better at handling these types of changes. And the fact is, it doesn't matter how well you designed your app, you WILL have to change the schema on a live database someday. While MySQL's various engines really are amazing for certain corner cases, none of them help here. PostgreSQL's very close integration between its various layers means that you get things like transactional DDL (allowing you to roll back anything that isn't an alter / create of a database or tablespace), very fast ALTER TABLEs, non-impeding CREATE INDEX, and so on. This limits PostgreSQL to the things it does well (traditional transactional db load handling is a strong point) and leaves it not so great at the things where MySQL often fills in the gaps, like live networked clustered storage with the NDB engine.
In this case none of the different engines in MySQL lets you easily solve this problem. The very versatility of multiple storage engines means that the lexer / parser / top layer of the DB cannot be as tightly integrated with the storage engines, and therefore a lot of the cool things pgsql can do here, mysql can't.
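For example, the transactional DDL mentioned above means a schema change in PostgreSQL can simply be rolled back (the table name is a placeholder):
BEGIN;
ALTER TABLE mytable ADD COLUMN test text;
-- changed your mind, or something went wrong?
ROLLBACK;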
I've got a 118 GB table in my stats db. It has 1.1 billion rows in it. It really should be partitioned, but it's not read a whole lot, and when it is, we can wait on it. At 300 MB/sec (the speed the array it's on can read), it takes approximately 118 * ~3.3 seconds to read, or somewhere around 6 to 7 minutes. This machine has 32 GB of RAM, so it cannot hold the table in memory.
When I ran the simple statement on this table:
alter table mytable add test text;
it hung waiting for a vacuum. I killed the vacuum (SELECT pg_cancel_backend(12345); with the vacuum's pid in there) and the ALTER finished immediately. A vacuum on this table takes a long time to run, by the way. Normally that's not a big deal, but when making changes to the table structure, you have to wait on vacuums, or kill them.
Dropping a column is just as simple and fast.
Now we come to the problem with PostgreSQL, and that is the in-heap MVCC storage. If you add that column and then do UPDATE mytable SET test = 'abc', it updates each row and exactly doubles the size of the table. Unless HOT can update the rows in place, but for that you need a 50% fill factor, i.e. a table that is double-sized to begin with. The only way to get the space back is to either wait and let vacuum reclaim it over time, reusing it one update at a time, or to run CLUSTER or VACUUM FULL to shrink it back down.
You can get around this by running the updates on parts of the table at a time (UPDATE ... WHERE pkid BETWEEN 1 AND 10000000; and so on) and running VACUUM between each run to reclaim the space.
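Spelled out, that batching pattern looks roughly like this (the column value and key ranges are placeholders):
UPDATE mytable SET test = 'abc' WHERE pkid BETWEEN 1 AND 10000000;
VACUUM mytable;
UPDATE mytable SET test = 'abc' WHERE pkid BETWEEN 10000001 AND 20000000;
VACUUM mytable;
-- ... and so on through the rest of the key range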
So, both systems have warts and bumps to deal with.
Maybe because this should not be a regular occurrence.
Perhaps, reading between the lines, you need to be adding rows to another table instead of columns to a large existing table?

How much time does MySQL need to build an index

How much time does MySQL need to build an index of a table with 30,000,000 entries that are strings of length 256?
At the moment it seems to take hours, and I don't know how long I should wait before concluding that MySQL has simply failed at building the index.
You can run SHOW PROCESSLIST\G in the mysql console to watch its state. I had a similar problem just a couple of hours ago, but my table was much smaller.
Here is a list of thread states you will definitely need. After an hour of waiting I realized that my ALTER TABLE / CREATE INDEX was in the Locked state, so I needed to restart mysqld and run the statement once again. That time the index was built in 15 minutes.
By the way, I recommend running index creation from the mysql console; GUI tools may add some spice of their own to the process.
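If you want to filter the output, roughly the same information is available from information_schema (a sketch; adjust the LIKE patterns to match your statement):
SELECT id, state, time, info
    FROM information_schema.processlist
    WHERE info LIKE 'ALTER TABLE%' OR info LIKE 'CREATE INDEX%';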
It could easily take hours; it all depends on the machine specs, load, etc. To see whether it has failed, check something like top, or watch your hard drives: if they're going mad, it's still indexing.
Depending on your OS, you may also check for disk activity (i.e. whether the DB files are being read/written) to find out whether it has failed or not.