What does the pt-online-schema-change tool do if it aborts? - mysql

I am planning to use the pt-online-schema-change tool for a table alter on a production server in a replication environment. I wanted to understand what steps are executed when pt-online-schema-change aborts due to server load. Is it possible to resume after it aborts due to server load? Will it drop the temp table and the triggers it created? Will I need to start all over again?

pt-online-schema-change is very verbose about exactly what it's doing, so when something fails I've always been able to read the last few lines and it tells me what to do.
Specifically, when migrations have failed due to load, the triggers have not been dropped. In that case the output has stated exactly what I needed to execute to drop them. It's possible they are kept so you can resume, but I don't know about that. In my case I've always dropped the triggers and started from scratch after tweaking parameters or lowering database usage. My way of doing it works if you aren't scripting pt-online-schema-change executions.
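For illustration only, here is the kind of cleanup the tool's output describes, assuming the default pt-online-schema-change naming conventions for a table mytable in database mydb (always prefer the exact statements printed by the tool itself):
-- Names below assume the tool's usual conventions; confirm against its own output.
DROP TRIGGER IF EXISTS mydb.pt_osc_mydb_mytable_del;
DROP TRIGGER IF EXISTS mydb.pt_osc_mydb_mytable_upd;
DROP TRIGGER IF EXISTS mydb.pt_osc_mydb_mytable_ins;
DROP TABLE IF EXISTS mydb._mytable_new;   -- the partially built copy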
Regarding automatically dropping the temp table and triggers: by default it drops them (see the documentation). However, you can always specify --nodrop-old-table, --nodrop-new-table and --nodrop-triggers. For larger tables (when I have the disk space) I generally specify --nodrop-old-table. This enables me to quickly roll back to the old table in case something goes wrong when the new table is swapped in, by simply issuing
RENAME TABLE mytable TO mytable_failedmigration, old_table TO mytable;
If the migration succeeds I drop my giant table by following the instructions in https://serverfault.com/a/566710/37237.

Related

MySQL/MariaDB Trigger for Taking Ran Query and Pasting into a Row

So one of the projects I'm working on requires us to take every query that is run on the server and automatically paste that query into a table inside of the database. The reason for this is so that the DBA is able to view all prior SQL queries that have been run on the box. Unfortunately I don't have any leeway to do this differently as the client is requiring this implementation.
Has anybody done this before or has any code that I could use that will automatically do this? Thanks.
Be careful! If you do an INSERT for every action taken, you will need to do an INSERT for that INSERT, at which point, you will ...
That is, the first logged query will hang the server and fill up the disk!
Instead of doing the task the way it is asked, turn on the "general log" and periodically scrape what is in it into another machine, which does not have this logging turned on.
Other arguments against the task as stated...
If a table already has a TRIGGER for a given event and timing, you cannot add another one for the same event (at least before MySQL 5.7).
If "every query" really means "every", it is impossible (with a TRIGGER) since you can't write a SELECT or SHOW trigger.
"as the client is requiring this implementation". I would approach this unreasonable constraint by politely finding out what the real goal is. He has only described is an implementation.
If his goal is some kind of audit log, then my suggestion about the general log should suffice.
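A minimal sketch of that general-log approach (these are standard MySQL system variables; how you ship and purge the captured rows is up to you):
SET GLOBAL log_output  = 'TABLE';   -- write to mysql.general_log rather than a file
SET GLOBAL general_log = 'ON';
-- From a job on another machine, periodically pull (and then purge) the rows:
SELECT event_time, user_host, command_type, argument FROM mysql.general_log;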

Transactional DDL workflow for MySQL

I was a little surprised to discover that DDL statements (alter table, create index etc) implicitly commit the current transaction in MySQL. Coming from MS SQL Server, the ability to do database alterations in a transaction locally (that was then rolled back) was an important part of my workflow. For continuous integration, the rollback was used if the migration hiccuped for any reason, so that at least we did not leave the database in a half-migrated state.
How do people solve these two problems when using MySQL with migrations and continuous integration?
DDL statements cause an implicit commit in MySQL and there is nothing you can do about it; there is no way to stop this behaviour.
Which DDL statements behave this way changes over time, so check the documentation for your version:
5.1 http://dev.mysql.com/doc/refman/5.1/en/implicit-commit.html
5.5 http://dev.mysql.com/doc/refman/5.5/en/implicit-commit.html
5.6 http://dev.mysql.com/doc/refman/5.6/en/implicit-commit.html
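A quick illustration of the pitfall, using a hypothetical accounts table:
START TRANSACTION;
INSERT INTO accounts (id, balance) VALUES (1, 100);
CREATE INDEX idx_balance ON accounts (balance);   -- implicit COMMIT happens here
ROLLBACK;                                         -- too late: the INSERT has already been committed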
When we are just extending the schema (new tables/columns/views/procs/etc.) in a way that will not affect existing code, automation is OK; just check for errors and fix them.
When they will affect existing code, you need to devise a strategy on a case-by-case basis. Since there is no rollback, you need your own backout plan and you need to test it thoroughly.
Since it is case-by-case, there is not a lot that I can offer in the way of help for your particular situation.
One possibility is doing DDL changes in a non-destructive manner (a sketch follows this list), which would include:
split the logic into DDL/DCL (plus a reverse script for all of it) and DML
run only the DDL/DCL script adding columns, new tables, ...
depending on the result:
on success, apply the DML changes,
on failure, apply the reverse DDL/DCL script removing the stuff you wanted to add (expect some "does not exist" errors depending on how far the DDL/DCL script got)
once everything is verified, remove what is not needed anymore: drop the old columns/tables
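As a rough sketch (table and column names here are invented, not from the original post):
-- Step 1: forward DDL script, purely additive.
ALTER TABLE orders ADD COLUMN shipped_at DATETIME NULL;
CREATE TABLE order_audit (
    id       BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    order_id BIGINT UNSIGNED NOT NULL,
    note     VARCHAR(255)    NOT NULL
);
-- Step 2: the DML changes go in a separate script.
-- Reverse script, applied only if step 2 fails:
--   DROP TABLE order_audit;
--   ALTER TABLE orders DROP COLUMN shipped_at;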

What mysql command makes all data inaccessible?

My database is periodically being "deleted" by an automated command from the server (because the table is too big). What happens is that all data in a certain table becomes inaccessible with e.g. SELECT. But if I do a "repair" on the table, all data comes back. I would like to stop this nonsense, but I can't find the command that does this. Any help?
Edit: I should note that the DB is on an external machine that I do not have access to.
I have now tried to do a "select" when the db was in this curious state. The table says it has 0 entries, but takes 2.5 GB of storage space. When I selected all I got one tuple, no errors.
It's likely your DB is becoming corrupt somehow. There's no command that does that (I hope).
Do yourself a favor and alter each and every one of your tables so they use the InnoDB engine instead of MyISAM. It'll still be MySQL, but it'll be a lot less prone to data corruption.
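Something along these lines, run once per table (mytable is a placeholder):
ALTER TABLE mytable ENGINE=InnoDB;   -- rebuilds the table using InnoDB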
And if changing DB altogether is an option, look into using PostgreSQL instead.

How to update DB structure when updating production system without doing a teardown / rebuild

If I'm working on a development server and have updates to the database structure for some of our releases, what is the best way to update the structure on the production server?
Currently we create a new production database containing the structure only, do a SQL dump of the data on the 'old' production database, then run a SQL query to insert the data into the new database.
I know there is an easier way to do these updates, right?
Thanks in advance.
We don't run anything on prod without a script, and that script must be in source control. Additionally, we have to write a rollback script in case the initial script goes bad and we have to back it out. And when we move to prod, configuration management does a differential compare between prod and dev to see if we have missed anything in the production script (any differences have to be documented and traceable to development work that is not yet ready to move to prod). A product like Red Gate's SQL Compare can do this. Our process is very formalized so that we can maintain a certification required by our larger clients.
If you have large tables even ALTER TABLE can be slow, but it's still generally more efficient in total time than making a copy of the table with a new name and structure, copying the data to that table, renaming the old table, then giving the new table the name of the original table, then deleting the old table.
However, there are times when that is a preferable process, as the total down time apparent to the user in this case is the time it takes to rename two tables. This is good for tables where the data is only filled from the backend, not the application (if the application can update the tables, it is a dangerous practice to do this, as you may lose changes made while the tables were in transition).
A lot of what process to use depends on the nature of the change you are making. Some changes should be done in a maintenance window where the users are not allowed to access the database. For instance, if you are adding a new field with a default value to a table with 100,000,000 records, you are liable to lock the users out of the table while the update happens. It is better to do this in single user mode during off hours (and when the users are told in advance the database will not be available). Other changes only take milliseconds and can happen easily while users are logged in.
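A rough sketch of that copy-and-swap approach (table and column names are placeholders, not from the original post):
CREATE TABLE mytable_new LIKE mytable;
ALTER TABLE mytable_new ADD COLUMN new_col INT NULL;          -- the structure change
-- Copy the existing rows; list the original columns so new_col just defaults to NULL.
INSERT INTO mytable_new (id, name)
SELECT id, name FROM mytable;
RENAME TABLE mytable TO mytable_old, mytable_new TO mytable;  -- atomic swap
DROP TABLE mytable_old;                                       -- once you are sure nothing is missing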
Look at ALTER TABLE to change the schema.
It might not be easier than your method, but it means less copying of the database.
This is actually quite a deep question. If the only changes you've made are to add some columns then ALTER TABLE is probably sufficient. But if you're renaming or deleting columns then ALTER statements may break various foreign key constraints. In addition, sometimes you need to make changes both to the database and the data, which is pretty much unscriptable.
Most likely the best way to automate this would be to write a simple script for each deployment (along with a script to roll back!) which is basically what some systems like Rails will do for you I believe. Some scripts might be simply ALTER statements, some might temporarily disable foreign-key checking and triggers etc, some might run some update statements as well. And some might be dumping the db and rebuilding it. I don't think there's a one-size-fits-all solution here, sorry :)
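A toy example of one such per-deployment script (table and column names are invented):
-- Disable FK checks only if the change would otherwise trip a constraint.
SET FOREIGN_KEY_CHECKS = 0;
ALTER TABLE customers CHANGE phone phone_primary VARCHAR(32) NOT NULL;
UPDATE customers SET phone_primary = TRIM(phone_primary);
SET FOREIGN_KEY_CHECKS = 1;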
Use the ALTER TABLE command: http://dev.mysql.com/doc/refman/5.0/en/alter-table.html

What is the best way to update (or replace) an entire database table on a live machine?

I'm being given a data source weekly that I'm going to parse and put into a database. The data will not change much from week to week, but I should be updating the database on a regular basis. Besides this weekly update, the data is static.
For now rebuilding the entire database isn't a problem, but eventually this database will be live and people could be querying the database while I'm rebuilding it. The amount of data isn't small (couple hundred megabytes), so it won't load that instantaneously, and personally I want a bit more of a foolproof system than "I hope no one queries while the database is in disarray."
I've thought of a few different ways of solving this problem, and was wondering what the best method would be. Here are my ideas so far:
Instead of replacing entire tables, query for the difference between my current database and what I want to place in the database. This seems like it could be an unnecessary amount of work, though.
Creating dummy data tables, then doing a table rename (or having the server code point towards the new data tables).
Just telling users that the site is going through maintenance and put the system offline for a few minutes. (This is not preferable for obvious reasons, but if it's far and away the best answer I'm willing to accept that.)
Thoughts?
I can't speak for MySQL, but PostgreSQL has transactional DDL. This is a wonderful feature, and means that your second option, loading new data into a dummy table and then executing a table rename, should work great. If you want to replace the table foo with foo_new, you only have to load the new data into foo_new and run a script to do the rename. This script should execute in its own transaction, so if something about the rename goes bad, both foo and foo_new will be left untouched when it rolls back.
The main problem with that approach is that it can get a little messy to handle foreign keys from other tables that key on foo. But at least you're guaranteed that your data will remain consistent.
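In PostgreSQL the swap script can be as simple as the following sketch (using the foo/foo_new names from above):
BEGIN;
ALTER TABLE foo RENAME TO foo_old;
ALTER TABLE foo_new RENAME TO foo;
COMMIT;  -- if anything fails before this point, both tables are left untouched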
A better approach in the long term, I think, is just to perform the updates on the data directly (your first option). Once again, you can stick all the updating in a single transaction, so you're guaranteed all-or-nothing semantics. Even better would be online updates, just updating the data directly as new information becomes available. This may not be an option for you if you need the results of someone else's batch job, but if you can do it, it's the best option.
BEGIN;
-- mytable / mytable_staging are placeholders for the live table and the
-- freshly loaded copy of the week's data.
DELETE FROM mytable;
INSERT INTO mytable SELECT * FROM mytable_staging;
COMMIT;
Users will see the changeover instantly when you hit commit. Any queries started before the commit will run on the old data, anything afterwards will run on the new data. The database will actually clear the old table once the last user is done with it. Because everything is "static" (you're the only one who ever changes it, and only once a week), you don't have to worry about any lock issues or timeouts. For MySQL, this depends on InnoDB. PostgreSQL does it, and SQL Server calls it "snapshotting," and I can't remember the details off the top of my head since I rarely use the thing.
If you Google "transaction isolation" + the name of whatever database you're using, you'll find appropriate information.
We solved this problem by using PostgreSQL's table inheritance/constraints mechanism.
You create a trigger that auto-creates sub-tables partitioned based on a date field.
This article was the source I used.
Which database server are you using? SQL Server 2005 and above provide an isolation level called "snapshot". It allows you to open a transaction, do all of your updates, and then commit, all while users of the database continue to view the pre-transaction data. Normally, your transaction would lock your tables and block their queries, but snapshot isolation would be perfect in your case.
More info here: http://blogs.msdn.com/craigfr/archive/2007/05/16/serializable-vs-snapshot-isolation-level.aspx
But it requires SQL Server, so if you're using something else....
Several database systems (since you didn't specify yours, I'll keep this general) do offer the SQL:2003 Standard statement called MERGE which will basically allow you to
insert new rows into a target table from a source which don't exist there yet
update existing rows in the target table based on new values from the source
optionally even delete rows from the target that don't show up in the import table anymore
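A generic sketch of such a MERGE (table and column names are made up; note that MySQL itself has no MERGE and offers INSERT ... ON DUPLICATE KEY UPDATE instead):
MERGE INTO target AS t
USING weekly_import AS s
    ON t.id = s.id
WHEN MATCHED THEN
    UPDATE SET t.name = s.name, t.price = s.price
WHEN NOT MATCHED THEN
    INSERT (id, name, price) VALUES (s.id, s.name, s.price)
WHEN NOT MATCHED BY SOURCE THEN   -- SQL Server extension for the optional delete
    DELETE;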
SQL Server 2008 is the first Microsoft offering to have this statement - see the SQL Server 2008 documentation for more details.
Other database systems will probably have similar implementations - it's a SQL:2003 Standard statement after all.
Marc
Use different table names (mytable_[yyyy]_[wk]) and a view to provide you with a constant name (mytable). Once a new table is completely imported, update your view so that it uses that table.
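For example (the week numbers are made up to illustrate the naming pattern):
CREATE TABLE mytable_2024_31 LIKE mytable_2024_30;
-- ... load the week's data into mytable_2024_31 ...
CREATE OR REPLACE VIEW mytable AS SELECT * FROM mytable_2024_31;  -- readers switch over atomically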