MySQL Add Column with Online DDL

I'm currently trying to add a column to a table of ~25m rows. I need to have near-0 down time, so was hoping to use online DDL. It runs for a while, but eventually runs into the issue:
"Duplicate entry '1234' for key 'PRIMARY'"
[SQL: u'ALTER TABLE my_table ADD COLUMN my_coumn BOOL NOT NULL DEFAULT false']
I think this is happening because I'm running INSERT ... ON DUPLICATE KEY UPDATE ... operations against the table while running the operation. This seems to be a known limitation.
After this didn't work, I tried using the Percona pt-online-schema-change tool, but unfortunately, because my table has generated columns, that didn't work either with error:
The value specified for generated column 'my_generated_column' in table '_my_table_new' is not allowed.
So, I'm now at a loss. What are my other options for adding a column without blocking DML operations?

Your ALTER statement is creating a non-nullable column with a default of false. I'd suspect this places an exclusive lock on your table, attempts to create the column, and then sets it to false across each row.
If you don't have any available downtime, I'd suggest you:
Add the column as nullable and with no default
ALTER TABLE my_table ADD COLUMN my_coumn BOOL NULL;
Update the values for existing rows to false
update my_table set my_coumn=false;
Alter the table a second time to make the column not nullable and give it a default.
ALTER TABLE my_table modify my_coumn BOOL NOT NULL DEFAULT false;
Alternatively, you could use something like Percona's pt-online-schema-change, which manages schema changes using triggers and is meant to offer the ability to update schemas without locking the table.
Whichever option you choose, I'd suggest testing it in your development environment with some process writing to the table to simulate user activity.
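If the concern is MySQL silently falling back to a blocking table copy, the ALGORITHM and LOCK clauses can be added so the server errors out immediately instead of locking; a minimal sketch for the first step, assuming MySQL 5.6+ online DDL:
-- Request the in-place, non-locking path explicitly; the statement fails fast
-- if the server cannot honor it, rather than silently blocking DML.
ALTER TABLE my_table
ADD COLUMN my_coumn BOOL NULL,
ALGORITHM=INPLACE, LOCK=NONE;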


Table traversing with multiple operations in ALTER TABLE

Some databases, like MySQL [1] and PostgreSQL [2], support bundling several compatible operations into a single ALTER TABLE statement (as non-standard SQL).
For example we can have:
ALTER TABLE `my_table`
DROP COLUMN `column_1`,
DROP COLUMN `column_2`,
...
or
ALTER TABLE `my_table`
MODIFY `column_1` ... ,
MODIFY `column_2` ... ,
instead of having individual statements:
ALTER TABLE `my_table` DROP COLUMN `column_1`;
ALTER TABLE `my_table` DROP COLUMN `column_2`;
or
ALTER TABLE `my_table` MODIFY `column_1` ... ;
ALTER TABLE `my_table` MODIFY `column_2` ... ;
etc.
For comparison, PostgreSQL [2], which implements the same feature, will perform all operations in a single scan:
The main reason for providing the option to specify multiple changes in a single ALTER TABLE is that multiple table scans or rewrites can thereby be combined into a single pass over the table.
Although for DROP COLUMN specifically it will often not even need to do that:
The DROP COLUMN form does not physically remove the column, but simply makes it invisible to SQL operations...
Questions:
Would the multi-column statement result in traversing all the rows just once and performing all changes needed?
How does MySQL actually perform DROP COLUMN? Does it also "hide" the columns first, or does it delete the data straight away?
Assumptions:
Using InnoDB
No indexes/complex defaults are involved in any of the columns we want to change/drop (so basically changes that would not require a temporary table when run as individual alter statements)
References:
[1] MySQL ALTER TABLE docs
[2] PostgreSQL ALTER TABLE docs
MySQL's InnoDB:
(This does not really answer the Questions, but provides a little more insight into the bigger question of ALTER.)
If any of the alters needs to copy the table over, you are probably better off putting all alters into the same statement. Changing the PRIMARY KEY, for example, requires rebuilding the data that is clustered with the PK.
Some alters can be achieved by simply altering the table metadata; these are virtually instantaneous and could be done via separate ALTER statements. Adding a value to an ENUM, for example, has been handled this way for a long time.
Some alters need some form of scan, but can do it "in the background". DROP INDEX can be done by quickly "hiding" it, then freeing up the BTree in the background.
I have left out a grey area in which you batch 'simple' alters. One would hope that ALTER is smart enough to simply go through them quickly, rather than deciding to copy the table over.
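Rather than hoping, the ALGORITHM and LOCK clauses can be appended so that MySQL refuses the statement outright if any operation in the batch would force a blocking table copy; a minimal sketch using the column names from the example above:
-- MySQL errors immediately if this batch cannot be done in place without blocking DML.
ALTER TABLE `my_table`
DROP COLUMN `column_1`,
DROP COLUMN `column_2`,
ALGORITHM=INPLACE, LOCK=NONE;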
I got some useful feedback but decided to respond to my own question to provide a more concrete set of answers.
Would the multi-column statement result in traversing all the rows just once and performing all changes needed?
Yes, if the alter statement results in rebuilding the table then it only needs to do it once.*
* This answer comes from my own testing and other mostly anecdotal evidence (including @Uueerdo's in this post). It would be useful to have some official docs for this...
How does MySQL actually perform DROP COLUMN? Does it also "hide" the columns first, or does it delete the data straight away?
MySQL will rebuild the table in place (rather than create a copy or just change metadata) for most column operations. Each specific case can be found in the Online DDL docs for InnoDB.
A few operations like renaming a column or setting a default value will just alter metadata, so they don't require a table rebuild.
However, dropping a column DOES require a full table rebuild.

re-inserting a table record and updating an auto increment primary index

I'm running MariaDB 5.5.56.
I'm looking to copy an entire row in a table, change one column, then insert the entire row back into the original table (I don't want to have to specify the individual fields because there are a lot of them). The problem I'm running into is how to deal with an auto-increment/primary key column.
example:
create temporary table t_ownership like ownership;
insert into t_ownership (select * from ownership where name='x' LIMIT 1);
update t_ownership set id='something else';
insert into ownership (select * from t_ownership);
I have a column "recno" that is an auto-increment that will create a collision in the database when I try to re-insert the slightly changed record back into the original table.
Something like this seems to work but doesn't result in an insert:
insert into ownership (select * from t_ownership) ON DUPLICATE KEY UPDATE recno=LAST_INSERT_ID(ownership.recno);
The above statement executes without error but does not add a row to table ownership.
So I think I'm close but not quite there...
What would be the best way to do this? I'd like to avoid doing an insert where I manually specify field/values. I just need to regenerate a new A.I. recno column on the insert.
NULL values inserted into auto-increment fields just end up getting the next auto-increment value, behaving equivalently to an INSERT that does not specify the field; so you should be able to update the source (temp copy) to have NULL for that field.
However, one potential issue in scenarios like yours is that CREATE TEMPORARY TABLE ... LIKE can produce a table that does not allow you to set such fields to NULL; this would require you either to ALTER the temporary table or to create it in a more explicit manner. Either way, it makes code/queries that do not specify columns even more reliant on knowing the columns.
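A hedged sketch of that approach, reusing the statements from the question; it assumes recno is an INT and is the table's auto-increment primary key:
create temporary table t_ownership like ownership;
-- The LIKE copy keeps recno's auto_increment / primary key definition,
-- so relax it on the copy before trying to NULL it out.
alter table t_ownership drop primary key, modify recno int null;
insert into t_ownership (select * from ownership where name='x' LIMIT 1);
-- NULL in recno makes ownership assign the next auto-increment value on insert.
update t_ownership set recno=NULL, id='something else';
insert into ownership (select * from t_ownership);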
Personally, I would take this route in the first place.
INSERT INTO theTable([list all but the auto-inc column])
SELECT [list all but the auto-inc column, with any replacements or modifications desired]
FROM ...[original query]...
It accomplishes the task in one query and makes the queries more self-documenting, at the cost of only a little typing (most of which a decent database browser, or query builder, will do for you).
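For concreteness, a hedged version of that pattern for the ownership example; the non-key columns (name, owner, acquired_on) are purely illustrative:
-- Copy the chosen row, swap in the changed value, and let recno auto-increment
-- simply because it is left out of the column list.
INSERT INTO ownership (name, owner, acquired_on)
SELECT name, 'new owner', acquired_on
FROM ownership
WHERE name = 'x'
LIMIT 1;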
The only real argument in favor of your current approach is that the table involved can be changed without necessarily breaking your queries; but that raises the question of whether it would be better for such table changes to break the queries, forcing them to be re-examined. If breakage is not an issue, the fix is a minor revision; the alternative is queries that remain valid but have the potential to cause unexpected behavior by copying information they were never intended to.

Detecting database change

I have a database intensive application that needs to run every couple hours. Is there a way to detect whether a given table has changed since the last time this application ran?
The most efficient way to detect changes is this.
CHECKSUM TABLE tableName
A couple of questions:
Which OS are you working on?
Which storage engine are you using?
The command SHOW TABLE STATUS (http://dev.mysql.com/doc/refman/5.5/en/show-table-status.html) can display some info, depending on the storage engine, though.
It also depends on how large the interval between runs of your intensive operation is.
The most precise way, I believe, is with the use of triggers (AFTER INSERT/UPDATE) as @Neuticle mentioned, and just store the CURRENT_TIMESTAMP next to the table name.
CREATE TABLE table_versions(
table_name VARCHAR(50) NOT NULL PRIMARY KEY,
version TIMESTAMP NOT NULL
);
-- A DELIMITER change is needed when creating the trigger from the mysql client,
-- because the trigger body contains its own semicolon.
DELIMITER //
CREATE TRIGGER table_1_version_insert AFTER INSERT
ON table_1
FOR EACH ROW
BEGIN
REPLACE INTO table_versions VALUES('table_1', CURRENT_TIMESTAMP);
END//
DELIMITER ;
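The prose above mentions AFTER INSERT/UPDATE, but the example only covers inserts; a matching AFTER UPDATE trigger (the trigger name is illustrative) would look the same:
DELIMITER //
CREATE TRIGGER table_1_version_update AFTER UPDATE
ON table_1
FOR EACH ROW
BEGIN
REPLACE INTO table_versions VALUES('table_1', CURRENT_TIMESTAMP);
END//
DELIMITER ;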
Could you set a trigger on the tables you want to track to add to a log table on insert? If that works, you only have to read the log tables on each run.
Use a TIMESTAMP column. Depending upon your needs, you can set it to update on new rows, on changes to existing rows, or both. Go here to see a reference:
http://dev.mysql.com/doc/refman/5.0/en/timestamp-initialization.html
A common way to detect changes to a table between runs is with a query like this:
SELECT COUNT(*), MAX(t) FROM `table`;
But for this to work, a few assumptions must be true about your table:
The t column has a default value of NOW()
There is a trigger that runs on UPDATE and always sets the t column to NOW().
Any normal changes made to the table will then cause the output of the above query to change.
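A hedged sketch of how such a column could be set up; this uses MySQL's automatic TIMESTAMP initialization/update rather than a trigger, which has the same effect for the two assumptions above (the table name my_table is illustrative):
-- Automatically set on INSERT and refreshed on every UPDATE.
ALTER TABLE my_table
ADD COLUMN t TIMESTAMP NOT NULL
DEFAULT CURRENT_TIMESTAMP
ON UPDATE CURRENT_TIMESTAMP;
-- Each run records these two values and compares them with the previous run.
SELECT COUNT(*), MAX(t) FROM my_table;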
There are a few race conditions that can make this sort of check not work in some instances.
I have used CHECKSUM TABLE tablename and that works just splendidly.
I'm calling it from an AJAX request to check for table updates; if changes are found, a screen refresh is performed.
For database "myMVC" and table "detail" it returns one row with fields "Table" and "Checksum" set to "mymvc.detail" and "521719307" respectively.
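One hedged way to wire that comparison up between runs (the table_checksums state table and its columns are illustrative; the checksum value is the one shown above):
-- Small state table remembering the last checksum seen per table.
CREATE TABLE IF NOT EXISTS table_checksums (
table_name VARCHAR(64) NOT NULL PRIMARY KEY,
last_checksum BIGINT UNSIGNED NULL
);
-- Each run: fetch the current checksum...
CHECKSUM TABLE detail;
-- ...compare it in application code with the stored value, run the heavy job only
-- if it differs, and then remember the new value:
REPLACE INTO table_checksums VALUES ('detail', 521719307);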

MySQL DUPLICATE KEY UPDATE fails to update due to a NOT NULL field which is already set

I have a MySQL DB which is using strict mode, so I need to fill in all NOT NULL values when I insert a row. The API I'm creating uses just the INSERT ... ON DUPLICATE KEY UPDATE functionality to do both inserts and updates.
The client application complains if any NOT NULL attributes are missing on insert, which is expected.
Basic example (id is the primary key and there are two fields that are NOT NULL, aaa and xxx):
INSERT INTO tablename (aaa, xxx, id ) VALUES ( "value", "value", 1)
ON DUPLICATE KEY UPDATE aaa=VALUES(aaa), xxx=VALUES(xxx)
All good so far. Once it is inserted, the system would allow doing updates. Nevertheless, I get the following error when updating only one of the fields.
INSERT INTO tablename (aaa, id ) VALUES ( "newValue", 1)
ON DUPLICATE KEY UPDATE aaa=VALUES(aaa)
java.sql.SQLException: Field 'xxx' doesn't have a default value
This exception is a lie, as the row is already inserted and the xxx attribute has "value" as its value. I would expect the insert statement above to be equivalent to:
UPDATE tablename SET aaa="newValue" WHERE id=1
I would be glad if someone can shed some light about this issue.
Edit:
I can use the SQL query successfully in phpMyAdmin to update just one field, so I am afraid that this is not a SQL problem but a driver problem with JDBC. That may not have a solution then.
@Marc B: Your insight is probably true and would indicate what I just described. That would mean there is a bug in JDBC, as it should not do that check when the insert is of the ON DUPLICATE type, since there may be a default value for the row after all. I can't provide real table data, but I believe everything explained above is quite clear.
@ruakh: It does not fail to insert, nor am I expecting delayed validation. One requirement I have is to have both inserts and updates done using the same query, as the servlet does not know whether the row exists or not. The Java API service only fails to update a row whose NOT NULL fields were already filled when the insert was done. The exception is a lie because the field DOES have a value, as it was inserted before the update.
This is a typical case of a DRY / SRP failure; in an attempt to avoid duplicating code, you've created a function that violates the single responsibility principle.
The semantics of an INSERT statement are that you expect no conflicting rows; the ON DUPLICATE KEY UPDATE option is merely there to avoid having to handle the conflict in your own code with a separate query. This is quite different from an UPDATE statement, where you would expect at least one matching row to be present.
Imagine that MySQL only checked the columns when an INSERT doesn't conflict: if for some reason a row was just removed from the database, code that expects to perform an update suddenly has to deal with an exception it doesn't expect. Given the difference in statement behaviour, it's good practice to separate your insert and update logic.
Theory aside, MySQL puts together an execution plan when a query is run; in the case of an INSERT statement it has to assume that the insert might succeed, because that's the optimal strategy. It prevents having to check indices etc. only to find out later that a column is missing.
This is per design and not a bug in JDBC.
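As a hedged illustration of that separation, using the table and columns from the question, the application would first attempt the update and, only if no row was affected, fall back to an insert that supplies every NOT NULL column:
-- Update path: the row already exists, so only the changed column is touched.
UPDATE tablename SET aaa = 'newValue' WHERE id = 1;
-- Insert path (taken only when the update matched no row): every NOT NULL column
-- must be supplied, which is exactly what the original error complained about.
INSERT INTO tablename (aaa, xxx, id) VALUES ('newValue', 'value', 1);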

How to alter MySQL table without losing data?

In my application, I make some changes and upload them to a testing server. Because I have no access to the server database I run ALTER commands to make changes on it.
Using a method I ran the following command on server:
ALTER TABLE `blahblahtable` ADD COLUMN `newcolumn` INT(12) NOT NULL
After that, I found that all the data in the table had been removed. Now the table is blank.
So I need to alter the table without removing its data. Is there any way to do that?
The cause is quite obvious: you're adding a new column to the table and setting it to NOT NULL.
To make things clearer, I will explain the reaction of the server when you run the command:
You add a new column, so every row of the table has to set a value for that column.
As you don't declare any default value, all the rows set null for this new column.
The server notices that the rows of the table have a null value on a column that doesn't allow nulls. This is illegal.
To solve the conflict, the invalid rows are deleted.
There are some good fixes for this issue:
Set a default value (recommended) for the column you're creating.
Create the column without the NOT NULL constraint, set the appropriate values, and then make the column NOT NULL.
You can also create a temp table, copy all the information from the table you want to alter, and then move the data back into the altered table.
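Hedged sketches of the first two fixes, using the table and column from the question (the default of 0 is just an assumed placeholder value):
-- Fix 1: add the column with a default, so existing rows get a value immediately.
ALTER TABLE `blahblahtable` ADD COLUMN `newcolumn` INT(12) NOT NULL DEFAULT 0;
-- Fix 2: add the column as nullable, backfill it, then tighten the constraint.
ALTER TABLE `blahblahtable` ADD COLUMN `newcolumn` INT(12) NULL;
UPDATE `blahblahtable` SET `newcolumn` = 0;
ALTER TABLE `blahblahtable` MODIFY `newcolumn` INT(12) NOT NULL;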