Changing a MySQL database retrospectively

Is there a method to track changes to a MySQL database? I develop offline and then commit all the changes to my server. For the app itself I use Git and it works nicely.
However, for the database, I'm changing everything manually because the live database contains customer data and I cannot just replace it with the development database.
Is there a way to apply only the structural changes without completely replacing one DB with another?

The term you're looking for is 'database migrations' (and no, it doesn't refer to moving from one RDBMS to another). Migrations are a way to programmatically version-control your database structure. Most languages have some kind of migrations toolkit, often as part of an ORM library/framework.
For PHP, you can look at Doctrine.
For Ruby, it's Rails, of course.
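In its simplest form, a migration is just a pair of versioned SQL scripts, one applying a change and one reverting it, kept in Git next to your app code. A minimal sketch (the file names and column are hypothetical):
-- 002_add_facebook_account.up.sql
ALTER TABLE customer ADD COLUMN facebook_account VARCHAR(255);
-- 002_add_facebook_account.down.sql
ALTER TABLE customer DROP COLUMN facebook_account;
A migration tool records which scripts have already run (typically in a schema_version table) and applies only the missing ones to the target server.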

The key to keeping track of your changes is snapshots, my friend.
Now, it's a wide field. The first thing you have to decide is whether you want to keep track of your database with the data in it. If that's the case, you have several options, ranging from LVM snapshots to copying the binary logs to a simple mysqldump.
Now, if what you want is a smooth transition between your database changes (say you added a column, for example), you have some other options.
The first one is replication. That's a great option, but a little complex. With replication you can alter a slave and, once it's done, promote it to master with some locking, replace the old master, and so on. It's more involved, but it's the best option.
If you cannot afford replication, what you must do is apply the changes to your single-master DB with minimum downtime. A good approach is the following:
Suppose you want to change your customer table to add a facebook_account field. First, you can use an alias table, like this:
The original table (it has data):
CREATE TABLE `customer` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
The new one:
CREATE TABLE `new_customer` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
`facebook_account` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
Or simply:
CREATE TABLE new_customer LIKE customer;
ALTER TABLE new_customer ADD COLUMN facebook_account VARCHAR(255);
Now we're going to copy the data to the new table. We'll need to issue a few other statements first; I'll explain them one at a time.
First, we can't allow other connections to modify the customer table while we're making the switch, so issue a lock (see the MySQL documentation on LOCK TABLES if you want to learn more):
LOCK TABLES customer WRITE, new_customer WRITE;
Now flush the table to write any cached contents to disk:
FLUSH TABLES customer;
Now we can do the insert. First, disable the keys for performance; after the data is inserted, enable them again. (Note that DISABLE KEYS only affects non-unique indexes on MyISAM tables; InnoDB ignores it, so this step is harmless but optional here.)
ALTER TABLE new_customer DISABLE KEYS;
INSERT INTO new_customer (id, name, facebook_account) SELECT customer.id, customer.name, NULL FROM customer;
ALTER TABLE new_customer ENABLE KEYS;
Now we can switch the tables.
ALTER TABLE customer RENAME old_customer;
ALTER TABLE new_customer RENAME customer;
Finally we have to release the lock.
UNLOCK TABLES;
That's it. If you want to keep a record of your modified tables, you may want to rename the old_customer table to something else or move it to another database.
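For example, a hedged one-liner (the archive database name is hypothetical):
RENAME TABLE old_customer TO archive_db.old_customer;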
The only issue I didn't cover here is triggers. You have to pay attention to any enabled triggers, but that will depend on your schema.
That's it, hope it helps.

Is MySQL trigger the way to go?

I'm creating a MySQL database in which to store files of different formats and sizes, like PDFs, images, zips, and whatnot.
So I started looking for examples on the blob data type (which I think is the right data type for storing the files mentioned above) and I stumbled upon this SO question. Essentially, the answer suggests not storing the blobs directly in the "main" table, but creating two different tables: one for the file descriptions and the other for the blobs themselves (as these can be heavy to fetch), connected by a foreign key constraint that ties each file to its description, with a join to retrieve the wanted blob when needed.
So I've created the following tables:
create table if not exists file_description(
id int auto_increment primary key,
description_ varchar(200) not null,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) engine=INNODB;
create table if not exists files(
id int auto_increment primary key,
content longblob not null,
format_extension varchar(10) not null,
foreign key (id) references file_description(id)
on update cascade
on delete cascade
)engine=INNODB;
But how can I enforce that each insertion into the file_description table is directly followed by an insertion into the files table?
I'm no expert, but from what I've seen, triggers are used differently from what I'd like to do here. Something like
create trigger whatever
after insert on file_description
...
I don't know, how do I do that?
You cannot enforce through database tools that an insertion into a parent table is followed by an insertion into a child table, as the data to be inserted comes from outside the database. You need to design your application so that it populates both tables right after each other.
What the application can do is to encapsulate the two insert statements into a single transaction ensuring that either both inserts succeed or both are rolled back leaving your database in a consistent state.
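In SQL terms, a hedged sketch against the tables above (the description text and blob literal are placeholders):
START TRANSACTION;
INSERT INTO file_description (description_) VALUES ('quarterly report, PDF');
-- reuse the id generated by the first insert for the child row
INSERT INTO files (id, content, format_extension)
VALUES (LAST_INSERT_ID(), x'255044462D312E34', 'pdf');
COMMIT;
If anything fails between the two inserts, issue ROLLBACK instead of COMMIT and neither row is kept.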

MySQL Add Column with Online DDL

I'm currently trying to add a column to a table of ~25m rows. I need to have near-0 down time, so was hoping to use online DDL. It runs for a while, but eventually runs into the issue:
"Duplicate entry '1234' for key 'PRIMARY'"
[SQL: u'ALTER TABLE my_table ADD COLUMN my_coumn BOOL NOT NULL DEFAULT false']
I think this is happening because I'm running INSERT ... ON DUPLICATE KEY UPDATE ... operations against the table while running the operation. This seems to be a known limitation.
After this didn't work, I tried using the Percona pt-online-schema-change tool, but because my table has generated columns, that didn't work either; it failed with the error:
The value specified for generated column 'my_generated_column' in table '_my_table_new' is not allowed.
So, I'm now at a loss. What are my other options for adding a column without blocking DML operations?
Your ALTER statement is creating a non-nullable column with a default of false. I'd suspect this places an exclusive lock on your table, creates the column, then sets it to false for each row.
If you don't have any available downtime, I'd suggest you
Add the column as nullable and with no default
ALTER TABLE my_table ADD COLUMN my_coumn BOOL NULL;
Update the values for existing rows to false (on a table this size, consider doing this in batches; see the sketch after these steps)
update my_table set my_coumn=false;
Alter the table a second time to be not nullable and with a default.
ALTER TABLE my_table modify my_coumn BOOL NOT NULL DEFAULT false;
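A hedged sketch of the batched backfill mentioned in step 2, assuming the column was added as nullable first (the batch size is arbitrary; repeat until the statement reports zero rows affected):
UPDATE my_table SET my_coumn = false WHERE my_coumn IS NULL LIMIT 10000;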
Alternatively, you could use a tool like Percona's pt-online-schema-change, which manages schema changes using triggers and is meant to offer the ability to update schemas without locking the table.
Whichever option you choose, I'd suggest testing it in your development environment with some process writing to the table to simulate user activity.

How to solve a real time dwh delete process?

I am trying to create a near real time DWH. My first attempt is to load a table from my DWH into my application every 15 minutes.
I would like to avoid all the possible problems that a near real time DWH can face. One of those problems is querying an empty table that supplies the values for a multiselect HTML tag.
To solve this I have thought of the following solution, but I do not know if there is a standard way to solve this kind of problem.
I create a table like this to save the possible values of the multiselect:
CREATE TABLE providers (
provider_id INT AUTO_INCREMENT PRIMARY KEY,
provider_name VARCHAR(20) NOT NULL,
delete_flag INT NOT NULL
)
Before the insert I update the table like this:
UPDATE providers SET delete_flag=1
I insert rows with an ETL process like this:
INSERT INTO providers (provider_name, delete_flag) VALUES ('Provider1',0)
From my app I query the table like this:
SELECT DISTINCT provider_name FROM providers
While the app keeps working and selecting all providers without duplicates (the source can delete, add, or update a provider, so I always have to stay up to date with respect to the source) and without showing an error because the table is empty, I can run this statement just after the insert statement:
DELETE FROM providers WHERE delete_flag=1
I think this is a good solution for small tables, or big tables with few changes, but what happens when a table is big? Is there a standard way to solve this kind of problem?
We cannot risk usability for users just because we are updating data.
There are two approaches to publishing a bulk change of a dimension without taking a maintenance window that would interrupt queries.
The first one simply uses a transaction (sketched below), but performs badly for large data volumes.
DELETE the replaced dimension records
INSERT the new or changed dimension records
COMMIT;
Note that you need no logical DELETE flag as the changes are visible only after the COMMIT - so the table is never empty.
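A hedged sketch against the providers table from the question (staging_providers is a hypothetical table holding the freshly loaded state; delete_flag is populated only to satisfy the existing NOT NULL column):
START TRANSACTION;
DELETE FROM providers;
INSERT INTO providers (provider_id, provider_name, delete_flag)
SELECT provider_id, provider_name, 0 FROM staging_providers;
COMMIT;
With InnoDB's default REPEATABLE READ isolation, concurrent readers keep seeing the old rows until the COMMIT.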
As mentioned, this approach is not suitable if you have a large dimension with a lot of changes. In such a case you may use the EXCHANGE PARTITION feature available as of MySQL 5.6.
You define a temporary table with the same structure as your dimension table, partitioned with only one partition containing all data.
CREATE TABLE dim_tmp (
id INT NOT NULL,
col1 VARCHAR(30),
col2 VARCHAR(30)
)
PARTITION BY RANGE (id) (
PARTITION pp VALUES LESS THAN (MAXVALUE)
);
Populate the table with the complete new dimension contents. A hedged sketch, assuming a hypothetical staging_dim table holding the new state:
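INSERT INTO dim_tmp (id, col1, col2)
SELECT id, col1, col2 FROM staging_dim;
Then switch this temporary table with your dimension table: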
ALTER TABLE dim_tmp EXCHANGE PARTITION pp WITH TABLE dim;
After this statement the data from the temporary table will be stored (published) in your dimension table (new definition) and the old state of the dimension will be stored in the temporary table.
Please check the MySQL documentation on EXCHANGE PARTITION for the constraints of this feature.
Disclaimer: I use this feature in Oracle DB and I have no experience with it in MySQL.

mysql, how to create table and automatically track users who add or delete rows/tables

I would like some kind of revision control history for my sql database.
I would like this table to keep updating with a record of who deleted what, and when.
I am connecting to MySQL using Perl.
Approach 1: Create a separate "audit" table and use triggers to populate the info.
Here's a brief guide for MySQL (and Postgres): http://www.go4expert.com/forums/showthread.php?t=7252
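A hedged sketch of approach 1 (the audit_log and customer tables are hypothetical):
CREATE TABLE audit_log (
id INT AUTO_INCREMENT PRIMARY KEY,
tableName VARCHAR(64) NOT NULL,
tupleID INT NOT NULL,
action CHAR(12) NOT NULL,
db_user VARCHAR(77) NOT NULL,
changed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB;
CREATE TRIGGER customer_audit_delete
AFTER DELETE ON customer
FOR EACH ROW
INSERT INTO audit_log (tableName, tupleID, action, db_user)
VALUES ('customer', OLD.id, 'delete', CURRENT_USER());
Note that CURRENT_USER() captures the database account, not your application-level user.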
Approach 2: Populate the audit info from your Perl database access code, ideally as part of the same transaction. There's no significant win over the first approach and many downsides (you don't catch changes made OUTSIDE of your code, for one).
Disclaimer: I faced this situation in the past, but in PHP. The concepts are for PHP but could be applied to Perl with some thought.
I played with the idea of adding AFTER INSERT, AFTER UPDATE, and AFTER DELETE triggers to each table to accomplish the same thing. The problems with this were:
the trigger didn't know the 'admin' user, just the db user (CURRENT_USER)
The biggest issue was that it wasn't feasible to add these triggers to all my tables (I suppose I could have written a script to add them).
Maintainability of the triggers. If you change how things are tracked, you'd have to update all triggers. I suppose having the trigger call a stored procedure would mostly fix that issue.
Either way, for my situation, I found the best course of action was in the application layer (not DB layer):
create a DB abstraction layer if you haven't already (Class that handles all the interaction with the database).
create a function for each action (insert, update, delete).
in each of these functions, after a successful query call, add another query that inserts the relevant information into your tracking table
If done properly, any action you perform to update any table will be tracked. I had to add some overrides for specific tables to not track (what's the point of tracking inserts on the 'track_table' table, for instance). Here's an example table tracking schema:
CREATE TABLE `track_table` (
`id` int(16) unsigned NOT NULL AUTO_INCREMENT,
`userID` smallint(16) unsigned NOT NULL,
`tableName` varchar(255) NOT NULL DEFAULT '',
`tupleID` int(16) unsigned NOT NULL,
`date_insert` datetime NOT NULL,
`action` char(12) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `userID` (`userID`),
KEY `tableID` (`tableName`,`tupleID`,`date_insert`)
) ENGINE=InnoDB;
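With that schema, the statement the abstraction layer fires after each successful write might look like this (the values are hypothetical):
INSERT INTO track_table (userID, tableName, tupleID, date_insert, action)
VALUES (42, 'customer', 1001, NOW(), 'update');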

Adding a time dimension to MySQL cells

Is there a way to keep a timestamped record of every change to every column of every row in a MySQL table? This way I would never lose any data and keep a history of the transitions. Row deletion could be just setting a "deleted" column to true, but would be recoverable.
I was looking at HyperTable, an open source implementation of Google's BigTable, and this feature really whetted my appetite. It would be great if I could have it in MySQL, because my apps don't handle the huge amounts of data that would justify deploying HyperTable. More details about how this works can be seen here.
Is there any configuration, plugin, fork or whatever that would add just this one functionality to MySQL?
I've implemented this in the past in a PHP model similar to what chaos described.
If you're using MySQL 5, you could also accomplish this with triggers on your table's update and delete events (optionally calling a stored procedure):
http://dev.mysql.com/doc/refman/5.0/en/stored-routines.html
I do this in a custom framework. Each table definition also generates a Log table related many-to-one with the main table, and when the framework does any update to a row in the main table, it inserts the current state of the row into the Log table. So I have a full audit trail on the state of the table. (I have time records because all my tables have LoggedAt columns.)
No plugin, I'm afraid, more a method of doing things that needs to be baked into your whole database interaction methodology.
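A hedged sketch of the pattern, assuming a hypothetical customer table with id and name columns:
CREATE TABLE customer_log (
log_id INT AUTO_INCREMENT PRIMARY KEY,
customer_id INT NOT NULL,
name VARCHAR(255),
LoggedAt TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB;
-- fired by the framework immediately after it updates customer row 42:
INSERT INTO customer_log (customer_id, name)
SELECT id, name FROM customer WHERE id = 42;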
Create a table that stores the following info...
CREATE TABLE MyData (
ID INT AUTO_INCREMENT PRIMARY KEY,
DataID INT
);
CREATE TABLE Data (
ID INT AUTO_INCREMENT PRIMARY KEY,
MyID INT,
Name VARCHAR(50),
`Timestamp` TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Now create a stored procedure that does this...
-- inside a stored procedure taking p_MyID and p_Name as parameters
INSERT INTO Data (MyID, Name)
VALUES (p_MyID, p_Name);
UPDATE MyData SET DataID = LAST_INSERT_ID()
WHERE ID = p_MyID;
In general, the MyData table is just a key table. You then point it at the record in the Data table that is most current. Whenever you need to change data, you simply call the stored procedure, which inserts the new data into the Data table and then updates MyData to point to the most recent record. All of the other tables in the system would key off of MyData.ID for foreign-key purposes.
This arrangement sidesteps the need for a second log table (and the need to keep the two in sync when the schema changes), at the cost of an extra join and some overhead when creating new records.
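That extra join looks like this when reading the current state of a record (a hedged sketch):
SELECT d.Name, d.`Timestamp`
FROM MyData m
JOIN Data d ON d.ID = m.DataID
WHERE m.ID = 42;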
Do you need it to remain queryable, or will this just be for recovering from bad edits? If the latter, you could just set up a cron job to back up the actual files where MySQL stores the data and send them to a version control server.