How to selectively export MySQL data for a GitHub repo

We're an open-source project and would like to collaboratively edit our website through a public GitHub repo.
Any ideas on the best way to export the MySQL data to GitHub, given that MySQL can hold some sensitive info, and how we can version the changes that happen to it?

The answer is: you don't hold data in the repo.
You may want to hold your DDL, and maybe some configuration data, but that's it.
If you want to version control your data, there are other options. Git isn't one of them.

It seems dbdeploy is what you are looking for.

Use a blog engine backed by git, forget about MySQL, commit on github.com, push and pull, dominate!
Here is a list of the best:
http://jekyllrb.com/
http://nestacms.com/
http://cloudhead.io/toto
https://github.com/colszowka/serious
And just in case... a simple, Git-powered wiki with a sweet API and local frontend:
https://github.com/github/gollum

Assuming that you have a small quantity of data that you wish to treat this way, you can use mysqldump to dump the tables that you wish to keep in sync, check that dump into git, and push it back into your database on checkout.
Write a shell script that does the equivalent of:
mysqldump [options] database table1 table2 ... tableN > important_data.sql
to create or update the file. Check that file into git and when your data changes in a significant way you can do:
mysql [options] database < important_data.sql
Ideally that last step would be in a git post-receive hook, so you'd never forget to apply your changes.
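A minimal sketch of both pieces (the database, table, and file names are placeholders):
# refresh-dump.sh -- regenerate the versioned dump before committing
mysqldump --skip-extended-insert mydb table1 table2 > important_data.sql
# .git/hooks/post-receive -- reload the dump after a push
mysql mydb < important_data.sql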
So that's how you could do it. I'm not sure you'd want to do it. It seems pretty brittle, especially if Team Member 1 makes some laborious changes to the tables of interest while Team Member 2 is doing the same. One of them is going to check in their changes first, and in the best case you'll have some nasty merge issues. In the worst case, one of them loses all their changes.
You could mitigate those issues by always making your changes in the important_data.sql file, but how easy or hard that is depends on your application. If you do this, you'll want to play around with the mysqldump options so you get a nice, readable, git-mergeable file.
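For example (a sketch using stock mysqldump options), one row per INSERT and a stable row order make the dump diff and merge far better than the default multi-row inserts:
mysqldump --skip-extended-insert --order-by-primary --skip-dump-date mydb table1 table2 > important_data.sql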

You can export each table as a separate SQL file. Then, when a table changes, only that file needs to be committed again.
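A minimal sketch of such a per-table dump (the database name and dump directory are placeholders):
for t in $(mysql -N -B -e 'SHOW TABLES' mydb); do
  mysqldump mydb "$t" > "dumps/$t.sql"
done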

If you were talking about configuration, then I'd recommend SQL dumps or similar to seed the database, as per Ray Baxter's answer.
Since you've mentioned Drupal, I'm guessing the data concerns users/content. As such, you really ought to be looking at having a single database that each developer connects to remotely, i.e. one single version. This is because concurrent modifications to MySQL tables will be extremely difficult to reconcile (e.g. two new users both with user.id = 10, each making a new post with post.id = 1, post.user_id = 10, etc.).
It may make sense, of course, to back this up with an sql dump (potentially held in version control) in case one of your developers accidentally deletes something critical.

If you just want a partial dump, phpMyAdmin will do that. Run your SELECT statement and when it's displayed there will be an export link at the bottom of the page (the one at the top does the whole table).

You can version mysqldump files, which are simply SQL scripts, as stated in the prior answers. Based on your comments, it seems that your primary interest is to give the developers a basis for a local environment.
Here is an excellent ERD for Drupal 6 (Drupal ERD). I don't know what version of Drupal you are using or whether these core tables changed between v6 and v7, but you can check that using a dump, phpMyAdmin, or whatever other tool you have available that lets you inspect the database structure.
Based on the ERD, the data that would be problematic for a Drupal installation is in the users, user_roles, and authmap tables. There is a quick way to omit those, although it's important to keep in mind that content that gets added will have relationships to the users who added it, and Drupal may have problems if there aren't rows in the users table that correspond to what has been added.
So to script the mysqldump, you would simply exclude the problem tables, or at the very least the users table.
mysqldump -u drupaldbuser --password=drupaluserpw --ignore-table=drupaldb.users drupaldb > drupaldb.sql
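To exclude all three problem tables, the flag can be repeated (a sketch; the user, password, and database names are the placeholders from above, and the table names should be checked against your actual schema):
mysqldump -u drupaldbuser --password=drupaluserpw --ignore-table=drupaldb.users --ignore-table=drupaldb.user_roles --ignore-table=drupaldb.authmap drupaldb > drupaldb.sql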
You would need to create a mock users table with a bunch of test users with known name/password combinations, which you would only need to dump and version once. Ideally you want enough of these to match or exceed the number of real Drupal users that will be adding content; this is just to make the permission relationships match up.

Related

Merge design of MySQL between localhost and server?

I'm kind of new to this kind of problem. I'm developing a web app and changing the DB design as I try to improve it and add new tables.
Until a few days ago we hadn't published the app,
so I would just dump all the tables on the server and import my local version. But now we've passed version 1 and users are starting to use it,
so I can't overwrite the server anymore, but I still need to update the server DB's design when I publish a new version. What are the best practices here?
I'd like to know how I can manage the differences between local and server in MySQL.
I need to preserve the data on the server and just change the design; the data in the local DB is only for testing.
Before this, all my other apps were small and I would change a single table or column by hand, but I can't keep track of all the changes now, since I might revert many of them later, and managing all team members this way is impossible.
Assuming you are not using a framework that provides a database migration tool, you need to keep track of the changes manually.
Create a folder sql_upgrades (or whatever name you like) in your code repository.
Whenever a team member updates the SQL schema, they create a file in this folder with the corresponding ALTER statements, and possibly UPDATE, CREATE TABLE, etc. Basically, the file contains all the statements used to update the dev database.
Name the files so that they're easy to manage and so that statements for the same feature are grouped together. I suggest something like YYYYMMDD-description.sql, e.g. 20150825-queries-for-feature-foobar.sql.
When you push to production, execute the files to upgrade your SQL schema in production. Only execute the files that have been created since your last deployment, and execute them in the order they were created, as in the sketch below.
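A minimal sketch of one such file and how it would be applied (all names and credentials are placeholders):
-- sql_upgrades/20150825-queries-for-feature-foobar.sql
ALTER TABLE users ADD COLUMN last_login DATETIME NULL;
CREATE INDEX idx_users_last_login ON users (last_login);
Applied on deployment with:
mysql -u deploy -p production_db < sql_upgrades/20150825-queries-for-feature-foobar.sql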
Should you need to roll back a file, check the queries it contains and write queries to undo what was done (drop added columns, re-create dropped columns, etc.). Note that this is non-trivial, as many changes cannot be rolled back fully (e.g. you can re-create a dropped column, but you will have lost the data it contained).
Many web frameworks (such as Ruby on Rails) have tools that will do exactly this process for you. They usually work together with the ORM provided by the framework, but keeping track of the changes manually in SQL works just as well.

Is there a tool that converts git diff output to SQL INSERT and DELETE statements?

I'm working on a WordPress site by doing development on my laptop and then deploying the changes by pushing with git to the server. This works great for files and I want to do the same thing with content changes to the database.
My first iteration at solving the problem used git hooks to dump the database with mysqldump before commits and then restore the dumps after checkouts. This works, but it drops and recreates the whole database each time. That's not OK, because WordPress is also making changes to the database automatically that I want to keep, like records of which products have been sold, so I don't want the whole thing dropped and restored on every checkout.
I'm thinking a better solution would be to continue dumping the database during commits and then for checkouts use a new tool that reads the output of git diff HEAD^ and converts it to INSERT and DELETE SQL statements fed to mysql. That way the database would be patched incrementally with my changes while preserving changes made by others (such as WordPress). Example:
git diff:
(83,NULL,550,'TI-99/4A','',0,0,0,0,'',0,0,0),
-(85,NULL,2000,'Banana Jr. 6000','',0,0,0,0,'',0,0,0),
+(85,NULL,2000,'Banana Jr. 6000 (now with tint control!)','',0,0,0,0,'',0,0,0),
(88,NULL,150,'Symbolics 3645','',0,0,0,0,'',0,0,0),
converted to SQL:
DELETE FROM `wp_yak_product` WHERE `post_id`='85';
INSERT INTO `wp_yak_product` VALUES (85,NULL,2000,'Banana Jr. 6000 (now with tint control!)','',0,0,0,0,'',0,0,0);
I've searched around and can't find anything like this. I'm considering writing it myself.
Does something like this exist? Is this a good or a bad idea?
To my knowledge that type of tool does not exist. The best option that I know of to produce similar output would be to use the "Synchronize Model" functionality in MySQL Workbench.
That said, I would recommend tracking the changes that you make in your development database in a SQL file, checked into git, which can be executed on your production server.
I thought of a possible approach:
I dump the database into a different file for each table:
wp_commentmeta.sql
wp_comments.sql
wp_links.sql
etc.
Perhaps I could separate the tables into categories of content vs. bookkeeping, like the distinction between the usr and var directories in Unix, and add the bookkeeping tables to my .gitignore so they're not clobbered when I update the content.
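A sketch of what that split might look like in a .gitignore (which tables count as bookkeeping is an assumption, and the file names are illustrative):
# bookkeeping dumps that WordPress updates on its own -- never version these
wp_options.sql
wp_yak_orders.sql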

How to log mysql database structural changes

I'm working on a project which uses MySQL as the database. The application is hosted for many clients, and we often do upgrades of the current live systems.
There are some instances where a client has changed the database structure (adding new tables), and this causes some unexpected DB crashes.
I need to log all the structural changes that were made to that database, so we can find the correct root cause. We can't do it 100% correctly with a diff tool, because it will not show the intermediate changes.
I found the http://www.liquibase.org/ tool, but it seems a little bit complex.
Is there any well-known technique or tool to track database structural changes only?
Well, from mysql studio you can generate every object's schema definition and compare it with your standard schema definition; this way you can compare the two database schemas.
Generate scripts for both databases (one the client's database, the other the master-copy database) and then compare them using a file-compare tool. In my opinion that's the best practice, because this way you can track which column was added, which column was deleted, which index was added, and so on, without downloading any extra tool.
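A minimal sketch of that comparison with stock command-line tools (the database names are placeholders):
mysqldump --no-data --skip-comments master_db > master_schema.sql
mysqldump --no-data --skip-comments client_db > client_schema.sql
diff -u master_schema.sql client_schema.sql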
Possible duplicate of Compare two MySQL databases?
Hope this helps.
If you have an application for your clients to manage these schema changes, you can use a mechanism at the application level. If you have a Python/Django-based solution, you could probably use South, which provides schema change tracking and rollbacks.
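For reference, the basic South workflow looks like this (a sketch, assuming a Django app named myapp):
./manage.py schemamigration myapp --auto   # generate a migration from model changes
./manage.py migrate myapp                  # apply pending migrations
./manage.py migrate myapp 0001             # roll back to an earlier migration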

Are there generic options for version control within a database?

I have a small amount of experience using SVN on my development projects, and I have just as little experience with relational databases. I know the basic concepts like tables, and SQL statements, but I'm far from being an expert.
What I'd like to know is if there are any generic version control type systems like SVN, but that work with a database rather than files. I would like the same kind of features you get with SVN like the ability to create branches, create tags, and merge branches together. Rather than a revision number being associated to a version of a file repository it would be associated with a version of the database.
Are there any generic solutions available that can add this kind of functionality independent of the actual database schema? I'd be interested in solutions that work with MySQL or MS SQL Server.
I should also clarify that I'm trying to version control the data, not the schema. I would expect the schema to remain constant. So really it seems like I want a way to create a log of all the INSERT, UPDATE, and DELETE requests sent to the database between each version of the data. That way any version could be recreated by replaying all the SQL statements that have been saved up to the desired version.
You can script all your DDL, stored procedures and such to regular text files.
Then you can simply use SVN for database versioning.
I've never found a solution that works as well as Subversion, but here are a few things I've done that have helped:
Make scripts that will create the schema and populate any initial data, then make an update script for each change after that. It's a fairly manual process, but it works. There are extra things that help, like storing the current version number in a table in the db and making sure that the scripts are idempotent.
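A minimal sketch of that version-number bookkeeping (the table, database, and script names are assumptions). The schema-creation script would include something like CREATE TABLE schema_version (version INT NOT NULL); and each update script ends by bumping it, so the apply step becomes:
# apply-updates.sh -- run each update only if the database is behind it
current=$(mysql -N -B -e 'SELECT version FROM schema_version' mydb)
if [ "$current" -lt 2 ]; then
  mysql mydb < updates/002-add-last-login.sql   # that script ends with: UPDATE schema_version SET version = 2;
fi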
Store the full development db in Subversion. This doesn't usually work out too well for me if there is a lot of data or it changes frequently, but in some projects it could work.
I keep and maintain create scripts in my version control system.
There are two things I can think of:
http://www.liquibase.org/ - provides a way of generally managing database changes. Creates files that get committed into source control, and it helps manage changes across different development databases, etc.
http://www.viget.com/extend/backup-your-database-in-git/ - this describes a strategy for backing up a database into source control, but the same strategy can be used just on the schema. In this scheme, the database would be in a separate area from your main code. (This can be used with other source control systems too.)

How do you maintain revision control of your database structure?

What is the simplest way of keeping track of changes to a projects database structure?
When I change something about the database (eg, add a new table, add a new field to an existing table, add an index etc), I want that to be propagated to the rest of the team, and ultimately the production server, with the minimal fuss and effort.
At the moment, the solution is pretty weak and relies on people remembering to do things, which is an accident waiting to happen.
Everything else is managed with standard revision control software (Perforce in our case).
We use MySQL, so tools that understand that would be helpful, though I would also be interested to learn how other places are handling this anyway, regardless of database engine.
You can dump the schema and commit it -- and the RCS will take care of the changes between versions.
You can get a tool like SQL Compare from Red Gate, which allows you to point at two databases; it will tell you what is different and will build alter scripts for you.
If you're using .NET (Visual Studio), you can create a Database project and check that into source control.
This has already been discussed a lot, I think. Anyhow, I really like the Rails approach to the issue. It's code that has three things:
The version number
The way of applying the changes (updates a version table)
The way of rolling the changes back (sets the version on the version table to the previous)
So, each time you make a changeset you create a code file that can roll the database schema forward or back when you execute it, as in the sketch below.
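Expressed as plain SQL rather than Rails migration code (a sketch; the table and column names are assumptions):
-- 002_add_last_login.up.sql
ALTER TABLE users ADD COLUMN last_login DATETIME;
UPDATE schema_version SET version = 2;
-- 002_add_last_login.down.sql
ALTER TABLE users DROP COLUMN last_login;
UPDATE schema_version SET version = 1;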
This, being code, you can commit to any revision control system. You commit the first dump and then only the scripts.
The great thing about this approach is that you can easily distribute the database changes to customers, whereas with the standard "just dump the schema and update it" approach, generating an upgrade/rollback script is a nuisance.
In my company, each developer is encouraged to save all DB structure changes to script files in a folder named after the module's revision number. These scripts are kept in the svn repository.
When the application starts, the DB upgrade code compares the current DB version with the code version, and if the code is newer, it looks in the scripts folder and applies all DB changes automatically.
This way every instance of the application (on production or on developers' machines) always upgrades the DB to match its code version, and it works great.
Of course, some automation could be added - if we find a suitable tool.
Poor man's version control:
Separate file for each object (table, view, etc)
When altering tables, you want to diff CREATE TABLE against CREATE TABLE. Source code history is for communicating a story; you can't do a meaningful diff of CREATE TABLE against ALTER TABLE.
Try to make changes to the files, then commit them to source control, then apply them to the SQL database. Most tools support this poorly, because you shouldn't commit to source control until you test, and you can't test without putting the code into SQL. So in practice, you try to use Red Gate SQL Compare to compare your files to the SQL database. Failing that, you adopt a harsh policy of dropping everything in the database and replacing it with what made it into source control.
Change scripts are usually single-use, but applications exist, like WordPress, where you need to move the schema from 1.0 to 1.1, 1.1 to 1.5, etc. Each of those should be under source control and modified as such (i.e. as you find bugs in the script that moves you from 1.0 to 1.1, you create a new version of that script, not yet another script).