Using Git to track mysql schema - some questions - mysql

Is this recommended?
Can I ask for some git command examples showing how to track versions of a mysql schema?
Should we use a repository other than the one we normally use in our application root?
Should I use something called a hook?
Update:
1) We navigate onto our project root where .git database resides.
2) We create a sub folder called hooks.
3) We put something like this inside a file called db-commit:
#!/bin/sh
mysqldump -u DBUSER -pDBPASSWORD DATABASE --no-data=true> SQLVersionControl/vc.sql
git add SQLVersionControl/vc.sql
exit 0
Now we can:
4) git commit -m
This commit will include a mysql schema dump that has been run just before the commit.
The source of the above is here:
http://edmondscommerce.github.io/git/using-git-to-track-db-schema-changes-with-git-hook.html
If this is an acceptable way of doing it, can I please ask someone with patience to comment line by line and with as much detail as possible, what is happening here:
#!/bin/sh
mysqldump -u DBUSER -pDBPASSWORD DATABASE --no-data=true> SQLVersionControl/vc.sql
git add SQLVersionControl/vc.sql
exit 0
Thanks a lot.

Assuming you have a git repo already, do the following in a shell script or whatever:
#!/bin/bash -e
# -e means exit if any command fails
DBHOST=dbhost.yourdomain.com
DBUSER=dbuser
DBPASS=dbpass # do this in a more secure fashion
DBNAME=dbname
GITREPO=/path/to/git/repo
cd "$GITREPO"
mysqldump -h "$DBHOST" -u "$DBUSER" -p"$DBPASS" -d "$DBNAME" > "$GITREPO/schema.sql" # the -d flag means "no data"
git add schema.sql
git commit -m "$DBNAME schema version $(date)"
git push # assuming you have a remote to push to
Then start this script on a daily basis from a cron job or what have you.
EDIT: By placing a script in $gitdir/hooks/pre-commit (the name is important), the script will be executed before every commit. This way the state of the DB schema is captured for each commit, which makes sense. If you automatically run this sql script every time you commit, you will blow away your database, which does not make sense.
#!/bin/sh
This line specifies that it's a shell script.
mysqldump -u DBUSER -pDBPASSWORD DATABASE --no-data=true> SQLVersionControl/vc.sql
This is the same as in my answer above; taking the DDL only from the database and storing it in a file.
git add SQLVersionControl/vc.sql
This adds the SQL file to every commit made to your repository.
exit 0
This exits the script with success. This is possibly dangerous. If mysqldump or git add fails, you may blow away something you wanted to keep.
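To make the point about exit 0 concrete, here is a minimal sketch of a stricter variant of the hook, assuming the same placeholders (DBUSER, DBPASSWORD, DATABASE) and the same SQLVersionControl/vc.sql path as the script above. The helper name install_schema_hook and the temporary-file step are my own additions for illustration, not part of the linked article.

```shell
#!/bin/sh
# install_schema_hook REPO_DIR  (hypothetical helper name)
# Writes a pre-commit hook into REPO_DIR/.git/hooks and makes it executable.
install_schema_hook() {
    repo=$1
    hook="$repo/.git/hooks/pre-commit"   # Git only runs the hook under this name
    mkdir -p "$repo/.git/hooks" || return 1
    # The quoted 'EOF' keeps the hook body literal (nothing is expanded here).
    cat > "$hook" <<'EOF'
#!/bin/sh
# Stop at the first failing command, so a failed dump aborts the commit.
set -e
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
# Dump to a temporary file first; vc.sql stays untouched if mysqldump fails.
mysqldump -u DBUSER -pDBPASSWORD --no-data DATABASE > "$tmp"
mv "$tmp" SQLVersionControl/vc.sql
git add SQLVersionControl/vc.sql
EOF
    chmod +x "$hook"                     # Git ignores hooks that are not executable
}
```

Calling install_schema_hook /path/to/project once is enough; after that, a failing mysqldump makes the hook exit non-zero and Git refuses the commit instead of recording a truncated schema file.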

If you're just tracking the schema, put all of the CREATE statements into one .sql file, and add the file to git.
$> mkdir myschema && cd myschema
$> git init
$> echo "CREATE TABLE ..." > schema.sql
$> git add schema.sql
$> git commit -m "Initial import"

IMO the best approach is described here: http://viget.com/extend/backup-your-database-in-git. For your convenience I repeat the most important pieces here.
The trick is to use mysqldump --skip-extended-insert, which creates dumps that can be better tracked/diffed by git.
There are also some hints regarding the best repository configuration in order to reduce disk size. Copied from here:
core.compression = 9 : Flag for gzip to specify the compression level for blobs and packs. Level 1 is fast with larger file sizes, level 9 takes more time but results in better compression.
repack.usedeltabaseoffset = true : Defaults to false for compatibility reasons, but is supported with Git >=1.4.4.
pack.windowMemory = 100m : (Re)packing objects may consume lots of memory. To prevent all your resources go down the drain it's useful to put some limits on that. There is also pack.deltaCacheSize.
pack.window = 15 : Defaults to 10. With a higher value, Git tries harder to find similar blobs.
gc.auto = 1000 : Defaults to 6700. As indicated in the article it is recommended to run git gc every once in a while. Personally I run git gc --auto everyday, so only pack things when there's enough garbage. git gc --auto normally only triggers the packing mechanism when there are 6700 loose objects around. This flag lowers this amount.
gc.autopacklimit = 10: Defaults to 50. Every time you run git gc, a new pack is generated of the loose objects. Over time you get too many packs which waste space. It is a good idea to combine all packs once in a while into a single pack, so all objects can be combined and deltified. By default git gc does this when there are 50 packs around. But for this situation a lower number may be better.
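As a sketch, the settings listed above could be applied to one repository with git config. The function name apply_repo_tuning is hypothetical, and the values are simply the ones quoted above, not a recommendation of my own.

```shell
#!/bin/sh
# apply_repo_tuning REPO_DIR  (hypothetical helper name)
# Writes the settings quoted above into the repository's own config file.
apply_repo_tuning() {
    repo=$1
    git -C "$repo" config core.compression 9 &&
    git -C "$repo" config repack.usedeltabaseoffset true &&
    git -C "$repo" config pack.windowMemory 100m &&
    git -C "$repo" config pack.window 15 &&
    git -C "$repo" config gc.auto 1000 &&
    git -C "$repo" config gc.autopacklimit 10
}
```

Without --global, git config only changes the given repository, so other repositories keep their defaults.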
Old versions can be pruned via:
git rebase --onto master~8 master~7
(copied from here)

The following includes a git pre-commit hook to capture mysql database/schema, given user='myuser', password='mypassword', database_name='dbase1'. Properly bubbles errors up to the git system (the exit 0's in other answers could be dangerous and may not handle error scenarios properly). Optionally, can add a database import to a post-checkout hook (when capturing all the data, not just schema), but take care given your database size. Details in bash-script comments below.
pre-commit hook:
#!/bin/bash
# exit upon error
set -e
# another way to set "exit upon error", for readability
set -o errexit
mysqldump -umyuser -pmypassword dbase1 --no-data=true > dbase1.sql
# Uncomment following line to dump all data with schema,
# useful when used in tandem for the post-checkout hook below.
# WARNING: can greatly expand your git repo when employing for
# large databases, so carefully evaluate before employing this method.
# mysqldump -umyuser -pmypassword dbase1 > dbase1.sql
git add dbase1.sql
(optional) post-checkout hook:
#!/bin/bash
# mysqldump (above) is presumably run without '--no-data=true' parameter.
set -e
mysql -umyuser -pmypassword dbase1 < dbase1.sql
Versions of apps, OS I'm running:
root@node1 Dec 12 22:35:14 /var/www# mysql --version
mysql Ver 14.14 Distrib 5.1.54, for debian-linux-gnu (x86_64) using readline 6.2
root@node1 Dec 12 22:35:19 /var/www# git --version
git version 1.7.4.1
root@node1 Dec 12 22:35:22 /var/www# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 11.04
Release: 11.04
Codename: natty
root@node1 Dec 12 22:35:28 /var/www#

While I am not using Git, I have used source control for over 15 years. A best practice to adhere to when deciding where and how to store your source and accompanying resources in source control: if the DB schema is used within the project, then you should be versioning the schema and all other project resources in "that" project. If you develop a set of schemas or programming resources that you reuse in other projects, then you should have a separate repository for those reusable resources. That separate reusable-resources project will be versioned on its own and will track the versions of the actual reusable resources in that repository.
If you use a versioned resource out of the reusable repository in a different project, then you have the following scenario (just an example): Project XYZ version 1.0 is now using DBSchema_ABC version 4.0. In this case you will understand that you have used a specific version of a reusable resource, and since it is versioned you will be able to track its use throughout your project. If you get a bug report on DBSchema_ABC, you will be able to fix the schema and re-version it, as well as understand where else DBSchema_ABC is used and where you may have to make some changes. From there you will also understand which projects contain which versions of which reusable resources. You just have to understand how to track your resources.
Adopting this type of development environment and resource management strategy is key to releasing usable software and managing a break/fix enhancement environment. Even if you're developing for your own edification on your own time, you should be using source control, as you are.
As for Git, I would find a GUI front end or a dev environment integration if I can. Git is pretty big, so I expect it has plenty of front-end support.

As brilliant as it sounds (the idea occurred to me as well), when I tried to implement it, I hit a wall. In theory, with the --skip-extended-insert flag, the initial dump would be big but the diffs between daily dumps should be minimal, so the repository's growth over time should be minimal as well, right? Wrong!
Git stores snapshots, not diffs, which means that on each commit it takes the entire dump file, not just the diff. Moreover, since a dump made with --skip-extended-insert repeats the full INSERT INTO statement for every single row, it is huge compared to a dump made without --skip-extended-insert. This results in an explosion in size, the exact opposite of what one would expect.
In my case, with a ~300MB SQL dump, the repository grew to gigabytes in days. So, what did I do? I first tried the same thing with --skip-extended-insert removed, so that the dumps would be smaller and the snapshots proportionally smaller as well. This approach held for a while, but in time it became unusable as well.
Still, diffing dumps made with --skip-extended-insert seemed like a good idea, so now I am trying Subversion instead of Git. I know that compared to Git, SVN is ancient history, yet it seems to work better here, since it actually stores diffs instead of snapshots.
So in short, I believe the best solution is doing the above, but with Subversion instead of Git.

(shameless plug)
The dbvc commandline tool allows you to manage your database schema updates in your repository.
It creates and uses a table _dbvc in the database which holds a list of the updates that have been run. You can easily run the updates that haven't been applied to your database schema yet.
The tool uses git to determine the correct order of executing the updates.
DBVC usage
Show a list of commands
dbvc help
Show help on a specific command
dbvc help init
Initialise DBVC for an existing database.
dbvc init
Create a database dump. This is used to create the DB on a new environment.
mysqldump foobar > dev/schema.php
Create the DB using the schema.
dbvc create
Add an update file. These are used to update the DB on other environments.
echo 'ALTER TABLE `foo` ADD COLUMN `status` BOOL DEFAULT 1;' > dev/updates/add-status-to-foo.sql
Mark an update as already run.
dbvc mark add-status-to-foo
Show a list of updates that need to be run.
dbvc status
Show all updates with their status.
dbvc status --all
Update the database.
dbvc update
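The bookkeeping idea behind this (compare the update files on disk with a list of updates already run) can be sketched in a few lines of shell. This is only an illustration of the concept, not how dbvc is implemented: the function name pending_updates is made up, the applied list is a plain text file rather than the _dbvc table, and files are visited in filename order rather than the git-derived order the tool uses.

```shell
#!/bin/sh
# pending_updates UPDATES_DIR APPLIED_LIST  (hypothetical helper, not part of dbvc)
# Prints the name (without .sql) of every update file in UPDATES_DIR that is
# not listed in APPLIED_LIST, a text file with one applied update per line.
pending_updates() {
    for f in "$1"/*.sql; do
        [ -e "$f" ] || continue              # directory contains no .sql files
        name=$(basename "$f" .sql)
        # -x: match the whole line, -F: treat the name as a fixed string
        grep -qxF -- "$name" "$2" || printf '%s\n' "$name"
    done
}
```

For example, pending_updates dev/updates applied.txt would print add-status-to-foo until that name is appended to applied.txt.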

I have found the following options to be mandatory for a version control / git-compatible mysqldump.
mysqldump --skip-opt --skip-comments |sed -e 's/DEFINER[ ]*=[ ]*[^*]*\*/\*/'
(and maybe --no-data)
--skip-opt is very useful, it takes away all of --add-drop-table --add-locks --create-options --disable-keys --extended-insert --lock-tables --quick --set-charset. The DEFINER sed is necessary when the database contains triggers.
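To show what the DEFINER filter does, here is the same sed expression wrapped in a small function (the name strip_definer is made up for this example) so it can be tried on a single line without a database. The sample line in the usage note is an assumed illustration of how mysqldump writes a definer clause, not output captured from a real dump.

```shell
#!/bin/sh
# strip_definer  (hypothetical helper name)
# Reads dump text on stdin and removes the first DEFINER=... clause on each
# line, using the sed expression from the command above.
strip_definer() {
    sed -e 's/DEFINER[ ]*=[ ]*[^*]*\*/\*/'
}
```

Feeding it a line such as /*!50017 DEFINER=`root`@`localhost`*/ leaves /*!50017 */, so the dump no longer changes just because a different account created the trigger.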

Related

typo3: mysql database not useable

I am currently trying to migrate a TYPO3-based web server to a new machine. (It's my first migration, so please don't judge if I did something wrong.)
What I did so far:
transfer the files to the new machine via wget
create a DB dump with mysqldump
transfer the dump with wget
create the database with mysql source <dumpfile.sql>
create a user with access to the DB
When I try to connect to the server, TYPO3 doesn't respond.
And when I try to install TYPO3 from scratch and replace the new database with the old one, I also run into internal server errors.
Is there a solution on how to migrate the database correctly?
Yours Sincerely,
Sebastian
Mh,
this should not be an issue in general.
We often use following steps:
[SRC] Back up the database: MYSQL_PWD="DBPASS" mysqldump -uDBUSER --opt -e -Q --skip-comments --single-transaction=true DBNAME | gzip >dump.sql.gz
[SRC] Pack the installation and the used core: tar -czf transfer.tar.gz ./typo-webfolder ./typo3_src-VERSION
Transfer both .gz files to new server (wget, scp, ftp etc )
[NEW] Extract files: tar -xzf transfer.tar.gz
[NEW] Create an empty database, using your favorite tool
[NEW] Import database: gunzip <dump.sql.gz | MYSQL_PWD="DBPASS" mysql -uDBUSER [-hDBHOST] NEWDBNAME
[NEW] Adjust the database credentials in typo3conf/LocalConfiguration.php
[NEW] Recheck symlinks (typo3_src, typo3, index.php)
[NEW] Recheck .htaccess files - maybe missed to pack and transfer ?
[NEW] Create FlagFile touch typo-webfolder/typo3conf/ENABLE_INSTALL_TOOL
[NEW] Open install tool in Webbrowser ( http://newdomain.tld/typo3/install ), checking requirements, maybe fixing folderstructure and so on, clearing all caches
Eventually clear the typo3temp folder (can be repopulated by the system)
In our projects, we set the DB credentials through AdditionalConfiguration.php, based on environment variables (read from a .env file).
So in general there should not be any issues, but without more information it is hard to help you further.
Some things:
Proxy/TrustedProxy settings
DomainRecord Settings in the Database ( sys_domain )
RealUrl Config With DomainName based settings
.htaccess Canonical rewrite rules based on domain/hostname
Missing needed php modules etc., wrong php version, checking php error log
In general your workflow is usable. (Don't forget the filesystem: fileadmin/ and typo3conf/ext/.)
But there are some traps.
Be sure to delete the corresponding caches after any change in the filesystem or database.
If you transfer the database, make sure you use UTF-8 encoding for everything!
Regarding the filesystem: there may be thumbnails or other resized images (folder _processed_/), and there are also entries in the database for each file and each resizing.
All extensions and configuration are cached in typo3temp/Code/*; also keep the autoloader files in mind.
In most cases you can do a clean-up in the install tool.
So the first thing should be:
Start the install tool, do all checks and remove all temporary information.

RTC - extract specific file from repository

I have a Perl script that I wrote to package release scripts.
The RTC bits in the script are as follows.
List the workspaces:
lscm list workspaces -r "$reposURI" -u $reposUser -P $reposPwd
List the components:
lscm compare ws "$ws1" ws "$ws2" -r "$reposURI" -u $reposUser -P $reposPwd -I c
Compare the 2 workspaces' specified component to the changed files:
lscm compare ws "$ws1" ws "$ws2" -r "$reposURI" -u $reposUser -P $reposPwd -I cf
Great! I have the list of files changed (trust me, this took a LOT of working out). Now, the next step is simply to extract the listed files from the changed workspace:
According to the documentation there is a "Lscm extract", but it seems not on the version I have. I cannot upgrade as this is a corporate environment where software installs are controlled centrally, and they are sticking with the current RTC version (3).
So, is there an alternative way?
I don't know of an lscm extract: it doesn't seem to exist in the RTC documentation.
The help page only mentions an lscm changeset extract (used in RTC 3.x).
lscm extract is only referenced once, in the article "Using the Jazz SCM command line to support software configuration audit", and I would say it is an error.
You can load only the file you care about: scm load <workspace> <path-in-workspace>. That will get the version onto the disk, but it will pollute your disk with RTC metadata (ie, the .jazz5 dir in the root of your sandbox). I suggest running in a temporary directory and then deleting that directory once you have the file content that you want.
That's kind of kludgy. Ideally you'd be able to move onto a modern version of RTC and use the 'extract' subcommand that you mention.

How to implement what vaguely is called "database versioning"?

I am writing a web application using the Yii framework and MySQL.
Now the boss wants "to store all changes in the database", in order to be able to restore older data if someone destroys some important information in the current version of the data.
What it means to store all changes in the database is vague. I am not sure what exactly we should do.
How can I fulfill this vague requirement from the boss?
Can we do it with MySQL logs? What are the pros and cons of using MySQL logs for this? Is it true that we need a programmer (me) to restore some (possibly not all) data from MySQL logs? Can (partial) MySQL data restoration be made simple?
Or should I do the hard work of manually (not with MySQL logs) storing all old data in specific MySQL tables?
I guess what you are describing is an audit trail, which will be handy to go back and look at the history, but as for restoring, that will need to be manual.
Have a look at techniques for creating an audit trail.
You might want to try searching the extensions library for something like eactsasversioned that will archive edits made to records. I'm not sure if it saves deleted records, but it seems like it's close to what you want.
If you are looking for something you can easily restore from you probably need a backup script run on a very regular basis. I use a bash script(shown below) in cron to backup the databases I am worried about hourly. My databases are fairly small so this only takes a few seconds and could be increased to run every 15 minutes if you are super paranoid.
#!/bin/bash
dbName1="li_appointments"
dbName2="lidb_users"
dbName3="orangehrm_li"
fileName1=$dbName1"_`date +%Y.%m.%d-%H:%M:%S`.sql"
fileName2=$dbName2"_`date +%Y.%m.%d-%H:%M:%S`.sql"
fileName3=$dbName3"_`date +%Y.%m.%d-%H:%M:%S`.sql"
backupDir="/home/backups/mysql"
mysqldump -u backup_user --password='********************************' $dbName1 > $backupDir/$fileName1
mysqldump -u backup_user --password='********************************' $dbName2 > $backupDir/$fileName2
mysqldump -u backup_user --password='********************************' $dbName3 > $backupDir/$fileName3
bzip2 $backupDir/$fileName1
bzip2 $backupDir/$fileName2
bzip2 $backupDir/$fileName3
gpg -c --passphrase '********************************' $backupDir/$fileName1".bz2"
gpg -c --passphrase '********************************' $backupDir/$fileName2".bz2"
gpg -c --passphrase '********************************' $backupDir/$fileName3".bz2"
rm $backupDir/*.bz2
echo "Backups completed on `date +%D`" >> $backupDir/backuplog.log

How to solve jenkins 'Disk space is too low' issue?

I have deployed Jenkins on my CentOS machine. Jenkins was working well for 3 days, but yesterday there was a "Disk space is too low. Only 1.019GB left." problem.
How can I solve this problem? It took my master offline for hours.
You can easily change the threshold from the Jenkins UI (my version is 1.651.3).
Update: How to ensure high disk space
This feature is meant to prevent working on slaves with low free disk space. Lowering the threshold would not solve the fact that some jobs do not properly cleanup after they finish.
Depending on what you're building:
Make sure you understand what is the disk output of your build - if possible - restrict the output to happen only to the job workspace. Use workspace cleanup plugin to cleanup the workspace as post build step.
If the process must write some data to external folders - clean them up manually on post build steps.
Alternative1 - provision a new slave per job (use spot slaves - there are many plugins that integrate with different cloud provider to provision on the fly machines on demand)
Alternative2 - run the build inside a container. Everything will be discarded once the build is finished
Besides the above solutions, there is a more "COMMON" way: directly delete the largest space consumers on the Linux machine. You can follow the steps below:
Log in to the Jenkins machine (e.g. with PuTTY)
cd to the Jenkins installation path
Use ls -lart to list hidden folders as well; normally the Jenkins installation is placed in the .jenkins/ folder
[xxxxx ~]$ ls -lart
drwxrwxr-x 12 xxxx 4096 Feb 8 02:08 .jenkins/
List the space used by the folders
Use df -h to show disk space at a high level
du -sh ./*/ lists the total disk usage of each subfolder in the current path.
du -a /etc/ | sort -n -r | head -n 10 lists the 10 largest files and directories under /etc/
Delete old builds or other large folders
Normally the ./job/ folder or the ./workspace/ folder is the largest. Go inside and delete based on your needs (DO NOT delete the entire folder).
rm -rf theFolderToDelete
You can limit disk usage by discarding old builds. There's a checkbox for this in the project configuration.
This is actually a legitimate question so I don't understand the downvotes, perhaps it belongs on Superuser or Serverfault. This is a soft warning threshold not hard limit where the disk is out of space.
For hudson see where to configure hudson node disk temp space thresholds - this is talking about the host, not nodes
Jenkins is the same. The conclusion is for many small projects the system property called hudson.diagnosis.HudsonHomeDiskUsageChecker.freeSpaceThreshold could be decreased.
In saying that I haven't tested it and there is a disclaimer
No compatibility guarantee
In general, these switches are often experimental in nature, and subject to change without notice. If you find some of those useful, please file a ticket to promote it to the official feature.
I got the same issue. My Jenkins version is 2.3 and its UI is slightly different. Putting it here in case it helps someone. Increasing both disk space thresholds to 5GB fixed the issue.
I have a cleanup job with the following build steps. You can schedule it @daily or @weekly.
Execute system groovy script build step to clean up old jobs:
import jenkins.model.Jenkins
import hudson.model.Job
BUILDS_TO_KEEP = 5
for (job in Jenkins.instance.items) {
println job.name
def recent = job.builds.limit(BUILDS_TO_KEEP)
for (build in job.builds) {
if (!recent.contains(build)) {
println "Preparing to delete: " + build
build.delete()
}
}
}
You'd need to have Groovy plugin installed.
Execute shell build step to clean cache directories
rm -r ~/.gradle/
rm -r ~/.m2/
echo "Disk space"
du -h -s /
To check the free space as Jenkins Job:
Parameters
FREE_SPACE: Needed free space in GB.
Job
#!/usr/bin/env bash
free_space="$(df -Ph . | awk 'NR==2 {print $4}')"
if [[ "${free_space}" = *G* ]]; then
free_space_gb=${free_space/[^0-9]*/}
if [[ ${free_space_gb} -lt ${FREE_SPACE} ]]; then
echo "Warning! Low space: ${free_space}"
exit 2
fi
else
echo "Warning! Unknown: ${free_space}"
exit 1
fi
echo "Free space: ${free_space}"
Plugins
Set build description
Post-Build Actions
Regular expression: Free space: (.*)
Description: Free space: \1
Regular expression for failed builds: Warning! (.*)
Description for failed builds: \1
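The job above only understands values that df -h prints with a G suffix. A possible variant, sketched here under the assumption that whole gigabytes are precise enough, asks df for kilobytes and does the conversion itself. The function names free_gb_from_df and check_free_space are made up for this example; the message texts are kept so the regular expressions above still match.

```shell
#!/bin/sh
# free_gb_from_df  (hypothetical helper name)
# Reads `df -Pk` output on stdin and prints the available space of the first
# listed filesystem in whole gigabytes (1 GB = 1048576 KB).
free_gb_from_df() {
    awk 'NR==2 { print int($4 / 1048576) }'
}

# check_free_space NEEDED_GB [PATH]  (hypothetical helper name)
# Returns 2 and prints a warning when PATH (default: current directory)
# has fewer than NEEDED_GB gigabytes available.
check_free_space() {
    needed=$1
    have=$(df -Pk "${2:-.}" | free_gb_from_df)
    if [ "$have" -lt "$needed" ]; then
        echo "Warning! Low space: ${have}G"
        return 2
    fi
    echo "Free space: ${have}G"
}
```

In the job, the last line would then be check_free_space "${FREE_SPACE}" so that the build fails with the function's return code when space is low.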
For people who do not know where the configs are, download the tmpcleaner from
https://updates.jenkins-ci.org/download/plugins/tmpcleaner/
You will get an hpi file here. Go to Manage Jenkins-> Manage plugins-> Advanced and then upload the hpi file here and restart jenkins
You can immediately see a difference if you go to Manage Nodes.
Since my Jenkins was installed on a Debian server, I did not understand most of the answers related to this, since I cannot find an /etc/default folder or a jenkins file.
If someone knows where the /tmp folder is or how to configure it for Debian, do let me know in the comments.

MySQL Memory engine + init-file

I'm trying to set up a MySQL database so that the tables are run by the memory engine. I don't really care about losing some of the data that gets populated, but I would like to dump it daily (via mysqldump in a cron job) and have the init-file set to this dump. However, I can't seem to figure out how to make the mysqldump output compatible with how the init-file wants the SQL statements to be formatted.
Am I just missing something completely obvious trying to set up a database this way?
MySQL dumps are exactly that -- dumps of the MySQL database contents as SQL. So, there isn't any way to read this directly as a database file.
What you can do, is modify your init script for MySQL to automatically load the last dump (via the command line) every time MySQL starts.
An even better solution would be to use a ramdisk to hold the entire contents of your database in memory, and then periodically copy this to a safe location as your backup.
Although, if you want to maintain the contents of your databases at all, you're better off just using one of the disk-based storage engines (InnoDB or MyISAM), and just giving your server a lot of RAM to use as a cache.
This solution is almost great, but it causes problems when string values in table data contain semicolons - all of them are replaced with newline char.
Here is how I implemented this:
mysqldump --comments=false --opt dbname Table1 Table2 > /var/lib/mysql/mem_tables_init.tmp1
#Format dump file - each statement into single line; semicolons in table data are preserved
grep -v -- ^-- /var/lib/mysql/mem_tables_init.tmp1 | sed ':a;N;$!ba;s/\n/THISISUNIQUESTRING/g' | sed -e 's/;THISISUNIQUESTRING/;\n/g' | sed -e 's/THISISUNIQUESTRING//g' > /var/lib/mysql/mem_tables_init.tmp2
#Add "USE database_name" instruction
cat /var/lib/mysql/mem_tables_init.tmp2 |sed -e 's/DROP\ TABLE/USE\ `dbname`;\nDROP\ TABLE/' > /var/lib/mysql/mem_tables_init.sql
#Cleanup
rm -f /var/lib/mysql/mem_tables_init.tmp1 /var/lib/mysql/mem_tables_init.tmp2
My understanding is that the --init-file is expecting each SQL statement on a single line and that there are no comments in the file.
You should be able to clear up the comments with:
mysqldump --comments=false
As for each SQL statement on one line, I'm not familiar with a mysqldump option to do that, but what you can do is a line of Perl to remove all of the newlines:
perl -pi -w -e 's/\n//g;' theDumpFilename
I don't know if --init-file will like it or not, but it's worth a shot.
The other thing you could do is launch mysql from a script that also loads in a regular mysqldump file. Not the solution you were looking for, but it might accomplish the effect you're after.
I stumbled onto this, so I'll tell you what I do. First, I have an ip->country db in a memory table. There is no reason to try to "save" it, since it's easily and regularly dropped and recreated, but it may be unpredictable how the PHP will act when it's missing, and it's only scheduled to be updated weekly. Second, I have a bunch of other memory tables. There is no reason to save these, as they are even more volatile, with lifespans in minutes. They will be refreshed very quickly, but stale data is better than none at all. Also, if you are using any separate key caches, they may (in some cases) need to be loaded first or you will be unable to load them. And finally, be sure to put a "use" statement in there if you're not dumping complete databases, as there is no other interface (like the mysql client) to open the database at startup. So:
cat << EOF > /var/lib/mysql/initial_load.tmp
use fieldsave_db;
cache index fieldsave_db.search in search_cache;
EOF
mysqldump --comments=false -udrinkin -pbeer# fieldsave_db ip2c \
>> /var/lib/mysql/initial_load.tmp
mysqldump --comments=false -ufields -pavenue -B memtables \
>> /var/lib/mysql/initial_load.tmp
grep -v -- ^-- /var/lib/mysql/initial_load.tmp |tr -d '\012' \
|sed -e 's/;/;\n/g' > /var/lib/mysql/initial_load.sql
As always, YMMV, but it works for me.