I've seen lots of examples of making Docker containers for Rails applications. Typically they have a CMD that runs migrations/setup and then starts the Rails server.
If I'm spawning 5 of these containers at the same time, how does Rails handle multiple processes trying to initiate the migrations? I can see Rails checking the current schema version in the general query log (it's a MySQL database):
SELECT `schema_migrations`.`version` FROM `schema_migrations`
But I can see a race condition here if this happens at the same time on different Rails instances.
Considering that DDL is not transactional in MySQL and I don't see any locks in the general query log while running migrations (other than the per-migration transactions), kicking them off in parallel seems like a bad idea. In fact, if I kick this off three times locally, two of the Rails instances crash trying to create a table that already exists, while the third completes the migrations happily. If this were a migration that inserted something into the database, it would be quite unsafe.
Is it then a better idea to run a single container that runs migrations/setup then spawns (for example) a Unicorn instance which in turn spawns multiple rails workers?
Should I be spawning N rails containers and one 'migration container' that runs the migration then exits?
Is there a better option?
I don't have much experience with Rails specifically, but let's look at this from a Docker and software-engineering point of view.
The Docker team advocates, sometimes quite aggressively, that containers are about shipping applications. In a really great statement, Jerome Petazzoni says that it is all about separation of concerns, and I feel that this is exactly the point you have already figured out.
Running a Rails container that performs migrations or setup might be fine for an initial deployment and is probably often needed during development. However, when going into production, you really should separate the concerns.
Thus I would have one image that you use to run N Rails containers, plus a tools/migration/setup container that you use for administrative tasks. Have a look at what the developers of the official rails image say about this:
It is designed to be used both as a throw away container (mount your source code and start the container to start your app), as well as the base to build other images off of.
When you look at that image, there is no setup or migration command; it is totally up to the user how to use it. So if you need to run several containers, just go ahead.
From my experience with MySQL this works fine. You can run a data-only container to host the data, a container with the MySQL server, and a container for administrative tasks like backup and restore, all from the same image. You are then free to access your database from, say, several WordPress containers. This means a clear separation of concerns. When you use docker-compose it is not that difficult to manage all those containers, and there are already many third-party containers and tools to help you set up a complex application consisting of several containers.
Finally, you should decide whether Docker and a micro-service architecture are right for your problem. As outlined in this article there are some reasons against it, one of the core problems being that it adds a whole new layer of complexity. However, that is the case with many solutions, and I guess you are aware of this and willing to accept it.
docker run <container name> rake db:migrate
This starts your standard application container but runs rake db:migrate instead of the CMD (rails server).
UPDATE: Suggested by Roman, the command would now be:
docker exec <container> rake db:migrate
Having the same problem when publishing to a Docker swarm, I put together a solution partially borrowed from others.
Rails already has a mechanism to detect concurrent migrations by taking a lock on the database, but it raises ConcurrentMigrationError where it should arguably just wait.
One solution is then to have a loop: whenever a ConcurrentMigrationError is raised, wait 5 seconds and then retry the migration.
It is especially important that all containers attempt the migration: if the migration fails, all containers must fail.
Solution from coffejumper
namespace :db do
  namespace :migrate do
    desc 'Run db:migrate and monitor ActiveRecord::ConcurrentMigrationError errors'
    task monitor_concurrent: :environment do
      loop do
        puts 'Invoking Migrations'
        Rake::Task['db:migrate'].reenable
        Rake::Task['db:migrate'].invoke
        puts 'Migrations Successful'
        break
      rescue ActiveRecord::ConcurrentMigrationError
        puts 'Migrations Sleeping 5'
        sleep(5)
      end
    end
  end
end
Sometimes you also have other tasks that you want to execute one by one as part of the migration, like after_party, cron setup, etc. The solution is then to use the same mechanism as Rails and wrap those rake tasks in a database lock:
Below, based on Rails 6 code, migrate_without_lock performs the needed migrations while with_advisory_lock takes the database lock (raising ConcurrentMigrationError if the lock cannot be acquired).
require 'zlib'

module Swarm
  class Migration
    def migrate
      with_advisory_lock { migrate_without_lock }
    end

    private

    def migrate_without_lock
      puts "Database migration"
      Rake::Task['db:migrate'].invoke
      puts "After_party migration"
      Rake::Task['after_party:run'].invoke
      ...
      puts "Migrations successful"
    end

    def with_advisory_lock
      lock_id = generate_migrator_advisory_lock_id
      MyAdvisoryLockBase.establish_connection(ActiveRecord::Base.connection_config) unless MyAdvisoryLockBase.connected?
      connection = MyAdvisoryLockBase.connection
      got_lock = connection.get_advisory_lock(lock_id)
      raise ActiveRecord::ConcurrentMigrationError unless got_lock
      yield
    ensure
      if got_lock && !connection.release_advisory_lock(lock_id)
        raise ActiveRecord::ConcurrentMigrationError.new(
          ActiveRecord::ConcurrentMigrationError::RELEASE_LOCK_FAILED_MESSAGE
        )
      end
    end

    MIGRATOR_SALT = 1942351734
    def generate_migrator_advisory_lock_id
      db_name_hash = Zlib.crc32(ActiveRecord::Base.connection_config[:database])
      MIGRATOR_SALT * db_name_hash
    end
  end

  # based on rails 6.1 AdvisoryLockBase
  class MyAdvisoryLockBase < ActiveRecord::AdvisoryLockBase # :nodoc:
    self.connection_specification_name = "MyAdvisoryLockBase"
  end
end
Then, as before, loop to wait:
namespace :swarm do
  desc 'Run migrations tasks after acquisition of lock on database'
  task migrate: :environment do
    result = 1
    (1..10).each do |i|
      Swarm::Migration.new.migrate
      puts "Attempt #{i} successfully terminated"
      result = 0
      break
    rescue ActiveRecord::ConcurrentMigrationError
      seconds = rand(3..10)
      puts "Attempt #{i} another migration is running => sleeping #{seconds}s"
      sleep(seconds)
    rescue => e
      puts e
      e.backtrace.each { |m| puts m }
      break
    end
    exit(result)
  end
end
Then in your startup script, just launch the rake task:
set -e
bundle exec rails swarm:migrate
exec bundle exec rails server -b "0.0.0.0"
Finally, since your migration tasks are run by all containers, each task must have a mechanism to do nothing when its work has already been done (as db:migrate does).
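For example, a custom task can record what it has already done in a small bookkeeping table and skip the work on later runs; here is a minimal sketch, assuming a hypothetical CompletedTask model and task name (neither is part of the solution above):
namespace :swarm do
  desc 'Example of an idempotent one-off task (hypothetical names)'
  task seed_defaults: :environment do
    # CompletedTask is an assumed bookkeeping model with a unique `name` column.
    next if CompletedTask.exists?(name: 'seed_defaults')

    # ... do the one-off work here ...

    CompletedTask.create!(name: 'seed_defaults') # mark as done so reruns are no-ops
  end
end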
Using this solution, the order in which Swarm launches containers doesn't matter anymore AND if something goes wrong, all containers know the problem :-)
For a single container ID:
docker exec -it <container ID> bundle exec rails db:migrate
For multiple containers we can repeat the process for each one; if they number in the thousands, you need a script to do it.
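For instance, a rough Ruby sketch that shells out to docker exec for every matching container (the "rails" name filter is an assumption; adjust it to however your containers are actually named):
#!/usr/bin/env ruby
# List running containers whose name matches the assumed filter.
container_ids = `docker ps --filter "name=rails" --format "{{.ID}}"`.split

container_ids.each do |id|
  ok = system('docker', 'exec', id, 'bundle', 'exec', 'rails', 'db:migrate')
  warn "migration failed in container #{id}" unless ok
end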
I would like to use test databases for feature branches.
Of course it would be best to create a GitLab CI environment on the fly (review-apps style) and also create a test database on the target system with the same name. Unfortunately, this is not possible because the MySQL databases in the target system have fixed names, like xxx_1, xxx_2, etc., and this cannot be changed without moving to a different hosting provider.
So I would like to do something like "grab an empty test database from the given xxx_n and then empty it again when the branch is deleted".
How could this be handled with gitlab ci?
Can I set a variable on the project that says "feature branch Y already uses database xxx_4"?
Or should I put a table into the test database to store this information?
Using dynamic environments/variables and stop jobs should be able to do the trick. Stop jobs run when the environment is "stopped" -- for feature branches without an associated MR, that is when the feature branch is deleted (or, if there is an open MR for the review app, when the MR is merged or closed).
Can I set a variable on the project that says "feature branch Y already uses database xxx_4"?
One way may be to put the db name directly in the environment name. Then the Environments API keeps track of this.
stages:
  - pre-deploy
  - deploy

determine_database:
  stage: pre-deploy
  image: python:3.9-slim
  script:
    - pip install python-gitlab
    - database_name=$(determine-database) # determine what database names are not currently in use
    - echo "database_name=${database_name}" > vars.env
  artifacts:
    reports: # automatically set $database_name variable in subsequent jobs
      dotenv: "vars.env"

deploy_review_app:
  stage: deploy
  environment:
    name: review/$CI_COMMIT_REF_SLUG/$database_name
    on_stop: teardown
  script:
    - echo "deploying review app for $CI_COMMIT_REF with database name configuration $database_name"
    - ... # steps to actually do the deploy

teardown: # this will trigger when the environment is stopped
  stage: deploy
  variables:
    GIT_STRATEGY: none # ensures this works even if the branch is deleted
  when: manual
  script:
    - echo "tearing down test database $database_name"
    - ... # actual script steps to stop env and cleanup database
  environment:
    name: review/$CI_COMMIT_REF_SLUG/$database_name
    action: "stop"
The implementation of the determine-database command may have to connect to your database to determine what database names are available (or perhaps you have a set of these provisioned in advance). You can then inspect the GitLab environments API to see what database names are still in use (since it's baked into the environment name).
For example, you might have something like this. Here, I am using the python-gitlab API wrapper just because it's most familiar to me, but the same principle can be applied to any method of calling the GitLab REST API.
#!/usr/bin/env python3
import gitlab
import os, sys, random

GITLAB_URL = os.environ['CI_SERVER_URL']
PROJECT_TOKEN = os.environ['MY_PROJECT_TOKEN']  # you generate and add this to your CI/CD variables!
PROJECT_ID = os.environ['CI_PROJECT_ID']
DATABASE_NAMES = ['xxx_1', 'xxx_2', 'xxx_3']  # or determine this programmatically by connecting to the DB

gl = gitlab.Gitlab(GITLAB_URL, private_token=PROJECT_TOKEN)

in_use_databases = []
project = gl.projects.get(PROJECT_ID)
for environment in project.environments.list(state='available', all=True):
    # the in-use database name is the string after the last '/' in the env name
    in_use_db_name = environment.name.split('/')[-1]
    in_use_databases.append(in_use_db_name)

available_databases = [name for name in DATABASE_NAMES if name not in in_use_databases]
if not available_databases:  # bail if all databases are in use
    print('FATAL. no available databases', file=sys.stderr)
    raise SystemExit(1)

# otherwise pick one and output to stdout
db_name = random.choice(available_databases)
# optionally you could prepare the database here, too, instead of relying on the `on_stop` job.
print(db_name)
There is a potential concurrency problem here (two concurrent runs of determine_database on different branches can select the same database before either finishes), but that could be addressed with resource locks (for example, GitLab's resource_group keyword can serialize the determine_database job across pipelines).
We use multiple PHP workers, each running in its own container. To scale the number of parallel worker processes, we manage them in a Docker swarm.
The PHP script runs in a loop, waiting for new jobs (fetched from Gearman).
When a new job arrives, it is processed. After that, the script waits for the next job without exiting.
Now we want to update our workers. In this case the image stays the same, but the PHP script has changed.
So we have to stop the PHP script, update the script file, and start it again.
If I use the docker service update command below, Docker stops the container immediately. In the worst case, a running worker is cancelled in the middle of its work.
docker service update --force PHP-worker
Is there any way to restart the Docker container gracefully ("soft")?
Soft means giving the container a sign, "I have to restart, please wind down," so that the container has a chance to finish its current work.
In my case, before running the next job in the loop, I would check this cancel flag. If it is set, I would end the loop and exit the PHP script.
Environment:
Debian: 10
Docker: 19.03.12
PHP: 7.4
In the meantime, we have solved it with signals.
Working with signals in PHP is very easy. In our case, this structure helped us:
//Terminate flag
$terminate = false;

//Register signals
pcntl_async_signals(true);

pcntl_signal(SIGTERM, function() use (&$terminate) {
    echo "Get SIGTERM. End worker LOOP\n";
    $terminate = true;
});

pcntl_signal(SIGHUP, function() use (&$terminate) {
    echo "Get SIGHUP. End worker LOOP\n";
    $terminate = true;
});

//loop
while ($terminate === false) {
    //do next job
}
Before the next job is started, the terminate flag is checked.
Docker has great support for gracefully stopping containers.
To define how long Docker waits before killing the container, we used the stop_grace_period option.
I'm working on a Perl6 project, but having difficulty connecting to MySQL. Even when using the DBIish (or perl6.org tutorial) example code, the connection fails. Any suggestions or advice is appreciated! User credentials have been confirmed accurate too.
I'm running this on Windows 10 with MySQL Server 8.0 and standard Perl6 with Rakudo Star. I have tried modifying the connection string in numerous ways (:$password, :password<>, :password(), etc.) but can't get a connection established. I should also note that I have the ODBC, C, C++, and .NET connectors installed.
#!/usr/bin/perl6
use v6.c;
use lib 'lib';
use DBIish;
use Register::User;
# Windows support
%*ENV<DBIISH_MYSQL_LIB> = "C:/Program Files/MySQL/MySQL Server 8.0/liblibmysql.dll"
if $*DISTRO.is-win;
my $dbh = DBIish.connect('mysql', :host<localhost>, :port(3306), :database<dbNameHere>, :user<usernameHere>, :password<pwdIsHere>) or die "couldn't connect to database";
my $sth = $dbh.prepare(q:to/STATEMENT/);
SELECT *
FROM users
STATEMENT
$sth.execute();
my @rows = $sth.allrows();
for @rows { .print }
say @rows.elems;
$sth.finish;
$dbh.dispose;
This should connect to the DB; the app then runs a query and prints out each resulting row. What actually happens is that the application hits the 'die' message every time.
This is more of a workaround, but being unable to use a DB is crippling. Even when trying NativeLibs I couldn't get a connection via DBIish, so instead I have opted for DB::MySQL, which is proving to be quite helpful. With a few lines of code this module has your DB needs covered:
use DB::MySQL;
my $mysql = DB::MySQL.new(:database<databaseName>, :user<userName>, :password<passwordHere>);
my @users = $mysql.query('select * from users').arrays;
for @users { say "user #$_[0]: $_[1] $_[2]"; }
#Results would be:
#user #1: FirstName LastName
#user #2: FirstName LastName
#etc...
This will print out a line for each user, formatted as shown above. It's not as familiar as DBIish, but this module gets the job done. There's plenty more you can do with it too, so I highly recommend reading the docs.
According to DBIish issue 127 on GitHub,
the environment variable DBIISH_MYSQL_LIB was removed. I don't know if anyone has brought it back.
However, if you add the library's directory to your path and the file is named mysql.dll, it will work. Not a good result for the scientific method.
So more testing is needed - and perhaps
C:\Program Files\MySQL\MySQL Server 8.0\lib>mklink mysql.dll .\libmysql.dll
Obviously you can create your own lib directory, add it to your path, and then add this symlink to that directory.
Hope this helps. I've spent hours..
EDIT: Still spending time - accounting later.
Something very transitory is going on. I reset the machine (perhaps I should always do this from now on) and still got the missing mysql.dll errors. Tried going into the MySQL lib directory and executing raku from there... it worked. Changed directories... it didn't.
Launched an administrator cmd and tried the raku command from the home directory. Worked. OK, not good, but perhaps consistent. Launched a non-admin cmd and tried it from the MySQL lib directory: worked. And just for giggles, tried it outside of that directory: worked.
Now I can't get it not to work. Will explore NativeLibs::Searcher as Valle Lukas suggested!
Maybe the example in the dbiish repository is not valid anymore.
The DBIISH_MYSQL_LIB environment variable seems to have been replaced by NativeLibs::Searcher in commit 9bc4191.
Looking at NativeLibs::Searcher may help to find the root cause of the problem.
I have a Rails 4.2.5 application with a MySQL 5.6 database. This database has a number of foreign keys, views, and functions. Schema.rb is designed to be database-agnostic and therefore can't express the database-specific commands necessary to manage these additional schema objects, so the structure.sql functionality is provided instead.
http://edgeguides.rubyonrails.org/active_record_migrations.html#schema-dumping-and-you
Unfortunately, the built-in structure dump tasks for MySQL do not include procedures, triggers, or foreign keys. This is problematic for our team, as we have to manually version-control these "non-standard" objects. Therefore I decided to find a solution that would allow managing the entire database schema using migrations, and landed upon this nice post by Pivotal Labs.
https://blog.pivotal.io/labs/labs/using-mysql-foreign-keys-procedures-and-triggers-with-rails
namespace :db do
  namespace :structure do |schema|
    schema[:dump].abandon
    desc 'OVERWRITTEN - shell out to mysqldump'
    task dump: :environment do
      config = ActiveRecord::Base.configurations[Rails.env]
      filename = "#{Rails.root}/db/structure.sql"
      cmd = "mysqldump -u#{config['username']} -p#{config['password']} "
      cmd += '-d --routines --triggers --skip-comments '
      cmd += "#{config['database']} > db/structure.sql"
      system cmd
      File.open(filename, 'a') do |f|
        f << ActiveRecord::Base.connection.dump_schema_information
      end
    end

    desc 'load the development_structure file using mysql shell'
    task load: :environment do
      config = ActiveRecord::Base.configurations[Rails.env]
      cmd = "mysql -u#{config['username']} -p#{config['password']} "
      cmd += "#{config['database']} < db/structure.sql"
      system cmd
    end
  end

  namespace :test do |schema|
    schema[:clone_structure].abandon
    desc 'OVERWRITTEN - load the development_structure file using mysql shell'
    task clone_structure: %w(db:structure:dump db:test:purge) do
      config = ActiveRecord::Base.configurations['test']
      cmd = "mysql -u#{config['username']} -p#{config['password']} "
      cmd += "#{config['database']} < db/structure.sql"
      system cmd
    end
  end
end
By making use of mysqldump from the shell I can generate a structure.sql file that contains all of the schema objects.
Currently my main problem is that on Heroku I can't locate mysqldump. I installed this buildpack, which provides the MySQL binaries.
https://github.com/gaumire/heroku-buildpack-mysql
However I get the error
mysqldump: not found
when running heroku run rake db:migrate.
As you can see, I'm down quite the rabbit hole here. I suspect there will be a problem with Heroku's read-only file system anyway, even if I can correctly locate mysqldump. Perhaps I should bypass non-development environments in my overridden rake db:structure:dump task: structure.sql should contain a schema that's consistent across all my environments, so perhaps I can get away with not writing to it in production?
If anyone has managed to pull this off or has alternative approaches to managing a complete MySQL schema using Active Record migrations I'd appreciate your input.
You can troubleshoot this by running heroku run bash -a <myapp>, which will launch a bash shell in a one-off dyno with the same environment as you would get when running heroku run rake db:migrate.
Heroku's file system is not read-only, it's "ephemeral": you can create and change files in a dyno, but those changes are lost when the dyno terminates. So this approach should work, provided you can locate the mysqldump binary.
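If you also decide to bypass the dump outside development, as the question suggests, a minimal sketch of such a guard inside the overridden task might look like this (the development-only check is an assumption about your workflow, not something Heroku requires):
task dump: :environment do
  # Hypothetical guard: only regenerate structure.sql in development,
  # where the file is meant to be committed to version control.
  next unless Rails.env.development?

  config = ActiveRecord::Base.configurations[Rails.env]
  cmd = "mysqldump -u#{config['username']} -p#{config['password']} "
  cmd += '-d --routines --triggers --skip-comments '
  cmd += "#{config['database']} > db/structure.sql"
  system cmd
end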
It is possible that my question title is misleading, but here goes --
I am trying out a prototype app which involves three MySQL/Perl Dancer powered web apps.
The user goes to app A, which serves up a Google Maps base layer. On document ready, app A makes three jQuery Ajax calls -- two to app B, like so
http://app_B/points.json
http://app_B/polys.json
and one to app C
http://app_C/polys.json
Apps B and C query the MySQL database via DBI, and serve up json packets of points and polys that are rendered in the user's browser.
All three apps are proxied through Apache to Perl Starman, running via plackup, started like so:
$ plackup -E production -s Starman -w 10 -p 5000 path/to/app_A/app.pl
$ plackup -E production -s Starman -w 10 -p 5001 path/to/app_B/app.pl
$ plackup -E production -s Starman -w 10 -p 5002 path/to/app_C/app.pl
From time to time, I start getting errors back from the apps called via Ajax. The initial symptoms were
{"error":"Warning caught during route
execution: DBD::mysql::st fetchall_arrayref
failed: fetch() without execute() at
<path/to/app_B/app.pm> line 79.\n"}
The offending lines are
71> my $sql = qq{
72> ..
73>
74>
75> };
76>
77> my $sth = $dbh->prepare($sql);
78> $sth->execute();
79> my $res = $sth->fetchall_arrayref({});
This is bizarre... how can execute() not take place above? Perl doesn't have a habit of jumping over lines, does it? So, I turned on DBI_TRACE
$ DBI_TRACE=2=logs/dbi.log plackup -E production -p 5001 -s Starman -w 10 -a bin/app.pl
And, following is what stood out to me as the potential culprit in the log file
> Handle is not in asynchronous mode error 2000 recorded: Handle is
> not in asynchronous mode
> !! ERROR: 2000 CLEARED by call to fetch method
What is going on? Basically, as is, app A is non-functional because the other apps don't return data "reliably" -- I put that in quotes because they do work correctly occasionally, so I know I don't have any logic or syntax errors in my code. I have some kind of intrinsic plumbing error.
I did find the following in the DBD::mysql documentation about ASYNCHRONOUS_QUERIES and am wondering if this is the cause of, and the solution to, my problem. Essentially, if I want async queries, I have to add {async => 1} to my $dbh->prepare(). Except I am not sure whether I want async true or false. I tried it, and it doesn't seem to help.
I would love to learn what is going on here, and what is the right way to solve this.
How are you managing your database handles? If you are opening a connection before Starman forks your code, then multiple children may be trying to share one database handle and confusing MySQL. You can solve this by always calling DBI->connect in the methods that talk to the database, but that can be inefficient. Many people switch to some sort of connection pool, but I have no direct experience with any of them.