Why won't these threaded ActiveRecord queries run concurrently? - mysql

I'm experimenting to understand a problem in production, so I've thrown this snippet inside a controller action in dev to test:
start = Time.now
num_threads = 6
results = Queue.new
saved_results = []
threads = []
connections = []
semaphore = Mutex.new
# start threads
(1..num_threads).each do |i|
  threads << Thread.new do
    #semaphore.synchronize { connections << ActiveRecord::Base.connection } # for cleanup?
    #ActiveRecord::Base.connection.execute("select sleep(1.6);") # runs sequentially
    sleep(1.6) # runs concurrently
    result = User.find_by_id(i)
    results << [i, result]
  end
end
# end option 1 - let everyone finish
threads.each(&:join)
# end option 2 - simulate early exit condition
#while saved_results.count < 3 do saved_results << results.pop end
#threads.each(&:exit)
# cleanup/close open connections?
#connections.select(&:active?).each(&:disconnect!)
elapsed = Time.now - start
render :text => [ elapsed.to_s, saved_results.size, results.size ].join(", ")
sleep(1.6) executes in approximately 1.6 seconds, as expected.
However, the ActiveRecord version with select sleep(1.6); takes 6 * 1.6 = 9.6 seconds, even though show processlist; in the mysql console shows that an independent connection is opened for each thread*.
What's going on? Why won't the ActiveRecord queries run concurrently? I've also experienced this in production console.
I do have config.threadsafe! set in config/environment.rb. If it matters, I'm using Rails 2.3.
*Do these connections have to be manually closed? Production always has a lot of open connections that are doing nothing, causing Mysql::Error: Too many connections. I'll probably submit that issue as a separate question.

Some remarks:
Rails 2.3 itself is, as far as I know, not really threadsafe; since Rails 3.x it is. But for this case I don't think that really matters.
You should be using at least Ruby 1.9. The "green threads" in 1.8 are less than optimal. While threading in Ruby 1.9 is still not optimal, it is better. For real threading you should check out JRuby or Rubinius (no GIL).
You should be using the mysql2 gem. The mysql gem keeps the GIL locked while waiting for a response from the database, so the other threads cannot run in the meantime; see the sketch below.
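To make that last point concrete, here is a minimal standalone sketch (not from the question; the connection parameters are placeholders) of the same experiment using the mysql2 gem. Because mysql2 releases the GIL while it waits for MySQL, the six select sleep(1.6) queries overlap and the total elapsed time stays close to 1.6 seconds rather than 9.6:
require 'mysql2'
start = Time.now
threads = (1..6).map do
  Thread.new do
    # each thread needs its own connection
    client = Mysql2::Client.new(:host => "localhost", :username => "user", :password => "pass", :database => "mydb")
    client.query("select sleep(1.6);")
    client.close
  end
end
threads.each(&:join)
puts "elapsed: #{Time.now - start}"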

Related

Get value of java options in jruby

I'd like to know, in a running jruby script, which java options are set. Is there a way to do this?
My problem is this: There are some scripts that I know require much more memory than others, and I would like to add a constraint in the code so that execution will stop early with proper warnings, rather than running out of memory at some unspecified time in the future.
Perhaps something I could stick in a BEGIN{}, like:
if is_set_joption?('-J-Xmx') then
  if get_joption('-J-Xmx').match(/\d+/)[0].to_i < 1000 then
    puts "You're gonna run out of memory...";
    abort();
  end
else
  puts "I recommend you start with -J-Xmx1000m.";
  abort();
end
(... where is_set_joption? and get_joption are made up methods.)
Running jruby 1.7.8.
It'll be in your ENV if you've set JAVA_OPTS in your environment, so you could read it from that. You've already initiated the JVM at this point though, so you'll want to set the options elsewhere, like on the command line when you exec jruby with -D.
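For what it's worth, a minimal sketch of that ENV-based check (it only works if JAVA_OPTS was actually exported in the shell that launched jruby, and it assumes the heap size is given with an "m" suffix):
# only meaningful if JAVA_OPTS was exported before jruby started
if ENV['JAVA_OPTS'].to_s =~ /-Xmx(\d+)m/
  abort "You're gonna run out of memory..." if $1.to_i < 1000
else
  abort "I recommend you start with -J-Xmx1000m."
end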
One can do:
require 'java'
java_import 'java.lang.Runtime'

mxm = Runtime.getRuntime.maxMemory.to_i
if mxm < (512 * 1024 * 1024) then
  raise "You're gonna need a bigger boat."
end

Hanging MySQL2 connections spun up in EventMachine

I'm running this code from the mysql2 gem docs:
require 'mysql2/em'

EM.run do
  client1 = Mysql2::EM::Client.new
  defer1 = client1.query "SELECT sleep(3) as first_query"
  defer1.callback do |result|
    puts "Result: #{result.to_a.inspect}"
  end

  client2 = Mysql2::EM::Client.new
  defer2 = client2.query "SELECT sleep(1) second_query"
  defer2.callback do |result|
    puts "Result: #{result.to_a.inspect}"
  end
end
It runs fine, printing the results
Result: [{"second_query"=>0}]
Result: [{"first_query"=>0}]
but then the script just hangs and never returns to the command line. Any idea what is going on?
EM.run will start an EventMachine reactor. That reactor just loops and loops and loops until you somehow tell it to stop. You can manually stop it using EM.stop.
In your case, you might want to check for the callback results and stop the reactor once both callbacks have fired. Ilya's em-http-request library provides a nice interface for exactly that use case. Might be worth a look.
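Applied to the snippet from the question, a minimal sketch of that counting approach (using nothing beyond a plain counter and EM.stop) could look like this:
require 'mysql2/em'

EM.run do
  pending = 2  # stop the reactor once both callbacks have fired
  finish = lambda do
    pending -= 1
    EM.stop if pending.zero?
  end

  client1 = Mysql2::EM::Client.new
  defer1 = client1.query "SELECT sleep(3) as first_query"
  defer1.callback do |result|
    puts "Result: #{result.to_a.inspect}"
    finish.call
  end

  client2 = Mysql2::EM::Client.new
  defer2 = client2.query "SELECT sleep(1) second_query"
  defer2.callback do |result|
    puts "Result: #{result.to_a.inspect}"
    finish.call
  end
end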

Ruby Shoes and MySQL: GUI freezes, should I use threads?

I'm trying to learn Shoes and decided to make a simple GUI to run a SQL-script line-by-line.
My problem is that pressing the button in the GUI, which executes my function, freezes the GUI for as long as it takes the function to run the script.
With long scripts this might take several minutes.
Someone had a similar problem (http://www.stackoverflow.com/questions/958662/shoes-and-heavy-operation-in-separate-thread) and the suggestion was just to put the intensive work in a Thread. If I copy the math code from that question and swap it in for my mysql code, the GUI works without freezing, so that probably hints at a problem with how I'm using the mysql adapter?
Below is a simplified version of the code:
problem.rb:
# copy the mysql-gem if not in gem-folder already
Shoes.setup do
  gem 'mysql'
end
require "mysql"

# the function that does the mysql stuff
def someFunction
  con = Mysql::real_connect("myserver", "user", "pass", "db")
  scriptFile = File.open("myfile.sql", "r")
  script = scriptFile.read
  scriptFile.close
  result = []
  script.each_line do |line|
    result << con.query(line)
  end
  return result
end

# the Shoes app with the Thread
Shoes.app do
  stack do
    button "Execute someFunction" do
      Thread.start do
        result = someFunction
        para "Done!"
      end
    end
  end
  stack do
    button "Can't click me when GUI is frozen" do
      alert "GUI wasn't frozen?"
    end
  end
end
I think the problem arises from thread scheduling being done by Ruby rather than by the operating system; probably it's just a bad interaction between Shoes and the mysql adapter, which blocks the whole interpreter while a query is running.
As a workaround I'd suggest you spawn a separate process for the script and use socket- or file-based communication between the processes, roughly as sketched below.
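For illustration, a rough sketch of that idea, assuming the MySQL work from someFunction is moved into a standalone helper script (run_sql.rb is a made-up name) that prints its results to stdout. This would replace the first button in the snippet above:
button "Execute someFunction" do
  Thread.start do
    # IO.popen runs a separate Ruby process, so the mysql driver blocks in
    # that process instead of stalling the interpreter the GUI runs in
    output = IO.popen("ruby run_sql.rb myfile.sql") { |io| io.read }
    para "Done! #{output}"
  end
end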

kill mysql query in perl

I have a Perl script in which I create a table from the existing MySQL databases. I have hundreds of databases, and the tables in each database contain millions of records; because of this a query sometimes takes hours due to problems with indexing, and sometimes I run out of disk space due to improper joins. Is there a way I can kill the query from within the same script by watching memory consumption and execution time?
P.S. I am using the DBI module in Perl as the MySQL interface.
As far as execution time goes, you can use Perl's alarm functionality to time out.
Your ALRM handler can either die (see the example below) or issue a DBI cancel call (sub { $sth->cancel };).
The DBI documentation actually has a very good discussion of this, as well as examples:
eval {
    local $SIG{ALRM} = sub { die "TIMEOUT\n" }; # N.B. \n required
    eval {
        alarm($seconds);
        ... code to execute with timeout here (which may die) ...
    };
    # outer eval catches alarm that might fire JUST before this alarm(0)
    alarm(0); # cancel alarm (if code ran fast)
    die "$@" if $@;
};
if ( $@ eq "TIMEOUT\n" ) { ... }
elsif ($@) { ... } # some other error
As far as watching memory goes, your ALRM handler just needs to first check the script's memory consumption instead of simply dying/cancelling.
I won't go into the details of how to measure memory consumption, since that's an unrelated question which was likely already answered comprehensively on SO, but you can use the size() method from Proc::ProcessTable as described in the Perlmonks snippet Find memory usage of perl program.
I used the KILL QUERY command as described in http://www.perlmonks.org/?node_id=885620
This is the code from my script:
eval {
    eval { # Time out and interrupt work
        my $TimeOut = Sys::SigAction::set_sig_handler('ALRM', sub {
            $dbh->clone()->do("KILL QUERY " . $dbh->{"mysql_thread_id"});
            die "TIMEOUT\n";
        });
        # Set alarm
        alarm($seconds);
        $sth->execute();
        # Clear alarm
        #alarm(0);
    };
    # Prevent race condition
    alarm(0);
    die "$@" if $@;
};
This code kills the query and also removes all the temporary tables.
Watch out: you can't kill a query using the same connection handle if that query is stuck because of a table lock. You must open another connection as the same user and kill that thread id from the new connection.
Of course, you have to store the list of currently open thread ids in a hash.
Mind that once you've terminated a thread id, the rest of the Perl code will execute on an unblessed handle.

restart multi-threaded perl script and close mysql connection

I have a Perl script that reads a command file and restarts itself if necessary by doing:
myscript.pl:
exec '/home/foo/bin/myscript.pl';
exit(0);
Now, this works fine except for one issue. The thread that reads the command file does not have access to the DBI handle I use. And over a number of restarts I seem to build up the number of open mysql connections till I get the dreaded "Too Many Connections" error. The DBI spec says:
"Because of this (possibly temporary) restriction, newly created threads must make their own connections to the database. Handles can't be shared across threads."
Any way to close connections or perhaps a different way to restart the script?
Use a flag variable that is shared between threads. Have the thread that reads the command file set the flag to exit, and have the thread holding the DB handle release it and actually do the re-exec:
#!/usr/bin/perl
use threads;
use threads::shared;
use strict; use warnings;

my $EXIT_FLAG :shared;
my $db_thread = threads->create('do_the_db_thing');
$db_thread->detach;

while ( 1 ) {
    sleep rand 10;
    $EXIT_FLAG = 1 if 0.05 > rand or time - $^T > 20;
}

sub do_the_db_thing {
    until ( $EXIT_FLAG ) {
        warn sprintf "%d: Working with the db\n", time - $^T;
        sleep rand 5;
    }
    # $dbh->disconnect; # here
    warn "Exit flag is set ... restarting\n";
    exec 'j.pl';
}
You could try registering an atexit function to close the DBI handle at the point where it is opened, and then use fork & exec to restart the script rather than just exec. The parent would then call exit, invoking the atexit callback to close the DBI handle. The child could re-exec itself normally.
Edit: After thinking for a couple more minutes, I believe you could skip the atexit entirely because the handle would be closed automatically upon the parent's exit. Unless, of course, you need to do a more complex operation upon closing the DB handle than a simple filehandle close.
my $pid = fork();
if (not defined $pid) {
    # Could not fork, so handle the error somehow
} elsif ($pid == 0) {
    # Child re-execs itself
    exec '/home/foo/bin/myscript.pl';
} else {
    # Parent exits
    exit(0);
}
If you expect a lot of connections, you probably want DBI::Gofer to act as a DBI proxy for you. You create as many connections in as many scripts as you like, and DBI::Gofer shares them when it can.