Parallel mysql I/O in Ruby - mysql

Good day to you. I'm writing a cron job that will hopefully split a huge MySQL table across several threads and do some work on each part. This is the minimal sample of what I have at the moment:
require 'mysql'
require 'parallel'
@db = Mysql.real_connect("localhost", "root", "", "database")
@threads = 10
Parallel.map(1..@threads, :in_processes => 8) do |i|
  begin
    @db.query("SELECT url FROM pages LIMIT 1 OFFSET #{i}")
  rescue Mysql::Error => e
    @db.reconnect()
    puts "Error code: #{e.errno}"
    puts "Error message: #{e.error}"
    puts "Error SQLSTATE: #{e.sqlstate}" if e.respond_to?("sqlstate")
  end
end
@db.close
The threads don't need to return anything, they get their job share and they do it. Only they don't. Either connection to MySQL is lost during the query, or connection doesn't exist (MySQL server has gone away?!), or no _dump_data is defined for class Mysql::Result and then Parallel::DeadWorker.
How do I do this right?

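The map method expects a result from every block and has to serialize it to send it back to the parent process, which is where the "no _dump_data is defined for class Mysql::Result" error comes from. I don't need a result, so I switched to each:
Parallel.each(1..@threads, :in_processes => 8) do |i|
This also solves the problem with MySQL: I just needed to open the connection inside the parallel process instead of sharing the one opened in the parent. With the each loop that is possible. Of course, the connection should also be closed inside the same process.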
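The idea of opening and closing the connection inside each worker process can be sketched with the standard library alone. This is only an illustration under stated assumptions: fork stands in for Parallel.each (and, unlike :in_processes, does not cap the number of concurrent workers), and both each_offset_in_processes and the connect callable are hypothetical names that do not appear in the original post.

```ruby
# Sketch only: `connect` is a hypothetical callable that opens a NEW connection,
# e.g. -> { Mysql.real_connect("localhost", "root", "", "database") }.
# The object it returns only needs to respond to #query and #close.
def each_offset_in_processes(offsets, connect)
  pids = offsets.map do |i|
    fork do
      ok = begin
        db = connect.call            # opened inside the child, never shared
        db.query("SELECT url FROM pages LIMIT 1 OFFSET #{Integer(i)}")
        true
      rescue StandardError => e
        warn "worker #{i} failed: #{e.message}"
        false
      ensure
        db.close if db               # closed inside the same child
      end
      exit!(ok)                      # leave the child without running inherited at_exit hooks
    end
  end
  # Reap every child; true means that worker exited cleanly.
  pids.map { |pid| Process.wait2(pid).last.success? }
end
```

Because nothing is returned from the child except its exit status, no result object ever has to be marshalled back to the parent.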

Related

Active Record mySQL Timeout ERROR -- : worker=0 PID:(xxxxx) timeout (61s > 60s), killing

I have an async action on my controller that can perform a heavy SQL query depending on user input.
@results = ActiveRecord::Base
  .connection
  .select_all(query_string)
  .map do |record|
    Hashie::Mash.new(record)
  end
When it happens, the only response I get from the server is
E, [2020-02-05T16:14:04.133233 #59909] ERROR -- : worker=0 PID:60952 timeout (61s > 60s), killing
E, [2020-02-05T16:14:04.159372 #59909] ERROR -- : reaped #<Process::Status: pid 60952 SIGKILL (signal 9)> worker=0
Is there any way I can capture this timeout on the backend, to give the user the correct feedback?
I tried using Timeout::timeout(x), but without success.
You could add another, shorter timeout yourself and handle the situation before the worker gets killed. Something like this might be a good start:
require 'timeout'
begin
  # 5 seconds before the 60-second worker timeout from the log would kick in
  Timeout.timeout(55) do
    # have only the slow query in this block
    @database_results = ActiveRecord::Base.connection.select_all(query_string)
  end
rescue Timeout::Error
  # Handle the timeout. Proper error handling depends on your application.
  # In a controller you might just want to return an error page, in a
  # background worker you might want to record the error in your database.
end
# time to translate the data should not count towards the timeout
@results = @database_results.map { |r| Hashie::Mash.new(r) }
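A database-free sketch of the same pattern may make the control flow easier to see. The helper name run_with_deadline is hypothetical, and sleep stands in for the slow query; the caller gets a tagged pair back, so it can decide what to show the user without relying on an instance variable that stays unset after a timeout.

```ruby
require 'timeout'

# Hypothetical helper: runs the block under a deadline and reports the
# outcome as [:ok, value] or [:timed_out, nil] instead of raising.
def run_with_deadline(seconds)
  [:ok, Timeout.timeout(seconds) { yield }]
rescue Timeout::Error
  [:timed_out, nil]
end

# Usage: the sleep is a stand-in for a query that takes too long.
status, rows = run_with_deadline(0.1) { sleep 1; [] }
message = status == :ok ? "#{rows.size} rows" : "The query took too long, please narrow your search."
```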

Dashing: Ruby: CentOS: Not closing MySQL processes

I am having trouble with my server.
It is a CentOS RedHat Linux server that runs "Dashing", a Ruby/Sinatra-based dashboard.
I am trying to close the active connections that my MySQL database lists under "SHOW PROCESSLIST;".
Example.rb File
require 'mysql2'
SCHEDULER.every '10s' do
  db = Mysql.new('host_name', 'database_name', 'password', 'table')
  mysql1 = "SELECT `VAR` from `TABLE` ORDER BY `VAR` DESC LIMIT 1"
  result1 = db.query(mysql1)
  result1.each do |row|
    strrow1 = row[0]
    $num1 = strrow1.to_i
  end
  ...
  db.close
  LINK[0] = { label: 'LABEL', value: $num1}
  ...
  send_event('LABEL FOR HTML', { items: LINK.values })
end
However, after a few clicks back and forth, it is clear that the database does not drop the connections, but instead keeps them. This causes the browser to slow down to the point that loading a page becomes impossible and the output of the log reads:
"max_user_connections" reached
Can anyone think of a way to fix this?
It is a best practice to put DB/file/handle work in a begin/rescue/ensure block. It could be that an error is being raised and Rufus/Dashing is just being quiet about it, since they trap exceptions and go on their merry way. In that case db.close is never reached and the connection stays open. The symptoms you are seeing could come from a similar problem; either way, it's a good idea.
SCHEDULER.every '10s' do
  begin
    db = Mysql.new('host_name', 'database_name', 'password', 'table')
    # .... stuff ....
  rescue
    # what happens if an error happens? log it, toss it, ignore it?
  ensure
    # db is nil if Mysql.new itself raised
    db.close if db
  end
  # ... more stuff if you want ...
end
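The same ensure idea can be packaged as a small block helper so that every job closes its connection the same way. This is a sketch, not part of Dashing or of any MySQL gem: with_connection and open_connection are hypothetical names, and open_connection is assumed to be any callable that returns an object responding to #close.

```ruby
# Hypothetical helper: yields a freshly opened connection and always closes it.
def with_connection(open_connection)
  db = open_connection.call
  yield db
ensure
  # Runs on success and on error alike; db is nil if opening itself failed.
  db.close if db
end
```

Inside the scheduler block it might be used as with_connection(-> { Mysql.new('host_name', 'database_name', 'password', 'table') }) { |db| db.query(mysql1) }, with the rescue kept around the call if errors should be logged rather than propagated.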

Error: bad argument #1 to 'insert' (table expected, got nil)

I am trying to connect to a mysql server using LuaSql via a mysql proxy. I try to execute a simple program (db.lua):
require("luasql.mysql")
local _sqlEnv = assert(luasql.mysql())
local _con = nil
function read_auth(auth)
  local host, port = string.match(proxy.backends[1].address, "(.*):(.*)")
  _con = assert(_sqlEnv:connect( "db_name", "username", "password", "hostname", "3306"))
end
function disconnect_client()
  assert(_con:close())
end
function read_query(packet)
  -- use the connection opened in read_auth (_con, not con)
  local cur = _con:execute("select * from t1")
  myTable = {}
  row = cur:fetch(myTable, "a")
  print(myTable.id,myTable.user)
end
This code executes well when I execute it without mysql-proxy. When I am connecting with mysql-proxy, the error-log displays these errors:
mysql.lua:8: bad argument #1 to 'insert' (table expected, got nil)
db.lua:1: loop or previous error loading module 'luasql.mysql'
mysql.lua is a default file of LuaSql:
---------------------------------------------------------------------
-- MySQL specific tests and configurations.
-- $Id: mysql.lua,v 1.4 2006/01/25 20:28:30 tomas Exp $
---------------------------------------------------------------------
QUERYING_STRING_TYPE_NAME = "binary(65535)"
table.insert (CUR_METHODS, "numrows")
table.insert (EXTENSIONS, numrows)
---------------------------------------------------------------------
-- Build SQL command to create the test table.
---------------------------------------------------------------------
local _define_table = define_table
function define_table (n)
  return _define_table(n) .. " TYPE = InnoDB;"
end
---------------------------------------------------------------------
-- MySQL versions 4.0.x do not implement rollback.
---------------------------------------------------------------------
local _rollback = rollback
function rollback ()
  if luasql._MYSQLVERSION and string.sub(luasql._MYSQLVERSION, 1, 3) == "4.0" then
    io.write("skipping rollback test (mysql version 4.0.x)")
    return
  else
    _rollback ()
  end
end
As stated in my previous comment, the error indicates that table.insert (CUR_METHODS, ...) is getting nil as its first argument. Since the first argument is CUR_METHODS, this means CUR_METHODS has not been defined yet. Because this happens near the top of the luasql.mysql module, my guess is that the luasql initialization was incomplete, perhaps because LUA_CPATH does not lead to the MySQL DLL for luasql. I'm surprised that you don't get a package error in that case, though, so something odd is going on. You'll have to dig into the luasql module and C file to figure out why CUR_METHODS is not being created.
Update: alternatively, update your post to show the output of print("LUA path:", package.path) and print("LUA cpath:", package.cpath) from your mysql-proxy script, along with the path of the folder where luasql is installed and the contents of that folder.

Perl Module Instantiation + DBI + Forks "Mysql server has gone away"

I have written a perl program that parses records from csv into a db.
The program worked fine but took a long time. So I decided to fork the main parsing process.
After a bit of wrangling with fork, it now works well and runs about 4 times faster. The main parsing method is quite database intensive. For interest's sake, each record that is parsed involves the following db calls:
1 - there is a check that the uniquely generated base62 is unique against a baseid map table
2 - There is an archive check to see if the record has changed
3 - The record is inserted into the db
The problem is that I began to get "Mysql has gone away" errors while the parser was being run in forked mode, so after much fiddling I came up with the following mysql config:
#
# * Fine Tuning
#
key_buffer = 10000M
max_allowed_packet = 10000M
thread_stack = 192K
thread_cache_size = 8
myisam-recover = BACKUP
max_connections = 10000
table_cache = 64
thread_concurrency = 32
wait_timeout = 15
tmp_table_size = 1024M
query_cache_limit = 2M
#query_cache_size = 100M
query_cache_size = 0
query_cache_type = 0
That seems to have fixed the problems while the parser is running. However, I am now getting a "Mysql server has gone away" error when the next module is run after the main parser.
The strange thing is that the module causing problems involves a very simple SELECT query on a table that currently has only 3 records. Run directly as a test (not after the parser), it works fine.
I tried adding a pause of 4 minutes after the parser module runs, but I get the same error.
I have a main DBConnection.pm module with this:
package DBConnection;
use DBI;
use PXConfig;
sub new {
    my $class = shift;
    ## MYSQL Connection
    my $config = new PXConfig();
    my $host = $config->val('database', 'host');
    my $database = $config->val('database', 'db');
    my $user = $config->val('database', 'user');
    my $pw = $config->val('database', 'password');
    my $dsn = "DBI:mysql:database=$database;host=$host;";
    my $connect2 = DBI->connect( $dsn, $user, $pw, );
    $connect2->{mysql_auto_reconnect} = 1;
    $connect2->{RaiseError} = 1;
    $connect2->{PrintError} = 1;
    $connect2->{ShowErrorStatement} = 1;
    $connect2->{InactiveDestroy} = 1;
    my $self = {
        connect => $connect2,
    };
    bless $self, $class;
    return $self;
}
Then all modules, including the forked parser modules, open a connection to the DB using:
package Example;
use DBConnection;
sub new {
    my $class = shift;
    my $db = new DBConnection;
    my $connect2 = $db->connect();
    my $self = {
        connect2 => $connect2,
    };
    bless $self, $class;
    return $self;
}
The question is if I have Module1.pm that calls Module2.pm that calls Module3.pm and each of them instantiates a connection with the DB as shown above (ie in the constructor) then are they using different connections to the database or the same connection?
What I wondered is if the script takes say 6 hours to finish, if the top level call to the db connection is timing out the lower level db connection even though the lower level module is making its 'own' connection.
It is very frustrating trying to find the problem as I can only reproduce the error after running a very long parse process.
Sorry for the long question, thanks in advance to anyone who can give me any ideas.
UPDATE 1:
Here is the actual forking part:
my $fh = Tie::Handle::CSV->new( "$file", header => 1 );
while ( my $part = <$fh> ) {
    if ( $children == $max_threads ) {
        $pid = wait();
        $children--;
    }
    if ( defined( $pid = fork ) ) {
        if ($pid) {
            $children++;
        } else {
            $cfptu = new ThreadedUnit();
            $cfptu->parseThreadedUnit($part, $group_id, $feed_id);
        }
    }
}
And then the ThreadedUnit:
package ThreadedUnit;
use CollisionChecker;
use ArchiveController;
use Filters;
use Try::Tiny;
use MysqlLogger;
sub new {
    my $class = shift;
    my $db = new DBConnection;
    my $connect2 = $db->connect();
    my $self = {
        connect2 => $connect2,
    };
    bless $self, $class;
    return $self;
}
sub parseThreadedUnit {
    my ( $self, $part, $group_id, $feed_id ) = @_;
    my $connect2 = $self->{connect2};
    ## Parsing stuff
    ## DB Update in try -> catch
    exit();
}
So, as I understand it, the DB connection is being created after the fork.
But, as I mentioned above, the forked code outlined just above works fine. It is the next module that does not work. That module is run from a controller module which simply runs through each worker module one at a time (the parser being one of them); the controller module does not create a DB connection in its constructor or anywhere else.
Update 2
I forgot to mention that I don't get any errors in the 'problem' module following the parser if I only parse a small number of files and not the full queue.
So it is almost as if the intensive forked parsing and DB access makes the database unavailable to normal non-forked processes for some undetermined time just after it ends.
The only thing I have noticed in the MySQL status when the parser run finishes is that Threads_connected sits at around, say, 500 and does not decrease for some time.
It depends on how your program is structured, which isn't clear from the question.
If you create the DB connection before you fork, Perl will make a copy of the DB connection object for each process. This would likely cause problems if two processes try to access the database concurrently with the same DB connection.
On the other hand, if you create the DB connections after forking, each module will have its own connection. This should work, but you could have a timeout problem if Module x creates a connection, then waits a long time for a process in Module y to finish, then tries to use the connection.
In summary, here is what you want:
1 - Don't have any open connections at the point you fork. Child processes should create their own connections.
2 - Only open a connection right before you want to use it. If there is a point in your program when you have to wait, open the connection after the waiting is done.
Read dan1111's answer, but I suspect you are connecting and then forking. When the child completes, the DBI connection handle goes out of scope and is closed. As dan1111 says, you are better off connecting in the child, for all the reasons he gave. Read about InactiveDestroy and AutoInactiveDestroy in the DBI documentation, which will help you understand what is going on.

Ruby mysql2 error when executing statements in rapid succession

I have a strange issue using the Mysql2 client in Ruby. When trying to execute the following:
client.query("CREATE DATABASE ...; INSERT INTO ..."); #SQL truncated for brevity
client.query("SELECT 1 FROM ...") #SQL truncated for brevity
Ruby throws an error that the table I'm selecting from doesn't exist. However if I try the following:
client.query("CREATE DATABASE ...; INSERT INTO ..."); #SQL truncated for brevity
sleep 1
client.query("SELECT 1 FROM ...") #SQL truncated for brevity
The query works with no problems. It seems as though I need to give the MySQL server some time to load the data before I'm able to query it. Can anyone explain why this is happening and how to programmatically overcome this without using sleep?
Update
I initialize the client as so:
Mysql2::Client.new({
  :adapter => "mysql2",
  :host => ip_address,
  :username => db_username,
  :password => db_password,
  :flags => Mysql2::Client::MULTI_STATEMENTS
})
I checked the 'query_options' attribute and async is set to false. I have tried explicitly setting the async => false flag to no avail.
The same issue happens if I use
Model.connection.execute(SQL HERE)
Note, this is all executed from within a Rails unit test.
Thanks
For some reason, the only thing that ended up working without the sleep 1 in between was the following:
@model = Model.new
Mysql2::Client.default_query_options[:connect_flags] |= Mysql2::Client::MULTI_STATEMENTS
@model.connection.reconnect!
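The thread does not establish why the sleep was needed, so the following is not offered as the cause. It only shows one thing that may be worth ruling out when MULTI_STATEMENTS is enabled: in mysql2, client.query hands back the result of the first statement, and the results of the remaining statements are fetched with next_result and store_result. A minimal sketch, assuming client is a Mysql2::Client opened with that flag; the helper name query_and_drain is hypothetical.

```ruby
# Hypothetical helper, not part of mysql2 itself. `client` is assumed to be a
# Mysql2::Client created with :flags => Mysql2::Client::MULTI_STATEMENTS.
def query_and_drain(client, sql)
  results = [client.query(sql)]    # result of the first statement only
  # next_result returns true while another statement's result is pending
  # and raises if that statement failed on the server.
  while client.next_result
    results << client.store_result # nil for statements without a result set
  end
  results
end
```

Called as query_and_drain(client, "CREATE DATABASE ...; INSERT INTO ..."), it returns only after every statement in the batch has reported back, so a failure in a later statement surfaces here rather than on the following query.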