Perl Connection Pooling - mysql

Right now we have a large perl application that is using raw DBI to connect to MySQL and execute SQL statements. It creates a connection each time and terminates. Were starting to approach mysql's connection limit (200 at once)
It looks like DBIx::Connection supports application layer connection pooling.
Has anybody had any experience with DBIx::Connection?. Are there any other considerations for connection pooling?
I also see mod_dbd which is an Apache mod that looks like it handles connection pooling.
http://httpd.apache.org/docs/2.1/mod/mod_dbd.html

I don't have any experience with DBIx::Connection, but I use DBIx::Connector (essentially what DBIx::Class uses internally, but inlined) and it's wonderful...
I pool these connections with a Moose object wrapper that hands back existing object instances if the connection parameters are identical (this would work the same for any underlying DB object):
package MyApp::Factory::DatabaseConnection;
use strict;
use warnings;
use Moose;
# table of database name -> connection objects
has connection_pool => (
is => 'ro', isa => 'HashRef[DBIx::Connector]',
traits => ['Hash'],
handles => {
has_pooled_connection => 'exists',
get_pooled_connection => 'get',
save_pooled_connection => 'set',
},
default => sub { {} },
);
sub get_connection
{
my ($self, %options) = #_;
# some application-specific parsing of %options here...
my $obj;
if ($options{reuse})
{
# extract the last-allocated connection for this database and pass it
# back, if there is one.
$obj = $self->get_pooled_connection($options{database});
}
if (not $obj or not $obj->connected)
{
# look up connection info based on requested database name
my ($dsn, $username, $password) = $self->get_connection_info($options{database});
$obj = DBIx::Connector->new($dsn, $username, $password);
return unless $obj;
# Save this connection for later reuse, possibly replacing an earlier
# saved connection (this latest one has the highest chance of being in
# the same pid as a subsequent request).
$self->save_pooled_connection($options{database}, $obj) unless $options{nosave};
}
return $obj;
}

Just making sure: you know about DBI->connect_cached(), right? It's a drop-in replacement for connect() that reuses dbh's, where possible, over the life of your perl script. Maybe your problem is solvable by adding 7 characters :)
And, MySQL's connections are relatively cheap. Running with your DB at max_connections=1000 or more won't by itself cause problems. (If your clients are demanding more work than your DB can handle, that's a more serious problem, one which a lower max_connections might put off but of course not solve.)

Related

Inserting data from CSV into MySQL DB is very slow

Trying to insert data from a CSV file to a MySQL DB using Ruby, and it's very slow. Note that this is not a Rails application, just stand-alone Ruby script.
Here is my code:
def add_record (data1, data2, time)
date = DateTime.strptime(time, "%m/%d/%y %H:%M")
<my table>.create(data1: data1, data2: data2, time: date)
end
def parse_file (file)
path = #folder + "\\" + file
CSV.foreach(path, {headers: :first_row}) do |line|
add_record(line[4], line[5], line[0])
end
end
def analyze_data ()
Dir.foreach #folder do |file|
next if file == '.' or file == '..'
parse_file file
end
end
And my connection:
#connection = ActiveRecord::Base.establish_connection(
:adapter=> "mysql2",
:host => "localhost",
:database=> <db>,
:username => "root",
:password => <pw>
)
Any help appreciated.
Use Load Data Infile.
Here is a nice article on performance and strategies titled Testing the Fastest Way to Import a Table into MySQL. Don't let the mysql version of the title or inside the article scare you away. Jumping to the bottom and picking up some conclusions:
The fastest way you can import a table into MySQL without using raw
files is the LOAD DATA syntax. Use parallelization for InnoDB for
better results, and remember to tune basic parameters like your
transaction log size and buffer pool. Careful programming and
importing can make a >2-hour problem became a 2-minute process. You
can disable temporarily some security features for extra performance
You might just find your times greatly reduced.
Use the zdennis/activerecord-import gem. you can insert tons records quickly.

Parallel mysql I/O in Ruby

Good day to you. I'm writing a cron job that hopefully will split a huge MySQL table to several threads and do some work on them. This is the minimal sample of what I have at the moment:
require 'mysql'
require 'parallel'
#db = Mysql.real_connect("localhost", "root", "", "database")
#threads = 10
Parallel.map(1..#threads, :in_processes => 8) do |i|
begin
#db.query("SELECT url FROM pages LIMIT 1 OFFSET #{i}")
rescue Mysql::Error => e
#db.reconnect()
puts "Error code: #{e.errno}"
puts "Error message: #{e.error}"
puts "Error SQLSTATE: #{e.sqlstate}" if e.respond_to?("sqlstate")
end
end
#db.close
The threads don't need to return anything, they get their job share and they do it. Only they don't. Either connection to MySQL is lost during the query, or connection doesn't exist (MySQL server has gone away?!), or no _dump_data is defined for class Mysql::Result and then Parallel::DeadWorker.
How to do that right?
map method expects a result; I don't need a result, so I switched to each:
Parallel.each(1..#threads, :in_processes => 8) do |i|
Also this solves a problem with MySQL: I just needed to start the connection inside the parallel process. When using each loop, it's possible. Of course, connection should be closed inside the process also.

Dashing: Ruby: CentOS: Not closing MySQL processes

I am having trouble with my server.
It is a CentOS RedHat Linux server and runs "Dashing" a Ruby/Sinatra-based dashboard.
I am trying to close the active connections as defined by my MySQL database "SHOW PROCESSLIST;"
Example.rb File
require 'mysql2'
SCHEDULER.every '10s'do
db = Mysql.new('host_name', 'database_name', 'password', 'table')
mysql1 = "SELECT `VAR` from `TABLE` ORDER BY `VAR` DESC LIMIT 1"
result1 = db.query(mysql1)
result1.each do |row|
strrow1 = row[0]
$num1 = strrow1.to_i
end
...
db.close
LINK[0] = { label: 'LABEL', value: $num1}
...
send_event('LABEL FOR HTML', { items: LINK.values })
end
However, after a few clicks back and forth, it is clear that the database does not drop the connections, but instead keeps them. This causes the browser to slow down to the point that loading a page becomes impossible and the output of the log reads:
"max_user_connections" reached
Can anyone think of a way to fix this?
It is a best practice for DB/File/handle stuff to be in a begin/rescue/ensure block. It could be that something is happening and Rufus/Dashing is just being quiet about the error since they trap exceptions and go on their merry way. This would prevent your db connection from closing. The symptoms you are having could be from a similar problem, either way it's a good idea.
SCHEDULER.every '10s'do
begin
db = Mysql.new('host_name', 'database_name', 'password', 'table')
# .... stuff ....
rescue
# what happens if an error happens? log it, toss it, ignore it?
ensure
db.close
end
# ... more stuff if you want ...
end

Perl Module Instantiation + DBI + Forks "Mysql server has gone away"

I have written a perl program that parses records from csv into a db.
The program worked fine but took a long time. So I decided to fork the main parsing process.
After a bit of wrangling with fork it now works well and runs about 4 times faster. The main parsing method is quite database intensive. For interests sake, for each record that is parsed there are the following db calls:
1 - there is a check that the uniquely generated base62 is unique against a baseid map table
2 - There is an archive check to see if the record has changed
3 - The record is inserted into the db
The problem is that I began to get "Mysql has gone away" errors while the parser was being run in forked mode, so after much fiddling I came up with the following mysql config:
#
# * Fine Tuning
#
key_buffer = 10000M
max_allowed_packet = 10000M
thread_stack = 192K
thread_cache_size = 8
myisam-recover = BACKUP
max_connections = 10000
table_cache = 64
thread_concurrency = 32
wait_timeout = 15
tmp_table_size = 1024M
query_cache_limit = 2M
#query_cache_size = 100M
query_cache_size = 0
query_cache_type = 0
That seems to have fixed problems while the parser is running However, I am now getting a "Mysql server has gone away" when the next module is run after the main parser.
The strange thinf is the module causing problems involves a very simple SELECT query on a table with currently only 3 records. Run directly as a test (not after the parser) it works fine.
I tried adding a pause of 4 minutes after the parser module runs - but I get the same error.
I have a main DBConnection.pm model with this:
package DBConnection;
use DBI;
use PXConfig;
sub new {
my $class = shift;
## MYSQL Connection
my $config = new PXConfig();
my $host = $config->val('database', 'host');
my $database = $config->val('database', 'db');
my $user = $config->val('database', 'user');
my $pw = $config->val('database', 'password');
my $dsn = "DBI:mysql:database=$database;host=$host;";
my $connect2 = DBI->connect( $dsn, $user, $pw, );
$connect2->{mysql_auto_reconnect} = 1;
$connect2->{RaiseError} = 1;
$connect2->{PrintError} = 1;
$connect2->{ShowErrorStatement} = 1;
$connect2->{InactiveDestroy} = 1;
my $self = {
connect => $connect2,
};
bless $self, $class;
return $self;
}
Then all modules, including the forked parser modules, open a connection to the DB using:
package Example;
use DBConnection;
sub new {
my $class = shift;
my $db = new DBConnection;
my $connect2 = $db->connect();
my $self = {
connect2 => $connect2,
};
bless $self, $class;
return $self;
}
The question is if I have Module1.pm that calls Module2.pm that calls Module3.pm and each of them instantiates a connection with the DB as shown above (ie in the constructor) then are they using different connections to the database or the same connection?
What I wondered is if the script takes say 6 hours to finish, if the top level call to the db connection is timing out the lower level db connection even though the lower level module is making its 'own' connection.
It is very frustrating trying to find the problem as I can only reproduce the error after running a very long parse process.
Sorry for the long question, thanks in advance to anyone who can give me any ideas.
UPDATE 1:
Here is the actual forking part:
my $fh = Tie::Handle::CSV->new( "$file", header => 1 );
while ( my $part = <$fh> ) {
if ( $children == $max_threads ) {
$pid = wait();
$children--;
}
if ( defined( $pid = fork ) ) {
if ($pid) {
$children++;
} else {
$cfptu = new ThreadedUnit();
$cfptu->parseThreadedUnit($part, $group_id, $feed_id);
}
}
}
And then the ThreadedUnit:
package ThreadedUnit;
use CollisionChecker;
use ArchiveController;
use Filters;
use Try::Tiny;
use MysqlLogger;
sub new {
my $class = shift;
my $db = new DBConnection;
my $connect2 = $db->connect();
my $self = {
connect2 => $connect2,
};
bless $self, $class;
return $self;
}
sub parseThreadedUnit {
my ( $self, $part, $group_id, $feed_id ) = #_;
my $connect2 = $self->{connect2};
## Parsing stuff
## DB Update in try -> catch
exit();
}
So as I understand the DB connection is being called after the forking.
But, as I mentioned above the forked code outlined just above works fine. It is the next module that does not work which is being run from a controller module which just runs through each worker module one at time (the parser being one of them) - the controller module does not create a DB connection in its construct or anywhere else.
Update 2
I forgot to mention that I don't get any errors in the 'problem' module following the parser if I only parse a small number of files and not the full queue.
So it is almost as if the intensive forked parsing and accessing the DB makes it un-available for normal non-forked processes just after it ends for some undetermined time.
The only thing I have noticed when the parser run finishes in Mysql status is the Threads_connected sits around, say, 500 and does not decrease for some time.
It depends on how your program is structured, which isn't clear from the question.
If you create the DB connection before you fork, Perl will make a copy of the DB connection object for each process. This would likely cause problems if two processes try to access the database concurrently with the same DB connection.
On the other hand, if you create the DB connections after forking, each module will have its own connection. This should work, but you could have a timeout problem if Module x creates a connection, then waits a long time for a process in Module y to finish, then tries to use the connection.
In summary, here is what you want:
Don't have any open connections at the point you fork. Child processes should create their own connections.
Only open a connection right before you want to use it. If there is a point in your program when you have to wait, open the connection after the waiting is done.
Read dan1111's answer but I suspect you are connecting then forking. When the child completes the DBI connection handle goes out of scope and is closed. As dan1111 says you are better connecting in the child for all the reasons he said. Read about InactiveDestroy and AutoInactiveDestroy in DBI which will help you understand what is going on.

Can MySQL support concurrent queries per connection?

If I have 10 queries, and each query is updating a particular table (i.e., 10 different tables).
Can I open one mySQL connection, spawn 10 threads, each thread handles 1 query such that they can run concurrently instead of executing one by one.
Thanks!
No you can't:
MySQL client library (at least native C one) is not thread safe to use same connection from different threads. You need to use a connection per thread.
If you just need update/insert queries running in parallel (asynchronously in terms of MySQL API) - you can use INSERT DELAYED and UPDATE LOW_PRIORITY queries.
Because of the way the MySQL protocol works, these will get sent to the server one-by-one if you only have one connection open.
There is no file "log.conn.txt" created, so there is no conflict between concurrent queries to the single mysql client connection:
<?
declare(ticks=1);
pcntl_signal(SIGUSR1, create_function('$signo', 'sleep(1);while (($pid=pcntl_wait(#$status, WNOHANG))>0) {}'));//protect against zombie children
$pdo=new PDO('mysql:host=192.168.0.2;port=3306;dbname=baseinfo', 'dev', 'dev',
array(PDO::ATTR_ERRMODE=>PDO::ERRMODE_EXCEPTION,
PDO::MYSQL_ATTR_INIT_COMMAND=>'set names utf8'
)
);
for ($i=0; $i<20; ++$i)
{if (($pid=pcntl_fork())===-1)
{//...
continue;
}
else if ($pid)
{$pids[]=$pid;
pcntl_wait($status, WNOHANG); //protect against zombie children, one wait vs one child
}
else if ($pid===0)
{ob_start();//prevent output to main process
register_shutdown_function(create_function('$pars', 'ob_end_clean();posix_kill(posix_getppid(), SIGUSR1);posix_kill(getmypid(), SIGKILL);'), array());//to kill self before exit();, or else the resource shared with parent will be closed
for ($j=0; $j<200; ++$j)
{try
{file_put_contents('log.'.$i.'.txt', $pdo->query('select partner_login from base_account where id=100')->fetch(PDO::FETCH_COLUMN, 0)."\t".time().substr(microtime(),2,6)."\n", FILE_APPEND);
}
catch (Exception $e)
{if ($pdo->getAttribute(PDO::ATTR_SERVER_INFO)==='MySQL server has gone away')
{file_put_contents('log.conn.txt', time().substr(microtime(),2,6).":{$i}:{$j} lost\n", FILE_APPEND);
$pdo=&new PDO('mysql:host=192.168.0.2;port=3306;dbname=baseinfo', 'dev', 'dev',
array(PDO::ATTR_ERRMODE=>PDO::ERRMODE_EXCEPTION,
PDO::MYSQL_ATTR_INIT_COMMAND=>'set names utf8'
)
);
}
}
usleep(50000);
}
exit();//avoid foreach loop in child process
}
}
//wait all child to end, avoid close db connection before all children self killed
foreach ($pids as $p)
{pcntl_waitpid($p, $status);
}
?>