Perl module instantiation + DBI + forks: "MySQL server has gone away"

I have written a Perl program that parses records from CSV files into a database.
The program worked fine but took a long time, so I decided to fork the main parsing process.
After a bit of wrangling with fork it now works well and runs about 4 times faster. The main parsing method is quite database-intensive. For interest's sake, for each record that is parsed there are the following DB calls:
1 - A check that the uniquely generated base62 ID is unique against a baseid map table
2 - An archive check to see if the record has changed
3 - The record is inserted into the DB
The problem is that I began to get "MySQL server has gone away" errors while the parser was being run in forked mode, so after much fiddling I came up with the following MySQL config:
#
# * Fine Tuning
#
key_buffer = 10000M
max_allowed_packet = 10000M
thread_stack = 192K
thread_cache_size = 8
myisam-recover = BACKUP
max_connections = 10000
table_cache = 64
thread_concurrency = 32
wait_timeout = 15
tmp_table_size = 1024M
query_cache_limit = 2M
#query_cache_size = 100M
query_cache_size = 0
query_cache_type = 0
That seems to have fixed the problems while the parser is running. However, I am now getting a "MySQL server has gone away" error when the next module is run after the main parser.
The strange thing is that the module causing problems involves a very simple SELECT query on a table with currently only 3 records. Run directly as a test (not after the parser), it works fine.
I tried adding a pause of 4 minutes after the parser module runs - but I get the same error.
I have a main DBConnection.pm module with this:
package DBConnection;
use DBI;
use PXConfig;

sub new {
    my $class = shift;

    ## MySQL connection
    my $config   = new PXConfig();
    my $host     = $config->val('database', 'host');
    my $database = $config->val('database', 'db');
    my $user     = $config->val('database', 'user');
    my $pw       = $config->val('database', 'password');

    my $dsn      = "DBI:mysql:database=$database;host=$host;";
    my $connect2 = DBI->connect( $dsn, $user, $pw );

    $connect2->{mysql_auto_reconnect} = 1;
    $connect2->{RaiseError}           = 1;
    $connect2->{PrintError}           = 1;
    $connect2->{ShowErrorStatement}   = 1;
    $connect2->{InactiveDestroy}      = 1;

    my $self = {
        connect => $connect2,
    };
    bless $self, $class;
    return $self;
}
Then all modules, including the forked parser modules, open a connection to the DB using:
package Example;
use DBConnection;

sub new {
    my $class = shift;

    my $db       = new DBConnection;
    my $connect2 = $db->connect();

    my $self = {
        connect2 => $connect2,
    };
    bless $self, $class;
    return $self;
}
The question is: if I have Module1.pm that calls Module2.pm, which calls Module3.pm, and each of them instantiates a connection to the DB as shown above (i.e. in the constructor), are they using different connections to the database or the same connection?
What I wondered is whether, if the script takes say 6 hours to finish, the top-level DB connection is timing out the lower-level DB connection, even though the lower-level module is making its 'own' connection.
It is very frustrating trying to track down the problem, as I can only reproduce the error after running a very long parse process.
Sorry for the long question; thanks in advance to anyone who can give me any ideas.
UPDATE 1:
Here is the actual forking part:
my $fh = Tie::Handle::CSV->new( "$file", header => 1 );

while ( my $part = <$fh> ) {
    if ( $children == $max_threads ) {
        $pid = wait();
        $children--;
    }
    if ( defined( $pid = fork ) ) {
        if ($pid) {
            $children++;
        }
        else {
            $cfptu = new ThreadedUnit();
            $cfptu->parseThreadedUnit( $part, $group_id, $feed_id );
        }
    }
}
And then the ThreadedUnit:
package ThreadedUnit;
use DBConnection;
use CollisionChecker;
use ArchiveController;
use Filters;
use Try::Tiny;
use MysqlLogger;

sub new {
    my $class = shift;

    my $db       = new DBConnection;
    my $connect2 = $db->connect();

    my $self = {
        connect2 => $connect2,
    };
    bless $self, $class;
    return $self;
}

sub parseThreadedUnit {
    my ( $self, $part, $group_id, $feed_id ) = @_;
    my $connect2 = $self->{connect2};

    ## Parsing stuff
    ## DB update in try -> catch

    exit();
}
So, as I understand it, the DB connection is being created after the forking.
But, as I mentioned above, the forked code outlined just above works fine. It is the next module that does not work. It is run from a controller module which just runs through each worker module one at a time (the parser being one of them) - the controller module does not create a DB connection in its constructor or anywhere else.
UPDATE 2:
I forgot to mention that I don't get any errors in the 'problem' module following the parser if I only parse a small number of files rather than the full queue.
So it is almost as if the intensive forked parsing and DB access makes the database unavailable to normal non-forked processes for some undetermined time after it ends.
The only thing I have noticed in MySQL's status when the parser run finishes is that Threads_connected sits at around, say, 500 and does not decrease for some time.

It depends on how your program is structured, which isn't clear from the question.
If you create the DB connection before you fork, Perl will make a copy of the DB connection object for each process. This would likely cause problems if two processes try to access the database concurrently with the same DB connection.
On the other hand, if you create the DB connections after forking, each module will have its own connection. This should work, but you could have a timeout problem if Module x creates a connection, then waits a long time for a process in Module y to finish, then tries to use the connection.
In summary, here is what you want:
Don't have any open connections at the point you fork; child processes should create their own connections (see the sketch below).
Only open a connection right before you want to use it. If there is a point in your program where you have to wait, open the connection after the waiting is done.
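A minimal sketch of that connect-after-fork pattern, reusing the DBConnection wrapper from the question; the @parts list and the INSERT statement are placeholders standing in for the real parsing work:
use strict;
use warnings;
use DBConnection;

my @parts = ();    # placeholder: the records to be parsed

# Parent: note that no database handle is opened before the fork.
for my $part (@parts) {
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ( $pid == 0 ) {
        # Child: opens its own connection, does its work, then exits.
        my $db  = DBConnection->new;
        my $dbh = $db->{connect};    # the handle stored by the constructor shown earlier
        $dbh->do( 'INSERT INTO records (payload) VALUES (?)', undef, $part );
        $dbh->disconnect;
        exit 0;
    }
}

# Parent reaps the children; it never touched a database handle itself.
1 while waitpid( -1, 0 ) > 0;
Each child pays the cost of its own connect, but nothing is shared across processes, so no child can invalidate another's handle.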

Read dan1111's answer, but I suspect you are connecting and then forking. When a child completes, its copy of the DBI connection handle goes out of scope and is closed, which also tears down the connection the parent is still holding, since both copies share the same socket. As dan1111 says, you are better off connecting in the child, for all the reasons he gives. Read about InactiveDestroy and AutoInactiveDestroy in DBI, which will help you understand what is going on.
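For illustration, a minimal sketch of those two attributes (the DSN and credentials below are placeholders): both stop a handle that was merely copied into a process by fork from closing the shared connection when that copy is destroyed; only an explicit disconnect does that.
use DBI;

# Placeholder connection parameters.
my ( $dsn, $user, $pw ) = ( 'DBI:mysql:database=mydb;host=localhost', 'user', 'password' );

# AutoInactiveDestroy (newer DBI releases): any process that did not create
# this handle, e.g. a forked child, will not close the underlying connection
# when its copy of the handle is implicitly destroyed.
my $dbh = DBI->connect(
    $dsn, $user, $pw,
    { RaiseError => 1, AutoInactiveDestroy => 1 },
);

# The manual equivalent, set inside a child on a handle it inherited but
# does not intend to use:
# $dbh->{InactiveDestroy} = 1;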

Related

multiprocessing.Pool to execute concurrent subprocesses to pipe data from mysqldump

I am using Python to pipe data from one MySQL database to another. Here is a lightly abstracted version of the code I have been using for several months, which has worked fairly well:
def copy_table(mytable):
    raw_mysqldump = "mysqldump -h source_host -u source_user --password='secret' --lock-tables=FALSE myschema mytable"
    raw_mysql = "mysql -h destination_host -u destination_user --password='secret' myschema"
    mysqldump = shlex.split(raw_mysqldump)
    mysql = shlex.split(raw_mysql)
    ps = subprocess.Popen(mysqldump, stdout=subprocess.PIPE)
    subprocess.check_output(mysql, stdin=ps.stdout)
    ps.stdout.close()
    retcode = ps.wait()
    if retcode == 0:
        return mytable, 1
    else:
        return mytable, 0
The size of the data has grown, and it currently takes about an hour to copy something like 30 tables. In order to speed things along, I would like to use multiprocessing. I am attempting to execute the following code on an Ubuntu server, which is a t2.micro (AWS EC2).
def copy_tables(tables):
    with multiprocessing.Pool(processes=4) as pool:
        params = [(arg, table) for table in sorted(tables)]
        results = pool.starmap(copy_table, params)
    failed_tables = [table for table, success in results if success == 0]
    all_tables_processed = False if failed_tables else True
    return all_tables_processed
The problem is: almost all of the tables will copy, but there are always a couple of child processes left over that will not complete - they just hang, and I can see from monitoring the database that no data is being transferred. It feels like they have somehow disconnected from the parent process, or that data is not being returned properly.
This is my first question, and I've tried to be both specific and concise - thanks in advance for any help, and please let me know if I can provide more information.
I think the following code
ps = subprocess.Popen(mysqldump, stdout=subprocess.PIPE)
subprocess.check_output(mysql, stdin=ps.stdout)
ps.stdout.close()
retcode = ps.wait()
should be
ps = subprocess.Popen(mysqldump, stdout=subprocess.PIPE)
sps = subprocess.Popen(mysql, stdin=ps.stdout)
retcode = ps.wait()
ps.stdout.close()
sps.wait()
You should not close the pipe until the mysqldump process has finished. Also, check_output is blocking: it will hang until its stdin reaches end-of-file.

How to store MQTT Mosquitto publish events into MySQL? [duplicate]

I've connected a device that communicates with my Mosquitto MQTT server (RPi) and is sending out publications to a specified topic. What I want to do now is to store the messages published on that topic into a MySQL database. I know how MySQL works, but I don't know how to listen for these incoming publications. I'm looking for a lightweight solution that runs in the background. Any pointers or ideas on libraries to use are very welcome.
I've done something similar in the last few days:
- live-collecting weather station data with pywws
- publishing it with pywws.service.mqtt to an MQTT broker
- a Python script on a NAS collecting the data and writing it to MariaDB
#!/usr/bin/python -u
import mysql.connector as mariadb
import paho.mqtt.client as mqtt
import ssl

mariadb_connection = mariadb.connect(user='USER', password='PW', database='MYDB')
cursor = mariadb_connection.cursor()

# MQTT Settings
MQTT_Broker = "192.XXX.XXX.XXX"
MQTT_Port = 8883
Keep_Alive_Interval = 60
MQTT_Topic = "/weather/pywws/#"

# Subscribe
def on_connect(client, userdata, flags, rc):
    mqttc.subscribe(MQTT_Topic, 0)

def on_message(mosq, obj, msg):
    # Prepare Data, separate columns and values
    msg_clear = msg.payload.translate(None, '{}""').split(", ")
    msg_dict = {}
    for i in range(0, len(msg_clear)):
        msg_dict[msg_clear[i].split(": ")[0]] = msg_clear[i].split(": ")[1]
    # Prepare dynamic sql-statement
    placeholders = ', '.join(['%s'] * len(msg_dict))
    columns = ', '.join(msg_dict.keys())
    sql = "INSERT INTO pws ( %s ) VALUES ( %s )" % (columns, placeholders)
    # Save Data into DB Table
    try:
        cursor.execute(sql, msg_dict.values())
    except mariadb.Error as error:
        print("Error: {}".format(error))
    mariadb_connection.commit()

def on_subscribe(mosq, obj, mid, granted_qos):
    pass

mqttc = mqtt.Client()

# Assign event callbacks
mqttc.on_message = on_message
mqttc.on_connect = on_connect
mqttc.on_subscribe = on_subscribe

# Connect
mqttc.tls_set(ca_certs="ca.crt", tls_version=ssl.PROTOCOL_TLSv1_2)
mqttc.connect(MQTT_Broker, int(MQTT_Port), int(Keep_Alive_Interval))

# Continue the network loop & close db-connection
mqttc.loop_forever()
mariadb_connection.close()
If you are familiar with Python, the Paho MQTT library is simple, light on resources, and interfaces well with Mosquitto. To use it, simply subscribe to the topic and set up a callback that passes the payload to MySQL using peewee, as shown in this answer. Run the script in the background and call it good!

Parallel mysql I/O in Ruby

Good day to you. I'm writing a cron job that will hopefully split a huge MySQL table across several threads and do some work on each chunk. This is the minimal sample of what I have at the moment:
require 'mysql'
require 'parallel'

@db = Mysql.real_connect("localhost", "root", "", "database")
@threads = 10

Parallel.map(1..@threads, :in_processes => 8) do |i|
  begin
    @db.query("SELECT url FROM pages LIMIT 1 OFFSET #{i}")
  rescue Mysql::Error => e
    @db.reconnect()
    puts "Error code: #{e.errno}"
    puts "Error message: #{e.error}"
    puts "Error SQLSTATE: #{e.sqlstate}" if e.respond_to?("sqlstate")
  end
end

@db.close
The threads don't need to return anything; they get their share of the job and do it. Only they don't. Either the connection to MySQL is lost during the query, or the connection doesn't exist ("MySQL server has gone away"?!), or "no _dump_data is defined for class Mysql::Result" comes up, followed by Parallel::DeadWorker.
How to do that right?
The map method expects a result; I don't need one, so I switched to each:
Parallel.each(1..@threads, :in_processes => 8) do |i|
This also solves the problem with MySQL: I just needed to open the connection inside the parallel process, which is possible when using the each loop. Of course, the connection should be closed inside the process as well.

Dashing: Ruby: CentOS: Not closing MySQL processes

I am having trouble with my server.
It is a CentOS Red Hat Linux server and runs "Dashing", a Ruby/Sinatra-based dashboard.
I am trying to close the active connections, as shown by SHOW PROCESSLIST; on my MySQL database.
Example.rb File
require 'mysql2'

SCHEDULER.every '10s' do
  db = Mysql.new('host_name', 'database_name', 'password', 'table')
  mysql1 = "SELECT `VAR` from `TABLE` ORDER BY `VAR` DESC LIMIT 1"
  result1 = db.query(mysql1)
  result1.each do |row|
    strrow1 = row[0]
    $num1 = strrow1.to_i
  end
  ...
  db.close
  LINK[0] = { label: 'LABEL', value: $num1 }
  ...
  send_event('LABEL FOR HTML', { items: LINK.values })
end
However, after a few clicks back and forth, it is clear that the database does not drop the connections, but instead keeps them. This causes the browser to slow down to the point that loading a page becomes impossible and the output of the log reads:
"max_user_connections" reached
Can anyone think of a way to fix this?
It is best practice for DB/file/handle work to be in a begin/rescue/ensure block. It could be that something is going wrong and Rufus/Dashing is just being quiet about the error, since they trap exceptions and go on their merry way. That would prevent your DB connection from closing. The symptoms you are having could be from a similar problem; either way it's a good idea.
SCHEDULER.every '10s' do
  begin
    db = Mysql.new('host_name', 'database_name', 'password', 'table')
    # .... stuff ....
  rescue
    # what happens if an error happens? log it, toss it, ignore it?
  ensure
    db.close
  end
  # ... more stuff if you want ...
end

Perl Connection Pooling

Right now we have a large Perl application that uses raw DBI to connect to MySQL and execute SQL statements. It creates a connection each time and then terminates. We're starting to approach MySQL's connection limit (200 at once).
It looks like DBIx::Connection supports application-layer connection pooling.
Has anybody had any experience with DBIx::Connection? Are there any other considerations for connection pooling?
I also see mod_dbd, an Apache module that looks like it handles connection pooling.
http://httpd.apache.org/docs/2.1/mod/mod_dbd.html
I don't have any experience with DBIx::Connection, but I use DBIx::Connector (essentially what DBIx::Class uses internally, but inlined) and it's wonderful...
I pool these connections with a Moose object wrapper that hands back existing object instances if the connection parameters are identical (this would work the same for any underlying DB object):
package MyApp::Factory::DatabaseConnection;

use strict;
use warnings;
use Moose;

# table of database name -> connection objects
has connection_pool => (
    is      => 'ro', isa => 'HashRef[DBIx::Connector]',
    traits  => ['Hash'],
    handles => {
        has_pooled_connection  => 'exists',
        get_pooled_connection  => 'get',
        save_pooled_connection => 'set',
    },
    default => sub { {} },
);

sub get_connection
{
    my ($self, %options) = @_;

    # some application-specific parsing of %options here...

    my $obj;
    if ($options{reuse})
    {
        # extract the last-allocated connection for this database and pass it
        # back, if there is one.
        $obj = $self->get_pooled_connection($options{database});
    }

    if (not $obj or not $obj->connected)
    {
        # look up connection info based on requested database name
        my ($dsn, $username, $password) = $self->get_connection_info($options{database});

        $obj = DBIx::Connector->new($dsn, $username, $password);
        return unless $obj;

        # Save this connection for later reuse, possibly replacing an earlier
        # saved connection (this latest one has the highest chance of being in
        # the same pid as a subsequent request).
        $self->save_pooled_connection($options{database}, $obj) unless $options{nosave};
    }

    return $obj;
}
Just making sure: you know about DBI->connect_cached(), right? It's a drop-in replacement for connect() that reuses database handles, where possible, over the life of your Perl script. Maybe your problem is solvable by adding 7 characters :)
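For illustration, a minimal sketch (the DSN and credentials here are placeholders): connect_cached() takes the same arguments as connect(), but returns an already-open handle if one was created earlier in the same process with identical parameters and is still alive.
use DBI;

sub dbh {
    # Placeholder DSN and credentials.
    return DBI->connect_cached(
        'DBI:mysql:database=mydb;host=localhost',
        'user', 'password',
        { RaiseError => 1 },
    );
}

# Call dbh() wherever a handle is needed; repeated calls within one process
# reuse the cached connection instead of opening a new one each time.
my $rows = dbh()->selectall_arrayref('SELECT 1');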
And, MySQL's connections are relatively cheap. Running with your DB at max_connections=1000 or more won't by itself cause problems. (If your clients are demanding more work than your DB can handle, that's a more serious problem, one which a lower max_connections might put off but of course not solve.)