If I have 10 queries, each updating a different table (i.e., 10 different tables), can I open one MySQL connection and spawn 10 threads, with each thread handling one query, so that they run concurrently instead of one by one?
Thanks!
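No, you can't.
The MySQL client library (at least the native C one) is not thread-safe when the same connection is used from different threads. You need one connection per thread.
If you just need update/insert queries to run in parallel (asynchronously in terms of the MySQL API), you can use INSERT DELAYED and UPDATE LOW_PRIORITY queries. Note that INSERT DELAYED has been deprecated since MySQL 5.6 and the keyword is ignored by later servers, and that LOW_PRIORITY only affects storage engines with table-level locking such as MyISAM.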
Because of the way the MySQL protocol works, these will get sent to the server one-by-one if you only have one connection open.
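To illustrate the "one connection per thread" advice from the answers above, here is a minimal sketch in Python. It uses the standard-library sqlite3 module purely as a stand-in for a MySQL driver so that it runs anywhere; the table names (t0 to t9) and the helper names make_db, update_table and run_all are made up for this example.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def make_db(path, n_tables):
    """Create n_tables one-row tables (hypothetical schema for the demo)."""
    conn = sqlite3.connect(path)
    with conn:  # commits on success
        for i in range(n_tables):
            conn.execute(f"CREATE TABLE t{i} (v INTEGER)")
            conn.execute(f"INSERT INTO t{i} VALUES (0)")
    conn.close()


def update_table(path, i):
    """Run one update on table t{i} over a connection owned by this thread."""
    # Each worker opens its own connection; no handle is shared across threads.
    conn = sqlite3.connect(path, timeout=30)
    try:
        with conn:
            conn.execute(f"UPDATE t{i} SET v = v + 1")
    finally:
        conn.close()
    return i


def run_all(path, n_tables=10):
    """Dispatch one update per table, each on its own thread and connection."""
    with ThreadPoolExecutor(max_workers=n_tables) as pool:
        return sorted(pool.map(lambda i: update_table(path, i), range(n_tables)))
```

With a real MySQL driver the shape would be the same: the connect call moves inside the worker function, and only plain values are passed between threads.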
In my test below, 20 forked child processes share a single PDO connection. No "log.conn.txt" file was created, so I saw no conflict between concurrent queries on the single MySQL client connection:
<?php
declare(ticks=1);
pcntl_signal(SIGUSR1, create_function('$signo', 'sleep(1);while (($pid=pcntl_wait($status, WNOHANG))>0) {}'));//protect against zombie children
$pdo=new PDO('mysql:host=192.168.0.2;port=3306;dbname=baseinfo', 'dev', 'dev',
array(PDO::ATTR_ERRMODE=>PDO::ERRMODE_EXCEPTION,
PDO::MYSQL_ATTR_INIT_COMMAND=>'set names utf8'
)
);
for ($i=0; $i<20; ++$i)
{if (($pid=pcntl_fork())===-1)
{//...
continue;
}
else if ($pid)
{$pids[]=$pid;
pcntl_wait($status, WNOHANG); //protect against zombie children, one wait vs one child
}
else if ($pid===0)
{ob_start();//prevent output to main process
register_shutdown_function(create_function('$pars', 'ob_end_clean();posix_kill(posix_getppid(), SIGUSR1);posix_kill(getmypid(), SIGKILL);'), array());//to kill self before exit();, or else the resource shared with parent will be closed
for ($j=0; $j<200; ++$j)
{try
{file_put_contents('log.'.$i.'.txt', $pdo->query('select partner_login from base_account where id=100')->fetch(PDO::FETCH_COLUMN, 0)."\t".time().substr(microtime(),2,6)."\n", FILE_APPEND);
}
catch (Exception $e)
{if ($pdo->getAttribute(PDO::ATTR_SERVER_INFO)==='MySQL server has gone away')
{file_put_contents('log.conn.txt', time().substr(microtime(),2,6).":{$i}:{$j} lost\n", FILE_APPEND);
$pdo=new PDO('mysql:host=192.168.0.2;port=3306;dbname=baseinfo', 'dev', 'dev',
array(PDO::ATTR_ERRMODE=>PDO::ERRMODE_EXCEPTION,
PDO::MYSQL_ATTR_INIT_COMMAND=>'set names utf8'
)
);
}
}
usleep(50000);
}
exit();//avoid foreach loop in child process
}
}
//wait all child to end, avoid close db connection before all children self killed
foreach ($pids as $p)
{pcntl_waitpid($p, $status);
}
?>
I have an async action on my controller that can perform a heavy SQL query depending on user input.
@results = ActiveRecord::Base
  .connection
  .select_all(query_string)
  .map do |record|
    Hashie::Mash.new(record)
  end
When it happens, the only response I get from the server is
E, [2020-02-05T16:14:04.133233 #59909] ERROR -- : worker=0 PID:60952 timeout (61s > 60s), killing
E, [2020-02-05T16:14:04.159372 #59909] ERROR -- : reaped #<Process::Status: pid 60952 SIGKILL (signal 9)> worker=0
Is there any way I can capture this timeout on the backend, to give the user the correct feedback?
I tried using Timeout::timeout(x), but with no success.
You could add another, shorter timeout yourself and handle the situation before the worker gets killed. Something like this might be a good start:
require 'timeout'
begin
  # 5 seconds before the 60-second worker timeout would kick in
  Timeout.timeout(55) do
    # have only the slow query in this block
    @database_results = ActiveRecord::Base.connection.select_all(query_string)
  end
rescue Timeout::Error
  # Handle the timeout. Proper error handling depends on your application.
  # In a controller you might just want to return an error page, in a
  # background worker you might want to record the error in your database.
  # Return or render here so that the mapping below is not reached with no data.
end
# time to translate the data should not count towards the timeout
@results = @database_results.map { |r| Hashie::Mash.new(r) }
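The same "give up a little before the hard limit" idea can be sketched in Python. This is an analogue of the Ruby snippet above, not a translation of it: run_with_deadline is a hypothetical helper, and the slow query is represented by any zero-argument callable. Note that abandoning the wait does not cancel work that is already running in the worker thread.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def run_with_deadline(fn, seconds):
    """Run fn in a worker thread and stop waiting after `seconds`.

    Returns ("ok", value) if fn finished in time, otherwise ("timeout", None).
    The worker thread itself is not interrupted on timeout.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return ("ok", future.result(timeout=seconds))
    except FutureTimeout:
        return ("timeout", None)
    finally:
        # Do not block the caller on a query that is still running.
        pool.shutdown(wait=False)
```

The caller can then branch on the first element of the result to show the user a "query took too long" message instead of letting the request be killed.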
When I use MyBatis to insert 100,000 records into a MySQL table:
1. it takes about 14 s when I run the application (Spring Boot + MyBatis) on Windows (my PC, 16 GB RAM + i7),
2. but it takes 1244 s when I run the same application on CentOS 7 (production environment, 4-core + 8 GB ECS server).
Both connect to the same MySQL server (which also runs on CentOS 7).
The network connection is better on the CentOS 7 (production) machine.
CPU performance is almost the same (I have tested this).
The application is simple and takes only 1 GB of memory when running.
Library versions in my application:
openjdk version "1.8.0_212"
Spring Boot 2.1.6
spring-boot-starter-tomcat-2.1.6.RELEASE.jar
spring-jdbc-5.0.7.RELEASE.jar
druid-1.1.19.jar
mybatis-3.5.2.jar
mybatis-spring-2.0.2.jar
mybatis-spring-boot-starter-2.1.0.jar
mybatis-spring-boot-autoconfigure-2.1.0.jar
mysql-connector-java-5.1.38.jar
Does anyone know the reason?
Thanks in advance.
==============================
Insert By Foreach (max_allowed_packet has been set to 200M):
<insert id="insertBatch" parameterType="java.util.List" useGeneratedKeys="false">
insert into table_product
(id,
code,
status,
type,
create_time,
update_time)
values
<foreach collection="products" item="product" index="index" separator=",">
(#{product.id},
#{product.code},
#{product.status},
#{product.type},
#{product.createTime},
#{product.updateTime})
</foreach>
</insert>
==================================
Insert By ExecutorType.BATCH:
public void batchInsert(List<Product> products){
SqlSession session = sqlSessionTemplate.getSqlSessionFactory().openSession(ExecutorType.BATCH, false);
BatchTableDao batchTableDao = session.getMapper(BatchTableDao.class);
try {
int i=0;
for (Product product : products) {
batchTableDao.insert(product);
if (i % 1000 == 0 || i == products.size()-1) {
session.flushStatements();
session.clearCache();
}
i++;
}
session.commit();
} catch (Exception e) {
Log.warn("error : "+e.getMessage());
} finally{
session.close();
}
}
===================================
On Windows 10, it takes about 14 s to insert 100,000 rows using 'foreach'.
It takes about 2500 s to insert 100,000 rows using 'ExecutorType.BATCH', which is unacceptably slow.
First of all, how about changing to the library versions specified in the official documentation? Beyond that, I think we need to see more of the code.
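For reference, the chunked batching idea behind the ExecutorType.BATCH snippet in the question (send rows in groups of 1000, commit once at the end) can be sketched as follows. This is Python with the standard-library sqlite3 module standing in for MySQL, so it says nothing about why the Java version is slow on one machine; the helper names chunked and insert_in_batches are made up, and only the id and code columns of table_product are used.

```python
import sqlite3


def chunked(rows, size):
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def insert_in_batches(conn, rows, batch_size=1000):
    """Insert rows batch by batch inside one transaction.

    Returns the number of batches that were sent.
    """
    batches = 0
    with conn:  # single commit at the end, rollback if anything fails
        for batch in chunked(rows, batch_size):
            conn.executemany(
                "INSERT INTO table_product (id, code) VALUES (?, ?)", batch
            )
            batches += 1
    return batches
```

Slicing by index avoids the edge cases of a counter-based condition such as `i % 1000 == 0`, which also fires for the very first row.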
Good day to you. I'm writing a cron job that will hopefully split a huge MySQL table across several threads and do some work on each part. This is a minimal sample of what I have at the moment:
require 'mysql'
require 'parallel'
@db = Mysql.real_connect("localhost", "root", "", "database")
@threads = 10
Parallel.map(1..@threads, :in_processes => 8) do |i|
  begin
    @db.query("SELECT url FROM pages LIMIT 1 OFFSET #{i}")
  rescue Mysql::Error => e
    @db.reconnect()
    puts "Error code: #{e.errno}"
    puts "Error message: #{e.error}"
    puts "Error SQLSTATE: #{e.sqlstate}" if e.respond_to?("sqlstate")
  end
end
@db.close
The threads don't need to return anything, they get their job share and they do it. Only they don't. Either connection to MySQL is lost during the query, or connection doesn't exist (MySQL server has gone away?!), or no _dump_data is defined for class Mysql::Result and then Parallel::DeadWorker.
How to do that right?
The map method expects a result; I don't need one, so I switched to each:
Parallel.each(1..@threads, :in_processes => 8) do |i|
This also solves the problem with MySQL: I just needed to open the connection inside the parallel process, which is possible when using the each loop. Of course, the connection should also be closed inside the process.
I have written a perl program that parses records from csv into a db.
The program worked fine but took a long time. So I decided to fork the main parsing process.
After a bit of wrangling with fork, it now works well and runs about 4 times faster. The main parsing method is quite database intensive. For interest's sake, for each record that is parsed there are the following DB calls:
1 - there is a check that the uniquely generated base62 is unique against a baseid map table
2 - There is an archive check to see if the record has changed
3 - The record is inserted into the db
The problem is that I began to get "Mysql has gone away" errors while the parser was being run in forked mode, so after much fiddling I came up with the following mysql config:
#
# * Fine Tuning
#
key_buffer = 10000M
max_allowed_packet = 10000M
thread_stack = 192K
thread_cache_size = 8
myisam-recover = BACKUP
max_connections = 10000
table_cache = 64
thread_concurrency = 32
wait_timeout = 15
tmp_table_size = 1024M
query_cache_limit = 2M
#query_cache_size = 100M
query_cache_size = 0
query_cache_type = 0
That seems to have fixed the problems while the parser is running. However, I am now getting a "Mysql server has gone away" error when the next module is run after the main parser.
The strange thing is that the module causing problems involves a very simple SELECT query on a table that currently has only 3 records. Run directly as a test (not after the parser), it works fine.
I tried adding a pause of 4 minutes after the parser module runs - but I get the same error.
I have a main DBConnection.pm model with this:
package DBConnection;
use DBI;
use PXConfig;
sub new {
my $class = shift;
## MYSQL Connection
my $config = new PXConfig();
my $host = $config->val('database', 'host');
my $database = $config->val('database', 'db');
my $user = $config->val('database', 'user');
my $pw = $config->val('database', 'password');
my $dsn = "DBI:mysql:database=$database;host=$host;";
my $connect2 = DBI->connect( $dsn, $user, $pw, );
$connect2->{mysql_auto_reconnect} = 1;
$connect2->{RaiseError} = 1;
$connect2->{PrintError} = 1;
$connect2->{ShowErrorStatement} = 1;
$connect2->{InactiveDestroy} = 1;
my $self = {
connect => $connect2,
};
bless $self, $class;
return $self;
}
Then all modules, including the forked parser modules, open a connection to the DB using:
package Example;
use DBConnection;
sub new {
my $class = shift;
my $db = new DBConnection;
my $connect2 = $db->connect();
my $self = {
connect2 => $connect2,
};
bless $self, $class;
return $self;
}
The question is: if I have Module1.pm that calls Module2.pm that calls Module3.pm, and each of them instantiates a connection to the DB as shown above (i.e., in the constructor), are they using different connections to the database or the same connection?
What I wondered is whether, if the script takes say 6 hours to finish, a timeout on the top-level DB connection could also time out the lower-level DB connection, even though the lower-level module makes its 'own' connection.
It is very frustrating trying to find the problem as I can only reproduce the error after running a very long parse process.
Sorry for the long question, thanks in advance to anyone who can give me any ideas.
UPDATE 1:
Here is the actual forking part:
my $fh = Tie::Handle::CSV->new( "$file", header => 1 );
while ( my $part = <$fh> ) {
if ( $children == $max_threads ) {
$pid = wait();
$children--;
}
if ( defined( $pid = fork ) ) {
if ($pid) {
$children++;
} else {
$cfptu = new ThreadedUnit();
$cfptu->parseThreadedUnit($part, $group_id, $feed_id);
}
}
}
And then the ThreadedUnit:
package ThreadedUnit;
use CollisionChecker;
use ArchiveController;
use Filters;
use Try::Tiny;
use MysqlLogger;
sub new {
my $class = shift;
my $db = new DBConnection;
my $connect2 = $db->connect();
my $self = {
connect2 => $connect2,
};
bless $self, $class;
return $self;
}
sub parseThreadedUnit {
my ( $self, $part, $group_id, $feed_id ) = @_;
my $connect2 = $self->{connect2};
## Parsing stuff
## DB Update in try -> catch
exit();
}
So as I understand the DB connection is being called after the forking.
But, as I mentioned above, the forked code outlined just above works fine. It is the next module that does not work. It is run from a controller module which just runs through each worker module one at a time (the parser being one of them). The controller module does not create a DB connection in its constructor or anywhere else.
Update 2
I forgot to mention that I don't get any errors in the 'problem' module following the parser if I only parse a small number of files and not the full queue.
So it is almost as if the intensive forked parsing and DB access makes the database unavailable to normal non-forked processes for some undetermined time after it ends.
The only thing I have noticed in the MySQL status when the parser run finishes is that Threads_connected sits at around, say, 500 and does not decrease for some time.
It depends on how your program is structured, which isn't clear from the question.
If you create the DB connection before you fork, each process gets its own copy of the DB connection object, but all the copies share the same underlying socket to the server. This would likely cause problems if two processes try to access the database concurrently with the same DB connection.
On the other hand, if you create the DB connections after forking, each module will have its own connection. This should work, but you could have a timeout problem if Module x creates a connection, then waits a long time for a process in Module y to finish, then tries to use the connection.
In summary, here is what you want:
Don't have any open connections at the point you fork. Child processes should create their own connections.
Only open a connection right before you want to use it. If there is a point in your program when you have to wait, open the connection after the waiting is done.
Read dan1111's answer, but I suspect you are connecting and then forking. When the child completes, the DBI connection handle goes out of scope and is closed. As dan1111 says, you are better off connecting in the child, for all the reasons he gave. Read about InactiveDestroy and AutoInactiveDestroy in the DBI documentation, which will help you understand what is going on.
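The "never use a handle that was opened in another process" rule from the answers above can be sketched as a small wrapper that remembers which process opened the connection. This is a Python illustration, not DBI code: ForkSafeConnection and its connect argument are hypothetical names, and sqlite3 is only used in the usage note so the sketch is runnable.

```python
import os
import sqlite3


class ForkSafeConnection:
    """Hand out a connection that was opened by the current process."""

    def __init__(self, connect):
        self._connect = connect  # zero-argument connection factory (assumed)
        self._conn = None
        self._pid = None
        self._inherited = []  # handles copied from a parent, never used again

    def get(self):
        if self._conn is None or self._pid != os.getpid():
            if self._conn is not None:
                # We are in a forked child holding the parent's handle.
                # Keep a reference so it is not closed from this process,
                # which is roughly the intent of DBI's InactiveDestroy.
                self._inherited.append(self._conn)
            self._conn = self._connect()
            self._pid = os.getpid()
        return self._conn
```

A caller would build it once, for example `db = ForkSafeConnection(lambda: sqlite3.connect("app.db"))`, and call `db.get()` right before each use, so a child created by fork opens its own connection on first use.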
Right now we have a large Perl application that uses raw DBI to connect to MySQL and execute SQL statements. It creates a connection each time and then terminates. We're starting to approach MySQL's connection limit (200 at once).
It looks like DBIx::Connection supports application layer connection pooling.
Has anybody had any experience with DBIx::Connection? Are there any other considerations for connection pooling?
I also see mod_dbd which is an Apache mod that looks like it handles connection pooling.
http://httpd.apache.org/docs/2.1/mod/mod_dbd.html
I don't have any experience with DBIx::Connection, but I use DBIx::Connector (essentially what DBIx::Class uses internally, but inlined) and it's wonderful...
I pool these connections with a Moose object wrapper that hands back existing object instances if the connection parameters are identical (this would work the same for any underlying DB object):
package MyApp::Factory::DatabaseConnection;
use strict;
use warnings;
use Moose;
# table of database name -> connection objects
has connection_pool => (
is => 'ro', isa => 'HashRef[DBIx::Connector]',
traits => ['Hash'],
handles => {
has_pooled_connection => 'exists',
get_pooled_connection => 'get',
save_pooled_connection => 'set',
},
default => sub { {} },
);
sub get_connection
{
my ($self, %options) = @_;
# some application-specific parsing of %options here...
my $obj;
if ($options{reuse})
{
# extract the last-allocated connection for this database and pass it
# back, if there is one.
$obj = $self->get_pooled_connection($options{database});
}
if (not $obj or not $obj->connected)
{
# look up connection info based on requested database name
my ($dsn, $username, $password) = $self->get_connection_info($options{database});
$obj = DBIx::Connector->new($dsn, $username, $password);
return unless $obj;
# Save this connection for later reuse, possibly replacing an earlier
# saved connection (this latest one has the highest chance of being in
# the same pid as a subsequent request).
$self->save_pooled_connection($options{database}, $obj) unless $options{nosave};
}
return $obj;
}
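The reuse logic above (return the saved connection for a database if it is still usable, otherwise open and save a new one) can be sketched in Python as follows. This is only an illustration of the idea, not of DBIx::Connector or connect_cached themselves: ConnectionCache and its connect argument are hypothetical, and sqlite3 stands in for the real driver.

```python
import sqlite3


class ConnectionCache:
    """Reuse one connection per database name while it is still alive."""

    def __init__(self, connect):
        self._connect = connect  # factory taking a database name (assumed)
        self._pool = {}

    def get(self, database, reuse=True):
        conn = self._pool.get(database) if reuse else None
        if conn is None or not self._is_alive(conn):
            conn = self._connect(database)
            # Save for later reuse, replacing any earlier saved connection.
            self._pool[database] = conn
        return conn

    @staticmethod
    def _is_alive(conn):
        """Cheap liveness check: run a trivial query."""
        try:
            conn.execute("SELECT 1")
            return True
        except sqlite3.Error:
            return False
```

A cache like this only helps within one long-lived process; scripts that start, connect once and exit still open one connection per run.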
Just making sure: you know about DBI->connect_cached(), right? It's a drop-in replacement for connect() that reuses dbh's, where possible, over the life of your perl script. Maybe your problem is solvable by adding 7 characters :)
And, MySQL's connections are relatively cheap. Running with your DB at max_connections=1000 or more won't by itself cause problems. (If your clients are demanding more work than your DB can handle, that's a more serious problem, one which a lower max_connections might put off but of course not solve.)