Will my server be able to handle only one client if it's a single-threaded process? If yes, why? - language-agnostic

I have googled enough to understand threads and processes. One thing I am still confused about is the single-threaded process.
The scenario is a server-client application where each client is treated as a single process on the server, and the server is single-threaded.
Some blogs say that a single-threaded program will not be able to handle multiple clients at a time, i.e., it can start one process and only after finishing it can another process be started. But other blogs say it is possible.
According to my understanding, on a single-core processor system programmed for multitasking, more than one process can be handled concurrently, depending on the time slice allocated to each process. On a multi-processor system, more than one client process can be handled in parallel.
Is it just web servers that do not handle more than one process at a time, because they are iterative servers? If it is any other, concurrent server, will it handle more than one process, without waiting for each process to complete before taking up the next?
As I am confused by the many different explanations in different blogs, I am hoping for a precise answer for the above scenario, in either a single-processor multitasking environment or a multi-processor environment. I would also like to know whether there are any benefits of using a single-threaded process over a process without any threads.
Sorry if my understanding is wrong. Kindly respond.
Thanks in advance.

If your server does this:
while(true)
    listen()
    accept()
    read()
    process()
    write()
    close()
And each of the calls to accept, read, process, write and close happens on that single thread, then the server cannot listen for new requests again until those calls complete. The requests might queue up or simply get dropped if the server is not listening for them. If accept/read/process/write/close take a long time compared to listen(), then your server will often not be available to service new requests. Moreover, if some requests take 100 seconds and some requests take 1 second, and a lot of that time is spent blocking on I/O to e.g. a database or a web service, then you're spending most of your time in the server doing nothing. Here's the alternative:
while(true)
    listen()
    start new thread()
        accept()
        read()
        process()
        write()
        close()
Now all you're doing synchronously is kicking off the new thread to handle the incoming request, and then immediately beginning to listen again. The OS will handle scheduling the threads and if some of them end up waiting a long time for some I/O to occur, your server can still process the fast requests in the meantime.
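As a rough illustration of the second loop (a minimal sketch, not production code), here is a thread-per-connection server using Python's standard socket and threading modules. The address, port and the trivial "process" step are placeholders; in practice listen() is called once and accept() stays in the main loop, while the per-request work moves into the new thread.

import socket
import threading

def handle_client(conn):
    # read() / process() / write() / close() all happen inside the worker thread
    with conn:
        data = conn.recv(4096)        # read()
        response = data.upper()       # process() -- placeholder work
        conn.sendall(response)        # write()
                                      # close() happens when the with-block exits

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("0.0.0.0", 9000))        # placeholder address/port
server.listen()                       # listen() really only needs to run once

while True:
    conn, _addr = server.accept()     # accept() the next client
    # hand the slow per-request work to a new thread and go straight back to accepting
    threading.Thread(target=handle_client, args=(conn,), daemon=True).start()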

Related

Long running transactions in Slick

I'm working on an akka-http/slick web service, and I need to do the following in a transaction:
Insert a row in a table
Call some external web service
Commit the transaction
The web service I need to call is sometimes really slow to respond (let's say ~2 seconds).
I'm worried that this might keep the SQL connection open for too long, and that'll exhaust Slick's connection pool and affect other independent requests.
Is this a possibility? Or does Slick do something to make sure this "idle" mid-transaction connection does not starve the pool?
If it is something I should be worried about - is there anything I can do to remedy this?
If it matters, I'm using MySQL with TokuDB.
The Slick documentation seems to say that this will be a problem.
The use of a transaction always implies a pinned session.
And
You can use withPinnedSession to force the use of a single session, keeping the existing session open even when waiting for non-database computations.
From: http://slick.lightbend.com/doc/3.2.0/dbio.html#transactions-and-pinned-sessions
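If strict atomicity across the web-service call isn't actually required, one common workaround is to keep each transaction short and make the slow call between two of them. Below is a language-neutral sketch of that shape in Python with sqlite3, not Slick code; the table and the stubbed call_external_service are hypothetical.

import sqlite3
import time

def call_external_service(row_id):
    # Hypothetical stand-in for the slow (~2 s) external web service.
    time.sleep(2)
    return "done"

db = sqlite3.connect("example.db")
db.execute("CREATE TABLE IF NOT EXISTS requests (id INTEGER PRIMARY KEY, status TEXT)")

# Short transaction 1: insert the row and commit, releasing the connection quickly.
with db:
    row_id = db.execute("INSERT INTO requests (status) VALUES ('pending')").lastrowid

# The slow call happens with no transaction open, so nothing is pinned meanwhile.
result = call_external_service(row_id)

# Short transaction 2: record the outcome (or clean up here if the call failed).
with db:
    db.execute("UPDATE requests SET status = ? WHERE id = ?", (result, row_id))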

Handling doctrine 2 connections in long running background scripts

I'm running PHP commandline scripts as rabbitmq consumers which need to connect to a MySQL database. Those scripts run as Symfony2 commands using Doctrine2 ORM, meaning opening and closing the database connection is handled behind the scenes.
The connection is normally closed automatically when the cli command exits - which is by definition not happening for a long time in a background consumer.
This is a problem when the consumer is idle (no incoming messages) for longer than the wait_timeout setting in the MySQL server configuration. If no message is consumed for longer than that period, the database server will close the connection and the next message will fail with a "MySQL server has gone away" exception.
I've thought about 2 solutions for the problem:
Open the connection before each message and close the connection manually after handling the message.
Implementing a ping message which runs a dummy SQL query like SELECT 1 FROM table every n minutes, and calling it using a cronjob.
The problem with the first approach is that if the traffic on that queue is high, there may be significant overhead for the consumer in opening/closing connections. The second approach just sounds like an ugly hack to deal with the issue, but at least I can use a single connection during high-load times.
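For illustration, a check-and-reconnect step (reconnecting only when the connection has actually gone away, which keeps the overhead of the first approach down) might look roughly like this. The sketch uses MySQL Connector/Python rather than Doctrine, and the table is a placeholder; the Doctrine 2 DBAL connection offers a comparable ping()/close()/connect() sequence.

import mysql.connector

DB_CONFIG = {"host": "localhost", "user": "app", "password": "secret", "database": "app"}  # placeholders

conn = mysql.connector.connect(**DB_CONFIG)

def handle_message(body):
    # Re-establish the connection only if wait_timeout (or anything else) killed it.
    conn.ping(reconnect=True, attempts=3, delay=1)
    cur = conn.cursor()
    cur.execute("INSERT INTO messages_log (body) VALUES (%s)", (body,))  # placeholder work
    conn.commit()
    cur.close()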
Are there any better solutions for handling doctrine connections in background scripts?
Here is another solution: try to avoid long-running Symfony 2 workers. They will always cause problems due to their long execution time; the kernel isn't made for that.
The solution here is to build a proxy in front of the real Symfony command, so every message triggers a fresh Symfony kernel. Sounds like a good solution to me.
http://blog.vandenbrand.org/2015/01/09/symfony2-and-rabbitmq-lessons-learned/
My approach is a little bit different. My workers only process one message, then die. I have supervisor configured to create a new worker every time. So, a worker will:
Ask for a new message.
If there are no messages, sleep for 20 seconds; otherwise, supervisor will think there's something wrong and stop creating the worker.
If there is a message, process it.
Maybe, if processing a message is super fast, sleep as well, for the same reason as in step 2.
After processing the message, just finish.
This has worked very well using AWS SQS.
Comments are welcomed.
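A minimal sketch of that one-message-then-exit worker, assuming AWS SQS via boto3 (the queue URL and the process() step are placeholders; supervisor is what restarts the script after it exits):

import sys
import boto3

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"  # placeholder

def process(body):
    print("processing", body)  # placeholder for the real work

sqs = boto3.client("sqs")
resp = sqs.receive_message(
    QueueUrl=QUEUE_URL,
    MaxNumberOfMessages=1,
    WaitTimeSeconds=20,   # long-poll ~20 s so supervisor doesn't see a crash loop
)

messages = resp.get("Messages", [])
if messages:
    msg = messages[0]
    process(msg["Body"])
    sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])

sys.exit(0)  # finish after (at most) one message; supervisor starts the next worker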
This is a big problem when running PHP scripts for too long. For me, the best solution is to restart the script from time to time. You can see how to do this in this topic: How to restart PHP script every 1 hour?
You should also run multiple instances of your consumer. Add a counter to each one and terminate it after a certain number of runs. Then you need a tool to ensure a consistent number of worker processes, something like this: http://kamisama.me/2012/10/12/background-jobs-with-php-and-resque-part-4-managing-worker/

Producer Consumer setup: How to handle Database Connections?

I'm building my first single-producer/single-consumer app in which the consumer takes items off the queue and stores them in a MySQL database.
Previously, when it was a single thread app, I would open a connection to the DB, send the query, close the connection, and repeat every time new info came in.
With a producer-consumer setup, what is the better way to handle the DB connection? Should I open it once before starting the consumer loop (I can't see a problem with this, but I'm sure that one of you fine folks will point it out if there is one)? Or should I open and close the DB connection on each iteration of the loop (seems like a waste of time and resources)?
This software runs on approximately 30 small linux computers and all of them talk to the same database. I don't see 30 simultaneous connections being an issue, but I'd love to hear your thoughts.
Apologies if this has been covered, I couldn't find it anywhere. If it has, a link would be fantastic. Thanks!
EDIT FOR CLARITY
My main focus here is the speed of the consumer thread. The whole reason for switching from single- to multi-threaded was because the single-threaded version was missing incoming information because it was busy trying to connect to the database. Given that the producer thread is expected to start dumping info into the buffer at quite a high rate, and given that the buffer will be limited in size, it is very important that the consumer work through the buffer as quickly as possible while remaining stable.
Your MySQL shouldn't have any problems handling connections in the hundreds, if not thousands.
On each of your consumers you should set up a connection pool and use that from your consumer. If you consume the messages in a single thread (per application), the pool only needs to hold one connection, but it's also fine to start parallel consuming threads that each use one connection.
The reason for using a connection pool is that it will handle reconnection and keep-alive for you. Just ask it for a connection and it will promise that the connection works (it does this by running a small query against the database). If you don't use a connection for a while and it gets terminated, the pool will just create a new one.
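As a sketch of that pooled setup, assuming SQLAlchemy is available on the consumer side (the connection URL and table are placeholders), the "promise that it works" check is the pre-ping option:

from sqlalchemy import create_engine, text

engine = create_engine(
    "mysql+pymysql://user:password@dbhost/mydb",   # placeholder URL
    pool_size=1,          # a single consumer thread only needs one pooled connection
    pool_recycle=3600,    # proactively replace connections after an hour
    pool_pre_ping=True,   # run a tiny query first and transparently reconnect if the link died
)

def store(item):
    # Borrow a connection from the pool for just this item, then give it back.
    with engine.begin() as conn:
        conn.execute(
            text("INSERT INTO readings (source, value) VALUES (:source, :value)"),
            {"source": item["source"], "value": item["value"]},
        )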

To fork or not to fork?

I am re-developing a system that will send messages via HTTP to one of a number of suppliers. The original is a set of Perl scripts, and it's likely that the re-development will also use Perl.
In the old system, there were a number of Perl scripts all running at the same time, five for each supplier. When a message was put into the database, a random thread number (1-5) and the supplier were chosen to ensure that no message was processed twice, while avoiding having to lock the table/row. Additionally, there was a "Fair Queue Position" field in the database to ensure that a large message send didn't delay small sends that happened while the large one was being sent.
At some times there would be just a couple of messages per minute, but at other times there would be a dump of potentially hundreds of thousands of messages. It seems to me like a waste of resources to have all the scripts running and checking for messages all of the time, so I am trying to work out if there is a better way to do it, or if the old way is acceptable.
My thoughts right now lie with the idea of having one script that runs and forks as many child processes as are needed (up to a limit), depending on how much traffic there is, but I am not sure how best to implement it so that each message is processed only once while the fair queuing is maintained.
My best guess right now is that the parent script updates the DB to indicate which child process should handle each message; however, I am concerned that this will end up being less efficient than the original method. I have little experience of writing forking code (the last time I did it was about 15 years ago).
Any thoughts or links to guides on how best to process message queues appreciated!
You could use Thread::Queue or any other module from this question: Is there a multiprocessing module for Perl?
If the old system was written in Perl this way, you could reuse most of it.
Non-working example:
use strict;
use warnings;
use threads;
use Thread::Queue;

my $q = Thread::Queue->new();    # A new empty queue

# Start 10 detached worker threads that pull items off the queue
for (1 .. 10) {
    threads->create(sub {
        while (my $item = $q->dequeue()) {
            # Do work on $item
        }
    })->detach();
}

my $dbh = ...;                   # database handle left as a placeholder
while (1) {
    # Get new items from the DB and hand them to the worker threads
    my @items = get_items_from_db($dbh);
    $q->enqueue(@items);
    print "Pending items: " . $q->pending() . "\n";
    sleep 15;                    # check the DB every 15 seconds
}
I would suggest using a message queue server like RabbitMQ.
One process feeds work into the queue, and you can have multiple worker processes consume the queue.
Advantages of this approach:
workers block when waiting for work (no busy waiting)
more worker processes can be started up manually if needed
worker processes don't have to be a child of a special parent process
RabbitMQ will distribute the work among all workers which are ready to accept work
RabbitMQ will put work back into the queue if the worker doesn't return an ACK
you don't have to assign work in the database
every "agent" (worker, producer, etc.) is an independent process which means you can kill it or restart it without affecting other processes
To dynamically scale the number of workers up or down, you can implement something like:
have workers automatically die if they don't get work for a specified amount of time
have another process monitor the length of the queue and spawn more workers if the queue is getting too big
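A minimal consumer sketch of that setup, assuming RabbitMQ with the Python pika client (the queue name and the processing step are placeholders); the explicit ACK is what lets RabbitMQ put work back in the queue if a worker dies:

import pika

def send_to_supplier(body):
    print("sending", body)            # placeholder for the real work

def handle(ch, method, properties, body):
    send_to_supplier(body)
    ch.basic_ack(delivery_tag=method.delivery_tag)   # no ACK -> RabbitMQ requeues it

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="messages", durable=True)
channel.basic_qos(prefetch_count=1)   # hand each worker one message at a time
channel.basic_consume(queue="messages", on_message_callback=handle)
channel.start_consuming()             # blocks while waiting for work (no busy waiting)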
I would recommend using beanstalkd for a dedicated job server, and Beanstalk::Client in your perl scripts for adding jobs to the queue and removing them.
You should find beanstalkd easier to install and set up compared to RabbitMQ. It will also take care of distributing jobs among available workers, burying any failed jobs so they can be retried later, scheduling jobs to be done at a later date, and many more basic features. For your worker, you don't have to worry about forking or threading; just start up as many workers as you need, on as many servers as you have available.
Either RabbitMQ or beanstalkd would be better than rolling your own DB-backed solution. These projects have already worked out many of the details needed for queueing, and they implement features you may not yet realize you want. They should also handle polling for new jobs more efficiently than sleeping and selecting from your database to see if there's more work to do.

How can I use Gearman for File Processing Without Killing the DB?

I'm currently designing a system for processing uploaded files.
The files are uploaded through a LAMP web frontend and must be processed through several stages, some of which are sequential and others which may run in parallel.
A few key points:
The clients uploading the files only care about safely delivering the files, not the results of the processing, so it can be completely asynchronous.
The files are max 50kb in size
The system must scale up to processing over a million files a day
It is critical that no files may be lost or go unprocessed
My assumption is MySQL, but I have no issue with NoSQL if this could offer an advantage.
My initial idea was to have the front end put the files straight into a MySQL DB and then have a number of worker processes poll the database setting flags as they completed each step. After some rough calculations I realised that this wouldn't scale as the workers polling would start to cause locking problems on the upload table.
After some research it looks like Gearman might be the solution to the problem. The workers can register with the Gearman server and can poll for jobs without crippling the DB.
What I am currently puzzling over is how to dispatch jobs in the most efficient manner. There are three ways I can see to do this:
Write a single dispatcher to poll the database and then send jobs to Gearman
Have the upload process fire off an asynchronous Gearman job when it receives a file
Use the Gearman MySQL UDF extension to make the DB fire off jobs when files are inserted
The first approach will still hammer the DB somewhat but it could trivially recover from a failure.
The other two approaches would seem to require enabling Gearman queue persistence to recover from faults, but I am concerned that if I enable this I will lose the raw speed that attracts me to Gearman and shift the DB bottleneck downstream.
Any advice on which of these approaches would be the most efficient (or even better real world examples) would be much appreciated.
Also feel free to pitch in if you think I'm going about the whole thing the wrong way.
This has been open for a little while now so I thought I would provide some information on the approach that I took.
Every time a file is uploaded I create a Gearman job for a "dispatch" worker, which understands the sequence of processing steps required for each file. The dispatcher queues Gearman jobs for each of the processing steps.
Any job that completes writes a completion timestamp back to the DB and calls the dispatcher, which can then queue any follow-on tasks.
Writing a timestamp for each job completion means the system can recover its queues if processing is missed or fails, without the burden of persistent queues.
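A rough sketch of that dispatch pattern, assuming the python-gearman client library (the task names, the file id, and the "next step" lookup are placeholders; the real step lookup and timestamps live in the DB):

import gearman

GEARMAN_SERVERS = ["localhost:4730"]   # placeholder

# Front end / upload handler: queue a background "dispatch" job for each new file.
client = gearman.GearmanClient(GEARMAN_SERVERS)
client.submit_job("dispatch", "file:12345", background=True)   # payload = a file id (placeholder)

# Dispatch worker (normally its own process): work out the file's next step and queue it.
def dispatch(worker, job):
    file_id = job.data
    next_step = "resize"   # placeholder: really read the file's progress from the DB
    client.submit_job(next_step, file_id, background=True)
    return "queued"        # each completed step also writes a timestamp back to the DB

worker = gearman.GearmanWorker(GEARMAN_SERVERS)
worker.register_task("dispatch", dispatch)
worker.work()              # blocks, waiting for dispatch jobs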
I would save the files to disk, then send the filename to Gearman. As each part of the process completes, it generates another message for the next part of the process; you could move the file into a new work-in-process directory for the next stage to work on.