I am trying to establish an event subscription via zmq from my locally running sawtooth network. As soon as I start my event-subscriber container, I get the error "interrupted system call".
I am following the example from here https://github.com/danintel/sawtooth-cookiejar/tree/master/events/go
I have tried using validatorUrl as tcp://localhost:4004 tcp://validator-0:4004
note: validator-0 is my local container name for the validator
Also, have tried with the direct IP of the validator container tcp://<IP>:4004
zmqConnection.RecvMsgWithId() is throwing the error.
The error I am getting is exactly at this line https://github.com/danintel/sawtooth-cookiejar/blob/master/events/go/src/events_client.go#L105
Can someone please help for the probable reasons or the way I can debug this one?
I do not know, but one possible cause is this example was recently updated to a new version of Go, 1.11 (from 1.9) after your posting:
https://github.com/danintel/sawtooth-cookiejar/pull/9
Because of this error:
Loading input failed: unsupported version of go: exit status 2: flag provided but not defined: -compiled
The issue was related to inter-pod communication. So the issue was, my event subscriber client was in a completely different pod than the pod where the validator container was running. In that case, we need to used the FQDN of that pod. Refer to the link below.
https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-s-hostname-and-subdomain-fields
I'm using symfony 4, What does error stands for ?
Warning: SessionHandler::read(): open(/var/lib/php/sessions/sess_634q91mh896b6aa4jpjvlihmar, O_RDWR) failed: Permission denied (13)
When a user logs into a Symfony application, session information is stored on the web server. By default Symfony uses the native PHP session mechanism, storing session info in a file in /var/lib/php/sessions/ on Linux systems. Your error message is output by PHP and means it got a permissions error creating or re-opening a session file.
The error appears only intermittently because PHP removes old session files randomly about every 1/100th or 1/1000th page load. (On some Linux variants, old session files are removed by a cron job instead.)
https://symfony.com/doc/current/session.html says:
"some session expiration related options may not work as expected if other applications that write to the same directory have short max lifetime settings."
Try to avoid having multiple processes writing to the same sessions directory. I think I got the error message because both an Apache web server and php bin/console server:start were running at the same time. One process may have removed the other process's session file.
See PHP manual and Symfony manual for how to configure writing to separate directories. For example, I changed {Symfony directory}/config/packages/framework.yaml:
# Enables session support. Note that the session will ONLY be started if you read or write from it.
# Remove or comment this section to explicitly disable session support.
session:
# handler_id: ~
cookie_secure: auto
cookie_samesite: lax
handler_id: 'session.handler.native_file'
save_path: '%kernel.project_dir%/var/sessions/%kernel.environment%'
gc_probability: 100 # Run garbage collection always for
gc_divisor: 100 # investigating this problem only.
Another possibility is a problem in your Symfony code can cause the error message. The Symfony documentation says not to call the PHP session functions like session_start() directly since Symfony classes call them. A bug in my code caused an exception which I speculate caused the error message.
Related stack overflow questions: cleanup-php-session-files and how-does-php-know-when-to-delete-a-session
For those familiar with C code, see the PHP interpreter source code line that prints the error here
Hope this helps!
Not related to symfony 4, but you have to fix permissions in your /var/lib/php/sessions/ directory
I have a setup where i need a proxy in front on a server.
LightTpd 1.4.13 is already used on the embedded platform which should act as proxy.
Newer lighttpd's is not easily build due to an old toolchain.
One port (e.g. port 84) of the proxy platform should forward all traffic to port 80 on the server.
Some simple pages are forwarded just fine, but some other fail. The server has as "web_resp.exe", this is returned as a download option of 0 byte.
Wireshark dumping
Dumps with Wireshark show that the needed pages are send the proxy-platform, but 0 bytes are forwarded. (this was performed on a similar setup)
Question
Is my configuration wrong?
Is it impossible on lighttpd 1.4.13? (i have seen forum-post telling the mod_proxy of lighttpd has problem in general)
Reproducibility
I have reproduced the flaw by running Lighttpd on a new mintLinux (same error type)
I get the same error when forwarding to other ip/site (a web-config of a ethernet -> rs232-port unit).
Exactly what triggers the error is do not know, maybee just too large pages.
Configuration
#lighttpd configuration file
server.modules = (
"mod_proxy"
)
## a static document-root, for virtual-hosting take look at the
## server.virtual-* options
server.document-root = "/tmp/"
## where to send error-messages to
server.errorlog = "/tmp/lighttpd.error.log"
## bind to port (default: 80)
server.port = 84
#### proxy module
## read proxy.txt for more info
proxy.debug = 1
proxy.server = ( "" =>
(
( "host" => "10.0.0.175", "port" => 80)
)
)
Debug dumps
functional and non-functional request seem similar.
However the non-functional read larger size of data (it is still to considered small size <100 kB)
other tests
lighttpd 1.4.35 compiled for the target, but it seem to fail in same way.
lighttpd 1.4.35 neither work on the mintLinux.
1.4.35 + rewrite trick...
works worse than directly using a port
lighttpd 1.5 works out of the box (after installing gthread2) on a mintLinux. However will not work for the target hardware.
The issue have been found to be faulty http headers provided by the backend.
The issue was submitted to the Lighttpd-bug site https://redmine.lighttpd.net/issues/2594#change-8877
Lighttpd now have support for webpages only sending \LF as opposed to \CR\LF
You may argue that the bug is in the target web-page. However in by case i was unable to modify the target site.
I have setup gunicorn with 3 workers, 30 worker connections and using eventlet worker class. It is set up behind Nginx. After every few requests, I see this in the logs.
[ERROR] gunicorn.error: WORKER TIMEOUT (pid:23475)
None
[INFO] gunicorn.error: Booting worker with pid: 23514
Why is this happening? How can I figure out what's going wrong?
We had the same problem using Django+nginx+gunicorn. From Gunicorn documentation we have configured the graceful-timeout that made almost no difference.
After some testings, we found the solution, the parameter to configure is: timeout (And not graceful timeout). It works like a clock..
So, Do:
1) open the gunicorn configuration file
2) set the TIMEOUT to what ever you need - the value is in seconds
NUM_WORKERS=3
TIMEOUT=120
exec gunicorn ${DJANGO_WSGI_MODULE}:application \
--name $NAME \
--workers $NUM_WORKERS \
--timeout $TIMEOUT \
--log-level=debug \
--bind=127.0.0.1:9000 \
--pid=$PIDFILE
On Google Cloud
Just add --timeout 90 to entrypoint in app.yaml
entrypoint: gunicorn -b :$PORT main:app --timeout 90
Run Gunicorn with --log-level debug.
It should give you an app stack trace.
Is this endpoint taking too many time?
Maybe you are using flask without asynchronous support, so every request will block the call. To create async support without make difficult, add the gevent worker.
With gevent, a new call will spawn a new thread, and you app will be able to receive more requests
pip install gevent
gunicon .... --worker-class gevent
The Microsoft Azure official documentation for running Flask Apps on Azure App Services (Linux App) states the use of timeout as 600
gunicorn --bind=0.0.0.0 --timeout 600 application:app
https://learn.microsoft.com/en-us/azure/app-service/configure-language-python#flask-app
WORKER TIMEOUT means your application cannot response to the request in a defined amount of time. You can set this using gunicorn timeout settings. Some application need more time to response than another.
Another thing that may affect this is choosing the worker type
The default synchronous workers assume that your application is resource-bound in terms of CPU and network bandwidth. Generally this means that your application shouldn’t do anything that takes an undefined amount of time. An example of something that takes an undefined amount of time is a request to the internet. At some point the external network will fail in such a way that clients will pile up on your servers. So, in this sense, any web application which makes outgoing requests to APIs will benefit from an asynchronous worker.
When I got the same problem as yours (I was trying to deploy my application using Docker Swarm), I've tried to increase the timeout and using another type of worker class. But all failed.
And then I suddenly realised I was limitting my resource too low for the service inside my compose file. This is the thing slowed down the application in my case
deploy:
replicas: 5
resources:
limits:
cpus: "0.1"
memory: 50M
restart_policy:
condition: on-failure
So I suggest you to check what thing slowing down your application in the first place
Could it be this?
http://docs.gunicorn.org/en/latest/settings.html#timeout
Other possibilities could be your response is taking too long or is stuck waiting.
This worked for me:
gunicorn app:app -b :8080 --timeout 120 --workers=3 --threads=3 --worker-connections=1000
If you have eventlet add:
--worker-class=eventlet
If you have gevent add:
--worker-class=gevent
I've got the same problem in Docker.
In Docker I keep trained LightGBM model + Flask serving requests. As HTTP server I used gunicorn 19.9.0. When I run my code locally on my Mac laptop everything worked just perfect, but when I ran the app in Docker my POST JSON requests were freezing for some time, then gunicorn worker had been failing with [CRITICAL] WORKER TIMEOUT exception.
I tried tons of different approaches, but the only one solved my issue was adding worker_class=gthread.
Here is my complete config:
import multiprocessing
workers = multiprocessing.cpu_count() * 2 + 1
accesslog = "-" # STDOUT
access_log_format = '%(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(q)s" "%(D)s"'
bind = "0.0.0.0:5000"
keepalive = 120
timeout = 120
worker_class = "gthread"
threads = 3
I had very similar problem, I also tried using "runserver" to see if I could find anything but all I had was a message Killed
So I thought it could be resource problem, and I went ahead to give more RAM to the instance, and it worked.
You need to used an other worker type class an async one like gevent or tornado see this for more explanation :
First explantion :
You may also want to install Eventlet or Gevent if you expect that your application code may need to pause for extended periods of time during request processing
Second one :
The default synchronous workers assume that your application is resource bound in terms of CPU and network bandwidth. Generally this means that your application shouldn’t do anything that takes an undefined amount of time. For instance, a request to the internet meets this criteria. At some point the external network will fail in such a way that clients will pile up on your servers.
If you are using GCP then you have to set workers per instance type.
Link to GCP best practices https://cloud.google.com/appengine/docs/standard/python3/runtime
timeout is a key parameter to this problem.
however it's not suit for me.
i found there is not gunicorn timeout error when i set workers=1.
when i look though my code, i found some socket connect (socket.send & socket.recv) in server init.
socket.recv will block my code and that's why it always timeout when workers>1
hope to give some ideas to the people who have some problem with me
For me, the solution was to add --timeout 90 to my entrypoint, but it wasn't working because I had TWO entrypoints defined, one in app.yaml, and another in my Dockerfile. I deleted the unused entrypoint and added --timeout 90 in the other.
For me, it was because I forgot to setup firewall rule on database server for my Django.
Frank's answer pointed me in the right direction. I have a Digital Ocean droplet accessing a managed Digital Ocean Postgresql database. All I needed to do was add my droplet to the database's "Trusted Sources".
(click on database in DO console, then click on settings. Edit Trusted Sources and select droplet name (click in editable area and it will be suggested to you)).
Check that your workers are not killed by a health check. A long request may block the health check request, and the worker gets killed by your platform because the platform thinks that the worker is unresponsive.
E.g. if you have a 25-second-long request, and a liveness check is configured to hit a different endpoint in the same service every 10 seconds, time out in 1 second, and retry 3 times, this gives 10+1*3 ~ 13 seconds, and you can see that it would trigger some times but not always.
The solution, if this is your case, is to reconfigure your liveness check (or whatever health check mechanism your platform uses) so it can wait until your typical request finishes. Or allow for more threads - something that makes sure that the health check is not blocked for long enough to trigger worker kill.
You can see that adding more workers may help with (or hide) the problem.
The easiest way that worked for me is to create a new config.py file in the same folder where your app.py exists and to put inside it the timeout and all your desired special configuration:
timeout = 999
Then just run the server while pointing to this configuration file
gunicorn -c config.py --bind 0.0.0.0:5000 wsgi:app
note that for this statement to work you need wsgi.py also in the same directory having the following
from myproject import app
if __name__ == "__main__":
app.run()
Cheers!
Apart from the gunicorn timeout settings which are already suggested, since you are using nginx in front, you can check if these 2 parameters works, proxy_connect_timeout and proxy_read_timeout which are by default 60 seconds. Can set them like this in your nginx configuration file as,
proxy_connect_timeout 120s;
proxy_read_timeout 120s;
In my case I came across this issue when sending larger(10MB) files to my server. My development server(app.run()) received them no problem but gunicorn could not handle them.
for people who come to the same problem I did. My solution was to send it in chunks like this:
ref / html example, separate large files ref
def upload_to_server():
upload_file_path = location
def read_in_chunks(file_object, chunk_size=524288):
"""Lazy function (generator) to read a file piece by piece.
Default chunk size: 1k."""
while True:
data = file_object.read(chunk_size)
if not data:
break
yield data
with open(upload_file_path, 'rb') as f:
for piece in read_in_chunks(f):
r = requests.post(
url + '/api/set-doc/stream' + '/' + server_file_name,
files={name: piece},
headers={'key': key, 'allow_all': 'true'})
my flask server:
#app.route('/api/set-doc/stream/<name>', methods=['GET', 'POST'])
def api_set_file_streamed(name):
folder = escape(name) # secure_filename(escape(name))
if 'key' in request.headers:
if request.headers['key'] != key:
return 404
else:
return 404
for fn in request.files:
file = request.files[fn]
if fn == '':
print('no file name')
flash('No selected file')
return 'fail'
if file and allowed_file(file.filename):
file_dir_path = os.path.join(app.config['UPLOAD_FOLDER'], folder)
if not os.path.exists(file_dir_path):
os.makedirs(file_dir_path)
file_path = os.path.join(file_dir_path, secure_filename(file.filename))
with open(file_path, 'ab') as f:
f.write(file.read())
return 'sucess'
return 404
in case you have changed the name of the django project you should also go to
cd /etc/systemd/system/
then
sudo nano gunicorn.service
then verify that at the end of the bind line the application name has been changed to the new application name
I have a webapp that segfaults when the database in restarted and it tries to use the old connections. Running it under gdb --args apache -X leads to the following output:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1212868928 (LWP 16098)]
0xb7471c20 in mysql_send_query () from /usr/lib/libmysqlclient.so.15
I've checked that the drivers and database are all up to date (DBD::mysql 4.0008, MySQL 5.0.32-Debian_7etch6-log).
Annoyingly I can't reproduce this with a trivial script:
use DBI;
use Test::More tests => 2;
my $dbh = DBI->connect( "dbi:mysql:test", 'root' );
sub test_db {
my ($number) = $dbh->selectrow_array("select 1 ");
return $number;
}
is test_db, 1, "connected to db";
warn "restart db now";
getc;
is test_db, 1, "connected to db";
Which gives the following:
ok 1 - connected to db
restart db now at dbd-mysql-test.pl line 23.
DBD::mysql::db selectrow_array failed: MySQL server has gone away at dbd-mysql-test.pl line 17.
not ok 2 - connected to db
# Failed test 'connected to db'
# at dbd-mysql-test.pl line 26.
# got: undef
# expected: '1'
This behaves correctly, telling me why the request failed.
What stumps me is that it is segfaulting, which it shouldn't do. As it only appears to happen when the whole app is running (which uses DBIx::Class) it is hard to reduce it to a test case.
Where should I start to look to debug this? Has anyone else seen this?
UPDATE: further prodding showed that it being under mod_perl was a red herring. Having reduced it to a simple test script I've now posted to the DBI mailing list. Thanks for your answers.
What this probably means is that there's a difference between your mod_perl environment and the one you were testing via your script. Some things to check:
Was your mod_perl compiled with the same version of Perl
Are the #INC's the same for both
Are you using threads in your mod_perl setup? I don't believe DBD::mysql is completely thread-safe.
I've seen this problem, but I'm not sure it had the same cause as yours. Are you by chance using a certain module for sending mails (forgot the name, sorry) from your application? When we had the problem in a project, after days of debugging we found that this mail module was doing strange things with open file descriptors, then forked off another process which called the console tool sendmail, which again did strange things with file descriptors. I guess one of the file descriptors it messed around with was the connection to the database, but I'm still not sure about that. The problem disappeared when we switched to another module for sending mails. Maybe it's worth a look for you too.
If you're getting a segfault, do you have a core file greated? If not, check ulimit -c. If that returns 0, your system won't create core files and you'll have to change that. If you do have a core file, you can use gdb or similar tools to debug it. It's not particularly fun, but it's possible. The start of the command will look something like:
gbd /usr/bin/httpd core
There are plenty of tutorials for debugging core files scattered about the Web.
Update: Just found a reference for ensuring you get core dumps from mod_perl. That should help.
This is a known problem in old DBD::mysql. Upgrade it (4.008 is not up to date).
There's a simple test script attached to https://rt.cpan.org/Public/Bug/Display.html?id=37027
that will trigger this bug.