We have a Puppet server running that services a couple of hundred Windows servers. The installed Puppet agent is 6.x. On almost all of the servers 'puppet agent -t' works fine, with a few exceptions exhibiting the same issue.
When I start clean, the Puppet agent connects with the server, receives a certificate and downloads all of the facts and what not. This works. Then the agent loads the facts and after a while I get an error message:
C:\>puppet agent -t
Info: Using configured environment 'windows'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Retrieving locales
Info: Loading facts
Error: Failed to apply catalog: Could not render to json: source sequence is illegal/malformed utf-8
C:\>
If I run the Puppet agent in debug mode, although I could have missed it because there's a lot of output, all it shows is that it's resolving facts, and then the above message appears and the agent run stops. The last fact (according to debug output) that is being resolved is consistently:
Debug: Facter: resolving processor facts.
Debug: Facter: fact "hardwareisa" has resolved to "x64".
Debug: Facter: fact "processorcount" has resolved to 2.
Debug: Facter: fact "physicalprocessorcount" has resolved to 1.
Debug: Facter: fact "processor0" has resolved to "Intel(R) Xeon(R) CPU E5-2643 v2 # 3.50GHz".
Debug: Facter: fact "processors" has resolved to {
count => 2,
isa => "x64",
models => [
"Intel(R) Xeon(R) CPU E5-2643 v2 # 3.50GHz"
],
physicalcount => 1
}.
Error: Failed to apply catalog: Could not render to json: source sequence is illegal/malformed utf-8
However, I'm in doubt if that is the culprit because IIRC Puppet does not really run things sequential.
I don't understand how the same thing can work on one server, but not on another, even when having the same agent version. How can I find out what is the source of the error message?
I know this is an ancient topic but the resolution in my case was confirming that each of the custom facts was encoded in UTF-8. We discovered that a single fact file was encoded differently and re-encoding it as UTF-8 fixed our issues.
Installed Hyperledger sawtooth with this guide:
https://sawtooth.hyperledger.org/docs/core/releases/latest/sysadmin_guide/installation.html
[2018-11-04 02:35:13.204 DEBUG selector_events] Using selector: ZMQSelector
[2018-11-04 02:35:13.205 INFO interconnect] Listening on tcp://127.0.0.1:4004
[2018-11-04 02:35:13.205 DEBUG dispatch] Added send_message function for connection ServerThread
[2018-11-04 02:35:13.206 DEBUG dispatch] Added send_last_message function for connection ServerThread
[2018-11-04 02:35:13.206 DEBUG genesis] genesis_batch_file: /var/lib/sawtooth/genesis.batch
[2018-11-04 02:35:13.206 DEBUG genesis] block_chain_id: not yet specified
[2018-11-04 02:35:13.207 INFO genesis] Producing genesis block from /var/lib/sawtooth/genesis.batch
[2018-11-04 02:35:13.207 DEBUG genesis] Adding 1 batches
[2018-11-04 02:35:13.208 DEBUG executor] no transaction processors registered for processor type sawtooth_settings: 1.0
[2018-11-04 02:35:13.209 INFO executor] Waiting for transaction processor (sawtooth_settings, 1.0)
[2018-11-04 02:35:13.311 INFO processor_handlers] registered transaction processor: connection_id=014a2086c9ffe773b104d8a0122b9d5f867a1b2d44236acf4ab097483dbe49c2ad33d3302acde6f985d911067fe92207aa8adc1c9dbc596d826606fe1ef1d4ef, family=intkey, version=1.0, namespaces=['1cf126']
[2018-11-04 02:35:18.110 INFO processor_handlers] registered transaction processor: connection_id=e615fc881f8e7b6dd05b1e3a8673d125a3e759106247832441bd900abae8a3244e1507b943258f62c458ded9af0c5150da420c7f51f20e62330497ecf9092060, family=xo, version=1.0, namespaces=['5b7349']
[2018-11-04 02:35:21.908 DEBUG permission_verifier] Chain head is not set yet. Permit all.
[2018-11-04 02:35:21.908 DEBUG permission_verifier] Chain head is not set yet. Permit all.
Than:
ubuntu#ip-172-31-42-144:~$ sudo intkey-tp-python -vv
[2018-11-04 02:42:05.710 INFO core] register attempt: OK
Than:
ubuntu#ip-172-31-42-144:~$ intkey create_batch
Writing to batches.intkey...
ubuntu#ip-172-31-42-144:~$ intkey load
batches: 2 batch/sec: 160.14600713999351
REST-API works, too.
I did exactly all steps as shown in the guide. The older one doesn't help me, too. hyperledger sawtooth validator node permissioning issue
ubuntu#ip-172-31-42-144:~$ curl http://localhost:8008/blocks
{
"error": {
"code": 15,
"message": "The validator has no genesis block, and is not yet ready to be queried. Try your request again later.",
"title": "Validator Not Ready"
}
}
genesis was attached ?!
MARiE
As the log shows, the genesis batch is waiting on the sawtooth-setting TP. If you start that up, just like you start up intkey and xo, it will process the genesis batch and will then be able to handle your intkey transactions.
I have a Google compute engine instance(Cent-Os) which I could access using its external IP address till recently.
Now suddenly the instance cannot be accessed using its using its external IP address.
I logged in to the developer console and tried rebooting the instance but that did not help.
I also noticed that the CPU usage is almost at 100% continuously.
On further analysis of the Serial port output it appears the init module is not loading properly.
I am pasting below the last few lines from the serial port output of the virtual machine.
rtc_cmos 00:01: RTC can wake from S4
rtc_cmos 00:01: rtc core: registered rtc_cmos as rtc0
rtc0: alarms up to one day, 114 bytes nvram
cpuidle: using governor ladder
cpuidle: using governor menu
EFI Variables Facility v0.08 2004-May-17
usbcore: registered new interface driver hiddev
usbcore: registered new interface driver usbhid
usbhid: v2.6:USB HID core driver
GRE over IPv4 demultiplexor driver
TCP cubic registered
Initializing XFRM netlink socket
NET: Registered protocol family 17
registered taskstats version 1
rtc_cmos 00:01: setting system clock to 2014-07-04 07:40:53 UTC (1404459653)
Initalizing network drop monitor service
Freeing unused kernel memory: 1280k freed
Write protecting the kernel read-only data: 10240k
Freeing unused kernel memory: 800k freed
Freeing unused kernel memory: 1584k freed
Failed to execute /init
Kernel panic - not syncing: No init found. Try passing init= option to kernel.
Pid: 1, comm: swapper Not tainted 2.6.32-431.17.1.el6.x86_64 #1
Call Trace:
[] ? panic+0xa7/0x16f
[] ? init_post+0xa8/0x100
[] ? kernel_init+0x2e6/0x2f7
[] ? child_rip+0xa/0x20
[] ? kernel_init+0x0/0x2f7
[] ? child_rip+0x0/0x20
Thanks in advance for any tips to resolve this issue.
Mathew
It looks like you might have an script or other program that is causing you to run out of Inodes.
You can delete the instance without deleting the persistent disk (PD) and create a new vm with a higher capacity using your PD, however if it's an script causing this, you will end up with the same issue. It's always recommended to backup your PD before making any changes.
Run this command to find more info about your instance:
gcutil --project= getserialportoutput
If the issue still continue, you can either
- Make a snapshot of your PD and make a PD's copy or
- Delete the instance without deleting the PD
Attach and mount the PD to another vm as a second disk, so you can access it to find what is causing this issue. Visit this link https://developers.google.com/compute/docs/disks#attach_disk for more information on how to do this.
Visit this page http://www.ivankuznetsov.com/2010/02/no-space-left-on-device-running-out-of-inodes.html for more information about inodes troubleshooting.
Make sure the Allow HTTP traffic setting on the vm is still enabled.
Then see which network firewall you are using and it's rules.
If your network is set up to use an ephemral IP, it will be periodically released back. This will cause your IP to change over time. Set it to static/reserved then (on networks page).
https://developers.google.com/compute/docs/instances-and-network#externaladdresses
I have setup gunicorn with 3 workers, 30 worker connections and using eventlet worker class. It is set up behind Nginx. After every few requests, I see this in the logs.
[ERROR] gunicorn.error: WORKER TIMEOUT (pid:23475)
None
[INFO] gunicorn.error: Booting worker with pid: 23514
Why is this happening? How can I figure out what's going wrong?
We had the same problem using Django+nginx+gunicorn. From Gunicorn documentation we have configured the graceful-timeout that made almost no difference.
After some testings, we found the solution, the parameter to configure is: timeout (And not graceful timeout). It works like a clock..
So, Do:
1) open the gunicorn configuration file
2) set the TIMEOUT to what ever you need - the value is in seconds
NUM_WORKERS=3
TIMEOUT=120
exec gunicorn ${DJANGO_WSGI_MODULE}:application \
--name $NAME \
--workers $NUM_WORKERS \
--timeout $TIMEOUT \
--log-level=debug \
--bind=127.0.0.1:9000 \
--pid=$PIDFILE
On Google Cloud
Just add --timeout 90 to entrypoint in app.yaml
entrypoint: gunicorn -b :$PORT main:app --timeout 90
Run Gunicorn with --log-level debug.
It should give you an app stack trace.
Is this endpoint taking too many time?
Maybe you are using flask without asynchronous support, so every request will block the call. To create async support without make difficult, add the gevent worker.
With gevent, a new call will spawn a new thread, and you app will be able to receive more requests
pip install gevent
gunicon .... --worker-class gevent
The Microsoft Azure official documentation for running Flask Apps on Azure App Services (Linux App) states the use of timeout as 600
gunicorn --bind=0.0.0.0 --timeout 600 application:app
https://learn.microsoft.com/en-us/azure/app-service/configure-language-python#flask-app
WORKER TIMEOUT means your application cannot response to the request in a defined amount of time. You can set this using gunicorn timeout settings. Some application need more time to response than another.
Another thing that may affect this is choosing the worker type
The default synchronous workers assume that your application is resource-bound in terms of CPU and network bandwidth. Generally this means that your application shouldn’t do anything that takes an undefined amount of time. An example of something that takes an undefined amount of time is a request to the internet. At some point the external network will fail in such a way that clients will pile up on your servers. So, in this sense, any web application which makes outgoing requests to APIs will benefit from an asynchronous worker.
When I got the same problem as yours (I was trying to deploy my application using Docker Swarm), I've tried to increase the timeout and using another type of worker class. But all failed.
And then I suddenly realised I was limitting my resource too low for the service inside my compose file. This is the thing slowed down the application in my case
deploy:
replicas: 5
resources:
limits:
cpus: "0.1"
memory: 50M
restart_policy:
condition: on-failure
So I suggest you to check what thing slowing down your application in the first place
Could it be this?
http://docs.gunicorn.org/en/latest/settings.html#timeout
Other possibilities could be your response is taking too long or is stuck waiting.
This worked for me:
gunicorn app:app -b :8080 --timeout 120 --workers=3 --threads=3 --worker-connections=1000
If you have eventlet add:
--worker-class=eventlet
If you have gevent add:
--worker-class=gevent
I've got the same problem in Docker.
In Docker I keep trained LightGBM model + Flask serving requests. As HTTP server I used gunicorn 19.9.0. When I run my code locally on my Mac laptop everything worked just perfect, but when I ran the app in Docker my POST JSON requests were freezing for some time, then gunicorn worker had been failing with [CRITICAL] WORKER TIMEOUT exception.
I tried tons of different approaches, but the only one solved my issue was adding worker_class=gthread.
Here is my complete config:
import multiprocessing
workers = multiprocessing.cpu_count() * 2 + 1
accesslog = "-" # STDOUT
access_log_format = '%(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(q)s" "%(D)s"'
bind = "0.0.0.0:5000"
keepalive = 120
timeout = 120
worker_class = "gthread"
threads = 3
I had very similar problem, I also tried using "runserver" to see if I could find anything but all I had was a message Killed
So I thought it could be resource problem, and I went ahead to give more RAM to the instance, and it worked.
You need to used an other worker type class an async one like gevent or tornado see this for more explanation :
First explantion :
You may also want to install Eventlet or Gevent if you expect that your application code may need to pause for extended periods of time during request processing
Second one :
The default synchronous workers assume that your application is resource bound in terms of CPU and network bandwidth. Generally this means that your application shouldn’t do anything that takes an undefined amount of time. For instance, a request to the internet meets this criteria. At some point the external network will fail in such a way that clients will pile up on your servers.
If you are using GCP then you have to set workers per instance type.
Link to GCP best practices https://cloud.google.com/appengine/docs/standard/python3/runtime
timeout is a key parameter to this problem.
however it's not suit for me.
i found there is not gunicorn timeout error when i set workers=1.
when i look though my code, i found some socket connect (socket.send & socket.recv) in server init.
socket.recv will block my code and that's why it always timeout when workers>1
hope to give some ideas to the people who have some problem with me
For me, the solution was to add --timeout 90 to my entrypoint, but it wasn't working because I had TWO entrypoints defined, one in app.yaml, and another in my Dockerfile. I deleted the unused entrypoint and added --timeout 90 in the other.
For me, it was because I forgot to setup firewall rule on database server for my Django.
Frank's answer pointed me in the right direction. I have a Digital Ocean droplet accessing a managed Digital Ocean Postgresql database. All I needed to do was add my droplet to the database's "Trusted Sources".
(click on database in DO console, then click on settings. Edit Trusted Sources and select droplet name (click in editable area and it will be suggested to you)).
Check that your workers are not killed by a health check. A long request may block the health check request, and the worker gets killed by your platform because the platform thinks that the worker is unresponsive.
E.g. if you have a 25-second-long request, and a liveness check is configured to hit a different endpoint in the same service every 10 seconds, time out in 1 second, and retry 3 times, this gives 10+1*3 ~ 13 seconds, and you can see that it would trigger some times but not always.
The solution, if this is your case, is to reconfigure your liveness check (or whatever health check mechanism your platform uses) so it can wait until your typical request finishes. Or allow for more threads - something that makes sure that the health check is not blocked for long enough to trigger worker kill.
You can see that adding more workers may help with (or hide) the problem.
The easiest way that worked for me is to create a new config.py file in the same folder where your app.py exists and to put inside it the timeout and all your desired special configuration:
timeout = 999
Then just run the server while pointing to this configuration file
gunicorn -c config.py --bind 0.0.0.0:5000 wsgi:app
note that for this statement to work you need wsgi.py also in the same directory having the following
from myproject import app
if __name__ == "__main__":
app.run()
Cheers!
Apart from the gunicorn timeout settings which are already suggested, since you are using nginx in front, you can check if these 2 parameters works, proxy_connect_timeout and proxy_read_timeout which are by default 60 seconds. Can set them like this in your nginx configuration file as,
proxy_connect_timeout 120s;
proxy_read_timeout 120s;
In my case I came across this issue when sending larger(10MB) files to my server. My development server(app.run()) received them no problem but gunicorn could not handle them.
for people who come to the same problem I did. My solution was to send it in chunks like this:
ref / html example, separate large files ref
def upload_to_server():
upload_file_path = location
def read_in_chunks(file_object, chunk_size=524288):
"""Lazy function (generator) to read a file piece by piece.
Default chunk size: 1k."""
while True:
data = file_object.read(chunk_size)
if not data:
break
yield data
with open(upload_file_path, 'rb') as f:
for piece in read_in_chunks(f):
r = requests.post(
url + '/api/set-doc/stream' + '/' + server_file_name,
files={name: piece},
headers={'key': key, 'allow_all': 'true'})
my flask server:
#app.route('/api/set-doc/stream/<name>', methods=['GET', 'POST'])
def api_set_file_streamed(name):
folder = escape(name) # secure_filename(escape(name))
if 'key' in request.headers:
if request.headers['key'] != key:
return 404
else:
return 404
for fn in request.files:
file = request.files[fn]
if fn == '':
print('no file name')
flash('No selected file')
return 'fail'
if file and allowed_file(file.filename):
file_dir_path = os.path.join(app.config['UPLOAD_FOLDER'], folder)
if not os.path.exists(file_dir_path):
os.makedirs(file_dir_path)
file_path = os.path.join(file_dir_path, secure_filename(file.filename))
with open(file_path, 'ab') as f:
f.write(file.read())
return 'sucess'
return 404
in case you have changed the name of the django project you should also go to
cd /etc/systemd/system/
then
sudo nano gunicorn.service
then verify that at the end of the bind line the application name has been changed to the new application name
I'm trying to start the google-chrome browser in disabled web security mode. The selenium log says:
15:36:33.526 INFO - Command request: getNewBrowserSession[*googlechrome, http://www.myurl.de, , commandLineFlags=--disable-web-security] on session null
Anyways, it just hangs after
15:36:33.600 INFO - Launching Google Chrome...
Here's the stack trace:
16:36:44.605 ERROR - Failed to start new browser session, shutdown browser and clear all session data org.openqa.selenium.server.RemoteCommandException: timed out waiting for window 'null' to appear at org.openqa.selenium.server.FrameGroupCommandQueueSet.waitForLoad(FrameGroupCommandQueueSet.java:564) at org.openqa.selenium.server.FrameGroupCommandQueueSet.waitForLoad(FrameGroupCommandQueueSet.java:521) at org.openqa.selenium.server.BrowserSessionFactory.createNewRemoteSession(BrowserSessionFactory.java:374) at org.openqa.selenium.server.BrowserSessionFactory.getNewBrowserSession(BrowserSessionFactory.java:125) at org.openqa.selenium.server.BrowserSessionFactory.getNewBrowserSession(BrowserSessionFactory.java:87) at org.openqa.selenium.server.SeleniumDriverResourceHandler.getNewBrowserSession(SeleniumDriverResourceHandler.java:785) at org.openqa.selenium.server.SeleniumDriverResourceHandler.doCommand(SeleniumDriverResourceHandler.java:422) at org.openqa.selenium.server.SeleniumDriverResourceHandler.handleCommandRequest(SeleniumDriverResourceHandler.java:393) at org.openqa.selenium.server.SeleniumDriverResourceHandler.handle(SeleniumDriverResourceHandler.java:146) at org.openqa.jetty.http.HttpContext.handle(HttpContext.java:1530) at org.openqa.jetty.http.HttpContext.handle(HttpContext.java:1482) at org.openqa.jetty.http.HttpServer.service(HttpServer.java:909) at org.openqa.jetty.http.HttpConnection.service(HttpConnection.java:820) at org.openqa.jetty.http.HttpConnection.handleNext(HttpConnection.java:986) at org.openqa.jetty.http.HttpConnection.handle(HttpConnection.java:837) at org.openqa.jetty.http.SocketListener.handleConnection(SocketListener.java:243) at org.openqa.jetty.util.ThreadedServer.handle(ThreadedServer.java:357) at org.openqa.jetty.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
Selenium is started by robotframework by the robotframework-maven-plugin. Also xvfb is started by the maven build script to simulate a display. But the startup configuration does not seem to be the problem. Everything starts fine, just the browser won't get up.
I hope anyone can help me.
Make sure that the user account that is launching the browser has a home directory. Otherwise the browser profile creation will fail.