Running a server with 140,000 page views a day (per analytics).
php-fpm processes use about 10-12 MB each.
The server has 10 GB of RAM; MySQL uses 1.2-1.6 GB.
Configuration looks like this:
nginx
user nginx;
worker_processes 4;
error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;
events {
worker_connections 1024;
}
http {
include /etc/nginx/mime.types;
default_type application/octet-stream;
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
#access_log /var/log/nginx/access.log main;
access_log off;
sendfile on;
#tcp_nopush on;
keepalive_timeout 10;
client_max_body_size 20M;
server_tokens off;
include /etc/nginx/conf.d/*.conf;
}
php-fpm like this:
listen = 127.0.0.1:9000
listen.allowed_clients = 127.0.0.1
user = webadmin
group = webadmin
pm = dynamic
pm.max_children = 900
pm.start_servers = 900
pm.min_spare_servers = 200
pm.max_spare_servers = 900
pm.max_requests = 500
chdir = /
Typically the server runs just fine with 500 simultaneous users (again, estimated from real-time Google Analytics), but it sometimes stalls when there are far fewer users (75-100 simultaneous).
The configuration was done by my ISP, whom I trust, but I would still like to know whether it makes sense.
I am not saying this is the best setup, but it works for us.
A few things I updated in our nginx setup:
The worker_connections: I believe a browser opens two connections per request, so you don't technically have 1024 available connections per request, you have 512 - so maybe change it to 2048 (see the sketch below).
I also changed the error log level from "warn" to "info", since you have to think about write times to keep the I/O low.
If you want to keep the access log, maybe slim down the log entries it writes.
It might be worth looking at your master nginx.conf as well; you might have settings being overwritten by that file and set back to default.
Just two little things I did from a big list I went through; this article is great, though - link
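For reference, a minimal sketch of the two tweaks described above (the values are illustrative, not taken from the original config):
# error log level changed from "warn" to "info", as described above
error_log /var/log/nginx/error.log info;
events {
    # browsers open two connections per request, so double the default 1024
    worker_connections 2048;
}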
I have a setup where I need a proxy in front of a server.
lighttpd 1.4.13 is already used on the embedded platform that should act as the proxy.
Newer lighttpd versions are not easily built due to an old toolchain.
One port (e.g. port 84) of the proxy platform should forward all traffic to port 80 on the server.
Some simple pages are forwarded just fine, but some others fail. The server has a "web_resp.exe"; this is returned as a 0-byte download.
Wireshark dumps
Dumps with Wireshark show that the requested pages are sent to the proxy platform, but 0 bytes are forwarded. (This was performed on a similar setup.)
Question
Is my configuration wrong?
Is it impossible on lighttpd 1.4.13? (I have seen forum posts saying that lighttpd's mod_proxy has problems in general.)
Reproducibility
I have reproduced the flaw by running lighttpd on a fresh mintLinux (same error type).
I get the same error when forwarding to another IP/site (the web configuration of an Ethernet-to-RS232 unit).
Exactly what triggers the error I do not know; maybe just pages that are too large.
Configuration
#lighttpd configuration file
server.modules = (
"mod_proxy"
)
## a static document-root, for virtual-hosting take look at the
## server.virtual-* options
server.document-root = "/tmp/"
## where to send error-messages to
server.errorlog = "/tmp/lighttpd.error.log"
## bind to port (default: 80)
server.port = 84
#### proxy module
## read proxy.txt for more info
proxy.debug = 1
proxy.server = ( "" =>
(
( "host" => "10.0.0.175", "port" => 80)
)
)
Debug dumps
Functional and non-functional requests seem similar.
However, the non-functional ones read a larger amount of data (still small, < 100 kB).
Other tests
lighttpd 1.4.35 compiled for the target, but it seems to fail in the same way.
lighttpd 1.4.35 does not work on the mintLinux either.
1.4.35 + rewrite trick...
works worse than directly using a port
lighttpd 1.5 works out of the box (after installing gthread2) on the mintLinux. However, it will not work on the target hardware.
The issue has been found to be faulty HTTP headers provided by the backend.
The issue was submitted to the lighttpd bug tracker: https://redmine.lighttpd.net/issues/2594#change-8877
lighttpd now has support for web pages that send only LF as opposed to CRLF.
You may argue that the bug is in the target web page; however, in my case I was unable to modify the target site.
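If you need to check whether a backend does this, here is a minimal sketch (assumptions: the backend is reachable at 10.0.0.175:80 as in the proxy.server config above, and the helper name is made up for illustration):
# Fetch the raw response head and count CRLF vs. bare-LF line endings.
import socket

def count_header_line_endings(host, port=80, path="/"):
    request = "GET {0} HTTP/1.0\r\nHost: {1}\r\n\r\n".format(path, host)
    sock = socket.create_connection((host, port), timeout=5)
    try:
        sock.sendall(request.encode("ascii"))
        data = b""
        # read until the header/body separator (either style) or EOF
        while b"\r\n\r\n" not in data and b"\n\n" not in data:
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    finally:
        sock.close()
    if b"\r\n\r\n" in data:
        head = data.split(b"\r\n\r\n", 1)[0]
    else:
        head = data.split(b"\n\n", 1)[0]
    crlf = head.count(b"\r\n")
    bare_lf = head.count(b"\n") - crlf
    return crlf, bare_lf

# bare_lf > 0 points at the broken backend headers described above
print(count_header_line_endings("10.0.0.175"))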
I'm running nginx on Ubuntu Linux 12.04 on an AWS machine, and I keep getting weird "caching" (?) issues on my production server. When I deploy new .css, .html, or .js code, some files update and others don't, and I get a weird mix of behavior between them (e.g. the app works strangely). If I ask my users to reset their cache locally, everything works fine. I'd like to figure out a way to not have to ask users to do that!
I have tried changing the nginx configuration settings, but I keep getting "304 Not Modified" responses for my static files - even though I tried turning off caching and followed various Stack Overflow posts about how to turn caching off.
Does anyone have any thoughts on what might be the problem? My guesses so far: maybe it's something AWS-specific (though I tried turning sendfile off), or one of my other settings is overriding it?
I've tried:
How to prevent "304 Not Modified" in nginx?
How to clear the cache of nginx?
How to disable nginx cache
https://serverfault.com/questions/269420/disable-caching-when-serving-static-files-with-nginx-for-development
and nothing's worked.
Tried sendfile off; sendfile on; setting "no cache" as well as setting a cache and having it expire in 1s (and running "sudo service nginx restart" between config file changes) - but still no luck. Every time, no matter what, I keep getting "304 Not Modified" headers, and my users keep seeing stale files.
My (full) current config:
user www-data;
worker_processes 4;
pid /var/run/nginx.pid;
events {
worker_connections 768;
# multi_accept on;
}
http {
##
# Basic Settings
##
add_header Cache-Control no-cache;
sendfile off; # Virtualbox Issue
expires 0s;
tcp_nopush on;
tcp_nodelay on;
keepalive_timeout 65;
types_hash_max_size 2048;
# server_tokens off;
# server_names_hash_bucket_size 64;
# server_name_in_redirect off;
include /etc/nginx/mime.types;
default_type application/octet-stream;
##
# Logging Settings
##
access_log /var/log/nginx/access.log;
error_log /var/log/nginx/error.log;
##
# Gzip Settings
##
gzip on;
And inside my /sites-enabled/ folder,
upstream app_server {
server XX.XX.XX.XX:XXXX fail_timeout=0;
}
location / {
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header Host $http_host;
proxy_redirect off;
if (!-f $request_filename) {
proxy_pass http://app_server;
break;
}
}
# Virtualbox & Nginx Issues
sendfile off;
# Set the cache to expire - always (no caching)
location ~* \.(jpg|jpeg|gif|png|ico|css|zip|tgz|gz|rar|bz2|pdf|txt|tar|wav|bmp|rtf|js|flv|swf|xml|html|htm)$ {
add_header Pragma public;
add_header Cache-Control "public, must-revalidate, proxy-revalidate";
expires 1s;
}
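For reference, a more aggressive variant of that static-files block that I've seen suggested (a sketch only; if_modified_since off disables the conditional-request handling behind the 304s, and etag off requires nginx 1.3.3+, which the stock Ubuntu 12.04 package may not have):
location ~* \.(css|js|html|htm)$ {
    # don't answer conditional requests with 304 Not Modified
    if_modified_since off;
    # nginx >= 1.3.3 only; drop this line on older packages
    etag off;
    # forbid clients and proxies from storing the response at all
    add_header Cache-Control "no-store, no-cache, must-revalidate";
}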
Any thoughts?
Thanks so much!!
I have a Mercurial repository running on Scm-manager proxied behind Nginx. A variety of smaller repositories run fine, so the basic setup seems OK.
Additionally, this same box runs Owncloud. I've tweaked the client_max_body_size on the server to 1000M so large files can be transferred. This works, and I have a variety of large files syncing between the server and clients.
However, when I try pushing a large Mercurial repository for the first time (1007 commits vs. about 80 for the other largest on this system) I get the following:
abort: HTTP Error 413: FULL head
Everything I've read about 413 errors doesn't seem to apply. First, it recommends setting the body size, which as I've stated is already at 1G. Next, this seems to imply that the header is too large, which makes sense given that it's probably trying to check 1000+ revisions in the remote repository.
Another thing I've encountered is large_client_header_buffers. I've set this to insanely huge values like "64 128k" at both the server and http levels (I read something about it not working at the server level), but that didn't change anything.
I also looked at the scm-manager logs but see nothing, so this seems to stop at Nginx.
Thoughts? Here is part of my Nginx server configuration:
server {
server_name thewordnerd.info;
listen 443 ssl;
ssl_certificate /etc/ssl/certs/thewordnerd.info.crt;
ssl_certificate_key /etc/ssl/private/thewordnerd.info.key;
root /srv/www/thewordnerd.info/public;
client_max_body_size 1000M;
location /scm {
proxy_pass http://127.0.0.1:8080/scm;
include /etc/nginx/proxy_params;
}
}
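For completeness, the large_client_header_buffers attempt mentioned above looked roughly like this at the http level (a sketch with the values from my experiments; it made no difference):
http {
    # oversized request-header buffers - did not help with the 413
    large_client_header_buffers 64 128k;
}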
The problem is the header buffer of the application server; this is because Mercurial uses very big headers. You have to increase the size of the header buffer, and this is application-server specific. If you are using the standalone version, you have to edit server-config.xml and increase the requestHeaderSize value.
replace:
<Set name="requestHeaderSize">16384</Set>
with:
<Set name="requestHeaderSize">32768</Set>
Source: https://groups.google.com/forum/#!topic/scmmanager/Afad4zXSx78
I had HTTP Error 413 (Request Entity Too Large) on my attempt to push. Resolved by adding client_max_body_size 2M; to /etc/nginx/nginx.conf. I'm wondering whether maybe your push exceeds even the 1000M client_max_body_size...
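For reference, a minimal sketch of where that directive sits (2M is the value from this answer; placed in the http block it applies to every server):
http {
    # default is 1m; raise it so larger request bodies are accepted
    client_max_body_size 2M;
}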
I have a simple, single page Flask (v0.8) application that queries a MySQL database and displays results for each request based on different request params. The application is served using Tornado over Nginx.
Recently I've noticed that the application seems to block concurrent requests from different clients while a DB query is still running. E.g.:
A client makes a request with a complicated DB query that takes a while to complete (> 20 sec).
A different client makes a request to the server and is blocked until the first query returns.
So basically the application behaves like a single process that serves everyone. I was thinking the problem was with a shared DB connection on the server, so I started using the dbutils module for connection pooling. That didn't help. I think I'm probably missing something big in the architecture or the configuration of the server, so I'd appreciate any feedback on this.
This is the code for the Flask that performs the db querying (simplified):
#... flask imports and such
import MySQLdb
from DBUtils.PooledDB import PooledDB
POOL_SIZE = 5
class DBConnection:
def __init__(self):
self.pool = PooledDB(MySQLdb,
POOL_SIZE,
user='admin',
passwd='sikrit',
host='localhost',
db='data',
blocking=False,
maxcached=10,
maxconnections=10)
def query(self, sql):
"execute SQL and return results"
# obtain a connection from the pool and
# query the database
conn = self.pool.dedicated_connection()
cursor = conn.cursor()
cursor.execute(sql)
# get results and terminate connection
results = cursor.fetchall()
cursor.close()
conn.close()
return results
global db
db = DBConnection()
@app.route('/query/')
def query():
if request.method == 'GET':
# perform some DB querying based query params
sql = process_request_params(request)
results = db.query(sql)
# parse, render, etc...
Here's the tornado wrapper (run.py):
#!/usr/bin/env python
import tornado
from tornado.wsgi import WSGIContainer
from tornado.httpserver import HTTPServer
from tornado.ioloop import IOLoop
from myapplication import app
from tornado.options import define, options
define("port", default=8888, help="run on the given port", type=int)
def main():
tornado.options.parse_command_line()
http_server = HTTPServer(WSGIContainer(app), xheaders=True)
http_server.listen(options.port)
IOLoop.instance().start()
if __name__ == '__main__': main()
Starting the app via startup script:
#!/bin/sh
APP_ROOT=/srv/www/site
cd $APP_ROOT
# run each instance in the background, discarding stdout/stderr
python run.py --port=8000 --log_file_prefix=$APP_ROOT/logs/app.8000.log > /dev/null 2>&1 &
python run.py --port=8001 --log_file_prefix=$APP_ROOT/logs/app.8001.log > /dev/null 2>&1 &
And this is the nginx configuration:
user nginx;
worker_processes 1;
error_log /var/log/nginx/error.log;
pid /var/run/nginx.pid;
events {
worker_connections 1024;
use epoll;
}
http {
upstream frontends {
server 127.0.0.1:8000;
server 127.0.0.1:8001;
}
include /usr/local/nginx/conf/mime.types;
default_type application/octet-stream;
# ..
keepalive_timeout 65;
proxy_read_timeout 200;
sendfile on;
tcp_nopush on;
tcp_nodelay on;
gzip on;
gzip_min_length 1000;
gzip_proxied any;
gzip_types text/plain text/html text/css text/xml application/x-javascript
application/xml application/atom+xml text/javascript;
proxy_next_upstream error;
server {
listen 80;
root /srv/www/site;
location ^~ /static/ {
if ($query_string) {
expires max;
}
}
location / {
proxy_pass_header Server;
proxy_set_header Host $http_host;
proxy_redirect off;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Scheme $scheme;
proxy_pass http://frontends;
}
}
}
This is a small application that serves a very small client base, and most of it is legacy code I inherited and never got around to fixing or rewriting. I only noticed the problem after adding more complex query types that take longer to complete. If anything jumps out, I'd appreciate your feedback. Thanks.
The connection pool doesn't make MySQLdb asynchronous. The results = cursor.fetchall() call blocks Tornado until the query is complete.
That's what happens when you use non-asynchronous libraries with Tornado. Tornado is an IO loop; it's one thread. If you have a 20-second query, the server will be unresponsive while it waits for MySQLdb to return. Unfortunately, I'm not aware of a good async Python MySQL library. There are some Twisted ones, but they introduce additional requirements and complexity into a Tornado app.
The Tornado guys recommend abstracting slow queries into an HTTP service, which you can then access using tornado.httpclient. You could also look at tuning your query (> 20 seconds!) or running more Tornado processes. Or you could switch to a datastore with an async Python library (MongoDB, Postgres, etc.).
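A rough sketch of that recommendation (assumptions: the slow SQL work has been moved into a separate HTTP service at http://127.0.0.1:9000/run-query, and the handler name and URL are made up for illustration; note this only helps for native Tornado handlers, not routes wrapped in WSGIContainer):
import tornado.ioloop
import tornado.web
from tornado.httpclient import AsyncHTTPClient

class QueryHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def get(self):
        # hand the slow query off to the separate service; the IO loop stays free
        client = AsyncHTTPClient()
        client.fetch("http://127.0.0.1:9000/run-query?" + self.request.query,
                     callback=self.on_result)

    def on_result(self, response):
        # relay the result of the slow query back to the original caller
        self.write(response.body)
        self.finish()

application = tornado.web.Application([(r"/query/", QueryHandler)])

if __name__ == "__main__":
    application.listen(8888)
    tornado.ioloop.IOLoop.instance().start()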
What kind of 'complicated DB queries' are you running? Are they just reads, or are you updating the tables? Under certain circumstances, MySQL must lock the tables - even on what might seem like read-only queries. This could explain the blocking behavior.
Additionally, I'd say any query that takes 20 seconds or more to run and that is run frequently is a candidate for optimization.
So, as we know, standard MySQL drivers are blocking, so the server will block while a query is executing. Here is a good article about how you can achieve non-blocking MySQL queries in Tornado.
By the way, as Mike Johnston mentioned, if your query takes more than 20 seconds to execute, it is very long. My suggestion is to find a way to move this query into the background. Tornado does not have an asynchronous MySQL driver in its package, because the guys at FriendFeed did their best to make their queries execute really fast.
Also, instead of using a pool of 20 synchronous database connections, you can start 20 server instances with 1 connection each and use nginx as a reverse proxy for them (see the sketch below). They will be more bulletproof than a pool.
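A minimal sketch of that layout on the nginx side (the extra ports are illustrative; they would match however many instances the startup script launches):
upstream frontends {
    # one Tornado process per port, each holding a single DB connection
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
    server 127.0.0.1:8003;
    # ...one line per additional instance, up to 20
}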
I am running Django, FastCGI, and Nginx. I am creating an API of sorts where someone can send some data via XML, which I will process and then return status codes for each node that was sent over.
The problem is that Nginx will throw a 504 Gateway Time-out if I take too long to process the XML -- I think longer than 60 seconds.
So I would like to set up Nginx so that any requests matching the location /api will not time out for 120 seconds. What setting will accomplish that?
What I have so far is:
# Handles all api calls
location ^~ /api/ {
proxy_read_timeout 120;
proxy_connect_timeout 120;
fastcgi_pass 127.0.0.1:8080;
}
Edit: What I have is not working :)
Proxy timeouts are, well, for proxies, not for FastCGI...
The directives that affect FastCGI timeouts are client_header_timeout, client_body_timeout and send_timeout.
Edit: Considering what's found on the nginx wiki, the send_timeout directive is responsible for setting the general response timeout (which was a bit misleading). For FastCGI there is fastcgi_read_timeout, which affects the FastCGI process response timeout.
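Applied to the location block from the question, that would look something like this (a sketch; 120s matches the timeout asked for, and the proxy_* lines are dropped since they don't apply to fastcgi_pass):
# Handles all api calls
location ^~ /api/ {
    fastcgi_pass 127.0.0.1:8080;
    # give the FastCGI backend up to 120 seconds to respond
    fastcgi_read_timeout 120s;
}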
For those using nginx with unicorn and Rails, most likely the timeout is in your unicorn.rb file.
Put a large timeout in unicorn.rb:
timeout 500
If you're still facing issues, try setting fail_timeout=0 on your upstream in nginx and see if this fixes your issue. This is for debugging purposes and might be dangerous in a production environment.
upstream foo_server {
server 127.0.0.1:3000 fail_timeout=0;
}
In the nginx http section (/etc/nginx/nginx.conf), add or modify:
keepalive_timeout 300s;
In the nginx server section (/etc/nginx/sites-available/your-config-file.com), add these lines:
client_max_body_size 50M;
fastcgi_buffers 8 1600k;
fastcgi_buffer_size 3200k;
fastcgi_connect_timeout 300s;
fastcgi_send_timeout 300s;
fastcgi_read_timeout 300s;
In the PHP-FPM pool file for 127.0.0.1:9000 (/etc/php/7.X/fpm/pool.d/www.conf), modify:
request_terminate_timeout = 300
I hope this helps you.
If you use unicorn:
Look at top on your server. Unicorn is likely using 100% of the CPU right now.
There are several possible reasons for this problem.
You should check your HTTP requests; some of them can be very heavy.
Check unicorn's version. Maybe you've updated it recently and something broke.
In the server proxy block, set it like this:
location / {
proxy_pass http://ip:80;
proxy_connect_timeout 90;
proxy_send_timeout 90;
proxy_read_timeout 90;
}
In the server PHP block, set it like this:
server {
client_body_timeout 120;
location = /index.php {
#include fastcgi.conf; # example
#fastcgi_pass unix:/run/php/php7.3-fpm.sock; # example version
fastcgi_read_timeout 120s;
}
}