NGINX remove .html extension - html

So, I found an answer to removing the .html extension on my page, that works fine with this code:
server {
    listen 80;
    server_name _;
    root /var/www/html/;
    index index.html;
    if (!-f "${request_filename}index.html") {
        rewrite ^/(.*)/$ /$1 permanent;
    }
    if ($request_uri ~* "/index.html") {
        rewrite (?i)^(.*)index\.html$ $1 permanent;
    }
    if ($request_uri ~* ".html") {
        rewrite (?i)^(.*)/(.*)\.html $1/$2 permanent;
    }
    location / {
        try_files $uri.html $uri $uri/ /index.html;
    }
}
But if I open mypage.com it redirects me to mypage.com/index
Wouldn't this be fixed by declaring index.html as index? Any help is appreciated.

The "Holy Grail" Solution for Removing ".html" in NGINX:
UPDATED ANSWER: This question piqued my curiosity, and I went on another, more in-depth search for a "holy grail" solution for .html redirects in NGINX. Here is the link to the answer I found, since I didn't come up with it myself: https://stackoverflow.com/a/32966347/4175718
However, I'll give an example and explain how it works. Here is the code:
location / {
    if ($request_uri ~ ^/(.*)\.html(\?|$)) {
        return 302 /$1;
    }
    try_files $uri $uri.html $uri/ =404;
}
What's happening here is a pretty ingenious use of the if directive. NGINX runs a regex on the $request_uri portion of incoming requests. The regex checks if the URI has an .html extension and then stores the extension-less portion of the URI in the built-in variable $1.
From the docs, since it took me a while to figure out where the $1 came from:
Regular expressions can contain captures that are made available for later reuse in the $1..$9 variables.
The regex both checks for the existence of unwanted .html requests and effectively sanitizes the URI so that it does not include the extension. Then, using a simple return statement, the request is redirected to the sanitized URI that is now stored in $1.
The best part about this, as original author cnst explains, is that
Due to the fact that $request_uri is always constant per request, and is not affected by other rewrites, it won't, in fact, form any infinite loops.
Unlike the rewrites, which operate on any .html request (including the invisible internal redirect to /index.html), this solution only operates on external URIs that are visible to the user.
What does "try_files" do?
You will still need the try_files directive, as otherwise NGINX will have no idea what to do with the newly sanitized extension-less URIs. The try_files directive shown above will first try the new URL by itself, then try it with the ".html" extension, then try it as a directory name.
The NGINX docs also explain how the default try_files directive works. The default try_files directive is ordered differently than the example above so the explanation below does not perfectly line up:
NGINX will first append .html to the end of the URI and try to serve it. If it finds an appropriate .html file, it will return that file and will maintain the extension-less URI. If it cannot find an appropriate .html file, it will try the URI without any extension, then the URI as a directory, and then finally return a 404 error.
UPDATE: What does the regex do?
The above answer touches on the use of regular expressions, but here is a more specific explanation for those who are still curious. The following regular expression (regex) is used:
^/(.*)\.html(\?|$)
This breaks down as:
^: indicates beginning of line.
/: match the character "/" literally. Forward slashes do NOT need to be escaped in NGINX.
(.*): capturing group: match any character an unlimited number of times
\.: match the character "." literally. This must be escaped with a backslash.
html: match the string "html" literally.
(\?|$): match a literal "?" or the end of the string. This is done to avoid mishandling file names with something after ".html".
The capturing group (.*) is what contains the non-".html" portion of the URL. This can later be referenced with the variable $1. NGINX is then configured to re-try the request (return 302 /$1;) and the try_files directive internally re-appends the ".html" extension so the file can be located.
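As a rough illustration of the breakdown above, here is the same if block annotated with a few sample request URIs. The URIs are made-up examples, not taken from the original question.

```nginx
location / {
    # /about.html      -> matches, $1 = "about"      -> redirect to /about
    # /docs/page.html  -> matches, $1 = "docs/page"  -> redirect to /docs/page
    # /about.html?x=1  -> matches ("?" follows ".html"), $1 = "about"
    # /about.htmlx     -> no match, falls through to try_files
    # /about           -> no match, falls through to try_files
    if ($request_uri ~ ^/(.*)\.html(\?|$)) {
        return 302 /$1;
    }
    try_files $uri $uri.html $uri/ =404;
}
```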
UPDATE: Retaining the query string
To retain query strings and arguments passed to a .html page, the return statement can be changed to:
return 302 /$1$is_args$args;
This should allow requests such as /index.html?test to redirect to /index?test instead of just /index.
Note that this is considered safe usage of the `if` directive.
From the NGINX page If Is Evil:
The only 100% safe things which may be done inside if in a location context are:
return ...;
rewrite ... last;
Also, note that you may swap out the '302' redirect for a '301'.
A 301 redirect is permanent, and is cached by web browsers and search engines. If your goal is to permanently remove the .html extension from pages that are already indexed by a search engine, you will want to use a 301 redirect. However, if you are testing on a live site, it is best practice to start with a 302 and only move to a 301 when you are absolutely confident your configuration is working correctly.
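Putting the pieces of this answer together, a minimal sketch of a complete server block might look like the following. The host name example.com is a placeholder, the root and index values are copied from the question, and the 302 is kept on the assumption that the configuration is still being tested.

```nginx
server {
    listen 80;
    server_name example.com;   # placeholder host name (assumption)
    root /var/www/html;
    index index.html;

    location / {
        # $request_uri is the original client URI, so the internal
        # redirect to /index.html never triggers this redirect.
        if ($request_uri ~ ^/(.*)\.html(\?|$)) {
            return 302 /$1$is_args$args;   # keeps the query string
        }
        try_files $uri $uri.html $uri/ =404;
    }
}
```

With this layout a request for / does not match the regex, so it is served through the index directive without being redirected to /index.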

This has often come up for me as well and due to the configuration at work, location blocks are iffy at best and the / & .php blocks are locked down. Which means that most of the solutions don't work for me.
So here is one that I simplified from the Accepted answer above.
rewrite ^/(.*)\.html /$1/ permanent;
Works great for CMSs, where the underlying framework is generating the pages


Config nginx for many client projects

We have 30+ client projects (some are Vue projects and others are static HTML projects), and each project has a separate root directory.
For now, nginx is configured so that each project has its own location:
location ^~ /workspaces/ {
root /var/www/workspace/;
index index.html index.htm;
}
location ^~ /offical/ {
root /var/www/official/;
index index.html index.htm;
}
...
Each time a new client project is released, a new location is added to the nginx file. I'm afraid that too many locations in the nginx file will affect the efficiency of nginx.
How can I simplify the nginx config file for all the client projects? For example, with a single location such as location ^~ /web/, and then put all the projects under the web path.
Best Practice
The best practice is to use separate domain names for each app. This is important from the security perspective, to guard against a cross-site scripting vulnerability in one app having any ill effects on all the other apps, and cookie management.
Performance
However, from the performance perspective, nginx is already highly efficient for such common use cases that you shouldn't worry about having a few extra location or server_name directives:
location
I'd imagine that the prefix-based location search would be done on a prefix-based search tree (https://en.wikipedia.org/wiki/Trie), i.e., it would be highly efficient: effectively, each input character in the URL would only be examined once, and each level of the tree would only have a limited number of branches.
If you instead move to a regex-based approach, that would be noticeably slower (at least in a performance analysis; you probably won't notice any difference in real use), because each regular expression would have to be re-evaluated, potentially on the whole input, until a match is found; the complexity is a multiple of the number of regular expressions times the size of the input URL.
server_name
If you instead move to a server-based definition, based on non-regex server_name specifications, then the matching would be done through a hash-table, which, likewise, is a very efficient operation, where the search would take constant time even on an infinite number of individual server definitions.
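A minimal sketch of such a server-based layout, reusing the two root directories from the question. The host names under example.com are hypothetical placeholders.

```nginx
# One server block per client project; exact (non-regex) names only.
server {
    listen 80;
    server_name workspaces.example.com;   # hypothetical host name
    root /var/www/workspace;
    index index.html index.htm;
}

server {
    listen 80;
    server_name official.example.com;     # hypothetical host name
    root /var/www/official;
    index index.html index.htm;
}
```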
Comparison
Which one is more efficient, location or server_name? It is difficult to say for sure without getting into too many details; but I'd imagine that a hash-based search would be more friendly insofar as CPU branch prediction is concerned — https://en.wikipedia.org/wiki/Branch_predictor; but this is getting really into the weeds here, you don't really need to worry about these sorts of things for a webapp. However, I'd still recommend moving to a server-based configuration for security reasons, even if the extra performance benefits are negligible.
tl;dr:
nginx is already highly efficient for your use case as-is, and no further optimisation is required. The best you could do is to either avoid regex-based location directives altogether or keep the ^~ modifier on your prefix-based location directives, because regex matching is slower than prefix matching. It would also be advisable to switch to a server-based configuration for extra security.
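If the projects do stay on a single host, the one-location layout the question asks about could be sketched as follows. It assumes every project is moved into a hypothetical /var/www/web/<project>/ directory, so that the URI path maps directly onto the file system.

```nginx
# /web/projectA/ is served from /var/www/web/projectA/ (assumed layout),
# so releasing a new project needs no change to the nginx file.
location ^~ /web/ {
    root /var/www;
    index index.html index.htm;
}
```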
References
http://nginx.org/r/location
http://nginx.org/r/server_name
http://nginx.org/docs/hash.html
http://nginx.org/docs/http/server_names.html
http://nginx.org/docs/http/request_processing.html
I usually use dynamic vhosts for nginx. You can create a serving directory, e.g. /var/www/, and inside it define a directory for every domain of the client projects you want to deploy:
/var/www/domain.tld
/var/www/subdomain.domain.tld
/var/www/otherproject.tld
/var/www/project.tld/public
and then in nginx you define your server-block as follows
server {
    # SSL configuration
    listen 443 ssl http2 default_server; # managed by Certbot
    listen [::]:443 ssl http2 default_server;
    set $basepath "/var/www";
    server_name ~^(\w+\.)?(?<base>\w+\.\w+)$;
    if ( -d $basepath/$host) {
        set $rootpath $basepath/$host;
    }
    if ( -d $basepath/$host/public ) {
        set $rootpath $basepath/$host/public;
    }
    if ( !-d $basepath/$host ) {
        set $rootpath $basepath/$base;
        return 301 https://$base$request_uri;
    }
    root $rootpath;
    access_log "/var/log/nginx/${host}.access.log";
    error_log "/var/log/nginx/error.log" debug;
    index index.php index.html index.htm index.nginx-debian.html;
    location ~ \.php$ {
        include snippets/fastcgi-php.conf;
        fastcgi_pass unix:/run/php/php7.2-fpm.sock;
    }
}
This first sets the base path to /var/www and then tries directories in that base path in order. If a directory named after the domain from which the project is accessed exists, the project is served from there; if it contains a public folder, that folder is preferred. If the directory does not exist, the request is redirected to the base domain (https://$base).
Furthermore, a separate access log is generated for every host. Unfortunately, this does not work for the error log, so all errors are gathered in the common error.log.
For specific files, you can then filter by extension to specify how they are served; in the example above, PHP files are served using php7.2-fpm.
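A lighter variant of the same idea, sketched here as an assumption rather than as part of the setup above, lets a named capture in server_name select the root directly, without any if blocks. The capture name domain is illustrative, and this version has neither the public/ preference nor the redirect fallback.

```nginx
server {
    listen 80;
    # Optional "www." prefix is dropped; the rest of the host name
    # is stored in $domain and used as the directory name.
    server_name ~^(www\.)?(?<domain>.+)$;
    root /var/www/$domain;
    index index.html index.htm;
}
```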

Nginx setup with multiple html files

I have got a website with multiple html files which I want to serve with Nginx.
server {
listen 80;
root /var/www;
location / {
index index.html;
}
location /projects/ {
index projects.html;
}
server_name mylady17.de;
location /shiny/ {
proxy_pass http://104.248.41.231:3838/;
}
}
This is how it is set up. index.html works perfectly fine, but http://mylady17.de/projects gives me an error (404, not found). The projects.html file is stored in /var/www/ and should work. What am I doing wrong? Why can't I access the file?
The index directive operates on URIs which end with a / and attempts to locate files by appending the value of the directive to the URI. See this document for details.
So your URI /projects will not invoke the index module. Even if you did use /projects/ instead, the index module would attempt to locate the file at /var/www/projects/projects.html.
To point a single URI to a given file, you can use an exact match location. See this document for details.
For example:
location = /projects {
rewrite ^ /projects.html last;
}
If you did decide to expand this in the future, requiring nginx to search for files by appending .html to the end of the URI, you could use a try_files directive instead. See this document for details.
For example:
location / {
try_files $uri $uri/ $uri.html =404;
}
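Applied to the configuration from the question, the try_files approach might look like the sketch below. It assumes projects.html sits directly in /var/www and that no /var/www/projects directory exists, so /projects falls through to $uri.html.

```nginx
server {
    listen 80;
    server_name mylady17.de;
    root /var/www;
    index index.html;

    location / {
        # /projects -> no such file or directory -> /var/www/projects.html
        try_files $uri $uri/ $uri.html =404;
    }

    location /shiny/ {
        proxy_pass http://104.248.41.231:3838/;
    }
}
```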

Fatfree routing with PHP built-in web server

I'm learning fatfree's routing and found that it behaves unexpectedly.
Here is my code in index.php:
$f3 = require_once(dirname(dirname(__FILE__)). '/lib/base.php');
$f3 = \Base::instance();
echo 'received uri: '.$_SERVER['REQUEST_URI'].'<br>';
$f3->route('GET /brew/#count',
function($f3,$params) {
echo $params['count'].' bottles of beer on the wall.';
}
);
$f3->run();
and here is the URL which I access: http://xx.xx.xx.xx:8090/brew/12
I get a 404 error:
received uri: /brew/12
Not Found
HTTP 404 (GET /12)
the strange thing is that the URI in F3 is now "/12" instead of "/brew/12" and I guess this is the issue.
When I check the base.php (3.6.5), $this->hive['BASE'] = "/brew" and $this->hive['PATH'] = "/12".
But if F3 only uses $this->hive['PATH'] to match the predefined route, it won't be able to match them.
If I change the route to:
$f3->route('GET /brew',
and use the URL: http://xx.xx.xx.xx:8090/brew, then the route matches without issue.
In this case, $this->hive['BASE'] = "" and $this->hive['PATH'] = "/brew". If F3 compares the $this->hive['PATH'] with predefined route, they match each other.
BTW, I'm using PHP's built-in web server and since $_SERVER['REQUEST_URI'] (which is used by base.php) returns the correct URI, I don't think there is anything wrong with the URL rewrite in my .htrouter.php.
Any idea? What did I miss here?
add the content of .htrouter.php here
<?php
#get the relative URL
$uri = urldecode(parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH));
#if request to a real file (such as a html, image, js, css) then leave it as it is
if ($uri !== '/' && file_exists(__DIR__ . $uri)) {
return false;
}
#if request virtual URL then pass it to the bootstrap file - index.php
$_GET['_url'] = $_SERVER['REQUEST_URI'];
require_once __DIR__ . './public/index.php';
Your issue is directly related to the way you're using the PHP built-in web server.
As stated in the PHP docs, here's how the server handles requests:
URI requests are served from the current working directory where PHP was started, unless the -t option is used to specify an explicit document root. If a URI request does not specify a file, then either index.php or index.html in the given directory are returned. If neither file exists, the lookup for index.php and index.html will be continued in the parent directory and so on until one is found or the document root has been reached. If an index.php or index.html is found, it is returned and $_SERVER['PATH_INFO'] is set to the trailing part of the URI. Otherwise a 404 response code is returned.
If a PHP file is given on the command line when the web server is started it is treated as a "router" script. The script is run at the start of each HTTP request. If this script returns FALSE, then the requested resource is returned as-is. Otherwise the script's output is returned to the browser.
That means that, by default (without a router script), the web server already does a pretty good job of routing nonexistent URIs to the index.php file in your document root.
In other words, provided your file structure is like:
lib/
base.php
template.php
etc.
public/
index.php
The following command is enough to start your server and dispatch the requests properly to the framework:
php -S 0.0.0.0:8090 -t public/
Or if you're running the command directly from the public/ folder:
cd public
php -S 0.0.0.0:8090
Beware that the working directory of your application depends on the folder from which you call the command. To avoid depending on it, I strongly advise you to add chdir(__DIR__); at the top of your public/index.php file. This way, all subsequent relative require calls will resolve against your public/ folder. For example: $f3 = require('../lib/base.php');
Routing file-style URIs
The built-in server, by default, won't pass URIs that look like nonexistent files to your index.php, as stated in:
If a URI request does not specify a file, then either index.php or index.html in the given directory are returned
So if you plan to define some routes with dots, such as:
$f3->route('GET /brew.json','Brew->json');
$f3->route('GET /brew.html','Brew->html');
Then it won't work because PHP won't pass the request to index.php.
In that case, you need to call a custom router, such as the .htrouter.php you were trying to use. The only thing is that your .htrouter.php has obviously been designed for a different framework (F3 doesn't care about $_GET['_url'] but cares about $_SERVER['SCRIPT_NAME']).
Here's an example of .htrouter.php that should work with F3:
<?php
// public directory definition
$public_dir=__DIR__.'/public';
// serve existing files as-is
if (file_exists($public_dir.$_SERVER['REQUEST_URI']))
    return FALSE;
// patch SCRIPT_NAME and pass the request to index.php
$_SERVER['SCRIPT_NAME']='index.php';
require($public_dir.'/index.php');
NB: the $public_dir variable should be set according to the location of the .htrouter.php file.
For example if you call:
php -S 0.0.0.0:8090 -t public/ .htrouter.php
it should be $public_dir=__DIR__.'/public'.
But if you call:
cd public
php -S 0.0.0.0:8090 .htrouter.php
it should be $public_dir=__DIR__.
OK, I checked the base.php and found out when f3 calculates the base URI, it uses $_SERVER['SCRIPT_NAME'].
$base='';
if (!$cli)
$base=rtrim($this->fixslashes(
dirname($_SERVER['SCRIPT_NAME'])),'/');
If we have the web server forward all requests directly to index.php, then
$_SERVER['SCRIPT_NAME'] = /index.php, and in this case, base is ''.
If we use URL rewriting via .htrouter.php to index.php, then
$_SERVER['SCRIPT_NAME'] = /brew/12, and in this case, base is '/brew', which causes the issue.
Since I'm going to use URL rewriting, I have to comment out the if statement and make sure base = ''.
Thanks xfra35 for providing the clue.
Here is an Apache-like PHP router that supports URL rewriting:
https://github.com/kyesil/QPHP/blob/master/router.php
Usage:
php -S localhost:8081 router.php

Can you somehow use the base-tag to pass parameters?

I've written some custom debugging code for a large framework: by adding ?debug to any URL, I get some custom server data. Whenever I click a link, the ?debug disappears, of course. Can I keep it there somehow? My idea was to use the base tag:
if (isset($_POST['debug'])) {
<base href="/images/">
}
But it doesn't seem to support parameters. Is there something similar?
Assuming you're using Apache you could just use mod_rewrite:
RewriteEngine on
# Only apply the rule when the query string does not already contain 'debug'
RewriteCond %{QUERY_STRING} !debug
# Internally append ('query string append') the extra parameter
RewriteRule (.*) $1?debug [QSA]
To limit this behaviour to only your computer add an extra condition in between:
# Only trigger the rule if the remote IP address exactly matches the string
RewriteCond %{REMOTE_ADDR} =192.168.1.1
And replace the IP with your own.
I think you have 2 options. The "easiest" would be to add a session variable on the server side that indicates that all pages returned should be in debug mode. This brings its own side effects, with reliance on the session being one of them.
The better option is to add the debug query string to all links on the page. This can be done on the server side when the page is rendered, but probably the best way would be to use something like jQuery to automatically add it to all the links (as described here: Jquery : Append querystring to all links)

Nginx incorrect locations handling

How do I properly handle all non-existent locations in an nginx configuration for a PHP site?
I can think of 5 possible cases of such locations:
1. Incorrect files: example.com/notexist.jpg
2. Incorrect folders: example.com/notexist
3. Nested incorrect folders: example.com/notexist1/notexist2/..../notexist10000
4. Combination of (3) and (1): example.com/notexist1/notexist2/..../notexist10000/not.exist.jpg
5. Non-existent php files: example.com/notexist.php
Is there a small but powerful solution covering all of these cases?
I also need to avoid checking ANY file or directory (with -d and -f), as it would add CPU and IO overhead.
Thanks in advance!
try_files solves the issue completely for me
location / {
try_files $uri $uri/index.html $uri.html =404;
}
It is also very important to have absolute paths in your not_found_page otherwise page layout will be broken.
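A minimal sketch of how this try_files fallback could be paired with a custom 404 page. The root, the /404.html file name, and the PHP-FPM socket path are assumptions for illustration only.

```nginx
server {
    listen 80;
    root /var/www/html;              # assumed document root
    error_page 404 /404.html;        # hypothetical custom error page

    location = /404.html {
        internal;                    # not reachable by direct requests
    }

    location / {
        try_files $uri $uri/index.html $uri.html =404;
    }

    location ~ \.php$ {
        try_files $uri =404;         # non-existent .php files get the same 404
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass unix:/run/php/php-fpm.sock;   # assumed socket path
    }
}
```

Because the error page is served for arbitrarily nested request paths, its stylesheet and image links need to be absolute for the layout to survive.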
In all 5 of your cases a 404 would normally be returned, so you can add special handling for all of them by:
creating a named location
referring to that named location as your 404 error page
That would yield:
server {
    error_page 404 = @fallback;
    location @fallback {
        # do whatever you want to do on faulty requests
    }
}