RewriteEngine in .htaccess to catch files not ending in html

I'd like to use mod_rewrite to convert web page addresses like /directory to /directory/index.html, in a standard LAMP hosting situation. What I have works for addresses that end in a slash. I can't find a way to handle addresses that don't end in a slash.
What seems like it should work is:
rewriterule ^(.*)/$ $1/index.html [L] /* addresses ending in / */
rewriterule ^(.*(?!html))$ $1/index.html [L] /* where the problem is */
But the second line causes a 500 server error. If I add a single letter x to the second line:
rewriterule ^(.*)/$ $1/index.html [L]
rewriterule ^(.*x(?!html))$ $1/index.html [L]
It starts to work, but only for directory names that end in an x. I have tried replacing the x with many different things. Anything more complicated than real characters (like [^x] or .+) gives a 500 server error.
And, to satisfy my own curiosity, does anyone know why the addition of a single real letter makes the difference between a server error and a perfectly functioning rule?
[Accepted Answer] Thanks to Gumbo I was able to approximate a solution using rewritecond:
rewritecond %{REQUEST_URI} !\.[^/]+$
rewriterule (.+) $1/index.html [L]
This works, but filters more than just .html -- it could block other pages. Unfortunately,
rewritecond %{REQUEST_URI} !\.html$
results in a server error:
Request exceeded the limit of 10 internal redirects due to probable configuration error. Use 'LimitInternalRecursion' to increase the limit if necessary.
I'd still like to know why:
rewriterule ^(.*(?!html))$ $1/index.html [L]
results in a loop. The first half is supposed to check if it doesn't end in .html. Since the second half adds .html, it seems like the functional equivalent of:
while(substr($address,-4)!='html') $address.='html'
Obviously I'm missing something.

Use a RewriteCond directive to check whether the URL path does not end with a .html:
RewriteCond %{REQUEST_URI} !\.html$
RewriteRule ^(.*[^/])?/?$ $1/index.html [L]
Edit: You’re using a look-ahead assertion ((?!…)). But there isn’t anything after .* (only a $), so the assertion has nothing to look at. Try a look-behind assertion instead:
RewriteRule ^.*$(?<!html) $0/index.html [L]
But note that you probably need Apache 2.2 to use these assertions.
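To see the difference between the two assertions outside of Apache, here is a minimal sketch using Python's re module as a stand-in for the regex engine mod_rewrite uses. The helper name matches is made up for illustration, and the assumption is that both engines treat these particular assertions the same way.

```python
import re

# Pattern from the question: the negative look-ahead sits at the very end,
# where nothing follows, so it never sees "html" and never rejects anything.
lookahead = re.compile(r'^(.*(?!html))$')

# Look-behind variant: at the end of the string it inspects the four
# characters that came before.
lookbehind = re.compile(r'^(.*)(?<!html)$')

def matches(pattern, path):
    """Return True when the rule pattern would match the given URL path."""
    return pattern.match(path) is not None

for path in ("directory", "directory/index.html"):
    print(path, matches(lookahead, path), matches(lookbehind, path))
```

The look-ahead pattern accepts both paths, including the one that already ends in html, while the look-behind pattern accepts only the first.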

Well, for actually making it work, you could just use a negative lookbehind instead of a lookahead:
RewriteRule ^(.*)(?<!html)$ $1/index.html [L]
I'm not sure offhand why adding the 'x' makes it work, I'll edit if I figure it out.

For why adding the x makes it work:
If the replacement will match the regex, the RewriteRule will be applied again. As an example, this causes an error:
RewriteRule ^(.*)$ $1.rb
because it would replace script with script.rb. That matches the regex, so it replaces script.rb with script.rb.rb, again and again...
This is hinted at in the error log:
Request exceeded the limit of 10 internal redirects due to probable configuration error. Use 'LimitInternalRecursion' to increase the limit if necessary.
In your example, you add index.html to the end. When there is an x at the end of the regex, then it won't match your replacement, which ends in an l.
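The repeated application described above can be imitated with a small loop. This is only a sketch in Python, not how Apache is implemented: rewrite_until_stable is a hypothetical helper, the limit of 10 is taken from the error message, and the substitution uses Python's \1 where mod_rewrite uses $1.

```python
import re

LIMIT = 10  # mirrors the limit quoted in the error message above

def rewrite_until_stable(pattern, template, path, limit=LIMIT):
    """Re-apply one rule until it stops matching.

    Returns (final_path, passes). Raises RuntimeError once the rule has
    fired more than `limit` times, like the server giving up with a 500.
    """
    rule = re.compile(pattern)
    passes = 0
    while True:
        m = rule.match(path)
        if m is None:
            return path, passes
        passes += 1
        if passes > limit:
            raise RuntimeError("exceeded %d internal redirects" % limit)
        path = m.expand(template)

# The rule with the x stops by itself: the rewritten path ends in "l".
print(rewrite_until_stable(r'^(.*x(?!html))$', r'\1/index.html', 'dirx'))

# The catch-all rule matches its own output, so it never settles.
try:
    rewrite_until_stable(r'^(.*)$', r'\1.rb', 'script')
except RuntimeError as err:
    print("gave up:", err)
```

Any pattern that cannot match its own replacement settles after one pass, which is why both the x variant and the look-behind variant avoid the 500.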

Related

Is there any possibility to remove ? and = in html?

I am trying to give pretty url for my html page. I found many answers but they are more related to php. I need to convert this link below,
http://localhost/blog.html?id=1
to
http://localhost/blog/1
I have the .htaccess file for removing html
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^\.]+)$ $1.html [NC,L]
Help me to change my url parameters.
The examples you find are perfectly valid for you, since this is independent of any higher level logic like php. These rules operate on the level of the http server.
Anyway, here is a rule to get you started:
RewriteEngine on
RewriteCond %{QUERY_STRING} (?:^|&)id=(\d+)(?:&|$)
RewriteRule ^/?blog$ blog.html?id=%1 [END,QSD]
If you get an HTTP status 500 ("internal server error") using this, then chances are that you operate a very old version of the Apache HTTP server. In that case you will find a hint about an unsupported [END] flag in your HTTP server's error log file. Try replacing the END flag with the older L flag; that will probably work the same, though this depends a bit on your setup.
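As far as I can tell, the rule above keys on the query string of a request for /blog, while the question asks for the path form /blog/1. The path-to-query mapping itself is small; here is a sketch of it in Python rather than Apache syntax. The pattern and the name to_internal are assumptions for illustration, not something from the question.

```python
import re

# Hypothetical pattern: "blog/" followed by a numeric id, optional leading slash.
PRETTY_BLOG = re.compile(r'^/?blog/(\d+)$')

def to_internal(path):
    """Map a pretty path like /blog/1 to blog.html?id=1; None if it does not apply."""
    m = PRETTY_BLOG.match(path)
    if m is None:
        return None
    return "blog.html?id=" + m.group(1)

print(to_internal("/blog/1"))
print(to_internal("/blog/about"))
```

The same capture group idea should carry over to a RewriteRule, with $1 in place of m.group(1).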

Apache rewrite exclude empty URI

The following rewrite appends .html to the URI.
RewriteRule ^/(.*)$ /sites/uk/$1.html [NC,L]
However, when it comes to the index page it is giving me:
.../sites/uk/.html instead of:
.../sites/uk/index.html
What I want to know is how to exclude an empty URI so it doesn't append .html to nothing. I have tried the following rewrite condition:
RewriteCond %{REQUEST_URI} !^/?$
However, this is unsuccessful.
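The ".html appended to nothing" result can be reproduced outside Apache. In this Python sketch, append_html is a made-up helper that imitates the substitution /sites/uk/$1.html. Requiring at least one character with (.+) is shown as one possible way to leave the root request alone; whether that fits the real server setup is untested here.

```python
import re

def append_html(path, pattern):
    """Imitate '/sites/uk/$1.html'; return None when the pattern does not match."""
    m = re.match(pattern, path)
    if m is None:
        return None
    return "/sites/uk/" + m.group(1) + ".html"

# (.*) happily captures the empty string for the index page.
print(append_html("/", r'^/(.*)$'))

# (.+) needs at least one character, so the index page is left untouched.
print(append_html("/", r'^/(.+)$'))
print(append_html("/about", r'^/(.+)$'))
```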

htaccess rewrite allcaps.html to lowercase.html

I've been trying to figure out how to get this to work for a while now and it's not making any sense to me. What I've put together so far forces it to reference the lowercase.html file from /BLAH to /blah.html, but I can't seem to get /BLAH.html to go to /blah.html.
This is only to move SomeCapS.html (or .HTML I guess) to somecaps.html instead.
Any ideas on how to fix this code:
RewriteEngine On
RewriteBase /
# http://www.askapache.com/htaccess/rewrite-uppercase-lowercase.html
# If there are caps, set HASCAPS to true and skip next rule
RewriteRule [A-Z] - [E=HASCAPS:TRUE,S=1]
# Skip this entire section if no uppercase letters in requested URL
RewriteRule ![A-Z] - [S=28]
# Replace single occurrence of CAP with cap, then process next Rule.
RewriteRule ^([^A]*)A(.*)$ $1a$2
RewriteRule ^([^B]*)B(.*)$ $1b$2
RewriteRule ^([^C]*)C(.*)$ $1c$2
RewriteRule ^([^D]*)D(.*)$ $1d$2
RewriteRule ^([^E]*)E(.*)$ $1e$2
RewriteRule ^([^F]*)F(.*)$ $1f$2
RewriteRule ^([^G]*)G(.*)$ $1g$2
RewriteRule ^([^H]*)H(.*)$ $1h$2
RewriteRule ^([^I]*)I(.*)$ $1i$2
RewriteRule ^([^J]*)J(.*)$ $1j$2
RewriteRule ^([^K]*)K(.*)$ $1k$2
RewriteRule ^([^L]*)L(.*)$ $1l$2
RewriteRule ^([^M]*)M(.*)$ $1m$2
RewriteRule ^([^N]*)N(.*)$ $1n$2
RewriteRule ^([^O]*)O(.*)$ $1o$2
RewriteRule ^([^P]*)P(.*)$ $1p$2
RewriteRule ^([^Q]*)Q(.*)$ $1q$2
RewriteRule ^([^R]*)R(.*)$ $1r$2
RewriteRule ^([^S]*)S(.*)$ $1s$2
RewriteRule ^([^T]*)T(.*)$ $1t$2
RewriteRule ^([^U]*)U(.*)$ $1u$2
RewriteRule ^([^V]*)V(.*)$ $1v$2
RewriteRule ^([^W]*)W(.*)$ $1w$2
RewriteRule ^([^X]*)X(.*)$ $1x$2
RewriteRule ^([^Y]*)Y(.*)$ $1y$2
RewriteRule ^([^Z]*)Z(.*)$ $1z$2
# If there are any uppercase letters, restart at very first RewriteRule in file.
RewriteRule [A-Z] - [N]
RewriteCond %{ENV:HASCAPS} TRUE
RewriteRule ^/?(.*)$ /$1.html
Thanks a lot
You need to use the built-in RewriteMap directive, because this approach has a lot of problems. First, the number of internal redirects is capped, at 10 by default (see LimitInternalRecursion). This means you can only redirect internally 10 times, and a URI with 11 upper case letters will cause too many internal redirects, resulting in a 500 server error.
The Apache documentation has a page for RewriteMaps.
But the problem is that rewrite maps must be declared in the vhost/server config, not in the htaccess file. If you don't have access to that, you may need to write a script to do this for you. Something like:
RewriteEngine On
RewriteRule [A-Z] /tolower.php [L]
And in the tolower.php you'd need to look at $_SERVER['REQUEST_URI'], then change all the letters to lower case, then either redirect the browser to the new URL without any upper case letters or, internally load that page and return it on the browser's behalf.
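The logic such a script needs is short. Here it is sketched in Python rather than PHP, just to show the steps: lowercase_target is a hypothetical name, and leaving the query string untouched is my own assumption, since parameter values may be case-sensitive.

```python
from urllib.parse import urlsplit, urlunsplit

def lowercase_target(request_uri):
    """Return the URI with its path lower-cased, or None if nothing changes.

    Only the path is changed; the query string is kept as it was sent.
    """
    parts = urlsplit(request_uri)
    lowered = parts.path.lower()
    if lowered == parts.path:
        return None  # already lower case, no redirect needed
    return urlunsplit((parts.scheme, parts.netloc, lowered, parts.query, parts.fragment))

print(lowercase_target("/SomeCapS.html"))
print(lowercase_target("/BLAH.HTML?Q=1"))
print(lowercase_target("/blah.html"))
```

The returned value would be the redirect target; a None result means the request can be served as it is.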

Is an index file for every page the wrong way to set up a site?

My goal was to prevent the user from having to type in .html in order to access the page they are looking for on our site. On other sites I have left the file name as /pagename.html and the user could type in only /pagename and the page would load. For some reason, that was not possible with our server settings (GoDaddy Plesk Parallel server) so my workaround was to create a folder for every page I wanted and the actual file would be /index.html. My goal was accomplished and now the user doesn't have to include .html to load the page. The problem now is that Google and SEOmoz reports are reading tons of duplicate content. The reason is that the user could type in 3 different things to get to the same page - technically 6 if you include "www":
sitename.com/services
sitename.com/services/
sitename.com/services/index.html
Search engines are displaying it the 2nd way (http://sitename.com/services/) and if you type it without the "/" it redirects to showing it with the "/". SEOmoz is saying I have 301 redirects for each page in order for that to happen but we never manually did that.
I've tried creating an .htaccess file with redirects from sitename.com/services/ to sitename.com/services but the page won't load because of too many redirects.
Did I break some big rules setting it up this way?
Please note that "sitename.com/services/" is just an example of a page and our entire site of 50 pages is set up in this nature. The actual site is http://www.logicalposition.com.
The preferred way is to set up your server to manage the URL handling. If you are on an Apache server, for example, you could use the following suggestion and create/change the .htaccess file to get the desired effect.
http://eisabainyo.net/weblog/2007/08/19/removing-file-extension-via-htaccess/
The most straightforward way is to use Apache's .htaccess (which if I remember correctly GoDaddy allows access to, though I may be wrong) to do redirects.
See this post: https://stackoverflow.com/a/5730126/549346 (mods: possible duplicate?), which directs you to place something like the following in your .htaccess file:
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)\.html$ /$1 [L,R=301]
Firstly it sounds like you haven't done basic leg work to minimize this. You need to decide do you want www.samplesite.com or just samplesite.com? Then you can very easily set this with .htaccess (see this handy tool). This will mean at most you will have three variations, not 6.
I would take #Jassons's suggestion and use URL Handling. Two of my clients currently use GoDaddy and both use this method, so it should be fully supported.
Some more helpful links for URL Handling/htaccess rewrites (although note: setting up 301 redirects takes time, patience and careful monitoring of crawl errors on Web Master Tools, so URL Handling is preferable!)
http://net.tutsplus.com/tutorials/other/using-htaccess-files-for-pretty-urls/
Extreme example, but still relevant :) Handling several thousand redirects with .htaccess
Edit Forcing trailing slash
You can easily force the trailing slash to appear by using the Rewrite rule
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^(.*) $1 [L]
RewriteCond %{REQUEST_URI} !(.*)/$
RewriteRule ^(.*)$ $1/ [L,R=301]
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?category=$1
I think you have already done that in part, but what you will notice is that a 301 redirect header is sent, which means that as spiders visit your site they will update the URL to have the trailing slash. It won't happen overnight. You might be able to use Webmaster Tools to speed things up in terms of changing the URLs.
Source: In part this website, it gives you a good explanation of how it works
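The duplicate content comes from three spellings answering for one page, and the redirects are meant to fold them into a single form. A small Python sketch of that folding follows. canonical_path is a hypothetical helper, the trailing-slash form is chosen because that is what the question says search engines display, and the function assumes every path is a page folder (it would wrongly add a slash to something like /style.css).

```python
def canonical_path(path):
    """Collapse /services, /services/ and /services/index.html to /services/."""
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]  # keep the slash before index.html
    if not path.endswith("/"):
        path += "/"
    return path

for variant in ("/services", "/services/", "/services/index.html"):
    print(variant, "->", canonical_path(variant))
```

A request whose path differs from its canonical form is the one that should get the 301; a request already in canonical form must not be redirected again, otherwise you get the "too many redirects" problem from the question.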

How to avoid double google indexing using .htaccess?

I have a website, with a nice RewriteRule in its root, that redirects all the queries of this kind:
http://domain.com/foo/parameter
into
http://domain.com/index.php?args=parameter
Users can only see the clean URL and everyone is happy.
Now here is the problem: the domain.com DNS has an A record for domain.com, pointing to a private server IP, and an A record for mail.domain.com, pointing to the exact same IP.
For some unknown reason, in the last couple of months, Google double indexed all the pages of my site (http://domain.com/foo/par1, http://domain.com/foo/par2 etc.) with another set with the mail subdomain (http://mail.domain.com/foo/par1, http://mail.domain.com/foo/par2 etc).
I thought I could get rid of all of them by redirecting any request for mail.domain.com/$whatever to domain.com, and eventually Google would understand that all those pages with the 'mail' subdomain redirect to the homepage and are therefore not necessary.
I tried this in .htaccess:
RewriteCond %{HTTP_HOST} ^mail.domain.com$ [NC]
RewriteRule ^(.*)$ http://domain.com [R=301,L]
But this redirects to a visible URL that looks like this: http://domain.com/index.php?args=parameter, while I just want a redirect to the homepage.
What's the correct form, and are there more elegant ways to achieve this, maybe adding something into robots.txt? (Please note that I can't just disallow a subfolder here)
If you just want to redirect to home page by discarding the original REQUEST_URI and QUERY_STRING then use these rules:
RewriteCond %{HTTP_HOST} ^mail.domain.com$ [NC]
RewriteRule ^(.*)$ http://domain.com/? [R=301,L]
Putting ? at the end strips out the original query string, so a URL of this type: http://mail.domain.com/index.php?args=parameter will become http://domain.com/
Your rule is correct, but you need to put it before all the other rules (right after RewriteEngine On) or it will pick up the latest state of the internal rewritten URL.
Update: Hmm, you said that your old rule redirects correctly but is using the internal, ugly, URL. That actually shouldn't be the case unless you add $1 to pick out the matched string.
RewriteCond %{HTTP_HOST} ^mail.domain.com$ [NC]
RewriteRule ^(.*)$ http://domain.com/$1 [R=301,L]
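The two substitutions in this thread differ only in whether the original path survives the redirect. A Python sketch of both follows. redirect_target and its keep_path switch are made up for illustration, and the dots in the host pattern are escaped here, which the rules above do not do.

```python
import re

MAIL_HOST = re.compile(r'^mail\.domain\.com$', re.IGNORECASE)  # like the [NC] condition

def redirect_target(host, path, keep_path=False):
    """Return the 301 target for requests on the mail host, else None."""
    if MAIL_HOST.match(host) is None:
        return None  # condition not met, rule does not fire
    if keep_path:
        # like "http://domain.com/$1": same page on the main host
        return "http://domain.com/" + path.lstrip("/")
    # like "http://domain.com/?": path and query string are both dropped
    return "http://domain.com/"

print(redirect_target("mail.domain.com", "/foo/par1"))
print(redirect_target("mail.domain.com", "/foo/par1", keep_path=True))
print(redirect_target("domain.com", "/foo/par1"))
```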