Is there any possibility to remove ? and = in html? - html

I am trying to set up pretty URLs for my HTML page. I found many answers, but they mostly relate to PHP. I need to convert the link below,
http://localhost/blog.html?id=1
to
http://localhost/blog/1
I already have this .htaccess file for removing the .html extension:
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^\.]+)$ $1.html [NC,L]
Help me to change my url parameters.

The examples you find are perfectly valid for you, since this is independent of any higher level logic like php. These rules operate on the level of the http server.
Anyway, here is a rule to get you started:
RewriteEngine on
# Internally serve /blog/1 from blog.html?id=1; the address bar keeps showing /blog/1
RewriteRule ^/?blog/(\d+)/?$ blog.html?id=$1 [END]
If you get an http status 500 ("internal server error") using this, then chances are that you operate a very old version of the apache http server. In that case you will find a hint about an unsupported [END] flag in your http server's error log file. Try replacing the END flag with the older L flag; that will probably work the same, though this depends a bit on your setup.
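To make the mapping the asker wants concrete, here is a minimal Python sketch of the pattern matching involved. It is only an illustration of what such a regex captures, not how Apache is implemented; the pattern and the function name `to_internal` are assumptions made for this example.

```python
import re

# Hypothetical pattern: optional leading slash, "blog/", a numeric id,
# and an optional trailing slash.
PRETTY_BLOG = re.compile(r"^/?blog/(\d+)/?$")

def to_internal(path):
    """Return the internal target for a pretty blog URL, or None if it does not match."""
    match = PRETTY_BLOG.match(path)
    if match is None:
        return None
    # The captured id plays the role of $1 in a RewriteRule substitution.
    return "blog.html?id=" + match.group(1)

print(to_internal("/blog/1"))
print(to_internal("/blog/abc"))
```

A non-numeric segment such as /blog/abc does not match, so a request like that would fall through to whatever other rules are in place.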

Related

Trying to Use .Htaccess to 301 redirect all pages but one. However The One Page Exception Rule is Not Working

I have been trying to redirect all of my website's pages to a new website but would like to rule out one single page as an exception. This is the code I am using:
RewriteEngine On
RewriteCond %{REQUEST_URI} !^/en/planning/$
RewriteRule .* https://www.target.example/ [R=301,L]
As you can see, I am trying to redirect all pages to a new domain but leave the /en/planning/ page intact. However, when I use the code above, all pages are redirected without exception. On the server, I found out that the /en/planning directory does not really exist. The template for the page, however, exists in a different directory.
They are here > /home/indo/src-20220316-200538/apps/front/templates/planning/views/planning-view.html.
The header & footer of the page was built in a different directory.
Meanwhile the public_html of the website lies on /home/indo/www/
In this directory, there is a shortcut to the original location that is named "front".
So, based on this, what is the best way to make /en/planning/ an exception? The website I am trying to redirect is http://source.example/ to https://www.target.example/. In addition, the website is running with Fat-Free Framework.
I have been stuck here for weeks and this is frustrating.
RewriteCond %{REQUEST_URI} !^/en/planning/$
RewriteRule .* https://www.target.example/ [R=301,L]
You no doubt have other directives (a front-controller pattern) that rewrite URLs of the form /en/planning/ to the front-controller, which performs the underlying routing. Your front-controller might be index.php, or something else.
The "problem" here is that when the request is rewritten to the front-controller, the REQUEST_URI server variable is no longer /en/planning/ but is updated to /index.php (or whatever the front-controller is), so the redirect occurs because the negated condition is now successful. The rewrite engine makes multiple passes, and the undesirable redirect is likely occurring on the second pass (the exception is successful initially).
You need to ensure that you only check the originally requested URL and not the rewritten URL.
However, you also likely need to make an exception for any static resources (images, CSS, JS, etc) that are used by this page, otherwise these would also be redirected.
Try the following instead:
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{REQUEST_URI} !^/en/planning/$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^ https://www.target.example/ [R=301,L]
The check against the REDIRECT_STATUS environment variable ensures that we only test the initial request from the client and not the rewritten request by Apache.
The additional check against REQUEST_FILENAME ensures that the requested URL does not map to an actual file (a static resource). However, the obvious downside of this is that static resources (for other pages) are not redirected.
You also need to make sure the browser cache is cleared, since the erroneous 301 (permanent) redirect will have been cached by the browser. Test first with 302 (temporary) redirects to avoid potential caching issues.
I would add, however, that a many-to-one redirect to the homepage, as you are implementing here, is generally bad for SEO, since search engines (particularly Google) will see this as a soft-404 and not honour the redirect, ultimately dropping the pages from the search results.
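The two-pass behaviour described in this answer can be illustrated with a small Python model. This is a simplified sketch, not Apache's actual processing: the front-controller path `/index.php` is a placeholder (the answer itself says it could be something else), and the model assumes REDIRECT_STATUS is empty on the client's initial request and set (here to "200") once Apache has rewritten the request internally.

```python
FRONT_CONTROLLER = "/index.php"  # hypothetical front-controller path

def naive_redirects(request_uri):
    """Original rule: redirect unless REQUEST_URI is exactly /en/planning/."""
    return request_uri != "/en/planning/"

def guarded_redirects(request_uri, redirect_status, is_existing_file):
    """Suggested rule: all three conditions must hold for the redirect to fire."""
    return (
        redirect_status == ""               # only the client's initial request
        and request_uri != "/en/planning/"  # the one-page exception
        and not is_existing_file            # leave static resources alone
    )

# Simplified model of a request for /en/planning/:
# pass 1 sees the original URL, pass 2 sees the rewritten front-controller URL.
passes = [
    ("/en/planning/", ""),
    (FRONT_CONTROLLER, "200"),
]

# is_existing_file is passed as False to show that the status check alone
# already stops the redirect on the second pass.
naive = [naive_redirects(uri) for uri, _ in passes]
guarded = [guarded_redirects(uri, status, False) for uri, status in passes]

print(naive)    # redirect decision per pass for the original rule
print(guarded)  # redirect decision per pass for the suggested rule
```

In this model the original rule fires on the second pass, while the guarded version never fires for /en/planning/ but still redirects other initial requests.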

Prevent direct URL access with .htaccess

I know this question has been asked before several times, but I couldn't find the right answer.
I would like to allow search engines and some referrers to access a certain URL, without allowing direct URL access.
You can't reach this domain by clicking the link to
2betist.umran.org
When you search this domain on Google, you can reach the website by clicking the search result link, but accessing it directly via URL or via referrer doesn't work. I would like to create a white list on .htaccess for some of the referrers, along with the Google and Bing search engines.
I hope I describe the problem clearly enough. Thanks!
I have googled and found https://support.acquia.com/hc/en-us/articles/360005257234-Introduction-to-htaccess-rewrite-rules which states:
The .htaccess file controls a number of ways that a website can be accessed, blocked, and redirected. It does this using a series of one or more rewrite rules. These rewrites are made possible by Apache's mod_rewrite module.
mod_rewrite provides a way to modify incoming URL requests, dynamically, based on regular expression rules. This allows you to map arbitrary URLs onto your internal URL structure in any way you like.
The basic formulation of any .htaccess rewrite rule includes setting a combination of rewrite condition (RewriteCond) tests along with a corresponding rule (RewriteRule) if the prior conditions pass. In most cases, these rules should be placed at any point after the RewriteEngine on line in the .htaccess file located in the website's docroot.
The keys are htaccess rewrite conditions and rules.
Another website provided an example for a .htaccess file to disallow certain referrers.
RewriteCond %{QUERY_STRING} / [OR]
RewriteCond %{HTTP_REFERER} \.semalt\.com [OR,NC]
RewriteCond %{HTTP_REFERER} best-seo-solution\.com [OR,NC]
RewriteCond %{HTTP_REFERER} best-seo-offer\.com [OR,NC]
...
RewriteCond %{HTTP_USER_AGENT} OrgProbe [OR,NC]
RewriteCond %{HTTP_USER_AGENT} Majestic [NC]
RewriteRule ^.*$ - [F,L]
Here you can find the complete Apache documentation.
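The example above is a blacklist, whereas the question asks for a whitelist. The decision logic for a whitelist can be sketched in Python as follows. The host list and the function name `referer_allowed` are hypothetical, and the sketch only shows the matching logic, not an .htaccess rule. Keep in mind that the Referer header is sent by the client and can be forged or omitted.

```python
from urllib.parse import urlparse

# Hypothetical whitelist: search engines plus one made-up partner site.
ALLOWED_HOST_SUFFIXES = ("google.com", "bing.com", "partner.example")

def referer_allowed(referer):
    """Return True only if the Referer header points at a whitelisted host."""
    if not referer:
        return False  # no referrer at all, i.e. direct URL access
    host = (urlparse(referer).hostname or "").lower()
    # Accept the bare domain or any subdomain of it, but not look-alike hosts.
    return any(
        host == suffix or host.endswith("." + suffix)
        for suffix in ALLOWED_HOST_SUFFIXES
    )

print(referer_allowed("https://www.google.com/search?q=example"))
print(referer_allowed(""))
```

Matching on the parsed host name rather than on a substring avoids accepting a host such as evilgoogle.com.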

Use htaccess to fix misspelled urls

So I have a pretty simple problem (at least I think so) with my website. I need to be able to redirect any misspelled URLs to the correct ones. It's easier to show you an example than to describe it.
For example, let's take this url.
http://www.tomshardware.com/reviews/radeon-r9-290x-hawaii-review,3650.html
Now, that url will take you to the correct page of that article regardless of how the url is spelled. Say you accidentally place a letter, number or a word into that URL to something like this:
http://www.tomshardware.com/reviews/radeon-r9-290x-TEST-TEST-hawaii-review,3650.html
That url will still take you to the correct article and fix itself to the correct URL. You could add anything to that URL and it will still take you to the right article regardless what you accidentally type into it.
So my question is how do I do this in htaccess? This is my current htaccess file
# Secure htaccess file
<files .htaccess>
order allow,deny
deny from all
</files>
AddHandler application/x-httpd-php5 .html .htm
AddType application/x-httpd-php .html .htm .php
AddHandler cgi-script .pl .cgi
Options ALL -Indexes -Multiviews +ExecCGI +FollowSymLinks
# Do not remove this line, otherwise mod_rewrite rules will stop working
RewriteBase /
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html
#Redirect Non-WWW to WWW
RewriteCond %{HTTP_HOST} !^www\.
RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]
RewriteCond %{REQUEST_URI} /index\.html?$ [NC]
RewriteRule ^(.*)index\.html?$ "/$1" [NC,R=301,NE,L]
You probably can't do that in that way.
As you can observe, the text on the url is totally irrelevant and is only there to create readable and index-friendly (SEO) urls. Those words are called "slugs", see http://en.wikipedia.org/wiki/Clean_URL#Slug
If you modify the last part, the 3650, it will break the url, because this is the only identifier and it typically corresponds to a unique ID in the database.
Assumption on how and why the mentioned site does this:
The site uses either a standalone routing component (e.g. Routing from the Symfony PHP framework: http://symfony.com/components/Routing), an entire web framework, or everything is written by hand. Depending on the language it might be Zend, Symfony, etc. for PHP, ASP.NET MVC for .NET, or any other.
In all cases there is some sort of filtering of urls before the original content is served.
The routing parses the url, retrieves the unique ID, fetches the data set and creates again an absolute URL out of it.
It then compares the freshly generated route with the one you have entered.
If they don't match the framework issues a http status of 30x and redirects you to the new url.
The purpose of that is to maintain link sanity when the slug tags have changed or, for whatever reason, the SEO-friendly url layout has changed.
The redirect is there so the old-fashioned urls are updated the next time a search engine visits the page and updates its index.
Imagine you have a typo somewhere in the slugs or you forgot to mention Radeon and you want to avoid having it forever broken or wrong in the DB.
So you need to fix it but at the same time you want to avoid breaking the old urls for search indexes which have not yet revisited your site with the new slugs or users that have bookmarked it.
After the redirect it again compares the urls and after they match the content is served.
A DB lookup is very likely here and you cannot do this properly with htaccess alone as you have no knowledge about correctness of the url here.
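The routing steps described above (parse the url, look up the ID, regenerate the url, compare, redirect or serve) can be sketched in Python. This is a guess at the general technique, not the mentioned site's actual code: the in-memory `ARTICLES` dictionary stands in for the database, the URL layout is copied from the example url, and the choice of 301 for the "30x" status is an assumption.

```python
import re

# Hypothetical stand-in for the database: unique ID -> current slug.
ARTICLES = {3650: "radeon-r9-290x-hawaii-review"}

# URL layout taken from the example: /reviews/<slug>,<id>.html
URL_PATTERN = re.compile(r"^/reviews/(?P<slug>[^,/]*),(?P<id>\d+)\.html$")

def canonical_url(article_id):
    """Rebuild the url from the stored slug and the ID."""
    return "/reviews/%s,%d.html" % (ARTICLES[article_id], article_id)

def route(path):
    """Return (status, url): 200 to serve, 301 to redirect, 404 if unknown."""
    match = URL_PATTERN.match(path)
    if match is None:
        return (404, None)
    article_id = int(match.group("id"))
    if article_id not in ARTICLES:
        return (404, None)
    canonical = canonical_url(article_id)
    if path != canonical:
        return (301, canonical)  # slug is wrong or outdated: redirect
    return (200, canonical)      # urls match: serve the content

print(route("/reviews/radeon-r9-290x-TEST-TEST-hawaii-review,3650.html"))
print(route("/reviews/radeon-r9-290x-hawaii-review,3650.html"))
```

Only the ID is used for the lookup, which is why the slug can be misspelled freely while a changed ID leads to a different article or a 404.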
You would internally rewrite all article pages to a php program, which matches the parameters against the best possible page to show:
-- .htaccess --
RewriteEngine on
RewriteRule ^article/(.*)\.html$ /article.php?url=$1 [L]
-- php --
<?php
// Read the article selection criteria passed along by the rewrite rule
$article_url = isset($_GET['url']) ? $_GET['url'] : '';
// Search through the database or files and show the best-matching article

Is an index file for every page the wrong way to set up a site?

My goal was to prevent the user from having to type in .html in order to access the page they are looking for on our site. On other sites I have left the file name as /pagename.html and the user could type in only /pagename and the page would load. For some reason, that was not possible with our server settings (GoDaddy Plesk Parallel server) so my workaround was to create a folder for every page I wanted and the actual file would be /index.html. My goal was accomplished and now the user doesn't have to include .html to load the page. The problem now is that Google and SEOmoz reports are reading tons of duplicate content. The reason is that the user could type in 3 different things to get to the same page - technically 6 if you include "www":
sitename.com/services
sitename.com/services/
sitename.com/services/index.html
Search engines are displaying it the 2nd way (http://sitename.com/services/) and if you type it without the "/" it redirects to showing it with the "/". SEOmoz is saying I have 301 redirects for each page in order for that to happen but we never manually did that.
I've tried creating an .htaccess file with redirects from sitename.com/services/ to sitename.com/services but the page won't load because of too many redirects.
Did I break some big rules setting it up this way?
Please note that "sitename.com/services/" is just an example of a page and our entire site of 50 pages is set up in this nature. The actual site is http://www.logicalposition.com.
The preferred way is to set up your server to manage the URL handling. If you are on an Apache server, for example, you could use the following suggestion and create/change the .htaccess file to get the desired effect.
http://eisabainyo.net/weblog/2007/08/19/removing-file-extension-via-htaccess/
The most straightforward way is to use Apache's .htaccess (which if I remember correctly GoDaddy allows access to, though I may be wrong) to do redirects.
See this post: https://stackoverflow.com/a/5730126/549346 (mods: possible duplicate?), which directs you to place something like the following in your .htaccess file:
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)\.html$ /$1 [L,R=301]
Firstly, it sounds like you haven't done the basic leg work to minimize this. You need to decide whether you want www.samplesite.com or just samplesite.com. Then you can very easily set this with .htaccess (see this handy tool). This will mean at most you will have three variations, not six.
I would take @Jassons's suggestion and use URL Handling. Two of my clients currently use GoDaddy and both use this method, so it should be fully supported.
Some more helpful links for URL Handling/htaccess rewrites (although note: setting up 301 redirects takes time, patience and careful monitoring of crawl errors in Webmaster Tools, so URL Handling is preferable!)
http://net.tutsplus.com/tutorials/other/using-htaccess-files-for-pretty-urls/
Extreme example, but still relevant :) Handling several thousand redirects with .htaccess
Edit: Forcing trailing slash
You can easily force the trailing slash to appear by using the Rewrite rule
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^(.*) $1 [L]
RewriteCond %{REQUEST_URI} !(.*)/$
RewriteRule ^(.*)$ $1/ [L,R=301]
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?category=$1
I think you have already done that in part, but what you will notice is that a 301 redirect header is sent, which means that as spiders visit your site they will update the URL to have the trailing slash. It won't happen overnight. You might be able to use Webmaster Tools to speed things up in terms of changing the URLs.
Source: In part this website, which gives you a good explanation of how it works
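To make the duplicate-content concern from the question concrete, here is a minimal Python sketch that collapses the three variants onto one canonical path. It assumes the trailing-slash form is chosen as canonical (the form the question says search engines already display); the function name `canonical_path` is made up for this example, and it only models the decision, not the server configuration.

```python
def canonical_path(path):
    """Map /services, /services/ and /services/index.html to /services/."""
    # Strip an explicit index.html so the directory form remains.
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    # Leave anything that looks like a file (has an extension) untouched.
    last_segment = path.rsplit("/", 1)[-1]
    if "." in last_segment:
        return path
    # Force the trailing slash on directory-style paths.
    if not path.endswith("/"):
        path += "/"
    return path

for variant in ("/services", "/services/", "/services/index.html"):
    print(variant, "->", canonical_path(variant))
```

Whichever form is chosen, every other variant should redirect to it in a single step and the canonical form itself should never redirect. Redirecting /services/ to /services while the server redirects /services back to /services/ produces the "too many redirects" error mentioned in the question.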

Rewriteengine in .htaccess to catch files not ending in html

I'd like to use mod_rewrite to convert web page addresses like /directory to /directory/index.html, in a standard LAMP hosting situation. What I have works for addresses that end in a slash. I can't find a way to handle addresses that don't end in a slash.
What seems like it should work is:
# addresses ending in /
rewriterule ^(.*)/$ $1/index.html [L]
# where the problem is
rewriterule ^(.*(?!html))$ $1/index.html [L]
But the second line causes a 500 server error. If I add a single letter x to the second line:
rewriterule ^(.*)/$ $1/index.html [L]
rewriterule ^(.*x(?!html))$ $1/index.html [L]
It starts to work, but only for directory names that end in an x. I have tried replacing the x with many different things. Anything more complicated than real characters (like [^x] or .+) gives a 500 server error.
And, to satisfy my own curiosity, does anyone know why the addition of a single real letter makes the difference between a server error and a perfectly functioning rule?
[Accepted Answer] Thanks to Gumbo I was able to approximate a solution using rewritecond:
rewritecond %{REQUEST_URI} !\.[^/]+$
rewriterule (.+) $1/index.html [L]
This works, but filters more than just .html -- it could block other pages. Unfortunately,
rewritecond %{REQUEST_URI} !\.html$
results in a server error:
Request exceeded the limit of 10 internal redirects due to probable configuration error. Use 'LimitInternalRecursion' to increase the limit if necessary.
I'd still like to know why:
rewriterule ^(.*(?!html))$ $1/index.html [L]
results in a loop. The first half is supposed to check if it doesn't end in .html. Since the second half adds .html, it seems like the functional equivalent of:
while(substr($address,-4)!='html') $address.='html'
Obviously I'm missing something.
Use a RewriteCond directive to check whether the URL path does not end with a .html:
RewriteCond %{REQUEST_URI} !\.html$
RewriteRule ^(.*[^/])?/?$ $1/index.html [L]
Edit: You're using a look-ahead assertion ((?!…)). But there isn't anything after .* (only a $). So try a look-behind assertion instead:
RewriteRule ^.*$(?<!html) $0/index.html [L]
But note that you probably need Apache 2.2 to use these assertions.
Well, for actually making it work, you could just use a negative lookbehind instead of a lookahead:
RewriteRule ^(.*)(?<!html)$ $1/index.html [L]
I'm not sure offhand why adding the 'x' makes it work, I'll edit if I figure it out.
For why adding the x makes it work:
If the replacement will match the regex, the RewriteRule will be applied again. As an example, this causes an error:
RewriteRule ^(.*)$ $1.rb
because it would replace script with script.rb. That matches the regex, so it replaces script.rb with script.rb.rb, again and again...
This is hinted at in the error log:
Request exceeded the limit of 10 internal redirects due to probable configuration error. Use 'LimitInternalRecursion' to increase the limit if necessary.
In your example, you add index.html to the end. When there is an x at the end of the regex, then it won't match your replacement, which ends in an l.
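The difference between the three patterns discussed in this thread can be checked with a small Python model. It is a simplification of the rewrite engine: the rule is simply re-applied to its own output until it stops matching or a limit of 10 passes is reached (mirroring the limit named in the error message). The helper name `passes_until_stable` is made up for this sketch, and Python's `re` module is assumed to behave like Apache's PCRE for these particular patterns.

```python
import re

LOOKAHEAD = re.compile(r"^(.*(?!html))$")    # the rule from the question
WITH_X = re.compile(r"^(.*x(?!html))$")      # the variant with the extra x
LOOKBEHIND = re.compile(r"^(.*)(?<!html)$")  # the look-behind fix

def passes_until_stable(pattern, path, limit=10):
    """Re-apply the rule until it no longer matches.

    Returns the number of rewrites performed, or None if the limit is hit.
    """
    for rewrites in range(limit):
        match = pattern.match(path)
        if match is None:
            return rewrites
        path = match.group(1) + "/index.html"
    return None

print(passes_until_stable(LOOKAHEAD, "directory"))   # never stabilises
print(passes_until_stable(WITH_X, "dirx"))           # stops after one rewrite
print(passes_until_stable(LOOKBEHIND, "directory"))  # stops after one rewrite
```

The look-ahead sits at the very end of the string, where nothing follows, so it can never see "html" and the pattern matches every path, including its own output. The x variant and the look-behind variant both reject a path ending in index.html, which is why they stop after the first rewrite.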