htaccess: AJAX calls and hiding HTML files from search engines

I developed a simple website that uses AJAX calls to load HTML files for display.
The issue is that search engines like Google find the HTML files themselves, and when someone clicks one of those links they see a bare HTML file, out of the context of the website.
I'm guessing I have some work to do with the .htaccess file to handle this, but how?
My idea is to use an .htaccess RewriteRule to send any request for a .html file to the main index.php file.
Something like this:
- www.my.com/team.html will call www.my.com/index.php, which will detect team.html in the query string and then load the corresponding HTML file via AJAX.
How is that possible, please?

Well-behaved search engines generally respect the rules set in the robots.txt file. If you do not want the files in a particular folder of your site to be crawled, robots.txt can instruct search engines to stay out of it.
If you wish to learn about it, here is the link: http://www.robotstxt.org/robotstxt.html
See http://www.google.co.in/robots.txt for an example.
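As for the rewrite the question itself describes, a minimal sketch could look like the following. It assumes mod_rewrite is enabled, that the .htaccess file sits in the document root, and that the site's own AJAX requests send the X-Requested-With: XMLHttpRequest header (jQuery does this by default; plain fetch or XMLHttpRequest calls do not unless you add it). The page parameter name is hypothetical; index.php would have to read it and load the matching file.

```apache
RewriteEngine On

# Let the site's own AJAX requests fetch the raw .html fragments.
# Assumption: those requests carry X-Requested-With: XMLHttpRequest.
RewriteCond %{HTTP:X-Requested-With} !=XMLHttpRequest

# Any other request for /name.html is answered by index.php instead.
# "page" is a hypothetical parameter name; team.html arrives as page=team.html.
RewriteRule ^(.+\.html)$ /index.php?page=$1 [L,QSA]
```

Because there is no R flag the rewrite is internal, so a visitor arriving from a search result keeps seeing /team.html in the address bar while index.php builds the full page.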

Related

How do I make clean URLs (no file extensions) and also redirect from .html to .shtml at the same time without changing all my html code?

I wanted to use file extensions within the question body to make it as clear as possible, but the system kept throwing code errors at me even though I hadn't used anything like code.
I have numerous pages that comprise a section of my website. Let's, for example, call the main page:
http://www.articles.mysite.com/
Within that, let's say some of the html files are:
"10things"
"extras"
"t7n"
"i2"
Essentially, the file names tie into what they contain, but they don't all follow the same name pattern for whatever reason (some are just letters and some are numbers and letters together, for instance).
What I want to do now is upgrade these files so I can use server-side includes (SSI) as I do on other pages of my website. However, I'm running into a couple of issues.
The URLs aren't clean (they have file extensions) and the same is true of links I've posted to social media, for instance. I'd like the resultant URL the user sees to not show these file extensions, partly for SEO and partly just to make it look less cluttered.
When I've tried upgrading the files by just changing their names, the links on my end appeared to work, but when using one of the social media links, I kept getting 404 errors so I started from scratch and kept trying to resolve the issues on my own. Unfortunately this hasn't worked and I'm now back to square one, with the links currently working with standard files.
To reiterate, I'd like the following to occur:
User clicks a link, whether directly on my site or on a social media site, that takes them to a page on my own website.
Even if the link is one of the old ones, the user is silently redirected to the new version of the page, with a clean URL that does not include any extension for better readability and SEO purposes.
All this should ideally be able to happen without me needing to change the index files that store the links, only renaming the html file extensions.
The only two pieces of information that might be of help if I can figure out how to combine them are as follows:
This introduction to redirects, which references mapping file types as part of redirect matching with the same path and filename:
Could this be modified, changing the extensions used, to map the requests to the new renamed files from the old extension?
This previous question from Stack Exchange about rewriting and redirecting at the same time which talks about cleaning up extensions:
Could this be combined with the redirection in the previous question to make a clean and easy method of redirecting the user, cleaning up the extension and making it look as if nothing's changed with a file name being all that's required other than the above code?
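You can use this code to serve your .html URLs from the renamed .shtml files without changing any line of your existing code. Note that this is an internal rewrite rather than an external redirect, so the browser keeps showing the .html URL.
# Internally rewrite .html requests to the matching .shtml file
RewriteEngine On
RewriteRule ^(.*)\.html$ $1.shtml [L]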
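To also get extension-free URLs, a redirect and a rewrite can be combined. The following is an untested sketch, assuming mod_rewrite is enabled, the .htaccess file sits in the document root of the site, and the files have already been renamed to .shtml:

```apache
RewriteEngine On

# 1. A browser that asks for /name.html or /name.shtml is redirected (301)
#    to the clean URL /name. THE_REQUEST holds the original request line,
#    so the internal rewrite in step 2 cannot trigger this rule again.
RewriteCond %{THE_REQUEST} \s/+(.+?)\.s?html[\s?]
RewriteRule ^ /%1 [R=301,L]

# 2. A clean URL /name is served internally from name.shtml,
#    but only when that file actually exists.
RewriteCond %{DOCUMENT_ROOT}/$1.shtml -f
RewriteRule ^(.+)$ $1.shtml [L]
```

With this in place, an old link such as /10things.html is redirected to /10things, which is then served from 10things.shtml; pages that still link to the .html names need no edits.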

Multiple index.html files inside folder structure

I just came up with an idea. Instead of using an .htaccess file to remove .html from the URL, why not just use a simple folder structure and in each folder add an index.html?
For instance:
example.com/index.html → Home
example.com/about/index.html → About
Now simply use a hyperlink on the homepage to the about folder, since typically index.html files are opened automatically.
The upside of this kind of navigation, is that it would be easily possible to create sub pages with no crazy database / .htaccess setup.
Now my question is: is there any reason not to create a webpage like that and is it legitimate to use multiple index.html files?
I appreciate all the help.
With the index.html route, there would be three URLs that can access the same page. For example, for an about page:
www.yourwebsite.com/about
www.yourwebsite.com/about/
www.yourwebsite.com/about/index.html
Using the .htaccess file would likely give you more benefit from an SEO perspective. You can tell the search engine which one to use, by using 301 redirects. See more about how Google does this here:
https://webmasters.googleblog.com/2010/04/to-slash-or-not-to-slash.html
Of course you could set up your website using folders and index.html files, and still use the .htaccess file to take care of the SEO. But, depending on your site's size and structure, this might be more work.
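A sketch of the 301 approach for the folder layout, assuming mod_rewrite is enabled and the .htaccess file is in the document root. Apache's mod_dir already redirects /about to /about/ by default (DirectorySlash On), so only the explicit index.html form needs a rule:

```apache
RewriteEngine On

# Redirect /about/index.html to /about/ and /index.html to /.
# THE_REQUEST is the original request line, so the index.html that
# Apache picks internally for /about/ does not trigger this rule.
RewriteCond %{THE_REQUEST} \s/+(.*/)?index\.html[\s?]
RewriteRule ^ /%1 [R=301,L]
```

All three variants then end up at the single trailing-slash URL, which is the one search engines will see.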
The only downsides would be having to create a folder in addition to a file whenever you want to create a new page, and having to take more time to navigate into a folder in order to edit a page.
As long as you are using Apache, or a similar server software, multiple index files will function normally and be served from each folder.

Use history.pushState while ignoring/bypassing .htaccess (alternative solution welcome)

What I'm trying to do is have a single dynamic file that takes a parameter to affect content (/view.html?k=about) but uses history.pushState to change the URL to something more user-friendly (ki/about). In addition, any time an AJAX call is made on content.html to load new content, it updates the URL accordingly (e.g. if products are loaded via AJAX, change the URL to keywords/products).
My current solution is that any path requested from ki is redirected via .htaccess to the view.html page. view.html then uses history.pushState to change the URL. As links are clicked, the URL updates. The problem with this, however, is that it causes an infinite loop.
Here is my .htaccess file, residing in the /ki/ folder.
RewriteEngine on
RewriteRule ^(.*)$ /concept/view.html?k=$1 [R=permanent,L]
What can I do to get my desired result? If there's a way to achieve the same thing without the .htaccess file then that's acceptable too.
You will want to rewrite any URL of the form ki/about to /view.html?k=about behind the scenes. That way, URLs can be shared without producing 404 pages.
history.pushState is really only meant for web applications like Spotify that do not reload the page but where it still makes sense for the back button to work.
I have not tested this, but I am sure you want something like the following. It assumes the rule lives in the .htaccess of the document root; the parameter is k, matching the question, and the trailing slash is optional so that ki/about matches:
RewriteRule ^ki/([A-Za-z]+)/?$ /view.html?k=$1 [L]
If the user types in the ugly url, they will still get to the same page no problem. But the pretty urls will direct users to the right webpage.
For more info you can go here.
http://www.yourhtmlsource.com/sitemanagement/urlrewriting.html
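If the rule stays in the .htaccess inside /ki/, as in the question, the main change is to drop the R=permanent flag so that the rewrite is internal instead of an external redirect. A sketch, untested and assuming view.html really lives at /concept/view.html as the question's own rule suggests:

```apache
# .htaccess inside the /ki/ folder
RewriteEngine On

# In a per-directory file the pattern is matched against the path
# relative to that folder, so a request for ki/about is seen here as "about".
# No R flag: the browser keeps showing ki/about while view.html is served.
RewriteRule ^([A-Za-z]+)/?$ /concept/view.html?k=$1 [L]
```

With an internal rewrite the k parameter exists only on the server, so any script in view.html would have to read the keyword from location.pathname rather than from the query string.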

Joomla HTML validation

I am trying to validate the home page of my Joomla site. The issue is that the site runs on my localhost, so I cannot simply paste the URL into http://validator.w3.org/ to validate it.
My next thought was to open the index page in my browser, run Firebug to access the source code, and then copy and paste the code into the validator.
This seemed to work okay; however, when the validator returns errors, I don't know where to find the HTML in order to correct them.
Thoughts?
If you do not know much about the way Joomla works, you will have to learn about the file locations.
Normally, most changes should be made in the index.php file located in your template folder (root/templates/name_of_your_template/index.php).
If the markup you need to change isn't in this file, look at the module, component or plugin files that output the errors; that is where it gets more involved.
If there is a template override for the module/component/plugin you need to modify, the files should be located in root/templates/name_of_your_template/html/name_of_the_module_component_or_plugin
If there is no template override for the module/component/plugin that outputs the error, you will have to learn about template overrides.
Depending on your browser, there are many extensions that will validate pages that are not publicly accessible (i.e. localhost).
In Google Chrome - https://chrome.google.com/webstore/detail/html-validator/cgndfbhngibokieehnjhbjkkhbfmhojo?hl=en
Once it's installed, click on the icon and validate local page.

Load same page with different URLs?

I want to create a single page such as this:
http://www.mywebsite.com/special/index.html
But anything in the /special/ folder should be able to load the index.html page. For example, if you go to
http://www.mywebsite.com/special/another-page.html
It should still load the index page but not change the URL in the browser or to search engines. Basically, you should be able to go to any page in the /special/ folder, keep the URL the same as you enter, but always load the index.html page. Any ideas?
A 404 or 301 redirect wouldn't work because that changes the URL in the browser and to search engines...
Thanks in advance!
A 404 redirect would not help, but a custom 404 handler would:
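error404.php:
<?php
// Assumption: the page should count as found, so override the 404 status
// that would otherwise be sent to browsers and search engines.
http_response_code(200);
include('path/to/special/index.html');
?>
This assumes index.html is a static or PHP page. If it is something else, use the equivalent construct of that environment.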
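To wire this up, Apache has to be told to use the script for missing pages. A one-line sketch, assuming error404.php sits in the document root and the server allows ErrorDocument in .htaccess files (AllowOverride FileInfo):

```apache
# .htaccess inside /special/
# Any URL under /special/ that does not exist on disk is answered by the script.
ErrorDocument 404 /error404.php
```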
Using Apache URL mapping it should be possible. I don't know exactly how to do that, but this doc http://httpd.apache.org/docs/2.0/urlmapping.html probably has the answer.
It is possible to use patterns to map URL to filesystem locations.
I think (untested) something like this would work in an .htaccess file in the special directory if you have the ability to use rewrite rules:
RewriteRule ^.*$ index.html
You would have to symlink index.html in the special directory to the real index.html.
The ^.*$ just means (beginning of line)(any amount of anything)(end of line) - basically a wildcard; there might be a better way of writing it.
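A slightly fuller version of that idea, as an untested sketch. It assumes mod_rewrite is enabled, that this is the .htaccess file inside /special/, and that index.html actually exists in that folder:

```apache
RewriteEngine On

# Let real files (index.html itself, images, stylesheets) through untouched.
RewriteCond %{REQUEST_FILENAME} !-f

# Everything else under /special/ is answered with index.html.
# No R flag, so the URL in the browser and for search engines stays as entered.
RewriteRule ^ /special/index.html [L]
```

A request for /special/another-page.html then returns the contents of index.html with a normal 200 status, while the address bar still shows another-page.html.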