How do I direct all traffic or searches going to duplicate (or similar URLs) to one URL on our website? - duplicates

I'll try to keep this as simple as possible, as I don't fully understand how to frame the question myself.
A report on our website indicates duplicate meta titles and descriptions for URLs that look almost exactly like the following (I have used an example domain):
http://example.com/green
https://example.com/green
http://www.example.com/green
https://www.example.com/green
But, only one of these actually exists as an HTML file on our server, which is:
https://www.example.com/green
As I understand it, I need to tell Google and other search engines which of these URLs is correct, and this should be done by specifying a 'canonical' link or URL.
My problem is that the canonical reference must apparently be added to the duplicate pages rather than to the main canonical page. But we don't have any other pages beyond the one mentioned just above, so there is nowhere to set these rel canonical references.
I'm sure there must be a simple explanation for this that I am completely missing?

So it turns out that these duplicate URLs occur because our website is served from the www subdomain of our domain. Any traffic that arrives at example.com (our domain), or over plain http, needs a permanent (301) redirect to https://www.example.com, set up with a redirect rule in the .htaccess file.
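
The mapping such a redirect performs can be sketched in a few lines of Python. This is only an illustration of the logic, not the .htaccess rule itself; the constant names and the canonical_redirect function are made up for this example, and it assumes https plus the www host is the version you want to keep.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_SCHEME = "https"          # assumption: HTTPS is the preferred scheme
CANONICAL_HOST = "www.example.com"  # assumption: the www host is the canonical one


def canonical_redirect(url):
    """Return the URL to 301-redirect to, or None if url is already canonical."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Only handle the bare domain and its www variant; leave other hosts alone.
    if host not in ("example.com", CANONICAL_HOST):
        return None
    if parts.scheme == CANONICAL_SCHEME and host == CANONICAL_HOST:
        return None
    return urlunsplit(
        (CANONICAL_SCHEME, CANONICAL_HOST, parts.path or "/", parts.query, parts.fragment)
    )


for url in (
    "http://example.com/green",
    "https://example.com/green",
    "http://www.example.com/green",
    "https://www.example.com/green",
):
    print(url, "->", canonical_redirect(url))
```

All three non-canonical variants map to the one URL that really exists, which is what removes the duplicates from the report.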

Related

How to deal with redirects in a Wikipedia dump?

I have successfully imported the enwiki-latest-pages-articles-multistream.XML file into MySQL using this guide.
When I look up the text for a page (process described here), it will often be #REDIRECT [[some_page_name]]. The only way I know of to follow this redirect is by searching through all page titles for some_page_name. Not only is this time-consuming, but sometimes there are multiple articles under the exact same title!
I'm considering just removing all redirect pages from the database.
But before I do, is there a better way to handle these redirects?
As I understand it, you want to determine the target of the redirect, right? If so, you can get it using this query:
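-- Use = rather than LIKE: "_" is a single-character wildcard in LIKE patterns.
SELECT rd_title
FROM redirect
INNER JOIN page ON page_id = rd_from
WHERE page_title = 'some_page_name'
  AND page_namespace = 0;  -- main (article) namespace; titles can repeat across namespaces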
The rd_title is the target page of the redirect.
Please correct me if I'm wrong.
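
If the redirect table is not populated in your import, an alternative is to parse the target straight out of the page text. The following is a rough sketch of that different technique, not the query above; redirect_target is a made-up helper. It assumes stored titles use underscores instead of spaces and a capitalised first letter, as on English Wikipedia, and it leaves any namespace prefix (such as Category:) in place.

```python
import re

# Matches "#REDIRECT [[Target]]", stopping before any "#section" or "|label" part.
REDIRECT_RE = re.compile(r"^\s*#REDIRECT\s*:?\s*\[\[([^\]\|#]+)", re.IGNORECASE)


def redirect_target(wikitext):
    """Return the redirect target in page_title form, or None if not a redirect."""
    match = REDIRECT_RE.match(wikitext)
    if match is None:
        return None
    title = match.group(1).strip().replace(" ", "_")
    # Assumption: the first letter is capitalised in stored titles.
    return title[:1].upper() + title[1:]


print(redirect_target("#REDIRECT [[some page name]]"))
```

The returned string can then be compared against page_title with an exact match instead of a scan over all titles.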

Create a unique URL like facebook

How exactly does one create a unique URL per user?
Like how Facebook does it: facebook.com/mynamehere
One way would be to create a new folder each time we have a new user, but that doesn't seem to be the best approach.
You can try a program like Elgg if you are trying to build a social media site. Otherwise, a person's profile URL can be customized in a couple of ways, most of which have already been mentioned. You can use .htaccess for rewrites, or an automated custom URL plugin (this may help: How to generate a custom URL from a html input?). As a last resort you can use your folder method, but only if absolutely required.
I think the question is: how is it done technically, so we don't need to have physical file for every valid URL?
The answer is URL rewriting. In the case of an Apache server, you enable mod_rewrite and configure it to translate a particular URL pattern (for example, myfbclone.com/mynamehere to myfbclone.com/index.php?username=mynamehere). This way you need only one script file that handles all the URLs.
Other servers, such as Nginx or IIS, have their own means of rewriting URLs, so the exact configuration depends on your server, but the concept is usually the same.
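
To make the translation concrete, here is a small Python sketch of what such a rewrite rule does to the request path. It is not mod_rewrite syntax; the pattern, the RESERVED set and the rewrite function are assumptions chosen for this illustration.

```python
import re
from urllib.parse import quote

# Hypothetical rule: one path segment made of letters, digits, dots or underscores.
PROFILE_RE = re.compile(r"^/([A-Za-z0-9._]+)/?$")
# Paths that are real files must not be treated as usernames.
RESERVED = {"index.php", "favicon.ico", "robots.txt"}


def rewrite(path):
    """Map /username to /index.php?username=username; leave other paths untouched."""
    match = PROFILE_RE.match(path)
    if match is None or match.group(1) in RESERVED:
        return path
    return "/index.php?username=" + quote(match.group(1))


print(rewrite("/mynamehere"))
```

The single script behind index.php then looks the username up in a database, so no folder or file is created per user.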

is homepage url and homepage url + / ever different?

Let's say we have a site that's test.com. Would test.com/ ever be a different site/file than test.com? I know that the URL represents a path on the server to get that file. Going to test.com/file and test.com/file/ could potentially bring up different pages, since the latter is a directory. So I was wondering if the same is true for the root.
Or am I wrong about the URL as well?
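For paths below the root, it all depends on your environment and how it handles the routing. Many frameworks treat url/ and url as aliases, but many frameworks don't, so test.com/file and test.com/file/ can be different. For the root itself there is no difference: an HTTP request path cannot be empty, so a client asking for test.com sends the path /, exactly as it does for test.com/.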
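
A toy route table shows the two behaviours. This is a sketch with made-up routes and a made-up resolve function, not any particular framework's router.

```python
# Hypothetical routes: "/file" and "/file/" are registered as different things.
ROUTES = {
    "/": "home",
    "/file": "file page",
    "/file/": "directory listing",
}


def resolve(path, strict=True):
    """Look up a handler name; with strict=False a trailing slash is ignored."""
    path = path or "/"  # an empty request path is sent as "/"
    if strict:
        return ROUTES.get(path)
    return ROUTES.get(path.rstrip("/") or "/")


print(resolve("/file"), "|", resolve("/file/"), "|", resolve("/file/", strict=False))
```

With strict matching the two /file URLs resolve to different handlers; with alias matching they collapse into one.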
the url represents a path to the server to get that file
This is correct, but this can be any path you choose.
When you create a simple website with nested folders, you can have a layout like this:
/webroot/index.html
/webroot/blog.html
/webroot/myvideos/list.html
which results in, for example, www.example.com, www.example.com/blog.html and www.example.com/myvideos/list.html.
But with server-side settings called rewrites you can make your URLs behave however you want.
You could even redirect URLs to entirely different servers, or make two different URLs go to the same path.

how to deal with www/http in href

I have a db with a bunch of URLs. The values were entered by users, so a value might be something like www.domain.com or http://www.domain.com or stackoverflow.com or https://something.com
I'm retrieving that data and creating links in an HTML page so people can click and be redirected to that URL.
Depending on the stored value, I'll have either:
1. <a href="www.domain.com">
or
2. <a href="http://www.domain.com">
The second case works, but the first doesn't.
Is there a way to make it always work?
thanks!
The www. bit is not special at all; people rely on an automatic correction feature of most browsers that prepends it if the host does not exist. To replicate this, you need to run a program that attempts to resolve each of the host names in your database, and retries with an extra www. if that fails.
The http:// bit is easy: if it is missing, add it.
There are two ways to handle this situation:
First, validate the user input. At the time a URL is submitted, validate it (preferably on the client side via JavaScript) to ensure it has the required elements.
Second, in your code, you can use a regular expression or even simple pattern matching to ensure that the string starts with 'http://' or 'https://', and prepend it as needed.
The implementation details vary from language to language, but the concept is the same.
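
As one possible shape for the second approach, here is a minimal Python sketch. The ensure_scheme name and the choice of http:// as the default are assumptions for this example.

```python
import re

# True when the value already starts with http:// or https:// (case-insensitive).
SCHEME_RE = re.compile(r"^https?://", re.IGNORECASE)


def ensure_scheme(url, default="http://"):
    """Prepend a scheme if the stored value lacks one, so the href is absolute."""
    url = url.strip()
    if SCHEME_RE.match(url):
        return url
    return default + url


for stored in ("www.domain.com", "http://www.domain.com",
               "stackoverflow.com", "https://something.com"):
    print('<a href="%s">' % ensure_scheme(stored))
```

Without the scheme, a browser treats www.domain.com as a path relative to the current page, which is why the first link in the question fails.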

web-development: how do you usually handle the "under construction" page?

I was wondering what's the best way to switch a website to a temporary "under construction" page and then switch it back to the new version.
For example, on one website my customer decided to switch from Joomla to Drupal, and I had to create a subfolder for the new CMS and then move all the content to the root folder.
1) Moving all the content back to the root folder always creates some problems with file permissions, links, etc.
2) Creating a rewrite rule in .htaccess or forwarding with PHP is not a solution, because the URL shown then includes the subfolder.
3) Many hosting services do not allow changing the root directory, so this is not an option since I don't have access to the Apache config file.
Thanks
Update: Could I forward only the domain (i.e. www.example.com) and leave the IP (i.e. 123.24.214.22) pointing at the root folder, so that the access is different for me and for other people? Can I do this in the .htaccess file?
One thing to consider is you don't want search engines to cache your under construction page - and you also don't want them to drop your homepage from the search index either (Hence just adding a "noindex" meta tag isn't the perfect solution).
A good way to deal with this is to do a 302 redirect (temporarily moved) from your homepage to your under construction page. That way the search engine does not cache your homepage as an under construction page, does not index your under construction page (assuming it has a NOINDEX meta tag), and does not drop your homepage from the search index either.
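
A minimal sketch of that idea as a Python WSGI application follows. The SITE_OFFLINE flag, the page path and the app function are assumptions for illustration, and it redirects every path rather than only the homepage.

```python
SITE_OFFLINE = True                             # hypothetical maintenance flag
CONSTRUCTION_PATH = "/under-construction.html"  # hypothetical placeholder page

CONSTRUCTION_PAGE = (
    b'<html><head><meta name="robots" content="noindex"></head>'
    b"<body>Under construction</body></html>"
)


def app(environ, start_response):
    """WSGI app: temporarily redirect everything to the placeholder while offline."""
    path = environ.get("PATH_INFO") or "/"
    if path == CONSTRUCTION_PATH:
        # The placeholder itself carries the noindex tag so it is not indexed.
        start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
        return [CONSTRUCTION_PAGE]
    if SITE_OFFLINE:
        # 302 signals a temporary move, so the original URL stays in the index.
        start_response("302 Found", [("Location", CONSTRUCTION_PATH)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"normal site"]
```

To try it locally, the standard library's wsgiref.simple_server.make_server("", 8000, app).serve_forever() is enough. Setting SITE_OFFLINE to False brings the normal site back without moving any files.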
One way would be to use an include on your template page.
When you want the construction page to show, you set a redirect in the include that takes all traffic to the construction page.
When you are done, you remove the redirect.
What about hijacking your index.php file?
Something simple, along the lines of
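<?php
// SITE_OFFLINE is expected to be defined elsewhere (e.g. in a config file).
if (defined('SITE_OFFLINE') && SITE_OFFLINE) {
    include 'under_construction.html';
    exit; // skip the normal page
}
// normal content of your index page
?>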
where you would naturally define SITE_OFFLINE in an appropriate place for your needs.
What I did when I used PHP for websites was to configure Apache to direct all requests to a front controller. You then have full access to all requests no matter where they are pointing. In your front controller (PHP file, static HTML file, etc.), you do whatever you need to do.
I believe you need to configure path info in Apache and some other settings; it has been about 3 years since I used that approach. This approach is also good for developing your own CMS or application, so that you have full control over security.
You have to do something similar to this:
http://www.phpwact.org/pattern/front_controller
I am looking for more details, I know my configuration had more to it than that.
This is part of what I'm looking for too:
http://httpd.apache.org/docs/2.0/mod/core.html
Enabling path_info passes path information to the script, so all requests now go through a single point of entry. Let me find my configuration; I know vaguely how this works, but I'm sure it looks like a lot of hand waving.
Also, keep in mind that because all requests go through this single PHP file, you are responsible for serving images, JavaScript, CSS, etc. So, if a request comes in for /css/default.css, it will go through your PHP script (index.php, most likely), and you'll need to determine how to handle it. Serving static files is trivial, but it is a little more work.
If you don't want to go that route, you could possibly do something with mod_rewrite so that it only matches .html or .htm pages, or however you have your site configured. For me, I don't use extensions, so that made my regex a little more difficult. I also wanted to secure access to all files. The path_info approach was the solution for me, but if you don't need that granularity, then writing a front controller might be a bit too much work.
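
The front controller idea, including the static file handling, can be sketched in Python as follows. This is not my old PHP configuration; STATIC_ROOT, PAGES and the function names are made up for this illustration.

```python
import mimetypes
from pathlib import Path

STATIC_ROOT = Path("public")  # hypothetical directory holding css/, js/, images/


def home():
    return "200 OK", "text/html", b"<h1>Home</h1>"


# Hypothetical page table: request path -> handler function.
PAGES = {"/": home}


def front_controller(path):
    """Single entry point: return (status, content_type, body) for any request path."""
    if path in PAGES:
        return PAGES[path]()
    # Static assets also pass through here, so they must be served explicitly.
    root = STATIC_ROOT.resolve()
    candidate = (root / path.lstrip("/")).resolve()
    # Refuse anything that resolves outside the static directory.
    if candidate.is_file() and root in candidate.parents:
        content_type = mimetypes.guess_type(candidate.name)[0] or "application/octet-stream"
        return "200 OK", content_type, candidate.read_bytes()
    return "404 Not Found", "text/plain", b"Not found"


print(front_controller("/")[0])
```

Because every request reaches front_controller, this is also the one place where an offline switch or access checks can be applied.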
Walter