Should a subdomain be accessible as subfolder? - subdomain

I have a subdomain for my website created in cPanel and I've noticed that in addition to being able to access the content through this URL:
subdomain.example.com
It can also be accessed via:
example.com/subdomain
Questions:
Is that normal?
Is there any way to only allow access to the subdomain through, well, the subdomain?

There's nothing wrong with it as long as either you don't send users to the directory, or the applications and pages in that directory can handle being reached through two different URLs (e.g. they use only relative URLs).
If you want to block the directory, then try this .htaccess directive in the main site's document root:
RewriteRule ^subdomain/ - [L,R=404]
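One caveat, offered as a hedged sketch rather than a tested recipe: on a typical cPanel layout the subdomain's document root is the subdomain folder inside the main site's document root, so a rule in the main .htaccess can also match requests that arrive via subdomain.example.com. Assuming that layout, and assuming mod_rewrite is enabled, a host condition should limit the 404 to the main domain:

```apache
RewriteEngine On
# Apply the rule only when the request came in on the main domain
# (example.com or www.example.com), not on subdomain.example.com.
RewriteCond %{HTTP_HOST} ^(www\.)?example\.com$ [NC]
# Answer 404 for /subdomain and everything below it.
RewriteRule ^subdomain(/|$) - [R=404,L]
```

The host names are the placeholders from the question; replace them with your own domain.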

Related

How can I simply expose local .html files via web browser using an application server (Glassfish)?

Let's say I have a directory of .html files, accessible by the app server, that I want to expose so users can open them in their browser:
/import/tps-reports/index.html
/import/tps-reports/report1.html
/import/tps-reports/report2.html
Is there a way I can expose the tps-reports directory to do this so that a user can access them via:
http://www.example.com/tps-reports/index.html
http://www.example.com/tps-reports/report1.html
Also, keep in mind that index.html may reference the other pages:
Report 1
So those links need to work as well.
Here is a possible answer:
http://docs.oracle.com/cd/E19776-01/820-4496/geqpl/index.html
You can set up an alternate doc root so that certain URI patterns point to different paths.
The examples only really show relative paths, though. I wonder if it's "ok" to use this to reference paths on the local file system.

Content Delivery Network configuration - CakePHP

I want to use CloudFront to serve images and CSS for my CakePHP website. I would like to keep hosting the files on my own host and use CloudFront only to speed up their delivery, but I don't know how to proceed.
So far I have created a distribution on CloudFront with my origin domain and CNAME and deployed it.
Origin Domain: example.com, CNAME: cdn.example.com
I added the CNAME for my domain:
cdn.mydomain.com with destination xx.cloudfront.net
Do I need to update the links in my HTML to that CNAME? For example, if the stylesheet was http://example.com/app/webroot/css/style.css, do I change that to http://cdn.example.com/app/webroot/css/style.css?
You can go through and update your links to point to the CDN. You would have to do this for every image and CSS/JS file that you are serving from your CDN.
Another option would be to create a redirect in your .htaccess, perhaps something like this:
RewriteRule ^css/(.*)$ http://cdn.mydomain.com/css/$1 [R=301,NC,L]
I'm no .htaccess wizard so don't just copy and paste and expect it to work, but it should give you an idea of what you could do.

htaccess: how to restrict IP Address for URL path?

(I don't think the above link provides an answer for this question; see comment below)
I'm using a CMS system that uses a database instead of the traditional file-directory structure to store/manage webpages. Since there's no folder for each webpage, I can't place an .htaccess file in specific folders to control access.
There is one .htaccess file that exists in the root directory. Is it possible to use that .htaccess file to restrict access by IP Address for a specific URL? If so, what's the syntax? Note, it must be for URL because I don't believe there is even a specific file with .html extension for a webpage with this CMS (it's all handled behind the scenes somehow).
For example, the .htaccess file is here:
/home/username/public_html/.htaccess
and the URL needing access control is here:
https://www.mycompany.com/locations/europe/contactus/
What's the code to place in the .htaccess file to only allow the IP Address of 111.222.333.444 (for example) access to the above URL?
If this isn't possible, is there another way to solve this problem for restricting IP Address to a specific webpage?
Using mod_rewrite to control access
See the mod_rewrite documentation on controlling access; the section "Blocking of Robots" shows how to block an IP range from accessing a particular URL.
You could try the following. I have not tested it, so treat it as a starting point.
# Enable the rewrite engine (comments must be on their own line)
RewriteEngine On
# Match client addresses 123.45.67.8 and 123.45.67.9
RewriteCond %{REMOTE_ADDR} ^123\.45\.67\.[8-9]$
# In .htaccess the pattern is matched without the leading slash
RewriteRule ^locations/europe/contactus/ - [F]
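Since the question asks for the opposite (allow one address and refuse everyone else), here is a minimal sketch of that variant. It assumes mod_rewrite is available and that the rule is placed above the CMS's own rewrite rules in the root .htaccess so that it runs first. The address 203.0.113.10 is a placeholder, not one taken from the question:

```apache
RewriteEngine On
# Refuse the request unless the client address is exactly the allowed one.
RewriteCond %{REMOTE_ADDR} !^203\.0\.113\.10$
# No leading slash: this is a per-directory (.htaccess) pattern.
RewriteRule ^locations/europe/contactus(/|$) - [F]
```

If the site sits behind a proxy or CDN, REMOTE_ADDR will be the proxy's address rather than the visitor's, so this approach would need adjusting in that setup.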

Cloudfront Custom Origin Is Causing Duplicate Content Issues

I am using CloudFront to serve images, css and js files for my website using the custom origin option with subdomains CNAMEd to my account. It works pretty well.
Main site: www.mainsite.com
static1.mainsite.com
static2.mainsite.com
Sample page: www.mainsite.com/summary/page1.htm
This page calls an image from static1.mainsite.com/images/image1.jpg
If CloudFront has not already cached the image, it gets the image from www.mainsite.com/images/image1.jpg
This all works fine.
The problem is that a Google Alert has reported the page as being found at both:
http://www.mainsite.com/summary/page1.htm
http://static1.mainsite.com/summary/page1.htm
The page should only be accessible from the www. site. Pages should not be accessible from the CNAME domains.
I have tried putting a mod_rewrite rule in the .htaccess file, and I have also tried putting an exit() in the main script file.
But when CloudFront does not find the static1 version of the file in its cache, it requests it from the main site and then caches it.
Questions then are:
1. What am I missing here?
2. How do I prevent my site from serving pages instead of just static components to cloudfront?
3. How do I delete the pages from cloudfront? just let them expire?
Thanks for your help.
Joe
[I know this thread is old, but I'm answering it for people like me who see it months later.]
From what I've read and seen, CloudFront does not consistently identify itself in requests. But you can get around this problem by overriding robots.txt at the CloudFront distribution.
1) Create a new S3 bucket that only contains one file: robots.txt. That will be the robots.txt for your CloudFront domain.
2) Go to your distribution settings in the AWS Console and click Create Origin. Add the bucket.
3) Go to Behaviors and click Create Behavior:
Path Pattern: robots.txt
Origin: (your new bucket)
4) Set the robots.txt behavior at a higher precedence (lower number).
5) Go to invalidations and invalidate /robots.txt.
Now abc123.cloudfront.net/robots.txt will be served from the bucket and everything else will be served from your domain. You can choose to allow/disallow crawling at either level independently.
Another domain/subdomain will also work in place of a bucket, but why go to the trouble.
You need to add a robots.txt file and tell crawlers not to index content under static1.mainsite.com.
In CloudFront you can control the hostname with which CloudFront will access your server. I suggest giving CloudFront a specific hostname that is different from your regular website hostname. That way you can detect a request to that hostname and serve a robots.txt which disallows everything (unlike your regular website's robots.txt).
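A minimal sketch of that last idea, assuming the origin is Apache with mod_rewrite, that CloudFront is configured to fetch from a dedicated hostname (origin.mainsite.com here, a made-up name), and that a file robots-cdn.txt (also hypothetical) sits in the document root and contains the two lines "User-agent: *" and "Disallow: /":

```apache
RewriteEngine On
# Requests that CloudFront makes to the dedicated origin hostname...
RewriteCond %{HTTP_HOST} ^origin\.mainsite\.com$ [NC]
# ...get the disallow-everything file instead of the normal robots.txt.
RewriteRule ^robots\.txt$ /robots-cdn.txt [L]
```

Visitors and crawlers that reach www.mainsite.com directly would still see the regular robots.txt.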

difference between http and www

Pardon me for asking a very basic doubt.
I have hosted a page on the site collinfo.annauniv.edu
The page opens fine when I enter the address as http://collinfo.annauniv.edu
But when I enter www.collinfo.annauniv.edu, my browser shows a 404 error.
What difference does using http instead of www make here?
The www. before your domain is actually a subdomain. It's essentially the same thing as help.microsoft.com or orders.amazon.com.
With that in mind, there are a few things that could be happening:
1) Your DNS records do not include the appropriate A Record for the www subdomain.
In this case, you'll need to set up an A record that points to your web site's IP address. If you don't know how to do this, your web host should be able to help.
2) Your server is not configured to handle the www subdomain.
If you're using the Apache web server, it needs to be configured to show your web site when the user enters www before your domain. Again, your web host can set this up for you.
It all comes down to a misconfiguration issue. If you don't have experience administering web servers, you may want to give your web host a holler.
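For the second case, a minimal Apache sketch might look like the following. It assumes you can edit the server's virtual host configuration. The DocumentRoot path is hypothetical, and a DNS record for the www name is still needed as described in case 1:

```apache
<VirtualHost *:80>
    # Primary name the site answers to.
    ServerName collinfo.annauniv.edu
    # Also serve the same site for the www name.
    ServerAlias www.collinfo.annauniv.edu
    # Hypothetical path; use the directory that holds your pages.
    DocumentRoot /var/www/collinfo
</VirtualHost>
```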
The www prefix dates from the (rather) old days when a domain offered several services, of which the web was not always the main one. For instance:
www.domain.tld for web
mail.domain.tld for mail
ftp.domain.tld for ftp
domain.tld for web
but this is a convention - any subdomain may point to anything actually.
This is more a question of DNS declaration and/or web-server configuration; in this case the web-server configuration probably does not serve the same pages for www.domain and domain (since you get a 404).
The author / administrator of collinfo.annauniv.edu either forgot to create a DNS entry for www.collinfo.annauniv.edu or did not create a virtual domain (web-server side) for it that would point to the same pages as collinfo.annauniv.edu.
HTTP is a protocol.
http://collinfo.annauniv.edu
Is the address of a resource which can be retrieved using HTTP.
annauniv.edu is the domain in your case.
collinfo is the subdomain.
www.collinfo is also a subdomain, but no site is configured for it on the server. That's why you get HTTP 404 Not Found.
A subdomain can be anything; www is commonly used because it stands for World Wide Web.
WWW is a subdomain
HTTP is a protocol (language)
Whether or not you type http:// in the browser, the browser will assume the request is of the "http" type and will usually add http:// for you.
WWW however is just an alternative subdivision of the domain name, the same as in:
www.domain.com
site.domain.com
sub1.domain.com
sub2.domain.com
.....
etc.domain.com
In most cases the WWW subdomain will point to the same "page" as the main domain, which is usually called the "index" page, such as index.html or index.php. The index page is usually hidden in the browser's address bar unless you specifically type it in, such as http://www.yahoo.com/index.html. If you have full control of your webserver, however, you can change all of this: WWW need not point to the same page, and you can call your main page "home.html" instead of "index.html" and instruct your webserver to "point" the browser to that page by default.
But things like HTTP are not easily changed, since HTTP is the main language of the web and most browsers use it as the primary means to access webservers.
Peace!