Unlinked web pages on a server - security hole?

On my website, I have several HTML files that I don't link to from the main portal page. If nobody else links to them either, is it possible for Jimmy Evil Hacker to find them?

If anyone visits the pages with the advanced options turned on in their Google Toolbar, the address is sent to Google. That is the only explanation I can come up with for why some of my pages are on Google.
So, the answer is yes. Make sure you have a robots.txt, or even .htaccess rules, or something similar.
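One caveat about robots.txt: it is only advice that well-behaved crawlers choose to follow, and the file itself is public, so listing a hidden page in it also advertises that page. This Python sketch uses the standard library's urllib.robotparser; the robots.txt contents and the path /private-report.html are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that asks crawlers to skip one unlinked page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private-report.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A polite crawler checks before fetching; nothing forces a malicious client to.
print(parser.can_fetch("Googlebot", "https://example.com/private-report.html"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/index.html"))           # True

# Anyone can download robots.txt, so the "hidden" path is now public knowledge.
disallowed = [line.split(":", 1)[1].strip()
              for line in ROBOTS_TXT.splitlines()
              if line.lower().startswith("disallow:")]
print(disallowed)  # ['/private-report.html']
```

For keeping pages private, access control on the server (such as .htaccess rules) is the stronger of the two options.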

Hidden pages are REALLY hard to find.
First, be absolutely sure that your web server never returns an automatically generated directory listing. Use the following everywhere in your Apache configuration and .htaccess files; IIS has a similar Directory Browsing setting that you can turn off.
Options -Indexes
Second, make sure the file name isn't a dictionary word; the odds of guessing a long non-dictionary name are astronomically small. They are not zero: there's a theoretical possibility that someone, somewhere might patiently guess every possible file name until they find yours. [I hate these theoretical attacks. Yes, they exist. No, they'll never happen in your lifetime, unless you've given someone a reason to search for your hidden content.]
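A rough sketch of the arithmetic behind "astronomically small", in Python. The 64-character alphabet, the 16-character name length, and an attacker speed of 1,000 requests per second are all assumptions chosen for illustration, and random_page_name and years_to_exhaust are hypothetical helpers.

```python
import secrets
import string

# URL-safe characters: 52 letters, 10 digits, '-' and '_' (64 in total).
ALPHABET = string.ascii_letters + string.digits + "-_"

def random_page_name(length: int = 16) -> str:
    """Return a random, non-dictionary file name drawn from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def years_to_exhaust(length: int, guesses_per_second: float) -> float:
    """Worst-case years needed to request every possible name of this length."""
    combinations = len(ALPHABET) ** length
    seconds = combinations / guesses_per_second
    return seconds / (365 * 24 * 3600)

print(random_page_name() + ".html")  # different on every run
# Hypothetical attacker making 1,000 requests per second:
print(f"16 chars: {years_to_exhaust(16, 1_000):.2e} years")  # about 2.51e+18
print(f" 4 chars: {years_to_exhaust(4, 1_000):.5f} years")
```

With a 4-character name the same attacker finishes in a few hours, so length matters as much as avoiding dictionary words.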

You're talking about security through obscurity (google it), and it's never a good idea to rely on it.

Yes, it is.

It's unlikely they will be found, but still a possibility.
The term "security through obscurity" comes to mind.

Related

Website directories with the same name but different capitalizations

I've been asked to create a few different folders with the same name but with different capitalizations. The idea behind this is to allow for errors in capitalization when someone types in a specific URL. They want to do something like this:
www.website.com/youtube
www.website.com/Youtube
www.website.com/youTube
www.website.com/YouTube
I believe this is bad practice for many reasons, mainly that it seems confusing and unnecessary, and any update to these pages would have to be made four times over. I've also noticed that VSCode won't let me create these directories from within the editor, and my computer, a Windows machine, won't let me do it from the file manager either.
I've also read that this can cause problems with git, which on a case-insensitive filesystem (such as the Windows default) won't treat the differently capitalized paths as separate files.
So really my questions are:
1.) Is there a way to do this?
2.) If so, is it a bad practice?
3.) If it's a bad practice, why?
I'd like to do it for them if possible, but not if there are some unforeseen consequences that I'm not aware of. Any insight would be appreciated.
Thanks in advance.
edit: Just to be clear, we already have www.website.com/youtube, but a few users have reported that their browser autocorrects the 'youtube' part of the URL so that the Y or the T is capitalized. From what I can see now, we must do something on the server side to accomplish this, and my knowledge there is limited. All I know for sure is that it is a Linux server.
To start with, the sane solution would be to redirect those routes to the proper one, which is not an uncommon task. I don't know what your infrastructure looks like, so I can't say how easy that will be.
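On a Linux server this would normally be done with the web server's own rewrite rules, but since the stack is unknown, here is a framework-agnostic Python sketch of the logic: look the requested path up case-insensitively and, if it matches a known route spelled differently, answer with a 301 redirect to the canonical spelling. CANONICAL_ROUTES and redirect_target are hypothetical names.

```python
from typing import Optional

# Hypothetical set of routes the site actually serves, in their canonical spelling.
CANONICAL_ROUTES = {"/youtube"}

# Map each route's lowercase form to its canonical spelling.
_LOOKUP = {route.lower(): route for route in CANONICAL_ROUTES}

def redirect_target(path: str) -> Optional[str]:
    """Return the path to 301-redirect to, or None if no redirect is needed."""
    canonical = _LOOKUP.get(path.rstrip("/").lower())
    if canonical is None or canonical == path:
        return None  # unknown route (let it 404) or already canonical
    return canonical

for requested in ["/youtube", "/YouTube", "/youTube", "/Youtube/", "/other"]:
    print(requested, "->", redirect_target(requested))
```

Only one copy of the content exists, and every misspelled variant ends up at the same URL.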
1.) Is there a way to do this?
Assuming your server runs Linux or BSD, or anything other than a case-insensitive filesystem such as Windows NTFS, yes. You can keep one folder as the source of truth and create symlinks to it. Or, again, you could make the routes case-insensitive on whatever server you're using.
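The symlink approach, sketched in Python under the assumption of a case-sensitive POSIX filesystem (as on the asker's Linux server). link_case_variants is a hypothetical helper, and the demo uses a temporary directory in place of the real docroot.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable

def link_case_variants(docroot: Path, canonical: str, variants: Iterable[str]) -> None:
    """Point each capitalization variant at the one real directory via a symlink.

    Assumes a case-sensitive (POSIX) filesystem such as ext4 on Linux.
    """
    for variant in variants:
        link = docroot / variant
        if variant != canonical and not os.path.lexists(link):
            # Relative target, so the whole docroot can be moved without breaking links.
            os.symlink(canonical, link, target_is_directory=True)

# Demo in a throwaway directory standing in for the real docroot.
docroot = Path(tempfile.mkdtemp())
(docroot / "youtube").mkdir()
(docroot / "youtube" / "index.html").write_text("<h1>YouTube page</h1>")

link_case_variants(docroot, "youtube", ["Youtube", "youTube", "YouTube"])
print(sorted(p.name for p in docroot.iterdir()))
print((docroot / "YouTube" / "index.html").read_text())
```

The web server must also be allowed to follow symlinks (in Apache, via Options FollowSymLinks).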
2.) If so, is it a bad practice?
Cloning the same information and making the same updates repeatedly is terrible practice. Making symlinks on the server is slightly less bad but still pretty bad practice, as that's cluttering up your directory tree with unnecessary nonsense.
3.) If it's a bad practice, why?
The idea isn't bad practice; you can make case-insensitive routes on most modern server configurations. The suggested implementations are what's pretty bad. But without knowing what your stack looks like, we can't say much more about how to do it.

Why should I hide my website index?

I've tried looking it up and am now coming back to here to see if I can get my question answered. Why should I make it so that others cannot view my indexes? Is there a security reason for this?
I'm just beginning in web development, so I could definitely use any help or information you can provide.
It's often considered a security best practice to hide directory listings. You may accidentally upload files to your docroot that you don't want to share with the world. Without a listing, nobody can access them unless they know (or guess) the URL. While this is a very thin layer of security, it can be helpful.
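Python's built-in development server is an easy place to see this: its SimpleHTTPRequestHandler lists any directory that has no index page. The sketch below turns that off, which is the same idea as Apache's Options -Indexes. NoListingHandler and make_server are hypothetical names, and this server is meant for local testing, not production.

```python
import functools
from http import HTTPStatus
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

class NoListingHandler(SimpleHTTPRequestHandler):
    """Serve files as usual, but never generate a directory listing."""

    def list_directory(self, path):
        # SimpleHTTPRequestHandler calls this only for a directory that has
        # no index.html/index.htm; answer 403 instead of listing its files.
        self.send_error(HTTPStatus.FORBIDDEN, "Directory listing disabled")
        return None

def make_server(docroot: str, port: int = 8000) -> ThreadingHTTPServer:
    """Build (but don't start) a local server for docroot on 127.0.0.1."""
    handler = functools.partial(NoListingHandler, directory=docroot)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)

# Usage: make_server("/path/to/docroot").serve_forever()
```

A file accidentally left in the docroot is still reachable by its exact URL; it just no longer shows up in a listing.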
There are certainly times when you may want a directory listing, such as download directories. It's up to you to decide what is useful to you. If you don't need it, don't use it. If you do, use it.

What is the best way to point a top-level domain to a URL?

I have the following situation: because of the IT department at our university, I had three choices for pointing a top-level domain to the content hosted on the university server:
Redirect
use frames
use a reverse proxy
I know frames are deprecated and suck; however, getting a server where I could set up a reverse proxy sounded like overkill, and a redirect was not an option, because the ugly URL of the web app server would appear in the address bar.
So, when I looked up the site in Chrome, I got a message that the site contains unsafe content. Opening the console told me that the "unsafe content" was the Google Web Fonts I included in the page. All other browsers worked just fine...
Does anyone have an elegant solution for this? I'm not really happy with using frames in the first place.
Thank you guys in advance, cheers!
I will of course provide all the config files/code snippets needed!!
The best and cleanest option in this case would be a reverse proxy with URL rewriting (if you don't like the web app's own URLs). If you post your endpoints, we can prepare the rules for you. Or check any of the many reverse-proxy tutorials.
One important thing that nobody mentions: use the ProxyPreserveHost directive if the backend on the other end processes request headers such as Host.
You might also consider a forward proxy instead of a reverse one; it's easier to configure.
Complete reference here.
Mixing HTTP and HTTPS may be the cause of the unsafe-content error. Make sure you load the page and the fonts over the same protocol.
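A quick way to find the offending resources is to scan the page for subresources loaded over plain http://. This Python sketch uses the standard library's html.parser; MixedContentFinder is a hypothetical helper, it only checks src and href attributes, and the sample page is made up.

```python
from html.parser import HTMLParser

class MixedContentFinder(HTMLParser):
    """Collect http:// URLs that an https:// page would load as subresources."""

    # Attributes that make the browser fetch something (not an exhaustive list).
    URL_ATTRS = {"src", "href"}
    # <a href> is just a link the user may click, so it is not mixed content.
    SKIP_TAGS = {"a"}

    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            return
        for name, value in attrs:
            if name in self.URL_ATTRS and value and value.lower().startswith("http://"):
                self.insecure.append(value)

page = """
<link rel="stylesheet" href="http://fonts.googleapis.com/css?family=Lato">
<script src="https://example.com/app.js"></script>
<a href="http://example.org/">plain link</a>
"""
finder = MixedContentFinder()
finder.feed(page)
print(finder.insecure)  # ['http://fonts.googleapis.com/css?family=Lato']
```

Changing the flagged URLs to https:// is usually enough to make the warning go away.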
As for pointing your domain, I like the virtual host solution above. If your IT department says that everything else is "impossible", you might be stuck with frames. :)

Safe to have page resources without file extensions?

I need to decide on naming conventions for a new website.
I can use mod_rewrite at will.
My favourite solution would be to work with no file extension at all.
www.exampledomain.com/language/pagename
This would lead to "pagename" being treated as a directory, so I would have to take that into account when using relative links.
Are there any other pitfalls I need to be aware of when doing this?
Is this legal, or are resources supposed to have a "name.prefix" structure?
Do you know of any clients that can't deal with this and start looking for /index.htm or .html?
Can you think of any SEO problems to be expected?
Unless you have a very good reason to add an extension, drop it.
are resources supposed to have a "name.prefix" structure?
Not that I know of; normally, no. Resources are just a concept. A particular custom resource format might require an extension while others do not, so it depends. Browsers decide how to handle a response from its Content-Type header, not from the extension in the URL.
As for SEO, the shorter a link is, the better: it increases the relative weight of the keywords in it. An extension makes every link longer by four characters or more.
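The file on disk can keep its extension even when the URL drops it; the rewrite step just has to find the file and send the right Content-Type. A Python sketch of that mapping (roughly what a mod_rewrite rule would do), assuming pages are stored as .html or .htm files; CANDIDATE_EXTENSIONS and resolve are hypothetical names.

```python
import mimetypes
from pathlib import PurePosixPath
from typing import Collection, Optional, Tuple

# Hypothetical extensions the server is willing to try, in order.
CANDIDATE_EXTENSIONS = (".html", ".htm")

def resolve(url_path: str, existing_files: Collection[str]) -> Optional[Tuple[str, str]]:
    """Map an extensionless URL like /en/about to a stored file and its Content-Type."""
    base = PurePosixPath(url_path.lstrip("/"))
    for ext in CANDIDATE_EXTENSIONS:
        candidate = str(base) + ext
        if candidate in existing_files:
            content_type, _ = mimetypes.guess_type(candidate)
            return candidate, content_type or "application/octet-stream"
    return None  # nothing matches: the server should answer 404

print(resolve("/en/about", {"en/about.html", "en/contact.html"}))
```

Because clients read the Content-Type header, they never need to see the .html in the URL.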
Do you know of any clients that can't deal with this and start looking for /index.htm or .html?
A problem may arise if you decide to support multiple entry points.
www.exampledomain.com
www.exampledomain.com/index.html
www.exampledomain.com/index.htm
www.exampledomain.com/index
To search engines, these are all different URLs. Some people will link to you with the shortest one, others with another version, so you end up with different inbound links pointing at what is essentially the same start page. Search engines will detect this as duplicate content. Consequently, your page rank will be divided between several URL versions, and eventually all but one will likely be dropped from their index entirely. To deal with this, decide on one "true" URL and have the others perform a 301 (Moved Permanently) redirect to it.
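The "one true URL" rule, sketched in Python: every alias of the start page collapses to "/", and duplicate or trailing slashes are normalized away. START_PAGE_ALIASES and canonical_redirect are hypothetical, and stripping the trailing slash is just one possible policy; what matters is picking one and redirecting everything else to it.

```python
from typing import Optional

# Hypothetical aliases of the start page that should all collapse to "/".
START_PAGE_ALIASES = {"/index.html", "/index.htm", "/index"}

def canonical_redirect(path: str) -> Optional[str]:
    """Return the canonical path for a 301 redirect, or None if path is already canonical."""
    # Collapse duplicate slashes and drop a trailing slash: '//index.htm' -> '/index.htm'.
    collapsed = "/" + "/".join(part for part in path.split("/") if part)
    if collapsed.lower() in START_PAGE_ALIASES:
        return "/"
    return collapsed if collapsed != path else None

for p in ["/", "/index.html", "//index.htm", "/index", "/about", "/about/"]:
    print(repr(p), "->", canonical_redirect(p))
```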
Dropping extensions actually has the significant benefit of not tying you to a specific language. If your URLs are http://example.com/page.php and you switch to another language, you'll either lose the existing URLs (bad!) or have to fake the PHP extension (clunky).

Should I default my website to www.foo or not?

Notice how the default domain for Stack Overflow is http://stackoverflow.com, and if you try to go to http://www.stackoverflow.com it bounces you to http://stackoverflow.com?
What is the reason for this? Not the tech reason (as in the http code, etc) but why would the site owners want to do this?
I know it's purely aesthetic, and I always configure host headers for both the www and non-www names, but is there a reason to bounce a user to a single domain, with or without the subdomain?
Update 1
Not having a subdomain is called a bare domain. Thanks, peeps! I never knew it had a term :)
Update 2
Thanks for the answers so far. Please note that I understand www.domain.com can point to domain.com. This is not a question about whether I should offer both or only one; it's asking why some sites default to a bare domain instead of the www subdomain, or vice versa. Cheers.
Jeff Atwood actually HAS explained why he's gone for bare domains here and here. (Nod to Jonas Pegerfalk for the post :) )
Jeff's posts (and others in this thread) also discuss the problems a bare domain causes with cookies and static images. Basically, cookies scoped to the bare domain are also sent to every subdomain. The solution is to buy a separate domain for static content, as recommended by the Yahoo performance team here.
Jeff Atwood has written a great article about this, The Great Dub-Dub-Dub Debate. There is also an entry on the Stack Overflow blog about why and how Stack Overflow dropped the www prefix.
As far as I can tell, it doesn't really matter, but you should pick one or the other as the default and forward to that.
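A simplified Python model of the browser's cookie domain-matching rule (RFC 6265, section 5.1.3) shows why this matters for a cookie-free static subdomain. cookie_sent_to is a hypothetical helper that ignores details such as public-suffix checks. Under RFC 6265, a cookie set without a Domain attribute is host-only; the problem arises when a bare-domain site scopes a cookie with Domain=example.com, for example so that a login cookie also covers www.

```python
def cookie_sent_to(request_host: str, cookie_domain: str, host_only: bool) -> bool:
    """Simplified RFC 6265 domain matching: would a browser send this cookie to request_host?"""
    host = request_host.lower()
    domain = cookie_domain.lower().lstrip(".")
    if host_only:
        # Cookie set without a Domain attribute: only the exact host that set it.
        return host == domain
    # Cookie set with Domain=...: that host and every subdomain of it.
    return host == domain or host.endswith("." + domain)

# Bare-domain site that scopes its cookies to the whole domain:
print(cookie_sent_to("static.example.com", "example.com", host_only=False))  # True
# www site whose cookies are scoped to www only:
print(cookie_sent_to("static.example.com", "www.example.com", host_only=False))  # False
```

With a www site, static.example.com can stay cookie-free; with a bare-domain site that scopes its cookies domain-wide, it cannot, hence the advice to use a separate domain.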
The reason is cookie scoping: a cookie set by www.example.com without a Domain attribute is not sent to example.com, whereas a cookie set with Domain=example.com is sent to www.example.com and every other subdomain.
For more discussion on this, see:
in favor of www
http://faq.nearlyfreespeech.net/section/domainnameservice/baredomain#baredomain - This web host lists several good reasons to keep the www for anyone planning to do more than simple web hosting (such as load balancing, subdomains with different content, etc.)
http://yes-www.org - This blog post from 2005 mainly proposed that most internet users needed the www prefix in order to recognize a URL. This is less important now that browsers have built-in searching. Most computer illiterates I know bypass the URL bar entirely.
in opposition to www
http://no-www.org/
And a miscellaneous, related rant about why www should not be used as a CNAME, but only as an A record:
http://member.dnsstuff.com/rc/index.php?option=com_myblog&task=view&id=62&Itemid=37
It is worth noting that in DNS a CNAME record cannot share a name with any other record, and a bare domain (the zone apex) must always carry SOA and NS records. So if you use a CDN and need to point your web server's name at it with a CNAME, you can't do that on a bare domain. You must use "www" or some other prefix.
Having said that, I prefer the look of URLs without the "www." prefix so I use a bare domain for all my sites. (I don't need a CDN.)
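The CNAME restriction can be checked mechanically. This Python sketch takes a hypothetical zone as (name, type) pairs and reports names where a CNAME sits alongside another record type; cname_conflicts is an illustrative helper, not a real DNS tool's API, and it ignores the DNSSEC record types that are allowed next to a CNAME.

```python
from collections import defaultdict

def cname_conflicts(records):
    """Return names that have a CNAME plus any other record type (DNSSEC types not modeled)."""
    types_by_name = defaultdict(set)
    for name, rtype in records:
        types_by_name[name.lower().rstrip(".")].add(rtype.upper())
    return sorted(name for name, types in types_by_name.items()
                  if "CNAME" in types and len(types) > 1)

# Hypothetical zone: the apex must carry SOA and NS records, so a CDN CNAME there clashes.
zone = [
    ("example.com.", "SOA"),
    ("example.com.", "NS"),
    ("example.com.", "CNAME"),      # CDN hostname at the bare domain: invalid
    ("www.example.com.", "CNAME"),  # CDN hostname on www: fine
]
print(cname_conflicts(zone))  # ['example.com']
```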
When I mention URLs to the general public (e.g. on a business card), I feel that one has to use either the www. prefix or the http:// prefix. Otherwise, a bare domain name doesn't necessarily tell people they can type it into their browser. Since I consider http:// an ugly wart on a business card, I do use the www. prefix there.
What a mess.
In some cases, www might indeed point to a completely separate subdomain in a large corporate environment. Especially on an internal network, having the explicit www can make split DNS easier if the Web site is hosted externally (say, at Rackspace in Texas, but everything else is in your office in Virginia.) In most cases, it doesn't matter.
The important thing is to pick one and add an IHttpModule, rewrite rule, or equivalent for your platform to permanently redirect requests from one to the other.
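For a Python web app, the counterpart of an IHttpModule or rewrite rule is a small piece of WSGI middleware. This sketch assumes the canonical host is the bare example.com (swap in www.example.com if you prefer); redirect_to_canonical_host is a hypothetical name, and it drops any explicit port for simplicity.

```python
from urllib.parse import quote

def redirect_to_canonical_host(app, canonical_host: str):
    """WSGI middleware: 301-redirect requests for any other host to canonical_host."""
    def middleware(environ, start_response):
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()  # port dropped
        if host and host != canonical_host:
            path = quote(environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", ""))
            scheme = environ.get("wsgi.url_scheme", "http")
            location = f"{scheme}://{canonical_host}{path or '/'}"
            query = environ.get("QUERY_STRING", "")
            if query:
                location += "?" + query  # keep the query string intact
            start_response("301 Moved Permanently",
                           [("Location", location), ("Content-Length", "0")])
            return [b""]
        return app(environ, start_response)
    return middleware

def hello(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hi"]

application = redirect_to_canonical_host(hello, "example.com")
```

Because the redirect is permanent, search engines consolidate both names under the canonical one.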
Having both can lead to scary certificate warnings when switching from HTTP to HTTPS if your certificate doesn't cover both names and you forget to redirect explicitly to your site's canonical name (which you probably will forget, because you want your code to work in both dev and production, so you're building URLs from a variable populated by the server).
Much more importantly, accepting both means search engines see duplicated content: you get dinged for the duplicate content itself, and you get dinged because your hits are split across two different URIs, which hurts your rankings.
Actually, you can use both of them, so users can find your address either way. I mean, it doesn't really matter, though :)
But the www prefix is more common in public, so I guess I'd prefer to keep the www.
It's easier to type google.com than www.google.com, so offer both. Remember, www is just a subdomain.
Also, going without www is commonplace these days, so maybe make www.foo.com redirect to foo.com.
I think one reason is to help with search rankings, so that each page accumulates ranking under one URL instead of having it split between two domains.
I'm not sure why the Stack Overflow team decided to use only one, but if it were me, I'd do it for simplicity. You'd have to accept both, since a lot of people type www by default or out of habit (I'm sure less "techy" people have no idea there's a difference).
Aside from that, there used to be a difference as far as search engines were concerned and so there was concern about having either a duplicate content penalty or having link reputation split. But this has long since been handled and so isn't much of a consideration at this point.
So I'd say it's pretty much personal preference to keep things simple.