What is the best way to point a toplevel domain to an url? - google-chrome

I have the following situation: due to the IT department at our university, I had three choices for pointing a top-level domain to the content hosted on the university server:
Redirect
Use frames
Use a reverse proxy
I know frames are deprecated and suck. However, getting a server where I could set up a reverse proxy sounded like a bit of overkill, and a redirect was not an option, as the dirty URL of the webapp server would appear in the address bar.
So, when I looked up the site in Chrome, I got the message that the site contains unsafe content; opening the console told me that the "unsafe content" was the Google Web Fonts I included in the page. All other browsers worked just fine...
Does anyone have an elegant solution for this? I'm not really happy with using frames in the first place.
Thank you guys in advance, cheers!
I will of course provide all the config files/code snippets needed!!

The best and cleanest option in this case would be a reverse proxy with URL rewriting (if you don't like the webapp's URLs). If you post your endpoints, we'd be able to prepare rules for you. Or check any tutorial (e.g. this, this or this).
One important thing which nobody mentions is to use the ProxyPreserveHost directive if the backend does any processing based on the Host header.
You may also consider a forward proxy instead of a reverse one; it's easier to configure.
Complete reference here.
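A minimal Apache sketch of the reverse-proxy setup described above; the hostnames and paths are placeholders, not from the question:

```apache
# Reverse-proxy the university-hosted app behind the clean domain.
# ServerName and the backend URL are assumptions for illustration.
<VirtualHost *:80>
    ServerName example.org

    ProxyPreserveHost On                 # pass the original Host header through
    ProxyPass        / http://webapp.university.edu/myapp/
    ProxyPassReverse / http://webapp.university.edu/myapp/
</VirtualHost>
```

ProxyPassReverse rewrites Location headers in redirects from the backend so the dirty URL never reaches the address bar.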

Mixing http and https may be the cause of the unsafe content error. Be sure you are loading the pages and the fonts over the same protocol.
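For example, if the page is served over https, request the fonts over https as well (the font family here is just an illustration):

```html
<!-- Loading the stylesheet over https avoids Chrome's mixed-content warning
     on an https page; an http:// font URL on an https page triggers it. -->
<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Open+Sans">
```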
As for pointing your domain, I like the virtual host solution above. If your IT department says that everything else is "impossible", you might be stuck with frames. :)

Related

What adds the string ".json" to a url?

It's a silly question, but we have noticed one user pretty consistently appends ".json" to the URL when navigating our website. This appended string breaks our URL signature validation, so this user is being rejected quite a lot (and it's showing up in my error log daily; you decide which is worse).
I'm sure there's a browser plugin or something doing it, but I just can't figure out what would cause it.
We have a ColdFusion website that passes a few URL params between pages, and often makes AJAX GET requests for JSON, but we don't ever append .json to the URL.
Can you think of what might be causing this, or where I can look for an answer? If/when I know what might be doing this then I might ask another question about appropriate ways to handle it.
Thanks all!
You need to find out a bit more about your user to understand the motivation. Look at the browser, OS, and origin IP, for example. If it's all within your normal user behaviour, then it could be something on your customer's device. If it is completely outside your users' normal behaviour, it might be that you are "under attack" and somebody or something is trying to find vulnerabilities in your website.
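If the traffic turns out to be benign, one defensive option is to normalise the path before signature validation. This is only a sketch; the function name and the approach are assumptions, not something from the question:

```python
def normalize_request_path(path: str) -> str:
    """Strip a stray '.json' suffix that some client or plugin appends,
    so URL signature validation sees the path that was actually signed."""
    suffix = ".json"
    if path.endswith(suffix):
        return path[:-len(suffix)]
    return path
```

Whether you want to silently accept such requests is a policy decision; logging them first, as the answer suggests, seems wise.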
Cheers

What's the point of oEmbed API endpoints and URL schemes vs. link tags?

The oEmbed specification mentions 2 different ways of finding the oEmbed content of a URL:
Knowing the API endpoint of the website and passing it, through a GET parameter, the URL you want info about, if it matches the URL pattern it declared.
Discovering the URL of the oEmbed version thanks to a <link rel="alternate" type="application/json+oembed" ... /> (or text/xml+oembed) HTML header.
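The two mechanisms above can be sketched roughly like this; the endpoint URL and the sample HTML are illustrative, not real providers:

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# Method 1: a known provider endpoint, queried with the target URL.
def oembed_request_url(endpoint, target):
    return endpoint + "?" + urlencode({"url": target, "format": "json"})

# Method 2: discover the oEmbed URL from a <link> tag in the page's HTML.
class OEmbedDiscovery(HTMLParser):
    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if (tag == "link" and a.get("rel") == "alternate"
                and a.get("type") == "application/json+oembed"):
            self.href = a.get("href")

page = '<link rel="alternate" type="application/json+oembed" href="https://example.com/oembed?url=x">'
parser = OEmbedDiscovery()
parser.feed(page)
```

Note that method 2 requires fetching and parsing the resource's HTML first, while method 1 goes straight to the JSON endpoint.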
The 2nd way seems more generic, as you don't have to store and maintain a whole list of providers. Moreover, lists of providers are a sign of a centralized internet, where only a few actors exist. That approach is hardly scalable.
I can see a use for the 1st approach, though, for websites that can parse resources made available by someone else. For example, I can provide an oEmbed version of video pages from website Foo. However, for several reasons, mainly security-related, I wouldn't trust someone who says "I can parse resource X for you" unless X's author is OK with that, which brings us back to approach 2.
So my question is: what did I miss here? What's the use of the 1st method of dealing with oEmbed? For instance, why store (and maintain up-to-date) a whole list of endpoints and patterns like oohEmbed does if you have a generic way of discovering it on-the-fly and for virtually any resource on the internet?
As a very closely related question, which I think may be asked at the same time (please correct me if I'm wrong): what happens if one doesn't provide a central endpoint for oEmbed contents, but rather, say, expect a '?version=oembed' parameter on each URL, that returns the oEmbed version instead of the standard one?
If I recall correctly, supporting both mechanisms was a compromise that we figured would help drive adoption. It's much easier to persuade large web properties to add a single endpoint vs. adding markup (that's irrelevant to most clients) to every response body. It was a pragmatic choice.
Longer term we planned to leverage some of the work Eran Hammer-Lahav was doing around discovery rather than re-inventing it (poorly, again). Unfortunately, his ideas still haven't gotten much traction and the web still lacks a good, standardized way to do this sort of thing.
I was hoping to find an answer here but it looks like everyone else is as confused as we are. The advantage of using option 1 in my opinion is that it only uses 1 json request instead of a potentially expensive html request followed by the json request. You can always use option 2 as a fallback in case you can't match a pattern in your pre-baked list of oEmbed providers.
oEmbed discovery is a major security concern. WordPress, for example, has a whitelist of supported oEmbed providers.
Suppose that every random URL on the internet could trigger an oEmbed fetch. That would mean anyone could attack your site.
Steps:
Create a new site and add oEmbed discovery to it.
Post the URL in a form on your site. Now your site performs the oEmbed request on my behalf.
Exploit:
by denial of service (DoS): e.g. redirect the URL to a tarpit or feed it a 1 GB JSON response.
by cross site scripting (XSS): inject random HTML to pages that other people can see.
by stealing the admin's session-cookie via XSS: now the attacker can login to your CMS to upload files, and exploit even more.
It's XSS to the max, with little to stop it. The only sane thing to do is to whitelist proper endpoints. That's why oEmbed endpoints are explicitly listed.
If you want something scalable, you might like www.noembed.com and www.embedly.com. They provide oEmbed support for various sites which don't do oEmbed themselves.
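A whitelist check in the spirit of what's described above could look like this; the host list is made up for illustration (real lists, such as WordPress's, are much longer):

```python
from urllib.parse import urlsplit

# Hypothetical provider whitelist, keyed by hostname.
ALLOWED_OEMBED_HOSTS = {"www.youtube.com", "vimeo.com", "www.flickr.com"}

def oembed_allowed(url):
    """Only fetch oEmbed data from explicitly whitelisted providers."""
    return urlsplit(url).hostname in ALLOWED_OEMBED_HOSTS
```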

What are the implications of redirecting images behind the <img> tag?

Some setup:
We have some static images, publicly available. However, we want to be able to reference these images with alternative URLs in the image tag.
So, for example, we have an image with a URL like:
http://server.com/images/2/1/account_number/public/assets/images/my_cool_image.jpg
And, we want to insert that into our front html as:
<img src="http://server.com/image/2/my_cool_image.jpg">
instead of
<img src="http://server.com/images/2/1/account_number/public/assets/images/my_cool_image.jpg">
One really neat solution we've come up with is to use 301 redirects. Now, our testing has rendered some pretty neat results (all current generation browsers work), but I am wondering if there are caveats to this approach that I may be missing.
EDIT: To clarify, the reason we want to use this approach is that we are also planning on using an external host to serve up resources, and we want to be able to turn this off on occasion. So, perhaps the URL in the <img> tag would be
http://client.com/image/3/cool_image.jpg
in addition to the "default" way of accessing
Another technique that doesn't require a server round-trip is to use your HTTP server's equivalent of a "rewrite engine". This allows you to specify, in the server configuration, that requests for one URL should be satisfied by sending the results for some other URL instead. This is more efficient than telling the browser to "no, go look over there".
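For Apache, such an internal rewrite might look like this; the long path layout is taken from the question, but the fixed "1/account_number" segments are placeholders for whatever your real layout needs:

```apache
# Serve /image/<id>/<name> from the long internal path, with no redirect
# and no extra round-trip to the browser.
RewriteEngine On
RewriteRule ^image/(\d+)/(.+)$ /images/$1/1/account_number/public/assets/images/$2 [L]
```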
One possible downside may be SEO: a long, complex path with generic folder names may do worse than a shorter/snappier URL.
The main issue, though, is that a redirect means an extra HTTP request for every image. In general you should try to minimise HTTP requests to improve performance. I think you should rewrite the URLs to the longer versions behind the scenes, as others have said.
You don't have to rewrite the entire URL with the final URL. If you want a little more flexibility, you could rewrite the images like this:
http://server.com/image/2/my_cool_image.jpg
Rewritten as:
http://server.com/getImage?id=2&name=my_cool_image.jpg
Then the "getImage" script would read a config file and either serve up the longer-named file from server.com or the other one from client.com. You only have one trip to the server and a tiny bit of overhead on the server itself, unnoticeable to the visitor.
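A rough sketch of that "getImage" idea; the hosts, the path layout, and the config flag are illustrative assumptions:

```python
# A config switch decides which host serves the file; in a real app this
# would come from the config file the answer mentions.
USE_EXTERNAL_HOST = False

def resolve_image_url(image_id, name):
    """Map the short image reference to whichever backend is active."""
    if USE_EXTERNAL_HOST:
        return f"http://client.com/image/{image_id}/{name}"
    return (f"http://server.com/images/{image_id}/1/account_number"
            f"/public/assets/images/{name}")
```

Flipping the flag switches every image between the local server and the external host without touching the HTML.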
The only real downside is a second DNS lookup and a little server overhead to calculate the redirect, which both affect performance. Otherwise I can't think of any problem with this technique.

Should I default my website to www.foo or not?

Notice how the default domain for Stack Overflow is http://stackoverflow.com, and if you try to go to http://www.stackoverflow.com it bounces you to http://stackoverflow.com?
What is the reason for this? Not the tech reason (as in the http code, etc) but why would the site owners want to do this?
I know it's purely aesthetic and I always have host-headers for both www and not, but is there a reason to bounce a user to a single domain, subheaded or not?
Update 1
Not having a subdomain is called a bare domain. Thanks, peeps! Never knew it had a term :)
Update 2
Thanks for the answers so far - please note I understand that www.domain.com can point to domain.com. This is not a question about whether I should offer both or either/or; it's asking why some sites default to a bare domain instead of the www subdomain, or vice versa. Cheers.
Jeff Atwood actually HAS explained why he's gone for bare domains here and here. (Nod to Jonas Pegerfalk for the post :) )
Jeff's post (and others in this thread) also talks about the problems of a bare domain with cookies and static images. Basically, cookies set on a bare domain are sent to all subdomains as well. The solution is to purchase another domain, as posted by the Yahoo performance team here.
Jeff Atwood has written a great article about the Great Dub-Dub-Dub Debate. There is also an entry on the Stack Overflow blog on why and how Stack Overflow dropped the www prefix.
As far as I can tell, it doesn't really matter, but you should pick one or the other as the default and forward to that.
The reason is that, depending on the browser implementation, www.example.com cookies are not always accessible to example.com (or is it the other way around?).
For more discussion on this, see:
in favor of www
http://faq.nearlyfreespeech.net/section/domainnameservice/baredomain#baredomain - This webhost lists several good reasons for anyone considering doing more than simple webhosting to consider (such as load balancing, subdomains with different content, etc.)
http://yes-www.org - This blog post from 2005 mainly proposed that most internet users needed the www prefix in order to recognize a URL. This is less important now that browsers have built-in searching. Most computer illiterates I know bypass the URL bar entirely.
in opposition to www
http://no-www.org/
and a miscellaneous related rant about why www should not be used as a CNAME record, but only as an A record.
http://member.dnsstuff.com/rc/index.php?option=com_myblog&task=view&id=62&Itemid=37
It is worth noting that you can't have a CNAME and an NS record on the same (bare domain) name in DNS. So, if you use a CDN and need to set up a CNAME record for your web server, you can't do it on a bare domain. You must use "www" or some other prefix.
Having said that, I prefer the look of URLs without the "www." prefix so I use a bare domain for all my sites. (I don't need a CDN.)
When I am mentioning URLs for the general public (eg. on a business card), I feel that one has to use either the www. prefix or the http:// prefix. Otherwise, just a bare domain name doesn't tell people they can necessarily type it into their browser. So, since I consider http:// an ugly wart on a business card, I do use the www. prefix there.
What a mess.
In some cases, www might indeed point to a completely separate subdomain in a large corporate environment. Especially on an internal network, having the explicit www can make split DNS easier if the Web site is hosted externally (say, at Rackspace in Texas, but everything else is in your office in Virginia.) In most cases, it doesn't matter.
The important thing is to pick one and add an IHttpModule, rewrite rule, or equivalent for your platform to permanently redirect requests from one to the other.
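On Apache, for example, such a permanent redirect might look like this; the domain is a placeholder:

```apache
# 301-redirect www.example.com to the bare domain
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]
```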
Having both can lead to scary certificate warnings when switching from http to https if you don't have a wildcard certificate and forget to explicitly redirect based on your site's name (which you probably don't, because you want your code to work in both dev and production, so you're using some variable populated by the server).
Much more importantly, having both accepted results in search engines seeing duplicated content: you get dinged for having duplicated content, and you get dinged because your hits are split across two different URIs, hurting your rankings.
Actually, you can use both of them, so users can find your address either way. I mean, it doesn't really matter, though :)
But putting www as a prefix is more common in public, so I guess I'd prefer to use www.
It's easier to type google.com than www.google.com, so give the option of both. Remember, the www is just a subdomain.
Also, no www is commonplace these days, so maybe make www.foo.com redirect to foo.com.
I think one reason is to help with search rankings, so that each page's ranking goes to a single URL instead of being split between two domains.
I'm not sure why the Stack Overflow team decided to use only one, but if it were me, I'd do it for simplicity. You'd have to allow for both, since a lot of people type www by default or out of habit (I'm sure less "techy" people have no idea that there's a difference).
Aside from that, there used to be a difference as far as search engines were concerned and so there was concern about having either a duplicate content penalty or having link reputation split. But this has long since been handled and so isn't much of a consideration at this point.
So I'd say it's pretty much personal preference to keep things simple.

Unlinked web pages on a server - security hole?

On my website, I have several html files I do not link off the main portal page. Without other people linking to them, is it possible for Jimmy Evil Hacker to find them?
If anyone accesses the pages with advanced options turned on in their Google Toolbar, then the address will be sent to Google. This is the only reason I can figure out why some pages I have are on Google.
So, the answer is yes. Ensure you have a robots.txt or even .htaccess or something.
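For example, a robots.txt asking well-behaved crawlers to skip a directory; the path is illustrative, and note that robots.txt does not stop malicious visitors (it actually advertises the path):

```
User-agent: *
Disallow: /private/
```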
Hidden pages are REALLY hard to find.
First, be absolutely sure that your web server does not return any default index pages ever. Use the following everywhere in your configuration and .htaccess files. There's probably something similar for IIS.
Options -Indexes
Second, make sure the file name isn't a dictionary word; the odds of guessing a non-dictionary word are astronomically small. Non-zero, yes: there's a theoretical possibility that someone, somewhere might patiently guess every possible file name until they find yours. [I hate these theoretical attacks. Yes, they exist. No, they'll never happen in your lifetime, unless you've given someone a reason to search for your hidden content.]
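A simple way to get such a non-guessable name; a sketch using Python's standard library:

```python
import secrets

# A random URL-safe token makes the file name infeasible to guess:
# token_urlsafe(16) encodes 16 bytes (128 bits) of randomness.
hidden_name = secrets.token_urlsafe(16) + ".html"
```

This gives names like "kqvLm3xPz9...html", roughly 22 random characters plus the extension.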
You're talking about security through obscurity (google it), and it's never a good idea to rely on it.
Yes, it is.
It's unlikely they will be found, but still a possibility.
The term "security through obscurity" comes to mind