Blocking access for a given geographic location - language-agnostic

What is the most reliable way to prevent users in a given geographic location from accessing a web application?
I understand that IP addresses are related to geographic position, and I also understand that the most naive way is to take the IP address of the HTTP request and go from there.
It's obvious that naive methods like the one described are extremely easy to bypass, especially using proxies or VPNs.
So the question is: is there a 100% reliable way of determining a web user's geolocation? If not, what are the available options, and what are the pros and cons of each?

The short answer is no. There is no way to lock out 100% of the people in a specific geographic location, because an IP address cannot tell you a user's location that reliably. Even if it could, the location can be faked through redirects.
There are ways to make it more difficult for people in a region to access the site, but the more restrictive you get with those approaches the more legitimate users you are likely to lock out. For example, turning off the server would give you 100% assurance that no one from China could hit it, but it would also give you 100% assurance that no one in the US could either.

Nothing in TCP/IP includes location data (other than what you can infer from routing tables or look up in a database), and nothing indicates whether a machine is acting "on behalf of" someone in another location.
So, as you say, proxies, VPNs, SSH port forwarding, Tor, etc. can completely prevent your web app from knowing the physical location of the human being who's using your site. All you can look up is the IP address of the last hop, which is the TCP/IP connection and HTTP request you actually see.

The above techniques won't work if anyone is trying to hide their location from you by redirecting through relays in other countries.

I found this script to be an easy way to implement this:
https://www.blocked.com/
Country blocking is included in the free version, as is blocking of open proxy servers, anonymity networks, etc.

There is a database somewhere on the tubes named IP 2 Country which can tell you where an IP is from.
It is of course not perfect, but it can give you the country an IP address comes from.
There is also a method called SSN which is related to IP addresses. I don't know how it works, however, and it seems to be rather complicated. It is commonly used in ads to send you localised spam. For example, if you live in Montreal, Canada, then the ad will display "Find singles from Montreal!". The ISP behind the person does have to support this service.

First, figure out which IP ranges are assigned to the region. Then you can check the user's IP address on every request. If it falls inside one of the ranges you want to block, send them to disney.com.
See if this helps you: IP Address Info
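A minimal sketch of that per-request range check, assuming you already have the region's ranges as CIDR blocks. The networks below are reserved documentation ranges standing in for a real list, and `is_blocked` is a hypothetical helper name. As the other answers point out, this only sees the last hop, so a proxy or VPN outside the region gets through.

```python
import ipaddress

# Placeholder ranges (RFC 5737 documentation networks), not a real region list.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(remote_addr: str) -> bool:
    """Return True if remote_addr falls inside any blocked network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Unparseable address: treated as not blocked in this sketch.
        return False
    return any(addr in network for network in BLOCKED_NETWORKS)
```

A request handler would call `is_blocked` with the peer address it sees and redirect when it returns True.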

No, there's no fool-proof way of doing this.
There's plenty of related work going on at the IETF in the GeoPriv working group, where protocols are being designed (e.g. HELD) to allow entities to ask the network their own location, and also allow other authorised entities to request that information.
However the VPN issue still causes problems, to the extent that clients with VPN capability need to request their location information before the VPN is established.

Related

How is location determined from internet?

I was just installing Ubuntu and noticed it was downloading updates from ca.archive.ubuntu.com. How did it know I was in Canada? As far as I'm aware, an IP packet carries no information about physical (geographical) location, and nothing in the Ethernet standard says anything about location either.
So how do things such as geolocation work? For example, this website tells you which country your IP address belongs to. Is it just a matter of looking up an IP address in a table? If so, where does the data come from? It's not as if people actively sign up to have their IP address associated with a street address.
How does IP address geolocation work? Does it just look up the IP in a table?
Yes, that's exactly how it works.
IP geolocation is nothing more complicated than a database lookup. IP addresses are assigned by IANA to regional governing entities who then assign (sell) them to ISPs, governments and corporations (IBM for example has a dedicated block of IP addresses for themselves because they got into the internet game very early on).
Based on this fact we can sort of figure out where an IP address is located. IANA themselves publish the block level allocations on their site: https://www.iana.org/assignments/ipv4-address-space/ipv4-address-space.xml which is rendered beautifully in this XKCD comic: http://xkcd.com/195/.
More detailed information, such as which city an IP address comes from, requires more data gathering. Some ISPs may tell you their assignment schemes; most don't. So most databases, like whatismyipaddress.com, painfully build theirs up through surveys (simply asking people where they are, or via smartphone apps tapping into GPS), by looking up whois databases (which may or may not lie), and by careful guessing.
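To make the "it's just a table lookup" point concrete, here is a rough sketch of how such a lookup could work: keep the ranges sorted by start address and binary-search them. The three rows and their country codes are made up for illustration (they use documentation ranges), and `lookup_country` is a hypothetical name, not the API of any real geolocation database.

```python
import bisect
import ipaddress

# Made-up allocation table: (first address, last address, country code).
RANGES = sorted(
    (int(ipaddress.ip_address(first)), int(ipaddress.ip_address(last)), country)
    for first, last, country in [
        ("192.0.2.0", "192.0.2.255", "CA"),
        ("198.51.100.0", "198.51.100.255", "US"),
        ("203.0.113.0", "203.0.113.255", "JP"),
    ]
)
STARTS = [row[0] for row in RANGES]


def lookup_country(ip: str):
    """Return the country code whose range covers ip, or None if none does."""
    value = int(ipaddress.ip_address(ip))
    # Find the last range that starts at or before the address.
    index = bisect.bisect_right(STARTS, value) - 1
    if index >= 0:
        first, last, country = RANGES[index]
        if first <= value <= last:
            return country
    return None
```

The accuracy of the answer is only as good as the rows in the table, which is why city-level results are so much shakier than country-level ones.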
Yes, your IP carries a geolocation as well. I'm not sure that's the best way to describe it, as it doesn't really carry the information (I don't think?). This link gives a pretty good idea of the kind of details they can get from your ISP though:
http://whatismyipaddress.com/geolocation-accuracy
Of course all of that revealing information can be partially negated by using a proxy.

Are there any tips for minimising access to a public page without a login?

I have a page that is just a non interactive display for a shop window.
Obviously, I don't link to it, and I'd also like to avoid people stumbling across it (by Google etc).
It will always be powered by Chrome.
I have thought of...
Checking User Agent for Chrome
Ensuring resolution is 1920 x 1080 (not that useful as it is a client side check)
Banning under robots.txt to keep Google out of it
Do you have any more suggestions?
Should I not really worry about it?
Not that I would EVER recommend what I'm about to suggest, but how about filtering by IP address? Since your provider-assigned IP is rarely going to change, you can use JavaScript to kick out or deny requests from IP addresses other than yours. Maybe a clean redirect to http://www.google.com or something silly like that. Although I would still suggest locking it down with a login and password and just having it write a never-expiring cookie. That's still not a great idea, but it's a shade better than the road you're trucking down right now.
You could always limit the connections by IP address (If you know it ahead of time/it's reliable):
Apache's access control
If it is just for a shop window, do you even need access to a web page?
You can host the file locally.
Personally, I wouldn't worry about it, if no-one is linking to it externally it is unlikely to ever be found by search engines.
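For the robots.txt idea from the question, a quick way to sanity-check the rules before deploying them is Python's standard `urllib.robotparser`. The sketch below assumes the display page lives under a hypothetical `/window-display/` path. Keep in mind that robots.txt is only a request to well-behaved crawlers, not access control.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content; "/window-display/" is a hypothetical path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /window-display/
"""


def is_crawlable(path: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the rules above allow user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)
```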

Is there any globally unique identifier for a client machine accessible through the web browser?

Is there any way to identify a user's machine through a browser without previously setting cookies? Probably no access to the MAC address through the web, right? Just thought I'd ask...
There is no such identity element, and even if there were, the nature of the HTTP protocol would not prevent it from being spoofed.
In short: No.
This was partly why Intel tried to have unique processor IDs a few years back, but that didn't ever take off. (Which is good as now we have multi-core machines.)
Just install a cookie on the box. The IP address is no good because of NAT. Someday we'll have IPv6 to do this correctly.
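A minimal sketch of the cookie approach, assuming the server hands out a random ID the first time it sees a browser and reads it back afterwards. The cookie name `client_id` and the one-year lifetime are arbitrary choices for illustration. The ID identifies a browser profile rather than a machine, and it disappears when the user clears cookies.

```python
import uuid
from http.cookies import SimpleCookie

COOKIE_NAME = "client_id"  # hypothetical cookie name


def get_or_create_client_id(cookie_header: str):
    """Return (client_id, set_cookie_value).

    set_cookie_value is None when the browser already sent an ID;
    otherwise it is the value for a Set-Cookie response header.
    """
    cookies = SimpleCookie()
    cookies.load(cookie_header or "")
    if COOKIE_NAME in cookies:
        return cookies[COOKIE_NAME].value, None

    client_id = uuid.uuid4().hex  # random, not derived from the hardware
    fresh = SimpleCookie()
    fresh[COOKIE_NAME] = client_id
    fresh[COOKIE_NAME]["max-age"] = 60 * 60 * 24 * 365  # one year, in seconds
    fresh[COOKIE_NAME]["httponly"] = True
    return client_id, fresh[COOKIE_NAME].OutputString()
```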
You could retrieve an IP address, but it frequently wouldn't mean much (if anything). If you retrieve the IP address the client is using, you'll get a whole lot of them that are 192.168.*. If you retrieve the address your server sees, it won't match that, and you might easily see several (possibly hundreds or even thousands) of machines with the 'same' IP address.
If you put those two together, you'll get something that's unique for the moment, but is subject to change at any time. The client's local IP address may change when their DHCP lease expires and their global IP address may change anytime they reboot their router (unless they have a static IP address, which you mostly don't have any way of knowing).

What are the approaches to restrict the access to a group of machines in a web system?

My bank's website has a security feature that lets me register the machines that are allowed to make banking transactions. If someone steals my password, he won't be able to transfer my money from his computer. Only my personal computers are allowed to make transactions from my account. So...
What are the approaches to restrict the access to a group of machines in a web system?
In other words, how do you identify, on the web server, the computer that made the HTTP request?
Why not use a client certificate, kept either in the certificate store of an authorized host or inside a cryptographic token such as a smartcard that can be plugged into any desired computer?
Update: You should take into account that uniquely identifying a computer means obtaining something at a relatively low level, inaccessible to code embedded in an HTML page (JavaScript, an unsigned applet or ActiveX control), unless you install something on the desired computer (or execute something signed, such as an applet or ActiveX control).
One thing that is unique per computer is the MAC address of the Ethernet card, which is almost ubiquitous on every rather modern (and not so modern) computer. However, that may not be secure enough, since many cards allow their MAC address to be changed.
The Pentium III used to have a unique serial number inside the CPU, which would have fit your use perfectly. The downside is that no newer CPUs come with such a thing, due to privacy concerns from most users.
You could also combine many elements of the computer such as CPU id (model, speed, etc.), motherboard model, hard disk space, memory installed and so on. I think Windows XP used to gather such type of information to feed a hash to uniquely identify a computer for activation purposes.
Update 2: Hard disks also come with serial numbers that can be retrieved by software. Here is an example of how to get it for activation purposes (your case). However, the serial number will still match if somebody moves the hard disk to another computer. Nonetheless, you can combine it with other unique data from the computer (such as the MAC address, as I said before). I would also add a unique key generated for each user and kept in a database of your own (so that it could be retrieved online from a server), and feed it along with the rest into a hash function that identifies the system.
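As a rough sketch of that last step, here is how several attributes plus a per-user key could be fed into a hash. Collecting the attributes (MAC address, disk serial, CPU model) needs something installed on the machine and is out of scope here; the dictionary keys and the function name are hypothetical, and SHA-256 is simply one reasonable choice of hash.

```python
import hashlib


def machine_fingerprint(attributes: dict, user_key: str) -> str:
    """Hash a set of machine attributes together with a per-user key."""
    # Sort the names so the same attributes always produce the same digest.
    parts = [f"{name}={attributes[name]}" for name in sorted(attributes)]
    parts.append(f"user_key={user_key}")
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()
```

Changing any single input (a swapped disk, a spoofed MAC address) changes the whole digest, so the server has to decide how to handle legitimate hardware changes.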
Did you actually install something?
Over and above what Mark Brittingham mentions about IP addresses, I suppose some kind of hash code that is known only to your bank's computer and your computer(s) would work, provided you installed something. However, if you don't have a very strong password to begin with, what would stop someone from "registering" their computer to steal money from you?
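One way to read "a hash code known only to the bank's computer and yours" is a shared secret used to sign requests. The sketch below uses HMAC-SHA256 for that, which is my substitution and not something the answer specifies. It assumes the secret was handed to the installed client during registration, and all function names are hypothetical. It does nothing about the weak-password problem raised above.

```python
import hashlib
import hmac
import secrets


def new_device_secret() -> bytes:
    """Generate a random secret to store on both the server and the device."""
    return secrets.token_bytes(32)


def sign_request(device_secret: bytes, message: bytes) -> str:
    """Sign a request body on the registered device."""
    return hmac.new(device_secret, message, hashlib.sha256).hexdigest()


def verify_request(device_secret: bytes, message: bytes, signature: str) -> bool:
    """Check on the server that the request came from a holder of the secret."""
    expected = sign_request(device_secret, message)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), signature.encode())
```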
I would guess your bank was doing it by using a trusted applet - my bank used to have a similar approach (honestly I thought it was a bit of a hassle - now they're using a calculator-like code generator instead). The trusted applet has access to your file system, so it can write some sort of identifier to a file on your system and retrieve this later.
A tutorial on using trusted applets.
I'm thinking about using Gears to store locally a hash-something to flag that the computer is registered.
If you are looking for the IP address of the computer that makes an account-creation request, you can easily pull that from the Request. In ASP.NET, you'd use:
string IPAddress = Request.UserHostAddress;
You could then store that with the account record and only accept logins for that account from that IP address. The problem, of course, is that this will not work for a public site at all. Most people come through an ISP that assigns IP addresses dynamically. Even with an always-on internet connection, the ISP will occasionally drop and re-open the connection, resulting in a change of IP address.
Anyway, is this what you are looking for?
Update: if you are looking to register a specific computer, have you considered using cookies? The drawback, of course, is that someone may clear their cookies and thus "unregister" their computer. The problem is, the web only has so much access to your computer (not much), so there is no fool-proof way to "register" a computer. Even if you install an ActiveX control, they could uninstall or delete it (although this is more persistent than a cookie). In the end, you'll always have to provide the end user with some method for re-registering. And, if you do that, then you might as well have them log in anyway.

When should one use a 'www' subdomain?

When browsing through the internet for the last few years, I'm seeing more and more pages getting rid of the 'www' subdomain.
Are there any good reasons to use or not to use the 'www' subdomain?
There are a ton of good reasons to include it, the best of which is here:
Yahoo Performance Best Practices
Due to the dot rule with cookies, if you don't have the 'www.' then you can't set two-dot cookies or cross-subdomain cookies a la *.example.com. There are two pertinent impacts.
First it means that any user you're giving cookies to will send those cookies back with requests that match the domain. So even if you have a subdomain, images.example.com, the example.com cookie will always be sent with requests to that domain. This creates overhead that wouldn't exist if you had made www.example.com the authoritative name. Of course you can use a CDN, but that depends on your resources.
Also, you then don't have the ability to set a cross-subdomain cookie. This seems evident, but this means allowing authenticated users to move between your subdomains is more of a technical challenge.
So ask yourself some questions. Do I set cookies? Do I care about potentially needless bandwidth expenditure? Will authenticated users be crossing subdomains? If you're really concerned with inconveniencing the user, you can always configure your server to take care of the www/no www thing automatically.
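The overhead point can be illustrated with a simplified version of cookie domain matching (in the spirit of RFC 6265, for a cookie set with a Domain attribute). `cookie_is_sent` is a hypothetical helper, and it ignores paths, the Secure flag and public-suffix rules.

```python
def cookie_is_sent(cookie_domain: str, request_host: str) -> bool:
    """Return True if a cookie scoped to cookie_domain goes to request_host."""
    cookie_domain = cookie_domain.lower().lstrip(".")
    request_host = request_host.lower()
    # The host matches if it is the domain itself or any subdomain of it.
    return request_host == cookie_domain or request_host.endswith("." + cookie_domain)
```

With this rule, a cookie scoped to example.com also travels to images.example.com, while one scoped to www.example.com does not, which is the bandwidth argument made above.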
See dropwww and yes-www (saved).
Just after asking this question I came over the no-www page which says:
...Succinctly, use of the www subdomain is redundant and time consuming to communicate. The internet, media, and society are all better off without it.
Take it from a domainer: use both www.domainname.com and the bare domainname.com, otherwise you are just throwing your traffic away to the browser's search engine (DNS error).
Actually it is amazing how many domains out there, especially amongst the top 100, correctly resolve for www.domainname.com but not domainname.com
There are MANY reasons to use the www sub-domain!
When writing a URL, it's easier to handwrite and type "www.stackoverflow.com" than "http://stackoverflow.com". Most text editors, email clients, word processors and WYSIWYG controls will automatically recognise both of the above and create hyperlinks. Typing just "stackoverflow.com" will not result in a hyperlink; after all, it's just a domain name. Who says there's a web service there? Who says the reference to that domain is a reference to its web service?
What would you rather write/type/say.. "www." (4 chars) or "http://" (7 chars) ??
"www." is an established shorthand way of unambiguously communicating the fact that the subject is a web address, not a URL for another network service.
When verbally communicating a web address, it should be clear from the context that it's a web address so saying "www" is redundant. Servers should be configured to return HTTP 301 (Moved Permanently) responses forwarding all requests for #.stackoverflow.com (the root of the domain) to the www subdomain.
In my experience, people who think WWW should be omitted tend to be people who don't understand the difference between the web and the internet and use the terms interchangeably, like they're synonymous. The web is just one of many network services.
If you want to get rid of www, why not change your HTTP server to use a different port as well? TCP port 80 is sooo yesterday. Let's change that to port 1234. YAY, now people have to say and type "http://stackoverflow.com:1234" (aitch tee tee pee colon slash slash stack overflow dot com colon one two three four), but at least we don't have to say "www", eh?
There are several reasons, here are some:
1) The person wanted it this way on purpose
People use DNS for many things, not only the web. They may need the main dns name for some other service that is more important to them.
2) Misconfigured dns servers
If someone does a lookup of www to your dns server, your DNS server would need to resolve it.
3) Misconfigured web servers
A web server can host many different web sites. It distinguishes which site you want via the Host header. You need to specify which host names you want to be used for your website.
4) Website optimization
It is better to not handle both, but to forward one with a moved permanently http status code. That way the 2 addresses won't compete for inbound link ranks.
5) Cookies
To avoid problems with cookies not being sent back by the browser. This can also be solved with the moved permanently http status code.
6) Client side browser caching
Web browsers may not cache an image if you make a request to www and another without. This can also be solved with the moved permanently http status code.
There is no huge advantage to including-it or not-including-it and no one objectively-best strategy. “no-www.org” is a silly load of old dogma trying to present itself as definitive fact.
If the “big organisation that has many different services and doesn't want to have to dedicate the bare domain name to being a web server” scenario doesn't apply to you (and in reality it rarely does), which address you choose is a largely cultural matter. Are people where you are used to seeing a bare “example.org” domain written on advertising materials, would they immediately recognise it as a web address without the extra ‘www’ or ‘http://’? In Japan, for example, you would get funny looks for choosing the non-www version.
Whichever you choose, though, be consistent. Make both www and non-www versions accessible, but make one of them definitive, always link to that version, and make the other redirect to it (permanently, status code 301). Having both hostnames respond directly is bad for SEO, and serving any old hostname that resolves to your server leaves you open to DNS rebinding attacks.
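A minimal sketch of that policy as a framework-independent function: pick one canonical hostname, answer the other with a 301, and refuse hostnames you don't recognise. The hostnames, the `https` scheme and the 400 response for unknown hosts are assumptions for illustration, and `route` is a hypothetical name.

```python
CANONICAL_HOST = "www.example.org"  # assumed choice; the bare name works equally well
KNOWN_HOSTS = {"www.example.org", "example.org"}


def route(host_header: str, path: str):
    """Return (status, headers) for a host-level response, or None to serve normally."""
    host = host_header.split(":")[0].lower()  # drop any port
    if host not in KNOWN_HOSTS:
        # Don't serve arbitrary hostnames that happen to resolve to this server.
        return 400, {}
    if host != CANONICAL_HOST:
        # Permanent redirect so links and search rankings consolidate on one name.
        return 301, {"Location": f"https://{CANONICAL_HOST}{path}"}
    return None
```

In practice the same thing is usually done in the web server configuration rather than in application code.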
Apart from the load optimization regarding cookies, there is also a DNS related reason for using the www subdomain. You can't use CNAME to the naked domain. On yes-www.org (saved) it says:
When using a provider such as Heroku or Akamai to host your web site, the provider wants to be able to update DNS records in case it needs to redirect traffic from a failing server to a healthy server. This is set up using DNS CNAME records, and the naked domain cannot have a CNAME record. This is only an issue if your site gets large enough to require highly redundant hosting with such a service.
As jdangel points out the www is good practice in some cookie situations but I believe there is another reason to use www.
Isn't it our responsibility to care for and protect our users? As most people expect www, you will give them a less than perfect experience by not programming for it.
To me it seems a little arrogant not to set up a DNS entry just because in theory it's not required. There is no overhead in carrying the DNS entry, and visitors can be redirected from it to a non-www address.
Seriously, don't lose valuable traffic by leaving your potential visitor with an unnecessary "site not found" error.
Additionally, in a Windows-only network you might be able to set up a Windows DNS server to avoid the following problem, but I don't think you can in a mixed environment of Macs and Windows machines. If a Mac does a DNS query against a Windows DNS server, mydomain.com will return all the available name servers, not the web server. So if you type mydomain.com in your browser, the browser will query a name server rather than a web server; in this case you need a subdomain (e.g. www.mydomain.com) to point to the specific web server.
Some sites require it because the service is configured on that particular set up to deliver web content via the www sub-domain only.
This is correct as www is the conventional sub-domain for "World Wide Web" traffic.
Just as port 80 is the standard port. Obviously there are other standard services and ports as well (http tcp/ip on port 80 is nothing special!)
Imagine mycompany...
mx1.mycompany.com 25 smtp, etc
ftp.mycompany.com 21 ftp
www.mycompany.com 80 http
Sites that don't require it basically have forwarding in dns or redirection of some-kind.
e.g.
*.mycompany.com 80 http
The only reason to do it, as far as I can see, is if you prefer it and you want to.