How is location determined from internet? - language-agnostic

I was just installing Ubuntu and noticed it was downloading updates from ca.archive.ubuntu.com. How did it know I was in Canada? As far as I'm aware, an IP packet carries no information about physical (geographical) location, and nothing in the Ethernet standard says anything about location either.
So how does geolocation work? For example, this website tells you which country your IP address belongs to. Is it just a matter of looking up an IP address in a table? If so, where does the data come from? It's not as if people actively sign up to have their IP address associated with their building address.

How does IP address geolocation work? Does it just look up the IP in a table?
Yes, that's exactly how it works.
IP geolocation is nothing more complicated than a database lookup. IP addresses are allocated by IANA to regional governing entities (the Regional Internet Registries, such as ARIN and RIPE NCC), who then assign (sell) them to ISPs, governments and corporations (IBM, for example, has a dedicated block of IP addresses for itself because it got into the internet game very early on).
Based on this fact we can sort of figure out where an IP address is located. IANA themselves publish the block level allocations on their site: https://www.iana.org/assignments/ipv4-address-space/ipv4-address-space.xml which is rendered beautifully in this XKCD comic: http://xkcd.com/195/.
As for more detailed info, like which city an IP address comes from, getting that requires more data gathering. Some ISPs may tell you their assignment schemes; most don't. So most databases, like the one behind whatismyipaddress.com, painstakingly build theirs up from surveys (simply asking people where they are, or via smartphone apps tapping into GPS), from whois lookups (which may or may not lie) and from careful guessing.
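To make the "database lookup" concrete, here is a minimal sketch in Python. The table is hypothetical: the ranges are reserved documentation prefixes and the country codes are invented for illustration. A real database has far more rows and is compiled from registry data, whois records and surveys as described above.

```python
import bisect
import ipaddress

# Hypothetical allocation table: (first address, last address, country code).
# The ranges are documentation prefixes; the country codes are made up.
ALLOCATIONS = sorted(
    (int(ipaddress.ip_address(first)), int(ipaddress.ip_address(last)), country)
    for first, last, country in [
        ("192.0.2.0", "192.0.2.255", "CA"),
        ("198.51.100.0", "198.51.100.255", "US"),
        ("203.0.113.0", "203.0.113.255", "AU"),
    ]
)
RANGE_STARTS = [start for start, _, _ in ALLOCATIONS]


def lookup_country(address):
    """Return the country code of the range containing address, or None."""
    value = int(ipaddress.ip_address(address))
    # Find the last range that starts at or before the address...
    index = bisect.bisect_right(RANGE_STARTS, value) - 1
    if index < 0:
        return None
    # ...and check that the address does not fall past the end of that range.
    _, end, country = ALLOCATIONS[index]
    return country if value <= end else None


print(lookup_country("192.0.2.17"))
```

Sorting the ranges and using a binary search keeps each lookup fast even with millions of rows, which is presumably why this kind of service can answer on every request.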

Yes, your IP carries a geolocation as well. I'm not sure that's the best way to describe it, as it doesn't really carry the information (I don't think?). This link gives a pretty good idea of the kind of details they can get from your ISP though:
http://whatismyipaddress.com/geolocation-accuracy
Of course all of that revealing information can be partially negated by using a proxy.

Related

GCE Instance IP Address

I recently created a GCE instance in the "europe-west" zone.
It's intended to run an application that connects to an external web service.
When trying to log in to the web service, I get an error about a restricted region.
It turns out the webservice does not accept login requests from US regions.
I checked and even though my instance is in the "europe-west" zone, its associated IP is being reported as US.
Is there anything I can do to get a proper region IP or is there any way around this?
May need to abandon GCE if the answer is no...
Thanks
Robert
Reposting the answer from Gary Ling, product manager for external networking:
Thank you for posting the email. We are aware of this issue that
(almost) all Google IP addresses are SWIP'ed to be Mountain View, CA.
And at Google, it's not uncommon to remap a block of IPs from one
location to another, especially given the elasticity of IP addresses
for the Cloud. Too bad that many of external Geo IP services solely
depend on SWIP database. While we are evaluating what we can do to
help our customers, your best bet in my opinion is contacting your
API provider and explore options they may offer now.
To be more explicit, there are several ways that a Geo IP provider might determine the location of an IP address. Most of these probably won't work well with a global cloud provider like GCP.
Associate the IP with the region of the allocating internet authority. In this case, GCE has addresses mostly allocated from ARIN, the American Internet authority. Once allocated to Google, these addresses can be used in any location by managing routing rules on Google's internal network.
Associate the IP with the address of the registering company. In that case, the official address associated with all GCP IPs is the Google Mountain View headquarters, even for addresses used in Europe or Asia.
Use network distance measurements to determine where a subnet is located. This method is more expensive, because it requires sending active pings from multiple locations around the globe; typically the address is associated with the closest measurement node. This is a more accurate method, but requires running many well-connected nodes and sending a lot of internet traffic to, at a minimum, each /24 on the internet.
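The third method can be sketched roughly as follows. This is only an illustration: the probe names and round-trip times are invented, and a real provider would gather them with active pings from its own measurement nodes and deal with far more noise.

```python
def closest_probe(rtt_ms_by_probe):
    """Return the name of the probe with the lowest observed round-trip time."""
    # Use the minimum sample per probe: queueing only ever adds delay,
    # so the smallest value is the best estimate of network distance.
    best = {
        probe: min(samples)
        for probe, samples in rtt_ms_by_probe.items()
        if samples  # skip probes that received no replies
    }
    if not best:
        return None
    return min(best, key=best.get)


# Hypothetical measurements (milliseconds) towards one target subnet.
measurements = {
    "frankfurt": [12.1, 11.8, 13.0],
    "ashburn": [95.4, 96.2],
    "singapore": [170.3, 168.9],
    "sydney": [],
}
print(closest_probe(measurements))
```

The target is then associated with the location of the winning probe, which is why the result is only as precise as the density of the probe network.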
All IP addresses from Google Cloud will appear to originate from the US, specifically Mountain View, because they are registered to Google's headquarters there. The data and hardware for your instance are nevertheless located in specific data centers across the globe, depending on the location you selected; see link [1] for reference. Google Public DNS uses anycast routing to direct packets to the geographically closest DNS server, so when you assign an IP address to an instance, Google's network knows that the address is in that zone, even if it was originally registered in Mountain View, California. See links [2] and [3] for a detailed explanation. The reason your instance's IP address appears to originate from the US is that the entire IP block is owned by Google, and the ARIN record lists Google's HQ in Mountain View as the address for the whole block.
[1] https://groups.google.com/forum/#!searchin/gce-discussion/ip$20in$20us%7Csort:relevance/gce-discussion/otD1c6E-wWI/cvEDCUAlBAAJ
[2] Why do Google Cloud Platform static IP addresses list Mountain View, CA in reverse lookup regardless of region assignment?
[3] https://groups.google.com/forum/#!searchin/gce-discussion/us$20ip%7Csort:relevance/gce-discussion/RjzyHRBRujg/Fd21YlmOpzEJ

Why HTML5 Geolocation?

Why does HTML5 geolocation let you share your location? What is the main purpose of using geolocation, given that you can get the location from the IP address as well? Is there any difference between these two methods?
I'm asking because geolocation requires the user's permission and also doesn't work on all browsers.
HTML5 GeoLocation tends to be much more accurate than IP-based GeoLocation.
IP-based GeoLocation depends on databases associated with ISPs to figure out where you are. This doesn't work well when your ISP services a very large area and gives out dynamic IP addresses. The address in one town today might be 100 miles away tomorrow. Furthermore, those databases are usually not updated frequently. If your ISP sells off blocks of IPs or moves them to a new town, the database may still incorrectly think you're somewhere else.
HTML5 location uses services provided by your browser to figure out where you are. If your computer has GPS built-in (such as on many mobile devices and some laptops), it will know exactly where you are. This makes it much more useful for webapps that have a navigation or location component. For devices without GPS, it can often provide a very good approximation based on nearby known wireless signals and other factors, such as tracing what routers your computer goes through when connecting to the internet. The exact implementation depends on the computer, what hardware it has available, and how the browser chooses to do things.
For example, when I check an IP-based location service, it says that I'm in a particular large city in the same general area that I live in, but it's actually about 50 miles away.
When I use an HTML5 location based service to figure out where I am, it's only off by about 20 blocks.
If you're developing a webapp which needs location data, try using HTML5 GeoLocation if at all possible. Set up a fallback, so that if HTML5 location fails, you can use a GeoIP solution instead. This way, you will get optimal accuracy on supported browsers and devices, but will still have a solution in place when the HTML5 data are not available.
If you used the geolocation-javascript project from Google Code, you could use the following code:
// Determine whether the device has geolocation capabilities
if (geo_position_js.init()) {
    geo_position_js.getCurrentPosition(success_callback, error_callback);
} else {
    // Use an AJAX request to look up the location based on IP address
}
Geolocation is a lot more precise than IP address, though both can be faked.
IP address just gives you country and general region.
Geolocation gives you:
Geographic coordinates (latitude and longitude)
Speed (assuming you're on a device that can measure this; most tablets and smartphones can)
Altitude (this is also dependent on the device)
Degrees clockwise from north (again, assuming the device supports this)
http://diveintohtml5.ep.io/geolocation.html has some good info on geolocation and the HTML5 geolocation API. Here's the W3C Candidate Recommendation.
Obtaining a user's location from the IP address is not nearly as accurate. The IP address's location is mostly based on the location of the actual server, which can be looked up. Often this is far from the user's actual location, so it only gives the general region.
HTML5 geolocation, on the other hand, is more precise, but it uses information from the user's device to determine the location, and often also speed and a few other things. This is based on the device's GPS if available, and otherwise on information entered by the user. Clearly this is far more accurate. Both methods can be faked, though.
Getting location by IP address only gives a vague location (it is rarely any more accurate than to town, and often much less accurate than that, depending on your location and ISP).
It is also sometimes completely inaccurate: if I use a VPN to connect to my company network, I will show up as being at their office because I will have an IP address from the office, but I could actually be connecting from anywhere in the world.
HTML5 geolocation can be much more accurate -- if you have a GPS receiver in your device, then it is completely accurate, but even without that, in heavily populated areas it can get your position with an accuracy of 20 meters or less by mapping the local wireless network signals. And it doesn't matter how you've connected, it will always be accurate.
Because HTML5 geolocation is so accurate, it is considered a privacy risk, so the spec states that you must give permission before a site can use your geolocation data. Also, not all browsers or machines may be capable of providing the geolocation data. The website therefore must be able to cope with users who do not provide it, and cannot rely on it being provided.
IP address location doesn't have this kind of restriction because the location mapping is done by the server using publicly available IP location mapping data. The end user cannot avoid giving out their IP address, so they cannot prevent the website from mapping them using it.
So the two are completely different.
The major difference that you will see is the accuracy. IP addresses only give you a very general idea of where someone might be... Geolocation will tell you exactly where they are. To read more on geolocation go here, and a demo of how accurate it can be is found here.
HTML5 geolocation gives the client (browser) the possibility to provide the location information of the machine. This is potentially orders of magnitude more accurate than IP location. For example, the client could have actual GPS hardware installed, or be able to triangulate the location by GSM or WiFi spots. Location by IP on the other hand is very rough and somewhere between not always accurate and misleading.

Is there any globally unique identifier for a client machine accessible through the web browser?

Is there any way to identify a user's machine through a browser without previously putting cookies in? Probably no access to the MAC address through the web, right? Just thought I'd ask...
There is no such identity element, and even if there were, the nature of the HTTP protocol would not prevent it from being spoofed.
In short: No.
This was partly why Intel tried to have unique processor IDs a few years back, but that didn't ever take off. (Which is good as now we have multi-core machines.)
Just install a cookie on the box. IP address is no good because of NAT. Someday we'll have IPv6 to do this correctly.
You could retrieve an IP address, but it frequently wouldn't mean much (if anything). If you retrieve the IP address the client is using, you'll get a whole lot of them that are 192.168.*. If you retrieve the address your server sees, it won't match that, and you might easily see several (possibly hundreds or even thousands) of machines with the 'same' IP address.
If you put those two together, you'll get something that's unique for the moment, but is subject to change at any time. The client's local IP address may change when their DHCP lease expires and their global IP address may change anytime they reboot their router (unless they have a static IP address, which you mostly don't have any way of knowing).
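As a rough illustration of "putting those two together", here is a small Python sketch. It assumes you have somehow obtained both the client's local address and the address your server sees; the function names are made up for this example, and the resulting key is only as stable as the two addresses are.

```python
import hashlib
import ipaddress


def looks_private(address):
    """True for addresses such as 192.168.x.x that many machines share."""
    return ipaddress.ip_address(address).is_private


def momentary_client_key(local_ip, public_ip):
    """Hash the (local, public) address pair into a short key.

    The key changes as soon as either address does, e.g. after a DHCP
    lease expires or the router reconnects.
    """
    material = f"{local_ip}|{public_ip}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()[:16]


# Two machines behind the same NAT share a public address but get different keys.
print(momentary_client_key("192.168.1.20", "8.8.8.8"))
print(momentary_client_key("192.168.1.21", "8.8.8.8"))
```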

What are the approaches to restrict the access to a group of machines in a web system?

My bank's website has a security feature that lets me register the machines that are allowed to make banking transactions. If someone steals my password, he won't be able to transfer my money from his computer. Only my personal computers are allowed to make transactions from my account. So...
What are the approaches to restrict the access to a group of machines in a web system?
In other words, how do you identify, on the web server, the computer that made the HTTP request?
Why not use a client certificate, kept in the certificate store of an authorized host or on a cryptographic token such as a smartcard that can be plugged into any desired computer?
Update: You should take into account that uniquely identifying a computer means obtaining something at a relatively low level, inaccessible to code embedded in an HTML page (JavaScript, an unsigned applet or ActiveX), unless you install something on the desired computer (or execute something signed, such as an applet or ActiveX control).
One thing that is unique per computer is the MAC address of the Ethernet card, which is almost ubiquitous on every rather modern (and not so modern) computer. However, that may not be secure enough, since many cards allow their MAC address to be changed.
The Pentium III used to have a unique serial number inside the CPU, which would fit your use perfectly. The downside is that no newer CPUs come with such a thing, due to privacy concerns from most users.
You could also combine many elements of the computer such as CPU id (model, speed, etc.), motherboard model, hard disk space, memory installed and so on. I think Windows XP used to gather such type of information to feed a hash to uniquely identify a computer for activation purposes.
Update 2: Hard disks also come with serial numbers that can be retrieved by software. Here is an example of how to get it for activation purposes (your case). However, the check will still pass if somebody moves the hard disk to another computer. Nonetheless, you can combine it with more unique data from the computer (such as the MAC address, as I said before). I would also add a unique key generated for the user and kept in a database of your own (retrievable online from a server), and feed it along with the rest into a hash function that identifies the system.
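A minimal Python sketch of that last idea, assuming something installed on the machine has already collected the attributes. The attribute names and values below are hypothetical, as is the key; HMAC-SHA256 is just one reasonable choice of keyed hash, not necessarily what any real activation scheme uses.

```python
import hashlib
import hmac


def machine_fingerprint(attributes, user_key):
    """Derive a stable identifier from machine attributes and a per-user key."""
    # Sort by attribute name so the result does not depend on dict ordering.
    canonical = "\n".join(f"{name}={attributes[name]}" for name in sorted(attributes))
    return hmac.new(user_key, canonical.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical values; a real collector would read them from the hardware.
attributes = {
    "disk_serial": "WD-EXAMPLE-0001",
    "mac_address": "00:00:5e:00:53:01",
    "cpu_model": "example-cpu",
}
user_key = b"per-user secret fetched from the server"  # assumption: issued server-side
print(machine_fingerprint(attributes, user_key))
```

Mixing in the per-user key means that someone who copies the hardware values still cannot reproduce the identifier without the secret held on the server.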
Did you actually install something?
Over and above what Mark Brittingham mentions about IP addresses, I suppose some kind of hash code that is known only to your bank's computer and your computer(s) would work, provided you installed something. However, if you don't have a very strong password to begin with, what would stop someone from "registering" their computer to steal money from you?
I would guess your bank was doing it by using a trusted applet - my bank used to have a similar approach (honestly I thought it was a bit of a hassle - now they're using a calculator-like code generator instead). The trusted applet has access to your file system, so it can write some sort of identifier to a file on your system and retrieve this later.
A tutorial on using trusted applets.
I'm thinking about using Gears to store locally a hash-something to flag that the computer is registered.
If you are looking for the IP address of the computer that makes an account-creation request, you can easily pull that from the Request. In ASP.NET, you'd use:
string IPAddress = Request.UserHostAddress;
You could then store that with the account record and only accept logins for that account from that IP address. The problem, of course, is that this will not work for a public site at all. Most people come through an ISP that assigns IP addresses dynamically. Even with an always-on internet connection, the ISP will occasionally drop and re-open the connection, resulting in a change of IP address.
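Sketched in Python rather than ASP.NET, the store-and-compare idea looks something like this. The in-memory dictionary stands in for the account table and the function names are invented for the example; it also shows how a changed address locks the legitimate user out.

```python
# Hypothetical stand-in for the account table: account name -> registered IP.
registered_ip = {}


def register(account, remote_addr):
    """Remember the address the account was created from."""
    registered_ip[account] = remote_addr


def login_allowed(account, remote_addr):
    """Accept a login only from the address stored for the account."""
    return registered_ip.get(account) == remote_addr


register("alice", "198.51.100.7")
print(login_allowed("alice", "198.51.100.7"))   # same address as at registration
print(login_allowed("alice", "198.51.100.42"))  # ISP handed out a new address
```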
Anyway, is this what you are looking for?
Update: if you are looking to register a specific computer, have you considered using cookies? The drawback, of course, is that someone may clear their cookies and thus "unregister" their computer. The problem is, the web only has so much access to your computer (not much), so there is no fool-proof way to "register" a computer. Even if you install an ActiveX control, they could uninstall or delete it (although this is more persistent than a cookie). In the end, you'll always have to provide the end user with some method for re-registering. And, if you do that, then you might as well have them log in anyway.

Blocking access for a given geographic location

What is the most reliable way to prevent users in a given geographic location from accessing a web-available application?
I understand that IPs are related to geo positioning, and I also understand that the most naive way is to get the IP address from the HTTP request and take it from there.
It's obvious that naive methods like the one described are extremely easy to bypass, especially using proxies or VPNs.
So the question is: is there a 100% reliable way of determining a web user geo location? If not, what are the available options and what are the pros and cons on each of them?
The short answer is no. There is no way to 100% lock down the people from a specific geographic location because you can't guarantee the location of a user that reliably using an IP address. Even if you could, it can be faked through redirects.
There are ways to make it more difficult for people in a region to access the site, but the more restrictive you get with those approaches the more legitimate users you are likely to lock out. For example, turning off the server would give you 100% assurance that no one from China could hit it, but it would also give you 100% assurance that no one in the US could either.
Nothing in TCP/IP includes location data (other than what you can infer from routing tables or look up in a database), and nothing indicates whether a machine is acting "on behalf of" someone in another location.
So, as you say, proxies, VPNs, SSH port-forwarding, Tor, etc. can completely prevent your web app from knowing the physical location of the human being who's using your site. All you can look up is the IP address of the last hop, which is the TCP/IP connection and HTTP request you actually see.
The above techniques won't work if anyone is trying to hide their location from you by redirecting through relays in other countries.
I found this script to be an easy way to implement this:
https://www.blocked.com/
Country blocking is included in the free version, as is blocking of open proxy servers, anonymity networks, etc.
There is a database somewhere on the tubes named IP 2 Country which can tell where an IP is from.
It is of course not perfect, but it can give you the country the IP comes from.
There is also a method called SSN which is related to IP addresses. I don't know how it works, however, and it seems to be rather complicated. It is commonly used in ads to send you localised spam. For example, if you live in Montreal, Canada, then the ad will display "Find singles from Montreal!". The ISP behind the person does have to support this service.
First, figure out which IP groups are assigned to the region; then you can check the user's IP address on every request. If it falls within the region you want to block, send them to disney.com.
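A minimal Python sketch of that per-request check, assuming you already have the region's address blocks in CIDR notation. The blocks below are reserved documentation prefixes used purely as placeholders, and, as the other answers point out, a proxy or VPN walks straight past this.

```python
import ipaddress

# Placeholder blocks for the region to block (documentation prefixes).
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr) for cidr in ("192.0.2.0/24", "2001:db8::/32")
]


def is_blocked(remote_addr):
    """True if the request's source address falls inside a blocked network."""
    address = ipaddress.ip_address(remote_addr)
    return any(
        # Only compare addresses and networks of the same IP version.
        address.version == network.version and address in network
        for network in BLOCKED_NETWORKS
    )


print(is_blocked("192.0.2.55"))
print(is_blocked("198.51.100.7"))
```

In a web app this would run early in request handling, with a redirect or an error response when is_blocked returns True.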
See if this helps you: IP Address Info
No, there's no fool-proof way of doing this.
There's plenty of related work going on at the IETF in the GeoPriv working group, where protocols are being designed (e.g. HELD) to allow entities to ask the network their own location, and also allow other authorised entities to request that information.
However the VPN issue still causes problems, to the extent that clients with VPN capability need to request their location information before the VPN is established.