Picking the most accurate geocode - google-maps

I'm using the http://maps.google.com/maps/geo? web service to geocode some addresses.
The problem I have is that a fuller address doesn't necessarily give a more accurate geocode.
E.g., passing in "Llantysilio, Denbighshire, UK" gives a far more accurate result than "Llantysilio, Llangollen, Denbighshire, UK".
The Accuracy attribute in the XML doesn't seem very helpful in deciding which address to pick.
How have other people dealt with this issue? Is there a good way to pick the best geocode that works most/all of the time?
*edit
A bit of extra info: when I put in the fuller address, the first line of the address is ignored and the geocoder jumps to a different, but exact, address, which is a central street located in the extra line added to the address. In this example, it picks Castle Street in the middle of Llangollen, seemingly disregarding Llantysilio.
Edit by kdgregory: here are the two API requests that I used (missing API key doesn't seem to be an issue):
http://maps.google.com/maps/geo?q=Llantysilio,+Denbighshire,+UK&sensor=false&output=xml
http://maps.google.com/maps/geo?q=Llantysilio,Llangollen,++Denbighshire,+UK&sensor=false&output=xml

You have to interpret the accuracy, my friend. There are usually two parts to an accuracy: the first is the address matching, and the second, the important part, is the precision level. You can geocode something to the level of the United States, a city, a ZIP code centroid, an interpolated street position, or an actual parcel. The first example has a 4 and the second a 9. For this service, higher is better.
Accuracy Value Description
0 Unknown accuracy.
1 Country level accuracy.
2 Region (state, province, prefecture, etc.) level accuracy.
3 Sub-region (county, municipality, etc.) level accuracy.
4 Town (city, village) level accuracy.
5 Post code (zip code) level accuracy.
6 Street level accuracy.
7 Intersection level accuracy.
8 Address level accuracy.
9 Premise (building name, property name, shopping center, etc.) level accuracy.
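As a rough sketch of how the accuracy value could be used when comparing several geocoding attempts, the snippet below maps the codes from the table to labels and picks the candidate with the highest value. The candidate dictionaries and their keys (`query`, `accuracy`) are hypothetical; they stand in for whatever you parse out of the XML response.

```python
# Accuracy codes as listed in the table above.
ACCURACY_LEVELS = {
    0: "Unknown", 1: "Country", 2: "Region", 3: "Sub-region", 4: "Town",
    5: "Post code", 6: "Street", 7: "Intersection", 8: "Address", 9: "Premise",
}


def describe(accuracy):
    """Return a human-readable label for an accuracy code."""
    return ACCURACY_LEVELS.get(accuracy, "Unknown")


def pick_most_precise(candidates):
    """Return the candidate with the highest accuracy code, or None if empty.

    `candidates` is a list of dicts with the hypothetical keys
    'query' and 'accuracy'.
    """
    if not candidates:
        return None
    return max(candidates, key=lambda c: c["accuracy"])


# The two accuracy values reported for the addresses in the question.
candidates = [
    {"query": "Llantysilio, Denbighshire, UK", "accuracy": 4},
    {"query": "Llantysilio, Llangollen, Denbighshire, UK", "accuracy": 9},
]
best = pick_most_precise(candidates)
print(best["query"], "->", describe(best["accuracy"]))
```

Keep in mind that a high value only says the match is fine-grained, not that it is the place you meant, which is exactly the situation described in the question.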

It's probably good to note that Google does not follow the XAL specs, but rather implements them in a subset.
So, this means that you won't necessarily be able to do:
place.AddressDetails.Country.AdministrativeArea.Locality.LocalityName
place.AddressDetails.Country.AdministrativeArea.AdministrativeAreaName
place.AddressDetails.Country.CountryName
Because a country and sub-locality may be provided while an administrative area is not.
The data that is returned is identified with an accuracy gauge that gives you a relative idea of what you can expect for data. So, you can store objects and chop off parts of the full address using this variable and try to geocode in such a fashion. It's not recommended, though.
Typically, a full address (without the thoroughfare) is a good way of finding the general location. You can use some of the weighted-preferential logic Google provides to refine the address.
E.g. Use the setViewPort or setCountryCode to give your searches a bit more accuracy.
Remember, Geocoding is not a science. You can't expect consistent results.

With gmap you can check what you got in the geocode response's Placemark[0], then either accept that level or try again. By default I chose the order:
place.AddressDetails.Country.AdministrativeArea.Locality.LocalityName
place.AddressDetails.Country.AdministrativeArea.AdministrativeAreaName
place.AddressDetails.Country.CountryName
It could be named more logically, as seen above. Note that Google Maps v3 is somewhat incompatible with v2.

You can try a very ugly hack, which consists of geocoding your full address and all subsets of the words your address contains. You get a lot of geocodes, which you then feed to a reverse geocoding tool to get the addresses related to them.
Once you have plenty of addresses, you compare them with the one you first gave, then you take the most accurate geocode...
That means many requests, and the number of iterations grows with each word you add to your address. It's ugly work, but it can be fun for making some statistics ^^
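A minimal sketch of the subset-generation step, assuming the address is split on commas rather than on individual words (a simplification of what is described above). The function name `address_subsets` and the `min_parts` cut-off are made up for illustration; the actual geocoding and reverse geocoding calls are left out.

```python
from itertools import combinations


def address_subsets(address, min_parts=2):
    """Yield the full address, then every order-preserving subset of its
    comma-separated parts, down to `min_parts` parts."""
    parts = [p.strip() for p in address.split(",") if p.strip()]
    for size in range(len(parts), min_parts - 1, -1):
        for combo in combinations(parts, size):
            yield ", ".join(combo)


variants = list(address_subsets("Llantysilio, Llangollen, Denbighshire, UK"))
for variant in variants:
    print(variant)
```

Each variant would then be geocoded separately, which is why the request count grows quickly with the number of parts.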

In the end I concluded that there are far too many weird blips in address consistency with google's geocoding webservice in the UK, but eventually managed to figure out a way of using postcodes instead, which is far more accurate: how it's done

Related

Google maps sum of unique road distance in given area

I would like to calculate the number of unique kilometers of roadways in my city. More generally, I wish to sum the distance of every road within a bound, for simplicity a rectangle will do.
Is this possible using the Google Maps suite of APIs? If so, how would you go about doing it? If anyone has any resources related to this type of problem, I would be interested in reading them regardless of language (or even solutions with other mapping tools).
Bonus points: A general solution to this problem that can be applied to the pre set "cities" (example) that appear in Google Maps with well defined city limits.
You can use OpenStreetMap to calculate the total road length of a specific country or geographic area. There are multiple solutions available, based on multiple similar questions already asked.
Approach 1 from Total road length in Kilometers for a country at help.openstreetmap.org:
Use the Perl script osm-length-2.pl. There is an example at a mailing list post.
Approach 2 from Actual road length of exported map at help.openstreetmap.org:
Import your data (the planet, or a country or area extract) into a PostGIS database, then use the following queries proposed by Frederik Ramm:
SELECT way AS clip
INTO clipping_polygon
FROM planet_osm_polygon
WHERE boundary='administrative' AND admin_level='8' and name='My City';
SELECT name, highway, ST_INTERSECTION(way, clip) AS way
INTO clipped_roads
FROM planet_osm_line, clipping_polygon
WHERE ST_INTERSECTS(way, clip) AND highway IS NOT NULL;
SELECT highway, SUM(ST_LENGTH(way::geography))
FROM clipped_roads
GROUP BY highway;
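If the road geometries are already available outside the database as lists of latitude/longitude points, the summing step can be approximated in a few lines. This is only a sketch under that assumption: it uses the haversine formula on a sphere, which is not what PostGIS does for geography types (those are measured on a spheroid), so totals will differ slightly. All function names here are invented for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def polyline_km(points):
    """Length of one road given as a list of (lat, lon) tuples."""
    return sum(haversine_km(*a, *b) for a, b in zip(points, points[1:]))


def total_road_km(roads):
    """Sum the lengths of all roads (each a list of (lat, lon) tuples)."""
    return sum(polyline_km(road) for road in roads)
```

Clipping the roads to the city boundary still has to happen beforehand, as in the SQL approach.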

Google Place Search does not return result

I need to fetch location data based on a given text.
For example, if I search for Aldi in Google Maps, it shows me a lot of data with pagination. I need to get the same results using the Google Places API.
I tried two API calls, but they return the following results:
https://maps.googleapis.com/maps/api/place/textsearch/json?query=Aldi&key=MY_KEY
Result
{
"html_attributions" : [],
"results" : [],
"status" : "ZERO_RESULTS"
}
https://maps.googleapis.com/maps/api/place/findplacefromtext/json?input=ALDI&inputtype=textquery&fields=place_id,name,formatted_address,geometry&key=MY_KEY
Result
{
"candidates" : [],
"status" : "ZERO_RESULTS"
}
I need to fetch data based on the given name. Can anyone work out why no results are returned?
There are three types of searches provided by the Places API: Find Place, Nearby Search and Text Search. Each allows you to specify a location with radius to start the search from. The location is specified as a latitude/longitude pair. You received ZERO_RESULTS because you didn't specify a location for your request. If the location parameter is not specified "the API uses IP address biasing by default" according to the documentation. So, there are no Aldi stores within range of the location of your IP address.
Find Place will only return one result, though in my experience it sometimes returns two. Both Nearby Search and Text Search will return up to 60 place results. All three of the Place search requests allow specifying a radius around your location of up to 50 kilometers. If you need to find Aldi places worldwide you'll need to make quite a few requests.
I am weeks into a similar project to find all locations for a list of restaurant chains in the US. I have found that Nearby Search is a better choice for my use case and should be considered always before committing to Text Search for a project. I've tested Aldi searches with both Nearby Search and Text Search and found that they provide the identical set of place_id results. This Nearby Search request will find all Aldi locations within 50 kilometers of New York City:
https://maps.googleapis.com/maps/api/place/nearbysearch/json?location=40.785276%2C-73.9651827&name=Aldi&radius=50000&key=MY_API_KEY
Here is the same as a Text Search:
https://maps.googleapis.com/maps/api/place/textsearch/json?query=Aldi&location=40.785276%2C-73.9651827&radius=50000&key=MY_API_KEY
So why should we care? Text Search according to API documentation "... returns all of the available data fields for the selected place, and you will be billed accordingly." Furthermore "... the Text Search service is subject to a 10-times multiplier. That is, each Text Search request that you make will count as 10 requests against your quota." A Nearby Search request is less expensive and not subject to the 10x multiplier. It returns a subset of the available data fields that you might find sufficient. If you need additional data fields, you can get only what you need from a Places Detail request. Do the math for your application before you select Text Search. It might be dramatically less expensive to implement using Nearby Search followed by Places Detail requests if necessary. In any case, you don't want to be shocked when you hit quota limits unexpectedly because of the 10x multiplier OR the billed transaction costs are more than you expect!
I have found additional hurdles that should be considerations for projects attempting to find all locations for a business in a large area:
The Places API will prefer places within your radius but will include places outside your radius if it determines they are relevant and within the 60 place limit. I have had places returned more than 450 kilometers from my requested search position.
Results are going to be returned for places with names that are NOT what you searched for. In my search for the restaurant Benihana in Seattle a Nearby Search request only returns a restaurant with the name Hamansu. Upon investigation, this is because there is not a Benihana in Seattle, however, Hamansu is similar to Benihana in that it serves Japanese dishes grilled tableside. The API documentation states your search term will be "matched against all content that Google has indexed for this place, including but not limited to name, type, and address, as well as customer reviews and other third-party content."
Results are returned 20 at a time. If there are more results, a page_token is provided to make a request to get the next page of up to 20 results. Each request is chargeable. You will be billed for the 3 requests required to get 60 results. I'm not saying this is bad, just be aware of the expense and quota usage you are incurring with this API.
If there are more than 60 results for your radius then you haven't found all the possible locations within it. And, you can't determine with certainty what the effective radius covered was for the 60 results. You need to search with a small enough radius to return < 60 results for each request. A worldwide search is going to require a large quota and $ budget to pursue.
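To avoid hand-encoding request URLs while working around these hurdles, something like the following could be used. It is a sketch only: it builds the same Nearby Search URL shown earlier in this answer, and the helper names (`nearby_search_url`, `may_be_truncated`) are invented for illustration. Sending the request and following the paging token are left out.

```python
from urllib.parse import urlencode

PLACES_BASE = "https://maps.googleapis.com/maps/api/place"


def nearby_search_url(lat, lng, name, radius_m, key):
    """Build a Nearby Search URL; urlencode handles the percent-encoding."""
    params = {
        "location": f"{lat},{lng}",
        "name": name,
        "radius": radius_m,
        "key": key,
    }
    return f"{PLACES_BASE}/nearbysearch/json?{urlencode(params)}"


def may_be_truncated(result_count, limit=60):
    """True when a search hit the 60-place cap, meaning the radius should
    be reduced and the area searched again in smaller pieces."""
    return result_count >= limit


url = nearby_search_url(40.785276, -73.9651827, "Aldi", 50000, "MY_API_KEY")
print(url)
```

When `may_be_truncated` returns true for a search, the area would need to be split into smaller circles and queried again.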
You should be aware that Places API search is not designed to provide results worldwide. In your examples you specify only the text value 'Aldi'. However, in order to get results you should also specify where you are searching.
For example, if I want to bias results towards the Barcelona area in Spain, I have to add location and radius to my request:
https://maps.googleapis.com/maps/api/place/textsearch/json?query=Aldi&location=41.3850639%2C2.1734035&radius=10000&key=MY_API_KEY
This request will return Aldi supermarkets in Barcelona area as shown in my screenshot
The same thing for Find place, you should specify location bias
https://maps.googleapis.com/maps/api/place/findplacefromtext/json?input=Aldi&inputtype=textquery&fields=formatted_address,geometry,name,place_id&locationbias=circle%3A1000%4041.3850639%2C2.1734035&key=MY_API_KEY
Also note that Find place returns only one result.
I hope this addresses your doubt.
@Art's answer, which has more upvotes, is only partially correct. The answer suggests that the Find Place API (e.g. maps/api/place/findplacefromtext) will usually return 1 result, at most 2. I tend to agree with him. Even if your search hits multiple targets, only one would be returned by the Find Place API. Consequently, he recommends using Nearby Search or Text Search, both of which yield at most 60 results.
However, these two searches require some form of location parameter, otherwise they will likely return 0 results, defaulting to using your IP address, as he indicates. But he recommends using a location accompanied with a radius parameter. The problem with this is the radius parameter has a maximum limit. So it will not target all types of things you want if you are searching over the stretch of an entire country, such as the United States.
The truth is you do not need to use the location and radius. There is another option called region. And you can use region to search the entire distance of a country.
What @Art suggested:
https://maps.googleapis.com/maps/api/place/nearbysearch/json?location=25.7392%2C-80.3103&name=Law%20Offices%20of%20Alex&radius=50000&key=KEY
https://maps.googleapis.com/maps/api/place/textsearch/json?query=Law%20Offices%20of%20Alex&location=25.7392%2C-80.3103&radius=50000&key=KEY
A more encompassing alternative:
https://maps.googleapis.com/maps/api/place/textsearch/json?query=Law%20Offices%20of%20Alex&region=us&key=KEY
You need to specify the location of your search.

Google maps geocoder state bias

I have a google maps application where users can search by Country, State, City or a street address. Users may be anywhere in the world and they may be searching for anywhere else in the world, not just within their own country.
I need the geocoder to have a bias such that if a state is entered (without the country) it geocodes to the state and not to a city with the same name. Our application prioritises countries first, then states, then cities etc... however the geocoder is not doing the same.
Eg. I want to search for "Victoria" which is a state within Australia.
http://maps.googleapis.com/maps/api/geocode/json?address=victoria shows Victoria, BC, Canada.
http://maps.googleapis.com/maps/api/geocode/json?address=victoria&region=au shows the state of Victoria in Australia however I cannot include the region as my users may be anywhere in the world so I have no way of knowing which region they are searching for.
I have looked at "administrative levels" and also "types" but I cannot find a solution which suits my needs of simply prioritising in the order country > state > city.
I ideally want something like this:
maps.googleapis.com/maps/api/geocode/json?address=victoria&components=administrative_area:WILDCARD
OR
maps.googleapis.com/maps/api/geocode/json?address=victoria&types=administrative_area_level_1
Of course neither of these solutions work but I hope they illustrate what I am trying to achieve.
Any suggestions?
Thanks,
Nicole
You can do a query without specifying the address, use
...?components=administrative_area:victoria
and then iterate over the results.address_components to pick out ones where the types include administrative_area_level_1
Update: I noticed that, depending on the search term provided to administrative_area, Google uses some kind of heuristic to determine the certainty of the results. If there's a clear winner, then only 1 result is shown. If the matching is similar for a group of locations, then you will get several. So when there are several results, you can pick towards a higher or lower administrative_area_level to suit your needs.
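The iteration over results could look roughly like this. It is a sketch that assumes the JSON response has already been parsed into a dict and that each result carries a `types` list; the priority list mirrors the country > state > city order from the question, and the sample data is abbreviated and hypothetical.

```python
# Priority order from the question: country first, then state, then city.
TYPE_PRIORITY = ["country", "administrative_area_level_1", "locality"]


def pick_by_priority(results):
    """Return the first result matching the highest-priority type.

    `results` is the 'results' list of a parsed geocoding response.
    Falls back to the first result, or None when the list is empty.
    """
    for wanted in TYPE_PRIORITY:
        for result in results:
            if wanted in result.get("types", []):
                return result
    return results[0] if results else None


# Abbreviated, hypothetical results for a "victoria" query.
sample_results = [
    {"formatted_address": "Victoria, BC, Canada",
     "types": ["locality", "political"]},
    {"formatted_address": "Victoria, Australia",
     "types": ["administrative_area_level_1", "political"]},
]
chosen = pick_by_priority(sample_results)
print(chosen["formatted_address"])
```

The fallback to the first result keeps the geocoder's own ranking whenever none of the preferred types is present.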

Retrieving a fully qualified street address from ZIP / postal code

I have a form in which my users need to enter the following location data:
Full address line (street address, apartment, suite, unit, building, floor)
House number
City
State / province / region
ZIP / Postal code
Country
To simplify the completion of this form, I would like to automatically fill in the fully qualified address (address line, city, state, province etc.) by letting the user enter only their country, zip code and house number.
Is it correct that these 3 items are sufficient to lookup the address in the United States? Or is less or more information necessary? And is the answer to this question different for every country? Moreover, is there a service, API, or library that can be utilized for this purpose (e.g. Google Maps or OpenStreetMap)?
Great questions!
Is it correct that these 3 items are sufficient to look up the address in the United States?
No. Unfortunately these three will get you down to ~hundreds of possible addresses in the US.
Is the answer to this question different for every country?
Yes! Postal systems vary greatly from country to country, and your users in them will have different expectations about what they need to supply. Brits don't expect to have to enter a full address, for example.
With the UK, Canada and Australia you can usually get to a single address from the house number and postcode. BUT, you can not guarantee this. There may be sub-premise information or business information which requires a bit of interaction with the user to check you have right address.
Some countries, such as France, do not have complete premise number coverage. With these you can take the premise number & postcode but depending upon the town you have to alter your behavior to either trust and accept the input or prompt them for a correction.
Another important consideration when planning your workflow is the need to allow for people who perhaps do not know their postcode / zip. It does not happen often, but sometimes people have just moved, or occasionally a property's postcode/zip changes, so it is important to be flexible in the information you need.
Is there a service, API, or library that can be utilized for this purpose?
Yes - there are several solutions around that offer the ability to capture global addresses. Experian Data Quality (my company) offer a hosted or on premise solution that allows for this.
Try it out here - on the right hand side under the "Do you want to know more?" you can switch countries, the prompt updates and the interaction occurs if needed.
I can only answer about US addresses (I work at SmartyStreets), but the answer is no, that won't work.
Kudos for your desires to improve the user experience. Unfortunately, I would not recommend trying this, and here's why:
A US ZIP code, in its entirety, is actually 11 digits long (12 with the check digit):
The first three digits are the SCF (Sectional Center Facility), kind of like a region code
The first five digits are your typical 5-digit ZIP code that specifies a set of carrier routes
The next 4 digits are more precise, often narrowing down an address to block-level.
The next 2 digits are seldom used except in barcodes, but they indicate the delivery point. In theory, this would specify a particular house, apartment, or mailbox, but in reality, sometimes the 11-digit code is ambiguous (common in large complexes, street blocks, or PO facilities). It's typical for the delivery point to correlate to the house or apartment number, but not always.
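The breakdown above can be sketched as a small helper that splits an 11-digit code into the pieces just described. The function name and the dictionary keys are invented for illustration, and the sample code in the usage line is made up, not a real delivery point.

```python
def split_zip11(code):
    """Split an 11-digit ZIP code into its components.

    Non-digit characters such as hyphens are ignored.
    """
    digits = "".join(ch for ch in code if ch.isdigit())
    if len(digits) != 11:
        raise ValueError("expected exactly 11 digits")
    return {
        "scf": digits[:3],               # Sectional Center Facility
        "zip5": digits[:5],              # the familiar 5-digit ZIP code
        "plus4": digits[5:9],            # the +4 add-on
        "delivery_point": digits[9:11],  # seldom seen outside barcodes
    }


parts = split_zip11("12345-6789-01")  # made-up example value
print(parts)
```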
So in your situation:
Knowing the country narrows down the possibilities to just 350,000,000+ addresses
Knowing the 5-digit ZIP code narrows it down to somewhere around 10,000+ addresses (important note: not everyone knows the 5-digit ZIP code, and they change. What's more, is that they may not be sure whether to enter their PO box ZIP code or their house ZIP code. And what if their house doesn't receive mail? Or what if they're in the military and their 5-digit ZIP is in flux?)
Knowing the house number may narrow down the address candidates to anywhere from 1-1000. It depends how "big" the ZIP code is. (But ZIP codes are not polygons).
So no, it is not sufficient to know these three parts of the address. The country is practically worthless at that point, and the ZIP code is locality/city-specific at best. The house number might appear dozens, if not hundreds, of times in a ZIP code. (I grew up in the boonies where our house number was unique, but that's rare.)
And yes, the answer to this question varies country to country, but this reasoning holds true for most developed countries. Less developed countries don't have such organization to their postal system.
Is there a service that can do this? Not if you don't want your users to scroll through dozens or hundreds of results. If they have to look through more than just a couple, you're better off just asking them to type their full address.
I answered a very similar question just the other day. You might find it useful.
So now that I've rained doomsday on your idea, how about an alternative? Of course I'm partial to SmartyStreets' autocomplete, which suggests addresses, geo-located close to the user, as they're typing. I should mention that it's free. It doesn't actually verify the address until the user is finished or has chosen one of the suggestions, but it does reduce keystrokes.
Further on this UX vein, I'd recommend putting country as the first field of your address form. This way, you can alter the form's format based on the country they choose. If you use a service like LiveAddress, you can have the user type their address in a format comfortable to them in a single field, rather than across multiple text boxes in your arbitrary order, since LiveAddress can parse their input.
You could easily achieve this by using the Google Maps reverse geocoding API. See its documentation for details.
I don't know of any country where there is a one-to-one mapping between a post code and a street address. Except Singapore. Postal Codes in SG
In that particular case you can use the post code to fill in the remaining fields, in any other case you can derive the city name and the street address, but not likely the House number.
Example 1: (derive full street address from post code)
https://geocode.xyz/339696?geoit=xml
<geodata>
<latt>1.32035</latt>
<longt>103.87430</longt>
<elevation/>
<standard>
<stnumber>88</stnumber>
<addresst>88 GEYLANG BAHRU</addresst>
<postal>339696</postal>
<city>Singapore</city>
<prov>SG</prov>
<countryname>Singapore</countryname>
<confidence>0.5</confidence>
</standard>
</geodata>
Example 2: (Get most common street address, and other variations of city name)
https://geocode.xyz/27777?region=DE&geoit=xml
<geodata>
<latt>53.06060</latt>
<longt>8.58388</longt>
<elevation/>
<standard>
<stnumber>20</stnumber>
<addresst>20 Bokenbusch</addresst>
<postal>27777</postal>
<city>Ganderlesee</city>
<prov>DE</prov>
<countryname>Germany</countryname>
<confidence>0.5</confidence>
</standard>
<alt>
<loc>
<city>Ganderkesee</city>
<latt>53.06868</latt>
<longt>8.57437</longt>
<cc>951</cc>
</loc>
<loc>
<city>Bremen</city>
<latt>53.07675</latt>
<longt>8.57559</longt>
<cc>172</cc>
</loc>
<loc>
<city>Schierbrok</city>
<latt>53.08639</latt>
<longt>8.58037</longt>
<cc>166</cc>
</loc>
</alt>
</geodata>
The number in "cc" indicates how many street addresses in that city share the given post code.
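A short sketch of how the "cc" counts might be used to choose the most likely city for a post code. It parses a trimmed-down copy of the Example 2 response with the standard library; the function name `most_common_city` is made up, and fetching the XML from the service is left out.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the Example 2 response, kept only for illustration.
SAMPLE = """<geodata>
  <standard><postal>27777</postal><city>Ganderlesee</city></standard>
  <alt>
    <loc><city>Ganderkesee</city><cc>951</cc></loc>
    <loc><city>Bremen</city><cc>172</cc></loc>
    <loc><city>Schierbrok</city><cc>166</cc></loc>
  </alt>
</geodata>"""


def most_common_city(xml_text):
    """Return the alt city with the highest 'cc' count, or None."""
    root = ET.fromstring(xml_text)
    locs = root.findall("./alt/loc")
    if not locs:
        return None
    # A missing or empty <cc> is treated as zero.
    best = max(locs, key=lambda loc: int(loc.findtext("cc") or 0))
    return best.findtext("city")


city = most_common_city(SAMPLE)
print(city)
```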
Good luck!

How can I filter out fictional locations (ex. "under a rock", "hiding") from Google Maps API geocode results?

Google Maps API does a great job trying to locate a match for nearly every query. But if I'm only interested in real locations, how can I filter out Google's guesses?
For example, according to Google, "under a rock" is located at "The Rock, Shifnal, Shropshire TF11, UK". But a person who answers the question, "Where are you?" with "Under a rock" does not mean to indicate that they are in Shropshire, UK. Instead they just don't want to tell you — well, either that or they are in real trouble, thankfully with web access, stuck under some rock.
I have several million user generated location strings that I'm attempting to find coordinates for. If someone writes "under a rock" I'd rather just leave the coordinates null instead of putting an obviously wrong point in Shropshire, UK.
Here are some other examples:
under a rock => Shropshire, UK
planet earth => Cheshire, UK
nowhere => Scituate, RI, USA
travelling => Madrid, Spain
hiding => Anderson, CA, USA
global => Midland, TX, USA
on the web => North Part, ON, Canada
internet => Frisco, TX, USA
worldwide => Mie Prefecture, Japan
Ultimately I'm after a solid way to return coordinates from a string but return false if the location is like the above.
I need to build a function that returns the following:
Twin Cities => Return the colloquial coordinates of Minneapolis-St. Paul
right behind you => false [Google gets this one "right" -- at least for my purposes]
under a rock => false
nowhere => false
Canada => Return coordinates
Mission District San Francisco => Return coordinates
Chicago => Return coordinates
a galaxy far far away => false [Google also gets this "right": zero results]
What do you recommend?
Here's a comma-delimited array for you to play at home:
'twin cities','right behind you','under a rock','nowhere','canada','mission district san francisco','chicago','a galaxy far far away','london, england','1600 pennsylvania ave, washington, d.c.','california','41.87194,12.56738','global','worldwide','on the internet','mars'
And here's the url format:
'http://maps.googleapis.com/maps/api/geocode/json?address=' + query + '&sensor=false'
ex: http://maps.googleapis.com/maps/api/geocode/json?address=twin+cities&sensor=false
It seems most of your incorrect results have a "partial_match" attribute set to "true".
e.g.
Twin Cities, no partial match:
http://maps.googleapis.com/maps/api/geocode/json?address=Twin%20Cities&sensor=false
under a rock, 10+ results, all with partial match:
http://maps.googleapis.com/maps/api/geocode/json?address=under%20a%20rock&sensor=false
Though the original purpose of this attribute is not to tell whether a locality is correct or not, it's still pretty accurate on the dataset you provided.
From Google Maps API documentation:
partial_match indicates that the geocoder did not return an exact match for the original request, though it was able to match part of the requested address. You may wish to examine the original request for misspellings and/or an incomplete address.
Partial matches most often occur for street addresses that do not exist within the locality you pass in the request. Partial matches may also be returned when a request matches two or more locations in the same locality. For example, "21 Henr St, Bristol, UK" will return a partial match for both Henry Street and Henrietta Street. Note that if a request includes a misspelled address component, the geocoding service may suggest an alternate address. Suggestions triggered in this way will not be marked as a partial match.
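A minimal sketch of the filter, assuming the JSON response has already been parsed into a dict. It returns None instead of false for rejected inputs, and the two sample responses are abbreviated and hypothetical; only the `status`, `results`, `partial_match` and `geometry.location` fields are relied on.

```python
def confident_location(response):
    """Return the first result's location dict, or None when there are no
    results or the best result is only a partial match."""
    results = response.get("results", [])
    if response.get("status") != "OK" or not results:
        return None
    first = results[0]
    if first.get("partial_match"):
        return None
    return first["geometry"]["location"]


# Abbreviated, hypothetical responses for illustration.
exact = {
    "status": "OK",
    "results": [{"geometry": {"location": {"lat": 44.96, "lng": -93.2}}}],
}
partial = {
    "status": "OK",
    "results": [{"partial_match": True,
                 "geometry": {"location": {"lat": 52.6, "lng": -2.3}}}],
}
print(confident_location(exact))
print(confident_location(partial))
```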
This might not be the direct answer to your question.
If you are currently going through thousands of user inputs saved in a database and trying to filter out the invalid ones, I think it is too late and not feasible. The output can only be as good as the input.
The better way is to make the input as valid as possible, since end users don't always know what they want.
I would suggest having users enter their address through autocomplete, so that you always have a valid address:
The user enters text and selects one of the suggestions.
A marker and info window are shown.
When the user confirms the input, you save the info window text as the user input, not the text from the input field.
Done this way, you don't need to validate or filter user input.
I know there are Bayes Classifier implementations in javascript. Never tried them though, I currently use a Ruby implementation which works correctly.
You could have two classifications (Real and Unreal), training each of them with as many samples as you want (30, 50 samples each?). The better trained your classifier is, the more accurate it will be.
Then you'd have to test the location before calling GoogleMaps API to filter Unreal locations.
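To give an idea of what such a classifier involves, here is a tiny word-based naive Bayes with add-one smoothing in plain Python. It is a toy sketch, not the Ruby or JavaScript implementations mentioned above, and the handful of training samples (taken from the question's examples plus one invented phrase) is far too small for real use.

```python
import math
from collections import Counter


class NaiveBayes:
    """Minimal multinomial naive Bayes over whitespace-separated words."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of words
        self.doc_counts = Counter()  # label -> number of training samples

    def train(self, label, text):
        words = text.lower().split()
        self.word_counts.setdefault(label, Counter()).update(words)
        self.doc_counts[label] += 1

    def classify(self, text):
        vocab = {w for c in self.word_counts.values() for w in c}
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus log likelihood with add-one smoothing.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(counts.values())
            for word in text.lower().split():
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


clf = NaiveBayes()
for sample in ["chicago", "london england", "mission district san francisco", "canada"]:
    clf.train("real", sample)
for sample in ["under a rock", "nowhere", "on the web", "hiding under my bed"]:
    clf.train("unreal", sample)

print(clf.classify("under a rock"))
print(clf.classify("london"))
```

A classifier like this only knows the words it was trained on, so place names it has never seen are scored almost blindly.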
To truly succeed here you are going to have to build a database driven system that facilitates both positive and negative lookups with AI that gets smarter over time, just like Google did. I don't believe that there is a single algorithm that will filter out results based on cosmetics alone.
I looked around and found a site that contains every city in the world. Unfortunately, it doesn't give it as a single list so you'd have to spend a bit of time harvesting data. the site is http://www.fallingrain.com/world/index.html.
They seem to use individual directories to organize countries, states, and cities, broken down further by alphabet. It is, however, the only comprehensive source that I could find.
If you manage to get all of these locations into a database then you will have the beginnings of a positive lookup system for your queries. Also, you'll need to start building separate lists of bi, tri, and quad-city areas as well as popular destinations and land marks.
You should also store a negative lookup table for all known mismatches. People have a tendency to generate similar false data and typos across large populations. So the most popular "nowhere" and "planet earth" answers will be repeated over and over again, in every language you can think of.
One of the benefits of this strategy is that you can run relational queries against your data to get matches in bulk as well as one at a time. Since some false negatives will occur at the beginning, your main decision is what to do with unmatched items. You may want to adopt a strategy that lets you both reject non-matches and substitute partial matches with the nearest actual match.
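In its simplest form, the positive and negative lookups could be sketched with in-memory tables instead of a database. Everything here is illustrative: the table contents are a few entries from the question, the Chicago coordinates are approximate, and the three-way return value (coordinates, False, or None for "unknown") is just one possible convention.

```python
def normalize(text):
    """Lowercase and collapse whitespace so lookups are forgiving."""
    return " ".join(text.lower().split())


# Positive table: known places (approximate, illustrative coordinates).
KNOWN_PLACES = {"chicago": (41.8781, -87.6298)}

# Negative table: known junk answers that should never be geocoded.
NEGATIVE = {"under a rock", "nowhere", "planet earth", "hiding",
            "global", "worldwide"}


def lookup(query):
    key = normalize(query)
    if key in NEGATIVE:
        return False              # known mismatch: leave coordinates empty
    if key in KNOWN_PLACES:
        return KNOWN_PLACES[key]  # known place: use stored coordinates
    return None                   # unknown: geocode it or review manually


print(lookup("  Under a  Rock "))
print(lookup("Chicago"))
print(lookup("mars"))
```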
Anyhow, I hope this helps. It is a bit of effort but if it's important it will be worth it. Who knows, you may end up with a database that's actually worth something. Maybe even a Google maps gateway service for companies/developers who need the same functionality. (:
Take care.