How do I match street addresses to UK postcodes? [closed] - street-address

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 8 years ago.
U.K. sites that require addresses often ask the user to provide a postcode. The site then offers the user a choice between the different addresses that match that postcode.
How can I match a postcode to a street address?

Quick comparison:
Ideal Postcodes - https://ideal-postcodes.co.uk has a package on npm and is reasonably priced. It gives the address as individual parts as well as a single string, which is useful for filling out a form field by field (and letting people customise the result).
getAddress - http://getaddress.io - also reasonably priced. It gives results only as a single string rather than as broken-down components; whether this is better or worse depends on the situation.
PostCode Anywhere - http://www.postcodeanywhere.co.uk/ - used by lots of large household names but much more expensive than the other solutions. You can, however, make up to 50 JSON API calls a day directly; see How do I look up a UK address based on house number and postcode?
Get the PAF file yourself http://poweredbypaf.com - this is incredibly expensive.
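Whichever of these routes is used, the postcode a user types usually needs tidying before it is sent anywhere. A minimal sketch, assuming a simplified regular expression for the usual UK postcode shape (outward code, then a digit and two letters). It ignores special cases such as GIR 0AA and BFPO numbers and cannot tell whether a postcode actually exists; normalize_postcode is a hypothetical helper, not part of any service above.

```python
import re
from typing import Optional

# Simplified shape of a UK postcode: outward code (1-2 area letters, a district
# digit, an optional extra letter or digit), then the inward code (digit + 2 letters).
# Assumption: special cases such as GIR 0AA and BFPO numbers are not handled.
POSTCODE_RE = re.compile(r"^([A-Z]{1,2}[0-9][A-Z0-9]?)([0-9][A-Z]{2})$")


def normalize_postcode(raw: str) -> Optional[str]:
    """Return the postcode as 'OUTWARD INWARD' in upper case, or None if the
    input does not have the shape of a UK postcode (existence is not checked)."""
    compact = re.sub(r"\s+", "", raw).upper()
    match = POSTCODE_RE.match(compact)
    if match is None:
        return None
    return f"{match.group(1)} {match.group(2)}"


print(normalize_postcode(" w12 7rj"))  # W12 7RJ
print(normalize_postcode("sw1a1aa"))   # SW1A 1AA
print(normalize_postcode("12345"))     # None
```

Because the inward code is always three characters, the space can be reinserted reliably even when the user leaves it out.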

I believe you need the Royal Mail Postcode Address File. From that link:
PAF is the only complete source of all known UK Postcodes.
Services do exist to handle requests for this information, so it may be cheaper to use such a service for small numbers of requests (obviously you will have problems if and when such services are unavailable for whatever reason).

Until now, the only official way to do it has been to buy the Postcode Address File. However, there was a news item recently saying the data may be free in 2010, so it depends on whether you can wait!

Contrary to the answers here, you do NOT need the very expensive PAF from the Post Office. There are a number of commercial services (presumably powered by the PAF) that return the streets and street numbers for a specified postcode. They generally charge on a per-request basis. I don't have any experience with a particular vendor, but here is an example: capscan

To do this you need access to the Postcode Address File, which is licensed for use on an annual basis from the Post Office, usually via a third party.
You have a choice depending on your needs of buying a package to use locally or of using web services.
The Royal Mail's page is here: http://www.royalmail.com/portal/rm/jump2?mediaId=400085&catId=400084 and on that page are links to service providers.

Postcode Anywhere is one of the providers out there (one of my clients uses them with no complaints). Licensing is flexible:
Postcode Anywhere UK Address Finder

You can use a geocoding service, such as the one provided by Google.
Physical Address to GeoLocation UK

If you want to get the approximate address for a UK postcode (i.e. street level) there is a way you can do it legally and for free without using PAF data.
Geocode the postcode - this can now be done legally and for free, since OS have released the Code-Point Open database as open data
Do a reverse lookup on the WGS84 lat/lng pair using the Google Maps HTTP Geocoding API to get the street address
As an example of this take a look at this XML Web Service:
http://geo.jamiethompson.co.uk/W127RJ.xml
explained at:
http://jamiethompson.co.uk/projects/2010/04/30/an-open-free-uk-postcode-geocoding-web-service/
which returns:
<result>
  <status>200</status>
  <message/>
  <postcode>W12 7RJ</postcode>
  <geo>
    <os_x>523180</os_x>
    <os_y>180541</os_y>
    <lat>51.510379</lat>
    <lng>-0.226376</lng>
    <landranger>TQ231805</landranger>
    <accuracy>1</accuracy>
    <key>UO1NV-4UO8</key>
  </geo>
  <address>
    <street>White City Close</street>
    <locality>Hammersmith</locality>
    <district>Hammersmith</district>
    <county>Greater London</county>
  </address>
</result>
It's not as handy as the commercial offerings which give you a full list of actual addresses for any given postcode, but it lets you do a "What's your postcode? What's your house number?" type system.
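The response above can be consumed with Python's standard xml.etree.ElementTree. A minimal sketch, assuming the XML has already been fetched (no HTTP call is shown, and the service's current availability is not assumed) and that the user supplies the house number or name separately, as in the "What's your postcode? What's your house number?" flow. The function name street_level_address is hypothetical.

```python
import xml.etree.ElementTree as ET

# The response shown above, trimmed to the fields used here.
RESPONSE = """<result>
<status>200</status>
<message/>
<postcode>W12 7RJ</postcode>
<geo>
<lat>51.510379</lat>
<lng>-0.226376</lng>
</geo>
<address>
<street>White City Close</street>
<locality>Hammersmith</locality>
<county>Greater London</county>
</address>
</result>"""


def street_level_address(xml_text: str, house_number: str) -> dict:
    """Combine a street-level lookup result with the house number or name the
    user typed. The service only knows the street, so the first line is only
    as accurate as the user's input."""
    root = ET.fromstring(xml_text)
    if root.findtext("status") != "200":
        raise ValueError("lookup failed: " + (root.findtext("message") or ""))
    return {
        "line1": f"{house_number.strip()} {root.findtext('address/street')}",
        "locality": root.findtext("address/locality"),
        "county": root.findtext("address/county"),
        "postcode": root.findtext("postcode"),
        "lat": float(root.findtext("geo/lat")),
        "lng": float(root.findtext("geo/lng")),
    }


print(street_level_address(RESPONSE, "12")["line1"])  # 12 White City Close
```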

You can either use the PAF or one of the commercial web services (there are a few) which licence the PAF. I think you usually buy "credits" or pay a flat rate for unlimited access.

To add to the answers already coming through:
With the paid-for products, you typically pay for either:
premise level - more detailed and can offer the user a list of premises at that postcode location
street level - simply matches the street at that postcode location - you or your user fills in "the first line of the address" usually house name or number
I believe this differentiation is built into the licensing by the Royal Mail at source. Premise level is substantially more expensive.

If you're adding it into a website shopping cart or similar system, you can buy access to the data on a per-click basis. If you're using it for an internal system such as CRM, you need to buy a per-user license.
Either way, you can use the Data8 Postcode Lookup API via web services.

You don't need to purchase the Royal Mail Postcode Address File (PAF). There are lots of APIs available.
getaddress.io is the only one I've found that has a free plan:
https://getaddress.io

Related

How to use digital certificate to check the author of a program?

We develop a Win32 program (= host) which allows 3rd parties to write plug-ins. As some plug-ins contain valuable code (for example, a high-quality video scaler), the 3rd parties want to limit their plug-ins to work only with our host program.
Our idea is to use Microsoft Authenticode technology to sign the host. Then the 3rd parties are asked to implement the following algorithm to check the host. (The 3rd parties are expected to obfuscate the algorithm sufficiently.)
(1) Use the WinVerifyTrust() API to verify that the host's certificate is valid (not revoked, not tampered with, etc.).
(2) Verify that the certificate's subject is our company.
The question is about step (2). The 3rd parties cannot simply check thumb print or serial number because the digital certificate of the host will be renewed after the certificate expiration date.
My idea is to check parts of the subject's distinguished name, specifically "country (C)" and "common name (CN)", assuming that there is no company name conflict in the United States. We shouldn't check other attributes such as state and city because our company might move; in fact, we moved from one city to another just a year ago.
Question: Is this a good way to accomplish the goal?
While the scheme is workable, it's relatively easy to circumvent the protection by simply patching plugins so that they ignore the signature or skip signature verification altogether.
More importantly, if you plan to have multiple plugins/vendors, you would have a hard time ensuring that every vendor obfuscates the validation code correctly.
Also, I'd say that it can be against the plugin vendors' interest to limit their plugins to your application only: if they want a bigger market, they might want the same plugin to run on a wider range of hosts.
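For step (2) in the question above, the C/CN comparison can be sketched once the subject has been turned into a string. A minimal sketch, assuming the Windows side (WinVerifyTrust and extracting the subject name, e.g. with CertNameToStr) has already produced an X.500-style string in which attributes are comma-separated, values containing commas are wrapped in double quotes, and a doubled quote inside a quoted value stands for one literal quote. parse_dn, subject_matches and the company name are hypothetical. As the answer points out, a patched plugin can simply skip this check.

```python
def parse_dn(dn: str) -> dict[str, list[str]]:
    """Split a DN string such as 'CN=Foo, O=Bar, C=US' into attribute -> values.
    Assumption: multi-valued RDNs joined with '+' are not split."""
    parts, current, in_quotes, i = [], [], False, 0
    while i < len(dn):
        ch = dn[i]
        if ch == '"':
            if in_quotes and i + 1 < len(dn) and dn[i + 1] == '"':
                current.append('"')  # doubled quote = one literal quote
                i += 2
                continue
            in_quotes = not in_quotes  # quote characters themselves are dropped
        elif ch == "," and not in_quotes:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    parts.append("".join(current))

    attrs: dict[str, list[str]] = {}
    for part in parts:
        key, sep, value = part.strip().partition("=")
        if sep:
            attrs.setdefault(key.strip().upper(), []).append(value.strip())
    return attrs


def subject_matches(dn: str, expected_cn: str, expected_country: str) -> bool:
    """Require exactly one CN and one C, both equal to the expected values;
    state (S) and locality (L) are deliberately ignored."""
    attrs = parse_dn(dn)
    return attrs.get("CN") == [expected_cn] and attrs.get("C") == [expected_country]


subject = ('CN="Example Software, Inc.", O="Example Software, Inc.", '
           'L=Springfield, S=Illinois, C=US')
print(subject_matches(subject, "Example Software, Inc.", "US"))  # True
```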

How are Google maps formed? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 12 years ago.
How are Google maps formed? Are they based on satellite image solely? How are locations and places named? I sometimes see very minute details; gathering them doesn't look like an easy task.
Google Maps data is based on many sources, depending on both the type of data, and the area you're looking at.
Vector (Road, Park, etc. data)
Vector data, like roads, points of interest, etc. are bought from many different companies. Tele Atlas is one of their worldwide data providers, and is a key component, especially outside densely populated urban areas.
In some areas, this data is combined with other vector data providers, like Sanborn, who do 3D building outlines, as well as combining with more local sources of data, such as organizations which collect POI data (restaurants, etc.).
In countries other than the US, data is often purchased from a National Mapping Agency; a government agency tasked with collecting and distributing map data.
In some cases, data -- especially for populating searches -- is gathered via the web, and geocoded (looked up by address) to be placed on the map.
This data is commercial; the collection aspects are expensive, and Google pays a significant amount of money to license the data for this usage. (The actual amount is not public knowledge.)
Imagery
Imagery data for Google is similarly collected from many sources. Imagery at resolutions up to 0.5 m/px (letting you see cars clearly, but not people) is typically collected via satellites flown by Digital Globe or Geoeye. (Geoeye actually flies a satellite, "Geoeye1", which was funded in large part by Google.)
Google also adds in many public data sources, from government organizations and programs (USGS, NAIP), state and local organizations, and more. In addition, for high-profile events Google will sometimes pay a company specifically to do an overflight; this was the case for the Haiti earthquake, and they commonly do it during the Burning Man festival.
Street View
Street View data is collected by vehicles that Google pays to drive around with special equipment (LIDAR detectors and 8-way video cameras) to collect the data.
Overall, in each case, you can look at the various sources for data -- at least those that require crediting, which is not all of them -- in the lower right hand corner of any Google Map.
They buy their data from other companies to form the maps. I believe they purchased the majority of it from Tele Atlas. http://code.google.com/apis/maps/signup.html
Here is a lot of information on the history of it:
http://en.wikipedia.org/wiki/Google_Maps#Map_projection

Best Practice / Standard for storing an Address in a SQL Database

I am wondering if there is some sort of "standard" for storing US addresses in a database? It seems this is a common task, and there should be some sort of a standard.
What I am looking for is a specific schema of how the database tables should work and interact, already in third normal form, including data types (MySQL). A good UML document would work.
Maybe I'm just being lazy, but this is a very common task, and I am sure someone has published an efficient way to do this somewhere. I just don't know where to look and Google isn't helping. Please point me to the resource. Thanks.
EDIT
Although this is more of a general question, I would like to clarify my specific needs.
Addresses will be used to specify road addresses of locations of events. These addresses will need to be in a format that can be best broken down and searched, and also used by any third-party applications I may end up linking my data source to.
ALSO. Data will be geo-coded (long, lat) on entry and stored separately, so it must fit the (yet undecided) protocol of whatever geocoder / application / library does that.
For international addresses, refer to the Universal Postal Union's Postal Addressing Systems database.
For U.S. addresses, refer to USPS Publication 28 "Postal Addressing Standards".
The USPS wants the following unpunctuated address components concatenated on a single line:
house number
predirectional (N, SE, etc.)
street
suffix (AVE, BLVD, etc.)
postdirectional (SW, E, etc.)
unit (APT, STE, etc.)
apartment/suite number
E.g. 102 N MAIN ST SE APT B
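Assembling that line from already-parsed components is mechanical. A minimal sketch, assuming the components have already been split out and abbreviated (parsing and abbreviating are what CASS software is for). The function and parameter names are hypothetical, and it strips only periods and commas, because hyphens and slashes can legitimately appear in house numbers.

```python
def delivery_address_line(house_number: str, street: str, predirectional: str = "",
                          suffix: str = "", postdirectional: str = "",
                          unit: str = "", unit_number: str = "") -> str:
    """Join the components in Pub 28 order: upper case, single spaces,
    empty components skipped, periods and commas removed."""
    ordered = [house_number, predirectional, street, suffix,
               postdirectional, unit, unit_number]
    cleaned = (p.strip().upper().replace(".", "").replace(",", "") for p in ordered)
    return " ".join(p for p in cleaned if p)


print(delivery_address_line(house_number="102", predirectional="N", street="Main",
                            suffix="St", postdirectional="SE",
                            unit="Apt", unit_number="B"))  # 102 N MAIN ST SE APT B
```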
If you keep the entire address line as a single field in your database, input and editing are easy, but searches can be more difficult (e.g., in SOUTH EAST LANE, is the street EAST, as in S EAST LN, or is it LANE, as in SE LANE ST?).
If you keep the address parsed into separate fields, searches for components like street name or apartments become easier, but you have to append everything together for output, you need CASS software to parse correctly, and PO boxes, rural route addresses, and APO/FPO addresses have special parsings.
A physical location with multiple addresses at that location is either a multiunit building, in which case letters/numbers after units like APT and STE designate the address, or it's a Commercial Mail Receiving Agency (e.g., a UPS Store) and a maildrop/private mailbox number is appended (like 100 MAIN ST STE B PMB 102), or it's a business with one USPS delivery point and mail is routed after USPS delivery (which usually requires a separate mailstop field which the company might need but the USPS won't want on the address line).
A contact with more than one physical address is usually a business or person with a street address and a PO box. Note that it's common for each address to have a different ZIP code.
It's quite typical that one business transaction might have a shipping address and a billing address (again, with different ZIP codes). The information I keep for EACH address is:
name prefix (DR, MS, etc)
first name and initial
last name
name suffix (III, PHD, etc)
mail stop
company name
address (one line only per Pub 28 for USA)
city
state/province
ZIP/postal code
country
I typically print mail stops somewhere between the person's name and company because the country contains the state/ZIP which contains the city which contains the address which contains the company which contains the mail stop which contains the person. I use CASS software to validate and standardize addresses when entered or edited.
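The field list and print order above map naturally onto a record type. A minimal sketch with a Python dataclass; the class name, field names and sample values are hypothetical, and the label order follows the answer: person, mail stop, company, address line, then city/state/ZIP and country.

```python
from dataclasses import dataclass


@dataclass
class PostalAddress:
    name_prefix: str = ""   # DR, MS, etc.
    first_name: str = ""    # first name and initial
    last_name: str = ""
    name_suffix: str = ""   # III, PHD, etc.
    mail_stop: str = ""
    company: str = ""
    address_line: str = ""  # one line only per Pub 28 for US addresses
    city: str = ""
    state: str = ""         # state/province
    postal_code: str = ""   # ZIP/postal code
    country: str = ""

    def label_lines(self) -> list[str]:
        """Lines for a mailing label, innermost (person) first; empty lines are dropped."""
        name = " ".join(p for p in (self.name_prefix, self.first_name,
                                    self.last_name, self.name_suffix) if p)
        last_line = " ".join(p for p in (self.city, self.state, self.postal_code) if p)
        lines = [name, self.mail_stop, self.company, self.address_line,
                 last_line, self.country]
        return [line for line in lines if line]


addr = PostalAddress(name_prefix="DR", first_name="JANE Q", last_name="DOE",
                     mail_stop="MS 42", company="EXAMPLE CORP",
                     address_line="102 N MAIN ST SE APT B", city="ANYTOWN",
                     state="VA", postal_code="12345-6789", country="USA")
print("\n".join(addr.label_lines()))
```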
First, speaking as someone who spends most of their professional day working with addresses: they are hard to manage from a data perspective.
If you ask 5 people what address they live at, you will get 5 different answers. While you and I can tell that 123 Main Street Apt 1 and Apt 1 123 Main Street are the same address, a database program will have a hard time.
If you are using US-centric addresses, CASS-certified software from almost any vendor will standardize your addresses reasonably well. I would recommend a simple format as follows:
Address 1
Address 2
Address 3
City
State
Zip
Zip+4 (I would carry this so lookups are easier when checking for duplicates)
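The simple format above could look like this as a table. A minimal sketch using Python's built-in sqlite3 (the question asks about MySQL, so the column types would need adjusting), with hypothetical table, column and function names. The index on (zip, zip4) reflects the point about duplicate checks: a ZIP+4 usually covers only a small group of addresses, and the first line is then compared exactly, which assumes addresses were standardized (e.g. by CASS software) before being stored.

```python
import sqlite3

SCHEMA = """
CREATE TABLE address (
    id       INTEGER PRIMARY KEY,
    address1 TEXT NOT NULL,
    address2 TEXT NOT NULL DEFAULT '',
    address3 TEXT NOT NULL DEFAULT '',
    city     TEXT NOT NULL,
    state    TEXT NOT NULL,
    zip      TEXT NOT NULL,
    zip4     TEXT NOT NULL DEFAULT ''
);
CREATE INDEX idx_address_zip9 ON address (zip, zip4);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def possible_duplicates(conn: sqlite3.Connection, address1: str,
                        zip5: str, zip4: str) -> list[int]:
    """Ids of stored rows with the same ZIP+4 and the same standardized first line."""
    rows = conn.execute(
        "SELECT id FROM address WHERE zip = ? AND zip4 = ? AND address1 = ?",
        (zip5, zip4, address1),
    ).fetchall()
    return [row[0] for row in rows]


conn = open_db()
conn.execute("INSERT INTO address (address1, city, state, zip, zip4) VALUES (?, ?, ?, ?, ?)",
             ("123 MAIN ST APT 1", "ANYTOWN", "VA", "12345", "6789"))
print(possible_duplicates(conn, "123 MAIN ST APT 1", "12345", "6789"))  # [1]
```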
However, if you want a universal address format, I would look at the ADIS standard from IdeaAlliance. This standard can be used to break down (parse) addresses from almost any country into their relevant parts. They can then be put back together using templates/components based on the Universal Postal Union standards (UPU S42 Standard on International Postal Address Components and Templates).
The big plus of this format is that addresses that don't exist in a postal database like CASS can be entered and stored as separate parts.
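The "store the parts, reassemble with a per-country template" idea can be sketched in a few lines. A minimal sketch only: the templates and part names below are hypothetical and heavily simplified; they are NOT the ADIS or UPU S42 definitions, which are far richer.

```python
from collections import defaultdict

# Hypothetical, heavily simplified per-country templates (not UPU S42).
TEMPLATES = {
    "US": ["{recipient}", "{street}", "{city} {region} {postcode}", "{country}"],
    "GB": ["{recipient}", "{street}", "{city}", "{postcode}", "{country}"],
}


def render_address(parts: dict[str, str], country_code: str) -> list[str]:
    """Fill the country's template from stored parts. Missing parts become
    empty strings, and lines that end up empty are dropped."""
    values = defaultdict(str, parts)
    lines = []
    for template_line in TEMPLATES[country_code]:
        line = " ".join(template_line.format_map(values).split())
        if line:
            lines.append(line)
    return lines


print(render_address({"recipient": "J SMITH", "street": "12 WHITE CITY CLOSE",
                      "city": "LONDON", "postcode": "W12 7RJ",
                      "country": "UNITED KINGDOM"}, "GB"))
```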
Very similar questions have been asked before.
Addresses are messy - at best.
It partly depends on what you want to do with the addresses. If you're going to use them to mail things to people, then you simply need to record the image that will appear on the address label in a convenient form. If you're going to analyze the address, you have to work a lot harder.
Remember that the first time you have to deal with someone outside the US, all previous rules go astray. You may be strictly US-only, but beware.
I looked into this a while ago, but for international addresses. I didn't find much in the way of a consensus. However, for the US, I found the succinctly named United States Thoroughfare, Landmark, and Postal Address Data Standard (Draft):
http://www.fgdc.gov/standards/projects/FGDC-standards-projects/street-address/index_html
I don't think that they actually provide any specific database schema ideas, but it might be a good starting point.
First, the "best" means of storing an address depends greatly on how it will be used. Is it just for reference or searches on say city? Do you plan on addressing envelopes? Are you going to integrate with a shipping system like FedEx or UPS? Will you store non-US addresses? Once you get into the realm of integrating with something that ships, you should start looking at CASS. This is a specification for handling the USPS addresses. There are applications out there that are CASS certified which will store and verify addresses. Thus, the second best practice would be to try to avoid reinventing the wheel and see if there is a system out there that will solve your problem especially if you are going to go international. You want to leverage the fact that someone else has worked out all the details about how to properly and efficiently store addresses for many countries around the world instead of having to do that investigation yourself.
I've had to try to do this before and I'd found this document that gives you some pointers. I ended up shelving my schema since my application does have to deal with international addresses.

How do you deal with duplicate street suffixes?

I have a system where users need to enter addresses. I am trying to limit duplicates of course and something I started noticing was becoming a big problem was some users putting in "Road" and others "Rd", therefore duplicates were creeping in.
I looked up the list of USPS street suffix abbreviations, but I still have a question I can't find an answer to. Can I replace all words in a street address with the USPS standard abbreviations? For example, "123 Forest Hill Road" would become "123 Frst Hl Rd" if I abbreviated everything. Or does the "street suffix" that USPS refers to mean they only want you to go as far as "123 Forest Hill Rd"?
USPS has an API that can get you properly formatted addresses.
You would have to ask the USPS to be sure, but I imagine that your app and data would be in trouble if you started replacing "123 Forest Hill Rd" with "123 Frst Hl Rd".
I have done some work with addresses and let me tell you it is very complicated and time consuming to do even remotely correctly. In most cases you would be better off making use of existing packages out there. For example, you would be surprised what you can achieve with a few simple calls to the free Google Maps API.
Can you avoid the whole problem by expanding all of the terms rather than attempting to abbreviate any?
On the duplicates, I'm just wondering whether you'd be better off making users choose from a drop-down of address types. Take it out of the users' hands.
On the abbreviation, are you asking this because USPS needs the Address in some specific format? Just wondering what purpose there is in the abbreviating. Apologies if I've missed the mark.
You could also take a look at the USPS Postal Addressing Standards which has explanation of the preferred and acceptable formats for various address examples.
http://pe.usps.gov/text/pub28/pub28c2_toc.htm
In the example case, the relevant section is 23, Delivery Address Line.
http://pe.usps.gov/text/pub28/pub28c2_012.htm
The trouble with trying to expand/contract addresses yourself is that abbreviations can often be part of the street or even city name. For example: "100 Avenue A", where Avenue isn't supposed to be abbreviated. Or "900 St Louis Loop": in this case St doesn't mean Street, it means Saint.
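One way to stay clear of those traps for the "Road" vs "Rd" duplicates is to standardize only the final word of the line, and only when it is a known suffix. A minimal sketch, assuming a small hand-copied excerpt of the USPS suffix table and a line with no post-directional or unit after the suffix; standardize_suffix is a hypothetical helper, and for real data the full table (or a CASS/USPS service) is the safer source.

```python
# Small excerpt of the USPS street suffix table: full and already-abbreviated
# spellings both map to the standard abbreviation.
SUFFIX_ABBREVIATIONS = {
    "AVENUE": "AVE", "AVE": "AVE",
    "BOULEVARD": "BLVD", "BLVD": "BLVD",
    "DRIVE": "DR", "DR": "DR",
    "LANE": "LN", "LN": "LN",
    "LOOP": "LOOP",
    "ROAD": "RD", "RD": "RD",
    "STREET": "ST", "ST": "ST",
}


def standardize_suffix(address_line: str) -> str:
    """Upper-case the line, drop periods, and abbreviate only the last word if
    it is a known suffix. Earlier words (FOREST, HILL, the AVENUE in
    '100 AVENUE A', the ST in '900 ST LOUIS LOOP') are left alone."""
    words = address_line.upper().replace(".", "").split()
    # Require at least a house number and a street name before the suffix.
    if len(words) >= 3 and words[-1] in SUFFIX_ABBREVIATIONS:
        words[-1] = SUFFIX_ABBREVIATIONS[words[-1]]
    return " ".join(words)


print(standardize_suffix("123 Forest Hill Road"))  # 123 FOREST HILL RD
print(standardize_suffix("123 Forest Hill Rd."))   # 123 FOREST HILL RD
print(standardize_suffix("100 Avenue A"))          # 100 AVENUE A
```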
Within the USA, there is a component of a certificated address called a delivery point barcode (DPBC). It's a unique 12-digit value that can serve as the unique identifier of an address. To get this value you'll want to use an address verification or address standardization web service API, which can cost about $20/mo depending upon the volume of requests you make to it. Using this you can easily prevent duplicates or do fraud prevention/detection, etc.
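A minimal sketch of assembling that 12-digit value, assuming the verification service has already returned the 5-digit ZIP, the 4-digit add-on and the 2-digit delivery point (those come from USPS data and cannot be computed locally). The last digit is a check digit chosen so that the sum of all 12 digits is a multiple of 10, as in the POSTNET barcode. delivery_point_barcode is a hypothetical helper.

```python
def delivery_point_barcode(zip5: str, plus4: str, delivery_point: str) -> str:
    """Concatenate ZIP + ZIP+4 add-on + delivery point (11 digits) and append a
    check digit that brings the digit sum up to a multiple of 10."""
    digits = zip5 + plus4 + delivery_point
    if len(digits) != 11 or not digits.isdigit():
        raise ValueError("expected 5 + 4 + 2 digits")
    check = (10 - sum(int(d) for d in digits) % 10) % 10
    return digits + str(check)


# Usable as a key for spotting duplicate addresses.
seen = {delivery_point_barcode("12345", "6789", "01")}
print(seen)  # {'123456789014'}
```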
In the interest of full disclosure, I'm the founder of SmartyStreets. We offer just such an address verification web service API called LiveAddress. You're more than welcome to contact me personally with any questions you have.

Is there a centralized list of country names that can be used for web drop down boxes (and validation) [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 4 years ago.
There are examples online with web select boxes that have a huge list of countries and that probably will be good enough for me to use. However, by Murphy's law, there's bound to be some random country that someone is from and isn't on my list (and probably someone else also ran into this and has updated their local list). Also, when new countries are added, I won't know about it.
Basically, I feel it's better practice and a better smell if there is some centralized list of country names that I can use / trust (it could also set/follow standards for exact naming: "United St..." vs "USA", etc.).
I would prefer a solution that isn't IIS specific if possible
There are many lists of countries; check this Wikipedia article, where you can find lists such as:
ISO 3166-1 countries codes
IOC country codes
Alternative country names
And more...
We maintain a list of 'PUBLIC DOMAIN' worldwide country names in all official formats. The information comes from the ISO 3166-1 Maintenance Agency for the official English and French short names; from the US Board on Geographic Names (BGN) for English short and long names and local short and long names; and from the United Nations Group of Experts on Geographic Names (UNGEGN) for English and French long names, short and formal local names, and Spanish names.
There is still a problem representing some Arabic characters as romanized characters (you will see '?'), but this is limited to the local names of a few Arab countries.
Note that English, Spanish and French are the 3 Western languages among the UN's 6 official languages. Metadata, information on the sources, and the download can be found at:
http://www.opengeocode.org/download.php#countrynames
The Open Geocode Team
OpenGeoCode.org
Jan. 26, 2014.
We updated the list to include country names in Italian and German. We used the UN Food & Agriculture list of countries in Italian and the German Government's Federal Foreign Office's list of countries in German.
I recommend pulling data out of the Unicode CLDR (Common Locale Data Repository), which includes a professionally maintained list of countries and country name data.
Grab the data from there once, and do updates once in a while; the CLDR data will come in a consistent format, so you won't need to fuss over it once it's part of your workflow.
An answer to this question contains a useful link to a Github project that has lists in various formats and the script that produced them, making it easy to obtain updated versions.
No list is comprehensive.
"there's bound to be some random country that someone is from and isn't on my list"
If that was all there was to it, it would be simple.
There's no "world law" or "world constitution", so there's no single list of countries, republics, territories, protectorates, autonomous regions, independent governments, and disputed territories.
Indeed, it's not possible to come to an agreed-upon definition of "country" which would lead to a final list of country codes. The definition of "country" is politically charged. What, for example, is Tibet? Country or region of China? Northern Ireland? The Holy See?
Pick a list, and know that it's subject to some dispute.
You could, for example, use the IANA country code database: http://www.iana.org/domains/root/db/
It's as good as any, and since it's part of the IANA, it has some standing as a standard. Further, it's pretty accessible as easy-to-parse web content.
I have a recent list ready to go on my website at http://www.john.geek.nz/index.php/2009/01/sql-tips-list-of-countries/
It's available both as SQL and as tab-delimited text. The original list was sourced from Wikipedia.
I don't know what IIS is, but ISO 3166 specifies 2-letter codes for each country; AFAIK, their list is comprehensive. ISO 3166 site
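Whichever list is chosen, the same codes can drive both the drop-down and the server-side check; the check is still needed because a posted form value can be anything, regardless of what the drop-down offered. A minimal sketch, using a tiny hypothetical excerpt of ISO 3166-1 alpha-2 codes with display names chosen for illustration (in practice, load the full list from ISO, CLDR or one of the other sources mentioned here). Function names are hypothetical.

```python
import html

# Tiny hypothetical excerpt: alpha-2 code -> display name.
COUNTRIES = {
    "CA": "Canada",
    "DE": "Germany",
    "GB": "United Kingdom",
    "US": "United States",
}


def country_options(countries: dict[str, str], selected: str = "") -> str:
    """Render a <select> sorted by display name, escaping codes and names."""
    lines = ['<select name="country">']
    for code, name in sorted(countries.items(), key=lambda item: item[1]):
        attr = " selected" if code == selected else ""
        lines.append(f'  <option value="{html.escape(code)}"{attr}>{html.escape(name)}</option>')
    lines.append("</select>")
    return "\n".join(lines)


def is_valid_country(code: str, countries: dict[str, str]) -> bool:
    """Server-side check of a submitted value against the same list."""
    return code.strip().upper() in countries


print(country_options(COUNTRIES, selected="GB"))
```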
This is probably way too late, but there's a web service that you can call that would theoretically allow you to automatically databind your controls:
http://www.webservicex.net/country.asmx
May give you another approach and would be better than hard-coding a list yourself.
I've posted a few files to github:gist
Including:
The HTML Select enumerations for the Alpha-2, Alpha-3, and Numeric-3 values, as well as an XSD snippet of those values as enumerations for a simple type restriction.
Check on this link
The CIA World Factbook has this information; however, just as you point out in your question, there are some disputed countries that are sometimes not on their list, e.g. Palestine.
Another source for country names is Natural Earth Data and their cultural map download links which come as shape files ready to be plotted as maps. Here is a direct link to the medium quality map data download page.
Check out angrymonkeycloud.com/geography to get the full list of countries.
It's a free .NET client that is easy to get started with, and it uses an API to retrieve values, so you should always get the latest updates.
It is much easier to use a web service for this task than to maintain your own data store. This way it stays up to date, and you can do things like country-state Ajax drop-down sets. http://geodata.solutions is the best one to use, and it has lots of cool features, such as pre-selecting the user's country based on their IP and ordering lists by population.