How do you deal with duplicate street suffixes?

I have a system where users need to enter addresses. I am trying to limit duplicates, of course, and one thing that was becoming a big problem was some users entering "Road" and others "Rd", so duplicates were creeping in.
I looked up the list of USPS street suffix abbreviations, but I still have a question I can't find an answer to. Can I replace all words in a street address with the USPS standard abbreviation? An example would be "123 Forest Hill Road". If I replaced every word with its abbreviation, it would become "123 Frst Hl Rd". Or does the "street suffix" that USPS refers to mean they only want you to go as far as "123 Forest Hill Rd"?

USPS has an API that can get you properly formatted addresses.

You would have to ask the USPS to be sure, but I imagine that your app and data would be in trouble if you started replacing "123 Forest Hill Rd" with "123 Frst Hl Rd".
I have done some work with addresses and let me tell you it is very complicated and time consuming to do even remotely correctly. In most cases you would be better off making use of existing packages out there. For example, you would be surprised what you can achieve with a few simple calls to the free Google Maps API.

Can you avoid the whole problem by expanding all of the terms rather than attempting to abbreviate any?

On the duplicates, just wondering if you'd be better to make the Users choose from a drop-down of Address Types. Take it out of the User's hands.
On the abbreviation, are you asking this because USPS needs the Address in some specific format? Just wondering what purpose there is in the abbreviating. Apologies if I've missed the mark.

You could also take a look at the USPS Postal Addressing Standards which has explanation of the preferred and acceptable formats for various address examples.
http://pe.usps.gov/text/pub28/pub28c2_toc.htm
In the example case, the relevant section is 23 Delivery Address Line.
http://pe.usps.gov/text/pub28/pub28c2_012.htm

The trouble with trying to expand/contract addresses yourself is that oftentimes abbreviations can be part of the street or even city name. For example: "100 Avenue A", where Avenue isn't supposed to be abbreviated. Or "900 St Louis Loop". In this case St doesn't mean Street, it means Saint.
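One conservative approach, sketched here in Python, is to abbreviate only the final token of the street line and leave every other word alone. The `SUFFIXES` table is a tiny illustrative subset rather than the full USPS list, and `normalize_suffix` is a hypothetical helper name. It leaves "100 Avenue A" and "900 St Louis Loop" untouched, but it does not handle trailing directionals or unit designators, so treat it as a rough pre-filter only.

```python
# Illustrative subset only; the real USPS suffix table is much longer.
SUFFIXES = {
    "road": "Rd", "street": "St", "avenue": "Ave",
    "boulevard": "Blvd", "lane": "Ln", "drive": "Dr",
}
# Also accept the abbreviations themselves, so "rd" and "Rd." map to "Rd".
CANONICAL = {**SUFFIXES, **{abbr.lower(): abbr for abbr in SUFFIXES.values()}}


def normalize_suffix(street_line: str) -> str:
    """Abbreviate only the last token, and only if it is a known suffix."""
    tokens = street_line.split()
    # Require at least number + name + suffix, so a lone street name
    # is never mistaken for a suffix.
    if len(tokens) < 3:
        return " ".join(tokens)
    last = tokens[-1].rstrip(".").lower()
    if last in CANONICAL:
        tokens[-1] = CANONICAL[last]
    return " ".join(tokens)
```

Because only the last token is inspected, "Forest Hill" in "123 Forest Hill Road" is never shortened to "Frst Hl".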
Within the USA, there is a component of a certified address called a delivery point barcode (DPBC). It's a unique 12-digit value that can serve as the unique identifier of an address. To get this value you'll want to use an address verification or address standardization web service API, which can cost about $20/mo depending upon the volume of requests you make to it. Using this you can easily prevent duplicates or do fraud prevention/detection, etc.
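As a rough sketch of the deduplication idea, assume each record has already been sent through a verification service and carries the returned 12-digit value under a hypothetical `dpbc` key. The function name and record layout are assumptions for illustration, and the sample digits in the usage comment are made up.

```python
def dedupe_by_dpbc(records):
    """Split records into (unique, duplicates) using the DPBC as the key.

    Records without a 'dpbc' value (unverified addresses) cannot be
    compared, so they are kept as unique.
    """
    seen = set()
    unique, duplicates = [], []
    for record in records:
        key = record.get("dpbc")
        if key is not None and key in seen:
            duplicates.append(record)
            continue
        if key is not None:
            seen.add(key)
        unique.append(record)
    return unique, duplicates


# Usage (the DPBC digits are invented for the example):
# dedupe_by_dpbc([
#     {"line": "123 Forest Hill Road", "dpbc": "000000000011"},
#     {"line": "123 FOREST HILL RD",   "dpbc": "000000000011"},
# ])
```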
In the interest of full disclosure, I'm the founder of SmartyStreets. We offer just such an address verification web service API called LiveAddress. You're more than welcome to contact me personally with any questions you have.

Related

Address Capitalization

I'm looking into using a CASS-Certified address validation service to correct user-provided street addresses at the time of entry. (Specifically, I'm looking at SmartyStreets' LiveAddress.) However, USPS dictates that a correct address must be in all caps, so CASS services almost uniformly return addresses that way. When mailing to the client at that address, though, it would be preferable to use a more humane, conventional casing.
The question, of course, is how to make that happen. I know there's no such thing as a perfect solution that doesn't involve a complete nationwide database of correctly capitalized street and city names. A set of passable heuristics might be good enough, though, since we will probably be kicking the corrected address back to the user, ultimately leaving it up to them.
A short list of problems that I was able to come up with after a few minutes of thought:
SW FIRST ST should be SW First St, not Sw First St.
MCDOUGLE ST should be McDougle St, not Mcdougle St.
MACDOUGLE ST should probably be Macdougle St rather than MacDougle St, since Macoroni Rd should usually not be MacOroni Rd.
1ST ST should be 1st St, not 1St St.
Not knowing whether a street name is based on a surname, we probably cannot safely turn VAN into van, but VON can probably become von.
Are there any existing libraries that could at least get me started? Addresses are complicated and fickle things, and I'd rather not home-brew the whole thing if I don't have to. I'm using C#, but I'm open to porting code from another language.
Barring that, does anyone have a decent reference of common capitalization exceptions for street and/or city names?
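The heuristics listed in the question can be sketched in a few lines of Python (the question mentions C#, but the logic ports directly). This is only a starting point under stated assumptions: `humanize` and `case_token` are hypothetical names, the directional set covers only the eight compass abbreviations, MAC prefixes are deliberately left alone, and VAN/VON are not handled.

```python
import re

DIRECTIONALS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW"}
ORDINAL = re.compile(r"^\d+(ST|ND|RD|TH)$")


def case_token(token: str) -> str:
    upper = token.upper()
    if upper in DIRECTIONALS:
        return upper                      # SW stays SW, not Sw
    if ORDINAL.match(upper):
        return upper.lower()              # 1ST -> 1st
    if upper.startswith("MC") and len(upper) > 2:
        return "Mc" + upper[2:].capitalize()   # MCDOUGLE -> McDougle
    return upper.capitalize()             # MACDOUGLE -> Macdougle


def humanize(address_line: str) -> str:
    """Apply per-token casing rules to an all-caps address line."""
    return " ".join(case_token(t) for t in address_line.split())
```

A single-letter street name such as "E" would be treated as a directional here, which is one reason to apply rules like these per component when the service returns the parts separately.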

Great to see that you're using the LiveAddress service to facilitate address verification and standardization. There is one thing you may want to be aware of that will help you significantly in the process of applying casing rules to your standardized address:
We recently introduced a new REST+JSON endpoint that returns the standardized form of the address as well as various component parts of the address. Because of this, it's very easy to apply your casing rules to "street_name" and "city_name" values returned independent of the street suffix and pre/post-directionals.
You're welcome to contact SmartyStreets support for additional help with this issue in addition to questions here on Stack Overflow (which we monitor continually). I should probably also mention that I'm the founder of SmartyStreets. Lastly, we're working on being able to return properly cased addresses, but I don't have any kind of release time frame on it yet.

Not a library, but you could probably solve the problem with the Google Maps API depending on your requirements.

What is a Geo Cookie? How do I implement one?

I've been tasked with using a 'Geo Cookie' to identify where in the world users are accessing the site from, and to redirect them to a site configured for their region.
I understand what a cookie is in the context of the web, but I'm not sure about a geo cookie, partly because I've never heard of one before.
Could somebody please tell me if this is a real thing, or if it could be a term that was made up by people who didn't know the proper name for what they're really talking about? (This happens quite regularly, it seems.) Is there any way for me to identify where the user is located?

I'd never heard of the term before, and doing a little googling confirmed my suspicion that this is not a common term in usage.
My thought, reading the first search result from that query, is that you're being asked to implement a cookie that stores the user's region. This can be done with a normal cookie, and you can give it the name "GEO" - voilà! A Geo Cookie.
If I were you, I'd go back to the stakeholder and find out what this requirement means.
For determining where a user is located (without asking them directly) you can use the IP address of the user and a geolocation service. See Know a good IP address Geolocation Service for more info on that.

A geo cookie is just a cookie which you store the geo data in.
This allows you to:
Avoid having to hit the geo/ip database with each request
Allow users to override the location (e.g. I know it looks like I'm in Spain, and as it happens, I am (yay holidays), but I want data for England, that is where I live, and that is where I want my results localised to).
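A minimal sketch of that flow, using Python's standard `http.cookies` module. The cookie name "GEO" follows the earlier answer; `resolve_region` and the `lookup_region` callable (standing in for whatever IP geolocation service you choose) are hypothetical names, and the 30-day lifetime is an arbitrary assumption.

```python
from http.cookies import SimpleCookie

THIRTY_DAYS = 30 * 24 * 3600  # arbitrary lifetime chosen for the example


def resolve_region(cookie_header, ip_address, lookup_region):
    """Return (region, set_cookie_value_or_None).

    If the request already carries a GEO cookie, trust it (this is what
    lets users override their location). Otherwise fall back to an IP
    lookup and return a Set-Cookie value so later requests skip the lookup.
    """
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    if "GEO" in cookie:
        return cookie["GEO"].value, None

    region = lookup_region(ip_address)
    out = SimpleCookie()
    out["GEO"] = region
    out["GEO"]["path"] = "/"
    out["GEO"]["max-age"] = THIRTY_DAYS
    return region, out["GEO"].OutputString()
```

An override then amounts to setting the same cookie to the region the user picked, for example from a "change region" link.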

How to defend against users with Multiple Accounts?

We have a service where we literally give away free money.
Naturally said service is ripe for abuse. To defend against this we do the following:
log ip address
use unique email addresses (only 1 acct/email addy)
collect more info like st. address, phone number, etc.
use signup captcha
BHOs (I've seen poker rooms use these)
Now, let's get real here -- NONE of this will stop a determined user.
Obviously ip addresses can be changed via a proxy (which could be blacklisted via akismet) but change anyways if the user has a dynamic ip or if more than one user is behind a NAT'd network (can we say almost everyone?)
I can sign up for thousands of unique email addresses each hour -- this is no defense.
I can put in fake information taken from lists for street addresses and phone numbers.
I can buy captchas from captcha solving services (1k for $5).
bhos seem only effective for downloadable software -- this is a website
What are some other ways to prevent multiple users from abusing the service? How do all the PPC people control click fraud?
I know we could actually call the person but I don't think we are trying to do that anytime soon.
Thanks,

It's pretty difficult to generate lots of fake phone numbers that can send and receive SMS messages. SMS verification could go a long way towards cutting down on fraud. Of course, it also limits you to giving away free money to cell phone owners.
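The server side of SMS verification can be sketched as follows. Sending the message is left out because it depends on the SMS provider; `issue_code`, `code_is_valid`, and the 10-minute lifetime are assumptions made for illustration.

```python
import hmac
import secrets

CODE_TTL_SECONDS = 600  # assumed 10-minute validity window


def issue_code() -> str:
    """Generate a random 6-digit code to send to the user's phone."""
    return f"{secrets.randbelow(1_000_000):06d}"


def code_is_valid(expected: str, submitted: str,
                  issued_at: float, now: float,
                  ttl: int = CODE_TTL_SECONDS) -> bool:
    """Check the submitted code against the stored one, with expiry."""
    if now - issued_at > ttl:
        return False
    # Constant-time comparison; encode so non-ASCII input cannot raise.
    return hmac.compare_digest(expected.encode(), submitted.encode())
```

In practice you would also limit how many codes can be sent to one number and allow each phone number on only one account, since that is what actually ties an account to a scarce resource.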

I think the only way is to bind your users' accounts to 'real world' information, like a passport number, for instance. Of course, you'll need to make sure that information is securely stored and find some way to validate it.

Re: signing up for new email accounts...
A user doesn't even need to do that. Please feel free to send your mail to brian_s#mailinator.com, or feydr.asks.a.question#spamherelots.com, or stackoverflow#safetymail.info, or my_arbitrary_username#zippymail.info. I haven't registered any of those email addresses, but all of them will work.
Those domains are owned by ManyBrain, and they (and probably others as well) set the domain to accept any email user. ManyBrain in particular then makes the inboxes for those emails publicly accessible without any registration (stripping everything but text from the email and deleting old mail). Check it out: admin#mailinator.com's email inbox!
Others have mentioned ways to try and keep user identities unique. This is just one more reason to not trust email addresses.
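If you still want to use email as one signal, a cheap first step is to reject known catch-all domains at signup. A minimal sketch, where the blocklist contains only the four domains named in this answer and is therefore nowhere near complete; `is_disposable` is a hypothetical helper name.

```python
# Only the domains mentioned above; a real list would be far longer
# and would need regular updates.
DISPOSABLE_DOMAINS = {
    "mailinator.com",
    "spamherelots.com",
    "safetymail.info",
    "zippymail.info",
}


def is_disposable(email: str) -> bool:
    """Return True if the address uses a known catch-all domain."""
    _, _, domain = email.rpartition("@")
    return domain.strip().lower() in DISPOSABLE_DOMAINS
```

This only raises the cost slightly, since new throwaway domains appear all the time.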

First, I suppose (hope) that you don't literally give away free money but rather give it to use your service or something like that.
That matters as there is a big difference between users trying to just get free money from you they can spend on buying expensive cars vs only spending on your service which would be much more limited.
Obviously many more users will try to fool the system in the former case than in the latter.
Why does it matter? Because it is all about the balance between your control and your users' annoyance. I see many answers concentrating on the control part, so let's go through annoyance, shall we?
Log IP address. What if I am the next guy on the computer in say internet shop and the guy before me already used that IP? The other guy left your hot page that I now see but I am screwed because the IP is blocked. Yes, I can go to another computer but it is annoyance and I may have other things to do.
Collecting physical addresses. For what??? Are you going to visit me? Or start sending me spam letters? Let me guess, more often than not you get addresses with misprints at best and fake ones at worst. In fact, it is much less hassle for me to give you a fake address than to deal with whatever spam letters I'd have to recycle in an environment-friendly way. :)
Collecting phone numbers. Again, why shall I trust your site? This is the real story. I gave my phone nr to obscure site, then later I started receiving occasional messages full of nonsense like "hit the fly". That I simply deleted. Only later and by accident to discover that I was actually charged 2 euros to receive each of those messages!!! Do I want to get those hassles? Obviously not! So no, buddy, sorry to disappoint but I will not give your site my phone number unless your company is called Facebook or Google. :)
Use signup captcha. I love that :). So what are we trying to achieve here? Will the user who is determined to abuse your service have problems typing in a couple of captchas? I doubt it. But what about the "good user"? Are you aware how annoying captchas are for many users??? What about users with impaired vision? But even without it, most captchas are so bad that they make you feel like you have impaired vision! The best advice I can give - if you care about user experience, avoid captchas like the plague! If you have any doubts, do your online research first!
See here more discussion about control vs annoyance and here some more thoughts about being user-friendly.

You have to bind their information to something that is 'real world', as Rubens says. Of course, you also need to be able to verify this information (I can just make up passport numbers all day if you don't check to make sure they're correct).
How do you deliver the money? Perhaps you can index this off the paypal account, mailing address, or whatever you're sending the money to?

Sometimes the only way to prevent people abusing a system is to not have the system in the first place.
If you're doing what you say you're doing, "giving away money to people", then surprise surprise, there will be tons of people with more time available to try to find ways to game the system than you will have to fix it.

I guess it will never be possible to have an identification system which identifies fake identities that is:
cheap to run (I think it's called "operational cost"?)
cheap to implement (ideally one time cost - how do you call that?)
has no Type-I/Type-II errors
is scalable
But I think you could prevent users from having too many (to say a quite random number: more than 50) accounts.
You might combine the following approaches:
IP address: can be bypassed with VPN
CAPTCHA: can be bypassed with human farms (see this article, for example - although they claim that their test can't be that easily passed to other humans, I doubt this is true)
Ability-based identification: can be faked when you know what is stored and how exactly the identification works by randomly (but with a given distribution) acting (example: brainauth.com)
Real-world interaction: This might be the best one, but I guess it is expensive and not many users will accept it. Also, for some users/countries it might not be possible. (Example: Postident in Germany, where the Post wants to see your identity card. I guess this can only be done at massive scale by the government.)
Other sites/resources: This basically transforms the problem for other sites. You can use services, where it is not allowed/uncommon/expensive to have much more than 1 account
Email
Phone number: e.g. by using SMS, see Multi-factor authentication
Bank account: PayPal; transfer not much money or ask them to transfer a random (small) amount to you (which you will send back).
Social based
When you take the social graph (vertices are people, edges are connections), you will expect some distribution. You know that you are a single human and you know some other people. So you have a "network of trust" (in quotes, because I think this might be used in other contexts as well). Now you might not trust people / networks who interact heavily with your service but are either isolated (no connection) or connect a large group with another large group ("articulation points"). You also might not trust fast-growing, heavily interacting, new, isolated graphs.
When a user provides content that is liked by many other users (who you trust), this might be an indicator that there is a real human creating it.
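The two structural signals mentioned for the social graph, isolated accounts and articulation points, can be computed with a standard depth-first search. This sketch assumes the graph is an undirected adjacency dict (every edge listed in both directions); the function names are hypothetical, and the recursive DFS would need to be made iterative for very large graphs.

```python
def isolated_accounts(graph):
    """Accounts with no connections at all."""
    return {node for node, neighbours in graph.items() if not neighbours}


def articulation_points(graph):
    """Nodes whose removal would disconnect part of the graph."""
    discovered, low, result = {}, {}, set()
    counter = [0]

    def dfs(node, parent):
        discovered[node] = low[node] = counter[0]
        counter[0] += 1
        children = 0
        for neighbour in graph[node]:
            if neighbour == parent:
                continue
            if neighbour in discovered:
                # Back edge: node can reach an earlier ancestor.
                low[node] = min(low[node], discovered[neighbour])
            else:
                children += 1
                dfs(neighbour, node)
                low[node] = min(low[node], low[neighbour])
                # Non-root: the subtree cannot bypass this node.
                if parent is not None and low[neighbour] >= discovered[node]:
                    result.add(node)
        # Root: articulation point only if it has several DFS subtrees.
        if parent is None and children > 1:
            result.add(node)

    for start in graph:
        if start not in discovered:
            dfs(start, None)
    return result
```

Neither signal proves fraud on its own; they are inputs to a score alongside the other checks in this list.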

We had a similar issue recently on our website. It is really a hassle to solve if your business offers one-time or monthly recurring free credits.
We have been using a fraud detection solution, https://fraudradar.io, for a while, and it helped us a lot to clean out most of the spam activity. It is pretty customizable, with:
IP checks
Email domain validity
Regex rules
Whitelisting options per IP, email domain etc.
Simple API to communicate through
I would suggest to check that out.

Best Practice / Standard for storing an Address in a SQL Database

I am wondering if there is some sort of "standard" for storing US addresses in a database? It seems this is a common task, and there should be some sort of a standard.
What I am looking for is a specific schema of how the database tables should work and interact, already in third normal form, including data types (MySQL). A good UML document would work.
Maybe I'm just being lazy, but this is a very common task, and I am sure someone has published an efficient way to do this somewhere. I just don't know where to look and Google isn't helping. Please point me to the resource. Thanks.
EDIT
Although this is more of a general question, I would like to clarify my specific needs.
Addresses will be used to specify road addresses of locations of events. These addresses will need to be in a format that can be best broken down and searched, and also used by any third-party applications I may end up linking my data source to.
ALSO. Data will be geo-coded (long, lat) on entry and stored separately, so it must fit the (yet undecided) protocol of whatever geocoder / application / library does that.

For international addresses, refer to the Universal Postal Union's Postal Addressing Systems database.
For U.S. addresses, refer to USPS Publication 28 "Postal Addressing Standards".
The USPS wants the following unpunctuated address components concatenated on a single line:
house number
predirectional (N, SE, etc.)
street
suffix (AVE, BLVD, etc.)
postdirectional (SW, E, etc.)
unit (APT, STE, etc.)
apartment/suite number
E.g. 102 N MAIN ST SE APT B
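Assembling that line from stored components is straightforward. A minimal sketch, with `delivery_line` and its parameter names chosen here for illustration; it only uppercases, strips periods and commas, and joins in the order listed above, and it does no validation.

```python
def delivery_line(*, house_number, street, predirectional="", suffix="",
                  postdirectional="", unit="", unit_number=""):
    """Concatenate address components in USPS order, unpunctuated."""
    parts = [house_number, predirectional, street, suffix,
             postdirectional, unit, unit_number]
    cleaned = (p.replace(".", "").replace(",", "").strip().upper()
               for p in parts if p)
    return " ".join(p for p in cleaned if p)


# Usage:
# delivery_line(house_number="102", predirectional="N", street="Main",
#               suffix="St.", postdirectional="SE", unit="Apt", unit_number="B")
# returns "102 N MAIN ST SE APT B"
```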
If you keep the entire address line as a single field in your database, input and editing is easy, but searches can be more difficult (eg, in the case SOUTH EAST LANE is the street EAST as in S EAST LN or is it LANE as in SE LANE ST?).
If you keep the address parsed into separate fields, searches for components like street name or apartments become easier, but you have to append everything together for output, you need CASS software to parse correctly, and PO boxes, rural route addresses, and APO/FPO addresses have special parsings.
A physical location with multiple addresses at that location is either a multiunit building, in which case letters/numbers after units like APT and STE designate the address, or it's a Commercial Mail Receiving Agency (eg, UPS store) and a maildrop/private mailbox number is appended (like 100 MAIN ST STE B PMB 102), or it's a business with one USPS delivery point and mail is routed after USPS delivery (which usually requires a separate mailstop field which the company might need but the USPS won't want on the address line).
A contact with more than one physical address is usually a business or person with a street address and a PO box. Note that it's common for each address to have a different ZIP code.
It's quite typical that one business transaction might have a shipping address and a billing address (again, with different ZIP codes). The information I keep for EACH address is:
name prefix (DR, MS, etc)
first name and initial
last name
name suffix (III, PHD, etc)
mail stop
company name
address (one line only per Pub 28 for USA)
city
state/province
ZIP/postal code
country
I typically print mail stops somewhere between the person's name and company because the country contains the state/ZIP which contains the city which contains the address which contains the company which contains the mail stop which contains the person. I use CASS software to validate and standardize addresses when entered or edited.
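The field list above maps onto a single table fairly directly. The sketch below uses Python's built-in `sqlite3` so that it runs as-is; the question asks about MySQL, where you would pick VARCHAR lengths and constraints to suit your data. Table and column names are my own assumptions, not a published standard.

```python
import sqlite3

SCHEMA = """
CREATE TABLE address (
    id             INTEGER PRIMARY KEY,
    name_prefix    TEXT,           -- DR, MS, etc.
    first_name     TEXT,           -- first name and initial
    last_name      TEXT,
    name_suffix    TEXT,           -- III, PHD, etc.
    mail_stop      TEXT,
    company_name   TEXT,
    address_line   TEXT NOT NULL,  -- one line only, per Pub 28
    city           TEXT NOT NULL,
    state_province TEXT,
    postal_code    TEXT,           -- text, so leading zeros survive
    country        TEXT NOT NULL
)
"""


def create_db(path=":memory:"):
    """Open a database and create the address table."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn
```

A shipping and a billing address for the same transaction would then be two rows in this table, referenced from the transaction record.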

First, as a person who spends most of their professional day working with addresses, I can tell you they are hard to manage from a data perspective.
If you ask 5 people what address they live at, you will get 5 different answers. While you and I can tell that 123 Main Street Apt 1 and Apt 1 123 Main Street are the same address, the database program will have a challenge.
If you are using US-centric addresses, CASS-certified software from almost any vendor will standardize your addresses reasonably well. I would recommend a simple format as follows:
Address 1
Address 2
Address 3
City
State
Zip
Zip+4 (I would carry this so lookups are easier when checking for duplicates)
However, if you want a universal address I would look at the ADIS standard from IdeaAlliance. This standard can be used to breakdown (parse) addresses from almost any country into the relevant parts. Then they can be put back together using templates/components based on the Universal Postal Union standards (UPU S42 Standard on International Postal Address Components and Templates).
The big plus of this format is that addresses that don't exist in a postal database like CASS can be entered and stored as separate parts.

Very similar questions have been asked before.
Addresses are messy - at best.
It partly depends on what you want to do with the addresses. If you're going to use them to mail things to people, then you simply need to record the image that will appear on the address label in a convenient form. If you're going to analyze the address, you have to work a lot harder.
Remember that the first time you have to deal with someone outside the US, all previous rules go astray. You may be strictly US-only, but beware.

I looked into this a while ago, but for international addresses. I didn't find much in the way of a consensus. However, for the US, I found the succinctly named United States Thoroughfare, Landmark, and Postal Address Data Standard (Draft):
http://www.fgdc.gov/standards/projects/FGDC-standards-projects/street-address/index_html
I don't think that they actually provide any specific database schema ideas, but it might be a good starting point.

First, the "best" means of storing an address depends greatly on how it will be used. Is it just for reference or searches on say city? Do you plan on addressing envelopes? Are you going to integrate with a shipping system like FedEx or UPS? Will you store non-US addresses? Once you get into the realm of integrating with something that ships, you should start looking at CASS. This is a specification for handling the USPS addresses. There are applications out there that are CASS certified which will store and verify addresses. Thus, the second best practice would be to try to avoid reinventing the wheel and see if there is a system out there that will solve your problem especially if you are going to go international. You want to leverage the fact that someone else has worked out all the details about how to properly and efficiently store addresses for many countries around the world instead of having to do that investigation yourself.

I've had to try to do this before and I'd found this document that gives you some pointers. I ended up shelving my schema since my application does have to deal with international addresses.

How do I match street addresses to UK postcodes? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 8 years ago.
U.K. sites that require addresses often ask the user to provide a postcode. The site then offers the user a choice between the different addresses that match that postcode.
How can I match a postcode to a street address?

Quick comparison:
Ideal Postcodes - https://ideal-postcodes.co.uk has a package on npm and is reasonably priced. Gives the address as individual parts as well as a single string, which can be useful for filling out a form as individual bits (and allowing people to customise).
getAddress - http://getaddress.io - also reasonably priced. Gives results as a single string only, rather than the broken down components - whether this is better or worse really depends on the situation.
PostCode Anywhere - http://www.postcodeanywhere.co.uk/ - used by lots of large household names but much more expensive than other solutions. You can get up to 50 JSON API calls a day directly though - see How do I look up a UK address based on house number and postcode?
Get the PAF file yourself http://poweredbypaf.com - this is incredibly expensive.

I believe you need the Royal Mail Postcode Address File. From that link:
PAF is the only complete source of all known UK Postcodes.
Services do exist to handle requests for this info, such that it may be cheaper to use such services for small numbers of requests (obviously you have issues as and when such services aren't available for whatever reasons).

The only way to do it officially up to now has been to buy the Postcode Address File. However, there was a news item recently saying that the data may be free in 2010, so it depends on whether you can wait!

Contrary to the answers here, you do NOT need the very expensive PAF from the Post Office. There are a number of commercial services (presumably powered by the PAF) that return the streets and street numbers for a specified postcode. They generally charge on a per-request basis. I do not have any experience with a particular vendor, but this is an example - capscan

To do this you need access to the Postcode Address File - this is something that is licensed for use on an annual basis from the post-office, usually via a third party.
You have a choice depending on your needs of buying a package to use locally or of using web services.
The Royal Mail's page is here: http://www.royalmail.com/portal/rm/jump2?mediaId=400085&catId=400084 and on that page are links to service providers.

Postcode Anywhere is one of the providers out there (one of my clients uses them with no complaints). Licensing is flexible:
Postcode Anywhere UK Address Finder

You can use a geocoding service, such as the one provided by Google.
Physical Address to GeoLocation UK

If you want to get the approximate address for a UK postcode (i.e. street level) there is a way you can do it legally and for free without using PAF data.
Geocode the postcode - This can be done legally for free now. OS have released the codepoint database into the public domain
Do a reverse lookup on the WGS84 lat/lng pair using the Google Maps HTTP Geocoding API to get the street address
As an example of this take a look at this XML Web Service:
http://geo.jamiethompson.co.uk/W127RJ.xml
explained at:
http://jamiethompson.co.uk/projects/2010/04/30/an-open-free-uk-postcode-geocoding-web-service/
which returns:
<result>
<status>200</status>
<message/>
<postcode>W12 7RJ</postcode>
<geo>
<os_x>523180</os_x>
<os_y>180541</os_y>
<lat>51.510379</lat>
<lng>-0.226376</lng>
<landranger>TQ231805</landranger>
<accuracy>1</accuracy>
<key>UO1NV-4UO8</key>
</geo>
<address>
<street>White City Close</street>
<locality>Hammersmith</locality>
<district>Hammersmith</district>
<county>Greater London</county>
</address>
</result>
It's not as handy as the commercial offerings which give you a full list of actual addresses for any given postcode, but it lets you do a "What's your postcode? What's your house number?" type system.
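Consuming a response of that shape takes only the standard library. The sketch below parses a copy of the sample response (with the closing root tag added so it is well-formed); `parse_postcode_result` is a hypothetical helper, and no request is made to the service, whose availability I have not checked.

```python
import xml.etree.ElementTree as ET

# Copy of the sample response above, trimmed to the fields used here.
SAMPLE = """<result>
<status>200</status>
<postcode>W12 7RJ</postcode>
<geo>
<lat>51.510379</lat>
<lng>-0.226376</lng>
</geo>
<address>
<street>White City Close</street>
<locality>Hammersmith</locality>
</address>
</result>"""


def parse_postcode_result(xml_text):
    """Extract postcode, coordinates and street from a lookup response."""
    root = ET.fromstring(xml_text)
    return {
        "postcode": root.findtext("postcode"),
        "lat": float(root.findtext("geo/lat")),
        "lng": float(root.findtext("geo/lng")),
        "street": root.findtext("address/street"),
    }
```

The street value can then pre-fill the form, leaving the user to supply only the house number.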

You can either use the PAF or one of the commercial web services (there are a few) which licence the PAF. I think you usually buy "credits" or pay a flat rate for unlimited access.

To add to the answers already coming through:
In the paid for products typically you pay for either:
premise level - more detailed and can offer the user a list of premises at that postcode location
street level - simply matches the street at that postcode location - you or your user fills in "the first line of the address" usually house name or number
I believe this differentiation is built into the licencing by the Royal Mail at source. Premise level is substantially more expensive

If you're adding it into a website shopping cart or similar system, you can buy access to the data on a per-click basis. If you're using it for an internal system such as CRM, you need to buy a per-user license.
Either way, you can use the Data8 Postcode Lookup API via web services.

You don't need to purchase the Royal Mail Postcode Address File (PAF). There are lots of APIs available.
getaddress.io is the only one I've found that has a free plan:
https://getaddress.io
