Addresses stored in a database: should you normalize?


Quick question.
Consider the following table (UK):
CustomerID (PK)
First Name
Surname
House_No/name
street
City
Postcode
Would you split off address into another table?
The basic business assumption is that a customer cannot have more than one address.
Originally I separated this off to look something like this:
Customer Table
CustomerID (PK)
FirstName
Surname
AddressID (FK)
Address Table
AddressID(PK)
Postcode(FK)
House_Number_name
Postcode Table:
Postcode (PK)
StreetName
CityID(FK)
City Table
CityID (PK)
CityName
Unless my assumption that a postcode uniquely identifies a street name and city is wrong, isn't this in 3NF?
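Here is a minimal sketch of the four-table design above. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the SQL is close to portable. The table and column names come from the question, and the sample rows are invented. The sketch shows the cost side of the trade-off: reading one customer's full address now takes three joins.

```python
import sqlite3

def build_schema(conn: sqlite3.Connection) -> None:
    """Create the Customer/Address/Postcode/City design from the question."""
    conn.executescript("""
        CREATE TABLE City (
            CityID   INTEGER PRIMARY KEY,
            CityName TEXT NOT NULL
        );
        -- Relies on the question's assumption: a postcode determines street and city.
        CREATE TABLE Postcode (
            Postcode   TEXT PRIMARY KEY,
            StreetName TEXT NOT NULL,
            CityID     INTEGER NOT NULL REFERENCES City(CityID)
        );
        CREATE TABLE Address (
            AddressID         INTEGER PRIMARY KEY,
            Postcode          TEXT NOT NULL REFERENCES Postcode(Postcode),
            House_Number_name TEXT NOT NULL
        );
        CREATE TABLE Customer (
            CustomerID INTEGER PRIMARY KEY,
            FirstName  TEXT NOT NULL,
            Surname    TEXT NOT NULL,
            AddressID  INTEGER REFERENCES Address(AddressID)
        );
    """)

def full_address(conn: sqlite3.Connection, customer_id: int):
    """Reassemble one customer's address: three joins for a single read."""
    return conn.execute("""
        SELECT a.House_Number_name, p.StreetName, ci.CityName, p.Postcode
        FROM Customer c
        JOIN Address  a  ON a.AddressID = c.AddressID
        JOIN Postcode p  ON p.Postcode  = a.Postcode
        JOIN City     ci ON ci.CityID   = p.CityID
        WHERE c.CustomerID = ?
    """, (customer_id,)).fetchone()

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
build_schema(conn)
# Hypothetical sample data.
conn.execute("INSERT INTO City VALUES (1, 'Exampletown')")
conn.execute("INSERT INTO Postcode VALUES ('AB12 3CD', 'High Street', 1)")
conn.execute("INSERT INTO Address VALUES (1, 'AB12 3CD', '10')")
conn.execute("INSERT INTO Customer VALUES (1, 'Jane', 'Smith', 1)")
print(full_address(conn, 1))
```

If some postcodes turn out to cover more than one street, StreetName would have to move back into the Address table.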

Personally, I would put the address in another table and link them together.
The business assumption/rule may change, and when you split these things out you have the best chance of accommodating any possible business rule without a major redo.
For instance: oops, the customer has a different billing address than their shipping address; or oops, we need to know where something actually shipped last year even though the customer changed their address this year; etc.
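Here is a hedged sketch of the answer's two "oops" cases. Python's sqlite3 stands in for MySQL, and all table names, columns and rows are hypothetical. A customer has many typed addresses, and each order references the exact address row it used. When the customer moves, the schema adds a new row instead of overwriting the old one, so last year's order still resolves to where it actually went.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    -- One customer, many addresses; address_type tells them apart.
    CREATE TABLE address (
        address_id   INTEGER PRIMARY KEY,
        customer_id  INTEGER NOT NULL REFERENCES customer(customer_id),
        address_type TEXT NOT NULL CHECK (address_type IN ('billing', 'shipping')),
        line1        TEXT NOT NULL
    );
    -- Each order points at the exact address row it was shipped to.
    CREATE TABLE orders (
        order_id   INTEGER PRIMARY KEY,
        address_id INTEGER NOT NULL REFERENCES address(address_id),
        shipped_on TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO customer VALUES (1, 'Jane Smith')")
conn.executemany("INSERT INTO address VALUES (?, ?, ?, ?)", [
    (1, 1, 'billing', '1 Bank Road'),
    (2, 1, 'shipping', '2 Dock Lane'),
])
conn.execute("INSERT INTO orders VALUES (1, 2, '2023-05-01')")

# The customer moves: add a new shipping row rather than editing row 2.
conn.execute("INSERT INTO address VALUES (3, 1, 'shipping', '3 New Street')")

def shipped_to(order_id):
    """Where did this order actually go?"""
    return conn.execute("""
        SELECT a.line1 FROM orders o
        JOIN address a ON a.address_id = o.address_id
        WHERE o.order_id = ?
    """, (order_id,)).fetchone()[0]

print(shipped_to(1))  # 2 Dock Lane
```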

"Basic business assumption is that a customer cannot have more than one address."
If this is an actual rule and not an assumption, I'd just keep them in the one table.
However, to assume makes an 'ass' out of 'u' and 'me'.
So play it safe and separate the address into another table.
But from your example, it looks like you are taking normalisation too far.

Yes, I would split off the address into a separate table.
However, the reason is not normalization per se (under most circumstances). The primary reason is that it is a slowly changing dimension, and it might be useful to look up previous addresses.
Whether you go ahead and normalize things like postal code is a matter of taste. In a more "amateur" database, I don't think it is necessary. However, for a large database of real customers, I would be inclined to split it off. It helps ensure that the postal codes are accurate. Also, postal codes change over time. And you might, for instance, be purchasing additional information at the postal code level.
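This small sketch illustrates the point that a separate postal code table "helps ensure that the postal codes are accurate." It assumes a hypothetical postcode reference table, for example one loaded from purchased postal data, and uses Python's sqlite3. The foreign key rejects any postcode that is not in the reference table. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, whereas MySQL's InnoDB engine enforces them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on
conn.executescript("""
    CREATE TABLE postcode (
        postcode TEXT PRIMARY KEY,
        city     TEXT NOT NULL
    );
    CREATE TABLE customer_address (
        customer_id INTEGER PRIMARY KEY,
        street      TEXT NOT NULL,
        postcode    TEXT NOT NULL REFERENCES postcode(postcode)
    );
""")
# Hypothetical reference data, e.g. loaded from a purchased postcode file.
conn.execute("INSERT INTO postcode VALUES ('AB1 2CD', 'Exampletown')")

def add_address(customer_id, street, postcode):
    """Return True if stored, False if the postcode is not in the reference table."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO customer_address VALUES (?, ?, ?)",
                         (customer_id, street, postcode))
        return True
    except sqlite3.IntegrityError:
        return False

print(add_address(1, '1 High Street', 'AB1 2CD'))  # True
print(add_address(2, '2 High Street', 'AB1 2XX'))  # False: unknown postcode
```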

It all depends on your requirements. As you mentioned, a customer can't have more than one address, so there's no need for a separate one-to-one relationship; you can put it in the same relation. But I suggest you break it into a one-to-many relationship to accommodate future requirements.


Many Bool columns in database table

I recently took over a website where people can register to help tutor kids. Part of the user's details is which areas they could work, represented by postal codes. The problem is, my predecessor designed the site such that in the database there is a Boolean column for every postal code. As such, the user table has almost 270 columns and can be quite slow at times (plus it's a nightmare to administer).
Most users select only a few postal codes, so there is surely a better way to do it. I was thinking about a varchar that could store the selected areas as a comma-separated list, e.g. 6043,8811,1234
Any advice from somebody who's had the same problem?

Both your predecessor's solution and yours are... strange.
You should simply have a relationship table between user and localities (assuming you have a locality table, with a postalCode field and a surrogate key (id)).
UserLocality(userId int, localityId int)
So a locality could have many users, and a user could have many localities.
Comma-separated fields are a really bad idea when it comes time to query them.
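This sketch implements the UserLocality design, with Python's sqlite3 standing in for MySQL. The users and locality tables and the sample rows are assumptions; only UserLocality(userId, localityId) comes from the answer. The composite primary key allows each pair to appear only once, and both directions of the many-to-many lookup are plain joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE users (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE locality (
        id         INTEGER PRIMARY KEY,        -- surrogate key
        postalCode TEXT NOT NULL UNIQUE
    );
    -- One row per (user, locality) pair instead of ~270 boolean columns.
    CREATE TABLE UserLocality (
        userId     INTEGER NOT NULL REFERENCES users(id),
        localityId INTEGER NOT NULL REFERENCES locality(id),
        PRIMARY KEY (userId, localityId)
    );
""")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, 'Ann'), (2, 'Ben')])
conn.executemany("INSERT INTO locality VALUES (?, ?)",
                 [(1, '6043'), (2, '8811'), (3, '1234')])
conn.executemany("INSERT INTO UserLocality VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 2)])

def tutors_for(postal_code):
    """All users who selected this postal code."""
    rows = conn.execute("""
        SELECT u.name FROM users u
        JOIN UserLocality ul ON ul.userId = u.id
        JOIN locality l      ON l.id = ul.localityId
        WHERE l.postalCode = ?
        ORDER BY u.name
    """, (postal_code,))
    return [name for (name,) in rows]

def codes_for(user_id):
    """All postal codes one user selected."""
    rows = conn.execute("""
        SELECT l.postalCode FROM UserLocality ul
        JOIN locality l ON l.id = ul.localityId
        WHERE ul.userId = ?
        ORDER BY l.postalCode
    """, (user_id,))
    return [code for (code,) in rows]

print(tutors_for('8811'))  # ['Ann', 'Ben']
print(codes_for(1))        # ['6043', '8811']
```

With a comma-separated varchar, the first query would instead have to search inside a string (e.g. with LIKE or MySQL's FIND_IN_SET), which cannot use an ordinary index on the column.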

You should throw that entire idea out of your head and look into properly normalized data.
A possible solution to this problem would be a table for tutors, which has an id column to uniquely identify one tutor.
Then you would have a table for just Postal Codes (each with a unique id as well) and finally a tutor_availability table that holds one record of (t_id, pc_id) for each postal code a tutor wishes to offer their services in, again with a unique id, plus a unique constraint on (t_id, pc_id) to avoid duplicates in case they select the same location twice.
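A surrogate id on each tutor_availability row does not by itself stop a tutor from selecting the same postal code twice. A composite UNIQUE constraint on (t_id, pc_id) does. Here is a hedged sketch using Python's sqlite3; the tables and rows are illustrative. `INSERT OR IGNORE` is SQLite syntax, and MySQL's closest equivalent is `INSERT IGNORE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE tutor (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE postal_code (id INTEGER PRIMARY KEY, code TEXT NOT NULL UNIQUE);
    CREATE TABLE tutor_availability (
        id    INTEGER PRIMARY KEY,                        -- surrogate id
        t_id  INTEGER NOT NULL REFERENCES tutor(id),
        pc_id INTEGER NOT NULL REFERENCES postal_code(id),
        UNIQUE (t_id, pc_id)                              -- blocks duplicate selections
    );
""")
conn.execute("INSERT INTO tutor VALUES (1, 'Ann')")
conn.execute("INSERT INTO postal_code VALUES (1, '6043')")

# The same location submitted twice: the second insert is skipped.
for _ in range(2):
    conn.execute("INSERT OR IGNORE INTO tutor_availability (t_id, pc_id) VALUES (1, 1)")

count = conn.execute("SELECT COUNT(*) FROM tutor_availability").fetchone()[0]
print(count)  # 1
```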

How to handle unified relationships of different tables (eg. Customers/Companies and Addresses)

I'll start off with what I have already:
I learned at uni how to create databases, and now I'm trying to create my own for personal use and possibly for my customer base. The people I work for are, on the one hand, businesses (Companies) and, on the other hand, private individuals (Customers).
I tried to build my database as shown above. I want to be able to add multiple addresses to both my customers and companies. I also have several employees who work for me.
Now, I'm pretty happy with what I have right now but I have the feeling it can be simpler but with the same capabilities (multiple addresses, ...).
Secondly, both Companies and Customers can make orders. Right now I only have a table for Customers to place an order and I'm clueless how I can do the same for Companies.
Should I make a CustomerOrder and CompanyOrder table to achieve this or is there a better solution?
EDIT
I played around a little and actually started over. I tried to take each part like email, phone, fax and put it in its own table. This way, if I update a phone number somewhere and it's used elsewhere, both will be updated.
Below is what I have so far:
Phone, Phone_1 and Phone_2 are the same table; Access just displays it that way. Any suggestions on how I'm doing? ContactType is used to distinguish CustomerSupport from TechnicalSupport. Type in EntityAddress determines whether the Address is for a Person or a Company. This way it's expandable to more entities.
Now that I'm writing this, would it be a good idea to do the same for Phone, Email and Fax as I do with Address?
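This sketch shows one way to read the EntityAddress design from the edit, using Python's sqlite3. The column names EntityID and AddressID, and the sample rows, are assumptions. Only the Type column and its Person/Company values come from the question. The same EntityID can belong to either a customer or a company, so Type must always be part of the lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Company  (CompanyID  INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Address  (AddressID  INTEGER PRIMARY KEY, Line1 TEXT NOT NULL);
    -- EntityID points at Customer or Company depending on Type, so it
    -- cannot carry an ordinary FOREIGN KEY constraint.
    CREATE TABLE EntityAddress (
        EntityID  INTEGER NOT NULL,
        Type      TEXT    NOT NULL CHECK (Type IN ('Person', 'Company')),
        AddressID INTEGER NOT NULL REFERENCES Address(AddressID),
        PRIMARY KEY (EntityID, Type, AddressID)
    );
""")
conn.execute("INSERT INTO Customer VALUES (1, 'Jane Smith')")
conn.execute("INSERT INTO Company VALUES (1, 'Acme Ltd')")  # same id as the customer
conn.executemany("INSERT INTO Address VALUES (?, ?)",
                 [(1, '1 Home Road'), (2, '5 Office Park'), (3, '9 Depot Way')])
conn.executemany("INSERT INTO EntityAddress VALUES (?, ?, ?)",
                 [(1, 'Person', 1), (1, 'Company', 2), (1, 'Company', 3)])

def addresses_of(entity_id, entity_type):
    """Address lines for one customer ('Person') or company ('Company')."""
    rows = conn.execute("""
        SELECT a.Line1 FROM EntityAddress ea
        JOIN Address a ON a.AddressID = ea.AddressID
        WHERE ea.EntityID = ? AND ea.Type = ?
        ORDER BY a.AddressID
    """, (entity_id, entity_type))
    return [line for (line,) in rows]

print(addresses_of(1, 'Company'))  # ['5 Office Park', '9 Depot Way']
print(addresses_of(1, 'Person'))   # ['1 Home Road']
```

The trade-off is that the database cannot stop an EntityAddress row from referring to a customer or company that does not exist, because EntityID cannot be declared as a foreign key to two tables at once.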

What I would suggest is to create a separate table for orders with no link to either customers or companies. Then create two intermediate tables: one linking orders to companies, and one linking orders to customers.
Fields in the table will look like:
LinkID
CustomerID / CompanyID
OrderID
It avoids having the same table (the order table) twice.
For the phones, you can do the same thing. This way you can have as many phone numbers as you like linked to one company or customer.
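This is a hedged sketch of the suggested layout, using Python's sqlite3. It has one Orders table and two link tables with the LinkID / owner ID / OrderID fields listed above. The table names and rows are hypothetical, and the UNIQUE constraint on OrderID is an addition. A UNION ALL gives a single view of all orders, regardless of who placed them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Companies (CompanyID  INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    -- One Orders table, with no owner column of its own.
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, Total REAL NOT NULL);
    -- UNIQUE(OrderID) is an addition: an order appears at most once per link table.
    CREATE TABLE CustomerOrderLink (
        LinkID     INTEGER PRIMARY KEY,
        CustomerID INTEGER NOT NULL REFERENCES Customers(CustomerID),
        OrderID    INTEGER NOT NULL UNIQUE REFERENCES Orders(OrderID)
    );
    CREATE TABLE CompanyOrderLink (
        LinkID    INTEGER PRIMARY KEY,
        CompanyID INTEGER NOT NULL REFERENCES Companies(CompanyID),
        OrderID   INTEGER NOT NULL UNIQUE REFERENCES Orders(OrderID)
    );
""")
conn.execute("INSERT INTO Customers VALUES (1, 'Jane Smith')")
conn.execute("INSERT INTO Companies VALUES (1, 'Acme Ltd')")
conn.executemany("INSERT INTO Orders VALUES (?, ?)", [(1, 10.0), (2, 250.0), (3, 99.5)])
conn.execute("INSERT INTO CustomerOrderLink (CustomerID, OrderID) VALUES (1, 1)")
conn.executemany("INSERT INTO CompanyOrderLink (CompanyID, OrderID) VALUES (?, ?)",
                 [(1, 2), (1, 3)])

def all_orders():
    """Every order with its owner type and name, from whichever link table holds it."""
    return conn.execute("""
        SELECT o.OrderID, 'customer', c.Name FROM Orders o
        JOIN CustomerOrderLink l ON l.OrderID = o.OrderID
        JOIN Customers c ON c.CustomerID = l.CustomerID
        UNION ALL
        SELECT o.OrderID, 'company', co.Name FROM Orders o
        JOIN CompanyOrderLink l ON l.OrderID = o.OrderID
        JOIN Companies co ON co.CompanyID = l.CompanyID
        ORDER BY 1
    """).fetchall()

for row in all_orders():
    print(row)
```

Nothing in this schema prevents the same order from appearing in both link tables. If that matters, you have to check it in application code or with a trigger.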

People has 3 Addresses and Address belongs to People, what's the best practice?

In a MySQL database with CakePHP 2.4.3.
People always have 3 addresses:
Registered address.
Where they live now.
Where they work.
Both tables work like a timeline for each person (meaning the records increase over time).
For example, if you change your name in real life, you will have to add a new record to the people table.
Something like this for one person:
Year  name          address          address2   address3
2013  Parker boon   13/3 huston,usa  null       332/2 tansania
2014  Parker samel  13/3 huston,usa  23,NY,usa  332/2 tansania
2015  Parker samel  13/3 huston,usa  23,NY,usa  992 osky,russia
In 2013, Parker boon lived at his registered address.
In 2014, he moved to 23,NY,usa.
In 2015, he started working in Russia.
I have 2 questions.
First, I made 3 foreign keys in the People table (address_id, address2_id, address3_id), dedicating 3 Address ids to a single person.
In the People table:
people_id(PK),
name ,
address_id(FK),
address2_id(FK),
address3_id(FK)
In the Address table:
address_id(PK) ,
address_name
Is this better than putting a single foreign key (people_id) in the Address table?
In the People table:
people_id(PK) ,
name
In the Address table:
address_id(PK) ,
address_name,
people_id(FK) ,
address_type
(registered, lived, work)
Second, whether or not my method is bad, I want to learn how to configure the models and use saveAssociated() or save with a transaction in CakePHP.
See the CakePHP documentation for saveAssociated().
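This sketch shows the second layout from the question: one people_id foreign key in Address, plus an address_type. Python's sqlite3 stands in for MySQL and CakePHP. The CHECK list uses the three types from the question. The UNIQUE (people_id, address_type) constraint is an assumption, and it holds only if old addresses are not kept as extra rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE people (
        people_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE address (
        address_id   INTEGER PRIMARY KEY,
        address_name TEXT NOT NULL,
        people_id    INTEGER NOT NULL REFERENCES people(people_id),
        address_type TEXT NOT NULL
                     CHECK (address_type IN ('registered', 'lived', 'work')),
        -- One row per person and type (drop this if rows are kept as history).
        UNIQUE (people_id, address_type)
    );
""")
conn.execute("INSERT INTO people VALUES (1, 'Parker samel')")
conn.executemany(
    "INSERT INTO address (address_name, people_id, address_type) VALUES (?, 1, ?)",
    [('13/3 huston,usa', 'registered'), ('23,NY,usa', 'lived'),
     ('992 osky,russia', 'work')])

def addresses_for(people_id):
    """Return {address_type: address_name} for one person."""
    rows = conn.execute(
        "SELECT address_type, address_name FROM address WHERE people_id = ?",
        (people_id,))
    return dict(rows.fetchall())

print(addresses_for(1))
```

With this layout, a fourth address type is a new allowed value, not a new foreign-key column in People.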

I think the relations should be as follows:
User has one address.
An address belongs to a user.
The address table should have these fields:
year, user_id, name, address1, address2, address3 and is_active
If any address changes, generate a new record by cloning the last record, apply the update, mark the new record active, and deactivate the previous record.
Take an example:
If the user modifies only address2:
Copy address1 and address3 from the last record, and set the status of the new record you are inserting to active.
Deactivate the last record.
Conclusion:
Only one record is active at any time, so you can handle the association with hasOne.
You'll be able to keep a history of address changes.
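This sketch shows the clone-and-deactivate approach. Python's sqlite3 stands in for MySQL, and the field names come from the answer (year, user_id, name, address1 to address3, is_active). The sample rows come from the question's timeline. The partial unique index is SQLite-specific, since MySQL does not support partial indexes. Doing the deactivate and the insert in one transaction keeps exactly one active row per user.

```python
import sqlite3

FIELDS = ("name", "address1", "address2", "address3")

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE address (
        id        INTEGER PRIMARY KEY,
        user_id   INTEGER NOT NULL,
        year      INTEGER NOT NULL,
        name      TEXT,
        address1  TEXT,
        address2  TEXT,
        address3  TEXT,
        is_active INTEGER NOT NULL DEFAULT 1
    );
    -- SQLite partial index: at most one active row per user.
    CREATE UNIQUE INDEX one_active_address ON address(user_id) WHERE is_active = 1;
""")

def change_address(user_id, year, **changes):
    """Clone the active row, apply `changes`, and deactivate the old row."""
    unknown = set(changes) - set(FIELDS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    with conn:  # one transaction: commit on success, roll back on error
        row = conn.execute(
            "SELECT id, name, address1, address2, address3 FROM address "
            "WHERE user_id = ? AND is_active = 1", (user_id,)).fetchone()
        if row is None:
            raise LookupError(f"no active address for user {user_id}")
        old_id, *old_values = row
        values = dict(zip(FIELDS, old_values))
        values.update(changes)
        conn.execute("UPDATE address SET is_active = 0 WHERE id = ?", (old_id,))
        conn.execute(
            "INSERT INTO address (user_id, year, name, address1, address2, address3) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (user_id, year, *(values[f] for f in FIELDS)))

conn.execute(
    "INSERT INTO address (user_id, year, name, address1, address2, address3) "
    "VALUES (1, 2013, 'Parker boon', '13/3 huston,usa', NULL, '332/2 tansania')")
change_address(1, 2014, name='Parker samel', address2='23,NY,usa')
change_address(1, 2015, address3='992 osky,russia')
```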

There are a number of ways to do this, and the best would depend on your specific requirements for your app - we don't know enough now to say for sure.
But from what I can see, it sounds like you basically want to keep track of:
a) A person
b) A person's work address
c) A person's home address
d) changes to either a person's name, or a person's home or work address.
It sounds like the "registered address" is just their most recent previous home address.
I definitely would NOT be creating a new record in the people table when a person changes their name - they're not a new person; they're the same person with a different name. Likewise, with addresses, you don't necessarily want to create a new record when someone's address changes (you might - depends on your requirements).
One really simple way would be to have a people table, and a people_histories table.
The people table would store both the person's name and the person's work and home addresses. The people_histories table would just keep track of changes to any fields in your people table. A 'person' would have many 'person_histories', and each 'person_history' record would indicate the field that was changed, the date it was changed, and what value it previously held.
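Here is a minimal sketch of the people / people_histories idea, using Python's sqlite3. It assumes the column names shown (person_id, field, previous_value, changed_on). Each change writes the old value to the history table before updating the single people row, so a name change does not create a new person. The dates are invented, because the question only gives years.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE people (
        id           INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        home_address TEXT,
        work_address TEXT
    );
    CREATE TABLE people_histories (
        id             INTEGER PRIMARY KEY,
        person_id      INTEGER NOT NULL REFERENCES people(id),
        field          TEXT NOT NULL,   -- which column changed
        previous_value TEXT,            -- what it held before
        changed_on     TEXT NOT NULL    -- ISO date of the change
    );
""")
TRACKED = {"name", "home_address", "work_address"}  # whitelist for column names

def update_person(person_id, field, new_value, changed_on):
    """Log the old value of `field`, then overwrite it on the same row."""
    if field not in TRACKED:
        raise ValueError(f"untracked field: {field}")
    with conn:
        (old,) = conn.execute(
            f"SELECT {field} FROM people WHERE id = ?", (person_id,)).fetchone()
        conn.execute(
            "INSERT INTO people_histories (person_id, field, previous_value, changed_on) "
            "VALUES (?, ?, ?, ?)", (person_id, field, old, changed_on))
        conn.execute(f"UPDATE people SET {field} = ? WHERE id = ?",
                     (new_value, person_id))

conn.execute("INSERT INTO people VALUES (1, 'Parker boon', '13/3 huston,usa', '332/2 tansania')")
update_person(1, "name", "Parker samel", "2014-01-01")
update_person(1, "work_address", "992 osky,russia", "2015-01-01")
```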
Another way would be to have people, people_histories, addresses, and address_histories, and a person has a work_address_id and a home_address_id.
Yet another way would be to say a person has_many addresses, where each address has a role, either 'work' or 'home', and a date indicating when the person moved to that address. A person may end up with, e.g., 10 work addresses, and you know that the current one is the most recently dated one.
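This sketch shows the has_many variant. It assumes an addresses table with a role and a moved_in date; the names are hypothetical, and the dates are invented because the question only gives years. The current address for a role is the most recently dated row. The same query with a cut-off date answers "where did they work back then?".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE addresses (
        id        INTEGER PRIMARY KEY,
        person_id INTEGER NOT NULL REFERENCES people(id),
        role      TEXT NOT NULL CHECK (role IN ('home', 'work')),
        moved_in  TEXT NOT NULL,   -- ISO date, so text order equals date order
        address   TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO people VALUES (1, 'Parker samel')")
conn.executemany(
    "INSERT INTO addresses (person_id, role, moved_in, address) VALUES (1, ?, ?, ?)",
    [('home', '2013-01-01', '13/3 huston,usa'),
     ('home', '2014-01-01', '23,NY,usa'),
     ('work', '2013-01-01', '332/2 tansania'),
     ('work', '2015-01-01', '992 osky,russia')])

def address_on(person_id, role, as_of='9999-12-31'):
    """Most recent address of this role that started on or before as_of."""
    row = conn.execute("""
        SELECT address FROM addresses
        WHERE person_id = ? AND role = ? AND moved_in <= ?
        ORDER BY moved_in DESC
        LIMIT 1
    """, (person_id, role, as_of)).fetchone()
    return row[0] if row else None

print(address_on(1, 'work'))                # 992 osky,russia
print(address_on(1, 'work', '2014-06-30'))  # 332/2 tansania
```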
Really, though, the best solution depends on what you want to do with people's previous names and addresses. If you just want to keep track of them for manual viewing, and they're not really a central part of the system, then a histories table keeps things simple, and works well.
Oh, and another thing: you should stick to CakePHP conventions and name your primary key fields simply 'id'. And if you've got more than one foreign key pointing to the same table, rather than naming them address1_id and address2_id, I'd name them descriptively, like home_address_id and work_address_id.

Best strategy for storing order's addresses

I have a 'strategy' question.
Thing is, we have a table of customers' addresses and a table of customer orders. The structure is something like this (just an example, ignore field types etc.):
Address
id INT
line1 TEXT
line2 TEXT
state TEXT
zip TEXT
countryid INT
To preserve historical validity of the data we are storing those addresses in a text field with orders (previously it was done by reference, but this is wrong because if address changes all old orders change delivery address too, which is wrong). E.g:
Orders
id INT
productid INT
quantity INT
delivery_address TEXT
The delivery address is something akin to CONCAT_WS("\n",line1,line2,state,zip,country_name).
Everything is nice and dandy; however, it seems that customers need access to historical data, need to be able to export it in XML format, and want the lines split up properly again. Because sometimes there is no line2 or state or zip or whatever, how can we store this information in a way that lets us decipher the 'label' of each line?
Storing it as a JSON-encoded array was suggested, but is this the best way? I thought about storing it as XML... or maybe creating those 6-10 extra columns and storing the address data with every order? Perhaps some of you have more experience with this kind of thing and can point me in the right direction.
Thanks in advance!

Personally, I would model the addresses as a single table; every update to an address would generate a new row, which would be marked as the current address.
I guess you could allow deletes if there are no related orders; however, it would be simpler to mark the old record as inactive.
This will allow you to preserve the relationship between orders & addresses,
and to easily query the historic data at a later date.
See the Wikipedia entry for slowly changing dimensions.
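Here is a hedged sketch of the slowly-changing-dimension approach, often called a type 2 change. Every address update inserts a new row and marks it current, and orders keep pointing at the row they were placed with. Python's sqlite3 stands in for MySQL. The address columns follow the question; is_current, the function names and the sample rows are assumptions. Because the fields stay in separate columns, the XML export can label every line.

```python
import sqlite3

FIELDS = ("line1", "line2", "state", "zip")

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE address (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        line1 TEXT, line2 TEXT, state TEXT, zip TEXT,
        is_current  INTEGER NOT NULL DEFAULT 1   -- 1 = current, 0 = history
    );
    CREATE TABLE orders (
        id         INTEGER PRIMARY KEY,
        productid  INTEGER NOT NULL,
        quantity   INTEGER NOT NULL,
        address_id INTEGER NOT NULL REFERENCES address(id)
    );
""")

def change_address(customer_id, **fields):
    """Retire the current row and insert the new version as current."""
    values = {f: fields.get(f) for f in FIELDS}
    values["customer_id"] = customer_id
    with conn:
        conn.execute("UPDATE address SET is_current = 0 "
                     "WHERE customer_id = ? AND is_current = 1", (customer_id,))
        conn.execute("INSERT INTO address (customer_id, line1, line2, state, zip) "
                     "VALUES (:customer_id, :line1, :line2, :state, :zip)", values)

def place_order(customer_id, productid, quantity):
    """New orders always reference the customer's current address row."""
    with conn:
        (address_id,) = conn.execute(
            "SELECT id FROM address WHERE customer_id = ? AND is_current = 1",
            (customer_id,)).fetchone()
        return conn.execute(
            "INSERT INTO orders (productid, quantity, address_id) VALUES (?, ?, ?)",
            (productid, quantity, address_id)).lastrowid

def delivery_fields(order_id):
    """Labelled address fields for an order, as they were when it was placed."""
    row = conn.execute("""
        SELECT a.line1, a.line2, a.state, a.zip FROM orders o
        JOIN address a ON a.id = o.address_id WHERE o.id = ?
    """, (order_id,)).fetchone()
    return dict(zip(FIELDS, row))

change_address(7, line1='1 Old Road', state='NY', zip='10001')
first = place_order(7, productid=42, quantity=1)
change_address(7, line1='2 New Ave', line2='Apt 3', state='NY', zip='10002')
second = place_order(7, productid=42, quantity=2)
print(delivery_fields(first))
print(delivery_fields(second))
```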

The best way IMHO is to add history to the address table. This adds extra elements to its key (say address_id plus {start_of_validity, end_of_validity}). The customer id then becomes a foreign key into the customer table. The orders table references only the address_id field (which is "stable" in time). New orders would reference the "current" row in address.
NB: I don't know JSON.
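This sketch shows the validity-range variant, using Python's sqlite3 with hypothetical column and function names. It makes one assumption beyond the answer: because orders store only the stable address_id, the order date picks the version that was valid when the order was placed. The '9999-12-31' end date is a sentinel meaning "still valid".

```python
import sqlite3

OPEN_END = '9999-12-31'   # sentinel end date meaning "still valid"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE address (
        address_id        INTEGER NOT NULL,   -- stable identity across versions
        start_of_validity TEXT    NOT NULL,   -- ISO date, inclusive
        end_of_validity   TEXT    NOT NULL DEFAULT '9999-12-31',  -- exclusive
        customer_id       INTEGER NOT NULL,
        line1             TEXT,
        zip               TEXT,
        PRIMARY KEY (address_id, start_of_validity)
    );
    -- address_id alone is not unique in address, so no FOREIGN KEY here.
    CREATE TABLE orders (
        id         INTEGER PRIMARY KEY,
        address_id INTEGER NOT NULL,
        order_date TEXT    NOT NULL
    );
""")

def move(address_id, customer_id, line1, zip_code, on_date):
    """Close the open version (if any) and start a new one on on_date."""
    with conn:
        conn.execute("UPDATE address SET end_of_validity = ? "
                     "WHERE address_id = ? AND end_of_validity = ?",
                     (on_date, address_id, OPEN_END))
        conn.execute("INSERT INTO address (address_id, start_of_validity, "
                     "customer_id, line1, zip) VALUES (?, ?, ?, ?, ?)",
                     (address_id, on_date, customer_id, line1, zip_code))

def delivery_address(order_id):
    """Pick the version of the address that was valid on the order date."""
    return conn.execute("""
        SELECT a.line1, a.zip FROM orders o
        JOIN address a
          ON a.address_id = o.address_id
         AND o.order_date >= a.start_of_validity
         AND o.order_date <  a.end_of_validity
        WHERE o.id = ?
    """, (order_id,)).fetchone()

move(1, 7, '1 Old Road', '10001', '2020-01-01')
conn.execute("INSERT INTO orders VALUES (1, 1, '2021-06-01')")
move(1, 7, '2 New Ave', '10002', '2022-03-15')
conn.execute("INSERT INTO orders VALUES (2, 1, '2022-04-01')")
print(delivery_address(1))  # ('1 Old Road', '10001')
print(delivery_address(2))  # ('2 New Ave', '10002')
```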

You should store those as 6-10 extra fields, just like you do in the current address. You see, that way you have every piece of information at hand, without having to parse anything.
Any other approach (concatenation, JSON, XML) will make you have to do parsing when you need to access the info.
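This sketch shows the "extra columns" approach, combined with the XML export the question needs. The delivery_* column names are hypothetical; they mirror the Address fields plus country_name from the CONCAT_WS example. Python's sqlite3 and xml.etree.ElementTree stand in for the real stack. Because every value keeps its own column, a missing line2 simply produces no element, and no parsing is required.

```python
import sqlite3
import xml.etree.ElementTree as ET

ADDRESS_FIELDS = ("line1", "line2", "state", "zip", "country_name")

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        id        INTEGER PRIMARY KEY,
        productid INTEGER NOT NULL,
        quantity  INTEGER NOT NULL,
        -- delivery address copied field by field at order time
        delivery_line1        TEXT,
        delivery_line2        TEXT,
        delivery_state        TEXT,
        delivery_zip          TEXT,
        delivery_country_name TEXT
    );
""")
conn.execute("INSERT INTO orders VALUES (1, 42, 3, '1 Main St', NULL, 'NY', '10001', 'USA')")

def order_to_xml(order_id):
    """Export one order's delivery address with a labelled element per field."""
    cols = ", ".join(f"delivery_{f}" for f in ADDRESS_FIELDS)
    row = conn.execute(f"SELECT id, {cols} FROM orders WHERE id = ?",
                       (order_id,)).fetchone()
    order = ET.Element("order", id=str(row[0]))
    address = ET.SubElement(order, "delivery_address")
    for field, value in zip(ADDRESS_FIELDS, row[1:]):
        if value is not None:          # missing lines are simply omitted
            ET.SubElement(address, field).text = value
    return ET.tostring(order, encoding="unicode")

print(order_to_xml(1))
```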

when you say "previously it was done by reference, but this is wrong because if address changes all old orders change delivery address too, which is wrong", it was not that wrong ...
Funny, isn't it?
So, as proposed by others, addresses should (must?) be stored in an independent table. You'll then have different address types (invoicing, delivery), an address status (active, inactive) and a de facto address history log...

In order to be able to use the address data in the future, you will definitely want to retain as much metadata as possible (meaning fields such as Address, City, State, and ZIP). Collapsing the data into a single line looks simpler and may save a small amount of space, but in the end it is not the best method. In fact, breaking it apart again is very difficult, much like separating first and last names out of a generic, one-size-fits-all "name" column. Storing the data as complete entries in 6-10 new fields (as mentioned) is the best way to go.
Even better would be standardizing the addresses (at least the US addresses) when they are first entered. That would ensure that the address is real and deliverable and eliminate shipping issues in the future. My advice: always retain as much of the data as possible, because storage is cheap and data is valuable.
In the interest of full disclosure, I am the founder of SmartyStreets. We provide street address verification.

Database Normalization with user input

I am developing a MySQL database that will contain the country, city and occupation of each user.
While I can use a "country" table and then insert the id of the country into the user table, I still have to find the best method for the other two.
The problem is that the city and occupation of each user are taken from an input field, meaning that users can type "NYC" or "New York" or "New York City" and millions of other combinations for each town, for example.
Is it a good idea to disregard this issue, create a dedicated "town" table containing all the towns entered by users, and then put the id of the town entry into the user table? Or would it be more appropriate to use a VARCHAR column "town" in the user table and not normalize the database for this relation?
I want to display the data from the three tables on user profile pages.
I am concerned about normalization because I don't want too much redundant data in my database: it consumes a lot of space, and (as far as I know) queries will be slower if I use a varchar index instead of an integer index, for example.
Thanks

We had this problem. Our solution was to collect the various synonyms and misspelled versions that people use and explicitly map them to a known canonical city name. This allowed us to correctly guess the name from user input in 99% of cases.
For the remaining 1%, we created a new city entry and marked it as a non-canonical. Periodically we looked through non-canonical entries. For recognizable known cities, we remapped the non-canonical entry to the canonical (updating FKs of linked records and adding a synonym). For a genuinely new city name we didn't know about we kept the created entry as canonical.
So we had something like this:
create table city (
  id integer primary key,
  name varchar not null, -- the canonical name
  ...
);
create table city_synonym (
  name varchar primary key, -- we want a unique index
  city_id integer references city(id)
);
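This sketch follows the synonym workflow described above, using Python's sqlite3. The city and city_synonym tables follow the answer. The is_canonical flag, the users table, the normalisation rule (lower-case, collapse spaces) and the function names are assumptions. Unknown input becomes a non-canonical city. A later review merges it into the right city and keeps the spelling as a synonym.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE city (
        id           INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,              -- the canonical name
        is_canonical INTEGER NOT NULL DEFAULT 1
    );
    CREATE TABLE city_synonym (
        name    TEXT PRIMARY KEY,                -- normalised spelling, unique
        city_id INTEGER NOT NULL REFERENCES city(id)
    );
    CREATE TABLE users (
        id      INTEGER PRIMARY KEY,
        city_id INTEGER REFERENCES city(id)
    );
""")
conn.execute("INSERT INTO city (id, name) VALUES (1, 'New York')")
conn.executemany("INSERT INTO city_synonym VALUES (?, 1)",
                 [("new york",), ("nyc",), ("new york city",)])

def normalise(text):
    """Lower-case and collapse whitespace before looking up a synonym."""
    return " ".join(text.lower().split())

def resolve_city(user_input):
    """Map free text to a city id; unknown names become non-canonical cities."""
    key = normalise(user_input)
    with conn:
        row = conn.execute("SELECT city_id FROM city_synonym WHERE name = ?",
                           (key,)).fetchone()
        if row:
            return row[0]
        city_id = conn.execute(
            "INSERT INTO city (name, is_canonical) VALUES (?, 0)",
            (user_input.strip(),)).lastrowid
        conn.execute("INSERT INTO city_synonym VALUES (?, ?)", (key, city_id))
        return city_id

def merge_city(duplicate_id, canonical_id):
    """Periodic review: fold a non-canonical entry into a known city."""
    with conn:
        conn.execute("UPDATE users SET city_id = ? WHERE city_id = ?",
                     (canonical_id, duplicate_id))
        conn.execute("UPDATE city_synonym SET city_id = ? WHERE city_id = ?",
                     (canonical_id, duplicate_id))
        conn.execute("DELETE FROM city WHERE id = ?", (duplicate_id,))

typo_id = resolve_city("N.Y.C.")         # unknown, so a non-canonical city
conn.execute("INSERT INTO users VALUES (1, ?)", (typo_id,))
merge_city(typo_id, 1)                   # a reviewer decides it means New York
```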

Usually, data normalization helps you work with data and keep it simple. If a normalized schema does not fit your needs, you can use denormalized data as well. So it depends on the queries you want to run.
There is no good way to group cities without creating a separate table that keeps all the names for each city under a single id. So it would be good to have 3 tables: user(user_id, city_id), city(city_id, correct name), city_alias(alias_id, city_id, name).

It would be better to store the data in a normalized design, containing the actual, government recognized city names.
@Varela's suggestion of an 'alias' for the city would probably work well in this situation. But you have to return a message along the lines of "You typed in 'Now Yerk'. Did you perhaps mean 'New York'?". Actually, you want to get these kinds of corrections regardless...
Of course, what you should probably actually store isn't the city, but the postal/zip code. Table design is along these lines:
State:
Id  State
==  ========
AL  Alabama
NY  New York

City:
Id  State_Id  City
==  ========  ========
1   NY        New York
2   NY        Buffalo

Zip_Code:
Id  Code        City_Id
==  ==========  =======
1   00001-0001  1
And then store a reference to Zip_Code.Id whenever you have an address. You want to know exactly which zip code a user has (claimed) to be a part of. Reasons include:
Taxes for retail (regardless of how Amazon plays out).
Addresses for delivery (There is a Bellevue in both Washington and New York, for example. Zip codes are different).
Social mapping. If you store it as 'user input' cities, you will not be able to (easily) analyze the data to find out things like which users live near each other, much less in the same city.
There are a number of other things that can be done about address verification, including geo-location, but this is a basic design that should help you in most of your needs (and prevent most of the possible 'invalid' anomalies).
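This sketch implements the State / City / Zip_Code design, with users referencing a zip code, using Python's sqlite3. The users table, the zip codes and the sample rows are hypothetical. The self-join answers the "social mapping" question of who lives in the same city. The two Bellevues stay distinct because they are different City rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE state (id TEXT PRIMARY KEY, state TEXT NOT NULL);
    CREATE TABLE city (
        id       INTEGER PRIMARY KEY,
        state_id TEXT NOT NULL REFERENCES state(id),
        city     TEXT NOT NULL
    );
    CREATE TABLE zip_code (
        id      INTEGER PRIMARY KEY,
        code    TEXT NOT NULL UNIQUE,
        city_id INTEGER NOT NULL REFERENCES city(id)
    );
    CREATE TABLE users (
        id          INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        zip_code_id INTEGER NOT NULL REFERENCES zip_code(id)
    );
""")
conn.executemany("INSERT INTO state VALUES (?, ?)",
                 [('WA', 'Washington'), ('NY', 'New York')])
conn.executemany("INSERT INTO city VALUES (?, ?, ?)",
                 [(1, 'WA', 'Bellevue'), (2, 'NY', 'Bellevue')])
# Illustrative zip codes only.
conn.executemany("INSERT INTO zip_code VALUES (?, ?, ?)",
                 [(1, '11111', 1), (2, '22222', 2), (3, '11112', 1)])
conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [(1, 'Ann', 1), (2, 'Ben', 3), (3, 'Cat', 2)])

def same_city(user_id):
    """Names of other users whose zip code lies in the same City row."""
    rows = conn.execute("""
        SELECT u2.name
        FROM users me
        JOIN zip_code mz ON mz.id = me.zip_code_id
        JOIN zip_code oz ON oz.city_id = mz.city_id
        JOIN users u2    ON u2.zip_code_id = oz.id
        WHERE me.id = ? AND u2.id <> me.id
        ORDER BY u2.name
    """, (user_id,))
    return [name for (name,) in rows]

print(same_city(1))  # ['Ben']
print(same_city(3))  # []
```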
