Storing geopolitical data in MySQL

My application has users and properties, both of which can have addresses. This application is also going to be heavily analytical, and we will need to be able to grab all addresses that belong to a specific city, zip, county, or state.
Storing Countries, States, and Counties is easy, because a state belongs to exactly 1 country, and a county belongs to exactly 1 state.
However, when storing zip codes and cities, the problem becomes a bit more complex. A state can have multiple zip codes, and a zip code can belong to multiple states. Zip codes and cities can also belong to multiple states/counties. Heck, some cities and zips, like Washington, DC, might not belong to any county at all.
Is there an established database model I can use to ensure that I account for all edge cases, while at the same time allowing querying by each type?

However, when storing zip codes and cities, the problem becomes a bit
more complex.
First, you need to create tables for Countries, Cities, States, and Zip_codes.
There is clearly an M:M relationship between the States and Zip_codes entities, so you will need a new table, let us call it States_Zip_codes, which contains two foreign keys: State_id and Zip_id.
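The junction table described above might look roughly like the sketch below. It runs against SQLite through Python's `sqlite3` module only so that it is self-contained; the DDL would need small adjustments for MySQL. The sample rows ("State A", zip "00001", and so on) are made up for illustration, and the helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE states (
    state_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE
);
CREATE TABLE zip_codes (
    zip_id INTEGER PRIMARY KEY,
    zip    TEXT NOT NULL UNIQUE
);
-- Junction table: one row per (state, zip code) pairing.
CREATE TABLE states_zip_codes (
    state_id INTEGER NOT NULL REFERENCES states(state_id),
    zip_id   INTEGER NOT NULL REFERENCES zip_codes(zip_id),
    PRIMARY KEY (state_id, zip_id)
);
""")

# Made-up rows: zip 00001 straddles both states, 00002 lies in State A only.
conn.executemany("INSERT INTO states VALUES (?, ?)", [(1, "State A"), (2, "State B")])
conn.executemany("INSERT INTO zip_codes VALUES (?, ?)", [(1, "00001"), (2, "00002")])
conn.executemany("INSERT INTO states_zip_codes VALUES (?, ?)", [(1, 1), (2, 1), (1, 2)])


def states_for_zip(zip_code):
    """Return the names of all states that contain the given zip code."""
    rows = conn.execute(
        """SELECT s.name FROM states s
           JOIN states_zip_codes sz ON sz.state_id = s.state_id
           JOIN zip_codes z ON z.zip_id = sz.zip_id
           WHERE z.zip = ? ORDER BY s.name""",
        (zip_code,),
    )
    return [r[0] for r in rows]


def zips_for_state(state_name):
    """Return all zip codes that fall (at least partly) in the given state."""
    rows = conn.execute(
        """SELECT z.zip FROM zip_codes z
           JOIN states_zip_codes sz ON sz.zip_id = z.zip_id
           JOIN states s ON s.state_id = sz.state_id
           WHERE s.name = ? ORDER BY z.zip""",
        (state_name,),
    )
    return [r[0] for r in rows]
```

Both directions of the question ("which states does this zip touch?" and "which zips does this state have?") go through the same junction table, which is the point of modelling the relationship as M:M.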
A zip codes and cities can also belong to multiple states/counties.
In this case the relationship is one-to-many, so Zip_code_id (the primary key of the Zip_codes table) is an FK in the States table, and so on.
I don't understand the nature of this relationship: is it many-to-many or one-to-many?
some cities and zips might not even belong to any counties like
Washington, DC
So in that case the city_name field (property) would be null.
Hope that helps.

Related

Relationships between Master and Transaction tables

I defined Master tables (data definition tables, static in nature) to generate content in my web page; and Transaction tables to store data entered by users (these tables are dynamic in nature). Consider following example:
Set of Master tables consisting of State having 1:M relationship with City, City having 1:M relationship with Locality. A Transaction table User to store personal details entered by a user. The User table has address attributes like Address, State, City and Locality. These can be defined as 1:M relationships from corresponding Master Tables (a particular record in State, City, Locality tables can be a part of multiple records in User table).
Is the design correct? I think it's sufficient to define 1:M relationship between Locality and User tables since the other two attributes (City and State) can be obtained from relationships between the Master tables. Would it be better to change the ER design to the following?
Are there alternatives to my requirement?
What queries do you have? Do you ever need to search by state or city? Even if you do search by those, it may not impact what I am about to say...
Since locality, city, and state are 'nested' and it is not likely for the names to change, I suggest that both of your options are "over-normalized". One table with all three items in it is the way I would go.
As I see it, there are two main reasons for normalizing:
Locating some string that is likely to change. By putting it in a separate table and pointing to that table, you can change it on only one place. This is not needed in your example.
Saving space (hence providing speed, etc.). This does apply in your example, but only at the locality level, not at address. You might argue that city and state can be deduplicated; I would counter with "The added complexity (extra tables) does not warrant the minimal benefit."
A side note: If locality is zipcode, then your option 1 is in trouble in at least one place I know of: Los Altos and Los Altos Hills (two different cities in California) both have sections of zipcodes 94022 and 94024.
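The single-table suggestion above could be sketched as follows, assuming the locality is a zipcode. This uses SQLite via Python's `sqlite3` purely to keep the example runnable; the table names `localities` and `users`, the helper functions, and the two sample users are hypothetical. The 94022 rows follow the Los Altos side note.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
-- One table holds locality, city and state together (no separate City/State tables).
CREATE TABLE localities (
    locality_id INTEGER PRIMARY KEY,
    locality    TEXT NOT NULL,
    city        TEXT NOT NULL,
    state       TEXT NOT NULL,
    UNIQUE (locality, city, state)
);
CREATE TABLE users (
    user_id     INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    address     TEXT NOT NULL,
    locality_id INTEGER NOT NULL REFERENCES localities(locality_id)
);
""")

# The same zipcode appears under two cities, so (locality, city, state) is the unit.
conn.executemany("INSERT INTO localities VALUES (?, ?, ?, ?)", [
    (1, "94022", "Los Altos", "California"),
    (2, "94022", "Los Altos Hills", "California"),
])
# Made-up users for illustration.
conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?)", [
    (1, "Alice", "1 Example St", 1),
    (2, "Bob", "2 Example Rd", 2),
])


def users_in_city(city):
    """Names of users whose locality row names the given city."""
    rows = conn.execute(
        """SELECT u.name FROM users u
           JOIN localities l ON l.locality_id = u.locality_id
           WHERE l.city = ? ORDER BY u.name""",
        (city,),
    )
    return [r[0] for r in rows]


def users_in_locality(locality):
    """Names of users in the given locality (zipcode), across all cities."""
    rows = conn.execute(
        """SELECT u.name FROM users u
           JOIN localities l ON l.locality_id = u.locality_id
           WHERE l.locality = ? ORDER BY u.name""",
        (locality,),
    )
    return [r[0] for r in rows]
```

Searching by city or state still works with a single join, which is the trade-off being argued for: one lookup table instead of three.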

Would a new table really be needed?

I'm making a SQL database for a small company. The other tables don't relate to the question, so I'll list only the two that do.
There is a table:
NextofKin:
fname
lname
street
no
houseno
city
AND
Patient:
ID[pk]
fname
lname
houseno
city
Would I need a separate table for street, house, and city?
Also, any idea what I could use as a primary key for NextOfKin?
Your questions are starting to get into database normalization.
What you should be doing is never duplicating data between tables unless that data relates the tables, and that data should be indexed. Something like this comes to mind (there are different ways you might construct it based on business logic):
PersonalData: id, fname, lname, address1, address2, city, state, zip
Patient: id PK, personal_data_id FK, next_of_kin_id FK
Granted, most of the tables already exist, so this may be impossible. But to answer your question directly: since the database is not normalized already, there's no good place to put further address records (you don't want them under Patient, right?), and so you're stuck duplicating the data. Even so, there has to be some relationship between Patient and NextOfKin, so either Patient holds a reference to NextOfKin, or NextOfKin holds a reference to Patient. Either way, you might consider using a foreign key between them to enforce, and explicitly state, this relationship.
Yes, use a pk for next of kin.
Use a joining table between patient and next of kin. Multiple patients could list the same person as next of kin, and while your app may not today require someone to designate multiple people as next of kin, they may change their mind in the future and your application will support it.
Myself, I always use a separate address table. Since usually more than one person lives in a house, and a person can have more than one home, you would again use a joining table.
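A minimal sketch of the joining table between patient and next of kin, assuming surrogate integer keys on both sides. It uses SQLite via Python's `sqlite3` so it can run as-is; the table name `patient_next_of_kin`, the helper functions, and the sample people are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE patient (
    id    INTEGER PRIMARY KEY,
    fname TEXT NOT NULL,
    lname TEXT NOT NULL
);
CREATE TABLE next_of_kin (
    id    INTEGER PRIMARY KEY,   -- surrogate key answers the "what PK?" question
    fname TEXT NOT NULL,
    lname TEXT NOT NULL
);
-- Joining table: many patients per kin, many kin per patient.
CREATE TABLE patient_next_of_kin (
    patient_id     INTEGER NOT NULL REFERENCES patient(id),
    next_of_kin_id INTEGER NOT NULL REFERENCES next_of_kin(id),
    PRIMARY KEY (patient_id, next_of_kin_id)
);
""")

# Made-up people: two patients share Cara as next of kin; Ann also lists Dan.
conn.executemany("INSERT INTO patient VALUES (?, ?, ?)",
                 [(1, "Ann", "Lee"), (2, "Ben", "Lee")])
conn.executemany("INSERT INTO next_of_kin VALUES (?, ?, ?)",
                 [(1, "Cara", "Lee"), (2, "Dan", "Lee")])
conn.executemany("INSERT INTO patient_next_of_kin VALUES (?, ?)",
                 [(1, 1), (2, 1), (1, 2)])


def kin_for_patient(patient_id):
    """First names of everyone listed as next of kin for one patient."""
    rows = conn.execute(
        """SELECT k.fname FROM next_of_kin k
           JOIN patient_next_of_kin pk ON pk.next_of_kin_id = k.id
           WHERE pk.patient_id = ? ORDER BY k.fname""",
        (patient_id,),
    )
    return [r[0] for r in rows]


def patients_for_kin(kin_id):
    """First names of all patients who list the given person as next of kin."""
    rows = conn.execute(
        """SELECT p.fname FROM patient p
           JOIN patient_next_of_kin pk ON pk.patient_id = p.id
           WHERE pk.next_of_kin_id = ? ORDER BY p.fname""",
        (kin_id,),
    )
    return [r[0] for r in rows]
```

The same pattern would apply to a separate address table with its own joining table, as the answer suggests.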

Do all the tables in SQL have to be connected?

I have this schema in CaseStudio where I keep the address of something in multiple places: the address of a client, the address of an event, and the address of a user. To get it into 3rd normal form, I have a table called cities which has only "city" as PK and its ZIP code. But I do not know whether I have to connect it to all the other tables that contain the name of the city, or whether it can exist without any relation to anything.
It is unlikely that this table would be unconnected to anything else. You would want the address to relate in some way to the city. Think about how you would query the address: you would want to return the person's name (from your user or people table), the street address from your address table, and the city from your city table. But you would have to know how each of these pieces of information is related to the others.
Personally, I find that having a separate city and zipcode table is really overkill when it comes to addresses, but if this is an academic exercise, they might want you to break it out to get to the correct normal form. Normally in this case you would have a cityid column in the city table, and the address table would contain the cityid field, with a foreign key set up to the city table to maintain data integrity.
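The cityid arrangement described above might be sketched like this. SQLite via Python's `sqlite3` is used only so the example runs on its own; the column names other than `cityid` and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE city (
    cityid INTEGER PRIMARY KEY,
    city   TEXT NOT NULL,
    zip    TEXT NOT NULL
);
CREATE TABLE address (
    addressid INTEGER PRIMARY KEY,
    street    TEXT NOT NULL,
    cityid    INTEGER NOT NULL REFERENCES city(cityid)  -- the connection to city
);
""")

# Made-up rows for illustration.
conn.execute("INSERT INTO city VALUES (?, ?, ?)", (1, "Exampleville", "00001"))
conn.execute("INSERT INTO address VALUES (?, ?, ?)", (1, "1 Example St", 1))

# The foreign key keeps an address from pointing at a city that does not exist.
try:
    conn.execute("INSERT INTO address VALUES (?, ?, ?)", (2, "9 Nowhere Ln", 999))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Querying an address pulls the city name through the relationship.
rows = conn.execute(
    """SELECT a.street, c.city, c.zip
       FROM address a JOIN city c ON c.cityid = a.cityid
       ORDER BY a.addressid"""
).fetchall()
```

The client, event, and user tables would each point at `address` (or carry their own `cityid`) in the same way, which is what connects `city` to the rest of the schema.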

MySQL database basic design

I have 3 entities:
buildings
activities
addresses
And I don't know how to foreign key the relationships between tables.
Buildings are located at addresses.
Activities are performed at addresses (one address at a time).
But I just want one address table.
Suppose the next attributes:
Buildings(id,phone,email,image,comments) <- should I put address_id ?
Activities(id,description) <- should I put address_id?
Addresses(id,street,city,state,postcode) <- or should I put center_id and activity_id here?
Thank you in advance!
You should use address_id in both the buildings and activities tables.
An address is unique, while many buildings and many activities can be located at the same address!
Your question implies that multiple buildings can be located at the same address - is this what you want? If so, just normalize it accordingly:
The Address is your 'root entity':
ADDRESS(address_id,street,city,state,postcode)
A Building can be located at exactly one Address, so include a reference to Address, a foreign key:
BUILDING(building_id,phone,email,image,comments,address_id)
An activity is performed at exactly one address, references to Address by foreign key:
ACTIVITY(activity_id,description,address_id)
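As a rough, runnable version of the three relations above, the sketch below creates them in SQLite through Python's `sqlite3` (the MySQL DDL would differ slightly) and looks up the activities performed at a building's address. The sample rows and the helper name are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE address (
    address_id INTEGER PRIMARY KEY,
    street     TEXT NOT NULL,
    city       TEXT NOT NULL,
    state      TEXT NOT NULL,
    postcode   TEXT NOT NULL
);
CREATE TABLE building (
    building_id INTEGER PRIMARY KEY,
    phone       TEXT,
    email       TEXT,
    image       TEXT,
    comments    TEXT,
    address_id  INTEGER NOT NULL REFERENCES address(address_id)
);
CREATE TABLE activity (
    activity_id INTEGER PRIMARY KEY,
    description TEXT NOT NULL,
    address_id  INTEGER NOT NULL REFERENCES address(address_id)
);
""")

# Made-up rows: one address shared by a building and two activities.
conn.execute("INSERT INTO address VALUES (1, '1 Example St', 'Exampleville', 'EX', '00001')")
conn.execute("INSERT INTO building (building_id, phone, address_id) VALUES (1, '555-0100', 1)")
conn.executemany("INSERT INTO activity VALUES (?, ?, ?)",
                 [(1, "Yoga class", 1), (2, "Chess club", 1)])


def activities_at_building(building_id):
    """Descriptions of activities performed at the building's address."""
    rows = conn.execute(
        """SELECT a.description
           FROM building b
           JOIN activity a ON a.address_id = b.address_id
           WHERE b.building_id = ? ORDER BY a.description""",
        (building_id,),
    )
    return [r[0] for r in rows]
```

Because both child tables reference `address`, the single address table stays free of any building or activity columns.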
Maybe you should think about whether:
a street number is missing in the address?
should having more than one building at a given address be possible?
more than one address for a given building is possible (yes, I've seen this)?
a separate ADDRESS table is really necessary (see above questions)?
Alex, you should have the IDs in both tables, as you're saying in your question. There is no need to have them in separate tables as actually the address of a building will be where an activity will be performed, right?
If you are worried about two buildings having the same location, then add a unique index on the address_id column of the buildings table.
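The unique index mentioned above could be tried out as in this sketch, which uses SQLite via Python's `sqlite3` for convenience. The index name `building_address_uq` and the sample rows are hypothetical, and the tables are trimmed to the columns that matter here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE address (
    address_id INTEGER PRIMARY KEY,
    street     TEXT NOT NULL
);
CREATE TABLE building (
    building_id INTEGER PRIMARY KEY,
    phone       TEXT,
    address_id  INTEGER NOT NULL REFERENCES address(address_id)
);
-- At most one building per address.
CREATE UNIQUE INDEX building_address_uq ON building(address_id);
""")

conn.execute("INSERT INTO address VALUES (1, '1 Example St')")
conn.execute("INSERT INTO building VALUES (1, '555-0100', 1)")

# A second building at the same address violates the unique index.
try:
    conn.execute("INSERT INTO building VALUES (2, '555-0101', 1)")
    second_building_rejected = False
except sqlite3.IntegrityError:
    second_building_rejected = True

building_count = conn.execute("SELECT COUNT(*) FROM building").fetchone()[0]
```

Activities are left unconstrained in this arrangement, so several of them can still share one address.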
Moving a bit forward. Can you have an address without a building? If that is the case, then you could even add the address data (columns) to the buildings table. Because it would be a one-to-one relationship and no other entity would need to use the address table but the buildings one. That way you would get rid of the addresses table

Database Design: Composite key vs one column primary key

A web application I am working on has encountered an unexpected 'bug' - The database of the app has two tables (among many others) called 'States' and 'Cities'.
'States' table fields:
-------------------------------------------
idStates | State | Lat | Long
-------------------------------------------
'idStates' is an auto-incrementing primary key.
'Cities' table fields:
----------------------------------------------------------
idAreaCode | idStates | City | Lat | Long
----------------------------------------------------------
'idAreaCode' is a primary key consisting of country code + area code (e.g. 91422 where 91 is the country code for India and 422 is the area code of a city in India). 'idStates' is a foreign key derived from 'States' table to associate each city in the 'Cities' table with its corresponding State.
We figured that the country code + area code combination would be unique for each city, and thus could safely be used as a primary key. Everything was working. But a location in India found an unexpected 'flaw' in the db design - India, like the US, is a federal democracy and is geographically divided into many states or union territories. Data for both states and union territories is stored in the 'States' table. There is, however, one location - Chandigarh - which belongs to TWO states (Haryana and Punjab) and is also a union territory by itself.
Obviously, the current db design doesn't allow us to store more than one record of the city 'Chandigarh'.
One of the solutions suggested is to create a primary key combining the columns 'idAreaCode' and 'idStates'.
I'd like to know if this is the best solution possible?
(FYI: we are using MySQL with the InnoDB engine).
More information:
The database stores meteorological information for each city. Thus, the state and city are the starting point of each query.
Fresh data for each city is inserted every day using a CSV file. The CSV file includes an idStates (for state) and idAreaCode (for city) column, which are used to identify each record.
Database normalization is important to us.
Note: The reason for not using an auto incrementing primary key for the city table is that the database is updated everyday / hourly using a CSV file (which is generated by another app). And each record in the CSV file is identified by the idStates and idAreaCode column. Hence it is preferred that the primary key used in the city table is the same for every city, even if the table is deleted and refreshed again. Zip codes (or pin codes) and area codes (or STD codes) meet the criteria of being unique, static (don't change often) and a ready list of these are easily available. (We decided on area codes for now because India is in the process of updating its pin codes to a new format).
The solution we decided on was to handle this at the application level instead of making changes to the database design. In the database we will only be storing one record of 'Chandigarh'. In the application we've created a flag for any search for 'Chandigarh, Punjab' or 'Chandigarh, Haryana' to redirect search to this record. Yeah, it's not ideal, but an acceptable compromise since this is the ONLY exception we've come across so far.
It sounds like you are gathering data for a telephone directory. Are you? Why are states important to you? The answer to this question will probably determine which database design will work best for you.
You may think that it's obvious what a city is. It's not. It depends on what you are going to do with the data. In the US, there is this unit called MSA (Metropolitan Statistical Area). The Kansas City MSA spans both Kansas City, Kansas and Kansas City, Missouri. Whether the MSA unit makes sense or not depends on the intended use of the data.
If you used area codes in US to determine cities, you'd end up with a very different grouping than MSAs. Again, it depends on what you are going to do with the data.
In general whenever hierarchical patterns of political subdivisions break down, the most general solution is to consider the relationship many-to-many. You solve this problem the same way you solve other many-to-many problems. By creating a new table, with two foreign keys. In this case the foreign keys are IdAreacode and IdStates.
Now you can have one area code in many states and one state spanning many area codes. It seems a shame to accept this extra overhead to cover just one exception. Do you know whether the exception you have uncovered is just the tip of the iceberg, and whether there are many such exceptions?
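A sketch of that many-to-many arrangement for the Chandigarh case, run against SQLite through Python's `sqlite3` rather than MySQL/InnoDB. The junction table name `StateCity` is hypothetical, the Lat and Long columns are left out for brevity, and the key value 91172 is only an illustrative country code + area code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE States (
    idStates INTEGER PRIMARY KEY,
    State    TEXT NOT NULL
);
CREATE TABLE Cities (
    idAreaCode INTEGER PRIMARY KEY,   -- country code + area code, as in the question
    City       TEXT NOT NULL
);
-- Junction table with the two foreign keys, IdAreacode and IdStates.
CREATE TABLE StateCity (
    idStates   INTEGER NOT NULL REFERENCES States(idStates),
    idAreaCode INTEGER NOT NULL REFERENCES Cities(idAreaCode),
    PRIMARY KEY (idStates, idAreaCode)
);
""")

conn.executemany("INSERT INTO States VALUES (?, ?)",
                 [(1, "Haryana"), (2, "Punjab"), (3, "Chandigarh")])
conn.execute("INSERT INTO Cities VALUES (?, ?)", (91172, "Chandigarh"))  # illustrative key
conn.executemany("INSERT INTO StateCity VALUES (?, ?)",
                 [(1, 91172), (2, 91172), (3, 91172)])

# The city is stored once; its state memberships live in the junction table.
states_of_chandigarh = [
    row[0]
    for row in conn.execute(
        """SELECT s.State FROM States s
           JOIN StateCity sc ON sc.idStates = s.idStates
           JOIN Cities c ON c.idAreaCode = sc.idAreaCode
           WHERE c.City = ? ORDER BY s.State""",
        ("Chandigarh",),
    )
]
city_row_count = conn.execute("SELECT COUNT(*) FROM Cities").fetchone()[0]
```

The overhead the answer mentions is visible here: every state-to-city lookup now needs one extra join, even for cities that sit in exactly one state.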
Having a composite key could be problematic when you want to reference that table, since the referring table would have to have all columns the primary key has.
If that's the case, you might want to have a sequence primary key, and have the idAreaCode and idStates defined in a UNIQUE NOT NULL group.
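One way to picture the sequence key plus UNIQUE NOT NULL group is the sketch below. It uses SQLite via Python's `sqlite3`, where `AUTOINCREMENT` plays the role of MySQL's `AUTO_INCREMENT`. The column `idCities`, the `Readings` table, and the key value 91172 are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE States (
    idStates INTEGER PRIMARY KEY,
    State    TEXT NOT NULL
);
CREATE TABLE Cities (
    idCities   INTEGER PRIMARY KEY AUTOINCREMENT,  -- hypothetical sequence key
    idAreaCode INTEGER NOT NULL,
    idStates   INTEGER NOT NULL REFERENCES States(idStates),
    City       TEXT NOT NULL,
    UNIQUE (idAreaCode, idStates)                  -- the natural key stays unique
);
-- A referring table only has to carry the single-column key.
CREATE TABLE Readings (
    idReadings INTEGER PRIMARY KEY,
    idCities   INTEGER NOT NULL REFERENCES Cities(idCities),
    reading    TEXT NOT NULL
);
""")

conn.executemany("INSERT INTO States VALUES (?, ?)", [(1, "Haryana"), (2, "Punjab")])

# The same area code may now appear once per state...
conn.executemany(
    "INSERT INTO Cities (idAreaCode, idStates, City) VALUES (?, ?, ?)",
    [(91172, 1, "Chandigarh"), (91172, 2, "Chandigarh")],
)

# ...but repeating an (idAreaCode, idStates) pair is still refused.
try:
    conn.execute(
        "INSERT INTO Cities (idAreaCode, idStates, City) VALUES (?, ?, ?)",
        (91172, 1, "Chandigarh"),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

city_rows = conn.execute("SELECT COUNT(*) FROM Cities").fetchone()[0]
```

A daily CSV import could still locate rows by (idStates, idAreaCode) through the unique constraint, although the generated idCities values would not survive a delete-and-reload unless they are reloaded too.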
I think it is best to add another table, countries. Your problem is an example of why database normalization is important. You can't just mix and match different keys in one column.
So I suggest you create these tables:
countries:
+------------+--------------+
| country_id | country_name |
+------------+--------------+
states:
+------------+----------+------------+
| country_id | state_id | state_name |
+------------+----------+------------+
cities
+------------+----------+---------+-----------+
| country_id | state_id | city_id | city_name |
+------------+----------+---------+-----------+
data
+------------+----------+---------+---------+----------+
| country_id | state_id | city_id | data_id | your_CSV |
+------------+----------+---------+---------+----------+
The bold fields are primary keys. Enter a standard country_id like 1 for US, 91 for India, and so on. city_id should also use its standard id.
You can then find what belongs to what pretty fast, with minimal overhead. All data can then be entered directly into the data table, which serves as a single entry point and stores all the data in one spot. I don't know about MySQL, but if your database supports partitioning, you can partition the data table by country_id or country_id+state_id across a couple of server arrays, which will also speed up your database considerably. The first, second, and third tables won't add much server load at all and serve only as reference. You will mainly be working with the fourth table, data. You can add as much data as you wish, without any duplicates ever again.
If you only have one data record per city, you can omit the data table and move CSV_data to the cities table like this:
cities
+------------+----------+---------+-----------+----------+
| country_id | state_id | city_id | city_name | CSV_data |
+------------+----------+---------+-----------+----------+
If you go with adding an additional column to the key so that you can add an additional record for a given city, then you're not properly normalizing your data. Given that you've now discovered that a city can be a member of multiple states, I would suggest removing any reference to a state from the Cities table, then adding a StateCity table that allows you to relate states to cities (creating a m:m relationship).
Introduce a surrogate key. What are you going to do when area codes change numbers or get split? Using business keys as a primary key is almost always a mistake.
Your above summary is another example of why.
"We figured that the country code + area code combination would be unique for each city, and thus could safely be used as a primary key"
After having read this, I just stopped reading any further in this topic.
How could someone figure it in this way?
Area codes, by definition (the first one I found on internet):
- "An Area code is the prefix numbers that are used to identify a geographical region based on the North American number Plan. This 3 digit number can be assigned to any number in North America, including Canada, The United States, Mexico, Latin America and the Caribbean" [1]
Putting aside that they are changeable and defined only in North America, area codes are not 3 digits long in some other countries (3 digits is simply not enough when some countries have hundreds of thousands of locations. BTW, my mother's area code has 5 digits), and they are not strictly linked to fixed geographical locations.
Area codes can have migrating locations, like arctic camps drifting with ice, nomadic tribes, migrating military units, or even big oceanic ships.
Then, what about merging a few cities into one (or vice versa)?
[1] http://www.successfuloffice.com/articles/answering-service-glossary-area-code.htm
I recommend adding a new primary key field to the Cities table that will be simply auto-incremental. The KISS methodology (keep it simple).
Any other solution is cumbersome and confusing in my opinion.
The database is not Normalised. It may be partly Normalised. You will find many more bugs and limitations in extensibility, as a result.
A hierarchy of Country then State then City is fine. You do not need an additional many-to-many table, as some suggest. The said city (like many in America) simply exists in multiple States, three in this case.
By placing CountryCode and AreaCode, concatenated, in a single column, you have broken basic database rules, not to mention added code on every access. Additionally, CountryCode is not Normalised.
The problem is that CountryCode+AreaCode is a poor choice for a key for a City. In real terms, it has very little to do with a city, it applies to huge swaths of land. If the meaning of City was changed to town (as in, your company starts collecting data for large towns), the db would break completely.
Magician has the only answer that is close to being correct, that would save you from your current limitations due to lack of Normalisation. It is not accurate to say that Magician's answer is Normalised; it is correct choice of Identifiers, which form a hierarchy in this case. But I would remove the "id" columns because they are unnecessary, 100% redundant columns, 100% redundant indices. The char() columns are fine as they are, and fine for the PK (compound keys). Remember you need an Index on the char() column anyway, to ensure it is unique.
If you had this, the Relational structure, with Relational Identifiers, your problem would not exist, and your poor users would not have to figure silly things out or keep track of meaningless identifiers. They just state, naturally: State.Name, City.Name, ReadingType, Data ...
When you get to the lower end of the hierarchy (City), the compound PK has become onerous (3 x CHAR(20) ), and I wouldn't want to carry it into the Data table (esp if there are daily CSV imports and many readings or rows per city). Therefore for City only, I would add a surrogate key, as the PK.
But for the posted DDL, even as it is, without Normalising the db and using Relational Identifiers, yes, the PK of City is incorrect. It should be (idStates, idAreaCode), not the other way around. That will fix your problem.
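The (idStates, idAreaCode) key can be tried out with a sketch like the following, using SQLite through Python's `sqlite3` in place of MySQL/InnoDB. The `Readings` table, its columns, and the key value 91172 are hypothetical; Lat and Long are omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE States (
    idStates INTEGER PRIMARY KEY,
    State    TEXT NOT NULL
);
CREATE TABLE Cities (
    idStates   INTEGER NOT NULL REFERENCES States(idStates),
    idAreaCode INTEGER NOT NULL,
    City       TEXT NOT NULL,
    PRIMARY KEY (idStates, idAreaCode)   -- state first, then area code
);
-- Hypothetical table for the daily CSV rows, keyed the same way as the CSV.
CREATE TABLE Readings (
    idStates     INTEGER NOT NULL,
    idAreaCode   INTEGER NOT NULL,
    reading_date TEXT NOT NULL,
    reading      TEXT NOT NULL,
    FOREIGN KEY (idStates, idAreaCode) REFERENCES Cities(idStates, idAreaCode)
);
""")

conn.executemany("INSERT INTO States VALUES (?, ?)",
                 [(1, "Haryana"), (2, "Punjab"), (3, "Chandigarh")])

# One Cities row per state the city belongs to; 91172 is an illustrative key.
conn.executemany("INSERT INTO Cities VALUES (?, ?, ?)",
                 [(1, 91172, "Chandigarh"),
                  (2, 91172, "Chandigarh"),
                  (3, 91172, "Chandigarh")])

# A CSV record identified by (idStates, idAreaCode) maps straight onto the key.
conn.execute("INSERT INTO Readings VALUES (?, ?, ?, ?)",
             (2, 91172, "2024-01-01", "made-up reading"))

states_with_chandigarh = [
    row[0]
    for row in conn.execute(
        """SELECT s.State FROM Cities c
           JOIN States s ON s.idStates = c.idStates
           WHERE c.idAreaCode = ? ORDER BY s.State""",
        (91172,),
    )
]
reading_count = conn.execute("SELECT COUNT(*) FROM Readings").fetchone()[0]
```

Note that any table referring to a city has to carry both key columns, which is the cost of a compound key that other answers in this thread point out.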
Very bad naming by the way.