I am looking for a best practice for multiple tables in a vertical hierarchy having small shared data. Let's say three tables:
Country {
id
}
State {
id,
country_id,
FK_to_country_id,
}
City {
id,
state_id,
FK_to_state_id,
}
In the above, a state belongs to a country and a city belongs to a state, so a city indirectly belongs to a country as well. The schema looks clean, but when you want to look up which country a city belongs to, you have to JOIN three tables. If there is another tier called County which belongs to a city, the situation gets worse.
City {
id,
country_id,
state_id,
FK_to_country_id,
FK_to_state_id,
}
Adding another column, country_id, to City frees us from the cumbersome JOIN, but the database schema becomes somewhat redundant.
What's the usual practice in the real world?
That's what JOINs are made for; don't deprive them of their duty.
In real-world practice, it is absolutely normal to be joining multiple tables like that.
In your example, yes, it would be redundant to have country IDs in the city table.
But that's not always the case. Consider an example where a city can exist without being in a state (many countries have no "state" designation). Then you do need that foreign-key country ID on the city, because the State table is optional.
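To make the "just JOIN" advice concrete, here is a minimal sketch of the normalized three-table schema from the question, run against an in-memory SQLite database from Python. The `name` columns, the sample rows, and the helper `country_of_city` are assumptions added for illustration; the question's tables only list ids and foreign keys.

```python
import sqlite3

# In-memory database; the name columns and sample rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE state (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    country_id INTEGER NOT NULL REFERENCES country(id)
);
CREATE TABLE city (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    state_id INTEGER NOT NULL REFERENCES state(id)
);
INSERT INTO country VALUES (1, 'USA');
INSERT INTO state VALUES (10, 'North Carolina', 1);
INSERT INTO city VALUES (100, 'Fayetteville', 10);
""")


def country_of_city(city_id):
    """Return the country name for a city by joining city -> state -> country."""
    row = conn.execute(
        """
        SELECT country.name
        FROM city
        JOIN state ON state.id = city.state_id
        JOIN country ON country.id = state.country_id
        WHERE city.id = ?
        """,
        (city_id,),
    ).fetchone()
    return row[0] if row else None


print(country_of_city(100))
```

Every join here goes through a primary key, so the lookup stays cheap even though it touches three tables.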
Related
I'm currently designing a database for a college project and I would really appreciate some feedback. Basically, the idea of the project is to create a food plants, food inventory database. I have created a schema for the database, but I'm not sure whether the products table abides by 3NF.
My products table contains p_id (the primary key), the name of the product, the type of product (e.g. pizza or dessert), the country it comes from, the region in that country, and who the product is suitable for (e.g. vegan).
When I break it down, I feel that Region is dependent on Country and type is dependent on the name. Would I be right in assuming this? I could then split the table into three separate tables: one containing p_id, name and Country, another containing Country and Region, and a third containing name and type. Would this then be a fully normalised database up to 4NF?
Here is my schema:
I need to create database which will contain the following entities in country:
Regions
Municipalities
Now comes the hard part for me:
Municipality can also be city, where city can (but not always) have its own municipalities.
Municipality can have several cities.
Municipality in both cases contains several settlements
I wanted to create several tables, for example:
regions (region_id, region_name)
municipalities (municipality_id, municipality_name, region_id).
cities (city_id, city_name, region_id)
city_municipalities (city_municipality_id, city_municipality_name, city_id)
settlements (settlement_id, settlement_name, municipality_id, city_municipality_id)
I think this is where I create a problem: there would be a lot of NULLs in municipality_id or city_municipality_id, since I can't have both fields filled.
But now it occurs to me that I could have two tables organized like this:
geo_entity(geo_id, geo_name, geo_type_id, parent)
geo_entities_type (geo_type_id, geo_type_name)
The second table will contain the definitions of what a geo_entity entry is, like:
1 Region
2 Municipality
3 City
4 City Municipality
5 Settlement
Which one should I stick to? Do you have a better approach, and if so, which one?
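For what it's worth, the two-table idea above can be sketched as a self-referencing (adjacency-list) table that is walked upward with a recursive query. This is only an illustration under assumptions: the sample rows are invented, `parent` is assumed to hold the geo_id of the containing entity (NULL for a region), and the helper `ancestry` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE geo_entities_type (
    geo_type_id INTEGER PRIMARY KEY,
    geo_type_name TEXT NOT NULL
);
CREATE TABLE geo_entity (
    geo_id INTEGER PRIMARY KEY,
    geo_name TEXT NOT NULL,
    geo_type_id INTEGER NOT NULL REFERENCES geo_entities_type(geo_type_id),
    parent INTEGER REFERENCES geo_entity(geo_id)  -- NULL for top-level regions
);
INSERT INTO geo_entities_type VALUES
    (1, 'Region'), (2, 'Municipality'), (3, 'City'),
    (4, 'City Municipality'), (5, 'Settlement');
-- Invented sample rows: a settlement inside a city municipality inside a city.
INSERT INTO geo_entity VALUES
    (1, 'Region A', 1, NULL),
    (2, 'City B', 3, 1),
    (3, 'City Municipality C', 4, 2),
    (4, 'Settlement D', 5, 3);
""")


def ancestry(geo_id):
    """Return (name, type) pairs from the given entity up to its region."""
    return conn.execute(
        """
        WITH RECURSIVE chain(geo_id, geo_name, geo_type_id, parent, depth) AS (
            SELECT geo_id, geo_name, geo_type_id, parent, 0
            FROM geo_entity
            WHERE geo_id = ?
            UNION ALL
            SELECT g.geo_id, g.geo_name, g.geo_type_id, g.parent, chain.depth + 1
            FROM geo_entity AS g
            JOIN chain ON g.geo_id = chain.parent
        )
        SELECT chain.geo_name, t.geo_type_name
        FROM chain
        JOIN geo_entities_type AS t ON t.geo_type_id = chain.geo_type_id
        ORDER BY chain.depth
        """,
        (geo_id,),
    ).fetchall()


for name, type_name in ancestry(4):
    print(type_name, name)
```

The trade-off is that the single table avoids the NULL columns, but nothing in the schema itself stops a region from being filed under a settlement; rules like that would have to be enforced elsewhere.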
I have three tables named users, cities and countries and these two scenarios:
1) User belongs to city, city belongs to country (deep join)
Table users has 2 fields: id (PK) and city_id (FK).
Table cities has 2 fields: id (PK) and country_id (FK).
Table countries has 2 fields: id (PK) and name.
Get any user's country:
SELECT countries.name
FROM users
LEFT JOIN cities ON users.city_id = cities.id
LEFT JOIN countries ON cities.country_id = countries.id
WHERE users.id = 1;
2) User belongs to city and country, city belongs to country (one join)
Table users has 3 fields: id (PK), city_id (FK) and country_id (FK).
Table cities has 2 fields: id (PK) and country_id (FK).
Table countries has 2 fields: id (PK) and name.
Get any user's country:
SELECT countries.name
FROM users
LEFT JOIN countries ON users.country_id = countries.id
WHERE users.id = 1;
At first glance, scenario 2 seems faster, but is it a good idea to have a country_id FK in the users table just to save one join? Or should I take advantage of the relationships and do a deep join? Which of these two scenarios actually performs faster?
One join is almost always faster than 2 joins, but the question here shouldn't be which is faster but which is more maintainable (also look at When to optimize).
Are you actually having a performance problem? Even though in this case the data probably never changes (at least, cities usually don't change country) there is still a risk that the data between the tables gets out of date. So the question here is, is it fast enough?
These types of optimisations generally give very little benefit in terms of performance but bring in risks that the data will be out of date and it makes things more complex.
In the first situation you are doing a primary-key-based lookup on three tables, and in the second you reduce it to two tables. That is what I would consider a micro-optimization. You won't see significant performance returns unless the tables are enormous (millions of rows) or writes are happening quickly enough to cause lock contention.
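The risk mentioned above (the redundant column drifting away from the real relationship) can be shown with a small sketch. It uses an in-memory SQLite database from Python; the sample rows and the helper `user_country` are assumptions for illustration, and the two queries are the ones from the question with consistent table names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE cities (
    id INTEGER PRIMARY KEY,
    country_id INTEGER NOT NULL REFERENCES countries(id)
);
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    city_id INTEGER NOT NULL REFERENCES cities(id),
    country_id INTEGER NOT NULL REFERENCES countries(id)  -- redundant copy (scenario 2)
);
INSERT INTO countries VALUES (1, 'France'), (2, 'Spain');
INSERT INTO cities VALUES (10, 1), (20, 2);
INSERT INTO users VALUES (1, 10, 1);
""")

# Scenario 1: follow users -> cities -> countries.
DEEP_JOIN = """
    SELECT countries.name
    FROM users
    LEFT JOIN cities ON users.city_id = cities.id
    LEFT JOIN countries ON cities.country_id = countries.id
    WHERE users.id = ?
"""

# Scenario 2: trust the redundant users.country_id column.
ONE_JOIN = """
    SELECT countries.name
    FROM users
    LEFT JOIN countries ON users.country_id = countries.id
    WHERE users.id = ?
"""


def user_country(sql, user_id):
    row = conn.execute(sql, (user_id,)).fetchone()
    return row[0] if row else None


before = (user_country(DEEP_JOIN, 1), user_country(ONE_JOIN, 1))

# The user moves to a city in another country, but only city_id is updated.
conn.execute("UPDATE users SET city_id = 20 WHERE id = 1")
after = (user_country(DEEP_JOIN, 1), user_country(ONE_JOIN, 1))

print(before)  # ('France', 'France')
print(after)   # ('Spain', 'France'): the shortcut column is now stale
```

With the extra column, every write path that changes city_id also has to remember to change country_id, which is the added complexity being weighed against one saved join.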
I am sure this is a basic question, but I am new to SQL. For my user profile I want to display location = "Hollywood, CA - USA" if a user lives in Hollywood. So I assume the user table will have one column like current_city holding an ID, say 1232, which is a FK to the city table, where city_name for this PK is Hollywood. I would then join to the state table and the country table to find the names CA and USA, as the city lookup table will only store the IDs (like CA = 21 and USA = 345).
Is this the best way to design the tables, or should I add two columns like city_id and city_name to the user table, and also add country_id, country_name, state_id and state_name to the city table? That way I would save on trips to the parent tables just to fetch the names for the IDs.
This is only a sample use case, but I have lots of lookup ID tables, so I will apply the same principle to all of them once I know how to do it best. My requirements are scalability and performance, so whatever works best for these is what I would like.
The first way you described is almost always better.
Having both the city_id and city_name (or any pair of that kind) in the users table is not best practice since it may cause data discrepancies - a wrong update may result in a city_id that does not match the city_name and then the system behavior becomes unexpected.
As said, your first suggestion would be the common and usually the best way to do this. If table keys are designed properly so all select statements can use them efficiently this would also give the best performance.
For example, having just the city_name in the users table would make it a little quicker to find and show the city for one user, but when trying to run other queries - like finding all users in city X, that would make much less sense.
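As a rough sketch of the first (normalized) design, the location string can be assembled with two joins. The IDs 1232, 21 and 345 come from the question; the table names, the `name` and `code` columns, and the helpers `user_location` and `users_in_city` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country (id INTEGER PRIMARY KEY, code TEXT NOT NULL);
CREATE TABLE state (
    id INTEGER PRIMARY KEY,
    code TEXT NOT NULL,
    country_id INTEGER NOT NULL REFERENCES country(id)
);
CREATE TABLE city (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    state_id INTEGER NOT NULL REFERENCES state(id)
);
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    current_city INTEGER NOT NULL REFERENCES city(id)
);
-- Index so that "all users in city X" does not scan the whole users table.
CREATE INDEX users_current_city ON users(current_city);

INSERT INTO country VALUES (345, 'USA');
INSERT INTO state VALUES (21, 'CA', 345);
INSERT INTO city VALUES (1232, 'Hollywood', 21);
INSERT INTO users VALUES (1, 1232), (2, 1232);
""")


def user_location(user_id):
    """Build the display string, e.g. 'Hollywood, CA - USA', from the lookup tables."""
    row = conn.execute(
        """
        SELECT city.name || ', ' || state.code || ' - ' || country.code
        FROM users
        JOIN city ON city.id = users.current_city
        JOIN state ON state.id = city.state_id
        JOIN country ON country.id = state.country_id
        WHERE users.id = ?
        """,
        (user_id,),
    ).fetchone()
    return row[0] if row else None


def users_in_city(city_id):
    """The 'find all users in city X' query only needs the ID column."""
    rows = conn.execute(
        "SELECT id FROM users WHERE current_city = ? ORDER BY id", (city_id,)
    ).fetchall()
    return [r[0] for r in rows]


print(user_location(1))
```

Because the names live in exactly one place, renaming a city or state is a single-row update rather than a sweep over every user row.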
You can find a nice series of articles for beginners about DB normalization here:
http://databases.about.com/od/specificproducts/a/2nf.htm. This article has an example which is very much like what you are trying to achieve, and the related articles will probably help you design many other tables in your DB.
Good luck!
Ok, I have a database with a table for storing classified posts; each post belongs to a city. For the purpose of this example we will call this table posts. This table has columns:
id (INT, +AI),
cityid (TEXT),
postcat (TEXT),
user (TEXT),
datatime (DATETIME),
title (TEXT),
desc (TEXT),
location (TEXT)
an example of that data would be:
'12039',
'fayetteville-nc',
'user#gmail.com',
'December 28th, 2010 - 11:55 PM',
'post title',
'post description',
'spring lake'
id is auto incremented, cityid is in text format (this is where I think i will be losing performance once the database is large)...
Originally I planned on having a different table for each city and now since a user has to have the option of searching and posting through multiple cities, I think I need them all in one table. Everything was perfect when I had one city per table, where I could:
SELECT *
FROM `posts`
WHERE MATCH (`title`, `desc`, `location`)
AGAINST ('searchtext' IN BOOLEAN MODE)
AND `postcat` LIKE 'searchcatagory'
But then I ran into problems when trying to search multiple cities at one time, or when listing all of a user's posts for them to delete or edit.
So it looks like I have to have one table with all the posts, and also match another FULLTEXT field: cityid. I am guessing I need full-text because if a user chooses an entire state, and my cityid is "fayetteville-nc", I would need to match cityid against "-nc". This is only an assumption and I would love another way. This database could easily reach over a million rows within 6 months, and a full-text search against 4 columns is probably going to be slow.
My question is, is there a better way to do this more efficiently? The database has nothing in it now, except for some test posts made by me. So I can completely redesign the table structure if necessary. I am open to any and all suggestions, even if it is just a more efficient way to perform my query.
Yes, one table for all posts sounds sensible. It would also be normal design for the posts table to have a city_id, referring to the id in a city table. Each city would also have a state_id, referring to the id in a state table, and similarly each state would have a country_id referring to the id in a country table. So you could write:
SELECT $columns
FROM posts JOIN city ON city.id = posts.city_id
WHERE city.tag = 'fayetteville-nc'
Once you've brought the cities into a separate table, it might make more sense for you to do the city-to-city_id resolving up front. This fairly naturally happens if the city is chosen from a dropdown, for instance. But if you're entering free text into a search field, you may want to do it differently.
You can also search for all posts in a given state (or set of states) as:
SELECT $columns
FROM posts
JOIN city ON city.id = posts.city_id
JOIN state ON state.id = city.state_id
WHERE state.tag = 'NC'
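The two queries above can be tried out with a small sketch like the following, using an in-memory SQLite database from Python. The `tag` columns follow the queries in this answer; the reduced posts table (only id, city_id and title), the sample rows, and the helper names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE state (id INTEGER PRIMARY KEY, tag TEXT NOT NULL UNIQUE);
CREATE TABLE city (
    id INTEGER PRIMARY KEY,
    tag TEXT NOT NULL UNIQUE,
    state_id INTEGER NOT NULL REFERENCES state(id)
);
CREATE TABLE posts (
    id INTEGER PRIMARY KEY,
    city_id INTEGER NOT NULL REFERENCES city(id),
    title TEXT NOT NULL
);
-- Integer foreign key with an index, instead of matching text like '-nc'.
CREATE INDEX posts_city_id ON posts(city_id);

INSERT INTO state VALUES (1, 'NC'), (2, 'SC');
INSERT INTO city VALUES
    (1, 'fayetteville-nc', 1), (2, 'raleigh-nc', 1), (3, 'columbia-sc', 2);
INSERT INTO posts VALUES (1, 1, 'post A'), (2, 2, 'post B'), (3, 3, 'post C');
""")


def posts_in_city(city_tag):
    """Titles of all posts in one city, looked up by the city's tag."""
    rows = conn.execute(
        """
        SELECT posts.title
        FROM posts
        JOIN city ON city.id = posts.city_id
        WHERE city.tag = ?
        ORDER BY posts.id
        """,
        (city_tag,),
    ).fetchall()
    return [r[0] for r in rows]


def posts_in_state(state_tag):
    """Titles of all posts in any city of the given state."""
    rows = conn.execute(
        """
        SELECT posts.title
        FROM posts
        JOIN city ON city.id = posts.city_id
        JOIN state ON state.id = city.state_id
        WHERE state.tag = ?
        ORDER BY posts.id
        """,
        (state_tag,),
    ).fetchall()
    return [r[0] for r in rows]


print(posts_in_city("fayetteville-nc"))
print(posts_in_state("NC"))
```

With this layout the state filter is an ordinary indexed join, so the full-text index only has to cover title, desc and location.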
If you're going to get fancier or go international, you may need a more flexible way of arranging locations into a hierarchy (e.g. you may want city districts, counties, multinational regions, intranational regions such as Midwest or East Coast), but keep it simple for now :)