I've got a preexisting table that contains all kinds of customer information. Currently it also stores the "city", the "region" and the "state" as strings in three separate columns. Redundant info!
I'd like to create three new tables, one for cities, one for regions and one for states, each containing a single entry per city (or region, or state), and then reference the ID from the existing customer table with a location_id.
How would I go about inserting the distinct city names into the cities table and the distinct regions into a regions table, and then have each city reference its region_id and state_id, so that the information is all grouped?
Amateur question for sure, but I appreciate any help!
You don't want three different tables! You want one table with three columns: city, state, region.
The reason is that city does not exist by itself. Consider (in the US) Springfield, IL. And Springfield, MA. Or Miami, FL and Miami, OH. What you have is a dimension of the data that has hierarchies. The right way to store this is at the lowest level (city in your case) with a "dimension" table providing the other information.
Assuming that your original data is correct, you can do something like this:
```sql
create table Cities (
    CityId int auto_increment not null primary key,
    City varchar(255),
    State varchar(255),
    Region varchar(255)
);

insert into Cities(City, State, Region)
    select distinct City, State, Region
    from YourTable;
```
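The original question also wanted each customer row to point at its location. A minimal sketch of that back-fill step follows, assuming SQLite in place of MySQL (`INTEGER PRIMARY KEY` plays the role of `auto_increment`); the customer columns and sample rows are hypothetical. Matching on all three columns is what keeps Springfield, IL and Springfield, MA apart.

```python
import sqlite3

# SQLite stands in for MySQL; YourTable's columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE YourTable (
        CustomerId INTEGER PRIMARY KEY,
        Name   TEXT,
        City   TEXT,
        State  TEXT,
        Region TEXT
    );
    INSERT INTO YourTable (Name, City, State, Region) VALUES
        ('Ann',  'Springfield', 'IL', 'Midwest'),
        ('Bob',  'Springfield', 'MA', 'Northeast'),
        ('Cara', 'Springfield', 'IL', 'Midwest');

    -- One row per distinct (City, State, Region) combination.
    CREATE TABLE Cities (
        CityId INTEGER PRIMARY KEY,  -- SQLite's counterpart of auto_increment
        City   TEXT,
        State  TEXT,
        Region TEXT
    );
    INSERT INTO Cities (City, State, Region)
    SELECT DISTINCT City, State, Region FROM YourTable;

    -- Add the reference column, then fill it by matching all three columns.
    -- Note: rows with a NULL in any of the three columns would not match.
    ALTER TABLE YourTable ADD COLUMN CityId INTEGER REFERENCES Cities (CityId);
    UPDATE YourTable
    SET CityId = (
        SELECT c.CityId FROM Cities c
        WHERE c.City = YourTable.City
          AND c.State = YourTable.State
          AND c.Region = YourTable.Region
    );
""")

rows = conn.execute(
    "SELECT Name, CityId FROM YourTable ORDER BY CustomerId"
).fetchall()
city_count = conn.execute("SELECT COUNT(*) FROM Cities").fetchone()[0]
```

Once every row has a CityId, the old City, State and Region columns on the customer table can be dropped. In MySQL the same back-fill can also be written as a multi-table `UPDATE ... JOIN ... SET`.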
I realize that this is not "standard normal form". But for most applications this works well. If you are doing this for an application where you want to pick states from a list, for instance, create an index on state and the query will be fast.
There are some circumstances where you might want separate tables at the state and region level. This would be the case if you had lots of different columns at those levels. And, in particular, if you were modifying the values in those columns. A flattened dimension (such as described here) is most appropriate when the data is static (cities don't change states very often). Normalization is most appropriate when you are changing values in the different levels.
Related
Let's say I have the below schema. country -> states -> district -> village.
I understand that in a real scenario there wouldn't be an auto-increment id for countries, states, etc.
But let's assume that we have the below scenario.
I am using MySQL as a database.
All of these tables depend on the previous table's id.
For example, once the countryId is generated it is used in the state table; once the stateId is generated it is used in the district table, and so on.
That's the real problem. The crude way is to create a country object, insert it into the country table, get the auto-generated country id, use it to insert the records into the state table, and so on.
If I have 100 countries to insert, there will be billions of records to insert into the village table, and the crude way is going to be terribly slow.
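One common alternative to the row-by-row approach is to bulk-load the raw rows into a staging table and then fill each level with a single set-based `INSERT ... SELECT`, joining on the parent's natural key so the database resolves the generated ids itself. The sketch below assumes SQLite in place of MySQL, assumes every name is unique within its parent, and uses hypothetical table names and placeholder data; the village level would follow the same pattern as district.

```python
import sqlite3

# Hypothetical schema: each level has a name unique within its parent.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country  (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE state    (id INTEGER PRIMARY KEY,
                           country_id INTEGER REFERENCES country (id),
                           name TEXT, UNIQUE (country_id, name));
    CREATE TABLE district (id INTEGER PRIMARY KEY,
                           state_id INTEGER REFERENCES state (id),
                           name TEXT, UNIQUE (state_id, name));
    -- Staging table holding the raw, denormalized input rows.
    CREATE TABLE staging (country TEXT, state TEXT, district TEXT);
""")

raw = [
    ("Country A", "State A1", "District A1-1"),
    ("Country A", "State A1", "District A1-2"),
    ("Country A", "State A2", "District A2-1"),
    ("Country B", "State B1", "District B1-1"),
]

with conn:  # the whole load runs in one transaction
    conn.executemany("INSERT INTO staging VALUES (?, ?, ?)", raw)
    # One statement per level; the join picks up the generated parent id.
    conn.execute("INSERT INTO country (name) SELECT DISTINCT country FROM staging")
    conn.execute("""
        INSERT INTO state (country_id, name)
        SELECT DISTINCT c.id, s.state
        FROM staging s JOIN country c ON c.name = s.country
    """)
    conn.execute("""
        INSERT INTO district (state_id, name)
        SELECT DISTINCT st.id, s.district
        FROM staging s
        JOIN country c ON c.name = s.country
        JOIN state st ON st.country_id = c.id AND st.name = s.state
    """)

counts = {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
          for t in ("country", "state", "district")}
```

The number of statements depends on the depth of the hierarchy, not on the number of rows. In MySQL the staging table could be filled from a file with `LOAD DATA`; whether that beats batched JPA inserts for this volume is something to measure.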
I am looking for the fastest way to do it.
I am open to Spring Boot, JPA, or some other approach.
Any link, suggestion or idea is welcome.
I'm designing a table 'employees', which contains an auto-increment primary key that represents the ID of the employee.
I want to prefix the ID with a number designating the city: city 1: 1, city 2: 2, etc.
So the IDs should look like xyy, where x represents the city and yy the ID of the employee.
When I add a new employee I select the city x, and I would like the yy values to auto-increment.
Is that possible using SQL commands?
That is not good database design. You really should have a separate column for the city in your table. If you have many cities, the cities should perhaps be in their own table. What you are trying to do is overly complex and although 'everything is possible', I would not recommend it.
You are effectively packing two fields into one and violating the principle of atomicity and the 1NF in the process. On top of that, your key is not minimal (so to speak).
Instead, keep two separate fields: ID and CITY.
ID alone is the primary key. In your own words, ID is auto-increment, so it alone is unique.
You can easily concatenate ID and CITY together for display purposes in your query or VIEW or even in the client code. There is no reason to "precook" the concatenated value in the table itself.
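A minimal sketch of the "concatenate on read" idea, assuming SQLite (MySQL spells the concatenation `CONCAT()`); the view name and sample rows are hypothetical. Note that here yy is the global employee id, as the answer suggests, not a per-city counter, and it simply grows past two digits after 99.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        id   INTEGER PRIMARY KEY,   -- auto-increment, unique on its own
        city INTEGER NOT NULL,      -- the city number, stored separately
        name TEXT
    );
    INSERT INTO employees (city, name) VALUES (1, 'Ann'), (2, 'Bob'), (1, 'Cara');

    -- The "xyy" display code is computed when read, never stored.
    CREATE VIEW employee_codes AS
    SELECT id, name, city || printf('%02d', id) AS display_code
    FROM employees;
""")

codes = conn.execute(
    "SELECT name, display_code FROM employee_codes ORDER BY id"
).fetchall()
```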
Given this requirement from the comments, "Unique ID should provide users with an info of the city, company requirements", I would do this.
Table employee would have an employeeID as the primary key. Other fields would be firstname, lastname, birthdate, gender, etc.
Table city would have a cityId as the primary key. Other fields would be the name of the city, provinceState, Country, whatever is appropriate.
Table EmployeeCity would have a primary key of EmployeeId, CityId, and StartDate. EndDate would be an additional field that is not part of the primary key.
The primary key of EmployeeCity satisfies the requirement of a unique identifier which leads to city information. Also, if an employee changes cities, it's a simple matter of updating one record and adding another.
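A sketch of the three-table design and of moving an employee to another city, assuming SQLite, ISO date strings, and that a NULL EndDate marks the current assignment (that convention and the helper `move_employee` are assumptions, not part of the answer).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (employeeId INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
    CREATE TABLE city (cityId INTEGER PRIMARY KEY, name TEXT, provinceState TEXT, country TEXT);
    CREATE TABLE EmployeeCity (
        employeeId INTEGER REFERENCES employee (employeeId),
        cityId     INTEGER REFERENCES city (cityId),
        StartDate  TEXT NOT NULL,
        EndDate    TEXT,              -- assumed: NULL while the assignment is current
        PRIMARY KEY (employeeId, cityId, StartDate)
    );
    INSERT INTO employee VALUES (1, 'Ann', 'Lee');
    INSERT INTO city VALUES (10, 'Toronto', 'ON', 'Canada'), (20, 'Calgary', 'AB', 'Canada');
    INSERT INTO EmployeeCity VALUES (1, 10, '2020-01-01', NULL);
""")

def move_employee(conn, employee_id, new_city_id, on_date):
    """Hypothetical helper: close the current assignment, then open a new one."""
    with conn:
        conn.execute(
            "UPDATE EmployeeCity SET EndDate = ? WHERE employeeId = ? AND EndDate IS NULL",
            (on_date, employee_id),
        )
        conn.execute(
            "INSERT INTO EmployeeCity (employeeId, cityId, StartDate) VALUES (?, ?, ?)",
            (employee_id, new_city_id, on_date),
        )

move_employee(conn, 1, 20, "2023-06-01")
current = conn.execute(
    "SELECT c.name FROM EmployeeCity ec JOIN city c ON c.cityId = ec.cityId "
    "WHERE ec.employeeId = 1 AND ec.EndDate IS NULL"
).fetchall()
history = conn.execute(
    "SELECT COUNT(*) FROM EmployeeCity WHERE employeeId = 1"
).fetchone()[0]
```

The old row is kept with its EndDate filled in, so the employee's city history stays queryable.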
Could someone suggest the best design for the following scenario?
I have a database in which there is a table called City. This table has the following fields:
City id (Primary key)
City Name
State Id (which is linked to the State table)
My problem is I have 10 cities with the same name in one state. What will be the best design so I can represent one city name per id?
It does not matter that they have the same City Name, as long as they have different City Ids.
Just make sure to set CityId as the primary key of the City table. It is also useful to make it an identity (auto-increment) column, so that it is generated automatically and is always unique.
The same goes for StateId in the State table.
Also, if you use a visual management tool for the database, make sure to set the foreign key relationship between the two tables:
FK_State.StateId_City.StateId.
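A minimal sketch of that design, assuming SQLite (which only enforces foreign keys after `PRAGMA foreign_keys = ON`); the constraint name `FK_City_State` and the sample names are hypothetical. Ten cities with the same name in the same state each get their own CityId, and the foreign key rejects a city that points at a nonexistent state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE State (
        StateId   INTEGER PRIMARY KEY,   -- auto-assigned, always unique
        StateName TEXT NOT NULL
    );
    CREATE TABLE City (
        CityId   INTEGER PRIMARY KEY,    -- auto-assigned, always unique
        CityName TEXT NOT NULL,          -- deliberately NOT unique
        StateId  INTEGER NOT NULL,
        CONSTRAINT FK_City_State FOREIGN KEY (StateId) REFERENCES State (StateId)
    );
    INSERT INTO State (StateName) VALUES ('Some State');
""")

# Ten cities with the same name in the same state (StateId 1).
conn.executemany(
    "INSERT INTO City (CityName, StateId) VALUES (?, ?)",
    [("Springfield", 1)] * 10,
)
ids = [r[0] for r in conn.execute(
    "SELECT CityId FROM City WHERE CityName = 'Springfield'"
)]

# A city that references a state that does not exist is rejected.
try:
    conn.execute("INSERT INTO City (CityName, StateId) VALUES ('Nowhere', 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```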
I have seen the relationship between "States" and "Districts" implemented in two ways.
The relationship between States and Districts is one-to-many.
First way:
In this implementation, there are two tables, "States" and "Districts", and the one-to-many relationship from States to Districts is implemented by putting a foreign key in the Districts table.
In my "States" table the columns are: state_id (pk) and state_name.
In my "Districts" table the columns are: district_id (pk), district_name and state_id (fk).
Second Way:
In this implementation, there are two tables, "States" and "Districts", and the one-to-many relationship from States to Districts is implemented by creating a third table, "state_district", as follows.
In my "States" table the columns are: state_id (pk) and state_name.
In my "Districts" table the columns are: district_id (pk) and district_name.
The third table is "state_district"; its columns are s_did (pk), district_id (fk) and state_id (fk).
What is the difference between these two mechanisms?
The difference is that in the first case there can be only one state per district, whereas in the second there can be many states per district.
Which one you should use depends entirely on whether a district can be associated with multiple states or not. If they can then you have to use the second many-to-many model. If they cannot then while in practice you could use the second model, it would be incorrect to do so -- you should use the first one-to-many model.
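The difference can be seen directly in a small sketch, assuming SQLite; `Districts1` and `Districts2` are hypothetical names for the first and second designs, and the state ids follow the Alaska (49) / Hawaii (50) example used later in this thread. The one-to-many table has nowhere to record a second state for a district, while the junction table accepts it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE States (state_id INTEGER PRIMARY KEY, state_name TEXT);
    INSERT INTO States VALUES (49, 'Alaska'), (50, 'Hawaii');

    -- First way (one-to-many): the district row holds exactly one state_id.
    CREATE TABLE Districts1 (
        district_id   INTEGER PRIMARY KEY,
        district_name TEXT,
        state_id      INTEGER NOT NULL REFERENCES States (state_id)
    );

    -- Second way (many-to-many): the link lives in a junction table.
    CREATE TABLE Districts2 (district_id INTEGER PRIMARY KEY, district_name TEXT);
    CREATE TABLE state_district (
        s_did       INTEGER PRIMARY KEY,
        district_id INTEGER REFERENCES Districts2 (district_id),
        state_id    INTEGER REFERENCES States (state_id)
    );

    INSERT INTO Districts1 VALUES (1234, 'D', 49);
    INSERT INTO Districts2 VALUES (1234, 'D');
    INSERT INTO state_district (district_id, state_id) VALUES (1234, 49), (1234, 50);
""")

# First design: a second row for district 1234 violates its primary key.
try:
    conn.execute("INSERT INTO Districts1 VALUES (1234, 'D', 50)")
    one_to_many_allows_second_state = True
except sqlite3.IntegrityError:
    one_to_many_allows_second_state = False

# Second design: district 1234 is now linked to two states.
states_per_district2 = conn.execute(
    "SELECT COUNT(*) FROM state_district WHERE district_id = 1234"
).fetchone()[0]
```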
For a one-to-many relationship we use one table's primary key as a foreign key in the other table. That is your first approach, and it is the correct one in this case.
For a many-to-many relationship we use a third table to store the relationship between the first two tables. That is not required in your case, because states to districts is a one-to-many relationship.
What is the difference between these two mechanisms?
The difference is that the second method allows a district to be associated with more than one state. You can do this by just adding another row for a given district in the third table.
```sql
INSERT INTO state_district (district_id, state_id) VALUES
    (1234, 49), (1234, 50);
```
Now you have the same district 1234 associated with both Alaska (49) and Hawaii (50).
I would assume you don't really need this. In fact, it would be better to ensure that each district belongs to exactly one state. You should have only a one-to-many relationship between states and districts. So you should use the first design.
Your second way should be used only if there is a many-to-many relationship between states and districts.
Your first way is correct, and it is the one you should implement.
I would propose the following table structures:
```sql
-- States: no extra metadata such as a sequence-generated value
CREATE TABLE States (
    state_name VARCHAR2(50) PRIMARY KEY
);
-- Districts: no extra metadata
CREATE TABLE Districts (
    district_name VARCHAR2(100) PRIMARY KEY
);
CREATE TABLE State_Districts (
    state_name    VARCHAR2(50)  REFERENCES States (state_name),
    district_name VARCHAR2(100) REFERENCES Districts (district_name),
    PRIMARY KEY (state_name, district_name)
);
```
This ensures that you don't have duplicate district names, which are the real unique identifiers; regardless of whether Wyoming and Pennsylvania have a district with the same name, the data stays independent. It also ensures that there will be no null values in any of the three tables, which is important when we think about normalization.
Your first table definition appears to have a State ID field in the District table. That indicates one or more districts per state, in which case a third table would be redundant.
Let's say there is a database with two tables: one customer table and one country table. Each customer row contains (among other things) a countryId foreign key. Let's also assume that we are populating the database from a data file (i.e., it is not an operator that is selecting a country from a UI).
What is the best practice for this?
Should one query the database first to get the IDs of all countries, and then supply the (now known) country IDs in the insert query? That is not a problem for my 'country' example, but what if the referenced table contains a large number of records?
Or should the insert query use a sub query to get the country id based on the country name? If so, what if the record for the country does not exist yet and has to be added?
Or another approach? Or does it depend? :)
I would suggest using a join in your insert query to get the country id based on the country name. However, I don't know whether that is possible with every DBMS, and you don't say which one you're using.
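A minimal sketch of the join approach, assuming SQLite and a hypothetical staging table `customer_import` that holds the rows parsed from the data file. Missing countries are added first with `INSERT OR IGNORE` (MySQL's closest equivalent is `INSERT IGNORE`), relying on a UNIQUE constraint on the country name; the customers are then inserted with the country id resolved by the join, so no id lookup happens in application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (countryId INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE customer (
        customerId INTEGER PRIMARY KEY,
        name       TEXT,
        countryId  INTEGER NOT NULL REFERENCES country (countryId)
    );
    INSERT INTO country (name) VALUES ('France');
    -- Hypothetical staging table holding the rows parsed from the data file.
    CREATE TABLE customer_import (name TEXT, country_name TEXT);
""")

file_rows = [("Ann", "France"), ("Bob", "Peru"), ("Cara", "France")]

with conn:
    conn.executemany("INSERT INTO customer_import VALUES (?, ?)", file_rows)
    # 1. Add any countries the file mentions that do not exist yet;
    #    names already present are skipped thanks to the UNIQUE constraint.
    conn.execute("""
        INSERT OR IGNORE INTO country (name)
        SELECT DISTINCT country_name FROM customer_import
    """)
    # 2. Insert the customers, resolving countryId with a join on the name.
    conn.execute("""
        INSERT INTO customer (name, countryId)
        SELECT i.name, c.countryId
        FROM customer_import i JOIN country c ON c.name = i.country_name
    """)

result = conn.execute("""
    SELECT cu.name, co.name FROM customer cu
    JOIN country co ON co.countryId = cu.countryId
    ORDER BY cu.name
""").fetchall()
country_count = conn.execute("SELECT COUNT(*) FROM country").fetchone()[0]
```

This also covers the "what if the country does not exist yet" case from the question, and it scales with the size of the referenced table because the matching happens inside the database rather than in a client-side id map.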