Normalizing MySQL table with records of another table - mysql

i have 2 tables. The city tables is not normalized because the country information is in plain text. I have added the id_country to the 'city' table (that column is empty).
I need to check for matches between city>country and country>country and then update the city records that matched with the id_country from the country table. At the end i will be able to delete the 'country' column from the city table.
City table
id_city (1, 2, 3...)
city (Washington, Guayaquil, Bonn...)
country (Germany, Ecuador, USA...)
id_country (currently empty)
Country table
id_country (1, 2, 3...)
code (GE, EC, US...)
country (Germany, Ecuador, USA...)
I have no idea on where to start and if it can be done with a SQL query. My original idea was to search for matches in a php loop but that seems to be a really harder implementation.

You can do this with a JOIN on an UPDATE statement.
UPDATE city c1 INNER JOIN country c2 ON c1.country=c2.country
SET c1.id_country=c2.id_country;
Using an INNER JOIN will make sure that updates only occur for cities that have a matching country value.
Once you've run it, you'll be able to select all those cities that still have a null id_country just in case some of them didn't match. Conversely, once you've determined that all your cities have an id_country, you can delete that column from the city table.

The city tables is not normalized because the country information is
in plain text.
Nonsense. Normalization doesn't mean "replace plain text with id numbers". Find whoever taught you that and poke him in the eye with a sharp stick.
Your real problem is that "city" plus "country" isn't sufficient to identify cities, at least in the USA. I think there are at least a dozen different cities named "Washington" in the USA.
Instead of replacing the country name with an id number, you'd be far better off replacing it with the two-letter country code. The codes are human-readable; the id numbers will require an additional JOIN in every query that uses your table of cities.

Something like this should work:
UPDATE city set id_country = (SELECT country.id_country from country WHERE country.country = city.country)

Related

Inner join in mysql take a long time

I have table contacts with more than 1,000,000 and other table cities which have about 20,000 records. Need to fetch all cities which have used in contacts table.
Contacts table have following columns
Id, name, phone, email, city, state, country, postal, address, manager_Id
cities table have
Id, city
I used Inner join for this, but its taking a long time to go. Query takes more than 2 minutes to execute.
I used this query
SELECT cities.* FROM cities
INNER JOIN contacts ON contacts.City = cities.city
WHERE contacts.manager_Id= 1
created index on manager_Id as well. But still its very slow.
for better performance you could add index
on table cities column city
on table contacts a composite index on columns (manager_id, city)
Filter contacts first and then join to cities:
SELECT ct.*
FROM cities ct INNER JOIN (
SELECT city FROM contacts
WHERE manager_Id = 1
) cn ON cn.city = ct.city
You need indexes for city in both tables and for manager_id in contacts.
As others have pointed out about having proper index, I am taking it a bit more for clarification. You are specifically looking for contacts where the MANAGER ID = 1. This is not expected to be one person, but could be many people. So having the MANAGER ID in the first position will optimize get me all people for that manager. By having the city as part of the index via (manager_id, city), you are pulling the two data elements you need to optimize as part of the index. This way the engine does not have to go to the raw data pages to get the other part of interest.
Now, From that, you want all the city information (hence the join to city table on that ID).
Since you are only querying the CITIES and not the actual contact people information, you probably want to have DISTINCT City ID. Lets say a manager is responsible for 50 people and most of them live in the same city or neighboring. You may have 5 distinct cities? That too will limit your result set of joining.
Having said that, I would do a follows, and with MySQL, using STRAIGHT_JOIN can help optimize by "do the query as I wrote it, don't think for me".
select STRAIGHT_JOIN
cty.*
from
( select distinct c.City
from Contacts c
where c.Manager_ID = 1 ) PQ
JOIN Cities cty
on PQ.City = Cty.City
The "PQ" is an alias representing my "pre-query" of just DISTINCT cities for a given manager.
Again, have one index on Contacts table on (manager_id, city). On the city table, I would expect and index on (city).
You need two indexes, one on each table.
On the contacts table, first index manager_Id, then City
CREATE INDEX idx_contacts_mgr_city ON contacts(manager_Id, City);
On the cities table, just index `City.
Is the 'City' field from the table 'Contacts' a VARCHAR?
If that's the case, I see multiple things here.
First of all, since you have already have the 'Id' for the corresponding city in your 'cities' tables, I don't see why not to use the same 'Id' from the 'cities' table for the 'Contacts' table.
You can add the 'IdCity' field to the 'Contacts' table so you don't have to modify your existing records.
You'll have to insert the 'IdCity' manually though for each of your records, or you can create a Query using 'cities' table and then compare the 'idCity' but insert the 'city' (city name) in your 'Contacts' table.
Returning to your query:
Then, use an INT JOIN instead of a VARCHAR JOIN. Since you have many records, this can show up an important significance in performance.
It looks like you need to add two indexes, one on cities.city and one on (contacts.manager_Id, contacts.city). That should speed things up significantly.

How to get redundant consistent (duplicate) data from the database using relational algebra?

I have 2 relations (tables):
Shops (Postcode (PK), SuburbName, BottleShopName (PK), Address),
People (PersonName (PK), Postcode (PK), AlcoholConsumption)
If I write down a query that will return the name of the each bottle shop in the
database and its suburb using relational algebra, it would look like this:
π BottleShopName, SuburbName (Shops).
The limitation of this query is that it would not show any redundant data. Say if there are two different Bottle shop names with the same name and are in the same suburb having the same post code, the above query would ignore the second one.
What modifications should I make to this query to get both results explicitly using relational algebra?
Since this question is missing the used query, here's what I understood of your question:
SELECT BottleShopName, SuburbName FROM Shops JOIN People ON People.Postcode = Shops.Postcode WHERE People.PersonName = ?
Say if there are two different Bottle shop names with the same name and are in the same suburb having the same post code, the above query would ignore the second one.
Relational algebra relations are sets of tuples. They cannot have two tuples with all the same values for attributes.
A relation holds the rows that make a true statement from a given predicate--a fill-in-the-blanks sentence template parameterized by attribute names.
Suppose that Shops holds rows where "a bottle shop has postal code Postcode, suburb name SuburbName, name BottleShopName and address Address". If, for some particular values of Postcode, SuburbName, BottleShopName & Address, there is more than one shop with postal code Postcode, suburb name SuburbName, name BottleShopName and address Address then there is still only one tuple with those values in the relation.
If there could be more than one shop and you want to know that then you need a different predicate. You could keep a count of similar shops: "N bottle shops have postal code Postcode, suburb name SuburbName, name BottleShopName and address Address". Or you could assign a unique name or id to each shop and then use: "a bottle shop identified by id has postal code Postcode, suburb name SuburbName, name BottleShopName and address Address". (Still no duplicates possible.)
The reason relations are sets of similar tuples in the relational model is that if relations R and S have predicates r and s then:
R JOIN S has predicate (holds the tuples where) r AND s,
R UNION S has predicate (holds the tuples where) r OR s,
R MINUS S has predicate (holds the tuples where) r AND NOT s,
RESTRICT condition R has predicate (holds the tuples where) r AND condition,
RENAME A N R has the predicate (holds the tuples where) r with A replaced by N, and
PROJECT A R has predicate (holds the tuples where) FOR SOME attributes other than A, r.
Ie a query also has a predicate and its tuples are also the ones that make its predicate into a true statement. So querying means writing a predicate for which we want the satisfying tuples.
In SQL, tables can have duplicate rows. So you cannot reason this way about what rows should be in a table in a given application situation, what a table states about the application situation in terms of the rows in it, or what rows a query is asking for in terms of its inputs, its operator and the application situation--except in special cases. (Eg it doesn't make sense to talk about a row stating something by being in a table.) This is perhaps the most profound way that SQL violates the relational model and is pointlessly harder to use.

Update table based on other 2 related tables

I have a strange problem. I got some data for cities, regions and countries in CSV format and imported them into MySQL tables.
I have 3 tables and their fields
1. City : id, name, country_code, region_number
2. Region : region_number, country_code, name
3. Country : country_code, name
Now things get a little complicated, as I added an auto-generated id column to the region table, so the region x for country y would be unique.
The thing is: Now i am trying to update city field region_number to hold this unique value (the new id column in region) so I can have relations city->region.
The relation region->country or country->region is OK.
Is it possible to write an update query that would update city region_code (or fill some new column, eg. region_id) with correct values?
If not an query, what could I use to get the correct values into the cities table?
I have arround 3 million records!
If I understant correctly, I think you are looking for something like this:
UPDATE
City inner join Region
on City.country_code = Region.country_code
and City.region_number = Region.region_number
SET
City.new_column = Region.id
However, since there's a relation already between City and Region, I am not sure this is the right thing to do, since it will make the table not normalized.
Now i am trying to update city field region_number to not hold this unique value
The only way you can do this is if the region_number uniquely identifies each region - and if that's already the case then you are wasting your time by creating redundant references. Although frankly, if these really are your table structures, there's no reason for using surrogate keys. And if there's no reason for using surrogate keys then the region and country table are redundant.

What is the most efficient way to make this MySQL database?

Ok, I have a database with with a table for storing classified posts, each post belongs to a different city. For the purpose of this example we will call this table posts. This table has columns:
id (INT, +AI),
cityid (TEXT),
postcat (TEXT),
user (TEXT),
datatime (DATETIME),
title (TEXT),
desc (TEXT),
location (TEXT)
an example of that data would be:
'12039',
'fayetteville-nc',
'user#gmail.com',
'December 28th, 2010 - 11:55 PM',
'post title',
'post description',
'spring lake'
id is auto incremented, cityid is in text format (this is where I think i will be losing performance once the database is large)...
Originally I planned on having a different table for each city and now since a user has to have the option of searching and posting through multiple cities, I think I need them all in one table. Everything was perfect when I had one city per table, where I could:
SELECT *
FROM `posts`
WHERE MATCH (`title`, `desc`, `location`)
AGAINST ('searchtext' IN BOOLEAN MODE)
AND `postcat` LIKE 'searchcatagory'
But then I ran into problems when trying to search multiple cities at one time, or listing all of a users posts for them to delete or edit.
So looks like I have to have one table with all the posts, and also match another FULLTEXT field: cityid. I am guessing I need full-text because if a user chooses an entire state, and my cityid is "fayetteville-nc" I would need to match cityid against "-nc" this is only an assumption and I would love another way. This database could easily reach over a million rows within 6 months, and a fulltext search against 4 columns is probably going to be slow.
My question is, is there a better way to do this more efficiently? The database has nothing in it now, except for some test posts made by me. So I can completely redesign the table structure if necessary. I am open to any and all suggestions, even if it is just a more efficient way to perform my query.
Yes, one table for all posts sounds sensible. It would also be normal design for the posts table to have a city_id, referring to the id in a city table. Each city would also have a state_id, referring to the id in a state table, and similarly each state would have a country_id referring to the id in a country table. So you could write:
SELECT $columns
FROM posts JOIN city ON city.id = posts.city_id
WHERE city.tag = 'fayetteville-nc'
Once you've brought the cities into a separate table, it might make more sense for you to do the city-to-city_id resolving up front. This fairly naturally happens if you have a city chose from a dropdown, for instance. But if you're entering free text into a search field, you may want to do it differently.
You can also search for all posts in a given state (or set of states) as:
SELECT $columns
FROM posts
JOIN city ON city.id = posts.city_id
JOIN state ON state.id = city.state_id
WHERE state.tag = 'NC'
If you're going to go more fancy or international, you may need a more flexible way of arranging locations into a hierarchy (e.g. you may want city districts, counties, multinational regions, intranational regions (Midwest, East Coast etc)) but stay easy for now :)

Basic table design Q

I am trying to understand this concept. For example: I have two tables City and Country.
Country
-------
id
abbreviation
name
City
-----
id
name
Country (name or id, or both? - This is the question)
To reference and keep a particular city in sync with the country it belongs to I guess this will be reference to country.id as a FK. This means an example of the city table will be: (200, New York, 19) - where 19 = USA in country table. But this doesn't help a person viewing the table because he wont know what 19 is without looking up in country table what 19 is.
So I want to add the country name also to city table so it reads: (200, New York, USA). I don't need the 19 to display because 19 is of no use to the reader but is only used in back to connect the tables.
So what should my tables colunms / FK look like to i can store in city table rows like this (200, New York, USA), yet ensure New york will always reference to USA in the USA lookup and keep the 19 which is the primary key for USA out of the city table so the tables look clean and easy to understand? And I assume if these are referenced, tomorrow if i update USA to be 20, it will update in the city table on its own, and same way if I rename USA to US it will update on city table on its own?
My DB is in MySQL
Why not use ISO 3166 country codes (2 character or 3 character) as the country ID? This leaves you with recognizable codes in the city table; you can map to the full name in the country table.
As for viewing the data, use a VIEW to create a good looking table:
CREATE VIEW CityInfo(CityID, CityName, CountryID, CountryName) AS
SELECT ci.id, ci.name, ci.country, co.name
FROM City AS ci JOIN Country AS co ON ci.Country = co.id;
You don't ... if you need to have "usable" tables in the DB, so someone can easily view something useful with select * etc. (or to make programing the SQL by hand easier). Then you create the tables as above, in normal form, and then create a VIEW which combines the tables.
Tables aren't supposed to be viewed by people: some application should be accessing the tables to present data to people in a way they can parse it. The reason you want to have country.id as a FK in your table is so that you don't have a million rows whose country name is "USA", because then all kinds of problems can occur, like what happens when you mistype and "US A" lands in one of the fields? Or what if you want to change what the user sees from "USA" to "United States"?
The right way to handle it is to use the country.id as you initially suggest, and then use a JOIN statement to present the data, like this:
SELECT city.name, country.name
FROM city JOIN country ON country.id = city.country
My syntax could be off, but that's in essence what you want.