I have a strange problem. I got some data for cities, regions and countries in CSV format and imported them into MySQL tables.
I have 3 tables with the following fields:
1. City : id, name, country_code, region_number
2. Region : region_number, country_code, name
3. Country : country_code, name
Things get a little complicated because I added an auto-generated id column to the Region table, so that region x of country y has a unique identifier.
The thing is: now I am trying to update the City field region_number to hold this unique value (the new id column in Region) so that I can have a city->region relation.
The relation region->country or country->region is OK.
Is it possible to write an update query that would update the City region_number (or fill some new column, e.g. region_id) with the correct values?
If not a query, what could I use to get the correct values into the cities table?
I have around 3 million records!
If I understand correctly, I think you are looking for something like this:
UPDATE
City inner join Region
on City.country_code = Region.country_code
and City.region_number = Region.region_number
SET
City.new_column = Region.id
However, since there's a relation already between City and Region, I am not sure this is the right thing to do, since it will make the table not normalized.
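To try the idea on a toy data set, here is a minimal sketch in Python. It uses the standard sqlite3 module as a stand-in for MySQL (an assumption, made only so the snippet runs on its own), and region_id is a hypothetical name for the new column. Because SQLite may not accept the UPDATE ... INNER JOIN form, the sketch uses a correlated subquery that does the same lookup on the (country_code, region_number) pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Region (
    id INTEGER PRIMARY KEY AUTOINCREMENT,  -- the new surrogate key
    region_number TEXT,
    country_code TEXT,
    name TEXT
);
CREATE TABLE City (
    id INTEGER PRIMARY KEY,
    name TEXT,
    country_code TEXT,
    region_number TEXT,
    region_id INTEGER                      -- hypothetical new column
);
INSERT INTO Region (region_number, country_code, name) VALUES
    ('01', 'AA', 'Region one of AA'),
    ('01', 'BB', 'Region one of BB');
INSERT INTO City (id, name, country_code, region_number) VALUES
    (1, 'City in AA', 'AA', '01'),
    (2, 'City in BB', 'BB', '01'),
    (3, 'City without region', 'CC', '99');
""")

# Look up each city's region by the (country_code, region_number) pair.
# Unlike the INNER JOIN form, cities without a match end up with NULL.
conn.execute("""
UPDATE City
SET region_id = (
    SELECT Region.id
    FROM Region
    WHERE Region.country_code = City.country_code
      AND Region.region_number = City.region_number
)
""")

rows = conn.execute("SELECT id, region_id FROM City ORDER BY id").fetchall()
print(rows)
```

With millions of city rows, a composite index on Region (country_code, region_number) would likely help the lookup.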
"Now I am trying to update city field region_number to hold this unique value"
The only way you can do this is if the region_number uniquely identifies each region - and if that's already the case then you are wasting your time by creating redundant references. Although frankly, if these really are your table structures, there's no reason for using surrogate keys. And if there's no reason for using surrogate keys then the region and country table are redundant.
Hello, I have a string stored in my database as comma-separated values,
e.g. (new south wales,Queensland,etc,etc).
Now my problem is that when I search for Queensland I get no result, but when I search for new south wales I get the record.
But I also want to get a result when I search for queen, etc.
I am new to PHP, so please help...
Short Term Solution
Use the FIND_IN_SET function:
WHERE FIND_IN_SET('Queensland', csv_column)
...because using LIKE with wildcards on both ends is risky: depending on the pattern it can match too much or too little, and the leading wildcard forces a table scan. Performance of LIKE with wildcards on both sides is on par with REGEXP, and that means bad.
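To illustrate the matching behaviour, here is a rough sketch in Python with the standard sqlite3 module. SQLite has no FIND_IN_SET, so the sketch registers a small Python emulation under that name. That emulation, the things table and its rows are all assumptions for the demo. It compares case-sensitively, whereas MySQL's own comparison depends on the collation. Note that only whole list elements match, so a partial term such as 'Queen' finds nothing.

```python
import sqlite3

def find_in_set(needle, csv):
    """Rough emulation of MySQL's FIND_IN_SET: 1-based position, 0 if absent."""
    if needle is None or csv is None:
        return None
    items = csv.split(",")
    return items.index(needle) + 1 if needle in items else 0

conn = sqlite3.connect(":memory:")
conn.create_function("FIND_IN_SET", 2, find_in_set)
conn.execute("CREATE TABLE things (id INTEGER PRIMARY KEY, csv_column TEXT)")
conn.executemany(
    "INSERT INTO things VALUES (?, ?)",
    [(1, "new south wales,Queensland"), (2, "Victoria")],
)

# A whole element matches...
exact = conn.execute(
    "SELECT id FROM things WHERE FIND_IN_SET('Queensland', csv_column)"
).fetchall()
# ...but a fragment of an element does not.
partial = conn.execute(
    "SELECT id FROM things WHERE FIND_IN_SET('Queen', csv_column)"
).fetchall()
print(exact, partial)
```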
Long Term Solution
Don't store comma separated values -- use a proper many-to-many relationship, involving three tables:
Things
  thing_id (primary key)
Australian States
  State_id (primary key)
  State_name
Things_to_Auz_States
  thing_id (primary key, foreign key to THINGS table)
  State_id (primary key, foreign key to AUSTRALIAN_STATES table)
You'll need JOINs to get data out of the three tables, but if you want to know things like how many things are associated with a particular state, or with two particular states, it's the proper model.
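A minimal sketch of that three-table model, written in Python with the standard sqlite3 module as a stand-in for MySQL (an assumption so the snippet is self-contained). The lower-cased table names and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE things (thing_id INTEGER PRIMARY KEY);
CREATE TABLE australian_states (
    state_id INTEGER PRIMARY KEY,
    state_name TEXT NOT NULL UNIQUE
);
-- Junction table: one row per (thing, state) association.
CREATE TABLE things_to_auz_states (
    thing_id INTEGER NOT NULL REFERENCES things (thing_id),
    state_id INTEGER NOT NULL REFERENCES australian_states (state_id),
    PRIMARY KEY (thing_id, state_id)
);
INSERT INTO things (thing_id) VALUES (1), (2), (3);
INSERT INTO australian_states VALUES (1, 'New South Wales'), (2, 'Queensland');
INSERT INTO things_to_auz_states VALUES (1, 1), (1, 2), (2, 2);
""")

# Which things are associated with Queensland?
in_queensland = [row[0] for row in conn.execute("""
SELECT t.thing_id
FROM things t
JOIN things_to_auz_states ts ON ts.thing_id = t.thing_id
JOIN australian_states s ON s.state_id = ts.state_id
WHERE s.state_name = 'Queensland'
ORDER BY t.thing_id
""")]

# How many things are associated with each state?
per_state = conn.execute("""
SELECT s.state_name, COUNT(*)
FROM australian_states s
JOIN things_to_auz_states ts ON ts.state_id = s.state_id
GROUP BY s.state_name
ORDER BY s.state_name
""").fetchall()
print(in_queensland, per_state)
```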
Not really what you were asking, but just to be complete: you're going to have a lot of trouble unless you change your approach.
The correct way:
TableOne
--------
ThingID
TableTwo
--------
ThingID
Province
Then your database query becomes:
SELECT fields FROM TableOne WHERE ThingID IN
(SELECT ThingID from TableTwo WHERE Province = 'Queensland')
And what do you want to have happen when they search for "Australia"? Get back both Western Australia and South Australia?
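By using REGEXP (the pattern has to be quoted and escaped before it goes into the query):
$result = mysql_query("SELECT * FROM table WHERE column REGEXP '" . mysql_real_escape_string($your_search_string) . "'");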
I have 2 tables. The city table is not normalized because the country information is in plain text. I have added an id_country column to the city table (that column is currently empty).
I need to match city.country against country.country and then update the matching city records with the id_country from the country table. At the end I will be able to delete the country column from the city table.
City table
id_city (1, 2, 3...)
city (Washington, Guayaquil, Bonn...)
country (Germany, Ecuador, USA...)
id_country (currently empty)
Country table
id_country (1, 2, 3...)
code (GE, EC, US...)
country (Germany, Ecuador, USA...)
I have no idea where to start or whether it can be done with a SQL query. My original idea was to search for matches in a PHP loop, but that seems to be a much harder implementation.
You can do this with a JOIN on an UPDATE statement.
UPDATE city c1 INNER JOIN country c2 ON c1.country=c2.country
SET c1.id_country=c2.id_country;
Using an INNER JOIN will make sure that updates only occur for cities that have a matching country value.
Once you've run it, you can select all the cities that still have a NULL id_country, in case some of them didn't match. Conversely, once you've confirmed that every city has an id_country, you can drop the country column from the city table.
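The update-then-verify workflow can be sketched as follows. This is an illustration in Python using the standard sqlite3 module in place of MySQL (an assumption), with made-up rows. Since SQLite may not accept UPDATE ... INNER JOIN, the sketch uses a correlated subquery guarded by WHERE EXISTS, which likewise touches only cities that have a matching country.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country (id_country INTEGER PRIMARY KEY, code TEXT, country TEXT);
CREATE TABLE city (
    id_city INTEGER PRIMARY KEY,
    city TEXT,
    country TEXT,
    id_country INTEGER
);
INSERT INTO country VALUES (1, 'EC', 'Ecuador'), (2, 'US', 'USA');
INSERT INTO city (id_city, city, country) VALUES
    (1, 'Guayaquil', 'Ecuador'),
    (2, 'Washington', 'USA'),
    (3, 'Bonn', 'Germny');  -- deliberate misspelling, so no match
""")

# Fill id_country only for cities whose country text matches a country row.
conn.execute("""
UPDATE city
SET id_country = (
    SELECT c2.id_country FROM country c2 WHERE c2.country = city.country
)
WHERE EXISTS (
    SELECT 1 FROM country c2 WHERE c2.country = city.country
)
""")

matched = conn.execute(
    "SELECT city, id_country FROM city "
    "WHERE id_country IS NOT NULL ORDER BY id_city"
).fetchall()
# Rows that still need attention before the country column can be dropped.
unmatched = conn.execute(
    "SELECT id_city, city, country FROM city WHERE id_country IS NULL"
).fetchall()
print(matched, unmatched)
```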
"The city tables is not normalized because the country information is in plain text."
Nonsense. Normalization doesn't mean "replace plain text with id numbers". Find whoever taught you that and poke him in the eye with a sharp stick.
Your real problem is that "city" plus "country" isn't sufficient to identify cities, at least in the USA. I think there are at least a dozen different cities named "Washington" in the USA.
Instead of replacing the country name with an id number, you'd be far better off replacing it with the two-letter country code. The codes are human-readable; the id numbers will require an additional JOIN in every query that uses your table of cities.
Something like this should work:
UPDATE city set id_country = (SELECT country.id_country from country WHERE country.country = city.country)
I have 4 tables, namely countries, states, cities and areas.
Apart from the countries table, the other three (states, cities, areas) contain a country_id foreign key.
I wanted to return the total number of rows for a given country_id across the three tables, for which I used jon_darstar's solution. Here is the code I am using:
SELECT COUNT(DISTINCT(states.id)) + COUNT(DISTINCT(cities.id)) + COUNT(DISTINCT(areas.id))
FROM states
JOIN cities on cities.country_id = states.country_id
JOIN areas on areas.country_id = states.country_id
WHERE states.country_id IN (118);
The above code works perfectly fine, although I am unable to understand it properly, mainly the first line, i.e.
SELECT COUNT(DISTINCT(states.id)) + COUNT(DISTINCT(cities.id)) + COUNT(DISTINCT(areas.id))
Question 1: doesn't that select the primary id of the three tables states, cities and areas and count them? I know from the result I am getting that this is not what happens, so what is actually happening here?
However, if I remove DISTINCT from the query it shows me a different result, i.e. a count of 120, whereas it should actually be 15 (the number of rows with that country_id in all three tables).
Question 2: What changes when I use DISTINCT versus removing it? Isn't DISTINCT supposed to remove duplicate values? Where is the duplication happening here?
thank you..
For example, suppose country A (primary id a.id = 118) contains state B (primary id b.id), that state contains city C (primary id c.id), and city C contains areas D (primary id d.id), E (e.id) and F (f.id). Let's visualize the query result as a database table.
C St Ct Ar
A->B->C->D
A->B->C->E
A->B->C->F
(Here C=Country,St=States,Ct=Cities,Ar=Areas)
Now think about what happens when you count on the above table without DISTINCT to get the number of states within country A. The result is 3; in the same way the number of cities is 3 and the number of areas is 3, for a total of 9, because without DISTINCT you are counting duplicate values that you're not interested in.
Now, if you use a distinct count you'll get the correct result, because the number of distinct states under country A is 1, cities 1 and areas 3, for a total of 5 (excluding duplicate values).
Hope this works!
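The arithmetic can be checked with a small sketch in Python using the standard sqlite3 module (standing in for MySQL, which is an assumption). The three tables hold exactly the rows from the example: one state, one city and three areas for country 118.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE states (id INTEGER PRIMARY KEY, country_id INTEGER);
CREATE TABLE cities (id INTEGER PRIMARY KEY, country_id INTEGER);
CREATE TABLE areas  (id INTEGER PRIMARY KEY, country_id INTEGER);
INSERT INTO states VALUES (1, 118);
INSERT INTO cities VALUES (1, 118);
INSERT INTO areas  VALUES (1, 118), (2, 118), (3, 118);
""")

# {d} is replaced by either "DISTINCT" or nothing.
query = """
SELECT COUNT({d} states.id) + COUNT({d} cities.id) + COUNT({d} areas.id)
FROM states
JOIN cities ON cities.country_id = states.country_id
JOIN areas  ON areas.country_id  = states.country_id
WHERE states.country_id IN (118)
"""

# The join yields 1 * 1 * 3 = 3 rows, so every id column appears 3 times.
with_distinct = conn.execute(query.format(d="DISTINCT")).fetchone()[0]
without_distinct = conn.execute(query.format(d="")).fetchone()[0]
print(with_distinct, without_distinct)
```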
!!!Design Issue!!!
I'd like to add something. From your database design I can see that states, cities and areas each reference the countries table by country id, and that you join states to cities and then states to areas on that country id. Don't you think that creates a cross join? A better design is to go bottom-up: keep a city foreign key in the areas table, a state foreign key in cities, and a country foreign key in states. Alternatively, keep the country, state and city foreign keys all in the areas table alongside its primary key.
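If the schema stays as it is, one way to sidestep the cross join is to count each table on its own and add the results. The sketch below is a suggestion, not part of the original solution. It uses Python with the standard sqlite3 module as a stand-in for MySQL (an assumption) and made-up rows. It also shows a side effect of the join-based query: if one of the tables has no rows for a country, the join is empty and the total comes out as 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE states (id INTEGER PRIMARY KEY, country_id INTEGER);
CREATE TABLE cities (id INTEGER PRIMARY KEY, country_id INTEGER);
CREATE TABLE areas  (id INTEGER PRIMARY KEY, country_id INTEGER);
INSERT INTO states (country_id) VALUES (118), (118), (7), (7);
INSERT INTO cities (country_id) VALUES (118), (118), (118), (7);
INSERT INTO areas  (country_id) VALUES (118), (118), (118), (118);
""")

def total_rows(country_id):
    """Count each table independently, then add the three counts."""
    return conn.execute("""
        SELECT (SELECT COUNT(*) FROM states WHERE country_id = :c)
             + (SELECT COUNT(*) FROM cities WHERE country_id = :c)
             + (SELECT COUNT(*) FROM areas  WHERE country_id = :c)
    """, {"c": country_id}).fetchone()[0]

def total_rows_via_join(country_id):
    """The join-based query from the question, for comparison."""
    return conn.execute("""
        SELECT COUNT(DISTINCT states.id)
             + COUNT(DISTINCT cities.id)
             + COUNT(DISTINCT areas.id)
        FROM states
        JOIN cities ON cities.country_id = states.country_id
        JOIN areas  ON areas.country_id  = states.country_id
        WHERE states.country_id = :c
    """, {"c": country_id}).fetchone()[0]

print(total_rows(118), total_rows_via_join(118))
print(total_rows(7), total_rows_via_join(7))
```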
OK, I have a database with a table for storing classified posts; each post belongs to a city. For the purpose of this example we will call this table posts. This table has columns:
id (INT, +AI),
cityid (TEXT),
postcat (TEXT),
user (TEXT),
datatime (DATETIME),
title (TEXT),
desc (TEXT),
location (TEXT)
an example of that data would be:
'12039',
'fayetteville-nc',
'user#gmail.com',
'December 28th, 2010 - 11:55 PM',
'post title',
'post description',
'spring lake'
id is auto-incremented, and cityid is in text format (this is where I think I will be losing performance once the database is large)...
Originally I planned on having a different table for each city, but since a user has to have the option of searching and posting across multiple cities, I think I need them all in one table. Everything was perfect when I had one city per table, where I could:
SELECT *
FROM `posts`
WHERE MATCH (`title`, `desc`, `location`)
AGAINST ('searchtext' IN BOOLEAN MODE)
AND `postcat` LIKE 'searchcatagory'
But then I ran into problems when trying to search multiple cities at one time, or when listing all of a user's posts for them to delete or edit.
So it looks like I have to have one table with all the posts, and also match another FULLTEXT field: cityid. I am guessing I need full-text because if a user chooses an entire state and my cityid is "fayetteville-nc", I would need to match cityid against "-nc". This is only an assumption and I would love another way. This database could easily reach over a million rows within 6 months, and a full-text search against 4 columns is probably going to be slow.
My question is, is there a better way to do this more efficiently? The database has nothing in it now, except for some test posts made by me. So I can completely redesign the table structure if necessary. I am open to any and all suggestions, even if it is just a more efficient way to perform my query.
Yes, one table for all posts sounds sensible. It would also be normal design for the posts table to have a city_id, referring to the id in a city table. Each city would also have a state_id, referring to the id in a state table, and similarly each state would have a country_id referring to the id in a country table. So you could write:
SELECT $columns
FROM posts JOIN city ON city.id = posts.city_id
WHERE city.tag = 'fayetteville-nc'
Once you've brought the cities into a separate table, it might make more sense for you to resolve the city to its city_id up front. This happens fairly naturally if the city is chosen from a dropdown, for instance. But if you're entering free text into a search field, you may want to do it differently.
You can also search for all posts in a given state (or set of states) as:
SELECT $columns
FROM posts
JOIN city ON city.id = posts.city_id
JOIN state ON state.id = city.state_id
WHERE state.tag = 'NC'
If you're going to go more fancy or international, you may need a more flexible way of arranging locations into a hierarchy (e.g. you may want city districts, counties, multinational regions, intranational regions (Midwest, East Coast etc)) but stay easy for now :)
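Here is a minimal sketch of that layout, in Python with the standard sqlite3 module standing in for MySQL (an assumption so the snippet runs by itself). The column lists are trimmed to what the two queries need, and the index name and sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE state (id INTEGER PRIMARY KEY, tag TEXT NOT NULL UNIQUE);
CREATE TABLE city (
    id INTEGER PRIMARY KEY,
    tag TEXT NOT NULL UNIQUE,
    state_id INTEGER NOT NULL REFERENCES state (id)
);
CREATE TABLE posts (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    city_id INTEGER NOT NULL REFERENCES city (id),
    title TEXT
);
-- An integer index replaces matching on the text cityid.
CREATE INDEX idx_posts_city ON posts (city_id);
INSERT INTO state VALUES (1, 'NC'), (2, 'SC');
INSERT INTO city VALUES
    (1, 'fayetteville-nc', 1), (2, 'raleigh-nc', 1), (3, 'columbia-sc', 2);
INSERT INTO posts (city_id, title) VALUES
    (1, 'post a'), (2, 'post b'), (3, 'post c');
""")

# Posts for a single city.
by_city = [row[0] for row in conn.execute("""
SELECT posts.title
FROM posts JOIN city ON city.id = posts.city_id
WHERE city.tag = ?
ORDER BY posts.id
""", ("fayetteville-nc",))]

# Posts for a whole state: no need to match '-nc' inside a text column.
by_state = [row[0] for row in conn.execute("""
SELECT posts.title
FROM posts
JOIN city ON city.id = posts.city_id
JOIN state ON state.id = city.state_id
WHERE state.tag = ?
ORDER BY posts.id
""", ("NC",))]
print(by_city, by_state)
```

The full-text MATCH on title, desc and location can then be combined with these joins, so cityid no longer has to be part of the FULLTEXT search.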
I'm using the Geonames database for a hotel booking website. The database has two tables, one for countries and one for cities with over 3 million entries. If I try to get all the cities for a specific country, the query is too slow. I think this is because I don't have any index defined.
The countries table has the following fields:
iso_alpha2 (country code)
name
continent
population
The cities table has the following fields:
name
asciiname
alternate_names
country
The "country" field from the cities table relates to "iso_alpha2" field in the countries table.
How can I speed up the query?
P.S. I'm using MySQL.
You need to add an index on the field that you use in the WHERE clause (in your case it seems to be the country field).
Edit: one more thing. If you have multiple conditions in the WHERE clause, a single composite index containing all the fields used in that clause is usually far more effective than separate indexes on each field (MySQL can sometimes combine separate indexes through index merge, but that is generally slower). However, in your case I believe that the index on the country field should do.
For this query, you'd only need the cities table:
select name from cities where country = 'US'
This query would benefit from an index on country.
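As a quick check that such an index is actually picked up, here is a sketch in Python with the standard sqlite3 module as a stand-in for MySQL (an assumption). The index name idx_cities_country is made up. The CREATE INDEX statement should have the same form in MySQL, where EXPLAIN plays the role of SQLite's EXPLAIN QUERY PLAN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cities "
    "(name TEXT, asciiname TEXT, alternate_names TEXT, country TEXT)"
)
conn.executemany(
    "INSERT INTO cities (name, country) VALUES (?, ?)",
    [("Boston", "US"), ("Austin", "US"), ("Lyon", "FR")],
)

# Index the column used in the WHERE clause.
conn.execute("CREATE INDEX idx_cities_country ON cities (country)")

# Ask the planner how it would run the query; the last column of each
# plan row is a human-readable description.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM cities WHERE country = 'US'"
).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)

names = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM cities WHERE country = 'US'")
)
print(plan_text)
print(names)
```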