Searching a MySQL database with over 3 million entries

I'm using the Geonames database for a hotel booking website. The database has two tables, one for countries and one for cities, the latter with over 3 million entries. If I try to get all the cities for a specific country, the query is too slow. I think it is because I don't have any index defined.
The countries table has the following fields:
iso_alpha2 (country code)
name
continent
population
The cities table has the following fields:
name
asciiname
alternate_names
country
The "country" field from the cities table relates to "iso_alpha2" field in the countries table.
How can I speed up the query?
P.S. I'm using MySQL.

You need to add an index on the field that you use in the WHERE clause (in your case it seems to be the country field).
Edit: one more thing. If you have multiple conditions in the WHERE clause, a single composite index containing all the fields used in that clause usually works best. Separate single-column indexes are generally less effective, because MySQL typically uses only one index per table for such a query (it can sometimes combine them with an index merge, but a composite index is generally the better choice). In your case, however, the index on the country field should do.

For this query, you'd only need the cities table:
select name from cities where country = 'US'
This query would benefit from an index on country.
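A minimal sketch of that advice, runnable with Python's built-in sqlite3 module. SQLite is used here only as a stand-in so the example runs anywhere; the index name idx_cities_country and the sample rows are made up. The CREATE INDEX statement itself is also valid in MySQL, where you would check the plan with EXPLAIN instead of EXPLAIN QUERY PLAN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cities (name TEXT, asciiname TEXT, alternate_names TEXT, country TEXT)"
)
# Made-up sample rows, just enough to exercise the query.
conn.executemany(
    "INSERT INTO cities (name, asciiname, alternate_names, country) VALUES (?, ?, ?, ?)",
    [
        ("New York", "New York", "", "US"),
        ("Boston", "Boston", "", "US"),
        ("Paris", "Paris", "", "FR"),
    ],
)

# Hypothetical index name; the column is the one used in the WHERE clause.
conn.execute("CREATE INDEX idx_cities_country ON cities (country)")

query = "SELECT name FROM cities WHERE country = 'US'"

# The last column of each plan row is the human-readable description.
plan = " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query))
us_cities = sorted(row[0] for row in conn.execute(query))

print(plan)
print(us_cities)
```

If the plan text mentions the index name, the lookup is going through the index rather than scanning the whole table.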

Related

Inner join in MySQL takes a long time

I have a contacts table with more than 1,000,000 records and a cities table with about 20,000 records. I need to fetch all the cities that are used in the contacts table.
The contacts table has the following columns:
Id, name, phone, email, city, state, country, postal, address, manager_Id
The cities table has:
Id, city
I used an inner join for this, but it takes a long time: the query needs more than 2 minutes to execute.
This is the query I used:
SELECT cities.* FROM cities
INNER JOIN contacts ON contacts.City = cities.city
WHERE contacts.manager_Id= 1
I created an index on manager_Id as well, but it is still very slow.
For better performance you could add these indexes:
on table cities, an index on column city
on table contacts, a composite index on columns (manager_id, city)
Filter contacts first and then join to cities:
SELECT ct.*
FROM cities ct INNER JOIN (
SELECT city FROM contacts
WHERE manager_Id = 1
) cn ON cn.city = ct.city
You need indexes for city in both tables and for manager_id in contacts.
Others have pointed out that you need a proper index; let me take that a bit further for clarification. You are specifically looking for contacts where the manager ID = 1. That is not expected to be one person, but could be many people. Having the manager ID in the first position of the index optimizes the "get me all people for that manager" part. By also having the city in the index, via (manager_id, city), you put both data elements the query needs into the index itself. This way the engine does not have to go to the raw data pages to get the other column of interest.
From that, you want all the city information (hence the join to the cities table on the city value).
Since you are only querying the cities and not the actual contact information, you probably want the DISTINCT cities. Let's say a manager is responsible for 50 people and most of them live in the same or neighboring cities. You may have 5 distinct cities. That too will limit the result set being joined.
Having said that, I would do it as follows. With MySQL, using STRAIGHT_JOIN can help by telling the optimizer "do the query as I wrote it, don't think for me".
select STRAIGHT_JOIN
cty.*
from
( select distinct c.City
from Contacts c
where c.Manager_ID = 1 ) PQ
JOIN Cities cty
on PQ.City = Cty.City
The "PQ" is an alias representing my "pre-query" of just DISTINCT cities for a given manager.
Again, have one index on the Contacts table on (manager_id, city). On the Cities table, I would expect an index on (city).
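The pre-query idea with both indexes can be sketched as follows. This runs on SQLite through Python's sqlite3 module purely so it can be executed as-is. STRAIGHT_JOIN is left out because it is MySQL-specific, and the index names and sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cities (Id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE contacts (Id INTEGER PRIMARY KEY, name TEXT, city TEXT, manager_Id INTEGER);

-- Hypothetical index names; the columns follow the advice above.
CREATE INDEX idx_cities_city ON cities (city);
CREATE INDEX idx_contacts_mgr_city ON contacts (manager_Id, city);

-- Made-up sample data: manager 1 has three contacts in two cities.
INSERT INTO cities (Id, city) VALUES (1, 'Austin'), (2, 'Boston'), (3, 'Chicago');
INSERT INTO contacts (name, city, manager_Id) VALUES
  ('Ann', 'Austin', 1), ('Bob', 'Austin', 1), ('Cid', 'Boston', 1), ('Dee', 'Chicago', 2);
""")

# The derived table "pq" reduces contacts to the distinct cities of one
# manager before the join to cities happens.
rows = conn.execute("""
SELECT cty.Id, cty.city
FROM (SELECT DISTINCT c.city FROM contacts c WHERE c.manager_Id = 1) pq
JOIN cities cty ON pq.city = cty.city
ORDER BY cty.Id
""").fetchall()

print(rows)
```

Each city comes back once, even though two of the manager's contacts share a city.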
You need two indexes, one on each table.
On the contacts table, first index manager_Id, then City
CREATE INDEX idx_contacts_mgr_city ON contacts(manager_Id, City);
On the cities table, just index City.
Is the 'City' field from the table 'Contacts' a VARCHAR?
If that's the case, I see multiple things here.
First of all, since you already have the 'Id' for the corresponding city in your 'cities' table, I don't see why you wouldn't use that same 'Id' in the 'Contacts' table.
You can add an 'IdCity' field to the 'Contacts' table, so you don't have to modify the existing columns.
You'll have to populate 'IdCity' for each of your existing records though, either manually or with a query that matches the city name in 'Contacts' against the 'cities' table and stores the corresponding 'Id'.
Returning to your query:
Then use an INT JOIN instead of a VARCHAR JOIN. Since you have many records, this can make a significant difference in performance.
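A rough sketch of that migration, again run on SQLite via Python's sqlite3 module so it can be tried as-is. It assumes city names are unique in 'cities'; the IdCity column follows the suggestion above, while the index name and sample rows are invented. On MySQL the backfill could equally be written as an UPDATE with a JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cities (Id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE contacts (Id INTEGER PRIMARY KEY, name TEXT, city TEXT, manager_Id INTEGER);

-- Made-up sample data.
INSERT INTO cities (Id, city) VALUES (1, 'Austin'), (2, 'Boston'), (3, 'Chicago');
INSERT INTO contacts (name, city, manager_Id) VALUES
  ('Ann', 'Austin', 1), ('Bob', 'Austin', 1), ('Cid', 'Boston', 1), ('Dee', 'Chicago', 2);

-- Add the integer reference and backfill it by matching on the city name.
-- Assumes each city name occurs only once in cities.
ALTER TABLE contacts ADD COLUMN IdCity INTEGER;
UPDATE contacts
SET IdCity = (SELECT cities.Id FROM cities WHERE cities.city = contacts.city);

-- Hypothetical index name; supports filtering by manager, then joining on the id.
CREATE INDEX idx_contacts_mgr_idcity ON contacts (manager_Id, IdCity);
""")

# The join now compares integers instead of strings.
rows = conn.execute("""
SELECT DISTINCT cities.Id, cities.city
FROM cities
INNER JOIN contacts ON contacts.IdCity = cities.Id
WHERE contacts.manager_Id = 1
ORDER BY cities.Id
""").fetchall()

print(rows)
```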
It looks like you need to add two indexes, one on cities.city and one on (contacts.manager_Id, contacts.city). That should speed things up significantly.

MySQL code to display data from two tables as a single one

I have two tables, States and Cities.
In the teststates table there is data about the states of different countries.
The code column holds the id associated with each state.
In the Cities table:
there is a region column which holds the same id as the code column in the teststates table.
Requirement:
I want MySQL code that fetches the id from the teststates table column and substitutes it into the region column of the testcities table, since I want only two columns, city name and region, in the testcities table. Please help me!
select a.name,b.region from teststates a,testcities b where a.country = b.country
This returns a.name (from teststates) and b.region (from testcities), joining the two tables on the country column in each.
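If the goal is to show each city next to the name of its state, joining on the code/region pair described in the question may be closer to what is wanted. A hedged sketch on SQLite via Python's sqlite3 module: the name and country columns in both tables are assumptions (the question does not list the full schema) and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed columns; only code (teststates) and region (testcities) come from the question.
CREATE TABLE teststates (code INTEGER, name TEXT, country TEXT);
CREATE TABLE testcities (name TEXT, region INTEGER, country TEXT);

INSERT INTO teststates VALUES (1, 'Texas', 'US'), (2, 'Ohio', 'US');
INSERT INTO testcities VALUES ('Austin', 1, 'US'), ('Dayton', 2, 'US');
""")

# Joining on country alone pairs every city with every state of that country.
by_country = conn.execute("""
SELECT b.name, a.name
FROM teststates a, testcities b
WHERE a.country = b.country
""").fetchall()

# Joining on code = region pairs each city with its own state only.
by_code = conn.execute("""
SELECT b.name AS city, a.name AS region
FROM testcities b
INNER JOIN teststates a ON a.code = b.region
ORDER BY b.name
""").fetchall()

print(len(by_country))
print(by_code)
```

If state codes are only unique within a country, the join condition would also need a.country = b.country.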

SQL QUERY - How to extract last child of a table?

I have a MySQL table of groups of people organised by country, region, sub-region and city.
When a visitor joins a group, he selects a city and we automatically add him to the parent groups ("sub-region", "region", "country").
For example: John selected London, so he will be added to the groups London, Greater London, England and UK.
We get a parent-child table like this:
http://prntscr.com/3kui69
I need to extract all the rows for the city groups. How can a city group be recognised? It is the only kind of row whose ID does not appear in the id_parent field of any other row.
Yes! City group rows can't be parents of other groups, so a city group's id never appears in an id_parent field.
Now that we know this, how can I extract the city group rows with SQL? It is too complicated for me.
Thanks in advance.
Please try using this query:
select * from table
where id not in (select id_parent from table);
Here is an example: http://sqlfiddle.com/#!2/7de26/1
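One thing to watch with NOT IN: if any row has NULL in id_parent (for example the top-level country row), the subquery returns a NULL and the NOT IN condition matches nothing. A minimal sketch of the problem and a NOT EXISTS alternative, run on SQLite via Python's sqlite3 module. The table name groups_tbl, the sample rows and the NULL parent for the root are assumptions, since the real table is only shown in the screenshot.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table name and data; the root row is assumed to have a NULL parent.
CREATE TABLE groups_tbl (id INTEGER PRIMARY KEY, name TEXT, id_parent INTEGER);
INSERT INTO groups_tbl VALUES
  (1, 'UK', NULL),
  (2, 'England', 1),
  (3, 'Greater London', 2),
  (4, 'London', 3);
""")

# NOT IN against a list containing NULL never evaluates to true.
not_in_rows = conn.execute("""
SELECT name FROM groups_tbl
WHERE id NOT IN (SELECT id_parent FROM groups_tbl)
""").fetchall()

# NOT EXISTS is unaffected by NULL parents: keep rows that no other row points to.
not_exists_rows = conn.execute("""
SELECT g.name FROM groups_tbl g
WHERE NOT EXISTS (SELECT 1 FROM groups_tbl c WHERE c.id_parent = g.id)
""").fetchall()

print(not_in_rows)
print(not_exists_rows)
```

Adding WHERE id_parent IS NOT NULL inside the NOT IN subquery is another way around the same issue.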

Update table based on other 2 related tables

I have a strange problem. I got some data for cities, regions and countries in CSV format and imported them into MySQL tables.
I have 3 tables and their fields
1. City : id, name, country_code, region_number
2. Region : region_number, country_code, name
3. Country : country_code, name
Now things get a little complicated, as I added an auto-generated id column to the region table, so that region x for country y would be unique.
The thing is: now I am trying to update the city field region_number to hold this unique value (the new id column in region) so that I can have the relation city->region.
The relation region->country or country->region is OK.
Is it possible to write an update query that would update the city region_code (or fill some new column, e.g. region_id) with the correct values?
If not with a query, what could I use to get the correct values into the cities table?
I have around 3 million records!
If I understand correctly, I think you are looking for something like this:
UPDATE
City inner join Region
on City.country_code = Region.country_code
and City.region_number = Region.region_number
SET
City.new_column = Region.id
However, since there's a relation already between City and Region, I am not sure this is the right thing to do, since it will make the table not normalized.
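The same backfill can be tried out on a small scale before running it against 3 million rows. This sketch uses SQLite through Python's sqlite3 module; because the UPDATE ... INNER JOIN syntax above is MySQL-specific, it is written here as a correlated subquery. The region_id column name comes from the question's own suggestion, and the index name and sample rows are made up. An index on Region (country_code, region_number) is likely to matter at that table size.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Region (
  id INTEGER PRIMARY KEY,   -- the new auto-generated id
  region_number TEXT,
  country_code TEXT,
  name TEXT
);
CREATE TABLE City (
  id INTEGER PRIMARY KEY,
  name TEXT,
  country_code TEXT,
  region_number TEXT,
  region_id INTEGER         -- new column to be filled
);

-- Hypothetical index name; lets each lookup by (country, region number) use the index.
CREATE INDEX idx_region_country_number ON Region (country_code, region_number);

-- Made-up rows: the same region_number exists in two countries.
INSERT INTO Region (region_number, country_code, name) VALUES
  ('01', 'US', 'Alabama'), ('01', 'FR', 'Alsace');
INSERT INTO City (name, country_code, region_number) VALUES
  ('Mobile', 'US', '01'), ('Colmar', 'FR', '01'), ('Nowhere', 'XX', '99');

-- Match on both country_code and region_number, as in the join above.
UPDATE City
SET region_id = (
  SELECT Region.id FROM Region
  WHERE Region.country_code = City.country_code
    AND Region.region_number = City.region_number
);
""")

rows = conn.execute("SELECT name, region_id FROM City ORDER BY id").fetchall()
print(rows)
```

Cities without a matching region keep a NULL region_id, which makes them easy to find afterwards.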
"Now I am trying to update city field region_number to hold this unique value"
The only way you can do this is if the region_number uniquely identifies each region - and if that's already the case then you are wasting your time by creating redundant references. Although frankly, if these really are your table structures, there's no reason for using surrogate keys. And if there's no reason for using surrogate keys then the region and country table are redundant.

Normalizing MySQL table with records of another table

I have 2 tables. The city table is not normalized because the country information is in plain text. I have added the id_country column to the 'city' table (that column is empty).
I need to check for matches between city>country and country>country and then update the matching city records with the id_country from the country table. At the end I will be able to delete the 'country' column from the city table.
City table
id_city (1, 2, 3...)
city (Washington, Guayaquil, Bonn...)
country (Germany, Ecuador, USA...)
id_country (currently empty)
Country table
id_country (1, 2, 3...)
code (GE, EC, US...)
country (Germany, Ecuador, USA...)
I have no idea where to start or whether it can be done with a SQL query. My original idea was to search for matches in a PHP loop, but that seems to be a much harder implementation.
You can do this with a JOIN on an UPDATE statement.
UPDATE city c1 INNER JOIN country c2 ON c1.country=c2.country
SET c1.id_country=c2.id_country;
Using an INNER JOIN will make sure that updates only occur for cities that have a matching country value.
Once you've run it, you can select all the cities that still have a null id_country, in case some of them didn't match. Conversely, once you've confirmed that all your cities have an id_country, you can delete the country column from the city table.
"The city tables is not normalized because the country information is in plain text."
Nonsense. Normalization doesn't mean "replace plain text with id numbers". Find whoever taught you that and poke him in the eye with a sharp stick.
Your real problem is that "city" plus "country" isn't sufficient to identify cities, at least in the USA. I think there are at least a dozen different cities named "Washington" in the USA.
Instead of replacing the country name with an id number, you'd be far better off replacing it with the two-letter country code. The codes are human-readable; the id numbers will require an additional JOIN in every query that uses your table of cities.
Something like this should work:
UPDATE city set id_country = (SELECT country.id_country from country WHERE country.country = city.country)
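A small end-to-end run of this subquery form, followed by a check for cities that found no matching country. It uses SQLite through Python's sqlite3 module so it can be executed directly. The table layout and the first three rows follow the question; the unmatched 'Lima' row is made up to show what the check catches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country (id_country INTEGER PRIMARY KEY, code TEXT, country TEXT);
CREATE TABLE city (id_city INTEGER PRIMARY KEY, city TEXT, country TEXT, id_country INTEGER);

INSERT INTO country VALUES (1, 'GE', 'Germany'), (2, 'EC', 'Ecuador'), (3, 'US', 'USA');
-- The last row is invented: its country has no entry in the country table.
INSERT INTO city (id_city, city, country) VALUES
  (1, 'Washington', 'USA'), (2, 'Guayaquil', 'Ecuador'), (3, 'Bonn', 'Germany'),
  (4, 'Lima', 'Peru');

UPDATE city
SET id_country = (SELECT country.id_country FROM country
                  WHERE country.country = city.country);
""")

matched = conn.execute("""
SELECT city, id_country FROM city
WHERE id_country IS NOT NULL
ORDER BY id_city
""").fetchall()

# Rows to fix by hand before dropping the plain-text country column.
unmatched = conn.execute(
    "SELECT city, country FROM city WHERE id_country IS NULL"
).fetchall()

print(matched)
print(unmatched)
```

Note that this form assumes each country name appears only once in the country table.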