I have three tables named users, cities and countries and these two scenarios:
1) User belongs to city, city belongs to country (deep join)
Table users has 2 fields: id (PK) and city_id (FK).
Table cities has 2 fields: id (PK) and country_id (FK).
Table countries has 2 fields: id (PK) and name.
Get any user's country:
SELECT countries.name
FROM users
LEFT JOIN cities ON users.city_id = cities.id
LEFT JOIN countries ON cities.country_id = countries.id
WHERE users.id = 1;
2) User belongs to city and country, city belongs to country (one join)
Table users has 3 fields: id (PK), city_id (FK) and country_id (FK).
Table cities has 2 fields: id (PK) and country_id (FK).
Table countries has 2 fields: id (PK) and name.
Get any user's country:
SELECT countries.name
FROM users
LEFT JOIN countries ON users.country_id = countries.id
WHERE users.id = 1;
At first glance, scenario 2 seems faster, but is it a good idea to have a country_id FK in the users table just to save one join? Or should I take advantage of the relationships and make a deep join? Which of these two scenarios actually performs faster?
One join is almost always faster than 2 joins, but the question here shouldn't be which is faster but which is more maintainable (also look at When to optimize).
Are you actually having a performance problem? Even though in this case the data probably never changes (at least, cities usually don't change country) there is still a risk that the data between the tables gets out of date. So the question here is, is it fast enough?
These types of optimisations generally give very little benefit in terms of performance but bring in risks that the data will be out of date and it makes things more complex.
In the first situation you are doing primary-key-based lookups across three tables, and in the second you reduce that to two tables. That is what I would consider a micro-optimization. You won't see significant performance returns unless the tables are enormous (millions of rows) or writes are happening quickly enough to cause lock contention.
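To make the "out of date" risk concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the real database. The in-memory database, the sample rows and the helper name country_of are all assumptions made for illustration. It runs both queries from the question against one schema that carries the redundant users.country_id, then moves the city to another country.

```python
import sqlite3

# In-memory SQLite stands in for the real database (assumption); rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE cities (
    id INTEGER PRIMARY KEY,
    country_id INTEGER NOT NULL REFERENCES countries(id)
);
-- Scenario 2 layout: users also carries a redundant country_id.
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    city_id INTEGER NOT NULL REFERENCES cities(id),
    country_id INTEGER NOT NULL REFERENCES countries(id)
);
INSERT INTO countries VALUES (1, 'France'), (2, 'Spain');
INSERT INTO cities VALUES (10, 1);
INSERT INTO users VALUES (1, 10, 1);
""")

DEEP_JOIN = """
SELECT countries.name
FROM users
LEFT JOIN cities ON users.city_id = cities.id
LEFT JOIN countries ON cities.country_id = countries.id
WHERE users.id = ?
"""

ONE_JOIN = """
SELECT countries.name
FROM users
LEFT JOIN countries ON users.country_id = countries.id
WHERE users.id = ?
"""


def country_of(query, user_id):
    """Hypothetical helper: run one of the two queries for a user."""
    row = conn.execute(query, (user_id,)).fetchone()
    return row[0] if row else None


before = (country_of(DEEP_JOIN, 1), country_of(ONE_JOIN, 1))

# Move the city to another country without touching users.country_id.
conn.execute("UPDATE cities SET country_id = 2 WHERE id = 10")
after = (country_of(DEEP_JOIN, 1), country_of(ONE_JOIN, 1))

print(before)  # ('France', 'France')
print(after)   # ('Spain', 'France'): the one-join answer is now stale
```

Both queries agree until the redundant column and the city's country drift apart, which is the maintenance cost the answers above warn about.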
I have a table contacts with more than 1,000,000 records and another table cities with about 20,000 records. I need to fetch all cities that are used in the contacts table.
The contacts table has the following columns:
Id, name, phone, email, city, state, country, postal, address, manager_Id
The cities table has:
Id, city
I used an inner join for this, but the query takes more than 2 minutes to execute.
I used this query:
SELECT cities.* FROM cities
INNER JOIN contacts ON contacts.City = cities.city
WHERE contacts.manager_Id= 1
I created an index on manager_Id as well, but it is still very slow.
For better performance you could add these indexes:
on table cities, an index on column city
on table contacts, a composite index on columns (manager_id, city)
Filter contacts first and then join to cities:
SELECT ct.*
FROM cities ct INNER JOIN (
SELECT city FROM contacts
WHERE manager_Id = 1
) cn ON cn.city = ct.city
You need indexes for city in both tables and for manager_id in contacts.
As others have pointed out, you need proper indexes; I will take that a bit further for clarification. You are specifically looking for contacts where the manager ID = 1. That is not expected to be one person, but could be many people. So having manager_id in the first position of the index optimizes the "get me all people for that manager" lookup. By having the city as part of the index via (manager_id, city), both data elements the query needs are in the index itself. This way the engine does not have to go to the raw data pages to get the other part of interest.
Now, from that, you want all the city information (hence the join to the cities table on that city).
Since you are only querying the cities and not the actual contact information, you probably want DISTINCT cities. Let's say a manager is responsible for 50 people and most of them live in the same city or neighboring ones. You may have 5 distinct cities. That too will limit the result set being joined.
Having said that, I would do it as follows. With MySQL, using STRAIGHT_JOIN can help by telling the optimizer "do the query as I wrote it, don't think for me".
select STRAIGHT_JOIN
cty.*
from
( select distinct c.City
from Contacts c
where c.Manager_ID = 1 ) PQ
JOIN Cities cty
on PQ.City = Cty.City
The "PQ" is an alias representing my "pre-query" of just DISTINCT cities for a given manager.
Again, have one index on the Contacts table on (manager_id, city). On the cities table, I would expect an index on (city).
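The two indexes and the DISTINCT pre-query can be tried end to end with the sketch below. It uses Python's built-in sqlite3 module rather than MySQL, so STRAIGHT_JOIN (a MySQL-only hint) is left out, and the index names and sample rows are assumptions. The query plan it prints is SQLite's, not MySQL's, so treat it only as a way to check whether an index is picked up.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cities (id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE contacts (
    id INTEGER PRIMARY KEY, name TEXT, city TEXT, manager_id INTEGER
);
-- The two indexes recommended in the answers (index names are hypothetical).
CREATE INDEX idx_cities_city ON cities(city);
CREATE INDEX idx_contacts_mgr_city ON contacts(manager_id, city);

INSERT INTO cities VALUES (1, 'Lyon'), (2, 'Nice'), (3, 'Lille');
INSERT INTO contacts VALUES
    (1, 'Ann', 'Lyon', 1), (2, 'Bob', 'Lyon', 1),
    (3, 'Cy', 'Nice', 1), (4, 'Dee', 'Lille', 2);
""")

# Pre-query the distinct cities for one manager, then join to cities.
QUERY = """
SELECT cty.*
FROM (SELECT DISTINCT c.city FROM contacts c WHERE c.manager_id = ?) pq
JOIN cities cty ON pq.city = cty.city
ORDER BY cty.id
"""

rows = conn.execute(QUERY, (1,)).fetchall()
print(rows)  # [(1, 'Lyon'), (2, 'Nice')]

# Print SQLite's plan; the last column of each row is the readable detail.
for step in conn.execute("EXPLAIN QUERY PLAN " + QUERY, (1,)):
    print(step[-1])
```

On MySQL the equivalent check would be to run EXPLAIN on the original query before and after creating the indexes.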
You need two indexes, one on each table.
On the contacts table, first index manager_Id, then City
CREATE INDEX idx_contacts_mgr_city ON contacts(manager_Id, City);
On the cities table, just index City.
Is the 'City' field from the table 'Contacts' a VARCHAR?
If that's the case, I see multiple things here.
First of all, since you already have the 'Id' for the corresponding city in your 'cities' table, I don't see why you wouldn't use that same 'Id' in the 'Contacts' table.
You can add an 'IdCity' field to the 'Contacts' table so you don't have to modify your existing records.
You'll have to fill in 'IdCity' for each of your existing records, though, either manually or with a query that matches 'Contacts.city' (the city name) against the 'cities' table and stores the matching 'Id' in 'IdCity'.
Returning to your query:
Then, join on the INT column instead of the VARCHAR column. Since you have many records, this can make a significant difference in performance.
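A minimal sketch of that migration follows, again with Python's sqlite3 standing in for MySQL. The column name IdCity comes from the answer above; the index name, the sample rows and the added DISTINCT are assumptions. The backfill matches on the city name once, after which the join runs on integers.

```python
import sqlite3

# SQLite stands in for MySQL (assumption); sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cities (id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE contacts (id INTEGER PRIMARY KEY, city TEXT, manager_id INTEGER);
INSERT INTO cities VALUES (1, 'Lyon'), (2, 'Nice');
INSERT INTO contacts VALUES (1, 'Lyon', 1), (2, 'Nice', 1), (3, 'Lyon', 2);

-- Step 1: add the integer column next to the existing text column.
ALTER TABLE contacts ADD COLUMN IdCity INTEGER REFERENCES cities(id);

-- Step 2: backfill it once by matching on the city name.
UPDATE contacts
SET IdCity = (SELECT cities.id FROM cities WHERE cities.city = contacts.city);

-- Step 3: index the new join column (index name is hypothetical).
CREATE INDEX idx_contacts_mgr_idcity ON contacts(manager_id, IdCity);
""")

# The original query, now joining on integers. DISTINCT is added here so
# each city appears once; that is a choice of this sketch.
rows = conn.execute("""
SELECT DISTINCT cities.*
FROM cities
INNER JOIN contacts ON contacts.IdCity = cities.id
WHERE contacts.manager_id = ?
ORDER BY cities.id
""", (1,)).fetchall()
print(rows)  # [(1, 'Lyon'), (2, 'Nice')]
```

Contacts whose city name has no match in cities end up with a NULL IdCity, so it is worth counting those rows after the backfill.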
It looks like you need to add two indexes, one on cities.city and one on (contacts.manager_Id, contacts.city). That should speed things up significantly.
Considering this ER diagram
we have Students who are admitted to participate in an Exam, and each Exam can be split up into multiple Runs (for example, to split up large groups across multiple rooms or to have two Runs for the same Exam in direct succession).
Is it possible to ensure (via database constraints) that Students participate only in Runs that belong to Exams they are admitted to?
I couldn't find a way on my own and also don't know how to phrase this for an internet search.
You have these tables and columns:
exam: id, name
student: id, name
run: id, exam_id (foreign key to exam.id), when (timestamp), room
You need a new intersection table to keep track of what exam is being taken by which student:
int_exam_to_student: exam_id, student_id - both foreign keys
Now, you can query this to determine what runs a student is allowed to be in:
select run.* from run join int_exam_to_student i on (run.exam_id = i.exam_id) where i.student_id = 123;
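The query above tells you which runs are allowed, but does not by itself stop a bad row from being stored. One possible way to let the database enforce it, sketched here with Python's sqlite3, is a participation table with two composite foreign keys. The participation table, the UNIQUE (id, exam_id) constraint on run and the helper try_participate are assumptions that go beyond the answer; the when column is omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE exam (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE run (
    id INTEGER PRIMARY KEY,
    exam_id INTEGER NOT NULL REFERENCES exam(id),
    room TEXT,
    UNIQUE (id, exam_id)  -- target for the composite FK below
);
CREATE TABLE int_exam_to_student (
    exam_id INTEGER NOT NULL REFERENCES exam(id),
    student_id INTEGER NOT NULL REFERENCES student(id),
    PRIMARY KEY (exam_id, student_id)
);
-- Hypothetical table, not part of the original answer.
CREATE TABLE participation (
    run_id INTEGER NOT NULL,
    exam_id INTEGER NOT NULL,
    student_id INTEGER NOT NULL,
    PRIMARY KEY (run_id, student_id),
    FOREIGN KEY (run_id, exam_id) REFERENCES run(id, exam_id),
    FOREIGN KEY (exam_id, student_id)
        REFERENCES int_exam_to_student(exam_id, student_id)
);
INSERT INTO exam VALUES (1, 'Algebra'), (2, 'Physics');
INSERT INTO student VALUES (123, 'Sam');
INSERT INTO run VALUES (7, 1, 'A1'), (8, 2, 'B2');
INSERT INTO int_exam_to_student VALUES (1, 123);  -- admitted to Algebra only
""")

# The query from the answer: runs student 123 may take part in.
allowed = conn.execute("""
SELECT run.id FROM run
JOIN int_exam_to_student i ON run.exam_id = i.exam_id
WHERE i.student_id = ?
""", (123,)).fetchall()


def try_participate(run_id, exam_id, student_id):
    """Hypothetical helper: True if the row is accepted by the constraints."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO participation VALUES (?, ?, ?)",
                         (run_id, exam_id, student_id))
        return True
    except sqlite3.IntegrityError:
        return False


print(allowed)                      # [(7,)]
print(try_participate(7, 1, 123))   # True: admitted to exam 1, run 7 is exam 1
print(try_participate(8, 2, 123))   # False: not admitted to exam 2
print(try_participate(8, 1, 123))   # False: run 8 does not belong to exam 1
```

The shared exam_id column is what ties the two foreign keys together: the run must belong to that exam, and the student must be admitted to that same exam.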
I have a schema design question for my application and hope I can get some advice from the teachers here. It is very similar to Role-Based Access Control, but a bit different in detail.
Spec:
For one company, there are 4 roles: Company (Boss) / Department (Manager) / Team (Leader) / Member (Sales), and there are about 1 million customer records. Each customer record is owned by one person, who can be a Boss, Manager, Leader or Sales. If the record's owner is a Sales member, then everyone above him (his leader, manager and boss) can see this record as well. Others, such as colleagues at the same level, cannot see it unless a superior shares the customer with them. If the record's owner is the boss, nobody except the boss himself can see it.
My Design is like this (I want to improve it to make it more simple and clear):
Table:
departments:
id (P.K. department id)
d_name (department name)
p_id (parent department id)
employees
id (P.K. employee id)
e_name (employee name)
employee_roles
id (P.K.)
e_id (employee id)
d_id (department id)
customers
id (P.K. customer id)
c_name (customer name)
c_phone (customer phone)
permissions
id (P.K.)
c_id (customer id)
e_id (owner employee id)
d_id (the department this customer belongs to)
share_to (id of the employee this customer is shared with)
P.S.: each employee can have multiple roles; for example, employee A can be the manager of department_I and at the same time a sales member of department_II >> Team_X.
So, when an employee logs in to the application, we can query the employee_roles table to get all of the department ids and sub-department ids, and save them into an array.
Then I can use this array to query from permissions table and join it with customers table to get all the customers this employee should see. The SQL might look like this:
SELECT *
FROM customers AS a
INNER JOIN permissions AS b ON a.id = b.c_id
    AND (b.d_id IN ${DEP_ARRAY}
         OR b.e_id = ${LOGIN_EMPLOYEE_ID}
         OR b.share_to = ${LOGIN_EMPLOYEE_ID})
I don't really like the above SQL, especially the "IN" clause, since I am afraid it will slow down the query, since there are about 1 million records or even more in the customer table; and, there will be as many records as the customers table in the permissions table, the INNER JOIN might be very slow too. (So what I care about is the performance like everyone :))
To the best of my knowledge, this is the best design I can work out. Could you teachers please give me some advice on this question? If you need any more info, please let me know.
Any advice would be appreciated!
Thanks a million in advance!!
Do not use an array, use a table, ie the value of a select statement. And stop worrying about performance until you know more basics about thinking in terms of tables and queries.
The point of the relational model is that if you structure your data as tables then you can looplessly describe the output table and the DBMS figures out how to calculate it. See this. Do not think about "joining"; think about describing the result. Whatever the DBMS ends up doing is its business not yours. Only after you become knowledgeable about variations in descriptions and options for descriptions will you have basic knowledge to learn about performance.
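As one possible reading of "use a table, not an array", the department list can be produced inside the query itself with a recursive common table expression, so no array is built in application code. The sketch below uses Python's sqlite3; the CTE name visible_departments and all sample rows are assumptions, and it only mirrors the OR conditions of the posted query rather than the full visibility rules from the spec.

```python
import sqlite3

# SQLite stands in for the real database (assumption); rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (id INTEGER PRIMARY KEY, d_name TEXT, p_id INTEGER);
CREATE TABLE employee_roles (id INTEGER PRIMARY KEY, e_id INTEGER, d_id INTEGER);
CREATE TABLE customers (id INTEGER PRIMARY KEY, c_name TEXT, c_phone TEXT);
CREATE TABLE permissions (
    id INTEGER PRIMARY KEY, c_id INTEGER, e_id INTEGER,
    d_id INTEGER, share_to INTEGER
);
INSERT INTO departments VALUES (1, 'Company', NULL), (2, 'Dept I', 1),
                               (3, 'Team X', 2), (4, 'Dept II', 1);
INSERT INTO employee_roles VALUES (1, 50, 2);  -- employee 50 manages Dept I
INSERT INTO customers VALUES (100, 'Acme', '111'), (101, 'Bolt', '222'),
                             (102, 'Core', '333');
INSERT INTO permissions VALUES
    (1, 100, 60, 3, NULL),  -- owned by employee 60 in Team X (under Dept I)
    (2, 101, 70, 4, NULL),  -- owned by employee 70 in Dept II
    (3, 102, 70, 4, 50);    -- Dept II customer shared with employee 50
""")

# The department set is a subquery (a table), not an array built in code.
VISIBLE = """
WITH RECURSIVE visible_departments(id) AS (
    SELECT d_id FROM employee_roles WHERE e_id = :emp
    UNION
    SELECT d.id FROM departments d
    JOIN visible_departments v ON d.p_id = v.id
)
SELECT c.id, c.c_name
FROM customers c
JOIN permissions p ON p.c_id = c.id
WHERE p.d_id IN (SELECT id FROM visible_departments)
   OR p.e_id = :emp
   OR p.share_to = :emp
ORDER BY c.id
"""

rows = conn.execute(VISIBLE, {"emp": 50}).fetchall()
print(rows)  # [(100, 'Acme'), (102, 'Core')]
```

Employee 50 sees customer 100 through the Dept I subtree and customer 102 through share_to, while customer 101 in Dept II stays hidden.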
Ok, so I am going to try and be specific as possible but my MySQL skills are pretty weak. So here is the situation:
I have 2 tables: Donor and Students. A donor can be linked to as many students as they want, and each student can be linked to as many donors as want to "claim" them. So if I have Sally, a student, she can have Jim, a donor, and Jeff, a donor, linked to her. I have all my students in one table and all my donors in another table. I need to put them together to show the student's name, ID and the IDs of all the donors that the student is linked to.
Currently my tables are: Donor with DonorID, FirstName, LastName, DonorType, StreetAddress, etc. Then Students with: StudentID, FirstName, LastName and DonorID. However, that only allows me to link a student with one donor. So, I was thinking I need to make a transition table that would allow me to show the StudentID, FirstName(of student), LastName(of student), and the DonorID that "claims" that student and allow for me to duplicate StudentID and put a different DonorID in the 2nd, 3rd, 4th, so on and so on entries of the same student.
So, I guess my question is how do transition tables work in MySQL; I believe I am going to need to work with the JOIN function and join the two tables together but after reading about that on tizag.com I am even more confused. I have worked with Access where when you create a transition table you can pull the PK from each table and create a composite key using the two keys from the other tables but I am not quite sure how to do that in MySQL; does it work essentially the same and I should be pulling the PK from each table and linking them together in the 3rd, transition, table?
Marvin's right.
You need three tables to pull this off.
Donor
DonorID
FirstName
LastName
DonorType
StreetAddress
etc.
Student
StudentID
FirstName
LastName
etc.
Then you need Student_Donor. This is often called a join table, and implements the many-to-many relationship.
StudentID (PK) (FK to Student)
DonorID (PK) (FK to Donor)
DonorOrdinal
If StudentID = 5 has four donors with ID = 6, 7, 11, 15, then you'll have these rows:
StudentID  DonorID  DonorOrdinal
5          6        1
5          7        2
5          11       3
5          15       4
The DonorOrdinal column allows you to specify a student's primary donor and secondary donors. I can't tell from your question how important that is, but it's helpful to be able to order these things. Remember this: formally speaking, a SQL SELECT query returns rows in an unpredictable order unless you also specify ORDER BY.
If you want to display your students and their donors, you'll need this query:
SELECT s.StudentID, s.FirstName, s.LastName,
sd.DonorOrdinal,
d.DonorType, d.DonorID, d.FirstName, d.LastName
FROM student s
LEFT JOIN student_donor sd ON s.StudentID = sd.StudentID
LEFT JOIN donor d ON sd.DonorID = d.DonorID
ORDER BY s.StudentID, sd.DonorOrdinal, d.DonorID
This will show you all students (whether having donors or not per LEFT JOIN) in order of ID, then their donors in order of DonorOrdinal.
Yes, pretty much. What you are talking about is a Many to Many relationship. Your donor table should not include a reference to the student table (and vice versa). You'll need a new table (what you are calling a transition table) that contains the DonorId and StudentId. This can comprise your primary key or you can use another column for the primary key that is an auto_increment.
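To see the join table working, here is a small sketch using Python's sqlite3 in place of MySQL. Table and column names follow the answer above, trimmed to a few columns; the sample rows are assumptions. The composite primary key on (StudentID, DonorID) is what stops the same donor from being linked to the same student twice.

```python
import sqlite3

# SQLite stands in for MySQL (assumption); sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE donor (DonorID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE student (StudentID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE student_donor (
    StudentID INTEGER NOT NULL REFERENCES student(StudentID),
    DonorID INTEGER NOT NULL REFERENCES donor(DonorID),
    DonorOrdinal INTEGER,
    PRIMARY KEY (StudentID, DonorID)  -- composite key from the two PKs
);
INSERT INTO donor VALUES (6, 'Jim', 'A'), (7, 'Jeff', 'B');
INSERT INTO student VALUES (5, 'Sally', 'S'), (9, 'Tom', 'T');
INSERT INTO student_donor VALUES (5, 6, 1), (5, 7, 2);
""")

# All students, with or without donors, ordered as in the answer's query.
rows = conn.execute("""
SELECT s.StudentID, s.FirstName, sd.DonorOrdinal, d.DonorID, d.FirstName
FROM student s
LEFT JOIN student_donor sd ON s.StudentID = sd.StudentID
LEFT JOIN donor d ON sd.DonorID = d.DonorID
ORDER BY s.StudentID, sd.DonorOrdinal, d.DonorID
""").fetchall()
for row in rows:
    print(row)

# Linking the same donor to the same student again violates the composite key.
try:
    conn.execute("INSERT INTO student_donor VALUES (5, 6, 3)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```

Tom has no donors, so the LEFT JOINs return him once with NULLs (None in Python) in the donor columns.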
Ok, I have a database with a table for storing classified posts; each post belongs to a city. For the purpose of this example we will call this table posts. This table has columns:
id (INT, +AI),
cityid (TEXT),
postcat (TEXT),
user (TEXT),
datatime (DATETIME),
title (TEXT),
desc (TEXT),
location (TEXT)
an example of that data would be:
'12039',
'fayetteville-nc',
'user#gmail.com',
'December 28th, 2010 - 11:55 PM',
'post title',
'post description',
'spring lake'
id is auto incremented, cityid is in text format (this is where I think I will be losing performance once the database is large)...
Originally I planned on having a different table for each city and now since a user has to have the option of searching and posting through multiple cities, I think I need them all in one table. Everything was perfect when I had one city per table, where I could:
SELECT *
FROM `posts`
WHERE MATCH (`title`, `desc`, `location`)
AGAINST ('searchtext' IN BOOLEAN MODE)
AND `postcat` LIKE 'searchcatagory'
But then I ran into problems when trying to search multiple cities at one time, or when listing all of a user's posts for them to delete or edit.
So it looks like I have to have one table with all the posts, and also match another FULLTEXT field: cityid. I am guessing I need full-text because if a user chooses an entire state, and my cityid is "fayetteville-nc", I would need to match cityid against "-nc". This is only an assumption and I would love another way. This database could easily reach over a million rows within 6 months, and a full-text search against 4 columns is probably going to be slow.
My question is, is there a better way to do this more efficiently? The database has nothing in it now, except for some test posts made by me. So I can completely redesign the table structure if necessary. I am open to any and all suggestions, even if it is just a more efficient way to perform my query.
Yes, one table for all posts sounds sensible. It would also be normal design for the posts table to have a city_id, referring to the id in a city table. Each city would also have a state_id, referring to the id in a state table, and similarly each state would have a country_id referring to the id in a country table. So you could write:
SELECT $columns
FROM posts JOIN city ON city.id = posts.city_id
WHERE city.tag = 'fayetteville-nc'
Once you've brought the cities into a separate table, it might make more sense for you to do the city-to-city_id resolving up front. This fairly naturally happens if you have the city chosen from a dropdown, for instance. But if you're entering free text into a search field, you may want to do it differently.
You can also search for all posts in a given state (or set of states) as:
SELECT $columns
FROM posts
JOIN city ON city.id = posts.city_id
JOIN state ON state.id = city.state_id
WHERE state.tag = 'NC'
If you're going to go more fancy or international, you may need a more flexible way of arranging locations into a hierarchy (e.g. you may want city districts, counties, multinational regions, intranational regions (Midwest, East Coast etc)) but stay easy for now :)
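The normalized layout and both lookups can be tried with the sketch below, using Python's sqlite3 in place of MySQL. The tag columns follow the queries above; the index name, the trimmed posts columns and the sample rows are assumptions, and the country table is left out to keep it short. The point is that a state-wide search becomes a plain join on integer ids instead of a text match on "-nc".

```python
import sqlite3

# SQLite stands in for MySQL (assumption); sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE state (id INTEGER PRIMARY KEY, tag TEXT UNIQUE);
CREATE TABLE city (
    id INTEGER PRIMARY KEY,
    tag TEXT UNIQUE,
    state_id INTEGER REFERENCES state(id)
);
CREATE TABLE posts (
    id INTEGER PRIMARY KEY,
    city_id INTEGER REFERENCES city(id),
    title TEXT
);
CREATE INDEX idx_posts_city ON posts(city_id);  -- hypothetical index name

INSERT INTO state VALUES (1, 'NC'), (2, 'SC');
INSERT INTO city VALUES (1, 'fayetteville-nc', 1), (2, 'raleigh-nc', 1),
                        (3, 'columbia-sc', 2);
INSERT INTO posts VALUES (1, 1, 'bike for sale'), (2, 2, 'room to rent'),
                         (3, 3, 'piano lessons');
""")

# Posts in a single city, looked up by the city's tag.
by_city = conn.execute("""
SELECT posts.id, posts.title
FROM posts JOIN city ON city.id = posts.city_id
WHERE city.tag = ?
ORDER BY posts.id
""", ("fayetteville-nc",)).fetchall()

# Posts in a whole state: one more join, no text matching on the city tag.
by_state = conn.execute("""
SELECT posts.id, posts.title
FROM posts
JOIN city ON city.id = posts.city_id
JOIN state ON state.id = city.state_id
WHERE state.tag = ?
ORDER BY posts.id
""", ("NC",)).fetchall()

print(by_city)   # [(1, 'bike for sale')]
print(by_state)  # [(1, 'bike for sale'), (2, 'room to rent')]
```

The full-text index is then only needed on title, desc and location, with the city or state filter applied through the joins.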