Update tables based on NULL values in a derived table - mysql

I want to start out with this as a basis for my question that I may already know the answer to.
mysql> select a.i as AI, a.b as AB, b.i as BI, b.b as BB from a left join b using (i) union select a.i as AI, a.b as AB, b.i as BI, b.b as BB from b left join a using (i) where b.i = 5;
+------+----------+------+------+
| AI | AB | BI | BB |
+------+----------+------+------+
| 1 | Not NULL | 1 | 0 |
| 2 | Not NULL | 2 | 0 |
| 3 | Not NULL | 3 | 0 |
| 4 | Not NULL | 4 | 0 |
| 0 | Not NULL | NULL | NULL |
| NULL | NULL | 5 | 0 |
+------+----------+------+------+
6 rows in set (0.00 sec)
is there anyway the column AB where = 0 based on the fact that BI = NULL within a joined statement (as the row obviously doesn't exist), and then vice verse, update BB where AI = NULL?
My instinct is NO!!! for a couple of reasons which I am looking to debunk or verify.
for the same reason you cannot update a real table with a temptable algorithm based view. I do not believe that you can update a table from the output of a derived table (at least not in such a way as I am looking for).
probably the most important reason, you cannot update something based on something else that doesn't exist (or by definition may exist but is an unknown value) due to the fact that without the isnull() function you cannot compare null. That being said to use the isnull() function you would need to have a static table with NULL in it, however due to the fact that in this particular instance the NULL values are only in affect when the tables are joined together (due to the obvious reason that matching rows plainly just do not exist). To perform this effectively you would need to make the values real by creating a derived table... which leads us back to point number 1.
If this IS possible and someone knows how to accomplish this please let me know... also if so is it possible to update both of the values with one statement?
I was about half finished with this command when I realized that it may be useless based on knowledge and other research.
mysql> update a,b set b.b='n',a.b='n' where /*possibly a derived table here*/ = isnull((select i1 as 'a.i', i2 as 'b.i' from ((select a.i as i1, b.i as i2 from a left join b using (i) where isnull(b.i)=1) union (select a.i as i1, b.i as i2 from b left join a using (i) where isnull(a.i)=1) as derived1 where isnull(i1)) /*maybe an or statement of some kind and another two derived tables here equalling each other*/ ;
based on my statement it seems that I would need another derived table or 3 to complete the statement in any kind of complete fashion... although I am not confident that it can work based on my tries up till now.
Also its possible I have been at this for long enough that I am no longer thinking straight.
In any case I appreciate the help in advance (even a this is possible, here is the direction you need to go).
Let me know if I need to provide any further information.
P.S. I do not want to put too much emphasis on my unfinished MySQL statement as It may not even be in the direction that I need to be going... and thus confusing.

I have found a partial answer. Please let me know if anyone can extend this. also I had to reinstall everything due to upgrading to centos 7.
where:
mysql> (select b.i as '1',b.b as '2',a.i as '3',a.b as '4' from a left join b using (i)) union (select b.i as '1',b.b as '2',a.i as '3',a.b as '4' from b left join a using (i)) order by 1;
+------+--------------+------+--------------+
| 1 | 2 | 3 | 4 |
+------+--------------+------+--------------+
| NULL | NULL | 1 | 何もない |
| 2 | 何もない | 2 | 何もない |
| 3 | 何もない | 3 | 何もない |
| 4 | 何もない | 4 | 何もない |
| 5 | 何もない | 5 | 何もない |
| 6 | 何もない | NULL | NULL |
+------+--------------+------+--------------+
6 rows in set (0.00 sec)
The command would end up using the same join syntax:
mysql> update a right join b using (i) set b.b = 'ある' where isnull(a.i) or isnull(b.i);
Query OK, 1 row affected (0.01 sec)
Rows matched: 1 Changed: 1 Warnings: 0
In this case MySQL was able to properly locate the row in a that was missing a match in b and replace the data as specified.
mysql> (select b.i as '1',b.b as '2',a.i as '3',a.b as '4' from a left join b using (i)) union (select b.i as '1',b.b as '2',a.i as '3',a.b as '4' from b left join a using (i)) order by 1;
+------+--------------+------+--------------+
| 1 | 2 | 3 | 4 |
+------+--------------+------+--------------+
| NULL | NULL | 1 | 何もない |
| 2 | 何もない | 2 | 何もない |
| 3 | 何もない | 3 | 何もない |
| 4 | 何もない | 4 | 何もない |
| 5 | 何もない | 5 | 何もない |
| 6 | ある | NULL | NULL |
+------+--------------+------+--------------+
6 rows in set (0.00 sec)
I'm not sure if this is the best answer for this question or it can get more detailed yet. that is something I would have to do more research on. If this is the best answer please let me know. If not and someone knows the best answer please let me know.

Related

SQL - get records from many-to-many relations by the user itself -OR- his group

I have two database tables, one as the main table and the other as the relation table.
The first table is a table of contents and the second table is a table that connects to users or groups.
Some data may also be modified in this second table.
I'm not sure about the structure and performance.
for example, we have User Id 160 which is under group id 7
So for the first, we have a post Table.
id | title | content | cover | status
------------------------------------------------
1 | first | content 1 | /img/... | 1
2 | second | content 2 | /img/... | 1
3 | another | content 3 | /img/... | 1
4 | four | content 4 | /img/... | 1
5 | five | content 5 | /img/... | 1
and for the second we have a post_rel Table:
id | group_id | user_id | post_id | title | cover | sort | status
---------------------------------------------------------------------------
1 | 7 | NULL | 1 | g title | img/... | 1 | 1
2 | NULL | 160 | 1 | u title | NULL | 2 | 1 *** selected for user_id
3 | 7 | NULL | 2 | NULL | img/... | 6 | 0
4 | NULL | 160 | 2 | NULL | img/... | 4 | 1 *** selected for user_id
5 | NULL | 160 | 3 | some | img/... | 3 | 1 *** selected for user_id
6 | 7 | NULL | 4 | NULL | img/... | 9 | 1 *** selected for group_id
7 | NULL | 165 | 5 | NULL | img/... | 5 | 0
This is the basic query we have.
select
`post_rel`.`title` as `custom_title`,
`post_rel`.`cover` as `custom_cover`,
`post_rel`.`group_id`,
`post_rel`.`user_id`,
`post`.*
from
`post`
inner join `post_rel` on `post`.`id` = `post_rel`.`post_id`
where
`post`.`status` = 1
and `post_rel`.`status` = 1
and (
`post_rel`.`user_id` = 160
or (
`post_rel`.`group_id` = 7
and `post_rel`.`post_id` not in (
select
`post_rel`.`post_id`
from
`post_rel`
where
`post_rel`.`user_id` = 160
)
)
)
order by
`post_rel`.`sort` asc
So, what you think about the basic query? Especially in the subquery, won't performance drop in a large table? Is it possible to write a better and simpler query or change the structure?
Edit: this is sqlfiddle example of my code and structure http://sqlfiddle.com/#!9/ed9d4b/1
I would change it to use "not exists" instead of "not in" and would use aliases so I could pull it off like so:
select
b.`title` as `custom_title`,
b.`cover` as `custom_cover`,
b.`group_id`,
b.`user_id`,
a.*
from
`post` a
inner join `post_rel` b on a.`id` = b.`post_id`
where
a.`status` = 1
and b.`status` = 1
and (
b.`user_id` = 160
or (
b.`group_id` = 7
and not exists (
select
'x'
from
`post_rel` c
where
c.`user_id` = 160 and c.`post_id`=b.`post_id`
)
)
)
order by
b.`sort` asc
Typically when managing users and group, there's this notion of an exception user who directly can get assigned to assets just like the whole group. This seems to be an example of that.
From a modeling-only perspective, there are 2 ways to deal with that:
Ensure that every user exists in a group and that you only assign assets to groups. For the exception user, create a group. You could even enforce that every user belongs to only one group. This way your post_rel table deals with only groups. Unfortunately, the relationship between group and user is not understood well enough to weigh in appropriately.
Driven only by the need to eliminate null values towards a good model which also reduces overhead, the other option is to use name value pairs and allows the User and Group to exist in the same field with another field besides it, denoting Group or User.
These are the SQL Fiddle:
NOT EXISTS version: http://sqlfiddle.com/#!9/1af8cf/2
NOT IN version: http://sqlfiddle.com/#!9/1af8cf/1
Some reading on nulls https://dev.mysql.com/doc/refman/5.6/en/data-size.html
Specifically:
Declare columns to be NOT NULL if possible. It makes SQL operations faster, by enabling better use of indexes and eliminating overhead for testing whether each value is NULL. You also save some storage space, one bit per column. If you really need NULL values in your tables, use them. Just avoid the default setting that allows NULL values in every column.

Calculating row indices with subquery having joins, results in A*B examined rows

This question is derived from a one I started previously: Incorrect row index when grouping
Due to different natures, I'm asking here and will provide the answer back there once I have resolved this issue.
I thought about subqueries, and came up with this:
SELECT
mq.*,
#indexer := #indexer + 1 AS indexer
FROM
(
SELECT
p.id,
p.tag_id,
p.title,
p.created_at
FROM
`posts` AS p
LEFT JOIN
`votes` AS v
ON p.id = v.votable_id
AND v.votable_type = "Post"
AND v.deleted_at IS NULL
WHERE
p.deleted_at IS NULL
GROUP BY
p.id
) AS mq
JOIN
(SELECT #indexer := 0) AS i
Which actually works, I get the desired result:
+----+--------+------------------------------------+---------------------+---------+
| id | tag_id | title | created_at | indexer |
+----+--------+------------------------------------+---------------------+---------+
| 2 | 2 | PostPostPost | 2014-10-23 23:53:15 | 1 |
| 3 | 3 | Title | 2014-10-23 23:56:13 | 2 |
| 4 | 2 | GIFGIFIGIIF | 2014-10-23 23:59:03 | 3 |
| 5 | 2 | GIFGIFIGIIF | 2014-10-23 23:59:03 | 4 |
| 6 | 4 | My new avatar | 2014-10-26 22:22:30 | 5 |
| 7 | 5 | Hi, haiii, oh Hey ! | 2014-10-26 22:38:10 | 6 |
| 8 | 6 | Mclaren testing stealth technology | 2014-10-26 22:44:15 | 7 |
| 9 | 7 | Just random thoughts while pooping | 2014-10-26 22:50:03 | 8 |
+----+--------+------------------------------------+---------------------+---------+
The problem now is... I ran a EXPLAIN query, to see how fast it works. And, I have a number there that is really bugging me:
Well, the number is obvious: 252 * 1663 = 419076.
This worries me, though - is the row count normal there, or I have to optimize the query? And if so, then how do I optimize this one?
As of MySQL version 5.7 all joins are treated as nested loop joins.
MySQL resolves all joins using a nested-loop join method. This means that MySQL reads a row from the first table, and then finds a matching row in the second table, the third table, and so on.
So to answer your question... no, you won't be able to get that row count down. However, by adding indexes to your join columns you may be able to achieve faster results but your row count will be the same.

comparing two SQL tables set cut-set to true and else to false

for my new project I decided to optimize all kind of inefficient SQL queries I've used on my older projects before importing it. After some hours on a problem that in my opinion seems to be very simple I am at a loss. I've got two different Tables that are holding user-IDs
table r table s
+-----+ +-----+
| user| | user|
+-----+ +-----+
| 2 | | 1 |
| 3 | | 4 |
| 4 | +-----+
+-----+
what I want now is to compare both Tables and setting a true or false flag in another column of the result table which shows me what value of table s is in table r. So what I want is just set the cut-set to true. the result table should look like this
result
+-----+-------+
| user| flag |
+-----+-------+
| 1 | FALSE |
+-----+-------+
| 4 | TRUE |
+-----+-------+
So the user 4 is the only user that is listed in table r. I already tried some solutions with IF clauses but I'm not pro in SQL but I do my best to get closer to that aim. My idea was this query
SELECT
r.user,
IF(r.user = s.user, TRUE, FALSE) AS flag
FROM
r,
s
I know that it isn't correct at all because the query gives me the following 6 rows.
result
+-----+-------+
| user| flag |
+-----+-------+
| 2 | 0 |
+-----+-------+
| 2 | 0 |
+-----+-------+
| 3 | 0 |
+-----+-------+
| 3 | 0 |
+-----+-------+
| 4 | 0 |
+-----+-------+
| 4 | 1 |
+-----+-------+
so the last row to each user seems to be the result I want but the problem is that the false table is combined with the flag and each user has two flags. I don't understand whats going wrong and I would be appreciated if someone could explain me how the SQL server understands my query and what I have to change to get the correct result
You have no join condition, so your query is simply creating a cartesian product between the two tables. It then reports true or false depending on whether the users match in a particular pairing. You would see this if you included s.user in your SELECT clause.
To do what you want, you should use an outer join.
SELECT s.user, r.user IS NOT NULL AS flag
FROM s
LEFT JOIN r
ON r.user = s.user
Try:
SELECT r.user FROM r,s flag WHERE r.user = s.user

MySQL: optimize query for scoring calculation

I have a data table that I use to do some calculations. The resulting data set after calculations looks like:
+------------+-----------+------+----------+
| id_process | id_region | type | result |
+------------+-----------+------+----------+
| 1 | 4 | 1 | 65.2174 |
| 1 | 5 | 1 | 78.7419 |
| 1 | 6 | 1 | 95.2308 |
| 1 | 4 | 1 | 25.0000 |
| 1 | 7 | 1 | 100.0000 |
+------------+-----------+------+----------+
By other hand I have other table that contains a set of ranges that are used to classify the calculations results. The range tables looks like:
+----------+--------------+---------+
| id_level | start | end | status |
+----------+--------------+---------+
| 1 | 0 | 75 | Danger |
| 2 | 76 | 90 | Alert |
| 3 | 91 | 100 | Good |
+----------+--------------+---------+
I need to do a query that add the corresponding 'status' column to each value when do calculations. Currently, I can do that adding the following field to calculation query:
select
...,
...,
[math formula] as result,
(select status
from ranges r
where result between r.start and r.end) status
from ...
where ...
It works ok. But when I have a lot of rows (more than 200K), calculation query become slow.
My question is: there is some way to find that 'status' value without do that subquery?
Some one have worked on something similar before?
Thanks
Yes, you are looking for a subquery and join:
select s.*, r.status
from (select s.*
from <your query here>
) s left outer join
ranges r
on s.result between r.start and r.end
Explicit joins often optimize better than nested select. In this case, though, the ranges table seems pretty small, so this may not be the performance issue.

SQL query return data from multiple tables

I would like to know the following:
how to get data from multiple tables in my database?
what types of methods are there to do this?
what are joins and unions and how are they different from one another?
When should I use each one compared to the others?
I am planning to use this in my (for example - PHP) application, but don't want to run multiple queries against the database, what options do I have to get data from multiple tables in a single query?
Note: I am writing this as I would like to be able to link to a well written guide on the numerous questions that I constantly come across in the PHP queue, so I can link to this for further detail when I post an answer.
The answers cover off the following:
Part 1 - Joins and Unions
Part 2 - Subqueries
Part 3 - Tricks and Efficient Code
Part 4 - Subqueries in the From Clause
Part 5 - Mixed Bag of John's Tricks
Part 1 - Joins and Unions
This answer covers:
Part 1
Joining two or more tables using an inner join (See the wikipedia entry for additional info)
How to use a union query
Left and Right Outer Joins (this stackOverflow answer is excellent to describe types of joins)
Intersect queries (and how to reproduce them if your database doesn't support them) - this is a function of SQL-Server (see info) and part of the reason I wrote this whole thing in the first place.
Part 2
Subqueries - what they are, where they can be used and what to watch out for
Cartesian joins AKA - Oh, the misery!
There are a number of ways to retrieve data from multiple tables in a database. In this answer, I will be using ANSI-92 join syntax. This may be different to a number of other tutorials out there which use the older ANSI-89 syntax (and if you are used to 89, may seem much less intuitive - but all I can say is to try it) as it is much easier to understand when the queries start getting more complex. Why use it? Is there a performance gain? The short answer is no, but it is easier to read once you get used to it. It is easier to read queries written by other folks using this syntax.
I am also going to use the concept of a small caryard which has a database to keep track of what cars it has available. The owner has hired you as his IT Computer guy and expects you to be able to drop him the data that he asks for at the drop of a hat.
I have made a number of lookup tables that will be used by the final table. This will give us a reasonable model to work from. To start off, I will be running my queries against an example database that has the following structure. I will try to think of common mistakes that are made when starting out and explain what goes wrong with them - as well as of course showing how to correct them.
The first table is simply a color listing so that we know what colors we have in the car yard.
mysql> create table colors(id int(3) not null auto_increment primary key,
-> color varchar(15), paint varchar(10));
Query OK, 0 rows affected (0.01 sec)
mysql> show columns from colors;
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| id | int(3) | NO | PRI | NULL | auto_increment |
| color | varchar(15) | YES | | NULL | |
| paint | varchar(10) | YES | | NULL | |
+-------+-------------+------+-----+---------+----------------+
3 rows in set (0.01 sec)
mysql> insert into colors (color, paint) values ('Red', 'Metallic'),
-> ('Green', 'Gloss'), ('Blue', 'Metallic'),
-> ('White' 'Gloss'), ('Black' 'Gloss');
Query OK, 5 rows affected (0.00 sec)
Records: 5 Duplicates: 0 Warnings: 0
mysql> select * from colors;
+----+-------+----------+
| id | color | paint |
+----+-------+----------+
| 1 | Red | Metallic |
| 2 | Green | Gloss |
| 3 | Blue | Metallic |
| 4 | White | Gloss |
| 5 | Black | Gloss |
+----+-------+----------+
5 rows in set (0.00 sec)
The brands table identifies the different brands of the cars out caryard could possibly sell.
mysql> create table brands (id int(3) not null auto_increment primary key,
-> brand varchar(15));
Query OK, 0 rows affected (0.01 sec)
mysql> show columns from brands;
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| id | int(3) | NO | PRI | NULL | auto_increment |
| brand | varchar(15) | YES | | NULL | |
+-------+-------------+------+-----+---------+----------------+
2 rows in set (0.01 sec)
mysql> insert into brands (brand) values ('Ford'), ('Toyota'),
-> ('Nissan'), ('Smart'), ('BMW');
Query OK, 5 rows affected (0.00 sec)
Records: 5 Duplicates: 0 Warnings: 0
mysql> select * from brands;
+----+--------+
| id | brand |
+----+--------+
| 1 | Ford |
| 2 | Toyota |
| 3 | Nissan |
| 4 | Smart |
| 5 | BMW |
+----+--------+
5 rows in set (0.00 sec)
The model table will cover off different types of cars, it is going to be simpler for this to use different car types rather than actual car models.
mysql> create table models (id int(3) not null auto_increment primary key,
-> model varchar(15));
Query OK, 0 rows affected (0.01 sec)
mysql> show columns from models;
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| id | int(3) | NO | PRI | NULL | auto_increment |
| model | varchar(15) | YES | | NULL | |
+-------+-------------+------+-----+---------+----------------+
2 rows in set (0.00 sec)
mysql> insert into models (model) values ('Sports'), ('Sedan'), ('4WD'), ('Luxury');
Query OK, 4 rows affected (0.00 sec)
Records: 4 Duplicates: 0 Warnings: 0
mysql> select * from models;
+----+--------+
| id | model |
+----+--------+
| 1 | Sports |
| 2 | Sedan |
| 3 | 4WD |
| 4 | Luxury |
+----+--------+
4 rows in set (0.00 sec)
And finally, to tie up all these other tables, the table that ties everything together. The ID field is actually the unique lot number used to identify cars.
mysql> create table cars (id int(3) not null auto_increment primary key,
-> color int(3), brand int(3), model int(3));
Query OK, 0 rows affected (0.01 sec)
mysql> show columns from cars;
+-------+--------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+--------+------+-----+---------+----------------+
| id | int(3) | NO | PRI | NULL | auto_increment |
| color | int(3) | YES | | NULL | |
| brand | int(3) | YES | | NULL | |
| model | int(3) | YES | | NULL | |
+-------+--------+------+-----+---------+----------------+
4 rows in set (0.00 sec)
mysql> insert into cars (color, brand, model) values (1,2,1), (3,1,2), (5,3,1),
-> (4,4,2), (2,2,3), (3,5,4), (4,1,3), (2,2,1), (5,2,3), (4,5,1);
Query OK, 10 rows affected (0.00 sec)
Records: 10 Duplicates: 0 Warnings: 0
mysql> select * from cars;
+----+-------+-------+-------+
| id | color | brand | model |
+----+-------+-------+-------+
| 1 | 1 | 2 | 1 |
| 2 | 3 | 1 | 2 |
| 3 | 5 | 3 | 1 |
| 4 | 4 | 4 | 2 |
| 5 | 2 | 2 | 3 |
| 6 | 3 | 5 | 4 |
| 7 | 4 | 1 | 3 |
| 8 | 2 | 2 | 1 |
| 9 | 5 | 2 | 3 |
| 10 | 4 | 5 | 1 |
+----+-------+-------+-------+
10 rows in set (0.00 sec)
This will give us enough data (I hope) to cover off the examples below of different types of joins and also give enough data to make them worthwhile.
So getting into the grit of it, the boss wants to know The IDs of all the sports cars he has.
This is a simple two table join. We have a table that identifies the model and the table with the available stock in it. As you can see, the data in the model column of the cars table relates to the models column of the cars table we have. Now, we know that the models table has an ID of 1 for Sports so lets write the join.
select
ID,
model
from
cars
join models
on model=ID
So this query looks good right? We have identified the two tables and contain the information we need and use a join that correctly identifies what columns to join on.
ERROR 1052 (23000): Column 'ID' in field list is ambiguous
Oh noes! An error in our first query! Yes, and it is a plum. You see, the query has indeed got the right columns, but some of them exist in both tables, so the database gets confused about what actual column we mean and where. There are two solutions to solve this. The first is nice and simple, we can use tableName.columnName to tell the database exactly what we mean, like this:
select
cars.ID,
models.model
from
cars
join models
on cars.model=models.ID
+----+--------+
| ID | model |
+----+--------+
| 1 | Sports |
| 3 | Sports |
| 8 | Sports |
| 10 | Sports |
| 2 | Sedan |
| 4 | Sedan |
| 5 | 4WD |
| 7 | 4WD |
| 9 | 4WD |
| 6 | Luxury |
+----+--------+
10 rows in set (0.00 sec)
The other is probably more often used and is called table aliasing. The tables in this example have nice and short simple names, but typing out something like KPI_DAILY_SALES_BY_DEPARTMENT would probably get old quickly, so a simple way is to nickname the table like this:
select
a.ID,
b.model
from
cars a
join models b
on a.model=b.ID
Now, back to the request. As you can see we have the information we need, but we also have information that wasn't asked for, so we need to include a where clause in the statement to only get the Sports cars as was asked. As I prefer the table alias method rather than using the table names over and over, I will stick to it from this point onwards.
Clearly, we need to add a where clause to our query. We can identify Sports cars either by ID=1 or model='Sports'. As the ID is indexed and the primary key (and it happens to be less typing), lets use that in our query.
select
a.ID,
b.model
from
cars a
join models b
on a.model=b.ID
where
b.ID=1
+----+--------+
| ID | model |
+----+--------+
| 1 | Sports |
| 3 | Sports |
| 8 | Sports |
| 10 | Sports |
+----+--------+
4 rows in set (0.00 sec)
Bingo! The boss is happy. Of course, being a boss and never being happy with what he asked for, he looks at the information, then says I want the colors as well.
Okay, so we have a good part of our query already written, but we need to use a third table which is colors. Now, our main information table cars stores the car color ID and this links back to the colors ID column. So, in a similar manner to the original, we can join a third table:
select
a.ID,
b.model
from
cars a
join models b
on a.model=b.ID
join colors c
on a.color=c.ID
where
b.ID=1
+----+--------+
| ID | model |
+----+--------+
| 1 | Sports |
| 3 | Sports |
| 8 | Sports |
| 10 | Sports |
+----+--------+
4 rows in set (0.00 sec)
Damn, although the table was correctly joined and the related columns were linked, we forgot to pull in the actual information from the new table that we just linked.
select
a.ID,
b.model,
c.color
from
cars a
join models b
on a.model=b.ID
join colors c
on a.color=c.ID
where
b.ID=1
+----+--------+-------+
| ID | model | color |
+----+--------+-------+
| 1 | Sports | Red |
| 8 | Sports | Green |
| 10 | Sports | White |
| 3 | Sports | Black |
+----+--------+-------+
4 rows in set (0.00 sec)
Right, that's the boss off our back for a moment. Now, to explain some of this in a little more detail. As you can see, the from clause in our statement links our main table (I often use a table that contains information rather than a lookup or dimension table. The query would work just as well with the tables all switched around, but make less sense when we come back to this query to read it in a few months time, so it is often best to try to write a query that will be nice and easy to understand - lay it out intuitively, use nice indenting so that everything is as clear as it can be. If you go on to teach others, try to instill these characteristics in their queries - especially if you will be troubleshooting them.
It is entirely possible to keep linking more and more tables in this manner.
select
a.ID,
b.model,
c.color
from
cars a
join models b
on a.model=b.ID
join colors c
on a.color=c.ID
join brands d
on a.brand=d.ID
where
b.ID=1
While I forgot to include a table where we might want to join more than one column in the join statement, here is an example. If the models table had brand-specific models and therefore also had a column called brand which linked back to the brands table on the ID field, it could be done as this:
select
a.ID,
b.model,
c.color
from
cars a
join models b
on a.model=b.ID
join colors c
on a.color=c.ID
join brands d
on a.brand=d.ID
and b.brand=d.ID
where
b.ID=1
You can see, the query above not only links the joined tables to the main cars table, but also specifies joins between the already joined tables. If this wasn't done, the result is called a cartesian join - which is dba speak for bad. A cartesian join is one where rows are returned because the information doesn't tell the database how to limit the results, so the query returns all the rows that fit the criteria.
So, to give an example of a cartesian join, lets run the following query:
select
a.ID,
b.model
from
cars a
join models b
+----+--------+
| ID | model |
+----+--------+
| 1 | Sports |
| 1 | Sedan |
| 1 | 4WD |
| 1 | Luxury |
| 2 | Sports |
| 2 | Sedan |
| 2 | 4WD |
| 2 | Luxury |
| 3 | Sports |
| 3 | Sedan |
| 3 | 4WD |
| 3 | Luxury |
| 4 | Sports |
| 4 | Sedan |
| 4 | 4WD |
| 4 | Luxury |
| 5 | Sports |
| 5 | Sedan |
| 5 | 4WD |
| 5 | Luxury |
| 6 | Sports |
| 6 | Sedan |
| 6 | 4WD |
| 6 | Luxury |
| 7 | Sports |
| 7 | Sedan |
| 7 | 4WD |
| 7 | Luxury |
| 8 | Sports |
| 8 | Sedan |
| 8 | 4WD |
| 8 | Luxury |
| 9 | Sports |
| 9 | Sedan |
| 9 | 4WD |
| 9 | Luxury |
| 10 | Sports |
| 10 | Sedan |
| 10 | 4WD |
| 10 | Luxury |
+----+--------+
40 rows in set (0.00 sec)
Good god, that's ugly. However, as far as the database is concerned, it is exactly what was asked for. In the query, we asked for for the ID from cars and the model from models. However, because we didn't specify how to join the tables, the database has matched every row from the first table with every row from the second table.
Okay, so the boss is back, and he wants more information again. I want the same list, but also include 4WDs in it.
This however, gives us a great excuse to look at two different ways to accomplish this. We could add another condition to the where clause like this:
select
a.ID,
b.model,
c.color
from
cars a
join models b
on a.model=b.ID
join colors c
on a.color=c.ID
join brands d
on a.brand=d.ID
where
b.ID=1
or b.ID=3
While the above will work perfectly well, lets look at it differently, this is a great excuse to show how a union query will work.
We know that the following will return all the Sports cars:
select
a.ID,
b.model,
c.color
from
cars a
join models b
on a.model=b.ID
join colors c
on a.color=c.ID
join brands d
on a.brand=d.ID
where
b.ID=1
And the following would return all the 4WDs:
select
a.ID,
b.model,
c.color
from
cars a
join models b
on a.model=b.ID
join colors c
on a.color=c.ID
join brands d
on a.brand=d.ID
where
b.ID=3
So by adding a union all clause between them, the results of the second query will be appended to the results of the first query.
select
a.ID,
b.model,
c.color
from
cars a
join models b
on a.model=b.ID
join colors c
on a.color=c.ID
join brands d
on a.brand=d.ID
where
b.ID=1
union all
select
a.ID,
b.model,
c.color
from
cars a
join models b
on a.model=b.ID
join colors c
on a.color=c.ID
join brands d
on a.brand=d.ID
where
b.ID=3
+----+--------+-------+
| ID | model | color |
+----+--------+-------+
| 1 | Sports | Red |
| 8 | Sports | Green |
| 10 | Sports | White |
| 3 | Sports | Black |
| 5 | 4WD | Green |
| 7 | 4WD | White |
| 9 | 4WD | Black |
+----+--------+-------+
7 rows in set (0.00 sec)
As you can see, the results of the first query are returned first, followed by the results of the second query.
In this example, it would of course have been much easier to simply use the first query, but union queries can be great for specific cases. They are a great way to return specific results from tables from tables that aren't easily joined together - or for that matter completely unrelated tables. There are a few rules to follow however.
The column types from the first query must match the column types from every other query below.
The names of the columns from the first query will be used to identify the entire set of results.
The number of columns in each query must be the same.
Now, you might be wondering what the difference is between using union and union all. A union query will remove duplicates, while a union all will not. This does mean that there is a small performance hit when using union over union all but the results may be worth it - I won't speculate on that sort of thing in this though.
On this note, it might be worth noting some additional notes here.
If we wanted to order the results, we can use an order by but you can't use the alias anymore. In the query above, appending an order by a.ID would result in an error - as far as the results are concerned, the column is called ID rather than a.ID - even though the same alias has been used in both queries.
We can only have one order by statement, and it must be as the last statement.
For the next examples, I am adding a few extra rows to our tables.
I have added Holden to the brands table.
I have also added a row into cars that has the color value of 12 - which has no reference in the colors table.
Okay, the boss is back again, barking requests out - *I want a count of each brand we carry and the number of cars in it!` - Typical, we just get to an interesting section of our discussion and the boss wants more work.
Rightyo, so the first thing we need to do is get a complete listing of possible brands.
select
a.brand
from
brands a
+--------+
| brand |
+--------+
| Ford |
| Toyota |
| Nissan |
| Smart |
| BMW |
| Holden |
+--------+
6 rows in set (0.00 sec)
Now, when we join this to our cars table we get the following result:
select
a.brand
from
brands a
join cars b
on a.ID=b.brand
group by
a.brand
+--------+
| brand |
+--------+
| BMW |
| Ford |
| Nissan |
| Smart |
| Toyota |
+--------+
5 rows in set (0.00 sec)
Which is of course a problem - we aren't seeing any mention of the lovely Holden brand I added.
This is because a join looks for matching rows in both tables. As there is no data in cars that is of type Holden it isn't returned. This is where we can use an outer join. This will return all the results from one table whether they are matched in the other table or not:
select
a.brand
from
brands a
left outer join cars b
on a.ID=b.brand
group by
a.brand
+--------+
| brand |
+--------+
| BMW |
| Ford |
| Holden |
| Nissan |
| Smart |
| Toyota |
+--------+
6 rows in set (0.00 sec)
Now that we have that, we can add a lovely aggregate function to get a count and get the boss off our backs for a moment.
select
a.brand,
count(b.id) as countOfBrand
from
brands a
left outer join cars b
on a.ID=b.brand
group by
a.brand
+--------+--------------+
| brand | countOfBrand |
+--------+--------------+
| BMW | 2 |
| Ford | 2 |
| Holden | 0 |
| Nissan | 1 |
| Smart | 1 |
| Toyota | 5 |
+--------+--------------+
6 rows in set (0.00 sec)
And with that, away the boss skulks.
Now, to explain this in some more detail, outer joins can be of the left or right type. The Left or Right defines which table is fully included. A left outer join will include all the rows from the table on the left, while (you guessed it) a right outer join brings all the results from the table on the right into the results.
Some databases will allow a full outer join which will bring back results (whether matched or not) from both tables, but this isn't supported in all databases.
Now, I probably figure at this point in time, you are wondering whether or not you can merge join types in a query - and the answer is yes, you absolutely can.
select
b.brand,
c.color,
count(a.id) as countOfBrand
from
cars a
right outer join brands b
on b.ID=a.brand
join colors c
on a.color=c.ID
group by
a.brand,
c.color
+--------+-------+--------------+
| brand | color | countOfBrand |
+--------+-------+--------------+
| Ford | Blue | 1 |
| Ford | White | 1 |
| Toyota | Black | 1 |
| Toyota | Green | 2 |
| Toyota | Red | 1 |
| Nissan | Black | 1 |
| Smart | White | 1 |
| BMW | Blue | 1 |
| BMW | White | 1 |
+--------+-------+--------------+
9 rows in set (0.00 sec)
So, why is that not the results that were expected? It is because although we have selected the outer join from cars to brands, it wasn't specified in the join to colors - so that particular join will only bring back results that match in both tables.
Here is the query that would work to get the results that we expected:
select
a.brand,
c.color,
count(b.id) as countOfBrand
from
brands a
left outer join cars b
on a.ID=b.brand
left outer join colors c
on b.color=c.ID
group by
a.brand,
c.color
+--------+-------+--------------+
| brand | color | countOfBrand |
+--------+-------+--------------+
| BMW | Blue | 1 |
| BMW | White | 1 |
| Ford | Blue | 1 |
| Ford | White | 1 |
| Holden | NULL | 0 |
| Nissan | Black | 1 |
| Smart | White | 1 |
| Toyota | NULL | 1 |
| Toyota | Black | 1 |
| Toyota | Green | 2 |
| Toyota | Red | 1 |
+--------+-------+--------------+
11 rows in set (0.00 sec)
As we can see, we have two outer joins in the query and the results are coming through as expected.
Now, how about those other types of joins you ask? What about Intersections?
Well, not all databases support the intersection but pretty much all databases will allow you to create an intersection through a join (or a well structured where statement at the least).
An Intersection is a type of join somewhat similar to a union as described above - but the difference is that it only returns rows of data that are identical (and I do mean identical) between the various individual queries joined by the union. Only rows that are identical in every regard will be returned.
A simple example would be as such:
select
*
from
colors
where
ID>2
intersect
select
*
from
colors
where
id<4
While a normal union query would return all the rows of the table (the first query returning anything over ID>2 and the second anything having ID<4) which would result in a full set, an intersect query would only return the row matching id=3 as it meets both criteria.
Now, if your database doesn't support an intersect query, the above can be easily accomlished with the following query:
select
a.ID,
a.color,
a.paint
from
colors a
join colors b
on a.ID=b.ID
where
a.ID>2
and b.ID<4
+----+-------+----------+
| ID | color | paint |
+----+-------+----------+
| 3 | Blue | Metallic |
+----+-------+----------+
1 row in set (0.00 sec)
If you wish to perform an intersection across two different tables using a database that doesn't inherently support an intersection query, you will need to create a join on every column of the tables.
Ok, I found this post very interesting and I would like to share some of my knowledge on creating a query. Thanks for this Fluffeh. Others who may read this and may feel that I'm wrong are 101% free to edit and criticise my answer. (Honestly, I feel very thankful for correcting my mistake(s).)
I'll be posting some of the frequently asked questions in MySQL tag.
Trick No. 1 (rows that matches to multiple conditions)
Given this schema
CREATE TABLE MovieList
(
ID INT,
MovieName VARCHAR(25),
CONSTRAINT ml_pk PRIMARY KEY (ID),
CONSTRAINT ml_uq UNIQUE (MovieName)
);
INSERT INTO MovieList VALUES (1, 'American Pie');
INSERT INTO MovieList VALUES (2, 'The Notebook');
INSERT INTO MovieList VALUES (3, 'Discovery Channel: Africa');
INSERT INTO MovieList VALUES (4, 'Mr. Bean');
INSERT INTO MovieList VALUES (5, 'Expendables 2');
CREATE TABLE CategoryList
(
MovieID INT,
CategoryName VARCHAR(25),
CONSTRAINT cl_uq UNIQUE(MovieID, CategoryName),
CONSTRAINT cl_fk FOREIGN KEY (MovieID) REFERENCES MovieList(ID)
);
INSERT INTO CategoryList VALUES (1, 'Comedy');
INSERT INTO CategoryList VALUES (1, 'Romance');
INSERT INTO CategoryList VALUES (2, 'Romance');
INSERT INTO CategoryList VALUES (2, 'Drama');
INSERT INTO CategoryList VALUES (3, 'Documentary');
INSERT INTO CategoryList VALUES (4, 'Comedy');
INSERT INTO CategoryList VALUES (5, 'Comedy');
INSERT INTO CategoryList VALUES (5, 'Action');
QUESTION
Find all movies that belong to at least both Comedy and Romance categories.
Solution
This question can be very tricky sometimes. It may seem that a query like this will be the answer:-
SELECT DISTINCT a.MovieName
FROM MovieList a
INNER JOIN CategoryList b
ON a.ID = b.MovieID
WHERE b.CategoryName = 'Comedy' AND
b.CategoryName = 'Romance'
SQLFiddle Demo
which is definitely very wrong because it produces no result. The explanation of this is that there is only one valid value of CategoryName on each row. For instance, the first condition returns true, the second condition is always false. Thus, by using AND operator, both condition should be true; otherwise, it will be false. Another query is like this,
SELECT DISTINCT a.MovieName
FROM MovieList a
INNER JOIN CategoryList b
ON a.ID = b.MovieID
WHERE b.CategoryName IN ('Comedy','Romance')
SQLFiddle Demo
and the result is still incorrect because it matches to record that has at least one match on the categoryName. The real solution would be by counting the number of record instances per movie. The number of instance should match to the total number of the values supplied in the condition.
SELECT a.MovieName
FROM MovieList a
INNER JOIN CategoryList b
ON a.ID = b.MovieID
WHERE b.CategoryName IN ('Comedy','Romance')
GROUP BY a.MovieName
HAVING COUNT(*) = 2
SQLFiddle Demo (the answer)
SQL of Relational Division
Trick No. 2 (maximum record for each entry)
Given schema,
CREATE TABLE Software
(
ID INT,
SoftwareName VARCHAR(25),
Descriptions VARCHAR(150),
CONSTRAINT sw_pk PRIMARY KEY (ID),
CONSTRAINT sw_uq UNIQUE (SoftwareName)
);
INSERT INTO Software VALUES (1,'PaintMe','used for photo editing');
INSERT INTO Software VALUES (2,'World Map','contains map of different places of the world');
INSERT INTO Software VALUES (3,'Dictionary','contains description, synonym, antonym of the words');
CREATE TABLE VersionList
(
SoftwareID INT,
VersionNo INT,
DateReleased DATE,
CONSTRAINT sw_uq UNIQUE (SoftwareID, VersionNo),
CONSTRAINT sw_fk FOREIGN KEY (SOftwareID) REFERENCES Software(ID)
);
INSERT INTO VersionList VALUES (3, 2, '2009-12-01');
INSERT INTO VersionList VALUES (3, 1, '2009-11-01');
INSERT INTO VersionList VALUES (3, 3, '2010-01-01');
INSERT INTO VersionList VALUES (2, 2, '2010-12-01');
INSERT INTO VersionList VALUES (2, 1, '2009-12-01');
INSERT INTO VersionList VALUES (1, 3, '2011-12-01');
INSERT INTO VersionList VALUES (1, 2, '2010-12-01');
INSERT INTO VersionList VALUES (1, 1, '2009-12-01');
INSERT INTO VersionList VALUES (1, 4, '2012-12-01');
QUESTION
Find the latest version on each software. Display the following columns: SoftwareName,Descriptions,LatestVersion (from VersionNo column),DateReleased
Solution
Some SQL developers mistakenly use MAX() aggregate function. They tend to create like this,
SELECT a.SoftwareName, a.Descriptions,
MAX(b.VersionNo) AS LatestVersion, b.DateReleased
FROM Software a
INNER JOIN VersionList b
ON a.ID = b.SoftwareID
GROUP BY a.ID
ORDER BY a.ID
SQLFiddle Demo
(most RDBMS generates a syntax error on this because of not specifying some of the non-aggregated columns on the group by clause) the result produces the correct LatestVersion on each software but obviously the DateReleased are incorrect. MySQL doesn't support Window Functions and Common Table Expression yet as some RDBMS do already. The workaround on this problem is to create a subquery which gets the individual maximum versionNo on each software and later on be joined on the other tables.
SELECT a.SoftwareName, a.Descriptions,
b.LatestVersion, c.DateReleased
FROM Software a
INNER JOIN
(
SELECT SoftwareID, MAX(VersionNO) LatestVersion
FROM VersionList
GROUP BY SoftwareID
) b ON a.ID = b.SoftwareID
INNER JOIN VersionList c
ON c.SoftwareID = b.SoftwareID AND
c.VersionNO = b.LatestVersion
GROUP BY a.ID
ORDER BY a.ID
SQLFiddle Demo (the answer)
So that was it. I'll be posting another soon as I recall any other FAQ on MySQL tag. Thank you for reading this little article. I hope that you have atleast get even a little knowledge from this.
UPDATE 1
Trick No. 3 (Finding the latest record between two IDs)
Given Schema
CREATE TABLE userList
(
ID INT,
NAME VARCHAR(20),
CONSTRAINT us_pk PRIMARY KEY (ID),
CONSTRAINT us_uq UNIQUE (NAME)
);
INSERT INTO userList VALUES (1, 'Fluffeh');
INSERT INTO userList VALUES (2, 'John Woo');
INSERT INTO userList VALUES (3, 'hims056');
CREATE TABLE CONVERSATION
(
ID INT,
FROM_ID INT,
TO_ID INT,
MESSAGE VARCHAR(250),
DeliveryDate DATE
);
INSERT INTO CONVERSATION VALUES (1, 1, 2, 'hi john', '2012-01-01');
INSERT INTO CONVERSATION VALUES (2, 2, 1, 'hello fluff', '2012-01-02');
INSERT INTO CONVERSATION VALUES (3, 1, 3, 'hey hims', '2012-01-03');
INSERT INTO CONVERSATION VALUES (4, 1, 3, 'please reply', '2012-01-04');
INSERT INTO CONVERSATION VALUES (5, 3, 1, 'how are you?', '2012-01-05');
INSERT INTO CONVERSATION VALUES (6, 3, 2, 'sample message!', '2012-01-05');
QUESTION
Find the latest conversation between two users.
Solution
SELECT b.Name SenderName,
c.Name RecipientName,
a.Message,
a.DeliveryDate
FROM Conversation a
INNER JOIN userList b
ON a.From_ID = b.ID
INNER JOIN userList c
ON a.To_ID = c.ID
WHERE (LEAST(a.FROM_ID, a.TO_ID), GREATEST(a.FROM_ID, a.TO_ID), DeliveryDate)
IN
(
SELECT LEAST(FROM_ID, TO_ID) minFROM,
GREATEST(FROM_ID, TO_ID) maxTo,
MAX(DeliveryDate) maxDate
FROM Conversation
GROUP BY minFROM, maxTo
)
SQLFiddle Demo
Part 2 - Subqueries
Okay, now the boss has burst in again - I want a list of all of our cars with the brand and a total of how many of that brand we have!
This is a great opportunity to use the next trick in our bag of SQL goodies - the subquery. If you are unfamiliar with the term, a subquery is a query that runs inside another query. There are many different ways to use them.
For our request, lets first put a simple query together that will list each car and the brand:
select
a.ID,
b.brand
from
cars a
join brands b
on a.brand=b.ID
Now, if we wanted to simply get a count of cars sorted by brand, we could of course write this:
select
b.brand,
count(a.ID) as countCars
from
cars a
join brands b
on a.brand=b.ID
group by
b.brand
+--------+-----------+
| brand | countCars |
+--------+-----------+
| BMW | 2 |
| Ford | 2 |
| Nissan | 1 |
| Smart | 1 |
| Toyota | 5 |
+--------+-----------+
So, we should be able to simply add in the count function to our original query right?
select
a.ID,
b.brand,
count(a.ID) as countCars
from
cars a
join brands b
on a.brand=b.ID
group by
a.ID,
b.brand
+----+--------+-----------+
| ID | brand | countCars |
+----+--------+-----------+
| 1 | Toyota | 1 |
| 2 | Ford | 1 |
| 3 | Nissan | 1 |
| 4 | Smart | 1 |
| 5 | Toyota | 1 |
| 6 | BMW | 1 |
| 7 | Ford | 1 |
| 8 | Toyota | 1 |
| 9 | Toyota | 1 |
| 10 | BMW | 1 |
| 11 | Toyota | 1 |
+----+--------+-----------+
11 rows in set (0.00 sec)
Sadly, no, we can't do that. The reason is that when we add in the car ID (column a.ID) we have to add it into the group by - so now, when the count function works, there is only one ID matched per ID.
This is where we can however use a subquery - in fact we can do two completely different types of subquery that will return the same results that we need for this. The first is to simply put the subquery in the select clause. This means each time we get a row of data, the subquery will run off, get a column of data and then pop it into our row of data.
select
a.ID,
b.brand,
(
select
count(c.ID)
from
cars c
where
a.brand=c.brand
) as countCars
from
cars a
join brands b
on a.brand=b.ID
+----+--------+-----------+
| ID | brand | countCars |
+----+--------+-----------+
| 2 | Ford | 2 |
| 7 | Ford | 2 |
| 1 | Toyota | 5 |
| 5 | Toyota | 5 |
| 8 | Toyota | 5 |
| 9 | Toyota | 5 |
| 11 | Toyota | 5 |
| 3 | Nissan | 1 |
| 4 | Smart | 1 |
| 6 | BMW | 2 |
| 10 | BMW | 2 |
+----+--------+-----------+
11 rows in set (0.00 sec)
And Bam!, this would do us. If you noticed though, this sub query will have to run for each and every single row of data we return. Even in this little example, we only have five different Brands of car, but the subquery ran eleven times as we have eleven rows of data that we are returning. So, in this case, it doesn't seem like the most efficient way to write code.
For a different approach, lets run a subquery and pretend it is a table:
select
a.ID,
b.brand,
d.countCars
from
cars a
join brands b
on a.brand=b.ID
join
(
select
c.brand,
count(c.ID) as countCars
from
cars c
group by
c.brand
) d
on a.brand=d.brand
+----+--------+-----------+
| ID | brand | countCars |
+----+--------+-----------+
| 1 | Toyota | 5 |
| 2 | Ford | 2 |
| 3 | Nissan | 1 |
| 4 | Smart | 1 |
| 5 | Toyota | 5 |
| 6 | BMW | 2 |
| 7 | Ford | 2 |
| 8 | Toyota | 5 |
| 9 | Toyota | 5 |
| 10 | BMW | 2 |
| 11 | Toyota | 5 |
+----+--------+-----------+
11 rows in set (0.00 sec)
Okay, so we have the same results (ordered slightly different - it seems the database wanted to return results ordered by the first column we picked this time) - but the same right numbers.
So, what's the difference between the two - and when should we use each type of subquery? First, lets make sure we understand how that second query works. We selected two tables in the from clause of our query, and then wrote a query and told the database that it was in fact a table instead - which the database is perfectly happy with. There can be some benefits to using this method (as well as some limitations). Foremost is that this subquery ran once. If our database contained a large volume of data, there could well be a massive improvement over the first method. However, as we are using this as a table, we have to bring in extra rows of data - so that they can actually be joined back to our rows of data. We also have to be sure that there are enough rows of data if we are going to use a simple join like in the query above. If you recall, the join will only pull back rows that have matching data on both sides of the join. If we aren't careful, this could result in valid data not being returned from our cars table if there wasn't a matching row in this subquery.
Now, looking back at the first subquery, there are some limitations as well. because we are pulling data back into a single row, we can ONLY pull back one row of data. Subqueries used in the select clause of a query very often use only an aggregate function such as sum, count, max or another similar aggregate function. They don't have to, but that is often how they are written.
So, before we move on, lets have a quick look at where else we can use a subquery. We can use it in the where clause - now, this example is a little contrived as in our database, there are better ways of getting the following data, but seeing as it is only for an example, lets have a look:
select
ID,
brand
from
brands
where
brand like '%o%'
+----+--------+
| ID | brand |
+----+--------+
| 1 | Ford |
| 2 | Toyota |
| 6 | Holden |
+----+--------+
3 rows in set (0.00 sec)
This returns us a list of brand IDs and Brand names (the second column is only added to show us the brands) that contain the letter o in the name.
Now, we could use the results of this query in a where clause this:
select
a.ID,
b.brand
from
cars a
join brands b
on a.brand=b.ID
where
a.brand in
(
select
ID
from
brands
where
brand like '%o%'
)
+----+--------+
| ID | brand |
+----+--------+
| 2 | Ford |
| 7 | Ford |
| 1 | Toyota |
| 5 | Toyota |
| 8 | Toyota |
| 9 | Toyota |
| 11 | Toyota |
+----+--------+
7 rows in set (0.00 sec)
As you can see, even though the subquery was returning the three brand IDs, our cars table only had entries for two of them.
In this case, for further detail, the subquery is working as if we wrote the following code:
select
a.ID,
b.brand
from
cars a
join brands b
on a.brand=b.ID
where
a.brand in (1,2,6)
+----+--------+
| ID | brand |
+----+--------+
| 1 | Toyota |
| 2 | Ford |
| 5 | Toyota |
| 7 | Ford |
| 8 | Toyota |
| 9 | Toyota |
| 11 | Toyota |
+----+--------+
7 rows in set (0.00 sec)
Again, you can see how a subquery vs manual inputs has changed the order of the rows when returning from the database.
While we are discussing subqueries, lets see what else we can do with a subquery:
You can place a subquery within another subquery, and so on and so on. There is a limit which depends on your database, but short of recursive functions of some insane and maniacal programmer, most folks will never hit that limit.
You can place a number of subqueries into a single query, a few in the select clause, some in the from clause and a couple more in the where clause - just remember that each one you put in is making your query more complex and likely to take longer to execute.
If you need to write some efficient code, it can be beneficial to write the query a number of ways and see (either by timing it or by using an explain plan) which is the optimal query to get your results. The first way that works may not always be the best way of doing it.
Part 3 - Tricks and Efficient Code
MySQL in() efficiency
I thought I would add some extra bits, for tips and tricks that have come up.
One question I see come up a fair bit, is How do I get non-matching rows from two tables and I see the answer most commonly accepted as something like the following (based on our cars and brands table - which has Holden listed as a brand, but does not appear in the cars table):
select
a.ID,
a.brand
from
brands a
where
a.ID not in(select brand from cars)
And yes it will work.
+----+--------+
| ID | brand |
+----+--------+
| 6 | Holden |
+----+--------+
1 row in set (0.00 sec)
However it is not efficient in some database. Here is a link to a Stack Overflow question asking about it, and here is an excellent in depth article if you want to get into the nitty gritty.
The short answer is, if the optimiser doesn't handle it efficiently, it may be much better to use a query like the following to get non matched rows:
select
a.brand
from
brands a
left join cars b
on a.id=b.brand
where
b.brand is null
+--------+
| brand |
+--------+
| Holden |
+--------+
1 row in set (0.00 sec)
Update Table with same table in subquery
Ahhh, another oldie but goodie - the old You can't specify target table 'brands' for update in FROM clause.
MySQL will not allow you to run an update... query with a subselect on the same table. Now, you might be thinking, why not just slap it into the where clause right? But what if you want to update only the row with the max() date amoung a bunch of other rows? You can't exactly do that in a where clause.
update
brands
set
brand='Holden'
where
id=
(select
id
from
brands
where
id=6);
ERROR 1093 (HY000): You can't specify target table 'brands'
for update in FROM clause
So, we can't do that eh? Well, not exactly. There is a sneaky workaround that a surprisingly large number of users don't know about - though it does include some hackery that you will need to pay attention to.
You can stick the subquery within another subquery, which puts enough of a gap between the two queries so that it will work. However, note that it might be safest to stick the query within a transaction - this will prevent any other changes being made to the tables while the query is running.
update
brands
set
brand='Holden'
where id=
(select
id
from
(select
id
from
brands
where
id=6
)
as updateTable);
Query OK, 0 rows affected (0.02 sec)
Rows matched: 1 Changed: 0 Warnings: 0
You can use the concept of multiple queries in the FROM keyword. Let me show you one example:
SELECT DISTINCT e.id,e.name,d.name,lap.lappy LAPTOP_MAKE,c_loc.cnty COUNTY
FROM (
SELECT c.id cnty,l.name
FROM county c, location l
WHERE c.id=l.county_id AND l.end_Date IS NOT NULL
) c_loc, emp e
INNER JOIN dept d ON e.deptno =d.id
LEFT JOIN
(
SELECT l.id lappy, c.name cmpy
FROM laptop l, company c
WHERE l.make = c.name
) lap ON e.cmpy_id=lap.cmpy
You can use as many tables as you want to. Use outer joins and union where ever it's necessary, even inside table subqueries.
That's a very easy method to involve as many as tables and fields.
Hopes this makes it find the tables as you're reading through the thing:
jsfiddle
mysql> show columns from colors;
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| id | int(3) | NO | PRI | NULL | auto_increment |
| color | varchar(15) | YES | | NULL | |
| paint | varchar(10) | YES | | NULL | |
+-------+-------------+------+-----+---------+----------------+