Here is the SQL problem.
Table: Countries
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| country_id | int |
| country_name | varchar |
+---------------+---------+
country_id is the primary key for this table.
Each row of this table contains the ID and the name of one country.
Table: Weather
+---------------+------+
| Column Name | Type |
+---------------+------+
| country_id | int |
| weather_state | int |
| day | date |
+---------------+------+
(country_id, day) is the primary key for this table.
Each row of this table indicates the weather state in a country for one day.
Write an SQL query to find the type of weather in each country for November 2019.
The type of weather is:
Cold if the average weather_state is less than or equal to 15,
Hot if the average weather_state is greater than or equal to 25, and
Warm otherwise.
Return the result table in any order.
One of the MySQL solutions is as follows:
SELECT country_name,
       CASE WHEN AVG(weather_state) <= 15 THEN 'Cold'
            WHEN AVG(weather_state) >= 25 THEN 'Hot'
            ELSE 'Warm'
       END AS weather_type
FROM Weather w
JOIN Countries c
  ON w.country_id = c.country_id
 AND LEFT(w.day, 7) = '2019-11'
GROUP BY w.country_id
How does the "case when AVG(weather_state)" get executed, if the group by gets executed after the select statement?
AVG(weather_state) computes the per-group average of column weather_state. It and other aggregate functions can be used in a select clause, from which you can conclude that the grouping defined by a group by clause must be visible in the context where the select clause is evaluated. In this sense, at least, group by gets executed before select. Pretty much everything else does too: FROM, WHERE and HAVING are all logically evaluated before the select clause, and only ORDER BY (and LIMIT) come after it.
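To make the "grouping first, select list second" point concrete, here is a minimal sketch using Python's built-in sqlite3 module in place of MySQL. The country names and weather rows are hypothetical sample data, and the MySQL-specific LEFT(w.day, 7) = '2019-11' filter is replaced by a date range on ISO-formatted text dates, which SQLite can compare directly. The first query shows the per-group averages; the second shows the CASE being evaluated once per group over exactly those averages.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL tables (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Countries (country_id INTEGER PRIMARY KEY, country_name TEXT);
CREATE TABLE Weather (country_id INTEGER, weather_state INTEGER, day TEXT,
                      PRIMARY KEY (country_id, day));
INSERT INTO Countries VALUES (1, 'Peru'), (2, 'Qatar'), (3, 'China');
INSERT INTO Weather VALUES
  (1, 10, '2019-11-01'), (1, 20, '2019-11-02'),  -- average 15: Cold
  (2, 30, '2019-11-03'), (2, 40, '2019-11-04'),  -- average 35: Hot
  (3, 18, '2019-11-05'), (3, 22, '2019-11-06'),  -- average 20: Warm
  (3, 99, '2019-12-01');                         -- December, filtered out
""")

# Step 1: WHERE filters rows, GROUP BY forms one group per country,
# and AVG is computed per group.
per_group = conn.execute("""
SELECT w.country_id, AVG(w.weather_state)
FROM Weather w
WHERE w.day BETWEEN '2019-11-01' AND '2019-11-30'
GROUP BY w.country_id
ORDER BY w.country_id
""").fetchall()

# Step 2: the select list, including the CASE, is evaluated once per group,
# so each AVG inside the CASE sees that group's average.
result = conn.execute("""
SELECT c.country_name,
       CASE WHEN AVG(w.weather_state) <= 15 THEN 'Cold'
            WHEN AVG(w.weather_state) >= 25 THEN 'Hot'
            ELSE 'Warm'
       END AS weather_type
FROM Weather w
JOIN Countries c ON w.country_id = c.country_id
WHERE w.day BETWEEN '2019-11-01' AND '2019-11-30'
GROUP BY c.country_id, c.country_name
ORDER BY c.country_id
""").fetchall()

print(per_group)  # [(1, 15.0), (2, 35.0), (3, 20.0)]
print(result)     # [('Peru', 'Cold'), ('Qatar', 'Hot'), ('China', 'Warm')]
```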
It is possible for an aggregate query to be identifiable only from the select clause: SELECT AVG(weather_state) FROM Weather, for example, has no group by at all. In such cases, the select clause needs to be parsed before it is known that grouping (all rows into a single group) is to be performed. This is the closest I can think of to the execution-order claim you asserted, but it is not at all well characterized as group by being executed after select.
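The "all rows into a single group" case can be observed directly. This sketch again uses Python's sqlite3 module as a stand-in for MySQL, with hypothetical rows: without GROUP BY, an aggregate query returns exactly one row even on an empty table, while a grouped query over an empty table returns no rows because there are no groups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Weather (country_id INTEGER, weather_state INTEGER, day TEXT)")

# With GROUP BY and no rows there are no groups, hence no result rows.
grouped_empty = conn.execute(
    "SELECT country_id, COUNT(*) FROM Weather GROUP BY country_id"
).fetchall()

# Without GROUP BY, the aggregates alone make this an aggregate query:
# all rows (here: none) form one single group, so one row comes back.
empty = conn.execute("SELECT COUNT(*), AVG(weather_state) FROM Weather").fetchall()

conn.executemany("INSERT INTO Weather VALUES (?, ?, ?)",
                 [(1, 10, "2019-11-01"), (2, 30, "2019-11-02")])
full = conn.execute("SELECT COUNT(*), AVG(weather_state) FROM Weather").fetchall()

print(grouped_empty)  # []
print(empty)          # [(0, None)]
print(full)           # [(2, 20.0)]
```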
MySQL's implementation details surely present a more complicated picture, but the fact remains that MySQL does provide correct SQL semantics in this regard. Therefore, even if you look at the details, they cannot reasonably be characterized as executing the group by after the select. Whoever told you that was wrong, or at least their lesson was very misleading, or else you misunderstood them.
I am working on filling in a document using a MySQL database. What I want is to get results in the order given by a WHERE condition. I have the following student table:
student
+-----+------------+-----------------+-----+
| id | nickname | student_name | ... |
+-----+------------+-----------------+-----+
| 1 | Joy | Anderson | ... |
| 2 | Prank | Campbell | ... |
+-----+------------+-----------------+-----+
I ran the following query against the database:
SELECT nickname FROM students WHERE student_name in ('Anderson', 'Campbell')
I expected a result like this:
Joy
Prank
The expected result above follows the order of the values in the WHERE condition (WHERE student_name IN ('Anderson', 'Campbell')): Joy is matched with Anderson and Prank is matched with Campbell. But the current result is:
Prank
Joy
Now I don't know what to do to get my expected result. Can anyone give me some ideas or information about this situation?
You have fallen into a common SQL trap. Rows such as your rows in students and members of sets such as ('Anderson', 'Campbell') have no built-in order. The server doesn't know anything about Anderson coming before Campbell even though your query shows them that way.
Your only recourse is to use an appropriate ORDER BY clause. Without an ORDER BY clause, results are shown in an order that's formally unpredictable. In your case ORDER BY student_name at the end of your query will make your row ordering predictable.
Unpredictable is a complex idea. It's like random except worse. Random usually implies a result is likely to be different each time. Unpredictable means it's the same every time, until it isn't.
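When the wanted order is not alphabetical, the ORDER BY can spell it out explicitly. A minimal sketch, using Python's sqlite3 module in place of MySQL and the two sample rows from the question: a CASE expression maps each listed value to its position. In MySQL specifically, ORDER BY FIELD(student_name, 'Anderson', 'Campbell') is a shorter way to express the same mapping; SQLite has no FIELD(), hence the CASE here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (id INTEGER PRIMARY KEY, nickname TEXT, student_name TEXT);
INSERT INTO students VALUES (1, 'Joy', 'Anderson'), (2, 'Prank', 'Campbell');
""")

# The IN list only filters; the CASE gives each name its position in that list,
# and ORDER BY sorts on that position.
rows = conn.execute("""
SELECT nickname
FROM students
WHERE student_name IN ('Anderson', 'Campbell')
ORDER BY CASE student_name
           WHEN 'Anderson' THEN 1
           WHEN 'Campbell' THEN 2
         END
""").fetchall()

nicknames = [r[0] for r in rows]
print(nicknames)  # ['Joy', 'Prank']
```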
I've been experimenting with this particular table:
http://www.quackit.com/sql/tutorial/sql_order_by.cfm
and it seems that when I order by more than one column, I get the same results as ordering by one column.
For example:
SELECT * FROM Individual ORDER BY last_name;
is basically the same as saying:
SELECT * FROM Individual ORDER BY last_name, first_name;
What's the whole point of ordering by multiple columns in SQL? I really see no practical use for it. Are there things you can accomplish with it that you can't accomplish by sorting on a single column?
It is not the same.
While ORDER BY last_name may produce a result like
last_name | first_name
Doe | John
Doe | Jane
ORDER BY last_name,first_name is always
last_name | first_name
Doe | Jane
Doe | John
If two or more people have the same last name, the second sort column orders them by first name.
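The tie-breaking role of the second column can be seen with a few hypothetical rows inserted deliberately out of order. This sketch uses Python's sqlite3 module rather than the tutorial's database; the first sort key always dominates, and the second one only matters among rows whose first key is equal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Individual (last_name TEXT, first_name TEXT)")
# Hypothetical rows, inserted out of order on purpose.
conn.executemany("INSERT INTO Individual VALUES (?, ?)",
                 [("Doe", "John"), ("Smith", "Anna"), ("Doe", "Jane")])

# last_name decides first; first_name only breaks ties among equal last names.
rows = conn.execute(
    "SELECT last_name, first_name FROM Individual ORDER BY last_name, first_name"
).fetchall()
print(rows)  # [('Doe', 'Jane'), ('Doe', 'John'), ('Smith', 'Anna')]

# Each key can have its own direction, e.g. first names descending within a last name.
rows_desc = conn.execute(
    "SELECT last_name, first_name FROM Individual ORDER BY last_name, first_name DESC"
).fetchall()
print(rows_desc)  # [('Doe', 'John'), ('Doe', 'Jane'), ('Smith', 'Anna')]
```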
It could be that there is an index on the first_name and last_name columns and the rows are being returned in index order, which would make the two queries look the same.
Alright, I'm having a really tough time getting this working.
I have a database of employees, and I am looking to clean up this data. By this I mean that many rows in the table have an employee_sales_id of "?". Each employee can have multiple rows (one each time they change jobs). For the same employee, some rows have the employee_sales_id while others don't. I want to scan my table and update every "?" where the value can be picked up from another row. The Employee_id is unique for each employee.
The table looks like this:
Employee_id | employee_sales_id | name
1234 | abc | Jim Smith
1234 | abc | Jim Smith
1234 | ? | Jim Smith
1234 | abc | Jim Smith
Look at the third row: I want to fix it and update it with abc. There are many employees, so I can't do this manually; it has to be a SQL script. It would also be great to have it process the data on insert.
UPDATE employee
JOIN
  -- one known sales id per employee, taken only from rows that are not '?'
  (SELECT employee_id, MAX(employee_sales_id) AS employee_sales_id
   FROM employee
   WHERE employee_sales_id <> '?'
   GROUP BY employee_id) x
  ON employee.employee_id = x.employee_id
SET employee.employee_sales_id = x.employee_sales_id
WHERE employee.employee_sales_id = '?'
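To try the backfill idea without a MySQL server, here is a minimal sketch using Python's sqlite3 module with the question's rows plus one hypothetical employee who has no known sales id. SQLite has no MySQL-style multi-table UPDATE ... JOIN, so the sketch uses a correlated subquery instead. MySQL normally rejects that form when the subquery reads the table being updated (error 1093, "You can't specify target table ... for update in FROM clause"), which is why the MySQL version goes through a derived table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employee_id INTEGER, employee_sales_id TEXT, name TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [
    (1234, "abc", "Jim Smith"),
    (1234, "abc", "Jim Smith"),
    (1234, "?",   "Jim Smith"),
    (1234, "abc", "Jim Smith"),
    (5678, "?",   "Ann Lee"),  # hypothetical: no known sales id, must stay "?"
])

# Fill each "?" from another row of the same employee; MAX() picks one value
# deterministically if several different ids exist. EXISTS skips employees
# that have no known id at all, so their "?" is left untouched.
conn.execute("""
UPDATE employee
SET employee_sales_id = (
    SELECT MAX(e2.employee_sales_id)
    FROM employee e2
    WHERE e2.employee_id = employee.employee_id
      AND e2.employee_sales_id <> '?'
)
WHERE employee_sales_id = '?'
  AND EXISTS (
    SELECT 1 FROM employee e2
    WHERE e2.employee_id = employee.employee_id
      AND e2.employee_sales_id <> '?'
  )
""")

rows = conn.execute(
    "SELECT employee_id, employee_sales_id FROM employee ORDER BY rowid"
).fetchall()
print(rows)
# [(1234, 'abc'), (1234, 'abc'), (1234, 'abc'), (1234, 'abc'), (5678, '?')]
```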
Try
UPDATE E
SET Employee_sales_id = ES.employee_sales_id
FROM
(SELECT DISTINCT Employee_id, employee_sales_Id
FROM Employee
where employee_sales_id <> '?') ES
INNER JOIN Employee E on E.Employee_id = ES.Employee_id
WHERE E.employee_sales_id = '?'
By the way, I assume you have some sort of primary key and/or further attributes on the Employee table that we are not seeing in your question; otherwise you have duplicate rows in your table, which are going to cause problems for you somewhere.
Update
Sorry, I'm using SQL Server and did not see that your question is about MySQL.
Try:
update employees x join (select employee_id,
                                min(employee_sales_id) as good_id,
                                jobcode,
                                job_start_date
                         from employees
                         where employee_sales_id <> '?'
                         group by employee_id, jobcode, job_start_date) y
  on x.employee_id = y.employee_id
 and x.jobcode = y.jobcode
 and x.job_start_date = y.job_start_date
set x.employee_sales_id = y.good_id
where x.employee_sales_id = '?'
If the employee_sales_id changes over time, then I assume you would want the row with the ? to be populated with the employee_sales_id value recorded for the same employee_id under the same job code and job start date.
In this book I'm currently reading while following a course on databases, the following example of an illegal query using an aggregate operator is given:
Find the name and age of the oldest sailor.
Consider the following attempt to answer this query:
SELECT S.sname, MAX(S.age)
FROM Sailors S
The intent is for this query to return not only the maximum age but
also the name of the sailors having that age. However, this query is
illegal in SQL--if the SELECT clause uses an aggregate operation, then
it must use only aggregate operations unless the query contains a GROUP BY clause!
Some time later while doing an exercise using MySQL, I faced a similar problem, and made a mistake similar to the one mentioned. However, MySQL didn't complain and just spit out some tables which later turned out not to be what I needed.
Is the query above really illegal in SQL, but legal in MySQL, and if so, why is that?
In what situation would one need to make such a query?
Further elaboration of the question:
The question isn't about whether or not all attributes mentioned in a SELECT should also be mentioned in a GROUP BY.
It's about why the above query, which uses attributes together with aggregate operations on attributes without any GROUP BY, is legal in MySQL.
Let's say the Sailors table looked like this:
+----------+------+
| sname | age |
+----------+------+
| John Doe | 30 |
| Jane Doe | 50 |
+----------+------+
The query would then return:
+----------+------------+
| sname | MAX(S.age) |
+----------+------------+
| John Doe | 50 |
+----------+------------+
Now who would need that? John Doe ain't 50, he's 30!
As stated in the citation from the book, this is a first attempt to get the name and age of the oldest sailor, in this example, Jane Doe at the age of 50.
SQL would say this query is illegal, but MySQL just proceeds and spits out "garbage".
Who would need this kind of result?
Why does MySQL allow this little trap for newcomers?
By the way, this was the default MySQL behavior before MySQL 5.7.5; from 5.7.5 on, ONLY_FULL_GROUP_BY is enabled by default. The behavior can be changed by setting the ONLY_FULL_GROUP_BY server SQL mode in the my.ini file or in the session:
SET sql_mode = 'ONLY_FULL_GROUP_BY';
SELECT * FROM sakila.film_actor GROUP BY actor_id;
Error: 'sakila.film_actor.film_id' isn't in GROUP BY
ONLY_FULL_GROUP_BY - Do not permit queries for which the select list refers to nonaggregated columns that are not named in the GROUP BY clause.
Is the query above really illegal in SQL, but legal in MySQL
Yes
if so, why is that
I don't know the reasons for the design decisions made in MySQL, but considering that you can get the actual related data from the same row(s) as the aggregate came from (e.g., MAX or MIN) with only slightly more work, I don't see any advantage in returning additional column data from arbitrary rows.
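The "slightly more work" version for the Sailors example is to compute the maximum first and then fetch the row(s) that actually have it. A minimal sketch with Python's sqlite3 module and the two rows from the question; the query itself is standard SQL and should run unchanged in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sname TEXT, age INTEGER)")
conn.executemany("INSERT INTO Sailors VALUES (?, ?)",
                 [("John Doe", 30), ("Jane Doe", 50)])

# The subquery yields the single maximum age; the outer query returns every
# sailor with that age, so ties would all be listed rather than one arbitrary name.
oldest = conn.execute("""
SELECT S.sname, S.age
FROM Sailors S
WHERE S.age = (SELECT MAX(age) FROM Sailors)
""").fetchall()
print(oldest)  # [('Jane Doe', 50)]
```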
I strongly dislike this "feature" in MySQL and it trips up many people who learn aggregates on MySQL and then move to a different dbms, and suddenly realize they never quite knew what they were doing.
Based on a link which a_horse_with_no_name provided in a comment, I have arrived at my own answer:
It seems that the MySQL way of using GROUP BY differs from the SQL way, in order to permit leaving columns out of the GROUP BY clause when they are functionally dependent on other included columns anyway.
Let's say we have a table displaying the activity of a bank account.
It's not a very thought-out table, but it's the only one we have, and that will have to do.
Instead of keeping track of an amount, we imagine an account starts at '0', and all transactions to it are recorded instead, so the amount is the sum of the transactions. The table could look like this:
+------------+----------+-------------+
| costumerID | name | transaction |
+------------+----------+-------------+
| 1337 | h4x0r | 101 |
| 42 | John Doe | 500 |
| 1337 | h4x0r | -101 |
| 42 | John Doe | -200 |
| 42 | John Doe | 500 |
| 42 | John Doe | -200 |
+------------+----------+-------------+
It is clear that the 'name' is functionally dependent on the 'costumerID'.
(The other way around would also be possible in this example.)
What if we wanted to know the costumerID, name and current amount of each customer?
In such a situation, two very similar queries would return the following correct result:
+------------+----------+--------+
| costumerID | name | amount |
+------------+----------+--------+
| 42 | John Doe | 600 |
| 1337 | h4x0r | 0 |
+------------+----------+--------+
This query can be executed in MySQL, and is legal according to SQL.
SELECT costumerID, name, SUM(transaction) AS amount
FROM Activity
GROUP BY costumerID, name
This query can be executed in MySQL, and is NOT legal according to SQL.
SELECT costumerID, name, SUM(transaction) AS amount
FROM Activity
GROUP BY costumerID
The following line would make the query return an error instead, since it would now have to follow the SQL way of using aggregation operations and GROUP BY:
SET sql_mode = 'ONLY_FULL_GROUP_BY';
The argument for allowing the second query in MySQL seems to be that all columns mentioned in SELECT but not in GROUP BY are assumed to be either used inside an aggregate operation (the case with 'transaction') or functionally dependent on other included columns (the case with 'name'). In the case of 'name', we can be sure that the correct 'name' is chosen for every group, since it is functionally dependent on 'costumerID', and therefore there is only one possible name for each costumerID group.
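Both halves of that argument can be checked on the Activity rows above. This sketch uses Python's sqlite3 module in place of MySQL; "transaction" is quoted because it is a keyword in SQLite. The first query is the portable form that lists every non-aggregated column in GROUP BY. The second lists any costumerID that has more than one distinct name; an empty result means the dependency holds in the current data, although the database itself does not know or enforce it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "transaction" is quoted because it is a keyword in SQLite.
conn.execute('CREATE TABLE Activity (costumerID INTEGER, name TEXT, "transaction" INTEGER)')
conn.executemany("INSERT INTO Activity VALUES (?, ?, ?)", [
    (1337, "h4x0r", 101), (42, "John Doe", 500), (1337, "h4x0r", -101),
    (42, "John Doe", -200), (42, "John Doe", 500), (42, "John Doe", -200),
])

# Portable form: every non-aggregated column appears in GROUP BY.
amounts = conn.execute("""
SELECT costumerID, name, SUM("transaction") AS amount
FROM Activity
GROUP BY costumerID, name
ORDER BY costumerID
""").fetchall()
print(amounts)  # [(42, 'John Doe', 600), (1337, 'h4x0r', 0)]

# Data-level check of the dependency: costumerIDs with more than one name.
violations = conn.execute("""
SELECT costumerID
FROM Activity
GROUP BY costumerID
HAVING COUNT(DISTINCT name) > 1
""").fetchall()
print(violations)  # []
```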
This way of using GROUP BY seems flawed though, since MySQL doesn't do any further checks on what is left out of the GROUP BY clause. People can pick and choose columns from their SELECT statement to put in their GROUP BY clause as they see fit, even if it makes no sense to include or leave out any particular column.
The Sailor example illustrates this flaw very well.
When using aggregation operators (possibly in conjunction with GROUP BY), each group in the returned set has only one value for each of its columns. In the case of Sailors, since the GROUP BY clause is left out, the whole table is put into one single group. This group needs a name and a maximum age. Choosing a maximum age for it is a no-brainer, since MAX(S.age) only returns one value. In the case of S.sname though, which is only mentioned in SELECT, there are now as many choices as there are unique snames in the whole Sailors table (in this case two, John and Jane Doe). MySQL doesn't have any clue which to choose, we didn't give it any, and it didn't hit the brakes in time, so it just picks one; here it was the first row it came across (John Doe). If the two rows were switched, it would actually give "the right answer" by accident.
It just seems plain dumb that something like this is allowed in MySQL, that the result of a query using GROUP BY could potentially depend on the ordering of the table if something is left out of the GROUP BY clause. Apparently, that's just how MySQL rolls. But couldn't it at least have the courtesy of warning us when it has no clue what it's doing because of a "flawed" query? I mean, sure, if you give the wrong instructions to a program, it probably wouldn't (or shouldn't) do what you want, but if you give unclear instructions, I certainly wouldn't want it to just start guessing or pick whatever comes first... -_-'
MySQL allows this non-standard SQL syntax because there is at least one specific case in which it makes the SQL nominally easier to write. That case is when you're joining two tables which have a PRIMARY / FOREIGN KEY relationship (whether enforced by the database or not) and you want an aggregate value from the FOREIGN KEY side and multiple columns from the PRIMARY KEY side.
Consider a system with Customer and Orders tables. Imagine you want all the fields from the customer table along with the total of the Amount field from the Orders table. In standard SQL you would write:
SELECT C.CustomerID, C.FirstName, C.LastName, C.Address, C.City, C.State, C.Zip, SUM(O.Amount)
FROM Customer C INNER JOIN Orders O ON C.CustomerID = O.CustomerID
GROUP BY C.CustomerID, C.FirstName, C.LastName, C.Address, C.City, C.State, C.Zip
Notice the unwieldy GROUP BY clause, and imagine what it would look like if there were more columns you wanted from customer.
In MySQL, you could write:
SELECT C.CustomerID, C.FirstName, C.LastName, C.Address, C.City, C.State, C.Zip, SUM(O.Amount)
FROM Customer C INNER JOIN Orders O ON C.CustomerID = O.CustomerID
GROUP BY C.CustomerID
or even (I think, I haven't tried it):
SELECT C.*, SUM(O.Amount)
FROM Customer C INNER JOIN Orders O ON C.CustomerID = O.CustomerID
GROUP BY C.CustomerID
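Much easier to write. In this particular case it's safe as well, since you know that only one row from the Customer table will contribute to each group (assuming CustomerID is a PRIMARY or UNIQUE KEY). When CustomerID is the primary key, MySQL 5.7.5 and later also recognize this functional dependency, so the query is accepted even with ONLY_FULL_GROUP_BY enabled.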
Personally, I'm not a big fan of this exception to standard SQL syntax (since there are many cases where it's not safe to use this syntax and rely on getting values from any particular row in the group), but I can see where it makes certain kinds of queries easier and (in the case of my second MySQL example) possible.
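For comparison, a form that keeps the GROUP BY short in any SQL dialect is to aggregate Orders on its own and then join the one-row-per-customer result back to Customer. This is a different technique from MySQL's relaxed GROUP BY, shown here as a sketch with Python's sqlite3 module and hypothetical, trimmed-down Customer and Orders tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, trimmed-down Customer and Orders tables.
conn.executescript("""
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT, City TEXT);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, Amount INTEGER);
INSERT INTO Customer VALUES (1, 'Ada', 'Lovelace', 'London'), (2, 'Alan', 'Turing', 'Wilmslow');
INSERT INTO Orders (CustomerID, Amount) VALUES (1, 100), (1, 50), (2, 70);
""")

# The derived table O has exactly one row per CustomerID, so joining it to
# Customer needs no GROUP BY over the Customer columns at all.
rows = conn.execute("""
SELECT C.*, O.Total
FROM Customer C
INNER JOIN (
    SELECT CustomerID, SUM(Amount) AS Total
    FROM Orders
    GROUP BY CustomerID
) O ON C.CustomerID = O.CustomerID
ORDER BY C.CustomerID
""").fetchall()
print(rows)
# [(1, 'Ada', 'Lovelace', 'London', 150), (2, 'Alan', 'Turing', 'Wilmslow', 70)]
```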