I have read multiple articles and now I am confused between the following two statements.
If we use HAVING without GROUP BY, the whole table acts as a single group.
If we use HAVING without GROUP BY, each row acts as an individual group.
Which one is correct in MySQL?
For example, I have a table named ABC as follows:
| Wage |
_____________
| 4 |
| 8 |
| 28 |
| 90 |
If we use the following query:
select wage
from ABC
having wage > 1
then all the records get printed, so each row seems to work as an individual group.
But if we use:
select wage
from ABC
having wage = max(wage)
then no record gets printed, so the whole table seems to work as a single group.
So which one is correct, and why do these two queries show different results?
Don't use HAVING to filter individual rows without GROUP BY. MySQL accepts this as an extension, but referencing a non-aggregated column in HAVING without GROUP BY is not valid standard SQL, and the behavior you get will most likely be counter-intuitive.
The first query should be just a where clause:
select wage from abc where wage > 1
The second query just makes no sense: you have both an aggregated and a non-aggregated wage in the having clause. If you want the row that has the maximum wage, then you can order by and limit:
select wage
from abc
order by wage desc limit 1
Or if you want to allow ties, use a subquery:
select *
from abc a
where wage = (select max(a1.wage) from abc a1)
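As a quick check of the rewrites above, here is a minimal sketch that runs them against the example table. It uses Python's built-in sqlite3 module purely as a convenient stand-in (an assumption on my part; the question is about MySQL, where these standard queries should behave the same way), and the variable names are only illustrative.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE abc (wage INTEGER)")
conn.executemany("INSERT INTO abc (wage) VALUES (?)", [(4,), (8,), (28,), (90,)])

# Row-level filter: this belongs in WHERE, not HAVING.
rows_over_one = [r[0] for r in conn.execute(
    "SELECT wage FROM abc WHERE wage > 1 ORDER BY wage")]

# Single row with the highest wage.
top_one = [r[0] for r in conn.execute(
    "SELECT wage FROM abc ORDER BY wage DESC LIMIT 1")]

# Every row that shares the highest wage (ties allowed).
with_ties = [r[0] for r in conn.execute(
    "SELECT wage FROM abc a WHERE wage = (SELECT MAX(a1.wage) FROM abc a1)")]

print(rows_over_one, top_one, with_ties)
```

With the four wages from the question there is no tie, so the last two queries return the same single row.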
For one of the questions in my computing coursework, I was asked to explain the following SQL script in detail:
SELECT exam_board, COUNT(*)
FROM subjects
GROUP BY exam_board;
Below is what I have written in response to that question. I was just wondering if I forgot to include something, or if I stated something incorrectly. Any feedback at all would be greatly appreciated!
The script begins with a SELECT statement. A SELECT statement retrieves records from one or more tables or databases (the data that is returned is then stored inside a result table, which is called a result-set). ‘COUNT(*)’ is a function which returns the number of rows which match a specified criteria (all of them, as there is an asterisk) and it gives a total number of records fetched in a query. Therefore ‘SELECT exam_board, COUNT(*) FROM subjects’ means that the script will return all exam boards from the ‘exam_board’ column in the ‘subjects’ table with their count (of how many subjects are of that exam board). Finally the last line is ‘GROUP BY exam_board;’. The ‘GROUP BY’ clause is often used in SELECT statements to collect data from a number of records. Its purpose is to group the results in one or more columns. In this case it was grouped by ‘exam_board’, meaning that the result of the query will be grouped into a column of the exam boards.
You forgot that the effect of GROUP BY is to reduce the result set to one row per distinct value in the grouping column (exam_board in this query).
So there might be 10,000 rows in the subjects table, but only four distinct values for exam_board. Using GROUP BY means you will only have four rows in the result set, exactly one row for each exam_board.
Then the COUNT(*) will be the count of rows that were "collapsed" for each respective group.
I request that you do not copy & paste my answer, but write your own answer in your own words. My writing style is pretty different from yours, so if you copy & paste, it'll be obvious to your teacher that you lifted this.
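Actually, this is not the best answer.
SELECT can return not only data from the tables, but the result of any function, for example SELECT VERSION() returns the version of the server software.
The asterisk as a parameter of COUNT(*) matters only when NULLs are involved. You can put any column or function there, even COUNT(VERSION()), and the result will be the same as long as the expression is never NULL: COUNT(expr) counts only the rows where expr is not NULL, while COUNT(*) counts all rows.
‘SELECT exam_board, COUNT(*) FROM subjects’ without GROUP BY will, where MySQL accepts it (that is, with the ONLY_FULL_GROUP_BY SQL mode disabled; it is enabled by default since MySQL 5.7.5), return a single row with two columns: a value of the 'exam_board' column taken from an arbitrary row (often the first one) and the total number of rows in the 'subjects' table.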
Content of the table:
mysql> select exam_board from subjects;
+------------+
| exam_board |
+------------+
| 2 |
| 2 |
| 3 |
| 3 |
| 3 |
+------------+
5 rows in set (0.00 sec)
Mixing together column values and an aggregate function returning a single value, like COUNT(), SUM(), MIN() or MAX(), without a GROUP BY clause:
mysql> select exam_board, count(*) from subjects;
+------------+----------+
| exam_board | count(*) |
+------------+----------+
| 2 | 5 |
+------------+----------+
1 row in set (0.00 sec)
Only with the GROUP BY clause do we get the desired result: the count of records for each value of the exam_board field.
mysql> select exam_board, count(*) from subjects group by exam_board;
+------------+----------+
| exam_board | count(*) |
+------------+----------+
| 2 | 2 |
| 3 | 3 |
+------------+----------+
2 rows in set (0.00 sec)
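The grouped count above can be reproduced with a small runnable sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained), and the extra NULL row at the end is hypothetical data added to show how COUNT(*) and COUNT(column) can differ.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subjects (exam_board INTEGER)")
conn.executemany("INSERT INTO subjects (exam_board) VALUES (?)",
                 [(2,), (2,), (3,), (3,), (3,)])

# One output row per distinct exam_board, with the size of each group.
per_board = conn.execute(
    "SELECT exam_board, COUNT(*) FROM subjects "
    "GROUP BY exam_board ORDER BY exam_board").fetchall()

# Hypothetical extra row with no exam board, to compare the two COUNT forms.
conn.execute("INSERT INTO subjects (exam_board) VALUES (NULL)")
total_rows, non_null_boards = conn.execute(
    "SELECT COUNT(*), COUNT(exam_board) FROM subjects").fetchone()

print(per_board, total_rows, non_null_boards)
```

COUNT(*) counts every row, while COUNT(exam_board) skips the row where exam_board is NULL.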
This is the SQL query I have written. It works until right before the group by statement but once I add that part, I get this error:
'reading_datetime' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get
My query:
Select A.bill_account, hour(A.reading_datetime), A.reading_value
from (
Select cast(cast(bill_account as double) as int)bill_account, reading_datetime, cast(reading_value as double)reading_value, `interval`
from amerendataorc
WHERE cast(cast(`interval` as double)as int) = 3600 AND reading_datetime between '2015-03-15 00:00:00' and '2016-03-14 23:59:59'
) A
GROUP BY A.bill_account
HAVING (COUNT(A.bill_account) >= 8000) and (COUNT(A.bill_account) < 9500)
Not sure exactly how the group by is messing up the query.
Take the sum of the reading hour and of the reading value:
Select A.bill_account, sum(hour(A.reading_datetime)), sum(A.reading_value)
from (
Select cast(cast(bill_account as double) as int)bill_account, reading_datetime, cast(reading_value as double)reading_value, `interval`
from amerendataorc
WHERE cast(cast(`interval` as double)as int) = 3600 AND reading_datetime between '2015-03-15 00:00:00' and '2016-03-14 23:59:59'
) A
GROUP BY A.bill_account
HAVING (COUNT(A.bill_account) >= 8000) and (COUNT(A.bill_account) < 9500)
Explanation:
mysql> SELECT * FROM tt where user="user1";
+----------+-------+
| duration | user |
+----------+-------+
| 00:06:00 | user1 |
| 00:02:00 | user1 |
+----------+-------+
2 rows in set (0.00 sec)
mysql> SELECT * FROM tt where user="user1" group by user;
+----------+-------+
| duration | user |
+----------+-------+
| 00:06:00 | user1 |
+----------+-------+
1 row in set (0.00 sec)
Once you add GROUP BY, you get only one summary row per value of the grouped column. In the example above MySQL returns an arbitrary duration from the group (here the first one);
otherwise you can ask for aggregate values such as SUM or MAX.
SQL is trying to avoid an issue whereby you have multiple hour(A.reading_datetime) values per A.bill_account. Grouping by bill_account gives you a list of unique bill_accounts. Each bill_account then has multiple hour(A.reading_datetime) values, and the engine needs you to tell it how to pick one.
You need to group by each value that occurs or use aggregate functions on the non-GROUP BY fields. If you group by reading_datetime and reading_value as well, SQL will list all unique combinations of the three fields in the GROUP BY.
The error message suggests using first(); max(), min(), sum() etc. are all aggregate functions that will help you get one value per bill_account.
You will need to do this for reading_value as well.
Standard SQL doesn't permit queries for which the select list refers to nonaggregated columns that are not named in the GROUP BY clause.
Therefore you have to add those columns to the GROUP BY clause, or you have to aggregate the columns in the SELECT clause, in your case:
Select A.bill_account, sum(hour(A.reading_datetime)), sum(A.reading_value)
But you have to evaluate if it is adequate for your data to sum those columns in that way, and if it isn't, add the columns as GROUP BY criteria.
Any field that is not included in the Group By Clause will require an aggregate function like SUM, COUNT, MIN or MAX to be included in the Selected fields.
http://www.w3schools.com/sql/sql_groupby.asp
To correct the issue you will need to use the following group by clause
GROUP BY A.bill_account, A.reading_datetime, A.reading_value
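The two fixes suggested in the answers above (aggregate the extra columns, or add them to the GROUP BY) can be compared on a tiny table. This is only a sketch under stated assumptions: the `readings` table, its `reading_hour` column and its three rows are hypothetical, the HAVING threshold is scaled down to 2, and Python's built-in sqlite3 module stands in for the asker's engine.

```python
import sqlite3

# In-memory SQLite database with hypothetical sample data (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (bill_account INTEGER, reading_hour INTEGER, reading_value REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)",
                 [(1, 0, 1.5), (1, 1, 2.5), (2, 0, 4.0)])

# Fix 1: keep one row per account and aggregate everything else.
per_account = conn.execute(
    "SELECT bill_account, COUNT(*), SUM(reading_value) FROM readings "
    "GROUP BY bill_account ORDER BY bill_account").fetchall()

# Fix 2: add the extra column to GROUP BY, giving one row per account and hour.
per_account_hour = conn.execute(
    "SELECT bill_account, reading_hour, SUM(reading_value) FROM readings "
    "GROUP BY bill_account, reading_hour "
    "ORDER BY bill_account, reading_hour").fetchall()

# HAVING filters whole groups after aggregation (threshold scaled down to 2).
busy_accounts = conn.execute(
    "SELECT bill_account, COUNT(*) FROM readings "
    "GROUP BY bill_account HAVING COUNT(*) >= 2").fetchall()

print(per_account, per_account_hour, busy_accounts)
```

Which fix is right depends on the row granularity you want: the first collapses each account to a single row, while the second keeps one row per account and hour.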
I'm trying to write a query that returns a fixed number of results in a group concat. I don't think it's possible with a group concat, but I'm having trouble figuring out what sort of subquery to add.
Here's what I would like to do:
Query
select id,
group_concat(concat(user,'-',time) order by time limit 5)
from table
where id in(1,2,3,4)
group by 1
When I remove the "limit 5" from the group concat, the query works but spits out way too much information.
I'm open to structuring the query differently. Specific ID numbers will be supplied by the user of the query, and for each ID specified, I would like to list a fixed number of results. Let me know if there is a better way to achieve this.
I'm not sure of the exact result set you want, but check out this SO post:
How to hack MySQL GROUP_CONCAT to fetch a limited number of rows?
As another example, I tried out the query/solution provided in the link and came up with this:
SELECT user_id, SUBSTRING_INDEX(GROUP_CONCAT(DISTINCT date_of_entry),',',5) AS logged_dates FROM log GROUP BY user_id;
Which returns:
user_id | logged_dates
1 | "2014-09-29,2014-10-18,2014-10-05,2014-10-12,2014-10-19"
2 | "2014-09-12,2014-09-03,2014-09-23,2014-09-22,2014-10-13"
3 | "2014-09-10"
6 | "2014-09-29,2014-09-27,2014-09-26,2014-09-25"
8 | "2014-09-26,2014-09-30,2014-09-27"
9 | "2014-09-28"
13 | "2014-09-29"
22 | "2014-10-12"
The above query will return every user id that has logged something, and up to 5 dates that the user has logged. If you want more or fewer results from the GROUP_CONCAT, just change the number 5 in my query.
Following up, and merging my query with yours, I get:
SELECT user_id, SUBSTRING_INDEX(GROUP_CONCAT(date_of_entry ORDER BY date_of_entry ASC),',',3) AS logged_dates FROM log WHERE user_id IN(1,2,3,4) GROUP BY user_id
Which would return (notice that I changed the number of results returned from the group_concat):
user_id | logged_dates
1 | "2014-09-16,2014-09-17,2014-09-18"
2 | "2014-09-02,2014-09-03,2014-09-04"
3 | "2014-09-10"
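If trimming the concatenated string feels fragile, an alternative is to limit the rows per group first and concatenate afterwards. The sketch below is a different technique from the SUBSTRING_INDEX approach above, not a drop-in replacement: it picks the first N rows per user with a correlated COUNT(*) subquery and joins them in application code. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, hypothetical sample rows, and dates that are unique per user.

```python
import sqlite3
from collections import defaultdict

# In-memory SQLite database with hypothetical sample rows (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (user_id INTEGER, date_of_entry TEXT)")
conn.executemany("INSERT INTO log VALUES (?, ?)", [
    (1, "2014-09-16"), (1, "2014-09-17"), (1, "2014-09-18"), (1, "2014-09-19"),
    (2, "2014-09-02"), (2, "2014-09-03"),
    (3, "2014-09-10"),
])

limit_per_user = 3

# Keep a row only if fewer than N rows of the same user have an earlier date.
query = """
SELECT l.user_id, l.date_of_entry
FROM log AS l
WHERE (
    SELECT COUNT(*)
    FROM log AS earlier
    WHERE earlier.user_id = l.user_id
      AND earlier.date_of_entry < l.date_of_entry
) < ?
ORDER BY l.user_id, l.date_of_entry
"""

# Concatenate in application code, so the order within each group is explicit.
grouped = defaultdict(list)
for user_id, date_of_entry in conn.execute(query, (limit_per_user,)):
    grouped[user_id].append(date_of_entry)
logged_dates = {user_id: ",".join(dates) for user_id, dates in grouped.items()}

print(logged_dates)
```

The correlated subquery runs once per row, so on a large table it would probably need an index on (user_id, date_of_entry) to perform acceptably.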
Let's say I have a plant table:
id fruit
1 banana
2 apple
3 orange
I can do these
SELECT * FROM plant ORDER BY id;
SELECT * FROM plant ORDER BY fruit DESC;
which does the obvious thing.
But I was bitten by this. What does this do?
SELECT * FROM plant ORDER BY SUM(id);
SELECT * FROM plant ORDER BY COUNT(fruit);
SELECT * FROM plant ORDER BY COUNT(*);
SELECT * FROM plant ORDER BY SUM(1) DESC;
All these return just the first row (the one with id = 1).
What's happening under the hood?
What are the scenarios where an aggregate function comes in handy in ORDER BY?
Your results are clearer if you actually select the aggregate values instead of columns from the table:
SELECT SUM(id) FROM plant ORDER BY SUM(id)
This will return the sum of all ids. This is of course a useless example because the aggregation will always create only one row, hence no need for ordering. The reason you get a row with columns in your query is that MySQL picks one row, not at random but not deterministically either. It just so happens that it is the first row in the table in your case, but others may get another row depending on storage engine, primary keys and so on. Aggregation only in the ORDER BY clause is thus not very useful.
What you usually want to do is grouping by a certain field and then order the result set in some way:
SELECT fruit, COUNT(*)
FROM plant
GROUP BY fruit
ORDER BY COUNT(*)
Now that's a more interesting query! This will give you one row for each fruit together with the total count for that fruit. Try adding some more apples and the ordering will actually start making sense:
Complete table:
+----+--------+
| id | fruit |
+----+--------+
| 1 | banana |
| 2 | apple |
| 3 | orange |
| 4 | apple |
| 5 | apple |
| 6 | banana |
+----+--------+
The query above:
+--------+----------+
| fruit | COUNT(*) |
+--------+----------+
| orange | 1 |
| banana | 2 |
| apple | 3 |
+--------+----------+
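To try the grouped ordering yourself, here is a minimal runnable sketch using the six-row table above. Python's built-in sqlite3 module is used as a stand-in for MySQL (an assumption), and the second query shows that the aggregate can drive the ordering even when it is not selected.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plant (id INTEGER PRIMARY KEY, fruit TEXT)")
conn.executemany("INSERT INTO plant VALUES (?, ?)", [
    (1, "banana"), (2, "apple"), (3, "orange"),
    (4, "apple"), (5, "apple"), (6, "banana"),
])

# Most frequent fruit first: the aggregate orders the groups.
by_count = conn.execute(
    "SELECT fruit, COUNT(*) FROM plant "
    "GROUP BY fruit ORDER BY COUNT(*) DESC").fetchall()

# The aggregate does not have to appear in the select list to be used for ordering.
names_only = [r[0] for r in conn.execute(
    "SELECT fruit FROM plant GROUP BY fruit ORDER BY COUNT(*) DESC")]

print(by_count, names_only)
```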
All these queries will give you an error on any SQL platform that complies with SQL standards.
SELECT * FROM plant ORDER BY SUM(id);
SELECT * FROM plant ORDER BY COUNT(fruit);
SELECT * FROM plant ORDER BY COUNT(*);
SELECT * FROM plant ORDER BY SUM(1) DESC;
On PostgreSQL, for example, all those queries will raise the same error.
ERROR: column "plant.id" must appear in the GROUP BY clause or be
used in an aggregate function
That means you're using a domain aggregate function without using GROUP BY. SQL Server and Oracle return similar error messages.
MySQL's GROUP BY is known to be broken in several respects, at least as far as standard behavior is concerned. But the queries you posted were a new broken behavior to me, so +1 for that.
Instead of trying to understand what it's doing under the hood, you're probably better off learning to write standard GROUP BY queries. MySQL will process standard GROUP BY statements correctly, as far as I know.
Earlier versions of MySQL docs warned you about GROUP BY and hidden columns. (I don't have a reference, but this text is cited all over the place.)
Do not use this feature if the columns you omit from the GROUP BY part
are not constant in the group. The server is free to return any value
from the group, so the results are indeterminate unless all values are
the same.
More recent versions are a little different.
You can use this feature to get better performance by avoiding
unnecessary column sorting and grouping. However, this is useful
primarily when all values in each nonaggregated column not named in
the GROUP BY are the same for each group. The server is free to choose
any value from each group, so unless they are the same, the values
chosen are indeterminate.
Personally, I don't consider indeterminate a feature in SQL.
When you use an aggregate like that, the query gets an implicit group by where the entire result is a single group.
Using an aggregate in order by is only useful if you also have a group by, so that you can have more than one row in the result.
I have a table with multiple rows which have the same data. I used SELECT DISTINCT to get unique rows and it works fine. But when I use ORDER BY with SELECT DISTINCT, it gives me unsorted data.
Can anyone tell me how distinct works?
Based on what criteria it selects the row?
From your comment earlier, the query you are trying to run is
Select distinct id from table where id2 =12312 order by time desc.
As I expected, here is your problem. Your select column and order by column are different. Your output rows are ordered by time, but that order doesn't necessarily need to be preserved in the id column. Here is an example.
id | id2 | time
-------------------
1 | 12312 | 34
2 | 12312 | 12
3 | 12312 | 48
If you run
SELECT * FROM table WHERE id2=12312 ORDER BY time DESC
you will get the following result
id | id2 | time
-------------------
3 | 12312 | 48
1 | 12312 | 34
2 | 12312 | 12
Now if you select only the id column from this, you will get
id
--
3
1
2
This is why your results are not sorted.
When you specify SELECT DISTINCT it will give you all the rows, eliminating duplicates from the result set. By "duplicates" I mean rows where all fields have the same values. For example, say you have a table that looks like:
id | num
--------------
1 | 1
2 | 3
3 | 3
SELECT DISTINCT * would return all rows above, whereas SELECT DISTINCT num would return two rows:
num
-----
1
3
Note that which actual row (e.g. whether it's row 2 or row 3) it selects is irrelevant, as the result would be indistinguishable.
Finally, DISTINCT should not affect how ORDER BY works.
Reference: MySQL SELECT statement
The behaviour you describe happens when you ORDER BY an expression that is not present in the SELECT clause. The SQL standard does not allow such a query but MySQL is less strict and allows it.
Let's try an example:
SELECT DISTINCT column1, column2
FROM table1
WHERE ...
ORDER BY column3
Let's say the content of the table table1 is:
id | column1 | column2 | column3
----+---------+---------+---------
1 | A | B | 1
2 | A | B | 5
3 | X | Y | 3
Without the ORDER BY clause, the above query returns following two records (without ORDER BY the order is not guaranteed):
column1 | column2
---------+---------
A | B
X | Y
But with ORDER BY column3 the order is also not guaranteed.
The DISTINCT clause operates on the values of the expressions present in the SELECT clause. If row #1 is processed first then (A, B) is placed in the result set and it is associated with row #1. Then, when row #2 is processed, the values of the SELECT expressions produce the record (A, B) that is already in the result set. Because of DISTINCT it is dropped. Row #3 produces (X, Y) that is also put in the result set. Then, the ORDER BY column3 clause makes the records be sorted in the result set as (A, B), (X, Y).
But if row #2 is processed before row #1 then, following the same logic exposed in the previous paragraph, the records in the result set are sorted as (X, Y), (A, B).
There is no rule imposed on the database engine about the order in which it processes the rows when it runs a query. The database is free to process the rows in any order it considers better for performance.
Your query is invalid SQL and the fact that it can return different results using the same input data proves it.
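One way to make such a query deterministic is to say explicitly which column3 value should represent each (column1, column2) pair, by replacing DISTINCT with GROUP BY and ordering by an aggregate. The sketch below is only an illustration under assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL and the three sample rows from the example above.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (id INTEGER, column1 TEXT, column2 TEXT, column3 INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", [
    (1, "A", "B", 1),
    (2, "A", "B", 5),
    (3, "X", "Y", 3),
])

# Order each distinct pair by its smallest column3 value.
by_min = conn.execute(
    "SELECT column1, column2 FROM table1 "
    "GROUP BY column1, column2 ORDER BY MIN(column3)").fetchall()

# Order each distinct pair by its largest column3 value instead.
by_max = conn.execute(
    "SELECT column1, column2 FROM table1 "
    "GROUP BY column1, column2 ORDER BY MAX(column3)").fetchall()

print(by_min, by_max)
```

The two orderings differ because (A, B) has both the smallest and the largest column3 value, which is exactly the ambiguity that the DISTINCT version leaves to the engine.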