GROUP BY: If you want, you can take the rows that remain after WHERE and put them in groups or buckets, where each group contains the same value for the GROUP BY expression (and all the other rows are put in a list for that group). In Java, you would get something like: Map<String, List<Row>>. If you do specify a GROUP BY clause, then your actual rows contain only the group columns, no longer the remaining columns, which are now in that list. Those columns in the list are only visible to aggregate functions that can operate upon that list.
The paragraph above was taken from: https://blog.jooq.org/a-beginners-guide-to-the-true-order-of-sql-operations/
We have a table named student that has the following fields:
Student_id
Student_name
Student_marks
Student_branch
I write a query as:
select sum(student_marks) from student
group by student_branch;
So according to the paragraph, I am grouping the rows by student_branch.
So what group is created?
Is there a group that contains all of the student_branch values?
Also, I couldn't understand the meaning of this sentence: '(and all the other rows are put in a list for that group)'.
Can anyone please explain how GROUP BY actually works, and then how the aggregate functions work on those groups?
In your example you would have a separate group for each distinct value in the column student_branch. So if you have student_branch = A for 5 students and student_branch = B for 3 students, then you would get 2 groups:
A with 5 records
B with 3 records
When you use aggregate functions, they will operate on all records within one group. So SUM(student_marks) will add all student marks of the students in group A separately, and also in group B separately.
In your sample query you will get 2 aggregated result rows, with only the sums of the marks. The result would be more meaningful, if you include the student_branch in the SELECT clause, like:
SELECT student_branch, SUM(student_marks) AS sum_of_marks FROM student
GROUP BY student_branch;
Then the result would look like:
student_branch sum_of_marks
A 15 (random number, = sum of marks for A students)
B 8 (random number, = sum of marks for B students)
The GROUP BY will only group the records which remain after filtering the data with the WHERE clause. In your example case there is no WHERE clause, so it will group the records of the whole table by student branch.
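To make the grouping concrete, here is a minimal sketch that runs the query above against an in-memory SQLite database using Python's standard sqlite3 module. The eight student rows are invented sample data (5 students in branch A, 3 in branch B), chosen so that the sums match the illustrative 15 and 8 above; your real table and database engine will differ.

```python
import sqlite3

# In-memory database; every row below is made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    "student_id INTEGER, student_name TEXT, "
    "student_marks INTEGER, student_branch TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [
        (1, "s1", 1, "A"), (2, "s2", 2, "A"), (3, "s3", 3, "A"),
        (4, "s4", 4, "A"), (5, "s5", 5, "A"),
        (6, "s6", 2, "B"), (7, "s7", 3, "B"), (8, "s8", 3, "B"),
    ],
)

# One result row per distinct student_branch; SUM runs inside each group.
branch_sums = conn.execute(
    "SELECT student_branch, SUM(student_marks) AS sum_of_marks "
    "FROM student GROUP BY student_branch ORDER BY student_branch"
).fetchall()

for branch, total in branch_sums:
    print(branch, total)
```

The ORDER BY is only there to make the output order predictable; GROUP BY on its own does not guarantee any ordering.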
Why do I not get the same results when running the two queries? If I run the second one, I get the course with the smallest number of credits, and when I run the first one, I get the courses ordered by courseid.
select min(credits), title, courseid
from course
group by title, courseid
select min(credits)
from course
An aggregation query is any query that has a group by or an aggregation function in the select.
An aggregation query returns one row per group, where a "group" is defined as the unique combination of values of the keys in the group by clause. If there is no group by clause, then all rows are taken to be a single group and one row is returned.
So, your first query returns one row for each combination of title and courseid in the course table. That row contains the minimum value of credits for that combination. If the course table has only one row per courseid, then the results are very similar to the contents of the table.
The second query returns one row overall, with the minimum number of credits of all rows.
If you want to get the one row from the table with the minimum number of credits, then you don't want an aggregation query. Instead, you can use:
select c.*
from course c
order by c.credits
limit 1;
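The difference between the three queries can be checked with a small sketch. It uses Python's sqlite3 module and a hypothetical three-row course table (the titles and credit values are invented), assuming one row per courseid as discussed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (courseid INTEGER, title TEXT, credits INTEGER)")
# Invented sample rows: one row per courseid.
conn.executemany(
    "INSERT INTO course VALUES (?, ?, ?)",
    [(1, "Algebra", 5), (2, "Biology", 3), (3, "Chemistry", 4)],
)

# One row per (title, courseid) combination: here, one row per course.
per_group = conn.execute(
    "SELECT MIN(credits), title, courseid FROM course "
    "GROUP BY title, courseid ORDER BY courseid"
).fetchall()

# No GROUP BY: the whole table is a single group, so exactly one row.
overall = conn.execute("SELECT MIN(credits) FROM course").fetchall()

# Not an aggregation at all: the full row with the fewest credits.
cheapest = conn.execute(
    "SELECT c.* FROM course c ORDER BY c.credits LIMIT 1"
).fetchall()

print(len(per_group), overall, cheapest)
```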
When you use a group by, you are using a sort of "filter". In the first query you group by title, and then all rows with the same title are grouped by courseid; in the second you only select the minimum value of credits without filtering.
Take a look at a group by doc maybe with some graphical examples like this:
https://www.geeksforgeeks.org/sql-group-by/
I have a table structure
id group name points
1 1 a 10
2 1 b 9
3 2 c 7
and so on..
I am writing a query which gives me an array of names and the avg of all the points for the selected rows where group matches the value.
For group_list = [1] I want a result like this: [name: ['a','b'], median:[9.5]]
I have tried like this
$group_list = [1];
createQueryBuilder()
->select('x.name as name, x.AVG(points) as median')
->from('myTable', 'x')
->where('x.group IN(:groupList)')
->setParameter('groupList', $group_list)
->getQuery()
->getResult();
Need some help with this
You are combining 2 distinct requirements into a single sql statement and this causes the problem.
The average of points is a single value per group or per all records, while the names are a list. You can combine the 2 into a single query by repeating the averages across the names, however, it just generates an overhead.
I would simply run a query to get the list of usernames and a separate one to get the average points (either grouped by groups or across all groups, this is not clear from the question).
This solution is so simple, that I do not think I need to provide any code.
Alternatively, you can use MySQL's group_concat() function to get the list of names per group into a single value as a comma-separated list (you can use any other separator character in place of the comma). In this case it is more worthwhile to combine the 2 in a single query:
select group_concat(`name`) as names, avg(`points`) as median
from mytable
where `group` in (...)
If you want names from more than one group, then add the group field to the select and group by lists:
select `group`, group_concat(`name`) as names, avg(`points`) as median
from mytable
where `group` in (...)
group by `group`
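As a rough, runnable illustration of the grouped variant, the sketch below uses SQLite through Python's sqlite3 module rather than MySQL. SQLite also provides group_concat() with a comma as the default separator, but that is an assumption about this stand-in engine, not a statement about Doctrine or your MySQL setup. The three rows are the ones from the question, and the result column is named average here because AVG() computes a mean, not a median.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "group" is a reserved word, so it is quoted as an identifier.
conn.execute(
    'CREATE TABLE mytable (id INTEGER, "group" INTEGER, name TEXT, points INTEGER)'
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [(1, 1, "a", 10), (2, 1, "b", 9), (3, 2, "c", 7)],
)

rows = conn.execute(
    'SELECT "group", group_concat(name) AS names, AVG(points) AS average '
    'FROM mytable WHERE "group" IN (1, 2) '
    'GROUP BY "group" ORDER BY "group"'
).fetchall()

# Split the concatenated names back into a list; the order inside
# group_concat is not guaranteed, so sort for a stable result.
result = {grp: (sorted(names.split(",")), average) for grp, names, average in rows}
print(result)
```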
You should add a group by
->groupBy('x.`group`')
Let's say I have a table with a column of ages.
Here is the list of ages
1
2
3
1
1
3
I want the SQL to count how many of age 1s, how many of 2s and 3s.
The code:
Select count(age) as age1 where age = '1';
Select count(age) as age2 where age = '2';
Select count(age) as age3 where age = '3';
Should work but would there be a way to just display it all using only 1 line of code?
This is an instance where the GROUP BY clause really shines:
SELECT age, COUNT(age)
FROM table_name
GROUP BY age
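A quick way to try this is the sketch below, which loads the six ages from the question into an in-memory SQLite table via Python's sqlite3 module. The table name table_name is the placeholder from the query above, and SQLite is only a stand-in for whatever database you actually use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (age INTEGER)")
# The ages listed in the question.
conn.executemany(
    "INSERT INTO table_name VALUES (?)",
    [(1,), (2,), (3,), (1,), (1,), (3,)],
)

# One output row per distinct age, with the number of rows in that group.
age_counts = conn.execute(
    "SELECT age, COUNT(age) FROM table_name GROUP BY age ORDER BY age"
).fetchall()

for age, count in age_counts:
    print(age, count)
```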
Just an additional tip:
You shouldn't use single quotes here in your query:
WHERE age = '1';
This is because age is an INT column, and numeric values are written without single quotes. MySQL will implicitly convert the quoted value to the correct data type for you, and it's a negligible amount of overhead here. But imagine if you were doing a JOIN of two tables with millions of rows; then the overhead introduced would be something to consider.
Try this if the count is limited to three ages. Using aggregate functions without grouping results in a single row. You can use SUM() with a condition, which evaluates to a boolean (1 or 0), so the sum gives you the count of rows matching your criteria:
Select SUM(age = '1') as age1,
SUM(age = '2') as age2,
SUM(age = '3') as age3
from table
SELECT SUM(CASE WHEN age = 1 THEN 1 ELSE 0 END) AS age1,
SUM(CASE WHEN age = 2 THEN 1 ELSE 0 END) AS age2,
SUM(CASE WHEN age = 3 THEN 1 ELSE 0 END) AS age3
FROM YourTable
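The conditional-aggregation form can be checked the same way. This sketch again assumes SQLite via Python's sqlite3 module and the six sample ages from the question; YourTable is the placeholder name from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (age INTEGER)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?)",
    [(1,), (2,), (3,), (1,), (1,), (3,)],
)

# No GROUP BY: the whole table is one group, so a single row comes back,
# with one conditional count per column.
pivot = conn.execute(
    "SELECT SUM(CASE WHEN age = 1 THEN 1 ELSE 0 END) AS age1, "
    "       SUM(CASE WHEN age = 2 THEN 1 ELSE 0 END) AS age2, "
    "       SUM(CASE WHEN age = 3 THEN 1 ELSE 0 END) AS age3 "
    "FROM YourTable"
).fetchall()

print(pivot)
```

The trade-off compared with GROUP BY is that the set of ages has to be known in advance, since each one needs its own column.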
If your query should return only one column (age in this case), you can use Count + group by:
SELECT age, Count(1) as qty
FROM [yourTable]
GROUP BY age
Remember that any additional non-aggregated column you select must also be included in your group by clause.
Select age as Age_Group, count(age) as Total_count from table1 group by age;
select age, count(age) from SomeTable group by age
http://sqlfiddle.com/#!2/b40da/2
The group by clause works like this:
When using aggregate functions, like the count function, without a group by clause, the function will apply to the entire dataset determined by the from and where clauses. A count will, for instance, count the number of rows in the result set, and a sum over a specific column will sum that column across all the rows in the result set.
What the group by clause allows us to do is to divide the result set determined by the from and where clauses into partitions, so that the aggregate functions no longer apply to the result set as a whole, but rather within each partition of the result set.
When you specify a column to group by, what you are saying is something like "for each distinct value of column x in the result set, create a partition containing any row in the result set with this particular value in column x". Then, instead of yielding one result covering the entire resultset, aggregate functions will yield one result for each distinct value of column x in the result set.
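The partitioning idea can be sketched without a database at all. The snippet below is only a mental model in plain Python, not how any particular engine implements GROUP BY: it builds one list per distinct value and then applies the aggregate (here a count) to each list separately. The ages are the sample values from the question.

```python
from collections import defaultdict

# Sample ages from the question, treated as single-column rows.
ages = [1, 2, 3, 1, 1, 3]

# "For each distinct value, create a partition containing every row
# that has this value."
partitions = defaultdict(list)
for age in ages:
    partitions[age].append(age)

# The aggregate is evaluated once per partition, not once overall.
counts = {key: len(rows) for key, rows in sorted(partitions.items())}

print(counts)
```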
With your example input of:
1
2
3
1
1
3
let's analyze the above query. As always, we should look at the from clause and the where clause first. The from clause tells us that we are selecting from SomeTable and only this, and the lack of a where clause tells us that we are selecting from the full contents of SomeTable.
Next, we'll look at the group by clause. It's present, and it groups by the age column, which is the only column in our example. The presence of the group by clause changes our dataset completely! Instead of selecting from the entire row set of SomeTable, we are now selecting from a set of partitions, one for each distinct value of the age-column in our original result set (which was every row in SomeTable).
At last, we'll look at the select-clause. Now, since we are selecting from partitions and not regular rows, the select-clause has fewer options for what it can contain, actually it only has 2: The column that it is grouped by, or an aggregate function.
Now, in our example we only have one column, but consider that we had another column, like here:
http://sqlfiddle.com/#!2/d5479/2
Now, imagine that in our data set we have two rows, both with age='1', but with different values in the other column. If we were to include this other column in a query that is grouped by the age-column (which we now know will return one row for each partition over the age-column), which value should be presented in the result? It makes no sense to include any column other than the one you grouped by. (I'll leave multiple columns in the group by clause out of this; in my experience one usually just wants one.)
But back to our select-clause: knowing our dataset has the distinct values {1, 2, 3} in the age-column, we should expect to get 3 rows in our result set. The first thing to be selected is the age-column, which will yield the values [1, 2, 3] in the three rows. Next in the select-list is an aggregate function, count(age), which we now know will count the number of rows in each partition. So, for the row in the result where age='1', it will count the number of rows with age='1', for the row where age='2' it will count the number of rows where age='2', and so on.
The result would look something like this:
age count(age)
1 3
2 1
3 2
(of course you are free to override the name of the second column in the result, with the as-operator..)
And that concludes today's lesson.
Please consider the following query:
SELECT artist.id, COUNT(DISTINCT artist$styles.v_id)
FROM artist
LEFT JOIN artist$styles ON artist$styles.p_id = artist.id
This is the result I get:
id count
1 4
The questions are:
1. How come it's only selecting one row from the artist table, when there are 4 rows in it and there are no WHERE, HAVING, LIMIT or GROUP BY clauses applied to the query?
2. There are only three records in artist$styles having p_id of value 1, why is it counting 4?
3. Why if I add a GROUP BY clause to it I get the correct results?
SELECT artist.id, COUNT(DISTINCT artist$styles.v_id)
FROM artist
LEFT JOIN artist$styles ON artist$styles.p_id = artist.id
GROUP BY artist.id
----
id count
1 3
2 1
3 3
4 1
This all just doesn't make sense to me. Could this be a bug of MySQL? I'm running Community 5.5.25a
As stated in the manual page on aggregate functions (of which COUNT() is one):
If you use a group function in a statement containing no GROUP BY clause, it is equivalent to grouping on all rows.
As stated in the manual page on GROUP BY with hidden columns:
The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.
In other words, the server has chosen one (indeterminate) value for the id column, which happens in this case to be the value 1, whilst it has properly aggregated and counted the result for the COUNT() function across all rows.
Adding GROUP BY gives the correct results because you are then grouping on the correct column, rather than on all rows.
It's not a bug; this behaviour is documented and by design.
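The two behaviours can be reproduced with a small sketch. It uses SQLite through Python's sqlite3 module, a hypothetical table name artist_styles in place of artist$styles, and invented style rows chosen so that the per-artist counts match the grouped output in the question. The non-aggregated artist.id column is left out of the ungrouped query on purpose, since its value would be indeterminate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artist (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE artist_styles (p_id INTEGER, v_id INTEGER)")
conn.executemany("INSERT INTO artist VALUES (?)", [(1,), (2,), (3,), (4,)])
# Invented data: artist 1 has styles 1-3, artist 3 has styles 2-4.
conn.executemany(
    "INSERT INTO artist_styles VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 1), (3, 2), (3, 3), (3, 4), (4, 4)],
)

join = "FROM artist LEFT JOIN artist_styles s ON s.p_id = artist.id"

# No GROUP BY: all joined rows form one group, so one row comes back,
# counting the distinct v_id values across every artist.
ungrouped = conn.execute(f"SELECT COUNT(DISTINCT s.v_id) {join}").fetchall()

# GROUP BY artist.id: one group, and therefore one count, per artist.
grouped = conn.execute(
    f"SELECT artist.id, COUNT(DISTINCT s.v_id) {join} "
    "GROUP BY artist.id ORDER BY artist.id"
).fetchall()

print(ungrouped, grouped)
```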
It is a possible bug in MySQL. All non-aggregate columns should be included in the GROUP BY clause. MySQL does not enforce this, and the result is unpredictable and hard to debug. As a rule, always include all non-aggregate columns in the GROUP BY clause. This is how most other RDBMSs work.
1. The COUNT function returns a single-row result if you are not using a GROUP BY clause, and that's why it's returning one row.
2. In your output
id count
1 4
4 is the count across the whole joined result (the number of distinct v_id values), not the count for id 1. It is displayed next to 1 because only one row is produced.
3. When you use GROUP BY, a group is created for each value of that column, and that's why you get that output.
And finally, it's not a bug. MySQL provides proper documentation for this, which you can read on the MySQL site.
I want to get the distinct values of a particular column; however, duplicates are not handled properly if more than 3 columns are selected.
The query is:
SELECT DISTINCT
ShoppingSessionId, userid
FROM
dbo.tbl_ShoppingCart
GROUP BY
ShoppingSessionId, userid
HAVING
userid = 7
This query produces correct result, but if we add another column then result is wrong.
Please help me: I want ShoppingSessionId to stay distinct even when I use all the columns from the table, including with the where clause.
How can I do that?
The DISTINCT keyword applies to the entire row, never to a column.
Presently DISTINCT is not needed at all, because your script already makes sure that ShoppingSessionId is distinct: by specifying the column in GROUP BY and filtering on the other grouping column (userid).
When you add a third column to GROUP BY and it results in duplicated ShoppingSessionId values, it means that some ShoppingSessionId values are associated with many different values of the added column.
If you want ShoppingSessionId to remain distinct after including that third column, you should decide which values of the added column should be left in the output and which should be discarded. This is called aggregating. You could apply the MAX() function to that column, or MIN() or any other suitable aggregate function. Note that the column should not be included in GROUP BY in this case.
Here's an illustration of what I'm talking about:
SELECT
ShoppingSessionId,
userid,
MAX(YourThirdColumn) AS YourThirdColumn
FROM dbo.tbl_ShoppingCart
GROUP BY
ShoppingSessionId,
userid
HAVING userid = 7
There's one more note on your query. The HAVING clause is typically used for filtering on aggregated columns. If your filter does not involve aggregated columns, you'll be better off using the WHERE clause instead:
SELECT
ShoppingSessionId,
userid,
MAX(YourThirdColumn) AS YourThirdColumn
FROM dbo.tbl_ShoppingCart
WHERE userid = 7
GROUP BY
ShoppingSessionId,
userid
Although both queries would produce identical results, their efficiency would be different, because the first query would have to pull all rows, group/aggregate them, then discard all rows except userid = 7, but the second one would discard rows first and only then group/aggregate the remaining, which is much more efficient.
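That the HAVING and WHERE versions return the same rows can be verified with a short sketch. It runs against SQLite via Python's sqlite3 module instead of SQL Server, so the dbo. schema prefix is dropped, and the four cart rows are invented. It only checks the results, not the efficiency difference, which depends on the engine's optimizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_ShoppingCart ("
    "ShoppingSessionId INTEGER, userid INTEGER, YourThirdColumn INTEGER)"
)
# Invented rows: session 100 appears twice with different third-column values.
conn.executemany(
    "INSERT INTO tbl_ShoppingCart VALUES (?, ?, ?)",
    [(100, 7, 1), (100, 7, 5), (101, 7, 2), (200, 8, 9)],
)

select = (
    "SELECT ShoppingSessionId, userid, MAX(YourThirdColumn) "
    "FROM tbl_ShoppingCart "
)

# Filter after grouping.
with_having = conn.execute(
    select + "GROUP BY ShoppingSessionId, userid "
    "HAVING userid = 7 ORDER BY ShoppingSessionId"
).fetchall()

# Filter before grouping.
with_where = conn.execute(
    select + "WHERE userid = 7 "
    "GROUP BY ShoppingSessionId, userid ORDER BY ShoppingSessionId"
).fetchall()

print(with_having, with_where)
```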
You could go even further and exclude the userid column from GROUP BY and pull its value with an aggregate function:
SELECT
ShoppingSessionId,
MAX(userid) AS userid,
MAX(YourThirdColumn) AS YourThirdColumn
FROM dbo.tbl_ShoppingCart
WHERE userid = 7
GROUP BY
ShoppingSessionId
Since all userid values in your output are supposed to contain 7 (because that's in your filter), you can just pick a maximum value per every ShoppingSession, knowing that it'll always be 7.