WHERE in Aggregate function - mysql

I have a quick question. I know that we cannot use WHERE clause in an aggregate function in MySQL. The table structure is as follows:
+----+------+----------+--------+
| ID | Name | Location | Active |
+----+------+----------+--------+
| 1 | Aaaa | India | 0 |
| 2 | Aaaa | USA | 0 |
| 3 | Aaaa | USA | 1 |
| 4 | Aaaa | India | 0 |
| 5 | Aaaa | UK | 0 |
| 6 | Aaaa | India | 1 |
| 7 | Aaaa | USA | 1 |
| 8 | Aaaa | USA | 0 |
| 9 | Aaaa | India | 0 |
| 10 | Aaaa | UK | 1 |
+----+------+----------+--------+
The query I have here is:
SELECT COUNT(*), `location`, `active` FROM `users` GROUP BY `location`;
The above query will give me the counts of the location. But, I need only the active users. So, I need a WHERE clause that does something like:
SELECT COUNT(*), `location`, `active` FROM `users` GROUP BY `location` WHERE `active`=1;
The above query is invalid. The valid query would be using HAVING. But, if I change the query to:
SELECT COUNT(*), `location`, `active` FROM `users` GROUP BY `location` HAVING `active`=1;
The counts are no different from the original query, which is:
SELECT COUNT(*), `location`, `active` FROM `users` GROUP BY `location`;
So, what am I supposed to do for getting the user counts of the location, who are active? Thanks in advance.

Put WHERE before GROUP BY. The WHERE clause then filters the rows first, and the count is computed only over the rows that match your criteria:
SELECT COUNT(*), `location`, `active`
FROM `users`
WHERE `active`=1
GROUP BY `location`
Beyond your specific question, you can also SUM a condition. In MySQL a comparison evaluates to 1 (true) or 0 (false), so summing it counts the rows that satisfy the expression while still grouping over all rows:
SELECT SUM(`active` = 1) AS active_count, `location`
FROM `users`
GROUP BY `location`
The SUM expression above is equivalent to SUM(CASE WHEN active = 1 THEN 1 ELSE 0 END).
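As a quick check of the two approaches above, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for MySQL here (an assumption, so the example runs without a server); both evaluate `active = 1` to 1 or 0.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, location TEXT, active INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, 'Aaaa', ?, ?)",
    [
        (1, "India", 0), (2, "USA", 0), (3, "USA", 1), (4, "India", 0),
        (5, "UK", 0), (6, "India", 1), (7, "USA", 1), (8, "USA", 0),
        (9, "India", 0), (10, "UK", 1),
    ],
)

# Approach 1: filter rows with WHERE, then group and count.
where_counts = conn.execute(
    "SELECT location, COUNT(*) FROM users "
    "WHERE active = 1 GROUP BY location ORDER BY location"
).fetchall()

# Approach 2: keep all rows and count conditionally inside the aggregate.
sum_counts = conn.execute(
    "SELECT location, SUM(active = 1), "
    "SUM(CASE WHEN active = 1 THEN 1 ELSE 0 END), COUNT(*) "
    "FROM users GROUP BY location ORDER BY location"
).fetchall()

print(where_counts)  # active users per location
print(sum_counts)    # (location, active via SUM, active via CASE, all users)
```

The second form is useful when the active count and the total count are needed in the same row; the WHERE form is simpler when only active users matter.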

You can put the WHERE before the GROUP BY if you want to filter rows before aggregating. If you want to do some calculation among rows already aggregated, you have to use a CASE expression inside the aggregate. I don't think that's what you're doing here, so just put the WHERE clause in the right place.
The query looks like this, with active removed because it's now redundant (will always be 1):
SELECT COUNT(*), location
FROM users
WHERE active = 1
GROUP BY location
It's important here to be reminded of the logical order of operations of an SQL query:
FROM: starting table, view, derived table
JOIN: adding additional tables, views, etc.
WHERE: filtering the rows from the initial table and joined tables
GROUP BY: aggregating the rows that have been filtered by WHERE
HAVING: filtering the aggregated rows output by GROUP BY
SELECT: pick columns and compute expressions
ORDER BY: sort the resulting rows, with selected columns and computed expressions available for ordering
LIMIT: control which and how many rows go back to the client
If you keep this in mind, it'll be easier to understand when to do certain types of filtering and what the rows that you're filtering look like.
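The logical order listed above can be seen in a small, hedged sketch. It uses Python's sqlite3 module with an in-memory database as a stand-in for MySQL (an assumption), loaded with the sample rows from the question. The same grouped query is run with and without a HAVING clause: WHERE removes inactive rows before grouping, while HAVING removes whole groups afterwards.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, location TEXT, active INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, 'Aaaa', ?, ?)",
    [
        (1, "India", 0), (2, "USA", 0), (3, "USA", 1), (4, "India", 0),
        (5, "UK", 0), (6, "India", 1), (7, "USA", 1), (8, "USA", 0),
        (9, "India", 0), (10, "UK", 1),
    ],
)

# WHERE filters individual rows, then GROUP BY aggregates what is left.
all_active = conn.execute(
    """
    SELECT location, COUNT(*) AS n
    FROM users
    WHERE active = 1          -- row filter, applied before grouping
    GROUP BY location
    ORDER BY location
    """
).fetchall()

# HAVING filters the groups produced by GROUP BY, so it can test aggregates.
busy_active = conn.execute(
    """
    SELECT location, COUNT(*) AS n
    FROM users
    WHERE active = 1          -- row filter, applied before grouping
    GROUP BY location
    HAVING COUNT(*) > 1       -- group filter, applied after grouping
    ORDER BY location
    """
).fetchall()

print(all_active)
print(busy_active)
```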

You can use WHERE in aggregate queries; however, it must come after FROM and before GROUP BY.

Related

How to select both sum value of all rows and values in some specific rows?

I have a record table and its comment table, like:
| commentId | relatedRecordId | isRead |
|-----------+-----------------+--------|
| 1 | 1 | TRUE |
| 2 | 1 | FALSE |
| 3 | 1 | FALSE |
Now I want to select newCommentCount and allCommentCount as a server response to the browser. Is there any way to select these two fields in one SQL?
I've tried this:
SELECT `isRead`, count(*) AS cnt FROM comment WHERE relatedRecordId=1 GROUP BY `isRead`
| isRead | cnt |
| FALSE | 2 |
| TRUE | 1 |
But, I have to use a special data structure to map it and sum the cnt fields in two rows to get allCommentCount by using an upper-layer programming language. I want to know if I could get the following format of data by SQL only and in one step:
| newCommentCount | allCommentCount |
|-----------------+-----------------|
| 2 | 3 |
I don't even know how to describe the question, so I found nothing on Google or Stack Overflow (maybe because of my poor English).
Use conditional aggregation:
SELECT SUM(NOT isRead) AS newCommentCount, COUNT(*) AS allCommentCount
FROM comment
WHERE relatedRecordId = 1;
If I understand correctly, you want both the count of new comments and the count of all comments, so you can do it like this:
SELECT SUM(CASE WHEN isRead = FALSE THEN 1 ELSE 0 END) AS newComment,
COUNT(*) AS AllComments FROM comment WHERE relatedRecordId = 1
You can also create a stored procedure for it.
To place two result sets side by side, you can simply use a subquery as an expression in the SELECT clause, as long as the number of rows in the result sets match:
select (select count(*) from c_table where isread=false and relatedRecordId=1 ) as newCommentCount,
count(*) as allCommentCount
from c_table where relatedRecordId=1;
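The answers above can be compared side by side with a short sketch. It assumes an in-memory SQLite database (via Python's sqlite3 module) as a stand-in for MySQL, stores isRead as 0/1, and uses the table name `comment` from the question; the sample rows are the three shown there.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comment (commentId INTEGER, relatedRecordId INTEGER, isRead INTEGER)"
)
# isRead: 1 = TRUE, 0 = FALSE
conn.executemany("INSERT INTO comment VALUES (?, ?, ?)", [(1, 1, 1), (2, 1, 0), (3, 1, 0)])

# Conditional aggregation: NOT isRead is 1 for unread rows, 0 for read rows.
conditional = conn.execute(
    "SELECT SUM(NOT isRead) AS newCommentCount, COUNT(*) AS allCommentCount "
    "FROM comment WHERE relatedRecordId = 1"
).fetchone()

# The same idea spelled out with CASE.
case_based = conn.execute(
    "SELECT SUM(CASE WHEN isRead = 0 THEN 1 ELSE 0 END), COUNT(*) "
    "FROM comment WHERE relatedRecordId = 1"
).fetchone()

# Scalar subquery in the SELECT list for the unread count.
subquery = conn.execute(
    "SELECT (SELECT COUNT(*) FROM comment WHERE isRead = 0 AND relatedRecordId = 1), "
    "COUNT(*) FROM comment WHERE relatedRecordId = 1"
).fetchone()

print(conditional, case_based, subquery)
```

All three return a single row. The conditional aggregation forms scan the table once, whereas the subquery form runs a second lookup.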

How do SELECT and GROUP BY execution work? Column present in GROUP BY but not in SELECT

According to the MySQL documentation I read, SELECT executes before GROUP BY. I have a table named Views (shown below) and the following query:
select distinct(viewer_id) as id
from Views v
group by viewer_id,view_date
having count(distinct(article_id))>1;
If SELECT is performed before GROUP BY, as the documentation says, how can this query group by view_date when only viewer_id is selected? This has really confused me about the exact order in which GROUP BY and SELECT work.
+------------+-----------+-----------+------------+
| article_id | author_id | viewer_id | view_date |
+------------+-----------+-----------+------------+
| 1 | 3 | 5 | 2019-08-01 |
| 3 | 4 | 5 | 2019-08-01 |
| 1 | 3 | 6 | 2019-08-02 |
| 2 | 7 | 7 | 2019-08-01 |
| 2 | 7 | 6 | 2019-08-02 |
| 4 | 7 | 1 | 2019-07-22 |
| 3 | 4 | 4 | 2019-07-21 |
| 3 | 4 | 4 | 2019-07-21 |
+------------+-----------+-----------+------------+
There is no order to the evaluation. A SQL query describes the result set.
It is true that MySQL has a rather naive optimizer, so you can often see what the resulting query will be. But you should not think of the clauses as being evaluated in a particular order.
You might be confusing evaluation of the query with scoping rules. These affect how a particular identifier is determined.
You should not think in terms of order of execution, but rather in terms of correctness of the statement. The select clause must be consistent with the group by clause: that is, any column that is present in the select clause and that is not part of an aggregate function must belong to the group by clause.
It is, on the other hand, perfectly valid to have columns in the group by clause that do not belong to the select clause, although the results might be a bit difficult to understand, because some information is missing about how the groups were built.
If we remove the distinct in the select clause, your query would read:
select viewer_id as id
from views v
group by viewer_id, view_date
having count(distinct(article_id)) > 1;
This brings the viewer_ids for every view_date when they have more than one distinct article_id. A given viewer_id may appear more than once in the resultset, if they satisfied the condition on more than one date.
Then, distinct filters out duplicate viewer_ids: as a result, you get the list of viewers that viewed more than one article on any single date.
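To make the grouping visible, here is a minimal sketch that runs the query on the question's sample rows. It assumes an in-memory SQLite database (via Python's sqlite3 module) as a stand-in for MySQL. The first query also selects view_date and the distinct-article count so each group can be inspected; the second is the original query with an ORDER BY added only to make the output order predictable.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE views (article_id INTEGER, author_id INTEGER, "
    "viewer_id INTEGER, view_date TEXT)"
)
conn.executemany(
    "INSERT INTO views VALUES (?, ?, ?, ?)",
    [
        (1, 3, 5, "2019-08-01"), (3, 4, 5, "2019-08-01"),
        (1, 3, 6, "2019-08-02"), (2, 7, 7, "2019-08-01"),
        (2, 7, 6, "2019-08-02"), (4, 7, 1, "2019-07-22"),
        (3, 4, 4, "2019-07-21"), (3, 4, 4, "2019-07-21"),
    ],
)

# Show every (viewer_id, view_date) group with its distinct article count.
groups = conn.execute(
    "SELECT viewer_id, view_date, COUNT(DISTINCT article_id) "
    "FROM views GROUP BY viewer_id, view_date "
    "ORDER BY viewer_id, view_date"
).fetchall()

# The original query: view_date is grouped on but never selected.
viewers = conn.execute(
    "SELECT DISTINCT viewer_id AS id "
    "FROM views GROUP BY viewer_id, view_date "
    "HAVING COUNT(DISTINCT article_id) > 1 "
    "ORDER BY id"
).fetchall()

print(groups)
print(viewers)
```

Viewer 4 has two rows on the same date but only one distinct article, so that group does not pass the HAVING condition.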

Prioritize By Values Within Group

In MySQL, say I have the following table (called workers):
| id | specialty | status | name |
| :- | :-------- | :--------- | :--- |
| 1 | Bricks | Unemployed | Joe |
| 2 | Bricks | Employed | Eric |
| 3 | Bricks | Contracted | Bob |
| 4 | Tiles | Employed | Dylan |
| 5 | Tiles | Contracted | James |
In my query, say I want to find who is a prospective person for a new job. Thus, I would want to first find who is Unemployed, if no one is Unemployed, then who is only Contracted, and if no one is Contracted then at least who is Employed.
This would be GROUP BY specialty. The only methods I could figure out are either complex sub-queries or sets of UNIONs (or both). I also tried GROUP_CONCAT however this didn't work (or I didn't do it right). Googling this has not yielded any results.
Another idea is to assign a value to each category, and then do a group-wise max/min sub-query. I piloted this and it works, however seems quite messy and definitely not normalized:
SELECT
`id`,
`name`,
`status`,
-- I haven't been able to figure out how to get rid of MIN from the actual select
-- statement except by wrapping this in another sub-query, which I'm not keen on
MIN(`priority`) AS `priority`
FROM workers
INNER JOIN (
SELECT 'Unemployed' AS `status`, 0 AS `priority` FROM dual UNION
SELECT 'Contracted' AS `status`, 1 AS `priority` FROM dual UNION
SELECT 'Employed' AS `status`, 2 AS `priority` FROM dual
) AS priorities USING (`status`)
GROUP BY `specialty`;
I am looking for a more standard, efficient, normalized or versatile method of doing this.
Update:
An additional method could be to use a CASE expression in the SELECT clause of the statement. This would apply if I were to normalize the status column through a foreign-key relationship or other related table:
New table called statuses
| id | status |
| :- | :------------- |
| 1 | Employed |
| 2 | Contracted |
| 3 | Unemployed |
| 4 | Not contracted |
Diffs: 'Not Contracted' is a new status and my workers table now stores the foreign key to the new statuses table.
Then my SQL would be:
SELECT
`id`,
`name`,
statuses.status,
MIN(`priority`) AS `priority`
FROM workers
INNER JOIN (
SELECT
`id`,
`status`,
CASE
-- currently uses text in `status`,
-- could also explicitly use `id`
WHEN `status` IN ('Unemployed', 'Not Contracted') THEN 0
WHEN `status` = 'Contracted' THEN 1
WHEN `status` = 'Employed' THEN 2
ELSE 3
END AS `priority`
FROM statuses
) AS statuses ON workers.status = statuses.id
GROUP BY `specialty`;
Note: You might think - why not put the priority in the statuses table? The reason why I am not doing that is because the priority changes depending on the data needed / the purpose of the report being generated.
Potentially this is a cleaner solution (for the times that the related data to prioritize against is in another table). Again, I am looking for a more standard, efficient, normalized or versatile method of doing this. Also, if there is more of a way this could be configurable to user input / variables.
The difficulty here mainly arises because you don't have an ordinal column which ranks the various statuses in some order. Absent that, we can introduce one using a CASE expression, similar to what your second query is trying to do:
SELECT w1.*
FROM workers w1
INNER JOIN
(
SELECT
specialty,
MIN(CASE status WHEN 'Unemployed' THEN 1
WHEN 'Contracted' THEN 2
ELSE 3 END) AS status_rnk
FROM workers
GROUP BY specialty
) w2
ON w1.specialty = w2.specialty AND
w2.status_rnk = CASE w1.status WHEN 'Unemployed' THEN 1
WHEN 'Contracted' THEN 2
ELSE 3 END;
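As a sanity check, the query above can be run against the sample workers table. This is a sketch under the assumption that an in-memory SQLite database (via Python's sqlite3 module) behaves like MySQL for this query; the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE workers (id INTEGER, specialty TEXT, status TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO workers VALUES (?, ?, ?, ?)",
    [
        (1, "Bricks", "Unemployed", "Joe"),
        (2, "Bricks", "Employed", "Eric"),
        (3, "Bricks", "Contracted", "Bob"),
        (4, "Tiles", "Employed", "Dylan"),
        (5, "Tiles", "Contracted", "James"),
    ],
)

# The same ranking expression is used in the subquery and in the join condition.
rank = "CASE {col} WHEN 'Unemployed' THEN 1 WHEN 'Contracted' THEN 2 ELSE 3 END"

best_per_specialty = conn.execute(
    f"""
    SELECT w1.id, w1.specialty, w1.status, w1.name
    FROM workers w1
    INNER JOIN (
        -- best (lowest) rank available within each specialty
        SELECT specialty, MIN({rank.format(col='status')}) AS status_rnk
        FROM workers
        GROUP BY specialty
    ) w2
      ON w1.specialty = w2.specialty
     AND w2.status_rnk = {rank.format(col='w1.status')}
    ORDER BY w1.id
    """
).fetchall()

print(best_per_specialty)
```

If several workers in a specialty share the best status, this join returns all of them, which may or may not be what a report needs.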

Joining 2 row of data into 1 row of data

I have a table which looks like this
|Application No | Status | Amount | Type |
==========================================
|90909090 | Null | 3,000 | Null |
|90909090 | Forfeit| Null | A |
What I want to achieve is to combine the values together and end with a result like
|Application No | Status | Amount | Type |
==========================================
|90909090 | Forfeit| 3,000 | A |
I am new to SQL Query and have no idea how to do this
Thanks in advance
No need to join, use max() aggregate function and group by:
select applicationno, max(status), max(amount), max(type)
from yourtable
group by applicationno
However, if you have several non-null values for an application number in a field, then you may have to define a more granular rule than a simple aggregation via max.
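The approach above works because aggregate functions such as MAX skip NULL values. A minimal sketch, assuming an in-memory SQLite database (via Python's sqlite3 module) as a stand-in, a hypothetical table name `applications`, and the amount stored as an integer rather than the formatted text shown in the question:

```python
import sqlite3

# In-memory SQLite database used as a stand-in (assumption).
conn = sqlite3.connect(":memory:")
# Table and column names are hypothetical; the question does not give them.
conn.execute(
    "CREATE TABLE applications (application_no INTEGER, status TEXT, "
    "amount INTEGER, type TEXT)"
)
conn.executemany(
    "INSERT INTO applications VALUES (?, ?, ?, ?)",
    [
        (90909090, None, 3000, None),       # row carrying only the amount
        (90909090, "Forfeit", None, "A"),   # row carrying status and type
    ],
)

# MAX ignores NULLs, so each column collapses to its single non-null value.
merged = conn.execute(
    "SELECT application_no, MAX(status), MAX(amount), MAX(type) "
    "FROM applications GROUP BY application_no"
).fetchall()

print(merged)
```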

mysql returns wrong results with random duplicate values

I need to return the best 5 scores in each category from a table. So far I have tried the query below, following an example from this site: selecting top n records per group
query:
select
subject_name,substring_index(substring_index
(group_concat(exams_scores.admission_no order by exams_scores.score desc),',',value),',',-1) as names,
substring_index(substring_index(group_concat(score order by score desc),',',value),',',-1)
as orderedscore
from exams_scores,students,subjects,tinyint_asc
where tinyint_asc.value >=1 and tinyint_asc.value <=5 and exam_id=2
and exams_scores.admission_no=students.admission_no and students.form_id=1 and
exams_scores.subject_code=subjects.subject_code group by exams_scores.subject_code,value;
I get the top n as I need, but my problem is that it returns duplicates at random, and I don't know where they are coming from.
As you can see, English and Math have duplicates which should not be there:
+------------------+-------+--------------+
| subject_name | names | orderedscore |
+------------------+-------+--------------+
| English | 1500 | 100 |
| English | 1500 | 100 |
| English | 2491 | 100 |
| English | 1501 | 99 |
| English | 1111 | 99 |
| Mathematics | 1004 | 100 |
| Mathematics | 1004 | 100 |
| Mathematics | 2722 | 99 |
| Mathematics | 2734 | 99 |
| Mathematics | 2712 | 99 |
+------------------+-------+--------------+
I have checked the table and no duplicates exist.
To confirm there are no duplicates in the table:
select * from exams_scores
having(exam_id=2) and (subject_code=121) and (admission_no=1004);
result :
+------+--------------+---------+--------------+-------+
| id | admission_no | exam_id | subject_code | score |
+------+--------------+---------+--------------+-------+
| 4919 | 1004 | 2 | 121 | 100 |
+------+--------------+---------+--------------+-------+
1 row in set (0.00 sec)
The result is the same for English.
If I run the query about 5 times, I sometimes end up with another field having duplicate values.
Can anyone tell me why my query is behaving this way? I tried adding distinct inside
group_concat(distinct(exams_scores.admission_no))
but that didn't work.
You're grouping by exams_scores.subject_code, value. If you add them to your selected columns (...as orderedscore, exams_scores.subject_code, value from...), you should see that all rows are distinct with respect to these two columns you grouped by. Which is the correct semantics of GROUP BY.
Edit, to clarify:
First, the SQL server removes some rows according to your WHERE clause.
Afterwards, it groups the remaining rows according to your GROUP BY clause.
Finally, it selects the columns you specified, either by directly returning a column's value or by performing a GROUP_CONCAT on some of the columns and returning their accumulated value.
If you select columns not included in the GROUP BY clause, the returned results for these columns are arbitrary. The SQL server reduces all rows that are equal with respect to the GROUP BY columns to one single row, and for the remaining columns the results are pretty much undefined (hence the "randomness" you're experiencing): what should the server choose as a value for such a column? It can only pick one arbitrarily from all the reduced rows.
In fact, some SQL servers won't perform such a query and return an SQL error, since the result for those columns would be undefined, which is something you don't want to have in general. With these servers (I believe MSSQL is one of them), you more or less can only have columns in your SELECT clause which are part of your GROUP BY clause.
Edit 2: Which, finally, means that you have to refine your GROUP BY clause to obtain the grouping that you want.