How is SELECT executed before GROUP BY in this query? - mysql

I was looking at the order in which SQL is executed, and I found that it is:
FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY
But in the query below, the alias _index is used in the GROUP BY clause. How is this possible?
SELECT COUNT(ab.id) AS count, COUNT(ab.id)/365.24 AS average,
IF((SUBSTR(ab.begin, 1, 7) = '2014-08'), '2014-08-18 00:00:00.0 CEST',
IF((SUBSTR(ab.begin, 1, 7) = '2014-09'), '2014-09-18 00:00:00.0 CEST',
IF((SUBSTR(ab.begin, 1, 7) = '2014-10'), '2014-10-18 00:00:00.0 CEST',
IF((SUBSTR(ab.begin, 1, 7) = '2014-11'), '2014-11-18 00:00:00.0 CET',
'0')))) AS _index
FROM active_begin AS ab
INNER JOIN asources AS a ON a.id = ab.asource AND a.unit IN (4, 3, 1)
WHERE (1408226400000 <= ab.begin_time AND ab.begin_time < 1417388400000)
GROUP BY _index
P.S. Refer to this for the order: http://www.bennadel.com/blog/70-sql-query-order-of-operations.htm
Thanks in advance.

I think this (the ability to use a column alias defined in the SELECT clause in the GROUP BY) is probably a non-standard extension that some databases allow (but not all).
You are supposed to repeat the exact definition or wrap everything in a sub-select.
You are lucky if your database lets you get away with it.
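A minimal sketch of the two portable alternatives, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption: SQLite, like MySQL, would also accept the alias directly, so this only shows that both portable forms produce the same groups). The table rows are hypothetical, the column begin is renamed begin_date, and a CASE expression replaces MySQL's nested IF().

```python
import sqlite3

# Hypothetical table loosely modeled on active_begin; "begin" is renamed
# begin_date to avoid any clash with SQLite's BEGIN keyword.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE active_begin (id INTEGER, begin_date TEXT)")
conn.executemany(
    "INSERT INTO active_begin VALUES (?, ?)",
    [(1, "2014-08-20"), (2, "2014-08-25"), (3, "2014-09-01"), (4, "2015-01-02")],
)

# CASE stands in for MySQL's nested IF(); the same text is reused in both queries.
BUCKET = """CASE WHEN substr(begin_date, 1, 7) = '2014-08' THEN '2014-08'
                 WHEN substr(begin_date, 1, 7) = '2014-09' THEN '2014-09'
                 ELSE '0' END"""

# Option 1: repeat the exact expression in GROUP BY instead of the alias.
repeated = conn.execute(
    f"SELECT COUNT(id) AS cnt, {BUCKET} AS _index "
    f"FROM active_begin GROUP BY {BUCKET} ORDER BY 2"
).fetchall()

# Option 2: compute the alias in a sub-select; in the outer query _index is
# an ordinary column of the derived table t, so grouping by it is standard SQL.
wrapped = conn.execute(
    "SELECT COUNT(id) AS cnt, _index "
    f"FROM (SELECT id, {BUCKET} AS _index FROM active_begin) AS t "
    "GROUP BY _index ORDER BY _index"
).fetchall()

print(repeated)  # [(1, '0'), (2, '2014-08'), (1, '2014-09')]
```

Repeating the expression keeps a single query level; the sub-select avoids writing the long expression twice.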

There is no "order" to how SQL is executed. The SQL optimizer can choose to execute the operations in any order it decides is best for the query.
There is, however, an order in which the clauses are interpreted. Table aliases and columns are defined in the FROM clause, so it is interpreted first, and the subsequent clauses are interpreted after it. In general, this explains why you cannot use a column alias defined in the SELECT list in a WHERE clause: the WHERE clause is interpreted first.
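To make the "interpretation order" idea concrete, here is a toy model in plain Python (a sketch only, not how any real optimizer executes a query). It applies the clauses in their logical order over a list of dicts; the function name run_query and the sample rows are hypothetical. The WHERE callback only ever sees raw source rows, so an output alias like cnt simply does not exist yet at that stage.

```python
from itertools import groupby


def run_query(rows, where, group_key, having, select, order_key):
    """Apply clauses in SQL's logical order: FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY."""
    # FROM: `rows` stands for the already-joined source rows.
    # WHERE: filters individual source rows; no SELECT aliases exist yet.
    filtered = [r for r in rows if where(r)]
    # GROUP BY: partition the surviving rows by the grouping key.
    filtered.sort(key=group_key)
    groups = [list(g) for _, g in groupby(filtered, key=group_key)]
    # HAVING: filters whole groups, so it can use aggregates such as len(g).
    groups = [g for g in groups if having(g)]
    # SELECT: only now are output expressions (and their aliases) computed.
    projected = [select(g) for g in groups]
    # ORDER BY: runs last, so it can refer to SELECT aliases.
    return sorted(projected, key=order_key)


rows = [{"owner": "a", "action": 1}, {"owner": "a", "action": 2},
        {"owner": "b", "action": 3}, {"owner": "c", "action": 4},
        {"owner": "c", "action": 5}, {"owner": "c", "action": 6}]

result = run_query(
    rows,
    where=lambda r: r["action"] != 6,         # r["cnt"] here would raise KeyError
    group_key=lambda r: r["owner"],
    having=lambda g: len(g) % 2 == 0,         # keep groups with an even row count
    select=lambda g: {"owner": g[0]["owner"], "cnt": len(g)},
    order_key=lambda out: out["owner"],       # the alias is usable here
)
print(result)  # [{'owner': 'a', 'cnt': 2}, {'owner': 'c', 'cnt': 2}]
```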

Related

Alternatives to using "having" clause for alias fields

I have this somewhat complex SQL query that works fine without the final WHERE clause. I want to filter some records using the column unreviewed_records, which is an alias.
The problem is that I get an error saying unreviewed_records cannot be found. I found some information saying that aliases are not permitted in WHERE clauses, and I'm not sure what the best way to fix this is. I considered using a computed column, but I'm not sure how that works yet, and I'm hoping there's an easier fix to the query.
I also found that switching to a HAVING clause works for aliases, but I'll only resort to this if there's no better alternative, to avoid the performance hit.
Any pointers would be helpful :)
select
r_alias.serv_id, r_alias.node_id,
SUM(g_alias.total_records)- SUM(r_alias.reviewed_records) AS unreviewed_records,
SUM(r_alias.reviewed_records) AS reviewed_records,
SUM(g_alias.total_records) AS total_records
FROM (
SELECT prs.serv_id,
prs.node_id,
SUM(prs.reviewed_records) AS reviewed_records
FROM p_rev_server prs
WHERE
prs.area_id = 3
AND prs.subId = 3
AND prs.sId = 12
GROUP BY prs.serv_id, prs.node_id, prs.domain_name
) r_alias
INNER JOIN (SELECT
serv_id,
node_id,
SUM(pgs.total_records) AS total_records
FROM p_gen_serve pgs
WHERE pgs.area_id = 3
AND pgs.subId = 3
AND pgs.sId = 12
AND pgs.total_records > 0
GROUP BY pgs.serv_id, pgs.node_id, pgs.domain_name
) g_alias
ON g_alias.serv_id = r_alias.serv_id AND g_alias.node_id = r_alias.node_id
LEFT JOIN p_cust_columns cust_cols
ON cust_cols.node_id = r_alias.node_id AND cust_cols.serv_id = r_alias.serv_id
where (((NOT (unreviewed_records IS NULL)) AND (unreviewed_records = 5)))
group by r_alias.serv_id, r_alias.node_id
order by g_alias.node_id ASC
limit 25
The reason aliases are not allowed in a WHERE clause is that the expressions in the SELECT list are not evaluated until after the rows are filtered by the WHERE clause. So it's a chicken-and-egg problem.
The easiest and most common alternative is a derived table:
SELECT a, b, c
FROM (
SELECT a, b, a+b AS c
FROM mytable
WHERE b = 1234
) AS t
WHERE c = 42;
This example shows that you can put some filtering conditions inside the derived table subquery, so you can at least reduce the result set partially, before the result of the subquery is turned into a temporary table.
Then in the outer query, you can reference a column that was derived from an expression in the select-list of the subquery. In this example, it's the c column.
The CTE approach is basically the same: it creates a temporary table to store the result of the inner query (the CTE), and then you can apply conditions to that in the outer query.
WITH t AS (
SELECT a, b, a+b AS c
FROM mytable
WHERE b = 1234
)
SELECT a, b, c
FROM t
WHERE c = 42;
The CTE solution is not better than the derived-table approach, unless you need to reference the CTE multiple times in the outer query, i.e. doing a self-join.
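A quick way to convince yourself that the derived table and the CTE are equivalent is to run both queries from the answer against the same data. This sketch uses Python's sqlite3 module as a stand-in engine (SQLite may flatten the subquery instead of materializing it, so it checks the results, not the execution strategy); the rows in mytable are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (a INTEGER, b INTEGER)")
# Hypothetical rows: (40, 2) has c = 42 but is removed by the inner b = 1234 filter.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(40, 2), (30, 1234), (-1192, 1234), (5, 1234)],
)

# Derived table: the alias c becomes a real column of t in the outer query.
derived = conn.execute("""
    SELECT a, b, c
    FROM (
        SELECT a, b, a+b AS c
        FROM mytable
        WHERE b = 1234
    ) AS t
    WHERE c = 42
""").fetchall()

# CTE: the same inner query, named up front.
cte = conn.execute("""
    WITH t AS (
        SELECT a, b, a+b AS c
        FROM mytable
        WHERE b = 1234
    )
    SELECT a, b, c
    FROM t
    WHERE c = 42
""").fetchall()

print(derived, cte)  # [(-1192, 1234, 42)] [(-1192, 1234, 42)]
```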
Yeah, you are kind of SOL: WHERE can't know what an alias will be. So, frankly, a CTE (common table expression) is probably your best bet here. It should work, though not all RDBMSs support them (MySQL, for example, only from version 8.0).

Select Result only if unique (sort of)

I am building a simple SQL query, though I can't get my head around this one.
This is the layout for the table:
Challenge: I would like to grab all rows from this table, but only for an id_order that has no entry with a threshold of 20 (in this case, only ID 18 should be shown, determined dynamically).
I was thinking of going with:
SELECT * FROM `cancelorders_history` WHERE threshold != 20 GROUP BY `id_order`
However, this throws the following error (and I'm not sure the query matches the logic I'm looking for, as explained above):
Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'exampletable.id' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
I can't use:
SELECT * FROM `cancelorders_history` WHERE threshold != 20
because it gives me IDs 13, 18 and 19.
What would be the preferred method to go around this?
Hmmm . . . I think you can use not exists:
select coh.*
from cancelorders_history coh
where not exists (select 1
from cancelorders_history coh2
where coh2.id_order = coh.id_order and coh2.threshold >= 20
);
Or, use a window function:
select coh.*
from (select coh.*,
max(threshold) over (partition by id_order) as max_threshold
from cancelorders_history coh
) coh
where max_threshold < 20;
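Both queries above can be checked side by side with Python's sqlite3 module standing in for MySQL. The original table layout is not shown in the question, so the rows below are hypothetical, chosen so that threshold != 20 matches IDs 13, 18 and 19 while only id_order 200 never reaches 20. The window-function version assumes SQLite 3.25 or later (MySQL needs 8.0 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cancelorders_history "
    "(id INTEGER, id_order INTEGER, threshold INTEGER, notification_sent INTEGER)"
)
# Hypothetical rows: orders 100 and 300 each have an entry with threshold 20.
conn.executemany(
    "INSERT INTO cancelorders_history VALUES (?, ?, ?, ?)",
    [(12, 100, 20, 1), (13, 100, 10, 0), (18, 200, 10, 0),
     (19, 300, 15, 0), (20, 300, 20, 1)],
)

# NOT EXISTS: keep a row only if no row of the same order reaches 20.
not_exists = [row[0] for row in conn.execute("""
    select coh.id
    from cancelorders_history coh
    where not exists (select 1
                      from cancelorders_history coh2
                      where coh2.id_order = coh.id_order and coh2.threshold >= 20)
    order by coh.id
""")]

# Window function: compute each order's maximum threshold, then filter on it.
window = [row[0] for row in conn.execute("""
    select coh.id
    from (select coh.*,
                 max(threshold) over (partition by id_order) as max_threshold
          from cancelorders_history coh) coh
    where max_threshold < 20
    order by coh.id
""")]

print(not_exists, window)  # [18] [18]
```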
You get an error on the GROUP BY because, with ONLY_FULL_GROUP_BY enabled, every non-aggregated column you SELECT must appear in the GROUP BY clause (or be functionally dependent on it). The code should be something like:
SELECT * FROM cancelorders_history WHERE threshold != 20 GROUP BY threshold, notification_sent, id_order, id;
However, that query still does not meet your requirement of returning only one result. It doesn't look like you want GROUP BY at all. Perhaps ORDER BY, but you'll have to explain the logic you really want to use.

Can't fetch a field with GROUP BY clause

I'm trying to create a simple query that will find the person with the highest average mark and display some basic information about them. It retrieves the proper record, but I can't make MySQL display the students.classId field. The error I'm getting in LibreOffice Base is:
Not in aggregate function or group by clause.
Query with error:
SELECT CONCAT(`students`.`surname`, CONCAT(' ', `students`.`name`)) AS `Student`,
AVG(CAST(`marks`.`mark` AS DECIMAL (10, 2))) AS `Average`,
`students`.`classId`
FROM `students`, `marks`, `subjects`
WHERE `marks`.`subjectId` = `subjects`.`subjectId`
AND `students`.`studentId` = `marks`.`markId`
GROUP BY `students`.`surname`, `students`.`name`
ORDER BY `Average` DESC LIMIT 1;
Query without error:
SELECT CONCAT(`students`.`surname`, CONCAT(' ', `students`.`name`)) AS `Student`,
AVG(CAST(`marks`.`mark` AS DECIMAL (10, 2))) AS `Average`
FROM `students`, `marks`, `subjects`
WHERE `marks`.`subjectId` = `subjects`.`subjectId`
AND `students`.`studentId` = `marks`.`markId`
GROUP BY `students`.`surname`, `students`.`name`
ORDER BY `Average` DESC LIMIT 1;
I'm not really experienced with SQL, but I think posting the table definitions isn't necessary in this case. If I am wrong, please leave a note in the comments and I'll update the question as soon as possible.
Please note that this is not homework.
The problematic item is this:
`students`.`classId`
Since the GROUP BY query produces a single row for one or more rows of the joined tables, that single row may correspond to more than one students.classId value.
That is what SQL is asking you to fix: it wants to know which of the potentially many students.classId values you want it to return. The two choices are adding an aggregate function, say
MIN(`students`.`classId`) AS StudentClassId
or using students.classId in the GROUP BY clause:
GROUP BY `students`.`surname`, `students`.`name`, `students`.`classId`
Note that if you go with the latter choice, the aggregation will be per student/class pair, not per student.
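The difference between the two choices shows up when one name maps to more than one class. This sketch uses Python's sqlite3 module as a stand-in, with a hypothetical schema: marks are joined on an assumed marks.studentId column, and || replaces CONCAT(). Two different students share the name "Smith Anna" but sit in different classes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (studentId INTEGER, surname TEXT, name TEXT, classId INTEGER);
    CREATE TABLE marks (markId INTEGER, studentId INTEGER, mark INTEGER);
    -- Hypothetical data: two students named 'Smith Anna' in classes 1 and 2.
    INSERT INTO students VALUES (1, 'Smith', 'Anna', 1), (2, 'Smith', 'Anna', 2), (3, 'Lee', 'Tom', 1);
    INSERT INTO marks VALUES (1, 1, 5), (2, 1, 3), (3, 2, 1), (4, 3, 2);
""")

# Choice 1: aggregate classId, so there is still one row per name.
per_student = conn.execute("""
    SELECT s.surname || ' ' || s.name AS Student,
           AVG(m.mark) AS Average,
           MIN(s.classId) AS StudentClassId
    FROM students s JOIN marks m ON m.studentId = s.studentId
    GROUP BY s.surname, s.name
    ORDER BY Student
""").fetchall()

# Choice 2: group by classId too, so each name/class pair gets its own row.
per_pair = conn.execute("""
    SELECT s.surname || ' ' || s.name AS Student,
           AVG(m.mark) AS Average,
           s.classId
    FROM students s JOIN marks m ON m.studentId = s.studentId
    GROUP BY s.surname, s.name, s.classId
    ORDER BY Student, s.classId
""").fetchall()

print(per_student)  # [('Lee Tom', 2.0, 1), ('Smith Anna', 3.0, 1)]
print(per_pair)     # [('Lee Tom', 2.0, 1), ('Smith Anna', 4.0, 1), ('Smith Anna', 1.0, 2)]
```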

Finding even values in a table MySQL

In MySQL, I have a table with a column full of positive integers, and I want to filter out all the odd integers. There seems to be nothing about this in the MySQL documentation. I tried the following query:
select kapsule.owner_name,
kapsule.owner_domain,
count(xform_action)
from kapsule, rec_xform
where rec_xform.g_conf_id=kapsule.g_conf_id
and (count(xform_action))%2=0
group by kapsule.owner_name;
I want to keep only those values where count(xform_action) is even. The table looks like this.
To filter the result set after GROUP BY, you need to use a HAVING clause.
The WHERE clause is used to filter source rows before GROUP BY occurs.
Try
SELECT k.owner_name,
k.owner_domain,
COUNT(x.xform_action) cnt -- < you probably meant to use SUM() instead of COUNT() here
FROM kapsule k JOIN rec_xform x -- < use JOIN notation for clarity
ON x.g_conf_id = k.g_conf_id
GROUP BY k.owner_name
HAVING cnt % 2 = 0
You probably meant to use the SUM() aggregate (which adds up a column's values over all rows in a group) instead of COUNT() (which returns the number of rows in a group, or the number of non-NULL values when given a column).
Here is an SQLFiddle demo (for both SUM() and COUNT()).
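A small check of the WHERE-versus-HAVING point, with Python's sqlite3 module standing in for MySQL and hypothetical rows in kapsule and rec_xform. An aggregate in WHERE is rejected because groups do not exist yet at that stage, while HAVING (here with the alias, which both MySQL and SQLite accept) filters the finished groups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE kapsule (g_conf_id INTEGER, owner_name TEXT, owner_domain TEXT);
    CREATE TABLE rec_xform (g_conf_id INTEGER, xform_action TEXT);
    -- Hypothetical data: alice has 2 actions (even), bob has 1 (odd).
    INSERT INTO kapsule VALUES (1, 'alice', 'a.example'), (2, 'bob', 'b.example');
    INSERT INTO rec_xform VALUES (1, 'x'), (1, 'y'), (2, 'z');
""")

# An aggregate in WHERE fails when the statement is prepared.
try:
    conn.execute("""
        SELECT k.owner_name
        FROM kapsule k JOIN rec_xform x ON x.g_conf_id = k.g_conf_id
        WHERE COUNT(x.xform_action) % 2 = 0
        GROUP BY k.owner_name
    """)
    where_rejected = False
except sqlite3.OperationalError:
    where_rejected = True

# HAVING runs after grouping, so it can test the aggregate via its alias.
even = conn.execute("""
    SELECT k.owner_name, k.owner_domain, COUNT(x.xform_action) AS cnt
    FROM kapsule k JOIN rec_xform x ON x.g_conf_id = k.g_conf_id
    GROUP BY k.owner_name, k.owner_domain
    HAVING cnt % 2 = 0
""").fetchall()

print(where_rejected, even)  # True [('alice', 'a.example', 2)]
```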
To filter on aggregate functions like COUNT(*) when using GROUP BY, you need to use a HAVING clause:
select kapsule.owner_name, kapsule.owner_domain,
count(xform_action) from kapsule, rec_xform
where rec_xform.g_conf_id=kapsule.g_conf_id
group by kapsule.owner_name, kapsule.owner_domain
HAVING (count(xform_action))%2=0
Or you could use an alias (i.e. AS), like this:
select kapsule.owner_name, kapsule.owner_domain,
count(xform_action) count_form from kapsule, rec_xform
where rec_xform.g_conf_id=kapsule.g_conf_id
group by kapsule.owner_name, kapsule.owner_domain
HAVING count_form%2=0
You could also use explicit JOIN syntax, which is clearer than the old comma-style join (it is not inherently more efficient, since the optimizer generally treats both forms the same). And by the way,
if you use GROUP BY, the non-aggregated fields before the aggregate function should also be in the GROUP BY, like this:
select A.owner_name, A.owner_domain,
count(B.xform_action) count_form from kapsule A
INNER JOIN rec_xform B
ON A.g_conf_id=B.g_conf_id
GROUP BY A.owner_name, A.owner_domain
HAVING count_form%2=0
See examples here

mysql SELECT JOIN and GROUP BY

Here is my query:
SELECT v2.mac, v2.userag_hash, v2.area, count(*), count(distinct v2.video_id)
FROM video v2 JOIN (
SELECT distinct v.mac, v.userag_hash
from video v
WHERE v.date_pl >= '2012-01-30 00:00' AND
v.date_pl <= '2012-02-05 23:55'
ORDER BY rand() LIMIT 50
) table2
ON v2.mac = table2.mac AND
v2.userag_hash = table2.userag_hash AND
v2.date_pl >= '2012-01-30 00:00' AND
v2.date_pl <= '2012-02-05 23:55'
GROUP BY v2.mac, v2.userag_hash
I have one table, "video", in the database; it contains data for several thousand users. Now I want to randomly select 50 users and calculate statistics based on their rows (each user is identified by a unique combination of ). This query's result is:
usermac1, userag_hash1, area1, 10, 5
usermac2, userag_hash2, area2, 20, 8
...
But if I don't use "GROUP BY" at the end of the query, it returns only one row:
usermac, userag_hash, areax, 1500, 700 (don't know what this row stands for)
I am wondering whether "1500, 700" are the sums of the last two columns of the previous results, i.e. 1500 = 10+20+... and 700 = 5+8+...
Since your query mixes non-aggregated columns with an aggregate function (COUNT, used in two columns) and still runs without any GROUP BY, you must be using the non-standards-compliant MySQL.
SELECT v2.mac, v2.userag_hash, v2.area, count(*), count(distinct v2.video_id)
...
Whatever your data is, MySQL will return one row when you use aggregate functions without GROUP BY, which is:
<undefined value>, <undefined value>, <undefined value>, count of all rows, count of distinct non-NULL values of v2.video_id.
So I think you have 1500 rows and 700 distinct non-NULL values of v2.video_id. COUNT(DISTINCT ...) skips NULLs, so to check whether NULLs are hiding some values, try:
count(distinct IFNULL(v2.video_id,'nullvaluehere'))
which converts NULLs to a non-NULL placeholder so that they are counted (as one additional distinct value).
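The NULL behavior described above can be verified directly. This sketch uses Python's sqlite3 module as a stand-in (SQLite also provides IFNULL), with a hypothetical single-column video table containing two NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE video (video_id INTEGER)")
# Hypothetical rows: values 1 and 2 plus two NULLs.
conn.executemany("INSERT INTO video VALUES (?)", [(1,), (1,), (2,), (None,), (None,)])

counts = conn.execute("""
    SELECT COUNT(*),                                           -- every row
           COUNT(video_id),                                    -- non-NULL rows only
           COUNT(DISTINCT video_id),                           -- distinct non-NULL values
           COUNT(DISTINCT IFNULL(video_id, 'nullvaluehere'))   -- NULL counted once
    FROM video
""").fetchone()

print(counts)  # (5, 3, 2, 3)
```

The last column is one higher than the third because all NULLs collapse into the single placeholder value.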
The "undefined values" could be the first row, last row, first where something is non null, first in an index, first in some cache, etc. There is no definition of what should happen when you write an invalid query.
Every SQL database I'm aware of other than MySQL will give you an error message and not even run the query (MySQL 5.7.5 and later do the same by default, because ONLY_FULL_GROUP_BY is enabled in sql_mode). For the query to be valid, it must have all non-aggregated columns in the GROUP BY, e.g. mac and userag_hash must both be in the GROUP BY.