How can I use aliases on GROUP BY clause? - mysql

Here is my query:
SELECT SUM(score) score,
type,
context,
post_id,
e.table_code,
comment_id,
MIN(seen) seen,
MAX(date_time) d_t,
(CASE WHEN FROM_UNIXTIME(MAX(date_time)) >= CURDATE() THEN 'today'
WHEN FROM_UNIXTIME(MAX(date_time)) >= DATE_SUB(CURDATE(), INTERVAL 1 DAY) THEN 'yesterday'
WHEN FROM_UNIXTIME(MAX(date_time)) >= DATE_SUB(CURDATE(), INTERVAL 7 DAY) THEN 'in last week'
ELSE 'in last month'
END) as range_day
FROM `events` e
WHERE e.id IN ($ids)
GROUP BY type, post_id, e.table_code, comment_id, range_day
ORDER BY seen, MAX(date_time) desc, MAX(e.id) desc
It throws this error:
#1056 - Can't group on 'range_day'
If I remove range_day from the GROUP BY clause, it works. But I also need to group the result by range_day. How can I do that?

Not sure what you are trying to do. But you can wrap your query into a subquery without range_day in the GROUP BY clause. Then use your GROUP BY clause in the outer query as it is.
SELECT SUM(score) score,
type,
context, -- WARNING! Not listed in group by clause
post_id,
table_code,
comment_id,
MIN(seen) seen,
MAX(d_t) d_t,
range_day
FROM (
SELECT SUM(score) score,
MAX(id) as id,
type,
context, -- WARNING! Not listed in group by clause
post_id,
e.table_code,
comment_id,
MIN(seen) seen,
MAX(date_time) d_t,
(CASE WHEN FROM_UNIXTIME(MAX(date_time)) >= CURDATE() THEN 'today'
WHEN FROM_UNIXTIME(MAX(date_time)) >= DATE_SUB(CURDATE(), INTERVAL 1 DAY) THEN 'yesterday'
WHEN FROM_UNIXTIME(MAX(date_time)) >= DATE_SUB(CURDATE(), INTERVAL 7 DAY) THEN 'in last week'
ELSE 'in last month'
END) as range_day
FROM `events` e
WHERE e.id IN ($ids)
GROUP BY type, post_id, e.table_code, comment_id
) sub
GROUP BY type, post_id, table_code, comment_id, range_day
ORDER BY seen, MAX(d_t) desc, MAX(id) desc
However, you select context without aggregation, and it is not listed in the GROUP BY clause. Thus you will get some "random" value from the group. With ONLY_FULL_GROUP_BY enabled the query will fail.
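A minimal, runnable sketch of the same two-level idea, using Python's sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained). The table layout, the days_ago column, and the sample rows are hypothetical. The point is that a label built from an aggregate becomes an ordinary column once it leaves the subquery, so the outer query can group on it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified events table: days_ago replaces the UNIX timestamp.
conn.execute("CREATE TABLE events (post_id INTEGER, days_ago INTEGER, score INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, 0, 5), (1, 3, 2), (2, 1, 4), (3, 0, 1), (3, 9, 7)],
)

# Inner query: one row per post, with a label derived from an aggregate.
# Outer query: the label is now a plain column, so GROUP BY accepts it.
rows = conn.execute(
    """
    SELECT range_day, COUNT(*) AS posts, SUM(post_score) AS total_score
    FROM (
        SELECT post_id,
               SUM(score) AS post_score,
               CASE WHEN MIN(days_ago) = 0 THEN 'today'
                    WHEN MIN(days_ago) = 1 THEN 'yesterday'
                    ELSE 'older'
               END AS range_day
        FROM events
        GROUP BY post_id
    ) AS sub
    GROUP BY range_day
    ORDER BY range_day
    """
).fetchall()
print(rows)
```

Here MIN(days_ago) plays the role of MAX(date_time) in the original query: both pick the most recent event of each group before the label is assigned.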

Your definition of range_day doesn't exactly make sense. Why does it use MAX()? Max of what?
A natural way to make the query work is to remove the MAX() from the definition:
SELECT SUM(score) score, type, context, post_id, e.table_code, comment_id,
MIN(seen) as seen, MAX(date_time) as d_t,
(CASE WHEN FROM_UNIXTIME(date_time) >= CURDATE() THEN 'today'
WHEN FROM_UNIXTIME(date_time) >= DATE_SUB(CURDATE(), INTERVAL 1 DAY) THEN 'yesterday'
WHEN FROM_UNIXTIME(date_time) >= DATE_SUB(CURDATE(), INTERVAL 7 DAY) THEN 'in last week'
ELSE 'in last month'
END) as range_day
FROM `events` e
WHERE e.id IN ($ids)
GROUP BY type, post_id, e.table_code, comment_id, range_day
ORDER BY seen, MAX(date_time) desc, MAX(e.id) desc;
More comments:
The IN ($ids) probably doesn't do what you expect. If $ids is passed as a single bound parameter, it is treated as a single value, so this is equivalent to e.id = $ids.
If this doesn't do what you want, then you might want MAX() at some other level of aggregation. That would require an additional subquery.
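To illustrate the point about IN ($ids), here is a small sketch of binding a list as separate values, one placeholder per id. It uses Python's sqlite3 module rather than the PHP/MySQL setup from the question (an assumption for the sake of a self-contained example), and the table contents and the ids list are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, score INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 30), (4, 40)],
)

ids = [1, 3, 4]  # hypothetical list of ids to look up

# One "?" per id, so the driver binds three separate values
# instead of a single string such as '1,3,4'.
placeholders = ", ".join("?" for _ in ids)
sql = f"SELECT id, score FROM events WHERE id IN ({placeholders}) ORDER BY id"
rows = conn.execute(sql, ids).fetchall()
print(rows)
```

Only the placeholder list is built with string formatting; the id values themselves are still passed as bound parameters.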

First, your problem...
You cannot GROUP BY an aggregate. Notice the MAX() inside range_day. Your GROUP BY should include all non-aggregate items in the SELECT. context is missing and may lead to an error in subsequent releases.
Then another problem...
MIN(seen) seen,
MAX(d_t) d_t,
...
ORDER BY seen,
MAX(d_t) desc
Notice an inconsistency? An ambiguity? Is seen (in the ORDER BY) supposed to be the original seen, or the alias seen, meaning MIN(seen)? Ditto for d_t?
Always try to avoid having an alias spelled the same as a column name when you need to refer to it later. In WHERE, it must be the column name; in ORDER BY, MySQL looks for the alias first; in GROUP BY and HAVING, it looks at the table columns first.
So, I think, this is wrong: MAX(d_t) desc in the ORDER BY.

Related

MYSQL max() and group by error:only_full_group_by

I have a question about a MySQL query that has been logging errors since updating to MySQL 5.7.
The error is "only_full_group_by", which is well discussed on Stack Overflow.
Many answers say not to disable this option but to improve the SQL query instead.
The query that I'm using is returning the minimum and maximum values of a counter per hour.
SELECT MAX( counter ) AS max,
MIN( counter ) AS min,
DATE_FORMAT(date_time, '%H:%i') AS dt
FROM table1
WHERE date_time >= NOW() - INTERVAL 1 DAY
GROUP BY YEAR(date_time), MONTH(date_time), DAY(date_time), HOUR(date_time)
As I understand from the error message, I'm missing one of the items from the SELECT clause in the GROUP BY clause. But however I re-sort, remove, or add items, I'm not getting the result I got before the upgrade to MySQL 5.7.
I tried wrapping the main query in a subquery to improve the SQL query, but somehow I can't recreate the results.
What is it I'm missing?
MySQL isn't able to determine the functional dependence ... between the expressions in the GROUP BY clause, and the expressions in the SELECT list.
The non-aggregate expression in the SELECT list, DATE_FORMAT(date_time, '%H:%i'), includes a minutes component. The GROUP BY clause is going to collapse the rows into groups by just hour. So the value of the minutes is indeterminate... we know it's going to come from some row in the group, but there's no guarantee which one.
(The question's reference to ONLY_FULL_GROUP_BY seems to indicate that we've got some understanding of indeterminate values...)
The fix with the fewest changes would be to wrap that expression in a MIN or MAX function.
SELECT MAX(t.counter) AS `max`
, MIN(t.counter) AS `min`
, MIN(DATE_FORMAT(t.date_time,'%H:%i')) AS `dt`
FROM table1 t
WHERE t.date_time >= NOW() - INTERVAL 1 DAY
GROUP
BY YEAR(t.date_time)
, MONTH(t.date_time)
, DAY(t.date_time)
, HOUR(t.date_time)
ORDER
BY YEAR(t.date_time)
, MONTH(t.date_time)
, DAY(t.date_time)
, HOUR(t.date_time)
If we want rows returned in a particular order, we should include an ORDER BY clause, and not rely on MySQL-specific extension or behavior of GROUP BY (which may disappear in future releases.)
It's a bit odd to be doing a GROUP BY year, month, day and not including those values in the SELECT list. (It's not invalid to do that, just kind of strange.) The conditions in the WHERE clause guarantee that we don't have more than a 24-hour span for date_time.
My preference would be to do the GROUP BY on the same expression as the non-aggregate in the SELECT list. If I ever needed more than 24 hours, I'd include the date component:
SELECT MAX(t.counter) AS `max`
, MIN(t.counter) AS `min`
, DATE_FORMAT(t.date_time,'%Y-%m-%d %H:00') + INTERVAL 0 DAY AS `dt`
FROM table1 t
WHERE t.date_time >= NOW() - INTERVAL 1 DAY
GROUP
BY DATE_FORMAT(t.date_time,'%Y-%m-%d %H:00') + INTERVAL 0 DAY
ORDER
BY DATE_FORMAT(t.date_time,'%Y-%m-%d %H:00') + INTERVAL 0 DAY
--or--
if we always know it's just one day's worth of date_time, and we only want to return the hour, then we can group by just the hour. The same expression as in the SELECT list.
SELECT MAX(t.counter) AS `max`
, MIN(t.counter) AS `min`
, DATE_FORMAT(t.date_time,'%H:00') AS `dt`
FROM table1 t
WHERE t.date_time >= NOW() - INTERVAL 1 DAY
GROUP
BY DATE_FORMAT(t.date_time,'%H:00')
, DATE_FORMAT(t.date_time,'%Y-%m-%d %H')
ORDER
BY DATE_FORMAT(t.date_time,'%Y-%m-%d %H')
SELECT MAX( counter ) AS max,
MIN( counter ) AS min,
YEAR(date_time) AS g_year,
MONTH(date_time)AS g_month,
DAY(date_time) AS g_day,
HOUR(date_time) AS g_hour
FROM table1
WHERE date_time >= NOW() - INTERVAL 1 DAY
GROUP BY g_year, g_month, g_day, g_hour
Or you can get rid of redundant data if you always do it for 1 day:
SELECT MAX( counter ) AS max,
MIN( counter ) AS min,
DAY(date_time) AS g_day,
HOUR(date_time) AS g_hour
FROM table1
WHERE date_time >= NOW() - INTERVAL 1 DAY
GROUP BY g_day, g_hour
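A small runnable sketch of the "group by the same expression you select" approach from the answers above. It uses Python's sqlite3 module, with SQLite's strftime standing in for MySQL's DATE_FORMAT (an assumption made so the example runs without a MySQL server). The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (date_time TEXT, counter INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [
        ("2024-01-01 10:05:00", 3),
        ("2024-01-01 10:45:00", 9),
        ("2024-01-01 11:15:00", 4),
        ("2024-01-02 10:30:00", 1),
    ],
)

# The non-aggregate SELECT expression and the GROUP BY expression are identical,
# so every selected value is determined by the group it belongs to.
rows = conn.execute(
    """
    SELECT strftime('%Y-%m-%d %H:00', date_time) AS dt,
           MIN(counter) AS min_counter,
           MAX(counter) AS max_counter
    FROM table1
    GROUP BY strftime('%Y-%m-%d %H:00', date_time)
    ORDER BY dt
    """
).fetchall()
print(rows)
```

Because the date is part of the grouping expression, the 10:00 hour on the second day stays separate from the 10:00 hour on the first.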

how do I replace missing records with nulls in a mysql query?

I get the number of tests taken by a unit:
select
date(START_DATE_TIME), product_id, BATCH_SERIAL_NUMBER, count(*)
from
( select START_DATE_TIME, product_id, uut_serial_number, BATCH_SERIAL_NUMBER
from uut_result
where START_DATE_TIME >= '2016-07-01 00:00:00'
and START_DATE_TIME <= '2016-07-07 23:59:59') as passtbl
group by date(START_DATE_TIME), product_id, batch_serial_number;
I fetch the number of tests a unit passed broken down by day:
select
date(START_DATE_TIME), product_id, BATCH_SERIAL_NUMBER, count(*)
from
( select START_DATE_TIME, product_id, uut_serial_number, BATCH_SERIAL_NUMBER
from uut_result
where START_DATE_TIME >= '2016-07-01 00:00:00'
and START_DATE_TIME <= '2016-07-07 23:59:59'
and uut_status = 'passed' ) as passtbl
group by date(START_DATE_TIME), product_id, batch_serial_number;
What I'm finding is that there are units that don't have any pass records at all, so the second query returns fewer records than the first. This is breaking post-processing. Is there a way to catch the absence of a record and replace it with null or some other dummy value?
select date(START_DATE_TIME),
product_id,
BATCH_SERIAL_NUMBER,
status,
count(*)
from (select *,
case when uut_status = 'passed' then uut_status
else 'other statuses'
end status
from uut_result) as t -- MySQL requires an alias on every derived table
where START_DATE_TIME >= '2016-07-01 00:00:00'
and START_DATE_TIME <= '2016-07-07 23:59:59'
group by date(START_DATE_TIME),
status,
product_id,
batch_serial_number;
My standard answer to everything like this is to use a common table expression and window functions, instead of using group by where you lose the details and have to struggle to recover them.
To get a dummy row you might use a union like this:
;with myCTE (unitId, otherdetail, passed)
as (
select unitDetail, otherdetail, Sum(1) Over (partition by unit) as passed
from sourceTable
)
SELECT unitid, otherDetail, passed
from myCTE
where startDate >= lowerbound and startdate < upperBound
UNION
SELECT unitId, otherdetail, 0 as passed
from sourceTable S
where not exists (select 1 from myCTE where myCTE.unitId = S.unitID
and startDate >= lowerbound and startdate < upperBound)
I think that's a pretty good rough sketch of what you need.
Also, I would use a half-open interval to compare times, on the off chance that the start time falls between 23:59:59 and 0:00 the next day.
You never mentioned what db engine [Duh, it's in the title; I was looking for a TAG]. CTEs are available on SQL Server and Oracle, and on MySQL from version 8.0, but not on earlier MySQL versions.
For most uses you can substitute a correlated subquery, but you have to repeat yourself. The ';' before WITH is a quirk of SQL Server.
If you are on a MySQL version without CTEs, you have to duplicate the CTE as a subquery wherever it is referenced. Or maybe you have table-valued functions??
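As a simpler alternative sketch (not taken from any of the answers above), conditional aggregation keeps one row per day and unit and reports 0 passes instead of dropping the row. It also uses the half-open date range mentioned above. The example runs on Python's sqlite3 module as a stand-in for MySQL, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE uut_result ("
    "start_date_time TEXT, product_id INTEGER, "
    "batch_serial_number TEXT, uut_status TEXT)"
)
conn.executemany(
    "INSERT INTO uut_result VALUES (?, ?, ?, ?)",
    [
        ("2016-07-01 08:00:00", 1, "A", "passed"),
        ("2016-07-01 09:00:00", 1, "A", "failed"),
        ("2016-07-01 10:00:00", 2, "B", "failed"),
        ("2016-07-07 23:59:59", 2, "B", "failed"),
        ("2016-07-08 00:00:00", 1, "A", "passed"),  # outside the range below
    ],
)

# Half-open interval: >= first day, < day after the last day.
# SUM(CASE ...) counts passes per group and yields 0 when there are none.
rows = conn.execute(
    """
    SELECT date(start_date_time) AS test_day,
           product_id,
           batch_serial_number,
           COUNT(*) AS tests,
           SUM(CASE WHEN uut_status = 'passed' THEN 1 ELSE 0 END) AS passed
    FROM uut_result
    WHERE start_date_time >= '2016-07-01'
      AND start_date_time <  '2016-07-08'
    GROUP BY date(start_date_time), product_id, batch_serial_number
    ORDER BY test_day, product_id
    """
).fetchall()
print(rows)
```

Both counts come from the same grouped result, so the "total" and "passed" columns always line up row for row.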

conversion mysql to postgresql

I have a working MySQL query, but I cannot get it to work with Postgres. This is the query (I already changed the date format to to_char):
SELECT country as grouper, date(users.created_at) as date,
to_char(users.created_at, '%Y-%m') as date_group,
count(id) as total_count
FROM "users"
WHERE (users.created_at >= '2011-12-01')
AND (users.created_at <= '2014-02-11')
GROUP BY grouper, date_group
ORDER BY date ASC
I am getting the error:
PG::Error: ERROR: column "users.created_at" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT country as grouper, date(users.created_at) as date, t...
Thanks for your help.
SELECT country as grouper, date(MIN(users.created_at)) as date,
to_char(users.created_at, 'YYYY-MM') as date_group,
count(id) as total_count
FROM "users"
WHERE (users.created_at >= '2011-12-01')
AND (users.created_at <= '2014-02-11')
GROUP BY grouper, date_group
ORDER BY date ASC
MySQL is not very strict. In standard-conforming SQL, when using GROUP BY, every selected column that is not a grouping field has to be wrapped in an aggregate function (SUM, COUNT, MAX, MIN).
date_group has to stay in the GROUP BY, so it is computed from the plain column rather than from MIN(). Note that to_char takes PostgreSQL patterns such as 'YYYY-MM', not MySQL's '%Y-%m'.
The date filter stays in WHERE: HAVING belongs after GROUP BY and cannot reference the non-aggregated users.created_at.
Every selected column that is not aggregated should appear in the GROUP BY section.
SELECT country as grouper, to_char(created_at, 'IYYY-IW') as date_group, -- ISO year and week; to_char does not use MySQL's % patterns
count(id) as total_count
FROM "users"
WHERE created_at >= '2013-10-01'
AND created_at <= '2014-02-11'
GROUP BY grouper, date_group
ORDER BY date_group ASC

Use the result of a Subquery in WHERE CLAUSE

I'm trying to use the result of a subquery in the query's WHERE clause. The attribute I wish to use is last_contact. See below.
SELECT forename, surname, type,
( SELECT MAX(completed_date)
FROM tblTasks
WHERE prospect_id = tblProspects.prospect_id AND completed = '1'
) AS last_contact,
created_at
FROM tblProspects
WHERE hidden != '1' AND type='Prospect' AND last_contact > DATE_ADD(CURDATE(), INTERVAL -90 DAY)
ORDER BY last_contact ASC
I get the SQL Error: #1054 - Unknown column 'last_contact' in 'where clause'
Any help would be greatly appreciated.
Thanks.
You need to use a HAVING clause to filter your results by a custom aliased column; aliases cannot be used in the WHERE clause.
SELECT
forename,
surname,
type,
(SELECT MAX(completed_date) FROM tblTasks WHERE prospect_id = tblProspects.prospect_id AND completed = '1') AS last_contact,
created_at
FROM tblProspects
WHERE hidden != '1' AND type='Prospect'
HAVING (
last_contact > DATE_ADD(CURDATE(), INTERVAL -90 DAY)
OR last_contact IS NULL
)
ORDER BY last_contact ASC
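A portable alternative sketch, if you would rather not rely on HAVING without GROUP BY: wrap the query in a derived table, so last_contact is a real column by the time the outer WHERE runs. The example uses Python's sqlite3 module as a stand-in for MySQL, a fixed cutoff date instead of CURDATE() so the result is reproducible, and invented rows; the tables are cut down to the columns needed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tblProspects (prospect_id INTEGER, forename TEXT);
    CREATE TABLE tblTasks (prospect_id INTEGER, completed_date TEXT, completed TEXT);
    INSERT INTO tblProspects VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO tblTasks VALUES
        (1, '2024-03-01', '1'), (1, '2024-05-20', '1'),
        (2, '2024-01-10', '1'), (2, '2024-06-01', '0');
    """
)

cutoff = "2024-03-15"  # hypothetical stand-in for CURDATE() minus 90 days

# The inner query computes last_contact; the outer WHERE can then filter on it.
rows = conn.execute(
    """
    SELECT forename, last_contact
    FROM (
        SELECT p.forename,
               (SELECT MAX(t.completed_date)
                FROM tblTasks t
                WHERE t.prospect_id = p.prospect_id
                  AND t.completed = '1') AS last_contact
        FROM tblProspects p
    ) AS sub
    WHERE last_contact > ?
    ORDER BY last_contact
    """,
    (cutoff,),
).fetchall()
print(rows)
```

Unlike the HAVING version above, this sketch drops prospects with no completed task; add OR last_contact IS NULL to the outer WHERE to keep them.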

MySQL - How can I improve these queries?

first one:
SELECT MONTH(timestamp) AS d, COUNT(*) AS c
FROM table
WHERE YEAR(timestamp)=2012 AND Status = 1
GROUP BY MONTH(timestamp)
one of the issues I'm facing for this one is that I have to run multiple queries that use different values for Status. Is there a way to combine them into one? Like in one column it would have all the counts for when Status=1 and another column for when Status=2, etc.
second one:
SELECT COUNT(*) c , MONTH(timestamp) t FROM
(
SELECT t.adminid, timestamp
FROM table1 t
LEFT JOIN admins a ON a.adminID=t.adminID
WHERE YEAR(timestamp)=2012
GROUP BY t.adminID, DATE(Timestamp)
ORDER BY timestamp DESC
) AS a
GROUP BY MONTH(timestamp)
ORDER BY MONTH(timestamp) ASC;
a nested query, not sure if I can improve on this. I'm running this one on 2 tables, one has ~35k rows and one has ~300k rows. It takes about half a second for the first table and 4-5 seconds for the second.
These might help:
First one:
SELECT MONTH(timestamp) AS d,
sum(case when Status=1 then 1 else 0 end) as Status1Count,
sum(case when Status=2 then 1 else 0 end) as Status2Count,
sum(case when Status=3 then 1 else 0 end) as Status3Count
FROM `table`
WHERE timestamp between '2012-01-01 00:00:00' and '2012-12-31 23:59:59'
AND Status in (1,2,3)
GROUP BY MONTH(timestamp);
Second one:
Make sure that there is an index on the timestamp column, and then make sure that you do not run any conversion functions, e.g. MONTH(timestamp), on the indexed column in the WHERE clause. Something like:
SELECT COUNT(*) c , a.m as t FROM
(
SELECT t.adminid, timestamp, MONTH(timestamp) as m
FROM table1 t
LEFT JOIN admins a ON a.adminID=t.adminID
WHERE timestamp between '2012-01-01 00:00:00' and '2012-12-31 23:59:59'
GROUP BY t.adminID, DATE(Timestamp)
ORDER BY timestamp DESC
) AS a
GROUP BY a.m
ORDER BY a.m ASC;
Second one is a bit tricky since I do not have the data in front of me so I can't see the DB access path!