I'm attempting to sort an aggregate column, which contains some zero values. I need the zero values to be last.
For non-aggregate columns I can do something like this (simplified example query):
SELECT age FROM books
ORDER BY
age = 0,
age ASC
However, for aggregate columns I'm getting an error as the column doesn't exist:
SELECT avg(age) as avg_age FROM books
GROUP BY books.type
ORDER BY
avg_age = 0,
avg_age ASC
The error is:
SQLSTATE[42S22]: Column not found: 1247 Reference 'avg_age' not supported (reference to group function)
I understand why this is happening, but I wasn't able to find a workaround. Any tips?
There seems to be an (old) related bug report:
[21 Mar 2016 9:22] Jiří Kavalík
Description: When using alias to aggregated column in ORDER BY only
plain alias is allowed, using it in any expression returns error.
http://sqlfiddle.com/#!9/e87bb/7
Workarounds:
- select the expression and use its alias
- use a derived table and order the outer one
How to repeat:
create table t(a int);
-- these work
select sum(a) x from t group by a order by x;
select sum(a) x from t group by a order by sum(a);
select sum(a) x from t group by a order by -sum(a);
-- this one wrongly gives "Reference 'x' not supported (reference to group function)"
select sum(a) x from t group by a order by -x;
source
You would have to write it as follows. This is better, as the query is then also valid ANSI/ISO standard SQL, which means it is more likely to be portable across most database vendors' software.
SELECT
avg(books.age) as avg_age
FROM books
GROUP BY books.type
ORDER BY
avg(books.age) = 0
, avg(books.age) ASC
See demo. This bug is fixed in MySQL 8.0 (see demo).
Try repeating the aggregate expression in the ORDER BY clause instead of using the alias:
SELECT avg(age) as avg_age
FROM books
GROUP BY books.type
ORDER BY avg(age) = 0, avg(age) ASC
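Both workarounds from the bug report (repeat the aggregate, or sort in an outer query over a derived table) can be tried without a MySQL server. This is only a minimal sketch: SQLite (via Python's sqlite3) stands in for MySQL, and the sample rows are made up for illustration. SQLite does not have the alias restriction itself, so the sketch only shows that the two rewritten queries give the same zeros-last ordering.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (type TEXT, age INTEGER);
    INSERT INTO books VALUES
        ('a', 0), ('a', 0),    -- average 0
        ('b', 10), ('b', 20),  -- average 15
        ('c', 5);              -- average 5
""")

# Workaround 1: repeat the aggregate expression instead of its alias.
repeat_sql = """
    SELECT avg(age) AS avg_age
    FROM books
    GROUP BY type
    ORDER BY avg(age) = 0, avg(age) ASC
"""

# Workaround 2: aggregate in a derived table, sort in the outer query,
# where avg_age is an ordinary column and may appear in any expression.
derived_sql = """
    SELECT avg_age
    FROM (SELECT avg(age) AS avg_age FROM books GROUP BY type) AS t
    ORDER BY avg_age = 0, avg_age ASC
"""

repeated = [row[0] for row in conn.execute(repeat_sql)]
derived = [row[0] for row in conn.execute(derived_sql)]
print(repeated)
print(derived)
```

The comparison avg_age = 0 evaluates to 0 for non-zero averages and 1 for zero averages, so sorting on it first pushes the zero groups to the end.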
Related
I have a table in a MySQL (5.7.21) database like this:
+----------+--------------+-----------+-----------+
| id_price | id_reference | price_usd | unix_time |
+----------+--------------+-----------+-----------+
And I need to extract the average price (price_usd) grouped by week of year, or month (unix_time).
I prepared this query:
SELECT CONCAT(WEEKOFYEAR(FROM_UNIXTIME(unix_time)),
'-',
YEAR(FROM_UNIXTIME(unix_time))) as date,
AVG(price_usd) AS "model"
FROM price_avg
INNER JOIN reference ON reference.id_reference=price_avg.id_reference
WHERE price_avg.id_reference=1
GROUP BY WEEKOFYEAR(FROM_UNIXTIME(unix_time)),
YEAR(FROM_UNIXTIME(unix_time)),
price_avg.id_reference
ORDER BY unix_time ASC
The inner join is used to get the name of the product from its id.
I get this error:
#1055 - Expression #1 of SELECT list is not in GROUP BY clause and
contains nonaggregated column 'name_of_db.price_avg.unix_time' which is not
functionally dependent on columns in GROUP BY clause; this is
incompatible with sql_mode=only_full_group_by
I cannot change the settings of MySQL (I can't disable ONLY_FULL_GROUP_BY mode or anything else).
How do I have to change the query to extract the data in MySQL 5.7.21?
Thanks in advance.
You can use a subquery, so that the outer query groups by the complete selected expression:
Select `date`,
AVG(price_usd) AS "model"
From (
SELECT CONCAT(WEEKOFYEAR(FROM_UNIXTIME(unix_time)),
'-',
YEAR(FROM_UNIXTIME(unix_time))) as `date`,
price_usd
FROM price_avg
INNER JOIN reference ON reference.id_reference=price_avg.id_reference
WHERE price_avg.id_reference=1
) t
GROUP BY `date`
ORDER BY substring(`date`, -4), substring(`date`, 1, 2) ASC;
Result:
date model
48-1998 11.99
36-2001 19.99
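Here is a small runnable sketch of the same idea: compute the period key once in a derived table, then group and sort by that key in the outer query. SQLite is used as a stand-in, so strftime(..., 'unixepoch') replaces MySQL's FROM_UNIXTIME/DATE_FORMAT, and the three rows are invented. The key is written year first ('%Y-%m') so that plain string ordering is also chronological.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL 5.7; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE price_avg (id_reference INTEGER, price_usd REAL, unix_time INTEGER);
    INSERT INTO price_avg VALUES
        (1, 10.0, 0),         -- 1970-01-01
        (1, 20.0, 86400),     -- 1970-01-02
        (1, 30.0, 3456000);   -- 1970-02-10
""")

# The inner query derives the period; the outer query only ever sees
# `period` and `price_usd`, so the GROUP BY covers every selected column.
monthly_sql = """
    SELECT period, AVG(price_usd) AS model
    FROM (
        SELECT strftime('%Y-%m', unix_time, 'unixepoch') AS period,
               price_usd
        FROM price_avg
        WHERE id_reference = 1
    ) AS t
    GROUP BY period
    ORDER BY period
"""

monthly = conn.execute(monthly_sql).fetchall()
print(monthly)
```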
I solved it with this query:
SELECT DATE_FORMAT(FROM_UNIXTIME(unix_time),'%Y-%m') as date,
AVG(price_usd) AS model
FROM price_avg
INNER JOIN reference ON reference.id_reference=price_avg.id_reference
WHERE price_avg.id_reference=1
GROUP BY date, price_avg.id_reference
ORDER BY date ASC
The year now comes first instead of last, but I can fix that in the client environment!
Thank you all.
Hit this same problem, reported it as a possible bug on MySQL:
https://bugs.mysql.com/bug.php?id=90792&thanks=4
From the initial response, it sounded like it might be treated as a bug and fixed. The follow-up, however, suggests that GROUP BY expressions (as opposed to columns) aren't part of the SQL standard, and that determining whether a complex expression is completely derived from grouped expressions is difficult. For now at least, it is something they decided not to pursue:
https://mysqlserverteam.com/when-only_full_group_by-wont-see-the-query-is-deterministic/
There are some workarounds in the meantime.
I'm trying to convert a query in Teradata to HIVE QL (HDF) and have struggled to find examples.
Teradata (my functional end goal): I want a count of records in the table, then a count for each growth_type_id value, and ultimately the percentage of the total that each group represents.
select trim(growth_type_id) AS VAL, COUNT(1) AS cnt, SUM(cnt) over () as GRP_CNT,CNT/(GRP_CNT* 1.0000) AS perc
from acdw_apex_account_strategy
qualify perc > .01 group by val
Note: running HDP-2.4.3.0-227
select val
,cnt
,grp_cnt
,cnt/(grp_cnt* 1.0000) as perc
from (select trim(growth_type_id) as val
,count(*) as cnt
,sum(count(*)) over () as grp_cnt
from acdw_apex_account_strategy
group by trim(growth_type_id)
) t
where cnt/grp_cnt > 0.01
;
QUALIFY is unique to Teradata.
Using aliases everywhere in the query is unique to Teradata.
Grouping by column positions depends on a parameter - hive.groupby.orderby.position.alias
Grouping by aliases is not supported - https://issues.apache.org/jira/browse/HIVE-1683
Hive doesn't use integer arithmetic (e.g. 7/4 = 1.75 and not 1 as in Teradata)
Decimal notation without preceding digit(s) is not valid (write 0.01, not .01)
P.s.
You are using QUALIFY before the GROUP BY. Although Teradata syntax is flexible, the only requirement being that the SELECT/WITH clause is positioned first, I strongly recommend keeping the standard order of clauses:
WITH - SELECT - FROM - WHERE - GROUP BY - HAVING - ORDER BY
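The QUALIFY replacement pattern (compute the window total in a derived table, then filter with WHERE in the outer query) can be checked locally with a small sketch. SQLite 3.25 or later is assumed as a stand-in for Hive, the rows are invented, and the threshold is raised to 0.3 so that the tiny sample actually filters something. Unlike Hive, SQLite does use integer division, so the `* 1.0` is required here.

```python
import sqlite3

# Assumption: SQLite (3.25+ for window functions) stands in for Hive;
# the rows and the 0.3 threshold are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE acdw_apex_account_strategy (growth_type_id TEXT);
    INSERT INTO acdw_apex_account_strategy VALUES (' A'), ('A '), ('A'), ('B');
""")

share_sql = """
    SELECT val, cnt, grp_cnt, cnt * 1.0 / grp_cnt AS perc
    FROM (
        -- window total over the already-grouped counts
        SELECT val, cnt, SUM(cnt) OVER () AS grp_cnt
        FROM (
            -- group by the full expression, not by its alias
            SELECT trim(growth_type_id) AS val, COUNT(*) AS cnt
            FROM acdw_apex_account_strategy
            GROUP BY trim(growth_type_id)
        ) AS g
    ) AS t
    -- plays the role of Teradata's QUALIFY
    WHERE cnt * 1.0 / grp_cnt > 0.3
    ORDER BY val
"""

shares = conn.execute(share_sql).fetchall()
print(shares)
```

The grouping and the window sum are split into two nesting levels here only to keep the sketch conservative; the answer above does both in a single level with sum(count(*)) over ().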
I'm trying to insert using a select statement. However, I need to order the sub-select results using a ranking equation. If I create an alias, it throws off the column count. Can I somehow order my results using an equation?
INSERT INTO draft
( fk_contrib_id , end_time )
SELECT pk_contrib_id, UNIX_TIMESTAMP(), (X+Y+Z) AS ranking
FROM contrib
ORDER BY ranking DESC
LIMIT 1
I need the 'ranking' column for sorting, but if I include it, the column count is off for the insert. Do I have to use two queries for this?
You could simply change your query to directly use the expression in the ORDER BY clause, like so:
INSERT INTO draft
( fk_contrib_id , end_time )
SELECT pk_contrib_id, UNIX_TIMESTAMP()
FROM contrib
ORDER BY (X+Y+Z) DESC
LIMIT 1
Remove the expression from the SELECT list and use it in the ORDER BY clause instead.
ORDER BY X+Y+Z
It's perfectly valid to ORDER BY expressions that are not in the SELECT list.
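A quick way to convince yourself is the sketch below. It assumes SQLite as a stand-in for MySQL (so strftime('%s', 'now') replaces UNIX_TIMESTAMP()), and the contrib rows with their x, y, z values are made up.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the table contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contrib (pk_contrib_id INTEGER PRIMARY KEY,
                          x INTEGER, y INTEGER, z INTEGER);
    CREATE TABLE draft (fk_contrib_id INTEGER, end_time INTEGER);
    INSERT INTO contrib VALUES (1, 1, 1, 1), (2, 5, 5, 5), (3, 2, 2, 2);
""")

# The ranking expression appears only in ORDER BY, so the SELECT list
# still has exactly the two columns the INSERT expects.
conn.execute("""
    INSERT INTO draft (fk_contrib_id, end_time)
    SELECT pk_contrib_id, CAST(strftime('%s', 'now') AS INTEGER)
    FROM contrib
    ORDER BY (x + y + z) DESC
    LIMIT 1
""")

drafted = conn.execute("SELECT fk_contrib_id FROM draft").fetchall()
print(drafted)
```

Only the contributor with the highest x + y + z ends up in draft, and no extra column is needed.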
I'm moving my Delphi application from MySQL to SQL Server 2012. In MySQL I had this query:
SELECT *,(XS+S+M+L+XL+XXL+[1Size]+Custom) as Total FROM StockData
GROUP BY StyleNr,Customer,Color
ORDER BY StyleNr,Customer,Color
And it worked perfectly. But in Microsoft SQL Server 2012 this query says
Msg 8120, Level 16, State 1, Line 1
Column 'StockData.ID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
If I change my query to:
SELECT *,([XS]+[S]+[M]+[L]+[XL]+[XXL]+[1Size]+[Custom]) total
FROM [dbo].[stockdata]
GROUP BY ID,StyleNr,Customer,Color
ORDER BY StyleNr,Customer,Color
Then I get this error:
Msg 8120, Level 16, State 1, Line 1
Column 'dbo.stockdata.XS' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Any ideas?
Here is the table's design view:
SQL Server is working as expected. You must include all items in your SELECT list in either a GROUP BY or in an aggregate function:
SELECT *,(XS+S+M+L+XL+XXL+[1Size]+Custom) as Total
FROM StockData
-- GROUP BY ID,StyleNr,Customer,Color, XS,S,M,L,XL,XXL,[1Size],Custom
ORDER BY StyleNr,Customer,Color
Or you might be able to use:
SELECT StyleNr,Customer,Color, SUM(XS+S+M+L+XL+XXL+[1Size]+Custom) as Total
FROM StockData
GROUP BY StyleNr,Customer,Color
ORDER BY StyleNr,Customer,Color;
All columns in an aggregate query must either be used by an aggregate function or appear in the GROUP BY. Try selecting only the columns you require rather than *, i.e. select stylenr, customer, color, ([...] ) as Total from.
This is a SQL standard way of dealing with aggregates, you'd get a similar error in Oracle.
You can also use this approach:
with OrdinalOnGroup as
(
SELECT
Ordinal = rank() over(partition by StyleNr, Customer, Color order by id)
, *, (XS+S+M+L+XL+XXL+[1Size]+Custom) as Total
FROM StockData
)
select *
from OrdinalOnGroup
where Ordinal = 1;
PARTITION BY denotes the grouping of related rows; this is called windowing.
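The two suggestions in this thread answer different questions: SUM with GROUP BY returns one total per group, while the window approach returns one representative row per group. The sketch below contrasts them. It assumes SQLite 3.25+ as a stand-in for SQL Server (so `AS Ordinal` replaces the T-SQL `Ordinal = ...` form, and row_number() is used in place of rank()), with a cut-down hypothetical StockData table that has only two size columns.

```python
import sqlite3

# Assumption: SQLite (3.25+) stands in for SQL Server; the table is a
# hypothetical, cut-down version of StockData.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE StockData (ID INTEGER PRIMARY KEY, StyleNr TEXT,
                            Customer TEXT, Color TEXT, S INTEGER, M INTEGER);
    INSERT INTO StockData VALUES
        (1, 'st1', 'acme', 'red',  1, 2),
        (2, 'st1', 'acme', 'red',  3, 4),
        (3, 'st2', 'acme', 'blue', 5, 6);
""")

# One total per (StyleNr, Customer, Color) group.
sum_sql = """
    SELECT StyleNr, Customer, Color, SUM(S + M) AS Total
    FROM StockData
    GROUP BY StyleNr, Customer, Color
    ORDER BY StyleNr
"""

# First row (lowest ID) of each group, with its own per-row total.
first_sql = """
    WITH OrdinalOnGroup AS (
        SELECT row_number() OVER (PARTITION BY StyleNr, Customer, Color
                                  ORDER BY ID) AS Ordinal,
               ID, StyleNr, (S + M) AS Total
        FROM StockData
    )
    SELECT ID, StyleNr, Total
    FROM OrdinalOnGroup
    WHERE Ordinal = 1
    ORDER BY ID
"""

summed = conn.execute(sum_sql).fetchall()
first_rows = conn.execute(first_sql).fetchall()
print(summed)
print(first_rows)
```

Which one is right depends on whether the old MySQL query was meant to total the stock per group or just to show one row per group.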
I need to find duplicates in a table. In MySQL I simply write:
SELECT *,count(id) count FROM `MY_TABLE`
GROUP BY SOME_COLUMN ORDER BY count DESC
This query nicely:
Finds duplicates based on SOME_COLUMN, giving its repetition count.
Sorts in desc order of repetition, which is useful to quickly scan major dups.
Chooses a random value for all remaining columns, giving me an idea of values in those columns.
Similar query in Postgres greets me with an error:
column "MY_TABLE.SOME_COLUMN" must appear in the GROUP BY clause or be
used in an aggregate function
What is the Postgres equivalent of this query?
PS: I know that MySQL behaviour deviates from SQL standards.
Back-ticks are a non-standard MySQL thing. Use the canonical double quotes to quote identifiers (possible in MySQL, too). That is, if your table in fact is named "MY_TABLE" (all upper case). If you (more wisely) named it my_table (all lower case), then you can remove the double quotes or use lower case.
Also, I use ct instead of count as alias, because it is bad practice to use function names as identifiers.
Simple case
This would work with PostgreSQL 9.1:
SELECT *, count(id) ct
FROM my_table
GROUP BY primary_key_column(s)
ORDER BY ct DESC;
It requires primary key column(s) in the GROUP BY clause. The results are identical to a MySQL query, but ct would always be 1 (or 0 if id IS NULL) - useless to find duplicates.
Group by other than primary key columns
If you want to group by other column(s), things get more complicated. This query mimics the behavior of your MySQL query - and you can use *.
SELECT DISTINCT ON (1, some_column)
count(*) OVER (PARTITION BY some_column) AS ct
,*
FROM my_table
ORDER BY 1 DESC, some_column, id, col1;
This works because DISTINCT ON (PostgreSQL specific), like DISTINCT (SQL standard), is applied after the window function count(*) OVER (...). Window functions (with the OVER clause) require PostgreSQL 8.4 or later and are not available in MySQL before version 8.0.
Works with any table, regardless of primary or unique constraints.
The 1 in DISTINCT ON and ORDER BY is just shorthand to refer to the ordinal number of the item in the SELECT list.
SQL Fiddle to demonstrate both side by side.
More details in this closely related answer:
Select first row in each GROUP BY group?
count(*) vs. count(id)
If you are looking for duplicates, you are better off with count(*) than with count(id). There is a subtle difference if id can be NULL, because NULL values are not counted - while count(*) counts all rows. If id is defined NOT NULL, results are the same, but count(*) is generally more appropriate (and slightly faster, too).
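The difference is easy to see with a NULL id in one of the groups. This is a minimal sketch that assumes SQLite as a stand-in for Postgres (the NULL handling of count is the same in standard SQL); the rows are invented.

```python
import sqlite3

# Assumption: SQLite stands in for Postgres; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (some_column INTEGER, id INTEGER);
    INSERT INTO my_table VALUES (1, 10), (1, NULL), (2, 20);
""")

# count(*) counts rows; count(id) counts only rows where id IS NOT NULL.
counts = conn.execute("""
    SELECT some_column, count(*) AS ct_rows, count(id) AS ct_ids
    FROM my_table
    GROUP BY some_column
    ORDER BY some_column
""").fetchall()
print(counts)
```

For some_column = 1 there are two rows but only one non-NULL id, so count(id) would hide that duplicate.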
Here's another approach that uses DISTINCT ON:
select
distinct on(ct, some_column)
*,
count(id) over(PARTITION BY some_column) as ct
from my_table x
order by ct desc, some_column, id
Data source:
CREATE TABLE my_table (some_column int, id int, col1 int);
INSERT INTO my_table VALUES
(1, 3, 4)
,(2, 4, 1)
,(2, 5, 1)
,(3, 6, 4)
,(3, 7, 3)
,(4, 8, 3)
,(4, 9, 4)
,(5, 10, 1)
,(5, 11, 2)
,(5, 11, 3);
Output:
SOME_COLUMN ID COL1 CT
5 10 1 3
2 4 1 2
3 6 4 2
4 8 3 2
1 3 4 1
Live test: http://www.sqlfiddle.com/#!1/e2509/1
DISTINCT ON documentation: http://www.postgresonline.com/journal/archives/4-Using-Distinct-ON-to-return-newest-order-for-each-customer.html
MySQL allows GROUP BY to omit non-aggregated selected columns from the GROUP BY list, which it executes by returning the values of an arbitrary row (typically the first one found) for each unique combination of grouped-by columns. This is non-standard SQL behaviour.
Postgres, on the other hand, is SQL standard compliant.
There is no equivalent query in postgres.
Here is a self-joined CTE, which allows you to use select *. key0 is the intended unique key, {key1,key2} are the additional key elements needed to address the currently non-unique rows. Use at your own risk, YMMV.
WITH zcte AS (
SELECT DISTINCT tt.key0
, MIN(tt.key1) AS key1
, MIN(tt.key2) AS key2
, COUNT(*) AS cnt
FROM ztable tt
GROUP BY tt.key0
HAVING COUNT(*) > 1
)
SELECT zt.*
, zc.cnt AS cnt
FROM ztable zt
JOIN zcte zc ON zc.key0 = zt.key0 AND zc.key1 = zt.key1 AND zc.key2 = zt.key2
ORDER BY zt.key0, zt.key1,zt.key2
;
BTW: to get the intended behaviour for the OP, the HAVING COUNT(*) > 1 clause should be omitted.