This appears to work,
SELECT CONCATENATE(col1,col2) newcol,sum(othercol)
FROM mytable GROUP BY newcol.
Or even
SELECT STR_TO_DATE("%Y%m%d") as newcol,sum(othercol)
FROM mytable GROUP BY newcol.
Looking at these, one assumes the first example produces a count for each unique combination of col1,col2 as a string, the second example to produce a count for each day.
Having been bitten before by mysql e.g. silently ignoring "missing" columns in group by,
Does the above actually work, or are there any hidden gotchas ?
If you want to count rows, you should use count(*) instead of sum().
Otherwise, this pattern works fine.
(I personally use "GROUP BY 1" to signify grouping by the first column).
Related
Below is the data in my table:
TABLE:
abc-ac
abc-dc
aax-i
bcs-o-dc
ddd-o-poe-dc
I need to write a query which will display only the unique entries as a result:
abc-ac
aax-i
bcs-o-dc
ddd-o-poe-dc
So basically, since the first two entries start with "abc", it should be treated as one and displayed.
Thanks.
If you're not picky about which one of the two abc-* records that it shows you can use this:
SELECT f1 FROM mytable GROUP BY substring_index(f1, '-', 1)
SQLFiddle Here
That substring_index() function will split the value in your field by - and return the first bit. So essentially your records get grouped by only the first part. This is one of the few times that we can take advantage of MySQLs strange GROUP BY behavior where it will allow you to leave out non-aggregated fields from the group by.
I have a simple query with a few rows and multiple criteria in the where clause but it is only returning one row instead of 13. No joins and the syntax was triple checked and appears to be free of errors.
Query:
select column1, column2, column3
from mydb
where onecolumn in (number1, number2....number13)
Results:
returns one row of data associated with a random number in the where clause
spent a big part of the day trying to figure this one out and am now out of ideas. Please help...
Absent a more detailed test case, and the actual SQL statement that is actually running, this question cannot be answered. Here are some "ideas"...
Our first guess is that the rows you think are going to satisfy the predicates aren't actually satisfying all of the conditions.
Our second guess is that you've got an aggregate expression (COUNT(), MAX(), SUM()) in the SELECT list that's causing an implicit GROUP BY. This is a common "gotcha"... the non-standard MySQL extension to GROUP BY which allows non-aggregates to appear in the SELECT list, which are not also included as expressions in the GROUP BY clause. This same gotcha appears when the GROUP BY clause is omitted entirely, and an aggregate is included in the SELECT list.
But the question doesn't make any mention of an aggregate expression in the SELECT list.
Our third guess is another issue that beginners frequently overlook: the order of precedence of operations, especially AND and OR. For example, consider the expressions:
a AND b OR c
a AND ( b OR c )
( a AND b ) OR c
consider those while we sing-along, Sesame Street style,...: "One of these things is not like the others, one of these things just doesn't belong..."
A fourth guess... if it wasn't for the row being returned having a value of onecolumn as a random number in the IN list... if it was instead the first number in the IN list, we'd be very suspicious that the IN list actually contains a single string value that looks like a list a values, but is actually not.
The two expression in the SELECT list look very similar, but they are very different:
SELECT t.n IN (2,3,5,7) AS n_in_list
, t.n IN ('2,3,5,7') AS n_in_string
FROM ( SELECT 2 AS n
UNION ALL SELECT 3
UNION ALL SELECT 5
) t
The first expression is comparing n to each value in a list of four values.
The second expression is equivalent to t.n IN (2).
This is a frequent trip up when neophytes are dynamically creating SQL text, thinking that they can pass in a string value and that MySQL will see the commas within the string as part of the SQL statement.
(But this doesn't explain how a some the random one in the list.)
Those are all just guesses. Those are some of the most frequent trip ups we see, but we're just guessing. It could be something else entirely. In it's current form, there is no definitive "answer" to the question.
Someone sent me a SQL query where the GROUP BY clause consisted of the statement: GROUP BY 1.
This must be a typo right? No column is given the alias 1. What could this mean? Am I right to assume that this must be a typo?
It means to group by the first column of your result set regardless of what it's called. You can do the same with ORDER BY.
SELECT account_id, open_emp_id
^^^^ ^^^^
1 2
FROM account
GROUP BY 1;
In above query GROUP BY 1 refers to the first column in select statement which is
account_id.
You also can specify in ORDER BY.
Note : The number in ORDER BY and GROUP BY always start with 1 not with 0.
In addition to grouping by the field name, you may also group by ordinal, or position of the field within the table. 1 corresponds to the first field (regardless of name), 2 is the second, and so on.
This is generally ill-advised if you're grouping on something specific, since the table/view structure may change. Additionally, it may be difficult to quickly comprehend what your SQL query is doing if you haven’t memorized the table fields.
If you are returning a unique set, or quickly performing a temporary lookup, this is nice shorthand syntax to reduce typing. If you plan to run the query again at some point, I’d recommend replacing those to avoid future confusion and unexpected complications (due to scheme changes).
It will group by first field in the select clause
That means *"group by the 1st column in your select clause". Always use GROUP BY 1 together with ORDER BY 1.
You can also use GROUP BY 1,2,3... It is convenient, but you need to pay attention to that condition; the result may not be what you want if someone has modified your select columns and it's not visualized.
It will group by the column position you put after the group by clause.
for example if you run 'SELECT SALESMAN_NAME, SUM(SALES) FROM SALES GROUP BY 1'
it will group by SALESMAN_NAME.
One risk on doing that is if you run 'Select *' and for some reason you recreate the table with columns on a different order, it will give you a different result than you would expect.
This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Is there any difference between Group By and Distinct
What's the difference between GROUP BY and DISTINCT in a MySQL query?
Duplicate of
Is there any difference between GROUP BY and DISTINCT
It is already discussed here
If still want to listen here
Well group by and distinct has its own use.
Distinct is used to filter unique records out of the records that satisfy the query criteria.
Group by clause is used to group the data upon which the aggregate functions are fired and the output is returned based on the columns in the group by clause. It has its own limitations such as all the columns that are in the select query apart from the aggregate functions have to be the part of the Group by clause.
So even though you can have the same data returned by distinct and group by clause its better to use distinct. See the below example
select col1,col2,col3,col4,col5,col6,col7,col8,col9 from table group by col1,col2,col3,col4,col5,col6,col7,col8,col9
can be written as
select distinct col1,col2,col3,col4,col5,col6,col7,col8,col9 from table
It makes you life easier when you have more columns in the select list. But at the same time if you need to display sum(col10) along with the above columns than you will have to use Group By. In that case distinct will not work.
eg
select col1,col2,col3,col4,col5,col6,col7,col8,col9,sum(col10) from table group by col1,col2,col3,col4,col5,col6,col7,col8,col9
Hope this helps.
DISTINCT works only on the entire row. Don't be mislead into thinking SELECT DISTINCT(A), B does something different. This is equivalent to SELECT DISTINCT A, B.
On the other hand GROUP BY creates a group containing all the rows that share each distinct value in a single column (or in a number of columns, or arbitrary expressions). Using GROUP BY you can use aggregate functions such as COUNT and MAX. This is not possible with DISTINCT.
If you want to ensure that all rows in your result set are unique and you do not need to aggregate then use DISTINCT.
For anything more advanced you should use GROUP BY.
Another difference that applies only to MySQL is that GROUP BY also implies an ORDER BY unless you specify otherwise. Here's what can happen if you use DISTINCT:
SELECT DISTINCT a FROM table1
Results:
2
1
But using GROUP BY the results will come in sorted order:
SELECT a FROM table1 GROUP BY a
Results:
1
2
As a result of the lack of sorting using DISTINCT is faster in the case where you can use either. Note: if you don't need the sorting with GROUP BY you can add ORDER BY NULL to improve performance.
Care to look at the docs:
http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html
and
http://dev.mysql.com/doc/refman/5.0/en/select.html
I'm trying to do a rather complicated SELECT computation that I will generalize:
Main query is a wildcard select for a table
One subquery does a COUNT() of all items based on a condition (this works fine)
Another subquery does a SUM() of numbers in a column based on another condition. This also works correctly, except when no records meet the conditions, it returns NULL.
I initially wanted to add up the two subqueries, something like (subquery1)+(subquery2) AS total which works fine unless subquery2 is null, in which case total becomes null, regardless of what the result of subquery1 is. My second thought was to try to create a third column that was to be a calculation of the two subqueries (ie, (subquery1) AS count1, (subquery2) AS count2, count1+count2 AS total) but I don't think it's possible to calculate two calculated columns, and even if it were, I feel like the same problem applies.
Does anyone have an elegant solution to this problem outside of just getting the two subquery values and totalling them in my program?
Thanks!
Two issues going on here:
You can't use one column alias in another expression in the same SELECT list.
However, you can establish aliases in a derived table subquery and use them in an outer query.
You can't do arithmetic with NULL, because NULL is not zero.
However, you can "default" NULL to a non-NULL value using the COALESCE() function. This function returns its first non-NULL argument.
Here's an example:
SELECT *, count1+count2 AS total
FROM (SELECT *, COALESCE((subquery1), 0) AS count1,
COALESCE((subquery2), 0) AS count2
FROM ... ) t;
(remember that a derived table must be given a table alias, "t" in this example)
First off, the COALESCE function should help you take care of any null problems.
Could you use a union to merge those two queries into a single result set, then treat it as a subquery for further analysis?
Or maybe I did not completely understand your question?
I would try (for the second query) something like: SELECT SUM(ISNULL(myColumn, 0)) //Please verify syntax on that before you use it, though...
This should return 0 instead of null for any instance of that column being zero.
It might be unnecessary to say, but since you're using it inside a program, You'd rather use program logic to sum the two results (NULL and a number), due to portability issues.
Who knows when COALESCE function is deprecated or if another DBMS supports it or not.