I'm trying to do a rather complicated SELECT computation that I will generalize:
Main query is a wildcard select for a table
One subquery does a COUNT() of all items based on a condition (this works fine)
Another subquery does a SUM() of numbers in a column based on another condition. This also works correctly, except when no records meet the conditions, it returns NULL.
I initially wanted to add up the two subqueries, something like (subquery1)+(subquery2) AS total, which works fine unless subquery2 is NULL, in which case total becomes NULL regardless of the result of subquery1. My second thought was to create a third column computed from the two subqueries (i.e., (subquery1) AS count1, (subquery2) AS count2, count1+count2 AS total), but I don't think it's possible to do arithmetic on two calculated columns, and even if it were, I feel like the same problem applies.
Does anyone have an elegant solution to this problem outside of just getting the two subquery values and totalling them in my program?
Thanks!
Two issues going on here:
You can't use one column alias in another expression in the same SELECT list.
However, you can establish aliases in a derived table subquery and use them in an outer query.
You can't do useful arithmetic with NULL, because NULL is not zero: any arithmetic expression with a NULL operand evaluates to NULL.
However, you can "default" NULL to a non-NULL value using the COALESCE() function. This function returns its first non-NULL argument.
Here's an example:
SELECT *, count1+count2 AS total
FROM (SELECT *, COALESCE((subquery1), 0) AS count1,
             COALESCE((subquery2), 0) AS count2
      FROM ... ) t;
(remember that a derived table must be given a table alias, "t" in this example)
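To see both points in action, here is a minimal sketch that runs the same pattern through Python's built-in sqlite3 module. SQLite stands in for MySQL here, and the orders/items tables and their columns are made up for illustration; they are not from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY);           -- hypothetical table
    CREATE TABLE items  (order_id INTEGER, qty INTEGER);    -- hypothetical table
    INSERT INTO orders VALUES (1), (2);
    INSERT INTO items VALUES (1, 3), (1, 4);                -- order 2 has no items
""")

# Arithmetic with NULL yields NULL (None on the Python side).
null_plus_five = conn.execute("SELECT 5 + NULL").fetchone()[0]

# Aliases are defined in the derived table t and only combined in the outer query.
query = """
    SELECT t.id, t.raw_sum, t.count1 + t.count2 AS total
    FROM (SELECT o.id,
                 (SELECT SUM(qty) FROM items i WHERE i.order_id = o.id) AS raw_sum,
                 COALESCE((SELECT COUNT(*) FROM items i WHERE i.order_id = o.id), 0) AS count1,
                 COALESCE((SELECT SUM(qty) FROM items i WHERE i.order_id = o.id), 0) AS count2
          FROM orders o) AS t
    ORDER BY t.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

For order 2 the raw SUM() is NULL because no rows match, but the COALESCE-wrapped columns still add up to a number.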
First off, the COALESCE function should help you take care of any null problems.
Could you use a union to merge those two queries into a single result set, then treat it as a subquery for further analysis?
Or maybe I did not completely understand your question?
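I would try (for the second query) something like: SELECT IFNULL(SUM(myColumn), 0). Note that the two-argument function in MySQL is IFNULL(); ISNULL(expr, 0) is SQL Server syntax. The wrapper also has to go around SUM() rather than inside it, because SUM() over zero matching rows is NULL whatever its argument is.
This should return 0 instead of NULL when no rows meet the condition.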
It might be unnecessary to say, but since you're using this inside a program, you might prefer to sum the two results (NULL and a number) in program logic, for portability reasons.
Who knows whether the COALESCE function will be deprecated someday, or whether another DBMS supports it?
This is my current table, let's call it "TABLE"
I want end result to be:
I tried this query:
SELECT * FROM TABLE GROUP BY(service)
but it doesn't work.
I tried replacing NULL with 0 and then performing the GROUP BY, but "TBA" (a text value) is causing a problem. Kindly help me out!
This looks like simple aggregation:
select service, max(for1) for1, max(for2) for2, max(for3) for3
from mytable
group by service
This takes advantage of the fact that aggregate functions such as max() ignore null values. However if a column has more than one non-null value for a given service, only the greatest will appear in the resultset.
It is unclear what the datatype of your columns is. Different datatypes have different rules for sorting.
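A small sketch of the aggregation, using Python's sqlite3 module as a stand-in for MySQL. The sample rows are invented (the question's actual table is not shown), and all three value columns are assumed to be text so that 'TBA' can be stored alongside numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (service TEXT, for1 TEXT, for2 TEXT, for3 TEXT);
    -- Made-up data: each row fills in only one of the three columns.
    INSERT INTO mytable VALUES
        ('a', '10', NULL,  NULL),
        ('a', NULL, 'TBA', NULL),
        ('a', NULL, NULL,  '30'),
        ('b', '5',  NULL,  NULL),
        ('b', '7',  NULL,  NULL);
""")

# MAX() skips NULLs, so each group collapses to one row of non-NULL values.
collapsed = conn.execute("""
    SELECT service, MAX(for1) AS for1, MAX(for2) AS for2, MAX(for3) AS for3
    FROM mytable
    GROUP BY service
    ORDER BY service
""").fetchall()
for row in collapsed:
    print(row)
```

Service 'b' has two non-NULL values in for1, so only the greater one survives, and its for2 and for3 stay NULL because there was nothing to pick.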
I have always had the understanding that you use SELECT to select columns from a table. However, I was thrown off when I saw SELECT LAST_INSERT_ID(). I understand what it does... but I don't understand how we can simply just ask for the last inserted id like that. Isn't it true that the SELECT keyword expects to see column names immediately afterwards... so how does that function call satisfy that requirement?
The SELECT statement normally works with a FROM clause to select columns -- and expressions on columns and constants -- from rows in a table.
Without the FROM clause, a SELECT simply evaluates the expressions and returns one row. The function LAST_INSERT_ID() is simply an expression that returns a value, so:
SELECT LAST_INSERT_ID()
returns a result set with a single row with a single (unnamed) column.
Some databases do not like the idea of a SELECT without a FROM. Oracle is one of them. It requires a FROM clause and provides a table called dual, with one column and one row, for exactly this purpose. MySQL also supports dual, so you could write:
SELECT LAST_INSERT_ID()
FROM dual;
This is handy, if you want to include a WHERE clause with the SELECT (the WHERE requires a FROM in MySQL).
I have a simple query with a few rows and multiple criteria in the where clause but it is only returning one row instead of 13. No joins and the syntax was triple checked and appears to be free of errors.
Query:
select column1, column2, column3
from mydb
where onecolumn in (number1, number2....number13)
Results:
Returns one row of data associated with a random number in the where clause.
I spent a big part of the day trying to figure this one out and am now out of ideas. Please help...
Absent a more detailed test case and the actual SQL statement that is running, this question cannot be answered. Here are some "ideas"...
Our first guess is that the rows you think are going to satisfy the predicates aren't actually satisfying all of the conditions.
Our second guess is that you've got an aggregate expression (COUNT(), MAX(), SUM()) in the SELECT list, which collapses the result into a single group. This is a common "gotcha" with the non-standard MySQL extension to GROUP BY, which allows non-aggregate expressions in the SELECT list that are not also included in the GROUP BY clause. The same gotcha appears when the GROUP BY clause is omitted entirely and an aggregate is included in the SELECT list.
But the question doesn't make any mention of an aggregate expression in the SELECT list.
Our third guess is another issue that beginners frequently overlook: the order of precedence of operations, especially AND and OR. For example, consider the expressions:
a AND b OR c
a AND ( b OR c )
( a AND b ) OR c
consider those while we sing-along, Sesame Street style,...: "One of these things is not like the others, one of these things just doesn't belong..."
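If you'd rather check than sing, here is a minimal sketch that evaluates all three expressions for every combination of inputs. It uses Python's sqlite3 module as a stand-in; the relative precedence of AND and OR is the same in standard SQL, but treat the setup as an illustration rather than a MySQL transcript.

```python
import sqlite3
from itertools import product

conn = sqlite3.connect(":memory:")

# Three spellings of the same-looking condition; each ? is bound to 0 or 1.
sql = "SELECT (? AND ? OR ?), (? AND (? OR ?)), ((? AND ?) OR ?)"

rows_by_input = {}
for a, b, c in product((0, 1), repeat=3):
    rows_by_input[(a, b, c)] = conn.execute(sql, (a, b, c) * 3).fetchone()

# Inputs where the unparenthesized form disagrees with a AND (b OR c).
differs = [
    abc
    for abc, (plain, and_outer, or_outer) in rows_by_input.items()
    if plain != and_outer
]
print(differs)
```

The unparenthesized form always agrees with ( a AND b ) OR c, because AND binds more tightly than OR.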
A fourth guess... if it wasn't for the row being returned having a value of onecolumn as a random number in the IN list... if it was instead the first number in the IN list, we'd be very suspicious that the IN list actually contains a single string value that looks like a list of values, but is actually not.
The two expressions in the SELECT list look very similar, but they are very different:
SELECT t.n IN (2,3,5,7) AS n_in_list
, t.n IN ('2,3,5,7') AS n_in_string
FROM ( SELECT 2 AS n
UNION ALL SELECT 3
UNION ALL SELECT 5
) t
The first expression is comparing n to each value in a list of four values.
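The second expression is comparing n to a single string value. In a numeric comparison MySQL converts the string '2,3,5,7' to the number 2, so the expression is equivalent to t.n IN (2).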
This is a frequent trip up when neophytes are dynamically creating SQL text, thinking that they can pass in a string value and that MySQL will see the commas within the string as part of the SQL statement.
(But this doesn't explain how the row returned matches some random value in the list, rather than the first one.)
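The string-versus-list trap typically shows up in application code, so here is a minimal sketch of it with Python's sqlite3 module. The table t and its values are invented. One caveat: SQLite does not coerce the string the way MySQL does, so the single-string form matches no rows here, whereas MySQL would match the first number; either way it does not behave like a four-element list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (n INTEGER);            -- hypothetical table
    INSERT INTO t VALUES (2), (3), (5), (11);
""")
wanted = [2, 3, 5, 7]

# Pitfall: the whole list is bound as ONE string value, so the IN list
# has a single element, the string '2,3,5,7'.
as_string = conn.execute(
    "SELECT n FROM t WHERE n IN (?) ORDER BY n",
    (",".join(map(str, wanted)),),
).fetchall()

# Safer: generate one placeholder per value and bind the values separately.
placeholders = ", ".join("?" for _ in wanted)
as_list = conn.execute(
    f"SELECT n FROM t WHERE n IN ({placeholders}) ORDER BY n",
    wanted,
).fetchall()

print(as_string, as_list)
```

Only the placeholder text is interpolated into the SQL; the values themselves are still passed as bound parameters.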
Those are all just guesses. Those are some of the most frequent trip ups we see, but we're just guessing. It could be something else entirely. In its current form, there is no definitive "answer" to the question.
I'm trying to write a query that excludes values beyond 6 standard deviations from the mean of the result set. I expect this can be done elegantly with a subquery, but I'm getting nowhere and in every similar case I've read the aim seems to be just a little different. My result set seems to get limited to a single row, I'm guessing due to calling the aggregate functions. Conceptually, this is what I'm after:
SELECT t.Result FROM
(SELECT Result, AVG(Result) avgr, STD(Result) stdr
FROM myTable WHERE myField=myCondition limit=75) as t
WHERE t.Result BETWEEN (t.avgr-6*t.stdr) AND (t.avgr+6*t.stdr)
I can get it to work by replacing each use of the STD or AVG value (i.e. t.avgr) with its own select statement, such as:
(SELECT AVG(Result) FROM myTable WHERE myField=myCondition limit=75)
However this seems waay more messy than I expect it needs to be (I've a few conditions). At first I thought specifying a HAVING clause was necessary, but as I learn more it doesn't seem to be quite what I'm after. Am I close? Is there some snazzy way to access the value of aggregate functions for use in conditions (without needing to return the aggregate values)?
Yes, your subquery is an aggregate query with no GROUP BY clause, therefore its result is a single row. When you select from that, you cannot get more than one row. Moreover, it is a MySQL extension that you can include the Result field in the subquery's selection list at all, as it is neither a grouping column nor an aggregate function of the groups (so what does it even mean in that context unless, possibly, all the relevant column values are the same?).
You should be able to do something like this to compute the average and standard deviation once, together, instead of per-result:
SELECT t.Result FROM
myTable AS t
CROSS JOIN (
SELECT AVG(Result) avgr, STD(Result) stdr
FROM myTable
WHERE myField = myCondition
) AS stats
WHERE
t.myField = myCondition
AND t.Result BETWEEN (stats.avgr-6*stats.stdr) AND (stats.avgr+6*stats.stdr)
LIMIT 75
Note that you will want to be careful that the statistics are computed over the same set of rows that you are selecting from, hence the duplication of the myField = myCondition predicate, and also the move of the LIMIT clause to the outer query only.
You can add more statistics to the aggregate subquery, provided that they are all computed over the same set of rows, or you can join additional statistics computed over different rows via a separate subquery. Do ensure that all your statistics subqueries return exactly one row each, else you will get duplicate (or no) results.
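Here is a runnable sketch of the CROSS JOIN pattern using Python's sqlite3 module. Two things differ from the MySQL query above and are assumptions of this sketch: SQLite has no STD(), so the population variance is computed as AVG(x*x) - AVG(x)*AVG(x) and the comparison is done on squared distances, and the threshold K is set to 2 instead of 6 so that a ten-row sample can actually contain an outlier. The sample data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (myField TEXT, Result INTEGER)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?)",
    # Nine ordinary values, one outlier, and one row that fails the condition.
    [("x", 10)] * 9 + [("x", 100), ("y", 5000)],
)

K = 2  # number of standard deviations to keep; the question uses 6

query = """
    SELECT t.Result
    FROM myTable AS t
    CROSS JOIN (
        -- One-row statistics subquery over the same rows as the outer query.
        SELECT AVG(Result) AS avgr,
               AVG(Result * Result) - AVG(Result) * AVG(Result) AS varr
        FROM myTable
        WHERE myField = :cond
    ) AS stats
    WHERE t.myField = :cond
      -- |x - mean| <= K * std, written with squares to avoid a square root
      AND (t.Result - stats.avgr) * (t.Result - stats.avgr) <= :k * :k * stats.varr
    ORDER BY t.Result
    LIMIT 75
"""
kept = [row[0] for row in conn.execute(query, {"cond": "x", "k": K})]
print(kept)
```

For the 'x' rows the mean is 19 and the standard deviation is 27, so the value 100 lies three standard deviations out and is dropped, while the 'y' row never enters either the statistics or the result.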
I created a UDF that doesn't calculate exactly the way you asked (it discards a percentage of the results from the top and bottom, instead of using the standard deviation), but it might be useful for you (or someone else) anyway. It matches the Excel function referenced here: https://support.office.com/en-us/article/trimmean-function-d90c9878-a119-4746-88fa-63d988f511d3
https://github.com/StirlingMarketingGroup/mysql-trimmean
Usage
`trimmean` ( `NumberColumn`, double `Percent` [, integer `Decimals` = 4 ] )
`NumberColumn`
The column of values to trim and average.
`Percent`
The fractional number of data points to exclude from the calculation. For example, if percent = 0.2, 4 points are trimmed from a data set of 20 points (20 x 0.2): 2 from the top and 2 from the bottom of the set.
`Decimals`
Optionally, the number of decimal places to output. Default is 4.
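To make the trimming rule concrete, here is a pure-Python sketch of the calculation described above. It is not the UDF's source code; it follows the parameter descriptions, plus Excel's documented behaviour of rounding the number of excluded points down to the nearest multiple of 2, which the UDF is assumed to share.

```python
def trimmean(values, percent, decimals=4):
    """Mean of `values` after dropping `percent` of the points, split
    evenly between the top and the bottom (a sketch of Excel's TRIMMEAN rule)."""
    if not 0 <= percent < 1:
        raise ValueError("percent must be in [0, 1)")
    data = sorted(values)
    if not data:
        raise ValueError("values must not be empty")

    # Number of points to exclude, rounded down to an even count.
    to_trim = int(len(data) * percent)
    to_trim -= to_trim % 2
    each_side = to_trim // 2

    kept = data[each_side:len(data) - each_side]
    return round(sum(kept) / len(kept), decimals)


# 20 points at percent = 0.2: 4 are trimmed, 2 from each end.
print(trimmean(range(1, 21), 0.2))
```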
This appears to work,
SELECT CONCAT(col1,col2) newcol,sum(othercol)
FROM mytable GROUP BY newcol
Or even
SELECT STR_TO_DATE("%Y%m%d") as newcol,sum(othercol)
FROM mytable GROUP BY newcol.
Looking at these, one assumes the first example produces a count for each unique combination of col1 and col2 as a string, and the second example produces a count for each day.
Having been bitten before by MySQL (e.g. silently ignoring "missing" columns in GROUP BY), I'd like to know: does the above actually work, or are there any hidden gotchas?
If you want to count rows, you should use count(*) instead of sum().
Otherwise, this pattern works fine.
(I personally use "GROUP BY 1" to signify grouping by the first column).
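A quick sketch of both spellings, run through Python's sqlite3 module as a stand-in for MySQL. The sample rows are invented, and SQLite's || operator is used in place of MySQL's CONCAT().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (col1 TEXT, col2 TEXT, othercol INTEGER);
    -- Made-up rows: ('a','x') appears twice.
    INSERT INTO mytable VALUES
        ('a', 'x', 10), ('a', 'x', 20), ('a', 'y', 5), ('b', 'x', 1);
""")

# Grouping by the alias of a computed column.
by_alias = conn.execute("""
    SELECT col1 || col2 AS newcol, COUNT(*) AS n, SUM(othercol) AS total
    FROM mytable
    GROUP BY newcol
    ORDER BY newcol
""").fetchall()

# Same query, grouping by the position of the first select-list column.
by_position = conn.execute("""
    SELECT col1 || col2 AS newcol, COUNT(*) AS n, SUM(othercol) AS total
    FROM mytable
    GROUP BY 1
    ORDER BY 1
""").fetchall()

print(by_alias)
```

COUNT(*) gives the number of rows per group and SUM(othercol) gives the total, which is the distinction the answer draws.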