Define and use a variable with a subquery? - mysql

I know normally "the order of evaluation for expressions involving user variables is undefined" so we can't safely define and use a variable in the same select statement. But what if there's a subquery? As an example, I have something like this:
select col1,
       (select min(date_) from t where i.col1 = col1) as first_date,
       datediff(date_, (select min(date_) from t where i.col1 = col1)) as days_since_first_date,
       count(*) cnt
from t i
where anothercol in ('long', 'list', 'of', 'values')
group by col1, days_since_first_date;
Is there a way to use (select @foo:=min(date_) from t where i.col1=col1) safely instead of repeating the subquery? If so, could I do it in the datediff function or the first time the subquery appears (or would either one work)?
Of course, I could do
select col1,
       (select min(date_) from t where i.col1 = col1) as first_date,
       date_,
       count(*) cnt
from t i
where anothercol in ('long', 'list', 'of', 'values')
group by col1, date_;
and then do some simple postprocessing to get the datediff. Or I can write two separate queries. But those don't answer my question, which is whether one can safely define and use the same variable in a query and a subquery.

First, your query doesn't really make sense, because date_ is used without an aggregation function, so you are going to get an arbitrary value.
That said, you could repeat the subquery, but I don't see why that would be necessary. Just compute it once in a derived table:
select t.col1, t.first_date,
       datediff(t.date_, t.first_date) as days_since_first_date,
       count(*) as cnt
from (select t.*,
             (select min(t2.date_) from t t2 where t2.col1 = t.col1) as first_date
      from t
      where anothercol in ('long', 'list', 'of', 'values')
     ) t
group by t.col1, t.first_date, days_since_first_date;
As I mentioned, though, the value of the third column is problematic.
Note: this does incur additional overhead for materializing the subquery. However, there is a GROUP BY anyway, so the data is already being read and written more than once.
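A runnable sketch of the derived-table approach, using SQLite through Python's sqlite3 module as a stand-in for MySQL. Assumptions: SQLite has no DATEDIFF, so the day difference is computed with julianday(); the sample rows, the inner alias t2 and the derived-table alias x are hypothetical. The correlated MIN() subquery appears exactly once, and the outer query only refers to its alias.

```python
import sqlite3

def make_demo_db():
    """Build a hypothetical table t(col1, date_, anothercol) in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (col1 TEXT, date_ TEXT, anothercol TEXT)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?, ?)",
        [
            ("a", "2024-01-01", "long"),
            ("a", "2024-01-01", "list"),
            ("a", "2024-01-03", "of"),
            ("b", "2024-02-05", "other"),   # filtered out, but still counts for MIN()
            ("b", "2024-02-10", "values"),
        ],
    )
    return conn

def days_since_first(conn):
    # The correlated MIN() subquery is written once, inside the derived table x;
    # the outer query refers to it only through the alias first_date.
    # julianday() differences replace MySQL's DATEDIFF (SQLite-only assumption).
    sql = """
        SELECT x.col1,
               x.first_date,
               CAST(julianday(x.date_) - julianday(x.first_date) AS INTEGER)
                   AS days_since_first_date,
               COUNT(*) AS cnt
        FROM (SELECT t.*,
                     (SELECT MIN(t2.date_) FROM t AS t2 WHERE t2.col1 = t.col1)
                         AS first_date
              FROM t
              WHERE anothercol IN ('long', 'list', 'of', 'values')) AS x
        GROUP BY x.col1, x.first_date, days_since_first_date
        ORDER BY x.col1, days_since_first_date
    """
    return conn.execute(sql).fetchall()

if __name__ == "__main__":
    for row in days_since_first(make_demo_db()):
        print(row)
```

As in the asker's query, first_date is the minimum over all rows of t for that col1, not only the rows that pass the anothercol filter.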

Related

DENSE_RANK() OVER and IFNULL()

Let's say I have a table like this -
id  number
1   1
2   1
3   1
I want to return the second-largest number, and if there isn't one, return NULL instead. In this case, since all the numbers in the table are the same, there is no second-largest number, so it should return NULL.
This query works:
SELECT IFNULL((
SELECT number
FROM (SELECT *, DENSE_RANK() OVER(ORDER BY number DESC) AS ranking
FROM test) r
WHERE ranking = 2), NULL) AS SecondHighestNumber;
However, after I restructured the query as follows, it doesn't work anymore:
SELECT IFNULL(number, NULL) AS SecondHighestNumber
FROM (SELECT *, DENSE_RANK() OVER(ORDER BY number DESC) AS ranking
FROM test) r
WHERE ranking = 2;
It returns an empty result (no rows) instead of a row containing NULL. Why?
Explanation
This is a byproduct of the way you are using a subquery in your SELECT clause, in a query that has no FROM clause at all.
It is easy to see with a very simple example. We create an empty table. Then we select from it where id = 1 (no results as expected).
CREATE TABLE #foo (id int)
SELECT * FROM #foo WHERE id = 1; -- Empty results
But now if we take a left turn and turn that into a subquery in the SELECT clause, we get a result!
CREATE TABLE #foo (id int)
SELECT (SELECT * FROM #foo WHERE id = 1) AS wtf; -- one record in results with value NULL
I'm not sure what else we could ask our SQL engine to do for us. Perhaps cough up an error and say it can't do this? Maybe return no results? We are telling it to select an empty result set as a value in the SELECT clause, in a query that doesn't have any FROM clause (personally I would like SQL to cough up an error and say it can't do this ... but it's not my call).
I hope someone else can explain this better, more accurately or technically - or even just give a name to this behavior. But in a nutshell there it is.
tldr;
So your first query has a SELECT clause with an IFNULL function that wraps a subquery, and is otherwise a SELECT without a FROM. This is a little weird but does what you want, as shown above: a subquery used as a single value (a scalar subquery) evaluates to NULL when it returns no rows, so you always get exactly one row. On the other hand, your second query is "normal" SQL that selects from a table, filters the results, and lets you know it found nothing, which might not be what you want but I think actually makes more sense ;)
Footnote: my "SQL" here is T-SQL, but I believe this simple example would behave the same in MySQL (there you would use CREATE TEMPORARY TABLE foo rather than #foo, since # starts a comment in MySQL). And for what it's worth, I believe Oracle (back when I learned it years ago) would actually cough up an error here and say you can't have a SELECT clause with no FROM.
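The difference between the two query shapes can be reproduced with a small sketch, using SQLite through Python's sqlite3 module as a stand-in for MySQL (it needs SQLite 3.25 or later for DENSE_RANK). The IFNULL wrapper is left out because IFNULL(x, NULL) always equals x. DISTINCT is added to the scalar form as an assumption-level safeguard: if several rows tie at rank 2, MySQL raises an error for a scalar subquery that returns more than one row.

```python
import sqlite3

def make_test_table(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, number INTEGER)")
    conn.executemany("INSERT INTO test (id, number) VALUES (?, ?)", rows)
    return conn

# Derived table shared by both queries: each number with its dense rank.
RANKED = "SELECT number, DENSE_RANK() OVER (ORDER BY number DESC) AS ranking FROM test"

def scalar_form(conn):
    # Scalar subquery in a SELECT with no FROM: the outer query always yields
    # exactly one row, and an empty subquery result becomes NULL (None here).
    sql = ("SELECT (SELECT DISTINCT number FROM (" + RANKED + ") AS r "
           "WHERE ranking = 2) AS SecondHighestNumber")
    return conn.execute(sql).fetchall()

def from_form(conn):
    # Ordinary query over the derived table: no matching rows, no result rows.
    sql = ("SELECT number AS SecondHighestNumber FROM (" + RANKED + ") AS r "
           "WHERE ranking = 2")
    return conn.execute(sql).fetchall()

if __name__ == "__main__":
    conn = make_test_table([(1, 1), (2, 1), (3, 1)])
    print(scalar_form(conn))  # [(None,)]
    print(from_form(conn))    # []
```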

Multiple select in select

I have code like this:
SELECT column1 = (SELECT MAX(column-name21) FROM table-name2 WHERE condition2 GROUP BY id2) as m,
column2 = (SELECT count(*) FROM table-name2 WHERE condition2 GROUP BY id2) as c,
column-names
FROM table-name
WHERE condition
ORDER BY ordercondition
LIMIT 25,50
Those inner SELECTs are quite long and complicated.
My question is: are there constructs in the MySQL language that allow one to avoid duplicating code and computations in this case?
For example, something like this:
SELECT (column1, column2) = (SELECT MAX(column-name1) as m, count(*) as c FROM table-name WHERE condition GROUP BY id),
column-names
FROM table-name
WHERE condition
ORDER BY ordercondition
LIMIT 25,50
which, of course, MySQL won't accept.
I tried this:
SELECT (SELECT MAX(column-name1) as column1, count(*) as column2 FROM table-name WHERE condition GROUP BY id),
column-names
FROM table-name
WHERE condition
ORDER BY ordercondition
LIMIT 25,50
and it also doesn't work.
Such subqueries get cumbersome when you need more than one value from the same source. Usually, the "fix" is to use a "derived table" and JOIN:
SELECT x2.col1, x2.col2, names
FROM ( SELECT MAX(c21) AS col1,
COUNT(*) AS col2,
?? -- may be needed for "cond2"
FROM t2
WHERE cond2a ) AS x2
JOIN t1
ON cond2b
WHERE cond1
ORDER BY ??? -- Limit is non-deterministic without ORDER BY
LIMIT 25, 50
If the "condition" in the subquery is "correlated", please specify it; it makes a big difference in how to transform the query.
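A runnable sketch of the derived-table-plus-JOIN pattern for the case where the original subqueries are grouped by an id, again with SQLite via Python's sqlite3 standing in for MySQL. The tables t1(id, name) and t2(id2, c21), the join condition t1.id = x2.id2 (playing the role of cond2b) and the sample data are all hypothetical. MAX and COUNT come out of one pass over t2 instead of two separate subqueries. MySQL's LIMIT 25, 50 means "skip 25 rows, return at most 50"; it is written here as LIMIT ? OFFSET ?.

```python
import sqlite3

def make_demo_db():
    """Hypothetical tables: t1(id, name) and t2(id2, c21)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t1 (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE t2 (id2 INTEGER, c21 INTEGER);
        INSERT INTO t1 VALUES (1, 'alpha'), (2, 'beta'), (3, 'gamma');
        INSERT INTO t2 VALUES (1, 10), (1, 30), (2, 5), (3, 7), (3, 8), (3, 9);
    """)
    return conn

def max_and_count(conn, offset=0, limit=50):
    # One pass over t2 produces both aggregates per id2; the outer query joins
    # them back to t1 instead of running two subqueries per output row.
    sql = """
        SELECT t1.id, t1.name, x2.col1, x2.col2
        FROM (SELECT id2,               -- grouping key, needed for the join
                     MAX(c21) AS col1,
                     COUNT(*) AS col2
              FROM t2
              GROUP BY id2) AS x2
        JOIN t1 ON t1.id = x2.id2       -- plays the role of cond2b
        ORDER BY t1.id                  -- keeps LIMIT/OFFSET deterministic
        LIMIT ? OFFSET ?
    """
    return conn.execute(sql, (limit, offset)).fetchall()

if __name__ == "__main__":
    print(max_and_count(make_demo_db()))
```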
The construct COUNT(col) is usually a mistake:
COUNT(*) -- the number of rows.
COUNT(DISTINCT col) -- the number of different values in column `col`.
COUNT(col) -- count the number of rows with non-NULL `col`.
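A quick check of the three COUNT forms on a hypothetical one-column table that contains duplicates and NULLs (SQLite via Python's sqlite3; these three forms follow the same rules in MySQL).

```python
import sqlite3

def count_variants(values):
    """Return (COUNT(*), COUNT(DISTINCT col), COUNT(col)) for a one-column table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (col INTEGER)")
    conn.executemany("INSERT INTO t (col) VALUES (?)", [(v,) for v in values])
    return conn.execute(
        "SELECT COUNT(*), COUNT(DISTINCT col), COUNT(col) FROM t"
    ).fetchone()

if __name__ == "__main__":
    # Five rows, two distinct non-NULL values, three non-NULL values.
    print(count_variants([1, 1, 2, None, None]))
```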
Please provide your actual query and provide SHOW CREATE TABLE. I sloughed over several issues; "the devil is in the details".
For Edit 1:
INDEX(tool, uuuuId) -- would help performance
Is uuuuId some form of "hash" or "UUID"? If so, that is relevant to seeing how the performance works. Also, how big (approximately) are the tables? What is the value of innodb_buffer_pool_size? (I am fishing for whether you are I/O-bound or CPU-bound.)
WZ needs INDEX(uuuuId, ppppppId, check1). But actually, that SELECT ... = Yes can be turned into an EXISTS for some speedup.
Z might benefit from INDEX(check1, uuuuId, ppppppId, check2)
Since Z and WZ are the same table, this might take care of both:
INDEX(ppppppId, uuuuId, check1, check2)
(The order is important.)

Avoiding redundant expressions in SELECT query

Is there any way to avoid repeating column expressions in the SELECT query? I want to divide the sum of a column by its count, but would like to use the assigned aliases instead of repeating SUM(value)/COUNT(value) or using a subquery. Is this possible? If so, does that speed up the query by not repeating the calculation of the sum and count, or does MySQL remember already calculated expressions?
SELECT datalist.type, SUM(value) AS type_sum, COUNT(value) AS type_count, type_sum/type_count
FROM (...) AS datalist
GROUP BY datalist.type
throws: #1054 - Unknown column 'type_sum' in field list
Unless you move the division to an outer query, repeating the expressions is the only way:
SELECT datalist.type, SUM(value) AS type_sum, COUNT(value) AS type_count, SUM(value)/COUNT(value)
FROM (...) AS datalist
GROUP BY datalist.type
One workaround is to compute the aggregates in a derived table and then reference their aliases from the outer query, for example:
SELECT d.type, d.type_sum / d.type_count AS dividedValue
FROM (SELECT datalist.type, SUM(value) AS type_sum, COUNT(value) AS type_count
      FROM (...) AS datalist
      GROUP BY datalist.type) AS d;
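A runnable sketch of the derived-table workaround, with SQLite via Python's sqlite3 in place of MySQL and a hypothetical datalist table standing in for the elided (...) source. It also shows AVG(value), which averages only the non-NULL values and therefore gives the same ratio as SUM(value)/COUNT(value) in a single expression. The sample values are REAL because SQLite, unlike MySQL's / operator, performs integer division when both operands are integers.

```python
import sqlite3

def make_datalist():
    """Hypothetical stand-in for the elided (...) source of datalist."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE datalist (type TEXT, value REAL)")
    conn.executemany(
        "INSERT INTO datalist VALUES (?, ?)",
        [("x", 1.0), ("x", 2.0), ("x", None), ("y", 4.0)],
    )
    return conn

def ratio_via_derived_table(conn):
    # SUM and COUNT are computed once inside d; the outer query divides the aliases.
    sql = """
        SELECT d.type, d.type_sum, d.type_count,
               d.type_sum / d.type_count AS dividedValue
        FROM (SELECT type, SUM(value) AS type_sum, COUNT(value) AS type_count
              FROM datalist
              GROUP BY type) AS d
        ORDER BY d.type
    """
    return conn.execute(sql).fetchall()

def ratio_via_avg(conn):
    # AVG ignores NULLs, just like SUM(value) and COUNT(value) do.
    sql = "SELECT type, AVG(value) FROM datalist GROUP BY type ORDER BY type"
    return conn.execute(sql).fetchall()

if __name__ == "__main__":
    conn = make_datalist()
    print(ratio_via_derived_table(conn))
    print(ratio_via_avg(conn))
```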

mysql calculate percentage for different groups

I want to calculate a percentage for test groups.
I have groups A, B and C, and I want to know the success percentage of each group.
My first query counts the total number of test runs in each group:
SELECT type, count(type) as total_runs
From mytable
Where ran_at > '2015-09-11'
Group by type
The second query counts the successes for each group:
SELECT type, count(type) as success
FROM mytable
where run_status like '%success%' and ran_at> '2015-09-11'
Group by type
Now I need to divide one by the other and multiply by 100.
How do I do this efficiently in one query? I guess a nested query is not very efficient, but in any case I can't see how to use a nested query to solve it.
I would appreciate an answer that includes a simple (maybe not so efficient) way and an efficient way, with explanations.
You can just use conditional aggregation:
SELECT type, sum(run_status like '%success%') as success,
100 * avg(run_status like '%success%') as p_success
FROM mytable
where ran_at> '2015-09-11'
Group by type;
In a numeric context, MySQL treats boolean expressions as integers, with 1 for true and 0 for false. The above works assuming that run_status is not NULL. If it can be NULL, then you need an explicit CASE expression inside the avg().
I had this one, but Gordon has a better solution if run_status is not NULL.
Select type, sum(if(run_status like '%success%', 1, 0)) / count(1) * 100 as p_success
From mytable
Where ran_at > '2015-09-11'
Group by type
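A runnable sketch comparing the bare boolean average with an explicit CASE expression, using SQLite via Python's sqlite3 as a stand-in for MySQL (SQLite, like MySQL, evaluates a comparison or LIKE to 1, 0 or NULL). The rows are hypothetical; group B includes a NULL run_status to illustrate the NULL caveat: the bare LIKE yields NULL for that row, AVG() skips it, and B is reported as 100% successful, whereas the CASE version counts it as not successful.

```python
import sqlite3

def make_runs():
    """Hypothetical test-run rows for groups A and B."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (type TEXT, run_status TEXT, ran_at TEXT)")
    conn.executemany(
        "INSERT INTO mytable VALUES (?, ?, ?)",
        [
            ("A", "success", "2015-09-12"),
            ("A", "failed", "2015-09-12"),
            ("A", "success", "2015-09-13"),
            ("A", "failed", "2015-09-01"),  # before the cutoff, excluded
            ("B", "success", "2015-09-12"),
            ("B", None, "2015-09-14"),      # NULL status
        ],
    )
    return conn

def naive_rates(conn):
    # Bare boolean: a NULL run_status gives NULL, which AVG() silently skips.
    sql = """
        SELECT type, 100.0 * AVG(run_status LIKE '%success%') AS p_success
        FROM mytable
        WHERE ran_at > '2015-09-11'
        GROUP BY type
        ORDER BY type
    """
    return conn.execute(sql).fetchall()

def success_rates(conn):
    # Explicit CASE: a NULL run_status falls into ELSE 0 and counts as not successful.
    sql = """
        SELECT type,
               SUM(CASE WHEN run_status LIKE '%success%' THEN 1 ELSE 0 END) AS success,
               COUNT(*) AS total_runs,
               100.0 * AVG(CASE WHEN run_status LIKE '%success%' THEN 1 ELSE 0 END)
                   AS p_success
        FROM mytable
        WHERE ran_at > '2015-09-11'
        GROUP BY type
        ORDER BY type
    """
    return conn.execute(sql).fetchall()

if __name__ == "__main__":
    conn = make_runs()
    print(naive_rates(conn))
    print(success_rates(conn))
```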

SQL Assign "co-efficients" to query conditions and use them to sort result

I saw a query once that assigned some kind of ranking to query conditions; I can't remember it now.
The way I understood it, I think variable names (s1, s2, ...) were assigned to each of the conditions with a coefficient to give them different "weights", and then the sum of the variables was used to sort the result.
It looked something like this:
SELECT
*
FROM
table_name
WHERE condition1='value1' as (s1*3)
OR condition2='value2' as (s2*2)
OR condition3='value3' as (s3*1)
ORDER BY (s1+s2+s3)
So the different numbers give the conditions varying degrees of importance in the ORDER BY, which makes it perfect for a related product/post search.
Please, does anyone know the right structure for this query?
In MySQL, you would define the aliases in the SELECT clause and then use them in the ORDER BY. For instance:
SELECT t.*, (condition1 = 'value1') as s1,
       (condition2 = 'value2') as s2,
       (condition3 = 'value3') as s3
FROM table_name t
ORDER BY (s1*3 + s2*2 + s3*1);
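A runnable sketch of the same idea, using SQLite via Python's sqlite3 as a stand-in for MySQL (both evaluate a comparison to 1 or 0). The table name, its columns (category, brand, color) and the weights are hypothetical. As a variant of the answer above, the weighted sum is folded into a single aliased score column, and the sort is descending on the assumption that rows matching the heavier conditions should come first; t.id only breaks ties.

```python
import sqlite3

def make_products():
    """Hypothetical table with three columns to match against."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table_name (id INTEGER PRIMARY KEY, category TEXT, brand TEXT, color TEXT)"
    )
    conn.executemany(
        "INSERT INTO table_name VALUES (?, ?, ?, ?)",
        [
            (1, "shoes", "acme", "red"),
            (2, "shoes", "other", "blue"),
            (3, "hats", "acme", "red"),
            (4, "hats", "other", "red"),
        ],
    )
    return conn

def ranked(conn):
    # Each comparison evaluates to 1 or 0; multiplying by its weight and summing
    # gives a relevance score that ORDER BY references by its alias.
    sql = """
        SELECT t.id,
               (t.category = 'shoes') * 3
             + (t.brand = 'acme') * 2
             + (t.color = 'red') * 1 AS score
        FROM table_name AS t
        ORDER BY score DESC, t.id
    """
    return conn.execute(sql).fetchall()

if __name__ == "__main__":
    print(ranked(make_products()))
```

If any of the compared columns can be NULL, that comparison yields NULL and so does the whole score, so each term would need COALESCE(..., 0).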