Using AS value later on in query - mysql

Consider the following example query:
SELECT foo.bar,
DATEDIFF(
# Some more advanced logic, such as IF(,,), which shouldn't be copy pasted
) as bazValue
FROM foo
WHERE bazValue >= CURDATE() # <-- This doesn't work
How can I make the bazValue available later on in the query? I'd prefer this, since I believe that it's enough to maintain the code in one place if possible.

There are a couple of ways around this problem that you can use in MySQL:
By using an inline view (this should work in most other versions of SQL, too):
select * from
(SELECT foo.bar,
DATEDIFF(
# Some more advanced logic, such as IF(,,), which shouldn't be copy pasted
) as bazValue
FROM foo) buz
WHERE bazValue >= CURDATE()
By using a HAVING clause (using column aliases in HAVING clauses is specific to MySQL):
SELECT foo.bar,
DATEDIFF(
# Some more advanced logic, such as IF(,,), which shouldn't be copy pasted
) as bazValue
FROM foo
HAVING bazValue >= CURDATE()

As documented under Problems with Column Aliases:
Standard SQL disallows references to column aliases in a WHERE clause. This restriction is imposed because when the WHERE clause is evaluated, the column value may not yet have been determined. For example, the following query is illegal:
SELECT id, COUNT(*) AS cnt FROM tbl_name
WHERE cnt > 0 GROUP BY id;
The WHERE clause determines which rows should be included in the GROUP BY clause, but it refers to the alias of a column value that is not known until after the rows have been selected, and grouped by the GROUP BY.
You can however reuse the aliased expression, and if it uses deterministic functions the query optimiser will ensure that cached results are reused:
SELECT foo.bar,
DATEDIFF(
-- your arguments
) as bazValue
FROM foo
WHERE DATEDIFF(
-- your arguments
) >= CURDATE()
Alternatively, you can move the filter into a HAVING clause (where aliased columns will already have been calculated and are therefore available) - but performance will suffer as indexes cannot be used and the filter will not be applied until after results have been compiled.

As MySQL versions before 8.0 don't support CTEs, consider using an inline view:
SELECT foo.bar
FROM foo,
(SELECT DATEDIFF(
# Some more advanced logic, such as IF(,,), which shouldn't be copy pasted
) as bazValue
) AS iv
WHERE iv.bazValue >= CURDATE()
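MySQL 8.0 added common table expressions, so the same "compute once, filter on the alias" idea can also be written with WITH. The sketch below is only an illustration: it runs against an in-memory SQLite database through Python's sqlite3 module as a stand-in for MySQL, the columns start_date and end_date and the sample rows are hypothetical, and julianday() takes the place of DATEDIFF().

```python
import sqlite3

def rows_with_min_span(min_days):
    """Return (bar, bazValue) rows whose computed span is at least min_days."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: the original question does not show foo's columns.
    conn.execute("CREATE TABLE foo (bar TEXT, start_date TEXT, end_date TEXT)")
    conn.executemany(
        "INSERT INTO foo VALUES (?, ?, ?)",
        [
            ("a", "2020-01-01", "2020-01-03"),
            ("b", "2020-01-01", "2020-01-15"),
            ("c", "2020-01-01", "2020-02-01"),
        ],
    )
    query = """
        WITH buz AS (
            SELECT bar,
                   -- stand-in for DATEDIFF(end_date, start_date)
                   CAST(julianday(end_date) - julianday(start_date) AS INTEGER)
                       AS bazValue
            FROM foo
        )
        SELECT bar, bazValue
        FROM buz
        WHERE bazValue >= ?   -- the alias is visible here: new scope
        ORDER BY bar
    """
    result = conn.execute(query, (min_days,)).fetchall()
    conn.close()
    return result

print(rows_with_min_span(7))
```

The expression is written once inside the CTE, and the outer query filters on its alias exactly as the inline-view version does.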

Related

MySQL: Optimize left join on formatted date

I'm trying to optimize the speed of this query:
SELECT t.t_date td, v.visit_date vd
FROM temp_dates t
LEFT JOIN visits v ON DATE_FORMAT(v.visit_date, '%Y-%m-%d') = t.t_date
ORDER BY t.t_date
v.visit_date is of type DATETIME and t.t_date is a string of format '%Y-%m-%d'.
Simply creating an index on v.visit_date didn't improve the speed. Therefore I intended to try the solution @oysteing gave here:
How to optimize mysql group by with DATE_FORMAT
I successfully created a virtual column by this SQL
ALTER TABLE visits ADD COLUMN datestr varchar(10) AS (DATE_FORMAT(visit_date, '%Y-%m-%d')) VIRTUAL;
However when I try to create an index on this column by
CREATE INDEX idx_visit_date on visits(datestr) I get this error:
#1901 - Function or expression 'date_format()' cannot be used in the GENERATED ALWAYS AS clause of datestr
What am I doing wrong? My DB is Maria DB 10.4.8
Best regards - Ulrich
date_format() cannot be used for persistent generated columns either. And in an index it cannot be just virtual, it has to be persisted.
I could not find an explicit statement in the manual, but I believe this is because the output of date_format() can depend on the locale and is therefore not strictly deterministic.
Instead of date_format() you can build the string using deterministic functions such as concat(), year(), month(), day() and lpad().
...
datestr varchar(10) AS (concat(year(visit_date),
'-',
lpad(month(visit_date), 2, '0'),
'-',
lpad(day(visit_date), 2, '0')))
...
But as I already mentioned in a comment, you're fixing the wrong end. Dates/times should never be stored as strings. So you should rather promote temp_dates.t_date to a date and use date() to extract the date portion of visit_date in the generated, indexed column
...
visit_date_date date AS (date(visit_date))
...
You might also want to index temp_dates.t_date.
Does this work for you?
SELECT t.t_date td, v.visit_date vd
FROM temp_dates t
LEFT JOIN visits v ON DATE(v.visit_date) = DATE(t.t_date)
ORDER BY t.t_date
If so, there's a workable solution to your problem:
Add a DATE column using the deterministic DATE() function on your visit_date object. Like this.
ALTER TABLE visits ADD COLUMN dateval DATE AS (DATE(visit_date)) VIRTUAL;
CREATE INDEX idx_visit_date on visits(dateval);
Then create a virtual column in the other table (the one with the nicely formatted dates jammed into your VARCHAR() column).
ALTER TABLE temp_dates ADD COLUMN dateval DATE AS (DATE(t_date)) VIRTUAL;
CREATE INDEX idx_temp_dates_date on temp_dates (dateval);
This works because DATE() is deterministic, unlike DATE_FORMAT().
Then your query should be.
SELECT t.t_date td, v.visit_date vd
FROM temp_dates t
LEFT JOIN visits v ON v.dateval = t.dateval
ORDER BY t.t_date
This solution gives you indexes on (virtual) DATE columns. That's nice because index matching on such columns is efficient.
But your best solution is to change the datatype of temp_dates.t_date from VARCHAR() to DATE.
DATE_FORMAT(expr, format) cannot be used in virtual columns as it depends on the locale of the connection (MariaDB issue MDEV-11553).
A three-argument form of DATE_FORMAT was added that takes the locale explicitly.
DATE_FORMAT(visit_date, '%Y-%m-%d', 'en_US') is possible to use in virtual column expressions in MariaDB-10.3+ stable versions.
Using DATE or altering your query not to use functions around column expressions is definitely recommended.
Function calls wrapped around a column are not "sargable": they prevent the optimizer from using an index on that column.
Consider:
ON v.visit_date >= t.t_date
AND v.visit_date < t.t_date + INTERVAL 1 DAY
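As a rough illustration of the range condition above, the sketch below runs the same half-open join against an in-memory SQLite database via Python's sqlite3 module. SQLite is only a stand-in for MariaDB here: dates are stored as text, date(t.t_date, '+1 day') replaces t.t_date + INTERVAL 1 DAY, and the sample rows are made up.

```python
import sqlite3

def join_visits_by_day():
    """Left-join each date to the visits falling on that calendar day."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE temp_dates (t_date TEXT);
        CREATE TABLE visits (visit_date TEXT);
        -- A plain index on the raw column; the join below never wraps
        -- visit_date in a function, so such an index stays usable.
        CREATE INDEX idx_visit_date ON visits (visit_date);
        INSERT INTO temp_dates VALUES
            ('2019-10-01'), ('2019-10-02'), ('2019-10-03');
        INSERT INTO visits VALUES
            ('2019-10-01 09:30:00'),
            ('2019-10-01 23:59:59'),
            ('2019-10-03 00:00:00');
    """)
    rows = conn.execute("""
        SELECT t.t_date AS td, v.visit_date AS vd
        FROM temp_dates t
        LEFT JOIN visits v
               ON v.visit_date >= t.t_date
              AND v.visit_date <  date(t.t_date, '+1 day')
        ORDER BY t.t_date, v.visit_date
    """).fetchall()
    conn.close()
    return rows

for row in join_visits_by_day():
    print(row)
```

Each visit matches exactly one day, and a day without visits still appears once with a NULL visit_date, which is the behaviour the original DATE_FORMAT() join was after.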

Reuse variables in sql query

Is there a way to reuse a value defined with AS, so that the expression doesn't have to be rewritten every time it is needed?
Example
I have such line:
IFNULL((CASE WHEN (((c.gaji_pokok+c.tunjangan+q.jkk+q.jkm)*12)+('c.komisi' * 'c.thr') >= 6000000) then ((c.gaji_pokok+c.tunjangan+q.jkk+q.jkm)*12)+('c.komisi' * 'c.thr')*0.05 ELSE 6000000 end),0) as b_jabatan,
and the next time I need such command I just want to use b_jabatan instead of writing whole line
example
IFNULL(b_jabatan,0) as brutto_tahun,
is that possible?
You can't reuse an alias in the same scope, apart from the ORDER BY clause (and, in MySQL, the GROUP BY and HAVING clauses).
You either need to repeat the expression, or use a subquery or CTE (which creates a new scope). So:
select t.*, coalesce(b_jabatan,0) as brutto_tahun, ...
from (
select
< your super long expression > as b_jabatan,
...
from ...
) t
Use the least() function:
coalesce(least( ((c.gaji_pokok+c.tunjangan+q.jkk+q.jkm)*12)+(c.komisi * c.thr)*0.05, 6000000
), 0
) as b_jabatan,

Can't Set User-defined Variable From MySQL to Excel With ODBC

I have a query that has a user-defined variable set on top of the main query. It's something like this.
set @from_date = '2019-10-01', @end_date = '2019-12-31';
SELECT * FROM myTable
WHERE create_time between @from_date AND @end_date;
It works just fine when I executed it in MySQL Workbench, but when I put it to Excel via MySQL ODBC it always shows an error like this.
I need that user-defined variable to work in Excel. What am I supposed to change in my query?
The ODBC connector is most likely communicating with MySQL via statements or prepared statements, one statement at a time, and user variables are not supported. A prepared statement would be one way you could bind your date literals. Another option here, given your example, would be to just inline the date literals:
SELECT *
FROM myTable
WHERE create_time >= '2019-10-01' AND create_time < '2020-01-01';
Side note: I expressed the check on create_time, which seems intended to cover the final quarter of 2019, using inequalities. The reason is that if create_time is a timestamp/datetime, then using BETWEEN with 31-December on the RHS would only include that day at midnight, and no time after it.
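The prepared-statement option mentioned above could look roughly like the following, where the two date bounds are bound as parameters instead of user variables. This is only a sketch: it uses Python's sqlite3 module and an in-memory SQLite table as a stand-in for the MySQL/ODBC setup, and the sample rows are invented.

```python
import sqlite3

def rows_in_range(conn, from_date, end_exclusive):
    """Select rows with from_date <= create_time < end_exclusive."""
    query = """
        SELECT create_time
        FROM myTable
        WHERE create_time >= ? AND create_time < ?
        ORDER BY create_time
    """
    # The bounds are bound as parameters; no user variables are needed.
    return [row[0] for row in conn.execute(query, (from_date, end_exclusive))]

def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE myTable (create_time TEXT)")
    conn.executemany(
        "INSERT INTO myTable VALUES (?)",
        [
            ("2019-09-30 23:59:59",),
            ("2019-10-01 00:00:00",),
            ("2019-12-31 15:45:00",),
            ("2020-01-01 00:00:00",),
        ],
    )
    result = rows_in_range(conn, "2019-10-01", "2020-01-01")
    conn.close()
    return result

print(demo())
```

The half-open range keeps the row from the afternoon of 31 December and leaves out the first instant of 2020.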
Use subquery for variables values init:
SELECT *
FROM myTable,
( SELECT @from_date := '2019-10-01', @end_date := '2019-12-31' ) init_vars
WHERE create_time between @from_date AND @end_date;
Pay attention:
SELECT is used, not SET;
The assignment operator := is used; the = operator would be treated as a comparison here, giving a wrong result;
Alias (init_vars) may be any, but it is compulsory.
Variable is initialized once but may be used a lot of times.
If your query is complex (nested), the variables must be initialized in the innermost subquery, or in the first CTE if your DBMS version supports CTEs. If the query is so complex that you cannot tell which subquery is the innermost, check the execution plan to see which table is scanned first and initialize the variables in its subquery (but remember that the execution plan may change at any moment).

How to count when value crosses the average

I am trying to write a MySQL query that would count the number of times a value crosses a constant. The end result is that we are trying to determine the relative 'noise' of the value via the amplitude and the frequency of the value. MIN() and MAX() provide the amplitude. Count() gives the number of samples that fit the criteria, but it doesn't tell us how stable that value is. We are currently using MySQL 5.7, but we will be moving to MySQL 8.0, which provides the windowing features. Something like
Select Count(Value) over (order by logtime ROWS 1 Preceding <123 AND 1 Following > 123) WHERE logtime BETWEEN...;
Thank you for any help you can provide.
SELECT Count(Value) WHERE Value > 123 AND logtime BETWEEN...;
SELECT Count(Value) WHERE Value < 123 AND logtime BETWEEN...;
Window functions are not available in MySQL versions before 8.0.
With MySQL 5.7, we can emulate some window functions by using user-defined variables in a carefully crafted query. The MySQL Reference Manual gives explicit warning about using user-defined variables in a context like this. We are relying on behavior that is not guaranteed.
But as an example of the pattern I would use to achieve the specified result:
SELECT SUM(c.crossed_avg) AS count_crossed_avg
FROM (
SELECT IF( ( @prval > a.avg_ AND t.value < a.avg_ ) OR
( @prval < a.avg_ AND t.value > a.avg_ )
,1,0) AS crossed_avg
, @prval := t.value AS value_
FROM mytable t
CROSS
JOIN ( SELECT 123 AS avg_ ) a
CROSS
JOIN ( SELECT @prval := NULL ) i
WHERE ...
ORDER BY t.logtime
) c
To unpack this, focus first on the inline view query; that is, ignore the SELECT SUM() wrapper query, and run just the inline view query.
We order the rows by logtime so that we can process the rows in order.
We compare the value on the current row to the value from the previous row. If one is above average and the other is below average, then we return a 1, else we return 0.
Save the current value into the user-defined variable for comparing the next row. (Note: the order of operations is important; we are depending on MySQL to do that assignment after the evaluation of the IF() function.)
The example query doesn't address the edge case when a row value is exactly equal to the average, e.g. a sequence of values 124.4, 123.0, 122.2. (We might want to consider changing the comparisons so that one includes the equality, e.g. < and >=.)
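Once window functions are available (MySQL 8.0), the previous row's value can be fetched with LAG() instead of a user-defined variable. The sketch below is an assumption-laden illustration rather than a tested MySQL query: it runs on an in-memory SQLite database through Python's sqlite3 module (SQLite 3.25 or newer is needed for window functions), the sample data is invented, and it keeps the same strict comparisons, so values exactly equal to the threshold are not counted as crossings.

```python
import sqlite3

def count_crossings(samples, threshold):
    """Count how often consecutive values (ordered by logtime) straddle threshold."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (logtime INTEGER, value REAL)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?)", samples)
    query = """
        SELECT COALESCE(SUM(
                   CASE WHEN (prev_value > :avg AND value < :avg)
                          OR (prev_value < :avg AND value > :avg)
                        THEN 1 ELSE 0 END), 0)
        FROM (
            SELECT value,
                   -- value from the previous row in logtime order;
                   -- NULL on the first row, so no crossing is counted there
                   LAG(value) OVER (ORDER BY logtime) AS prev_value
            FROM mytable
        ) AS w
    """
    (count,) = conn.execute(query, {"avg": threshold}).fetchone()
    conn.close()
    return count

samples = [(1, 120.0), (2, 125.0), (3, 124.0), (4, 121.0), (5, 126.0)]
print(count_crossings(samples, 123))
```

Because LAG() has a defined evaluation order, this version does not rely on the unspecified assignment order that the user-variable approach depends on.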

HIVE Sum Over query

I'm trying to convert a query in Teradata to HIVE QL (HDF) and have struggled to find examples.
Teradata (my functional end goal) - I want a count of records in the table, then a count for each growth_type_id value, and ultimately the percentage of the total that each group represents.
select trim(growth_type_id) AS VAL, COUNT(1) AS cnt, SUM(cnt) over () as GRP_CNT,CNT/(GRP_CNT* 1.0000) AS perc
from acdw_apex_account_strategy
qualify perc > .01 group by val
Note: running HDP-2.4.3.0-227
select val
,cnt
,grp_cnt
,cnt/(grp_cnt* 1.0000) as perc
from (select trim(growth_type_id) as val
,count(*) as cnt
,sum(count(*)) over () as grp_cnt
from acdw_apex_account_strategy
group by trim(growth_type_id)
) t
where cnt/grp_cnt > 0.01
;
QUALIFY is unique to Teradata.
Using aliases everywhere in the query is unique to Teradata.
Grouping by columns positions is parameter dependent - hive.groupby.orderby.position.alias
Grouping by aliases is not supported - https://issues.apache.org/jira/browse/HIVE-1683
Hive doesn't use integer arithmetic (e.g. 7/4 = 1.75 and not 1 as in Teradata)
Decimal notation without preceding digit(s) is not valid (write 0.01, not .01)
P.s.
You are using QUALIFY before the GROUP BY. Although Teradata syntax is flexible and the only requirement is that the SELECT/WITH clause is positioned first, I strongly recommend keeping the standard order of clauses:
WITH - SELECT - FROM - WHERE - GROUP BY - HAVING - ORDER BY
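The pattern used in the Hive query (aggregate in a derived table, then filter on the computed share in the outer query instead of using QUALIFY) can be tried out on a small data set. The sketch below uses Python's sqlite3 module with an in-memory SQLite table as a stand-in for Hive, so details differ: the table name acct and the sample values are hypothetical, the window sum is applied in a second derived table, and the explicit * 1.0 is there because SQLite, unlike Hive, does integer division.

```python
import sqlite3

def group_shares(values, min_share):
    """Return (val, cnt, grp_cnt, perc) for groups whose share exceeds min_share."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE acct (growth_type_id TEXT)")
    conn.executemany("INSERT INTO acct VALUES (?)", [(v,) for v in values])
    query = """
        SELECT val, cnt, grp_cnt, cnt * 1.0 / grp_cnt AS perc
        FROM (
            SELECT val, cnt,
                   SUM(cnt) OVER () AS grp_cnt   -- total across all groups
            FROM (
                SELECT TRIM(growth_type_id) AS val, COUNT(*) AS cnt
                FROM acct
                GROUP BY TRIM(growth_type_id)    -- expression repeated, no alias
            ) AS g
        ) AS t
        WHERE cnt * 1.0 / grp_cnt > ?            -- plays the role of QUALIFY
        ORDER BY val
    """
    rows = conn.execute(query, (min_share,)).fetchall()
    conn.close()
    return rows

print(group_shares(["A ", "A", "B", "A", "C"], 0.25))
```

Only the group whose share of all rows exceeds the threshold is returned, together with its count and the overall total.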