How to replace a null value with previous non-null value? - mysql

I want to replace a null value with previous non-null value using MySQL.
I tried this:
SELECT
`Date_Column`
,CASE
WHEN `Value_Column` is null
THEN (
SELECT
`Value_Column`
FROM
table_name t2
WHERE
`Date_Column` = (
SELECT
MAX(`Date_Column`)
FROM
table_name t3
WHERE
`Date_Column` < t1.`Date_Column`
AND `Value_Column` > 0
)
)
ELSE `Value_Column`
END AS `Value_Column`
FROM
table_name t1
This works but takes really long for big datasets.
I tried this for a subset of data and it worked.
Is there an easier/more efficient way to achieve the same?
Thanks.

Based on your query, the first thing to check is that your table is correctly indexed on the date column (ideally with the value column included, making it a covering index). If it is, then you could use the following slightly simplified query.
Note that this replaces NULL, as per your description, whereas yours replaced 0, contrary to your description. You should be clear as to exactly which behaviour you want.
SELECT
date_column,
COALESCE(
value_column,
(
SELECT lookup.value_column
FROM table_name AS lookup
WHERE lookup.date_column < table_name.date_column
AND lookup.value_column IS NOT NULL
AND table_name.value_column IS NULL
ORDER BY lookup.date_column DESC
LIMIT 1
)
) AS value_column
FROM
table_name
(on my phone, so please forgive typos)
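To see the COALESCE pattern on a few rows, here is a minimal sketch that runs the same kind of query against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for MySQL here, and the sample rows, the `cur` alias, and the function name `fill_forward` are made up for illustration.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the thread.
ROWS = [
    ("2024-01-01", None),  # leading NULL: there is no earlier value to copy
    ("2024-01-02", 10),
    ("2024-01-03", None),
    ("2024-01-04", None),
    ("2024-01-05", 7),
    ("2024-01-06", None),
]

FILL_QUERY = """
SELECT
    cur.date_column,
    COALESCE(
        cur.value_column,
        (
            SELECT lookup.value_column
            FROM table_name AS lookup
            WHERE lookup.date_column < cur.date_column
              AND lookup.value_column IS NOT NULL
            ORDER BY lookup.date_column DESC
            LIMIT 1
        )
    ) AS value_column
FROM table_name AS cur
ORDER BY cur.date_column
"""


def fill_forward(rows):
    """Load rows into an in-memory table and run the fill query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table_name (date_column TEXT PRIMARY KEY, value_column INTEGER)"
    )
    conn.executemany("INSERT INTO table_name VALUES (?, ?)", rows)
    result = conn.execute(FILL_QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in fill_forward(ROWS):
        print(row)
```

Each NULL picks up the closest earlier non-NULL value, and a NULL with nothing before it stays NULL.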

You can simplify the query to:
SELECT t1.Date_Column,
(CASE WHEN t1.Value_Column IS NULL
THEN (SELECT t2.Value_Column
FROM table_name t2
WHERE t2.Date_Column < t1.Date_Column AND t2.Value_Column > 0
ORDER BY t2.Date_Column DESC
LIMIT 1
)
ELSE t1.Value_Column
END) AS Value_Column
FROM table_name t1;
This is an improvement on your query because it removes the second nested subquery. But it will still be slow. An index on table_name(Date_Column, Value_Column) might help.
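If window functions are available (MySQL 8.0 and later), a different technique avoids the per-row subquery altogether. This is not the method from the answers above, just a sketch of the common "running count" trick: COUNT(value_column) over the date order only increases on non-NULL rows, so each non-NULL row and the NULL rows after it share a group number, and MAX within that group returns the single non-NULL value. The sketch runs on SQLite (3.25 or later) via Python's sqlite3 as a stand-in for MySQL, and the sample rows and the name `fill_forward_window` are hypothetical. Whether it is actually faster on your data is something to measure.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the thread.
ROWS = [
    ("2024-01-01", None),
    ("2024-01-02", 10),
    ("2024-01-03", None),
    ("2024-01-04", None),
    ("2024-01-05", 7),
    ("2024-01-06", None),
]

WINDOW_QUERY = """
SELECT
    date_column,
    MAX(value_column) OVER (PARTITION BY grp) AS value_column
FROM (
    SELECT
        date_column,
        value_column,
        -- COUNT skips NULLs, so grp only increases on non-NULL rows
        COUNT(value_column) OVER (ORDER BY date_column) AS grp
    FROM table_name
) AS grouped
ORDER BY date_column
"""


def fill_forward_window(rows):
    """Fill NULLs with the previous non-NULL value using window functions."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table_name (date_column TEXT PRIMARY KEY, value_column INTEGER)"
    )
    conn.executemany("INSERT INTO table_name VALUES (?, ?)", rows)
    result = conn.execute(WINDOW_QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in fill_forward_window(ROWS):
        print(row)
```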

Related

MySql is null vs is not null performance

I have a query where I am basically doing a left outer join and checking if the joined value is null
select count(T1.code)
from ( select code
from asset
where type = 'meter'
and creation_time <= '2022-04-29 00:00:00'
and (deactivation_time > '2022-04-28 00:00:00' or deactivation_time is null )
group by code
) as T1
left join ( select asset_code
from amr_midnight_data
where server_time between '2022-04-28 00:00:00' and '2022-04-29 00:00:00'
group by asset_code
) as T2 on T1.code = T2.asset_code
Where T2.asset_code is null;
This query takes 3 seconds to execute, but if I replace the is null at the end with is not null, it takes less than a second. Why is there a performance difference here, and what alternatives do I have to make my original query faster?
Look at the EXPLAIN. A guess... Changing to IS NOT NULL lets the Optimizer change LEFT JOIN to JOIN, which lets it start with amr_midnight_data which might optimize better.
I think that the LEFT JOIN ( SELECT ... ) .. IS [NOT] NULL can be replaced with
WHERE [NOT] EXISTS ( SELECT 1 FROM amr_midnight_data
WHERE asset_code = T1.code
AND server_time >= '2022-04-28'
AND server_time < '2022-04-28' + INTERVAL 1 DAY )
That would benefit from INDEX(asset_code, server_time)
EXISTS is faster than SELECT .. GROUP BY because it can stop as soon as one matching row is found.
asset would probably benefit from INDEX(type, creation_time) or (to make it "covering"):
INDEX(type, creation_time, deactivation_time, code)
If you wish to discuss further, please provide SHOW CREATE TABLE for both tables and EXPLAIN for each SELECT.
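As a sanity check that the two formulations return the same count, here is a small sketch on made-up data. It uses SQLite through Python's sqlite3 as a stand-in for MySQL, drops the creation_time and deactivation_time filters to keep the example short, and writes the end of the day as a literal date instead of `+ INTERVAL 1 DAY`. The sample rows and the name `run_count` are assumptions for illustration. It says nothing about performance, only about equivalence.

```python
import sqlite3

# Hypothetical, simplified schema and data: asset A has readings on
# 2022-04-28, B has none at all, and C only has a reading from the day before.
SETUP = """
CREATE TABLE asset (code TEXT, type TEXT);
CREATE TABLE amr_midnight_data (asset_code TEXT, server_time TEXT);
INSERT INTO asset VALUES ('A', 'meter'), ('B', 'meter'), ('C', 'meter');
INSERT INTO amr_midnight_data VALUES
    ('A', '2022-04-28 00:10:00'),
    ('A', '2022-04-28 12:00:00'),
    ('C', '2022-04-27 23:59:00');
"""

LEFT_JOIN_QUERY = """
SELECT COUNT(T1.code)
FROM (SELECT code FROM asset WHERE type = 'meter' GROUP BY code) AS T1
LEFT JOIN (
    SELECT asset_code
    FROM amr_midnight_data
    WHERE server_time >= '2022-04-28' AND server_time < '2022-04-29'
    GROUP BY asset_code
) AS T2 ON T1.code = T2.asset_code
WHERE T2.asset_code IS NULL
"""

NOT_EXISTS_QUERY = """
SELECT COUNT(*)
FROM (SELECT code FROM asset WHERE type = 'meter' GROUP BY code) AS T1
WHERE NOT EXISTS (
    SELECT 1
    FROM amr_midnight_data
    WHERE asset_code = T1.code
      AND server_time >= '2022-04-28'
      AND server_time < '2022-04-29'
)
"""


def run_count(query):
    """Build the sample tables and return the single count the query yields."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    (count,) = conn.execute(query).fetchone()
    conn.close()
    return count


if __name__ == "__main__":
    print(run_count(LEFT_JOIN_QUERY), run_count(NOT_EXISTS_QUERY))
```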

How to write a new select query with backward compatibility after adding a new column to a MySQL database

I have a MySQL table running for 4 months and I have a select statement in that table, like below.
SELECT
CONCAT(
YEAR(FROM_UNIXTIME(creation_time)),
'-',
IF(
MONTH(FROM_UNIXTIME(creation_time)) < 10,
CONCAT('0', MONTH(FROM_UNIXTIME(creation_time))),
MONTH(FROM_UNIXTIME(creation_time))
)
) AS Period,
(
COUNT(CASE
WHEN system_name = 'System' THEN 1
ELSE NULL
END)
) AS "Some data"
FROM table_name
GROUP BY
Period
ORDER BY
Period DESC
Lately, I've added a new feature and a column, let's say is_rerun. This column was just added and did not exist previously. Now I would like to extend the current statement so that it checks system_name and also the is_rerun field: if this field exists and its value is 1, then return 1, and if the column does not exist or its value is zero, then return null.
I tried IF EXISTS is_rerun THEN 1 ELSE NULL, but no luck. I could also insert values for the previous runs, but I don't want to do that. Is there any solution? Thanks.
SELECT
CONCAT(
YEAR(FROM_UNIXTIME(creation_time)),
'-',
IF(
MONTH(FROM_UNIXTIME(creation_time)) < 10,
CONCAT('0', MONTH(FROM_UNIXTIME(creation_time))),
MONTH(FROM_UNIXTIME(creation_time))
)
) AS Period,
(
COUNT(CASE
WHEN system_name = 'System' AND IF EXISTS is_rerun THEN 1
ELSE NULL
END)
) AS "Some data",
FROM table_name
GROUP BY
Period
ORDER BY
Period DESC
As a starter: you have a group by query, so you need to put is_rerun in an aggregate function.
Based on your description, I think that something like max(case when is_rerun = 1 then 1 end) should do the work: it returns 1 if any is_rerun in the group is 1, else null.
Or if you can live with 0 instead of null, then you can use a simpler expression: max(is_rerun = 1).
Note that your query could be largely simplified, both in the date formatting logic and in the conditional count. I would phrase it as:
select
date_format(from_unixtime(creation_time),'%Y-%m') period,
sum(system_name = 'System') some_data,
max(is_rerun = 1) is_rerun
from mytable
group by period
order by period desc
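To make the difference between the two aggregate expressions concrete, here is a small sketch on made-up rows, run on SQLite via Python's sqlite3 as a stand-in for MySQL. To stay portable it stores a precomputed `period` column instead of calling date_format(from_unixtime(...)); that column, the sample rows, and the name `summarize` are assumptions for illustration. Old rows have NULL in is_rerun, as they would after adding the column without backfilling.

```python
import sqlite3

# Hypothetical rows: (period, system_name, is_rerun).
ROWS = [
    ("2024-02", "System", 1),
    ("2024-02", "System", 0),
    ("2024-02", "Other", None),
    ("2024-01", "System", None),  # row written before is_rerun existed
    ("2024-01", "System", 0),
]

SUMMARY_QUERY = """
SELECT
    period,
    SUM(system_name = 'System')            AS some_data,
    MAX(CASE WHEN is_rerun = 1 THEN 1 END) AS rerun_or_null,
    MAX(is_rerun = 1)                      AS rerun_or_zero
FROM mytable
GROUP BY period
ORDER BY period DESC
"""


def summarize(rows):
    """Return one summary row per period, newest period first."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (period TEXT, system_name TEXT, is_rerun INTEGER)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    result = conn.execute(SUMMARY_QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in summarize(ROWS):
        print(row)
```

For the period with no rerun, the CASE form yields NULL while the comparison form yields 0, which is the trade-off described above.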

Simplify CASE expression used multiple times

For readability, I would like to modify the below statement. Is there a way to extract the CASE statement, so I can use it multiple times without having to write it out every time?
select
mturk_worker.notes,
worker_id,
count(worker_id) answers,
count(episode_has_accepted_imdb_url) scored,
sum( case when isnull(imdb_url) and isnull(accepted_imdb_url) then 1
when imdb_url = accepted_imdb_url then 1
else 0 end ) correct,
100 * ( sum( case when isnull(imdb_url) and isnull(accepted_imdb_url) then 1
when imdb_url = accepted_imdb_url then 1
else 0 end)
/ count(episode_has_accepted_imdb_url) ) percentage
from
mturk_completion
inner join mturk_worker using (worker_id)
where
timestamp > '2015-02-01'
group by
worker_id
order by
percentage desc,
correct desc
You can actually eliminate the case statements. MySQL will interpret boolean expressions as integers in a numeric context (with 1 being true and 0 being false):
select mturk_worker.notes, worker_id, count(worker_id) answers,
count(episode_has_accepted_imdb_url) scored,
sum(imdb_url = accepted_imdb_url or imdb_url is null and accepted_imdb_url is null) as correct,
(100 * sum(imdb_url = accepted_imdb_url or imdb_url is null and accepted_imdb_url is null) / count(episode_has_accepted_imdb_url)
) as percentage
from mturk_completion inner join
mturk_worker
using (worker_id)
where timestamp > '2015-02-01'
group by worker_id
order by percentage desc, correct desc;
If you like, you can simplify it further by using the null-safe equals operator:
select mturk_worker.notes, worker_id, count(worker_id) answers,
count(episode_has_accepted_imdb_url) scored,
sum(imdb_url <=> accepted_imdb_url) as correct,
(100 * sum(imdb_url <=> accepted_imdb_url) / count(episode_has_accepted_imdb_url)
) as percentage
from mturk_completion inner join
mturk_worker
using (worker_id)
where timestamp > '2015-02-01'
group by worker_id
order by percentage desc, correct desc;
This isn't standard SQL, but it is perfectly fine in MySQL.
Otherwise, you would need to use a subquery, and there is additional overhead in MySQL associated with subqueries.
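A quick way to convince yourself that the boolean sum counts the same rows as the original CASE is to run both on a handful of rows that cover the NULL combinations. The sketch below does that on SQLite via Python's sqlite3, as a stand-in for MySQL. SQLite has no `<=>` operator, so the third query uses SQLite's `IS`, which plays the same null-safe role there. The sample rows and the name `count_correct` are hypothetical.

```python
import sqlite3

# Hypothetical rows: (imdb_url, accepted_imdb_url).
ROWS = [
    ("a", "a"),    # equal            -> correct
    (None, None),  # both NULL        -> correct
    ("a", "b"),    # different        -> not correct
    ("a", None),   # one side NULL    -> not correct
    (None, "b"),   # other side NULL  -> not correct
]

CASE_QUERY = """
SELECT SUM(CASE WHEN imdb_url IS NULL AND accepted_imdb_url IS NULL THEN 1
                WHEN imdb_url = accepted_imdb_url THEN 1
                ELSE 0 END)
FROM mturk_completion
"""

BOOLEAN_QUERY = """
SELECT SUM(imdb_url = accepted_imdb_url
           OR imdb_url IS NULL AND accepted_imdb_url IS NULL)
FROM mturk_completion
"""

# SQLite spelling of a null-safe comparison; MySQL would use <=> here.
NULL_SAFE_QUERY = """
SELECT SUM(imdb_url IS accepted_imdb_url)
FROM mturk_completion
"""


def count_correct(query, rows):
    """Run one of the counting queries against the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mturk_completion (imdb_url TEXT, accepted_imdb_url TEXT)"
    )
    conn.executemany("INSERT INTO mturk_completion VALUES (?, ?)", rows)
    (count,) = conn.execute(query).fetchone()
    conn.close()
    return count


if __name__ == "__main__":
    for q in (CASE_QUERY, BOOLEAN_QUERY, NULL_SAFE_QUERY):
        print(count_correct(q, ROWS))
```

One difference to keep in mind: the plain boolean expression evaluates to NULL (not 0) when only one side is NULL, and SUM simply skips those rows.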

check whether all rows in a table have unique values for a column

How can I check whether all rows in a table have unique values for a column of char datatype in MySQL, and return the result as yes or no?
You can try something like this, where ? is the column you want to check :
SELECT IF(t.total = t.total_distinct, 'YES', 'NO') AS result
FROM ( SELECT COUNT(*) AS total
, COUNT(DISTINCT ?) AS total_distinct
FROM tbl
) t
If you ignore NULL values, you can just compare count() and count(distinct):
select (case when count(col) = count(distinct col) then 'All Unique' else 'Duplicates' end)
from tbl t;
If NULL values are a concern (so NULL would be allowed at most one time), then you can aggregate and look at the maximum count:
select (case when max(cnt) = 1 then 'All Unique' else 'Duplicates' end)
from (select col, count(*) as cnt
from tbl t
group by col
) col
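The two checks differ only in how they treat NULL, which is easy to see on a few values. Here is a minimal sketch on SQLite via Python's sqlite3, as a stand-in for MySQL; the table name `tbl`, the column name `col`, and the function names are placeholders for illustration.

```python
import sqlite3

# Ignores NULLs: COUNT(col) and COUNT(DISTINCT col) both skip them.
IGNORE_NULLS_QUERY = """
SELECT CASE WHEN COUNT(col) = COUNT(DISTINCT col)
            THEN 'All Unique' ELSE 'Duplicates' END
FROM tbl
"""

# Treats NULL as a value: GROUP BY puts all NULLs into one group.
COUNT_NULLS_QUERY = """
SELECT CASE WHEN MAX(cnt) = 1 THEN 'All Unique' ELSE 'Duplicates' END
FROM (SELECT col, COUNT(*) AS cnt FROM tbl GROUP BY col) AS grouped
"""


def check(query, values):
    """Insert the values into a one-column table and run the check."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (col TEXT)")
    conn.executemany("INSERT INTO tbl VALUES (?)", [(v,) for v in values])
    (verdict,) = conn.execute(query).fetchone()
    conn.close()
    return verdict


if __name__ == "__main__":
    for values in (["a", "b", "c"], ["a", "b", "a"], ["a", None, None]):
        print(values, check(IGNORE_NULLS_QUERY, values), check(COUNT_NULLS_QUERY, values))
```

With two NULLs in the column, the first query still reports the values as unique, while the second reports duplicates.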
I would go with the first version of Gordon's answer, but grouped by the Primary key of the table. In other words :
select primary_key_field, (case when count(col) = count(distinct col) then 'All Unique' else 'Duplicates' end)
from tbl t
group by primary_key_field

SQL find rows where value is not increasing

I have a table with columns like this:
id | timestamp | ...
and I am looking for rows where the timestamp decreased since the previous row.
I tried a statement like this:
SELECT count(a.id)
FROM tbl AS a INNER JOIN tbl AS b ON a.id+1=b.id
WHERE a.timestamp<b.timestamp;
but it appears not to have worked. I get zero results even though I expect some. Any suggestions what is wrong?
I would also appreciate any ideas on a better way to write this query.
I am using MySQL.
You can get the previous value using a correlated subquery, and then use that for the comparison:
select t.*
from (select t.*,
(select t2.timestamp from tbl t2 where t2.id < t.id order by t2.id desc limit 1
) as prevts
from tbl t
) t
where timestamp < prevts;
The problem with your query is probably that the ids have gaps in them.
EDIT:
You can do this with variables. The challenge is getting the variable comparison and assignment in a single expression. This is needed because MySQL does not guarantee the order of evaluation of expressions in a select statement.
The following computes IsDecreasing and assigns the new value to the variable in the same expression:
select t.*
from (select t.*,
if(@prev > timestamp, if(@prev := timestamp, 1, 1),
if(@prev := timestamp, 0, 0)
) IsDecreasing
from tbl t cross join
(select @prev := -1) vars
order by id
) t
where IsDecreasing = 1;
This should be faster than the previous method -- probably even when you have the right index.
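On MySQL 8.0 and later there is a third option that neither approach above uses: the LAG window function, which reads the previous row's value directly and does not care about gaps in the ids. The sketch below compares it with the correlated subquery on made-up data with gaps, run on SQLite (3.25 or later) via Python's sqlite3 as a stand-in for MySQL. The sample rows and the name `decreasing_ids` are assumptions for illustration.

```python
import sqlite3

# Hypothetical rows (id, timestamp) with gaps in the ids.
ROWS = [(1, 100), (2, 200), (5, 150), (6, 300), (9, 250)]

# Previous value looked up per row, as in the first query of the answer.
SUBQUERY_VERSION = """
SELECT cur.id
FROM (
    SELECT t.id, t.timestamp,
           (SELECT t2.timestamp FROM tbl t2
            WHERE t2.id < t.id
            ORDER BY t2.id DESC LIMIT 1) AS prevts
    FROM tbl t
) AS cur
WHERE cur.timestamp < cur.prevts
ORDER BY cur.id
"""

# Same result with a window function (an alternative, not the answer's method).
LAG_VERSION = """
SELECT cur.id
FROM (
    SELECT id, timestamp,
           LAG(timestamp) OVER (ORDER BY id) AS prevts
    FROM tbl
) AS cur
WHERE cur.timestamp < cur.prevts
ORDER BY cur.id
"""


def decreasing_ids(query, rows):
    """Return the ids whose timestamp is lower than the previous row's."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, timestamp INTEGER)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?)", rows)
    ids = [row[0] for row in conn.execute(query)]
    conn.close()
    return ids


if __name__ == "__main__":
    print(decreasing_ids(SUBQUERY_VERSION, ROWS))
    print(decreasing_ids(LAG_VERSION, ROWS))
```

The first row has no predecessor, so its prevts is NULL and the comparison filters it out in both versions.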