SQL find rows where value is not increasing - mysql

I have a table with columns like this:
id | timestamp | ...
and I am looking for rows where the timestamp decreased since the previous row.
I tried a statement like this:
SELECT count(a.id)
FROM tbl AS a INNER JOIN tbl AS b ON a.id+1=b.id
WHERE a.timestamp<b.timestamp;
but it appears not to have worked. I get zero results even though I expect some. Any suggestions what is wrong?
I would also appreciate any ideas on a better way to write this query.
I am using MySQL.

You can get the previous value using a correlated subquery, and then use that for the comparison:
select t.*
from (select t.*,
(select t2.timestamp from tbl t2 where t2.id < t.id order by t2.id desc limit 1
) as prevts
from tbl t
) t
where timestamp < prevts;
The problem with your query is probably that the ids have gaps in them.
EDIT:
You can do this with variables. The challenge is getting the variable comparison and assignment in a single expression. This is needed because MySQL does not guarantee the order of evaluation of expressions in a select statement.
The following assigns a value to IsDecreasing and assigns the values:
select t.*
from (select t.*,
if(#prev > timestamp, if(#prev := timestamp, 1, 1),
if(#prev := timestamp, 0, 0)
) IsDecreasing
from tbl t cross join
(select #prev := -1) vars
order by id
) t
where IsDecreasing = 1;
This should be faster than the previous method -- probably even when you have the right index.

Related

MySQL: user-variable definition within SQL statement to create counter column

Is it possible to create a counter in mysql/mariadb in one single SELECT-statement. I've tried the following but it returns only the value 1 in the first column:
SELECT #rownr := IF(ISNULL(#rownr),0,#rownr)+1 AS rowNumber, * FROM table_x LIMIT 0,10
If I run the statement more often in the same mysql-instance it starts counting from the last number. So the second time it starts at 2, the third time at 12. This means that the variable is created but seems to be only available for modification when it was instantiated before the SQL statement was issued.
It is possible, but a bit tricky. First, you need to declare the variable outside of the select clause (in a separate set assignment, or in a derived table). Also, it is safer to sort the rows in a subquery first, and then compute the variable.
I would recommend:
set #rn := 0;
select t.*, #rn := #rn + 1 rowNumber
from (select t.* from mytable t order by id limit 10) t
Note that I added an order by clause to the inner query, otherwise it is undefined in which sequence rows will be ordered (I assumed id).
Alternatively, you can declare the variable in a derived table:
select t.*, #rn := #rn + 1 rowNumber
from (select t.* from mytable t order by id limit 10) t
cross join (select #rn := 0) x
Finally: if you are running MySQL 8.0, just use row_number():
select t.*, row_number() over(order by id) rn
from mytable t
order by id
limit 10;
You don't have an order by, so the ordering is indeterminate. But you can initialize the parameter in the statement itself:
SELECT #rownr := (#rownr + 1) AS rowNumber, x.*
FROM table_x x.CROSS JOIN
(SELECT #rownr := 0) params
LIMIT 0, 10;
If you want a particular ordering, you should use an order by in a subquery.
Also note that starting in MySQL 8, variable assignments in SELECT are deprecated. You should be using window functions (row_number()) in more recent versions.

How to eliminate only continuous duplicates but not all duplicates in a select query (MySQL)?

I have a table like this:
01-Jul-17 100
02-Jul-17 100
03-Jul-17 300
04-Jul-17 300
05-Jul-17 500
06-Jul-17 500
07-Jul-17 300
08-Jul-17 400
09-Jul-17 100
10-Jul-17 100
What I want to output is (in this order) by eliminating the continuous duplicates but not all duplicates:
100
300
500
300
400
100
I cannot select Distinct, as it will eliminate the second instances of 300, 100. Is there a way to achieve this result in MySQL?
Thanks!
You want to get the previous value. If the dates really have no gaps or duplicates, just do:
select t.*
from t left join
t tprev
on t.col1 = date_add(tprev.col1, interval 1 day)
where tprev.col2 is null or tprev.col2 <> t.col2;
EDIT:
If the dates don't meet these conditions, then you can use variables:
select t.*
from (select t.*,
(#rn := if(#v = col2, #rn + 1,
if(#v := col2, 1, 1)
)
) as rn
from t cross join
(select #v := 0, #rn := 0) params
order by t.col1
) t
where rn = 1;
Note that MySQL does not guarantee the order of evaluation of expressions in the SELECT. So variables should not be assigned in one expression and then used in another -- they should be assigned in a single expression.
One way to handle this problem is by using session variables to track the changes of the values as ordered by your date column. In the query below, we keep track of the value, ordered by date, and assign a row number to each group of identical value. Then, only the first value in each group is retained. Note that this approach is robust to any number of duplicates. It is also robust with respect to there being gaps in your dates, so long as each record can be ordered by date.
SET #rn = 1;
SET #val = NULL;
SELECT t.val
FROM
(
SELECT
#rn:=CASE WHEN #val = val THEN #rn+1 ELSE 1 END rn,
#val:=val AS val,
dt
FROM yourTable
ORDER BY dt
) t
WHERE t.rn = 1
ORDER BY t.dt;
Output:
Demo here:
Rextester
You can make use of lag and lead functions.
select y from (select y , lag(y,1,0) over (order by x) as prev_y from t1) where y <> prev_y;

Hide ROW_NUMBER() in subquery

I have a query like
Select *
From y
WHERE y.z = (
SELECT a, (adding rownumber here)
FROM b
)
I want to add a clause where it only selects every second row. To do this I need to add row_number() to the subquery, and have a clause where rownumber % 2 = 0.
My question is, am I able to add rownumber to the select of the subquery and somehow hide it so it doesn't affect the query
Rownumbering in MySQL is a notorious pain in the neck.
You can number your rows in MySQL like this.
SELECT (#rownum := #rownum+1) rownum, b.*
FROM b
JOIN (SELECT #rownum := 0) init
ORDER BY b.whatever
Don't forget the ORDER BY clause here. Without explicit ordering the query engine is free to randomize the order of rows it returns.
Then, you can use that mess as a subquery and do things with the rownum.
SELECT *
FROM (
SELECT (#rownum := #rownum+1) rownum, b.*
FROM b
JOIN (SELECT #rownum := 0) init
ORDER BY b.whatever
) table_with_rownum
WHERE rownum % 2 = 0
If you don't want to show the rownumbers, change your SELECT from SELECT * to SELECT col, col, col and leave out rownum.

What are the subquery equivalents of SQL aggregate functions MAX/MIN/AVG/COUNT

Can someone show me how to represent the following SQL statements without the use of aggregate functions?
SELECT COUNT(column) FROM table;
SELECT AVG(column) FROM table;
SELECT MAX(column) FROM table;
SELECT MIN(column) FROM table;
MIN() and MAX() can be done with simple subqueries:
select (select column from table order by column is not null desc, column asc limit 1) as "MIN",
(select column from table order by column is not null desc, column desc limit 1) as "MAX"
COUNT() and AVG() require the use of variables, if you don't allow any aggregations:
select rn as "COUNT", sumcol / rnaas "AVG"
from (select t.*
from (select t.*,
(#rn := #rn + 1) as rn,
(#rna := #rna + if(column is not null, 1, 0)) as rna,
(#sum := #sum + coalesce(column, 0)) as sumcol
from table t cross join
(select #rn := 0, #rna := 0, #sum := 0) const
order by column
) t
order by rn desc
limit 1
) t
This latter formulation only works in MySQL.
EDIT:
The empty table is a challenge. Let's do this with a left outer join:
select cast(coalesce(rn, 0) as int) as "COUNT",
(case when rna > 0 then sumcol / rna end) as "AVG"
from (select 1 as n
) n left outer join
(select t.*
from (select t.*,
(#rn := #rn + 1) as rn,
(#rna := #rna + if(column is not null, 1, 0)) as rna,
(#sum := #sum + coalesce(column, 0)) as sumcol
from table t cross join
(select #rn := 0, #rna := 0, #sum := 0) const
order by column
) t
order by rn desc
limit 1
) t
on n.n = 1;
Notes. This will return 0 for the count if the table is empty. That is correct. If the table is empty, it will return NULL for the average, and that is also correct.
If the table is not empty, but the values are all NULL, then it will also return NULL. The types for the count are always integers, so that should be ok. The type of the average is more problematic, but the variables will return some sort of generic numeric type, which seems compatible in spirit.
min/max can be replaced with something like this:
select t1.pk_column,
t1.some_column
from the_table t1
where t1.some_column < ALL (select t2.some_column
from the_table t2
where t2.pk_column <> t2.pk_column);
For getting the max you need to replace < with >. pk_column is the primary key column of the table and is needed to avoid comparing each row to itself (it doesn't have to be a PK it only needs to be unique)
I don't think there is an alternative for count() or avg() (at least I can't think of one)
I used the_column and the_table because column and table are reserved words
SET #t1=0, #t2=0, #t3=0,#T4=0;
COUNT:
Select #t1:=#t1+1 as CNT from table
order by #t1:=#t1+1 DESC
LIMIT 1
Similar methods could be put together for Avg and max/min using limits...
Still thinking about Min/Max...
Not to supersede the excellent answer from Gordon Linoff, but there's a little more work involved to accurately emulate the AVG(), COUNT(), and SUM() functions. (The answer for the MIN and MAX functions in Gordon's answer are spot on.)
There's a corner case when the table is empty. In order to emulate the SQL aggregate functions, we need our query to return a single row. But at the same time, we need a test of whether or not the table contains at least one row.
Here's a query that is a more precise emulation:
-- create an empty table
CREATE TABLE `foo` (col INT);
-- TRUNCATE TABLE `foo`;
SELECT IF(s.ne IS NULL,0,s.rn) AS `COUNT(*)`
, IF(s.cc>0,s.tc,NULL) AS `SUM(col)`
, IF(s.cc>0,s.tc/s.cc,NULL) AS `AVG(col)`
FROM ( SELECT v.rn
, v.cc
, v.tc
, e.ne
FROM ( SELECT #rn := #rn + 1 AS rn
, #cc := #cc + (t.col IS NOT NULL) AS cc
, #tc := #tc + IFNULL(t.col,0) AS tc
FROM (SELECT #rn := 0, #cc := 0, #tc := 0) c
LEFT
JOIN `foo` t
ON 1=1
) v
LEFT
JOIN (SELECT 1 AS ne FROM `foo` z LIMIT 1) e
ON 1=1
ORDER BY v.rn DESC
LIMIT 1
) s
NOTES:
The purpose of the inline view aliased as e is to give us a way to determine whether or not the table contains any rows. If the table contains at least one row, we'll get a value of 1 returned as column ne (not empty). If the table is empty, that query won't return a row, and e.ne will be NULL, which is something we can test in the outer query.
In order to return a row, so we can return a value, like a 0 for a COUNT, we need to insure that we return at least one row from the inline view v. Since we are guaranteed exactly one row from the inline view aliased as c (which initializes our user defined variables), we'll use that as the "driving" table for a LEFT [OUTER] JOIN operation.
But, if the table is empty, our our row counter (#rn) coming out of v is going to have a value of 1. But we'll deal with that, we have the e.ne we can check to know if the count should really be returned as 0.
In order to calculate the average, we can't divide by the row counter, we have to divide by the number of rows where col was not null. We use the #cc user defined variable to keep track of the count of those rows.
Similarly, for the SUM (and the average) we need to accumulate only the non-NULL values. (If we were to add a NULL, it would turn the whole total to NULL, basically wiping out are accumulation. So, we're going to do a conditional test to check if t.col IS NULL, to avoid accidentally wiping out the accumulation. And our accumulator is going to be a 0 if there aren't any rows that are not null. But that's not a problem, because we'll make sure we check our #cc to see if there were any rows that were included. We're going to need to check it anyway, to avoid a "divide by zero" issue.
To test, run against the empty table foo. It will return a count of 0, and NULL for SUM and AVG, equivalent to the result we get from:
SELECT COUNT(*), SUM(col), AVG(col) FROM foo;
We can also test the query against a table containing only NULL values for col:
INSERT INTO `foo` (col) VALUES (NULL);
As well as some non-NULL values:
INSERT INTO `foo` (col) VALUES (2),(3),(5),(7),(11),(13),(17),(19);
And compare the results of the two queries.
This essentially the same as the answer from Gordon Linoff, with just a little more precision to work around the corner cases of NULL values and the empty table.

Getting latest rows in MySQL based on date (grouped by another column)

This type of question is asked every now and then. The queries provided works, but it affects performance.
I have tried the JOIN method:
SELECT *
FROM nbk_tabl
INNER JOIN (
SELECT ITEM_NO, MAX(REF_DATE) as LDATE
FROM nbk_tabl
GROUP BY ITEM_NO) nbk2
ON nbk_tabl.REF_DATE = nbk2.LDATE
AND nbk_tabl.ITEM_NO = nbk2.ITEM_NO
And the tuple one (way slower):
SELECT *
FROM nbk_tabl
WHERE REF_DATE IN (
SELECT MAX(REF_DATE)
FROM nbk_tabl
GROUP BY ITEM_NO
)
Is there any other performance friendly way of doing this?
EDIT: To be clear, I'm applying this to a table with thousands of rows.
Yes, there is a faster way.
select *
from nbk_table
order by ref_date desc
limit <n>
Where is the number of rows that you want to return.
Hold on. I see you are trying to do this for a particular item. You might try this:
select *
from nbk_table n
where ref_date = (select max(ref_date) from nbk_table n2 where n.item_no = n2.item_no)
It might optimize better than the "in" version.
Also in MySQL you can use user variables (Suppose nbk_tabl.Item_no<>0):
select *
from (
select nbk_tabl.*,
#i := if(#ITEM_NO = ITEM_NO, #i + 1, 1) as row_num,
#ITEM_NO := ITEM_NO as t_itemNo
from nbk_tabl,(select #i := 0, #ITEM_NO := 0) t
order by Item_no, REF_DATE DESC
) as x where x.row_num = 1;