Count up on a positive change, and down on a negative change - mysql

I have a column that changes values.
I want to keep a running count that goes up while the values rise and down while they fall. x[] holds my values, Delta is the sign of the change from the previous element of x, and y[] is the count I want to get.
We count up until the next Delta of -1, at which point we start counting down; we resume counting up at the next Delta of +1. A Delta of 0 keeps the current direction, and the first element counts as +1.
x: 1, 3, 4, 4, 4, 5, 5, 3, 3, 4, 5, 5, 6, 5, 4, 4, 4, 3, 4, 5, 6, 7, 8
Delta: 0, 1, 1, 0, 0, 1, 0, -1, 0, 1, 1, 0, 1, -1, -1, 0, 0, -1, 1, 1, 1, 1, 1
y: 1, 2, 3, 4, 5, 6, 7, 6, 5, 6, 7, 8, 9, 8, 7, 6, 5, 4, 5, 6, 7, 8, 9
My table has millions of rows, and efficiency is important. I'm not sure whether such an operation should be done in SQL or whether I would be better off retrieving the data from the database and performing the calculation outside it.
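If the data is pulled out of the database, the whole calculation is a single pass over the values in order. Below is a minimal Python sketch. It assumes the values arrive already sorted by the ordering column, and the function name `running_direction_count` is hypothetical:

```python
def running_direction_count(values):
    """Return y for a sequence x: at every element add the current direction.

    A rise sets the direction to +1, a fall sets it to -1, and an unchanged
    value keeps the previous direction. The first element counts as +1.
    """
    counts = []
    direction = 1  # assumption: start by counting up, so y[0] == 1 as in the example
    total = 0
    previous = None
    for x in values:
        if previous is not None:
            if x > previous:
                direction = 1
            elif x < previous:
                direction = -1
            # x == previous: keep the current direction
        total += direction
        counts.append(total)
        previous = x
    return counts


x = [1, 3, 4, 4, 4, 5, 5, 3, 3, 4, 5, 5, 6, 5, 4, 4, 4, 3, 4, 5, 6, 7, 8]
print(running_direction_count(x))
# [1, 2, 3, 4, 5, 6, 7, 6, 5, 6, 7, 8, 9, 8, 7, 6, 5, 4, 5, 6, 7, 8, 9]
```

This runs in linear time with constant extra state. For millions of rows, the cost is mostly in streaming the ordered rows out of the database.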

You could use this query in SQL-Server, presuming a PK-column for the ordering:
WITH CTE AS
(
    -- Delta: sign of the change from the previous row (1 for the first row)
    SELECT t.ID, t.Value,
           LastValue = Prev.Value,
           Delta = CASE WHEN Prev.Value IS NULL
                          OR t.Value > Prev.Value THEN 1
                        WHEN t.Value = Prev.Value THEN 0
                        WHEN t.Value < Prev.Value THEN -1 END
    FROM dbo.TableName t
    OUTER APPLY (SELECT TOP 1 t2.ID, t2.Value
                 FROM dbo.TableName t2
                 WHERE t2.ID < t.ID
                 ORDER BY t2.ID DESC) Prev
)
, Changes AS
(
    -- Change: the row's own delta, or the last non-zero delta if it is 0
    SELECT CTE.ID, CTE.Value, CTE.LastValue, CTE.Delta,
           Change = CASE WHEN CTE.Delta <> 0 THEN CTE.Delta
                         ELSE (SELECT TOP 1 CTE2.Delta
                               FROM CTE CTE2
                               WHERE CTE2.ID < CTE.ID
                                 AND CTE2.Delta <> 0
                               ORDER BY CTE2.ID DESC) END
    FROM CTE
)
SELECT SUM(Change) FROM Changes c
The result is 9, as expected.
The OUTER APPLY links each record with the previous one, i.e. the record with the highest ID below the current ID. It works like a LEFT OUTER JOIN: the first record, which has no predecessor, is kept with NULL values.
The main challenge was the sub-query in the last CTE. It finds the most recent delta that is <> 0, which determines whether a row with a delta of 0 counts up or down.

You can also use LAG and SUM with OVER (assuming you have SQL Server 2012 or above), like this:
Sample Data
DECLARE @Table1 TABLE (ID int identity(1,1), [x] int);
INSERT INTO @Table1([x])
VALUES (1),(3),(4),(4),(4),(5),(5),(3),(3),(4),(5),(5),(6),(5),(4),(4),(4),(3),(4),(5),(6),(7),(8);
Query
;WITH T1 as
(
    SELECT ID, x, ISNULL(LAG(x) OVER(ORDER BY ID ASC), x - 1) as PrevVal
    FROM @Table1
), T2 as
(
    SELECT ID, x, PrevVal,
           CASE WHEN x > PrevVal THEN 1 WHEN x < PrevVal THEN -1 ELSE 0 END as delta
    FROM T1
)
-- a delta of 0 borrows the last non-zero delta of the same run of equal x values
SELECT ID, x, SUM(COALESCE(NULLIF(T2.delta, 0), TI.delta, 0)) OVER(ORDER BY ID) as Ordered
FROM T2
OUTER APPLY (SELECT TOP 1 delta FROM T2 TI
             WHERE TI.ID < T2.ID AND TI.x = T2.x AND TI.delta <> 0
             ORDER BY ID DESC) as TI
ORDER BY ID
Output
ID x Ordered
1 1 1
2 3 2
3 4 3
4 4 4
5 4 5
6 5 6
7 5 7
8 3 6
9 3 5
10 4 6
11 5 7
12 5 8
13 6 9
14 5 8
15 4 7
16 4 6
17 4 5
18 3 4
19 4 5
20 5 6
21 6 7
22 7 8
23 8 9
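The same per-row result can also be computed without the correlated OUTER APPLY by treating it as a gaps-and-islands problem. A running count of non-zero deltas puts every zero-delta row into the group of the last non-zero row before it. The first delta of each group is then the step to add up. This is a swapped-in technique, not the answer's query. The sketch runs it through Python's built-in sqlite3 module (SQLite 3.25 or later is needed for window functions). The table `t` and the function `signed_counts` are hypothetical. LAG, COUNT ... OVER and FIRST_VALUE also exist in SQL Server 2012+ and MySQL 8.0+.

```python
import sqlite3

# Hypothetical table t(id, x); requires SQLite 3.25+ for window functions.
QUERY = """
WITH d AS (
    -- sign of the change from the previous row; the first row counts up
    SELECT id, x,
           CASE
               WHEN LAG(x) OVER (ORDER BY id) IS NULL THEN 1
               WHEN x > LAG(x) OVER (ORDER BY id) THEN 1
               WHEN x < LAG(x) OVER (ORDER BY id) THEN -1
               ELSE 0
           END AS delta
    FROM t
),
g AS (
    -- running count of non-zero deltas: a row with delta 0 gets the same
    -- group number as the last non-zero row before it
    SELECT id, x, delta,
           COUNT(NULLIF(delta, 0)) OVER (ORDER BY id) AS grp
    FROM d
),
s AS (
    -- the first row of every group carries that group's direction
    SELECT id, x,
           FIRST_VALUE(delta) OVER (PARTITION BY grp ORDER BY id) AS step
    FROM g
)
SELECT id, x, SUM(step) OVER (ORDER BY id) AS y
FROM s
ORDER BY id
"""


def signed_counts(values):
    """Load values into an in-memory table and return the y column."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, x INTEGER NOT NULL)")
    con.executemany("INSERT INTO t (id, x) VALUES (?, ?)",
                    enumerate(values, start=1))
    rows = con.execute(QUERY).fetchall()
    con.close()
    return [y for _id, _x, y in rows]


values = [1, 3, 4, 4, 4, 5, 5, 3, 3, 4, 5, 5, 6, 5, 4, 4, 4, 3, 4, 5, 6, 7, 8]
print(signed_counts(values))
```

Every step is a window function over a single ordered scan, so the work grows roughly with the sort rather than with a per-row lookup.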

You used both the sql-server and mysql tags. If this is to be done in SQL Server, have a look at the OVER clause: https://msdn.microsoft.com/en-us/library/ms189461.aspx
Given an ordering criterion, you can specify a ROWS clause and use the value of a preceding row. Many SQL functions can be used with OVER.
You could define a computed column which does the calculation on insert...
Good luck!

Related

How to calculate the drawdown in Mysql with Window Functions?

I have this schema:
CREATE TABLE `test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`amount` int(11) DEFAULT NULL,
`group_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`)
);
I have populated that table with the following data:
insert into test (amount, group_id) values
(1,1), (3,1), (-4,1), (-2,1), (5,1), (10,1), (18,1), (-3,1),
(-5,1), (-7,1), (12,1), (-9,1), (6,1), (0,1), (185,2), (-150,2);
The table is:
# id, amount, group_id
1, 1, 1
2, 3, 1
3, -4, 1
4, -2, 1
5, 5, 1
6, 10, 1
7, 18, 1
8, -3, 1
9, -5, 1
10, -7, 1
11, 12, 1
12, -9, 1
13, 6, 1
14, 0, 1
15, 185, 2
16, -150, 2
This is the query I am using right now:
SELECT
t1.id,
t1.amount,
t1.cumsum,
(MAX(t1.cumsum) OVER (ORDER BY id RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) - t1.cumsum) as drawdown
FROM
(
SELECT
id,
amount,
group_id,
SUM(amount) OVER (ORDER BY id RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as cumsum
FROM
test
) as t1
order by drawdown desc
The query returns this output:
id, amount, cumsum, drawdown
10 -7 16 15
12 -9 19 12
9 -5 23 8
4 -2 -2 6
14 0 25 6
13 6 25 6
3 -4 0 4
8 -3 28 3
11 12 28 3
5 5 3 1
16 150 360 0
2 3 4 0
7 18 31 0
15 185 210 0
1 1 1 0
6 10 13 0
A quick clarification:
By "drawdown" (in this case) I mean the difference between the running maximum of the cumulative sum and the current cumulative sum.
I am not referring to any particular trading definition; I just need the difference between the highest peak and the lowest point (obviously the peak must occur before the low, i.e. a downtrend).
OK, my problem is that I need to group by the group_id field, but if I add a GROUP BY clause to the query, all the window functions break.
EXPECTED RESULT:
I need a list grouped by the group_id field that shows the max drawdown for each group.
Edit (@Akina):
group_id, drawdown
1 15
2 150
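One way to get the per-group result is to add PARTITION BY group_id to both window functions, so that each group gets its own running sum and running peak. Only after that do you aggregate with GROUP BY in an outer query. Below is a minimal sketch using Python's sqlite3 module (SQLite 3.25+ for window functions) on the data above. The function name is hypothetical. Unlike the query above, the running sum here restarts for each group.

```python
import sqlite3

# Same rows as the INSERT above; id is assigned by AUTOINCREMENT.
ROWS = [(1, 1), (3, 1), (-4, 1), (-2, 1), (5, 1), (10, 1), (18, 1), (-3, 1),
        (-5, 1), (-7, 1), (12, 1), (-9, 1), (6, 1), (0, 1), (185, 2), (-150, 2)]

QUERY = """
SELECT group_id, MAX(peak - cumsum) AS drawdown
FROM (
    -- running peak of the cumulative sum, per group
    SELECT group_id, cumsum,
           MAX(cumsum) OVER (PARTITION BY group_id ORDER BY id
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS peak
    FROM (
        -- cumulative sum, restarted for every group
        SELECT id, group_id,
               SUM(amount) OVER (PARTITION BY group_id ORDER BY id
                                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumsum
        FROM test
    ) AS c
) AS p
GROUP BY group_id
ORDER BY group_id
"""


def max_drawdown_per_group(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE test (id INTEGER PRIMARY KEY AUTOINCREMENT, "
                "amount INTEGER, group_id INTEGER)")
    con.executemany("INSERT INTO test (amount, group_id) VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


print(max_drawdown_per_group(ROWS))  # [(1, 15), (2, 150)]
```

The window functions are evaluated in the inner queries and the GROUP BY only in the outermost one. That ordering is what keeps the grouping from interfering with the running calculations.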

How to limit results of SQLite per specific group of results?

I have the following problem at work. I have a large table with many columns and a few hundred thousand rows. I'll only post the columns I'm interested in.
Assume the following data set
Device ID, Feature Id, Feature Status
1, 1, 0
1, 2, 0
1, 3, 1
1, 4, 1
1, 5, 1
2, 1, 1
2, 2, 0
2, 3, 0
2, 4, 1
2, 5, 0
3, 1, 1
3, 2, 1
3, 3, 1
3, 4, 1
3, 5, 1
4, 1, 0
4, 2, 0
4, 3, 1
4, 4, 0
4, 5, 0
I need to select rows with Feature Status = 1 but only the first 2 from each Device Id.
The results of the query should be:
1,3,1
1,4,1
2,1,1
2,4,1
3,1,1
3,2,1
4,3,1
I tried something like this:
SELECT brdsurfid,featureidx,FeatStatus FROM Features F1 WHERE FeatStatus = 1 AND
(SELECT COUNT(*) FROM Features F2
WHERE F2.FeatureIdx <= F1.FeatureIdx AND F2.FeatStatus = 1) < 2
ORDER BY BrdSurfId,FeatureIdx;
which I found in another response, but it didn't quite work.
I know I need some mix of LIMIT or COUNT(*) and nested selects, but I can't figure it out. Thanks.
This is probably not a very efficient way to do it, but I don't think there is a better solution in SQLite that uses a single query:
SELECT *
FROM t t0
WHERE t0.FeatureStatus = 1
  AND (SELECT count(*)
       FROM t t1
       WHERE t1.DeviceID = t0.DeviceID
         AND t1.FeatureStatus = 1
         AND t1.FeatureId < t0.FeatureId
      ) < 2;
I assume that the table is called t. The idea is to find all features whose status is 1 and, for each of them, count the earlier features with status 1 on the same device. If that count is 2 or more, the row is rejected.
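The query above can be checked against the sample data with Python's built-in sqlite3 module. Below is a minimal sketch. It assumes the table `t` with the column names the answer uses (DeviceID, FeatureId, FeatureStatus). The function `first_two_enabled` is hypothetical, and the ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

# (DeviceID, FeatureId, FeatureStatus) rows from the question.
ROWS = [
    (1, 1, 0), (1, 2, 0), (1, 3, 1), (1, 4, 1), (1, 5, 1),
    (2, 1, 1), (2, 2, 0), (2, 3, 0), (2, 4, 1), (2, 5, 0),
    (3, 1, 1), (3, 2, 1), (3, 3, 1), (3, 4, 1), (3, 5, 1),
    (4, 1, 0), (4, 2, 0), (4, 3, 1), (4, 4, 0), (4, 5, 0),
]

# The correlated sub-query from the answer, plus an ORDER BY for a stable result.
QUERY = """
SELECT DeviceID, FeatureId, FeatureStatus
FROM t t0
WHERE t0.FeatureStatus = 1
  AND (SELECT count(*)
       FROM t t1
       WHERE t1.DeviceID = t0.DeviceID
         AND t1.FeatureStatus = 1
         AND t1.FeatureId < t0.FeatureId) < 2
ORDER BY DeviceID, FeatureId
"""


def first_two_enabled(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (DeviceID INTEGER, FeatureId INTEGER, "
                "FeatureStatus INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in first_two_enabled(ROWS):
    print(row)
```

The sub-query is evaluated once per candidate row. On a table with a few hundred thousand rows, an index covering (DeviceID, FeatureStatus, FeatureId) should help, though that is worth measuring.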
Not sure if this will work with SQLite (ROW_NUMBER() needs SQLite 3.25 or later), but for what it's worth:
WITH result AS
(
    SELECT
        brdsurfid,
        featureidx,
        FeatStatus,
        -- number the enabled features of each device, lowest index first
        ROW_NUMBER() OVER (PARTITION BY brdsurfid ORDER BY featureidx) AS rn
    FROM
        Features
    WHERE
        FeatStatus = 1
)
SELECT brdsurfid, featureidx, FeatStatus
FROM
    result
WHERE
    rn <= 2
ORDER BY brdsurfid, featureidx;

Compare rows SQL counting discrepancies in value

I have a table:
id firstval secondval
1 4 5
2 5 4
3 3 3
4 6 6
5 7 8
6 9 8
7 3 3
8 3 3
The first thing I need to do is count the number of times secondval > firstval. This is obviously no problem.
However, what I'm struggling with is how to then count how many times (for each instance of secondval > firstval) the next row satisfies secondval < firstval.
So in this example, two rows satisfy the first rule (ids 1 and 5), and the rows that follow them (ids 2 and 6) satisfy the second rule.
SELECT id, @prevGreater AND secondval < firstval AS discrepancy,
       @prevGreater := secondval > firstval AS secondGreater
FROM (SELECT * FROM YourTable ORDER BY id) AS x
CROSS JOIN (SELECT @prevGreater := false) AS init
SELECT * FROM YourTable t1
INNER JOIN YourTable t2 ON t1.ID + 1 = t2.ID -- join each row to the next one: t2.ID = t1.ID + 1
WHERE t1.secondval > t1.firstval AND t2.secondval < t2.firstval
Now you can use COUNT(*) as you want :) Note that joining on ID + 1 assumes the IDs have no gaps.
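The self-join above can be checked on the sample table with Python's sqlite3 module. Below is a sketch that reuses the table name `YourTable` from the other answers. For comparison, it includes a LAG-based variant, a swapped-in alternative that needs SQLite 3.25+ and looks at the previous row in id order instead of at id - 1. The function `count_discrepancies` is hypothetical.

```python
import sqlite3

# (id, firstval, secondval) rows from the question.
ROWS = [(1, 4, 5), (2, 5, 4), (3, 3, 3), (4, 6, 6),
        (5, 7, 8), (6, 9, 8), (7, 3, 3), (8, 3, 3)]

# Self-join: pair each row with the row whose id is one higher (ids must have no gaps).
SELF_JOIN = """
SELECT COUNT(*)
FROM YourTable t1
INNER JOIN YourTable t2 ON t2.id = t1.id + 1
WHERE t1.secondval > t1.firstval
  AND t2.secondval < t2.firstval
"""

# LAG variant: compare with the previous row in id order, so gaps do not matter.
WITH_LAG = """
SELECT COUNT(*)
FROM (SELECT firstval, secondval,
             LAG(secondval > firstval) OVER (ORDER BY id) AS prev_greater
      FROM YourTable)
WHERE prev_greater = 1
  AND secondval < firstval
"""


def count_discrepancies(query, rows=ROWS):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE YourTable (id INTEGER PRIMARY KEY, "
                "firstval INTEGER, secondval INTEGER)")
    con.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", rows)
    (count,) = con.execute(query).fetchone()
    con.close()
    return count


print(count_discrepancies(SELF_JOIN), count_discrepancies(WITH_LAG))  # 2 2
```

With a gap in the ids (for example, ids 1 and 3 only), the two queries disagree: the self-join finds no pair, while LAG still compares the two consecutive rows.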
DECLARE @YourTable TABLE
(id int, firstval int, secondval int)
INSERT INTO @YourTable
SELECT 1, 4, 5
UNION ALL
SELECT 2, 5, 4
UNION ALL
SELECT 3, 3, 3
UNION ALL
SELECT 4, 6, 6
UNION ALL
SELECT 5, 7, 8
UNION ALL
SELECT 6, 9, 8
UNION ALL
SELECT 7, 3, 3
UNION ALL
SELECT 8, 3, 3
SELECT ID
,CASE
WHEN SECONDVAL>FIRSTVAL THEN 0
WHEN FIRSTVAL>SECONDVAL THEN 1
ELSE 0
END AS DISCREPANCY
,CASE
WHEN SECONDVAL>FIRSTVAL THEN 1
WHEN FIRSTVAL>SECONDVAL THEN 0
ELSE 0
END AS SECONDGREATER
FROM @YourTable
You could try this one.

Group MySQL values from table

I'm sure this is possible, but I don't know where to start.
I have a table with 2000 values in the range from 0 to 100.
I want to query the table to get how many times each distinct value occurs.
For example, given the values 5, 10, 5, 2, 2, 0, 1, 1, 1, 1, 1, 10, 2
I want an output like this:
Value - Number_of_times
0 1
1 5
2 3
5 2
10 2
You need to group your results by your field, and take the COUNT() of each group:
SELECT myfield, COUNT(*) AS number_of_times
FROM mytable
GROUP BY myfield
ORDER BY myfield;
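Below is a quick check of the GROUP BY/COUNT(*) query on the example values, run through Python's sqlite3 module. The names `mytable` and `myfield` are the placeholders from the answer, and `value_frequencies` is hypothetical. The ORDER BY matters: MySQL 8.0 no longer sorts GROUP BY results implicitly.

```python
import sqlite3

VALUES = [5, 10, 5, 2, 2, 0, 1, 1, 1, 1, 1, 10, 2]


def value_frequencies(values):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (myfield INTEGER)")
    con.executemany("INSERT INTO mytable (myfield) VALUES (?)",
                    [(v,) for v in values])
    # one output row per distinct value, with the size of its group
    rows = con.execute(
        "SELECT myfield, COUNT(*) AS number_of_times "
        "FROM mytable GROUP BY myfield ORDER BY myfield"
    ).fetchall()
    con.close()
    return rows


print(value_frequencies(VALUES))  # [(0, 1), (1, 5), (2, 3), (5, 2), (10, 2)]
```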

Get the column order of a query

I have a table with two columns [id, value] both numeric.
In this example:
[ id, value ]
[ 1, 6 ]
[ 2, 4 ]
[ 3, 10 ]
[ 4, 2 ]
[ 5, 7 ]
[ 6, 3 ]
For a given id, I'd like to retrieve the top 3 ids (those with the highest values) with their positions and, if the given id is not in the top 3, also its position, id and value:
Example 1: ask_id = 5. Return:
[ position, id, value ]
[ 1, 3, 10 ]
[ 2, 5, 7 ]
[ 3, 1, 6 ]
Example 2: ask_id = 4. Return:
[ position, id, value ]
[ 1, 3, 10 ]
[ 2, 5, 7 ]
[ 3, 1, 6 ]
[ 6, 4, 2 ]
So the important points are:
How do I get the position column?
How do I get the additional row, if possible (it's no problem if I need two queries)?
select t2.pos, t1.id, t1.value
from test as t1
inner join
    (select id, value, @pos := if(@pos is null, 0, @pos) + 1 as pos
     from test order by value desc) as t2
    on t1.id = t2.id
where t2.pos <= 3 or t2.id = {$ask_id}
order by t2.pos;
Basically, the idea is like this:
Rank the rows by value.
Retrieve rows where at least one of the following is true:
position BETWEEN 1 AND 3
id = @given_id
These posts give examples of how you could substitute ranking functions (at least the most fundamental of them, ROW_NUMBER()) in MySQL:
ROW_NUMBER() in MySQL
MSSQL Row_Number() over(order by) in MySql
This method should be used with caution, though, as this article explains.
That said, one possible implementation of the above steps might look like this:
SET @pos = 0;
SELECT
    position,
    id,
    value
FROM (
    SELECT
        id,
        value,
        @pos := @pos + 1 AS position
    FROM atable
    ORDER BY value DESC
) s
WHERE position BETWEEN 1 AND 3
   OR id = @given_id
ORDER BY position;
Tested in MySQL.
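Where window functions are available (MySQL 8.0+, SQLite 3.25+), the two steps map directly onto ROW_NUMBER(). Below is a sketch using Python's sqlite3 module with the `atable` name from above. The function `top_with_extra` and the named parameters `:ask_id` and `:top_n` are hypothetical. ROW_NUMBER() gives tied rows distinct positions in arbitrary order. RANK() would give them the same position instead.

```python
import sqlite3

# (id, value) rows from the question.
ROWS = [(1, 6), (2, 4), (3, 10), (4, 2), (5, 7), (6, 3)]

# Step 1: rank by value, highest first. Step 2: keep the top N plus the requested id.
QUERY = """
SELECT position, id, value
FROM (SELECT id, value,
             ROW_NUMBER() OVER (ORDER BY value DESC) AS position
      FROM atable)
WHERE position <= :top_n OR id = :ask_id
ORDER BY position
"""


def top_with_extra(ask_id, top_n=3):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE atable (id INTEGER PRIMARY KEY, value INTEGER)")
    con.executemany("INSERT INTO atable VALUES (?, ?)", ROWS)
    rows = con.execute(QUERY, {"ask_id": ask_id, "top_n": top_n}).fetchall()
    con.close()
    return rows


print(top_with_extra(5))  # [(1, 3, 10), (2, 5, 7), (3, 1, 6)]
print(top_with_extra(4))  # [(1, 3, 10), (2, 5, 7), (3, 1, 6), (6, 4, 2)]
```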
To retrieve the top 3 ids (those with the highest values) with their positions in ascending order:
SET @num = 0;
SELECT @num := @num + 1 AS position_sequence, id, value
FROM tablename
ORDER BY value DESC
LIMIT 3;
I've not (yet) tested the selected answer in MySQL on the interesting cases where there are ties in the top three places, but I have tested this code in Informix on those cases, and it produces the answer I think should be produced.
Assuming that the table is called leader_board:
CREATE TABLE leader_board(id INTEGER NOT NULL PRIMARY KEY, value INTEGER NOT NULL);
INSERT INTO leader_board(id, value) VALUES(1, 6);
INSERT INTO leader_board(id, value) VALUES(2, 4);
INSERT INTO leader_board(id, value) VALUES(3, 10);
INSERT INTO leader_board(id, value) VALUES(4, 2);
INSERT INTO leader_board(id, value) VALUES(5, 7);
INSERT INTO leader_board(id, value) VALUES(6, 3);
This query works on the data shown, assuming that the special ID is 4:
SELECT b.position - c.tied + 1 AS standing, a.id, a.value
FROM leader_board AS a
JOIN (SELECT COUNT(*) AS position, d.id
FROM leader_board AS d
JOIN leader_board AS e ON (d.value <= e.value)
GROUP BY d.id
) AS b
ON a.id = b.id
JOIN (SELECT COUNT(*) AS tied, f.id
FROM leader_board AS f
JOIN leader_board AS g ON (f.value = g.value)
GROUP BY f.id
) AS c
ON a.id = c.id
WHERE (a.id = 4 OR (b.position - c.tied + 1) <= 3) -- Special ID = 4; Top N = 3
ORDER BY position, a.id;
Output on original data:
standing id value
1 3 10
2 5 7
3 1 6
6 4 2
Explanation
The two sub-queries are closely related, but they produce different answers. At one time, I used two temporary tables to hold those results. In particular, the first sub-query (AS b) produces a position, but when there are ties, every tied row gets the last (numerically largest) of the tied positions rather than the first. That is, given:
ID Value
1 10
2 7
3 7
4 7
The outputs will be:
Position ID
1 1
4 2
4 3
4 4
However, we would like to count them as:
Position ID
1 1
2 2
2 3
2 4
So, the corrected position is the original position minus the number of tied values (3 for ID ∈ { 2, 3, 4 }, 1 for ID 1) plus 1. The second sub-query returns the number of tied values for each ID. There might be a neater way to do that calculation, but I'm not sure what it is at the moment.
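The "position minus ties plus one" formula is standard competition ranking ("1224" ranking), which is also what SQL's RANK() window function produces. Below is a sketch that runs the two sub-queries in SQLite through Python's sqlite3 module (3.25+ for RANK()) on the tie example above. It puts the computed standing next to RANK() for comparison. The function name `standings` is hypothetical.

```python
import sqlite3

# Tie example from the explanation: ids 2, 3 and 4 share the value 7.
TIES = [(1, 10), (2, 7), (3, 7), (4, 7)]

QUERY = """
SELECT a.id,
       b.position,                          -- rows with value >= this value
       c.tied,                              -- rows with exactly this value
       b.position - c.tied + 1 AS standing,
       RANK() OVER (ORDER BY a.value DESC) AS sql_rank
FROM leader_board AS a
JOIN (SELECT COUNT(*) AS position, d.id
      FROM leader_board AS d
      JOIN leader_board AS e ON (d.value <= e.value)
      GROUP BY d.id) AS b ON a.id = b.id
JOIN (SELECT COUNT(*) AS tied, f.id
      FROM leader_board AS f
      JOIN leader_board AS g ON (f.value = g.value)
      GROUP BY f.id) AS c ON a.id = c.id
ORDER BY a.id
"""


def standings(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE leader_board (id INTEGER NOT NULL PRIMARY KEY, "
                "value INTEGER NOT NULL)")
    con.executemany("INSERT INTO leader_board (id, value) VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in standings(TIES):
    print(row)  # (id, position, tied, standing, sql_rank)
```

For the tie example, ids 2, 3 and 4 have position 4 and tied 3, so their standing is 4 - 3 + 1 = 2, the same as RANK().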
Special cases
However, the code should demonstrate that it handles the cases where:
There are 2 or more ID values with the same top value.
There are 2 or more ID values with the same second highest top score (but the top one is unique).
There are 2 or more ID values with the same third highest top score (but the top two are unique).
To save rewriting the query each time, I converted it into an Informix-style stored procedure that takes the special ID and the top N (defaulting to 3) as parameters. (Yes, the notation in the RETURNING clause is weird.)
CREATE PROCEDURE leader_board_standings(extra_id INTEGER, top_n INTEGER DEFAULT 3)
RETURNING INTEGER AS standing, INTEGER AS id, INTEGER AS value;
DEFINE standing, id, value INTEGER;
FOREACH SELECT b.position - c.tied + 1 AS standing, a.id, a.value
INTO standing, id, value
FROM leader_board AS a
JOIN (SELECT COUNT(*) AS position, d.id
FROM leader_board AS d
JOIN leader_board AS e ON (d.value <= e.value)
GROUP BY d.id
) AS b
ON a.id = b.id
JOIN (SELECT COUNT(*) AS tied, f.id
FROM leader_board AS f
JOIN leader_board AS g ON (f.value = g.value)
GROUP BY f.id
) AS c
ON a.id = c.id
WHERE (a.id = extra_id OR (b.position - c.tied + 1) <= top_n)
ORDER BY position, a.id
RETURN standing, id, value WITH RESUME;
END FOREACH;
END PROCEDURE;
This can be invoked to produce the same result as before:
EXECUTE PROCEDURE leader_board_standings(4);
To illustrate the various cases outlined above, add and remove extra rows:
EXECUTE PROCEDURE leader_board_standings(4);
1 3 10
2 5 7
3 1 6
6 4 2
INSERT INTO leader_board(id, value) VALUES(10, 10);
EXECUTE PROCEDURE leader_board_standings(4);
1 3 10
1 10 10
3 5 7
7 4 2
INSERT INTO leader_board(id, value) VALUES(11, 10);
EXECUTE PROCEDURE leader_board_standings(4);
1 3 10
1 10 10
1 11 10
8 4 2
INSERT INTO leader_board(id, value) VALUES(12, 10);
EXECUTE PROCEDURE leader_board_standings(4);
1 3 10
1 10 10
1 11 10
1 12 10
9 4 2
DELETE FROM leader_board WHERE id IN (10, 11, 12);
EXECUTE PROCEDURE leader_board_standings(6, 4); -- Special ID 6; Top 4
1 3 10
2 5 7
3 1 6
4 2 4
5 6 3
INSERT INTO leader_board(id, value) VALUES(7, 7);
EXECUTE PROCEDURE leader_board_standings(4);
1 3 10
2 5 7
2 7 7
7 4 2
INSERT INTO leader_board(id, value) VALUES(13, 7);
EXECUTE PROCEDURE leader_board_standings(4);
1 3 10
2 5 7
2 7 7
2 13 7
8 4 2
INSERT INTO leader_board(id, value) VALUES(14, 7);
EXECUTE PROCEDURE leader_board_standings(4);
1 3 10
2 5 7
2 7 7
2 13 7
2 14 7
9 4 2
DELETE FROM leader_board WHERE id IN(7, 13, 14);
INSERT INTO leader_board(id, value) VALUES(8, 6);
EXECUTE PROCEDURE leader_board_standings(4);
1 3 10
2 5 7
3 1 6
3 8 6
7 4 2
INSERT INTO leader_board(id, value) VALUES(9, 6);
EXECUTE PROCEDURE leader_board_standings(4);
1 3 10
2 5 7
3 1 6
3 8 6
3 9 6
8 4 2
INSERT INTO leader_board(id, value) VALUES(15, 6);
EXECUTE PROCEDURE leader_board_standings(4);
1 3 10
2 5 7
3 1 6
3 8 6
3 9 6
3 15 6
9 4 2
EXECUTE PROCEDURE leader_board_standings(3); -- Special ID 3 appears in top 3
1 3 10
2 5 7
3 1 6
That all looks correct to me.