SELECT TOP PERCENT, VaR, Expected Shortfall in MySQL


I would like to achieve SELECT TOP PERCENT in MySQL.
I used Victor Sorokin's idea in Select TOP X (or bottom) percent for numeric values in MySQL, and got the following query:
SELECT x.log AS Login,
       AVG(x.PROFIT) AS 'Expected Shortfall',
       MAX(x.PROFIT) AS '40%VaR'
FROM
  (SELECT t.PROFIT,
          @counter := @counter + 1 AS counter,
          t.LOGIN AS log
   FROM (SELECT @counter := 0) initvar, trades AS t
   WHERE t.LOGIN IN (100,101)
   ORDER BY t.PROFIT) AS x
WHERE x.counter <= (40/100 * @counter)
GROUP BY x.log;
This returns the following result:
Login  Expected Shortfall  40%VaR
101    -85                 -70
The query works when I change WHERE t.LOGIN IN (100,101) to a single value such as WHERE t.LOGIN=100. Run once per login, it returns:
Login  Expected Shortfall  40%VaR
100    -4.5                -4
Login  Expected Shortfall  40%VaR
101    -95                 -90
I'm not sure what is happening. Is there a way to make the query work for multiple accounts, or a better way to solve this? I was thinking of a LOOP statement.
I'm currently using MySQL version 5.7.34. Please do not hesitate to let me know if any clarification is needed. Any ideas would be much appreciated!
Edit: To replicate the issue:
CREATE TABLE trades (
TICKET int(11) PRIMARY KEY,
LOGIN int(11),
PROFIT double);
INSERT INTO trades (TICKET,LOGIN,PROFIT)
VALUES
(1,100,-5),
(2,100,-4),
(3,100,-3),
(4,100,-2),
(5,100,-1),
(6,101,-100),
(7,101,-90),
(8,101,-80),
(9,101,-70),
(10,101,-60),
(11,101,-50),
(12,101,500);
The expected output is just like the outputs you would get if you ran the query for 100 and 101 separately:
Expected output:
LOGIN  ES    40%VAR
100    -4.5  -4
101    -95   -90

The combined result differs from the single-value queries because of the @counter (row number) assignment. Running the base query (the subquery) on its own returns the following:
PROFIT  counter  log
-100    1        101
-90     2        101
-80     3        101
-70     4        101
-60     5        101
-50     6        101
-5      7        100
-4      8        100
-3      9        100
-2      10       100
-1      11       100
500     12       101
As you can see, the counter generated with @counter is a running number over all rows, regardless of their log value. Compare this with the same subquery for a single log value:
PROFIT  counter  log
-5      1        100
-4      2        100
-3      3        100
-2      4        100
-1      5        100
With log=100 alone, the counter runs from 1 to 5, whereas in the combined LOGIN IN (100,101) query the same rows are numbered 7 to 11. The cutoff is also based on the combined row count (40/100 * 12 = 4.8), so WHERE x.counter <= (40/100 * @counter) keeps only counters 1 to 4, and those rows all belong to log=101. What you need is a counter that restarts for each log, compared against each log's own row count. On MySQL 8.0+ (or MariaDB 10.2+), which support window functions, this can be done with ROW_NUMBER(). Since the OP is on an older version, I emulated ROW_NUMBER() with user variables instead.
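The numbering difference can be reproduced in a few lines of plain Python. This sketch is not part of the original answer; it mimics both schemes on the question's sample data: one counter across all logins ordered by PROFIT (the original query), and a counter that restarts per login with a per-login cutoff (what ROW_NUMBER() OVER (PARTITION BY LOGIN ORDER BY PROFIT) would give). The function names are illustrative only.

```python
from collections import defaultdict

# (LOGIN, PROFIT) pairs taken from the question's sample data.
TRADES = [
    (100, -5), (100, -4), (100, -3), (100, -2), (100, -1),
    (101, -100), (101, -90), (101, -80), (101, -70),
    (101, -60), (101, -50), (101, 500),
]


def summarize(kept):
    """Return {login: (expected_shortfall, var)} for the kept rows."""
    by_login = defaultdict(list)
    for login, profit in kept:
        by_login[login].append(profit)
    return {login: (sum(p) / len(p), max(p)) for login, p in sorted(by_login.items())}


def global_counter(trades, fraction=0.40):
    """One counter over all logins, ordered by PROFIT (the original query)."""
    ordered = sorted(trades, key=lambda row: row[1])
    cutoff = fraction * len(ordered)  # like 40/100 * @counter after the last row
    return summarize([row for i, row in enumerate(ordered, start=1) if i <= cutoff])


def per_login_counter(trades, fraction=0.40):
    """A counter that restarts for each login, with a per-login cutoff."""
    by_login = defaultdict(list)
    for login, profit in trades:
        by_login[login].append(profit)
    kept = []
    for login, profits in by_login.items():
        profits.sort()  # worst (most negative) profit first
        cutoff = fraction * len(profits)
        kept += [(login, p) for i, p in enumerate(profits, start=1) if i <= cutoff]
    return summarize(kept)


print(global_counter(TRADES))     # {101: (-85.0, -70)}
print(per_login_counter(TRADES))  # {100: (-4.5, -4), 101: (-95.0, -90)}
```

The first call reproduces the single wrong row from the question; the second matches the expected output.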
This is the final query:
SELECT x.log AS Login,
       AVG(x.PROFIT) AS 'Expected Shortfall',
       MAX(x.PROFIT) AS '40%VaR'
FROM
  (SELECT t.PROFIT,
          -- restart the counter whenever LOGIN changes
          @row_number := CASE
                           WHEN @id = t.LOGIN THEN @row_number + 1
                           ELSE 1
                         END AS counter,
          @id := t.LOGIN AS id,
          t.LOGIN AS log
   FROM trades t
   CROSS JOIN (SELECT @id := 0, @row_number := 0) AS n
   -- sort by PROFIT within each LOGIN so counter 1 is the worst trade
   ORDER BY t.LOGIN, t.PROFIT) AS x
JOIN (SELECT LOGIN, COUNT(*) AS ctr FROM trades GROUP BY LOGIN) AS v
  ON x.log = v.LOGIN
WHERE x.counter <= (40/100 * v.ctr)
GROUP BY x.log
ORDER BY x.log;
Here is a demo fiddle (which also includes the ROW_NUMBER() query for MySQL 8.0+).
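For MySQL 8.0+, a minimal sketch of the ROW_NUMBER() version is shown below. It runs through Python's built-in sqlite3 module so it works without a MySQL server, and it assumes a SQLite build with window functions (3.25 or newer); the query text is expected to carry over to MySQL 8.0 largely unchanged, but that is an assumption rather than something tested here. The cutoff is passed as 0.40 because in SQLite 40/100 is integer division and evaluates to 0, whereas MySQL's / returns a decimal. Window functions also sidestep a documented MySQL caveat: the evaluation order of expressions involving user variables is undefined.

```python
import sqlite3

# (TICKET, LOGIN, PROFIT) rows from the question.
ROWS = [
    (1, 100, -5), (2, 100, -4), (3, 100, -3), (4, 100, -2), (5, 100, -1),
    (6, 101, -100), (7, 101, -90), (8, 101, -80), (9, 101, -70),
    (10, 101, -60), (11, 101, -50), (12, 101, 500),
]

VAR_ES_SQL = """
WITH ranked AS (
    SELECT LOGIN, PROFIT,
           -- counter restarts for every LOGIN, worst trade first
           ROW_NUMBER() OVER (PARTITION BY LOGIN ORDER BY PROFIT) AS counter,
           COUNT(*) OVER (PARTITION BY LOGIN) AS ctr
    FROM trades
)
SELECT LOGIN,
       AVG(PROFIT) AS expected_shortfall,
       MAX(PROFIT) AS var_40
FROM ranked
WHERE counter <= :fraction * ctr
GROUP BY LOGIN
ORDER BY LOGIN
"""


def make_conn():
    """In-memory copy of the question's trades table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trades (TICKET INTEGER PRIMARY KEY, LOGIN INTEGER, PROFIT DOUBLE)")
    conn.executemany("INSERT INTO trades VALUES (?, ?, ?)", ROWS)
    return conn


def var_es(conn, fraction=0.40):
    # Pass the fraction as a REAL: in SQLite 40/100 would be integer division (0).
    return conn.execute(VAR_ES_SQL, {"fraction": fraction}).fetchall()


print(var_es(make_conn()))  # [(100, -4.5, -4.0), (101, -95.0, -90.0)]
```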

Related

mysql query to identify groups of data based on timestamp

I have smart meter records in a MySQL database.
In timestamp order, they generally look like this:
key     timestamp            watt now
000001  2022-10-04-01-01-01  10
000002  2022-10-04-01-02-01  10
000003  2022-10-04-01-03-01  101
000004  2022-10-04-01-04-01  101
000005  2022-10-04-01-05-01  102
000006  2022-10-04-01-06-01  101
000007  2022-10-04-01-07-01  102
000008  2022-10-04-01-08-01  10
000009  2022-10-04-01-09-01  10
000010  2022-10-04-01-09-01  10
000011  2022-10-04-01-09-01  107
000012  2022-10-04-01-09-01  101
000013  2022-10-04-01-09-01  109
000014  2022-10-04-01-09-01  10
000015  2022-10-04-01-09-01  10
I want to identify the groups of consecutive rows with a higher value (let's say > 100)
and give them an increasing id. I also want to get the first and last key of each group.
The result of the query should look like this:
month  day  numbers of group  first id  last id  average watt
10     04   0                 000003    000007   102
10     04   1                 000011    000013   105
Any help appreciated.

You'll need something to identify the rows as a group. My first thought was to use RANK() or DENSE_RANK(), but after multiple tries I couldn't find a way. Then I thought about using LAG(), but I was still stuck on how to mark the rows as a new group. After testing many times, I came up with this suggestion:
WITH cte AS (
  SELECT s1.*,
         @n := COALESCE(IF(s1.skey = 1, 1, s2.skey), @n) AS newGroup
  FROM smartmeter s1
  LEFT JOIN (
    SELECT skey,
           stimestamp,
           watt,
           LENGTH(watt) AS lenwatt,
           LAG(LENGTH(watt)) OVER (ORDER BY skey) AS llwatt
    FROM smartmeter) s2
    ON s1.skey = s2.skey
   AND lenwatt != llwatt)
SELECT MONTH(stimestamp) AS Month,
       DAY(stimestamp) AS Day,
       ROW_NUMBER() OVER (ORDER BY MIN(skey)) AS 'numbers of group',
       MIN(skey) AS 'first id',
       MAX(skey) AS 'last id',
       AVG(watt) AS 'Average watt',
       CEIL(AVG(watt)) AS 'Average watt rounded',
       newGroup
FROM cte
WHERE watt >= 100
GROUP BY newGroup, MONTH(stimestamp), DAY(stimestamp);
By the way, I've changed some of your column names because key is a reserved word. Although you can use it as a column name as long as you wrap it in backticks, I personally find it a hassle to do that every time.
My idea was to compare LENGTH(watt) of each row with that of the previous row, using LAG() ordered by skey. Rows whose length differs from the previous row mark the start of a new group. I then left join that result with the smartmeter table. The next challenge was to assign each of the remaining rows to the group started by the most recent non-matching skey; for that I found this answer and applied its technique in the CTE.
Once that is done, I wrote another query to produce your expected result, although some parts of it are not exactly what you expected.
Here's a demo fiddle
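A more common pattern for this kind of problem is "gaps and islands": flag each row as high or low, mark the rows where the flag changes, and turn a running sum of those marks into a group id. The sketch below is a swapped-in technique, not the answer's LENGTH()-based approach. It runs through Python's sqlite3 (SQLite 3.25+ for window functions) against a hypothetical simplified table smartmeter(skey, watt) with integer keys; month and day are left out, and adding the date to the GROUP BY would split groups per day.

```python
import sqlite3

# (skey, watt) pairs from the question; keys as integers, timestamps omitted.
READINGS = [
    (1, 10), (2, 10), (3, 101), (4, 101), (5, 102), (6, 101), (7, 102),
    (8, 10), (9, 10), (10, 10), (11, 107), (12, 101), (13, 109),
    (14, 10), (15, 10),
]

ISLANDS_SQL = """
WITH flagged AS (
    SELECT skey, watt,
           watt > :threshold AS high,
           -- 1 where the high/low state differs from the previous row
           CASE WHEN (watt > :threshold)
                     <> LAG(watt > :threshold, 1, 0) OVER (ORDER BY skey)
                THEN 1 ELSE 0 END AS is_start
    FROM smartmeter
),
grouped AS (
    -- running total of state changes gives each island its own id
    SELECT skey, watt, high,
           SUM(is_start) OVER (ORDER BY skey) AS island
    FROM flagged
)
SELECT MIN(skey), MAX(skey), AVG(watt)
FROM grouped
WHERE high = 1
GROUP BY island
ORDER BY MIN(skey)
"""


def make_conn():
    """In-memory table holding the sample readings."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE smartmeter (skey INTEGER PRIMARY KEY, watt INTEGER)")
    conn.executemany("INSERT INTO smartmeter VALUES (?, ?)", READINGS)
    return conn


def high_groups(conn, threshold=100):
    """Return (group number, first key, last key, average watt) per high run."""
    rows = conn.execute(ISLANDS_SQL, {"threshold": threshold}).fetchall()
    return [(n, first, last, avg) for n, (first, last, avg) in enumerate(rows)]


for n, first, last, avg in high_groups(make_conn()):
    print(n, f"{first:06d}", f"{last:06d}", round(avg, 2))
```

Group numbers start at 0 as in the expected output; the averages are unrounded (101.4 and about 105.67), so apply CEIL or ROUND as needed.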

Get the average of values in every specific epoch ranges in unix timestamp which returns -1 in specific condition in MySQL

I have a MySQL table which has some records as follows:
unix_timestamp value
1001 2
1003 3
1012 1
1025 5
1040 0
1101 3
1105 4
1130 0
...
I want to compute the average value for every range of 10 timestamps (seconds), returning -1 for ranges that have no records:
unix_timestamp_range avg_value
1001-1010 2.5
1011-1020 1
1021-1030 5
1031-1040 0
1041-1050 -1
1051-1060 -1
1061-1070 -1
1071-1080 -1
1081-1090 -1
1091-1100 -1
1101-1110 3.5
1111-1120 -1
1121-1130 0
...
I have looked at several similar answers, but none of them solves my specific problem. How can I get the above results?

The easiest way to do this is to use a calendar table. Consider this approach:
SELECT
CONCAT(CAST(cal.ts AS CHAR(50)), '-', CAST(cal.ts + 9 AS CHAR(50))) AS unix_timestamp_range,
CASE WHEN COUNT(t.value) > 0 THEN AVG(t.value) ELSE -1 END AS avg_value
FROM
(
SELECT 1001 AS ts UNION ALL
SELECT 1011 UNION ALL
SELECT 1021 UNION ALL
...
) cal
LEFT JOIN yourTable t
ON t.unix_timestamp BETWEEN cal.ts AND cal.ts + 9
GROUP BY
cal.ts
ORDER BY
cal.ts;
In practice, if you need to run this sort of query often, you might want a dedicated table covering all timestamp ranges instead of the inline subquery labelled cal above.
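If your database supports recursive CTEs (MySQL 8.0+ does), the range starts can be generated instead of typed out. Below is a minimal sketch run through Python's sqlite3, assuming a hypothetical table readings(unix_timestamp, value); the first range starts at the smallest timestamp and each range is 10 seconds wide. SQLite concatenates with ||; in MySQL keep CONCAT() as in the answer, because || means logical OR there by default.

```python
import sqlite3

# (unix_timestamp, value) rows from the question.
SAMPLE = [(1001, 2), (1003, 3), (1012, 1), (1025, 5),
          (1040, 0), (1101, 3), (1105, 4), (1130, 0)]

BUCKETS_SQL = """
WITH RECURSIVE cal(ts) AS (
    SELECT :start
    UNION ALL
    SELECT ts + :width FROM cal WHERE ts + :width <= :stop
)
SELECT cal.ts || '-' || (cal.ts + :width - 1) AS unix_timestamp_range,
       CASE WHEN COUNT(t.value) > 0 THEN AVG(t.value) ELSE -1 END AS avg_value
FROM cal
LEFT JOIN readings t
       ON t.unix_timestamp BETWEEN cal.ts AND cal.ts + :width - 1
GROUP BY cal.ts
ORDER BY cal.ts
"""


def make_conn():
    """In-memory table holding the sample readings."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (unix_timestamp INTEGER, value INTEGER)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", SAMPLE)
    return conn


def bucket_averages(conn, width=10):
    """Average per `width`-second range; -1 for ranges without rows."""
    start, stop = conn.execute(
        "SELECT MIN(unix_timestamp), MAX(unix_timestamp) FROM readings").fetchone()
    if start is None:  # empty table: no ranges to report
        return []
    params = {"start": start, "stop": stop, "width": width}
    return conn.execute(BUCKETS_SQL, params).fetchall()


for rng, avg in bucket_averages(make_conn()):
    print(rng, avg)
```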

Select a distributed sample set of records from a MySQL set of many records

I have a table with many rows, inserted at a rate of 400-500 per minute (I know this isn't THAT many), but I need to do some sort of 'trend' analysis on the data collected over the last minute.
Instead of pulling all the records that have been entered and then processing each of them, I would like to select, say, 10 records spread somewhat evenly across the specified timeframe.
ID DEVICE_ID LA LO CREATED
-------------------------------------------------------------------
1 1 23.4 948.7 2018-12-13 00:00:01
2 2 22.4 948.2 2018-12-13 00:01:01
3 2 28.4 948.3 2018-12-13 00:02:22
4 1 26.4 948.6 2018-12-13 00:02:33
5 1 21.4 948.1 2018-12-13 00:02:42
6 1 22.4 948.3 2018-12-13 00:03:02
7 1 28.4 948.0 2018-12-13 00:03:11
8 2 23.4 948.8 2018-12-13 00:03:12
...
492 2 21.4 948.4 2018-12-13 00:03:25
493 1 22.4 948.2 2018-12-13 00:04:01
494 1 24.4 948.7 2018-12-13 00:04:02
495 2 27.4 948.1 2018-12-13 00:05:04
Given this data set, instead of pulling all those rows, I would like to pull one row every 50 records (10 rows out of roughly 500).
This does not need to be exact, I just need a sample in which to perform some sort of linear regression on.
Is this even possible? I can do it in my application code if need be, but I wanted to see if there was a function or something in MySQL that would handle this.
Edit
Here is the query I have tried, which works for now - but I would like the results more evenly distributed, not by RAND().
SELECT * FROM (
SELECT * FROM (
SELECT t.*, DATE_SUB(NOW(), INTERVAL 30 HOUR) as offsetdate
from tracking t
HAVING created > offsetdate) as parp
ORDER BY RAND()
LIMIT 10) as mastr
ORDER BY id ASC;

Do not use ORDER BY RAND(): a random value is calculated for every row, all rows are then sorted, and only after that are a few records selected.
You can try something like this:
SELECT
    *
FROM
    (
    SELECT
        tracking.*
        , @rownum := @rownum + 1 AS rownum
    FROM
        tracking
        , (SELECT @rownum := 0) AS dummy
    WHERE
        created > DATE_SUB(NOW(), INTERVAL 30 HOUR)
    ORDER BY
        created
    ) AS s
WHERE
    (rownum % 10) = 0;
An index on created is a must.
You might also consider a condition such as AND (UNIX_TIMESTAMP(created) % 60 = 0), which is slightly different from what you asked for but might be acceptable, depending on how your inserts are distributed.
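On MySQL 8.0+, window functions give an evenly spread sample without RAND() or user variables: NTILE(n) splits the ordered rows into n near-equal buckets, and keeping the first row of each bucket yields n rows spread across the time window. This is a swapped-in technique, not the answer's. The sketch runs through Python's sqlite3 (SQLite 3.25+) on a made-up tracking table with one row per second; the table contents and the since value are illustrative assumptions.

```python
import sqlite3
from datetime import datetime, timedelta


def sample_sql(samples):
    # NTILE(n) splits the ordered rows into n near-equal buckets;
    # the first row of each bucket is kept. int() guards the formatted value.
    return f"""
    WITH bucketed AS (
        SELECT id, created,
               NTILE({int(samples)}) OVER (ORDER BY created, id) AS bucket
        FROM tracking
        WHERE created > :since
    ),
    ranked AS (
        SELECT id, created,
               ROW_NUMBER() OVER (PARTITION BY bucket ORDER BY created, id) AS rn
        FROM bucketed
    )
    SELECT id, created FROM ranked WHERE rn = 1 ORDER BY created, id
    """


def make_conn(n_rows=500):
    """Hypothetical tracking table: one row per second from 2018-12-13 00:00:00."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tracking (id INTEGER PRIMARY KEY, device_id INTEGER, created TEXT)")
    conn.execute("CREATE INDEX idx_tracking_created ON tracking (created)")
    base = datetime(2018, 12, 13)
    rows = [(i + 1, i % 2 + 1, (base + timedelta(seconds=i)).strftime("%Y-%m-%d %H:%M:%S"))
            for i in range(n_rows)]
    conn.executemany("INSERT INTO tracking VALUES (?, ?, ?)", rows)
    return conn


def sample_rows(conn, since, samples=10):
    return conn.execute(sample_sql(samples), {"since": since}).fetchall()


for row in sample_rows(make_conn(), since="2018-12-12 23:59:00"):
    print(row)
```

With 500 rows and 10 samples this keeps ids 1, 51, 101, ..., 451, one row every 50 records as the question asks.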

How to get the lowest price from a particular group of users MYSQL

I have been at this for a few days without much luck, and I am looking for some guidance on how to get the lowest estimate from a particular group of suppliers and then insert it into another table.
I have 4 supplier estimates for every piece of work, and all new estimates go into a single table. I am trying to find the lowest 'mid' price among the 4 newest entries in the 'RECENT QUOTE TABLE' with a group id of '1', and then insert that row into the 'LOWEST QUOTE TABLE', as shown below.
RECENT QUOTE TABLE:
suppid group min mid high
1 1 200 400 600
2 1 300 500 700
3 1 100 300 500
[4] [1] 50 [150] 300
5 2 1000 3000 5000
6 2 3000 5000 8000
7 2 2000 4000 6000
8 2 1250 3125 5578
LOWEST QUOTE TABLE:
suppid group min mid high
4 1 50 150 300
Any help on how to structure this would be great, as I have been looking for a few days and have not found anything to get me moving again. I'm using MySQL and the app is written in Python; I'm open to all suggestions.
Thanks in advance.

If you really want to select only the row from group 1, you can do something like:
INSERT INTO lowest_quote_table
SELECT * FROM recent_quote_table
WHERE `group` = 1
ORDER BY `mid` ASC
LIMIT 1;
If you want the row with the lowest mid from every group, you can do something like:
INSERT INTO lowest_quote_table
SELECT rq.* FROM recent_quote_table AS rq
JOIN (
SELECT `group`, MIN(`mid`) AS min_mid FROM recent_quote_table
GROUP BY `group`
) AS MQ ON rq.`group` = MQ.`group` AND rq.`mid` = MQ.min_mid;
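A quick way to check the per-group query against the question's sample data is to run it through Python's sqlite3, which, like MySQL, accepts backtick-quoted identifiers. The table and column names follow the answer; the in-memory setup is an assumption for illustration. If two suppliers tie for the lowest mid in a group, the JOIN inserts both rows.

```python
import sqlite3

# (suppid, group, min, mid, high) rows from the question's RECENT QUOTE TABLE.
QUOTES = [
    (1, 1, 200, 400, 600), (2, 1, 300, 500, 700),
    (3, 1, 100, 300, 500), (4, 1, 50, 150, 300),
    (5, 2, 1000, 3000, 5000), (6, 2, 3000, 5000, 8000),
    (7, 2, 2000, 4000, 6000), (8, 2, 1250, 3125, 5578),
]

LOWEST_PER_GROUP = """
INSERT INTO lowest_quote_table
SELECT rq.* FROM recent_quote_table AS rq
JOIN (
    SELECT `group`, MIN(`mid`) AS min_mid FROM recent_quote_table
    GROUP BY `group`
) AS mq ON rq.`group` = mq.`group` AND rq.`mid` = mq.min_mid
"""


def make_conn():
    """Create both tables in memory and load the recent quotes."""
    conn = sqlite3.connect(":memory:")
    for table in ("recent_quote_table", "lowest_quote_table"):
        conn.execute(f"CREATE TABLE {table} "
                     "(suppid INTEGER, `group` INTEGER, `min` INTEGER, `mid` INTEGER, high INTEGER)")
    conn.executemany("INSERT INTO recent_quote_table VALUES (?, ?, ?, ?, ?)", QUOTES)
    return conn


conn = make_conn()
conn.execute(LOWEST_PER_GROUP)
print(conn.execute("SELECT * FROM lowest_quote_table ORDER BY `group`").fetchall())
# [(4, 1, 50, 150, 300), (5, 2, 1000, 3000, 5000)]
```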

How to sum up Run time hours based on Work Center using SQL

I am trying to sum up Run Time hours based on Work Center. For example, if work center 108 is present, then sum up the Run Time hours for work centers 572, 257, 107, 102 and 108. But if work center 108 is not present, do not sum them up.
Here is example with work center 108:
Work Center Run Time Hours
572 0.025
257 1
107 4.284
102 19.046
108 4.865
So the total Run Time hours should be 29.22.
But if work center 108 is not present, I do not want to sum up the Run Time hours, even if the other work centers 572, 257, 107 and 102 are present.
Work Center Run Time Hours
572 0.025
257 1
107 4.284
102 19.046
I tried:
SUM(
CASE
WHEN routingoperations.workcentreid IN ('108', '107','102','105','255','257','572')
THEN (dbo.routingoperations.runtime)
END) AS [Total Runtime Hours]
Thanks for all the help

Why not just use a GROUP BY clause? If work centre 108 is not in your dataset, it will not appear in the aggregation.
If you want to filter out some records, put them in the WHERE clause.
Just like this:
SELECT
    WorkCenter
    ,TotalRunTime = SUM([Run Time Hours])
FROM [table]
WHERE
    WorkCenter NOT IN ('1','2','3','4')
GROUP BY
    WorkCenter

Based on your last comment:
If this is a static case, you first need to check whether any record with work centre 108 exists, and only then compute the sum. You can do this with an IF EXISTS statement.
Like this:
IF EXISTS (SELECT 1 FROM WorkCenter WHERE WorkCenter = '108')
BEGIN
SELECT
TotalRunTime = SUM(RunTimeHours)
FROM WorkCenter
WHERE
WorkCenter in ('572','257','107','102','108')
END
ELSE
SELECT TotalRunTime = 0
If you have various different cases based on workcenters then this logic is not the best solution.
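The existence check and the sum can also be folded into one query with conditional aggregation, which avoids the separate IF EXISTS branch. This is a minimal sketch in standard SQL, run through Python's sqlite3 against a hypothetical routing_operations(workcenter, runtime) table; the CASE-inside-SUM syntax is also valid in SQL Server, but the table and column names here are illustrative, not the asker's schema.

```python
import sqlite3

TOTAL_SQL = """
SELECT CASE
         -- only sum when at least one row belongs to work centre 108
         WHEN SUM(CASE WHEN workcenter = '108' THEN 1 ELSE 0 END) > 0
         THEN SUM(runtime)
         ELSE 0
       END AS total_runtime_hours
FROM routing_operations
WHERE workcenter IN ('572', '257', '107', '102', '108')
"""


def total_runtime(rows):
    """Run TOTAL_SQL against an in-memory table holding (workcenter, runtime) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE routing_operations (workcenter TEXT, runtime REAL)")
    conn.executemany("INSERT INTO routing_operations VALUES (?, ?)", rows)
    (total,) = conn.execute(TOTAL_SQL).fetchone()
    conn.close()
    return total


WITH_108 = [("572", 0.025), ("257", 1), ("107", 4.284), ("102", 19.046), ("108", 4.865)]
print(round(total_runtime(WITH_108), 3))   # 29.22
print(total_runtime(WITH_108[:-1]))        # 0 (no row for 108)
```

With no matching rows at all, SUM returns NULL, the WHEN test is not true, and the query falls through to 0.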
