MySQL cumulative product group by - mysql

I've been working with the WRDS/CRSP dataset (a stock price database maintained by UPenn for academic research). I've been downloading the data in Python and inserting it into my local MySQL database.
The data looks like this and has primary key on (quote_date, security_id):
quote_date security_id tr accum_index
10-Jan-86 10002 null 1000
13-Jan-86 10002 -0.026595745 973.4042548
14-Jan-86 10002 0.005464481 978.7234036
15-Jan-86 10002 -0.016304348 962.7659569
16-Jan-86 10002 0 962.7659569
17-Jan-86 10002 0 962.7659569
20-Jan-86 10002 0 962.7659569
21-Jan-86 10002 0.005524862 968.0851061
22-Jan-86 10002 -0.005494506 962.765957
23-Jan-86 10002 0 962.765957
24-Jan-86 10002 -0.005524862 957.4468078
27-Jan-86 10002 0.005555556 962.7659569
28-Jan-86 10002 0 962.7659569
29-Jan-86 10002 0 962.7659569
30-Jan-86 10002 0 962.7659569
31-Jan-86 10002 0.027624309 989.3617013
3-Feb-86 10002 0.016129032 1005.319148
4-Feb-86 10002 0.042328041 1047.872338
5-Feb-86 10002 0.04568528 1095.744679
I need to calculate the accum_index column which is basically an index of the total return of the stock and is calculated as follows:
accum_index_t = accum_index_{t-1} * (1 + tr_t)
The table has 80m rows. I've written some code to iterate through every security_id and calculate a cumulative product, like so:
select @sid := min(security_id)
from stock_prices;
create temporary table prices (
    quote_date datetime,
    security_id int,
    tr double null,
    accum_index double null,
    PRIMARY KEY (quote_date, security_id)
);
while @sid is not null
do
    select 'security_id', @sid;
    select @accum := null;
    insert into prices
    select quote_date, security_id, tr, accum_index
    from stock_prices
    where security_id = @sid
    order by quote_date asc;
    update prices
    set accum_index = (@accum := ifnull(@accum * (1 + tr), 1000.0));
    update stock_prices p use index(PRIMARY), prices a use index(PRIMARY)
    set p.accum_index = a.accum_index
    where p.security_id = a.security_id
    and p.quote_date = a.quote_date;
    select @sid := min(security_id)
    from stock_prices
    where security_id > @sid;
    delete from prices;
end while;
drop table prices;
But this is too slow: it takes about a minute per security on my laptop, so calculating the whole series would take years. Is there a way to vectorise this?
Cheers,
Steve

If you're using MySQL 8, you could use window functions to create the cumulative product. Unfortunately, there is no PROD() aggregate / window function in any SQL database I'm aware of, but you can emulate it using EXP(SUM(LOG(factor))):
SELECT
    quote_date,
    security_id,
    tr,
    1000 * (EXP(SUM(LOG(1 + COALESCE(tr, 0)))
        OVER (PARTITION BY security_id ORDER BY quote_date)))
        AS accum_index
FROM stock_prices
dbfiddle here.
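As a quick sanity check of the EXP(SUM(LOG(...))) identity outside the database, here is a minimal Python sketch using the first few tr values from the question. The function names are made up for illustration, and it assumes every 1 + tr is strictly positive, since the logarithm is undefined otherwise (a return of -100% would break the trick).

```python
import math
from itertools import accumulate

# First rows of the sample data; None stands for the NULL tr on the first day.
tr = [None, -0.026595745, 0.005464481, -0.016304348, 0.0]

def accum_index_product(returns, base=1000.0):
    """Running product: index_t = index_{t-1} * (1 + tr_t)."""
    factors = [1.0 + (r or 0.0) for r in returns]
    return [base * f for f in accumulate(factors, lambda a, b: a * b)]

def accum_index_exp_sum_log(returns, base=1000.0):
    """Same series via exp(sum(log(1 + tr))), mirroring the SQL above."""
    logs = [math.log(1.0 + (r or 0.0)) for r in returns]
    return [base * math.exp(s) for s in accumulate(logs)]

by_product = accum_index_product(tr)
by_logs = accum_index_exp_sum_log(tr)
```

Both lists agree up to floating-point rounding and reproduce the accum_index values shown in the question for those dates.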

If you're using MySQL 5, you can emulate this by multiplying the current tr with the accumulated value of the previous rows, line by line, and then taking the accumulated value of the last line.
I assume tr is stored as a fractional return, so add 1 to each tr to turn it into a growth factor.
The accumulator starts at the neutral value 1.
Try this:
SET @variation = 1;
SET @row_number = 0;
SELECT accumulateTr
FROM
    (SELECT
        @row_number := (@row_number + 1) AS rowNumber,
        -- COALESCE treats the NULL tr on the first day as a zero return
        @variation := (1 + COALESCE(tr, 0)) * @variation AS accumulateTr
    FROM
        prices) accumulatedTrs
ORDER BY rowNumber DESC
LIMIT 1;

Related

MySQL select data on the basis of source type

I am working on MySQL. I have a table in which there are some records. Below is my table
CREATE TABLE `mdc_meters_data` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`msn` varchar(100) DEFAULT NULL,
`kwh_t` varchar(100) DEFAULT NULL,
`data_date_time` datetime DEFAULT NULL,
`s_type` varchar(100) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=52702 DEFAULT CHARSET=latin1;
/*Data for the table `mdc_meters_data` */
insert into `mdc_meters_data`(`id`,`msn`,`kwh_t`,`data_date_time`,`s_type`) values
(49641,'4A60193390662','2068.3','2020-11-01 00:02:17','WAPDA'),
(49642,'00209701','1476.59','2020-11-01 00:02:47','Sync Meter'),
(49643,'00209702','1389.79','2020-11-01 00:03:17','Sync Meter'),
(49644,'4A60193390662','2068.3','2020-11-01 00:04:57','WAPDA'),
(49645,'00209701','1476.6','2020-11-01 00:05:28','Sync Meter'),
(49646,'00209702','1389.81','2020-11-01 00:05:58','Sync Meter'),
(49647,'4A60193390662','2068.3','2020-11-01 00:07:38','WAPDA'),
(49648,'00209701','1476.6','2020-11-01 00:08:08','Sync Meter'),
(49649,'00209702','1389.81','2020-11-01 00:08:38','Sync Meter'),
(49650,'4A60193390662','2068.3','2020-11-01 00:10:19','WAPDA'),
(49651,'00209701','1476.6','2020-11-01 00:10:49','Sync Meter'),
(49652,'00209702','1389.82','2020-11-01 00:11:19','Sync Meter'),
(49653,'4A60193390662','2068.3','2020-11-01 00:12:59','Generator'),
(49654,'00209701','1476.61','2020-11-01 00:13:30','Sync Meter'),
(49655,'00209702','1389.83','2020-11-01 00:14:00','Sync Meter'),
(49656,'4A60193390662','2068.3','2020-11-01 00:15:40','Generator'),
(49657,'00209701','1476.61','2020-11-01 00:16:10','Sync Meter'),
(49658,'00209702','1389.84','2020-11-01 00:16:40','Sync Meter'),
(49659,'4A60193390662','2068.3','2020-11-01 00:18:20','Generator'),
(49660,'00209701','1476.61','2020-11-01 00:18:51','Sync Meter'),
(49661,'00209702','1389.84','2020-11-01 00:19:21','Sync Meter'),
(49662,'4A60193390662','2068.3','2020-11-01 00:21:01','Generator'),
(49663,'00209701','1476.61','2020-11-01 00:21:31','Sync Meter'),
(49664,'00209702','1389.85','2020-11-01 00:22:01','Sync Meter'),
(49665,'4A60193390662','2068.3','2020-11-01 00:23:42','WAPDA'),
(49666,'00209701','1476.62','2020-11-01 00:24:12','Sync Meter'),
(49667,'00209702','1389.86','2020-11-01 00:24:42','Sync Meter'),
(49668,'4A60193390662','2068.3','2020-11-01 00:26:22','WAPDA'),
(49669,'00209701','1476.63','2020-11-01 00:26:53','Sync Meter'),
(49670,'00209702','1389.88','2020-11-01 00:27:23','Sync Meter'),
(49671,'4A60193390662','2068.3','2020-11-01 00:29:03','WAPDA'),
(49672,'00209701','1476.63','2020-11-01 00:29:33','Sync Meter'),
(49673,'00209702','1389.88','2020-11-01 00:30:03','Sync Meter'),
(49674,'4A60193390662','2068.3','2020-11-01 00:31:44','WAPDA');
Same is in SQL Fiddle
What I have done
I am able to get the start and end date-time for the source named WAPDA, together with the MAX value of kwh_t within that period. I want to check this for every hour over a 24-hour span, so I wrote the query like this:
SELECT
msn,
MAX(kwh_t),
MIN(data_date_time),
MAX(data_date_time)
FROM mdc_meters_data
WHERE s_type = 'WAPDA'
AND data_date_time >= DATE '2020-11-01'
AND data_date_time < DATE '2020-11-02'
GROUP BY msn, DATE(data_date_time), HOUR(data_date_time)
ORDER BY msn, DATE(data_date_time), HOUR(data_date_time);
The above query gives me
msn | MAX(kwh_t)| MIN(data_date_time) | MAX(data_date_time)
=======================================================================
4A60193390662| 2068.3 | 2020-11-01T00:02:17Z | 2020-11-01T00:31:44Z
What I want?
The above result is not correct as seen in Fiddle at 2020-11-01T00:02:17Z the s_type is WAPDA and at 2020-11-01T00:12:59Z the s_type is Generator. Then again at 2020-11-01T00:23:42Z the s_type is again WAPDA and so on. I want to set my query in a way that it will give proper information according to the s_type like below
For WAPDA
msn | MAX(kwh_t)| MIN(data_date_time) | MAX(data_date_time)
=======================================================================
4A60193390662| 2068.3 | 2020-11-01T00:02:17Z | 2020-11-01T00:10:19Z
4A60193390662| 2068.3 | 2020-11-01T00:23:42Z | 2020-11-01T00:31:44Z
For Generator
msn | MAX(kwh_t)| MIN(data_date_time) | MAX(data_date_time)
=======================================================================
4A60193390663| 1000.3 | 2020-11-01T00:12:59Z | 2020-11-01T00:21:01Z
As there is no record for s_type = WAPDA after 2020-11-01T00:10:19Z and before 2020-11-01T00:22:01Z, the query must start each result row from the point where the records for that particular s_type begin. The same applies to s_type = Generator.
How to achieve it?
Any help would be highly appreciated
This is a gaps and islands problem. To solve your problem, you need to also group your readings according to the s_type value, so that (in your sample data) you extract two distinct groups of WAPDA values (separated by the Generator values). Basically you need to keep an overall row number as well as a row number for each island (so counting restarts whenever s_type changes). Subtracting the latter from the former gives you a constant number for each island, on which you can then group.
This is a tricky problem to solve in MySQL 5.x because of the lack of the ROW_NUMBER function, however that functionality can be emulated using variables. This query should give the results you want:
SELECT msn,
       s_type,
       MAX(kwh_t) AS max_kwh,
       MIN(data_date_time) AS min_date_time,
       MAX(data_date_time) AS max_date_time
FROM (
    SELECT md.*,
           @rn := @rn + 1 AS rn,
           @rst := CASE
                       WHEN @st = s_type THEN @rst + 1
                       WHEN @st := s_type THEN 1
                       ELSE 1
                   END AS rst
    FROM (
        SELECT *
        FROM mdc_meters_data
        WHERE s_type != 'Sync Meter'
          AND data_date_time >= '2020-11-01'
          AND data_date_time < '2020-11-02'
        ORDER BY data_date_time
    ) md
    CROSS JOIN (SELECT @rn := 0, @rst := 0, @st := '') init
) m
WHERE s_type = 'WAPDA'
GROUP BY msn, rn - rst, DATE(data_date_time), HOUR(data_date_time)
ORDER BY msn, min_date_time
Output (for your sample data):
msn s_type max_kwh min_date_time max_date_time
4A60193390662 WAPDA 2068.3 2020-11-01 00:02:17 2020-11-01 00:10:19
4A60193390662 WAPDA 2068.3 2020-11-01 00:23:42 2020-11-01 00:31:44
Demo (also showing results for s_type = 'Generator') on dbfiddle.
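To see why rn - rst is constant within each island, here is a small Python sketch of the same bookkeeping, using the WAPDA and Generator readings from the sample data. The function name and the tuple layout are illustrative assumptions, not part of the original query.

```python
from itertools import groupby

# (data_date_time, s_type) for msn 4A60193390662, taken from the sample data.
readings = [
    ("2020-11-01 00:02:17", "WAPDA"),
    ("2020-11-01 00:04:57", "WAPDA"),
    ("2020-11-01 00:07:38", "WAPDA"),
    ("2020-11-01 00:10:19", "WAPDA"),
    ("2020-11-01 00:12:59", "Generator"),
    ("2020-11-01 00:15:40", "Generator"),
    ("2020-11-01 00:18:20", "Generator"),
    ("2020-11-01 00:21:01", "Generator"),
    ("2020-11-01 00:23:42", "WAPDA"),
    ("2020-11-01 00:26:22", "WAPDA"),
    ("2020-11-01 00:29:03", "WAPDA"),
    ("2020-11-01 00:31:44", "WAPDA"),
]

def islands(rows):
    """Return (s_type, first_ts, last_ts) for each unbroken run of s_type."""
    keyed = []
    prev_type, rst = None, 0
    for rn, (ts, s_type) in enumerate(sorted(rows), start=1):
        # rst restarts at 1 whenever s_type changes, like @rst in the query
        rst = rst + 1 if s_type == prev_type else 1
        prev_type = s_type
        # rn - rst stays constant for every row of the same island
        keyed.append((s_type, rn - rst, ts))
    result = []
    for (s_type, _), grp in groupby(keyed, key=lambda k: (k[0], k[1])):
        stamps = [ts for _, _, ts in grp]
        result.append((s_type, stamps[0], stamps[-1]))
    return result

found = islands(readings)
```

For these rows the differences are 0 for the first WAPDA run, 4 for the Generator run and 8 for the second WAPDA run, so grouping on that difference separates the two WAPDA islands.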

How to select static values on mysql select query?

I am new to MySQL, and I am trying to get data from a database table.
select id,txnid,amount,status from txn_details;
With the above query I get the data successfully, but the status column comes back as 0, 1 or 2. I want 0 shown as failed, 1 as success and 2 as not processed.
How to change my query?
You can use a case
select id, txnid, amount,
case when status = 0 then 'failed'
when status = 1 then 'success'
else 'not processed'
end as status
from txn_details;
We can use an expression in the SELECT list. It could be a simple CASE expression, e.g.
SELECT CASE t.status
WHEN 0 THEN 'failed'
WHEN 1 THEN 'success'
WHEN 2 THEN 'not processed'
ELSE 'unknown'
END AS status_name
, t.status
, t.amount
, t.txnid
FROM txn_details t
This approach is ANSI-92 standards compliant, and will work in most relational databases.
There are some other MySQL specific alternatives, such as the ELT function ...
SELECT ELT(t.status+1,'failed','success','not processed') AS status_name
, t.status
, t.amount
, t.txnid
FROM txn_details t
https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_elt
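One difference between the two approaches is worth keeping in mind: ELT returns NULL when the index is below 1 or beyond the number of listed strings, so it has no built-in equivalent of the ELSE branch. The Python sketch below imitates that behaviour; `elt` and `status_name` are hypothetical helper names used only for illustration.

```python
def elt(n, *values):
    """Rough Python stand-in for MySQL's ELT(N, str1, str2, ...)."""
    if n is None or n < 1 or n > len(values):
        return None  # ELT yields NULL for an out-of-range index
    return values[n - 1]

def status_name(status):
    """Map a status code to a label, falling back like CASE ... ELSE."""
    if status is None:
        return "unknown"
    name = elt(status + 1, "failed", "success", "not processed")
    return name if name is not None else "unknown"

labels = [status_name(s) for s in (0, 1, 2, 4)]
```

In SQL, the same fallback could be obtained by wrapping the ELT call in COALESCE.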
If you prefer a central point of maintenance (ie you prefer not to recode all your queries when a new status comes along) you could create a status table and either use a join or sub query to get the values, alternatively you could create a function, for example
drop table if exists txn_details,txn_status;
create table txn_details(id int, txnid int, amount int , status int);
insert into txn_details values
(1,1,10,1),(2,1,10,2),(3,1,10,4);
create table txn_status (id int, statusval varchar(20));
insert into txn_status values
(1,'success'),(2,'not processed'), (3,'failed');
drop function if exists f;
delimiter $$
create function f(instatus int)
returns varchar(20)
deterministic
begin
    declare rval varchar(20);
    return (select
        case when instatus = 1 then 'success'
             when instatus = 2 then 'not processed'
             when instatus = 3 then 'failed'
             else 'Unknown'
        end
    );
end $$
delimiter ;
select t.*,coalesce(ts.statusval,'Unknown') status
from txn_details t
left join txn_status ts on ts.id = t.status;
select t.*,coalesce((select statusval from txn_status ts where ts.id = t.status),'Unknown') status
from txn_details t;
Note the use of coalesce in case a status is not found.
Both produce this result
+------+-------+--------+--------+---------------+
| id | txnid | amount | status | status |
+------+-------+--------+--------+---------------+
| 1 | 1 | 10 | 1 | success |
| 2 | 1 | 10 | 2 | not processed |
| 3 | 1 | 10 | 4 | Unknown |
+------+-------+--------+--------+---------------+
3 rows in set (0.00 sec)
Using the function like this
select t.*, f(status) as status
from txn_details t;
also produces the same result.
Of course using a status table or a function means you have to communicate their availability and enforce their use.
I would also consider using a foreign key constraint in txn_details to cut down on the number of unknown values, and putting procedures in place to stop people adding new status codes at will without going through change control.
The following query would work. It uses CASE ... END to determine and return values for the virtual column status.
SELECT id,txnid,amount,
CASE
WHEN status = 0 THEN 'failed'
WHEN status = 1 THEN 'success'
WHEN status= 2 THEN 'not processed'
END AS status
FROM txn_details;

how to count last 7 record of one table using mysql stored procedure ,in my query i want count isprinted as printed when isprinted=1

CREATE DEFINER=`root`@`localhost` PROCEDURE `sp_printnotprintdailyexpenses`(
in i_localBodyId varchar(10),
in i_epId int(20),
out printed INT(20),
out notprinted INT(20)
)
BEGIN
set printed=(select count(isPrinted) from tbl_dailyExpenses
where date((curdate() - 7)) and date(curdate())and
localBodyId =i_localBodyId and epId=i_epId and isPrinted=1 group by CURDATE()-7 );
set notprinted=(select count(isPrinted) from tbl_dailyExpenses
where date((curdate() - 7 )) and date(curdate())and
localBodyId =i_localBodyId and epId=i_epId and isPrinted=0 group by CURDATE()-7 );
END
I don't think you need a procedure. A simple query should suffice. Based on what I have been able to understand from this question (and the other one you asked) this should give you the results you want:
SELECT date,
localBodyId,
epId,
SUM(CASE WHEN isPrinted=1 THEN 1 ELSE 0 END) AS printed,
SUM(CASE WHEN isPrinted=0 THEN 1 ELSE 0 END) AS notprinted
FROM tbl_dailyExpenses
WHERE localBodyId = i_localBodyId AND epId = i_epId AND
date >= NOW() - INTERVAL 7 DAY
GROUP BY date, localBodyId, epId
ORDER BY date DESC
LIMIT 7
In this query you will need to replace i_localBodyId and i_epId with the values you want to test.
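The SUM(CASE WHEN ... THEN 1 ELSE 0 END) pattern is just a conditional count per group. A minimal Python sketch of the same idea follows; the rows are invented purely for illustration and do not come from the question.

```python
from collections import defaultdict

# Hypothetical (date, isPrinted) rows, invented for this example.
rows = [
    ("2016-04-01", 1),
    ("2016-04-01", 0),
    ("2016-04-01", 1),
    ("2016-04-02", 0),
]

def printed_counts(rows):
    """Count printed and not-printed rows per date in a single pass."""
    counts = defaultdict(lambda: {"printed": 0, "notprinted": 0})
    for day, is_printed in rows:
        # Each row adds 1 to exactly one of the two counters
        counts[day]["printed"] += 1 if is_printed == 1 else 0
        counts[day]["notprinted"] += 1 if is_printed == 0 else 0
    return dict(counts)

per_day = printed_counts(rows)
```

Both counters are filled in one scan of the data, which is why a single query can replace the two separate COUNT subqueries in the procedure.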

How to get the total from 2 column in a week

How can I get the total undertime for each employee per week and per month? I have a view with columns for undertime and overtime per day per employee; however, I need to get the TotalUndertime per week.
|EmpID |DayofWeek|DatePresent|Overtime |Undertime|
|3050001|Friday |2016-04-01 | |00:01:00 |
|3050001|Monday |2016-04-04 | |01:00:00 |
|3050001|Tuesday |2016-04-05 |00:30:00 | |
|3050001|Wednesday|2016-04-06 |00:30:00 | |
|3050001|Thursday |2016-04-07 |00:05:00 | |
|3050001|Friday |2016-04-08 |00:05:00 | |
If the employee has an Undertime on Monday, the employee can pay for the Undertime on the following days from Tuesday - Friday. Or if the employee has an Undertime on Tuesday, the employee has Wednesday - Friday to pay for the Undertime. The TotalUndertime = "00:01:00" in the table shown above.
I'm just a newbie when it comes to mysql queries using date and time. Should i use function or procedure?
I used this code to get it but it didn't work.
CREATE DEFINER = `root`@`localhost` PROCEDURE `getUndertime` ( IN `varDatePresent` DATE, IN `varEmpID` VARCHAR( 8 ) ) NOT DETERMINISTIC NO SQL SQL SECURITY DEFINER
BEGIN
    SELECT DAY( LAST_DAY( varDatePresent ) )
    INTO @totaldays ;
    SET @daycount =0;
    WHILE( @daycount < @totaldays ) DO
        SELECT Undertime
        FROM view_dtr
        WHERE EmpID LIKE '%varEmpID%'
        AND DatePresent = varDatePresent
        INTO @undertime ;
        SELECT Overtime
        FROM view_dtr
        WHERE EmpID LIKE '%varEmpID%'
        AND DATE_ADD( DatePresent, INTERVAL 1 DAY )
        INTO @overtime ;
        SET @totalUndertime = @undertime - @overtime ;
        SET @daycount = @daycount +1;
    END WHILE;
    SELECT @totalUndertime ;
END ;
Any suggestion will help me very much.
Thank You in advance.
Wouldn't Group By do the trick?
SELECT EmpID, WEEK(DatePresent) AS 'WeekNumber',
    -- convert the TIME values to seconds before subtracting, then back to TIME
    SEC_TO_TIME(SUM(COALESCE(TIME_TO_SEC(Undertime), 0) - COALESCE(TIME_TO_SEC(Overtime), 0))) AS 'TotalUndertime'
FROM table_name GROUP BY EmpID, WEEK(DatePresent)
Usage: https://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_week
Edit: Based on your definition of TotalUndertime. As a side note, I strongly suggest not to use LIKE when filtering your EMPID.
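To check the weekly netting against the sample rows, here is a hedged Python sketch. It makes two assumptions that are not stated in the answer: weeks are ISO weeks (Monday-based, which can differ from the default mode of MySQL's WEEK()), and surplus overtime in a week is clamped to zero rather than carried over as negative undertime.

```python
from collections import defaultdict
from datetime import date, timedelta

# (DatePresent, overtime, undertime) rows for EmpID 3050001 from the question.
rows = [
    (date(2016, 4, 1), timedelta(0), timedelta(minutes=1)),
    (date(2016, 4, 4), timedelta(0), timedelta(hours=1)),
    (date(2016, 4, 5), timedelta(minutes=30), timedelta(0)),
    (date(2016, 4, 6), timedelta(minutes=30), timedelta(0)),
    (date(2016, 4, 7), timedelta(minutes=5), timedelta(0)),
    (date(2016, 4, 8), timedelta(minutes=5), timedelta(0)),
]

def weekly_undertime(rows):
    """Net undertime (undertime minus overtime) per ISO (year, week)."""
    net = defaultdict(timedelta)
    for day, overtime, undertime in rows:
        iso = day.isocalendar()
        net[(iso[0], iso[1])] += undertime - overtime
    # Assumption: extra overtime does not create negative undertime.
    return {week: max(total, timedelta(0)) for week, total in net.items()}

per_week = weekly_undertime(rows)
total = sum(per_week.values(), timedelta(0))
```

Under those assumptions, the week containing 2016-04-01 keeps its one minute of undertime, while the following week's 70 minutes of overtime fully cover the one hour of undertime, which gives the 00:01:00 total mentioned in the question.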

How to loop a MYSQL query / maybe in a function?

What I've got so far is a code which updates one row as based on the IF conditions - but it
It is important that it is reliable and atomic (so with transaction).
It should loop/repeat itself based on @q which is quantity - however it can be in a MYSQL function rather than one fixed variable. How can I do that? I've added how the table should look like in the end
SET @q=1000;
SET @p=5.00;
SET @email='test@test.com';
update 1detail
SET quantity =
    if((@q := quantity - @q) >= 0, @q, 0)
WHERE price>=@p ORDER BY datetime DESC LIMIT 1;
SET @q = if(@q <0,@q-@q-@q, @q);
UPDATE 1detail
SET quantity =
    if(@q > 0, @q,quantity),
    email=
    if(@q > 0, @email, email)
WHERE price>=@p ORDER BY datetime DESC LIMIT 1;
The table at the beginning
quantity price email datetime
---------------------------------------------------
800 5.00 test1@test.com oldest
50 5.00 test2@test.com 2nd oldest
100 10.00 test3@test.com 3rd oldest (ignore in processing because @p < price)
How it should looks like after looping
quantity price email datetime
-----------------------------------------------------------------
0 5.00 test1@test.com oldest
150* 5.00 test@test.com (changed to @email) 2nd oldest (now newest)
100 10.00 test3@test.com 3rd oldest (ignored)
*this has changed as the oldest only had 800 and the 2nd oldest just had 50
So @q= 1000 (@q) - 800 (oldest) = 200
then @q = 200 (@q) - 50 (2nd oldest)
--> @q = 150
this updates the 2nd oldest row
A small loop can be achieved like this :
declare @counter int
set @counter=0
gohere:
set @counter=@counter+1
if @counter<10
BEGIN
    select @counter
    GOTO gohere;
END