Join two tables, different date frequency MySQL - mysql

I'm using MySQL to calculate returns for my portfolio. So, I have a table for portfolios; say the holding period is 6 months:
table Portfolio
DATE_ TICKER WEIGHT
2007-01-31 AAPL 0.2
2007-01-31 IBM 0.2
2007-01-31 FB 0.3
2007-01-31 MMM 0.3
2007-07-31 AAPL 0.1
2007-07-31 FB 0.8
2007-07-31 AMD 0.1
... ... ...
And I have a monthly stats table for these companies (the whole universe of stocks), including monthly returns:
table stats
DATE_ TICKER RETURN OTHER_STATS
2007-01-31 AAPL 0.01 ...
2007-01-31 IBM 0.03 ...
2007-01-31 FB 0.13 ...
2007-01-31 MMM -0.07 ...
2007-02-28 AAPL 0.03 ...
2007-02-28 IBM 0.04 ...
2007-02-28 FB 0.06 ...
2007-02-28 MMM -0.10 ...
I'm re-balancing the portfolio every 6 months, so during these 6 months the weights of each stock won't change. What I want to get is something like this:
ResultTable
DATE_ TICKER RETURN OTHER_STATS WEIGHT
2007-01-31 AAPL 0.01 ... 0.2
2007-01-31 IBM 0.03 ... 0.2
2007-01-31 FB 0.13 ... 0.3
2007-01-31 MMM -0.07 ... 0.3
2007-02-28 AAPL 0.03 ... 0.2
2007-02-28 IBM 0.04 ... 0.2
2007-02-28 FB 0.06 ... 0.3
2007-02-28 MMM -0.10 ... 0.3
2007-03-31 AAPL 0.03 ... 0.2
2007-03-31 IBM 0.14 ... 0.2
2007-03-31 FB 0.16 ... 0.3
2007-03-31 MMM -0.06 ... 0.3
... ... ... ... ...
2007-07-31 AAPL ... ... 0.1
2007-07-31 FB ... ... 0.8
2007-07-31 AMD ... ... 0.1
2007-08-31 AAPL ... ... 0.1
2007-08-31 FB ... ... 0.8
2007-08-31 AMD ... ... 0.1
I tried:
select s.*, p.WEIGHT from portfolio p
left join stats s
on p.DATE_ = s.DATE_
and p.TICKER= s.TICKER;
It only gives me rows for my portfolio re-balance dates.
Is there an efficient way to calculate the monthly returns?

This might work, if I understand your formula:
SELECT
p.`DATE_`,
p.`TICKER`,
SUM(s.`RETURN` * p.`WEIGHT`) as `return`,
p.WEIGHT
FROM `portfolio` p
LEFT JOIN `stats` s
ON p.`TICKER` = s.`TICKER`
WHERE s.`DATE_` BETWEEN p.`DATE_` AND DATE_ADD(DATE_ADD(p.`DATE_`, INTERVAL 6 MONTH), INTERVAL -1 DAY)
GROUP BY p.`DATE_`, p.`TICKER`
ORDER BY p.`DATE_`, p.`TICKER`;
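If you want to sanity-check the same logic outside SQL, here is a small pandas sketch of the idea (carry each ticker's most recent re-balance weight forward onto the monthly stats rows). The sample values are made up for illustration and merge_asof is just one convenient way to express it; none of this is from the original question.
import pandas as pd

# Tiny stand-ins for the question's tables (illustrative values only).
portfolio = pd.DataFrame({
    "DATE_": pd.to_datetime(["2007-01-31"] * 2 + ["2007-07-31"] * 2),
    "TICKER": ["AAPL", "FB", "AAPL", "FB"],
    "WEIGHT": [0.2, 0.3, 0.1, 0.8],
})
stats = pd.DataFrame({
    "DATE_": pd.to_datetime(["2007-01-31", "2007-02-28", "2007-07-31", "2007-08-31"] * 2),
    "TICKER": ["AAPL"] * 4 + ["FB"] * 4,
    "RETURN": [0.01, 0.03, 0.02, 0.05, 0.13, 0.06, 0.04, 0.02],
})

# For every monthly stats row, take the weight from the most recent
# re-balance date on or before that month, per ticker.
result = pd.merge_asof(
    stats.sort_values("DATE_"),
    portfolio.sort_values("DATE_"),
    on="DATE_", by="TICKER", direction="backward",
)
print(result.sort_values(["DATE_", "TICKER"]))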

Related

The loss effects in multitask learning framework

I have designed a multi-task network where the first layers are shared between two output layers. While investigating multi-task learning principles, I learned that there should be a scalar weight parameter, such as alpha, that balances the two losses produced by the two output layers. My question is about this parameter itself. Does it have an effect on the model's final performance? Probably yes.
This is the part of my code snippet that computes the losses:
...
mtl_loss = (alpha) * loss_1 + (1-alpha) * loss_2
mtl_loss.backward()
...
Above, loss_1 is an MSELoss and loss_2 is a CrossEntropyLoss. Picking alpha=0.9, I'm getting the following loss values during the training steps:
[2020-05-03 04:46:55,398 INFO] Step 50/150000; loss_1: 0.90 + loss_2: 1.48 = mtl_loss: 2.43 (RMSE: 2.03, F1score: 0.07); lr: 0.0000001; 29 docs/s; 28 sec
[2020-05-03 04:47:23,238 INFO] Step 100/150000; loss_1: 0.40 + loss_2: 1.27 = mtl_loss: 1.72 (RMSE: 1.38, F1score: 0.07); lr: 0.0000002; 29 docs/s; 56 sec
[2020-05-03 04:47:51,117 INFO] Step 150/150000; loss_1: 0.12 + loss_2: 1.19 = mtl_loss: 1.37 (RMSE: 0.81, F1score: 0.08); lr: 0.0000003; 29 docs/s; 84 sec
[2020-05-03 04:48:19,034 INFO] Step 200/150000; loss_1: 0.04 + loss_2: 1.10 = mtl_loss: 1.20 (RMSE: 0.55, F1score: 0.07); lr: 0.0000004; 29 docs/s; 112 sec
[2020-05-03 04:48:46,927 INFO] Step 250/150000; loss_1: 0.02 + loss_2: 0.96 = mtl_loss: 1.03 (RMSE: 0.46, F1score: 0.08); lr: 0.0000005; 29 docs/s; 140 sec
[2020-05-03 04:49:14,851 INFO] Step 300/150000; loss_1: 0.02 + loss_2: 0.99 = mtl_loss: 1.05 (RMSE: 0.43, F1score: 0.08); lr: 0.0000006; 29 docs/s; 167 sec
[2020-05-03 04:49:42,793 INFO] Step 350/150000; loss_1: 0.02 + loss_2: 0.97 = mtl_loss: 1.04 (RMSE: 0.43, F1score: 0.08); lr: 0.0000007; 29 docs/s; 195 sec
[2020-05-03 04:50:10,821 INFO] Step 400/150000; loss_1: 0.01 + loss_2: 0.94 = mtl_loss: 1.00 (RMSE: 0.41, F1score: 0.08); lr: 0.0000008; 29 docs/s; 223 sec
[2020-05-03 04:50:38,943 INFO] Step 450/150000; loss_1: 0.01 + loss_2: 0.86 = mtl_loss: 0.92 (RMSE: 0.40, F1score: 0.08); lr: 0.0000009; 29 docs/s; 252 sec
As the training loss shows, it seems that my first network, which uses MSELoss, converges very fast, while the second network has not converged yet. RMSE and F1score are the two metrics I'm using to track the progress of the first and second networks, respectively.
I know that picking the optimal alpha is somewhat experimental, but are there hints to make the process easier? Specifically, I want the two networks to be trained in line with each other, not as above where the first network converges much faster. Can the alpha parameter help control this?
With that alpha, loss_1 contributes more to the result, and since backpropagation updates the weights in proportion to the error, that task improves faster. Try a more balanced alpha to even out the performance on both tasks.
You can also try changing alpha during training.
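To illustrate the second suggestion, here is a minimal sketch of annealing alpha over the training steps. The linear schedule and the 0.9 -> 0.5 range are arbitrary choices for the example, not values taken from the question.
def alpha_schedule(step, total_steps, alpha_start=0.9, alpha_end=0.5):
    # Move alpha linearly from alpha_start to alpha_end over training.
    frac = min(step / total_steps, 1.0)
    return alpha_start + frac * (alpha_end - alpha_start)

# Inside the training loop (same loss combination as in the question):
# alpha = alpha_schedule(step, total_steps)
# mtl_loss = alpha * loss_1 + (1 - alpha) * loss_2
# mtl_loss.backward()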

Non linear regression on Scilab

I'm new to Scilab (I'm using 5.5.2), and I need to perform a non-linear regression.
I have a dataset of points which behaves like a sine wave, so I want to find the parameters of this sine wave.
Here is my dataset:
t1=[11800, 11805, 11810, 11817, 11824, 11829, 11834, 11839, 11844, 11849, 11854, 11859, 11866, 11871, 11878, 11883, 11888, 11893, 11898, 11903, 11908, 11915, 11920, 11928, 11933, 11938, 11943, 11948, 11953, 11958, 11965, 11970, 11975, 11980, 11987, 11992, 11997, 12002, 12007, 12014, 12019, 12024, 12029, 12037, 12042, 12047, 12052, 12057, 12063, 12069, 12074, 12079, 12084, 12091, 12096, 12101, 12106, 12111, 12119, 12123, 12128, 12133, 12138, 12146, 12151, 12156, 12161, 12169, 12174, 12179, 12184, 12188, 12193, 12201, 12206, 12211, 12218, 12223, 12228, 12233, 12238, 12243, 12251, 12256, 12260, 12268, 12273, 12278, 12283, 12288, 12292, 12297, 12302, 12310, 12317, 12322, 12327, 12332, 12337]
v1=[
0.36
0.59
0.81
0.92
0.90
0.76
0.54
0.31
0.17
0.19
0.36
0.59
0.81
0.92
0.90
0.76
0.54
0.31
0.17
0.19
0.36
0.59
0.81
0.92
0.90
0.77
0.54
0.31
0.17
0.19
0.35
0.59
0.81
0.92
0.90
0.77
0.55
0.32
0.18
0.19
0.35
0.59
0.80
0.92
0.90
0.77
0.55
0.32
0.17
0.19
0.35
0.59
0.80
0.92
0.90
0.79
0.55
0.32
0.18
0.18
0.35
0.59
0.80
0.92
0.92
0.79
0.57
0.32
0.18
0.18
0.35
0.58
0.80
0.92
0.92
0.79
0.57
0.32
0.18
0.18
0.35
0.58
0.80
0.92
0.92
0.79
0.57
0.34
0.18
0.18
0.34
0.58
0.80
0.92
0.92
0.80
0.57
0.34
0.18
]
In order to perform the non-linear regression, I added the toolbox which contains the nlinregr function and called it like this:
fun='A*sin(W*t1+P)'
dfun='[sin(W*t1+P), A*t1*cos(W*t1+P), A*cos(W*t1+P)]'
[p, yhat,stat]=nlinregr([t1 v1], 't1 v1', fun, dfun,'A W P', 'v1')
Here 'fun' is the sine-wave function I'm trying to fit, and 'dfun' is the matrix of analytical derivatives with respect to my parameters A, W and P.
While executing this function I'm getting the error "Incoherent Multiplication", but after 2 hours I'm still not able to pinpoint where the problem is.
Can someone help me, please?
datafit is dedicated to such non-linear fitting. It's a native Scilab function.
Code to perform the fitting and to show results:
t1 = [11800, 11805, 11810, 11817, 11824, 11829, 11834, 11839, 11844, 11849, 11854, 11859, 11866, 11871, 11878, 11883, 11888, 11893, 11898, 11903, 11908, 11915, 11920, 11928, 11933, 11938, 11943, 11948, 11953, 11958, 11965, 11970, 11975, 11980, 11987, 11992, 11997, 12002, 12007, 12014, 12019, 12024, 12029, 12037, 12042, 12047, 12052, 12057, 12063, 12069, 12074, 12079, 12084, 12091, 12096, 12101, 12106, 12111, 12119, 12123, 12128, 12133, 12138, 12146, 12151, 12156, 12161, 12169, 12174, 12179, 12184, 12188, 12193, 12201, 12206, 12211, 12218, 12223, 12228, 12233, 12238, 12243, 12251, 12256, 12260, 12268, 12273, 12278, 12283, 12288, 12292, 12297, 12302, 12310, 12317, 12322, 12327, 12332, 12337];
v1 = [0.36 0.59 0.81 0.92 0.9 0.76 0.54 0.31 0.17 0.19 0.36 0.59 0.81 0.92 0.9 0.76 0.54 0.31 0.17 0.19 0.36 0.59 0.81 0.92 0.9 0.77 0.54 0.31 0.17 0.19 0.35 0.59 0.81 0.92 0.9 0.77 0.55 0.32 0.18 0.19 0.35 0.59 0.8 0.92 0.9 0.77 0.55 0.32 0.17 0.19 0.35 0.59 0.8 0.92 0.9 0.79 0.55 0.32 0.18 0.18 0.35 0.59 0.8 0.92 0.92 0.79 0.57 0.32 0.18 0.18 0.35 0.58 0.8 0.92 0.92 0.79 0.57 0.32 0.18 0.18 0.35 0.58 0.8 0.92 0.92 0.79 0.57 0.34 0.18 0.18 0.34 0.58 0.8 0.92 0.92 0.8 0.57 0.34 0.18];
function g = gap(p, Data)
    // v = p(1) + p(2)*sin(p(3)*t+p(4))
    // p = [v_offset, amplitude, angular_frequency, phase]
    t = Data(1,:)
    v = Data(2,:)
    g = v - (p(1) + p(2)*sin(p(3)*t+p(4)))
endfunction
p0 = [0.50 0.35 2*%pi/50 0.5]; // initial guess of fitting parameters
[p, dmin, status] = datafit(gap, [t1 ; v1], p0, "ar",200)
vfit = p(1) + p(2)*sin(p(3)*t1+p(4));
clf
subplot(2,1,1), plot(t1,v1), title "Given data" fontsize 3
subplot(2,1,2), plot(t1, vfit-v1), title "Fit - data" fontsize 3
Results:
--> [p, dmin, status] = datafit(gap, [t1 ; v1], p0, "ar", 200)
p  =
   0.572807   0.3879044   0.114452   131.77806
   // offset, amplitude, angular frequency, phase
dmin  =
   0.0331676
   // average v distance between data and fit
status  =
   9.
   // means "OK"
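If it helps to sanity-check the fitted parameters outside Scilab, a short Python snippet (NumPy assumed) can evaluate the fitted model at the first few time stamps from the question and compare it against the measured values; this is only a cross-check, not part of the answer's method.
import numpy as np

# Fitted parameters reported by datafit above: offset, amplitude, angular frequency, phase.
off, A, w, phi = 0.572807, 0.3879044, 0.114452, 131.77806

t = np.array([11800, 11805, 11810, 11817, 11824])   # first few time stamps from t1
v = np.array([0.36, 0.59, 0.81, 0.92, 0.90])        # corresponding values from v1

vfit = off + A * np.sin(w * t + phi)
print(np.round(vfit, 2))        # should be close to v
print(np.abs(vfit - v).max())   # misfit should be on the order of dmin (about 0.03)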

Round off to 0.5 SQL

I'm new to SQL.
How do I round off like this:
1.01 -- 1.24 -> 1
1.25 -- 1.49 -> 1.5
1.51 -- 1.74 -> 1.5
1.75 -- 1.99 -> 2
Thanks for your help, much appreciated.
You can just do:
select floor(val * 2 + 0.5) / 2
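A quick way to convince yourself the formula matches the requested mapping is to run the same arithmetic in Python over the boundary cases from the question (illustrative only; the SQL above is the actual answer):
import math

def round_to_half(x):
    # Same arithmetic as floor(val * 2 + 0.5) / 2 in SQL.
    return math.floor(x * 2 + 0.5) / 2

for x in [1.01, 1.24, 1.25, 1.49, 1.51, 1.74, 1.75, 1.99]:
    print(x, "->", round_to_half(x))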

Concatenating multiple csv files into one

I have multiple .csv files and I want to concatenate them into one file. Essentially I would like to choose certain columns and append them side by side.
This code I have here doesn't work. No error message at all. It just does nothing.
Does anybody know how to fix it?
import pandas as pd
import datetime
import numpy as np
import glob
import csv
import os

def concatenate(indir='/My Documents/Python/Test/in',
                outfile='/My Documents/Python/Test/out/Forecast.csv'):
    os.chdir(indir)
    fileList = glob.glob('*.csv')
    print(fileList)
    dfList = []
    colnames = ["DateTime", "WindSpeed", "Capacity", "p0.025", "p0.05", "p0.1",
                "p0.5", "p0.9", "p0.95", "p0.975", "suffix"]
    for filename in fileList:
        print(filename)
        df = pd.read_csv(filename, delimiter=',', engine='python',
                         encoding='latin-1', index_col=False)
        dfList.append(df)
    concatDF = pd.concat(dfList, axis=0)
    concatDF.columns = colnames
    concatDF.to_csv(outfile, index=None)
I ran this code to set up the files on my file system:
import pandas as pd
import numpy as np

def setup_test_files(indir='in'):
    colnames = [
        "WindSpeed", "Capacity",
        "p0.025", "p0.05", "p0.1", "p0.5",
        "p0.9", "p0.95", "p0.975", "suffix"
    ]
    tidx = pd.date_range('2016-03-31', periods=3, freq='M', name='DateTime')
    for filename in ['in/fn_{}.csv'.format(i) for i in range(3)]:
        pd.DataFrame(
            np.random.rand(3, len(colnames)),
            tidx, colnames
        ).round(2).to_csv(filename)
        print(filename)

setup_test_files()
This created 3 files named ['fn_0.csv', 'fn_1.csv', 'fn_2.csv']. They look like this:
with open('in/fn_0.csv', 'r') as fo:
    print(''.join(fo.readlines()))
DateTime,WindSpeed,Capacity,p0.025,p0.05,p0.1,p0.5,p0.9,p0.95,p0.975,suffix
2016-03-31,0.03,0.76,0.62,0.21,0.76,0.36,0.44,0.61,0.23,0.04
2016-04-30,0.39,0.12,0.31,0.99,0.86,0.35,0.15,0.61,0.55,0.03
2016-05-31,0.72,1.0,0.71,0.86,0.41,0.79,0.22,0.76,0.92,0.79
I'll define a parser function and one that does the concatenation separately. Why? Because I think it's easier to follow that way.
import pandas as pd
import glob
import os

def read_csv(fn):
    colnames = [
        "DateTime", "WindSpeed", "Capacity",
        "p0.025", "p0.05", "p0.1", "p0.5",
        "p0.9", "p0.95", "p0.975", "suffix"
    ]
    df = pd.read_csv(fn, encoding='latin-1')
    df.columns = colnames
    return df

def concatenate(indir='in', outfile='out/Forecast.csv'):
    curdir = os.getcwd()
    try:
        os.chdir(indir)
        file_list = glob.glob('*.csv')
        df_names = [fn.replace('.csv', '') for fn in file_list]
        concat_df = pd.concat(
            [read_csv(fn) for fn in file_list],
            axis=1, keys=df_names)
        # notice I was nice enough to change directory back :-)
        os.chdir(curdir)
        concat_df.to_csv(outfile, index=None)
    except:
        os.chdir(curdir)
Then run the concatenation:
concatenate()
You can read in the results like this:
print(pd.read_csv('out/Forecast.csv', header=[0, 1]))
fn_0 \
DateTime WindSpeed Capacity p0.025 p0.05 p0.1 p0.5 p0.9 p0.95 p0.975
0 2016-03-31 0.03 0.76 0.62 0.21 0.76 0.36 0.44 0.61 0.23
1 2016-04-30 0.39 0.12 0.31 0.99 0.86 0.35 0.15 0.61 0.55
2 2016-05-31 0.72 1.00 0.71 0.86 0.41 0.79 0.22 0.76 0.92
... fn_2
... WindSpeed Capacity p0.025 p0.05 p0.1 p0.5 p0.9 p0.95 p0.975 suffix
0 ... 0.80 0.79 0.38 0.94 0.91 0.18 0.27 0.14 0.39 0.91
1 ... 0.60 0.97 0.04 0.69 0.04 0.65 0.94 0.81 0.37 0.22
2 ... 0.78 0.53 0.83 0.93 0.92 0.12 0.15 0.65 0.06 0.11
[3 rows x 33 columns]
Notes:
You aren't taking care to make DateTime your index, which I think is probably what you want. If so, change the read_csv and concatenate functions to this:
import pandas as pd
import glob
import os

def read_csv(fn):
    colnames = [
        "WindSpeed", "Capacity",
        "p0.025", "p0.05", "p0.1", "p0.5",
        "p0.9", "p0.95", "p0.975", "suffix"
    ]
    # notice extra parameters for specifying index and parsing dates
    df = pd.read_csv(fn, index_col=0, parse_dates=[0], encoding='latin-1')
    df.index.name = "DateTime"
    df.columns = colnames
    return df

def concatenate(indir='in', outfile='out/Forecast.csv'):
    curdir = os.getcwd()
    try:
        os.chdir(indir)
        file_list = glob.glob('*.csv')
        df_names = [fn.replace('.csv', '') for fn in file_list]
        concat_df = pd.concat(
            [read_csv(fn) for fn in file_list],
            axis=1, keys=df_names)
        os.chdir(curdir)
        concat_df.to_csv(outfile)
    except:
        os.chdir(curdir)
This is what the final result looks like with this change; notice the dates will be aligned this way:
fn_0 \
WindSpeed Capacity p0.025 p0.05 p0.1 p0.5 p0.9 p0.95 p0.975
DateTime
2016-03-31 0.03 0.76 0.62 0.21 0.76 0.36 0.44 0.61 0.23
2016-04-30 0.39 0.12 0.31 0.99 0.86 0.35 0.15 0.61 0.55
2016-05-31 0.72 1.00 0.71 0.86 0.41 0.79 0.22 0.76 0.92
... fn_2 \
suffix ... WindSpeed Capacity p0.025 p0.05 p0.1 p0.5 p0.9
DateTime ...
2016-03-31 0.04 ... 0.80 0.79 0.38 0.94 0.91 0.18 0.27
2016-04-30 0.03 ... 0.60 0.97 0.04 0.69 0.04 0.65 0.94
2016-05-31 0.79 ... 0.78 0.53 0.83 0.93 0.92 0.12 0.15
p0.95 p0.975 suffix
DateTime
2016-03-31 0.14 0.39 0.91
2016-04-30 0.81 0.37 0.22
2016-05-31 0.65 0.06 0.11
[3 rows x 30 columns]

SQL money round to closest 0.05 cents

I'm trying to round money in a MySQL SELECT to the closest 0.05.
So numbers like:
140.70 should become 140.70
140.71 should become 140.70
140.72 should become 140.70
140.73 should become 140.75
140.74 should become 140.75
140.75 should become 140.75
140.76 should become 140.75
140.77 should become 140.75
140.78 should become 140.80
140.79 should become 140.80
So, in more detail:
0.00 = 0.00
0.01 = 0.00
0.02 = 0.00
0.022 = 0.00 // here the magic should happen: 0.022 is closer to 0, so the result is 0
0.023 = 0.05 // but 0.023 should be rounded to 0.05! Because 0.023 first rounds to 0.025, which should then be rounded up to 0.05
0.03 = 0.05
I've tried a few different things with MySQL CEIL() and FLOOR() but couldn't get the right result.
I created a SQL Fiddle here, with a table which makes no sense except that we need one to SELECT from:
CREATE TABLE hello ( world varchar(255) );
INSERT INTO hello (world) VALUES ('blubb');
This is the SELECT query:
SELECT
CEILING ( 0.05 / 0.05 ) * 0.05 AS CEIL_1,
CEILING ( 0.06 / 0.05 ) * 0.05 AS CEIL_2,
CEILING ( 0.07 / 0.05 ) * 0.05 AS CEIL_3,
CEILING ( 0.08 / 0.05 ) * 0.05 AS CEIL_4,
CEILING ( 0.09 / 0.05 ) * 0.05 AS CEIL_5
FROM hello;
Can anyone here tell me how to do it right?
SELECT ROUND(140.77/5,2) * 5;
+-----------------------+
| ROUND(140.77/5,2) * 5 |
+-----------------------+
|                140.75 |
+-----------------------+
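As a cross-check, the same arithmetic can be reproduced in Python with Decimal (used here so the rounding behaves like MySQL's exact DECIMAL math with round-half-up). It rounds to the nearest 0.05 and matches the first table in the question; note it does not implement the extra double-rounding case where 0.023 should become 0.05.
from decimal import Decimal, ROUND_HALF_UP

def round_to_005(x):
    # ROUND(x/5, 2) * 5 with half-up rounding, as in the MySQL answer.
    return (Decimal(x) / 5).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP) * 5

for cents in range(14070, 14080):      # 140.70 .. 140.79
    x = Decimal(cents).scaleb(-2)
    print(x, "->", round_to_005(x))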