Error when fitting complex data with the leasqr algorithm in Octave

I have data in the form of the real and imaginary parts of a complex number, and I want to fit them with a complex-valued function. More specifically, they are data from electrochemical impedance spectroscopy (EIS) experiments, and the function comes from an equivalent circuit.
I am using Octave 7.2.0 on a Windows 10 computer. I need to use the leasqr algorithm from the Optim package; leasqr uses the Levenberg-Marquardt nonlinear regression that is typical in EIS data fitting.
Regarding the data, xdata are the linear frequencies and ydata is ReZ + j*ImZ.
If I try to fit the complex data with the complex fitting function, I get the following error:
error: weighted residuals are not real
error: called from
__lm_svd__ at line 147 column 20
leasqr at line 662 column 25
Code_for_StackOverflow at line 47 column 73
I tried to fit the real part of the data with the real part of the fitting function, and the imaginary part of the data with the imaginary part of the function. The fits are performed successfully, but I end up with two sets of fitted parameters, while I need only one set.
Here is the code I wrote.
clear -a;
clf;
clc;
pkg load optim;
pkg load symbolic;
Linear_freq = [1051.432, 394.2871, 112.6535, 39.42871, 11.59668, 3.458659, 1.065641, 0.3258571, 0.1000221];
ReZ = [84.10412, 102.0962, 178.8031, 283.0663, 366.7088, 431.3653, 514.4105, 650.5853, 895.9588];
MinusImZ = [27.84804, 59.56786, 116.5972, 123.2293, 102.6806, 117.4836, 178.1147, 306.256, 551.2337];
Z = [88.5946744, 118.2030626, 213.4606653, 308.7264008, 380.8131426, 447.0776424, 544.3739605, 719.0646495, 1051.950932];
MinusPhase = [18.32042302, 30.26135402, 33.1083029, 23.52528583, 15.64255593, 15.23515301, 19.09841797, 25.2082044, 31.60167787];
ImZ = -MinusImZ;
Angular_freq = 2*pi*Linear_freq;
xdata = Angular_freq;
ydata = ReZ + j*ImZ;
Fitting_Function = @(xdata, p) (p(1) + ((p(2) + (1./(p(3)*(j*xdata).^0.5))).^-1 + (1./(p(4)*(j*xdata).^p(5))).^-1).^-1);
p = [80, 300, 6.63E-3, 5E-5, 0.8]; # Starting parameter values; true values obtained with dedicated software: 76, 283, 1.63E-3, 1.5E-5, 0.876
options.fract_prec = [0.0005, 0.0005, 0.0005, 0.0005, 0.0005].';
niter=400;
tol=1E-12;
dFdp="dfdp";
dp=1E-9*ones(size(p));
wt = abs(sqrt(ydata).^-1);
#[Fitted_Parameters_ReZ pfit_real cvg_real iter_real corp_real covp_real covr_real stdresid_real z_real r2_real] = leasqr(xdata, ReZ, p, Fitting_Function_ReZ, tol, niter, wt, dp, dFdp, options);
#[Fitted_Parameters_ImZ pfit_imag cvg_imag iter_imag corp_imag covp_imag covr_imag stdresid_imag z_imag r2_imag] = leasqr(xdata, ImZ, p, Fitting_Function_ImZ, tol, niter, wt, dp, dFdp, options);
[Fitted_Parameters pfit cvg iter corp covp covr stdresid z r2] = leasqr(xdata, ydata, p, Fitting_Function, tol, niter, wt, dp, dFdp, options);
#########################################################################
# Calculate the fitted functions with the fitted parameters array
#########################################################################
Fitted_Function_Real = real(pfit_real(1) + ((pfit_real(2) + (1./(pfit_real(3)*(j*xdata).^0.5))).^-1 + (1./(pfit_real(4)*(j*xdata).^pfit_real(5))).^-1).^-1);
Fitted_Function_Imag = imag(pfit_imag(1) + ((pfit_imag(2) + (1./(pfit_imag(3)*(j*xdata).^0.5))).^-1 + (1./(pfit_imag(4)*(j*xdata).^pfit_imag(5))).^-1).^-1);
Fitted_Function = Fitted_Function_Real + j.*Fitted_Function_Imag;
Fitted_Function_Mod = abs(Fitted_Function);
Fitted_Function_Phase = (-(angle(Fitted_Function))*(180./pi));
################################################################################
# Calculate the residuals, from https://iopscience.iop.org/article/10.1149/1.2044210
# An optimum fit is obtained when the residuals are spread randomly around the log ω axis.
# When the residuals show a systematic deviation from the horizontal axis, e.g., by forming
# a "trace" around, above, or below the log ω axis, the complex nonlinear least squares (CNLS) fit is not adequate.
################################################################################
Residuals_Real = (ReZ-Fitted_Function_Real)./Fitted_Function_Mod;
Residuals_Imag = (ImZ-Fitted_Function_Imag)./Fitted_Function_Mod;
################################################################################
# Calculate the reduced chi-squared value with the fitted parameters array (NOVA manual, page 452)
################################################################################
chi_squared_ReZ = sum(((ReZ-Fitted_Function_Real).^2)./Z.^2)
chi_squared_ImZ = sum(((ImZ-Fitted_Function_Imag).^2)./Z.^2)
Pseudo_chi_squared = sum((((ReZ-Fitted_Function_Real).^2)+((ImZ-Fitted_Function_Imag).^2))./Z.^2)
disp('The values of the parameters after the fit of the real function are '), disp(pfit_real);
disp('The values of the parameters after the fit of the imaginary function are '), disp(pfit_imag);
disp("R^2, the coefficient of multiple determination, intercept form (not suitable for non-real residuals) is "), disp(r2_real), disp(r2_imag);
###################################################
## PLOT Data and the Function
###################################################
#Set plot parameters
set(0, "defaultlinelinewidth", 1);
set(0, "defaulttextfontname", "Verdana");
set(0, "defaulttextfontsize", 20);
set(0, "DefaultAxesFontName", "Verdana");
set(0, 'DefaultAxesFontSize', 12);
figure(1);
## Nyquist plot (Argand diagram)
subplot(1,2,1, "align");
plot((ReZ), (MinusImZ), "o", "markersize", 2, (Fitted_Function_Real), -(Fitted_Function_Imag), "-k");
axis ("square");
grid on;
daspect([1 1 2]);
title ('Nyquist Plot - Argand Diagram');
xlabel ('Z'' / \Omega' , 'interpreter', 'tex');
ylabel ('-Z'''' / \Omega', 'interpreter', 'tex');
## Bode Modulus
subplot (2, 2, 2);
loglog((Linear_freq), (Z), "o", "markersize", 2, (Linear_freq), (Fitted_Function_Mod), "-k");
grid on;
title ('Bode Plot - Modulus');
xlabel ('\nu (Hz)' , 'interpreter', 'tex');
ylabel ('|Z| / \Omega', 'interpreter', 'tex');
## Bode Phase
subplot (2, 2, 4);
semilogx((Linear_freq), (MinusPhase), "o", "markersize", 2, (Linear_freq), (Fitted_Function_Phase), "-k");
set(gca,'YTick',0:10:90);
grid on;
title ('Bode Plot - Phase');
xlabel ('\nu (Hz)' , 'interpreter', 'tex');
ylabel ('-\theta (°)', 'interpreter', 'tex');
figure(2)
## Bode Z'
subplot (2, 1, 1);
semilogx((Linear_freq), (ReZ), "o", "markersize", 2, (Linear_freq), (Fitted_Function_Real), "-k");
grid on;
title ('Bode Plot Z''');
xlabel ('\nu (Hz)' , 'interpreter', 'tex');
ylabel ('Z'' / \Omega', 'interpreter', 'tex');
## Bode -Z''
subplot (2, 1, 2);
#subplot (2, 2, 4);
semilogx((Linear_freq), (MinusImZ), "o", "markersize", 2, (Linear_freq), -(Fitted_Function_Imag), "-k");
grid on;
title ('Bode Plot -Z''''');
xlabel ('\nu (Hz)' , 'interpreter', 'tex');
ylabel ('-Z'''' / \Omega', 'interpreter', 'tex');
figure(3)
## Residuals Real
subplot (2, 1, 1);
semilogx((Angular_freq), (Residuals_Real), "-o", "markersize", 2);
grid on;
title ('Residuals Real');
xlabel ('\omega (Hz)' , 'interpreter', 'tex');
ylabel ('\Delta_{re} / \Omega', 'interpreter', 'tex');
## Residuals Imaginary
subplot (2, 1, 2);
#subplot (2, 2, 4);
semilogx((Angular_freq), (Residuals_Imag), "-o", "markersize", 2);
grid on;
title ('Residuals Imaginary');
xlabel ('\omega (Hz)' , 'interpreter', 'tex');
ylabel ('\Delta_{im} / \Omega', 'interpreter', 'tex');
Octave should be able to handle complex numbers. What am I doing wrong?
I was thinking of fitting the real part of the data with the real part of the fitting function, and then using the Kramers-Kronig relations to obtain the imaginary part of the fitted function, but I would like to avoid this approach if possible.
Any help would be greatly appreciated, thanks in advance.
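One commonly used workaround with leasqr is to stack the real and imaginary parts into a single real-valued data vector and fit a real-valued wrapper around the complex model, so that the weighted residuals stay real and a single parameter set is returned. A minimal sketch along those lines, reusing the data, weights and starting values defined above (a sketch only, not a verified fit of this particular circuit):
## Stack Re(Z) and Im(Z) so that leasqr only ever sees real residuals
n_pts   = numel(Angular_freq);
xdata_s = [Angular_freq(:); Angular_freq(:)];          # frequencies repeated for the Re and Im halves
ydata_s = [ReZ(:); ImZ(:)];                            # one real-valued target vector
wt_s    = [abs(sqrt(Z(:))).^-1; abs(sqrt(Z(:))).^-1];  # same modulus-based weight for both halves
## Complex circuit model (same expression as Fitting_Function above)
Zmodel = @(w, p) p(1) + ((p(2) + (1./(p(3)*(j*w).^0.5))).^-1 + (1./(p(4)*(j*w).^p(5))).^-1).^-1;
## Real-valued wrapper: first half of the output is Re(Z), second half is Im(Z)
F_stacked = @(x, p) [real(Zmodel(x(1:n_pts), p)); imag(Zmodel(x(1:n_pts), p))];
[fy_s, pfit_s, cvg_s, iter_s] = leasqr(xdata_s, ydata_s, p, F_stacked, tol, niter, wt_s, dp, dFdp, options);
pfit_s then holds the single parameter set describing both the real and the imaginary data, and the weights remain real as leasqr requires.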

Drawing the complex impedance diagram from your data reveals a rather common shape that can be modelled with many equivalent circuits:
Reference : https://fr.scribd.com/doc/71923015/The-Phasance-Concept
You chose model no. 2, probably based on physical considerations; that is not the subject to be discussed here.
Also, from physical considerations and/or graphical inspection, you correctly assumed that one phasance is of the Warburg kind (φ = -π/4; ν = -1/2).
The problem is to fit an equation with five adjustable parameters. This is a difficult problem of nonlinear regression with a complex-valued equation. The usual method is an iterative process starting from "guessed values" of the five parameters.
The guessed values have to be not far from the unknown correct values. Approximate values can be found by graphical inspection of the impedance diagram. Often this is a cause of failure of convergence of the iterative process.
A more reliable method consists in combining linear regression with respect to most of the parameters with nonlinear regression with respect to only a few of them.
In the present case the nonlinear regression can be reduced to a single parameter, while the other parameters can be handled by simple linear regression. This is a big simplification.
Software for mixed linear and nonlinear regression (for cases involving several phasors) was developed in the years 1980-1990. Unfortunately I have no access to it at present.
Nevertheless, in the present case of only one phasor, we don't need a sledgehammer to crack a nut: the Newton-Raphson method is sufficient. Graphical inspection gives a rough estimate of ν between -0.7 and -0.8. The chosen initial value is ν = -0.75, which gives the first run.
Since all calculations are carried out in complex numbers, the resulting values come out complex instead of real as expected. They are noted ZR1, ZR2, ZP1, ZP2 to distinguish them from the real R1, R2, P1, P2. This happens because the value of ν is not yet optimal.
The closer ν gets to its final value, the more the imaginary parts vanish. After a few runs of the Newton-Raphson process the imaginary parts become quite negligible, giving the final result.
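The linear-regression step of the referenced paper is not reproduced here. Purely as an illustrative sketch of the "only one nonlinear parameter" idea, and using the question's sign convention (where the exponent p(5) ≈ 0.8 plays the role of -ν), one could wrap an ordinary leasqr fit of the remaining four parameters inside a coarse scan over the exponent and keep the best value:
## Sketch only: outer 1-D scan over the CPE exponent, inner fit of the remaining parameters.
## (The paper's method replaces the inner step by a linear regression; here leasqr is reused.)
n_pts = numel(Angular_freq);
xs = [Angular_freq(:); Angular_freq(:)];
ys = [ReZ(:); ImZ(:)];
Zfix = @(w, q, nu) q(1) + ((q(2) + (1./(q(3)*(j*w).^0.5))).^-1 + (1./(q(4)*(j*w).^nu)).^-1).^-1;
best_ss = Inf;
for nu = 0.70:0.01:0.90                       # rough range from graphical inspection
  Fs = @(x, q) [real(Zfix(x(1:n_pts), q, nu)); imag(Zfix(x(1:n_pts), q, nu))];
  [fy, qfit] = leasqr(xs, ys, [80, 300, 6.63E-3, 5E-5], Fs, 1e-10, 200);
  ss = sumsq(ys - fy);
  if (ss < best_ss)
    best_ss = ss; best_q = qfit; best_nu = nu;
  endif
endfor
printf("best exponent = %.3f\n", best_nu);
Once the minimum is bracketed, a Newton-Raphson or golden-section refinement of the exponent could replace the coarse grid.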
Publications:
"Contribution à l'interprétation de certaines mesures d'impédances", 2ème Forum sur les Impédances Électrochimiques, 28-29 octobre 1987.
"Calcul de réseaux électriques équivalents à partir de mesures d'impédances", 3ème Forum sur les Impédances Électrochimiques, 24 novembre 1988.
"Synthèse de circuits électriques équivalents à partir de mesures d'impédances complexes", 5ème Forum sur les Impédances Électrochimiques, 28 novembre 1991.

Related

Unexpected output of emmeans averaged across variables

I transformed a variable (e.g. leaf_area) using a simple square transformation and then fitted the following model containing an interaction:
fit <- lmer(leaf_area^2 ~genotype*soil_type + date_measurement + light + (1|repetition) + (1|y_position) + (1|x_position), data = dataset)
To obtain the emmeans averaged across genotypes and soil type for each measurement date, I further use the following command:
fit.emm <- emmeans(fit, ~ genotype*soil_type + date_measurement, type = "response")
The emmeans are, nevertheless, averaged for the variable date_measurement.
As shown in the following example, the emmeans are averages of genotypes x, y and z in soil MT at the measurement date 27.4, but the measurements actually occurred at 21, 23, 28, 30 and 35 das.
genotype soil_type date_measurement emmean     SE    df lower.CL upper.CL
x        MT                    27.4  0.190 0.0174 126.0    0.155    0.224
y        MT                    27.4  0.220 0.0147  74.1    0.191    0.250
z        MT                    27.4  0.210 0.0157 108.6    0.179    0.241
When I fit the model without interaction between genotype and soil type and run the emmeans, the results are still averaged for the measurement dates.
fit <- lmer(leaf_area^2 ~genotype + soil_type + date_measurement + light + (1|repetition) + (1|y_position) + (1|x_position), data = dataset)
fit.emm <- emmeans(fit, ~ genotype + soil_type + date_measurement, type = "response")
My question is: how can I obtain the emmeans averaged across genotype and soil but separated for each date of measurement?
Class of variables:
date_measurement, light, x_position, y_position: numeric
genotype and soil_type: factor
Thank you in advance.
When you have a numerical predictor in the model, the default is to obtain predictions at the average value of that covariate. If you want the covariates treated like factors, you have to say so:
fit.emm <- emmeans(fit, ~ genotype*soil_type + date_measurement,
cov.reduce = FALSE)
In addition, emmeans cannot auto-detect your square transformation. You can fix it up by doing
fit.emm <- update(fit.emm, tran = make.tran("power", 2),
type = "response")
Then I think you will want to subsequently obtain marginal means by averaging over date_measurement at least -- i.e.,
fit.emm2 <- emmeans(fit.emm, ~ genotype*soil_type)
It will retain the transformation and type = "response" setting.
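If, as in the question, the means should instead be averaged over genotype and soil_type but kept separate for each observed measurement date, a sketch along the same lines (same fit object as above, hypothetical result name) would be:
# Sketch: means per observed measurement date, averaged over genotype and soil_type;
# cov.reduce = FALSE keeps each observed date instead of its mean
fit.emm.date <- emmeans(fit, ~ date_measurement, cov.reduce = FALSE)
# re-attach the square transformation, as above
fit.emm.date <- update(fit.emm.date, tran = make.tran("power", 2), type = "response")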

Mathematical equation of binomial probit gam (mgcv) with tensor product interactions?

I have the following binomial (probit) gam using mgcv, which includes y (0 or 1), two continuous predictors (xa, xb) plus the ‘ti’ interactions of a third covariate (xc) with these two predictors.
mygam <- gamV(y ~ s(xa, k=10, bs="cr") + s(xb, k=10, bs="cr") +
ti(xc, xa, bs = c("cr", "cr"), k = c(5, 5)) +
ti(xc, xb, bs = c("cr", "cr"), k = c(5, 5)),
data = df, method = "ML", family = binomial(link = "probit"))
Using default k=10 for main effects and k=c(5,5) for interactions, the intercept and 50 coefficients are the following:
terms <- c("Intercept", "s(xa).1", "s(xa).2", "s(xa).3", "s(xa).4", "s(xa).5", "s(xa).6", "s(xa).7", "s(xa).8", "s(xa).9", "s(xb).1", "s(xb).2", "s(xb).3", "s(xb).4", "s(xb).5", "s(xb).6", "s(xb).7", "s(xb).8", "s(xb).9", "ti(xc,xa).1", "ti(xc,xa).2", "ti(xc,xa).3", "ti(xc,xa).4", "ti(xc,xa).5", "ti(xc,xa).6", "ti(xc,xa).7", "ti(xc,xa).8", "ti(xc,xa).9", "ti(xc,xa).10", "ti(xc,xa).11", "ti(xc,xa).12", "ti(xc,xa).13", "ti(xc,xa).14", "ti(xc,xa).15", "ti(xc,xa).16", "ti(xc,xb).1", "ti(xc,xb).2", "ti(xc,xb).3", "ti(xc,xb).4", "ti(xc,xb).5", "ti(xc,xb).6", "ti(xc,xb).7", "ti(xc,xb).8", "ti(xc,xb).9", "ti(xc,xb).10", "ti(xc,xb).11", "ti(xc,xb).12", "ti(xc,xb).13", "ti(xc,xb).14", "ti(xc,xb).15", "ti(xc,xb).16")
coefs <- c(-0.0702421404106311, 0.0768316292916553, 0.210036768213672, 0.409025596435604, 0.516554288252813, 0.314600352165584, -0.271938137725695, -1.1169186662112, -1.44829172827383, -2.39608336269616, 0.445091855160863, 0.119747299507175, -0.73508332280573, -1.3851857008194, -1.84125850675114, -1.77797283303084, -1.45118023146655, -1.56696555281429, -2.55103708393941, 0.0505422263407052, -0.110361707609838, -0.168897589312596, -0.0602318423244818, 0.095385784704545, -0.20818521830706, -0.318650042681766, -0.113613570916751, 0.123559386280642, -0.269467853796075, -0.412476320830133, -0.147039497705579, 0.189416535823022, -0.412990646359733, -0.632158143648671, -0.225344249076957, 0.0237165469278517, 0.0434926950921869, 0.080572361088243, 0.397397459143317, 0.0453636001566695, 0.0831126054198634, 0.153350111096294, 0.75009880522662, 0.0583689328419794, 0.107001374561518, 0.197852239031467, 0.970623037721609, 0.0894562434842868, 0.163989821269297, 0.303175057387294, 1.48718228468607)
df_coefs <- data.frame(terms, coefs)
I would like the mathematical equation of this model, which would allow me to determine the probability of y given known covariates. As an example from my dataset (n > 70000), the predicted probability ‘prob’ (type = “response”) obtained with xa = 7.116, xb = 2.6 and xc = 19 was prob = 0.76444141, which is the result the mathematical equation should reproduce.
Is this possible?
Thanks for your help and time.
Below, the summary(mygam)
Parametric coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.07024    0.00709  -9.907   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
            edf Ref.df    Chi.sq p-value
s(xa)     8.007  8.548  5602.328 < 2e-16 ***
s(xb)     8.448  8.908 16282.793 < 2e-16 ***
ti(xc,xa) 1.004  1.007    10.278 0.00138 **
ti(xc,xb) 1.021  1.042     7.718 0.00627 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) = 0.52   Deviance explained = 45.6%
-ML = 29379   Scale est. = 1   n = 77870
If you set type="terms" in the predict function, you get the contributions of the individual components to the linear predictor. However, these are not on the scale of outcome probability, but on that of the linear predictor.
Because of the non-linear transformation of the linear predictor -- in your case with the probit link -- attributing the predicted probability to the individual components requires attribution methods that come with additional assumptions.
An example of such an attribution method is Shapley values.
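For the overall predicted probability itself (rather than a per-component attribution), the term contributions can simply be reassembled into the linear predictor and pushed through the probit link. A minimal sketch, assuming the fitted mygam object and the example covariate values from the question:
# Example covariate values from the question
newdat <- data.frame(xa = 7.116, xb = 2.6, xc = 19)
# Per-term contributions to the linear predictor (probit scale)
terms_lp <- predict(mygam, newdata = newdat, type = "terms")
# Linear predictor = intercept + sum of the smooth contributions
eta <- coef(mygam)[["(Intercept)"]] + rowSums(terms_lp)
# Probit link: the probability is the standard normal CDF of the linear predictor
prob <- pnorm(eta)   # should agree with predict(mygam, newdat, type = "response")
Equivalently, predict(mygam, newdat, type = "link") gives eta directly.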

Problem while implementing the “Continual Learning Through Synaptic Intelligence” paper

I am trying to reproduce the results of the “Continual Learning Through Synaptic Intelligence” paper [1]. I tried implementing the algorithm as best as I could understand it after going through the paper many times. I also looked at its official implementation on GitHub, which is in TensorFlow 1.0, but could not understand much as I don't have much familiarity with it.
I got some results, but they are not as good as the paper's. I would like to ask if anyone can help me find out where I am going wrong. Before going into coding details, I want to discuss the pseudocode so that I understand what is going wrong with my implementation.
Here is roughly the pseudocode that I have implemented. Please help me.
lambda = 1
xi = 1e-3
total_tasks = 5

model = NN(total_tasks)
## multiheaded linear model ([784(input) --> 256 --> 256 --> 2(output)] * 5, i.e. 5 separate heads)
## output layer is a 2-neuron head (separate heads for each task, 5 tasks in total)
## output is a vector of size 2 (for 2 classes)

prev_theta = model.theta(copy=True)  # updated at end of task
## model.theta() returns the list of shared parameters (i.e. layer1 and layer2, excluding the output layer)
## copy=True gives a copy of the parameters,
## so it doesn't affect the original params connected to the computational graph

omega_total = zero_like(prev_theta)  ## capital Omega in the paper (per-parameter regularization strength)
omega = zero_like(prev_theta)        ## small omega in the paper (per-parameter contribution to loss)

for task_num in range(total_tasks):
    optimizer = ADAM()  # created before every task (or reset)
    prev_theta_step = model.theta(copy=True)  # updated at end of step

    ## training for task starts
    for epoch in range(10):
        for steps in range(steps_per_epoch):
            X, Y = train_dataset[task_num].sample()
            ## X is a flattened image of size 784
            ## Y is a binary vector of size 2 ([0,1] or [1,0])

            Y_pred = model(X, task_num)  # model is multihead, task_num selects the head
            loss = CROSS_ENTROPY(Y_pred, Y)

            if task_num > 0:  ## reg_loss starts from the second task
                theta = model.theta()
                ## here copy is not True, so it returns params connected to the computational graph
                reg_loss = torch.sum(omega_total * torch.square(theta - prev_theta))
                loss = loss + lambda * reg_loss

            optimizer.zero_grad()
            loss.backward()

            theta = model.theta(copy=True)
            grads = model.theta_grads()  ## grads of the shared parameters only
            omega = omega - grads * (theta - prev_theta_step)
            prev_theta_step = theta

            optimizer.step()

    ## training for task complete, update importance parameters
    theta = model.theta(copy=True)
    omega_total += relu(omega / ((theta - prev_theta)**2 + xi))
    prev_theta = theta
    omega = torch.zeros(theta_shape)

    ## evaluation code
    ...
    ## evaluation done
I am also attaching the results I got. In the results, 'one' (blue) represents training without the regularization loss (lambda=0) and 'two' (green) represents training with the regularization loss (lambda=1).
Thank you for reading so far. Kindly help me out.

Why do we "pack" the sequences in PyTorch?

I was trying to replicate How to use packing for variable-length sequence inputs for rnn but I guess I first need to understand why we need to "pack" the sequence.
I understand why we "pad" them but why is "packing" (via pack_padded_sequence) necessary?
I have stumbled upon this problem too and below is what I figured out.
When training an RNN (LSTM, GRU or vanilla RNN), it is difficult to batch variable-length sequences. For example: if the lengths of the sequences in a batch of size 8 are [4,6,8,5,4,3,7,8], you would pad all the sequences, resulting in 8 sequences of length 8. You would end up doing 64 computations (8x8), but you needed to do only 45. Moreover, if you wanted to do something fancy like using a bidirectional RNN, it would be harder to do batch computations just by padding, and you might end up doing more computations than required.
Instead, PyTorch allows us to pack the sequence; an internally packed sequence is a tuple of two lists. One contains the elements of the sequences, interleaved by time step (see the example below), and the other contains the size of each sequence, i.e. the batch size at each step. This is helpful for recovering the actual sequences as well as for telling the RNN what the batch size is at each time step. This has been pointed out by @Aerin. The packed sequence can be passed to the RNN, and it will internally optimize the computations.
I might have been unclear at some points, so let me know and I can add more explanations.
Here's a code example:
a = [torch.tensor([1,2,3]), torch.tensor([3,4])]
b = torch.nn.utils.rnn.pad_sequence(a, batch_first=True)
>>>>
tensor([[ 1, 2, 3],
[ 3, 4, 0]])
torch.nn.utils.rnn.pack_padded_sequence(b, batch_first=True, lengths=[3,2])
>>>>PackedSequence(data=tensor([ 1, 3, 2, 4, 3]), batch_sizes=tensor([ 2, 2, 1]))
Here are some visual explanations [1] that might help to develop better intuition for the functionality of pack_padded_sequence().
TL;DR: It is performed primarily to save compute. Consequently, the time required for training neural network models is also (drastically) reduced, especially when carried out on very large (a.k.a. web-scale) datasets.
Let's assume we have 6 sequences (of variable lengths) in total. You can also consider this number 6 as the batch_size hyperparameter. (The batch_size will vary depending on the length of the sequence (cf. Fig.2 below))
Now, we want to pass these sequences to some recurrent neural network architecture(s). To do so, we have to pad all of the sequences (typically with 0s) in our batch to the maximum sequence length in our batch (max(sequence_lengths)), which in the below figure is 9.
So, the data preparation work should be complete by now, right? Not really, because there is still one pressing problem: how much compute we have to do compared with what is actually required.
For the sake of understanding, let's also assume that we will matrix multiply the above padded_batch_of_sequences of shape (6, 9) with a weight matrix W of shape (9, 3).
Thus, we will have to perform 6x9 = 54 multiplication and 6x8 = 48 addition (nrows x (ncols - 1)) operations, only to throw away most of the computed results, since they would be 0s (where we have pads). The actual required compute in this case is as follows:
9-mult 8-add
8-mult 7-add
6-mult 5-add
4-mult 3-add
3-mult 2-add
2-mult 1-add
---------------
32-mult 26-add
------------------------------
# savings: 22-mult & 22-add ops  (54 - 32 and 48 - 26)
That's a LOT more savings even for this very simple (toy) example. You can now imagine how much compute (eventually: cost, energy, time, carbon emission etc.) can be saved using pack_padded_sequence() for large tensors with millions of entries, and million+ systems all over the world doing that, again and again.
The functionality of pack_padded_sequence() can be understood from the figure below, with the help of the used color-coding:
As a result of using pack_padded_sequence(), we will get a tuple of tensors containing (i) the flattened (along axis-1, in the above figure) sequences, and (ii) the corresponding batch sizes, tensor([6,6,5,4,3,3,2,2,1]) for the above example.
The data tensor (i.e. the flattened sequences) could then be passed to objective functions such as CrossEntropy for loss calculations.
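As a concrete check of the numbers above, here is a short PyTorch sketch; the lengths [9, 8, 6, 4, 3, 2] are one choice consistent with the batch sizes quoted above, not taken from the original figure:
import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

# Six variable-length sequences; these lengths reproduce batch_sizes
# tensor([6, 6, 5, 4, 3, 3, 2, 2, 1]) and the 32-mult / 26-add tally above.
lengths = [9, 8, 6, 4, 3, 2]
seqs = [torch.arange(1, n + 1) for n in lengths]

padded = pad_sequence(seqs, batch_first=True)               # shape (6, 9), zero-padded
packed = pack_padded_sequence(padded, lengths, batch_first=True)

print(packed.batch_sizes)    # tensor([6, 6, 5, 4, 3, 3, 2, 2, 1])
print(packed.data.numel())   # 32 -> only the real (non-pad) elements are kept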
[1] Image credits to @sgrvinod
The above answers addressed the question of why this is done very well. I just want to add an example for better understanding the use of pack_padded_sequence.
Let's take an example
Note: pack_padded_sequence requires sorted sequences in the batch (in descending order of sequence length). In the example below, the sequences in the batch were already sorted to reduce clutter. Visit this gist link for the full implementation.
First, we create a batch of 2 sequences with different sequence lengths, as below. We have 7 elements in the batch in total.
Each sequence has an embedding size of 2.
The first sequence has the length: 5
The second sequence has the length: 2
import torch
seq_batch = [torch.tensor([[1, 1],
                           [2, 2],
                           [3, 3],
                           [4, 4],
                           [5, 5]]),
             torch.tensor([[10, 10],
                           [20, 20]])]
seq_lens = [5, 2]
We pad seq_batch to get a batch of sequences with equal length 5 (the max length in the batch). Now the new batch has 10 elements in total.
# pad the seq_batch
padded_seq_batch = torch.nn.utils.rnn.pad_sequence(seq_batch, batch_first=True)
"""
>>>padded_seq_batch
tensor([[[ 1,  1],
         [ 2,  2],
         [ 3,  3],
         [ 4,  4],
         [ 5,  5]],

        [[10, 10],
         [20, 20],
         [ 0,  0],
         [ 0,  0],
         [ 0,  0]]])
"""
Then, we pack the padded_seq_batch. It returns a tuple of two tensors:
The first is the data including all the elements in the sequence batch.
The second is the batch_sizes, which tells how the elements are related to each other across the time steps.
# pack the padded_seq_batch
packed_seq_batch = torch.nn.utils.rnn.pack_padded_sequence(padded_seq_batch, lengths=seq_lens, batch_first=True)
"""
>>> packed_seq_batch
PackedSequence(
   data=tensor([[ 1,  1],
                [10, 10],
                [ 2,  2],
                [20, 20],
                [ 3,  3],
                [ 4,  4],
                [ 5,  5]]),
   batch_sizes=tensor([2, 2, 1, 1, 1]))
"""
Now, we pass the tuple packed_seq_batch to the recurrent modules in PyTorch, such as RNN and LSTM. This only requires 5 + 2 = 7 computations in the recurrent module.
lstm = torch.nn.LSTM(input_size=2, hidden_size=3, batch_first=True)
output, (hn, cn) = lstm(packed_seq_batch.float())  # pass a float tensor instead of a long tensor
"""
>>> output # PackedSequence
PackedSequence(data=tensor(
[[-3.6256e-02, 1.5403e-01, 1.6556e-02],
[-6.3486e-05, 4.0227e-03, 1.2513e-01],
[-5.3134e-02, 1.6058e-01, 2.0192e-01],
[-4.3123e-05, 2.3017e-05, 1.4112e-01],
[-5.9372e-02, 1.0934e-01, 4.1991e-01],
[-6.0768e-02, 7.0689e-02, 5.9374e-01],
[-6.0125e-02, 4.6476e-02, 7.1243e-01]], grad_fn=<CatBackward>), batch_sizes=tensor([2, 2, 1, 1, 1]))
>>>hn
tensor([[[-6.0125e-02, 4.6476e-02, 7.1243e-01],
[-4.3123e-05, 2.3017e-05, 1.4112e-01]]], grad_fn=<StackBackward>),
>>>cn
tensor([[[-1.8826e-01, 5.8109e-02, 1.2209e+00],
[-2.2475e-04, 2.3041e-05, 1.4254e-01]]], grad_fn=<StackBackward>)))
"""
We need to convert output back to the padded batch of output:
padded_output, output_lens = torch.nn.utils.rnn.pad_packed_sequence(output, batch_first=True, total_length=5)
"""
>>> padded_output
tensor([[[-3.6256e-02, 1.5403e-01, 1.6556e-02],
[-5.3134e-02, 1.6058e-01, 2.0192e-01],
[-5.9372e-02, 1.0934e-01, 4.1991e-01],
[-6.0768e-02, 7.0689e-02, 5.9374e-01],
[-6.0125e-02, 4.6476e-02, 7.1243e-01]],
[[-6.3486e-05, 4.0227e-03, 1.2513e-01],
[-4.3123e-05, 2.3017e-05, 1.4112e-01],
[ 0.0000e+00, 0.0000e+00, 0.0000e+00],
[ 0.0000e+00, 0.0000e+00, 0.0000e+00],
[ 0.0000e+00, 0.0000e+00, 0.0000e+00]]],
grad_fn=<TransposeBackward0>)
>>> output_lens
tensor([5, 2])
"""
Compare this effort with the standard way
In the standard way, we only need to pass the padded_seq_batch to the lstm module. However, it requires 10 computations. It involves several more computations on padding elements, which is computationally inefficient.
Note that it does not lead to inaccurate representations, but much more logic is needed to extract correct representations:
For LSTM (or any recurrent modules) with only forward direction, if we would like to extract the hidden vector of the last step as a representation for a sequence, we would have to pick up hidden vectors from T(th) step, where T is the length of the input. Picking up the last representation will be incorrect. Note that T will be different for different inputs in batch.
For Bi-directional LSTM (or any recurrent modules), it is even more cumbersome, as one would have to maintain two RNN modules, one that works with padding at the beginning of the input and one with padding at end of the input, and finally extracting and concatenating the hidden vectors as explained above.
Let's see the difference:
# The standard approach: using padding batch for recurrent modules
output, (hn, cn) = lstm(padded_seq_batch.float())
"""
>>> output
tensor([[[-3.6256e-02, 1.5403e-01, 1.6556e-02],
[-5.3134e-02, 1.6058e-01, 2.0192e-01],
[-5.9372e-02, 1.0934e-01, 4.1991e-01],
[-6.0768e-02, 7.0689e-02, 5.9374e-01],
[-6.0125e-02, 4.6476e-02, 7.1243e-01]],
[[-6.3486e-05, 4.0227e-03, 1.2513e-01],
[-4.3123e-05, 2.3017e-05, 1.4112e-01],
[-4.1217e-02, 1.0726e-01, -1.2697e-01],
[-7.7770e-02, 1.5477e-01, -2.2911e-01],
[-9.9957e-02, 1.7440e-01, -2.7972e-01]]],
grad_fn= < TransposeBackward0 >)
>>> hn
tensor([[[-0.0601, 0.0465, 0.7124],
[-0.1000, 0.1744, -0.2797]]], grad_fn= < StackBackward >),
>>> cn
tensor([[[-0.1883, 0.0581, 1.2209],
[-0.2531, 0.3600, -0.4141]]], grad_fn= < StackBackward >))
"""
The above results show that hn and cn differ between the two approaches, and that the outputs of the two approaches yield different values for the padding elements.
Adding to Umang's answer, I found this important to note.
The first item in the returned tuple of pack_padded_sequence is data, a tensor containing the packed sequence. The second item is a tensor of integers holding information about the batch size at each sequence step.
What's important here, though, is that the second item (batch sizes) represents the number of elements at each sequence step in the batch, not the varying sequence lengths passed to pack_padded_sequence.
For instance, given the data abc and x, the PackedSequence would contain the data axbc with batch_sizes=[2,1,1].
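A quick way to see that interleaving, with integer tensors standing in for abc and x (a hypothetical toy example, not from the original answer):
import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

a = torch.tensor([1, 2, 3])   # stands in for "abc"
x = torch.tensor([7])         # stands in for "x"

padded = pad_sequence([a, x], batch_first=True)   # tensor([[1, 2, 3], [7, 0, 0]])
packed = pack_padded_sequence(padded, lengths=[3, 1], batch_first=True)

print(packed.data)         # tensor([1, 7, 2, 3])  -> "a", "x", "b", "c" interleaved by time step
print(packed.batch_sizes)  # tensor([2, 1, 1])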
I used pack padded sequence as follows.
packed_embedded = nn.utils.rnn.pack_padded_sequence(seq, text_lengths)
packed_output, hidden = self.rnn(packed_embedded)
where text_lengths are the lengths of the individual sequences before padding, and the sequences are sorted in decreasing order of length within a given batch.
You can check out an example here.
We do packing so that the RNN doesn't see the unwanted padded indices while processing the sequence, which would otherwise affect the overall performance.

Calculate cutoff and sensitivity for specific values of specificity?

After calculating several regression models, I want to calculate sensitivity values and the cut-off for pre-specified values of specificity (e.g., 0.99, 0.90, 0.85 and so on) to find the best model. I have created code to calculate sensitivity and specificity for given values of the cut-off (from 0.1 to 0.9), but now I want to use specific values of specificity (i.e., calculate the corresponding cut-off value and sensitivity values), and here I'm stuck.
Suppose I have the following regression model (using the example dataset 'mtcars'):
data(mtcars)
model <- glm(formula= vs ~ wt + disp, data=mtcars, family=binomial)
Here is the code I've used for the calculation of sens and spec for given values of the cut-off:
predvalues <- model$fitted.values
library(caret)  # for confusionMatrix()

getMisclass <- function(cutoff, p, labels) {
  labels <- factor(labels, levels = c("0", "1"))  # reference must be a factor with the same levels
  d <- cut(p, breaks = c(-Inf, cutoff, Inf), labels = c("0", "1"))
  print(confusionMatrix(d, labels, positive = "1"))
  cat("cutoff", cutoff, ":\n")
  t <- table(d, labels)
  print(round(sum(t[c(1, 4)]) / sum(t), 2))
}

cutoffs <- seq(.1, .9, by = .1)
sapply(cutoffs, getMisclass, p = predvalues, labels = mtcars$vs)
Can someone help me rewrite this code so that it calculates the sensitivity and cut-off scores for a given range of specificity values? Is it possible?
The values for the cutoff should be
cutoffs <- c(0.99, 0.90, 0.85, 0.80, 0.75)
Thanks a lot!
This is closely related to how ROC curves are calculated: if those are calculated with fine granularity, you essentially get a sensitivity and specificity for "every" threshold value. So, what you could do is simply calculate the sensitivities, specificities and corresponding thresholds as if you wanted to obtain a ROC curve...
library(pROC)
myRoc <- roc(predictor = predvalues, response = mtcars$vs)
plot(myRoc)
myRoc$specificities
print(with(myRoc, data.frame(specificities, sensitivities, thresholds)))
# specificities sensitivities thresholds
# 1 0.00000000 1.00000000 -Inf
# 2 0.05555556 1.00000000 0.002462809
# 3 0.11111111 1.00000000 0.003577104
# 4 0.16666667 1.00000000 0.004656164
# 5 0.22222222 1.00000000 0.005191974
# 6 0.27777778 1.00000000 0.006171197
# [...]
...and then look up the corresponding sensitivities and thresholds for whichever specificities you are interested in, e.g. as:
cutoffs <- c(0.99, 0.90, 0.85, 0.80, 0.75)
myData <- with(myRoc, data.frame(specificities, sensitivities, thresholds))
library(plyr)
print(laply(cutoffs, function(cutoff) myData$sensitivities[which.min(abs(myData$specificities-cutoff))]))
# [1] 0.7857143 0.8571429 0.8571429 0.9285714 0.9285714