More arguments required when I run my python/pyspark function - function

I have a function which I have defined as follows, you can see that it clearly requires 7 arguments;
def calc_z(w,S,var,a1,a2,yt1,yt2):
mu = w*S
sigma = mt.sqrt(var)
z = np.random.normal(mu,sigma)
u = [a1,a2,z]
yt = [yt1,yt2,1]
thetaset = np.random.rand(len(u))
m = [i for i in range(len(u))]
max_iter = 30
#Calculate E-step
for i in range(max_iter):
print 'Iteration:', i
print 'z:', z
print 'thetaset', thetaset
devLz = eq6(var,w,S,z,yt,u,thetaset,m)
dev2Lz2 = eq9(var,thetaset,u)
#Calculate M-Step
z = z - (devLz / dev2Lz2)
w = lambdaw * z
for i in range(len(thetaset)):
devLTheta = eq7(yt,u,thetaset,lambdatheta)
dev2LTheta2 = eq10(thetaset,u,lambdatheta)
thetaset = thetaset - (devLTheta / dev2LTheta2)
return z
I am using pyspark so I convert this to a udf
calc_z_udf = udf(calc_z,FloatType())
and then run it as follows (where I am clearly passing in 7 arguments - Or am I going mad!?);
data = data.withColumn('z', calc_z_udf(data['w'],data['Org_Depth_Diff_S'],data['var'],data['proximity_rank_a1'],data['cotravel_count_a2'],data['cotravel_yt1'],data['proximity_yt2']))
When I run this however I am getting an error which states:
TypeError: calc_z() takes exactly 7 arguments (6 given)
Could anyone help me with why this might be as it is clear that when I am running the function I am infact passing in 7 arguments and not 6 as the error states?

I am not sure its is the reason no need to send column objects you can just pass strings:
data = data.withColumn('z', calc_z_udf('w', 'Org_Depth_Diff_S','var', 'proximity_rank_a1', 'cotravel_count_a2', 'cotravel_yt1', 'proximity_yt2'))

Related

Trouble Calling Terms from Function

I have a code that defines a function, then I try to use the variables I defined within that function in another expression. When I do this, I get an error that is:
Undefined function or variable 'phi'.
I'm not sure why phi is undefined, since I have it in the if/else statement.
It might be better to explain with my (shortened) code:
global I11 I22 I33 Mx My Mz w10 w20 w30 eps10 eps20 eps30 eps40...
C110 C120 C130 C210 C220 C230 C310 C320 C330 IC K0
IC = [w10 w20 w30...
eps10 eps20 eps30 eps40...
C110 C120 C130 C210 C220 C230 C310 C320 C330];
opts = odeset('RelTol', 1*10^(-10),'AbsTol', 1*10^(-10));
[t, y] = ode45(#(t,y) DynEqn2(t,y,I11,I22,I33,Mx,My,Mz), [ti tf], IC, opts);
N = sqrt(sum(y(:,4:7).^2,2));
kap = acosd(1-2*y(:,5).^2-2*y(:,7).^2);
phi1 = acosd((2.*(y(:,4).*y(:,5)+y(:,6).*y(:,7)))/sind(kap));
phi2 = asind((2.*(y(:,6).*y(:,4)-y(:,5).*y(:,7)))/sind(kap));
if phi1==phi2
phi = phi1;
elseif phi1==180-phi2
phi = phi1;
elseif -phi1==phi2
phi = -phi1;
elseif -phi1==180-phi2
phi = -phi1;
else
disp('Something is wrong with phi')
end
figure (1)
plot(t,phi)
figure (2)
plot(t,kap)
function soln = DynEqn2(t,y,I11,I22,I33,Mx,My,Mz)
w1 = y(1);
w2 = y(2);
w3 = y(3);
eps1 = y(4);
eps2 = y(5);
eps3 = y(6);
eps4 = y(7);
C11 = y(8);
C12 = y(9);
C13 = y(10);
C21 = y(11);
C22 = y(12);
C23 = y(13);
C31 = y(14);
C32 = y(15);
C33 = y(16);
w1_dot = (Mx - w2*w3*(I33-I22))/I11;
w2_dot = (My - w1*w3*(I11-I33))/I22;
w3_dot = (Mz - w1*w2*(I22-I11))/I33;
eps1_dot = .5*(w1*eps4-w2*eps3+w3*eps2);
eps2_dot = .5*(w1*eps3+w2*eps4-w3*eps1);
eps3_dot = .5*(-w1*eps2+w2*eps1+w3*eps4);
eps4_dot = -.5*(w1*eps1+w2*eps2+w3*eps3);
C11_dot = C12*w3-C13*w2;
C12_dot = C13*w1-C11*w3;
C13_dot = C11*w2-C12*w1;
C21_dot = C22*w3-C23*w2;
C22_dot = C23*w1-C21*w3;
C23_dot = C21*w2-C22*w1;
C31_dot = C32*w3-C33*w2;
C32_dot = C33*w1-C31*w3;
C33_dot = C31*w2-C32*w1;
soln = [w1_dot; w2_dot; w3_dot; ...
eps1_dot; eps2_dot; eps3_dot; eps4_dot; ...
C11_dot; C12_dot; C13_dot; C21_dot; C22_dot; C23_dot; C31_dot; C32_dot; C33_dot];
end
My lines where I calculate phi1, phi2, and then the if/else statement to find phi, are what I am struggling with.
I made sure that the variables defined in the function work, so for example, in the command window I typed in 'y(:,4)' and got the correct output. But whenever I try to use this within the functions i.e. 'phi1', it repeatedly outputs an incorrect value of '90.0000' until I stop it.
Where I define the 'N' variable, it is something similar, yet that one works without errors.
Does anyone have any ideas how to amend this issue?
Any help is appreciated, thanks.
Edit: The complete error message is as follows:
Undefined function or variable 'phi'.
Error in HW6_Q1 (line 85)
plot(t,phi)
I figured out my solution with the help from a colleague not on Stack Overflow.
I forgot the ./, which turned my phi into a matrix, rather than a vector which is what I wanted from it.

Dimension problem when converting a MATLAB .m script into an Octave compatible syntax

I want to run a MATLAB script M-file to reconstruct a point cloud in Octave. Therefore I had to rewrite some parts of the code to make it compatible with Octave. Actually the M-file works fine in Octave (I don't get any errors) and also the plotted point cloud looks good at first glance, but it seems that the variables are only half the size of the original MATLAB variables. In the attached screenshots you can see what I mean.
Octave:
MATLAB:
You can see that the dimension of e.g. M in Octave is 1311114x3 but in MATLAB it is 2622227x3. The actual number of rows in my raw file is 2622227 as well.
Here you can see an extract of the raw file (original data) that I use.
Rotation angle Measured distance
-0,090 26,295
-0,342 26,294
-0,594 26,294
-0,846 26,295
-1,098 26,294
-1,368 26,296
-1,620 26,296
-1,872 26,296
In MATLAB I created my output variable as follows.
data = table;
data.Rotationangle = cell2mat(raw(:, 1));
data.Measureddistance = cell2mat(raw(:, 2));
As there is no table function in Octave I wrote
data = cellfun(#(x)str2num(x), strrep(raw, ',', '.'))
instead.
Octave also has no struct2array function, so I had to replace it as well.
In MATLAB I wrote.
data = table2array(data);
In Octave this was a bit more difficult to do. I had to create a struct2array function, which I did by means of this bug report.
%% Create a struct2array function
function retval = struct2array (input_struct)
%input check
if (~isstruct (input_struct) || (nargin ~= 1))
print_usage;
endif
%convert to cell array and flatten/concatenate output.
retval = [ (struct2cell (input_struct)){:}];
endfunction
clear b;
b.a = data;
data = struct2array(b);
Did I make a mistake somewhere and could someone help me to solve this problem?
edit:
Here's the part of my script where I'm using raw.
delimiter = '\t';
startRow = 5;
formatSpec = '%s%s%[^\n\r]';
fileID = fopen(filename,'r');
dataArray = textscan(fileID, formatSpec, 'Delimiter', delimiter, 'HeaderLines' ,startRow-1, 'ReturnOnError', false, 'EndOfLine', '\r\n');
fclose(fileID);
%% Convert the contents of columns containing numeric text to numbers.
% Replace non-numeric text with NaN.
raw = repmat({''},length(dataArray{1}),length(dataArray)-1);
for col=1:length(dataArray)-1
raw(1:length(dataArray{col}),col) = mat2cell(dataArray{col}, ones(length(dataArray{col}), 1));
end
numericData = NaN(size(dataArray{1},1),size(dataArray,2));
for col=[1,2]
% Converts text in the input cell array to numbers. Replaced non-numeric
% text with NaN.
rawData = dataArray{col};
for row=1:size(rawData, 1)
% Create a regular expression to detect and remove non-numeric prefixes and
% suffixes.
regexstr = '(?<prefix>.*?)(?<numbers>([-]*(\d+[\.]*)+[\,]{0,1}\d*[eEdD]{0,1}[-+]*\d*[i]{0,1})|([-]*(\d+[\.]*)*[\,]{1,1}\d+[eEdD]{0,1}[-+]*\d*[i]{0,1}))(?<suffix>.*)';
try
result = regexp(rawData(row), regexstr, 'names');
numbers = result.numbers;
% Detected commas in non-thousand locations.
invalidThousandsSeparator = false;
if numbers.contains('.')
thousandsRegExp = '^\d+?(\.\d{3})*\,{0,1}\d*$';
if isempty(regexp(numbers, thousandsRegExp, 'once'))
numbers = NaN;
invalidThousandsSeparator = true;
end
end
% Convert numeric text to numbers.
if ~invalidThousandsSeparator
numbers = strrep(numbers, '.', '');
numbers = strrep(numbers, ',', '.');
numbers = textscan(char(numbers), '%f');
numericData(row, col) = numbers{1};
raw{row, col} = numbers{1};
end
catch
raw{row, col} = rawData{row};
end
end
end
You don't see any raw in my workspaces because I clear all temporary variables before I reconstruct my point cloud.
Also my original data in row 1311114 and 1311115 look normal.
edit 2:
As suggested here is a small example table to clarify what I want and what MATLAB does with the table2array function in my case.
data =
-0.0900 26.2950
-0.3420 26.2940
-0.5940 26.2940
-0.8460 26.2950
-1.0980 26.2940
-1.3680 26.2960
-1.6200 26.2960
-1.8720 26.2960
With the struct2array function I used in Octave I get the following array.
data =
-0.090000 26.295000
-0.594000 26.294000
-1.098000 26.294000
-1.620000 26.296000
-2.124000 26.295000
-2.646000 26.293000
-3.150000 26.294000
-3.654000 26.294000
If you compare the Octave array with my original data, you can see that every second row is skipped. This seems to be the reason for 1311114 instead of 2622227 rows.
edit 3:
I tried to solve my problem with the suggestions of #Tasos Papastylianou, which unfortunately was not successful.
First I did the variant with a struct.
data = struct();
data.Rotationangle = [raw(:,1)];
data.Measureddistance = [raw(:,2)];
data = cell2mat( struct2cell (data ).' )
But this leads to the following structure in my script. (Unfortunately the result is not what I would like to have as shown in edit 2. Don't be surprised, I only used a small part of my raw file to accelerate the run of my script, so here are only 769 lines.)
[766,1] = -357,966
[767,1] = -358,506
[768,1] = -359,010
[769,1] = -359,514
[1,2] = 26,295
[2,2] = 26,294
[3,2] = 26,294
[4,2] = 26,296
Furthermore I get the following error.
error: unary operator '-' not implemented for 'cell' operands
error: called from
Cloud_reconstruction at line 137 column 11
Also the approach with the dataframe octave package didn't work. When I run the following code it leads to the error you can see below.
dataframe2array = #(df) cell2mat( struct(df).x_data );
pkg load dataframe;
data = dataframe();
data.Rotationangle = [raw(:, 1)];
data.Measureddistance = [raw(:, 2)];
dataframe2array(data)
error:
warning: Trying to overwrite colum names
warning: called from
df_matassign at line 147 column 13
subsasgn at line 172 column 14
Cloud_reconstruction at line 106 column 20
warning: Trying to overwrite colum names
warning: called from
df_matassign at line 176 column 13
subsasgn at line 172 column 14
Cloud_reconstruction at line 106 column 20
warning: Trying to overwrite colum names
warning: called from
df_matassign at line 147 column 13
subsasgn at line 172 column 14
Cloud_reconstruction at line 107 column 23
warning: Trying to overwrite colum names
warning: called from
df_matassign at line 176 column 13
subsasgn at line 172 column 14
Cloud_reconstruction at line 107 column 23
error: RHS(_,2): but RHS has size 768x1
error: called from
df_matassign at line 179 column 11
subsasgn at line 172 column 14
Cloud_reconstruction at line 107 column 23
Both error messages refer to the following part of my script where I'm doing the reconstruction of the point cloud in cylindrical coordinates.
distLaserCenter = 47; % Distance between the pipe centerline and the blind zone in mm
m = size(data,1); % Find the length of the first dimension of data
zincr = 0.4/360; % z increment in mm per deg
data(:,1) = -data(:,1);
for i = 1:m
data(i,2) = data(i,2) + distLaserCenter;
if i == 1
data(i,3) = 0;
elseif abs(data(i,1)-data(i-1)) < 100
data(i,3) = data(i-1,3) + zincr*(data(i,1)-data(i-1));
else abs(data(i,1)-data(i-1)) > 100;
data(i,3) = data(i-1,3) + zincr*(data(i,1)-(data(i-1)-360));
end
end
To give some background information for a better understanding. The script is used to reconstruct a pipe as a point cloud. The surface of the pipe was scanned from inside with a laser and the laser measured several points (distance from laser to the inner wall of the pipe) at each deg of rotation. I hope this helps to understand what I want to do with my script.
Not sure exactly what you're trying to do, but here's a toy example of how a struct could be used in an equivalent manner to a table:
matlab:
data = table;
data.A = [1;2;3;4;5];
data.B = [10;20;30;40;50];
table2array(data)
octave:
data = struct();
data.A = [1;2;3;4;5];
data.B = [10;20;30;40;50];
cell2mat( struct2cell (data ).' )
Note the transposition operation (.') before passing the result to cell2mat, since in a table, the 'fieldnames' are arranged horizontally in columns, whereas the struct2cell ends up arranging what used to be the 'fieldnames' as rows.
You might also be interested in the dataframe octave package, which performs similar functions to matlab's table (or in fact, R's dataframe object): https://octave.sourceforge.io/dataframe/ (you can install this by typing pkg install -forge dataframe in your console)
Unfortunately, the way to display the data as an array is still not ideal (see: https://stackoverflow.com/a/55417141/4183191), but you can easily convert that into a tiny function, e.g.
dataframe2array = #(df) cell2mat( struct(df).x_data );
Your code can then become:
pkg load dataframe;
data = dataframe();
data.A = [1;2;3;4;5];
data.B = [10;20;30;40;50];
dataframe2array(data)

Nonlinear fits in Octave

currently I'm using nonlin_curvefit function from GNU Octave's 'optim' package to fit data with . But this time I did also need the uncertainty of the returned parameters to determine the quality of the fit. After reading through the documentation I tied using the function curvefit_stat.
However whenever I alwas get errors using this functiona and I can't make any sense of the error message. I'm using Octave 4.2.2 from Ubuntu 18.04's default repository.
Standalone minimal example and error messages below. Startparameters init_cvg usually produces good result, while using init_dvg usually results in poor fits:
1;
x = linspace(-2*pi, 2*pi, 600);
ydata = 3.4*sin(1.6*x) + rand(size(x))*1.3;
f = #(p, x) p(1)*sin(p(2).*x);
function y = testfun(p ,x)
y = p(1).*sin(p(2).*x);
endfunction
init_cvg = [1; 1.1];
init_dvg = [1; 1.0];
[pc, mod_valc, cvgc, outpc] = nonlin_curvefit(f, init_cvg, x, ydata);
[pd, mod_vald, cvgd, outpd] = nonlin_curvefit(f, init_dvg, x, ydata);
hold off
plot(x, yd, "b.");
hold on;
plot(x, mod_valc, "r-");
plot(x, mod_vald, "color", [0.9 0.4 0.3]);
settings = optimset("ret_covp", true);
covpc = curvefit_stat(f, pc, x, ydata, settings);
covpd = curvefit_stat(f, pd, x, ydata, settings);
puts("sqrt(diag(covpc))")
sqrt(diag(covpc))
puts("sqrt(diag(covpd))")
sqrt(diag(covpd))
The first error message occurs when I use f as a model function, the second occurs when I use testfun instead:
>> curvefit_stat_TEST
error: label of objective function must be specified
error: called from
__residmin_stat__ at line 566 column 7
curvefit_stat at line 56 column 7
curvefit_stat_TEST at line 25 column 7
>> curvefit_stat_TEST
error: 'p' undefined near line 8 column 7
error: called from
testfun at line 8 column 5
curvefit_stat_TEST at line 25 column 7
>>
Could somebody confirm this error ?
I would appreciate any help.
I found the problem.
I needed to add "abjf_type", "wls" as arguments to optimset.

Plotting a function in matlab involving an integral

I'm trying to plot a function that contains a definite integral. My code uses all anonymous functions. When I run the file, it gives me an error. My code is below:
%%% List of Parameters %%%
gamma_sp = 1;
cap_gamma = 15;
gamma_ph = 0;
omega_0 = -750;
d_omega_0 = 400;
omega_inh = 100;
d_omega_inh = 1000;
%%% Formulae %%%
gamma_t = gamma_sp/2 + cap_gamma/2 + gamma_ph;
G = #(x) exp(-(x-omega_inh).^2./(2*d_omega_inh.^2))./(sqrt(2*pi)*d_omega_inh);
F = #(x) exp(-(x-omega_0).^2./(2*d_omega_0.^2))./(sqrt(2*pi)*d_omega_0);
A_integral = #(x,y) G(x)./(y - x + 1i*gamma_t);
Q_integral = #(x,y) F(x)./(y - x + 1i*gamma_t);
A = #(y) integral(#(x)A_integral(x,y),-1000,1000);
Q = #(y) integral(#(x)Q_integral(x,y),-3000,0);
P1 = #(y) -1./(1i.*(gamma_sp + cap_gamma)).*(1./(y + 2.*1i.*gamma_t)*(A(y)-conj(A(0)))-1./y.*(A(y)-A(0))+cap_gamma./gamma_sp.*Q(y).*(A(0)-conj(A(0))));
P2 = #(y) conj(P1(y));
P = #(y) P1(y) - P2(y);
sig = #(y) abs(P(y)).^2;
rng = -2000:0.05:1000;
plot(rng,sig(rng))
It seems to me that when the program runs the plot command, it should put each value of rng into sig(y), and that value will be used as the y value in A_integral and Q_integral. However, matlab throws an error when I try to run the program.
Error using -
Matrix dimensions must agree.
Error in #(x,y)G(x)./(y-x+1i*gamma_t)
Error in #(x)A_integral(x,y)
Error in integralCalc/iterateScalarValued (line 314)
fx = FUN(t);
Error in integralCalc/vadapt (line 133)
[q,errbnd] = iterateScalarValued(u,tinterval,pathlen);
Error in integralCalc (line 76)
[q,errbnd] = vadapt(#AtoBInvTransform,interval);
Error in integral (line 89)
Q = integralCalc(fun,a,b,opstruct);
Error in #(y)integral(#(x)A_integral(x,y),-1000,1000)
Error in
#(y)-1./(1i.*(gamma_sp+cap_gamma)).*(1./(y+2.*1i.*gamma_t)*(A(y)-conj(A(0)))-1. /y.*(A(y)-A(0))+cap_gamma./gamma_sp.*Q(y).*(A(0)-conj(A(0))))
Error in #(y)P1(y)-P2(y)
Error in #(y)abs(P(y)).^2
Error in fwm_spec_diff_paper_eqn (line 26)
plot(rng,sig(rng))
Any ideas about what I'm doing wrong?
You have
>> rng = -2000:0.05:1000;
>> numel(rng)
ans =
60001
all 60001 elements get passed down to
A = #(y) integral(#(x)A_integral(x,y),-1000,1000);
which calls
A_integral = #(x,y) G(x)./(y - x + 1i*gamma_t);
(similar for Q). The thing is, integral is an adaptive quadrature method, meaning (roughly) that the amount of x's it will insert into A_integral varies with how A_integral behaves at certain x.
Therefore, the amount of elements in y will generally be different from the elements in x in the call to A_integral. This is why y-x +1i*gamma_t fails.
Considering the complexity of what you're trying to do, I think it is best to re-define all anonymous functions as proper functions, and integrate a few of them into single functions. Look into the documentation of bsxfun to see if that can help (e.g., bsxfun(#minus, y.', x) instead of y-x could perhaps fix a few of these issues), otherwise, vectorize only in x and loop over y.
Thanks Rody, that made sense to me. I keep trying to use matlab like mathematica and I forget how matlab does things. I modified the code a bit, and it produces the right result. The integrals are evaluated very roughly, but it should be easy to fix that. I've posted my modified code below.
%%% List of Parameters %%%
gamma_sp = 1;
cap_gamma = 15;
gamma_ph = 0;
omega_0 = -750;
d_omega_0 = 400;
omega_inh = 100;
d_omega_inh = 1000;
%%% Formulae %%%
gamma_t = gamma_sp/2 + cap_gamma/2 + gamma_ph;
G = #(x) exp(-(x-omega_inh).^2./(2*d_omega_inh.^2))./(sqrt(2*pi)*d_omega_inh);
F = #(x) exp(-(x-omega_0).^2./(2*d_omega_0.^2))./(sqrt(2*pi)*d_omega_0);
A_integral = #(x,y) G(x)./(y - x + 1i*gamma_t);
Q_integral = #(x,y) F(x)./(y - x + 1i*gamma_t);
w = -2000:0.05:1000;
sigplot = zeros(size(w));
P1plot = zeros(size(w));
P2plot = zeros(size(w));
Pplot = zeros(size(w));
aInt_range = -1000:0.1:1200;
qInt_range = -2000:0.1:100;
A_0 = sum(A_integral(aInt_range,0).*0.1);
for k=1:size(w,2)
P1plot(k) = -1./(1i*(gamma_sp + cap_gamma)).*(1./(w(k)+2.*1i.*gamma_t).*(sum(A_integral(aInt_range,w(k)).*0.1)-conj(A_0))-1./w(k).*(sum(A_integral(aInt_range,w(k)).*0.1)-A_0)+cap_gamma./gamma_sp.*sum(Q_integral(qInt_range,w(k)).*0.1).*(A_0-conj(A_0)));
P2plot(k) = conj(P1plot(k));
Pplot(k) = P1plot(k) - P2plot(k);
sigplot(k) = abs(Pplot(k)).^2;
end
plot(w,sigplot)

Problem with spline method = 'monoH.FC''

I am interested in using the monotone spline, but I get an error when R tries to use it. I am using R 2.12.0, and the method 'monoH.FC' says that it has been supported since 2.8.0
Reproducible example (same result for more complicated (x,y) relationships)
x<-1:2
y<-1:2
spline(x,y,method="monoH.FC")
Error in spline(x, y, method = "monoH.FC") : invalid interpolation method
What I have tried
?spline returns:
...
Usage:
...
spline(x, y = NULL, n = 3*length(x), method = "fmm",
xmin = min(x), xmax = max(x), xout, ties = mean)
...
Arguments:
method: specifies the type of spline to be used. Possible values are
‘"fmm"’, ‘"natural"’, ‘"periodic"’ and ‘"monoH.FC"’.
...
But the spline function itself indicates that the 'monoH.FC' method is not supported:
...
method <- pmatch(method, c("periodic", "natural", "fmm"))
if (is.na(method))
stop("invalid interpolation method")
...
Question
How can I use method = 'monoH.FC' with spline?
Use splinefun; it supports method=monoH.FC.
The last example in ?spline shows you how to do it.
## An example of monotone interpolation
n <- 20
set.seed(11)
x. <- sort(runif(n)) ; y. <- cumsum(abs(rnorm(n)))
plot(x.,y.)
curve(splinefun(x.,y.)(x), add=TRUE, col=2, n=1001)
curve(splinefun(x.,y., method="mono")(x), add=TRUE, col=3, n=1001)
legend("topleft", paste("splinefun( \"", c("fmm", "monoH.CS"), "\" )", sep=''),
col=2:3, lty=1)