Unbalanced factorial ANOVA in R - regression

I have a factor IV, pa_res$Category_Recorded, with length 121, and a continuous DV, Scaled_MO_sq$sq_score, with length 123. Since the sample sizes are unequal I run the code below, but it still gives me an error:
Anova(lm(Scaled_MO_sq$sq_score~pa_res$Category_Recorded), type="III")
Error in model.frame.default(formula = Scaled_MO_sq$sq_score ~ pa_res$Category_Recorded, :
variable lengths differ (found for 'pa_res$Category_Recorded')
Could you please help me set up the regression for these two variables?
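The underlying issue is that lm() builds its model frame from the two vectors directly, so they must have exactly the same length and row order; here one has 121 values and the other 123. A minimal sketch of the usual fix, assuming (hypothetically) that both data frames carry a common identifier column named id that links each score to its category:
library(car)                       # for Anova()
# Sketch: "id" is an assumed common key; keep only rows present in both data frames
merged <- merge(pa_res[, c("id", "Category_Recorded")],
                Scaled_MO_sq[, c("id", "sq_score")],
                by = "id")
fit <- lm(sq_score ~ Category_Recorded, data = merged)
Anova(fit, type = "III")           # type-III ANOVA for the unbalanced design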

Related

Calculating the DFT of a time signal in MATLAB

This code computes the DFT of a time-domain signal. Can anybody look at the code below and help me get the right answer?
My problem is this: when I change the value of N, for example to 4, 5, 10, or other values, X(1) changes with it. But I think X(1) should be the same for every value of N, just like in the plot below: the N value changes but the vertical value stays the same.
I would appreciate any help.
Thank you.
clear; clc;
%% Analytical
N = 4;                    % number of samples / DFT points
k = 0:N-1;
X = zeros(N,1);
t = k/N;                  % sample instants over one period
x = 5 + 2*cos(2*pi*t - pi/2) + 3*cos(4*pi*t)    % test signal (no semicolon, so it is displayed)
%x = abs((1-(0.012.*(pi.*52.*(t-0.3721)).^2)).*exp(-(pi.*52.*(t-0.3721).^2)));
abs(sum(x))               % equals |X(1)|, the DC bin
for k = 0:N-1             % DFT: X(k+1) = sum_n x(n+1)*exp(-1i*2*pi*n*k/N)
    for n = 0:N-1
        X(k+1) = X(k+1) + x(n+1)*exp(-1i*2*pi*n*k/N);
    end
end
k1 = 0:N-1;
stem(k1, abs(X))
% xlim([0 1])
% ylim([-1 1])
xlabel('Frequency');
ylabel('|X(k)|');
title('Frequency domain - Magnitude response')
Your definition of the DFT (which is probably the most common definition) does not have the property that X(1) remains constant as N changes. Instead, it is X(1)/N that remains constant. To use this DFT to get the magnitudes of the input at various frequencies, you'll need to divide the DFT output by N.
To verify this, you can call MATLAB's fft function and compare it with your results; you should get the same answer. Note that MATLAB's fft documentation says:
The resulting FFT amplitude is A*n/2, where A is the original amplitude and n is the number of FFT points.
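As a quick check of this scaling (a minimal sketch using the same test signal as above), the DC bin divided by N stays at 5, the constant term of the signal, whatever N is:
% Sketch: |X(1)|/N is independent of N for the signal above
for N = [4 5 10 32]
    t = (0:N-1)/N;
    x = 5 + 2*cos(2*pi*t - pi/2) + 3*cos(4*pi*t);
    X = fft(x);                                           % built-in DFT, same convention
    fprintf('N = %2d   |X(1)|/N = %g\n', N, abs(X(1))/N); % prints 5 every time
end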

Created factors with EFA, tried regressing (lm) with control variables - Error message "variable lengths differ"

EFA first-timer here!
I ran an Exploratory Factor Analysis (EFA) on a data set ("df1" = 1320 observations) with 50 variables, by creating a subset containing only the relevant variables with no missing values ("df2" = 301 observations).
I was able to extract 4 factors (19 variables in total).
Now I would like to take those 4 factors and regress them on control variables.
For instance: Factor 1 (df2$fa1) describes job satisfaction.
I would like to control for age and marital status.
Fa1Regression <- lm(df2$fa1 ~ df1$age + df1$marital)
However, I receive the error message:
Error in model.frame.default(formula = df2$fa1 ~ df1$age + :
variable lengths differ (found for 'df1$age')
What can I do to run the regression correctly? Can I delete the observations from df1 that do not exist in df2, so that the variable lengths are the same?
It's having a problem using lm to regress a latent factor on other covariates. Instead, use the lavaan package, where your model statement would be myModel <- 'df2$fa1 ~ x1 + x2 + x3'
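If the goal is simply to make the lengths match, another option (a sketch only, assuming df2 was built as a subset of df1 and kept df1's row names, which the question does not confirm) is to pull the control variables from the corresponding rows of df1, so that everything comes from one 301-row data frame:
# Sketch: row names of df2 are assumed to identify the matching rows of df1
df2$age     <- df1[rownames(df2), "age"]
df2$marital <- df1[rownames(df2), "marital"]
Fa1Regression <- lm(fa1 ~ age + marital, data = df2)
summary(Fa1Regression)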

Scilab fsolve returning incorrect argument (error 98)

I am trying to run fsolve to solve a dissociation equation, but I keep getting this error. I tracked it down to the x(1)^(1/2) term (removing the square root yields no error), but I can't find a way to solve the proper equation I need. The code is below.
deltaG = -135643;   // value given in the EDIT below
Ru = 8.315;         // gas constant, also given below
T = 2000;
Kp = exp(-deltaG/(Ru*T))

function [f]=func(x)
    f(1) = 2-x(1)*4 / (3*x(1) - 1)*(x(1))^(1/2) - Kp
endfunction

x0 = [1];
[x, f_x] = fsolve(x0, func)
EDIT: More requested info
The error is
!--error 98 variable returned by scilab argument function is
incorrect
Ru is the gas constant, 8.315.
DeltaG is -135643.
Kp is 3.489e-3.
This is a book example, x should yield 0.3334.
What sort of solved this problem was updating Scilab from version 5.5 to 6.0.1. The remaining problem is that, depending on the initial guess x0, the values of x become completely absurd, and x0 has to be so close to the real answer that it defeats the purpose of the calculation.
Also, I don't have access to Maple; my other alternative would be MATLAB.
Using symbolic calculus software like Xcas or Maple, one can solve your equation symbolically.
There are 3 solutions:
s1=((1/4)*t1+(1/4)*(Kp-2)^2/t1-(1/4)*Kp+1/2)^2
s2=(-(1/8)*t1-(1/8)*(Kp-2)^2/t1-(1/4)*Kp+1/2+(1/2*%i)*sqrt(3)*((1/4)*t1-(1/4)*(Kp-2)^2/t1))^2
s3=(-(1/8)*t1-(1/8)*(Kp-2)^2/t1-(1/4)*Kp+1/2-(1/2*%i)*sqrt(3)*((1/4)*t1-(1/4)*(Kp-2)^2/t1))^2
where
t=sqrt(-Kp*(Kp-4)*(Kp-2)^2)
t1=(-(Kp-2)*(Kp-2*sqrt(2))*(Kp+2*sqrt(2))+4*t)^(1/3);
Depending on the value of Kp, some of the solutions can be complex.
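A minimal sketch of how these expressions could be evaluated numerically in Scilab, using the Kp stated above (which of the roots, if any, reproduces the book value 0.3334 would still have to be checked):
// Sketch: evaluate the closed-form roots above for the stated Kp
Kp = 3.489e-3;
t  = sqrt(-Kp*(Kp-4)*(Kp-2)^2);
t1 = (-(Kp-2)*(Kp-2*sqrt(2))*(Kp+2*sqrt(2)) + 4*t)^(1/3);
s1 = ((1/4)*t1 + (1/4)*(Kp-2)^2/t1 - (1/4)*Kp + 1/2)^2;
s2 = (-(1/8)*t1 - (1/8)*(Kp-2)^2/t1 - (1/4)*Kp + 1/2 + (1/2*%i)*sqrt(3)*((1/4)*t1 - (1/4)*(Kp-2)^2/t1))^2;
s3 = (-(1/8)*t1 - (1/8)*(Kp-2)^2/t1 - (1/4)*Kp + 1/2 - (1/2*%i)*sqrt(3)*((1/4)*t1 - (1/4)*(Kp-2)^2/t1))^2;
disp([s1; s2; s3])   // some of these values may come out complex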

Octave FWHM calculation

I am having a problem calculating the FWHM of my data, because the fwhm function in the signal package gives a value 100 times bigger than I expected.
What I did: using the Gaussian distribution function (you can find it on Wikipedia), I produced some data. In this function you can specify a sigma (RMS) value (FWHM = sigma*2.355). Here is the script I wrote to understand the situation:
x=10:0.01:40;
x0=25;
sigma=0.25;
y=(1/(sigma*sqrt(2*pi)))*exp(-((x-x0).^2)/(2*sigma^2));
z=fwhm(y)/2.355;
plot(x,y)
When I compared the results, the output of the fwhm function (24.999) was 100 times bigger than the sigma (0.25) I used in the function.
Any ideas would be very helpful.
Thanks in advance.
Your z is 100 times bigger because your step in x is 1/100 (0.01). If you call fwhm(y), the step size in x is assumed to be 1; if that is not the case, you have to pass x as well.
In your case you should do:
z=fwhm(x, y)/2.355
z = 0.24999
which matches your sigma
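Put together, the corrected script would look roughly like this (a sketch, assuming the signal package is installed; equivalently, fwhm(y)*0.01 would also work here because the x-step is 0.01):
pkg load signal                                   % provides fwhm()
x = 10:0.01:40;
x0 = 25;
sigma = 0.25;
y = (1/(sigma*sqrt(2*pi)))*exp(-((x-x0).^2)/(2*sigma^2));
z = fwhm(x, y)/2.355                              % ~0.25, recovering sigma
plot(x, y)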

Interpreting libsvm epsilon-SVR result

I tried to train and cross-validate a data set with 8616 samples using epsilon-SVR.
Of these, I take 4368 for testing and 4248 for cross-validation.
Kernel type = RBF kernel. libsvm gives the result shown below.
optimization finished, #iter = 502363
nu = 0.689607
obj = -6383530527604706.000000, rho = 2884789.960212
nSV = 3023, nBSV = 3004
This is a result gotten by setting
-s 3 -t 2 -c 2^28 -g 2^-13 -p 2^12
(a) What does "nu" mean? Sometimes I get nu = 0.99xx for different parameters.
(b) "obj" seems surprisingly large. Does that sound correct? The libsvm FAQ says this is the "optimal objective value of the dual SVM problem". Does that mean it is the minimum value of f(alpha)?
(c) "rho" is large too. This is the bias term, b. The dataset labels (y) range from 82672 to 286026, so I guess this is reasonable. Am I right?
For training set,
Mean squared error = 1.26991e+008 (regression)
Squared correlation coefficient = 0.881112 (regression)
For cross-validation set,
Mean squared error = 1.38909e+008 (regression)
Squared correlation coefficient = 0.883144 (regression)
Using the selected parameters, I produced the result below:
kernel_type=2 (best c:2^28=2.68435e+008, g:2^-13=0.00012207, e:2^12=4096)
NRMS: 0.345139, best_gap:0.00199433
Mean Absolute Percent Error (MAPE): 5.39%
Mean Absolute Error (MAE): 8956.12 MWh
Daily Peak MAPE: 5.30%
The CV set MAPE is low (5.39%). Using a bias-variance test, the difference between the train-set MAPE and the CV-set MAPE is only 0.00199433, which suggests the parameters are set correctly. But I wonder whether the extremely large "obj" and "rho" values are correct...
I am very new to SVR, so please correct me if my interpretation or validation method is incorrect or insufficient.
Method to calculate MAPE
train_model = svmtrain(train_label, train_data, cmd);
[result_label, train_accuracy, train_dec_values] = svmpredict(train_label, train_data, train_model);
train_err = train_label - result_label;                    % residuals
train_errpct = abs(train_err)./train_label*100;            % absolute percentage errors
train_MAPE = mean(train_errpct(~isinf(train_errpct)));     % ignore divisions by zero labels
The objective and rho values are high because (most probably) the data were not scaled. Scaling is highly recommended to avoid overflow; the overflow risk also depends on the type of kernel. Btw, when scaling the training data, do not forget to also scale the test data, which is most easily accomplished by scaling all data first, and then splitting them into a training and test set.
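As an illustration of that last point (a sketch only; all_data is a placeholder name for the full 8616-sample feature matrix, and libsvm's svm-scale tool does the same job), column-wise min-max scaling applied before the split might look like:
% Sketch: scale every feature column to [0, 1] on the full data set, then split
col_min = min(all_data, [], 1);
col_max = max(all_data, [], 1);
all_scaled = bsxfun(@rdivide, bsxfun(@minus, all_data, col_min), col_max - col_min);
% (columns with zero range would need special handling)
test_data = all_scaled(1:4368, :);        % 4368 test samples, as above
cv_data   = all_scaled(4369:end, :);      % remaining 4248 CV samples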