Error in model.frame.default : variable lengths differ (found for '(weights)') - regression

When I try to do WLS to correct for heteroskedasticity, I get the error shown below.
Do you have any suggestions? I noticed that my regression has a list length of 13, while the vector W has 14. I am not sure how to correct for this.
W <- 1 / lm(abs(R40$residuals) ~ R40$fitted.values)$fitted.values^2
R40w <- lm(Market_Cap ~ MarketCap_Lag + Value_traded_ratio + GDP_growthR, data=pdata, weights = W)
summary(R40w)
Error in model.frame.default(formula = Market_Cap ~ MarketCap_Lag + Value_traded_ratio + :
variable lengths differ (found for '(weights)')
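The error means that model.frame() received a weights vector whose length differs from the other variables in the formula: W must have exactly one entry per row of the data that R40w is fitted on. A mismatch like 13 vs 14 typically means that the auxiliary fit behind W and the weighted fit used different rows, for example because rows with missing values were dropped in one fit but not in the other. The sketch below reproduces the same two-step weighting in Python/NumPy (not R) under that assumption: incomplete rows are removed once, up front, so both stages and W share the same rows. The data and the helper names ols and fgls_weights are hypothetical.

```python
import numpy as np

def ols(X, y, w=None):
    """Least squares with an intercept; optional weights w (one per row)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    if w is None:
        w = np.ones(len(y))
    if len(w) != len(y):
        # Analogous to R's "variable lengths differ (found for '(weights)')".
        raise ValueError("variable lengths differ (found for weights)")
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X1 * sw[:, None], y * sw, rcond=None)
    return beta, X1 @ beta

def fgls_weights(X, y):
    """W = 1 / fitted(|resid| ~ fitted)^2, computed on exactly the rows passed in."""
    _, fitted = ols(X, y)
    _, sd_hat = ols(fitted, np.abs(y - fitted))
    return 1.0 / sd_hat**2

# Hypothetical data: a lagged regressor leaves the first row incomplete.
rng = np.random.default_rng(0)
x = np.linspace(10, 60, 51)
x_lag = np.r_[np.nan, x[:-1]]
y = 3 + 2 * x_lag + rng.normal(scale=0.05 * x)

# Drop incomplete rows once; both stages then see the same rows.
keep = ~np.isnan(x_lag) & ~np.isnan(y)
Xc, yc = x_lag[keep], y[keep]
W = fgls_weights(Xc, yc)
beta_wls, _ = ols(Xc, yc, W)
```

If missing values are the cause here, the corresponding R step would presumably be to drop incomplete rows from the data (for example with na.omit()) before fitting both R40 and R40w.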


Unexpected output of emmeans averaged across variables

I transformed a variable (e.g. leaf_area) using a simple square transformation and then fitted the following model containing an interaction:
fit <- lmer(leaf_area^2 ~genotype*soil_type + date_measurement + light + (1|repetition) + (1|y_position) + (1|x_position), data = dataset)
To obtain the emmeans averaged across genotypes and soil types for each measurement date, I then use the following command:
fit.emm <- emmeans(fit, ~ genotype*soil_type + date_measurement, type = "response")
However, the emmeans are averaged over the variable date_measurement.
As the following output shows, the emmeans for genotypes x, y and z in soil MT are all given at date_measurement = 27.4 (the mean of the dates), but the measurements actually occurred at 21, 23, 28, 30 and 35 das.
genotype soil_type date_measurement emmean     SE    df lower.CL upper.CL
x        MT                    27.4  0.190 0.0174 126.0    0.155    0.224
y        MT                    27.4  0.220 0.0147  74.1    0.191    0.250
z        MT                    27.4  0.210 0.0157 108.6    0.179    0.241
When I fit the model without the interaction between genotype and soil type and run emmeans, the results are still averaged over the measurement dates.
fit <- lmer(leaf_area^2 ~genotype + soil_type + date_measurement + light + (1|repetition) + (1|y_position) + (1|x_position), data = dataset)
fit.emm <- emmeans(fit, ~ genotype + soil_type + date_measurement, type = "response")
My question is: how can I obtain the emmeans averaged across genotype and soil type, but separately for each measurement date?
Class of variables:
date_measurement, light, x_position, y_position: numeric
genotype and soil_type: factor
Thank you in advance.
When you have a numerical predictor in the model, the default is to obtain predictions at the average value of that covariate. If you want the covariates treated like factors, you have to say so:
fit.emm <- emmeans(fit, ~ genotype*soil_type + date_measurement,
cov.reduce = FALSE)
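To make the averaging behaviour concrete, here is a small Python sketch of a reference grid. It is not emmeans itself; the coefficients and the linear-in-date model are made up for illustration. By default the numeric covariate is collapsed to its mean, so each genotype gets a single row at that mean; keeping every distinct date, which is what cov.reduce = FALSE does, gives one row per genotype and date, and those can then be averaged over genotype for each date.

```python
import itertools
from statistics import mean

# Hypothetical coefficients for a linear model  y ~ genotype + date  (illustration only).
intercept = 0.05
genotype_effect = {"x": 0.00, "y": 0.03, "z": 0.02}
date_slope = 0.005
dates = [21, 23, 28, 30, 35]  # distinct measurement dates (das)

def predict(g, d):
    return intercept + genotype_effect[g] + date_slope * d

# Default: the covariate is reduced to its mean (27.4 here), one row per genotype.
grid_mean = {g: predict(g, mean(dates)) for g in genotype_effect}

# cov.reduce = FALSE analogue: every distinct date stays in the grid.
grid_full = {(g, d): predict(g, d) for g, d in itertools.product(genotype_effect, dates)}

# Marginal means per date: average the predictions over genotype.
by_date = {d: mean(grid_full[(g, d)] for g in genotype_effect) for d in dates}
```

For a model that is linear in date_measurement, averaging the per-date means over the dates reproduces the at-the-mean result on the model scale, which is why the default output looks like an average over dates.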
In addition, emmeans cannot auto-detect your square transformation. You can fix it up by doing
fit.emm <- update(fit.emm, tran = make.tran("power", 2),
type = "response")
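A rough Python illustration of what the back-transformation involves for a power-2 transformation. It assumes (as I understand emmeans' type = "response" behaviour) that estimates and confidence limits are passed through the inverse transformation, here sqrt, and that the SE is approximated by the delta method. back_transform_square is a hypothetical helper, and the numbers are simply the "x MT" row of the question's output treated as values on the leaf_area^2 scale.

```python
import math

def back_transform_square(estimate, se, lower, upper):
    """Map a result on the leaf_area^2 scale back to the leaf_area scale."""
    response = math.sqrt(estimate)  # inverse of y -> y**2
    # Delta method: d/d(eta) sqrt(eta) = 1 / (2 * sqrt(eta))
    se_response = se / (2 * response)
    # Interval endpoints are back-transformed directly (sqrt is monotone).
    return response, se_response, math.sqrt(lower), math.sqrt(upper)

# Row "x MT" from the question's output, treated here as being on the squared scale.
resp, se_resp, lo, hi = back_transform_square(0.190, 0.0174, 0.155, 0.224)
```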
Then I think you will want to obtain marginal means by averaging over date_measurement at least, i.e.,
fit.emm2 <- emmeans(fit.emm, ~ genotype*soil_type)
It will retain the transformation and type = "response" setting.

invalid value encountered in true_divide in rk45

I'm trying to implement RK45 for a two-body problem with the Earth and the Sun, but I keep getting a division by zero that I don't understand. The division seems to happen at norme in the accelerations function, but I don't see how that can be or how to fix it. Here is the code:
from scipy import optimize
from numpy import linalg as LA
import matplotlib.pyplot as plt
from scipy.optimize import fsolve
import numpy as np
AU=1.5e11
a=AU
e=0.5
mss=2E30
ms = 2E30
me = 5.98E24
mv=4.867E24
yr=3.15e7
h=100
mu1=ms*me/(ms+me)
mu2=ms*me/(ms+me)
G=6.67E11
step=24
vi=np.sqrt(G*ms*(2/(a*(1-e))-1/a))
#sun=sphere(pos=vec(0,0,0),radius=0.1*AU,color=color.yellow)
#earth=sphere(pos=vec(1*AU,0,0),radius=0.1*AU)
sunpos=np.array([-903482.12391302, -6896293.6960525, 0. ])
earthpos=np.array([a*(1-e),0,0])
earthv=np.array([0,vi,0])
sunv=np.array([0,0,0])
def accelerations(t,earthposs, sunposs):
    norme=sum( (earthposs-sunposs)**2 )**0.5
    gravit = G*(earthposs-sunposs)/norme**3
    sunaa = me*gravit
    earthaa = -ms*gravit
    return earthaa, sunaa
def ode45(f,t,y,h):
    """Calculate next step of an initial value problem (IVP) of an ODE with a RHS described
    by the RHS function with an order 4 approx. and an order 5 approx.
    Parameters:
        t: float. Current time.
        y: float. Current step (position).
        h: float. Step-length.
    Returns:
        q: float. Order 2 approx.
        w: float. Order 3 approx.
    """
    s1 = f(t, y[0],y[1])
    s2 = f(t + h/4.0, y[0] + h*s1[0]/4.0,y[1] + h*s1[1]/4.0)
    s3 = f(t + 3.0*h/8.0, y[0] + 3.0*h*s1[0]/32.0 + 9.0*h*s2[0]/32.0,y[1] + 3.0*h*s1[1]/32.0 + 9.0*h*s2[1]/32.0)
    s4 = f(t + 12.0*h/13.0, y[0] + 1932.0*h*s1[0]/2197.0 - 7200.0*h*s2[0]/2197.0 + 7296.0*h*s3[0]/2197.0,y[1] + 1932.0*h*s1[1]/2197.0 - 7200.0*h*s2[1]/2197.0 + 7296.0*h*s3[1]/2197.0)
    s5 = f(t + h, y[0] + 439.0*h*s1[0]/216.0 - 8.0*h*s2[0] + 3680.0*h*s3[0]/513.0 - 845.0*h*s4[0]/4104.0,y[1] + 439.0*h*s1[1]/216.0 - 8.0*h*s2[1] + 3680.0*h*s3[1]/513.0 - 845.0*h*s4[1]/4104.0)
    s6 = f(t + h/2.0, y[0] - 8.0*h*s1[0]/27.0 + 2*h*s2[0] - 3544.0*h*s3[0]/2565 + 1859.0*h*s4[0]/4104.0 - 11.0*h*s5[0]/40.0,y[1] - 8.0*h*s1[1]/27.0 + 2*h*s2[1] - 3544.0*h*s3[1]/2565 + 1859.0*h*s4[1]/4104.0 - 11.0*h*s5[1]/40.0)
    w1 = y[0] + h*(25.0*s1[0]/216.0 + 1408.0*s3[0]/2565.0 + 2197.0*s4[0]/4104.0 - s5[0]/5.0)
    w2 = y[1] + h*(25.0*s1[1]/216.0 + 1408.0*s3[1]/2565.0 + 2197.0*s4[1]/4104.0 - s5[1]/5.0)
    q1 = y[0] + h*(16.0*s1[0]/135.0 + 6656.0*s3[0]/12825.0 + 28561.0*s4[0]/56430.0 - 9.0*s5[0]/50.0 + 2.0*s6[0]/55.0)
    q2 = y[1] + h*(16.0*s1[1]/135.0 + 6656.0*s3[1]/12825.0 + 28561.0*s4[1]/56430.0 - 9.0*s5[1]/50.0 + 2.0*s6[1]/55.0)
    return w1,w2, q1,q2
t=0
T=10**5
xarray=[]
yarray=[]
while t<T:
    ode45(accelerations,t,[earthpos,sunpos],h)
    earthpos=ode45(accelerations,t,[earthpos,sunpos],h)[1]
    sunpos=ode45(accelerations,t,[earthpos,sunpos],h)[3]
    xarray.append(ode45(accelerations,t,[earthpos,sunpos],h)[0][0])
    yarray.append(ode45(accelerations,t,[earthpos,sunpos],h)[0][1])
    print(ode45(accelerations,t,[earthpos,sunpos],h)[0][0],ode45(accelerations,t,[earthpos,sunpos],h)[0][1])
    t=t+h
plt.plot(xarray,yarray)
plt.savefig('orbit.png')
plt.show()
After the second iteration the code comes back with only nan values for the earthpos.
Numerical integration methods usually integrate first-order systems, y'=f(t,y). You want to integrate a second-order ODE system y''=f(t,y), which you first need to turn into a first-order system, e.g. by putting positions and velocities into one state vector so that x'=v and v'=a(x).
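A minimal sketch of that conversion in Python/NumPy, reusing the Fehlberg coefficients from the question: the state is one 12-element array (both positions, then both velocities), rhs returns its time derivative, and each stage is evaluated once on the whole state. Assumptions: the SI value G = 6.674e-11 (the question's G=6.67E11 looks like a sign slip in the exponent), a circular test orbit, and a fixed step size; the adaptive part of RK45 (choosing h from the difference between the order-4 and order-5 results) is left out. rhs and rkf45_step are illustrative names.

```python
import numpy as np

G = 6.674e-11  # m^3 kg^-1 s^-2 (note the negative exponent)
M_SUN = 2e30
M_EARTH = 5.98e24

def rhs(t, state):
    """First-order two-body system.

    state = [r_earth (3), r_sun (3), v_earth (3), v_sun (3)]; returns d(state)/dt.
    """
    r_e, r_s, v_e, v_s = state.reshape(4, 3)
    d = r_e - r_s
    dist3 = np.linalg.norm(d) ** 3
    a_e = -G * M_SUN * d / dist3   # Earth is pulled toward the Sun
    a_s = G * M_EARTH * d / dist3  # Sun is pulled toward the Earth
    return np.concatenate([v_e, v_s, a_e, a_s])

def rkf45_step(f, t, y, h):
    """One fixed-size Runge-Kutta-Fehlberg step; returns (order-4, order-5) estimates."""
    k1 = f(t, y)
    k2 = f(t + h / 4, y + h * k1 / 4)
    k3 = f(t + 3 * h / 8, y + h * (3 * k1 + 9 * k2) / 32)
    k4 = f(t + 12 * h / 13, y + h * (1932 * k1 - 7200 * k2 + 7296 * k3) / 2197)
    k5 = f(t + h, y + h * (439 * k1 / 216 - 8 * k2 + 3680 * k3 / 513 - 845 * k4 / 4104))
    k6 = f(t + h / 2, y + h * (-8 * k1 / 27 + 2 * k2 - 3544 * k3 / 2565
                               + 1859 * k4 / 4104 - 11 * k5 / 40))
    y4 = y + h * (25 * k1 / 216 + 1408 * k3 / 2565 + 2197 * k4 / 4104 - k5 / 5)
    y5 = y + h * (16 * k1 / 135 + 6656 * k3 / 12825 + 28561 * k4 / 56430
                  - 9 * k5 / 50 + 2 * k6 / 55)
    return y4, y5

# Circular relative orbit, one-day steps for 365 days.
r0 = 1.5e11
v0 = np.sqrt(G * (M_SUN + M_EARTH) / r0)
y = np.array([r0, 0, 0, 0, 0, 0, 0, v0, 0, 0, 0, 0], dtype=float)
t, h = 0.0, 86400.0
xs, ys = [], []
for _ in range(365):
    _, y = rkf45_step(rhs, t, y, h)  # keep the order-5 estimate
    t += h
    xs.append(y[0])
    ys.append(y[1])
```

Every rhs call sees the current positions of both bodies, and the step result is unpacked once per iteration, so nothing is computed twice and positions never receive an acceleration directly.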
Why do you not use the vector class of numpy?
Why do you perform the same computation with the same arguments multiple times instead of catching all return values once and then distributing them to the lists?
You could also use scipy.integrate.solve_ivp with the "RK45" method instead of programming it yourself.
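A sketch of the solve_ivp route, assuming SciPy is installed. It uses the same kind of 12-element first-order state, the question's a=1.5e11 and e=0.5, and the vis-viva speed at perihelion computed with both masses (a small deviation from the question's vi). The tolerances are chosen for illustration rather than tuned, and G is the SI value 6.674e-11.

```python
import numpy as np
from scipy.integrate import solve_ivp

G = 6.674e-11
M_SUN, M_EARTH = 2e30, 5.98e24

def rhs(t, s):
    """s = [r_earth, r_sun, v_earth, v_sun] (3 components each)."""
    r_e, r_s, v_e, v_s = s[0:3], s[3:6], s[6:9], s[9:12]
    d = r_e - r_s
    k = G / np.linalg.norm(d) ** 3
    return np.concatenate([v_e, v_s, -k * M_SUN * d, k * M_EARTH * d])

# Start at perihelion of an orbit with the question's a and e.
a, e = 1.5e11, 0.5
r_peri = a * (1 - e)
v_peri = np.sqrt(G * (M_SUN + M_EARTH) * (2 / r_peri - 1 / a))  # vis-viva, relative speed
y0 = np.array([r_peri, 0, 0, 0, 0, 0, 0, v_peri, 0, 0, 0, 0], dtype=float)

year = 3.15e7
t_eval = np.linspace(0.0, year, 2001)
sol = solve_ivp(rhs, (0.0, year), y0, method="RK45", t_eval=t_eval, rtol=1e-9, atol=1.0)

# Earth position relative to the Sun, e.g. for plt.plot(x_rel, y_rel).
x_rel = sol.y[0] - sol.y[3]
y_rel = sol.y[1] - sol.y[4]
```

solve_ivp takes care of the step-size control that the hand-written loop lacks.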

displayData function in ex3 of the Coursera Machine Learning course

I am facing an issue with my script. It looks like a missing end or bracket, but I have checked and nothing is missing.
function [h, display_array] = displayData(X, example_width)
%DISPLAYDATA Display 2D data in a nice grid
% [h, display_array] = DISPLAYDATA(X, example_width) displays 2D data
% stored in X in a nice grid. It returns the figure handle h and the
% displayed array if requested.
% Set example_width automatically if not passed in
if ~exist('example_width', 'var') || isempty(example_width)
    example_width = round(sqrt(size(X, 2)));
end
% Gray Image
colormap(gray);
% Compute rows, cols
[m n] = size(X);
example_height = (n / example_width);
% Compute number of items to display
display_rows = floor(sqrt(m));
display_cols = ceil(m / display_rows);
% Between images padding
pad = 1;
% Setup blank display
display_array = - ones(pad + display_rows * (example_height + pad), ...
pad + display_cols * (example_width + pad));
% Copy each example into a patch on the display array
curr_ex = 1;
for j = 1:display_rows
    for i = 1:display_cols
        if curr_ex > m,
            break;
        end
        % Copy the patch
        % Get the max value of the patch
        max_val = max(abs(X(curr_ex, :)));
        display_array(pad + (j - 1) * (example_height + pad) +
        (1:example_height), ...
        pad + (i - 1) * (example_width + pad) +
        (1:example_width)) = ...
        reshape(X(curr_ex, :),
        example_height, example_width) / max_val;
        curr_ex = curr_ex + 1;
    end
    if curr_ex > m,
        break;
    end
end
% Display Image
h = imagesc(display_array, [-1 1]);
% Do not show axis
axis image off
drawnow;
end
ERROR:
displayData
parse error near line 86 of file C:\Users\ALI\displayData.m
syntax error
Please help me find the error in the script. This script was provided by Coursera, so it should be error-free.
You seem to have modified the code and moved the "ellipsis" operator (i.e. ...), or the line that is supposed to follow it, in several places compared to the original code on Coursera.
The ellipsis operator must appear at the end of a line, to indicate that the next line is a continuation of the current one, so moving either the ellipsis or the line below it will break the code.
E.g.
a = 1 + ... % correct use of ellipsis, code continues below
2 % treated as one line, i.e. a = 1 + 2
vs
a = 1 + % without ellipsis, the line is complete, and has an error
... 2 % bad use of ellipsis; also anything to the right of '...' is ignored
vs
a = 1 + ... % ellipsis used properly so far
% but the empty line here makes the whole 'line' `a = 1 +` which is wrong
2 % This is a new instruction
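For reference, the tiling that displayData performs once the continuation lines are restored can be sketched in Python/NumPy. This is a hypothetical port for illustration only (display_data_array is not a Coursera or NumPy function); it builds the array without plotting, assumes n is divisible by example_width, and writes each index expression on a single line so no continuation is needed.

```python
import math
import numpy as np

def display_data_array(X, example_width=None, pad=1):
    """Tile the rows of X into a padded grid, as displayData builds display_array."""
    m, n = X.shape
    if example_width is None:
        # MATLAB's round() rounds halves away from zero, unlike Python's round().
        example_width = int(math.floor(math.sqrt(n) + 0.5))
    example_height = n // example_width
    display_rows = math.floor(math.sqrt(m))
    display_cols = math.ceil(m / display_rows)
    # Background value -1, as in the MATLAB code.
    display_array = -np.ones((pad + display_rows * (example_height + pad),
                              pad + display_cols * (example_width + pad)))
    curr_ex = 0
    for j in range(display_rows):
        for i in range(display_cols):
            if curr_ex >= m:
                break
            max_val = np.max(np.abs(X[curr_ex]))
            r0 = pad + j * (example_height + pad)  # 0-based top row of this patch
            c0 = pad + i * (example_width + pad)   # 0-based left column of this patch
            # order="F" mimics MATLAB's column-major reshape.
            patch = X[curr_ex].reshape(example_height, example_width, order="F")
            display_array[r0:r0 + example_height, c0:c0 + example_width] = patch / max_val
            curr_ex += 1
    return display_array

A = display_data_array(np.arange(1, 21, dtype=float).reshape(5, 4))
```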

Recall from nltk.metrics.scores returning None

I'm trying to calculate precision and recall using nltk.metrics.scores (http://www.nltk.org/_modules/nltk/metrics/scores.html) with my NLTK NaiveBayesClassifier.
However, I run into the error:
"unsupported operand type(s) for +: 'int' and 'NoneType'".
I suspect it comes from my 10-fold cross-validation: in some reference sets there are zero negatives (the dataset is somewhat imbalanced; 87% of it is positive).
According to nltk.metrics.scores,
def precision(reference, test):
    """Given a set of reference values and a set of test values, return
    the fraction of test values that appear in the reference set.
    In particular, return card(``reference`` intersection
    ``test``)/card(``test``).
    If ``test`` is empty, then return None."""
It seems that in some of my 10 folds recall is returned as None, since there are no Negative labels in the reference set. Any idea how to approach this problem?
My full code is as follows:
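import collections
import nltk.classify.util
from nltk.classify import NaiveBayesClassifier
from nltk.metrics.scores import precision, recall, f_measure
# negfeats and posfeats: labelled feature sets built earlier (not shown)
trainfeats = negfeats + posfeats
n = 10 # 10-fold cross-validation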
subset_size = len(trainfeats) // n
accuracy = []
pos_precision = []
pos_recall = []
neg_precision = []
neg_recall = []
pos_fmeasure = []
neg_fmeasure = []
cv_count = 1
for i in range(n):
    testing_this_round = trainfeats[i*subset_size:][:subset_size]
    training_this_round = trainfeats[:i*subset_size] + trainfeats[(i+1)*subset_size:]
    classifier = NaiveBayesClassifier.train(training_this_round)
    refsets = collections.defaultdict(set)
    testsets = collections.defaultdict(set)
    for i, (feats, label) in enumerate(testing_this_round):
        refsets[label].add(i)
        observed = classifier.classify(feats)
        testsets[observed].add(i)
    cv_accuracy = nltk.classify.util.accuracy(classifier, testing_this_round)
    cv_pos_precision = precision(refsets['Positive'], testsets['Positive'])
    cv_pos_recall = recall(refsets['Positive'], testsets['Positive'])
    cv_pos_fmeasure = f_measure(refsets['Positive'], testsets['Positive'])
    cv_neg_precision = precision(refsets['Negative'], testsets['Negative'])
    cv_neg_recall = recall(refsets['Negative'], testsets['Negative'])
    cv_neg_fmeasure = f_measure(refsets['Negative'], testsets['Negative'])
    accuracy.append(cv_accuracy)
    pos_precision.append(cv_pos_precision)
    pos_recall.append(cv_pos_recall)
    neg_precision.append(cv_neg_precision)
    neg_recall.append(cv_neg_recall)
    pos_fmeasure.append(cv_pos_fmeasure)
    neg_fmeasure.append(cv_neg_fmeasure)
    cv_count += 1
print('---------------------------------------')
print('N-FOLD CROSS VALIDATION RESULT ' + '(' + 'Naive Bayes' + ')')
print('---------------------------------------')
print('accuracy:', sum(accuracy) / n)
print('precision', (sum(pos_precision)/n + sum(neg_precision)/n) / 2)
print('recall', (sum(pos_recall)/n + sum(neg_recall)/n) / 2)
print('f-measure', (sum(pos_fmeasure)/n + sum(neg_fmeasure)/n) / 2)
print('')
Perhaps not the most elegant, but I guess the simplest fix is to default to 0 and use the actual value only when it is not None, e.g.:
cv_pos_precision = 0
if precision(refsets['Positive'], testsets['Positive']):
    cv_pos_precision = precision(refsets['Positive'], testsets['Positive'])
And for the others as well, of course.
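To see where the None comes from, here is a pure-Python sketch that re-implements the two quoted definitions on plain sets (precision and recall here are local stand-ins, not the nltk functions) and compares the fill-with-0 approach with averaging only over the folds where a score is defined. The fold data and the helpers or_zero and mean_defined are hypothetical.

```python
def precision(reference, test):
    """card(reference & test) / card(test), or None if test is empty."""
    if not test:
        return None
    return len(reference & test) / len(test)

def recall(reference, test):
    """card(reference & test) / card(reference), or None if reference is empty."""
    if not reference:
        return None
    return len(reference & test) / len(reference)

def or_zero(score):
    """The answer's approach: treat an undefined score as 0."""
    return 0.0 if score is None else score

def mean_defined(scores):
    """Alternative: average only the folds where the score is defined."""
    defined = [s for s in scores if s is not None]
    return sum(defined) / len(defined) if defined else None

# Hypothetical (reference, test) index sets for the 'Negative' label in two folds;
# the second fold has no Negative items in its reference set.
folds = [({0, 1}, {0, 1, 2}), (set(), {3})]
neg_recall = [recall(ref, test) for ref, test in folds]   # [1.0, None]
neg_recall_zero = sum(or_zero(s) for s in neg_recall) / len(neg_recall)
neg_recall_defined = mean_defined(neg_recall)
```

Filling with 0 pulls the average down for folds where the score is simply undefined; which behaviour is appropriate depends on what you want the averaged metric to mean.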

What's the correct way to expand a [0,1] interval to [a,b]?

Many random-number generators return floating numbers between 0 and 1.
What's the best and correct way to get integers between a and b?
Divide the interval [0,1] into B-A+1 bins.
Example: A=2, B=5
[----+----+----+----]
0   1/4  1/2  3/4   1
Maps to:
  2    3    4    5
The problem with the formula
Int (Rnd() * (B-A+1)) + A
is that your Rnd() generation interval is closed on both sides, so both 0 and 1 are possible outputs, and the formula gives 6 (i.e. B+1) when Rnd() returns exactly 1.
In a real random distribution (not pseudo), the 1 has probability zero. I think it is safe enough to program something like:
r=Rnd()
if r equal 1
    MyInt = B
else
    MyInt = Int(r * (B-A+1)) + A
endif
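The same guard written in Python, as a sketch. Python's random.random() returns values in [0.0, 1.0), so with it the r == 1 branch never fires; the rnd parameter (a name introduced here, as is rand_int_inclusive) lets you plug in a generator that can return exactly 1.

```python
import random

def rand_int_inclusive(a, b, rnd=random.random):
    """Map r = rnd() in [0, 1] to an integer in [a, b]; r == 1 is folded into b."""
    r = rnd()
    if r == 1.0:
        return b
    # B-A+1 equal-width bins; int() truncates, and r * width >= 0 here.
    return int(r * (b - a + 1)) + a
```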
Edit
Just a quick test in Mathematica:
Define our function:
f[a_, b_] := If[(r = RandomReal[]) == 1, b, IntegerPart[r (b - a + 1)] + a]
Build a table of 3*10^5 numbers in [1,100]:
table = SortBy[Tally[Table[f[1, 100], {300000}]], First]
Check minimum and maximum:
In[137]:= {Max[First /@ table], Min[First /@ table]}
Out[137]= {100, 1}
Let's see the distribution:
BarChart[Last /@ SortBy[Tally[Table[f[1, 100], {300000}]], First],
ChartStyle -> "DarkRainbow"]
X = (Rand() * (B - A)) + A
Another way to look at it, where r is your random number in the range 0 to 1:
(1-r)a + rb
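Writing the convex combination out shows that it is the same formula as X = (Rand() * (B - A)) + A above and that it stays inside [a, b]; the integer version is the bin mapping from the first answer, which needs r < 1:

$$(1-r)\,a + r\,b = a + r\,(b-a), \qquad 0 \le r \le 1,\ a \le b \;\Longrightarrow\; a \le a + r\,(b-a) \le b$$

$$0 \le r < 1 \;\Longrightarrow\; \lfloor r\,(b-a+1) \rfloor + a \in \{a, a+1, \dots, b\}$$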
As for your additional requirement of the result being an integer, maybe (apart from using built-in casting) the modulus operator can help you out. Check out this question and the answer:
Expand a random range from 1–5 to 1–7
Well, why not just look at how Python does it itself? Read random.py in your installation's lib directory.
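After gutting it to only support the behavior of random.randint() (which is what you want) and removing all error checks for non-integer or out-of-bounds arguments, you get the following (this mirrors older, Python 2-era versions of random.py; Python 3's randrange() uses the integer-based _randbelow() instead):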
import random
def randint(start, stop):
    width = stop+1 - start
    return start + int(random.random()*width)
Testing:
>>> l = []
>>> for i in range(2000000):
...     l.append(randint(3,6))
...
>>> l.count(3)
499593
>>> l.count(4)
499359
>>> l.count(5)
501432
>>> l.count(6)
499616
>>>
Assuming r_a_b is the desired random number between a and b and r_0_1 is a random number between 0 and 1 the following should work just fine:
r_a_b = (r_0_1 * (b-a)) + a