Regression dummy*Likert scale - regression

I have a problem with an interaction that I want to include in my OLS regression model. The dependent variable is a 5-point Likert scale; the independent variables are a dummy variable and another 5-point Likert scale variable. I also want the model to include an interaction between those two independent variables. The code I use is
reg "depvar" "indep_dummy"##"indep_likert"
As far as I know, this should include the main effects of both variables and the interaction between them. Sadly, Stata treats the independent Likert scale variable as categorical rather than ordinal. How can I fix this?
I know that I can put a c. prefix in front of an independent variable to treat it as continuous, but does that work for ordinal variables?

How to interpret weighted effect coding interaction in GLMER

I am running two generalized linear mixed-effects regressions, one with interactions and one without. My variables are weighted effect coded. However, I am a bit confused about the interaction between two weighted effect-coded variables. I read that they "represent the additional effects over and above the main effects". Is the interpretation the same as for effect-coded variables with equal condition sizes, i.e., variable A1 when variable B = 1?

How to decide tree depth of LGBM for high dimensional data?

I am using LightGBM for a regression problem in my project. The input data has 800 numeric variables, so it is a high-dimensional and sparse dataset.
I want to use as many variables as possible in each iteration. In this case, should I leave max_depth unlimited?
I set max_depth=2 to counter overfitting, but each iteration seems to use only 1~3 variables, and the same variables are used repeatedly.
I would also like to know how tree depth affects the learning result of a regression tree.
Detailed info. of the present model:
Number of input variables=800 (numeric)
target variables=1 (numeric)
objective=regression
max_depth=2
num_leaves=3
num_iterations=2000
learning_rate=0.01
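As a rough aid for reasoning about these settings, the sketch below computes an upper bound on how many distinct variables a single tree can split on. It relies only on two facts: a binary tree of depth d has at most 2**d leaves, and a binary tree with L leaves has L - 1 split nodes. The helper name max_features_per_tree is hypothetical and not part of LightGBM; treating max_depth <= 0 as "no limit" follows LightGBM's documented convention.

```python
def max_features_per_tree(max_depth: int, num_leaves: int) -> int:
    """Upper bound on the number of distinct features one tree can split on."""
    # Depth d allows at most 2**d leaves; max_depth <= 0 means no depth limit.
    leaf_cap = num_leaves if max_depth <= 0 else min(num_leaves, 2 ** max_depth)
    # A binary tree with L leaves has L - 1 internal splits, one feature per split.
    return max(leaf_cap - 1, 0)


# Settings listed above: the leaf limit binds before the depth limit does.
print(max_features_per_tree(max_depth=2, num_leaves=3))    # 2
# No depth limit with 31 leaves (LightGBM's default num_leaves).
print(max_features_per_tree(max_depth=-1, num_leaves=31))  # 30
```

Under this bound, raising max_depth alone would not change the number of variables per tree while num_leaves stays at 3; the two parameters have to be relaxed together.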

Why is the W_q matrix in torch.nn.MultiheadAttention square?

I am trying to implement nn.MultiheadAttention in my network. According to the docs,
embed_dim  – total dimension of the model.
However, according to the source file,
embed_dim must be divisible by num_heads
and
self.q_proj_weight = Parameter(torch.Tensor(embed_dim, embed_dim))
If I understand properly, this means each head takes only a part of the features of each query, as the matrix is square. Is this a bug in the implementation, or is my understanding wrong?
Each head uses a different part of the projected query vector. You can imagine it as if the query gets split into num_heads vectors that are independently used to compute the scaled dot-product attention. So, each head operates on a different linear combination of the features in queries (and keys and values, too). This linear projection is done using the self.q_proj_weight matrix and the projected queries are passed to F.multi_head_attention_forward function.
In F.multi_head_attention_forward, it is implemented by reshaping and transposing the query vector, so that the independent attentions for individual heads can be computed efficiently by matrix multiplication.
The attention head sizes are a design decision of PyTorch. In theory, you could have a different head size, so the projection matrix would have a shape of embed_dim × (num_heads * head_dim). Some implementations of transformers (such as the C++-based Marian for machine translation, or Hugging Face's Transformers) allow that.
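The splitting described above can be checked with a small sketch. It deliberately uses plain Python lists instead of PyTorch so the arithmetic is visible; the names matvec, heads and per_head are made up for illustration, and this is not the code inside F.multi_head_attention_forward. It shows that slicing the projected query into num_heads pieces is the same as giving each head its own head_dim × embed_dim projection that reads all input features.

```python
import random

random.seed(0)
embed_dim, num_heads = 8, 2
head_dim = embed_dim // num_heads  # embed_dim must be divisible by num_heads


def matvec(weight, vec):
    """Multiply a weight matrix (nested lists, rows x cols) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


query = [random.gauss(0, 1) for _ in range(embed_dim)]
# Square projection matrix, playing the role of q_proj_weight.
w_q = [[random.gauss(0, 1) for _ in range(embed_dim)] for _ in range(embed_dim)]

# Full projection: every output entry mixes all embed_dim input features.
projected = matvec(w_q, query)

# View 1: cut the projected query into num_heads consecutive slices.
heads = [projected[h * head_dim:(h + 1) * head_dim] for h in range(num_heads)]

# View 2: each head applies its own block of head_dim rows of w_q to the full query.
per_head = [
    matvec(w_q[h * head_dim:(h + 1) * head_dim], query) for h in range(num_heads)
]

print(heads == per_head)  # True: both views give identical per-head queries
```

So the square matrix can be read as num_heads stacked per-head projections, and no head is restricted to a subset of the input features.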

Standard error of absorbed fixed effect // Run regression with noninteger factor variable

I have a regression that I can run for example as
reghdfe y, a(x1_est=x1 x2_est=x2)
which will store the estimated coefficients in x1_est and x2_est. Now, the issue is that using absorb() does not allow me to get the standard errors for these coefficients. If I understand it correctly, no postestimation method of reghdfe allows me to retrieve those.
Luckily, I only care about the standard errors of x1. So, I could instead run
reg y i.x1, a(x2)
and inspect _se[x1]. Unfortunately, x1 has so many distinct levels that it cannot be stored as an integer and has to be a double. The regression above therefore fails with the error x1: factor variables may not contain noninteger values.
What could be another approach to get standard errors for x1?
With a large number of fixed effects, Stata's default approaches won't work. One option is to bootstrap the fixed effects and compute standard errors from the replicates. The difficulty is that there are so many fixed effects that the standard bootstrap command won't work either (it cannot return such a large matrix in each replication).
Essentially, to bootstrap the FE, one would (for a large number of iterations)
preserve
bsample
run the regression: reghdfe y, a(x1_est=x1 x2_est=x2)
Store x1_est in a .dta file
restore
After the loop is done, iteratively append all the .dta files, and compute standard errors.
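The loop above can be sketched in Python to make the logic concrete. This is a simplified stand-in, not a port of reghdfe: it assumes a single fixed effect and no covariates, in which case each group's estimated effect is just its mean of y. The function names group_means and bootstrap_fe_se and the synthetic data are made up for illustration.

```python
import random
import statistics
from collections import defaultdict


def group_means(rows):
    """Stand-in estimator: per-group mean of y, for rows of (group, y)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for group, y in rows:
        sums[group] += y
        counts[group] += 1
    return {group: sums[group] / counts[group] for group in sums}


def bootstrap_fe_se(rows, n_reps=500, seed=0):
    """Bootstrap standard error of each group's estimated effect."""
    rng = random.Random(seed)
    draws = defaultdict(list)
    for _ in range(n_reps):
        # Analogue of bsample: resample observations with replacement.
        sample = rng.choices(rows, k=len(rows))
        # Analogue of saving x1_est: keep this replicate's estimates.
        for group, estimate in group_means(sample).items():
            draws[group].append(estimate)
    # Analogue of appending the saved files: spread across replicates.
    return {g: statistics.stdev(v) for g, v in draws.items() if len(v) > 1}


# Synthetic data (an assumption): 3 groups, 40 observations each, noise sd 1.
data_rng = random.Random(1)
rows = [
    (group, effect + data_rng.gauss(0, 1))
    for group, effect in (("a", 0.0), ("b", 1.0), ("c", 2.0))
    for _ in range(40)
]
se = bootstrap_fe_se(rows)
for group in sorted(se):
    print(group, round(se[group], 3))
```

Keeping the replicates in memory keyed by group, rather than returning one large matrix per replication, mirrors the reason for writing each replicate to its own .dta file.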

Does PyTorch support variable with dynamic dimension?

I've updated my question based upon the variable dimension of variables.
Suppose the input tensor stores 3D points with dimension 10x3, where 10 is the number of points and 3 is the feature dimension (say x, y, z coordinates). The dimension of the variable depends on the input tensor; say its dimension is 10x10. When the input tensor changes its dimension to 50x3, the dimension of the variable also has to change to 50x50.
I know that in TensorFlow, if the input dimension is changing or unknown, we can declare it as tf.placeholder(None,3). However, I have never encountered a situation where the size of a variable is changing or unknown; it seems that a variable always has a fixed dimension.
I am currently learning PyTorch and don't know whether PyTorch supports this function. Any information would be appreciated!
========= Original question ========
I have a variable whose size changes when the input dimension changes. For example, if the input is 10x2, then the variable should be 10x10. If the input is 25x2, then the variable should be 25x25. As I understand it, a variable is used to store weights, which normally have a fixed dimension. However, in my case the dimension of the variable depends on the input data, which can change. Does PyTorch currently support this?
Thanks!
Your question is a little ambiguous. When you say your input is, say, 10x2, you need to define what the input tensor contains.
I am assuming you are talking about torch.autograd.Variable. If you want to use PyTorch's functionality, what you need to do is to provide your input through a tensor in the desired shape of the target function.
For example, if you want to use RNN implemented in PyTorch for an input sentence of length 10 where each word is represented by a 300 dimensional vector (e.g., word embedding), then you can do as follows.
import torch
import torch.nn as nn
from torch.autograd import Variable  # since PyTorch 0.4 a plain tensor works too
rnn = nn.RNN(300, 256, 2) # emb_size=300,hidden_size=256,num_layers=2
input = Variable(torch.randn(10, 1, 300)) # sent_length=10,batch_size=1,emb_size=300
h0 = Variable(torch.randn(2, 1, 256)) # num_layers=2,batch_size=1,hidden_size=256
output, hn = rnn(input, h0)
If you have more than one sentence, you can provide them as a batch. In that case, you need to pad them to handle the different lengths. As you can see, the RNN doesn't care about the sentence length and can handle variable lengths, but to provide many sentences in one batch you need padding. You can explore the related functionality in the official documentation.
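As one possible illustration of the padding step, the sketch below batches three random "sentences" of different lengths using the utilities in torch.nn.utils.rnn. The sentence lengths and the random embeddings are assumptions chosen only for the example; the layer sizes are the ones used above.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
emb_size, hidden_size, num_layers = 300, 256, 2
rnn = nn.RNN(emb_size, hidden_size, num_layers)

# Three sentences of different lengths, ordered longest first.
sentences = [torch.randn(n, emb_size) for n in (10, 7, 4)]
lengths = torch.tensor([s.size(0) for s in sentences])

# Zero-pad to a common length: shape (max_len, batch_size, emb_size).
padded = pad_sequence(sentences)

# Pack so the RNN skips the padded positions (lengths must be descending here).
packed = pack_padded_sequence(padded, lengths)
packed_out, hn = rnn(packed)

# Unpack back to a padded tensor plus the original lengths.
output, out_lengths = pad_packed_sequence(packed_out)

print(padded.shape)   # torch.Size([10, 3, 300])
print(output.shape)   # torch.Size([10, 3, 256])
print(hn.shape)       # torch.Size([2, 3, 256])
```

Packing is optional; feeding the padded tensor directly also runs, but then the RNN processes the padding positions as if they were real inputs.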
Since you didn't say what your input actually is, I am assuming you need variables with a variable number of timesteps; in that case PyTorch can serve your purpose. PyTorch is designed to provide the basic functionality required to build deep neural network architectures.