Centering variables for multiple regression - interested in group effects

I'm trying to run a multiple regression model looking at the length-weight relationship in fish, so y = weight and x = length. What I want to examine specifically is whether the length-weight relationship differs between populations of the same species - I've run the model as:
weight = length * population
But I have also been reading a lot about centering data in regression models. It seems to make no sense to me to grand-mean centre length for this analysis, since I'm specifically interested in the differences in the L-W relationship between the groups, but should I group-centre length? Or not centre at all?
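To make the two options concrete, this is roughly what I have in mind (a sketch using statsmodels' formula interface, with made-up column and file names):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data file with columns: weight, length, population
fish = pd.read_csv("fish.csv")

# Model on raw length: the interaction terms test whether the
# length-weight slope differs between populations
raw_fit = smf.ols("weight ~ length * population", data=fish).fit()

# Group-mean centring: subtract each population's mean length
fish["length_c"] = fish["length"] - fish.groupby("population")["length"].transform("mean")
centred_fit = smf.ols("weight ~ length_c * population", data=fish).fit()

# Centring shifts the interpretation of the intercept and the population
# main effects (weight at each population's mean length); the slope and
# slope-difference (interaction) terms stay the same
print(raw_fit.summary())
print(centred_fit.summary())
```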
Any help or pointers greatly appreciated.
Cheers.
G.

Related

How to reveal relations between number of words and target with self-attention based models?

Transformers can handle variable-length input, but what if the number of words correlates with the target? Let's say we want to perform sentiment analysis on reviews where longer reviews are more likely to be negative. How can the model harness this knowledge? Of course, a simple solution could be to add the word count as a feature after the self-attention layer. However, this hand-crafted approach wouldn't reveal more complex relations, for example that a high count of word X correlates with target 1, except when there is also a high count of word Y, in which case the target tends to be 0.
How could this information be included using deep learning? Paper recommendations in the topic are also well appreciated.
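To be concrete about the simple solution I mean, here is a rough sketch (PyTorch-style; the encoder and attribute names are made up) where the normalized word count is concatenated onto the pooled representation before a small MLP head:

```python
import torch
import torch.nn as nn

class LengthAwareClassifier(nn.Module):
    """Sketch: combine a transformer's pooled output with a sequence-length feature.

    `encoder` is any module mapping token ids -> (batch, seq_len, hidden) states;
    the names here are hypothetical, not taken from a specific library.
    """
    def __init__(self, encoder, hidden_dim, num_classes):
        super().__init__()
        self.encoder = encoder
        # A small non-linear head, so length can interact with the content features
        self.head = nn.Sequential(
            nn.Linear(hidden_dim + 1, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, token_ids, attention_mask):
        states = self.encoder(token_ids, attention_mask)         # (B, T, H)
        # Mean-pool over non-padding tokens
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (states * mask).sum(1) / mask.sum(1).clamp(min=1)
        # Word-count feature (log keeps the scale reasonable)
        length = attention_mask.sum(1, keepdim=True).float()
        feats = torch.cat([pooled, torch.log1p(length)], dim=-1)
        return self.head(feats)
```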

Making smaller models from pre-existing redundant models

Sorry for the vague title.
I will just start with an example. Say that I have a pre-existing model that can classify dogs, cats and humans. However, all I need is a model that distinguishes between dogs and cats (humans are not needed). The pre-existing model is heavy and redundant, so I want to make a smaller, faster model that can just do the job needed.
What approaches exist?
I thought of utilizing knowledge distillation (using the previous model as a teacher and the new model as the student) and training a whole new model.
First, prune the teacher model to have a smaller version to be used as a student in distillation. A simple regime such as magnitude-based pruning will suffice.
For distillation, since the output vectors will no longer match (the student is 2-dimensional and the teacher is 3-dimensional), you will have to take this into account and compute the distillation loss only over the overlapping dimensions. An alternative is layer-wise distillation, in which the output vectors are irrelevant and the distillation loss is calculated from the difference between intermediate layers of the teacher and student. In both cases, the total loss may include the difference between the student output and the label, in addition to the difference between the student output and the teacher output.
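A rough sketch of the overlapping-dimensions idea (PyTorch; which teacher indices correspond to the student's dog/cat outputs is an assumption you would set for your own model):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      keep=(0, 1), T=2.0, alpha=0.5):
    """Distill a 3-class teacher into a 2-class student.

    student_logits: (B, 2), teacher_logits: (B, 3), labels: (B,) in {0, 1}.
    `keep` lists the teacher classes that correspond to the student classes
    (e.g. dog and cat); this index mapping is an assumption.
    """
    # Keep only the teacher's overlapping classes and renormalize
    teacher_sub = teacher_logits[:, list(keep)]
    soft_targets = F.softmax(teacher_sub / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)

    # Soft (teacher) loss + hard (label) loss
    kd = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```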
For a simple task like this, it is possible that basic transfer learning after pruning would suffice - that is, just replace the 3-dimensional output vector with a 2-dimensional one and continue training.
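The transfer-learning route is essentially just swapping the head, for example (toy stand-in network, since the real architecture is yours):

```python
import torch.nn as nn

# Toy stand-in for the pruned pre-trained network (3-class output)
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 3))

# Swap the 3-class head for a 2-class head (dog vs cat)
model[-1] = nn.Linear(model[-1].in_features, 2)

# Optionally freeze everything except the new head before fine-tuning
for param in model[:-1].parameters():
    param.requires_grad = False
```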

Analyse data with a degree of effect

Hello everyone! I'm a newbie studying Data Analysis.
If you'd like to see how A, B, and C affect an outcome, you may use several models such as KNN, SVM, or logistic regression (as far as I know).
But all of them are kind of categorical, rather than giving a degree of effect.
Let's say I'd like to show how fonts and colors contribute to the degree of attraction.
What models can I use?
A thousand thanks!
If your input consists only of categorical variables (each with just a few possible values), then there are only finitely many distinct inputs, and therefore the model can only produce finitely many distinct outputs. Just a warning.
If you use, say, KNN or a random forest, you can use the L2 norm as your error metric. It will emphasize that 1 is closer to 2 than to 5 (please don't forget to normalize).
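As a concrete sketch of that suggestion (scikit-learn, with made-up font/color data): one-hot encode the categorical inputs, fit a regressor, and score with a squared-error metric:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: font, color -> attraction score (1-5)
df = pd.DataFrame({
    "font":  ["Arial", "Times", "Arial", "Comic", "Times", "Comic"] * 10,
    "color": ["red", "blue", "blue", "red", "green", "green"] * 10,
    "attraction": [3.2, 4.1, 3.8, 2.0, 4.5, 2.7] * 10,
})

X, y = df[["font", "color"]], df["attraction"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = make_pipeline(
    ColumnTransformer([("onehot", OneHotEncoder(), ["font", "color"])]),
    RandomForestRegressor(n_estimators=200, random_state=0),
)
model.fit(X_train, y_train)

# Squared-error scoring respects that a prediction of 2 is closer to 1 than 5 is
print("MSE:", mean_squared_error(y_test, model.predict(X_test)))
```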

Statsmodels: How to fit an AR-like model using multiple time series as input?

I have been stuck for days because I cannot find a way to fit a traditional regression model such as ARIMA using multiple time series as input.
I have thousands of vehicle trajectories (xy coordinates for each recorded position). Let's say each sample (trajectory) is composed of 10 positions (the trajectories are not necessarily all the same length). That means I have 10*N different time series (N is the total number of samples). I want to fit one model to all the samples' x coordinates and then predict the future position of any new trajectory (test sample) given as input. I then plan to do the same with another model for the y coordinates. I'm not saying the method will work, but I need to implement it to compare it with others (neural networks, ...).
The hypothesis is that a number of time series can be modelled with a single ARIMAX (or similar) model, i.e. the same parameters work for all of them. What I want is to fit them all simultaneously.
Can someone help me please?
Thank you in advance for your support!
Best regards,
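In case it helps to clarify what I mean by fitting them all simultaneously, this is the kind of pooled design I imagine (a sketch with simulated data; I know statsmodels' ARIMA classes expect a single series, so this just stacks lagged observations from every trajectory and fits shared AR coefficients with OLS):

```python
import numpy as np
import statsmodels.api as sm

def build_lagged_design(trajectories, p=3):
    """Stack lagged values from every trajectory into one pooled AR(p) design.

    trajectories: list of 1-D arrays (e.g. the x coordinates of each vehicle),
    possibly of different lengths. Lags never cross trajectory boundaries.
    """
    rows, targets = [], []
    for series in trajectories:
        series = np.asarray(series, dtype=float)
        for t in range(p, len(series)):
            rows.append(series[t - p:t][::-1])   # [x_{t-1}, ..., x_{t-p}]
            targets.append(series[t])
    return np.array(rows), np.array(targets)

# Simulated stand-in: x-coordinate series of varying length
trajs = [np.cumsum(np.random.randn(n)) for n in (10, 12, 10, 15)]

X, y = build_lagged_design(trajs, p=3)
model = sm.OLS(y, sm.add_constant(X)).fit()   # shared AR coefficients for all series
print(model.params)

# One-step-ahead prediction for a new trajectory's next x position
new_traj = np.cumsum(np.random.randn(10))
last_lags = new_traj[-3:][::-1].reshape(1, -1)
print(model.predict(sm.add_constant(last_lags, has_constant="add")))
```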

Regression with constrained independent variables that are highly correlated

I am not sure if what I am trying to construct is correct; hence, I need your help.
I am trying to fit a regression model of any type to data with one response and multiple independent variables. The problem is that the independent variables are highly correlated (0.9-1.0); more importantly, for each realization of y, the sum of the corresponding x's is constrained to equal a specific value that I have.
Is fitting a regression model even possible here? If not, can you think of a way to characterize the relationship between y and the x's?
Thank you.
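For context, this is the kind of thing I have been experimenting with so far (simulated data that just mimics the structure: highly correlated predictors with a fixed row sum; I am not sure either option is the right approach):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RidgeCV

rng = np.random.default_rng(0)

# Simulated stand-in: three correlated predictors whose row sums are fixed at 100
n = 200
base = rng.normal(size=n)
x1 = 30 + 5 * base + rng.normal(scale=0.3, size=n)
x2 = 40 + 4 * base + rng.normal(scale=0.3, size=n)
x3 = 100 - x1 - x2                      # enforces the sum constraint exactly
X = np.column_stack([x1, x2, x3])
y = 2 * x1 - x2 + rng.normal(size=n)

# Option 1: drop one predictor - with a fixed row sum, x3 is redundant given x1 and x2
fit_dropped = LinearRegression().fit(X[:, :2], y)

# Option 2: keep all predictors but penalize, which tames the collinearity
fit_ridge = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(X, y)

print(fit_dropped.coef_, fit_ridge.coef_)
```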