Can gtsummary be used to predict an ordinal variable (several predictors of all kinds in one model), adjusted for confounding factors?

I am trying to build a prediction model for an ordinal variable. I know that the MASS::polr() function targets this issue, but I want to present the results in a more approachable way, and I thought the gtsummary package might be suitable.
My code:
reg_tb <- tbl_uvregression(
  reg_df,
  include = c(a, b, c, d),
  method = polr,
  y = e,
  exponentiate = TRUE,
  pvalue_fun = ~ style_pvalue(.x, digits = 2)
)
I know that tbl_uvregression() fits univariate models, but under 'method =' I used the 'polr' option. I suspect polr cannot be used within tbl_uvregression() to build an adjusted prediction model, because after including 15 predictors they all remained significant when running the model (not reasonable, since several of the factors are strongly associated with each other).


How can I use fsolve to plot the solutions to a function?

I have a variable a that is equal to (weight./(1360*pi)).^(1/3), where the weight ranges between 4 and 8 kg.
I then have a guess of the time taken, which is 14400 seconds.
The function in question is attached (reproduced below), where infinity is replaced by k = 22, and it should be equal to 57/80:
2/(pi*(r/a)) * sum over k = 1..22 of (-1)^(k-1)/k * sin(k*pi*(r/a)) * exp(-k^2*pi^2*alpha*t/a^2) = 57/80
r/a can be replaced by 0.464, meaning that the factor multiplying the summation can be written as 2/(0.464*pi).
alpha will be equal to 0.7*10^-7.
How would I be able to plot the times taken for the masses to cook in hours, for weight in the given range?
I have tried to code this function for a couple of days now, but it won't work, due to array size issues and the general function just not working.
Any help would be greatly appreciated :)
First, you need a master equation as a function of weight and t, which you want fsolve to find the zero of. Then for each weight, you can capture it in another function that you then solve for t:
alpha = 0.7e-7;
rbya = 0.464;   % r/a
k = 1:22;       % truncated summation index
a = @(weight) (weight./(1360*pi)).^(1/3);
eqn = @(weight,t) 2/pi/rbya*sum((-1).^(k-1)./k.*sin(k*pi*rbya).*exp(-1.*k.^2.*pi^2.*alpha.*t./(a(weight).^2))) - 57/80;
weights = 4:8;
ts = zeros(size(weights));
for i = 1:numel(weights)
    sub_eqn = @(t) eqn(weights(i),t);  % capture the current weight, leaving a function of t alone
    ts(i) = fsolve(sub_eqn,14400);
end
plot(weights,ts/(60*60))
xlabel("Weight (kg)")
ylabel("Cooking Time (hrs)")
If you want to solve the entire set of equations at once, then you need to be careful of array sizes (as you have experienced, read more here). k should be a column vector so that sum will sum along each column, and weights should be a row vector so that element-wise operations will repeat the k’s for each weight. You also need your list of initial guesses to be the same size as weights so that fsolve can have a guess for each weight:
alpha = 0.7e-7;
rbya = 0.464;
k = (1:22)';
a = @(weight) (weight./(1360*pi)).^(1/3);
weights = 4:8;
eqn = @(t) 2/pi/rbya*sum((-1).^(k-1)./k.*sin(k*pi*rbya).*exp(-1.*k.^2.*pi^2.*alpha.*t./(a(weights).^2))) - 57/80;
ts = fsolve(eqn,repmat(14400,size(weights)));
plot(weights,ts/(60*60))
xlabel("Weight (kg)")
ylabel("Cooking Time (hrs)")
Note that you do get slightly different answers with the two methods, most likely because fsolve's stopping criteria apply to the whole system in the vectorised version rather than to each equation individually.

torch.nn.DataParallel with torch.autograd.grad in loss function fails

I have a neural network model that represents the surface of an object. For this to work, the gradients are calculated in the loss function (for example, it is a property of signed distance fields (SDFs) that the gradient is always of unit length).
The loss function is the one from SIREN for SDFs and is defined as:
def sdf(model_output, gt):
    gt_sdf = gt['sdf']
    gt_normals = gt['normals']
    coords = model_output['model_in']
    pred_sdf = model_output['model_out'].to(torch.float32)
    gradient = diff_operators.gradient(pred_sdf, coords)
    # Wherever boundary_values is not equal to zero, we interpret it as a boundary constraint.
    sdf_constraint = torch.where(gt_sdf != -1, pred_sdf, torch.zeros_like(pred_sdf))
    inter_constraint = torch.where(gt_sdf != -1, torch.zeros_like(pred_sdf), torch.exp(-1e2 * torch.abs(pred_sdf)))
    normal_constraint = torch.where(gt_sdf != -1, 1 - F.cosine_similarity(gradient, gt_normals, dim=-1)[..., None],
                                    torch.zeros_like(gradient[..., :1]))
    grad_constraint = torch.abs(gradient.norm(dim=-1) - 1)
    return {'sdf': torch.abs(sdf_constraint).mean() * 3e3,
            'inter': inter_constraint.mean() * 1e2,
            'normal_constraint': normal_constraint.mean() * 1e2,
            'grad_constraint': grad_constraint.mean() * 5e1}
and the gradient calculation uses torch.autograd.grad:
def gradient(y, x, grad_outputs=None):
    if grad_outputs is None:
        grad_outputs = torch.ones_like(y)
    grad = torch.autograd.grad(y, [x], grad_outputs=grad_outputs, create_graph=True)[0]
    return grad
Now I wanted to parallelise the training by using torch.nn.DataParallel, but I get the following error:
RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
Is it possible to use torch.nn.DataParallel with gradient calculation in the loss function, and what do I need to change to make it work?
Looking at the documentation of nn.parallel.DistributedDataParallel:
This module doesn’t work with torch.autograd.grad() (i.e. it will only work if gradients are to be accumulated in .grad attributes of parameters).
It also recommends using torch.distributed.autograd.backward and torch.distributed.optim.DistributedOptimizer.
The documentation of torch.distributed also recommends using the gloo backend:
Please notice that currently the only backend where all the functions are guaranteed to work is gloo.
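As a first diagnostic under plain DataParallel, the error message itself points at allow_unused=True. Below is a minimal sketch of the gradient helper with that flag; the None check and its error text are my additions, not part of the original code:

import torch

def gradient(y, x, grad_outputs=None):
    if grad_outputs is None:
        grad_outputs = torch.ones_like(y)
    # allow_unused=True makes autograd return None instead of raising when x
    # is not part of y's graph (e.g. because DataParallel scattered the batch
    # and each replica consumed a copy of x rather than x itself).
    grad = torch.autograd.grad(y, [x], grad_outputs=grad_outputs,
                               create_graph=True, allow_unused=True)[0]
    if grad is None:
        # x really was unused: differentiate with respect to the tensor the
        # replica actually consumed (here, model_output['model_in']).
        raise RuntimeError("x is not in the graph of y; pass the replica's own input tensor")
    return grad

This does not remove the underlying scatter problem, but it turns the failure into an explicit message about which tensor to differentiate against.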

store values of function to prevent from running again

Say I have some complicated function f(fvar1, ..., fvarN) such as:
def f(fvar1, ..., fvarN):
    return (complicated function of fvar1, ..., fvarN)
Now function g(gvar1, ..., gvarM) has an expression in terms of f(fvar1, ..., fvarN), let's say:
def g(gvar1, ..., gvarM):
    return stuff * f(gvar1 * gvar2, ..., gvar5 * gvarM) - stuff * f(gvar3, gvar2, ..., gvarM)
where the arguments of f inside g can be different linear combinations of gvar1, ..., gvarM.
Because f is a complicated function, it is costly to call f, but it is also difficult to store the value locally in g because g has many instances of f with different argument combinations.
Is there a way to store values of f so that f is not called again and again with the same arguments, without having to define every different instance of f locally within g?
Yes, this is called memoisation. The basic idea is to have f() maintain some sort of data store based on the parameters passed in. Then, if it's called with the same parameters, it simply returns the stored value rather than recalculating it.
The data store probably needs to be limited in size and optimised for the pattern of calls you expect, by removing parameter sets based on some rules. For example, if the number of times a parameter set is used indicates its likelihood of being used in future, you probably want to remove patterns that are used infrequently, and keep those that are used more often.
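In Python, a bounded cache of exactly this kind ships in the standard library as functools.lru_cache, which evicts the least recently used entry once maxsize is reached. A minimal sketch, assuming f's arguments are hashable:

import functools

@functools.lru_cache(maxsize=128)  # keep the 128 most recently used results
def f(fvar1, fvar2):
    return fvar1 ** fvar2  # stand-in for the expensive computation

f(2, 10)               # computed and stored
f(2, 10)               # served from the cache
print(f.cache_info())  # CacheInfo(hits=1, misses=1, maxsize=128, currsize=1)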
Consider, for example, the following Python code for adding two numbers (let us pretend that this is a massively time-expensive operation):
import random

def addTwo(a, b):
    return a + b

for _ in range(100):
    x = random.randint(1, 5)
    y = random.randint(1, 5)
    z = addTwo(x, y)
    print(f"{x} + {y} = {z}")
That works but, of course, is inefficient if you use the same numbers as used previously. You can add memoisation as follows.
The code will "remember" a certain number of calculations; since Python 3.7 dictionaries preserve insertion order, so evicting the first key removes the oldest cached pair. If it gets a pair it already knows about, it just returns the cached value.
Otherwise, it calculates the value, storing it into the cache, and ensuring said cache doesn't grow too big:
import random, time

# Cache, and the stats for it.
(pairToSumMap, cached, calculated) = ({}, 0, 0)

def addTwo(a, b):
    global pairToSumMap, cached, calculated
    # Attempt two different cache lookups first (a:b, b:a).
    sum = None
    try:
        sum = pairToSumMap[f"{a}:{b}"]
    except KeyError:
        try:
            sum = pairToSumMap[f"{b}:{a}"]
        except KeyError:
            pass
    # Found in cache, return.
    if sum is not None:
        print("Using cached value: ", end="")
        cached += 1
        return sum
    # Not found, calculate and add to cache (with limited cache size).
    print("Calculating value: ", end="")
    calculated += 1
    time.sleep(1); sum = a + b  # Make expensive.
    if len(pairToSumMap) > 10:
        del pairToSumMap[list(pairToSumMap.keys())[0]]
    pairToSumMap[f"{a}:{b}"] = sum
    return sum

for _ in range(100):
    x = random.randint(1, 5)
    y = random.randint(1, 5)
    z = addTwo(x, y)
    print(f"{x} + {y} = {z}")

print(f"Calculated {calculated}, cached {cached}")
You'll see I've also added cached/calculated information, including a final statistics line which shows the caching in action, for example:
Calculated 29, cached 71
I've also made the calculation an expensive operation so you can see the caching in action (via the speed of the output): cached results come back immediately, while calculating a sum takes a second.

Understanding log_prob for Normal distribution in pytorch

I'm currently trying to solve Pendulum-v0 from the OpenAI Gym environment, which has a continuous action space. As a result, I need to use a Normal distribution to sample my actions. What I don't understand is the dimension of log_prob when using it:
import torch
from torch.distributions import Normal
means = torch.tensor([[0.0538],
                      [0.0651]])
stds = torch.tensor([[0.7865],
                     [0.7792]])
dist = Normal(means, stds)
a = torch.tensor([1.2, 3.4])
d = dist.log_prob(a)
print(d.size())
I was expecting a tensor of size 2 (one log_prob for each action), but it outputs a tensor of size (2, 2).
However, when using a Categorical distribution for a discrete environment, log_prob has the expected size:
from torch.distributions import Categorical

logits = torch.tensor([[-0.0657, -0.0949],
                       [-0.0586, -0.1007]])
dist = Categorical(logits=logits)
a = torch.tensor([1, 1])
print(dist.log_prob(a).size())
gives me a tensor of size (2,).
Why does log_prob for the Normal distribution have a different size?
If one takes a look at the source code of torch.distributions.Normal and finds the definition of the log_prob(value) function, one can see that the main part of the calculation is:
return -((value - self.loc) ** 2) / (2 * var) - some other part
where value is a variable containing the values for which you want to calculate the log probability (in your case, a), self.loc is the mean of the distribution (in your case, means) and var is the variance, that is, the square of the standard deviation (in your case, stds**2). One can see that this is indeed the logarithm of the probability density function of the normal distribution, minus some constants and the logarithm of the standard deviation, which I do not write above.
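As a quick sanity check (a minimal sketch using one mean/std pair from the question), the main part plus the omitted terms reproduces log_prob exactly:

import math
import torch
from torch.distributions import Normal

mean = torch.tensor(0.0538)
std = torch.tensor(0.7865)
value = torch.tensor(1.2)

manual = (-((value - mean) ** 2) / (2 * std ** 2)  # main part of the calculation
          - torch.log(std)                         # log of the standard deviation
          - math.log(math.sqrt(2 * math.pi)))      # normalisation constant
print(Normal(mean, std).log_prob(value).item(), manual.item())  # both ≈ -1.7407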
In the first example, you define means and stds to be column vectors, while the values form a row vector:
means = torch.tensor([[0.0538],
                      [0.0651]])
stds = torch.tensor([[0.7865],
                     [0.7792]])
a = torch.tensor([1.2, 3.4])
But subtracting a row vector from a column vector, which the code does in value - self.loc, broadcasts to a matrix (try it!), so the result you obtain is a log_prob value for each of your two defined distributions and for each of the values in a.
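To see that broadcasting in isolation (a minimal sketch with the question's shapes):

import torch

col = torch.tensor([[0.0538],
                    [0.0651]])  # shape (2, 1), like means
row = torch.tensor([1.2, 3.4])  # shape (2,), like a
print((row - col).shape)        # torch.Size([2, 2]): every pairwise difference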
If you want to obtain a log_prob without the cross terms, then define the variables consistently, i.e., either
means = torch.tensor([[0.0538],
                      [0.0651]])
stds = torch.tensor([[0.7865],
                     [0.7792]])
a = torch.tensor([[1.2], [3.4]])
or
means = torch.tensor([0.0538, 0.0651])
stds = torch.tensor([0.7865, 0.7792])
a = torch.tensor([1.2, 3.4])
This is what you do in your second example, which is why you obtain the result you expected.

Defining a seed value in BrainScript for CNTK sequential machine learning models

This is with respect to CNTK BrainScript. I went through [1] to figure out whether there is an option to specify the random seed value, but I couldn't find any. (Yes, there is an option to set the random seed through the ParameterTensor() function, but if I followed that approach I would have to explicitly initialise all the LSTM weights separately, defining separate weights for the input gate, forget gate, etc., instead of using the model sequence as below.) Is there any other option available to set the random seed value while preserving the following RNN layer sequence?
nn_Train = {
    action = train
    BrainScriptNetworkBuilder = {
        model = Sequential (
            RecurrentLSTMLayer {$stateDim$, usePeepholes = true} :
            DenseLayer {$labelDim$, bias = false}
        )
        z = model (inputs)
        inputs = Input($inputDim$)  # features
        labels = Input($labelDim$)
        # loss and metric
        ce = SquareError(labels, z)
        # node assignment
        featureNodes = (inputs)
        labelNodes = (labels)
        criterionNodes = (ce)
        evaluationNodes = (ce)
        outputNodes = (z)
    }
}
[1] https://github.com/microsoft/cntk/wiki/Parameters-And-Constants#random-initialization
Unfortunately, there isn't a global random seed option for parameters. However, you can modify the cntk.core.bs file next to cntk.exe, where all the layers are defined, to add random seed support to the layers you want.