Defining a seed value in BrainScript for CNTK sequential machine learning models - deep-learning

This is with respect to CNTK BrainScript. I went through [1] to figure out whether there is an option to specify the random seed value, but I couldn't find one. (Yes, there is an option to set the 'random seed' parameter through the ParameterTensor() function, but if I followed that approach I might have to explicitly initialize all the LSTM weights separately, defining separate weights for the input gate, forget gate, etc., instead of using the model sequence as below.) Is there any other option available to set the random seed value while preserving the following RNN layered sequence?
nn_Train = {
    action = train
    BrainScriptNetworkBuilder = {
        model = Sequential (
            RecurrentLSTMLayer {$stateDim$, usePeepholes = true}:
            DenseLayer {$labelDim$, bias=false}
        )
        z = model (inputs)

        inputs = Input($inputDim$)  # features
        labels = Input($labelDim$)

        # loss and metric
        ce = SquareError(labels, z)

        # node assignment
        featureNodes    = (inputs)
        labelNodes      = (labels)
        criterionNodes  = (ce)
        evaluationNodes = (ce)
        outputNodes     = (z)
    }
}
[1] https://github.com/microsoft/cntk/wiki/Parameters-And-Constants#random-initialization

There isn't a global random seed option for parameters, unfortunately. However, you can modify the cntk.core.bs file next to cntk.exe, where all the layers are defined, to add random-seed support for the layers you want.

Can gtsummary be used to predict an ordinal variable (several predictors of all kinds in one model), adjusted for confounding factors

I am trying to build a prediction model for an ordinal variable. I know that the MASS::polr() function targets this issue, but I want to present it in a more approachable way. I thought the gtsummary package might be suitable.
My code:
reg_tb <- tbl_uvregression(
    reg_df,
    include = c(a, b, c, d),
    method = polr,
    y = e,
    exponentiate = TRUE,
    pvalue_fun = ~ style_pvalue(.x, digits = 2)
)
I know that tbl_uvregression() fits univariate models, but under 'method =' I used the 'polr' option. I suspect polr can't be used in tbl_uvregression() to build an adjusted prediction model, because after including 15 predictors they all remained significant when running the model (which is not reasonable, since several of the factors are strongly associated with each other).

torch.nn.DataParallel with torch.autograd.grad in loss function fails

I have a neural network model that represents the surface of an object. For this to work, the gradients are calculated in the loss function (because, for example, it is a property of signed distance fields (SDFs) that the gradient always has unit length).
The loss function is the one from SIREN for SDFs and is defined as
import torch
import torch.nn.functional as F
import diff_operators  # provided by the SIREN repository


def sdf(model_output, gt):
    gt_sdf = gt['sdf']
    gt_normals = gt['normals']
    coords = model_output['model_in']
    pred_sdf = model_output['model_out'].to(torch.float32)

    gradient = diff_operators.gradient(pred_sdf, coords)

    # Wherever boundary_values is not equal to zero, we interpret it as a boundary constraint.
    sdf_constraint = torch.where(gt_sdf != -1, pred_sdf, torch.zeros_like(pred_sdf))
    inter_constraint = torch.where(gt_sdf != -1, torch.zeros_like(pred_sdf),
                                   torch.exp(-1e2 * torch.abs(pred_sdf)))
    normal_constraint = torch.where(gt_sdf != -1,
                                    1 - F.cosine_similarity(gradient, gt_normals, dim=-1)[..., None],
                                    torch.zeros_like(gradient[..., :1]))
    grad_constraint = torch.abs(gradient.norm(dim=-1) - 1)

    return {'sdf': torch.abs(sdf_constraint).mean() * 3e3,
            'inter': inter_constraint.mean() * 1e2,
            'normal_constraint': normal_constraint.mean() * 1e2,
            'grad_constraint': grad_constraint.mean() * 5e1}
and the gradient calculation uses torch.autograd.grad:
def gradient(y, x, grad_outputs=None):
    if grad_outputs is None:
        grad_outputs = torch.ones_like(y)
    grad = torch.autograd.grad(y, [x], grad_outputs=grad_outputs, create_graph=True)[0]
    return grad
Now I want to parallelise the training using torch.nn.DataParallel, but I get the following error:
RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
Is it possible to use torch.nn.DataParallel with gradient calculation in the loss function and what do I need to change to make it work?
Looking at the documentation of nn.parallel.DistributedDataParallel:
This module doesn’t work with torch.autograd.grad() (i.e. it will only work if gradients are to be accumulated in .grad attributes of parameters).
It also recommends using torch.distributed.autograd.backward and torch.distributed.optim.DistributedOptimizer.
The documentation of torch.distributed also recommends using the gloo backend:
Please notice that currently the only backend where all the functions are guaranteed to work is gloo.
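To make the quoted restriction concrete, here is a minimal sketch with a toy linear model (not the SIREN setup from the question): backward() accumulates gradients into the parameters' .grad attributes, which is the path DistributedDataParallel hooks into, while torch.autograd.grad() returns gradients directly and bypasses .grad entirely.
import torch
import torch.nn as nn

# Toy model and data, purely for illustration.
model = nn.Linear(3, 1)
x = torch.randn(8, 3)
loss = model(x).pow(2).mean()

# 1) backward() accumulates gradients into param.grad;
#    this is what DistributedDataParallel's hooks rely on.
loss.backward(retain_graph=True)
print(model.weight.grad is not None)  # True

# 2) torch.autograd.grad() returns the gradients directly and does NOT
#    accumulate into param.grad, so DDP's gradient reduction never sees them.
grads = torch.autograd.grad(loss, list(model.parameters()), create_graph=True)
print(grads[0].shape)  # torch.Size([1, 3])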

How would you represent a 2D matrix as an input state and have it select the index of the row it thinks is the best action for that state?

I'm trying to build an RL model where the input is an NxM matrix, N being the number of selectable actions and M being the features describing each action.
In all the RL problems I've seen so far, the state space is either a vector, which is passed to a regular neural network, or an image, which is passed through a convolutional neural network.
But say we have an environment where the objective is to learn to select the strongest worker for a fixed task, and a single state representation looked like this:
import pandas as pd

names = ['Bob', 'Henry', 'Mike', 'Phil']
max_squat = [300, 400, 200, 100]
max_bench = [200, 100, 225, 100]
max_deadlift = [600, 400, 300, 225]
strongest_worker_df = pd.DataFrame({'Name': names, 'Max_Squat': max_squat,
                                    'Max_Bench': max_bench, 'Max_Deadlift': max_deadlift})
I want to pass in this 2D matrix (without the Name column, of course) as an input and have it return a row index, and then pass that row index as an action to the environment and get a reward. Then run a reinforcement learning algorithm on the gradient of the reward with respect to the action selection.
Any suggestions on how to go about this, specifically the state representation?
Well, as long as your matrix is of fixed size (N and M don't change), you could just vectorize it (concatenate the rows) and the network would work with that.
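For instance, the flattening could be as simple as this (a minimal sketch using the DataFrame from the question):
# Drop the non-numeric Name column and concatenate the rows into one flat vector.
state_vec = strongest_worker_df.drop(columns='Name').to_numpy().flatten()
print(state_vec.shape)  # (12,), i.e. N * M for the 4x3 example above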
It is probably suboptimal to do this, though, because given the problem setting it seems preferable to pass each row through the same neural network to get per-row features and then have a top-level discriminator that operates on the concatenated features.
An example model implementing this second approach (in TensorFlow/Keras code) is:
# Imports assumed for this snippet (Keras functional API via tf.keras).
from tensorflow.keras.layers import Input, Dense, Dropout, Flatten
from tensorflow.keras.models import Model

model_input = x = Input(shape=(N, M))

x = Dense(64, activation='relu')(x)
x = Dropout(0.1)(x)
x = Dense(32, activation='relu')(x)
x = Dropout(0.1)(x)
x = Dense(16, activation='relu')(x)
x = Dropout(0.1)(x)

# The layers above this line define the feature generator; at this point
# your model has 16 features for every person, i.e. an Nx16 matrix.
# Each person's features have gone through the same nodes and have received
# the same transformations from them.
x = Flatten()(x)

# The Nx16 matrix is now flattened; the layers below define the discriminator,
# which has a softmax output of size N (the highest output identifies
# the selected index).
x = Dense(16, activation='relu')(x)
x = Dropout(0.1)(x)
x = Dense(16, activation='relu')(x)
x = Dropout(0.1)(x)
x = Dense(16, activation='relu')(x)
x = Dropout(0.1)(x)
x = Dense(N, activation='softmax')(x)

model = Model(inputs=model_input, outputs=x)
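As a usage sketch (assuming state is an (N, M) NumPy array built from the numeric columns, e.g. strongest_worker_df.drop(columns='Name').to_numpy()), the selected action is just the argmax over the softmax output:
import numpy as np

probs = model.predict(state[None, ...])[0]  # add a batch dimension, then take the single result
action = int(np.argmax(probs))              # row index of the worker the network selects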

Understanding log_prob for Normal distribution in pytorch

I'm currently trying to solve Pendulum-v0 from the OpenAI Gym environments, which has a continuous action space. As a result, I need to use a Normal distribution to sample my actions. What I don't understand is the dimension of log_prob when using it:
import torch
from torch.distributions import Normal

means = torch.tensor([[0.0538],
                      [0.0651]])
stds = torch.tensor([[0.7865],
                     [0.7792]])

dist = Normal(means, stds)
a = torch.tensor([1.2, 3.4])
d = dist.log_prob(a)
print(d.size())
I was expecting a tensor of size 2 (one log_prob for each action), but it outputs a tensor of size (2, 2).
However, when using a Categorical distribution for a discrete environment, log_prob has the expected size:
from torch.distributions import Categorical

logits = torch.tensor([[-0.0657, -0.0949],
                       [-0.0586, -0.1007]])

dist = Categorical(logits=logits)
a = torch.tensor([1, 1])
print(dist.log_prob(a).size())
gives me a tensor of size (2).
Why is the log_prob for the Normal distribution a different size?
If one takes a look in the source code of torch.distributions.Normal and finds the definition of the log_prob(value) function, one can see that the main part of the calculation is:
return -((value - self.loc) ** 2) / (2 * var) - some other part
where value is a variable containing the values for which you want to calculate the log probability (in your case, a), self.loc is the mean of the distribution (in your case, means) and var is the variance, that is, the square of the standard deviation (in your case, stds**2). One can see that this is indeed the logarithm of the probability density function of the normal distribution, minus some constants and the logarithm of the standard deviation, which I don't write out above.
In the first example, you define means and stds to be column vectors, while the values form a row vector:
means = torch.tensor([[0.0538],
                      [0.0651]])
stds = torch.tensor([[0.7865],
                     [0.7792]])
a = torch.tensor([1.2, 3.4])
But subtracting a row vector from a column vector, which is what the code does in value - self.loc, broadcasts to a matrix (try it!), so the result you obtain is a log_prob value for each of your two defined distributions and for each of the values in a.
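For illustration, a quick check of that broadcasting behaviour with the tensors defined above:
import torch

means = torch.tensor([[0.0538],
                      [0.0651]])   # shape (2, 1): column vector
a = torch.tensor([1.2, 3.4])       # shape (2,):   row-like vector

print((a - means).shape)  # torch.Size([2, 2]) -- broadcasting produces a matrix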
If you want to obtain a log_prob without the cross terms, then define the variables consistently, i.e., either
means = torch.tensor([[0.0538],
                      [0.0651]])
stds = torch.tensor([[0.7865],
                     [0.7792]])
a = torch.tensor([[1.2],
                  [3.4]])
or
means = torch.tensor([0.0538, 0.0651])
stds = torch.tensor([0.7865, 0.7792])
a = torch.tensor([1.2, 3.4])
This is what you do in your second example, which is why you obtain the result you expected.
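A quick sanity check of the 1-D variant (illustrative only, using the numbers from the question):
import torch
from torch.distributions import Normal

dist = Normal(torch.tensor([0.0538, 0.0651]), torch.tensor([0.7865, 0.7792]))
print(dist.log_prob(torch.tensor([1.2, 3.4])).size())  # torch.Size([2])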

How to calculate MobileNet FLOPs in Keras

# Imports assumed for this snippet (Keras with a TF1 backend).
import tensorflow as tf
import keras.backend as K
from keras.applications.mobilenet import MobileNet

run_meta = tf.RunMetadata()
with tf.Session(graph=tf.Graph()) as sess:
    K.set_session(sess)
    with tf.device('/cpu:0'):
        base_model = MobileNet(alpha=1, weights=None,
                               input_tensor=tf.placeholder('float32', shape=(1, 224, 224, 3)))

    opts = tf.profiler.ProfileOptionBuilder.float_operation()
    flops = tf.profiler.profile(sess.graph, run_meta=run_meta, cmd='op', options=opts)

    opts = tf.profiler.ProfileOptionBuilder.trainable_variables_parameter()
    params = tf.profiler.profile(sess.graph, run_meta=run_meta, cmd='op', options=opts)

    print("{:,} --- {:,}".format(flops.total_float_ops, params.total_parameters))
When I run the above code, I get the following result:
1,137,481,704 --- 4,253,864
This is different from the flops described in the paper.
mobilenet: https://arxiv.org/pdf/1704.04861.pdf
ShuffleNet: https://arxiv.org/pdf/1707.01083.pdf
How can I calculate the exact FLOPs described in the papers?
tl;dr You've actually got the right answer! You are simply comparing flops with multiply accumulates (from the paper) and therefore need to divide by two.
If you're using Keras, then the code you listed is slightly over-complicating things...
Let model be any compiled Keras model. We can arrive at the flops of the model with the following code.
import tensorflow as tf
import keras.backend as K


def get_flops():
    run_meta = tf.RunMetadata()
    opts = tf.profiler.ProfileOptionBuilder.float_operation()

    # We use the Keras session graph in the call to the profiler.
    flops = tf.profiler.profile(graph=K.get_session().graph,
                                run_meta=run_meta, cmd='op', options=opts)

    return flops.total_float_ops  # Prints the "flops" of the model.


# .... Define your model here ....
# You need to have compiled your model before calling this.
print(get_flops())
However, when I looked at my own example (not MobileNet) that I ran on my computer, the printed total_float_ops was 2115, and I got the following results when I simply printed the flops variable:
[...]
Mul 1.06k float_ops (100.00%, 49.98%)
Add 1.06k float_ops (50.02%, 49.93%)
Sub 2 float_ops (0.09%, 0.09%)
It's pretty clear that the total_float_ops property takes into consideration multiplication, addition and subtraction.
I then looked back at the MobileNet example. Looking through the paper briefly, I identified, based on the number of parameters, which MobileNet variant in the paper's table corresponds to the default Keras implementation.
The first model in that table matches the result you have (4,253,864), and its Mult-Adds are approximately half of the FLOPs result that you have. Therefore you have the correct answer; it's just that you were mistaking FLOPs for Mult-Adds (aka multiply-accumulates or MACs).
If you want to compute the number of MACs you simply have to divide the result from the above code by two.
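For example, using the get_flops() helper defined above:
flops = get_flops()
macs = flops / 2  # multiply-accumulates (Mult-Adds), the quantity reported in the MobileNet paper
print(flops, macs)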
Important Notes
Keep the following in mind if you are trying to run the code sample:
The code sample was written in 2018 and doesn't work with TensorFlow version 2. See #driedler's answer for a complete example of TensorFlow version 2 compatibility.
The code sample was originally meant to be run once on a compiled model. For a better example of using this in a way that does not have side effects (and can therefore be run multiple times on the same model), see #ch271828n's answer.
This is working for me in TF-2.1:
import tensorflow as tf

def get_flops(model_h5_path):
    session = tf.compat.v1.Session()
    graph = tf.compat.v1.get_default_graph()

    with graph.as_default():
        with session.as_default():
            model = tf.keras.models.load_model(model_h5_path)

            run_meta = tf.compat.v1.RunMetadata()
            opts = tf.compat.v1.profiler.ProfileOptionBuilder.float_operation()

            # Optional: save printed results to file
            # flops_log_path = os.path.join(tempfile.gettempdir(), 'tf_flops_log.txt')
            # opts['output'] = 'file:outfile={}'.format(flops_log_path)

            # We use the Keras session graph in the call to the profiler.
            flops = tf.compat.v1.profiler.profile(graph=graph,
                                                  run_meta=run_meta, cmd='op', options=opts)

    return flops.total_float_ops
The above solutions cannot be run twice, otherwise the flops will accumulate! (In other words, the second time you run it, you will get output = flops_of_1st_call + flops_of_2nd_call.) The following code calls reset_default_graph to avoid this.
import tensorflow as tf
from tensorflow import keras  # or standalone keras, matching your setup

def get_flops():
    session = tf.compat.v1.Session()
    graph = tf.compat.v1.get_default_graph()

    with graph.as_default():
        with session.as_default():
            model = keras.applications.mobilenet.MobileNet(
                alpha=1, weights=None,
                input_tensor=tf.compat.v1.placeholder('float32', shape=(1, 224, 224, 3)))

            run_meta = tf.compat.v1.RunMetadata()
            opts = tf.compat.v1.profiler.ProfileOptionBuilder.float_operation()

            # Optional: save printed results to file
            # flops_log_path = os.path.join(tempfile.gettempdir(), 'tf_flops_log.txt')
            # opts['output'] = 'file:outfile={}'.format(flops_log_path)

            # We use the Keras session graph in the call to the profiler.
            flops = tf.compat.v1.profiler.profile(graph=graph,
                                                  run_meta=run_meta, cmd='op', options=opts)

    tf.compat.v1.reset_default_graph()
    return flops.total_float_ops
Modified from #driedler, thanks!
You can use model.summary() on all Keras models to get the number of parameters (note that it reports parameter counts, not FLOPs).