Why is my tensorflow generator function so slow? - deep-learning

The generator function below is too slow. Is there a way to optimise this code?
train_dataset_c1 is the training dataset for class 1, with records of the form (image, 1)
train_dataset_c0 is the training dataset for class 0, with records of the form (image, 0)
def generator(positive_dataset, negative_dataset):
    while True:
        for pos_rec, neg_rec in zip(positive_dataset, negative_dataset):
            pos_x, pos_y = pos_rec
            neg_x, neg_y = neg_rec
            x = tf.concat([pos_x, neg_x], axis=0)
            y = tf.concat([pos_y, neg_y], axis=0)
            yield x, y

train_generator = generator(train_dataset_c1, train_dataset_c0)
test_generator = generator(test_dataset_c1, test_dataset_c0)

If you are using TensorFlow 2.0 I'd recommend using the tf.data API to speed up your pipeline.
There is a from_generator function that you can apply to your generator to turn it into a tf.data.Dataset.
After converting it to a tf.data.Dataset object this way, you can optimise it even further with any of the strategies in the tf.data performance tutorial (prefetching, parallel maps, caching, and so on), as in the sketch below.
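A minimal sketch (not the poster's exact pipeline; the dtypes, and the assumption that the two class datasets are already batched tf.data.Dataset objects, are mine):

import tensorflow as tf

# Wrap the existing Python generator in a tf.data.Dataset and add prefetching
# so batch preparation overlaps with training. The output_types are assumptions
# about the data (float images, integer labels).
train_ds = tf.data.Dataset.from_generator(
    lambda: generator(train_dataset_c1, train_dataset_c0),
    output_types=(tf.float32, tf.int32),
).prefetch(tf.data.experimental.AUTOTUNE)

# If train_dataset_c1 / train_dataset_c0 are already tf.data.Dataset objects,
# the Python generator can be dropped entirely and the pairing done in-graph:
train_ds_zipped = (
    tf.data.Dataset.zip((train_dataset_c1, train_dataset_c0))
    .map(lambda pos, neg: (tf.concat([pos[0], neg[0]], axis=0),
                           tf.concat([pos[1], neg[1]], axis=0)),
         num_parallel_calls=tf.data.experimental.AUTOTUNE)
    .repeat()
    .prefetch(tf.data.experimental.AUTOTUNE)
)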

Related

How to model and train a PyTorch model to learn a mathematical function f(x)?

I'm currently learning how to use PyTorch to model NNs and did the "Getting Started" session on the PyTorch website.
I tried to train a PyTorch NN to apply a function, e.g. f(x) = 2x - 1, to a given list of input integers, but my model is far from learning the right thing.
How can I model and train a PyTorch model to learn a given mathematical function f(x)?
I've tried the model below, trained on 10 random numbers with labels generated by the myFunc function, to learn the function 2x - 1.
Thanks for your help.
import torch
from torch import nn
import torch.nn.functional as F

batch_size = 10

def myFunc(a):
    # y = 2x - 1
    return 2 * a - 1

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin1 = nn.Linear(batch_size, 1)
        self.lin2 = nn.Linear(1, batch_size)

    def forward(self, x):
        x = self.lin1(x)
        x = F.relu(x)
        x = self.lin2(x)
        return x

model = NeuralNetwork()
Theoretically, for your example of an affine-linear function over a bounded interval you need only
linear(bias) -> relu -> linear(bias)
with one node per linear layer, or just one linear layer without activation.
For more general functions, you will need larger layers in a construction of the first type, with one node for every piece of a piecewise-linear approximation. The last layer always needs to be linear, without activation. Using more layers may give you more pieces with fewer total nodes.
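A minimal sketch of the single-linear-layer variant (the data generation and training loop below are illustrative assumptions, not the poster's code):

import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(100, 1)   # 100 samples, one feature each
y = 2 * x - 1             # targets from f(x) = 2x - 1

model = nn.Linear(1, 1)   # a single linear layer, no activation
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

for _ in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

print(model.weight.item(), model.bias.item())  # approaches 2.0 and -1.0

Note that this layer maps one input feature to one output, so each number is its own sample; the poster's model instead treats the whole batch of 10 numbers as a single 10-dimensional input, which makes it much harder to learn the pointwise function.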

How to define a custom loss function for a multi-dimensional target

I'm working with Tensorflow 2.0 and I'm using a normal Sequential model.
I'm trying to define a custom loss function which does the following:
- takes some elements of the input
- computes their sum and inverts the result
- multiplies the result with a part of y_pred
- constrains the result to be as close to 1 as possible
Thus the loss function would be L() = MSE() + (the penalty described above).
My code follows:
import tensorflow as tf
from tensorflow.keras import backend as K

def custom_loss_wrapper(input_train):

    #tf.function
    def summing(row):
        return tf.math.reduce_sum(row, 1, keepdims=True)

    #tf.function
    def custom_loss(y_true, y_pred):
        row_M = input_train
        row_M = row_M[:, 2:5]
        sum_M = summing(row_M)
        inv_M = 1 / sum_M

        row_B = y_pred[:, :3]
        sum_B = summing(row_B)
        row_Q = tf.math.multiply(inv_M, row_B)
        sum_Q = summing(row_Q)  # presumably dropped from the pasted snippet; used in the penalty below

        alpha = 0.01
        penalty = K.mean(K.square(sum_Q - 1))
        return K.mean(K.square(y_true - y_pred)) + (1 / alpha) * penalty

    return custom_loss
I would like to understand whether what I'm doing is right. There are no errors and the training runs, but I do not know if this piece of code does what I'm trying to define, and in particular whether it works correctly on batches of data and not just on single records.
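One way to sanity-check this (a sketch, not part of the original post; all shapes below are hypothetical) is to call the wrapped loss eagerly on a small dummy batch and confirm it returns a single scalar combining the MSE and the penalty:

import numpy as np
import tensorflow as tf

input_train = tf.constant(np.random.rand(4, 6), dtype=tf.float32)  # needs at least 5 columns
y_true = tf.constant(np.random.rand(4, 5), dtype=tf.float32)       # needs at least 3 columns
y_pred = tf.constant(np.random.rand(4, 5), dtype=tf.float32)

loss_fn = custom_loss_wrapper(input_train)
print(loss_fn(y_true, y_pred))  # a single scalar tf.Tensor

Note this only exercises the case where input_train and the batch share the same first dimension (so the element-wise multiply broadcasts); whether that holds during training depends on how input_train relates to the batches Keras feeds in.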

How to calculate MobileNet FLOPs in Keras

import tensorflow as tf
import keras.backend as K
from keras.applications.mobilenet import MobileNet

run_meta = tf.RunMetadata()
with tf.Session(graph=tf.Graph()) as sess:
    K.set_session(sess)
    with tf.device('/cpu:0'):
        base_model = MobileNet(alpha=1, weights=None,
                               input_tensor=tf.placeholder('float32', shape=(1, 224, 224, 3)))

    opts = tf.profiler.ProfileOptionBuilder.float_operation()
    flops = tf.profiler.profile(sess.graph, run_meta=run_meta, cmd='op', options=opts)

    opts = tf.profiler.ProfileOptionBuilder.trainable_variables_parameter()
    params = tf.profiler.profile(sess.graph, run_meta=run_meta, cmd='op', options=opts)

    print("{:,} --- {:,}".format(flops.total_float_ops, params.total_parameters))
When I run the above code, I get the result below:
1,137,481,704 --- 4,253,864
This is different from the flops described in the paper.
mobilenet: https://arxiv.org/pdf/1704.04861.pdf
ShuffleNet: https://arxiv.org/pdf/1707.01083.pdf
How to calculate exact flops described in the paper?
tl;dr You've actually got the right answer! You are simply comparing flops with multiply accumulates (from the paper) and therefore need to divide by two.
If you're using Keras, then the code you listed is slightly over-complicating things...
Let model be any compiled Keras model. We can arrive at the flops of the model with the following code.
import tensorflow as tf
import keras.backend as K

def get_flops():
    run_meta = tf.RunMetadata()
    opts = tf.profiler.ProfileOptionBuilder.float_operation()

    # We use the Keras session graph in the call to the profiler.
    flops = tf.profiler.profile(graph=K.get_session().graph,
                                run_meta=run_meta, cmd='op', options=opts)

    return flops.total_float_ops  # Prints the "flops" of the model.

# .... Define your model here ....
# You need to have compiled your model before calling this.
print(get_flops())
However, when I looked at an example of my own (not MobileNet) on my computer, the printed total_float_ops was 2115, and I got the following results when I simply printed the flops variable:
[...]
Mul 1.06k float_ops (100.00%, 49.98%)
Add 1.06k float_ops (50.02%, 49.93%)
Sub 2 float_ops (0.09%, 0.09%)
It's pretty clear that the total_float_ops property takes into consideration multiplication, addition and subtraction.
I then looked back at the MobileNet example. Skimming the paper, I matched the default Keras implementation to the paper's table of MobileNet variants using the number of parameters.
The first model in that table matches the parameter count you have (4,253,864), and its Mult-Adds are approximately half of your flops result. So you have the correct answer; you were just mistaking flops for Mult-Adds (a.k.a. multiply-accumulates or MACs).
If you want to compute the number of MACs you simply have to divide the result from the above code by two.
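For example, using the numbers above: 1,137,481,704 / 2 ≈ 569 million MACs, which matches the roughly 569 million Mult-Adds the paper reports for the 1.0 MobileNet-224 model.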
Important Notes
Keep the following in mind if you are trying to run the code sample:
The code sample was written in 2018 and doesn't work with tensorflow version 2. See @driedler's answer for a complete example of tensorflow version 2 compatibility.
The code sample was originally meant to be run once on a compiled model. For a better example of using this in a way that does not have side effects (and can therefore be run multiple times on the same model), see @ch271828n's answer.
This is working for me in TF-2.1:
import tensorflow as tf

def get_flops(model_h5_path):
    session = tf.compat.v1.Session()
    graph = tf.compat.v1.get_default_graph()

    with graph.as_default():
        with session.as_default():
            model = tf.keras.models.load_model(model_h5_path)

            run_meta = tf.compat.v1.RunMetadata()
            opts = tf.compat.v1.profiler.ProfileOptionBuilder.float_operation()

            # Optional: save printed results to file
            # flops_log_path = os.path.join(tempfile.gettempdir(), 'tf_flops_log.txt')
            # opts['output'] = 'file:outfile={}'.format(flops_log_path)

            # We use the Keras session graph in the call to the profiler.
            flops = tf.compat.v1.profiler.profile(graph=graph,
                                                  run_meta=run_meta, cmd='op', options=opts)

    return flops.total_float_ops
The above solutions cannot be run twice, otherwise the flops will accumulate! (In other words, the second time you run it, you will get output = flops_of_1st_call + flops_of_2nd_call.) The following code calls reset_default_graph to avoid this.
def get_flops():
    session = tf.compat.v1.Session()
    graph = tf.compat.v1.get_default_graph()

    with graph.as_default():
        with session.as_default():
            model = keras.applications.mobilenet.MobileNet(
                alpha=1, weights=None,
                input_tensor=tf.compat.v1.placeholder('float32', shape=(1, 224, 224, 3)))

            run_meta = tf.compat.v1.RunMetadata()
            opts = tf.compat.v1.profiler.ProfileOptionBuilder.float_operation()

            # Optional: save printed results to file
            # flops_log_path = os.path.join(tempfile.gettempdir(), 'tf_flops_log.txt')
            # opts['output'] = 'file:outfile={}'.format(flops_log_path)

            # We use the Keras session graph in the call to the profiler.
            flops = tf.compat.v1.profiler.profile(graph=graph,
                                                  run_meta=run_meta, cmd='op', options=opts)

    tf.compat.v1.reset_default_graph()
    return flops.total_float_ops
Modified from @driedler, thanks!
You can also call model.summary() on any Keras model, but note that it reports parameter counts, not FLOPs.

Implementing WNGrad in Pytorch?

I'm trying to implement the WNGrad optimizer (technically WN-Adam, Algorithm 4 in the WNGrad paper) in PyTorch. I've never implemented an optimizer in PyTorch before, so I don't know if I've done it correctly (I started from the Adam implementation). The optimizer does not make much progress and then stalls, as I would expect (the bj values can only increase monotonically, which happens quickly, so no progress is made), but I'm guessing I have a bug. Standard optimizers (Adam, SGD) work fine on the same model I'm trying to optimize.
Does this implementation look correct?
import torch
from torch.optim import Optimizer

class WNAdam(Optimizer):
    """Implements the WNAdam algorithm.

    It has been proposed in `WNGrad: Learn the Learning Rate in Gradient Descent`_.

    Arguments:
        params (iterable): iterable of parameters to optimize or dicts defining
            parameter groups
        lr (float, optional): learning rate (default: 0.1)
        beta1 (float, optional): exponential smoothing coefficient for gradient.
            When beta1=0 this implements WNGrad.

    .. _WNGrad\: Learn the Learning Rate in Gradient Descent:
        https://arxiv.org/abs/1803.02865
    """

    def __init__(self, params, lr=0.1, beta1=0.9):
        if not 0.0 <= beta1 < 1.0:
            raise ValueError("Invalid beta1 parameter: {}".format(beta1))
        defaults = dict(lr=lr, beta1=beta1)
        super().__init__(params, defaults)

    def step(self, closure=None):
        """Performs a single optimization step.

        Arguments:
            closure (callable, optional): A closure that reevaluates the model
                and returns the loss.
        """
        loss = None
        if closure is not None:
            loss = closure()

        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                grad = p.grad.data
                state = self.state[p]

                # State initialization
                if len(state) == 0:
                    state['step'] = 0
                    # Exponential moving average of gradient values
                    state['exp_avg'] = torch.zeros_like(p.data)
                    # Learning rate adjustment
                    state['bj'] = 1.0

                exp_avg = state['exp_avg']
                beta1 = group['beta1']
                state['step'] += 1
                state['bj'] += (group['lr'] ** 2) / state['bj'] * grad.pow(2).sum()

                # update exponential moving average
                exp_avg.mul_(beta1).add_(1 - beta1, grad)

                bias_correction = 1 - beta1 ** state['step']
                p.data.sub_(group['lr'] / state['bj'] / bias_correction, exp_avg)

        return loss
The paper's author has an open-sourced implementation on GitHub.
The WNGrad paper
states that it is inspired by batch (and weight) normalization. You should use the L2 norm with respect to the weight dimensions (don't sum over everything) as shown in that algorithm.

Computing cosine_proximity loss between two outputs of the network

I'm using Keras 2.0.2 Functional API (Tensorflow 1.0.1) to implement a network that takes several inputs and produces two outputs a and b. I need to train the network using the cosine_proximity loss, such that b is the label for a. How do I do this?
Sharing my code here. The last line model.fit(..) is the problematic part because I don't have labeled data per se. The label is produced by the model itself.
from keras.models import Model
from keras.layers import Input, LSTM
from keras import losses
shared_lstm = LSTM(dim)
q1 = Input(shape=(..,.. ), name='q1')
q2 = Input(shape=(..,.. ), name='q2')
a = shared_lstm(q1)
b = shared_lstm(q2)
model = Model(inputs=[q1,q2], outputs=[a, b])
model.compile(optimizer='adam', loss=losses.cosine_proximity)
model.fit([testq1, testq2], [?????])
You can define a fake true label first. For example, define it as a 1-D array of ones of the size of your input data.
Now comes the loss function. You can write it as follows.
import keras.backend as K

def my_cosine_proximity(y_true, y_pred):
    a = y_pred[0]
    b = y_pred[1]

    # depends on whether you want to normalize
    a = K.l2_normalize(a, axis=-1)
    b = K.l2_normalize(b, axis=-1)

    return -K.mean(a * b, axis=-1) + 0 * y_true
I have multiplied y_true by zero and added it just so that Theano does not give a missing-input warning/error.
You should call your fit function normally i.e. by including your fake ground-truth labels.
model.compile('adam', my_cosine_proximity) # 'adam' used as an example optimizer
model.fit([testq1, testq2], fake_y_true)