Tensorflow: tf.reduce_logsumexp returning negative values

I am new to the TensorFlow framework. I am using
tf.reduce_logsumexp in my code, but inspecting the output I see that some of the values are negative. How is that possible? I suspected it might be due to some NaN or inf values, so I put in a check to remove those values from my input like this (X is my input):
res = tf.where(tf.is_inf(X), tf.zeros_like(X), X)
res = tf.where(tf.is_nan(res), tf.zeros_like(res), res)
output = tf.reduce_logsumexp(res, axis=0)
But even this does not help and I still get some negative values. Any help appreciated! Thanks

Note that the logarithm is negative if its argument is smaller than 1. Therefore, you will always get a negative logsumexp output if the sum of the exponentials is smaller than 1. This occurs, for example, if all of the exponents are well below zero, i.e. you have
res = [-2.5, -1.4, -3.3, -1.65, -2.15], then the corresponding exponentials are
exp(res) = [0.082, 0.247, 0.037, 0.192, 0.116], and their sum
sum(exp(res)) = 0.674, which is smaller than 1.
If you then take the logarithm, you get
log(sum(exp(res))) = log(0.674) = -0.394.
It simply follows from the definition of the function that its output is not necessarily positive. You can test it with the following program:
import numpy as np

def logsumexp(arr):
    summ = 0.0
    for i in range(arr.shape[0]):
        print(np.exp(arr[i]))
        summ += np.exp(arr[i])
    print('Sum: {}'.format(summ))
    return np.log(np.sum(np.exp(arr)))

arr = np.asarray([-2.5, -1.4, -3.3, -1.65, -2.15])
print('LogSumExp: {}'.format(logsumexp(arr)))
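If you want to double-check this against TensorFlow itself, a minimal sketch (assuming TensorFlow 2.x, where tensors evaluate eagerly) is:

import numpy as np
import tensorflow as tf

arr = np.asarray([-2.5, -1.4, -3.3, -1.65, -2.15], dtype=np.float32)
# ~ -0.394, matching np.log(np.sum(np.exp(arr))) from the manual computation above
print(tf.reduce_logsumexp(arr).numpy())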

Related

Use of function / return

I had the task to code the following:
Take a list of integers and return the sum of those numbers, but only the odd ones.
Example input: [1,5,3,2]
Output: 9
I did the code below and it worked perfectly.
numbers = [1, 5, 3, 2]
print(numbers)
add_up_the_odds = []
for number in numbers:
    if number % 2 == 1:
        add_up_the_odds.append(number)
print(add_up_the_odds)
print(sum(add_up_the_odds))
Then I tried to re-code it using function definition / return:
def add_up_the_odds(numbers):
    odds = []
    for number in range(1, len(numbers)):
        if number % 2 == 1:
            odds.append(number)
    return odds

numbers = [1, 5, 3, 2]
print(sum(odds))
But I couldn’t make it work; can anybody help with that?
Note: I'm going to assume Python 3.x
It looks like you're defining your function, but never calling it.
When the interpreter finishes going through your function definition, the function is now there for you to use - but it never actually executes until you tell it to.
Between the last two lines in your code, you need to call add_up_the_odds() on your numbers array, and assign the result to the odds variable.
i.e. odds = add_up_the_odds(numbers)
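As a side note, the loop in your function iterates over range(1, len(numbers)), i.e. over the indices 1..3 rather than over the list's values, so even with the call added you would get the wrong sum. A corrected sketch with both fixes:

def add_up_the_odds(numbers):
    odds = []
    for number in numbers:  # iterate over the values, not range(1, len(numbers))
        if number % 2 == 1:
            odds.append(number)
    return odds

numbers = [1, 5, 3, 2]
odds = add_up_the_odds(numbers)  # call the function and keep its result
print(sum(odds))  # 9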

Sequence to Sequence Loss

I'm trying to figure out how sequence to sequence loss is calculated. I am using the huggingface transformers library in this case, but this might actually be relevant to other DL libraries.
So to get the required data we can do:
from transformers import EncoderDecoderModel, BertTokenizer
import torch
import torch.nn.functional as F
torch.manual_seed(42)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
MAX_LEN = 128
tokenize = lambda x: tokenizer(x, max_length=MAX_LEN, truncation=True, padding=True, return_tensors="pt")
model = EncoderDecoderModel.from_encoder_decoder_pretrained('bert-base-uncased', 'bert-base-uncased') # initialize Bert2Bert from pre-trained checkpoints
input_seq = ["Hello, my dog is cute", "my cat cute"]
output_seq = ["Yes it is", "ok"]
input_tokens = tokenize(input_seq)
output_tokens = tokenize(output_seq)
outputs = model(
    input_ids=input_tokens["input_ids"],
    attention_mask=input_tokens["attention_mask"],
    decoder_input_ids=output_tokens["input_ids"],
    decoder_attention_mask=output_tokens["attention_mask"],
    labels=output_tokens["input_ids"],
    return_dict=True)
idx = output_tokens["input_ids"]
logits = F.log_softmax(outputs["logits"], dim=-1)
mask = output_tokens["attention_mask"]
Edit 1
Thanks to @cronoik I was able to replicate the loss calculated by huggingface as being:
output_logits = logits[:,:-1,:]
output_mask = mask[:,:-1]
label_tokens = output_tokens["input_ids"][:, 1:].unsqueeze(-1)
select_logits = torch.gather(output_logits, -1, label_tokens).squeeze()
huggingface_loss = -select_logits.mean()
However, since the last two tokens of the second sequence are just padding, shouldn't we calculate the loss as:
seq_loss = (select_logits * output_mask).sum(dim=-1, keepdims=True) / output_mask.sum(dim=-1, keepdims=True)
seq_loss = -seq_loss.mean()
This takes into account the length of each output sequence in the batch and masks out the padding. I think this is especially useful when we have batches of varying-length outputs.
OK, I found out where I was making the mistakes, all thanks to this thread in the HuggingFace forum.
The output labels need to be -100 at the masked positions. The transformers library does not do it for you.
One silly mistake I made was with the mask. It should have been output_mask = mask[:, 1:] instead of :-1.
1. Using Model
We need to set the masked positions of the labels to -100. It is important to use clone, as shown below:
labels = output_tokens["input_ids"].clone()
labels[output_tokens["attention_mask"]==0] = -100
outputs = model(
    input_ids=input_tokens["input_ids"],
    attention_mask=input_tokens["attention_mask"],
    decoder_input_ids=output_tokens["input_ids"],
    decoder_attention_mask=output_tokens["attention_mask"],
    labels=labels,
    return_dict=True)
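For context: PyTorch's nn.CrossEntropyLoss skips every position whose target equals its ignore_index argument, which defaults to -100. That is why setting the padded label positions to -100 above removes them from the loss. A small self-contained illustration, with toy shapes chosen just for the example:

import torch
import torch.nn as nn

loss_fct = nn.CrossEntropyLoss()  # ignore_index=-100 by default
logits = torch.randn(2, 5, 10)    # (batch, seq_len, vocab), toy values
labels = torch.randint(0, 10, (2, 5))
labels[1, 3:] = -100              # pretend the last two tokens of row 2 are padding
# padded positions contribute nothing to the mean
loss = loss_fct(logits.view(-1, 10), labels.view(-1))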
2. Calculating Loss
So the final way to replicate it is as follows:
idx = output_tokens["input_ids"]
logits = F.log_softmax(outputs["logits"], dim=-1)
mask = output_tokens["attention_mask"]
# shift things
output_logits = logits[:,:-1,:]
label_tokens = idx[:, 1:].unsqueeze(-1)
output_mask = mask[:,1:]
# gather the logits and mask
select_logits = torch.gather(output_logits, -1, label_tokens).squeeze()
-select_logits[output_mask==1].mean(), outputs["loss"]
The above, however, ignores the fact that the tokens come from two different sequences. So an alternative way of calculating the loss could be:
seq_loss = (select_logits * output_mask).sum(dim=-1, keepdims=True) / output_mask.sum(dim=-1, keepdims=True)
-seq_loss.mean()
Thanks for sharing. However, the new version of transformers as of today actually does not "shift" anymore, so the following is not needed:
#shift things
output_logits = logits[:,:-1,:]
label_tokens = idx[:, 1:].unsqueeze(-1)
output_mask = mask[:,1:]
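If your installed version indeed no longer shifts, the unshifted gather should line up with the reported loss. A sketch to verify against your own version of transformers, reusing idx, logits, and mask as defined above:

# gather the log-probability of each label token without shifting
select_logits = torch.gather(logits, -1, idx.unsqueeze(-1)).squeeze(-1)
manual_loss = -select_logits[mask == 1].mean()
print(manual_loss, outputs["loss"])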

Using 2 different outputs of 'return' of a function in separate elements of a plot

I am drawing a plot of voltage over time. For the voltage values, I want the values to be passed through a 'scaling' function which converts them from volts to kilovolts if the largest element is greater than 1000 volts (e.g. 11000 volts to 11 KILOvolts).
This function is supposed to return 2 separate outputs: one for the (new) voltage values and one for the unit. The values are fed into the y-axis of the plot and the unit is given to that axis's label line. For example:
import numpy as np
import matplotlib.pyplot as plt

time = np.array([0, 1, 2, 3])
system_voltage1 = np.array([110, 120, 130, 150])
system_voltage2 = np.array([11000, 12000, 13000, 15000])

def scaling_function(input):
    if np.amax(input) < 1000:
        output = input/1
        Voltage_label = 'Voltage in Volts'
    if np.amax(input) > 1000:
        output = input/1000
        Voltage_label = 'Voltage in KILOVolts'
    return (output, Voltage_label)

fig14 = plt.figure(figsize=(16, 9))
ax1 = fig14.add_subplot(111)
l1, = ax1.plot(time, scaling_function(system_voltage), color='r')
ax1.set_xlabel("time in second", color='k')
ax1.set_ylabel(Voltage_label, color='k')
Now, I am having trouble calling this function properly. I need the plot to receive only the values output from scaling_function(system_voltage), and Voltage_label to go to ax1.set_ylabel(Voltage_label, color='k'). Now:
A) My problem: I don't know how to write the code so that only the first output is used for the plotted values, and the second output for the axis label.
B) Something I tried but didn't work: Voltage_label is not recognized outside scaling_function, as it is local to the function. I mean, I cannot access Voltage_label because its value is not globally assigned.
Can anyone help me with this?
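The function returns a tuple, so unpack its two outputs into separate variables and use each where you need it: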
y,l = scaling_function(system_voltage)
l1, = ax1.plot(time, y, color='r')
ax1.set_xlabel("time in second", color='k')
ax1.set_ylabel(l, color='k')
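If you prefer not to unpack, you can also index the returned tuple directly, e.g. scaling_function(system_voltage1)[0] for the values and scaling_function(system_voltage1)[1] for the label, although unpacking once into y and l avoids calling the function twice.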

Mean_squared_error output in function includes dtype and '0'

I want to calculate test statistics for a fbprophet forecast in a function, because I want to average the test stats over different forecasts and cutoff points after using fbprophet's cross_validation to get df_cv. I created a function that I apply to the dataframe after grouping by the cutoff points, in order to get one measure per cutoff point. Then I calculate the mean over all these values.
The problem is that my function returns not only the value I am looking for, but also a 0 and the dtype information. I can still do calculations with the returned value, but when I want to plot it later it is very inconvenient. How can I strip these unnecessary values from the output?
def compute_avg_stats(df_cv, perf_measure):
    measures = {'mse': mean_squared_error,
                'mae': mean_absolute_error,
                'mape': mean_absolute_percentage_error,
                'rmse': mean_squared_error}
    if perf_measure == 'rmse':
        measure = np.sqrt(measures[perf_measure](y_true=df_cv['y'], y_pred=df_cv['yhat']))
    else:
        measure = measures[perf_measure](y_true=df_cv['y'], y_pred=df_cv['yhat'])
    return measure

df_cv.groupby('cutoff').apply(compute_avg_stats, perf_measure='rmse').to_frame().mean()
I think .mean() returns a one-element Series here, and the 0 and dtype you see are just the Series repr. Try .mean()[0] to get the bare scalar.
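As a sketch of the alternatives (df_cv and compute_avg_stats as defined above): dropping the .to_frame() lets .mean() collapse the Series straight to a float, or you can pull the scalar out of the one-element result explicitly:

per_cutoff = df_cv.groupby('cutoff').apply(compute_avg_stats, perf_measure='rmse')
avg = per_cutoff.mean()                     # Series.mean() already returns a plain float
# or, keeping the original chain:
avg = per_cutoff.to_frame().mean().iloc[0]  # .iloc[0] extracts the bare scalar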

Matlab fminsearch options/restrictions

I have this function in MATLAB which is supposed to find the smallest value possible for minValuePossible by varying the two initial values in inValues. How can I make fminsearch NOT try negative values while searching for the minimum? Also, how can I set the number of different variations fminsearch performs while searching? Currently it tries somewhere around 20 different combinations of the two inValues and then stops. Maybe by defining the amount by which it changes each value? How would I do that?
function Valueminimiser
    inValues = [50,50];
    minValuePossible = fminsearch(@minimiser, inValues);

    function result = minimiser(inValues)
        x = inValues(1);
        y = inValues(2);
        RunMode = 2;
        ValueOne = x;
        ValueTwo = y;
        [maxSCRAout] = main(RunMode, ValueOne, ValueTwo);
        result = minValuePossible;
    end
end
How can I set the fmin search function to NOT try negative values while trying to find the minimum?
Add the constraints on the values of your minimiser function at its beginning. If the constraints are violated, return a huge function value from minimiser. This will prevent fminsearch from considering numbers that are not in your interest:
function result = minimiser(inValues)
    if any(inValues < 0)    % check if there is any negative number in the input variable
        result = hugeValue; % give a big value to the result
        return;             % return to fminsearch - do not execute the rest of the code
    end
    x = inValues(1);
    y = inValues(2);
    RunMode = 2;
    ValueOne = x;
    ValueTwo = y;
    [maxSCRAout] = main(RunMode, ValueOne, ValueTwo);
    result = minValuePossible;
Also how can I set the number of different variations the fminsearch function performs while trying to find the minimum?
You can define options for fminsearch using the optimset function. The optimset parameter 'MaxFunEvals' is the maximum number of function evaluations -- note that this counts even the evaluations you constrained away, so setting 'TolX' as advised by @slayton might be better if you are concerned about accuracy.
options = optimset('MaxFunEvals', numberOfVariations);
minValuePossible = fminsearch(@minimiser, inValues, options);
The docs for fminsearch don't describe a way to restrict the domain of the function you want to minimize.
If you want to restrict the search to non-negative numbers, you can simply wrap your function in a call to abs:
minValuePossible = fminsearch( @(x)(minimiser( abs(x) ) ), inValues);
If you are worried about it constantly converging to the same minimum, then try a variety of different initial values.
Lastly, you can alter the termination tolerances for X and for the function value using the TolX and TolFun options. Note that fminsearch takes these through an options structure built with optimset, not as bare parameter/value arguments:
options = optimset('TolX', x_tolerance, 'TolFun', f_tolerance);
fminsearch( @(x)(minimiser(abs(x))), inValues, options);