Difference between torch.unsqueeze and target.unsqueeze - deep-learning

I am training a simple MLP by computing the MSE and get the following warning:
UserWarning: Using a target size (torch.Size([1])) that is different to the input size (torch.Size([1, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
The following gives me the right solution:
target = target.unsqueeze(1)
while torch.unsqueeze(target, 1) does not. The former solution is from a previous question and the latter comes from the documentation.
Why does the former fix the UserWarning while the latter doesn't?

torch.unsqueeze returns a new tensor with a dimension of size one inserted at the specified position. That is, it is not an in-place operation, so you need to assign its output to something, i.e. simply do:
target = torch.unsqueeze(target, 1)
Otherwise the tensor will remain the same, as you did not store the changes back into it!
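A minimal sketch of the difference, with shapes chosen to match the warning above:

import torch

target = torch.zeros(1)       # shape [1], like the target in the warning
torch.unsqueeze(target, 1)    # returns a new [1, 1] tensor, but the result is discarded
print(target.shape)           # still torch.Size([1])

target = target.unsqueeze(1)  # assign the result back
print(target.shape)           # torch.Size([1, 1]), matching the input size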

Related

How do I read variable length 1D inputs in Tensorflow?

I'm trying to read variable length 1-D inputs into a Tensorflow CNN.
I have previously implemented reading fixed length inputs by first constructing a CSV file (where the first column is the label and the remaining columns are the input values - flattened spectrogram data all padded/truncated to the same length) using tf.TextLineReader().
This time I have a directory full of files each one containing a line of data I want to use as input (flattened spectrogram data again but I do not want to force them to the same dimensions), and the line lengths are not fixed. I'm getting an error trying to use the previous approach of compiling a CSV first. I looked into the documentation of tf.TextLineReader() and it specifies that all CSV rows must be the same shape, so I am stuck! Any help would be much appreciated, thanks :)
I'm assuming that the data isn't changing shape when you have a longer or shorter sample, right? By that I mean: say you trained your network on arrays of 1000 pixels with a kernel of size [5,1]. That [5,1] kernel needs to see the same patterns in the variable-length data as it did in the training data. If your data is stretched or shrunk, then the correct solution is to interpolate the data to the same size as the training data so the shapes/patterns match.
Assuming you just want variable length inputs, then in theory you should be able to do this by setting your batch size to 1 and varying the 1st dimension of the data.
So your input placeholder would look like:
X = tf.placeholder(dtype, shape=[1,None,1,1])
The 4 shape arguments are: 1 = batch size; None = the variable-length dimension; 1 = second spatial dimension, unused because it's a 1D dataset; 1 = one channel, again unused but necessary for tf.nn.conv2d to receive the expected 4D input.
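A minimal sketch of that setup, assuming the TF 1.x graph-mode API; the kernel shape follows the [5,1] example above, and the lengths fed in are arbitrary:

import numpy as np
import tensorflow as tf  # assumes the TF 1.x graph-mode API

X = tf.placeholder(tf.float32, shape=[1, None, 1, 1])
kernel = tf.Variable(tf.random_normal([5, 1, 1, 1]))  # [5,1] kernel, 1 in / 1 out channel
conv = tf.nn.conv2d(X, kernel, strides=[1, 1, 1, 1], padding='SAME')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for length in (800, 1000, 1300):  # a different length on every run
        x = np.random.randn(1, length, 1, 1).astype(np.float32)
        print(sess.run(conv, feed_dict={X: x}).shape)  # (1, length, 1, 1)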
This is not very different from configuring tensorflow to support variable batch sizes. So you should review this link below and understand that process.
get the size of a variable batch dimension
Note that you can't use a batch size more than 1 here because you wouldn't be able to construct a matrix with missing values in the 2nd dimension. I expect the convolution operations to work with this variable dimension (though I haven't actually tried this).
Another option to deal with this problem would be to pad your inputs with 0's so they all have a common length, but that will need to have been trained into the model up front.
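For the padding option, a simple NumPy sketch; the common length of 1000 is an arbitrary assumption:

import numpy as np

def pad_to_length(x, target_len=1000):
    # right-pad with zeros to a fixed length; truncate if longer
    x = np.asarray(x, dtype=np.float32)[:target_len]
    out = np.zeros(target_len, dtype=np.float32)
    out[:x.shape[0]] = x
    return out

rows = [np.random.randn(n) for n in (800, 1000, 1300)]  # variable-length samples
batch = np.stack([pad_to_length(r) for r in rows])      # shape (3, 1000)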

What does caffe do with the mean-binary file ?

In the Caffe input layer one can define a mean image that holds the mean values of all the images used. From the ImageNet example: "The model requires us to subtract the image mean from each image, so we have to compute the mean".
My question is: what is the implementation of this subtraction? Is it simply:
used_image = original_image - mean_image
or
used_image = mean_image - original_image
or
used_image = |original_image - mean_image|^2
If it is one of the first two, then how are negative pixels handled? Since the pictures are usually stored in uint8, the subtraction would simply wrap around, e.g.
200 - 255 = 201 (uint8 wrap-around)
Why do I need to know this? I ran tests, and I know that the second or the third variant would work better.
It's the first one, a trivial normalization step. Using the second instead wouldn't really matter: the weights would invert.
There are no "negative pixels", per se: the data is converted to floating point before the subtraction, so nothing wraps around; it is simply numeric input to the matrix operations. You are welcome to interpret this as a visual alteration of some sort, but the arithmetic doesn't care.
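A minimal NumPy sketch of the first variant; note the cast to float before subtracting, which is what makes the uint8 wrap-around in the question a non-issue:

import numpy as np

original_image = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
mean_image = np.full((256, 256, 3), 120, dtype=np.uint8)

# cast to float first; subtracting directly in uint8 would wrap around
used_image = original_image.astype(np.float32) - mean_image.astype(np.float32)
print(used_image.min(), used_image.max())  # negative values are perfectly fine here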

What is the format of the Qpdeltamap used for ROI in NVENC?

I am trying to get started with ROI encoding with the Nvidia Encoder NVENC.
As a first step I am trying to get the Nvidia demos to encode using ROI. I know that the switch -qpDeltaMapFile enables the flag enableExtQPDeltaMap. This allows me to send a file with a qp map that the encoder uses to tweak the values obtained by the rate control algorithm.
However, there is absolutely no documentation on the format of this file. I tried one value per byte, assuming fixed-size 16x16 macroblocks. It doesn't seem to work as I would expect.
Any guidance or references would help a lot.
There was a bug in my code. It actually works almost as I described.
Assume your frame is divided equally into 16x16 blocks; each value in the file is then added to the QP that the rate-control algorithm chose for the corresponding block. Each value is a signed integer, so a negative value improves quality while a positive value decreases it. A value of 0 keeps whatever the rate-control algorithm decided.
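A small Python sketch of writing such a file under the assumptions above (one signed byte per 16x16 block); rounding partial blocks up and row-major ordering are my assumptions, not documented facts:

import struct

def write_qp_delta_map(path, width, height, delta=-5):
    # one signed byte per 16x16 block; partial blocks assumed to round up
    blocks = ((width + 15) // 16) * ((height + 15) // 16)
    with open(path, "wb") as f:
        f.write(struct.pack("%db" % blocks, *([delta] * blocks)))  # 'b' = signed int8

write_qp_delta_map("qp_delta.bin", 1920, 1080, delta=-5)  # negative = higher quality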

FFT: fitting binned data

I want to fit a curve to data obtained from an FFT. While working on this, I remembered that an FFT gives binned data, and therefore I wondered if I should treat this differently with curve-fitting.
If the bins are narrow compared to the structure, I think it should not be necessary to treat the data differently, but for me that is not the case.
I expect the right way to fit binned data is to minimize not the difference between each bin's value and the fit, but the difference between each bin's area and the area under the fitted curve over that bin's range, so that the energy in each bin matches the energy the curve assigns to that range.
So my question is: am I thinking correctly about this? If not, how should I go about it?
Also, when looking around for information about this subject, I encountered the "Maximum log likelihood" for example, but did not find enough information about it to understand if and how it applied to my situation.
PS: I have no clue if this is the right site for this question, please let me know if there is a better place.
For an unwindowed FFT, the correct interpolation between bins is by using a Sinc (sin(x)/x) or periodic Sinc (Dirichlet) interpolation kernel. For an FFT of samples of a band-limited signal, this will reconstruct the continuous spectrum.
A very simple and effective way of interpolating the spectrum (from an FFT) is to use zero-padding. It works both with and without windowing prior to the FFT.
Take your input vector of length N and extend it to length M*N, where M is an integer
Set all values beyond the original N values to zeros
Perform an FFT of length (N*M)
Calculate the magnitude of the output bins
What you get is the interpolated spectrum.
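A quick NumPy sketch of those steps; N and M are chosen arbitrarily:

import numpy as np

N, M = 256, 4
x = np.sin(2 * np.pi * 10.3 * np.arange(N) / N)      # a tone between bin centers

coarse = np.abs(np.fft.rfft(x))                      # plain spectrum, N//2 + 1 bins
padded = np.concatenate([x, np.zeros((M - 1) * N)])  # steps 1-2: extend to M*N with zeros
interp = np.abs(np.fft.rfft(padded))                 # steps 3-4: FFT and magnitude

print(coarse.argmax(), interp.argmax() / M)          # peak location, in original-bin units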
This can be done by using maximum log likelihood estimation. This is a method that finds the set of parameters that is most likely to have yielded the measured data - the technique originates in statistics.
I have finally found an understandable source for how to apply this to binned data. Sadly I cannot enter formulas here, so I refer to that source for a full explanation: slide 4 of this slide show.
EDIT:
For noisier signals this method did not seem to work very well. A method that was a bit more robust is a least-squares fit, where the difference between the areas is minimized, as suggested in the question.
I have not found any literature to defend this method, but it is similar to what happens in the maximum log likelihood estimation, and yields very similar results for noiseless test cases.
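A sketch of that area-based least-squares fit with SciPy; the Gaussian model, the bin layout, and the stand-in "measured" bin values are illustrative assumptions:

import numpy as np
from scipy.optimize import curve_fit
from scipy.integrate import quad

def model(f, a, f0, w):                      # illustrative line-shape model
    return a * np.exp(-0.5 * ((f - f0) / w) ** 2)

def binned_model(edges):
    # returns a fit function giving the model's area over each bin
    def f(_, a, f0, w):
        return np.array([quad(model, lo, hi, args=(a, f0, w))[0]
                         for lo, hi in zip(edges[:-1], edges[1:])])
    return f

edges = np.linspace(0.0, 10.0, 21)           # 20 bins
centers = 0.5 * (edges[:-1] + edges[1:])
bin_areas = model(centers, 2.0, 5.1, 0.7) * np.diff(edges)  # stand-in for FFT bin energies

popt, _ = curve_fit(binned_model(edges), centers, bin_areas, p0=[1.0, 5.0, 1.0])
print(popt)                                  # recovered (a, f0, w)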

Possible to call subfunction in S-function level-2

I have been trying to convert my level-1 S-function to level-2, but I got stuck on calling another subfunction from function Outputs(block). I have tried looking for other threads, but to no avail. Would you mind providing related links?
My output depends on a lot of processing of the inputs; this is why I need to call the subfunction to calculate and then return the output values. All the examples I can find calculate their outputs directly in function Outputs(block), so I thought it was not possible in my case.
I then tried to use the Interpreted MATLAB Function block, but failed because the output dimension is not the same as the input dimension; it also does not support returning more than one output.
I read in the S-function documentation that "S-function level-1 supports vector inputs and outputs. DOES NOT support multiple input and output ports".
Does the second sentence mean the input and output dimensions must be the same?
I have been using S-function level-1 to do the following:
[a1, b1] = choose_cells(c, d);
where a1 and b1 are outputs, and c and d are inputs. All the variables hold a single value, except d, which is an array of 6 values.
Referring to the attached image, in an S-function block the input dimension must be the same as the output dimension. In this case the input dimension is 7 while the output dimension is 2, so I have to include the "Terminator" blocks in the diagram for it to work; otherwise I get an error.
My problem is that when the system gets bigger, the array d could contain hundreds of values, which would mean adding hundreds of "Terminator" blocks to make this work. That is definitely not practical.
Could you please suggest a better way to implement this?
Thanks in advance.
http://imgur.com/ib6BTTp