Vanishing/Exploding gradients in deeplearning4j

Vanishing/Exploding gradients in deeplearning4j - deep-learning

How to check if we're having a vanishing/exploding gradient in deeplearning4j, more specifically for recurrent neural networks? I mean, what parameters to look for and what methods should we call to get the value of such parameters?

As suggested above you should take a look at the GUI, introduction here.
DL4J GUI: Overview Tab -> Update:Parameter Ratios
The ratio of updates to parameters is specifically the ratio of mean
magnitudes of these values (i.e.,
log10(mean(abs(updates))/mean(abs(parameters)))
So, significantly high or low values may suggest exploding/vanishing gradients.
Programmatically
At the end of each iteration the gradients are stored in the gradient field of both ComputationalGraph and MultiLayerNetwork. It can be accessed via public gradient() method (this method does not change state, it is a simple getter), so you can analyze the gradients in your code.
Here's a small code code snippet that outputs gradients' min, average, max per variable, as well as the log10 (magnitude) of the min value:
StringBuilder gradSummary = new StringBuilder("--- Gradients ---\n");
net.gradient().gradientForVariable().forEach((var, grad) -> {
Number min = grad.aminNumber();
Number max = grad.amaxNumber();
Number mean = grad.ameanNumber();
int order = (int) Math.log10(min.doubleValue());
gradSummary.append(var).append(": ")
.append(min).append(",")
.append(mean).append(",")
.append(max).append(",")
.append("magnitude: ").append(order).append('\n');
});
gradSummary.append("-----------------");
log.info(gradSummary.toString());
It produces an output like the following (notice variables are named based on the layer names):
2019-01-05 15:26:12 INFO --- Gradients ---
lstm-1_W: 4.1305625586574024E-11,2.102349571941886E-5,5.235217977315187E-4, magnitude: -10
lstm-1_RW: 6.30961949354969E-11,1.7203132301801816E-5,1.335109118372202E-4, magnitude: -10
lstm-1_b: 2.9782620813989524E-10,3.226526814614772E-6,3.882131932186894E-5, magnitude: -9
lstm-2_W: 2.340811988688074E-10,2.496814886399079E-5,7.095998153090477E-4, magnitude: -9
lstm-2_RW: 8.640199666842818E-11,4.6048542571952567E-5,0.0015051497612148523, magnitude: -10
lstm-2_b: 6.85293555235944E-9,3.012867455254309E-5,4.262796137481928E-4, magnitude: -8
lstm-3_W: 1.141415850725025E-10,5.7301283959532157E-5,0.0024848710745573044, magnitude: -9
lstm-3_RW: 2.446540747769177E-10,3.4060700272675604E-5,0.002297096885740757, magnitude: -9
lstm-3_b: 1.5003001507807312E-8,2.131067230948247E-5,2.356997865717858E-4, magnitude: -7
norm-1_gamma: 4.6524661456714966E-8,2.8755117455148138E-5,1.543344114907086E-4, magnitude: -7
norm-1_beta: 5.754080234510184E-7,1.0409040987724438E-4,3.460813604760915E-4, magnitude: -6
norm-1_mean: 8.82148754044465E-7,0.0033756729681044817,0.048742543905973434, magnitude: -6
norm-1_var: 3.0532873451782905E-10,2.6078732844325714E-6,1.6723810404073447E-4, magnitude: -9
dense-1_W: 3.8744474295526743E-10,5.491946285474114E-5,6.59565266687423E-4, magnitude: -9
dense-1_b: 4.4111070565122645E-6,1.4454024494625628E-4,4.0868428186513484E-4, magnitude: -5
norm-2_gamma: 2.477656607879908E-6,9.73446512944065E-5,2.708708052523434E-4, magnitude: -5
norm-2_beta: 3.106115855189273E-6,4.934889730066061E-4,0.0012065295595675707, magnitude: -5
norm-2_mean: 2.7818930902867578E-5,0.004300051834434271,0.01411475520581007, magnitude: -4
norm-2_var: 1.806318869057577E-5,0.007471780758351088,0.020012110471725464, magnitude: -4
output_W: 7.830021786503494E-8,1.4970696065574884E-4,4.896917380392551E-4, magnitude: -7
output_b: 3.1583107193000615E-4,6.765704602003098E-4,0.0011031415779143572, magnitude: -3
-----------------
You can even wrap this code around iteration listener, and output this once per N iterations to help babysit your training process.

Related

two's complement binary VHDL

I have to transfer a binary number from A to -A in VHDL. I have some questions about two's complement notation.
For instance if I have A(8 bit) = 5, in binary 0000 0101. From online sources I realized that to translate it into two's complement negative form, I need to invert all bits adding 1 at the end:
0000 0101 --> 1111 1010 + 0000 0001 = 1111 1011 and this represents -A=-5;
My doubts now is about this final binary form that can represent -5 and 251, how can I recognize if it is -5 or 251?
By the way, this method is not so simply to be described in VHDL. Do you know if there is any simpler method?

Intuitively my reasoning would be: you are using two's complement, which is a signed number representation, therefore negative values exist. If negative values exist, you need a sign bit. That will leave 7-bits for the magnitude: hence you can only represent a value between -128 and 127. Value 251 is not within this range: It cannot be represented using 8-bits two's complement notation. Thus only -5 is valid.
The easiest way to realize two's complement sign inversion in VHDL is using the numeric_bit package.
library ieee;
entity bit_inv is
generic(width : positive);
port(
A : in bit_vector(width-1 downto 0);
A_inv : out bit_vector(width-1 downto 0));
end entity;
architecture rtl of bit_inv is
use ieee.numeric_bit.all;
begin
A_inv <= bit_vector(-signed(A));
end architecture;
entity bit_inv_tb is end entity;
library ieee;
architecture beh of bit_inv_tb is
use ieee.numeric_bit.all;
constant width : positive := 8;
signal A, A_inv : bit_vector(width-1 downto 0);
begin
DUT : entity work.bit_inv
generic map(width => width)
port map(A=>A, A_inv =>A_inv);
test: process begin
A <= bit_vector(to_signed(5,width));
wait for 1 ns;
assert to_integer(signed(A_inv)) = -5 report "A_inv is not equal to -5" severity failure;
wait;
end process;
end architecture;

what is the result inside fft bin at different index?

For example if I have 10 samples in the FFT in a system sampled at 100 Hz. Then how are the results represented in FFT bin output ?
The frequency resolution is going to be how many Hz each DFT bin represents. This is, as you have noted, given by fs/N. In this case :--
resolution = 100/10 = 10 hz
Definition of FFT states that frequency range [0,fs] is represented by N point.
But when the result comes :--
X[0] is the constant term (also refereed to as DC bias)
X[1] is the 10 hz
X[2] is the 20 hz
X[3] is the 30 hz
X[4] is the 40 hz
X[5] is the 50 hz
X[6] to X[9] are the complex conjugates of X[4] to X[1] respectively.
so my question is :--
X[6] is the 60hz signal & its value will be complex conjugate of 40 hz signal .. is it right ?
Or X[6] does not contains spectral of 60hz but it is only complex conjugate of X[4] ?
Please suggest.

The frequencies of each bin in the DFT go from $n = 0 to N-1$ where each frequency is given as $n F_s/N$.
Due to the sampling process, for any signals (either real and complex signals) the upper half of this frequency spectrum that extends from $F_s/2$ to 1 sample less than $F_s$ is equivalent to the negative half spectrum (the spectrum that extends from $-F_s/2% to 1 sample less than 0.). This applies to the DFT and all digital signals.
This is demonstrated in the graphics below, showing how an analog spectrum (top line), convolves with the sampling spectrum (second line) of a system sampled at 20 Hz to result in the digital spectrum of a sampled signal. Since the spectrum is unique over a 20 Hz range (and repeats everywhere else), we only need to show the frequencies over any 20 Hz range to represent the signal. This can be identically shown from -10 Hz to +10 Hz, or 0 to 20 Hz.
Representing the sampled spectrum of a real signal sampled at 20 Hz over the range of -10Hz to +10Hz:
The same signal can also be represented over the range of 0 to 20 Hz:
Representing the sampled spectrum of a complex signal sampled at 20 Hz over the range of -10Hz to +10Hz:
The same signal can also be represented over the range of 0 to 20 Hz:
Since the spectrum for the DFT is discrete, the samples go from $n = 0 to N-1$ where each frequency is given as $F_s/n$ as described above. There is also only N samples in the DFT of course but as you rotate the samples in the DFT, you effectively move through the expanded spectrum above. It is helpful to view the spectrum on the surface of a cylinder with a circumference that goes from 0 to $F_s$ where you see that $F_s$ is equivalent to 0, and going backwards from 0 is equivalent to going into the negative half spectrum.
So in your example specifically:
X[0] = DC
X1 = 10
X2 = 20
X3 = 30
X4 = 40
X[5] = 50
X[6] = 60
X[7] = 70
X[8] = 80
X[9] = 90
Can also be represented as
X[0] = DC
X1 = 10
X2 = 20
X3 = 30
X4 = 40
X[5] = -50
X[6] = -40
X[7] = -30
X[8] = -20
X[9] = -10
Note that the command "FFTSHIFT" in MATLAB shifts the DFT vector accordingly to produce the following order representing the range from -F_s/2 to +F_s/2 :
fftshift([X[0], X1, X2, X3, X4, X[5], X[6], X[7], X[8], X[9]]) =
[X[5], X[6], X[7], X[8], X[9], X[0], X1, X2, X3, X4]

Assuming your input signal is purely real, then X[6] is just the complex conjugate of X[4]. The top half of the spectrum is essentially redundant for real signals. See: this question and answer for further details.
Note that the input signal should be bandwidth-limited to fs / 2, otherwise aliasing will occur, so for a baseband signal there should not be any components >= fs / 2.

How Java's BigInteger sign-magnitude works

Consider the following code:
int i = 1;
System.out.println("1 binary: " + Long.toBinaryString(i));
long ri = Long.reverse(i);
System.out.println("1 reverse bit decimal: " + ri);
System.out.println("1 reverse bit binary: "+ Long.toBinaryString(ri));
BigInteger bil = new BigInteger(1, Longs.toByteArray(ri));
System.out.println("1 Sign-Magnitude BigInteger toString: " + bil.toString());
The output is:
1 binary: 1
1 reverse bit decimal: -9223372036854775808
1 reverse bit binary: 1000000000000000000000000000000000000000000000000000000000000000
1 Sign-Magnitude BigInteger toString: 9223372036854775808
Can anyone help to explain why the value of "1 Sign-Magnitude BigInteger toString:" is 9223372036854775808 (2^63)?

To get the sign-magnitude of a value, you simply take its absolute value as magnitude and remember the sign in a separate bit (or byte).
The sign-magnitude representation of, say, 722 is simply:
sign = 0
magnitude = 722
The sign magnitude of -722 is simply:
sign = 1
magnitude = 722
That is also what BigInteger uses.
Your code reverses a value, which means that, say, the 8 bit value 00000001 (1) is changed into 10000000 (128 or 2^7). That is not the same as inverting, which turns e.g. 00000001 (1) into 11111110 (254). That is what one's complement does. The generally used two's complement negates 00000001 (1) into 11111111 (255, i.e. 256 - 1). You should read up about two's complement, which takes some understanding. Sign-magnitude is, however, very easy to understand (but not always very practical -- addition, subtraction are different for signed and unsigned, etc. -- and that is why most processors use two's-complement)
So again: sign-magnitude works like this:
sign = (n < 0)
magnitude = abs(n)

what is the result of (54.125) - (184)10

I am practicing for midterm and apprently there's no answer key for it.
However, I practiced and got a result but not sure if this is correct since the solution is really long.
perfrom the following aritmetic operations using the 10 bit floating point standard(given on equation sheet)
-I need to convert to normalized
-standardize properly
-convert final answer to normalized 10 bit.
-convert the final answer to decimal.
-must indicate if there is any loss of precision
my solution is really long so I'm just gonna post only answer here.
-normalized
: 54.125 = 0 1100 10110
: 184 = 0 1110 01110
-standardized:
54.125 = 0 1110 0.01101
184 = 0 1110 1.01110
-result :
0 0010 0000
-denormalized: 0.03125
-precision lost: -129.90625
please help thanks!

Max and min values of H, S and B in Flash?

What are the minimum and maximum values of Hue, Staturation and Brightnest when they are modified through AdjustColor class?

Read the manual, the ranges are listed below each property.
http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/fl/motion/AdjustColor.html
For reference in case docs go down:
Hue: -180 to 180
Contrast: -100 to 100
Saturation: -100 to 100

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

Vanishing/Exploding gradients in deeplearning4j - deep-learning

How to check if we're having a vanishing/exploding gradient in deeplearning4j, more specifically for recurrent neural networks? I mean, what parameters to look for and what methods should we call to get the value of such parameters?

Related

two's complement binary VHDL

what is the result inside fft bin at different index?

How Java's BigInteger sign-magnitude works

what is the result of (54.125) - (184)10

Max and min values of H, S and B in Flash?

Categories

Resources