I'm trying to find the shortest way to code a octave matrix of size m x n, and fill it with default/initial value, all in one command .
how can this be done?
Typically, if you're filling it with zeros, then zeros([M,N]).
If you're filling it with ones, ones([M,N])
For any other value k, there's lots of options, e.g..
X = k * ones(M,N);
X = repmat( k, M, N);
X = k(ones(M,N));
W.r.t, Dror's answer, this is also a nice way, with the caveats that a) it requires assignment (whereas the above can all be used in an operation in-place, without having to assign to a temporary variable), and b) you need to be sure X doesn't already exist on your workspace, otherwise you may not be getting what you think you're getting. If you really want to be sure, you'd have to clear X first;
clear X; X(1:M, 1:N) = k;
I'm sure there are other ways. All the above have different benchmarks; there used to be advice on which to avoid for speed if you're doing this repeatedly inside a for loop (repmat is slower, for instance). Can't remember details now.
in just one line of code -
for matrix X with dimensions m x n and default value v:
X(1:m,1:n)=v
example (for matrix 2x3 with default value of 5):
X(1:2,1:3)=5
(Per comment from #Daniel you may prefix the above creation command with :
clear X )
Related
I use torch.nn.Embedding to embed my model’s categorical input features, however, I face problems when I set the max_norm parameter to not None.
There is a note on the pytorch docs page that explains how to use max_norm parameter through the following example:
n, d, m = 3, 5, 7
embedding = nn.Embedding(n, d, max_norm=True)
W = torch.randn((m, d), requires_grad=True)
idx = torch.tensor(\[1, 2\])
a = embedding.weight.clone() # W.t() # weight must be cloned for this to be differentiable
b = embedding(idx) # W.t() # modifies weight in-place
out = (a.unsqueeze(0) + b.unsqueeze(1))
loss = out.sigmoid().prod()
loss.backward()
I can’t easily understand this example from the docs. What is the purpose of having both ‘a’ and ‘b’ and why ‘out’ is defined as, out = (a.unsqueeze(0) + b.unsqueeze(1))?
Do we need to first clone the entire embedding tensor as in ‘a’, and then finding the embeddings for our desired indices as in ‘b’? Then how do ‘a’ and ‘b’ need to be added?
In my code, I don’t have W explicitly, I am assuming that W is representative of the weights applied by the torch.nn.Linear layers. So, I just need to prepare the input (which includes the embeddings for categorical features) that goes into my network.
I greatly appreciate any instructions on this, as understanding this example would help me adapt my code accordingly.
Because W in the line computing a requires gradients, we must save embedding.weight to compute those gradients in the backward pass. However, in the line computing b, executing embedding(idx) will scale embedding.weight by max_norm - in place. So, without cloning it in line a, embedding.weight will be modified when line b is executed - changing what was saved for the backward pass to update W. Hence the requirement to clone embedding.weight - to save it before it gets scaled in line b.
If you don't use embedding.weight outside of the normal forward pass, you don't need to worry about all this.
If you get an error, post it (and your code).
When I run this piece of code:
dx = 4/(nx-1);
dy = 2/(ny-1);
phinew(2:ny-1,2:nx-1) = ((dy^2*(phiold(1:ny-2,2:nx-1)+phiold(3:ny,2:nx-1))+dx^2*(phiold(2:ny-1,1:nx-2)+phiold(2:ny-1,3:nx)))/(2*(dx^2+dy^2))-poissonf);
Within my code, the piece of code operates correctly, taking an input of an nx by ny matrix and outputting a nx by ny matrix. When I run the following code in the exact same position however:
phinew = Smoothing(phiold,poissonf,nx,ny)
Where the function "Smoothing" is defined as:
function [phinew,dx,dy] = Smoothing(phiold,poissonf,nx,ny)
dx = 4/(nx-1);
dy = 2/(ny-1);
phinew(2:ny-1,2:nx-1) = ((dy^2*(phiold(1:ny-2,2:nx-1)+phiold(3:ny,2:nx-1))+dx^2*(phiold(2:ny-1,1:nx-2)+phiold(2:ny-1,3:nx)))/(2*(dx^2+dy^2))-poissonf);
end
The function returns a nx-1 by ny-1 matrix for an nx by ny input.
I cannot wrap my head around why this is occurring at all. The output matrix is the exact same as it should be, except the last column and row are missing entirely. My code is iterative and so requires these to be of the same size, so I cannot move on until this issue is resolved.
Thank you for your time and your help. You people are life-savers.
When you run your code in the command window, phinew already exists. In your command window, do clear phinew before you pasting in those three lines, and you'll find that phinew is then nx-1 by ny-1, as you get from your function.
If you want to force your function to return nx by ny, put phinew = zeros(nx,ny); at the start of the function, or set the last column and row to whatever you want them to be.
EDIT: Responding you your comment "why [does the RHS of the main assignment output] a 48x48 matrix? Directly before the command is run phiold, phinew and poissonf are all 50x50."
I don't think poissonf is 50x50: that would lead to the error Matrix dimensions must agree, because poissonf is being added to the rest of the expression which is part of phiold, so I'll ignore poissonf in the following.
The RHS is always ny-2 by nx-2, even on the first iteration. You can see this by assigning the RHS to an intermediate variable, e.g. phipiece = ... and checking size(phipiece). The reason phinew (if created anew) is 49x49 is because it is assigned to (2:ny-1,2:nx-1), which will create a ny-1 by nx-1 matrix and leave the first row and column as zero.
If you use phinew = zeros(nx,ny); first, then the first and last rows and columns are left as zero.
Effectively what I'm looking for is a function f(x) that outputs into a range that is pre-defined. Calling f(f(x)) should be valid as well. The function should be cyclical, so calling f(f(...(x))) where the number of calls is equal to the size of the range should give you the original number, and f(x) should not be time dependent and will always give the same output.
While I can see that taking a list of all possible values and shuffling it would give me something close to what I want, I'd much prefer it if I could simply plug values into the function one at a time so that I do not have to compute the entire range all at once.
I've looked into Minimal Perfect Hash Functions but haven't been able to find one that doesn't use external libraries. I'm okay with using them, but would prefer to not do so.
If an actual range is necessary to help answer my question, I don't think it would need to be bigger than [0, 2^24-1], but the starting and ending values don't matter too much.
You might want to take a look at Linear Congruential Generator. You shall be looking at full period generator (say, m=224), which means parameters shall satisfy Hull-Dobell Theorem.
Calling f(f(x)) should be valid as well.
should work
the number of calls is equal to the size of the range should give you the original number
yes, for LCG with parameters satisfying Hull-Dobell Theorem you'll get full period covered once, and 'm+1' call shall put you back at where you started.
Period of such LCG is exactly equal to m
should not be time dependent and will always give the same output
LCG is O(1) algorithm and it is 100% reproducible
LCG is reversible as well, via extended Euclid algorithm, check Reversible pseudo-random sequence generator for details
Minimal perfect hash functions are overkill, all you've asked for is a function f that is,
bijective, and
"cyclical" (ie fN=f)
For a permutation to be cyclical in that way, its order must divide N (or be N but in a way that's just a special case of dividing N). Which in turn means the LCM of the orders of the sub-cycles must divide N. One way to do that is to just have one "sub"-cycle of order N. For power of two N, it's also really easy to have lots of small cycles of some other power-of-two order. General permutations do not necessarily satisfy the cycle-requirement, of course they are bijective but the LCM of the orders of the sub-cycles may exceed N.
In the following I will leave all reduction modulo N implicit. Without loss of generality I will assume the range starts at 0 and goes up to N-1, where N is the size of the range.
The only thing I can immediately think of for general N is f(x) = x + c where gcd(c, N) == 1. The GCD condition ensures there is only one cycle, which necessarily has order N.
For power-of-two N I have more inspiration:
f(x) = cx where c is odd. Bijective because gcd(c, N) == 1 so c has a modular multiplicative inverse. Also cN=1, because φ(N)=N/2 (since N is a power of two) so cφ(N)=1 (Euler's theorem).
f(x) = x XOR c where c < N. Trivially bijective and trivially cycles with a period of 2, which divides N.
f(x) = clmul(x, c) where c is odd and clmul is carry-less multiplication. Bijective because any odd c has a carry-less multiplicative inverse. Has some power-of-two cycle length (less than N) so it divides N. I don't know why though. This is a weird one, but it has decent special cases such as x ^ (x << k). By symmetry, the "mirrored" version also works.
Eg x ^ (x >> k).
f(x) = x >>> k where >>> is bit-rotation. Obviously bijective, and fN(x) = x >>> Nk, where Nk mod N = 0 so it rotates all the way back to the unrotated position regardless of what k is.
Suppose I have a matrix A and I want to obtain the following:
for i=1:m
A(i,:) = something which depends on i;
endfor
Is there a way to obtain that without the loop?
Added: Ok, I've understood I have to be more specific.
I have two matrices B and C (all the matrices we are considering have m rows).
I want to record in the i-th row of A the product of the polynomials written in the i-th rows of B and C (so using the loop I would call the conv function).
Any ideas?
I don't think it's possible to do this without a for loop as conv only accepts vector inputs, but I might be wrong. I can't see a way of using either bsxfun or arrayfun in conjunction with conv and matrix inputs. I might be wrong though... I stand to be corrected.
That is a very very general question, and not possible to answer with more details. Mainly on what i will be involved with. Suppose the following
for i = 1:m
A(i,:) += i;
endfor
It could be written with the much more efficient:
A .+ (1:m)'
Just compare:
octave> n = 1000;
octave> A = B = rand (n);
octave> tic; for i = 1:n, B(i,:) += i; endfor; toc
Elapsed time is 0.051 seconds.
octave> tic; C = A.+ (1:n)'; toc
Elapsed time is 0.01 seconds.
octave> isequal (C, B)
ans = 1
If you have a very old version of octave, you can instead do bsxfun (#plus, A, (i:m)').
However, if i on the right side of the expression will be used for indexing some other variable, then solution would be different. Maybe, the solution is cumsum, or some other of cumfoo function.
Your question is basically, "how do I vectorize code?", which is a really large subject, without telling us what you're trying to vectorize.
Reading this question got me thinking: For a given function f, how can we know that a loop of this form:
while (x > 2)
x = f(x)
will stop for any value x? Is there some simple criterion?
(The fact that f(x) < x for x > 2 doesn't seem to help since the series may converge).
Specifically, can we prove this for sqrt and for log?
For these functions, a proof that ceil(f(x))<x for x > 2 would suffice. You could do one iteration -- to arrive at an integer number, and then proceed by simple induction.
For the general case, probably the best idea is to use well-founded induction to prove this property. However, as Moron pointed out in the comments, this could be impossible in the general case and the right ordering is, in many cases, quite hard to find.
Edit, in reply to Amnon's comment:
If you wanted to use well-founded induction, you would have to define another strict order, that would be well-founded. In case of the functions you mentioned this is not hard: you can take x << y if and only if ceil(x) < ceil(y), where << is a symbol for this new order. This order is of course well-founded on numbers greater then 2, and both sqrt and log are decreasing with respect to it -- so you can apply well-founded induction.
Of course, in general case such an order is much more difficult to find. This is also related, in some way, to total correctness assertions in Hoare logic, where you need to guarantee similar obligations on each loop construct.
There's a general theorem for when then sequence of iterations will converge. (A convergent sequence may not stop in a finite number of steps, but it is getting closer to a target. You can get as close to the target as you like by going far enough out in the sequence.)
The sequence x, f(x), f(f(x)), ... will converge if f is a contraction mapping. That is, there exists a positive constant k < 1 such that for all x and y, |f(x) - f(y)| <= k |x-y|.
(The fact that f(x) < x for x > 2 doesn't seem to help since the series may converge).
If we're talking about floats here, that's not true. If for all x > n f(x) is strictly less than x, it will reach n at some point (because there's only a limited number of floating point values between any two numbers).
Of course this means you need to prove that f(x) is actually less than x using floating point arithmetic (i.e. proving it is less than x mathematically does not suffice, because then f(x) = x may still be true with floats when the difference is not enough).
There is no general algorithm to determine whether a function f and a variable x will end or not in that loop. The Halting problem is reducible to that problem.
For sqrt and log, we could safely do that because we happen to know the mathematical properties of those functions. Say, sqrt approaches 1, log eventually goes negative. So the condition x < 2 has to be false at some point.
Hope that helps.
In the general case, all that can be said is that the loop will terminate when it encounters xi≤2. That doesn't mean that the sequence will converge, nor does it even mean that it is bounded below 2. It only means that the sequence contains a value that is not greater than 2.
That said, any sequence containing a subsequence that converges to a value strictly less than two will (eventually) halt. That is the case for the sequence xi+1 = sqrt(xi), since x converges to 1. In the case of yi+1 = log(yi), it will contain a value less than 2 before becoming undefined for elements of R (though it is well defined on the extended complex plane, C*, but I don't think it will, in general converge except at any stable points that may exist (i.e. where z = log(z)). Ultimately what this means is that you need to perform some upfront analysis on the sequence to better understand its behavior.
The standard test for convergence of a sequence xi to a point z is that give ε > 0, there is an n such that for all i > n, |xi - z| < ε.
As an aside, consider the Mandelbrot Set, M. The test for a particular point c in C for an element in M is whether the sequence zi+1 = zi2 + c is unbounded, which occurs whenever there is a |zi| > 2. Some elements of M may converge (such as 0), but many do not (such as -1).
Sure. For all positive numbers x, the following inequality holds:
log(x) <= x - 1
(this is a pretty basic result from real analysis; it suffices to observe that the second derivative of log is always negative for all positive x, so the function is concave down, and that x-1 is tangent to the function at x = 1). From this it follows essentially immediately that your while loop must terminate within the first ceil(x) - 2 steps -- though in actuality it terminates much, much faster than that.
A similar argument will establish your result for f(x) = sqrt(x); specifically, you can use the fact that:
sqrt(x) <= x/(2 sqrt(2)) + 1/sqrt(2)
for all positive x.
If you're asking whether this result holds for actual programs, instead of mathematically, the answer is a little bit more nuanced, but not much. Basically, many languages don't actually have hard accuracy requirements for the log function, so if your particular language implementation had an absolutely terrible math library this property might fail to hold. That said, it would need to be a really, really terrible library; this property will hold for any reasonable implementation of log.
I suggest reading this wikipedia entry which provides useful pointers. Without additional knowledge about f, nothing can be said.