Vectorization or sum as matrix operations [closed] - octave

Let there be the following definition of the gradient-descent cost function:
J(theta) = 1/(2m) * SUM_i (h_theta(x_i) - y_i)^2
with the hypothesis function defined as
h_theta(x) = theta' * x
What I've come up with for multivariate linear regression is
theta = theta - alpha * 1/m * ([theta', -1]*[X';y']*X)';
h_theta = 1/(2*m)* (X*theta - y)'*(X*theta-y);
(Octave notation: ' means matrix transpose, [A, n] means appending a new column with scalar value n to matrix A, and [A; B] means appending matrix B to matrix A row-wise.)
As far as I can tell it's doing its job correctly (the plots look OK), but I have a strong feeling that it's unnecessarily complicated.
How to write it with as little matrix operations as possible (and no element-wise operations, of course)?

I don't think it is unnecessarily complicated; this is what you want. Matrix operations are good because you don't have to loop over elements yourself or use element-wise operations. I remember taking a course online, and my solution was pretty similar.

The way you have it is the most efficient way of doing it, since it is fully vectorized. It could also be done with a for loop over the summation, but that is very inefficient in terms of processing power.
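For reference, the one-liner in the question works out algebraically to the more conventional form theta = theta - alpha/m * X'*(X*theta - y). A minimal sketch of that same update in Python/NumPy (the function name is hypothetical; X is assumed to already include the bias column and y to be a 1-D vector):

import numpy as np

def gradient_descent_step(X, y, theta, alpha):
    # One fully vectorized gradient-descent step for linear regression.
    m = len(y)
    error = X @ theta - y                        # h_theta(x) - y for all m examples at once
    theta = theta - (alpha / m) * (X.T @ error)  # same update as the Octave one-liner
    cost = (error @ error) / (2 * m)             # J(theta) = 1/(2m) * sum of squared errors
    return theta, cost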

Related

Prove that not all ternary linear codes contain the all zero codeword [closed]

I have to prove that not all ternary linear codes contain the all zero codeword.
Some term explanations:
A linear code is a code in which the sum or difference of any two codewords is also a codeword. In other words, for all x, y in C, x +/- y is also in C.
A ternary codeword in this exercise is a codeword containing 0, 1 or 2. E.g. 1011, 1012, 012, ...
An all zero codeword is something like 0000, 000, 00000, ...
My attempt:
The addition of codewords must be done mod 3, which means there are three allowed digits in this mod-3 world: 0, 1 and 2. For any ternary linear code, there are cases when all the additions of any two codewords yield non-zero codewords.
I am not sure about the proof. I have an idea that it must deal with the mod-3 arithmetic, but I am not sure how to explain it.
Thank you so much!
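This is not a proof, but to experiment with the mod-3 arithmetic the question describes, here is a small Python sketch (the candidate code below is just a made-up example over {0, 1, 2}); it checks whether a finite set of ternary words of equal length is closed under component-wise mod-3 sums and differences, and whether it contains the all-zero word:

import itertools

def is_linear_mod3(code):
    # Closure of a ternary code under component-wise mod-3 sum and difference.
    words = {tuple(w) for w in code}
    for x, y in itertools.product(words, repeat=2):
        total = tuple((a + b) % 3 for a, b in zip(x, y))
        diff = tuple((a - b) % 3 for a, b in zip(x, y))
        if total not in words or diff not in words:
            return False
    return True

candidate = [(0, 0, 0), (1, 2, 0), (2, 1, 0)]        # a small example over {0, 1, 2}
print(is_linear_mod3(candidate))                      # True: closed under mod-3 + and -
print((0, 0, 0) in {tuple(w) for w in candidate})     # does it contain the all-zero word?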

What is the Mathematical formula for sparse categorical cross entropy loss? [closed]

Can anyone help me with the mathematics of the sparse categorical cross entropy loss function? I have searched for the derivation and a (mathematical) explanation but couldn't find any.
I know this is not the right place to ask a question like this, but I am helpless.
It is just cross entropy loss. The "sparse" refers to the representation it expects, for efficiency reasons. E.g. in Keras the label provided is expected to be an integer i*, the index for which target[i*] = 1.
CE(target, pred) = -1/n SUM_k [ SUM_i target_ki log pred_ki ]
and since we have a sparse target, we have
sparse-CE(int_target, pred) = -1/n SUM_k [ log pred_k{int_target_k} ]
So instead of summing over the label dimension we just index, since we know all the remaining entries are 0 anyway.
And overall, as long as each target is exactly one class, we have:
CE(target, pred) = CE(onehot(int_target), pred) = sparse-CE(int_target, pred)
The only reason for this distinction is efficiency. For regular classification with ~10-100 classes it does not really matter, but imagine word-level language models where we have thousands of classes.
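To make the equivalence concrete, here is a small NumPy sketch (the function names and the toy probabilities are made up for illustration; real implementations such as Keras's add numerical safeguards like clipping):

import numpy as np

def cross_entropy(onehot_targets, preds):
    # Dense CE: average over samples of -sum_i target_i * log(pred_i).
    return -np.mean(np.sum(onehot_targets * np.log(preds), axis=1))

def sparse_cross_entropy(int_targets, preds):
    # Sparse CE: index the predicted probability of the true class directly.
    rows = np.arange(len(int_targets))
    return -np.mean(np.log(preds[rows, int_targets]))

preds = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])      # predicted class probabilities
int_targets = np.array([0, 1])           # integer labels
onehot = np.eye(3)[int_targets]          # one-hot version of the same labels

print(cross_entropy(onehot, preds))      # both print the same value
print(sparse_cross_entropy(int_targets, preds))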

Multiply two matrices in CUDA C [closed]

I have two matrices, A (32*32) and B (32*n), where n comes from the input and is between 2000 and 2000000.
I have two kinds of inputs: one is integers between 0 and 255, and the other is 0/1. This multiplication is in a loop that iterates 3000 times. B (32*n) comes from the input and is constant in all of the iterations, but A (32*32) can change in each iteration.
// read B from file
// read A from file
double D[3000];
for (int i = 0; i < 3000; i++)
{
    C = multiply(A, B);
    // D[i] = mean of all elements in C
    // build A from B using D[i] (this part is a really complicated sequential
    // process with lots of ifs and switches)
}
What is the fastest way to do this?
Thank you.
Nobody here is going to write code for you; that is not what Stack Overflow is intended for. However, it would appear that there are a number of characteristics of the problem which you should be looking to exploit to improve the performance of your code:
Recognise that because one of the matrices only contains 0 or 1, and you are performing this in integer arithmetic, what you are describing as matrix multiplication is really a large number of independent sparse sums.
Recognise that because the next operation is to compute an average, you don't actually have to store the intermediate dot products and could directly perform a reduction on partial results of the matrix row summation.
There are probably parallel primitives in the Thrust library which you could use for prototyping, and an optimal hand-written kernel would aim to fuse the first and most of the second part of the operation into a single kernel.
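To make the second point concrete: if (as in the posted loop) the only thing needed from C is the mean of all its elements, C never has to be stored at all, because the total of A*B factors into (column sums of A) dot (row sums of B). A short NumPy sketch of that identity (NumPy rather than CUDA, and the sizes and seed are made up); in the real loop the row sums of B would be computed once and reused every iteration:

import numpy as np

rng = np.random.default_rng(0)
n = 5000
A = rng.integers(0, 256, size=(32, 32))   # stand-in for one iteration's A
B = rng.integers(0, 2, size=(32, n))      # the constant 0/1 matrix

# Naive version: form C explicitly, then average every element.
naive_mean = (A @ B).mean()

# Reduced version: the sum of all elements of A @ B equals
# (column sums of A) dot (row sums of B), so C is never materialized.
b_row_sums = B.sum(axis=1)                # computed once, reused every iteration
reduced_mean = A.sum(axis=0) @ b_row_sums / (32 * n)

print(naive_mean, reduced_mean)           # the same value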

Bouncing off inclined surface [closed]

I can calculate the distance between the inclined line and my ball (with a normal vector), but how can I calculate the new velocity?
Anders' answer was a good one, but I realise that you may not have a great mathematical background, so I will elaborate. The problem as stated is a little underspecified, but the picture to keep in mind is the incoming velocity U, the outgoing velocity V, and the unit normal n of the inclined surface.
This will allow us to derive the equation you require. Now, the scalar product of two vectors a and b, a.b, gives the magnitude of a multiplied by the projection of b onto a. Basically, if we take n as a unit vector (magnitude 1), then a.n gives the magnitude of the component of a which acts in the direction of n.
So, split the velocity into components parallel and perpendicular to the plane; to get the velocity V we first split U into those components.
Perpendicular to the plane, in the direction n, we have a vector velocity w = (U.n) n. This means we can write U = (U.n) n + [U - (U.n) n], which says that U is made up of its perpendicular component plus its parallel component. Now, -V is very similar to U, but its parallel component acts in the reverse direction, so we can write -V = (U.n) n - [U - (U.n) n].
Combining the above gives the result Anders stated, i.e. V = U - 2[(U.n) n]. The dot/scalar product is defined as a.b = |a||b|cos(A), where A is the angle between the vectors laid together tail-to-tail; this should enable you to solve your problem.
I hope this helps
If the vector v = (vx, vy) is the initial velocity and the plane has normal n = (nx, ny), then the new reflected velocity vector r will be
r = v - 2(v⋅n)*n
The product (v⋅n) is the dot product of v and n, defined as vx*nx + vy*ny. Note that the plane normal must be normalized (length 1.0). A related question with the same answer: https://math.stackexchange.com/questions/13261/how-to-get-a-reflection-vector
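As a concrete check of the formula, here is a minimal Python/NumPy sketch (the numbers are just an example, and a perfectly elastic bounce with no energy loss is assumed):

import numpy as np

def reflect(v, n):
    # Reflect velocity v off a surface with unit normal n: r = v - 2*(v.n)*n.
    n = n / np.linalg.norm(n)    # make sure the normal is unit length
    return v - 2 * np.dot(v, n) * n

v = np.array([3.0, -4.0])        # incoming velocity
n = np.array([0.0, 1.0])         # normal of a horizontal surface
print(reflect(v, n))             # [3. 4.] -- the vertical component flips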

Working example of Principal Component Analysis? [closed]

Are there any hands-on examples of Principal Component Analysis on a dataset? I am reading articles discussing only theory and am really looking for something that will show me how to use PCA, interpret the results, and transform the original dataset into the new dataset. Any suggestions please?
If you know Python, here is a short hands-on example:
import numpy as np
import scipy.linalg
import pylab

# Generate correlated data from uncorrelated data.
# Each column of X is a 3-dimensional feature vector.
Z = np.random.randn(3, 1000)
C = np.random.randn(3, 3)
X = np.dot(C, Z)
# Visualize the correlation among the features.
pylab.scatter(X[0,:], X[1,:])
pylab.scatter(X[0,:], X[2,:])
pylab.scatter(X[1,:], X[2,:])
# Perform PCA. It can be shown that the principal components of the
# matrix X are equivalent to the left singular vectors of X, which are
# equivalent to the eigenvectors of X X^T (up to indeterminacy in sign).
U, S, Vh = scipy.linalg.svd(X)
W, Q = scipy.linalg.eig(np.dot(X, X.T))
print(U)
print(Q)
# Project the original features onto the eigenspace.
Y = np.dot(U.T, X)
# Visualize the absence of correlation among the projected features.
pylab.scatter(Y[0,:], Y[1,:])
pylab.scatter(Y[1,:], Y[2,:])
pylab.scatter(Y[0,:], Y[2,:])
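One thing you will usually want when interpreting the results is how much of the total variance each principal component captures; the singular values S from the SVD above give this directly (a small addition to the sketch; note that textbook PCA mean-centers each row of X first, which the toy data above skips):

explained = S**2 / (S**2).sum()
print(explained)   # fraction of the total variance along each principal direction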
You can check http://alias-i.com/lingpipe/demos/tutorial/svd/read-me.html. SVD and LSA are approaches very similar to PCA; all are dimensionality-reduction methods. The only difference is in how the basis is evaluated.
Since you're asking for available hands-on examples, here is an interactive demo to play with.