Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
Are there any hands-on examples of Principal Component Analysis on a dataset? The articles I am reading discuss only theory, and I am really looking for something that shows how to use PCA, interpret the results, and transform the original dataset into the new dataset. Any suggestions, please?
If you know Python, here is a short hands-on example:
import numpy as np
import matplotlib.pyplot as plt

# Generate correlated data from uncorrelated data.
# Each column of X is a 3-dimensional feature vector.
Z = np.random.randn(3, 1000)
C = np.random.randn(3, 3)
X = C @ Z
# Visualize the correlation among the features.
plt.figure()
plt.scatter(X[0, :], X[1, :])
plt.scatter(X[0, :], X[2, :])
plt.scatter(X[1, :], X[2, :])
# Perform PCA. It can be shown that the principal components of the
# matrix X are equivalent to the left singular vectors of X, which are
# equivalent to the eigenvectors of X X^T (up to indeterminacy in sign).
# Z has zero mean, so X is already centered up to sampling noise.
U, S, Vh = np.linalg.svd(X, full_matrices=False)
# eigh handles the symmetric matrix X X^T and returns eigenvalues in
# ascending order, so the columns of Q appear in the reverse order of U.
W, Q = np.linalg.eigh(X @ X.T)
print(U)
print(Q)
# Project the original features onto the eigenspace.
Y = U.T @ X
# Visualize the absence of correlation among the projected features.
plt.figure()
plt.scatter(Y[0, :], Y[1, :])
plt.scatter(Y[1, :], Y[2, :])
plt.scatter(Y[0, :], Y[2, :])
plt.show()
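Since the question also asks how to interpret the results and transform the dataset, here is a minimal sketch of those two steps, assuming the same kind of synthetic data as above. The seed, the explicit centering step, and the choice of k = 2 are my own assumptions for illustration, not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same setup as above: each column of X is one 3-dimensional sample.
Z = rng.standard_normal((3, 1000))
C = rng.standard_normal((3, 3))
X = C @ Z

# Center each feature (row) so the SVD describes variance, not the mean.
Xc = X - X.mean(axis=1, keepdims=True)

U, S, Vh = np.linalg.svd(Xc, full_matrices=False)

# Fraction of the total variance captured by each principal component.
explained_variance_ratio = S**2 / np.sum(S**2)
print(explained_variance_ratio)

# Keep the top k components and project the data onto them.
k = 2
Y = U[:, :k].T @ Xc  # shape (k, 1000): the reduced dataset

# The projected features are uncorrelated: their covariance is diagonal.
cov_Y = Y @ Y.T / (Xc.shape[1] - 1)
print(np.round(cov_Y, 6))

# Map back to the original space to see how much the reduction discarded.
X_approx = U[:, :k] @ Y
relative_error = np.linalg.norm(Xc - X_approx) ** 2 / np.linalg.norm(Xc) ** 2
print(relative_error)
```

The relative reconstruction error equals the variance fraction of the dropped component, which is one practical way to decide how many components to keep.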
You can check http://alias-i.com/lingpipe/demos/tutorial/svd/read-me.html. SVD and LSA are very similar to PCA: all of them are dimensionality-reduction methods. The only difference is in how the basis is computed.
Since you're asking for available hands-on examples, here you have an interactive demo to play with.
Closed. This question is not about programming or software development. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 6 months ago.
Can anyone help me with the mathematics of the sparse categorical cross-entropy loss function? I have searched for a derivation or a mathematical explanation but couldn't find any.
I know this is not the right place to ask a question like this, but I am at a loss.
It is just cross-entropy loss. "Sparse" refers to the label representation it expects, which is chosen for efficiency. E.g. in Keras the label is expected to be an integer i*, the index for which target[i*] = 1. With n samples indexed by k and classes indexed by i:
CE(target, pred) = -1/n SUM_k [ SUM_i target_ki log pred_ki ]
and since we have sparse target, we have
sparse-CE(int_target, pred) = -1/n SUM_k [ log pred_k{int_target_k} ]
So instead of summing over the label dimension we just index into it, since all the remaining terms are zero anyway.
Overall, as long as each target is exactly one class, we have:
CE(target, pred) = CE(onehot(int_target), pred) = sparse-CE(int_target, pred)
The only reason for this distinction is efficiency. For regular classification with ~10-100 classes it does not really matter, but imagine word-level language models where we have thousands of classes.
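The equivalence above can be checked numerically. This is a minimal NumPy sketch, not the Keras implementation; the function names and the example probabilities are made up for illustration.

```python
import numpy as np

def cross_entropy(target, pred):
    # target: (n, classes) one-hot rows; pred: (n, classes) probabilities.
    return -np.mean(np.sum(target * np.log(pred), axis=1))

def sparse_cross_entropy(int_target, pred):
    # int_target: (n,) integer class indices; index the true class directly.
    n = pred.shape[0]
    return -np.mean(np.log(pred[np.arange(n), int_target]))

# Hypothetical predictions for 3 samples and 3 classes.
pred = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.8, 0.1],
                 [0.3, 0.3, 0.4]])
int_target = np.array([0, 1, 2])
onehot = np.eye(pred.shape[1])[int_target]

print(cross_entropy(onehot, pred))
print(sparse_cross_entropy(int_target, pred))
```

Both calls print the same value. The sparse version never builds the (n, classes) one-hot matrix, which is where the saving comes from when the number of classes is large.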
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed last year.
I need help understanding the shaping theorem for MDPs. Here's the relevant paper: https://people.eecs.berkeley.edu/~pabbeel/cs287-fa09/readings/NgHaradaRussell-shaping-ICML1999.pdf. It basically says that a Markov decision process with a reward function on transitions between states and actions, R(s, a, s'), has the same optimal policy as a different Markov decision process whose reward is defined as R'(s, a, s') = R(s, a, s') + gamma*f(s') - f(s), where gamma is the time-discount rate.
I understand the proof, but it seems like a trivial case where it breaks down is when R(s, a, s') = 0 for all states and actions, and the agent is faced with the path A -> s -> B versus A -> r -> t -> B. With the original Markov process we get an EV of 0 for both paths, so both paths are optimal. But with the potential added to each transition we get gamma^2*f(B) - f(A) for the first path and gamma^3*f(B) - f(A) for the second. So if gamma < 1 and 0 < f(B), f(A), then the second path is no longer optimal.
Am I misunderstanding the theorem, or am I making some other mistake?
You are missing the assumption that for every terminal state s_T and starting state s_0 we have f(s_T) = f(s_0) = 0. (Note that the paper assumes that a terminal state is always followed by a new starting state, so the potential "wraps around".)
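To see the effect of that assumption on the two paths from the question, here is a small sketch. The discount factor and the potential values are arbitrary assumptions chosen for illustration, and the original reward R is taken to be 0 everywhere, as in the question.

```python
def shaped_return(path, f, gamma):
    # Discounted sum of the shaping terms gamma*f(s') - f(s) along a path,
    # with the original reward R equal to 0 on every transition.
    total = 0.0
    for t in range(len(path) - 1):
        s, s_next = path[t], path[t + 1]
        total += gamma**t * (gamma * f[s_next] - f[s])
    return total

gamma = 0.9
short_path = ["A", "s", "B"]
long_path = ["A", "r", "t", "B"]

# Arbitrary potential: the two returns differ, as computed in the question.
f = {"A": 1.0, "s": 2.0, "r": 3.0, "t": 4.0, "B": 5.0}
print(shaped_return(short_path, f, gamma), shaped_return(long_path, f, gamma))

# Potential that vanishes at the start state A and the terminal state B.
f_zero = {"A": 0.0, "s": 2.0, "r": 3.0, "t": 4.0, "B": 0.0}
print(shaped_return(short_path, f_zero, gamma), shaped_return(long_path, f_zero, gamma))
```

The intermediate potentials cancel by telescoping, so each path's shaped return is gamma^T*f(B) - f(A), where T is the number of transitions. Once f(A) = f(B) = 0, both paths return zero (up to floating-point rounding) and are equally good again.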
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 2 years ago.
I have been thinking about convolutions recently. There are the common 3x3 convs, where the information under a (3,3) kernel is weighted and aggregated to supply information to a single spatial point on the output. There are also 3x3 upconvs, where a single spatial point on the input supplies weighted information to a 3x3 output region.
The conv is a many-to-one relationship and the upconv is a one-to-many relationship.
However, I have never heard of a many-to-many conv. Is there such a thing? For example, a 3x3 kernel supplying information to another 3x3 kernel. I would like to experiment with it in PyTorch. My internet searching has not revealed anything.
You can combine pixel shuffle and averaging to get what you want.
For example, if you want a 3x3 -> 3x3 mapping from in_channels to out_channels:
from torch import nn
import torch.nn.functional as nnf

class ManyToManyConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, in_kernel, out_kernel):
        super().__init__()
        self.out_kernel = out_kernel
        self.conv = nn.Conv2d(in_channels, out_channels * out_kernel * out_kernel, in_kernel)

    def forward(self, x):
        y = self.conv(x)  # all the output kernels are "folded" into the channel dim
        y = nnf.pixel_shuffle(y, self.out_kernel)  # "unfold" the out_kernel: spatial size is out_kernel times bigger
        y = nnf.avg_pool2d(y, self.out_kernel)  # average each out_kernel x out_kernel block back to one pixel
        return y
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I can calculate the distance between an inclined line and my ball (using the normal vector), but how can I calculate the new velocity?
Anders' answer was a good one, but I realise that you may not have a strong mathematical background, so I will elaborate. The problem you have at the moment is poorly stated. However, see the following figure
This will allow us to derive the equation you require. The scalar product of two vectors a and b, a.b, gives the magnitude of a multiplied by the projection of b onto a. So if we take n as a unit vector (a vector of magnitude 1), then a.n gives the magnitude of the component of a that acts in the direction of n.
So, to get the velocity V, we first split U into components parallel and perpendicular to the plane.
Perpendicular to the plane, in direction n, we have the velocity vector w = (U.n) n. This means we can write U = (U.n) n + [U - (U.n) n], which says that U is the sum of its perpendicular component and its parallel component. Now, -V is very similar to U, but its parallel component acts in the reverse direction, so we can write -V = (U.n) n - [U - (U.n) n].
Negating both sides and simplifying gives the result Anders stated, i.e. V = U - 2[(U.n) n]. The dot (scalar) product is defined as a.b = |a||b|cos(A), where A is the angle between the vectors when placed tail to tail. This should enable you to solve your problem.
I hope this helps
If the vector v = (vx, vy) is the initial velocity and the plane has normal n = (nx, ny), then the new reflected velocity vector r will be
r = v − 2(v⋅n)n
The product (v⋅n) is the dot product of v and n, defined as vx*nx + vy*ny. Note that the plane normal must be normalized (length 1.0). A related question with the same answer: https://math.stackexchange.com/questions/13261/how-to-get-a-reflection-vector
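The reflection formula translates directly into code. This is a minimal NumPy sketch; the function name `reflect` and the example vectors are assumptions for illustration, and the normal is normalized inside the function so that a non-unit normal can be passed in.

```python
import numpy as np

def reflect(v, n):
    # Normalize the normal so the formula holds even if n is not unit length.
    n = n / np.linalg.norm(n)
    # r = v - 2 (v . n) n
    return v - 2.0 * np.dot(v, n) * n

# Ball moving right and down, hitting a horizontal floor with normal (0, 1).
v = np.array([1.0, -1.0])
n = np.array([0.0, 1.0])
r = reflect(v, n)
print(r)

# Ball falling straight down onto a 45-degree incline with normal (1, 1).
r_incline = reflect(np.array([0.0, -1.0]), np.array([1.0, 1.0]))
print(r_incline)
```

In the first case only the vertical component flips, and in the second the ball leaves horizontally. The speed is unchanged in both, which is a quick sanity check for a perfectly elastic bounce.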
Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 9 years ago.
Let there be the following definition of gradient descent cost function
with the hypothesis function defined as
what I've come up with for multivariate linear regression is
theta = theta - alpha * 1/m * ([theta', -1]*[X';y']*X)';
h_theta = 1/(2*m)* (X*theta - y)'*(X*theta-y);
(octave notation, ' means matrix transpose, [A, n] means adding a new column to matrix A with scalar value n, [A; B] means appending matrix B to matrix A row-wise)
It's doing its job correctly as far as I can tell (the plots look OK); however, I have a strong feeling that it's unnecessarily complicated.
How can I write it with as few matrix operations as possible (and no element-wise operations, of course)?
I don't think that is unnecessarily complicated; this is what you want. Matrix operations are good because you don't have to loop over elements yourself or do element-wise operations. I remember taking a course online and my solution seems pretty similar.
The way you have it is efficient because it is fully vectorized. It could also be done with a for loop over the summation, but that is far less efficient in terms of processing power.
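For comparison, here is a hedged NumPy sketch of the same update. Expanding the product in the question gives [theta', -1]*[X'; y'] = (X*theta - y)', so the update can equivalently be written as theta - alpha/m * X'*(X*theta - y), which avoids building the stacked matrices. The synthetic data, learning rate, and function names below are assumptions for illustration.

```python
import numpy as np

def cost(X, y, theta):
    # J(theta) = 1/(2m) * (X theta - y)' (X theta - y)
    m = len(y)
    residual = X @ theta - y
    return residual @ residual / (2 * m)

def gradient_step(X, y, theta, alpha):
    # Compact form: theta - alpha/m * X' (X theta - y)
    m = len(y)
    return theta - (alpha / m) * (X.T @ (X @ theta - y))

def gradient_step_original(X, y, theta, alpha):
    # Literal translation of the Octave update from the question.
    m = len(y)
    theta_ext = np.append(theta, -1.0)      # [theta', -1]
    stacked = np.vstack([X.T, y[None, :]])  # [X'; y']
    return theta - (alpha / m) * (theta_ext @ stacked @ X)

# Synthetic, noise-free data with a bias column (assumed for illustration).
rng = np.random.default_rng(0)
m = 200
X = np.column_stack([np.ones(m), rng.standard_normal((m, 2))])
true_theta = np.array([1.0, -2.0, 0.5])
y = X @ true_theta

theta = np.zeros(3)
for _ in range(500):
    theta = gradient_step(X, y, theta, alpha=0.1)
print(theta, cost(X, y, theta))
```

Both step functions produce the same update. The compact form just skips the concatenations, so whether it counts as simpler is mostly a matter of taste.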