If NMDS (vegan) convergence is impossible, does that make the output useless?

Maybe this question is not in the right place and if so, I'll delete it.
Probably a very basic question: if NMDS (vegan package) does not converge to a solution (regardless of dimensions and iterations), does that make the output meaningless?
Because R still gives an output, with a nice low stress level, a good plot, all of that.
Thank you.

You could try increasing the number of dimensions (k = 3 or more), but display only the first two dimensions.
Some datasets are too complex to reach a solution in 2-D space. Even after trymax = 100000, one of my 'species x site' datasets did not reach a solution. However, the stress stayed consistently low across all iterations, so I assumed that no single best solution was attainable - maybe after billions of iterations.
see:
https://stackoverflow.com/a/14437200/14708092
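For what it's worth, here is a rough, hypothetical sketch of the "compare stress across dimensionalities" idea. It is not the vegan workflow (this is Python with scikit-learn's non-metric MDS rather than metaMDS, and its stress values are not on the same scale as vegan's), but it illustrates checking whether k = 3 gives acceptably lower stress before deciding to plot only the first two axes:

    import numpy as np
    from scipy.spatial.distance import pdist, squareform
    from sklearn.manifold import MDS

    # Toy community matrix (sites x species); replace with your own data.
    comm = np.random.default_rng(1).poisson(2.0, size=(30, 60))
    dis = squareform(pdist(comm, metric="braycurtis"))

    # Fit non-metric MDS at increasing dimensionality and compare the stress.
    # If stress only becomes acceptable at k >= 3, keep the 3-D solution and
    # display just its first two axes, as suggested above.
    for k in (2, 3, 4):
        nmds = MDS(n_components=k, metric=False, dissimilarity="precomputed",
                   n_init=20, max_iter=500, random_state=0)
        nmds.fit(dis)
        print(k, nmds.stress_)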

There is not enough information in this question, so the answer is "it depends". Please give us a reproducible example.

Related

Trilateration: different approaches and issues

Although there exist several posts about (multi)lateration, I would like to summarize some approaches and present some issues/questions to better clarify the approach.
It seems there are two ways to determine the target location: a geometric/analytic approach (solving the equations directly with some trick) and a fitting approach that converts the non-linear system into a linear one.
With respect to the first one, I would like to ask a few questions.
Suppose we have perfect range measurements; considering the 2D case, the exact solution is the unique point where the three circles intersect. Can anyone point to a geometric solution for this case? I found this approach: https://math.stackexchange.com/questions/884807/find-x-location-using-3-known-x-y-location-using-trilateration
but it seems to fail when two anchors have the same y coordinate, since we get a division by 0. Moreover, can this be extended to 3D?
The same solution can be obtained with the second approach:
Ax = b, then recovering x = A^-1 b, or using least squares, x = (A^T A)^-1 A^T b.
Please see http://www3.nd.edu/~cpoellab/teaching/cse40815/Chapter10.pdf
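To make the second (fitting) approach concrete, here is a hedged Python/NumPy sketch of the usual linearization (function name and example values are made up): subtracting the first circle/sphere equation from the others cancels the quadratic terms and leaves Ax = b, and the least-squares solve still returns a best-fitting point even when the circles do not intersect exactly, which speaks to the "no intersection" and "noisy ranges" questions below.

    import numpy as np

    def trilaterate_ls(anchors, ranges):
        # Linearized least-squares (multi)lateration; works in 2D or 3D.
        # anchors: (n, d) known positions with n >= d + 1; ranges: (n,) distances.
        anchors = np.asarray(anchors, dtype=float)
        ranges = np.asarray(ranges, dtype=float)
        p0, r0 = anchors[0], ranges[0]

        # Subtract the first equation from the rest: the quadratic terms cancel.
        A = 2.0 * (anchors[1:] - p0)
        b = (r0**2 - ranges[1:]**2
             + np.sum(anchors[1:]**2, axis=1) - np.sum(p0**2))

        # lstsq returns the best-fitting point even if the circles/spheres
        # have no exact common intersection (e.g. noisy range measurements).
        x, *_ = np.linalg.lstsq(A, b, rcond=None)
        return x

    # Exact 2D example: the true point is (1, 2).
    anchors = [(0, 0), (4, 0), (0, 5)]
    true = np.array([1.0, 2.0])
    ranges = [np.linalg.norm(true - np.array(a, dtype=float)) for a in anchors]
    print(trilaterate_ls(anchors, ranges))   # approximately [1. 2.]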
What about the case where the three circles have no common intersection? It seems that the second approach still finds a solution. Is this normal? How can it be explained?
What about the first approach when the range measurements are noisy? Does it find an approximate solution, or does it fail?
Considering 3D, it seems that at least 4 anchors are needed to provide a unique solution; with only 3 anchors there can be 2 solutions. Can anyone provide the equations to find the two solutions? This can still be useful: even with two solutions, we may discard one by checking which values agree with our scenario, e.g. the GPS case, where we pick the solution located on the Earth. In contrast, the second (least-squares) approach would always provide a single solution, possibly the wrong one.
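For the 3-anchor 3D case, here is a hedged sketch of the standard closed-form construction (build a local frame from the three anchors, solve for the local coordinates, then take z = +/- sqrt(...)); the function name is made up. It returns the two mirror-image candidate points, and a fourth anchor or scenario knowledge (the "on the Earth" check above) picks between them.

    import numpy as np

    def trilaterate_3d(p1, p2, p3, r1, r2, r3):
        # Exact 3-sphere intersection; returns the two candidate points.
        p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p1, p2, p3))

        # Local frame: ex along p1->p2, ey in the anchor plane, ez normal to it.
        ex = (p2 - p1) / np.linalg.norm(p2 - p1)
        i = np.dot(ex, p3 - p1)
        ey = p3 - p1 - i * ex
        ey /= np.linalg.norm(ey)
        ez = np.cross(ex, ey)

        d = np.linalg.norm(p2 - p1)
        j = np.dot(ey, p3 - p1)

        # Local coordinates of the unknown point.
        x = (r1**2 - r2**2 + d**2) / (2 * d)
        y = (r1**2 - r3**2 + i**2 + j**2) / (2 * j) - (i / j) * x
        z_sq = r1**2 - x**2 - y**2
        if z_sq < 0:
            raise ValueError("spheres do not intersect (noisy ranges?)")
        z = np.sqrt(z_sq)

        base = p1 + x * ex + y * ey
        return base + z * ez, base - z * ez   # the two mirror-image solutions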
Do you know of any existing C/C++ library that implements some of these techniques, and maybe some more complex fitting functions, such as non-linear ones?
Thank you
Regards

Algorithm for online approximation of a slowly-changing, real-valued function

I'm tackling an interesting machine learning problem and would love to hear if anyone knows a good algorithm to deal with the following:
The algorithm must learn to approximate a function of N inputs and M outputs
N is quite large, e.g. 1,000-10,000
M is quite small, e.g. 5-10
All inputs and outputs are floating-point values; they could be positive or negative, and are likely to be relatively small in absolute value, but there are no hard guarantees on bounds
Each time period I get N inputs and need to predict the M outputs, at the end of the time period the actual values for the M outputs are provided (i.e. this is a supervised learning situation where learning needs to take place online)
The underlying function is non-linear, but not too nasty (e.g. I expect it will be smooth and continuous over most of the input space)
There will be a small amount of noise in the function, but the signal-to-noise ratio is likely to be good - I expect the N inputs will explain 95%+ of the output values
The underlying function is slowly changing over time - unlikely to change drastically in a single time period but is likely to shift slightly over the 1000s of time periods range
There is no hidden state to worry about (other than the changing function), i.e. all the information required is in the N inputs
I'm currently thinking some kind of back-propagation neural network with lots of hidden nodes might work - but is that really the best approach for this situation and will it handle the changing function?
With your number of inputs and outputs, I'd also go for a neural network; it should give a good approximation. The slow change suits a back-propagation technique, since it should not have to 'de-learn' much.
I think stochastic gradient descent (http://en.wikipedia.org/wiki/Stochastic_gradient_descent) would be a straightforward first step; it will probably work nicely given the operating conditions you have.
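As a rough illustration of the online loop (predict, observe, update), here is a minimal sketch in plain NumPy using a linear model rather than a full neural network; the learning rate and toy drifting target are made-up values just to exercise the loop, and the constant step size is what lets the model keep tracking a slowly drifting function.

    import numpy as np

    rng = np.random.default_rng(0)
    N, M = 1000, 5         # illustrative sizes from the question
    lr = 1e-4              # hypothetical learning rate; needs tuning in practice

    W = np.zeros((M, N))   # model parameters, updated online
    b = np.zeros(M)

    def predict(x):
        return W @ x + b

    def sgd_step(x, y_true):
        # One online update: a constant step size means recent data always
        # matters, so a slowly drifting target keeps being tracked.
        global W, b
        err = predict(x) - y_true          # shape (M,)
        W -= lr * np.outer(err, x)
        b -= lr * err

    # Toy drifting linear target, only to demonstrate the online protocol.
    W_true = rng.normal(size=(M, N)) * 0.01
    for t in range(5000):
        W_true += rng.normal(size=(M, N)) * 1e-5    # slow drift over time
        x = rng.normal(size=N)
        y = W_true @ x + rng.normal(size=M) * 0.01  # small noise
        y_hat = predict(x)   # prediction made before the truth arrives
        sgd_step(x, y)       # then learn from the revealed outputs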
I'd also go for an ANN. Single layer might do fine since your input space is large. You might wanna give it a shot before adding a lot of hidden layers.
#mikera What is it going to be used for? Is it an assignment in an ML course?

Project Euler 298 - there must be a correct answer? (only pastebinned code)

Project Euler has a paging-file problem (though it's disguised in other words).
I tested my code (pastebinned so as not to spoil it for anyone) against the sample data and got the same memory contents and score as the problem statement. However, my scores are nowhere near consistently grouped. The problem asks for the expected difference in scores after 50 turns. A random sampling of scores:
1.50000000
1.78000000
1.64000000
1.64000000
1.80000000
2.02000000
2.06000000
1.56000000
1.66000000
2.04000000
I've tried a few of those as answers, but none of them have been accepted... I know some people have succeeded, so I'm really confused - what the heck am I missing?
Your problem is likely that you don't know the definition of expected value.
You will have to run the simulation multiple times, maintain the frequency of each score difference, and then take the weighted mean to get the expected value.
Of course, given that it is a Project Euler problem, there is probably a mathematical formula which can be used readily.
Yep, there is a correct answer. To be honest, Monte Carlo can theoretically home in on the expected value, thanks to the law of large numbers. However, you won't want to try it here, because in practice each run of the simulation gives a slightly different result rounded to eight decimal places (and I think this setting is exactly what deprives anybody of any realistic chance of using Monte Carlo). If you are lucky, one of your many runs will happen to deliver the exact answer, after you have submitted all the previous ones and failed. I think the captcha is Project Euler's second way of making you give up on any brute-force approach.
Well, I agree with Moron: you have to figure out "expected value" first. The principle of this problem is that you have to find a way to enumerate every possible "essential" outcome after 50 rounds. Each outcome has its own probability and its own |L - R|, so sum them up (weighted by probability) and you will have the answer. Needless to say, a brute-force approach fails in most cases, and especially in this one. Fortunately, we have dynamic programming (DP), which is fast!
Basically, DP saves the computation results of each round as states and reuses them in the next round, so it avoids repeating the same computation over and over again. The difficult part of this problem is finding a way to represent a state, that is to say, how you would like to save your intermediate results. If you have solved problem 290 with DP, you can get some hints there about how to understand the problem and formulate a state.
Actually, that isn't even the most difficult part. The hardest mental step is realizing that some memory states of the two players are numerically different but essentially equivalent, for example L:12345 R:12345 vs L:23456 R:23456, or even vs L:98765 R:98765. That is because the calls are random. It is also why I wrote possible "essential" outcomes: you can collapse several states into one, and only by doing so will your program finish in reasonable time.
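To make the state-merging idea concrete without spoiling the solution, here is a tiny hypothetical Python sketch (names are made up) of one way to canonicalize a pair of memories: relabel the concrete numbers by order of first appearance, so states that differ only in which numbers happened to be called collapse to the same DP key.

    def canonical(left, right):
        # Relabel concrete numbers by order of first appearance. Because each
        # call is uniformly random, only the overlap pattern between the two
        # memories matters, not the numbers themselves.
        relabel = {}
        for v in left + right:
            relabel.setdefault(v, len(relabel))
        return (tuple(relabel[v] for v in left),
                tuple(relabel[v] for v in right))

    print(canonical((1, 2, 3, 4, 5), (1, 2, 3, 4, 5)))
    print(canonical((2, 3, 4, 5, 6), (2, 3, 4, 5, 6)))
    print(canonical((9, 8, 7, 6, 5), (9, 8, 7, 6, 5)))
    # all three print the same canonical key

The DP would then map each canonical state (plus whatever score information the transition needs) to its probability, instead of tracking every literal memory content.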
I would run your simulation a whole bunch of times and then take the weighted average of the |L - R| value over all the runs. That should get you closer to the expected value.
Just submitting one run as an answer is really unlikely to work. Imagine it were a dice-roll expected value: roll one die, score a 6, and submit that as the expected value.
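A tiny sketch of that dice analogy: one run is just one sample, while averaging many independent runs converges towards the expected value (3.5 for a fair die), which is exactly why a single simulated score difference is not an answer.

    import random

    def one_run():
        # A single simulated "observation": one roll of a fair die.
        return random.randint(1, 6)

    single = one_run()   # submitting this as the expected value is a lottery

    # Averaging many independent runs converges towards the true expectation (3.5).
    n = 1_000_000
    estimate = sum(one_run() for _ in range(n)) / n
    print(single, round(estimate, 4))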

Graph Expansion

I'm currently working on an interesting graph problem; I can't find any algorithms or other Stack Overflow questions that mention anything like this.
If I have a graph (undirected, cyclic) and a list of commonly used paths, what is the best way to reduce the average path length by adding in N more edges?
EDIT: An important point which might help: all paths start at the same node.
Answering my own question, to cover what I've already considered.
The obvious solution is simply to sort the common paths by how often they're used, slot in a connection between the two ends of each, and keep doing this until you run out of edges to insert. However, I suspect there is a more intelligent solution.
You could just try inserting all possible edges and see how much the shortest path decreases for each of your given start/end points. Pick the best edge and repeat.
The usefulness of edges depends on what other edges have been added, so if you really want optimality, you'd have to try all sets of N edges. That sounds a tad expensive. Wouldn't surprise me if it was NP-hard.
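A hedged Python sketch of that greedy idea, using NetworkX and assuming an unweighted graph; the function and parameter names (add_best_edges, source, targets) are made up, all common paths start at source and end at the nodes in targets (a path's importance could be expressed by repeating its endpoint), and it is only practical for modest graph sizes.

    from itertools import combinations
    import networkx as nx

    def add_best_edges(G, source, targets, n_new):
        # Greedy heuristic: repeatedly insert the single missing edge that most
        # reduces the summed shortest-path length from `source` to the targets.
        G = G.copy()

        def total_cost(g):
            dist = nx.single_source_shortest_path_length(g, source)
            # Unreachable targets are penalised with the node count.
            return sum(dist.get(t, g.number_of_nodes()) for t in targets)

        for _ in range(n_new):
            best_edge, best_cost = None, total_cost(G)
            for u, v in combinations(G.nodes, 2):
                if G.has_edge(u, v):
                    continue
                G.add_edge(u, v)         # try the edge...
                c = total_cost(G)
                G.remove_edge(u, v)      # ...and undo it
                if c < best_cost:
                    best_edge, best_cost = (u, v), c
            if best_edge is None:        # no remaining edge helps
                break
            G.add_edge(*best_edge)
        return G

As noted above, this only picks one edge at a time, so it will not always find the optimal set of N edges.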
Interesting question!
Another possible solution, which sounds like it might be the best heuristic, is to take the weighted average of all the end nodes (weighted by path importance), then find the node which is closest to the computed average point. Connect to that node.
Obviously that only works if the nodes are laid out in space somehow, but it's a good analogy.

Contains Test for a Constant Set

The problem statement:
Given a set of integers that is known in advance, generate code to test if a single integer is in the set. The domain of the testing function is the integers in some consecutive range.
Nothing in particular is known in advance about the range or the set to be tested. The range could be small or huge (a solution may reject problems that are too big, but higher limits are better). It could be that very few of the values in the allowed range are in the set, or most of them are, or anything in between. The set may be uniformly distributed or clustered. There may be large sections of only contained/not-contained values, or there may be at least a few of each type of value in most swaths (sort of like the assumptions made about items to be sorted when analyzing sorting algorithms).
The objective is a procedure for generating effective code for running the test.
Partial solutions that come to mind include:
perfect hash function (costly for large sets)
range tests: foreach(b in ranges) if(b.l <= v && v <= b.h) return true; (see the sketch after this list)
trees/indexes (more costly than the others in some cases)
table lookup (costly for large sets)
the inverse of any of these (kudos to Jason S)
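Here is a small sketch of the range-test option from the list above, in Python rather than generated C++: collapse the constant set into runs of consecutive integers once, then answer each query with a binary search over the runs. A code generator could emit the equivalent chain of comparisons (or a switch/if tree) from the same run list; the helper names are made up.

    from bisect import bisect_right

    def build_ranges(values):
        # Collapse a set of integers into sorted (low, high) runs.
        ranges = []
        for v in sorted(values):
            if ranges and v == ranges[-1][1] + 1:
                ranges[-1] = (ranges[-1][0], v)      # extend the current run
            elif not ranges or v > ranges[-1][1]:
                ranges.append((v, v))                # start a new run
        return ranges

    def contains(ranges, v):
        # Binary search over the run list: O(log #runs) per query.
        i = bisect_right(ranges, (v, float("inf"))) - 1
        return i >= 0 and ranges[i][0] <= v <= ranges[i][1]

    runs = build_ranges([1, 2, 3, 10, 11, 40])
    print(contains(runs, 2), contains(runs, 12))     # True False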
It seems that an ideal solution would be able to pick whichever option is best or, if none works well, use a tree to break the full range down into sections and then switch to other options for the subsections that are better suited to them.
Topic(s) that might be useful include:
Huffman coding
Note: this is not homework. If it were issued as homework below the doctoral level, the prof should be shot with a Nerf gun (if you don't get that, re-read the problem; it is very much non-trivial).
Note: this is a problem that occurred to me a few days ago and I've been puzzling over it off and on. I have no direct use for it, but I thought it would be a cool problem to attack. The reason I want to generate the code is that generated code will be no slower than general code (it can be the same thing if needed) and might be faster in some/many cases.
I'm posting this question as much to clarify my thoughts as anything. If I can come up with any reasonable or cool solutions, I plan on implementing them as a template metaprogram (the other reason for generated code).
Some people have noted that the problem is very general. That is the point. I'm hoping to generate a system that would work on a very general domain: sets of integers in some range.
A previous question on dictionary/spell-checking had a number of responses that mentioned Bloom filters; maybe that would help.
I would think that testing for large sets is going to be expensive no matter what.
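For completeness, a minimal (and deliberately naive) Python Bloom filter sketch; the class name and parameter choices are made up. It never gives false negatives, its false-positive rate is tuned by the bit count and hash count, and its memory use is independent of the range, which is what makes it interesting for very large sparse sets.

    import hashlib

    class BloomFilter:
        # Probabilistic membership test: "definitely not in" or "probably in".
        def __init__(self, num_bits=1 << 20, num_hashes=7):
            self.num_bits = num_bits
            self.num_hashes = num_hashes
            self.bits = bytearray(num_bits // 8 + 1)

        def _positions(self, value):
            # Derive several bit positions from seeded hashes of the value.
            for seed in range(self.num_hashes):
                h = hashlib.sha256(f"{seed}:{value}".encode()).digest()
                yield int.from_bytes(h[:8], "big") % self.num_bits

        def add(self, value):
            for pos in self._positions(value):
                self.bits[pos >> 3] |= 1 << (pos & 7)

        def __contains__(self, value):
            return all(self.bits[pos >> 3] & (1 << (pos & 7))
                       for pos in self._positions(value))

    bf = BloomFilter()
    for v in (3, 17, 123456789):
        bf.add(v)
    print(3 in bf, 4 in bf)   # True, almost certainly False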
let's pretend, for a moment, that this is a real question:
there are no limits on the size of the base set or the input set
this makes the "problem" unrealistic, underspecified, and unsolvable in any practical sense
if someone wants to posit a solution, here are some unit test cases:
unit test 1:
the base set is all integers between -1,000,000,000,000 and +1,000,000,000,000 except for 100,000,000,000 randomly-removed values
the input set is 100,000,000,000 randomly-generated integers in the same range
unit test 2:
the base set is the Fibonacci series
the input set is 1T randomly-generated integers in the range 0..infinity
there's also boost::dynamic_bitset, not sure how it scales for time, or in space with respect to distribution of original numbers. (e.g. if the bits are stored in chunks of 8/16/32/64, then sparse bitsets are inefficient)
or perhaps this (compressed bit set) or this (bit vector) webpage (I googled for "large sparse bit sets" and "compressed bit sets")
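And, as a toy illustration of the bit-vector end of the spectrum (the class name is made up, and a real implementation would use boost::dynamic_bitset or a compressed bitset as suggested above): one bit per value in a consecutive range gives O(1) queries, but as noted it only pays off when the set is reasonably dense within its range.

    class BitSet:
        # Flat bit-vector membership test over the consecutive range [lo, hi].
        def __init__(self, lo, hi, members=()):
            self.lo, self.hi = lo, hi
            self.bits = bytearray((hi - lo) // 8 + 1)
            for v in members:
                self.add(v)

        def add(self, v):
            i = v - self.lo
            self.bits[i >> 3] |= 1 << (i & 7)

        def __contains__(self, v):
            if not (self.lo <= v <= self.hi):
                return False
            i = v - self.lo
            return bool(self.bits[i >> 3] & (1 << (i & 7)))

    s = BitSet(-1000, 1000, members=[-3, 0, 7, 512])
    print(7 in s, 8 in s)   # True False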