What is a good quick pathfinding algorithm? - language-agnostic

What is a good path finding algorithm when you care the amount of time it takes but not how long the path is.
Also is there a faster algorithm if you don't care about the path at all but just want to check reachability.
(Is Flood Fill a good algorithm for this sort of stuff?)

What kind of graph are you finding a path on? Is it a grid? Is it a weight graph?
These things all matter.
Some algorithms that may be helpful include
Breadth First Search
Depth First Search
Dijkstra's Algorithm
A* (A Star)
Floyd Warshall's Algorithm
The Bellman Ford Algorithm

Related

Convolutional filter design in neural networks by data clustering

My understanding is that filters in convolutional neural networks are going to extract features in raw data (or previous layers), so designing them by supervised learning through backpropagation makes complete sense. But I have seen some papers in which the filters are found by unsupervised clustering of input data samples. That looks strange to me how cluster centers can be regarded as good filters for feature extraction. Does anybody have a good explanation for that?
Certain popular clustering algorithms such as k-means are vector quantization methods.
They try to find a good least-squares quantization of the data, such that every data point can be represented by a similar vector with least-squares difference.
So from a least-squares approximation point of view, the cluster centers are good approximations (we can't afford to find the optimal centers, but we have a good chance at finding reasonably good centers). Whether or not least squares is appropriate depends a lot on the data, for example all attributes should be of the same kind. For a typical image processing task, where each pixel is represented the same way, this will be a good starting point for later supervised optimization. But I believe soft factorizations will usually be better that do not assume every patch is of exactly one kind.

Proving equivalence of programs

The ultimate in optimizing compilers would be one that searched among the space of programs for a program equivalent to the original but faster. This has been done in practice for very small basic blocks: https://en.wikipedia.org/wiki/Superoptimization
It sounds like the hard part is the exponential nature of the search space, but actually it's not; the hard part is, supposing you find what you're looking for, how do you prove that the new, faster program is really equivalent to the original?
Last time I looked into it, some progress had been made on proving certain properties of programs in certain contexts, particularly at a very small scale when you are talking about scalar variables or small fixed bit vectors, but not really on proving equivalence of programs at a larger scale when you are talking about complex data structures.
Has anyone figured out a way to do this yet, even 'modulo solving this NP-hard search problem that we don't know how to solve yet'?
Edit: Yes, we all know about the halting problem. It's defined in terms of the general case. Humans are an existence proof that this can be done for many practical cases of interest.
You're asking a fairly broad question, but let me see if I can get you going.
John Regehr does a really nice job surveying some relevant papers on superoptimizers: https://blog.regehr.org/archives/923
The thing is you don't really need to prove whole program equivalence for these types of optimizations. Instead you just need to prove that given the CPU is in a particular state, 2 sequences of code modify the CPU state in the same way. To prove this across many optimizations (i.e. at scale), typically you might first throw some random inputs at both sequences. If they're not equivalent bits of code then you might get lucky and very quickly show this (proof by contradiction) and you can move on. If you haven't found a contradiction, you can now try to prove equivalence via a computationally expensive SAT solver. (As an aside, the STOKE paper that Regehr mentions is particularly interesting if you're interested in superoptimizers.)
Now looking at whole program semantic equivalence, one approach here is the one used by the CompCert compiler. Essentially that compiler is proving this theorem:
If CompCert is able to translate C code X into assembly code Y then X and Y are semantically equivalent.
In addition CompCert does apply a few compiler optimizations and indeed these optimizations are often the areas that traditional compilers get wrong. Perhaps something like CompCert is what you're after in which case, the compiler goes about things via a series of refinement passes where it proves that if each pass succeeds, the results are semantically equivalent to the previous pass.

Real roots of a system of 8 polynomial equations of degree 3

I would like to obtain ALL the real roots of a system of 8 polynomial equations in 8 variables, where the maximum degree of each equation is 3. Is this doable? What is the best software to do this?
You will need to use a multi-dimensional root finding method.
Newton's should be fairly easy to implement (simple algorithm) and you may find some details about this and other methods here.
As far as I understand, it's not likely to be straightforward to solve 8 simultaneous cubic equations with a very general direct method.
Depending on your problem, you may also want to check if it's analogous to the problem of fitting a cubic spline to a set of points. If it is, then a simple direct algorithm is available here.
For the initial conditions, you might need to use well-dispersed random starting points in your domain of interest. You could use sobol sequences or some other low-discrepancy random number generator to efficiently generate random numbers to fill-out the space under consideration.
You may also consider moving this question to math exchange where you might have a better chance of having this answered correctly.

Cosine in floating point

I am trying to implement the cosine and sine functions in floating point (but I have no floating point hardware).
Since my processor has no floating-point hardware, nor instructions, I have already implemented algorithms for floating point multiplication, division, addition, subtraction, and square root. So those are the tools I have available to me to implement cosine and sine.
I was considering using the CORDIC method, at this site
However, I implemented division and square root using newton's method, so I was hoping to use the most efficient method.
Please don't tell me to just go look in a book or that "paper's exist", no kidding they exist. I am looking for names of well known algorithms that are known to be fast and efficient.
First off, depending on your accuracy requirements, this can be considerably fussier than your earlier questions.
Now that you've been warned: you'll first want to reduce the argument modulo pi/2 (or 2pi, or pi, or pi/4) to get the input into a manageable range. This is the subtle part. For a nice discussion of the issues involved, download a copy of K.C. Ng's ARGUMENT REDUCTION FOR HUGE ARGUMENTS: Good to the Last Bit. (simple google search on the title will get you a pdf). It's very readable, and does a great job of describing why this is tricky.
After doing that, you only need to approximate the functions on a small range around zero, which is easily done via a polynomial approximation. A taylor series will work, though it is inefficient. A truncated chebyshev series is easy to compute and reasonably efficient; computing the minimax approximation is better still. This is the easy part.
I have implemented sine and cosine exactly as described, entirely in integer, in the past (sorry, no public sources). Using hand-tuned assembly, results in the neighborhood of 100 cycles are entirely reasonable on "typical" processors. I don't know what hardware you're dealing with (the performance will mostly be gated on how quickly your hardware can produce the high part of an integer multiply).
For various levels of precision, you can find some good approximations here:
http://www.ganssle.com/approx.htm
With the added advantage that they are deterministic in runtime unlike the various "converging series" options which can vary wildly depending on the input value. This matters if you are doing anything real-time (games, motion control etc.)
Since you have the basic arithmetic operations implemented, you may as well implement sine and cosine using their taylor series expansions.

What kind of learning algorithm would you use to build a model of how long it takes a human to solve a given Sudoku situation?

I don't have much experience in machine learning, pattern recognition, data mining, etc. and in their underlying theory and systems.
I would like to develop an artificial model of the time it takes a human to make a move in a given Sudoku puzzle.
So what I'm looking for as an output from the machine learning process is a model that can give predictions on how long does it take for a target human to make a move in a given Sudoku situation.
Same input doesn't always map to same outcome. It takes different times for the human to make a move with the same situation, but my hypothesis is that there's a tendency in the resulting probability distribution. (My educated guess is that it is ~normal.)
I have ideas about the factors that influence the distribution (like #empty slots) but would preferably leave it to the system to figure these patterns out. Please notice, that I'm not interested in the patterns, just the model.
I can generate sample and test data easily by running sudoku puzzles and measuring the times it takes to make the moves.
What kind of learning algorithm would you suggest to use for this?
I was thinking NNs, but I'm not sure if they can have the desired property of giving weighted random outcomes for the same input.
If I understand this correctly you have an input vector of length 81, which contains 1 if the square is filled in and 0 otherwise. You want to learn a function which returns a probability distribution which models the response time of a human to that board position.
My first response would be that this is a regression problem and you should try straightforward linear regression. This will not provide you with a distribution of response times, but a single 'best-guess' response time.
I'm not clear on why you want to model a distribution of response times. However, if you really want to do want to output a distribution then it sounds like you want to look at Bayesian methods. I'm not really an expert on Bayesian inference, so I can't help you much further here.
However, I don't really think your approach is going to work because I agree with your intuition about features such as the number of empty slots being important. There are also other obvious features, such as the number of empty slots per row/column that are likely to be important. Explicitly putting these features in your representation will probably be much more successful than expecting that the learning algorithm will infer something similar on its own.
The monte carlo method seems like it would work well here but would require a stack of solutions the size of the moon to really do it. And it wouldn't give you the time per person, just the time on average.
My understanding of it, tenuous as it is, is that you have a database with a board position and the time it took a human to make the next move. At the very least you have a starting point for most moves. Even if it's not in the database you could start to calculate how long it would take to make a move based on some algorithm. Though I know you had specified you wanted machine learning to do this it might be worth segmenting the problem into something a little smaller then building on it.
If you have some guesstimate as to what influences the function (# of empty cell, etc), try to train a classifier on a vector of features, and not on the 81 cells vector (0/1 or 0..9, doesn't really matter for my argument).
I think that your claim:
we wouldn't have to necessary know the underlying patterns, the "trained patterns" in a learning system automatically encodes these sometimes quite delicate and subtle patterns inside them -- that's one of their great power
is wrong. you do have to give the network the right domain. for example, when trying to detect object in an image, working in the pixel domain is pointless. you'll only get results if you first run some feature detection to detect edges, corners, etc.
Theoretically, with enough non-linearity (in NN - enough layers in the network) it can detect such things, but in practice, I have never seen that work, without giving the classifier the right features to work with.
I was thinking NNs, but I'm not sure if they can have the desired property of giving weighted random outcomes for the same input.
You're just trying to learn a function from 2^81 or 10^81 (or a much smaller feature space as I suggest) to R (response time between 0 and Inf) or some discretization of that. So NN and other classifiers can do that.