What is a modifying function with amortized complexity requirement vs. one with non amortized? - language-agnostic

What is the official theoretical definition of amortized vs. non amortized complexity of a modification function? The question is especially relevant inside the C++ standard for the STL, but it's a more general question.
Can "constant" (non mutating) functions (observers, or "getters") have amortized complexity?
EDIT: clarification of apparently confusing "observers" (which can either mean observing function or external observer, according to context).

There are at least three (indeed more) independent and orthogonal axes along which asymptotic analysis of algorithms may occur: the case (best, average, worst); the bound (big-O, little-omega, Theta); and whether amortization is to be considered.
The case describes the class of input under consideration. It is literally a subset of the inputs to the algorithm. When performing asymptotic complexity analysis, it is natural to partition the input state into subsets based on the algorithm's performance w.r.t. elements of the subset. So, you might have a best case for which the algorithm does asymptotically as well as possible; a subset where it does asymptotically as poorly as possible; or, in the case of average complexity, a set of inputs along with their relative weight or probability of occurring. In the absence of a specifically described case, it's natural to assume that all inputs are included and have equal weight.
The bound describes how the algorithm's complexity behaves asymptotically for inputs of a certain class. The complexity can be bounded from above, from below, or both; the bounds can be tight or not; and while which bound you choose might in practice be informed by the case under consideration (lower bound on best case, upper bound on worst case, etc.), in theory the choice is completely independent.
Amortized analysis is performed on top of the underlying complexity analysis and contemplates not single inputs but sequences of inputs. Amortized analysis seeks to explain how the aggregate time complexity of a sequence of operations behaves. For instance, consider the simple case of inserting a new element into an array-backed vector structure. If we have enough capacity, the operation is O(1). If we lack capacity, the operation is O(n), because every existing element must be copied into a larger buffer. If we increase the capacity of the vector arithmetically (adding, e.g., k new spots each time), then we will have O(1) insertions (k-1)/k of the time, and O(n) insertions 1/k of the time, for O(n) amortized complexity per insertion. However, if we geometrically increase the capacity each time we need more capacity, then we find a sequence of adds will have O(1) amortized complexity.
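The difference between the two growth policies can be made concrete by counting element copies. The following is a minimal sketch, not any particular library's implementation; `copies_for_appends` and its `grow` parameter are hypothetical names introduced only for illustration, and k = 16 is an arbitrary choice.

```python
def copies_for_appends(n, grow):
    """Count element copies caused by reallocation while appending n items.

    `grow` maps the current capacity to the new capacity (hypothetical policy hook).
    """
    capacity, size, copies = 1, 0, 0
    for _ in range(n):
        if size == capacity:
            capacity = grow(capacity)
            copies += size  # every existing element moves to the new buffer
        size += 1
    return copies


n = 100_000
arithmetic = copies_for_appends(n, lambda c: c + 16)  # add k = 16 slots each time
geometric = copies_for_appends(n, lambda c: c * 2)    # double each time

print(arithmetic / n)  # average copies per append, grows roughly like n / (2k)
print(geometric / n)   # average copies per append, stays below 2 for any n
```

With arithmetic growth the average cost per append keeps rising as n grows, whereas with doubling it is bounded by a constant, which is what the O(1) amortized claim means.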
Truly constant functions can have amortized analysis performed but there is really no reason to do so. Amortized analysis only makes sense when a potentially small number of repeated requests have poor individual performance, while the majority (asymptotically speaking) of requests has asymptotically better performance.

Simulating a matrix of variables with predefined correlation structure

For a simulation study I am working on, we are trying to test an algorithm that aims to identify specific culprit factors that predict a binary outcome of interest from a large mixture of possible exposures that are mostly unrelated to the outcome. To test this algorithm, I am trying to simulate the following data:
A binary dependent variable
A set of, say, 1000 variables, most binary and some continuous, that are not associated with the outcome (that is, are completely independent from the binary dependent variable, but that can still be correlated with one another).
A group of 10 or so binary variables which will be associated with the dependent variable. I will a-priori determine the magnitude of the correlation with the binary dependent variable, as well as their frequency in the data.
Generating a random set of binary variables is easy. But is there a way of doing this while ensuring that none of these variables are correlated with the dependent outcome?
Thank you!
"But is there a way of doing this while ensuring that none of these variables are correlated with the dependent outcome?"
With statistical sampling you can't ensure anything, you can only adjust the acceptable risk. Finding an acceptable level of risk may be harder than many people think.
Spurious correlations are a very real phenomenon. Real independent observations will often contain correlations, and if you want to actually test your algorithm to see how it will perform in reality then your tests should produce such phenomena in a manner similar to the real world—you should be generating independent candidate factors and allowing spurious correlations to occur.
If you are performing ~1000 independent tests of candidate factors, and you're targeting a risk level of α = 0.05, you can expect about 50 non-predictive factors to appear significant by chance and leak through into your analysis. To avoid this, you need to adjust your testing threshold using something along the lines of a Bonferroni correction. Recall that statistical discriminating power is based on standard error, which is inversely proportional to the square root of the sample size. Bonferroni says that 1000 simultaneous tests need their individual test threshold to be tightened by a factor of 1000 (α/1000 = 0.00005 per test), which in turn means the sample size needs to be larger than when performing a single test in order to detect the same effect.
So in summary I'd say that you shouldn't attempt to ensure lack of correlation, it's going to occur in the real world. You can mitigate the risk of non-predictive factors being included due to spurious correlation by generating massive amounts of data. In practice there will be non-predictors that leak through unless you can obtain enough data, so I'd suggest that your testing should address the rates of occurrence as a function of number of candidate factors and the sample size.
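The leak-through rate described in this answer can be checked with a small simulation. This is only a sketch under stated assumptions: all factors are binary with frequency 0.5, the test is a large-sample normal approximation on the Pearson correlation, and the function names (`p_value_independence`, `count_false_positives`) are made up for illustration.

```python
import math
import random


def p_value_independence(x, y):
    """Two-sided p-value for zero correlation (large-sample normal approximation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 1.0  # a constant column carries no evidence of association
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    z = r * math.sqrt(n)  # approximately standard normal under independence
    return math.erfc(abs(z) / math.sqrt(2))


def count_false_positives(n_samples, n_factors, alpha, seed=0):
    """Count independent factors flagged as significant, with and without Bonferroni."""
    rng = random.Random(seed)
    outcome = [rng.random() < 0.5 for _ in range(n_samples)]
    naive = corrected = 0
    for _ in range(n_factors):
        # Generated independently of the outcome, so any "hit" is spurious.
        factor = [rng.random() < 0.5 for _ in range(n_samples)]
        p = p_value_independence(factor, outcome)
        naive += p < alpha
        corrected += p < alpha / n_factors  # Bonferroni-adjusted threshold
    return naive, corrected


naive, corrected = count_false_positives(n_samples=500, n_factors=1000, alpha=0.05)
print(naive, corrected)  # naive is typically near 50; corrected is typically 0
```

Rerunning this over a grid of `n_samples` and `n_factors` values gives the rates of occurrence as a function of sample size and number of candidate factors.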

Most efficient computational method to numerically minimize an 8-variable constrained system

I have been working for quite some time on finding a numerical instance of a solution to an 8-variable system of 7 very complicated inequalities plus a region specification. Unfortunately I cannot produce an MWE or anything of the sort, since the inputs are really long.
My current method is Mathematica's NMinimize routine, minimizing one of the 7 inequalities subject to every other condition as a constraint. The FindInstance command simply quits the kernel without being able to finish running.
NMinimize is able to produce output, but besides being slower than would be optimal, it produces results that do not obey every constraint.
The thing is that I need to be certain, for each benchmark I run, that if the output doesn't satisfy every constraint it is because such a set of real numbers doesn't exist -- which with my current method I can't be, by experience.
So: is there a foolproof, as efficient as possible, computational method for me to find a single instance of numerical solution to 7 complicated inequalities (involving trigonometric functions) of 8 variables or be sure that such a set doesn't exist?
It could be a Mathematica/python/fortran package, genetic algorithm or anything -- as long as there is clear enough documentation.
You need to give the constraints an importance multiplier (a penalty weight), and the optimization method should not be greedy.
A genetic algorithm combined with multiple starting points (or simulated annealing for diminishing mutations) tends to converge to the global minimum (hence not greedy) the more time it is given, but there is no guarantee that the heuristic will complete X function in Y time.
In genetic algorithm, you can add big constraint penalties like this:
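fitness_minima = some_function_output_between_1_and_10 +
                 (constraints_breached ? 1000.0f : 0.0f);
so that the DNAs with no constraint violations will be favored for the crossover part of the GA. (The parentheses matter: without them the addition binds tighter than ?: and the whole sum becomes the condition.)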
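A penalty-based fitness of this kind might look as follows in a small evolutionary loop. This is a toy sketch, not the poster's actual system: the objective, the two constraints, the search box, and all names (`objective`, `violations`, `evolve`, `PENALTY`) are assumptions chosen so the example is self-contained.

```python
import random

PENALTY = 1000.0  # chosen to dwarf objective values seen during the search


def objective(x, y):
    # Toy stand-in for the expression being minimized.
    return (x - 1.0) ** 2 + (y - 2.0) ** 2


def violations(x, y):
    # Toy constraints: x >= 0 and x + y <= 2. Returns how many are breached.
    return int(x < 0.0) + int(x + y > 2.0)


def fitness(ind):
    x, y = ind
    return objective(x, y) + PENALTY * violations(x, y)


def evolve(generations=200, pop_size=50, seed=1):
    rng = random.Random(seed)
    pop = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(pop_size)]
    for gen in range(generations):
        pop.sort(key=fitness)
        survivors = pop[: pop_size // 2]  # elitism: the best half is kept unchanged
        sigma = 1.0 - gen / generations + 0.01  # mutation size shrinks over time
        children = []
        for _ in range(pop_size - len(survivors)):
            a, b = rng.sample(survivors, 2)
            # Crossover by averaging the parents, then Gaussian mutation.
            children.append(((a[0] + b[0]) / 2 + rng.gauss(0, sigma),
                             (a[1] + b[1]) / 2 + rng.gauss(0, sigma)))
        pop = survivors + children
    return min(pop, key=fitness)


best = evolve()
print(best, fitness(best), violations(*best))
```

Because any feasible candidate scores below `PENALTY` here, feasible individuals always outrank infeasible ones. As with any heuristic of this kind, failing to find a feasible point does not prove that none exists.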
"As efficient as possible" depends on your algorithm. If you can parallelize the algorithm and run it on multiple GPUs, it should give a substantial speedup over a CPU. Compared to some hours of Mona-Lisa painting on a CPU, a parallelized version running on 3 low-end GPUs completes within 10 minutes (https://www.youtube.com/watch?v=QRZqBLJ6brQ). At least some OpenCL/CUDA-supporting libraries/frameworks (like TensorFlow) should be able to accelerate your algorithm if you don't want to do the work distribution yourself.

Alternatives to the Big-O notation?

Good afternoon all,
We say that a hashtable has O(1) lookup (provided that we have the key), whereas a linked list has O(1) lookup for the next node (provided that we have a reference to the current node).
However, due to how the Big-O notation works, it is not very useful in expressing (or differentiating) the cost of an algorithm x, vs the cost of an algorithm x + m.
For example, even though we label both the hashtable's lookup and the linked list's lookup as O(1), these two O(1)s boil down to a very different number of steps indeed.
The linked list's lookup is fixed at x number of steps. However, the hashtable's lookup is variable. The cost of the hashtable's lookup depends on the cost of the hashing function, so the number of steps required for the hashtable's lookup is: x + m,
where x is a fixed number
and m is an unknown variable value
In other words, even though we call both operations O(1), the cost of the hashtable's lookup is a magnitude higher than the cost of the linked list's lookup.
The Big-O notation is specifically about the size of the input data collection. This does have its advantages, but it has its disadvantages as well, as can be seen when we collapse and normalize all non-n variables into 1. We cannot see the m variable (the hashing function) inside it anymore.
Besides the Big-O notation, is there another (established) notation we can use for expressing the fixed-cost O(1), which means x operations, and the variable-cost O(1), which means x + m (m, the hashing function) operations?
"literal O(1) which means exactly 1 operation"
Except it doesn't. Big-O notation concerns the relative comparison of complexity in relation to an input. If the algorithm takes a constant number of steps, completely independent of the size of your input, then the exact number of steps doesn't matter.
Take a look at the (informal) definition of f(n) = O(g(n)):
It means: there is a certain constant k such that, for each sufficiently large n, f(n) is at most k times g(n).
In the case above, the hashtable lookup and linked list lookup would be f, and g would be g(n) = 1. For each case, you are able to find a k that f(n) <= g(n) * k.
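Written out, this is the standard textbook definition; the threshold n_0 is part of that definition rather than something stated in this answer, and x and m are the step counts from the question:

```latex
f(n) = O(g(n)) \iff \exists\, k > 0,\ \exists\, n_0 \ \text{such that}\ \forall n \ge n_0:\ f(n) \le k \cdot g(n)

% Linked-list next-node lookup:  f(n) = x      \le x \cdot 1        (take k = x)
% Hashtable lookup:              f(n) = x + m  \le (x + m) \cdot 1  (take k = x + m)
```

In both cases g(n) = 1 works, only with a different constant k, which is why the notation cannot tell the two apart.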
Now, this k doesn't need to be fixed: it can vary depending on platform, implementation, and specific hardware. The only interesting point is that it exists. That's why both hashtable lookup and linked list node lookup are O(1): both have a constant complexity, regardless of input. And when evaluating algorithms, that's what is interesting, not the physical steps.
Specifically concerning the Hashtable lookup
Yes, the hash function does take a variable number of operations (depending on implementation). However, it doesn't take a variable number of operations depending on the size of the input. Big-O notation is specifically about the size of the input data collection. A hash function takes a single element. For the evaluation of an algorithm it doesn't matter whether a certain function takes 10, 20, 50 or 100 operations: if the number of operations doesn't increase with the input size, it is O(1). There is no way to distinguish this in Big-O notation, as this isn't what Big-O notation is about.
"~" (tilde notation) includes the constant factor; see the family of Bachmann-Landau notations.
The issue is that the "number of operations" is highly context dependent. In fact, that's why big-O notation was invented -- it seems to work rather well in modelling a broad number of computers.
Besides, what a programmer thinks the number of "ops" is doesn't determine how much time it actually takes (e.g. is it already in cache?) or how many steps the hardware actually takes (what does your processor do -exactly-? Does it have micro-ops?) or even how many operations are dictated to the processor (what is your compiler doing for you?). And those are all concerns, even when you try to define a precise concept that's abstract enough to be useful.
So. For now, it's Big-O vs. "operations" -- whatever "operations" means to you and your colleagues at the time.

What's the term to describe an algorithm that has the same complexity as the underlying problem?

Whilst researching data structures for a project a few months back, I came across a term that I quite liked, that could be used as follows:
This [Algorithm/Solution/Data structure] is ?????ally-optimal
Meaning that the time (or space, depending on context) complexity of the solution being referred to is the same as the fundamental complexity of the problem it solves.
For example, if we ignore quantum computation and accept that the problem of sorting is O(n log n) time in the general case, then with respect to time complexity heap sort is ?????ally-optimal because its complexity is also O(n log n), whereas bubble sort is not ?????ally-optimal because O(n^2) is worse than O(n log n).
I have no idea where I read it, I've so far failed to find it with google, and not being able to remember it has been bothering me ever since!
Maybe you are talking about Asymptotically optimal algorithm:
In computer science, an algorithm is said to be asymptotically optimal if, roughly speaking, for large inputs it performs at worst a constant factor (independent of the input size) worse than the best possible algorithm.
Are you thinking of "computationally optimal"? Probably "asymptotically optimal," like another answer said. It seems what you're describing is big-theta:
If a problem has been proven to take at least f(x), it is called Omega(f(x)); an algorithm's worst case is big-O(g(x)). When f(x) == g(x), that is to say the worst case for the solution is the best case for the problem, the algorithm is big-theta(f(x)). So heapsort, e.g. is theta(n*log(n)).

Hashtable and list side by side?

I need a data structure that is ordered but also gives fast random access, inserts, and removes. Linked lists are ordered and fast at inserts and removes, but they give slow random access. Hashtables give fast random access but are not ordered.
So, it seems nice to use both of them together. In my current solution, my Hashtable holds iterators into the list and the List contains the actual items. Nice and effective. Okay, it requires double the memory but that's not an issue.
I have heard that some tree structures could do this also, but are they as fast as this solution?
The most efficient tree structure I know is Red Black Tree, and it's not as fast as your solution as it has O(log n) for all operations while your solution has O(1) for some, if not all, operations.
If memory is not an issue and you are sure your solution is O(1), meaning the time required to add/delete/find an item in the structure is not related to the number of items you have, go for it.
You should consider a Skip List, which is an ordered linked list with O(log n) access times. In other words, you can enumerate it in O(n), and index/insert/delete is O(log n).
Trees are made for this. The most appropriate are self-balancing trees like the AVL tree or the Red Black tree. If you deal with very large amounts of data, it may also be useful to use a B-tree (they are used for filesystems, for example).
Concerning your implementation: it may be more or less efficient than trees depending on the amount of data you work with and the HashTable implementation. E.g. some hash tables with very dense data may give access not in O(1) but in O(log n) or even O(n). Also remember that computing the hash of the data takes some time too, so for quite small amounts of data the absolute time for computing the hash may be more than for searching in a tree.
What you did is pretty much the right choice.
The cool thing about this is that adding ordering to an existing map implementation by using a double-ended doubly-linked list doesn't actually change its asymptotic complexity, because all the relevant list operations (appending and deleting) have a worst-case step complexity of Θ(1). (Yes, deletion is Θ(1), too. The reason it is usually Θ(n) is because you have to find the element to delete first, which is Θ(n), but the actual deletion itself is Θ(1). In this particular case, you let the map do the finding, which is something like Θ(1) amortized worst-case step complexity or Θ(log_b n) worst-case step complexity, depending on the type of map implementation used.)
The Hash class in Ruby 1.9, for example, is an ordered map, and it is implemented at least in YARV and Rubinius as a hash table embedded into a linked list.
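The combination discussed in this thread can be sketched in a few lines. This is a minimal illustration, not the Ruby or YARV implementation; the class and method names (`OrderedMap`, `put`, `get`, `remove`, `items`) are hypothetical, and Python's built-in `dict` stands in for the hash table.

```python
class _Node:
    __slots__ = ("key", "value", "prev", "next")

    def __init__(self, key=None, value=None):
        self.key, self.value = key, value
        self.prev = self.next = None


class OrderedMap:
    """Hash table of key -> list node, plus a doubly linked list for insertion order."""

    def __init__(self):
        self._index = {}
        self._head = _Node()  # sentinels; real nodes sit between head and tail
        self._tail = _Node()
        self._head.next, self._tail.prev = self._tail, self._head

    def put(self, key, value):
        node = self._index.get(key)
        if node is not None:
            node.value = value  # an existing key keeps its position
            return
        node = _Node(key, value)
        last = self._tail.prev
        last.next, node.prev = node, last
        node.next, self._tail.prev = self._tail, node
        self._index[key] = node

    def get(self, key):
        return self._index[key].value

    def remove(self, key):
        node = self._index.pop(key)  # the hash table does the finding
        node.prev.next, node.next.prev = node.next, node.prev  # constant-time unlink

    def items(self):
        node = self._head.next
        while node is not self._tail:
            yield node.key, node.value
            node = node.next


m = OrderedMap()
for k in ("c", "a", "b"):
    m.put(k, k.upper())
m.remove("a")
print(list(m.items()))  # [('c', 'C'), ('b', 'B')]
```

In practice Python's standard library already offers `collections.OrderedDict`, so a hand-rolled version like this is mainly useful for seeing where each cost comes from.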
Trees generally have a worst-case step complexity of Θ(log_b n) for random access, whereas hash tables may be worse in the worst case (Θ(n)), but usually amortize to Θ(1), provided you don't screw up the hash function or the resize function.
[Note: I'm deliberately only talking about asymptotic behavior here, aka "infinitely large" collections. If your collections are small, then just choose the one with the smallest constant factors.]
Java actually contains a LinkedHashMap, which is similar to the data structure you're describing. It can be surprisingly useful at times.
Tree structures could work as well, seeing they can perform random access (and most other operations) in O(log n) time. Not as fast as Hashtables, which are O(1), but still fast unless your database is very large.
The only real advantage of trees is that you don't need to decide on the capacity beforehand. Some HashTable implementations can grow their capacity as needed, but simply do so by copying all items into a new, larger hashtable when they've exceeded their capacity, which is very slow: O(n).