2nd order centered finite-difference approximation - numerical-methods

This question may sound mathematical, but it's more of a programming question related to discretization, so I decided to ask it here.
The problem is to find a 2nd order finite difference approximation of the mixed partial derivative u_xy, where u is a function of x and y.
Page 5 of this pdf I found builds a centered difference approximation of it in two steps. It first applies the 2nd order centered finite-difference approximation to one of the partials, and then inserts the approximation of the second partial into it (using the same formula):
Inserting lines 2 and 3 into 1 gives (according to the pdf) the following:
The last O[(Δx)^2, (Δy)^2] is what I have a problem with. Notice that when the O(Δy)^2 terms of lines 2 and 3 go into the numerator of 1, they are divided by the Δx in the denominator. So how come the residual terms in line 3 are of O(Δy)^2 instead of O(Δy^2/Δx)? Would this still be a '2nd order' approximation? (If, say, the grid spacing along both axes is the same (Δx = Δy = h), the term is of order h^2/h = h, not h^2.)
My suggestion would be to use a higher order approximation (3rd or more) in lines 2 and 3 in order to survive the division by Δx and still have the final expression in 2nd order. But I may be missing something here.

If I remember correctly, if you write more terms in the Taylor expansions, it quickly becomes obvious that the higher order terms cancel out. That is, that the "O(dy)^2 - O(dy)^2" that you'd get after substitution of (2) and (3) in the numerator of (1) actually does become zero.
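The cancellation can also be checked numerically. The sketch below assumes the pdf's final formula is the usual four-point centered stencil for u_xy with Δx = Δy = h; the test function `u` and the evaluation point are arbitrary choices made for illustration. The leading Δy^2 error terms of lines 2 and 3 are the same function evaluated at x+Δx and x-Δx, so their difference is itself proportional to Δx and survives the division. If that reasoning holds, halving h should cut the error by roughly a factor of 4.

```python
import math

def u(x, y):
    # Arbitrary smooth test function (an assumption, not from the pdf).
    return math.sin(x) * math.exp(2.0 * y)

def u_xy_exact(x, y):
    # Analytic mixed derivative of the test function above.
    return 2.0 * math.cos(x) * math.exp(2.0 * y)

def u_xy_centered(f, x, y, h):
    # Four-point centered stencil obtained by nesting two centered differences.
    return (f(x + h, y + h) - f(x + h, y - h)
            - f(x - h, y + h) + f(x - h, y - h)) / (4.0 * h * h)

x0, y0 = 0.7, 0.3
err_h = abs(u_xy_centered(u, x0, y0, 0.1) - u_xy_exact(x0, y0))
err_half = abs(u_xy_centered(u, x0, y0, 0.05) - u_xy_exact(x0, y0))

# A ratio near 4 indicates second-order convergence; near 2 would mean first order.
ratio = err_h / err_half
print(ratio)
```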

You have two 1st order slopes that, combined, give a 1st order plane. You are not gaining anything in terms of order by combining the two slopes to get (∂u/∂x)*(∂u/∂y).
This is still a 1st order approximation and if needed you will need to use more points into the finite difference to get higher order terms.
I think the notation of (∂²u/∂x∂y) is confusing the matter. Use the product of two 1st order operators to be more clear on what is going on.


perMANOVA for small sample size

I have data of 6 groups with sample size of n = 2, 10, 2, 9, 3, 1 and I want to perform Permutational multivariate analysis of variance (PERMANOVA) on these data.
My question is: Is it correct to run perMANOVA on these data with such small sample sizes? The results look strange to me because the group with n = 1 showed no significant difference from the other groups, although the graphical representation of the groups clearly shows a difference.
Thank you
I would not trust any result with group of n=1 because there is no source of variation to define difference among groups.
I have also received some answers from other platforms. I put them here for information:
The sample size is simply too small to yield a stable solution via manova. Note that the n = 1 cell contributes a constant value for that cell's mean, no matter what you do by way of permutations.
Finally, note that the effective per-cell sample size with unequal cell n for one-way designs tracks well to the harmonic mean of n. For your data set as it stands, that means an "effective" per-cell n of about 2.4. Unless differences are gigantic on the DV set, no procedure (parametric or exact/permutation) will have the statistical power to detect differences with that size.
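The "about 2.4" figure can be reproduced directly from the group sizes given in the question. This is only a quick check of the harmonic mean, using Python's standard library:

```python
from statistics import harmonic_mean

# Group sizes as stated in the question.
group_sizes = [2, 10, 2, 9, 3, 1]

# Harmonic mean = number of groups / sum of reciprocals of the group sizes.
effective_n = harmonic_mean(group_sizes)
print(round(effective_n, 2))
```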
MANOVA emphasizes the attribute scattering in the study group and the logic of this analysis is based on the scattering of scores. It is not recommended to use small groups with one or more people (I mean less than 20 people) to perform parametric tests such as MANOVA. In my opinion, use non-parametric tests to examine small groups.

Find the Relationship Between Two Logarithmic Equations

No idea if I am asking this question in the right place, but here goes...
I have a set of equations that were calculated based on numbers ranging from 4 to 8. So an equation for when this number is 5, one for when it is 6, one for when it is 7, etc. These equations were determined from graphing a best fit line to data points in a Google Sheet graph. Here is an example of a graph...
Example...
When the number is between 6 and 6.9, this equation is used: windGust6to7 = -29.2 + (17.7 * log(windSpeed))
When the number is between 7 and 7.9, this equation is used: windGust7to8 = -70.0 + (30.8 * log(windSpeed))
I am using these equations to create an image in python, but the image is too choppy since each equation covers a range from x to x.9. In order to smooth this image out and make it more accurate, I really would need an equation for every 0.1 change in number. So an equation for 6, a different equation for 6.1, one for 6.2, etc.
Here is an example output image that is created using the current equations:
So my question is: Is there a way to find the relationship between the two example equations I gave above in order to use that to create a smoother looking image?
This is not about logarithms; for the purposes of this derivation, log(windspeed) is a constant term. Rather, you're trying to find a fit for your mapping:
6 (-29.2, 17.7)
7 (-70.0, 30.8)
...
... and all of the other numbers you have already. You need to determine two basic search parameters:
(1) Where in each range is your function an exact fit? For instance, for the first one, is it exactly correct at 6.0, 6.5, 7.0, or elsewhere? Change the left-hand column to reflect that point.
(2) What sort of fit do you want? You are basically fitting a pair of parameterized equations, one for each coefficient:
x    y          x    y
6    -29.2      6    17.7
7    -70.0      7    30.8
For each of these, you want to find the coefficients of a good matching function. This is a large field of statistical and algebraic study. Since you have four ranges, you will have four points for each function. It is straightforward to fit a cubic equation to each set of points in Cartesian space. However, the resulting function may not be as smooth as you like; in such a case, you may well find that a 4th- or 5th- degree function fits better, or perhaps something exponential, depending on the actual distribution of your points.
You need to work with your own problem objectives and do a little more research into function fitting. Once you determine the desired characteristics, look into scikit for fitting functions to do the heavy computational work for you.
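As a starting point, the simplest possible fit is a straight line between neighbouring coefficient pairs. The sketch below assumes each equation is exact at the lower end of its range (6.0 and 7.0), which is a guess you would need to check against your data, and it uses only the two equations quoted in the question. The base of `log` is also an assumption: natural log is the default here, and you can pass `math.log10` if that is what your fit used. `wind_gust` is a hypothetical helper name.

```python
import math
import numpy as np

# Anchor points and the fitted coefficients from the two example equations.
anchors = np.array([6.0, 7.0])
intercepts = np.array([-29.2, -70.0])
slopes = np.array([17.7, 30.8])

def wind_gust(number, wind_speed, log=math.log):
    # Linearly interpolate each coefficient, then apply the usual formula.
    a = float(np.interp(number, anchors, intercepts))
    b = float(np.interp(number, anchors, slopes))
    return a + b * log(wind_speed)

# One value for every 0.1 step between 6 and 7 at a fixed wind speed.
for number in np.arange(6.0, 7.01, 0.1):
    print(round(float(number), 1), wind_gust(number, 20.0))
```

Note that `np.interp` holds the end values constant outside the anchor range, so add the remaining anchors (4, 5, 8) before using it across the full range. A cubic or spline fit can replace the linear interpolation if the result is still not smooth enough.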

Do we input only 1s for minterms and 0s for maxterms?

This has been bugging me since a long time.
Suppose I have a boolean function F defined as follows:
Now, it can be expressed in its SOP form as:
F = bar(X)Ybar(Z)+ XYZ
But I fail to understand why we always complement the 0s to express them as 1. Is it assumed that the inputs X, Y and Z will always be 1?
What is the practical application of that? All the youtube videos I watched on this topic, how to express a function in SOP form or as sum of minterms but none of them explained why we need this thing? Why do we need minterms in the first place?
As of now, I believe that we design circuits to yield and take only 1 and that's where minterms come in handy. But I couldn't get any confirmation of this thing anywhere so I am not sure I am right.
Maxterms are even more confusing. Do we design circuits that would yield and take only 0s? Is that the purpose of maxterms?
Why do we need minterms in the first place?
We do not need minterms, we need a way to solve a logic design problem, i.e. given a truth table, find a logic circuit able to reproduce this truth table.
Obviously, this requires a methodology. Minterms and sum-of-products are one means of achieving that. Maxterms and product-of-sums are another. In either case, you get an algebraic representation of your truth table and you can either implement it directly or try to apply standard theorems of boolean algebra to find an equivalent, but simpler, representation.
But these are not the only tools. For instance, with Karnaugh maps, you rewrite your truth table with some rules and you can simultaneously find an algebraic representation and reduce its complexity, and it does not consider minterms. Its main drawback is that it becomes unworkable if the number of inputs rises and it cannot be considered as a general way to solve the problem of logic design.
It happens that minterms (or maxterms) do not have this drawback, and can be used to solve any problem. We get a truth table and we can directly convert it into an equation with ands, ors and nots. Minterms are perhaps somewhat simpler for human beings than maxterms, but that is just a matter of taste or of a reduced number of parentheses; they are actually equivalent.
But I fail to understand why we always complement the 0s to express them as 1. Is it assumed that the inputs X, Y and Z will always be 1?
Assume that we have a truth table with only a single output at 1, for instance line 3 of your table. It means that when x=0, y=1 and z=0, the output will be one. So, can I express that in boolean logic? With the SOP methodology, we say that we want a solution for this problem that is an "and" of entries or of their complements. And obviously the solution is "x must be false and y must be true and z must be false", or equivalently "(not x) must be true and y must be true and (not z) must be true", hence the minterm /x.y./z. So complementing when we have a 0 and leaving unchanged when we have a 1 is a way to find the equation that will be true exactly when xyz=010.
If I have another table with only one output at 1 (for instance line 8 of your table), we can find similarly that I can implement this TT with x.y.z.
Now if I have a TT with 2 lines at 1, one can use the property of OR gates and take the OR of the previous circuits. When the output of the first one is 1, it forces the overall output to 1, and ditto for the second. And we directly get the solution for your table: /x.y./z + x.y.z
This can be extended to any number of ones in the TT and gives a systematic way to find an equation equivalent to a truth table.
So just think of minterms and maxterms as a tool to translate a TT into equations. What is important is the truth table (that describes the behaviour of what you want to do) and the equations (that give you a way to realize it).
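The translation from truth table to equation described above can be sketched in a few lines. The table here is reconstructed from the SOP form given in the question (outputs at 1 for xyz=010 and xyz=111, 0 elsewhere); the function names are hypothetical and the `/x` notation for complements follows this answer.

```python
from itertools import product

def sum_of_minterms(names, truth_table):
    # One minterm per row whose output is 1: complement the inputs that are 0.
    terms = []
    for inputs in product((0, 1), repeat=len(names)):
        if truth_table[inputs] == 1:
            literals = [n if bit else "/" + n for n, bit in zip(names, inputs)]
            terms.append(".".join(literals))
    return " + ".join(terms)

def evaluate_sop(truth_table, inputs):
    # OR of the minterms; each minterm is the AND of its (possibly complemented) inputs.
    return int(any(
        all((v if row_bit else 1 - v) for v, row_bit in zip(inputs, row))
        for row, out in truth_table.items() if out == 1
    ))

# Truth table assumed from F = /x.y./z + x.y.z in the question.
table = {bits: 0 for bits in product((0, 1), repeat=3)}
table[(0, 1, 0)] = 1
table[(1, 1, 1)] = 1

expression = sum_of_minterms(("x", "y", "z"), table)
print(expression)
```

Evaluating the resulting sum of products for every input combination gives back the original table, which is the whole point of the construction.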

Determining edge weights given a list of walks in a graph

These questions regard a set of data with lists of tasks performed in succession and the total time required to complete them. I've been wondering whether it would be possible to determine useful things about the tasks' lengths, either as they are or with some initial guesstimation based on appropriate domain knowledge. I've come to think graph theory would be the way to approach this problem in the abstract, and have a decent basic grasp of the stuff, but I'm unable to know for certain whether I'm on the right track. Furthermore, I think it's a pretty interesting question to crack. So here we go:
Is it possible to determine the weights of edges in a directed weighted graph, given a list of walks in that graph with the lengths (summed weights) of said walks? I recognize the amount and quality of permutations on the routes taken by the walks will dictate the quality of any possible answer, but let's assume all possible walks and their lengths are given. If a definite answer isn't possible, what kind of things can be concluded about the graph? How would you arrive at those conclusions?
What if there were several similar walks with possibly differing lengths given? Can you calculate a decent average (or other illustrative measure) for each edge, given enough permutations on different routes to take? How will discounting some permutations from the available data set affect the calculation's accuracy?
Finally, what if you had a set of initial guesses as to the weights and had to refine those using the walks given? Would that improve upon your guesstimation ability, and how could you apply the extra information?
EDIT: Clarification on the difficulties of a plain linear algebraic approach. Consider the following set of walks:
a = 5
b = 4
b + c = 5
a + b + c = 8
A matrix equation with these values is unsolvable, but we'd still like to estimate the terms. There might be some helpful initial data available, such as in scenario 3, and in any case we can apply knowledge of the real world - such as that the length of a task can't be negative. I'd like to know if you have ideas on how to ensure we get reasonable estimations and that we also know what we don't know - eg. when there's not enough data to tell a from b.
Seems like an application of linear algebra.
You have a set of linear equations which you need to solve. The variables being the lengths of the tasks (or edge weights).
For instance if the tasks lengths were t1, t2, t3 for 3 tasks.
And you are given
t1 + t2 = 2 (task 1 and 2 take 2 hours)
t1 + t2 + t3 = 7 (all 3 tasks take 7 hours)
t2 + t3 = 6 (tasks 2 and 3 take 6 hours)
Solving gives t1 = 1, t2 = 1, t3 = 5.
You can use any linear algebra technique (e.g. http://en.wikipedia.org/wiki/Gaussian_elimination) to solve these, which will tell you whether there is a unique solution, no solution or an infinite number of solutions (there are no other possibilities).
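For the three-task example above, a minimal sketch using NumPy (one possible library choice, not the only one) looks like this. Each row of `A` marks which tasks appear in a walk, and the rank check tells you whether the task lengths are uniquely determined:

```python
import numpy as np

# Rows: walks; columns: tasks t1, t2, t3 (1 if the task is part of the walk).
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0]])
totals = np.array([2.0, 7.0, 6.0])

# Full column rank means every task length can be determined.
rank = np.linalg.matrix_rank(A)
if rank == A.shape[1]:
    t = np.linalg.solve(A, totals)
    print(t)
else:
    print("not enough independent walks to separate all tasks")
```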
If you find that the linear equations do not have a solution, you can try adding a very small random number to some of the task weights/coefficients of the matrix and try solving it again. (I believe this falls under perturbation theory.) Matrices are notorious for radically changing behavior with small changes in the values, so this will likely give you an approximate answer reasonably quickly.
Or maybe you can try introducing some 'slack' task in each walk (i.e add more variables) and try to pick the solution to the new equations where the slack tasks satisfy some linear constraints (like 0 < s_i < 0.0001 and minimize sum of s_i), using Linear Programming Techniques.
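A different technique from the two suggested above, but in the same spirit, is an ordinary least-squares fit: it picks the task lengths that minimise the squared mismatch over all walks. The sketch below applies it to the inconsistent system from the question's edit (a = 5, b = 4, b + c = 5, a + b + c = 8). It does not enforce non-negative lengths; that would need a constrained solver.

```python
import numpy as np

# Rows: walks; columns: tasks a, b, c.
A = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 1, 1],
              [1, 1, 1]], dtype=float)
lengths = np.array([5, 4, 5, 8], dtype=float)

# Least-squares estimate; rank < 3 would mean some tasks cannot be told apart.
estimate, residual_ss, rank, _ = np.linalg.lstsq(A, lengths, rcond=None)
print(estimate, rank)

# Per-walk mismatch shows which measurements disagree with the fit.
print(A @ estimate - lengths)
```

Here the fit keeps b at 4 and spreads the disagreement evenly over the other three walks, which is one reasonable reading of noisy timings.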
Assume you have an unlimited number of arbitrary characters to represent each edge. (a,b,c,d etc)
w is a list of all the walks, in the form of 0,a,b,c,d,e etc. (the 0 will be explained later.)
i = 1
if #w[i] ~= 1 then
replace w[2] with the LENGTH of w[i], minus all other values in w.
repeat forever.
Example:
0,a,b,c,d,e 50
0,a,c,b,e 20
0,c,e 10
So:
a is the first. Replace all instances of "a" with 50, -b,-c,-d,-e.
New data:
50, 50
50,-b,-d, 20
0,c,e 10
And, repeat until one value is left, and you finish! Alternatively, the first number can simply be subtracted from the length of each walk.
I'd forget about graphs and treat lists of tasks as vectors: every task is represented as a component with a value equal to its cost (time to complete, in this case).
If tasks are in different orders initially, that's where to use domain knowledge to bring them to a canonical form and assign multipliers if domain knowledge tells you that the ratio of costs will be substantially influenced by ordering or timing. Timing is implicit in the initial ordering, but you may have to make a function of time just for adjustment factors (say, driving at lunch time vs. driving at midnight). The function might be tabular/discrete. In general it's always much easier to evaluate ratios and relative biases (the hardness of doing something). You may need a functional language to do repeated rewrites of your vectors until there's nothing more that domain knowledge and rules can change.
With canonical vectors, consider just the presence and absence of each task (just 0|1 for this iteration) and look for minimal diffs, single-task diffs first, which will provide estimates with a small number of variables. Keep doing this recursively, be ready to backtrack, and have a heuristic rule for the goodness or quality of the estimates so far. Keep track of good "rounds" that you backtracked from.
When you reach a minimal irreducible state, where you can't make any more diffs because all vectors have the same remaining tasks, you can do some basic statistics like variance, mean and median, and look for big outliers and ways to improve the initial domain-knowledge-based estimates that led to the canonical form. If you find a lot of them and can infer new rules, take them in and start the whole process from the beginning.
Yes, this can cost a lot :-)

How many combinations of k neighboring pixels are there in an image?

I suck at math, so I can't figure this out: how many combinations of k neighboring pixels are there in an image? Combinations of k pixels out of n * n total pixels in the image, but with the restriction that they must be neighbors, for each k from 2 to n * n. I need the sum for all values of k for a program that must take into account that many elements in a set that it's reasoning about.
Neighbors are 4-connected and do not wrap-around.
Once you get the number of distinct shapes for a blob of pixels of size k (here's a reference) then it comes down to two things:
How many ways on your image can you place this blob?
How many of these are the same so that you don't double-count (because of symmetries)?
Getting an exact answer is a huge computational job (you're looking at more than 10^30 distinct shapes for k=56 -- imagine if k = 10,000) but you may be able to get good enough for what you need by fitting for the first 50 values of k.
(Note: the reference in the wikipedia article takes care of duplicates with their definition of A_k.)
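For very small images the count can be brute-forced, which at least gives exact values to fit against. The sketch below is not the method from the reference: it grows connected sets one pixel at a time and removes duplicates with a set, so it only works while the number of shapes stays small (roughly n up to 4 or 5). `count_connected_subsets` is a hypothetical helper name.

```python
def count_connected_subsets(n):
    """Count 4-connected subsets of an n x n grid, grouped by size k."""
    def neighbours(cell):
        r, c = cell
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < n and 0 <= cc < n:  # no wrap-around
                yield (rr, cc)

    # Start from single pixels; every connected set of size k can be reached
    # by adding one neighbouring pixel to a connected set of size k - 1.
    current = {frozenset([(r, c)]) for r in range(n) for c in range(n)}
    counts = {1: len(current)}
    for k in range(2, n * n + 1):
        grown = set()
        for shape in current:
            for cell in shape:
                for nb in neighbours(cell):
                    if nb not in shape:
                        grown.add(shape | {nb})
        current = grown
        counts[k] = len(current)
    return counts

counts = count_connected_subsets(3)
total = sum(v for k, v in counts.items() if k >= 2)
print(counts, total)
```

Because each placed blob is stored as a set of pixel coordinates, positions and symmetries are handled automatically: two placements are the same only if they cover exactly the same pixels.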
It seems that you are working on a problem that can be mapped to Markovian Walks.
If I understand your question, you are trying to count paths of length k like this:
Start (end)-> any pixel after visiting k neighbours
* - - - - -*
| |
| |
- - - -
in a structure that is similar to a chess board, and you want to connect only vertical and horizontal neighbours.
I think that you want the paths to be self-avoiding, meaning that a pixel should not be traversed twice in a walk (meaning no loops). This condition leads to a classical problem called SAWs (Self-Avoiding Walks).
Well, now the bad news: The problem is open! No one solved it yet.
You can find a nice intro to the problem here, starting at page 54 (or page 16; the counting is confusing because the page numbers repeat in the doc). But the whole paper is very interesting and easy to read. It manages to explain the mathematical background, the historical anecdotes and the scientific importance of Markov chains in a few slides.
Hope this helps ... to avoid the problem.
If you were planning to iterate over all possible polyominoes, I'm afraid you'll be waiting a long time. From the Wikipedia article on polyominoes, the count grows at least as fast as O(4.0626^n) and probably closer to O(8^n). By the time n=14, the count will be over 5 billion and too big to fit into an int. By the time n=30, the count will be more than 17 quintillion and you won't be able to fit it into a long. If all the world governments pooled together their resources to iterate through all polyominoes in a 32 x 32 icon, they would not be able to do it before the sun goes supernova.
Now that doesn't mean what you want to do is intractable. It is likely that almost all the work you do on one polyomino was done in part on others. It may be a fun task to get an exponential speedup using dynamic programming. What is it you're trying to accomplish?