Plot power series in gnuplot

I'd like to know how to plot a power series (whose variable is x), but I don't even know where to start.
I know it might not be possible to plot an infinite series, but plotting the sum of the first n terms would do just as well.

Gnuplot has a sum function, which can be used inside the using statement to sum up several columns or terms. Together with the special file name '+' you can implement power series.
Consider the exponential function, which has the power series
\sum_{n=0}^\infty x^n/n!
So, we define a term as
term(x, n) = x**n/n!
Now we can plot the power series up to the n=5 term with
set xrange [0:4]
term(x, n) = x**n/n!
set samples 20
plot '+' using 1:(sum [n=0:5] term($1, n))
To plot the results when using 2 to 7 terms and compare them with the actual exp function, use
term(x, n) = x**n/n!
set xrange [-2:2]
set samples 41
set key left
plot exp(x), for [i=1:6] '+' using 1:(sum [t=0:i] term($1, t)) title sprintf('%d terms', i+1)

The easiest way that I can think of is to generate a file that has a column of x-values and a column of f(x) values, then just plot the table like you would any other data. A power series is continuous, so you can just connect the dots and have a fairly accurate representation (provided your dots are close enough together). Also, when evaluating f(x), you just sum the first N terms (where N is big enough). Big enough means that the sum of the remaining terms is smaller than whatever error you allow. (*If you want 3 good digits, then N needs to be large enough that the remaining sum is smaller than .001.)
You can pull out a Calc II textbook to determine how to bound the error on the tail of the sum. A lot of calc classes cover it briefly, but students tend to feel like the error estimates are pointless (I know because I've taught the course a few times). As an example, if you have an alternating series (whose terms are decreasing in absolute value), then the absolute value of the first term you omit (don't sum) is an upper bound on your error.
*This statement is not 100% true; it is slightly oversimplified, but correct for most practical purposes.
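As a minimal sketch of that approach in Python (assuming the exp series from the first answer; N = 12 and the sample grid are arbitrary choices), writing a table that gnuplot can plot directly:

import math

# Write a table of x and the N-term partial sum of exp(x) = sum x^n / n!.
# N = 12 is a hypothetical choice; pick N large enough that the tail of the
# series stays below your error budget over the whole x-range.
N = 12
with open('series.dat', 'w') as f:
    for i in range(81):
        x = -2.0 + i * 0.05    # 81 evenly spaced samples on [-2, 2]
        s = sum(x**n / math.factorial(n) for n in range(N + 1))
        f.write(f"{x} {s}\n")

The result can then be plotted like any other data file, e.g. plot 'series.dat' with lines in gnuplot.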


Find the Relationship Between Two Logarithmic Equations

No idea if I am asking this question in the right place, but here goes...
I have a set of equations that were calculated based on numbers ranging from 4 to 8. So an equation for when this number is 5, one for when it is 6, one for when it is 7, etc. These equations were determined from graphing a best fit line to data points in a Google Sheet graph. Here is an example of a graph...
For example:
When the number is between 6 and 6.9, this equation is used: windGust6to7 = -29.2 + (17.7 * log(windSpeed))
When the number is between 7 and 7.9, this equation is used: windGust7to8 = -70.0 + (30.8 * log(windSpeed))
I am using these equations to create an image in python, but the image is too choppy since each equation covers a range from x to x.9. In order to smooth this image out and make it more accurate, I really would need an equation for every 0.1 change in number. So an equation for 6, a different equation for 6.1, one for 6.2, etc.
Here is an example output image that is created using the current equations:
So my question is: Is there a way to find the relationship between the two example equations I gave above in order to use that to create a smoother looking image?
This is not about logarithms; for the purposes of this derivation, log(windSpeed) is a constant term. Rather, you're trying to find a fit for your mapping:
6 (-29.2, 17.7)
7 (-70.0, 30.8)
...
... and all of the other numbers you have already. You need to determine two basic search parameters:
(1) Where in each range is your function an exact fit? For instance, for the first one, is it exactly correct at 6.0, 6.5, 7.0, or elsewhere? Change the left-hand column to reflect that point.
(2) What sort of fit do you want? You are basically fitting a pair of parameterized equations, one for each coefficient:
intercept:        slope:
x      y          x      y
6    -29.2        6     17.7
7    -70.0        7     30.8
For each of these, you want to find the coefficients of a good matching function. This is a large field of statistical and algebraic study. Since you have four ranges, you will have four points for each function. It is straightforward to fit a cubic equation to each set of points in Cartesian space. However, the resulting function may not be as smooth as you like; in such a case, you may well find that a 4th- or 5th-degree function fits better, or perhaps something exponential, depending on the actual distribution of your points.
You need to work with your own problem objectives and do a little more research into function fitting. Once you determine the desired characteristics, look into scikit for fitting functions to do the heavy computational work for you.
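As a rough sketch of the cubic-fit idea in Python with numpy (the rows for 6 and 7 are from your example; the rows for 4 and 5 are placeholders you would replace with your real coefficients):

import numpy as np

# Rows of (number, intercept, slope). Only the 6 and 7 rows are real;
# fill in the placeholder rows with the coefficients of your other equations.
data = np.array([
    [4.0,   0.0,  0.0],   # placeholder
    [5.0,   0.0,  0.0],   # placeholder
    [6.0, -29.2, 17.7],
    [7.0, -70.0, 30.8],
])

a = np.polyfit(data[:, 0], data[:, 1], 3)   # cubic through the four intercepts
b = np.polyfit(data[:, 0], data[:, 2], 3)   # cubic through the four slopes

def wind_gust(number, wind_speed):
    # Interpolated equation: intercept(number) + slope(number) * log(windSpeed),
    # usable at 6.1, 6.2, ... rather than only at whole numbers.
    return np.polyval(a, number) + np.polyval(b, number) * np.log(wind_speed)

With exactly four points the cubic passes through all of them; if the interpolated coefficients wiggle too much between your known points, try a lower degree or a different model as discussed above.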

Cubic spline implementation in Octave

My bold claim is that the Octave implementation of the cubic spline, as implemented in interp1(..., "spline"), differs from the "natural cubic spline" algorithm outlined in, e.g., Wolfram MathWorld. I have written my own implementation of the latter and compared it to the output of the interp1(..., "spline") function, with the following results:
I discovered that when I try the same comparison with 4 points, the solutions also differ, and, moreover, the Octave solution is identical to fitting a single cubic polynomial to all four points (and not actually producing a piecewise spline for the three intervals).
I also tried to look under the hood at Octave's implementation of splines, and found it was too obtuse to read and understand in 5 minutes.
I know that there are a few options for boundary conditions one can choose ("natural" vs "clamped") when implementing a cubic spline. My implementation uses "natural" boundary conditions (in which the second derivative at the two exterior points is set to zero).
If Octave's cubic spline is indeed different to the standard cubic spline, then what actually is it?
EDIT:
The second order differences of the two solutions shown in the Comparison plot above are plotted here:
Firstly, there appear to be only two cubic polynomials in Octave's case: one that is fit over the first two intervals, and one that is fit over the last two intervals. Secondly, they are clearly not using "natural" splines, since the second derivatives at the extremes do not tend to zero.
Also, I think the fact that the second order difference for my implementation at the middle (i.e. 3rd) point is zero is just a coincidence, and not demanded by the algorithm. Repeating this test for a different set of points will confirm/refute this.
Different end conditions explain the difference between your implementation and Octave's: Octave uses the not-a-knot condition (depending on input). This also accounts for your 4-point observation: with four points there are only two interior breaks, and making both of them not-a-knot forces a single cubic through all four points.
See help spline
To explain your observations: the third derivative is continuous at the 2nd and (n-1)th breaks due to the not-a-knot condition. That is why Octave's second derivative looks like it has fewer 'breaks': it is a continuous straight line over the first two and the last two segments. If you look at the third derivative, you can see the effect more clearly: the 3rd derivative is discontinuous only at the 3rd break (the middle).
x = 1:5;                              % break points
y = rand(1,5);                        % random sample data
xx = linspace(1,5);                   % fine grid for evaluation
pp = interp1(x, y, 'spline', 'pp');   % piecewise-polynomial form of the spline
yy = ppval(pp, xx);                   % spline values
dyy = ppval(ppder(pp, 3), xx);        % third derivative of the spline
plot(xx, yy, xx, dyy);
Also, the pp data structure looks like this:
pp =
scalar structure containing the fields:
form = pp
breaks =
1 2 3 4 5
coefs =
0.427823 -1.767499 1.994444 0.240388
0.427823 -0.484030 -0.257085 0.895156
-0.442232 0.799439 0.058324 0.581864
-0.442232 -0.527258 0.330506 0.997395
pieces = 4
order = 4
dim = 1
orient = first
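If you want to reproduce the comparison outside Octave, here is a minimal sketch using scipy.interpolate.CubicSpline in Python, which exposes both end conditions directly ('not-a-knot' matches Octave's spline here, 'natural' matches your implementation):

import numpy as np
from scipy.interpolate import CubicSpline

x = np.arange(1, 6)
y = np.random.rand(5)

nak = CubicSpline(x, y, bc_type='not-a-knot')  # Octave/MATLAB spline behaviour
nat = CubicSpline(x, y, bc_type='natural')     # second derivative zero at the ends

# End-point second derivatives: generally nonzero for not-a-knot,
# zero (up to rounding) for the natural spline.
print(nak(x[[0, -1]], 2))
print(nat(x[[0, -1]], 2))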

Cumulative Distribution Function For a Set of Values

I have a histogram in which I count the number of occurrences of particular values of a function in the range 0.8 to 2.2.
I would like to get the cumulative distribution function for this set of values. Is it correct to just count the total number of occurrences up to each particular value?
For example, the cdf at 0.9 would be the sum of all the occurrences from 0.8 to 0.9?
Is that correct?
Thank you
The sum normalised by the number of entries will give you an estimate of the cdf, yes. It will be as accurate as the histogram is as a representation of the pdf. If you want to evaluate the cdf anywhere except the bin endpoints, it makes sense to include a fraction of the counts: if you have break points b_i and b_j, then to evaluate the cdf at some point b_i < p < b_j you should add the fraction of counts (p - b_i) / (b_j - b_i) from the relevant cell. Essentially this assumes uniform density within the cells.
You can get an estimate of the cdf from the underlying values, too (based on your question I'm not quite sure what you have access to, whether it's bin counts in the histogram or the actual values). Beware that doing so will give your CDF discontinuities (steps) at each data point, so think about whether you have enough data, and what you're using the CDF for, to decide whether this is appropriate.
As a final note of warning, beware that evaluating the cdf outside of the range of observed values will give you an estimated probability of zero or one (zero for x<0.8, one for x>2.2). You should consider whether the function is truly bounded to that interval, and if not, employ some smoothing to ensure small amounts of probability mass outside the range of observed values.
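A minimal sketch of the histogram-based estimate in Python/numpy (the bin counts here are hypothetical; substitute your own), including the within-bin linear interpolation described above:

import numpy as np

counts = np.array([5, 12, 20, 9, 4])             # hypothetical bin counts
edges = np.linspace(0.8, 2.2, len(counts) + 1)   # bin edges on [0.8, 2.2]

# CDF at the bin edges: cumulative counts normalised by the total count.
cdf_at_edges = np.concatenate(([0.0], np.cumsum(counts) / counts.sum()))

def cdf(p):
    # Linear interpolation between edges, i.e. uniform density within each bin.
    # Outside [0.8, 2.2] this clamps to 0 or 1, with the caveat noted above.
    return np.interp(p, edges, cdf_at_edges)

print(cdf(0.9))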

numerical issues causing the difference in outputs of two programs?

I have two programs that theoretically should return exactly the same output. However, this does not happen. The issue is that the two programs handle very small numbers (doubles), on the order of 1e-100 or so. I suspect that there could be some numerical issues related to that, which lead to the two outputs being different even though they should theoretically be the same.
Does it indeed make sense that handling numbers on the order of 1e-100 causes such problems? I don't mind the difference in output if I can safely assume that the source is numerical issues. Does anyone have a good source/reference that talks about issues with the stability of algorithms that handle numbers of that order?
Thanks.
Does anyone have a good source/reference that talks about issues that come up with stability of algorithms when they handle numbers in such order?
The first reference that comes to mind is What Every Computer Scientist Should Know About Floating-Point Arithmetic. It covers floating-point maths in general.
As far as numerical stability is concerned, the best references probably depend on the numerical algorithm in question. Two wide-ranging works that come to mind are:
Numerical Recipes by Press et al.;
Matrix Computations by Golub and Van Loan.
It is not necessarily the small numbers that are causing the problem.
How do you check whether the outputs are the "exact same"?
I would check equality with a tolerance. You may consider the floating-point numbers x and y equal if either fabs(x-y) < 1.0e-6 or fabs(x-y) < fabs(x)*1.0e-6 holds.
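In Python, math.isclose implements essentially this combined absolute/relative test (a sketch; the sample values and the 1e-6 tolerance are just illustrations):

import math

x = 1.0e-100
y = 1.00000005e-100

# True if |x - y| <= max(rel_tol * max(|x|, |y|), abs_tol).
# For values around 1e-100 the relative term is the one that matters,
# so abs_tol is left at zero here.
print(math.isclose(x, y, rel_tol=1e-6, abs_tol=0.0))   # True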
Usually, if there are numerical issues, the difference between the two outputs is HUGE; often a small change in the input results in extreme changes in the output of an algorithm that suffers from such issues.
What makes you think that there are "numerical issues"?
If possible, change your algorithm to use Kahan Summation (aka compensated summation). From Wikipedia:
function KahanSum(input)
    var sum = 0.0
    var c = 0.0                // A running compensation for lost low-order bits.
    for i = 1 to input.length do
        y = input[i] - c       // So far, so good: c is zero.
        t = sum + y            // Alas, sum is big, y small, so low-order digits of y are lost.
        c = (t - sum) - y      // (t - sum) recovers the high-order part of y;
                               // subtracting y recovers -(low part of y).
        sum = t                // Algebraically, c should always be zero. Beware eagerly optimising compilers!
                               // Next time around, the lost low part will be added to y in a fresh attempt.
    return sum
This works by keeping a second running total of the cumulative error, similar to the Bresenham line drawing algorithm. The end result is that you get precision that is nearly double the data type's advertised precision.
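A direct Python translation, with math.fsum as a correctly-rounded reference (a sketch; summing 0.1 a million times is just a convenient stress test):

import math

def kahan_sum(values):
    total = 0.0
    c = 0.0                  # running compensation for lost low-order bits
    for v in values:
        y = v - c
        t = total + y
        c = (t - total) - y
        total = t
    return total

vals = [0.1] * 1_000_000
print(sum(vals) - math.fsum(vals))         # naive sum: visible rounding drift
print(kahan_sum(vals) - math.fsum(vals))   # compensated sum: at or near zero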
Another technique I use is to sort my numbers from small to large (by magnitude, ignoring sign) and add or subtract the small numbers first, then the larger ones. This has the virtue that if you add and subtract the same value multiple times, such numbers may cancel exactly and can be removed from the list.
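In Python the sorting idea is a one-liner (a sketch; for severe cancellation you would still prefer math.fsum or the compensated loop above):

def sorted_sum(values):
    # Add the smallest-magnitude terms first so they are not swamped
    # by an already-large running total.
    return sum(sorted(values, key=abs))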

Function to dampen a value

I have a list of documents, each having a relevance score for a search query. I need older documents to have their relevance score dampened, to introduce their date into the ranking process. I already tried fiddling with functions such as 1/(1+date_difference), but the reciprocal function is too discriminating for close recent dates.
I was thinking of a mathematical function with range (0..1) and domain (0..x) to scale their score, where the x-axis is the age of a document. It's best to explain what I further need from the function with an image:
Decaying behavior is often modeled well by an exponential function (many decaying processes in nature also follow it). You would use two positive parameters A and B and get
y(x) = A exp(-B x)
Since you want a y-range of [0,1], set A = 1. Larger B gives a faster decay.
If a simple 1/(1+x) decreases too quickly too soon, a sigmoid function like 1/(1+e^-x) or the error function might be better suited to your purpose. Shift the input so that recent documents land on the positive side and old ones in the negative numbers; you then get a value that stays near its maximum for some configurable time and then decreases towards a base value.
log((x+1)-age_of_document)
where the base of the logarithm is (x+1). Note that x is as per your diagram and is the "threshold": the expression is 1 at age 0 and 0 at age x, and if the age of the document is greater than x the score goes negative. Multiply by the maximum possible score to introduce scaling.
E.g. domain = (0,10) with a maximum score of 10: 10*log(11 - age)/log(11)
A bit late, but as thiton says, you might want to use a sigmoid function instead, since it has a "floor" value for your long-tail data points. E.g.:
0.8/(1+5^(x-3)) + 0.2. You can adjust the constants 5 and 3 to control the slope and position of the curve; the 0.2 is where the floor will be.
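To compare the candidates above side by side, here is a small Python sketch (the parameter values are just the ones quoted in the answers and are meant to be tuned):

import math

def exp_decay(age, B=0.5):
    # y = exp(-B * age): 1 at age 0, decaying towards 0; larger B decays faster.
    return math.exp(-B * age)

def sigmoid_floor(age, mid=3.0, base=5.0, floor=0.2):
    # 0.8/(1 + 5^(age-3)) + 0.2 from the last answer, with the constants as parameters.
    return (1.0 - floor) / (1.0 + base ** (age - mid)) + floor

def log_damp(age, threshold=10.0):
    # Log base (threshold+1) of (threshold+1 - age): 1 at age 0, 0 at age == threshold.
    return math.log(threshold + 1.0 - age) / math.log(threshold + 1.0)

for age in [0, 1, 3, 5, 10]:
    print(age, exp_decay(age), sigmoid_floor(age), log_damp(age))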