Curve Fitting past last data point(s) - regression

I am trying to fit a curve to a set of data points but would like to preserve certain characteristics.
Like in this graph I have curves that almost end up being linear and some of them are not. I need a functional form to interpolate between the given data points or past the last given point.
The curves have been created using a simple regression
def func(x, d, b, c):
return c + b * np.sqrt(x) + d * x
My question now is what is the best approach to ensure a positive slope past the last data point(s) ??? In my application a decrease in costs while increasing the volume doesn't make sense even if the data says so.
I would like to keep the order as low as possible maybe ˆ3 would still be fine.
The data used to create the curve with the negative slope is
x_data = [ 100, 560, 791, 1117, 1576, 2225,
3141, 4434, 6258, 8834, 12470, 17603,
24848, 35075, 49511, 69889, 98654, 139258,
196573, 277479, 391684, 552893, 780453, 1101672,
1555099, 2195148, 3098628, 4373963, 6174201, 8715381,
12302462, 17365915]
y_data = [ 7, 8, 9, 10, 11, 12, 14, 16, 21, 27, 32, 30, 31,
38, 49, 65, 86, 108, 130, 156, 183, 211, 240, 272, 307, 346,
389, 436, 490, 549, 473, 536]
And for the positive one
x_data = [ 100, 653, 950, 1383, 2013, 2930,
4265, 6207, 9034, 13148, 19136, 27851,
40535, 58996, 85865, 124969, 181884, 264718,
385277, 560741, 816117, 1187796, 1728748, 2516062,
3661939, 5329675, 7756940, 11289641, 16431220, 23914400,
34805603, 50656927]
y_data = [ 6, 6, 7, 7, 8, 8, 9, 10, 11, 12, 14, 16, 18,
21, 25, 29, 35, 42, 50, 60, 72, 87, 105, 128, 156, 190,
232, 284, 347, 426, 522, 640]
The curve fitting is simple done by using
popt, pcov = curve_fit(func, x_data, y_data)
For the plot
plt.plot(xdata, func(xdata, *popt), 'g--', label='fit: a=%5.3f, b=%5.3f, c=%5.3f' % tuple(popt))
plt.plot(x_data, y_data, 'ro')
plt.xlabel('Volume')
plt.ylabel('Costs')
plt.show()

A simple solution might just look like this:
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import least_squares
def fit_function(x, a, b, c, d):
return a**2 + b**2 * x + c**2 * abs(x)**d
def residuals( params, xData, yData):
diff = [ fit_function(x, *params ) - y for x, y in zip( xData, yData ) ]
return diff
fit1 = least_squares( residuals, [ .1, .1, .1, .5 ], loss='soft_l1', args=( x1Data, y1Data ) )
print fit1.x
fit2 = least_squares( residuals, [ .1, .1, .1, .5 ], loss='soft_l1', args=( x2Data, y2Data ) )
print fit2.x
testX1 = np.linspace(0, 1.1 * max( x1Data ), 100 )
testX2 = np.linspace(0, 1.1 * max( x2Data ), 100 )
testY1 = [ fit_function( x, *( fit1.x ) ) for x in testX1 ]
testY2 = [ fit_function( x, *( fit2.x ) ) for x in testX2 ]
fig = plt.figure()
ax = fig.add_subplot( 1, 1, 1 )
ax.scatter( x1Data, y1Data )
ax.scatter( x2Data, y2Data )
ax.plot( testX1, testY1 )
ax.plot( testX2, testY2 )
plt.show()
providing
>>[ 1.00232004e-01 -1.10838455e-04 2.50434266e-01 5.73214256e-01]
>>[ 1.00104293e-01 -2.57749592e-05 1.83726191e-01 5.55926678e-01]
and
It just takes the parameters as squares, therefore ensuring positive slope. Naturally, the fit becomes worse if following the decreasing points at the end of data set 1 is forbidden. Concerning this I'd say those are just statistical outliers. Therefore, I used least_squares, which can deal with this with a soft loss. See this doc for details. Depending on how the real data set is, I'd think about removing them. Finally, I'd expect that zero volume produces zero costs, so the constant term in the fit function doesn't seem to make sense.
So if the function is only of type a**2 * x + b**2 * sqrt(x) it look like:
where the green graph is the result of leastsq, i.e. without the f_scale option of least_squares.

Related

Return a value of dictionary where a variable is inbetween keys or values

I have some data that I think would work best as a dictionary or JSON. The data has an initial category, a, b...z, and five bands within each category.
What I want to be able to do is give a function a category and a value and for the function to return the corresponding band.
I tried to create a dictionary like this where the values of each band are the lower threshold i.e. for category a, Band 1 is between 0 and 89:
bandings = {
'a' :
{
'Band 1' : 0,
'Band 2': 90,
'Band 3': 190,
'Band 4': 420,
'Band 5': 500
},
'b' :
{
'Band 1' : 0,
'Band 2': 500,
'Band 3': 1200,
'Band 4': 1700,
'Band 5': 2000
}
}
So if I was to run a function:
lookup_band(category='a', value=100)
it would return 'Band 3' as 100 is between 90 and 189 in category a
I also experimented with settings keys as ranges but struggled with how to handle a range of > max value in Band 5.
I can change the structure of the dictionary or use a different way of referencing the data.
Any ideas, please?
You can structure your data a little bit differently (using sorted lists instead of dictionaries) and use bisect module. For example:
from bisect import bisect
bandings = {
'a': [0, 90, 190, 420, 500],
'b': [0, 500, 1200, 1700, 2000]
}
def lookup_band(bandings, band, value):
return 'Band {}'.format(bisect(bandings[band], value))
print(lookup_band(bandings, 'a', 100)) # Band 2
print(lookup_band(bandings, 'b', 1700)) # Band 4
print(lookup_band(bandings, 'b', 9999)) # Band 5

How to read JSON file in Prolog

I found a few SO posts on related issues which were unhelpful. I finally figured it out and here's how to read the contents of a .json file. Say the path is /home/xxx/dnns/test/params.json, I want to turn the dictionary in the .json into a Prolog dictionary:
{
"type": "lenet_1d",
"input_channel": 1,
"output_size": 130,
"batch_norm": 1,
"use_pooling": 1,
"pooling_method": "max",
"conv1_kernel_size": 17,
"conv1_num_kernels": 45,
"conv1_stride": 1,
"conv1_dropout": 0.0,
"pool1_kernel_size": 2,
"pool1_stride": 2,
"conv2_kernel_size": 12,
"conv2_num_kernels": 35,
"conv2_stride": 1,
"conv2_dropout": 0.514948804688646,
"pool2_kernel_size": 2,
"pool2_stride": 2,
"fcs_hidden_size": 109,
"fcs_num_hidden_layers": 2,
"fcs_dropout": 0.8559119274655482,
"cost_function": "SmoothL1",
"optimizer": "Adam",
"learning_rate": 0.0001802763794651928,
"momentum": null,
"data_is_target": 0,
"data_train": "/home/xxx/data/20180402_L74_70mm/train_2.h5",
"data_val": "/home/xxx/data/20180402_L74_70mm/val_2.h5",
"batch_size": 32,
"data_noise_gaussian": 1,
"weight_decay": 0,
"patience": 20,
"cuda": 1,
"save_initial": 0,
"k": 4,
"save_dir": "DNNs/20181203090415_11_created/k_4"
}
To read a JSON file with SWI-Prolog, query
?- use_module(library(http/json)). % to enable json_read_dict/2
?- FPath = '/home/xxx/dnns/test/params.json', open(FPath, read, Stream), json_read_dict(Stream, Dicty).
You'll get
FPath = 'DNNs/test/k_4/model_params.json',
Stream = <stream>(0x7fa664401750),
Dicty = _12796{batch_norm:1, batch_size:32, conv1_dropout:0.
0, conv1_kernel_size:17, conv1_num_kernels:45, conv1_stride:
1, conv2_dropout:0.514948804688646, conv2_kernel_size:12, co
nv2_num_kernels:35, conv2_stride:1, cost_function:"SmoothL1"
, cuda:1, data_is_target:0, data_noise_gaussian:1, data_trai
n:"/home/xxx/Downloads/20180402_L74_70mm/train_2.h5", data
_val:"/home/xxx/Downloads/20180402_L74_70mm/val_2.h5", fcs
_dropout:0.8559119274655482, fcs_hidden_size:109, fcs_num_hi
dden_layers:2, input_channel:1, k:4, learning_rate:0.0001802
763794651928, momentum:null, optimizer:"Adam", output_size:1
30, patience:20, pool1_kernel_size:2, pool1_stride:2, pool2_
kernel_size:2, pool2_stride:2, pooling_method:"max", save_di
r:"DNNs/20181203090415_11_created/k_4", save_initial:0, type
:"lenet_1d", use_pooling:1, weight_decay:0}.
where Dicty is the desired dictionary.
If you want to define this as a predicate, you could do:
:- use_module(library(http/json)).
get_dict_from_json_file(FPath, Dicty) :-
open(FPath, read, Stream), json_read_dict(Stream, Dicty), close(Stream).
Even DEC10 Prolog released 40 years ago could handle JSON just as a normal term . There should be no need for a specialized library or parser for JSON because Prolog can just parse it directly .
?- X={"a":3,"b":"hello","c":undefined,"d":null} .
X = {"a":3, "b":"hello", "c":undefined, "d":null}.
?-

out of stack space (infinite loop?) error while working with matplolib and Django

I am trying to show a graph which I am trying to plot in Matplotlib and then showing it with some hard coding in HTML template.But out of 3 attempts, it is working only once else it is throwing TCL Out of stack space error. Below is my coding.
def similar_images(request):
n_groups = 5
means_men = (20, 35, 30, 35, 27)
std_men = (2, 3, 4, 1, 2)
means_women = (25, 32, 34, 20, 25)
std_women = (3, 5, 2, 3, 3)
fig, ax = plt.subplots() # somwhere here it is throwing this error
index = np.arange(n_groups)
bar_width = 0.35
opacity = 0.4
error_config = {'ecolor': '0.3'}
rects1 = plt.bar(index, means_men, bar_width,
alpha=opacity,
color='b',
yerr=std_men,
error_kw=error_config,
label='Men')
rects2 = plt.bar(index + bar_width, means_women, bar_width,
alpha=opacity,
color='r',
yerr=std_women,
error_kw=error_config,
label='Women')
plt.xlabel('Shoe')
plt.ylabel('Week')
plt.title('Last Year sale details')
#plt.xticks(index + bar_width / 2, ('A', 'B', 'C', 'D', 'E'))
plt.legend()
plt.savefig('/graph.png')
plt.close()
return render(request,'sales/Details.html')
Please, can someone help me here?
Below is the error.
Exception Type: TclError at /sales/similar_images/
Exception Value: out of stack space (infinite loop?)

pm3d in gnuplot with binary data

I have some data files with content
a1 b1 c1 d1
a1 b2 c2 d2
...
[blank line]
a2 b1 c1 d1
a2 b2 c2 d2
...
I plot this with gnuplot using
splot 'file' u 1:2:3:4 w pm3d.
Now, I want to use a binary file. I created the file with Fortran using unformatted stream-access (direct or sequential access did not work directly). By using gnuplot with
splot 'file' binary format='%float%float%float%float' u 1:2:3
I get a normal 3D-plot. However, the pm3d-command does not work as I don't have the blank lines in the binary file. I get the error message:
>splot 'file' binary format='%float%float%float%float' u 1:2:3:4 w pm3d
Warning: Single isoline (scan) is not enough for a pm3d plot.
Hint: Missing blank lines in the data file? See 'help pm3d' and FAQ.
According to the demo script in http://gnuplot.sourceforge.net/demo/image2.html, I have to specify the record length (which I still don't understand right). However, using this script from the demo page and the command with pm3d obtains the same error message:
splot 'scatter2.bin' binary record=30:30:29:26 u 1:2:3 w pm3d
So how is it possible to plot this four dimensional data from a binary file correctly?
Edit: Thanks, mgilson. Now it works fine. Just for the record: My fortran code-snippet:
open(unit=83,file=fname,action='write',status='replace',access='stream',form='unformatted')
a= 0.d0
b= 0.d0
do i=1,200
do j=1,100
write(83)real(a),real(b),c(i,j),d(i,j)
b = b + db
end do
a = a + da
b = 0.d0
end do
close(83)
The gnuplot commands:
set pm3d map
set contour
set cntrparam levels 20
set cntrparam bspline
unset clabel
splot 'fname' binary record=(100,-1) format='%float' u 1:2:3:4 t 'd as pm3d-projection, c as contour'
Great question, and thanks for posting it. This is a corner of gnuplot I hadn't spent much time with before. First, I need to generate a little test data -- I used python, but you could use fortran just as easily:
Note that my input array (b) is just a 10x10 array. The first two "columns" in the datafile are just the index (i,j), but you could use anything.
>>> import numpy as np
>>> a = np.arange(10)
>>> b = a[None,:]+a[:,None]
>>> b
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
[ 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
[ 4, 5, 6, 7, 8, 9, 10, 11, 12, 13],
[ 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
[ 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
[ 7, 8, 9, 10, 11, 12, 13, 14, 15, 16],
[ 8, 9, 10, 11, 12, 13, 14, 15, 16, 17],
[ 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]])
>>> with open('foo.dat','wb') as foo:
... for (i,j),dat in np.ndenumerate(b):
... s = struct.pack('4f',i,j,dat,dat)
... foo.write(s)
...
So here I just write 4-floating point values to the file for each data-point. Again, this is what you've already done using fortran. Now for plotting it:
splot 'foo.dat' binary record=(10,-1) format='%float' u 1:2:3:4 w pm3d
I believe that this specifies that each "scan" is a "record". Since I know that each scan will be 10 floats long, that becomes the first index in the record list. The -1 indicates that gnuplot should keep reading records until it finds the end of the file.

How do I plot a function and data in Mathematica?

Simple question but I can't find the answer.
I want to combine a ListLinePlot and a regular Plot (of a function) onto one plot. How do I do this?
Thanks.
Use Show, e.g.
Show[Plot[x^2, {x, 0, 3.5}], ListPlot[{1, 4, 9}]]
Note, if plot options conflict Show uses the first plot's option, unless the option is specified in Show. I.e.
Show[Plot[x^2, {x, 0, 3.5}, ImageSize -> 100],
ListPlot[{1, 4, 9}, ImageSize -> 400]]
shows a combined plot of size 100.
Show[Plot[x^2, {x, 0, 3.5}, ImageSize -> 100],
ListPlot[{1, 4, 9}, ImageSize -> 400], ImageSize -> 300]
Shows a combined plot of size 300.
An alternative to using Show and combining two separate plots, is to use Epilog to add the data points to the main plot. For example:
data = Table[{i, Sin[i] + .1 RandomReal[]}, {i, 0, 10, .5}];
Plot[Sin[x], {x, 0, 10}, Epilog -> Point[data], PlotRange -> All]
or
Plot[Sin[x], {x, 0, 10}, Epilog -> Line[data], PlotRange -> All]