sympy autowrap (cython): limit of # of arguments, arguments in array form? - cython

I have the following issue:
I want to use autowrap to generate a compiled version of a sympy matrix, with cells containing sympy expressions. Depending on the specification of my problem, the number of arguments can get very large.
I ran into the following 2 issues:
The number of arguments that autowrap accepts seems to be limited to 509.
i.e., this works:
import sympy
from sympy.utilities.autowrap import autowrap
x = sympy.symbols("x:509")
exp = sum(x)
cyt = autowrap(exp, backend="cython", args=x)
and this fails to compile:
x = sympy.symbols("x:510")
exp = sum(x)
cyt = autowrap(exp, backend="cython", args=x)
The message I get seems not very telling:
[...] (Full output upon request)
Generating code
c:\users\[classified]\appdata\local\temp\tmp2zer8vfe_sympy_compile\wrapper_module_17.c(6293) : fatal error C1001: An internal error has occurred in the compiler.
(compiler file 'f:\dd\vctools\compiler\utc\src\p2\hash.c', line 884)
To work around this problem, try simplifying or changing the program near the locations listed above.
Please choose the Technical Support command on the Visual C++
Help menu, or open the Technical Support help file for more information
LINK : fatal error LNK1257: code generation failed
error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\VC\\BIN\\x86_amd64\\link.exe' failed with exit status 1257
Is there any way around this? I would like to use versions of my program that need ~1000 input variables.
(I have no understanding of C/cython. Is this an autowrap limitation, a C limitation ...?)
Partly connected to the above:
Can one compile functions that accept the arguments as array.
Is there any way to generate code that accepts a numpy array as input? I specifically mean one array for all the arguments, instead of providing the arguments as list. (Similar to lambdify using a DeferredVector). ufuncify supports array input, but as I understand only for broadcasting/vectorizing the function.
I would hope that an array as argument could circumvent the first problem above, which is most pressing for me. Apart from that, I would prefer array input anyways, both because it seems faster (no need to unpack the numpy array I have as input into a list), and also more straightforward and natural.
Does anyone have any suggestions what I can do?
Also, could anyone tell me whether f2py has similar limitations? This would also be an option for me if feasible, but I don't have it set up to work currently, and would prefer to know whether it helps at all before investing the time.
Thanks!
Edit:
I played around a bit with the different candidates for telling autowrap that the input argument will be something in array form, rather than a list of numbers. I'll document my steps here for posterity, and also to increase chances to get some input:
sympy.DeferredVector
Is what I use with lambdify for the same purpose, so I thought to give it a try. However, warning:
A = sympy.DeferredVector("A")
expression = A[0]+A[1]
cyt = autowrap(expression, backend="cython", args=A)
just completely crashed my OS - last statement started executing, (no feedback), everything got really slow, then no more reactions. (Can only speculate, perhaps it has to do with the fact that A has no shape information, which does not seem to bother lambdify, but might be a problem here. Anyways, seems not the right way to go.)
All sorts of array-type objects filled with the symbols in the expression to be wrapped.
e.g.
x0 ,x1 = sympy.symbols("x:2")
expression = x0 + x1
cyt = autowrap(expression, backend="cython", args=np.array([x0,x1]))
Still wants unpacked arguments. Replacing the last row by
cyt = autowrap(expression, backend="cython", args=[np.array([x0,x1])])
Gives the message
CodeGenArgumentListError: ("Argument list didn't specify: x0, x1 ", [InputArgument(x0), InputArgument(x1)])
Which is a recurrent theme to this approach: also happens when using a sympy matrix, a tuple, and so on inside the arguments list.
sympy.IndexedBase
This is actually used in the autowrap examples; however, in a (to me) inintuitive way, using an equation as the expression to be wrapped. Also, the way it is used there seems not really feasible to me: The expression I want to cythonize is a matrix, but its cells are themselves longish expressions, which I cannot obtain via index operations.
The upside is that I got a minimal example to work:
X = sympy.IndexedBase("X",shape=(1,1))
expression = 2*X[0,0]
cyt = autowrap(expression, backend="cython", args=[X])
actually compiles, and the resulting function correctly evaluates - when passed a 2d-np.array.
So this seems the most promising avenue, even though further extensions to this approach I keep trying fail.
For example this
X = sympy.IndexedBase("X",shape=(1,))
expression = 2*X[0]
cyt = autowrap(expression, backend="cython", args=[X])
gets me
[...]\site-packages\sympy\printing\codeprinter.py", line 258, in _get_expression_indices " rhs indices in %s" % expr)
ValueError: lhs indices must match non-dummy rhs indices in 2*X[0]
even though I don't see how it should be different from the working one above.
Same error message when sticking to two dimensions, but increasing the size of X:
X = sympy.IndexedBase("X",shape=(2,2))
expression = 2*X[0,0]+X[0,1]+X[1,0]+X[1,1]
cyt = autowrap(expression, backend="cython", args=[X])
ValueError: lhs indices must match non-dummy rhs indices in 2*X[0, 0] + X[0, 1] + X[1, 0] + X[1, 1]
I tried snooping around the code for autowrap, but I feel a bit lost there...
So I'm still searching for a solution and happy for any input.

Passing the argument as an array seems to work OK
x = sympy.MatrixSymbol('x', 520, 1)
exp = 0
for i in range(x.shape[0]):
exp += x[i]
cyt = autowrap(exp, backend='cython')
arr = np.random.randn(520, 1)
cyt(arr)
Out[48]: -42.59735861021934
arr.sum()
Out[49]: -42.597358610219345

Related

How to use RANSAC method to fit a line in Matlab

I am using RANSAC to fit a line to my data. The data is 30X2 double, I have used MatLab example to write the code given below, but I am getting an error in my problem. I don't understand the error and unable to resolve it.
The link to Matlab example is
https://se.mathworks.com/help/vision/ref/ransac.html
load linedata
data = [xm,ym];
N = length(xm); % number of data points
sampleSize = 2; % number of points to sample per trial
maxDistance = 2; % max allowable distance for inliers
fitLineFcn = polyfit(xm,ym,1); % fit function using polyfit
evalLineFcn =#(model) sum(ym - polyval(fitLineFcn, xm).^2,2); % distance evaluation function
[modelRANSAC, inlierIdx] = ransac(data,fitLineFcn,evalLineFcn,sampleSize,maxDistance);
The error is as follows
Error using ransac Expected fitFun to be one of these types:
function_handle
Instead its type was double.
Error in ransac>parseInputs (line 202) validateattributes(fitFun,
{'function_handle'}, {'scalar'}, mfilename, 'fitFun');
Error in ransac (line 148) [params, funcs] = parseInputs(data, fitFun,
distFun, sampleSize, ...
Lets read the error message and the documentation, as they tend to have all the information required to solve the issue!
Error using ransac Expected fitFun to be one of these types:
function_handle
Instead its type was double.
Hum, interesting. If you read the docs (which is always the first thing you should do) you see that fitFun is the second input. The error says its double, but it should be function_handle. This is easy to verify, indeed firLineFun is double!
But why? Well, lets read more documentation, right? polyfit says that it returns an array of the coefficient values, not a function_hanlde, so indeed everything the documentation says and the error says is clear about why you get the error.
Now, what do you want to do? It seems that you want to use polyfit as the function to fit with ransac. So we need to make it a function. According to the docs, fitFun has to be of the form fitFun(data), so we just do that, create a function_handle for this;
fitLineFcn=#(data)polyfit(data(:,1),data(:,2),1);
And magic! It works!
Lesson to learn: read the error text you provide, and the documentation1, all the information is there. In fact, I have never used ransac, its just reading the docs that led me to this answer.
1- In fact, programmers tend to reply with the now practically a meme: RTFM often, as it is always the first step on everything programming.

Why does Octave allow renaming of functions?

If I open Octave and do:
a = 1:10;
sum(a)
ans = 55
But if I then do:
sum = 30;
sum(a)
I get an error:
error: A(I): index out of bounds; value 10 out of bound 1
Octave has allowed me to change where the word "sum" points so now it's at a value not a function. Why is this allowed and shouldn't I be given a warning - is this not incredibly dangerous?
How, if I realise I've done this, do I remove the reference without closing octave and losing my workspace?
Imagine Octave does not allow variables to have the same as a function. You write a program in Octave and you have a variable named total which is not a function. Everything is fine. A new Octave version comes out and adds a function named total. Your program would stop working and you would have to rename your variables. That level of backwards incompatibility would be worse. And the issue wouldn't be limited to new Octave versions. Maybe you later decide that you want to use an Octave package which brings a whole set of new functions, one of which could clash with your variables.
However, in the upcoming release of Octave, out of bounds errors will give a hint that the variable name is shadowing a function. In Octave 4.2.1:
octave-cli-4.2.0:1> a = 1:10;
octave-cli-4.2.0:2> sum = 30;
octave-cli-4.2.0:3> sum (a)
error: sum(10): out of bound 1
While in in 4.3.0+ (which one day will become 4.4):
octave-cli-4.3.0+:1> a = 1:10;
octave-cli-4.3.0+:2> sum = 30;
octave-cli-4.3.0+:3> sum(a)
error: sum(10): out of bound 1 (note: variable 'sum' shadows function)
However, the real problem is not that variables can shadow functions. The real problem is that the syntax does not allow to distinguish between a variable and a function. Both variable indexing and function calling uses the same brackets () (other languages typically use () for functions and [] for index variables). And even if you call a function without any arguments, the parentheses are optional:
foo(1) # 1st element of foo? Or is foo a function?
foo # Is this a variable or a function call without any arguments?
foo() # idem
This syntax is mainly required for Matlab compatibility which is one of the aims of GNU Octave.
To work around this deficiency, Octave coding guidelines (guidelines required for code contributed to Octave. The Octave parser does not really care) requires functions to always use parentheses and to have a space between them and the function name:
foo (x, y); # there is a space after foo so it must be a function
foo(x, y); # there is no space, it is indexing a matrix
foo # this is a variable
foo (); # this is a function
How, if I realise I've done this, do I remove the reference without closing octave and losing my workspace?
Use the command clear sum to clear the user definition of symbol sum, which will revert it to the built-in meaning. (That is, the built-in definition will no longer be shadowed by user definition.)
As for why Octave works this way, one would have to ask the maintainers of this open-source project. Perhaps it's because Matlab works this way, and Octave strives to be as compatible as possible.

define theano function with other theano function output

I am new to theano, can anyone help me defining a theano function like this:
Basically, I have a network model looks like this:
y_hat, cost, mu, output_hiddens, cells = nn_f(x, y, in_size, out_size, hidden_size, layer_models, 'MDN', training=False)
here the input x is a tensor:
x = tensor.tensor3('features', dtype=theano.config.floatX)
I want to define two theano functions for later use:
f_x_hidden = theano.function([x], [output_hiddens])
f_hidden_mu = theano.function([output_hiddens], [mu], on_unused_input = 'warn')
the first one is fine. for the second one, the problem is both the input and the output are output of the original function. it gives me error:
theano.gof.fg.MissingInputError: An input of the graph, used to compute Elemwise{identity}(features), was not provided and not given a value.
my understanding is, both of [output_hiddens] and [mu] are related to the input [x], there should be an relation between them. I tried define another theano function from [x] to [mu] like:
f_x_mu = theano.function([x], [mu]),
then
f_hidden_mu = theano.function(f_x_hidden, f_x_mu),
but it still does not work. Does anyone can help me? Thanks.
The simple answer is NO WAY. In here
Because in Theano you first express everything symbolically and afterwards compile this expression to get functions, ...
You can't use the output of theano.function as input/output for another theano.function since they are already a compiled graph/function.
You should pass the symbolic variables, such as x in your example code for f_x_hidden, to build the model.

Can I read the rest of the line after a positive value of IOSTAT?

I have a file with 13 columns and 41 lines consisting of the coefficients for the Joback Method for 41 different groups. Some of the values are non-existing, though, and the table lists them as "X". I saved the table as a .csv and in my code read the file to an array. An excerpt of two lines from the .csv (the second one contains non-exisiting coefficients) looks like this:
48.84,11.74,0.0169,0.0074,9.0,123.34,163.16,453.0,1124.0,-31.1,0.227,-0.00032,0.000000146
X,74.6,0.0255,-0.0099,X,23.61,X,797.0,X,X,X,X,X
What I've tried doing was to read and define an array to hold each IOSTAT value so I can know if an "X" was read (that is, IOSTAT would be positive):
DO I = 1, 41
(READ(25,*,IOSTAT=ReadStatus(I,J)) JobackCoeff, J = 1, 13)
END DO
The problem, I've found, is that if the first value of the line to be read is "X", producing a positive value of ReadStatus, then the rest of the values of those line are not read correctly.
My intent was to use the ReadStatus array to produce an error message if JobackCoeff(I,J) caused a read error, therefore pinpointing the "X"s.
Can I force the program to keep reading a line after there is a reading error? Or is there a better way of doing this?
As soon as an error occurs during the input execution then processing of the input list terminates. Further, all variables specified in the input list become undefined. The short answer to your first question is: no, there is no way to keep reading a line after a reading error.
We come, then, to the usual answer when more complicated input processing is required: read the line into a character variable and process that. I won't write complete code for you (mostly because it isn't clear exactly what is required), but when you have a character variable you may find the index intrinsic useful. With this you can locate Xs (with repeated calls on substrings to find all of them on a line).
Alternatively, if you provide an explicit format (rather than relying on list-directed (fmt=*) input) you may be able to do something with non-advancing input (advance='no' in the read statement). However, as soon as an error condition comes about then the position of the file becomes indeterminate: you'll also have to handle this. It's probably much simpler to process the line-as-a-character-variable.
An outline of the concept (without declarations, robustness) is given below.
read(iunit, '(A)') line
idx = 1
do i=1, 13
read(line(idx:), *, iostat=iostat) x(i)
if (iostat.gt.0) then
print '("Column ",I0," has an X")', i
x(i) = -HUGE(0.) ! Recall x(i) was left undefined
end if
idx = idx + INDEX(line(idx:), ',')
end do
An alternative, long used by many many Fortran programmers, and programmers in other languages, would be to use an editor of some sort (I like sed) and modify the file by changing all the Xs to NANs. Your compiler has to provide support for IEEE NaNs for this to work (most of the current crop in widespread use do) and they will correctly interpret NAN in the input file to a real number with value NaN.
This approach has the benefit, compared with the already accepted (and perfectly good) answer, of not requiring clever programming in Fortran to parse input lines containing mixed entries. Use an editor for string processing, use Fortran for reading numbers.

Using warning ('on', 'Octave:matlab-incompatible')

I recently found out that you can get Octave to warn when you are using features not compatible with Matlab. Due to working with others this feature is appealing.
warning ('on', 'Octave:matlab-incompatible')
However when I use it in even simple scripts
warning ('on', 'Octave:matlab-incompatible');
x = 5;
plot(x);
I get many warning due to the implementation of plot using non-Matlab compatible features. For example
warning: potential Matlab compatibility problem: ! used as operator near line 215 offile /usr/share/octave/3.8.1/m/plot/draw/plot.m
Is there a way to turn off these warnings? I don't care if plot is implemented using non-Matlab features because when I use Matlab its implementation will be fine.
No that is not possible which makes the Octave:matlab-incompatible almost useless. Also, that warning is only printed for syntax so you can still use Octave only functions (such as center or sumsq) without any problem.
I recommend you use a text editor that has separate Matlab and Octave syntax highlight (such as gedit) and avoid things that don't get highlighted.
Here's how it's done in Matlab, which looks to be similar to the Octave case:
warning('on', 'Octave:matlab-incompatible'); % Your Octave warning
x = 5;
WarnState = warning('off', 'Offending_MSGID'); % You'll need to get the specific ID
plot(x);
warning(WarnState); % Restore
Yes, it's a bit clumsy. There's no way to specify that a warning is not enabled within a particular file that I know of.
One thing that can happen is that the code an be interrupted by the user or an error before the warning state is restored. In this case, Your system now is in an unknown warning state. One way to avoid this is to use onCleanup. This function is called when a function exits, even if it exits due to an error. You might rewrite the above as:
warning('on', 'Octave:matlab-incompatible'); % Your Octave warning
x = 5;
WarnState = warning('off', 'Offending_MSGID'); % You'll need to get the specific ID
C = onCleanup(#()warning(WarnState));
plot(x);
...
Note that onCleanup won't be called until the function exits so the warning state won't be restored until then. You should be able to add a warning(WarnState); line to manually restore before if you want. Just be sure that whatever function onCleanup is calling can never return an error itself.