A software library originally written for MATLAB, comprising MATLAB and C source files, is being ported to Octave. The C code uses the MATLAB MEX file interface. The library works without error on MATLAB, but not on Octave. The C source is closed and I don't have access to it, but someone kindly compiled it for me.
The following Octave code
Y=ones(size(X)) + X;
fails with the error
Subscript indices must be either positive integers or logicals.
X is a matrix returned by the MEX module.
I've already verified that ones and size refer to the built-in functions and are not shadowed by local variables.
How can I fix this?
EDIT
Breaking down into steps:
S=size(X);
O=ones(S);
X+O;
gives the above error on the last line, the addition. The whos command outputs this:
octave:13> whos O X
Variables in the current scope:
   Attr Name        Size                     Bytes  Class
   ==== ====        ====                     =====  =====
        O         512x512                  2097152  double
        X         512x512                  2097152  double
Total is 524288 elements using 4194304 bytes
The error you report should not happen, and I can't figure out what could be causing it. If I generate an X matrix like the one you report, it all works fine:
X = rand (512, 512);
S = size (X);
O = ones (S);
X+O;
I don't know how you confirmed that you were using the builtin functions, so can you check that this works:
X = rand (512, 512);
S = builtin ("size", X);
O = builtin ("ones", S);
X+O;
Or could it be that the mex file someone compiled for you somehow overloads the plus operator for double? Since you don't have the source for it, I'd suggest you do the following. After calling the mex function, save X, exit, and load it in a new Octave session. Check whether the error disappears, and if not, share the file with us so that we can at least try to reproduce it.
X = your_closed_source_mex (...);
save -binary data.dat X
exit();
Then start a new Octave session:
load -binary data.dat
whos X # confirm that X is loaded
S = size (X);
O = ones (S);
X+O;
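To test the overloaded-plus hypothesis more directly, a quick sanity check right after the MEX call might also help (a hedged sketch; exact output wording may differ between Octave versions):
class (X)          # should print "double" for a plain matrix
isa (X, "double")  # should be 1
which plus         # should report a built-in function if nothing shadows it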
Running Octave 6.3.0 for Windows. I need to get the smallest eigenvalue of some matrix. eigs(A,1,"sm") is supposed to do that, but I often get wrong results with singular matrices.
eigs(A) (which returns the first 6 eigenvalues/vectors) is correct (at least to machine precision):
>> A = [[1 1 1];[1 1 1];[1 1 1]]
A =
1 1 1
1 1 1
1 1 1
>> [v lambda flag] = eigs(A)
v =
0.5774 -0.3094 -0.7556
0.5774 -0.4996 0.6458
0.5774 0.8091 0.1098
lambda =
Diagonal Matrix
3.0000e+00 0 0
0 -4.5198e-16 0
0 0 -1.5831e-17
flag = 0
But eigs(A,1,"sm") is not:
>> [v lambda flag] = eigs(A,1,"sm")
warning: eigs: 'A - sigma*B' is singular, indicating sigma is exactly an eigenvalue so convergence is not guaranteed
warning: called from
eigs at line 298 column 20
warning: matrix singular to machine precision
warning: called from
eigs at line 298 column 20
warning: matrix singular to machine precision
warning: called from
eigs at line 298 column 20
warning: matrix singular to machine precision
warning: called from
eigs at line 298 column 20
warning: matrix singular to machine precision
warning: called from
eigs at line 298 column 20
v =
-0.7554
0.2745
0.5950
lambda = 0.4322
flag = 0
Not only is the returned eigenvalue wrong, but the returned flag is zero, indicating that everything went right in the function...
Is this a wrong usage of eigs() (though from the doc I can't see what would be wrong) or a bug?
EDIT: if not a bug, at least a design issue... There is also no problem when requesting the 2 smallest values instead of the smallest value alone.
>> eigs(A,2,"sm")
ans =
-1.7700e-17
-5.8485e-16
EDIT 2: the eigs() function in Matlab online runs fine and returns the correct results (to machine precision):
>> A=ones(3)
A =
1 1 1
1 1 1
1 1 1
>> [v lambda flag] = eigs(A,1,"smallestabs")
v =
-0.7556
0.6458
0.1098
lambda =
-1.5831e-17
flag =
0
After more tests and investigations I think I can answer that yes, Octave eigs() has some flaw.
eigs(A,1,"sm") likely uses the inverse power iteration method, that is repeatedly solving y=A\x, then x=y, starting with an arbitrary x vector. Obviously there's a problem if A is singular. However:
Matlab eigs() runs fine in such case, and returns the correct eigenvalue (at the machine precision). I don't know what it does, maybe adding a tiny value on the diagonal if the matrix is detected as singular, but it does something better (or at least different) than Octave.
If for some (good or bad) reason Octave's algorithm cannot handle a singular matrix, then this should be reflected in the 3rd return argument ("flag"). Instead, it is always zero as if everything went OK.
eigs(A,1,"sm") is actually equivalent to eigs(A,1,0), and the more general syntax is eigs(A,1,sigma), which means "find the closest eigenvalue to sigma, and the associated eigenvector". For this, the inverse power iteration method is applied with the matrix A-sigma*I. Problem: if sigma is already an exact eigenvalue this matrix is singular by definition. Octave eigs() fails in this case, while Matlab eigs() succeeds. It's kind of weird to have a failure when one knows in advance the exact eigenvalue, or sets it by chance. So the right thing to do in Octave is to test if (A-sigma.I) is singular, and if yes add a tiny value to sigma: eigs(A,1,sigma+eps*norm(A)). Matlab eigs() probably does something like that.
I'm trying to solve the following ODE:
where R(T) is defined as:
This is my not so great attempt at using Octave:
1;
function xdot = f(t, T)
xdot = 987 * ( 0.0000696 * ( 1 + 0.0038 * ( T(t) - 25 ))) - ( 0.0168 * (T(t)-25 )) - (( 3.25 * 10 ^ (-13))) * ((T(t))^4 - (25^4));
endfunction
[x, istate, msg] = lsode( "f", 100, (t=linspace(0,3600,1000)'));
T_ref and T_infinity_sign are the same constant.
Why isn't my code correct?
If you type
debug_on_error(1)
in your Octave session, and then run your code, you will see that the "f" part is called as expected, but then it fails inside lsode with the following error:
error: T(100): out of bound 1 (dimensions are 1x1)
error: called from
f at line 4 column 8
If you look at the documentation of lsode, it says it expects a function f whose first argument is a state vector x, and the second is a scalar, corresponding to time t at which that state vector occurs; f is expected to output the differential dx/dt at time t for state vector x.
From your code it seems that you are reversing the order of the arguments (and their meanings).
Therefore, when lsode calls your function, your second argument T receives the scalar time, so when you then try to evaluate T(t), it fails with an 'out of bound' error.
My advice would be, read the documentation of lsode and have a look at its examples carefully, and start playing with it to understand how it works.
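For illustration, here is a minimal hedged sketch of what a corrected version could look like, keeping your constants and assuming T is a scalar temperature state (the state comes first, the scalar time second):
1;
function Tdot = f (T, t)
  # T is the state (temperature) at time t; return dT/dt
  Tdot = 987 * (0.0000696 * (1 + 0.0038 * (T - 25))) ...
         - 0.0168 * (T - 25) ...
         - 3.25e-13 * (T^4 - 25^4);
endfunction
t = linspace (0, 3600, 1000)';
[T, istate, msg] = lsode ("f", 100, t);   # initial condition: T = 100 at t = 0
plot (t, T);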
PS. The debug_on_error directive launches the debugger if an error occurs during code execution. To exit the debugger, type dbquit (or, if you're using the GUI, click on the 'Quit Debugging Mode' button at the top right of the Octave editor). If you don't know how to use the Octave debugger, I recommend you spend some time learning it; it is a very useful tool.
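A minimal sketch of that workflow:
debug_on_error (1);   # enter the debugger automatically when an error is raised
# ... run the failing code; at the debug> prompt, inspect variables (e.g. with whos), then:
dbquit                # leave the debugger and return to the normal prompt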
error: 'y' undefined near line 8 column 12
error: called from computeCost at line 8 column 3
Here is my code:
1;
function J = computeCost(X, y, theta)
%COMPUTECOST Compute cost for linear regression
% J = COMPUTECOST(X, y, theta) computes the cost of using theta as the
% parameter for linear regression to fit the data points in X and y
% Initialize some useful values
m = length(y); % number of training examples
% You need to return the following variables correctly
J = 0;
% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta
% You should set J to the cost.
J = sum(( X * theta - y ) .^2 )/( 2 * m );
% =========================================================================
end
I am guessing it's an error from the Coursera ML course assignment. I think you are trying to run the file which contains the implementation of the function computeCost(X, y, theta), rather than the file which calls computeCost with actual values of X, y, and theta. This is why you are getting the error: you aren't providing y.
Run the file which is calling computeCost() function, not the file which contains the implementation of computeCost() function.
That is:
For Week2 Assignment 1: Run ex1.m file
For Week3 Assignment 2: Run ex2.m file
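Alternatively, you can exercise computeCost directly from the Octave prompt once the function is on the path; a hedged sketch with toy data (the values below are illustrative, not from the assignment):
X = [ones(5, 1), (1:5)'];        % design matrix with an intercept column
y = 2 + 3 * (1:5)';              % synthetic targets
theta = [0; 0];
J = computeCost (X, y, theta)    % prints a finite, non-negative cost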
There are two things happening here. First, you are defining your function dynamically rather than in its own file; I'm not sure why you would prefer that.
Second, after having defined this computeCost function, you are calling it from a context where you did not pass a y argument (or presumably, you didn't pass any arguments to it, and y happens to be the first one detected as missing inside the function).
Since this is a cost function and your code looks suspiciously like code from Andrew Ng's Machine Learning course on Coursera, I am going to go out on a limb here and guess that you called computeCost from something else that was supposed to use it as a cost function to be optimised, e.g. fminunc. Typically functions like fminunc expect a function handle as an argument, but they expect a very specific function handle too. If you look at the help of fminunc, it states that:
FCN should accept a vector (array) defining the unknown variables,
and return the objective function value, optionally with gradient.
Therefore, if you want to pass a function that is to be computed with three arguments, you need to "wrap" it into your own handle, which you can define on the spot, e.g. @(x) computeCost(x, y, t) (assuming 'y' and 't' exist already).
So, I'm guessing that instead of calling fminunc like so: fminunc( @(x) computeCost(x, y, t) ),
you probably called it like so: fminunc( @computeCost )
or even like so: fminunc( computeCost ) (which evaluates the function first, rather than passing a function handle as an argument).
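For contrast, a hedged sketch of a call that should work (assuming X and y already exist in the workspace and computeCost is on the path; the names are illustrative):
initial_theta = zeros (size (X, 2), 1);            % starting point for the optimiser
costFun = @(theta) computeCost (X, y, theta);      % wrap the 3-argument cost function
[theta_opt, J_min] = fminunc (costFun, initial_theta);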
Basically, go back to the code given to you by coursera, or read the notes carefully. You're calling things the wrong way.
Actually, you are trying to run a function, and you can't run it until you provide the required number of arguments. If you call it without them, you will encounter the following error:
computeCost error: 'y' undefined near line 7 column 12
error: called from computeCost at line 7 column 3
As you can see, here I'm calling the function without passing any arguments.
SOLUTION:
You can test your code by running the 'ex1' script. After that, submit your work by calling the 'submit' script.
Let us suppose that we run the following set of commands in Octave:
pkg load symbolic %loads the symbolic math package
syms x y %declare x and y symbols
f = x^2 - 2*x + 3;
V = [-5:0.25:5]';
V_x = subs(f, x, V)
At this point V_x is a symbolic expression in Octave. Now, if this were MATLAB, one would run eval(V_x) and everything would be converted to numbers. However, eval does not behave the same way in Octave.
What should be done to convert the symbolic array into numbers?
double has been overloaded for symbolic variables, so you can use double to explicitly convert the symbolic result to its numeric representation:
V_x_num = double(V_x);
This works in MATLAB as well as Octave.
And how about getting variable precision (number of digits) in the evaluated symbolic output, i.e. staying in the vpa symbolic space but resolving all the internal sym functions to digits?
eval still outputs with the default Octave output_precision and format long limitations, so that is of no use.
This is one way (the sym variable y holds the value):
sympref digits 1000
syms x
eqn = x==y
vpasolve(eqn)
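Another hedged option, if the goal is just a high-precision numeric value of an existing symbolic result: vpa() from the symbolic package evaluates a sym to the requested number of digits while staying in the symbolic world (the expression below is illustrative):
pkg load symbolic
syms x
f = x^2 - 2*x + 3;
val = subs (f, x, sym (1) / 3);   % exact rational result, still a sym
vpa (val, 50)                     % evaluate to 50 digits, returned as a sym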
I am trying to make a simple program with PGI's Fortran compiler. This simple program will use the graphics card to calculate pi using the "dart board" algorithm. After battling with this program for quite some time now, I have finally got it to behave for the most part. However, I am currently stuck on passing back the results properly. I must say, this is a rather tricky program to debug, since I can no longer shove any print statements into the subroutine. This program currently returns all zeros. I am not really sure what is going on, but I have two ideas, neither of which I am sure how to fix:
The CUDA kernel is not running somehow?
I am not converting the values properly? pi_parts = pi_parts_d
Well, this is the status of my current program. All variables ending in _d stand for CUDA-prepared device memory, while all the other variables (with the exception of the CUDA kernel) are typical Fortran CPU variables. There are some print statements I have commented out that I have already tried from CPU Fortran land; those were there to check whether I really was generating the random numbers properly. As for the CUDA method, I have currently commented out the calculations and set z statically to 1 just to see something happen.
module calcPi
contains
attributes(global) subroutine pi_darts(x, y, results, N)
use cudafor
implicit none
integer :: id
integer, value :: N
real, dimension(N) :: x, y, results
real :: z
id = (blockIdx%x-1)*blockDim%x + threadIdx%x
if (id .lt. N) then
! SQRT NOT NEEDED, SQRT(1) === 1
! Anything above and below 1 would stay the same even with the applied
! sqrt function. Therefore using the sqrt function wastes GPU time.
z = 1.0
!z = x(id)*x(id)+y(id)*y(id)
!if (z .lt. 1.0) then
! z = 1.0
!else
! z = 0.0
!endif
results(id) = z
endif
end subroutine pi_darts
end module calcPi
program final_project
use calcPi
use cudafor
implicit none
integer, parameter :: N = 400
integer :: i
real, dimension(N) :: x, y, pi_parts
real, dimension(N), device :: x_d, y_d, pi_parts_d
type(dim3) :: grid, tBlock
! Initialize the random number generator's seed
call random_seed()
! Make sure we initialize the parts with 0
pi_parts = 0
! Prepare the random numbers (These cannot be generated from inside the
! cuda kernel)
call random_number(x)
call random_number(y)
!write(*,*) x, y
! Convert the random numbers into graphics card memory land!
x_d = x
y_d = y
pi_parts_d = pi_parts
! For the cuda kernel
tBlock = dim3(256,1,1)
grid = dim3((N/tBlock%x)+1,1,1)
! Start the cuda kernel
call pi_darts<<<grid, tblock>>>(x_d, y_d, pi_parts_d, N)
! Transform the results into CPU Memory
pi_parts = pi_parts_d
write(*,*) pi_parts
write(*,*) 'PI: ', 4.0*sum(pi_parts)/N
end program final_project
EDIT TO CODE:
Changed various lines to reflect the fixes mentioned by Robert Crovella. Current status: error caught by cuda-memcheck revealing: Program hit error 8 on CUDA API call to cudaLaunch on my machine.
If there is any method I can use to test this program please let me know. I am throwing darts and seeing where they land for my current style of debugging with CUDA. Not the most ideal, but it will have to do until I find another way.
May the Fortran Gods have mercy on my soul at this dark hour.
When I compile and run your program I get a segfault. This is due to the last parameter you are passing to the kernel (N_d):
call pi_darts<<<grid, tblock>>>(x_d, y_d, pi_parts_d, N_d)
Since N is a scalar quantity, the kernel is expecting to use it directly, rather than as a pointer. So when you pass a pointer to device data (N_d), the process of setting up the kernel generates a seg fault (in host code!) as it attempts to access the value N, which should be passed directly as:
call pi_darts<<<grid, tblock>>>(x_d, y_d, pi_parts_d, N)
When I make that change to the code you have posted, I then get actual printed output (instead of a seg fault), which is an array of ones and zeroes (256 ones, followed by 144 zeroes, for a total of N=400 values), followed by the calculated PI value (which happens to be 2.56 in this case (4*256/400), since you have made the kernel basically a dummy kernel).
This line of code is also probably not doing what you want:
grid = dim3(N/tBlock%x,1,1)
With N = 400 and tBlock%x = 256 (from previous code lines), the result of the calculation is 1 (i.e. grid ends up at (1,1,1), which amounts to one threadblock). But you really want to launch 2 threadblocks, so as to cover the entire range of your data set (N = 400 elements). There are a number of ways to fix this, but for simplicity let's just always add 1 to the calculation:
grid = dim3((N/tBlock%x)+1,1,1)
Under these circumstances, when we launch grids that are larger (in terms of total threads) than our data set size (512 threads but only 400 data elements in this example) it's customary to put a thread check near the beginning of our kernel (in this case, after the initialization of id), to prevent out-of-bounds accesses, like so:
if (id .lt. N) then
(with a corresponding endif at the very end of the kernel code). This way, only the threads that correspond to actual valid data are allowed to do any work.
With the above changes, your code should be essentially functional, and you should be able to revert your kernel code to the proper statements and start to get an estimate of PI.
Note that you can check the CUDA API for error return codes, and you can also run your code with cuda-memcheck to get an idea of whether the kernel is making out-of-bounds accesses. Neither of these would have helped with this particular seg fault, however.