Fortran does not understand call statement - cuda

I am attempting to use PGFortran for CUDA. I installed PGFortran on my computer and linked everything up to the best of my knowledge. To get going I decided to follow a tutorial listed here. When trying to compile the code:
module mathOps
contains
attributes(global) subroutine saxpy(x, y, a)
implicit none
real :: x(:), y(:)
real, value :: a
integer :: i, n
n = size(x)
i = blockDim%x * (blockIdx%x - 1) + threadIdx%x
if (i <= n) y(i) = y(i) + a*x(i)
end subroutine saxpy
end module mathOps
program testSaxpy
use mathOps
use cudafor
implicit none
integer, parameter :: N = 40000
real :: x(N), y(N), a
real, device :: x_d(N), y_d(N)
type(dim3) :: grid, tBlock
tBlock = dim3(256,1,1)
grid = dim3(ceiling(real(N)/tBlock%x),1,1)
x = 1.0; y = 2.0; a = 2.0
x_d = x
y_d = y
call saxpy<<<grid, tblock="">>>(x_d, y_d, a)
y = y_d
write(*,*) 'Max error: ', maxval(abs(y-4.0))
end program testSaxpy
I got:
PGF90-S-0034-Syntax error at or near identifier saxpy (main.cuf: 29)
0 inform, 0 warnings, 1 severes, 0 fatal for testsaxpy
The error points to the line call saxpy<<<grid, tblock="">>>(x_d, y_d, a). For some reason it apparently hates the fact that I am using <<< and >>>? Going by the tutorial those triple chevrons are meant to be there:
The information between the triple chevrons is the execution
configuration, which dictates how many device threads execute the
kernel in parallel.
Removing these chevrons would not make any sense since they are the purpose of the program. So why does PGFortran dislike this?
As for compilation, I followed the tutorial and used pgf90 -o saxpy main.cuf. Since that gave the error above, I also tried pgf90 -Mcuda -o saxpy main.cuf, with the same result.

There does seem to be a text error in that blog at the kernel invocation line:
call saxpy<<<grid, tblock="">>>(x_d, y_d, a)
tblock="" is not correct. You'll notice elsewhere in that blog text, the kernel invocation line is given correctly as:
call saxpy<<<grid,tBlock>>>(x_d, y_d, a)
So if you change that line accordingly in your actual code, I think you'll have better results.

Adding a print statement in a Fortran 90 function does not work [duplicate]

I'm trying to learn Fortran (unfortunately a necessity for my research group) - one of the tasks I set myself was to package one of the necessary functions (Associated Legendre polynomials) from the Numerical Recipes book into a fortran 03 compliant module. The original program (f77) has some error handling in the form of the following:
if(m.lt.0.or.m.gt.1.or.abs(x).gt.1)pause 'bad arguments in plgndr'
Pause seems to have been deprecated since f77 as using this line gives me a compiling error, so I tried the following:
module sha_helper
implicit none
public :: plgndr, factorial!, ylm
contains
! numerical recipes Associated Legendre Polynomials rewritten for f03
function plgndr(l,m,x) result(res_plgndr)
integer, intent(in) :: l, m
real, intent(in) :: x
real :: res_plgndr, fact, pll, pmm, pmmp1, somx2
integer :: i,ll
if (m.lt.0.or.m.gt.l.or.abs(x).gt.1) then
write (*, *) "bad arguments to plgndr, aborting", m, x
res_plgndr=-10e6 !return a ridiculous value
else
pmm = 1.
if (m.gt.0) then
somx2 = sqrt((1.-x)*(1.+x))
fact = 1.
do i = 1, m
pmm = -pmm*fact*somx2
fact = fact+2
end do
end if
if (l.eq.m) then
res_plgndr = pmm
else
pmmp1 = x*(2*m+1)*pmm
if(l.eq.m+1) then
res_plgndr = pmmp1
else
do ll = m+2, l
pll = (x*(2*ll-1)*pmmp1-(ll+m-1)*pmm)/(ll-m)
pmm = pmmp1
pmmp1 = pll
end do
res_plgndr = pll
end if
end if
end if
end function plgndr
recursive function factorial(n) result(factorial_result)
integer, intent(in) :: n
integer, parameter :: RegInt_K = selected_int_kind(20) !should be enough for the factorials I am using
integer (kind = RegInt_K) :: factorial_result
if (n <= 0) then
factorial_result = 1
else
factorial_result = n * factorial(n-1)
end if
end function factorial
! function ylm(l,m,theta,phi) result(res_ylm)
! integer, intent(in) :: l, m
! real, intent(in) :: theta, phi
! real :: res_ylm, front_block
! real, parameter :: pi = 3.1415926536
! front_block = sqrt((2*l+1)*factorial(l-abs(m))/(4*pi*))
! end function ylm
end module sha_helper
The main code after the else works, but if I execute my main program and call the function with bad values, the program freezes before executing the print statement. I know that the print statement is the problem, as commenting it out allows the function to execute normally, returning -10e6 as the value. Ideally, I would like the program to crash after giving a user readable error message, as giving bad values to the plgndr function is a fatal error for the program. The function plgndr is being used by the program sha_lmc. Currently all this does is read some arrays and then print a value of plgndr for testing (early days). The function ylm in the module sha_helper is also not finished, hence it is commented out. The code compiles using gfortran sha_helper.f03 sha_lmc.f03 -o sha_lmc, and
gfortran --version
GNU Fortran (GCC) 4.8.2
!Spherical Harmonic Bayesian Analysis testbed for Lagrangian Dynamical Monte Carlo
program sha_analysis
use sha_helper
implicit none
!Analysis Parameters
integer, parameter :: harm_order = 6
integer, parameter :: harm_array_length = (harm_order+1)**2
real, parameter :: coeff_lo = -0.1, coeff_hi = 0.1, data_err = 0.01 !for now, data_err fixed rather than hierarchical
!Monte Carlo Parameters
integer, parameter :: run = 100000, burn = 50000, thin = 100
real, parameter :: L = 1.0, e = 1.0
!Variables needed by the program
integer :: points, r, h, p, counter = 1
real, dimension(:), allocatable :: x, y, z
real, dimension(harm_array_length) :: l_index_list, m_index_list
real, dimension(:,:), allocatable :: g_matrix
!Open the file, allocate the x,y,z arrays and read the file
open(1, file = 'Average_H_M_C_PcP_boschi_1200.xyz', status = 'old')
read(1,*) points
allocate(x(points))
allocate(y(points))
allocate(z(points))
print *, "Number of Points: ", points
readloop: do r = 1, points
read(1,*) x(r), y(r), z(r)
end do readloop
!Set up the forwards model
allocate(g_matrix(harm_array_length,points))
!Generate the l and m values of spherical harmonics
hloop: do h = 0, harm_order
ploop: do p = -h,h
l_index_list(counter) = h
m_index_list(counter) = p
counter = counter + 1
end do ploop
end do hloop
print *, plgndr(1,2,0.1)
!print *, ylm(1,1,0.1,0.1)
end program sha_analysis
Your program does what is known as recursive IO - the initial call to plgndr is in the output item list of an IO statement (a print statement) [directing output to the console] - inside that function you then also attempt to execute another IO statement [that outputs to the console]. This is not permitted - see 9.11p2 and p3 of F2003 or 9.12p2 of F2008.
A solution is to separate the function invocation from the io statement in the main program, i.e.
REAL :: a_temporary
...
a_temporary = plgndr(1,2,0.1)
PRINT *, a_temporary
Other alternatives in F2008 (but not F2003 - hence the [ ] parts in the first paragraph) include directing the output from the function to a different logical unit (note that WRITE (*, ... and PRINT ... reference the same unit).
In F2008 you could also replace the WRITE statement with a STOP statement with a message (the message must be a constant - which wouldn't let you report the problematic values).
The potential for inadvertently invoking recursive IO is part of the reason that some programming styles discourage conducting IO in functions.
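As a rough sketch of the approach described above, the following keeps the error report inside the function but evaluates the function in an assignment, so no I/O statement is active when the function writes. The names demo_helper and checked_half are hypothetical stand-ins for sha_helper and plgndr, and error stop assumes a compiler with Fortran 2008 support.

```fortran
module demo_helper
  use, intrinsic :: iso_fortran_env, only: error_unit
  implicit none
contains
  ! Hypothetical stand-in for plgndr: validates its arguments first.
  function checked_half(m, x) result(res)
    integer, intent(in) :: m
    real, intent(in) :: x
    real :: res
    if (m < 0 .or. abs(x) > 1.0) then
      ! Report the offending values, then terminate with a nonzero status.
      write (error_unit, *) "bad arguments, aborting: ", m, x
      error stop 1
    end if
    res = 0.5 * x
  end function checked_half
end module demo_helper

program demo
  use demo_helper
  implicit none
  real :: a_temporary
  ! Evaluate the function outside of any I/O statement ...
  a_temporary = checked_half(1, 0.5)
  ! ... and only then print the stored result.
  print *, a_temporary
end program demo
```

Because the call is made in an assignment rather than in the output list of print, the write inside the function is not recursive I/O, and the values that caused the failure can still be reported before the program stops.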
Try:
if (m.lt.0.or.m.gt.l.or.abs(x).gt.1) then
write (*, *) "bad arguments to plgndr, aborting", m, x
stop
else
...
end if

Double-precision error using Dislin

I get the following error when trying to compile:
call qplot (Z, B, m + 1)
1
Error: Type mismatch in argument 'x' at (1); passed REAL(8) to REAL(4)
Everything seems to be in double precision so I can't help but think it is a Dislin error, especially considering that it appears with reference to a Dislin statement. What am I doing wrong? My code is the following:
program test
use dislin
integer :: i
integer, parameter :: n = 2
integer, parameter :: m = 5000
real (kind = 8) :: X(n + 1), Z(0:m), B(0:m)
X(1) = 1.D0
X(2) = 0.D0
X(3) = 2.D0
do i = 0, m
Z(i) = -1.D0 + (2.D0*i) / m
B(i) = f(Z(i))
end do
call qplot (Z, B, m + 1)
read(*,*)
contains
real (kind = 8) function f(t)
implicit none
real (kind = 8), intent(in) :: t
real (kind = 8), parameter :: pi = Atan(1.D0)*4.D0
f = cos(pi*t)
end function f
end program
From the DISLIN manual I read that qplot requires (single precision) floats:
QPLOT connects data points with lines.
The call is: CALL QPLOT (XRAY, YRAY, N) level 0, 1
or: void qplot (const float *xray, const float *yray, int n);
XRAY, YRAY are arrays that contain X- and Y-coordinates.
N is the number of data points.
So you need to convert Z and B to real:
call qplot (real(Z), real(B), m + 1)
Instead of using fixed numbers for the kind of numbers (which vary between compilers), please consider using the ISO_Fortran_env module and the pre-defined constants REAL32 and REAL64.
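A minimal sketch of that suggestion, without the Dislin call so it stands alone: the double-precision data are declared with REAL64 and converted with the real intrinsic before they would be handed to a single-precision routine. The program and variable names are made up for illustration. Note that REAL32 usually coincides with default real but is not guaranteed to; real(z) with no kind argument always gives default real.

```fortran
program kinds_demo
  use, intrinsic :: iso_fortran_env, only: real32, real64
  implicit none
  integer, parameter :: m = 4
  real(real64) :: z(0:m)          ! working data in double precision
  real(real32) :: z_single(0:m)   ! copy for a single-precision plotting routine
  integer :: i

  do i = 0, m
    z(i) = -1.0_real64 + (2.0_real64 * i) / m
  end do

  ! Explicit conversion; real(z) without a kind would yield default real.
  z_single = real(z, kind=real32)

  print *, z_single
end program kinds_demo
```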
The qplot routine requires a default real. You can convert your data with the real intrinsic:
call qplot(real(Z), real(B), m + 1)
I second the remark about kind = 8; it is very ugly. If you insist on 8, at least declare a constant
integer, parameter :: rp = 8
and use
real(rp) ::
As the first two answers explain, the standard versions of the dislin routines require single precision arguments. I find it most convenient to use these, since I may have single or double arguments, and to convert double variables with the real intrinsic. It seems unlikely that the lost precision will be perceptible on a graph. However, if you wish to work exclusively in double precision, there is an alternative set of routines. They have the same names, but take double precision arguments. To obtain them, link in the library "dislin_d".

Cuda illegal memory access error when using array indexes stored in another array

I'm using cuda fortran and I've been struggling with this problem in one simple kernel and I couldn't find the solution.
Isn't it possible to use integer values stored in an array as the indexes for another array?
Here's a simple example (edited to include also the main program):
program test
use cudafor
integer:: ncell, i
integer, allocatable:: values(:)
integer, allocatable, device :: values_d(:)
ncell = 10
allocate(values(ncell), values_d(ncell))
do i=1,ncell
values(i) = i
enddo
values_d = values
call multipleindices_kernel<<< ncell/1024+1,1024 >>> (values_d,
+ ncell)
values = values_d
write (*,*) values
end program test
!////////////////////////////////////////////////////
attributes(global) subroutine multipleindices_kernel(valu, ncell)
use cudafor
implicit none
integer, value:: ncell ! ncell = 10
integer :: valu(ncell)
integer :: tempind(10)
integer:: i
tempind(1)=10
tempind(2)=3
tempind(3)=5
tempind(4)=7
tempind(5)=9
tempind(6)=2
tempind(7)=4
tempind(8)=6
tempind(9)=8
tempind(10)=1
i = (blockidx%x - 1 ) * blockdim%x + threadidx%x
if (i .LE. ncell) then
valu(tempind(i))= 1
endif
end subroutine
I understand that if there were repeated values in the tempind array, different threads could be accessing the same memory location for reading or writing, but that is not the case here.
Even so, this gives the error "0: copyout Memcpy (host=0x303610, dev=0x3e20000, size=40) FAILED: 77(an illegal memory access was encountered)".
Does anyone know if it is possible to use indexes coming from another array in CUDA?
After some additional tests, I've noticed that the problem occurs not while running the kernel itself, but on the transfer of the data back to CPU (if I remove "values = values_d" then no error is displayed). Also, if I substitute in the kernel valu(tempind(i)) by valu(i) it works fine, but I want to have the indexes coming from an array since the purpose of this test is to make a parallelization of a CFD code where the indexes are stored like that.
The problem appears to be that the generated executable doesn't pass the variable ncell to the kernel correctly. Running the application through cuda-memcheck shows that threads outside the range 1-10 get past the branch statement, and adding a print statement to print ncell inside the kernel also gives strange answers.
It used to be a requirement that all attributes(global) subroutines reside within a module. This requirement seems to have been relaxed in more recent versions of CUDA Fortran (I cannot find references to it in the programming guide). I believe having the kernel outside a module is causing the error here; a likely reason is that, without an explicit interface, the caller cannot know that ncell is declared with the value attribute. By placing multipleindices_kernel within a module and using that module in test, I consistently get correct answers with no errors. The code for this is below:
module testmod
contains
attributes(global) subroutine multipleindices_kernel(valu, ncell)
use cudafor
implicit none
integer, value:: ncell ! ncell = 10
integer :: valu(ncell)
integer :: tempind(10)
integer:: i
tempind(1)=10
tempind(2)=3
tempind(3)=5
tempind(4)=7
tempind(5)=9
tempind(6)=2
tempind(7)=4
tempind(8)=6
tempind(9)=8
tempind(10)=1
i = (blockidx%x - 1 ) * blockdim%x + threadidx%x
if (i .LE. ncell) then
valu(tempind(i))= 1
endif
end subroutine
end module testmod
program test
use cudafor
use testmod
integer:: ncell, i
integer, allocatable:: values(:)
integer, allocatable, device :: values_d(:)
ncell = 10
allocate(values(ncell), values_d(ncell))
do i=1,ncell
values(i) = i
enddo
values_d = values
call multipleindices_kernel<<< ncell/1024+1,1024 >>> (values_d, ncell)
values = values_d
write (*,*) values
end program test
!////////////////////////////////////////////////////
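The question notes that the failure only shows up at the copy back to the host. Kernel launches are asynchronous, so an error raised inside a kernel is typically reported by the next call that synchronizes with the device. A sketch of checking right after the launch, using the cudaGetLastError, cudaDeviceSynchronize and cudaGetErrorString functions from the cudafor module, could look like this. The module check_demo and the kernel set_ones are hypothetical, simplified stand-ins for the code above.

```fortran
module check_demo
contains
  ! Simplified stand-in kernel: each in-range thread sets one element.
  attributes(global) subroutine set_ones(valu, ncell)
    implicit none
    integer, value :: ncell
    integer :: valu(ncell)
    integer :: i
    i = (blockidx%x - 1) * blockdim%x + threadidx%x
    if (i <= ncell) valu(i) = 1
  end subroutine set_ones
end module check_demo

program check_launch
  use cudafor
  use check_demo
  implicit none
  integer, parameter :: ncell = 10
  integer :: values(ncell), istat
  integer, device :: values_d(ncell)

  values_d = 0
  call set_ones<<<ncell/1024 + 1, 1024>>>(values_d, ncell)

  ! Errors in the launch itself (for example a bad configuration).
  istat = cudaGetLastError()
  if (istat /= cudaSuccess) write (*,*) 'launch error: ', trim(cudaGetErrorString(istat))

  ! Wait for the kernel so errors raised during execution are reported here.
  istat = cudaDeviceSynchronize()
  if (istat /= cudaSuccess) write (*,*) 'kernel error: ', trim(cudaGetErrorString(istat))

  values = values_d
  write (*,*) values
end program check_launch
```

With checks like these, an illegal memory access in the kernel should be reported at the synchronization point rather than at the later values = values_d assignment, which makes it easier to tell which step failed.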

pass function as argument to subroutine using interface doesn't work in Plato Fortran 90

I created a fortran 90 program that I used on a linux machine and compiled using gfortran. It worked fine on the linux machine with gfortran but provides the error
error 327 - In the INTERFACE to SECANTMETHOD (from MODULE SECMETH), the ninth dummy argument (F) was of type REAL(KIND=2) FUNCTION, whereas the actual argument is of type REAL(KIND=2)
when using the Plato compiler (FTN95). Does anyone know how I would need to change my code to work in Plato? I tried to read up on this error and there was some mention of pointers but from what I tried that did not work. I have figured out some workarounds but they make it so that the subroutine can no longer accept any function as an argument - which is pretty much useless. Any help would be greatly appreciated. My code is below.
!--! A module to define a real number precision.
module types
integer, parameter :: dp=selected_real_kind(15)
end module types
module secFuncs
contains
function colebrookWhite(T)
use types
real(dp) :: colebrookWhite
real(dp), intent(in) :: T
colebrookwhite=25-T**2
return
end function colebrookWhite
end module secFuncs
module secMeth
contains
subroutine secantMethod(xolder,xold,xnew,epsi1,epsi2,maxit,exitFlag,numit,f)
use types
use secFuncs
implicit none
interface
function f(T)
use types
real(dp) :: f
real(dp), intent(in) :: T
end function f
end interface
real(dp), intent(in) :: epsi1, epsi2
real(dp), intent(inout) :: xolder, xold
real(dp), intent(out) :: xnew
integer, intent(in) :: maxit
integer, intent(out) :: numit, exitFlag
real(dp) :: fxold, fxolder, fxnew
integer :: i
fxolder = f(xolder)
fxold = f(xold)
i = 0
do
i = i + 1
xnew = xold - fxold*(xold-xolder)/(fxold-fxolder)
fxnew = f(xnew)
if (i == maxit) then
exitFlag = 1
numit = i
return
else if (abs(fxnew) < epsi1) then
exitFlag = 2
numit = i
return
else if (abs(xnew - xold) < epsi2) then
exitFlag = 3
numit = i
return
end if
xolder = xold
xold = xnew
fxolder = fxold
fxold = fxnew
end do
end subroutine secantMethod
end module secMeth
program secantRoots
use types
use secMeth
use secFuncs
implicit none
real(dp) :: x1, x2, xfinal, epsi1, epsi2
integer :: ioerror, maxit, numit, exitFlag
do
write(*,'(A)',advance="no")"Please enter two initial root estimates, 2epsi's, and maxit: "
read(*,*,iostat=ioerror) x1, x2, epsi1, epsi2, maxit
if (ioerror /= 0) then
write(*,*)"Invalid input."
else
exit
end if
end do
call secantMethod(x1,x2,xfinal,epsi1,epsi2,maxit,exitFlag,numit,colebrookWhite)
if (exitFlag == 1) then
write(*,*)"The maximum number of iterations was reached."
else if (exitFlag == 2) then
write(*,'(a,f5.3,a,i3,a)')"The root is ", xfinal, ", which was reached in ", numit, " iterations."
else if (exitFlag == 3) then
write(*,'(a,i3,a)')"There is slow or no progress at ", numit, " iterations."
end if
end program secantRoots
Current gfortran detects the error in the call to the secantMethod procedure, where you have parentheses, but no argument list, following the colebrookWhite function name.
If you want to pass a function as an argument (as opposed to the result of evaluating a function), which is what you want to do here, you do not follow the function name with a parenthesis pair.
call secantMethod(x1,x2,xfinal,epsi1,epsi2,maxit,exitFlag,numit,colebrookWhite )
! ^
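A stripped-down sketch of the same pattern, assuming nothing beyond standard Fortran 2003: the function name is passed on its own, and the receiving subroutine describes the dummy function with an interface block. The names apply_mod, test_func and evaluate are invented for this illustration.

```fortran
module apply_mod
  implicit none
  integer, parameter :: dp = selected_real_kind(15)
contains
  ! Hypothetical test function with the same form as colebrookWhite.
  function test_func(t) result(y)
    real(dp), intent(in) :: t
    real(dp) :: y
    y = 25.0_dp - t**2
  end function test_func

  ! Evaluates whatever function is passed in as f at the point x.
  subroutine evaluate(f, x, fx)
    interface
      function f(t)
        import :: dp              ! make dp visible inside the interface body
        real(dp), intent(in) :: t
        real(dp) :: f
      end function f
    end interface
    real(dp), intent(in) :: x
    real(dp), intent(out) :: fx
    fx = f(x)
  end subroutine evaluate
end module apply_mod

program pass_function
  use apply_mod
  implicit none
  real(dp) :: fx
  ! Pass the function itself: name only, no parentheses.
  call evaluate(test_func, 3.0_dp, fx)
  print *, fx
end program pass_function
```

Writing test_func() here instead would ask the compiler to call the function (with a missing argument) and pass its real(dp) result, which is the kind of type mismatch the FTN95 message describes.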
I ended up just switching from Plato to Geany IDE (I actually like Geany WAY better now that I've used it for a couple hours), setting up gfortran with Geany, and the code works with that setup. I'm guessing the reason I'm getting the error with Plato is that its compiler is actually a fortran95 compiler while gfortran is a fortran90 compiler. It took a while to get everything working but once I downloaded mingw-w64 for gfortran and set the path user (not system) environment variable to the correct location everything works great. I would still be interested in seeing if there is a way to get the code working with the FTN95 compiler, but in the end I'm still sticking with gfortran and Geany.

How can I fill a matrix using a fortran subroutine (or function) and pass the matrix back to the main program?

I'm working on a Fortran 90 assignment, and I'm having a lot of issues learning how to use subroutines and functions, so I'm hoping someone can help me. If it isn't obvious, I'm extremely new to Fortran and much more comfortable with languages like C and Java.
Anyway, here's what I have to do: The user selects what they would like to do: add, subtract, multiply, or transpose two matrices. I'm using a select case for this, which works great. However, I obviously don't want to duplicate the same code to fill two matrices four different times, so I'm trying to make it a separate function. Ideally, I'd like to do something like this:
integer matrix1(11,11), matrix2(11,11)
integer rows1,cols1,rows2,cols2,i,j
case (1)
matrix1 = fillmatrix(rows1,cols1)
matrix2 = fillmatrix(rows2,cols2)
.
.
.
function fillmatrix(rows,columns)
integer input
read *,rows,columns
do i = 1, rows
do j = 1, columns
fillmatrix(i,j) = read *,input
end do
end do
end
Is there any way to do something like this? And am I making myself clear? Sometimes I have trouble saying what I mean.
Or is this possible?
matrix1 = fillmatrix(rows1,cols1)
function fillmatrix(rows,columns)
integer input,matrix(11,11)
//fill matrix
return matrix
end
In C or Java, you just have functions, but Fortran has both functions and subroutines. In a case like this, it might be easier to write it as a subroutine instead of as a function, so your call would look something like
integer matrix1(11,11), matrix2(11,11)
integer rows1,cols1,rows2,cols2,i,j
...
case (1)
call fillmatrix(matrix1)
call fillmatrix(matrix2)
...
where the subroutine would look something like
subroutine fillmatrix(m)
implicit none
integer, intent(out) :: m(:,:)
integer :: i, j
do j = 1,size(m,2)
do i = 1,size(m,1)
read *, m(i,j)
end do
end do
end subroutine fillmatrix
Note that I'm not directly specifying the array bounds - instead I'm figuring them out inside the subroutine. This means that this subroutine needs an explicit interface - the easiest way to get this is to put it in either a contains block or a module.
If you want to use a function, you need to know the size of the matrix before calling it. Here is a small example:
module readMatrix
implicit none
contains
function fillmatrix(cols,rows)
implicit none
! Argument/return value
integer,intent(in) :: rows,cols
integer :: fillmatrix(cols,rows) ! same shape as matrix(col,row) in the caller
! Loop counters
integer :: i,j
do j = 1, rows
do i = 1, cols
write(*,*) 'Enter matrix element ',i,j
read *,fillmatrix(i,j)
enddo ! i
enddo ! j
end function
end module
program test
use readMatrix
implicit none
integer,allocatable :: matrix(:,:)
integer :: row,col, stat
write(*,*) 'Enter number of rows'
read *,row
write(*,*) 'Enter number of cols'
read *,col
allocate( matrix(col,row), stat=stat )
if (stat/=0) stop 'Cannot allocate memory'
matrix = fillmatrix(col,row)
write(*,*) matrix
deallocate(matrix)
end program
This is similar, using a subroutine and a static array (like in the question):
module readMatrix
implicit none
contains
subroutine fillmatrix(cols,rows,matrix)
implicit none
! Argument/return value
integer,intent(out) :: rows,cols
integer,intent(out) :: matrix(:,:)
! Loop counters
integer :: i,j
write(*,*) 'Enter number of rows, up to a maximum of ',size(matrix,2)
read *,rows
write(*,*) 'Enter number of cols, up to a maximum of ',size(matrix,1)
read *,cols
if ( rows > size(matrix,2) .or. cols > size(matrix,1) ) &
stop 'Invalid dimension specified'
do j = 1, rows
do i = 1, cols
write(*,*) 'Enter matrix element ',i,j
read *,matrix(i,j)
enddo ! i
enddo ! j
end subroutine
end module
program test
use readMatrix
implicit none
integer,parameter :: maxCol=10,maxRow=10
integer :: matrix(maxCol,maxRow)
integer :: row,col
call fillmatrix(col,row,matrix)
write(*,*) matrix(1:col,1:row)
end program
You could even pass an allocatable array to the subroutine and allocate it there, but that's a different story...
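For completeness, here is a minimal sketch of that last variant, assuming a compiler that supports allocatable dummy arguments (Fortran 2003, or Fortran 95 with TR 15581). The names alloc_fill and make_matrix are made up, and the matrix is filled with placeholder values instead of being read from the keyboard so that the example runs unattended.

```fortran
module alloc_fill
  implicit none
contains
  ! Allocates the matrix to the requested shape and fills it.
  subroutine make_matrix(rows, cols, matrix)
    integer, intent(in) :: rows, cols
    integer, allocatable, intent(out) :: matrix(:,:)
    integer :: i, j
    allocate(matrix(rows, cols))
    do j = 1, cols
      do i = 1, rows
        matrix(i,j) = 10*i + j   ! placeholder; a read statement could go here
      end do
    end do
  end subroutine make_matrix
end module alloc_fill

program alloc_demo
  use alloc_fill
  implicit none
  integer, allocatable :: m(:,:)
  call make_matrix(2, 3, m)
  print *, shape(m)
  print *, m
end program alloc_demo
```

The caller no longer needs to know the size in advance or reserve a maximum; the price is that the subroutine must have an explicit interface, which the module provides.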