PGI CUDA Fortran compiling error

When I compile a single CUDA Fortran source file, the compiler gives me the following errors:
PGF90-F-0000-Internal compiler error. Device compiler exited with error status code
and
Attempt to call global subroutine without chevrons: increment
Environment: Arch Linux, pgf90 2013.
The code is as follows:
module simple
contains
attributes (global) subroutine increment(a,b)
implicit none
integer, intent(inout) :: a(:)
integer , intent(in) :: b
integer :: i , n
n = size( a )
do i = 1 , n
a ( i ) = a ( i )+ b
end do
end subroutine increment
end module simple
program incrementTestCPU
use simple
implicit none
integer :: n = 256
integer :: a ( n ) , b
a = 1
b = 3
call increment ( a , b )
if ( any ( a /= 4)) then
write (* ,*) "pass"
else
write(*,*) "not passed"
end if
end program incrementTestCPU

You're calling this a "cuda fortran" code, but it is syntactically incorrect whether you want to ultimately run the subroutine on the host (CPU) or device (GPU). You may wish to refer to this blog post as a quick start guide.
If you want to run the subroutine increment on the GPU, you have not called it correctly:
call increment ( a , b )
A GPU subroutine call needs kernel launch parameters, which are given in the "triple chevron" <<<...>>> syntax placed between the subroutine name and its argument list, like so:
call increment<<<1,1>>> ( a , b )
The missing chevrons in your call are what give rise to the error message:
Attempt to call global subroutine without chevrons
If, instead, you intend to run this subroutine on the CPU, and are just passing it through the CUDA Fortran compiler, then it is incorrect to specify the global attribute on the subroutine:
attributes (global) subroutine increment(a,b)
The following is a modification of your code which would run the subroutine on the GPU, and compiles cleanly for me with PGI 14.9 tools:
$ cat test3.cuf
module simple
contains
attributes (global) subroutine increment(a,b)
implicit none
integer :: a(:)
integer, value :: b
integer :: i , n
n = size( a )
do i = 1 , n
a ( i ) = a ( i )+ b
end do
end subroutine increment
end module simple
program incrementTestCPU
use simple
use cudafor
implicit none
integer, parameter :: n = 256
integer, device :: a_d(n), b_d
integer :: a ( n ) , b
a = 1
b = 3
a_d = a
b_d = b
call increment<<<1,1>>> ( a_d , b_d )
a = a_d
if ( any ( a /= 4)) then ! any element other than 4 means failure
write (* ,*) "not passed"
else
write(*,*) "pass"
end if
end program incrementTestCPU
$ pgf90 -Mcuda -ta=nvidia,cc20,cuda6.5 -Minfo test3.cuf -o test3
incrementtestcpu:
23, Memory set idiom, loop replaced by call to __c_mset4
29, any reduction inlined
$ pgf90 --version
pgf90 14.9-0 64-bit target on x86-64 Linux -tp nehalem
The Portland Group - PGI Compilers and Tools
Copyright (c) 2014, NVIDIA CORPORATION. All rights reserved.
$
If you are trying to create a CPU-only version, then remove all CUDA Fortran syntax from your program. If you still have difficulty, you can ask a Fortran-directed question, as it is not a CUDA issue at that point. As an example, the following (non-CUDA) code compiled cleanly for me:
module simple
contains
subroutine increment(a,b)
implicit none
integer, intent(inout) :: a(:)
integer , intent(in) :: b
integer :: i , n
n = size( a )
do i = 1 , n
a ( i ) = a ( i )+ b
end do
end subroutine increment
end module simple
program incrementTestCPU
use simple
implicit none
integer, parameter :: n = 256
integer :: a ( n ) , b
a = 1
b = 3
call increment ( a , b )
if ( any ( a /= 4)) then ! any element other than 4 means failure
write (* ,*) "not passed"
else
write(*,*) "pass"
end if
end program incrementTestCPU

Related

Need help understanding how this Haskell code works

I am trying to learn the Haskell programming language by working through some pieces of code.
I have these two small functions, but I have no idea how to test them in GHCi.
What parameters should I use when calling these functions?
total :: (Integer -> Integer) -> Integer -> Integer
total function count = foldr(\x count -> function x + count) 0 [0..count]
For a given value n, the function above is supposed to return f 0 + f 1 + ... + f n.
However when calling the function I don't understand what to put in the f part. n is just an integer, but what is f supposed to be?
iter :: Int -> (a -> a) -> (a -> a)
iter n f
| n > 0 = f . iter (n-1) f
| otherwise = id
iter' :: Int -> (a -> a) -> (a -> a)
iter' n = foldr (.) id . replicate n
This function is supposed to compose the given function f :: a -> a with itself n :: Int times, e.g., iter 2 f = f . f.
Once again when calling the function I don't understand what to put instead of f as a parameter.
To your first question, you use any value for f such that
f 0 + f 1 + ... + f n
indeed makes sense. You could use any numeric function capable of accepting an Integer argument and returning an Integer value, like (1 +), abs, signum, error "error", (\x -> x^3-x^2+5*x-2), etc.
"Makes sense" here means that the resulting expression has a type ("typechecks", in the vernacular), not that it would run without causing an error.
To your second question, use any function that returns the same type of value as its argument, like (1+), (2/), etc.

Fortran: Calling other functions in a function

I wrote a GNU Fortran program in two separate files in Code::Blocks: main.f95 and example.f95. main.f95 content:
program testing
use example
implicit none
integer :: a, b
write(*,"(a)", advance="no") "Enter first number: "
read(*,*) a
write(*,"(a)", advance="no") "Enter second number: "
read(*,*) b
write(*,*) factorial(a)
write(*,*) permutation(a, b)
write(*,*) combination(a, b)
end program testing
example.f95 content:
module example
contains
integer function factorial(x)
implicit none
integer, intent(in) :: x
integer :: product_ = 1, i
if (x < 1) then
factorial = -1
else if (x == 0 .or. x == 1) then
factorial = 1
else
do i = 2, x
product_ = product_ * i
end do
factorial = product_
end if
end function factorial
real function permutation(x, y)
implicit none
integer, intent(in) :: x, y
permutation = factorial(x) / factorial(x - y)
end function permutation
real function combination(x, y)
implicit none
integer, intent(in) :: x, y
combination = permutation(x, y) / factorial(y)
end function combination
end module example
When I run this code, the output is:
Enter first number: 5
Enter second number: 3
120
0.00000000
0.00000000
The permutation and combination functions don't work properly. Thanks for answers.
I think you've fallen foul of one of Fortran's well-known (to those who know it) gotchas. But before revealing that I have to ask how much testing you did? I ran your code, got the odd result and thought for a minute ...
then I tested the factorial function for a few small values of x which produced
factorial 1 = 1
factorial 2 = 2
factorial 3 = 12
factorial 4 = 288
factorial 5 = 34560
factorial 6 = 24883200
factorial 7 = 857276416
factorial 8 = -511705088
factorial 9 = 1073741824
factorial 10 = 0
which is obviously wrong. So it seems that you didn't test your code properly, if at all, before asking for help. (I didn't test your combination and permutation functions.)
O tempora, o mores
You've initialised the variable product_ in the line
integer :: product_ = 1, i
and this automatically means that product_ acquires the save attribute, so its value is retained from one invocation to the next (gotcha!). At the start of each call (other than the first) product_ has the value it had at the end of the previous call.
The remedy is simple: don't initialise product_ in its declaration, and assign it in the executable part instead. Change
integer :: product_ = 1, i
to
integer :: product_ , i
...
product_ = 1
Simpler still would be not to write your own factorial function at all and to use the intrinsic product function instead, but that's another story.
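As a minimal sketch of the gotcha and of both remedies (the module, function and program names below are made up for illustration and do not appear in the question; none of the functions checks for negative x):

```fortran
module factorial_demo
  implicit none
contains

  ! Initialising product_ in its declaration gives it the SAVE attribute,
  ! so the value left by one call is the starting value of the next.
  integer function factorial_saved(x)
    integer, intent(in) :: x
    integer :: product_ = 1
    integer :: i
    do i = 2, x
      product_ = product_ * i
    end do
    factorial_saved = product_
  end function factorial_saved

  ! Assigning product_ in the executable part resets it on every call.
  integer function factorial_reset(x)
    integer, intent(in) :: x
    integer :: product_, i
    product_ = 1
    do i = 2, x
      product_ = product_ * i
    end do
    factorial_reset = product_
  end function factorial_reset

  ! Same result via the intrinsic product; for x < 2 the array
  ! constructor is empty and product returns 1.
  integer function factorial_intrinsic(x)
    integer, intent(in) :: x
    integer :: i
    factorial_intrinsic = product([(i, i = 2, x)])
  end function factorial_intrinsic

end module factorial_demo

program show_save_gotcha
  use factorial_demo
  implicit none
  integer :: first, second

  first  = factorial_saved(3)
  second = factorial_saved(3)
  print *, first, second            ! prints 6 and 36

  first  = factorial_reset(3)
  second = factorial_reset(3)
  print *, first, second            ! prints 6 and 6

  print *, factorial_intrinsic(5)   ! prints 120
end program show_save_gotcha
```

The two calls are stored in separate statements on purpose, so the order in which the saved value changes is unambiguous.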

Return an array from a function and store it in the main program

Here is the Main Program:
PROGRAM integration
EXTERNAL funct
DOUBLE PRECISION funct, a , b, sum, h
INTEGER n, i
REAL s
PARAMETER (a = 0, b = 10, n = 200)
h = (b-a)/n
sum = 0.0
DO i = 1, n
sum = sum+funct(i*h+a)
END DO
sum = h*(sum-0.5*(funct(a)+funct(b)))
PRINT *,sum
CONTAINS
END
And below is the Function funct(x)
DOUBLE PRECISION FUNCTION funct(x)
IMPLICIT NONE
DOUBLE PRECISION x
INTEGER K
Do k = 1,10
funct = x ** 2 * k
End Do
PRINT *, 'Value of funct is', funct
RETURN
END
I would like sum in the main program to hold 10 different sums, one for each value of k in the function funct(x).
I have tried the above program, but it only uses the last value of funct (k = 10) instead of producing 10 different sums.
Array results require an explicit interface. You would also need to adjust funct and sum to actually be arrays using the dimension statement. Using an explicit interface requires Fortran 90+ (thanks for the hints by @francescalus and @VladimirF) and is quite tedious:
PROGRAM integration
INTERFACE funct
FUNCTION funct(x) result(r)
IMPLICIT NONE
DOUBLE PRECISION r
DIMENSION r( 10 )
DOUBLE PRECISION x
END FUNCTION
END INTERFACE
DOUBLE PRECISION a , b, sum, h
DIMENSION sum( 10)
INTEGER n, i
PARAMETER (a = 0, b = 10, n = 200)
h = (b-a)/n
sum = 0.0
DO i = 1, n
sum = sum+funct(i*h+a)
END DO
sum = h*(sum-0.5*(funct(a)+funct(b)))
PRINT *,sum
END
FUNCTION funct(x)
IMPLICIT NONE
DOUBLE PRECISION funct
DIMENSION funct( 10)
DOUBLE PRECISION x
INTEGER K
Do k = 1,10
funct(k) = x ** 2 * k
End Do
PRINT *, 'Value of funct is', funct
RETURN
END
If you can, you should switch to a more modern standard such as Fortran 90+, and use modules. These provide explicit interfaces automatically, which makes the code much simpler.
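A minimal sketch of what the module-based variant could look like. The names integrand_m, dp, nk and total are introduced here for illustration (total replaces sum to avoid shadowing the intrinsic). The loop covers interior points only and adds half of each end point, which is the usual trapezoidal form; it agrees with the question's formula here because funct(a) is zero.

```fortran
module integrand_m
  implicit none
  integer, parameter :: dp = kind(1.0d0)   ! double precision kind
  integer, parameter :: nk = 10            ! number of k values
contains
  ! Returns x**2 * k for k = 1..nk as an array result. Because the
  ! function lives in a module, its interface is explicit automatically.
  function funct(x) result(r)
    real(dp), intent(in) :: x
    real(dp) :: r(nk)
    integer :: k
    do k = 1, nk
      r(k) = x**2 * k
    end do
  end function funct
end module integrand_m

program integration
  use integrand_m
  implicit none
  real(dp), parameter :: a = 0.0_dp, b = 10.0_dp
  integer, parameter :: n = 200
  real(dp) :: total(nk), h
  integer :: i

  h = (b - a) / n
  total = 0.0_dp
  do i = 1, n - 1                       ! interior points
    total = total + funct(a + i*h)      ! whole-array addition
  end do
  total = h * (total + 0.5_dp*(funct(a) + funct(b)))
  print *, total                        ! one integral per value of k
end program integration
```

No interface block is needed in the main program; the use statement is enough.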
Alternatively, you could take the loop over k out of the function, and perform the sum element-wise. This would be valid FORTRAN 77:
PROGRAM integration
c ...
DIMENSION sum( 10)
c ...
INTEGER K
c ...
DO i = 1, n
Do k = 1,10
sum(k)= sum(k)+funct(i*h+a, k)
End Do
END DO
c ...
Notice that I pass k to the function. It needs to be adjusted accordingly:
DOUBLE PRECISION FUNCTION funct(x,k)
IMPLICIT NONE
DOUBLE PRECISION x
INTEGER K
funct = x ** 2 * k
PRINT *, 'Value of funct is', funct
RETURN
END
This version just returns a scalar and fills the array in the main program.
Apart from that I'm not sure it is wise to use a variable called sum. There is an intrinsic function with the same name. This could lead to some confusion...

How can I get values from a function on Fortran?

This is a simple program that performs a base conversion.
I try to print the values in a loop using the statement:
write (*,'(i4,a,a)') it," = ",baseConversion(it,base)
For some reason I can't get the values using this line.
program echeverria_4
implicit none
interface
function baseConversion(anumber,abase)
character(8) :: baseConversion
integer,intent(in) :: anumber, abase
end function baseConversion
end interface
integer :: firstNumbers,base,it, numero
character(8),dimension(100) :: rangeNumbers
!Part A
write(*,*) "Project 4 Part A"
firstNumbers = 20
base = 11
write(*,'(i4,i4)') firstNumbers, base
do it = 1, firstNumbers
write (*,'(i4,a,a)') it," = ",baseConversion(it,base)
end do
end program echeverria_4
function trans(anumber)
implicit none
character :: trans
integer,intent(in):: anumber
integer :: conversor1 = 48
integer :: conversor2 = 55
if (anumber >= 10) then
trans = char(anumber+conversor2)
else
trans = char(anumber+conversor1)
endif
end function trans
function baseConversion(anumber, abase)
implicit none
interface
function trans(anumber)
character :: trans
integer,intent(in):: anumber
end function trans
end interface
character(8):: baseConversion
integer,intent(in):: anumber,abase
character(8) :: leftmost
logical :: is_process_complete = .false.
integer :: remainder,division,localNumber
localNumber = anumber
do while(.not. is_process_complete)
!Step 1: Find the remainder
remainder = mod(localNumber,abase)
!Step 2: Divide the number by the base
division = localNumber/abase
if (division>0) then
localNumber = division
leftmost=trans(remainder)//leftmost
else
is_process_complete=.true.
leftmost=trans(remainder)//leftmost
end if
end do
write(baseConversion,'(a)') leftmost
end function baseConversion
It's easier if you place your procedures (subroutines and functions) into modules and then use the module in any program or procedure that needs them. This automatically makes the interfaces explicit. You don't have to write interface blocks: less work, less chance of mistakes. So:
module MyModule
contains
function trans(anumber)
implicit none
character :: trans
integer,intent(in):: anumber
integer :: conversor1 = 48
integer :: conversor2 = 55
if (anumber >= 10) then
trans = char(anumber+conversor2)
else
trans = char(anumber+conversor1)
endif
end function trans
function baseConversion(anumber, abase)
implicit none
character(8):: baseConversion
integer,intent(in):: anumber,abase
character(8) :: leftmost
logical :: is_process_complete = .false.
integer :: remainder,division,localNumber
localNumber = anumber
do while(.not. is_process_complete)
!Step 1: Find the remainder
remainder = mod(localNumber,abase)
!Step 2: Divide the number by the base
division = localNumber/abase
if (division>0) then
localNumber = division
leftmost=trans(remainder)//leftmost
else
is_process_complete=.true.
leftmost=trans(remainder)//leftmost
end if
end do
write(baseConversion,'(a)') leftmost
end function baseConversion
end module MyModule
program echeverria_4
use MyModule
implicit none
integer :: firstNumbers,base,it, numero
character(8),dimension(100) :: rangeNumbers
!Part A
write(*,*) "Project 4 Part A"
firstNumbers = 20
base = 11
write(*,'(i4,i4)') firstNumbers, base
do it = 1, firstNumbers
write (*,'(i4,a,a)') it," = ",baseConversion(it,base)
end do
end program echeverria_4
When I compile this with gfortran with extensive error/warning options, I get the following warning messages:
test99.f90:42.51:
leftmost=trans(remainder)//leftmost
1
Warning: CHARACTER expression will be truncated in assignment (8/9) at (1)
test99.f90:45.51:
leftmost=trans(remainder)//leftmost
1
Warning: CHARACTER expression will be truncated in assignment (8/9) at (1)
Fixing those might make your program work. It is at least a first step.
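Two further things in the posted baseConversion look like likely culprits: leftmost is never given an initial value before it is concatenated, and is_process_complete is initialised in its declaration, which gives it the save attribute, so it is still .true. on every call after the first. The following is only a sketch of one way to avoid both, not the original poster's final code. It assumes a non-negative anumber and a base between 2 and 36, it left-justifies the result, and the names base_conversion_m, digit, text and pos are made up for illustration.

```fortran
module base_conversion_m
  implicit none
contains
  ! Maps 0..9 to '0'..'9' and 10..35 to 'A'..'Z'.
  function trans(anumber) result(digit)
    integer, intent(in) :: anumber
    character :: digit
    if (anumber >= 10) then
      digit = achar(anumber + 55)   ! 10 + 55 = 65 = 'A'
    else
      digit = achar(anumber + 48)   ! 0 + 48 = 48 = '0'
    end if
  end function trans

  function baseConversion(anumber, abase) result(text)
    integer, intent(in) :: anumber, abase
    character(8) :: text
    integer :: localNumber, pos

    text = ''                 ! assigned on every call, no implicit SAVE
    localNumber = anumber
    pos = len(text)
    do
      ! Fill digits from the right; one character per position, so
      ! nothing is truncated by concatenation.
      text(pos:pos) = trans(mod(localNumber, abase))
      localNumber = localNumber / abase
      pos = pos - 1
      ! Stop when done, or when the 8 characters are used up
      ! (higher digits are dropped silently in that case).
      if (localNumber == 0 .or. pos == 0) exit
    end do
    text = adjustl(text)
  end function baseConversion
end module base_conversion_m

program demo_base_conversion
  use base_conversion_m
  implicit none
  integer :: it
  do it = 1, 20
    write (*,'(i4,a,a)') it, " = ", trim(baseConversion(it, 11))
  end do
end program demo_base_conversion
```

With base 11, the loop prints A for 10 and 10 for 11.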

cuda fortran speed-up

I am trying to evaluate the speed-up of a simple cuda fortran code: increment of an array.
CPU version:
module simpleOps_m
contains
subroutine increment (a, b)
implicit none
integer , intent ( inout ) :: a(:)
integer , intent (in) :: b
integer :: i, n
n = size (a)
do i = 1, n
a(i) = a(i)+b
enddo
end subroutine increment
end module simpleOps_m
program incrementTest
use simpleOps_m
implicit none
integer , parameter :: n = 1024*1024*100
integer :: a(n), b
a = 1
b = 3
call increment (a, b)
if ( any(a /= 4)) then
write (* ,*) '**** Program Failed **** '
else
write (* ,*) 'Program Passed '
endif
end program incrementTest
GPU version:
module simpleOps_m
contains
attributes ( global ) subroutine increment (a, b)
implicit none
integer , intent ( inout ) :: a(:)
integer , value :: b
integer :: i, n
n = size (a)
do i = blockDim%x*(blockIdx%x-1) + threadIdx%x, n, blockDim%x*gridDim%x
a(i) = a(i)+b
end do
end subroutine increment
end module simpleOps_m
program incrementTest
use cudafor
use simpleOps_m
implicit none
integer , parameter :: n = 1024*1024*100
integer :: a(n), b
integer , device :: a_d(n)
integer :: tPB = 256
a = 1
b = 3
a_d = a
call increment <<< 128,tPB >>>(a_d , b)
a = a_d
if ( any(a /= 4)) then
write (* ,*) '**** Program Failed **** '
else
write (* ,*) 'Program Passed '
endif
end program incrementTest
So I compile both versions with pgf90
http://www.pgroup.com/resources/cudafortran.htm
Using "time" command to evaluate execution time, I obtain:
for CPU version
$ time (cpu executable)
real 0m0.715s
user 0m0.410s
sys 0m0.300s
for GPU version
$ time (gpu executable)
real 0m1.057s
user 0m0.710s
sys 0m0.340s
So the speed-up = (CPU execution time)/(GPU execution time) is < 1.
Is there a reason why the speed-up is not > 1, as one would expect?
Thanks in advance
The problem here is that in this rather contrived example, the cost of initialising the large array on the host (a = 1) is almost the same as the cost of the loop that increments the array contents, which is the only part of the code being parallelised on the GPU. Because the total amount of parallel work is about the same as the total amount of serial work, Amdahl's Law is heavily stacked against achieving any significant speed-up by moving just that loop to the GPU.
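To put a rough number on that, here is Amdahl's Law in the notation assumed for this note: p is the fraction of the run time that can be parallelised and s is the speed-up of that fraction alone. Taking p to be about one half, as "about the same" suggests, the overall speed-up S stays below 2 however fast the kernel is:

```latex
S = \frac{1}{(1 - p) + \dfrac{p}{s}},
\qquad
p \approx \tfrac{1}{2}
\;\Longrightarrow\;
S = \frac{2}{1 + 1/s} < 2 .
```

This bound does not yet account for the host-device copies (a_d = a and a = a_d), which the time command also counts.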
A more significant speed-up could probably be achieved by fusing the initialisation and increment operations into a single parallel operation on the GPU.
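A sketch of what such a fused version might look like, under the assumption that the initial value is the constant 1 as in the question. The names fusedOps_m, initAndIncrement and fusedTest are hypothetical, the arrays are made allocatable here for convenience, and the code has not been timed, so whether it actually beats the CPU version is left open.

```fortran
module fusedOps_m
contains
  ! Hypothetical fused kernel: each thread writes 1 + b directly, so the
  ! host neither initialises the array nor copies it to the device.
  attributes(global) subroutine initAndIncrement(a, b)
    implicit none
    integer :: a(:)
    integer, value :: b
    integer :: i, n
    n = size(a)
    ! Grid-stride loop, same indexing as the kernel in the question.
    do i = blockDim%x*(blockIdx%x - 1) + threadIdx%x, n, blockDim%x*gridDim%x
      a(i) = 1 + b
    end do
  end subroutine initAndIncrement
end module fusedOps_m

program fusedTest
  use cudafor
  use fusedOps_m
  implicit none
  integer, parameter :: n = 1024*1024*100
  integer, allocatable :: a(:)
  integer, device, allocatable :: a_d(:)
  integer :: b, tPB

  tPB = 256
  b = 3
  allocate(a(n))
  allocate(a_d(n))
  call initAndIncrement<<<128, tPB>>>(a_d, b)
  a = a_d                      ! the only host-device transfer left
  if (any(a /= 4)) then
    write (*,*) '**** Program Failed ****'
  else
    write (*,*) 'Program Passed'
  end if
end program fusedTest
```

Compared with the GPU version in the question, this removes the host-side a = 1 and the a_d = a copy; the copy back to the host remains because the check runs on the CPU.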
[This answer has been assembled from comments and added as a community wiki entry to get this question off the unanswered list for the CUDA tag]