How do I write a function func2(func1,t,y0) that receives another function func1 as an argument, where func1 is a function that returns a 1D real(kind=8), dimension(:) array?
I have the following code written in MATLAB, and I would like to write an equivalent one in modern Fortran for speed and portability. I have written one for first-order differential equations, but I am struggling with second- and higher-order equations, because the function representing the differential equations must return an array with dimension(:). I want the code to be general purpose, i.e. I want a function or subroutine to which I can pass any differential equation.
The MATLAB code is:
%---------------------------------------------------------------------------
clear all
close all
clc

t = [0:0.01:20]';
y0 = [2, 0]';
y = func_runge_kutta(@func_my_ode,t,y0);

function dy=func_my_ode(t,y)
    % Second order differential equation y'' - (1-y^2)*y'+y = 0
    dy = zeros(size(y));
    dy(1) = y(2);
    dy(2) = (1-y(1)^2)*y(2)-y(1);
end

function y = func_runge_kutta(func_my_ode,t,y0)
    y = zeros(length(t),length(y0));
    y(1,:) = y0';
    for i=1:(length(t)-1)
        h = t(i+1)-t(i);
        F_1 = func_my_ode(t(i),y(i,:)');
        F_2 = func_my_ode(t(i)+h/2,y(i,:)'+h/2*F_1);
        F_3 = func_my_ode(t(i)+h/2,y(i,:)'+h/2*F_2);
        F_4 = func_my_ode(t(i)+h,y(i,:)'+h*F_3);
        y(i+1,:) = y(i,:)+h/6*(F_1+2*F_2+2*F_3+F_4)';
    end
end
%---------------------------------------------------------------------------
If a function returns an array, its interface must be explicit in the caller. The easiest way to achieve this for a dummy argument function is to use the PROCEDURE statement to clone the interface from a function that may be used as an actual argument. Starting with your code, translating it to Fortran, and adding declarations, we get:
module everything
   use ISO_FORTRAN_ENV, only : wp => REAL64
   implicit none
contains

   function func_my_ode_1(t,y) result(dy)
      ! Second order differential equation y'' - (1-y**2)*y'+y = 0
      real(wp) t
      real(wp) y(:)
      real(wp) dy(size(y))
      dy(1) = y(2)
      dy(2) = (1-y(1)**2)*y(2)-y(1)
   end function func_my_ode_1

   function func_runge_kutta(func_my_ode,t,y0) result(y)
      procedure(func_my_ode_1) func_my_ode
      real(wp) t(:)
      real(wp) y0(:)
      real(wp) y(size(t),size(y0))
      integer i
      real(wp) h
      real(wp) F_1(size(y0)),F_2(size(y0)),F_3(size(y0)),F_4(size(y0))
      y(1,:) = y0
      do i=1,(size(t)-1)
         h = t(i+1)-t(i)
         F_1 = func_my_ode(t(i),y(i,:))
         F_2 = func_my_ode(t(i)+h/2,y(i,:)+h/2*F_1)
         F_3 = func_my_ode(t(i)+h/2,y(i,:)+h/2*F_2)
         F_4 = func_my_ode(t(i)+h,y(i,:)+h*F_3)
         y(i+1,:) = y(i,:)+h/6*(F_1+2*F_2+2*F_3+F_4)
      end do
   end function func_runge_kutta

end module everything
program main
   !clear all
   !close all
   !clc
   use everything
   implicit none
   real(wp), allocatable :: t(:)
   real(wp), allocatable :: y0(:)
   real(wp), allocatable :: y(:,:)
   integer i
   integer iunit
   t = [(0+0.01_wp*i,i=0,nint(20/0.01_wp))]
   y0 = [2, 0]
   y = func_runge_kutta(func_my_ode_1,t,y0)
   open(newunit=iunit,file='rk4.txt',status='replace')
   do i = 1,size(t)
      write(iunit,*) t(i),y(i,1)
   end do
end program main
I had MATLAB read the data file, and it produced the same plot as the original MATLAB program would have, had it plotted its results.
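As an aside, an alternative to cloning the interface with procedure(func_my_ode_1) is to describe the required shape of the right-hand-side function once in an abstract interface, so the integrator does not depend on any particular concrete function existing first. A minimal sketch (the module and the name ode_func are mine, not part of the code above):

module ode_interface
   use ISO_FORTRAN_ENV, only : wp => REAL64
   implicit none
   abstract interface
      ! Any ODE right-hand side passed to the integrator must look like this.
      function ode_func(t,y) result(dy)
         import wp
         real(wp) t
         real(wp) y(:)
         real(wp) dy(size(y))
      end function ode_func
   end interface
end module ode_interface

Inside func_runge_kutta the dummy argument would then be declared as procedure(ode_func) func_my_ode instead of procedure(func_my_ode_1) func_my_ode.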
I would like to implement a function duration = timer(n, f, arguments_of_f) that measures how much time a function f with arguments arguments_of_f needs to run n times. My attempt was the following:
function duration = timer(n, f, arguments_of_f)
    duration = 0;
    for i=1:n
        t0 = cputime;
        f(arguments_of_f);
        t1 = cputime;
        duration += t1 - t0;
    end
In another file, I have
function y = f(x)
    y = x + 1;
end
The call d1 = timer(100, @f, 3); works as expected.
In another file, I have
function y = g(x1, x2)
    y = x1 + x2;
end
but the call d2 = timer(100, @g, 1, 2); gives an error about an undefined argument x2, which, looking back, is somewhat expected, since only 1 is passed to g and 2 is never used.
So, how can I implement the function timer in Octave so that a call like timer(4, @g, x1, ... , xK) works? How can one pack the xs together?
So, I am looking for the analogue of Python's *args trick:
def use_f(f, *args):
    f(*args)
works if we define def f(x, y): return x + y and call use_f(f, 3, 4).
You don't need to pack all the arguments together; you just need to tell Octave that there is more than one argument coming and that they should all be passed along. This is very easy to do using variadic arguments.
Your original implementation is nearly spot on: the necessary change is minimal. You need to change the argument arguments_of_f to the special name varargin, which is a magical cell array containing all the arbitrary undeclared arguments, and pass it on with expansion (varargin{:}) instead of directly:
function duration = timer(n, f, varargin)
    duration = 0;
    for i=1:n
        t0 = cputime;
        f(varargin{:});
        t1 = cputime;
        duration += t1 - t0;
    end
That's it. None of the other functions need to change.
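With that one change, both calls from the question should work as written (a quick check, assuming f and g are defined in their own files as above):

d1 = timer(100, @f, 3);      % varargin is {3}
d2 = timer(100, @g, 1, 2);   % varargin is {1, 2}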
I am trying to convert this FORTRAN program (motion of a pendulum) to CUDA Fortran, but I can use only 1 block with two threads. Is there any way to use more than 2 threads?
MODULE CB
   REAL :: Q,B,W
END MODULE CB

PROGRAM PENDULUM
   USE CB
   IMPLICIT NONE
   INTEGER, PARAMETER :: N=10,L=100,M=1
   INTEGER :: I,count_rate,count_max,count(2)
   REAL :: PI,H,T,Y1,Y2,G1,G1F,G2,G2F
   REAL :: DK11,DK21,DK12,DK22,DK13,DK23,DK14,DK24
   REAL, DIMENSION (2,N) :: Y
   PI = 4.0*ATAN(1.0)
   H = 3.0*PI/L
   Q = 0.5
   B = 0.9
   W = 2.0/3.0
   Y(1,1) = 0.0
   Y(2,1) = 2.0
   DO I = 1, N-1
      T = H*I
      Y1 = Y(1,I)
      Y2 = Y(2,I)
      DK11 = H*G1F(Y1,Y2,T)
      DK21 = H*G2F(Y1,Y2,T)
      DK12 = H*G1F((Y1+DK11/2.0),(Y2+DK21/2.0),(T+H/2.0))
      DK22 = H*G2F((Y1+DK11/2.0),(Y2+DK21/2.0),(T+H/2.0))
      DK13 = H*G1F((Y1+DK12/2.0),(Y2+DK22/2.0),(T+H/2.0))
      DK23 = H*G2F((Y1+DK12/2.0),(Y2+DK22/2.0),(T+H/2.0))
      DK14 = H*G1F((Y1+DK13),(Y2+DK23),(T+H))
      DK24 = H*G2F((Y1+DK13),(Y2+DK23),(T+H))
      Y(1,I+1) = Y(1,I)+(DK11+2.0*(DK12+DK13)+DK14)/6.0
      Y(2,I+1) = Y(2,I)+(DK21+2.0*(DK22+DK23)+DK24)/6.0
      ! Bring theta back to the region [-pi,pi]
      Y(1,I+1) = Y(1,I+1)-2.0*PI*NINT(Y(1,I+1)/(2.0*PI))
   END DO
   call system_clock ( count(2), count_rate, count_max )
   WRITE (6,"(2F16.8)") (Y(1,I),Y(2,I),I=1,N,M)
END PROGRAM PENDULUM

FUNCTION G1F (Y1,Y2,T) RESULT (G1)
   USE CB
   IMPLICIT NONE
   REAL :: Y1,Y2,T,G1
   G1 = Y2
END FUNCTION G1F

FUNCTION G2F (Y1,Y2,T) RESULT (G2)
   USE CB
   IMPLICIT NONE
   REAL :: Y1,Y2,T,G2
   G2 = -Q*Y2-SIN(Y1)+B*COS(W*T)
END FUNCTION G2F
CUDA FORTRAN VERSION OF PROGRAM
MODULE KERNEL
CONTAINS

   attributes(global) subroutine mykernel(Y_d,N,L,M)
      INTEGER, value :: N,L,M
      INTEGER :: tid
      REAL :: Y_d(:,:)
      REAL :: PI,H,T,G1,G1F,G2,G2F
      REAL, shared :: DK11,DK21,DK12,DK22,DK13,DK23,DK14,DK24,Y1,Y2
      PI = 4.0*ATAN(1.0)
      H = 3.0*PI/L
      Y_d(1,1) = 0.0
      Y_d(2,1) = 2.0
      tid = threadidx%x
      DO I = 1, N-1
         T = H*I
         Y1 = Y_d(1,I)
         Y2 = Y_d(2,I)
         if (tid==1) then
            DK11 = H*G1F(Y1,Y2,T)
         else
            DK21 = H*G2F(Y1,Y2,T)
         endif
         call syncthreads ()
         if (tid==1) then
            DK12 = H*G1F((Y1+DK11/2.0),(Y2+DK21/2.0),(T+H/2.0))
         else
            DK22 = H*G2F((Y1+DK11/2.0),(Y2+DK21/2.0),(T+H/2.0))
         endif
         call syncthreads ()
         if (tid==1) then
            DK13 = H*G1F((Y1+DK12/2.0),(Y2+DK22/2.0),(T+H/2.0))
         else
            DK23 = H*G2F((Y1+DK12/2.0),(Y2+DK22/2.0),(T+H/2.0))
         endif
         call syncthreads ()
         if (tid==1) then
            DK14 = H*G1F((Y1+DK13),(Y2+DK23),(T+H))
         else
            DK24 = H*G2F((Y1+DK13),(Y2+DK23),(T+H))
         endif
         call syncthreads ()
         if (tid==1) then
            Y_d(1,I+1) = Y1+(DK11+2.0*(DK12+DK13)+DK14)/6.0
         else
            Y_d(2,I+1) = Y2+(DK21+2.0*(DK22+DK23)+DK24)/6.0
         endif
         Y_d(1,I+1) = Y_d(1,I+1)-2.0*PI*NINT(Y_d(1,I+1)/(2.0*PI))
         call syncthreads ()
      END DO
   end subroutine mykernel

   attributes(device) FUNCTION G1F (Y1,Y2,T) RESULT (G1)
      IMPLICIT NONE
      REAL :: Y1,Y2,T,G1
      G1 = Y2
   END FUNCTION G1F

   attributes(device) FUNCTION G2F (Y1,Y2,T) RESULT (G2)
      IMPLICIT NONE
      REAL :: Y1,Y2,T,G2
      G2 = -0.5*Y2-SIN(Y1)+0.9*COS((2.0/3.0)*T)
   END FUNCTION G2F

END MODULE KERNEL
PROGRAM PENDULUM
   use cudafor
   use KERNEL
   IMPLICIT NONE
   INTEGER, PARAMETER :: N=100000,L=1000,M=1
   INTEGER :: I,d,count_max,count_rate
   REAL, device :: Y_d(2,N)
   REAL, DIMENSION (2,N) :: Y
   INTEGER :: count(2)
   call mykernel<<<1,2>>>(Y_d,N,L,M)
   Y = Y_d
   WRITE (6,"(2F16.8)") (Y(1,I),Y(2,I),I=1,N,M)
END PROGRAM PENDULUM
You can see that only two independent threads of execution are possible by doing a data-dependency analysis of your original serial code. It's easiest to think of this as an "outer" and an "inner" part.
The "outer" part is the dependence of Y(1:2,i+1) on Y(1:2,i). At each time step, you need to use the values of Y(1:2,i) to calculate Y(1:2,i+1), so it's not possible to perform the calculations for multiple time steps in parallel, simply because of the serial dependence structure -- you need to know what happens at time i to calculate what happens at time i+1, you need to know what happens at time i+1 to calculate what happens at time i+2, and so on. The best that you can hope to do is to calculate Y(1,i+1) and Y(2,i+1) in parallel, which is exactly what you do.
The "inner" part is based on the dependencies between the intermediate values in the Runge-Kutta scheme, the DK11, DK12, etc. values in your code. When calculating Y(1:2,i+1), each of the DK[n,m] depends on Y(1:2,i) and for m > 1, each of the DK[n,m] depends on both DK[1,m-1] and DK[2,m-1]. If you draw a graph of these dependencies (which my ASCII art skills aren't really good enough for!), you'll see that there are at each step of the calculation only two possible sub-calculations that can be performed in parallel.
The result of all this is that you cannot do better than two parallel threads for this calculation. As one of the commenters above said, you can certainly do much better if you're simulating a particle system or some other mechanical system with multiple independent degrees of freedom, which you can then integrate in parallel.
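To make that last point concrete, here is a rough, untested sketch of that kind of parallelism, assuming the goal is an ensemble of independent pendulums, each started from a different angle, with one thread per pendulum. The names ENSEMBLE_KERNEL, many_pendulums, NP and NSTEP are illustrative, not from your code, and the right-hand sides G1F/G2F are inlined with the hard-coded Q, B, W values from your kernel:

MODULE ENSEMBLE_KERNEL
CONTAINS
   attributes(global) subroutine many_pendulums(Y_d, NP, NSTEP, H)
      ! One thread integrates one pendulum over all NSTEP time steps.
      ! Y_d(1,:,j) holds theta and Y_d(2,:,j) holds omega for pendulum j.
      INTEGER, value :: NP, NSTEP
      REAL, value :: H
      REAL :: Y_d(:,:,:)
      INTEGER :: j, I
      REAL :: PI, T, Y1, Y2
      REAL :: DK11, DK21, DK12, DK22, DK13, DK23, DK14, DK24
      PI = 4.0*ATAN(1.0)
      j = (blockidx%x - 1)*blockdim%x + threadidx%x
      if (j > NP) return
      DO I = 1, NSTEP-1
         T  = H*I
         Y1 = Y_d(1,I,j)
         Y2 = Y_d(2,I,j)
         ! RK4 stages with G1F = omega and G2F = -Q*omega - sin(theta) + B*cos(W*t) inlined
         DK11 = H*Y2
         DK21 = H*(-0.5*Y2 - SIN(Y1) + 0.9*COS((2.0/3.0)*T))
         DK12 = H*(Y2 + DK21/2.0)
         DK22 = H*(-0.5*(Y2+DK21/2.0) - SIN(Y1+DK11/2.0) + 0.9*COS((2.0/3.0)*(T+H/2.0)))
         DK13 = H*(Y2 + DK22/2.0)
         DK23 = H*(-0.5*(Y2+DK22/2.0) - SIN(Y1+DK12/2.0) + 0.9*COS((2.0/3.0)*(T+H/2.0)))
         DK14 = H*(Y2 + DK23)
         DK24 = H*(-0.5*(Y2+DK23) - SIN(Y1+DK13) + 0.9*COS((2.0/3.0)*(T+H)))
         Y_d(1,I+1,j) = Y1 + (DK11 + 2.0*(DK12+DK13) + DK14)/6.0
         Y_d(2,I+1,j) = Y2 + (DK21 + 2.0*(DK22+DK23) + DK24)/6.0
         ! Bring theta back to the region [-pi,pi]
         Y_d(1,I+1,j) = Y_d(1,I+1,j) - 2.0*PI*NINT(Y_d(1,I+1,j)/(2.0*PI))
      END DO
   end subroutine many_pendulums
END MODULE ENSEMBLE_KERNEL

PROGRAM MANY_PENDULUMS_DRIVER
   use cudafor
   use ENSEMBLE_KERNEL
   IMPLICIT NONE
   INTEGER, PARAMETER :: NP=1024, NSTEP=1000, L=100
   REAL :: PI
   REAL, device :: Y_d(2,NSTEP,NP)
   REAL :: Y(2,NSTEP,NP)
   INTEGER :: j
   PI = 4.0*ATAN(1.0)
   Y = 0.0
   DO j = 1, NP
      Y(1,1,j) = -PI + 2.0*PI*REAL(j-1)/REAL(NP)   ! each pendulum starts at a different angle
      Y(2,1,j) = 2.0
   END DO
   Y_d = Y
   call many_pendulums<<<(NP+63)/64, 64>>>(Y_d, NP, NSTEP, 3.0*PI/REAL(L))
   Y = Y_d
   WRITE (6,"(2F16.8)") Y(1,NSTEP,1), Y(2,NSTEP,1)   ! final state of the first pendulum
END PROGRAM MANY_PENDULUMS_DRIVER

Here the threads never need to communicate, so there are no syncthreads calls at all, and the launch configuration can use as many blocks and threads as there are pendulums.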
Possible Duplicate:
Passing a function as argument to another function
Below is simple code for the bisection method. I would like to know how to pass in any function I choose as a parameter instead of hard-coding the function.
% This is an implementation of the bisection method
% for a solution to f(x) = 0 over an interval [a,b] where f(a) and f(b)
% have opposite signs.
% Input: endpoints (a,b), Tolerance (TOL), Max # of iterations (No).
% Output: Value p or error message.
function bjsect(a,b,TOL,No)
    % Step 0
    if f(a)*f(b) > 0
        disp('Function fails condition of f(a),f(b) w/opposite sign');
        return
    end
    % Step 1
    i = 1;
    FA = f(a);
    % Step 2
    while i <= No
        % Step 3
        p = a + (b - a)/2;
        FP = f(p);
        % Step 4
        if FP == 0 || (b - a)/2 < TOL
            disp(p);
            return
        end
        % Step 5
        i = i + 1;
        % Step 6
        if FA*FP > 0
            a = p;
        else
            b = p;
        end
        % Step 7
        if i > No
            disp('Method failed after No iterations');
            return
        end
    end
end

% Hard coded test function
function y = f(x)
    y = x - 2*sin(x);
end
I know this is an important concept so any help is greatly appreciated.
The simplest method is using anonymous functions. In your example, you would define your anonymous function outside bjsect using:
MyAnonFunc = @(x) (x - 2 * sin(x));
You can now pass MyAnonFunc into bjsect as an argument. It has the object type of function handle, which can be validated using isa. Inside bjsect, simply use MyAnonFunc as if it were a function, i.e. MyAnonFunc(SomeInputValue).
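For instance, assuming bjsect is modified to take the handle as an extra first parameter (the signature below is illustrative, not from your posted code), the call site would look like:

% Hypothetical modified signature: function bjsect(f,a,b,TOL,No)
% Naming the parameter f means the f(a), f(b) and f(p) calls inside the
% function body work unchanged, since the argument shadows the subfunction.
MyAnonFunc = @(x) (x - 2 * sin(x));
bjsect(MyAnonFunc, 1, 2, 1e-6, 100);   % f(1) < 0 and f(2) > 0, so a root lies in [1,2]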
Note, you can of course wrap any function you've written in an anonymous function, i.e.:
MyAnonFunc2 = @(x) (SomeOtherCustomFunction(x, OtherInputArgs));
is perfectly valid.
EDIT: Oops, just realized this is almost certainly a duplicate of another question - thanks H. Muster, I'll flag it.
I am a new user of MATLAB. I want to find the value that makes f(x) = 0 using the Newton-Raphson method. I have tried to write some code, but it seems that it's difficult to implement the Newton-Raphson method. This is what I have so far:
function x = newton(x0, tolerance)
    tolerance = 1.e-10;
    format short e;
    Params = load('saved_data.mat');
    theta = pi/2;
    zeta = cos(theta);
    I = eye(Params.n,Params.n);
    Q = zeta*I-Params.p*Params.p';
    % T is a matrix(5,5)
    Mroot = Params.M.^(1/2); %optimization
    T = Mroot*Q*Mroot;
    % Find the eigenvalues
    E = real(eig(T));
    % Find the negative eigenvalues
    % Find the smallest negative eigenvalue
    gamma = min(E);
    % Now solve for lambda
    M_inv = inv(Params.M); %optimization
    zm = Params.zm;
    x = x0;
    err = (x - xPrev)/x;
    while abs(err) > tolerance
        xPrev = x;
        x = xPrev - f(xPrev)./dfdx(xPrev);
        % stop criterion: (f(x) - 0) < tolerance
        err = f(x);
    end
    % stop criterion: change of x < tolerance % err = x - xPrev;
end
The above function is used like so:
% Calculate the functions
Winv = inv(M_inv+x.*Q);
f = @(x)( zm'*M_inv*Winv*M_inv*zm);
dfdx = @(x)(-zm'*M_inv*Winv*Q*M_inv*zm);
x0 = (-1/gamma)/2;
xRoot = newton(x0,1e-10);
The question isn't particularly clear. However, do you need to implement the root finding yourself? If not, then just use MATLAB's built-in function fzero (which is not based on Newton-Raphson).
If you do need your own implementation of the Newton-Raphson method, then I suggest using one of the answers to "Newton Raphsons method in Matlab?" as your starting point.
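For example, with the function handle f and the starting guess x0 defined as in your script, the fzero route would be roughly (a sketch, not tested against your data):

xRoot = fzero(f, x0);   % fzero needs only the function handle, not its derivative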
Edit: The following isn't answering your question, but is just a note on coding style.
It is useful to split your program up into reusable chunks. In this case, your root finding should be separated from your function construction. I recommend writing your Newton-Raphson method in a separate file and calling it from the script where you define your function and its derivative. Your source would then look something like:
% Define the function (and its derivative) to perform root finding on:
Params = load('saved_data.mat');
theta = pi/2;
zeta = cos(theta);
I = eye(Params.n,Params.n);
Q = zeta*I-Params.p*Params.p';
Mroot = Params.M.^(1/2);
T = Mroot*Q*Mroot; %T is a matrix(5,5)
E = real(eig(T)); % Find the eigen-values
gamma = min(E); % Find the smallest negative eigen value
% Now solve for lambda (what is lambda?)
M_inv = inv(Params.M);
zm = Params.zm;
Winv = inv(M_inv+x.*Q);
f = @(x)( zm'*M_inv*Winv*M_inv*zm);
dfdx = @(x)(-zm'*M_inv*Winv*Q*M_inv*zm);
x0 = (-1./gamma)/2.;
xRoot = newton(f, dfdx, x0, 1e-10);
In newton.m you would have your implementation of the Newton-Raphson method, which takes as arguments the function handles you define (f and dfdx). Using your code given in the question, this would look something like
function root = newton(f, df, x0, tol)
    root = x0;      % Initial guess for the root
    MAXIT = 20;     % Maximum number of iterations
    for j = 1:MAXIT
        dx = f(root) / df(root);
        root = root - dx;
        % Stop criterion:
        if abs(dx) < tol
            return
        end
    end
    % Raise error if maximum number of iterations reached.
    error('newton: maximum number of allowed iterations exceeded.')
end
Notice that I avoided using an infinite loop.