I have many truth tables with many variables (7 or more), and I use a tool (e.g. Logic Friday 1) to simplify the logic formulas. I could do that by hand, but that is much too error-prone. I then translate these formulas to compiler intrinsics (e.g. _mm_xor_epi32), which works fine.
Question: with vpternlog I can make ternary logic operations. But I'm not aware of a method to simplify my truth-tables to sequences of vpternlog instructions that are (somewhat) efficient.
I'm not asking if someone knows a tool that simplifies to arbitrary ternary logic operations, although that would be great, I'm looking for a method to do such simplifications.
Edit: I asked a similar question on Electrical Engineering.
Outside of just leaving it to the compiler, or the hand-wavy suggestions in the 2nd section of my answer, see HJLebbink's self-answer using FPGA logic-optimization tools. (This answer ended up with the bounty because it failed to attract such an answer from anyone else; it's not really bounty-worthy. :/ I wrote it before there was a bounty, but don't have anything else useful to add.)
ICC18 optimizes chained _mm512_and/or/xor_epi32 intrinsics into vpternlogd instructions, but gcc/clang don't.
On Godbolt for this and a more complicated function using some inputs multiple times:
#include <immintrin.h>
__m512i logic(__m512i a, __m512i b, __m512i c,
__m512i d, __m512i e, __m512i f, __m512i g) {
// return _mm512_and_epi32(_mm512_and_epi32(a, b), c);
return a & b & c & d & e & f;
}
gcc -O3 -march=skylake-avx512 nightly build
logic:
vpandq zmm4, zmm4, zmm5
vpandq zmm3, zmm2, zmm3
vpandq zmm4, zmm4, zmm3
vpandq zmm0, zmm0, zmm1
vpandq zmm0, zmm4, zmm0
ret
ICC18 -O3 -march=skylake-avx512
logic:
vpternlogd zmm2, zmm0, zmm1, 128 #6.21
vpternlogd zmm4, zmm2, zmm3, 128 #6.29
vpandd zmm0, zmm4, zmm5 #6.33
ret #6.33
IDK how good it is at picking optimal solutions when each variable is used more than once in different subexpressions.
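The immediate 128 in the ICC output is simply the 8-bit truth table of a three-input AND. As a rough illustration (in Python rather than intrinsics, with the hypothetical helper names `ternlog_imm8` and `ternlog`), the sketch below derives such an immediate from a boolean function and models the instruction bit by bit, assuming the usual convention that the first operand supplies the most significant index bit:

```python
def ternlog_imm8(f):
    """Return the 8-bit truth table of f(a, b, c).

    Bit index is (a << 2) | (b << 1) | c, i.e. the first operand is the
    most significant index bit (assumed vpternlog convention).
    """
    imm = 0
    for index in range(8):
        a, b, c = (index >> 2) & 1, (index >> 1) & 1, index & 1
        if f(a, b, c) & 1:
            imm |= 1 << index
    return imm


def ternlog(a, b, c, imm8, width=8):
    """Software model of a ternary logic operation on width-bit integers."""
    result = 0
    for bit in range(width):
        index = (((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1) | ((c >> bit) & 1)
        result |= ((imm8 >> index) & 1) << bit
    return result


and3 = ternlog_imm8(lambda a, b, c: a & b & c)
xor3 = ternlog_imm8(lambda a, b, c: a ^ b ^ c)
print(and3, hex(xor3))  # prints: 128 0x96
```

A model like this is handy for checking a hand-derived immediate against the original formula before committing it to an intrinsic call.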
To see if it does a good job, you have to do the optimization yourself.
You want to find sets of 3 variables that can be combined together into a single boolean value without still needing those 3 variables anywhere else in the expression.
I think it's possible for a truth table with more than 3 inputs to not simplify down this way, to a smaller truth table where one of the columns is the result of a ternary combination of 3 of the inputs. e.g. I think it's not guaranteed that it's possible to simplify a 4 input function to vpternlog + AND, OR, or XOR.
I'd definitely worry that compilers might pick 3 inputs to combine that didn't result in as much simplification as a different choice of 3.
It might even be optimal for a compiler to start with a binary operation or two on a couple pairs to set up for a ternary operation, especially if that enables better ILP.
You could probably write a brute-force truth-table optimizer that looked for triplets of variables that could be combined to make a smaller table for just the ternary result and the rest of the table. But I'm not sure a greedy approach is guaranteed to give the best results. If there are multiple ways to combine with the same total instruction count, they're probably not all equivalent for ILP (Instruction Level Parallelism).
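A minimal sketch of the first step of such a brute-force search, under stated assumptions: the function is given as a Python callable on single bits, and the helper names `distinct_cofactors` and `absorbable_triples` are made up for this example. A triple of inputs can be replaced by one single-bit ternary result exactly when fixing the triple in all 8 ways leaves at most two distinct residual functions of the remaining inputs. The sketch only tests that condition; it does not search whole instruction sequences and knows nothing about ILP.

```python
from itertools import combinations, product


def distinct_cofactors(f, n, group):
    """Count the distinct residual functions left after fixing the inputs in `group`."""
    rest = [i for i in range(n) if i not in group]
    columns = set()
    for g_bits in product((0, 1), repeat=len(group)):
        column = []
        for r_bits in product((0, 1), repeat=len(rest)):
            x = [0] * n
            for i, bit in zip(group, g_bits):
                x[i] = bit
            for i, bit in zip(rest, r_bits):
                x[i] = bit
            column.append(f(*x))
        columns.add(tuple(column))
    return len(columns)


def absorbable_triples(f, n):
    """Triples of input indices that one ternary operation can collapse to one bit."""
    return [t for t in combinations(range(n), 3)
            if distinct_cofactors(f, n, t) <= 2]


and4 = lambda a, b, c, d: a & b & c & d
two_pairs = lambda a, b, c, d: (a & b) | (c & d)

print(absorbable_triples(and4, 4))       # every triple qualifies
print(absorbable_triples(two_pairs, 4))  # no triple qualifies
```

For `(a & b) | (c & d)` no triple qualifies, even though one binary AND followed by a single ternary operation on `(a & b, c, d)` covers it. That is the kind of case where starting with a binary operation pays off.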
How to translate a truth table to a sequence of vpternlog instructions.
Translate the truth table into a logical formula; use e.g., Logic Friday.
Store the logical formula in Synopsys equation format (.eqn). E.g., I used a network with 6 input nodes A to F, two output nodes F0 and F1, and a somewhat complicated (non unate) Boolean function.
Content of BF_Q6.eqn:
INORDER = A B C D E F;
OUTORDER = F0 F1;
F0 = (!A*!B*!C*!D*!E*F) + (!A*!B*!C*!D*E*!F) + (!A*!B*!C*D*!E*!F) + (!A*!B*C*!D*!E*!F) + (!A*B*!C*!D*!E*!F) + (A*!B*!C*!D*!E*!F);
F1 = (!A*!B*!C*!D*E) + (!A*!B*!C*D*!E) + (!A*!B*C*!D*!E) + (!A*B*!C*!D*!E) + (A*!B*!C*!D*!E);
Use "ABC: A System for Sequential Synthesis and Verification" from the Berkeley Verification and Synthesis Research Center. I used the windows version. Get ABC here.
In ABC I run:
abc 01> read_eqn BF_Q6.eqn
abc 02> choice; if -K 3; ps
abc 03> lutpack -N 3 -S 3; ps
abc 04> show
abc 05> write_bench BF_Q6.bench
You may need to run choice; if -K 3; ps multiple times to get better results.
The resulting BF_Q6.bench contains the 3-LUTs for an FPGA:
INPUT(A)
INPUT(B)
INPUT(C)
INPUT(D)
INPUT(E)
INPUT(F)
OUTPUT(F0)
OUTPUT(F1)
n11 = LUT 0x01 ( B, C, D )
n12 = LUT 0x1 ( A, E )
n14 = LUT 0x9 ( A, E )
n16 = LUT 0xe9 ( B, C, D )
n18 = LUT 0x2 ( n11, n14 )
F1 = LUT 0xae ( n18, n12, n16 )
n21 = LUT 0xd9 ( F, n11, n14 )
n22 = LUT 0xd9 ( F, n12, n16 )
F0 = LUT 0x95 ( F, n21, n22 )
This can be translated to C++ mechanically:
// `not` is a reserved alternative token in standard C++, so the helper is named bit_not.
__m512i bit_not(__m512i a) { return _mm512_xor_si512(a, _mm512_set1_epi32(-1)); }
// Two-input LUT: bit (2*a + b) of the 4-bit table i selects the result.
__m512i binary(__m512i a, __m512i b, int i) {
    switch (i)
    {
    case 0: return _mm512_setzero_si512();
    case 1: return bit_not(_mm512_or_si512(a, b));
    case 2: return _mm512_andnot_si512(a, b);
    case 3: return bit_not(a);
    case 4: return _mm512_andnot_si512(b, a);
    case 5: return bit_not(b);
    case 6: return _mm512_xor_si512(a, b);
    case 7: return bit_not(_mm512_and_si512(a, b));
    case 8: return _mm512_and_si512(a, b);
    case 9: return bit_not(_mm512_xor_si512(a, b));
    case 10: return b;
    case 11: return _mm512_or_si512(bit_not(a), b);
    case 12: return a;
    case 13: return _mm512_or_si512(a, bit_not(b));
    case 14: return _mm512_or_si512(a, b);
    case 15: return _mm512_set1_epi32(-1);
    default: return _mm512_setzero_si512();
    }
}
void BF_Q6(const __m512i A, const __m512i B, const __m512i C, const __m512i D, const __m512i E, const __m512i F, __m512i& F0, __m512i& F1) {
const auto n11 = _mm512_ternarylogic_epi64(B, C, D, 0x01);
const auto n12 = binary(A, E, 0x1);
const auto n14 = binary(A, E, 0x9);
const auto n16 = _mm512_ternarylogic_epi64(B, C, D, 0xe9);
const auto n18 = binary(n11, n14, 0x2);
F1 = _mm512_ternarylogic_epi64(n18, n12, n16, 0xae);
const auto n21 = _mm512_ternarylogic_epi64(F, n11, n14, 0xd9);
const auto n22 = _mm512_ternarylogic_epi64(F, n12, n16, 0xd9);
F0 = _mm512_ternarylogic_epi64(F, n21, n22, 0x95);
}
The question remains whether the resulting C++ code is optimal. I don't think that this method (often) yields the smallest networks of 3-LUTs, simply because this problem is NP-hard. Additionally, it is not possible to inform ABC about instruction parallelism, and it is not possible to prioritize the order of variables such that variables that will be used later are not in the first position of the LUT (since the first source operand is overwritten with the result). But the compiler may be smart enough to do such optimizations.
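Before worrying about optimality, it is worth checking that the LUT network still computes the original equations. The sketch below is an illustration in Python, not part of ABC: it evaluates the nine LUTs from BF_Q6.bench for all 64 inputs and compares them with F0 and F1. It rests on one labeled assumption, namely that the first input listed for a LUT is the least significant bit of the table index; the helper names `lut`, `network` and `reference` are made up for this example.

```python
from itertools import product


def lut(table, *inputs):
    """Look up one bit of `table`.

    ASSUMPTION: the first listed input is the least significant index bit.
    """
    index = sum(bit << pos for pos, bit in enumerate(inputs))
    return (table >> index) & 1


def network(A, B, C, D, E, F):
    """The LUT network exactly as listed in BF_Q6.bench."""
    n11 = lut(0x01, B, C, D)
    n12 = lut(0x1, A, E)
    n14 = lut(0x9, A, E)
    n16 = lut(0xE9, B, C, D)
    n18 = lut(0x2, n11, n14)
    F1 = lut(0xAE, n18, n12, n16)
    n21 = lut(0xD9, F, n11, n14)
    n22 = lut(0xD9, F, n12, n16)
    F0 = lut(0x95, F, n21, n22)
    return F0, F1


def reference(A, B, C, D, E, F):
    """BF_Q6.eqn: every product term sets exactly one variable (one-hot tests)."""
    f0 = int(A + B + C + D + E + F == 1)
    f1 = int(A + B + C + D + E == 1)
    return f0, f1


mismatches = [bits for bits in product((0, 1), repeat=6)
              if network(*bits) != reference(*bits)]
print(len(mismatches))  # prints: 0
```

Under that assumption a LUT written as `(x0, x1, x2)` lines up with `_mm512_ternarylogic_epi64(x2, x1, x0, imm)`, because the intrinsic uses its first operand as the most significant index bit. The operand order in any mechanical translation therefore deserves the same exhaustive check.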
ICC18 gives the following assembly:
00007FF75DCE1340 sub rsp,78h
00007FF75DCE1344 vmovups xmmword ptr [rsp+40h],xmm15
00007FF75DCE134A vmovups zmm2,zmmword ptr [r9]
00007FF75DCE1350 vmovups zmm1,zmmword ptr [r8]
00007FF75DCE1356 vmovups zmm5,zmmword ptr [rdx]
00007FF75DCE135C vmovups zmm4,zmmword ptr [rcx]
00007FF75DCE1362 vpternlogd zmm15, zmm15, zmm15, 0FFh
00007FF75DCE1369 vpternlogq zmm5, zmm1, zmm2, 0E9h
00007FF75DCE1370 vmovaps zmm3, zmm2
00007FF75DCE1376 mov rax, qword ptr[&E]
00007FF75DCE137E vpternlogq zmm3, zmm1, zmmword ptr[rdx], 1
00007FF75DCE1385 mov r11, qword ptr[&F]
00007FF75DCE138D mov rcx, qword ptr[F0]
00007FF75DCE1395 mov r10, qword ptr[F1]
00007FF75DCE139D vpord zmm0, zmm4, zmmword ptr[rax]
00007FF75DCE13A3 vpxord zmm4, zmm4, zmmword ptr[rax]
00007FF75DCE13A9 vpxord zmm0, zmm0, zmm15
00007FF75DCE13AF vpxord zmm15, zmm4, zmm15
00007FF75DCE13B5 vpandnd zmm1, zmm3, zmm15
00007FF75DCE13BB vpternlogq zmm1, zmm0, zmm5, 0AEh
00007FF75DCE13C2 vpternlogq zmm15, zmm3, zmmword ptr[r11], 0CBh
^^^^^ ^^^^^^^^^^^^^^^^
00007FF75DCE13C9 vpternlogq zmm5, zmm0, zmmword ptr[r11], 0CBh
00007FF75DCE13D0 vmovups zmmword ptr[r10], zmm1
00007FF75DCE13D6 vpternlogq zmm5, zmm15, zmmword ptr[r11], 87h
00007FF75DCE13DD vmovups zmmword ptr [rcx],zmm5
00007FF75DCE13E3 vzeroupper
00007FF75DCE13E6 vmovups xmm15,xmmword ptr [rsp+40h]
00007FF75DCE13EC add rsp,78h
00007FF75DCE13F0 ret
ICC18 is able to change the variable ordering in const auto n22 = _mm512_ternarylogic_epi64(F, n12, n16, 0xd9); to vpternlogq zmm15, zmm3, zmmword ptr[r11], 0CBh such that variable F is not overwritten. (But strangely enough fetched from memory 3 times...)
I am working on a toy language which is in need of some compilation now. So far, I have nested function calls, and I am wondering what the assembly-style pseudocode would look like. This includes:
Function prologue and epilogue.
Instruction "blocks".
How functions are reused.
Not real assembly code, just pseudocode (though I am sort of coming from x86).
The x86 "block" structure I'm referring to seems to be this:
my_function:
enter N, 0
; ... details
leave
ret
To keep it simple (avoiding async and all that), my toy code looks like this basically:
my_func_a(add(1, my_func_z(2, 3)), sub(my_func_r(4, 5, 6), 7), 8)
Or spread out:
my_func_a(
add(
1,
my_func_z(2, 3)
),
sub(
my_func_r(
4,
5,
6
),
7
),
8
)
When I try to translate this mentally into some sort of corresponding pseudo-assembly, I end up doing this:
u = my_func_r(4, 5, 6)
v = sub(u, 7)
w = my_func_z(2, 3)
x = add(1, w)
y = my_func_a(x, v, 8) ; final result
I'm not sure if I got the order exactly correct, but it's close I think. But this isn't quite low-level enough as x86, so I try lowering it further:
mov r1, 4
mov r2, 5
mov r3, 6
call my_func_r
; how to capture the "u = ..."?
mov r1, u?
mov r2, 7
call sub
; how about capturing "v"?
mov r1, 2
mov r2, 3
call my_func_z
; w = ... somehow
mov r1, 1
mov r2, w
call add
; x = ... somehow
mov r1, x
mov r2, v
mov r3, 8
call my_func_a
; y...
I would try and use some sort of tool to compile some rough C-like language into LLVM, for example, but it's way over my head to get that all going and working at this stage.
Now let's say we have implementations for some of these functions:
my_func_r(a,b,c):
enter ...
add a, b
sub b, c
leave
ret
my_func_z(a,b):
enter ...
; ...
leave
ret
my_func_a(a,b):
enter ...
; ...
leave
ret
Basically what I'm wondering is, how should the final pseudocode be written? (given I haven't exactly specified here how everything works, just got it at a rough level at this point). What would it roughly look like? I don't see how the values are properly placed and passed around in the lowest-level Assembly version.
Where are the arguments passed into the functions? At what place, before the function call, or inside the function/block?
How do you capture the output variables outside of the function?
Sorry if this pseudocode is error-prone; I am just beginning to put the pieces together, so please be gentle.
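One way to see how the intermediate results get captured is to flatten the nested call into one call per line, each storing its result in a fresh temporary. The sketch below is only an illustration in Python: the tuple encoding of the expression, the helper name `flatten`, and the temporaries `t0`, `t1`, ... are assumptions made for this example, not part of any real toolchain. In a real calling convention the result typically comes back in a designated return register and is moved to a temporary (another register or a stack slot) before the next call overwrites it.

```python
def flatten(expr, out):
    """Emit one call per line in evaluation order; return the name holding the value."""
    if not isinstance(expr, tuple):
        return str(expr)  # literal argument
    name, *args = expr
    # Arguments are evaluated left to right before the call itself.
    operands = [flatten(arg, out) for arg in args]
    temp = f"t{len(out)}"
    out.append(f"{temp} = {name}({', '.join(operands)})")
    return temp


# my_func_a(add(1, my_func_z(2, 3)), sub(my_func_r(4, 5, 6), 7), 8)
expression = ("my_func_a",
              ("add", 1, ("my_func_z", 2, 3)),
              ("sub", ("my_func_r", 4, 5, 6), 7),
              8)

lines = []
flatten(expression, lines)
print("\n".join(lines))
```

Each emitted line then maps to the same small pattern: move the operands into the argument locations, call, and copy the returned value into the temporary.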
I have a problem with a Fortran project and figured maybe you could help me.
I'm using Code::Blocks as my IDE, which lets you create projects, so I created a project with two files in it: a main program and a function (I don't know what else to use; maybe I could use something other than a function).
My function reads values from a .txt file and saves them as real numbers, and everything works well. What I want to do is call this function from the main file and store the data it collected in main, so that main stays cleaner.
How would I do that? I can post the whole script if you want, but I don't think it would add that much more.
EDIT: As you asked, here it is (uncut):
program main
! Variables
real :: d1, r1, r2, a, teta, freq, Dt, mu, g0, r_t, height, r, omega, H, lx, ly, lz, m_c0, &
Jx, Jy, Jz, gmax, I_s, K, Jx0, Jy0, Jz0, Vmin, tsp_0, Fmax, Isp, c1, n, F, DV, tfin, cont, Tmax
real :: data_input
! Call the function
data_input=data_module(d1, r1, r2, a, teta, freq, Dt, mu, g0, r_t, height, r, omega, H, lx, ly, lz, m_c0, &
Jx, Jy, Jz, gmax, I_s, K, Jx0, Jy0, Jz0, Vmin, tsp_0, Fmax, Isp, c1, n, F, DV, tfin, cont, Tmax)
! Error
if (data_input/=1) then
print*, 'ERROR: data_module did not work'
end if
!Just to show it
print*,'After'
print*, d1, r1, r2, a, teta, freq, Dt, mu, g0, r_t, height, r, omega, H, lx, ly, lz, m_c0, &
Jx, Jy, Jz, gmax, I_s, K, Jx0, Jy0, Jz0, Vmin, tsp_0, Fmax, Isp, c1, n, F, DV, tfin, cont, Tmax
end program main
real function data_module ()
! Variables
implicit none
integer :: flag_read=0, w_int, d_int
real:: coefficient, d1, r1, r2, a, teta, freq, Dt, mu, g0, r_t, height, r, omega, H, lx, ly, lz, m_c0, &
Jx, Jy, Jz, gmax, I_s, K, Jx0, Jy0, Jz0, Vmin, tsp_0, Fmax, Isp, c1, n, F, DV, tfin, cont, Tmax
character (LEN=35) :: starting_string, name*15, coefficient_string*20, w_string, d_string, number_format
character :: w*2, d
! Open file
open (11, file = 'Data.txt', status = 'old', access = 'sequential', form = 'formatted')
! Read a new line for every iteration
sentence_reader: do while (flag_read==0)
read (11, fmt='(A)', iostat = flag_read) starting_string
! Error
if (flag_read>0)then
print*, 'ERROR: could not read data'
stop
end if
! Skip useless lines
if (starting_string(1:1)=='%' .OR. starting_string(1:1)==' ') then
cycle
end if
! Exit when you're done
if (flag_read<0)then
exit sentence_reader
end if
! Just stuff to prepare it
name=trim(starting_string(1:index(starting_string, '=')-1))
coefficient_string=trim(adjustl(starting_string(index(starting_string, '=')+1:index(starting_string,';')-1)))
if (scan(coefficient_string,'E')/=0) then
w_string=coefficient_string
w_int=len_trim(w_string)
write(w, '(BN,I2)') w_int
d_string=coefficient_string(index(coefficient_string, '.')+1:index(coefficient_string, 'E')-1)
d_int=len_trim(d_string)
write(d, '(BN,I1)') d_int
!All together
number_format='(BN,F' // trim(w) // '.' // d // ')'
else
w_string=coefficient_string
w_int=len_trim(w_string)
write(w, '(BN,I1)') w_int
d_string=coefficient_string(index(coefficient_string, '.')+1:len_trim(coefficient_string))
d_int=len_trim(d_string)
write(d, '(BN,I1)') d_int
number_format='(BN,F' // trim(w) // '.' // d // ')'
end if
! Read the number
read(coefficient_string,number_format) coefficient
! Save where it's needed (is there an easier way to do it?)
select case (name)
case ('d1')
d1=coefficient
case ('r1')
r1=coefficient
case ('r2')
r2=coefficient
case ('a')
exit
case ('teta')
exit
case ('freq')
freq=coefficient
case ('Dt')
exit
case ('mu')
mu=coefficient
case ('g0')
g0=coefficient
case ('r_t')
r_t=coefficient
case ('height')
height=coefficient
case ('lx')
lx=coefficient
case ('ly')
ly=coefficient
case ('lz')
lz=coefficient
case ('m_c0')
m_c0=coefficient
case ('Jx')
Jx=coefficient
case ('Jy')
Jy=coefficient
case ('Jz')
Jz=coefficient
case ('gmax')
gmax=coefficient
case ('I_s')
I_s=coefficient
case ('K')
K=coefficient
case ('Vmin')
Vmin=coefficient
case ('tsp_0')
tsp_0=coefficient
case ('Fmax')
Fmax=coefficient
case ('Isp')
Isp=coefficient
case ('n')
n=coefficient
case ('tfin')
tfin=coefficient
case ('cont')
cont=coefficient
case ('Tmax')
Tmax=coefficient
case default
print*, 'Variable ', name, ' is not recognized'
end select
end do sentence_reader
! Other stuff I need
teta=atan((r1 - r2)/d1)
a=sqrt(d1**2 + (r1 - r2)**2)
Dt=1/freq
r=r_t + height
omega=(mu/(r**3))**0.5
H=(r*mu)**0.5
Jx0=Jx - I_s
Jy0=Jy - I_s
Jz0=Jz - I_s
c1=Isp*g0
F=n*Fmax
DV=(F/m_c0)*tsp_0
! Shows that the function is correctly executed
data_module=1
print*,'Before'
print*, d1, r1, r2, a, teta, freq, Dt, mu, g0, r_t, height, r, omega, H, lx, ly, lz, m_c0, &
Jx, Jy, Jz, gmax, I_s, K, Jx0, Jy0, Jz0, Vmin, tsp_0, Fmax, Isp, c1, n, F, DV, tfin, cont, Tmax
end function data_module
PS. I know about modules, but with open and all the other stuff I couldn't get them to work. I would love to.
What I want to do is pass the data (d1, r1, etc.) that I collected in data_module to main and save them there, but done this way the values are not saved: if you run it, printing them "before" shows everything is fine, while printing them "after" gives all zeros.
Okay, there are a few things I notice.
Your function is of type real, but you set it only to 1 (an integer), to, as you put it in the comment "Show that the function is correctly executed".
It's not uncommon to make a procedure return a value to show whether it executed correctly or not, but it's usually an error code, with zero meaning that no error occurred and everything went fine.
Also, you might want to declare the function as integer instead of real, as integers are better for that kind of thing. (More reliable to compare.)
As to your actual question: If you want to pass more than a single value back to the calling routine, you would want to declare intent(out) dummy variables. See this example:
integer function test_output(outdata)
integer, intent(out) :: outdata(10)
integer :: i
outdata = (/(i, i=1, 10)/)
! All worked well
test_output = 0
return
end function test_output
Modules are the way to go. Here is a very limited example on how to incorporate the function above into a module, and using that module in a program:
module mod_test
implicit none
! Here you can place variables that should be available
! to any procedure using this module
contains
! Here you can place all the procedures (functions and
! subroutines)
integer function test_output(outdata)
integer, intent(out) :: outdata(10)
integer :: i
outdata = (/(i, i=1, 10)/)
! All worked well
test_output = 0
return
end function test_output
end module mod_test
program test
! The 'USE' statement is the only thing that needs to be
! *ahead* of the 'implicit none'
use mod_test
implicit none
integer :: mydata(10) ! The variable that will contain the data
! from the function
integer :: status ! The variable that will contain the error
! code.
status = test_output(mydata)
if (status == 0) then
print*, mydata
end if
end program test
If the module is in a different source file, you need to compile them this way (assuming that you use gfortran):
$ gfortran -c -o mod_test.o mod_test.f90
$ gfortran -c -o test.o test.f90
$ gfortran -o test test.o mod_test.o
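As for the comment in the question asking whether there is an easier way than one case per variable: a language-neutral way to think about it is a single routine that parses every `name = value;` line and hands the whole collection back to the caller. The sketch below shows that idea in Python rather than Fortran. The helper name `read_parameters` and the sample input are assumptions; the input format (lines starting with `%` or a blank are skipped, values end at `;`) is inferred from the parsing code in the question.

```python
import io


def read_parameters(stream):
    """Parse 'name = value;' lines into a dict, skipping '%' and blank-led lines."""
    params = {}
    for line in stream:
        if not line.strip() or line[0] in "% ":
            continue
        name, _, rest = line.partition("=")
        value = rest.split(";", 1)[0]
        params[name.strip()] = float(value)
    return params


# Hypothetical sample input in the assumed format.
sample = io.StringIO("% geometry\nd1 = 2.5;\nr1 = 1.0E1;\nr2=0.5;\n")
params = read_parameters(sample)
print(params)
```

The caller receives all values at once, which is the same effect the intent(out) dummy arguments achieve in the Fortran answer.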
I have a simple problem but I cannot find a solution anywhere.
I have to integrate a function (for example using a Simpson's rule subroutine), but I need to pass more than one argument to my function: one is the variable I want to integrate over, and the other is just a value coming from a different calculation which I cannot perform inside the function.
The problem is that the Simpson subroutine only accepts f(x) to perform the integral, not f(x,y).
After Vladimir's suggestions I modified the code.
Below the example:
Program main2
!------------------------------------------------------------------
! Integration of a function using Simpson rule
! with doubling number of intervals
!------------------------------------------------------------------
! to compile:
! gfortran main2.f90 -o simp2
implicit none
double precision r, rb, rmin, rmax, rstep, integral, eps
double precision F_int
integer nint, i, rbins
double precision t
rbins = 4
rmin = 0.0
rmax = 4.0
rstep = (rmax-rmin)/rbins
rb = rmin
eps = 1.0e-8
func = 0.0
t=2.0
do i=1,rbins
call func(rb,t,res)
write(*,*)'r, f(rb) (in main) = ', rb, res
!test = F_int(rb)
!write(*,*)'test F_int (in loop) = ', test
call simpson2(F_int(rb),rmin,rb,eps,integral,nint)
write(*,*)'r, integral = ', rb, integral
rb = rb+rstep
end do
end program main2
subroutine func(x,y,res)
!----------------------------------------
! Real Function
!----------------------------------------
implicit none
double precision res
double precision, intent(in) :: x
double precision y
res = 2.0*x + y
write(*,*)'f(x,y) (in func) = ',res
return
end subroutine func
function F_int(x)
!Function to integrate
implicit none
double precision F_int, res
double precision, intent(in) :: x
double precision y
call func(x,y,res)
F_int = res
end function F_int
Subroutine simpson2(f,a,b,eps,integral,nint)
!==========================================================
! Integration of f(x) on [a,b]
! Method: Simpson rule with doubling number of intervals
! till error = coeff*|I_n - I_2n| < eps
! written by: Alex Godunov (October 2009)
!----------------------------------------------------------
! IN:
! f - Function to integrate (supplied by a user)
! a - Lower limit of integration
! b - Upper limit of integration
! eps - tolerance
! OUT:
! integral - Result of integration
! nint - number of intervals to achieve accuracy
!==========================================================
implicit none
double precision f, a, b, eps, integral
double precision sn, s2n, h, x
integer nint
double precision, parameter :: coeff = 1.0/15.0 ! error estimate coeff
integer, parameter :: nmax=1048576 ! max number of intervals
integer n, i
! evaluate integral for 2 intervals (three points)
h = (b-a)/2.0
sn = (1.0/3.0)*h*(f(a)+4.0*f(a+h)+f(b))
write(*,*)'a, b, h, sn (in simp) = ', a, b, h, sn
! loop over number of intervals (starting from 4 intervals)
n=4
do while (n <= nmax)
s2n = 0.0
h = (b-a)/dfloat(n)
do i=2, n-2, 2
x = a+dfloat(i)*h
s2n = s2n + 2.0*f(x) + 4.0*f(x+h)
end do
s2n = (s2n + f(a) + f(b) + 4.0*f(a+h))*h/3.0
if(coeff*abs(s2n-sn) <= eps) then
integral = s2n + coeff*(s2n-sn)
nint = n
exit
end if
sn = s2n
n = n*2
end do
return
end subroutine simpson2
I think I'm pretty close to the solution but I cannot figure it out...
If I call simpson2(F_int, ..) without putting the argument in F_int I receive this message:
call simpson2(F_int,rmin,rb,eps,integral,nint)
1
Warning: Expected a procedure for argument 'f' at (1)
Any help?
Thanks in advance!
Now you have a code we can work with, good job!
You need to tell the compiler, that F_int is a function. That can be done by
external F_int
but it is much better to learn Fortran 90 and use modules or at least interface blocks.
module my_functions
implicit none
contains
subroutine func(x,y,res)
!----------------------------------------
! Real Function
!----------------------------------------
implicit none
double precision res
double precision, intent(in) :: x
double precision y
res = 2.0*x + y
write(*,*)'f(x,y) (in func) = ',res
return
end subroutine func
function F_int(x)
!Function to integrate
implicit none
double precision F_int, res
double precision, intent(in) :: x
double precision y
call func(x,y,res)
F_int = res
end function F_int
end module
Now you can easily use the module and integrate the function
use my_functions
call simpson2(F_int,rmin,rb,eps,integral,nint)
But you will find that F_int still does not know what y is! It has its own y with an undefined value! You should put y into the module instead so that everyone can see it.
module my_functions
implicit none
double precision :: y
contains
Don't forget to remove all other declarations of y! Both in function F_int and in the main program. Probably it is also better to call it differently.
Don't forget to set the value of y somewhere inside your main loop!
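The underlying idea, fixing the extra argument so the integrator only ever sees a function of x, can be sketched outside Fortran as well. The Python example below is an illustration under assumptions: `simpson` is a simplified stand-in for simpson2 (composite Simpson rule with interval doubling), not a translation of it, and the extra argument is bound with `functools.partial` where the Fortran answer uses a module variable.

```python
from functools import partial


def func(x, y):
    """Two-argument function from the question: f(x, y) = 2x + y."""
    return 2.0 * x + y


def simpson(f, a, b, eps=1e-8, nmax=1 << 20):
    """Composite Simpson rule, doubling the interval count until converged."""
    def composite(n):
        h = (b - a) / n
        odd = sum(f(a + i * h) for i in range(1, n, 2))
        even = sum(f(a + i * h) for i in range(2, n, 2))
        return (f(a) + f(b) + 4.0 * odd + 2.0 * even) * h / 3.0

    n = 2
    previous = composite(n)
    while n < nmax:
        n *= 2
        current = composite(n)
        if abs(current - previous) / 15.0 <= eps:
            return current + (current - previous) / 15.0
        previous = current
    raise RuntimeError("Simpson rule did not converge")


# Bind y once; the integrator only sees a one-argument function.
f_int = partial(func, y=2.0)
print(simpson(f_int, 0.0, 4.0))
```

With y = 2 the exact integral of 2x + y over [0, 4] is 24, which makes a convenient sanity check.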
I'm stuck in a process where I need to compute the values of a function f[x,y,z] on a grid. Here I put how I wrote the program, only evaluating on a one-dimensional grid.
I wrote the program:
program CHISQUARE_MINIMIZATION_VELOCITY_PROFILES
use distribution
IMPLICIT none
integer, parameter :: kp=1001 ! Parameter which states the number of points on the grid.
integer, parameter :: ndata=13 ! Parameter which states the number of elements of the data file.
integer, parameter :: nconst=3 ! Fixed integer parameter.
integer i, j, n
real*8 rc0, rcf, V00, V0f, d00, d0f, rc, V0, d, z
real*8 rcr(kp), V0r(kp), d0r(kp), chisq(kp)
!Scaling radius range
rc0=0.0d-5 ! kpc
rcf=1.0d2 ! kpc
call linspace(rc0,rcf,kp,rcr)
!**************If I call like this, it works normal*****************
!CHISQUARED(1.3d0, 130.2d0, 0.12d0, 1.0d0, 1.0d0, 2.0d0, 0.0d0, 0.0d0, 1, !ndata, nconst)
! **1.27000000000000 0.745818846396887**
! Press any key to continue
!**************If I call like this, it works normal*****************
!******* Here is where my problem is****************
do j=1, kp
rc=rcr(j)
write(*,*) rc, CHISQUARED(rc, 130.2d0, 0.12d0, 1.0d0, 1.0d0, 2.0d0, 0.0d0, 0.0d0, 1, ndata, nconst)
enddo
!******* Here is where my problem is****************
end program CHISQUARE_MINIMIZATION_VELOCITY_PROFILES
I use the module where I compute the chi^2 distribution, coming from a theoretical model...
MODULE distribution
IMPLICIT NONE
CONTAINS
! I define here the chi^2 function****
real*8 function CHISQUARED(rc, V0, d, alpha, gamma, chi, a, b, n, ndata, nconst)
integer i, n, ndata, nconst
real*8 rc, V0, d
real*8 alpha, gamma, chi, a, b, s
real*8, DIMENSION(ndata,3) :: X
open(unit=1, file="data.txt")
s=0.0d0
do i=1, ndata
Read(1,*) X(i,:)
s=s+((X(i,2)-VELOCITYPROFILE(X(i,1), rc, V0, d, alpha, gamma, chi, a, b, n))/(X(i,3)))**2.0d0
end do
CHISQUARED=s/(ndata-nconst)
end function CHISQUARED
!****Here I define the model function
real*8 function VELOCITYPROFILE(r, rc, V0, d, alpha, gamma, chi, a, b, n)
integer i, n
real*8 r, rc, V0, d, alpha, gamma, chi, a, b, z
if (rc < 0.0d0 .OR. d < 0.0d0 .OR. a <0.0d0 .OR. b <0.0d0 .OR. alpha < 0.0d0 .OR. gamma <0.0d0 .OR. chi < 0.0d0 .OR. n<1 ) then
VELOCITYPROFILE=0.0d0
return
else
z=0.0d0
do i=0,n
z=z+((V0*((r/rc)**(1.5d0))*(1+a+r/rc)**(-gamma*(2*n+0.5d0)))/((a+(r/rc)**alpha)**(chi/2.0d0)))*(((b+r/rc)**gamma)/d)**i
end do
VELOCITYPROFILE=z
end if
end function VELOCITYPROFILE
END MODULE distribution
!*****************END OF THE MODULE******************************
the data.txt file is of the form
0.24 37.31 6.15
0.28 37.92 5.5
0.46 47.12 3.9
0.64 53.48 2.8
0.73 55.14 3.3
0.82 58.47 2.5
1.08 66.15 3.3
1.22 69.39 2.75
1.45 74.55 5.
1.71 77.94 2.93
1.87 81.66 2.5
2.2 86.81 3.02
2.28 90.08 2.1
2.69 94.38 3.92
2.7 95.36 1.8
In order to get several values of the function CHISQUARED, I use the subroutine linspace to generate the partition of the 1-dimensional grid
subroutine linspace(xi,xf,jmax,y)
integer jmax,j
real*8 xi,xf,y(jmax)
y=(/(xi+dble(j-1)*(xf-xi)/(dble(jmax)-1.0d0), j=1, jmax)/)
end subroutine linspace
What happens is that if in the main program, I call the function CHISQUARED like this:
CHISQUARED(1.3d0, 130.2d0, 0.12d0, 1.0d0, 1.0d0, 2.0d0, 0.0d0, 0.0d0, 1, ndata, nconst)
**1.27000000000000 0.745818846396887**
Press any key to continue
I get some finite value, like, I don't know, 0.7 or something like this. (I restricted the data file so the result won't be the one written, I just put 0.7 as an example). However, when I put it inside a loop as it is in the program written above, to get the values on the one dimensional grid, it gives me the error
**0.000000000000000E+000 NaN**
forrtl: severe (24): end-of-file during read, unit 1, file C:\Users\Ernesto Lopez Fune\Desktop\Minimize\newone\chisquarerotationcurve\data.txt
Image PC Routine Line Source
chisquarerotation 0040B889 Unknown Unknown Unknown
Press any key to continue
Can anyone recommend me what to do in this case? How to overcome this barrier?
According to your error, you reach the end of your file.
When you call your function once, that's fine, but in a loop the file is read repeatedly. The first call reads the file up to the end; on the next call the read position is still at the end of the file, so there is nothing left to read and the READ fails with an end-of-file error.
You need to use the REWIND(1) statement before end function CHISQUARED. With this, the cursor will be re-positioned at the beginning of the file. Besides, I think it would be better to OPEN your file in the main program and not in a function or subroutine to avoid multiple OPEN/CLOSE.
Don't forget to CLOSE your file when you are done dealing with it.
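The same effect can be reproduced in a few lines of Python, shown here only as an illustration with an in-memory stream standing in for data.txt (the helper name `read_rows` is made up). Python returns an empty result instead of raising an end-of-file error, but the cause is the same: the read position stays at the end until it is reset.

```python
import io


def read_rows(stream):
    """Read whitespace-separated numeric rows from the current position."""
    return [tuple(float(v) for v in line.split())
            for line in stream if line.strip()]


data = io.StringIO("0.24 37.31 6.15\n0.28 37.92 5.5\n")

first = read_rows(data)    # reads both rows
second = read_rows(data)   # position is already at end-of-file
data.seek(0)               # counterpart of REWIND
third = read_rows(data)    # reads both rows again

print(len(first), len(second), len(third))  # prints: 2 0 2
```

Reading the data once into an array before the loop, and passing that array to the function, avoids the repeated file access altogether.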
I was reading Joel's book where he was suggesting as interview question:
Write a program to reverse the "ON" bits in a given byte.
I only can think of a solution using C.
Asking here so you can show me how to do in a Non C way (if possible)
I claim trick question. :) Reversing all bits means a flip-flop, but only the bits that are on clearly means:
return 0;
What specifically does that question mean?
Good question. If reversing the "ON" bits means reversing only the bits that are "ON", then you will always get 0, no matter what the input is. If it means reversing all the bits, i.e. changing all 1s to 0s and all 0s to 1s, which is how I initially read it, then that's just a bitwise NOT, or complement. C-based languages have a complement operator, ~, that does this. For example:
unsigned char b = 102; /* 0x66, 01100110 */
unsigned char reverse = ~b; /* 0x99, 10011001 */
What specifically does that question mean?
Does reverse mean setting 1's to 0's and vice versa?
Or does it mean 00001100 --> 00110000 where you reverse their order in the byte? Or perhaps just reversing the part that is from the first 1 to the last 1? ie. 00110101 --> 00101011?
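The readings above give different results for the same byte. The short Python sketch below makes them concrete; the function names are made up for this illustration and 8-bit values are assumed.

```python
def complement(b):
    """Flip every bit of a byte (1s become 0s and vice versa)."""
    return ~b & 0xFF


def reverse_order(b):
    """Reverse the order of all 8 bits."""
    return int(format(b, "08b")[::-1], 2)


def reverse_span(b):
    """Reverse only the bits between the highest and lowest set bit."""
    if b == 0:
        return 0
    hi = b.bit_length() - 1          # position of the highest set bit
    lo = (b & -b).bit_length() - 1   # position of the lowest set bit
    width = hi - lo + 1
    mask = (1 << width) - 1
    span = (b >> lo) & mask
    reversed_span = int(format(span, f"0{width}b")[::-1], 2)
    return (b & ~(mask << lo)) | (reversed_span << lo)


print(format(complement(0b01100110), "08b"))     # prints: 10011001
print(format(reverse_order(0b00001100), "08b"))  # prints: 00110000
print(format(reverse_span(0b00110101), "08b"))   # prints: 00101011
```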
Assuming it means reversing the bit order in the whole byte, here's an x86 assembler version:
; al is input register
; bl is output register
xor bl, bl ; clear output
; first bit
rcl al, 1 ; rotate al through carry
rcr bl, 1 ; rotate carry into bl
; duplicate above 2-line statements 7 more times for the other bits
This is not the fastest solution; a table lookup is faster.
Reversing the order of bits in C#:
byte ReverseByte(byte b)
{
byte r = 0;
for(int i=0; i<8; i++)
{
int mask = 1 << i;
int bit = (b & mask) >> i;
int reversedMask = bit << (7 - i);
r |= (byte)reversedMask;
}
return r;
}
I'm sure there are cleverer ways of doing it, but in this case the interview question is meant to determine whether you know bitwise operations, so I guess this solution would work.
In an interview, the interviewer usually wants to know how you arrive at a solution, what your problem-solving skills are, and whether the result is clean or a hack. So don't come up with too clever a solution, because that will probably mean you found it somewhere on the Internet beforehand. Don't pretend that you didn't know it and just came up with the answer because you are a genius, either; that will be even worse if she figures it out, since you are basically lying.
If you're talking about switching 1's to 0's and 0's to 1's, using Ruby:
n = 0b11001100
~n
If you mean reverse the order:
n = 0b11001100
eval("0b" + n.to_s(2).reverse)
If you mean counting the on bits, as mentioned by another user:
n = 123
count = 0
0.upto(7) { |i| count = count + n[i] }
♥ Ruby
I'm probably misremembering, but I thought that Joel's question was about counting the "on" bits rather than reversing them.
Here you go:
#include <stdio.h>
int countBits(unsigned char byte);
int main(){
FILE* out = fopen( "bitcount.c" ,"w");
int i;
fprintf(out, "#include <stdio.h>\n#include <stdlib.h>\n#include <time.h>\n\n");
fprintf(out, "int bitcount[256] = {");
for(i=0;i<256;i++){
fprintf(out, "%i", countBits((unsigned char)i));
if( i < 255 ) fprintf(out, ", ");
}
fprintf(out, "};\n\n");
fprintf(out, "int main(){\n");
fprintf(out, "srand ( time(NULL) );\n");
fprintf(out, "\tint num = rand() %% 256;\n");
fprintf(out, "\tprintf(\"The byte %%i has %%i bits set to ON.\\n\", num, bitcount[num]);\n");
fprintf(out, "\treturn 0;\n");
fprintf(out, "}\n");
fclose(out);
return 0;
}
int countBits(unsigned char byte){
unsigned char mask = 1;
int count = 0;
while(mask){
if( mask&byte ) count++;
mask <<= 1;
}
return count;
}
The classic Bit Hacks page has several (really very clever) ways to do this, but it's all in C. Any language derived from C syntax (notably Java) will likely have similar methods. I'm sure we'll get some Haskell versions in this thread ;)
byte ReverseByte(byte b)
{
return (byte)(b ^ 0xff);
}
That works if ^ is XOR in your language, but not if it's AND, which it often is.
And here's a version directly cut and pasted from OpenJDK, which is interesting because it involves no loop. On the other hand, unlike the Scheme version I posted, this version only works for 32-bit and 64-bit numbers. :-)
32-bit version:
public static int reverse(int i) {
// HD, Figure 7-1
i = (i & 0x55555555) << 1 | (i >>> 1) & 0x55555555;
i = (i & 0x33333333) << 2 | (i >>> 2) & 0x33333333;
i = (i & 0x0f0f0f0f) << 4 | (i >>> 4) & 0x0f0f0f0f;
i = (i << 24) | ((i & 0xff00) << 8) |
((i >>> 8) & 0xff00) | (i >>> 24);
return i;
}
64-bit version:
public static long reverse(long i) {
// HD, Figure 7-1
i = (i & 0x5555555555555555L) << 1 | (i >>> 1) & 0x5555555555555555L;
i = (i & 0x3333333333333333L) << 2 | (i >>> 2) & 0x3333333333333333L;
i = (i & 0x0f0f0f0f0f0f0f0fL) << 4 | (i >>> 4) & 0x0f0f0f0f0f0f0f0fL;
i = (i & 0x00ff00ff00ff00ffL) << 8 | (i >>> 8) & 0x00ff00ff00ff00ffL;
i = (i << 48) | ((i & 0xffff0000L) << 16) |
((i >>> 16) & 0xffff0000L) | (i >>> 48);
return i;
}
pseudo code..
while (Read())
Write(0);
I'm probably misremembering, but I thought that Joel's question was about counting the "on" bits rather than reversing them.
Here's the obligatory Haskell soln for complementing the bits, it uses the library function, complement:
import Data.Bits
import Data.Int
i = 123::Int
i32 = 123::Int32
i64 = 123::Int64
var2 = 123::Integer
test1 = sho i
test2 = sho i32
test3 = sho i64
test4 = sho var2 -- Exception
sho i = putStrLn $ showBits i ++ "\n" ++ (showBits $complement i)
showBits v = concatMap f (showBits2 v) where
f False = "0"
f True = "1"
showBits2 v = map (testBit v) [0..(bitSize v - 1)]
If the question means to flip all the bits, and you aren't allowed to use C-like operators such as XOR and NOT, then this will work:
bFlipped = -1 - bInput;
I'd modify palmsey's second example, eliminating a bug and eliminating the eval:
n = 0b11001100
n.to_s(2).rjust(8, '0').reverse.to_i(2)
The rjust is important if the number to be bitwise-reversed is a fixed-length bit field -- without it, the reverse of 0b00101010 would be 0b10101 rather than the correct 0b01010100. (Obviously, the 8 should be replaced with the length in question.) I just got tripped up by this one.
Asking here so you can show me how to do in a Non C way (if possible)
Say you have the number 10101010. To change 1s to 0s (and vice versa) you just use XOR:
10101010
^11111111
--------
01010101
Doing it by hand is about as "Non C" as you'll get.
However from the wording of the question it really sounds like it's only turning off "ON" bits... In which case the answer is zero (as has already been mentioned) (unless of course the question is actually asking to swap the order of the bits).
Since the question asked for a non-C way, here's a Scheme implementation, cheerfully plagiarised from SLIB:
(define (bit-reverse k n)
(do ((m (if (negative? n) (lognot n) n) (arithmetic-shift m -1))
(k (+ -1 k) (+ -1 k))
(rvs 0 (logior (arithmetic-shift rvs 1) (logand 1 m))))
((negative? k) (if (negative? n) (lognot rvs) rvs))))
(define (reverse-bit-field n start end)
(define width (- end start))
(let ((mask (lognot (ash -1 width))))
(define zn (logand mask (arithmetic-shift n (- start))))
(logior (arithmetic-shift (bit-reverse width zn) start)
(logand (lognot (ash mask start)) n))))
Rewritten as C (for people unfamiliar with Scheme), it'd look something like this (with the understanding that in Scheme, numbers can be arbitrarily big):
int
bit_reverse(int k, int n)
{
int m = n < 0 ? ~n : n;
int rvs = 0;
while (--k >= 0) {
rvs = (rvs << 1) | (m & 1);
m >>= 1;
}
return n < 0 ? ~rvs : rvs;
}
int
reverse_bit_field(int n, int start, int end)
{
int width = end - start;
int mask = ~(~0u << width); /* shift an unsigned value: left-shifting -1 is undefined in C */
int zn = mask & (n >> start);
return (bit_reverse(width, zn) << start) | (~(mask << start) & n);
}
Reversing the bits.
For example, take a number represented by 01101011. If we reverse the bits, this number becomes 11010110. To achieve this, you first need to know how to swap two bits in a number.
Swapping two bits in a number:
XOR the two bits with each other. If the result is 0, the bits are the same and nothing needs to change; otherwise flip both bits by XORing each of them with 1, and store the result back in the original number.
Now for reversing the number
FOR I less than Numberofbits/2
swap(Number,I,NumberOfBits-1-I);
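The two steps above can be sketched in Python as follows. The function names `swap_bits` and `reverse_bits` are chosen for this illustration, and bit 0 is taken to be the least significant bit.

```python
def swap_bits(x, i, j):
    """Swap bits i and j of x; they only need flipping when they differ."""
    if ((x >> i) ^ (x >> j)) & 1:
        x ^= (1 << i) | (1 << j)  # flip both bits
    return x


def reverse_bits(x, nbits=8):
    """Reverse the lowest nbits of x by swapping mirrored bit pairs."""
    for i in range(nbits // 2):
        x = swap_bits(x, i, nbits - 1 - i)
    return x


print(format(reverse_bits(0b01101011), "08b"))  # prints: 11010110
```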