Suppose in a program, I write this function for dividing 2 values:
function [63:0] DIV_VAL; // Function for Multiplying two values 32 bits.
input [63:0] a, b;
always @ (a or b)
DIV_VAL = a / b;
endfunction
Then later in the code I want to call this function with input Znk1, but shifted by 16 bits and by 12 bits (for the first and second argument of the function, respectively). Moreover, since DIV_VAL returns a 64-bit number, I only want the lower 32 bits of it to be loaded into NC_1, like this:
NC_1 = DIV_VAL [31:0] (Znk1 << 16, Znk1 >> 12) ;
Is this allowed, and does it work? I'm also not sure about the order.
Second question: As an alternative for this situation, a friend told me I can define some registers such as a and b and use them to do something like this:
a = Znk1 << 16;
b = Znk1 >> 12;
NC_1 = DIV_VAL [31:0] (a, b);
NC_1 = NC_1[31:0];
You can't put an always block inside a function. Your function should be:
function [63:0] DIV_VAL; // Function for dividing two 64-bit values.
input [63:0] a, b;
DIV_VAL = a / b;
endfunction
or, since what you've written uses an old-fashioned style, perhaps:
function [63:0] DIV_VAL (input [63:0] a, b); // Function for dividing two 64-bit values.
DIV_VAL = a / b;
endfunction
You can then call the function with expressions in the function call, if you wish:
NC_1 = DIV_VAL (Znk1 << 16, Znk1 >> 12) ;
but truncating the return value explicitly in the call, as you were doing, is not allowed. You don't need to truncate explicitly anyway: Verilog does it implicitly when the 64-bit result is assigned to a narrower target. (Hence no [31:0] in the above code.)
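To make the implicit truncation concrete, here is a rough Python model of what the assignment does. It is only a sketch, not Verilog: it assumes that Znk1 is a 64-bit value and that NC_1 is a 32-bit register (the question does not show either declaration), and the names div_val and assign_nc1 are made up for illustration.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1

def div_val(a, b):
    """Model of the 64-bit DIV_VAL function (b must be non-zero in this sketch)."""
    return ((a & MASK64) // (b & MASK64)) & MASK64

def assign_nc1(znk1):
    """Model of NC_1 = DIV_VAL(Znk1 << 16, Znk1 >> 12) with an assumed 32-bit NC_1."""
    a = (znk1 << 16) & MASK64      # left shift; bits pushed past bit 63 are lost
    b = (znk1 >> 12) & MASK64      # logical right shift
    return div_val(a, b) & MASK32  # the assignment keeps only the low 32 bits

print(hex(assign_nc1(0x1000)))
```

The final mask plays the role of the implicit truncation, which is why no part-select is needed on the function call itself.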
I want to generalize some predicate written in swi-prolog to calculate the power of some function. My predicate so far is:
% calculates the +Power and the +Argument of some function +Function with value +Value.
calc_power(Value, Argument, Function, Power) :-
not(Power is 0),
Power is Power_m1 + 1,
Value =..[Function, Buffer],
calc_power(Buffer, Argument, Function, Power_m1), !.
calc_power(Argument, Argument, _, 0).
The call calc_power((g(a)),A,f,POW). gives so far:
A = g(a),
POW = 0.
My generalization should also solve calls like this:
calc_power(A1, a, f, 3).
In that special case the solution should be A1 = f(f(f(a))). But for some reason it doesn't work. I get the error:
ERROR: Arguments are not sufficiently instantiated
in line
Power is Power_m1 + 1
This probably means that in SWI-Prolog it is not possible to evaluate + when a variable on the right-hand side is still unbound. How can I solve this problem?
You can delay the + 1 operation with:
int_succ(I0, I1) :-
( nonvar(I0) ->
integer(I0),
I0 >= 0,
I1 is I0 + 1
; nonvar(I1) ->
integer(I1),
I1 >= 1,
I0 is I1 - 1
; when((nonvar(I0) ; nonvar(I1)), int_succ(I0, I1))
).
Example in swi-prolog:
?- int_succ(I0, I1), I1 = 7.
I0 = 6,
I1 = 7.
This is more flexible than https://www.swi-prolog.org/pldoc/man?predicate=succ/2 , and can of course be modified to support negative numbers if desired.
I found a solution:
:- use_module(library(clpfd)).
% calculates the +Power and the +Argument of some function +Function with value +Value.
calc_power(Argument, Argument, _, 0).
calc_power(Value, Argument, Function, Power) :-
Power #\= 0,
Power #= Power_m1 + 1,
Value =..[Function, Buffer],
calc_power(Buffer, Argument, Function, Power_m1).
I have a function:
int = input("Introduce the intensity: ")
wave = input("Introduce the wavelength: ");
e = input("Introduce e: ");
function [brighttemp] = brightness_temp_function(int, wave, e)
A = 1.19 .* (10 .^ 8);
B = 1.441 .* (10 .^ 4);
brighttemp = sprintf("%.2f",(B ./ (wave .* (log (1 .+ ((e .* A) ./ (int .* (power(wave,5)))))))) .- 273.15);
disp(brighttemp);
endfunction
[brighttemp] = brightness_temp_function(wave, e, int);
When I enter a single value for each variable, it outputs a single answer for brighttemp. But when I enter a matrix for one of the variables, such as [8, 9; 7, 8.5] for the int variable, and single values for the others, I get back output like 20.727.812.924.3 for brighttemp instead of a matrix in the same format as the one entered for int, like [20.7, 27.8; 12.9, 24.3]. What do I have to do to get output like the latter?
I'm new to Verilog and basically trying to teach myself a Digital Logic Design module for university. I am trying to write a BCD Adder in Verilog using two Full Adders with some logic in between for conversion to BCD when needed.
Here is my code:
module binary_adder (
output [3:0] Sum,
output C_out,
input [3:0] A, B,
input C_in
);
assign {C_out, Sum} = A || B || C_in;
endmodule
module BCD_Adder (
output [3:0] Sum,
output Carry_out,
input [3:0] Addend, Augend,
input Carry_in
);
wire [3:0] Z, correction;
wire adder1C_out, carryInAdder2, adder2C_out;
binary_adder adder1 (.Sum(Z), .C_out(adder1C_out), .A(Addend), .B(Augend), .C_in(Carry_in));
assign Carry_out = (adder1C_out || (Z[3] && Z[1]) || (Z[3] && Z[2]));
assign correction = (Carry_out) ? (4'b0110) : (4'b0000);
assign carryInAdder2 = (1'b0);
binary_adder adder2 (.Sum(Sum), .C_out(adder2C_out), .A(correction), .B(Z), .C_in(carryInAdder2));
endmodule
For some reason, I keep getting the following outputs:
Submitted: A = 0000, B = 0010, Carry In = 0, Sum = 0001, Carry Out = 0
Expected: A = 0000, B = 0010, Carry In = 0, Sum = 0010, Carry Out = 0
Submitted: A = 0000, B = 0011, Carry In = 0, Sum = 0001, Carry Out = 0
Expected: A = 0000, B = 0011, Carry In = 0, Sum = 0011, Carry Out = 0
Submitted: A = 0000, B = 0100, Carry In = 0, Sum = 0001, Carry Out = 0
Expected: A = 0000, B = 0100, Carry In = 0, Sum = 0100, Carry Out = 0
It basically continues like this for all values. My A, B, Carry In and Carry Out values always match, but for some reason the output sum is always 0001. I'm not sure where I'm going wrong, the logic seems okay to me. I am very new to this and only know the basics, so any help would be greatly appreciated!
Thanks,
Wes
The logic in binary_adder does not implement addition; as it is currently written, it will just set Sum to 1 if any of A, B or C_in are non-zero.
While there are many architectures of multibit addition (see https://en.wikipedia.org/wiki/Adder_(electronics)#Adders_supporting_multiple_bits), the simplest to understand is the Ripple Carry Adder. It implements several full adders and chains them together to implement addition.
A simple implementation of this architecture looks like this:
module full_add(input A, B, Cin,
output S, Cout);
// Basic implementation of a Full Adder (see https://en.wikipedia.org/wiki/Adder_(electronics)#Full_adder)
assign S = A ^ B ^ Cin;
assign Cout = A & B | ((A ^ B) & Cin); // Note I use bit-wise operators like | and ^ instead of logical ones like ||; it's important to know the difference
endmodule
module add(input [3:0] A, B,
input Cin,
output [3:0] S,
output Cout);
wire [3:0] Carries; // Internal wires for the carries between full adders in Ripple Carry
// This is an array instance which just makes [3:0], ie 4, instances of the full adder.
// Take note that a single Full Adder module takes in single bits, but here
// I can pass bit vectors like A ([3:0]) directly which assign full_add[0].A = A[0], full_add[1].A = A[1], etc
// Common alternatives to using array instances (which are more rare) include generate statements or just instantiate the module X times
full_add f[3:0](.A(A), .B(B), .Cin({Carries[2:0], Cin}), .S(S), .Cout(Carries));
assign Cout = Carries[3];
endmodule
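For readers who want to check the logic outside a simulator, the following is a small Python model of the same ripple-carry structure, together with a model of what the original `A || B || C_in` expression computes. It is a sketch for illustration only; the function names ripple_add and logical_or are assumptions and do not come from the Verilog above.

```python
def full_add(a, b, cin):
    """One-bit full adder using bit-wise operators, mirroring full_add above."""
    s = a ^ b ^ cin
    cout = (a & b) | ((a ^ b) & cin)
    return s, cout

def ripple_add(a, b, cin, width=4):
    """Chain `width` full adders; returns (sum, carry_out)."""
    total, carry = 0, cin
    for i in range(width):
        s, carry = full_add((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry

def logical_or(a, b, cin):
    """What `A || B || C_in` yields: a single 0/1 truth value."""
    return int(bool(a) or bool(b) or bool(cin))

print(ripple_add(0b0000, 0b0010, 0))  # the adder keeps the value of B
print(logical_or(0b0000, 0b0010, 0))  # the logical OR collapses it to 1
```

The second call reproduces the symptom from the question: any non-zero input collapses to the single value 1, which is why Sum always read 0001.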
I would like to implement a function duration = timer(n, f, arguments_of_f) that measures how much time a function f with arguments arguments_of_f needs to run n times. My attempt was the following:
function duration = timer(n, f, arguments_of_f)
duration = 0;
for i=1:n
t0 = cputime;
f(arguments_of_f);
t1 = cputime;
duration += t1 - t0;
end
In another file, I have
function y = f(x)
y = x + 1;
end
The call d1 = timer(100, @f, 3); works as expected.
In another file, I have
function y = g(x1, x2)
y = x1 + x2;
end
but the call d2 = timer(100, @g, 1, 2); gives an error about undefined
argument x2, which, looking back, is to be expected, since I pass only
1 to g and 2 is never used.
So, how to implement the function timer in Octave, so that a call like
timer(4, @g, x1, ... , xK) would work? How can one pack the xs together?
So, I am looking for the analogue of Python's *args trick:
def use_f(f, *args):
f(*args)
works if we define def f(x, y): return x + y and call use_f(f, 3, 4).
You don't need to pack all the arguments together, you just need to tell Octave that there is more than one argument coming and that they are all necessary. This is very easy to do using variadic arguments.
Your original implementation is nearly spot on: the necessary change is minimal. You need to change the variable arguments_of_f to the special name varargin, which is a magical cell array containing all your arbitrary undeclared arguments, and pass it with expansion instead of directly:
function duration = timer(n, f, varargin)
duration = 0;
for i=1:n
t0 = cputime;
f(varargin{:});
t1 = cputime;
duration += t1 - t0;
end
That's it. None of the other functions need to change.
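Since the question asked for the analogue of Python's *args, here is the same timer written in Python for comparison. This is only a sketch: time.process_time is used as a rough stand-in for Octave's cputime, and g is the two-argument function from the question.

```python
import time

def timer(n, f, *args):
    """Call f(*args) n times and return the accumulated CPU time in seconds."""
    duration = 0.0
    for _ in range(n):
        t0 = time.process_time()
        f(*args)                      # unpack, like f(varargin{:}) in Octave
        duration += time.process_time() - t0
    return duration

def g(x1, x2):
    return x1 + x2

print(timer(100, g, 1, 2))
```

In both languages the extra arguments are collected into one container (varargin or args) and expanded again at the call site.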
Suppose I have a 32 or 64 bit unsigned integer.
What is the fastest way to find the index i of the leftmost bit such that the number of 0s in the leftmost i bits equals the number of 1s in the leftmost i bits?
I was thinking of some bit tricks like the ones mentioned here.
I am interested in recent x86_64 processors. This might be relevant, as some processors support instructions such as POPCNT (counts the number of 1s) or LZCNT (counts the number of leading 0s).
If it helps, it is possible to assume that the first bit always has a certain value.
Example (with 16 bits):
If the integer is
1110010100110110b
         ^
         i
then i=10 and it corresponds to the marked position.
A possible (slow) implementation for 16-bit integers could be:
mask = 1000000000000000b
pos = 0
count=0
do {
if(x & mask)
count++;
else
count--;
pos++;
x<<=1;
} while(count)
return pos;
Edit: fixed bug in code as per @njuffa comment.
I don't have any bit tricks for this, but I do have a SIMD trick.
First, a few observations:
Interpreting 0 as -1, this problem becomes "find the first i so that the first i bits sum to 0".
0 is even but all the bits have odd values under this interpretation, which gives the insight that i must be even and this problem can be analyzed by blocks of 2 bits.
01 and 10 don't change the balance.
After spreading the groups of 2 out to bytes (none of the following is tested),
// optionally use AVX2 _mm_srlv_epi32 instead of ugly variable set
__m128i spread = _mm_shuffle_epi8(_mm_setr_epi32(x, x >> 2, x >> 4, x >> 6),
_mm_setr_epi8(0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15));
spread = _mm_and_si128(spread, _mm_set1_epi8(3));
Replace 00 by -1, 11 by 1, and 01 and 10 by 0:
__m128i r = _mm_shuffle_epi8(_mm_setr_epi8(-1, 0, 0, 1, 0,0,0,0,0,0,0,0,0,0,0,0),
spread);
Calculate the prefix sum:
__m128i pfs = _mm_add_epi8(r, _mm_bsrli_si128(r, 1));
pfs = _mm_add_epi8(pfs, _mm_bsrli_si128(pfs, 2));
pfs = _mm_add_epi8(pfs, _mm_bsrli_si128(pfs, 4));
pfs = _mm_add_epi8(pfs, _mm_bsrli_si128(pfs, 8));
Find the highest 0:
__m128i iszero = _mm_cmpeq_epi8(pfs, _mm_setzero_si128());
return __builtin_clz(_mm_movemask_epi8(iszero) << 15) * 2;
The << 15 and * 2 appear because the resulting mask is 16 bits wide but the clz operates on 32 bits. The shift is one less than 16 because a zero in the top byte indicates that one group of 2 is taken, not zero groups.
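The observations this answer relies on (the index must be even, and only the 2-bit groups 00 and 11 change the balance) can be checked with a scalar Python model. This is a sketch to validate the idea, not a translation of the SIMD code; the function names are made up for illustration.

```python
def first_balance_bitwise(x, width=32):
    """Reference: scan bits from the left, return the first i with balance 0, else 0."""
    balance = 0
    for i in range(1, width + 1):
        bit = (x >> (width - i)) & 1
        balance += 1 if bit else -1
        if balance == 0:
            return i
    return 0

PAIR_DELTA = (-1, 0, 0, 1)  # 00 -> -1, 01 and 10 -> 0, 11 -> +1

def first_balance_pairs(x, width=32):
    """Same result, but stepping over 2-bit groups from the left."""
    balance = 0
    for g in range(1, width // 2 + 1):
        pair = (x >> (width - 2 * g)) & 3
        balance += PAIR_DELTA[pair]
        if balance == 0:
            return 2 * g
    return 0

print(first_balance_pairs(0b1110010100110110, width=16))
```

The group-wise balance is half the bit-wise balance at every even position, so both functions reach zero at the same index.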
This is a solution for 32-bit data using classical bit-twiddling techniques. The intermediate computation requires 64-bit arithmetic and logic operations. I have tried to stick to portable operations as far as possible. Required is an implementation of the POSIX function ffsll to find the least-significant 1-bit in a 64-bit long long, and a custom function rev_bit_duos that reverses the bit-duos in a 32-bit integer. The latter could be replaced with a platform-specific bit-reversal intrinsic, such as the __rbit intrinsic on ARM platforms.
The basic observation is that if a bit-group with an equal number of 0-bits and 1-bits can be extracted, it must contain an even number of bits. This means we can examine the operand in 2-bit groups. We can further restrict ourselves to tracking whether each 2-bit group increases (0b11), decreases (0b00) or leaves unchanged (0b01, 0b10) a running balance of bits. If we count positive and negative changes with separate counters, 4-bit counters will suffice unless the input is 0 or 0xffffffff, which can be handled separately. Based on comments to the question, these cases shouldn't occur. By subtracting the negative change count from the positive change count for each 2-bit group we can find at which group the balance becomes zero. There may be multiple such bit groups; we need to find the first one.
The processing can be parallelized by expanding each 2-bit group into a nibble that then can serve as a change counter. The prefix sum can be computed via integer multiply with an appropriate constant, which provides the necessary shift & add operations at each nibble position. Efficient ways for parallel nibble-wise subtraction are well-known; likewise, there is a well-known technique due to Alan Mycroft for detecting zero-bytes that is trivially adaptable to zero-nibble detection. POSIX function ffsll is then applied to find the bit position of that nibble.
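Two of the building blocks named above, the prefix sum by multiplication and the Mycroft-style zero-nibble test, can be tried in isolation. The following Python sketch models them on 64-bit values; the helper names are assumptions made for illustration, and the prefix sums are only meaningful as long as no nibble exceeds 15.

```python
MASK64 = (1 << 64) - 1
NIBBLE_LSB = 0x1111111111111111
NIBBLE_MSB = 0x8888888888888888

def nibble_prefix_sums(counts):
    """Nibble k of the result is the sum of nibbles 0..k of `counts` (mod 16)."""
    return (counts * NIBBLE_LSB) & MASK64

def zero_nibble_flags(r):
    """Mycroft-style test: the lowest set flag marks the lowest zero nibble
    (flags above it may be spurious because of borrows)."""
    return (r - NIBBLE_LSB) & ~r & NIBBLE_MSB & MASK64

def lowest_zero_nibble(r):
    """Index of the lowest zero nibble of r, or -1 if there is none."""
    flags = zero_nibble_flags(r)
    if flags == 0:
        return -1
    return ((flags & -flags).bit_length() - 1) // 4

print(hex(nibble_prefix_sums(0x0101)))
print(lowest_zero_nibble(0x2222222222222011))
```

Because only the lowest flag is reliable, the search has to run from the least-significant end.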
Slightly problematic is the requirement for extraction of a left-most bit group, rather than a right-most, since Alan Mycroft's trick only works for finding the first zero-nibble from the right. Also, handling the prefix-sum for a left-most bit group requires the use of a mulhi operation, which may not be easily available and may be less efficient than standard integer multiplication. I have addressed both of these issues by simply bit-reversing the original operand up front.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <limits.h> /* CHAR_BIT, used in reference() */
#include <string.h>
/* Reverse bit-duos using classic binary partitioning algorithm */
static inline uint32_t rev_bit_duos (uint32_t a)
{
uint32_t m;
a = (a >> 16) | (a << 16); // swap halfwords
m = 0x00ff00ff; a = ((a >> 8) & m) | ((a << 8) & ~m); // swap bytes
m = (m << 4)^m; a = ((a >> 4) & m) | ((a << 4) & ~m); // swap nibbles
m = (m << 2)^m; a = ((a >> 2) & m) | ((a << 2) & ~m); // swap bit-duos
return a;
}
/* Return the number of most significant (leftmost) bits that must be extracted
to achieve an equal count of 1-bits and 0-bits in the extracted bit group.
Return 0 if no such bit group exists.
*/
int solution (uint32_t x)
{
const uint64_t mask16 = 0x0000ffff0000ffffULL; // alternate half-words
const uint64_t mask8 = 0x00ff00ff00ff00ffULL; // alternate bytes
const uint64_t mask4h = 0x0c0c0c0c0c0c0c0cULL; // alternate nibbles, high bit-duo
const uint64_t mask4l = 0x0303030303030303ULL; // alternate nibbles, low bit-duo
const uint64_t nibble_lsb = 0x1111111111111111ULL;
const uint64_t nibble_msb = 0x8888888888888888ULL;
uint64_t a, b, r, s, t, expx, pc_expx, nc_expx;
int res;
/* common path can't handle all 0s and all 1s due to counter overflow */
if ((x == 0) || (x == ~0)) return 0;
/* make zero-nibble detection work, and simplify prefix sum computation */
x = rev_bit_duos (x); // reverse bit-duos
/* expand each bit-duo into a nibble */
expx = x;
expx = ((expx << 16) | expx) & mask16;
expx = ((expx << 8) | expx) & mask8;
expx = ((expx << 4) | expx);
expx = ((expx & mask4h) * 4) + (expx & mask4l);
/* compute positive and negative change counts for each nibble */
pc_expx = expx & ( expx >> 1) & nibble_lsb;
nc_expx = ~expx & (~expx >> 1) & nibble_lsb;
/* produce prefix sums for positive and negative change counters */
a = pc_expx * nibble_lsb;
b = nc_expx * nibble_lsb;
/* subtract positive and negative prefix sums, nibble-wise */
s = a ^ ~b;
r = a | nibble_msb;
t = b & ~nibble_msb;
s = s & nibble_msb;
r = r - t;
r = r ^ s;
/* find first nibble that is zero using Alan Mycroft's magic */
r = (r - nibble_lsb) & (~r & nibble_msb);
res = ffsll (r) / 2; // account for bit-duo to nibble expansion
return res;
}
/* Return the number of most significant (leftmost) bits that must be extracted
to achieve an equal count of 1-bits and 0-bits in the extracted bit group.
Return 0 if no such bit group exists.
*/
int reference (uint32_t x)
{
int count = 0;
int bits = 0;
uint32_t mask = 0x80000000;
do {
bits++;
if (x & mask) {
count++;
} else {
count--;
}
x = x << 1;
} while ((count) && (bits <= (int)(sizeof(x) * CHAR_BIT)));
return (count) ? 0 : bits;
}
int main (void)
{
uint32_t x = 0;
do {
uint32_t ref = reference (x);
uint32_t res = solution (x);
if (res != ref) {
printf ("x=%08x res=%u ref=%u\n\n", x, res, ref);
}
x++;
} while (x);
return EXIT_SUCCESS;
}
A possible solution (for 32-bit integers). I'm not sure whether it can be improved or made to avoid lookup tables. Here x is the input integer.
//Look-up table of 2^16 elements.
//The y-th element is associated with the first 2 bytes y of x.
//If the wanted bit is in y, LUT1[y] is minus the position of the bit
//If the wanted bit is not in y, LUT1[y] is the number of ones in excess in y minus 1 (between 0 and 15)
LUT1 = ....
//Look-up table of 16 * 2^16 elements.
//The y-th element is associated with two integers y' and y'' of 4 and 16 bits, respectively.
//y' is the number of excess ones in the first 2 bytes of x, minus 1
//y'' is the last 2 bytes of x. The table contains the answer to return.
LUT2 = ....
if(LUT1[x>>16] < 0)
return -LUT1[x>>16];
return LUT2[ (LUT1[x>>16]<<16) | (x & 0xFFFF) ]
This requires ~1MB for the lookup tables.
The same idea also works using 4 lookup tables (one per byte of x). This requires more operations but brings the memory down to about 12KB.
LUT1 = ... //2^8 elements
LUT2 = ... //8 * 2^8 elements
LUT3 = ... //16 * 2^8 elements
LUT4 = ... //24 * 2^8 elements
y = x>>24;
if(LUT1[y] < 0)
return -LUT1[y];
y = (LUT1[y]<<8) | ((x>>16) & 0xFF);
if(LUT2[y] < 0)
return -LUT2[y];
y = (LUT2[y]<<8) | ((x>>8) & 0xFF);
if(LUT3[y] < 0)
return -LUT3[y];
return LUT4[(LUT3[y]<<8) | (x & 0xFF) ];
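A way to see how such tables could be filled is the following Python sketch of the four-table variant. It is one possible construction under stated assumptions, not necessarily the one the answer had in mind: the top bit of x is assumed to be 1 (the question permits fixing the first bit), 0 is returned when no balanced prefix exists, and the names build_tables and first_balance_lut are made up for illustration.

```python
def build_tables(chunks=4, bits=8):
    """tables[k][(state << bits) | byte] is -(answer) if the balance reaches
    zero inside chunk k, otherwise the number of excess ones minus 1."""
    tables = []
    for k in range(chunks):
        states = max(1, k * bits)          # possible (excess - 1) values on entry
        table = [0] * (states << bits)
        for state in range(states):
            for byte in range(1 << bits):
                balance = 0 if k == 0 else state + 1
                entry = None
                for j in range(1, bits + 1):
                    balance += 1 if (byte >> (bits - j)) & 1 else -1
                    if balance == 0:
                        entry = -(k * bits + j)
                        break
                if entry is None:
                    entry = balance - 1    # only meaningful when ones are in excess
                table[(state << bits) | byte] = entry
        tables.append(table)
    return tables

TABLES = build_tables()

def first_balance_lut(x, tables=TABLES, bits=8):
    """Walk the tables byte by byte; x must have its top bit set."""
    chunks = len(tables)
    state = 0
    for k in range(chunks):
        byte = (x >> (bits * (chunks - 1 - k))) & ((1 << bits) - 1)
        entry = tables[k][(state << bits) | byte]
        if entry < 0:
            return -entry
        state = entry
    return 0  # the balance never returned to zero

print(first_balance_lut(0b1110010100110110 << 16))
```

The tables built this way hold 49 * 2^8 entries in total, which matches the roughly 12KB quoted above if each entry fits in one byte.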