Verilog conditional assign outputs X where there should be 1 - output

I am currently building a sign extender in Verilog based on the one present in the ARMv8 processor, but after the first result is extended, every subsequent result makes a 1 in the output into an X. How do I get rid of the X?
The module and the quick test bench I made are shown below.
Sign Extender:
`timescale 1ns / 1ps
module SignExtender(BusImm, ImmIns);
output [63:0] BusImm;
input [31:0] ImmIns;
wire extBit;
assign extBit = (ImmIns[31:26] == 6'bx00101) ? ImmIns[25]:
(ImmIns[31:24] == 8'bxxx10100) ? ImmIns[23]:
(ImmIns[31:21] == 11'bxxxx1000xx0) ? ImmIns[20]:
1'b0;
assign BusImm = (ImmIns[31:26] == 6'bx00101) ? {{38{extBit}}, ImmIns[25:0]}:
(ImmIns[31:24] == 8'bxxx10100) ? {{45{extBit}}, ImmIns[23:5]}:
(ImmIns[31:21] == 11'bxxxx1000xx0) ? {{55{extBit}}, ImmIns[20:12]}:
64'b0;
assign BusImm = 64'b0;
endmodule
Test Bench:
`timescale 1ns / 1ps
`define STRLEN 32
`define HalfClockPeriod 60
`define ClockPeriod `HalfClockPeriod * 2
module SignExtenderTest;
task passTest;
input [63:0] actualOut, expectedOut;
input [`STRLEN*8:0] testType;
inout [7:0] passed;
if(actualOut == expectedOut) begin $display ("%s passed", testType); passed = passed + 1; end
else $display ("%s failed: 0x%x should be 0x%x", testType, actualOut, expectedOut);
endtask
task allPassed;
input [7:0] passed;
input [7:0] numTests;
if(passed == numTests) $display ("All tests passed");
else $display("Some tests failed: %d of %d passed", passed, numTests);
endtask
reg [7:0] passed;
reg [31:0] in;
wire [63:0] out;
SignExtender uut (
.BusImm(out),
.ImmIns(in)
);
initial begin
passed = 0;
in = 32'hF84003E9;
#10;
begin
passTest(out, 63'b0, "Stuff", passed);
#10;
in = 32'hf84093ea;
#10;
passTest(out, 63'b0, "Stuff", passed);
end
end
endmodule

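You seem to be treating x as a "don't-care" value in your comparisons, but it is not. x is a specific value which represents "unknown". Since you drive your input signals to all known values (0 or 1), an == comparison against a literal containing x can never evaluate to 1: it is 0 when the known bits differ and x when they all match. When the condition of a conditional operator is x, the two branches are merged bit by bit and every bit where they differ becomes x, so bits that should be 1 come out as x. You should only compare bits you are interested in. For example, change: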
(ImmIns[31:21] == 11'bxxxx1000xx0) ? {{55{extBit}}, ImmIns[20:12]}:
to:
( (ImmIns[27:24] == 4'b1000) && (ImmIns[21] == 1'b0) ) ? {{55{extBit}}, ImmIns[20:12]}:
You need to make similar changes to all your comparisons.
Also, you drive BusImm with 2 continuous assignments. Get rid of this line:
assign BusImm = 64'b0;
These changes get the x out of your output.
Also consider using casez. Refer to IEEE Std 1800-2017, section 12.5.1 Case statement with do-not-cares.
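To illustrate the "compare only the bits you care about" idea outside the simulator, here is a minimal Python sketch that models the corrected decode with shift-and-mask comparisons, which is roughly what casez does for you. The function names and mask constants are assumptions made for this sketch and are not part of the original module; the three patterns and the immediate field ranges are taken from the question's code.

```python
MASK64 = (1 << 64) - 1  # keep results to 64 bits, like BusImm

def sign_extend(value, width):
    """Sign-extend a `width`-bit value to 64 bits (two's complement)."""
    sign_bit = 1 << (width - 1)
    return ((value ^ sign_bit) - sign_bit) & MASK64

def sign_extender(ins):
    """Hypothetical model of the corrected SignExtender."""
    # 6'bx00101 on ins[31:26]: ignore bit 31, compare ins[30:26]
    if ((ins >> 26) & 0b11111) == 0b00101:
        return sign_extend(ins & 0x3FFFFFF, 26)        # ins[25:0]
    # 8'bxxx10100 on ins[31:24]: compare ins[28:24]
    if ((ins >> 24) & 0b11111) == 0b10100:
        return sign_extend((ins >> 5) & 0x7FFFF, 19)   # ins[23:5]
    # 11'bxxxx1000xx0 on ins[31:21]: compare ins[27:24] and ins[21]
    if ((ins >> 24) & 0b1111) == 0b1000 and ((ins >> 21) & 1) == 0:
        return sign_extend((ins >> 12) & 0x1FF, 9)     # ins[20:12]
    return 0

# The two stimulus values from the test bench
print(hex(sign_extender(0xF84003E9)))
print(hex(sign_extender(0xF84093EA)))
```

Because every comparison here involves only known bits, each condition is plainly true or false, which is the property the Verilog fix restores.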


Verilog Binary Coded Decimal Adder Not Outputting Correctly

I'm new to Verilog and basically trying to teach myself a Digital Logic Design module for university. I am trying to write a BCD Adder in Verilog using two Full Adders with some logic in between for conversion to BCD when needed.
Here is my code:
module binary_adder (
output [3:0] Sum,
output C_out,
input [3:0] A, B,
input C_in
);
assign {C_out, Sum} = A || B || C_in;
endmodule
module BCD_Adder (
output [3:0] Sum,
output Carry_out,
input [3:0] Addend, Augend,
input Carry_in
);
wire [3:0] Z, correction;
wire adder1C_out, carryInAdder2, adder2C_out;
binary_adder adder1 (.Sum(Z), .C_out(adder1C_out), .A(Addend), .B(Augend), .C_in(Carry_in));
assign Carry_out = (adder1C_out || (Z[3] && Z[1]) || (Z[3] && Z[2]));
assign correction = (Carry_out) ? (4'b0110) : (4'b0000);
assign carryInAdder2 = (1'b0);
binary_adder adder2 (.Sum(Sum), .C_out(adder2C_out), .A(correction), .B(Z), .C_in(carryInAdder2));
endmodule
For some reason, I keep getting the following outputs:
Submitted: A = 0000, B = 0010, Carry In = 0, Sum = 0001, Carry Out = 0
Expected: A = 0000, B = 0010, Carry In = 0, Sum = 0010, Carry Out = 0
Submitted: A = 0000, B = 0011, Carry In = 0, Sum = 0001, Carry Out = 0
Expected: A = 0000, B = 0011, Carry In = 0, Sum = 0011, Carry Out = 0
Submitted: A = 0000, B = 0100, Carry In = 0, Sum = 0001, Carry Out = 0
Expected: A = 0000, B = 0100, Carry In = 0, Sum = 0100, Carry Out = 0
It basically continues like this for all values. My A, B, Carry In and Carry Out values always match, but for some reason the output sum is always 0001. I'm not sure where I'm going wrong, the logic seems okay to me. I am very new to this and only know the basics, so any help would be greatly appreciated!
Thanks,
Wes
The logic in binary_adder does not implement addition. || is the logical OR operator and yields a single bit, so as it is currently written, {C_out, Sum} is just set to 1 if any of A, B or C_in is non-zero.
While there are many architectures of multibit addition (see https://en.wikipedia.org/wiki/Adder_(electronics)#Adders_supporting_multiple_bits), the simplest to understand is the Ripple Carry Adder. It implements several full adders and chains them together to implement addition.
A simple implementation of this architecture looks like this:
module full_add(input A, B, Cin,
output S, Cout);
// Basic implementation of a Full Adder (see https://en.wikipedia.org/wiki/Adder_(electronics)#Full_adder)
assign S = A ^ B ^ Cin;
assign Cout = A & B | ((A ^ B) & Cin); // Note that I use bit-wise operators like | and ^ instead of logical ones like ||; it's important to know the difference
endmodule
module add(input [3:0] A, B,
input Cin,
output [3:0] S,
output Cout);
wire [3:0] Carries; // Internal wires for the carries between full adders in Ripple Carry
// This is an array instance, which makes [3:0], i.e. 4, instances of the full adder.
// Note that a single full_add module takes in single bits, but here
// I can pass bit vectors like A ([3:0]) directly, which connects f[0].A = A[0], f[1].A = A[1], etc.
// Common alternatives to array instances (which are rarer) include generate statements or just instantiating the module 4 times
full_add f[3:0](.A(A), .B(B), .Cin({Carries[2:0], Cin}), .S(S), .Cout(Carries));
assign Cout = Carries[3];
endmodule
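For readers who want to check the arithmetic without a simulator, the following is a small Python sketch of the same ripple-carry structure, followed by the BCD correction step from the question. It is only a behavioural model; the function names ripple_add and bcd_add are made up for this illustration, and the correction logic is copied from the question's BCD_Adder.

```python
def full_add(a, b, cin):
    """One-bit full adder built from bitwise operators."""
    s = a ^ b ^ cin
    cout = (a & b) | ((a ^ b) & cin)
    return s, cout

def ripple_add(a, b, cin, width=4):
    """Chain `width` full adders; returns (sum, carry_out)."""
    total, carry = 0, cin
    for bit in range(width):
        s, carry = full_add((a >> bit) & 1, (b >> bit) & 1, carry)
        total |= s << bit
    return total, carry

def bcd_add(addend, augend, carry_in):
    """BCD digit adder: binary add, then add 6 when the result exceeds 9."""
    z, c1 = ripple_add(addend, augend, carry_in)
    # Same expression as Carry_out in the question: c1 | z[3]&z[1] | z[3]&z[2]
    carry_out = c1 | ((z >> 3) & (z >> 1) & 1) | ((z >> 3) & (z >> 2) & 1)
    correction = 0b0110 if carry_out else 0b0000
    total, _ = ripple_add(correction, z, 0)
    return total, carry_out

# Logical OR collapses its operands to one truth value, hence the stuck 0001.
print(int(bool(0b0000 or 0b0010 or 0)))
print(bcd_add(0b0000, 0b0010, 0))
print(bcd_add(7, 5, 0))
```

With a real adder in place of the logical OR, the correction logic from the question behaves as intended for all pairs of BCD digits in this model.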

Single cycle MIPS processor instruction execution

I am executing six instructions on this single-cycle MIPS processor.
I am unable to figure out the error in the ALU module.
The six instructions I am trying to execute are: Add, Sub, AND, OR, Load, Store.
Right now, I am getting the correct result for addition only.
In the five-bit MUX, I have given inputs 'a' and 'b'. How can I relate these inputs to the instruction's source and destination registers?
Also, how can I add 4 bytes after the execution of each instruction (i.e. my program counter is unable to increment)?
//ALU Block
module ALU (FunctField,ReadData1,out_inALU, ALUOp,ctr1,result);
input ALUOp;
input [5:0]FunctField;
input [31:0] ReadData1;
input [31:0] out_inALU;
output [2:0] ctr1;
reg [2:0] ctr1;
output [31:0] result;
reg [31:0] result;
always @(*)
begin
if(ALUOp == 1) //Arithmetic' Type Instructions
begin
case(FunctField)
//begin
6'b100000: ctr1 = 3'b010; //ADDITION in 'R' Type
6'b100010: ctr1 = 3'b110; // SUBTRACTION in 'R' Type
6'b100100: ctr1 = 3'b000; // AND in 'R' Type
6'b100101: ctr1 = 3'b001; // OR in 'R' Type
endcase
end
if(ALUOp == 0)
begin // LW/SW Type Instructions
if (ctr1 == 3'b010)
result = ReadData1 + out_inALU ;
else if (ctr1 == 3'b110)
result = ReadData1 - out_inALU ;
else if (ctr1 == 3'b000)
result = ReadData1 & out_inALU ;
else if (ctr1 == 3'b001)
result = ReadData1 | out_inALU;
end
result = ReadData1 | out_inALU ;
end
endmodule
// Data memory
module data_memory (ReadAddr, WriteAddr, WriteData1, clock,
MemWrite, MemRead,ReadData);
input [31:0] ReadAddr, WriteAddr, WriteData1;
input clock, MemWrite, MemRead;
reg [31:0] mem[0:50]; // For simulation this no. is enough
reg [31:0] i; // Temporary variable
output [31:0] ReadData;
reg [31:0] ReadData;
initial
begin
// Initial read-out
ReadData = 0;
// Initial memory content for testing purpose
for ( i = 0; i <= 50; i = i+1)
mem[i] = i;
end
// Memory content is always fetched with positive edge clock
always @(posedge clock)
begin
wait ( MemRead )
#10 ReadData = mem[ReadAddr];
wait ( MemWrite )
#10 mem[WriteAddr] = WriteData1;
end
endmodule
// Instruction Memory
module inst(addr,clock,instruction);
input clock;
input [ 31 : 0 ] addr;
output [ 31 : 0 ] instruction;
reg [ 31 : 0 ] instruction;
reg [ 31 : 0 ] MEM[0 :31 ] ;
initial
begin
MEM[ 0 ] <= 32'h10000011;
MEM[ 1 ] <= 32'h10000011 ;
MEM[ 2 ] <= 32'h20000022 ;
MEM[ 3 ] <= 32'h30000033 ;
MEM[ 4 ] <= 32'h40000044 ;
MEM[ 5 ] <= 32'h50000055 ;
MEM[ 6 ] <= 32'h60000066 ;
MEM[ 7 ] <= 32'h70000077 ;
MEM[ 8 ] <= 32'h80000088 ;
MEM[ 9 ] <= 32'h90000099 ;
end
always @( posedge clock )
begin
instruction <= MEM[ addr ] ;
end
endmodule
//Main module
module(reset, clock, regwrite,regDst,ALUSrc,MemtoReg,MemWrite,MemRead, input_PC)
input reset,clock;
input regWrite;
input regDst;
input ALUSrc;
input MemtoReg;
input MemWrite;
input MemRead;
input [31:0] input_PC;
wire [4:0] ReadReg1, ReadReg2, WriteReg;
wire [31:0] WriteData;
// Instantiation of modules
//Program Counter
wire [31:0] addr;
PC x1(input_PC,clock,reset,addr);
// Instruction Memory
wire [31:0] instruction;
inst x2(addr,clock,instruction);
//Multiplexer with regDst
reg[4:0] inputa,inputb;
MUX_2to1_5bit x3(inputa,inputb,regDst,WriteReg);
//Register File
wire [31:0] ReadData1,ReadData2;
Register_32 x4 ( ReadReg1, ReadReg2, WriteReg, WriteData, clock,
RegWrite,ReadData1, ReadData2);
//Sign Extender
wire [31:0] out_sign;
SignExtender_16to32 x5(immediate, out_sign);
//Multiplexer ALUSrc
wire [31:0] out_inALU;
MUX_2to1 x6( ReadData2 , out_sign, ALUSrc,out_inALU );
//ALU
wire [31:0] result;
wire [2:0] ctr1;
ALU x7 (FunctField,ReadData1,out_inALU, ALUOp,ctr1,result);
//Data Memory
reg [31:0] ReadAddr;
reg [31:0] WriteAddr;
wire [31:0] ReadData;
data_memory x8(ReadAddr, WriteAddr, WriteData, clock, MemWrite,
MemRead,ReadData);
//Multiplexer MemtoReg
MUX_2to1_memreg x9( result,ReadData,MemtoReg,WriteData);
endmodule
// Mux2 to 1_32 bit
module MUX_2to1( ReadData2 , outputData, ALUSrc,out_inALU );
input [31:0] ReadData2,outputData;
input ALUSrc;
output [31:0]out_inALU;
reg [31:0]out_inALU;
always @(ReadData2 or outputData or ALUSrc )
begin
case(ALUSrc)
1'b0: out_inALU=ReadData2;
1'b1: out_inALU=outputData;
endcase
end
endmodule
// Mux 2 to 1 5 bit
module MUX_2to1_5bit( inputa , inputb, regDst, WriteReg);
input [4:0] inputa, inputb;
input regDst;
output [4:0]WriteReg;
reg [4:0]WriteReg;
always @(inputa or inputb or regDst )
begin
case(regDst)
1'b0: WriteReg=inputa;
1'b1: WriteReg=inputb;
endcase
end
endmodule
// Mux 2 to 1 for memory register
module MUX_2to1_memreg( result,ReadData,MemtoReg,WriteData);
input [31:0] ReadData,result;
input MemtoReg;
output [31:0] WriteData;
reg [31:0]WriteData;
always @(* )
begin
case(MemtoReg)
1'b0: WriteData= result ;
1'b1: WriteData= ReadData;
endcase
end
endmodule
// Program Counter
module PC(input_PC,clock,reset,addr);
input reset,clock;
input [31:0] input_PC;
output reg [31:0] addr;
always @(posedge clock)
begin
if (reset)
addr=0;
else
addr=input_PC+4;
end
endmodule
//Register
module Register_32 ( ReadReg1, ReadReg2, WriteReg,
WriteData, clock, RegWrite,ReadData1, ReadData2);
input [4:0] ReadReg1, ReadReg2, WriteReg;
input [31:0] WriteData;
input clock, RegWrite;
output [31:0] ReadData1, ReadData2;
reg [31:0] ReadData1, ReadData2;
reg [31:0] mem[0:31]; // 32 32-bit registers
reg [5:0] i; // Temporary variable
initial
begin
// Initial registers for testing purpose
for ( i = 0; i <= 31; i = i+1)
mem[i] = i;
// Initial start-up
ReadData1 = 0;
ReadData2 = 0;
end
// Data from register is always fetched with positive edge clock
always @(posedge clock)
begin
#1 ReadData1 = mem[ReadReg1];
#1 ReadData2 = mem[ReadReg2];
if ( RegWrite == 1)
#1 mem[WriteReg] = WriteData;
end
endmodule
// Sign extender
module SignExtender_16to32(immediate,out_sign);
input[15:0] immediate;
output[31:0] out_sign;
reg [31:0] out_sign;
always @(*)
begin
out_sign[15:0] = immediate[15:0];
out_sign[31:16] = {16{immediate[15]}};
end
endmodule
You could increment the program counter as below, but it will have wrap-around issues. It is better to have a signal that latches the input address, or to do addr = input_PC; at reset.
module PC(input_PC,clock,reset,addr);
input reset,clock;
input [31:0] input_PC;
output reg [31:0] addr;
always @(posedge clock)
begin
if (reset)
addr= 0;
else
addr=input_PC+4+addr;
end
endmodule
You would want the inst module to ignore the lower 2 bits of the address when indexing the memory; otherwise, after reading MEM[0], it will read MEM[4].
instruction <= MEM[ addr[31:2] ] ;
You need to connect the instruction to the ALU module. I cannot offer any suggestion there, as I cannot figure out what your instruction decoding scheme is.

SystemVerilog/Verilog: Is there a way to find the integer bit offset of a field of a packed struct?

I was wondering if there is a standard function in verilog or systemverilog that will return the bit offset of a certain field within a packed struct. For instance, see use of hypothetical function $find_field_offset below:
typedef struct packed
{
logic [31:0] field_1;
logic [15:0] field_2;
logic [63:0] field_3;
} struct_type;
struct_type the_struct;
int field_1_offset;
assign field_1_offset = $find_field_offset(the_struct.field_1);
Thanks!
This might not be the most convenient way, but here is native SV code that finds the offset of field_1 inside struct_type:
function automatic int get_offset_struct_type_field_1;
struct_type x = '0;
x.field_1 = '1;
for (integer i = 0; i < $bits(x); i=i+1) begin
if (x[i] == 1) return i;
end
endfunction
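As a cross-check on what such a search should report, here is a short Python sketch that computes the expected offsets by hand. It relies on the rule that the first member of a packed struct occupies the most significant bits, so offsets are counted up from the last member. The field_offsets helper and the list-of-tuples layout description are assumptions of this sketch, not an SV feature.

```python
def field_offsets(fields):
    """Return {name: LSB bit offset} for fields given MSB-first as (name, width)."""
    offsets = {}
    offset = 0
    # Walk from the last declared member (least significant) upward.
    for name, width in reversed(fields):
        offsets[name] = offset
        offset += width
    return offsets

# Layout of struct_type from the question, in declaration order
struct_type = [("field_1", 32), ("field_2", 16), ("field_3", 64)]
print(field_offsets(struct_type))
```

For the struct in the question this gives field_3 at bit 0, field_2 at bit 64 and field_1 at bit 80.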
Thanks for the examples. Below are some pedantic checks and messaging to confirm the offsets and sizes of all fields of some hierarchical structs. char_idx is the character position of a nibble in the struct if read as a string from a $writememh or similar. This task is invoked within an initial block to confirm that downstream parsing will be able to interpret the hexadecimal representation correctly.
task report_offsets(
);
phy_mon_info_s tr;
integer char_idx;
$display("tr is made of %0d bits",$bits(tr));
for (integer idx = 0;idx< $bits(tr);idx++) begin
char_idx = ($bits(tr) - idx - 1) >> 2;
tr = 1'b1 << idx;
if (|tr.inst > 0) $display("tr.inst claims bit %0d hex char %0d" ,idx, char_idx);
if (|tr.clock_count) $display("tr.clock_count claims bit %0d hex char %0d" ,idx, char_idx);
if (|tr.phy_info.dir) $display("tr.phy_info.dir claims bit %0d hex char %0d" ,idx, char_idx);
if (|tr.phy_info.data_type) $display("tr.phy_info.data_type claims bit %0d hex char %0d" ,idx, char_idx);
for (int inner = 0;inner< PHY_MON_FRAME_DWS;inner++) begin
if (|tr.phy_info.header[inner]) $display("tr.phy_info.header[%0d] claims bit %0d hex char %0d",inner,idx, char_idx);
end
if (|tr.phy_info.payload_dws) $display("tr.phy_info.payload_dws claims bit %0d hex char %0d",idx, char_idx);
if (|tr.phy_info.prim) $display("tr.phy_info.prim claims bit %0d hex char %0d" ,idx, char_idx);
if (|tr.phy_info.num_prims) $display("tr.phy_info.num_prims claims bit %0d hex char %0d" ,idx, char_idx);
if (|tr.phy_clock) $display("tr.phy_info.phy_clk claims bit %0d hex char %0d" ,idx, char_idx);
end
assert($bits(tr.inst ) % 4 == 0) else $error("total bits in tr.inst %0d is not a multiple of 4!",$bits(tr.inst ));
assert($bits(tr.clock_count ) % 4 == 0) else $error("total bits in tr.clock_count %0d is not a multiple of 4!",$bits(tr.clock_count ));
assert($bits(tr.phy_info.dir ) % 4 == 0) else $error("total bits in tr.phy_info.dir %0d is not a multiple of 4!",$bits(tr.phy_info.dir ));
assert($bits(tr.phy_info.data_type ) % 4 == 0) else $error("total bits in tr.phy_info.data_type %0d is not a multiple of 4!",$bits(tr.phy_info.data_type ));
assert($bits(tr.phy_info.header ) % 4 == 0) else $error("total bits in tr.phy_info.header %0d is not a multiple of 4!",$bits(tr.phy_info.header ));
assert($bits(tr.phy_info.payload_dws ) % 4 == 0) else $error("total bits in tr.phy_info.payload_dws %0d is not a multiple of 4!",$bits(tr.phy_info.payload_dws ));
assert($bits(tr.phy_info.prim ) % 4 == 0) else $error("total bits in tr.phy_info.prim %0d is not a multiple of 4!",$bits(tr.phy_info.prim ));
assert($bits(tr.phy_info.num_prims ) % 4 == 0) else $error("total bits in tr.phy_info.num_prims %0d is not a multiple of 4!",$bits(tr.phy_info.num_prims ));
assert($bits(tr.phy_clock ) % 4 == 0) else $error("total bits in tr.phy_clock %0d is not a multiple of 4!",$bits(tr.phy_clock ));
assert($bits(tr ) % 4 == 0) else $error("total bits in tr %0d is not a multiple of 4!",$bits(tr ));
endtask
I used @jclin's function and turned it into a macro, so you can call it like a function with no need to replicate code.
`define get_offset(struct_name, field_name, bit_offset)\
struct_name = '0;\
struct_name.field_name = '1;\
for (integer i = 0; i < $bits(struct_name); i=i+1) begin\
if (struct_name[i] == 1) begin\
$display("%s offset=%4d", `"field_name`", i);\
bit_offset = i;\
break;\
end\
end
example_struct_t example_struct;
int return_value;
initial begin
`get_offset(example_struct, field_0, return_value);
`get_offset(example_struct, field_1, return_value);
`get_offset(example_struct, field_2.subfield_0, return_value);
end
The return value is useful if you want to collect the values and do some further calculation
It doesn't really make sense to have a function that returns this in the form that you presented. The reason is that you don't have anything variable in there. You just pass in a value to the method and there is nothing the compiler could use to determine that you want to get the offset of field1 in struct_type.
The best you can do with native SystemVerilog is to define your own function that returns the offset based on an enumerated argument. Off the top of my head:
typedef enum { FIELD1, FIELD2, FIELD3 } struct_type_fields_e;
function int unsigned get_offset(struct_type_fields_e field);
case (field)
FIELD1 : return 80; // the first member of a packed struct is the most significant
FIELD2 : return 64;
FIELD3 : return 0;
endcase
endfunction
You could probably do more with some VPI code, but you would need to change the way you call your function.
If you're ok with using the VPI, then you can do what you want with a little bit of type introspection. The problem with this is that you can't call the function in the way you like, because, to my knowledge, the compiler loses the context of what field1 actually is. What I mean by this is that the function would see a logic vector value, but not know it originated from a struct.
If you're okay with changing the function call to:
$find_field_offset(the_struct, "field1"); // note "field1" in quotes
then it would technically be possible to figure out that the_struct is of type struct_type and loop over all of its fields to find the field called "field1" and return the offset of that.
The problem with using VPI code is that support for the VPI object model varies from vendor to vendor. You have to be lucky enough to use a vendor that has support for the functions we would need here.

FFT implementation in Verilog: Assigning Wire input to Register type array

I am trying to implement the butterfly FFT algorithm in Verilog.
I create K (here 4) butterfly modules, like this:
localparam K = 4;
genvar i;
generate
for(i=0;i<N/2;i=i+1)
begin:BN
butterfly #(
.M_WDTH (3 + 2*1),
.X_WDTH (4)
)
bf (
.clk(clk),
.rst_n(rst_n),
.m_in(min),
.w(w[i]),
.xa(IN[i]),
.xb(IN[i+2]),
.x_nd(x_ndd),
.m_out(mout[i]),
.ya(OUT[i]),
.yb(OUT[i+2]),
.y_nd(y_nddd[i])
);
end
At each level I have to change the inputs Xa and Xb of each module (here the number of levels is 3).
So I tried to initialize a reg-type array "IN" and connect its elements to the inputs Xa and Xb. When I initialize the "IN" array manually, it works perfectly.
The problem I face now is that I cannot assign the main module input X to the reg-type "IN" array.
The main module input X is
input wire signed [N*2*X_WDTH-1:0] X,
and I have to assign this X into the array "IN":
reg signed [2*X_WDTH-1:0] IN [0:N-1];
I assigned like this,
initial
begin
IN[0]= X[2*X_WDTH-1:0];
IN[1]=X[4*X_WDTH-1:2*X_WDTH];
IN[2]=X[6*X_WDTH-1:4*X_WDTH];
IN[3]= X[8*X_WDTH-1:6*X_WDTH];
IN[4]= X[10*X_WDTH-1:8*X_WDTH];
IN[5]=X[12*X_WDTH-1:10*X_WDTH];
IN[6]=X[14*X_WDTH-1:12*X_WDTH];
IN[7]= X[16*X_WDTH-1:14*X_WDTH];
end
I have gone through many tutorials and forums. No luck.
Can't we assign a wire to a reg-type array? If not, how can I solve this problem?
Here is the Main module where I initialize Butterfly modules,
module Network
#(
// N
parameter N = 8,
// K.
parameter K = 3,
parameter M_WDTH=5,
parameter X_WDTH=4
)
(
input wire clk,
input wire rst_n,
// X
input wire signed [N*2*X_WDTH-1:0] X,
//Y
output wire signed [N*2*X_WDTH-1:0] Y,
output wire signed [K-1:0] y_ndd
);
wire y_nddd [K-1:0];
assign y_ndd ={y_nddd[1],y_nddd[0]};
reg [4:0] min=5'sb11111;
wire [4:0] mout [0:K-1];
reg x_ndd;
reg [2:0] count=3'b100;
reg [2*X_WDTH-1:0] w [K-1:0];
reg [2*X_WDTH-1:0] IN [0:N-1];
wire [2*X_WDTH-1:0] OUT [0:N-1];
assign Y = {OUT[3],OUT[2],OUT[1],OUT[0]};
reg [3:0] a;
initial
begin
//TODO : Here is the problem. Assigning Wire to reg array. Synthesize ok. In Simulate "red" output.
IN[0]= X[2*X_WDTH-1:0];
IN[1]=X[4*X_WDTH-1:2*X_WDTH];
IN[2]=X[6*X_WDTH-1:4*X_WDTH];
IN[3]= X[8*X_WDTH-1:6*X_WDTH];
IN[4]= X[10*X_WDTH-1:8*X_WDTH];
IN[5]=X[12*X_WDTH-1:10*X_WDTH];
IN[6]=X[14*X_WDTH-1:12*X_WDTH];
IN[7]= X[16*X_WDTH-1:14*X_WDTH];
//TODO :This is only a random values
w[0]=8'sb01000100;
w[1]=8'sb01000100;
w[2]=8'sb01000100;
w[3]=8'sb01000100;
end
/* levels */
genvar i;
generate
for(i=0;i<N/2;i=i+1)
begin:BN
butterfly #(
.M_WDTH (3 + 2*1),
.X_WDTH (4)
)
bf (
.clk(clk),
.rst_n(rst_n),
.m_in(min),
.w(w[i]),
.xa(IN[i]),
.xb(IN[i+N/2]),
.x_nd(x_ndd),
.m_out(mout[i]),
.ya(OUT[2*i]),
.yb(OUT[2*i+1]),
.y_nd(y_nddd[i])
);
end
endgenerate
always @(posedge clk)
begin
if (count==3'b100)
begin
count=3'b001;
x_ndd=1;
end
else
begin
count=count+1;
x_ndd=0;
end
end
always @(posedge y_ndd[0])
begin
//TODO
//Here I have to swap OUT-->IN
end
endmodule
Any help is appreciated.
Thanks in advance.
"Output is red" likely means it is x. This could be due to multiple drivers or an uninitialized value. If it were undriven, it would be z.
I believe the main issue is that you do this:
initial begin
IN[0] = X[2*X_WDTH-1:0];
IN[1] = X[4*X_WDTH-1:2*X_WDTH];
...
The important part is the initial. This is only evaluated once, at time 0, and generally everything is x at time zero. To make this the equivalent of assign IN[0] = ... for a wire, use always @* begin. This is a combinational block which will update the values of IN whenever X changes.
always @* begin
IN[0] = X[2*X_WDTH-1:0];
IN[1] = X[4*X_WDTH-1:2*X_WDTH];
...
I am not sure why you do not just connect your X to your butterfly .xa and .xb ports directly though?
Other pointers
X is a bad variable name in Verilog, as a wire or reg can hold four values: 1, 0, x or z.
In always @(posedge clk) you should be using non-blocking (<=) assignments to correctly model the behaviour of a flip-flop.
y_ndd is K bits wide but only the first 2 bits are assigned.
output signed [K-1:0] y_ndd
assign y_ndd = {y_nddd[1],y_nddd[0]};
Assignments should be written in terms of their parameter width/size. For example, IN has N entries but currently exactly 8 entries are assigned. There will be an issue when N != 8. Look into indexing vectors and arrays with +:. Example:
integer idx;
always @* begin
for (idx=0; idx<N; idx=idx+1)
IN[idx] = X[ idx*2*X_WDTH +: 2*X_WDTH];
end
genvar gidx;
generate
for(gidx=0; gidx<N; gidx=gidx+1) begin
assign Y[ gidx*2*X_WDTH +: 2*X_WDTH] = OUT[gidx];
end
endgenerate
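If the +: notation is unfamiliar, the following Python sketch models what the two loops above compute: vector[base +: width] selects width bits starting at bit base and counting upward. The helper names part_select, unpack and pack, and the sample value of x, are invented for this illustration.

```python
def part_select(vector, base, width):
    """Model of vector[base +: width]: `width` bits starting at bit `base`."""
    return (vector >> base) & ((1 << width) - 1)

def unpack(x, n, elem_width):
    """Model of: IN[idx] = X[idx*elem_width +: elem_width] for idx in 0..n-1."""
    return [part_select(x, idx * elem_width, elem_width) for idx in range(n)]

def pack(elems, elem_width):
    """Model of: Y[gidx*elem_width +: elem_width] = OUT[gidx]."""
    y = 0
    for idx, value in enumerate(elems):
        y |= (value & ((1 << elem_width) - 1)) << (idx * elem_width)
    return y

N, X_WDTH = 8, 4                 # parameter values from the question
x = 0x0123456789ABCDEF           # arbitrary 64-bit sample for the flat bus X
elems = unpack(x, N, 2 * X_WDTH)
print([hex(e) for e in elems])
print(hex(pack(elems, 2 * X_WDTH)))
```

Element 0 comes from the least significant bits of the flat vector, and packing the elements again reproduces the original value, which holds for any N rather than only N = 8.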

How can you emulate recursion with a stack?

I've heard that any recursive algorithm can always be expressed by using a stack. Recently, I've been working on programs in an environment with a prohibitively small available call stack size.
I need to do some deep recursion, so I was wondering how you could rework any recursive algorithm to use an explicit stack.
For example, let's suppose I have a recursive function like this
function f(n, i) {
if n <= i return n
if n % i = 0 return f(n / i, i)
return f(n, i + 1)
}
how could I write it with a stack instead? Is there a simple process I can follow to convert any recursive function into a stack-based one?
If you understand how a function call affects the process stack, you can understand how to do it yourself.
When you call a function, some data are written on the stack including the arguments. The function reads these arguments, does whatever with them and places the result on the stack. You can do the exact same thing. Your example in particular doesn't need a stack so if I convert that to one that uses stack it may look a bit silly, so I'm going to give you the fibonacci example:
fib(n)
if n < 2 return n
return fib(n-1) + fib(n-2)
function fib(n)
stack.empty()
stack.push(<is_arg, n>)
while (stack.size() > 1 || stack.top().is_arg)
<isarg, argn> = stack.pop()
if (isarg)
if (argn < 2)
stack.push(<is_result, argn>)
else
stack.push(<is_arg, argn-1>)
stack.push(<is_arg, argn-2>)
else
<isarg_prev, argn_prev> = stack.pop()
if (isarg_prev)
stack.push(<is_result, argn>)
stack.push(<is_arg, argn_prev>)
else
stack.push(<is_result, argn+argn_prev>)
return stack.top().argn
Explanation: every time you take an item from the stack, you need to check whether it needs to be expanded or not. If so, push appropriate arguments on the stack, if not, let it merge with previous results. In the case of fibonacci, once fib(n-2) is computed (and is available at top of stack), n-1 is retrieved (one after top of stack), result of fib(n-2) is pushed under it, and then fib(n-1) is expanded and computed. If the top two elements of the stack were both results, of course, you just add them and push to stack.
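The scheme described above can be tried out with the following Python sketch. It follows the same tagged-entry idea, with the loop running until a single result is left on the stack. The tag names ARG and RESULT and the function names are choices made for this sketch.

```python
def fib_recursive(n):
    """Reference implementation, used only for comparison."""
    return n if n < 2 else fib_recursive(n - 1) + fib_recursive(n - 2)

def fib_stack(n):
    """Fibonacci with an explicit stack of (tag, value) entries."""
    ARG, RESULT = "arg", "result"
    stack = [(ARG, n)]
    # Keep going until exactly one entry remains and it is a result.
    while len(stack) > 1 or stack[-1][0] == ARG:
        kind, value = stack.pop()
        if kind == ARG:
            if value < 2:
                stack.append((RESULT, value))      # base case: fib(n) = n
            else:
                stack.append((ARG, value - 1))     # expand into two pending calls
                stack.append((ARG, value - 2))
        else:
            prev_kind, prev_value = stack.pop()
            if prev_kind == ARG:
                # Park the result underneath and expand the pending argument next.
                stack.append((RESULT, value))
                stack.append((ARG, prev_value))
            else:
                stack.append((RESULT, value + prev_value))  # merge two results
    return stack[-1][1]

print([fib_stack(n) for n in range(10)])
```

The stack depth here grows with n rather than with the number of calls, but the running time is still exponential, just like the naive recursive version.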
If you'd like to see what your own function would look like, here it is:
function f(n, i)
stack.empty()
stack.push(n)
stack.push(i)
while (!stack.is_empty())
argi = stack.pop()
argn = stack.pop()
if argn <= argi
result = argn
else if argn % argi == 0
stack.push(argn / argi)
stack.push(argi)
else
stack.push(argn)
stack.push(argi + 1)
return result
You can convert your code to use a stack like follows:
stack.push(n)
stack.push(i)
while (stack.notEmpty) {
i = stack.pop()
n = stack.pop()
if (n <= i) {
return n
} else if (n % i == 0) {
stack.push(n / i)
stack.push(i)
} else {
stack.push(n)
stack.push(i+1)
}
}
Note: I didn't test this, so it may contain errors, but it gives you the idea.
Your particular example is tail-recursive, so with a properly optimising compiler, it should not consume any stack depth at all, as it is equivalent to a simple loop. To be clear: this example does not require a stack at all.
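To make the "equivalent to a simple loop" point concrete, here is a minimal Python sketch of the question's function next to a loop version. It assumes that n / i in the question means integer division and that i starts at 2 or more (with i = 1 neither version terminates); the names f_recursive and f_loop are made up for this example.

```python
def f_recursive(n, i):
    """Direct transcription of the question's function."""
    if n <= i:
        return n
    if n % i == 0:
        return f_recursive(n // i, i)
    return f_recursive(n, i + 1)

def f_loop(n, i):
    """Same computation: each tail call becomes an update of n or i."""
    while n > i:
        if n % i == 0:
            n //= i
        else:
            i += 1
    return n

print(f_loop(12, 2))
print(f_loop(84, 2))
```

Since every recursive call is the last thing the function does, nothing has to be remembered between calls, so the arguments can simply be overwritten in place.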
Both your example and the fibonacci function can be rewritten iteratively without using stack.
Here's an example where the stack is required, Ackermann function:
def ack(m, n):
    assert m >= 0 and n >= 0
    if m == 0: return n + 1
    if n == 0: return ack(m - 1, 1)
    return ack(m - 1, ack(m, n - 1))
Eliminating recursion:
def ack_iter(m, n):
    stack = []
    push = stack.append
    pop = stack.pop
    RETURN_VALUE, CALL_FUNCTION, NESTED = -1, -2, -3
    push(m)  # push function arguments
    push(n)
    push(CALL_FUNCTION)  # push address
    while stack:  # not empty
        address = pop()
        if address == CALL_FUNCTION:
            n = pop()  # pop function arguments
            m = pop()
            if m == 0:  # return n + 1
                push(n + 1)  # push returned value
                push(RETURN_VALUE)
            elif n == 0:  # return ack(m - 1, 1)
                push(m - 1)
                push(1)
                push(CALL_FUNCTION)
            else:  # begin: return ack(m - 1, ack(m, n - 1))
                push(m - 1)  # save local value
                push(NESTED)  # save address to return
                push(m)
                push(n - 1)
                push(CALL_FUNCTION)
        elif address == NESTED:  # end: return ack(m - 1, ack(m, n - 1))
            # old (m - 1) is already on the stack
            push(value)  # use returned value from the most recent call
            push(CALL_FUNCTION)
        elif address == RETURN_VALUE:
            value = pop()  # pop returned value
        else:
            assert 0, (address, stack)
    return value
Note it is not necessary here to put CALL_FUNCTION, RETURN_VALUE labels and value on the stack.
Example
print(ack(2, 4)) # -> 11
print(ack_iter(2, 4))
assert all(ack(m, n) == ack_iter(m, n) for m in range(4) for n in range(6))
print(ack_iter(3, 4)) # -> 125
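As a follow-up to the note that the labels and the returned value need not live on the stack, here is a hedged sketch of a more compact variant. It keeps only the pending m values on the stack and carries n in a local variable. The name ack_stack is made up for this example, and the recursive reference is repeated so the snippet runs on its own.

```python
def ack(m, n):
    """Recursive reference, repeated here so the snippet is self-contained."""
    if m == 0:
        return n + 1
    if n == 0:
        return ack(m - 1, 1)
    return ack(m - 1, ack(m, n - 1))

def ack_stack(m, n):
    """Ackermann with a stack that holds only the outstanding m values."""
    stack = [m]
    while stack:
        m = stack.pop()
        if m == 0:
            n += 1                  # ack(0, n) = n + 1
        elif n == 0:
            stack.append(m - 1)     # ack(m, 0) = ack(m - 1, 1)
            n = 1
        else:
            stack.append(m - 1)     # outer call waits for the inner result
            stack.append(m)         # inner call ack(m, n - 1) runs first
            n -= 1
    return n

print(ack_stack(2, 4))
print(ack_stack(3, 4))
```

This works because the result of the inner call is always consumed as the n argument of the call waiting beneath it, so a single variable is enough to pass it along.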