MIPS data path for store word?

Based on this figure, executing the SW instruction would cause these values to be assigned to the signals labeled in blue:
RegWrite = 0
ALUSrc = 1
ALU operation = 0010
MemRead = 0
MemWrite = 1
MemtoReg = X
PCSrc =
However, I am a little confused about which inputs of the Registers block are used. Can anyone describe the overall SW procedure in the MIPS datapath?

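The execution of sw follows these steps in your diagram:
1. The instruction is fetched from the Instruction Memory subcircuit at the address held in the PC, then decoded.
2. The register file is read for $rs and $rt (Registers subcircuit). Both read ports are used: $rs supplies the base address and $rt supplies the data to store. Since RegWrite = 0, the write port is not used.
3. The value of $rs is added to the sign-extended immediate (selected by ALUSrc) in the ALU subcircuit.
4. The sum and the value of $rt are passed to the Data Memory subcircuit, where the sum is used as the address and the value of $rt is written to memory.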
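The steps above can be sketched as a small C model. This is only an illustration, not a description of the figure: the `Cpu` struct, the 64-byte memory, the big-endian byte order, and the function names are all assumptions made for the sketch.

```c
#include <stdint.h>

#define MEM_BYTES 64u  /* hypothetical tiny data memory for the sketch */

typedef struct {
    uint32_t regs[32];         /* register file */
    uint8_t  dmem[MEM_BYTES];  /* byte-addressed data memory */
} Cpu;

/* Sign-extend the 16-bit immediate field to 32 bits. */
static uint32_t sign_extend16(uint16_t imm) {
    return (imm & 0x8000u) ? (0xFFFF0000u | imm) : imm;
}

/* Model one sw instruction. Returns the effective address, or
 * UINT32_MAX if the address is misaligned or outside the toy memory. */
static uint32_t exec_sw(Cpu *cpu, uint32_t instr) {
    uint32_t rs  = (instr >> 21) & 0x1Fu;       /* Read register 1 */
    uint32_t rt  = (instr >> 16) & 0x1Fu;       /* Read register 2 */
    uint16_t imm = (uint16_t)(instr & 0xFFFFu);

    uint32_t read_data1 = cpu->regs[rs];        /* base address */
    uint32_t read_data2 = cpu->regs[rt];        /* value to store */

    /* ALUSrc = 1 selects the sign-extended immediate; ALU operation 0010 adds. */
    uint32_t addr = read_data1 + sign_extend16(imm);

    if ((addr & 3u) != 0 || addr > MEM_BYTES - 4u)
        return UINT32_MAX;

    /* MemWrite = 1: write Read data 2 to memory (big-endian assumed). */
    cpu->dmem[addr]     = (uint8_t)(read_data2 >> 24);
    cpu->dmem[addr + 1] = (uint8_t)(read_data2 >> 16);
    cpu->dmem[addr + 2] = (uint8_t)(read_data2 >> 8);
    cpu->dmem[addr + 3] = (uint8_t)read_data2;

    /* RegWrite = 0: nothing is written back to the register file. */
    return addr;
}
```

For example, with `regs[29] = 16` and `regs[8] = 0x11223344`, the word `0xAFA8FFFC` (the encoding of `sw $t0, -4($sp)`) stores the value at address 12.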

What is the cause of "cannot get label for unreachable MBB" in llvm back-end?

I'm writing an LLVM back-end and have run into a problem with the conditional branch instruction.
I want to translate the LLVM IR branch to my specific target. Below is what I have tried.
Here is the LLVM IR:
%6 = icmp slt i32 %4, %5
br i1 %6, label %7, label %14
I have defined the instruction and its pattern, and I have also written the compare instruction and its pattern:
def BNE: InstToy<4, (outs), (ins GPR32:$Rd,btargetS15:$S15), "bne\t$Rd, $S15", [(brcond GPR32:$Rd, bb:$S15)]> {
bits<15> S15;
bits<5> Rd;
let Inst{19-5} = S15;
let Inst{4-0} = Rd;
}
The BNE instruction checks whether Rd is zero. If it is not, it jumps to the target PC; otherwise it does nothing.
I also set the operation to "Legal" in ISelLowering:
setOperationAction(ISD::BRCOND, MVT::i32, Legal);
But when I try to compile the LLVM IR with
llc test.ll
it raises this error:
llc: MachineBasicBlock.cpp:59: llvm::MCSymbol* llvm::MachineBasicBlock::getSymbol() const: Assertion `getNumber() >= 0 && "cannot get label for unreachable MBB"' failed.
I expect it to compile without an exception.
With -print-after-all passed to llc, I found that the MBB disappears after the branch folding pass. There was an error in my analyzeBranch implementation; after fixing that, the problem was gone.

C to MIPS translate

I have been given an exam question that I partly solved but do not completely understand.
Why is volatile used here? And I believe the missing expression
must be switches >> 8.
I also have some difficulty with the translation.
Eight switches are memory mapped to the memory address 0xabab0020, where the
least significant bit (index 0) represents switch number 1 and the bit with index 7
represents switch number 8. A bit value 1 indicates that the switch is on and
0 means that it is off. Write down the missing C code expression, such that the
while loop exits if the toggle switch number 8 is off.
volatile int * switches = (volatile int *) 0xabab0020;
volatile int * leds = (volatile int *) 0xabab0040;
while(/* MISSING C CODE EXPRESSION */){
*leds = (*switches >> 4) & 1;
}
Translate the complete C code above into MIPS assembly code, including the missing C code expression. You are not allowed to use pseudo instructions.
Without volatile, your code can legally be interpreted by the compiler as:
int * switches = (int *) 0xabab0020;
int * leds = (int *) 0xabab0040;
*leds = (*switches >> 4) & 1;
while(/* MISSING C CODE EXPRESSION */){
}
The volatile qualifier is an indication to the C compiler that the data at addresses switches and leds can be changed by another agent in the system. Without the volatile qualifier, the compiler would be allowed to optimize references to these variables away.
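The problem description says the loop should run while bit 7 of *switches is set, i.e. while ((*switches & 0x80) != 0). The inner parentheses matter: != binds tighter than &, so *switches & 0x80 != 0 would parse as *switches & (0x80 != 0) and test bit 0 instead.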
Translating the code is left as an exercise for the reader.
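A short C sketch of the loop condition, for illustration only. The function names and the `max_iters` bound are hypothetical; the bound exists only so the sketch can be exercised with ordinary variables instead of real memory-mapped hardware. It also shows why the bit test needs its own parentheses: `!=` binds tighter than `&` in C.

```c
#include <stdint.h>

/* Intended test: is bit 7 (switch number 8) set? */
static int switch8_on(uint32_t sw) {
    return (sw & 0x80u) != 0;
}

/* How "sw & 0x80 != 0" actually parses: the comparison is evaluated
 * first and yields 1, so this tests bit 0 (switch number 1). */
static int parsed_without_parens(uint32_t sw) {
    return (sw & (0x80 != 0)) != 0;
}

/* Polling loop from the question. Both pointers are volatile, so every
 * iteration performs real loads and a real store. Returns the number
 * of iterations executed. */
static unsigned poll_switches(volatile const uint32_t *switches,
                              volatile uint32_t *leds,
                              unsigned max_iters) {
    unsigned n = 0;
    while (switch8_on(*switches) && n < max_iters) {
        *leds = (*switches >> 4) & 1u;
        ++n;
    }
    return n;
}
```

With a switch value of 0x80 the first helper returns 1 and the second returns 0, which is the difference the parentheses make.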
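volatile int * switches = (volatile int *) 0xabab0020;
volatile int * leds = (volatile int *) 0xabab0040;
while((*switches >> 7) & 1){
*leds = (*switches >> 4) & 1;
}
Translated to MIPS:
lui $s0,0xabab # load the upper half of the switches address
ori $s0,$s0,0x0020 # $s0 = 0xabab0020
lui $s1,0xabab # load the upper half of the leds address
ori $s1,$s1,0x0040 # $s1 = 0xabab0040
while:
lw $t0,0($s0) # read *switches
srl $t0,$t0,7 # move bit 7 (switch number 8) to the LSB
andi $t0,$t0,1 # clear the other bits, keep the LSB
beq $t0,$0,done # leave the loop when switch 8 is off
lw $t1,0($s0) # read *switches again (it is volatile)
srl $t1,$t1,4
andi $t1,$t1,1
sw $t1,0($s1) # *leds = (*switches >> 4) & 1
j while
done: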

"dimension too large" error when broadcasting to sparse matrix in octave

32-bit Octave has a limit on the maximum number of elements in an array. I have recompiled from source (following the script at https://github.com/calaba/octave-3.8.2-enable-64-ubuntu-14.04 ), and now have 64-bit indexing.
Nevertheless, when I attempt to perform elementwise multiplication using a broadcast function, I get error: out of memory or dimension too large for Octave's index type
Is this a bug, or an undocumented feature? If it's a bug, does anyone have a reasonably efficient workaround?
Minimal code to reproduce the problem:
function indexerror();
% both of these are formed without error
% a = zeros (2^32, 1, 'int8');
% b = zeros (1024*1024*1024*3, 1, 'int8');
% sizemax % returns 9223372036854775806
nnz = 1000 % number of non-zero elements
rowmax = 250000
colmax = 100000
irow = zeros(1,nnz);
icol = zeros(1,nnz);
for ind =1:nnz
irow(ind) = round(rowmax/nnz*ind);
icol(ind) = round(colmax/nnz*ind);
end
sparseMat = sparse(irow,icol,1,rowmax,colmax);
% column vector to be broadcast
broad = 1:rowmax;
broad = broad(:);
% this gives "dimension too large" error
toobig = bsxfun(@times,sparseMat,broad);
% so does this
toobig2 = sparse(repmat(broad,1,size(sparseMat,2)));
mult = sparse( sparseMat .* toobig2 ); % never made it this far
end
EDIT:
Well, I have an inefficient workaround. It's slower than using bsxfun by a factor of 3 or so (depending on the details), but it's better than having to sort through the error in the libraries. Hope someone finds this useful some day.
% loop over rows, instead of using bsxfun
mult_loop = sparse([],[],[],rowmax,colmax);
for ind =1:length(broad);
mult_loop(ind,:) = broad(ind) * sparseMat(ind,:);
end
The unfortunate answer is that yes, this is a bug. Apparently bsxfun and repmat are returning full matrices rather than sparse ones. A bug has been filed here:
http://savannah.gnu.org/bugs/index.php?47175
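A rough illustration in C of why a full result cannot work at these sizes, and of what scaling rows of a sparse matrix amounts to when only the stored entries are touched. This is a sketch under assumptions, not Octave's implementation: it assumes 8-byte double elements and a simple list of non-zeros with zero-based row indices, and the function names are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to hold a full (dense) rows x cols matrix. */
static uint64_t dense_bytes(uint64_t rows, uint64_t cols, uint64_t elem_size) {
    return rows * cols * elem_size;
}

/* Multiply every stored entry by the broadcast value of its row.
 * row[k] is the zero-based row index of the k-th non-zero, val[k]
 * its value. The work is proportional to nnz, not to rows * cols. */
static void scale_rows(const size_t *row, double *val, size_t nnz,
                       const double *broad) {
    for (size_t k = 0; k < nnz; ++k)
        val[k] *= broad[row[k]];
}
```

For rowmax = 250000 and colmax = 100000, `dense_bytes(250000, 100000, 8)` is 200000000000, i.e. about 200 GB for the full matrix, while the sparse matrix in the example holds only 1000 non-zeros.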

FFT implementation in Verilog: Assigning Wire input to Register type array

I am trying to implement the butterfly FFT algorithm in Verilog.
I create K (here 4) butterfly modules like this:
localparam K = 4;
genvar i;
generate
for(i=0;i<N/2;i=i+1)
begin:BN
butterfly #(
.M_WDTH (3 + 2*1),
.X_WDTH (4)
)
bf (
.clk(clk),
.rst_n(rst_n),
.m_in(min),
.w(w[i]),
.xa(IN[i]),
.xb(IN[i+2]),
.x_nd(x_ndd),
.m_out(mout[i]),
.ya(OUT[i]),
.yb(OUT[i+2]),
.y_nd(y_nddd[i])
);
end
At each level I have to change the inputs xa and xb of each module (here the number of levels is 3).
So I tried to initialize a reg type array "IN" and connect that array to the inputs xa and xb. When I initialize the "IN" array manually, it works perfectly.
The problem I face now is that I cannot assign the main module input X to the reg type array "IN".
The main module input X is
input wire signed [N*2*X_WDTH-1:0] X,
I have to assign this X to the array "IN",
reg signed [2*X_WDTH-1:0] IN [0:N-1];
I assigned it like this:
initial
begin
IN[0]= X[2*X_WDTH-1:0];
IN[1]=X[4*X_WDTH-1:2*X_WDTH];
IN[2]=X[6*X_WDTH-1:4*X_WDTH];
IN[3]= X[8*X_WDTH-1:6*X_WDTH];
IN[4]= X[10*X_WDTH-1:8*X_WDTH];
IN[5]=X[12*X_WDTH-1:10*X_WDTH];
IN[6]=X[14*X_WDTH-1:12*X_WDTH];
IN[7]= X[16*X_WDTH-1:14*X_WDTH];
end
I have gone through many tutorials and forums, with no luck.
Can't we assign a wire type to a reg type array? If not, how can I solve this problem?
Here is the main module where I instantiate the butterfly modules:
module Network
#(
// N
parameter N = 8,
// K.
parameter K = 3,
parameter M_WDTH=5,
parameter X_WDTH=4
)
(
input wire clk,
input wire rst_n,
// X
input wire signed [N*2*X_WDTH-1:0] X,
//Y
output wire signed [N*2*X_WDTH-1:0] Y,
output wire signed [K-1:0] y_ndd
);
wire y_nddd [K-1:0];
assign y_ndd ={y_nddd[1],y_nddd[0]};
reg [4:0] min=5'sb11111;
wire [4:0] mout [0:K-1];
reg x_ndd;
reg [2:0] count=3'b100;
reg [2*X_WDTH-1:0] w [K-1:0];
reg [2*X_WDTH-1:0] IN [0:N-1];
wire [2*X_WDTH-1:0] OUT [0:N-1];
assign Y = {OUT[3],OUT[2],OUT[1],OUT[0]};
reg [3:0] a;
initial
begin
//TODO : Here is the problem. Assigning Wire to reg array. Synthesize ok. In Simulate "red" output.
IN[0]= X[2*X_WDTH-1:0];
IN[1]=X[4*X_WDTH-1:2*X_WDTH];
IN[2]=X[6*X_WDTH-1:4*X_WDTH];
IN[3]= X[8*X_WDTH-1:6*X_WDTH];
IN[4]= X[10*X_WDTH-1:8*X_WDTH];
IN[5]=X[12*X_WDTH-1:10*X_WDTH];
IN[6]=X[14*X_WDTH-1:12*X_WDTH];
IN[7]= X[16*X_WDTH-1:14*X_WDTH];
//TODO :This is only a random values
w[0]=8'sb01000100;
w[1]=8'sb01000100;
w[2]=8'sb01000100;
w[3]=8'sb01000100;
end
/* levels */
genvar i;
generate
for(i=0;i<N/2;i=i+1)
begin:BN
butterfly #(
.M_WDTH (3 + 2*1),
.X_WDTH (4)
)
bf (
.clk(clk),
.rst_n(rst_n),
.m_in(min),
.w(w[i]),
.xa(IN[i]),
.xb(IN[i+N/2]),
.x_nd(x_ndd),
.m_out(mout[i]),
.ya(OUT[2*i]),
.yb(OUT[2*i+1]),
.y_nd(y_nddd[i])
);
end
endgenerate
always @(posedge clk)
begin
if (count==3'b100)
begin
count=3'b001;
x_ndd=1;
end
else
begin
count=count+1;
x_ndd=0;
end
end
always @(posedge y_ndd[0])
begin
//TODO
//Here I have to swap OUT-->IN
end
endmodule
Any help is appreciated.
Thanks in advance.
"Output is red" likely means the value is x, which can be caused by multiple drivers or an uninitialized value. If it were undriven, it would be z.
The main issue, I believe, is that you do this:
initial begin
IN[0] = X[2*X_WDTH-1:0];
IN[1] = X[4*X_WDTH-1:2*X_WDTH];
...
The important part is the initial. It is only evaluated once, at time 0, and generally everything is x at time zero. To make this the equivalent of assign IN[0] = ... for a wire, use always @* begin. This is a combinatorial block that updates the values of IN whenever X changes.
always @* begin
IN[0] = X[2*X_WDTH-1:0];
IN[1] = X[4*X_WDTH-1:2*X_WDTH];
...
I am not sure why you do not just connect your X to your butterfly .xa and .xb ports directly though?
Other pointers
X is a bad variable name in Verilog, as a wire or reg can hold four values: 1, 0, x or z.
In always @(posedge clk) you should be using non-blocking (<=) assignments to correctly model the behaviour of a flip-flop.
y_ndd is K bits wide but only the first 2 bits are assigned.
output signed [K-1:0] y_ndd
assign y_ndd = {y_nddd[1],y_nddd[0]};
Assignments should be written in terms of their parameter width/size. For example, IN has N entries but currently exactly 8 entries are assigned. There will be an issue when N != 8. Look into indexing vectors and arrays with +:. Example:
integer idx;
always @* begin
for (idx=0; idx<N; idx=idx+1)
IN[idx] = X[ idx*2*X_WDTH +: 2*X_WDTH];
end
genvar gidx;
generate
for(gidx=0; gidx<N; gidx=gidx+1) begin
assign Y[ gidx*2*X_WDTH +: 2*X_WDTH] = OUT[gidx];
end
endgenerate
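For readers less familiar with the `+:` indexed part-select, here is a sketch in C of the same slicing and packing, assuming the whole vector fits in 64 bits (which holds for N = 8 and 2*X_WDTH = 8). The function names are hypothetical, and the caller is assumed to keep `idx * width + width` at or below 64.

```c
#include <stdint.h>

/* Mask with the low `width` bits set (width between 1 and 64). */
static uint64_t low_mask(unsigned width) {
    return (width >= 64u) ? ~0ULL : ((1ULL << width) - 1ULL);
}

/* Counterpart of X[idx*width +: width]: extract `width` bits
 * starting at bit idx*width. */
static uint64_t part_select(uint64_t x, unsigned idx, unsigned width) {
    return (x >> (idx * width)) & low_mask(width);
}

/* Counterpart of assign Y[idx*width +: width] = value: replace that
 * slice of y and return the updated vector. */
static uint64_t part_assign(uint64_t y, unsigned idx, unsigned width,
                            uint64_t value) {
    uint64_t mask = low_mask(width) << (idx * width);
    return (y & ~mask) | ((value << (idx * width)) & mask);
}
```

With `x = 0x8877665544332211` and a width of 8, index 0 selects 0x11 and index 7 selects 0x88, mirroring IN[0] and IN[7] in the loop above.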

CUDA Fortran: Data copy from CPU to GPU

I have a problem with copying data from host to device. I have arrays defined as
real, allocatable :: cpuArray(:,:,:)
real, device, allocatable :: gpuArray(:,:,:)
allocate(cpuArray(0:imax-1,0:jmax-1,0:kmax-1))
allocate(gpuArray(-1:imax,-1:jmax,-1:kmax))
!array initialization
cpuArray = randomValue !non 0 value
gpuArray = 0.0 !first 0 all gpu array elements
gpuArray(0:imax-1,0:jmax-1,0:kmax-1)= cpuArray
My expectation is that only the designated index range of gpuArray will receive data from the host; however, it does not work.
Could you help me find what is wrong with this?
PS: I based my approach on this tutorial from the PGI home page
--
When I give cpuArray and gpuArray the same dimensions,
I get exactly the correct result.
But the current situation produces 0 for all elements of gpuArray. I changed the default value to a non-zero one (i.e. gpuArray = 10.0 !first 10 all gpu array elements), but the result is still 0.
Best regards,
Adjeiinfo
All my apologies to the whole community. I was able to solve my problem. It was a silly bug I introduced in the test program. Instead of cpuArrray= cpuArray(0:imax-1,0:jmax-1,0:kmax-1) in the check program, I did cpuArrray= cpuArray. So the program was working well, but the result check program was buggy.
Thank you for your follow up.
For your reference this is a part of the program (can be built and run)
module mytest
use cudafor
implicit none
integer :: imax , jmax, kmax
integer :: i,j,k
!host arrays
real,allocatable:: h_a(:,:,:)
real,allocatable:: h_b(:,:,:)
real,allocatable:: h_c(:,:,:)
!device array
real,device,allocatable:: d_b(:,:,:)
real,device,allocatable:: d_c(:,:,:)
real,device,allocatable:: d_b_copy(:,:,:)
real,device,allocatable:: d_c_copy(:,:,:)
contains
attributes(global) subroutine testdata()
integer :: d_i, d_j,d_k
d_i = (blockIdx%x-1) * blockDim%x + threadIdx%x-1
d_j = (blockIdx%y-1) * blockDim%y + threadIdx%y-1
do d_k = 0, 1
d_b_copy(d_i, d_j, d_k) = d_b(d_i, d_j, d_k)
d_c_copy(d_i, d_j, d_k) = d_c(d_i, d_j, d_k)
end do
end subroutine testdata
end module mytest
program Test
use mytest
type(dim3) :: dimGrid, dimBlock,dimGrid1, dimBlock1
imax = 32
jmax = 32
kmax = 2
dimGrid = dim3(1,1,1) !one imax x jmax block already covers indices 0:imax-1, 0:jmax-1
dimBlock = dim3(imax,jmax,1)
allocate(h_a(0:imax-1,0:jmax-1,0:1))
allocate(h_b(0:imax-1,0:jmax-1,0:1))
allocate(h_c(0:imax-1,0:jmax-1,0:1))
!real,device,allocatable::d_c(:,:,:)
allocate(d_b(0:imax-1,0:jmax-1,0:1))
allocate(d_c(-1:imax,-1:jmax,-1:16))
allocate(d_b_copy(0:imax-1,0:jmax-1,0:1))
allocate(d_c_copy(-1:imax,-1:jmax,-1:1))
!array initialization
do k = 0,kmax-1
do j=0, jmax-1
do i = 0, imax-1
h_a(i,j,k) = i*0.1
end do
end do
end do
!data transfer (cpu to gpu)
d_b = h_a
d_c(0:imax-1,0:jmax-1,0:kmax-1)= h_a
call testdata<<<dimGrid,dimBlock>>>()
!copy back to cpu
h_b = d_b_copy(0:imax-1,0:jmax-1,0:kmax-1)
h_c = d_c_copy(0:imax-1,0:jmax-1,0:kmax-1)
!just for visual test
write(*,*) h_b
open(24,file='h_a.dat')
write(24,*) h_a
close(24)
open(24,file='d_b_copy.dat')
write(24,*) h_b
close(24)
open(24,file='d_c_copy.dat')
write(24,*) h_c
close(24)
end program Test
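The section assignment used in this thread, `d_c(0:imax-1,0:jmax-1,0:kmax-1) = h_a`, copies the host array into the interior of a larger array whose lower bound is -1. A plain C sketch of the index arithmetic may help when writing a check program. It is host-only and illustrative: it assumes a destination with bounds -1:imax, -1:jmax, -1:kmax, column-major layout as in Fortran, and hypothetical function names.

```c
#include <stddef.h>

/* Flat column-major offset of element (i,j,k) in an array whose lower
 * bound is `lo` in every dimension and whose first two extents are
 * ni and nj. */
static size_t idx3(int i, int j, int k, int lo, int ni, int nj) {
    return (size_t)(i - lo)
         + (size_t)ni * ((size_t)(j - lo) + (size_t)nj * (size_t)(k - lo));
}

/* Copy src(0:imax-1, 0:jmax-1, 0:kmax-1) into the matching index range
 * of dst(-1:imax, -1:jmax, -1:kmax). Halo elements of dst are left
 * untouched. */
static void copy_interior(const float *src, float *dst,
                          int imax, int jmax, int kmax) {
    int pi = imax + 2;  /* padded extent in i */
    int pj = jmax + 2;  /* padded extent in j */
    for (int k = 0; k < kmax; ++k)
        for (int j = 0; j < jmax; ++j)
            for (int i = 0; i < imax; ++i)
                dst[idx3(i, j, k, -1, pi, pj)] = src[idx3(i, j, k, 0, imax, jmax)];
}
```

A check program has to read back the same index range that was written. Comparing the whole padded array against the smaller host array, as in the bug described above, mixes halo values into the comparison.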