What is the idiomatic way to update *part* of a memory element in FIRRTL? This comes up when updating one entry of a line in a cache - chisel

Writing a register file in FIRRTL is straightforward: make a memory of machine words and read/write them.
However, writing a cache is different: you typically have a cache line and, on a write, want to update only part of it, a single element of the line.
What is the idiomatic way to do this in FIRRTL? (And please do not point me at the Rocket implementation in Chisel as I find Chisel to be completely unreadable.)
I can think of at least two ways to do it:
(1) make the memory contain a Vector or Bundle and then select that member of the memory element, something like this:
cmem mem1 : {x:SInt<64>,y:SInt<64>}[4]
infer mport temp_x_mem1 = mem1[i].x, clock
temp_x_mem1 <= foo
(2) do some sort of read-modify-write, something like this:
cmem mem1 : {x:SInt<64>,y:SInt<64>}[4]
infer mport temp_x_mem1 = mem1[i], clock
bar <= temp_x_mem1
bar.x <= foo
infer mport temp_x_mem1_B = mem1[i], clock
temp_x_mem1_B <= bar
I am generating my FIRRTL from another format and I did not plan for this, so when I generate a memory, the only straightforward way is to read or write an entire memory element, not part of one. Therefore way (1) is difficult, but way (2) is straightforward. Would some layer of the FIRRTL optimizer or a subsequent Verilog optimizer make way (2) work as efficiently as way (1) if all of the code occurred in the same module?

The simplest approach would be to use a FIRRTL memory construct (mem) and avoid CHIRRTL entirely (cmem/smem). The former gives you an explicit mask port on the memory. This then enables you to describe a masked write which is exactly what you want. FIRRTL memory constructs have no Chisel API, but it sounds like you are using something else so this may not be an issue.
Approach (1) will not work as you can't use a part select to describe a memory port. (infer mport temp_x_mem1 = mem1[i].x, clock is illegal CHIRRTL.) Approach (2) will work, but you pay a cycle penalty to do it.
There is a third, idiomatic approach that involves describing the memory in such a way that a FIRRTL compiler will infer the mask. This is done by guarding the write behind a when statement which contains the enable:
circuit Foo:
  module Foo:
    input clock: Clock
    input i: UInt<2>
    input mask: {x: UInt<1>, y: UInt<1>}
    input data: {x: SInt<64>, y: SInt<64>}
    cmem mem1 : {x:SInt<64>,y:SInt<64>}[4]
    infer mport temp_x_mem1 = mem1[i], clock
    when eq(mask.x, UInt<1>(1)):
      temp_x_mem1.x <= data.x
    when eq(mask.y, UInt<1>(1)):
      temp_x_mem1.y <= data.y
Either the Scala-based FIRRTL Compiler or the MLIR-based FIRRTL Compiler will infer the when conditions as the mask.
MLIR-based FIRRTL Compiler output:
circuit Foo :
  module Foo :
    input clock : Clock
    input i : UInt<2>
    input mask : { x : UInt<1>, y : UInt<1> }
    input data : { x : SInt<64>, y : SInt<64> }
    mem mem1 :
      data-type => { x : SInt<64>, y : SInt<64> }
      depth => 4
      read-latency => 0
      write-latency => 1
      writer => temp_x_mem1
      read-under-write => undefined
    mem1.temp_x_mem1.addr is invalid
    mem1.temp_x_mem1.en <= UInt<1>(0)
    mem1.temp_x_mem1.clk is invalid
    mem1.temp_x_mem1.data is invalid
    mem1.temp_x_mem1.mask is invalid
    mem1.temp_x_mem1.addr <= i
    mem1.temp_x_mem1.en <= UInt<1>(1)
    mem1.temp_x_mem1.clk <= clock
    mem1.temp_x_mem1.mask.x <= UInt<1>(0)
    mem1.temp_x_mem1.mask.y <= UInt<1>(0)
    when eq(mask.x, UInt<1>(1)) :
      mem1.temp_x_mem1.mask.x <= UInt<1>(1)
      mem1.temp_x_mem1.data.x <= data.x
    when eq(mask.y, UInt<1>(1)) :
      mem1.temp_x_mem1.mask.y <= UInt<1>(1)
      mem1.temp_x_mem1.data.y <= data.y

Related

How to get the Index of Max element in UInt Vec , Chisel

I'm trying to get the index of the Max element in a UInt vector.
My code looks like this
val pwr = Vec.tabulate(N) {i => energyMeters(i).io.pwr}
val maxPwr = pwr.indexOf(pwr.max)
However, this code generates a compilation error:
No implicit Ordering Defined for Chisel.UInt.
val maxPwr = pwr.indexOf(pwr.max)
^
I understand that I probably need to implement the max function , can someone give an example how this should be done ?
Edit:
I also tried this:
val pwr = Vec.tabulate(N) {i => energyMeters(i).io.pwr}
val maxPwr = pwr reduceLeft {(x,y) => Mux(x > y,x,y)}
val maxPwridx = pwr.indexOf(maxPwr)
But it fails on elaboration , when I tried to cast maxPwridx to UInt.
I've ended up with this workaround:
val pwr = Vec.tabulate(N) {i => energyMeters(i).io.pwr}
val maxPwr = pwr reduceLeft {(x,y) => Mux(x > y,x,y)}
val maxPwridx = pwr.indexWhere((x: UInt) => x === maxPwr)
Chisel's Vec extends Scala's Seq. This means that a Vec has both dynamic access hardware methods that will allow you to generate hardware to search for something in a Vec (e.g., indexWhere, onlyIndexWhere, lastIndexWhere) as well as all the methods available to normal Scala sequences (e.g., indexOf).
For the purposes of doing hardware operations, you want to use the former (as you found in your last edit---which looks great!) as opposed to the latter.
To get some handle on this, look at the Chisel 3.3.0-RC1 API documentation for VecLike, filtered to exclude inherited methods. Notable there are indexWhere, onlyIndexWhere, lastIndexWhere, exists, forall, and contains.
In the documentation for Vec itself, the only interesting method is reduceTree.
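As a sketch of the hardware-method approach, the maximum and its index can also be carried together through a single reduction, so no second comparison pass over the Vec is needed. The module name ArgMax, its ports, and the parameters n and w are hypothetical, and the code assumes the Chisel 3 package (chisel3._) rather than the older Vec.tabulate style used in the question.

```scala
import chisel3._
import chisel3.util.log2Ceil

// Hypothetical module: reports the largest element of a Vec and its index.
class ArgMax(n: Int, w: Int) extends Module {
  require(n >= 1, "need at least one element")
  private val idxW = math.max(log2Ceil(n), 1) // at least 1 bit, even for n == 1

  val io = IO(new Bundle {
    val in     = Input(Vec(n, UInt(w.W)))
    val maxVal = Output(UInt(w.W))
    val maxIdx = Output(UInt(idxW.W))
  })

  // Pair every element with its index as a hardware literal.
  val pairs = io.in.zipWithIndex.map { case (v, i) => (v, i.U(idxW.W)) }

  // Fold left to right; ">=" keeps the earlier element on ties, so the lowest index wins.
  val (best, bestIdx) = pairs.reduceLeft[(UInt, UInt)] { case ((va, ia), (vb, ib)) =>
    val keepA = va >= vb
    (Mux(keepA, va, vb), Mux(keepA, ia, ib))
  }

  io.maxVal := best
  io.maxIdx := bestIdx
}
```

Note that reduceLeft builds a linear chain of comparators, so the logic depth grows with n; for large vectors a balanced tree would be the shallower structure.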

Chisel : When-otherwise clause not working in function definition

I am trying to develop a simple circuit using Chisel 3 to generate the factorial for a number n. Here's my implementation :
class Factorial extends Module{
val io = IO(new Bundle{
val input = Input(UInt(8.W))
val output = Output(UInt(16.W))
})
def factorial(n: UInt): UInt = {
when (n === 0.U) {1.U}
.otherwise {n*factorial(n-1.U)}
}
io.output := factorial(io.input)
}
However, when I try to run it, I get the following error :
cmd26.sc:9: type mismatch;
found : Unit
required: chisel3.UInt
.otherwise {n*factorial(n-1.U)}
^Compilation Failed
Is there any particular reason for this? How do I solve this issue?
Also, I realize that an easy solution is to just have the number n to be of type Int, and have an if-else clause instead. Is there any way to type cast the parameter being passed during function call (i.e. from chisel3.UInt to Int)?
The Chisel when, elsewhen, and otherwise statements do not return a value.
Your design seems to be an attempt to compute the factorial of an input in a single cycle. This is only going to be practical for small input values and would probably be easier to implement via a lookup table.
I think what you are looking for (which would be a good learning exercise) is to build a circuit that, given an input, returns the factorial after some number of cycles. This is very similar to the way the GCD example works; GCD is included as an example in the chisel-template repo. To do this you will need registers and ready and valid ports.
I suggest you figure out how that works, and you should have a much easier time making your factorial. Good luck. And as suggested by @FabienM, you will need a very large output port to contain the answer for even modest input values.
I think you can't do that. when(){}.otherwise{} is a hardware construct that doesn't return any value (its type is Unit), as we can see in the code.
With this construct you are trying to generate hardware «on the fly», which is impossible.
I think you have to generate all the solutions directly, like this:
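import chisel3._

class Factorial extends Module {
  val io = IO(new Bundle {
    val input = Input(UInt(8.W))
    val output = Output(UInt(1676.W))
  })
  // Plain Scala recursion: runs at generation time only, never becomes hardware.
  def factorial(n: BigInt): BigInt = {
    if (n == 0) {
      1
    } else {
      n * factorial(n - 1)
    }
  }
  // Default value, overridden by whichever comparison below matches.
  io.output := 0.U
  for (i <- 0 to 0xFF) {
    when(io.input === i.U) {
      io.output := factorial(i).U
    }
  }
}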
You can keep your recursive Scala function, but only for the hardware generation step.
Note that 255! is a really big number; you will need far more than a 16-bit UInt to output the value ;)
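The size of that output port can be checked in plain Scala at generation time. The sketch below is not part of either answer: the object name FactorialWidths and its helper functions are made up for illustration. It computes n! as a BigInt and reports how many bits an unsigned port needs to hold it.

```scala
object FactorialWidths {
  // n! as an arbitrary-precision integer (iterative, so no deep recursion).
  def factorial(n: Int): BigInt = (1 to n).foldLeft(BigInt(1))(_ * _)

  // Bits needed to hold n! as an unsigned value.
  def bitsFor(n: Int): Int = factorial(n).bitLength

  // Largest n whose factorial still fits in `width` unsigned bits (assumes width >= 1).
  def maxInputFor(width: Int): Int =
    Iterator.from(0).takeWhile(n => bitsFor(n) <= width).toSeq.last

  def main(args: Array[String]): Unit = {
    println(s"8! = ${factorial(8)} needs ${bitsFor(8)} bits")
    println(s"largest n that fits in 16 bits: ${maxInputFor(16)}")
    println(s"255! needs ${bitsFor(255)} bits")
  }
}
```

With the 16-bit output from the question, only inputs up to 8 are representable (8! = 40320 needs exactly 16 bits, while 9! = 362880 needs 19). The value reported for 255! should agree with the 1676-bit port used in the answer above.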

How to parameterize a vector of registers in Chisel

I need an example of how to parameterize a Vector of registers in terms of bit width and initial values, where the initial values are not '0' and differ for each register.
My use case is a generic filter coefficient bank with a unique reset value for each coefficient and, of course, an option to override the values.
I thought of something like the below code (not really sure how to write the iteration, so this is kind of pseudo):
class Coeffbank(bitWidth : UInt ,ncoeff : UInt, rstVal : Vec(SInt)) extends Module {
// how do iterate through the reset vector ?? //
val coeffs = Vec.fill(ncoeff) {Reg(init = SInt(rstVal(i),width = bitwidth))
}
Also, when new'ing the above (instantiating this module), how do I pass the list of reset values in the argument list?
Hoping to get some help on how to write it properly.
The explanation should probably be a bit more thorough, but basically you need to create a Reg of Vec. Something like this should do it:
val coeffs = RegInit(rstVal)
In this case, since you already have the Vec of reset values, you can just pass it to the Reg constructor.
I'm assuming that the size of rstVal is equal to ncoeff, otherwise you'll need to reduce the size of rstVal with something like rstVal.take(ncoeff). Also note that I'm using RegInit which is the preferred way to create a register with a reset value.
Let's start with the easy case. This would be much easier if, instead of a Vec of SInts, your rstVal array were a Scala collection (Seq, Array, ...) of regular Ints. When possible, it is best to defer generating actual hardware until you directly need it. If rstVal contains Ints, your code would become
val newRstVals = VecInit(Seq.tabulate(ncoeff) { index => rstVals(index).S(bitWidth.W) })
val reg = RegInit(newRstVals)
If you really need to pass in a Vec then the right approach is to create a separate type instance and use the two argument call to RegInit
val vecType = Vec(ncoeff, SInt(bitWidth.W))
val newRstVals1 = VecInit(Seq.tabulate(ncoeff) { index => newRstVals(index) })
val reg = RegInit(vecType, newRstVals1)
There might be problems if the bitWidth you pass in is not big enough to contain the constants you have passed in. You probably should have some checks for that.
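One way to write such a check, sketched in plain Scala so that it runs at elaboration time before any hardware is created. The names ResetValueCheck, fitsInSInt and tooWide are hypothetical, and the reset values are assumed to be a Seq[Int] as in the first snippet of this answer.

```scala
object ResetValueCheck {
  // True if `value` is representable as a two's-complement signed number of `width` bits.
  def fitsInSInt(value: BigInt, width: Int): Boolean = {
    require(width >= 1, "a signed value needs at least one bit")
    val min = -(BigInt(1) << (width - 1))
    val max = (BigInt(1) << (width - 1)) - 1
    value >= min && value <= max
  }

  // Returns the offending (index, value) pairs; an empty result means every value fits.
  def tooWide(rstVals: Seq[Int], width: Int): Seq[(Int, Int)] =
    rstVals.zipWithIndex.collect {
      case (v, i) if !fitsInSInt(BigInt(v), width) => (i, v)
    }
}
```

Inside the module constructor this could be used as require(ResetValueCheck.tooWide(rstVals, bitWidth).isEmpty, "reset value does not fit in bitWidth"), assuming bitWidth is a Scala Int, so that a bad parameter set fails early with a clear message.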

LSTM with rnn cuda()?

I have the following model:
model = nn.Sequential()
model:add(nn.Sequencer(nn.LookupTable(nIndex, hiddenSize)))
model:add(nn.Sequencer(nn.FastLSTM(hiddenSize, hiddenSize, rho)))
model:add(nn.Sequencer(nn.Linear(hiddenSize, nIndex)))
model:add(nn.Sequencer(nn.LogSoftMax()))
then I put the model on cuda by:
model:cuda()
and I try to forward an input (a CudaTensor) and it breaks.
Is FastLSTM incompatible with cuda ?
the message:
[string "local f = function() return targets:cuda() en..."]:1: attempt to call method 'cuda' (a nil value)
I managed to move a few computations to cuda with the following changes:
- first, I put the model and the criterion on cuda with:
model=model:cuda()
criterion=criterion:cuda()
- second, I built a table of cuda tensors that I provided as targets with:
local targetscudatable={}
for i = 1, #targets do
table.insert(targetscudatable, targets[i]:cuda())
end
Then it works, but I wonder whether I can send more data to cuda, such as the inputs. Anyway, I already had a speed increase of 500%, which is not too bad.
You forgot to require the cunn package:
require 'cunn'

VHDL, using functions in for generate statement

I have a component that should be instantiated about 8000 times, I used for-generate statement with the help of some constant values for reducing amount of code, but I had to declare a function for parametrization of component connections.
My function looks like this:
function dim1_calc (
cmp_index : integer;
prt_index : integer
) return integer is
variable updw : integer := 0;
variable shft_v : integer := 0;
variable result : integer := 0;
begin
if (cmp_index < max_up) then
updw := 1;
else
updw := 2;
end if;
case prt_index is
when 1 =>
shft_v := cnst_rom(updw)(1) + (i-1);
when 2 =>
shft_v := cnst_rom(updw)(2) + (i);
--
--
--
when 32 =>
shft_v := cnst_rom(updw)(32) + (i);
when others =>
shft_v := 0;
end case;
if (updw = 1) then
if (shft_v = min_up and ((prt_index mod 2) = 0)) then
result := max_up;
elsif (shft_v = max_up and ((prt_index mod 2) = 1)) then
result := min_up;
elsif (shft_v < max_up) then
result := shft_v;
else
result := shft_v - max_up;
end if;
else
--something like first condition statements...
--
--
end if;
return result;
end function;
and part of my code that uses this function plus some related part looks like this:
--these type definitions are in my package
type nx_bits_at is array (natural range <>) of std_logic_vector (bits-1 downto 0);
type mxn_bits_at is array (natural range <>) of nx_bits_at;
--
--
--
component pn_cmpn is
port(
clk : in std_logic;
bn_to_pn : in nx_bits_at(1 to row_wght);
pn_to_bn : out nx_bits_at(1 to row_wght)
);
end component;
--
--
--
signal v2c : mxn_bits_at(1 to bn_num)(1 to col_wght);
signal c2v : mxn_bits_at(1 to pn_num)(1 to row_wght);
--
--
--
gen_pn : for i in (1 to pn_num) generate
ins_pn : pn_cmpn port map (
clk => clk,
bn_to_pn(1) => b2p (dim1_calc(i, 1)) (dim2_calc(i, 1)),
bn_to_pn(2) => b2p (dim1_calc(i, 2)) (dim2_calc(i, 2)),
.
.
.
bn_to_pn(32) => b2p (dim1_calc(i, 32)) (dim2_calc(i, 32)),
pn_to_bn => p2b (i)
);
end generate;
I know that using too many sequential statements together is not appropriate in general, and I'm avoiding them as much as possible, but in this case I assumed that this function won't synthesize into some real hardware, and synthesizer just calculates the output value and will put it in corresponding instantiations of that component. Am I right? or this way of coding leads to extra hardware compared to just 8000 instantiations.
PS1: Initially I used "0 to..." to define the ranges of the 2nd and 3rd dimensions of my arrays, but because of the confusion this caused in the dimension calculation function, which is based on the for-generate parameter, I replaced them with "1 to...". Is that an acceptable coding style, or should I avoid it?
PS2: Is there a way that port mapping part in above code combines into something like this:
(I know this is strongly wrong, it's just a clarification of what I want)
gen_pn : for i in (1 to pn_num) generate
ins_pn : pn_cmpn port map (
clk => clk,
gen_bn_to_pn : for j in (1 to 32) generate
bn_to_pn(j) => b2p (dim1_calc(i, j)) (dim2_calc(i, j)),
end generate;
pn_to_bn => p2b (i)
);
end generate;
Let me give another example
Assume that I have a component instantiation like this:
ins_test : test_comp port map (
clk => clk,
test_port(1) => test_sig(2),
test_port(2) => test_sig(3),
test_port(3) => test_sig(4)
);
Is there a way that I can use for generate here? something like:
ins_test : test_comp port map (
clk => clk,
gen_pn : for i in (1 to 3) generate
test_port(i) => test_sig(i+1)
end generate;
);
PS3: Is it possible to call a function inside another function in VHDL?
Functions are usable this way. If you encounter problems, I am sure they will regard details in the design or design tools, rather than the basic approach.
One potential issue is that the function refers to some external "things" such as max_up, i, cnst_rom whose declarations are not part of the function nor parameters to it. This makes it an "impure function" which - because it refers to external state or even modifies it - has restrictions on calling it (because the external state may change, results may depend on order of evaluation etc).
If you can make it pure, do so. I have a feeling that max_up, cnst_rom are constants : if they aren't used elsewhere, declare them local to the function. And i probably ought to be a parameter.
If this is not possible, make the external declarations constants, and preferably wrap them and the function together in a package.
This will just generate the values you need in a small, comprehensible, maintainable form, and not an infinite volume of hardware. I have used a complex nest of functions performing floating point arithmetic then fiddly range reduction and integer rounding to initialise a lookup table, so fundamentally the approach does work.
Potential pitfall:
Some design tools have trouble with perfectly valid VHDL, if its use is slightly unorthodox. Synplicity cannot synthesise some forms of function (which DO generate hardware) though has no trouble with the equivalent procedure returning the result through an OUT parameter!. XST is considerably better.
XST parsing my lookup table init has an absurd slowdown, quadratic in the number of function calls. But only if you are using the old VHDL parser (the default for Spartan-3). Spartan-6 uses the new parser and works fine ( under a second instead of half an hour!) as do Modelsim and Isim. (haven't tried Synplicity on that project)
Some tools object to unorthodox things in port maps : you may get away with function calls there; or you may have to workaround tool bugs by initialising constants with the calls, and using those constants in the port maps.
And your supplementary questions:
PS1) The correct coding style for an array range is ... whatever makes your intent clear.
If you find yourself mentally offsetting by 1 and getting confused or even making errors, STOP! and improve the design.
Some good array indexing styles:
type colour is (red, green, blue);
subtype brightness is natural range 0 to 255;
hue : array (colour) of brightness;
gamma : array (brightness) of brightness;
-- here 0 is a legitimate value
channel : array (1 to 99) of frequency;
PS2) I think you're asking if you can nest generate statements. Yes.
Details may be awkward and difficult, but yes.
PS3) Yes of course! You can even declare functions local to others; eliminating the possibility they will be accidentally called somewhere they make no sense. They (impure functions) can access the execution scope of the outer function (or process), simplifying parameter lists.
Q1 - in this case I assumed that this function won't synthesize into some ...
It depends on which synthesizer you're using. See this relevant question and comments below.
Q2 - PS1: Initially I used "0 to..." for defining ranges of the ...
Surely it's OK. And please allow me to post a suggestion on coding style here. (from this book)
When defining the loop parameter specification, either use a type (or subtype) definition, or use predefined object attributes (e.g., PredefinedObject'range, PredefinedObject'length - 1 downto 0). Avoid using discrete range (e.g., 1 to 4).
This rule makes the code more reusable and flexible for maintenance.
Q3 - PS2: Is there a way that port mapping part in above code combines into ...
I think this is why you asked the 4th question. So refer to the next answer:).
Q4 - Is it possible to call a function inside another function in VHDL?
Though I can't find an official reference for this, the answer is yes.
PS: Coding rules are defined by the synthesizer tools. So the best way to find an answer is to try it yourself.