How to link a Verilog blackbox to the memory of Rocket Chip in Chisel?

I am trying to attach a Verilog module to Rocket Chip's memory. More precisely, I want to integrate a memory encryption engine as a blackbox. My idea is to place my Verilog module between the memAXI4Node of the trait CanHaveMasterAXI4MemPort and the io_axi4 node of SimAXIMem.
The Verilog module has IOs for the AXI ports, clock, and reset.
My first attempt looks something like this:
SimAXIMem.scala
def connectMem(dut: CanHaveMasterAXI4MemPort)(implicit p: Parameters): Seq[SimAXIMem] = {
  dut.mem_axi4.zip(dut.memAXI4Node.in).map { case (io, (_, edge)) =>
    val mem = LazyModule(new SimAXIMem(edge, base = p(ExtMem).get.master.base, size = p(ExtMem).get.master.size))
    Module(mem.module).suggestName("mem")
    val blackbox = Module(new MyBlackBox())
    blackbox.io.s_axi_awid := io.aw.bits.id
    blackbox.io.s_axi_awaddr := io.aw.bits.addr
    ...
    mem.io_axi4.head.aw.bits.id := blackbox.io.m_axi_awid
    mem.io_axi4.head.aw.bits.addr := blackbox.io.m_axi_awaddr
    ...
    // not working
    val clock: Clock
    blackbox.io.clock := clock
    mem
  }
}
Is there a proper way to put my Verilog module between those two nodes?
How can I connect the clock to my blackbox? As far as I can tell, a clock is only available inside a trait or module, but I assume the blackbox has to be instantiated inside this method in order to connect it to the memory.
Jason

Related

How to not synthesize memory for release in Chisel

I'm writing an architecture that makes extensive internal use of SyncReadMem, which is intended to represent SRAM memory banks. Our current synthesis toolchain, however, does not support SRAM properly, but is fine for registers and computational logic. So, whenever we're running a synthesis build, I pass in a flag that disables the elaboration of any SyncReadMem modules and treats them purely as IO signals:
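class exampleModule (synthesis: Boolean) extends Module {
  val io = IO(new Bundle {
    val in = Input(UInt(32.W))
    val out = Output(UInt(32.W))
    // Only present in synthesis builds: exposes the memory port as plain IO
    val synth_bundle = if (synthesis) Some(new Bundle() {
      val synth_out = Output(UInt(32.W))
      val synth_in = Input(UInt(4.W))
    }) else None
  })
  val mem_read = if (!synthesis) {
    // Normal build: elaborate the real memory.
    // The depth is a BigInt because 1 << 32 overflows a Scala Int.
    val memory = SyncReadMem(BigInt(1) << 32, UInt(4.W))
    memory.read(io.in)
  } else {
    // Synthesis build: route the read through the IO bundle instead
    io.synth_bundle.get.synth_out := io.in
    io.synth_bundle.get.synth_in
  }
  io.out := mem_read * 2.U
}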
This, to my understanding, should properly elaborate all of my logic (i.e., not optimize away anything that I want to keep), and won't elaborate any of the memory whenever synthesis builds are enabled.
The problem I'm running into is that in deeply hierarchical designs, where a deeply nested module needs something like this, every module above it has to implement this synth_bundle-style IO, which requires a fair amount of boilerplate and adds no functionality. Is there an easier or more canonical way to do this?
Thank you

Can I compute constants in software before Chisel begins designing hardware?

I'm new to Chisel, and I was wondering if it's possible to calculate constants in software before Chisel begins designing any circuitry. For instance, I have a module which takes one parameter, myParameter, but from this parameter I'd like to derive more variables (constant1 and constant2) that would be later used to initialize registers.
class MyModule(myParameter: Int) extends Module {
  val io = IO(new Bundle{
    val in = Input(SInt(8.W))
    val out = Output(SInt(8.W))
  })
  val constant1 = 2 * myParameter
  val constant2 = 17 * myParameter
  val register1 = RegInit((constant1).U(8.W))
  val register2 = RegInit((constant2).U(8.W))
  //...
  //...
}
Is there a way to configure Chisel's functionality so that an instance of MyModule(2) will first evaluate all Scala vals in software: constant1 = 2 * 2 = 4 and constant2 = 17 * 2 = 34. Then proceed to instantiate and initialize registers register1 = RegInit(4.U(8.W)) and register2 = RegInit(34.U(8.W))?
"I was wondering if it's possible to calculate constants in software before Chisel begins designing any circuitry"
Unless I'm misunderstanding your question, this is, in fact, how Chisel works.
Fundamentally, Chisel is a Scala library where the execution of your compiled Scala code creates hardware. This means that any pure-Scala code in your Chisel design exists only at elaboration time, that is, during execution of this Scala program (which we call a generator).
Now, values in your program are created in sequential order as defined by Scala (more or less the same as in any general-purpose programming language). For example, io is defined before constant1 and constant2, so the Chisel object for io will be created before either constant is calculated, but this shouldn't really matter for the purposes of your question.
A common practice in Chisel is to create custom classes to hold parameters when you have a lot of them. In this case, you could do something like this:
// Note this doesn't extend anything, it's just a Scala class
// Also note myParameter is a val now, this makes it accessible outside the class
class MyParameters(val myParameter: Int) {
  val constant1 = 2 * myParameter
  val constant2 = 17 * myParameter
}
class MyModule(params: MyParameters) extends Module {
  val io = IO(new Bundle{
    val in = Input(SInt(8.W))
    val out = Output(SInt(8.W))
  })
  val register1 = RegInit((params.constant1).U(8.W))
  val register2 = RegInit((params.constant2).U(8.W))
  //...
  //...
}
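To see that the derived values really are ordinary Scala values computed before any hardware exists, the parameter class can be exercised on its own, without Chisel. The sketch below is a plain-Scala variant of MyParameters. The require checks and the names CheckedParameters and ParametersDemo are additions for illustration (assuming the constants must fit the 8-bit registers used above), not part of the original answer.

```scala
// Plain Scala, no Chisel import needed: everything here runs at elaboration time.
// CheckedParameters is a hypothetical variant of MyParameters with range checks.
final case class CheckedParameters(myParameter: Int) {
  val constant1: Int = 2 * myParameter
  val constant2: Int = 17 * myParameter

  // Assumption: both constants initialize 8-bit unsigned registers,
  // so they must lie in the range 0..255.
  require(constant1 >= 0 && constant1 <= 255, s"constant1 = $constant1 does not fit in 8 bits")
  require(constant2 >= 0 && constant2 <= 255, s"constant2 = $constant2 does not fit in 8 bits")
}

object ParametersDemo {
  def main(args: Array[String]): Unit = {
    val params = CheckedParameters(2)
    println(params.constant1) // prints 4
    println(params.constant2) // prints 34
  }
}
```

With these checks, CheckedParameters(16) fails at construction because 17 * 16 = 272 exceeds 255, so the error shows up before any module is elaborated.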

SyncReadMem generated verilog vs. Rocketchip emitted verilog

I am using SyncReadMem() for behavioral simulation of SRAM. In the generated Verilog used with Verilator, I hope to replace the memory with Verilog produced by a commercial SRAM compiler, so that I can synthesize the whole design including the SRAM.
However, I noticed that the Verilog emitted for SyncReadMem() does not have a uniform IO interface like the SRAMs emitted by Rocket Chip. How can I generate SRAM Verilog like Rocket Chip's using a Chisel memory API such as SyncReadMem()?
You can use the Scala FIRRTL Compiler's "Replace Sequential Memories" pass to blackbox the memories. This is exactly what is happening with Rocket Chip.
Note that this only works for memories that have a single read port and a single write port, with a read latency of 1 and a write latency of 1.
As an example, consider the following 1r1w (one read, one write) SyncReadMem:
import chisel3._
class Foo extends MultiIOModule {
  val read = IO(new Bundle {
    val en = Input(Bool())
    val addr = Input(UInt(8.W))
    val data = Output(UInt(1.W))
  })
  val write = IO(new Bundle{
    val en = Input(Bool())
    val addr = Input(UInt(8.W))
    val data = Input(UInt(1.W))
  })
  val bar = SyncReadMem(256, UInt(1.W))
  read.data := bar.read(read.addr, read.en)
  when (write.en) {
    bar.write(write.addr, write.data)
  }
}
If you compile this with a request to run the replace sequential memories pass:
import chisel3.stage.ChiselStage
(new ChiselStage)
  .emitVerilog(new Foo, Array("--repl-seq-mem", "-c:Foo:-o:Foo.mem.conf"))
The arguments used there are -c:<circuit> where <circuit> is the name of the circuit you want to run on and -o:<mem-conf-file> is the name of a file to generate that will contain information (e.g., name, width, and depth) of the memories that were blackboxed.
You wind up with the memory blackboxed: a new module bar wraps an instance of the blackbox bar_ext:
module bar(
  input  [7:0] R0_addr,
  input        R0_en,
  input        R0_clk,
  output       R0_data,
  input  [7:0] W0_addr,
  input        W0_en,
  input        W0_clk,
  input        W0_data
);
  wire [7:0] bar_ext_R0_addr;
  wire  bar_ext_R0_en;
  wire  bar_ext_R0_clk;
  wire  bar_ext_R0_data;
  wire [7:0] bar_ext_W0_addr;
  wire  bar_ext_W0_en;
  wire  bar_ext_W0_clk;
  wire  bar_ext_W0_data;
  bar_ext bar_ext (
    .R0_addr(bar_ext_R0_addr),
    .R0_en(bar_ext_R0_en),
    .R0_clk(bar_ext_R0_clk),
    .R0_data(bar_ext_R0_data),
    .W0_addr(bar_ext_W0_addr),
    .W0_en(bar_ext_W0_en),
    .W0_clk(bar_ext_W0_clk),
    .W0_data(bar_ext_W0_data)
  );
  assign bar_ext_R0_clk = R0_clk;
  assign bar_ext_R0_en = R0_en;
  assign bar_ext_R0_addr = R0_addr;
  assign R0_data = bar_ext_R0_data;
  assign bar_ext_W0_clk = W0_clk;
  assign bar_ext_W0_en = W0_en;
  assign bar_ext_W0_addr = W0_addr;
  assign bar_ext_W0_data = W0_data;
endmodule
You can then run a memory compiler to consume the information in the memory configuration file and drop the output in place of bar_ext.
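As a rough sketch of that last step, the snippet below reads one line of a memory configuration file into a Scala value that a wrapper script could hand to a memory compiler. It assumes each line is a whitespace-separated list of key/value pairs such as "name bar_ext depth 256 width 1 ports write,read", optionally followed by "mask_gran <n>"; check the file your FIRRTL version actually writes before relying on this. MemConf and MemConfParser are hypothetical names, not part of FIRRTL.

```scala
// Hypothetical description of one blackboxed memory.
final case class MemConf(
  name: String,
  depth: BigInt,
  width: Int,
  ports: List[String],
  maskGran: Option[Int]
)

object MemConfParser {
  // Assumed line format: "name <n> depth <d> width <w> ports <p1,p2,...> [mask_gran <g>]".
  // Returns None if any of the four mandatory keys is missing.
  def parseLine(line: String): Option[MemConf] = {
    val tokens = line.trim.split("\\s+").toList
    // Pair up consecutive tokens as key -> value; a dangling token is ignored.
    val fields = tokens.grouped(2).collect { case List(k, v) => k -> v }.toMap
    for {
      name  <- fields.get("name")
      depth <- fields.get("depth")
      width <- fields.get("width")
      ports <- fields.get("ports")
    } yield MemConf(
      name,
      BigInt(depth),
      width.toInt,
      ports.split(",").toList,
      fields.get("mask_gran").map(_.toInt)
    )
  }
}
```

The sketch does no validation of the numeric fields, so a malformed depth or width throws a NumberFormatException instead of returning None.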

Chisel test - internal signals

I would like to test my code, so I'm writing a testbench. I wanted to know whether it is possible to check internal signals, like the value of the state register in this example, or whether peek is available only for the I/O.
class MatrixMultiplier(matrixSize : UInt, cellSize : Int) extends Module {
  val io = IO(new Bundle {
    val writeEnable = Input(Bool())
    val bufferSel = Input(Bool())
    val writeAddress = Input(UInt(14.W)) //(matrixSize * matrixSize)
    val writeData = Input(SInt(cellSize.W))
    val readEnable = Input(Bool())
    val readAddress = Input(UInt(14.W)) //(matrixSize * matrixSize)
    val readReady = Output(Bool())
    val readData = Output(SInt((2 * cellSize).W))
  })
  val s_idle :: s_writeMemA :: s_writeMemB :: s_multiplier :: s_ready :: s_readResult :: Nil = Enum(6)
  val state = RegInit(s_idle)
  ...
And for the testbench:
class MatrixUnitTester(matrixMultiplier: MatrixMultiplier) extends PeekPokeTester(matrixMultiplier) { //(5.asUInt(), 32.asSInt())
  println("State is: " + peek(matrixMultiplier.state).toString) // is it possible to have access to state ?
  poke(matrixMultiplier.io.writeEnable, true.B)
  poke(matrixMultiplier.io.bufferSel, false.B)
  step(1)
  ...
EDIT : Ok, with VCD + GTKWave it is possible to graphically see these variables ;)
Good question. There are several parts to this answer.
The unit testing frameworks supplied with Chisel, the older chisel-testers and the newer chiseltest, do not provide a mechanism to look into internal wires directly.
Currently the Chisel team is looking into ways of doing that.
Both provide indirect ways of doing it: writing VCD output and using printf to see internal values.
The Treadle FIRRTL simulator, which can directly simulate FIRRTL (the direct output of the Chisel compiler), does allow peeking and poking any signal directly. There are lots of examples of its use in Treadle's unit tests. Treadle also provides a REPL shell, which can be useful for exploring a circuit with manual peeks and pokes.
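The printf route can be sketched as follows. StateDemo is a hypothetical three-state module, not the MatrixMultiplier from the question, and the snippet assumes a Chisel 3 project with chisel3 on the classpath. The printf is part of the hardware description, so the simulator prints the register value on each clock cycle in which the module is not in reset, without the test having to peek it.

```scala
import chisel3._
import chisel3.util._

// Hypothetical module used only to illustrate printing an internal register.
class StateDemo extends Module {
  val io = IO(new Bundle {
    val start = Input(Bool())
    val done  = Output(Bool())
  })

  val sIdle :: sBusy :: sDone :: Nil = Enum(3)
  val state = RegInit(sIdle)

  // Simple idle -> busy -> done -> idle sequence
  switch(state) {
    is(sIdle) { when(io.start) { state := sBusy } }
    is(sBusy) { state := sDone }
    is(sDone) { state := sIdle }
  }

  io.done := state === sDone

  // Internal signal made visible in the simulation log
  printf("state = %d\n", state)
}
```

An alternative with a similar effect is to add a debug-only output port driven by state and peek that port from the test.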
The older chisel-testers (io-testers) and the current chiseltest frameworks allow inspecting signal values with the .peek() function, which works well for interface signals.
I haven't found a way to peek() an internal signal while debugging a test case. However, the Treadle simulator can dump the values of internal signals when it is running in verbose mode:
Add the annotation treadle.VerboseAnnotation to the test:
`test(new DecoupledGcd(16)).withAnnotations(Seq(WriteVcdAnnotation, treadle.VerboseAnnotation))`
When debugging in IntelliJ IDEA, once the test stops at a breakpoint, the changes in the values of all internal signals up to that point are dumped to the console.
This example also generates a VCD waveform file for further debugging.

Chisel/FIRRTL constant propagation & optimization across hierarchy

Consider a module that does some simple arithmetic and is controlled by a few parameters. One parameter controls the top level behavior: the module either reads its inputs from its module ports, or from other parameters. Therefore, the result will either be dynamically computed, or statically known at compile (cough, synthesis) time.
The Verilog generated by Chisel has different module names for the various flavors of this module, as expected. For the case where the result is statically known, there is a module with just one output port and a set of internal wires that are assigned constants and then implement the arithmetic to drive that output.
Is it possible to ask Chisel or FIRRTL to go further and completely optimize this away, i.e. in the next level of hierarchy up, just replace the instantiated module with its constant and statically known result? (Granted, these constant values should be optimized away during synthesis, but maybe there are complicated use cases where this kind of elaboration-time optimization could be useful.)
For simple things that FIRRTL currently knows how to constant propagate, it actually already does this. The issue is that it currently doesn't constant propagate arithmetic operators. I am planning to expand which operators can be constant propagated in the Chisel 3.1 release expected around New Years.
Below is an example of 3.0 behavior constant propagating a logical AND and a MUX.
import chisel3._
class OptChild extends Module {
  val io = IO(new Bundle {
    val a = Input(UInt(32.W))
    val b = Input(UInt(32.W))
    val s = Input(Bool())
    val z = Output(UInt(32.W))
  })
  when (io.s) {
    io.z := io.a & "hffff0000".U
  } .otherwise {
    io.z := io.b & "h0000ffff".U
  }
}
class Optimize extends Module {
  val io = IO(new Bundle {
    val out = Output(UInt())
  })
  val child = Module(new OptChild)
  child.io.a := "hdeadbeef".U
  child.io.b := "hbadcad00".U
  child.io.s := true.B
  io.out := child.io.z
}
object OptimizeTop extends App {
  chisel3.Driver.execute(args, () => new Optimize)
}
The emitted Verilog looks like:
module Optimize(
  input         clock,
  input         reset,
  output [31:0] io_out
);
  assign io_out = 32'hdead0000;
endmodule
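As a rough mental model of why the result collapses to a single constant, the toy folder below simplifies an expression tree that contains only literals, references, AND nodes, and mux nodes. It is a plain-Scala illustration with hypothetical names (Expr, Lit, Ref, And, Mux, ConstProp), not FIRRTL's actual IR or its constant propagation pass.

```scala
// Toy expression tree; names are illustrative only.
sealed trait Expr
final case class Lit(value: BigInt) extends Expr
final case class Ref(name: String) extends Expr
final case class And(a: Expr, b: Expr) extends Expr
final case class Mux(sel: Expr, onTrue: Expr, onFalse: Expr) extends Expr

object ConstProp {
  // Bottom-up folding: an AND of two literals becomes a literal,
  // and a mux with a literal selector becomes the selected branch.
  def fold(e: Expr): Expr = e match {
    case And(a, b) =>
      (fold(a), fold(b)) match {
        case (Lit(x), Lit(y)) => Lit(x & y)
        case (fa, fb)         => And(fa, fb)
      }
    case Mux(sel, t, f) =>
      fold(sel) match {
        case Lit(v) => if (v.signum != 0) fold(t) else fold(f)
        case fs     => Mux(fs, fold(t), fold(f))
      }
    case other => other
  }

  def main(args: Array[String]): Unit = {
    // Mirrors the example: s = true, a = 0xdeadbeef, b = 0xbadcad00
    val z = Mux(
      Lit(1),
      And(Lit(BigInt("deadbeef", 16)), Lit(BigInt("ffff0000", 16))),
      And(Lit(BigInt("badcad00", 16)), Lit(BigInt("0000ffff", 16)))
    )
    fold(z) match {
      case Lit(v) => println(v.toString(16)) // prints dead0000
      case other  => println(other)
    }
  }
}
```

An expression that still depends on a Ref, such as And(Ref("a"), Lit(1)), is returned unfolded, which corresponds to the case where the module reads its inputs from ports.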