How to find the number of PLIC contexts? - chisel

I'm a SW developer trying to understand configuration of the RISC-V Platform-Level Interrupt Controller (PLIC) that's in a rocket-chip derived SoC in an FPGA. Please correct me if my terminology is off.
I'm trying to programmatically configure the PLIC after a warm boot, in particular clearing interrupt pending bits. I've read the RISC-V PLIC Specification which talks about up to 15872 contexts. While I can certainly iterate over all contexts with 1024 interrupts each, I would like to be more economical.
Where do I find the actual number of contexts? Is it constant for all rocket-chips designs? Is it a tunable value? What is the right question to ask the FPGA colleagues? They use chisel which I understand to be some sort of design language or tool.

To clarify terminology: What is a hart context?
We use the term hart to unambiguously and concisely describe a hardware thread as opposed to software-managed thread contexts.
The RISC-V PLIC specification allows for up to 15872 contexts, but in practice you'll see far fewer; the actual number is set by each specific RISC-V implementation. It is a tunable value in rocket-chip, so it can vary from design to design. The default configuration may offer some insight, but your specific configuration could be anything.
Your questions:
Where do the contexts come from? Where do I find the total number of contexts?
You can infer what the number should be from implementation details, but as far as I'm aware there is no register that reports how many contexts exist. This is implementation specific. Your best bet is to look at your rocket-chip configuration.
From the Linux kernel docs:
A hart context is a privilege mode in a hardware execution thread. For example, in a 4-core system with 2-way SMT, you have 8 harts and probably at least two privilege modes per hart: machine mode and supervisor mode.
That means you would have 16 contexts for that case (4 cores x 2 threads x 2 privilege modes).
From this issue:
PLIC contexts are 1:1 with harts' interruptible privilege modes. (e.g. if you have 3 harts, each of which supports taking interrupts into M-mode and S-mode, you have 6 contexts.)
In this case, M mode and S mode are privilege modes.
Is there a Scala/Chisel/VHDL line of code to grep for the number of contexts?
No. The best you'll probably be able to do is find the relevant values in your rocket-chip configs and work out what the total should be, or ask someone on your team with RISC-V experience. There isn't a register that stores the total number of contexts.
Is it constant for all rocket-chip designs?
No. The design can specify any number of harts and supported privilege modes. This is implementation specific and rocket-chip doesn't enforce any particular values.
Is it a tunable value?
Yes. The spec defines a maximum, but in practice it can be any number up to that limit.
What is the right question to ask the FPGA colleagues?
Ask what the maximum number of contexts is that they expect. If they don't know, ask how many harts there are in your implementation and how many interruptible privilege modes each hart supports, then multiply the two.
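If the end goal is just to get the PLIC into a known state after a warm boot, here is a minimal bare-metal sketch (mine, not from rocket-chip) assuming the standard memory map from the RISC-V PLIC specification; PLIC_BASE, NUM_SOURCES and NUM_CONTEXTS are placeholders you would fill in from your SoC configuration (contexts = harts x interruptible privilege modes). Note that the pending bits are not directly writable: per the spec a pending bit is cleared by enabling the source and then claiming and completing it. Run this with external interrupts masked in mie/mstatus.

/* plic_warm_reset.c - hedged sketch, not production code.
 * Assumes the standard RISC-V PLIC memory map; base/source/context
 * counts below are placeholders for your SoC's actual values. */
#include <stdint.h>

#define PLIC_BASE      0x0C000000UL   /* placeholder: your SoC's PLIC base  */
#define NUM_SOURCES    32             /* placeholder: interrupt sources     */
#define NUM_CONTEXTS   4              /* e.g. 2 harts x (M-mode + S-mode)   */

#define PLIC_PRIORITY(src)  (*(volatile uint32_t *)(PLIC_BASE + 4u * (src)))
#define PLIC_ENABLE(ctx, w) (*(volatile uint32_t *)(PLIC_BASE + 0x2000u + 0x80u * (ctx) + 4u * (w)))
#define PLIC_THRESHOLD(ctx) (*(volatile uint32_t *)(PLIC_BASE + 0x200000u + 0x1000u * (ctx)))
#define PLIC_CLAIM(ctx)     (*(volatile uint32_t *)(PLIC_BASE + 0x200000u + 0x1000u * (ctx) + 4u))

void plic_warm_reset(void)
{
    /* Give every source a nonzero priority so stale pending sources can be
     * claimed (priority 0 usually means "never interrupt"). Source 0 does
     * not exist, so start at 1. */
    for (unsigned src = 1; src <= NUM_SOURCES; src++)
        PLIC_PRIORITY(src) = 1;

    for (unsigned ctx = 0; ctx < NUM_CONTEXTS; ctx++) {
        /* Per the spec, a pending bit is cleared by enabling the source and
         * then claiming it, so temporarily enable everything (bit 0 of word
         * 0 maps to the nonexistent source 0 and is typically hardwired). */
        for (unsigned w = 0; w <= NUM_SOURCES / 32; w++)
            PLIC_ENABLE(ctx, w) = 0xFFFFFFFFu;

        /* Drain: claim and immediately complete until nothing is pending. */
        uint32_t id;
        while ((id = PLIC_CLAIM(ctx)) != 0)
            PLIC_CLAIM(ctx) = id;

        /* Leave the context masked; real init re-enables what it needs. */
        for (unsigned w = 0; w <= NUM_SOURCES / 32; w++)
            PLIC_ENABLE(ctx, w) = 0;
        PLIC_THRESHOLD(ctx) = 0;
    }

    /* Put the priorities back to "disabled" as well. */
    for (unsigned src = 1; src <= NUM_SOURCES; src++)
        PLIC_PRIORITY(src) = 0;
}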
More resources
RISC-V Wikipedia page
Official RISC-V specification
RISC-V PLIC specification
Rocket-chip docs
Chisel docs

Related

How can RISC-V SYSTEM instructions be implemented as trap?

I am currently studying the specifications for RISC-V with specification version 2.2 and Privileged Architecture version 1.10. In Chapter 2 of RISC-V specification, it is mentioned that "[...] though a simple implementation might cover the eight SCALL/SBREAK/CSRR* instructions with a single SYSTEM hardware instruction that always traps [...]"
However, when I look at the privileged specification, the instruction MRET is also a SYSTEM instruction, which is required to return from a trap. Right now I am confused about how much of the Machine-level ISA is required: is it possible to omit all M-level CSRs and use a software handler for all SYSTEM instructions, as stated in the specification? If so, how does one pass information such as the return address and trap cause? Is it done through the regular registers x1-x31?
Alternatively, is it enough to implement only the following M-level CSRs, if I am aiming for a simple embedded core with only M-level privilege?
mvendorid
marchid
mimpid
mhartid
misa
mscratch
mepc
mcause
Finally, how many of these CSRs can be omitted?
ECALL/EBREAK instructions are traps anyway. CSR instructions need to be parsed carefully to make sure they reference existing registers and are accessed in allowed modes, which sounds like a job for your favorite sparse matrix, whether a PLA or if/then logic.
You could emulate all SYSTEM instructions, but, as you see, you need to be able to access information inside the hardware that is not part of the normal ISA. This implies that you need to add "instruction extensions."
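For comparison, here is a rough sketch of what a conventional M-mode handler reads when those CSRs do exist: the return address comes from mepc and the trap cause from mcause, not from x1-x31. The asm entry stub, helper names and structure here are my own illustration, not anything mandated by the spec; whatever emulation scheme you choose has to expose equivalent state by some other means.

#include <stdint.h>

/* Hypothetical M-mode trap handler fragment. Reached via mtvec after an
 * asm stub has saved the integer registers; the stub ends with MRET. */
static inline uintptr_t read_mepc(void)   { uintptr_t v; __asm__ volatile("csrr %0, mepc"   : "=r"(v)); return v; }
static inline uintptr_t read_mcause(void) { uintptr_t v; __asm__ volatile("csrr %0, mcause" : "=r"(v)); return v; }
static inline void write_mepc(uintptr_t v) { __asm__ volatile("csrw mepc, %0" : : "r"(v)); }

void handle_trap(void)
{
    uintptr_t cause = read_mcause();   /* why we trapped: ECALL, illegal instruction, ... */
    uintptr_t epc   = read_mepc();     /* address of the trapping instruction */

    if (cause == 2) {                  /* exception code 2 = illegal instruction */
        /* decode and emulate the SYSTEM instruction found at epc ... */
        write_mepc(epc + 4);           /* resume after the emulated instruction */
    }
}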
I would also recommend making the SYSTEM instructions atomic, meaning that exceptions should be masked or avoided within each emulated instruction.
Since I am not a very trusting person, I would create a new mode that enables the instruction extensions, letting you, for example, read the exception address directly from the hardware and fetch instructions from a protected area of memory. Interrupts would be disabled automatically. The mode would be exited by branching to epc+4 or to the illegal-instruction handler. I would not want anything outside the RISC-V spec available even in M-mode, just to be safe.
In my experience, it is better to say "I do everything," than it is to explain to each customer, or worse, have a competitor explain to your customers, what it is that you do not do. But perhaps someone who knows the CSRs better could help; it is not something I do.

What does the EpisodeParameterMemory of keras-rl do?

I have found the keras-rl/examples/cem_cartpole.py example and I would like to understand it, but I can't find any documentation.
What does the line
memory = EpisodeParameterMemory(limit=1000, window_length=1)
do? What is the limit and what is the window_length? What effect does increasing either or both of these parameters have?
EpisodeParameterMemory is a special class that is used for CEM. In essence it stores the parameters of a policy network that were used for an entire episode (hence the name).
Regarding your questions: The limit parameter simply specifies how many entries the memory can hold. After exceeding this limit, older entries will be replaced by newer ones.
The second parameter is not used in this specific type of memory (CEM is somewhat of an edge case in Keras-RL and mostly there as a simple baseline). Typically, however, the window_length parameter controls how many observations are concatenated to form a "state". This may be necessary if the environment is not fully observable (think of it as transforming a POMDP into an MDP, or at least approximately). DQN on Atari uses this since a single frame is clearly not enough to infer the velocity of a ball with a FF network, for example.
Generally, I recommend reading the relevant paper (again, CEM is somewhat of an exception). It should then become relatively clear what each parameter means. I agree that Keras-RL desperately needs documentation but I don't have time to work on it right now, unfortunately. Contributions to improve the situation are of course always welcome ;).
A little late to the party, but I feel like the answer doesn't really answer the question.
I found this description online (https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html#replay-memory):
We’ll be using experience replay memory for training our DQN. It stores the transitions that the agent observes, allowing us to reuse this data later. By sampling from it randomly, the transitions that build up a batch are decorrelated. It has been shown that this greatly stabilizes and improves the DQN training procedure.
Basically you observe and save all of your state transitions so that you can train your network on them later on (instead of having to make observations from the environment all the time).

No LR and SPSR for EL0 in Aarch64

In AArch64 there are four exception levels, EL0-EL3. The ARM documentation mentions that there are four stack pointers (SP_EL0/1/2/3) but only three exception link registers (ELR_EL1/2/3) and only three saved program status registers (SPSR_EL1/2/3).
Why are ELR_EL0 and SPSR_EL0 not required?
P.S. Sorry if this is a silly question. I am new to ARM architecture.
By design, exceptions cannot target EL0, so since EL0 can never take an exception it has no use for the machinery to return from one.
To expand on the reasoning a bit (glossing over the optional and more special-purpose higher exception levels), the basic design is that EL1 is where privileged system code runs, and EL0 is where unprivileged user code runs. Thus EL0 is by necessity far more restricted in what it can do, and wouldn't be very useful for handling architectural exceptions, i.e. low-level things requiring detailed knowledge of the system. Only privileged software (typically the OS kernel) should have access to the full hardware and software state necessary to decide whether handling that basic hardware exception means e.g. going and quietly paging something in from swap, versus delivering a "software exception"-type signal to the offending task to tell it off for doing something bad.

about floating point operation

Recently I have been writing a program (an FDTD simulation) using the CUDA development environment; the OS is Windows Server 2008, the graphics card is a Tesla C2070, and the compiler is VS2010. The program uses both single- and double-precision floating point.
I was reading the CUDA programming guides 3.2 and 4.0. The appendix states that sin() and cos() have a maximum error of 2 ULP. My original CPU program produces results that differ from the CUDA version.
I want the two versions to produce exactly the same results. Is that possible?
To quote Goldberg (a paper that every Computer Scientist, Computational Scientist, and possibly even every scientist who programs, should read):
Due to roundoff errors, the associative laws of algebra do not necessarily hold for floating-point numbers.
This means that when you change the order of operations—even when using ostensibly associative arithmetic—you are likely to get slightly different answers.
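A tiny self-contained illustration (the values are chosen only to make the effect obvious; this is not taken from the FDTD code in question):

#include <stdio.h>

int main(void)
{
    float a = 1.0e8f, b = -1.0e8f, c = 1.0f;

    /* Mathematically both expressions equal 1.0, but in single precision
     * the small term is absorbed or preserved depending on the order. */
    printf("(a + b) + c = %f\n", (a + b) + c);   /* prints 1.000000 */
    printf("a + (b + c) = %f\n", a + (b + c));   /* prints 0.000000 */
    return 0;
}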
Parallelism, by definition, results in a different ordering of operations relative to serial arithmetic. "Embarrassingly parallel" computations, that is, computations where each output element is computed independently from all others, sometimes do not have to worry about this. But collective operations, like reductions or scans, and spatial neighborhood computations, such as stencils (as in FDTD), do experience this effect.
In practice, even using a different compiler (and even different compiler options) can change the result of floating point computation, even when compiling the same code, with or without parallelism.

Are DLL injection, ring0, ring3... all Windows-specific concepts?

Do they exist on Linux platforms?
Rings are x86 processor architecture terminology: the processor can execute in one of four operating modes called "privilege levels", numbered zero to three. Privilege level zero is allowed to perform any operation on the CPU, while privilege level three is the most restricted; there are some instructions that cannot be executed at privilege level three. Ref.
DLL injection is not specific to any operating system.
Well, DLL injection is not a Windows-specific concept; Linux can do it too, and it might even be slightly simpler there (see http://en.wikipedia.org/wiki/DLL_injection). Also, IIRC the "rings" are an x86-specific concept (not OS dependent). So to answer your question: no, none of these things is Windows specific.
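For a concrete Linux flavour of this, the usual low-effort trick is LD_PRELOAD; a minimal sketch follows (file and function names are made up for illustration, and it only works on dynamically linked, non-setuid targets):

/* inject.c - build with:  gcc -shared -fPIC -o inject.so inject.c
 * run a victim program as: LD_PRELOAD=./inject.so ./victim
 * The constructor runs inside the victim's address space before main(). */
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>

__attribute__((constructor))
static void injected_entry(void)
{
    fprintf(stderr, "injected into pid %d\n", (int)getpid());
}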
The ring concept is a very general one, as the wikipedia entry explains. Re Linux specifically, it says:
Linux and Windows are two operating systems that use supervisor/user-mode. To perform specialized functions, user-mode code must perform a system call into supervisor mode or even to the kernel space where trusted code of the operating system will perform the needed task and return it back to user space.
Other operating systems (as, again, the article mentions, pointing to other articles for more details) can use different security architectures (esp. capability-based ones).