Who provides syscalls in qemu-riscv?

I started learning RISC-V. I got qemu-riscv and riscv-gcc, and compiled the following hello-world assembly program:
.section .text
.globl _start
_start:
    li      a0, 1                   # fd 1 = stdout (fd 0 is stdin)
1:  auipc   a1, %pcrel_hi(msg)      # load address of msg (high part)
    addi    a1, a1, %pcrel_lo(1b)   # load address of msg (low part)
    li      a2, 12                  # length
    li      a3, 0
    li      a7, 64                  # __NR_write
    ecall                           # system call
    li      a0, 0                   # exit status 0
    li      a1, 0
    li      a2, 0
    li      a3, 0
    li      a7, 93                  # __NR_exit
    ecall                           # system call
loop:
    j       loop
.section .rodata
msg:
    .string "Hello World\n"
It uses syscalls (__NR_write, __NR_exit), and that's what confuses me: I thought I was running a "bare metal" program, so why are syscalls being used implicitly? Why are these syscalls proxied by QEMU, and what will happen if I run this code on an FPGA RISC-V core where no syscalls are implemented?
PS: It's really hard for me to find any RISC-V programming tutorial or bare-metal processor-configuration guide. There is some poorly commented code from ported OSes (FreeRTOS, Linux and FreeBSD), but no explanation to go with it. Could you also help me with this?

There are three families of targets in QEMU:
User-mode emulation, where QEMU provides an AEE (Application Execution Environment). In this mode, QEMU itself acts as the supervisor -- in other words, when QEMU sees an ecall it will decode the syscall number/arguments, perform the system call, and then return to emulating instructions. QEMU's user-mode emulation is thus tied to a specific supervisor's ABI, with the only instance I've seen in RISC-V land being Linux.
Soft MMU, where QEMU provides an MEE (Machine Execution Environment). In this mode the entire software stack is run as if it were running on a full RISC-V system, with QEMU providing emulated devices -- in other words, when QEMU sees an ecall it will start emulating code at the trap vector. Here QEMU doesn't need to know anything about the supervisor, as the exact supervisor code is running.
Hardware virtualization, where QEMU provides a HEE (Hypervisor Execution Environment) while relying on hardware emulation to provide better performance. We don't have this working on RISC-V yet (as of October 2018), but there are specifications in progress along with early implementation work.
If you see ecall instructions in userspace just magically working, then you're probably running in user-mode emulation.
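A quick way to tell the two modes apart from the command line (a sketch; binary names vary by distro, and supervisor.elf is just a placeholder for whatever M/S-mode code you would provide yourself):

# User-mode emulation: QEMU itself implements the Linux syscall ABI,
# so the ecalls in the hello-world above are handled by QEMU directly.
qemu-riscv64 ./hello

# Full-system emulation: an ecall merely traps to the trap vector of
# whatever supervisor/firmware you loaded; QEMU only emulates the machine.
qemu-system-riscv64 -machine virt -nographic -kernel ./supervisor.elf

On an FPGA with no supervisor, the second situation is what you get: the ecall traps, and unless you installed a trap handler, nothing sensible happens.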

Related

In which pipeline stage is the branch decision made?

In which RISC pipeline stage is the branch decision made? Is it in "Decode", "Execute", or another stage? Assume the pipeline has 5 stages: "IF", "ID", "EX", "MEM" and "WB".
There are a few ways to implement this in a classic 5-stage RISC in general. For unconditional direct (not register) branches, obviously you can detect them in ID and have the target PC ready for the next IF cycle (with 1 cycle of branch latency, i.e. 1 wasted IF cycle if you don't hide that latency somehow, e.g. MIPS's branch delay slot or branch prediction).
Some toy pipelines, like the one described in this answer, do the simplest thing and evaluate the condition in the ALU in EX, forwarding to a mux that selects between PC+4 and PC+4+rel_offset and eventually on to IF, with a 3-cycle branch latency (end of EX to start of IF).
Actual commercial MIPS I (R2000) evaluated branch conditions in the first half-cycle of EX, forwarding to IF which only needed an address in the second half-cycle. See How does MIPS I handle branching on the previous ALU instruction without stalling? This gives a branch latency of 1 cycle, short enough to be fully hidden by 1 branch-delay slot, even for conditional or indirect jr $reg branches.
This half-cycle speed is why MIPS branch conditions are simple: they only check a whole register for zero / non-zero, or check the MSB (sign bit). Simple RISCs with a FLAGS / status register (like PowerPC or ARM) could use a similar strategy of very quickly checking a flags condition.
(Note that RISC-V allows a full set of branch conditions; as described in RISC-V's design rationale, checking a whole register for all-zeros in modern CMOS designs is apparently not much shorter in gate-delay than comparing two registers for equality, or even > or <, with a good comparator, presumably something smarter than subtract with ripple-carry. RISC-V assumes branch prediction will hide branch delays.)
The previous version of this answer incorrectly claimed that MIPS I evaluated branch conditions in ID itself. A toy pipeline in this question does that, but it would require the inputs to be ready earlier than usual. It introduces the problem of a b?? instruction stalling while waiting for the EX result of the previous ALU instruction, as in common sequences like slt $at, $t1, $t2 / bnez $at, target, i.e. the expansion of a pseudo-instruction like blt $t1, $t2, target (see the sketch below).
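For reference, here is what the assembler emits for that pseudo-instruction (a sketch in MIPS assembly; $at is the assembler temporary):

# blt $t1, $t2, target expands to:
slt  $at, $t1, $t2        # ALU result produced in EX
bnez $at, target          # branch decision needs $at immediately after

If the branch condition were evaluated in ID, bnez would need $at a cycle before slt's EX stage produces it, forcing a stall; evaluating it in the first half of EX (as the R2000 did) lets the forwarded result arrive just in time.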
Wikipedia's Classic RISC (5-stage pipeline) article's Instruction Decode section was misleading at best, but has been fixed. It now says "The branch condition is computed in the following cycle (after the register file is read)". I think that was a bug-fix, not just a clarification: the process was described entirely in the ID section, implying it happened there without explicit phrasing to the contrary. Also, the still-present claim that "Some architectures made use of the Arithmetic logic unit (ALU) in the Execute stage, at the cost of slightly decreased instruction throughput" makes no sense unless it is contrasting with evaluating branches earlier, since nothing else could be using the ALU during that time in a scalar in-order pipeline.
Other sources (like these slides: http://home.deib.polimi.it/santambr/dida/phd/wonderland/2014/doc/PDF/4_BranchHazard_StaticPrediction_V0.pdf) say "Branch Outcome and Branch Target Address are ready at the end of the EX stage (3th stage)" for a classic MIPS beq instruction. That's not how the commercial R2000 worked, but it may be describing a simplified MIPS implementation from a textbook or course that does work that way.
Much discussion of MIPS is actually about hypothetical MIPS-like 5-stage RISC pipelines in general, not real MIPS R2000, or the classic Stanford MIPS CPU that R2000 was based on (but it was a full re-design). So it's hard to know whether something you find about "MIPS" applies to R2000 (gcc -march=mips1) or if it's for a simplified teaching version of MIPS.
Some "MIPS" implementations aren't even the same ISA, e.g. without branch-delay slots (which complicate exception handling significantly).
This originally wasn't a MIPS question at all, just generic classic 5-stage RISC. There were multiple early RISC ISAs, many of them originally designed around a 5-stage pipeline (https://en.wikipedia.org/wiki/Classic_RISC_pipeline). I don't know a lot about their internals:
Different architectures could make different choices, e.g. stall or use branch prediction + speculative fetch/decode if needed while they wait for the branch result to be ready from whatever stage produces it.
Even speculative execution is possible, e.g. with a static prediction like forward not-taken / backward taken. If the pipeline is still in-order, mis-speculation can be caught before it reaches MEM or write-back. You don't want any speculative stores written to cache, but you can definitely catch the mis-prediction by the time the branch reaches EX. All instructions with a control dependency on the branch are younger and therefore in earlier pipeline stages (if present at all; IF could have missed in I-cache).

Loading ROM bootloader in qemu

I have a custom bootloader and I know its entry point. How do I specify this address to QEMU?
I also get this warning when I try to load the image with qemu-system-mips -pflash img_:
WARNING: Image format was not specified for 'img_' and probing guessed raw.
Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
Specify the 'raw' format explicitly to remove the restrictions.
I tried -pflash=img_,format=raw but it didn't work.
Thanks.
You'll need to dig into the QEMU sources to play with custom bootloader images in QEMU. QEMU loads the bootloader in the board initialization function, which is board specific.
Run the following to list all available MIPS board models:
qemu-system-mips64 -machine ?
Supported machines are:
magnum               MIPS Magnum
malta                MIPS Malta Core LV (default)
mips                 mips r4k platform
mipssim              MIPS MIPSsim platform
none                 empty machine
pica61               Acer Pica 61
These models are implemented in the qemu/hw/mips/ files (look for the *_init functions).
For example, for the malta board (the default) it is hw/mips/mips_malta.c. Look at the mips_malta_init function: it constructs the devices, buses and CPU, registers memory regions, and places the bootloader into memory. The FLASH_ADDRESS macro seems to be what you are looking for.
Note that such init functions are a common pattern across all boards that QEMU implements.
Every board also has some reference manual or datasheet, and the QEMU model should match it from the programmer's point of view.
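As for the warning in the question: the format option belongs to -drive rather than -pflash, so something along these lines should silence it (a sketch, assuming a reasonably recent QEMU):

qemu-system-mips -drive if=pflash,format=raw,file=img_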

MIPS internal logic signals for a beq instructions

I could really use a hand with this MIPS assembly CPU exercise.
I have to determine the inputs and outputs of the ALU(s), the jump-related MUXes, and the register file.
PC is 0x01D0 and the instruction I have to simulate is: beq $3, $7, -120
I have no problem with the ALU(s); my issues are with the MUXes and the register file.
On the second jump-related MUX (shown in the datapath image) I don't know what to write for the jump address [31-0].
The other problem is with the register file: I don't know what to write as its inputs. (The instruction encoding should be: 0x1067FFE2)
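For what it's worth, decoding 0x1067FFE2 by hand under the standard MIPS I-type field layout (a sketch, assuming the usual textbook single-cycle datapath) confirms the instruction and gives the MUX input:

0x1067FFE2 = 000100 00011 00111 1111111111100010
             opcode rs    rt    immediate
opcode 0b000100 = beq, rs = $3, rt = $7, imm = 0xFFE2 = -30 (words)
branch target = (PC + 4) + (imm << 2) = 0x01D4 - 0x78 = 0x015C

So the register file reads $3 and $7, and the branch-address input to the PC MUX would be 0x015C.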

Strange MIPS AdES exception

I am giving very limited information, I know, but it is probably enough.
System specs: MIPS 64-bit processor with 8 cores running 4 virtual cpus each.
OS: Some proprietary Linux based on the 2.6.32.9 kernel
Process: a simple user-land process running 7 POSIX threads. This specific application runs on core 0, which no other process is pinned to by CPU affinity.
The crash is almost impossible to reproduce. There is no specific scenario. We know that if we perform some minor activity with the application, it might crash about once a day.
The specific thread that's crashing wakes up every 5 milliseconds, reads information from one shared memory area and updates another. That's it.
There is not too much activity. The process is not working too hard.
Now: when I open the core dump and load a symbol-less image of the application, gdb points to instruction 100661e0, which looks like this in an objdump of the non-stripped image:
void class::foo(uint8_t factor)
{
100661d8: ffbf0018 sd ra,24(sp)
100661dc: 0080802d move s0,a0
bar(factor, shared_memory_a->profiles[PROFILE_1]);
100661e0: 0c019852 jal 10066148 <_ZN28class35barEhR30profile_t>
100661e4: 64c673c8 daddiu a2,a2,29640
bar(factor, shared_memory_a->profiles[PROFILE_2]);
100661e8: de060010 ld a2,16(s0)
The line that shows as the exception line is
100661e0: 0c019852 jal 10066148 <_ZN28class35barEhR30profile_t>
Note that 10066148 is a valid instruction.
The BadVAddr register contains the following address, which is aligned, but does not look valid as far as the instruction address space is concerned: c0000003ddc5dd90
The cause register contains this value: 0000000000800014
I don't understand why the BadVAddr register shows the value it does, when the instruction explicitly names a valid instruction address. I am a little concerned about branch-delay-slot issues, but I shouldn't have to worry about those when I am running a userland application written in simple C++.
I'd appreciate any thoughts.
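One datapoint, decoding the Cause value under the standard MIPS Cause-register layout (a sketch, not a diagnosis): ExcCode is bits [6:2], so 0x14 >> 2 = 5 = AdES, i.e. an address error on a store, matching the title; and the BD bit (bit 31 of the low word) is clear in 0x0000000000800014, so the EPC should not have been adjusted for a branch-delay slot.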

Difference between Syscall and Traps

I am wondering if there is any difference between the MIPS syscall and trap instructions. I can't find anything on this, so I am not sure whether there is a difference. Traps seem to just be a conditional syscall, but some clarification would be helpful.
The SYSCALL and TRAP instructions both trigger exceptions, but the resulting exception is of a different type (SystemCall versus Trap), and the operating system will likely handle them differently.
A trap is an exception that switches to kernel mode by invoking a kernel subroutine. In general, "trap" covers any kind of control transfer to the operating system, whereas SYSCALL is a synchronous, planned transition from a user process into kernel mode.
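To make the distinction concrete, here is a minimal sketch in MIPS assembly (the syscall service number is a SPIM/MARS simulator convention; teq is the MIPS II "trap if equal" instruction):

li   $v0, 10           # exit service (SPIM/MARS convention)
syscall                # unconditional: always transfers to the kernel

teq  $t0, $zero        # conditional: traps only when $t0 == 0
                       # (compilers emit this after div as a divide-by-zero check)

The trap fires only when its condition holds, while syscall always raises its exception.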