QEMU - Code Flow [Instruction cache and TCG]

I am trying to analyze the QEMU source code.
I know it's huge and, to date, there is no official documentation for it.
My main areas of interest are the instruction cache management and TCG operation.
Any pointers to them would be helpful.

I know a full answer would be much longer, but for a start I just want to bring this diagram to your attention. (It would also be useful for you to run QEMU under gdb, set breakpoints in the functions you see in the diagram, follow the code execution, etc.)

Yes, the QEMU code flow has changed a good bit. I'm not going to do a LibreOffice presentation, but here are a couple of TCG stack traces from the QEMU 5.1 MTTCG code. The first shows the TCG front end (FE) taking the guest code and converting it to internal intermediate code in a Translation Block (TB). A TB holds at most 512 guest instructions (max_insns=512 in the trace).
#0 0x00005555559ce81f in disas_insn (s=0x7fffe933d450, cpu=0x555556b2d990) at /opt/distros/qemu-5.1.0/target/i386/translate.c:4476
#1 0x00005555559dd471 in i386_tr_translate_insn (dcbase=0x7fffe933d450, cpu=0x555556b2d990) at /opt/distros/qemu-5.1.0/target/i386/translate.c:8569
#2 0x00005555558c4222 in translator_loop (ops=0x5555565ae9a0 <i386_tr_ops>, db=0x7fffe933d450, cpu=0x555556b2d990, tb=0x7fffac099900 <code_gen_buffer+134846675>, max_insns=512) at /opt/distros/qemu-5.1.0/accel/tcg/translator.c:102
#3 0x00005555559dd643 in gen_intermediate_code (cpu=0x555556b2d990, tb=0x7fffac099900 <code_gen_buffer+134846675>, max_insns=512) at /opt/distros/qemu-5.1.0/target/i386/translate.c:8631
#4 0x00005555558c2258 in tb_gen_code (cpu=0x555556b2d990, pc=18446744071591428680, cs_base=0, flags=4244144, cflags=-16252928) at /opt/distros/qemu-5.1.0/accel/tcg/translate-all.c:1743
#5 0x00005555558be77a in tb_find (cpu=0x555556b2d990, last_tb=0x0, tb_exit=0, cf_mask=524288) at /opt/distros/qemu-5.1.0/accel/tcg/cpu-exec.c:407
#6 0x00005555558bf18e in cpu_exec (cpu=0x555556b2d990) at /opt/distros/qemu-5.1.0/accel/tcg/cpu-exec.c:748
#7 0x00005555559846eb in tcg_cpu_exec (cpu=0x555556b2d990) at /opt/distros/qemu-5.1.0/softmmu/cpus.c:1356
#8 0x0000555555984f41 in qemu_tcg_cpu_thread_fn (arg=0x555556b2d990) at /opt/distros/qemu-5.1.0/softmmu/cpus.c:1664
#9 0x0000555555e5ec8d in qemu_thread_start (args=0x555556b5e0e0) at /opt/distros/qemu-5.1.0/util/qemu-thread-posix.c:521
#10 0x00007ffff3c406db in start_thread (arg=0x7fffe933e700) at pthread_create.c:463
#11 0x00007ffff396971f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
The second is the TCG Back-end (BE) translating the intermediate code into host instructions.
Thread 3 "qemu-system-x86" hit Breakpoint 9, tcg_out_op (s=0x7ffe9c000b20, opc=INDEX_op_ld_i32, args=0x7fffe933d530, const_args=0x7fffe933d4f0) at /opt/distros/qemu-5.1.0/tcg/i386/tcg-target.inc.c:2259
2259 int c, const_a2, vexop, rexw = 0;
#0 0x00005555558447d2 in tcg_out_op (s=0x7ffe9c000b20, opc=INDEX_op_ld_i32, args=0x7fffe933d530, const_args=0x7fffe933d4f0) at /opt/distros/qemu-5.1.0/tcg/i386/tcg-target.inc.c:2259
#1 0x000055555584fb40 in tcg_reg_alloc_op (s=0x7ffe9c000b20, op=0x7ffe9c00a418) at /opt/distros/qemu-5.1.0/tcg/tcg.c:3803
#2 0x000055555585078e in tcg_gen_code (s=0x7ffe9c000b20, tb=0x7fffac129880 <code_gen_buffer+135436371>) at /opt/distros/qemu-5.1.0/tcg/tcg.c:4244
#3 0x00005555558c22f1 in tb_gen_code (cpu=0x555556b2d990, pc=94290869746347, cs_base=0, flags=4244147, cflags=-16252928) at /opt/distros/qemu-5.1.0/accel/tcg/translate-all.c:1766
#4 0x00005555558be77a in tb_find (cpu=0x555556b2d990, last_tb=0x7fffac0ea700 <code_gen_buffer+135177939>, tb_exit=1, cf_mask=524288) at /opt/distros/qemu-5.1.0/accel/tcg/cpu-exec.c:407
#5 0x00005555558bf18e in cpu_exec (cpu=0x555556b2d990) at /opt/distros/qemu-5.1.0/accel/tcg/cpu-exec.c:748
#6 0x00005555559846eb in tcg_cpu_exec (cpu=0x555556b2d990) at /opt/distros/qemu-5.1.0/softmmu/cpus.c:1356
#7 0x0000555555984f41 in qemu_tcg_cpu_thread_fn (arg=0x555556b2d990) at /opt/distros/qemu-5.1.0/softmmu/cpus.c:1664
#8 0x0000555555e5ec8d in qemu_thread_start (args=0x555556b5e0e0) at /opt/distros/qemu-5.1.0/util/qemu-thread-posix.c:521
#9 0x00007ffff3c406db in start_thread (arg=0x7fffe933e700) at pthread_create.c:463
#10 0x00007ffff396971f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Once the TB is generated then tb_find returns with it and cpu_tb_exec runs it.
As prior answers indicate, there is A LOT more to the TCG.
Note that I used host and guest rather than native and target. TCG has the terms somewhat reversed: the source is the guest code that is converted to run on the target, which is the host. In other words, the TCG target is the QEMU host (which makes sense for the VM code generation.)
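The find, translate and execute cycle visible in the two traces can be sketched as a toy model. This is only an illustration of the control flow (look up a block by guest pc, generate it on a miss, then run it); every name and type below is hypothetical, and QEMU's real tb_find, tb_gen_code and cpu_tb_exec are far more involved.

```c
#include <stdint.h>

/* Toy model only: none of these names or types are QEMU's. */
enum { TOY_CACHE_SIZE = 64 };

typedef struct {
    int      valid;
    uint64_t pc;       /* guest address the block starts at */
    int      n_insns;  /* guest instructions covered by the block */
} ToyTB;

typedef struct {
    ToyTB    cache[TOY_CACHE_SIZE];
    unsigned translations;  /* how often a block had to be generated */
} ToyCPU;

/* Stand-in for front end + back end: pretend every block is 4 insns. */
static void toy_gen_code(ToyCPU *cpu, ToyTB *tb, uint64_t pc)
{
    tb->valid = 1;
    tb->pc = pc;
    tb->n_insns = 4;
    cpu->translations++;
}

/* Look the block up by guest pc; generate it only on a miss. */
static ToyTB *toy_tb_find(ToyCPU *cpu, uint64_t pc)
{
    ToyTB *tb = &cpu->cache[pc % TOY_CACHE_SIZE];
    if (!tb->valid || tb->pc != pc) {
        toy_gen_code(cpu, tb, pc);
    }
    return tb;
}

/* "Run" the block and return the next guest pc (4-byte toy insns). */
static uint64_t toy_tb_exec(const ToyTB *tb)
{
    return tb->pc + (uint64_t)tb->n_insns * 4;
}

/* Execute `steps` blocks of a guest loop occupying [start, start + len). */
static unsigned toy_cpu_exec(ToyCPU *cpu, uint64_t start, uint64_t len, int steps)
{
    uint64_t pc = start;
    for (int i = 0; i < steps; i++) {
        ToyTB *tb = toy_tb_find(cpu, pc);
        pc = toy_tb_exec(tb);
        if (pc >= start + len) {
            pc = start;  /* the guest loop jumps back to its start */
        }
    }
    return cpu->translations;
}
```

Running a 64-byte guest loop for 100 steps on a zero-initialised ToyCPU generates only four blocks; every later step reuses a cached one, which is the point of keeping translated blocks around.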

RISC-V: Illegal instruction exception when switching to supervisor mode

When setting the mstatus.mpp field to switch to supervisor mode, I'm getting an illegal instruction exception when calling mret. I'm testing this in qemu-system-riscv64 version 6.1 with the riscv64-softmmu system.
I recently upgraded from QEMU 5.0 to 6.1. Prior to this upgrade my code worked. I can't see anything relevant in the changelog. I'm assuming that there's a problem in my code that the newer version simply doesn't tolerate.
Here is a snippet of assembly that shows what's happening (unrelated boot code removed):
.setup_hart:
csrw satp, zero # Disable address translation.
li t0, (1 << 11) # Supervisor mode.
csrw mstatus, t0
csrw mie, zero # Disable interrupts.
la sp, __stack_top # Setup stack pointer.
la t0, asm_trap_vector
csrw mtvec, t0
la t0, kernel_main # Jump to kernel_main on trap return.
csrw mepc, t0
la ra, cpu_halt # If we return from main, halt.
mret
If I set the mstatus.mpp field to 0b11 for machine mode, I can get to kernel_main without any problem.
Here's the output from QEMU showing the exception information:
riscv_cpu_do_interrupt: hart:0, async:0, cause:0000000000000002, epc:0x000000008000006c, tval:0x0000000000000000, desc=illegal_instruction
mepc points to the address of the mret instruction where the exception occurs.
I've tested that the machine supports supervisor mode by writing and retrieving the value in mstatus.mpp successfully.
Is there something obvious I'm missing? My code seems very similar to the few examples I can find online, such as https://osblog.stephenmarz.com/ch3.2.html. Any help would be greatly appreciated.
The issue turned out to be RISC-V's Physical Memory Protection (PMP). QEMU raises an illegal instruction exception when executing an MRET instruction if no PMP rules have been defined; this fits the observation that returning to machine mode (mstatus.mpp = 0b11) still works. Adding a PMP entry resolved the issue.
This was confusing, as this behaviour is not specified in the Privileged Architecture manual's section on mret.
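The answer does not say which PMP entry was added. A common choice, shown here as a minimal sketch and not as the author's actual fix, is a single top-of-range (TOR) entry that grants read, write and execute access to the whole address space on RV64. The helper names are hypothetical; the CSR writes are only compiled when targeting RISC-V, so the value computation can be checked on any host.

```c
#include <stdint.h>

/* pmpcfg field bits as defined in the RISC-V privileged specification. */
#define PMP_R      0x01u  /* read permission */
#define PMP_W      0x02u  /* write permission */
#define PMP_X      0x04u  /* execute permission */
#define PMP_A_TOR  0x08u  /* A field = 1: top-of-range addressing */

/* Hypothetical helper: configuration byte for PMP entry 0. */
static uint64_t pmp_cfg0_allow_all(void)
{
    return PMP_A_TOR | PMP_R | PMP_W | PMP_X;
}

/* Hypothetical helper: pmpaddr0 holds address bits [55:2], i.e. 54 bits
 * on RV64, so all ones puts the top of the range at the highest address. */
static uint64_t pmp_addr0_allow_all(void)
{
    return (UINT64_C(1) << 54) - 1;
}

/* Must run in machine mode, before the mret that drops privilege. */
static void pmp_allow_all(void)
{
#if defined(__riscv) && (__riscv_xlen == 64)
    __asm__ volatile("csrw pmpaddr0, %0" : : "r"(pmp_addr0_allow_all()));
    __asm__ volatile("csrw pmpcfg0, %0" : : "r"(pmp_cfg0_allow_all()));
#endif
}
```

In the assembly from the question, the equivalent would be two csrw instructions (pmpaddr0, then pmpcfg0) placed before mret. A real kernel would normally use tighter regions than "everything".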

Filling the gap in my understanding

I was going through the following series of lecture notes on operating systems:
http://williamstallings.com/Extras/OS-Notes/h3.html
While explaining the different outcomes the threaded program can produce, the notes break down the execution of the function as follows:
"sum first reads the value of a into a register. It then increments the register, then stores the contents of the register back into a. It then reads the values of the control string, p and a into the registers that it uses to pass arguments to the printf routine. It then calls printf, which prints out the data"
I don't know exactly how a function is executed at the level of registers, nor which topic I should study to learn more about it.
So, which topic covers the execution of a function at the level of registers and at the level of electronic circuits?
Please also explain how the stack is incremented while a value is being read during the execution of a function.
Thanks in advance.
The advice to look at the assembler code is already a good one. You can look up the assembler instructions and think about what happens if, at any instruction, execution switches to the other thread.
Look at this code:
la a, %r0
ld [%r0],%r1
add %r1,1,%r1
st %r1,[%r0]
ld [%r0], %o3 ! parameters are passed starting with %o0
mov %o0, %o1
la .L17, %o0
call printf
In the first four lines (the a++) there are several ways the execution can interleave, and you don't know whether sum(1) or sum(0) is called first.
To understand what is going on at a deeper level, I suggest you look up 'computer organization'. See, for example, the Computer Organisation WikiBook.
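To make the interleaving concrete, here is a small deterministic sketch in C that replays the load, increment and store steps of a++ for two threads by hand. It does not use real threads; the Thread struct and the step_* helpers are made up for illustration, and each "register" is just a local copy of a.

```c
/* One thread's private register, as used by the three steps of a++. */
typedef struct { int reg; } Thread;

static void step_load(Thread *t, const int *a)  { t->reg = *a; }    /* ld  */
static void step_inc(Thread *t)                 { t->reg += 1; }    /* add */
static void step_store(const Thread *t, int *a) { *a = t->reg; }    /* st  */

/* Thread 1 finishes its a++ before thread 2 starts: a ends up as 2. */
static int run_sequential(void)
{
    int a = 0;
    Thread t1, t2;
    step_load(&t1, &a); step_inc(&t1); step_store(&t1, &a);
    step_load(&t2, &a); step_inc(&t2); step_store(&t2, &a);
    return a;
}

/* Thread 2 runs between thread 1's load and store: one update is lost
 * and a ends up as 1, because thread 1 stores a stale value. */
static int run_interleaved(void)
{
    int a = 0;
    Thread t1, t2;
    step_load(&t1, &a);
    step_load(&t2, &a); step_inc(&t2); step_store(&t2, &a);
    step_inc(&t1); step_store(&t1, &a);
    return a;
}
```

With real threads the scheduler picks the interleaving, which is why the program in the notes can print different results from run to run.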

what is the current execution mode/exception level, etc?

I am new to ARMv8 architecture. I have following basic questions on my mind:
1. How do I know what the current execution mode is, AArch32 or AArch64? Should I read CPSR or SPSR to ascertain this?
2. What is the current Exception level, EL0/1/2/3?
3. Once an exception comes, can I read any register to determine whether I am in the SError/Synchronous/IRQ/FIQ exception handler?
TIA.
1. The assembly instructions and their binary encodings are entirely different for 32-bit and 64-bit code, so the execution state is something that you (or the compiler) already need to know at compile time; checking for it at runtime doesn't make sense. For C and C++, the check can be done at compile time (#ifdef) through compiler-provided macros such as those provided by armclang: __aarch64__ for 64-bit, __arm__ for 32-bit.
2. It depends on the execution state:
AArch32: MRS <Rn>, CPSR reads the current state into register number n. Then extract bits 4:0 (M[4:0]), which contain the current mode.
AArch64: MRS <Xn>, CurrentEL reads the current EL into register number n.
3. Short answer: you can't. Long answer: the assumption is that, from the structure of the code and the state of any user-defined variables, you already know what you are doing, i.e. whether you reached a position in the code through regular execution or through an exception.
aarch64 C code:
#include <inttypes.h>
#include <stdio.h>
uint64_t el;
__asm__ volatile ("mrs %0, CurrentEL" : "=r" (el)); /* EL is in bits [3:2] */
printf("EL = %" PRIu64 "\n", (el >> 2) & 3);
arm C code:
#include <inttypes.h>
#include <stdio.h>
uint32_t cpsr;
__asm__ volatile ("mrs %0, CPSR" : "=r" (cpsr)); /* mode is in bits [4:0] */
printf("mode = 0x%" PRIx32 "\n", cpsr & 0x1F);
CurrentEL, however, is not readable from EL0, as shown in the ARMv8 manual, C5.2.1 "CurrentEL, Current Exception Level", section "Accessibility". Trying to read it from Linux userland raises SIGILL. You could catch that signal, I suppose.
CPSR, however, is readable from EL0.
Tested on QEMU and gem5 with this setup.
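Once the register value has been read, decoding it is plain bit manipulation. The sketch below shows that step on its own so that it can be tried on any host; the two function names are hypothetical, while the field positions (CurrentEL bits [3:2], CPSR.M bits [4:0]) and the mode encodings are the architectural ones.

```c
#include <stdint.h>

/* CurrentEL keeps the exception level in bits [3:2]. */
static unsigned el_from_currentel(uint64_t currentel)
{
    return (unsigned)((currentel >> 2) & 0x3);
}

/* AArch32: map CPSR.M[4:0] to the name of the processor mode. */
static const char *aarch32_mode_name(uint32_t cpsr)
{
    switch (cpsr & 0x1F) {
    case 0x10: return "User";
    case 0x11: return "FIQ";
    case 0x12: return "IRQ";
    case 0x13: return "Supervisor";
    case 0x16: return "Monitor";
    case 0x17: return "Abort";
    case 0x1A: return "Hyp";
    case 0x1B: return "Undefined";
    case 0x1F: return "System";
    default:   return "reserved";
    }
}
```

For example, a CurrentEL value of 0x4 decodes to EL1, and a CPSR of 0x600001D3 decodes to Supervisor mode. On AArch32 the mode field also separates the IRQ, FIQ, Abort and Undefined handlers, since each of those exceptions is taken in its own mode.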

MIPS: legal to have two consecutive "load word" instructions into the same register?

Background: We're seeing a very intermittent crash in a function foo(int *p). The crash occurs while dereferencing p, whose value in these cases turns out to be 0xffffffff. An analysis of the core dump shows that foo() is called from the following assembly snippet:
bne ... somewhere else
lw $a0,44(sp)
lw $a0,40(sp)
jal foo()
lui s1, 0x1000
Inspecting memory in the core dump shows that 44(sp) is 0xffffffff, whereas 40(sp) is the correct value we intend to dereference. However, the value of a0 at the time of the crash, inside foo(), is 0xffffffff. (It's important to note that foo() in this case is just accessing a member; so it's literally the first instruction in foo() which is already attempting to access via a0, and crashing. Also, ra is pointing to the instruction following the above snippet, and s1 currently contains 0x10000000, so we're quite confident that foo() was, indeed, called from the above snippet.)
Our only theory at the moment is that the two consecutive lws into a0 are a hazard -- either a documented one, in which case this looks like a compiler bug; or an undocumented one.
So: is the above assembly legal? If it is, any other ideas about what could be going on here?
Thanks!
UPDATE: Well, turns out this was all a wild goose chase: a repeat analysis of the coredump by a colleague turned up a path in the code which I had missed, where there was a jump directly to the jal foo() instruction, immediately after having set a0 to 44(sp). In other words, there is a path in the code which is consistent with the result we're seeing that does not involve hazards, or "skipped instructions" or anything... I thought I checked this, but I guess I either didn't, or missed it... :(
Anyway, I've accepted markgz's answer, since it answers my original question about the legality of these instructions (apparently they are).
A quick search of the MIPS documentation for the MIPS32R2 ISA doesn't show any restrictions on LW after LW instructions.
There might be a bug in the MIPS implementation in your CPU. Things to look at include:
What address is 44(sp), 40(sp) - are they on a page boundary or a 256MByte boundary, or other interesting address?
Do either of the loads trigger a page fault?
Does patching the binary to insert a NOP, SSNOP, or a SYNC instruction between the loads make the problem go away?

ARM Undefined Instruction error

I'm getting an Undefined Instruction error while running an embedded system, no coprocessor, no MMU, Atmel 9263. The embedded system has memory in the range 0x20000000 - 0x23FFFFFF. I've had two cases so far:
SP 0x0030B840, LR 0x2000AE78 - the LR points at valid code, so I'm not sure what causes the exception, although the SP is bogus. What other addresses, registers, memory locations should I look at?
SP 0x20D384A8, LR 0x1FFCA59C - SP is ok, LR is bogus. Is there some kind of post mortem that I can do to find out how the LR got crushed? Looks like it rolled backwards off the end of the address space, but I can't figure out how.
Right now I am just replacing large chunks of code with simulations and running the tests again to try and isolate the issue - the problem is that sometimes it takes 4 hours to show up.
Any hints out there would be appreciated, thanks!
The chip is the AT91SAM9263, and we are using the IAR EWARM toolchain. I'm pretty sure it is straight ARM, but I will check.
EDIT
Another example of the Undefined Instruction exception - this time SP/LR look fine. LR = 0x2000b0c4, and when I disassemble near there:
2000b0bc e5922000 LDR R2, [R2, #+0]
2000b0c0 e12fff32 BLX R2
2000b0c4 e1b00004 MOVS R0, R4
Since LR is the instruction following the Undef Exception, how is BLX identified as undefined? Note that CPSR is 0x00000013, so this is all ARM mode. However, R2 is 0x226d2a08, which is in the heap area and, I think, incorrect: the disassembly there is ANDEQ R0,R0,R12, the instruction is 0x0000000C, and the other instructions there look like data to me. So I think the bad R2 is the problem; I'm just trying to understand why the Undef occurs at the BLX.
thanks!
Check the T bit in the CPSR. If you are inadvertently changing from ARM mode to Thumb mode (or vice versa), undefined instructions will occur.
As far as the SP or LR getting corrupted, it could be that you execute a few instructions in the wrong mode that corrupt them before hitting the undefined instruction.
EDIT
Responding to the new error case in the edit of the question:
LR contains the return address from the BLX R2, so it makes sense that it points to one instruction after the BLX.
If R2 was pointing to the heap when the BLX R2 was executed, you'll jump into the heap and start executing the data as if they were instructions. This will cause an undefined instruction exception in short order...
If you want to see the exact instruction that was undefined, look at the R14_und register (defined while you're in the undefined instruction handler) - it contains the address of the next instruction after the Undefined one.
The root cause is the bad value in R2. Assuming this is C code, my guess is a bad pointer dereference, but I'd need to see the source to know for sure.
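The R14_und hint can be turned into a small post-mortem helper. This is a sketch under stated assumptions: the function names are hypothetical, and it assumes the handler has saved R14_und and SPSR_und on entry. In ARM state the undefined instruction sits 4 bytes before R14_und; in Thumb state (16-bit instructions on an ARM9 core) it sits 2 bytes before. The T bit of the saved status register says which state the faulting code was in.

```c
#include <stdint.h>

#define PSR_T_BIT (UINT32_C(1) << 5)  /* set when the code ran in Thumb state */

/* Address of the instruction that raised the Undefined exception,
 * computed from the values saved on entry to the handler. */
static uint32_t undef_insn_address(uint32_t lr_und, uint32_t spsr_und)
{
    return lr_und - ((spsr_und & PSR_T_BIT) ? 2u : 4u);
}

/* ARM-state BLX (register): cond 0001 0010 1111 1111 1111 0011 Rm.
 * Useful for confirming that a word such as 0xe12fff32 really is BLX R2. */
static int is_arm_blx_reg(uint32_t insn)
{
    return (insn & 0x0FFFFFF0u) == 0x012FFF30u;
}
```

Logging the result of undef_insn_address together with the word stored at that address shows directly whether the core was executing data, and the saved T bit shows whether it had switched state unexpectedly.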
Is this an undefined instruction, or a data abort because you are reading from an unaligned address?
edit:
On an undefined instruction exception, CPSR[4:0] should be 0b11011 (0x1B), not 0x13; according to the ARM ARM, 0x13 is Supervisor mode, the mode entered on reset.