Two functions/subroutines in ARM assembly language

I am stuck on an ARM exercise.
The following program should calculate the result of 2((x-1)^2 + 1), but there is a mistake in it that leads to an infinite loop.
I think I still don't fully understand subroutines, and for this reason I can't see where the mistake is.
_start:
mov r0, #4
bl g
mov r7, #1
swi #0
f:
mul r1, r0, r0
add r0, r1, #1
mov pc, lr
g:
sub r0, r0, #1
bl f
add r0, r0, r0
mov pc, lr
The infinite loop starts in subroutine g, at the line mov pc, lr: instead of returning to _start, execution goes back to the previous line, add r0, r0, r0, and then reaches the last line of g again.
So I guess the problem is the last line of subroutine g, but I can't find a way to return to _start without using mov pc, lr. I mean, this should be the instruction to use after a branch with link.
Also, in this case r0 = 4, so the result of the program should be 20.

This is because you don't save lr on the stack before calling f, so the initial return address is lost: bl f overwrites lr with the address of the instruction after it. If you only have one level of subroutine calls, using lr without saving it is fine, but if you have more than one, you need to preserve the previous value of lr.
For example, when compiling this C example using Compiler Explorer with ARM gcc 4.56.4 (Linux), and options -mthumb -O0,
void f()
{
}
void g()
{
f();
}
void start()
{
g();
}
The generated code will be:
f():
push {r7, lr}
add r7, sp, #0
mov sp, r7
pop {r7, pc}
g():
push {r7, lr}
add r7, sp, #0
bl f()
mov sp, r7
pop {r7, pc}
start():
push {r7, lr}
add r7, sp, #0
bl g()
mov sp, r7
pop {r7, pc}
If you were running this on bare metal, not under Linux, you would need your stack pointer to be initialized to a valid value.
Assuming you are running from RAM on a bare-metal system/simulator, you could set up a minimal stack of 128 bytes:
.text
.balign 8
_start:
adr r0, . + 128 // set top of stack at _start + 128
mov sp, r0
...
But it looks like you're writing a Linux executable that exits with a swi/r7=1 exit system call. In that case, don't do this: it would make your program crash when it tries to write to the stack.
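To see why the lost return address produces a loop, here is a toy C model of the program from the question. It is only a sketch: the instruction indices, the function name run and its save_lr switch are made up for illustration. With save_lr set, g behaves as if it began with push {lr} and ended with pop {pc}; without it, the last instruction of g jumps to whatever bl f left in lr.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the question's program. Each case is one instruction;
   the pc values are made-up instruction indices, not real addresses.
   Returns 1 and stores r0 in *result if the program reaches the exit
   call, or 0 if it is still running after max_steps instructions. */
static int run(int save_lr, int max_steps, uint32_t *result)
{
    uint32_t r0 = 0, r1 = 0;
    int pc = 0, lr = -1;
    int stack[4], sp = 0;

    for (int step = 0; step < max_steps; step++) {
        switch (pc) {
        case 0: r0 = 4; pc = 1; break;            /* mov r0, #4 */
        case 1: lr = 2; pc = 7; break;            /* bl g */
        case 2: *result = r0; return 1;           /* mov r7, #1 / swi #0 */
        case 4: r1 = r0 * r0; pc = 5; break;      /* f: mul r1, r0, r0 */
        case 5: r0 = r1 + 1; pc = 6; break;       /* add r0, r1, #1 */
        case 6: pc = lr; break;                   /* mov pc, lr */
        case 7:                                   /* g: */
            if (save_lr) {                        /* push {lr} (fixed version only) */
                assert(sp < 4);
                stack[sp++] = lr;
            }
            r0 = r0 - 1; pc = 8; break;           /* sub r0, r0, #1 */
        case 8: lr = 9; pc = 4; break;            /* bl f (overwrites lr) */
        case 9: r0 = r0 + r0; pc = 10; break;     /* add r0, r0, r0 */
        case 10:
            if (save_lr) {                        /* pop {pc} */
                assert(sp > 0);
                pc = stack[--sp];
            } else {
                pc = lr;                          /* mov pc, lr: lr still points at index 9 */
            }
            break;
        default: return 0;
        }
    }
    return 0;
}
```

Calling run(1, 100, &r) reaches the exit with r equal to 20, while run(0, 100, &r) keeps bouncing between indices 9 and 10 until the step limit is hit, which is the behaviour described in the question.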

Related

arm cortex-m33 (trustzone, silabs efm32pg22) - assembler hardfaults accessing GPIO or almost any peripherals areas, any hint?

I am lost trying to configure the Silicon Labs EFM32PG22 on bare metal, on their dev kit, accessed through the on-board J-Link from SEGGER Embedded Studio (a great, fast IDE). I have a blink "hello world" example in C working from their Simplicity Studio, but I was trying to achieve the same thing in pure assembler, as I did easily on the Microchip PIC32 MC00 or SAMD21G17D, where only the clocks and startup were configured through the GUI in MPLAB X. Here I tried the SEGGER IDE, where there is no easy way to configure startup/clocks, or I haven't found it yet. At the hardware level, the registers of these Cortex parts differ by manufacturer. In C/C++ there is some (not cheap) unification through CMSIS, but I only want to know the minimum needed to get raw GPIO working after clock/startup. The SEGGER project is a generic Cortex-M project for the specific EFM32PG22, i.e. a Cortex-M33 with TrustZone security. I probably don't know what is locked or switched off, or which state the MCU is in (privileged or unprivileged). There are two sets of register mappings, but nothing works. As soon as I try to store to, or even load from, the GPIO config registers (or the SMU registers, to query something), a HardFault exception is thrown. All of this is using the SEGGER IDE debugger over the on-board J-Link. What am I doing wrong, and what is missing here?
In C, I have only this code:
extern void blink(void);
int main ( void )
{
blink();
}
In blink.s I have this:
;#https://github.com/hubmartin/ARM-cortex-M-bare-metal-assembler-examples/blob/master/02%20-%20Bare%20metal%20blinking%20LED/main.S
;#https://sites.google.com/site/hubmartin/arm/arm-cortex-bare-metal-assembly/02---arm-cortex-bare-metal-assembly-blinking-led
;#https://mecrisp-stellaris-folkdoc.sourceforge.io/projects/blink-f0disco-gdbtui/doc/readme.html
;#https://microcontrollerslab.com/use-gpio-pins-tm4c123g-tiva-launchpad/
;#!!! ENABLE GPIO CLOCK SOURCE ON EFM32 !!!
;#https://community.silabs.com/s/share/a5U1M000000knsWUAQ/hello-world-part-2-create-firmware-to-blink-the-led?language=en_US
;#EFM32 GPIO
;#https://www.silabs.com/documents/public/application-notes/an0012-efm32-gpio.pdf
;# ARM thumb2 ISA
;#https://www.engr.scu.edu/~dlewis/book3/docs/ARM_and_Thumb-2_Instruction_Set.pdf
;#https://sciencezero.4hv.org/index.php?title=ARM:_Cortex-M3_Thumb-2_instruction_set
;#!!! https://stackoverflow.com/questions/48561243/gnu-arm-assembler-changes-orr-into-movw
;#segger assembler
;#https://studio.segger.com/segger/UM20006_Assembler.pdf
;#https://www.segger.com/doc/UM20006_Assembler.html
;#!!! unfortunatelly, we dont know here yet how to include ASM SFR defines, nor for MPLAB ARM (Harmony) !!!
;##include <xc.h>
;##include "definitions.h"
.cpu cortex-m33
.thumb
.text
.section .text.startup.main,"ax",%progbits
.balign 2
.p2align 2,,3
.global blink
//.arch armv8-m.base
.arch armv6-m
.syntax unified
.code 16
.thumb_func
.fpu softvfp
.type blink, %function
//!!! here we have manually entered GPIO PORT defines for PIC32CM
.equ SYSCFG_BASE_ADDRESS, 0x50078000
.equ SMU_BASE_ADDRESS, 0x54008000
//.equ SMU_BASE_ADDRESS, 0x5400C000
.equ CMU_BASE_ADDRESS, 0x50008000
.equ GPIO_BASE_ADDRESS, 0x5003C000 // this differs totally from both "special" infineon and microchip "standard?" cortex devices !!!
.equ DELAY, 40000
// Vector table
.word 0x20001000 // Vector #0 - Stack pointer init value (0x20000000 is RAM address and 0x1000 is 4kB size, stack grows "downwards")
.word blink // Vector #1 - Reset vector - where the code begins
// Vector #3..#n - I don't use Systick and another interrupts right now
// so it is not necessary to define them and code can start here
blink:
LDR r0, =(SYSCFG_BASE_ADDRESS + 0x200) // SYSCFG SYSCFG_CTRL
LDR r1, =0 // 0 = disable address fault exceptions
ldr r1, [r0] // load the value at [r0] into r1 (this overwrites the 0 loaded above)
LDR r0, =(CMU_BASE_ADDRESS) // CMU CMU_SYSCLKCTRL PCLKPRESC + CLKSEL
LDR r1, =0b10000000001 // FSRCO 20MHz + PCLK = HCLK/2 = 10MHz
STR r1, [r0, 0x70] // store r1 to [r0 + 0x70]
LDR r0, =(CMU_BASE_ADDRESS) // CMU CMU_CLKEN0
LDR r1, [r0, 0x64]
LDR r2, =(1 << 25) // GPIO CLK EN
orrs r1, r2 // !!! HORROR !!! -- orr is not possible in thumb2 ?? only orrs !! (width suffix)
STR r1, [r0, 0x64] // store r1 to [r0 + 0x64]
LDR r1, [r0, 0x68]
LDR r2, =(1 << 14) // SMU CLK EN
orrs r1, r2 // !!! HORROR !!! -- orr is not possible in thumb2 ?? only orrs !! (width suffix)
STR r1, [r0, 0x68] // store r1 to [r0 + 0x68]
//LDR r0, =(SMU_BASE_ADDRESS) // SMU SMU_LOCK
//LDR r1, =11325013 // SMU UNLOCK CODE
//STR r1, [r0, 0x08] // store r1 to [r0 + 0x08]
ldr r0, =(SMU_BASE_ADDRESS) // SMU reading values, detection - AGAIN, HARD FAULTS !!!!!!!
ldr r1, [r0, 0x04]
ldr r1, [r0, 0x20]
ldr r1, [r0, 0x40]
//LDR r0, =(GPIO_BASE_ADDRESS + 0x300) // GPIO UNLOCK
//LDR r1, =0xA534
//STR r1, [r0] // store r1 to [r0]
//!! THIS BELOW IS OLD FOR SAMD , WE STILL SIMPLY CANT ENABLE GPIO !!!!
// Enable PORTA pin 4 as output
LDR r0, =(GPIO_BASE_ADDRESS) // DIR PORTA
LDR r1, =0b00000000000001000000000000000000
STR r1, [r0, 0x04] // store r1 to [r0 + 0x04]
LDR R2, =1
loop:
// Write high to pin PA04
LDR r0, =GPIO_BASE_ADDRESS // OUT PORTA
LDR r1, =0b10000 // PORT_PA04
STR r1, [r0, 0x10] // Store R1 value to address pointed by R0
// Dummy counter to slow down my loop
LDR R0, =0
LDR R1, =DELAY
loop0:
ADD R0, R2
cmp R0, R1
bne loop0
// Write low to PA04
LDR r0, =GPIO_BASE_ADDRESS // OUT PORTA
LDR r1, =0b00000
STR r1, [r0, 0x10] // Store R1 value to address pointed by R0
// Dummy counter to slow down my loop
LDR R0, =0
LDR R1, =DELAY
loop1:
ADD R0, R2
cmp R0, R1
bne loop1
b loop
UPDATE: I have now tried it again in Simplicity Studio, placing the blink() call after the pregenerated system init:
extern void blink(void);
int main(void)
{
// Initialize Silicon Labs device, system, service(s) and protocol stack(s).
// Note that if the kernel is present, processing task(s) will be created by
// this call.
sl_system_init();
blink();
}
With this code in blink.s, it works and the LED blinks:
.cpu cortex-m33
.thumb
.text
.section .text.startup.main,"ax",%progbits
.balign 2
.p2align 2,,3
.global blink
//.arch armv8-m.base
.arch armv6-m
.syntax unified
.code 16
.thumb_func
.fpu softvfp
.type blink, %function
/*
//!!! here we have manually entered GPIO PORT defines for PIC32CM
.equ SYSCFG_BASE_ADDRESS, 0x50078000
.equ SMU_BASE_ADDRESS, 0x54008000
//.equ SMU_BASE_ADDRESS, 0x5400C000
.equ CMU_BASE_ADDRESS, 0x50008000
*/
.equ GPIO_BASE_ADDRESS, 0x5003C000 // this differs totally from both "special" infineon and microchip "standard?" cortex devices !!!
.equ DELAY, 400000
// Vector table
.word 0x20001000 // Vector #0 - Stack pointer init value (0x20000000 is RAM address and 0x1000 is 4kB size, stack grows "downwards")
.word blink // Vector #1 - Reset vector - where the code begins
// Vector #3..#n - I don't use Systick and another interrupts right now
// so it is not necessary to define them and code can start here
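blink:
// Enable PORTA pin 4 as output
LDR r0, =(GPIO_BASE_ADDRESS) // DIR PORTA
LDR r1, =0b00000000000001000000000000000000
STR r1, [r0, 0x04]
LDR R2, =1 // delay-loop increment (the loops below add R2, which was otherwise left uninitialized)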
loop:
// Write high to pin PA04
LDR r0, =GPIO_BASE_ADDRESS // OUT PORTA
LDR r1, =0b10000 // PORT_PA04
STR r1, [r0, 0x10]
// Dummy counter to slow down my loop
LDR R0, =0
LDR R1, =DELAY
loop0:
ADD R0, R2
cmp R0, R1
bne loop0
// Write low to PA04
LDR r0, =GPIO_BASE_ADDRESS // OUT PORTA
LDR r1, =0b00000
STR r1, [r0, 0x10]
// Dummy counter to slow down my loop
LDR R0, =0
LDR R1, =DELAY
loop1:
ADD R0, R2
cmp R0, R1
bne loop1
b loop
So now I am just curious: what is missing in the pure assembly code to bring the Cortex-M33 into some "easy" state, ignoring TrustZone, so that it can be used much like, say, a plain Cortex-M3?
Can anybody help? I have been digging deep into this datasheet/reference manual, but no luck so far:
https://www.silabs.com/documents/public/reference-manuals/efm32pg22-rm.pdf
UPDATE AGAIN: I will try to figure it out. Traversing the system init C code makes it clear what is going on; there are also some chip errata workarounds. But I never touched DCDC while initializing, and this may be the culprit:
void sl_platform_init(void)
{
CHIP_Init();
sl_device_init_nvic();
sl_board_preinit();
sl_device_init_dcdc();
sl_device_init_hfxo();
sl_device_init_lfxo();
sl_device_init_clocks();
sl_device_init_emu();
sl_board_init();
}
Well, okay: manufacturer-specific code generation for MCU startup really is an important and useful thing. MCUs from different manufacturers differ so much at the register level (even though all are Cortex-M based) that it is not worth configuring them manually in assembly if enough flash is available, and it mostly is. So far I have had no luck doing this properly on specific parts with the "generic" ARM/Cortex IDEs from SEGGER/Keil/IAR, so using the manufacturer-specific IDE to configure startup clocks and peripherals (mostly graphically) is crucial, or at least by far the easiest way (I know, quite an expensive observation after all the assembly attempts). After that, it is easy to write even a pure-assembly "blink" hello-world test called as an extern C function.
You may ask why I am still considering assembly when there are at least the CMSIS "platform abstraction layer" C headers on ARM. (They do not really help with abstraction, as the devices are still very different; you only get register symbol #defines, typedefs and enums that make it easier to do something in C.) I am trying to compare C-compiled code with handwritten assembly for a specific purpose that needs an algorithm optimized from scratch. It is often easier to design such an algorithm directly in assembly than to rely on C compiler optimizations, which are described in great complexity: each compiler has its own long document on how its optimizations work, and at this level C is simply too abstract and too much of a moving target. This is even more true when you write for several MCU architectures (think ARM Cortex-M, PIC32/MIPS, or even PIC16/18, PIC24, AVR, MSP430 ...). The general algorithm can be described in shared pseudo-assembly, as close to the hardware as possible, without knowing all the optimization quirks of each architecture's C compiler(s), and there is often more than one C compiler too.
So it is possible to compare C-compiler-generated code with handwritten assembly. I have already tried such an assembly blink on many very different architectures, in each case using the manufacturer-specific IDE to generate the startup in C, with all the GUI configuration and code generation, down to an empty C project that always compiles. The code size of these generated startups varies widely, of course. The most advanced MCUs are really complex, mostly in clock configuration and pin function configuration, and then in the various peripherals. Similarities exist only within a single manufacturer, to some extent, since MCUs from one manufacturer often share a similar approach. So the final solution is to have the startup generated and then switch to assembly immediately; this is feasible. With a small flash it is further possible to optimize even the startup code, but that matters mostly on the smallest 8-bit parts, where startup is quite easy anyway or the generated code is small as well.

x86: Access violation when writing while using the OFFSET operator to get the address of an array

I am getting the exception "Exception thrown at 0x0044369D in Project2.exe: 0xC0000005: Access violation writing location 0x00000099." From my research so far, I am under the impression this has to do with a null pointer or out-of-range memory being accessed by the line mov [eax], ecx.
I used the OFFSET operator in mov eax, OFFSET arrayfib and thought that should remedy this, but I can't figure out what is causing the issue.
.model flat, stdcall
.stack 4096
INCLUDE Irvine32.inc
ExitProcess PROTO, dwExitCode: DWORD
.data
arrayfib DWORD 35 DUP (99h)
.code
main PROC
mov eax, OFFSET arrayfib ;address of fibonacci number array
mov bx, 30 ;number of Fibonacci numbers to generate
call fibSequence ;call to Fibonacci sequence procedure
mov edx, OFFSET arrayfib ;passes information to call DumpMem
mov ecx, LENGTHOF arrayfib
mov ebx, TYPE arrayfib
call DumpMem
INVOKE ExitProcess, 0
main ENDP
;----------------------------------------------------------------------------------
;fibSequence
;Calculates the fibonacci numbers to the n'th fibonacci number
;Receives: eax = address of the Fibonacci number array
;bx = number of Fibonacci numbers to generate
;ecx = used for Fibonacci calculation (i-2)
;edx = used for Fibonacci calculation (i-1)
;returns: [eax+4i] = each value of fibonacci sequence in array, scaled for doubleword
;---------------------------------------------------------------------------------
fibSequence PROC
mov ecx, 0 ;initialize the registers with Fib(0) and Fib(1)
mov edx, 1
mov [eax], edx ;store first Fibonacci number in the array
;since a Fibonacci number has been generated, decrement the counter
;test for completion, proceed
mov eax, [eax+4]
fibLoop:
sub bx, 1
jz quit ;if bx = 0, jump to exit
add ecx, edx
push ecx ;save fib# for next iteration before it is destroyed
mov [eax], ecx ;stores new fibonacci number in the next address of array
push edx ;save other fib# before it is destroyed
mov ecx, edx
mov edx, [eax]
mov eax, [eax+4] ;increments accounting for doubleword
pop edx
pop ecx
quit:
exit
fibSequence ENDP
END main
Also, if there are any other suggestions, I would be happy to hear them. I am new to all this and looking to learn as much as possible.
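The faulting address 0x00000099 matches the value the array is initialised with (99h), which suggests that mov eax, [eax+4] loads an array element into eax instead of advancing the pointer by one doubleword. A C sketch of the difference, and of what the procedure appears to be meant to do, follows. The name fill_fib is hypothetical, and starting the sequence at 1, 1 is an assumption based on the register setup in the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill out[0..count-1] with 1, 1, 2, 3, 5, ... */
static void fill_fib(uint32_t *out, size_t count)
{
    assert(out != NULL || count == 0);

    uint32_t prev = 0, cur = 1;   /* Fib(0) and Fib(1), like ecx and edx */
    uint32_t *p = out;            /* like mov eax, OFFSET arrayfib */

    for (size_t i = 0; i < count; i++) {
        *p = cur;                 /* store through the pointer */
        p = p + 1;                /* advance the pointer: add eax, 4 */
        /* mov eax, [eax+4] would instead be the equivalent of
           p = (uint32_t *)(uintptr_t)*(p + 1);
           so the stored value 99h becomes the next "address". */
        uint32_t next = prev + cur;
        prev = cur;
        cur = next;
    }
}
```

In the assembly, add eax, 4 (or lea eax, [eax+4]) would be the counterpart of p = p + 1. Note also that, as posted, fibLoop has no jump back to its label, so the loop body runs at most once.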

What's the point of the instruction "push 0FFFFFFFFh" after a new stack frame is established?

The instruction "push 0FFFFFFFFh" appears just after a new stack frame is established within the callee, e.g.,
push ebp
mov ebp,esp
push 0FFFFFFFFh <===HERE //[ebp-4] is set to 0FFFFFFFFh
push 0255F58h // SEH EXCEPTION_REGISTRATION.handler
mov eax,dword ptr fs:[00000000h]
push eax // SEH EXCEPTION_REGISTRATION.prev
sub esp,0D8h
push ebx
push esi
push edi
lea edi,[ebp-0E4h]
mov ecx,36h // fill count: 36h dwords of 0CCCCCCCCh
mov eax,0CCCCCCCCh
rep stos dword ptr es:[edi]
mov eax,dword ptr [__security_cookie (025A004h)]
xor eax,ebp
push eax
lea eax,[ebp-0Ch]
mov dword ptr fs:[00000000h],eax // Install new EXCEPTION_REGISTRATION
lea ecx,[intobj]
call A<int>::A<int> (0251389h)
mov dword ptr [ebp-4],0 //[ebp-4] is set to 0
call A<int>::PrintNum (025139Dh)
mov dword ptr [ebp-0E0h],0
mov dword ptr [ebp-4],0FFFFFFFFh //[ebp-4] is set to 0FFFFFFFFh again, then [ebp-4] keeps the value in this callee.
lea ecx,[intobj]
call A<int>::~A<int> (025138Eh)
mov eax,dword ptr [ebp-0E0h]
...
What's the point of this instruction "push 0FFFFFFFFh"?
[C++ Source Code]
[UPDATE] Apr 4 2018
Using WinDbg, I can confirm that the instruction "push 0FFFFFFFFh" (see "HERE" in the disassembly) has nothing to do with SEH, though I still don't know what the point of this instruction is.
0:000> dd fs:[0] l4
0053:00000000 0046fec0 00470000 0046d000 00000000
0:000> dt _Exception_registration_record 0046fec0
test!_EXCEPTION_REGISTRATION_RECORD
+0x000 Next : 0x0046ff28 _EXCEPTION_REGISTRATION_RECORD <== eax
+0x004 Handler : 0x00255F58 _EXCEPTION_DISPOSITION test!__scrt_stub_for_acrt_uninitialize+0 <== 0255F58h
0:000> dt _Exception_registration_record 0x0046ff28
test!_EXCEPTION_REGISTRATION_RECORD
+0x000 Next : 0x0046ff84 _EXCEPTION_REGISTRATION_RECORD
+0x004 Handler : 0x00283100 _EXCEPTION_DISPOSITION test!_except_handler4+0
0:000> dt _Exception_registration_record 0x0046ff84
test!_EXCEPTION_REGISTRATION_RECORD
+0x000 Next : 0xffffffff _EXCEPTION_REGISTRATION_RECORD
+0x004 Handler : 0x77875845 _EXCEPTION_DISPOSITION ntdll!_except_handler4+0
Since SEH is a linked list, each record actually holds two addresses.
The first one is the address of the next record (in case of chaining), or 0xFFFFFFFF (-1) if this is the last one. The second one is the actual SE handler.
An old but good read about SEH is "A Crash Course on the Depths of Win32™ Structured Exception Handling".
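The chain shown in the WinDbg output above can be modelled in C as follows. This is a simplified sketch: struct seh_record, END_OF_CHAIN and chain_length are made-up names that mirror the two fields shown by WinDbg (Next at +0x000, Handler at +0x004), not the real Windows definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the record printed by WinDbg. */
struct seh_record {
    struct seh_record *next;   /* next record in the chain, or END_OF_CHAIN */
    void *handler;             /* the handler for this record */
};

/* All bits set, i.e. 0xFFFFFFFF on 32-bit x86. */
#define END_OF_CHAIN ((struct seh_record *)(uintptr_t)-1)

/* Count the records reachable from head (on x86, head would be read from fs:[0]). */
static size_t chain_length(const struct seh_record *head)
{
    size_t n = 0;
    for (const struct seh_record *r = head; r != END_OF_CHAIN; r = r->next) {
        assert(r != NULL);
        n++;
    }
    return n;
}
```

For the dump in the question, the walk visits 0046fec0, 0046ff28 and 0046ff84, and stops because the last record's Next field is 0xffffffff.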

Function call with more than 4 registers ARM assembly

I am trying to pass r0-r5 into the function check. However, only the registers r0-r3 seem to be passed through. In my main function I have this code:
push {lr}
mov r0, #1
mov r1, #2
mov r2, #3
mov r3, #4
mov r4, #5
mov r5, #6
bl check
pop {lr}
bx lr
Inside my check function I have this code. It is in a separate file, in case that matters.
m: .asciz "%d, %d ~ (%d, %d, %d)\n"
...
push {lr}
ldr r0, =m
bl printf
pop {lr}
bx lr
The output for this is 2, 3 ~ (4, 33772, 1994545180). I am trying to learn assembly, so can you please explain the answer? From some googling I know I need to use the stack, but I am not sure how to use it and would like to learn how. Thanks in advance.
You could just try it and see:
void check ( unsigned int, unsigned int, unsigned int, unsigned int, unsigned int );
void call_check ( void )
{
check(1,2,3,4,5);
}
arm-linux-gnueabi-gcc -c -O2 check.c -o check.o
arm-linux-gnueabi-objdump -D check.o
00000000 <call_check>:
0: e52de004 push {lr} ; (str lr, [sp, #-4]!)
4: e3a03005 mov r3, #5
8: e24dd00c sub sp, sp, #12
c: e58d3000 str r3, [sp]
10: e3a00001 mov r0, #1
14: e3a01002 mov r1, #2
18: e3a02003 mov r2, #3
1c: e3a03004 mov r3, #4
20: ebfffffe bl 0 <check>
24: e28dd00c add sp, sp, #12
28: e8bd8000 ldmfd sp!, {pc}
Of course this could be hand-optimized and still work just fine. Maybe the extra 12-byte adjustment to the stack pointer is there to keep the stack on a 16-byte (4-word) boundary; I don't know. The EABI itself only requires 8-byte alignment at a call. Other than that, you can see that you naturally need to save the link register, since you are calling another function. r0-r3 are obvious, and then, per the EABI, the first thing on the stack is the fifth word's worth of parameters.
Likewise, for your check function you can simply let the compiler get you started. If you look at your code, r0 comes in as your first parameter and then you trash it by overwriting it with the first parameter for printf. printf needs 6 parameters passed in, so you need to move them over by one: the first parameter to check is the second parameter to printf, the second to check is the third to printf, and so on. The code has to do that shift, and two of the values now end up on the stack.
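As a starting point for that, here is a C sketch of a five-argument check that forwards its parameters to a printf-style call. The name format_check is hypothetical, and it writes into a buffer with snprintf instead of printing so that the result can be inspected; swapping in printf gives the same argument shift. Compiling something like this with the commands above and disassembling it shows the shift the compiler generates.

```c
#include <assert.h>
#include <stdio.h>

/* On entry (32-bit ARM EABI): a, b, c, d arrive in r0-r3 and e on the stack.
   For the call below, buf, size and the format string take r0-r2,
   a moves to r3, and b, c, d, e all go on the stack.
   With plain printf only the format string comes first, so a, b, c
   would be in r1-r3 and d, e on the stack. */
static int format_check(char *buf, size_t size, int a, int b, int c, int d, int e)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%d, %d ~ (%d, %d, %d)", a, b, c, d, e);
}
```

With the arguments 1, 2, 3, 4, 5 this produces the text 1, 2 ~ (3, 4, 5), which is what the call in the question was presumably meant to print.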

Using Functions in the LC3 Assembly Language

I need to know how to write a simple function in LC3 and use it in the main program.
It's just a matter of creating a label and then jumping to it with JSR. Once the subroutine is done, RET returns to the main code.
.orig x3000
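AND R0, R0, #0 ; clear R0
JSR FUNCTION ; call the subroutine; the return address is saved in R7
PUTc ; TRAP x21: print the character in R0 (10 = newline)
HALT ; TRAP x25
FUNCTION
ADD R0, R0, #10 ; add 10 to R0 (R0 was 0, so R0 = 10)
RET ; return back to the main code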
.end