Roughly how would the compiled version of this toy-language pseudocode work in assembly?

I am working on a toy language that now needs a compilation step. So far I have nested function calls, and I am wondering what the corresponding assembly-like pseudocode would look like. This includes:
Function prologue and epilogue.
Instruction "blocks".
How functions are reused.
Not real assembly code, just pseudocode (though I am sort of coming from x86).
The x86 "block" structure I'm referring to seems to be this:
my_function:
enter N, 0
; ... details
leave
ret
To keep it simple (avoiding async and all that), my toy code looks like this basically:
my_func_a(add(1, my_func_z(2, 3)), sub(my_func_r(4, 5, 6), 7), 8)
Or spread out:
my_func_a(
add(
1,
my_func_z(2, 3)
),
sub(
my_func_r(
4,
5,
6
),
7
),
8
)
When I try to translate this mentally into some sort of corresponding pseudo-assembly, I end up doing this:
u = my_func_r(4, 5, 6)
v = sub(u, 7)
w = my_func_z(2, 3)
x = add(1, w)
y = my_func_a(x, v, 8) ; final result
I'm not sure I got the order exactly right, but I think it's close. This still isn't as low-level as x86, though, so I try lowering it further:
mov r1, 4
mov r2, 5
mov r3, 6
call my_func_r
; how to capture the "u = ..."?
mov r1, u?
mov r2, 7
call sub
; how about capturing "v"?
mov r1, 2
mov r2, 3
call my_func_z
; w = ... somehow
mov r1, 1
mov r2, w
call add
; x = ... somehow
mov r1, x
mov r2, v
mov r3, 8
call my_func_a
; y...
I would try and use some sort of tool to compile some rough C-like language into LLVM, for example, but it's way over my head to get that all going and working at this stage.
Now let's say we have implementations for some of these functions:
my_func_r(a,b,c):
enter ...
add a, b
sub b, c
leave
ret
my_func_z(a,b):
enter ...
; ...
leave
ret
my_func_a(a,b,c):
enter ...
; ...
leave
ret
Basically what I'm wondering is: how should the final pseudocode be written? (I haven't specified exactly how everything works here; I only have it at a rough level at this point.) What would it roughly look like? I don't see how the values are placed and passed around in the lowest-level assembly version.
Where are the arguments passed into the functions: before the function call, or inside the function/block?
How do you capture the output values outside of the function?
Sorry if this pseudocode is full of errors; I am just beginning to put the pieces together, so please be gentle.
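To make the two questions above concrete, here is a minimal sketch (not a real compiler) that flattens the nested call into pseudo-assembly. It assumes a made-up convention that is not taken from any real ABI: arguments go into r1, r2, r3 just before each call, the result comes back in r0, and the caller immediately copies r0 into a temporary (t0, t1, ...) that would live in a stack slot or a spare register. On 32-bit x86 with the common cdecl convention, the stack and eax play those roles instead.

```python
from itertools import count

# The nested call from the question, written as (name, arg, arg, ...) tuples.
EXPR = ("my_func_a",
        ("add", 1, ("my_func_z", 2, 3)),
        ("sub", ("my_func_r", 4, 5, 6), 7),
        8)

def lower(expr, code, temps):
    """Emit pseudo-assembly for expr into code; return the operand holding its value."""
    if not isinstance(expr, tuple):
        return str(expr)                      # a literal is used directly as an operand
    name, *args = expr
    # Evaluate nested calls first (left to right) so their results sit in temporaries.
    operands = [lower(arg, code, temps) for arg in args]
    for i, operand in enumerate(operands, start=1):
        code.append(f"mov r{i}, {operand}")   # assumed convention: arguments in r1, r2, ...
    code.append(f"call {name}")
    temp = f"t{next(temps)}"
    code.append(f"mov {temp}, r0")            # assumed convention: result returned in r0
    return temp

code = []
result = lower(EXPR, code, count())
print("\n".join(code))
```

The arguments are placed by the caller right before each call, and the output is captured by the caller right after it. Copying r0 into a temporary matters because the next call overwrites both the argument registers and r0.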


ARM Cortex-M33 (TrustZone, Silabs EFM32PG22) - assembler hard-faults when accessing GPIO or almost any peripheral area, any hints?

I am lost with this code, which tries to configure the Silicon Labs EFM32PG22 bare-metal on their devkit, accessed through the on-board J-Link from SEGGER Embedded Studio (a great, fast IDE). I have a blink hello-world example in C working from their Simplicity Studio, but I was trying to achieve the same thing I did easily in pure assembler on the Microchip PIC32CM MC00 or SAMD21G17D, where only the clocks and startup were configured through the GUI in MPLAB X. Here I moved to the SEGGER IDE, which has no easy way to configure startup and clocks, or I haven't found it yet. At the hardware level, the registers of these Cortex parts differ by manufacturer; in C/C++ there is some (not cheap) unification through CMSIS, but I only want to know the minimum needed to get raw GPIO working after clock/startup setup. The SEGGER project is a generic Cortex-M project for the specific EFM32PG22, i.e. a Cortex-M33 with TrustZone security. I probably don't know what is locked or switched off, or which state the MCU is in (privileged or unprivileged). There are two sets of register mappings, but neither works. As soon as I try to store to, or even load from, the GPIO config registers (or the SMU registers, to query something), a HardFault exception is thrown. All of this is done with the SEGGER IDE debugger over the on-board J-Link. What am I doing wrong, and what is missing here?
In C, I have only this code:
extern void blink(void);
int main ( void )
{
blink();
}
In blink.s I have this:
;#https://github.com/hubmartin/ARM-cortex-M-bare-metal-assembler-examples/blob/master/02%20-%20Bare%20metal%20blinking%20LED/main.S
;#https://sites.google.com/site/hubmartin/arm/arm-cortex-bare-metal-assembly/02---arm-cortex-bare-metal-assembly-blinking-led
;#https://mecrisp-stellaris-folkdoc.sourceforge.io/projects/blink-f0disco-gdbtui/doc/readme.html
;#https://microcontrollerslab.com/use-gpio-pins-tm4c123g-tiva-launchpad/
;#!!! ENABLE GPIO CLOCK SOURCE ON EFM32 !!!
;#https://community.silabs.com/s/share/a5U1M000000knsWUAQ/hello-world-part-2-create-firmware-to-blink-the-led?language=en_US
;#EFM32 GPIO
;#https://www.silabs.com/documents/public/application-notes/an0012-efm32-gpio.pdf
;# ARM thumb2 ISA
;#https://www.engr.scu.edu/~dlewis/book3/docs/ARM_and_Thumb-2_Instruction_Set.pdf
;#https://sciencezero.4hv.org/index.php?title=ARM:_Cortex-M3_Thumb-2_instruction_set
;#!!! https://stackoverflow.com/questions/48561243/gnu-arm-assembler-changes-orr-into-movw
;#segger assembler
;#https://studio.segger.com/segger/UM20006_Assembler.pdf
;#https://www.segger.com/doc/UM20006_Assembler.html
;#!!! unfortunatelly, we dont know here yet how to include ASM SFR defines, nor for MPLAB ARM (Harmony) !!!
;##include <xc.h>
;##include "definitions.h"
.cpu cortex-m33
.thumb
.text
.section .text.startup.main,"ax",%progbits
.balign 2
.p2align 2,,3
.global blink
//.arch armv8-m.base
.arch armv6-m
.syntax unified
.code 16
.thumb_func
.fpu softvfp
.type blink, %function
//!!! here we have manually entered GPIO PORT defines for PIC32CM
.equ SYSCFG_BASE_ADDRESS, 0x50078000
.equ SMU_BASE_ADDRESS, 0x54008000
//.equ SMU_BASE_ADDRESS, 0x5400C000
.equ CMU_BASE_ADDRESS, 0x50008000
.equ GPIO_BASE_ADDRESS, 0x5003C000 // this differs totally from both "special" infineon and microchip "standard?" cortex devices !!!
.equ DELAY, 40000
// Vector table
.word 0x20001000 // Vector #0 - Stack pointer init value (0x20000000 is RAM address and 0x1000 is 4kB size, stack grows "downwards")
.word blink // Vector #1 - Reset vector - where the code begins
// Vector #3..#n - I don't use Systick and another interrupts right now
// so it is not necessary to define them and code can start here
blink:
LDR r0, =(SYSCFG_BASE_ADDRESS + 0x200) // SYSCFG SYSCFG_CTRL
LDR r1, =0 // 0 = disable address fault exceptions
ldr r1, [r0] // Loads SYSCFG_CTRL into r1 (overwrites the 0; a write would need STR)
LDR r0, =(CMU_BASE_ADDRESS) // CMU CMU_SYSCLKCTRL PCLKPRESC + CLKSEL
LDR r1, =0b10000000001 // FSRCO 20MHz + PCLK = HCLK/2 = 10MHz
STR r1, [r0, 0x70] // Store r1 to the address r0 + 0x70
LDR r0, =(CMU_BASE_ADDRESS) // CMU CMU_CLKEN0
LDR r1, [r0, 0x64]
LDR r2, =(1 << 25) // GPIO CLK EN
orrs r1, r2 // !!! HORROR !!! -- orr is not possible in thumb2 ?? only orrs !! (width suffix)
STR r1, [r0, 0x64] // Store r1 to the address r0 + 0x64
LDR r1, [r0, 0x68]
LDR r2, =(1 << 14) // SMU CLK EN
orrs r1, r2 // !!! HORROR !!! -- orr is not possible in thumb2 ?? only orrs !! (width suffix)
STR r1, [r0, 0x68] // Store r1 to the address r0 + 0x68
//LDR r0, =(SMU_BASE_ADDRESS) // SMU SMU_LOCK
//LDR r1, =11325013 // SMU UNLOCK CODE
//STR r1, [r0, 0x08] //Store R0 value to r1
ldr r0, =(SMU_BASE_ADDRESS) // SMU reading values, detection - AGAIN, HARD FAULTS !!!!!!!
ldr r1, [r0, 0x04]
ldr r1, [r0, 0x20]
ldr r1, [r0, 0x40]
//LDR r0, =(GPIO_BASE_ADDRESS + 0x300) // GPIO UNLOCK
//LDR r1, =0xA534
//STR r1, [r0] // Store R0 value to r1
//!! THIS BELOW IS OLD FOR SAMD , WE STILL SIMPLY CANT ENABLE GPIO !!!!
// Enable PORTA pin 4 as output
LDR r0, =(GPIO_BASE_ADDRESS) // DIR PORTA
LDR r1, =0b00000000000001000000000000000000
STR r1, [r0, 0x04] // Store r1 to the address r0 + 0x04
LDR R2, =1
loop:
// Write high to pin PA04
LDR r0, =GPIO_BASE_ADDRESS // OUT PORTA
LDR r1, =0b10000 // PORT_PA04
STR r1, [r0, 0x10] // Store R1 value to address pointed by R0
// Dummy counter to slow down my loop
LDR R0, =0
LDR R1, =DELAY
loop0:
ADD R0, R2
cmp R0, R1
bne loop0
// Write low to PA04
LDR r0, =GPIO_BASE_ADDRESS // OUT PORTA
LDR r1, =0b00000
STR r1, [r0, 0x10] // Store R1 value to address pointed by R0
// Dummy counter to slow down my loop
LDR R0, =0
LDR R1, =DELAY
loop1:
ADD R0, R2
cmp R0, R1
bne loop1
b loop
UPDATE: I have now tried it again in Simplicity Studio, placing the blink() call after the pre-generated system init:
extern void blink(void);
int main(void)
{
// Initialize Silicon Labs device, system, service(s) and protocol stack(s).
// Note that if the kernel is present, processing task(s) will be created by
// this call.
sl_system_init();
blink();
}
with this code in blink.s, and here it works and blinks:
.cpu cortex-m33
.thumb
.text
.section .text.startup.main,"ax",%progbits
.balign 2
.p2align 2,,3
.global blink
//.arch armv8-m.base
.arch armv6-m
.syntax unified
.code 16
.thumb_func
.fpu softvfp
.type blink, %function
/*
//!!! here we have manually entered GPIO PORT defines for PIC32CM
.equ SYSCFG_BASE_ADDRESS, 0x50078000
.equ SMU_BASE_ADDRESS, 0x54008000
//.equ SMU_BASE_ADDRESS, 0x5400C000
.equ CMU_BASE_ADDRESS, 0x50008000
*/
.equ GPIO_BASE_ADDRESS, 0x5003C000 // this differs totally from both "special" infineon and microchip "standard?" cortex devices !!!
.equ DELAY, 400000
// Vector table
.word 0x20001000 // Vector #0 - Stack pointer init value (0x20000000 is RAM address and 0x1000 is 4kB size, stack grows "downwards")
.word blink // Vector #1 - Reset vector - where the code begins
// Vector #3..#n - I don't use Systick and another interrupts right now
// so it is not necessary to define them and code can start here
blink:
// Enable PORTA pin 4 as output
LDR r0, =(GPIO_BASE_ADDRESS) // DIR PORTA
LDR r1, =0b00000000000001000000000000000000
STR r1, [r0, 0x04]
loop:
// Write high to pin PA04
LDR r0, =GPIO_BASE_ADDRESS // OUT PORTA
LDR r1, =0b10000 // PORT_PA04
STR r1, [r0, 0x10]
// Dummy counter to slow down my loop
LDR R0, =0
LDR R1, =DELAY
loop0:
ADD R0, R2
cmp R0, R1
bne loop0
// Write low to PA04
LDR r0, =GPIO_BASE_ADDRESS // OUT PORTA
LDR r1, =0b00000
STR r1, [r0, 0x10]
// Dummy counter to slow down my loop
LDR R0, =0
LDR R1, =DELAY
loop1:
ADD R0, R2
cmp R0, R1
bne loop1
b loop
So now I am just curious: what is missing in the pure assembly code to bring the Cortex-M33 into some "easy" state, ignoring TrustZone, so that it can be used much like, say, a plain Cortex-M3?
Can anybody help? I have been digging deep into this datasheet/reference manual, but no luck so far:
https://www.silabs.com/documents/public/reference-manuals/efm32pg22-rm.pdf
UPDATE AGAIN: I will try to figure it out. Traversing the system-init C code makes it clear what is going on; there are also some chip errata workarounds. I never touched the DCDC during initialization, so this may be the culprit:
void sl_platform_init(void)
{
CHIP_Init();
sl_device_init_nvic();
sl_board_preinit();
sl_device_init_dcdc();
sl_device_init_hfxo();
sl_device_init_lfxo();
sl_device_init_clocks();
sl_device_init_emu();
sl_board_init();
}
Well, okay: manufacturer-specific code generation for MCU startup really is an important and useful thing. MCUs from different manufacturers differ so much at the register level (even though all are Cortex-M based) that it is not worth configuring them manually in assembly if enough flash is available, and it mostly is. So far I have had no luck doing this properly for specific parts with the "generic" ARM/Cortex IDEs (SEGGER/Keil/IAR), so using the manufacturer-specific IDE to configure startup clocks and peripherals (mostly graphically) is crucial, or at least by far the easiest way. (I know, an expensive observation after all the assembly attempts.) After that, it is easy to write even a pure-assembly "blink" hello-world test that is called as an extern C function.
You may ask why I am still considering assembly when there are at least the CMSIS "platform abstraction layer" C headers on ARM. (They do not really help with abstraction, because the devices are still very different; you only get register symbol #defines, typedefs and enums that make things easier to do in C.) I am trying to compare C-compiled code with handwritten assembly for a specific purpose that needs an algorithm optimized from scratch. It is often easier to design that directly in assembly than to rely on C compiler optimizations, which are described in very complex terms: each compiler has its own long document on how its optimizations work, and at this level C is still too abstract and a moving target. This is even more true when you write for several MCU architectures (think ARM Cortex-M, PIC32/MIPS, or even PIC16/18, PIC24, AVR, MSP430, ...), each of which often has more than one C compiler. A general algorithm can instead be described in shared pseudo-assembly, as close to the hardware as possible, without knowing the optimization quirks of every architecture's C compiler(s).
So comparing C-compiler-generated code with handwritten assembly is doable. I have already tried such an assembly blink on many very different architectures, and in each case I used the manufacturer-specific IDE to generate the startup in C, using the GUI configuration and code generation down to an always-compilable empty C project; the resulting code size of course varies a lot between these generated startups. Most advanced MCUs are very complex, mainly in clock configuration and pin-function configuration, and then in the various peripherals. Similarities exist only within a single manufacturer, to some extent, so MCUs from one manufacturer often share a similar approach. The final solution is therefore to have the startup generated and then switch to assembly immediately; this is feasible. With a small flash it is possible to optimize even the startup code further, but that matters mostly on the smallest 8-bit parts, where startup is quite easy anyway or the generated code is already small.

Why does moving a function call 1 byte earlier cause it to malfunction? What needs to be modified to resolve it?

I am trying to create an inline code cave to modify a very old DOS game (ROTK2). However, far calls cannot simply be repositioned: moving the machine code of the function call 1 byte earlier causes it to malfunction. Which parameters do I need to readjust to correct the problem?
To be more specific, I need to free up 3 bytes of space to be able to add one assembly instruction. Hence I need to reposition code (a separate code cave may not be an option due to a lack of resources in the script, so I have to find the 3 bytes inline).
There are too many lines of code in the whole program, which I do not fully understand, but I am using the DOSBox debugger with breakpoints to pinpoint the relevant lines:
This part of the code prints the year, month and season at the current point in the game. From offset 4EE8, the two pushes of ax move the cursor to position (2,4) on the screen. The year is at [0x44]; the month number at [0x46] is loaded into bl and used to look up the month's text, which is stored somewhere else. Then the string is printed, and the process is repeated.
...
00004EDF 0E push cs
00004EE0 E80500 call 0x4ee8
00004EE3 0E push cs
00004EE4 E85101 call 0x5038
00004EE7 CB retf
00004EE8 B80400 mov ax,0x4
00004EEB 50 push ax
00004EEC B80200 mov ax,0x2
00004EEF 50 push ax
00004EF0 9A3404EF03 call 0x3ef:0x434
00004EF5 83C404 add sp,byte +0x4
00004EF8 FF364400 push word [0x44]
00004EFC 8A1E4600 mov bl,[0x46]
00004F00 2AFF sub bh,bh
00004F02 D1E3 shl bx,1
00004F04 FFB71C36 push word [bx+0x361c]
00004F08 B8B834 mov ax,0x34b8
00004F0B 50 push ax
00004F0C 9AE806EF03 call 0x3ef:0x6e8
00004F11 83C406 add sp,byte +0x6
00004F14 B80C00 mov ax,0xc
00004F17 50 push ax
00004F18 B80500 mov ax,0x5
00004F1B 50 push ax
00004F1C 9A3404EF03 call 0x3ef:0x434
00004F21 83C404 add sp,byte +0x4
00004F24 A04600 mov al,[0x46]
00004F27 B103 mov cl,0x3
00004F29 2AE4 sub ah,ah
00004F2B F6F1 div cl
00004F2D 8AD8 mov bl,al
00004F2F 2AFF sub bh,bh
00004F31 D1E3 shl bx,1
00004F33 FFB7D634 push word [bx+0x34d6]
00004F37 B8CC34 mov ax,0x34cc
00004F3A 50 push ax
00004F3B 9AE806EF03 call 0x3ef:0x6e8
00004F40 83C404 add sp,byte +0x4
00004F43 CB retf
....
I can make the code more compact like this without breaking it:
...
00004EDF 0E push cs
00004EE0 E80500 call 0x4ee8
00004EE3 0E push cs
00004EE4 E85101 call 0x5038
00004EE7 CB retf
00004EE8 B80400 mov ax,0x4
00004EEB 50 push ax
00004EEC D1E8 shr ax,1
00004EEE 50 push ax ; conveniently, the previous value was exactly double
00004EEF 90 nop
00004EF0 9A3404EF03 call 0x3ef:0x434
00004EF5 83C404 add sp,byte +0x4
00004EF8 FF364400 push word [0x44]
00004EFC 8A1E4600 mov bl,[0x46]
00004F00 2AFF sub bh,bh
00004F02 D1E3 shl bx,1
00004F04 FFB71C36 push word [bx+0x361c]
00004F08 B8B834 mov ax,0x34b8
00004F0B 50 push ax
00004F0C 9AE806EF03 call 0x3ef:0x6e8
00004F11 83C406 add sp,byte +0x6
00004F14 B80C00 mov ax,0xc
00004F17 50 push ax
00004F18 B80500 mov ax,0x5
00004F1B 50 push ax
00004F1C 9A3404EF03 call 0x3ef:0x434
00004F21 83C404 add sp,byte +0x4
00004F24 A04600 mov al,[0x46]
00004F27 B103 mov cl,0x3
00004F29 2AE4 sub ah,ah
00004F2B F6F1 div cl
00004F2D 8AD8 mov bl,al
00004F2F 2AFF sub bh,bh
00004F31 D1E3 shl bx,1
00004F33 FFB7D634 push word [bx+0x34d6]
00004F37 B8CC34 mov ax,0x34cc
00004F3A 50 push ax
00004F3B 9AE806EF03 call 0x3ef:0x6e8
00004F40 83C404 add sp,byte +0x4
00004F43 CB retf
....
But once I shift the function call one byte earlier (placing call 0x3ef:0x434 before the nop instead of after it), everything ceases to function.
With the nop first (DOSBox logger output):
07D2:0000039F nop EAX:00000002 EBX:00000054 ECX:00000000 EDX:000003CF ESI:00003431 EDI:00000107 EBP:0000E424 ESP:0000E416 DS:32C3 ES:A000 FS:0000 GS:0000 SS:32C3 CF:0 ZF:0 SF:0 OF:0 AF:1 PF:0 IF:1
07D2:000003A0 call 070C:0434 EAX:00000002 EBX:00000054 ECX:00000000 EDX:000003CF ESI:00003431 EDI:00000107 EBP:0000E424 ESP:0000E416 DS:32C3 ES:A000 FS:0000 GS:0000 SS:32C3 CF:0 ZF:0 SF:0 OF:0 AF:1 PF:0 IF:1
Without the nop in front:
07D2:0000039F call 20EF:0434 EAX:00000002 EBX:00000054 ECX:00000000 EDX:000003CF ESI:00003431 EDI:00000107 EBP:0000E424 ESP:0000E416 DS:32C3 ES:A000 FS:0000 GS:0000 SS:32C3 CF:0 ZF:0 SF:0 OF:0 AF:1 PF:0 IF:1
I have been reading about assembly, bp, sp, ss and so on, but I am still stuck. Hence the question: how do I reposition function calls without corrupting the code?
DOS doesn't load programs at a constant address, so the far calls have relocation entries that also need changing if one is modifying an executable.
Also, the moved call instruction may be the destination of a call or jmp instruction, which then also needs changing.
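The logged values are consistent with a stale relocation entry. The following is a small simulation under stated assumptions, not a patching tool: it assumes the loader adds the load segment to one 16-bit word at a fixed position, and it infers the load segment as 0x070C - 0x03EF = 0x031D from the logged call 070C:0434. The helper names are made up for this illustration.

```python
import struct

def apply_relocation(image, reloc_offset, load_segment):
    """Mimic the DOS loader: add the load segment to the word at reloc_offset."""
    (word,) = struct.unpack_from("<H", image, reloc_offset)
    struct.pack_into("<H", image, reloc_offset, (word + load_segment) & 0xFFFF)

def far_call_target(image, call_offset):
    """Decode 'call seg:off': opcode 0x9A, then the offset word, then the segment word."""
    assert image[call_offset] == 0x9A
    off, seg = struct.unpack_from("<HH", image, call_offset + 1)
    return seg, off

LOAD_SEGMENT = 0x070C - 0x03EF   # 0x031D, inferred from the log, not read from the EXE
BASE = 0x4EEF                    # listing address of the first byte modelled here
RELOC = 0x4EF3 - BASE            # segment word of the original call at 0x4EF0

nop_first = bytearray.fromhex("909A3404EF03")    # nop, then call 0x3ef:0x434
call_first = bytearray.fromhex("9A3404EF0390")   # call moved one byte earlier, then nop
fixed = bytearray.fromhex("9A3404EF0390")        # same bytes, relocation entry moved too

apply_relocation(nop_first, RELOC, LOAD_SEGMENT)
apply_relocation(call_first, RELOC, LOAD_SEGMENT)   # stale entry hits bytes 03 90
apply_relocation(fixed, RELOC - 1, LOAD_SEGMENT)    # entry adjusted by the same 1 byte

for name, image, at in (("nop first", nop_first, 1),
                        ("call first", call_first, 0),
                        ("entry fixed", fixed, 0)):
    seg, off = far_call_target(image, at)
    print(f"{name}: call {seg:04X}:{off:04X}")
```

With the stale entry, the loader patches the word made of the last segment byte and the nop, which reproduces the 20EF segment from the log and also overwrites the nop. Moving the relocation entry by the same distance as the call restores 070C:0434 in this model.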

Delphi - Write a .pas library with functions

I'm writing some functions in Delphi using assembly, and I want to put them in a .pas file called Strings.pas so that I can list it in the uses clause of a new Delphi program. What do I need to write to make it a valid library?
My function is like this:
function Strlen(texto : string) : integer;
begin
asm
mov esi, texto
xor ecx,ecx
cld
@here:
inc ecx
lodsb
cmp al,0
jne @here
dec ecx
mov Result,ecx
end;
end;
That counts the number of chars in the string. How can I put it in a library Strings.pas that I can call with uses Strings; in my form?
A .pas file is a unit, not a library. A .pas file needs to have unit, interface, and implementation statements, eg:
Strings.pas:
unit Strings;
interface
function Strlen(texto : string) : integer;
implementation
function Strlen(texto : string) : integer;
asm
// your assembly code...
// See Note below...
end;
end.
Then you can add the .pas file to your other projects and use the Strings unit as needed. It will be compiled directly into each executable. You don't need to make a separate library out of it. But if you want to, you can. Create a separate Library (DLL) or Package (BPL) project, add your .pas file to it, and compile it into an executable file that you can then reference in your other projects.
In the case of a DLL library, you will not be able to use the Strings unit directly. You will have to export your function(s) from the library (and string is not a safe data type to pass over a DLL boundary between modules), eg:
Mylib.dpr:
library Mylib;
uses
Strings;
exports
Strings.Strlen;
begin
end.
And then you can have your other projects declare the function(s) using external clause(s) that reference the DLL file, eg:
function Strlen(texto : PChar) : integer; external 'Mylib.dll';
In this case, you can make a wrapper .pas file that declares the functions to import, add that unit to your other projects and use it as needed, eg:
StringsLib.pas:
unit StringsLib;
interface
function Strlen(texto : PChar) : integer;
implementation
function Strlen; external 'Mylib.dll';
end.
In the case of a Package, you can use the Strings units directly. Simply add a reference to the package's .bpi in your other project's Requires list in the Project Manager, and then use the unit as needed. In this case, string is safe to pass around.
Note: in the assembly code you showed, for the function to not cause an access violation, you need to save and restore the ESI register. See the section on Register saving conventions in the Delphi documentation.
The correct asm version may be:
unit MyStrings; // do not overlap Strings.pas unit
interface
function StringLen(const texto : string) : integer;
implementation
function StringLen(const texto : string) : integer;
asm
test eax,eax
jz @done
mov eax,dword ptr [eax-4]
@done:
end;
end.
Note that:
I used MyStrings as unit name, since it is a very bad idea to overlap the official RTL unit names, like Strings.pas;
I wrote (const texto: string) instead of (texto: string), to avoid a reference count change at calling;
Delphi string type already has its length stored as integer just before the character memory buffer;
In Delphi asm calling conventions, the input parameters are set in eax edx ecx registers, and the integer result of a function is the eax register - see this reference article - for Win32 only;
I tested for texto to be nil (eax=0), which stands for a void '' string;
This would work only under Win32 - asm code under Win64 would be different;
Built-in length() function would be faster than an asm sub-function, since it is inlined in new versions of Delphi;
Be aware of potential name collisions: there is already a well known StrLen() function, which expects a PChar as input parameter - so I renamed your function as StringLen().
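The point about the stored length can be modelled outside Delphi. The sketch below is only an illustration with a simplified, assumed 8-byte header (reference count, then length) placed directly before the characters; newer Delphi versions put more fields in front of these, but the length still sits in the 4 bytes just before the character data. The names make_string and string_len are made up for this example.

```python
import struct

HEADER = struct.Struct("<ii")   # simplified header: reference count, then length

def make_string(text):
    """Return (memory, pointer); pointer is the index of the first character."""
    memory = HEADER.pack(1, len(text)) + text + b"\x00"
    return memory, HEADER.size

def string_len(memory, pointer):
    """Return 0 for a nil pointer, else the 32-bit length stored at pointer - 4."""
    if pointer is None:          # nil stands for the empty string ''
        return 0
    (length,) = struct.unpack_from("<i", memory, pointer - 4)
    return length

memory, pointer = make_string(b"hello")
print(string_len(memory, pointer))   # length read from the header, no scan needed
print(string_len(b"", None))         # nil pointer case
```

No loop over the characters is needed, which is why this version does constant work regardless of the string length.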
Since you want to learn asm, here are some reference implementation of this function.
A fast PChar oriented version may be :
function StrLen(S: PAnsiChar): integer;
asm
test eax,eax
mov edx,eax
jz @0
xor eax,eax
@s: cmp byte ptr [eax+edx+0],0; je @0
cmp byte ptr [eax+edx+1],0; je @1
cmp byte ptr [eax+edx+2],0; je @2
cmp byte ptr [eax+edx+3],0; je @3
add eax,4
jmp @s
@1: inc eax
@0: ret
@2: add eax,2; ret
@3: add eax,3
end;
A more optimized version:
function StrLen(S: PAnsiChar): integer;
// pure x86 function (if SSE2 not available) - faster than SysUtils' version
asm
test eax,eax
jz @@z
cmp byte ptr [eax+0],0; je @@0
cmp byte ptr [eax+1],0; je @@1
cmp byte ptr [eax+2],0; je @@2
cmp byte ptr [eax+3],0; je @@3
push eax
and eax,-4 { DWORD Align Reads }
@@Loop:
add eax,4
mov edx,[eax] { 4 Chars per Loop }
lea ecx,[edx-$01010101]
not edx
and edx,ecx
and edx,$80808080 { Set Byte to $80 at each #0 Position }
jz @@Loop { Loop until any #0 Found }
@@SetResult:
pop ecx
bsf edx,edx { Find First #0 Position }
shr edx,3 { Byte Offset of First #0 }
add eax,edx { Address of First #0 }
sub eax,ecx { Returns Length }
@@z: ret
@@0: xor eax,eax; ret
@@1: mov eax,1; ret
@@2: mov eax,2; ret
@@3: mov eax,3
end;
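The core of the loop above is the test (x - $01010101) and (not x) and $80808080, which is non-zero exactly when the 32-bit word x contains a zero byte. As a rough model of that one step (not of the whole routine), the Python sketch below reproduces the same arithmetic on little-endian 4-byte chunks; the function names are invented for the illustration.

```python
MASK32 = 0xFFFFFFFF

def zero_byte_flags(dword):
    """Mask whose lowest set bit marks the first zero byte of the 32-bit dword."""
    return (dword - 0x01010101) & ~dword & 0x80808080 & MASK32

def first_zero_byte(dword):
    """Index (0..3) of the first zero byte, or None if the dword has none."""
    flags = zero_byte_flags(dword)
    if flags == 0:
        return None
    lowest_bit = (flags & -flags).bit_length() - 1   # same role as bsf edx,edx
    return lowest_bit >> 3                           # same role as shr edx,3

def load_dword(chunk):
    """Read 4 bytes the way 'mov edx,[eax]' does on little-endian x86."""
    return int.from_bytes(chunk, "little")

for chunk in (b"abcd", b"ab\x00d", b"\x00abc", b"abc\x00"):
    print(chunk, first_zero_byte(load_dword(chunk)))
```

Bits above the first zero byte can be set spuriously by the borrow of the subtraction, which is why only the lowest set bit is used.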
An SSE2 optimized version:
function StrLen(S: PAnsiChar): integer;
asm // from GPL strlen32.asm by Agner Fog - www.agner.org/optimize
or eax,eax
mov ecx,eax // copy pointer
jz @null // returns 0 if S=nil
push eax // save start address
pxor xmm0,xmm0 // set to zero
and ecx,0FH // lower 4 bits indicate misalignment
and eax,-10H // align pointer by 16
movdqa xmm1,[eax] // read from nearest preceding boundary
pcmpeqb xmm1,xmm0 // compare 16 bytes with zero
pmovmskb edx,xmm1 // get one bit for each byte result
shr edx,cl // shift out false bits
shl edx,cl // shift back again
bsf edx,edx // find first 1-bit
jnz @A200 // found
// Main loop, search 16 bytes at a time
@A100: add eax,10H // increment pointer by 16
movdqa xmm1,[eax] // read 16 bytes aligned
pcmpeqb xmm1,xmm0 // compare 16 bytes with zero
pmovmskb edx,xmm1 // get one bit for each byte result
bsf edx,edx // find first 1-bit
// (moving the bsf out of the loop and using test here would be faster
// for long strings on old processors, but we are assuming that most
// strings are short, and newer processors have higher priority)
jz @A100 // loop if not found
@A200: // Zero-byte found. Compute string length
pop ecx // restore start address
sub eax,ecx // subtract start address
add eax,edx // add byte index
@null:
end;
Or even a SSE4.2 optimized version:
function StrLen(S: PAnsiChar): integer;
asm // warning: may read up to 15 bytes beyond the string itself
or eax,eax
mov edx,eax // copy pointer
jz @null // returns 0 if S=nil
xor eax,eax
pxor xmm0,xmm0
{$ifdef HASAESNI}
pcmpistri xmm0,dqword [edx],EQUAL_EACH // comparison result in ecx
{$else}
db $66,$0F,$3A,$63,$02,EQUAL_EACH
{$endif}
jnz @loop
mov eax,ecx
@null: ret
@loop: add eax,16
{$ifdef HASAESNI}
pcmpistri xmm0,dqword [edx+eax],EQUAL_EACH // comparison result in ecx
{$else}
db $66,$0F,$3A,$63,$04,$10,EQUAL_EACH
{$endif}
jnz @loop
@ok: add eax,ecx
end;
You will find all those functions, including Win64 versions, in our very optimized SynCommons.pas unit, which is shared by almost all our Open Source projects.
My two solutions for getting the length of two types of string are, as Peter Cordes says, not both useful. Only PAnsiCharLen() could be an alternative solution, but it is not as fast as Arnaud Bouchez's (optimized) StrLen(), which is about 3 times faster than mine.
10/14/2017 (mm/dd/yyyy): Added one new function (Clean_Str).
For now, however, I propose three small corrections to both of them (two suggested by Peter Cordes: 1) use MovZX instead of Mov and And; 2) use SetZ/SetE instead of LAHF/ShL, and XOr EAX,EAX instead of XOr AL,AL). In the future I could define the functions in assembly (for now they are defined in Pascal):
unit MyStr;
{ Some strings' function }
interface
Function PAnsiCharLen(S:PAnsiChar):Integer;
{ Get the length of the PAnsiChar^ string. }
Function ShortStrLen(S:ShortString):Integer;
{ Get the length of the ShortString^ string. }
Procedure Clean_Str(Str:ShortString;Max_Len:Integer);
{ This procedure can be used to clear the unused space of a short string
without modifying its useful content (for example, if you save a
short-string field in a file, files with the same content may otherwise
differ, because the unused space is not initialized).
Clears a string Str_Ptr^: String[], which has
Max_Len = SizeOf(String[]) - 1 characters, by setting to #0
all characters beyond the position Str_Ptr^[Str_Ptr^[0]] }
implementation
Function PAnsiCharLen(S:PAnsiChar):Integer;
{ EAX EDX ECX are 1°, 2° AND 3° PARAMETERs.
Can freely modify the EAX, ECX, AND EDX REGISTERs. }
Asm
ClD {Clear string direction flag}
Push EDI {Save EDI's reg. into the STACK}
Mov EDI,S {Load S into EDI's reg.}
XOr EAX,EAX {Set AL's reg. with null terminator}
Mov ECX,-1 {Set ECX's reg. with maximum length of the string}
RepNE ScaSB {Search null and decrease ECX's reg.}
SetE AL {AL is set with FZero}
Add EAX,ECX {EAX= maximum_length_of_the_string - real_length_of_the_string}
Not EAX {EAX= real_length_of_the_string}
Pop EDI {Restore EDI's reg. from the STACK}
End;
Function ShortStrLen(S:ShortString):Integer; Assembler;
{ EAX EDX ECX are 1°, 2° AND 3° PARAMETERs.
Can freely modify the EAX, ECX, AND EDX REGISTERs. }
Asm
MovZX EAX,Byte Ptr [EAX] {Load the length of S^ into EAX's reg. (function's result)}
End;
Procedure Clean_Str(Str:ShortString;Max_Len:Integer); Assembler;
(* EAX EDX ECX are 1°, 2° AND 3° PARAMETERs.
Can freely modify the EAX, ECX, AND EDX REGISTERs. *)
Asm
ClD {Clear string direction flag}
Push EDI {Save EDI's reg. into the STACK}
Mov EDI,Str {Load input string pointer into EDI's reg.}
Mov ECX,Max_Len {Load allocated string length into ECX's reg.}
MovZX EDX,Byte Ptr [EDI] {Load real string length into EDX's reg.}
StC {Process the address of unused space of Str; ...}
AdC EDI,EDX {... skip first byte and useful Str space}
Cmp EDX,ECX {If EDX>ECX ...}
CMovGE EDX,ECX {... set EDX with ECX}
Sub ECX,EDX {ECX contains the size of unused space of Str}
XOr EAX,EAX {Clear accumulator}
Rep StoSB {Fill with 0 the unused space of Str}
Pop EDI {Restore EDI's reg. from the STACK}
End;
end.
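The arithmetic at the end of PAnsiCharLen() (SetE, Add, Not) is easy to get wrong, so here is a small register-level model of it. This is only a sketch that assumes 32-bit wrap-around registers and a byte buffer that contains a terminator; it follows the instruction comments above step by step and is not a replacement for the asm.

```python
MASK32 = 0xFFFFFFFF

def pansichar_len(memory):
    """Model of: Mov ECX,-1 / RepNE ScaSB / SetE AL / Add EAX,ECX / Not EAX."""
    ecx = MASK32                      # Mov ECX,-1
    edi = 0                           # EDI points at the first character
    zf = False
    while ecx != 0:                   # RepNE ScaSB: scan until AL (0) is found
        zf = memory[edi] == 0
        edi += 1
        ecx = (ecx - 1) & MASK32      # ECX is decremented on every byte, match included
        if zf:
            break
    eax = 1 if zf else 0              # XOr EAX,EAX then SetE AL
    eax = (eax + ecx) & MASK32        # Add EAX,ECX  -> -(length + 1)
    return ~eax & MASK32              # Not EAX      -> length

print(pansichar_len(b"hello\x00"))
print(pansichar_len(b"\x00"))
```

For a string of length n the scan consumes n + 1 bytes, so ECX ends at -(n + 2); adding the 1 from SetE and inverting all bits yields n.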
Old (incomplete) answer:
"Some new string functions, not present in the Delphi library, could be these:"
Type Whole=Set Of Char;
Procedure AsmKeepField (PStrIn,PStrOut:Pointer;FieldPos:Byte;
All:Boolean);
{ Given "field" as a sequence of characters that does not contain spaces
or tabs (# 32, # 9), it takes FieldPos (1..N) field
to PStrIn ^ (STRING) and copies it to PStrOut ^ (STRING).
If All = TRUE, it also takes all subsequent fields }
Function AsmUpCComp (PStr1,PStr2:Pointer):Boolean;
{ Compare a string PStr1 ^ (STRING) with a string PStr2 ^ (STRING),
considering the PStr1 alphabetic characters ^ always SHIFT }
Function UpCaseStrComp (Str1,Str2:String;Mode:Boolean):ShortInt;
{ Returns: -1 if Str1 < Str2.
0 if Str1 = Str2.
1 if Str1 > Str2.
MODE = FALSE means "case sensitive comparison" (the letters are
consider them as they are).
MODE = TRUE means that the comparison is done by considering
both strings as if they were all uppercase }
Function KeepLS (Str:String;CntX:Byte):String;
{ RETURN THE PART OF STR THAT INCLUDES THE FIRST CHARACTER
OF STR AND ALL THE FOLLOW UP TO THE POSITION CntX (0 to N-1) INCLUDED }
Function KeepRS (Str:String;CntX,CsMode:Byte):String;
{ RETURN THE PART OF STR STARTING TO POSITION CntX + 1 (0 to N-1)
UP TO END OF STR.
IF CsMode = 0 (INSERT MODE), IF CsMode = 1 (OVERWRITE-MODE):
IN THIS CASE, THE CHARACTER TO CntX + 1 POSITION IS NOT INCLUDED }
Function GetSubStr (Str:String;
Pos,Qnt:Byte;CH:Char):String;
{ RETURN Qnt STR CHARACTERS FROM POSITION Pos (1 to N) OF STR;
IF EFFECTIVE LENGTH IS LESS THAN Qnt, WILL ADDED CHARACTER = CH }
Function Keep_Right_Path_Str_W(PathName:String;FieldWidth:Byte;
FormatWithSpaces:Boolean):String;
{ RESIZE A STRING OF A FILE PATH, FROM PathName;
THE NEW STRING WILL HAVE A MAXIMUM LENGTH OF FieldWidth CHARACTERS.
REPLACE EXCEDENT CHARACTERS WITH 3 POINTS,
INSERTED AFTER DRIVE AND ROOT.
REPLACE SOME DIRECTORY WITH 3 POINTS,
ONLY WHEN IT IS NECESSARY, POSSIBLE FROM SECOND.
FORMAT RETURN WITH SPACE ONLY IF FormatWithSpaces = TRUE }
Function KeepBarStr (Percentage,Qnt:Byte;
Ch1,Ch2,Ch3:Char):String;
{ THIS IS A FUNCTION WHICH MAKES A STRING WHICH CONTAINS A REPRESENTATION OF STATE
OF ADVANCEMENT OF A PROCESS; IT RETURNS A CHARACTERS' SEQUENCE, CONSTITUTED BY "<Ch1>"
(LENGTH = Percentage / 100 * Qnt), WITH AN APPROXIMATION OF THE LAST CHARACTER TO
"<Ch2>" (IF "Percentage / 100 * Qnt" HAS HIS FRACTIONAL'S PART GREATER THAN 0.5),
FOLLOWED BY AN OTHER CHARACTERS' SEQUENCE, CONSTITUTED BY "<Ch3>" (LENGTH = (100 -
Percentage) / 100 * Qnt). }
Function Str2ChWhole (Str:String;Var StrIndex:Byte;
Var ChSet:Whole;
Mode:Boolean):Boolean;
{ CONVERT A PART OF Str, POINTED BY StrIndex, IN A ChSet CHARACTER SET;
IF Mode = TRUE, "StrIn" SHOULD CONTAIN ASCII CODES
OF CORRESPONDING CHARACTERS EXPRESSED IN DECIMAL SIZE;
OTHERWISE IT SHOULD CONTAIN CORRESPONDING CHARACTER SYMBOLS }
Function ChWhole2Str (ChSet:Whole;Mode:Boolean):String;
{ CONVERT A SET OF CHARACTERS IN A CORRESPONDING STRING;
IF Mode = TRUE ELEMENTS OF ChSet WILL BE CONVERTED IN ASCII CODES
EXPRESSED IN DECIMAL SIZE; OTHERWISE THE CORRESPONDING SYMBOLS
WILL BE RETURNED }
Function ConverteFSize (FSize:LongInt;
Var SizeStr:TSizeStr):Integer;
{ MAKES THE CONVERSION OF THE DIMENSION OF A FILE IN A TEXT,
LARGE TO MAXIMUM 5 CHARACTERS, AND RETURN THE COLOR OF THIS STRING }
Function UpCasePos (SubStr,Str:String):Byte;
{ Like the Pos () system function, but not "case sensitive" }

Assembly program that identifies whether parameters are different or the same

Hi, I am working on an assembly (technically HLA, High Level Assembly) assignment and I have a bug that I need help with. Here is the assignment: Write an HLA Assembly language program that implements a function which correctly identifies whether all the parameters are different, returning either 0 or 1 in EAX depending on whether this condition has been met. This function should have the following signature:
procedure allDifferent( x: int16; y : int16; z : int16 ); #nodisplay; #noframe;
Shown below is a sample program dialogue.
Feed Me X: 205
Feed Me Y: 170
Feed Me Z: 91
allDifferent returns true!
Feed Me X: 0
Feed Me Y: 0
Feed Me Z: 0
allDifferent returns false!
Feed Me X: 121
Feed Me Y: 121
Feed Me Z: 121
allDifferent returns false!
Here is the code I have. My problem is that regardless of what numbers I put in, it always returns "allDifferent returns false!" Thank you for the help.
program allDifferent;
#include( "stdlib.hhf" );
static
iDataValue1 : int16 := 0;
iDataValue2 : int16 := 0;
iDataValue3 : int16 := 0;
iDataValue4 : int16 := 0;
procedure allDiff( x: int16; y : int16; z : int16 ); #nodisplay; #noframe;
static
returnAddress : dword;
temp : int16;
begin allDiff;
pop(returnAddress);
pop(z);
pop(y);
pop(x);
pop(temp);
push(returnAddress);
push(AX);
push(BX);
mov(x, AX);
cmp(y, AX);
je xyequal;
jmp notequal;
xyequal:
mov(y, BX);
cmp(z, BX);
je equal;
jmp notequal;
equal:
mov(0, EAX);
jmp ExitSequence;
notequal:
mov(1, EAX);
jmp ExitSequence;
ExitSequence:
pop(BX);
pop(AX);
ret();
end allDiff;
begin allDifferent;
stdout.put( "Gimme a X:" );
stdin.get( iDataValue1 );
stdout.put("Gimme a Y:");
stdin.get(iDataValue2);
stdout.put("Gimme a Z:");
stdin.get(iDataValue3);
push( iDataValue1 );
push( iDataValue2 );
push( iDataValue3 );
push( iDataValue4 );
call allDiff;
cmp(EAX, 1);
je ISDIFFERENT;
jmp NOTDIFFERENT;
ISDIFFERENT:
stdout.put("allDifferent returns true",nl);
jmp EndProgram;
NOTDIFFERENT:
stdout.put("allDifferent returns false",nl);
jmp EndProgram;
stdout.newln();
EndProgram:
end allDifferent;
notequal:
mov(1, EAX); <<- good.
jmp ExitSequence;
:
ExitSequence:
pop(BX);
pop(AX); <<- not so good.
ret();
Have a close look at what's happening to AX in the above sequence. Even though you set it to something within the code, you overwrite that value with the pop instruction, reverting AX to whatever it was when you entered the function.
Assembler functions should generally preserve and restore registers that may be being used by the callers, but not when you want to use that register to return some useful piece of information.
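To make the effect concrete, here is a minimal C sketch that models the two registers as plain struct fields. The `Regs` type and both function names are hypothetical and exist only for illustration; real HLA code works on the actual stack and registers, so treat this as a simplified model rather than what the assembler does.

```c
#include <stdint.h>

/* Hypothetical two-register machine state; stands in for AX and BX. */
typedef struct {
    uint16_t ax;
    uint16_t bx;
} Regs;

/* Mirrors the exit sequence in the question: both registers are saved on
   entry and both are restored on exit, so the result written to AX is lost. */
uint16_t exit_restoring_ax(Regs *r)
{
    uint16_t saved_ax = r->ax;   /* push(AX) */
    uint16_t saved_bx = r->bx;   /* push(BX) */

    r->bx = 0;                   /* scratch use of BX inside the function */
    r->ax = 1;                   /* intended return value */

    r->bx = saved_bx;            /* pop(BX) */
    r->ax = saved_ax;            /* pop(AX): overwrites the 1 */
    return r->ax;
}

/* Same body, but only BX is saved and restored; AX carries the result out. */
uint16_t exit_keeping_ax(Regs *r)
{
    uint16_t saved_bx = r->bx;

    r->bx = 0;
    r->ax = 1;

    r->bx = saved_bx;
    return r->ax;
}
```

With AX holding 7 on entry, the first variant hands back 7 and the second hands back 1, while BX is unchanged for the caller in both cases.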
In addition, your parameters are not being treated correctly. You push them in the order {p1, p2, p3, junk} (not sure why you have a fourth parameter since you don't use it for anything).
But, within the function, you pop in the order {z, y, x, temp}. Now, because the stack is a LIFO (last in, first out) structure, the mappings will be:
junk -> z
p3 -> y
p2 -> x
p1 -> temp
That means the z variable will be set to the junk value rather than one of the "real" parameters you passed in, and p1 never reaches x, y, or z at all.
If you're not going to use that fourth parameter, I'd suggest getting rid of it. If you do want to use it at some point, you'll need to correlate your push and pop operations so you get the correct values.
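If you want to check the LIFO mapping for yourself, the following C sketch replays the push order from the main program and the pop order from the allDiff listing in the question (z, y, x, temp, after the return address has been removed). The `WordStack` and `Popped` types and the function names are hypothetical helpers, not part of HLA.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size stack of 16-bit words, used only to show the
   order in which pushed values come back out. */
typedef struct {
    int16_t slots[8];
    size_t  top;        /* number of values currently stored */
} WordStack;

static void ws_push(WordStack *s, int16_t v) { s->slots[s->top++] = v; }
static int16_t ws_pop(WordStack *s) { return s->slots[--s->top]; }

/* The four variables the function in the question pops into. */
typedef struct {
    int16_t x, y, z, temp;
} Popped;

/* Push in the caller's order, then pop in the callee's order. */
Popped push_then_pop(int16_t p1, int16_t p2, int16_t p3, int16_t junk)
{
    WordStack s = { {0}, 0 };
    Popped r;

    ws_push(&s, p1);
    ws_push(&s, p2);
    ws_push(&s, p3);
    ws_push(&s, junk);

    r.z    = ws_pop(&s);   /* last pushed comes out first: junk */
    r.y    = ws_pop(&s);   /* p3 */
    r.x    = ws_pop(&s);   /* p2 */
    r.temp = ws_pop(&s);   /* p1 ends up in the unused variable */
    return r;
}
```

Calling `push_then_pop(205, 170, 91, 0)` leaves 205 in `temp`, so the first value typed in never takes part in the comparison.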
As an aside, you could probably also make your code a lot cleaner in a couple of ways.
First, there's no real need to use (or save/restore) BX since AX is used locally (in a small mov/cmp block). You could use AX both for the xy check and the yz check.
Second, you could get rid of quite a few of the jumps that aren't actually needed. The pseudo-code for your algorithm can boil down to a very simple:
if x and y are same, go to NOTDIFF.
if y and z are same, go to NOTDIFF.
DIFF:
set AX to 1
go to END
NOTDIFF:
set AX to 0
END:
return
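For comparison, here is the same control flow as a small C function. This is only a sketch: the name `all_different` is made up, and it assumes that "all different" means pairwise distinct, which needs a third comparison (x against z) that the pseudo-code above does not include.

```c
#include <stdint.h>

/* Returns 1 when x, y and z are pairwise distinct, 0 otherwise. */
int all_different(int16_t x, int16_t y, int16_t z)
{
    if (x == y) return 0;   /* NOTDIFF */
    if (y == z) return 0;   /* NOTDIFF */
    if (x == z) return 0;   /* assumption: extra check, not in the pseudo-code */
    return 1;               /* DIFF */
}
```

Each early return corresponds to a jump to NOTDIFF, and falling through to the end corresponds to DIFF.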

Function call with more than 4 registers ARM assembly

I am trying to pass r0-r5 into the function check. However, only the registers r0-r3 seem to be passed through. In my main function I have this code.
push {lr}
mov r0, #1
mov r1, #2
mov r2, #3
mov r3, #4
mov r4, #5
mov r5, #6
bl check
pop {lr}
bx lr
Inside my check function I have this code. It is in a separate file; I am not sure if that matters.
m: .asciz "%d, %d ~ (%d, %d, %d)
...
push {lr}
ldr r0, =m
bl printf
pop {lr}
bx lr
The output for this is 2, 3 ~ (4, 33772, 1994545180). I am trying to learn assembly, so can you please explain the answer? From some googling I know I need to use the stack, but I am not sure how to use it and would like to learn how. Thanks in advance.
You could just try it and see:
void check ( unsigned int, unsigned int, unsigned int, unsigned int, unsigned int );
void call_check ( void )
{
check(1,2,3,4,5);
}
arm-linux-gnueabi-gcc -c -O2 check.c -o check.o
arm-linux-gnueabi-objdump -D check.o
00000000 <call_check>:
0: e52de004 push {lr} ; (str lr, [sp, #-4]!)
4: e3a03005 mov r3, #5
8: e24dd00c sub sp, sp, #12
c: e58d3000 str r3, [sp]
10: e3a00001 mov r0, #1
14: e3a01002 mov r1, #2
18: e3a02003 mov r2, #3
1c: e3a03004 mov r3, #4
20: ebfffffe bl 0 <check>
24: e28dd00c add sp, sp, #12
28: e8bd8000 ldmfd sp!, {pc}
Now of course this could be hand-optimized and still work just fine. Maybe keeping the stack aligned on an 8-byte (2-word, 64-bit) boundary, as the EABI requires at calls, is the reason for the additional 12-byte modification to the stack pointer? I don't know. But other than that you can see that you naturally need to save the link register, since you are calling another function. r0-r3 are obvious, and then per the EABI the first thing on the stack is the fifth word's worth of parameters.
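As a rough mental model of that rule, here is a C sketch that sorts word-sized arguments into r0-r3 and stack slots. It is a deliberate simplification: it ignores 64-bit values, structs and floating-point arguments, and the `ArgPlacement` type, `place_word_args` function and `MAX_STACK_WORDS` limit are all hypothetical names made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_STACK_WORDS 8   /* arbitrary limit for this sketch */

/* Where each word-sized argument ends up at the moment of the call. */
typedef struct {
    uint32_t r[4];                     /* r0..r3 */
    uint32_t stack[MAX_STACK_WORDS];   /* stack[0] is the word at [sp] on entry */
    size_t   stack_words;              /* how many stack slots are in use */
} ArgPlacement;

/* First four arguments go to registers, the rest go to the stack in order. */
ArgPlacement place_word_args(const uint32_t *args, size_t count)
{
    ArgPlacement p = { {0}, {0}, 0 };

    for (size_t i = 0; i < count; i++) {
        if (i < 4) {
            p.r[i] = args[i];
        } else if (p.stack_words < MAX_STACK_WORDS) {
            p.stack[p.stack_words++] = args[i];
        }
    }
    return p;
}
```

For the call `check(1,2,3,4,5)` this puts 1 to 4 in r0-r3 and 5 in the first stack slot, which matches the `str r3, [sp]` in the disassembly above.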
Likewise, for your check function you can simply let the compiler get you started. If you look at your code, r0 comes in as your first parameter and then you trash it by changing it to the first parameter for printf. You need to pass six parameters to printf, so you need to move them over by one: the first parameter to check is the second parameter to printf, the second to check is the third to printf, and so on. So the code has to do that shift (two of the six now end up on the stack).
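A C version of check that you could feed to the compiler might look like the sketch below. It assumes five int parameters (matching the five %d conversions) and assumes the truncated format string in the question ends with a newline; `format_check` is a hypothetical helper added only so the formatted text can be inspected without reading stdout.

```c
#include <stdio.h>

/* Hypothetical helper: formats the five values into buf instead of printing.
   Returns the number of characters snprintf would have written. */
int format_check(char *buf, size_t size, int a, int b, int c, int d, int e)
{
    /* Assumption: the format string in the question ends with "\n". */
    return snprintf(buf, size, "%d, %d ~ (%d, %d, %d)\n", a, b, c, d, e);
}

/* The function itself: each incoming parameter moves over by one position,
   because the format string takes the first argument slot of printf. */
void check(int a, int b, int c, int d, int e)
{
    printf("%d, %d ~ (%d, %d, %d)\n", a, b, c, d, e);
}
```

Compiling this with the same arm-linux-gnueabi-gcc command and disassembling it shows how the compiler reads the fifth parameter from the stack and shifts the register arguments before calling printf.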