LLVM use of carry and zero flags - llvm-clang

I'm starting to read LLVM docs and IR documentation.
In common architectures, an asm cmp instruction "result" value is -at least- 3 bits long, let's say the first bit is the SIGN flag, the second bit is the CARRY flag and the third bit is the ZERO flag.
Question 1)
Why the IR icmp instruction result value is only i1? (you can choose only one flag)
Why doesn't IR define, let's call it a icmp2 instruction returning an i3 having SIGN,CARRY and ZERO flags?
This i3 value can be acted upon with a switch instruction, or maybe a specific br2 instruction, like:
%result = cmp2 i32 %a, i32 %b
br2 i3 %result onzero label %EQUAL, onsign label %A_LT_B
#here %a GT %b
Question 2)
Does this make sense? Could this br2 instruction help create new optimizations? i.e. remove all jmps? it is necessary or the performance gains are negligible?
The reason I'm asking this -besides not being an expert in LLVM- is because in my first tests I was expecting some kind of optimization to be made by LLVM in order to avoid making the comparison twice and also avoid all branches by using asm conditional-move instructions.
My Tests:
I've compiled with clang-LLVM this:
#include <stdlib.h>
#include <inttypes.h>
typedef int32_t i32;
i32 compare (i32 a, i32 b){
// return (a - b) & 1;
if (a>b) return 1;
if (a<b) return -1;
return 0;
}
int main(int argc, char** args){
i32 n,i;
i32 a,b,avg;
srand(0); //fixed seed
for (i=0;i<500;i++){
for (n=0;n<1e6;n++){
a=rand();
b=rand();
avg+=compare(a,b);
}
}
return avg;
}
Output asm is:
...
mov r15d, -1
...
.LBB1_2: # Parent Loop BB1_1 Depth=1
# => This Inner Loop Header: Depth=2
call rand
mov r12d, eax
call rand
mov ecx, 1
cmp r12d, eax
jg .LBB1_4
# BB#3: # in Loop: Header=BB1_2 Depth=2
mov ecx, 0
cmovl ecx, r15d
.LBB1_4: # %compare.exit
# in Loop: Header=BB1_2 Depth=2
add ebx, ecx
...
I expected (all jmps removed in the inner loop):
mov r15d, -1
mov r13d, 1 # HAND CODED
call rand
mov r12d, eax
call rand
xor ecx,ecx # HAND CODED
cmp r12d, eax
cmovl ecx, r15d # HAND CODED
cmovg ecx, r13d # HAND CODED
add ebx, ecx
Performance difference (1s) seems to be negligible (on a VM under VirtualBox):
LLVM generated asm: 12.53s
hancoded asm: 11.53s
diff: 1s, in 500 millions iterations
Question 3)
Are my performance measures correct? Here's the makefile and the full hancoded.compare.s
makefile:
CC=clang -mllvm --x86-asm-syntax=intel
all:
$(CC) -S -O3 compare.c
$(CC) compare.s -o compare.test
$(CC) handcoded.compare.s -o handcoded.compare.test
echo `time ./compare.test`
echo `time ./handcoded.compare.test`
echo `time ./compare.test`
echo `time ./handcoded.compare.test`
hand coded (fixed) asm:
.text
.file "handcoded.compare.c"
.globl compare
.align 16, 0x90
.type compare,#function
compare: # #compare
.cfi_startproc
# BB#0:
mov eax, 1
cmp edi, esi
jg .LBB0_2
# BB#1:
xor ecx, ecx
cmp edi, esi
mov eax, -1
cmovge eax, ecx
.LBB0_2:
ret
.Ltmp0:
.size compare, .Ltmp0-compare
.cfi_endproc
.globl main
.align 16, 0x90
.type main,#function
main: # #main
.cfi_startproc
# BB#0:
push rbp
.Ltmp1:
.cfi_def_cfa_offset 16
push r15
.Ltmp2:
.cfi_def_cfa_offset 24
push r14
.Ltmp3:
.cfi_def_cfa_offset 32
push r12
.Ltmp4:
.cfi_def_cfa_offset 40
push rbx
.Ltmp5:
.cfi_def_cfa_offset 48
.Ltmp6:
.cfi_offset rbx, -48
.Ltmp7:
.cfi_offset r12, -40
.Ltmp8:
.cfi_offset r14, -32
.Ltmp9:
.cfi_offset r15, -24
.Ltmp10:
.cfi_offset rbp, -16
xor r14d, r14d
xor edi, edi
call srand
mov r15d, -1
mov r13d, 1 # HAND CODED
# implicit-def: EBX
.align 16, 0x90
.LBB1_1: # %.preheader
# =>This Loop Header: Depth=1
# Child Loop BB1_2 Depth 2
mov ebp, 1000000
.align 16, 0x90
.LBB1_2: # Parent Loop BB1_1 Depth=1
# => This Inner Loop Header: Depth=2
call rand
mov r12d, eax
call rand
xor ecx,ecx #hand coded
cmp r12d, eax
cmovl ecx, r15d #hand coded
cmovg ecx, r13d #hand coded
add ebx, ecx
.LBB1_3:
dec ebp
jne .LBB1_2
# BB#5: # in Loop: Header=BB1_1 Depth=1
inc r14d
cmp r14d, 500
jne .LBB1_1
# BB#6:
mov eax, ebx
pop rbx
pop r12
pop r14
pop r15
pop rbp
ret
.Ltmp11:
.size main, .Ltmp11-main
.cfi_endproc
.ident "Debian clang version 3.5.0-1~exp1 (trunk) (based on LLVM 3.5.0)"
.section ".note.GNU-stack","",#progbits

Question 1: LLVM IR is machine independent. Some machines might not even have a carry flag, or even a zero flag or sign flag. The return value is i1 which suffices to indicate TRUE or FALSE. You can set the comparison condition like 'eq' and then check the result to see if the two operands are equal or not, etc.
Question 2: LLVM IR does not care about optimization initially. The main goal is to generate a Static Single Assignment (SSA) based representation of instructions. Optimization happens in later passes of which some are machine independent and some are machine dependent. Your br2 idea will assume that the machine will support those 3 flags which might be a wrong assumption,
Question 3: I am not sure what you are trying to do here. Can you explain more?

Related

Two functions/subroutines in ARM assembly language

I am stuck with an exercise of ARM.
The following program should calculate the result of 2((x-1)^2 + 1) but there is a mistake in the program that leads it into an infinite loop.
I think that I still don't understand completely subroutines and for this reason I am not seeing where the mistake is.
_start:
mov r0, #4
bl g
mov r7, #1
swi #0
f:
mul r1, r0, r0
add r0, r1, #1
mov pc, lr
g:
sub r0, r0, #1
bl f
add r0, r0, r0
mov pc, lr
The infinite loop starts in subroutine g: in the line of mov pc, lr and instead of returning to _start it goes to the previous line add r0, r0, r0 and then again to the last line of subroutine g:.
So I guess that the problem is the last line of subroutine g: but I can't find the way to return to _start without using mov pc, lr. I mean, this should be the command used when we have a branch with link.
Also, in this case r0 = 4, so the result of the program should be 20.
This is because you don't save lr on the stack prior to calling f, and the initial return address was therefore lost: if you only have one level of subroutine calls, using lr without saving it is fine, but if you have more then one, you need to preserve the previous value of lr.
For example, when compiling this C example using Compiler Explorer with ARM gcc 4.56.4 (Linux), and options -mthumb -O0,
void f()
{
}
void g()
{
f();
}
void start()
{
g();
}
The generated code will be:
f():
push {r7, lr}
add r7, sp, #0
mov sp, r7
pop {r7, pc}
g():
push {r7, lr}
add r7, sp, #0
bl f()
mov sp, r7
pop {r7, pc}
start():
push {r7, lr}
add r7, sp, #0
bl g()
mov sp, r7
pop {r7, pc}
If you were running this on bare metal, not under Linux, you'd need your stack pointer to be initialized a correct value.
Assuming you are running from RAM on a bare-metal system/simulator, you could setup a minimal stack of 128 bytes:
.text
.balign 8
_start:
adr r0, . + 128 // set top of stack at _start + 128
mov sp, r0
...
But it looks like you're writing a Linux executable that exits with a swi/r7=1 exit system call. So don't do that, it would make your program crash when it tries to write to the stack.

x86: Access writing violation while using the OFFSET operator to address of array

I am getting the Exception thrown at 0x0044369D in Project2.exe: 0xC0000005: Access violation writing location 0x00000099. From my research so far, I am under the impression this has to do with a null pointer or out of range memory being accessed from the line mov [eax], ecx
I used the offset operator in mov eax, OFFSET arrayfib and I thought that should remedy this. I can't seem to figure out what is causing the issue here.
.model flat, stdcall
.stack 4096
INCLUDE Irvine32.inc
ExitProcess PROTO, dwExitCode: DWORD
.data
arrayfib DWORD 35 DUP (99h)
.code
main PROC
mov eax, OFFSET arrayfib ;address of fibonacci number array
mov bx, 30 ;number of Fibonacci numbers to generate
call fibSequence ;call to Fibonacci sequence procedure
mov edx, OFFSET arrayfib ;passes information to call DumpMem
mov ecx, LENGTHOF arrayfib
mov ebx, TYPE arrayfib
call DumpMem
INVOKE ExitProcess, 0
main ENDP
;----------------------------------------------------------------------------------
;fibSequence
;Calculates the fibonacci numbers to the n'th fibonacci number
;Receives: eax = address of the Fibonacci number array
;bx = number of Fibonacci numbers to generate
;ecx = used for Fibonacci calculation (i-2)
;edx = used for Fibonacci calculation (i-1)
;returns: [eax+4i] = each value of fibonacci sequence in array, scaled for doubleword
;---------------------------------------------------------------------------------
fibSequence PROC
mov ecx, 0 ;initialize the registers with Fib(0) and Fib(1)
mov edx, 1
mov [eax], edx ;store first Fibonacci number in the array
;since a Fibonacci number has been generated, decrement the counter
;test for completion, proceed
mov eax, [eax+4]
fibLoop:
sub bx, 1
jz quit ;if bx = 0, jump to exit
add ecx, edx
push ecx ;save fib# for next iteration before it is destroyed
mov [eax], ecx ;stores new fibonacci number in the next address of array
push edx ;save other fib# before it is destroyed
mov ecx, edx
mov edx, [eax]
mov eax, [eax+4] ;increments accounting for doubleword
pop edx
pop ecx
quit:
exit
fibSequence ENDP
END main
Also if there are any other suggestions I would be happy to hear them. I am new to all this and looking to learn as much as possible.

How to fix "Unhandled exception" error in assembly?

I've written a function that determines if a value is prime or not prime. But when I return from the function, it comes up with an error.
The error message is
Unhandled exception at 0x00000001 in Project.exe: 0xC0000005: Access violation executing location 0x00000001.
This function should return eax.
push ebp
mov ebp, esp
mov eax, [ebp+8] ; store input value
mov ecx, [ebp+8] ; store input value in counter
sub esp, 4 ; local variable
sub ecx, 1 ; avoid compare with itself
cmp eax, 3 ; compare with 1, 2, 3
jbe Prime
L1:
cmp ecx, 3 ; when count=3 to stop
je NotP
mov edx, 0 ; clear edx to save remainder
mov [esp-4], eax ; save input value
div ecx ; divide number
cmp edx, 0 ; check remainder
je NotP ; if remainder=0 then not prime
jmp Prime
loop L1
NotP:
mov eax, 0
push eax ; if delete this ilne still come up error
pop ebp
ret
Prime:
mov eax, 1
push eax ; if delete this ilne still come up error
pop ebp
ret
isPrime endp
mov [esp-4], eax ; save input value
If you plan on using the local variable that you reserved room for, then you have to write:
mov [esp], eax ; save input value
or alternatively write:
mov [ebp-4], eax ; save input value
A correct prolog/epilog would be:
push ebp
mov ebp, esp
sub esp, 4 ; local variable
mov eax, [ebp+8] ; store input value
...
NotP:
mov eax, 0
pop ebp ; remove local variable
pop ebp
ret
Prime:
mov eax, 1
pop ebp ; remove local variable
pop ebp
ret
isPrime endp
cmp edx, 0 ; check remainder
je NotP ; if remainder=0 then not prime
jmp Prime
loop L1
Finding the remainder not zero is not enough to conclude that the number is prime! More tests are needed. For now, that loop L1 instruction is never executed.
e.g. To test 15, your first division does 15 / 14 which yields a non-zero remainder but 15 isn't a prime number.
L1:
cmp ecx, 3 ; when count=3 to stop
je NotP
The top of the loop can't be correct either! Consider testing the number 7.
First division is 7 / 6 and has a remainder so the loop has to continue
Second division is 7 / 5 and has a remainder so the loop has to continue
Third division is 7 / 4 and has a remainder so the loop has to continue
You don't try any more divisions and conclude "not prime", yet 7 is definitively a prime number.

MASM how to make desired function call

I'd like to know, how to do the following.
I have an array, where i have to summ numbers (easy)
but the twist is, that i have to have a function call for it,
that get's is params through specific registers. How do i implement that?
In this case, the function needs to get the array (offset) through ESI, and the length of it through ECX.
please educate me
EDIT:
in the meantime i've conjured up this. No idea if this works to as my MASM compliling just broken itself for no reason
.data
intarray DWORD 10000h,20000h,30000h,40000h
.code
szummer proc uses esi ecx,
ptrArray:PTR DWORD, ;points to the array
szArray: Dword ;array size
mov esi, ptrArray ;address of the array
mov ecx, szArray ;szize
mov eax, 0 ;set to 0
AS1:
add eax, [esi] ;add each int to sum
add esi, 4 ;point to next int
loop AS1 ;reapet for array size
ret;
szummer endp
main proc
mov ecx, OFFSET intarray
mov esi, LENGHTOF intarray
INVOKE ArraySum,ecx,esi
invoke ExitProcess,0
main endp
end main
The MASM directive INVOKE works only with the calling conventions C (cdecl), STDCALL, BASIC, FORTRAN and PASCAL. All of these conventions pass the arguments on the stack. Thus, you can't use INVOKE for passing the arguments in registers. You can use the Assembly instruction CALL instead. Your program - slightly modified ;-) - with MASM32 library included (because of "ExitProcess"):
INCLUDE \masm32\include\masm32rt.inc
.DATA
intarray DWORD 10000h,20000h,30000h,40000h
.CODE
szummer proc uses esi ecx
mov eax, 0 ;set to 0
AS1:
add eax, [esi] ;add each int to sum
add esi, 4 ;point to next int
loop AS1 ;reapet for array size
ret;
szummer endp
main proc
mov esi, OFFSET intarray
mov ecx, LENGTHOF intarray
call szummer
invoke ExitProcess,0
main ENDP
END main

Sum function in x86 assembly - no output

I am trying to write a simple sum function in x86 assembly - to which i am passing 3 and 8 as arguments. However, the code doesn't print the sum. Appreciate any help in spotting the errors. I'm using NASM
section .text
global _start
_sum:
push ebp
mov ebp, esp
push edi
push esi ;prologue ends
mov eax, [ebp+8]
add eax, [ebp+12]
pop esi ;epilogue begins
pop edi
mov esp, ebp
pop ebp
ret 8
_start:
push 8
push 3
call _sum
mov edx, 1
mov ecx, eax
mov ebx, 1 ;stdout
mov eax, 4 ;write
int 0x80
mov ebx, 0
mov eax, 1 ;exit
int 0x80
To me, this looks like Linux assembler. From this page, in the Examples section, subsection int 0x80, it looks like ecx expects the address of the string:
_start:
movl $4, %eax ; use the write syscall
movl $1, %ebx ; write to stdout
movl $msg, %ecx ; use string "Hello World"
movl $12, %edx ; write 12 characters
int $0x80 ; make syscall
So, you'll have to get a spare chunk of memory, convert your result to a string, probably null-terminate that string, and then call the write with the address of the string in ecx.
For an example of how to convert an integer to a string see Printing an Int (or Int to String) You'll have to store each digit in a string instead of printing it, and null-terminate it. Then you can print the string.
Sorry, I have not programmed in assembly in years, so I cannot give you a more detailed answer, but hope that this will be enough to point you in the right direction.