How can I reverse this disassembly code to C?

The disassembly code is this:
movzx ecx, byte ptr [rax] ;
add ecx, 0FFFFFFFEh ;
cmp cl, 2
I guess the code reverses to something like this:
if rax - 2 > 2 {
...
Is that right, and why?

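Adding 0FFFFFFFEh subtracts 2 with wraparound, and the cmp only looks at the low byte (cl). So for the byte loaded from [rax] (not rax itself) you have four cases: 0-1 (wraps to 0FEh-0FFh), 2-3 (becomes 0-1, below 2), 4 (becomes exactly 2), and anything greater than 4 (stays above 2).
Maybe they used an OF check to distinguish between them.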
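As a rough illustration of the four cases, here is a minimal C sketch of the three instructions. The function name `classify_byte` and its -1/0/1 return convention are made up for this example; the original snippet does not show which conditional jump follows the `cmp`, so the sketch only models the unsigned comparison of `cl` with 2.

```c
#include <stdint.h>

/* Hypothetical helper modelling:
 *   movzx ecx, byte ptr [rax]
 *   add   ecx, 0FFFFFFFEh
 *   cmp   cl, 2
 * Returns -1 if cl is below 2, 0 if cl equals 2, 1 if cl is above 2
 * (unsigned comparison). */
int classify_byte(const uint8_t *p)
{
    uint32_t ecx = *p;          /* movzx: zero-extend the byte at [rax] */
    ecx += 0xFFFFFFFEu;         /* add: same as subtracting 2 modulo 2^32 */
    uint8_t cl = (uint8_t)ecx;  /* cmp cl, 2 only sees the low 8 bits */

    if (cl < 2)
        return -1;
    if (cl == 2)
        return 0;
    return 1;
}
```

Input bytes 0 and 1 wrap around to 254 and 255, so they end up in the "above 2" group together with every input greater than 4.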

Related

How to fix "Unhandled exception" error in assembly?

I've written a function that determines if a value is prime or not prime. But when I return from the function, it comes up with an error.
The error message is
Unhandled exception at 0x00000001 in Project.exe: 0xC0000005: Access violation executing location 0x00000001.
This function should return eax.
push ebp
mov ebp, esp
mov eax, [ebp+8] ; store input value
mov ecx, [ebp+8] ; store input value in counter
sub esp, 4 ; local variable
sub ecx, 1 ; avoid compare with itself
cmp eax, 3 ; compare with 1, 2, 3
jbe Prime
L1:
cmp ecx, 3 ; when count=3 to stop
je NotP
mov edx, 0 ; clear edx to save remainder
mov [esp-4], eax ; save input value
div ecx ; divide number
cmp edx, 0 ; check remainder
je NotP ; if remainder=0 then not prime
jmp Prime
loop L1
NotP:
mov eax, 0
push eax ; the error still occurs if this line is deleted
pop ebp
ret
Prime:
mov eax, 1
push eax ; the error still occurs if this line is deleted
pop ebp
ret
isPrime endp
mov [esp-4], eax ; save input value
If you plan on using the local variable that you reserved room for, then you have to write:
mov [esp], eax ; save input value
or alternatively write:
mov [ebp-4], eax ; save input value
A correct prolog/epilog would be:
push ebp
mov ebp, esp
sub esp, 4 ; local variable
mov eax, [ebp+8] ; store input value
...
NotP:
mov eax, 0
pop ebp ; remove local variable
pop ebp
ret
Prime:
mov eax, 1
pop ebp ; remove local variable
pop ebp
ret
isPrime endp
cmp edx, 0 ; check remainder
je NotP ; if remainder=0 then not prime
jmp Prime
loop L1
Finding the remainder not zero is not enough to conclude that the number is prime! More tests are needed. For now, that loop L1 instruction is never executed.
e.g. To test 15, your first division does 15 / 14 which yields a non-zero remainder but 15 isn't a prime number.
L1:
cmp ecx, 3 ; when count=3 to stop
je NotP
The top of the loop can't be correct either! Consider testing the number 7.
First division is 7 / 6 and has a remainder so the loop has to continue
Second division is 7 / 5 and has a remainder so the loop has to continue
Third division is 7 / 4 and has a remainder so the loop has to continue
You don't try any more divisions and conclude "not prime", yet 7 definitely is a prime number.
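For comparison, the loop logic the assembly needs can be sketched in C. This is only one possible trial-division version, not the original poster's code: the name `is_prime` is an assumption, and unlike the assembly above (which jumps to Prime for every input up to 3) it treats 0 and 1 as not prime.

```c
#include <stdint.h>

/* Hypothetical trial-division test: returns 1 if n is prime, 0 otherwise. */
int is_prime(uint32_t n)
{
    if (n < 2)
        return 0;                 /* 0 and 1 are not prime */

    /* Keep dividing until a divisor is found or d*d exceeds n.
     * "d <= n / d" avoids overflow in d * d. */
    for (uint32_t d = 2; d <= n / d; d++) {
        if (n % d == 0)
            return 0;             /* zero remainder: definitely not prime */
    }
    return 1;                     /* no divisor found: prime */
}
```

The key difference from the assembly is that a non-zero remainder only means "try the next divisor"; the function concludes "prime" only after the loop has run out of candidates.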

Variable initialization in as8088

I'm currently writing a function that should basically just write characters from a string into variables.
When performing test prints, my variables seem fine. But when I attempt to print the first variable assigned (inchar) outside of the function, it returns an empty string, while the second variable (outchar) seems to return fine. Am I somehow overwriting the first variable?
This is my code:
_EXIT = 1
_READ = 3
_WRITE = 4
_STDOUT = 1
_STDIN = 1
_GETCHAR = 117
MAXBUFF = 100
.SECT .TEXT
start:
0: PUSH endpro2-prompt2
PUSH prompt2
PUSH _STDOUT
PUSH _WRITE
SYS
ADD SP,8
PUSH 4
PUSH buff
CALL getline
ADD SP,4
!!!!!!!!!
PUSH buff
CALL gettrans
ADD SP,4
ADD AX,1 !gives AX an intial value to start loop
1: CMP AX,0
JE 2f
PUSH endpro-prompt1
PUSH prompt1
PUSH _STDOUT
PUSH _WRITE
SYS
ADD SP,8
PUSH MAXBUFF
PUSH buff
CALL getline
ADD SP,2
!PUSH buff
!CALL translate
!ADD SP,4
JMP 1b
2: PUSH 0 ! exit with normal exit status
PUSH _EXIT
SYS
getline:
PUSH BX
PUSH CX
PUSH BP
MOV BP,SP
MOV BX,8(BP)
MOV CX,8(BP)
ADD CX,10(BP)
SUB CX,1
1: CMP CX,BX
JE 2f
PUSH _GETCHAR
SYS
ADD SP,2
CMPB AL,-1
JE 2f
MOVB (BX),AL
INC BX
CMPB AL,'\n'
JNE 1b
2: MOVB (BX),0
MOV AX, BX
SUB AX,8(BP)
POP BP
POP CX
POP BX
RET
gettrans:
PUSH BX
PUSH BP
MOV BP,SP
MOV BX,6(BP) !Store argument in BX
MOVB (inchar),BL ! move first char to inchar
1: INC BX
CMPB (BX),' '
JE 1b
MOVB (outchar),BL !Move char seperated by Space to outchar
MOV AX,1 !On success
POP BP
POP BX
RET
.SECT .BSS
buff:
.SPACE MAXBUFF
.SECT .DATA
prompt1:
.ASCII "Enter a line of text: "
endpro:
prompt2:
.ASCII "Enter 2 characters for translation: "
endpro2:
outchar:
.BYTE 0
inchar:
.BYTE 0
charct:
.BYTE 0
wordct:
.BYTE 0
linect:
.BYTE 0
inword:
.BYTE 0
This is the code used to test print
PUSH 1 ! print that byte
PUSH inchar
PUSH _STDOUT
PUSH _WRITE
SYS
ADD SP,8
CALL printnl !function that prints new line
PUSH 1 ! print that byte
PUSH outchar
PUSH _STDOUT
PUSH _WRITE
SYS
CALL printnl
ADD SP,8
There seem to be a number of as88 8088 simulator environments, and many of the code repositories for them mention this bug:
1. The assembler requires sections to be defined in the following order:
TEXT
DATA
BSS
After the first occurrences, remaining section directives may appear in any order.
I'd recommend moving the BSS section after DATA in your code, in case your as88 environment has a similar problem.
In your original code you had lines like this:
MOV (outchar),BX
[snip]
MOV (inchar),BX
You defined outchar and inchar as bytes. The 2 lines above move 2 bytes (16-bits) from the BX register to both one byte variables. This will cause the CPU to write the extra byte into the next variable in memory. You'd want to explicitly move a single byte. Something like this might have been more appropriate:
MOVB (outchar),BL
[snip]
MOVB (inchar),BL
As you will see this code still has a bug as I mention later in this answer. To clarify - the MOVB instruction will move a single byte from BL and place it into the variable.
When you do a SYS call for Write you need to pass the address of the buffer to print, not the data in the buffer. You had 2 lines like this:
PUSH (inchar)
[snip]
PUSH (outchar)
The parentheses say to take the value in the variables and push them on the stack. SYS WRITE requires the address of the characters to display. The code to push their addresses should look like:
PUSH inchar
[snip]
PUSH outchar
gettrans function has a serious flaw in handling the copy of a byte from one buffer to another. You have code that does this:
MOV BX,6(BP) !Store argument in BX
MOVB (inchar),BL ! move first char to inchar
1: INC BX
CMPB (BX),' '
JE 1b
MOVB (outchar),BL !Move char seperated by Space to outchar
MOV BX,6(BP) correctly loads the buffer address passed as an argument into BX. The problem is with lines like this one:
MOVB (inchar),BL ! move first char to inchar
This isn't doing what the comment suggests it should. The line above moves the lower byte (BL) of the buffer address in BX to the variable inchar . You want to move the byte at the memory location pointed to by BX and put it into inchar. Unfortunately on the x86 you can't move the data from one memory operand to another directly. To get around this you will have to move the data from the buffer pointed to by BX into a temporary register (I'll choose CL) and then move that to the variable. The code could look like this:
MOVB CL, (BX)
MOVB (inchar),CL ! move first char to inchar
You then have to do the same for outchar so the fix in both places could look similar to this:
MOV BX,6(BP) !Store argument in BX
MOVB CL, (BX)
MOVB (inchar),CL ! move first char to inchar
1: INC BX
CMPB (BX),' '
JE 1b
MOVB CL, (BX)
MOVB (outchar),CL ! move second char to outchar
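The corrected gettrans logic can also be written out in C, which makes the difference between "the pointer" and "the byte it points to" explicit. This is only a sketch: the C signature with `inchar` and `outchar` passed as output pointers is an assumption made for the example, whereas the assembly stores into global variables.

```c
/* Hypothetical C model of the corrected gettrans:
 * copies the first character of buf to *inchar, skips any spaces after it,
 * and copies the next character to *outchar. Returns 1, like AX in the
 * assembly version. */
int gettrans(const char *buf, char *inchar, char *outchar)
{
    const char *p = buf;      /* MOV BX,6(BP): p holds the buffer address */

    *inchar = *p;             /* MOVB CL,(BX) / MOVB (inchar),CL */

    do {
        p++;                  /* 1: INC BX */
    } while (*p == ' ');      /* CMPB (BX),' ' / JE 1b */

    *outchar = *p;            /* MOVB CL,(BX) / MOVB (outchar),CL */
    return 1;
}
```

Writing `*inchar = (char)(unsigned long)p;` instead of `*inchar = *p;` would be the C equivalent of the original bug, since it stores part of the address rather than the character.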
The instruction MOV (inchar),BX stores register BX to the memory location labelled inchar.
However, inchar has been defined as a .BYTE, but BX is a 16-bit register (2 bytes), so you are writing not only inchar but also the byte that follows it in memory. In the same way, MOV (outchar),BX also overwrites inchar, which is declared right after outchar.
The only reason why it appears to work in the beginning is that the 8088 is a little-endian architecture, so the low-order byte of BX is stored first and the high-order byte follows.
So, try MOVB (inchar),BL
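The effect of a 16-bit store on adjacent one-byte variables can be modelled with a small byte array. In this sketch the array layout follows the order of the .DATA listing in the question (outchar, then inchar, then charct); the helper names `store_word` and `store_byte` are made up for the illustration.

```c
#include <stdint.h>

/* Hypothetical model of a 16-bit MOV to memory on a little-endian CPU:
 * the low byte goes to dst[0], the high byte spills into dst[1]. */
void store_word(uint8_t *dst, uint16_t value)
{
    dst[0] = (uint8_t)(value & 0xFF);
    dst[1] = (uint8_t)(value >> 8);
}

/* Hypothetical model of MOVB with the low half of the register:
 * only dst[0] is written. */
void store_byte(uint8_t *dst, uint16_t value)
{
    dst[0] = (uint8_t)(value & 0xFF);
}
```

With `uint8_t data[3]` standing for outchar, inchar and charct, calling `store_byte(&data[1], ...)` followed by `store_word(&data[0], ...)` replaces the value just written to `data[1]` with the high byte of the second store, which matches the symptom of inchar printing as empty while outchar looks fine.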

LLVM use of carry and zero flags

I'm starting to read LLVM docs and IR documentation.
In common architectures, an asm cmp instruction "result" value is -at least- 3 bits long, let's say the first bit is the SIGN flag, the second bit is the CARRY flag and the third bit is the ZERO flag.
Question 1)
Why is the result of the IR icmp instruction only an i1? (You can test only one condition per instruction.)
Why doesn't the IR define, let's call it, an icmp2 instruction returning an i3 that holds the SIGN, CARRY and ZERO flags?
This i3 value could be acted upon with a switch instruction, or maybe a specific br2 instruction, like:
%result = cmp2 i32 %a, i32 %b
br2 i3 %result onzero label %EQUAL, onsign label %A_LT_B
#here %a GT %b
Question 2)
Does this make sense? Could this br2 instruction help create new optimizations, i.e. remove all jmps? Is it necessary, or are the performance gains negligible?
The reason I'm asking this -besides not being an expert in LLVM- is because in my first tests I was expecting some kind of optimization to be made by LLVM in order to avoid making the comparison twice and also avoid all branches by using asm conditional-move instructions.
My Tests:
I've compiled with clang-LLVM this:
#include <stdlib.h>
#include <inttypes.h>
typedef int32_t i32;
i32 compare (i32 a, i32 b){
// return (a - b) & 1;
if (a>b) return 1;
if (a<b) return -1;
return 0;
}
int main(int argc, char** args){
i32 n,i;
i32 a,b,avg;
srand(0); //fixed seed
for (i=0;i<500;i++){
for (n=0;n<1e6;n++){
a=rand();
b=rand();
avg+=compare(a,b);
}
}
return avg;
}
Output asm is:
...
mov r15d, -1
...
.LBB1_2: # Parent Loop BB1_1 Depth=1
# => This Inner Loop Header: Depth=2
call rand
mov r12d, eax
call rand
mov ecx, 1
cmp r12d, eax
jg .LBB1_4
# BB#3: # in Loop: Header=BB1_2 Depth=2
mov ecx, 0
cmovl ecx, r15d
.LBB1_4: # %compare.exit
# in Loop: Header=BB1_2 Depth=2
add ebx, ecx
...
I expected (all jmps removed in the inner loop):
mov r15d, -1
mov r13d, 1 # HAND CODED
call rand
mov r12d, eax
call rand
xor ecx,ecx # HAND CODED
cmp r12d, eax
cmovl ecx, r15d # HAND CODED
cmovg ecx, r13d # HAND CODED
add ebx, ecx
Performance difference (1s) seems to be negligible (on a VM under VirtualBox):
LLVM generated asm: 12.53s
handcoded asm: 11.53s
diff: 1s, in 500 millions iterations
Question 3)
Are my performance measurements correct? Here's the makefile and the full handcoded.compare.s
makefile:
CC=clang -mllvm --x86-asm-syntax=intel
all:
$(CC) -S -O3 compare.c
$(CC) compare.s -o compare.test
$(CC) handcoded.compare.s -o handcoded.compare.test
echo `time ./compare.test`
echo `time ./handcoded.compare.test`
echo `time ./compare.test`
echo `time ./handcoded.compare.test`
hand coded (fixed) asm:
.text
.file "handcoded.compare.c"
.globl compare
.align 16, 0x90
.type compare,@function
compare: # @compare
.cfi_startproc
# BB#0:
mov eax, 1
cmp edi, esi
jg .LBB0_2
# BB#1:
xor ecx, ecx
cmp edi, esi
mov eax, -1
cmovge eax, ecx
.LBB0_2:
ret
.Ltmp0:
.size compare, .Ltmp0-compare
.cfi_endproc
.globl main
.align 16, 0x90
.type main,@function
main: # @main
.cfi_startproc
# BB#0:
push rbp
.Ltmp1:
.cfi_def_cfa_offset 16
push r15
.Ltmp2:
.cfi_def_cfa_offset 24
push r14
.Ltmp3:
.cfi_def_cfa_offset 32
push r12
.Ltmp4:
.cfi_def_cfa_offset 40
push rbx
.Ltmp5:
.cfi_def_cfa_offset 48
.Ltmp6:
.cfi_offset rbx, -48
.Ltmp7:
.cfi_offset r12, -40
.Ltmp8:
.cfi_offset r14, -32
.Ltmp9:
.cfi_offset r15, -24
.Ltmp10:
.cfi_offset rbp, -16
xor r14d, r14d
xor edi, edi
call srand
mov r15d, -1
mov r13d, 1 # HAND CODED
# implicit-def: EBX
.align 16, 0x90
.LBB1_1: # %.preheader
# =>This Loop Header: Depth=1
# Child Loop BB1_2 Depth 2
mov ebp, 1000000
.align 16, 0x90
.LBB1_2: # Parent Loop BB1_1 Depth=1
# => This Inner Loop Header: Depth=2
call rand
mov r12d, eax
call rand
xor ecx,ecx #hand coded
cmp r12d, eax
cmovl ecx, r15d #hand coded
cmovg ecx, r13d #hand coded
add ebx, ecx
.LBB1_3:
dec ebp
jne .LBB1_2
# BB#5: # in Loop: Header=BB1_1 Depth=1
inc r14d
cmp r14d, 500
jne .LBB1_1
# BB#6:
mov eax, ebx
pop rbx
pop r12
pop r14
pop r15
pop rbp
ret
.Ltmp11:
.size main, .Ltmp11-main
.cfi_endproc
.ident "Debian clang version 3.5.0-1~exp1 (trunk) (based on LLVM 3.5.0)"
.section ".note.GNU-stack","",@progbits
Question 1: LLVM IR is machine independent. Some machines might not even have a carry flag, or even a zero flag or sign flag. The return value is i1 which suffices to indicate TRUE or FALSE. You can set the comparison condition like 'eq' and then check the result to see if the two operands are equal or not, etc.
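To illustrate how single true/false results are enough, here is a small C sketch of a three-way compare built from two relational comparisons, each of which yields 0 or 1 in C much like an icmp yields an i1. The name `compare3` is made up for this example, and nothing here claims what machine code a particular compiler will emit for it; that is left to the optimizer and backend.

```c
#include <stdint.h>

/* Hypothetical three-way compare: returns 1 if a > b, -1 if a < b, 0 if equal.
 * Each relational operator produces 0 or 1, so no flag register is needed
 * at the source or IR level. */
int32_t compare3(int32_t a, int32_t b)
{
    return (a > b) - (a < b);
}
```

Unlike `a - b`, this form cannot overflow for extreme inputs.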
Question 2: LLVM IR does not care about optimization initially. The main goal is to generate a Static Single Assignment (SSA) based representation of the instructions. Optimization happens in later passes, some of which are machine independent and some machine dependent. Your br2 idea assumes that the machine supports those 3 flags, which might be a wrong assumption.
Question 3: I am not sure what you are trying to do here. Can you explain more?

Sum function in x86 assembly - no output

I am trying to write a simple sum function in x86 assembly, to which I am passing 3 and 8 as arguments. However, the code doesn't print the sum. I'd appreciate any help in spotting the errors. I'm using NASM.
section .text
global _start
_sum:
push ebp
mov ebp, esp
push edi
push esi ;prologue ends
mov eax, [ebp+8]
add eax, [ebp+12]
pop esi ;epilogue begins
pop edi
mov esp, ebp
pop ebp
ret 8
_start:
push 8
push 3
call _sum
mov edx, 1
mov ecx, eax
mov ebx, 1 ;stdout
mov eax, 4 ;write
int 0x80
mov ebx, 0
mov eax, 1 ;exit
int 0x80
To me, this looks like Linux assembler. From this page, in the Examples section, subsection int 0x80, it looks like ecx expects the address of the string:
_start:
movl $4, %eax ; use the write syscall
movl $1, %ebx ; write to stdout
movl $msg, %ecx ; use string "Hello World"
movl $12, %edx ; write 12 characters
int $0x80 ; make syscall
So you'll have to set aside a spare chunk of memory, convert your result to a string, and then call write with the address of that string in ecx and its length in edx. (Since write takes an explicit length, the string does not strictly need a null terminator.)
For an example of how to convert an integer to a string, see Printing an Int (or Int to String). You'll have to store each digit in a string instead of printing it; then you can print the string.
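The conversion step can be sketched in C before translating it to assembly. This is a minimal illustration, not the linked answer's code: the helper name `u32_to_dec` is an assumption, and the caller is assumed to provide a buffer of at least 10 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes the decimal digits of value into buf
 * (which must hold at least 10 bytes) and returns the number of digits.
 * No terminator is written, because write() is given an explicit length. */
size_t u32_to_dec(uint32_t value, char *buf)
{
    char tmp[10];                 /* a 32-bit value has at most 10 digits */
    size_t n = 0;

    /* Repeated division by 10 produces the digits in reverse order. */
    do {
        tmp[n++] = (char)('0' + value % 10);
        value /= 10;
    } while (value != 0);

    /* Copy them back in the right order. */
    for (size_t i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];

    return n;
}
```

In the assembly version, the buffer address would go in ecx and the returned length in edx before the int 0x80 write call.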
Sorry, I have not programmed in assembly in years, so I cannot give you a more detailed answer, but hope that this will be enough to point you in the right direction.

Creating a function in assembly language (TASM)

I wanted to print the first 20 numbers using loop.
Printing the first nine numbers is fine, because their hexadecimal and decimal representations are the same. From the 10th number on, however, I had to convert each number into its appropriate code, then convert that to characters, store them in a string, and finally display it.
That is,
If (NUMBER > 9)
ADD 6D
;10d = 0ah --(+6)--> 16d = 10h
IF NUMBER IS > 19
ADD 12D
;20d = 14h --(+12)--> 32d = 20h
Then rotating and shifting each number to get the desired output number, that is,
DAA # let al = 74h = 0111.0100
XOR AH,AH # ah = 0 (Just in case it wasn't)
# ax = 0000.0000.0111.0100
ROR AX,4 # ax = 0100.0000.0000.0111 = 4007h
SHR AH,4 # ax = 0000.0100.0000.0111 = 0407h
ADD AX,3030h # ax = 0011.0100.0011.0111 = 3437h = ASCII "74" (Reversed due to little endian)
And then storing the result in to the string and displaying it, that is,
MOV BX,OFFSET Result ;Let Result is an empty string
MOV byte ptr[BX],5 ;Size of the string
MOV byte ptr[BX+4],'$' ;String terminator
MOV byte ptr[BX+3],AH ;storing number
MOV byte ptr[BX+2],AL
MOV DX,BX
ADD DX,02 ;Displaying the result
MOV AH,09H ;Interrupt 21 service to display string
INT 21H
And here is the complete code with proper commenting,
MOV CX,20 ;Number of iterations
MOV DX,0 ;First value of the sequence
L1:
PUSH DX
ADD DX,30H ; 30H is the ASCII code of '0', 31H of '1', and so on
MOV AH,02H ; INT 21h service 02h prints the character in DL
INT 21H
POP DX
ADD DX,1
CMP DX,09 ; if number is > 9 i.e 0A then go to L2
JA L2
LOOP L1
L2:
PUSH DX
MOV AX,DX
CMP AX,14H ;If number is equal to 14H(20) then Jump to L3
JE L3
ADD AX,6D ;If less than 20 then add 6D
XOR AH,AH ;Clear the content of AH
ROR AX,4 ;Rotate and shift so that each digit ends up in its own byte
SHR AH,4
ADC AX,3030h
MOV BX,OFFSET Result
MOV byte ptr[BX],5
MOV byte ptr[BX+4],'$'
MOV byte ptr[BX+3],AH
MOV byte ptr[BX+2],AL
MOV DX,BX
ADD DX,02
MOV AH,09H
INT 21H
POP DX
ADD DX,1
LOOP L2
;If the number is equal to 20 come here, ->
; Every step is repeated here just to change 6D to 12D
L3:
ADD AX,12D
XOR AH,AH
ROR AX,1
ROR AX,1
ROR AX,1
ROR AX,1
SHR AH,1
SHR AH,1
SHR AH,1
SHR AH,1
ADC AX,3030h
MOV BX,OFFSET Result
MOV byte ptr[BX],5
MOV byte ptr[BX+4],'$'
MOV byte ptr[BX+3],AH
MOV byte ptr[BX+2],AL
MOV DX,BX
ADD DX,02
MOV AH,09H
INT 21H
Is there any proper way to do it, creating a function and using if/else (jumps) to get the desired output rather than repeating the code again and again?
PSEUDO CODE:
VAR = 6
IF Number is > 9
ADD AX,VAR
Else IF Number is > 19
ADD AX,(VAR*2)
ELSE IF NUMBER is > 29
ADD AX,(VAR*3)
So you just want to print 0 ... 20 as ASCII characters? It looks like you understand that the numerals are identified as 0x30 ... 0x39 for '0' to '9', so you could use integer division to generate the character for the tens digit.
I usually work with C, but conversion to assembler shouldn't be too complicated, since these are all fundamental operations and there are no function calls:
int i_value = 29;
int i_tens = i_value/10; //Integer division! 29/10 = 2, save for later use
char c_tens = '0' + i_tens;
char c_ones = '0' + i_value-(10*i_tens); // Subtract N*10 from value
The output will be c_tens = 0x32, c_ones = 0x39. You should be able to wrap this inside of a loop pretty easily using a pair of registers.
Pseudocode
regA <- num_iterations //For example, 20
regB <- 0 //Initialize counter register
LOOP:
//Do conversion for the current iteration.
//Manipulate bytes for output as necessary.
regB <- regB +1
branch not equal regA, regB LOOP
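Putting the digit conversion and the loop together, a C sketch could look like the following. The function name `format_sequence`, the newline separator and the output buffer are all assumptions made for this illustration; it expects `count` to be at most 100 and `out` to hold at least 3 bytes per number.

```c
#include <stddef.h>

/* Hypothetical helper: writes the numbers 0 .. count-1 in decimal, each
 * followed by '\n', into out and returns the number of bytes written.
 * Assumes count <= 100 and that out has room for 3 * count bytes. */
size_t format_sequence(unsigned count, char *out)
{
    size_t n = 0;

    for (unsigned value = 0; value < count; value++) {
        unsigned tens = value / 10;          /* integer division */
        unsigned ones = value - 10 * tens;   /* remainder */

        if (tens != 0)
            out[n++] = (char)('0' + tens);   /* skip a leading zero */
        out[n++] = (char)('0' + ones);
        out[n++] = '\n';
    }
    return n;
}
```

One function with a single branch on the tens digit replaces the separate L1, L2 and L3 code paths, and it keeps working past 20 without further special cases.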
The following code counts from 0 up to 99 (ax contains the ASCII number):
count proc
mov cx, 100 ; loop runs the number of times specified in the cx register
xor bx, bx ; set counter to zero
print:
mov ax, bx
aam ; Converts binary to unpacked BCD
xor ax, 3030h ; Converts unpacked BCD to ASCII
; Print here (ax now contains the number in ASCII representation)
inc bx ; Increase counter
loop print
ret
count endp
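For readers less familiar with AAM, its effect on a value below 100 can be modelled in C. This is only a sketch of the two instructions `aam` and `xor ax, 3030h`; the function name `aam_to_ascii` is made up for the example.

```c
#include <stdint.h>

/* Hypothetical model of "aam" followed by "xor ax, 3030h" for al < 100.
 * Returns the resulting AX: high byte = ASCII tens digit,
 * low byte = ASCII ones digit. */
uint16_t aam_to_ascii(uint8_t al)
{
    uint8_t ah = (uint8_t)(al / 10);   /* AAM: AH = AL / 10 */
    al = (uint8_t)(al % 10);           /*      AL = AL mod 10 */

    uint16_t ax = (uint16_t)((ah << 8) | al);

    /* Both digits are 0..9, so XOR with 30h just sets the two bits that
     * turn each digit into its ASCII code. */
    return (uint16_t)(ax ^ 0x3030u);
}
```

For example, an input of 74 gives AH = '7' (37h) and AL = '4' (34h), so the tens digit has to be printed from AH first.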