MASM32 string comparison - masm32

I have written the following code to compare two strings: one is predefined and the other is taken as input from the user. But every time, the program reports them as unequal. Please assist me. I am using the MASM32 assembler.
.data
msg1 db '***Welcome to My Program***',13,10,0
msg2 db 'Please Enter a Product: ',0
msg3 db 'You Entered Shoes: ',0
p1 db 'Shoes',0
.data?
product db 100 dup(?)
.code
start:
invoke StdOut,ADDR msg1
invoke StdOut,ADDR msg2
invoke StdIn,ADDR product,100 ; receive text input
lea esi, p1 ; Load the buffer start address
lea edi, product ; Load the save buffer start address
mov ecx, 10 ; Load the operation count
repe cmpsb ; Compare the byte into the buffer
jne Terminate
invoke StdOut,ADDR msg3
Terminate:
invoke ExitProcess,0
END start

I don't have the MASM32 reference to hand, but from memory StdIn also stores the carriage return + line feed produced by hitting Enter in the console, so they end up in the buffer you read into.
MASM32 has a built-in function called StripLF, or something like that, to deal with this. The comparison should pass after that.
For problems like this I highly recommend OllyDbg, which lets you step through your code and see the memory dump and stack.
Edit: Just found a thread on the MASM32 forums demonstrating exactly what I'm describing (ignore the fact that it's a bug report; hutch comments on the intended behaviour of StdIn): http://www.masm32.com/board/index.php?PHPSESSID=b98a1a56c52fdc4c07a2bca3553302e2&topic=51.0
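To make the failure concrete, here is a minimal Python sketch that models what repe cmpsb does on two byte buffers. It assumes, as recalled above, that the input buffer holds "Shoes" followed by CR/LF; the helper names repe_cmpsb and strip_crlf are hypothetical and are not MASM32 routines. Note also that the original count of 10 keeps comparing past the 6 bytes of 'Shoes',0, so the sketch compares only the text plus its terminator.

```python
def repe_cmpsb(src: bytes, dst: bytes, count: int) -> bool:
    """Model of `repe cmpsb`: compare up to `count` bytes, stop at the first mismatch.

    Returns True when ZF would be set afterwards (all compared bytes equal).
    """
    for i in range(count):
        if src[i] != dst[i]:
            return False  # ZF clear -> `jne` is taken
    return True  # ZF set -> execution falls through


def strip_crlf(buf: bytes) -> bytes:
    """Hypothetical helper: cut at the first CR or LF and zero-fill the rest."""
    for i, b in enumerate(buf):
        if b in (0x0D, 0x0A):
            return buf[:i] + bytes(len(buf) - i)
    return buf


p1 = b"Shoes\x00"                       # 6 bytes, like `p1 db 'Shoes',0`
raw = b"Shoes\r\n".ljust(100, b"\x00")  # ASSUMED buffer contents after StdIn

n = len(p1)  # compare the text plus its terminator, not a fixed 10
print(repe_cmpsb(p1, raw, n))              # False: '\r' sits where p1 has its 0
print(repe_cmpsb(p1, strip_crlf(raw), n))  # True once CR/LF is removed
```

In the assembly itself, the equivalent change would be to strip the line ending first and then load ecx with 6 (or the computed length plus one) instead of 10.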

Related

How do I convert a disassembled instruction to be shown in binary in gdb?

I've got a binary that I've disassembled into viewable assembly in gdb. However, I'd like to see the actual binary of each instruction (i.e. the actual instruction in whatever instruction format it is actually issued to the CPU in). Is there a way to input the address of an instruction and see that instruction in binary?
I tried p /t 0x-------- for whatever address, but it decoded the address itself into binary.
I tried the same, but with $0x--------, this produced a "Value can't be converted to integer" error.
I'd just like to be able to see an instruction such as lwi or ori at a given address, such as 0x00000300, in binary as gdb is seeing it.
You are looking for disassemble/r 0x....
From the manual:
print the raw instructions in hex as well as in symbolic
form by specifying the /r modifier.
Update:
I can see, in layout asm, the assembly instructions obtained from my binary. But running the disassemble command on its own does not allow me to see anything, as it says "No function contains specified address."
So your binary is stripped (or at least GDB doesn't know where the nearest function is).
The solution is to disassemble just the instruction you are interested in. For example:
(gdb) disas 0x0000555555556d60
No function contains specified address.
(gdb) disas 0x0000555555556d60,+1
Dump of assembler code from 0x555555556d60 to 0x555555556d61:
0x0000555555556d60: mov %edi,%eax
End of assembler dump.
(gdb) disas/r 0x0000555555556d60,+1
Dump of assembler code from 0x555555556d60 to 0x555555556d61:
0x0000555555556d60: 89 f8 mov %edi,%eax
End of assembler dump.
I found the solution; it was to write the following command:
p /x *[hex address]
So for example:
p /x *0x00000300
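If the goal is literally the bit pattern, the hex bytes shown by disas/r can be converted with a few lines of Python. This is only a small sketch using the 89 f8 bytes from the dump above; the function name to_binary is made up for illustration. (GDB's x command also accepts the t format for binary output, e.g. x/2tb followed by the address.)

```python
def to_binary(hex_bytes: str) -> str:
    """Turn a string of hex bytes such as '89 f8' into 8-bit binary groups."""
    return " ".join(f"{int(b, 16):08b}" for b in hex_bytes.split())


print(to_binary("89 f8"))  # 10001001 11111000
```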

How can I simulate a CALL instruction by using JMP?

Like this but without the CALL instruction. I suppose that I should use JMP and probably other instructions.
PUSH 5
PUSH 4
CALL Function
This is fairly easy to do. Push the return address onto the stack and then jump to the subroutine.
The final code looks like this:
PUSH 5
PUSH 4
PUSH offset label1
jmp Function
label1: ; returns here
lea esp, 8[esp] ; discard the two pushed arguments
Function:
...
ret
While this works, you really don't want to do this. Most modern processors keep an on-chip return-address cache, which pushes the return address on a CALL and pops it on a RET. Being on the processor, it has extremely short update/access times, so the RET instruction can use the popped value to predict where the PC should go next rather than waiting for the memory read from the location ESP actually points to. If you do the "PUSH offset label1" trick, this cache does not get updated, so the RET branch prediction is wrong and the processor pipeline gets flushed, with a severe negative impact on performance. (I think IBM has a patent on special instructions which are essentially "PUSHRETURNADDRESS k" and "POPRETURNADDRESS", allowing this trick to be used on some of their CPUs. Alas, not on the x86.)
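The effect of bypassing the return-address cache can be sketched with a toy model. This is not how any particular CPU is implemented; the class name, the addresses 0x1000 and 0x2000, and the "predict by popping" rule are all simplifying assumptions.

```python
class ReturnStackPredictor:
    """Toy model of an on-chip return-address predictor (not a specific CPU)."""

    def __init__(self):
        self.ras = []          # predictor stack kept on the processor
        self.mispredicts = 0

    def call(self, return_addr):
        # A real CALL pushes the return address to memory AND to the predictor.
        self.ras.append(return_addr)

    def ret(self, actual_addr):
        # RET predicts from the predictor stack, then checks against memory.
        predicted = self.ras.pop() if self.ras else None
        if predicted != actual_addr:
            self.mispredicts += 1


def run(use_real_call: bool) -> int:
    cpu = ReturnStackPredictor()
    cpu.call(0x1000)           # some outer caller called us normally
    if use_real_call:
        cpu.call(0x2000)       # CALL Function
    # else: PUSH offset label1 / JMP Function leaves the predictor untouched
    cpu.ret(0x2000)            # Function's RET returns to label1
    cpu.ret(0x1000)            # later, our own RET to the outer caller
    return cpu.mispredicts


print(run(True), run(False))   # 0 2
```

In this model the manual push costs more than one misprediction: the predictor stays out of step, so the enclosing function's return is mispredicted too.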
It depends on the situation. If the last thing your function does before returning is call another function, you can simply jump to that function. This is called tail call elimination, and is an optimization performed by many compilers. Example:
foo:
call B
call A
ret
Tail call elimination replaces the last two lines with a single jump instruction:
foo:
call B
jmp A
This works because the stack contains the return address of foo's caller. So when function A returns, it returns back to the function that called foo.
If you want execution to resume after the jump to A, push that address onto the stack before jumping:
foo:
call B
push offset bar
jmp A
bar:
However, I can think of no reason why anybody would want to do this.
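A rough way to convince yourself that the tail-call form returns to the right place is to track only the return addresses on the stack. The sketch below does that in Python; the labels such as "caller_of_foo" are hypothetical placeholders, not real addresses.

```python
def run_foo(tail_call: bool) -> str:
    """Return the label that control reaches once foo is finished."""
    stack = ["caller_of_foo"]       # return address pushed by foo's caller

    stack.append("foo_after_B")     # call B
    assert stack.pop() == "foo_after_B"   # B's ret lands back in foo

    if tail_call:
        # jmp A: nothing is pushed, so A's ret pops foo's own return address
        return stack.pop()

    stack.append("foo_after_A")     # call A
    assert stack.pop() == "foo_after_A"   # A's ret lands back in foo
    return stack.pop()              # foo's own ret


print(run_foo(False), run_foo(True))  # caller_of_foo caller_of_foo
```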
Before x86-64, call was the only instruction that could read EIP. (I guess int as well, but it doesn't put the result anywhere you can read from user-space).
So it's impossible to simulate call in position-independent code. In fact, 32-bit PIC code uses call to find out its own address.
But in x86-64, we have RIP-relative lea:
... put function args in registers
lea rax, [rel ret_addr] ; AT&T lea ret_addr(%rip), %rax
push rax
jmp call_target
ret_addr:
call itself internally decodes as push RIP / jmp target, where RIP during execution of an instruction = address of the end of that instruction = start of the next.
Of course this is normally terrible for performance, unbalancing the return-address predictor stack (see http://blog.stuffedcow.net/2018/04/ras-microbenchmarks/). Use a normal call unless you want a ret to mispredict, e.g. for a retpoline or specpoline.
(A tailcall with just jmp is fine, collapsing a call/ret pair into a jmp, but pushing a new return address manually is always a problem.)

VHDL MIPS 5 stage pipeline Bug

The code for this is too long to post, so I'll just describe it. I've created a 5-stage MIPS pipeline that almost works. The catch is that EVERY lw instruction that reaches the instruction decode stage overwrites the control signal values in the execution stage. Not only that, it causes the PC to skip an instruction, i.e. from 300 -> 308. I just need some idea of where to look for bugs, since this is a class assignment. If we take out all the LW instructions, the CPU works fine.
Example:
The adder in the EX stage is going to sub $4 $1 $2 which should be 1
Once LW enters the ID stage ALUsrc is asserted AND ALUop is changed from subtract to add
This forces the adder in the EX stage to add $4 $1 $2 resulting in 5 being stored in $4
http://en.wikipedia.org/wiki/File:MIPS_Architecture_%28Pipelined%29.svg
The MIPS 5 Stage Pipeline (annotated to show Write Reg Select and enable)
The bottom line through the pipeline stages represents the register file write (back) port address and write enable and WB is the data from memory.
http://www.mrc.uidaho.edu/mrc/people/jff/digital/MIPSir.html
Load Word Instruction
Description:
A word is loaded into a register from the specified address.
Operation: $t = MEM[$s + offset]; advance_pc (4);
Syntax: lw $t, offset($s)
Encoding:
1000 11ss ssst tttt iiii iiii iiii iiii
Here the register to be written ($t) receives the word read from the data memory address formed by adding the sign-extended immediate i to register file register $s. In your example $4 corresponds to $t and $1 or $2 to $s, while the remaining register file output lane appears to be taken over by the sign-extended immediate.
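The field layout above can be checked with a small decoder. This is a Python sketch rather than VHDL, and the function name decode_lw and the two sample words are chosen only for illustration.

```python
def decode_lw(word: int):
    """Split a 32-bit MIPS I-type word into (opcode, rs, rt, signed immediate)."""
    opcode = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F      # base register ($s)
    rt = (word >> 16) & 0x1F      # destination register ($t)
    imm = word & 0xFFFF
    if imm & 0x8000:              # sign-extend the 16-bit immediate
        imm -= 0x10000
    return opcode, rs, rt, imm


# lw $4, 8($1): opcode 100011, rs = 1, rt = 4, immediate = 8
print(decode_lw(0x8C240008))      # (35, 1, 4, 8)
# lw $4, -4($1): the immediate 0xFFFC sign-extends to -4
print(decode_lw(0x8C24FFFC))      # (35, 1, 4, -4)
```

Note that for lw the destination is the rt field, not rd as in R-type instructions such as sub, so the write register select has to follow the instruction as it moves down the pipeline.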
From your description it sounds like you aren't using a three port register file with one port a write only port.
With a three port register file the only time you run into conflicts is when you attempt to use the new register file value from memory before it is read from memory and written to the register file. That can be managed by a compiler scheduling NOOPs until the outstanding register file write is retired when a following instruction is trying to use it, or by stalling the IF/ID in hardware when its output contains a reference to an outstanding register file write.
There are three instructions that can be in flight to the right of IF/ID, each with a register file write address and a write enable. You'd need to compare both instruction decode register file addresses to all three of those and stall IF/ID until those clear out. The write enable stored in each of those three pipeline stages is used to determine whether the write register address in that pipeline stage should be compared.
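The comparison described here can be written down as a small predicate. The following Python sketch is only a behavioural model of the stall condition, not synthesizable logic; the name must_stall and the tuple layout are assumptions, and it does not special-case register $0.

```python
def must_stall(src_regs, in_flight):
    """Decide whether IF/ID has to stall.

    src_regs:  register numbers read by the instruction in decode
    in_flight: (write_enable, write_reg) pairs for ID/EX, EX/MEM and MEM/WB
    """
    # A stage only matters if its write enable is set AND it targets a source.
    return any(we and wr in src_regs for we, wr in in_flight)


# sub reads $1 and $2; nothing in flight writes either of them
print(must_stall((1, 2), [(True, 4), (False, 1), (True, 7)]))   # False
# the next instruction reads $4 while a write to $4 is still outstanding
print(must_stall((4, 2), [(True, 4), (False, 0), (False, 0)]))  # True
```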
Because the ID/EX, EX/MEM and MEM/WB write register file addresses are not used anywhere else the circuitry for doing the comparison can be collocated with IF/ID and the Register File, preventing unnecessary layout delays affecting the minimum clock cycle.
Using a two port register file is much simpler and implies stalling IF/ID until the write enable comes back from MEM/WB, effectively turning any memory reading instruction into a 3 cycle instruction (or more; data memory can stall if it's a cache or slow). That makes a three port register file more or less necessary for performance reasons. There's also an implied multiplexer that sources the controls (write enable, write address) for at least one of the two register file ports from the MEM/WB stage when IF/ID is stalled (for memory->regfile).
Data memory access can stall MEM/WB, just like instruction memory access can also stall IF/ID. A stalled IF/ID doesn't issue a write enable for the register file to ID/EX nor does a stalled MEM/WB.

x86 assembly functions

I have a function that is called by main. Assume that function's name is funct1. funct1 calls another function named read_input.
Now assume that funct1 starts as follows:
push %rbp
push %rbx
sub $0x28, %rsp
mov %rsp, %rsi
callq 4014f0 read_input
cmpl $0x0, (%rsp)
jne (some terminating function)
So just a few questions:
In this case, does read_input only have one argument, which is %rbx?
Furthermore, if the stack pointer is being decreased by 0x28, does this mean a string of size 0x28 is getting pushed onto the stack? (I know it's a string.)
And what is the significance of mov %rsp, %rsi before calling a function?
And lastly, when read_input returns, where is the return value put?
Thank you and sorry for the questions but I am just starting to learn x86!
It looks like your code is using the Linux/AMD64 (System V) ABI. I'll answer your questions in that context.
No, rbx is a callee-saved (nonvolatile) register. Your function is saving it so that it doesn't disturb the caller's value. It's not being restored in the code you've shown, but that's because you haven't shown the whole function. If there's more to this function, and I think there is, it's because rbx is being used somewhere later on in this routine.
Yes, space for 0x28 bytes of data is being made on the stack. Assuming read_input is taking a string as a parameter, your description is reasonable. It's not necessarily accurate, however. Some of that data might be used for other local variables aside from just the buffer being allocated to pass to read_input.
This instruction is putting a pointer to the newly allocated stack buffer into rsi. rsi is the second parameter register in the AMD64 calling convention (rdi is the first). That means read_input is going to be called with whatever was passed to this function as its first parameter, since rdi is untouched in the code shown, along with a pointer to your new stack buffer.
In rax, if it's a 64-bit value or smaller, in rax & rdx if it's larger. Or if it's floating point, in xmm0, ymm0, or st(0). You probably should look at a description of your calling convention to get a handle on this stuff - there's a great PDF file at this link. Check out Table 4.
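On the 0x28 itself: one likely reason for that particular size is stack alignment. The System V AMD64 ABI requires rsp to be 16-byte aligned at the point of a call, which means rsp is 8 past a multiple of 16 on entry to a function. The arithmetic can be checked with a short Python sketch; the entry address below is a made-up value and the function name is hypothetical.

```python
def rsp_before_call(entry_rsp: int) -> int:
    """Follow rsp through the prologue shown in the question."""
    rsp = entry_rsp
    rsp -= 8      # push %rbp
    rsp -= 8      # push %rbx
    rsp -= 0x28   # sub $0x28, %rsp
    return rsp


entry = 0x7FFFFFFFE008               # hypothetical: 8 past a 16-byte boundary
print(rsp_before_call(entry) % 16)   # 0
```

So the 0x28 bytes are not necessarily all buffer; part of the amount may exist only to keep the stack aligned for the call to read_input.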

Will arguments to a function be passed on the stack or in a register?

I'm currently analyzing a program I wrote in assembly and was thinking about moving some code around in the assembly. I have a procedure which takes one argument, but I'm not sure if it is passed on the stack or a register.
When I open my program in IDA Pro, the first line in the procedure is:
ThreadID= dword ptr -4
If I hover my cursor over the declaration, the following also appears:
ThreadID dd ?
r db 4 dup(?)
which I would assume would point to a stack variable?
When I open the same program in OllyDbg however, at this spot on the stack there is a large value, which would be inconsistent with any parameter that could have been passed, leading me to believe that it is passed in a register.
Can anyone point me in the right direction?
The way arguments are passed to a function depends on the function's calling convention. The default calling convention depends on the language, compiler and architecture.
I can't say anything for sure with the information you provided, however you shouldn't forget that assembly-level debuggers like OllyDbg and disassemblers like IDA often use heuristics to reverse-engineer the program. The best way to study the code generated by the compiler is to instruct it to write assembly listings. Most compilers have an option to do this.
It is a local variable for sure. To check out arguments look for [esp+XXX] values. IDA names those [esp+arg_XXX] automatically.
.text:0100346A sub_100346A proc near ; CODE XREF: sub_100347C+44p
.text:0100346A ; sub_100367A+C6p ...
.text:0100346A
.text:0100346A arg_0 = dword ptr 4
.text:0100346A
.text:0100346A mov eax, [esp+arg_0]
.text:0100346E add dword_1005194, eax
.text:01003474 call sub_1002801
.text:01003474
.text:01003479 retn 4
.text:01003479
.text:01003479 sub_100346A endp
And the fastcall convention, as outlined in the comment above, uses registers to pass arguments. I'd bet on a Microsoft or GCC compiler, as they are more widely used, so check the ECX and EDX registers first.
Microsoft or GCC __fastcall convention (aka __msfastcall) passes the first two arguments (evaluated left to right) that fit into ECX and EDX. Remaining arguments are pushed onto the stack from right to left.
http://en.wikipedia.org/wiki/X86_calling_conventions#fastcall
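As a quick illustration of that rule, the following Python sketch places a list of arguments the way the quoted description says. It assumes every argument is a 32-bit integer that fits in a register; the function name fastcall_layout is invented for this example.

```python
def fastcall_layout(args):
    """Place 32-bit integer arguments as the quoted __fastcall rule describes."""
    regs = dict(zip(("ECX", "EDX"), args[:2]))  # first two arguments, left to right
    pushes = list(reversed(args[2:]))           # the rest, pushed right to left
    return regs, pushes


print(fastcall_layout([10, 20, 30, 40]))
# ({'ECX': 10, 'EDX': 20}, [40, 30])
```

For the one-argument procedure in the question, this would mean looking at ECX on entry rather than at the stack.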