Reverse engineering ambiguous syntax

What I often see online, when the topic is reversing, is this syntax:
*(_WORD *)(a1 + 6) = *(_WORD *)(a2 + 2);
I think this code is from an IDA plugin (right?), but I can't understand it. Can someone explain it to me a little, or point me to somewhere I can study this kind of code?
Thanks in advance =)

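This code copies 2 bytes from the address pointed to by a2 + 2 into the address pointed to by a1 + 6. (Here a1 and a2 are plain integers or byte pointers, as is typical in decompiler output, so + 2 and + 6 are byte offsets.)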
In more detail, the code does the following:
Advance 2 bytes from a2.
Treat the result as a WORD pointer, i.e. a pointer to a value made up of two bytes. This is the (_WORD *) part on the right.
Read the 2 bytes referenced by that pointer. This is the * at the very left of the expression on the right.
We now have a 16-bit value. Now we:
Advance 6 bytes from a1.
Treat the result as a WORD pointer. Again, this is the (_WORD *) part.
Write the 2 bytes we read in the first part to the address held by that pointer.
If you've never seen such code before, you may think it's superfluous to use (_WORD *) on both sides of the expression, but it is not: the two sides can have different types. For example, we could read a 16-bit value and write it through a pointer to a 32-bit value (e.g. by sign-extending it).
I suggest that you also look at the assembly code, where you will see the individual steps that make up this assignment. If you don't have it available, just write a C program of your own that does such a manipulation, compile it, and then disassemble or decompile the result.
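Taking up that suggestion, here is a minimal C fragment you could compile and then disassemble or decompile. It assumes a1 and a2 are byte pointers (so + 6 and + 2 are byte offsets) and that the decompiler's _WORD is a 16-bit unsigned type, spelled WORD here; copy_word and widen_word are hypothetical names. The second function illustrates the point above that the casts on the two sides are independent.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the decompiler's _WORD: a 16-bit unsigned type (assumption). */
typedef uint16_t WORD;

/* The statement from the question, with a1 and a2 as byte pointers so that
   "+ 6" and "+ 2" are byte offsets: copy the 16-bit value at a2+2 to a1+6. */
static void copy_word(uint8_t *a1, const uint8_t *a2)
{
    assert(a1 && a2);
    *(WORD *)(a1 + 6) = *(const WORD *)(a2 + 2);
}

/* Different casts on each side: read a signed 16-bit value at a2+2 and
   store it, sign-extended, into a 32-bit slot at a1+4. */
static void widen_word(uint8_t *a1, const uint8_t *a2)
{
    assert(a1 && a2);
    *(int32_t *)(a1 + 4) = *(const int16_t *)(a2 + 2);
}
```

To try it, pass (uint8_t *) views of a uint16_t source array and a uint16_t or int32_t destination array; those element types keep the 16- and 32-bit accesses aligned and well-defined C.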

Related

How to perform Arithmetic on Ones Complement Numbers and correct overflow?

For some backstory, I'm making a program that can do arithmetic on ones' complement numbers. To do this, I convert a binary string into a BigInteger, perform the math using BigIntegers, and then convert the result back into a binary string. The only problem occurs when the end result goes below -127 or above +127, because I don't know how to correct it given the nature of ones' complement numbers. I was hoping I could instead convert them like unsigned numbers and do what this answer says to do.
There are also a couple of other questions that came up while reading the linked question. I put them in block quotes. I'm just asking what they mean, so please explain them to me.
Firstly
I know that the (r-1)'s complement of a base-r number should do an end-around carry if the highest bit produces a carry.
Secondly
End-around carry is actually rather simple: it changes the modulus of the addition operation from r^n to r^n - 1.
And lastly
Again, let's keep the carry bit where it is. If you look at the numbers as unsigned integers, we're computing 13 + 11 = 24. However, due to the wrap-around carry, addition is done modulo 15, so we end up with 9, which represents -6 (the correct result).
If someone can explain these quotes to me and provide some web pages for me to read I would greatly appreciate it! :)
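To make the quotes concrete, here is a minimal C sketch of n-bit ones' complement addition (oc_add and oc_value are hypothetical names). The end-around carry takes the carry out of the top bit and adds it back at the bottom, which keeps the result congruent to the unsigned sum modulo 2^n - 1; with n = 4, 13 + 11 = 24 becomes 8 + 1 = 9, i.e. -6, as in the last quote. Overflow is flagged with the usual rule that two operands of the same sign must not produce a result of the opposite sign. With n = 8 the range is -127..+127, so an overflowing result cannot be corrected within 8 bits; detecting it and using a wider n is the usual response.

```c
#include <assert.h>
#include <stdbool.h>

/* Adds two n-bit ones' complement values (1 <= n <= 16 here) with an
   end-around carry. *overflow is set when the true result does not fit. */
static unsigned oc_add(unsigned a, unsigned b, unsigned n, bool *overflow)
{
    assert(n >= 1 && n <= 16 && overflow);
    unsigned mask = (1u << n) - 1u;
    unsigned sign = 1u << (n - 1);
    a &= mask;
    b &= mask;
    unsigned sum = a + b;
    if (sum > mask)                 /* carry out of the top bit ...      */
        sum = (sum & mask) + 1u;    /* ... is added back at the bottom   */
    /* Same-sign operands producing an opposite-sign result: overflow. */
    *overflow = ((a & sign) == (b & sign)) && ((sum & sign) != (a & sign));
    return sum;
}

/* Converts an n-bit ones' complement bit pattern to a signed int. */
static int oc_value(unsigned x, unsigned n)
{
    unsigned mask = (1u << n) - 1u;
    x &= mask;
    return (x & (1u << (n - 1))) ? -(int)(~x & mask) : (int)x;
}
```

Negating a value in ones' complement is just inverting its bits, so -5 in 8 bits is ~5 & 0xFF.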

Are there any errors in this Hamming code 10101011110?

Suppose we are working with an error-correcting code that will allow all single-bit errors to be
corrected for memory words of length 7. We have already calculated that we need 4 check bits,
and the length of all code words will be 11. Code words are created according to the Hamming
algorithm presented in the text. We now receive the following code word:
1 0 1 0 1 0 1 1 1 1 0
Assuming even parity, is this a legal code word? If not, according to our error-correcting code,
where is the error?
P.S. I need a bit of help with this Hamming code problem; it's a book question.
Thanks in advance :)
Code words are created according to the Hamming algorithm presented in the text.
This is an important piece of information, don't you think? :-)
Without the algorithm, we can't be sure of validity so I'll give you a general approach.
Each check bit will generally apply to some subset of bits. Let's say the seven bits are b6 through b0. The four check bits are calculated to give even parity across the following subsets:
Data bits: b6 b5 b4 b3 b2 b1 b0 = 1 0 1 0 1 0 1
ca covers b6 b5 b3 b2 b1 b0: 1+0+0+1+0+1 = 3, so ca = 1
cb covers b6 b4 b2 b0: 1+1+1+1 = 4, so cb = 0
cc covers b6 b3 b0: 1+0+1 = 2, so cc = 0
cd covers b5 b1 b0: 0+0+1 = 1, so cd = 1
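The parity arithmetic in these rows is easy to get wrong by hand, so here is a small C sketch that recomputes the check bits for the hypothetical subsets listed above. The coverage masks use bit k for b_k, and packing ca..cd into bits 0..3 of the result is our own choice. For the data 1010101, ca covers three 1 bits, so under even parity ca must be 1.

```c
#include <assert.h>

/* Returns 1 if an odd number of bits are set in x, else 0. */
static unsigned parity(unsigned x)
{
    unsigned p = 0;
    while (x) {
        p ^= 1u;
        x &= x - 1u;   /* clear the lowest set bit */
    }
    return p;
}

/* Hypothetical coverage masks (bit k = b_k):
   ca: b6 b5 b3 b2 b1 b0, cb: b6 b4 b2 b0, cc: b6 b3 b0, cd: b5 b1 b0. */
static const unsigned COVER[4] = {0x6F, 0x55, 0x49, 0x23};

/* Under even parity each check bit equals the parity of the bits it covers.
   Returns ca in bit 0, cb in bit 1, cc in bit 2 and cd in bit 3. */
static unsigned check_bits(unsigned data)
{
    assert(data <= 0x7Fu);
    unsigned c = 0;
    for (unsigned j = 0; j < 4; j++)
        c |= parity(data & COVER[j]) << j;
    return c;
}
```

check_bits(0x55) (the data 1010101) yields ca = 1, cb = 0, cc = 0, cd = 1.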
Now, if you didn't get a check bit sequence matching the data, you would ideally be able to work out which single data or check bit would need to change in order to repair it. This would only work if you could ensure that each check bit calculation was carefully crafted to add maximal extra information.
That's probably a failing of my calculation above, since I just pulled it out of thin air. But, since my intent was just to explain the concept in the absence of your algorithm, that shouldn't matter.
One way to ensure that an algorithm works is to brute force it:
Get a list of all possible eleven-bit values. There are only 2048 of these, so it's not onerous. For now, discard the ones that are already okay (we're only interested in invalid ones).
In turn, toggle each bit (of the 11) and see if the item becomes valid.
If no bit toggles made it valid, this is not a single-bit error so can safely be ignored.
If one bit toggle made it valid, then you have enough information to fix this case.
If more than one bit toggle made it valid, you don't have enough information to fix this case.
Bottom line: if every one-bit error possibility can be made valid by exactly one single-bit toggle (the second-to-last bullet point above), the correction code is perfect.
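The brute-force procedure described above fits in a short C sketch. It runs on two schemes: the hypothetical subsets from the table above (packed as bits 0..6 = b0..b6 and bits 7..10 = ca..cd, a layout chosen for this sketch) and, for contrast, the classic Hamming layout with check bits at positions 1, 2, 4 and 8 (an assumption; the book's algorithm may differ). For each invalid word it counts how many single-bit toggles make it valid.

```c
#include <assert.h>

/* Returns 1 if an odd number of bits are set in x, else 0. */
static unsigned parity(unsigned x)
{
    unsigned p = 0;
    while (x) { p ^= 1u; x &= x - 1u; }
    return p;
}

/* Scheme 1 (hypothetical, from the table): data b0..b6 in bits 0..6,
   check bits ca, cb, cc, cd in bits 7..10. */
static const unsigned COVER[4] = {0x6F, 0x55, 0x49, 0x23};

static int answer_scheme_valid(unsigned word)
{
    unsigned data = word & 0x7Fu;
    for (unsigned j = 0; j < 4; j++)
        if (parity(data & COVER[j]) != ((word >> (7 + j)) & 1u))
            return 0;
    return 1;
}

/* Scheme 2 (assumed classic layout): bit i-1 holds position i, with check
   bits at positions 1, 2, 4 and 8. Under even parity a word is valid iff
   the XOR of the positions of its 1 bits is zero. */
static int classic_hamming_valid(unsigned word)
{
    unsigned syndrome = 0;
    for (unsigned pos = 1; pos <= 11; pos++)
        if (word & (1u << (pos - 1)))
            syndrome ^= pos;
    return syndrome == 0;
}

struct tally {
    int fixable;     /* exactly one toggle makes it valid           */
    int ambiguous;   /* more than one toggle makes it valid         */
    int not_single;  /* no single toggle helps: not a one-bit error */
};

/* Walks all 2048 eleven-bit words, skipping the ones already valid. */
static struct tally brute_force(int (*is_valid)(unsigned))
{
    assert(is_valid != 0);
    struct tally t = {0, 0, 0};
    for (unsigned w = 0; w < 2048u; w++) {
        if (is_valid(w))
            continue;
        int fixes = 0;
        for (unsigned b = 0; b < 11; b++)
            if (is_valid(w ^ (1u << b)))
                fixes++;
        if (fixes == 0)
            t.not_single++;
        else if (fixes == 1)
            t.fixable++;
        else
            t.ambiguous++;
    }
    return t;
}
```

For the hypothetical scheme some invalid words come out ambiguous: b5 and b1 are covered by exactly the same check bits (ca and cd), so a flip of either one looks identical. For the classic layout none are: 1408 invalid words have exactly one fix and the remaining 512 are not single-bit errors.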
You should also check that every valid word becomes invalid if a single bit is toggled, but I'll leave that as an exercise since you should now have enough information on how to do this.

Reverse-Engineering Memory Load Techniques?

I am attempting to reverse engineer a game (with permission). I am using IDA Pro. The functions are sub_xxxxx, meaning that they are protected functions.
However, the strings that would be the names for the functions, when looking at the only cross-reference, are shown in the following manner:
__data:xxxxxxxx DCD aEcdh_compute_k ; "ECDH_compute_key"
__data:xxxxxxxx DCB 0
__data:xxxxxxxx DCB 0x40
__data:xxxxxxxx DCB 12
__data:xxxxxxxx DCB 0x3B
Some of the numbers, including the DCBs, have been changed for the sake of safety (OCD).
I tried using 40 12 3B as an offset. However, that offset takes me to the middle of a random loc_xxxxx, and the same happens with the others.
My question to you is, how would I go about finding where the actual function is? Is the offset from the top of the .data segment? Or is it from the actual declaring string itself?
I do not expect or require a full answer; obviously this may not have been encountered in the past, and I may not have given all the information needed. (If you need more information, please ask, thanks.) Basically, I am asking, "What should I try next?", trying to find the most likely answer. Thank you.
You're ignoring the processor's endianness, which is usually little-endian.
Hitting D twice (once to convert the data representation from byte to word, and again to convert it from word to dword) will convert the data to a dword for you. Alternatively, you could hit O to convert the data representation directly to an offset (which is dword-sized on most architectures).
This will most likely show you an offset to address 0x003b1240, which is probably the address you were looking for.
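A small C sketch of the byte-order point. It assumes the dword consists of the three bytes shown followed by a 0x00 byte (the listing does not show a fourth byte; the 0x003b1240 above implies it). load_le32 and load_be32 are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Builds a 32-bit value from four bytes in memory order,
   least significant byte first (little-endian). */
static uint32_t load_le32(const uint8_t b[4])
{
    assert(b);
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

/* The same bytes read most significant byte first (big-endian). */
static uint32_t load_be32(const uint8_t b[4])
{
    assert(b);
    return ((uint32_t)b[0] << 24)
         | ((uint32_t)b[1] << 16)
         | ((uint32_t)b[2] << 8)
         | (uint32_t)b[3];
}
```

For the bytes 40 12 3B 00, load_le32 gives 0x003B1240, while reading them in the order they appear gives 0x40123B00, which is likely why using 40 12 3B directly led nowhere.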

PIC Assembly: Calling functions with variables

Say I have a variable, song_no, which holds a song number.
Depending upon the value of this variable, I wish to call a function.
Say I have many different functions:
Fcn1
....
Fcn2
....
Fcn3
So for example,
If song_no = 1, call Fcn1
If song_no = 2, call Fcn2
and so forth...
How would I do this?
You should have a compare instruction in the instruction set (the post suggests you are looking for an assembly solution). Its result usually sets a flag bit or writes a value to a register, but you need to check the instruction set for the details.
The code should look something like this:
load(song_no, R1)   // R1 = song_no
cmpeq($1, R1)       // compare R1 with 1; result goes to a flag or register (e.g. R3)
jmpe Fcn1           // jump if equal
cmpeq($2, R1)
jmpe Fcn2
....
Hope this helps
I'm not well acquainted with the PIC, but this sort of thing is usually implemented as a jump table. In short, put pointers to the target routines in an array and call/jump to the entry indexed by your song_no. You only need to calculate the address of the array entry, so it is very efficient. No compares are necessary.
To elaborate on Jens' reply: the traditional way of doing this on 12/14-bit PICs is the same as the way you would look up constant data from ROM, except that instead of returning a number with RETLW, you jump forward to the desired routine with GOTO. The actual jump into the jump table is performed by adding the offset to the program counter.
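In C, the same jump-table idea is an array of function pointers. This is a minimal sketch: fcn1..fcn3 are hypothetical stand-ins that just return their own number, and song_no is treated as 1-based as in the question, so it is shifted down by one before indexing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical song routines; each returns its number so the
   dispatch can be checked. */
static int fcn1(void) { return 1; }
static int fcn2(void) { return 2; }
static int fcn3(void) { return 3; }

/* The jump table: entry k handles song_no == k + 1. */
static int (*const song_table[])(void) = { fcn1, fcn2, fcn3 };

#define SONG_COUNT (sizeof song_table / sizeof song_table[0])

/* Calls the routine for song_no (1-based); returns -1 if out of range. */
static int play_song(size_t song_no)
{
    if (song_no == 0 || song_no > SONG_COUNT)
        return -1;
    return song_table[song_no - 1]();
}
```

The range check matters: indexing past the end would call through a garbage pointer, the C counterpart of jumping past the last GOTO in an assembly table.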
Something along these lines:
movlw high(table)
movwf PCLATH
movlw low(table)
addwf song_no,w     ; only computes the carry: C is set if the entry is on the next page
btfsc STATUS,C
incf PCLATH,f
movf song_no,w
addwf PCL,f
table:
goto fcn1
goto fcn2
goto fcn3
.
.
.
Unfortunately there are some subtleties here.
The PIC16 only has an eight-bit accumulator, while the program counter is 13 bits wide. Therefore both a directly writable low byte (PCL) and a latched high-byte register (PCLATH) are available. The value in the latch is applied as the upper bits of the program counter once the jump is taken.
The jump table may cross a page, hence the manual carry into PCLATH. Omit the BTFSC/INCF if you know the table will always stay within a 256-instruction page.
By the time the ADDWF executes, the program counter has already advanced to table, so PCL holds the low byte of table's address when W is added to it. Therefore an offset of 0 jumps to the first table entry.
Unlike on the PIC18, each GOTO instruction fits in a single 14-bit instruction word and PCL addresses instructions, not bytes, so the offset should not be multiplied by two.
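To see why the carry into PCLATH matters, here is a small C model of the address arithmetic (an illustration, not code for the chip). It assumes a mid-range PIC16, where writing PCL loads the upper five bits of the 13-bit program counter from PCLATH<4:0>; the function name and the example addresses are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Models where the computed jump lands: PCL supplies the low 8 bits and
   PCLATH<4:0> the upper 5 bits of the 13-bit program counter.
   table_addr is the address of the first GOTO; offset is the table index. */
static uint16_t jump_target(uint16_t table_addr, uint8_t offset, int fix_carry)
{
    assert(table_addr <= 0x1FFFu);
    uint8_t pclath = (uint8_t)(table_addr >> 8);             /* movlw high(table) */
    unsigned low = (unsigned)(table_addr & 0xFFu) + offset;  /* low(table) + W */
    if (fix_carry && low > 0xFFu)                            /* btfsc STATUS,C */
        pclath++;                                            /* incf PCLATH,f */
    uint8_t pcl = (uint8_t)low;                              /* carry out of PCL is lost */
    return (uint16_t)(((pclath & 0x1Fu) << 8) | pcl);
}
```

With the table at 0x01FE and an offset of 3, the intended target is 0x0201; without the BTFSC/INCF pair the carry is lost and the jump lands at 0x0101 instead.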
All things considered, you're probably better off searching for general PIC16 tutorials. Any of these will clearly explain how data/jump tables work, not to mention begin with the basics of how to handle the chip. Frankly, it is a particularly convoluted architecture, and I would advise sticking with the "free" HI-TECH C compiler unless you particularly enjoy logic puzzles or desperately need the performance.

What are "magic numbers" in computer programming?

When people talk about the use of "magic numbers" in computer programming, what do they mean?
A magic number is any number in your code whose meaning isn't immediately obvious to someone with very little knowledge of the code.
For example, the following piece of code:
sz = sz + 729;
has a magic number in it and would be far better written as:
sz = sz + CAPACITY_INCREMENT;
Some extreme views state that you should never have any numbers in your code except -1, 0 and 1, but I prefer a somewhat less dogmatic view, since I would instantly recognise 24, 1440, 86400, 3.1415, 2.71828 and 1.414; it all depends on your knowledge.
However, even though I know there are 1440 minutes in a day, I would probably still use a MINS_PER_DAY identifier since it makes searching for them that much easier. Who's to say that the capacity increment mentioned above wouldn't also be 1440, so that you end up changing the wrong value? This is especially true for the low numbers: the chance of dual use of 37197 is relatively low, while the chance of using 5 for multiple things is pretty high.
Use of an identifier means that you wouldn't have to go through all your 700 source files and change 729 to 730 when the capacity increment changed. You could just change the one line:
#define CAPACITY_INCREMENT 729
to:
#define CAPACITY_INCREMENT 730
and recompile the lot.
Contrast this with magic constants which are the result of naive people thinking that just because they remove the actual numbers from their code, they can change:
x = x + 4;
to:
#define FOUR 4
x = x + FOUR;
That adds absolutely zero extra information to your code and is a total waste of time.
"magic numbers" are numbers that appear in statements like
if days == 365
Assuming you didn't know there were 365 days in a year, you'd find this statement meaningless. Thus, it's good practice to assign all "magic" numbers (numbers that have some kind of significance in your program) to a constant,
DAYS_IN_A_YEAR = 365
And from then on, compare to that instead. It's easier to read, and if the earth ever gets knocked out of alignment, and we gain an extra day... you can easily change it (other numbers might be more likely to change).
There's more than one meaning. The one given by most answers already (an arbitrary unnamed number) is a very common one, and the only thing I'll say about that is that some people go to the extreme of defining...
#define ZERO 0
#define ONE 1
If you do this, I will hunt you down and show no mercy.
Another kind of magic number, though, is used in file formats. It's just a value, typically the first thing in the file, which helps identify the file format, the version of the file format and/or the endianness of the particular file.
For example, you might have a magic number of 0x12345678. If you see that magic number, it's a fair guess you're seeing a file of the correct format. If you see, on the other hand, 0x78563412, it's a fair guess that you're seeing an endian-swapped version of the same file format.
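Here is a minimal C sketch of that check, using the answer's example value 0x12345678 as the magic number. FILE_MAGIC, check_magic, swap_bytes32 and the enum are hypothetical names; a real format defines its own value and where it sits in the header.

```c
#include <assert.h>
#include <stdint.h>

#define FILE_MAGIC 0x12345678u   /* example value from the answer */

enum byte_order { ORDER_NATIVE, ORDER_SWAPPED, ORDER_UNKNOWN };

/* Reverses the byte order of a 32-bit value. */
static uint32_t swap_bytes32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Classifies the first 32-bit word read from a file header. */
static enum byte_order check_magic(uint32_t first_word)
{
    if (first_word == FILE_MAGIC)
        return ORDER_NATIVE;                 /* same byte order as the reader */
    if (first_word == swap_bytes32(FILE_MAGIC))
        return ORDER_SWAPPED;                /* written with the other endianness */
    return ORDER_UNKNOWN;                    /* probably not this format at all */
}
```

When check_magic reports ORDER_SWAPPED, every multi-byte field that follows would need the same byte swap.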
The term "magic number" gets abused a bit, though, referring to almost anything that identifies a file format - including quite long ASCII strings in the header.
http://en.wikipedia.org/wiki/File_format#Magic_number
Wikipedia is your friend (Magic Number article)
Most of the answers so far have described a magic number as a constant that isn't self-describing. I'm a bit of an "old-school" programmer myself, and back in the day we described magic numbers as any constant that is assigned some special purpose that influences the behaviour of the code. For example, the number 999999 or MAX_INT or something else completely arbitrary.
The big problem with magic numbers is that their purpose can easily be forgotten, or the value used in another perfectly reasonable context.
As a crude and terribly contrived example:
int i = 0;
while (i != 99999)
{
    DoSomeCleverCalculationBasedOnTheValueOf(i);
    if (escapeConditionReached)
    {
        i = 99999;
    }
}
The fact that a constant is used or not named isn't really the issue. In the case of my awful example, the value influences behaviour, but what if we need to change the value of "i" while looping?
Clearly, in the example above you don't NEED a magic number to exit the loop. You could replace it with a break statement, and that is the real issue with magic numbers: they are a lazy approach to coding, and can always be replaced by something less prone either to failure or to losing meaning over time.
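Here is a sketch of the loop restructured that way. DoSomeCleverCalculationBasedOnTheValueOf and escapeConditionReached are placeholders in the example, so this version substitutes a hypothetical calculation (a running total of i) and a hypothetical escape condition (the total reaching a limit); the point is only that break ends the loop without a sentinel value.

```c
#include <assert.h>

/* Hypothetical stand-in for the loop above: adds i to a running total
   and stops once the total reaches limit. */
static int sum_until(int limit)
{
    assert(limit >= 0 && limit <= 1000000);   /* keeps the total in range */
    int total = 0;
    for (int i = 0; ; i++) {
        total += i;             /* "DoSomeCleverCalculation..." */
        if (total >= limit)     /* "escapeConditionReached" */
            break;              /* no sentinel value needed */
    }
    return total;
}
```

Because the loop variable no longer doubles as an exit signal, the body is free to change i without accidentally ending, or failing to end, the loop.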
Anything that doesn't have a readily apparent meaning to anyone but the application itself.
if (foo == 3) {
// do something
} else if (foo == 4) {
// delete all users
}
Magic numbers are special values of certain variables which cause the program to behave in a special manner.
For example, a communication library might take a Timeout parameter and it can define the magic number "-1" for indicating infinite timeout.
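A minimal sketch of that convention, with the sentinel given a name so that callers don't have to pass a bare -1. TIMEOUT_INFINITE and timed_out are hypothetical, not taken from any particular library.

```c
#include <assert.h>
#include <stdbool.h>

/* Named sentinel instead of a bare -1 (hypothetical constant). */
enum { TIMEOUT_INFINITE = -1 };

/* Returns true once elapsed_ms has reached timeout_ms;
   TIMEOUT_INFINITE never expires. */
static bool timed_out(int timeout_ms, int elapsed_ms)
{
    if (timeout_ms == TIMEOUT_INFINITE)
        return false;
    return elapsed_ms >= timeout_ms;
}
```

A call such as timed_out(TIMEOUT_INFINITE, elapsed) documents itself, whereas timed_out(-1, elapsed) only makes sense to someone who already knows the convention.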
The term magic number is usually used to describe some numeric constant in code. The number appears without any further description and thus its meaning is esoteric.
The use of magic numbers can be avoided by using named constants.
A magic number is a number other than 0 or 1, used in calculations, that isn't defined by some identifier or variable. Defining it not only makes the number easy to change everywhere by changing it in one place, but also makes it clear to the reader what the number is for.
In simple and true words, a magic number is a three-digit number in which the sum of the squares of the first two digits equals the square of the third one.
Ex: 202,
as 2*2 + 0*0 = 2*2.
Now, write a program in Java to accept an integer and print whether it is a magic number or not.
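A C sketch of that check. It reads the definition as a*a + b*b == c*c for the digits a, b, c of a three-digit number, which is the reading the 202 example satisfies (2*2 + 0*0 = 2*2); is_magic is a hypothetical name, and reading the integer and printing the verdict are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* True if n is a three-digit number whose first two digits' squares
   sum to the square of its last digit. */
static bool is_magic(int n)
{
    if (n < 100 || n > 999)
        return false;
    int a = n / 100;          /* hundreds digit */
    int b = (n / 10) % 10;    /* tens digit */
    int c = n % 10;           /* units digit */
    return a * a + b * b == c * c;
}
```

Under this reading, 345 also qualifies (9 + 16 = 25).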
It may seem a bit banal, but there IS at least one real magic number in every programming language.
0
I argue that it is THE magic wand to rule them all in virtually every programmer's quiver of magic wands.
FALSE is inevitably 0
TRUE is not(FALSE), but not necessarily 1! Could be -1 (0xFFFF)
NULL is inevitably 0 (the pointer)
And most compilers allow it unless their typechecking is utterly rabid.
0 is the base index of array elements, except in languages that are so antiquated that the base index is '1'. One can then conveniently code for(i = 0; i < 32; i++), and expect that 'i' will start at the base (0), and increment up to, and stop at, 32-1... the 32nd member of an array, or whatever.
0 marks the end of strings in many programming languages. The "stop here" value.
0 is likewise built into the X86 instructions to 'move strings efficiently'. Saves many microseconds.
0 is often used by programmers to indicate that "nothing went wrong" in a routine's execution. It is the "not-an-exception" code value. One can use it to indicate the lack of thrown exceptions.
Zero is the answer most often given by programmers to the amount of work it would take to do something completely trivial, like change the color of the active cell to purple instead of bright pink. "Zero, man, just like zero!"
0 is the count of bugs in a program that we aspire to achieve. 0 exceptions unaccounted for, 0 loops unterminated, 0 recursion pathways that cannot actually be taken. 0 is the asymptote that we're trying to achieve in programming labor, girlfriend (or boyfriend) "issues", lousy restaurant experiences and general idiosyncrasies of one's car.
Yes, 0 is a magic number indeed. FAR more magic than any other value. Nothing ... ahem, comes close.
rlynch#datalyser.com