Tools, approaches for analysing proprietary data format? - reverse-engineering

I need to analyse a binary data file containing raw data from a scientific instrument. A quick look in a hex viewer indicates that's probably no encryption or anything fancy: integers will probably be written as integers (but I don't know what byte order), and who knows about floating point.
I have access to a (closed source) program that can view the contents of the file. So I can see that a certain value is 74078. Actually searching for that value I'm not sure about - do I search for 00 01 21 5E, some other byte order, etc? (Hex Fiend doesn't support searching for decimal values) And how would I find a floating point number?
The software that produces these files runs on XP. I'd prefer tools that run on OSX if possible.
(Hmm, I wrote up this question, forgot to post it, then solved the problem. I guess I will write my own answer.)

In the end, Hex Fiend turned out to be just enough. What I was expecting to do:
Convert a known value into hex
Search for it
What I actually did:
Pick a random chunk of hex that looked like it might be a useful value
Tell Hex Fiend to display it as integer, or as float, in either little endian or big endian, until it gave a plausible looking result (ie, 45.000 is a lot more plausible than some huge integer)
Search for that result in the results I had from the closed source program.
Document it, go back to step 1. (Except that normally the next chunk wouldn't be 'random', but would follow sequentially.)
In this case there were really only three (binary) variables for how to interpret data:
float or integer
2 bytes or 4 bytes
little or big endian
With more variables the task would be a lot harder. It would have been nice if Hex Fiend could search for integers/floats directly, perhaps trying out the different combinations. Perhaps other hex viewers do.
And to answer one of my original questions, 74078 turned out to be stored as 5E2101. A bit more trial and error and I would have got there. :)
UPDATE
If I was doing this over, I'd use "Synalyze It!", a tool designed for exactly this purpose.

Related

Relay Calculator: How to show binary results in a display?

I'm trying to create a calculator(real calculator), from scratch, with only simple parts (relay, diode, etc.). But I got to a chronic problem, how the hell do I convert binary numbers, ex:00101110 to decimal? so that I can make my display show 46 for example. Making the display translate a 0000...1001 (0...9) is easy, but what about after that? when a number comes with two or more decimal places (ex:10000000 =128). I know it can be tricky to explain, so is there somewhere I can find the answer, maybe a schematic?
This isn't a programming language, it's literally a relay computer (remember the old IBM, Harvard Mark I?). I just want to make a relay calculator that does binary calculations (the calculation part is theoretically finished. For now only sum.)
What I can't do is make the result in binary become something that can be shown on a 7segment display.
An easy example is: "0000 0111" this I can make the display show the number 7, because it has only one decimal place. Now with "0011 0100" the situation changes, the number would be 52, simply making the "52" appear on a display is not a challenge, the problem here is: how does a processor translate binary numbers from 0000 to infinity, in a way that can you put it on a display?
I don't necessarily need a definitive answer, whatever, even if a website, a book, a light at the end of the tunnel.

Convert EM4x02 ID to Hitag2 Value

I've been working on an RFID project to produce our own RFID cards to work on our existing timeclocks and readers.
I've got most of the work done, and have been able to successfully write a Hitag2 card using the value of page 4 & 5 from another card (so basically copying the card) then changing the config bit which makes it act like an EM4x02 which allows our readers to read it.
What I'm struggling with is trying to relate the hex code on page4/5 to the output you get when scanning as an EM4x..
The values of the hitag page 4/5 are FF800000/003EDF10. This translates to 0000001EBC when read as an EM4x.
Does anybody have an idea on how this translation is done? I've tried using the methods in RFIDIOT but that doesn't seem to work for this.
I've managed to find how this is done after finding a hitag2 datasheet from 1999 (the only one I could find that explains the bits when hitag is in public mode A)
Firstly, convert the number you want on the EM4 card to hex.
Convert that hex into binary.
Split the binary into 4 bit chunks, then work out the even parity for each section and add it to the end of each chunk. (So you'll end up with 5 bits per chunk)
Then, work out the even parity of each column in the data (i.e first character of all chunks, then second etc. But ignoring the parity bit you added) and add these 4 bytes to the binary string.
Then add the correct amount of zeros at the start to ensure the data section has 50 bits.
Once you have the data section sorted, add 9 bits of 1 to the beginning (header) and a final 0 to the very end of the binary.
Your whole binary string should be 64 bits long.
Convert this to hex and split it in half. You can then write these onto pages 4/5 of a Hitag2 card.
You then need to change the configuration bit to 0x02 for the tag to work in public mode a.
Just thought I would send you the diagram of how this works.Em4X tag data

Reverse-Engineering Memory Load Techniques?

I am attempting to reverse engineer a game (with permission). I am using IDA Pro. The functions are sub_xxxxx, meaning that they are protected functions.
However, the strings that would be the names for the functions, when looking at the only cross-reference, are shown in the following manner:
__data:xxxxxxxx DCD aEcdh_compute_k ; "ECDH_compute_key"
__data:xxxxxxxx DCB 0
__data:xxxxxxxx DCB 0x40
__data:xxxxxxxx DCB 12
__data:xxxxxxxx DCB 0x3B
Some of the numbers, including the DCBs are changed for the sake of safety (OCD)
I had attempted to use the 40 12 3B to use as an offset. However, the offset brings me to the middle of a random loc_xxxxx, along with the others.
My question to you is, how would I go about finding where the actual function is? Is the offset from the top of the .data segment? Or is it from the actual declaring string itself?
I do not expect or require a full answer; obviously this may not have been encountered in the past, and I may not have given enough information needed. (If you need more information, please ask, thanks). Basically, I am asking, "What should I try next?", trying to find the most likely answer. Thank you.
You're ignoring the processors endianity, which is usually little endian.
Hhitting D two times (once to convert data representation from single byte to word and another to convert it from word to dword) will convert the data to a dword for you. Alternatively, you could also hit O to directly convert data representation to an offset (which is of size dword on most architectures)
This is most likely to show you offset to address 0x003b1240, which is probably the address you were looking for.

Bulk export of binary waveform data from oscilloscope to data points (csv preferred)

I'm working with some binary waveform files from various early to mid-90's HP scopes. I am trying to do a bulk conversion (we have over 5000) of the files to CSV's and then upload them into a database. I've tried hexdump, xxd, od, strings, etc. and none of them seem to work. I did hunt down a programmers manual but it's not making a whole lot of sense.
The files have a preamble line as ascii text but then the data points are in binary and for some reason nothing I try can decode them. The preamble gives the data necessary to use the binary values and calculate the correct values. It also states that the data is in WORD format.
:WAV:PRE 2,1,32768,1,+4.000000E-08,-4.9722700001108E-06,0,+2.460630E-04,+2.500000E+00,16384;:WAV:DATA #800065536^W�^W�^W�^
I'm pretty confused.
Have a look at
http://www.naic.edu/~phil/hardware/oscilloscopes/9000A_Programmer_Reference.pdf
specifically page 1-21. After ":WAV:DATA", I think the rest of the chunk above will have 65536 8-bit data bytes (the start of which is represented above by �) . The ^W is probably a delimiter, so you would have to parse that out. Just a thought.
UPDATE: I'm new to oscilloscope data collection and am trying to figure the whole thing out from scratch. So, on further digging, it looks like the data you have provided shows this:
PREamble:
- WORD format (16-bit signed integers split into 2 8-bit bytes)
- If there is a WAV:BYT section, that would specify byte order for each pair
- RAW data
- 32768 data points
- COUNT = 1 (I'm not clear on the meaning of this)
- Next 3 should be X increment, origin, reference
- Next 3 should be Y increment, origin, reference, although the manual that I pointed you at above has many more fields than just these, so you might want to consult your specific scope manual.
DATA:
- On closer examination, I don't think the ^W is a delimiter, I think it is the first byte of the pair (0010111). The � character is apparently a standard "I don't know how to represent this character" web representation. You would need to look at that character as 8 bits also.
- 65536 byte pairs of data
I'm not finding a utility that will do this for you. I think you're going to have to write or acquire some code (Perl, C, Java, Python, VB, etc.) to get this done.

What exactly is the danger of using magic debug values (such as 0xDEADBEEF) as literals?

It goes without saying that using hard-coded, hex literal pointers is a disaster:
int *i = 0xDEADBEEF;
// god knows if that location is available
However, what exactly is the danger in using hex literals as variable values?
int i = 0xDEADBEEF;
// what can go wrong?
If these values are indeed "dangerous" due to their use in various debugging scenarios, then this means that even if I do not use these literals, any program that during runtime happens to stumble upon one of these values might crash.
Anyone care to explain the real dangers of using hex literals?
Edit: just to clarify, I am not referring to the general use of constants in source code. I am specifically talking about debug-scenario issues that might come up to the use of hex values, with the specific example of 0xDEADBEEF.
There's no more danger in using a hex literal than any other kind of literal.
If your debugging session ends up executing data as code without you intending it to, you're in a world of pain anyway.
Of course, there's the normal "magic value" vs "well-named constant" code smell/cleanliness issue, but that's not really the sort of danger I think you're talking about.
With few exceptions, nothing is "constant".
We prefer to call them "slow variables" -- their value changes so slowly that we don't mind recompiling to change them.
However, we don't want to have many instances of 0x07 all through an application or a test script, where each instance has a different meaning.
We want to put a label on each constant that makes it totally unambiguous what it means.
if( x == 7 )
What does "7" mean in the above statement? Is it the same thing as
d = y / 7;
Is that the same meaning of "7"?
Test Cases are a slightly different problem. We don't need extensive, careful management of each instance of a numeric literal. Instead, we need documentation.
We can -- to an extent -- explain where "7" comes from by including a tiny bit of a hint in the code.
assertEquals( 7, someFunction(3,4), "Expected 7, see paragraph 7 of use case 7" );
A "constant" should be stated -- and named -- exactly once.
A "result" in a unit test isn't the same thing as a constant, and requires a little care in explaining where it came from.
A hex literal is no different than a decimal literal like 1. Any special significance of a value is due to the context of a particular program.
I believe the concern raised in the IP address formatting question earlier today was not related to the use of hex literals in general, but the specific use of 0xDEADBEEF. At least, that's the way I read it.
There is a concern with using 0xDEADBEEF in particular, though in my opinion it is a small one. The problem is that many debuggers and runtime systems have already co-opted this particular value as a marker value to indicate unallocated heap, bad pointers on the stack, etc.
I don't recall off the top of my head just which debugging and runtime systems use this particular value, but I have seen it used this way several times over the years. If you are debugging in one of these environments, the existence of the 0xDEADBEEF constant in your code will be indistinguishable from the values in unallocated RAM or whatever, so at best you will not have as useful RAM dumps, and at worst you will get warnings from the debugger.
Anyhow, that's what I think the original commenter meant when he told you it was bad for "use in various debugging scenarios."
There's no reason why you shouldn't assign 0xdeadbeef to a variable.
But woe betide the programmer who tries to assign decimal 3735928559, or octal 33653337357, or worst of all: binary 11011110101011011011111011101111.
Big Endian or Little Endian?
One danger is when constants are assigned to an array or structure with different sized members; the endian-ness of the compiler or machine (including JVM vs CLR) will affect the ordering of the bytes.
This issue is true of non-constant values, too, of course.
Here's an, admittedly contrived, example. What is the value of buffer[0] after the last line?
const int TEST[] = { 0x01BADA55, 0xDEADBEEF };
char buffer[BUFSZ];
memcpy( buffer, (void*)TEST, sizeof(TEST));
I don't see any problem with using it as a value. Its just a number after all.
There's no danger in using a hard-coded hex value for a pointer (like your first example) in the right context. In particular, when doing very low-level hardware development, this is the way you access memory-mapped registers. (Though it's best to give them names with a #define, for example.) But at the application level you shouldn't ever need to do an assignment like that.
I use CAFEBABE
I haven't seen it used by any debuggers before.
int *i = 0xDEADBEEF;
// god knows if that location is available
int i = 0xDEADBEEF;
// what can go wrong?
The danger that I see is the same in both cases: you've created a flag value that has no immediate context. There's nothing about i in either case that will let me know 100, 1000 or 10000 lines that there is a potentially critical flag value associated with it. What you've planted is a landmine bug that, if I don't remember to check for it in every possible use, I could be faced with a terrible debugging problem. Every use of i will now have to look like this:
if (i != 0xDEADBEEF) { // Curse the original designer to oblivion
// Actual useful work goes here
}
Repeat the above for all of the 7000 instances where you need to use i in your code.
Now, why is the above worse than this?
if (isIProperlyInitialized()) { // Which could just be a boolean
// Actual useful work goes here
}
At a minimum, I can spot several critical issues:
Spelling: I'm a terrible typist. How easily will you spot 0xDAEDBEEF in a code review? Or 0xDEADBEFF? On the other hand, I know that my compile will barf immediately on isIProperlyInitialised() (insert the obligatory s vs. z debate here).
Exposure of meaning. Rather than trying to hide your flags in the code, you've intentionally created a method that the rest of the code can see.
Opportunities for coupling. It's entirely possible that a pointer or reference is connected to a loosely defined cache. An initialization check could be overloaded to check first if the value is in cache, then to try to bring it back into cache and, if all that fails, return false.
In short, it's just as easy to write the code you really need as it is to create a mysterious magic value. The code-maintainer of the future (who quite likely will be you) will thank you.