Tiff tag data buffer strange hex values for floating point variables - binary

So, I have been assigned a task to read and save values of C-struct that was stored within tifftag of tiff image as byte buffer. This might be quite simple, but I am quite new to this realm of programming.
I know the exact positions I need to read bytes from. When I use python tiff tag readers, I get these weird values of bytes, that I could not make sense of. I was expecting it to be \xb5\x00\x00\x00\x01
format, not something strange like \n\xd7#=\n\xd7#=K.
Here is the snippet of weird buffer values
However, in utility app AsTiffViewer, those are perfectly fine as shown
here.
How do I decode this? What does this all mean?
\n\xd7#=\n\xd7#=K (0A D7 23 3D 0A D7 23 3D - as per AsTiffViewer)
By the way, these 0A D7 23 3D & 0A D7 23 3D are supposed to be two float value, each of them 4 bytes.
I was expecting tiff tag byte buffer to be in format of\xb5\x00\x00\x00\x01 etc, However, it spit out some weird format - \n\xd7#=\n\xd7#=K. I don't know how to decode or read this.

So, after mucking around a bit, I found out that, \n\xd7#=\n\xd7#=K, this is nothing but how python represents float in binary string.

Related

How to encode raw bytes?

We have a 2D datamatrix barcode which outputs as 12002052 (CR+LF after the value). When scanning into Chrome the barcode is triggering the downloads menu - which I have read from other posts that this is due to the CR+LF. To troubleshoot, we generated a new 2D datamatrix barcode with an online generator for 12002052 which scans successfully in Chrome (doesn't trigger the downloads menu) but when scanned into notepad++ (showing all characters) it shows the exact same output as the original/bad barcode.
I took an image of both the good and bad barcode and uploaded them to a datamatrix decoding website (zxing) and what is interesting is the last value in the "raw bytes" is different for each barcode
bad 2D
Raw text 12002052
Raw bytes 8e 82 96 b6 81
Barcode format DATA_MATRIX
Parsed Result Type TEXT
Parsed Result 12002052
good 2D
Raw text 12002052
Raw bytes 8e 82 96 b6 0b
Barcode format DATA_MATRIX
Parsed Result Type TEXT
Parsed Result 12002052
my question is what exactly are the "raw bytes" and how could I possible encode them to hopefully reverse engineer this and find what is differentiating the 2 barcodes?
I believe that 'Raw bytes' would refer to a byte array. Byte arrays are exactly what they sound like, an array of bytes which are 8 bits each. So, the raw bytes '8e 82 96 b6 0b' refer to hexidecimal representations of each byte.
That said, from the string you have provided - I do not get a corresponding byte array that matches the raw text input provided. (There are plenty of string to byte converters online) Perhaps some character encoding other than ASCII or UTF8 is used in this case.

How to convert 4 bytes hex to decimal manually

I am doing a CTF challenge. I open a broken BMP image file with a hex editor (HexFiend). I highlight 4 bytes in hex 8E262C00. In the bottom, HexFiend shows their value in decimal 2893454. However, I use online hex to decimal converting tool, their value is 2384866304.
Do anyone know how HexFiend comes up with 2893454?. I believe it is a correct answer, because that is the size of the file.
It's the endianness of the file.
A binary encoded file can be encoded with small or big endian. The difference is which succession the single bytes have, i.e. if you read them from left or from right. Note that the order of bits almost always is big endian. The natural way of reading is big ending; the bytes are stores as you would expect it. 8E262C00 becomes 8E 26 2C 00. This file, however, seems to be stored in small endian format. The order is flipped. In other words; 8E262C00 now becomes 00 2C 26 8E which then results in the decimal representation of 2893454
I think it's the Big Endian and Little Endian things.
You should check out this online converting tool, BMP file format is the Little Endian, but i think the tool maybe convert it by Big Endian method.
try it: https://www.scadacore.com/tools/programming-calculators/online-hex-converter/

Reading / Computing Hex received over RS232

I am using Docklight Scripting to put together a VBScript that communicates with a device via RS232. All the commands are sent in Hex.
When I want to read from the device, I send a 32-bit address, a 16-bit read length, and an 8-bit checksum.
When I want to write to the device, I send a 16-bit data length, the data, followed by an 8-bit checksum.
In Hex, the data that is sent to the device is the following:
AA0001110200060013F81800104D
AA 00 01 11 02 0006 0013F818 0010 4D
(spaced for ease of reading)
AA000111020006 is the protocol header, where:
AA is the Protocol Byte
00 is the Source ID
01 is the Dest ID
11 is the Message Type
02 is the Command Byte
0006 is the Length Byte(s)
The remainder of the string is broken down as follows:
0013F818 is the 32-bit address
0010 is the 16 bit read length
4D is the 8-bit checksum
If the string is not correct, or the checksum is invalid the device replies back with an error string. However, I am not getting an error. The device replies back with the following hex string:
AA0100120200100001000000000100000000000001000029
AA 01 00 12 02 0010 00010000000001000000000000010000 29
(spaced for ease of reading)
Again, the first part of the string (AA00011102) is a part of the protocol header, where:
AA is the Protocol Byte
01 is the Source ID
00 is the Dest ID
12 is the Message Type
02 is the Command Byte
The difference between what is sent to the device, and what the device replies back with is that the length bytes is not a "static" part of the protocol header, and will change based of the request. The remainder of the string is broken down as follows:
0010 is the Length Byte(s)
00010000000001000000000000010000 is the data
29 is the 8-bit Check Sum
The goal is to read a timer that is stored in the NVM. The timer is stored in the upper halves of 60 4-byte NVM words.
The instructions specify that I need to read the first two bytes of each word, and then sum the results.
Verbatim, the instructions say:
Read the NVM elapsed timer. The timer is stored in the upper halves of 60 4-byte words.
Read the first two bytes of each word of the timer. Read the 16 bit values of these locations:
13F800H, 13F804H, 13808H, and continue to 13F8ECH.
Sum the results. Multiply the sum by 409.6 seconds, then divide by 3600 to get the results in hours.
My knowledge of bits, and bytes, and all other things is a bit cloudy. The first thing I need to confirm is that I am understanding the read protocol correctly.
I am assuming that when I specify 0010 as the 16 bit read length, that translates to the 16-bit values that the instructions want me to read.
The second thing I need to understand a little better is that when it tells me to read the first two bytes of each word, what exactly constitutes the first two bytes of each word?
I think what confuses me a little more is that the instructions say the timer is stored in the upper half of the 4 byte word (which to me seems like the first half).
I've sat with another colleague of mine for a day trying to figure out how to make this all work, and we haven't had any consistent results with our trials.
I have looked on the internet to find something that would explain this better in the context being used.
Another worry is that the technical data I am using to accomplish this project isn't 100% accurate in their instructions, and they have conflicting information or skipping information throughout their publication (which is probably close to 1000 pages long).
What I would really appreciate is someone who has a much better understanding of hex / binary to review the instructions I've posted, and provide some feedback on my interpretation of the instructions provided, and provide any information.

In-memory layout of array in Turbo Pascal

We have an old application in Turbo Pascal which can save its internal state into a file, and we need to be able to read/write this file in a C# application.
The old application generates the file by dumping various in-memory data structures. In one place, the application just dumps a range of memory, and this memory range contains some arrays. I am trying to noodle out the purpose of the bytes immediately preceding the actual array elements. In particular, the first two items in the block can be represented as:
type
string2 = string[2];
stringarr2 = array[0..64] of string2;
string4 = string[4];
stringarr4 = array[0..64] of string4;
In the data file, I see the following byte sequence:
25 00 02 02 41 42 02 43 44 ...
The 25 is the number of elements in the array. The 02 41 42 is the first string element, "AB"; the 02 43 44 is the second string element, "CD", and so on. I don't know what the 00 02 between the array element count and the first array element refers to. It's possible the array element count is 25 00 and the element size is 02, but each array element is actually 3 bytes in size.
In the place in the file where the array of 4-character strings starts, I see the following:
25 00 04 00 00 04 41 42 43 44 04 45 46 47 48
Again, there's the 25 which is the number of elements in the array; 04 41 42 43 44 is the first element in the array, "ABCD", and so on. In between there are the bytes 00 04 00 00. Maybe they are flags. Maybe they are some kind of indication of the shape of the array (but I don't see how 02 and 04 both indicate a one-dimensional array).
I don't have access to Turbo Pascal to try writing different kinds of arrays to a file, and don't have authorization to install something like Free Pascal, so my opportunities for experimentation along those lines are very limited.
These arrays are not dynamic, since Turbo Pascal didn't have them.
Thanks in advance for any dusty memories.
Pascal arrays have no bookkeeping data. You have an array of five-byte data structures (string[4]), so an array of 65 of them occupies 65*5=325 bytes. If the program wrote more than that, then it's because the program took special measures to write more. The "extra" values weren't just sitting in memory that the program happened to write to disk when it naively wrote the whole data structure with SizeOf. Thus, the only way to know what those bytes mean is to find the source code or the documentation. Merely knowing that it's Turbo Pascal is no help.
It's possible that the first section of the file is intentionally the same size as all the other array elements. For the two-character strings, the "header" is three bytes, and for the four-character strings, the "header" is five bytes, the same as the size of the strings. That would have allowed the program to use a file of string4 data type for the file, and then just skip the file's first record. The zero between the file length and the string length in the header might belong to either of those fields, and the remaining two zero bytes might just be filler.
Besides the layout of the individual strings of characters in the file, you will also need to consider what code page those single-byte characters are from. C# chars are unicode 2 byte chars.
If you're lucky, the original file data contains only ASCII 7 bit characters, which covers characters of the English alphabet. If the original data contains "European" letters such as umlauts or accented characters, these will be "high ascii" char values in the range 128..255. You'll need to perform an encoding conversion to see these characters correctly in C#. Code page 1252 Windows Latin 1 would be a good starting point.
If the original file data contains Japanese, Chinese, Korean, Thai, or characters from other "Eastern" scripts, you have a lot of work ahead of you.
Turbo Pascal strings are prefixed with a length byte. So a string[2] is actually 3 bytes: length, char1 and char2. An array of string[2] will hold all the strings one by one directly after each other in memory. If you do a blockwrite with the array as a parameter it will immediately start with the first string, it will not write any headers etc. So if you have the source you should be able to see what it writes before the array.

Help Me Understand This Binary File Format

I am attempting to write a small utility to produce a binary file that will mimic the one produced by another closed application. I've used hex editors to decrypt the format by I'm stuck trying to understand what the format/encoding is so that I can produce it using C++ or C#.
The file starts with the first four bytes: 01 00 followed by FF FE. My understanding is that the file begins with SOH followed by the byte order mark for little endian. After these four bytes, the program appears to write BSTR's for each of the string fields from the app's GUI.
Using C#, I have produced a unicode file that starts with FF FE, but I'm not sure how to insert the SOH character first.
I would be forever grateful if someone could offer insight to the file format or encoding and why the file starts with the SOH character.
Thank you in advance.
Reverse engineering a binary file format can be a challenging task. On the surface, I don't recognize this as an obvious, well-known file format ... but there are thousands out there, so who knows.
Legal issues aside, I would suggest you look at some of the following resources that talk about approaches to such an endeavor:
How To Crack a Binary File Format
Tools to Reverse Engineer Binary Files
Basics of Reverse Engineering File Formats
File Format Reverse Engineering
If you are just having trouble writing out the first four bytes this will do it for you.
using (var stream = new FileStream("myfile.bin", FileMode.Create))
{
using (var binaryWriter = new BinaryWriter(stream))
{
binaryWriter.Write((byte)1);
binaryWriter.Write((byte)0);
binaryWriter.Write((byte)0xFF);
binaryWriter.Write((byte)0xFE);
binaryWriter.Write(Encoding.Unicode.GetBytes("string"));
}
}
This will output the following file
01 00 FF FE 73 00 74 00 72 00 69 00 6e 00 67 00 ....s.t.r.i.n.g.
Edit: Added Mark H's suggestion for writing out a string.