Currently trying to essentially reverse engineer a file format that is produced by a CNC machine when backing up programs on the machine so that i can read the programs on a standard PC. Have opened a few of the backup files created and can clearly see patterns of data such as the program name etc. which can be clearly seen in plaintext form. One thing i am struggling with is how numbers are represented in this.
for Example: the number '20' is represented in this file in hex form as '40 0D 03 00'.
More examples:
"-213.6287": "21 67 DF FF"
"-500.3366": "9A A7 B3 FF"
Any help with trying to figure out how these hex values make up those numbers?
Thanks
These numbers are stored as little-endian signed integers, as a count of ten-thousandths.
for Example: the number '20' is represented in this file in hex form as '40 0D 03 00'.
0x00030d40 = 200000.
"-213.6287": "21 67 DF FF"
0xffdf6721 = -2136287.
"-500.3366": "9A A7 B3 FF"
0xffb3a79a = -5003366.
Related
신옠안
Above is the html, below is the code. Is this a name? What does it mean?
신옠안
You have a double Mojibake, data mangled by using incorrect codecs.
It's actually Korean, a name:
신영 안
or, if using HTML entities, this should have been encoded to
신영 안
It translates to English as Shin-Young An.
When encoded to UTF-8, and grouped per input code-point then displayed using hex digits you would get this:
ec 8b a0
ec 98 81
20
ec 95 88
To produce the output you have, someone must have:
Decoded the above UTF-8 data using Windows codepage 1252, producing
ì‹<A0>ì˜<81> 안
(where <A0> is the non-breaking space character, and <81> is an invalid CP1252 byte, but this is often ignored in many decoders; I've included them in this notation because they'd not otherwise be printable)
Encoded the resulting mess to UTF-8 again to give you the following byte values:
c3 ac e2 80 b9 c2 a0
c3 ac cb 9c c2 81
20
c3 ac e2 80 a2 cb 86
(the grouping matches the correct UTF-8, above)
Decoded those UTF-8 bytes a second time using the same Windows CP1252 codec, this time producing:
ì‹Â<A0>ì˜Â<81> 안
(with the same note on the <A0> and <81> characters)
Finally encoded the resulting characters to HTML entities:
신옠안
If you have Python installed, then the ftfy library can 'repair' text like this in a single step:
>>> import ftfy
>>> sample = '신옠안'
>>> ftfy.ftfy(sample)
'신영 안'
I used that library to tell me what codecs were used, as well as use it's sloppy CP1252 decoder to produce the decodes above.
E.g. for your input I used:
>>> ftfy.fixes.fix_encoding_and_explain(ftfy.fixes.unescape_html(sample))
('신영 안', [('encode', 'sloppy-windows-1252', 0), ('decode', 'utf-8', 0), ('encode', 'sloppy-windows-1252', 0), ('decode', 'utf-8', 0)])
to see the repair plan, and reversed it to explain how the Mojibake was produced in the first place.
I'm searching in web for magic numbers for Paradox DB files (.db), but there are no results.
What binary signature for these files?
I hate to answer this question because identifying the type of a file by its contents is inherently unreliable -- a binary signature tells you only that the file MAY be of a certain type, but there's no guarantee that another file won't happen to contain that string.
Also, binary signature recognition of *.DB files is weak because almost all the bytes carry highly variable information, and most of the constant bytes are zero:
byte constant
offset value
-- --
2 00
3 08
E 00
F 01
14 00
2A 00
2B 00
2C 00
(all values in this answer are given in hexidecimal)
There's probably lots of non *.DB files that will accidentally match this pattern.
A couple more bytes are easily interpreted and provide additional information:
byte offset 4:
00 = indexed .DB data file
01 = primary index .PX file
02 = non-indexed .DB data file
03 = non-incrementing secondary index .Xnn file
04 = secondary index .Ynn file (inc or non-inc)
05 = incrementing secondary index .Xnn file
06 = non-incrementing secondary index .XGn file
07 = secondary index .YGn file (inc or non inc)
08 = incrementing secondary index .XGn file
byte offset 39:
03 version 3.0
04 version 3.5
05..09 version 4.x
0A..0B version 5.x
0C version 7.x
Borland never published the internal format of its data base files. I think that to reliably identify a *.DB file by its contents you would have to try to open it with Borland's data base engine.
-Al.
Have tried some of the online references as wells as unix time form at etc. but none of these seem to work. See the examples below.
running Mysql 5.5.5 in ubuntu. innodb engine.
nothing is custom. This is using a built in datetime function.
Here are some examples with the 6 byte hex string and the decoded message below. We are looking for the decoding algorithm. i.e.how to turn the 6 byte hex string into the correct date/time. The algorithm must work correctly on the examples below. The right most byte seems to indicate difference in seconds correctly for small small differences in time between records. i.e. we show an example with 14 sec difference.
full records,nicely highlighted and formated word doc here.
https://www.dropbox.com/s/zsqy9o2rw1h0e09/mysql%20datetime%20examples%20.docx?dl=0
link to formatted word document with the examples.
contact frank%simrex.com re. reward.
replace % with #
hex strings and decoded date/time pairs are below.
pulled from healthy file running mysql
12 51 72 78 B9 46 ... 2014-10-22 16:53:18
12 51 72 78 B9 54 ... 2014-10-22 16:53:32
12 51 72 78 BA 13 ... 2014-10-22 16:55:23
12 51 72 78 CC 27 ... 2014-10-22 17:01:51
here you go.
select str_to_date(conv(replace('12 51 72 78 CC 27',' ', ''), 16, 10), '%Y%m%d%H%i%s')
We have an old application in Turbo Pascal which can save its internal state into a file, and we need to be able to read/write this file in a C# application.
The old application generates the file by dumping various in-memory data structures. In one place, the application just dumps a range of memory, and this memory range contains some arrays. I am trying to noodle out the purpose of the bytes immediately preceding the actual array elements. In particular, the first two items in the block can be represented as:
type
string2 = string[2];
stringarr2 = array[0..64] of string2;
string4 = string[4];
stringarr4 = array[0..64] of string4;
In the data file, I see the following byte sequence:
25 00 02 02 41 42 02 43 44 ...
The 25 is the number of elements in the array. The 02 41 42 is the first string element, "AB"; the 02 43 44 is the second string element, "CD", and so on. I don't know what the 00 02 between the array element count and the first array element refers to. It's possible the array element count is 25 00 and the element size is 02, but each array element is actually 3 bytes in size.
In the place in the file where the array of 4-character strings starts, I see the following:
25 00 04 00 00 04 41 42 43 44 04 45 46 47 48
Again, there's the 25 which is the number of elements in the array; 04 41 42 43 44 is the first element in the array, "ABCD", and so on. In between there are the bytes 00 04 00 00. Maybe they are flags. Maybe they are some kind of indication of the shape of the array (but I don't see how 02 and 04 both indicate a one-dimensional array).
I don't have access to Turbo Pascal to try writing different kinds of arrays to a file, and don't have authorization to install something like Free Pascal, so my opportunities for experimentation along those lines are very limited.
These arrays are not dynamic, since Turbo Pascal didn't have them.
Thanks in advance for any dusty memories.
Pascal arrays have no bookkeeping data. You have an array of five-byte data structures (string[4]), so an array of 65 of them occupies 65*5=325 bytes. If the program wrote more than that, then it's because the program took special measures to write more. The "extra" values weren't just sitting in memory that the program happened to write to disk when it naively wrote the whole data structure with SizeOf. Thus, the only way to know what those bytes mean is to find the source code or the documentation. Merely knowing that it's Turbo Pascal is no help.
It's possible that the first section of the file is intentionally the same size as all the other array elements. For the two-character strings, the "header" is three bytes, and for the four-character strings, the "header" is five bytes, the same as the size of the strings. That would have allowed the program to use a file of string4 data type for the file, and then just skip the file's first record. The zero between the file length and the string length in the header might belong to either of those fields, and the remaining two zero bytes might just be filler.
Besides the layout of the individual strings of characters in the file, you will also need to consider what code page those single-byte characters are from. C# chars are unicode 2 byte chars.
If you're lucky, the original file data contains only ASCII 7 bit characters, which covers characters of the English alphabet. If the original data contains "European" letters such as umlauts or accented characters, these will be "high ascii" char values in the range 128..255. You'll need to perform an encoding conversion to see these characters correctly in C#. Code page 1252 Windows Latin 1 would be a good starting point.
If the original file data contains Japanese, Chinese, Korean, Thai, or characters from other "Eastern" scripts, you have a lot of work ahead of you.
Turbo Pascal strings are prefixed with a length byte. So a string[2] is actually 3 bytes: length, char1 and char2. An array of string[2] will hold all the strings one by one directly after each other in memory. If you do a blockwrite with the array as a parameter it will immediately start with the first string, it will not write any headers etc. So if you have the source you should be able to see what it writes before the array.
I am attempting to write a small utility to produce a binary file that will mimic the one produced by another closed application. I've used hex editors to decrypt the format by I'm stuck trying to understand what the format/encoding is so that I can produce it using C++ or C#.
The file starts with the first four bytes: 01 00 followed by FF FE. My understanding is that the file begins with SOH followed by the byte order mark for little endian. After these four bytes, the program appears to write BSTR's for each of the string fields from the app's GUI.
Using C#, I have produced a unicode file that starts with FF FE, but I'm not sure how to insert the SOH character first.
I would be forever grateful if someone could offer insight to the file format or encoding and why the file starts with the SOH character.
Thank you in advance.
Reverse engineering a binary file format can be a challenging task. On the surface, I don't recognize this as an obvious, well-known file format ... but there are thousands out there, so who knows.
Legal issues aside, I would suggest you look at some of the following resources that talk about approaches to such an endeavor:
How To Crack a Binary File Format
Tools to Reverse Engineer Binary Files
Basics of Reverse Engineering File Formats
File Format Reverse Engineering
If you are just having trouble writing out the first four bytes this will do it for you.
using (var stream = new FileStream("myfile.bin", FileMode.Create))
{
using (var binaryWriter = new BinaryWriter(stream))
{
binaryWriter.Write((byte)1);
binaryWriter.Write((byte)0);
binaryWriter.Write((byte)0xFF);
binaryWriter.Write((byte)0xFE);
binaryWriter.Write(Encoding.Unicode.GetBytes("string"));
}
}
This will output the following file
01 00 FF FE 73 00 74 00 72 00 69 00 6e 00 67 00 ....s.t.r.i.n.g.
Edit: Added Mark H's suggestion for writing out a string.