I am doing a CTF challenge. I open a broken BMP image file with a hex editor (HexFiend). I highlight 4 bytes in hex 8E262C00. In the bottom, HexFiend shows their value in decimal 2893454. However, I use online hex to decimal converting tool, their value is 2384866304.
Do anyone know how HexFiend comes up with 2893454?. I believe it is a correct answer, because that is the size of the file.
It's the endianness of the file.
A binary encoded file can be encoded with small or big endian. The difference is which succession the single bytes have, i.e. if you read them from left or from right. Note that the order of bits almost always is big endian. The natural way of reading is big ending; the bytes are stores as you would expect it. 8E262C00 becomes 8E 26 2C 00. This file, however, seems to be stored in small endian format. The order is flipped. In other words; 8E262C00 now becomes 00 2C 26 8E which then results in the decimal representation of 2893454
I think it's the Big Endian and Little Endian things.
You should check out this online converting tool, BMP file format is the Little Endian, but i think the tool maybe convert it by Big Endian method.
try it: https://www.scadacore.com/tools/programming-calculators/online-hex-converter/
Related
So, I have been assigned a task to read and save values of C-struct that was stored within tifftag of tiff image as byte buffer. This might be quite simple, but I am quite new to this realm of programming.
I know the exact positions I need to read bytes from. When I use python tiff tag readers, I get these weird values of bytes, that I could not make sense of. I was expecting it to be \xb5\x00\x00\x00\x01
format, not something strange like \n\xd7#=\n\xd7#=K.
Here is the snippet of weird buffer values
However, in utility app AsTiffViewer, those are perfectly fine as shown
here.
How do I decode this? What does this all mean?
\n\xd7#=\n\xd7#=K (0A D7 23 3D 0A D7 23 3D - as per AsTiffViewer)
By the way, these 0A D7 23 3D & 0A D7 23 3D are supposed to be two float value, each of them 4 bytes.
I was expecting tiff tag byte buffer to be in format of\xb5\x00\x00\x00\x01 etc, However, it spit out some weird format - \n\xd7#=\n\xd7#=K. I don't know how to decode or read this.
So, after mucking around a bit, I found out that, \n\xd7#=\n\xd7#=K, this is nothing but how python represents float in binary string.
I would like to store many numbers in 1 cell and save space. The idea is to encode each one to a string of a constant length and store them in a text field (presumably MEDIUMTEXT). What characters can be used so that they are 1 byte only? I assume that special characters are stored in a way that uses more than 1 byte. I can use e.g. base64 but I am not sure how many encoding characters I can add to the base before MySQL uses actually more space to store them than I manage to save.
You say "numbers". What do you mean, really?
Digits? See above.
Integers? (no decimal point, no fraction)
Floats? (with exponent, etc)
Some notes on digits, compression, etc:
1 byte per Ascii character -- 8 bits
1 byte per digit, since it is an ascii character
One random digit, at maximum theoretical compression, is about 0.332 bytes. Visualize it this way: 1000 is 3 digits and 1024 is 10 bits.
MySQL's DECIMAL datatype puts 2 digits into one byte for smaller numbers; for larger numbers it stores 9 digits in 4 bytes.
If you zip up a million digits of pi, it will come very close to the above compression.
A simple Rule of Thumb is that "text" compresses 3:1.
Base64 expands bytes by 8/6 because one 8-bit byte is represented by 64 (2^6) different characters.
Base64 is more useful for avoiding special characters; it is not really a compression technique.
A 4-byte MySQL INT (range of -2 billion to +2 billion, but usually just positive and not evenly distributed), when converted to base64 would take more than 5 bytes for 9-10 digits.
General techniques
Client compression: For 123,2345,88,22, here is one way to handle it. In fact, I recommend this for virtually any text handling where compression is desired in MySQL.
use compress() (or similar function) in your client.
use BLOB (up to 64KB) or MEDIUMBLOB (up to 16MB) in the table
use uncompress() after retrieving the blob.
For an array of numbers, use json_encode for the array, then feed into compress+blob, above. It will work for any sized "numbers" and provide nearly maximal compresssion.
You cannot efficiently reach into a MEDIUMTEXT or BLOB to get one number out of an array. The entire cell will be fetched.
That leads to another general statement... If you have a lot of stuff that you don't need to sort on, nor fetch individually, JSON is a good approach. Think of it as from MySQL's point of view an opaque blob. The application writes and rereads it as one big thing, then picks it apart.
The JSON will possibly encode the above example as ["123","2345","88","22"], which will be slightly fatter after compression. But, any good compression algorithm will notice and take advantage of the repetition.
Take advantage of the data
17,22738 48,77795 300
17,22792 48,77795 297
17,22853 48,77764 294
17,22874 48,77743 297
17,22887 48,77704 300
17,22968 48,77671 305
17,23069 48,77563 296
17,23092 48,77561 292
-->
17,22738 48,77795 300
54 0 -3
61 -31 -3
21 -21 3
13 -39 3
81 -33 5
1 -108 -9
23 -2 -4
The numbers stay relatively constant. Take advantage of it by starting with raw data, but then switching to deltas. Try it will about 10 times as much data; I suspect you will continue to get better than 2x compression before zipping, but maybe slightly less than 2x after zipping. (Zipping can take advantage of the repetition of 48,777; I am taking more advantage of it by tossing most of it.)
Question with regards to the use of hexadecimal. Here's a statement I have a question about:
Hexadecimal is often the preferred means of representing data because it uses fewer digits than binary.
For example, the word vim can be represented by:
Hexadecimal: 76 69 6D
Decimal: 118 105 109
Binary: 01110110 01101001
01101101
Obviously, hex is shorter than binary in this example, however, wont the hex values eventually be converted to binary at the machine level so the end result for hex-binary is exactly the same?
This is a good question.
Yes, the hexadecimal values will be converted in binaries at machine level.
But you are looking the question from the machine point of view.
Hexadecimal notation was introduced because:
It's more easy to read and memorize than binaries for human. For example if you are reading memory addresses, you can observe that they are actually written in hexadecimal, that is far more simple to read then binary.
It's easy to do calculation from binaries to hexadecimal than other base (like
our today-used base 10). For example, it's easy to group binary digits into hex in your head (4 bits per hex digit).
I suggest you this article that gives some easy example calculations to better understand hexadecimal advantages.
If I use a file comparison tool like fc in Windows, you can choose between ASCII and binary comparison.
What is the actual difference between these two comparisons? If I compare two ASCII files, don't I want the binary data of the files to be identical?
WARNING: this is 5 year old loose remembrance of knowledge from uni
Binary representation means you compare the binary exactly, and ascii is a comparison of data type. to put it in a simple case the char 'A' is a representation of 01000001, but that is also an 8 bit integer equal to '65', so that means A = 65 in binary. so if you were doing A + A as a string and 65 43 65 (43 is '+' in binary to decimal), in binary they would be equivalent, but in ascii they would not. This is a very loose explanation and i'm sure i missed a lot, but that should sum it up loosely.
In a text file you want ASCII because you write in ascii characters. In say, a program state saved to a file you want binary to get a direct comparison.
I need to analyse a binary data file containing raw data from a scientific instrument. A quick look in a hex viewer indicates that's probably no encryption or anything fancy: integers will probably be written as integers (but I don't know what byte order), and who knows about floating point.
I have access to a (closed source) program that can view the contents of the file. So I can see that a certain value is 74078. Actually searching for that value I'm not sure about - do I search for 00 01 21 5E, some other byte order, etc? (Hex Fiend doesn't support searching for decimal values) And how would I find a floating point number?
The software that produces these files runs on XP. I'd prefer tools that run on OSX if possible.
(Hmm, I wrote up this question, forgot to post it, then solved the problem. I guess I will write my own answer.)
In the end, Hex Fiend turned out to be just enough. What I was expecting to do:
Convert a known value into hex
Search for it
What I actually did:
Pick a random chunk of hex that looked like it might be a useful value
Tell Hex Fiend to display it as integer, or as float, in either little endian or big endian, until it gave a plausible looking result (ie, 45.000 is a lot more plausible than some huge integer)
Search for that result in the results I had from the closed source program.
Document it, go back to step 1. (Except that normally the next chunk wouldn't be 'random', but would follow sequentially.)
In this case there were really only three (binary) variables for how to interpret data:
float or integer
2 bytes or 4 bytes
little or big endian
With more variables the task would be a lot harder. It would have been nice if Hex Fiend could search for integers/floats directly, perhaps trying out the different combinations. Perhaps other hex viewers do.
And to answer one of my original questions, 74078 turned out to be stored as 5E2101. A bit more trial and error and I would have got there. :)
UPDATE
If I was doing this over, I'd use "Synalyze It!", a tool designed for exactly this purpose.