Reading / Computing Hex received over RS232 - binary

I am using Docklight Scripting to put together a VBScript that communicates with a device via RS232. All the commands are sent in Hex.
When I want to read from the device, I send a 32-bit address, a 16-bit read length, and an 8-bit checksum.
When I want to write to the device, I send a 16-bit data length, the data, followed by an 8-bit checksum.
In Hex, the data that is sent to the device is the following:
AA0001110200060013F81800104D
AA 00 01 11 02 0006 0013F818 0010 4D
(spaced for ease of reading)
AA000111020006 is the protocol header, where:
AA is the Protocol Byte
00 is the Source ID
01 is the Dest ID
11 is the Message Type
02 is the Command Byte
0006 is the Length Byte(s)
The remainder of the string is broken down as follows:
0013F818 is the 32-bit address
0010 is the 16 bit read length
4D is the 8-bit checksum
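To make sure the framing is read correctly, here is the request rebuilt in a small Python sketch (Python just for illustration; the real script is Docklight VBScript). The checksum rule is purely a guess -- the low byte of the sum of every byte after the AA protocol byte -- and it happens to reproduce the 4D above, but it needs to be verified against the protocol document.

def checksum(frame_hex: str) -> int:
    # Assumed rule: low byte of the sum of every byte after the AA protocol byte
    payload = bytes.fromhex(frame_hex)[1:]   # drop the AA protocol byte
    return sum(payload) & 0xFF

# Read request: header + 32-bit address + 16-bit read length
frame = "AA" + "000111" + "02" + "0006" + "0013F818" + "0010"
print(f"{checksum(frame):02X}")   # 4D, matching the trailing byte above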
If the string is not correct or the checksum is invalid, the device replies with an error string. However, I am not getting an error. The device replies with the following hex string:
AA0100120200100001000000000100000000000001000029
AA 01 00 12 02 0010 00010000000001000000000000010000 29
(spaced for ease of reading)
Again, the first part of the string (AA01001202) is part of the protocol header, where:
AA is the Protocol Byte
01 is the Source ID
00 is the Dest ID
12 is the Message Type
02 is the Command Byte
The difference between what is sent to the device and what the device replies with is that the length bytes are not a "static" part of the protocol header; they change based on the request. The remainder of the string is broken down as follows:
0010 is the Length Byte(s)
00010000000001000000000000010000 is the data
29 is the 8-bit Check Sum
The goal is to read a timer that is stored in the NVM. The timer is stored in the upper halves of 60 4-byte NVM words.
The instructions specify that I need to read the first two bytes of each word, and then sum the results.
Verbatim, the instructions say:
Read the NVM elapsed timer. The timer is stored in the upper halves of 60 4-byte words.
Read the first two bytes of each word of the timer. Read the 16 bit values of these locations:
13F800H, 13F804H, 13F808H, and continue to 13F8ECH.
Sum the results. Multiply the sum by 409.6 seconds, then divide by 3600 to get the results in hours.
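To pin down the arithmetic, the instructions translate into the following Python sketch (Python only for illustration). Treating the two bytes as a big-endian 16-bit value is an assumption, based on how multi-byte fields appear in the protocol examples above:

addresses = [0x13F800 + 4 * i for i in range(60)]   # 13F800H, 13F804H, ..., 13F8ECH
assert addresses[-1] == 0x13F8EC

def upper_half(word_bytes):
    # "First two bytes" of a 4-byte word, read as one big-endian 16-bit value
    # (big-endian is an assumption here)
    return int.from_bytes(word_bytes[:2], "big")

def timer_hours(values):
    # values: the 60 16-bit numbers read from the addresses above
    return sum(values) * 409.6 / 3600   # ticks -> seconds -> hours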
My knowledge of bits and bytes and all other low-level things is a bit cloudy. The first thing I need to confirm is that I am understanding the read protocol correctly.
I am assuming that when I specify 0010 as the 16-bit read length, that translates to the 16-bit values that the instructions want me to read.
The second thing I need to understand a little better is this: when it tells me to read the first two bytes of each word, what exactly constitutes the first two bytes of each word?
What confuses me a little more is that the instructions say the timer is stored in the upper half of the 4-byte word (which to me seems like the first half).
I've sat with another colleague of mine for a day trying to figure out how to make this all work, and we haven't had any consistent results with our trials.
I have looked on the internet to find something that would explain this better in the context being used.
Another worry is that the technical documentation I am using for this project isn't 100% accurate: it has conflicting information, or skips information, throughout the publication (which is probably close to 1000 pages long).
What I would really appreciate is for someone with a much better understanding of hex and binary to review the instructions I've posted and provide feedback on my interpretation of them.

MySQL character set for numbers compression

I would like to store many numbers in 1 cell and save space. The idea is to encode each one to a string of a constant length and store them in a text field (presumably MEDIUMTEXT). What characters can be used so that they are 1 byte only? I assume that special characters are stored in a way that uses more than 1 byte. I can use e.g. base64 but I am not sure how many encoding characters I can add to the base before MySQL uses actually more space to store them than I manage to save.
You say "numbers". What do you mean, really?
Digits? See the notes below.
Integers? (no decimal point, no fraction)
Floats? (with exponent, etc)
Some notes on digits, compression, etc:
1 byte per ASCII character -- 8 bits
1 byte per digit, since each digit is an ASCII character
One random digit, at maximum theoretical compression, is about 3.32 bits, or 0.415 bytes (see the sketch after these notes). Visualize it this way: 1000 is 3 digits and 1024 is 10 bits.
MySQL's DECIMAL datatype puts 2 digits into one byte for smaller numbers; for larger numbers it stores 9 digits in 4 bytes.
If you zip up a million digits of pi, it will come very close to the above compression.
A simple Rule of Thumb is that "text" compresses 3:1.
Base64 expands bytes by a factor of 8/6, because each base64 character carries only 6 bits (64 = 2^6 different characters).
Base64 is more useful for avoiding special characters; it is not really a compression technique.
A 4-byte MySQL INT (range of -2 billion to +2 billion, but usually just positive and not evenly distributed) holds 9-10 digits; converted to base64 it would take more than 5 bytes (4 × 8/6 ≈ 5.3).
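A quick way to see that bound in practice is to compress a long string of random digits (a Python sketch, with zlib standing in for "zip"); the result lands near the theoretical 0.415 bytes per digit from the notes above:

import random, zlib

digits = "".join(random.choice("0123456789") for _ in range(1_000_000))
packed = zlib.compress(digits.encode(), 9)
print(len(packed) / len(digits))   # roughly 0.42-0.43 bytes per digit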
General techniques
Client compression: For 123,2345,88,22, here is one way to handle it. In fact, I recommend this for virtually any text handling where compression is desired in MySQL.
use compress() (or similar function) in your client.
use BLOB (up to 64KB) or MEDIUMBLOB (up to 16MB) in the table
use uncompress() after retrieving the blob.
For an array of numbers, use json_encode for the array, then feed it into compress+blob, above. It will work for any sized "numbers" and provide nearly maximal compression (see the sketch below).
You cannot efficiently reach into a MEDIUMTEXT or BLOB to get one number out of an array. The entire cell will be fetched.
That leads to another general statement: if you have a lot of stuff that you don't need to sort on or fetch individually, JSON is a good approach. Think of it, from MySQL's point of view, as an opaque blob; the application writes and rereads it as one big thing, then picks it apart.
The JSON will possibly encode the above example as ["123","2345","88","22"], which will be slightly fatter after compression. But, any good compression algorithm will notice and take advantage of the repetition.
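Here is the compress-then-BLOB round trip sketched client-side in Python (zlib is the same deflate family that MySQL's COMPRESS() uses). For four small numbers the compression header outweighs the savings; the 3:1 rule of thumb kicks in with hundreds of values:

import json, zlib

nums = [123, 2345, 88, 22]
blob = zlib.compress(json.dumps(nums).encode())   # store this in a BLOB/MEDIUMBLOB
restored = json.loads(zlib.decompress(blob))      # after fetching the whole cell
assert restored == nums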
Take advantage of the data
17,22738 48,77795 300
17,22792 48,77795 297
17,22853 48,77764 294
17,22874 48,77743 297
17,22887 48,77704 300
17,22968 48,77671 305
17,23069 48,77563 296
17,23092 48,77561 292
-->
17,22738 48,77795 300
54 0 -3
61 -31 -3
21 -21 3
13 -39 3
81 -33 5
101 -108 -9
23 -2 -4
The numbers stay relatively constant. Take advantage of that by starting with raw data, then switching to deltas. Try it with about 10 times as much data; I suspect you will continue to get better than 2x compression before zipping, but maybe slightly less than 2x after zipping. (Zipping can take advantage of the repetition of 48,777; I am taking more advantage of it by tossing most of it.)
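Here is the raw-versus-delta comparison sketched in Python; the scale factor of 100000 assumes five decimal places, as in the sample rows above:

import json, zlib

rows = [(17.22738, 48.77795, 300), (17.22792, 48.77795, 297),
        (17.22853, 48.77764, 294), (17.22874, 48.77743, 297)]

def to_deltas(rows, scale=100_000):
    ints = [(round(x * scale), round(y * scale), z) for x, y, z in rows]
    return [ints[0]] + [tuple(c - p for p, c in zip(a, b))
                        for a, b in zip(ints, ints[1:])]

raw = zlib.compress(json.dumps(rows).encode())
slim = zlib.compress(json.dumps(to_deltas(rows)).encode())
print(len(raw), len(slim))   # with a realistic amount of data, deltas win clearly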

Deflate length of 258 double encoding

In the Deflate algorithm there are two ways to encode a length of 258:
Code 284 + 5 extra bits of all 1's
Code 285 + 0 extra bits
At first glance this is not optimal, because the proper use of code 285 would allow a length of 259 to be encoded.
Is this duality a specification mistake, left unfixed for compatibility reasons, or are there arguments for it? For example, must a length of 258 be encoded with the shorter code (0 extra bits) for some reason?
We may never know. The developer of the deflate format, Phil Katz, passed away many years ago at a young age.
My theory is that a match length was limited to 258 so that a match length in the range 3..258 could fit in a byte, encoded as 0..255. This format was developed around 1990, when this might make a difference in an assembler implementation.
Adding a second answer here to underscore Mark's guess that allowing the length to be encoded in a byte is helpful to assembler implementations. At the time 8086 level assembler was still common and using the 8 bit form of registers gave you more of them to work with than using them in 16 bit size.
The benefit is even more pronounced on 8 bit processors such as the 6502. It starts with the length decoding. Symbols 257 .. 264 represent a match length of 3 .. 10 respectively. If you take the low byte of those symbols (1 .. 8) you get exactly 2 less than the match length.
A more complicated yet fairly easy to compute formula gives 2 less than the match length for symbols 265 through 284. 2 less than the match length of symbol 285 is 256. That doesn't fit in a byte, but we can store 0, which turns out to be equivalent.
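The "fairly easy to compute formula" can be written out like this (a Python sketch following the RFC 1951 length-code table; it computes the base length for each symbol, before any extra bits are added):

def length_minus_2(symbol):
    # Base match length minus 2 for deflate length symbols 257..285 (RFC 1951)
    if 257 <= symbol <= 264:                  # lengths 3..10, 0 extra bits
        length = symbol - 254
    elif 265 <= symbol <= 284:                # 1..5 extra bits
        extra = (symbol - 265) // 4 + 1
        length = 3 + (1 << (extra + 2)) + (((symbol - 265) % 4) << extra)
    else:                                     # symbol 285
        length = 258
    return (length - 2) & 0xFF                # 285 -> 0, which stands for 256

assert length_minus_2(257) == 1 and length_minus_2(264) == 8
assert length_minus_2(285) == 0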
zlib6502 uses this for considerable advantage. It calculates the match length in inflateCodes_lengthMinus2. And once the back pointer into the window has been determined it copies the data like so:
        jsr copyByte
        jsr copyByte
inflateCodes_copyByte:
        jsr copyByte
        dec inflateCodes_lengthMinus2
        bne inflateCodes_copyByte
It makes two explicit calls to copy a byte and then loops over the length less 2, which works as you would expect for lengths 1 to 255. For length 0 it will actually iterate 256 times, as we desire: the first time through the loop, the length of 0 is decremented to 255, which is non-zero, so the loop continues 255 more times for a total of 256.
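The wraparound is easy to confirm by simulating an 8-bit dec/bne loop (a Python sketch):

def copy_count(length_minus_2):
    # How many times the dec/bne loop body runs with an 8-bit counter
    counter, copies = length_minus_2, 0
    while True:
        copies += 1
        counter = (counter - 1) & 0xFF        # 8-bit dec wraps 0 -> 255
        if counter == 0:
            break
    return copies

assert copy_count(1) == 1 and copy_count(255) == 255
assert copy_count(0) == 256                   # the length-258 case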
I'd have to think that Phil Katz understood intuitively if not explicitly the benefits of keeping the length of matches within 8 bits.

Impossible to encode 32-bit binary opcode in machine instruction

I have been trying to format binary opcodes for the Motorola 68000, but I keep finding that it's not possible to encode the destination memory address, the instruction designation with its addressing mode/size, and the data value to be copied to the address bus for memory-mapped I/O all at once.
For the Sega Genesis' Video Display Processor I am attempting to write to the control port which is memory mapped at C00004 in the Genesis' memory map.
C00004 is 1100 0000 0000 0000 0000 0100 in binary, i.e. three bytes. The value I'm writing is 87, which the VDP recognizes as 8787 in VDP register #7. The issue I'm having is figuring out how to encode all the required information, e.g. the instruction prefix designation move.b, the value 87 (#$87), and the destination memory address C00004 for MMIO re-routing to the correct VDP port on the way to the VDP.
Altogether it looks like this:
move.b #$87, $00C00004
which loosely translates into not four bytes, but four bytes and a nibble (36 bits, to be exact!):
0001 1000 0111 1100 0000 0000 0000 0000 0100
Since the Motorola 68000 will only parse 32 bits when working down to microcode, how is it possible to encode the required information if there's not enough space (and within the same instruction)?
Perhaps I'm understanding this incorrectly?
I know this is beyond the level most programmers would anticipate, but I'm hoping someone around can break this down for me and explain how this encoding scheme would work.
Your instruction, move.b #$87, $00C00004 ought to encode to
0001001111111100 0000000010000111 00000000110000000000000000000100
The first 16-bit word can be broken down thusly:
The first four bits (0001) say that this is a byte-size move.
The next six bits (001 111, register field then mode field) say that the destination addressing mode is an absolute 32-bit address.
The last six bits (111 100, mode field then register field) say that the source operand is immediate.
After that follow instruction extension words with the operands: for MOVE, the source's extension words come first. The first word holds the immediate byte in its low half, and the last 32 bits are the address.
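To make the layout concrete, here is the full 8-byte encoding packed in Python (struct used only as a convenient big-endian packer):

import struct

words = (
    0x13FC,          # 0001 001111 111100: byte move, dest = (xxx).L, src = immediate
    0x0087,          # immediate byte #$87 in the low half of a full extension word
    0x00C0, 0x0004,  # the 32-bit absolute address $00C00004, high word first
)
print(struct.pack(">4H", *words).hex())   # 13fc008700c00004  (68000 is big-endian)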
For more information, see http://www.freescale.com/files/archives/doc/ref_manual/M68000PRM.pdf

In-memory layout of array in Turbo Pascal

We have an old application in Turbo Pascal which can save its internal state into a file, and we need to be able to read/write this file in a C# application.
The old application generates the file by dumping various in-memory data structures. In one place, the application just dumps a range of memory, and this memory range contains some arrays. I am trying to noodle out the purpose of the bytes immediately preceding the actual array elements. In particular, the first two items in the block can be represented as:
type
  string2 = string[2];
  stringarr2 = array[0..64] of string2;
  string4 = string[4];
  stringarr4 = array[0..64] of string4;
In the data file, I see the following byte sequence:
25 00 02 02 41 42 02 43 44 ...
The 25 is the number of elements in the array. The 02 41 42 is the first string element, "AB"; the 02 43 44 is the second string element, "CD", and so on. I don't know what the 00 02 between the array element count and the first array element refers to. It's possible the array element count is 25 00 and the element size is 02, but each array element is actually 3 bytes in size.
In the place in the file where the array of 4-character strings starts, I see the following:
25 00 04 00 00 04 41 42 43 44 04 45 46 47 48
Again, there's the 25 which is the number of elements in the array; 04 41 42 43 44 is the first element in the array, "ABCD", and so on. In between there are the bytes 00 04 00 00. Maybe they are flags. Maybe they are some kind of indication of the shape of the array (but I don't see how 02 and 04 both indicate a one-dimensional array).
I don't have access to Turbo Pascal to try writing different kinds of arrays to a file, and don't have authorization to install something like Free Pascal, so my opportunities for experimentation along those lines are very limited.
These arrays are not dynamic, since Turbo Pascal didn't have them.
Thanks in advance for any dusty memories.
Pascal arrays have no bookkeeping data. You have an array of five-byte data structures (string[4]), so an array of 65 of them occupies 65*5=325 bytes. If the program wrote more than that, then it's because the program took special measures to write more. The "extra" values weren't just sitting in memory that the program happened to write to disk when it naively wrote the whole data structure with SizeOf. Thus, the only way to know what those bytes mean is to find the source code or the documentation. Merely knowing that it's Turbo Pascal is no help.
It's possible that the first section of the file is intentionally the same size as all the other array elements. For the two-character strings, the "header" is three bytes, and for the four-character strings, the "header" is five bytes, the same as the size of the strings. That would have allowed the program to use a file of string4 data type for the file, and then just skip the file's first record. The zero between the file length and the string length in the header might belong to either of those fields, and the remaining two zero bytes might just be filler.
Besides the layout of the individual strings of characters in the file, you will also need to consider what code page those single-byte characters are from. C# chars are 2-byte Unicode (UTF-16) code units.
If you're lucky, the original file data contains only ASCII 7 bit characters, which covers characters of the English alphabet. If the original data contains "European" letters such as umlauts or accented characters, these will be "high ascii" char values in the range 128..255. You'll need to perform an encoding conversion to see these characters correctly in C#. Code page 1252 Windows Latin 1 would be a good starting point.
If the original file data contains Japanese, Chinese, Korean, Thai, or characters from other "Eastern" scripts, you have a lot of work ahead of you.
Turbo Pascal strings are prefixed with a length byte, so a string[2] is actually 3 bytes: length, char1 and char2. An array of string[2] will hold all the strings one after another directly in memory. If you do a BlockWrite with the array as a parameter, it will start immediately with the first string; it will not write any headers, etc. So if you have the source, you should be able to see what it writes before the array.
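Putting the guesses above together: if the header really is the same size as one record, the parsing logic looks like this (a Python sketch for brevity -- the same steps port directly to C#; the record-sized header and code page 1252 are assumptions taken from the answers above):

def parse_array(buf, max_len):
    # Assumed layout: a header the size of one record (16-bit little-endian
    # element count plus filler), then records of 1 length byte + max_len chars
    rec = 1 + max_len                         # string[N] occupies N+1 bytes
    count = int.from_bytes(buf[0:2], "little")
    items, pos = [], rec                      # skip the record-sized header
    for _ in range(count):
        if pos + rec > len(buf):
            break
        n = buf[pos]
        items.append(buf[pos + 1 : pos + 1 + n].decode("cp1252"))
        pos += rec
    return items

print(parse_array(bytes.fromhex("250002024142024344"), 2))   # ['AB', 'CD']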

Creating "holes" in a binary H.264 bitstream

I am trying to simulate data loss in a video by selectively removing H.264 bitstream data. The data is simply a raw H.264 file, which is essentially a binary file. My plan is to delete 2 bytes for every 100 bytes so as to achieve a 2% loss. Eventually, I will be testing the effectiveness of some motion vector error concealment algorithms.
It would be nice to be able to do this in a Unix environment. So far, I have investigated the command xxd for a bit and I am able to save a specific portion of a hex dump from a binary file. For example, to skip the first 50 bytes of a binary bitstream and save the subsequent 100 bytes, I would do the following:
xxd -s 50 -l 100 inputBinaryFile | xxd -r > outputBinaryFile
I'm hoping to incorporate something similar into a bash script that will automatically delete the last 2 bytes per 100 bytes. Furthermore, I would like the script to skip everything before the second occurrence of the sequence 00 00 01 06 05 (first P-frame SEI start code).
I don't know how much easier this could be in a C-based language, but my programming skills are quite limited, and I would rather deal with just Linux programming for now if possible.
Thanks.
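Not bash, but here is the whole pipeline as a short Python sketch, which may be easier to reason about than xxd round trips. It leaves everything up to and including the second 00 00 01 06 05 sequence intact, then keeps 98 and drops 2 of every 100 bytes after that; whether the protected prefix should be copied to the output or skipped entirely is an assumption to adjust:

import sys

SEI = bytes.fromhex("0000010605")             # the SEI start-code sequence

def punch_holes(data: bytes) -> bytes:
    first = data.find(SEI)
    second = data.find(SEI, first + 1) if first != -1 else -1
    start = second + len(SEI) if second != -1 else 0
    out = bytearray(data[:start])             # protected prefix, kept intact
    tail = data[start:]
    for i in range(0, len(tail), 100):
        out += tail[i : i + 98]               # keep 98 bytes, drop the last 2
    return bytes(out)

if __name__ == "__main__":                    # usage: python holes.py in.264 out.264
    with open(sys.argv[1], "rb") as fin, open(sys.argv[2], "wb") as fout:
        fout.write(punch_holes(fin.read()))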