Represent bitmask values in a JSON object - json

What is the best way to represent bit-mask values in a JSON object?
For example, we want to know which ingredients a user wants in his fruit salad:
Orange = 0x01
Apple = 0x02
Banana = 0x04
Grapes = 0x08
How would one represent the selected options in a JSON object? Obviously we can use an integer value (e.g. 3 for Orange and Apple), but it is not very readable.
Is there a better way?

Researching a bit on this topic uncovered the following case study:
https://www.smartsheet.com/blog/smartsheet-api-formatting
It's not exactly the same problem, but it was good for extrapolating some solutions here:
Send a list of integers from a predefined lookup table, e.g. [1, 3] (a compromise between space and parsing)
Send the actual bit mask value (harder to parse, takes the least space)
Send a list of strings, e.g. ["Orange", "Banana"] (easy to read, takes the most space)
If space is not a constraint, I think the last option is the best.
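A minimal Python sketch of the list-of-strings option, assuming the four fruit flags from the question; the key name "ingredients" is hypothetical. It converts between the integer mask and the readable list, so a service can store the compact mask and still emit readable JSON.

```python
import json

# Flag values from the question
FRUITS = {"Orange": 0x01, "Apple": 0x02, "Banana": 0x04, "Grapes": 0x08}

def mask_to_names(mask):
    """Return the names of all flags set in the integer mask."""
    return [name for name, bit in FRUITS.items() if mask & bit]

def names_to_mask(names):
    """OR together the flags for the given names (raises KeyError on unknown names)."""
    mask = 0
    for name in names:
        mask |= FRUITS[name]
    return mask

payload = json.dumps({"ingredients": mask_to_names(0x01 | 0x04)})
```

Here payload is the JSON text {"ingredients": ["Orange", "Banana"]}, and names_to_mask reverses the conversion when the object is read back.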

Related

Encoder number of outputs for opcode within a MIPS machine instruction

If I have an encoder with 8 data inputs, what is its maximum number of outputs?
I know that an encoder is a combinational circuit that performs the reverse operation of a decoder. It has a maximum of 2^n input lines and ‘n’ output lines, so it encodes the information from 2^n inputs into an n-bit code. Since I have 8 data inputs, there will be 3 outputs, because 2^3 = 8. Is that the correct assumption?
Let's try to tease apart the concepts of one hot (decoded) lines and an encoding using a number of bits.  Both these concepts are a way to represent information, but their form and typical usage are different.
One hot is a technique wherein at most one line is 1/true and all the other lines are 0/false.  These one hot lines are not considered digits in a number, but rather individual signals or conditions (only one of which can be true at any given time).  This form is particularly useful in certain circuits, as each of the one hot lines can activate some other hardware.  (A hardware lookup table (LUT), a RAM or ROM may use one-hot within its internal array indexing.)
Encoding is a technique where we use N lines as digits in an N-bit number, as would be found in a CPU register holding a number, or as we might write normal binary numbers in text.  By contrast, in this form any of the N bits can be 1 (or 0).
Simple encoders & decoders translate between encoded form (N-bit numbers) and one hot form (2^N lines).
... encoder ... has a maximum of 2^n input lines and ‘n’ output lines
In your statement, the 2^n input lines are in one hot form, while the output lines are normal numbers in binary (i.e. encoded).
Both the inputs (2^n lines) and the outputs (n lines) are capable of representing exactly 2^n different values!  As a result, decode/encode is a 1:1 mapping, back & forth.  (It would be an error to have multiple hots on the input side of such a decoder, and bad things would happen in a system that allowed that.)
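To illustrate the 1:1 mapping, here is a small Python model of an 8-to-3 encoder and a 3-to-8 decoder. It is a behavioral sketch, not a gate-level circuit, and it assumes input line i stands for value i; it rejects inputs that do not have exactly one hot line, mirroring the error case described above.

```python
def encode(lines):
    """One hot input lines (index i = value i) -> integer n-bit code."""
    if sum(lines) != 1:
        raise ValueError("exactly one input line must be hot")
    return lines.index(1)

def decode(code, n):
    """n-bit code -> list of 2**n one hot output lines."""
    return [1 if i == code else 0 for i in range(2 ** n)]
```

For example, decode(5, 3) gives [0, 0, 0, 0, 0, 1, 0, 0], and encode of that list gives 5 back.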
In the formulas you're speaking to, 2^N = V and N = log2(V): N stands for the number of bits (a bit is a binary digit), and V stands for the number of values that can be represented in N bits.
(The 2s in these formulas are for binary; substitute 10 for 2 to get the same relationship between a number of decimal digits and the number of values those digits can represent, store, or communicate.)
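The relationship N = log2(V) can be checked with a short Python helper that finds the smallest digit count N with base^N >= V (that is, the ceiling of the logarithm when V is not an exact power). The function name is only for illustration.

```python
def digits_needed(values, base=2):
    """Smallest N such that base**N >= values."""
    n = 0
    while base ** n < values:
        n += 1
    return n
```

So 8 one hot lines correspond to 3 bits, a 6-bit opcode field covers 64 values, and 9 values would already need 4 bits.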
In one hot form we need V number of lines, whereas in encoded form we need N lines (as bits/digits) to represent the same information (one of V different values).
Consider whether a number you're looking at is a digit count (as with N) or a value count (as with V).
And bear in mind that in one hot form, we need one line for each possible value, V (whereas in encoded form we need N bits for V possible values).
A MIPS processor will feed the 6 bit opcode field into a lookup table of some sort, in order to determine which set of control signals to activate for any given instruction.  (The opcode field is not one hot, but rather a bit field of N=6 bits).
These control signals are (also) not one hot, and the MIPS instruction decoder is not using a simple decoder, but rather a mapper that goes between encoded opcode values and effectively encoded control signals — this mapping is accomplished by lookup in a table.
These control signals are individual boolean values rather than a set that is either one hot or an encoded number.  One hot may be used internally in the indexing of this mapping.  This mapping is basically an array lookup where the index is the opcode and each array element holds all the individual control signal values appropriate to its index.
(R-Type instructions all share a common opcode value, so when the R-Type opcode value is present, then additional lookup/mapping is done on the func bit field to generate the proper control signals.)
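The table lookup described above can be sketched in Python. The control values are the ones commonly given for the textbook single-cycle MIPS datapath, with don't-care entries set to 0; only four opcodes and five funct codes are included, and a dict stands in for the full 64-entry array.

```python
# Subset of the single-cycle MIPS main control table, indexed by the 6-bit opcode
CONTROL = {
    0x00: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1, MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),  # R-type
    0x23: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1, MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),  # lw
    0x2B: dict(RegDst=0, ALUSrc=1, MemtoReg=0, RegWrite=0, MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),  # sw
    0x04: dict(RegDst=0, ALUSrc=0, MemtoReg=0, RegWrite=0, MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),  # beq
}
# Secondary lookup on the 6-bit funct field for R-type instructions
FUNCT = {0x20: "add", 0x22: "sub", 0x24: "and", 0x25: "or", 0x2A: "slt"}

def control_signals(instruction):
    opcode = (instruction >> 26) & 0x3F   # bits 31..26
    signals = dict(CONTROL[opcode])       # each signal is an independent boolean/field
    if opcode == 0x00:
        signals["operation"] = FUNCT[instruction & 0x3F]  # bits 5..0
    return signals
```

For instance, 0x012A4020 (add $t0, $t1, $t2) takes the R-type path and resolves to "add", while 0x8D280004 (lw $t0, 4($t1)) asserts MemRead and MemtoReg.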

When JSON is sent over the network, how are numbers represented (as binary or text)?

This might be a trivial question... Or might not be. When I serialize an object to JSON how are numbers represented?
Specifically, I need to know how efficiently they are encoded to binary. There are 2 ways:
Transform number to its decimal string representation and then encode that string to binary.
Or encode the number directly to binary.
Which is the case?
That is a big difference. Let's say the serialized object contains the number 12345678. Encoded the first way, it takes 8 B to transfer; encoded the second way, only 4 B. With lots of big numbers (my case), in the first case I would be better off using base64 as a pre-processing step for serialization.
I can imagine that this might be dependent on serializer (though I really hope it is not). In that case, I am using Firebase Realtime database SDK.
JSON is a textual notation. So the number 12345678 is sent as those eight characters, 1, 2, 3, etc. Depending on your text encoding, that's probably eight bytes (e.g., UTF-8 or Windows-1252; but if you were using UTF-16, for instance, it would be 16 bytes).
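A quick way to see this with Python's standard json module: the serialized number is just text, so its size depends on the text encoding, while a raw 32-bit integer is always 4 bytes. UTF-16 is measured with the "-le" codec here so that no byte order mark is counted.

```python
import json
import struct

n = 12345678
text = json.dumps(n)                        # JSON is text: the characters "12345678"
utf8_size = len(text.encode("utf-8"))       # one byte per ASCII digit
utf16_size = len(text.encode("utf-16-le"))  # two bytes per digit, no BOM
binary_size = len(struct.pack("<i", n))     # the same value as a raw little-endian int32
```

This gives 8, 16 and 4 bytes respectively.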
There have been various "binary JSON" proposals over the years, but I don't think any of them really caught on outside of specific applications (for instance, BSON in MongoDB).

Can I get more explanations for BSON?

I am trying to understand BSON via http://bsonspec.org/#/specification, but still some questions remain.
Let's take an example from the website above:
{"hello": "world"} → "\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00"
Question 1
In the above example, for the encoded bytes results, the double quotes actually are not part of the results, right?
Question 2
I understand that the first 4 bytes, \x16\x00\x00\x00, are the size of the whole BSON doc.
And it is in little-endian format. But why? Why not big endian?
Question 3
How comes the size of the example doc being \x16, i.e. 22?
Question 4
Normally, if I want to encode the doc myself, how do I calculate its size? I think my main trouble is deciding the size of a UTF-8 string.
Let's take another example:
{"BSON": ["awesome", 5.05, 1986]}
→
"\x31\x00\x00\x00\x04BSON\x00\x26\x00\x00\x00\x020\x00\x08\x00\x00\x00awesome\x00\x011\x00\x33\x33\x33\x33\x33\x33\x14\x40\x102\x00\xc2\x07\x00\x00\x00\x00"
Question 5
In this example there is an array. According to the specification, an array is actually a list of {key, value} pairs where the keys are 0, 1, etc. My question is: are the 0, 1 here strings too?
Question 1
In the above example, for the encoded bytes results, the double quotes actually are not part of the results, right?
Right, the quotes are not part of the encoded bytes. In the JSON they mark strings, and around the encoded result they only delimit the escaped byte-string literal.
Question 2
And it is little endian format. But why? Why not take big endian?
Choice of endianness is largely a matter of preference. One advantage of little endian is that commonly used platforms are little endian, and thus don't need to reverse the bytes.
Question 3
How comes the size of the example doc being \x16, i.e. 22?
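There are 22 bytes, including the length prefix: 4 (document length) + 1 (type \x02) + 6 ("hello\x00") + 4 (string length) + 6 ("world\x00") + 1 (document terminator \x00) = 22.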
Question 4
Normally, if I want to encode the doc myself, how do I calculate its size? I think my main trouble is deciding the size of a UTF-8 string.
First write out the document, then go back and fill in the length. For a string, the length is the number of bytes after UTF-8 encoding (plus 1 for the trailing \x00), not the number of characters.
Question 5
In this example there is an array. According to the specification, an array is actually a list of {key, value} pairs where the keys are 0, 1, etc. My question is: are the 0, 1 here strings too?
Yes. To be exact, they are zero-terminated strings without a length prefix (called cstring in the specification). The array itself is encoded just like an embedded document.

240 bit radar word

I have a project using a 240-bit octal data format that will come into the serial port of an Arduino Uno at 2.4K baud over RS-232 converted to TTL.
Along with other things, the 240 bits contain range, azimuth and elevation words, which are what I need to display.
The frame starts with a frame sync code, which is an alternating 7-bit binary code:
1110010 for frame 1 and
0001101 for frame 2 and so on.
I was thinking that I might use something like a val = Serial.read() command:
if (val == 0b1110010 || val == 0b0001101) { data += val; }
which would let me validate the start of my string.
The rest of the 240-bit octal frame (all numbers) can be read from serial into a string, of which only parts need to be printed to the screen.
Past the frame sync, all octal data is serial with no nulls or delimiters, so I am thinking
printf("%.*s", len, &stringname[xx]);
will let me offset the characters as needed so they can be parsed out.
How do I tell the program that the frame sync its looking for is binary or that the data that needs to go into the string is octal, or that it may need to be converted to be read on the screen?
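One way to think about the binary/octal question is that both are just views of the same bits. The Python sketch below assumes the serial data has already been unpacked into a list of 0/1 values, most significant bit first; it searches for either 7-bit sync code written as a binary literal and then groups the following bits into 3-bit octal digits. The offsets of the range, azimuth and elevation words are not given in the question, so they are left to the caller, and a naive bit-by-bit search like this could also match a sync pattern that happens to appear inside the data.

```python
SYNC_CODES = (0b1110010, 0b0001101)  # frame sync patterns from the question
SYNC_LEN = 7

def find_frame_start(bits):
    """Return the index just past the first sync code in a list of 0/1 ints, or -1."""
    for i in range(len(bits) - SYNC_LEN + 1):
        value = 0
        for b in bits[i:i + SYNC_LEN]:
            value = (value << 1) | b   # build the 7-bit value MSB first
        if value in SYNC_CODES:
            return i + SYNC_LEN
    return -1

def bits_to_octal(bits):
    """Group bits (MSB first) into 3-bit octal digits; leftover bits at the end are ignored."""
    usable = len(bits) - len(bits) % 3
    return "".join(str((bits[i] << 2) | (bits[i + 1] << 1) | bits[i + 2])
                   for i in range(0, usable, 3))
```

For the bit list 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1 the sync code ends at index 9 and the remaining six bits read as the octal digits "53".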

Rot13 for numbers

EDIT: Now a Major Motion Blog Post at http://messymatters.com/sealedbids
The idea of rot13 is to obscure text, for example to prevent spoilers. It's not meant to be cryptographically secure but to simply make sure that only people who are sure they want to read it will read it.
I'd like to do something similar for numbers, for an application involving sealed bids. Roughly I want to send someone my number and trust them to pick their own number, uninfluenced by mine, but then they should be able to reveal mine (purely client-side) when they're ready. They should not require further input from me or any third party.
(Added: Note the assumption that the recipient is being trusted not to cheat.)
It's not as simple as rot13 because certain numbers, like 1 and 2, will recur often enough that you might remember that, say, 34.2 is really 1.
Here's what I'm looking for specifically:
A function seal() that maps a real number to a real number (or a string). It should not be deterministic -- seal(7) should not map to the same thing every time. But the corresponding function unseal() should be deterministic -- unseal(seal(x)) should equal x for all x. I don't want seal or unseal to call any webservices or even get the system time (because I don't want to assume synchronized clocks). (Added: It's fine to assume that all bids will be less than some maximum, known to everyone, say a million.)
Sanity check:
> seal(7)
482.2382 # some random-seeming number or string.
> seal(7)
71.9217 # a completely different random-seeming number or string.
> unseal(seal(7))
7 # we always recover the original number by unsealing.
You can pack your number as a 4 byte float together with another random float into a double and send that. The client then just has to pick up the first four bytes. In python:
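import struct, random
def seal(f):
    # Pack the bid and a random number as two 4-byte floats, then reread the 8 bytes as one double
    return struct.unpack("d", struct.pack("ff", f, random.random()))[0]
def unseal(f):
    # Split the double's 8 bytes back into two floats and keep the first (the bid)
    return struct.unpack("ff", struct.pack("d", f))[0]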
>>> unseal( seal( 3))
3.0
>>> seal(3)
4.4533985422978706e-009
>>> seal(3)
9.0767582382536571e-010
Here's a solution inspired by Svante's answer.
import random
M = 9999  # Upper bound on bid.
def seal(x):
    return M * random.randint(9, 99) + x
def unseal(x):
    return x % M
Sanity check:
> seal(7)
709936
> seal(7)
509956
> unseal(seal(7))
7
This needs tweaking to allow negative bids though:
import random
M = 9999  # Numbers between -M/2 and M/2 can be sealed.
def seal(x):
    return M * random.randint(9, 99) + x
def unseal(x):
    m = x % M
    return m - M if m > M / 2 else m
A nice thing about this solution is how trivial it is for the recipient to decode -- just mod by 9999 (and if that's 5000 or more then it was a negative bid so subtract another 9999). It's also nice that the obscured bid will be at most 6 digits long. (This is plenty of security for what I have in mind -- if the bids could possibly exceed $5k then I'd use a more secure method. Though of course the max bid in this method can be set as high as you want.)
Instructions for Lay Folk
Pick a number between 9 and 99 and multiply it by 9999, then add your bid.
This will yield a 5 or 6-digit number that encodes your bid.
To unseal it, divide by 9999, subtract the part to the left of the decimal point, then multiply by 9999.
(This is known to children and mathematicians as "finding the remainder when dividing by 9999" or "mod'ing by 9999", respectively.)
This works for nonnegative bids less than 9999 (if that's not enough, use 99999 or as many digits as you want).
If you want to allow negative bids, then the magic 9999 number needs to be twice the biggest possible bid.
And when decoding, if the result is greater than half of 9999, ie, 5000 or more, then subtract 9999 to get the actual (negative) bid.
Again, note that this is on the honor system: there's nothing technically preventing you from unsealing the other person's number as soon as you see it.
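The by-hand unsealing steps above (divide, drop the part left of the decimal point, multiply back) are exactly a remainder. This Python sketch follows those steps literally but uses exact fractions, since a calculator-style floating-point division can introduce rounding error; the optional negative-bid rule assumes bids lie between -4999 and 4999 when the magic number is 9999.

```python
import math
from fractions import Fraction

M = 9999  # the "magic number" from the instructions

def unseal_by_hand(sealed, m=M, allow_negative=True):
    q = Fraction(sealed, m)              # divide by 9999 (exactly)
    r = int((q - math.floor(q)) * m)     # drop the integer part, multiply back: equals sealed % m
    if allow_negative and r > m // 2:    # 5000 or more means the bid was negative
        return r - m
    return r
```

For example, 42 * 9999 + 7 unseals to 7, and 42 * 9999 - 12 unseals to -12.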
If you're relying on honesty of the user and only dealing with integer bids, a simple XOR operation with a random number should be all you need, an example in C#:
static Random rng = new Random();
static string EncodeBid(int bid)
{
    int i = rng.Next();
    return String.Format("{0}:{1}", i, bid ^ i);
}
static int DecodeBid(string encodedBid)
{
    string[] d = encodedBid.Split(":".ToCharArray());
    return Convert.ToInt32(d[0]) ^ Convert.ToInt32(d[1]);
}
Use:
int bid = 500;
string encodedBid = EncodeBid(bid); // encodedBid is something like 54017514:54017054 and will be different each time
int decodedBid = DecodeBid(encodedBid); // decodedBid is 500
Converting the decode process to a client side construct should be simple enough.
Is there a maximum bid? If so, you could do this:
Let max-bid be the maximum bid and a-bid the bid you want to encode. Multiply max-bid by a rather large random number (if you want to use base64 encoding in the last step, max-rand should be (2^24/max-bid)-1, and min-rand perhaps half of that), then add a-bid. Encode this, e.g. through base64.
The recipient then just has to decode and find the remainder modulo max-bid.
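A minimal Python sketch of this scheme, assuming bids satisfy 0 <= bid < max_bid (so the remainder modulo max_bid recovers them) and using the 2^24 limit from the answer so the value fits in 3 bytes, which base64 turns into exactly 4 characters.

```python
import base64
import random

def seal(bid, max_bid):
    max_rand = 2 ** 24 // max_bid - 1
    min_rand = max_rand // 2
    # max_rand * max_bid + bid stays below 2**24 when 0 <= bid < max_bid
    value = max_bid * random.randint(min_rand, max_rand) + bid
    return base64.b64encode(value.to_bytes(3, "big")).decode("ascii")  # 3 bytes -> 4 chars

def unseal(sealed, max_bid):
    value = int.from_bytes(base64.b64decode(sealed), "big")
    return value % max_bid
```

With max_bid = 10000, seal(500, 10000) yields a different 4-character string on each call, and unseal recovers 500 from any of them.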
What you want to do (a Commitment scheme) is impossible to do client-side-only. The best you could do is encrypt with a shared key.
If the client doesn't need your cooperation to reveal the number, they can just modify the program to reveal the number. You might as well have just sent it and not displayed it.
To do it properly, you could send a secure hash of your bid + a random salt. That commits you to your bid. The other client can commit to their bid in the same way. Then you each share your bid and salt.
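The hash-plus-salt commitment can be sketched with Python's hashlib and secrets modules. The "bid:salt" message format is an assumption for illustration; any unambiguous encoding works. Each side publishes its digest first, and only after both digests are exchanged reveals the bid and salt for the other side to verify.

```python
import hashlib
import secrets

def commit(bid):
    """Return (digest, salt): publish the digest now, keep the salt until the reveal."""
    salt = secrets.token_hex(16)  # random salt so small bids can't be brute-forced from the hash
    digest = hashlib.sha256(f"{bid}:{salt}".encode("utf-8")).hexdigest()
    return digest, salt

def verify(digest, bid, salt):
    """Check a revealed bid and salt against the previously published digest."""
    return hashlib.sha256(f"{bid}:{salt}".encode("utf-8")).hexdigest() == digest
```

Because the salt is hex, it never contains the ":" separator, so a revealed (bid, salt) pair maps to exactly one message.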
[edit] Since you trust the other client:
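import hashlib, os
def pad_for(K, n):
    # hash(K), repeated and truncated as necessary to cover n bytes
    # (hash optional: hides patterns in M xor K; could also xor with output of a PRNG instead)
    h = hashlib.sha256(K).digest()
    return (h * (n // len(h) + 1))[:n]
# Sender
M = b"my bid is 500"                                        # your message
K = os.urandom(4)                                           # random 4-byte key
C1 = bytes(m ^ p for m, p in zip(M, pad_for(K, len(M))))    # C1 = M xor hash(K)
C2 = K + C1                                                 # they need to know K to reveal the message
sent = C2.hex()                                             # send C2 (as hex if needed)
# Receiver
C2 = bytes.fromhex(sent)                                    # receive C2
K, C1 = C2[:4], C2[4:]
M = bytes(c ^ p for c, p in zip(C1, pad_for(K, len(C1))))   # M = C1 xor hash(K)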
Are you aware that you need a larger 'sealed' set of numbers than your original, if you want that to work?
So you need to restrict your real numbers somehow, or store extra info that you don't show.
One simple way is to write a message like:
"my bid is: $14.23: aduigfurjwjnfdjfugfojdjkdskdfdhfddfuiodrnfnghfifyis"
All that junk is randomly-generated, and different every time.
Send the other person the SHA256 hash of the message. Have them send you the hash of their bid. Then, once you both have the hashes, send the full message, and confirm that their bid corresponds to the hash they gave you.
This gives rather stronger guarantees than you need - it's actually not possible for them to work out your bid before you send them your full message. However, there is no unseal() function as you describe.
This simple scheme has various weaknesses that a full zero-knowledge scheme would not have. For example, if they fake you out by sending you a random number instead of a hash, then they can work out your bid without revealing their own. But you didn't ask for bullet-proof. This prevents both accidental and (I think) undetectable cheating, and uses only a commonly-available command line utility, plus a random number generator (dice will do).
If, as you say, you want them to be able to recover your bid without any further input from you, and you are willing to trust them only to do it after posting their bid, then just encrypt using any old symmetric cipher (gpg --symmetric, perhaps) and the key, "rot13". This will prevent accidental cheating, but allow undetectable cheating.
One idea that popped into my mind was to base your algorithm on the mathematics used for secure key sharing.
If you want to give two people, Bob and Alice, half a key each so that only by combining them will they be able to open whatever the key locks, how do you do that? The solution to this comes from mathematics. Say you have two points A (-2,2) and B (2,0) in an x/y coordinate system.
               |
       A       +
               |
               C
               |
---+---+---+---|---+---B---+---+---+---
               |
               +
               |
               +
If you draw a straight line between them it will cross the y axis at exactly one single point, C (0,1).
If you only know one of the points A or B it is impossible to tell where it will cross.
Thus you can let the points A and B be the shared keys which when combined will reveal the y-value
of the crossing point (i.e. 1 in this example) and this value is then typically used as
a real key for something.
For your bidding application you could let seal() and unseal() swap the y-value between the C and B points
(deterministic) but have the A point vary from time to time.
This way seal(y-value of point B) will give completely different results depending on point A,
but unseal(seal(y-value of point B)) should return the y-value of B which is what you ask for.
PS
It is not required to have A and B on different sides of the y-axis, but is much simpler conceptually to think of it this way (and I recommend implementing it that way as well).
With this straight line you can then share keys among several people so that only two of them are needed to unlock whatever it is. It is possible to use curve types other than straight lines to create other key-sharing properties (e.g. requiring 3 out of 3 keys).
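One reading of the line idea, sketched in Python with exact fractions: the bid is the y-value of B, whose x-coordinate X_B = 2 is a hypothetical constant both sides agree on; seal() picks a random point A left of the y-axis and publishes A together with the y-intercept c of the line through A and B. Anyone holding the sealed pair can rebuild the line through A and C = (0, c) and read off B's y-value, so, like the other answers, this only obscures the bid. The random ranges are arbitrary.

```python
import random
from fractions import Fraction

X_B = 2  # hypothetical: x-coordinate of B, fixed and known to both sides

def seal(bid):
    """Return ((xa, ya), c): a random point A and the y-intercept of the line through A and B."""
    xa = Fraction(-random.randint(1, 1000))        # A lies left of the y-axis
    ya = Fraction(random.randint(-10**6, 10**6))
    yb = Fraction(bid)
    c = ya - (yb - ya) * xa / (X_B - xa)           # y at x = 0 on the line through A and B
    return (xa, ya), c

def unseal(sealed):
    (xa, ya), c = sealed
    slope = (c - ya) / (0 - xa)                    # slope of the line through A and C
    return c + slope * X_B                         # y-value of B, as a Fraction
```

With the fixed points from the example, A (-2,2) and B (2,0), the same formula gives c = 1, matching C (0,1).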
In Python:
import random
def seal(value):
    key = random.randint(0, 255)   # our key is only 1 byte
    value = value ^ key            # 'sealing' it
    return (value << 16) | key     # add key in the low 16 bits
def unseal(sealed):
    key = sealed & 0xFF
    return key ^ (sealed >> 16)
Would that work?
Since it seems that you are assuming that the other person doesn't want to know your bid until after they've placed their own, and can be trusted not to cheat, you could try a variable rotation scheme:
from random import randint
def seal(input):
    r = randint(0, 50)
    obfuscate = [str(r)] + [str(ord(c) + r) for c in '%s' % input]
    return ':'.join(obfuscate)
def unseal(input):
    tmp = input.split(':')
    r = int(tmp.pop(0))
    deobfuscate = [chr(int(c) - r) for c in tmp]
    return ''.join(deobfuscate)
# I suppose you would put your bid in here, for 100 dollars
tmp = seal('$100.00')  # --> '1:37:50:49:49:47:49:49' (output varies)
print(unseal(tmp))     # --> '$100.00'
At some point (I think we may have already passed it) this becomes silly, and because it is so easy, you should just use simple encryption, where the message recipient always knows the key - the person's username, perhaps.
If the bids are fairly large numbers, how about a bitwise XOR with some predetermined random-ish number? XORing again will then retrieve the original value.
You can change the number as often as you like, as long as both client and server know it.
You could set a different base (like 16, 17, 18, etc.) and keep track of which base you've "sealed" the bid with...
Of course, this presumes large numbers (> the base you're using, at least). If they were decimal, you could drop the point (for example, 27.04 becomes 2704, which you then translate to base 29...)
You'd probably want to use base 17 to 36 (only because some people might recognize hex and be able to translate it in their head...)
This way, you would have numbers like G4 or Z3 or KW (depending on the numbers you're sealing)...
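A small Python sketch of the random-base idea. It assumes the bid has already been turned into a non-negative integer (e.g. 27.04 becomes 2704) and, as one hypothetical way to "keep track of" the base, sends the base in front of the digits. Python's int() can parse bases up to 36, which covers the 17 to 36 range suggested above.

```python
import random
import string

DIGITS = string.digits + string.ascii_uppercase  # 0-9 then A-Z: enough for base 36

def to_base(n, base):
    """Write a non-negative integer n in the given base (2..36)."""
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, r = divmod(n, base)
        out.append(DIGITS[r])
    return "".join(reversed(out))

def seal(n):
    base = random.randint(17, 36)   # skip hex so nobody decodes it in their head
    return f"{base}:{to_base(n, base)}"

def unseal(sealed):
    base, digits = sealed.split(":")
    return int(digits, int(base))
```

For example, 2704 in base 29 is "367", and in base 36 it is "234".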
Here's a cheap way to piggyback off rot13:
Assume we have a function gibberish() that generates something like "fdjk alqef lwwqisvz" and a function words(x) that converts a number x to words, eg, words(42) returns "forty two" (no hyphens).
Then define
seal(x) = rot13(gibberish() + words(x) + gibberish())
and
unseal(x) = rot13(x)
Of course the output of unseal is not an actual number and is only useful to a human, but that might be ok.
You could make it a little more sophisticated with a words-to-number function that would also just throw away all the gibberish words (defined as anything that's not one of the number words -- there are fewer than a hundred of those, I think).
Sanity check:
> seal(7)
fhrlls hqufw huqfha frira afsb ht ahuqw ajaijzji
> seal(7)
qbua adfshua hqgya ubiwi ahp wqwia qhu frira wge
> unseal(seal(7))
sueyyf udhsj seven ahkua snsfo ug nuhdj nwnvwmwv
I know this is silly but it's a way to do it "by hand" if all you have is rot13 available.
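A runnable Python version of the rot13 trick, including the words-to-number function that skips gibberish. Assumptions: bids are integers from 0 to 999, and gibberish words are 10 to 12 random letters, longer than any number word, so they can never be mistaken for one. Python's codecs module provides rot13 directly.

```python
import codecs
import random
import string

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

def words(n):
    """Spell out 0 <= n < 1000 without hyphens, e.g. 42 -> 'forty two'."""
    if n < 20:
        return ONES[n]
    if n < 100:
        return TENS[n // 10] + ("" if n % 10 == 0 else " " + ONES[n % 10])
    rest = n % 100
    return ONES[n // 100] + " hundred" + ("" if rest == 0 else " " + words(rest))

def gibberish(count=3):
    # Words of 10-12 random letters: longer than any number word, so no collisions
    return " ".join(
        "".join(random.choice(string.ascii_lowercase) for _ in range(random.randint(10, 12)))
        for _ in range(count))

def rot13(text):
    return codecs.encode(text, "rot13")

def seal(x):
    return rot13(gibberish() + " " + words(x) + " " + gibberish())

def unseal(sealed):
    return rot13(sealed)

def words_to_number(text):
    """Read the number back, skipping every word that is not a number word."""
    total = 0
    for w in text.split():
        if w in ONES:
            total += ONES.index(w)
        elif w in TENS[2:]:
            total += 10 * TENS.index(w)
        elif w == "hundred":
            total *= 100
    return total
```

Here words_to_number(unseal(seal(342))) reads "three hundred forty two" out of the surrounding gibberish and returns 342.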