Difference between regular JSON storage and BSON (binary JSON)? - json

I am reading "Designing Data Intensive Applications". It is mentioned that you can extend JSON encoding with binary encoding (using libraries like MessagePack).
I am getting a little confused because everything gets encoded down to binary to be sent across the network, right? I mean: even every character in JSON is ultimately stored as 0s and 1s (binary).
What is the difference then? It would be great if someone could elaborate with an example.
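One way to see the distinction, as a minimal sketch using Python's standard library: JSON is a text encoding, so a number is stored digit by digit as characters, while a binary encoding such as MessagePack or BSON stores the value itself in fixed-width bytes. Here `struct` stands in for what such a library does internally:

```python
import json
import struct

value = 1234567890

# JSON stores the number as text: one byte per digit character.
json_bytes = json.dumps(value).encode("utf-8")
print(len(json_bytes))  # 10 bytes: b'1234567890'

# A binary encoding (here: a big-endian 32-bit integer, similar to
# what MessagePack or BSON does internally) stores it in 4 bytes.
binary_bytes = struct.pack(">i", value)
print(len(binary_bytes))  # 4 bytes
```

Both byte strings travel over the network as 0s and 1s, but the binary form is less than half the size and needs no string-to-number parsing on the receiving end.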

Related

Can and should I store key-value pairs with localized UTF-8 characters?

OK, a question regarding JSON and UTF-8/Unicode encoding. My simple stack will be: backend: MongoDB + GraphQL; frontend: Flutter app.
There’s no doubt there’s an international coding standard of naming things in English for a data model like {"firstName":"Oscar"} when storing data and later parsing data in apps, and that’s fine.
However, if the code for my project is only intended to be reviewed and extended by other coders who are native speakers (Swedish in this case), are there any reasons not to name the data model fields {"fornamn":"Oscar"}, or even to use the Unicode key {"förnamn":"Oscar"}, when it comes to storing, fetching, or parsing JSON across the MongoDB documents, GraphQL resolvers, and the Flutter Dart class model?
Why would I like to store it localized? - Because some of the fields and data structures make more sense and are easier to conceptualize when mentally modelled in the native tongue as opposed to translating to English back and forth.
Using JSON with Unicode characters is perfectly fine. This also applies to keys in JSON objects. I don't see any reason why not.
JSON should always be saved/transferred using a Unicode encoding, preferably UTF-8. If a non-Unicode encoding is used, the JSON needs to be rewritten using escape sequences for characters not available in that encoding. This applies equally to keys and other JSON elements.
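A quick sketch of this point with Python's built-in `json` module, whose `ensure_ascii` flag controls whether non-ASCII characters are written literally or as escape sequences:

```python
import json

data = {"förnamn": "Oscar"}

# UTF-8 output: the key keeps its literal 'ö'.
utf8_json = json.dumps(data, ensure_ascii=False)
print(utf8_json)   # {"förnamn": "Oscar"}

# ASCII-safe output: non-ASCII characters become \uXXXX escapes,
# which any conforming JSON parser decodes back to the same key.
ascii_json = json.dumps(data, ensure_ascii=True)
print(ascii_json)  # {"f\u00f6rnamn": "Oscar"}

# Both forms round-trip to identical objects.
assert json.loads(utf8_json) == json.loads(ascii_json) == data
```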

Binary data in JSON

I am using JSON because it's readable and offers flexibility as a transmission protocol for IPC. Part of the exchange between processes is a requirement to transfer large binary files (megabytes).
I am using UDP with JSON as the transport protocol; currently the binary data is translated into hex strings with no delimiters, so a single 8-bit character is used to represent each 4-bit nibble.
I'm exploring and looking for ways of keeping the JSON protocol while transferring the binary data more efficiently.
The reason is that UDP packets are limited in size, and converting each nibble to a byte doubles the bit count, slowing the transfer because the data size is doubled.
Can anyone think of a better way of sending the binary data in a JSON packet without losing anything?
I recommend Parquet for transmitting the information. Parquet is a single-table binary format; it is used in machine learning with Python. Here are some examples: Link Link
If your concern is UDP packet limits, maybe try sockets: Link
I hope it helps you, greetings
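For comparison, here is a sketch (Python standard library only) of the overhead involved: hex doubles the payload, while Base64, the usual choice for embedding binary in JSON, adds only about 33%:

```python
import base64
import binascii
import json
import os

payload = os.urandom(3000)  # stand-in for binary file contents

# Hex: every 4-bit nibble becomes one ASCII character -> 2x size.
as_hex = binascii.hexlify(payload).decode("ascii")
print(len(as_hex))  # 6000 characters

# Base64: every 3 bytes become 4 characters -> ~1.33x size.
as_b64 = base64.b64encode(payload).decode("ascii")
print(len(as_b64))  # 4000 characters

# The Base64 string drops straight into a JSON message and back.
msg = json.dumps({"file": "blob.bin", "data": as_b64})
decoded = base64.b64decode(json.loads(msg)["data"])
assert decoded == payload
```

This keeps the JSON envelope intact while cutting the question's 2x overhead down to roughly 1.33x.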

XSLT is to XML as <WHAT> is to Protocol Buffers? Is there a transform for Google Protocol Buffer data?

This is NOT a question about XML! This is a question about transforming binary data in a Google Protocol Buffer.
Let's say I have two .proto files generating two different "Messages". Imagine that in one message all the units are metric, while in the other they are all English. On top of that, names are all capitalized in one and not the other, and so on.
Now my question is:
How can I generically transform protocol buffer data in place WITHOUT either: (1) writing a custom implementation to access a field in object A only to process it and mutate it into object B, or (2) pulling the data out of the proto namespace and paradigm (e.g. streaming to XML).
So far my solution has been moving data from protocol buffers through Xerces, transforming in Xalan and then streaming back into another object. Painful, clunky, slow.
Quite simply: there isn't anything comparable pre-existing of which I am aware. In theory something might be possible using the reader/writer APIs (for whichever platform you're targeting), but it still wouldn't be trivial, especially in the treatment of sub-objects.
It could be interesting to investigate such a transformation API, but I don't imagine it is going to be commonplace enough to warrant anything as advanced as XSLT.
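To make the idea concrete, here is a hypothetical sketch in Python of such a generic, rule-driven transform applied to a dict view of a message (as produced by, e.g., protobuf's `MessageToDict`). The field names and conversion rules below are invented for illustration:

```python
# Hypothetical sketch: a rule-driven transform over a dict view of a
# message, instead of hand-written field-by-field mutation code.
def transform(node, rules):
    """Recursively rename keys and convert values according to rules."""
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            new_key, fn = rules.get(key, (key, None))
            value = transform(value, rules)
            out[new_key] = fn(value) if fn else value
        return out
    if isinstance(node, list):
        return [transform(item, rules) for item in node]
    return node

# Example rules: rename fields and convert metric metres to feet.
rules = {
    "length_m": ("LengthFt", lambda m: round(m * 3.28084, 2)),
    "name": ("Name", str.upper),
}

metric_msg = {"name": "girder", "length_m": 2.0, "parts": [{"name": "bolt"}]}
english_msg = transform(metric_msg, rules)
print(english_msg)
# {'Name': 'GIRDER', 'LengthFt': 6.56, 'parts': [{'Name': 'BOLT'}]}
```

The resulting dict could then be loaded back into the target message type (e.g. with `ParseDict`), avoiding the XML round trip through Xerces/Xalan, at the cost of leaving the proto binary form temporarily.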

What are the pros and cons of Base64 file upload through JSON, as opposed to AJAX or jQuery upload?

I was tasked with writing image upload to a remote server, saving those images locally. It was quite easy to do with Base64 transfer through JSON and storing with Node.js. However, is there a reason not to use this type of file upload and to use AJAX or other approaches instead? (Other than the 30% bandwidth increase, which I know about. You can still include that in your answer for completeness.)
The idea of Base64 encoding is to avoid binary data in protocols based on text. Outside that situation, I think it's always a bad idea.
Pros
Avoidance of binary data for protocols based on text, and independence from external files.
Avoidance of delimiter collision.
Cons
Time and space increased complexity; for space it's 33–36% (33% by the encoding itself, up to 3% more by the inserted line breaks).
API response payloads are larger/too large.
User experience is negatively impacted, unless one invokes some lazy loading.
By including all image data together in one API response, the app must receive all data before drawing anything on screen. This means users will see on-screen loading states for longer and the app will appear sluggish as users wait.
This is however mitigated with Axios and some lazy loader such as react-lazyload or lazyload or so.
CDN caching is harder. Contrary to image files, the Base64 strings inside an API response cannot be delivered via a CDN cache. The whole API response must be delivered by CDN. (cf., Don’t use Base64 encoded images on mobile and Why "optimizing" your images with Base64 is almost always a bad idea)
Image caching on the device is no longer possible.
Content management becomes harder on the server side. Most content management tools handle images as binary files; keeping them as Base64 adds the time overhead of encoding/decoding.
No security gain and overhead in engineering to mitigate (Sanitizing, Input Validation, Escaping). Example of XSS attack: Preventing XSS with Base64 encoding: The False sense of web application security
The developers of that site might have opted to make the website appear more secure by having cryptic URLs and whatnot. However, that doesn't mean this is security by obscurity. If their website is vulnerable to SQL injection and they try to hide that by encoding the URLs, then it's security by obscurity. If their website is well secured against SQL injection, XSS, CSRF, etc., and they decided to encode the URLs like that, then it's just plain stupidity.
It does not help with text-encoded images such as SVG (Probably Don't Base64 SVG).
Data URIs aren't supported on IE6 or IE7, nor on Opera before 7.2 (Which browsers support data URIs and since which version?)
References
https://en.wikipedia.org/wiki/Base64
https://en.wikipedia.org/wiki/Delimiter#Delimiter_collision
SO: What is base 64 encoding used for?
https://medium.com/snapp-mobile/dont-use-base64-encoded-images-on-mobile-13ddeac89d7c
https://css-tricks.com/probably-dont-base64-svg/
https://security.stackexchange.com/questions/46362/purpose-of-using-base64-encoded-urls
https://bunnycdn.com/blog/why-optimizing-your-images-with-base64-is-almost-always-a-bad-idea/
https://www.davidbcalhoun.com/2011/when-to-base64-encode-images-and-when-not-to/
Data Encoding
Every data encoding and decoding scheme is used for particular reasons, each coming with benefits and downsides. For example:
Error-detection encodings: can detect errors, but increase data usage.
Encryption encodings: turn data into a cipher that an intruder cannot decipher.
There are many encoding algorithms that alter data in ways that are useful for some purpose.
Base64 encoding maps every 6 bits of data to one 8-bit character: 3 bytes become 4 bytes, and the output uses only alphanumeric characters (62 distinct) plus 2 symbols.
Its benefit is that the output contains no special characters or signs.
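The 3-bytes-to-4-characters mapping is easy to observe with Python's standard library:

```python
import base64

# 3 input bytes (24 bits) -> 4 output characters of 6 bits each.
print(base64.b64encode(b"Man"))  # b'TWFu'

# Input that is not a multiple of 3 bytes is padded with '='.
print(base64.b64encode(b"Ma"))   # b'TWE='
print(base64.b64encode(b"M"))    # b'TQ=='
```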
Base64 Purpose
It makes it possible to transfer any data over channels which prohibit:
special chars like ' " / \ ...
non-printable ASCII like \0 \n \r \t \a
8-bit ASCII codes (ASCII with the MSB set)
Binary files usually contain arbitrary data which, interpreted as ASCII, can be any 8-bit character.
In some protocols and applications there are I/O interfaces which only accept a handful of characters (alphanumeric plus a few signs), due to:
preventing code injection (e.g. SQL injection, or characters that look like programming-language syntax such as ;)
or a character already having a meaning in the protocol (e.g. in a URI query string the character & has a meaning and cannot appear in any query-string value)
or the input simply not being intended to accept non-alphanumeric values (e.g. it should accept only human names).
But with Base64 encoding you can encode anything and transfer it over any channel you want.
Example:
you can encode an image or application and save it in a DBMS with SQL
you can include some binary data in a URI
you can send binary files over a protocol which was designed to accept only human chat as alphanumeric text, like an IRC channel
Base64 is just a conversion format for cases where an HTTP server cannot accept binary data in the content unless the HTTP headers declare a binary or otherwise acceptable format defined by the web server.
As you might know, JSON can contain various formats and information; thus, you can include something like:
{
  "IMG_FILENAME": "HELLO",
  "IMG_TYPE": "IMG/JPEG",
  "DATA": "~~~BASE64 ENCODED IMAGE~~~~"
}
You can send the JSON through AJAX or another method. But, as I said, HTTP servers have various limitations because they must comply with RFC 2616 (https://www.rfc-editor.org/rfc/rfc2616).
In short, sending through JSON lets you include various kinds of data. AJAX is just one way of sending it, like the others.
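As a sketch, building and decoding such a payload might look like this in Python (the field names follow the example above; the image bytes are a stand-in):

```python
import base64
import json

image_bytes = b"\x89PNG\r\n\x1a\n...stand-in image data..."

# Client side: wrap the Base64-encoded bytes in a JSON envelope
# (valid JSON requires quoted keys and ':' separators).
payload = json.dumps({
    "IMG_FILENAME": "HELLO",
    "IMG_TYPE": "IMG/JPEG",
    "DATA": base64.b64encode(image_bytes).decode("ascii"),
})

# Server side: parse the JSON and recover the original binary data.
received = json.loads(payload)
restored = base64.b64decode(received["DATA"])
assert restored == image_bytes
```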
I used same solution in one of my project.
The only concern is the request body size. If all your images are small, say a few MB each, then you should be fine.
My server is ASP.NET Core; its maxAllowedContentLength value is 30000000, which is approximately 28.6 MB. When the image size exceeds this, the request fails with the error "request body too large".
I think Node.js should have a similar setting; make sure to adjust it to meet your needs.
Please note that when the request size is too big, the possibility of a request timeout increases accordingly due to the network traffic. This will be an issue especially for requests from phones.
I think the use of Base64 is valid.
The only doubt is the size of the request, but this can be worked around if you split the Base64 on the frontend: for a 30 MB file you could send each part as a 5 MB request and reassemble the parts on the backend. This is useful even for "resume" behaviour when a network problem corrupts some part.
Hugs
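A sketch of that chunking idea in Python (the 5 MB per-request limit is hypothetical). Note that it is safest to split the raw bytes before encoding, since an already-encoded Base64 string may only be split at 4-character boundaries:

```python
import base64

CHUNK = 5 * 1024 * 1024  # hypothetical 5 MiB of raw bytes per request

def to_chunks(data: bytes):
    # Split the RAW bytes first, then encode each chunk; splitting an
    # encoded Base64 string is only safe at 4-character boundaries.
    for i in range(0, len(data), CHUNK):
        yield base64.b64encode(data[i:i + CHUNK]).decode("ascii")

def reassemble(chunks):
    # Backend: decode each part and concatenate in order.
    return b"".join(base64.b64decode(c) for c in chunks)

blob = bytes(range(256)) * 100_000  # ~25.6 MB test payload
parts = list(to_chunks(blob))
print(len(parts))  # 5 requests at 5 MiB of raw bytes each
assert reassemble(parts) == blob
```

Since each part decodes independently, the backend can also checksum parts individually and request only the corrupted ones again, which is the "resume" behaviour described above.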
Base64 converts your data to an ASCII representation of the binary data. It allows you to embed your data in text streams such as JSON for example. Base64 increases the size of the data transferred by 33%.
multipart/form-data is the standard way of transferring binary data in HTTP requests. It allows you to use specific encodings / content types for each part you'd like to transfer. In my opinion, you should stick to multipart uploads unless you have specific requirements or device/SDK capabilities.
Check out these links:
What is difference Between Base64 and Multipart?
Base64 image upload VS Binary image upload?

Is there a standard to specify a binary format in json

I would like to know whether there is some standard that specifies binary formats using JSON as the describing language, similar to google's protocol buffers.
Protocol Buffers seem very powerful, but they require parsing yet another language and add considerable overhead, especially for compiled languages such as C++.
So I am wondering whether there is some accepted standard that uses JSON to describe a binary format. (Parsing the binary data might then still require some manual steps, but at least a clear and unique description of the data can be made available.)
To be clear, I am not talking about encoding binary data in JSON, I am talking about describing binary data in JSON.
Head to the ultimate Wikipedia listing and evaluate for yourself. I don't know what the right argument is to overcome your programmer's inertia. I'd consider Apache Avro the best fit for your requirement: it has a JSON description.
For least friction, you could try MessagePack or BSON, which are essentially JSON, just packed better. But, having no external declaration, they need to be self-describing, so they must transport the field names on the wire; that makes them not as "binary" and compact as Protocol Buffers or Avro.
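To illustrate what "describing binary data in JSON" could look like, here is a hand-rolled sketch in Python. The schema format is invented for illustration and is not an existing standard (Avro's schema language is the closest real-world analogue):

```python
import json
import struct

# A made-up JSON description of a binary record layout -- an
# illustration of the idea, not an existing standard.
schema = json.loads("""
{
  "name": "SensorReading",
  "byte_order": "big",
  "fields": [
    {"name": "id",    "type": "uint16"},
    {"name": "value", "type": "float32"}
  ]
}
""")

# Map the schema's type names onto struct format characters.
FORMATS = {"uint16": "H", "float32": "f"}

def parse(schema, data: bytes) -> dict:
    """Decode a binary record according to its JSON description."""
    order = ">" if schema["byte_order"] == "big" else "<"
    fmt = order + "".join(FORMATS[f["type"]] for f in schema["fields"])
    values = struct.unpack(fmt, data)
    return {f["name"]: v for f, v in zip(schema["fields"], values)}

record = struct.pack(">Hf", 7, 1.5)
print(parse(schema, record))  # {'id': 7, 'value': 1.5}
```

The parsing code is still manual, but the JSON document gives a clear, language-neutral description of the wire format, which is what the question asks for; Avro formalizes exactly this approach.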