Custom reason for word wrap in VS Code extension to enable working with multiline values in Json - json

I write Jsons for an API that often requires to have multiline values because scripts are in between the data in the attributes. I've written an extension for me that can escape and unescape multiline values, therefore I can cycle between those states:
{
"value": "
multiline
value
"
}
{
"value": "multiline\n value"
}
However, in the un-escaped, formatted, status, I have an invalid Json, which just causes trouble. I have to switch between escaped and unescaped states to do any Json operation (like format), which I work around by replacing \n with \\n and back.
I have even considered switching to another format, but neither of those I tried had a killer feature making me switch. Among those: Jsonc (no multiline value support), XML (hard to write and read, but supports multiline values and indentation), YAML (would be an option, but does not support indentation in multiline values).
Can I force VS Code to render a specific sequence of characters as line break (in this case, it would be \\n) without changing the document data? The intended functionality is like what the Alt+Z word wrap does, just in a different place.

After some research, I've found the following:
it is not possible or hard to do to hijack the default editor and make it render lines in a way I want
even if I managed to, it might have an underlying issue of not tokenizing long lines
I have decided to go in a way, where I define a custom language, because:
* the API has a constant Json structure
* I can define a new grammar for the API scripts and embed it to the Jsons
This approach seems to be a way to go for me, although it might be a temporary solution. I'm losing Json validations, therefore I do not get a Json new line in value error, but I'm also losing errors in missing commas. This is something I want to approach with the following precautions:
if a JSON is valid and contains attributes of the API-defined classes, offer a button to switch to my defined language, which also un-escapes \\\n to \n
if language is my language, escape the newlines and try to pass it to a JSON validation.

Related

Regex for replacing unnecessary quotation marks within a JSON object containing an array

I am currently trying to format a JSON object using LabVIEW and have ran into the issue where it adds additional quotation marks invalidating my JSON formatting. I have not found a way around this so I thought just formatting the string manually would be enough.
Here is the JSON object that I have:
{
"contentType":"application/json",
"content":{
"msgType":2,
"objects":"["cat","dog","bird"]",
"count":3
}
}
Here is the JSON object I want with the quotation marks removed.
{
"contentType":"application/json",
"content":{
"msgType":2,
"objects":["cat","dog","bird"],
"count":3
}
}
I am still not an expert with regex and using a regex tester I was only able to grab the "objects" and "count" fields but I would still feel I would have to utilize substrings to remove the quotation marks.
Example I am using (would use a "count" to find the start of the next field and work backwards from there)
"([objects]*)"
Additionally, all the other Regex I have been looking at removes all instances of quotation marks whereas I only need a specific area trimmed. Thus, I feel that a specific regex replace would be a much more elegant solution.
If there is a better way to go about this I am happy to hear any suggestions!
Your question suggests that the built-in LabVIEW JSON tools are insufficient for your use case.
The built-in library converts LabVIEW clusters to JSON in a one-shot approach. Bundle all your data into a cluster and then convert it to JSON.
When it comes to parsing JSON, you use the path input terminal and the default type terminals to control what data is parsed from a JSON string.
If you need to handle JSON in a manner similar to say JavaScript, I would recommend something like the JSONText Toolkit which is free to use (and distribute) under the BSD licence. This allows more complex and iterative building of JSON strings from LabVIEW types and has text-path style element access along with many more features.
The Output controls from both my examples are identical - although JSONText provides a handy Pretty Print vi.
After using a regex from one of the comments, I ended up with this regex which allowed me to match the array itself.
(\[(?:"[^"]*"|[^"])+\])
I was able to split the the JSON string into before match, match and after match and removed the quotation marks from the end of 'before match' and start of 'after match' and concatenated the strings again to form a new output.

What is the issue with the trailing comma of JSON Object

Usually i am creating valid JSON objects like this:
{
hasPermission: true,
notificationStatusId: 1
};
ON VSCode, when is save the file, sometimes it's adding a tailing comma automatically after the last property like this:
{
hasPermission: true,
notificationStatusId: 1,
};
Actually I'm not asking how to disable that behavior and i know how to do that. i am asking what is the reason VSCode has that feature of adding a trailing comma automatically?
First, make sure the file's language mode is set to just JSON (not JSON with comments or JSON5 or JSON takes Manhattan). The
language mode is displayed in the lower right corner of the status bar:
VS Code will warn you about trailing commands in the JSON language mode.
As for why the trailing comma is being inserted, that is likely caused by one of your extensions as VS Code should not do this by default. Try going through your extensions to identify which one is causing this
If you are using some linter like prettier, Recently Default value of Trailing commas changed from "none" to "es5" in new versions
To fix this you have to change configuration of your linter
In case of prettier, create a file .prettierrc.json with content
{"trailingComma": "none"}
The trailing comma is probably added by prettier code formatter. This is recommended for javascript, and disallowed for JSON according to the documentation:
Trailing commas (sometimes called "final commas") can
be useful when adding new elements, parameters, or properties to
JavaScript code. If you want to add a new property, you can add a new
line without modifying the previously last line if that line already
uses a trailing comma. This makes version-control diffs cleaner and
editing code might be less troublesome.
JavaScript has allowed trailing commas in array literals since the
beginning, and later added them to object literals (ECMAScript 5) and
most recently (ECMAScript 2017) to function parameters.
JSON, however, disallows trailing commas.

Text encoding problems in JSON.stringified() object

I have a index.html with a which sends a text to a PHP code. This PHP sends it again by POST (curl) to a Node.js server, inserted in a JSON message (utf8-encoded)
//Node.js server file (app.js) -- gets the json and shows it in a <script> to save it in client JS
render(index, {json:{string:"mystring"}})
//Template to render (index.ejs)
var data = <%=JSON.stringify(json)%>;
So that I can pass those variables in the JSON to data. JSON is way bigger than here, I wrote only the part which creates a bug : the string contained here makes an "INvalid character" JS bug. What should I do ? Which encoding/decoding/escaping should I use ?
I have utf-8 everywhere, as all my other strings work, even with german or arabic characters. In this particular case, this is the "mystring" below which breaks the app :
If I remove the characters in the red circles It works.
Here is the string as it is in the JSON i receive :
"Otto\nTheater-, Konzert- und Gpb\n\u2028\u2028Rhoasse\u00dfe 20\u2028\n51065 K\u00f6ln\n\nTelefon: 0000-000000-0\u2028\nTelefax: 0000-000000\n\nE-Mail: address#mail.com\u2028"
Because it is a user-entered text, I must handle this kind of characters. I don't have access to the PHP part of the code, only to the nodeJS and client JS. How can I find and remove/convert those chars in JS ?
<%- JSON.stringify(data).replace(/[\u0000\u00ad\u0600-\u0604\u070f\u17b4\u17b5\u200c-\u200f\u2028-\u202f\u2060-\u206f\ufeff\ufff0-\uffff]/g, "\\n") %>;
I ended up replacing invalid unicode characters (which are valid for JSON but not in JS code) with line breaks. This solves the problem
JSON is commonly thought to be a subset of JavaScript, but it isn't quite. Due to an unfortunate oversight, the raw characters U+2028 and U+2029 are permitted in JSON string literals, but not in JavaScript string literals. In JavaScript, they are interpreted as newlines and so having one in a string literal is a syntax error.
Consequently this:
var data = <%=JSON.stringify(json)%>;
isn't safe. You can make it so by manually replacing them with string-literal-escaped versions:
JSON.stringify(json).replace('\u2028', '\\u2028').replace('\u2029', '\\u2029')
Typically it's best to avoid this kind of problem, and keep code and data strictly separated, by dropping the JSON data into an HTML data- attribute. It can then be read out of the DOM from the client-side script and passed through JSON.parse. Then the only kind of escaping you have to worry about is normal HTML-escaping, which hopefully your templating language does by default.
The other characters in your answer are actually okay for JS string literals, except for the control characters, which JSON also escapes.
It may well make sense to remove some of these characters anyway, as an input filtering step. It's unusual and almost always undesirable to have cruft like U+2028 in your data. You could consider filtering out the characters unsuitable for use in markup which include U+2028/9 and other bad things like bidi overrides that can mess up your page rendering.

JSON alternatives (for the purpose of specifying configuration)?

I like json as a format for configuration files for the software I write. I like that it's lightweight, simple, and widely supported. However, I'm finding that there are some things I'd really like in json that it doesn't have.
Json doesn't have multiline strings or here documents ( http://en.wikipedia.org/wiki/Here_document ), and that is often very awkward when you want your json file to be human-readable and -editable. You can use arrays of strings, but that's a kludgy workaround.
Json doesn't allow comments.
If you look at the formats of unix configuration files, you see a lot of people designing their own awkward formats for things that it would really make more sense to do using some kind of general-purpose thing. For example, here's some code from an Apache config file:
RewriteEngine on
RewriteBase /temp
RewriteCond %{HTTP_ACCEPT} application/xhtml\+xml
RewriteCond %{HTTP_ACCEPT} !application/xhtml\+xml\s*;\s*q=0
RewriteCond %{REQUEST_URI} \.html
RewriteCond %{THE_REQUEST} HTTP/1\.1
RewriteRule t\.html t.xhtml [T=application/xhtml+xml]
Essentially, what's going on here is that they've invented an extremely painful way of writing a boolean function f(w,x,y,z)=w&!x&y&z. You want a logical "or"? They've got some separate (ugly) mechanism for that, too.
What this seems to point toward is some kind of data description language that is simple and Turing-incomplete, but still more expressive, flexible, and convenient than json. Does anyone know of such a language?
To my taste, XML is too complicated, and lisp expressions have the wrong features (Turing-completeness) and lack the right features (here documents, expressive syntax).
[EDIT] The title is misleading. I'm not literally interested in the next iteration of json. I'm not interested in languages that are a subset of javascript. I'm interested in alternative data-description languages.
The EDN format is one option based on Clojure literals. It is almost a superset of JSON, except that no special symbol separates keys and values in maps (as : does in JSON); rather, all elements are separated by whitespace and/or a comma and a map is encoded as a list with an even number of elements, enclosed in {..}.
EDN allows for comments (to newline using ;, or to end of the next element read using #_), but not here-docs. It is extensible to new types using a tag notation:
#myapp/Person {:first "Fred" :last "Mertz"}
The argument of the myapp/Person tag (i.e. {:first "Fred" :last "Mertz"}) must be a valid EDN expression, which makes it unextensible to here-doc support.
It has two built-in tags: #inst for timestamps and #uuid. It also supports namespaced symbol (i.e. identifier) and keyword (i.e. map key consts) types; it distinguishes lists (..) and vectors [..]. An element of any type may be used as a key in a map.
In the context of your above problem, one could invent an #apache/rule-or tag which accepts a sequence of elements, whose semantics I leave up to you!
Have a look at http://github.com/igagis/puu/
It is even simpler than JSON.
It has C++ style comments.
It is possible to format multiline strings and use escaped new line \n and tab \t chars if "real" new line or tab is needed.
Here is the example snippet:
"String object"
AnotherStringObject
"String with children"{
"child 1"
Child2
"child three"{
SubChild1
"Subchild two"
Property1 {Value1}
"Property two" {"Value 2"}
//comment
/* multi-line
comment */
"multi-line
string"
"Escape sequences \" \n \r \t \\"
}
R"qwerty(
This is a
raw string, "Hello world!"
int main(argc, argv){
int a = 10;
printf("Hello %d", a);
}
)qwerty"
}
Consider TOML.
Designed for configuration. Appears to be pretty friendly and powerful. Easy to read and supports a wide range of datatypes and structures. There are parsers for a lot of languages:
C
C#
C++
Common Lisp
Crystal
Dart
Erlang
Fortran
Go
Janet
Java
JavaScript
Julia
Kotlin
Lua
Nim
OCaml
Perl
Perl6/Raku
Python
Rust
Swift
V
The 'J' in JSON is "Javascript". If a particular desired syntax construct isn't in Javascript, then it won't be on JSON.
Heredocs are beyond JSON's purview. That's a language syntax construct for simplified multi-line string definition, but JSON is a transport notation. It has nothing to do with construction. It does, however, have multiline strings, simply by allowing \n newline characters within strings. There's nothing in JSON that says you can't have a linebreak in a string. As long as the containing quote characters are correct, it's perfectly valid. e.g.
{"x":"y\nz"}
is 100% legitimate valid JSON, and is a multiline string, whereas
{"x":"y
z"}
isn't and will fail on parsing.
There's always what I like to call "real JSON". JSON stands for JavaScript Object Notation, and JavaScript does have comments and something close enough to heredocs.
For the heredoc, you would use JavaScript's E4X inline XML:
{
longString: <>
Hello, world!
This is a long string made possible with the magic of E4X.
Implementing a parser isn't so difficult.
</>.toString() // And a comment
/* And another
comment */
}
You can use Firefox's JavaScript engine (FF is the only browser to support E4X currently) or you can implement your own parser, which really isn't so difficult.
Here's the E4X quickstart guide, too.
Since March 2018 you can use JSON5 which seems to have added everything you (& many others) were missing from JSON.
Short Example (JSON5)
{
// comments
unquoted: 'and you can quote me on that',
singleQuotes: 'I can use "double quotes" here',
lineBreaks: "Look, Mom! \
No \\n's!",
hexadecimal: 0xdecaf,
leadingDecimalPoint: .8675309, andTrailing: 8675309.,
positiveSign: +1,
trailingComma: 'in objects', andIn: ['arrays',],
"backwardsCompatible": "with JSON",
}
The JSON5 Data Interchange Format (JSON5) is a superset of JSON that
aims to alleviate some of the limitations of JSON by expanding its
syntax to include some productions from ECMAScript 5.1.
Summary of Features
The following ECMAScript 5.1 features, which are not supported in
JSON, have been extended to JSON5.
Objects
Object keys may be an ECMAScript 5.1 IdentifierName.
Objects may have a single trailing comma.
Arrays
Arrays may have a single trailing comma.
Strings
Strings may be single quoted.
Strings may span multiple lines by escaping new line characters.
Strings may include character escapes.
Numbers
Numbers may be hexadecimal.
Numbers may have a leading or trailing decimal point.
Numbers may be IEEE 754 positive infinity, negative infinity, and NaN.
Numbers may begin with an explicit plus sign.
Comments
Single and multi-line comments are allowed.
White Space
Additional white space characters are allowed.
GitHub: https://github.com/json5/json5
One important attribute of JSON (probably the most important) is that you can easily "flip" between the string representation and the representation in object form, and the objects used to represent the object form are relatively simple arrays and maps. This is what makes JSON so useful in a networking context.
The functions you want would conflict with this dual nature of JSON.
For configuration you could use an embeddable scripting language, such as lua or python, in fact this is not an uncommon thing to do for configuration. That gives you multiline strings or here documents, and comments. It also makes it easier to have things like the boolean function you describe. However, the scripting languages are, of course, Turing complete.
There is also ELDF.
Although it does not support comments, they can be emulated via empty keys:
config_var1 = value1
=some comment
config_var2 = value2

Is it true that a JSON document will not be parseable until the last byte is written?

Is the following hypothesis valid?
Leaving aside whitespace, once the first character of a JSON document
has been written, the resulting stream will not parse as valid JSON
until the last character has been written.
I'm interested in using this assumption so that when I have one process writing a file and another reading it, I can safely ignore partially-written files by ignoring anything that doesn't parse as valid JSON.
I'm sure it depends on the parser you are using... it seems than any scrupulous parser would follow that rule due to the structure of JSON... curly brackets around every "object" key/value pair, including any wrapping document { }).
As always with programming, test rather than assume.