Regex for replacing unnecessary quotation marks within a JSON object containing an array - json

I am currently trying to format a JSON object using LabVIEW and have run into an issue where it adds additional quotation marks, invalidating my JSON formatting. I have not found a way around this, so I thought formatting the string manually would be enough.
Here is the JSON object that I have:
{
"contentType":"application/json",
"content":{
"msgType":2,
"objects":"["cat","dog","bird"]",
"count":3
}
}
Here is the JSON object I want with the quotation marks removed.
{
"contentType":"application/json",
"content":{
"msgType":2,
"objects":["cat","dog","bird"],
"count":3
}
}
I am still not an expert with regex. Using a regex tester, I was only able to grab the "objects" and "count" fields, and I feel I would still have to use substrings to remove the quotation marks.
Example I am using (would use a "count" to find the start of the next field and work backwards from there)
"([objects]*)"
Additionally, all the other Regex I have been looking at removes all instances of quotation marks whereas I only need a specific area trimmed. Thus, I feel that a specific regex replace would be a much more elegant solution.
If there is a better way to go about this I am happy to hear any suggestions!

Your question suggests that the built-in LabVIEW JSON tools are insufficient for your use case.
The built-in library converts LabVIEW clusters to JSON in a one-shot approach. Bundle all your data into a cluster and then convert it to JSON.
When it comes to parsing JSON, you use the path input terminal and the default type terminals to control what data is parsed from a JSON string.
If you need to handle JSON in a manner similar to say JavaScript, I would recommend something like the JSONText Toolkit which is free to use (and distribute) under the BSD licence. This allows more complex and iterative building of JSON strings from LabVIEW types and has text-path style element access along with many more features.
The Output controls from both my examples are identical - although JSONText provides a handy Pretty Print vi.

After using a regex from one of the comments, I ended up with this regex which allowed me to match the array itself.
(\[(?:"[^"]*"|[^"])+\])
I was able to split the JSON string into 'before match', 'match' and 'after match', remove the quotation mark from the end of 'before match' and from the start of 'after match', and concatenate the three strings again to form the new output.
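The split-and-concatenate steps above can also be written as a single regex replace. The following is a minimal Python sketch rather than LabVIEW code. The pattern is a variation of the one above with the surrounding quotation marks added, and the name unquote_arrays is made up for illustration. It assumes the array elements contain no escaped quotes, and it would also unquote a legitimate string value that happens to look like a bracketed list.

```python
import json
import re

# A quotation mark, then a bracketed list of quoted items (captured),
# then the closing quotation mark that should not be there.
QUOTED_ARRAY = re.compile(r'"(\[(?:"[^"]*"|[^"\]])*\])"')


def unquote_arrays(text: str) -> str:
    """Drop the quotation marks wrapped around any array-like value."""
    return QUOTED_ARRAY.sub(r"\1", text)


broken = (
    '{"contentType":"application/json","content":{"msgType":2,'
    '"objects":"["cat","dog","bird"]","count":3}}'
)
fixed = unquote_arrays(broken)

# The repaired text now parses as ordinary JSON.
parsed = json.loads(fixed)
print(parsed["content"]["objects"])
```

Fixing the step that turns the array into a string before it is embedded is still the more robust route, since the regex cannot tell a wrongly quoted array from a string that merely starts with a bracket.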

Related

Custom reason for word wrap in VS Code extension to enable working with multiline values in Json

I write JSON documents for an API that often requires multiline values, because scripts sit between the data in the attributes. I've written an extension for myself that can escape and unescape multiline values, so I can cycle between these states:
{
"value": "
multiline
value
"
}
{
"value": "multiline\n value"
}
However, in the unescaped, formatted state, the JSON is invalid, which just causes trouble. I have to switch between the escaped and unescaped states to do any JSON operation (like formatting), which I work around by replacing \n with \\n and back.
I have even considered switching to another format, but none of those I tried had a killer feature that made me switch. Among those: Jsonc (no multiline value support), XML (hard to write and read, but supports multiline values and indentation), YAML (would be an option, but does not support indentation in multiline values).
Can I force VS Code to render a specific sequence of characters as a line break (in this case, it would be \\n) without changing the document data? The intended functionality is like what the Alt+Z word wrap does, just in a different place.
After some research, I've found the following:
* it is not possible, or at least hard, to hijack the default editor and make it render lines the way I want
* even if I managed to, there might be an underlying issue with long lines not being tokenized
I have decided to define a custom language, because:
* the API has a constant JSON structure
* I can define a new grammar for the API scripts and embed it in the JSON
This approach seems to be the way to go for me, although it might be a temporary solution. I'm losing JSON validation, so I no longer get a "new line in value" error, but I also lose the errors for missing commas. I want to address this with the following precautions:
* if a JSON is valid and contains attributes of the API-defined classes, offer a button to switch to my defined language, which also un-escapes \\\n to \n
* if the language is my language, escape the newlines and try to pass the result to a JSON validation.
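The "escape the newlines, then validate" precaution can be sketched outside of VS Code. This is a rough Python illustration, not extension code: the function name escape_newlines_in_strings is hypothetical, and it assumes the only thing wrong with the unescaped document is raw line breaks inside string values.

```python
import json


def escape_newlines_in_strings(text: str) -> str:
    """Replace raw newlines inside JSON string literals with the two
    characters backslash + n, leaving newlines between tokens alone."""
    out = []
    in_string = False
    escaped = False  # True right after a backslash inside a string
    for ch in text:
        if not in_string:
            if ch == '"':
                in_string = True
            out.append(ch)
        elif escaped:
            escaped = False
            out.append(ch)
        elif ch == "\\":
            escaped = True
            out.append(ch)
        elif ch == '"':
            in_string = False
            out.append(ch)
        elif ch == "\n":
            out.append("\\n")
        else:
            out.append(ch)
    return "".join(out)


unescaped = '{\n"value": "\nmultiline\nvalue\n"\n}'
escaped_text = escape_newlines_in_strings(unescaped)

# json.loads raises json.JSONDecodeError on other problems, such as a
# missing comma, so those errors are reported again after escaping.
document = json.loads(escaped_text)
```

Tracking the in-string state character by character is what keeps the formatting newlines between tokens untouched while the ones inside values are escaped.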

What's the easiest way to convert a 'pretty printed' JSON string to a compact format representation?

This may be an odd question as it's specific to the JSON strings themselves, not the objects they represent. Given a 'pretty printed' JSON string (representing any JSON-encodable model), how would one reformat it to the 'compact' format?
My first thought was to not consider it JSON, but rather just a string, then use RegEx to remove duplicate spaces, remove newlines, etc., but that's not context aware so it risks affecting the keys and values portions of the JSON if you don't properly test that you're inside quotes.
My next thought was to try and construct an object from the JSON, but without a type to convert to, I'm not sure how to do that without manually parsing the values as 'ANY', then testing if they're an array, and recurse into it if they are, repeating the process. Then once I have the final object, serialize the result in compact form. However, that seems like a lot of overkill.
Is there an easier way to accomplish this? We're using Swift 4 if it helps.
UPDATE:
As pointed out by @Mark A. Donohoe, this removes ALL whitespace, including whitespace inside string values. So even though it looks coooooool, it's a dumb answer. Don't fall for it.
I needed the same thing and ended up creating a String extension:
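extension String {
    // WARNING: this strips every whitespace character, including the ones
    // inside string values, so it is only safe if no value contains whitespace.
    func toCompactJSON() -> String {
        return self.filter { !$0.isWhitespace && !$0.isNewline }
    }
}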
In my case, though, it was for testing purposes, and it turned out to be useless because the order in which the JavaScript objects/arrays appear is not the same as in the output generated by JSONEncoder.
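For comparison, a context-aware way to compact JSON is to parse it into generic values and serialize it again without the optional whitespace, which leaves whitespace inside strings untouched. The sketch below uses Python rather than Swift, purely as an illustration of the idea; to_compact_json is a made-up name. Note that a parse/serialize round trip may normalize number formatting and drops duplicate keys.

```python
import json


def to_compact_json(pretty: str) -> str:
    """Parse the text, then re-serialize it with no optional whitespace."""
    value = json.loads(pretty)
    # separators=(",", ":") removes the spaces json.dumps adds by default.
    return json.dumps(value, separators=(",", ":"), ensure_ascii=False)


pretty = """{
  "name": "two  spaces",
  "tags": [1, 2, 3]
}"""
compact = to_compact_json(pretty)
print(compact)
```

No target type is needed for this: the parser produces generic dictionaries, lists, strings and numbers, and the serializer walks them recursively on its own.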

Regex for matching with and without quotes for dynamic JSON

I have the following text strings:
"Name":"John"}]
"Age":36
"Address":"ABC,PQR234[]/.,#ANYCHARACTERS"
"Gender":null
I need to get two groups (key value pair) from this such that the output would be only:
Key|Value
Name|John
Age|36
Address|ABC,PQR234[]/.,#ANYCHARACTERS
The requirement is to have a single regex to grab everything in the double quotes if the double quotes are present. If not, take the value without the quotes.
In our example above, 36 and null are the ones without quotes, and they need to be captured as well.
I have tried a lot but have failed to do so.
UPDATE:
I don't know why I am getting down votes for this question. Yes this is JSON that I am trying to parse but there is a reason behind why I am doing this and not using any document parser.
I am supposed to use Talend for getting a dynamic JSON converted into Key Value Pair. What I mean by dynamic is the fields of the JSON can vary and hence I do not have a fixed schema and hence cannot use a document parser (which demands a fixed structure of JSON). I am devising a solution to get around this using Normalizer (on comma) and then extracting the key value pair which will be in double quotes using Regular Expressions. I tried many things on my own and since I am not an expert in Regular expressions, I have come here to get inputs.
If you know any better solution to this, I would be very happy to get your inputs.
How about this?
/"?([^\n"]*)"?:"?([^\n"]*)"?/
Explained in detail at:
https://regex101.com/r/UM0rl2/1/
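To see what the suggested pattern captures, here is a small Python sketch that applies it to the four sample strings from the question. The helper name extract_pair is made up for illustration. The pattern treats every quotation mark as a boundary, so it assumes that neither keys nor values contain quotes.

```python
import re

# The pattern from the answer above, without the surrounding slashes.
PAIR = re.compile(r'"?([^\n"]*)"?:"?([^\n"]*)"?')

samples = [
    '"Name":"John"}]',
    '"Age":36',
    '"Address":"ABC,PQR234[]/.,#ANYCHARACTERS"',
    '"Gender":null',
]


def extract_pair(line: str):
    """Return (key, value) for one fragment, or None if nothing matches."""
    match = PAIR.search(line)
    if match is None:
        return None
    return match.group(1), match.group(2)


pairs = [extract_pair(line) for line in samples]
for key, value in pairs:
    print(f"{key}|{value}")
```

The null value is captured as the literal text null, so dropping that row, as in the expected output table, would be a separate filtering step.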

Delimiter for multiple json strings

I'd like to save multiple json strings to a file and separate them by a delimiter, such that it will be easy to read this list in, split on the delimiter and work with each json doc separately.
Serializing using a json array is not an option due to external reasons.
I would like to use a delimiter that is illegal in JSON (e.g. delimiting using a comma would be a bad idea since there are commas within the json strings).
Are there any characters that are not considered legal in JSON serialized strings?
I know it's not exactly what you needed, but you can use this SO answer to write the json string to a CSV, then read it on the other side by using a good streaming CSV reader such as this one
NDJSON
Have a look at NDJSON (Newline delimited JSON).
http://ndjson.org/
It seems to me to be exactly how you should do things, though it's not exactly what you asked for. (If you can't flatten your JSON objects into single lines then it's not for you though!) You asked for a delimiter that is not allowed in JSON. Newline is allowed in JSON, but it is not necessary for JSON to contain newlines.
The format is used for log files amongst other things. I discovered it when looking at the Lichess API documentation.
You can start listening in to a broadcast stream of NDJSON data part way through, wait for the next newline character and then start processing objects as and when they arrive.
If you go for NDJSON, you are at least following a standard and I think you'd be hard pressed to find an alternative standard to follow.
Example NDJSON
{"some":"thing"}
{"foo":17,"bar":false,"quux":true}
{"may":{"include":"nested","objects":["and","arrays"]}}
An old question, but hopefully this answer will be useful.
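Writing and reading the format described above can be sketched in a few lines of Python. The function names dumps_ndjson and loads_ndjson are made up for illustration. The sketch relies on the fact that Python's json.dumps, when called without an indent, escapes newlines inside strings and therefore always produces a single line per document.

```python
import json


def dumps_ndjson(docs) -> str:
    """Serialize each document on its own line, terminated by a newline."""
    return "".join(json.dumps(doc, separators=(",", ":")) + "\n" for doc in docs)


def loads_ndjson(text: str) -> list:
    """Parse one JSON document per non-empty line."""
    return [json.loads(line) for line in text.split("\n") if line.strip()]


docs = [
    {"some": "thing"},
    {"foo": 17, "bar": False, "quux": True},
    {"note": "line one\nline two"},  # newline inside a string value
]
text = dumps_ndjson(docs)
restored = loads_ndjson(text)
```

Splitting on "\n" only, rather than on every kind of line boundary, keeps the reader from breaking a document that contains other separator characters inside a string.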
Most JSON readers crash on this character: U+001E, which is information separator two (the ASCII record separator). They declare it "unexpected token", so I guess it has to be wrapped to pass or something.
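That behaviour is what makes a control character usable as a delimiter. The sketch below uses U+001E (information separator two, also known as the record separator) and Python's json module. The names join_docs and split_docs are hypothetical. It assumes the writer escapes control characters inside strings, which Python's json.dumps does, so the raw character can only ever appear between documents.

```python
import json

RS = "\x1e"  # U+001E, information separator two / record separator


def join_docs(docs) -> str:
    """Serialize the documents and put one RS character between them."""
    return RS.join(json.dumps(doc) for doc in docs)


def split_docs(blob: str) -> list:
    """Split on RS and parse every non-empty piece."""
    return [json.loads(part) for part in blob.split(RS) if part]


# A string value containing RS is written in escaped form (\u001e),
# so it cannot be confused with the delimiter.
docs = [{"a": 1, "text": "contains, commas"}, {"b": "x\x1ey"}]
blob = join_docs(docs)
restored = split_docs(blob)


def parses(text: str) -> bool:
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True
```

A document that was produced by a serializer that does not escape control characters would break this scheme, so it is worth checking the writer's behaviour first.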

JSON - not all fields quoted in Dojo dijit.Tree sample code in book

O'Reilly book "Dojo - The Definitive Guide" page 378 shows the following sample Tree structure which is supposedly JSON. It seems to work in building the Dijit Tree structure.
{
identifier: 'name',
label:'name',
items: [
{
name: "Programming Languages",
children: [
etc...
Should the words identifier, label, items, name, and children be enclosed in quotes?
I'm writing a Python program to generate syntax that is compatible with their desired tree structure. Just to test my output, I tried:
testDict = "xxxx" where xxxx is the supposed JSON string above.
It always gives an error that 'identifier' is not defined.
So I'm curious if this was a typo - or if there are some new keywords or features of JSON that I need to learn.
Thanks,
Neal Walters
JSON doesn't really have any additional features. That's the beauty of it :)
You don't have to wrap those names in quotes. The names before the colon are supposed to be quoted, strictly according to the JSON spec. Why? Mostly (only?) because JavaScript gets upset when reserved words are used as object properties -- for example, if you had properties called 'function' or 'return'. Quoting these names consistently avoids this problem. Dojo doesn't care. It just uses eval to parse the JSON, and as long as you avoid keywords, it won't enforce the use of quotes. You can use quotes consistently if you like to be JSON compliant.
I'm not sure exactly what problem you had with your testDict example. I don't fully understand the context (what is testDict, what language are you using to set up that string, how is it used, etc.) Perhaps you needed to escape something in the JSON such as nested double quotes?
These are not new keywords or features of JSON but they are how dojo expects a JSON file to be structured. You should wrap them in quotes. Here's an example from dojocampus.
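Since the asker is generating the structure from Python, the difference between the book's notation and strict JSON can be checked with the standard json module. This is a minimal sketch: the children list is left empty as a placeholder because the book's entries are elided above, and is_strict_json is a made-up helper name.

```python
import json

# Build the tree as ordinary Python data. The children are a placeholder.
tree = {
    "identifier": "name",
    "label": "name",
    "items": [
        {"name": "Programming Languages", "children": []},
    ],
}

# json.dumps always double-quotes keys and strings, as the JSON spec requires.
strict_text = json.dumps(tree, indent=2)

# The book's style: unquoted keys and single-quoted strings. This is a valid
# JavaScript object literal, but not strict JSON.
loose_text = "{ identifier: 'name', label: 'name', items: [] }"


def is_strict_json(text: str) -> bool:
    """True if a strict JSON parser accepts the text."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_strict_json(strict_text), is_strict_json(loose_text))
```

Output generated this way is accepted both by strict parsers and by an eval-based loader, which is why quoting every name consistently is the safer default.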