Can you use a trailing comma in a JSON object? - json

When manually generating a JSON object or array, it's often easier to leave a trailing comma on the last item in the object or array. For example, code to output from an array of strings might look like (in a C++ like pseudocode):
s.append("[");
for (i = 0; i < 5; ++i) {
s.appendF("\"%d\",", i);
}
s.append("]");
giving you a string like
["0","1","2","3","4",]
Is this allowed?

Unfortunately the JSON specification does not allow a trailing comma. There are a few browsers that will allow it, but generally you need to worry about all browsers.
In general I try to turn the problem around, and add the comma before the actual value, so you end up with code that looks like this:
s.append("[");
for (i = 0; i < 5; ++i) {
if (i) s.append(","); // add the comma only if this isn't the first entry
s.appendF("\"%d\"", i);
}
s.append("]");
That extra one line of code in your for loop is hardly expensive...
Another alternative I've used when outputting a structure to JSON from a dictionary of some form is to always append a comma after each entry (as you are doing above) and then add a dummy entry at the end that has no trailing comma (but that is just lazy ;->).
Doesn't work well with an array unfortunately.

No. The JSON spec, as maintained at http://json.org, does not allow trailing commas. From what I've seen, some parsers may silently allow them when reading a JSON string, while others will throw errors. For interoperability, you shouldn't include it.
The code above could be restructured, either to remove the trailing comma when adding the array terminator or to add the comma before items, skipping that for the first one.
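As a quick check of the interoperability point, here is a minimal sketch in Python (chosen only for illustration; it is not the asker's language). It assumes Python's standard json module, which rejects a trailing comma when reading and never emits one when writing. The helper name is_valid_json is made up for this example.

```python
import json

def is_valid_json(text):
    """Return True if the strict parser in the standard library accepts text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print(is_valid_json('["0","1","2","3","4",]'))  # trailing comma: rejected
print(is_valid_json('["0","1","2","3","4"]'))   # no trailing comma: accepted

# Letting a serializer build the string sidesteps the comma question entirely.
array_text = json.dumps([str(i) for i in range(5)], separators=(",", ":"))
print(array_text)
```

Other parsers may be more lenient, which is exactly why output that relies on the trailing comma is not portable.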

Simple, cheap, easy to read, and always works regardless of the specs.
$delimiter = '';
for .... {
print $delimiter.$whatever
$delimiter = ',';
}
The redundant assignment to $delimiter is a very small price to pay.
Also works just as well if there is no explicit loop but separate code fragments.
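The same delimiter trick can be sketched in Python (an illustrative translation, not part of the answer above). The function name join_with_delimiter is hypothetical; json.dumps is used only to quote and escape each item.

```python
import json

def join_with_delimiter(items):
    """Build a JSON array of strings from any iterable, without knowing its length."""
    parts = ["["]
    delimiter = ""                 # empty before the first item
    for item in items:
        parts.append(delimiter)
        parts.append(json.dumps(str(item)))
        delimiter = ","            # every later item is preceded by a comma
    parts.append("]")
    return "".join(parts)

print(join_with_delimiter(i for i in range(5)))
print(join_with_delimiter([]))
```

Because the comma is written before each item rather than after it, the approach also works for generators and streams where the last element cannot be detected in advance.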

Trailing commas are allowed in JavaScript, but don't work in IE. Douglas Crockford's versionless JSON spec didn't allow them, and because it was versionless this wasn't supposed to change. The ES5 JSON spec allowed them as an extension, but Crockford's RFC 4627 didn't, and ES5 reverted to disallowing them. Firefox followed suit. Internet Explorer is why we can't have nice things.

As has already been said, the JSON spec (based on ECMAScript 3) doesn't allow a trailing comma. ES >= 5 allows it, so you can actually use that notation in pure JS. It's been argued about, and some parsers did support it (http://bolinfest.com/essays/json.html, http://whereswalden.com/2010/09/08/spidermonkey-json-change-trailing-commas-no-longer-accepted/), but the spec (as shown on http://json.org/) is clear that it shouldn't work in JSON. That said...
... I'm wondering why no one pointed out that you can split off the 0th iteration from the loop and use a leading comma instead of a trailing one. This gets rid of the comparison code smell and any performance overhead in the loop, resulting in code that's shorter, simpler and faster (no branching/conditionals in the loop) than the other solutions proposed.
E.g. (in a C-style pseudocode similar to OP's proposed code):
s.append("[");
// MAX == 5 here. if it's constant, you can inline it below and get rid of the comparison
if ( MAX > 0 ) {
s.appendF("\"%d\"", 0); // 0-th iteration
for( int i = 1; i < MAX; ++i ) {
s.appendF(",\"%d\"", i); // i-th iteration
}
}
s.append("]");

PHP coders may want to check out implode(). It takes an array and joins its elements using a separator string.
From the docs...
$array = array('lastname', 'email', 'phone');
echo implode(",", $array); // lastname,email,phone
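For readers outside PHP, most languages have an equivalent join operation. A minimal Python sketch (an illustration only, using the same sample values as the PHP docs example above):

```python
import json

fields = ["lastname", "email", "phone"]

# str.join places the separator only between items, so no trailing comma appears.
print(",".join(fields))

# Quoting each item with json.dumps first yields a valid JSON array.
array_text = "[" + ",".join(json.dumps(field) for field in fields) + "]"
print(array_text)
```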

Interestingly, both C & C++ (and I think C#, but I'm not sure) specifically allow the trailing comma -- for exactly the reason given: it makes programmatically generating lists much easier. Not sure why JavaScript didn't follow their lead.

Rather than engage in a debating club, I would adhere to the principle of Defensive Programming by combining both simple techniques in order to simplify interfacing with others:
As a developer of an app that receives json data, I'd be relaxed and allow the trailing comma.
When developing an app that writes json, I'd be strict and use one of the clever techniques of the other answers to only add commas between items and avoid the trailing comma.
There are bigger problems to be solved...

Use JSON5. Don't use JSON.
Objects and arrays can have trailing commas
Object keys can be unquoted if they're valid identifiers
Strings can be single-quoted
Strings can be split across multiple lines
Numbers can be hexadecimal (base 16)
Numbers can begin or end with a (leading or trailing) decimal point.
Numbers can include Infinity and -Infinity.
Numbers can begin with an explicit plus (+) sign.
Both inline (single-line) and block (multi-line) comments are allowed.
http://json5.org/
https://github.com/aseemk/json5

No. The "railroad diagrams" in https://json.org are an exact translation of the spec and make it clear a , always comes before a value, never directly before ] or }.

There is a possible way to avoid an if-branch in the loop.
s.append("[ "); // there is a space after the left bracket
for (i = 0; i < 5; ++i) {
s.appendF("\"%d\",", i); // always add comma
}
s.back() = ']'; // modify last comma (or the space) to right bracket

According to the Class JSONArray specification:
An extra , (comma) may appear just before the closing bracket.
The null value will be inserted when there is , (comma) elision.
So, as I understand it, it should be allowed to write:
[0,1,2,3,4,5,]
But it could happen that some parsers will return 7 as the item count (like IE8, as Daniel Earwicker pointed out) instead of the expected 6.
Edited:
I found this JSON Validator that validates a JSON string against RFC 4627 (The application/json media type for JavaScript Object Notation) and against the JavaScript language specification. Actually here an array with a trailing comma is considered valid just for JavaScript and not for the RFC 4627 specification.
However, in the RFC 4627 specification is stated that:
2.3. Arrays
An array structure is represented as square brackets surrounding zero
or more values (or elements). Elements are separated by commas.
array = begin-array [ value *( value-separator value ) ] end-array
To me this is again an interpretation problem. If you write that Elements are separated by commas (without stating something about special cases, like the last element), it could be understood in both ways.
P.S. RFC 4627 isn't a standard (as explicitly stated), and has already been obsoleted by RFC 7159 (which is a proposed standard).
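One way to see how the ABNF rule quoted above reads is to implement it literally. The following is a toy Python sketch, restricted to arrays of integers; the function name parse_int_array and the crude tokenizer are assumptions made for this example, and int() merely stands in for the full value rule.

```python
def parse_int_array(text):
    """Follow: array = begin-array [ value *( value-separator value ) ] end-array."""
    tokens = text.replace("[", " [ ").replace("]", " ] ").replace(",", " , ").split()
    if not tokens or tokens[0] != "[":
        raise ValueError("expected '['")
    pos, values = 1, []
    if pos < len(tokens) and tokens[pos] != "]":
        # First value of the optional group.
        values.append(int(tokens[pos]))
        pos += 1
        # *( value-separator value ): every comma must be followed by a value.
        while pos < len(tokens) and tokens[pos] == ",":
            pos += 1
            if pos >= len(tokens):
                raise ValueError("unexpected end after ','")
            values.append(int(tokens[pos]))  # int("]") raises ValueError
            pos += 1
    if pos != len(tokens) - 1 or tokens[pos] != "]":
        raise ValueError("expected ']' at the end")
    return values

print(parse_int_array("[0,1,2,3,4,5]"))
try:
    parse_int_array("[0,1,2,3,4,5,]")
except ValueError as error:
    print("rejected:", error)
```

Read this way, the grammar leaves no room for a separator that is not followed by a value, whatever the prose sentence about commas might suggest.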

It is not recommended, but you can still do something like this to parse it.
jsonStr = '[0,1,2,3,4,5,]';
let data;
eval('data = ' + jsonStr);
console.log(data)

With Relaxed JSON, you can have trailing commas, or just leave the commas out. They are optional.
There is no reason at all commas need to be present to parse a JSON-like document.
Take a look at the Relaxed JSON spec and you will see how 'noisy' the original JSON spec is. Way too many commas and quotes...
http://www.relaxedjson.org
You can also try out your example using this online RJSON parser and see it get parsed correctly.
http://www.relaxedjson.org/docs/converter.html?source=%5B0%2C1%2C2%2C3%2C4%2C5%2C%5D

As stated it is not allowed. But in JavaScript this is:
var a = Array()
for(let i=1; i<=5; i++) {
a.push(i)
}
var s = "[" + a.join(",") + "]"
(works fine in Firefox, Chrome, Edge, IE11, and without the let in IE9, 8, 7, 5)

From my past experience, I found that different browsers deal with trailing commas in JSON differently.
Both Firefox and Chrome handle it just fine. But IE (all versions) seems to break. I mean really break and stop reading the rest of the script.
Keeping that in mind, and also the fact that it's always nice to write compliant code, I suggest spending the extra effort of making sure that there's no trailing comma.
:)

I keep a current count and compare it to a total count. If the current count is less than the total count, I display the comma.
May not work if you don't have a total count prior to executing the JSON generation.
Then again, if you're using PHP 5.2.0 or better, you can just format your response using the built-in JSON API.

Since a for-loop is used to iterate over an array, or similar iterable data structure, we can use the length of the array as shown,
awk -v header="FirstName,LastName,DOB" '
BEGIN {
FS = ",";
print("[");
columns = split(header, column_names, ",");
}
{ print(" {");
for (i = 1; i < columns; i++) {
printf(" \"%s\":\"%s\",\n", column_names[i], $(i));
}
printf(" \"%s\":\"%s\"\n", column_names[i], $(i));
print(" }");
}
END { print("]"); } ' datafile.txt
With datafile.txt containing,
Angela,Baker,2010-05-23
Betty,Crockett,1990-12-07
David,Done,2003-10-31

String l = "[" + List<int>.generate(5, (i) => i + 1).join(",") + "]";

Using a trailing comma is not allowed for json. A solution I like, which you could do if you're not writing for an external recipient but for your own project, is to just strip (or replace by whitespace) the trailing comma on the receiving end before feeding it to the json parser. I do this for the trailing comma in the outermost json object. The convenient thing is then if you add an object at the end, you don't have to add a comma to the now second last object. This also makes for cleaner diffs if your config file is in a version control system, since it will only show the lines of the stuff you actually added.
char* str = readFile("myConfig.json");
char* chr = strrchr(str, '}') - 1;
int i = 0;
while( chr[i] == ' ' || chr[i] == '\n' ){
i--;
}
if( chr[i] == ',' ) chr[i] = ' ';
JsonParser parser;
parser.parse(str);
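The same receiving-end cleanup can be sketched in Python. This is an illustration under the answer's own assumption that only the comma before the outermost closing brace needs handling; the function name loads_allowing_final_comma is hypothetical.

```python
import json
import re

def loads_allowing_final_comma(text):
    """Parse JSON, tolerating one comma right before the outermost closing brace."""
    # Replace a comma followed only by whitespace, the final '}', and optional
    # trailing whitespace with a space. Commas anywhere else are left untouched.
    cleaned = re.sub(r",(\s*\}\s*)\Z", r" \1", text)
    return json.loads(cleaned)

config_text = '{\n  "name": "demo",\n  "level": 3,\n}\n'
print(loads_allowing_final_comma(config_text))
```

Trailing commas inside nested objects or arrays are still rejected by the strict parser, which keeps the relaxation narrow.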

I usually loop over the array and attach a comma after every entry in the string. After the loop I delete the last comma again.
Maybe not the best way, but less expensive than checking every time if it's the last object in the loop I guess.

Related

How to parse a text file to csv file using Perl

I am learning Perl and would like to parse a text file to csv file using Perl. I have a loop that generates the following text file:
//This part is what outputs on the text file
for $row(@$data) {
while(my($key,$value) = each(%$row)) {
print "${key}=${value}, ";
}
print "\n";
}
Text File Output:
name=Mary, id=231, age=38, weight=130, height=5.05, speed=26.233, time=30,
time=25, name=Jose, age=30, id=638, weight=150, height=6.05, speed=20.233,
age=40, weight=130, name=Mark, id=369, speed=40.555, height=5.07, time=30
CSV File Desired Output:
name,age,weight,height,speed,time
Mary,38,130,5.05,26.233,30,
Jose,30,150,6.05,20.233,25,
Mark,40,130,5.04,40.555,30
Any good feedback is welcome!
The key part here is how to manipulate your data so as to extract what needs to be printed for each line. Then you are best off using a module to produce valid CSV, and Text::CSV is very good.
A program using an array of small hashrefs, mimicking data in the question
use strict;
use warnings;
use feature 'say';
use Text::CSV;
my @data = (
{ name => 'A', age => 1, weight => 10 },
{ name => 'B', age => 2, weight => 20 },
);
my $csv = Text::CSV->new({ binary => 1, auto_diag => 2 });
my $outfile = 'test.csv';
open my $ofh, '>', $outfile or die "Can't open $outfile: $!";
# Header, also used below for order of values for fields
my @hdr = qw(name age weight);
$csv->say($ofh, \@hdr);
foreach my $href (@data) {
$csv->say($ofh, [ @{$href}{@hdr} ]);
}
The values from hashrefs are extracted in the desired order using a hashref slice @{$href}{@hdr}, which in general is
@{ expression returning hash reference } { list of keys }
This returns a list of values for the given list of keys, from the hashref that the expression in the block {} must return. That list is then used to build an arrayref (an anonymous array here, using []), which the module's say method needs in order to make and print a string of comma-separated values† from that list of values.
Note a block that evaluates to a hash reference, used instead of a hash name that is used for a slice of a hash. This is a general rule that
Anywhere you'd put an identifier (or chain of identifiers) as part of a variable or subroutine name, you can replace the identifier with a BLOCK returning a reference of the correct type.
Some further comments
Look over the supported constructor's attributes; there are many goodies
For very simple data you can simply join fields with a comma and print
say $ofh join ',', @{$href}{@hdr};
But it is far safer to use a module to construct a valid CSV record. With the right choice of attributes in the constructor it can handle whatever is legal to embed in fields (some of which can take quite a bit of work to do correctly by hand) and it calls out things which aren't.
I list column names explicitly. Instead, you can fetch the keys and then sort in a desired order, but this will again need a hard-coded list for sorting
The program creates the file test.csv and prints to it the expected header and data lines.
† But separating those "values" with commas may involve a whole lot more than merely what the acronym for the "CSV format" stands for. A variety of things may come between those commas, including commas, newlines, and whatnot. This is why one is best advised to always use a library. Seeing constructor's options is informative.
The following commentary referred to the initial question. In the meanwhile the problems this addresses were corrected in OP's code and the question updated. I'm still leaving this text for some general comments that can be useful.
As for the code in the question and its output, there is almost certainly an issue with how the data is processed to produce @data, judging by the presence of keys HASH(address) in the output.
That string HASH(0x...) is output when one prints a variable which is a hash reference (which cannot show any of the hash's content). Perl handles such a print by stringifying the reference in that way (producing a printable string out of something which is more complex).
There is no good reason to have a hash reference for a hash key. So I'd suggest that you review your data and its processing and see how that comes about. (Or briefly show this, or post another question with it if it isn't feasible to add that to this one.)
One measure you can use to bypass that is to only use a list of keys that you know are valid, like I show above; however, then you may be leaving some outright error unhandled. So I'd rather suggest to find what is wrong.
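For comparison with the Text::CSV approach, the same two steps (pick values in a fixed column order, then let a library write the record) can be sketched in Python with the standard csv module. This is an illustration only: the function name text_to_csv is hypothetical, and the naive split on commas assumes that no value in the key=value text contains a comma or an equals sign.

```python
import csv
import io

def text_to_csv(lines, columns):
    """Turn 'key=value, key=value, ...' lines into CSV text with a fixed column order."""
    out = io.StringIO()
    # extrasaction="ignore" drops keys (such as id) that are not listed in columns.
    writer = csv.DictWriter(out, fieldnames=columns, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for line in lines:
        pairs = (pair.strip() for pair in line.split(","))
        row = dict(pair.split("=", 1) for pair in pairs if pair)
        writer.writerow(row)
    return out.getvalue()

sample = ["name=Mary, id=231, age=38, weight=130, height=5.05, speed=26.233, time=30,"]
print(text_to_csv(sample, ["name", "age", "weight", "height", "speed", "time"]))
```

As in the Perl version, the explicit column list fixes the output order regardless of the order in which keys appear on each input line.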

Are both comma and colon redundant for JSON parser?

The JSON I mention below is valid JSON.
I finished writing a JSON parser which allows only two basic data types, String and Object. Let me show what the parser does, in case of any ambiguity.
parse("{ "Mon": "weekday", "Tue": "weekday", "Sun": "weekend" }").get("Sun");//return value: "weekend"
parse("{ "weekday" : { "Mon": "1", "Tue": "2"} }").get("weekday").get("Mon");//return value: "1"
Function parse returns a dictionary from which we can get what we want.
I found that I didn't use any commas or colons to parse JSON, then I guess those notations may be also redundant for a full-data-type-supported JSON parser, is that true? If it is, they are for readability, right?
PS: what if it's invalid JSON? Same answer?
According to RFC 8259 (The JavaScript Object Notation (JSON) Data Interchange Format), the colon and comma are listed as name-separator and value-separator respectively.
See under section 2. JSON Grammar:
These are the six structural characters:
begin-array = ws %x5B ws ; [ left square bracket
begin-object = ws %x7B ws ; { left curly bracket
end-array = ws %x5D ws ; ] right square bracket
end-object = ws %x7D ws ; } right curly bracket
name-separator = ws %x3A ws ; : colon
value-separator = ws %x2C ws ; , comma
So, they are both valid JSON separators with specific uses.
Refer section 9. Parsers:
A JSON parser transforms a JSON text into another representation. A
JSON parser MUST accept all texts that conform to the JSON grammar.
A JSON parser MAY accept non-JSON forms or extensions.
An implementation may set limits on the size of texts that it
accepts. An implementation may set limits on the maximum depth of
nesting. An implementation may set limits on the range and precision
of numbers. An implementation may set limits on the length and
character contents of strings.
The Parsers section makes no mention of skipping (ignoring) the colon and/or comma; a parser that did so would be accepting non-JSON forms rather than following the JSON grammar.
Summing up the above sections, any decision to ignore these separators is the parser's own choice, and text that omits them does not conform to the grammar.
So, to answer the question: the colon and comma are not redundant; they are an essential part of the JSON grammar.
Hope that helps!
JSON is a subset of JavaScript syntax. It's a very small subset, and so not all of the punctuation is necessary. But it is necessary in full expression syntax, because in many cases you cannot know where one expression in a list ends and the next one starts, unless there is a comma between them.
(There are alternatives to commas, of course. Lisp S-expressions don't need commas, as Ira Baxter points out, but they use more parentheses, which many people find noisier than commas.)
So as long as you consider it important to be able to insert JSON into a JavaScript text, you need to keep the JavaScript form, commas and colons and all.
One important aspect of JSON is that correct JSON is safe. You cannot insert untested JSON into an executable string, of course. That would be insane. But a JSON parser should validate its input, and validated JSON is safe to then inject into code. If your parser lets you leave out the commas, that would no longer be the case.

Repair Bad Json with Unescaped Quote in Field Name

Kissmetrics exports apparently produce invalid json when there is a quote in the field name, for example, the following is one of the events produced:
{
"ab test group native dialogs on mobile":"Control",
"ab test group "interested" button copy":"Interested",
"_t":1412633724,
"_p":"hk5yxuxcqe/935mkbj+pz8xi0a8="
}
(Newlines were added to clarify the issue, we can't use those to repair the JSON).
I am looking for a mechanism for repairing such broken JSON.
There are some assumptions I believe we can take advantage of:
We can assume that the JSON being produced is flat (no nested objects or arrays), so I think we can take advantage of that.
I believe all fields are strings, except for _t, but not 100% sure.
I don't think we can assume the bad unescaped quotes will be balanced.
I believe KM removes commas and colons from field names, but not 100% sure -- they are not removed from values (though I believe values to be properly encoded).
Solution I am using now, in python, which I'm sure is imperfect:
match = regex.match(r'^{("(?P<fieldName>([^:]*))":(?P<fieldValue>([0-9]*\.?[0-9]+)|("(([^"])|(\\"))*"))(,|}))*$', s)
fieldNames = match.captures('fieldName')
fieldValues = match.captures('fieldValue')
newJson = "{%s}" % (
",".join(
"\"%s\":%s" % (
fieldName.replace("\"", "\\\""),
fieldValue,
)
for fieldName, fieldValue
in zip(fieldNames, fieldValues)
)
)
This assumes there are no colons in the keys.
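The snippet above relies on the third-party regex module for match.captures. A variant using only the standard library might look like the sketch below. It keeps the same assumptions (flat object, no colons in keys, values are numbers or correctly escaped strings) and adds one more: quotes inside keys are never already escaped. The names MEMBER and repair_flat_json are made up for this example.

```python
import json
import re

# One member: "key":value followed by , or }. The key may contain stray quotes
# but no colon; the value is a number or a correctly escaped JSON string.
MEMBER = re.compile(
    r'"(?P<key>[^:]*?)":(?P<value>-?\d+(?:\.\d+)?|"(?:[^"\\]|\\.)*")(?P<end>[,}])'
)

def repair_flat_json(text):
    """Escape unescaped quotes in the keys of a flat, non-empty JSON object."""
    if not text.startswith("{"):
        raise ValueError("expected a flat JSON object")
    pos, members = 1, []
    while True:
        match = MEMBER.match(text, pos)   # anchored at pos, so nothing is skipped
        if match is None:
            raise ValueError(f"cannot repair input at offset {pos}")
        key = match.group("key").replace('"', '\\"')
        members.append(f'"{key}":{match.group("value")}')
        pos = match.end()
        if match.group("end") == "}":
            break
    if pos != len(text):
        raise ValueError("unexpected text after the closing brace")
    return "{" + ",".join(members) + "}"

broken = ('{"ab test group native dialogs on mobile":"Control",'
          '"ab test group "interested" button copy":"Interested",'
          '"_t":1412633724,"_p":"hk5yxuxcqe/935mkbj+pz8xi0a8="}')
print(json.loads(repair_flat_json(broken)))
```

Matching one member at a time from a known offset means malformed input raises an error instead of being silently skipped.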

Are whitespace characters insignificant in JSON?

Are blank characters like spaces, tabs and carriage returns ignored in json strings?
For example, is {"a":"b"} equal to {"a" : "b"}?
Yes, blanks outside a double-quoted string literal are ignored in the syntax. Specifically, the ws production in the JSON grammar in RFC 4627 shows:
Insignificant whitespace is allowed before or after any of the six
structural characters.
ws = *(
%x20 / ; Space
%x09 / ; Horizontal tab
%x0A / ; Line feed or New line
%x0D ; Carriage return
)
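A short Python check of the ws rule (a sketch using the standard json module; any conforming parser should behave the same way):

```python
import json

compact = json.loads('{"a":"b"}')
spaced = json.loads(' {\t"a" :\r\n "b" } ')

# Whitespace around the structural characters does not change the parsed value.
print(compact == spaced)

# Whitespace inside a string literal is part of the string and does matter.
padded = json.loads('{"a":" b"}')
print(compact == padded)
```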
In standard JSON, whitespace outside of string literals is ignored, as has been said.
However, since your question is tagged C#, I should note that there's at least one other case in C#/.NET where whitespace in JSON does matter.
The DataContractJsonSerializer uses a special __type property to support deserializing to the correct subclass. This property is required to be the first property in an object, and to have no whitespace between the property name and the preceding {. See this previous thread:
DataContractJsonSerializer doesn't work with formatted JSON?
At least, I have tested that the no-whitespace requirement is true as of .NET 4. Perhaps this will be changed in a future version to bring it more into line with the JSON standard?

Do the JSON keys have to be surrounded by quotes?

Example:
Is the following code valid against the JSON Spec?
{
precision: "zip"
}
Or should I always use the following syntax? (And if so, why?)
{
"precision": "zip"
}
I haven't really found something about this in the JSON specifications. Although they use quotes around their keys in their examples.
Yes, you need quotation marks. This keeps the format simpler and avoids the need for another escape method for JavaScript reserved keywords, e.g. {for:"foo"}.
You are correct to use strings as the key. Here is an excerpt from RFC 4627 - The application/json Media Type for JavaScript Object Notation (JSON)
2.2. Objects
An object structure is represented as a pair of curly brackets
surrounding zero or more name/value pairs (or members). A name is a
string. A single colon comes after each name, separating the name
from the value. A single comma separates a value from a following
name. The names within an object SHOULD be unique.
object = begin-object [ member *( value-separator member ) ] end-object
member = string name-separator value
[...]
2.5. Strings
The representation of strings is similar to conventions used in the C
family of programming languages. A string begins and ends with
quotation marks. [...]
string = quotation-mark *char quotation-mark
quotation-mark = %x22 ; "
Read the whole RFC here.
From 2.2. Objects
An object structure is represented as a pair of curly brackets surrounding zero or more name/value pairs (or members). A name is a string.
and from 2.5. Strings
A string begins and ends with quotation marks.
So I would say that according to the standard: yes, you should always quote the key (although some parsers may be more forgiving)
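A strict parser makes the difference easy to observe. A minimal Python sketch (the helper name parses is hypothetical; json is the standard library module):

```python
import json

def parses(text):
    """Return True if the strict standard-library parser accepts text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print(parses('{precision: "zip"}'))     # unquoted key: rejected
print(parses('{"precision": "zip"}'))   # quoted key: accepted
```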
Yes, quotes are mandatory. http://json.org/ says:
string
""
" chars "
Not if you use JSON5
For regular JSON, yes, keys must be quoted. But if you need otherwise, check out the widely used JSON5, which is so named because it is a superset of JSON that allows ES5 syntax, including:
unquoted property keys
single-quoted, escaped and multi-line strings
alternate number formats
comments
extra whitespace
The JSON5 reference implementation (json5 npm package) provides a JSON5 object that has parse and stringify methods with the same args and semantics as the built-in JSON object.
widely used, and depended on by many high profile projects
JSON5 was started in 2012, and as of 2022, now gets >65M downloads/week, ranks in the top 0.1% of the most depended-upon packages on npm, and has been adopted by major projects like Chromium, Next.js, Babel, Retool, WebStorm, and more. It's also natively supported on Apple platforms like MacOS and iOS.
~ json5.org homepage
In your situation, both of them are valid, meaning that both of them will work.
However, you still should use the one with quotation marks in the key names because it is more conventional, which leads to more simplicity and ability to have key names with white spaces etc.
Therefore, use the one with the quotation marks.
edit// check this: What is the difference between JSON and Object Literal Notation?
Since you can put "parent.child" dotted notation and you don't have to put parent["child"] which is also valid and useful, I'd say both ways is technically acceptable. The parsers all should do both ways just fine. If your parser does not need quotes on keys then it's probably better not to put them (saves space). It makes sense to call them strings because that is what they are, and since the square brackets gives you the ability to use values for keys essentially it makes perfect sense not to.
In Json you can put...
>var keyName = "someKey";
>var obj = {[keyName]:"someValue"};
>obj
Object {someKey: "someValue"}
just fine without issues, if you need a value for a key and none quoted won't work, so if it doesn't, you can't, so you won't so "you don't need quotes on keys". Even if it's right to say they are technically strings. Logic and usage argue otherwise. Nor does it officially output Object {"someKey": "someValue"} for obj in our example run from the console of any browser.