While trying to scrape a webpage, I ran into the need to work with ASP.NET's __VIEWSTATE variables. So, ever the optimist, I decided to read up on those variables and their formats. Even though ASP.NET is classified as Open Source by Microsoft, I couldn't find any formal definition:
Everybody agrees the first step is to decode the string using a Base64 decoder. Great - that works...
Next - and this is where the confusion sets in:
Roughly 3/4 of the decoders seem to use binary values (characters whose values indicate the type of the field that follows). Here's an example of such a specification. This format also seems to expect a 'signature' of 0xFF 0x01 as the first two bytes.
The rest of the articles (such as this one) describe a format where the fields are separated (or marked) by t< ... >, p< ... >, etc. (this seems to be the case for the page I'm interested in).
Even after looking at over a hundred pages, I didn't find any mention of the existence of two formats.
My questions are: Are there two different formats of __VIEWSTATE variables in use, or am I missing something basic? Is there any formal description of the __VIEWSTATE contents somewhere?
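For concreteness, here is a minimal sketch of the first decoding steps (assuming Node.js; the viewState value and the console messages are purely illustrative):

// Minimal sketch, assuming Node.js: Base64-decode the hidden field and look
// for the 0xFF 0x01 'signature' that the binary-format articles describe.
const viewState = "/wEPDwUKLTEyMzQ1Njc4OTBkZA==";   // illustrative __VIEWSTATE value
const bytes = Buffer.from(viewState, "base64");      // step 1: Base64 decode

if (bytes[0] === 0xFF && bytes[1] === 0x01) {
  console.log("Starts with 0xFF 0x01 - looks like the binary layout");
} else {
  console.log("No binary signature - possibly the t< ... > text layout");
}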
The view state is serialized and deserialized by the
System.Web.UI.LosFormatter class—the LOS stands for limited object
serialization—and is designed to efficiently serialize certain types
of objects into a base-64 encoded string. The LosFormatter can
serialize any type of object that can be serialized by the
BinaryFormatter class, but is built to efficiently serialize objects
of the following types:
Strings
Integers
Booleans
Arrays
ArrayLists
Hashtables
Pairs
Triplets
Everything you need to know about ViewState: Understanding View State
Is this valid JSON?
{
"a" : "x",
"a" : "y"
}
http://jsonlint.com/ says yes.
http://www.json.org/ doesn't say anything about it being forbidden.
But obviously it doesn't make much sense, does it?
Most implementations probably use a hashtable, so it is overridden anyway.
The short answer: yes, but it is not recommended.
The long answer: It depends on what you call valid...
ECMA-404 "The JSON Data Interchange Syntax" doesn't say anything about duplicated names (keys).
However, RFC 8259 "The JavaScript Object Notation (JSON) Data Interchange Format" says:
The names within an object SHOULD be unique.
In this context SHOULD must be understood as specified in BCP 14:
SHOULD This word, or the adjective "RECOMMENDED", mean that there
may exist valid reasons in particular circumstances to ignore a
particular item, but the full implications must be understood and
carefully weighed before choosing a different course.
RFC 8259 explains why unique names (keys) are good:
An object whose names are all unique is interoperable in the sense
that all software implementations receiving that object will agree on
the name-value mappings. When the names within an object are not
unique, the behavior of software that receives such an object is
unpredictable. Many implementations report the last name/value pair
only. Other implementations report an error or fail to parse the
object, and some implementations report all of the name/value pairs,
including duplicates.
Also, as Serguei pointed out in the comments, ECMA-262 "ECMAScript® Language Specification" reads:
In the case where there are duplicate name Strings within an object, lexically preceding values for the same key shall be overwritten.
In other words, last-value-wins.
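A quick sketch of that behavior in any modern JavaScript engine:

// JSON.parse follows ECMA-262 here: the lexically later value wins.
const obj = JSON.parse('{ "a": "x", "a": "y" }');
console.log(obj);    // { a: 'y' }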
Trying to parse a string with duplicated names with the Java implementation by Douglas Crockford (the creator of JSON) results in an exception:
org.json.JSONException: Duplicate key "status" at
org.json.JSONObject.putOnce(JSONObject.java:1076)
From the standard (p. ii):
It is expected that other standards will refer to this one, strictly adhering to the JSON text format, while
imposing restrictions on various encoding details. Such standards may require specific behaviours. JSON
itself specifies no behaviour.
Further down in the standard (p. 2), the specification for a JSON object:
An object structure is represented as a pair of curly bracket tokens surrounding zero or more name/value pairs.
A name is a string. A single colon token follows each name, separating the name from the value. A single
comma token separates a value from a following name.
It does not make any mention of duplicate keys being invalid or valid, so according to the specification I would safely assume that means they are allowed.
That most implementations of JSON libraries do not accept duplicate keys does not conflict with the standard, because of the first quote.
Here are two examples related to the C++ standard library. When deserializing some JSON object into a std::map it would make sense to refuse duplicate keys. But when deserializing some JSON object into a std::multimap it would make sense to accept duplicate keys as normal.
There are 2 documents specifying the JSON format:
http://json.org/
https://www.rfc-editor.org/rfc/rfc7159
The accepted answer quotes from the 1st document. I think the 1st document is clearer, but the 2nd contains more detail.
The 2nd document says:
Objects
An object structure is represented as a pair of curly brackets
surrounding zero or more name/value pairs (or members). A name is a
string. A single colon comes after each name, separating the name
from the value. A single comma separates a value from a following
name. The names within an object SHOULD be unique.
So it is not forbidden to have a duplicate name, but it is discouraged.
I came across a similar question when dealing with an API that accepts both XML and JSON, but doesn't document how it would handle what you'd expect to be duplicate keys in the JSON accepted.
The following is a valid XML representation of your sample JSON:
<object>
<a>x</a>
<a>y</a>
</object>
When this is converted into JSON, you get the following:
{
"object": {
"a": [
"x",
"y"
]
}
}
A natural mapping from a language that handles what you might call duplicate keys to another can serve as a potential best-practice reference here.
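As a rough sketch of that mapping (plain JavaScript; the groupPairs helper and its input are purely illustrative, not a real library API), repeated names simply collect into an array:

// Group a list of [name, value] pairs - as an XML-ish document might yield -
// into an object, promoting repeated names to arrays.
// (Naive sketch: assumes the values themselves aren't arrays.)
function groupPairs(pairs) {
  const obj = {};
  for (const [name, value] of pairs) {
    if (!(name in obj)) {
      obj[name] = value;                   // first occurrence: plain value
    } else if (Array.isArray(obj[name])) {
      obj[name].push(value);               // third or later occurrence: append
    } else {
      obj[name] = [obj[name], value];      // second occurrence: promote to array
    }
  }
  return obj;
}

console.log(JSON.stringify({ object: groupPairs([["a", "x"], ["a", "y"]]) }));
// {"object":{"a":["x","y"]}}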
Hope that helps someone!
The JSON spec says this:
An object is an unordered set of name/value pairs.
The important part here is "unordered": it implies uniqueness of keys, because the only thing you can use to refer to a specific pair is its key.
In addition, most JSON libs will deserialize JSON objects to hash maps/dictionaries, where keys are guaranteed unique. What happens when you deserialize a JSON object with duplicate keys depends on the library: in most cases, you'll either get an error, or only the last value for each duplicate key will be taken into account.
For example, in Python, json.loads('{"a": 1, "a": 2}') returns {"a": 2}.
Posting an answer because there are a lot of outdated ideas and confusion about the standards. As of December 2017, there are two competing standards:
RFC 8259 - https://www.rfc-editor.org/rfc/rfc8259
ECMA-404 - http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf
json.org suggests ECMA-404 is the standard, but this site does not appear to be an authority. While I think it's fair to consider ECMA the authority, what's important here is that the only difference between the standards (regarding unique keys) is that RFC 8259 says the keys SHOULD be unique, while ECMA-404 says they are not required to be unique.
RFC-8259:
"The names within an object SHOULD be unique."
The word "should" in all caps like that, has a meaning within the RFC world, that is specifically defined in another standard (BCP 14, RFC 2119 - https://www.rfc-editor.org/rfc/rfc2119) as,
SHOULD This word, or the adjective "RECOMMENDED", mean that
there may exist valid reasons in particular circumstances to ignore
a particular item, but the full implications must be understood and
carefully weighed before choosing a different course.
ECMA-404:
"The JSON syntax does not impose any restrictions on the strings used
as names, does not require that name strings be unique, and does not
assign any significance to the ordering of name/value pairs."
So, no matter how you slice it, it's syntactically valid JSON.
The reason given for the unique key recommendation in RFC 8259 is,
An object whose names are all unique is interoperable in the sense
that all software implementations receiving that object will agree on
the name-value mappings. When the names within an object are not
unique, the behavior of software that receives such an object is
unpredictable. Many implementations report the last name/value pair
only. Other implementations report an error or fail to parse the
object, and some implementations report all of the name/value pairs,
including duplicates.
In other words, from the RFC 8259 viewpoint, it's valid but your parser may barf and there's no promise as to which, if any, value will be paired with that key. From the ECMA-404 viewpoint (which I'd personally take as the authority), it's valid, period. To me this means that any parser that refuses to parse it is broken. It should at least parse according to both of these standards. But how it gets turned into your native object of choice is, in any case, unique keys or not, completely dependent on the environment and the situation, and none of that is in the standard to begin with.
SHOULD be unique does not mean MUST be unique. However, as stated, some parsers would fail and others would just use the last value parsed. However, if the spec was cleaned up a little to allow for duplicates then I could see a use where you may have an event handler which is transforming the JSON to HTML or some other format... In such cases it would be perfectly valid to parse the JSON and create another document format...
{
    "div":
    {
        "p": "hello",
        "p": "universe"
    },
    "div":
    {
        "h1": "Heading 1",
        "p": "another paragraph"
    }
}
could then easily be parsed to HTML, for example:
<body>
<div>
<p>hello</p>
<p>universe</p>
</div>
<div>
<h1>Heading 1</h1>
<p>another paragraph</p>
</div>
</body>
I can see the reasoning behind the question but as it stands... I wouldn't trust it.
It's not defined in the ECMA JSON standard. And generally speaking, a lack of definition in a standard means, "Don't count on this working the same way everywhere."
If you're a gambler, "many" JSON engines will allow duplication and simply use the last-specified value. This:
var o = {"a": 1, "b": 2, "a": 3}
Becomes this:
Object {a: 3, b: 2}
But if you're not a gambler, don't count on it!
Asking about the purpose, there are different answers:
When using JSON to serialize objects (JavaScript Object Notation), each dictionary element maps to an individual object property, so different entries defining a value for the same property have no meaning.
However, I came across the same question from a very specific use case:
Writing JSON samples for API testing, I was wondering how to add comments into our JSON file without breaking the usability. The JSON spec does not allow comments, so I came up with a very simple approach:
Use duplicate keys to comment our JSON samples.
Example:
{
"property1" : "value1", "REMARK" : "... prop1 controls ...",
"property2" : "value2", "REMARK" : "... value2 raises an exception ...",
}
The JSON serializers which we are using have no problems with these "REMARK" duplicates and our application code simply ignores this little overhead.
So, even though there is no meaning on the application layer, these duplicates for us provide a valuable workaround to add comments to our testing samples without breaking the usability of the JSON.
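For illustration (plain JSON.parse, nothing library-specific), the duplicates simply collapse to the last "REMARK" per object, and dropping the key afterwards is trivial:

// Only the last "REMARK" at each level survives parsing; the app just drops it.
const sample =
  '{ "property1": "value1", "REMARK": "... prop1 controls ...",' +
  '  "property2": "value2", "REMARK": "... value2 raises an exception ..." }';
const parsed = JSON.parse(sample);
delete parsed.REMARK;                 // application code simply ignores the marker key
console.log(parsed);                  // { property1: 'value1', property2: 'value2' }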
The standard does say this:
Programming languages vary widely on whether they support objects, and
if so, what characteristics and constraints the objects offer. The
models of object systems can be wildly divergent and are continuing to
evolve. JSON instead provides a simple notation for expressing
collections of name/value pairs. Most programming languages will have
some feature for representing such collections, which can go by names
like record, struct, dict, map, hash, or object.
The bug is in node.js at least. This code succeeds in node.js.
try {
var json = {"name":"n","name":"v"};
console.log(json); // outputs { name: 'v' }
} catch (e) {
console.log(e);
}
RFC-7159, the current standard for JSON published by the Internet Engineering Task Force (IETF), states that "The names within an object SHOULD be unique". However, according to RFC-2119, which defines the terminology used in IETF documents, the word "should" in fact means "... there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course." What this essentially means is that while having unique keys is recommended, it is not a must. We can have duplicate keys in a JSON object, and it would still be valid.
In practice, I have seen that the value of the last key is used when duplicate keys are found in a JSON object.
In C# if you deserialise to a Dictionary<string, string> it takes the last key value pair:
string json = @"{""a"": ""x"", ""a"": ""y""}";
var d = JsonConvert.DeserializeObject<Dictionary<string, string>>(json);
// { "a" : "y" }
If you try to deserialise to
class Foo
{
[JsonProperty("a")]
public string Bar { get; set; }
[JsonProperty("a")]
public string Baz { get; set; }
}
var f = JsonConvert.DeserializeObject<Foo>(json);
you get a Newtonsoft.Json.JsonSerializationException exception.
Are the following valid JSON texts, or must their top-level value be an array or object?
4.0
"Hello World"
true
Related questions in the past, such as What is the minimum valid JSON?, and Is this simple string considered valid JSON? have concluded that they are not. This was based on the description of the JSON format in RFC-4627, which states that:
2. JSON Grammar
A JSON text is a sequence of tokens. The set of tokens includes six
structural characters, strings, numbers, and three literal names.
A JSON text is a serialized object or array.
JSON-text = object / array
These are the six structural characters:
[...]
However, the RFC-4627 status declares that it "does not specify an Internet standard of any kind". Instead, the official standard for JSON is the recently-published ECMA-404. Unlike RFC-4627, ECMA-404's description of valid JSON text does not include any requirement that it be an object or an array. For example, the section most similar to the quote above is missing that requirement:
4 JSON Text
A JSON text is a sequence of tokens formed from Unicode code points that conforms to the JSON value
grammar. The set of tokens includes six structural tokens, strings, numbers, and three literal name tokens.
The six structural tokens:
[...]
Given this new specification, are encoded non-array non-object top-level values considered valid JSON texts?
Douglas Crockford posted a comment on this Google+ post which helped me start to clarify things:
JSON is just a grammar, and the grammar includes numbers and strings. Uses of JSON must necessarily be more restrictive. RFC-4627 is one possible use, and was never intended to be the standard for JSON itself.
We cannot say that non-array non-object JSON texts are generally invalid, just that it is not valid to use them with internet media type application/json, per RFC-4627.
Representations of non-object non-array values are valid JSON texts per ECMA-404, which is the only currently published standard that might be identified as "the JSON specification".
However, it turns out that the IETF is likely to soon publish a replacement for RFC-4627, which will also be a specification of JSON. Its latest draft still includes the restriction on JSON texts, but also mentions that JSON has been specified in several places and that these specifications vary slightly. The draft specifically mentions that the definition of JSON in ECMA-262 (the ECMAScript/JavaScript specification) does not share the top-level value restriction.
Therefore, the question of whether non-object non-arrays are valid JSON texts must be disambiguated:
Is "hello" a valid JSON text as specified in RFC-4627 and its successor?
No.
Is "hello" a valid JSON text as specified by ECMA-404 and ECMA-262?
Yes.
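For what it's worth, JSON.parse in current JavaScript engines follows the ECMA-262/ECMA-404 view, so all three of the original examples parse fine:

// Sketch: modern JSON.parse accepts any JSON value at the top level.
console.log(JSON.parse('4.0'));            // 4
console.log(JSON.parse('"Hello World"'));  // Hello World
console.log(JSON.parse('true'));           // true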
Have a node.js app that is receiving JSON data strings that contain the literal NaN, like
"[1, 2, 3, NaN, 5, 6]"
This crashes JSON.parse(...) in Node.js. I'd like to parse it, if I can, into an object.
I know NaN is not part of the JSON spec. Most SO links (sending NaN in json) suggest fixing the output.
Here, though, the data is produced by a server I don't control; it comes from a commercial Java library whose source code I can see, and it's produced with Google's Gson library:
private Gson gson = (new GsonBuilder().serializeSpecialFloatingPointValues().create());
...
gson.toJson(data[i], Vector.class, jsonOut)
So that seems like a legitimate source. And according to the Gson API Javadoc, I should be able to parse it:
Section 2.4 of JSON specification disallows special double values
(NaN, Infinity, -Infinity). However, Javascript specification (see
section 4.3.20, 4.3.22, 4.3.23) allows these values as valid
Javascript values. Moreover, most JavaScript engines will accept these
special values in JSON without problem. So, at a practical level, it
makes sense to accept these values as valid JSON even though JSON
specification disallows them.
Despite that, this fails in both Node.js and Chrome: JSON.parse('[1,2,3,NaN,"5"]')
Is there a flag to set in JSON.parse()? Or an alternative parser that accepts NaN as a literal?
I've been Googling for a while but can't seem to find a doc on this issue.
PHP: How to encode infinity or NaN numbers to JSON?
Have a node.js app that is receiving JSON data strings that contain the literal NaN, like
Then your NodeJS app isn't receiving JSON, it's receiving text that's vaguely JSON-like. NaN is not a valid JSON token.
Three options:
1. Get the source to correctly produce JSON
This is obviously the preferred course. The data is not JSON, that should be fixed, which would fix your problem.
2. Tolerate the NaN in a simple-minded way:
You could replace it with null before parsing it, e.g.:
var result = JSON.parse(yourString.replace(/\bNaN\b/g, "null"));
...and then handle nulls in the result. But that's very simple-minded, it doesn't allow for the possibility that the characters NaN might appear in a string somewhere.
Alternately, spinning Matt Ball's reviver idea (now deleted), you could change it to a special string (like "***NaN***") and then use a reviver to replace that with the real NaN:
var result = JSON.parse(yourString.replace(/\bNaN\b/g, '"***NaN***"'), function(key, value) {
return value === "***NaN***" ? NaN : value;
});
...but that has the same issue of being a bit simple-minded, assuming the characters NaN never appear in an appropriate place.
3. Use (shudder!) eval
If you know and trust the source of this data and there's NO possibility of it being tampered with in transit, then you could use eval to parse it instead of JSON.parse. Since eval allows full JavaScript syntax, including NaN, that works. Hopefully I made the caveat bold enough for people to understand that I would only recommend this in a very, very, very tiny percentage of situations. But again, remember eval allows arbitrary execution of code, so if there's any possibility of the string having been tampered with, don't use it.
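For completeness, option 3 really is a one-liner; a sketch (with the same trust caveat, and the input string is just a stand-in):

// DANGER: eval executes arbitrary code. Only for fully trusted, untampered input.
const yourString = '[1, 2, 3, NaN, "5"]';     // stand-in for the trusted payload
const result = eval('(' + yourString + ')');  // parentheses force expression parsing
console.log(result);                          // [ 1, 2, 3, NaN, '5' ]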
When you deal with just about anything mathematical or with industry data, NaN is terribly convenient (and often infinities are too). And it has been an industry standard since IEEE 754.
That's obviously why some libraries, notably GSON, let you include them in the JSON they produce, losing standard purity and gaining sanity.
Reviver and regex solutions aren't reliably usable in a real project when you exchange complex dynamic objects.
And eval has problems too, one of them being the fact it's prone to crash on IE when the JSON string is big, another one being security risks.
That's why I wrote a specific parser (used in production): JSON.parseMore
You can use JSON5 library. A quote from the project page:
The JSON5 Data Interchange Format (JSON5) is a superset of JSON that aims to alleviate some of the limitations of JSON by expanding its syntax to include some productions from ECMAScript 5.1.
This JavaScript library is the official reference implementation for JSON5 parsing and serialization libraries.
As you would expect, among other things it does support parsing NaNs (compatible with how Python and the like serialize them):
JSON5.parse("[1, 2, 3, NaN, 5, 6]")
> (6) [1, 2, 3, NaN, 5, 6]
The correct solution is to recompile the parser, and contribute an "allowNan" boolean flag to the source base. This is the solution other libraries have (python's comes to mind).
Good JSON libraries will permissively parse just about anything vaguely resembling JSON with the right flags set (perl's JSON.pm is notably flexible)... but when writing a message they produce standard JSON.
I.e., leave the room cleaner than you found it.
Just a minor addition to TJ Crowder's already comprehensive reply: I'd rather use
var result = JSON.parse(yourString.replace(/\bNaN\b/g, '"NaN"'));
because I actually need to know if it's a NaN value.
Also, I'd do this inside a fetch or axios GET request, only if the default JSON parsing failed and the data came back as a string.
const StringConstructor = "".constructor;
// If the response still came back as a raw string, fix up NaN and parse it.
if (data.constructor === StringConstructor) {
    data = JSON.parse(data.replace(/\bNaN\b/g, '"NaN"'));
}
I have a string which gets serialized to JSON in Javascript, and then deserialized to Java.
It looks like if the string contains a degree symbol, then I get a problem.
I could use some help in figuring out who to blame:
is it the Spidermonkey 1.8 implementation? (this has a JSON implementation built-in)
is it Google gson?
is it me for not doing something properly?
Here's what happens in JSDB:
js>s='15\u00f8C'
15°C
js>JSON.stringify(s)
"15°C"
I would have expected "15\u00f8C", which leads me to believe that Spidermonkey's JSON implementation isn't doing the right thing... except that the JSON homepage's syntax description (is that the spec?) says that a char can be
any-Unicode-character-
except-"-or-\-or-
control-character"
so maybe it passes the string along as-is without encoding it as \u00f8... in which case I would think the problem is with the gson library.
Can anyone help?
I suppose my workaround is to use either a different JSON library, or manually escape strings myself after calling JSON.stringify() -- but if this is a bug then I'd like to file a bug report.
This is not a bug in either implementation. There is no requirement to escape U+00B0. To quote the RFC:
2.5. Strings
The representation of strings is similar to conventions used in the C family of programming languages. A string begins and ends with quotation marks. All Unicode characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F).
Any character may be escaped.
Escaping everything inflates the size of the data (all code points can be represented in four or fewer bytes in all Unicode transformation formats, whereas escaping them all makes them six or twelve bytes).
It is more likely that you have a text transcoding bug somewhere in your code and escaping everything in the ASCII subset masks the problem. It is a requirement of the JSON spec that all data use a Unicode encoding.
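As a small sketch of why the unescaped output is not itself wrong (plain JavaScript, any engine): the escaped and literal spellings decode to the same string, so consumers should treat them identically.

// "Any character may be escaped" - but it doesn't have to be.
const escaped = JSON.parse('"15\\u00f8C"');  // JSON text containing a \u00f8 escape
const literal = JSON.parse('"15\u00f8C"');   // JSON text containing the raw character
console.log(escaped === literal);            // true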
hmm, well here's a workaround anyway:
function JSON_stringify(s, emit_unicode)
{
    var json = JSON.stringify(s);
    // unless emit_unicode is set, escape every non-ASCII character as \uXXXX
    return emit_unicode ? json : json.replace(/[\u007f-\uffff]/g,
        function(c) {
            return '\\u'+('0000'+c.charCodeAt(0).toString(16)).slice(-4);
        }
    );
}
test case:
js>s='15\u00f8C 3\u0111';
15°C 3◄
js>JSON_stringify(s, true)
"15°C 3◄"
js>JSON_stringify(s, false)
"15\u00f8C 3\u0111"
This is SUPER late and probably not relevant anymore, but if anyone stumbles upon this answer, I believe I know the cause.
So the JSON encoded string is perfectly valid with the degree symbol in it, as the other answer mentions. The problem is most likely in the character encoding that you are reading/writing with. Depending on how you are using Gson, you are probably passing it a java.io.Reader instance. Any time you are creating a Reader from an InputStream, you need to specify the character encoding, or java.nio.charset.Charset instance (it's usually best to use java.nio.charset.StandardCharsets.UTF_8). If you don't specify a Charset, Java will use your platform default encoding, which on Windows is usually CP-1252.