Golang json.Unmarshal invalid character '\n' in string literal

Golang json.Unmarshal throws an error for a newline character (Go Playground).
How do I unmarshal the data if the string contains a newline?

Simply escaping the newline character should do the trick:
var val []byte = []byte(`"{\"channel\":\"buupr\\niya\",\"name\":\"john\", \"msg\":\"doe\"}"`)
The output for the above:
{"channel":"buupr\niya","name":"john", "msg":"doe"}
Since you're passing a raw string literal here, the JSON has to be representable in string form. JSON does not allow a literal newline inside a string, so the newline must appear as the escape sequence \n, and because the inner document is itself wrapped in an outer JSON string, that backslash has to be doubled to \\n.
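For completeness, here is a minimal runnable sketch of the full round trip, assuming you ultimately want the inner object decoded into a struct (the message type below is illustrative, not part of the original answer). The outer value is a JSON-encoded string, so it takes two Unmarshal passes:

package main

import (
    "encoding/json"
    "fmt"
)

type message struct {
    Channel string `json:"channel"`
    Name    string `json:"name"`
    Msg     string `json:"msg"`
}

func main() {
    // \\n in the raw literal survives the first decode as \n,
    // which the second decode then turns into a real newline.
    val := []byte(`"{\"channel\":\"buupr\\niya\",\"name\":\"john\", \"msg\":\"doe\"}"`)

    var inner string
    if err := json.Unmarshal(val, &inner); err != nil { // first pass: outer JSON string
        panic(err)
    }

    var m message
    if err := json.Unmarshal([]byte(inner), &m); err != nil { // second pass: inner object
        panic(err)
    }
    fmt.Printf("%q\n", m.Channel) // "buupr\niya"
}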

Related

Parsing JSON in Delphi with slashes

I receive JSON strings from a WebService with slashes in them.
When I parse them, Delphi adds backslashes.
V := TJSONObject.ParseJSONValue( '{"Kode":"ABC/123" }') ;
V.ToString returns : {"Kode":"ABC\\/123"}
What is best practice to parse JSON strings in Delphi?
And why does the function V.Value always return an empty string?
In the unit System.JSON, I see the following code:
function TJSONAncestor.Value: string;
begin
  Result := '';
end;
Do I need to add code myself to the System units that ship with Delphi?
\ is required to be escaped as \\, so the claim that "just removing the backslash in the result does not work, because the value could also be ABC\123" is wrong: that value would have to appear as "ABC\\123" instead.
The only other characters required to be escaped are " (as \") and control characters. / is certainly not required to be escaped, but it MAY be escaped at the discretion of the implementation; it is not invalid to do so. As to WHY ToString() does that, you would have to ask Embarcadero.
If you needed to, you could easily do a search-and-replace of \/ into / without affecting other escape sequences, like \\ and \". Or, try using ToJSON() instead of ToString().
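As an added illustration (in Go, the language of the other questions on this page, rather than Delphi), an escaped and an unescaped slash decode to exactly the same value, which is why the extra \ is harmless:

package main

import (
    "encoding/json"
    "fmt"
)

func main() {
    var a, b string
    // "\/" is an optional escape; "/" is the same character unescaped.
    if err := json.Unmarshal([]byte(`"ABC\/123"`), &a); err != nil {
        panic(err)
    }
    if err := json.Unmarshal([]byte(`"ABC/123"`), &b); err != nil {
        panic(err)
    }
    fmt.Println(a == b, a) // true ABC/123
}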
Regarding TJSONAncestor.Value(), TJSONAncestor is a base class, and its Value() is a virtual method that returns a blank string by default. Descendant classes (TJSONString, TJSONNumber, etc.) can override Value() to return more meaningful strings. But in your example, your input string represents a JSON object, so V will be pointing at a TJSONObject instance, and TJSONObject does not override Value() because it does not make sense for an object to be represented by a string. This is even documented behavior:
http://docwiki.embarcadero.com/Libraries/en/System.JSON.TJSONAncestor.Value
Returns the string representation of a simple JSON element like a string, number or boolean.
For structured JSON elements like an object and array returns empty string.

Dart decode escaped JSON

I'm trying to deserialize a simple JSON object from an API.
The API returns the JSON in escaped quotes since the data represents a user's quote.
test("escaped quoted json test", () {
var s = '''{"quote": "\"a quote from a user\""}''';
var b = json.decode(s);
expect(b["quote"], "\"a quote from a user\"");
});
However this throws:
FormatException: Unexpected character (at character 13)
{"quote": ""a quote from a user""}
Btw the JSON is valid:
{"quote": "\"a quote from a user\""}
How do I tell Dart to handle this correctly?
Thanks in advance.
The inner quotes need to reach the decoder preceded by an actual backslash, so in the Dart source you have to write the backslash itself escaped (\\") or use a raw string:
var s = '''{"quote": "\\"a quote from a user\\""}''';
or
var s = r'''{"quote": "\"a quote from a user\""}''';
There is a difference between JSON written in Dart source code and JSON received over the network.
If you put it in source code, as in your question, the string literal is interpreted by the compiler. You can either adapt the string (double every \) or prefix it with r to make it a raw string.
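The same source-versus-wire distinction exists in Go, the language of the other questions on this page. A small added sketch (not from the original answer): the raw backtick literal plays the role of Dart's r'''...''' string, and in an interpreted literal every backslash has to be doubled.

package main

import (
    "encoding/json"
    "fmt"
)

func main() {
    // Raw string literal: the compiler leaves \" alone, so the decoder sees
    // the backslashes exactly as it would in data read from the network.
    raw := `{"quote": "\"a quote from a user\""}`

    // Interpreted string literal: each \" must be written as \\\" so the
    // backslash survives compilation.
    interpreted := "{\"quote\": \"\\\"a quote from a user\\\"\"}"

    var a, b map[string]string
    if err := json.Unmarshal([]byte(raw), &a); err != nil {
        panic(err)
    }
    if err := json.Unmarshal([]byte(interpreted), &b); err != nil {
        panic(err)
    }
    fmt.Println(a["quote"])               // "a quote from a user"
    fmt.Println(a["quote"] == b["quote"]) // true
}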

Go json.Unmarshal key with \u0000 \x00

Here is the Go playground link.
Basically there are some special characters ('\u0000') in my JSON string key:
var j = []byte(`{"Page":1,"Fruits":["5","6"],"\u0000*\u0000_errorMessages":{"x":"123"},"*_successMessages":{"ok":"hi"}}`)
I want to Unmarshal it into a struct:
type Response1 struct {
    Page   int
    Fruits []string
    Msg    interface{} `json:"*_errorMessages"`
    Msg1   interface{} `json:"\\u0000*\\u0000_errorMessages"`
    Msg2   interface{} `json:"\u0000*\u0000_errorMessages"`
    Msg3   interface{} `json:"\0*\0_errorMessages"`
    Msg4   interface{} `json:"\\0*\\0_errorMessages"`
    Msg5   interface{} `json:"\x00*\x00_errorMessages"`
    Msg6   interface{} `json:"\\x00*\\x00_errorMessages"`
    SMsg   interface{} `json:"*_successMessages"`
}
I tried a lot of variants but none of them work.
This link might help: golang.org/src/encoding/json/encode_test.go.
Short answer: With the current json implementation it is not possible using only struct tags.
Note: it's an implementation restriction, not a specification restriction; it comes from the encoding/json package implementation, not from the struct tag syntax itself.
Some background: you specified your tags with a raw string literal:
The value of a raw string literal is the string composed of the uninterpreted (implicitly UTF-8-encoded) characters between the quotes...
So the compiler performs no unescaping or unquoting on the content of a raw string literal.
The convention for struct tag values quoted from reflect.StructTag:
By convention, tag strings are a concatenation of optionally space-separated key:"value" pairs. Each key is a non-empty string consisting of non-control characters other than space (U+0020 ' '), quote (U+0022 '"'), and colon (U+003A ':'). Each value is quoted using U+0022 '"' characters and Go string literal syntax.
What this means is that by convention tag values are a list of key:"value" pairs separated by spaces. There are quite a few restrictions for keys, but values may be anything, and values (should) use "Go string literal syntax", which means that these values will be unquoted at runtime (by a call to strconv.Unquote(), called from StructTag.Get(), in source file reflect/type.go, currently line #809).
So there is no need to double the backslashes in the tag. See this simplified version of your example:
type Response1 struct {
    Page   int
    Fruits []string
    Msg    interface{} `json:"\u0000_abc"`
}
Now the following code:
t := reflect.TypeOf(Response1{})
fmt.Printf("%#v\n", t.Field(2).Tag)
fmt.Printf("%#v\n", t.Field(2).Tag.Get("json"))
Prints:
"json:\"\\u0000_abc\""
"\x00_abc"
As you can see, the value part for the json key is "\x00_abc" so it properly contains the zero character.
But how will the json package use this?
The json package uses the value returned by StructTag.Get() (from the reflect package), exactly as we did above. You can see it in the json/encode.go source file, typeFields() function, currently line #1032. So far so good.
Then it calls the unexported json.parseTag() function, in the json/tags.go source file, currently line #17. This cuts off the part after the comma (which becomes the "tag options").
And finally the json.isValidTag() function is called with the remaining value, in source file json/encode.go, currently line #731. This function checks the runes of the passed string, and (besides a set of pre-defined allowed characters "!#$%&()*+-./:<=>?@[]^_{|}~ ") rejects everything that is not a Unicode letter or digit (as defined by unicode.IsLetter() and unicode.IsDigit()):
if !unicode.IsLetter(c) && !unicode.IsDigit(c) {
    return false
}
'\u0000' is not part of the pre-defined allowed characters, and as you can guess now, it is neither a letter nor a digit:
// Following code prints "INVALID":
c := '\u0000'
if !unicode.IsLetter(c) && !unicode.IsDigit(c) {
fmt.Println("INVALID")
}
And since isValidTag() returns false, the name (which is the value for the json key, without the "tag options" part) will be discarded (name = "") and not used, so no match will be found for the struct field containing a Unicode zero.
As an alternative solution, use a map, a custom json.Unmarshaler, or json.RawMessage.
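Here is a minimal sketch of the json.Unmarshaler route (the type and key names come from the question; the surrounding logic is only an illustration, not the only way to do it):

package main

import (
    "encoding/json"
    "fmt"
)

type Response1 struct {
    Page   int
    Fruits []string
    Msg    interface{}
}

// UnmarshalJSON decodes into a generic map first, then picks out the
// key containing the \u0000 characters by hand.
func (r *Response1) UnmarshalJSON(data []byte) error {
    var raw map[string]json.RawMessage
    if err := json.Unmarshal(data, &raw); err != nil {
        return err
    }
    if v, ok := raw["Page"]; ok {
        if err := json.Unmarshal(v, &r.Page); err != nil {
            return err
        }
    }
    if v, ok := raw["Fruits"]; ok {
        if err := json.Unmarshal(v, &r.Fruits); err != nil {
            return err
        }
    }
    if v, ok := raw["\x00*\x00_errorMessages"]; ok { // the "ugly" key, spelled with \x00 in Go
        if err := json.Unmarshal(v, &r.Msg); err != nil {
            return err
        }
    }
    return nil
}

func main() {
    j := []byte(`{"Page":1,"Fruits":["5","6"],"\u0000*\u0000_errorMessages":{"x":"123"}}`)
    var r Response1
    if err := json.Unmarshal(j, &r); err != nil {
        panic(err)
    }
    fmt.Println(r.Page, r.Fruits, r.Msg) // 1 [5 6] map[x:123]
}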
But I would highly discourage using such ugly json keys. I understand you are likely just trying to parse a json response whose format is out of your control, but you should still push back against keys like these, as they will just cause more problems later on (e.g. if stored in a db, it will be very hard to spot by inspecting records that there are '\u0000' characters in them, as they may be displayed as nothing).
You cannot do it that way; see http://golang.org/ref/spec#Struct_types.
But you can unmarshal into a map[string]interface{} and then check the field names of that object with a regexp, as sketched below.
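A minimal sketch of that suggestion (the regexp pattern below is an assumption about which field you want; adjust it to your keys):

package main

import (
    "encoding/json"
    "fmt"
    "regexp"
)

func main() {
    j := []byte(`{"Page":1,"Fruits":["5","6"],"\u0000*\u0000_errorMessages":{"x":"123"},"*_successMessages":{"ok":"hi"}}`)

    var m map[string]interface{}
    if err := json.Unmarshal(j, &m); err != nil {
        panic(err)
    }

    // Match the error-messages key regardless of the \u0000 prefix characters.
    re := regexp.MustCompile(`_errorMessages$`)
    for k, v := range m {
        if re.MatchString(k) {
            fmt.Printf("%q -> %v\n", k, v) // "\x00*\x00_errorMessages" -> map[x:123]
        }
    }
}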
I don't think this is possible with struct tags. The best thing you can do is unmarshal it into map[string]interface{} and then get the values manually:
var b = []byte(`{"\u0000abc":42}`)
var m map[string]interface{}
err := json.Unmarshal(b, &m)
if err != nil {
    panic(err)
}
fmt.Println(m, m["\x00abc"])
Playground: http://play.golang.org/p/RtS7Nst0d7.

Importing JSON into R with in-line quotation marks

I'm attempting to read the following JSON file ("my_file.json") into R; the file contains:
[{"id":"484","comment":"They call me "Bruce""}]
Using the jsonlite package (0.9.12), the following fails:
library(jsonlite)
fromJSON(readLines('~/my_file.json'))
receiving an error:
"Error in parseJSON(txt) : lexical error: invalid char in json text.
84","comment":"They call me "Bruce""}]
(right here) ------^"
Here is how R displays the file contents (with R's own escaping):
readLines('~/my_file.json')
"[{\"id\":\"484\",\"comment\":\"They call me \"Bruce\"\"}]"
Removing the quotes around "Bruce" solves the problem, as in:
my_file.json
[{"id":"484","comment":"They call me Bruce"}]
But what is the issue with the escaping?
In R, string literals can be defined using single or double quotes.
e.g.
s1 <- 'hello'
s2 <- "world"
Of course, if you want to include double quotes inside a string literal defined using double quotes, you need to escape the inner quotes (using a backslash), otherwise the R code parser won't be able to detect the end of the string correctly (the same holds for single quotes).
e.g.
s1 <- "Hello, my name is \"John\""
If you print this string to the console (using cat¹), or write this string to a file, you will get the actual "face" of the string, not the R literal representation, that is:
> cat("Hello, my name is \"John\"")
Hello, my name is "John"
The JSON parser reads the actual "face" of the string, so in your case it reads:
[{"id":"484","comment":"They call me "Bruce""}]
not (the R literal representation):
"[{\"id\":\"484\",\"comment\":\"They call me \"Bruce\"\"}]"
That being said, the JSON parser also requires double quotes inside strings to be escaped.
Hence, your string should be modified in this way:
[{"id":"484","comment":"They call me \"Bruce\""}]
If you simply modify your file by adding the backslashes, you will be able to read the JSON without problems.
Note that the corresponding R literal representation of that string would be:
"[{\"id\":\"484\",\"comment\":\"They call me \\\"Bruce\\\"\"}]"
In fact, this works:
> fromJSON("[{\"id\":\"484\",\"comment\":\"They call me \\\"Bruce\\\"\"}]")
id comment
1 484 They call me "Bruce"
¹ The default R print function (also invoked when you simply press ENTER on a value) displays the corresponding R string literal. If you want to print the actual string, you need to use print(quote=F, stringToPrint) or the cat function.
EDIT (on @EngrStudent's comment about the possibility of automating quote escaping):
A JSON parser cannot do quote escaping automatically.
I mean, try to put yourself in the computer's shoes and imagine you have to parse this (unescaped) string as JSON: { "foo1" : " : "foo2" : "foo3" }
I see at least three possible escapings that give valid JSON:
{ "foo1" : " : \"foo2\" : \"foo3" }
{ "foo1\" : " : "foo2\" : \"foo3" }
{ "foo1\" : \" : \"foo2" : "foo3" }
As you can see from this small example, escaping is really necessary to avoid ambiguities.
Maybe, if the string you want to escape has a very particular structure in which you can recognize (without ambiguity) the double quotes that need escaping, you could create your own automatic escaping procedure, but you would need to start from scratch, because there's nothing built-in.

JSON.parse file input differ from parsing string literal

I'm using Node.js to parse some JSON files and insert them into MongoDB. The JSON in these files has invalid JSON characters like \n, \", etc.
The thing that I don't understand is that if I try to parse like this:
console.log(JSON.parse('{"foo":"bar\n"}'))
I get:
undefined:1
{"foo":"bar
but if I try to parse the input from the file (the file contains the same string, {"foo":"bar\n"}), like this:
new lazy(fs.createReadStream("info.json"))
  .lines
  .forEach(function(line){
    var line = line.toString();
    console.log(JSON.parse(line));
  }
);
everything works fine. I want to know whether this is fine and it's OK to parse the files I have, or whether I should replace all the invalid JSON characters before parsing the files,
and why there is a difference between the two.
Thanks
If you can read "\n" in your text file, then it's not an end of line but the \ character followed by an n.
\n in a JavaScript string literal adds an actual end of line, and literal end-of-line characters are forbidden in JSON strings.
See json.org:
To put an end of line in a JSON string, you must escape it, which means you must escape the \ in the JavaScript string so that there's "\n" in the string received by JSON.parse:
console.log(JSON.parse('{"foo":"bar\\n"}'))
This would produce an object whose foo property value contains an end of line (Node displays it as { foo: 'bar\n' }).
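The same distinction applies in Go, the language of the first question on this page. A small added sketch (json.Valid is used here only as a quick check):

package main

import (
    "encoding/json"
    "fmt"
)

func main() {
    // Bytes as they would arrive from a file: a backslash followed by an n,
    // which is a valid JSON escape sequence.
    fromFile := []byte(`{"foo":"bar\n"}`)

    // An interpreted source literal: a real newline ends up inside the string,
    // which is invalid JSON.
    fromLiteral := []byte("{\"foo\":\"bar\n\"}")

    fmt.Println(json.Valid(fromFile))    // true
    fmt.Println(json.Valid(fromLiteral)) // false
}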