Go json.Unmarshal key with \u0000 \x00 - json

Here is the Go playground link.
Basically there are some special characters ('\u0000') in my JSON string key:
var j = []byte(`{"Page":1,"Fruits":["5","6"],"\u0000*\u0000_errorMessages":{"x":"123"},"*_successMessages":{"ok":"hi"}}`)
I want to Unmarshal it into a struct:
type Response1 struct {
Page int
Fruits []string
Msg interface{} `json:"*_errorMessages"`
Msg1 interface{} `json:"\\u0000*\\u0000_errorMessages"`
Msg2 interface{} `json:"\u0000*\u0000_errorMessages"`
Msg3 interface{} `json:"\0*\0_errorMessages"`
Msg4 interface{} `json:"\\0*\\0_errorMessages"`
Msg5 interface{} `json:"\x00*\x00_errorMessages"`
Msg6 interface{} `json:"\\x00*\\x00_errorMessages"`
SMsg interface{} `json:"*_successMessages"`
}
I tried a lot but it's not working.
This link might help golang.org/src/encoding/json/encode_test.go.

Short answer: With the current json implementation it is not possible using only struct tags.
Note: It's an implementation restriction, not a specification restriction. (It's the restriction of the json package implementation, not the restriction of the struct tags specification.)
Some background: you specified your tags with a raw string literal:
The value of a raw string literal is the string composed of the uninterpreted (implicitly UTF-8-encoded) characters between the quotes...
So no unescaping or unquoting happens in the content of the raw string literal by the compiler.
The convention for struct tag values quoted from reflect.StructTag:
By convention, tag strings are a concatenation of optionally space-separated key:"value" pairs. Each key is a non-empty string consisting of non-control characters other than space (U+0020 ' '), quote (U+0022 '"'), and colon (U+003A ':'). Each value is quoted using U+0022 '"' characters and Go string literal syntax.
What this means is that, by convention, tag values are a list of key:"value" pairs separated by spaces. Keys are quite restricted, but values may be anything and should use "Go string literal syntax". This means the values are unquoted at runtime by a call to strconv.Unquote(), made from StructTag.Get() (in source file reflect/type.go, currently line #809).
So no need for double quoting. See your simplified example:
type Response1 struct {
Page int
Fruits []string
Msg interface{} `json:"\u0000_abc"`
}
Now the following code:
t := reflect.TypeOf(Response1{})
fmt.Printf("%#v\n", t.Field(2).Tag)
fmt.Printf("%#v\n", t.Field(2).Tag.Get("json"))
Prints:
"json:\"\\u0000_abc\""
"\x00_abc"
As you can see, the value part for the json key is "\x00_abc" so it properly contains the zero character.
But how will the json package use this?
The json package uses the value returned by StructTag.Get() (from the reflect package), exactly what we did. You can see it in the json/encode.go source file, typeFields() function, currently line #1032. So far so good.
Then it calls the unexported json.parseTag() function, in json/tags.go source file, currently line #17. This cuts the part after the comma (which becomes the "tag options").
And finally the json.isValidTag() function is called with the previous value, in source file json/encode.go, currently line #731. This function checks the runes of the passed string, and (besides a set of pre-defined allowed characters "!#$%&()*+-./:<=>?@[]^_{|}~ ") rejects everything that is not a unicode letter or digit (as defined by unicode.IsLetter() and unicode.IsDigit()):
if !unicode.IsLetter(c) && !unicode.IsDigit(c) {
return false
}
'\u0000' is not part of the pre-defined allowed characters, and as you can guess now, it is neither a letter nor a digit:
// Following code prints "INVALID":
c := '\u0000'
if !unicode.IsLetter(c) && !unicode.IsDigit(c) {
fmt.Println("INVALID")
}
And since isValidTag() returns false, the name (which is the value for the json key, without the "tag options" part) will be discarded (name = "") and not used. So no match will be found for the struct field containing a unicode zero.
For an alternative solution use a map, or a custom json.Unmarshaler or use json.RawMessage.
But I would highly discourage using such ugly json keys. I understand likely you are just trying to parse such json response and it may be out of your reach, but you should fight against using these keys as they will just cause more problems later on (e.g. if stored in db, by inspecting records it will be very hard to spot that there are '\u0000' characters in them as they may be displayed as nothing).

You cannot do it this way; see http://golang.org/ref/spec#Struct_types
But you can unmarshal into a map[string]interface{} and then check the keys of that map with a regexp.

I don't think this is possible with struct tags. The best thing you can do is unmarshal it into map[string]interface{} and then get the values manually:
var b = []byte(`{"\u0000abc":42}`)
var m map[string]interface{}
err := json.Unmarshal(b, &m)
if err != nil {
panic(err)
}
fmt.Println(m, m["\x00abc"])
Playground: http://play.golang.org/p/RtS7Nst0d7.

Related

can't read quoted field with gocsv

I have a csv response that comes from an endpoint that I don't control and I'm failing to parse its response because it has quotes. It looks something like this:
name,id,quantity,"status (active, expired)"
John,14,4,active
Bob,12,7,expired
to parse this response I have created the following struct:
type UserInfo struct {
Name string `csv:"name"`
ID string `csv:"id"`
Quantity string `csv:"quantity"`
Status string `csv:"status (active, expired)"`
}
I have tried using
Status string `csv:""status (active, expired)""`
Status string `csv:'"status (active, expired)"'`
but none seem to be helpful, I just can't access the field Status when I use gocsv.Unmarshal.
var actualResult []UserInfo
err = gocsv.Unmarshal(in, &actualResult)
for _, elem := range actualResult {
fmt.Println(elem.Status)
}
And I get nothing as a response.
https://go.dev/play/p/lje1zNO9w6E here's an example
You don't need a third-party package like gocsv (unless you have a specific use case) when it can be done easily with Go's built-in encoding/csv.
You just have to skip the first line/record, which is the CSV header in your endpoint's response.
csvReader := csv.NewReader(strings.NewReader(csvString))
records, err := csvReader.ReadAll()
if err != nil {
panic(err)
}
var users []UserInfo
// Iterate over all records excluding first one i.e., header
for _, record := range records[1:] {
users = append(users, UserInfo{Name: record[0], ID: record[1], Quantity: record[2], Status: record[3]})
}
fmt.Printf("%v", users)
// Output: [{John 14 4 active} {Bob 12 7 expired}]
Here is a working example on Go Playground based on your use case and sample string.
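If the column order cannot be relied on, the same encoding/csv approach can look columns up by header name instead of by fixed position. This is only a sketch: parseUsers is a hypothetical helper, and it assumes the header names match the ones in the question exactly.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

type UserInfo struct {
	Name, ID, Quantity, Status string
}

// parseUsers is a hypothetical helper that maps header names to column
// indexes, so the quoted header with a comma needs no special handling.
func parseUsers(input string) ([]UserInfo, error) {
	records, err := csv.NewReader(strings.NewReader(input)).ReadAll()
	if err != nil {
		return nil, err
	}
	if len(records) == 0 {
		return nil, fmt.Errorf("empty CSV input")
	}
	// encoding/csv strips the surrounding quotes, so the header cell is
	// the plain text: status (active, expired)
	col := make(map[string]int, len(records[0]))
	for i, name := range records[0] {
		col[name] = i
	}
	for _, name := range []string{"name", "id", "quantity", "status (active, expired)"} {
		if _, ok := col[name]; !ok {
			return nil, fmt.Errorf("missing column %q", name)
		}
	}
	users := make([]UserInfo, 0, len(records)-1)
	for _, rec := range records[1:] {
		users = append(users, UserInfo{
			Name:     rec[col["name"]],
			ID:       rec[col["id"]],
			Quantity: rec[col["quantity"]],
			Status:   rec[col["status (active, expired)"]],
		})
	}
	return users, nil
}

func main() {
	const in = "name,id,quantity,\"status (active, expired)\"\nJohn,14,4,active\nBob,12,7,expired\n"
	users, err := parseUsers(in)
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	for _, u := range users {
		fmt.Println(u.Name, u.Status)
	}
}
```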
I simply don't think gocarina/gocsv can parse a header with a quoted comma. I don't see it spelled out anywhere in the documentation that it cannot, but I did some digging and there are clear examples of commas being used in the "CSV annotations", and it looks like the author only conceived of commas in the annotations being used for the purposes of the package/API, and not as part of the column name.
If we look at sample_structs_test.go from the package, we can see commas being used in some of the following ways:
in metadata directives, like "omitempty":
type Sample struct {
Foo string `csv:"foo"`
Bar int `csv:"BAR"`
Baz string `csv:"Baz"`
...
Omit *string `csv:"Omit,omitempty"`
}
for declaring that a field in the struct can be populated from multiple, different headers:
type MultiTagSample struct {
Foo string `csv:"Baz,foo"`
Bar int `csv:"BAR"`
}
You can see this in action, here.
FWIW, the official encoding/json package has the same limitation, and they note it (emphasis added):
The encoding of each struct field can be customized by the format string stored under the "json" key in the struct field's tag. The format string gives the name of the field, possibly followed by a comma-separated list of options. The name may be empty in order to specify options without overriding the default field name.
and
The key name will be used if it's a non-empty string consisting of only Unicode letters, digits, and ASCII punctuation except quotation marks, backslash, and comma.
So, you may not be able to get what you expect/want: sorry, this may just be a limitation of having the ability to annotate your structs. If you want, you could file a bug with gocarina/gocsv.
In the meantime, you can just modify the header as it's coming in. This example is pretty hacky, but it works: it just replaces "status (active, expired)" with "status (active expired)" and uses the comma-less version to annotate the struct.
endpointReader := strings.NewReader(sCSV)
// Fix header
var bTmp bytes.Buffer
fixer := bufio.NewReader(endpointReader)
header, _ := fixer.ReadString('\n')
header = strings.Replace(header, "\"status (active, expired)\"", "status (active expired)", -1)
bTmp.Write([]byte(header))
// Read rest of CSV
bTmp.ReadFrom(fixer)
// Turn back into a reader
reader := bytes.NewReader(bTmp.Bytes())
var actualResult []UserInfo
...
I can run that and now get:
active
expired

json.Unmarshal Ignore json string value inside a json string, dont attempt to parse it

I need to unmarshal a json string, but treat the 'SomeString' value as a string and not json.
The unmarshaler errors when attempting this
Not sure if this is possible. Thanks.
type Test struct{
Name string `json:"Name"`
Description string `json:"Description"`
SomeString string `json:"SomeString"`
}
func main() {
a := "{\"Name\":\"Jim\", \"Description\":\"This is a test\", \"SomeString\":\"{\"test\":{\"test\":\"i am a test\"}}\" }"
var test Test
err := json.Unmarshal([]byte(a), &test)
if err !=nil{
fmt.Println(err)
}
fmt.Println(a)
fmt.Printf("%+v\n", test)
}
Your input is not a valid JSON. See this part:
\"SomeString\":\"{\"test
There's a key SomeString and it has the value {: the second quote inside the supposed value closes the string. The subsequent characters test are not part of SomeString's value; they appear "alone", and if they were meant to be the name of the next property they would have to be quoted. So you get the error invalid character 't' after object key:value pair.
It's possible of course to have a JSON string as the value and it's not parsed, but the input must be valid JSON.
For example, using a raw string literal:
a := `{"Name":"Jim", "Description":"This is a test", "SomeString":"{\"test\":{\"test\":\"i am a test\"}}" }`
Using this input, the output will be (try it on the Go Playground):
{"Name":"Jim", "Description":"This is a test", "SomeString":"{\"test\":{\"test\":\"i am a test\"}}" }
{Name:Jim Description:This is a test SomeString:{"test":{"test":"i am a test"}}}
This works as intended; your problem is how you specify your input. A double-quoted "..." string is already an interpreted (Go) literal, and you want another, JSON-escaped string inside it. In that case you have to escape twice!
Like this:
a := "{\"Name\":\"Jim\", \"Description\":\"This is a test\", \"SomeString\":\"{\\\"test\\\":{\\\"test\\\":\\\"i am a test\\\"}}\" }"
This will output the same, try it on the Go Playground.

Golang struct unmarshal xss

I have a struct which has XSS injected in it. In order to remove it, I json.Marshal it, then run json.HTMLEscape. Then I json.Unmarshal it into a new struct.
The problem is the new struct has XSS injected still.
I simply can't figure out how to remove the XSS from the struct. I could write a function to do it on the field, but considering there is json.HTMLEscape and we can Unmarshal the result back, it should work fine, but it's not.
type Person struct {
Name string `json:"name"`
}
func main() {
var p, p2 Person
// p.Name has XSS
p.Name = "<script>alert(1)</script>"
var tBytes bytes.Buffer
// I marshal it so I can use json.HTMLEscape
marshalledJson, _ := json.Marshal(p)
json.HTMLEscape(&tBytes, marshalledJson)
// here I insert it into a new struct, sadly the p2 struct has the XSS still
err := json.Unmarshal(tBytes.Bytes(), &p2)
if err != nil {
fmt.Printf(err.Error())
}
fmt.Print(p2)
}
The expected outcome is for p2.Name to be sanitized like &lt;script&gt;alert(1)&lt;/script&gt;
First, json.HTMLEscape doesn't do what you want:
HTMLEscape appends to dst the JSON-encoded src with <, >, &, U+2028 and U+2029 characters inside string literals changed to \u003c, \u003e, \u0026, \u2028, \u2029 so that the JSON will be safe to embed inside HTML <script> tags.
but what you want is:
p2.Name to be sanitized like &lt;script&gt;alert(1)&lt;/script&gt;
which you can get by calling html.EscapeString, but not by any of the json encoder routines. [1]
Second, if you inspect the result of json.Marshal you'll see that it has already replaced < with \u003c and so forth—it's already done the json.HTMLEscape, so json.HTMLEscape does not have any characters to replace! See https://play.golang.org/p/Zergs3bwElY for an example.
As Ahmed Hashem noted, if you really want to do this sort of thing, you can use reflection to find string fields (as in Implement XSS protection in Golang)—but in general it's probably wiser to do this at the point of input. Note that the answer there does not recurse into inner objects that might contain strings.
[1] JSON is not HTML, nor XML, etc. Keep them separate in your head, and in your code.
See also https://medium.com/@oazzat19/what-is-the-difference-between-html-vs-xml-vs-json-254864972bbb, a short summary of how we got here that as far as I can tell has no errors, which is pretty good for a random web article. :-) When using JSON, we get very simple typed data: objects, strings, numbers, lists/arrays, boolean, and null; see https://www.w3schools.com/js/js_json_syntax.asp, https://www.w3schools.com/js/js_json_objects.asp, and https://cswr.github.io/JsonSchema/spec/basic_types/ for instance.
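As a small illustration of the html.EscapeString suggestion, the sketch below escapes the field directly, with no JSON round trip. The sanitize function is a hypothetical helper, not part of any package:

```go
package main

import (
	"fmt"
	"html"
)

type Person struct {
	Name string `json:"name"`
}

// sanitize is a hypothetical helper that returns a copy of p with the
// Name field HTML-escaped.
func sanitize(p Person) Person {
	p.Name = html.EscapeString(p.Name)
	return p
}

func main() {
	p := Person{Name: "<script>alert(1)</script>"}
	fmt.Println(sanitize(p).Name)
	// prints: &lt;script&gt;alert(1)&lt;/script&gt;
}
```

A struct with many string fields would need one such call per field, or the reflection approach mentioned above.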

Runtime error when parsing JSON array and map elements with trailing commas

Dave Cheney, one of the leading subject matter experts on Go, wrote: "When initializing a variable with a composite literal, Go requires that each line of the composite literal end with a comma, even the last line of your declaration. This is the result of the semicolon rule."
However, when I am trying to apply that beautiful rule to JSON text, the parser doesn't seem to agree with this philosophy. In the code below, removing the comma works. Is there a fix for this so I can just see one line change when I add elements in my diffs?
package main
import (
"fmt"
"encoding/json"
)
type jsonobject struct {
Objects []ObjectType `json:"objects"`
}
type ObjectType struct {
Name string `json:"name"`
}
func main() {
bytes := []byte(`{ "objects":
[
{"name": "foo"}, // REMOVE THE COMMA TO MAKE THE CODE WORK!
]}`)
jsontype := &jsonobject{}
json.Unmarshal(bytes, &jsontype)
fmt.Printf("Results: %v\n", jsontype.Objects[0].Name) // panic: runtime error: index out of range
}
There is not. The JSON specification does not allow a trailing comma.
This is not a valid JSON:
{ "objects":
[
{"name": "foo"},
]}
It is Go syntax that requires a comma when the enumeration is not closed on the same line, e.g.:
// Slice literal:
s := []int {
1,
2,
}
// Function call:
fmt.Println(
"Slice:",
s,
)
Even if you could "convince" one specific JSON parser to silently swallow it, other, valid JSON parsers would report an error, rightfully. Don't do it.
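The panic in the question comes from ignoring the error returned by json.Unmarshal and then indexing an empty slice. A minimal sketch of checking that error first, where decode is a hypothetical helper and the types are copied from the question:

```go
package main

import (
	"encoding/json"
	"fmt"
)

type ObjectType struct {
	Name string `json:"name"`
}

type jsonobject struct {
	Objects []ObjectType `json:"objects"`
}

// decode is a hypothetical helper that reports invalid input instead of
// leaving the caller to index an empty slice.
func decode(data []byte) (*jsonobject, error) {
	var obj jsonobject
	if err := json.Unmarshal(data, &obj); err != nil {
		return nil, fmt.Errorf("invalid JSON input: %w", err)
	}
	if len(obj.Objects) == 0 {
		return nil, fmt.Errorf("no objects in input")
	}
	return &obj, nil
}

func main() {
	inputs := []string{
		`{"objects": [{"name": "foo"},]}`, // trailing comma: rejected
		`{"objects": [{"name": "foo"}]}`,  // valid JSON
	}
	for _, in := range inputs {
		obj, err := decode([]byte(in))
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("first object:", obj.Objects[0].Name)
	}
}
```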
While trailing commas are not valid JSON, some languages support trailing commas natively, notably JavaScript, so you may see them in your data.
It's better to remove trailing commas, but if you cannot change your data, use a JSON parser that supports trailing commas like HuJSON (aka Human JSON) which supports trailing commas and comments in JSON. It's a soft fork of encoding/json and is maintained by noted Xoogler and Ex-Golang team member Brad Fitzpatrick and others.
repo: https://github.com/tailscale/hujson
docs: https://pkg.go.dev/github.com/tailscale/hujson
The Unmarshal syntax is the same as encoding/json, just use:
err := hujson.Unmarshal(data, v)
I've used it and it works as described.

Using `json:",string"` returning invalid use of ,string struct tag, trying to unmarshal unquoted value

When trying to parse a json with a float value for distance to the following struct
type CreateBookingRequest struct {
Distance float64 `json:"distance,string"`
DistanceSource string `json:"distanceSource"`
}
I get the following error
json: invalid use of ,string struct tag, trying to unmarshal unquoted
value into [34 100 105 115 116 97 110 99 101 34]%!(EXTRA
*reflect.rtype=dto.CreateBookingRequest)
Is there a way for me to avoid the error/get a better error message?
Edit:
I am actually expecting the users of the API to pass in a string value but if they for some reason pass in a non-string value, I would like to be able to tell them clearly, instead of this hard to read error message.
I had to work with an API which sometimes quotes numbers and sometimes doesn't. The owners of the service weren't likely to fix it, so I came up with a simple workaround:
re := regexp.MustCompile(`(":\s*)([\d\.]+)(\s*[,}])`)
rawJsonByteArray = re.ReplaceAll(rawJsonByteArray, []byte(`$1"$2"$3`))
Regular expressions are somewhat inefficient, but I don't believe I'd be able to implement something substantially faster.
The documentation for Unmarshal states:
To unmarshal JSON into a struct, Unmarshal matches incoming object
keys to the keys used by Marshal (either the struct field name or its
tag), preferring an exact match but also accepting a case-insensitive
match.
To unmarshal JSON into an interface value, Unmarshal stores one of these in the interface value:
bool, for JSON booleans
float64, for JSON numbers
string, for JSON strings
[]interface{}, for JSON arrays
map[string]interface{}, for JSON objects
nil for JSON null
So by default Unmarshal expects a JSON number for the float64 field Distance. But with the ,string option in the tag you are asking Unmarshal to expect Distance as a JSON string, so the data types do not match.
You have two options: either remove the string option from the distance tag so the value is decoded as a plain float64, or send distance as a string.
This error happens when the "distance" JSON value is encoded as a number instead of a string (per the "string" tag on the "Distance" field):
str := []byte(`{"distance":1.23,"distanceSource":"foo"}`)
// Note JSON number -------^
var cbr CreateBookingRequest
err := json.Unmarshal(str, &cbr)
// err => json: invalid use of ,string struct tag, trying to unmarshal unquoted value into [34 100 105 115 116 97 110 99 101 34]%!(EXTRA *reflect.rtype=main.CreateBookingRequest)
If you change the type of the distance value to a string (per the tag) then it works fine:
str := []byte(`{"distance":"1.23","distanceSource":"foo"}`)
// Note JSON string -------^
You could change the error message by identifying that specific error somehow and provide a different message. You might also consider changing the tag for the Distance type to simply accept a number instead of a string:
type CreateBookingRequest struct {
Distance float64 `json:"distance"`
...
}
...
str := []byte(`{"distance":1.23,"distanceSource":"foo"}`)
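If the API should keep requiring a quoted number but report a clearer error, one option is a small custom type in place of the ,string tag option. This is a sketch under assumptions: QuotedFloat and decodeBooking are hypothetical names, and the error wording is only an example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// QuotedFloat is a hypothetical type that only accepts a JSON string
// containing a number, e.g. "1.23".
type QuotedFloat float64

func (q *QuotedFloat) UnmarshalJSON(data []byte) error {
	if string(data) == "null" {
		return nil // by convention, JSON null leaves the value unchanged
	}
	var s string
	if err := json.Unmarshal(data, &s); err != nil {
		return fmt.Errorf("distance must be a quoted number such as \"1.23\", got %s", data)
	}
	f, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return fmt.Errorf("distance %q is not a valid number", s)
	}
	*q = QuotedFloat(f)
	return nil
}

type CreateBookingRequest struct {
	Distance       QuotedFloat `json:"distance"`
	DistanceSource string      `json:"distanceSource"`
}

// decodeBooking is a hypothetical helper wrapping json.Unmarshal.
func decodeBooking(data string) (CreateBookingRequest, error) {
	var cbr CreateBookingRequest
	err := json.Unmarshal([]byte(data), &cbr)
	return cbr, err
}

func main() {
	for _, in := range []string{
		`{"distance":"1.23","distanceSource":"foo"}`, // quoted: accepted
		`{"distance":1.23,"distanceSource":"foo"}`,   // unquoted: rejected
	} {
		cbr, err := decodeBooking(in)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("distance:", float64(cbr.Distance))
	}
}
```

The field name is hard-coded in the messages here; a reusable type would need to get it from the caller.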
The error is simply saying you designated Distance as a string with your json annotations, but in the JSON you're trying to deserialize the value is not quoted (and therefore not a string).
The solution is simple: either change json:"distance,string" to json:"distance", or supply JSON that matches your definition (meaning it has distance in quotes, like "distance":"10.4").
Given the error and the fact that your native Go type is a float64, I would advise getting rid of the string annotation.
Another way to deal with this issue is to use json.Number. It will parse numeric data into the json.Number type, whose underlying type is string. Then you have to convert it:
package main
import (
"encoding/json"
"fmt"
)
type x struct {
Num json.Number `json:"price"`
}
func castToFloat64(num json.Number) (float64, error) {
return num.Float64()
}
func main() {
var resultHolder x
data := `{"price":"5"}`
jsonErr := json.Unmarshal([]byte(data), &resultHolder)
if jsonErr != nil {
fmt.Println(jsonErr)
}
convertedNum, convertErr := castToFloat64(resultHolder.Num)
if convertErr != nil {
fmt.Println(convertErr)
}
fmt.Println(convertedNum*2, resultHolder.Num+"extraString")
}