Golang struct unmarshal xss - json

I have a struct which has XSS injected in it. To remove it, I json.Marshal it, then run json.HTMLEscape, then json.Unmarshal the result into a new struct.
The problem is that the new struct still has the XSS in it.
I simply can't figure out how to remove the XSS from the struct. I could write a function that does it per field, but since json.HTMLEscape exists and the result can be unmarshalled back, I expected this to work. It doesn't.
type Person struct {
Name string `json:"name"`
}
func main() {
var p, p2 Person
// p.Name has XSS
p.Name = "<script>alert(1)</script>"
var tBytes bytes.Buffer
// I marshal it so I can use json.HTMLEscape
marshalledJson, _ := json.Marshal(p)
json.HTMLEscape(&tBytes, marshalledJson)
// here I insert it into a new struct, sadly the p2 struct has the XSS still
err := json.Unmarshal(tBytes.Bytes(), &p2)
if err != nil {
fmt.Printf(err.Error())
}
fmt.Print(p2)
}
The expected outcome is for p2.Name to be sanitized, like &lt;script&gt;alert(1)&lt;/script&gt;

First, json.HTMLEscape doesn't do what you want:
HTMLEscape appends to dst the JSON-encoded src with <, >, &, U+2028 and U+2029 characters inside string literals changed to \u003c, \u003e, \u0026, \u2028, \u2029 so that the JSON will be safe to embed inside HTML <script> tags.
but what you want is:
p2.Name to be sanitized like &lt;script&gt;alert(1)&lt;/script&gt;
which you can get by calling html.EscapeString, but not by any of the json encoder routines.[1]
Second, if you inspect the result of json.Marshal you'll see that it has already replaced < with \u003c and so forth—it's already done the json.HTMLEscape, so json.HTMLEscape does not have any characters to replace! See https://play.golang.org/p/Zergs3bwElY for an example.
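To make the difference concrete, here is a minimal sketch that prints what json.Marshal produces and what html.EscapeString produces for the same value. The helper names marshalled and sanitized are made up for this illustration. Note that unmarshalling the JSON turns \u003c back into <, which is why p2 ends up holding the original string.

```go
package main

import (
	"encoding/json"
	"fmt"
	"html"
)

type Person struct {
	Name string `json:"name"`
}

// marshalled returns the JSON encoding of p as a string.
// (Hypothetical helper, only here to make the output easy to inspect.)
func marshalled(p Person) string {
	b, err := json.Marshal(p)
	if err != nil {
		panic(err) // not expected for a struct holding a plain string
	}
	return string(b)
}

// sanitized returns a copy of p with Name escaped for use in HTML text.
// (Hypothetical helper; html.EscapeString is the standard library call.)
func sanitized(p Person) Person {
	p.Name = html.EscapeString(p.Name)
	return p
}

func main() {
	p := Person{Name: "<script>alert(1)</script>"}

	// json.Marshal already writes < and > as \u003c and \u003e.
	fmt.Println(marshalled(p))

	// html.EscapeString produces HTML entities instead.
	fmt.Println(sanitized(p).Name)
}
```

The first line printed is JSON-safe but decodes back to the raw string; only the second is the HTML-escaped form asked for in the question.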
As Ahmed Hashem noted, if you really want to do this sort of thing, you can use reflection to find string fields (as in Implement XSS protection in Golang)—but in general it's probably wiser to do this at the point of input. Note that the answer there does not recurse into inner objects that might contain strings.
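If you do go the reflection route, a recursive walk could look roughly like the sketch below. This is not the code from the linked answer; the names escapeStrings and sanitize and the sample types are hypothetical. It assumes you pass a pointer, it handles nested structs, pointers, slices and arrays, and it deliberately skips maps, unexported fields, and strings stored inside interface values (those are not settable through reflection).

```go
package main

import (
	"fmt"
	"html"
	"reflect"
)

type Address struct {
	City string
}

type Person struct {
	Name    string
	Tags    []string
	Address Address
	Friend  *Person
}

// escapeStrings walks v recursively and HTML-escapes every settable string.
func escapeStrings(v reflect.Value) {
	switch v.Kind() {
	case reflect.Ptr, reflect.Interface:
		if !v.IsNil() {
			escapeStrings(v.Elem())
		}
	case reflect.Struct:
		for i := 0; i < v.NumField(); i++ {
			escapeStrings(v.Field(i))
		}
	case reflect.Slice, reflect.Array:
		for i := 0; i < v.Len(); i++ {
			escapeStrings(v.Index(i))
		}
	case reflect.String:
		// CanSet is false for unexported fields and non-addressable values.
		if v.CanSet() {
			v.SetString(html.EscapeString(v.String()))
		}
	}
}

// sanitize escapes all reachable string fields of the value ptr points to.
func sanitize(ptr interface{}) {
	escapeStrings(reflect.ValueOf(ptr))
}

func main() {
	p := Person{
		Name:    "<script>alert(1)</script>",
		Tags:    []string{"a&b"},
		Address: Address{City: "<i>Paris</i>"},
		Friend:  &Person{Name: "<b>Bob</b>"},
	}
	sanitize(&p)
	fmt.Printf("%+v\n", p)
	fmt.Println(p.Friend.Name)
}
```

Calling sanitize on an already escaped value escapes it again (& becomes &amp;), which is one more reason to escape once, at a well-defined point.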
[1] JSON is not HTML, nor XML, etc. Keep them separate in your head, and in your code.
See also https://medium.com/#oazzat19/what-is-the-difference-between-html-vs-xml-vs-json-254864972bbb, a short summary of how we got here that as far as I can tell has no errors, which is pretty good for a random web article. :-) When using JSON, we get very simple typed data: objects, strings, numbers, lists/arrays, boolean, and null; see https://www.w3schools.com/js/js_json_syntax.asp, https://www.w3schools.com/js/js_json_objects.asp, and https://cswr.github.io/JsonSchema/spec/basic_types/ for instance.

Related

can't read quoted field with gocsv

I have a csv response that comes from an endpoint that I don't control and I'm failing to parse its response because it has quotes. It looks something like this:
name,id,quantity,"status (active, expired)"
John,14,4,active
Bob,12,7,expired
to parse this response I have created the following struct:
type UserInfo struct {
Name string `csv:"name"`
ID string `csv:"id"`
Quantity string `csv:"quantity"`
Status string `csv:"status (active, expired)"`
}
I have tried using
Status string `csv:""status (active, expired)""`
Status string `csv:'"status (active, expired)"'`
but none seem to be helpful, I just can't access the field Status when I use gocsv.Unmarshal.
var actualResult []UserInfo
err = gocsv.Unmarshal(in, &actualResult)
for _, elem := range actualResult {
fmt.Println(elem.Status)
}
And I get nothing as a response.
https://go.dev/play/p/lje1zNO9w6E here's an example
You don't need a third-party package like gocsv (unless you have a specific use case); this can be done easily with Go's built-in encoding/csv.
You just have to skip the first line/record, which is the CSV header in your endpoint's response.
csvReader := csv.NewReader(strings.NewReader(csvString))
records, err := csvReader.ReadAll()
if err != nil {
panic(err)
}
var users []UserInfo
// Iterate over all records excluding first one i.e., header
for _, record := range records[1:] {
users = append(users, UserInfo{Name: record[0], ID: record[1], Quantity: record[2], Status: record[3]})
}
fmt.Printf("%v", users)
// Output: [{John 14 4 active} {Bob 12 7 expired}]
Here is a working example on Go Playground based on your use case and sample string.
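If you would rather not depend on column order, a variant of the same idea is to build an index from the header record and look fields up by name. The sketch below is only an illustration (parseUsers and the col map are names I made up); it relies on encoding/csv unquoting the header, so the quoted column arrives as the plain string status (active, expired).

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

type UserInfo struct {
	Name     string
	ID       string
	Quantity string
	Status   string
}

// parseUsers reads CSV data and maps columns to fields by header name.
func parseUsers(data string) ([]UserInfo, error) {
	records, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, err
	}
	if len(records) == 0 {
		return nil, fmt.Errorf("empty CSV input")
	}

	// Index the header: column name -> position.
	col := make(map[string]int, len(records[0]))
	for i, name := range records[0] {
		col[name] = i
	}
	const statusCol = "status (active, expired)"
	for _, want := range []string{"name", "id", "quantity", statusCol} {
		if _, ok := col[want]; !ok {
			return nil, fmt.Errorf("missing column %q", want)
		}
	}

	var users []UserInfo
	for _, rec := range records[1:] {
		users = append(users, UserInfo{
			Name:     rec[col["name"]],
			ID:       rec[col["id"]],
			Quantity: rec[col["quantity"]],
			Status:   rec[col[statusCol]],
		})
	}
	return users, nil
}

func main() {
	in := "name,id,quantity,\"status (active, expired)\"\n" +
		"John,14,4,active\n" +
		"Bob,12,7,expired\n"
	users, err := parseUsers(in)
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	for _, u := range users {
		fmt.Println(u.Name, u.Status)
	}
}
```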
I simply don't think gocarina/gocsv can parse a header with a quoted comma. I don't see it spelled out anywhere in the documentation that it cannot, but I did some digging and there are clear examples of commas being used in the "CSV annotations", and it looks like the author only conceived of commas in the annotations being used for the purposes of the package/API, and not as part of the column name.
If we look at sample_structs_test.go from the package, we can see commas being used in some of the following ways:
in metadata directives, like "omitempty":
type Sample struct {
Foo string `csv:"foo"`
Bar int `csv:"BAR"`
Baz string `csv:"Baz"`
...
Omit *string `csv:"Omit,omitempty"`
}
for declaring that a field in the struct can be populated from multiple, different headers:
type MultiTagSample struct {
Foo string `csv:"Baz,foo"`
Bar int `csv:"BAR"`
}
You can see this in action, here.
FWIW, the official encoding/json package has the same limitation, and they note it (emphasis added):
The encoding of each struct field can be customized by the format string stored under the "json" key in the struct field's tag. The format string gives the name of the field, possibly followed by a comma-separated list of options. The name may be empty in order to specify options without overriding the default field name.
and
The key name will be used if it's a non-empty string consisting of only Unicode letters, digits, and ASCII punctuation except quotation marks, backslash, and comma.
So, you may not be able to get what you expect/want: sorry, this may just be a limitation of having the ability to annotate your structs. If you want, you could file a bug with gocarina/gocsv.
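To see why a comma in the column name is a problem, here is a small sketch of what happens when a tag value is split on commas. It is an assumption on my part that gocsv splits the tag this way (that is what the test structs above suggest); splitCSVTag is a hypothetical helper, not part of the package.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// splitCSVTag extracts the value stored under the csv key of a struct tag
// and splits it on commas, the way a comma-separated tag format would.
func splitCSVTag(structTag string) []string {
	value := reflect.StructTag(structTag).Get("csv")
	return strings.Split(value, ",")
}

func main() {
	tags := []string{
		`csv:"Omit,omitempty"`,
		`csv:"Baz,foo"`,
		`csv:"status (active, expired)"`,
	}
	for _, tag := range tags {
		// %q makes leading spaces in the parts visible.
		fmt.Printf("%q\n", splitCSVTag(tag))
	}
}
```

Under that assumption the last tag yields two parts, neither of which equals the real header, so no column matches the Status field.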
In the meantime, you can just modify the header as it's coming in. This example is pretty hacky, but it works: it just replaces "status (active, expired)" with "status (active expired)" and uses the comma-less version to annotate the struct.
endpointReader := strings.NewReader(sCSV)
// Fix header
var bTmp bytes.Buffer
fixer := bufio.NewReader(endpointReader)
header, _ := fixer.ReadString('\n')
header = strings.Replace(header, "\"status (active, expired)\"", "status (active expired)", -1)
bTmp.Write([]byte(header))
// Read rest of CSV
bTmp.ReadFrom(fixer)
// Turn back into a reader
reader := bytes.NewReader(bTmp.Bytes())
var actualResult []UserInfo
...
I can run that and now get:
active
expired
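The same header fix can also be written without copying the whole body into a buffer. The following is a sketch using only the standard library; fixHeader and fixHeaderString are hypothetical names, and the reader it returns would be handed to gocsv.Unmarshal in place of the original one.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// fixHeader rewrites the first line of r, replacing oldName with newName,
// and returns a reader that yields the patched header followed by the
// untouched remainder of the input.
func fixHeader(r io.Reader, oldName, newName string) (io.Reader, error) {
	br := bufio.NewReader(r)
	header, err := br.ReadString('\n')
	if err != nil && err != io.EOF {
		return nil, err
	}
	header = strings.ReplaceAll(header, oldName, newName)
	// MultiReader streams the new header first, then whatever is left in br.
	return io.MultiReader(strings.NewReader(header), br), nil
}

// fixHeaderString is a convenience wrapper for trying fixHeader on a string.
func fixHeaderString(s, oldName, newName string) (string, error) {
	r, err := fixHeader(strings.NewReader(s), oldName, newName)
	if err != nil {
		return "", err
	}
	b, err := io.ReadAll(r)
	return string(b), err
}

func main() {
	in := "name,id,quantity,\"status (active, expired)\"\n" +
		"John,14,4,active\n" +
		"Bob,12,7,expired\n"
	out, err := fixHeaderString(in, `"status (active, expired)"`, "status (active expired)")
	if err != nil {
		fmt.Println("fix failed:", err)
		return
	}
	fmt.Print(out)
}
```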

Passing a struct Type in Golang?

Please forgive my question, I'm new to Golang and possibly have the wrong approach.
I'm currently implementing a Terraform provider for an internal service.
As probably expected, that requires unmarshalling JSON data into pre-defined struct types, e.g.:
type SomeTypeIveDefined struct {
ID string `json:"id"`
Name string `json:"name"`
}
I've got myself into a situation where I have a lot of duplicate code that looks like this:
res := r.(*http.Response)
var tempThing SomeTypeIveDefined
dec := json.NewDecoder(res.Body)
err := dec.Decode(&tempThing)
In an effort to reduce duplication, I decided what I wanted to do was create a function which does the JSON unmarshalling, but takes in the Struct Type as a parameter.
I've trawled through several StackOverflow articles and Google Groups trying to make sense of some of the answers around using the reflect package, but I've not had much success in using it.
My latest attempt was using reflect.StructOf and passing in a set of StructFields, but that still seems to require using myReflectedStruct.Field(0) rather than myReflectedStruct.ID.
I suspect there may be no way until something like Generics are widely available in Golang.
I considered perhaps an interface for the structs which requires implementing an unmarshal method, then I could pass the interface to the function and call the unmarshal method. But then I'm still implementing unmarshal on all the structs anyway.
I'm just wondering what suggestions there may be for achieving what I'm after, please?
Create a helper function with the repeated code. Pass the destination value as a pointer.
func decode(r *http.Response, v interface{}) error {
return json.NewDecoder(r.Body).Decode(v)
}
Call the helper function with a pointer to your thing:
var tempThing SomeTypeIveDefined
err := decode(r, &tempThing)
You can do this with interfaces:
func decodeResponse(r *http.Response, dest interface{}) error {
dec := json.NewDecoder(r.Body)
return dec.Decode(dest)
}
func handler(...) {
res := r.(*http.Response)
var tempThing SomeTypeIveDefined
if err := decodeResponse(res, &tempThing); err != nil {
// handle err
}
...
}
You don't need to implement an unmarshal for the structs, because the stdlib decoder will use reflection to set the struct fields.
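Since the question mentions generics: with Go 1.18 or later the same helper can return a typed value instead of filling a pointer. This is only a sketch under that assumption; decodeJSON and decodeString are hypothetical names, and in a real handler you would pass res.Body as the reader.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

type SomeTypeIveDefined struct {
	ID   string `json:"id"`
	Name string `json:"name"`
}

// decodeJSON decodes one JSON value from r into a new value of type T.
func decodeJSON[T any](r io.Reader) (T, error) {
	var v T
	err := json.NewDecoder(r).Decode(&v)
	return v, err
}

// decodeString is a small wrapper so the example can run without HTTP.
func decodeString[T any](s string) (T, error) {
	return decodeJSON[T](strings.NewReader(s))
}

func main() {
	thing, err := decodeString[SomeTypeIveDefined](`{"id":"42","name":"example"}`)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	// Fields are accessed normally, no reflection needed at the call site.
	fmt.Println(thing.ID, thing.Name)
}
```

The interface{} version above works on every Go version and is what encoding/json itself expects, so the generic form is mostly a matter of taste.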

Runtime error when parsing JSON array and map elements with trailing commas

Dave Cheney, one of the leading subject matter experts on Go, wrote: "When initializing a variable with a composite literal, Go requires that each line of the composite literal end with a comma, even the last line of your declaration. This is the result of the semicolon rule."
However, when I am trying to apply that beautiful rule to JSON text, the parser doesn't seem to agree with this philosophy. In the code below, removing the comma works. Is there a fix for this so I can just see one line change when I add elements in my diffs?
package main
import (
"fmt"
"encoding/json"
)
type jsonobject struct {
Objects []ObjectType `json:"objects"`
}
type ObjectType struct {
Name string `json:"name"`
}
func main() {
bytes := []byte(`{ "objects":
[
{"name": "foo"}, // REMOVE THE COMMA TO MAKE THE CODE WORK!
]}`)
jsontype := &jsonobject{}
json.Unmarshal(bytes, &jsontype)
fmt.Printf("Results: %v\n", jsontype.Objects[0].Name) // panic: runtime error: index out of range
}
There is not. The JSON specification does not allow a trailing comma.
This is not a valid JSON:
{ "objects":
[
{"name": "foo"},
]}
In Go source code, a trailing comma is required whenever the closing brace or parenthesis of a composite literal or call is not on the same line as the last element, e.g.:
// Slice literal:
s := []int {
1,
2,
}
// Function call:
fmt.Println(
"Slice:",
s,
)
Even if you could "convince" one specific JSON parser to silently swallow it, other, valid JSON parsers would report an error, rightfully. Don't do it.
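As a side note, the panic in the question only appears because the error from json.Unmarshal is ignored: the input is rejected as a whole, Objects stays empty, and Objects[0] is out of range. A minimal sketch that checks the error instead (parseObjects is a hypothetical helper name):

```go
package main

import (
	"encoding/json"
	"fmt"
)

type ObjectType struct {
	Name string `json:"name"`
}

type jsonobject struct {
	Objects []ObjectType `json:"objects"`
}

// parseObjects decodes data and reports malformed JSON as an error
// instead of returning an empty slice silently.
func parseObjects(data []byte) ([]ObjectType, error) {
	var doc jsonobject
	if err := json.Unmarshal(data, &doc); err != nil {
		return nil, fmt.Errorf("invalid JSON: %w", err)
	}
	return doc.Objects, nil
}

func main() {
	withComma := []byte(`{"objects": [{"name": "foo"},]}`)
	if _, err := parseObjects(withComma); err != nil {
		fmt.Println(err) // the trailing comma is reported here
	}

	withoutComma := []byte(`{"objects": [{"name": "foo"}]}`)
	objs, err := parseObjects(withoutComma)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("Results:", objs[0].Name)
}
```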
While trailing commas are not valid JSON, some languages support trailing commas natively, notably JavaScript, so you may see them in your data.
It's better to remove trailing commas, but if you cannot change your data, use a JSON parser that supports trailing commas like HuJSON (aka Human JSON) which supports trailing commas and comments in JSON. It's a soft fork of encoding/json and is maintained by noted Xoogler and Ex-Golang team member Brad Fitzpatrick and others.
repo: https://github.com/tailscale/hujson
docs: https://pkg.go.dev/github.com/tailscale/hujson
The Unmarshal syntax is the same as encoding/json, just use:
err := hujson.Unmarshal(data, v)
I've used it and it works as described.

Go json.Unmarshal key with \u0000 \x00

Here is the Go playground link.
Basically there are some special characters ('\u0000') in my JSON string key:
var j = []byte(`{"Page":1,"Fruits":["5","6"],"\u0000*\u0000_errorMessages":{"x":"123"},"*_successMessages":{"ok":"hi"}}`)
I want to Unmarshal it into a struct:
type Response1 struct {
Page int
Fruits []string
Msg interface{} `json:"*_errorMessages"`
Msg1 interface{} `json:"\\u0000*\\u0000_errorMessages"`
Msg2 interface{} `json:"\u0000*\u0000_errorMessages"`
Msg3 interface{} `json:"\0*\0_errorMessages"`
Msg4 interface{} `json:"\\0*\\0_errorMessages"`
Msg5 interface{} `json:"\x00*\x00_errorMessages"`
Msg6 interface{} `json:"\\x00*\\x00_errorMessages"`
SMsg interface{} `json:"*_successMessages"`
}
I tried a lot but it's not working.
This link might help golang.org/src/encoding/json/encode_test.go.
Short answer: With the current json implementation it is not possible using only struct tags.
Note: It's an implementation restriction, not a specification restriction. (It's the restriction of the json package implementation, not the restriction of the struct tags specification.)
Some background: you specified your tags with a raw string literal:
The value of a raw string literal is the string composed of the uninterpreted (implicitly UTF-8-encoded) characters between the quotes...
So no unescaping or unquoting happens in the content of the raw string literal by the compiler.
The convention for struct tag values quoted from reflect.StructTag:
By convention, tag strings are a concatenation of optionally space-separated key:"value" pairs. Each key is a non-empty string consisting of non-control characters other than space (U+0020 ' '), quote (U+0022 '"'), and colon (U+003A ':'). Each value is quoted using U+0022 '"' characters and Go string literal syntax.
What this means is that by convention tag values are a list of (key:"value") pairs separated by spaces. There are quite a few restrictions for keys, but values may be anything, and values (should) use "Go string literal syntax", this means that these values will be unquoted at runtime from code (by a call to strconv.Unquote(), called from StructTag.Get(), in source file reflect/type.go, currently line #809).
So no need for double quoting. See your simplified example:
type Response1 struct {
Page int
Fruits []string
Msg interface{} `json:"\u0000_abc"`
}
Now the following code:
t := reflect.TypeOf(Response1{})
fmt.Printf("%#v\n", t.Field(2).Tag)
fmt.Printf("%#v\n", t.Field(2).Tag.Get("json"))
Prints:
"json:\"\\u0000_abc\""
"\x00_abc"
As you can see, the value part for the json key is "\x00_abc" so it properly contains the zero character.
But how will the json package use this?
The json package uses the value returned by StructTag.Get() (from the reflect package), exactly what we did. You can see it in the json/encode.go source file, typeFields() function, currently line #1032. So far so good.
Then it calls the unexported json.parseTag() function, in json/tags.go source file, currently line #17. This cuts the part after the comma (which becomes the "tag options").
And finally json.isValidTag() function is called with the previous value, in source file json/encode.go, currently line #731. This function checks the runes of the passed string, and (besides a set of pre-defined allowed characters "!#$%&()*+-./:<=>?@[]^_{|}~ ") rejects everything that is not a unicode letter or digit (as defined by unicode.IsLetter() and unicode.IsDigit()):
if !unicode.IsLetter(c) && !unicode.IsDigit(c) {
return false
}
'\u0000' is not part of the pre-defined allowed characters, and as you can guess now, it is neither a letter nor a digit:
// Following code prints "INVALID":
c := '\u0000'
if !unicode.IsLetter(c) && !unicode.IsDigit(c) {
fmt.Println("INVALID")
}
And since isValidTag() returns false, the name (which is the value for the json key, without the "tag options" part) will be discarded (name = "") and not used. So no match will be found for the struct field containing a unicode zero.
For an alternative solution use a map, or a custom json.Unmarshaler or use json.RawMessage.
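A custom json.Unmarshaler built on json.RawMessage could look roughly like this. It is a sketch, not the only way to do it: the Response type, its field names, and parseResponse are made up for the example, and the message fields are assumed to be string-to-string objects as in the question's sample. Unlike struct tags, the map lookup is an exact, case-sensitive match on the decoded key.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Response struct {
	Page            int
	Fruits          []string
	ErrorMessages   map[string]string
	SuccessMessages map[string]string
}

// UnmarshalJSON decodes the object into raw fields first, then picks the
// wanted keys by their decoded names, including the one with NUL bytes.
func (r *Response) UnmarshalJSON(data []byte) error {
	var raw map[string]json.RawMessage
	if err := json.Unmarshal(data, &raw); err != nil {
		return err
	}
	fields := []struct {
		key string
		dst interface{}
	}{
		{"Page", &r.Page},
		{"Fruits", &r.Fruits},
		{"\x00*\x00_errorMessages", &r.ErrorMessages},
		{"*_successMessages", &r.SuccessMessages},
	}
	for _, f := range fields {
		msg, ok := raw[f.key]
		if !ok {
			continue // key absent: leave the field at its zero value
		}
		if err := json.Unmarshal(msg, f.dst); err != nil {
			return fmt.Errorf("field %q: %w", f.key, err)
		}
	}
	return nil
}

// parseResponse is a small wrapper around json.Unmarshal for the demo.
func parseResponse(data []byte) (Response, error) {
	var r Response
	err := json.Unmarshal(data, &r)
	return r, err
}

func main() {
	j := []byte(`{"Page":1,"Fruits":["5","6"],"\u0000*\u0000_errorMessages":{"x":"123"},"*_successMessages":{"ok":"hi"}}`)
	r, err := parseResponse(j)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("%+v\n", r)
}
```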
But I would highly discourage using such ugly json keys. I understand likely you are just trying to parse such json response and it may be out of your reach, but you should fight against using these keys as they will just cause more problems later on (e.g. if stored in db, by inspecting records it will be very hard to spot that there are '\u0000' characters in them as they may be displayed as nothing).
You cannot do it this way because of the struct tag rules: http://golang.org/ref/spec#Struct_types
But you can unmarshal into map[string]interface{} and then match the field names of that object with a regexp.
I don't think this is possible with struct tags. The best thing you can do is unmarshal it into map[string]interface{} and then get the values manually:
var b = []byte(`{"\u0000abc":42}`)
var m map[string]interface{}
err := json.Unmarshal(b, &m)
if err != nil {
panic(err)
}
fmt.Println(m, m["\x00abc"])
Playground: http://play.golang.org/p/RtS7Nst0d7.

Parsing complex JSON into goweb REST Create() function

I apologise in advance for a very long question. Hopefully you'll bear with me.
I'm working with the goweb library, and experimenting with the example web app.
I've been trying to modify the RESTful example code, which defines a Thing as:
type Thing struct {
Id string
Text string
}
A Thing is created by sending an HTTP Post request, with an appropriate JSON body, to http://localhost:9090/things. This is handled in the example code in the Create function, specifically the lines:
dataMap := data.(map[string]interface{})
thing := new(Thing)
thing.Id = dataMap["Id"].(string)
thing.Text = dataMap["Text"].(string)
This is all well and good, and I can run the example server (which listens on http://localhost:9090/) and the server operates as expected.
For example:
curl -X POST -H "Content-Type: application/json" -d '{"Id":"TestId","Text":"TestText"}' http://localhost:9090/things
returns with no error, and then I GET that Thing with
curl http://localhost:9090/things/TestId
and it returns
{"d":{"Id":"TestId","Text":"TestText"},"s":200}
So far, so good.
Now, I would like to modify the Thing type, and add a custom ThingText type, like so:
type ThingText struct {
Title string
Body string
}
type Thing struct {
Id string
Text ThingText
}
This in itself isn't an issue, and I can modify the Create function like so:
thing := new(Thing)
thing.Id = dataMap["Id"].(string)
thing.Text.Title = dataMap["Title"].(string)
thing.Text.Body = dataMap["Body"].(string)
and then run the previous curl POST request with the JSON set to:
{"Id":"TestId","Title":"TestTitle","Body":"TestBody"}
and it returns with no error.
Once again I can GET the Thing URL and it returns:
{"d":{"Id":"TestId","Text":{"Title":"TestTitle","Body":"TestBody"}},"s":200}
Again, so far, so good.
Now, my question:
how do I modify the Create function to allow me to POST complex JSON to it?
For example, that last returned JSON string above includes {"Id":"TestId","Text":{"Title":"TestTitle","Body":"TestBody"}}. I'd like to be able to POST that exact JSON to the endpoint and have the Thing created.
I've followed the code back, and it seems that the data variable is of type Context.RequestData() from https://github.com/stretchr/goweb/context, and the internal Map seems to be of type Object.Map from https://github.com/stretchr/stew/, described as "a map[string]interface{} with additional helpful functionality." in particular, I noticed "Supports dot syntax to set deep values."
I can't work out how I can set up the thing.Text.Title = dataMap... statement so that the correct JSON field is parsed into it. I can't seem to use anything other than string types in the dataMap, and if I try that JSON it gives an error similar to:
http: panic serving 127.0.0.1:59113: interface conversion: interface is nil, not string
Once again, sorry for the ridiculously long question. I really appreciate you reading, and any help you may have to offer. Thanks!
As the JSON package documentation and JSON and Go introduction describe, JSON data can be parsed either generically through interface{} / string maps or unmarshalling directly to the struct types.
The example code you linked to and you based your changes on seems to use the generic string-map approach, dataMap := data.(map[string]interface{}).
As your desired JSON data is an object in an object, it’s simply a map within a map.
So you should be able to
dataMap := data.(map[string]interface{})
subthingMap := dataMap["Text"].(map[string]interface{})
thing.Text.Title = subthingMap["Title"].(string)
thing.Text.Body = subthingMap["Body"].(string)
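The "interface is nil, not string" panic from the question comes from an unchecked type assertion on a key that is not there. A slightly more defensive sketch of the map-in-map approach uses the two-value form of the assertion. I am assuming here that the nested object arrives as a plain map[string]interface{}, as it does with encoding/json; thingFromMap and stringField are hypothetical helper names.

```go
package main

import (
	"errors"
	"fmt"
)

type ThingText struct {
	Title string
	Body  string
}

type Thing struct {
	Id   string
	Text ThingText
}

// stringField returns m[key] as a string, or an error if the key is
// missing or holds a value of another type.
func stringField(m map[string]interface{}, key string) (string, error) {
	s, ok := m[key].(string)
	if !ok {
		return "", fmt.Errorf("field %q is missing or not a string", key)
	}
	return s, nil
}

// thingFromMap builds a Thing from generic, already-decoded JSON data.
func thingFromMap(dataMap map[string]interface{}) (*Thing, error) {
	textMap, ok := dataMap["Text"].(map[string]interface{})
	if !ok {
		return nil, errors.New(`field "Text" is missing or not an object`)
	}
	id, err := stringField(dataMap, "Id")
	if err != nil {
		return nil, err
	}
	title, err := stringField(textMap, "Title")
	if err != nil {
		return nil, err
	}
	body, err := stringField(textMap, "Body")
	if err != nil {
		return nil, err
	}
	return &Thing{Id: id, Text: ThingText{Title: title, Body: body}}, nil
}

func main() {
	data := map[string]interface{}{
		"Id": "TestId",
		"Text": map[string]interface{}{
			"Title": "TestTitle",
			"Body":  "TestBody",
		},
	}
	thing, err := thingFromMap(data)
	if err != nil {
		fmt.Println("bad request:", err)
		return
	}
	fmt.Printf("%+v\n", *thing)
}
```

With this shape a malformed POST body produces an error you can turn into a 400 response rather than a panic in the handler.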
I’m not sure why that code uses type assertions and generic types instead of type-safe unmarshalling directly from JSON to struct types (abstraction, I guess). Using the json package's unmarshalling to struct types would go something like
type ThingText struct {
Title string
Body string
}
type Thing struct {
Id string
Text ThingText
}
…
decoder := json.NewDecoder(body)
var thingobj Thing
for {
if err := decoder.Decode(&thingobj); err == io.EOF {
break
} else if err != nil {
log.Fatal(err)
}
fmt.Println(thingobj)
}
where body is an io.Reader, in the simplest and most common cases an http.Response.Body (or, in a server handler like this one, the http.Request.Body).