How do I make raw unicode encoded content readable? - json

I used net/http request a web API and the server returned a JSON response. When I print the response body, it displayed as raw ASCII content. I tried using bufio.ScanRunes to parse the content but failed.
I also tried write a simple server and return a unicode string and it worked well.
Here is the core code:
func (c ClientInfo) Request(method string, url string, form url.Values) string {
req, _ := http.NewRequest(method, url, strings.NewReader(c.Encode(form)))
req.Header = c.Header
req.AddCookie(&c.Cookie)
resp, err := http.DefaultClient.Do(req)
defer resp.Body.Close()
if err != nil {
fmt.Println(err)
}
scanner := bufio.NewScanner(resp.Body)
scanner.Split(bufio.ScanRunes)
var buf bytes.Buffer
for scanner.Scan() {
buf.WriteString(scanner.Text())
}
rv := buf.String()
fmt.Println(rv)
return rv
}
Here is the example output:
{"forum":{"id":"3251718","name":"\u5408\u80a5\u5de5\u4e1a\u5927\u5b66\u5ba3\u57ce\u6821\u533a","first_class":"\u9ad8\u7b49\u9662\u6821","second_class":"\u5b89\u5fbd\u9662\u6821","is_like":"0","user_level":"1","level_id":"1","level_name":"\u7d20\u672a\u8c0b\u9762","cur_score":"0","levelup_score":"5","member_num":"80329","is_exists":"1","thread_num":"108762","post_num":"3445881","good_classify":[{"class_id":"0","class_name":"\u5168\u90e8"},{"class_id":"1","class_name":"\u516c\u544a\u7c7b"},{"class_id":"2","class_name":"\u5427\u53cb\u4e13\u533a"},{"class_id":"4","class_name":"\u6d3b\u52a8\u4e13\u533a"},{"class_id":"6","class_name":"\u793e\u56e2\u73ed\u7ea7"},{"class_id":"5","class_name":"\u8d44\u6e90\u5171\u4eab"},{"class_id":"8","class_name":"\u6e29\u99a8\u751f\u6d3b\u7c7b"},{"class_id":"7","class_name":"\u54a8\u8be2\u65b0\u95fb\u7c7b"},{"class_id":"3","class_name":"\u98ce\u91c7\u5c55\u793a\u533a"}],"managers":[{"id":"793092593","name":"yi\u62b9\u660e\u5a9a\u7684\u5fe7\u4f24"},
...

That is just the standard way to escape any Unicode character.
Unmarshal it to see the unquoted text (the json package will unquote it):
func main() {
var i interface{}
err := json.Unmarshal([]byte(src), &i)
fmt.Println(err, i)
}
const src = `{"forum":{"id":"3251718","name":"\u5408\u80a5\u5de5\u4e1a\u5927\u5b66\u5ba3\u57ce\u6821\u533a","first_class":"\u9ad8\u7b49\u9662\u6821","second_class":"\u5b89\u5fbd\u9662\u6821","is_like":"0","user_level":"1","level_id":"1","level_name":"\u7d20\u672a\u8c0b\u9762","cur_score":"0","levelup_score":"5","member_num":"80329","is_exists":"1","thread_num":"108762","post_num":"3445881","good_classify":[{"class_id":"0","class_name":"\u5168\u90e8"},{"class_id":"1","class_name":"\u516c\u544a\u7c7b"},{"class_id":"2","class_name":"\u5427\u53cb\u4e13\u533a"},{"class_id":"4","class_name":"\u6d3b\u52a8\u4e13\u533a"},{"class_id":"6","class_name":"\u793e\u56e2\u73ed\u7ea7"},{"class_id":"5","class_name":"\u8d44\u6e90\u5171\u4eab"},{"class_id":"8","class_name":"\u6e29\u99a8\u751f\u6d3b\u7c7b"},{"class_id":"7","class_name":"\u54a8\u8be2\u65b0\u95fb\u7c7b"},{"class_id":"3","class_name":"\u98ce\u91c7\u5c55\u793a\u533a"}]}}`
Output (trimmed) (try it on the Go Playground):
<nil> map[forum:map[levelup_score:5 is_exists:1 post_num:3445881 good_classify:[map[class_id:0 class_name:全部] map[class_id:1 class_name:公告类] map[class_id:2 class_name:吧友专区] map[class_id:4 class_name:活动专区] map[class_id:6 class_name:社团班级] map[class_id:5 class_name:资源共享] map[class_id:8 class_name:温馨生活类] map[class_name:咨询新闻类 class_id:7] map[class_id:3 class_name:风采展示区]] id:3251718 is_like:0 cur_score:0
If you just want to unquote a fragment, you may use strconv.Unquote():
fmt.Println(strconv.Unquote(`"\u7d20\u672a\u8c0b"`))
Output (try it on the Go Playground):
素未谋 <nil>
Note that strconv.Unquote() expects a string that is in quotes, that's why I used a raw string literal, so I could add quotes, and also so that the compiler itself will not interpret / unquote the Unicode escapes.
See related question: How to convert escape characters in HTML tags?

Related

strconv.Unquote does not remove escaped json characters, unmarshall fails in Go

I am parsing rather simple json where the slash in the date format string is escaped when it arrives in response.
however, when you try to unmarshall the string, it fails on "invalid syntax" error.
So I googled and we should use strconv.Unquote to replace the escaped characters first. Did that, and now the unquote function fails on the "unknown escape" error. However JSON RFC 8259 shows it is a valid case. So why valid JSON is causing failures in Go unmarshaller?
This is the simple JSON
{
"created_at": "6\/30\/2022 21:51:49"
}
var dst string
err := json.Unmarshal([]byte("6\/30\/2022 21:51:49"), dst) // this fails on ivnalid quote
fmt.Println(dst, err)
s, err := strconv.Unquote(`"6\/30\/2022 21:51:49"`) // this fails on ivnalid syntax
fmt.Println(s, err)
How does one proceed with such cases when the JSON is valid, but none of the built-in functions of Go actually parse it? What is the right way, apart from writing simple find and replace pattern in json []byte manually?
EDIT:
ok I think I should have asked a better question or provided a specific example from the scenario rather than just a portion of it. Garbage in garbage out, my fault.
Here is the thing. Let's say we have the specific, non standard time format like in the above json example.
{
"created_at": "6\/30\/2022 21:51:49"
}
I need to parse it into the struct with go time type. Because the standard parser for time would fail for that format I create a custom type.
type Request struct {
CreatedAt Time `json:"created_at"`
}
type Time time.Time
unc (d *Time) UnmarshalJSON(b []byte) error {
s := strings.Trim(string(b), "\"")
parse, err := time.Parse("1/2/2006 15:04:05", s) // this fails because of the escaped backslashes
if err != nil {
return err
}
*d = Time(parse)
return nil
}
Here is an example in Playground: https://go.dev/play/p/bE3AWQeV-ug
panic: parsing time "6\\/30\\/2022 21:51:49" as "1/2/2006 15:04:05": cannot parse "\\/30\\/2022 21:51:49" as "/"
This is the reason I am trying to put the Unquote into the custom unmarshal function, but that is not the right approach as it was already pointed out. How does the default string json unmarshaller for example removes those escaped slashes? Oh and I have no control about the side that writes the json, it comes in with single escaped slashes like that in []byte from *http.Response
Go has a great built-in function to convert a JSON string to a Go string: json.Unmarshal. Here is how you can integrate it with a custom UnmarshalJSON method:
func (d *Time) UnmarshalJSON(b []byte) error {
var s string
if err := json.Unmarshal(b, &s); err != nil {
return err
}
// Remove this line: s := strings.Trim(string(b), "\"")
// The rest of the code is unchanged
Unquote is designed for cases when you have a double quoted string like "\"My name\"", you don't really need that in this case.
The problem with that code is that you're trying to unmarshal a byte slice that isn't JSON. You're also unmarshaling into an string which isn't going to work. You need to unmarshal into a struct that has json tags or unmarshal into a map.
var dest map[string]string
someJsonString := `{
"created_at": "6\/30\/2022 21:51:49"
}`
err := json.Unmarshal([]byte(someJsonString), &dest)
if err != nil {
panic(err)
}
fmt.Println(dest)
if you wanted to do Unmarshal into a struct, you can do it like so
type SomeStruct struct {
CreatedAt string `json:"created_at"`
}
someJsonString := `{
"created_at": "6\/30\/2022 21:51:49"
}`
var data SomeStruct
err := json.Unmarshal([]byte(someJsonString), &data)
if err != nil {
panic(err)
}
fmt.Println(data)

Using go-jsonnet to return pure JSON

I am using Google's go-jsonnet library to evaluate some jsonnet files.
I have a function, like so, which renders a Jsonnet document:
// Takes a list of jsonnet files and imports each one and mixes them with "+"
func renderJsonnet(files []string, param string, prune bool) string {
// empty slice
jsonnetPaths := files[:0]
// range through the files
for _, s := range files {
jsonnetPaths = append(jsonnetPaths, fmt.Sprintf("(import '%s')", s))
}
// Create a JSonnet VM
vm := jsonnet.MakeVM()
// Join the slices into a jsonnet compat string
jsonnetImport := strings.Join(jsonnetPaths, "+")
if param != "" {
jsonnetImport = "(" + jsonnetImport + ")" + param
}
if prune {
// wrap in std.prune, to remove nulls, empty arrays and hashes
jsonnetImport = "std.prune(" + jsonnetImport + ")"
}
// render the jsonnet
out, err := vm.EvaluateSnippet("file", jsonnetImport)
if err != nil {
log.Panic("Error evaluating jsonnet snippet: ", err)
}
return out
}
This function currently returns a string, because the jsonnet EvaluateSnippet function returns a string.
What I now want to do is render that result JSON using the go-prettyjson library. However, because the JSON i'm piping in is a string, it's not rendering correctly.
So, some questions:
Can I convert the returned JSON string to a JSON type, without knowing beforehand what struct to marshal it into
if not, can I render the json in a pretty manner some other way?
Is there an option, function or method I'm missing here to make this easier?
Can I convert the returned JSON string to a JSON type, without knowing beforehand what struct to marshal it into
Yes. It's very easy:
var jsonOut interface{}
err := json.Unmarshal([]byte(out), &jsonOut)
if err != nil {
log.Panic("Invalid json returned by jsonnet: ", err)
}
formatted, err := prettyjson.Marshal([]byte(jsonOut))
if err != nil {
log.Panic("Failed to format jsonnet output: ", err)
}
More info here: https://blog.golang.org/json-and-go#TOC_5.
Is there an option, function or method I'm missing here to make this easier?
Yes. The go-prettyjson library has a Format function which does the unmarshalling for you:
formatted, err := prettyjson.Format([]byte(out))
if err != nil {
log.Panic("Failed to format jsonnet output: ", err)
}
can I render the json in a pretty manner some other way?
Depends on your definition of pretty. Jsonnet normally outputs every field of an object and every array element on a separate line. This is usually considered pretty printing (as opposed to putting everything on the same line with minimal whitespace to save a few bytes). I suppose this is not good enough for you. You can write your own manifester in jsonnet which formats it to your liking (see std.manifestJson as an example).

How to convert utf8 string to []byte?

I want to unmarshal a string that contains JSON,
however the Unmarshal function takes a []byte as input.
How can I convert my UTF8 string to []byte?
This question is a possible duplicate of How to assign string to bytes array, but still answering it as there is a better, alternative solution:
Converting from string to []byte is allowed by the spec, using a simple conversion:
Conversions to and from a string type
[...]
Converting a value of a string type to a slice of bytes type yields a slice whose successive elements are the bytes of the string.
So you can simply do:
s := "some text"
b := []byte(s) // b is of type []byte
However, the string => []byte conversion makes a copy of the string content (it has to, as strings are immutable while []byte values are not), and in case of large strings it's not efficient. Instead, you can create an io.Reader using strings.NewReader() which will read from the passed string without making a copy of it. And you can pass this io.Reader to json.NewDecoder() and unmarshal using the Decoder.Decode() method:
s := `{"somekey":"somevalue"}`
var result interface{}
err := json.NewDecoder(strings.NewReader(s)).Decode(&result)
fmt.Println(result, err)
Output (try it on the Go Playground):
map[somekey:somevalue] <nil>
Note: calling strings.NewReader() and json.NewDecoder() does have some overhead, so if you're working with small JSON texts, you can safely convert it to []byte and use json.Unmarshal(), it won't be slower:
s := `{"somekey":"somevalue"}`
var result interface{}
err := json.Unmarshal([]byte(s), &result)
fmt.Println(result, err)
Output is the same. Try this on the Go Playground.
Note: if you're getting your JSON input string by reading some io.Reader (e.g. a file or a network connection), you can directly pass that io.Reader to json.NewDecoder(), without having to read the content from it first.
just use []byte(s) on the string. for example:
package main
import (
"encoding/json"
"fmt"
)
func main() {
s := `{"test":"ok"}`
var data map[string]interface{}
if err := json.Unmarshal([]byte(s), &data); err != nil {
panic(err)
}
fmt.Printf("json data: %v", data)
}
check it out on the playground here.

Unit testing http json response in Golang

I am using gin as my http server and sending back an empty array in json as my response:
c.JSON(http.StatusOK, []string{})
The resulting json string I get is "[]\n". The newline is added by the json Encoder object, see here.
Using goconvey, I could test my json like
So(response.Body.String(), ShouldEqual, "[]\n")
But is there a better way to generate the expected json string than just adding a newline to all of them?
You should first unmarshal the body of the response into a struct and compare against the resulting object. Example:
result := []string{}
if err := json.NewDecoder(response.Body).Decode(&result); err != nil {
log.Fatalln(err)
}
So(len(result), ShouldEqual, 0)
You may find jsonassert useful.
It has no dependencies outside the standard library and allows you to verify that JSON strings are semantically equivalent to a JSON string you expect.
In your case:
// white space is ignored, no need for \n
jsonassert.New(t).Assertf(response.Body().String(), "[]")
It can handle any form of JSON, and has very friendly assertion error messages.
Disclaimer: I wrote this package.
Unmarshal the body into a struct and the use Gocheck's DeepEquals
https://godoc.org/launchpad.net/gocheck
I made it this way. Because I don't want to include an extra library.
tc := testCase{
w: httptest.NewRecorder(),
wantResponse: mustJson(t, map[string]string{"message": "unauthorized"}),
}
...
if tc.wantResponse != tc.w.Body.String() {
t.Errorf("want %s, got %s", tt.wantResponse, tt.w.Body.String())
}
...
func mustJson(t *testing.T, v interface{}) string {
t.Helper()
out, err := json.Marshal(v)
if err != nil {
t.Fatal(err)
}
return string(out)
}

REST API for uploading file inside JSON

I am designing an REST API to upload a largish (100MB) file together with some information. So it's natural to think of json encoding.
So something like this:
{
file: content of the file or URL?
name: string
description: string
}
The name and description are easy to do with json but I'm not sure how the file content can be added to it.
Also I'm thinking I should use http PUT method. Is this correct?
Incidentally, golang is used to implement this API if it matters.
For a JSON encoding, use a []byte value to hold the file contents. The standard encoding/json package encodes []byte values as base64 strings.
Here's a sketch of how to implement the JSON encoding. Declare a type representing the payload:
type Upload struct {
Name string
Description string
Content []byte
}
To encode the file to a request body:
v := Upload{Name: fileName, Description: description, Content: content}
var buf bytes.Buffer
if err := json.NewEncoder(&buf).Encode(v); err != nil {
// handle error
}
req, err := http.NewRequest("PUT", url, &buf)
if err != nil {
// handle error
}
resp, err := http.DefaultClient.Do(req)
To decode the from a request body on the server:
var v Upload
if err := json.NewDecoder(req.Body).Decode(&v); err != nil {
// handle error
}
Another option is to use the mime/multipart package. The multipart encoding will be more efficient than JSON encoding because no base64 or other text encoding of the file is required for multipart.
To me, the most clear-cut way to do it would be to encode the file bytes somehow. base64 seems like a good choice, and golang has built-in support for it with "encoding/base64".