Reading JSON into Go strings

So I have a JSON file in the format:
[
    {
        "Key": "Value",
        "Key2": "Value2",
        "Key3": "Value3"
    },
    {
        "Foo": "Bar",
        "Blah": 2
    }
]
I want to just read in the hash (object) parts of it and pass them along to an HTTP request, as with gorequest, since gorequest is fine with the JSON being just a string:
package main

import "github.com/parnurzeal/gorequest"

func main() {
    request := gorequest.New()
    resp, body, errs := request.Post("http://example.com").
        Set("Notes", "gorequest is coming!").
        Send(`{"Foo":"Bar","Blah":2}`).
        End()
    _, _, _ = resp, body, errs
}
I don't care what the JSON is, and I don't need to unmarshal it into any Go structs or anything of the sort; it's fine remaining a string, totally untouched, and just passed along with the request.
Everything I've seen online wants to unmarshal the JSON into Go structs, which makes sense if you care about what's actually in the JSON, but in my case that seems like unnecessary overhead.
How would I accomplish something like this? It seems pretty simple, but none of the existing JSON libraries for Go seem to handle it.
Thanks.

You are probably looking for json.RawMessage.
For example:
package main

import (
    "encoding/json"
    "fmt"
    "log"
)

func main() {
    txt := []byte(`
[
{"key1" : "value1" },
{"key2" : "value2" }
]`)
    msg := []json.RawMessage{}
    err := json.Unmarshal(txt, &msg)
    if err != nil {
        log.Fatal(err)
    }
    for _, c := range msg {
        fmt.Printf("%s\n", string(c))
    }
}
Note that the redundant whitespace separating the key/value pairs in the example is intentional: you will see it preserved in the output.
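For reference, the program prints each element verbatim, inner spaces included:

{"key1" : "value1" }
{"key2" : "value2" }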
Alternatively, even if you don't care about the exact structure, you can still dynamically poke at it by using an interface{} variable. See the JSON and Go document for a running example of this, under the Generic JSON with interface{} section.
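A minimal sketch of that approach, using the same kind of input as above: unmarshal into an interface{} and type-assert your way in.

package main

import (
    "encoding/json"
    "fmt"
    "log"
)

func main() {
    txt := []byte(`[{"key1":"value1"},{"key2":"value2"}]`)
    var v interface{}
    if err := json.Unmarshal(txt, &v); err != nil {
        log.Fatal(err)
    }
    // The decoder produces []interface{} for JSON arrays and
    // map[string]interface{} for JSON objects.
    for _, elem := range v.([]interface{}) {
        for k, val := range elem.(map[string]interface{}) {
            fmt.Printf("%s => %v\n", k, val)
        }
    }
}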
If you need a streaming approach, you may have to do something custom with the io.Reader. The standard JSON parser assumes it can represent everything in memory at once; if that assumption doesn't hold in your situation, you have to break a few things.
Perhaps we could manually consume bytes from the io.Reader until we eat the leading [ character, and then repeatedly call json.Decode on the rest of the io.Reader. Something like this:
package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "log"
)

func main() {
    var txt io.Reader = bytes.NewBufferString(`
[
{"key1" : "value1" },
{"key2" : "value2" }
]`)
    // Consume bytes until we see the opening '['.
    buf := make([]byte, 1)
    for {
        _, err := txt.Read(buf)
        if err != nil {
            log.Fatal(err)
        }
        if buf[0] == '[' {
            break
        }
    }
    for {
        decoder := json.NewDecoder(txt)
        msg := json.RawMessage{}
        err := decoder.Decode(&msg)
        if err != nil {
            break
        }
        fmt.Printf("I see: %s\n", string(msg))
        // Reclaim whatever the decoder read ahead, then skip past
        // the ',' between elements (or the closing ']').
        txt = decoder.Buffered()
        for {
            _, err := txt.Read(buf)
            if err != nil {
                log.Fatal(err)
            }
            if buf[0] == ',' || buf[0] == ']' {
                break
            }
        }
    }
}
This code is severely kludgy and non-obvious, and I don't think it's a good idea. If you have to deal with this in a streaming fashion, then a JSON array is likely the wrong serialization format for this scenario. If you have control over the input, you should consider changing it so it's more amenable to a streaming approach: hacks like the one above are a bad smell that the input is in the wrong shape.
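For instance, newline-delimited JSON (one bare value per line, no enclosing array) streams naturally, since a json.Decoder happily consumes concatenated values. A sketch:

package main

import (
    "encoding/json"
    "fmt"
    "io"
    "log"
    "strings"
)

func main() {
    // One JSON object per line, no surrounding array.
    input := strings.NewReader(`{"key1":"value1"}
{"key2":"value2"}`)
    dec := json.NewDecoder(input)
    for {
        var msg json.RawMessage
        if err := dec.Decode(&msg); err == io.EOF {
            break
        } else if err != nil {
            log.Fatal(err)
        }
        fmt.Printf("I see: %s\n", msg)
    }
}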

Here is what I was thinking of as a solution; does this look sane?
package main

import (
    "bytes"
    "encoding/csv"
    "flag"
    "fmt"
    "os"

    "github.com/parnurzeal/gorequest"
)

func process_line(headers []string, line []string) {
    // Build a one-element JSON array by hand from the CSV fields.
    // %q quotes and escapes the strings, so values containing
    // quotes still produce valid JSON.
    var buffer bytes.Buffer
    buffer.WriteString("[{")
    comma := ""
    for i := range headers {
        fmt.Fprintf(&buffer, "%s%q:%q", comma, headers[i], line[i])
        comma = ","
    }
    buffer.WriteString("}]")
    request := gorequest.New()
    _, _, errs := request.Post("http://www.something.com").
        Set("Content-Type", "application/json").
        Set("Accept", "application/json").
        Send(buffer.String()).End()
    if errs != nil {
        fmt.Println(errs)
    }
}

func main() {
    file := flag.String("file", "", "Filename?")
    flag.Parse()
    if *file == "" {
        fmt.Println("No file specified. :-(")
        os.Exit(1)
    }
    csvFile, err := os.Open(*file)
    if err != nil {
        fmt.Println(err)
        os.Exit(1)
    }
    defer csvFile.Close()
    reader := csv.NewReader(csvFile)
    i := 0
    var headers []string
    for {
        line, err := reader.Read()
        if err != nil {
            break
        }
        if i == 0 {
            headers = line
        } else {
            // Note: nothing waits for these goroutines (see below).
            go process_line(headers, line)
        }
        if i%100 == 0 {
            fmt.Printf("%v records processed.\n", i)
        }
        i++
    }
}
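One caveat with the above: main doesn't wait for the goroutines it spawns, so the program can exit while POSTs are still in flight. A minimal sketch of a fix using sync.WaitGroup:

var wg sync.WaitGroup

// inside the read loop, replacing: go process_line(headers, line)
wg.Add(1)
go func(line []string) {
    defer wg.Done()
    process_line(headers, line)
}(line)

// and after the loop, before main returns:
wg.Wait()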

Related

Unmarshaling from JSON key containing a single quote

I feel quite puzzled by this.
I need to load some data (coming from a French database) that is serialized in JSON and in which some keys have a single quote.
Here is a simplified version:
package main

import (
    "encoding/json"
    "fmt"
)

type Product struct {
    Name string `json:"nom"`
    Cost int64  `json:"prix d'achat"`
}

func main() {
    var p Product
    err := json.Unmarshal([]byte(`{"nom":"savon", "prix d'achat": 170}`), &p)
    fmt.Printf("product cost: %d\nerror: %s\n", p.Cost, err)
}

// product cost: 0
// error: %!s(<nil>)
Unmarshaling produces no error; however, "prix d'achat" (p.Cost) is not parsed.
When I unmarshal into a map[string]any, the "prix d'achat" key is parsed as I would expect:
package main

import (
    "encoding/json"
    "fmt"
)

func main() {
    blob := map[string]any{}
    err := json.Unmarshal([]byte(`{"nom":"savon", "prix d'achat": 170}`), &blob)
    fmt.Printf("blob: %f\nerror: %s\n", blob["prix d'achat"], err)
}

// blob: 170.000000
// error: %!s(<nil>)
I checked the json.Marshal documentation on struct tags and I cannot find any issue with the data I'm trying to process.
Am I missing something obvious here?
How can I parse a JSON key containing a single quote using struct tags?
Thanks a lot for any insight!
I didn't find anything in the documentation, but the JSON encoder treats the single quote as a reserved character in tag names:
func isValidTag(s string) bool {
    if s == "" {
        return false
    }
    for _, c := range s {
        switch {
        case strings.ContainsRune("!#$%&()*+-./:;<=>?@[]^_{|}~ ", c):
            // Backslash and quote chars are reserved, but
            // otherwise any punctuation chars are allowed
            // in a tag name.
        case !unicode.IsLetter(c) && !unicode.IsDigit(c):
            return false
        }
    }
    return true
}
I think opening an issue is justified here. In the meantime, you're going to have to implement json.Unmarshaler and/or json.Marshaler. Here is a start:
func (p *Product) UnmarshalJSON(b []byte) error {
    type product Product // prevent recursion
    var _p product
    if err := json.Unmarshal(b, &_p); err != nil {
        return err
    }
    *p = Product(_p)
    return unmarshalFieldsWithSingleQuotes(p, b)
}
func unmarshalFieldsWithSingleQuotes(dest interface{}, b []byte) error {
    // Look through the JSON tags. If there is one containing single quotes,
    // unmarshal b again, into a map this time. Then unmarshal the value
    // at the map key corresponding to the tag, if any.
    var m map[string]json.RawMessage
    t := reflect.TypeOf(dest).Elem()
    v := reflect.ValueOf(dest).Elem()
    for i := 0; i < t.NumField(); i++ {
        tag := t.Field(i).Tag.Get("json")
        if !strings.Contains(tag, "'") {
            continue
        }
        if m == nil {
            if err := json.Unmarshal(b, &m); err != nil {
                return err
            }
        }
        if j, ok := m[tag]; ok {
            if err := json.Unmarshal(j, v.Field(i).Addr().Interface()); err != nil {
                return err
            }
        }
    }
    return nil
}
Try it on the playground: https://go.dev/play/p/aupACXorjOO
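With those two methods in place, the example from the question decodes as expected (a quick check; the playground link has the full program):

var p Product
err := json.Unmarshal([]byte(`{"nom":"savon", "prix d'achat": 170}`), &p)
fmt.Printf("product cost: %d\nerror: %v\n", p.Cost, err)
// product cost: 170
// error: <nil>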

Parsing CSV file which includes a header block of "comments"

I'm trying to parse a CSV file hosted in a remote location, but annoyingly the file contains some readme comments atop the file, in the following format:
######
# some readme text,
# some more, comments
######
01-02-03,123,foo,http://example.com
04-05-06,789,baz,http://another.com
I'm attempting to use the following code to extract the URLs from the data, but it throws an error saying wrong number of fields, presumably because it's trying to parse the comment lines as CSV content.
type myData struct {
    URL string `json:"url"`
}

func doWork() ([]myData, error) {
    rurl := "https://example.com/some.csv"
    out := make([]myData, 0)
    resp, err := http.Get(rurl)
    if err != nil {
        return []myData{}, err
    }
    defer resp.Body.Close()
    reader := csv.NewReader(resp.Body)
    reader.Comma = ','
    data, err := reader.ReadAll()
    if err != nil {
        return []myData{}, err
    }
    for _, row := range data {
        out = append(out, myData{URL: row[3]}) // the URL is the fourth field
    }
    return out, nil
}

func main() {
    data, err := doWork()
    if err != nil {
        panic(err)
    }
    // do something with data
    _ = data
}
Is there a way to skip over the first N lines of the remote file, or to have it ignore lines which start with a #?
Oh, actually I just realised I can add this:
reader.Comment = '#' // ignore lines starting with '#'
which works perfectly with my current code, but I appreciate the other suggestions.
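For completeness, in context (same doWork as above):

reader := csv.NewReader(resp.Body)
reader.Comma = ','
reader.Comment = '#' // lines beginning with '#' are skipped entirely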
Well, the approach is simple: do not try to interpret the lines starting with '#' as part of the CSV stream; instead, treat the whole stream of data as a concatenation of two streams: the header and the actual CSV payload.
The easiest approach is probably to use the fact that bufio.Reader can read lines from its underlying stream and is itself an io.Reader, so you can have a csv.Reader read from it instead of the source stream.
So, you could roll like this (not real code, untested):
import (
    "bufio"
    "encoding/csv"
    "io"
    "strings"
)

func parse(r io.Reader) ([]myData, error) {
    br := bufio.NewReader(r)
    var line string
    for {
        s, err := br.ReadString('\n')
        if err != nil {
            return nil, err
        }
        if len(s) == 0 || s[0] != '#' {
            line = s
            break
        }
    }
    // At this point the line variable contains the 1st line of the CSV stream.
    // Let's create a "multi reader" which reads first from that line
    // and then from the rest of the CSV stream.
    cr := csv.NewReader(io.MultiReader(strings.NewReader(line), br))
    cr.Comma = ','
    data, err := cr.ReadAll()
    if err != nil {
        return nil, err
    }
    out := make([]myData, 0, len(data))
    for _, row := range data {
        out = append(out, myData{URL: row[3]})
    }
    return out, nil
}
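Wiring that into the HTTP fetch from the question might look like this (a sketch, same "untested" caveat applies):

resp, err := http.Get("https://example.com/some.csv")
if err != nil {
    return nil, err
}
defer resp.Body.Close()
return parse(resp.Body)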

Reading an external JSON file

I am trying to read the following JSON file:
{
    "a": 1,
    "b": 2,
    "c": 3
}
I have tried the following, but I found that I had to spell out each field of the JSON in a struct, and I really don't want to mirror my whole JSON file in my Go code.
package main

import (
    "encoding/json"
    "fmt"
    "io/ioutil"
)

type Data struct {
    A int `json:"a"`
    B int `json:"b"`
    C int `json:"c"`
}

func main() {
    file, err := ioutil.ReadFile("/path/to/file.json")
    if err != nil {
        panic(err)
    }
    data := Data{}
    if err := json.Unmarshal(file, &data); err != nil {
        panic(err)
    }
    fmt.Println(data.A, data.B, data.C)
}
Is there a way to bypass this thing with something like json.load(file) in Python?
If you only want to support integer values, you could unmarshal your data into a map[string]int. Note that map iteration order is not defined, so the output of the program below is non-deterministic for this input.
package main

import (
    "encoding/json"
    "fmt"
    "io/ioutil"
)

func main() {
    file, err := ioutil.ReadFile("/path/to/file.json")
    if err != nil {
        panic(err)
    }
    var data map[string]int
    if err := json.Unmarshal(file, &data); err != nil {
        panic(err)
    }
    for letter := range data {
        fmt.Println(letter)
    }
}
You can unmarshal any JSON data this way:
var data interface{}
if err := json.Unmarshal(..., &data); err != nil {
    // handle error
}
Going this route, though, you have to handle all the type assertions yourself, since you don't know up front what type the root value is or what its fields are. Worse, your data might not be a map at all: it can be any valid JSON value, such as an array, a string, or a number.
Here's a playground link: https://play.golang.org/p/DiceOv4sATO
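If you do go down that road, the usual pattern is a type switch over the decoded value. A minimal sketch, assuming data was decoded as above:

switch v := data.(type) {
case map[string]interface{}:
    for key, val := range v {
        fmt.Println(key, "=", val)
    }
case []interface{}:
    for i, val := range v {
        fmt.Println(i, ":", val)
    }
case string, float64, bool, nil:
    fmt.Println("scalar value:", v)
}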
It's impossible to make this as effortless as in Python, because Go is statically typed, so it's necessary to pass your target into the unmarshal function.
What you've written could otherwise be shortened, slightly, to something like this:
func UnmarshalJSONFile(path string, i interface{}) error {
    f, err := os.Open(path)
    if err != nil {
        return err
    }
    defer f.Close()
    return json.NewDecoder(f).Decode(i)
}
But then to use it, you would do this:
func main() {
    data := Data{}
    if err := UnmarshalJSONFile("/path/to/file.json", &data); err != nil {
        panic(err)
    }
}
But you can see that UnmarshalJSONFile is so simple it hardly warrants being a standard library function.

Check if strings are in JSON format

How can I check whether a given string consists of multiple JSON values separated by spaces or newlines?
For example,
given: "test" 123 {"Name": "mike"} (three JSON values separated by spaces)
return: true, since each item ("test", 123, and {"Name": "mike"}) is valid JSON.
In Go, I can write an O(N²) function like:
// check whether the given string is JSON, or multiple JSON values
// concatenated with spaces/newlines
func validateJSON(str string) error {
    // only one json string
    if isJSON(str) {
        return nil
    }
    // multiple json strings concatenated with spaces
    str = strings.TrimSpace(str)
    arr := []rune(str)
    start := 0
    end := 0
    for start < len(str) {
        for end < len(str) && !unicode.IsSpace(arr[end]) {
            end++
        }
        substr := str[start:end]
        if isJSON(substr) {
            for end < len(str) && unicode.IsSpace(arr[end]) {
                end++
            }
            start = end
        } else {
            if end == len(str) {
                return errors.New("error when parsing input: " + substr)
            }
            for end < len(str) && unicode.IsSpace(arr[end]) {
                end++
            }
        }
    }
    return nil
}

func isJSON(str string) bool {
    var js json.RawMessage
    return json.Unmarshal([]byte(str), &js) == nil
}
But this won't work for large input.
There are two options. The simplest, from a coding standpoint, is just to decode the JSON string normally. You can make this most efficient by decoding into an empty struct:
package main

import "encoding/json"

func main() {
    input := []byte(`{"a":"b", "c": 123}`)
    var x struct{}
    if err := json.Unmarshal(input, &x); err != nil {
        panic(err)
    }
    input = []byte(`{"a":"b", "c": 123}xxx`) // This one fails
    if err := json.Unmarshal(input, &x); err != nil {
        panic(err)
    }
}
(playground link)
This method has a few potential drawbacks:
- It only works with a single JSON object; a list of objects (as requested in the question) will fail without additional logic.
- As pointed out by @icza in the comments, it only works with JSON objects, so bare arrays, numbers, or strings will fail. To accommodate those types, interface{} must be used, which introduces the potential for some serious performance penalties.
- The throw-away x value must still be allocated, and at least one reflection call is likely under the covers, which may introduce a noticeable performance penalty for some workloads.
Given these limitations, my recommendation is to use the second option: loop through the entire JSON input, ignoring the actual contents. This is made simple with the standard library json.Decoder:
package main

import (
    "bytes"
    "encoding/json"
    "io"
)

func main() {
    input := []byte(`{"a":"b", "c": 123}`)
    dec := json.NewDecoder(bytes.NewReader(input))
    for {
        _, err := dec.Token()
        if err == io.EOF {
            break // End of input, valid JSON
        }
        if err != nil {
            panic(err) // Invalid input
        }
    }
    input = []byte(`{"a":"b", "c": 123}xxx`) // This input fails
    dec = json.NewDecoder(bytes.NewReader(input))
    for {
        _, err := dec.Token()
        if err == io.EOF {
            break // End of input, valid JSON
        }
        if err != nil {
            panic(err) // Invalid input
        }
    }
}
(playground link)
As Volker mentioned in the comments, use a *json.Decoder to decode all the JSON documents in your input successively:
package main

import (
    "encoding/json"
    "io"
    "log"
    "strings"
)

func main() {
    input := `"test" 123 {"Name": "mike"}`
    dec := json.NewDecoder(strings.NewReader(input))
    for {
        var x json.RawMessage
        switch err := dec.Decode(&x); err {
        case nil:
            // not done yet
        case io.EOF:
            return // success
        default:
            log.Fatal(err)
        }
    }
}
Try it on the playground: https://play.golang.org/p/1OKOii9mRHn
Try fastjson.Scanner:
s := `"test" 123 {"Name": "mike"}`
var sc fastjson.Scanner
sc.Init(s)
// Iterate over the stream of JSON values.
for sc.Next() {
}
if sc.Error() == nil {
    fmt.Println("ok")
} else {
    fmt.Println("invalid:", sc.Error())
}

Preserve int64 values when parsing JSON in Go

I'm processing a JSON POST in Go that contains an array of objects holding 64-bit integers. When using json.Unmarshal, these values seem to be converted to float64, which isn't very helpful.
body := []byte(`{"tags":[{"id":4418489049307132905},{"id":4418489049307132906}]}`)
var dat map[string]interface{}
if err := json.Unmarshal(body, &dat); err != nil {
    panic(err)
}
tags := dat["tags"].([]interface{})
for i, tag := range tags {
    // This type assertion panics: the decoder produced a float64, not an int64.
    fmt.Println("tag: ", i, " id: ", tag.(map[string]interface{})["id"].(int64))
}
Is there any way to preserve the original int64 in the output of json.Unmarshal?
Go Playground of above code
Solution 1
You can use a Decoder with UseNumber to decode your numbers without loss.
The Number type is defined like this:
// A Number represents a JSON number literal.
type Number string
which means you can easily convert it:
package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "strconv"
)

func main() {
    body := []byte(`{"tags":[{"id":4418489049307132905},{"id":4418489049307132906}]}`)
    dat := make(map[string]interface{})
    d := json.NewDecoder(bytes.NewBuffer(body))
    d.UseNumber()
    if err := d.Decode(&dat); err != nil {
        panic(err)
    }
    tags := dat["tags"].([]interface{})
    n := tags[0].(map[string]interface{})["id"].(json.Number)
    i64, _ := strconv.ParseInt(string(n), 10, 64)
    fmt.Println(i64) // prints 4418489049307132905
}
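Note that json.Number also has Int64 and Float64 convenience methods, so the strconv call can be avoided:

i64, err := n.Int64() // n is the json.Number from above
if err != nil {
    panic(err)
}
fmt.Println(i64) // 4418489049307132905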
Solution 2
You can also decode into a specific structure tailored to your needs:
package main

import (
    "encoding/json"
    "fmt"
)

type A struct {
    Tags []map[string]uint64 // "tags"
}

func main() {
    body := []byte(`{"tags":[{"id":4418489049307132905},{"id":4418489049307132906}]}`)
    var a A
    if err := json.Unmarshal(body, &a); err != nil {
        panic(err)
    }
    fmt.Println(a.Tags[0]["id"]) // prints 4418489049307132905
}
Personally I generally prefer this solution which feels more structured and easier to maintain.
Caution
A small note if you use JSON because your application is partly in JavaScript: JavaScript has no 64-bit integers, only a single number type, the IEEE 754 double-precision float. So you would not be able to parse this JSON in JavaScript without loss using the standard parsing function.
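To see the loss concretely, a quick check in Go using the same IEEE 754 double that JavaScript uses:

id := int64(4418489049307132905)
f := float64(id)
fmt.Println(int64(f))       // 4418489049307132928, the nearest representable double
fmt.Println(int64(f) == id) // false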
An easier one:
body := []byte(`{"tags":[{"id":4418489049307132905},{"id":4418489049307132906}]}`)
var dat map[string]interface{}
if err := json.Unmarshal(body, &dat); err != nil {
    panic(err)
}
tags := dat["tags"].([]interface{})
for i, tag := range tags {
    // Caution: float64 is exact only up to 2^53, so IDs this large print
    // rounded values (4418489049307132928 here, not ...905).
    fmt.Printf("tag: %v, id: %.0f\n", i, tag.(map[string]interface{})["id"].(float64))
}
I realize this is very old, but this is the solution I ended up using:
/*
skipping previous code; this just converts the float to an int,
if the value is the same with or without what's after the
decimal point
*/
f := tag.(map[string]interface{})["id"].(float64)
if math.Floor(f) == f {
    // Note: the value has already been through float64, so precision
    // above 2^53 is lost before this check runs.
    fmt.Println("int tag: ", i, " id: ", int64(f))
} else {
    fmt.Println("tag: ", i, " id: ", f)
}