Go reading map from json stream - json

I need to parse a really long JSON file (more than a million items). I don't want to load it all into memory; I want to read it chunk by chunk. There's a good example with an array of items here. The problem is that I am dealing with a map, and when I call Decode I get "not at beginning of value".
I can't figure out what should be changed.
const data = `{
"object1": {"name": "cattle","location": "kitchen"},
"object2": {"name": "table","location": "office"}
}`
type ReadObject struct {
Name string `json:"name"`
Location string `json:"location"`
}
func ParseJSON() {
dec := json.NewDecoder(strings.NewReader(data))
tkn, err := dec.Token()
if err != nil {
log.Fatalf("failed to read opening token: %v", err)
}
fmt.Printf("opening token: %v\n", tkn)
objects := make(map[string]*ReadObject)
for dec.More() {
var nextSymbol string
if err := dec.Decode(&nextSymbol); err != nil {
log.Fatalf("failed to parse next symbol: %v", err)
}
nextObject := &ReadObject{}
if err := dec.Decode(&nextObject); err != nil {
log.Fatalf("failed to parse next object")
}
objects[nextSymbol] = nextObject
}
tkn, err = dec.Token()
if err != nil {
log.Fatalf("failed to read closing token: %v", err)
}
fmt.Printf("closing token: %v\n", tkn)
fmt.Printf("OBJECTS: \n%v\n", objects)
}

TL;DR: when you call the Token() method for the first time, you move the offset away from the beginning of a JSON value, and therefore the following Decode() returns the error.
You are working with this struct (link):
type Decoder struct {
// other fields omitted for simplicity
tokenState int
}
Pay attention to the tokenState field. Its value can be one of (link):
const (
tokenTopValue = iota
tokenArrayStart
tokenArrayValue
tokenArrayComma
tokenObjectStart
tokenObjectKey
tokenObjectColon
tokenObjectValue
tokenObjectComma
)
Let's get back to your code. You call the Token() method. It reads the first valid JSON token, {, and changes tokenState from tokenTopValue (the initial state) to tokenObjectStart (link). You are now in the "in-an-object" state.
If you call Decode() at this point, you get an error (not at beginning of value). This is because the only states in which Decode() is allowed are tokenTopValue, tokenArrayStart, tokenArrayValue and tokenObjectValue, i.e. positions where a "full" value starts, not a part of one (link).
To avoid this, you can simply not call Token() at all and do something like this:
dec := json.NewDecoder(strings.NewReader(dataMapFromJson))
objects := make(map[string]*ReadObject)
if err := dec.Decode(&objects); err != nil {
log.Fatalf("failed to parse next symbol: %v", err)
}
fmt.Printf("OBJECTS: \n%v\n", objects)
Or, if you want to read chunk by chunk, you could keep calling Token() until you reach a "full" value, and then call Decode() on that value (I guess this should work).

After consuming the initial { with your first call to dec.Token(), you must:
use dec.Token() to extract the next key
after extracting the key, call dec.Decode(&nextObject) to decode the entry's value
Example code:
for dec.More() {
key, err := dec.Token()
if err != nil {
// handle error
}
var val interface{}
err = dec.Decode(&val)
if err != nil {
// handle error
}
fmt.Printf(" %s : %v\n", key, val)
}
https://play.golang.org/p/5r1d8MsNlKb

Related

How can I insert json string to MongoDB?

I have a json string. Like this:
"{"http_requests":[{"http_requests":{"code":"400","method":"PUT","value":89}},{"http_requests":{"code":"200","method":"PUT","value":45}}]}"
I want to insert this JSON into MongoDB, but my code fails with an error.
The error is "cannot transform type string to a BSON Document: WriteString can only write while positioned on a Element or Value but is positioned on a TopLevel"
func insertJson(json_value string) {
client, err := mongo.NewClient(options.Client().ApplyURI("mongodb+srv://abc:123#cluster0.wrzj3zo.mongodb.net/?retryWrites=true&w=majority"))
if err != nil {
log.Fatal(err)
}
ctx, _ := context.WithTimeout(context.Background(), 10*time.Second)
err = client.Connect(ctx)
if err != nil {
log.Fatal(err)
}
defer client.Disconnect(ctx)
myDatabase := client.Database("my_db")
myCollection := myDatabase.Collection("my_collection")
myResult, err := myCollection.InsertOne(ctx, json_value)
if err != nil {
log.Fatal(err)
}
fmt.Println(myResult.InsertedID)
}
How do I insert this json string to mongodb?
First things first: add a ping right after defer client.Disconnect(ctx) to check that the connection actually succeeds.
if err = client.Ping(ctx, readpref.Primary()); err != nil {
log.Fatalf("ping failed: %v", err)
}
If that doesn't return an error, you can unmarshal your JSON string as explained in stackoverflow: How to insert a json object array to mongodb in golang. However, in this case, use interface{} instead of a slice, as follows:
var v interface{}
if err := json.Unmarshal([]byte(json_value), &v); err != nil {
log.Fatal(err)
}
Pass v to InsertOne.
Note: This is one way of solving the problem. However, the recommended way is to unmarshal the JSON into a Go struct with json and bson tags, and pass the struct instance(s) to InsertOne.
Some references:
Go by Example: JSON
How to Use Golang Structs With MongoDB
Use Struct Tags
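As a rough sketch of the struct-based approach recommended above, the JSON from the question could be mapped like this. The type names Request, Entry and Metrics and the ParseMetrics helper are hypothetical, chosen only for illustration; the example uses encoding/json alone, and the bson tags are plain struct tags that the MongoDB driver would read when the value is passed to InsertOne.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// Request, Entry and Metrics are hypothetical names mirroring the JSON
// layout shown in the question.
type Request struct {
	Code   string `json:"code" bson:"code"`
	Method string `json:"method" bson:"method"`
	Value  int    `json:"value" bson:"value"`
}

type Entry struct {
	HTTPRequests Request `json:"http_requests" bson:"http_requests"`
}

type Metrics struct {
	HTTPRequests []Entry `json:"http_requests" bson:"http_requests"`
}

// ParseMetrics turns the raw JSON string into a typed value.
func ParseMetrics(jsonValue string) (Metrics, error) {
	var m Metrics
	err := json.Unmarshal([]byte(jsonValue), &m)
	return m, err
}

func main() {
	const jsonValue = `{"http_requests":[{"http_requests":{"code":"400","method":"PUT","value":89}},{"http_requests":{"code":"200","method":"PUT","value":45}}]}`

	m, err := ParseMetrics(jsonValue)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%+v\n", m)
	// m is a struct rather than a string, so it is the kind of value that
	// would be passed as the document argument of InsertOne.
}
```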
The insertOne() method has the following syntax:
db.collection.insertOne(
<document>,
{
writeConcern: <document> (optional)
}
)
All you have to do is:
myCollection.insertOne(json_metrics)

Parse JSON having sibling dynamic keys alongside with static in Go

I need to parse this json
{
"version": "1.1.29-snapshot",
"linux-amd64": {
"url": "https://origin/path",
"size": 7794688,
"sha256": "14b3c3ad05e3a98d30ee7e774646aec7ffa8825a1f6f4d9c01e08bf2d8a08646"
},
"windows-amd64": {
"url": "https://origin/path",
"size": 8102400,
"sha256": "01b8b927388f774bdda4b5394e381beb592d8ef0ceed69324d1d42f6605ab56d"
}
}
Keys like linux-amd64 are dynamic and their number is arbitrary. I tried something like this to describe and unmarshal it. Obviously it doesn't work: Items is always empty.
type FileInfo struct {
Url string `json:"url"`
Size int64 `json:"size"`
Sha256 string `json:"sha256"`
}
type UpdateInfo struct {
Version string `json:"version"`
Items map[string]FileInfo
}
It's similar to this use case, but has no parent key items. I suppose I could use a 3rd party library or the map[string]interface{} approach, but I'm interested in knowing how to achieve this with explicitly declared types.
The rest of the parsing code is:
func parseUpdateJson(jsonStr []byte) (UpdateInfo, error) {
var allInfo = UpdateInfo{Items: make(map[string]FileInfo)}
var err = json.Unmarshal(jsonStr, &allInfo)
return allInfo, err
}
Look at the link I attached and you will realize that it is not as simple as you think. I also pointed out that I am interested in a typed approach. OK, how do I declare this map[string]FileInfo so that it gets parsed?
You can implement json.Unmarshaler to decode the JSON into a map, then apply those values to your struct: https://play.golang.org/p/j1JXMpc4Q9u
type FileInfo struct {
Url string `json:"url"`
Size int64 `json:"size"`
Sha256 string `json:"sha256"`
}
type UpdateInfo struct {
Version string `json:"version"`
Items map[string]FileInfo
}
func (i *UpdateInfo) UnmarshalJSON(d []byte) error {
tmp := map[string]json.RawMessage{}
err := json.Unmarshal(d, &tmp)
if err != nil {
return err
}
err = json.Unmarshal(tmp["version"], &i.Version)
if err != nil {
return err
}
delete(tmp, "version")
i.Items = map[string]FileInfo{}
for k, v := range tmp {
var item FileInfo
err := json.Unmarshal(v, &item)
if err != nil {
return err
}
i.Items[k] = item
}
return nil
}
This answer is adapted from this recipe in my YouTube video on advanced JSON handling in Go.
func (u *UpdateInfo) UnmarshalJSON(d []byte) error {
var x struct {
UpdateInfo
UnmarshalJSON struct{}
}
if err := json.Unmarshal(d, &x); err != nil {
return err
}
var y map[string]json.RawMessage
if err := json.Unmarshal(d, &y); err != nil {
return err
}
delete(y, "version") // We don't need this in the map
*u = x.UpdateInfo
u.Items = make(map[string]FileInfo, len(y))
for k, v := range y {
var info FileInfo
if err := json.Unmarshal(v, &info); err != nil {
return err
}
u.Items[k] = info
}
return nil
}
It:
Unmarshals the JSON into the struct directly, to get the struct fields.
It re-unmarshals into a map of map[string]json.RawMessage to get the arbitrary keys. This is necessary since the value of version is not of type FileInfo, and trying to unmarshal directly into map[string]FileInfo will thus error.
It deletes the keys we know we already got in the struct fields.
It then iterates through the map of string to json.RawMessage, and finally unmarshals each value into the FileInfo type, and stores it in the final object.
If you really don't want to unmarshal multiple times, your next best option is to iterate over the JSON tokens in your input by using the json.Decoder type. I've done this in a couple of performance-sensitive bits of code, but it makes your code INCREDIBLY hard to read, and in almost all cases is not worth the effort.

Parsing JSON concurrently - panic of runtime error (decoding related)

I was playing with Go recently and got stuck on a runtime error I can't explain. These are my working functions.
type User struct {
Browsers []string `json:"browsers"`
Name string `json:"name"`
Email string `json:"email"`
}
func asyncUserProcJson(wg *sync.WaitGroup, users *[]User, ch chan []byte) {
for buf := range ch {
var mu sync.Mutex
var user User
mu.Lock()
err := json.Unmarshal(buf, &user)
mu.Unlock()
if err != nil {
fmt.Println("json:", err)
wg.Done()
continue
}
*users = append(*users, user)
wg.Done()
}
}
func userProcJson(buf []byte) (User, error) {
var user User
err := json.Unmarshal(buf, &user)
if err != nil {
return User{}, err
}
return user, nil
}
If I use the usual, non-concurrent approach, it works as expected. But if I try to use a channel to pass the bytes to a goroutine, it fails.
type AsyncUserProc func(*sync.WaitGroup, *[]User, chan []byte)
type UserProc func(buf []byte) (User, error)
type SearchParams struct {
out io.Writer
asyncUserProc AsyncUserProc
userProc UserProc
}
func (sp SearchParams) AsyncSearch() []User {
file, err := os.Open(filePath)
if err != nil {
log.Fatalln(err)
}
var Users = make([]User, 0, 1024)
var ch = make(chan []byte)
var wg sync.WaitGroup
go sp.asyncUserProc(&wg, &Users, ch)
scanner := bufio.NewScanner(file)
for scanner.Scan() {
wg.Add(1)
ch <- scanner.Bytes()
}
if err := scanner.Err(); err != nil {
fmt.Fprintln(os.Stderr, "reading standard input:", err)
}
close(ch)
wg.Wait()
return Users
}
func (sp SearchParams) Search() []User {
file, err := os.Open(filePath)
if err != nil {
log.Fatalln(err)
}
// json processor
var Users = make([]User, 0, 1024)
scanner := bufio.NewScanner(file)
for scanner.Scan() {
u, err := sp.userProc(scanner.Bytes())
if err != nil {
log.Panicln(err)
continue
}
Users = append(Users, u)
}
if err := scanner.Err(); err != nil {
fmt.Fprintln(os.Stderr, "reading standard input:", err)
}
return Users
}
The workflow is as follows:
filePath contains a JSON chunks (each on new line)
Open for reading.
Create a line scanner
(AsyncSearch)
Pass line to channel.
return value of the line from range (blocking operation)
pass to json.Unmarshal
troubles
(Search)
Pass line directly to userProc func
Enjoy result
I am getting a lot of different errors:
a lot of JSON unmarshaling errors
index out of range
JSON decoder out of sync - data changing underfoot?
The description of the last error in the source:
// phasePanicMsg is used as a panic message when we end up with something that
// shouldn't happen. It can indicate a bug in the JSON decoder, or that
// something is editing the data slice while the decoder executes.
So here is my question: how is the byte slice being modified?
I thought sending on the channel was a blocking operation. What am I missing about the language mechanics?
Example of the errors (different each run)
json: invalid character 'i' looking for beginning of value
json: invalid character ':' after top-level value
json: invalid character 'r' looking for beginning of value
panic: runtime error: index out of range
----
json: invalid character '.' after top-level value
json: invalid character 'K' looking for beginning of value
panic: JSON decoder out of sync - data changing underfoot?
Package bufio
import "bufio"
func (*Scanner) Bytes
func (s *Scanner) Bytes() []byte
Bytes returns the most recent token generated by a call to Scan. The
underlying array may point to data that will be overwritten by a
subsequent call to Scan. It does no allocation.
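The key sentence is: "The underlying array may point to data that will be overwritten by a subsequent call to Scan." scanner.Bytes() returns a slice into the scanner's internal buffer. The channel send only blocks until the goroutine has received the slice; after that, the main loop calls Scan again and overwrites the buffer while json.Unmarshal is still reading it. Copy the bytes before sending them on the channel (or use scanner.Text(), which allocates a new string).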

send and read a [] byte between two microservices golang

I have a data encryption function that returns a [] byte. Of course, what has been encrypted must be decrypted (through another function) in another micro-service.
The problem arises when I send the []byte via JSON: the []byte is transformed into a string, and when I read the JSON back in the calling service, the result is no longer the same.
I have to be able to pass the original []byte, created by the encryption function, through JSON or otherwise pass the []byte through a call like the one you can see below. Another possibility is to change the decryption function, but I have not succeeded.
caller function
func Dati_mono(c *gin.Context) {
id := c.Param("id")
oracle, err := http.Get("http://XXXX/"+id)
if err != nil {
panic(err)
}
defer oracle.Body.Close()
oJSON, err := ioutil.ReadAll(oracle.Body)
if err != nil {
panic(err)
}
oracleJSON := security.Decrypt(oJSON, keyEn)
c.JSON(http.StatusOK, string(oJSON))
}
function that is called with the url
func Dati(c *gin.Context) {
var (
person Person
result mapstring.Dati_Plus
mmap []map[string]interface{}
)
rows, err := db.DBConor.Query("SELECT COD_DIPENDENTE, MATRICOLA, COGNOME FROM ANDIP021_K")
if err != nil {
fmt.Print(err.Error())
}
for rows.Next() {
err = rows.Scan(&person.COD_DIPENDENTE, &person.MATRICOLA, &person.COGNOME)
ciao := structs.Map(&person)
mmap = append(mmap, ciao)
}
defer rows.Close()
result = mapstring.Dati_Plus{
len(mmap),
mmap,
}
jsonEn := []byte(mapstring.Dati_PlustoStr(result))
keyEn := []byte(key)
cipherjson, err := security.Encrypt(jsonEn, keyEn)
if err != nil {
log.Fatal(err)
}
c.JSON(http.StatusOK, cipherjson)
}
encryption and decryption functions
func Encrypt(json []byte, key []byte) (string, error) {
k, err := aes.NewCipher(key)
if err != nil {
return "nil", err
}
gcm, err := cipher.NewGCM(k)
if err != nil {
return "nil", err
}
nonce := make([]byte, gcm.NonceSize())
if _, err = io.ReadFull(rand.Reader, nonce); err != nil {
return "nil", err
}
return gcm.Seal(nonce, nonce, json, nil), nil
}
func Decrypt(cipherjson []byte, key []byte) ([]byte, error) {
k, err := aes.NewCipher(key)
if err != nil {
return nil, err
}
gcm, err := cipher.NewGCM(k)
if err != nil {
return nil, err
}
nonceSize := gcm.NonceSize()
if len(cipherjson) < nonceSize {
return nil, errors.New("cipherjson too short")
}
nonce, cipherjson := cipherjson[:nonceSize], cipherjson[nonceSize:]
return gcm.Open(nil, nonce, cipherjson, nil)
}
Everything works; the problem arises when I write cipherjson with c.JSON(): the []byte is converted into a string.
When the calling function receives it, it is read as a string, and ioutil.ReadAll() produces the []byte of that string.
Instead, I must be able to pass to the Decrypt function exactly what the Encrypt function returned in the called function.
I hope I was clear, thanks in advance.
You are not decoding the response before decrypting. In other words, you are handing the JSON encoding of the ciphertext to Decrypt. That is obviously not going to do what you want. To recover the plaintext you have to precisely undo all of the operations of the encryption and encoding in reverse order.
Either decode before decrypting, or don't JSON encode on the server. For instance:
oJSON, err := ioutil.ReadAll(oracle.Body)
if err != nil {
panic(err)
}
var ciphertext string
if err := json.Unmarshal(oJSON, &ciphertext); err != nil {
// TODO: handle error
}
oracleJSON, err := security.Decrypt([]byte(ciphertext), keyEn)
if err != nil {
// TODO: handle error
}
Although it is unclear why you even go through the trouble of JSON encoding in the first place. You might as well just write the ciphertext directly. If you really want to encode the ciphertext, you should not convert it to a string. The ciphertext is just a bunch of random bytes, not remotely resembling a UTF-8 encoded string, so don't treat it like one. encoding/json uses the base64 encoding for byte slices automatically, which is a much cleaner (and probably shorter) representation of the ciphertext than tons of unicode escape sequences.
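Independent of the encoding you choose (if any), check how your Encrypt function uses Seal:
// The plaintext and dst must overlap exactly or not at all. To reuse
// plaintext's storage for the encrypted output, use plaintext[:0] as dst.
Seal(dst, nonce, plaintext, additionalData []byte) []byte
The first argument is the destination: Seal appends the encrypted output to dst and returns the extended slice. Passing nonce as dst, as Encrypt does, therefore yields the nonce followed by the ciphertext, which is exactly the layout Decrypt expects. If you did not want that prefix, you would pass json[:0] (when you don't need to retain the plaintext) or nil.
What is broken in Encrypt as posted is its return type: Seal returns a []byte, but the function is declared to return a string. Return []byte (and nil rather than "nil" on error) so the ciphertext is never treated as text.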

Custom marshalling to bson and JSON (Golang & mgo)

I have the following type in Golang:
type Base64Data []byte
In order to support unmarshalling a base64 encoded string to this type, I did the following:
func (b *Base64Data) UnmarshalJSON(data []byte) error {
if len(data) == 0 {
return nil
}
content, err := base64.StdEncoding.DecodeString(string(data[1 : len(data)-1]))
if err != nil {
return err
}
*b = content
return nil
}
Now I also want to be able to marshal and unmarshal it to mongo database, using mgo Golang library.
The problem is that I already have documents there stored as base64 encoded string, so I have to maintain that.
I tried to do the following:
func (b Base64Data) GetBSON() (interface{}, error) {
return base64.StdEncoding.EncodeToString([]byte(b)), nil
}
func (b *Base64DecodedXml) SetBSON(raw bson.Raw) error {
var s string
var err error
if err = raw.Unmarshal(&s); err != nil {
return err
}
*b, err = base64.StdEncoding.DecodeString(s)
return err
}
The idea is that after unmarshaling, the data is already decoded in memory, so I need to encode it again and return it as a string so that it is written to the db as a string (and vice versa when reading).
For that I implemented the bson getter and setter, but it seems only the getter is working properly.
JSON unmarshaling from a base64 encoded string works, as does marshaling it to the database, but the unmarshaling setter seems not to be called at all.
Can anyone suggest what I'm missing, so that I can hold the data decoded in memory but store it as an encoded string?
This is a test I tried to run:
b := struct {
Value shared.Base64Data `json:"value" bson:"value"`
}{}
s := `{"value": "PHJvb3Q+aGVsbG88L3Jvb3Q+"}`
require.NoError(t, json.Unmarshal([]byte(s), &b))
t.Logf("%v", string(b.Value))
b4, err := bson.Marshal(b)
require.NoError(t, err)
t.Logf("%v", string(b4))
require.NoError(t, bson.Unmarshal(b4, &b))
t.Logf("%v", string(b.Value))
You can't marshal any value with bson.Marshal(), only maps and struct values.
If you want to test it, pass a map, e.g. bson.M to bson.Marshal():
var x = Base64Data{0x01, 0x02, 0x03}
dd, err := bson.Marshal(bson.M{"data": x})
fmt.Println(string(dd), err)
Your code works as-is, and as you intend it to. Try to insert a wrapper value to verify it:
c := sess.DB("testdb").C("testcoll")
var x = Base64Data{0x01, 0x02, 0x03}
if err := c.Insert(bson.M{
"data": x,
}); err != nil {
panic(err)
}
This will save the data as a string, being the Base64 encoded form.
Of course, if you want to load it back into a value of type Base64Data, you will also need to define the SetBSON(raw Raw) error method (the bson.Setter interface).