json.Unmarshal file data works but json.NewDecoder().Decode() does not

The following correctly unmarshals the struct:
func foo() {
	d, err := os.ReadFile("file.json")
	if err != nil {
		panic(err)
	}
	var t T
	if err := json.Unmarshal(d, &t); err != nil {
		panic(err)
	}
}
but this doesn't work and throws a bunch of classic JSON parsing errors, e.g. EOF, unexpected token 't', etc.:
func foo() {
	f, err := os.Open("file.json")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	var t T
	if err := json.NewDecoder(f).Decode(&t); err != nil {
		panic(err)
	}
}
Any idea why? The os.File or []byte is used in two goroutines at once, and the JSON has the following structure (with some fields omitted):
{
	"data": [
		{
			"field": "stuff",
			"num": 123
		},
		...
	]
}

The os.File or []byte is used in two goroutines at once...
That's the issue. os.File has an internal file pointer: the position where the next read happens. If two unrelated entities keep reading from it, they will not read overlapping data; bytes read by the first entity are not repeated for the second.
Also, os.File is not safe for concurrent use (its documentation doesn't explicitly state that it's safe): calling its methods from multiple, concurrent goroutines may result in a data race.
When you pass a []byte to multiple functions / goroutines which read "from it", there is no shared pointer or index variable. Each function / goroutine maintains its own index separately, and merely reading a variable from multiple goroutines is OK (which in this case means the fields of the slice header and the slice elements).
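A minimal sketch of the usual fix, assuming the goal is for each goroutine to decode the file independently: read the file into a []byte once, then give every goroutine its own bytes.Reader over the shared slice (T's shape here is guessed from the sample JSON above):

package main

import (
	"bytes"
	"encoding/json"
	"os"
	"sync"
)

// T is an assumption based on the sample JSON in the question.
type T struct {
	Data []struct {
		Field string `json:"field"`
		Num   int    `json:"num"`
	} `json:"data"`
}

func main() {
	data, err := os.ReadFile("file.json")
	if err != nil {
		panic(err)
	}
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each goroutine gets its own bytes.Reader, and with it its own
			// read position; the shared []byte itself is only ever read.
			var t T
			if err := json.NewDecoder(bytes.NewReader(data)).Decode(&t); err != nil {
				panic(err)
			}
		}()
	}
	wg.Wait()
}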

Related

Passing a csv.NewWriter() to another func in Golang to Write to file Asynchronously

I am making API calls (potentially thousands in a single job), and as they return and complete I'd like to write them to a single shared file (say CSV, for simplicity) instead of waiting for all of them to complete before writing.
How could I share a single csv.Writer in a way that effectively writes to a single file shared by many goroutines? This may be too daunting a task, but I was curious if there was a way to go about it.
package main

import (
	"encoding/csv"
	"os"
)

type Row struct {
	Field1 string
	Field2 string
}

func main() {
	file, _ := os.Create("file.csv")
	w := csv.NewWriter(file)

	// Some operations to create a slice of Row structs that will contain
	// the rows to write.
	var rowsToWrite []Row

	// Now let's iterate over and write to file.
	// Ideally, I'd like to do this in a goroutine, but I'm not entirely
	// sure about thread-safe writes.
	for _, r := range rowsToWrite {
		go func(row Row, writer ???) {
			err := writeToFile(row, writer)
			if err != nil {
				// Handle error
			}
		}(r, w)
	}
}

func writeToFile(row Row, writer ???) error {
	// Use the shared writer to maintain where I am at in the file so I
	// can append to the CSV.
	if err := writer.Write(row); err != nil {
		return err
	}
	return nil
}
Lots of back and forth on this one for me 🙂
I originally thought you could call the Write() method on a csv.Writer from multiple goroutines, but there are issues when the buffer flushes to disk while it is still being written to... not exactly sure.
Anyways, to get back to what you were originally asking for...
Still using the same setup to download Todo objects from https://jsonplaceholder.typicode.com, as an example:
type Todo struct {
	UserID    int    `json:"userId"`
	ID        int    `json:"id"`
	Title     string `json:"title"`
	Completed bool   `json:"completed"`
}

// toRecord converts a Todo struct to []string, for writing to CSV.
func (t Todo) toRecord() []string {
	userID := strconv.Itoa(t.UserID)
	id := strconv.Itoa(t.ID)
	completed := strconv.FormatBool(t.Completed)
	return []string{userID, id, t.Title, completed}
}

// getTodo gets the endpoint and unmarshals the response JSON into todo.
func getTodo(endpoint string) (todo Todo) {
	resp, err := http.Get(endpoint)
	if err != nil {
		log.Println("error:", err)
	}
	defer resp.Body.Close()
	if err := json.NewDecoder(resp.Body).Decode(&todo); err != nil {
		log.Println("error:", err)
	}
	return
}
The following:
- will start one "parent" goroutine to start filling the todos channel:
  - inside that goroutine, a goroutine will be started for each HTTP request, each sending its response Todo on todos;
  - the parent will wait till all the request goroutines are done;
  - when they're done, the parent will close the todos channel.
Meanwhile, main has moved on and is ranging over todos, picking a Todo off one at a time and writing it to the CSV.
When the original "parent" goroutine finally closes todos, the for loop will break, the writer does a final Flush(), and the program will complete.
func main() {
	todos := make(chan Todo)
	go func() {
		const nAPICalls = 200
		var wg sync.WaitGroup
		wg.Add(nAPICalls)
		for i := 0; i < nAPICalls; i++ {
			s := fmt.Sprintf("https://jsonplaceholder.typicode.com/todos/%d", i+1)
			go func(x string) {
				todos <- getTodo(x)
				wg.Done()
			}(s)
		}
		wg.Wait()
		close(todos)
	}()
	w := csv.NewWriter(os.Stdout)
	w.Write([]string{"UserID", "ID", "Title", "Completed"})
	for todo := range todos {
		w.Write(todo.toRecord())
	}
	w.Flush()
}
I would (personally) not have the same file open for writing at two separate points in the code. Depending on how the OS handles buffered writes, etc., you can end up with "interesting" things happening.
Given how you've described your goals, one might do something like the following (this is off the top of my head and not rigorously tested; a sketch follows below):
- Create a channel to queue blocks of text (I assume) to be written - make(chan []byte, depth) - where depth could be tuned based on some tests you'd run, presumably.
- Have a goroutine open a file handle for writing on your file, then read from that queueing channel, writing whatever it gets from the channel to that file.
- You could then have n goroutines writing to the queueing channel. Since only the one writer goroutine touches the file, you never need to worry about locks; if the producers outrun your ability to write, their sends simply block once the channel's buffer is full.
- If you did want to use locks instead of a channel, you'd need a sync.Mutex shared between all the goroutines responsible for writing.
Season to taste, obviously.
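A minimal sketch of that queue-and-single-writer pattern, assuming CSV rows ([]string) as the payload rather than raw []byte blocks; the file name, channel depth, and producer count are illustrative:

package main

import (
	"encoding/csv"
	"log"
	"os"
	"sync"
)

func main() {
	rows := make(chan []string, 64) // the queueing channel; depth is a guess

	file, err := os.Create("file.csv")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()

	// The single writer goroutine: the only code that touches the file.
	done := make(chan struct{})
	go func() {
		defer close(done)
		w := csv.NewWriter(file)
		for row := range rows {
			if err := w.Write(row); err != nil {
				log.Println("write error:", err)
			}
		}
		w.Flush()
		if err := w.Error(); err != nil {
			log.Println("flush error:", err)
		}
	}()

	// n producer goroutines enqueueing rows; no locks needed, and a send
	// simply blocks whenever the channel's buffer is full.
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			rows <- []string{"field1", "field2"} // stand-in for real data
		}()
	}
	wg.Wait()
	close(rows) // signals the writer that no more rows are coming
	<-done      // wait for the writer's final flush
}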

check json array length without unmarshalling

I've got a request body that is a JSON array of objects, something like:
{
	"data": [
		{
			"id": "1234",
			"someNestedObject": {
				"someBool": true,
				"randomNumber": 488
			},
			"timestamp": "2021-12-13T02:43:44.155Z"
		},
		{
			"id": "4321",
			"someNestedObject": {
				"someBool": false,
				"randomNumber": 484
			},
			"timestamp": "2018-11-13T02:43:44.155Z"
		}
	]
}
I want to get a count of the objects in the array and split them into separate JSON outputs to pass on to the next service. I'm doing this at the moment by unmarshalling the original JSON request body and then looping over the elements, marshalling each one again and attaching it to whatever outgoing message is being sent. Something like:
requestBodyBytes := []byte(JSON_INPUT_STRING)

type body struct {
	Foo []json.RawMessage `json:"foo"`
}

var inputs body
_ = json.Unmarshal(requestBodyBytes, &inputs)
for _, input := range inputs.Foo {
	re, _ := json.Marshal(input)
	// ... do something with re
}
What I'm seeing, though, is that the byte array before and after is different, even though the string representation is the same. I am wondering if there is a way to do this without altering the encoding, or whatever is happening here to change the bytes, to safeguard against any unwanted mutations. The actual JSON objects in the array will all have different shapes, so I can't use a structured JSON definition with field validations to help.
Also, the above code is just an example of what's happening, so if there are spelling or syntax errors please ignore them; the actual code works as described.
If you use json.RawMessage, the JSON source text will not be parsed but stored in it as-is (it's a []byte).
So if you want to distribute the same JSON array elements, you do not need to do anything with them; you may "hand them over" as-is, and the count you're after is simply len(inputs.Foo). You do not have to pass an element to json.Marshal(), it's already JSON-marshalled text.
So simply do:
for _, input := range inputs.Foo {
	// input is of type json.RawMessage, and it's already JSON text
}
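A self-contained sketch of the whole round trip, assuming the outer field is named data as in the sample request (the question's struct tags it foo; adjust the tag to match your payload):

package main

import (
	"encoding/json"
	"fmt"
)

const input = `{"data":[{"id":"1234"},{"id":"4321"}]}`

func main() {
	var body struct {
		Data []json.RawMessage `json:"data"`
	}
	if err := json.Unmarshal([]byte(input), &body); err != nil {
		panic(err)
	}

	// The count the question asks for, with no per-element unmarshalling.
	fmt.Println("count:", len(body.Data))

	// Each element is the original source text, byte for byte.
	for _, elem := range body.Data {
		fmt.Println(string(elem))
	}
}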
If you pass a json.RawMessage to json.Marshal(), it might get reencoded, e.g. compacted (which may result in a different byte sequence, but it will hold the same data as JSON).
Compacting might even be a good idea, as the original indentation might look weird taken out of the original context (object and array), also it'll be shorter. To simply compact a JSON text, you may use json.Compact() like this:
for _, input := range inputs.Foo {
	buf := &bytes.Buffer{}
	if err := json.Compact(buf, input); err != nil {
		panic(err)
	}
	fmt.Println(buf) // The compacted array element value
}
If you don't want to compact it but to indent the array elements on their own, use json.Indent() like this:
for _, input := range inputs.Foo {
	buf := &bytes.Buffer{}
	if err := json.Indent(buf, input, "", "  "); err != nil {
		panic(err)
	}
	fmt.Println(buf)
}
Using your example input, this is how the first array element looks (original, compacted, and indented):
Original:
{
	"id": "1234",
	"someNestedObject": {
		"someBool": true,
		"randomNumber": 488
	},
	"timestamp": "2021-12-13T02:43:44.155Z"
}
Compacted:
{"id":"1234","someNestedObject":{"someBool":true,"randomNumber":488},"timestamp":"2021-12-13T02:43:44.155Z"}
Indented:
{
  "id": "1234",
  "someNestedObject": {
    "someBool": true,
    "randomNumber": 488
  },
  "timestamp": "2021-12-13T02:43:44.155Z"
}
Try the examples on the Go Playground.
Also note that if you do decide to compact or indent the individual array elements in the loop, you may create a single bytes.Buffer before the loop and reuse it in each iteration, calling its Buffer.Reset() method to clear the previous element's data.
It could look like this:
buf := &bytes.Buffer{}
for _, input := range inputs.Foo {
	buf.Reset()
	if err := json.Compact(buf, input); err != nil {
		panic(err)
	}
	fmt.Println("Compacted:\n", buf)
}

Inserting JSON or a map from map[string]interface{} to MongoDB collection sets ints and floats as strings

I know the title seems generic and a duplicate, but I've tried many of the options from previous questions, and I can't use a struct here.
My system uses the messaging service NATS to send maps between a publisher and a subscriber. The subscriber takes the received map and inserts it as a document into a MongoDB collection.
The problem I have is that floats and ints are inserted as strings!
In my code, the recipe is a configuration file that sets the datatypes of the columns received in the map. Think of it as a series of keys like this:

"String column": "string",
"Int column": "int"
Here's the code that creates the map with the right datatypes:
mapWithCorrectDataTypes := make(map[string]interface{})
for columnNameFromDataTypesInRecipe, datatypeForColumnInRecipe := range dataTypesFromRecipeForColumns {
	for natsMessageColumn, natsMessageColumnValue := range mapFromNATSMessage {
		// If the column in the NATS message is found in the recipe,
		// format the data as dictated in the recipe.
		if natsMessageColumn == columnNameFromDataTypesInRecipe {
			if datatypeForColumnInRecipe.(string) == "string" {
				natsMessageColumnValue = natsMessageColumnValue.(string)
				mapWithCorrectDataTypes[columnNameFromDataTypesInRecipe] = natsMessageColumnValue
			}
			if datatypeForColumnInRecipe.(string) == "int" {
				convertedInt, err := strconv.Atoi(mapFromNATSMessage[columnNameFromDataTypesInRecipe].(string))
				if err != nil {
					fmt.Println("ERROR -->", err)
				}
				mapWithCorrectDataTypes[columnNameFromDataTypesInRecipe] = convertedInt
			}
			if datatypeForColumnInRecipe.(string) == "float64" {
				convertedFloat, err := strconv.ParseFloat(mapFromNATSMessage[columnNameFromDataTypesInRecipe].(string), 64)
				if err != nil {
					fmt.Println("ERROR -->", err)
				}
				mapWithCorrectDataTypes[columnNameFromDataTypesInRecipe] = convertedFloat
				fmt.Println("TYPE -->", reflect.TypeOf(mapWithCorrectDataTypes[columnNameFromDataTypesInRecipe]))
			}
		} else {
			// If the column is not found in the recipe, format it as a string.
			mapWithCorrectDataTypes[natsMessageColumn] = natsMessageColumnValue.(string)
		}
	}
}
In the last line above, I put in a print statement for float64s to check that the datatype for this key in the map is correct, and it passes this test!
My question is this: If the data types are correctly being set in the map, why when the map is inserted as a document in MongoDB are the floats and ints set as strings?!
What I have tried so far:
Marshalling and unmarshalling the map as an interface, then inserting the record:
jsonVersionOfMap, err := json.Marshal(mapWithCorrectDataTypes)
if err != nil {
	fmt.Println("ERROR -->", err)
}
var interfaceForJSON interface{}
json.Unmarshal(jsonVersionOfMap, &interfaceForJSON)
fmt.Println("JSON -->", interfaceForJSON)

err = mongoConnection.Insert(interfaceForJSON)
if err != nil {
	fmt.Println("Error inserting MongoDB documents", err)
}
What am I missing here?
See the result with the incorrectly formatted data:
This may not be a fix, but I've resolved the issue I've been having. I'm using a publisher and a subscriber via NATS. Previously, I was creating a map with all the data, sending that out as a message, and then the subscriber took the map from the message and processed the datatypes (on the subscriber side).
To fix the problem I was experiencing, I instead format the map's values on the publisher side: I moved the code that checks the datatypes over to the NATS publisher, out of the code that processes the incoming message.
I understand this isn't an ideal solution, but if you're using NATS and find you're having the same issue, try this.
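A minimal sketch of that publisher-side conversion, assuming a recipe of column name to type name as described above; the subject name, connection URL, and sample data are illustrative. Once the values are real ints and floats before json.Marshal, they travel as JSON numbers rather than strings and arrive in MongoDB as numeric types:

package main

import (
	"encoding/json"
	"log"
	"strconv"

	"github.com/nats-io/nats.go"
)

// convert applies the recipe's datatypes before the map is published,
// so numbers are marshalled as JSON numbers, not strings.
func convert(row map[string]string, recipe map[string]string) map[string]interface{} {
	out := make(map[string]interface{}, len(row))
	for col, val := range row {
		switch recipe[col] {
		case "int":
			if n, err := strconv.Atoi(val); err == nil {
				out[col] = n
				continue
			}
		case "float64":
			if f, err := strconv.ParseFloat(val, 64); err == nil {
				out[col] = f
				continue
			}
		}
		out[col] = val // default: leave as a string
	}
	return out
}

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	recipe := map[string]string{"Int column": "int", "Float column": "float64"}
	row := map[string]string{"Int column": "123", "Float column": "4.5", "Name": "x"}

	payload, err := json.Marshal(convert(row, recipe))
	if err != nil {
		log.Fatal(err)
	}
	// The subject name is illustrative.
	if err := nc.Publish("rows", payload); err != nil {
		log.Fatal(err)
	}
}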

Insert a slice result JSON into MongoDB

I'm using the mgo driver for MongoDB, with the Gin framework.
type Users struct {
	User_id *string  `json:"id user" bson:"id user"`
	Images  []string `json:"images" bson:"images"`
}
I have this function which tries to convert the slice into JSON.
The slice here is UsersTotal
func GetUsersApi(c *gin.Context) {
	UsersTotal, err := GetUsers()
	if err != nil {
		fmt.Println("error:", err)
	}
	c.JSON(http.StatusOK, gin.H{
		"Count Users": len(UsersTotal),
		"Users Found ": UsersTotal,
	})
	session, err := mgo.Dial(URL)
	if err == nil {
		fmt.Println("Connection to mongodb established ok!!")
		cc := session.DB("UsersDB").C("results")
		err22 := cc.Insert(&UsersTotal)
		if err22 != nil {
			fmt.Println("error insertion ", err22)
		}
	}
	session.Close()
}
Running it I get the following error:
error insertion Wrong type for documents[0]. Expected a object, got a array.
Inserting multiple documents is the same as inserting a single one because the Collection.Insert() method has a variadic parameter:
func (c *Collection) Insert(docs ...interface{}) error
One thing you should note is that it expects interface{} values. A value of any type qualifies to be an interface{}. Another thing you should note is that only the slice type []interface{} qualifies to be []interface{}; a user slice such as []Users does not. For details, see Type converting slices of interfaces in go.
So simply create a copy of your users slice where the copy has a type of []interface{}, and that you can directly pass to Collection.Insert():
docs := make([]interface{}, len(UsersTotal))
for i, u := range UsersTotal {
	docs[i] = u
}
err := cc.Insert(docs...)
// Handle error
Also, please do not connect to MongoDB in your handler. Do it once, on app startup, store the global connection / session, and clone / copy it when needed. For details, see mgo - query performance seems consistently slow (500-650ms) and too many open files in mgo go server.
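A minimal sketch of that dial-once pattern with mgo and Gin, reusing the names from the question (GetUsers is stubbed out, and the URL constant is illustrative):

package main

import (
	"fmt"
	"log"
	"net/http"

	"github.com/gin-gonic/gin"
	mgo "gopkg.in/mgo.v2"
)

const URL = "localhost" // MongoDB address; adjust to taste

type Users struct {
	User_id *string  `json:"id user" bson:"id user"`
	Images  []string `json:"images" bson:"images"`
}

// GetUsers is a stand-in for the question's function.
func GetUsers() ([]Users, error) { return nil, nil }

var session *mgo.Session // dialed once, at startup

func GetUsersApi(c *gin.Context) {
	// Copy the startup session per request; Close returns it to the pool.
	s := session.Copy()
	defer s.Close()

	UsersTotal, err := GetUsers()
	if err != nil {
		fmt.Println("error:", err)
	}

	docs := make([]interface{}, len(UsersTotal))
	for i, u := range UsersTotal {
		docs[i] = u
	}
	if err := s.DB("UsersDB").C("results").Insert(docs...); err != nil {
		fmt.Println("error insertion", err)
	}

	c.JSON(http.StatusOK, gin.H{
		"Count Users": len(UsersTotal),
		"Users Found": UsersTotal,
	})
}

func main() {
	var err error
	session, err = mgo.Dial(URL)
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	r := gin.Default()
	r.GET("/users", GetUsersApi)
	r.Run() // listens on :8080 by default
}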

goroutine channels over a for loop

My main function reads json from a file, unmarshals it into a struct, converts it into another struct type and spits out formatted JSON through stdout.
I'm trying to implement goroutines and channels to add concurrency to my for loop.
func main() {
	muvMap := map[string]string{"male": "M", "female": "F"}
	fileA, err := os.Open("serviceAfileultimate.json")
	if err != nil {
		panic(err)
	}
	defer fileA.Close()
	data := make([]byte, 10000)
	count, err := fileA.Read(data)
	if err != nil {
		panic(err)
	}
	dataBytes := data[:count]
	var servicesA ServiceA
	json.Unmarshal(dataBytes, &servicesA)
	var servicesB = make([]ServiceB, servicesA.Count)
	goChannels := make(chan ServiceB, servicesA.Count)
	for i := 0; i < servicesA.Count; i++ {
		go func() {
			reflect.ValueOf(&servicesB[i]).Elem().FieldByName("Address").SetString(Merge(&servicesA.Users[i].Location))
			reflect.ValueOf(&servicesB[i]).Elem().FieldByName("Date_Of_Birth").SetString(dateCopyTransform(servicesA.Users[i].Dob))
			reflect.ValueOf(&servicesB[i]).Elem().FieldByName("Email").SetString(servicesA.Users[i].Email)
			reflect.ValueOf(&servicesB[i]).Elem().FieldByName("Fullname").SetString(Merge(&servicesA.Users[i].Name))
			reflect.ValueOf(&servicesB[i]).Elem().FieldByName("Gender").SetString(muvMap[servicesA.Users[i].Gender])
			reflect.ValueOf(&servicesB[i]).Elem().FieldByName("Phone").SetString(servicesA.Users[i].Cell)
			reflect.ValueOf(&servicesB[i]).Elem().FieldByName("Username").SetString(servicesA.Users[i].Username)
			goChannels <- servicesB[i]
		}()
	}
	for index := range goChannels {
		json.NewEncoder(os.Stdout).Encode(index)
	}
}
It compiles but is returning messages like:
goroutine 1 [chan receive]: main.main() C://.....go.94 +0x55b.
The range over goChannels never terminates: range keeps receiving until the channel is closed, and nothing ever closes it, so once all the values have been received main blocks forever - that's the chan receive in your traceback. If you don't want a loop, you can just receive once and then print (note the variable can't be named json, or it would shadow the package):
b := <-goChannels
json.NewEncoder(os.Stdout).Encode(b)
Now I do need to point out that this code is not going to block until all work is done. If you want to keep reading until all work is done, you need some kind of coordination mechanism.
You'll often see things like
for {
	select {
	case json := <-jsonChannel:
		// do stuff
	case <-abort:
		// get out of here
	}
}
to deal with that. Also, just FYI: you're initializing your channel with a capacity (making it a buffered channel), which is pretty odd here. I'd recommend reviewing some tutorials on the topic, because overall your design needs some work to actually be an improvement over the non-concurrent implementation. Lastly, you can find libraries that abstract some of this work for you, and most people would probably recommend you do. Here's an example: https://github.com/lytics/squaredance
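For completeness, a sketch of how the question's loop could be restructured so the range does terminate - the same close-when-done pattern as the Todo example earlier: a parent goroutine waits for all workers and then closes the channel, and the loop variable is passed as a parameter so each goroutine gets its own i. This reuses the question's types and helpers, and collapses the reflection calls into plain field assignments, which assumes ServiceB really has those string fields:

goChannels := make(chan ServiceB)

go func() {
	var wg sync.WaitGroup
	for i := 0; i < servicesA.Count; i++ {
		wg.Add(1)
		go func(i int) { // pass i as a parameter so each goroutine has its own copy
			defer wg.Done()
			// Plain assignments instead of reflection; the field names are
			// assumed from the FieldByName calls in the question.
			b := ServiceB{
				Address:       Merge(&servicesA.Users[i].Location),
				Date_Of_Birth: dateCopyTransform(servicesA.Users[i].Dob),
				Email:         servicesA.Users[i].Email,
				Fullname:      Merge(&servicesA.Users[i].Name),
				Gender:        muvMap[servicesA.Users[i].Gender],
				Phone:         servicesA.Users[i].Cell,
				Username:      servicesA.Users[i].Username,
			}
			goChannels <- b
		}(i)
	}
	wg.Wait()
	close(goChannels) // lets the range below terminate
}()

enc := json.NewEncoder(os.Stdout)
for b := range goChannels {
	enc.Encode(b)
}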