Appending to json file without writing entire file - json

I have a json which contains one its attributes value as an array and I need to keep appending values to the array and write to a file. Is there a way I could avoid rewrite of the existing data and only append the new values?
----- Moving next question on different thread ---------------
what is recommended way for writing big data sets onto the file incremental file write or file dump at the end process?

A general solution makes the most sense if the existing JSON is actually an array, or if it's an object that has an array as the last or only pair, as in your case. Otherwise, you're inserting instead of appending. You probably don't want to read the entire file either.
One approach is not much different than what you were thinking, but handles several details
Read the end of the file to verify that it "ends with an array"
Retain that part
Position the file at that ending array bracket
Take the output from a standard encoder for an array of new data, dropping its opening bracket, and inserting a comma if necessary
The end of the the new output replaces the original ending array bracket
Tack the rest of the tail back on
import (
"bytes"
"errors"
"io"
"io/ioutil"
"os"
"regexp"
"unicode"
)
const (
tailCheckLen = 16
)
var (
arrayEndsObject = regexp.MustCompile("(\\[\\s*)?](\\s*}\\s*)$")
justArray = regexp.MustCompile("(\\[\\s*)?](\\s*)$")
)
type jsonAppender struct {
f *os.File
strippedBracket bool
needsComma bool
tail []byte
}
func (a jsonAppender) Write(b []byte) (int, error) {
trimmed := 0
if !a.strippedBracket {
t := bytes.TrimLeftFunc(b, unicode.IsSpace)
if len(t) == 0 {
return len(b), nil
}
if t[0] != '[' {
return 0, errors.New("not appending array: " + string(t))
}
trimmed = len(b) - len(t) + 1
b = t[1:]
a.strippedBracket = true
}
if a.needsComma {
a.needsComma = false
n, err := a.f.Write([]byte(", "))
if err != nil {
return n, err
}
}
n, err := a.f.Write(b)
return trimmed + n, err
}
func (a jsonAppender) Close() error {
if _, err := a.f.Write(a.tail); err != nil {
defer a.f.Close()
return err
}
return a.f.Close()
}
func JSONArrayAppender(file string) (io.WriteCloser, error) {
f, err := os.OpenFile(file, os.O_RDWR, 0664)
if err != nil {
return nil, err
}
pos, err := f.Seek(0, io.SeekEnd)
if err != nil {
return nil, err
}
if pos < tailCheckLen {
pos = 0
} else {
pos -= tailCheckLen
}
_, err = f.Seek(pos, io.SeekStart)
if err != nil {
return nil, err
}
tail, err := ioutil.ReadAll(f)
if err != nil {
return nil, err
}
hasElements := false
if len(tail) == 0 {
_, err = f.Write([]byte("["))
if err != nil {
return nil, err
}
} else {
var g [][]byte
if g = arrayEndsObject.FindSubmatch(tail); g != nil {
} else if g = justArray.FindSubmatch(tail); g != nil {
} else {
return nil, errors.New("does not end with array")
}
hasElements = len(g[1]) == 0
_, err = f.Seek(-int64(len(g[2])+1), io.SeekEnd) // 1 for ]
if err != nil {
return nil, err
}
tail = g[2]
}
return jsonAppender{f: f, needsComma: hasElements, tail: tail}, nil
}
Usage is then like in this test fragment
a, err := JSONArrayAppender(f)
if err != nil {
t.Fatal(err)
}
added := []struct {
Name string `json:"name"`
}{
{"Wonder Woman"},
}
if err = json.NewEncoder(a).Encode(added); err != nil {
t.Fatal(err)
}
if err = a.Close(); err != nil {
t.Fatal(err)
}
You can use whatever settings on the Encoder you want. The only hard-coded part is handling needsComma, but you can add an argument for that.

If your JSON array is simple you can use something like the following code. In this code, I create JSON array manually.
type item struct {
Name string
}
func main() {
fd, err := os.Create("hello.json")
if err != nil {
log.Fatal(err)
}
fd.Write([]byte{'['})
for i := 0; i < 10; i++ {
b, err := json.Marshal(item{
"parham",
})
if err != nil {
log.Fatal(err)
}
if i != 0 {
fd.Write([]byte{','})
}
fd.Write(b)
}
fd.Write([]byte{']'})
}
If you want to have a valid array in each step you can write ']' at the end of each iteration and then seek back on the start of the next iteration.

Related

Unmarshal nested map with int keys

If I have a map, I can Marshal it no problem:
package main
import (
"encoding/json"
"os"
)
type object map[int]interface{}
func main() {
obj := object{
1: "one", 2: object{3: "three"},
}
buf, err := json.Marshal(obj)
if err != nil {
panic(err)
}
os.Stdout.Write(buf) // {"1":"one","2":{"3":"three"}}
}
However I want to do the reverse. I tried this:
package main
import (
"encoding/json"
"fmt"
)
func main() {
buf := []byte(`{"1":"one","2":{"3":"three"}}`)
var obj map[int]interface{}
json.Unmarshal(buf, &obj)
// map[int]interface {}{1:"one", 2:map[string]interface {}{"3":"three"}}
fmt.Printf("%#v\n", obj)
}
Only the top level has the correct type. Is it possible to do what I am wanting?
JSON keys are never anything but strings, that's how the spec is defined, and you can see that in the output of you marshal. So when you try the reverse with interface{} as the top level map's value type, the type information for the nested objects is lost. You'd need a custom map type that implements UnmarshalJSON to be able to do what you want.
For example:
type IntKeyMap map[int]interface{}
func (m *IntKeyMap) UnmarshalJSON(data []byte) error {
raw := map[int]json.RawMessage{}
if err := json.Unmarshal(data, &raw); err != nil {
return err
}
for k, v := range raw {
// check if the value is a nested object
if len(v) > 0 && v[0] == '{' && v[len(v)-1] == '}' {
// The following assumes that the nested JSON object's
// key strings represent integers, if that is not the
// case this block will fail.
obj := IntKeyMap{}
if err := json.Unmarshal([]byte(v), &obj); err != nil {
return err
}
(*m)[k] = obj
} else {
var i interface{}
if err := json.Unmarshal([]byte(v), &i); err != nil {
return err
}
(*m)[k] = i
}
}
return nil
}
https://go.dev/play/p/lmyhqD__Uod

How do you append function result, while overwriting error?

Usually, result, err := func() is used.
When one of the variables is already initialized:
_, err := func()
var result string
result, err = func()
Doing:
result, err = func()
all_results += result // seems redundant and unneeded
How do you append results to one of them (result), and reset the other one?
// along the lines of this:
var result slice
// for loop {
result, _ += func() // combine this line
_, err = func() // with this line
Can you do:
result +=, err = func()
// or
result, err +=, = func()
// or
result, err += = func()
// or
result, err (+=, =) func() // ?
The language spec does not support different treatment for multiple return values.
However, it's very easy to do it with a helper function:
func foo() (int, error) {
return 1, nil
}
func main() {
var all int
add := func(result int, err error) error {
all += result
return err
}
if err := add(foo()); err != nil {
panic(err)
}
if err := add(foo()); err != nil {
panic(err)
}
if err := add(foo()); err != nil {
panic(err)
}
fmt.Println(all)
}
This will output 3 (try it on the Go Playground).
If you can move the error handling into the helper function, it can also look like this:
var all int
check := func(result int, err error) int {
if err != nil {
panic(err)
}
return result
}
all += check(foo())
all += check(foo())
all += check(foo())
fmt.Println(all)
This outputs the same, try this one on the Go Playground.
Another variant can be to do everything in the helper function:
var all int
handle := func(result int, err error) {
if err != nil {
panic(err)
}
all += result
}
handle(foo())
handle(foo())
handle(foo())
fmt.Println(all)
Try this one on the Go Playground.
See related: Multiple values in single-value context

Efficiently count the number of JSON objects in a file

I need to get the number of json objects in a given file. The File contains an array of JSON objects. I observe that its taking approximately 150-180 seconds to count a file with 1 million objects. Is there a way I can optimize the below code to get the count faster?
func Count(file string) (int, error) {
f, err := os.Open(file)
if err != nil {
return -1, err
}
defer f.Close()
dec := json.NewDecoder(bufio.NewReader(f))
_, e := dec.Token()
if e != nil {
return -1, e
}
var count int
for dec.More() {
var tempMap map[string]interface{}
readErr := dec.Decode(&tempMap)
if readErr != nil {
return -1, readErr
}
tranCount++
}
return count, nil
}
Speed things up by counting start object delimiters instead of decoding to Go values.
Based on the code in the question, it looks like your goal is to count objects at the first level of nesting in the document. Here's code that does that:
func Count(r io.Reader) (int, error) {
dec := json.NewDecoder(r)
nest := 0
count := 0
for {
t, err := dec.Token()
if err == io.EOF {
break
}
if err != nil {
return -1, err
}
switch t {
case json.Delim('{'):
if nest == 1 {
count++
}
nest++
case json.Delim('}'):
nest--
}
}
return count, nil
}
If your goal is to count all objects, remove all uses of nest from the code above:
func Count(r io.Reader) (int, error) {
dec := json.NewDecoder(r)
count := 0
for {
t, err := dec.Token()
if err == io.EOF {
break
}
if err != nil {
return -1, err
}
switch t {
case json.Delim('{'):
count++
}
}
return count, nil
}

Loading CSV data on server, converting data to JSON and getting result using Json Queries using Golang

I am trying to build a TCP server that loads dataset from a CSV file and provide an interface to query the dataset. TCP server will expose port 4040. CSV file contains the following columns related to corona virus cases:
Cumulative Test Positive
Cumulative Tests Performed
Date
Discharged
Expired
Admitted
Region
Users should be able to connect to the server using NetCat nc localhost 4040 command on Linux/Unix based systems.
Once connected to TCP, the user should be able to communicate with the application by sending queries in JSON format.
{
"query": {
"region": "Sindh"
}
}
{
"query": {
"date": "2020-03-20"
}
}
My server.go
package main
import (
"fmt"
"net"
"os"
"flag"
"log"
"encoding/csv"
"encoding/json"
"bufio"
"io"
"strings"
)
type CovidPatient struct {
Positive string `json:"Covid_Positive"`
Performed string `json:"Coivd_Performed"`
Date string `json:"Covid_Date"`
Discharged string `json:"Covid_Discharged"`
Expired string `json:"Covid_Expired"`
Region string `json:"Covid_Region"`
Admitted string `json:"Covid_Admitted"`
}
type DataRequest struct {
Get string `json:"get"`
}
type DataError struct {
Error string `json:"Covid_error"`
}
func Load(path string) []CovidPatient {
table := make([]CovidPatient, 0)
var patient CovidPatient
file, err := os.Open(path)
if err != nil {
panic(err.Error())
}
defer file.Close()
reader := csv.NewReader(file)
csvData, err := reader.ReadAll()
if err != nil {
fmt.Println(err)
os.Exit(1)
}
for _, row := range csvData{
patient.Positive = row[0]
patient.Performed = row[1]
patient.Date = row[2]
patient.Discharged = row[3]
patient.Expired = row[4]
patient.Region = row[5]
patient.Admitted = row[6]
table = append(table, patient)
}
return table
}
func Find(table []CovidPatient, filter string) []CovidPatient {
if filter == "" || filter == "*" {
return table
}
result := make([]CovidPatient, 0)
filter = strings.ToUpper(filter)
for _, cp := range table {
if cp.Date == filter ||
cp.Region == filter ||
strings.Contains(strings.ToUpper(cp.Positive), filter) ||
strings.Contains(strings.ToUpper(cp.Performed), filter) ||
strings.Contains(strings.ToUpper(cp.Date), filter) ||
strings.Contains(strings.ToUpper(cp.Discharged), filter) ||
strings.Contains(strings.ToUpper(cp.Expired), filter) ||
strings.Contains(strings.ToUpper(cp.Region), filter) ||
strings.Contains(strings.ToUpper(cp.Admitted), filter){
result = append(result, cp)
}
}
return result
}
var (
patientsDetail = Load("./covid_final_data.csv")
)
func main(){
var addr string
var network string
flag.StringVar(&addr, "e", ":4040", "service endpoint [ip addr or socket path]")
flag.StringVar(&network, "n", "tcp", "network protocol [tcp,unix]")
flag.Parse()
switch network {
case "tcp", "tcp4", "tcp6", "unix":
default:
fmt.Println("unsupported network protocol")
os.Exit(1)
}
ln, err := net.Listen(network, addr)
if err != nil {
log.Println(err)
os.Exit(1)
}
defer ln.Close()
log.Println("Covid19 Condition in Pakistan")
log.Printf("Service started: (%s) %s\n", network, addr)
for {
conn, err := ln.Accept()
if err != nil {
log.Println(err)
conn.Close()
continue
}
log.Println("Connected to ", conn.RemoteAddr())
go handleConnection(conn)
}
}
func handleConnection(conn net.Conn) {
defer func() {
if err := conn.Close(); err != nil {
log.Println("error closing connection:", err)
}
}()
reader := bufio.NewReaderSize(conn, 4)
for {
buf, err := reader.ReadSlice('}')
if err != nil {
if err != io.EOF {
log.Println("connection read error:", err)
return
}
}
reader.Reset(conn)
var req DataRequest
if err := json.Unmarshal(buf, &req); err != nil {
log.Println("failed to unmarshal request:", err)
cerr, jerr := json.Marshal(DataError{Error: err.Error()})
if jerr != nil {
log.Println("failed to marshal DataError:", jerr)
continue
}
if _, werr := conn.Write(cerr); werr != nil {
log.Println("failed to write to DataError:", werr)
return
}
continue
}
result := Find(patientsDetail, req.Get)
rsp, err := json.Marshal(&result)
if err != nil {
log.Println("failed to marshal data:", err)
if _, err := fmt.Fprintf(conn, `{"data_error":"internal error"}`); err != nil {
log.Printf("failed to write to client: %v", err)
return
}
continue
}
if _, err := conn.Write(rsp); err != nil {
log.Println("failed to write response:", err)
return
}
}
}
This correctly loads the csv and convert it into JSON. But, when I try to run query using NetCat command it return empty JSON element. Kindly guide me where is error.
Guess you want this:
╭─root#DESKTOP-OCDRD7Q ~
╰─# nc localhost 4040
{"get": "Sindh"}
[{"Covid_Positive":"1","Coivd_Performed":"1","Covid_Date":"1","Covid_Discharged":"1","Covid_Expired":"1","Covid_Region":"Sindh","Covid_Admitted":"1"}]
What you should do is just to modify your json request.
package main
import (
"bufio"
"encoding/csv"
"encoding/json"
"flag"
"fmt"
"io"
"log"
"net"
"os"
)
type CovidPatient struct {
Positive string `json:"Covid_Positive"`
Performed string `json:"Coivd_Performed"`
Date string `json:"Covid_Date"`
Discharged string `json:"Covid_Discharged"`
Expired string `json:"Covid_Expired"`
Region string `json:"Covid_Region"`
Admitted string `json:"Covid_Admitted"`
}
type DataRequest struct {
Get CovidPatient `json:"get"`
}
type DataError struct {
Error string `json:"Covid_error"`
}
func Load(path string) []CovidPatient {
table := make([]CovidPatient, 0)
var patient CovidPatient
file, err := os.Open(path)
if err != nil {
panic(err.Error())
}
defer file.Close()
reader := csv.NewReader(file)
csvData, err := reader.ReadAll()
if err != nil {
fmt.Println(err)
os.Exit(1)
}
for _, row := range csvData {
patient.Positive = row[0]
patient.Performed = row[1]
patient.Date = row[2]
patient.Discharged = row[3]
patient.Expired = row[4]
patient.Region = row[5]
patient.Admitted = row[6]
table = append(table, patient)
}
return table
}
func Find(table []CovidPatient, filter CovidPatient) []CovidPatient {
result := make([]CovidPatient, 0)
log.Println(filter, table)
for _, cp := range table {
if filter.Positive == "" {
} else if filter.Positive != cp.Positive {
continue
}
if filter.Performed == "" {
} else if filter.Performed != cp.Performed {
continue
}
if filter.Date == "" {
} else if filter.Date != cp.Date {
continue
}
if filter.Discharged == "" {
} else if filter.Discharged != cp.Discharged {
continue
}
if filter.Expired == "" {
} else if filter.Expired != cp.Expired {
continue
}
if filter.Region == "" {
} else if filter.Region != cp.Region {
continue
}
if filter.Admitted == "" {
} else if filter.Admitted != cp.Admitted {
continue
}
result = append(result, cp)
}
return result
}
var (
patientsDetail = Load("./covid_final_data.csv")
)
func main() {
log.SetFlags(log.Lshortfile | log.Ltime)
var addr string
var network string
flag.StringVar(&addr, "e", ":4040", "service endpoint [ip addr or socket path]")
flag.StringVar(&network, "n", "tcp", "network protocol [tcp,unix]")
flag.Parse()
switch network {
case "tcp", "tcp4", "tcp6", "unix":
default:
fmt.Println("unsupported network protocol")
os.Exit(1)
}
ln, err := net.Listen(network, addr)
if err != nil {
log.Println(err)
os.Exit(1)
}
defer ln.Close()
log.Println("Covid19 Condition in Pakistan")
log.Printf("Service started: (%s) %s\n", network, addr)
for {
conn, err := ln.Accept()
if err != nil {
log.Println(err)
conn.Close()
continue
}
log.Println("Connected to ", conn.RemoteAddr())
go handleConnection(conn)
}
}
func handleConnection(conn net.Conn) {
defer func() {
if err := conn.Close(); err != nil {
log.Println("error closing connection:", err)
}
}()
reader := bufio.NewReaderSize(conn, 100)
for {
buf, err := reader.ReadBytes('|')
if err != nil {
if err != io.EOF {
log.Println("connection read error:", err)
return
}
}
reader.Reset(conn)
var req DataRequest
if err := json.Unmarshal(buf[:len(buf)-1], &req); err != nil {
log.Println("failed to unmarshal request:", string(buf), err)
cerr, jerr := json.Marshal(DataError{Error: err.Error()})
if jerr != nil {
log.Println("failed to marshal DataError:", jerr)
continue
}
if _, werr := conn.Write(cerr); werr != nil {
log.Println("failed to write to DataError:", werr)
return
}
continue
}
result := Find(patientsDetail, req.Get)
rsp, err := json.Marshal(&result)
if err != nil {
log.Println("failed to marshal data:", err)
if _, err := fmt.Fprintf(conn, `{"data_error":"internal error"}`); err != nil {
log.Printf("failed to write to client: %v", err)
return
}
continue
}
if _, err := conn.Write(rsp); err != nil {
log.Println("failed to write response:", err)
return
}
}
}
The query is:
╭─root#DESKTOP-OCDRD7Q ~
╰─# nc localhost 4040 127 ↵
{
"get": {
"Covid_Region": "Sindh",
"Covid_Date": "2020-03-20"
}
}|
[{"Covid_Positive":"1","Coivd_Performed":"1","Covid_Date":"2020-03-20","Covid_Discharged":"1","Covid_Expired":"1","Covid_Region":"Sindh","Covid_Admitted":"1"}]
Inside function handleConnection, the first thing is "read until you find the first }", imagine the user is sending the request:
{ "get": { "Covid_Region": "Sindh", "Covid_Date": "2020-03-20" } }
then that step read:
{ "get": { "Covid_Region": "Sindh", "Covid_Date": "2020-03-20" }
Notice the trailing } is missing, then the json.Unmarshal is trying to unmarshal the query without the last } (which is an invalid json).
This problem can take advantage of JSON streaming decoding, in other words, use json.NewDecoder(r io.Reader) instead of json.Unmarshal. Let me copy and modify the first part of that function:
func handleConnection(conn net.Conn) {
defer func() {
if err := conn.Close(); err != nil {
log.Println("error closing connection:", err)
}
}()
jsonDecoder := json.NewDecoder(conn) // A json decoder read a stream to find a
// valid JSON and stop just the byte
// after the JSON ends. Process can be
// repeated.
for {
var req DataRequest
err := jsonDecoder.Decode(&req)
if err == io.EOF {
log.Println("finish")
return
}
if err != nil {
log.Println("unmarshal:", err)
return
}
result := Find(patientsDetail, req.Get) // Here query the system
// ...
Probably now it works, but you can also take advantage of json streaming to send the response back with a jsonEncoder := json.NewEncoder(conn) before de for loop and sending the request like this:
err := jsonEncoder.Encode(&result)
if err != nil {
log.Println("failed to marshal data:", err)
// ...
continue
}

detect duplicate in JSON String Golang

I have JSON string like
"{\"a\": \"b\", \"a\":true,\"c\":[\"field_3 string 1\",\"field3 string2\"]}"
how to detect the duplicate attribute in this json string using Golang
Use the json.Decoder to walk through the JSON. When an object is found, walk through keys and values checking for duplicate keys.
func check(d *json.Decoder, path []string, dup func(path []string) error) error {
// Get next token from JSON
t, err := d.Token()
if err != nil {
return err
}
// Is it a delimiter?
delim, ok := t.(json.Delim)
// No, nothing more to check.
if !ok {
// scaler type, nothing to do
return nil
}
switch delim {
case '{':
keys := make(map[string]bool)
for d.More() {
// Get field key.
t, err := d.Token()
if err != nil {
return err
}
key := t.(string)
// Check for duplicates.
if keys[key] {
// Duplicate found. Call the application's dup function. The
// function can record the duplicate or return an error to stop
// the walk through the document.
if err := dup(append(path, key)); err != nil {
return err
}
}
keys[key] = true
// Check value.
if err := check(d, append(path, key), dup); err != nil {
return err
}
}
// consume trailing }
if _, err := d.Token(); err != nil {
return err
}
case '[':
i := 0
for d.More() {
if err := check(d, append(path, strconv.Itoa(i)), dup); err != nil {
return err
}
i++
}
// consume trailing ]
if _, err := d.Token(); err != nil {
return err
}
}
return nil
}
Here's how to call it:
func printDup(path []string) error {
fmt.Printf("Duplicate %s\n", strings.Join(path, "/"))
return nil
}
...
data := `{"a": "b", "a":true,"c":["field_3 string 1","field3 string2"], "d": {"e": 1, "e": 2}}`
if err := check(json.NewDecoder(strings.NewReader(data)), nil, printDup); err != nil {
log.Fatal(err)
}
The output is:
Duplicate a
Duplicate d/e
Run it on the Playground
Here's how to generate an error on the first duplicate key:
var ErrDuplicate = errors.New("duplicate")
func dupErr(path []string) error {
return ErrDuplicate
}
...
data := `{"a": "b", "a":true,"c":["field_3 string 1","field3 string2"], "d": {"e": 1, "e": 2}}`
err := check(json.NewDecoder(strings.NewReader(data)), nil, dupErr)
if err == ErrDuplicate {
fmt.Println("found a duplicate")
} else if err != nil {
// some other error
log.Fatal(err)
}
One that would probably work well would be to simply decode, reencode, then check the length of the new json against the old json:
https://play.golang.org/p/50P-x1fxCzp
package main
import (
"encoding/json"
"fmt"
)
func main() {
jsn := []byte("{\"a\": \"b\", \"a\":true,\"c\":[\"field_3 string 1\",\"field3 string2\"]}")
var m map[string]interface{}
err := json.Unmarshal(jsn, &m)
if err != nil {
panic(err)
}
l := len(jsn)
jsn, err = json.Marshal(m)
if err != nil {
panic(err)
}
if l != len(jsn) {
panic(fmt.Sprintf("%s: %d (%d)", "duplicate key", l, len(jsn)))
}
}
The right way to do it would be to re-implement the json.Decode function, and store a map of keys found, but the above should work (especially if you first stripped any spaces from the json using jsn = bytes.Replace(jsn, []byte(" "), []byte(""), -1) to guard against false positives.