Effectively remove invalid characters from JSON file? - json

I am reading in a file via the command line.
As the file is a JSON export from Oracle, it has a certain structure. This default structure is not valid JSON for some reason. Example:
// This isn't valid JSON
,"items":
[
{"id":123,"language":"ja-JP","location":"Osaka"}
,{"id":33,"language":"ja-JP","location":"Tokyo"}
,{"id":22,"language":"ja-JP","location":"Kentok"}
]}
I wish for it to only be an array of objects, giving the expected output:
// This is valid json
[
{"id":123,"language":"ja-JP","location":"Osaka"}
,{"id":33,"language":"ja-JP","location":"Tokyo"}
,{"id":22,"language":"ja-JP","location":"Kentok"}
]
Therefore, I need to remove line 1 (entirely) as well as the last } from the last line of the file.
The file is being parsed via commandline from the input:
file, err := ioutil.ReadFile(os.Args[1])
I am trying to remove the invalid strings/words this way, but it does not reformat anything:
// in func main()
removeInvalidJSON(file, os.Args[1])
// later on ..
func removeInvalidJSON(file []byte, path string) {
info, _ := os.Stat(path)
mode := info.Mode()
array := strings.Split(string(file), "\n")
fmt.Println(array)
//If we have the clunky items array which is invalid JSON, remove the first line
if strings.Contains(array[0], "items") {
fmt.Println("Removing items")
array = append(array[:1], array[1+1:]...)
}
// Finds the last index of the array
lastIndex := array[len(array)-1]
// If we have the "}" in the last line, remove it as this is invalid JSON
if strings.Contains(lastIndex, "}") {
fmt.Println("Removing }")
strings.Trim(lastIndex, "}")
}
// Nothing changed?
fmt.Println(array)
ioutil.WriteFile(path, []byte(strings.Join(array, "\n")), mode)
}
The above function does write to the file, as far as I can see - but it does not seem to alter the array, and the cleaned-up content never ends up in the file.
How do I effectively remove the first line of the file, as well as the stray closing curly brace } at the end?
I unmarshal the JSON in another function: is there a method of doing it more "cleanly" using the "encoding/json" library?

There are several significant issues with this code that cause it to behave not as intended. I've noted these with comments below:
func removeInvalidJSON(file []byte, path string) {
info, _ := os.Stat(path)
mode := info.Mode()
array := strings.Split(string(file), "\n")
fmt.Println(array)
//If we have the clunky items array which is invalid JSON, remove the first line
if strings.Contains(array[0], "items") {
fmt.Println("Removing items")
// If you just want to remove the first item, this should be array = array[1:].
// As written, this appends the rest of the array to the first item, i.e. nothing.
array = append(array[:1], array[1+1:]...)
}
// Finds the last ~index~ *line* of the array
lastIndex := array[len(array)-1]
// If we have the "}" in the last line, remove it as this is invalid JSON
if strings.Contains(lastIndex, "}") {
fmt.Println("Removing }")
// Strings are immutable. `strings.Trim` does nothing if you discard the return value
strings.Trim(lastIndex, "}")
// After the trim, if you want this to have any effect, you need to put it back in `array`.
}
// Nothing changed?
fmt.Println(array)
ioutil.WriteFile(path, []byte(strings.Join(array, "\n")), mode)
}
I think what you want is something more like:
func removeInvalidJSON(file []byte, path string) {
info, _ := os.Stat(path)
mode := info.Mode()
array := strings.Split(string(file), "\n")
fmt.Println(array)
//If we have the clunky items array which is invalid JSON, remove the first line
if strings.Contains(array[0], "items") {
fmt.Println("Removing items")
array = array[1:]
}
// Finds the last line of the array
lastLine := array[len(array)-1]
array[len(array)-1] = strings.Trim(lastLine, "}")
fmt.Println(array)
ioutil.WriteFile(path, []byte(strings.Join(array, "\n")), mode)
}

Related

How to append consecutively to a JSON file in Go?

I wonder how I can write consecutively to the same file in Go. Do I have to use os.WriteAt()?
The JSON is basically just an array filled with structs:
[
{
"Id": "2817293",
"Data": "XXXXXXX"
},
{
"Id": "2817438",
"Data": "XXXXXXX"
}
...
]
I want to write data to it consecutively, i.e. append to that JSON array more than once before closing the file.
The data I want to write to the file is a slice of said structs:
dataToWrite := []struct{
Id string
Data string
}{}
What is the proper way to write consecutively to a JSON array in Go?
My current approach creates multiple slices in the JSON file and thus is not what I want. The write process (lying in a for loop) looks like this:
...
// Read current state of file
data := []byte{}
f.Read(data)
// Write current state to slice
curr := []Result{}
json.Unmarshal(data, &curr)
// Append data to the created slice
curr = append(curr, *initArr...)
JSON, _ := JSONMarshal(curr)
// Empty data container
initArr = &[]Result{}
// Write
_, err := f.Write(JSON)
if err != nil {
log.Fatal(err)
}
...
Write the opening [ to the file and create an encoder on the file. Loop over the slices and over the elements of each slice, writing a comma before every element except the first, and encoding each element with the encoder. Finally, write the closing ].
_, err := f.WriteString("[")
if err != nil {
log.Fatal(err)
}
e := json.NewEncoder(f)
first := true
for i := 0; i < 10; i++ {
// Create dummy slice data for this iteration.
dataToWrite := []struct {
Id string
Data string
}{
{fmt.Sprintf("id%d.1", i), fmt.Sprintf("data%d.1", i)},
{fmt.Sprintf("id%d.2", i), fmt.Sprintf("data%d.2", i)},
}
// Encode each slice element to the file
for _, v := range dataToWrite {
// Write comma separator if not the first.
if !first {
_, err := f.WriteString(",\n")
if err != nil {
log.Fatal(err)
}
}
first = false
err := e.Encode(v)
if err != nil {
log.Fatal(err)
}
}
}
_, err = f.WriteString("]")
if err != nil {
log.Fatal(err)
}
https://go.dev/play/p/Z-T1nxRIaqL
If it's reasonable to hold all of the slice elements in memory, then simplify the code by encoding all of the data in a single batch:
type Item struct {
Id string
Data string
}
// Collect all items to write in this slice.
var result []Item
for i := 0; i < 10; i++ {
// Generate slice for this iteration.
dataToWrite := []Item{
{fmt.Sprintf("id%d.1", i), fmt.Sprintf("data%d.1", i)},
{fmt.Sprintf("id%d.2", i), fmt.Sprintf("data%d.2", i)},
}
// Append slice generated in this iteration to the result.
result = append(result, dataToWrite...)
}
// Write the result to the file.
err := json.NewEncoder(f).Encode(result)
if err != nil {
log.Fatal(err)
}
https://go.dev/play/p/01xmVZg7ePc
If you don't care about the existing file you can just use Encoder.Encode on the whole slice as #redblue mentioned.
If you have an existing file you want to append to, the simplest way is to do what you've shown in your edit: Unmarshal or Decoder.Decode the whole file into a slice of structs, append the new struct to the slice, and re-encode the whole lot using Marshal or Encoder.Encode.
If you have a large amount of data, you may want to consider using JSON Lines to avoid the trailing , and ] issue, and write one JSON object per line. Or you could use regular JSON, seek back from the end of the file so you're writing over the final ], then write a ,, the new JSON-encoded struct, and finally a ] to make the file a valid JSON array again.
So it depends a bit on your use case and the data size which approach you take.
NOTICE
This answer is a solution or workaround if you care about the content of an existing file!
It allows you to append to an existing JSON file created by your API.
Obviously, this only works for arrays of the same struct.
The actual working JSON format:
[
object,
...
object,
]
When writing to the file, do NOT write [ and ].
Just append your serialized JSON object to the file, followed by a ,.
Actual file content:
object,
...
object,
Finally, when reading the file, prepend [, drop the trailing comma, and append ].
This way you can write to the file from multiple sources and still have valid JSON.
Also, you can load the file and have valid input for your JSON processor.
We write our log files like this and provide valid JSON via REST calls, which is then processed (for example, by a JavaScript grid).

How to deserialize json comma-separated strings from CSV file

I have a mysql dump csv file containing two columns, json1 and json2, both columns are JSON objects string representations. So a csv row looks like the following:
"{"field1":"value","field2":4}","{"field1":"value","field2":4}"
I need to deserialize those two strings and then unmarshal the JSON into Go values. I'm stuck at the first step: I'm having trouble with the commas, since the JSON strings themselves contain commas, so the reader splits each line into the wrong number of fields - never the two I need.
Here is my full code:
reader := csv.NewReader(csvFile)
reader.LazyQuotes = true //allows non-doubled quotes to appear in quoted fields
for {
record, err := reader.Read()
if err == io.EOF {
break
}
if err != nil {
log.Fatal(err)
}
fmt.Printf("json1: %s json2 %s\n", record[0], record[1])
}
What I've tried
I've tried setting the csv delimiter to }","{ and then appending the corresponding } and { to the resulting strings but, besides it being prone to errors, some of the rows have a NULL json1 or json2.
Observations
I'm using
- golang 1.12.1
I would just use strings.Split() to split on }","{ (if you are sure that will always work), then unmarshal the JSON strings as you say. Can you get the dump file to have the nested quotes delimited somehow?
columns := strings.Split(`"{"field1":"value","field2":4}","{"field1":"value","field2":5}"`, `}","{`)
for i, s := range columns {
if i == 0 {
s = s[1:] // remove leading quote
}
if i == len(columns)-1 {
s = s[:len(s)-1] // remove trailing quote
}
if i > 0 {
s = "{" + s
}
if i < len(columns)-1 {
s += "}"
}
// unmarshal JSON ...
}
This is a bit of a kludge but should work even if some fields are NULL.

Check if JSON is Object or Array

Is there a simple way in Go to check whether given JSON is either an Object {} or array []?
The first thing that comes to mind is to json.Unmarshal() into an interface, and then see if it becomes a map, or a slice of maps. But that seems quite inefficient.
Could I just check if the first byte is a { or a [? Or is there a better way of doing this that already exists.
Use the following to detect if JSON text in the []byte value data is an array or object:
// Get slice of data with optional leading whitespace removed.
// See RFC 7159, Section 2 for the definition of JSON whitespace.
x := bytes.TrimLeft(data, " \t\r\n")
isArray := len(x) > 0 && x[0] == '['
isObject := len(x) > 0 && x[0] == '{'
This snippet of code handles optional leading whitespace and is more efficient than unmarshalling the entire value.
Because the top-level value in JSON can also be a number, string, boolean or null, it's possible that isArray and isObject both evaluate to false. Both values can also be false when the JSON is invalid.
Use a type switch to determine the type. This is similar to Xay's answer, but simpler:
var v interface{}
if err := json.Unmarshal(data, &v); err != nil {
// handle error
}
switch v := v.(type) {
case []interface{}:
// it's an array
case map[string]interface{}:
// it's an object
default:
// it's something else
}
Do step-by-step parsing of your JSON, using json.Decoder. This has the advantage over the other answers of:
Being more efficient than decoding the entire value
Using the official JSON parsing rules, and generating standard errors if you get invalid input.
Note, this code isn't tested, but should be enough to give you the idea. It can also be easily expanded to check for numbers, booleans, or strings, if desired.
func jsonType(in io.Reader) (string, error) {
dec := json.NewDecoder(in)
// Get just the first valid JSON token from input
t, err := dec.Token()
if err != nil {
return "", err
}
if d, ok := t.(json.Delim); ok {
// The first token is a delimiter, so this is an array or an object
switch d {
case '[':
return "array", nil
case '{':
return "object", nil
default: // ] or }, shouldn't be possible
return "", errors.New("unexpected delimiter")
}
}
return "", errors.New("input does not represent a JSON object or array")
}
Note that this consumes the first few bytes of in. It is an exercise for the reader to make a copy, if necessary. If you're trying to read from a byte slice ([]byte), convert it to a reader first:
t, err := jsonType(bytes.NewReader(myValue))
Go playground

Using go-jsonnet to return pure JSON

I am using Google's go-jsonnet library to evaluate some jsonnet files.
I have a function, like so, which renders a Jsonnet document:
// Takes a list of jsonnet files and imports each one and mixes them with "+"
func renderJsonnet(files []string, param string, prune bool) string {
// empty slice
jsonnetPaths := files[:0]
// range through the files
for _, s := range files {
jsonnetPaths = append(jsonnetPaths, fmt.Sprintf("(import '%s')", s))
}
// Create a JSonnet VM
vm := jsonnet.MakeVM()
// Join the slices into a jsonnet compat string
jsonnetImport := strings.Join(jsonnetPaths, "+")
if param != "" {
jsonnetImport = "(" + jsonnetImport + ")" + param
}
if prune {
// wrap in std.prune, to remove nulls, empty arrays and hashes
jsonnetImport = "std.prune(" + jsonnetImport + ")"
}
// render the jsonnet
out, err := vm.EvaluateSnippet("file", jsonnetImport)
if err != nil {
log.Panic("Error evaluating jsonnet snippet: ", err)
}
return out
}
This function currently returns a string, because the jsonnet EvaluateSnippet function returns a string.
What I now want to do is render that result JSON using the go-prettyjson library. However, because the JSON i'm piping in is a string, it's not rendering correctly.
So, some questions:
Can I convert the returned JSON string to a JSON type, without knowing beforehand what struct to unmarshal it into?
if not, can I render the json in a pretty manner some other way?
Is there an option, function or method I'm missing here to make this easier?
Can I convert the returned JSON string to a JSON type, without knowing beforehand what struct to unmarshal it into?
Yes. It's very easy:
var jsonOut interface{}
err := json.Unmarshal([]byte(out), &jsonOut)
if err != nil {
log.Panic("Invalid json returned by jsonnet: ", err)
}
formatted, err := prettyjson.Marshal(jsonOut)
if err != nil {
log.Panic("Failed to format jsonnet output: ", err)
}
More info here: https://blog.golang.org/json-and-go#TOC_5.
Is there an option, function or method I'm missing here to make this easier?
Yes. The go-prettyjson library has a Format function which does the unmarshalling for you:
formatted, err := prettyjson.Format([]byte(out))
if err != nil {
log.Panic("Failed to format jsonnet output: ", err)
}
can I render the json in a pretty manner some other way?
Depends on your definition of pretty. Jsonnet normally outputs every field of an object and every array element on a separate line. This is usually considered pretty printing (as opposed to putting everything on the same line with minimal whitespace to save a few bytes). I suppose this is not good enough for you. You can write your own manifester in jsonnet which formats it to your liking (see std.manifestJson as an example).

Parsing nested JSON objects in a CSV file with golang

I'm trying to parse a CSV file which contains a JSON object in the last column.
Here is an example with two rows from the input CSV file:
'id','value','createddate','attributes'
524256,CAFE,2018-04-06 16:41:01,{"Att1Numeric": 6, "Att2String": "abc"}
524257,BEBE,2018-04-06 17:00:00,{}
I tried using the parser from csv package:
func processFileAsCSV(f *multipart.Part) (int, error) {
reader := csv.NewReader(f)
reader.LazyQuotes = true
reader.Comma = ','
lineCount := 0
for {
line, err := reader.Read()
if err == io.EOF {
break
} else if err != nil {
fmt.Println("Error:", err)
return 0, err
}
if lineCount%100000 == 0 {
fmt.Println(lineCount)
}
lineCount++
fmt.Println(lineCount, line)
processLine(line) // do something with the line
}
fmt.Println("done!", lineCount)
return lineCount, nil
}
But I got an error:
Error: line 2, column 0: wrong number of fields in line,
probably because the parser doesn't know that the JSON object, which starts with {, should be treated as a single field.
Should I be writing my own CSV parser, or is there a library that can handle this?
Your CSV input doesn't follow normal CSV conventions, because the last field (the JSON) is unquoted even though it contains commas.
I think the best approach would be to pre-process your input, either in your Go program or in an external script.
If your CSV input is as predictable as your question indicates, it should be easy to properly quote the last element, using a simple strings.Split call, for instance, before passing it to the CSV parser.