Time JSON marshals to 0 time - json

I have the following code which primarily marshals and un-marshals a time struct. Here is the code
package main
import (
"fmt"
"time"
"encoding/json"
)
type check struct{
A time.Time `json:"a"`
}
func main(){
ds := check{A:time.Now().Truncate(0)}
fmt.Println(ds)
dd, _ := json.Marshal(ds)
d2 := check {}
json.Unmarshal(dd, d2)
fmt.Println(d2)
}
Here is the output it produces:
{2019-05-20 15:20:16.247914 +0530 IST}
{0001-01-01 00:00:00 +0000 UTC}
The first line is the original time and the second line is the time after unmarshalling. Why is this information lost in the JSON conversion, and how can I prevent it?
Thanks.

Go vet tells you exactly what the problem is:
./prog.go:18:16: call of Unmarshal passes non-pointer as second argument
Also never ignore errors! The least you can do is print it:
ds := check{A: time.Now().Truncate(0)}
fmt.Println(ds)
dd, err := json.Marshal(ds)
fmt.Println(err)
d2 := check{}
err = json.Unmarshal(dd, d2)
fmt.Println(err)
fmt.Println(d2)
This will output (try it on the Go Playground):
{2009-11-10 23:00:00 +0000 UTC}
<nil>
json: Unmarshal(non-pointer main.check)
{0001-01-01 00:00:00 +0000 UTC}
You have to pass a pointer to json.Unmarshal() for it to be able to unmarshal into (change) your value:
err = json.Unmarshal(dd, &d2)
With this change output will be (try it on the Go Playground):
{2009-11-10 23:00:00 +0000 UTC}
<nil>
<nil>
{2009-11-10 23:00:00 +0000 UTC}

Writing data from bigquery to csv is slow

I wrote code that behaves strangely and runs slowly, and I can't understand why.
What I'm trying to do is download data from BigQuery (using a query as input) to a CSV file, then create a URL for this CSV so people can download it as a report.
I'm trying to optimize the process of writing the CSV, as it takes some time and shows some weird behavior.
The code iterates over the BigQuery results and passes each result to a channel for later parsing/writing using Go's encoding/csv package.
These are the relevant parts, with some debugging:
func (s *Service) generateReportWorker(ctx context.Context, query, reportName string) error {
it, err := s.bigqueryClient.Read(ctx, query)
if err != nil {
return err
}
filename := generateReportFilename(reportName)
gcsObj := s.gcsClient.Bucket(s.config.GcsBucket).Object(filename)
wc := gcsObj.NewWriter(ctx)
wc.ContentType = "text/csv"
wc.ContentDisposition = "attachment"
csvWriter := csv.NewWriter(wc)
var doneCount uint64
go backgroundTimer(ctx, it.TotalRows, &doneCount)
rowJobs := make(chan []bigquery.Value, it.TotalRows)
workers := 10
wg := sync.WaitGroup{}
wg.Add(workers)
// start workers pool
for i := 0; i < workers; i++ {
go func(c context.Context, num int) {
defer wg.Done()
for row := range rowJobs {
records := make([]string, len(row))
for j, r := range row {
records[j] = fmt.Sprintf("%v", r)
}
s.mu.Lock()
start := time.Now()
if err := csvWriter.Write(records); err != nil {
log.Errorf("Error writing row: %v", err)
}
if time.Since(start) > time.Second {
fmt.Printf("worker %d took %v\n", num, time.Since(start))
}
s.mu.Unlock()
atomic.AddUint64(&doneCount, 1)
}
}(ctx, i)
}
// read results from bigquery and add to the pool
for {
var row []bigquery.Value
if err := it.Next(&row); err != nil {
if err == iterator.Done || err == context.DeadlineExceeded {
break
}
log.Errorf("Error loading next row from BQ: %v", err)
}
rowJobs <- row
}
fmt.Println("***done loop!***")
close(rowJobs)
wg.Wait()
csvWriter.Flush()
wc.Close()
url := fmt.Sprintf("%s/%s/%s", s.config.BaseURL, s.config.GcsBucket, filename)
/// ....
}
func backgroundTimer(ctx context.Context, total uint64, done *uint64) {
ticker := time.NewTicker(10 * time.Second)
go func() {
for {
select {
case <-ctx.Done():
ticker.Stop()
return
case _ = <-ticker.C:
fmt.Printf("progress (%d,%d)\n", atomic.LoadUint64(done), total)
}
}
}()
}
The BigQuery Read function:
func (c *Client) Read(ctx context.Context, query string) (*bigquery.RowIterator, error) {
job, err := c.bigqueryClient.Query(query).Run(ctx)
if err != nil {
return nil, err
}
it, err := job.Read(ctx)
if err != nil {
return nil, err
}
return it, nil
}
I ran this code with a query that returns about 400,000 rows. The query itself takes around 10 seconds, but the whole process takes around 2 minutes.
The output:
progress (112346,392565)
progress (123631,392565)
***done loop!***
progress (123631,392565)
progress (123631,392565)
progress (123631,392565)
progress (123631,392565)
progress (123631,392565)
progress (123631,392565)
progress (123631,392565)
worker 3 took 1m16.728143875s
progress (247525,392565)
progress (247525,392565)
progress (247525,392565)
progress (247525,392565)
progress (247525,392565)
progress (247525,392565)
progress (247525,392565)
worker 3 took 1m13.525662666s
progress (370737,392565)
progress (370737,392565)
progress (370737,392565)
progress (370737,392565)
progress (370737,392565)
progress (370737,392565)
progress (370737,392565)
progress (370737,392565)
worker 4 took 1m17.576536375s
progress (392565,392565)
You can see that writing the first 112346 rows was fast. Then, for some reason, worker 3 took about 1 minute 16 seconds (!!!) to write a single row, which made the other workers wait for the mutex to be released. This happened two more times, which caused the whole process to take more than 2 minutes to finish.
I'm not sure what's going on or how I can debug this further. Why do I get these stalls in the execution?
As suggested by @serge-v, you can write all the records to a local file and then transfer the file as a whole to GCS. To shorten the process further, you can split the file into multiple chunks and upload them with gsutil -m cp -j, where:
gsutil is used to access cloud storage from command line
-m is used to perform a parallel multi-threaded/multi-processing copy
cp is used to copy files
-j <ext,...> applies gzip transport encoding to any uploaded file whose extension is in the given list (for example, -j csv). This saves network bandwidth while leaving the data uncompressed in Cloud Storage.
To run this command from your Go program, you can refer to this GitHub link.
You could also try profiling your Go program. Profiling shows where the program spends its time, which helps you locate the bottleneck.
Since you are reading a large number of rows from BigQuery, you can try using the BigQuery Storage API. It provides faster access to BigQuery-managed storage than bulk data export. Using the BigQuery Storage API rather than the iterators you are using in your Go program can make the process faster.
For more reference you can also look into the Query Optimization techniques provided by BigQuery.

Visually align TSV using tabs

I have a text file with fields separated by some number of consecutive tabs (so that the fields are all visually aligned). I'd like to add a lot of new fields to it from another (not aligned, pure TSV) file, while keeping everything aligned. A lot of values contain spaces, so only tabs (with an assumed width of 8) can be used for alignment, because I want to be able to parse the file later by splitting each line on any number of consecutive tabs. This means that I can't use tools like column or tsv-pretty, as they use spaces for alignment. Is there a tool or a short script I can use to achieve what I want?
Example:
File 1:
AA BB CCC
AAAA BBB CCC
AA BBBB CC
File 2:
DD EE FF
DDDD EE FFFF
DD EEEE FF
Result:
AA BB CCC
AAAA BBB CCC
AA BBBB CC
DD EE FF
DDDD EE FFFF
DD EEEE FF
Visual alignment is for human consumption. Don't save the file in that format; rather, when you need to view the file, use column to format it for you.
First, you need to get rid of the extra tabs in your first file and combine the files:
$ cat <(tr -s '\t' <file1) file2 > file12
This produces a file whose columns are separated by a single delimiter (tab). Now you can use column -ts$'\t' file12 whenever you want to view the file, and it will align the columns for you.
This assumes you don't have missing fields.
I asked this question in the hope that there was an existing tool or a simple awk/perl one-liner that could do what I want. It looks like there isn't, so I wrote a simple tool in Go that worked for my input. It doesn't handle a lot of things that a good TSV parser should (like escaping), but maybe it'll still be useful for someone else:
package main
import (
"bufio"
"fmt"
"math"
"os"
"strings"
)
const tabWidth = 8
func tsvAlign(filenames []string) (err error) {
var lines [][]string
for _, filename := range filenames {
file, err := os.Open(filename)
if err != nil {
return err
}
defer file.Close()
scanner := bufio.NewScanner(file)
for scanner.Scan() {
lines = append(lines, strings.FieldsFunc(scanner.Text(), func(c rune) bool { return c == '\t' }))
}
}
maxFieldWidths := make([]int, len(lines[0])-1, len(lines[0])-1)
for i := 0; i < len(lines[0])-1; i++ {
for _, line := range lines {
if len(line[i]) > maxFieldWidths[i] {
maxFieldWidths[i] = len(line[i])
}
}
}
for _, line := range lines {
for i, field := range line[:len(line)-1] {
padding := int(math.Ceil(float64(maxFieldWidths[i]+tabWidth-maxFieldWidths[i]%tabWidth)/tabWidth - float64(len(field))/tabWidth))
fmt.Print(field, strings.Repeat("\t", padding))
}
fmt.Println(line[len(line)-1])
}
return err
}
func main() {
if len(os.Args) < 2 {
fmt.Fprintln(os.Stderr, "ERROR: No arguments provided")
return
}
err := tsvAlign(os.Args[1:])
if err != nil {
fmt.Fprintln(os.Stderr, "ERROR: ", err)
}
}
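The padding expression in the tool above can also be written with integer arithmetic only, which may be easier to follow. This is a sketch under the same assumptions as the tool (tab width 8, widths counted in bytes); the function name tabsNeeded is made up for this example.

```go
package main

import (
	"fmt"
	"strings"
)

const tabWidth = 8

// tabsNeeded (hypothetical helper) returns how many tabs must follow a field
// of fieldLen columns so that the next field starts at the first tab stop
// strictly beyond maxLen, the width of the widest field in that column.
func tabsNeeded(fieldLen, maxLen int) int {
	targetStop := maxLen/tabWidth + 1 // index of the tab stop for the next column
	currentStop := fieldLen / tabWidth // tab stops already passed by the field
	return targetStop - currentStop
}

func main() {
	fields := []string{"AA", "AAAA", "AAAAAAAAA"}
	maxLen := 0
	for _, f := range fields {
		if len(f) > maxLen {
			maxLen = len(f)
		}
	}
	for _, f := range fields {
		// %q makes the tabs visible as \t.
		fmt.Printf("%q\n", f+strings.Repeat("\t", tabsNeeded(len(f), maxLen)))
	}
}
```

Like the original tool, this counts bytes rather than display columns, so it is only accurate for single-byte, single-width characters.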

JSON Marshal uint or int as integer

I'm looking for information about JSON marshaling in Go. I'll explain the situation first.
I'm developing an app for an IoT device. The app sends JSON inside an MQTT packet to our broker. Because the device uses a SIM for its data connection, I need to keep the packet as small as possible.
Right now, the JSON has this structure:
{
"d": 1524036831,
"p": "important message"
}
The field d is a timestamp and p is the payload.
When the app sends this JSON, it is 40 bytes long. But if d is 1000, for example, the JSON is 34 bytes. So the marshaler converts the uint32 field d to the ASCII representation of the number and sends that string.
What I want is to send this field as a true int or uint. That is, 1524036831 is an int32, 4 bytes, the same as 1000. With this change I could shave a few bytes off the packet, and the number could still grow up to 32 bits.
I read the docs for json.Marshal and I did not find anything about this.
I found a "solution", but I guess it is not pretty, although it does the job. I'd like other opinions.
Ugly solution (for me):
package main
import (
"encoding/binary"
"encoding/json"
"fmt"
)
type test struct {
Data uint32 `json:"d"`
Payload string `json:"p"`
}
type testB struct {
Data []byte `json:"d"`
Payload string `json:"p"`
}
func main() {
fmt.Println("TEST with uint32")
d := []test{test{Data: 5, Payload: "Important Message"}, test{Data: 10, Payload: "Important Message"}, test{Data: 1000, Payload: "Important Message"}, test{Data: 1524036831, Payload: "Important Message"}}
for _, i := range d {
j, _ := json.Marshal(i)
fmt.Println(string(j))
fmt.Println("All:", len(j))
fmt.Println("-----------")
}
fmt.Println("\nTEST with []Byte")
d1 := []testB{testB{Data: make([]byte, 4), Payload: "Important Message"}, testB{Data: make([]byte, 4), Payload: "Important Message"}, testB{Data: make([]byte, 4), Payload: "Important Message"}, testB{Data: make([]byte, 4), Payload: "Important Message"}}
binary.BigEndian.PutUint32(d1[0].Data, 5)
binary.BigEndian.PutUint32(d1[1].Data, 20)
binary.BigEndian.PutUint32(d1[2].Data, 1000)
binary.BigEndian.PutUint32(d1[3].Data, 1524036831)
for _, i := range d1 {
j, _ := json.Marshal(i)
fmt.Println(string(j))
fmt.Println(len(j))
fmt.Println("-----------")
}
}
To reiterate my comment: JSON is a text format, and text formats are not designed to produce small messages. In particular, JSON has no representation for numbers other than decimal strings.
Encoding numbers in a base larger than 10 will reduce the message size for large enough numbers.
You can reduce the message size your "ugly" code produces by removing leading zero bytes and encoding with base64.RawStdEncoding (which omits the padding characters). Doing this pays off for numbers >= 1e6.
If you put all of this in a custom type, it becomes much nicer to use:
package main
import (
"bytes"
"encoding/base64"
"encoding/binary"
"encoding/json"
"fmt"
)
type IntB64 uint32
func (n IntB64) MarshalJSON() ([]byte, error) {
b := make([]byte, 4)
binary.BigEndian.PutUint32(b, uint32(n))
b = bytes.TrimLeft(b, "\x00") // strip leading zero bytes
// All characters in the base64 alphabet need not be escaped, so we don't
// have to call json.Marshal here.
l := base64.RawStdEncoding.EncodedLen(len(b)) + 2
j := make([]byte, l)
base64.RawStdEncoding.Encode(j[1:], b)
j[0], j[l-1] = '"', '"'
return j, nil
}
func main() {
enc(1) // "AQ"
enc(1000) // "A+g"
enc(1e6 - 1) // "D0I/"
enc(1e6) // "D0JA"
enc(1524036831) // "Wtb03w"
}
func enc(n int64) {
b, _ := json.Marshal(IntB64(n))
fmt.Printf("%10d %s\n", n, string(b))
}
Updated playground: https://play.golang.org/p/7Z03VE9roqN
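The answer only shows the encoding side. If the receiver is also written in Go, the matching decoder could look roughly like the sketch below. It assumes the exact format produced above (a JSON string holding unpadded standard base64 of the big-endian bytes with leading zero bytes stripped); the decode helper is only there for the demonstration.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
)

// IntB64 mirrors the type from the answer above.
type IntB64 uint32

// UnmarshalJSON reverses the MarshalJSON shown above: it reads a JSON
// string, base64-decodes it, and rebuilds the big-endian integer.
func (n *IntB64) UnmarshalJSON(data []byte) error {
	var s string
	if err := json.Unmarshal(data, &s); err != nil {
		return err
	}
	b, err := base64.RawStdEncoding.DecodeString(s)
	if err != nil {
		return err
	}
	if len(b) > 4 {
		return errors.New("IntB64: value does not fit in 32 bits")
	}
	var v uint32
	for _, c := range b {
		v = v<<8 | uint32(c) // stripped leading zero bytes contribute nothing
	}
	*n = IntB64(v)
	return nil
}

// decode is a small helper used only for this demonstration.
func decode(s string) (uint32, error) {
	var n IntB64
	err := json.Unmarshal([]byte(s), &n)
	return uint32(n), err
}

func main() {
	for _, s := range []string{`"AQ"`, `"A+g"`, `"Wtb03w"`} {
		v, err := decode(s)
		fmt.Println(s, "->", v, err)
	}
}
```

An empty string decodes to 0 here, which matches what the encoder emits once all four zero bytes have been trimmed.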

Ignore unsupported type error on JSON Marshal

Here is my code:
func dump(w io.Writer, val interface{}) error {
je := json.NewEncoder(w)
return je.Encode(val)
}
type example struct {
Name string
Func func()
}
func main() {
a := example{
Name: "Gopher",
Func: func() {},
}
err := dump(os.Stdout, a)
if err != nil {
panic(err)
}
}
The program will panic with json: unsupported type: func()
My question is: how can I encode ANYTHING into JSON, ignoring whatever the encoder cannot handle? For example, for the above data structure, I want the output to be: {"Name": "Gopher"}
IMPORTANT: the value passed to the dump function is an interface{}, i.e. I don't know what kind of data it will receive, so tricks like json:"-" are not what I want.
If the data passed to dump() is not marshalable at all, e.g. dump(func(){}), it is perfectly acceptable to just return an empty string.
EDIT after two years:
The answer provided by @TehSphinX has this piece of code, which I want to avoid:
func (s example) MarshalJSON() ([]byte, error) {
return json.Marshal(struct {
Name string
}{
Name: s.Name,
})
}
The reason is: in func dump(w io.Writer, val interface{}) error, val can come from ANY code, i.e. code NOT written by me. It could be any open source code, or even data types whose source code I don't have access to.
JSON tags (the correct way)
Is this what you are looking for: playground
You can use field tags to give json instructions on how to handle your struct. See also encoding/json for more info on the options you have.
-- edit --
It does not matter that val is of type interface{}. json.Marshal() will reflect on it anyway and find out what type it is. It is the programmer's job to set the correct json tags on all structs he/she wants to dump.
Custom Marshal function (another way, not necessary here)
You can also write a custom MarshalJSON function to do marshalling of every type however you like: custom marshalling playground
Your own json Marshaller (have it your way)
If you don't like the way Go handles JSON marshalling, you could always write your own marshalling function, doing all the reflection yourself.
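To give an idea of what "doing the reflection yourself" might look like, here is a rough sketch that walks a value with reflect, drops everything encoding/json cannot encode (funcs, channels, complex numbers), and hands the rest to json.Marshal. The names sanitize and dump are made up for this example, and the sketch makes strong simplifications: it ignores struct tags and MarshalJSON methods, skips unexported fields, encodes []byte as a list of numbers, and does not protect against cyclic pointers.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// sanitize (hypothetical helper) returns a copy of v made only of values
// that encoding/json can encode. ok is false when v itself must be dropped.
func sanitize(v reflect.Value) (out interface{}, ok bool) {
	switch v.Kind() {
	case reflect.Invalid:
		return nil, true // untyped nil encodes as null
	case reflect.Func, reflect.Chan, reflect.Complex64, reflect.Complex128, reflect.UnsafePointer:
		return nil, false
	case reflect.Ptr, reflect.Interface:
		if v.IsNil() {
			return nil, true
		}
		return sanitize(v.Elem())
	case reflect.Struct:
		m := map[string]interface{}{}
		t := v.Type()
		for i := 0; i < v.NumField(); i++ {
			if t.Field(i).PkgPath != "" {
				continue // unexported field
			}
			if f, ok := sanitize(v.Field(i)); ok {
				m[t.Field(i).Name] = f
			}
		}
		return m, true
	case reflect.Slice, reflect.Array:
		s := make([]interface{}, 0, v.Len())
		for i := 0; i < v.Len(); i++ {
			if e, ok := sanitize(v.Index(i)); ok {
				s = append(s, e)
			}
		}
		return s, true
	case reflect.Map:
		if v.Type().Key().Kind() != reflect.String {
			return nil, false // simplification: only string keys are kept
		}
		m := map[string]interface{}{}
		for _, k := range v.MapKeys() {
			if e, ok := sanitize(v.MapIndex(k)); ok {
				m[k.String()] = e
			}
		}
		return m, true
	default:
		return v.Interface(), true // bools, integers, floats, strings
	}
}

// dump returns the JSON for val, or "" if nothing could be encoded.
func dump(val interface{}) string {
	clean, ok := sanitize(reflect.ValueOf(val))
	if !ok {
		return ""
	}
	b, err := json.Marshal(clean)
	if err != nil {
		return "" // e.g. NaN or Inf floats
	}
	return string(b)
}

type example struct {
	Name string
	Func func()
}

func main() {
	fmt.Println(dump(example{Name: "Gopher", Func: func() {}}))
	fmt.Printf("%q\n", dump(func() {}))
}
```

Because the intermediate result is built from maps, struct fields come out in alphabetical rather than declaration order.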
You can use json.RawMessage. It can be used to delay JSON decoding or precompute a JSON encoding.
If you just want to log the entire struct, you can do this.
func dump(w io.Writer, val interface{}) error {
je := json.NewEncoder(w)
return je.Encode(val)
}
type example struct {
Name string
Func func()
}
func main() {
a := example{
Name: "Gopher",
Func: func() {},
}
type _example struct {
example
Func interface{}
}
err := dump(os.Stdout, _example{example: a})
if err != nil {
panic(err)
}
}
This is an example of logging the entire http.Request.
type _Request struct {
*http.Request
GetBody interface{}
Cancel interface{}
}
j, err := json.MarshalIndent(_Request{Request: r}, "", " ")
To quote the comment from Cerise Limon:
If your goal is data inspection and debugging, then the spew package may be of help.
Spew is wow-cool!! https://github.com/davecgh/go-spew
part of dump of an HTTP request:
(*x509.Certificate)(0xc00016c100)({
Raw: ([]uint8) (len=947 cap=960) {
00000000 30 82 03 af 30 82 02 97 a0 03 02 01 02 02 10 08 |0...0...........|
00000010 3b e0 56 90 42 46 b1 a1 75 6a c9 59 91 c7 4a 30 |;.V.BF..uj.Y..J0|
00000020 0d 06 09 2a 86 48 86 f7 0d 01 01 05 05 00 30 61 |...*.H........0a|
00000030 31 0b 30 09 06 03 55 04 06 13 02 55 53 31 15 30 |1.0...U....US1.0|
00000040 13 06 03 55 04 0a 13 0c 44 69 67 69 43 65 72 74 |...U....DigiCert|

Receiving binary data from stdin, sending to channel in Go

I have the following test Go code, which is designed to read from a binary file through stdin and send the data read to a channel (where it would then be processed further). In the version I've given here, it only prints the first two values from stdin, but that's enough to show the problem.
package main
import (
"fmt"
"io"
"os"
)
func input(dc chan []byte) {
data := make([]byte, 2)
var err error
var n int
for err != io.EOF {
n, err = os.Stdin.Read(data)
if n > 0 {
dc <- data[0:n]
}
}
}
func main() {
dc := make(chan []byte, 1)
go input(dc)
fmt.Println(<-dc)
}
To test it, I first build it using go build, and then send data to it using the command:
./inputtest < data.bin
The data I am using currently to test is just random binary data created using the openssl command.
The problem I am having is that it misses the first values from stdin, and only gives the second and later values. I think this has to do with the channel, as the same program with the channel removed produces the correct data. Has anyone come across this before? For example, I get the following output when running this command:
./inputtest < data.bin
[36 181]
Whereas I should be getting:
./inputtest < data.bin
[72 218]
(The binary data is the same in both instances.)
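You're overwriting your buffer on every read: every slice you send shares the same backing array as data. Because the channel is buffered, input does not wait for the receiver, so whenever there is space in the channel the next Read overwrites the bytes before main gets to print them.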
Try something like this (not tested, written on tablet, etc...):
import "os"
func input(dc chan []byte) error {
defer close(dc)
for {
data := make([]byte, 2)
n, err := os.Stdin.Read(data)
if n > 0 {
dc <- data[0:n]
}
if err != nil {
return err
}
}
return nil
}