JSON Marshal uint or int as integer - json

I'm looking for information about JSON marshaling in Go. I'll explain the situation first.
I'm developing an app for an IoT device. The app sends a JSON payload inside an MQTT packet to our broker. Because the device uses a SIM for its data connection, I need to keep the packet as small as possible.
Right now, the JSON has this structure:
{
    "d": 1524036831,
    "p": "important message"
}
The field d is a timestamp and p is the payload.
When the app sends this JSON, it is 40 bytes. But if d is 1000, for example, the JSON is only 34 bytes. So the marshaller converts the uint32 field d to the ASCII decimal representation of the number and sends that string.
What I want is to send this field as a true int or uint. That is, 1524036831 fits in a uint32, 4 bytes, just like 1000. With that change I could shave a few bytes off the packet, and the number could still grow to the full 32 bits.
I read the docs for json.Marshal and did not find anything about this.
I found a "solution", but I guess it is not pretty, although it does the job. I would like other opinions.
Ugly solution (for me)
package main

import (
    "encoding/binary"
    "encoding/json"
    "fmt"
)

type test struct {
    Data    uint32 `json:"d"`
    Payload string `json:"p"`
}

type testB struct {
    Data    []byte `json:"d"`
    Payload string `json:"p"`
}

func main() {
    fmt.Println("TEST with uint32")
    d := []test{
        {Data: 5, Payload: "Important Message"},
        {Data: 10, Payload: "Important Message"},
        {Data: 1000, Payload: "Important Message"},
        {Data: 1524036831, Payload: "Important Message"},
    }
    for _, i := range d {
        j, _ := json.Marshal(i)
        fmt.Println(string(j))
        fmt.Println("All:", len(j))
        fmt.Println("-----------")
    }

    fmt.Println("\nTEST with []byte")
    d1 := []testB{
        {Data: make([]byte, 4), Payload: "Important Message"},
        {Data: make([]byte, 4), Payload: "Important Message"},
        {Data: make([]byte, 4), Payload: "Important Message"},
        {Data: make([]byte, 4), Payload: "Important Message"},
    }
    binary.BigEndian.PutUint32(d1[0].Data, 5)
    binary.BigEndian.PutUint32(d1[1].Data, 20)
    binary.BigEndian.PutUint32(d1[2].Data, 1000)
    binary.BigEndian.PutUint32(d1[3].Data, 1524036831)
    for _, i := range d1 {
        j, _ := json.Marshal(i)
        fmt.Println(string(j))
        fmt.Println(len(j))
        fmt.Println("-----------")
    }
}

To reiterate my comment: JSON is a text format, and text formats are not designed to produce small messages. In particular, JSON has no representation for numbers other than decimal strings.
Encoding numbers in a base larger than 10 will reduce the message size for large enough numbers.
You can reduce the size of the message your "ugly" code produces by removing leading zero bytes and encoding with base64.RawStdEncoding (which omits the padding characters). Doing this pays off for numbers >= 1e6.
If you put this all in a custom type it becomes much nicer to use:
package main

import (
    "bytes"
    "encoding/base64"
    "encoding/binary"
    "encoding/json"
    "fmt"
)

type IntB64 uint32

func (n IntB64) MarshalJSON() ([]byte, error) {
    b := make([]byte, 4)
    binary.BigEndian.PutUint32(b, uint32(n))
    b = bytes.TrimLeft(b, "\x00")
    // All characters in the base64 alphabet need not be escaped, so we don't
    // have to call json.Marshal here.
    l := base64.RawStdEncoding.EncodedLen(len(b)) + 2
    j := make([]byte, l)
    base64.RawStdEncoding.Encode(j[1:], b)
    j[0], j[l-1] = '"', '"'
    return j, nil
}

func main() {
    enc(1)          // "AQ"
    enc(1000)       // "A+g"
    enc(1e6 - 1)    // "D0I/"
    enc(1e6)        // "D0JA"
    enc(1524036831) // "Wtb03w"
}

func enc(n int64) {
    b, _ := json.Marshal(IntB64(n))
    fmt.Printf("%10d %s\n", n, string(b))
}
Updated playground: https://play.golang.org/p/7Z03VE9roqN
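If the receiving end also runs Go, a matching UnmarshalJSON can reverse this encoding. The following is a minimal, untested sketch; it redeclares IntB64 so it compiles on its own, and the left-padding back to 4 bytes simply mirrors the TrimLeft in the marshaller above:
package main

import (
    "bytes"
    "encoding/base64"
    "encoding/binary"
    "encoding/json"
    "errors"
    "fmt"
)

type IntB64 uint32

// UnmarshalJSON strips the surrounding quotes, base64-decodes the value and
// left-pads it back to 4 bytes before reading it as a big-endian uint32.
func (n *IntB64) UnmarshalJSON(j []byte) error {
    s := bytes.Trim(j, `"`)
    b := make([]byte, base64.RawStdEncoding.DecodedLen(len(s)))
    m, err := base64.RawStdEncoding.Decode(b, s)
    if err != nil {
        return err
    }
    if m > 4 {
        return errors.New("value does not fit in 32 bits")
    }
    padded := make([]byte, 4)
    copy(padded[4-m:], b[:m])
    *n = IntB64(binary.BigEndian.Uint32(padded))
    return nil
}

func main() {
    var n IntB64
    if err := json.Unmarshal([]byte(`"Wtb03w"`), &n); err != nil {
        panic(err)
    }
    fmt.Println(n) // 1524036831
}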

Related

Is it possible to connect an accelerator with several memory mapped inputs/outputs?

Again I have some questions that are mainly due to my inexperience.
I am designing a memory-mapped accelerator. The idea is that the accelerator will have one data input, one data output, and a control input.
I want all of these connections to be memory mapped and connected via FIFOs.
I have already designed a memory-mapped accelerator before, but it had just one input and one output, as in the given example (GenericFIR).
If we check the GenericFIR example, we can see how to connect one input and one output:
// DOC include start: GenericFIRBlock chisel
abstract class GenericFIRBlock[D, U, EO, EI, B<:Data, T<:Data:Ring]
(
    genIn: T,
    genOut: T,
    coeffs: Seq[T]
)(implicit p: Parameters) extends DspBlock[D, U, EO, EI, B] {
    val streamNode = AXI4StreamIdentityNode()
    val mem = None

    lazy val module = new LazyModuleImp(this) {
        require(streamNode.in.length == 1)
        require(streamNode.out.length == 1)

        val in = streamNode.in.head._1
        val out = streamNode.out.head._1

        // instantiate generic fir
        val fir = Module(new GenericFIR(genIn, genOut, coeffs))

        // Attach ready and valid to outside interface
        in.ready := fir.io.in.ready
        fir.io.in.valid := in.valid
        fir.io.out.ready := out.ready
        out.valid := fir.io.out.valid

        // cast UInt to T
        fir.io.in.bits := in.bits.data.asTypeOf(GenericFIRBundle(genIn))

        // cast T to UInt
        out.bits.data := fir.io.out.bits.asUInt
    }
}
// DOC include end: GenericFIRBlock chisel
But how do we modify this for the case in which the GenericFIR has two input bundles and two output bundles? Let's say in1, in2, out1, out2, all of them with their own ready/valid signals (Decoupled).
Also, how do we connect the StreamNodes afterwards?
Thanks!

Writing data from bigquery to csv is slow

I wrote code that behaves weirdly and slowly, and I can't understand why.
What I'm trying to do is download data from BigQuery (using a query as the input) to a CSV file, then create a URL link to this CSV so people can download it as a report.
I'm trying to optimize the process of writing the CSV, as it takes some time and shows some weird behavior.
The code iterates over the BigQuery results and passes each row to a channel for later parsing/writing, using Go's encoding/csv package.
These are the relevant parts, with some debugging added:
func (s *Service) generateReportWorker(ctx context.Context, query, reportName string) error {
    it, err := s.bigqueryClient.Read(ctx, query)
    if err != nil {
        return err
    }

    filename := generateReportFilename(reportName)
    gcsObj := s.gcsClient.Bucket(s.config.GcsBucket).Object(filename)
    wc := gcsObj.NewWriter(ctx)
    wc.ContentType = "text/csv"
    wc.ContentDisposition = "attachment"

    csvWriter := csv.NewWriter(wc)

    var doneCount uint64
    go backgroundTimer(ctx, it.TotalRows, &doneCount)

    rowJobs := make(chan []bigquery.Value, it.TotalRows)
    workers := 10
    wg := sync.WaitGroup{}
    wg.Add(workers)

    // start worker pool
    for i := 0; i < workers; i++ {
        go func(c context.Context, num int) {
            defer wg.Done()
            for row := range rowJobs {
                records := make([]string, len(row))
                for j, r := range row {
                    records[j] = fmt.Sprintf("%v", r)
                }
                s.mu.Lock()
                start := time.Now()
                if err := csvWriter.Write(records); err != nil {
                    log.Errorf("Error writing row: %v", err)
                }
                if time.Since(start) > time.Second {
                    fmt.Printf("worker %d took %v\n", num, time.Since(start))
                }
                s.mu.Unlock()
                atomic.AddUint64(&doneCount, 1)
            }
        }(ctx, i)
    }

    // read results from bigquery and add to the pool
    for {
        var row []bigquery.Value
        if err := it.Next(&row); err != nil {
            if err == iterator.Done || err == context.DeadlineExceeded {
                break
            }
            log.Errorf("Error loading next row from BQ: %v", err)
        }
        rowJobs <- row
    }

    fmt.Println("***done loop!***")
    close(rowJobs)
    wg.Wait()
    csvWriter.Flush()
    wc.Close()

    url := fmt.Sprintf("%s/%s/%s", s.config.BaseURL, s.config.GcsBucket, filename)
    /// ....
}
func backgroundTimer(ctx context.Context, total uint64, done *uint64) {
    ticker := time.NewTicker(10 * time.Second)
    go func() {
        for {
            select {
            case <-ctx.Done():
                ticker.Stop()
                return
            case <-ticker.C:
                fmt.Printf("progress (%d,%d)\n", atomic.LoadUint64(done), total)
            }
        }
    }()
}
The bigquery client's Read func:
func (c *Client) Read(ctx context.Context, query string) (*bigquery.RowIterator, error) {
    job, err := c.bigqueryClient.Query(query).Run(ctx)
    if err != nil {
        return nil, err
    }
    it, err := job.Read(ctx)
    if err != nil {
        return nil, err
    }
    return it, nil
}
I run this code with a query that returns about 400,000 rows. The query itself takes around 10 seconds, but the whole process takes around 2 minutes.
The output:
progress (112346,392565)
progress (123631,392565)
***done loop!***
progress (123631,392565)
progress (123631,392565)
progress (123631,392565)
progress (123631,392565)
progress (123631,392565)
progress (123631,392565)
progress (123631,392565)
worker 3 took 1m16.728143875s
progress (247525,392565)
progress (247525,392565)
progress (247525,392565)
progress (247525,392565)
progress (247525,392565)
progress (247525,392565)
progress (247525,392565)
worker 3 took 1m13.525662666s
progress (370737,392565)
progress (370737,392565)
progress (370737,392565)
progress (370737,392565)
progress (370737,392565)
progress (370737,392565)
progress (370737,392565)
progress (370737,392565)
worker 4 took 1m17.576536375s
progress (392565,392565)
You can see that writing the first 112,346 rows was fast, but then for some reason worker 3 took 1m16s (!!!) to write a single row, which caused the other workers to wait for the mutex to be released. This happened two more times, and the whole process ended up taking more than 2 minutes to finish.
I'm not sure what's going on or how to debug this further. Why do I get these stalls in the execution?
As suggested by @serge-v, you can write all the records to a local file and then transfer the file as a whole to GCS. To make the process finish in a shorter time span, you can split the file into multiple chunks and use the command gsutil -m cp -j, where:
gsutil is used to access cloud storage from command line
-m is used to perform a parallel multi-threaded/multi-processing copy
cp is used to copy files
-j applies gzip transport encoding to any file upload. This also saves network bandwidth while leaving the data uncompressed in Cloud Storage.
To apply this command in your Go program, you can refer to this GitHub link.
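For illustration, here is a rough sketch of that approach in Go: write all records to a local CSV file first, then hand the finished file to gsutil in one shot. It assumes gsutil is installed and on the PATH, and the bucket/object names are placeholders:
package main

import (
    "encoding/csv"
    "fmt"
    "os"
    "os/exec"
)

// writeAndUpload writes every record to a local temporary CSV file and then
// copies the whole file to GCS with gsutil, instead of streaming row by row.
func writeAndUpload(records [][]string, bucket, object string) error {
    f, err := os.CreateTemp("", "report-*.csv")
    if err != nil {
        return err
    }
    defer os.Remove(f.Name())

    w := csv.NewWriter(f)
    if err := w.WriteAll(records); err != nil { // WriteAll flushes internally
        return err
    }
    if err := f.Close(); err != nil {
        return err
    }

    // -m: parallel copy; -j csv: gzip transport encoding for .csv uploads.
    dst := fmt.Sprintf("gs://%s/%s", bucket, object)
    cmd := exec.Command("gsutil", "-m", "cp", "-j", "csv", f.Name(), dst)
    cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
    return cmd.Run()
}

func main() {
    records := [][]string{{"id", "value"}, {"1", "hello"}, {"2", "world"}}
    if err := writeAndUpload(records, "my-bucket", "reports/report.csv"); err != nil {
        fmt.Fprintln(os.Stderr, err)
    }
}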
You could also try adding profiling to your Go program. Profiling will help you analyze the program's complexity and see where the time is actually being spent.
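If you go the profiling route, a low-effort option is the net/http/pprof package: expose it on a side port and grab a CPU profile while a report is generating. A sketch, with the listen address chosen arbitrarily:
package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
    // Run the pprof endpoint alongside the service. While a report is being
    // generated you can capture a 30-second CPU profile with:
    //   go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
    // or dump every goroutine's stack (to see where workers are blocked) at:
    //   http://localhost:6060/debug/pprof/goroutine?debug=2
    go func() {
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()

    select {} // stand-in for the real service's main loop
}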
Since you are reading a large number of rows from BigQuery, you can try using the BigQuery Storage API. It provides faster access to BigQuery-managed storage than the bulk data export. Using the BigQuery Storage API rather than the iterators you are currently using in your Go program can make the process faster.
For more reference you can also look into the Query Optimization techniques provided by BigQuery.

Visually align TSV using tabs

I have a text file with fields separated by some number of consecutive tabs (so that the fields are all visually aligned). I'd like to add a lot of new fields to it from another (unaligned, pure TSV) file, while keeping everything aligned. A lot of values contain spaces, so only tabs (with an assumed width of 8) can be used for alignment, because I want to be able to parse the file later by splitting each line on any number of consecutive tabs. This means I can't use tools like column or tsv-pretty, as they use spaces for alignment. Is there a tool or a short script I can use to achieve this?
Example:
File 1:
AA BB CCC
AAAA BBB CCC
AA BBBB CC
File 2:
DD EE FF
DDDD EE FFFF
DD EEEE FF
Result:
AA BB CCC
AAAA BBB CCC
AA BBBB CC
DD EE FF
DDDD EE FFFF
DD EEEE FF
Visual alignment is for human consumption. Don't save the file in that format; instead, whenever you need to view the file, use column to format it for you.
First, get rid of the extra tabs in your first file and combine the two files:
$ cat <(tr -s '\t' <file1) file2 > file12
which leaves the columns separated by a single tab. Now you can run column -ts$'\t' file12 whenever you want to view the file, and it will align the columns for you.
This assumes you don't have missing fields.
I asked this question in the hope that there is an existing tool or a simple awk/perl one-liner that can do what I want. It looks like there isn't, so I wrote a simple tool in Go that worked for my input. It doesn't handle a lot of things that a good TSV parser should (like escaping), but maybe it will still be useful for someone else:
package main

import (
    "bufio"
    "fmt"
    "math"
    "os"
    "strings"
)

const tabWidth = 8

func tsvAlign(filenames []string) (err error) {
    var lines [][]string
    for _, filename := range filenames {
        file, err := os.Open(filename)
        if err != nil {
            return err
        }
        defer file.Close()

        scanner := bufio.NewScanner(file)
        for scanner.Scan() {
            lines = append(lines, strings.FieldsFunc(scanner.Text(), func(c rune) bool { return c == '\t' }))
        }
    }

    maxFieldWidths := make([]int, len(lines[0])-1)
    for i := 0; i < len(lines[0])-1; i++ {
        for _, line := range lines {
            if len(line[i]) > maxFieldWidths[i] {
                maxFieldWidths[i] = len(line[i])
            }
        }
    }

    for _, line := range lines {
        for i, field := range line[:len(line)-1] {
            padding := int(math.Ceil(float64(maxFieldWidths[i]+tabWidth-maxFieldWidths[i]%tabWidth)/8 - float64(len(field))/8))
            fmt.Print(field, strings.Repeat("\t", padding))
        }
        fmt.Println(line[len(line)-1])
    }
    return err
}

func main() {
    if len(os.Args) < 2 {
        fmt.Fprintln(os.Stderr, "ERROR: No arguments provided")
        return
    }
    err := tsvAlign(os.Args[1:])
    if err != nil {
        fmt.Fprintln(os.Stderr, "ERROR: ", err)
    }
}
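Assuming the source above is saved as tsvalign.go and file1/file2 are the two input files from the example, a run would look something like:
$ go run tsvalign.go file1 file2 > combined
The aligned result goes to stdout, so it can be redirected into the combined file.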

How to read a CSV that includes Chinese characters in Rust?

When I read a CSV file that includes Chinese characters using the csv crate, I get an error.
fn main() {
    let mut rdr =
        csv::Reader::from_file("C:\\Users\\Desktop\\test.csv").unwrap().has_headers(false);
    for record in rdr.decode() {
        let (a, b): (String, String) = record.unwrap();
        println!("a:{},b:{}", a, b);
    }
    thread::sleep_ms(500000);
}
The error:
Running `target\release\rust_Work.exe`
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Decode("Could not convert bytes \'FromUtf8Error { bytes: [208, 213, 195, 251], error: Utf8Error { va
lid_up_to: 0 } }\' to UTF-8.")', ../src/libcore\result.rs:788
note: Run with `RUST_BACKTRACE=1` for a backtrace.
error: Process didn't exit successfully: `target\release\rust_Work.exe` (exit code: 101)
test.csv:
姓名 性别 年纪 分数 等级
小二 男 12 88 良好
小三 男 13 89 良好
小四 男 14 91 优秀
I'm not sure what could be done to make the error message more clear:
Decode("Could not convert bytes 'FromUtf8Error { bytes: [208, 213, 195, 251], error: Utf8Error { valid_up_to: 0 } }' to UTF-8.")
FromUtf8Error is documented in the standard library, and the text of the error says "Could not convert bytes to UTF-8" (although there's some extra detail in the middle).
Simply put, your data isn't in UTF-8 and it must be. That's all that the Rust standard library (and thus most libraries) really deal with. You will need to figure out what encoding it is in and then find some way of converting from that to UTF-8. There may be a crate to help with either of those cases.
Perhaps even better, you can save the file as UTF-8 from the beginning. Sadly, it's relatively common for people to hit this issue when using Excel, because Excel does not have a way to easily export UTF-8 CSV files. It always writes a CSV file in the system locale encoding.
I found a way to solve it. Thanks, all.
extern crate csv;
extern crate rustc_serialize;
extern crate encoding;

use encoding::{Encoding, DecoderTrap};
use encoding::all::GB18030;
use std::fs::File;
use std::io::prelude::*;

fn main() {
    let path = "C:\\Users\\Desktop\\test.csv";
    let mut f = File::open(path).expect("cannot open file");
    let mut reader: Vec<u8> = Vec::new();
    f.read_to_end(&mut reader).expect("can not read file");

    // Decode the GB18030-encoded bytes into a UTF-8 String first,
    // then hand that String to the csv reader.
    let mut chars = String::new();
    GB18030.decode_to(&reader, DecoderTrap::Ignore, &mut chars);

    let mut rdr = csv::Reader::from_string(chars).has_headers(true);
    for row in rdr.decode() {
        let (x, y, r): (String, String, String) = row.unwrap();
        println!("({}, {}): {:?}", x, y, r);
    }
}
Part 1: Read Unicode (Chinese or not) characters:
The easiest way to achieve your goal is to use the read_to_string function that mutates the String you pass to it, appending the Unicode content of your file to that passed String:
use std::io::prelude::*;
use std::fs::File;

fn main() {
    let mut f = File::open("file.txt").unwrap();
    let mut buffer = String::new();
    f.read_to_string(&mut buffer);
    println!("{}", buffer)
}
Part 2: Parse a CSV file, its delimiter being a ',':
extern crate regex;

use regex::Regex;
use std::io::prelude::*;
use std::fs::File;

fn main() {
    let mut f = File::open("file.txt").unwrap();
    let mut buffer = String::new();
    let delimiter = ",";
    f.read_to_string(&mut buffer);

    let modified_buffer = buffer.replace("\n", delimiter);
    let mut regex_str = "([^".to_string();
    regex_str.push_str(delimiter);
    regex_str.push_str("]+)");
    let mut final_part = "".to_string();
    final_part.push_str(delimiter);
    final_part.push_str("?");
    regex_str.push_str(&final_part);
    let regex_str_copy = regex_str.clone();
    regex_str.push_str(&regex_str_copy);
    regex_str.push_str(&regex_str_copy);

    let re = Regex::new(&regex_str).unwrap();
    for cap in re.captures_iter(&modified_buffer) {
        let (s1, s2, dist): (String, String, usize) =
            (cap[1].to_string(), cap[2].to_string(), cap[3].parse::<usize>().unwrap());
        println!("({}, {}): {}", s1, s2, dist);
    }
}
Sample input and output here

Receiving binary data from stdin, sending to channel in Go

So I have the following test Go code, which is designed to read from a binary file through stdin and send the data read to a channel (where it would then be processed further). In the version I've given here it only reads the first two values from stdin, but that's fine as far as showing the problem is concerned.
package main

import (
    "fmt"
    "io"
    "os"
)

func input(dc chan []byte) {
    data := make([]byte, 2)
    var err error
    var n int
    for err != io.EOF {
        n, err = os.Stdin.Read(data)
        if n > 0 {
            dc <- data[0:n]
        }
    }
}

func main() {
    dc := make(chan []byte, 1)
    go input(dc)
    fmt.Println(<-dc)
}
To test it, I first build it using go build, and then send data to it with the command:
./inputtest < data.bin
The data I am using currently to test is just random binary data created using the openssl command.
The problem I am having is that it misses the first values from stdin and only gives the second and later values. I think this has to do with the channel, as the same program with the channel removed produces the correct data. Has anyone come across this before? For example, I get the following output when running this command:
./inputtest < data.bin
[36 181]
Whereas I should be getting:
./inputtest < data.bin
[72 218]
(The binary data is the same in both instances.)
You're overwriting your buffer on every read and you've got a channel buffer, so you'll lose data every time there's space in the channel.
Try something like this (not tested, written on tablet, etc...):
import "os"
func input(dc chan []byte) error {
defer close(dc)
for {
data := make([]byte, 2)
n, err := os.Stdin.Read(data)
if n > 0 {
dc <- data[0:n]
}
if err != nil {
return err
}
}
return nil
}
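Since this version closes the channel when the input ends, the corresponding main can simply range over it until it is closed. A minimal, untested sketch that reuses the input function above:
package main

import (
    "fmt"
    "io"
    "log"
    "os"
)

func input(dc chan []byte) error {
    defer close(dc)
    for {
        // A fresh buffer per read, so the slice sent on the channel is never
        // overwritten by a later read.
        data := make([]byte, 2)
        n, err := os.Stdin.Read(data)
        if n > 0 {
            dc <- data[0:n]
        }
        if err != nil {
            return err
        }
    }
}

func main() {
    dc := make(chan []byte, 1)
    go func() {
        if err := input(dc); err != nil && err != io.EOF {
            log.Println("read error:", err)
        }
    }()
    for chunk := range dc {
        fmt.Println(chunk)
    }
}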