I'm trying to decode CSV files encoded in UTF-16BE in Golang. Which charmap encoding (by ISO number) should I use for the new reader?
I want to invoke
csv.NewReader(charmap.XXXX.NewDecoder().Reader(file))
What should the value of XXXX be?
Have you tried this?
https://godoc.org/golang.org/x/text/encoding/unicode#UTF16
unicode.UTF16(unicode.BigEndian, unicode.UseBOM)
After some review, I found that this code provides a simple way to decode UTF-16 into UTF-8:
https://gist.github.com/bradleypeabody/185b1d7ed6c0c2ab6cec#file-gistfile1-go
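For comparison, here is a minimal standard-library-only sketch (not the gist's code) that decodes UTF-16BE with unicode/utf16 and encoding/binary and then hands the text to encoding/csv. It reads the whole input into memory, so for large files the streaming golang.org/x/text decoder is a better fit. The helper names are hypothetical.

```go
package main

import (
	"encoding/binary"
	"encoding/csv"
	"errors"
	"fmt"
	"strings"
	"unicode/utf16"
)

// utf16BEToString decodes big-endian UTF-16 bytes into a Go (UTF-8) string,
// dropping a leading byte order mark (U+FEFF) if there is one.
func utf16BEToString(b []byte) (string, error) {
	if len(b)%2 != 0 {
		return "", errors.New("utf16: odd number of bytes")
	}
	u := make([]uint16, len(b)/2)
	for i := range u {
		u[i] = binary.BigEndian.Uint16(b[2*i:])
	}
	if len(u) > 0 && u[0] == 0xFEFF {
		u = u[1:]
	}
	// utf16.Decode joins surrogate pairs; unpaired surrogates become U+FFFD.
	return string(utf16.Decode(u)), nil
}

// stringToUTF16BE is the reverse, used here only to build sample input.
func stringToUTF16BE(s string) []byte {
	u := utf16.Encode([]rune(s))
	b := make([]byte, 2*len(u))
	for i, v := range u {
		binary.BigEndian.PutUint16(b[2*i:], v)
	}
	return b
}

func main() {
	// Sample UTF-16BE input with a BOM, standing in for the bytes of a real file.
	raw := append([]byte{0xFE, 0xFF}, stringToUTF16BE("id,name\n1,foo\n")...)
	text, err := utf16BEToString(raw)
	if err != nil {
		panic(err)
	}
	records, err := csv.NewReader(strings.NewReader(text)).ReadAll()
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", records)
}
```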
You can use golang.org/x/text/encoding/unicode.UTF16 to create a decoder from your target UTF-16 Little/Big-Endian encoding into UTF-8.
The code below shows a working example for UTF-16 LE (Go playground):
dec := unicode.UTF16(unicode.LittleEndian, unicode.UseBOM).NewDecoder()
utf16r := getUTF16LittleEndianCSVReader() // helper from the playground; any io.Reader of UTF-16LE bytes works
utf8r := transform.NewReader(utf16r, dec) // decodes to UTF-8 on the fly
csvr := csv.NewReader(utf8r)
records, err := csvr.ReadAll()
if err != nil {
	log.Fatal(err)
}
fmt.Printf("%#v", records)
// [][]string{[]string{"id", "name"}, []string{"1", "foo"}}
Switching to big-endian should be as simple as:
dec := unicode.UTF16(unicode.BigEndian, unicode.UseBOM).NewDecoder()
Related
I'm trying to store UTF-8 text in a table whose encoding is latin1_swedish_ci. I can't change the encoding since I don't have direct access to the DB. So I'm trying to encode the text into Latin-1 with this Go library, which provides the encoder, and this one, which has a function that wraps the encoder so that it replaces invalid characters instead of returning an error.
But when I try to insert the row, MySQL complains: Error 1366: Incorrect string value: '\\xE7\\xE3o pa...' for column 'description' at row 1.
I tried writing the same text to a file and file -I reports this file.txt: application/octet-stream; charset=binary.
Example
package main
import (
	"fmt"
	"os"
	"golang.org/x/text/encoding"
	"golang.org/x/text/encoding/charmap"
)
func main() {
	s := "foo – bar"
	encoder := charmap.ISO8859_1.NewEncoder()
	encoder = encoding.ReplaceUnsupported(encoder)
	encoded, err := encoder.String(s)
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
	fmt.Println(encoded)
	fmt.Printf("%q\n", encoded)
	/* file test */
	f, err := os.Create("file.txt")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	w := encoder.Writer(f)
	w.Write([]byte(s))
}
I'm probably missing something very obvious, but my knowledge about encodings is very poor.
Thanks in advance.
Were you expecting çã?
The problem is easily solved. MySQL will gladly translate from latin1 to utf8 while INSERTing text. But you must tell it that your client is using latin1. That is probably done during the connection to MySQL, and is probably defaulted to utf8 or UTF-8 or utf8mb4 currently. It is something like
charset=latin1
So I am trying to POST the contents of a CSV file, in JSON format, to a website in Go. I have been successful in POSTing a single JSON object. However, that is not what I want my program to do. Basically, I'm trying to create an account generator for a site, and I want to be able to generate multiple accounts at once. I feel the best way to do this is with a CSV file.
I've tried using encoding/csv to read the CSV file and then marshal it to JSON, and also ioutil.ReadFile(). However, the response from the site is 'first name is a mandatory field, last name is a mandatory field', etc. So the CSV data is obviously not getting into the JSON. My code with ioutil.ReadFile() is below.
func main() {
	file, _ := ioutil.ReadFile("accounts.csv")
	jsonConv, _ := json.Marshal(file)
	client := http.Client{}
	req, err := http.NewRequest("POST", "websiteUrlHere", bytes.NewBuffer(jsonConv))
	if err != nil {
		fmt.Print(err)
		return
	}
	req.Header.Add("cookie", `"really long cookie goes here"`)
	req.Header.Set("user-agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36")
	req.Header.Set("content-type", "application/json")
	resp, err := client.Do(req)
	if err != nil {
		fmt.Print(err)
		return // resp is nil here, so don't fall through to resp.Body.Close()
	}
	defer resp.Body.Close()
}
(This is just a snippet of the code.)
I'm still pretty new to all of this, so please bear with me if the question lacks anything. I've also looked for similar questions, but all the answers I found use a struct. I don't think that applies here, since the goal is to create any number of accounts at once.
Hope the above is sufficient. Thank you.
The issue with your code is actually that you're trying to convert a file to bytes with:
file, _ := ioutil.ReadFile("accounts.csv")
...and then you are AGAIN trying to convert that slice of bytes to JSON bytes with:
jsonConv, _ := json.Marshal(file)
The text contents of the file end up as a []byte in the variable file, and json.Marshal then encodes that []byte as a single base64-encoded JSON string. So you are basically sending one opaque string instead of an object with first name, last name, etc. fields... not good.
The normal flow here would be to parse the file bytes into Go structs. Once your Go values are in place, THEN you marshal them to JSON; json.Marshal returns the resulting JSON text as a []byte.
So what you are missing is the middle step of building Go structures. Keep in mind that json.Marshal() only includes exported struct fields, and that you should usually use struct tags to control exactly how the fields appear in the JSON.
Your best bet is just to stick with JSON, forget about the CSV. In your own code example, you are taking a CSV and then trying to convert it to JSON...so, then, just use JSON.
If you want to send multiple accounts, just make your Go structure a slice, which will marshal into a JSON array, which is basically what you are trying to do. The end result will be a JSON array of accounts. Here's a simple example:
package main
import (
	"encoding/json"
	"fmt"
)
type Account struct {
	Username string `json:"username"`
	Email    string `json:"email"`
}
type AccountsRequest struct {
	Accounts []Account `json:"accounts"`
}
func main() {
	//gather account info
	acct1 := Account{Username: "tom", Email: "tom@example.com"}
	acct2 := Account{Username: "dick", Email: "dick@example.com"}
	acct3 := Account{Username: "harry", Email: "harry@example.com"}
	//create request
	acctsReq := AccountsRequest{Accounts: []Account{acct1, acct2, acct3}}
	//convert to JSON bytes/data
	//jsonData, _ := json.Marshal(acctsReq)
	//debug/output
	jsonDataPretty, _ := json.MarshalIndent(acctsReq, "", " ")
	fmt.Println(string(jsonDataPretty))
	//send request with data
	//...
}
Runnable here in playground.
The key is that the structs are set up and ready to go and the struct tags determine what the JSON field names will be (i.e. username & email for each account and accounts for the overall array of accounts).
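For the "send request with data" step, here is a sketch that marshals an AccountsRequest (same types as above) and POSTs it with Content-Type: application/json. newAccountsPost is a hypothetical helper, and a local httptest server that echoes the body stands in for the real site, whose URL and expected fields aren't known here.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

type Account struct {
	Username string `json:"username"`
	Email    string `json:"email"`
}

type AccountsRequest struct {
	Accounts []Account `json:"accounts"`
}

// newAccountsPost marshals payload and wraps it in a JSON POST request to url.
func newAccountsPost(url string, payload AccountsRequest) (*http.Request, error) {
	body, err := json.Marshal(payload)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// Local stand-in for the real site: echoes the request body back.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.Copy(w, r.Body)
	}))
	defer srv.Close()

	payload := AccountsRequest{Accounts: []Account{
		{Username: "tom", Email: "tom@example.com"},
		{Username: "dick", Email: "dick@example.com"},
	}}
	req, err := newAccountsPost(srv.URL, payload)
	if err != nil {
		panic(err)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	echoed, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println(resp.Status, string(echoed))
}
```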
Hope that helps. Drop a comment if you need more specific help.
You need to parse the CSV file first and convert it into the list that you want:
package main
import (
	"encoding/csv"
	"fmt"
	"log"
	"os"
)
func main() {
	file, err := os.Open("file.csv")
	if err != nil {
		log.Fatalf("failed opening file: %v", err)
	}
	defer file.Close()
	r := csv.NewReader(file)
	records, err := r.ReadAll()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(records)
}
The above code parses the file into a [][]string. You will then need to iterate over that slice and turn it into the JSON object that the page expects before you can send it. You can read more about the csv package here: https://golang.org/pkg/encoding/csv/
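One way to do that iteration, sketched under the assumption that the first CSV row is a header whose column names are the JSON field names the site expects (the names below are hypothetical). Using a map per row avoids declaring a struct and handles any number of accounts; swap in a struct if you know the exact fields. In a real program you would pass the opened accounts.csv file instead of a string.

```go
package main

import (
	"encoding/csv"
	"encoding/json"
	"fmt"
	"strings"
)

// csvToJSON turns CSV with a header row into a JSON array containing one
// object per data row, keyed by the header's column names.
func csvToJSON(data string) ([]byte, error) {
	records, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, err // includes rows whose field count differs from the header
	}
	rows := make([]map[string]string, 0, len(records))
	if len(records) > 1 {
		header := records[0]
		for _, rec := range records[1:] {
			row := make(map[string]string, len(header))
			for i, name := range header {
				row[name] = rec[i]
			}
			rows = append(rows, row)
		}
	}
	return json.Marshal(rows)
}

func main() {
	// Hypothetical column names; use whatever fields the site actually requires.
	data := "first_name,last_name,email\nTom,Smith,tom@example.com\nAnn,Lee,ann@example.com\n"
	out, err := csvToJSON(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```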
A word of advice: never ignore errors; they might give you useful information.
Using Go, how can I unmarshal a JSON string that contains unprintable ASCII characters?
For Example
testJsonString := "{\"test_one\" : \"123\x10456\x0B789\v123\a456\"}"
var dat map[string]interface{}
err := json.Unmarshal([]byte(testJsonString), &dat)
if err != nil {
	panic(err)
}
Yields:
panic: invalid character '\x10' in string literal
goroutine 1 [running]:
main.main()
/tmp/sandbox903140350/main.go:14 +0x180
https://play.golang.org/p/mFGWzndDK8V
Unfortunately I do not have control over the source data, so I need a way to ignore or strip out the unprintable characters.
Similarly, another data issue I'm running into is C escape sequences such as \0 and \a, which I also need to strip out. If I replace the string above with the one below, the program fails as well; essentially it fails on any C escape sequence (https://en.wikipedia.org/wiki/Escape_sequences_in_C):
testJsonString := "{\"test_one\" : \"123456789\\a123456\"}"
will error out with
panic: invalid character 'a' in string escape code
goroutine 1 [running]:
main.main()
/tmp/sandbox322770276/main.go:12 +0x100
This input can't be unmarshaled either, but it can't be caught by checking rune values or Unicode categories, since Go sees it as a backslash followed by the character 'a', both of which are legal characters.
Is there a good way to handle these edge cases?
The JSON spec (RFC 8259) does not allow raw control characters (U+0000 through U+001F) inside strings; they must be escaped, e.g. as \u0010.
As a workaround, here's a converter that replaces non-printable characters with their URI-escaped forms (e.g. %10), which are legal inside a JSON string. The result can then be fed to json.Unmarshal.
If this isn't exactly the behaviour you need, modify the converter to drop the characters (with continue), replace them with a question mark rune, or whatever suits you.
BTW, the second problem with \\a does not "print out as expected" for me. Please give a better example that actually shows the problem you are experiencing
package main
import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/url"
	"unicode"
)
// safety drops every backslash and replaces every non-printable character
// with its URI-escaped form (e.g. "%10").
// Caveat: dropping all backslashes also breaks legitimate JSON escapes
// such as \" or \n, so only use this on input that contains none.
func safety(d string) []byte {
	var buffer bytes.Buffer
	for _, c := range d {
		if c == '\\' {
			continue
		}
		if unicode.IsPrint(c) {
			buffer.WriteRune(c)
		} else {
			buffer.WriteString(url.QueryEscape(string(c)))
		}
	}
	return buffer.Bytes()
}
func main() {
	testJsonString := "{\"test_one\" : \"123\x10456\x0B789\v123\a456\"}"
	var dat map[string]interface{}
	err := json.Unmarshal(safety(testJsonString), &dat)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%v", dat)
}
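An alternative sketch that follows RFC 8259 more closely instead of URI-escaping: inside string literals, raw control characters are rewritten as \u00XX escapes (so the original characters survive decoding), and a backslash that does not start a valid JSON escape (such as the C-style \a) is dropped. Unlike safety, it leaves valid escapes such as \" intact and doesn't touch whitespace between tokens. sanitizeJSON is a hypothetical name, and it assumes these two defects are the only ones in the input.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// sanitizeJSON rewrites string literals so encoding/json accepts them:
//   - raw control bytes (below 0x20) become \u00XX escapes;
//   - a backslash not followed by a valid JSON escape character is dropped.
// Bytes outside string literals are copied unchanged. Working byte by byte is
// safe for UTF-8: multi-byte sequences never contain '"', '\\' or control bytes.
func sanitizeJSON(s string) string {
	var b strings.Builder
	inString := false
	for i := 0; i < len(s); i++ {
		c := s[i]
		if !inString {
			if c == '"' {
				inString = true
			}
			b.WriteByte(c)
			continue
		}
		switch {
		case c == '"':
			inString = false
			b.WriteByte(c)
		case c == '\\' && i+1 < len(s) && strings.IndexByte(`"\/bfnrtu`, s[i+1]) >= 0:
			// Valid escape: keep it and skip past the escaped byte.
			b.WriteByte(c)
			b.WriteByte(s[i+1])
			i++
		case c == '\\':
			// Invalid escape such as \a: drop the backslash, keep the letter.
		case c < 0x20:
			fmt.Fprintf(&b, `\u%04x`, c)
		default:
			b.WriteByte(c)
		}
	}
	return b.String()
}

func main() {
	inputs := []string{
		"{\"test_one\" : \"123\x10456\x0B789\v123\a456\"}",
		"{\"test_one\" : \"123456789\\a123456\"}",
	}
	for _, in := range inputs {
		var dat map[string]interface{}
		if err := json.Unmarshal([]byte(sanitizeJSON(in)), &dat); err != nil {
			panic(err)
		}
		fmt.Printf("%q\n", dat["test_one"])
	}
}
```

If you would rather discard the control characters than keep them, strip them from the decoded strings afterwards.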
json.Encoder seems to behave slightly differently from json.Marshal: it adds a newline at the end of the encoded value. Any idea why? It looks like a bug to me.
package main
import "fmt"
import "encoding/json"
import "bytes"
func main() {
	var v string
	v = "hello"
	buf := bytes.NewBuffer(nil)
	json.NewEncoder(buf).Encode(v)
	b, _ := json.Marshal(&v)
	fmt.Printf("%q, %q", buf.Bytes(), b)
}
This outputs
"\"hello\"\n", "\"hello\""
Try it in the Playground
Because they explicitly add a newline character in Encoder.Encode. Here's the source code of that func; its doc comment (which is the documentation) actually states that it appends a newline:
https://golang.org/src/encoding/json/stream.go?s=4272:4319
// Encode writes the JSON encoding of v to the stream,
// followed by a newline character.
//
// See the documentation for Marshal for details about the
// conversion of Go values to JSON.
func (enc *Encoder) Encode(v interface{}) error {
	if enc.err != nil {
		return enc.err
	}
	e := newEncodeState()
	err := e.marshal(v)
	if err != nil {
		return err
	}
	// Terminate each value with a newline.
	// This makes the output look a little nicer
	// when debugging, and some kind of space
	// is required if the encoded value was a number,
	// so that the reader knows there aren't more
	// digits coming.
	e.WriteByte('\n')
	if _, err = enc.w.Write(e.Bytes()); err != nil {
		enc.err = err
	}
	encodeStatePool.Put(e)
	return err
}
Now, why did the Go developers do it, other than because it "makes the output look a little nicer"? One answer:
Streaming
The Go json Encoder is designed for streaming (e.g. MB/GB/PB of JSON data). When streaming, you typically need a way to delimit where each value ends; in the case of Encoder.Encode(), that delimiter is a \n newline character. Sure, you can write to a buffer, but you can also write to any io.Writer, which streams each encoded v as it is written.
This is in contrast to json.Marshal (and its counterpart json.Unmarshal), which work on one complete JSON document held entirely in memory. That is a poor fit when the data is large or comes from an untrusted, unbounded source (e.g. an AJAX POST to your web service: what if someone posts a 100MB JSON file?). Also, json.Marshal produces a final, complete JSON document; you wouldn't expect to concatenate a few hundred Marshal results together. You'd use Encoder.Encode() to build a large set and write it to a buffer, stream, file, io.Writer, etc.
Whenever I'm in doubt whether something is a bug, I look up the source; that's one of the advantages of Go, since its standard library and compiler are written in Go itself. Within [n]vim I use \gb (from my .vimrc settings) to open the source definition in a browser.
You can erase the newline by seeking back one byte in the file:
f, _ := os.OpenFile(fname, ...)
encoder := json.NewEncoder(f)
encoder.Encode(v)
f.Seek(-1, io.SeekCurrent) // step back over the trailing '\n'; only works on seekable files
f.WriteString("other data ...")
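They should let users control this strange behavior, for example with:
a build option to disable it
Encoder.SetEOF(eof string)
Encoder.SetIndent(prefix, indent, eof string)
(These are proposals: encoding/json has no SetEOF, and the existing Encoder.SetIndent takes only prefix and indent.)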
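If you're encoding into a buffer rather than a seekable file, there's no need to seek: trim the newline instead. A common reason to prefer Encoder over json.Marshal is Encoder.SetEscapeHTML(false), and this sketch (encodeCompact is a hypothetical name) combines the two.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// encodeCompact encodes v with HTML escaping disabled and strips the
// newline that Encoder.Encode always appends.
func encodeCompact(v interface{}) ([]byte, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false)
	if err := enc.Encode(v); err != nil {
		return nil, err
	}
	return bytes.TrimSuffix(buf.Bytes(), []byte("\n")), nil
}

func main() {
	v := map[string]string{"html": "<b>hi</b>"}
	b, err := encodeCompact(v)
	if err != nil {
		panic(err)
	}
	m, err := json.Marshal(v) // escapes < and > as \u003c and \u003e
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n%q\n", b, m)
}
```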
The Encoder writes a stream of documents. The extra whitespace terminates a JSON document in the stream.
A terminator is required for stream readers. Consider a stream containing these JSON documents: 1, 2, 3. Without the extra whitespace, the data on the wire is the sequence of bytes 123. This is a single JSON document with the number 123, not three documents.
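The terminator is easy to see with json.Decoder, which reads consecutive documents from one stream. In this sketch (decodeAll is a hypothetical helper), the Encoder output 1\n2\n3\n decodes as three values, while the bytes 123 decode as a single number.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// decodeAll reads every JSON document in stream, in order.
func decodeAll(stream string) ([]interface{}, error) {
	dec := json.NewDecoder(strings.NewReader(stream))
	var out []interface{}
	for {
		var v interface{}
		err := dec.Decode(&v)
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		out = append(out, v)
	}
}

func main() {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	for _, n := range []int{1, 2, 3} {
		if err := enc.Encode(n); err != nil {
			panic(err)
		}
	}
	three, err := decodeAll(buf.String()) // buf holds "1\n2\n3\n"
	if err != nil {
		panic(err)
	}
	one, err := decodeAll("123")
	if err != nil {
		panic(err)
	}
	fmt.Println(len(three), len(one))
}
```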
I am designing a REST API to upload a largish (100MB) file together with some information. So it's natural to think of JSON encoding.
So something like this:
{
  file: content of the file or URL?
  name: string
  description: string
}
The name and description are easy to do with json but I'm not sure how the file content can be added to it.
Also, I'm thinking I should use the HTTP PUT method. Is this correct?
Incidentally, golang is used to implement this API if it matters.
For a JSON encoding, use a []byte value to hold the file contents. The standard encoding/json package encodes []byte values as base64 strings.
Here's a sketch of how to implement the JSON encoding. Declare a type representing the payload:
type Upload struct {
	Name        string
	Description string
	Content     []byte
}
To encode the file to a request body:
v := Upload{Name: fileName, Description: description, Content: content}
var buf bytes.Buffer
if err := json.NewEncoder(&buf).Encode(v); err != nil {
	// handle error
}
req, err := http.NewRequest("PUT", url, &buf)
if err != nil {
	// handle error
}
req.Header.Set("Content-Type", "application/json")
resp, err := http.DefaultClient.Do(req)
if err != nil {
	// handle error
}
defer resp.Body.Close()
To decode the payload from a request body on the server:
var v Upload
if err := json.NewDecoder(req.Body).Decode(&v); err != nil {
	// handle error
}
Another option is to use the mime/multipart package. The multipart encoding will be more efficient than JSON encoding because no base64 or other text encoding of the file is required for multipart.
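A sketch of the multipart variant, with client and server in one program for illustration. The form field names (name, description, file), the helper names and the 32MB in-memory limit are assumptions. It buffers the body in memory for simplicity; for a 100MB file you would rather stream it, e.g. by writing the multipart body into an io.Pipe from a goroutine.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"net/http/httptest"
	"strings"
)

// newUploadRequest builds a multipart/form-data PUT with two text fields and one file part.
func newUploadRequest(url, name, description, fileName string, content io.Reader) (*http.Request, error) {
	var body bytes.Buffer
	mw := multipart.NewWriter(&body)
	if err := mw.WriteField("name", name); err != nil {
		return nil, err
	}
	if err := mw.WriteField("description", description); err != nil {
		return nil, err
	}
	part, err := mw.CreateFormFile("file", fileName)
	if err != nil {
		return nil, err
	}
	if _, err := io.Copy(part, content); err != nil { // raw bytes, no base64
		return nil, err
	}
	if err := mw.Close(); err != nil { // writes the closing boundary
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPut, url, &body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", mw.FormDataContentType())
	return req, nil
}

// uploadHandler keeps up to 32MB of the form in memory and spills the rest to temp files.
func uploadHandler(w http.ResponseWriter, r *http.Request) {
	if err := r.ParseMultipartForm(32 << 20); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	file, hdr, err := r.FormFile("file")
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	defer file.Close()
	n, err := io.Copy(io.Discard, file) // a real server would store the file
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	fmt.Fprintf(w, "%s (%s): %s, %d bytes", r.FormValue("name"), r.FormValue("description"), hdr.Filename, n)
}

// uploadRoundTrip sends one upload to a local test server and returns its reply.
func uploadRoundTrip(name, description, fileName, content string) (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(uploadHandler))
	defer srv.Close()
	req, err := newUploadRequest(srv.URL, name, description, fileName, strings.NewReader(content))
	if err != nil {
		return "", err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	reply, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("status %d: %s", resp.StatusCode, reply)
	}
	return string(reply), nil
}

func main() {
	reply, err := uploadRoundTrip("report", "quarterly numbers", "report.csv", "a,b\n1,2\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(reply)
}
```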
To me, the most clear-cut way to do it would be to encode the file bytes somehow. Base64 seems like a good choice, and Go has built-in support for it in "encoding/base64".
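The two JSON answers meet here: encoding/json already encodes []byte fields with the standard base64 encoding, so with an Upload type like the one in the earlier answer you don't need to call encoding/base64 yourself. This sketch checks that (contentMatchesStdEncoding and roundTrip are hypothetical helpers). Keep in mind that base64 inflates the payload by about a third, so a 100MB file becomes roughly 133MB of JSON.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

type Upload struct {
	Name        string
	Description string
	Content     []byte
}

// contentMatchesStdEncoding reports whether json.Marshal renders u.Content
// exactly as base64.StdEncoding does.
func contentMatchesStdEncoding(u Upload) (bool, error) {
	b, err := json.Marshal(u)
	if err != nil {
		return false, err
	}
	// Decode into a generic map to see the Content string as it appears in the JSON.
	var m map[string]interface{}
	if err := json.Unmarshal(b, &m); err != nil {
		return false, err
	}
	return m["Content"] == base64.StdEncoding.EncodeToString(u.Content), nil
}

// roundTrip marshals and unmarshals u; the []byte field survives via base64.
func roundTrip(u Upload) (Upload, error) {
	var out Upload
	b, err := json.Marshal(u)
	if err != nil {
		return out, err
	}
	err = json.Unmarshal(b, &out)
	return out, err
}

func main() {
	u := Upload{Name: "a.txt", Description: "demo", Content: []byte("hello, file")}
	b, err := json.Marshal(u)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
	ok, err := contentMatchesStdEncoding(u)
	if err != nil {
		panic(err)
	}
	fmt.Println("matches base64.StdEncoding:", ok)
}
```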