I am writing a package to read CSV files in Go, and I need to open CSV files that may be encoded in different character sets (such as UTF-8, Latin-1, or others). Is there a way to specify the encoding of the CSV file to read?
Package csv
import "encoding/csv"
func NewReader
func NewReader(r io.Reader) *Reader
NewReader returns a new Reader that reads from r.
Provide an io.Reader to csv.NewReader that maps the CSV file character set to Unicode UTF-8.
For example,
import (
    "encoding/csv"
    "os"

    "golang.org/x/text/encoding/charmap"
)

file, err := os.Open(filename)
if err != nil {
    return err
}
defer file.Close()

// Wrap the file in a decoder that converts ISO 8859-15 to UTF-8 on the fly.
rdr := csv.NewReader(charmap.ISO8859_15.NewDecoder().Reader(file))
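The same pattern works for the other encodings in the golang.org/x/text/encoding subpackages, for example charmap.ISO8859_1 or charmap.Windows1252; pick the decoder that matches the character set of your input file.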
I'm trying to store UTF-8 text in a table whose encoding is latin1 (collation latin1_swedish_ci). I can't change the encoding since I don't have direct access to the db. So what I'm trying to do is encode the text into Latin-1 with this Go library, which provides the encoder, and this one, which has a function that wraps the encoder so that it replaces unsupported characters instead of returning an error.
But when I try to insert the row, MySQL complains: Error 1366: Incorrect string value: '\xE7\xE3o pa...' for column 'description' at row 1.
I tried writing the same text to a file, and file -I reports file.txt: application/octet-stream; charset=binary.
Example
package main

import (
    "fmt"
    "os"

    "golang.org/x/text/encoding"
    "golang.org/x/text/encoding/charmap"
)

func main() {
    s := "foo – bar"

    // Wrap the ISO 8859-1 encoder so unsupported runes are replaced
    // with a substitute character instead of causing an error.
    encoder := charmap.ISO8859_1.NewEncoder()
    encoder = encoding.ReplaceUnsupported(encoder)

    encoded, err := encoder.String(s)
    if err != nil {
        panic(err)
    }
    fmt.Println(s)
    fmt.Println(encoded)
    fmt.Printf("%q\n", encoded)

    /* file test */
    f, err := os.Create("file.txt")
    if err != nil {
        panic(err)
    }
    defer f.Close()

    w := encoder.Writer(f)
    if _, err := w.Write([]byte(s)); err != nil {
        panic(err)
    }
}
I'm probably missing something very obvious, but my knowledge of encodings is very poor.
Thanks in advance.
Were you expecting çã?
The problem is easily solved. MySQL will gladly translate from latin1 to utf8 while INSERTing text, but you must tell it that your client is using latin1. That is done when you connect to MySQL, and it probably defaults to utf8 or utf8mb4 at the moment. It is something like
charset=latin1
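For example, with the go-sql-driver/mysql driver, the client character set can be set as a DSN parameter. A minimal sketch, assuming that driver; the credentials, host, database, table, and column names are placeholders:
package main

import (
    "database/sql"

    _ "github.com/go-sql-driver/mysql" // registers the "mysql" driver
    "golang.org/x/text/encoding"
    "golang.org/x/text/encoding/charmap"
)

func main() {
    // charset=latin1 tells MySQL that this client sends Latin-1 bytes, so
    // the server transcodes them to the column's character set on INSERT.
    dsn := "user:password@tcp(localhost:3306)/mydb?charset=latin1"
    db, err := sql.Open("mysql", dsn)
    if err != nil {
        panic(err)
    }
    defer db.Close()

    // Encode the text to Latin-1 as in the question.
    encoder := encoding.ReplaceUnsupported(charmap.ISO8859_1.NewEncoder())
    encoded, err := encoder.String("coração partido")
    if err != nil {
        panic(err)
    }

    _, err = db.Exec("INSERT INTO items (description) VALUES (?)", encoded)
    if err != nil {
        panic(err)
    }
}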
I have an 8 GB CSV file that I need to unmarshal into a list of structs.
package main

import (
    "encoding/csv"
    "fmt"
    "io"
    "os"

    gocsv "github.com/gocarina/gocsv"
    dto "github.com/toto/GeoTransport/import/dto"
)

// importAdresse loads the address CSV and prints the first few records.
func importAdresse() {
    var adressesDB []dto.GeoAdresse
    clientsFile, err := os.OpenFile("../../../data/geo/public.geo_adresse.csv", os.O_RDWR|os.O_CREATE, os.ModePerm)
    if err != nil {
        panic(err)
    }
    defer clientsFile.Close()

    // Configure gocsv to use a semicolon as the delimiter.
    gocsv.SetCSVReader(func(in io.Reader) gocsv.CSVReader {
        r := csv.NewReader(in)
        r.Comma = ';'
        return r
    })

    // Load all addresses from the file at once.
    if err = gocsv.UnmarshalFile(clientsFile, &adressesDB); err != nil {
        panic(err)
    }

    i := 0
    for _, adresse := range adressesDB {
        fmt.Println("adresse.Numero")
        fmt.Printf("%+v\n", adresse)
        fmt.Println(adresse.Numero)
        i++
        if i == 3 {
            break
        }
    }
}

func main() {
    importAdresse()
}
I am using gocsv to unmarshal it, but I get a memory error: the program quits because it runs out of RAM.
I would like to know how to read the CSV line by line and unmarshal each line into a struct.
One solution would be to split the CSV file with a Unix command, but I would like to do it with Go alone.
It looks like the parsing method you're using attempts to read the entire CSV file into memory. You might try using the standard CSV reader package directly, or using another CSV-to-struct library that allows for line-by-line decoding like this one. Does the example code on those pages show what you're looking for?
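For instance, here is a minimal line-by-line sketch using only the standard encoding/csv package. The GeoAdresse fields and the column order are assumptions; adjust them to your file:
package main

import (
    "encoding/csv"
    "fmt"
    "io"
    "os"
)

// GeoAdresse is a stand-in for your dto.GeoAdresse; the fields and the
// column positions used below are hypothetical.
type GeoAdresse struct {
    Numero string
    Voie   string
}

func main() {
    f, err := os.Open("../../../data/geo/public.geo_adresse.csv")
    if err != nil {
        panic(err)
    }
    defer f.Close()

    r := csv.NewReader(f)
    r.Comma = ';'

    // Skip the header row, assuming the file has one.
    if _, err := r.Read(); err != nil {
        panic(err)
    }

    for {
        // Read returns one record at a time, so memory use stays flat
        // regardless of the file size.
        record, err := r.Read()
        if err == io.EOF {
            break
        }
        if err != nil {
            panic(err)
        }
        adresse := GeoAdresse{Numero: record[0], Voie: record[1]}
        fmt.Printf("%+v\n", adresse)
    }
}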
Another thing to try would be running wc -l ../../../data/geo/public.geo_adresse.csv to get the number of lines in your CSV file, then write this:
var adressesDB [<number of lines in your CSV>]dto.GeoAdresse
If the runtime raises an out-of-memory error on that line, the unmarshalled CSV data exceeds your RAM capacity and you'll have to read the file in chunks.
I have seen many Lambda functions that get CSV file data from S3 in Python and Node.js, but I have been trying to write the function in Go.
package main

import (
    "encoding/csv"
    "fmt"
    "os"
)

func main() {
    file, err := os.Open("testcsv.csv")
    if err != nil {
        fmt.Println(err)
        return
    }
    defer file.Close()

    reader := csv.NewReader(file)
    records, err := reader.ReadAll()
    if err != nil {
        fmt.Println(err)
        return
    }
    fmt.Println(records)
}
This is how I read the CSV file locally, but how do I write this as an AWS Lambda function?
You need to have the SDK set up; the Amazon docs are a good start.
Once you have the code, compile it with GOOS=linux, then deploy it to AWS as a zip file. Please note that the handler name should be the same as the binary.
More details on deployment here.
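A rough sketch of such a handler, assuming aws-lambda-go and the aws-sdk-go v1 S3 client; the bucket name and key are placeholders, and in a real function they would typically come from the triggering S3 event:
package main

import (
    "context"
    "encoding/csv"
    "fmt"

    "github.com/aws/aws-lambda-go/lambda"
    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/s3"
)

func handler(ctx context.Context) error {
    // Credentials and region come from the Lambda execution environment.
    sess := session.Must(session.NewSession())
    svc := s3.New(sess)

    // Placeholder bucket and key.
    out, err := svc.GetObject(&s3.GetObjectInput{
        Bucket: aws.String("my-bucket"),
        Key:    aws.String("testcsv.csv"),
    })
    if err != nil {
        return err
    }
    defer out.Body.Close()

    records, err := csv.NewReader(out.Body).ReadAll()
    if err != nil {
        return err
    }
    fmt.Println(records)
    return nil
}

func main() {
    lambda.Start(handler)
}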
Following this tutorial I'm trying to read a json file in Golang. It says there are two ways of doing that:
unmarshal the JSON using a set of predefined structs
or unmarshal the JSON using a map[string]interface{}
Since I'll probably have a lot of different JSON formats, I prefer to interpret them on the fly. So I now have the following code:
package main

import (
    "encoding/json"
    "fmt"
    "io/ioutil"
    "os"
)

func main() {
    // Open our jsonFile
    jsonFile, err := os.Open("users.json")
    // if os.Open returns an error then handle it
    if err != nil {
        fmt.Println(err)
    }
    fmt.Println("Successfully Opened users.json")
    // defer the closing of our jsonFile so that we can parse it later on
    defer jsonFile.Close()

    byteValue, _ := ioutil.ReadAll(jsonFile)

    var result map[string]interface{}
    json.Unmarshal(byteValue, &result)

    fmt.Println(result["users"])
    fmt.Printf("%T\n", result["users"])
}
This prints out:
Successfully Opened users.json
[map[type:Reader age:23 social:map[facebook:https://facebook.com twitter:https://twitter.com] name:Elliot] map[name:Fraser type:Author age:17 social:map[facebook:https://facebook.com twitter:https://twitter.com]]]
[]interface {}
At this point I don't understand how I can read the age of the first user (23). I tried some variations:
fmt.Println(result["users"][0])
fmt.Println(result["users"][0].age)
But apparently, type interface {} does not support indexing.
Is there a way that I can access the items in the json without defining the structure?
Probably you want
fmt.Println(result["users"].([]interface{})[0].(map[string]interface{})["age"])
As the JSON is a map containing a list of maps, every level is typed interface{}: the list has to be asserted to []interface{} before you can index it, and each element has to be asserted to map[string]interface{} before you can look up a key.
Defining a struct is much easier. My top tip for doing this is to use a website that converts JSON to a Go struct definition, like Json-To-Go
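For comparison, a sketch of the struct approach for the users.json above; the field names are inferred from the printed output:
package main

import (
    "encoding/json"
    "fmt"
    "io/ioutil"
)

// User mirrors one entry of the "users" array; the fields are inferred
// from the output shown in the question.
type User struct {
    Name   string `json:"name"`
    Type   string `json:"type"`
    Age    int    `json:"age"`
    Social struct {
        Facebook string `json:"facebook"`
        Twitter  string `json:"twitter"`
    } `json:"social"`
}

type Document struct {
    Users []User `json:"users"`
}

func main() {
    byteValue, err := ioutil.ReadFile("users.json")
    if err != nil {
        panic(err)
    }
    var doc Document
    if err := json.Unmarshal(byteValue, &doc); err != nil {
        panic(err)
    }
    fmt.Println(doc.Users[0].Age) // 23
}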
I'm now learning Go myself and am stuck on fetching and parsing HTML/XML. In Python, I usually write the following code when I do web scraping:
from urllib.request import urlopen, Request
url = "http://stackoverflow.com/"
req = Request(url)
html = urlopen(req).read()
Then I can get the raw HTML/XML as either a string or bytes and proceed to work with it. How can I do the same in Go? What I hope to get is the raw HTML data stored as either a string or a []byte (one can easily be converted to the other, so I don't mind which). I'm considering the gokogiri package for web scraping in Go (not sure I'll actually end up using it!), but it looks like it requires raw HTML text before doing any work with it...
So how can I acquire such object?
Or is there any better way to do web scraping work in Go?
Thanks.
From the Go http.Get Example:
package main

import (
    "fmt"
    "io/ioutil"
    "log"
    "net/http"
)

func main() {
    res, err := http.Get("http://www.google.com/robots.txt")
    if err != nil {
        log.Fatal(err)
    }
    robots, err := ioutil.ReadAll(res.Body)
    res.Body.Close()
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("%s", robots)
}
This reads the contents of http://www.google.com/robots.txt into the variable robots (a []byte, printed here as a string).
For XML parsing look into the Go encoding/xml package.
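As a taste, a minimal encoding/xml sketch; the XML payload and the struct shape here are made up for illustration:
package main

import (
    "encoding/xml"
    "fmt"
)

func main() {
    // Hypothetical payload; in practice this would be the bytes read from res.Body.
    data := []byte(`<page><title>Hello</title></page>`)

    var page struct {
        Title string `xml:"title"`
    }
    if err := xml.Unmarshal(data, &page); err != nil {
        panic(err)
    }
    fmt.Println(page.Title) // Hello
}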