Safely Interrupt CSV read and write - csv

I am processing CSV files, and when I interrupt the process, I want to store the unprocessed data in another file.
This is what I've done:
csvFile, err := os.Open(csvPath)
r := csv.NewReader(csvFile)

sigc := make(chan os.Signal, 1)
signal.Notify(sigc,
    syscall.SIGHUP,
    syscall.SIGINT,
    syscall.SIGTERM,
    syscall.SIGQUIT)

go func() {
    <-sigc
    savePending(r)
}()

for {
    record, err := r.Read()
    if err == io.EOF {
        break
    }
    if err != nil {
        log.Println(record, err)
        continue
    }
    doSomethingWithRecord(record)
}
The savePending function:
func savePending(r *csv.Reader) {
    pendingFileName := fmt.Sprintf("%s_pending.csv", fileBaseName)
    csvPendingPath := path.Join(dirname, pendingFileName)
    pendingFile, err := os.Create(csvPendingPath)
    if err != nil {
        log.Fatalln("Couldn't open the csv file", csvPendingPath, err)
    }
    defer pendingFile.Close()
    pendR := csv.NewWriter(pendingFile)
    records, err := r.ReadAll()
    if err == io.EOF {
        log.Println("no pending records")
    }
    err = pendR.WriteAll(records)
    if err != nil {
        log.Println("error writing pending file")
    }
}
But when I run the code and then interrupt the script by pressing CTRL+C, I always get a panic:
panic: runtime error: slice bounds out of range [:7887] with capacity 4096
goroutine 82 [running]:
bufio.(*Reader).ReadSlice(0xc0000c2ea0, 0x105930a, 0x88, 0x90, 0xc00090cab0, 0x0, 0x0)
/usr/local/Cellar/go/1.13.3/libexec/src/bufio/bufio.go:334 +0x232
encoding/csv.(*Reader).readLine(0xc00015c1b0, 0x9, 0x9, 0xc00090cab0, 0xc00090f680, 0x20e)
/usr/local/Cellar/go/1.13.3/libexec/src/encoding/csv/reader.go:218 +0x49
encoding/csv.(*Reader).readRecord(0xc00015c1b0, 0x0, 0x0, 0x0, 0xc00090cab0, 0x9, 0x9, 0x0, 0x0)
/usr/local/Cellar/go/1.13.3/libexec/src/encoding/csv/reader.go:266 +0x115
encoding/csv.(*Reader).ReadAll(0xc00015c1b0, 0xc0005af2c0, 0x1000, 0xc0006fc000, 0xc0001da608, 0x0)
/usr/local/Cellar/go/1.13.3/libexec/src/encoding/csv/reader.go:202 +0x74
main.savePending(0xc00015c1b0, 0x0, 0x0, 0x0)
What could be the issue?

While the savePending function is being started, the main goroutine continues to read from the reader. csv.Reader (and the bufio.Reader underneath it) is not safe for concurrent use, so two goroutines reading at once corrupt the buffer, which is exactly the slice-bounds panic you see.
How about aborting the for loop on <-sigc and saving the rest then:
csvFile, err := os.Open(csvPath)
r := csv.NewReader(csvFile)

sigc := make(chan os.Signal, 1)
signal.Notify(sigc,
    syscall.SIGHUP,
    syscall.SIGINT,
    syscall.SIGTERM,
    syscall.SIGQUIT)

for {
    select {
    case <-sigc:
        savePending(r)
        return
    default:
    }
    record, err := r.Read()
    if err == io.EOF {
        break
    }
    if err != nil {
        log.Println(record, err)
        continue
    }
    doSomethingWithRecord(record)
}
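The select with an empty default makes the signal check non-blocking, and because the loop itself calls savePending, only one goroutine ever touches the reader. On Go 1.16+ the same pattern can be expressed with signal.NotifyContext; a self-contained sketch, where the constant and the two stubs stand in for the asker's code:

package main

import (
    "context"
    "encoding/csv"
    "io"
    "log"
    "os"
    "os/signal"
    "syscall"
)

const csvPath = "data.csv" // placeholder for the asker's path

func doSomethingWithRecord(record []string) { log.Println(record) } // stub

func savePending(r *csv.Reader) { /* the asker's savePending, unchanged */ }

func main() {
    csvFile, err := os.Open(csvPath)
    if err != nil {
        log.Fatalln(err)
    }
    defer csvFile.Close()
    r := csv.NewReader(csvFile)

    // ctx.Done() is closed on the first matching signal; stop() releases
    // the signal registration on the way out.
    ctx, stop := signal.NotifyContext(context.Background(),
        syscall.SIGHUP, syscall.SIGINT, syscall.SIGTERM, syscall.SIGQUIT)
    defer stop()

    for {
        select {
        case <-ctx.Done():
            savePending(r) // safe: only this goroutine ever touches r
            return
        default:
        }
        record, err := r.Read()
        if err == io.EOF {
            break
        }
        if err != nil {
            log.Println(record, err)
            continue
        }
        doSomethingWithRecord(record)
    }
}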

Related

Why could calling rows.Close() take a long time in pgx?

Why does calling rows.Close() take so long when I call it after exiting the rows.Next() loop early, before processing all of its elements?
It happens when I make a request that returns a huge amount of data (around 300,000 rows).
The problem doesn't exist when the amount of rows is not so big.
func SelectHugeAmountOfRows() {
    query := `SELECT * FROM big_table`
    rows, _ := Conn.Query(context.Background(), query)
    defer func() {
        fmt.Println("Start close rows")
        start := time.Now()
        rows.Close()
        duration := time.Since(start)
        fmt.Println("rowsClose duration:", duration)
    }()
    for rows.Next() {
        rowValues, err := rows.Values()
        if err != nil {
            fmt.Println(err)
            return
        }
        // do something with rowValues and get error in process
        err = func() error {
            fmt.Println(rowValues)
            return errors.New("some error")
        }()
        if err != nil {
            return
        }
    }
    if err := rows.Err(); err != nil {
        fmt.Println(err)
        return
    }
}
rowsClose duration: 1m2.5488669s
Interestingly, the duration of rows.Close() in this case is the same as if I had processed all elements of the rows.Next() loop without breaking out of it.
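One likely explanation: pgx streams the entire result set over the connection, so rows.Close() has to read and discard every row you did not consume before the connection can be reused, and that drain takes about as long as finishing the loop would have. If aborting the query is acceptable, canceling its context makes Close() return quickly instead of draining. A rough sketch, assuming pgx v4, the asker's package-level Conn, and a hypothetical process function standing in for the per-row work:

func SelectHugeAmountOfRows() {
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    rows, err := Conn.Query(ctx, `SELECT * FROM big_table`)
    if err != nil {
        fmt.Println(err)
        return
    }
    defer rows.Close()

    for rows.Next() {
        rowValues, err := rows.Values()
        if err != nil {
            fmt.Println(err)
            return
        }
        if err := process(rowValues); err != nil {
            // Cancel the query so Close() aborts instead of draining the
            // remaining rows. The connection is discarded rather than
            // reused; a pool will re-establish it transparently.
            cancel()
            return
        }
    }
    if err := rows.Err(); err != nil {
        fmt.Println(err)
    }
}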

How to process a request that has multiple inputs and multiple files at the same time

I'm building a backend Go server that can take a form with multiple inputs, three of which are multiple-file inputs. From what I've read, if you want to make something like this work you don't want to use the typical
if err := r.ParseMultipartForm(32 << 20); err != nil {
    fmt.Println(err)
}
// get a reference to the fileHeaders
files := r.MultipartForm.File["coverArt"]
and instead you should use
mr, err := r.MultipartReader()
if err != nil {
    http.Error(w, err.Error(), http.StatusInternalServerError)
}
Standard form-data:
Name
Email
Cover art photos (multiple files)
Profile photos (multiple files)
2 Audio files (2 songs)
2 Videos (personal intro, recording of person in a cappella)
HTML Form
<form method="post" enctype="multipart/form-data" action="/upload">
    <input type="text" name="name">
    <input type="text" name="email">
    <input name="coverArt" type="file" multiple />
    <input name="profile" type="file" multiple />
    <input type="file" name="songs" multiple />
    <input type="file" name="videos" multiple />
    <button type="submit">Upload File</button>
</form>
Go Code:
func FilePOST(w http.ResponseWriter, r *http.Request) error {
    fmt.Println("File Upload Endpoint Hit")
    mr, err := r.MultipartReader()
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
    }
    for {
        part, err := mr.NextPart()
        // This is OK, no more parts
        if err == io.EOF {
            break
        }
        // Some error
        if err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
        }
        // CoverArt 'files' part
        if part.FormName() == "coverArt" {
            name := part.FileName()
            outfile, err := os.Create("uploads/" + name)
            if err != nil {
                http.Error(w, err.Error(), http.StatusInternalServerError)
                // return
            }
            defer outfile.Close()
            _, err = io.Copy(outfile, part)
            if err != nil {
                http.Error(w, err.Error(), http.StatusInternalServerError)
                // return
            }
        }
        // Profile Pic 'files' part
        if part.FormName() == "profile" {
            name := part.FileName()
            outfile, err := os.Create("uploads/" + name)
            if err != nil {
                http.Error(w, err.Error(), http.StatusInternalServerError)
                // return
            }
            defer outfile.Close()
            _, err = io.Copy(outfile, part)
            if err != nil {
                http.Error(w, err.Error(), http.StatusInternalServerError)
                // return
            }
        }
        // Songs 'files' part
        if part.FormName() == "songs" {
            name := part.FileName()
            outfile, err := os.Create("uploads/" + name)
            if err != nil {
                http.Error(w, err.Error(), http.StatusInternalServerError)
                // return
            }
            defer outfile.Close()
            _, err = io.Copy(outfile, part)
            if err != nil {
                http.Error(w, err.Error(), http.StatusInternalServerError)
                // return
            }
        }
        // Video 'files' part
        if part.FormName() == "videos" {
            name := part.FileName()
            outfile, err := os.Create("uploads/" + name)
            if err != nil {
                http.Error(w, err.Error(), http.StatusInternalServerError)
                // return
            }
            defer outfile.Close()
            _, err = io.Copy(outfile, part)
            if err != nil {
                http.Error(w, err.Error(), http.StatusInternalServerError)
                // return
            }
        }
    }
    fmt.Println("done")
    return nil
}
Go Server Error:
go run main.go [15:58:21]
now serving at the following location www.localhost:3000
File Upload Endpoint Hit
INFO[0009] POST /upload elapsed="680.422µs" host= method=POST path=/upload query=
2021/07/14 15:58:32 http: panic serving [::1]:62924: runtime error: invalid memory address or nil pointer dereference
It is hard to guess where your code panics. Probably the reason is that your program continues to execute after an error occurs. For example, if r.MultipartReader() fails, mr is nil and the subsequent mr.NextPart() call dereferences a nil pointer; similarly, if creating a file fails, the code continues with a nil outfile.
Both approaches support multiple files for a single field. The difference is in how they handle memory. The streaming version reads small portions of data from the network and writes them to a file when you call io.Copy. The other variant loads the data into memory when you call ParseMultipartForm() (up to its maxMemory argument, with the remainder spilled to temporary files on disk), so it can require as much memory as the size of the files you want to transfer. Below you will find working examples of both variants.
Streaming variant:
package main

import (
    "errors"
    "fmt"
    "io"
    "log"
    "mime/multipart"
    "net/http"
    "os"
)

func storeFile(part *multipart.Part) error {
    name := part.FileName()
    outfile, err := os.Create("uploads/" + name)
    if err != nil {
        return err
    }
    defer outfile.Close()
    _, err = io.Copy(outfile, part)
    if err != nil {
        return err
    }
    return nil
}

func filePOST(w http.ResponseWriter, r *http.Request) error {
    fmt.Println("File Upload Endpoint Hit")
    mr, err := r.MultipartReader()
    if err != nil {
        return err
    }
    for {
        part, err := mr.NextPart()
        switch {
        case errors.Is(err, io.EOF):
            // This is OK, no more parts
            fmt.Println("done")
            return nil
        case err != nil:
            // Some error
            return err
        default:
            switch part.FormName() {
            case "coverArt", "profile", "songs", "videos":
                if err := storeFile(part); err != nil {
                    return err
                }
            }
        }
    }
}

func main() {
    http.HandleFunc("/upload", func(writer http.ResponseWriter, request *http.Request) {
        err := filePOST(writer, request)
        if err != nil {
            http.Error(writer, err.Error(), http.StatusInternalServerError)
            log.Println("Error", err)
        }
    })
    if err := http.ListenAndServe(":8080", nil); err != nil {
        log.Fatal(err)
    }
}
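One caveat with storeFile: part.FileName() comes straight from the client, so a crafted name could point outside the uploads directory. A hardened sketch using the standard library's filepath.Base (requires importing path/filepath; this is an extra precaution, not part of the original answer):

// storeFile, with the client-supplied name reduced to its last path element.
func storeFile(part *multipart.Part) error {
    name := filepath.Base(part.FileName()) // drops any directory components
    if name == "." || name == "/" {
        return fmt.Errorf("invalid file name %q", part.FileName())
    }
    outfile, err := os.Create(filepath.Join("uploads", name))
    if err != nil {
        return err
    }
    defer outfile.Close()
    _, err = io.Copy(outfile, part)
    return err
}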
And the version with ParseMultipartForm, which reads the data into memory:
package main

import (
    "fmt"
    "io"
    "log"
    "mime/multipart"
    "net/http"
    "os"
)

func storeFile(part *multipart.FileHeader) error {
    name := part.Filename
    infile, err := part.Open()
    if err != nil {
        return err
    }
    defer infile.Close()
    outfile, err := os.Create("uploads/" + name)
    if err != nil {
        return err
    }
    defer outfile.Close()
    _, err = io.Copy(outfile, infile)
    if err != nil {
        return err
    }
    return nil
}

func FilePOST(w http.ResponseWriter, r *http.Request) error {
    fmt.Println("File Upload Endpoint Hit")
    if err := r.ParseMultipartForm(2 << 24); err != nil {
        return err
    }
    for _, fileType := range []string{"coverArt", "profile", "songs", "videos"} {
        uploadedFiles, exists := r.MultipartForm.File[fileType]
        if !exists {
            continue
        }
        for _, file := range uploadedFiles {
            if err := storeFile(file); err != nil {
                return err
            }
        }
    }
    return nil
}

func main() {
    http.HandleFunc("/upload", func(writer http.ResponseWriter, request *http.Request) {
        err := FilePOST(writer, request)
        if err != nil {
            http.Error(writer, err.Error(), http.StatusInternalServerError)
            log.Println("Error", err)
        }
    })
    if err := http.ListenAndServe(":8080", nil); err != nil {
        log.Fatal(err)
    }
}
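With either variant, pointing the HTML form above at http://localhost:8080/upload gives a quick end-to-end test. Note that the uploads/ directory must already exist, since os.Create does not create parent directories.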

Process csv file from upload

I have a gin application that receives a POST request containing a CSV file, which I want to read without saving it. I'm stuck here trying to read from the POST request, getting the following error message: cannot use file (variable of type *multipart.FileHeader) as io.Reader value in argument to csv.NewReader: missing method Read
file, err := c.FormFile("file")
if err != nil {
    errList["Invalid_body"] = "Unable to get request"
    c.JSON(http.StatusUnprocessableEntity, gin.H{
        "status": http.StatusUnprocessableEntity,
        "error":  errList,
    })
}
r := csv.NewReader(file) // <= Error message
records, err := r.ReadAll()
for _, record := range records {
    fmt.Println(record)
}
Is there a good example that I could use?
First read the file and the header:
csvPartFile, csvHeader, openErr := r.FormFile("file")
if openErr != nil {
    // handle error
}
Then read the lines from the file:
csvLines, readErr := csv.NewReader(csvPartFile).ReadAll()
if readErr != nil {
    // handle error
}
You can go through the lines by looping over the records:
for _, line := range csvLines {
    fmt.Println(line)
}
As other answers have mentioned, you should Open() it first.
The latest version of gin.Context.FormFile(string) seems to return only two values.
This worked for me:
func(c *gin.Context) {
    file_ptr, err := c.FormFile("file")
    if err != nil {
        log.Println(err.Error())
        c.Status(http.StatusUnprocessableEntity)
        return
    }
    log.Println(file_ptr.Filename)
    file, err := file_ptr.Open()
    if err != nil {
        log.Println(err.Error())
        c.Status(http.StatusUnprocessableEntity)
        return
    }
    defer file.Close()
    records, err := csv.NewReader(file).ReadAll()
    if err != nil {
        log.Println(err.Error())
        c.Status(http.StatusUnprocessableEntity)
        return
    }
    for _, line := range records {
        fmt.Println(line)
    }
}
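If the uploaded CSV can be large, ReadAll pulls the whole file into memory first. A sketch of the same handler reading one record at a time (handleCSVUpload is a name of my choosing; imports as above plus encoding/csv and io):

func handleCSVUpload(c *gin.Context) {
    file_ptr, err := c.FormFile("file")
    if err != nil {
        log.Println(err.Error())
        c.Status(http.StatusUnprocessableEntity)
        return
    }
    file, err := file_ptr.Open()
    if err != nil {
        log.Println(err.Error())
        c.Status(http.StatusUnprocessableEntity)
        return
    }
    defer file.Close()
    r := csv.NewReader(file)
    for {
        record, err := r.Read()
        if err == io.EOF { // end of input
            break
        }
        if err != nil {
            log.Println(err.Error())
            c.Status(http.StatusUnprocessableEntity)
            return
        }
        fmt.Println(record) // handle one record at a time
    }
    c.Status(http.StatusOK)
}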

SQL result to JSON as fast as possible

I'm trying to transform the Go built-in sql result to JSON. I'm using goroutines for that, but I ran into problems.
The base problem:
There is a really big database with around 200k users, and I have to serve them through TCP sockets in a microservice-based system. Getting the users from the database takes 20 ms, but transforming this bunch of data to JSON takes 10 seconds with the current solution. This is why I want to use goroutines.
Solution with Goroutines:
func getJSON(rows *sql.Rows, cnf configure.Config) ([]byte, error) {
    log := logan.Log{
        Cnf: cnf,
    }
    cols, _ := rows.Columns()
    defer rows.Close()
    done := make(chan struct{})
    go func() {
        defer close(done)
        for result := range resultChannel {
            results = append(
                results,
                result,
            )
        }
    }()
    wg.Add(1)
    go func() {
        for rows.Next() {
            wg.Add(1)
            go handleSQLRow(cols, rows)
        }
        wg.Done()
    }()
    go func() {
        wg.Wait()
        defer close(resultChannel)
    }()
    <-done
    s, err := json.Marshal(results)
    results = []resultContainer{}
    if err != nil {
        log.Context(1).Error(err)
    }
    rows.Close()
    return s, nil
}
func handleSQLRow(cols []string, rows *sql.Rows) {
    defer wg.Done()
    result := make(map[string]string, len(cols))
    fmt.Println("asd -> " + strconv.Itoa(counter))
    counter++
    rawResult := make([][]byte, len(cols))
    dest := make([]interface{}, len(cols))
    for i := range rawResult {
        dest[i] = &rawResult[i]
    }
    rows.Scan(dest...) // GET PANIC
    for i, raw := range rawResult {
        if raw == nil {
            result[cols[i]] = ""
        } else {
            fmt.Println(string(raw))
            result[cols[i]] = string(raw)
        }
    }
    resultChannel <- result
}
This solution gives me a panic with the following message:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x45974c]
goroutine 408 [running]:
panic(0x7ca140, 0xc420010150)
/usr/lib/golang/src/runtime/panic.go:500 +0x1a1
database/sql.convertAssign(0x793960, 0xc420529210, 0x7a5240, 0x0, 0x0, 0x0)
/usr/lib/golang/src/database/sql/convert.go:88 +0x1ef1
database/sql.(*Rows).Scan(0xc4203e4060, 0xc42021fb00, 0x44, 0x44, 0x44, 0x44)
/usr/lib/golang/src/database/sql/sql.go:1850 +0xc2
github.com/PumpkinSeed/zerodb/operations.handleSQLRow(0xc420402000, 0x44, 0x44, 0xc4203e4060)
/home/loow/gopath/src/github.com/PumpkinSeed/zerodb/operations/operations.go:290 +0x19c
created by github.com/PumpkinSeed/zerodb/operations.getJSON.func2
/home/loow/gopath/src/github.com/PumpkinSeed/zerodb/operations/operations.go:258 +0x91
exit status 2
The current solution, which works but takes too much time:
func getJSON(rows *sql.Rows, cnf configure.Config) ([]byte, error) {
    log := logan.Log{
        Cnf: cnf,
    }
    var results []resultContainer
    cols, _ := rows.Columns()
    rawResult := make([][]byte, len(cols))
    dest := make([]interface{}, len(cols))
    for i := range rawResult {
        dest[i] = &rawResult[i]
    }
    defer rows.Close()
    for rows.Next() {
        result := make(map[string]string, len(cols))
        rows.Scan(dest...)
        for i, raw := range rawResult {
            if raw == nil {
                result[cols[i]] = ""
            } else {
                result[cols[i]] = string(raw)
            }
        }
        results = append(results, result)
    }
    s, err := json.Marshal(results)
    if err != nil {
        log.Context(1).Error(err)
    }
    rows.Close()
    return s, nil
}
Question:
Why does the goroutine solution give me an error? It is not an obvious panic, because the first ~200 goroutines run properly.
UPDATE
Performance test for the original working solution:
INFO[0020] setup taken -> 3.149124658s file=operations.go func=operations.getJSON line=260 service="Database manager" ts="2017-04-02 19:45:27.132881211 +0100 BST"
INFO[0025] toJSON taken -> 5.317647046s file=operations.go func=operations.getJSON line=263 service="Database manager" ts="2017-04-02 19:45:32.450551417 +0100 BST"
The SQL-to-map step takes 3 seconds and the map-to-JSON step 5 seconds.
Goroutines won't improve performance on CPU-bound operations like JSON marshaling. What you need is a more efficient JSON marshaler. There are some available, although I haven't used any; a simple Google search for 'faster JSON marshaling' will turn up many results. A popular one is ffjson. I suggest starting there.
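A note on the panic itself: database/sql's Rows is not safe for concurrent use, so the goroutines calling rows.Scan race with the loop calling rows.Next (and with the deferred Close), which fits the convertAssign nil dereference in the trace. Independent of the marshaler, one way to cut the intermediate allocations is to stream each row to the output writer as it is scanned, on a single goroutine. A minimal standard-library sketch (rowsToJSON and its shape are my assumption, not the asker's API; imports: database/sql, encoding/json, io):

func rowsToJSON(rows *sql.Rows, w io.Writer) error {
    defer rows.Close()
    cols, err := rows.Columns()
    if err != nil {
        return err
    }
    rawResult := make([][]byte, len(cols))
    dest := make([]interface{}, len(cols))
    for i := range rawResult {
        dest[i] = &rawResult[i]
    }
    if _, err := io.WriteString(w, "["); err != nil {
        return err
    }
    first := true
    for rows.Next() {
        if err := rows.Scan(dest...); err != nil {
            return err
        }
        result := make(map[string]string, len(cols))
        for i, raw := range rawResult {
            result[cols[i]] = string(raw) // a nil []byte becomes ""
        }
        buf, err := json.Marshal(result)
        if err != nil {
            return err
        }
        if !first {
            if _, err := io.WriteString(w, ","); err != nil {
                return err
            }
        }
        first = false
        if _, err := w.Write(buf); err != nil {
            return err
        }
    }
    if _, err := io.WriteString(w, "]"); err != nil {
        return err
    }
    return rows.Err()
}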

Efficient read and write CSV in Go

The Go code below reads in a 10,000-record CSV (of timestamp times and float values), runs some operations on the data, and then writes the original values to another CSV along with an additional column for score. However, it is terribly slow (i.e. hours, though most of that is calculateStuff()), and I'm curious whether there are any inefficiencies in the CSV reading/writing I can take care of.
package main

import (
    "encoding/csv"
    "log"
    "os"
    "strconv"
)

func ReadCSV(filepath string) ([][]string, error) {
    csvfile, err := os.Open(filepath)
    if err != nil {
        return nil, err
    }
    defer csvfile.Close()
    reader := csv.NewReader(csvfile)
    fields, err := reader.ReadAll()
    return fields, err
}
func main() {
    // load data csv
    records, err := ReadCSV("./path/to/datafile.csv")
    if err != nil {
        log.Fatal(err)
    }
    // write results to a new csv
    outfile, err := os.Create("./where/to/write/resultsfile.csv")
    if err != nil {
        log.Fatal("Unable to open output")
    }
    defer outfile.Close()
    writer := csv.NewWriter(outfile)
    for i, record := range records {
        time := record[0]
        value := record[1]
        // skip header row
        if i == 0 {
            writer.Write([]string{time, value, "score"})
            continue
        }
        // get float values
        floatValue, err := strconv.ParseFloat(value, 64)
        if err != nil {
            log.Fatalf("Record: %v, Error: %v", floatValue, err)
        }
        // calculate scores; THIS EXTERNAL METHOD CANNOT BE CHANGED
        score := calculateStuff(floatValue)
        valueString := strconv.FormatFloat(floatValue, 'f', 8, 64)
        scoreString := strconv.FormatFloat(score, 'f', 8, 64)
        //fmt.Printf("Result: %v\n", []string{time, valueString, scoreString})
        writer.Write([]string{time, valueString, scoreString})
    }
    writer.Flush()
}
I'm looking for help making this CSV read/write template code as fast as possible. For the scope of this question we need not worry about the calculateStuff method.
You're loading the whole file into memory first and then processing it, which can be slow with a big file.
You need to loop and call Read, processing one line at a time.
func processCSV(rc io.Reader) (ch chan []string) {
    ch = make(chan []string, 10)
    go func() {
        r := csv.NewReader(rc)
        if _, err := r.Read(); err != nil { // read header
            log.Fatal(err)
        }
        defer close(ch)
        for {
            rec, err := r.Read()
            if err != nil {
                if err == io.EOF {
                    break
                }
                log.Fatal(err)
            }
            ch <- rec
        }
    }()
    return
}
Note: it's roughly based on Dave C's comment.
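Consuming the channel then looks roughly like this:

f, err := os.Open("./path/to/datafile.csv")
if err != nil {
    log.Fatal(err)
}
defer f.Close()
for rec := range processCSV(f) {
    fmt.Println(rec) // handle one record at a time
}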
This is essentially Dave C's answer from the comments section:
package main

import (
    "encoding/csv"
    "io"
    "log"
    "os"
    "strconv"
)

func main() {
    // setup reader
    csvIn, err := os.Open("./path/to/datafile.csv")
    if err != nil {
        log.Fatal(err)
    }
    r := csv.NewReader(csvIn)
    // setup writer
    csvOut, err := os.Create("./where/to/write/resultsfile.csv")
    if err != nil {
        log.Fatal("Unable to open output")
    }
    w := csv.NewWriter(csvOut)
    defer csvOut.Close()
    // handle header
    rec, err := r.Read()
    if err != nil {
        log.Fatal(err)
    }
    rec = append(rec, "score")
    if err = w.Write(rec); err != nil {
        log.Fatal(err)
    }
    for {
        rec, err = r.Read()
        if err != nil {
            if err == io.EOF {
                break
            }
            log.Fatal(err)
        }
        // get float value
        value := rec[1]
        floatValue, err := strconv.ParseFloat(value, 64)
        if err != nil {
            log.Fatalf("Record, error: %v, %v", value, err)
        }
        // calculate scores; THIS EXTERNAL METHOD CANNOT BE CHANGED
        score := calculateStuff(floatValue)
        scoreString := strconv.FormatFloat(score, 'f', 8, 64)
        rec = append(rec, scoreString)
        if err = w.Write(rec); err != nil {
            log.Fatal(err)
        }
        w.Flush()
    }
}
Note of course the logic is all jammed into main(); it would be better to split it into several functions, but that's beyond the scope of this question.
encoding/csv is indeed very slow on big files, as it performs a lot of allocations. Since your format is so simple, I recommend using strings.Split instead, which is much faster.
If even that is not fast enough, you can consider implementing the parsing yourself using strings.IndexByte, which is implemented in assembly: http://golang.org/src/strings/strings_decl.go?s=274:310#L1
Having said that, you should also reconsider using ReadAll if the file is larger than your memory.
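A minimal sketch of the strings.Split approach, assuming the same two-column time,value layout; note that unlike encoding/csv it does not handle quoted or escaped fields:

package main

import (
    "bufio"
    "fmt"
    "log"
    "os"
    "strconv"
    "strings"
)

func main() {
    f, err := os.Open("./path/to/datafile.csv")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    sc := bufio.NewScanner(f)
    sc.Scan() // skip the header row
    for sc.Scan() {
        fields := strings.Split(sc.Text(), ",")
        if len(fields) != 2 {
            log.Fatalf("bad record: %q", sc.Text())
        }
        value, err := strconv.ParseFloat(fields[1], 64)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(fields[0], value) // time, parsed value
    }
    if err := sc.Err(); err != nil {
        log.Fatal(err)
    }
}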