Bulk insert with Golang and Gorm deadlocks in concurrent goroutines - MySQL

I'm trying to bulk insert many records using Gorm, Golang and MySQL. My code looks like this:
package main

import (
	"fmt"
	"sync"

	"gorm.io/driver/mysql"
	"gorm.io/gorm"
)

type Article struct {
	gorm.Model
	Code string `gorm:"size:255;uniqueIndex"`
}

func main() {
	db, err := gorm.Open(mysql.Open("root@tcp(127.0.0.1:3306)/q_test"), &gorm.Config{})
	if err != nil {
		panic(err)
	}
	db.AutoMigrate(&Article{})

	// err = db.Exec("TRUNCATE articles").Error
	err = db.Exec("DELETE FROM articles").Error
	if err != nil {
		panic(err)
	}

	// Build some articles
	n := 10000
	var articles []Article
	for i := 0; i < n; i++ {
		article := Article{Code: fmt.Sprintf("code_%d", i)}
		articles = append(articles, article)
	}

	// // Save articles
	// err = db.Create(&articles).Error
	// if err != nil {
	// 	panic(err)
	// }

	// Save articles with goroutines
	chunkSize := 100
	var wg sync.WaitGroup
	wg.Add(n / chunkSize)
	for i := 0; i < n; i += chunkSize {
		go func(i int) {
			defer wg.Done()
			chunk := articles[i : i+chunkSize]
			err := db.Create(&chunk).Error
			if err != nil {
				panic(err)
			}
		}(i)
	}
	wg.Wait()
}
When I run this code, sometimes (about one in three times) I get this error:
panic: Error 1213: Deadlock found when trying to get lock; try restarting transaction
If I run the code without goroutines (the commented lines), I get no deadlock. I've also noticed that if I remove the unique index on the code field, the deadlock no longer happens. The same is true if I replace the DELETE FROM articles statement with TRUNCATE articles.
I've also run the same code against PostgreSQL and it works, with no deadlocks.
Any idea why the deadlock happens only with the unique index on MySQL, and how to avoid it?

A DELETE statement is executed using row locks: each row in the table is locked for deletion. TRUNCATE TABLE always locks the table (and its pages), but not each row.
Source: https://stackoverflow.com/a/20559931/18012302
I think MySQL needs time to finish the DELETE query. Try adding a time.Sleep after the delete:
err = db.Exec("DELETE FROM articles").Error
if err != nil {
	panic(err)
}
time.Sleep(time.Second)
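Another option, since the error text itself says "try restarting transaction", is to retry a chunk when it hits a deadlock instead of sleeping. Below is a minimal sketch; insertChunkWithRetry is a hypothetical helper, the retry count and backoff are arbitrary choices, and matching the error by string is crude (a more robust check would inspect the driver's error code):

// insertChunkWithRetry retries a chunked insert when MySQL reports a
// deadlock (Error 1213), as the error message itself suggests.
// Needs "strings" and "time" in addition to the imports above.
func insertChunkWithRetry(db *gorm.DB, chunk []Article, maxRetries int) error {
	var err error
	for attempt := 1; attempt <= maxRetries; attempt++ {
		if err = db.Create(&chunk).Error; err == nil {
			return nil
		}
		if !strings.Contains(err.Error(), "Error 1213") {
			return err // not a deadlock, do not retry
		}
		// crude linear backoff; tune or replace as needed
		time.Sleep(time.Duration(attempt) * 10 * time.Millisecond)
	}
	return err
}

The goroutine body would then call insertChunkWithRetry(db, chunk, 3) instead of calling db.Create directly.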

Related

How to filter elements of a [][]string slice in Golang?

First of all, I'm new here and I'm trying to learn Golang. I would like to read my CSV file (which has 3 values per record: type, maker, model), create a new one, and after a filter operation write the new (filtered) data to the created CSV file. Here is my code so you can understand me more clearly.
package main

import (
	"encoding/csv"
	"fmt"
	"os"
)

func main() {
	// opening my csv file which is vehicles.csv
	recordFile, err := os.Open("vehicles.csv")
	if err != nil {
		fmt.Println("An error encountered ::", err)
		return
	}
	// reading it
	reader := csv.NewReader(recordFile)
	vehicles, err := reader.ReadAll()
	if err != nil {
		fmt.Println("An error encountered ::", err)
		return
	}
	// creating a new csv file
	newRecordFile, err := os.Create("newCsvFile.csv")
	if err != nil {
		fmt.Println("An error encountered ::", err)
		return
	}
	// writing vehicles.csv into the new csv
	writer := csv.NewWriter(newRecordFile)
	err = writer.WriteAll(vehicles)
	if err != nil {
		fmt.Println("An error encountered ::", err)
	}
}
After I build it, it works this way: it reads and writes all the data to the newly created CSV file. But the problem is that I want to filter duplicates out of the CSV I read (vehicles). I am creating another function (outside of the main function) to filter duplicates, but I can't do it because the type of vehicles is [][]string; I searched the internet about filtering duplicates, but all I found was for int or string types. What I want to do is create a function and call it before the WriteAll operation, so WriteAll can write the correct (duplicate-filtered) data into the new CSV file. Help me please!
I appreciate any answer.
Happy coding!
This depends on how you define "uniqueness", but in general there are a few parts to this problem.
What is unique?
1. All fields must be equal
2. Only some fields must be equal
3. Normalize some or all fields before comparing
You have a few approaches for applying your uniqueness, including:
1. You can use a map, keyed by the "pieces" of uniqueness; requires O(N) state
2. You can sort the records and compare with the prior record as you iterate; requires O(1) state but is more complicated
You have two approaches for filtering and outputting:
1. You can build a new slice based on the old one using a loop and write all at once; this requires O(N) space
2. You can write the records out to the file as you go if you don't need to sort; this requires O(1) space
I think a reasonably simple and performant approach would be to pick (1) from the first, (1) from the second, and (2) from the third, which together would look like:
package main

import (
	"encoding/csv"
	"errors"
	"io"
	"log"
	"os"
)

func main() {
	input, err := os.Open("vehicles.csv")
	if err != nil {
		log.Fatalf("opening input file: %s", err)
	}
	output, err := os.Create("vehicles_filtered.csv")
	if err != nil {
		log.Fatalf("creating output file: %s", err)
	}
	defer func() {
		// Ensure the file is closed at the end of the program
		if err := output.Close(); err != nil {
			log.Fatalf("finalizing output file: %s", err)
		}
	}()

	reader := csv.NewReader(input)
	writer := csv.NewWriter(output)
	seen := make(map[[3]string]bool)

	for {
		// Read in one record
		record, err := reader.Read()
		if errors.Is(err, io.EOF) {
			break
		}
		if err != nil {
			log.Fatalf("reading record: %s", err)
		}
		if len(record) != 3 {
			log.Printf("bad record %q", record)
			continue
		}

		// Check if the record has been seen before, skipping if so
		key := [3]string{record[0], record[1], record[2]}
		if seen[key] {
			continue
		}
		seen[key] = true

		// Write the record
		if err := writer.Write(record); err != nil {
			log.Fatalf("writing record %d: %s", len(seen), err)
		}
	}

	// Flush buffered output and surface any remaining write error
	writer.Flush()
	if err := writer.Error(); err != nil {
		log.Fatalf("flushing output: %s", err)
	}
}

XML Insert Performance into MySQL

I have some code which inserts records into the database.
The code is supposed to insert 15M records; right now, it takes 60 hours on an AWS t2.large instance. I'm looking for ways to make the inserts faster while also not duplicating records.
Do you have any suggestions for me?
I'm using Gorm and MySQL.
// InsertJob will insert a job into the database, by checking its hash.
func InsertJob(job XMLJob, oid int, ResourceID int) (Job, error) {
	db := globalDBConnection
	cleanJobDescription := job.Body
	hashString := GetMD5Hash(job.Title + job.Body + job.Location + job.Zip)
	JobDescriptionHash := GetMD5Hash(job.Body)
	empty := sql.NullString{String: "", Valid: true}
	j := Job{
		CurrencyID: 1, // USD
		// other fields here elided for brevity
		PrimaryIndustry: sql.NullString{String: job.PrimaryIndustry, Valid: true},
	}
	err := db.Where("hash = ?", hashString).Find(&j).Error
	if err != nil {
		if err.Error() != "record not found" {
			return j, err
		}
		err2 := db.Create(&j).Error
		if err2 != nil {
			log.Println("Unable to create job: " + err2.Error())
			return j, err2
		}
	}
	return j, nil
}
You can speed it up using the semaphore pattern:
https://play.golang.org/p/OxO8pNy3bc6
Inspired by this gist:
https://gist.github.com/montanaflynn/ea4b92ed640f790c4b9cee36046a5383
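In case the links go stale: the semaphore pattern boils down to a buffered channel that caps how many InsertJob calls run at once. A rough sketch adapted to this code (insertAll and the limit of 10 are made up for illustration; needs "log" and "sync"):

// insertAll caps how many InsertJob calls run at once using a buffered
// channel as a counting semaphore: sending acquires a slot, receiving
// releases it.
func insertAll(jobs []XMLJob, oid, resourceID int) {
	const maxConcurrent = 10 // made-up limit; tune to what the DB handles
	sem := make(chan struct{}, maxConcurrent)
	var wg sync.WaitGroup
	for _, job := range jobs {
		wg.Add(1)
		sem <- struct{}{} // blocks once maxConcurrent inserts are in flight
		go func(job XMLJob) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			if _, err := InsertJob(job, oid, resourceID); err != nil {
				log.Println("insert failed: " + err.Error())
			}
		}(job)
	}
	wg.Wait()
}

Capping the concurrency matters here: unbounded goroutines would exhaust the MySQL connection pool and can make throughput worse, not better.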

Deadlock error occurred when querying MySQL in Go: not in the query itself but after the "rows.Next()" loop

When I use Go to query MySQL, I sometimes find a deadlock error in my code.
My question is not "why did the deadlock occur", but why the deadlock error is found at "err = rows.Err()". In my mind, if a deadlock occurred, I should get it from the error returned by "tx.Query".
This is the demo code; "point 2" is where the deadlock error occurred:
func demoFunc(tx *sql.Tx, arg1, arg2 int) ([]outItem, error) {
	var ret []outItem
	var err error
	var rows *sql.Rows

	// xxxLockSql may deadlock, so try up to 3 times
	for i := 0; i < 3; i++ {
		//------ point 1
		rows, err = tx.Query(xxxLockSql, arg1, arg2)
		if err == nil {
			break
		}
		log.Printf("[ERROR] xxxLockSql failed, err %s, retry %d", err.Error(), i)
		time.Sleep(time.Millisecond * 10)
	}
	// if querying xxxLockSql failed all 3 times, return
	if err != nil {
		log.Printf("[ERROR] xxxLockSql failed, err %s", err.Error())
		return ret, err
	}
	defer rows.Close()

	for rows.Next() {
		err = rows.Scan(&a1, &a2)
		if err != nil {
			return ret, err
		}
		ret = append(ret, acl)
	}

	//------ point 2
	if err = rows.Err(); err != nil {
		// I find the deadlock error in this "if" block.
		// err content is "Error 1213: Deadlock found when trying to get lock; try restarting transaction"
		log.Printf("[ERROR] loop rows failed, err %s", err.Error())
		return ret, err
	}
	return ret, nil
}
I cannot be sure about the reason, since you did not mention your database driver (and which sql package you are using), but I think this is because sql.Query is lazy: querying and loading the rows is postponed until actual use, i.e. rows.Next(), which is why the deadlock error occurs there.
As for why the error shows up outside the loop: when an error occurs, rows.Next() returns false and the loop exits, and the error is then reported by rows.Err().
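A practical consequence for the retry loop in the question: because the deadlock can surface in rows.Err() rather than in tx.Query, retrying only the Query call will miss it, so the retry has to wrap the whole read. A rough sketch, reusing xxxLockSql and outItem from the question (the scanned fields are placeholders; note also that after error 1213 MySQL has typically rolled back the transaction, so restarting the transaction itself, not just the statement, is usually required):

// queryOnce runs one complete query-and-scan pass, so a deadlock reported
// during iteration comes back like any other error and can be retried.
func queryOnce(tx *sql.Tx, arg1, arg2 int) ([]outItem, error) {
	rows, err := tx.Query(xxxLockSql, arg1, arg2)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var ret []outItem
	for rows.Next() {
		var item outItem
		if err := rows.Scan(&item.a1, &item.a2); err != nil { // placeholder fields
			return nil, err
		}
		ret = append(ret, item)
	}
	return ret, rows.Err() // a deadlock during iteration surfaces here
}

func demoFuncRetrying(tx *sql.Tx, arg1, arg2 int) ([]outItem, error) {
	var ret []outItem
	var err error
	for i := 0; i < 3; i++ {
		if ret, err = queryOnce(tx, arg1, arg2); err == nil {
			return ret, nil
		}
		log.Printf("[ERROR] query failed, err %s, retry %d", err.Error(), i)
		time.Sleep(time.Millisecond * 10)
	}
	return ret, err
}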

How can I ensure that all of my write transactions get resolved in order? Also, why is the else branch not executing?

I'm trying to create a very simple Bolt database called "ledger.db" that includes one bucket, called "Users", which contains usernames as keys and balances as values, and allows users to transfer their balance to one another. I am using bolter to view the database in the command line.
There are two problems, both contained in the transfer function.
The first: inside the transfer function is an if/else. If the condition is true, it executes as it should. If it's false, nothing happens. There are no syntax errors and the program runs as though nothing is wrong; it just doesn't execute the else statement.
The second: even if the condition is true, when it executes, it doesn't update BOTH of the respective balance values in the database. It updates the balance of the receiver, but it doesn't do the same for the sender. The mathematical operations are completed and the values are marshaled into a JSON-compatible format.
The problem is that the sender's balance is not updated in the database.
Everything from the second "Success!" fmt.Println() call onward is not processed.
I've tried changing db.Update() to db.Batch(). I've tried changing the order of the Put() calls. I've tried messing with goroutines and defer, but I have no clue how to use those, as I am rather new to Go.
func (from *User) transfer(to User, amount int) error {
	var fbalance int = 0
	var tbalance int = 0

	db, err := bolt.Open("ledger.db", 0600, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	return db.Update(func(tx *bolt.Tx) error {
		uBuck := tx.Bucket([]byte("Users"))
		json.Unmarshal(uBuck.Get([]byte(from.username)), &fbalance)
		json.Unmarshal(uBuck.Get([]byte(to.username)), &tbalance)
		if amount <= fbalance {
			fbalance = fbalance - amount
			encoded, err := json.Marshal(fbalance)
			if err != nil {
				return err
			}
			tbalance = tbalance + amount
			encoded2, err := json.Marshal(tbalance)
			if err != nil {
				return err
			}
			fmt.Println("Success!")
			c := uBuck
			err = c.Put([]byte(to.username), encoded2)
			return err
			fmt.Println("Success!")
			err = c.Put([]byte(from.username), encoded)
			return err
			fmt.Println("Success!")
		} else {
			return fmt.Errorf("Not enough in balance!", amount)
		}
		return nil
	})
	return nil
}
func main() {
	/*
		db, err := bolt.Open("ledger.db", 0600, nil)
		if err != nil {
			log.Fatal(err)
		}
		defer db.Close()
	*/
	var b User = User{"Big", "jig", 50000, 0}
	var t User = User{"Trig", "pig", 40000, 0}

	// These two functions add each User to the database, they aren't
	// the problem
	b.createUser()
	t.createUser()

	/*
		db.View(func(tx *bolt.Tx) error {
			c := tx.Bucket([]byte("Users"))
			get := c.Get([]byte(b.username))
			fmt.Printf("The return value %v", get)
			return nil
		})
	*/
	t.transfer(b, 40000)
}
I expect the database to show Big:90000 Trig:0, starting from the initial values Big:50000 Trig:40000.
Instead, the program outputs Big:90000 Trig:40000.
You return unconditionally:

	c := uBuck
	err = c.Put([]byte(to.username), encoded2)
	return err
	fmt.Println("Success!")
	err = c.Put([]byte(from.username), encoded)
	return err
	fmt.Println("Success!")

Everything after the first return err is unreachable, so the sender's Put never runs. You are also not checking the errors you get back, for example from:

	json.Unmarshal(uBuck.Get([]byte(from.username)), &fbalance)
	json.Unmarshal(uBuck.Get([]byte(to.username)), &tbalance)
	t.transfer(b, 40000)

And so on. Debug your code statement by statement.
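Concretely, both Put calls have to happen before the closure returns anything. A minimal sketch of how the success branch could look, using the same variables as in the question:

	if err := uBuck.Put([]byte(to.username), encoded2); err != nil {
		return err
	}
	if err := uBuck.Put([]byte(from.username), encoded); err != nil {
		return err
	}
	fmt.Println("Success!")
	return nil // db.Update commits only when this closure returns nil

Returning nil only after both writes succeed also means a failed second Put rolls back the first one, since bolt's db.Update aborts the transaction on a non-nil error.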

DB calls in goroutine failing without error

I wrote a script to migrate lots of data from one DB to another and got it working fine, but now I want to try to use goroutines to speed up the script with concurrent DB calls. Since changing the call from processBatch(offset) to go processBatch(offset), I can see that a few goroutines are started, but the script finishes almost instantly and nothing is actually done. Also, the number of started goroutines varies every time I run the script. There are no errors (that I can see).
I'm still new to goroutines and Go in general, so any pointers as to what I might be doing wrong are much appreciated. I have removed all logic from the code below that is not related to concurrency or DB access, as it runs fine without the changes. I also left a comment where I believe it fails, as nothing below that line is run (a Println there gives no output). I also tried using sync.WaitGroup to stagger DB calls, but it didn't seem to change anything.
var (
	legacyDB *sql.DB
	v2DB     *sql.DB
)

func main() {
	var total, loops int
	var err error

	legacyDB, err = sql.Open("mysql", "...")
	if err != nil {
		panic(err)
	}
	defer legacyDB.Close()

	v2DB, err = sql.Open("mysql", "...")
	if err != nil {
		panic(err)
	}
	defer v2DB.Close()

	err = legacyDB.QueryRow("SELECT count(*) FROM users").Scan(&total)
	checkErr(err)
	loops = int(math.Ceil(float64(total) / float64(batchsize)))

	fmt.Println("Total: " + strconv.Itoa(total))
	fmt.Println("Loops: " + strconv.Itoa(loops))

	for i := 0; i < loops; i++ {
		offset := i * batchsize
		go processBatch(offset)
	}

	legacyDB.Close()
	v2DB.Close()
}

func processBatch(offset int) {
	query := namedParameterQuery.NewNamedParameterQuery(`
		SELECT ...
		LIMIT :offset,:batchsize
	`)
	query.SetValue(...)

	rows, err := legacyDB.Query(query.GetParsedQuery(), (query.GetParsedParameters())...)
	// nothing after this line gets done (Println here does not show output)
	checkErr(err)
	defer rows.Close()

	....

	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	log.Printf("\nAlloc = %v\nTotalAlloc = %v\nSys = %v\nNumGC = %v\n\n", m.Alloc/1024/1024, m.TotalAlloc/1024/1024, m.Sys/1024/1024, m.NumGC)
}

func checkErr(err error) {
	if err != nil {
		panic(err)
	}
}
As Nadh mentioned in a comment, that is because the program exits when the main function finishes, regardless of whether there are still other goroutines running. To fix this, a *sync.WaitGroup will suffice. A WaitGroup is used for cases where you have multiple concurrent operations and you would like to wait until they have all completed. Documentation can be found here: https://golang.org/pkg/sync/#WaitGroup.
An example implementation for your program without the use of additional global variables would look like replacing
	fmt.Println("Total: " + strconv.Itoa(total))
	fmt.Println("Loops: " + strconv.Itoa(loops))
	for i := 0; i < loops; i++ {
		offset := i * batchsize
		go processBatch(offset)
	}

with

	fmt.Println("Total: " + strconv.Itoa(total))
	fmt.Println("Loops: " + strconv.Itoa(loops))
	wg := new(sync.WaitGroup)
	wg.Add(loops)
	for i := 0; i < loops; i++ {
		offset := i * batchsize
		go func(offset int) {
			defer wg.Done()
			processBatch(offset)
		}(offset)
	}
	wg.Wait()