I wrote a script to migrate lots of data from one DB to another and got it working fine, but now I want to try and use goroutines to speed up the script by using concurrent DB calls. Since making the change to calling go processBatch(offset) instead of just processBatch(offset), I can see that a few goroutines are started but the script finishes almost instantly and nothing is actually done. Also the number of started goroutines varies every time I call the script. There are no errors (that I can see).
I'm still new to goroutines and Go in general, so any pointers as to what I might be doing wrong are much appreciated. I have removed all logic from the code below that is not related to concurrency or DB access, as it runs fine without the changes. I also left a comment where I believe it fails, as nothing below that line is run (a Println there gives no output). I also tried using sync.WaitGroup to stagger DB calls, but it didn't seem to change anything.
var (
legacyDB *sql.DB
v2DB *sql.DB
)
func main() {
var total, loops int
var err error
legacyDB, err = sql.Open("mysql", "...")
if err != nil {
panic(err)
}
defer legacyDB.Close()
v2DB, err = sql.Open("mysql", "...")
if err != nil {
panic(err)
}
defer v2DB.Close()
err = legacyDB.QueryRow("SELECT count(*) FROM users").Scan(&total)
checkErr(err)
loops = int(math.Ceil(float64(total) / float64(batchsize)))
fmt.Println("Total: " + strconv.Itoa(total))
fmt.Println("Loops: " + strconv.Itoa(loops))
for i := 0; i < loops; i++ {
offset := i * batchsize
go processBatch(offset)
}
legacyDB.Close()
v2DB.Close()
}
func processBatch(offset int) {
query := namedParameterQuery.NewNamedParameterQuery(`
SELECT ...
LIMIT :offset,:batchsize
`)
query.SetValue(...)
rows, err := legacyDB.Query(query.GetParsedQuery(), (query.GetParsedParameters())...)
// nothing after this line gets done (Println here does not show output)
checkErr(err)
defer rows.Close()
....
var m runtime.MemStats
runtime.ReadMemStats(&m)
log.Printf("\nAlloc = %v\nTotalAlloc = %v\nSys = %v\nNumGC = %v\n\n", m.Alloc/1024/1024, m.TotalAlloc/1024/1024, m.Sys/1024/1024, m.NumGC)
}
func checkErr(err error) {
if err != nil {
panic(err)
}
}
As Nadh mentioned in a comment, this happens because the program exits as soon as the main function returns, regardless of whether other goroutines are still running. To fix this, a *sync.WaitGroup will suffice. A WaitGroup is meant for cases where you have multiple concurrent operations and want to wait until they have all completed. Documentation can be found here: https://golang.org/pkg/sync/#WaitGroup.
An example implementation for your program without the use of global variables would look like replacing
fmt.Println("Total: " + strconv.Itoa(total))
fmt.Println("Loops: " + strconv.Itoa(loops))
for i := 0; i < loops; i++ {
offset := i * batchsize
go processBatch(offset)
}
with
fmt.Println("Total: " + strconv.Itoa(total))
fmt.Println("Loops: " + strconv.Itoa(loops))
wg := new(sync.WaitGroup)
wg.Add(loops)
for i := 0; i < loops; i++ {
offset := i * batchsize
go func(offset int) {
defer wg.Done()
processBatch(offset)
}(offset)
}
wg.Wait()
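Starting one goroutine per batch means every batch may ask for a database connection at the same time. The following is a minimal sketch, not part of the original answer, of the same WaitGroup pattern with a buffered channel used as a semaphore so that only a fixed number of batches run at once. The names runBounded, demo and limit are hypothetical, and the process callback stands in for processBatch.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runBounded calls process once per batch offset, with at most limit calls
// running at the same time. It returns the highest concurrency it observed.
func runBounded(loops, batchsize, limit int, process func(offset int)) int {
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		inFlight int
		peak     int
	)
	sem := make(chan struct{}, limit) // buffered channel used as a semaphore
	for i := 0; i < loops; i++ {
		offset := i * batchsize
		wg.Add(1)
		sem <- struct{}{} // blocks while limit goroutines are already running
		go func(offset int) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot for the next batch

			mu.Lock()
			inFlight++
			if inFlight > peak {
				peak = inFlight
			}
			mu.Unlock()

			process(offset) // stand-in for processBatch(offset)

			mu.Lock()
			inFlight--
			mu.Unlock()
		}(offset)
	}
	wg.Wait() // main must not return before every batch is done
	return peak
}

// demo counts how many batches ran; it exists only to exercise runBounded.
func demo(loops, batchsize, limit int) (processed int64, peak int) {
	peak = runBounded(loops, batchsize, limit, func(offset int) {
		atomic.AddInt64(&processed, 1)
	})
	return processed, peak
}

func main() {
	processed, peak := demo(20, 100, 4)
	fmt.Println("batches processed:", processed, "peak concurrency:", peak)
}
```

Whether a limit is needed at all, and what value suits it, depends on the connection limits of the two databases, so treat the number 4 above as a placeholder.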
Related
I'm trying to bulk insert many records using Gorm, Golang and MySQL. My code looks like this:
package main
import (
"fmt"
"sync"
"gorm.io/driver/mysql"
"gorm.io/gorm"
)
type Article struct {
gorm.Model
Code string `gorm:"size:255;uniqueIndex"`
}
func main() {
db, err := gorm.Open(mysql.Open("root@tcp(127.0.0.1:3306)/q_test"), nil)
if err != nil {
panic(err)
}
db.AutoMigrate(&Article{})
// err = db.Exec("TRUNCATE articles").Error
err = db.Exec("DELETE FROM articles").Error
if err != nil {
panic(err)
}
// Build some articles
n := 10000
var articles []Article
for i := 0; i < n; i++ {
article := Article{Code: fmt.Sprintf("code_%d", i)}
articles = append(articles, article)
}
// // Save articles
// err = db.Create(&articles).Error
// if err != nil {
// panic(err)
// }
// Save articles with goroutines
chunkSize := 100
var wg sync.WaitGroup
wg.Add(n / chunkSize)
for i := 0; i < n; i += chunkSize {
go func(i int) {
defer wg.Done()
chunk := articles[i:(i + chunkSize)]
err := db.Create(&chunk).Error
if err != nil {
panic(err)
}
}(i)
}
wg.Wait()
}
When I run this code sometimes (about one in three times) I get this error:
panic: Error 1213: Deadlock found when trying to get lock; try restarting transaction
If I run the code without goroutines (commented lines), I get no deadlock. Also, I've noticed that if I remove the unique index on the code field the deadlock doesn't happen anymore. And if I replace the DELETE FROM articles statement with TRUNCATE articles the deadlock doesn't seem to happen anymore.
I've also run the same code with Postgresql and it works, with no deadlocks.
Any idea why the deadlock happens only with the unique index on MySQL and how to avoid it?
A DELETE statement is executed using row locks: each row in the table is locked for deletion.
TRUNCATE TABLE always locks the table and page, but not each row.
source : https://stackoverflow.com/a/20559931/18012302
I think MySQL needs time to finish the DELETE query.
Try adding a time.Sleep after the DELETE query:
err = db.Exec("DELETE FROM articles").Error
if err != nil {
panic(err)
}
time.Sleep(time.Second)
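The error message itself says "try restarting transaction", so another option is to retry the failed insert. Below is a minimal sketch under stated assumptions: withRetry, simulate and errDeadlock are hypothetical names, and errDeadlock only imitates the driver's error so the example runs without a database. In the real program the isRetryable callback would have to inspect the error returned by the driver, for example by checking for MySQL error number 1213.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errDeadlock imitates MySQL error 1213 so the sketch runs without a server.
var errDeadlock = errors.New("Error 1213: Deadlock found when trying to get lock; try restarting transaction")

// withRetry runs op up to attempts times. It retries only when isRetryable
// reports true for the returned error, and waits a little longer each time.
func withRetry(attempts int, backoff time.Duration, isRetryable func(error) bool, op func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		err = op()
		if err == nil || !isRetryable(err) {
			return err
		}
		if i < attempts-1 {
			time.Sleep(backoff * time.Duration(i+1))
		}
	}
	return err
}

// simulate runs an operation that deadlocks the first `failures` times and
// reports how often it was called and the final error.
func simulate(failures int) (calls int, err error) {
	isDeadlock := func(e error) bool { return errors.Is(e, errDeadlock) }
	err = withRetry(5, time.Millisecond, isDeadlock, func() error {
		calls++
		if calls <= failures {
			return errDeadlock
		}
		return nil
	})
	return calls, err
}

func main() {
	calls, err := simulate(2)
	fmt.Println("calls:", calls, "err:", err)
}
```

In the Gorm program, the function passed as op would wrap the db.Create(&chunk) call for one chunk.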
I'm doing something like this:
import(
"database/sql"
_ "github.com/go-sql-driver/mysql"
)
var db *sql.DB
func main() {
var err error
db, err = sql.Open(...)
if err != nil {
panic(err)
}
for j := 0; j < 8000; j++ {
_, err := db.Query("QUERY...")
if err != nil {
logger.Println("Error " + err.Error())
return
}
}
}
It works for the first 150 queries (which I run from another function), but after that I get the error:
mysqli_real_connect(): (HY000/1040): Too many connections
So clearly I'm doing something wrong, but I can't find what it is. I don't know whether I should open and close a new connection for each query.
Error in the log file :
"reg: 2020/06/28 03:35:34 Errores Error 1040: Too many connections"
(it is printed only once)
Error in mysql php my admin:
"mysqli_real_connect(): (HY000/1040): Too many connections"
"La conexión para controluser, como está definida en su configuración, fracasó."
(translated: "The connection for controluser, as defined in your configuration, failed.")
"mysqli_real_connect(): (08004/1040): Too many connections"
Every time you call Query(), you get back a *sql.Rows that holds on to a database connection until it is closed. Since you're not calling Close, each of those connections stays checked out until the program exits, so every new query has to open yet another one.
Solve your problem by calling rows.Close() after you're done with each query:
for j := 0; j < 8000; j++ {
rows, err := db.Query("QUERY...")
if err != nil {
logger.Println("Error " + err.Error())
return
}
// Your main logic here
rows.Close()
}
This Close() call is often placed in a defer statement, but this precludes the use of a for loop (since a defer only executes when the function returns), so you may want to move your main logic to a new function:
for j := 0; j < 8000; j++ {
doStuff()
}
// later
func doStuff() {
rows, err := db.Query("QUERY...")
if err != nil {
logger.Println("Error " + err.Error())
return
}
defer rows.Close()
// Your main logic here
}
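If a separate named function feels heavy, an anonymous function inside the loop gives each iteration its own defer scope. The sketch below is an illustration only: the resource type is a hypothetical stand-in for *sql.Rows so that it runs without a database, and it records the order of events to show that each Close runs before the next iteration starts.

```go
package main

import "fmt"

// resource is a hypothetical stand-in for *sql.Rows.
type resource struct {
	id  int
	log *[]string
}

// Close records that the resource was released.
func (r *resource) Close() error {
	*r.log = append(*r.log, fmt.Sprintf("close %d", r.id))
	return nil
}

// processAll opens, uses and closes one resource per iteration and returns
// the recorded order of events.
func processAll(n int) []string {
	var events []string
	for j := 0; j < n; j++ {
		func() {
			r := &resource{id: j, log: &events}
			defer r.Close() // runs when this anonymous function returns
			events = append(events, fmt.Sprintf("use %d", j))
		}()
	}
	return events
}

func main() {
	for _, e := range processAll(2) {
		fmt.Println(e)
	}
	// Output:
	// use 0
	// close 0
	// use 1
	// close 1
}
```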
I have data on my local computer in 2 MySQL databases (dbConnOuter and dbConnInner) that I want to process and collate into a 3rd database (dbConnTarget).
The code runs for about 17000 cycles, then stops with these error messages:
[mysql] 2018/08/06 18:20:57 packets.go:72: unexpected EOF
[mysql] 2018/08/06 18:20:57 packets.go:405: busy buffer
As far as I can tell I'm properly closing the connections where I'm reading from and I'm using Exec for writing, that I believe handles its own resources. I've also tried prepared statements, but it didn't help, the result was the same.
Below is the relevant part of my code, and it does similar database operations prior to this without any issues.
As this is one of my first experiments with Go, I can't yet see where I might be wasting my resources.
import (
"database/sql"
_ "github.com/go-sql-driver/mysql"
)
var dbConnOuter *sql.DB
var dbConnInner *sql.DB
var dbConnTarget *sql.DB
func main() {
dbConnOuter = connectToDb(dbUserDataOne)
dbConnInner = connectToDb(dbUserDataTwo)
dbConnTarget = connectToDb(dbUserDataThree)
// execute various db processing functions
doStuff()
}
func connectToDb(dbUser dbUser) *sql.DB {
dbConn, err := sql.Open("mysql", fmt.Sprintf("%v:%v@tcp(127.0.0.1:3306)/%v", dbUser.username, dbUser.password, dbUser.dbname))
if err != nil {
panic(err)
}
dbConn.SetMaxOpenConns(500)
return dbConn
}
// omitted similar db processing functions that work just fine
func doStuff() {
outerRes, err := dbConnOuter.Query("SELECT some outer data")
if err != nil {
panic(err)
}
defer outerRes.Close()
for outerRes.Next() {
outerRes.Scan(&data1)
innerRes, err := dbConnInner.Query("SELECT some inner data using", data1)
if err != nil {
panic(err)
}
innerRes.Scan(&data2, &data3)
innerRes.Close()
dbConnTarget.Exec("REPLACE INTO whatever", data1, data2, data3)
}
}
I'm new to Go (Golang). I wrote a simple benchmark program to test concurrent processing with MySQL. I keep getting "dial tcp 52.55.254.165:3306: getsockopt: connection refused" and "unexpected EOF" errors when I increase the number of concurrent channels.
Each goroutine does a batch insert of 1 to n rows into a simple customer table. The program lets me set the insert size (number of rows in a single statement) and the number of parallel goroutines (each goroutine performs one such insert). The program works fine with small numbers (rows < 100 and goroutines < 100), but I start getting unexpected EOF errors when the numbers increase, especially the number of parallel goroutines.
I searched for clues. Based on them, I've set the database's 'max_allowed_packet' and 'max_connections'. In the Go program I've also set db.SetMaxOpenConns(200), db.SetConnMaxLifetime(200), and db.SetMaxIdleConns(10). I've experimented with big and small numbers (from 10 to 2000). Nothing seems to solve the problem.
I have one global db connection open. Code snippet below:
// main package
func main() {
var err error
db, err = sql.Open("mysql","usr:pwd@tcp(ip:3306)/gopoc")
if err != nil {
log.Panic(err)
}
db.SetMaxOpenConns(1000)
db.SetConnMaxLifetime(1000)
db.SetMaxIdleConns(10)
// sql.DB should be long lived "defer" closes it once this function ends
defer db.Close()
if err = db.Ping(); err != nil {
log.Panic(err)
}
http.HandleFunc("/addCust/", HFHandleFunc(addCustHandler))
http.ListenAndServe(":8080", nil)
}
// add customer handler
func addCustHandler(w http.ResponseWriter, r *http.Request) {
// expected URL: /addCust/?num=3&pcnt=1
num, _ := strconv.Atoi(r.URL.Query().Get("num"))
pcnt, _ := strconv.Atoi(r.URL.Query().Get("pcnt"))
ch := make([]chan string, pcnt) // initialize channel slice
for i := range ch {
ch[i] = make(chan string, 1)
}
var wg sync.WaitGroup
for i, chans := range ch {
wg.Add(1)
go func(cha chan string, ii int) {
defer wg.Done()
addCust(num)
cha <- "Channel[" + strconv.Itoa(ii) + "]\n"
}(chans, i)
}
wg.Wait()
var outputstring string
for i := 0; i < pcnt; i++ {
outputstring = outputstring + <-ch[i]
}
fmt.Fprintf(w, "Output:\n%s", outputstring)
}
func addCust(cnt int) sql.Result {
...
sqlStr := "INSERT INTO CUST (idCUST, idFI, fName, state, country) VALUES "
for i := 0; i < cnt; i++ {
sqlStr += "(" + strconv.Itoa(FiIDpadding+r.Intn(CidMax)+1) + ", " + strconv.Itoa(FiID) +", 'fname', 'PA', 'USA'), "
}
//trim the last ,
sqlStr = sqlStr[0:len(sqlStr)-2] + " on duplicate key update lname='dup';"
res, err := db.Exec(sqlStr)
if err != nil {
panic("\nInsert Statement error\n" + err.Error())
}
return res
}
I suppose you are calling sql.Open in each of your goroutines?
The Open function should be called just once. You should share the opened DB between your goroutines. The DB returned by Open is safe for concurrent use and maintains its own connection pool.
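One way to make sure Open runs only once, even when many goroutines ask for the handle, is to guard it with sync.Once. The following is a minimal sketch rather than a definitive setup: sharedDB, allShareOneHandle and the "stub" driver are hypothetical and exist only so the example runs without a MySQL server. A real program would import the MySQL driver and pass its DSN to sql.Open instead.

```go
package main

import (
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
	"sync"
)

// stubDriver is a placeholder driver so the sketch runs without a server.
// It never hands out a working connection.
type stubDriver struct{}

func (stubDriver) Open(name string) (driver.Conn, error) {
	return nil, errors.New("stub driver: no server available")
}

func init() { sql.Register("stub", stubDriver{}) }

var (
	dbOnce    sync.Once
	db        *sql.DB
	dbErr     error
	openCalls int // how many times sql.Open was actually called
)

// sharedDB opens the pool on first use and returns the same *sql.DB afterwards.
func sharedDB() (*sql.DB, error) {
	dbOnce.Do(func() {
		openCalls++
		db, dbErr = sql.Open("stub", "")
	})
	return db, dbErr
}

// allShareOneHandle asks for the handle from n goroutines and reports
// whether every goroutine received the same *sql.DB.
func allShareOneHandle(n int) bool {
	handles := make([]*sql.DB, n)
	var wg sync.WaitGroup
	for i := range handles {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			handles[i], _ = sharedDB()
		}(i)
	}
	wg.Wait()
	for _, h := range handles {
		if h == nil || h != handles[0] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(allShareOneHandle(50), openCalls) // prints "true 1"
}
```

Opening the handle once in main and passing it to the goroutines works just as well; sync.Once is only useful when there is no single obvious place to do the setup.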
So I'm trying to use the OpenID package for Golang, located here: https://github.com/yohcop/openid-go
In the _example it says that it uses in memory storage for storing the nonce/discoverycache information and that it will not free the memory and that I should implement my own version of them using some sort of database.
My database of choice is MySQL, I have tried to implement what I thought was correct (but is not, does not give me any compile errors, but crashes on runtime)
My DiscoveryCache.go is as such:
package openid
import (
"database/sql"
"log"
//"time"
_ "github.com/go-sql-driver/mysql"
"github.com/yohcop/openid-go"
)
type SimpleDiscoveredInfo struct {
opEndpoint, opLocalID, claimedID string
}
func (s *SimpleDiscoveredInfo) OpEndpoint() string { return s.opEndpoint }
func (s *SimpleDiscoveredInfo) OpLocalID() string { return s.opLocalID }
func (s *SimpleDiscoveredInfo) ClaimedID() string { return s.claimedID }
type SimpleDiscoveryCache struct{}
func (s SimpleDiscoveryCache) Put(id string, info openid.DiscoveredInfo) {
/*
db, err := sql.Query("mysql", "db:connectinfo")
errCheck(err)
rows, err := db.Query("SELECT opendpoint, oplocalid, claimedid FROM discovery_cache")
errCheck(err)
was unsure what to do here because I'm not sure how to
return the info properly
*/
log.Println(info)
}
func (s SimpleDiscoveryCache) Get(id string) openid.DiscoveredInfo {
db, err := sql.Query("mysql", "db:connectinfo")
errCheck(err)
var sdi = new(SimpleDiscoveredInfo)
err = db.QueryRow("SELECT opendpoint, oplocalid, claimedid FROM discovery_cache WHERE id=?", id).Scan(&sdi)
errCheck(err)
return sdi
}
And my Noncestore.go
package openid
import (
"database/sql"
"errors"
"flag"
"fmt"
"time"
_ "github.com/go-sql-driver/mysql"
)
var maxNonceAge = flag.Duration("openid-max-nonce-age",
60*time.Second,
"Maximum accepted age for openid nonces. The bigger, the more"+
"memory is needed to store used nonces.")
type SimpleNonceStore struct{}
func (s *SimpleNonceStore) Accept(endpoint, nonce string) error {
db, err := sql.Open("mysql", "dbconnectinfo")
errCheck(err)
if len(nonce) < 20 || len(nonce) > 256 {
return errors.New("Invalid nonce")
}
ts, err := time.Parse(time.RFC3339, nonce[0:20])
errCheck(err)
rows, err := db.Query("SELECT * FROM noncestore")
defer rows.Close()
now := time.Now()
diff := now.Sub(ts)
if diff > *maxNonceAge {
return fmt.Errorf("Nonce too old: %ds", diff.Seconds())
}
d := nonce[20:]
for rows.Next() {
var timeDB, nonce string
err := rows.Scan(&nonce, &timeDB)
errCheck(err)
dbTime, err := time.Parse(time.RFC3339, timeDB)
errCheck(err)
if dbTime == ts && nonce == d {
return errors.New("Nonce is already used")
}
if now.Sub(dbTime) < *maxNonceAge {
_, err := db.Query("INSERT INTO noncestore SET nonce=?, time=?", &nonce, dbTime)
errCheck(err)
}
}
return nil
}
func errCheck(err error) {
if err != nil {
panic("We had an error!" + err.Error())
}
}
Then I try to use them in my main file as:
import _"github.com/mysqlOpenID"
var nonceStore = &openid.SimpleNonceStore{}
var discoveryCache = &openid.SimpleDiscoveryCache{}
I get no compile errors but it crashes
I'm sure you'll look at my code and go what the hell (I'm fairly new and only have a week or so experience with Golang so please feel free to correct anything)
Obviously I have done something wrong, I basically looked at the NonceStore.go and DiscoveryCache.go on the github for OpenId, replicated it, but replaced the map with database insert and select functions
If anybody can point me in the right direction on how to implement this properly, that would be much appreciated, thanks! If you need any more information, please ask.
Ok. First off, I don't believe you that the code compiles.
Let's look at some mistakes, shall we?
db, err := sql.Open("mysql", "dbconnectinfo")
This line opens a database handle (sql.Open sets up a connection pool rather than a single connection). It should only be called once, preferably inside an init() function. For example,
var db *sql.DB
func init() {
var err error
// Now the db variable above is automagically set to the left value (db)
// of sql.Open and the "var err error" above is the right value (err)
db, err = sql.Open("mysql", "root@tcp(127.0.0.1:3306)/")
if err != nil {
panic(err)
}
}
Bang. Now you're connected to your MySQL database.
Now what?
Well this (from Get) is gross:
db, err := sql.Query("mysql", "db:connectinfo")
errCheck(err)
var sdi = new(SimpleDiscoveredInfo)
err = db.QueryRow("SELECT opendpoint, oplocalid, claimedid FROM discovery_cache WHERE id=?", id).Scan(&sdi)
errCheck(err)
Instead, it should be this:
// No need for a pointer...
var sdi SimpleDiscoveredInfo
// Because we take the addresses of sdi's fields right here (inside Scan),
// one destination per selected column. Scan cannot fill a struct directly,
// and a pointer to a pointer is a useless (and potentially problematic)
// layer of indirection.
// Notice how I dropped the other "db, err := sql.Query" part? We don't
// need it because we've already declared "db" as you saw in the first
// part of my answer.
err := db.QueryRow("SELECT ...").Scan(&sdi.opEndpoint, &sdi.opLocalID, &sdi.claimedID)
if err != nil {
panic(err)
}
// Return the address of sdi, which means we're returning a pointer
// to wherever sdi is inside the heap.
return &sdi
Up next is this:
/*
db, err := sql.Query("mysql", "db:connectinfo")
errCheck(err)
rows, err := db.Query("SELECT opendpoint, oplocalid, claimedid FROM discovery_cache")
errCheck(err)
was unsure what to do here because I'm not sure how to
return the info properly
*/
If you've been paying attention, we can drop the first sql.Query line.
Great, now we just have:
rows, err := db.Query("SELECT ...")
So, why don't you do what you did inside the Accept method and parse the rows using for rows.Next()... ?