Golang Gorm scope broken after upgrade from v1 to v2 - mysql

I was using Gorm v1. I had this scope for pagination which was working correctly:
func Paginate(entity BaseEntity, p *Pagination) func(db *gorm.DB) *gorm.DB {
    return func(db *gorm.DB) *gorm.DB {
        var totalRows int64
        db.Model(entity).Count(&totalRows)
        totalPages := int(math.Ceil(float64(totalRows) / float64(p.PerPage)))
        p.TotalPages = totalPages
        p.TotalCount = int(totalRows)
        p.SetLinks(entity.ResourceName())
        return db.Offset(p.Offset).Limit(p.PerPage)
    }
}
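For context, the scope assumes a Pagination struct and a BaseEntity interface roughly like the following. This is a reconstruction from the fields and methods used above; the Links field and the body of SetLinks are illustrative guesses, not code from the original post:

// BaseEntity is implemented by every entity that can be paginated.
type BaseEntity interface {
    ResourceName() string
}

// Pagination carries the paging parameters and the computed metadata.
type Pagination struct {
    Offset     int
    PerPage    int
    TotalPages int
    TotalCount int
    Links      map[string]string // illustrative; the original likely stores first/prev/next/last links
}

// SetLinks builds the pagination links for the given resource name.
func (p *Pagination) SetLinks(resource string) {
    // ... build links such as /api/v1/<resource>?page=N from the paging fields
}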
and the way I called it:
if err := gs.db.Scopes(entities.Paginate(genre, p)).Find(&gs.Genres).Error; err != nil {
    return errors.New(err.Error())
}
Again, this used to work correctly, until I upgraded to Gorm v2. Now I'm getting the following message:
[0.204ms] [rows:2] SELECT * FROM genres LIMIT 2
sql: expected 9 destination arguments in Scan, not 1; sql: expected 9 destination arguments in Scan, not 1[GIN] 2022/06/18 - 00:41:00 | 400 | 1.5205ms | 127.0.0.1 | GET "/api/v1/genres"
Error #01: sql: expected 9 destination arguments in Scan, not 1; sql: expected 9 destination arguments in Scan, not 1
Now, I found out that the error is due to this line:
db.Model(entity).Count(&totalRows)
because if I remove it, the query executes correctly (though the data for TotalPages is obviously not correct, since it was never calculated). Going through the documentation I saw https://gorm.io/docs/method_chaining.html#Multiple-Immediate-Methods, so my guess is that the connection used to get totalRows is being reused and has some residual state, which makes my offset and limit query fail.
I tried to create a new session for both the count and the offset queries:
db.Model(entity).Count(&totalRows).Session(&gorm.Session{})
return db.Offset(p.Offset).Limit(p.PerPage).Session(&gorm.Session{})
hoping that each one would use its own session, but that doesn't seem to work.
Any suggestions?

In case anyone needs it:
I did have to create a new session but I wasn't creating it the right way. I ended up doing:
countDBSession := db.Session(&gorm.Session{Initialized: true})
countDBSession.Model(entity).Count(&totalRows)
and that worked as expected. So my scope now is:
// Paginate is a Gorm scope function.
func Paginate(entity BaseEntity, p *Pagination) func(db *gorm.DB) *gorm.DB {
    return func(db *gorm.DB) *gorm.DB {
        var totalRows int64
        // we must create a new session to run the count, otherwise by using the same db connection
        // we'll get some residual data which will cause db.Offset(p.Offset).Limit(p.PerPage) to fail.
        countDBSession := db.Session(&gorm.Session{Initialized: true})
        countDBSession.Model(entity).Count(&totalRows)
        totalPages := int(math.Ceil(float64(totalRows) / float64(p.PerPage)))
        p.TotalPages = totalPages
        p.TotalCount = int(totalRows)
        p.SetLinks(entity.ResourceName())
        return db.Offset(p.Offset).Limit(p.PerPage)
    }
}
Notice that I'm running the count on a new session via countDBSession, which doesn't affect the later use of the *gorm.DB parameter in return db.Offset(p.Offset).Limit(p.PerPage).

Related

Safely perform DB migrations with Go

Let's say I have a web app that shows a list of posts. The post struct is:
type Post struct {
    Id    int64 `sql:",primary"`
    Title string
    Body  string
}
It retrieves the posts with:
var posts []*Post
rows, err := db.QueryContext(ctx, "select * from posts;")
if err != nil {
    return nil, oops.Wrapf(err, "could not get posts")
}
defer rows.Close()
for rows.Next() {
    p := &Post{}
    err := rows.Scan(
        &p.Id,
        &p.Title,
        &p.Body,
    )
    if err != nil {
        return nil, oops.Wrapf(err, "could not scan row")
    }
    posts = append(posts, p)
}
return posts, nil
All works fine. Now, I want to alter the table schema by adding a column:
ALTER TABLE posts ADD author varchar(62);
Suddenly, the requests to get posts result in:
sql: expected 4 destination arguments in Scan, not 3
which makes sense since the table now has 4 columns instead of the 3 stipulated by the retrieval logic.
I can then update the struct to be:
type Post struct {
    Id     int64 `sql:",primary"`
    Title  string
    Body   string
    Author string
}
and the retrieval logic to be:
for rows.Next() {
    p := &Post{}
    err := rows.Scan(
        &p.Id,
        &p.Title,
        &p.Body,
        &p.Author,
    )
    if err != nil {
        return nil, oops.Wrapf(err, "could not scan row")
    }
    posts = append(posts, p)
}
which solves this. However, this implies there is always a period of downtime between the migration and the logic update + deploy. How can I avoid that downtime?
I have tried swapping the order of the above changes but this does not work, with that same request resulting in:
sql: expected 3 destination arguments in Scan, not 4
(which makes sense, since the table only has 3 columns at that point as opposed to 4);
and other requests resulting in:
Error 1054: Unknown column 'author' in 'field list'
(which makes sense, because at that point the posts table does not have an author column just yet)
You should be able to achieve the desired behavior by adapting the SQL query to return exactly the fields you want to populate:
SELECT Id, Title, Body FROM posts;
This way, even if you add another column Author, the query results still contain only 3 values.
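A minimal sketch of what the retrieval logic looks like with that change, reusing the code from the question (column names are assumed to match the struct fields, as in the original query):

var posts []*Post
// Selecting the columns explicitly means a newly added column (e.g. author)
// cannot change the number of values returned per row, so Scan keeps working.
rows, err := db.QueryContext(ctx, "SELECT Id, Title, Body FROM posts;")
if err != nil {
    return nil, oops.Wrapf(err, "could not get posts")
}
defer rows.Close()
for rows.Next() {
    p := &Post{}
    if err := rows.Scan(&p.Id, &p.Title, &p.Body); err != nil {
        return nil, oops.Wrapf(err, "could not scan row")
    }
    posts = append(posts, p)
}
return posts, nil

The migration adding the column can then be applied first without breaking the running code; the updated code that also selects and scans author is deployed afterwards.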

Compare time in gorm: time.Time or string

We are using gorm v1.9.11 and MySQL.
// before is a string like "2019-01-15T06:31:14Z"
if len(before) > 0 {
    // TODO: Determine time format
    beforeDate, e := time.Parse(time.RFC3339, before)
    if e != nil {
        ...
    }
    // TODO: Pass in string?
    statement = statement.Where("created_at < ?", beforeDate)
}
I have two questions:
1. Are there any problems with the above piece of code? I found the query took about 5 minutes to finish when before is given. Without before, it took 14 seconds.
2. Is it OK to pass a string like "2019-01-15T06:31:14Z" as the argument in statement.Where("created_at < ?", beforeString)?
Thanks
UPDATE
We have an index on created_at, and the query plan shows that it will filter 8 million rows based on created_at. I guess this is the reason.
Now I am trying to keep this index from being used.

GORM 2.0 get last insert ID

I'm operating on a MySQL database using GORM v2.0. I'm inserting rows into the database using a GORM transaction (tx := db.Begin()). In previous GORM versions, Begin() returned an sql.Tx object, which allowed calling the LastInsertId() method on the query result.
To do that in GORM v2.0, can I simply call the db.Last() function after inserting the row into the database, or is there a smarter method?
Thank you.
In v2.0 the GetLastInsertId method was removed. As @rustyx says, the ID is populated in the model you pass to the Create function. I wouldn't bother calling db.Last(&...), as this is a bit of a waste when the model will already have it.
type User struct {
    gorm.Model
    Name string
}

user1 := User{Name: "User One"}

_ = db.Transaction(func(tx *gorm.DB) error {
    tx.Create(&user1)
    return nil
})

// This is unnecessary
// db.Last(&user1)

fmt.Printf("User one ID: %d\n", user1.ID)

Limit max prepared statement count

The problem
I wrote an application which synchronizes data from BigQuery into a MySQL database. I try to insert roughly 10-20k rows in batches (up to 10 items each batch) every 3 hours. For some reason I receive the following error when it tries to upsert these rows into MySQL:
Can't create more than max_prepared_stmt_count statements:
Error 1461: Can't create more than max_prepared_stmt_count statements
(current value: 2000)
My "relevant code"
// ProcessProjectSkuCost receives the given sku cost entries and sends them in batches to upsertProjectSkuCosts()
func ProcessProjectSkuCost(done <-chan bigquery.SkuCost) {
    var skuCosts []bigquery.SkuCost
    var rowsAffected int64
    for skuCostRow := range done {
        skuCosts = append(skuCosts, skuCostRow)
        if len(skuCosts) == 10 {
            rowsAffected += upsertProjectSkuCosts(skuCosts)
            skuCosts = []bigquery.SkuCost{}
        }
    }
    if len(skuCosts) > 0 {
        rowsAffected += upsertProjectSkuCosts(skuCosts)
    }
    log.Infof("Completed upserting project sku costs. Affected rows: '%d'", rowsAffected)
}

// upsertProjectSkuCosts inserts or updates ProjectSkuCosts into SQL in batches
func upsertProjectSkuCosts(skuCosts []bigquery.SkuCost) int64 {
    // properties are table fields
    tableFields := []string{"project_name", "sku_id", "sku_description", "usage_start_time", "usage_end_time",
        "cost", "currency", "usage_amount", "usage_unit", "usage_amount_in_pricing_units", "usage_pricing_unit",
        "invoice_month"}
    tableFieldString := fmt.Sprintf("(%s)", strings.Join(tableFields, ","))

    // placeholder string for all values to be inserted
    placeholderString := createPlaceholderString(tableFields)
    valuePlaceholderString := ""
    values := []interface{}{}
    for _, row := range skuCosts {
        valuePlaceholderString += fmt.Sprintf("(%s),", placeholderString)
        values = append(values, row.ProjectName, row.SkuID, row.SkuDescription, row.UsageStartTime,
            row.UsageEndTime, row.Cost, row.Currency, row.UsageAmount, row.UsageUnit,
            row.UsageAmountInPricingUnits, row.UsagePricingUnit, row.InvoiceMonth)
    }
    valuePlaceholderString = strings.TrimSuffix(valuePlaceholderString, ",")

    // put together SQL string
    sqlString := fmt.Sprintf(`INSERT INTO
        project_sku_cost %s VALUES %s ON DUPLICATE KEY UPDATE invoice_month=invoice_month`, tableFieldString, valuePlaceholderString)
    sqlString = strings.TrimSpace(sqlString)

    stmt, err := db.Prepare(sqlString)
    if err != nil {
        log.Warn("Error while preparing SQL statement to upsert project sku costs. ", err)
        return 0
    }

    // execute query
    res, err := stmt.Exec(values...)
    if err != nil {
        log.Warn("Error while executing statement to upsert project sku costs. ", err)
        return 0
    }

    rowsAffected, err := res.RowsAffected()
    if err != nil {
        log.Warn("Error while trying to access affected rows ", err)
        return 0
    }
    return rowsAffected
}

// createPlaceholderString creates a string which will be used for the prepared statement (output looks like "(?,?,?)")
func createPlaceholderString(tableFields []string) string {
    placeHolderString := ""
    for range tableFields {
        placeHolderString += "?,"
    }
    placeHolderString = strings.TrimSuffix(placeHolderString, ",")
    return placeHolderString
}
My question:
Why do I hit the max_prepared_stmt_count when I immediately execute the prepared statement (see function upsertProjectSkuCosts)?
I can only imagine it's some sort of concurrency that creates tons of prepared statements between preparing and executing all these statements. On the other hand, I don't understand why there would be so much concurrency, since the channel in ProcessProjectSkuCost is a buffered channel with a size of 20.
You need to close the statement inside upsertProjectSkuCosts() (or re-use it - see the end of this post).
When you call db.Prepare(), a connection is taken from the internal connection pool (or a new connection is created, if there aren't any free connections). The statement is then prepared on that connection (if that connection isn't free when stmt.Exec() is called, the statement is then also prepared on another connection).
So this creates a statement inside your database for that connection. This statement will not magically disappear - having multiple prepared statements on a connection is perfectly valid. Go could notice that stmt goes out of scope and requires some sort of cleanup, and then do that cleanup, but it doesn't (just like it doesn't close files for you and things like that). So you'll need to do that yourself using stmt.Close(). When you call stmt.Close(), the driver will send a command to the database server, telling it the statement is no longer needed.
The easiest way to do this is by adding defer stmt.Close() after the err check following db.Prepare().
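Applied to upsertProjectSkuCosts above, the change is a single line (only the relevant part of the function shown):

stmt, err := db.Prepare(sqlString)
if err != nil {
    log.Warn("Error while preparing SQL statement to upsert project sku costs. ", err)
    return 0
}
// Release the server-side prepared statement when this function returns,
// so repeated calls no longer accumulate statements on the MySQL server.
defer stmt.Close()

// execute query
res, err := stmt.Exec(values...)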
What you can also do is prepare the statement once and make it available to upsertProjectSkuCosts (either by passing the stmt into upsertProjectSkuCosts or by making upsertProjectSkuCosts a method on a struct, so the struct can have a field for the stmt). If you do this, you should not call stmt.Close() - because you aren't creating new statements anymore, you are re-using an existing statement.
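A rough sketch of that second option, assuming database/sql and illustrative names (not code from the original application). Note that with batched inserts the placeholder count must match the batch size, so a single re-used statement only works when every batch has the same number of rows:

// skuCostUpserter holds a statement that is prepared once and re-used for every batch.
type skuCostUpserter struct {
    stmt *sql.Stmt
}

func newSkuCostUpserter(db *sql.DB, sqlString string) (*skuCostUpserter, error) {
    stmt, err := db.Prepare(sqlString)
    if err != nil {
        return nil, err
    }
    return &skuCostUpserter{stmt: stmt}, nil
}

// upsert executes the re-used statement with the values of one batch.
func (u *skuCostUpserter) upsert(values []interface{}) (int64, error) {
    res, err := u.stmt.Exec(values...)
    if err != nil {
        return 0, err
    }
    return res.RowsAffected()
}

// Close releases the prepared statement, e.g. at application shutdown.
func (u *skuCostUpserter) Close() error {
    return u.stmt.Close()
}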
Also see Should we also close DB's .Prepare() in Golang? and https://groups.google.com/forum/#!topic/golang-nuts/ISh22XXze-s

Retrieve relation one to many into JSON sql pure, Golang, Performance

Suppose I have the following structs, which map to tables:
type Publisher struct {
    ID    int    `db:"id"`
    Name  string `db:"name"`
    Books []*Book
}

type Book struct {
    ID          int    `db:"id"`
    Name        string `db:"name"`
    PublisherID int    `db:"publisher_id"`
}
So, if I want to retrieve all the Publishers with all their related Books, I would like to get JSON like this:
[
    // Publisher 1
    {
        "id": "10001",
        "name": "Publisher1",
        "books": [
            { "id": 321, "name": "Book1" },
            { "id": 333, "name": "Book2" }
        ]
    },
    // Publisher 2
    {
        "id": "10002",
        "name": "Slytherin Publisher",
        "books": [
            { "id": 4021, "name": "Harry Potter and the Chamber of Secrets" },
            { "id": 433, "name": "Harry Potter and the Order of the Phoenix" }
        ]
    }
]
So I have the following struct that I use for all queries related to Publisher:
type PublisherRepository struct {
    Connection *sql.DB
}

// GetBooks returns all the books related to a publisher.
func (r *PublisherRepository) GetBooks(idPublisher int) []*Book {
    bs := make([]*Book, 0)
    sql := "SELECT * FROM books b WHERE b.publisher_id = $1"
    rows, err := r.Connection.Query(sql, idPublisher)
    if err != nil {
        // log
    }
    defer rows.Close()
    for rows.Next() {
        b := &Book{}
        rows.Scan(&b.ID, &b.Name, &b.PublisherID)
        bs = append(bs, b)
    }
    return bs
}
func (r *PublisherRepository) GetAllPublishers() []*Publisher {
    sql := "SELECT * FROM publishers"
    ps := make([]*Publisher, 0)
    rows, err := r.Connection.Query(sql)
    if err != nil {
        // log
    }
    defer rows.Close()
    for rows.Next() {
        p := &Publisher{}
        rows.Scan(&p.ID, &p.Name)
        // Is this the best way?
        books := r.GetBooks(p.ID)
        p.Books = books
        ps = append(ps, p)
    }
    return ps
}
So, here are my questions:
1) What is the best approach to retrieve all the publishers with the best performance? A loop inside a loop is not the best solution; what if I have 200 publishers and each publisher has on average 100 books?
2) Is PublisherRepository idiomatic Go, or is there another way to manage the queries of an entity with pure SQL?
1) The bad thing about this is the SQL request per iteration. So here is a solution that does not make an extra request per Publisher:
func (r *PublisherRepository) GetAllPublishers() []*Publisher {
    sql := "SELECT * FROM publishers"
    ps := make(map[int]*Publisher)
    rows, err := connection.Query(sql)
    if err != nil {
        // log
    }
    for rows.Next() {
        p := &Publisher{}
        rows.Scan(&p.ID, &p.Name)
        ps[p.ID] = p
    }

    sql = "SELECT * FROM books"
    rows, err = connection.Query(sql)
    if err != nil {
        // log
    }
    for rows.Next() {
        b := &Book{}
        rows.Scan(&b.ID, &b.Name, &b.PublisherID)
        ps[b.PublisherID].Books = append(ps[b.PublisherID].Books, b)
    }

    // you might choose to keep the map as a return value, but otherwise:
    // preallocate memory for the slice
    publishers := make([]*Publisher, 0, len(ps))
    for _, p := range ps {
        publishers = append(publishers, p)
    }
    return publishers
}
2) Unless you create the PublisherRepository only once, this might be a bad idea, since you would be creating and closing loads of connections. Depending on your SQL client implementation, I would suggest (and have seen this for many other Go database clients) having one connection for the entire server. Pooling is done internally by many of the SQL clients, which is why you should check your SQL client.
If your SQL client library does pooling internally, use a global variable for the "connection" (it's not really one connection if pooling is done internally):
var connection *sql.DB

func New() *PublisherRepository {
    repo := &PublisherRepository{}
    return repo.connect()
}

type PublisherRepository struct {
}

func (r *PublisherRepository) connect() *PublisherRepository {
    // open a new connection if connection is nil
    // or not open (if there is such a state)
    // you can also check "once.Do" if that suits your needs better
    if connection == nil {
        // ...
    }
    return r
}
So each time you create a new PublisherRepository, it will only check whether the connection already exists. If you use once.Do, Go will create the "connection" only once and you are done with it.
If you have other structs that will use the connection as well, you need a global place for your connection variable, or (even better) you write a little wrapper package for your SQL client, which is in turn used in all your structs.
In your case, the simplest way would be to use json_agg in the query, like here: http://sqlfiddle.com/#!15/97c41/4 (sqlfiddle is slow, so here is a screenshot: http://i.imgur.com/hxMPkUa.png). Not very Go-friendly (you need to unmarshal the query result data if you want to do something with the books), but you get all books in one query, as you wanted, without for loops.
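A minimal sketch of that idea, assuming PostgreSQL (the $1 placeholders in the question suggest it) and the structs from above; the json_agg/json_build_object query shape is an assumption, not code from the linked fiddle:

import (
    "database/sql"
    "encoding/json"
)

// getPublishersWithBooks loads every publisher and its books in one round trip
// by letting the database aggregate each publisher's books into a JSON array.
func getPublishersWithBooks(db *sql.DB) ([]*Publisher, error) {
    rows, err := db.Query(`
        SELECT p.id, p.name,
               COALESCE(
                   json_agg(json_build_object('id', b.id, 'name', b.name))
                       FILTER (WHERE b.id IS NOT NULL),
                   '[]'::json
               ) AS books
        FROM publishers p
        LEFT JOIN books b ON b.publisher_id = p.id
        GROUP BY p.id, p.name`)
    if err != nil {
        return nil, err
    }
    defer rows.Close()

    var publishers []*Publisher
    for rows.Next() {
        p := &Publisher{}
        var booksJSON []byte
        if err := rows.Scan(&p.ID, &p.Name, &booksJSON); err != nil {
            return nil, err
        }
        // Unmarshal the aggregated JSON array into the Books slice.
        if err := json.Unmarshal(booksJSON, &p.Books); err != nil {
            return nil, err
        }
        publishers = append(publishers, p)
    }
    return publishers, rows.Err()
}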
As @TehSphinX said, it is better to have a single global db connection.
But before implementing strange queries, I really suggest you think about why you need to return the full list of publishers and their books in one API query. I can't imagine a situation in a web or mobile app where this would be a good decision. Usually you just show users a list of publishers, then a user chooses one and you show them the list of books by that publisher. This is a "win-win" situation for you and your users: you can make simple queries, and your users get the small sets of data they actually need without paying for unnecessary traffic or wasting browser memory. As you said, there can be 200 publishers with 100 books each, and I'm sure your users don't need 20,000 books loaded in one request. Unless, of course, you are trying to make your API more data-theft friendly.
Even if you have something like a short preview-like list of books for each publisher, you should think about pagination for publishers and/or denormalisation of the books data for this case (add a column to the publishers table with the short list of books in JSON format).
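For the pagination idea, a minimal sketch reusing the PublisherRepository struct from the question (the page/perPage parameters and the ORDER BY column are illustrative assumptions):

// GetPublishersPage returns one page of publishers; books can then be loaded
// per publisher, or via the two-query approach above restricted to the page's IDs.
func (r *PublisherRepository) GetPublishersPage(page, perPage int) ([]*Publisher, error) {
    rows, err := r.Connection.Query(
        "SELECT id, name FROM publishers ORDER BY id LIMIT $1 OFFSET $2",
        perPage, (page-1)*perPage)
    if err != nil {
        return nil, err
    }
    defer rows.Close()

    publishers := make([]*Publisher, 0, perPage)
    for rows.Next() {
        p := &Publisher{}
        if err := rows.Scan(&p.ID, &p.Name); err != nil {
            return nil, err
        }
        publishers = append(publishers, p)
    }
    return publishers, rows.Err()
}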