MySQL sharding in Golang

I want to scale out my MySQL database across a few servers using horizontal sharding. Let's imagine that I have 5 database servers (not replicas) and I want to distribute the data from the users table across these 5 servers:
Shard 1 (192.168.1.1)
Shard 2 (192.168.1.2)
Shard 3 (192.168.1.3)
Shard 4 (192.168.1.4)
Shard 5 (192.168.1.5)
Now I want to connect to one of them depending on user_id (server_id = user_id % 5). I will do this on each API request from users in my Go application.
I'm using go-sql-driver and standard database/sql package.
import "database/sql"
import _ "github.com/go-sql-driver/mysql"
shard1, _ := sql.Open("mysql", "user:password@tcp(192.168.1.1:3306)/dbname")
shard2, _ := sql.Open("mysql", "user:password@tcp(192.168.1.2:3306)/dbname")
shard3, _ := sql.Open("mysql", "user:password@tcp(192.168.1.3:3306)/dbname")
...
There is a basic connection pool in the database/sql package, but there is not a lot of control over it. There are also a few methods (SetMaxIdleConns, SetMaxOpenConns, SetConnMaxLifetime), but it looks like they only apply to a single database server at a time.
The question is: how do I properly handle and pool database connections in my Go application? How do I work with multiple database servers in Go?
Should I create a singleton object with a connection map holding *sql.DB values, store all connections there, and use them across the whole application? For example:
connections := make(map[string]*sql.DB)
connections["shard1"] = shard1
connections["shard2"] = shard2
...
Should I close connections after each SQL query execution, or leave them open?
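For illustration, here is roughly what I have in mind (a minimal sketch; initShards and shardFor are just placeholder names I made up, and the pool numbers are arbitrary):
package main

import (
	"database/sql"
	"fmt"
	"log"
	"time"

	_ "github.com/go-sql-driver/mysql"
)

// shards holds one long-lived *sql.DB per database server, created once at startup.
var shards []*sql.DB

func initShards(hosts []string) error {
	for _, host := range hosts {
		db, err := sql.Open("mysql", fmt.Sprintf("user:password@tcp(%s:3306)/dbname", host))
		if err != nil {
			return err
		}
		// Pool tuning is per *sql.DB, so it is applied to each shard separately.
		db.SetMaxOpenConns(50)
		db.SetMaxIdleConns(10)
		db.SetConnMaxLifetime(5 * time.Minute)
		shards = append(shards, db)
	}
	return nil
}

// shardFor returns the handle for a given user, using server_id = user_id % 5.
func shardFor(userID int64) *sql.DB {
	return shards[userID%int64(len(shards))]
}

func main() {
	hosts := []string{"192.168.1.1", "192.168.1.2", "192.168.1.3", "192.168.1.4", "192.168.1.5"}
	if err := initShards(hosts); err != nil {
		log.Fatal(err)
	}
	// Inside an API handler: pick the shard and run the query; the pooled
	// connection is returned automatically once the query finishes.
	row := shardFor(42).QueryRow("SELECT name FROM users WHERE user_id = ?", 42)
	_ = row
}
The idea would be that each *sql.DB stays open for the lifetime of the application, and individual queries only borrow a pooled connection and return it when they finish.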

Related

how to execute a conjunctive query crossing two mysql databases with gorm?

code like:
addr := fmt.Sprintf(`%v:%v@tcp(%v:%v)/%v?charset=utf8`, dbuser, dbpassword, dbhost, dbport, dbdatabase)
DB, err = gorm.Open("mysql", addr)
sql := "select * from db1.user join db2.salary"
rows, err := DB.Raw(sql).Rows()
it seems the method gorm.Open() only accepts one source parameter, and it fails with the error "unknown table name 'db1.user'"
is there a correct way to init the DB to execute the sql, or another way to solve the problem?
many thanks
solved by setting dbdatabase = "", which means giving an empty database name when connecting to the MySQL instance. The database name should then be given as a prefix on the table name in the SQL.
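For reference, a rough sketch of what that ends up looking like (using the gorm v1 API from the question; credentials, the port and the join condition are placeholders):
package main

import (
	"fmt"

	"github.com/jinzhu/gorm"
	_ "github.com/jinzhu/gorm/dialects/mysql"
)

func main() {
	// Empty database name: the connection targets the MySQL instance itself,
	// not a single schema.
	addr := fmt.Sprintf(`%v:%v@tcp(%v:%v)/%v?charset=utf8`, "dbuser", "dbpassword", "dbhost", 3306, "")
	db, err := gorm.Open("mysql", addr)
	if err != nil {
		panic(err)
	}
	defer db.Close()

	// Every table is qualified with its database name.
	rows, err := db.Raw("select * from db1.user join db2.salary on db1.user.id = db2.salary.user_id").Rows()
	if err != nil {
		panic(err)
	}
	defer rows.Close()
}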

Is there a right way to connect to MySQL?

I've come across documentation that says:
"the sql.DB object is designed to be long-lived. Don't Open() and Close() databases frequently. Instead, create one sql.DB object for each distinct datastore you need to access" (source)
And doing some poking around, I mostly found code opening the connection in the handler file, like so:
func dbConn() (db *sql.DB) {
	dbDriver := "mysql"
	dbUser := "root"
	dbPass := "root"
	dbName := "goblog"
	db, err := sql.Open(dbDriver, dbUser+":"+dbPass+"@/"+dbName)
	if err != nil {
		panic(err.Error())
	}
	return db
}
source
And to access the db
db := dbConn()
This is called in the functions that need to use it. To my understanding, this would open the connection and then close it when it reaches the end of said function.
Wouldn't this be violating the quote above?
The example is simply poorly written, and yes, you shouldn't Open and Close unless you're dealing with different datastores.
Open returns a DB, and DB is a database handle representing a pool of zero or more underlying connections. It's safe for concurrent use by multiple goroutines.
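A rough sketch of the intended pattern (connection details are placeholders): open the handle once, for example in main, keep it for the lifetime of the process, and let every handler share it.
package main

import (
	"database/sql"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

var db *sql.DB // one long-lived handle for the whole application

func main() {
	var err error
	db, err = sql.Open("mysql", "root:root@/goblog")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// sql.Open does not actually connect; Ping verifies the datastore is reachable.
	if err = db.Ping(); err != nil {
		log.Fatal(err)
	}

	// Handlers then use the shared db directly; they never Open or Close it themselves.
}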

Why does R upload data much faster than KNIME or Workbench?

What I want to know is: what the heck happens under the hood when I upload data through R, and why does it turn out to be way faster than MySQL Workbench or KNIME?
I work with data and, every day, I upload data into a MySQL server. I used to upload data using KNIME since it was much faster than uploading with MySQL Workbench (select the table -> "import data").
Some info: the CSV has 4000 rows and 15 columns. The library I used in R is RMySQL. The node I used in KNIME is Database Writer.
library('RMySQL')
df=read.csv('C:/Users/my_user/Documents/file.csv', encoding = 'UTF-8', sep=';')
connection <- dbConnect(
  RMySQL::MySQL(),
  dbname = "db_name",
  host = "yyy.xxxxxxx.com",
  user = "vitor",
  password = "****"
)
dbWriteTable(connection, "table_name", df, append=TRUE, row.names=FALSE)
So, to test, I did the exact same process, using the same file. It took 2 minutes in KNIME and only seconds in R.
Everything happens under the hood! Data upload to a DB depends on parameters such as the interface between the DB and the tool, network connectivity, the batch size that is set, the memory available to the tool, and the tool's own data processing speed, and probably some more. In your case the RMySQL package uses a batch size of 500 by default while KNIME uses only 1, so that is probably where the difference comes from. Try setting it to 500 in KNIME and then compare. I have no clue how MySQL Workbench works...
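To illustrate what that batch size means in practice, here is a rough sketch in Go (matching the other threads here; the table and column names are made up): with a batch of 500, five hundred rows travel in one round trip instead of one INSERT per row.
package main

import (
	"database/sql"
	"log"
	"strings"

	_ "github.com/go-sql-driver/mysql"
)

// insertBatch writes rows in multi-row INSERT statements of batchSize rows each,
// so 4000 rows with batchSize 500 need 8 round trips instead of 4000.
func insertBatch(db *sql.DB, rows [][2]interface{}, batchSize int) error {
	for start := 0; start < len(rows); start += batchSize {
		end := start + batchSize
		if end > len(rows) {
			end = len(rows)
		}
		chunk := rows[start:end]
		placeholders := make([]string, 0, len(chunk))
		args := make([]interface{}, 0, len(chunk)*2)
		for _, r := range chunk {
			placeholders = append(placeholders, "(?, ?)")
			args = append(args, r[0], r[1])
		}
		query := "INSERT INTO table_name (col_a, col_b) VALUES " + strings.Join(placeholders, ", ")
		if _, err := db.Exec(query, args...); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	db, err := sql.Open("mysql", "vitor:password@tcp(yyy.xxxxxxx.com:3306)/db_name")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	rows := [][2]interface{}{{"a", 1}, {"b", 2}} // stand-in for the 4000 CSV rows
	if err := insertBatch(db, rows, 500); err != nil {
		log.Fatal(err)
	}
}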

slow mysql database restore with multiple threads (delphi)

I need to restore a lot of MySQL database backups and I've been trying to speed things up by using multiple threads (in Delphi), each with its own connection. When I'm using MODE_SCRIPT, I can only process around 1 file per second (fps), with the CPU/disk/memory not stressed at all.
When I'm using MODE_CMD, I can get as high as 12+ fps with the CPU up to 100% on all cores.
It looks like, when using TClientDataSet or its descendants, the script is not using all cores, even when using multiple threads?
Minimal code example:
type
  TWorker = class(TThread)
  private
    FTasks: TThreadStringList;
    FConn: TMyConnection;
    FScript: TMyScript;
    FQ: TMyQuery;
    FMyName: String;
    FIniDb: TIniDBSettings;
  protected
    procedure Execute; override;
  public
    procedure addTask(const aFn: String);
    constructor create(Suspended: Boolean; const aMyId: LongInt; const aIniDb: TIniDBSettings);
  end;

procedure TWorker.addTask(const aFn: String);
begin
  FTasks.Add(aFn);
end;

constructor TWorker.create(Suspended: Boolean; const aMyId: LongInt; const aIniDb: TIniDBSettings);
begin
  inherited Create(Suspended);
  FTasks := TThreadStringList.Create;
  FMyName := 'WORKER__' + IntToStr(aMyId);
  FIniDb := aIniDb; // keep the settings so Execute can reach them
end;

procedure TWorker.Execute;
var
  mode: LongInt;
  tmpFn, sCmd, dosOutput, fMyDbname: String;
const
  MODE_DOS = 1;
  MODE_SCRIPT = 2;
begin
  FConn := TMyConnection.Create(Nil);
  FConn.Username := FIniDb.iniSDBUsername;
  FConn.Password := FIniDb.iniSDBPass;
  FConn.Database := FIniDb.iniSDBDatabase;
  FConn.Server := FIniDb.iniSDBServer;
  FScript := TMyScript.Create(Nil);
  FScript.Connection := FConn;
  FQ := TMyQuery.Create(Nil);
  FQ.Connection := FConn;
  try
    FConn.Connect;
    while not Terminated do begin
      if FTasks.Count > 0 then begin
        tmpFn := FTasks.Strings[0];
        FTasks.Delete(0);
        fMyDbname := 'tmpdb_' + FMyName;
        if (mode = MODE_SCRIPT) then begin
          FQ.SQL.Text := 'drop database if exists ' + fMyDbname;
          FQ.Execute;
          FQ.SQL.Text := 'create database ' + fMyDbname;
          FQ.Execute;
          FQ.SQL.Text := 'use ' + fMyDbname;
          FQ.Execute;
          FScript.SQL.LoadFromFile(tmpFn + '.new');
          FScript.Execute;
        end
        else if (mode = MODE_DOS) then begin
          sCmd := 'cmd.exe /c mysql -u user -h serverip < ' + tmpFn;
          GetDosOutput(sCmd, dosOutput); // helper function using CreateProcess()
        end;
        InterlockedIncrement(QDONE);
      end
      else Sleep(15);
    end;
  except on e: Exception do
    MessageBox(0, PWideChar('error' + e.Message), 'error', MB_OK);
  end;
end;
It sounds like you are using MyISAM. That is antiquated, and it suffers from "table locks", which inhibit much in the way of parallelism.
The following are irrelevant for MyISAM:
-SET FOREIGN_KEY_CHECKS=0;
-SET autocommit=0;
Some questions that relate to the problem:
Do you have AUTO_INCREMENT columns?
Are you inserting into the same table at the same time from different threads? (Problematic with MyISAM and MEMORY, less so with InnoDB.)
How many UNIQUE keys on each table? (INSERTs are slowed down by the need to check for dups.)
Are you using INSERT? One row at a time? Or batched? (Inserting a batch of 100 rows at a time is about optimal -- 10 times as fast as 1 at a time.)
Or are you using LOAD DATA? (Even faster.)
What is the relationship between a "file" and a "table"? That is, are you loading lots of little files into a table, or each file is one table?
Does the RAID have striping and/or a Battery Backed Write Cache?
Is the disk HDD or SSD?
What is the ping time between the client and server? (You mentioned "network", but gave no indication of proximity.)
How many tables? Are you creating up to 1.87 tables per second? That is 3 files to write and 1 to read? (Windows is not the greatest at rapid opening of files.) That's about 7 file opens/sec. (Note InnoDB needs only 1 file per table if using innodb_file_per_table=1.)
Please provide SHOW CREATE TABLE for a couple of the larger tables. Please provide a sample of the SQL statements used.
Wilson's request could also be handy.
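As a rough illustration of the LOAD DATA route (sketched in Go rather than Delphi, to match the earlier threads; the file path, database and table names are made up, and the server must also have local_infile enabled):
package main

import (
	"database/sql"
	"log"

	"github.com/go-sql-driver/mysql"
)

func main() {
	// Whitelist the dump file so the driver permits LOAD DATA LOCAL INFILE on it.
	mysql.RegisterLocalFile("/backups/tmpdb_chunk.csv")

	db, err := sql.Open("mysql", "user:password@tcp(serverip:3306)/tmpdb_worker_1")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Bulk-loading a file is typically much faster than replaying row-by-row INSERTs.
	_, err = db.Exec(`LOAD DATA LOCAL INFILE '/backups/tmpdb_chunk.csv'
		INTO TABLE salary
		FIELDS TERMINATED BY ','
		LINES TERMINATED BY '\n'`)
	if err != nil {
		log.Fatal(err)
	}
}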

How to parsimoniously refer to a data frame in RMySQL

I have a MySQL table that I am reading with the RMySQL package of R. I would like to be able to directly refer to the data frame stored in the table so I can seamlessly interact with it, rather than having to execute an RMySQL statement every time I want to do something. Is there a way to accomplish this? I tried:
data <- dbReadTable(conn = con, name = 'tablename')
For example, if I now want to check how many rows I have in this table I would run:
nrow(data)
Does this go through the database connection, or am I now storing the object "data" locally, defeating the whole purpose of using an external database?
data <- dbReadTable(conn = con, name = 'tablename')
This command downloads all the data into a local R data frame (assuming you have enough RAM). Any operations on the data from that point forward do not require the SQL connection.