slow mysql database restore with multiple threads (delphi) - mysql

I need to restore a lot of mysql database backups and I've been trying to speed up by using multiple threads (in Delphi), each with their own connection. When I'm using MODE_SCRIPT, I can only process around 1 file per second (fps), with the CPU/DISK/MEMORY not stressed at all
When I'm using MODE_CMD, I can get as high as 12+fps with the CPU up to 100% on all cores.
It looks like when using TClientDataSet or descendants, the script is not using all cores, even when using multiple threads?
Minimal code example:
type
TWorker = class(TThread)
private
FTasks: TThreadStringList;
FConn: TMyConnection;
FScript: TMyScript;
FQ: TMyQuery;
protected
procedure Execute; override;
public
procedure addTask(const aFn: String);
constructor create(Suspended: Boolean; const aMyId: LongInt;const aIniDb: TIniDBSettings);
end;
procedure TWorker.addTask(const aFn: String);
begin
FTasks.Add(aFn);
end;
constructor TWorker.create(Suspended: Boolean; const aMyId: LongInt; const aIniDb: TIniDBSettings);
begin
inherited Create(Suspended);
FTasks := TMTThreadStringList.Create;
FMyName := 'WORKER__'+IntToStr(aMyId);
end;
procedure TWorker.Execute;
var
mode: LongInt;
const
MODE_DOS=1;
MODE_SCRIPT = 2;
begin
FConn := TMyConnection.Create(Nil);
FConn.Username := aIniDb.iniSDBUsername;
FConn.Password := aIniDb.iniSDBPass;
FConn.Database := aIniDb.iniSDBDatabase;
FConn.Server := aIniDb.iniSDBServer;
FScript := TMyScript.Create(Nil);
FScript.Connection := FConn;
try
FConn.Connect;
while not Terminated do begin
if FTasks.Count > 0 then begin
tmpFn := FTasks.Strings[0];
FTasks.Delete(0);
fMyDbname := 'tmpdb_'+FMyName;
if(mode=MODE_SCRIPT) then {
FQ.SQL.Text := 'drop database if exists '+fMyDbname ;
FQ.Execute;
FQ.SQL.Text := 'create database '+fMyDbname;
FQ.Execute;
FQ.SQL.Text := 'use '+fMyDbname;
fQ.Execute;
FScript.SQL.LoadFromFile(tmpFn+'.new');
FScript.Execute;
}
else if(mode=MODE_DOS) then begin
sCmd := 'cmd.exe /c mysql -u user -h serverip < '+tmpFn;
GetDosOutput(sCmd,dosOutput);//function using 'CreateProcess()'
}
InterlockedIncrement(QDONE);
end
else Sleep(15);
end;
except on e: Exception do
MessageBox(0,PWideChar('error'+e.Message),'error',MB_OK);
end;
end;

It sounds like you are using MyISAM. That is antiquated, and suffers from "table locks", which inhibits much in the way of parallelism.
The following are irrelevant for MyISAM:
-SET FOREIGN_KEY_CHECKS=0;
-SET autocommit=0;
Some questions that relate to the problem:
Do you have AUTO_INCREMENT columns?
Are you inserting into the same table at the same time from different threads? (Problematic with MyISAM and MEMORY, less so with InnoDB.)
How many UNIQUE keys on each table? (INSERTs are slowed down by the need to check for dups.)
Are you using INSERT? One row at a time? Or batched? (Inserting a batch of 100 rows at a time is about optimal -- 10 times as fast as 1 at a time.)
Or are you using LOAD DATA? (Even faster.)
What is the relationship between a "file" and a "table"? That is, are you loading lots of little files into a table, or each file is one table?
Does the RAID have striping and/or a Battery Backed Write Cache?
Is the disk HDD or SSD?
What is the ping time between the client and server? (You mentioned "network", but gave no indication of proximity.)
How many tables? Are you creating up to 1.87 tables per second? That is 3 files to write and 1 to read? (Windows is not the greatest at rapid opening of files.) That's about 7 file opens/sec. (Note InnoDB needs only 1 file per table if using innodb_file_per_table=1.)
Please provide SHOW CREATE TABLE for a couple of the larger tables. Please provide a sample of the SQL statements used.
Wilson's request could also be handy.

Related

How to switch between databases using GORM in golang?

I'm new to GORM in golang. I'm stuck at a point. Generally we select the database like this:
DBGorm, err = gorm.Open("mysql", user:password#tcp(host:port)/db_name)
But my problem is I'll get the 'db_name' in the request, which means I don't know which db_name might come and I'll have to query according to that db_name.
So now, I'll create the database pointer in the init function like this:
DBGorm, err = gorm.Open("mysql", user:password#tcp(host:port)/) which is without the db_name.
Now how will I switch to db_name coming to me in request. Because when I try to do DBGorm.Create(&con), it shows No database selected.
If I use 'database/sql', then I can make raw queries like this: "SELECT * FROM db_name.table_name", this can solve my problem. But how to do this in gorm?
You can explicitly specify db_name and table_name using .Table() when doing query or other operation on table.
DBGorm.Table("db_name.table_name").Create(&con)
I saw a related article on Github. https://github.com/go-sql-driver/mysql/issues/173#issuecomment-427721651
All you need to do is
start a transaction,
set you database
run your desired query.
and switch back to your desired DB
commit once you are done.
below is an example
tx := db.Begin() // start transaction
tx.Exec("use " + userDB) // switch to tenant db
tx.Exec("insert into ....") // do some work
tx.Exec("use `no-op-db`") // switch away from tenant db (there is no unuse, so I just use a dummy)
tx.Commit() // end transaction

Delphi Slowdown on importing multiple CSV files to a dataset

I am using Delphi 7, Windows 7 and Absolute Database.
Quick Background. I work for a charity shop that relies on items being donated to us. To reclaim 'Gift Aid' from our HMRC we have to submit details of all sales together with the name and address of each donator of that sale. We help people with Special Needs so accurate data input is important.
Up to now to check Post Code verification was easy (based on our local area) basically the format of AA00_0AA or AA0_0AA. As our name has become better known not all Post Codes follow these rules.
I have access to UK's Royal Mail database for addresses in the UK si I thought to actually compare the inputted Post Code with a real Post Code. The RM csv file is huge so I use GSplit3 to break it down into more manageable files. This leaves me with 492 csv files each consisting of about 62000 lines. Note that I am only interested in the Post Codes so there is massive duplication.
To load these files into a dataset (without duplication) I first loaded the file names into a listbox and ran a loop to iterate through all the files to copy to the dataser.To avoid duplication I tried putting an unique index on the field but even running outside of Delphi I still got an error message about duplication. I then tried capturing the text of the last record to be appended and then compare it with the next record
procedure TForm1.importClick(Sender: TObject);
var
i,y:Integer;
lstfile:string;
begin
for i:= 0 to ListBox1.Items.Count-1 do
begin
lstfile:='';
cd.Active:=False;//cd is a csv dataset loaded with csv file
cd.FileName:='C:\paf 112018\CSV PAF\'+ListBox1.Items[i];
cd.Active:=True;
while not cd.Eof do
begin
if (lstfile:=cd.Fields[0].AsString=cd.Fields[0].AsString) then cd.Next
else
table1.append;
table1.fields[0].asstring:=cd.Fields[0].AsString;
lstfile:=cd.Fields[0].AsString;
cd.Next;
end;
end;
table1.Edit;
table1.Post;
end;
This seemed to work OK although the total number of records in the dataset seemed low. I checked with my own Post Code and it wasn't there although another post Code was located. So obviously records had been skipped. I then tried loading the CSV file into a string list with dupignore then copying the stringlist to the dataset.
unit Unit1;
interface
uses
Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
Dialogs, Grids, DBGrids, SMDBGrid, StdCtrls, DB, ABSMain, SdfData;
type
TForm1 = class(TForm)
cd: TSdfDataSet;
dscd: TDataSource;
dst: TDataSource;
ABSDatabase1: TABSDatabase;
table1: TABSTable;
table1PostCode: TStringField;
Label2: TLabel;
ListBox1: TListBox;
getfiles: TButton;
import: TButton;
procedure getfilesClick(Sender: TObject);
procedure importClick(Sender: TObject);
private
{ Private declarations }
public
{ Public declarations }
end;
var
Form1: TForm1;
num:Integer;
implementation
{$R *.dfm}
procedure ListFileDir(Path: string; FileList: TStrings);
var
SR: TSearchRec;
begin
if FindFirst(Path + '*.csv', faAnyFile, SR) = 0 then
begin
repeat
if (SR.Attr <> faDirectory) then
begin
FileList.Add(SR.Name);
end;
until FindNext(SR) <> 0;
FindClose(SR);
end;
end;
procedure TForm1.getfilesClick(Sender: TObject);
begin//Fill listbox with csv files
ListFileDir('C:\paf 112018\CSV PAF\', ListBox1.Items);
end;
//start to iterate through files to appane to dataset
procedure TForm1.importClick(Sender: TObject);
var
i,y:Integer;
myl:TStringList;
begin
for i:= 0 to ListBox1.Items.Count-1 do
begin
myl:=TStringList.Create;
myl.Sorted:=True;
myl.Duplicates:=dupIgnore;
cd.Active:=False;
cd.FileName:='C:\paf 112018\CSV PAF\'+ListBox1.Items[i];
cd.Active:=True;
while not cd.Eof do
begin
if (cd.Fields[0].AsString='')then cd.Next
else
myl.Add(cd.Fields[0].AsString);
cd.Next;
end;
for y:= 0 to myl.Count-1 do
begin
table1.Append;
table1.Fields[0].AsString:=myl.Strings[y];
end;
myl.Destroy;
end;
t.Edit;
t.Post;
end;
procedure TForm1.Button1Click(Sender: TObject);
begin
t.Locate('Post Code',edit1.text,[]);
end;
procedure TForm1.Button2Click(Sender: TObject);
var
sel:string;
begin
q.Close;
q.SQL.Clear;
q.SQL.Add('Select * from postc where [Post Code] like :sel');
q.ParamByName('sel').AsString:=Edit1.Text;
q.Active:=True;
end;
end.
This starts well but soon starts to slow I presume because of memory leaks, I have tried myl.free(), freeandnil(myl) and finally destroy but they all slow down quickly. I am not an expert but do enjoy using Delphi and normally manage to solve problems through your pages or googling but this time I am stumped. Can anyone suggest a better method please
The following shows how to add postcodes from a file containing a list of them
to a query-type DataSet such as TAdoQuery or your q dataset (your q doesn't say what type
it is, as far as I can see).
It also seems from what you say that although you treat your postcode file
as a CVS file, you shouldn't actually need to: if the records contain only one
field, there should be no need for commas, because there is nothing to separate,
the file should contain simply one postcode per line. Consequently, there
doesn't seem to be any need to incur the overhead of loading it as a CSV file, so you should be able simply to load it into a TStringList and add the postcodes from there.
I'm not going to attempt to correct your code, just show a very simple example of how I think it should be done.
So the following code opens a postcode list file, which is assumed to contain one postcode per line, checks whether each entry in it
already exists in your table of postcodes and adds it if not.
procedure TForm1.AddPostCodes(const PostCodeFileName: String);
// The following shows how to add postcodes to a table of existing ones
// from a file named PostCodeFileName which should include an explicit path
var
PostCodeList : TStringList;
PostCode : String;
i : Integer;
begin
PostCodeList := TStringList.Create;
try
PostCodeList.LoadFromFile(PostCodeFileName);
if qPostCodes.Active then
qPostCodes.Close;
qPostCodes.Sql.Text := 'select * from postcodes order by postcode';
qPostCodes.Open;
try
qPostCodes.DisableControls; // Always call DisableControls + EnableControls
// when iterating a dataset which has db-aware controls connected to it
for i := 0 to PostCodeList.Count - 1 do begin
PostCode := PostCodeList[i]; // use of PostCode local variable is to assist debuggging
if not qPostCodes.Locate('PostCode', PostCode, [loCaseInsensitive]) then
qPostCodes.InsertRecord([PostCode]); // InsertRecord does not need to be foollowed by a call to Post.
end;
finally
qPostCodes.EnableControls;
qPostCodes.Close;
end;
finally
PostCodeList.Free; // Don't use .Destroy!
end;
end;
Btw, regarding the comment about bracketing an iteration of a dataset
by calls to DisableControls and EnableControls, the usual reason for doing this is to avoid the overheaad of updating the gui's display of any db-aware controls connected to the dataset. However, one of the reasons I'm
not willing to speculate what is causing your slowdown is that TAdoQuery,
which is one of the standard dataset types that codes with Delphi benefits massively
from DisableControls/EnableControls even when there are NO db-aware controls
connected to it. This is because of a quirk in the coding of TAdoQuery. Whatever
dataset you are using may have a similar quirk.

Why Oracle insert faster than Mysql

I have a little compharison between oracle 11gR2 and Mysql 5.6.
I create same schema in both DBMS with 3 tables
--branch
--client
--loan
loan has a foreign key to client, and a client has a foreign key to branch, besides all of them have primary keys.
I created branches, and client (200_000 clients) and I wanna tests insert perfomance with loan table which is consist around 50 columns.
Most of clolumns double or integer or string.
create or replace PROCEDURE create_loans( n number)
as
BEGIN
Declare
i number:=0;
randDouble float ;
randInt number;
randString varchar2(50);
Begin
while i < n
Loop
randDouble := ROUND(dbms_random.value(0,1),17);
randInt := ROUND(dbms_random.value(1,100000000));
randString := dbms_random.string('l', 50);
Insert into loan_row_model.loan values(null,
randDouble,
randDouble*10,
randDouble*13,
SUBSTR(randString,1,32),
SUBSTR(randString,2,10),
randDouble*155,
SUBSTR(randString,1,9),
SUBSTR(randString,9,10),
SUBSTR(randString,1,32),
randDouble*6123,--annual_inc
SUBSTR(randString,3,32),--verification_status
SUBSTR(randString,4,30),
randDouble,
randInt,--open_acc
randInt*2,
SUBSTR(randString,7,7),
randInt*5,--total_acc
SUBSTR(randString,1,3),--initial_list_status
randDouble*64,
randDouble*4,
randDouble*231,
randDouble,
randDouble,
randDouble*12,
randDouble,--collection_recovery_fee
SUBSTR(randString,19,30),
randDouble*14,--last_pymnt_amnt
SUBSTR(randString,21,32),
SUBSTR(randString,9,30),
SUBSTR(randString,16,15),--policy_code
SUBSTR(randString,1,29),--application_type
randInt,
randInt*7,
randInt*4,
randInt,
randInt,
randInt,
randInt*3,
randInt,--mths_since_rcnt_il
randDouble*6149,
randInt*8,--open_rv_12m
randInt*8,--open_rv_24m
randDouble*475,
randDouble*37,--all_util
randInt*4,
randInt,
randInt*3,
randInt,
randInt*9,
TO_DATE( TRUNC( DBMS_RANDOM.VALUE(TO_CHAR(DATE '2016-01-01','J'),TO_CHAR(DATE '2046-12-31','J') )),'J'),
ROUND(dbms_random.value(1,200000))
);
i := i+1;
end loop;
end;
END;
the procedure in mysql almost identical, I just used their native random generator for values.
Before start I have disabled parallel executing in oracle, and flush cache, in mysql also disable cache.
But as a result for 50000 inserts Oracle has 15s vs 30s in Mysql.
What is the reason, could you help?
MySQL can do that in 3 seconds if you "batch" 100 rows at a time. Perhaps even faster with LOAD DATA.
How often do you need to insert 50K rows? In other words, why does it matter?
Show us SHOW CREATE TABLE; there could be various issues (favorable or unfavorable) with the indexes or lack of them, and also in the datatypes, and especially the "engine".
Were they "finished"? Both Oracle and MySQL do some variant on "delayed writes" to avoid making you wait. 15s or 30s may or may not be sustainable.
Were you using spinning drives or SSDs? RAID with write cache? What about the settings for autocommit versus BEGIN...COMMIT? Did you even do a commit? Or does the timing include a rollback?! Committing after each INSERT is not a good idea since it has a huge overhead.
Were the settings tuned optimally?
Did the table already have data? Were you inserting "at the end"? Or randomly?
When you have answered all of those, I may have another 10 questions that will show that further things can be done to make your benchmark 'prove' that one vendor or the other is faster.

Attempt to fetch logical page in database 2 failed. It belongs to allocation unit X not to Y

Started to get following error when executing certain SP. Code related to this error is pretty simple, joining #temp table to real table
Full text of error:
Msg 605, Level 21, State 3, Procedure spSSRSRPTIncorrectRevenue, Line 123
Attempt to fetch logical page (1:558552) in database 2 failed. It belongs to allocation unit 2089673263876079616 not to 4179358581172469760.
Here is what I found:
https://support.microsoft.com/en-us/kb/2015739
This suggests some kind of issue with database. I run DBCC CHECKDB on user database and on temp database - all passes.
Second thing I'm doing - trying to find which table those allocation units belong
SELECT au.allocation_unit_id, OBJECT_NAME(p.object_id) AS table_name, fg.name AS filegroup_name,
au.type_desc AS allocation_type, au.data_pages, partition_number
FROM sys.allocation_units AS au
JOIN sys.partitions AS p ON au.container_id = p.partition_id
JOIN sys.filegroups AS fg ON fg.data_space_id = au.data_space_id
WHERE au.allocation_unit_id in(2089673263876079616, 4179358581172469760)
ORDER BY au.allocation_unit_id
This returns 2 objects in tempdb, not in user db. So, it makes me think it's some kind of data corruption in tempdb? I'm developer, not DBA. Any suggestions on what I should check next?
Also, when I run query above, how can I tell REAL object name that I understand? Like #myTempTable______... instead of #07C650CE
I was able to resolve this by clearing the SQL caches:
DBCC FREEPROCCACHE
GO
DBCC DROPCLEANBUFFERS
GO
Apparently restarting the SQL service would have had the same affect.
(via Made By SQL, reproduced here to help others!)
I have like your get errors too.
firstly you must backing up to table or object for dont panic more after. I tryed below steps on my Database.
step 1:
Backing up table (data movement to other table as manuel or vs..how can you do)
I used to below codes to my table move other table
--CODE-
set nocount on;
DECLARE #Counter INT = 1;
DECLARE #LastRecord INT = 10000000; --your table_count
WHILE #Counter < #LastRecord
BEGIN
BEGIN TRY
BEGIN
insert into your_table_new SELECT * FROM your_table WHERE your_column= #Counter --dont forget! create your_table_new before
END
END TRY
BEGIN CATCH
BEGIN
insert into error_code select #Counter,'error_number' --dont forget the create error_code table before.
END
END CATCH
SET #Counter += 1;
END;
step 2:
-DBCC CHECKTABLE(your_table , REPAIR_REBUILD )
GO
check your table. if you have an error go to other step_3.
step 3:
!!attention!! you can lost some data/datas on your table. but dont worry. so you backed-up your table in step_1.
-DBCC CHECKTABLE(your_table , REPAIR_ALLOW_DATA_LOSS)
GO
Good luck!
~~pektas
In my case, truncating and re-populating data in the concerned tables was the solution.
Most probably the data inside tables was corrupted.
Database ID 2 means your tempdb is corrupted. Fixing tempdp is easy. Restart sqlserver service and you are good to go.
This could be an instance of a bug Microsoft fixed on SQL Server 2008 with queries on temporary tables that self reference (for example we have experienced it when loading data from a real table to a temporary table while filtering any rows we already have populated in the temp table in a previous step).
It seems that it only happens on temporary tables with no identity/primary key, so a workaround is to add one, although if you patch CU3 or later you also can enable the hotfix via turning a trace flag on.
For more details on the bug/fixes: https://support.microsoft.com/en-us/help/960770/fix-you-receive-error-605-and-error-824-when-you-run-a-query-that-inse

Using SQL to determine word count stats of a text field

I've recently been working on some database search functionality and wanted to get some information like the average words per document (e.g. text field in the database). The only thing I have found so far (without processing in language of choice outside the DB) is:
SELECT AVG(LENGTH(content) - LENGTH(REPLACE(content, ' ', '')) + 1)
FROM documents
This seems to work* but do you have other suggestions? I'm currently using MySQL 4 (hope to move to version 5 for this app soon), but am also interested in general solutions.
Thanks!
* I can imagine that this is a pretty rough way to determine this as it does not account for HTML in the content and the like as well. That's OK for this particular project but again are there better ways?
Update: To define what I mean by "better": either more accurate, performs more efficiently, or is more "correct" (easy to maintain, good practice, etc). For the content I have available, the query above is fast enough and is accurate for this project, but I may need something similar in the future (so I asked).
The text handling capabilities of MySQL aren't good enough for what you want. A stored function is an option, but will probably be slow. Your best bet to process the data within MySQL is to add a user defined function. If you're going to build a newer version of MySQL anyway, you could also add a native function.
The "correct" way is to process the data outside the DB since DBs are for storage, not processing, and any heavy processing might put too much of a load on the DBMS. Additionally, calculating the word count outside of MySQL makes it easier to change the definition of what counts as a word. How about storing the word count in the DB and updating it when a document is changed?
Example stored function:
DELIMITER $$
CREATE FUNCTION wordcount(str LONGTEXT)
RETURNS INT
DETERMINISTIC
SQL SECURITY INVOKER
NO SQL
BEGIN
DECLARE wordCnt, idx, maxIdx INT DEFAULT 0;
DECLARE currChar, prevChar BOOL DEFAULT 0;
SET maxIdx=char_length(str);
SET idx = 1;
WHILE idx <= maxIdx DO
SET currChar=SUBSTRING(str, idx, 1) RLIKE '[[:alnum:]]';
IF NOT prevChar AND currChar THEN
SET wordCnt=wordCnt+1;
END IF;
SET prevChar=currChar;
SET idx=idx+1;
END WHILE;
RETURN wordCnt;
END
$$
DELIMITER ;
This is quite a bit faster, though just slightly less accurate. I found it 4% light on the count, which is OK for "estimate" scenarios.
SELECT
ROUND (
(
CHAR_LENGTH(content) - CHAR_LENGTH(REPLACE (content, " ", ""))
)
/ CHAR_LENGTH(" ")
) AS count
FROM documents
Simple solution for some similar cases (MySQL):
SELECT *,
(CHAR_LENGTH(student)-CHAR_LENGTH(REPLACE(student,' ','')))+1 as 'count'
FROM documents;
You can use the word_count() UDF from https://github.com/spachev/mysql_udf_bundle. I ported the logic from the accepted answer with a difference that my code only supports latin1 charset. The logic would need to be reworked to support other charsets. Also, both implementations always consider a non-alphanumeric character to be a delimiter, which may not always desirable - for example "teacher's book" is considered to be three words by both implementations.
The UDF version is, of course, significantly faster. For a quick test I tried both on a dataset from Project Guttenberg consisting of 9751 records totaling about 3 GB. The UDF did all of them in 18 seconds, while the stored function took 63 seconds to process just 30 records (which UDF does in 0.05 seconds). So the UDF is roughly 1000 times faster in this case.
UDF will beat any other method in speed that does not involve modifying MySQL source code. This is because it has access to the string bytes in memory and can operate directly on bytes without them having to be moved around. It is also compiled into machine code and runs directly on the CPU.
Well I tried to use the function defined above and it was great, except one scenario.
In English you have strong use of ' as part of the word. The function above, at least to me, counted "haven't" as 2.
So here is my little correction:
DELIMITER $$
CREATE FUNCTION wordcount(str TEXT)
RETURNS INT
DETERMINISTIC
SQL SECURITY INVOKER
NO SQL
BEGIN
DECLARE wordCnt, idx, maxIdx INT DEFAULT 0;
DECLARE currChar, prevChar BOOL DEFAULT 0;
SET maxIdx=CHAR_LENGTH(str);
WHILE idx < maxIdx DO
SET currChar=SUBSTRING(str, idx, 1) RLIKE '[[:alnum:]]' OR SUBSTRING(str, idx, 1) RLIKE "'";
IF NOT prevChar AND currChar THEN
SET wordCnt=wordCnt+1;
END IF;
SET prevChar=currChar;
SET idx=idx+1;
END WHILE;
RETURN wordCnt;
END
$$