Why is an Oracle insert faster than a MySQL insert? - mysql

I have a little comparison between Oracle 11gR2 and MySQL 5.6.
I created the same schema in both DBMSs with 3 tables:
--branch
--client
--loan
loan has a foreign key to client, and client has a foreign key to branch; besides that, all of them have primary keys.
I created the branches and clients (200,000 clients), and I want to test insert performance with the loan table, which consists of around 50 columns.
Most of the columns are double, integer, or string.
create or replace PROCEDURE create_loans( n number)
as
BEGIN
Declare
i number:=0;
randDouble float ;
randInt number;
randString varchar2(50);
Begin
while i < n
Loop
randDouble := ROUND(dbms_random.value(0,1),17);
randInt := ROUND(dbms_random.value(1,100000000));
randString := dbms_random.string('l', 50);
Insert into loan_row_model.loan values(null,
randDouble,
randDouble*10,
randDouble*13,
SUBSTR(randString,1,32),
SUBSTR(randString,2,10),
randDouble*155,
SUBSTR(randString,1,9),
SUBSTR(randString,9,10),
SUBSTR(randString,1,32),
randDouble*6123,--annual_inc
SUBSTR(randString,3,32),--verification_status
SUBSTR(randString,4,30),
randDouble,
randInt,--open_acc
randInt*2,
SUBSTR(randString,7,7),
randInt*5,--total_acc
SUBSTR(randString,1,3),--initial_list_status
randDouble*64,
randDouble*4,
randDouble*231,
randDouble,
randDouble,
randDouble*12,
randDouble,--collection_recovery_fee
SUBSTR(randString,19,30),
randDouble*14,--last_pymnt_amnt
SUBSTR(randString,21,32),
SUBSTR(randString,9,30),
SUBSTR(randString,16,15),--policy_code
SUBSTR(randString,1,29),--application_type
randInt,
randInt*7,
randInt*4,
randInt,
randInt,
randInt,
randInt*3,
randInt,--mths_since_rcnt_il
randDouble*6149,
randInt*8,--open_rv_12m
randInt*8,--open_rv_24m
randDouble*475,
randDouble*37,--all_util
randInt*4,
randInt,
randInt*3,
randInt,
randInt*9,
TO_DATE( TRUNC( DBMS_RANDOM.VALUE(TO_CHAR(DATE '2016-01-01','J'),TO_CHAR(DATE '2046-12-31','J') )),'J'),
ROUND(dbms_random.value(1,200000))
);
i := i+1;
end loop;
end;
END;
The procedure in MySQL is almost identical; I just used its native random generator for the values.
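For the sake of illustration, the MySQL version looks roughly like this (a sketch with a few placeholder columns rather than the real 50-column loan table, using RAND() as the native random generator):
DELIMITER //
CREATE PROCEDURE create_loans(IN n INT)
BEGIN
  DECLARE i INT DEFAULT 0;
  WHILE i < n DO
    -- RAND() is MySQL's native random generator; the columns here are placeholders
    INSERT INTO loan (col_double, col_int, col_string)
    VALUES (RAND(), FLOOR(1 + RAND() * 100000000), SUBSTRING(MD5(RAND()), 1, 32));
    SET i = i + 1;
  END WHILE;
END//
DELIMITER ;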
Before starting I disabled parallel execution in Oracle and flushed its caches; in MySQL I also disabled the query cache.
But as a result, for 50,000 inserts Oracle takes 15 s versus 30 s for MySQL.
What is the reason? Could you help?

MySQL can do that in 3 seconds if you "batch" 100 rows at a time. Perhaps even faster with LOAD DATA.
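(A "batch" here just means one INSERT statement carrying many row tuples. A minimal sketch; the column names and values are placeholders, not your real loan columns:)
-- One statement, many rows: one parse, one round trip, one index-maintenance pass.
INSERT INTO loan (col_double, col_int, col_string) VALUES
  (0.42, 17, 'abc'),
  (0.17, 23, 'def'),
  -- ... up to ~100 tuples per statement ...
  (0.93, 99, 'xyz');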
How often do you need to insert 50K rows? In other words, why does it matter?
Show us SHOW CREATE TABLE; there could be various issues (favorable or unfavorable) with the indexes or lack of them, and also in the datatypes, and especially the "engine".
Were they "finished"? Both Oracle and MySQL do some variant on "delayed writes" to avoid making you wait. 15s or 30s may or may not be sustainable.
Were you using spinning drives or SSDs? RAID with write cache? What about the settings for autocommit versus BEGIN...COMMIT? Did you even do a commit? Or does the timing include a rollback?! Committing after each INSERT is not a good idea since it has a huge overhead (see the sketch at the end of this answer).
Were the settings tuned optimally?
Did the table already have data? Were you inserting "at the end"? Or randomly?
When you have answered all of those, I may have another 10 questions that will show that further things can be done to make your benchmark 'prove' that one vendor or the other is faster.
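On the commit question above: with autocommit on, every one of the 50,000 INSERTs pays for its own durable flush. A minimal sketch of running them as one explicit transaction instead (assuming InnoDB):
SET autocommit = 0;
START TRANSACTION;
-- ... the 50,000 single-row INSERTs (or the CALL that runs them) ...
COMMIT;   -- one flush to disk instead of 50,000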

Related

slow mysql database restore with multiple threads (delphi)

I need to restore a lot of MySQL database backups and I've been trying to speed things up by using multiple threads (in Delphi), each with its own connection. When I'm using MODE_SCRIPT, I can only process around 1 file per second (fps), with the CPU/DISK/MEMORY not stressed at all.
When I'm using MODE_CMD, I can get as high as 12+ fps with the CPU up to 100% on all cores.
It looks like when using TClientDataSet or descendants, the script is not using all cores, even when using multiple threads?
Minimal code example:
type
  TWorker = class(TThread)
  private
    FTasks: TThreadStringList;
    FConn: TMyConnection;
    FScript: TMyScript;
    FQ: TMyQuery;
    FMyName: String;            // worker name, set in the constructor
    FIniDb: TIniDBSettings;     // connection settings captured from the constructor
  protected
    procedure Execute; override;
  public
    procedure addTask(const aFn: String);
    constructor create(Suspended: Boolean; const aMyId: LongInt; const aIniDb: TIniDBSettings);
  end;

procedure TWorker.addTask(const aFn: String);
begin
  FTasks.Add(aFn);
end;

constructor TWorker.create(Suspended: Boolean; const aMyId: LongInt; const aIniDb: TIniDBSettings);
begin
  inherited Create(Suspended);
  FTasks := TThreadStringList.Create;
  FIniDb := aIniDb;
  FMyName := 'WORKER__' + IntToStr(aMyId);
end;
procedure TWorker.Execute;
var
  mode: LongInt;
  tmpFn, fMyDbname, sCmd, dosOutput: String;
const
  MODE_DOS = 1;
  MODE_SCRIPT = 2;
begin
  FConn := TMyConnection.Create(Nil);
  FConn.Username := FIniDb.iniSDBUsername;
  FConn.Password := FIniDb.iniSDBPass;
  FConn.Database := FIniDb.iniSDBDatabase;
  FConn.Server := FIniDb.iniSDBServer;
  FScript := TMyScript.Create(Nil);
  FScript.Connection := FConn;
  FQ := TMyQuery.Create(Nil);
  FQ.Connection := FConn;
  try
    FConn.Connect;
    while not Terminated do begin
      if FTasks.Count > 0 then begin
        tmpFn := FTasks.Strings[0];
        FTasks.Delete(0);
        fMyDbname := 'tmpdb_' + FMyName;
        if (mode = MODE_SCRIPT) then begin
          FQ.SQL.Text := 'drop database if exists ' + fMyDbname;
          FQ.Execute;
          FQ.SQL.Text := 'create database ' + fMyDbname;
          FQ.Execute;
          FQ.SQL.Text := 'use ' + fMyDbname;
          FQ.Execute;
          FScript.SQL.LoadFromFile(tmpFn + '.new');
          FScript.Execute;
        end
        else if (mode = MODE_DOS) then begin
          sCmd := 'cmd.exe /c mysql -u user -h serverip < ' + tmpFn;
          GetDosOutput(sCmd, dosOutput); // function using 'CreateProcess()'
        end;
        InterlockedIncrement(QDONE);
      end
      else
        Sleep(15);
    end;
  except
    on e: Exception do
      MessageBox(0, PWideChar('error' + e.Message), 'error', MB_OK);
  end;
end;
It sounds like you are using MyISAM. That is antiquated, and suffers from "table locks", which inhibits much in the way of parallelism.
The following are irrelevant for MyISAM:
-SET FOREIGN_KEY_CHECKS=0;
-SET autocommit=0;
Some questions that relate to the problem:
Do you have AUTO_INCREMENT columns?
Are you inserting into the same table at the same time from different threads? (Problematic with MyISAM and MEMORY, less so with InnoDB.)
How many UNIQUE keys on each table? (INSERTs are slowed down by the need to check for dups.)
Are you using INSERT? One row at a time? Or batched? (Inserting a batch of 100 rows at a time is about optimal -- 10 times as fast as 1 at a time.)
Or are you using LOAD DATA? (Even faster; see the sketch at the end of this answer.)
What is the relationship between a "file" and a "table"? That is, are you loading lots of little files into a table, or each file is one table?
Does the RAID have striping and/or a Battery Backed Write Cache?
Is the disk HDD or SSD?
What is the ping time between the client and server? (You mentioned "network", but gave no indication of proximity.)
How many tables? Are you creating up to 1.87 tables per second? That is 3 files to write and 1 to read? (Windows is not the greatest at rapid opening of files.) That's about 7 file opens/sec. (Note InnoDB needs only 1 file per table if using innodb_file_per_table=1.)
Please provide SHOW CREATE TABLE for a couple of the larger tables. Please provide a sample of the SQL statements used.
Wilson's request could also be handy.
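A minimal LOAD DATA sketch, as referenced above (file path, table name, and the tab-separated layout are assumptions, not taken from your setup):
-- One statement loads the whole file; far fewer parses and round trips than row-by-row INSERTs.
LOAD DATA LOCAL INFILE '/tmp/dump_part.tsv'
INTO TABLE target_table
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n';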

Attempt to fetch logical page in database 2 failed. It belongs to allocation unit X not to Y

I started to get the following error when executing a certain SP. The code related to this error is pretty simple: it joins a #temp table to a real table.
Full text of error:
Msg 605, Level 21, State 3, Procedure spSSRSRPTIncorrectRevenue, Line 123
Attempt to fetch logical page (1:558552) in database 2 failed. It belongs to allocation unit 2089673263876079616 not to 4179358581172469760.
Here is what I found:
https://support.microsoft.com/en-us/kb/2015739
This suggests some kind of issue with the database. I ran DBCC CHECKDB on the user database and on tempdb - all passes.
The second thing I'm doing is trying to find which tables those allocation units belong to:
SELECT au.allocation_unit_id, OBJECT_NAME(p.object_id) AS table_name, fg.name AS filegroup_name,
au.type_desc AS allocation_type, au.data_pages, partition_number
FROM sys.allocation_units AS au
JOIN sys.partitions AS p ON au.container_id = p.partition_id
JOIN sys.filegroups AS fg ON fg.data_space_id = au.data_space_id
WHERE au.allocation_unit_id in(2089673263876079616, 4179358581172469760)
ORDER BY au.allocation_unit_id
This returns 2 objects in tempdb, not in the user db. So it makes me think it's some kind of data corruption in tempdb? I'm a developer, not a DBA. Any suggestions on what I should check next?
Also, when I run the query above, how can I tell the REAL object name in a form I understand, like #myTempTable______..., instead of #07C650CE?
I was able to resolve this by clearing the SQL caches:
DBCC FREEPROCCACHE
GO
DBCC DROPCLEANBUFFERS
GO
Apparently restarting the SQL service would have had the same effect.
(via Made By SQL, reproduced here to help others!)
I got errors like yours too.
First, back up the table or object so that you don't have to panic later. I tried the steps below on my database.
Step 1:
Back up the table (move the data to another table, manually or however you prefer).
I used the code below to move my table's rows to another table.
--CODE--
set nocount on;
DECLARE @Counter INT = 1;
DECLARE @LastRecord INT = 10000000; -- your table's row count
WHILE @Counter < @LastRecord
BEGIN
    BEGIN TRY
        BEGIN
            INSERT INTO your_table_new SELECT * FROM your_table WHERE your_column = @Counter -- don't forget to create your_table_new first
        END
    END TRY
    BEGIN CATCH
        BEGIN
            INSERT INTO error_code SELECT @Counter, 'error_number' -- don't forget to create the error_code table first
        END
    END CATCH
    SET @Counter += 1;
END;
Step 2:
DBCC CHECKTABLE(your_table, REPAIR_REBUILD)
GO
Check your table. If you still get an error, go to step 3.
Step 3:
Attention! You can lose some data in your table, but don't worry: you backed the table up in step 1.
DBCC CHECKTABLE(your_table, REPAIR_ALLOW_DATA_LOSS)
GO
Good luck!
~~pektas
In my case, truncating and re-populating data in the concerned tables was the solution.
Most probably the data inside tables was corrupted.
Database ID 2 means your tempdb is corrupted. Fixing tempdb is easy: restart the SQL Server service and you are good to go.
This could be an instance of a bug Microsoft fixed on SQL Server 2008 with queries on temporary tables that self reference (for example we have experienced it when loading data from a real table to a temporary table while filtering any rows we already have populated in the temp table in a previous step).
It seems that it only happens on temporary tables with no identity/primary key, so a workaround is to add one, although if you are on CU3 or later you can also enable the hotfix by turning on a trace flag.
For more details on the bug/fixes: https://support.microsoft.com/en-us/help/960770/fix-you-receive-error-605-and-error-824-when-you-run-a-query-that-inse
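A hedged sketch of the identity/primary-key workaround (table and column names below are made up, not taken from the original report):
-- Give the temp table an explicit identity/primary key before the
-- self-referencing INSERT ... SELECT that triggers the bug.
CREATE TABLE #Staging (
    RowId    INT IDENTITY(1,1) PRIMARY KEY,  -- added solely as the workaround
    SourceId INT NOT NULL,
    Amount   MONEY NULL
);

INSERT INTO #Staging (SourceId, Amount)
SELECT s.SourceId, s.Amount
FROM dbo.SourceTable AS s
WHERE NOT EXISTS (SELECT 1 FROM #Staging AS t WHERE t.SourceId = s.SourceId);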

Prevent race conditions across multiple rows

I have read a lot about preventing race conditions, but typically with one record in an upsert scenario. For example:
Atomic UPSERT in SQL Server 2005
I have a different requirement, and it is to prevent race conditions across multiple rows. For example, say I have the following table structure:
GiftCards:
GiftCardId int primary key not null,
OriginalAmount money not null
GiftCardTransactions:
TransactionId int primary key not null,
GiftCardId int (foreign key to GiftCards.GiftCardId),
Amount money not null
There could be multiple processes inserting into GiftCardTransactions and I need to prevent inserting if SUM(GiftCardTransactions.Amount) + insertingAmount would go over GiftCards.OriginalAmount.
I know I could use TABLOCKX on GiftCardTransactions, but obviously this would not be feasible for lots of transactions. Another way would be to add a GiftCards.RemainingAmount column and then I only need to lock one row (though with possibility of lock escalation), but unfortunately this isn't an option for me at this time (would this have been the best option?).
Instead of trying to prevent inserting in the first place, maybe the answer is to just insert, then select SUM(GiftCardTransactions.Amount), and rollback if necessary. This is an edge case, so I'm not worried about unnecessarily using up PK values, etc.
So the question is, without modifying the table structure and using any combination of transactions, isolation levels and hints, how can I achieve this with a minimal amount of locking?
I have run into this exact situation in the past and ended up using SP_GetAppLock to create a semaphore on a key to prevent a race condition. I wrote up an article several years ago discussing various methods. The article is here:
http://www.sqlservercentral.com/articles/Miscellaneous/2649/
The basic idea is that you acquire a lock on a constructed key that is separate from the table. In this way, you can be very precise and only block spids that would potentially create a race condition and not block other consumers of the table.
I've left the meat of the article below but I would apply this technique by acquiring a lock on a constructed key such as
@Key = 'GiftCardTransaction' + GiftCardId
Acquiring a lock on this key (and ensuring you consistently apply this approach) would prevent any potential race condition, as the first request to acquire the lock would do its work with all other requests waiting for the lock to be released (or timing out, depending on how you want your app to work).
The meat of the article is here:
sp_getapplock is a wrapper for the extended procedure XP_USERLOCK. It allows you to use SQL Server's locking mechanism to manage concurrency outside the scope of tables and rows. It can be used to marshal PROC calls in the same way as the above solutions, with some additional features.
First, sp_getapplock adds locks directly in server memory, which keeps your overhead low.
Second, you can specify a lock timeout without needing to change session settings. In cases where you only want one call for a particular key to run, a quick timeout would ensure the proc doesn't hold up execution of the application for very long.
Third, sp_getapplock returns a status which can be useful in determining if the code should run at all. Again, in cases where you only want one call for a particular key, a return code of 1 would tell you that the lock was granted successfully after waiting for other incompatible locks to be released, thus you can exit without running any more code (like an existence check, for example).
The syntax is as follows:
sp_getapplock [ @Resource = ] 'resource_name',
              [ @LockMode = ] 'lock_mode'
              [ , [ @LockOwner = ] 'lock_owner' ]
              [ , [ @LockTimeout = ] 'value' ]
An example using sp_getapplock
/************** Proc Code **************/
CREATE PROC dbo.GetAppLockTest
AS
BEGIN TRAN
    EXEC sp_getapplock @Resource = @key, @LockMode = 'Exclusive'
    /* Code goes here */
    EXEC sp_releaseapplock @Resource = @key
COMMIT
I know it goes without saying, but since the scope of sp_getapplock's locks is an explicit transaction, be sure to SET XACT_ABORT ON, or include checks in code to ensure a ROLLBACK happens where required.
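Applied to the question's gift-card schema, the shape would be roughly as below. This is only a sketch, not the article's code: the procedure name, key format, lock timeout, and the assumption that TransactionId is an identity column are all mine.
CREATE PROC dbo.AddGiftCardTransactionWithAppLock
    @GiftCardId int,
    @Amount money
AS
BEGIN
    SET XACT_ABORT ON;
    DECLARE @Key nvarchar(255);
    DECLARE @rc int;
    SET @Key = 'GiftCardTransaction' + CAST(@GiftCardId AS nvarchar(20));

    BEGIN TRAN;
    -- Serialize writers for this one gift card only; other cards are unaffected.
    EXEC @rc = sp_getapplock @Resource = @Key, @LockMode = 'Exclusive', @LockTimeout = 5000;
    IF @rc < 0
    BEGIN
        ROLLBACK TRAN;   -- could not get the lock in time
        RETURN;
    END

    IF (SELECT ISNULL(SUM(Amount), 0)
        FROM dbo.GiftCardTransactions
        WHERE GiftCardId = @GiftCardId) + @Amount
       <= (SELECT OriginalAmount FROM dbo.GiftCards WHERE GiftCardId = @GiftCardId)
        INSERT INTO dbo.GiftCardTransactions (GiftCardId, Amount)  -- assumes TransactionId is an identity column
        VALUES (@GiftCardId, @Amount);

    COMMIT TRAN;   -- the applock (owner = Transaction) is released here
END
The key point is that only spids touching the same GiftCardId queue up behind each other; inserts for other cards proceed untouched.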
My T-SQL is a little rusty, but here is my shot at a solution. The trick is to take an update lock on all transactions for that gift card at the beginning of the transaction, so that as long as all procedures don't read uncommitted data (which is the default behavior), this effectively will lock the transactions of the targeted gift card only.
CREATE PROC dbo.AddGiftCardTransaction
    (@GiftCardID int,
     @TransactionAmount float,
     @id int out)
AS
BEGIN
    BEGIN TRAN
    DECLARE @TotalPriorTransAmount float;
    SELECT @TotalPriorTransAmount = SUM(Amount)
    FROM dbo.GiftCardTransactions WITH (UPDLOCK)
    WHERE GiftCardId = @GiftCardID;
    IF @TotalPriorTransAmount + @TransactionAmount >
       (SELECT TOP 1 OriginalAmount FROM GiftCards WHERE GiftCardID = @GiftCardID)
    BEGIN
        PRINT 'Transaction would exceed GiftCard Value'
        SET @id = null
        ROLLBACK TRAN
        RETURN
    END
    ELSE
    BEGIN
        INSERT INTO dbo.GiftCardTransactions (GiftCardId, Amount)
        VALUES (@GiftCardID, @TransactionAmount);
        SET @id = @@identity
    END
    COMMIT TRAN
END
While this is very explicit, I think it would be more efficient, and more T-SQL friendly to use a rollback statement like:
BEGIN
    BEGIN TRAN
    INSERT INTO dbo.GiftCardTransactions (GiftCardId, Amount)
    VALUES (@GiftCardID, @TransactionAmount);
    IF (SELECT SUM(Amount)
        FROM dbo.GiftCardTransactions WITH (UPDLOCK)
        WHERE GiftCardId = @GiftCardID)
       >
       (SELECT TOP 1 OriginalAmount FROM GiftCards
        WHERE GiftCardID = @GiftCardID)
    BEGIN
        PRINT 'Transaction would exceed GiftCard Value'
        SET @id = null
        ROLLBACK TRAN
    END
    ELSE
    BEGIN
        SET @id = @@identity
        COMMIT TRAN
    END
END

Why is my custom MySQL function so much slower than inlining same in query?

I repeatedly use this SELECT query to read unsigned integers representing IPv4 addresses and present them as human readable dotted quad strings.
SELECT CONCAT_WS('.',
FLOOR(ip/POW(256,3)),
MOD(FLOOR(ip/POW(256,2)), 256),
MOD(FLOOR(ip/256), 256),
MOD(ip, 256))
FROM ips;
With my test data, this query takes 3.6 seconds to execute.
I thought that creating a custom stored function for the int->string conversion would allow for easier to read queries and allow reuse, so I made this:
CREATE FUNCTION IntToIp(value INT UNSIGNED)
RETURNS char(15)
DETERMINISTIC
RETURN CONCAT_WS(
'.',
FLOOR(value/POW(256,3)),
MOD(FLOOR(value/POW(256,2)), 256),
MOD(FLOOR(value/256), 256),
MOD(value, 256)
);
With this function my query looks like this:
SELECT IntToIp(ip) FROM ips;
but with my test data, this takes 13.6 seconds to execute.
I would expect this to be slower on first run, as there is an extra level of indirection involved, but nearly 4 times slower seems excessive. Is this much slowness expected?
I'm using out of the box MySQL server 5.1 on Ubuntu 10.10 with no configuration changes.
To reproduce my test, create a table and populate with 1,221,201 rows:
CREATE TABLE ips (ip INT UNSIGNED NOT NULL);
DELIMITER //
CREATE PROCEDURE AddIps ()
BEGIN
DECLARE i INT UNSIGNED DEFAULT POW(2,32)-1;
WHILE (i>0) DO
INSERT INTO ips (ip) VALUES (i);
SET i = IF(i<3517,0,i-3517);
END WHILE;
END//
DELIMITER ;
CALL AddIps();
Don't reinvent the wheel, use INET_NTOA():
mysql> SELECT INET_NTOA(167773449);
-> '10.0.5.9'
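For the table in the question, that keeps the whole conversion native; a minimal usage sketch:
SELECT INET_NTOA(ip) FROM ips;   -- int to dotted quad, no stored-function call overhead
SELECT INET_ATON('10.0.5.9');    -- and the reverse direction, should you need it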
Using this one you could get better performance:
CREATE FUNCTION IntToIp2(value INT UNSIGNED)
RETURNS char(15)
DETERMINISTIC
RETURN CONCAT_WS(
'.',
(value >> 24),
(value >> 16) & 255,
(value >> 8) & 255,
value & 255
);
> SELECT IntToIp(ip) FROM ips;
1221202 rows in set (18.52 sec)
> SELECT IntToIp2(ip) FROM ips;
1221202 rows in set (10.21 sec)
Launching your original SELECT just after adding your test data took 4.78 secs on my system (a 2 GB MySQL 5.1 instance on a quad core, Fedora, 64-bit).
EDIT: Is this much slowness expected?
Yes, stored procedures and functions are slow, orders of magnitude slower than interpreted/compiled code. They are useful when you need to tie up some database logic that you want to keep out of your application because it is outside its specific domain (e.g. logging or administrative tasks). If a stored function contains no queries, it is always better practice to write a utility function in your chosen language, since that won't prevent reuse (there are no queries to keep close to the data) and it will run much faster.
And that's the reason why, in this particular case, you should use the INET_NTOA function instead, which is available and fulfils your needs, as suggested in sanmai's answer.

Using SQL to determine word count stats of a text field

I've recently been working on some database search functionality and wanted to get some information like the average words per document (e.g. text field in the database). The only thing I have found so far (without processing in language of choice outside the DB) is:
SELECT AVG(LENGTH(content) - LENGTH(REPLACE(content, ' ', '')) + 1)
FROM documents
This seems to work* but do you have other suggestions? I'm currently using MySQL 4 (hope to move to version 5 for this app soon), but am also interested in general solutions.
Thanks!
* I can imagine that this is a pretty rough way to determine this as it does not account for HTML in the content and the like as well. That's OK for this particular project but again are there better ways?
Update: To define what I mean by "better": either more accurate, performs more efficiently, or is more "correct" (easy to maintain, good practice, etc). For the content I have available, the query above is fast enough and is accurate for this project, but I may need something similar in the future (so I asked).
The text handling capabilities of MySQL aren't good enough for what you want. A stored function is an option, but will probably be slow. Your best bet to process the data within MySQL is to add a user defined function. If you're going to build a newer version of MySQL anyway, you could also add a native function.
The "correct" way is to process the data outside the DB since DBs are for storage, not processing, and any heavy processing might put too much of a load on the DBMS. Additionally, calculating the word count outside of MySQL makes it easier to change the definition of what counts as a word. How about storing the word count in the DB and updating it when a document is changed?
Example stored function:
DELIMITER $$
CREATE FUNCTION wordcount(str LONGTEXT)
RETURNS INT
DETERMINISTIC
SQL SECURITY INVOKER
NO SQL
BEGIN
DECLARE wordCnt, idx, maxIdx INT DEFAULT 0;
DECLARE currChar, prevChar BOOL DEFAULT 0;
SET maxIdx=char_length(str);
SET idx = 1;
WHILE idx <= maxIdx DO
SET currChar=SUBSTRING(str, idx, 1) RLIKE '[[:alnum:]]';
IF NOT prevChar AND currChar THEN
SET wordCnt=wordCnt+1;
END IF;
SET prevChar=currChar;
SET idx=idx+1;
END WHILE;
RETURN wordCnt;
END
$$
DELIMITER ;
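Once the function is created, a usage sketch against the documents table from the question:
SELECT AVG(wordcount(content)) AS avg_words_per_document
FROM documents;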
This is quite a bit faster, though just slightly less accurate. I found it 4% light on the count, which is OK for "estimate" scenarios.
SELECT
ROUND (
(
CHAR_LENGTH(content) - CHAR_LENGTH(REPLACE (content, " ", ""))
)
/ CHAR_LENGTH(" ")
) AS count
FROM documents
Simple solution for some similar cases (MySQL):
SELECT *,
(CHAR_LENGTH(student)-CHAR_LENGTH(REPLACE(student,' ','')))+1 as 'count'
FROM documents;
You can use the word_count() UDF from https://github.com/spachev/mysql_udf_bundle. I ported the logic from the accepted answer, with the difference that my code only supports the latin1 charset. The logic would need to be reworked to support other charsets. Also, both implementations always consider a non-alphanumeric character to be a delimiter, which may not always be desirable - for example, "teacher's book" is considered to be three words by both implementations.
The UDF version is, of course, significantly faster. For a quick test I tried both on a dataset from Project Gutenberg consisting of 9751 records totaling about 3 GB. The UDF did all of them in 18 seconds, while the stored function took 63 seconds to process just 30 records (which the UDF does in 0.05 seconds). So the UDF is roughly 1000 times faster in this case.
A UDF will beat, in speed, any other method that does not involve modifying the MySQL source code. This is because it has access to the string bytes in memory and can operate directly on those bytes without them having to be moved around. It is also compiled into machine code and runs directly on the CPU.
Well, I tried to use the function defined above and it was great, except for one scenario.
In English there is heavy use of ' as part of a word. The function above, at least for me, counted "haven't" as 2 words.
So here is my little correction:
DELIMITER $$
CREATE FUNCTION wordcount(str TEXT)
RETURNS INT
DETERMINISTIC
SQL SECURITY INVOKER
NO SQL
BEGIN
DECLARE wordCnt, idx, maxIdx INT DEFAULT 0;
DECLARE currChar, prevChar BOOL DEFAULT 0;
SET maxIdx=CHAR_LENGTH(str);
SET idx=1;
WHILE idx <= maxIdx DO
SET currChar=SUBSTRING(str, idx, 1) RLIKE '[[:alnum:]]' OR SUBSTRING(str, idx, 1) RLIKE "'";
IF NOT prevChar AND currChar THEN
SET wordCnt=wordCnt+1;
END IF;
SET prevChar=currChar;
SET idx=idx+1;
END WHILE;
RETURN wordCnt;
END
$$