I am a beginner in MySQL, looking for expert advice ...
I need to create 4 logging stored procedures:
OPEN_LOG(code, comment)
CLOSE_LOG(result)
BEGIN_ACTION(action, comment)
END_ACTION(result)
to use them in my stored procedures like this:
START TRANSACTION;
CALL open_log('ETL_ABC', 'Running ETL process ABC');
CALL begin_action('Refreshing Dimensions', 'CUSTOMERS, PRODUCTS, STORES');
MERGE INTO customers ...;
MERGE INTO products ...;
MERGE INTO stores ...;
CALL end_action('Done');
CALL begin_action('Loading new SALES data', 'For 01-APR-2018');
INSERT INTO sales ...;
CALL end_action('... rows added');
CALL close_log('Successfully completed');
COMMIT;
The OPEN_LOG procedure should insert 1 row into the PROCESS_LOGS table.
The CLOSE_LOG procedure should update that row with the RESULT and END_DATE_TIME values. The BEGIN_ACTION procedure should insert 1 row into the LOG_DATA table. The END_ACTION procedure should insert yet another row into the same LOG_DATA table. All these inserts and updates should be independent of the main transaction.
There may be many ETL processes running simultaneously in the system and using this logging mechanism. At the same time, support staff can manually query these logging tables to see what is running and at what stage the processes are.
In Oracle, I was able to achieve this by declaring my logging procedures with PRAGMA AUTONOMOUS_TRANSACTION.
In MySQL/MariaDB, I believe my only option is to use the MyISAM storage engine for the logging tables. My concern is concurrency. I estimate that I may need to allow up to 100 log writes per second and up to 10 queries per minute. The queries should be quick because both log tables will be properly indexed.
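For illustration, here is roughly the shape I have in mind for the MyISAM-based approach (table and column names are just placeholders):
-- Rough sketch: a logging table on a non-transactional engine, so its
-- rows are written immediately and survive a ROLLBACK of the caller's
-- main transaction.
CREATE TABLE process_logs (
    log_id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    code            VARCHAR(64)  NOT NULL,
    comment         VARCHAR(255),
    result          VARCHAR(255),
    start_date_time DATETIME     NOT NULL,
    end_date_time   DATETIME,
    KEY idx_code_start (code, start_date_time)
) ENGINE = MyISAM;
DELIMITER //
CREATE PROCEDURE open_log(IN p_code VARCHAR(64), IN p_comment VARCHAR(255))
BEGIN
    INSERT INTO process_logs (code, comment, start_date_time)
    VALUES (p_code, p_comment, NOW());
END//
DELIMITER ;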
Please, give me your opinion/alternatives.
Related
I'm using an Aurora DB (i.e. MySQL version 5.6.10) as a queue, and I'm using a stored procedure to pull records out of a table in batches. The sproc works with the following steps...
Select the next batch of data into a temptable
Write the IDs from the records in the temp table into a log table
Output the records
Once a record has been added to the log, the sproc won't select it again the next time it's called, so multiple servers can call this sproc and each deal with batches of data from the queue without stepping on each other's toes.
The sproc runs in a fraction of a second, but my company is now spinning up servers automatically, and these cloned servers are calling the sproc at exactly the same time. The result is that the same records are being selected twice.
Is there a way I can limit this sproc to one call at a time? Ideally, any additional calls should wait until the first call is finished, and then they can run.
Unfortunately, I have very little experience working with MySQL, so I'm not really sure where to start. I'd much appreciate it if anyone could point me in the right direction.
This is a job for MySQL table locking. Try something like this. (You didn't show us your queries so there's a lot of guesswork here.)
SET autocommit = 0;
-- Every base table used while LOCK TABLES is in effect must appear in
-- the lock list; temporary tables are exempt.
LOCK TABLES logtable WRITE, whatevertable WRITE;
CREATE TEMPORARY TABLE temptable AS
  SELECT whatever FROM whatevertable FOR UPDATE;
INSERT INTO logtable (id)
  SELECT id FROM temptable;
COMMIT;
UNLOCK TABLES;
If more than one connection tries to run this sequence concurrently, one will wait for the other to issue UNLOCK TABLES before proceeding. You say your SP is quick, so probably nobody will notice the short wait.
Pro tip: When you have the same timed code running on lots of servers, it's best to put in a short random delay before running the job. That way the shared resources (like your MySQL database) won't get hammered by a whole lot of requests precisely timed to be simultaneous.
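If the jitter is easiest to add on the database side, a trivial sketch (the procedure name is made up):
DO SLEEP(RAND() * 5);      -- wait a random 0-5 seconds so cloned servers don't start in lockstep
CALL pull_next_batch();    -- hypothetical name for the batching sproc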
My issue currently is as follows:
I have Table A, which copies items to a transfer table whenever an Update, Insert or Delete transaction occurs on Table A, i.e.
Table A -> new insert
Trigger activates and inserts a row into the Transfer table with 2 other columns - DateQueried and QueryType (where DateQueried is the date the trigger fired and QueryType is 'Delete', 'Insert' or 'Update', depending on the trigger type)
However, now I need to transfer this data to a web server via a linked table (all of this is fine and working as it should). Currently I have a PowerShell script to do this. The script does the following:
Select all values from my transfer table, ordered by DateQueried
Run a foreach loop that runs a stored procedure to either insert, update or delete that value on the web server, depending on the value of QueryType.
This method is extremely slow. The script runs on a 30-minute timer, and we can have a situation where we receive over 100,000 rows within that 30-minute window, which means 100,000 connections to the DB via the PowerShell script (especially when there are 15 tables that need to run through this process).
Is there a better way to get these values out by running an inner join? Previously I would just run the entire table at once through a stored procedure that would delete all values from the second server with a QueryType of delete, then run inserts, then updates. However, this had issues if a person were to create a new job, then delete the job, then recreate the job, then update the job, then delete the job again. My stored procedure would process all deletes, THEN all inserts, THEN all updates, so even though the row had been deleted, it would go and insert it back again.
I then rejigged it yet again and instead transferred only primary keys across. Whenever I ran the stored procedure it would process deletes based on primary keys, and for inserts and updates it would first join to the original table on the primary keys (which, if the row had previously been deleted, would return no results and therefore not insert). But I ran into a problem where the query was chewing up far too many resources and at times bombing out the server (it had to join over 100,000 rows to a table that has over 10 million rows). There was also another issue where it would insert a row with only NULL values for each column when the join found no match; the next time it happened there would be a primary key error and the stored procedure would stop.
Is there another option I am overlooking that would make this process a great deal faster, or am I just stuck with the limitations of the server and maybe have to suggest that the company only processes uploads at the end of every day rather than on the 30-minute schedule they would like?
Stick with the bulk Delete/Insert/Update order.
But:
Only insert rows where a later delete is not present (all those changes are lost anyway).
Only process updates where a later insert or delete is not present (all those changes would be overwritten).
A sketch of both filters follows.
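Assuming the transfer table is called transfer_table and carries a RecordID key column alongside the QueryType and DateQueried columns from your trigger (the table and key names are assumptions), the two filters might look like this:
-- Rows to actually insert: skip any row that has a later 'Delete'
-- entry for the same record.
SELECT t.*
FROM   transfer_table t
WHERE  t.QueryType = 'Insert'
  AND NOT EXISTS (
        SELECT 1
        FROM   transfer_table d
        WHERE  d.RecordID    = t.RecordID
          AND  d.QueryType   = 'Delete'
          AND  d.DateQueried > t.DateQueried
      );
-- Rows to actually update: skip any row followed by a later
-- 'Insert' or 'Delete' entry for the same record.
SELECT t.*
FROM   transfer_table t
WHERE  t.QueryType = 'Update'
  AND NOT EXISTS (
        SELECT 1
        FROM   transfer_table x
        WHERE  x.RecordID    = t.RecordID
          AND  x.QueryType  IN ('Insert', 'Delete')
          AND  x.DateQueried > t.DateQueried
      );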
Perhaps the title is a little misleading, so I'll explain my question in further detail. Obviously the queries inside the procedure are executed synchronously and in order, but are procedures themselves executed synchronously?
Let's say I have a procedure called "Register" which handles a couple of queries; for example, it looks like this:
BEGIN
DECLARE account_inserted INT(11);
INSERT INTO accounts (...) VALUES (...);
SET account_inserted = LAST_INSERT_ID(); # <------
QUERY using account_inserted...
QUERY using account_inserted...
QUERY using account_inserted...
...
END
Now let's say that there were numerous requests to register coming in at the same time (for example's sake, say around 200-300 requests): would it execute all of the procedures in order? Or is it possible for the LAST_INSERT_ID() value to conflict with a row inserted from another procedure that is being executed in parallel?
You're muddling three things:
Whether MySQL executes procedures synchronously
This could be interpreted to mean either "does MySQL wait for each command within a procedure to complete before moving on to the next?" or "does MySQL wait for the entire procedure to complete before accepting further instructions from the connection that invoked the CALL command?". In both cases, the answer is "yes, it does".
Whether invocations of MySQL procedures are executed atomically
As with any other series of commands, commands within procedures are only atomic if performed within a transaction on tables that use a transactional storage engine. Thus a different connection may well execute another INSERT between the INSERT in your procedure and the command that follows.
Whether LAST_INSERT_ID() is guaranteed to return the value generated by the immediately preceding INSERT command in the procedure
Yes, it is. The most recent insertion ID is maintained on a per-connection basis, and as described above the connection waits for CALL to complete before further commands are accepted.
The ID that was generated is maintained in the server on a per-connection basis. This means that the value returned by the function to a given client is the first AUTO_INCREMENT value generated for the most recent statement affecting an AUTO_INCREMENT column by that client. This value cannot be affected by other clients, even if they generate AUTO_INCREMENT values of their own. This behavior ensures that each client can retrieve its own ID without concern for the activity of other clients, and without the need for locks or transactions.
https://dev.mysql.com/doc/refman/5.5/en/information-functions.html#function_last-insert-id
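To make the whole series of statements atomic (point 2 above), the procedure body can be wrapped in a transaction. A minimal sketch, assuming InnoDB tables and a made-up accounts(email) schema:
DELIMITER //
CREATE PROCEDURE Register(IN p_email VARCHAR(255))    -- hypothetical parameter
BEGIN
    DECLARE account_inserted INT;
    START TRANSACTION;
    INSERT INTO accounts (email) VALUES (p_email);    -- hypothetical column
    -- Per-connection value: not affected by concurrent inserts
    -- from other connections.
    SET account_inserted = LAST_INSERT_ID();
    -- ... further statements using account_inserted ...
    COMMIT;
END//
DELIMITER ;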
I have an SP with a set of 3 temp tables created within the SP, with no indexes. All three are inserted into using INSERT INTO ... WITH (TABLOCK). The database recovery model is SIMPLE for the user DB as well as for tempDB.
This SP generates and inserts new data, and a transaction commit/rollback is good enough to maintain data integrity, so I want it to do minimal logging, which I think I have enabled by using the TABLOCK hint.
I am checking the log generation before and after execution of the SP using the query below and see no difference in log generation after adding the TABLOCK hint. (I am checking in tempDB, as the tables are temp tables.)
SELECT COUNT(1) AS NumLogEntries
     , SUM([Log Record Length]) AS TotalLengthWritten
FROM fn_dblog(NULL, NULL);
Is there anything else I need to do in order to enable minimal logging?
PS: I am able to see reduced logging if I use this hint to do the same INSERT INTO separately in Management Studio, but not if I do the same within the SP.
I have also tried turning trace flag 610 ON before the insert statement, but to no effect.
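For reference, the insert pattern described above looks roughly like this (the temp-table and column names are made up):
-- T-SQL sketch of the pattern in question: a heap temp table with no
-- indexes, loaded with INSERT ... WITH (TABLOCK).
CREATE TABLE #staging
(
    id      INT          NOT NULL,
    payload VARCHAR(100) NULL
);

INSERT INTO #staging WITH (TABLOCK) (id, payload)
SELECT s.id, s.payload
FROM dbo.SourceTable AS s;    -- hypothetical source table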
I want to insert or update, then insert a new log row into another table.
I'm running a nifty little query to pull information from a staging table into other tables, something like:
INSERT INTO ...
SELECT ...
ON DUPLICATE KEY UPDATE ...
What I'd like to do, without PHP or triggers (the lead dev doesn't like 'em, and I'm not that familiar with them either), is insert a new record into a logging table. This is needed for reporting on what data was updated or inserted, and on what table.
Any hints or examples?
Note: I was doing this with PHP just fine, although it was taking about 4 hours to process 50K rows. Using the Laravel PHP framework, looping over each entry in staging and updating 4 other tables with the data, plus a log for each one, came to 8 queries per row (this was using Laravel models, not raw SQL). I was able to optimise by pushing logs into an array and batch processing. But you can't beat a 15-second processing time in MySQL by bypassing all that throughput. Now I'm hooked on doing awesome things the SQL way.
If you need to execute more than one query statement, I prefer to use a transaction rather than a trigger to guarantee atomicity (part of ACID). Below is a sample MySQL transaction:
START TRANSACTION;
UPDATE ...;
INSERT ...;
DELETE ...;
-- other query statements
COMMIT;
Statements inside the transaction will be executed all or nothing.
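For example, a rough sketch that combines the upsert with a log insert in one transaction (target_table, staging_table and import_log are made-up names):
START TRANSACTION;

INSERT INTO target_table (id, name)              -- hypothetical table and columns
SELECT id, name
FROM staging_table
ON DUPLICATE KEY UPDATE name = VALUES(name);

-- ROW_COUNT() after INSERT ... ON DUPLICATE KEY UPDATE counts 1 for
-- each inserted row and 2 for each updated row.
INSERT INTO import_log (table_name, affected_rows, logged_at)
VALUES ('target_table', ROW_COUNT(), NOW());

COMMIT;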
If you want to do two things (insert the base row and insert a log row), you'll need two statements. The second can (and should) be a trigger.
It would be better to use a trigger; triggers are often used for logging purposes.
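For example, a minimal AFTER INSERT trigger writing to a hypothetical change_log table (all names here are placeholders):
DELIMITER //
CREATE TRIGGER target_table_ai
AFTER INSERT ON target_table
FOR EACH ROW
BEGIN
    -- One log row per inserted row.
    INSERT INTO change_log (table_name, action, row_id, changed_at)
    VALUES ('target_table', 'insert', NEW.id, NOW());
END//
DELIMITER ;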