What I'm trying to do is write a stored procedure that will query a view, process each row, and make one or more inserts into a table for each row pulled from the view. Everything seems fine, except that, arbitrarily, partway through the process, the server seems to hang on the insert command. I have no idea if there's some memory limit on cursor result sets, or what else could be happening. Relevant parts of the SP and a few clarifying comments are posted below.
CREATE PROCEDURE `Cache_Network_Observations` ()
BEGIN
-- Declare all variables
/* This cursor is hitting the view which should be returning a number of rows on the scale of ~5M+ records
*/
DECLARE cursor1 CURSOR FOR
SELECT * FROM usanpn2.vw_Network_Observation;
CREATE TABLE Cached_Network_Observation_Temp (observation_id int, name varchar(100), id int);
OPEN cursor1;
load_loop: loop
FETCH cursor1 INTO observation_id, id1, name1, id2, name2, id3, name3, gid1, gname1, gid2, gname2, gid3, gname3;
IF id1 IS NOT NULL THEN
INSERT INTO usanpn2.Cached_Network_Observation_Temp values (observation_id, name1, id1);
END IF;
-- some additional logic here, essentially just the same as the above if statement
END LOOP;
CLOSE cursor1;
END
That being the SP, when I actually run it, everything starts off without a hitch, but then the process just runs and runs and runs. Taking a look at the active query report, I am seeing this:
| 1076 | root | localhost | mydb | Query | 3253 | update | INSERT INTO usanpn2.Cached_Network_Observation values ( NAME_CONST('observation_id',2137912), NAME_ |
Not positive where the NAME_CONST function is coming from or what it has to do with anything. I've tried this multiple times; the observation_id variable / row in the view varies each time, so it doesn't seem to be tied to any particular record.
TIA!
I don't see a NOT FOUND handler for your fetch loop. There's no "exit" condition.
DECLARE done INT DEFAULT FALSE;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
Immediately following the fetch, test the done flag, and exit the loop when it's true.
IF done THEN
LEAVE load_loop;
END IF;
Without that, I think you have yourself a classic infinite loop.
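Putting those pieces together, a minimal sketch of the corrected loop; your variable DECLAREs still come first, then the cursor, then the handler, since MySQL requires declarations in that order:
-- ... DECLARE observation_id, id1, name1, etc. here, as in your "Declare all variables" section ...
DECLARE done INT DEFAULT FALSE;
DECLARE cursor1 CURSOR FOR
    SELECT * FROM usanpn2.vw_Network_Observation;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

OPEN cursor1;
load_loop: LOOP
    FETCH cursor1 INTO observation_id, id1, name1, id2, name2, id3, name3, gid1, gname1, gid2, gname2, gid3, gname3;
    IF done THEN
        LEAVE load_loop;
    END IF;
    IF id1 IS NOT NULL THEN
        INSERT INTO usanpn2.Cached_Network_Observation_Temp VALUES (observation_id, name1, id1);
    END IF;
END LOOP;
CLOSE cursor1;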
The statement shown in the SHOW FULL PROCESSLIST output is inserting to a different table. (There's no _Temp at the end of the tablename.)
But why on earth do you need a cursor loop to process this row-by-agonizing-row?
If you need a table loaded, just load the flipping table, and be done with it.
Replace all of that "declare cursor", "open cursor", fetch loop, exit handler, individual insert statement nonsense with a single statement that does what you need done:
INSERT INTO Cached_Network_Observation_Temp (observation_id, `name`, id)
SELECT s.observation_id, s.name1 AS `name`, s.id1 AS id
FROM usanpn2.vw_Network_Observation s
WHERE s.id1 IS NOT NULL
That is going to be way more efficient, and it won't clog up the binary logs with a bloatload of unnecessary INSERT statements. (This also has me wanting to back up to the bigger picture and understand why this table is even needed. And since vw_Network_Observation is a view, I'm wondering whether the overhead of materializing a derived table is warranted, because the predicate in that outer query isn't getting pushed down into the view definition. MySQL processes views much differently than other RDBMSs do.)
EDIT
If the next part of the procedure that is commented out is checking whether id2 is not null to conditionally insert id2,name2 to the _Temp table, that can be done in the same way.
Or, the multiple queries can be combined with the UNION ALL operator.
INSERT INTO Cached_Network_Observation_Temp (observation_id, `name`, id)
SELECT s1.observation_id, s1.name1 AS `name`, s1.id1 AS id
FROM usanpn2.vw_Network_Observation s1
WHERE s1.id1 IS NOT NULL
UNION ALL
SELECT s2.observation_id, s2.name2 AS `name`, s2.id2 AS id
FROM usanpn2.vw_Network_Observation s2
WHERE s2.id2 IS NOT NULL
... etc.
FOLLOWUP
If we need to generate multiple rows out of a single row, and the number of rows isn't unreasonably large, I'd be tempted to test something like this, processing id1, id2, id3, and id4 in one fell swoop, using a CROSS JOIN of the row source (s) and an artificially generated set of four rows.
That would generate four rows per row from the row source (s), and we can use conditional expressions to return id1, id2, etc.
As an example, something like this:
SELECT s.observation_id
, CASE n.i
WHEN 1 THEN s.id1
WHEN 2 THEN s.id2
WHEN 3 THEN s.id3
WHEN 4 THEN s.id4
END AS `id`
, CASE n.i
WHEN 1 THEN s.name1
WHEN 2 THEN s.name2
WHEN 3 THEN s.name3
WHEN 4 THEN s.name4
END AS `name`
FROM usanpn2.vw_Network_Observation s
CROSS
JOIN ( SELECT 1 AS i UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4) n
HAVING `id` IS NOT NULL
We use a predicate in the HAVING clause rather than the WHERE clause because the value generated for the id column in the resultset isn't available when the rows are accessed. The predicates in the HAVING clause are applied nearly last in the execution plan, after the rows are accessed, just before the rows are returned. (I think a "filesort" operation to satisfy an ORDER BY, and the LIMIT clause are applied after the HAVING.)
If the number of rows to be processed is "very large", then we may get better performance processing rows in several reasonably sized batches. If we do a batch size of two, processing two rows per INSERT, that effectively halves the number of INSERTs we need to run. With 4 rows per batch, we cut that in half again. Once we are up to a couple of dozen rows per batch, we've significantly reduced the number of individual INSERT statements we need to run.
As the batches get progressively larger, our performance gains become much smaller. Until the batches become unwieldy ("too large") and we start thrashing to disk. There's a performance "sweet spot" in there between the two extremes (processing one row at a time vs processing ALL of the rows in one batch).
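Purely as an illustration of the multi-row idea, a single INSERT carrying several rows might look like the following; the values here are made up:
INSERT INTO Cached_Network_Observation_Temp (observation_id, `name`, id)
VALUES (1001, 'network A', 11)
     , (1002, 'network B', 12)
     , (1003, 'network C', 13)
     , (1004, 'network D', 14);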
Related
I have access to a reporting dataset (that I don't control) that we retrieve daily from a cloud service and store in a MySQL db so we can run advanced reporting and combine reports locally with 3rd-party data visualization software.
The data often has duplicate values on an id field that create problems when joining with other tables for data analysis.
For example:
+-------------+----------+------------+----------+
| workfile_id | zip_code | date | total |
+-------------+----------+------------+----------+
| 78002 | 90210 | 2016-11-11 | 2010.023 |
| 78002 | 90210 | 2016-12-22 | 427.132 |
+-------------+----------+------------+----------+
Workfile_id is duplicated because this is the same job, but additional work on the job was performed in a different month than the original work. Instead of the software creating another workfile_id for the job, the same one is reused.
Doing joins with other tables on workfile_id is problematic when more than one of the same id is present, so I was wondering if it is possible to do one of two things:
Make duplicate workfile_ids unique. Have SQL append a number to the workfile_id when a duplicate is found. The first duplicate (i.e. the second occurrence of the same workfile_id) would get .01 appended to the end of the workfile_id. Then, if another duplicate is inserted later, the appended number would auto-increment to .02, and so on for any subsequent duplicate workfile_id. This method would work best with our data, but I'm curious how demanding it would be for the server from a performance perspective. If I could schedule the alteration to take place after the data is inserted, to speed up the initial insert, that would be ideal.
Sum the total columns and remove the duplicate workfile_id row. Have a task that identifies duplicate workfile_ids and sums the financial columns of the duplicates, replacing the original total with the new sum and deleting the 'new' row after the columns have been added together.
This is messier from a data preservation perspective, but it is acceptable if the first solution isn't possible.
My assumption is that there will be significant overhead in having the server compare each new workfile_id value to all existing workfile_id values every time data is inserted, but our dataset is small and new data is only inserted once daily, at 1:30am. It should also be feasible to limit the duplicate workfile_id search to rows inserted within the last 6 months.
Is finding duplicates in a column (workfile_id) and appending an auto-incrementing value onto the workfile_id possible?
EDIT:
I'm having trouble getting my trigger to work based on sdsc81's answer below.
Any ideas?
DELIMITER //
CREATE TRIGGER append_subID_to_workfile_ID_salesjournal
AFTER INSERT
ON salesjournal FOR EACH ROW
BEGIN
    SET @COUNTER = ( SELECT (COUNT(*)-1) FROM salesjournal WHERE workfile_id = NEW.workfile_id );
    IF @COUNTER > 1 THEN
        UPDATE salesjournal SET workfile_id = CONCAT(workfile_id, @COUNTER) WHERE id = NEW.id;
    END IF;
END;//
DELIMITER ;
It's hard to know if the trigger isn't working at all, or if just the code in the trigger isn't working. I get no errors on insert. Is there any way to debug trigger errors?
Well, everything is possible ;)
You don't control the dataset, but you can modify the database, right?
Then you could use a trigger after every insert of a new value, and update it if it's a duplicate. Something like:
SET @COUNTER = ( SELECT (COUNT(*)-1) FROM *your_table* WHERE workfile_id = NEW.workfile_id );
IF @COUNTER > 1 THEN
    UPDATE *your_table* SET workfile_id = CONCAT(workfile_id, @COUNTER) WHERE some_unique_id = NEW.some_unique_id;
END IF;
If there is only one insert a day, and an index is defined on the workfile_id column, this shouldn't be any problem for your server at all.
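If that index doesn't exist yet, creating it is a one-liner (table name taken from your trigger; the index name is just illustrative):
CREATE INDEX idx_salesjournal_workfile_id ON salesjournal (workfile_id);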
Also, you could implement the second solution, doing:
DELIMITER //
CREATE TRIGGER append_subID_to_workfile_ID_salesjournal
AFTER INSERT ON salesjournal FOR EACH ROW
BEGIN
    SET @COUNTER = ( SELECT (COUNT(*)-1) FROM salesjournal WHERE workfile_id = NEW.workfile_id );
    IF @COUNTER > 1 THEN
        UPDATE salesjournal SET total = total + NEW.total WHERE workfile_id = NEW.workfile_id AND id <> NEW.id;
        DELETE FROM salesjournal WHERE id = NEW.id;
    END IF;
END;//
DELIMITER ;
Hope this helps.
I am able to execute my stored procedure, but when I execute it a second time, instead of updating the existing values, the same values from the source are inserted as new rows.
i.e. my target has
1
2
3
When I run the stored procedure a second time, instead of updating 1, 2, 3, it inserts the same values again:
1
2
3
1
2
3
My WHEN MATCHED condition using SELECT S.REPORT_TEST1 EXCEPT SELECT T.REPORT_TEST1 is not working.
When I use the same code on a different table that doesn't have data conversions, I am able to update.
Can anyone tell me where I am going wrong?
CREATE PROCEDURE [dbo].[Merge]
INSERT INTO .[dbo].[TARGET](REPORT_TEST1, REPORT_TEST2, REPOST_TEST3)
FROM (MERGE [dbo].[TARGET] T
USING (SELECT
Cast([REPORT TEST1] as int) [REPORT_TEST1],
Cast([REPORT TEST2] as int) [REPORT_TEST2],
Cast([REPORT TEST3] as int) [REPORT_TEST3]
FROM
[dbo].[SOURCE]) S ON (T.[REPORT_TEST1] = S.[REPORT_TEST1])
WHEN NOT MATCHED BY TARGET
THEN INSERT
VALUES (S.REPORT_TEST1, S.REPORT_TEST2, S.REPOST_TEST3)
WHEN MATCHED
AND EXISTS (SELECT S.REPORT_TEST1, S.REPORT_TEST2, S.REPOST_TEST3
EXCEPT
SELECT T.REPORT_TEST1, T.REPORT_TEST2, T.REPOST_TEST3)
OUTPUT $ACTION ACTION_OUT,
S.REPORT_TEST1, S.REPORT_TEST2, S.REPOST_TEST3) ;
Thanks
Would it not suffice to rewrite your WHEN MATCHED clause thusly:
WHEN MATCHED
    AND S.REPORT_TEST2 <> T.REPORT_TEST2
    AND S.REPORT_TEST3 <> T.REPORT_TEST3
I think I understand what you're trying to do, but inside the MERGE context, you're only comparing this row with that row, not the source row against the whole target table. You could modify the subselect thusly if you're trying to query "this source row is not at all in the target":
WHEN MATCHED AND EXISTS
(
SELECT
S.REPORT_TEST1
,S.REPORT_TEST2
,S.REPOST_TEST3
EXCEPT SELECT
T2.REPORT_TEST1
,T2.REPORT_TEST2
,T2.REPOST_TEST3
FROM
[dbo].[TARGET] T2
)
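For reference, a minimal sketch of how the whole MERGE might be assembled on its own (without the outer INSERT the question wraps it in). It keeps the question's column names, aliases the third source column as REPOST_TEST3 to match the target's spelling, and adds the THEN UPDATE action a WHEN MATCHED branch needs:
MERGE [dbo].[TARGET] T
USING ( SELECT CAST([REPORT TEST1] AS int) AS REPORT_TEST1
             , CAST([REPORT TEST2] AS int) AS REPORT_TEST2
             , CAST([REPORT TEST3] AS int) AS REPOST_TEST3
        FROM [dbo].[SOURCE] ) S
   ON T.REPORT_TEST1 = S.REPORT_TEST1
WHEN NOT MATCHED BY TARGET THEN
    INSERT (REPORT_TEST1, REPORT_TEST2, REPOST_TEST3)
    VALUES (S.REPORT_TEST1, S.REPORT_TEST2, S.REPOST_TEST3)
WHEN MATCHED AND EXISTS ( SELECT S.REPORT_TEST2, S.REPOST_TEST3
                          EXCEPT
                          SELECT T.REPORT_TEST2, T.REPOST_TEST3 ) THEN
    UPDATE SET T.REPORT_TEST2 = S.REPORT_TEST2
             , T.REPOST_TEST3 = S.REPOST_TEST3
OUTPUT $action AS ACTION_OUT, S.REPORT_TEST1, S.REPORT_TEST2, S.REPOST_TEST3;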
In DB2, I need to do a SELECT FROM UPDATE, to put an update + select in a single transaction.
But I need to make sure to update only one record per transaction.
I'm familiar with the LIMIT clause from MySQL's UPDATE statement, which:
places a limit on the number of rows that can be updated
I looked for something similar in DB2's UPDATE reference but without success.
How can something similar be achieved in DB2?
Edit: In my scenario, I have to deliver 1000 coupon codes upon request. I just need to select (any)one that has not been given yet.
The question uses some ambiguous terminology that makes it unclear what needs to be accomplished. Fortunately, DB2 offers robust support for a variety of SQL patterns.
To limit the number of rows that are modified by an UPDATE:
UPDATE
( SELECT t.column1 FROM someschema.sometable t WHERE ... FETCH FIRST ROW ONLY
)
SET column1 = 'newvalue';
The UPDATE statement never sees the base table, just the expression that filters it, so you can control which rows are updated.
To INSERT a limited number of new rows:
INSERT INTO mktg.offeredcoupons( cust_id, coupon_id, offered_on, expires_on )
SELECT c.cust_id, 1234, CURRENT TIMESTAMP, CURRENT TIMESTAMP + 30 DAYS
FROM mktg.customers c
LEFT OUTER JOIN mktg.offeredcoupons o
    ON o.cust_id = c.cust_id
WHERE ....
    AND o.cust_id IS NULL
FETCH FIRST 1000 ROWS ONLY;
This is how DB2 supports SELECT from an UPDATE, INSERT, or DELETE statement:
SELECT column1 FROM NEW TABLE (
UPDATE ( SELECT column1 FROM someschema.sometable
WHERE ... FETCH FIRST ROW ONLY
)
SET column1 = 'newvalue'
) AS x;
The SELECT will return data from only the modified rows.
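Tying that back to the coupon scenario in the question's edit, a sketch along the same lines could claim one not-yet-delivered code and return it in a single statement; the promo.coupon_codes table and its delivered flag here are made-up names:
SELECT coupon_code
FROM NEW TABLE (
    UPDATE ( SELECT coupon_code, delivered
             FROM promo.coupon_codes
             WHERE delivered = 'N'
             FETCH FIRST ROW ONLY
           )
    SET delivered = 'Y'
) AS claimed;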
You have two options. As noted by A Horse With No Name, you can use the primary key of the table to ensure that one row is updated at a time.
The alternative, if you're using a programming language and have control over cursors, is to use a cursor declared with the FOR UPDATE option (though that may be optional; IIRC, cursors are FOR UPDATE by default when the underlying SELECT allows it), and then use an UPDATE statement with WHERE CURRENT OF <cursor-name>. This updates the one row the cursor is currently positioned on. The details of the syntax vary with the language you're using, but the raw SQL looks like:
DECLARE cursor_name CURSOR FOR
SELECT *
FROM SomeTable
WHERE PKCol1 = ? AND PKCol2 = ?
FOR UPDATE;
UPDATE SomeTable
SET ...
WHERE CURRENT OF cursor_name;
If you can't write DECLARE in your host language, you have to do manual bashing to find the equivalent mechanism.
First, here's the concise summary of the question:
Is it possible to run an INSERT statement conditionally?
Something akin to this:
IF(expression) INSERT...
Now, I know I can do this with a stored procedure.
My question is: can I do this in my query?
Now, why would I want to do that?
Let's assume we have the following 2 tables:
products: id, qty_on_hand
orders: id, product_id, qty
Now, let's say an order for 20 Voodoo Dolls (product id 2) comes in.
We first check if there's enough Quantity On Hand:
SELECT IF(
( SELECT SUM(qty) FROM orders WHERE product_id = 2 ) + 20
<=
( SELECT qty_on_hand FROM products WHERE id = 2)
, 'true', 'false');
Then, if it evaluates to true, we run an INSERT query.
So far so good.
However, there's a problem with concurrency.
If 2 orders come in at the exact same time, they might both read the quantity-on-hand before any one of them has entered the order.
They'll then both place the order, thus exceeding the qty_on_hand.
So, back to the root of the question:
Is it possible to run an INSERT statement conditionally, so that we can combine both these queries into one?
I searched around a lot, and the only type of conditional INSERT statement that I could find was ON DUPLICATE KEY, which obviously does not apply here.
INSERT INTO your_table
SELECT value_for_column1, value_for_column2, ...
FROM wherever
WHERE your_special_condition
If no rows are returned from the select (because your special condition is false) no insert happens.
Using your schema from question (assuming your id column is auto_increment):
insert into orders (product_id, qty)
select 2, 20 from dual
where (SELECT qty_on_hand FROM products WHERE id = 2) >= 20;
This will insert no rows if there's not enough stock on hand, otherwise it will create the order row.
Nice idea btw!
Try:
INSERT INTO orders(product_id, qty)
SELECT 2, 20 FROM products WHERE id = 2 AND qty_on_hand >= 20
If a product with id equal to 2 exists and its qty_on_hand is greater than or equal to 20, then an insert will occur with the values product_id = 2 and qty = 20. Otherwise, no insert will occur.
Note: If your product ids are not unique, you might want to add a LIMIT clause at the end of the SELECT statement.
Not sure about concurrency (you'll need to read up on locking in MySQL), but this will let you be sure that you only take 20 items if 20 items are available:
update products
set qty_on_hand = qty_on_hand - 20
where qty_on_hand >= 20
and id=2
You can then check how many rows were affected. If none were affected, you did not have enough stock. If 1 row was affected, you have effectively consumed the stock.
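In MySQL, a rough sketch of doing that check from SQL alone might use ROW_COUNT() right after the UPDATE; @stock_taken is just an illustrative session variable, and in application code you'd normally read the affected-row count from your driver instead:
UPDATE products
SET qty_on_hand = qty_on_hand - 20
WHERE qty_on_hand >= 20
  AND id = 2;

-- ROW_COUNT() reports how many rows the preceding UPDATE changed
SET @stock_taken = ROW_COUNT();

-- Only record the order if the stock was actually decremented
INSERT INTO orders (product_id, qty)
SELECT 2, 20 FROM dual WHERE @stock_taken = 1;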
You're probably solving the problem the wrong way.
If you're afraid two read-operations will occur at the same time and thus one will work with stale data, the solution is to use locks or transactions.
Have the query do this:
lock table for read
read table
update table
release lock
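With InnoDB, a minimal sketch of that sequence, using the products and orders tables from the question (the sufficiency check on the value read back would live in your application code):
START TRANSACTION;

-- Lock the product row so concurrent orders serialize on it
SELECT qty_on_hand
FROM products
WHERE id = 2
FOR UPDATE;

-- If the value read back is sufficient, record the order and decrement stock
INSERT INTO orders (product_id, qty) VALUES (2, 20);
UPDATE products SET qty_on_hand = qty_on_hand - 20 WHERE id = 2;

COMMIT;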
I wanted to insert into a table using VALUES, so I found this solution for inserting the values conditionally with an IF condition:
DELIMITER $$
CREATE PROCEDURE insertIssue()
BEGIN
IF (1 NOT IN (select I.issue_number from issue as I where I.series_id = 1)) THEN
INSERT IGNORE INTO issue ( issue_number, month_published, year_published, series_id, mcs_issue_id) VALUES (1, 1, 1990, 1, 1);
END IF;
END$$
DELIMITER ;
If you later on want to call the procedure it's as simple as
CALL insertIssue()
You can find more information about PROCEDUREs and IF conditions on this site.
I have a very poorly written query in SQL Server 2008:
UPDATE PatientChartImages
SET PatientChartImages.IsLockDown = @IsLockdown
WHERE PatientChartImages.IsLockDown = @IsNotLockdown
  AND PatientChartId IN (
      SELECT PatientCharts.PatientChartId
      FROM PatientCharts
      WHERE ( PatientCharts.ChartStatusID = @ChartCompletedStatusID
              OR PatientCharts.ChartStatusID = @ChartOnBaseStatusID
            )
        AND PatientCharts.IsLockDown = @IsNotLockdown
        AND PatientCharts.CompletedOn IS NOT NULL
        AND DATEDIFF(MINUTE, PatientCharts.CompletedOn, GETUTCDATE()) >= ( SELECT tf.LockUpInterval
                                                                           FROM @tblFacCOnf tf
                                                                           WHERE tf.facilityId = PatientCharts.FacilityId ) )
This query locks the main table and results in a timeout. If I first create a CTE of all the updatable records and then update the main table by joining to the CTE, will it help?
The first thing I'd advise is to replace the IN condition with EXISTS. The second is to move all of this conditional logic into a CTE. The third is to replace the sub-select against @tblFacCOnf with a join.
The last piece of advice depends on your business logic and is not so important, in my opinion.
So in the end you will get something like this:
WITH search_cte AS (
    SELECT PatientCharts.PatientChartId
    FROM PatientCharts
    JOIN @tblFacCOnf tf ON tf.facilityId = PatientCharts.FacilityId
    WHERE ( PatientCharts.ChartStatusID = @ChartCompletedStatusID
            OR PatientCharts.ChartStatusID = @ChartOnBaseStatusID
          )
      AND PatientCharts.IsLockDown = @IsNotLockdown
      AND PatientCharts.CompletedOn IS NOT NULL
      AND DATEDIFF(MINUTE, PatientCharts.CompletedOn, GETUTCDATE()) >= tf.LockUpInterval
) --cte end
UPDATE PatientChartImages
SET PatientChartImages.IsLockDown = @IsLockdown
WHERE PatientChartImages.IsLockDown = @IsNotLockdown
  AND EXISTS ( SELECT 1 FROM search_cte
               WHERE search_cte.PatientChartId = PatientChartImages.PatientChartId )
One additional thing I might suggest, if the other suggestions don't get you enough speed, is not to use a table variable. Temp tables are often faster for large data sets and can be indexed if need be.
The update lock is held for the time it takes to compute the CTE plus the time for the update itself. The CTE time is probably causing the timeout.
To reduce the lock time to the minimum required to update the target table, I suggest you create a temp table with two columns: Col1 is the primary key or clustering key of the target table, and Col2 is the value you want in the target table. Within one transaction, create the temp table and fill it with values according to your business logic. Then, in a separate transaction, update the target table by joining to the temp table and taking the value from it. After the update, drop the temp table.
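A minimal sketch of that pattern for this case, reusing the variables and filter from the query above; the #ChartsToLock name is just illustrative:
-- Transaction 1: build the work list (filter copied from the CTE above)
BEGIN TRANSACTION;
CREATE TABLE #ChartsToLock ( PatientChartId INT PRIMARY KEY );
INSERT INTO #ChartsToLock ( PatientChartId )
SELECT pc.PatientChartId
FROM PatientCharts pc
JOIN @tblFacCOnf tf ON tf.facilityId = pc.FacilityId
WHERE ( pc.ChartStatusID = @ChartCompletedStatusID OR pc.ChartStatusID = @ChartOnBaseStatusID )
  AND pc.IsLockDown = @IsNotLockdown
  AND pc.CompletedOn IS NOT NULL
  AND DATEDIFF(MINUTE, pc.CompletedOn, GETUTCDATE()) >= tf.LockUpInterval;
COMMIT;

-- Transaction 2: the actual update, joined to the temp table
BEGIN TRANSACTION;
UPDATE pci
SET pci.IsLockDown = @IsLockdown
FROM PatientChartImages pci
JOIN #ChartsToLock ctl ON ctl.PatientChartId = pci.PatientChartId
WHERE pci.IsLockDown = @IsNotLockdown;
COMMIT;

DROP TABLE #ChartsToLock;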
I think you should create an SQL script (or a stored procedure, if you will use it from a higher level) where you store the results of your selection in a cursor (you only need to find the PatientChartIds of the rows to be updated) and then use that in your update. So the answer is yes.
It's easy to test this: put the commands into a transaction, roll the transaction back, and run a SELECT just before the rollback to check your results. Good luck.
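For example, a rough sketch of that dry-run pattern (the UPDATE here is simplified; substitute whichever version you're testing):
BEGIN TRANSACTION;

-- run the update (or the CTE / temp-table variant) here
UPDATE PatientChartImages
SET IsLockDown = @IsLockdown
WHERE PatientChartImages.IsLockDown = @IsNotLockdown;

-- inspect the would-be results before undoing them
SELECT PatientChartId, IsLockDown
FROM PatientChartImages
WHERE IsLockDown = @IsLockdown;

ROLLBACK TRANSACTION;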