MySQL - Optimize Orphan Record Grooming

Here is my problem: I have written a stored procedure for the following task. The events table may contain events for venues that no longer exist. Not all events are tied to a venue, but the ones that are have an integer value in their venue id field; otherwise it is NULL (or potentially zero, which is accounted for). Periodically, venues get deleted from our system. When that happens, it is not possible to delete all of the events associated with that venue at that exact time. Instead, a task runs later and deletes every event whose venue id no longer references an existing record in the venues table. I have written a stored procedure for this and it seems to work.
This is the stored procedure:
DROP PROCEDURE IF EXISTS delete_synced_events_orphans;
DELIMITER $$
CREATE PROCEDURE delete_synced_events_orphans()
BEGIN
DECLARE event_count int(11) DEFAULT 0;
DECLARE active_event_id int(11) DEFAULT 0;
DECLARE active_venue_id int(11) DEFAULT 0;
DECLARE event_to_delete_id int(11) DEFAULT NULL;
CREATE TEMPORARY TABLE IF NOT EXISTS possible_events_to_delete (
event_id int(11) NOT NULL,
venue_id_temp int(11) NOT NULL
) engine = memory;
# create an "array" which is a table that holds the events that might need deleting
INSERT INTO possible_events_to_delete (event_id, venue_id_temp) SELECT `events`.`id`, `events`.`venue_id` FROM `events` WHERE `events`.`venue_id` IS NOT NULL AND `events`.`venue_id` <> 0;
SELECT COUNT(*) INTO `event_count` FROM `possible_events_to_delete` WHERE 1;
detector_loop: WHILE `event_count` > 0 DO
SELECT event_id INTO active_event_id FROM possible_events_to_delete WHERE 1 LIMIT 1;
SELECT venue_id_temp INTO active_venue_id FROM possible_events_to_delete WHERE 1 LIMIT 1;
# this figures out if there are events that need to be deleted
SELECT `events`.`id` INTO event_to_delete_id FROM `events`, `venues` WHERE `events`.`venue_id` <> `venues`.`id` AND `events`.`id` = active_event_id AND `events`.`venue_id` = active_venue_id;
#if no record meets that query, the active event is safe to delete
IF (event_to_delete_id <> 0 AND event_to_delete_id IS NOT NULL) THEN
DELETE FROM `events` WHERE `events`.`id` = event_to_delete_id;
#INSERT INTO test_table (event_id_test, venue_id_temp_test) SELECT `events`.`id`, `events`.`venue_id` FROM `events` WHERE `events`.`id` = event_to_delete_id;
END IF;
DELETE FROM possible_events_to_delete WHERE `event_id` = active_event_id AND `venue_id_temp` = active_venue_id;
SET `event_count` = `event_count` - 1;
END WHILE;
END $$
DELIMITER ;
Here is the table structure for the two tables in question:
CREATE TABLE IF NOT EXISTS events (
id int(11) NOT NULL,
event_time timestamp NOT NULL,
venue_id_temp int(11) NOT NULL
);
CREATE TABLE IF NOT EXISTS venues (
event_id int(11) NOT NULL,
venue_id_temp int(11) NOT NULL
);
The stored procedure works as written, but I want to know how it could be made to run better. It seems like it's doing a lot of extra processing to achieve its goal. Are there better ways to query the data at hand, or other commands and keywords I don't know about, that would let me complete this task with fewer lines and less computation? I am still learning how to use stored procedures, so I am using them to complete tasks as pragmatically as possible. I want to understand how this specific query could make better use of the full range of MySQL's features. Thank you, folks.

Everything is much simpler:
DROP PROCEDURE IF EXISTS delete_synced_events_orphans;
DELIMITER $$
CREATE PROCEDURE delete_synced_events_orphans()
BEGIN
DELETE
FROM `events`
WHERE `venue_id` IS NOT NULL AND `venue_id` <> 0
AND `venue_id` NOT IN (SELECT `id` FROM `venues`)
;
END $$
DELIMITER ;
That's it. :)
You are thinking imperatively, trying to tell MySQL how to complete your task. But SQL is a declarative language, designed for saying what to do.
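A variation on the same set-based idea, as a sketch only: NOT EXISTS expresses the same anti-join and, unlike NOT IN, is not affected if the subquery ever returns a NULL. The table definitions below are hypothetical minimal versions (only the columns the DELETE needs, plus an assumed index on venue_id), not the real schema. Running the SELECT first shows what the DELETE would remove.

```sql
-- Hypothetical minimal tables: only the columns the DELETE needs.
CREATE TABLE IF NOT EXISTS venues (
  id INT NOT NULL PRIMARY KEY
);
CREATE TABLE IF NOT EXISTS events (
  id INT NOT NULL PRIMARY KEY,
  venue_id INT NULL,
  KEY idx_events_venue_id (venue_id)   -- assumed index, helps the lookup
);

-- Preview the orphaned events before deleting anything.
SELECT e.id, e.venue_id
FROM events AS e
WHERE e.venue_id IS NOT NULL
  AND e.venue_id <> 0
  AND NOT EXISTS (SELECT 1 FROM venues AS v WHERE v.id = e.venue_id);

-- The same predicate as a DELETE.
DELETE FROM events
WHERE venue_id IS NOT NULL
  AND venue_id <> 0
  AND NOT EXISTS (SELECT 1 FROM venues AS v WHERE v.id = events.venue_id);
```

Whether this runs faster than the NOT IN form depends on the MySQL version and the indexes, so it is worth comparing both with EXPLAIN on real data.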

Related

What is error? i'm trying to fill table with random values

I have two similar tables:
CREATE TABLE `t1` (
`id` int(11) NOT NULL AUTO_INCREMENT ,
`c1` int(11) NOT NULL DEFAULT '0',
`c2` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `idx_c1` (`c1`)
) ENGINE=InnoDB;
CREATE TABLE `t2` (
`id` int(11) NOT NULL AUTO_INCREMENT ,
`c1` int(11) NOT NULL DEFAULT '0',
`c2` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `idx_c1` (`c1`)
) ENGINE=InnoDB;
I want to fill both tables with random values:
drop procedure if exists random_records;
truncate table t1;
truncate table t2;
delimiter $$
create procedure random_records(n int)
begin
set @i=1;
set @m=100000;
while @i <= n do
insert into t1(c1,c2) values(rand()*@m,rand()*@m);
insert into t2(c1,c2) values(rand()*@m,rand()*@m);
set @i=@i+1;
end while;
end $$
delimiter ;
call random_records(100);
select * from t1 limit 10;
select * from t2 limit 10;
select count(*) from t1;
select count(*) from t2;
Here is what I see in table t1:
I don't understand why there are so many '0' and '1' values.
count(*) returns 210 for t1 and 208 for t2, which is one more mystery.

RAND() returns a floating-point value in the range [0, 1), so rand()*@m with @m set to 100,000 should produce values spread between 0 and 100,000, which MySQL rounds to the nearest integer when storing them in an INT column. Many zeros and ones in c1 and c2 therefore suggest that the scaling is not being applied as intended: an unscaled rand() result rounds to either 0 or 1.
Raising @m to 1,000,000 or 10,000,000 widens the range of generated values, but it only helps if @m actually holds that value when the INSERT runs. @i and @m are session-level user variables, so it is worth checking that nothing else in the session changes them.
As for the row counts, a single call random_records(100) on freshly truncated tables inserts at most 100 rows per table, so counts of 210 and 208 most likely mean the procedure was called more than once since the last TRUNCATE. The two INSERT statements in each loop iteration are also independent of each other and are not executed atomically.
This means that one of them can fail, or a run can be interrupted, after the other has already succeeded, leaving fewer rows in one table. To keep the tables in step, you can wrap the inserts in a transaction so that they are committed as a single unit of work.
For example, you can modify the random_records procedure as follows:
drop procedure if exists random_records;
truncate table t1;
truncate table t2;
delimiter $$
create procedure random_records(n int)
begin
set @i=1;
set @m=1000000;
start transaction;
while @i <= n do
insert into t1(c1,c2) values(rand()*@m,rand()*@m);
insert into t2(c1,c2) values(rand()*@m,rand()*@m);
set @i=@i+1;
end while;
commit;
end $$
delimiter ;
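With InnoDB tables, this commits all of the inserts together, so the number of rows added to both tables stays consistent as long as the procedure runs to completion. A transaction does not roll back by itself when a statement fails, so for full atomicity add an EXIT HANDLER FOR SQLEXCEPTION that issues ROLLBACK.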
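A sketch of the procedure with two adjustments: local variables declared inside the procedure instead of session variables, and an error handler that rolls back a partial batch. The name random_records_local, the variable names and the use of FLOOR are illustrative choices, not part of the original code.

```sql
DROP PROCEDURE IF EXISTS random_records_local;
DELIMITER $$
CREATE PROCEDURE random_records_local(IN n INT)
BEGIN
  -- Local variables are scoped to this call, unlike session variables (@i, @m).
  DECLARE i INT DEFAULT 1;
  DECLARE m INT DEFAULT 100000;
  -- On any SQL error, undo the partial batch and re-raise the error.
  DECLARE EXIT HANDLER FOR SQLEXCEPTION
  BEGIN
    ROLLBACK;
    RESIGNAL;
  END;

  START TRANSACTION;
  WHILE i <= n DO
    -- FLOOR(RAND() * m) yields an integer in the range 0 .. m-1.
    INSERT INTO t1 (c1, c2) VALUES (FLOOR(RAND() * m), FLOOR(RAND() * m));
    INSERT INTO t2 (c1, c2) VALUES (FLOOR(RAND() * m), FLOOR(RAND() * m));
    SET i = i + 1;
  END WHILE;
  COMMIT;
END $$
DELIMITER ;

TRUNCATE TABLE t1;
TRUNCATE TABLE t2;
CALL random_records_local(100);
SELECT COUNT(*) FROM t1;
SELECT COUNT(*) FROM t2;
```

Truncating right before the call keeps rows from earlier runs out of the counts.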

MySQL stored procedure where clause is not filtering records

I'm working with a db where the keys are all binary(16), essentially stored as a GUID with a couple of the values flipped around. I have a simple stored procedure where I want to select a single record by ID.
delimiter //
create procedure select_item_by_id (
in id binary(16)
)
begin
select
`id`,
`name`
from
`item`
where
`id` = id;
end //
delimiter ;
When I fire it like so, it pulls back all the records in the table, no filtering is done:
call select_item_by_id(unhex('11e7deb1b1628696ad3894b2c0ab197a'));
However, if I run it manually...it filters the record exactly as expected:
select
`id`,
`name`
from
`item`
where
`id` = unhex('11e7deb1b1628696ad3894b2c0ab197a');
I even tried passing in a string/chars and doing the unhex inside of the sproc, but that pulls zero results:
delimiter //
create procedure select_item_by_id (
in id char(32)
)
begin
select
`id`,
`name`
from
`item`
where
`id` = unhex(id);
end //
delimiter ;
call select_item_by_id('11e7deb1b1628696ad3894b2c0ab197a');
Pretty weird. What am I doing wrong?

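WHERE `id` = id always evaluates to true here. Inside a stored procedure, a parameter or local variable takes precedence over a column of the same name, so both sides refer to the parameter and it is compared with itself. The same thing explains the second attempt: `id` = unhex(id) compares the 32-character string with its own unhexed value, which never matches. Rename the parameter to something else.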

Rename the parameter of your proc:
create procedure select_item_by_id (
in idToTest char(32)
)
and use
where
`id` = unhex(idToTest);
to avoid ambiguity. If you keep the binary(16) parameter type instead, compare directly with `id` = idToTest.
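Put together, the renamed version might look like the sketch below. The p_ prefix and the table alias i are illustrative conventions, assumed here only to keep parameter and column names visibly apart.

```sql
DROP PROCEDURE IF EXISTS select_item_by_id;
DELIMITER //
CREATE PROCEDURE select_item_by_id (
  IN p_id BINARY(16)      -- p_ prefix keeps the parameter distinct from the id column
)
BEGIN
  SELECT i.`id`, i.`name`
  FROM `item` AS i
  WHERE i.`id` = p_id;    -- i.id is qualified as a column; p_id is the parameter
END //
DELIMITER ;

CALL select_item_by_id(UNHEX('11e7deb1b1628696ad3894b2c0ab197a'));
```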

Remove duplicates from MySQL DB

I've got a database with over 7000 records. As it turns out, there are several duplicates within those records. I found several suggestions on how to delete duplicates and keep only 1 record.
But in my case things are a bit more complicated: records are not duplicates simply because they hold the same data as another record. Several records are perfectly okay holding the same data. They count as duplicates only when they hold the same data AND were inserted within 40 seconds of each other.
Therefore I need a SQL statement that deletes duplicates (i.e. records where all fields except id and datetime match) if they were inserted within a 40-second range (i.e. judged by the datetime field).
Since I'm anything but a SQL expert and can't find a suitable solution online, I truly hope some of you can help me out and point me in the right direction. That would be much appreciated!
The table structure is as following:
CREATE TABLE IF NOT EXISTS `wp_ttr_results` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`schoolyear` varchar(10) CHARACTER SET utf8 DEFAULT NULL,
`datetime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`area` varchar(15) CHARACTER SET utf8 NOT NULL,
`content` varchar(10) CHARACTER SET utf8 NOT NULL,
`types` varchar(100) CHARACTER SET utf8 NOT NULL,
`tasksWrong` varchar(300) DEFAULT NULL,
`tasksRight` varchar(300) DEFAULT NULL,
`tasksData` longtext CHARACTER SET utf8,
`parent_id` varchar(20) DEFAULT NULL,
UNIQUE KEY `id` (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=68696 ;
So just to clarify again, a duplicate case is a case that:
[1] holds the same data as another case for all fields, except the id and datetime field
[2] is inserted in the DB, according to the datetime field, within 40 seconds of another record with the same values
If both conditions are met, all cases except one, should be deleted.

As @Juru pointed out in the comments, we need quite a surgical knife to cut this one. It is, however, possible to do this iteratively via a stored procedure.
First we use a self-join to identify the first duplicate for every record that is not itself a duplicate:
SELECT DISTINCT
MIN(postdups.id) AS id
FROM wp_ttr_results AS base
INNER JOIN wp_ttr_results AS postdups
ON base.id<postdups.id
AND UNIX_TIMESTAMP(postdups.datetime)-UNIX_TIMESTAMP(base.datetime)<40
AND base.user_id=postdups.user_id
AND base.schoolyear=postdups.schoolyear
AND base.area=postdups.area
AND base.content=postdups.content
AND base.types=postdups.types
AND base.tasksWrong=postdups.tasksWrong
AND base.tasksRight=postdups.tasksRight
AND base.parent_id=postdups.parent_id
LEFT JOIN wp_ttr_results AS predups
ON base.id>predups.id
AND UNIX_TIMESTAMP(base.datetime)-UNIX_TIMESTAMP(predups.datetime)<40
AND base.user_id=predups.user_id
AND base.schoolyear=predups.schoolyear
AND base.area=predups.area
AND base.content=predups.content
AND base.types=predups.types
AND base.tasksWrong=predups.tasksWrong
AND base.tasksRight=predups.tasksRight
AND base.parent_id=predups.parent_id
WHERE predups.id IS NULL
GROUP BY base.id
;
This selects the lowest id of all later records (base.id<postdups.id) that have the same payload as an existing record and are within a 40s window (UNIX_TIMESTAMP(postdups.datetime)-UNIX_TIMESTAMP(base.datetime)<40), but skips those base records that are duplicates themselves. In @Juru's example, the :30 record would be hit, as it is a duplicate of the :00 record, which itself is not a duplicate, but the :41 record would not be hit, as it is a duplicate only of :30, which itself is a duplicate of :00.
Now we have to remove these records. Since MySQL can't delete from a table it is reading in a subquery, we must use a temporary table to achieve that:
CREATE TEMPORARY TABLE cleanUpDuplicatesTemp SELECT DISTINCT
-- as above
;
DELETE FROM wp_ttr_results
WHERE id IN
(SELECT id FROM cleanUpDuplicatesTemp)
;
DROP TABLE cleanUpDuplicatesTemp
;
So far we have removed the first duplicate for each record, in the process possibly changing what would be considered a duplicate ...
Finally we must loop through this process, exiting the loop when the SELECT DISTINCT returns nothing.
Putting it all together into a stored procedure:
DELIMITER ;;
CREATE PROCEDURE cleanUpDuplicates()
BEGIN
DECLARE numDuplicates INT;
-- ITERATE is a reserved word in MySQL, so the loop label uses a different name
cleanup_loop: LOOP
DROP TABLE IF EXISTS cleanUpDuplicatesTemp;
CREATE TEMPORARY TABLE cleanUpDuplicatesTemp
SELECT DISTINCT
MIN(postdups.id) AS id
FROM wp_ttr_results AS base
INNER JOIN wp_ttr_results AS postdups
ON base.id<postdups.id
AND UNIX_TIMESTAMP(postdups.datetime)-UNIX_TIMESTAMP(base.datetime)<40
AND base.user_id=postdups.user_id
AND base.schoolyear=postdups.schoolyear
AND base.area=postdups.area
AND base.content=postdups.content
AND base.types=postdups.types
AND base.tasksWrong=postdups.tasksWrong
AND base.tasksRight=postdups.tasksRight
AND base.parent_id=postdups.parent_id
LEFT JOIN wp_ttr_results AS predups
ON base.id>predups.id
AND UNIX_TIMESTAMP(base.datetime)-UNIX_TIMESTAMP(predups.datetime)<40
AND base.user_id=predups.user_id
AND base.schoolyear=predups.schoolyear
AND base.area=predups.area
AND base.content=predups.content
AND base.types=predups.types
AND base.tasksWrong=predups.tasksWrong
AND base.tasksRight=predups.tasksRight
AND base.parent_id=predups.parent_id
WHERE predups.id IS NULL
GROUP BY base.id;
SELECT COUNT(*) INTO numDuplicates FROM cleanUpDuplicatesTemp;
IF numDuplicates<=0 THEN
LEAVE cleanup_loop;
END IF;
DELETE FROM wp_ttr_results
WHERE id IN
(SELECT id FROM cleanUpDuplicatesTemp);
END LOOP cleanup_loop;
DROP TABLE IF EXISTS cleanUpDuplicatesTemp;
END;;
DELIMITER ;
Now a simple CALL cleanUpDuplicates; should do the trick.
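One caveat about the joins in this answer, offered as a sketch rather than a tested fix: schoolyear, tasksWrong, tasksRight and parent_id are declared DEFAULT NULL in the table definition, and a plain = comparison never evaluates to true when either side is NULL. Rows with NULL in any of those columns would therefore never be matched as duplicates. MySQL's NULL-safe operator <=> treats two NULLs as equal. The preview query below only selects candidate pairs; it deletes nothing, and its column aliases are illustrative.

```sql
-- Plain equality yields NULL when comparing NULLs; <=> yields 1 or 0.
SELECT NULL = NULL   AS plain_equals,      -- NULL
       NULL <=> NULL AS null_safe_equals,  -- 1
       'a' <=> NULL  AS value_vs_null;     -- 0

-- Preview: later rows that repeat an earlier row within 40 seconds,
-- comparing the nullable columns with the NULL-safe operator.
SELECT base.id AS original_id, postdups.id AS duplicate_id
FROM wp_ttr_results AS base
INNER JOIN wp_ttr_results AS postdups
   ON base.id < postdups.id
  AND UNIX_TIMESTAMP(postdups.`datetime`) - UNIX_TIMESTAMP(base.`datetime`) < 40
  AND base.user_id = postdups.user_id
  AND base.schoolyear <=> postdups.schoolyear
  AND base.area = postdups.area
  AND base.content = postdups.content
  AND base.types = postdups.types
  AND base.tasksWrong <=> postdups.tasksWrong
  AND base.tasksRight <=> postdups.tasksRight
  AND base.parent_id <=> postdups.parent_id;
```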

This might work, but it probably won't be very fast...
DELETE FROM dupes
USING wp_ttr_results AS dupes
INNER JOIN wp_ttr_results AS origs
ON dupes.field1 = origs.field1
AND dupes.field2 = origs.field2
AND ....
-- > rather than <>, so two rows with identical timestamps do not delete each other
AND dupes.id > origs.id
AND dupes.`datetime` BETWEEN origs.`datetime` AND (origs.`datetime` + INTERVAL 40 SECOND)
;

Mysql Cursor not sorting result in correct order

This is an abstraction of my original code, as it'll be easier for you to read.
I'm new to MySQL stored procedures and to cursors.
What's happening is that the cursor is not returning the results sorted according to the ORDER BY clause I set on the query.
Here's all the structure and data for the tables to reproduce the issue.
Log Table :
DROP TABLE IF EXISTS `log`;
CREATE TABLE `log` (
`key` text NOT NULL,
`value` text NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Test Table :
DROP TABLE IF EXISTS `test1`;
CREATE TABLE `test1` (
`ID` bigint(8) unsigned NOT NULL AUTO_INCREMENT,
`price` float(16,8) NOT NULL,
PRIMARY KEY (`ID`),
KEY `price` (`price`)
) ENGINE=InnoDB AUTO_INCREMENT=15 DEFAULT CHARSET=latin1;
Test table data :
INSERT INTO `test1` (`price`)
VALUES (100),(200),(300),(400),(300),(200),(100);
Query:
SELECT *
FROM `test1`
ORDER BY price DESC;
Expected results (the query works fine when run directly):
4 - 400.00000000
5 - 300.00000000
3 - 300.00000000
6 - 200.00000000
2 - 200.00000000
7 - 100.00000000
1 - 100.00000000
Stored Procedure
DROP PROCEDURE IF EXISTS `test_proc1`;
DELIMITER ;;
CREATE DEFINER=`root`@`localhost` PROCEDURE `test_proc1`()
BEGIN
DECLARE done INT DEFAULT 0;
DECLARE ID BIGINT(8);
DECLARE price FLOAT(16,8);
DECLARE cur1 CURSOR FOR
SELECT * FROM `test1` ORDER BY price DESC; #Exact Query
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
START TRANSACTION;
OPEN cur1;
#Cleaning log
TRUNCATE `log`;
read_loop:
LOOP
FETCH cur1 INTO ID,price;
IF done = 1 THEN
LEAVE read_loop;
END IF;
#Inserting data to log
INSERT INTO `log`
VALUES (ID,price);
END LOOP read_loop;
CLOSE cur1;
COMMIT;
#Bring log for result
SELECT * FROM log;
END;;
DELIMITER ;
Call procedure
CALL test_proc1();
The CURSOR has exactly the same query as I posted at the top, you can check that on the Stored Procedure. But when I loop through it, I get another order.
15 100.00000000
21 100.00000000
16 200.00000000
20 200.00000000
17 300.00000000
19 300.00000000
18 400.00000000
What's going on? Can somebody help me with this?
I also tried nesting the query like this, with no success:
SELECT * FROM(
SELECT *
FROM `test1`
ORDER BY price DESC) AS tmp_tbl

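Looks like you have a "variable collision". Inside the procedure, the name price in ORDER BY price resolves to the local variable price rather than to the table column of the same name, so the rows are effectively not sorted by the column. Change the variable name, or qualify the column with a table alias like this: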
SELECT * FROM `test1` as `t` ORDER BY `t`.`price` DESC;
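For illustration, here is a sketch of the procedure with both fixes applied: renamed variables and a qualified column in ORDER BY. The v_ prefix is just an assumed naming convention, and the transaction from the original is left out to keep the example short.

```sql
DROP PROCEDURE IF EXISTS `test_proc1`;
DELIMITER ;;
CREATE PROCEDURE `test_proc1`()
BEGIN
  DECLARE done INT DEFAULT 0;
  -- v_ prefix (an illustrative convention) avoids clashing with column names.
  DECLARE v_id BIGINT UNSIGNED;
  DECLARE v_price FLOAT(16,8);
  DECLARE cur1 CURSOR FOR
    SELECT t.`ID`, t.`price` FROM `test1` AS t ORDER BY t.`price` DESC;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

  -- Clean the log before refilling it.
  TRUNCATE TABLE `log`;

  OPEN cur1;
  read_loop: LOOP
    FETCH cur1 INTO v_id, v_price;
    IF done = 1 THEN
      LEAVE read_loop;
    END IF;
    -- Rows are inserted in cursor order, i.e. by price descending.
    INSERT INTO `log` (`key`, `value`) VALUES (v_id, v_price);
  END LOOP read_loop;
  CLOSE cur1;

  -- log has no ordering column, so row order here is not guaranteed by SQL.
  SELECT * FROM `log`;
END;;
DELIMITER ;

CALL test_proc1();
```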

MySQL trigger delete passing user ID

I need to create a MySQL trigger that logs the user ID when a table row is deleted. The delete must fit in one query, since I'm using PHP PDO. This is what I've come up with so far:
I need a way to pass the user ID in the delete query even though it is irrelevant to the delete action itself:
Normally the query would look like this:
DELETE FROM mytable WHERE mytable.RowID = :rowID
If I could use multiple queries in my statement, I would do it like this:
SET @userID := :userID;
DELETE FROM mytable WHERE mytable.RowID = :rowID;
This way the variable @userID would be set before the trigger event fires, so the trigger can use it. However, since I need to squeeze my delete statement into one query, I came up with this:
DELETE FROM mytable
WHERE CASE
WHEN @userID := :userID
THEN mytable.RowID = :rowID
ELSE mytable.RowID IS NULL
END
Just a note: RowID will never be null since it's the primary key. Now I have to create a delete trigger to log the user ID to the audit table. However, I suppose that in this case the trigger will be fired before the delete query itself, which means that the @userID variable will not have been created? This was my idea for passing it as a value to the trigger.
I feel like I'm close to the solution, but this issue is a blocker. How can I pass the user ID value to the trigger without having multiple queries in the statement? Any thoughts or suggestions?

You can use the NEW / OLD MySQL trigger extensions. Reference: http://dev.mysql.com/doc/refman/5.0/en/trigger-syntax.html
Here is some sample code:
drop table if exists `project`;
drop table if exists `projectDEL`;
CREATE TABLE `project` (
`proj_id` int(11) NOT NULL AUTO_INCREMENT,
`proj_name` varchar(30) NOT NULL,
`Proj_Type` varchar(30) NOT NULL,
PRIMARY KEY (`proj_id`)
);
CREATE TABLE `projectDEL` (
`proj_id` int(11) NOT NULL AUTO_INCREMENT,
`proj_name` varchar(30) NOT NULL,
`Proj_Type` varchar(30) NOT NULL,
PRIMARY KEY (`proj_id`)
);
INSERT INTO `project` (`proj_id`, `proj_name`, `Proj_Type`) VALUES
(1, 'admin1', 'admin1'),
(2, 'admin2', 'admin2');
delimiter $
CREATE TRIGGER `uProjectDelete` BEFORE DELETE ON project
FOR EACH ROW BEGIN
INSERT INTO projectDEL SELECT * FROM project WHERE proj_id = OLD.proj_id;
END;$
delimiter ;
DELETE FROM project WHERE proj_id = 1;
SELECT * FROM project;
SELECT * FROM projectDEL;
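To connect this back to the user ID in the question, here is a sketch that assumes a hypothetical audit table mytable_audit and a minimal mytable. User-defined variables such as @userID live for the whole session, so the SET and the DELETE can be sent as two separate single-statement queries over the same connection. They do not have to be combined into one multi-statement query.

```sql
-- Hypothetical tables for illustration only.
CREATE TABLE IF NOT EXISTS mytable (
  RowID INT NOT NULL PRIMARY KEY
);
CREATE TABLE IF NOT EXISTS mytable_audit (
  RowID      INT NOT NULL,
  deleted_by INT NULL,                       -- NULL if @userID was never set
  deleted_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

DROP TRIGGER IF EXISTS mytable_log_delete;
DELIMITER $
CREATE TRIGGER mytable_log_delete BEFORE DELETE ON mytable
FOR EACH ROW
BEGIN
  -- OLD.RowID is the row being deleted; @userID is read from the session.
  INSERT INTO mytable_audit (RowID, deleted_by) VALUES (OLD.RowID, @userID);
END$
DELIMITER ;

-- Two separate statements on the same connection:
SET @userID = 42;                            -- example value
DELETE FROM mytable WHERE RowID = 1;
SELECT * FROM mytable_audit;
```

With pooled or persistent connections, it is safer to set @userID immediately before every DELETE so that a value left over from an earlier request is not logged by mistake.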
