I have two tables, reviews and grade.
Reviews table has
id_review (primary key), id_lang, email, text etc.
Example
1, 2, email@email.com, test text
2, 2, email@email.com, test text
4, 2, email@email.com, test text
Grade table has
id_review (primary/foreign key), id_criterion, grade
1, 3, 5.00
1, 1, 4.00
2, 3, 3.00
2, 1, 5.00
4, 2, 3.00
I need to copy all the reviews with lang id 2, change the text and the lang id to 1 (this I can do manually).
But as the id_review changes with the copied reviews, I need to create new rows on the grade table, too. Is there a way to make sure that the foreign keys are matched with the copied reviews, too?
I tried to do it the old fashioned way with copy/paste on csv but as some reviews are removed from the reviews table and some reviews have differences in id_criterion count, it's very hard to do for a large table.
Or should I try to edit the table to allow the reviews table to have distinct values for id_lang with the same id_review?
You can create temporary tables (no foreign keys) out of the original ones, patch and validate the data until you are satisfied, and then insert it back into the original tables.
I am not sure how you populate id_review, so I assume they are auto generated when you insert new rows.
create table reviews_temp_20211119 as
select r.id_review as old_id_review
, 0 as new_id_review
, row_number() over(order by r.id_review) as ref_patch_id
, r.id_lang
, r.email
, r.text
from reviews r
where id_lang = 2;
create table grades_temp_20211119 as
select g.id_review
, g.id_criterion
, g.grade
, 0 as new_id_review
from grades g
where g.id_review in (select t.old_id_review from reviews_temp_20211119 t);
update reviews_temp_20211119
set id_lang = 1;
alter table reviews add column ref_patch_id bigint null;
-- insert back to original to get the auto generated id_review
-- if you use another strategy to populate id_review, you can update the temp table directly and check that all the data is correct before inserting back into the original table
insert into reviews (id_lang, email, text, ref_patch_id)
select id_lang, email, text, ref_patch_id
from reviews_temp_20211119;
update reviews_temp_20211119 t
join reviews r on (r.ref_patch_id = t.ref_patch_id)
set t.new_id_review = r.id_review;
update grades_temp_20211119 g
join reviews_temp_20211119 t on (g.id_review = t.old_id_review)
set g.new_id_review = t.new_id_review;
insert into grades (id_review, id_criterion, grade)
select t.new_id_review
, t.id_criterion
, t.grade
from grades_temp_20211119 t;
By keeping the temporary tables, you have the opportunity to review the change, or to roll it back if something went wrong, by looking back at them.
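The core of the approach above is to remember, for every copied review, which original row it came from, and then to translate the grade rows through that mapping. A minimal runnable sketch of that idea follows, assuming SQLite (through Python's sqlite3 module) as a stand-in for MySQL; the helper column copied_from and the sample e-mail addresses are hypothetical and play the role of ref_patch_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reviews (
    id_review   INTEGER PRIMARY KEY AUTOINCREMENT,
    id_lang     INTEGER,
    email       TEXT,
    text        TEXT,
    copied_from INTEGER  -- hypothetical helper column: id of the source review
);
CREATE TABLE grades (id_review INTEGER, id_criterion INTEGER, grade REAL);
INSERT INTO reviews (id_review, id_lang, email, text) VALUES
    (1, 2, 'a@example.com', 'test text'),
    (2, 2, 'b@example.com', 'test text'),
    (4, 2, 'c@example.com', 'test text');
INSERT INTO grades VALUES (1, 3, 5.0), (1, 1, 4.0), (2, 3, 3.0), (2, 1, 5.0), (4, 2, 3.0);
""")

# Copy the language-2 reviews as language 1, remembering each source id.
conn.execute("""
    INSERT INTO reviews (id_lang, email, text, copied_from)
    SELECT 1, email, text, id_review FROM reviews WHERE id_lang = 2
""")

# Copy the grades, translating old ids to new ids through copied_from.
conn.execute("""
    INSERT INTO grades (id_review, id_criterion, grade)
    SELECT r.id_review, g.id_criterion, g.grade
    FROM grades g
    JOIN reviews r ON r.copied_from = g.id_review
""")
conn.commit()

# Grades of the copies, keyed by the review they were copied from.
new_grades = conn.execute("""
    SELECT r.copied_from, g.id_criterion, g.grade
    FROM reviews r
    JOIN grades g ON g.id_review = r.id_review
    WHERE r.id_lang = 1
    ORDER BY r.copied_from, g.id_criterion
""").fetchall()
print(new_grades)
```

Because the mapping lives in a column, gaps in id_review and a varying number of criteria per review do not matter: every grade row simply follows its review.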
For a repeatable process, I think a stored procedure with cursors is the way to go. Here's my version; it accepts two parameters, the old idLang you wish to copy and the new idLang:
CREATE PROCEDURE copyReviewWithNewLang(IN oldidLang INT, IN newidLang INT)
BEGIN
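DECLARE c_idReview, c_maxIdReview INT;
DECLARE c_email VARCHAR(255);
DECLARE c_text TEXT;
DECLARE old_c_idreview INT DEFAULT 0;
-- flag set by the NOT FOUND handler once the cursor is exhausted
DECLARE done BOOLEAN DEFAULT FALSE;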
-- first cursor gets all the review rows of the old language, ordered
DECLARE rev_cur CURSOR FOR SELECT id_review, email, text FROM reviews WHERE id_lang = oldidLang ORDER BY id_review ASC;
-- second cursor gets the highest id_review
DECLARE maxid_cur CURSOR FOR SELECT MAX(id_review) FROM reviews;
-- needed for ending the loop on end of retrieved data
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
OPEN rev_cur;
retrieving : LOOP
FETCH rev_cur INTO c_idReview, c_email, c_text;
-- ending the loop
IF done THEN
LEAVE retrieving;
END IF;
IF (old_c_idreview = 0) OR (old_c_idreview != c_idReview) THEN
OPEN maxid_cur;
FETCH maxid_cur INTO c_maxIdReview;
CLOSE maxid_cur;
SET c_maxIdReview = c_maxIdReview + 1;
END IF;
-- copying the review row
INSERT INTO reviews (id_review, id_lang, email, text)
VALUES(c_maxIdReview, newidLang, c_email, c_text);
-- copying the grade rows
INSERT INTO grades (id_review, id_criterion, grade)
SELECT c_maxIdReview, id_criterion, grade FROM grades
WHERE id_review = c_idReview;
-- needed for checking if id changed
SET old_c_idreview = c_idReview;
END LOOP;
CLOSE rev_cur;
END;
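The same row-by-row idea can also be driven from application code. The sketch below is only an illustration, assuming SQLite through Python's sqlite3 module instead of MySQL; the function name copy_reviews and the sample data are hypothetical. Instead of computing MAX(id) + 1 it lets the database generate the id and reads it back with lastrowid (in MySQL, LAST_INSERT_ID() serves that purpose), which avoids guessing ids when other sessions insert at the same time.

```python
import sqlite3

def copy_reviews(conn, old_lang, new_lang):
    """Copy every review of old_lang as new_lang, grades included.

    Returns a dict {old_id_review: new_id_review}.
    """
    id_map = {}
    # Fetch everything first so the freshly inserted rows are not re-read.
    rows = conn.execute(
        "SELECT id_review, email, text FROM reviews "
        "WHERE id_lang = ? ORDER BY id_review",
        (old_lang,),
    ).fetchall()
    for old_id, email, text in rows:
        cur = conn.execute(
            "INSERT INTO reviews (id_lang, email, text) VALUES (?, ?, ?)",
            (new_lang, email, text),
        )
        new_id = cur.lastrowid  # id generated by the database for the copy
        conn.execute(
            "INSERT INTO grades (id_review, id_criterion, grade) "
            "SELECT ?, id_criterion, grade FROM grades WHERE id_review = ?",
            (new_id, old_id),
        )
        id_map[old_id] = new_id
    conn.commit()
    return id_map

# Hypothetical sample data mirroring the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reviews (id_review INTEGER PRIMARY KEY, id_lang INTEGER, email TEXT, text TEXT);
CREATE TABLE grades (id_review INTEGER, id_criterion INTEGER, grade REAL);
INSERT INTO reviews VALUES
    (1, 2, 'a@example.com', 'test text'),
    (2, 2, 'b@example.com', 'test text'),
    (4, 2, 'c@example.com', 'test text');
INSERT INTO grades VALUES (1, 3, 5.0), (1, 1, 4.0), (2, 3, 3.0), (2, 1, 5.0), (4, 2, 3.0);
""")
id_map = copy_reviews(conn, 2, 1)
print(id_map)
```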
Related
Background info: We have a community room, which can be divided in half via a curtain. In the past when a group needed the full room we put 2 entries in, one for each half... However we've modified the software (MRBS) so that there are now 3 rooms (Full{1},Closet Side{2}, and Kitchen Side{3}) and the software checks that you can't reserve a partial room when the full is already booked and vice versa. However we have plenty of old "full room" reservations made by reserving both sides. So when 2 & 3 are identical I need to move one of the bookings to 1 and delete the other.
So I have a table such as:
id room_id start_time name
1 2 13:00 Meeting
2 2 15:00 Meeting
3 3 15:00 Meeting
4 3 13:00 Storytime
I want to go through the table, and when room 2 & 3 both have entries at the same time and with the same name I want to change room 2's room_id to 1 and delete the entry for room 3. So in the above example entry 2 would be modified and entry 3 would be deleted.
I'm fairly certain this needs to be two separate queries; EG first where there is a match change all of the room_id's for 2 to 1, then as a separate query, compare room 1&3 and delete entries on 3.
I think this is close for changing room 2 to 1:
UPDATE `mrbs_entry`
JOIN `mrbs_entry` AS `other_side` ON `other_side.room_id` = '3'
AND `other_side.name` = `mrbs_entry.name`
AND `other_side.start_time` = `mrbs_entry.start_time`
AND `other_side.id` != `mrbs_entry.id`
SET `mrbs_entry.room_id` = '1'
WHERE (`mrbs_entry.room_id` = '2' AND `mrbs_entery.id` IN(92437,92438,92442,92443,92470,92471,92477,92478,92489,89462,92496,90873))
however I get an #1054 - Unknown column 'mrbs_entry.room_id' in 'field list' error
Note: the IN(*) bit is to limit it to a few test entries to make sure it's actually working as expected.
Method 1 - Stored Procedure with temporary table
This seems the simplest method if you're prepared to use a Stored Procedure and temporary table:
CREATE PROCEDURE sp_sanitize_mrbs()
BEGIN
DROP TEMPORARY TABLE IF EXISTS mrbs_to_sanitize;
CREATE TEMPORARY TABLE mrbs_to_sanitize (
id int auto_increment primary key,
room2_id int,
room3_id int);
-- "I want to go through the table, and when room 2 & 3 both have
-- entries at the same time and with the same name I want to..."
INSERT INTO mrbs_to_sanitize (room2_id, room3_id)
SELECT m1.id, m2.id
FROM mrbs_entry m1
CROSS JOIN mrbs_entry m2
WHERE m1.start_time = m2.start_time
AND m1.name = m2.name
AND m1.room_id = 2
AND m2.room_id = 3;
-- ...change room 2's room_id to 1
UPDATE mrbs_entry me
JOIN mrbs_to_sanitize mts
ON me.id = mts.room2_id
SET me.room_id = 1;
-- "...and delete the entry for room 3."
DELETE me
FROM mrbs_entry me
JOIN mrbs_to_sanitize mts
ON me.id = mts.room3_id;
END//
-- ...
-- The Stored Procedure can now be called any time you like:
CALL sp_sanitize_mrbs();
See SQL Fiddle Demo - using a Stored Procedure
Method 2 - without Stored Procedure
The following "trick" is slightly more complex but should do it without using stored procedures, temporary tables or variables:
-- "I want to go through the table, and when room 2 & 3 both have
-- entries at the same time and with the same name I want to..."
-- "...change room 2's room_id to 1"
UPDATE mrbs_entry m1
CROSS JOIN mrbs_entry m2
-- temporarily mark this row as having been updated
SET m1.room_id = 1, m1.name = CONCAT(m1.name, ' UPDATED')
WHERE m1.start_time = m2.start_time
AND m1.name = m2.name
AND m1.room_id = 2
AND m2.room_id = 3;
-- "...and delete the entry for room 3."
DELETE m2 FROM mrbs_entry m1
CROSS JOIN mrbs_entry m2
WHERE m1.start_time = m2.start_time
AND m1.name = CONCAT(m2.name, ' UPDATED')
AND m1.room_id = 1
AND m2.room_id = 3;
-- now remove the temporary marker to restore previous value
UPDATE mrbs_entry
SET name = LEFT(name, CHAR_LENGTH(name) - CHAR_LENGTH(' UPDATED'))
WHERE name LIKE '% UPDATED';
Explanation of Method 2
The first query updates the room number. However, as you mentioned, we need to perform the delete in a separate query. Since I'm not making any assumptions about your data, a safe way of requerying to get the same results once they have been modified is to introduce a "marker" to temporarily indicate which row was changed by the update. In the example above, this marker is ' UPDATED', but you may wish to choose something less likely to be used for any other purpose, e.g. a random sequence of characters. It could also be moved onto a different field if required. The delete can then be performed, and finally the marker needs to be removed to restore the original data.
See SQL Fiddle demo - without Stored Procedure.
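The error comes from the quoting: `mrbs_entry.room_id` inside a single pair of backticks is read as one column name that contains a dot, not as a table plus a column.
Quote the two parts separately (or drop the backticks) throughout the statement, so the SET line becomes
SET `mrbs_entry`.`room_id` = '1'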
But this is probably safer from the standpoint of ensuring the query works like you want it to:
UPDATE
`mrbs_entry`
SET `room_id` = '1'
WHERE `id` IN
(
-- the extra derived table T keeps the subquery separate from the table being updated
SELECT `id` FROM
(
SELECT `mrbs_entry`.`id` FROM
`mrbs_entry`
JOIN `mrbs_entry` AS `other_side` ON `other_side`.`room_id` = '3'
AND `other_side`.`name` = `mrbs_entry`.`name`
AND `other_side`.`start_time` = `mrbs_entry`.`start_time`
AND `other_side`.`id` != `mrbs_entry`.`id`
WHERE (`mrbs_entry`.`room_id` = '2' AND `mrbs_entry`.`id` IN(92437,92438,92442,92443,92470,92471,92477,92478,92489,89462,92496,90873))
) AS T
)
Run the inner query until it's pulling the right group of ids, then run the whole thing to change the room_ids
I think you need to implement a trigger, or more exactly a stored procedure that returns a trigger.
1) Check the entries have the same title.
2) UPDATE room 2 to room 1.
3) DELETE entry for room 3.
Something like this.
I would do it in 2 steps. The first step would be non-destructive. This way I could see if the first step looked correct before moving on to the actual modifications to mrbs_entry.
First create a temp table with the self-join to discover exact matches of rooms 2 and 3. This extra table would effectively list the reservations that need to go in as room 1, plus it lists the reservations for 2 and 3 that need deleting.
CREATE TABLE tmp
SELECT a.*
FROM mrbs_entry AS a
JOIN mrbs_entry AS b ON a... = b...
WHERE a.room_id = 2
AND b.room_id = 3;
The ON is start_time and/or date and/or name or some combo of them. Note that only one set of columns (a.*) is captured.
That table is deliberately not a CREATE TEMPORARY TABLE, but rather a 'permanent' table. Check the contents to see if it was created correctly.
The second step breaks into 2 queries, but in a single transaction for some extra security:
BEGIN;
DELETE mrbs_entry
FROM mrbs_entry
JOIN tmp ON mrbs_entry... = tmp...
WHERE mrbs_entry.room_id IN (2,3); -- delete both reservations
INSERT INTO mrbs_entry
SELECT 1 as room_id, ...
FROM tmp; -- Insert the "full" assignments
COMMIT;
And finally clean up -- after you have further confirmed that the changes were good,
DROP TABLE tmp;
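The two steps can be tried end to end on the sample rows from the question. The sketch below is only an illustration under stated assumptions: it uses SQLite through Python's sqlite3 module rather than MySQL, it assumes the match is on start_time and name (the join columns are left open above), and it writes the delete with EXISTS because SQLite has no DELETE ... JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mrbs_entry (id INTEGER PRIMARY KEY, room_id INTEGER,
                         start_time TEXT, name TEXT);
INSERT INTO mrbs_entry VALUES
    (1, 2, '13:00', 'Meeting'),
    (2, 2, '15:00', 'Meeting'),
    (3, 3, '15:00', 'Meeting'),
    (4, 3, '13:00', 'Storytime');

-- Step 1 (non-destructive): capture room-2 rows that have a twin in room 3.
-- Assumption: "identical" means same start_time and same name.
CREATE TABLE tmp AS
SELECT a.*
FROM mrbs_entry AS a
JOIN mrbs_entry AS b ON a.start_time = b.start_time AND a.name = b.name
WHERE a.room_id = 2
  AND b.room_id = 3;
""")

# Inspect tmp before touching mrbs_entry.
to_merge = conn.execute("SELECT id, start_time, name FROM tmp").fetchall()
print(to_merge)

# Step 2: both changes in one transaction (commit on success, rollback on error).
with conn:
    conn.execute("""
        DELETE FROM mrbs_entry
        WHERE room_id IN (2, 3)
          AND EXISTS (SELECT 1 FROM tmp
                      WHERE tmp.start_time = mrbs_entry.start_time
                        AND tmp.name = mrbs_entry.name)
    """)
    conn.execute("""
        INSERT INTO mrbs_entry (room_id, start_time, name)
        SELECT 1, start_time, name FROM tmp
    """)

result = conn.execute(
    "SELECT room_id, start_time, name FROM mrbs_entry ORDER BY room_id, start_time"
).fetchall()
print(result)
```

Note that the merged reservation gets a new id with this variant, so it only fits if nothing else references the old ids.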
This worked for me, I think it's pretty self explanatory and follows your logic of using two queries.
-- update query
update mrbs_entry
join
(
select t1.*
from mrbs_entry as t1
join mrbs_entry as t2
on
t2.room_id <> t1.room_id -- get different rooms
and t2.start_time = t1.start_time -- with same start time
and t2.name = t1.name -- and same name
where
t1.room_id = 2 -- where the left side has room id 2
and t2.room_id = 3 -- and the right side has room id 3
) as t
on
t.id = mrbs_entry.id -- set those room ids (2) to
set mrbs_entry.room_id = 1 ; -- the new id (1)
-- delete query
delete mrbs_entry
from mrbs_entry
join mrbs_entry as t1
on
t1.room_id <> mrbs_entry.room_id -- same as above, different room ids
and t1.start_time = mrbs_entry.start_time -- same start time
and t1.name = mrbs_entry.name -- same name
where
mrbs_entry.room_id = 3 -- get the stuff that has room id 3 (to get rid of it)
and t1.room_id = 1 -- and new duplicate room id 1
;
I have modified your statement a little bit and it worked correctly (at least at first glance).
UPDATE mrbs_entry AS tab1
INNER JOIN (SELECT other_side.id, other_side.name, other_side.start_time
FROM mrbs_entry AS other_side
WHERE other_side.room_id = 3) as tab2
ON tab1.start_time = tab2.start_time
AND tab1.name = tab2.name
AND tab1.id != tab2.id
SET tab1.room_id = 1
WHERE tab1.room_id = 2
AND tab1.id IN(1,2,3,4)
I'm confused by some of the answers here as the actual problem isn't all that hard IMHO.
First you need to identify all reservations for all the sub-rooms. In other words, you need a list of reservations with the same name and the same start_time but one for room_id 2 and one for room_id 3. (GROUP BY & HAVING)
You then convert this list in room_id 1 entries (INSERT)
Finally you get rid of all room_id 2 and 3 entries when there also is a (freshly inserted) room_id 1 entry. (DELETE)
-- setup
CREATE TABLE mrbs_entry (id int auto_increment primary key, room_id int, start_time varchar(255), name varchar(255));
INSERT INTO mrbs_entry (room_id, start_time, name) VALUES (2, '13:00', 'Meeting');
INSERT INTO mrbs_entry (room_id, start_time, name) VALUES (2, '15:00', 'Meeting');
INSERT INTO mrbs_entry (room_id, start_time, name) VALUES (3, '15:00', 'Meeting');
INSERT INTO mrbs_entry (room_id, start_time, name) VALUES (3, '13:00', 'Storytime');
INSERT INTO mrbs_entry (room_id, start_time, name) VALUES (2, '18:00', 'Test');
INSERT INTO mrbs_entry (room_id, start_time, name) VALUES (3, '18:00', 'Test');
-- step 1: "convert" reservations that book all 'sub-rooms' into a 'full-room' reservation
INSERT `mrbs_entry` (room_id, start_time, name)
SELECT 1, -- full room
start_time,
name
FROM `mrbs_entry` new
WHERE room_id IN (2, 3) -- reserved left and/or right side
-- safety check, avoid inserting the value if it already exists!
AND NOT EXISTS ( SELECT *
FROM `mrbs_entry` old
WHERE old.room_id = 1
AND old.start_time = new.start_time
AND old.name = new.name )
-- must have reserved both sides of the room
GROUP BY start_time,
name
HAVING COUNT(*) = 2;
-- step 2: get rid of all 'sub-room' reservations in case there already is a 'full-room' reservation
DELETE del
FROM `mrbs_entry` del
JOIN `mrbs_entry` fr -- WHERE exists a room_id 1 for same time & name
ON fr.room_id = 1
AND fr.start_time = del.start_time
AND fr.name = del.name
WHERE del.room_id IN (2, 3); -- get rid of half-rooms
PS: I'd have preferred to use the WHERE EXISTS syntax in step 2, but couldn't get it to work right away. I will see if I can find the time later today (I usually work in T-SQL, which is slightly different).
Along the lines of Steve Chambers' solution (Method 2), I would do that in three steps within one transaction.
Test case:
CREATE TABLE mrbs_entry (id int auto_increment primary key, room_id int, start_time varchar(255), name varchar(255));
INSERT INTO mrbs_entry
(room_id, start_time, name)
VALUES
(2, '13:00', 'Meeting'),
(2, '15:00', 'Meeting'),
(3, '15:00', 'Meeting'),
(3, '13:00', 'Storytime');
Solution:
START TRANSACTION;
-- 1. Mark one "side" of the pair with temporary artificial room_id = 0.
UPDATE mrbs_entry side2
INNER JOIN mrbs_entry side3 ON side3.room_id = 3 AND
side2.start_time = side3.start_time AND
side2.name = side3.name
SET side2.room_id = 0
WHERE side2.room_id = 2;
-- 2. Delete the other "side" of the pair.
DELETE side3 FROM mrbs_entry side3
INNER JOIN mrbs_entry side0 ON side0.room_id = 0 AND
side0.start_time = side3.start_time AND
side0.name = side3.name
WHERE side3.room_id = 3;
-- 3. Reset the mark to a valid value.
UPDATE mrbs_entry
SET room_id = 1
WHERE room_id = 0;
COMMIT;
This will not work if there are constraints on room_id, but if there are none that would be the safest and most efficient solution. Even if there are constraints, room_id = 0 can be temporarily added to the list of possible values at the start of the transaction and removed before commit.
I have a table on which id is a primary key column set with auto increment. It contains over 10,00 rows.
I need to get all primary keys that have been deleted.
like
1 xcgh fct
2 xxml fcy
5 ccvb fcc
6 tylu cvn
9 vvbh cvv
The result that I should get is
3
4
7
8
Currently I count all records, insert the numbers 1 to count into another table, and then select the ids from that table that don't exist in the record table. But this method is very inefficient. Is there a direct query that I can use?
Please answer for MySQL.
See fiddle:
http://sqlfiddle.com/#!2/edf67/4/0
CREATE TABLE SomeTable (
id INT PRIMARY KEY
, mVal VARCHAR(32)
);
INSERT INTO SomeTable
VALUES (1, 'xcgh fct'),
(2, 'xxml fcy'),
(5, 'ccvb fcc'),
(6, 'tylu cvn'),
(9, 'vvbh cvv');
set @rank = (Select max(ID)+1 from sometable);
create table CompleteIDs as (Select @rank :=@rank-1 as Rank
from sometable st1, sometable st2
where @rank >1);
SELECT CompleteIDs.Rank
FROM CompleteIDs
LEFT JOIN someTable S1
on CompleteIDs.Rank = S1.ID
WHERE S1.ID is null
order by CompleteIDs.rank
There is one assumption here: the number of records in sometable multiplied by itself (the row count of the cross join) must be at least as large as the maximum ID in sometable. Otherwise this doesn't work.
You can try to create a temp table and fill it with e.g. 1,000 values. You can do that with any scripting language or with a procedure (this might not be efficient overall):
DELIMITER $$
CREATE PROCEDURE InsertRand(IN NumRows INT)
BEGIN
DECLARE i INT;
SET i = 1;
START TRANSACTION;
WHILE i <= NumRows DO
INSERT INTO temporary_table VALUES (i);
SET i = i + 1;
END WHILE;
COMMIT;
END$$
DELIMITER ;
CALL InsertRand(5);
Then you just run this query:
SELECT id AS deleted_id FROM temporary_table
WHERE id NOT IN
(SELECT id FROM main_table)
Please note that this should run as something like a daily job rather than on every request, because it's very memory inefficient.
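Where recursive common table expressions are available, the helper table of numbers can be generated inside the query itself. The following is a sketch only, run here against SQLite through Python's sqlite3 module; MySQL accepts WITH RECURSIVE from version 8.0 on, so this does not apply to older servers, and MySQL additionally caps the recursion depth through the cte_max_recursion_depth setting. The CTE name seq is a hypothetical choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SomeTable (id INTEGER PRIMARY KEY, mVal TEXT);
INSERT INTO SomeTable VALUES
    (1, 'xcgh fct'), (2, 'xxml fcy'), (5, 'ccvb fcc'),
    (6, 'tylu cvn'), (9, 'vvbh cvv');
""")

# seq produces 1..MAX(id); the LEFT JOIN keeps the numbers without a row.
missing = [row[0] for row in conn.execute("""
    WITH RECURSIVE seq(n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM seq
        WHERE n < (SELECT MAX(id) FROM SomeTable)
    )
    SELECT seq.n
    FROM seq
    LEFT JOIN SomeTable s ON s.id = seq.n
    WHERE s.id IS NULL
    ORDER BY seq.n
""")]
print(missing)  # [3, 4, 7, 8]
```

Ids deleted above the current maximum cannot be detected this way, since nothing in the table records that they ever existed.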
I saw this answer and I hope it is incorrect, just as someone was incorrect in telling me that a primary key is on one column and can't be set on multiple columns.
Here is my table
create table Users(id INT primary key AUTO_INCREMENT,
parent INT,
name TEXT NOT NULL,
FOREIGN KEY(parent)
REFERENCES Users(id)
);
+----+--------+---------+
| id | parent | name |
+----+--------+---------+
| 1 | NULL | root |
| 2 | 1 | one |
| 3 | 1 | 1down |
| 4 | 2 | one_a |
| 5 | 4 | one_a_b |
+----+--------+---------+
I'd like to select user id 2 and recurse so I get all its direct and indirect children (so ids 4 and 5).
How do I write it in such a way that this will work? I have seen recursion in PostgreSQL and SQL Server.
CREATE DEFINER = 'root'@'localhost'
PROCEDURE test.GetHierarchyUsers(IN StartKey INT)
BEGIN
-- prepare a hierarchy level variable
SET @hierlevel := 00000;
-- prepare a variable for total rows so we know when no more rows found
SET @lastRowCount := 0;
-- pre-drop temp table
DROP TABLE IF EXISTS MyHierarchy;
-- now, create it as the first level you want...
-- ie: a specific top level of all "no parent" entries
-- or parameterize the function and ask for a specific "ID".
-- add extra column as flag for next set of ID's to load into this.
CREATE TABLE MyHierarchy AS
SELECT U.ID
, U.Parent
, U.`name`
, 00 AS IDHierLevel
, 00 AS AlreadyProcessed
FROM
Users U
WHERE
U.ID = StartKey;
-- how many rows are we starting with at this tier level
-- START the cycle, only IF we found rows...
SET @lastRowCount := FOUND_ROWS();
-- we need to have a "key" for updates to be applied against,
-- otherwise our UPDATE statement will nag about an unsafe update command
CREATE INDEX MyHier_Idx1 ON MyHierarchy (IDHierLevel);
-- NOW, keep cycling through until we get no more records
WHILE @lastRowCount > 0
DO
UPDATE MyHierarchy
SET
AlreadyProcessed = 1
WHERE
IDHierLevel = @hierLevel;
-- NOW, load in all entries found from full-set NOT already processed
INSERT INTO MyHierarchy
SELECT DISTINCT U.ID
, U.Parent
, U.`name`
, @hierLevel + 1 AS IDHierLevel
, 0 AS AlreadyProcessed
FROM
MyHierarchy mh
JOIN Users U
ON mh.Parent = U.ID
WHERE
mh.IDHierLevel = @hierLevel;
-- preserve latest count of records accounted for from above query
-- now, how many actual rows DID we insert from the select query
SET @lastRowCount := ROW_COUNT();
-- only mark the LOWER level we just joined against as processed,
-- and NOT the new records we just inserted
UPDATE MyHierarchy
SET
AlreadyProcessed = 1
WHERE
IDHierLevel = @hierLevel;
-- now, update the hierarchy level
SET @hierLevel := @hierLevel + 1;
END WHILE;
-- return the final set now
SELECT *
FROM
MyHierarchy;
-- and we can clean-up after the query of data has been selected / returned.
-- drop table if exists MyHierarchy;
END
It might appear cumbersome, but to use this, do
call GetHierarchyUsers( 5 );
(or whatever key ID you want to find UP the hierarchical tree for).
The premise is to start with the one KEY you are working with. Then, use that as a basis to join to the users table AGAIN, but based on the first entry's PARENT ID. Once found, update the temp table as to not try and join for that key again on the next cycle. Then keep going until no more "parent" ID keys can be found.
This will return the entire hierarchy of records up to the top-level parent, no matter how deep the nesting. However, if you only want the FINAL parent, you can use the @hierlevel variable to return only the row added last, or use ORDER BY and LIMIT 1.
I know there is probably a better and more efficient answer above, but this snippet gives a slightly different approach and provides both ancestors and children.
The idea is to keep inserting the ids of relatives into a temporary table, then fetch a row to look for its relatives, rinse and repeat until all rows are processed. The query can probably be optimized to use only one temporary table.
Here is a working sqlfiddle example.
CREATE TABLE Users
(`id` int, `parent` int,`name` VARCHAR(10))//
INSERT INTO Users
(`id`, `parent`, `name`)
VALUES
(1, NULL, 'root'),
(2, 1, 'one'),
(3, 1, '1down'),
(4, 2, 'one_a'),
(5, 4, 'one_a_b')//
CREATE PROCEDURE getAncestors (in ParRowId int)
BEGIN
DECLARE tmp_parentId int;
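-- drop leftovers so the procedure can be called more than once per session
DROP TEMPORARY TABLE IF EXISTS tmp;
DROP TEMPORARY TABLE IF EXISTS results;
CREATE TEMPORARY TABLE tmp (parentId INT NOT NULL);
CREATE TEMPORARY TABLE results (parentId INT NOT NULL);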
INSERT INTO tmp SELECT ParRowId;
WHILE (SELECT COUNT(*) FROM tmp) > 0 DO
SET tmp_parentId = (SELECT MIN(parentId) FROM tmp);
DELETE FROM tmp WHERE parentId = tmp_parentId;
INSERT INTO results SELECT parent FROM Users WHERE id = tmp_parentId AND parent IS NOT NULL;
INSERT INTO tmp SELECT parent FROM Users WHERE id = tmp_parentId AND parent IS NOT NULL;
END WHILE;
SELECT * FROM Users WHERE id IN (SELECT * FROM results);
END//
CREATE PROCEDURE getChildren (in ParRowId int)
BEGIN
DECLARE tmp_childId int;
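-- drop leftovers so the procedure can be called more than once per session
DROP TEMPORARY TABLE IF EXISTS tmp;
DROP TEMPORARY TABLE IF EXISTS results;
CREATE TEMPORARY TABLE tmp (childId INT NOT NULL);
CREATE TEMPORARY TABLE results (childId INT NOT NULL);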
INSERT INTO tmp SELECT ParRowId;
WHILE (SELECT COUNT(*) FROM tmp) > 0 DO
SET tmp_childId = (SELECT MIN(childId) FROM tmp);
DELETE FROM tmp WHERE childId = tmp_childId;
INSERT INTO results SELECT id FROM Users WHERE parent = tmp_childId;
INSERT INTO tmp SELECT id FROM Users WHERE parent = tmp_childId;
END WHILE;
SELECT * FROM Users WHERE id IN (SELECT * FROM results);
END//
Usage:
CALL getChildren(2);
-- returns
id parent name
4 2 one_a
5 4 one_a_b
CALL getAncestors(5);
-- returns
id parent name
1 (null) root
2 1 one
4 2 one_a
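For comparison, on engines that support recursive common table expressions the children lookup fits into a single query. This is a sketch under assumptions, executed against SQLite through Python's sqlite3 module; MySQL only accepts WITH RECURSIVE from version 8.0 on, so it is not an option for the older servers discussed here. The function name descendants and the CTE name sub are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (
    id     INTEGER PRIMARY KEY,
    parent INTEGER REFERENCES Users(id),
    name   TEXT NOT NULL
);
INSERT INTO Users (id, parent, name) VALUES
    (1, NULL, 'root'),
    (2, 1, 'one'),
    (3, 1, '1down'),
    (4, 2, 'one_a'),
    (5, 4, 'one_a_b');
""")

def descendants(conn, start_id):
    """Return all direct and indirect children of start_id, ordered by id."""
    return conn.execute("""
        WITH RECURSIVE sub(id, parent, name) AS (
            -- anchor: the direct children of the start row
            SELECT id, parent, name FROM Users WHERE parent = ?
            UNION ALL
            -- recursive step: children of rows found so far
            SELECT u.id, u.parent, u.name
            FROM Users u
            JOIN sub ON u.parent = sub.id
        )
        SELECT id, parent, name FROM sub ORDER BY id
    """, (start_id,)).fetchall()

print(descendants(conn, 2))  # [(4, 2, 'one_a'), (5, 4, 'one_a_b')]
```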
I have a table that contains computer login and logoff events. Each row is a separate event with a timestamp, machine name, login or logoff event code and other details. I need to create a SQL procedure that goes through this table and locates corresponding login and logoff event and insert new rows into another table that contain the machine name, login time, logout time and duration time.
So, should I use a cursor to do this or is there a better way to go about this? The database is pretty huge so efficiency is certainly a concern. Any suggested pseudo code would be great as well.
[edit : pulled from comment]
Source table:
History (
mc_id
, hs_opcode
, hs_time
)
Existing data interpretation:
Login_Event = unique mc_id, hs_opcode = 1, and hs_time is the timestamp
Logout_Event = unique mc_id, hs_opcode = 2, and hs_time is the timestamp
First, your query will be simpler (and faster) if you can order the data in such a way that you don't need a complex subquery to pair up the rows. Since MySQL (before 8.0) doesn't support CTEs to do this on the fly, you'll need to create a working table:
CREATE TABLE history_ordered (
seq INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
hs_id INT,
mc_id VARCHAR(255),
mc_loggedinuser VARCHAR(255),
hs_time DATETIME,
hs_opcode INT
);
Then, pull and sort from your original table into the new table:
INSERT INTO history_ordered (
hs_id, mc_id, mc_loggedinuser,
hs_time, hs_opcode)
SELECT
hs_id, mc_id, mc_loggedinuser,
hs_time, hs_opcode
FROM history ORDER BY mc_id, hs_time;
You can now use this query to correlate the data:
SELECT li.mc_id,
li.mc_loggedinuser,
li.hs_time as login_time,
lo.hs_time as logout_time
FROM history_ordered AS li
JOIN history_ordered AS lo
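ON lo.seq = li.seq + 1
AND lo.mc_id = li.mc_id -- the next row must belong to the same machine
AND li.hs_opcode = 1
AND lo.hs_opcode = 2; -- and it must be a logout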
For future inserts, you can use a trigger like below to keep your duration table updated automatically:
DELIMITER $$
CREATE TRIGGER `match_login` AFTER INSERT ON `history`
FOR EACH ROW
BEGIN
-- DECLARE is only allowed at the start of a BEGIN ... END block
DECLARE _user VARCHAR(255);
DECLARE _login DATETIME;
IF NEW.hs_opcode = 2 THEN
SELECT mc_loggedinuser, hs_time FROM history
WHERE hs_time = (
SELECT MAX(hs_time) FROM history
WHERE hs_opcode = 1
AND mc_id = NEW.mc_id
) INTO _user, _login;
INSERT INTO login_duration
SET machine = NEW.mc_id,
logout = NEW.hs_time,
user = _user,
login = _login;
END IF;
END$$
DELIMITER ;
CREATE TABLE dummy (fields you'll select data into, + additional fields as needed)
INSERT INTO dummy (columns from your source)
SELECT * FROM <all the tables where you need data for your target data set>
UPDATE dummy SET col1 = CASE WHEN this = this THEN that, etc
INSERT INTO targetTable
SELECT all columns FROM dummy
Without seeing the code you're working on, it's hard to tell whether this approach will be useful. There may be some instances when you really need to loop through things, and others when this approach can be used instead.
[EDIT: based on poster's comment]
Can you try executing this and see if you get the desired results?
INSERT INTO <your_target_table_here_with_the_three_columns_required>
SELECT li.mc_id, li.hs_time AS login_time, lo.hs_time AS logout_time
FROM
history AS li
INNER JOIN history AS lo
ON li.mc_id = lo.mc_id
AND li.hs_opcode = 1
AND lo.hs_opcode = 2
AND lo.hs_time = (
SELECT min(hs_time) AS hs_time
FROM history
WHERE hs_time > li.hs_time
AND mc_id = li.mc_id
)
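To see what the set-based pairing produces, here is a small runnable sketch. It assumes SQLite through Python's sqlite3 module rather than MySQL, made-up sample rows, and a hypothetical login_duration table with a minutes column; in MySQL the duration would more naturally be computed with TIMESTAMPDIFF.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE history (mc_id TEXT, hs_opcode INTEGER, hs_time TEXT);
INSERT INTO history VALUES
    ('pc1', 1, '2024-01-01 08:00:00'),
    ('pc2', 1, '2024-01-01 08:30:00'),
    ('pc1', 2, '2024-01-01 09:15:00'),
    ('pc2', 2, '2024-01-01 12:30:00'),
    ('pc1', 1, '2024-01-01 10:00:00');  -- still logged in, no logout yet

-- hypothetical target table
CREATE TABLE login_duration (machine TEXT, login TEXT, logout TEXT, minutes INTEGER);

-- Pair each login with the very next event of the same machine,
-- and keep the pair only if that event is a logout.
INSERT INTO login_duration
SELECT li.mc_id, li.hs_time, lo.hs_time,
       CAST(ROUND((julianday(lo.hs_time) - julianday(li.hs_time)) * 1440) AS INTEGER)
FROM history AS li
JOIN history AS lo
  ON lo.mc_id = li.mc_id
 AND lo.hs_opcode = 2
 AND lo.hs_time = (SELECT MIN(hs_time)
                   FROM history
                   WHERE mc_id = li.mc_id
                     AND hs_time > li.hs_time)
WHERE li.hs_opcode = 1;
""")

sessions = conn.execute(
    "SELECT machine, login, logout, minutes FROM login_duration ORDER BY machine, login"
).fetchall()
for row in sessions:
    print(row)
```

The open session of pc1 at 10:00 produces no row, which is usually what is wanted for a duration table. On a large history table an index on (mc_id, hs_time) is likely to matter for the correlated subquery.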
I'm working on a pair comparison site where a user loads a list of films and grades from another site. My site then picks two random movies and matches them against each other, the user selects the better of the two and a new pair is loaded. This gives a complete list of movies ordered by whichever is best.
The database contains three tables;
fm_film_data - this contains all imported movies
fm_film_data(id int(11),
imdb_id varchar(10),
tmdb_id varchar(10),
title varchar(255),
original_title varchar(255),
year year(4),
director text,
description text,
poster_url varchar(255))
fm_films - this contains all information related to a user, what movies the user has seen, what grades the user has given, as well as information about each film's wins/losses for that user.
fm_films(id int(11),
user_id int(11),
film_id int(11),
grade int(11),
wins int(11),
losses int(11))
fm_log - this contains records of every duel that has occurred.
fm_log(id int(11),
user_id int(11),
winner int(11),
loser int(11))
To pick a pair to show the user, I've created a MySQL query that checks the log and picks a pair at random.
SELECT pair.id1, pair.id2
FROM
(SELECT part1.id AS id1, part2.id AS id2
FROM fm_films AS part1, fm_films AS part2
WHERE part1.id <> part2.id
AND part1.user_id = [!!USERID!!]
AND part2.user_id = [!!USERID!!])
AS pair
LEFT JOIN
(SELECT winner AS id1, loser AS id2
FROM fm_log
WHERE fm_log.user_id = [!!USERID!!]
UNION
SELECT loser AS id1, winner AS id2
FROM fm_log
WHERE fm_log.user_id = [!!USERID!!])
AS log
ON pair.id1 = log.id1 AND pair.id2 = log.id2
WHERE log.id1 IS NULL
ORDER BY RAND()
LIMIT 1
This query takes some time to load, about 6 seconds in our tests with two users with about 800 grades each.
I'm looking for a way to optimize this but still limit all duels to appear only once.
The server runs MySQL version 5.0.90-community.
I think you are better off creating a stored procedure/function which returns a pair as soon as it finds a valid one.
Make sure there are proper indexes:
fm_films.user_id (try including the film_id also)
fm_log.user_id (try including the winner and loser)
DELIMITER $$
DROP PROCEDURE IF EXISTS spu_findPair$$
CREATE PROCEDURE spu_findPair
(
IN vUserID INT
)
BEGIN
DECLARE done BOOLEAN DEFAULT FALSE;
DECLARE vLastFilmID INT;
DECLARE vCurFilmID INT;
DECLARE cUserFilms CURSOR FOR
SELECT id
FROM fm_films
WHERE user_id = vUserID
ORDER BY RAND();
DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET done=TRUE;
OPEN cUserFilms;
ufLoop: LOOP
FETCH cUserFilms INTO vCurFilmID;
IF done THEN
CLOSE cUserFilms;
LEAVE ufLoop;
END IF;
IF vLastFilmID IS NOT NULL THEN
IF NOT EXISTS
(
SELECT 1
FROM fm_log
WHERE user_id = vUserID
AND ((winner = vCurFilmID AND loser = vLastFilmID) OR (winner = vLastFilmID AND loser = vCurFilmID))
) THEN
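#output (must run before leaving the loop)
SELECT vLastFilmID, vCurFilmID;
CLOSE cUserFilms;
LEAVE ufLoop;
END IF;
END IF;
-- remember this film so the next fetched film is paired with it
SET vLastFilmID = vCurFilmID;
END LOOP;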
END$$
DELIMITER ;
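The logic of the procedure, stripped of the cursor handling, is: walk the user's films in random order and return the first two neighbouring films that have not met yet. A minimal sketch of that logic in plain Python follows; the function name find_pair and its arguments are hypothetical, and the list and set stand in for the fm_films and fm_log rows of one user.

```python
import random

def find_pair(film_ids, logged_pairs, rng=random):
    """Return two film ids that have not duelled yet, or None.

    film_ids:     ids of the user's films
    logged_pairs: iterable of (winner, loser) tuples already played
    rng:          random source providing shuffle(), e.g. random.Random(42)
    """
    # Store pairs without direction so (a, b) and (b, a) count as the same duel.
    seen = {frozenset(pair) for pair in logged_pairs}
    order = list(film_ids)
    rng.shuffle(order)
    # Compare each film with the one fetched just before it.
    for last, cur in zip(order, order[1:]):
        if frozenset((last, cur)) not in seen:
            return last, cur
    return None

print(find_pair([10, 11, 12, 13], [(10, 11), (12, 10)]))
```

Like the procedure, this only looks at neighbours in one random ordering, so it can come back empty even though an unplayed pair exists; a caller would then retry or fall back to the exhaustive query.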
Have you tried applying any indexes to the tables?
The user_id columns would be a good start. The id field that is also used in the WHERE clause would be another index that might be worth adding.
Benchmark to make sure that adding the indexes does result in speedups and does not slow other code (e.g. insertions).
However, I have found that simple indexes on short tables like these can still result in some huge speed ups when they apply to fields in the WHERE clauses of SELECT and UPDATE statements.