Optimization of SQL query regarding pair comparisons - mysql

I'm working on a pair comparison site where a user loads a list of films and grades from another site. My site then picks two random movies and matches them against each other; the user selects the better of the two and a new pair is loaded. Over time this produces a complete list of movies ordered from best to worst.
The database contains three tables:
fm_film_data - this contains all imported movies
fm_film_data(id int(11),
imdb_id varchar(10),
tmdb_id varchar(10),
title varchar(255),
original_title varchar(255),
year year(4),
director text,
description text,
poster_url varchar(255))
fm_films - this contains all information related to a user, what movies the user has seen, what grades the user has given, as well as information about each film's wins/losses for that user.
fm_films(id int(11),
user_id int(11),
film_id int(11),
grade int(11),
wins int(11),
losses int(11))
fm_log - this contains records of every duel that has occurred.
fm_log(id int(11),
user_id int(11),
winner int(11),
loser int(11))
To pick a pair to show the user, I've created a MySQL query that checks the log and picks, at random, a pair that has not been matched before.
SELECT pair.id1, pair.id2
FROM
(SELECT part1.id AS id1, part2.id AS id2
FROM fm_films AS part1, fm_films AS part2
WHERE part1.id <> part2.id
AND part1.user_id = [!!USERID!!]
AND part2.user_id = [!!USERID!!])
AS pair
LEFT JOIN
(SELECT winner AS id1, loser AS id2
FROM fm_log
WHERE fm_log.user_id = [!!USERID!!]
UNION
SELECT loser AS id1, winner AS id2
FROM fm_log
WHERE fm_log.user_id = [!!USERID!!])
AS log
ON pair.id1 = log.id1 AND pair.id2 = log.id2
WHERE log.id1 IS NULL
ORDER BY RAND()
LIMIT 1
This query takes some time to load, about 6 seconds in our tests with two users with about 800 grades each.
I'm looking for a way to optimize this but still limit all duels to appear only once.
The server runs MySQL version 5.0.90-community.

I think you are better off creating a stored procedure that returns a pair as soon as it finds a valid one.
Make sure there are proper indexes:
fm_films.user_id (try including film_id as well)
fm_log.user_id (try including winner and loser)
DELIMITER $$
DROP PROCEDURE IF EXISTS spu_findPair$$
CREATE PROCEDURE spu_findPair
(
IN vUserID INT
)
BEGIN
DECLARE done BOOLEAN DEFAULT FALSE;
DECLARE vLastFilmID INT;
DECLARE vCurFilmID INT;
DECLARE cUserFilms CURSOR FOR
SELECT id
FROM fm_films
WHERE user_id = vUserID
ORDER BY RAND();
DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET done=TRUE;
OPEN cUserFilms;
ufLoop: LOOP
FETCH cUserFilms INTO vCurFilmID;
IF done THEN
LEAVE ufLoop;
END IF;
IF vLastFilmID IS NOT NULL THEN
IF NOT EXISTS
(
SELECT 1
FROM fm_log
WHERE user_id = vUserID
AND ((winner = vCurFilmID AND loser = vLastFilmID) OR (winner = vLastFilmID AND loser = vCurFilmID))
) THEN
-- output the pair before leaving; statements after LEAVE never run
SELECT vLastFilmID, vCurFilmID;
LEAVE ufLoop;
END IF;
END IF;
-- remember this film so the next fetched film is compared with it
SET vLastFilmID = vCurFilmID;
END LOOP;
CLOSE cUserFilms;
END$$
DELIMITER ;
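The early-exit idea behind the procedure can also be tried outside MySQL. The following is only a sketch, assuming Python with an in-memory SQLite database standing in for the real server; find_unplayed_pair, demo_connection, the index names and the sample rows are all hypothetical.

```python
import random
import sqlite3


def find_unplayed_pair(conn, user_id, rng=random):
    """Return (id1, id2) for two of the user's films that have not duelled, or None."""
    film_ids = [row[0] for row in conn.execute(
        "SELECT id FROM fm_films WHERE user_id = ?", (user_id,))]
    rng.shuffle(film_ids)  # plays the role of ORDER BY RAND()
    # Walk neighbouring films in the shuffled order and stop at the first new pair.
    for last_id, cur_id in zip(film_ids, film_ids[1:]):
        played = conn.execute(
            "SELECT 1 FROM fm_log WHERE user_id = ? AND "
            "((winner = ? AND loser = ?) OR (winner = ? AND loser = ?)) LIMIT 1",
            (user_id, cur_id, last_id, last_id, cur_id)).fetchone()
        if played is None:
            return last_id, cur_id
    return None


def demo_connection():
    """Build a tiny in-memory database (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE fm_films (id INTEGER PRIMARY KEY, user_id INT, film_id INT);
        CREATE TABLE fm_log (id INTEGER PRIMARY KEY, user_id INT, winner INT, loser INT);
        CREATE INDEX idx_films_user ON fm_films (user_id, film_id);
        CREATE INDEX idx_log_user ON fm_log (user_id, winner, loser);
        INSERT INTO fm_films (id, user_id, film_id)
            VALUES (1, 7, 101), (2, 7, 102), (3, 7, 103);
        INSERT INTO fm_log (user_id, winner, loser) VALUES (7, 1, 2);
    """)
    return conn
```

Like the procedure, this only tests neighbouring films in one random order, so it can return None while some unplayed pairs still exist; reshuffling and retrying is one possible way around that.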

Have you tried applying any indexes to the tables?
The user_id columns would be a good start. The id field that is also used in the WHERE clause would be another index that might be worth adding.
Benchmark to make sure that adding the indexes does result in speedups and does not slow down other code (e.g. insertions).
However, I have found that simple indexes on short tables like these can still result in some huge speed ups when they apply to fields in the WHERE clauses of SELECT and UPDATE statements.

Related

Return two separate queries in a stored procedure with MySQL

I'm trying to write a stored procedure in MySQL that will query a table based on the employee's department number. If the department number is present in the table, it should return only those employees belonging to that department. On the other hand, if it's not present in the table, then I want to return all records from the table.
However, I'm not sure how to accomplish this.
Table Schema:
empNo    empName    salary    deptNo
number   name       salary    dept number
My stored procedure so far:
DELIMITER //
CREATE PROCEDURE GetEmpData(
IN deptNum INT
)
BEGIN
IF deptNum THEN
SELECT * FROM empdemo
WHERE deptNo = deptNum;
ELSE
SELECT * FROM empdemo;
END IF;
END //
I used the if else because I didn't know of any other way to break this query down. If anyone has any other suggestions please post them!
It seems that you need this query, which returns the matching department's rows when there are any and every row otherwise:
SELECT *
FROM empdemo
WHERE empdemo.deptNo = deptNum
OR NOT EXISTS ( SELECT NULL
FROM empdemo
WHERE empdemo.deptNo = deptNum )
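To see the fallback behave as described, here is a small sketch that runs the same WHERE ... OR NOT EXISTS pattern against an in-memory SQLite table from Python. SQLite is only a stand-in for MySQL here, and employees_for, demo_connection and the three sample employees are made up for the illustration.

```python
import sqlite3

QUERY = """
SELECT empNo, empName, deptNo
FROM empdemo
WHERE deptNo = :dept
   OR NOT EXISTS (SELECT 1 FROM empdemo WHERE deptNo = :dept)
ORDER BY empNo
"""


def employees_for(conn, dept):
    """Rows of department dept, or every row when that department has no employees."""
    return conn.execute(QUERY, {"dept": dept}).fetchall()


def demo_connection():
    """Build a tiny in-memory table (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE empdemo (
            empNo INTEGER PRIMARY KEY, empName TEXT, salary INT, deptNo INT);
        INSERT INTO empdemo VALUES
            (1, 'Ann', 100, 10), (2, 'Bob', 120, 20), (3, 'Cy', 90, 10);
    """)
    return conn
```

With this data, employees_for(conn, 10) yields only the two employees of department 10, while a department number that does not occur in the table yields all three rows.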

Copying data in two tables with foreign keys

I have two tables, reviews and grade.
Reviews table has
id_review (primary key), id_lang, email, text etc.
Example
1, 2, email#email.com, test text
2, 2, email#email.com, test text
4, 2, email#email.com, test text
Grade table has
id_review (primary/foreign key), id_criterion, grade
1, 3, 5.00
1, 1, 4.00
2, 3, 3.00
2, 1, 5.00
4, 2, 3.00
I need to copy all the reviews with lang id 2, change the text and the lang id to 1 (this I can do manually).
But as the id_review changes with the copied reviews, I need to create new rows on the grade table, too. Is there a way to make sure that the foreign keys are matched with the copied reviews, too?
I tried to do it the old fashioned way with copy/paste on csv but as some reviews are removed from the reviews table and some reviews have differences in id_criterion count, it's very hard to do for a large table.
Or should I try to edit the table to allow the reviews table to have distinct values for id_lang with the same id_review?
You can create temporary copies of the original tables (without foreign keys), patch and validate the data until you are satisfied, and then insert it back into the original tables.
I am not sure how you populate id_review, so I assume the values are auto-generated when you insert new rows.
create table reviews_temp_20211119 as
select r.id_review as old_id_review
, 0 as new_id_review
, row_number() over(order by r.id_review) as ref_patch_id
, r.id_lang
, r.email
, r.text
from reviews r
where id_lang = 2;
create table grades_temp_20211119 as
select g.id_review
, g.id_criterion
, g.grade
, 0 as new_id_review
from grades g
where g.id_review in (select t.old_id_review from reviews_temp_20211119 t);
update reviews_temp_20211119
set id_lang = 1;
alter table reviews add column ref_patch_id bigint null;
-- insert back to original to get the auto generated id_review
-- if you use another strategy to populate id_review, you can update the temp table directly and check that all the data is correct before inserting back into the original table
insert into reviews (id_lang, email, text, ref_patch_id)
select id_lang, email, text, ref_patch_id
from reviews_temp_20211119;
update reviews_temp_20211119 t
join reviews r on (r.ref_patch_id = t.ref_patch_id)
set t.new_id_review = r.id_review;
update grades_temp_20211119 g
join reviews_temp_20211119 t on (g.id_review = t.old_id_review)
set g.new_id_review = t.new_id_review;
insert into grades (id_review, id_criterion, grade)
select t.new_id_review
, t.id_criterion
, t.grade
from grades_temp_20211119 t;
By keeping the temporary tables, you have the opportunity to review or roll back the change if something goes wrong.
For a repeatable process, I think a stored procedure with cursors is the way to go. Here's my version; it accepts two parameters, the old idLang you wish to copy and the new idLang:
CREATE PROCEDURE copyReviewWithNewLang(IN oldidLang INT, IN newidLang INT)
BEGIN
DECLARE done BOOLEAN DEFAULT FALSE;
DECLARE c_idReview, c_maxIdReview INT;
-- column types are assumptions; match them to the reviews table
DECLARE c_email VARCHAR(255);
DECLARE c_text TEXT;
DECLARE old_c_idreview INT DEFAULT 0;
-- first cursor gets all the review rows of the old language, ordered
DECLARE rev_cur CURSOR FOR SELECT id_review, email, text FROM reviews WHERE id_lang = oldidLang ORDER BY id_review ASC;
-- second cursor gets the highest id_review
DECLARE maxid_cur CURSOR FOR SELECT MAX(id_review) FROM reviews;
-- needed for ending the loop on end of retrieved data
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
OPEN rev_cur;
retrieving: LOOP
FETCH rev_cur INTO c_idReview, c_email, c_text;
-- ending the loop
IF done THEN
LEAVE retrieving;
END IF;
IF (old_c_idreview = 0) OR (old_c_idreview != c_idReview) THEN
OPEN maxid_cur;
FETCH maxid_cur INTO c_maxIdReview;
CLOSE maxid_cur;
SET c_maxIdReview = c_maxIdReview + 1;
END IF;
-- copying the review row
INSERT INTO reviews (id_review, id_lang, email, text)
VALUES(c_maxIdReview, newidLang, c_email, c_text);
-- copying the grade rows
INSERT INTO grades (id_review, id_criterion, grade)
SELECT c_maxIdReview, id_criterion, grade FROM grades
WHERE id_review = c_idReview;
-- needed for checking if id changed
SET old_c_idreview = c_idReview;
END LOOP;
CLOSE rev_cur;
END;
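Both answers come down to remembering which new id_review each old one was copied to, so the grades can follow. A minimal sketch of that mapping, assuming Python with an in-memory SQLite database in place of MySQL and an auto-generated id_review; copy_reviews, demo_connection and the sample rows are hypothetical.

```python
import sqlite3


def copy_reviews(conn, old_lang, new_lang):
    """Copy reviews of old_lang to new_lang with their grades; return {old_id: new_id}."""
    id_map = {}
    old_rows = conn.execute(
        "SELECT id_review, email, text FROM reviews "
        "WHERE id_lang = ? ORDER BY id_review", (old_lang,)).fetchall()
    with conn:  # one transaction: commit on success, roll back on error
        for old_id, email, text in old_rows:
            cur = conn.execute(
                "INSERT INTO reviews (id_lang, email, text) VALUES (?, ?, ?)",
                (new_lang, email, text))
            id_map[old_id] = cur.lastrowid  # auto-generated key of the copy
            # Re-point this review's grade rows at the new key.
            conn.execute(
                "INSERT INTO grades (id_review, id_criterion, grade) "
                "SELECT ?, id_criterion, grade FROM grades WHERE id_review = ?",
                (id_map[old_id], old_id))
    return id_map


def demo_connection():
    """Build the two tables with the sample rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE reviews (
            id_review INTEGER PRIMARY KEY, id_lang INT, email TEXT, text TEXT);
        CREATE TABLE grades (
            id_review INT REFERENCES reviews (id_review),
            id_criterion INT, grade REAL,
            PRIMARY KEY (id_review, id_criterion));
        INSERT INTO reviews VALUES
            (1, 2, 'a@example.com', 'test text'),
            (2, 2, 'a@example.com', 'test text'),
            (4, 2, 'a@example.com', 'test text');
        INSERT INTO grades VALUES
            (1, 3, 5.0), (1, 1, 4.0), (2, 3, 3.0), (2, 1, 5.0), (4, 2, 3.0);
    """)
    return conn
```

Gaps in id_review and differing numbers of criteria per review do not matter here, because each grade row is copied through its own review's entry in the mapping.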

Optimize while loop

I am using this procedure that I created to create a member's downline.
PROCEDURE get_downline(IN id INT)
BEGIN
declare cur_depth int default 1;
-- Create the structure of the final table
drop temporary table if exists tmp_downline;
create temporary table tmp_downline (
member_id int unsigned,
referrer_id int unsigned,
depth tinyint unsigned
);
-- Create a table for the previous list of users
drop temporary table if exists tmp_members;
create temporary table tmp_members(
member_id int unsigned
);
-- Make a duplicate of tmp_members so we can select on both
drop temporary table if exists tmp_members2;
create temporary table tmp_members2(
member_id int unsigned
);
-- Create the level 1 downline
insert into tmp_downline select id, member_id, cur_depth from members where referrer_id = id;
-- Add those members into the tmp table
insert into tmp_members select member_id from members where referrer_id = id;
myLoop: while ((select count(*) from tmp_members) > 0) do
-- Set next level of users
set cur_depth = cur_depth + 1;
-- Insert next level of users into table
insert into tmp_downline select id, member_id, cur_depth from members where referrer_id in(select member_id from tmp_members);
-- Re-fill duplicate temporary table
truncate table tmp_members2;
insert into tmp_members2 select member_id from tmp_members;
-- Reset the default temporary table
truncate table tmp_members;
insert into tmp_members select member_id from members where referrer_id in(select member_id from tmp_members2);
end while;
-- Get the final list of results
select * from tmp_downline order by depth;
END
Here are my results:
Found rows: 424,097; Duration for 1 query: 12.438 sec.
All the queries look like they are using optimized indexes, but it is still taking a while to run. Is there a better way to run my while loop? I feel that making 2 temporary tables might be part of the issue, but when running my last insert query I can't reopen the temporary table which is why I made a duplicate table.
Here is a slimmed down version of the original table (original has 50 cols):
CREATE TABLE `members` (
`member_id` INT(11) NOT NULL AUTO_INCREMENT,
`referrer_id` INT(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`member_id`),
INDEX `referrer_id_idx` (`referrer_id`)
);
What I am trying to achieve is to get an MLM downline.
Here is a picture that shows a downline where the number shows the level and you are the main circle at the top.
Level 1: People you referred to the program
Level 2: People your referrals referred to the program
Level 3: People your referrals' referrals referred to the program
Level 4: ...
Level 5: ....
Level ........
Re-evaluating select count(*) from tmp_members on every iteration is expensive and unnecessary. Instead, check how many rows the previous insert added and stop when it is zero: inside a stored procedure that is ROW_COUNT() (mysql_affected_rows() is the equivalent in the C client API). Doing a proper join in the insert ... select instead of in (select ...) will likely result in better plans.
You've not shown the structure of the members table or the EXPLAIN plans for the queries, so it's impossible to say if there's any scope for optimising there, but you'd probably get a big performance uplift by using a proper graph database rather than a relational one for this task. Or maintain a denormalised view of the results, populated when you add a new record to the base table. Or use a tree representation better suited to subtree queries, such as nested sets or a closure table; a referrer_id column on each member is already an adjacency list.
I think you might be able to do away with the second tmp_members table...
PROCEDURE get_downline(IN id INT)
BEGIN
declare cur_depth int default 1;
-- Create the structure of the final table
drop temporary table if exists tmp_downline;
create temporary table tmp_downline (
member_id int unsigned,
referrer_id int unsigned,
depth tinyint unsigned
);
-- Create a table for the previous list of users
drop temporary table if exists tmp_members;
create temporary table tmp_members(
member_id int unsigned
);
-- Create the level 1 downline
insert into tmp_downline
select id, member_id, cur_depth
from members
where referrer_id = id
;
myLoop: while (row_count() > 0) do
-- Load tmp_members with previously added referred members
-- to use as the next potential referrers
truncate tmp_members;
insert into tmp_members
select referrer_id
from tmp_downline
where depth = cur_depth
;
-- Set next level of users
set cur_depth = cur_depth + 1;
-- Insert next level of users into table
insert into tmp_downline
select id, m.member_id, cur_depth
from members AS m
INNER JOIN tmp_members AS t ON m.referrer_id = t.member_id
;
end while;
-- Get the final list of results
select * from tmp_downline order by depth;
END
In this case, tmp_members could be more appropriately named tmp_candidate_referrers, but I left it as it was to show how little of a change it was.
Also, for this version, you might consider adding indexes on tmp_downline.depth and tmp_members.member_id since they are used repeatedly.
--- Edit ---
I made a correction in the first insert within the loop above. referrer_id should be selected from tmp_downline, not member_id; which brings me to a question...
Why bother with the tmp_downline.member_id field when it always contains the value of the argument passed?
If you need it in the final results, you can always finish with
SELECT id AS member_id, referrer_id, depth FROM tmp_downline ORDER BY depth;
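The stop condition "keep going while the last insert added rows" can be tried in isolation. This is a rough sketch only, assuming Python with an in-memory SQLite database instead of MySQL; get_downline here is a Python function rather than the stored procedure, and demo_connection with its six members is invented for the example. SQLite lets one statement read and extend the same temporary table, whereas the question reports that MySQL could not reopen a temporary table that way, which is why the procedures above copy the previous level into tmp_members first.

```python
import sqlite3


def get_downline(conn, root_id):
    """Return [(member_id, depth), ...] for everyone below root_id, level by level."""
    conn.executescript("""
        DROP TABLE IF EXISTS temp.tmp_downline;
        CREATE TEMP TABLE tmp_downline (member_id INT, depth INT);
    """)
    depth = 1
    # Level 1: direct referrals of the root member.
    added = conn.execute(
        "INSERT INTO tmp_downline "
        "SELECT member_id, ? FROM members WHERE referrer_id = ?",
        (depth, root_id)).rowcount
    while added > 0:  # stop as soon as a level adds nobody
        depth += 1
        # Next level: members referred by anyone found on the previous level.
        added = conn.execute(
            "INSERT INTO tmp_downline "
            "SELECT m.member_id, ? FROM members AS m "
            "JOIN tmp_downline AS t ON m.referrer_id = t.member_id "
            "WHERE t.depth = ?",
            (depth, depth - 1)).rowcount
    return conn.execute(
        "SELECT member_id, depth FROM tmp_downline "
        "ORDER BY depth, member_id").fetchall()


def demo_connection():
    """Build a small referral tree (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE members (
            member_id INTEGER PRIMARY KEY, referrer_id INT NOT NULL DEFAULT 0);
        CREATE INDEX referrer_id_idx ON members (referrer_id);
        INSERT INTO members VALUES (1, 0), (2, 1), (3, 1), (4, 2), (5, 4), (6, 0);
    """)
    return conn
```

The sketch assumes the referral data forms a tree; a cycle in referrer_id would keep the loop running, so real data may need a depth limit.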

Storing counts in variable, inserting into rows, and using IF statements with MySQL

You can use an IF statement in a MySQL stored procedure to control the actions that are taken.
The syntax is as follows:
IF condition THEN ... do something ... END IF;
The condition in your IF statements can be similar to the conditions in CASE statements. For additional information, check out MySQL's web site.
You can also retrieve the auto increment ID after you perform an insert. Just use the following query:
SELECT LAST_INSERT_ID() into artist_id;
For this project, create a new procedure called AddAlbum. It will take at least two parameters - NameOfArtist and AlbumName (just like in the last project).
Your procedure needs to check the Artists table to see if NameOfArtist exists. If it does not exist, you'll need to add a new row. Here are the steps:
Use the COUNT aggregate on Artists to see how many rows exist for NameOfArtist. Store the count in a variable called artist_count.
If artist_count is zero, insert a new row into Artists.
-- Select the LAST_INSERT_ID() into a variable.
If artist_count is one, look up the ArtistID and store it in a variable.
Insert a new row into the Albums table.
code:
CREATE PROCEDURE AddAlbum(
NameOfArtist varchar(50),
AlbumName varchar(50)
);
BEGIN
DECLARE artist_count INT;
DECLARE artist_id INT;
SELECT COUNT(ArtistName) INTO artist_count FROM Artists
WHERE ArtistName = NameOfArtist;
IF artist_count = 0
THEN SELECT LAST_INSERT_ID(NameOfArtist) INTO artist_id
AND INSERT INTO Artists (ArtistName)
VALUES (NameOfArtist)
END IF;
IF artist_count = 1
THEN SELECT ArtistID INTO artist_id
FROM Artists
WHERE ArtistName = NameOfArtist
END IF;
INSERT INTO Albums (ArtistID, Title)
VALUES (artist_id, AlbumName);
END;
//
This is what I'm trying for the class I'm currently taking. Honestly, I don't understand how to write the code very well. I understand exactly what needs to be done, but for some reason my brain is just not processing how to write the code correctly.
I'm fairly certain that I have a lot of errors. I've been working on it for quite some time and just can't seem to get it. I've been rethinking this database admin course I'm taking, but I'm not going to give up.
Is there anyone who can assist me with this?
If you need to follow the exact steps, this would be your code:
DELIMITER //
CREATE PROCEDURE AddAlbum(
NameOfArtist varchar(50),
AlbumName varchar(50)
)
BEGIN
DECLARE artist_count INT;
DECLARE artist_id INT;
SELECT COUNT(ArtistName) INTO artist_count FROM Artists WHERE ArtistName = NameOfArtist;
IF artist_count = 0 THEN
INSERT INTO Artists (ArtistName) VALUES (NameOfArtist);
-- LAST_INSERT_ID() takes no argument here; it returns the id generated by the insert above
SELECT LAST_INSERT_ID() INTO artist_id;
ELSEIF artist_count = 1 THEN
SELECT ArtistID INTO artist_id FROM Artists WHERE ArtistName = NameOfArtist;
END IF;
INSERT INTO Albums (ArtistID, Title) VALUES (artist_id, AlbumName);
END //
DELIMITER ;
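If it helps to see the same steps outside a stored procedure, here is a minimal sketch in Python against an in-memory SQLite database. It is an illustration under assumptions, not part of the assignment: add_album, demo_connection and the table definitions are hypothetical, and cursor.lastrowid plays the part that LAST_INSERT_ID() plays in MySQL.

```python
import sqlite3


def add_album(conn, artist_name, album_name):
    """Insert an album, creating the artist row first when it does not exist yet."""
    with conn:  # commit both inserts together, roll back on error
        count = conn.execute(
            "SELECT COUNT(*) FROM Artists WHERE ArtistName = ?",
            (artist_name,)).fetchone()[0]
        if count == 0:
            cur = conn.execute(
                "INSERT INTO Artists (ArtistName) VALUES (?)", (artist_name,))
            artist_id = cur.lastrowid  # id generated by the insert above
        else:
            artist_id = conn.execute(
                "SELECT ArtistID FROM Artists WHERE ArtistName = ?",
                (artist_name,)).fetchone()[0]
        conn.execute(
            "INSERT INTO Albums (ArtistID, Title) VALUES (?, ?)",
            (artist_id, album_name))
    return artist_id


def demo_connection():
    """Build empty Artists and Albums tables (assumed column layout)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Artists (ArtistID INTEGER PRIMARY KEY, ArtistName TEXT UNIQUE);
        CREATE TABLE Albums (AlbumID INTEGER PRIMARY KEY, ArtistID INT, Title TEXT);
    """)
    return conn
```

Calling add_album twice with the same artist name creates the artist once and attaches both albums to the same ArtistID.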

Should I use cursors in my SQL procedure?

I have a table that contains computer login and logoff events. Each row is a separate event with a timestamp, machine name, login or logoff event code and other details. I need to create a SQL procedure that goes through this table and locates corresponding login and logoff event and insert new rows into another table that contain the machine name, login time, logout time and duration time.
So, should I use a cursor to do this or is there a better way to go about this? The database is pretty huge so efficiency is certainly a concern. Any suggested pseudo code would be great as well.
[edit : pulled from comment]
Source table:
History (
mc_id
, hs_opcode
, hs_time
)
Existing data interpretation:
Login_Event = unique mc_id, hs_opcode = 1, and hs_time is the timestamp
Logout_Event = unique mc_id, hs_opcode = 2, and hs_time is the timestamp
First, your query will be simpler (and faster) if you can order the data in such a way that you don't need a complex subquery to pair up the rows. Since MySQL before version 8.0 doesn't support CTEs to do this on the fly, you'll need to create a helper table:
CREATE TABLE history_ordered (
seq INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
hs_id INT,
mc_id VARCHAR(255),
mc_loggedinuser VARCHAR(255),
hs_time DATETIME,
hs_opcode INT
);
Then, pull and sort from your original table into the new table:
INSERT INTO history_ordered (
hs_id, mc_id, mc_loggedinuser,
hs_time, hs_opcode)
SELECT
hs_id, mc_id, mc_loggedinuser,
hs_time, hs_opcode
FROM history ORDER BY mc_id, hs_time;
You can now use this query to correlate the data:
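SELECT li.mc_id,
li.mc_loggedinuser,
li.hs_time as login_time,
lo.hs_time as logout_time
FROM history_ordered AS li
JOIN history_ordered AS lo
ON lo.seq = li.seq + 1
-- the next row must be a logout on the same machine
AND lo.mc_id = li.mc_id
AND lo.hs_opcode = 2
WHERE li.hs_opcode = 1;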
For future inserts, you can use a trigger like below to keep your duration table updated automatically:
DELIMITER $$
CREATE TRIGGER `match_login` AFTER INSERT ON `history`
FOR EACH ROW
BEGIN
-- DECLARE statements must come first in a BEGIN ... END block
DECLARE _user VARCHAR(255);
DECLARE _login DATETIME;
IF NEW.hs_opcode = 2 THEN
-- most recent login on the same machine
SELECT mc_loggedinuser, hs_time INTO _user, _login
FROM history
WHERE hs_opcode = 1
AND mc_id = NEW.mc_id
AND hs_time = (
SELECT MAX(hs_time) FROM history
WHERE hs_opcode = 1
AND mc_id = NEW.mc_id
)
LIMIT 1;
INSERT INTO login_duration
SET machine = NEW.mc_id,
logout = NEW.hs_time,
user = _user,
login = _login;
END IF;
END$$
DELIMITER ;
CREATE TABLE dummy (fields you'll select data into, + additional fields as needed)
INSERT INTO dummy (columns from your source)
SELECT * FROM <all the tables where you need data for your target data set>
UPDATE dummy SET col1 = CASE WHEN this = this THEN that, etc
INSERT INTO targetTable
SELECT all columns FROM dummy
Without seeing the code you're working on, it's hard to tell whether this approach will be useful. There may be some cases where you really need to loop through things, and others where this approach can be used instead.
[EDIT: based on poster's comment]
Can you try executing this and see if you get the desired results?
INSERT INTO <your_target_table_here_with_the_three_columns_required>
SELECT li.mc_id, li.hs_time AS login_time, lo.hs_time AS logout_time
FROM
history AS li
INNER JOIN history AS lo
ON li.mc_id = lo.mc_id
AND li.hs_opcode = 1
AND lo.hs_opcode = 2
AND lo.hs_time = (
SELECT min(hs_time) AS hs_time
FROM history
WHERE hs_time > li.hs_time
AND mc_id = li.mc_id
)
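A quick way to check this kind of set-based pairing on a few rows is to run it against an in-memory SQLite table from Python. The sketch below is only an approximation of the setup in the question: SQLite stands in for MySQL, hs_time is stored as text, and login_sessions, demo_connection and the sample events are hypothetical. The duration is computed in Python to keep the SQL portable.

```python
import sqlite3
from datetime import datetime

PAIR_QUERY = """
SELECT li.mc_id, li.hs_time AS login_time, lo.hs_time AS logout_time
FROM history AS li
JOIN history AS lo
  ON lo.mc_id = li.mc_id
 AND lo.hs_opcode = 2
 AND lo.hs_time = (SELECT MIN(hs_time) FROM history
                   WHERE mc_id = li.mc_id AND hs_time > li.hs_time)
WHERE li.hs_opcode = 1
ORDER BY li.mc_id, li.hs_time
"""


def login_sessions(conn):
    """Return (machine, login, logout, duration_seconds) for each matched pair."""
    sessions = []
    for mc_id, login, logout in conn.execute(PAIR_QUERY):
        delta = datetime.fromisoformat(logout) - datetime.fromisoformat(login)
        sessions.append((mc_id, login, logout, int(delta.total_seconds())))
    return sessions


def demo_connection():
    """Build a small history table (hypothetical sample events)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE history (mc_id TEXT, hs_opcode INT, hs_time TEXT);
        CREATE INDEX idx_history ON history (mc_id, hs_time);
        INSERT INTO history VALUES
            ('pc1', 1, '2024-01-01 08:00:00'),
            ('pc1', 2, '2024-01-01 09:30:00'),
            ('pc1', 1, '2024-01-01 10:00:00'),
            ('pc2', 1, '2024-01-01 08:15:00'),
            ('pc2', 2, '2024-01-01 08:45:00');
    """)
    return conn
```

The last login on pc1 has no later event, so it produces no row; whether such open sessions should be reported separately depends on the requirements.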