Batch MySQL inserts for performance in DB structure migration

I need to restructure my MySQL InnoDB database.
At the moment I have a customer table holding 3 product names.
I need to extract these names to a new product table. The product table should hold each name currently held in the customer table and be linked to the customer table via a new customer_product table. While the product names may not be unique, they don't have anything to do with each other, meaning for each customer there will need to be inserted 3 new entries into the product table and 3 new entries into the customer_product table.
So instead of this:
customer
| id | product_name_a | product_name_b | product_name_c |
I need this:
customer
| id |
customer_product
| customer_id | product_id | X3
product
| id | name | X3
I've written the following MySQL procedure that works:
BEGIN
DECLARE nbr_of_customers BIGINT(20);
DECLARE customer_count BIGINT(20);
DECLARE product_id BIGINT(20);
DECLARE customer_id BIGINT(20);
DECLARE product_name_a VARCHAR(500);
DECLARE product_name_b VARCHAR(500);
DECLARE product_name_c VARCHAR(500);
SELECT COUNT(*) FROM customer INTO nbr_of_customers;
SET customer_count = 0;
SET product_id = 1;
WHILE customer_count < nbr_of_customers DO
SELECT
customer.id,
customer.product_name_a,
customer.product_name_b,
customer.product_name_c
INTO
customer_id,
product_name_a,
product_name_b,
product_name_c
FROM customer
LIMIT customer_count,1;
INSERT INTO product(id, name)
VALUES(product_id, product_name_a);
INSERT INTO customer_product(customer_id, product_id)
VALUES(customer_id, product_id);
SET product_id = product_id + 1;
INSERT INTO product(id, name)
VALUES(product_id, product_name_b);
INSERT INTO customer_product(customer_id, product_id)
VALUES(customer_id, product_id);
SET product_id = product_id + 1;
INSERT INTO product(id, name)
VALUES(product_id, product_name_c);
INSERT INTO customer_product(customer_id, product_id)
VALUES(customer_id, product_id);
SET product_id = product_id + 1;
SET customer_count = customer_count + 1;
END WHILE;
END;
This is too slow.
I've run this locally and estimate that my ~15k customers would take ~1h to complete. My VPS server is far slower than that, so it could take upwards of 10h to complete.
The problem seems to be that the inserts take a long time. I would therefore like to store all the inserts during the procedure and execute them in a batch after the loop is complete and I know what to insert.
Is there a way to perform all the ~100k inserts in a batch to optimize performance, or is there a better way to do it?
FINAL EDIT:
I marked the accepted solution because it did an excellent job of speeding up the process massively, which was the main focus of the question. In the end I performed the migration using modified production code (in Java), because the solution does not escape the inserted strings.

First, use a cursor to process the results of a single query, rather than performing a separate query for each row.
Then concatenate the VALUES lists into strings that you execute using PREPARE and EXECUTE.
My code does the inserts in batches of 100 customers, because the size of a single statement is limited (by max_allowed_packet).
BEGIN
    DECLARE product_id BIGINT(20);
    DECLARE customer_id BIGINT(20);
    DECLARE product_name_a VARCHAR(500);
    DECLARE product_name_b VARCHAR(500);
    DECLARE product_name_c VARCHAR(500);
    DECLARE done INT DEFAULT FALSE;
    DECLARE cur CURSOR FOR SELECT c.id, c.product_name_a, c.product_name_b, c.product_name_c FROM customer AS c;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
    SET product_id = 1;
    OPEN cur;
    SET @product_values = '';
    SET @cp_values = '';
    read_loop: LOOP
        FETCH cur INTO customer_id, product_name_a, product_name_b, product_name_c;
        IF done THEN
            LEAVE read_loop;
        END IF;
        -- Append three tuples per customer; the IF() supplies the comma between customers,
        -- so the appended text must not end with a comma itself.
        SET @product_values = CONCAT(@product_values, IF(@product_values != '', ',', ''), "(", product_id, ",'", product_name_a, "'), (", product_id + 1, ",'", product_name_b, "'), (", product_id + 2, ",'", product_name_c, "')");
        SET @cp_values = CONCAT(@cp_values, IF(@cp_values != '', ',', ''), "(", customer_id, ",", product_id, "), (", customer_id, ",", product_id + 1, "), (", customer_id, ",", product_id + 2, ")");
        SET product_id = product_id + 3;
        IF product_id % 300 = 1 THEN -- insert every 100 customers
            SET @insert_product = CONCAT("INSERT INTO product(id, name) VALUES ", @product_values);
            PREPARE stmt1 FROM @insert_product;
            EXECUTE stmt1;
            DEALLOCATE PREPARE stmt1;
            SET @insert_cp = CONCAT("INSERT INTO customer_product(customer_id, product_id) VALUES ", @cp_values);
            PREPARE stmt2 FROM @insert_cp;
            EXECUTE stmt2;
            DEALLOCATE PREPARE stmt2;
            SET @product_values = '';
            SET @cp_values = '';
        END IF;
    END LOOP;
    CLOSE cur;
    IF @product_values != '' THEN -- process any remaining rows
        SET @insert_product = CONCAT("INSERT INTO product(id, name) VALUES ", @product_values);
        PREPARE stmt1 FROM @insert_product;
        EXECUTE stmt1;
        DEALLOCATE PREPARE stmt1;
        SET @insert_cp = CONCAT("INSERT INTO customer_product(customer_id, product_id) VALUES ", @cp_values);
        PREPARE stmt2 FROM @insert_cp;
        EXECUTE stmt2;
        DEALLOCATE PREPARE stmt2;
        SET @product_values = '';
        SET @cp_values = '';
    END IF;
END;
Beware that, using this solution, the product names will not be properly escaped before inserting. This solution will therefore not work if any of the product names contains special characters, such as single quote '.
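One possible way around the escaping caveat, without leaving the procedure, is MySQL's built-in QUOTE() function. It returns its argument as an escaped, single-quoted SQL literal, and returns the unquoted word NULL for a NULL argument. The fragment below is only a sketch of how the product-values line could be built with it. It assumes the same variables as the procedure above and has not been run against the asker's data.

```sql
-- Fragment: meant to replace the SET @product_values line inside read_loop.
-- QUOTE() adds the surrounding single quotes itself, so the literal quote
-- characters from the original CONCAT are dropped here.
SET @product_values = CONCAT(
    @product_values, IF(@product_values != '', ',', ''),
    '(', product_id,     ',', QUOTE(product_name_a), '),',
    '(', product_id + 1, ',', QUOTE(product_name_b), '),',
    '(', product_id + 2, ',', QUOTE(product_name_c), ')'
);
```

Because QUOTE() turns a NULL name into the word NULL, a NULL product name would be inserted as NULL instead of making the whole CONCAT result NULL.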

Perhaps you could do this in three separate inserts (instead of ~100K) as follows:
INSERT INTO customer_product (customer_id, product_id)
SELECT customer.id as customer_id, product.id as product_id
FROM customer
JOIN product on customer.product_name_a = product.name;
INSERT INTO customer_product (customer_id, product_id)
SELECT customer.id as customer_id, product.id as product_id
FROM customer
JOIN product on customer.product_name_b = product.name;
INSERT INTO customer_product (customer_id, product_id)
SELECT customer.id as customer_id, product.id as product_id
FROM customer
JOIN product on customer.product_name_c = product.name;
Of course, you would have to set up your product table ahead of time, and you'd want to drop your de-normalized columns from your customer table after the fact.
This could be further sped up if you create an index on the customer.product_name_X columns (and possibly the product.name column, though it's so few, idk if it would be significant). EXPLAIN can help with that.
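Note that joining on the name links a customer to every product row that carries that name, and the question says names may repeat across unrelated customers. A hedged variant that keeps one product row per customer and name slot is sketched below. It assumes product.id is AUTO_INCREMENT, and src_customer_id is a hypothetical helper column added only for the migration.

```sql
-- Assumption: product.id is AUTO_INCREMENT.
-- src_customer_id is a hypothetical temporary column, removed at the end.
ALTER TABLE product ADD COLUMN src_customer_id BIGINT NULL;

-- One product row per customer and name slot (3 rows per customer).
INSERT INTO product (name, src_customer_id)
SELECT product_name_a, id FROM customer
UNION ALL
SELECT product_name_b, id FROM customer
UNION ALL
SELECT product_name_c, id FROM customer;

-- Link each new product back to the customer it came from.
INSERT INTO customer_product (customer_id, product_id)
SELECT src_customer_id, id
FROM product
WHERE src_customer_id IS NOT NULL;

ALTER TABLE product DROP COLUMN src_customer_id;
```

This uses two set-based inserts in total, and string escaping is not an issue because no SQL text is concatenated.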

Related

How to find the sum of all the numeric values present in a MySQL table's column which has datatype as varchar and are separated by commas

I have a table which has been created using the following query
create table damaged_property_value
(case_id int, property_value varchar(100) );
insert into damaged_property_value values (1,'2000'),(2,'5000,3000,7000');
The problem is I need to find the total value of all the properties that have been damaged.
I am writing the following query to return the sum:
select SUM(cast(property_value as unsigned)) from damaged_property_value;
It returns the sum as 7000, i.e., 2000+5000. It does not consider the values that follow the first comma.
Note that 5000, 3000 and 7000 are the values of three different properties that were damaged in one particular case. The query should have produced 17000 as the answer.
How can I solve this problem?
As was said, the best solution would be to fix the data structure.
Now, just for the fun of solving the problem, and after much research, I managed to do the following (it requires the case_id to be sequential, starting at 1) that calculates the values of the property_value strings and puts them into the new actual_value field.
drop table if exists damaged_property_value;
create table damaged_property_value
(case_id int primary key, property_value varchar(100), actual_value int );
insert into damaged_property_value (case_id, property_value) values (1,'2000'),(2,'5000,3000,7000'),(3, '7000, 2000'),(4, '100,200,300,400,500,600');
drop procedure if exists Calculate_values;
DELIMITER $$
CREATE PROCEDURE Calculate_values()
BEGIN
DECLARE count INT;
SET count = 1;
label: LOOP
select
concat('update damaged_property_value set actual_value = ',
replace((select property_value from damaged_property_value where case_id = count), ",", "+"),
' where case_id = ', count, ';')
into @formula;
#select @formula;
prepare stmt from @formula;
execute stmt;
deallocate prepare stmt;
SET count = count +1;
IF count > (select count(*) from damaged_property_value) THEN
LEAVE label;
END IF;
END LOOP label;
END $$
DELIMITER ;
CALL Calculate_values();
select * from damaged_property_value;
/* select SUM(actual_value) from damaged_property_value; */
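As a different technique from the dynamic-SQL procedure above, MySQL 8.0 can split such lists with JSON_TABLE by wrapping the string in brackets so that it parses as a JSON array. This is only a sketch, and it assumes every property_value is a clean comma-separated list of integers; an empty string or stray text would make the JSON invalid.

```sql
-- Requires MySQL 8.0+ (JSON_TABLE).
-- '5000,3000,7000' becomes the JSON array [5000,3000,7000].
SELECT d.case_id, SUM(j.v) AS actual_value
FROM damaged_property_value AS d,
     JSON_TABLE(CONCAT('[', d.property_value, ']'),
                '$[*]' COLUMNS (v INT PATH '$')) AS j
GROUP BY d.case_id;
```

Dropping d.case_id and the GROUP BY gives the grand total in one query, and the case_id values no longer need to be sequential.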

MySQL) How to remove duplicated rows working with WHILE loop?

I have a table called cart and an empty table called sort in my database. My goal is to copy the comma-separated values from one table to the other, splitting them into separate rows, like this:
(table cart)
id | food
---+--------------
1 | Carrots, Cucumbers
2 | Dandelions
3 | Salmons
4 | Cucumbers, Potatoes
5 | Tomatoes
(table sort after run query)
id | food
---+--------------
1 | Carrots
1 | Cucumbers
2 | Dandelions
3 | Salmons
4 | Cucumbers
4 | Potatoes
5 | Tomatoes
An issue is, it seems like the keywords DISTINCT, GROUP BY and ORDER BY won't work properly inside of the WHILE clause. When I run my query, MySQL inserts a whole list of values multiple times based on a total of id count.
Have a look at this query:
DELIMITER //
CREATE FUNCTION fx_splitString(columnName TEXT, pos INT)
RETURNS TEXT
RETURN TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(columnName, ',', pos), ',', -1)) //
CREATE PROCEDURE pd_splitRow()
BEGIN
DECLARE rowToll INT;
SET rowToll = (SELECT COUNT(id) FROM cart);
SET @outerCount = 1;
WHILE @outerCount <= rowToll DO
INSERT INTO sort (id, food)
SELECT DISTINCT id, fx_splitString(food, @stringToll) FROM cart;
SET @outerCount = @outerCount + 1;
END WHILE;
END //
DELIMITER ;
CALL pd_splitRow();
Now I'm trying to find other ways to accomplish this but none of these work.
Try 1: Nested Loop
I abandoned this idea because MySQL won't allow a subquery that returns multiple rows to be assigned to a variable:
Error Code: 1242. Subquery returns more than 1 row 0.016 sec
DECLARE rowToll INT;
SET rowToll = (SELECT COUNT(id) FROM cart);
SET @stringToll = (SELECT LENGTH(food) - LENGTH(REPLACE(food, ',', '')) + 1 FROM cart); /* error */
SET @outerCount = 1;
SET @innerCount = 1;
WHILE @outerCount <= rowToll DO
WHILE @innerCount <= @stringToll DO
INSERT INTO sort (id, food)
SELECT DISTINCT id, fx_splitString(food, @stringToll) FROM cart;
SET @innerCount = @innerCount + 1;
END WHILE;
SET @outerCount = @outerCount + 1;
END WHILE;
Try 2: Self Join
This one simply does not insert any of the values from the cart table.
WHILE @begin <= rowCount DO
INSERT INTO sort (id, food)
SELECT DISTINCT cte1.id, fx_splitString(cte1.food, @begin) FROM cart cte1
INNER JOIN cart cte2 ON cte1.id = cte2.id AND cte1.food = cte2.food;
SET @begin = @begin + 1;
END WHILE;
I could just use queries from other people's examples, but I want to write my own to improve my skills.
Are there any ways to group the rows working with WHILE loop?

How to Get Data from Multiple Database Dynamically?

I have multiple databases in MySQL from different companies.
Ex.
1.Company1
2.Company2
3.Company3
4.Company4
In every database the table and column structures are the same, but the data stored belongs to different companies.
Now if I have to get a count of each EmployeeID's sales for the different companies, I need to write a query like the one below.
Select a.EmployeeID,Count(b.TransactionDate)
From Company1.Employee as a
Inner Join Company1.Sales as b
On a.EmployeeID=b.EmployeeID
Group By a.EmployeeID
Union
Select a.EmployeeID,Count(b.TransactionDate)
From Company2.Employee as a
Inner Join Company2.Sales as b
On a.EmployeeID=b.EmployeeID
Group By a.EmployeeID
Union
Select a.EmployeeID,Count(b.TransactionDate)
From Company3.Employee as a
Inner Join Company3.Sales as b
On a.EmployeeID=b.EmployeeID
Group By a.EmployeeID
Union
Select a.EmployeeID,Count(b.TransactionDate)
From Company4.Employee as a
Inner Join Company4.Sales as b
On a.EmployeeID=b.EmployeeID
Group By a.EmployeeID
Notice that I am changing the database in the "FROM" clause and the "INNER JOIN" with a hard-coded value.
In future further databases may be added, and I don't want to change the code or add another "UNION" each time.
Is there any way to do this dynamically? I mean, could we store the database names in a table and have the query automatically pick up the databases listed there?
I'm always waiting for upvotes and acknowledgements ;)
With these databases (they are all the same, of course, so I post only one of them):
use `company3`;
DROP TABLE IF EXISTS Employee;
CREATE TABLE Employee
(`EmployeeID` int, `LastName` varchar(40), `Firstname` varchar(40), `Age` int)
;
INSERT INTO Employee
(`EmployeeID`, `LastName`, `Firstname`, `Age`)
VALUES
(1, 'Hansen', 'Han', 30),
(2, 'Svendson', 'Sven', 23),
(3, 'Pettersen', 'Peter', 20)
;
DROP TABLE IF EXISTS Sales;
CREATE TABLE Sales
(`EmployeeID` int, `TransactionDate` datetime)
;
INSERT INTO Sales
(`EmployeeID`, `TransactionDate`)
VALUES
(1, '2015-12-20 10:01:00'),
(1, '2015-12-20 10:01:00'),
(2, '2015-12-20 10:01:00'),
(2, '2015-12-20 10:01:00'),
(2, '2015-12-20 10:01:00')
;
And this stored procedure
CREATE DEFINER=`root`@`localhost` PROCEDURE `GetSakesConut`()
BEGIN
DECLARE bDone INT;
DECLARE DBname TEXT;
DECLARE sqlstement LONGTEXT;
DECLARE n INT;
DECLARE curs CURSOR FOR SELECT SCHEMA_NAME FROM INFORMATION_SCHEMA.SCHEMATA
WHERE SCHEMA_NAME LIKE 'company%';
DECLARE CONTINUE HANDLER FOR NOT FOUND SET bDone = 1;
OPEN curs;
SET bDone = 0;
SET n =0;
SET sqlstement = '';
SALESloop: LOOP
FETCH curs INTO DBname;
IF bDone = 1 THEN
LEAVE SALESloop;
END IF;
IF n>0 THEN
SET sqlstement = CONCAT(sqlstement,' UNION ');
END IF;
SET sqlstement = CONCAT(sqlstement,'Select "',DBname,'",a.EmployeeID,');
SET sqlstement = CONCAT(sqlstement,'Count(b.TransactionDate) ');
SET sqlstement = CONCAT(sqlstement,'From ',DBname,'.Employee as a ');
SET sqlstement = CONCAT(sqlstement,'Inner Join ',DBname,'.Sales as b ');
SET sqlstement = CONCAT(sqlstement,'On a.EmployeeID=b.EmployeeID ');
SET sqlstement = CONCAT(sqlstement,'Group By a.EmployeeID ');
SET n =n+1;
END LOOP SALESloop;
CLOSE curs;
SET @sqlstement = sqlstement;
PREPARE stmt FROM @sqlstement;
EXECUTE stmt;
END
To explain:
For the cursor curs I get all database names that start with company.
In the loop I fetch one database name after another and use it to build
your SELECT statement with the correct database names.
And of course you have to add UNION before every SELECT except the first.
You get the following result:
company1 EmployeeID Count(b.TransactionDate)
company1 1 2
company1 2 3
company2 1 2
company2 2 3
company3 1 2
company3 2 3
Of course I had to adapt the SELECT statement because yours didn't work properly.
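The same statement can also be assembled without a cursor by letting GROUP_CONCAT join one SELECT per schema. This is a sketch rather than a drop-in replacement: the column aliases company and sales are assumptions added for readability, UNION ALL is used because the schema name already makes each row distinct, and group_concat_max_len is raised because its default of 1024 bytes would truncate the generated text once there are more than a few databases.

```sql
-- Allow a long generated statement (the default limit is 1024 bytes).
SET SESSION group_concat_max_len = 1000000;

-- Build one SELECT per matching schema and glue them with UNION ALL.
SELECT GROUP_CONCAT(
           CONCAT('SELECT ', QUOTE(SCHEMA_NAME), ' AS company, a.EmployeeID, ',
                  'COUNT(b.TransactionDate) AS sales ',
                  'FROM `', SCHEMA_NAME, '`.Employee AS a ',
                  'INNER JOIN `', SCHEMA_NAME, '`.Sales AS b ',
                  'ON a.EmployeeID = b.EmployeeID ',
                  'GROUP BY a.EmployeeID')
           SEPARATOR ' UNION ALL ')
INTO @sqlstatement
FROM INFORMATION_SCHEMA.SCHEMATA
WHERE SCHEMA_NAME LIKE 'company%';

PREPARE stmt FROM @sqlstatement;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
```

Quoting the schema name with backticks keeps the generated SQL valid even if a database name contains unusual characters.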

MySQL / Mariadb Stored Procedure, Prepared Statement, Union, Values from dynamically created tables and column names

I'd like to create reports without having to create a pivot table in excel for every report.
I have survey software that creates a new table for each survey. The columns are named with ID numbers. So, I never know what the columns will be named. The software stores answers in two different tables depending on the 'type' of question. (text, radio button, etc.)
I manually created a table 'survey_answers_lookup' that stores a few key fields but it duplicates the answers. The procedure 'survey_report' works well and produces the required data but there is a challenge.
Since the survey tables are created when someone creates a new survey, I would need a trigger on the schema that creates a second trigger and I don't think that is possible. The second trigger would monitor the survey table and insert the data into the 'survey_answers_lookup' table after someone completes a survey.
I could edit the php software and insert the values into the survey_answers_lookup table but that would create more work when I update the software. (I'd have to update the files and then put my changes back in the files). I also could not determine where they insert the values into the tables.
Can you please help?
Edited. I posted my solution below.
Change some_user to a user who has access to the database.
CREATE DEFINER=`some_user`@`localhost` PROCEDURE `usp_produce_survey_report`(IN survey_id VARCHAR(10), IN lang VARCHAR(2))
SQL SECURITY INVOKER
BEGIN
/*---------------------------------------------------------------------------------
I do not guarantee that this will work for you or that it cannot be hacked with
SQL injection or other malicious intent.
This stored procedure will produce output that you may use to create a report.
It accepts two arguments; The survey id (745) and the language (en).
It parses the column name in the survey table to get the qid.
It will copy the answers from the survey table to the survey_report
table if the answer is type S or K. It will get the answers from
the answers table for other types. NOTE: Other types might need to
be added to the if statement.
Additionally, the qid and id from the survey table are also copied to
the survey_report table.
Then the questions from the questions table, and answers from the answers
and survey_report tables are combined and displayed.
The data in the survey_report table is deleted after the data is displayed.
The id from the survey table is displayed as the respondent_id which may
be used to combine the questions and answers from a specific respondent.
You may have to change the prefix on the table names.
Example: survey_answers to my_prefix_answers.
Use this to call the procedure.
Syntax: call survey.usp_produce_survey_report('<SURVEY_ID>', '<LANGUAGE>');
Example: call survey.usp_produce_survey_report('457345', 'en');
use this to create the table that stores the data
CREATE TABLE `survey_report` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`qid` int(11) NOT NULL DEFAULT '0',
`survey_row_id` int(11) NOT NULL DEFAULT '0' COMMENT 'id that is in the survey_<id> table',
`answer` mediumtext COLLATE utf8mb4_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`)
);
*/
DECLARE v_col_name VARCHAR (25);
DECLARE v_qid INT;
DECLARE v_col_count INT DEFAULT 0;
DECLARE done INT DEFAULT false;
DECLARE tname VARCHAR(24) DEFAULT CONCAT('survey_survey_',survey_id);
DECLARE counter INT DEFAULT 0;
DECLARE current_row INT DEFAULT 0;
DECLARE total_rows INT DEFAULT 0;
-- select locate ('X','123457X212X1125', 8); -- use locate to determine location of second X - returns 11
-- select substring('123457X212X1125', 11+1, 7); -- use substring to get the qid - returns 1125
DECLARE cur1 cursor for
SELECT column_name, substring(column_name, 11+1, 7) as qid -- get the qid from the column name. the 7 might need to be higher depending on the id.
FROM information_schema.columns -- this has the column names
WHERE table_name = tname -- table name created from the id that was passed to the stored procedure
AND column_name REGEXP 'X'; -- get the columns that have an X
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
SET done = FALSE;
OPEN cur1;
SET total_rows = (SELECT table_rows -- get the number of rows
FROM INFORMATION_SCHEMA.TABLES
WHERE table_name = tname);
-- SELECT total_rows;
read_loop: LOOP
FETCH cur1 INTO v_col_name, v_qid; -- v_col_name is the original column name and v_qid is the qid that is taken from the column name
IF done THEN
LEAVE read_loop;
END IF;
-- SELECT v_col_name, v_qid;
SET counter = 1; -- use to compare id's
SET current_row = 1; -- used for the while loop
WHILE current_row <= total_rows DO
SET @sql := NULL;
-- SELECT v_col_name, v_qid, counter, x;
-- SELECT counter as id, v_col_name, v_qid as qid, x;
-- SET @sql = CONCAT ('SELECT id ', ',',v_qid, ' as qid ,', v_col_name,' FROM ', tname, ' WHERE id = ', counter );
-- I would have to join the survey table below if I did not add the answer (v_col_name). I assume this is faster than another join.
SET @sql = CONCAT('INSERT INTO survey_report(qid,survey_row_id,answer) SELECT ',v_qid, ',id,' , v_col_name, ' FROM ', tname, ' WHERE id = ', counter );
-- SELECT @sql;
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
-- SELECT counter, x;
SET current_row = current_row + 1; -- increment counter for while loop
SET counter = counter + 1; -- increment counter for id's
END WHILE;
END LOOP; -- read_loop
CLOSE cur1;
-- SELECT * FROM survey_report
-- ORDER BY id, qid;
SET @counter = 0;
SELECT
@counter:=@counter + 1 AS newindex, -- increment the counter that is in the header
survey_report.id,
survey_report.survey_row_id as respondent_id, -- the id that copied from the survey table
survey_report.qid,
question,
IF(type IN ('S' , 'K'),
(SELECT answer
FROM survey_report
WHERE qid NOT IN (SELECT qid FROM survey_answers)
AND survey_questions.language = lang
AND survey_report.id = @counter),
(SELECT answer
FROM survey_answers
WHERE survey_questions.qid = survey_answers.qid
AND survey_report.qid = survey_questions.qid
AND survey_report.answer = survey_answers.code
AND survey_answers.language = lang
)
) AS answer
FROM survey_questions
JOIN survey_report ON survey_report.qid = survey_questions.qid
WHERE survey_questions.sid = survey_id
ORDER BY survey_report.survey_row_id, survey_report.id;
TRUNCATE TABLE survey_report;
END
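One thing to watch in the procedure above: for InnoDB tables, table_rows in INFORMATION_SCHEMA.TABLES is only an estimate, and the WHILE loop also assumes the survey ids run from 1 to that number without gaps. A possible simplification, sketched here with the same variables (v_qid, v_col_name, tname) and not tested against the survey software's schema, is to copy a whole column with one INSERT ... SELECT per cursor row.

```sql
-- Fragment: meant to replace the WHILE ... END WHILE block inside read_loop.
-- Copies every row of the survey table for the current column in one statement,
-- so neither total_rows nor sequential ids are needed.
SET @sql = CONCAT(
    'INSERT INTO survey_report (qid, survey_row_id, answer) ',
    'SELECT ', v_qid, ', id, `', v_col_name, '` ',
    'FROM `', tname, '`'
);
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
```

This runs one prepared statement per answer column instead of one per column and respondent.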

Slope-One recommender implementation using mysql stored procedure

I am trying to implement a Slope One recommender using a MySQL stored procedure. The procedure runs without any error, but it does not insert into or update the 'dev' table.
The structure of tables are:
rating (user_id, article_id, rating_value, date)
dev (article1_id, article2_id, count, sum)
The 'dev' table has joint primary key (article1_id, article2_id). The sql for my procedure is as follows:
DELIMITER $$
CREATE PROCEDURE update_matrix(IN user INT(11), IN article INT(11), IN rating TINYINT(1))
BEGIN
DECLARE article_id2 INT(11);
DECLARE rating_diff TINYINT(1);
DECLARE done TINYINT(1) DEFAULT 0;
DECLARE mycursor CURSOR FOR
SELECT DISTINCT article_id, (rating - rating_value)
FROM rating
WHERE user_id = user AND article_id != article;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
OPEN mycursor;
FETCH mycursor INTO article_id2, rating_diff;
WHILE(!done) DO
INSERT INTO dev (`article1_id`, `article2_id`, `count`, `sum`)
VALUES (article, article_id2, 1, rating_diff)
ON DUPLICATE KEY UPDATE count = count + 1, sum = sum + rating_diff;
INSERT INTO dev (`article1_id`, `article2_id`, `count`, `sum`)
VALUES (article_id2, article, 1, -rating_diff)
ON DUPLICATE KEY UPDATE count = count + 1, sum = sum - rating_diff;
FETCH mycursor INTO article_id2, rating_diff;
END WHILE;
CLOSE mycursor;
END$$
DELIMITER ;