MySQL Stored Procedure behaves inconsistently

I am trying to write a MySQL stored procedure that does the following:
The inputs are report parameters, including the report type, the database to run against, and some date and filter parameters.
The procedure looks up the report in the report table and reads the report parameters. Some reports are simple queries, while others are UNIONs of two reports (is_union=1).
After reading the report parameters from the report table, the procedure passes them to the build_report function, which puts the query together and returns the SQL.
In the case of a single query (i.e. only one report part) the procedure works perfectly, but where there are two parts it fails. It seems unable to get a value from the build_report function, even though part 1 of a multi-part query is almost identical to the single-query case (is_union=0), apart from getting the report_id for the part from the report table.
What am I missing? Any insight would be hugely appreciated.
The procedure below works perfectly when is_union=0 but fails when is_union=1 (even if I comment out part 2 altogether):
CREATE PROCEDURE `runreport_u`(
in all_active INT(1),
in reportid INT(10),
in db_id INT(10),
in start_date date,
in end_date date,
in inc_grpby INT(1),
OUT qry_part varchar(20000), #only outputting these to help debug
OUT qry_part1 varchar(20000),
OUT qry_part2 varchar(50000),
OUT qry varchar(40000),
OUT union_rep_id1 int(10),
OUT union_rep_id2 int(10),
OUT isunion int(10))
BEGIN
# purpose - this procedure looks up the specifications of a report query in the report table. If the report is a single query (is_union=0) the procedure works perfectly. When the query is made up of 2 queries to be UNIONed, the build_report function which builds the parts seems to return nothing.
DECLARE part int DEFAULT NULL;
# is this report a union
SELECT is_union INTO isunion FROM report WHERE report_id=@reportid;
IF isunion=0 THEN
#report is not a union
SET part=1;
SET union_rep_id1=reportid;
SET qry_part = build_report(all_active,union_rep_id1,db_id,start_date,end_date,inc_grpby);
#log the components into gen table (more for debugging than anything else)
INSERT INTO gen (report_number,run_date,part_number,part_report_number,qry_part,part_order,param_all_active,param_report_id,param_db_id,param_start_date,param_end_date,param_incgrp) VALUES (reportid, now(),part,COALESCE(union_rep_id1,999),qry_part,part,all_active,reportid,db_id,start_date,end_date,inc_grpby);
SET qry=qry_part;
ELSE
#report has 2 or more parts
#*** first part
SET part=1;
#set the first report part - works (returns same union_rep_id as above)
SELECT union_report_id1 INTO union_rep_id1 FROM report WHERE report_id=reportid;
#get the query - in principle identical to the above, yet it works above and returns nothing here
SET qry_part1 = build_report(all_active,union_rep_id1,db_id,start_date,end_date,inc_grpby);
INSERT INTO gen (report_number,run_date,part_number,part_report_number,qry_part,part_order,param_all_active,param_report_id,param_db_id,param_start_date,param_end_date,param_incgrp) VALUES (reportid, now(),part,COALESCE(union_rep_id1,999),qry_part1,part,all_active,reportid,db_id,start_date,end_date,inc_grpby);
#*** second part
SET part=2;
SELECT union_report_id2 INTO union_rep_id2 FROM report WHERE report_id=reportid;
SET qry_part2 = build_report(all_active,union_rep_id2,db_id,start_date,end_date,inc_grpby);
INSERT INTO gen (report_number,run_date,part_number,part_report_number,qry_part,part_order,param_all_active,param_report_id,param_db_id,param_start_date,param_end_date,param_incgrp) VALUES (reportid, now(),part,COALESCE(union_rep_id2,999),qry_part,part,all_active,reportid,db_id,start_date,end_date,inc_grpby);
#*******
#join the parts together in UNION
SET qry=CONCAT(qry_part,' UNION ',qry_part2);
END IF;
#had to add the line below, otherwise PREPARE failed to recognize qry as a variable
SET @final_qry=qry;
PREPARE stmt FROM @final_qry;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
#fails if isunion=1
END
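One detail worth flagging when reading the code: in the ELSE branch, qry_part is never assigned, yet both the part-2 logging INSERT and the final CONCAT reference it. Since CONCAT returns NULL if any argument is NULL, qry ends up NULL and PREPARE has nothing to work with. A minimal sketch of the tail of the ELSE branch with the part variables used consistently (assuming qry_part1/qry_part2 is what was intended):

INSERT INTO gen (report_number,run_date,part_number,part_report_number,qry_part,part_order,param_all_active,param_report_id,param_db_id,param_start_date,param_end_date,param_incgrp) VALUES (reportid, now(),part,COALESCE(union_rep_id2,999),qry_part2,part,all_active,reportid,db_id,start_date,end_date,inc_grpby); #log qry_part2, not qry_part
#join the parts together in UNION
SET qry=CONCAT(qry_part1,' UNION ',qry_part2); #part 1 of the union, not the unassigned qry_part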

Related

MYSQL - Parameter being lost when using it in SQL statement

I have a stored procedure in MySQL to which I pass one parameter, which is used in an SQL statement as you can see below. However, the result is a count of 0 where I am expecting a count of 2.
Stored Procedure:
CREATE DEFINER=`admin`@`%` PROCEDURE `EmployeesRecords`(IN employee_id varchar(1000))
BEGIN
--
declare v_count int ;
--
select count(*)
into v_count
from employees
where employees_id IN (employee_id);
--
END
One or many employee IDs can be passed in the employee_id parameter.
When calling the stored procedure like this: CALL EmployeesRecords('2,3'); it returns a count of 0 where I am expecting a count of 2.
As for the parameter itself, I have tried various methods, including changing the SQL condition to IN ('2','3'), but it still does not work.
However, what I have noticed is that passing a single employee ID works successfully, such as CALL EmployeesRecords('2');
Can anyone guide me to what I am doing wrong, please?
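For context on why this fails: IN (employee_id) compares employees_id against the single string '2,3', not against the list of values 2 and 3. One common workaround (my suggestion, not from the original post) is FIND_IN_SET, which treats the parameter as a comma-separated list:

CREATE DEFINER=`admin`@`%` PROCEDURE `EmployeesRecords`(IN employee_id varchar(1000))
BEGIN
declare v_count int;
-- FIND_IN_SET returns the 1-based position of employees_id within the
-- comma-separated string, or 0 if it is absent, so it matches each listed id
select count(*)
into v_count
from employees
where FIND_IN_SET(employees_id, employee_id);
select v_count; -- return the count so the caller can see the result
END

With this version, CALL EmployeesRecords('2,3'); should return a count of 2.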

Update MySQL table in chunks

I am trying to update a MySQL InnoDB table with c. 100 million rows. The query takes close to an hour, which is not a problem.
However, I'd like to split this update into smaller chunks in order not to block table access. This update does not have to be an isolated transaction.
At the same time, the splitting of the update should not be too expensive in terms of additional overhead.
I considered looping through the table in a procedure using:
UPDATE TABLENAME SET NEWVAR=<expression> LIMIT offset, batchsize
but UPDATE does not have an offset option in MySQL.
I understand I could try to UPDATE ranges of data that are SELECTed on a key, together with the LIMIT option, but that seems rather complicated for such a simple task.
I ended up with the procedure listed below. It works, but I am not sure whether it is efficient, given all the queries needed to identify consecutive ranges. It can be called with the following arguments (example):
call chunkUpdate('SET var=0','someTable','theKey',500000);
Basically, the first argument is the update command (e.g. something like "SET x = ..."), followed by the MySQL table name, followed by a numeric (integer) key column that has to be unique, followed by the size of the chunks to be processed. The key should have an index for reasonable performance. The @n variable and the SELECT statements in the code below can be removed; they are only for debugging.
delimiter //
CREATE PROCEDURE chunkUpdate (IN cmd VARCHAR(255), IN tab VARCHAR(255), IN ky VARCHAR(255),IN sz INT)
BEGIN
SET @sqlgetmin = CONCAT("SELECT MIN(",ky,")-1 INTO @minkey FROM ",tab);
SET @sqlgetmax = CONCAT("SELECT MAX(",ky,") INTO @maxkey FROM ( SELECT ",ky," FROM ",tab," WHERE ",ky,">@minkey ORDER BY ",ky," LIMIT ",sz,") AS TMP");
SET @sqlstatement = CONCAT("UPDATE ",tab," ",cmd," WHERE ",ky,">@minkey AND ",ky,"<=@maxkey");
SET @n=1;
PREPARE getmin FROM @sqlgetmin;
PREPARE getmax FROM @sqlgetmax;
PREPARE statement FROM @sqlstatement;
EXECUTE getmin;
REPEAT
EXECUTE getmax;
SELECT cmd,@n AS step, @minkey AS min, @maxkey AS max;
EXECUTE statement;
SET @minkey=@maxkey;
SET @n=@n+1;
UNTIL @maxkey IS NULL
END REPEAT;
SELECT CONCAT(cmd, " EXECUTED IN ",@n," STEPS") AS MESSAGE;
END//
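A small addition worth considering (mine, not part of the original procedure): the prepared statements getmin, getmax and statement are session-scoped and are never released. If the procedure is called repeatedly in one session, the following lines just before END would clean them up:

DEALLOCATE PREPARE getmin; -- release the session-scoped prepared statements
DEALLOCATE PREPARE getmax;
DEALLOCATE PREPARE statement;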

Duplicate Entries During Batch Call Procedure

I have a stored procedure that basically looks like this:
do_insert (IN in_x varchar(64), IN in_y varchar(64))
BEGIN
declare v_x int(20) unsigned default -1;
declare v_y int(20) unsigned default -1;
select x into v_x from xs where x_col = in_x;
if v_x = 0
then
insert into xs (x_col) values(in_x);
select x into v_x from xs where x_col = in_x;
end if;
select y into v_y from ys where y_col = in_y;
if v_y = 0
then
insert into ys (y_col) values(in_y);
select y into v_y from ys where y_col = in_y;
end if;
insert ignore into unique_table (xId, yId) values(v_x, v_y);
END
Basically, I look to see if I already have the varchars defined in their respective tables, and if so I select the ID. If not, I create them and get their IDs. Then I insert them into unique_table, ignoring the insert if the pair is already there. (Yes, I could probably add more logic to avoid the final insert, but that shouldn't be an issue, and KISS.)
The problem I have is that when I run this in a batch JDBC statement using Google Cloud SQL, I get duplicate entries inserted into my xs table. The next time this stored procedure is run, I get the following exception:
java.sql.SQLException: Result consisted of more than one row Query: call do_insert(:x, :y)
So what I think is happening is that two calls with the same in_x value occur in the same batch statement. These two calls are run in parallel, both SELECTs come back with 0 as it's a new entry, then they both do an INSERT. The next run then fails.
Questions:
How do I prevent this?
Should I wrap my select (and possible insert) calls in a LOCK TABLE for that table to prevent this?
I've never noticed this on a local MySQL, is this Google Cloud SQL specific? Or just a fluke that I haven't seen it on my local MySQL?
I haven't tested this yet, but I'm guessing the batch statement is introducing some level of parallelism. So the 'select x into v_x' statement can race with the 'insert into xs' statement, allowing the latter to be executed more than once.
Try changing the 'insert into xs' statement to an 'insert ignore' and add a unique index on the column.
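A minimal sketch of that suggestion (the index name and the re-select are my additions, not from the original answer):

ALTER TABLE xs ADD UNIQUE INDEX ux_xs_x_col (x_col); -- makes duplicates impossible at the schema level

-- inside do_insert, the lookup-then-insert then becomes safe under concurrency:
insert ignore into xs (x_col) values(in_x); -- a racing duplicate is silently skipped
select x into v_x from xs where x_col = in_x; -- re-select so v_x is correct either way

With the unique index in place, the "Result consisted of more than one row" error cannot recur, because the SELECT ... INTO can match at most one row.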

MySQL and JDBC caching (?) issue with procedure call in Scala

I'm working on a project that parses text units (referred to as "ngrams") from a string and stores them in a MySQL database.
In MySQL I have the following procedure that's supposed to store an ngram in a specific dataset (table) if it's not already there and return the id of the ngram:
CREATE PROCEDURE `add_ngram`(IN ngram VARCHAR(400), IN dataset VARCHAR(128), OUT ngramId INT)
BEGIN
-- try get id of ngram
SET @s = CONCAT('SELECT `id` INTO @ngramId FROM `mdt_', dataset, '_b1` WHERE `ngram` = ''', ngram, ''' LIMIT 1');
PREPARE stm FROM @s;
EXECUTE stm;
SET ngramId = @ngramId;
-- if id could not be retrieved
IF ngramId IS NULL THEN BEGIN
-- insert ngram into dataset
SET @s = CONCAT('INSERT INTO `mdt_', dataset, '_b1`(`ngram`) VALUES (''', ngram, ''')');
PREPARE stm FROM @s;
EXECUTE stm;
SET ngramId = LAST_INSERT_ID();
END;
END IF;
END$$
A dataset table has only two columns: id, an auto-incremented int that serves as the primary key, and ngram, a varchar(400) that serves as a unique index.
In my Scala app I have a method that takes in a string, splits it into ngrams and then returns a Seq of the ngrams' ids by passing the ngrams to the above procedure:
private def processNgrams(text: String, dataSet: String) = {
val ids = parser.parse(text).map(ngram => {
val query = dbConn.prepareCall("CALL add_ngram(?,?,?)")
query.setString(1, ngram)
query.setString(2, dataSet)
query.registerOutParameter(3, java.sql.Types.INTEGER)
query.executeUpdate()
dbConn.commit()
val id = query.getInt(3)
Debug(s"parsed ngram - id: $id ${" " * (3 - id.toString.length)}[ $ngram ]")
id
})
ids
}
dbConn in the above code is an instance of java.sql.Connection and has auto commit set to false.
When I executed this, I noticed that very few ngrams were stored in the database. Looking at the debug output from the above method, there are multiple ngrams that are clearly different from each other but seem to get the same id returned from the procedure. If I look in the database table, I can see that, for example, the ngram "i" has id 1, but ngrams inserted immediately after also get returned an id of 1. This is true of the other ngrams I looked up in the table as well. This leads me to believe that perhaps the procedure call gets cached?
I've tried a number of things, such as creating the prepared call outside of the method and reusing it and calling clearParameters every time, creating a new call inside the method every time (as it is above), even sleeping the Thread between calls, but nothing seems to make a difference.
I've also tested the procedure by manually running queries in a MySQL client and it seems to run fine, though my program executes queries at a much faster speed than I can manually, so that might make a difference.
I'm not entirely sure if it's a JDBC issue with making the call or a MySQL issue with the procedure. I'm new to Scala and MySQL procedures, so forgive me if this is something really simple that's escaped me.
Figured it out! Turns out it was the stored procedure causing all the trouble.
In the procedure I check whether an ngram exists by running a dynamic SQL query (since the table name is passed in) that stores a value in @ngramId, which is a session variable. I store it in @ngramId and not in ngramId (a procedure output parameter) because prepared statements can only SELECT INTO session variables (or so I was told by an error when I originally created the procedure). Next I copy the value of @ngramId into ngramId and check whether ngramId is NULL to determine if the ngram exists in the table; if NULL, the ngram is inserted into the table and ngramId is set to the last inserted id.
The problem with this is that because @ngramId is a session variable, and because I used the same database connection for all procedure calls, the value of @ngramId persisted between calls. For example, if I made a call with the ngram "I" and it was found in the database with id 1, @ngramId now held the value 1. If I then tried to insert another ngram that did not exist in the table, the dynamic SELECT statement did not return anything, so the value of @ngramId remained 1. Since the output parameter ngramId is populated from @ngramId and was no longer NULL, the procedure bypassed the IF statement that inserts the ngram and returned the id of the last ngram found in the table, resulting in the apparent caching of ngram ids.
The solution to this was to add the following line as the very first statement in the procedure:
SET @ngramId = NULL;
This resets the value of @ngramId between calls to the procedure over the same session.
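For completeness, a sketch of where that line sits (everything else unchanged from the procedure above):

CREATE PROCEDURE `add_ngram`(IN ngram VARCHAR(400), IN dataset VARCHAR(128), OUT ngramId INT)
BEGIN
SET @ngramId = NULL; -- clear the session variable so a stale id from a previous call can't leak through
-- rest of the procedure as shown above
END$$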

MySQL temporary tables do not clear

Background - I have a DB created from a single large flat file. Instead of creating a single large table with 106 columns, I created a "columns" table which stores the column names and the id of the table that holds that data, plus 106 other tables to store the data for each column. Since not all the records have data in all columns, I thought this might be a more efficient way to load the data (maybe a bad idea).
The difficulty with this was rebuilding a single record from this structure. To facilitate this I created the following procedure:
DROP PROCEDURE IF EXISTS `col_val`;
delimiter $$
CREATE PROCEDURE `col_val`(IN id INT)
BEGIN
DROP TEMPORARY TABLE IF EXISTS tmp_record;
CREATE TEMPORARY TABLE tmp_record (id INT(11), val varchar(100)) ENGINE=MEMORY;
SET @ctr = 1;
SET @valsql = '';
WHILE (@ctr < 107) DO
SET @valsql = CONCAT('INSERT INTO tmp_record SELECT ',@ctr,', value FROM col',@ctr,' WHERE recordID = ',@id,';');
PREPARE s1 FROM @valsql;
EXECUTE s1;
DEALLOCATE PREPARE s1;
SET @ctr = @ctr+1;
END WHILE;
END$$
DELIMITER ;
Then I use the following SQL where the stored procedure parameter is the id of the record I want.
CALL col_val(10);
SELECT c.`name`, t.`val`
FROM `columns` c INNER JOIN tmp_record t ON c.ID = t.id
Problem - The first time I run this it works great. However, each subsequent run returns the exact same record even though the parameter is changed. How does this persist even when the stored procedure should be dropping and re-creating the temp table?
I might be re-thinking the whole design and going back to a single table, but the problem illustrates something I would like to understand.
Unsure if it matters but I'm running MySQL 5.6 (64 bit) on Windows 7 and executing the SQL via MySQL Workbench v5.2.47 CE.
Thanks,
In MySQL stored procedures, don't put an @ symbol in front of local variables (input parameters or locally declared variables). The @id you used refers to a user variable, which is kind of like a global variable for the session you're invoking the procedure from.
In other words, @id is a different variable from id.
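Concretely, that means the dynamic statement in col_val should interpolate the input parameter itself. A sketch of the corrected line (only @id changes; @ctr and @valsql are genuine user variables and are fine as they are):

SET @valsql = CONCAT('INSERT INTO tmp_record SELECT ',@ctr,', value FROM col',@ctr,' WHERE recordID = ',id,';'); -- id, not @id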
That's the explanation of the immediate problem you're having. However, I would not design the tables as you have done.
Since not all the records have data in all columns, I thought this might be a more efficient way to load the data
I recommend using a conventional single table, and use NULL to signify missing data.
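A sketch of that conventional design (the table and column names here are hypothetical, since the real 106 aren't given):

-- one row per record; columns the record doesn't use simply stay NULL
CREATE TABLE flat_records (
recordID INT PRIMARY KEY,
col1 VARCHAR(100) NULL,
col2 VARCHAR(100) NULL
-- and so on through col106
);

Rebuilding a record is then a single indexed lookup (SELECT * FROM flat_records WHERE recordID = 10), with no temporary table or dynamic SQL at all.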