Duplicate Entries During Batch Call Procedure - mysql

I have a stored procedure that basically looks like this:
do_insert (IN in_x varchar(64), IN in_y varchar(64))
BEGIN
    declare v_x int(20) unsigned default 0;
    declare v_y int(20) unsigned default 0;

    select x into v_x from xs where x_col = in_x;
    if v_x = 0
    then
        insert into xs (x_col) values(in_x);
        select x into v_x from xs where x_col = in_x;
    end if;

    select y into v_y from ys where y_col = in_y;
    if v_y = 0
    then
        insert into ys (y_col) values(in_y);
        select y into v_y from ys where y_col = in_y;
    end if;

    insert ignore into unique_table (xId, yId) values(v_x, v_y);
END
Basically I look to see whether I already have the varchars defined in their respective tables, and if so I select the id. If not, I create them and get their IDs. Then I insert them into unique_table, ignoring the insert if the pair is already there. (Yes, I could probably add more logic to avoid the final insert, but that shouldn't be an issue, and KISS.)
The problem I have is that when I run this in a batch JDBC statement using Google Cloud SQL I get duplicate entries inserted into my xs table. The next time this stored proc is run I get the following exception:
java.sql.SQLException: Result consisted of more than one row Query: call do_insert(:x, :y)
So what I think is happening is that two calls with the same in_x value occur in the same batch statement. These two calls run in parallel, both selects come back empty as it's a new entry, and then they both do an insert. The next run then fails.
Questions:
How do I prevent this?
Should I wrap my select (and possible insert) calls in a LOCK TABLES statement for that table to prevent this?
I've never noticed this on a local MySQL, is this Google Cloud SQL specific? Or just a fluke that I haven't seen it on my local MySQL?

I haven't tested this yet, but I'm guessing the batch statement is introducing some level of parallelism, so the 'select x into v_x' statement can race with the 'insert into xs' statement, allowing the latter to be executed more than once.
Try changing the 'insert into xs' statement to an 'insert ignore', and add a unique index on the column.
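A minimal sketch of that suggestion, assuming xs has an auto-increment id column x and that x_col is the varchar column from the question (the index name is made up):

ALTER TABLE xs ADD UNIQUE INDEX ux_xs_x_col (x_col);

-- inside the procedure: the insert can no longer create a duplicate row,
-- so the race collapses to a harmlessly ignored insert
insert ignore into xs (x_col) values(in_x);
select x into v_x from xs where x_col = in_x;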


UPDATE primary key in POSTGRES database [duplicate]

Several months ago I learned from an answer on Stack Overflow how to perform multiple updates at once in MySQL using the following syntax:
INSERT INTO table (id, field, field2) VALUES (1, 'A', 'X'), (2, 'B', 'Y'), (3, 'C', 'Z')
ON DUPLICATE KEY UPDATE field=VALUES(field), field2=VALUES(field2);
I've now switched over to PostgreSQL and apparently this is not correct. It's referring to all the correct tables so I assume it's a matter of different keywords being used but I'm not sure where in the PostgreSQL documentation this is covered.
To clarify, I want to insert several things and if they already exist to update them.
PostgreSQL has had UPSERT support since version 9.5, via the ON CONFLICT clause, with syntax similar to MySQL's:
INSERT INTO the_table (id, column_1, column_2)
VALUES (1, 'A', 'X'), (2, 'B', 'Y'), (3, 'C', 'Z')
ON CONFLICT (id) DO UPDATE
SET column_1 = excluded.column_1,
column_2 = excluded.column_2;
Searching PostgreSQL's mailing list archives for "upsert" leads to an example in the manual of doing what you possibly want to do:
Example 38-2. Exceptions with UPDATE/INSERT
This example uses exception handling to perform either UPDATE or INSERT, as appropriate:
CREATE TABLE db (a INT PRIMARY KEY, b TEXT);

CREATE FUNCTION merge_db(key INT, data TEXT) RETURNS VOID AS
$$
BEGIN
    LOOP
        -- first try to update the key
        -- note that "a" must be unique
        UPDATE db SET b = data WHERE a = key;
        IF found THEN
            RETURN;
        END IF;
        -- not there, so try to insert the key
        -- if someone else inserts the same key concurrently,
        -- we could get a unique-key failure
        BEGIN
            INSERT INTO db(a,b) VALUES (key, data);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- do nothing, and loop to try the UPDATE again
        END;
    END LOOP;
END;
$$
LANGUAGE plpgsql;

SELECT merge_db(1, 'david');
SELECT merge_db(1, 'dennis');
There's possibly an example of how to do this in bulk, using CTEs in 9.1 and above, in the hackers mailing list:
WITH foos AS (SELECT (UNNEST(%foo[])).*),
updated AS (UPDATE foo SET a = foos.a ... RETURNING foo.id)
INSERT INTO foo SELECT foos.* FROM foos LEFT JOIN updated USING(id)
WHERE updated.id IS NULL;
See a_horse_with_no_name's answer for a clearer example.
Warning: this is not safe if executed from multiple sessions at the same time (see caveats below).
Another clever way to do an "UPSERT" in postgresql is to do two sequential UPDATE/INSERT statements that are each designed to succeed or have no effect.
UPDATE table SET field='C', field2='Z' WHERE id=3;
INSERT INTO table (id, field, field2)
SELECT 3, 'C', 'Z'
WHERE NOT EXISTS (SELECT 1 FROM table WHERE id=3);
The UPDATE will succeed if a row with "id=3" already exists, otherwise it has no effect.
The INSERT will succeed only if row with "id=3" does not already exist.
You can combine these two into a single string and run them both with a single SQL statement executed from your application. Running them together in a single transaction is highly recommended.
This works very well when run in isolation or on a locked table, but it is subject to race conditions: it might still fail with a duplicate key error if a row is inserted concurrently, or it might terminate with no row inserted when a row is deleted concurrently. A SERIALIZABLE transaction on PostgreSQL 9.1 or higher will handle it reliably, at the cost of a very high serialization failure rate, meaning you'll have to retry a lot. See why is upsert so complicated, which discusses this case in more detail.
This approach is also subject to lost updates in READ COMMITTED isolation unless the application checks the affected row counts and verifies that either the insert or the update affected a row.
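For illustration, the two statements from above wrapped in one transaction, so the pair commits or rolls back as a unit (the race caveats above still apply):

BEGIN;
UPDATE table SET field='C', field2='Z' WHERE id=3;
INSERT INTO table (id, field, field2)
SELECT 3, 'C', 'Z'
WHERE NOT EXISTS (SELECT 1 FROM table WHERE id=3);
COMMIT;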
With PostgreSQL 9.1 this can be achieved using a writeable CTE (common table expression):
WITH new_values (id, field1, field2) AS (
    VALUES
        (1, 'A', 'X'),
        (2, 'B', 'Y'),
        (3, 'C', 'Z')
),
upsert AS (
    UPDATE mytable m
    SET field1 = nv.field1,
        field2 = nv.field2
    FROM new_values nv
    WHERE m.id = nv.id
    RETURNING m.*
)
INSERT INTO mytable (id, field1, field2)
SELECT id, field1, field2
FROM new_values
WHERE NOT EXISTS (SELECT 1
                  FROM upsert up
                  WHERE up.id = new_values.id);
See these blog entries:
Upserting via Writeable CTE
WAITING FOR 9.1 – WRITABLE CTE
WHY IS UPSERT SO COMPLICATED?
Note that this solution does not prevent a unique key violation but it is not vulnerable to lost updates.
See the follow up by Craig Ringer on dba.stackexchange.com
In PostgreSQL 9.5 and newer you can use INSERT ... ON CONFLICT UPDATE.
See the documentation.
A MySQL INSERT ... ON DUPLICATE KEY UPDATE can be directly rephrased as an ON CONFLICT UPDATE. Neither is SQL-standard syntax; they're both database-specific extensions. There are good reasons MERGE wasn't used for this; a new syntax wasn't created just for fun. (MySQL's syntax also has issues that mean it wasn't adopted directly.)
e.g. given setup:
CREATE TABLE tablename (a integer primary key, b integer, c integer);
INSERT INTO tablename (a, b, c) values (1, 2, 3);
the MySQL query:
INSERT INTO tablename (a,b,c) VALUES (1,2,3)
ON DUPLICATE KEY UPDATE c=c+1;
becomes:
INSERT INTO tablename (a, b, c) values (1, 2, 10)
ON CONFLICT (a) DO UPDATE SET c = tablename.c + 1;
Differences:
You must specify the column name (or unique constraint name) to use for the uniqueness check. That's the ON CONFLICT (columnname) DO part.
The keyword SET must be used, as if this was a normal UPDATE statement
It has some nice features too:
You can have a WHERE clause on your UPDATE (letting you effectively turn ON CONFLICT UPDATE into ON CONFLICT IGNORE for certain values); a sketch follows this list.
The proposed-for-insertion values are available as the row-variable EXCLUDED, which has the same structure as the target table. You can get the original values in the table by using the table name. So in this case EXCLUDED.c will be 10 (because that's what we tried to insert) and tablename.c will be 3 because that's the current value in the table. You can use either or both in the SET expressions and WHERE clause.
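A brief sketch of that WHERE clause feature, reusing the tablename example above (the threshold is made up):

INSERT INTO tablename (a, b, c) VALUES (1, 2, 10)
ON CONFLICT (a) DO UPDATE
SET c = tablename.c + 1
WHERE tablename.c < 100;  -- conflicting rows with c >= 100 are skipped, effectively an IGNORE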
For background on upsert see How to UPSERT (MERGE, INSERT ... ON DUPLICATE UPDATE) in PostgreSQL?
I was looking for the same thing when I came here, but the lack of a generic "upsert" function bothered me a bit, so I thought you could just pass the update and insert SQL as arguments to that function from the manual.
That would look like this:
CREATE FUNCTION upsert (sql_update TEXT, sql_insert TEXT)
    RETURNS VOID
    LANGUAGE plpgsql
AS $$
BEGIN
    LOOP
        -- first try to update
        EXECUTE sql_update;
        -- check if the row is found
        IF FOUND THEN
            RETURN;
        END IF;
        -- not found so insert the row
        BEGIN
            EXECUTE sql_insert;
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- do nothing and loop
        END;
    END LOOP;
END;
$$;
And perhaps to do what you initially wanted, a batch "upsert", you could use Tcl to split the sql_update and loop over the individual updates. The performance hit will be very small; see http://archives.postgresql.org/pgsql-performance/2006-04/msg00557.php
The highest cost is executing the query from your code; on the database side the execution cost is much smaller.
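A hypothetical usage of that upsert(sql_update, sql_insert) function, assuming a foo(id int primary key, v text) table:

SELECT upsert(
    $$UPDATE foo SET v = 'bar' WHERE id = 1$$,
    $$INSERT INTO foo (id, v) VALUES (1, 'bar')$$
);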
There is no simple command to do it.
The most correct approach is to use a function, like the one from the docs.
Another solution (although not that safe) is to do an update with RETURNING, check which rows were updated, and insert the rest of them.
Something along the lines of:
update table
set column = x.column
from (values (1,'aa'),(2,'bb'),(3,'cc')) as x (id, column)
where table.id = x.id
returning id;
assuming id:2 was returned:
insert into table (id, column) values (1, 'aa'), (3, 'cc');
Of course it will bail out sooner or later (in a concurrent environment), as there is a clear race condition here, but usually it will work.
Here's a longer and more comprehensive article on the topic.
I use this merge function:
CREATE OR REPLACE FUNCTION merge_tabla(key INT, data TEXT)
RETURNS void AS
$BODY$
BEGIN
    IF EXISTS (SELECT a FROM tabla WHERE a = key) THEN
        UPDATE tabla SET b = data WHERE a = key;
        RETURN;
    ELSE
        INSERT INTO tabla(a,b) VALUES (key, data);
        RETURN;
    END IF;
END;
$BODY$
LANGUAGE plpgsql;
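Usage, mirroring the merge_db example from the manual quoted earlier:

SELECT merge_tabla(1, 'david');
SELECT merge_tabla(1, 'dennis');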
Personally, I've set up a "rule" attached to the insert statement. Say you had a "dns" table that recorded dns hits per customer on a per-time basis:
CREATE TABLE dns (
"time" timestamp without time zone NOT NULL,
customer_id integer NOT NULL,
hits integer
);
You wanted to be able to re-insert rows with updated values, or create them if they didn't exist already, keyed on customer_id and "time". Something like this:
CREATE RULE replace_dns AS
ON INSERT TO dns
WHERE (EXISTS (SELECT 1 FROM dns WHERE ((dns."time" = new."time")
AND (dns.customer_id = new.customer_id))))
DO INSTEAD UPDATE dns
SET hits = new.hits
WHERE ((dns."time" = new."time") AND (dns.customer_id = new.customer_id));
Update: This has the potential to fail if simultaneous inserts are happening, as it will generate unique_violation exceptions. However, the non-terminated transaction will continue and succeed, and you just need to repeat the terminated transaction.
If there are tons of inserts happening all the time, however, you will want to put a table lock around the insert statements: SHARE ROW EXCLUSIVE locking will prevent any operations that could insert, delete or update rows in your target table. Updates that do not touch the unique key are safe, though, so if no operation will do this, use advisory locks instead.
Also, the COPY command does not use RULES, so if you're inserting with COPY, you'll need to use triggers instead.
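A hedged sketch of that trigger alternative, reusing the dns table above; the function and trigger names are made up, and the same concurrent-insert race as the rule still applies:

CREATE FUNCTION dns_upsert() RETURNS trigger AS $$
BEGIN
    -- try the update first; if a row matched, suppress the insert
    UPDATE dns SET hits = NEW.hits
    WHERE "time" = NEW."time" AND customer_id = NEW.customer_id;
    IF FOUND THEN
        RETURN NULL;  -- returning NULL from a BEFORE trigger cancels the INSERT
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER dns_replace
    BEFORE INSERT ON dns
    FOR EACH ROW EXECUTE PROCEDURE dns_upsert();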
Similar to the most-liked answer, but it works slightly faster:
WITH upsert AS (UPDATE spider_count SET tally=1 WHERE date='today' RETURNING *)
INSERT INTO spider_count (spider, tally) SELECT 'Googlebot', 1 WHERE NOT EXISTS (SELECT * FROM upsert)
(source: http://www.the-art-of-web.com/sql/upsert/)
Here is a custom "upsert" function based on the one above, if you want to INSERT AND REPLACE:
CREATE OR REPLACE FUNCTION upsert(sql_insert text, sql_update text)
RETURNS void AS
$BODY$
BEGIN
    -- first try to insert, and update only if that fails;
    -- note: the insert carries the pk, the update does not
    EXECUTE sql_insert;
    RETURN;
EXCEPTION WHEN unique_violation THEN
    EXECUTE sql_update;
    IF FOUND THEN
        RETURN;
    END IF;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
ALTER FUNCTION upsert(text, text)
OWNER TO postgres;
And then to execute it, do something like this:
SELECT upsert($$INSERT INTO ...$$,$$UPDATE... $$)
It is important to use double dollar quoting ($$) around the arguments to avoid compiler errors.
Check the speed...
According to the PostgreSQL documentation of the INSERT statement, handling the ON DUPLICATE KEY case is not supported. That part of the syntax is a proprietary MySQL extension.
I have the same issue for managing account settings as name/value pairs.
The design criterion is that different clients could have different settings sets.
My solution, similar to JWP's, is to bulk erase and replace, generating the merge record within your application.
This is pretty bulletproof and platform independent, and since there are never more than about 20 settings per client, this is only 3 fairly low-load db calls - probably the fastest method.
The alternative of updating individual rows - checking for exceptions, then inserting - or some combination thereof, is hideous code, slow, and often breaks because (as mentioned above) non-standard SQL exception handling changes from db to db - or even release to release.
#This is pseudo-code - within the application:
BEGIN TRANSACTION - get transaction lock
SELECT all current name value pairs where id = $id into a hash record
create a merge record from the current and update record
(set intersection where shared keys in new win, and empty values in new are deleted).
DELETE all name value pairs where id = $id
COPY/INSERT merged records
END TRANSACTION
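A minimal SQL sketch of that erase-and-replace transaction, assuming a hypothetical settings(id, name, value) table; the merged rows come from the application:

BEGIN;
DELETE FROM settings WHERE id = 42;
INSERT INTO settings (id, name, value) VALUES
    (42, 'theme',  'dark'),
    (42, 'locale', 'en_US');
COMMIT;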
CREATE OR REPLACE FUNCTION save_user(_id integer, _name character varying)
RETURNS boolean AS
$BODY$
BEGIN
UPDATE users SET name = _name WHERE id = _id;
IF FOUND THEN
RETURN true;
END IF;
BEGIN
INSERT INTO users (id, name) VALUES (_id, _name);
EXCEPTION WHEN OTHERS THEN
UPDATE users SET name = _name WHERE id = _id;
END;
RETURN TRUE;
END;
$BODY$
LANGUAGE plpgsql VOLATILE STRICT
For merging small sets, using the above function is fine. However, if you are merging large amounts of data, I'd suggest looking into http://mbk.projects.postgresql.org
The current best practice that I'm aware of is:
COPY new/updated data into temp table (sure, or you can do INSERT if the cost is ok)
Acquire Lock [optional] (advisory is preferable to table locks, IMO)
Merge. (the fun part - a sketch follows this list)
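A hedged sketch of that pattern, assuming a target(id, val) table; the staging table name, file path, and advisory lock key are all made up:

CREATE TEMP TABLE staging (LIKE target INCLUDING ALL);
COPY staging FROM '/path/to/data.csv' WITH (FORMAT csv);

BEGIN;
SELECT pg_advisory_xact_lock(12345);  -- optional: serialize concurrent merges
UPDATE target t SET val = s.val
FROM staging s
WHERE t.id = s.id;                    -- update rows that already exist
INSERT INTO target (id, val)
SELECT s.id, s.val
FROM staging s
WHERE NOT EXISTS (SELECT 1 FROM target t WHERE t.id = s.id);  -- insert the rest
COMMIT;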
UPDATE will return the number of modified rows. If you use JDBC (Java), you can check this value against 0 and, if no rows were affected, fire an INSERT instead. If you use some other programming language, the number of modified rows can probably still be obtained; check the documentation.
This may not be as elegant, but you get much simpler SQL that is more trivial to use from the calling code. By contrast, if you write the ten-line script in PL/pgSQL, you probably should have a unit test of one kind or another just for it.
Edit: This does not work as expected. Unlike the accepted answer, this produces unique key violations when two processes repeatedly call upsert_foo concurrently.
Eureka! I figured out a way to do it in one query: use UPDATE ... RETURNING to test if any rows were affected:
CREATE TABLE foo (k INT PRIMARY KEY, v TEXT);
CREATE FUNCTION update_foo(k INT, v TEXT)
RETURNS SETOF INT AS $$
UPDATE foo SET v = $2 WHERE k = $1 RETURNING $1
$$ LANGUAGE sql;
CREATE FUNCTION upsert_foo(k INT, v TEXT)
RETURNS VOID AS $$
INSERT INTO foo
SELECT $1, $2
WHERE NOT EXISTS (SELECT update_foo($1, $2))
$$ LANGUAGE sql;
The UPDATE has to be done in a separate procedure because, unfortunately, this is a syntax error:
... WHERE NOT EXISTS (UPDATE ...)
Now it works as desired:
SELECT upsert_foo(1, 'hi');
SELECT upsert_foo(1, 'bye');
SELECT upsert_foo(3, 'hi');
SELECT upsert_foo(3, 'bye');
PostgreSQL >= v15
Big news on this topic: in PostgreSQL v15 it is possible to use the MERGE command. In fact, this long-awaited feature was listed first among the improvements of the v15 release.
It is similar to INSERT ... ON CONFLICT but more batch-oriented. It has a powerful WHEN MATCHED vs WHEN NOT MATCHED structure that gives the ability to INSERT, UPDATE or DELETE on such conditions.
It not only eases bulk changes, it adds more control than the traditional UPSERT and INSERT ... ON CONFLICT.
Take a look at this very complete sample from the official page:
MERGE INTO wines w
USING wine_stock_changes s
ON s.winename = w.winename
WHEN NOT MATCHED AND s.stock_delta > 0 THEN
INSERT VALUES(s.winename, s.stock_delta)
WHEN MATCHED AND w.stock + s.stock_delta > 0 THEN
UPDATE SET stock = w.stock + s.stock_delta
WHEN MATCHED THEN
DELETE;
PostgreSQL v9, v10, v11, v12, v13, v14
If your version is below v15 but at least v9.5, probably the best choice is to use the UPSERT syntax with the ON CONFLICT clause.
Here is an example of how to do an upsert with parameters and without special SQL constructions, for when you have a special condition (sometimes you can't use ON CONFLICT because you can't create the constraint):
WITH upd AS (
    UPDATE view_layer
    SET metadata = :metadata
    WHERE layer_id = :layer_id AND view_id = :view_id
    RETURNING id
)
INSERT INTO view_layer (layer_id, view_id, metadata)
(SELECT :layer_id layer_id, :view_id view_id, :metadata metadata
 FROM view_layer l
 WHERE NOT EXISTS (SELECT id FROM upd WHERE id IS NOT NULL)
 LIMIT 1)
RETURNING id
maybe it will be helpful

Update MySQL table in chunks

I am trying to update a MySQL InnoDB table with c. 100 million rows. The query takes close to an hour, which is not a problem.
However, I'd like to split this update into smaller chunks in order not to block table access. This update does not have to be an isolated transaction.
At the same time, the splitting of the update should not be too expensive in terms of additional overhead.
I considered looping through the table in a procedure using:
UPDATE TABLENAME SET NEWVAR=<expression> LIMIT batchsize, offset
But UPDATE does not have an offset option in MySQL.
I understand I could try to UPDATE ranges of data that are SELECTed on a key, together with the LIMIT option, but that seems rather complicated for that simple task.
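For reference, that manual key-range approach would look something like this (table and column names are borrowed from the example call in the answer below):

UPDATE someTable SET var = 0 WHERE theKey > 0      AND theKey <= 500000;
UPDATE someTable SET var = 0 WHERE theKey > 500000 AND theKey <= 1000000;
-- ... and so on, advancing the range until MAX(theKey) is covered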
I ended up with the procedure listed below. It works but I am not sure whether it is efficient with all the queries to identify consecutive ranges. It can be called with the following arguments (example):
call chunkUpdate('SET var=0','someTable','theKey',500000);
Basically, the first argument is the update command (e.g. something like "set x = ..."), followed by the MySQL table name, followed by a numeric (integer) key that has to be unique, followed by the size of the chunks to be processed. The key should have an index for reasonable performance. The "n" variable and the "select" statements in the code below can be removed; they are only for debugging.
delimiter //
CREATE PROCEDURE chunkUpdate (IN cmd VARCHAR(255), IN tab VARCHAR(255), IN ky VARCHAR(255), IN sz INT)
BEGIN
  SET @sqlgetmin = CONCAT("SELECT MIN(",ky,")-1 INTO @minkey FROM ",tab);
  SET @sqlgetmax = CONCAT("SELECT MAX(",ky,") INTO @maxkey FROM ( SELECT ",ky," FROM ",tab," WHERE ",ky,">@minkey ORDER BY ",ky," LIMIT ",sz,") AS TMP");
  SET @sqlstatement = CONCAT("UPDATE ",tab," ",cmd," WHERE ",ky,">@minkey AND ",ky,"<=@maxkey");
  SET @n=1;
  PREPARE getmin FROM @sqlgetmin;
  PREPARE getmax FROM @sqlgetmax;
  PREPARE statement FROM @sqlstatement;
  EXECUTE getmin;
  REPEAT
    EXECUTE getmax;
    SELECT cmd, @n AS step, @minkey AS min, @maxkey AS max;
    EXECUTE statement;
    SET @minkey=@maxkey;
    SET @n=@n+1;
  UNTIL @maxkey IS NULL
  END REPEAT;
  SELECT CONCAT(cmd, " EXECUTED IN ",@n," STEPS") AS MESSAGE;
END//

MySql Procedure Producing Wrong Results

I noticed a very interesting (and unexpected) thing yesterday. I was given a task (on a production environment) to update three columns of TableA (I am changing the table and column names for obvious reasons) with all the values present in dummytable. The primary key of both tables is column A. I know this task was very simple and could be accomplished in several ways, but I chose to write a stored procedure (given below) for it.
When the stored procedure finished executing, it was noticed that columns B, C & statusCode had the same values (i.e. thousands of records had identical values in these three columns). Can someone tell me what went wrong?
1) What's wrong (or missing) in this stored procedure? (Dummy table had thousands of records as well)
2) What could be the best possible way of doing this task other than creating a stored procedure?
PS: I created (and executed) this stored procedure on the production environment using MySQL Workbench, and I got an exception during execution that said something like "Lost connection to MySQL server". But since I was running the procedure on a remote machine, I guess there was no interruption on the server side while the procedure was executing.
Here is my stored procedure.
DELIMITER $$
CREATE DEFINER=`ABC`@`%` PROCEDURE `RetrieveExtractionData`()
BEGIN
  DECLARE claimlisttraversed BOOLEAN DEFAULT FALSE;
  DECLARE a VARCHAR(20);
  DECLARE b INTEGER;
  DECLARE c INTEGER;
  DECLARE claimlist CURSOR FOR
    SELECT `dummytable`.`A`,
           `dummytable`.`B`,
           `dummytable`.`C`
    FROM `ABC`.`dummytable`;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET claimlisttraversed = TRUE;
  OPEN claimlist;
  claimlistloop: LOOP
    FETCH claimlist INTO a, b, c;
    IF claimlisttraversed THEN
      CLOSE claimlist;
      LEAVE claimlistloop;
    END IF;
    UPDATE `ABC`.`TableA`
    SET `B` = b,
        `C` = c,
        `statuscode` = 'Sent'
    WHERE `A` = a;
  END LOOP claimlistloop;
END
For your first question:
1) What's wrong (or missing) in this stored procedure? (Dummy table had
thousands of records as well)
I guess you forgot to CLOSE the CURSOR. Right after you end the LOOP, you should CLOSE the CURSOR.
END LOOP claimlistloop;
CLOSE claimlist;
END
2) What could be the best possible way of doing this task other than
creating a stored procedure?
Doing that in the STORED PROCEDURE should be fine. And also using CURSOR would be fine since you will just execute the procedure once (I guess because this is a production fix).
But from your question, you just want to update TableA based from the provided DummyTable. I assume that these tables have the same columns.
So I think this query is better than the CURSOR:
UPDATE TableA A
INNER JOIN DummyTable D ON D.A = A.A
SET A.B = D.B
, A.C = D.C
, A.statuscode = 'Sent';
But please try it first on a backup or dummy table. I haven't tested it yet.
Forget the cursor. In fact you should never use a cursor if it's avoidable. Cursors are incredibly slow.
Simply do
UPDATE
yourTable yt
INNER JOIN dummyTable dt ON yt.A = dt.A
SET
yt.B = dt.B,
yt.C = dt.C;
and you're fine.
1) What's wrong (or missing) in this stored procedure? (Dummy table
had thousands of records as well)
2) What could be the best possible
way of doing this task other than creating a stored procedure?
IMHO the most important thing that you're currently missing is that you don't need any cursors for that. Your whole stored procedure is one UPDATE statement. Execute it alone or wrap it in a stored procedure
CREATE PROCEDURE RetrieveExtractionData()
UPDATE TableA a JOIN dummytable d
ON a.a = d.a
SET a.b = d.b, a.c = d.c, a.statuscode = 'Sent';
You don't even need to change the delimiter or use a BEGIN ... END block.
Here is an SQLFiddle demo.

MySQL Result consisted of more than one row on stored procedure

This stored procedure that I'm working on errors out sometimes. I am getting a Result consisted of more than one row error, but only for certain JOB_ID_INPUT values. I understand what causes this error, so I have tried to be really careful to make sure that my return values are scalar when they should be. It's tough to see into the stored procedure, so I'm not sure where the error could be generated. Since the error is thrown conditionally, it has me thinking memory could be an issue, or cursor reuse. I don't work with cursors that often, so I'm not sure. Thank you to anyone who helps.
DROP PROCEDURE IF EXISTS export_job_candidates;
DELIMITER $$
CREATE PROCEDURE export_job_candidates (IN JOB_ID_INPUT INT(11))
BEGIN
  DECLARE candidate_count INT(11) DEFAULT 0;
  DECLARE candidate_id INT(11) DEFAULT 0;

  # these are the ib variables
  DECLARE _overall_score DECIMAL(5, 2) DEFAULT 0.0;

  # declare the cursor that will be needed for this SP
  DECLARE curs CURSOR FOR SELECT user_id FROM job_application WHERE job_id = JOB_ID_INPUT;

  # this table stores all of the data that will be returned from the various
  # tables that are joined together to build the final export
  CREATE TEMPORARY TABLE IF NOT EXISTS candidate_stats_temp_table (
    overall_score_ib DECIMAL(5, 2) DEFAULT 0.0
  ) ENGINE = memory;

  SELECT COUNT(job_application.id) INTO candidate_count FROM job_application WHERE job_id = JOB_ID_INPUT;

  OPEN curs;

  # loop controlling the insert of data into the temp table that is returned by this function
  insert_loop: LOOP
    # end the loop if there is no more computation that needs to be done
    IF candidate_count = 0 THEN
      LEAVE insert_loop;
    END IF;

    FETCH curs INTO candidate_id;

    # get the ib data that may exist for this user
    SELECT tests.overall_score
    INTO _overall_score
    FROM tests
    WHERE user_id = candidate_id;

    # build the insert for the table that is being constructed via this loop
    INSERT INTO candidate_stats_temp_table (
      overall_score_ib
    ) VALUES (
      _overall_score
    );

    SET candidate_count = candidate_count - 1;
  END LOOP;

  CLOSE curs;

  SELECT * FROM candidate_stats_temp_table WHERE 1;
END $$
DELIMITER ;
The WHERE 1 (as pointed out by @cdonner) definitely doesn't look right, but I'm pretty sure this error is happening because one of your SELECT ... INTO commands is returning more than one row.
This one should be OK because it's an aggregate without a GROUP BY, which always returns one row:
SELECT COUNT(job_application.id) INTO candidate_count
FROM job_application WHERE job_id = JOB_ID_INPUT;
So it's probably this one:
# get the ib data that may exist for this user
SELECT
tests.overall_score
INTO
_overall_score
FROM
tests
WHERE
user_id = candidate_id;
Try to figure out if it's possible for this query to return more than one row and, if so, how to work around it. One way might be to take the MAX of the overall score:
SELECT MAX(tests.overall_score) INTO _overall_score
FROM tests
WHERE user_id = candidate_id;
I think you want to use LIMIT 1 in your select, not WHERE 1.
Aside from using this safety net, you should understand your data to figure out why you are getting multiple results. Without seeing the data, it is difficult for me to take a guess.
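Applied to the procedure's inner select, that suggestion would look like this:

SELECT tests.overall_score
INTO _overall_score
FROM tests
WHERE user_id = candidate_id
LIMIT 1;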

Locking stored procedure

I have a stored procedure with a select and an update. I would like to prevent multiple users, from executing it, at the same time, so I don't update, based on an incorrect select.
How do I lock it?
I've read about various solutions (transaction isolation, XLOCK), but I haven't been able to figure out what I really want or how to do it.
The easiest way is to forget about data locks and instead look at sp_getapplock to control access through the code:
BEGIN TRY
EXEC sp_getapplock ...
SELECT ...
UPDATE ...
EXEC sp_releaseapplock
END TRY
...
That said, with things like the OUTPUT clause and judicious use of ROWLOCK and UPDLOCK, there is a good chance the UPDATE and SELECT can be one statement.
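For illustration, a hedged sketch of that single-statement idea against the [X]([x]) table defined in the next answer (the table variable is made up):

DECLARE @prev TABLE ([x] INT);

UPDATE [X] WITH (ROWLOCK, UPDLOCK)
SET [x] = [x] + 1
OUTPUT deleted.[x] INTO @prev;  -- the pre-update value, captured atomically with the write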
Using the XLOCK table hint in the SELECT query:
CREATE TABLE [X]([x] INT NOT NULL)
GO
INSERT [X]([x]) SELECT 0
GO

CREATE PROCEDURE [ATOMIC]
AS
BEGIN
  BEGIN TRAN
  DECLARE @x INT = (
    SELECT [x]
    FROM [X] WITH (XLOCK)
  ) + 1
  UPDATE [X] SET [x] = @x
  COMMIT TRAN
END
GO
GO
You can then test this by running
EXEC [ATOMIC]
GO 10000
simultaneously from different sessions. You can test using
SELECT [x] FROM [X]
The value should be exactly 10,000 times the number of sessions you ran. If the number is less than expected, you don't have an atomic read + write, or some SPIDs may have been killed due to deadlocking.