First MySQL command-line session:
use usersbase;
LOAD DATA LOCAL INFILE 'D:/base/users.txt'
INTO TABLE users
FIELDS TERMINATED BY ',';
Second session:
use usersbase;
set session transaction isolation level read uncommitted;
select count(1) from users;
How do I stop loading from the file if I see that the users table has n rows and I don't need more? How can I keep the rows loaded so far and stop loading?
Try this:
Use LOAD DATA INFILE ... IGNORE ...
Add a temporary trigger to this table, like:
DELIMITER $$
CREATE TRIGGER prevent_excess_lines_insertion
BEFORE INSERT
ON users
FOR EACH ROW
BEGIN
    IF (SELECT COUNT(*) FROM users) >= 50000 THEN
        SET NEW.id = 1;
    END IF;
END$$
DELIMITER ;
When a line is loaded, the number of rows already in the table (excluding the row about to be inserted) is counted and compared with the predefined limit (50000).
If the current row count is below the limit, the row is inserted.
If the limit has been reached, a predefined value (1) is assigned to the primary key column. This causes a unique-constraint violation, which is skipped because of the IGNORE modifier.
With this approach the whole file is still read to the end, but only the needed number of rows is inserted.
If you want to break the process instead, remove the IGNORE modifier and replace the SET statement with a SIGNAL statement that raises a generic SQL error; the loading process will then be terminated.
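A sketch of that terminating variant (the '45000' SQLSTATE and the message text are arbitrary choices):
DELIMITER $$
CREATE TRIGGER prevent_excess_lines_insertion
BEFORE INSERT
ON users
FOR EACH ROW
BEGIN
    -- abort the whole statement once the cap is reached
    IF (SELECT COUNT(*) FROM users) >= 50000 THEN
        SIGNAL SQLSTATE '45000'
            SET MESSAGE_TEXT = 'Row limit reached, load aborted';
    END IF;
END$$
DELIMITER ;
Be aware, though, that on a transactional engine such as InnoDB an aborted LOAD DATA is rolled back as a single statement, so the rows loaded so far survive only on a non-transactional engine such as MyISAM.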
Do not forget to drop the trigger immediately after performing the import.
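For example:
DROP TRIGGER IF EXISTS prevent_excess_lines_insertion;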
Note that COUNT(*) in InnoDB can be pretty slow on large tables. Doing it before each insert might make the load take a while. – Barmar
This is true :(
You may use a user-defined variable instead of querying the number of rows in the table. The trigger then becomes:
DELIMITER $$
CREATE TRIGGER prevent_excess_lines_insertion
BEFORE INSERT
ON users
FOR EACH ROW
BEGIN
    IF (@needed_count := @needed_count - 1) < 0 THEN
        SET NEW.id = 1;
    END IF;
END$$
DELIMITER ;
Before the insertion you must set this variable to the number of rows to be loaded, for example SET @needed_count := 50000;. The variable must be set in the same connection, and its name must not clash with the names of any other variables you use.
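Putting it together, a complete run might look like this (file, table, and delimiter taken from the question; 50000 is the example cap):
SET @needed_count := 50000;
LOAD DATA LOCAL INFILE 'D:/base/users.txt'
IGNORE INTO TABLE users
FIELDS TERMINATED BY ',';
DROP TRIGGER prevent_excess_lines_insertion;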
MySQL provides an automatic mechanism to increment record IDs. This is fine for many purposes, but I need to be able to use sequences as offered by Oracle. Obviously, there is no point in creating a table for that purpose.
The solution SHOULD be simple:
1) Create a table to host all the needed sequences,
2) Create a function that increases the value of a specific sequence and returns the new value,
3) Create a function that returns the current value of a sequence.
In theory, it looks simple... BUT...
When increasing the value of a sequence (much the same as nextval in Oracle), you need to prevent other sessions from performing this operation (or even fetching the current value) until the update is completed.
Two theoretical options:
a - Use an UPDATE statement that would return the new value in a single shot, or
b - Lock the table between the UPDATE and SELECT.
Unfortunately, it would appear that MySQL does not allow locking tables within functions/procedures, and when trying to do the whole thing in a single statement (like UPDATE ... RETURNING ...) you must use @-type variables, which survive the completion of the function/procedure.
Does anyone have an idea/working solution for this?
Thanks.
The following is a simple example with a FOR UPDATE intention lock: a row-level lock with the InnoDB engine. The sample shows four rows of next available sequences that will not suffer from the well-known InnoDB gap anomaly (the case where gaps occur after failed usage of an AUTO_INCREMENT).
Schema:
-- drop table if exists sequences;
create table sequences
( id int auto_increment primary key,
sectionType varchar(200) not null,
nextSequence int not null,
unique key(sectionType)
) ENGINE=InnoDB;
-- truncate table sequences;
insert sequences (sectionType,nextSequence) values
('Chassis',1),('Engine Block',1),('Brakes',1),('Carburetor',1);
Sample code:
START TRANSACTION; -- Line1
SELECT nextSequence into @mine_to_use from sequences where sectionType='Carburetor' FOR UPDATE; -- Line2
select @mine_to_use; -- Line3
UPDATE sequences set nextSequence=nextSequence+1 where sectionType='Carburetor'; -- Line4
COMMIT; -- Line5
Ideally you do not have a Line3, or any bloated code at all, that would delay other clients on a lock wait. Meaning: get your next sequence to use, perform the update (the incrementing part), and COMMIT, ASAP.
The above in a stored procedure:
DROP PROCEDURE if exists getNextSequence;
DELIMITER $$
CREATE PROCEDURE getNextSequence(p_sectionType varchar(200),OUT p_YoursToUse int)
BEGIN
-- for flexibility, return the sequence number as both an OUT parameter and a single row resultset
START TRANSACTION;
SELECT nextSequence into @mine_to_use from sequences where sectionType=p_sectionType FOR UPDATE;
UPDATE sequences set nextSequence=nextSequence+1 where sectionType=p_sectionType;
COMMIT; -- get it and release INTENTION LOCK ASAP
set p_YoursToUse=@mine_to_use; -- set the OUT parameter
select @mine_to_use as yourSeqNum; -- also return as a 1 column, 1 row resultset
END$$
DELIMITER ;
Test:
set @myNum := -1;
call getNextSequence('Carburetor',@myNum);
+------------+
| yourSeqNum |
+------------+
|          4 |
+------------+
select @myNum; -- 4
Modify the stored procedure according to your needs, such as keeping only one of the two mechanisms for returning the sequence number (either the OUT parameter or the result set). In other words, it is easy to ditch the OUT parameter concept.
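For example, a result-set-only variant might look like this (a sketch; it simply drops the OUT parameter from the version above):
DROP PROCEDURE if exists getNextSequence;
DELIMITER $$
CREATE PROCEDURE getNextSequence(p_sectionType varchar(200))
BEGIN
    START TRANSACTION;
    SELECT nextSequence into @mine_to_use from sequences where sectionType=p_sectionType FOR UPDATE;
    UPDATE sequences set nextSequence=nextSequence+1 where sectionType=p_sectionType;
    COMMIT; -- release the row lock ASAP
    SELECT @mine_to_use as yourSeqNum; -- 1 column, 1 row resultset
END$$
DELIMITER ;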
If you do not adhere to releasing the lock ASAP (it is obviously not needed after the UPDATE) and instead proceed with time-consuming code before the release, then the following can occur for other clients awaiting a sequence number, after a timeout period:
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
Hopefully this is never an issue.
show variables where variable_name='innodb_lock_wait_timeout';
MySQL Manual Page for innodb_lock_wait_timeout.
On my system at the moment it has a value of 50 (seconds). A wait of more than a second or two is probably unbearable in most situations.
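If that default is too long for your situation, it can be lowered for the session so that waiters fail fast instead of hanging, for example:
SET SESSION innodb_lock_wait_timeout = 5; -- seconds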
Also of interest during transactions is the TRANSACTIONS section of the output from the following command:
SHOW ENGINE INNODB STATUS;
I'm experiencing a race condition because I'm dealing with a lot of concurrency.
I'm trying to combine these two MySQL statements so that they execute as a single atomic operation.
I need to select a row and update the same one...
SELECT id_file FROM filenames WHERE pending=1 LIMIT 1;
UPDATE filenames SET pending=2 WHERE id_file = <id from the SELECT>;
Another solution to the race-condition I'm experiencing would be to perform an UPDATE query where pending=1 and somehow get the ID of the updated row, but I'm not sure if that's even possible?
Thanks
Dealing with concurrency is one of the basic functions of transactions.
Wrap your queries in one transaction and tell the DBMS that you need the row to remain unchanged in between, using FOR UPDATE:
BEGIN;
SELECT id_file FROM filenames WHERE pending=1 LIMIT 1 FOR UPDATE;
# do whatever you like
UPDATE filenames SET pending=2 WHERE id_file = <id from the SELECT>;
COMMIT;
You can execute these statements with four mysqli_query calls and do whatever you want in between, without needing to worry about the consistency of your database. The selected row is safe until you release it.
You can avoid the race condition by performing just an UPDATE statement on the table, letting it identify the row to be modified, and then subsequently retrieving column values from that row.
There's a "trick" to returning column values, in your case the value of the id_file column from the row that was just updated. You can use either the LAST_INSERT_ID() function (only if the column is an integer type) or a MySQL user-defined variable.
If the value of the column you want to retrieve is an integer, you can use the LAST_INSERT_ID() function (which supports a 64-bit BIGINT value).
For example:
UPDATE filenames
SET pending = 2
, id_file = LAST_INSERT_ID(id_file)
WHERE pending = 1
LIMIT 1;
Following the successful execution of the UPDATE statement, you'll want to verify that at least one row was affected. (If any row satisfied the WHERE clause and the statement succeeded, we know that one row was affected.) Then you can retrieve that value, in the same session:
SELECT LAST_INSERT_ID();
to retrieve the value of the id_file column of the last row processed by the UPDATE statement. Note that if the UPDATE processes multiple rows, only the value from the last row it processed will be available. (But that won't be an issue for you, since there's a LIMIT 1 clause.)
Again, you'll want to ensure that a row was actually updated before you rely on the value returned by the LAST_INSERT_ID() function.
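ROW_COUNT() is one way to perform that check; it reports how many rows the last statement actually changed:
SELECT ROW_COUNT(); -- 1 if the UPDATE matched a pending row, 0 otherwise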
For non-integer columns, you can use a MySQL user-defined variable in a similar way: assign the column value to a user-defined variable, then immediately retrieve the value stored in it.
-- initialize user-defined variable, to "clear" any previous value
SELECT @id_file := NULL;
-- save value of id_file column into user-defined variable
UPDATE filenames
SET pending = 2
, id_file = (SELECT @id_file := id_file)
WHERE pending = 1
LIMIT 1;
-- retrieve value stored in user-defined variable
SELECT @id_file;
Note that the value of this variable is maintained within the session. If the UPDATE statement doesn't find any rows that satisfy the predicate (the WHERE clause), the value of the user-defined variable will be unaffected; so, to make sure you don't inadvertently pick up a stale value, you may want to first initialize the variable to NULL.
Note that it's important that a subsequently fired trigger doesn't modify the value of that user-defined variable. (The user-defined variable is "in scope" for the current session.)
It's also possible to do the assignment to the user-defined variable within in a trigger, but I'm not going to demonstrate that, and I would not recommend you do it in a trigger.
I have a cursor which is declared as so:
DECLARE staging_cur CURSOR FOR
SELECT
col1, col2, ......
FROM crawl_db.staging_listing
WHERE is_deleted = FALSE;
I then fetch each row, perform some checks, and then insert the row into another (production) database:
OPEN staging_cur;
the_loop: LOOP
FETCH staging_cur
INTO col1_val, col2_val,.....;
-- perform some checks and some optional inserts
-- for example, if city with given name is not found in production DB, insert it
-- insert into production db
END LOOP the_loop;
I realize I need to declare a variable (col1_val, col2_val, ...) for each corresponding column of table staging_listing (col1, col2, ...). The problem is that this table contains 90-100 columns, so declaring all those variables is really cumbersome.
It seems there should be a better way than this. Is there some way in which we can access the column of the cursor's current row without having to declare separate variables to hold the column values?
If you need to insert rows into another table, then a better way is to use an INSERT ... SELECT statement. Try to avoid using cursors.
See INSERT ... SELECT Syntax in the MySQL manual.
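As a sketch for the case above (the production table name and the column list are placeholders, since the question elides them):
INSERT INTO production_db.listing (col1, col2 /* , ... */)
SELECT col1, col2 /* , ... */
FROM crawl_db.staging_listing
WHERE is_deleted = FALSE;
The per-row checks can usually be folded into the WHERE clause or into JOINs against the lookup tables (the city check, for example).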
I want to add 10,000 rows to a MySQL table. The table has a field, let's call it "Number", that needs to increment from 540000 to 549999.
This is just something that needs to run once, so performance is not critical. Is there a MySQL command that will do this, or do I need to write a script to call 10,000 insert statements?
Assuming you have those 10,000 rows in a tab-delimited file, you can bulk load the data into your table and set the Number value incrementally like this:
set @number = (540000 - 1);
load data infile '/tmp/your_data.txt'
ignore into table your_table
(column_1,...,column_n)
set Number = (@number := @number + 1);
I ended up creating a script with 10,000 insert statements.
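For the record, the loop can also live in the database itself. A sketch of a one-off procedure (your_table and the Number column follow the earlier example; fill_numbers is a made-up name):
DELIMITER $$
CREATE PROCEDURE fill_numbers()
BEGIN
    DECLARE n INT DEFAULT 540000;
    WHILE n <= 549999 DO
        INSERT INTO your_table (Number) VALUES (n);
        SET n = n + 1;
    END WHILE;
END$$
DELIMITER ;
CALL fill_numbers();
DROP PROCEDURE fill_numbers;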
I'm having some issues dealing with updating and inserting millions of row in a MySQL Database. I need to flag 50 million rows in Table A, insert some data from the marked 50 million rows into Table B, then update those same 50 million rows in Table A again. There are about 130 million rows in Table A and 80 million in Table B.
This needs to happen on a live server without denying access to other queries from the website. The problem is that while this stored procedure is running, other queries from the website end up locked, and the HTTP requests time out.
Here's the gist of the SP, simplified a little for illustration purposes:
DELIMITER $$
CREATE DEFINER=`user`@`localhost` PROCEDURE `MyProcedure`(
totalLimit int
)
BEGIN
SET @totalLimit = totalLimit;
/* Prepare new rows to be issued */
PREPARE STMT FROM 'UPDATE tableA SET `status` = "Being-Issued" WHERE `status` = "Available" LIMIT ?';
EXECUTE STMT USING @totalLimit;
/* Insert new rows for usage into tableB */
INSERT INTO tableB (/* my fields */)
SELECT /* some values from TableA */
FROM tableA
WHERE `status` = "Being-Issued";
/* Set rows as being issued */
UPDATE tableB SET `status` = 'Issued' WHERE `status` = 'Being-Issued';
END$$
DELIMITER ;
Processing 50M rows three times will be slow irrespective of what you're doing.
Make sure your updates affect smaller, disjoint sets, and execute them one by one rather than all of them within a single transaction.
If you're doing this already and MySQL is misbehaving, try this slight tweak to your code:
create a temporary table
begin
  insert into tmp_table
    select your stuff
    limit ?
    for update
  do your update on A using tmp_table
commit
begin
  do your insert on B using tmp_table
  do your update on A using tmp_table
commit
This should keep the locks held for a minimal time.
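A concrete sketch of that outline, reusing the names from the question (the id primary key on tableA, the tmp_batch name, and the batch size of 10,000 are assumptions):
CREATE TEMPORARY TABLE tmp_batch
    SELECT id FROM tableA WHERE `status` = 'Available' LIMIT 10000;
START TRANSACTION;
UPDATE tableA a JOIN tmp_batch t ON a.id = t.id
SET a.`status` = 'Being-Issued';
COMMIT;
START TRANSACTION;
INSERT INTO tableB (/* my fields */)
SELECT /* some values from tableA */
FROM tableA a JOIN tmp_batch t ON a.id = t.id;
UPDATE tableA a JOIN tmp_batch t ON a.id = t.id
SET a.`status` = 'Issued';
COMMIT;
DROP TEMPORARY TABLE tmp_batch;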
What about this? It basically calls the original stored procedure in a loop until the total amount needed is reached, with a sleep period between calls (like 2 seconds) to allow other queries to be processed.
increment is the amount to process in one call (10,000 in this case)
totalLimit is the total amount to be processed
sleepSec is the amount of time to rest between calls
BEGIN
    SET @x = 0;
    REPEAT
        SELECT SLEEP(sleepSec);
        SET @x = @x + increment;
        CALL OriginalProcedure( increment );
    UNTIL @x >= totalLimit
    END REPEAT;
END$$
Obviously it could use a little math to make sure the increment doesn't overshoot the total limit when the total isn't evenly divisible, but it appears to work (by "work" I mean other queries from web requests still get processed), and it seems to be faster overall as well.
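For example, the loop body could clamp the final batch by incrementing after the call:
REPEAT
    SELECT SLEEP(sleepSec);
    CALL OriginalProcedure( LEAST(increment, totalLimit - @x) );
    SET @x = @x + increment;
UNTIL @x >= totalLimit
END REPEAT;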
Any insight here? Is this a good idea? Bad idea?