Error inserting 100 million records into MySQL tables - mysql

I have a table Temp_load with the following columns:
key bigint(19) UN PK
plane_key bigint(20) PK
locat_key bigint(20) PK
time_period_key bigint(19) UN PK
business_unit_key bigint(19) UN
curret_allocated tinyint(1)
value float
valid_ind int(11)
last_updated datetime
These are the columns that the table Temp_load contains.
I am trying to insert data into this table using the query below:
INSERT INTO <Schema_name>.`Temp_load`
`key`,
plane_key,
locat_key,
time_period_key,
business_unit_key,
curret_allocated,
value)
(SELECT DISTINCT 1,
plane_key,
locat_key,
1,
CASE
WHEN current_area = 'HEALTH' THEN 1
WHEN current_area = 'BEAUTY/PERSONAL' THEN 3
WHEN current_area = 'GM' THEN 2
WHEN current_area = 'CONSUMABLES' THEN 4
end,
current_flag,
opt_metric_1
FROM staging.curves
WHERE opt_metric_1 IS NOT NULL
AND current_area IS NOT NULL);
The source table has 29 million records. The insert statement above has been running for more than 5 hours and is still going.
I am inserting the 29 million rows in one go like this, and I need to do three more similar inserts from the same source table on different columns.
When I try to load with LOAD DATA INFILE instead, it throws ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction.
I also tried increasing innodb_lock_wait_timeout to 120, but we still face the problem.
Also, before loading I disabled the flags below:
SET FOREIGN_KEY_CHECKS = 0;
SET UNIQUE_CHECKS = 0;
SET AUTOCOMMIT = 0;
Is there a more optimal way to do this, so that the insert can be done much faster?
Thanks

The problem might be that PRIMARY KEY integrity is checked for each row you insert. You should remove the PK before you insert your data and re-add it afterwards. In MariaDB you can disable some of these checks, but I don't know whether the same is possible in MySQL.
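For example, something along these lines (only a sketch; the column names are taken from the question, and re-adding the key on a table this size will itself take a while):
-- drop the composite primary key before the bulk insert
ALTER TABLE Temp_load DROP PRIMARY KEY;

-- ... run the big INSERT ... SELECT from the question here ...

-- re-add the key afterwards, so the index is rebuilt in one pass
ALTER TABLE Temp_load ADD PRIMARY KEY (`key`, plane_key, locat_key, time_period_key);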
An alternative might be to UNION your Temp_load and staging.curves into another table:
CREATE TABLE myNewTable
SELECT ... FROM Temp_load
UNION
SELECT ... FROM staging.curves

Related

MySQL - Select only the rows that have not been selected in the last read

Problem description
I have a table, say trans_flow:
CREATE TABLE trans_flow (
id BIGINT(20) AUTO_INCREMENT PRIMARY KEY,
card_no VARCHAR(50) DEFAULT NULL,
money INT(20) DEFAULT NULL
)
New data is inserted into this table constantly.
Now I want to fetch only the rows that have not been fetched by the previous query. For example, at 5:00 the id ranges from 1 to 100, and I read rows 80 - 100 and do some processing. Then, at 5:01, the maximum id has reached 150, and I want to get exactly rows 101 - 150. Otherwise, the processing program would read in old, already-processed data. Note that such queries are issued continuously. In a sense, I want to implement a "streaming process" on top of MySQL.
A tentative idea
I have a simple but maybe ugly solution. I create an auxiliary table query_cursor which stores the beginning and end ids of one query:
CREATE TABLE query_cursor (
task_id VARCHAR(20) PRIMARY KEY COMMENT 'Specify which task is reading this table',
first_row_id BIGINT(20) DEFAULT NULL,
last_row_id BIGINT(20) DEFAULT NULL
)
During each query, I first update the query range stored in this table by:
UPDATE query_cursor
SET first_row_id = last_row_id + 1,
last_row_id = (SELECT MAX(id) FROM trans_flow)
WHERE task_id = 'xxx'
And then I query trans_flow using the stored cursor:
SELECT * FROM trans_flow
WHERE id BETWEEN (SELECT first_row_id FROM query_cursor WHERE task_id = 'xxx')
AND (SELECT last_row_id FROM query_cursor WHERE task_id = 'xxx')
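(A minimal sketch of the two steps combined in one transaction, so the cursor stays consistent while new rows keep arriving; same table and task names as above:)
START TRANSACTION;

UPDATE query_cursor
SET first_row_id = last_row_id + 1,
    last_row_id = (SELECT MAX(id) FROM trans_flow)
WHERE task_id = 'xxx';

SELECT * FROM trans_flow
WHERE id BETWEEN (SELECT first_row_id FROM query_cursor WHERE task_id = 'xxx')
             AND (SELECT last_row_id FROM query_cursor WHERE task_id = 'xxx');

COMMIT;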
Question for help
Is there a simpler and more elegant implementation that achieves the same effect (ideally without the auxiliary table)? The MySQL version is 5.7.

MySQL 8 - Trigger on INSERT - duplicate AUTO_INCREMENT id for VCS

Trying to create a trigger that is called on INSERT and sets originId = id (the AUTO_INCREMENT value), I've used the SQL suggested here in the first block:
CREATE TRIGGER insert_example
BEFORE INSERT ON notes
FOR EACH ROW
SET NEW.originId = (
SELECT AUTO_INCREMENT
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
AND TABLE_NAME = 'notes'
);
Due to information_schema caching I have also set
information_schema_stats_expiry = 0
in the my.cnf file. Now the information gets updated almost instantly on every INSERT, as far as I can tell.
But, performing "direct" INSERTs via console with ~2min intervals, I keep getting not updated AUTO_INCREMENT values in originId.
(They shoud be equal to id fields)
While explicit queries, fetching AUTO_) result in updated correct values.
Thus I suspect that the result of SELECT AUTO_INCREMENT... subquery gets somehow.. what? cached?
How can one get around this?
Thank you.
Edit 1
I intended to implement a sort of VCS this way:
The user creates a new Note; the app marks it as 'new' and performs an INSERT into the MySQL table. This is the "origin" Note.
Then the user might edit this Note (completely) in the UI; the app will mark it as 'update' and INSERT it into the MySQL table as a new row again, but this time originId should be filled with the id of the "origin" Note (by the app logic). And so on.
This allows partitioning by originId on SELECT, fetching only the latest versions for the UI.
initial Problem:
If originId of "origin" Note is NULL, MySQL 8 window function(s) in default (and only?) RESPECT_NULL mode perform(s) framing not as expected ("well, duh, it's all about your NULLs in grouping-by column").
supposed Solution:
Set originId of "origin" Notes to id on their initial and only INSERT, expecting 2 benefits:
Easily fetch "origin" Notes via originId = id,
perform correct PARTITION by originId.
resulting Problem:
id is AUTO_INCREMENT, so there's no way (known to me) of getting its new value (for the new row) on INSERT via backend (namely, PHP).
supposed Solution:
So I was hoping to find some MySQL mechanism to solve this (avoiding manipulations of the id field), and TRIGGERs seemed the right way...
Edit 2
I believed that automatically duplicating the id AUTO_INCREMENT field (or any field) within MySQL would be extra fast and super easy, but it doesn't appear so at all now.
So possibly a better way is to have a vcsGroupId UNSIGNED INT field, responsible for "relating" a Note's versions (see the sketch after this list):
On create and "origin" INSERT - fill it with MAX(vcsGroupId) + 1,
On edit and "version" INSERT - fill it with the "sibling"/"origin" vcsGroupId value (fetched with a CTE),
On view and "normal" SELECT - perform framing with a window function, PARTITION BY vcsGroupId, ORDER BY id or timestamp DESC, then just use the 1st row (or order ascending and use the last),
On view and "origin" SELECT - almost the same, but reversed.
It seems easier, doesn't it?
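A rough sketch of that idea (the notes table and column names here are hypothetical, just to illustrate):
-- hypothetical schema, only to illustrate the idea
CREATE TABLE notes (
    id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    vcsGroupId INT UNSIGNED NOT NULL,
    body TEXT
);

-- "origin" INSERT: start a new group (MAX()+1 is racy without extra locking)
INSERT INTO notes (vcsGroupId, body)
SELECT COALESCE(MAX(vcsGroupId), 0) + 1, 'first version'
FROM notes;

-- "version" INSERT: reuse the group of the Note being edited (id 1 here)
INSERT INTO notes (vcsGroupId, body)
SELECT vcsGroupId, 'edited version'
FROM notes
WHERE id = 1;

-- "normal" SELECT: latest version per group via a window function
SELECT id, vcsGroupId, body
FROM (
    SELECT n.*, ROW_NUMBER() OVER (PARTITION BY vcsGroupId ORDER BY id DESC) AS rn
    FROM notes n
) AS ranked
WHERE rn = 1;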
What you are doing is playing with fire. I don't know exactly what can go wrong with your trigger (besides the fact that it already doesn't work for you), but I have a strong feeling that many things can and will go wrong. For example: what if you insert multiple rows in a single statement? I don't think the engine will update the information_schema for each row. And it's going to be even worse if you run an INSERT ... SELECT statement. So using the information_schema for this task is a very bad idea.
However, the first question is: why do you need it at all? If you need to save the "origin ID", then you probably plan to update the id column, which is already a bad idea. And assuming you find a way to solve your problem, what guarantees that the originId will not be changed outside the trigger?
That said, the alternative is to keep the originId column blank on insert and update it in an UPDATE trigger instead.
Assuming this is your table:
create table vcs_test(
id int auto_increment,
origin_id int null default null,
primary key (id)
);
Use the UPDATE trigger to save the origin ID, when it is changed for the first time:
delimiter //
create trigger vcs_test_before_update before update on vcs_test for each row begin
if new.id <> old.id then
set new.origin_id = coalesce(old.origin_id, old.id);
end if;
end//
delimiter ;
Your SELECT query would then be something like this:
select *, coalesce(origin_id, id) as origin_id from vcs_test;
See demo on db-fiddle
You can even save the full id history with the following schema:
create table vcs_test(
id int auto_increment,
id_history text null default null,
primary key (id)
);
delimiter //
create trigger vcs_test_before_update before update on vcs_test for each row begin
if new.id <> old.id then
set new.id_history = concat_ws(',', old.id_history, old.id);
end if;
end//
delimiter ;
The following test
insert into vcs_test (id) values (null), (null), (null);
update vcs_test set id = 5 where id = 2;
update vcs_test set id = 4 where id = 5;
select *, concat_ws(',', id_history, id) as full_id_history
from vcs_test;
will return
| id | id_history | full_id_history |
| --- | ---------- | --------------- |
| 1 | | 1 |
| 3 | | 3 |
| 4 | 2,5 | 2,5,4 |
View on DB Fiddle

Running a MySQL Insert when update fails

What is the fastest approach to update a row, and to insert it if it doesn't already exist?
My table has two id columns, quote_id and order_id, and combined they should be unique. I don't want two rows containing the same quote_id and order_id combination, but either value on its own can appear in multiple rows.
id | quote_id | order_id
1 | q200 | o100
2 | q200 | o101
3 | q201 | o100
Previously I would have added a third field combining those two fields with a -, so I could use ON DUPLICATE KEY UPDATE. But this is not very reliable, as I sometimes forget to populate that field.
My idea is to try to run the update query and, if it fails, run the insert, since I run a lot more update queries than inserts. How would I put this into a single query, instead of the MySQL server having to return an error and me rerunning the insert query?
if (
UPDATE table_name SET column1=value, column2=value2 WHERE some_column=some_value === ERROR
) THEN
INSERT INTO table_name ....
I looked through some of the MySQL documentation and couldn't find an example showing how an error can be detected in an IF statement.
You should have a PRIMARY or UNIQUE constraint over the column(s) that identify rows uniquely. It's normal to use multiple columns for this, and SQL supports syntax for it:
CREATE TABLE MyTable (
quote_id VARCHAR(4) NOT NULL,
order_id VARCHAR(4) NOT NULL,
other_data VARCHAR(4),
...
PRIMARY KEY(quote_id, order_id)
);
Then you can rely on the unique constraint to cause an INSERT to fail and run an UPDATE instead:
INSERT INTO MyTable (quote_id, order_id, other_data) VALUES ('q200', 'o100', 'blah blah')
ON DUPLICATE KEY UPDATE
other_data = VALUES(other_data);
Using the VALUES() function in the UPDATE part means "use the value that I tried to insert into the respective column in the INSERT part."
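As a side note, in MySQL 8.0.19 and later you can use a row alias instead of VALUES(), which newer releases flag as deprecated; a sketch of the equivalent statement:
INSERT INTO MyTable (quote_id, order_id, other_data)
VALUES ('q200', 'o100', 'blah blah') AS new_row
ON DUPLICATE KEY UPDATE
    other_data = new_row.other_data;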

MySQL insert on duplicate update for non-PRIMARY key

I am a little confused by the INSERT ... ON DUPLICATE KEY UPDATE query.
I have MySQL table with structure like this:
record_id (PRIMARY, UNIQUE)
person_id (UNIQUE)
some_text
some_other_text
I want to update the some_text and some_other_text values for a person if their id exists in my table, or insert a new record into the table otherwise. How can this be done if person_id is not the PRIMARY key?
You need a query that checks whether any row with your record_id (or person_id) exists: if it exists, update it, otherwise insert a new row. Note that the IF ... ELSE construct below only works inside a stored program (for example a stored procedure):
IF EXISTS (SELECT * FROM table.person WHERE record_id='SomeValue')
UPDATE table.person
SET some_text='new_some_text', some_other_text='some_other_text'
WHERE record_id='old_record_id'
ELSE
INSERT INTO table.person (record_id, person_id, some_text, some_other_text)
VALUES ('new_record_id', 'new_person_id', 'new_some_text', 'new_some_other_text')
Another, better approach is:
UPDATE table.person SET (...) WHERE person_id='SomeValue'
IF ROW_COUNT()=0
INSERT INTO table.person (...) VALUES (...)
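A minimal runnable sketch of that second approach, wrapped in a stored procedure (using a plain person table name as a placeholder for the question's table.person; the procedure name and parameters are likewise placeholders):
DELIMITER //
CREATE PROCEDURE upsert_person(
    IN p_person_id INT,
    IN p_some_text VARCHAR(50),
    IN p_some_other_text VARCHAR(50)
)
BEGIN
    UPDATE person
    SET some_text = p_some_text,
        some_other_text = p_some_other_text
    WHERE person_id = p_person_id;

    -- caveat: ROW_COUNT() is also 0 when the row exists but nothing changed,
    -- in which case this INSERT would hit the unique key on person_id
    IF ROW_COUNT() = 0 THEN
        INSERT INTO person (person_id, some_text, some_other_text)
        VALUES (p_person_id, p_some_text, p_some_other_text);
    END IF;
END//
DELIMITER ;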
Your question is very valid. This is a very common requirement. And most people get it wrong, due to what MySQL offers.
The requirement: Insert unless the PRIMARY key exists, otherwise update.
The common approach: ON DUPLICATE KEY UPDATE
The result of that approach, disturbingly: Insert unless the PRIMARY or any UNIQUE key exists, otherwise update!
What can go horribly wrong with ON DUPLICATE KEY UPDATE? You insert a supposedly new record, with a new PRIMARY key value (say a UUID), but you happen to have a duplicate value for its UNIQUE key.
What you want is a proper exception, indicating that you are trying to insert a duplicate into a UNIQUE column.
But what you get is an unwanted UPDATE! MySQL will take the conflicting record and start overwriting its values. If this happens unintentionally, you have mutilated an old record, and any incoming references to the old record are now referencing the new record. And since you probably won't tell the query to update the PRIMARY column, your new UUID is nowhere to be found. If you ever encounter this data, it will probably make no sense and you will have no idea where it came from.
We need a solution to actually insert unless the PRIMARY key exists, otherwise update.
We will use a query that consists of two statements:
Update where the PRIMARY key value matches (affects 0 or 1 rows).
Insert if the PRIMARY key value does not exist (inserts 1 or 0 rows).
This is the query:
UPDATE my_table SET
unique_name = 'one', update_datetime = NOW()
WHERE id = 1;
INSERT INTO my_table
SELECT 1, 'one', NOW()
FROM my_table
WHERE id = 1
HAVING COUNT(*) = 0;
Only one of these queries will have an effect. The UPDATE is easy. As for the INSERT: WHERE id = 1 results in a row if the id exists, or no row if it does not. HAVING COUNT(*) = 0 inverts that, resulting in a row if the id is new, or no row if it already exists.
I have explored other variants of the same idea, such as with a LEFT JOIN and WHERE, but they all looked more convoluted. Improvements are welcome.
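For illustration, a LEFT JOIN variant of the INSERT part could look like the following (a sketch only, using the same my_table columns as above; it is indeed a bit more convoluted):
INSERT INTO my_table (id, unique_name, update_datetime)
SELECT 1, 'one', NOW()
FROM (SELECT 1) AS seed
LEFT JOIN my_table AS t ON t.id = 1
WHERE t.id IS NULL;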
13.2.5.3 INSERT ... ON DUPLICATE KEY UPDATE Syntax
If you specify ON DUPLICATE KEY UPDATE, and a row is inserted that
would cause a duplicate value in a UNIQUE index or PRIMARY KEY, MySQL
performs an UPDATE of the old row.
Example:
DELIMITER //
DROP PROCEDURE IF EXISTS `sp_upsert`//
DROP TABLE IF EXISTS `table_test`//
CREATE TABLE `table_test` (
`record_id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
`person_id` INT UNSIGNED NOT NULL,
`some_text` VARCHAR(50),
`some_other_text` VARCHAR(50),
UNIQUE KEY `record_id_index` (`record_id`),
UNIQUE KEY `person_id_index` (`person_id`)
)//
INSERT INTO `table_test`
(`person_id`, `some_text`, `some_other_text`)
VALUES
(1, 'AAA', 'XXX'),
(2, 'BBB', 'YYY'),
(3, 'CCC', 'ZZZ')//
CREATE PROCEDURE `sp_upsert`(
`p_person_id` INT UNSIGNED,
`p_some_text` VARCHAR(50),
`p_some_other_text` VARCHAR(50)
)
BEGIN
INSERT INTO `table_test`
(`person_id`, `some_text`, `some_other_text`)
VALUES
(`p_person_id`, `p_some_text`, `p_some_other_text`)
ON DUPLICATE KEY UPDATE `some_text` = `p_some_text`,
`some_other_text` = `p_some_other_text`;
END//
DELIMITER ;
mysql> CALL `sp_upsert`(1, 'update_text_0', 'update_text_1');
Query OK, 2 rows affected (0.00 sec)
mysql> SELECT
-> `record_id`,
-> `person_id`,
-> `some_text`,
-> `some_other_text`
-> FROM
-> `table_test`;
+-----------+-----------+---------------+-----------------+
| record_id | person_id | some_text | some_other_text |
+-----------+-----------+---------------+-----------------+
| 1 | 1 | update_text_0 | update_text_1 |
| 2 | 2 | BBB | YYY |
| 3 | 3 | CCC | ZZZ |
+-----------+-----------+---------------+-----------------+
3 rows in set (0.00 sec)
mysql> CALL `sp_upsert`(4, 'new_text_0', 'new_text_1');
Query OK, 1 row affected (0.00 sec)
mysql> SELECT
-> `record_id`,
-> `person_id`,
-> `some_text`,
-> `some_other_text`
-> FROM
-> `table_test`;
+-----------+-----------+---------------+-----------------+
| record_id | person_id | some_text | some_other_text |
+-----------+-----------+---------------+-----------------+
| 1 | 1 | update_text_0 | update_text_1 |
| 2 | 2 | BBB | YYY |
| 3 | 3 | CCC | ZZZ |
| 5 | 4 | new_text_0 | new_text_1 |
+-----------+-----------+---------------+-----------------+
4 rows in set (0.00 sec)
SQL Fiddle demo
How about my approach?
Let's say you have a table with an auto-increment id and three text columns. You want to insert or update the value of col3, with the values of col1 and col2 acting as a (non-unique) key.
I use this query (without explicitly locking the table):
insert into myTable (id, col1, col2, col3)
select tmp.id, 'col1data', 'col2data', 'col3data' from
(select id from myTable where col1 = 'col1data' and col2 = 'col2data' union select null as id limit 1) tmp
on duplicate key update col3 = values(col3)
Anything wrong with that? For me it works the way I want.
A flexible solution should retain the atomicity offered by INSERT ... ON DUPLICATE KEY UPDATE, work regardless of whether autocommit is enabled, and not depend on a transaction with an isolation level of REPEATABLE READ or greater.
Any solution performing check-then-act across multiple statements would not satisfy this.
Here are the options:
If there tends to be more inserts than updates:
INSERT INTO table (record_id, ..., some_text, some_other_text) VALUES (...);
IF <duplicate entry for primary key error>
UPDATE table SET some_text = ..., some_other_text = ... WHERE record_id = ...;
IF affected-rows = 0
-- retry from INSERT OR ignore this conflict and defer to the other session
If there tends to be more updates than inserts:
UPDATE table SET some_text = ..., some_other_text = ... WHERE record_id = ...;
IF affected-rows = 0
INSERT INTO table (record_id, ..., some_text, some_other_text) VALUES (...);
IF <duplicate entry for primary key error>
-- retry from UPDATE OR ignore this conflict and defer to the other session
If you don't mind a bit of ugliness, you can actually use INSERT ... ON DUPLICATE KEY UPDATE and do this in a single statement:
INSERT INTO table (record_id, ..., some_text, some_other_text) VALUES (...)
ON DUPLICATE KEY UPDATE
some_text = if(record_id = VALUES(record_id), VALUES(some_text), some_text),
some_other_text = if(record_id = VALUES(record_id), VALUES(some_other_text), some_other_text)
IF affected-rows = 0
-- handle this as a unique check constraint violation
Note: affected-rows in these examples means affected rows, not found rows. The two are easy to confuse because a single connection option (the CLIENT_FOUND_ROWS flag) switches which of the two values the client receives.
Also note, if some_text and some_other_text are not actually modified (and the record is not otherwise changed) when you perform the update, those checks on affected-rows = 0 will misfire.
I came across this post because I needed what's written in the title, and I found a pretty handy solution that no one mentioned here, so I thought I'd paste it. Note that this solution is most useful when you are first setting up your database tables. In that case, when you create the corresponding table, define your primary key etc. as usual, and for the combination of columns you want to be unique, simply add
UNIQUE(column_name1,column_name2,...)
at the end of your CREATE TABLE statement, for any combination of the specified columns you want to be unique. According to this page, "MySQL uses the combination of values in both column column_name1 and column_name2 to evaluate the uniqueness", and it reports an error if you try an insert that repeats a combination of values for column_name1 and column_name2 that already exists. Combining a table created this way with the corresponding INSERT ... ON DUPLICATE KEY UPDATE syntax turned out to be the most suitable solution for me. You just need to think it through carefully before you actually start using the table, when setting up your database.
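A minimal sketch of such a table definition and a matching insert (the table and column names are placeholders, not from the original post):
CREATE TABLE my_table (
    id INT AUTO_INCREMENT PRIMARY KEY,
    column_name1 VARCHAR(50) NOT NULL,
    column_name2 VARCHAR(50) NOT NULL,
    some_text VARCHAR(100),
    UNIQUE (column_name1, column_name2)
);

INSERT INTO my_table (column_name1, column_name2, some_text)
VALUES ('a', 'b', 'first value')
ON DUPLICATE KEY UPDATE some_text = VALUES(some_text);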
For anyone else who, like me, is a DB noob: the above didn't work for me. I have a primary key and a unique key, and I wanted to insert only if the unique key didn't exist. After a lot of Stack Overflow and Google searching I didn't find many results for this, but I did find a site that gave me a working answer: https://thispointer.com/insert-record-if-not-exists-in-mysql/
And for ease of reading here is my answer from that site:
INSERT INTO table (unique_key_column_name)
SELECT * FROM (SELECT 'unique_value' AS unique_key_column_name) AS temp
WHERE NOT EXISTS (
SELECT unique_key_column_name FROM table
WHERE unique_key_column_name = 'unique_value'
) LIMIT 1;
Please also note that the quote marks are wrapped around the value because I am using a string in this case.

MySQL if entry exists

Is there a way to check whether a record exists using MySQL?
rowName | number
----------------
dog | 1
cat | 2
For example:
If I have a variable $var = 'dog' that already exists in my database, I want the system to add +1 to the number of the dog row.
On the other hand, when I have, for example, the variable $var = 'fish', which does not exist in my database, I want the system to insert a new row 'fish' with number 1.
I am wondering whether there is a single-query alternative to two different queries combined with PHP conditions. I assume it would be faster to run only one MySQL query.
Please see INSERT ... ON DUPLICATE KEY UPDATE. For example:
INSERT INTO table (rowName, `number`) VALUES ('$var', 1)
ON DUPLICATE KEY UPDATE `number` = `number` + 1;
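Note that this only works if rowName has a PRIMARY or UNIQUE index; a sketch of the kind of table definition this assumes (the question does not show the actual one):
CREATE TABLE `table` (
    rowName VARCHAR(50) NOT NULL,
    `number` INT NOT NULL DEFAULT 0,
    PRIMARY KEY (rowName)
);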
Try this:
-- you can check whether the record exists or not
SELECT EXISTS(SELECT rowName FROM table WHERE rowName="$var");
-- you can also do it in one query
INSERT INTO table(`rowName`, `number`) VALUES ("$var", 1)
ON DUPLICATE KEY UPDATE `number` = `number`+ 1;