I've made table1 and table2, and a trigger so that when there's an insert on table1, data from table1 gets inserted into table2. I then go a step further with a second trigger: after an insert on table2, it inserts data from table2 into a third table, table3. The triggers are 'FOR EACH ROW', so unfortunately, when a second insert happens on table1, it goes into table2, and table3 picks up the new, second row AND the first row again.
Ideally, to prevent this from happening, or at least reduce the impact, it would make sense to remove duplicates at the start or end of the respective trigger so the tables don't keep filling up with duplicate rows. However, I've not been able to find a way to do that within a trigger so far. Is it even possible? Any help? The tables also have no primary or foreign keys. Thanks in advance.
An example of what I've tried so far:
DELETE FROM table2 WHERE rowid NOT IN (SELECT MIN(rowid) FROM table2 GROUP BY col1, col2, col3, ...);
Though I think this is SQLite-specific? I've seen it work against SQLite databases, whereas here I just get an error saying the column rowid isn't recognised.
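The MySQL self-join versions I've come across instead of rowid all seem to need some unique column to decide which copy to keep, which my tables don't have. Something like this sketch, assuming I first added a surrogate AUTO_INCREMENT column:

-- hypothetical: give table2 a surrogate key first
-- ALTER TABLE table2 ADD COLUMN id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY;
DELETE dup
FROM table2 AS dup
JOIN table2 AS keeper
  ON dup.col1 = keeper.col1
 AND dup.col2 = keeper.col2
 AND dup.col3 = keeper.col3
 AND dup.id > keeper.id;   -- keep the copy with the lowest id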
I also tried WHERE NOT EXISTS during the insert, which does work for not inserting duplicates in the first place. However, I need an UPDATE as part of the trigger that changes some column values, so it won't work in this case, since the stored rows will always differ from the values of their initial insert.
Related
I have a table, say Table1, in MySQL. We have an application that stores data in Table1; millions of new records get saved daily. We have a requirement to extract data from this table, transform it, and then load it into a new table, say Table2 (kind of an ETL process), and this should happen live at an interval of a few seconds. How can I do this efficiently, without copying duplicate records from Table1?
I thought of introducing a new field in Table1, say Extracted, to keep track of extraction. So, if a particular row has already been extracted, the field Extracted will have the value Y, indicating extraction. If not, the field Extracted will have the value N, which means this row still needs to be extracted. This means the ETL job needs to update the field Extracted in Table1 after extraction. What I am wondering is: would it be efficient to update records in such a huge table, where millions of new rows get saved daily? Please suggest!
Thank You Guys!!
If you need to keep the data in Table2 in sync (and slightly modified) with the data in Table1, you have a couple of options at the MySQL level:
Triggers - create AFTER INSERT, UPDATE, and DELETE triggers that transfer the data to Table2 immediately and do the transformation for you.
Views - if the data in Table2 is read-only, create a view whose SELECT definition does the required transformation.
The advantage of both approaches is that Table2 is always up to date with Table1 and no extra fields are required.
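For the trigger option, a minimal sketch might look like this (the column names and the transformation are just placeholders for your own):

DELIMITER $$
CREATE TRIGGER table1_after_insert
AFTER INSERT ON Table1
FOR EACH ROW
BEGIN
    -- hypothetical transformation; replace with whatever your ETL step does
    INSERT INTO Table2 (col1, col2)
    VALUES (NEW.col1, UPPER(NEW.col2));
END$$
DELIMITER ;

Corresponding AFTER UPDATE and AFTER DELETE triggers keep Table2 in step when existing rows change.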
Use this to ignore duplicate records:
INSERT IGNORE INTO Table2 (field1,field2) VALUES (x,y);
Or use this to update the existing record if there is already a duplicate in the table:
INSERT INTO Table2 (field1, field2) VALUES (x, y) ON DUPLICATE KEY UPDATE field2 = VALUES(field2);
Both forms rely on Table2 having a PRIMARY KEY or UNIQUE index that defines what counts as a duplicate.
I have a case where I'm doing two queries: query1 is a bulk INSERT ... ON DUPLICATE KEY UPDATE on table1. For query2, I want to do another bulk INSERT on table2 with some application data along with using the ids inserted/updated from query1. I know I can do this with an intermediate query, selecting the ids I need from table1 and then inserting them into table2 along with application data, but I really want to avoid the extra network back-and-forth of that query along with the db overhead. Is there any way I can either get the ids inserted/updated from query1 when running that, or do some kind of complex, but relatively less expensive INSERT ... SELECT FROM in query2 to avoid this?
As far as I know, getting ids added/modified returned from query1 is impossible without a separate query, and I can't think of a way to batch INSERT ... SELECT FROM where the insertion values for each row are dependent on the selected value, but I'd love to be proven wrong, or shown a way around either of those.
There is no way to get a set of IDs as a result of a bulk INSERT.
One option you have is indeed to run a SELECT query to get the IDs and use them in the second bulk INSERT. But that's a hassle.
Another option is to run the 2nd bulk INSERT into a temporary table, let's call it table3, then use INSERT INTO table2 ... SELECT FROM ... table1 JOIN table3 ...
With a similar use case we eventually found that this is the fastest option, given that you index table3 correctly.
Note that in this case you don't have a SELECT that you need to loop over in your code, which is nice.
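A rough sketch of that, with made-up column names (ext_key standing in for the unique key query1 upserts on, so it can be joined back to table1):

-- batch-specific application data goes into an indexed temporary table
CREATE TEMPORARY TABLE table3 (
    ext_key   VARCHAR(64) NOT NULL,
    app_value VARCHAR(255),
    INDEX (ext_key)
);

INSERT INTO table3 (ext_key, app_value)
VALUES ('a', 'x'), ('b', 'y');

-- a single statement picks up the ids generated/updated by query1
INSERT INTO table2 (table1_id, app_value)
SELECT t1.id, t3.app_value
FROM table1 AS t1
JOIN table3 AS t3 ON t3.ext_key = t1.ext_key;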
I have two tables. One is the main table holding data, and I want to insert data into it from another existing table that has about 13 million records. I'm using this query to insert from the other table, i.e.
insert into table1 ( column1, col2 ...) select col1, col2... from table2;
But, unfortunately, the query fails with a lock wait timeout (Error 1205).
What is the best way to do this in the least time, without hitting the timeout?
If you have a primary key on table2, then you can use that for ordering and inserting in batches:
insert into table1 ( column1, col2 ...)
select col1, col2...
from table2
order by <primary key>
limit 0, 100000
Then repeat this for additional values. (Of course, the 100,000 is arbitrary. A larger value might work. A smaller value might be necessary.)
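For example, the follow-up batches would look like this, assuming the primary key is a single column named id and keeping the same (arbitrary) batch size:

insert into table1 (column1, col2)
select col1, col2 from table2 order by id limit 100000, 100000;

insert into table1 (column1, col2)
select col1, col2 from table2 order by id limit 200000, 100000;

-- ...and so on, until a batch copies no rows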
Another possibility is to remove all indexes and INSERT triggers from table1, try the insert without them, and then add them back after the new data is in the table.
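Roughly like this, where the index and trigger names are hypothetical (list yours with SHOW INDEX FROM table1 and SHOW TRIGGERS):

ALTER TABLE table1 DROP INDEX idx_col1;
DROP TRIGGER IF EXISTS table1_after_insert;

insert into table1 (column1, col2)
select col1, col2 from table2;

ALTER TABLE table1 ADD INDEX idx_col1 (column1);
-- then re-create the trigger from its original definition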
I need some help with MySQL triggers, and my guess is it should be quite simple. I have a db which consists of 3 tables (table1, table2, table3), each of which has two columns (column1, column2). So here is what I need: if a row is inserted into table1, I want it to be automatically replicated to table2 and table3. Likewise with updates on table2 and table3. But here is the trick: I need an IF condition which checks whether the row is already there, otherwise SQL throws me an error. I would be really grateful for any help.
Do you want it to throw an error or do you want to avoid it?
In the first case, you could do something like
IF EXISTS (SELECT 1 FROM table2or3 WHERE col1 = NEW.col1) THEN
SIGNAL ...
END IF;
Read more about signals here.
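Put together, that first variant could look roughly like this (a sketch only, shown for table2; table3 would get the same treatment, and SIGNAL requires MySQL 5.5 or later):

DELIMITER $$
CREATE TRIGGER table1_after_insert
AFTER INSERT ON table1
FOR EACH ROW
BEGIN
    IF EXISTS (SELECT 1 FROM table2 WHERE column1 = NEW.column1) THEN
        SIGNAL SQLSTATE '45000'
            SET MESSAGE_TEXT = 'Row already exists in table2';
    END IF;
    -- only reached when no duplicate was found
    INSERT INTO table2 (column1, column2) VALUES (NEW.column1, NEW.column2);
END$$
DELIMITER ;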
If you don't want to throw an error there's also the possibility to
INSERT INTO foo(col1, col2) VALUES (NEW.col1, NEW.col2)
ON DUPLICATE KEY UPDATE whatever = whatever;
If you specify ON DUPLICATE KEY UPDATE, and a row is inserted that would cause a duplicate value in a UNIQUE index or PRIMARY KEY, an UPDATE of the old row is performed.
Read more about it here.
Or
INSERT IGNORE ....
If you use the IGNORE keyword, errors that occur while executing the INSERT statement are treated as warnings instead. For example, without IGNORE, a row that duplicates an existing UNIQUE index or PRIMARY KEY value in the table causes a duplicate-key error and the statement is aborted. With IGNORE, the row still is not inserted, but no error is issued.
Read more about it here.
I have a single procedure that has two insert statements in it for two different tables. I must insert data into table1 before I can insert into table2. I'm using PHP to do the data collection. What I'd like to know is how to insert multiple rows into table2, which can have many rows associated with table1. How would I do this?
I want to store the person in table1 only once, but table2 requires multiple rows. If these insert statements were in separate procedures, I wouldn't have a problem, but I just don't know how I would insert more than one row into table2 without table1 rejecting a second, duplicate record.
BEGIN
    -- Name, Address, City, OrderNo and Description are the procedure's parameters
    INSERT INTO user (name, address, city) VALUES (Name, Address, City);
    -- `order` and `desc` are reserved words in MySQL, so they need backticks
    INSERT INTO `order` (order_id, `desc`) VALUES (OrderNo, Description);
END
I'd suggest you do it separately, otherwise you'd need a complicated solution which is prone to error if something changes.
The complicated solution is:
Join each order number and description with a separator (orderno#description).
Join all orders with a different separator (orderno#description/orderno#description/...).
Pass the resulting string to the procedure.
In the procedure, split the string by the order separator, then loop through each piece.
For each order, split the piece by the first separator, then insert the parts into the appropriate columns.
As you can see, this is bad.
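For concreteness, the sketch below is roughly what that would look like (procedure, parameter and column names are made up, and `order`/`desc` need backticks because they are reserved words). It should make clear why I'd avoid it:

DELIMITER $$
-- assumes orders arrive as a single string like '101#First order/102#Second order'
CREATE PROCEDURE add_user_with_orders(
    IN p_name VARCHAR(100),
    IN p_address VARCHAR(200),
    IN p_city VARCHAR(100),
    IN p_orders TEXT
)
BEGIN
    DECLARE current_order TEXT;

    INSERT INTO user (name, address, city) VALUES (p_name, p_address, p_city);

    WHILE LENGTH(p_orders) > 0 DO
        -- everything up to the first '/' is one 'orderno#description' pair
        SET current_order = SUBSTRING_INDEX(p_orders, '/', 1);

        INSERT INTO `order` (order_id, `desc`)
        VALUES (SUBSTRING_INDEX(current_order, '#', 1),
                SUBSTRING_INDEX(current_order, '#', -1));

        -- drop the pair we just processed, along with its trailing '/'
        IF LOCATE('/', p_orders) > 0 THEN
            SET p_orders = SUBSTR(p_orders, LOCATE('/', p_orders) + 1);
        ELSE
            SET p_orders = '';
        END IF;
    END WHILE;
END$$
DELIMITER ;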
I'm sorry, but what's stopping you from inserting data into these (seemingly unrelated) tables in separate queries? If you don't like the idea of it failing halfway through, you can wrap it in a transaction; both mysqli and PDO can do that just fine.
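At the SQL level the transaction amounts to nothing more than this (values are made up; it needs a transactional engine such as InnoDB):

START TRANSACTION;
INSERT INTO user (name, address, city) VALUES ('Jane', '1 Main St', 'Springfield');
INSERT INTO `order` (order_id, `desc`) VALUES (101, 'First order');
INSERT INTO `order` (order_id, `desc`) VALUES (102, 'Second order');
COMMIT;

mysqli and PDO just wrap these statements for you (begin_transaction()/commit() and beginTransaction()/commit(), respectively).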
Answering your question directly: INSERT's IGNORE mode turns errors during insertion into warnings, so upon attempting to insert a duplicate row, a warning is issued and the row is not inserted, but there is no error.
You could use the IGNORE keyword on the first statement.
http://dev.mysql.com/doc/refman/5.1/en/insert.html:
If you use the IGNORE keyword, errors that occur while executing the INSERT statement are treated as warnings instead. For example, without IGNORE, a row that duplicates an existing UNIQUE index or PRIMARY KEY value in the table causes a duplicate-key error and the statement is aborted. With IGNORE, the row still is not inserted, but no error is issued.
But somehow this seems rather inefficient to me, a "stabbed from behind through the chest in the eye"-solution.
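In this case it would amount to something like the following (values are made up, and it only helps if user has a UNIQUE index on the columns that define a duplicate person):

INSERT IGNORE INTO user (name, address, city) VALUES ('Jane', '1 Main St', 'Springfield');
INSERT INTO `order` (order_id, `desc`) VALUES (103, 'Another order');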