MySQL INSERT SELECT on large static table

I need to copy the content of one table to another. So I started using:
INSERT INTO new_table SELECT * FROM old_table;
However, I am getting the following error now:
1297, "Got temporary error 233 'Out of operation records in transaction coordinator (increase MaxNoOfConcurrentOperations)' from NDBCLUSTER"
I think I understand why this occurs: my table is huge, and MySQL tries to take a point-in-time snapshot (lock everything and make one large transaction out of it).
However, my data is fairly static and there is no other concurrent session that would modify the data. How can I tell MySQL to copy one row at a time, or in smaller chunks, without locking the whole thing?
Edit note: I already know that I can just read the whole table row by row into memory/a file/a dump and write it back. I am interested in whether there is an easier way (maybe by setting the isolation level?). Note that the engine is InnoDB.

Data Migration is one of the few instances where a CURSOR can make sense, as you say, to ensure that the number of locks stays sane.
Use a cursor in conjunction with transactions, committing after every row or after every N rows (e.g. with a counter and a modulo check).
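A minimal sketch of that cursor-plus-counter idea, assuming the question's old_table/new_table with identical columns and an integer primary key id; the procedure name, key column, and chunk size of 1000 are illustrative, not from the original post:

DELIMITER //
CREATE PROCEDURE copy_in_chunks()
BEGIN
  DECLARE done INT DEFAULT 0;
  DECLARE v_id INT;
  DECLARE counter INT DEFAULT 0;
  DECLARE cur CURSOR FOR SELECT id FROM old_table ORDER BY id;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

  SET autocommit = 0;            -- group rows into explicit transactions
  OPEN cur;
  copy_loop: LOOP
    FETCH cur INTO v_id;
    IF done THEN
      LEAVE copy_loop;
    END IF;
    INSERT INTO new_table SELECT * FROM old_table WHERE id = v_id;
    SET counter = counter + 1;
    IF counter MOD 1000 = 0 THEN
      COMMIT;                    -- keep each transaction to ~1000 operations
    END IF;
  END LOOP;
  CLOSE cur;
  COMMIT;                        -- flush the final partial chunk
END //
DELIMITER ;

CALL copy_in_chunks();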

Select the data from the InnoDB table into an outfile, then LOAD DATA INFILE it into the cluster.
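A sketch of that dump-and-reload route; the file path and delimiters are illustrative, and the file must be somewhere the MySQL server itself can write and then read:

SELECT *
INTO OUTFILE '/tmp/old_table.csv'
  FIELDS TERMINATED BY ',' ENCLOSED BY '"'
  LINES TERMINATED BY '\n'
FROM old_table;

LOAD DATA INFILE '/tmp/old_table.csv'
INTO TABLE new_table
  FIELDS TERMINATED BY ',' ENCLOSED BY '"'
  LINES TERMINATED BY '\n';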

Related

MySQL performing a "No impact" temporary INSERT with replication avoiding Locks

So, we are trying to run a report that goes to screen and will not change any stored data.
However, it is complex, so it needs to go through a couple of (TEMPORARY*) tables.
It pulls data from live tables, which are replicated.
The nasty bit comes when taking the "eligible" records from temp_PreCalc and populating them from the live data to create the next (TEMPORARY*) table, resulting in effectively:
INSERT INTO temp_PostCalc (...)
SELECT ...
FROM temp_PreCalc
JOIN live_Tab1 ON ...
JOIN live_Tab2 ON ...
JOIN live_Tab3 ON ...
The report is not a "definitive" answer; the expectation is that it is merely a "snapshot" report and will be out of date as soon as it appears on screen.
There is no order or reproducibility issue.
So ideally, I would turn my TRANSACTION ISOLATION LEVEL down to READ COMMITTED...
However, I can't, because live_Tab1, 2 and 3 are replicated with binlog_format = STATEMENT...
The statement itself is lovely and quick: it takes hardly any time to run, so the resource load is now lower than it used to be (the old version did separate SELECTs and INSERTs). But, as I understand it, it waits because the SELECT has to take a repeatable/syncable lock on the live_Tab tables so that any result could be replicated safely.
In fact it now takes more time because of that wait.
I'd like to SEE that performance benefit in response time!
Except the data is written to (TEMPORARY*) tables and then thrown away.
There are no live_ table destinations - only sources...
*These tables are actually not TEMPORARY TABLES but dynamically created and discarded InnoDB tables, as the report calculation requires self-joins and deletes... but they are temporary in practice.
I now seem to be going around in circles finding an answer.
I don't have the SUPER privilege and don't want it...
So I can't SET SQL_LOG_BIN = 0 for this connection/session (why is SUPER required for that, anyway?)
So...
If I have a scratch database, or a table wildcard, that excludes all my temp_ "temporary" tables from replication...
(I am waiting for this change to go through at my hosting centre)
Will MySQL allow me to
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
INSERT INTO temp_PostCalc (...)
SELECT ...
FROM temp_PreCalc
JOIN live_Tab1 ON ...
JOIN live_Tab2 ON ...
JOIN live_Tab3 ON ...
;
Or will I still get my
"Cannot Execute statement: impossible to write to binary log since
BINLOG_FORMAT = STATEMENT and at least one table uses a storage engine
limited to row-based logging..."
Even though it's not technically true?
I am expecting it to, as I presume that replication will kick in simply because it sees the INSERT statement and does a simple check on whether any of the tables involved are replication-eligible, even though none of the destinations actually are...
or will it pleasantly surprise me?
I really can't face using an unpleasant solution like
SELECT ... INTO OUTFILE
LOAD DATA INFILE
In fact I don't think I could even use that: how would I get unique filenames? How would I clean them up?
The reports are run on-demand directly by end users, and I only have MySQL interface access to the server.
or streaming it through the PHP client, just to separate the INSERT from the SELECT so that MySQL doesn't get upset about which tables are replication-eligible...
So, it looks like the only way is:
We create a second Schema "ScratchTemp"...
Set the dreaded replication --replicate-ignore-db=ScratchTemp
My "local" query code opens a new mysql connection, and performs a USE ScratchTemp;
Because I have selected the ignored database as the default, none of my queries will be replicated.
So I need to take huge care not to perform ANY real queries here.
Reference my scratch_ tables and actual data tables by prefixing them all in my queries with the schema-qualified name...
e.g.
INSERT INTO LiveSchema.temp_PostCalc (...) SELECT ... FROM LiveSchema.temp_PreCalc JOIN LiveSchema.live_Tab1 etc etc as above.
And then close this connection just as soon as I can, as it is frankly dangerous to have a non-replicated connection open....
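For reference, the --replicate-ignore-db rule above amounts to roughly this when applied on the replica (a sketch for MySQL 5.7+; it needs replication admin rights there, so it is the host's job rather than something runnable from my unprivileged connection):

STOP SLAVE SQL_THREAD;
CHANGE REPLICATION FILTER REPLICATE_IGNORE_DB = (ScratchTemp);
START SLAVE SQL_THREAD;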
Sigh...?

How can I make the database consistent in this situation?

I have a problem when copying database table entries from one table to another.
Suppose I have two tables, one and two. I want to transfer some records from one to two, and then delete those entries from one after the copy succeeds.
The problem: suppose the power goes out after I copy from one to two but before the delete from one has run. In that case the same record will be in both tables, which I don't want. How should I handle this kind of inconsistency on the fly?
Your RDBMS is not a simple datastore! It supports journaling, transaction isolation and atomic updates. So...
... with transactional tables (InnoDB) and a decent isolation level, simply do:
START TRANSACTION; -- or SET autocommit = 0
INSERT INTO two SELECT * FROM one WHERE ...;
DELETE FROM one WHERE ...;
COMMIT;
COMMIT will atomically apply the changes to the DB. That is, from other transactions' point of view, the move is either done or not started; no one can see it half done, even in the case of catastrophic failure (a power outage).
Of course, if you move all your records, you could also rely on RENAME TABLE...
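If it really is all the rows, a sketch of that swap (assuming the old contents of two are disposable and that no table named two_old exists yet):

RENAME TABLE two TO two_old, one TO two;   -- one becomes two in a single atomic step
CREATE TABLE one LIKE two;                 -- recreate an empty source table if it is still needed
DROP TABLE two_old;                        -- discard the old destination once you are sure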
You can use transaction blocks to reduce unexpected results to some degree, but surviving a power failure is another matter.
If you are worried about a power problem, you can also run a batch job at some interval that checks whether the two tables contain the same records.
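A sketch of such a periodic check, assuming both tables share a primary key column id (the column name is illustrative); any rows it returns were copied but never deleted:

SELECT one.id
FROM one
INNER JOIN two ON two.id = one.id;

-- finish the interrupted move for those rows with a multi-table delete:
-- DELETE one FROM one INNER JOIN two ON two.id = one.id;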

SQL log file growing too big

I have a table with 10 million records and no indexes, and I am trying to dedupe it. I tried INSERT with SELECT, using either a LEFT JOIN or WHERE NOT EXISTS, but each time I get a key-violation error. The other problem is that the log file grows too large and the transaction will not complete. I tried setting the recovery model to SIMPLE as recommended online, but that does not help. Here are the queries I used:
insert into temp(profile,feed,photo,dateadded)
select distinct profile,feed,photo,dateadded from original as s
where not exists(select 1 from temp as t where t.profile=s.profile)
This just produces the key-violation error. I tried using the following:
insert into temp(profile,feed,photo,dateadded)
select distinct profile,feed,photo,dateadded from original as s
left outer join temp t on t.profile=s.profile where t.profile is null
In both instances the log file now fills up before the transaction completes. So my main question is about the log file; I can figure out the deduping with the queries.
You may need to work in batches. Write a loop that goes through about 5,000 records at a time (you can experiment with the number; I've had to go as low as 500 or as high as 50,000, depending on the database and how busy it was).
What is your key? Likely your query will need to pick a single row per key using an aggregate function on dateadded (MIN or MAX).
The bigger the transaction, the bigger the transaction log will be.
The log is used to recover (roll back) an open, uncommitted transaction, so if you're not committing frequently and you're executing a very large transaction, the log file will grow substantially. Once the transaction commits, that log space becomes reusable free space. This safeguards the data in case something fails and a rollback is needed.
My suggestion would be to run the insert in batches, committing after each batch.
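A sketch of that batched approach in T-SQL, reusing the question's table and column names; it keeps one row per profile by taking the newest dateadded via ROW_NUMBER (a variant of the MIN/MAX idea above), and each INSERT commits on its own, so with the SIMPLE recovery model the log space can be reused between batches. The batch size is illustrative.

DECLARE @batch INT = 5000;

WHILE 1 = 1
BEGIN
    INSERT INTO temp (profile, feed, photo, dateadded)
    SELECT TOP (@batch) s.profile, s.feed, s.photo, s.dateadded
    FROM (
        SELECT profile, feed, photo, dateadded,
               ROW_NUMBER() OVER (PARTITION BY profile ORDER BY dateadded DESC) AS rn
        FROM original
    ) AS s
    WHERE s.rn = 1
      AND NOT EXISTS (SELECT 1 FROM temp AS t WHERE t.profile = s.profile);

    IF @@ROWCOUNT = 0 BREAK;   -- nothing left to copy
END;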

Insert..Select into InnoDB table with commit after each insert?

I just finished creating a new partitioned table to replace an old, non-partitioned table (renamed for safekeeping). I copied the newest data from the old table into the new table at the time I created it, but I still have roughly half the data left to copy over. The problem is, it's a live web service getting hammered nonstop, and every time I try to copy a chunk over via INSERT..SELECT, it insists on doing it as an atomic transaction (which consumes all the server's resources, slows everything to a crawl, and probably pushes the server dangerously close to running out of physical resources).
Just to be clear: OldTable is MyISAM. NewTable is InnoDB and partitioned by range on its primary key 'a'. Both tables have identical field names. The fields themselves aren't identical, but where they differ, the fields in NewTable are bigger.
The query that's causing problems looks like:
INSERT INTO NewTable (a,b,c,d,e,f,g)
SELECT a,b,c,d,e,f,g
FROM OldTable
WHERE a > 300000000 AND a <= 400000000
order by a
What I'd like for it to do: either commit after each insert, or just dispense with transactional integrity entirely and allow dirty reads to happen if they happen.
Locking NewTable (beyond possibly the one single row being inserted) is unacceptable. Locking OldTable is fine, because nothing else is using it anymore, anyway (besides the SQL to copy it to the new table, of course).
Also, is there a way to tell MySQL to do it at the lowest possible priority, and only work on the task in its (relative) free time?
In addition to reducing the number of rows being inserted at a time, try increasing the value of the bulk_insert_buffer_size system variable to something more appropriate for your case. The default value is 8MB.
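Reducing the rows per statement could look like the sketch below: the same copy broken into narrower slices of the primary key a, so that each INSERT ... SELECT is a short-lived transaction (the 10-million-wide slice is an arbitrary example):

INSERT INTO NewTable (a,b,c,d,e,f,g)
SELECT a,b,c,d,e,f,g
FROM OldTable
WHERE a > 300000000 AND a <= 310000000
ORDER BY a;

-- then repeat with a > 310000000 AND a <= 320000000, and so on,
-- until the original 300000000-400000000 range is covered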

Should I commit or rollback a transaction that creates a temp table, reads, then deletes it?

To select information related to a list of hundreds of IDs, rather than build a huge SELECT statement, I create a temp table, insert the IDs into it, join it with a table to select the rows matching the IDs, then delete the temp table. So this is essentially a read operation, with no permanent changes made to any persistent database tables.
I do this in a transaction, to ensure the temp table is deleted when I'm finished. My question is... what happens when I commit such a transaction vs. letting it roll back?
Performance-wise... does the DB engine have to do more work to roll back the transaction vs committing it? Is there even a difference since the only modifications are done to a temp table?
Related question here, but doesn't answer my specific case involving temp tables: Should I commit or rollback a read transaction?
EDIT (Clarification of Question):
Not looking for advice up to point of commit/rollback. Transaction is absolutely necessary. Assume no errors occur. Assume I have created a temp table, assume I know real "work" writing to tempdb has occurred, assume I perform read-only (select) operations in the transaction, and assume I issue a delete statement on the temp table. After all that... which is cheaper, commit or rollback, and why? What OTHER work might the db engine do at THAT POINT for a commit vs a rollback, based on this specific scenario involving temp-tables and otherwise read-only operations?
If we are talking about a local temporary table (i.e. the name is prefixed with a single #), the moment you close your connection, SQL Server will drop the table. Thus, assuming your data layer is well designed to keep connections open for as short a time as possible, I would not worry about wrapping the creation of temp tables in a transaction.
I suppose there could be a slight performance difference from wrapping the table in a transaction, but I would bet it is so small as to be inconsequential compared to the cost of keeping a transaction open longer due to the time it takes to create and populate the temp table.
A simpler way to ensure that the temp table is deleted is to create it using the # sign.
CREATE TABLE #mytable (
rowID int,
rowName char(30) )
The # tells SQL Server that this table is a local temporary table. This table is only visible to this session of SQL Server. When the session is closed, the table will be automatically dropped. You can treat this table just like any other table with a few exceptions. The only real major one is that you can't have foreign key constraints on a temporary table. The others are covered in Books Online.
Temporary tables are created in tempdb.
If you do this, you won't have to wrap it in a transaction.
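A sketch of the whole pattern with a local temp table; the Orders table and OrderID column are invented for illustration, and the explicit DROP is optional since the table disappears when the session closes:

CREATE TABLE #ids (id INT PRIMARY KEY);

-- load the hundreds of IDs (from the application, a parameter list, etc.)
INSERT INTO #ids (id) VALUES (101), (102), (103);

-- the read itself: join against the temp table instead of building a huge IN (...) list
SELECT o.*
FROM Orders AS o
JOIN #ids AS i ON i.id = o.OrderID;

DROP TABLE #ids;   -- optional; cleanup happens at session end anyway, no transaction required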