MySQL LOAD DATA INFILE issue with updating + inserting - mysql

I was given a rather poorly structured table that has a primary key set to auto-increment and a UNIQUE key that is just unique. Conceptually, the UNIQUE key was supposed to be the primary key, but whoever made the table didn't have the UNIQUE key's column information at the time of the table's construction.
Now we need to start doing regular updates to this table, where a provided text file contains updated rows and new rows. The challenge is to replace a row if there's a matching value in the UNIQUE key; we don't actually care about the primary key itself as long as it auto-increments.
However, the way LOAD DATA INFILE works, it would reset the PK we already have, which is bad. The reason we kept the PK is that it is a foreign key in another legacy table (sigh...).
So... is there a way I can make an elegant SQL-only update script that reads the updated table in text form and just updates based on the UNIQUE key column without screwing up the PK?
I guess a solution would be to export the table to tab form and do VLOOKUP to assign rows with the matching PK value (or NULL if it is a new row).
Any input?
Edit: Someone suggested that I do LOAD DATA INFILE into a temporary table and then do INSERT/UPDATE from there. Based on what this post and that post say, here's the script I propose:
-- Create the temporary (staging) table
CREATE TABLE tmp (
-- my specifications
);
-- Load the file into the temporary table, not into mytable
LOAD DATA LOCAL INFILE '[my tab file]'
REPLACE INTO TABLE tmp FIELDS TERMINATED BY '\t' ENCLOSED BY '"' LINES TERMINATED BY '\r\n';
-- Copy all the columns over except the PK column. This is for rows with existing UNIQUE values
UPDATE mytable
INNER JOIN tmp ON mytable.unique = tmp.unique
SET mytable.col1 = tmp.col1, ..., mytable.coln = tmp.coln;
-- Now insert the rows with new UNIQUE values; IGNORE skips rows whose UNIQUE value already exists
INSERT IGNORE INTO mytable (col1, col2, ...)
SELECT tmp.col1, tmp.col2, ... FROM tmp;
-- Drop the temporary table now
DROP TABLE tmp;
Edit2: I updated the above query and tested it. It should work. Any opinions?

You can load the data into a new table using LOAD DATA INFILE, then use INSERT and UPDATE statements to change your table with data from the new table. In this case you can link the tables however you want: by primary/unique key or by any other field(s).
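A minimal sketch of that approach, assuming a hypothetical target table mytable(id AUTO_INCREMENT, ukey UNIQUE, col1, col2) and a tab-separated file whose columns are ukey, col1, col2. The file path and every column name are placeholders, not taken from the question. Instead of a separate UPDATE and INSERT, it uses a single INSERT ... ON DUPLICATE KEY UPDATE, which leaves the id of a matched row untouched:

```sql
-- Staging table: hypothetical columns mirroring the file layout (no id column).
CREATE TEMPORARY TABLE staging (
  ukey VARCHAR(64) NOT NULL,
  col1 VARCHAR(255),
  col2 INT,
  PRIMARY KEY (ukey)
);

-- Load the file into the staging table only; mytable is not touched yet.
LOAD DATA LOCAL INFILE '/path/to/update.tsv'   -- placeholder path
INTO TABLE staging
FIELDS TERMINATED BY '\t' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(ukey, col1, col2);

-- New ukey values are inserted with a fresh auto-increment id;
-- existing ukey values are updated in place, so their id is preserved.
INSERT INTO mytable (ukey, col1, col2)
SELECT s.ukey, s.col1, s.col2
FROM staging AS s
ON DUPLICATE KEY UPDATE
  mytable.col1 = s.col1,
  mytable.col2 = s.col2;

DROP TEMPORARY TABLE staging;
```

With InnoDB, rows that hit the duplicate-key branch may still consume auto-increment values, so newly inserted ids can show gaps; existing ids are not changed.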

Related

MySQL renaming and create table at the same time

I need to rename a MySQL table and create a new MySQL table at the same time.
There is a critical live table with a large number of records. Scripts insert records into master_table continuously.
I need to back up the master table and create another master table with the same name at the same time.
The general SQL is like this:
RENAME TABLE master_table TO backup_table;
CREATE TABLE master_table LIKE backup_table;
INSERT INTO master_table (id, value) VALUES ('1', '5000');
Is it possible for records to go missing while the above queries execute?
Is there any way to avoid missing records? Lock the master table, etc.?
What I do is the following. It results in no downtime, no data loss, and nearly instantaneous execution.
CREATE TABLE mytable_new LIKE mytable;
...possibly update the AUTO_INCREMENT of the new table...
RENAME TABLE mytable TO mytable_old, mytable_new TO mytable;
By renaming both tables in one statement, they are swapped atomically. There is no chance for any data to be written "in between" while there is no table to receive the write. If you don't do this atomically, some writes may fail.
RENAME TABLE is virtually instantaneous, no matter how large the table. You don't have to wait for data to be copied.
If the table has an auto-increment primary key, I like to make sure the new table starts with an id value greater than the current id in the old table. Do this before swapping the table names.
SELECT AUTO_INCREMENT FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA='mydatabase' AND TABLE_NAME='mytable';
I like to add some comfortable margin to that value. You want to make sure that the id values inserted to the old table won't exceed the value you queried from INFORMATION_SCHEMA.
Change the new table to use this new value for its next auto-increment:
ALTER TABLE mytable_new AUTO_INCREMENT=<increased value>;
Then promptly execute the RENAME TABLE to swap them. As soon as new rows are inserted to the new, empty table, it will use id values starting with the increased auto-increment value, which should still be greater than the last id inserted into the old table, if you did these steps promptly.
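The steps above can be strung together roughly as follows. This is a sketch, not a drop-in script: the margin of 10000 is an arbitrary assumption, and the prepared statement is used only because ALTER TABLE ... AUTO_INCREMENT expects a literal rather than a user variable.

```sql
-- 1. Empty copy with the same columns and indexes.
CREATE TABLE mytable_new LIKE mytable;

-- 2. Read the current counter and add a safety margin (10000 is an assumed value).
--    On MySQL 8.0 this INFORMATION_SCHEMA value may be cached and therefore stale.
SELECT AUTO_INCREMENT + 10000 INTO @next_id
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'mydatabase' AND TABLE_NAME = 'mytable';

-- 3. Apply it to the new table via a prepared statement.
SET @ddl = CONCAT('ALTER TABLE mytable_new AUTO_INCREMENT = ', @next_id);
PREPARE stmt FROM @ddl;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

-- 4. Swap both names in a single atomic statement.
RENAME TABLE mytable TO mytable_old, mytable_new TO mytable;
```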
Instead of renaming the master_table and recreating it, you could just create a backup_table with the data from the master_table for the first backup run:
CREATE TABLE backup_table AS
SELECT * FROM master_table;
If you must add a primary key to the backup table then run this just once, that is for the first backup:
ALTER TABLE backup_table ADD CONSTRAINT pk_backup_table PRIMARY KEY(id);
For future backups do:
INSERT INTO backup_table
SELECT * FROM master_table;
Then you can delete from the master_table all the rows that are now in the backup_table, like this:
DELETE A FROM master_table A JOIN
backup_table B ON A.id = B.id;
Then you can add data to the master_table with this query:
INSERT INTO master_table (`value`) VALUES ('5000'); -- I assume the id field is auto_incrementable
I think this should work even without locking the master table, and with no missed inserts, because only rows that were actually copied to the backup_table are deleted.

In mysql how to insert a column with huge data contained in table with no downtime

If a table in MySQL contains, say, 1 million records, how can I add a column at any position with no downtime?
MySQL's ALTER TABLE performance can become very frustrating with very large tables. Depending on the MySQL version and storage engine, an ALTER statement may make a new temporary table, copy the records from your existing table into it even if the data wouldn't strictly need to be copied, and then replace the old table with the new table.
Suppose you have a table with one million records and you add 3 columns with three separate ALTER statements; the table may then be copied 3 times, which means copying 3 million records.
A faster way of adding columns is to create your own new table, then select all of the rows from the existing table into it. You can create the structure from the existing table, then modify the structure however you'd like, then select in the data. Make sure that you select the information into the new table in the same order as the fields are defined.
1. CREATE TABLE new_table LIKE `table`;
2. INSERT INTO new_table SELECT * FROM `table`;
3. RENAME TABLE `table` TO old_table, new_table TO `table`;
If you have foreign key constraints you can handle these foreign keys using
SET FOREIGN_KEY_CHECKS = 0;
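The numbered steps above omit the "modify the structure" part. A sketch that includes it, assuming a hypothetical table big_table(id, name) and a new column note placed after name (none of these names come from the question). Once the new table has an extra column, SELECT * no longer lines up, so the copy lists the existing columns explicitly:

```sql
-- Empty copy of the existing structure, including indexes.
CREATE TABLE big_table_new LIKE big_table;

-- Altering the empty copy is cheap because there are no rows to rewrite.
ALTER TABLE big_table_new
  ADD COLUMN note VARCHAR(255) NULL AFTER name;

-- Copy the existing rows; the new column takes its default (NULL here).
INSERT INTO big_table_new (id, name)
SELECT id, name FROM big_table;

-- Swap the two tables atomically.
RENAME TABLE big_table TO big_table_old, big_table_new TO big_table;
```

Rows written to big_table while the INSERT ... SELECT is running are not carried over by this sketch, so it only approximates "no downtime" unless writes are paused or replayed afterwards.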

Is there a MySQL command drop all indexes except PRIMARY index?

I have a database table with one Index where the keyname is PRIMARY, Type is BTREE, Unique is YES, Packed is NO, Column is ID, Cardinality is 728, and Collation is A.
I have a script that runs on page load that adds entries to the MySQL database table and also removes duplicates from the Database Table.
Below is the script section that deletes the duplicates:
// Removes Duplicates from the MySQL Database Table based on 'Entry_Date' field
mysql_query("Alter IGNORE table $TableName add unique key (Entry_Date)");
// Deletes the added index created by the Removes Duplicates function
mysql_query("ALTER TABLE $TableName DROP INDEX Entry_Date");
The Remove Duplicates command above adds an additional index to the table. The next command is supposed to delete this added index.
The problem is that sometimes the index created by the Remove Duplicates command does not get deleted by the following command, so more and more indexes are added to the table. These additional indexes prevent the script from adding data to the database until I remove them by hand.
My Question:
Is there a command or short function that I can add to the script that will delete all indexes except the original index mentioned in the beginning of this post?
I did read the following post, but I don't know if this is the correct script to use:
How to drop all of the indexes except primary keys with single query
I don't think so. What you can do is create a copy of the table, since that doesn't copy the indexes. For example, if you run
create table table1 as (select * from table_2), it will make a copy but without any index or PK.
After all the comments I think I realize what is happening.
You actually allow duplicates in the database. You just want to clean them up from time to time.
The problem is that the method you have chosen to clean them is to create a unique key with the IGNORE option, which causes duplicate rows to be dropped instead of failing the unique key creation. Then you drop the unique key so that duplicate rows can be added again. Your problem is that sometimes the unique key is not being dropped.
I suggest you delete the duplicates in another way. Supposing that your table name is "my_table" and your primary key is my_key_column, then:
delete from my_table where my_key_column not in (select min(my_key_column) from my_table group by Entry_Date)
Edit: the above won't work due to a limitation in MySQL, as pointed out by @a_horse_with_no_name.
Try the following three queries instead:
create temporary table if not exists tmp_posting_data select id from posting_data where 1=2;
insert into tmp_posting_data(id) select min(id) from posting_data group by Entry_Date;
delete from posting_data where id not in (select id from tmp_posting_data);
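As an alternative sketch, the same cleanup can usually be written as one multi-table DELETE with a self-join, which avoids the temporary table. It assumes, like the queries above, that the table is posting_data with primary key id and that the row with the lowest id per Entry_Date should survive:

```sql
-- For every pair of rows sharing an Entry_Date, delete the one with the higher id.
-- Rows whose Entry_Date is NULL never match the join and are left alone.
DELETE p1
FROM posting_data AS p1
JOIN posting_data AS p2
  ON p1.Entry_Date = p2.Entry_Date
 AND p1.id > p2.id;
```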
As a final note, reconsider whether you need to allow duplicate rows at all, as @a_horse_with_no_name also suggested. Instead of allowing rows to be entered and then deleted, you can create the unique key once in the database:
alter table posting_data add unique key (Entry_Date);
Then, when you are inserting new data from the RSS, use "replace" instead of "insert". It deletes the old row if the new row duplicates it on the primary key or any unique index:
replace into posting_data (......) values(.....)

Importing MySQL records with duplicate keys

I have two MySQL databases with identical table structure, each populated with several thousand records. I need to merge the two into a single database but I can't import one into the other because of duplicate IDs. It's a relational database with many linked tables (fields point to other table record IDs).
Edit: The goal is to have data from both databases in one final database, without overwriting data, and updating foreign keys to match with new record IDs.
I'm not sure how to go about merging the two databases. I could write a script I suppose, but there's many tables and it would take a while to do. I wondered if anyone else had encountered this problem, and the best way to go about it?
Just ignore the duplicates. The first time the key is inserted, it'll be inserted. The second time it will be ignored.
INSERT IGNORE INTO myTable (SELECT * FROM myOtherTable );
See the full INSERT syntax here.
The trick was to increment the IDs in one database by 1000 (or some other offset that won't overlap with data in the target database), then import it.
Thanks for everyone's answers.
Are the duplicate IDs supposed to correspond to each other? You could create a new table with an auto increment field and save the existing keys as two columns.
That would just be a 'bulk copy' though. If there is some underlying relationship then that would dictate how to combine the data.
If you have two tables A1 and A2 and you want to merge this to AA you can do this:
INSERT INTO aa SELECT * FROM A1;
INSERT INTO aa SELECT * FROM A2 ON DUPLICATE KEY
UPDATE aa.nonkeyfield1 = A2.nonkeyfield1,
aa.nonkeyfield2 = A2.nonkeyfield2, ....;
This will overwrite the fields of rows with duplicate keys with the A2 data.
A slightly slower method with simpler syntax is:
INSERT INTO aa SELECT * FROM A1;
REPLACE INTO aa SELECT * FROM A2;
This gives the same result, but instead of updating duplicate rows in place it first deletes the row that came from A1 and then reinserts the data from A2.
If you want to merge a whole database with foreign keys, this will not work, because it will break the links between tables.
If you have a whole database and you do not want to overwrite data:
Import the first database as normal into database A.
Import the second database into a database B.
Set all foreign keys to ON UPDATE CASCADE.
Double-check this.
Now run the following statements on all tables in database B:
SELECT @increment := MAX(pk) FROM A.table1;
UPDATE B.table1 SET pk = pk + @increment WHERE pk IS NOT NULL
ORDER BY pk DESC;
(The where clause is to stop MySQL from giving an error in strict mode)
If you write a script with those two lines per table in your database, you can then insert all tables into database AA. Remember to disable foreign key checks during the inserts with
SET foreign_key_checks = 0;
... do lots of inserts ...
SET foreign_key_checks = 1;
Good luck.
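For cases where the foreign keys cannot be switched to ON UPDATE CASCADE, here is a rough sketch of shifting the keys by hand. It assumes two hypothetical tables, parent(id) and child(id, parent_id), present in both database A and database B; those names are illustrative only. Each table gets its own offset, and the child's foreign key column is shifted by the parent's offset:

```sql
-- Offsets: the highest existing id per table in the target database.
SET @parent_off = (SELECT COALESCE(MAX(id), 0) FROM A.parent);
SET @child_off  = (SELECT COALESCE(MAX(id), 0) FROM A.child);

-- With checks off, the updates below neither cascade nor fail on the FK.
SET foreign_key_checks = 0;

-- Shift the child's own key and its reference to the parent.
-- ORDER BY ... DESC avoids transient duplicate-key collisions.
UPDATE B.child
SET id = id + @child_off,
    parent_id = parent_id + @parent_off   -- NULL references stay NULL
ORDER BY id DESC;

UPDATE B.parent
SET id = id + @parent_off
ORDER BY id DESC;

-- Both databases now use disjoint id ranges, so a plain copy is safe.
INSERT INTO A.parent SELECT * FROM B.parent;
INSERT INTO A.child  SELECT * FROM B.child;

SET foreign_key_checks = 1;
```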
Create a new database table with an auto-incremented primary key as the first column. Then add the column names from your databases and import each one. Then just drop the old primary field, and rename the new one to match your primary key's name.

Duplicating rows in a mysql table without enumerating fields

This MySQL table has an auto-increment field. I want to duplicate some rows. I thought I would use a temporary table:
CREATE TEMPORARY TABLE tmptab SELECT * FROM mytab WHERE somecondition = 1;
Before copying the data back to mytab I can now do some updates in tmptab.
UPDATE tmptab ... /* some updates */;
Because mytab has an auto-increment field, I cannot simply copy the contents of tmptab to mytab. One solution would be to enumerate the fields (and omit the auto-increment field).
I am looking for a solution without enumerating fields. This has advantages, for instance when fields are added later.
I thought I could erase the auto-increment field in tmptab (removing the auto-increment column) and then use a query similar to this one:
INSERT INTO mytab SELECT * FROM tmptab;
Would this work? The autoincrement field in mytab should be set correctly. Or is there a better way to do it?
I thought I could erase the autoindex field in tmptab (removing the autoindex column) and then use a query similar to this one
You need to use a command like this:
UPDATE tmptab SET key_column=NULL
When you insert NULLs back into the original table, it will generate new auto_increment ids.
You might need to add a command to drop the primary key index on the temp table for this to work.
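Put together, this might look like the sketch below. It assumes the auto-increment column of mytab is called id and is an integer type (both assumptions). CREATE TABLE ... SELECT does not copy indexes or the AUTO_INCREMENT attribute, but it does keep NOT NULL, so the column is made nullable first:

```sql
-- Copy the rows to duplicate; no field list needed.
CREATE TEMPORARY TABLE tmptab
SELECT * FROM mytab WHERE somecondition = 1;

-- The copied id column is still NOT NULL; relax that so it can hold NULL.
-- The exact integer type does not matter here because the values are discarded.
ALTER TABLE tmptab MODIFY id INT NULL;

-- Blank out the old ids (apply any other updates to tmptab here as well).
UPDATE tmptab SET id = NULL;

-- NULL in an AUTO_INCREMENT column makes mytab assign fresh ids.
INSERT INTO mytab SELECT * FROM tmptab;

DROP TEMPORARY TABLE tmptab;
```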