Large MySQL Table daily updates?

I have a MySQL table that has a bunch of product pricing information on around 2 million products. Every day I have to update this information for any products whose pricing information has changed [huge pain].
I am wondering what the best way to handle these changes is, other than running something that compares and updates every product that has changed?
I'd love any advice you can provide.

For bulk updates you should definitely be using LOAD DATA INFILE rather than a lot of smaller update statements.
First, load the new data into a temporary table:
LOAD DATA INFILE 'foo.txt' INTO TABLE bar (productid, info);
Then run the update:
UPDATE products, bar SET products.info = bar.info WHERE products.productid = bar.productid;
If you also want to INSERT new records from the same file that you're updating from, you can SELECT INTO OUTFILE all of the records that don't have a matching ID in the existing table then load that outfile into your products table using LOAD DATA INFILE.
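A sketch of that insert step, assuming the same products and bar tables as above and a path the MySQL server is allowed to write to:
SELECT bar.productid, bar.info
INTO OUTFILE '/tmp/new_products.txt'
FROM bar
LEFT JOIN products ON products.productid = bar.productid
WHERE products.productid IS NULL;
LOAD DATA INFILE '/tmp/new_products.txt' INTO TABLE products (productid, info);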

I maintain a price comparison engine with millions of prices: I select each row that I find in the source and update it individually, and if there is no matching row I insert one. It's best to use InnoDB transactions to speed this up.
This is all done by a PHP script that knows how to parse the source files and update the tables.
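A minimal sketch of that pattern, assuming a hypothetical prices(productid, price) table; the script issues one UPDATE (or INSERT) per source row, but wrapping a batch of them in a single InnoDB transaction avoids a commit per row:
START TRANSACTION;
UPDATE prices SET price = 9.99 WHERE productid = 123;
-- if the UPDATE matched nothing, the script issues an INSERT instead
INSERT INTO prices (productid, price) VALUES (124, 1.49);
COMMIT; -- one commit per batch, not per row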

Related

Do UPDATE in the first place and then INSERT for new data (reports) into mysql

I get a report in a tab-delimited file which stores some SKUs and their current quantities.
This means most of the time the inventory is the same and we just have to update the quantities.
But it can happen that a new SKU is in the list, which we have to insert instead of update.
We are using an InnoDB table for storing those SKUs. At the moment we just split the file by tabs and line breaks and run an INSERT ... ON DUPLICATE KEY UPDATE query, which is quite inefficient, because INSERT is expensive in InnoDB, right? It is also awkward because when a list with a lot of SKUs (> 20k) comes in, it takes several minutes.
So my plan for now is to just do a LOAD DATA INFILE into a tmp table and afterwards do the INSERT ... ON DUPLICATE KEY UPDATE, which should be faster, I think.
Also, is there another solution that does a simple UPDATE first and only performs an INSERT for whatever is left over? That would be perfect, but I could not find anything about it. Is there a way to delete rows which returned an update: 1?
Sort the CSV file by the PRIMARY KEY of the table.
LOAD DATA INFILE into a separate table (as you said)
INSERT INTO real_table SELECT * FROM tmp_table ON DUPLICATE KEY UPDATE ... -- Note: This is a single INSERT.
Caveat: This may block the table from other uses during step 3. A solution: Break the CSV into 1000-row chunks. COMMIT after each chunk.
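A sketch of steps 2 and 3, assuming a hypothetical staging table tmp_skus and a real table skus with sku as the PRIMARY KEY and a qty column:
LOAD DATA INFILE 'skus.txt' INTO TABLE tmp_skus
FIELDS TERMINATED BY '\t'
(sku, qty);
INSERT INTO skus (sku, qty)
SELECT sku, qty FROM tmp_skus
ON DUPLICATE KEY UPDATE qty = VALUES(qty);
For the chunked variant described in the caveat, run that INSERT over one chunk of the staging data at a time and COMMIT after each chunk.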

Mysql setting a record as deleted or archive

Is there any way to omit some records in a MySQL SELECT statement without deleting them?
We can easily add a column, for example deleted, and set it to 1 for deleted rows while keeping them, but the problem is that we then have to add a filter such as WHERE deleted = 0 to all queries.
What is the best way to keep some records as an archive?
I don't know how many tables you have and how much data you want to store, but a solution could be this one:
You create a tblName_HIST table for each of the tables (tblName) whose virtually deleted data you want to keep.
Optional: Add a DELETED_DATE column to keep track of the date the record was deleted.
You add an AFTER DELETE trigger on each tblName table that inserts the deleted record into the tblName_HIST table.
This will allow you to keep your existing queries and DB tables without modifying them much.
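A minimal sketch of such a trigger, assuming a hypothetical table tblName(id, data) and its archive tblName_HIST(id, data, DELETED_DATE):
CREATE TRIGGER tblName_after_delete
AFTER DELETE ON tblName
FOR EACH ROW
  -- copy the deleted row into the history table, stamping the deletion date
  INSERT INTO tblName_HIST (id, data, DELETED_DATE)
  VALUES (OLD.id, OLD.data, NOW());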

Restructure huge unnormalized mysql database

Hi, I have a huge unnormalized MySQL database with ~100 million URLs (~20% dupes) divided into identical split tables of 13 million rows each.
I want to move the URLs into a normalized database on the same MySQL server.
The old database table is unnormalized, and the URLs have no index.
It looks like this:
entry{id,data,data2, data3, data4, possition,rang,url}
And I'm going to split it up into multiple tables:
url{id,url}
data{id,data}
data1{id,data}
etc
The first thing I did was
INSERT IGNORE INTO newDatabase.url (url)
SELECT DISTINCT unNormalised.url FROM oldDatabase.unNormalised
But the " SELECT DISTINCT unNormalised.url" (13 million rows) took ages, and I figured that that since "INSERT IGNORE INTO" also do a comparison, it would be fast to just do a
INSERT IGNORE INTO newDatabase.url (url)
SELECT unNormalised.url FROM oldDatabase.unNormalised
Is this assumption wrong?
Either way it still takes forever and I need some help: is there a better way of dealing with this huge quantity of unnormalized data?
Would it be best if I did a "SELECT DISTINCT unNormalised.url" on the entire 100 million row database, exported all the IDs, and then moved only those IDs to the new database with, let's say, a PHP script?
All ideas are welcome; I have no clue how to port all this data without it taking a year!
PS: It is hosted on an Amazon RDS server.
Thank you!
Since the MySQL manual states that LOAD DATA INFILE is quicker than INSERT, the fastest way to load your data would be:
LOCK TABLES url WRITE;
ALTER TABLE url DISABLE KEYS;
LOAD DATA INFILE 'urls.txt'
IGNORE
INTO TABLE url
...;
ALTER TABLE url ENABLE KEYS;
UNLOCK TABLES;
But since you already have the data loaded into MySQL, but just need to normalize it, you might try:
LOCK TABLES url WRITE;
ALTER TABLE url DISABLE KEYS;
INSERT IGNORE INTO url (url)
SELECT url FROM oldDatabase.unNormalised;
ALTER TABLE url ENABLE KEYS;
UNLOCK TABLES;
My guess is that INSERT IGNORE ... SELECT will be faster than INSERT IGNORE ... SELECT DISTINCT but that's just a guess.

How to create a trigger that would save the deleted data (multiple records) to a Production Table

I use a temporary (staging) table to keep SQL Server performance good, and I have a copy of that table (the production table). I created a trigger so that when I delete data from the temporary table, it inserts that data into the production table. The issue is that when I delete multiple records from the temporary table, it only inserts the first record.
Can I save only selected records from the deleted data? For example, I want to save to the production table those records where the field POST = 'T'.
This should be a pretty simple requirement along the following lines
CREATE TRIGGER YourTrigger
ON Staging
AFTER DELETE
AS
INSERT INTO Production
SELECT *
FROM DELETED
But using the OUTPUT clause may well be more efficient than a trigger anyway
DELETE Staging
OUTPUT DELETED.*
INTO Production
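To keep only the POST = 'T' rows asked about above, the trigger form can filter the DELETED pseudo-table directly; a sketch, assuming POST is a column on Staging and Production has matching columns:
CREATE TRIGGER YourFilteredTrigger
ON Staging
AFTER DELETE
AS
INSERT INTO Production
SELECT *
FROM DELETED
WHERE POST = 'T';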

MYSQL: Load Data Infile but update if same key found?

I have a members table. Half the data/fields are populated through an online CMS.
But the members' core contact detail fields come from a CSV exported from a desktop database.
I want to be able to upload this CSV and use the LOAD DATA command to update the members' contact detail fields (matching on id) without touching/erasing the other fields.
Is there a way to do this, or must I instead loop through each row of the CSV and UPDATE... (if that's the case, any tips on the best way to do it?)
The LOAD DATA INFILE command supports the REPLACE keyword. This might be what you're looking for. From the manual:
REPLACE works exactly like INSERT, except that if an old row in the table has the same value as a new row for a PRIMARY KEY or a UNIQUE index, the old row is deleted before the new row is inserted.
The LOAD DATA INFILE command also lets you specify which columns to load, so perhaps you can upload the data, listing only the columns you want to update.
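A sketch of that approach, assuming a hypothetical members table keyed on id and a CSV holding only id plus the contact columns:
LOAD DATA INFILE 'contacts.csv'
REPLACE
INTO TABLE members
FIELDS TERMINATED BY ','
(id, phone, email);
Note that, per the manual text quoted above, REPLACE deletes the whole old row before inserting the new one, so columns not present in the CSV end up with their default values rather than being preserved.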