Check if a record from the database exists in a CSV file - MySQL

Today I come to you for inspiration, or maybe ideas on how to solve a task without killing my laptop with massive and repetitive code.
I have a CSV file with around 10k records. I also have a database with the corresponding records in it. Both structures contain four fields: destination, countryCode, prefix and cost.
Every time I update the database from this .csv file I have to check whether a record with the given destination, countryCode and prefix exists and, if so, update the cost. That is pretty easy and it works fine.
But here comes the tricky part: a destination may disappear from one .csv file to the next, and I need to detect that and delete the now-unused record from the database. What is the most efficient way of handling that kind of situation?
I really wouldn't want to check every record from the database against every row in the .csv file: that sounds like a very bad idea.
I was thinking about a timestamp or just a boolean flag telling me whether the record was modified during the last update of the DB, BUT there is also a chance that none of the parameters within a record change, in which case there is no need to touch that record and mark it as modified.
For that task, I use Python 3 and the mysql.connector library.
Any ideas and advice will be appreciated :)

If you're keeping a timestamp, why do you care if it's updated even when nothing in the record has changed? If the reason is that you want to preserve the date of the latest actual update, you can add another column that stores the timestamp of the last time the record appeared in the CSV, and afterwards delete all the records whose value in that column is older than the date of the last CSV.
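A minimal sketch of that, with assumed table/column names (rates, last_seen) and placeholder values:

ALTER TABLE rates ADD COLUMN last_seen DATETIME;

-- during the import, stamp every row that appears in the CSV
-- (the values here are placeholders for the parameters your script binds):
UPDATE rates
   SET cost = 0.10, last_seen = '2024-06-01 00:00:00'
 WHERE destination = 'Germany' AND countryCode = 'DE' AND prefix = '49';

-- after the import, delete everything the CSV no longer contains:
DELETE FROM rates
 WHERE last_seen IS NULL OR last_seen < '2024-06-01 00:00:00';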

If the .CSV is a replacement for the existing table:
CREATE TABLE `new` LIKE `real`;
load the .csv into `new` (Probably use LOAD DATA...)
RENAME TABLE `real` TO `old`, `new` TO `real`;
DROP TABLE `old`;
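The LOAD DATA step might look roughly like this (the file name, delimiter and header row are assumptions; the column list comes from the question):

LOAD DATA LOCAL INFILE 'rates.csv'
    INTO TABLE `new`
    FIELDS TERMINATED BY ','
    IGNORE 1 LINES
    (destination, countryCode, prefix, cost);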
If you have good reason to keep the old table and patch it, then...
load the .csv into a table
add suitable indexes
do one SQL statement to do the deletes (no loop needed); it is probably a multi-table DELETE.
do one SQL statement to update the prices (no loop needed); it is probably a multi-table UPDATE.
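Rough sketches of those two statements, assuming the live table is called rates and the freshly loaded table is called staging (both names are assumptions; the join columns come from the question):

-- delete rows that no longer appear in the CSV
DELETE r
  FROM rates AS r
  LEFT JOIN staging AS s
         ON s.destination = r.destination
        AND s.countryCode = r.countryCode
        AND s.prefix      = r.prefix
 WHERE s.destination IS NULL;

-- update the cost for rows that still exist
UPDATE rates AS r
  JOIN staging AS s
         ON s.destination = r.destination
        AND s.countryCode = r.countryCode
        AND s.prefix      = r.prefix
   SET r.cost = s.cost;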
You can probably do the entire task (either way) without touching Python.

Related

Fastest-Cleanest way to update database (mysql large tables)

I have a website fed by large MySQL tables (>50k rows in some tables). Let's name one table "MotherTable". Every night I update the site with a new CSV file (produced locally) that has to replace the "MotherTable" data.
The way I do this currently (I am not an expert, as you can see) is:
- First, I TRUNCATE the MotherTable table.
- Second, I import the CSV file into the empty table, with columns separated by "/" and skipping 1 line.
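Roughly, that is (the file name is a placeholder):

TRUNCATE TABLE MotherTable;
LOAD DATA LOCAL INFILE 'mothertable.csv'
    INTO TABLE MotherTable
    FIELDS TERMINATED BY '/'
    IGNORE 1 LINES;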
As the CSV file is not very small, there are some seconds (or even a minute) when MotherTable is empty, so web users running SELECTs on this table find nothing.
Obviously, I don't like that. Is there any procedure to update MotherTable in a way that users notice nothing? If not, what would be the quickest way to update the table with the new CSV file?
Thank you!

Can I import a single column from a SQL dump file, since hundreds of new rows have been added since?

I removed what at the time I thought was an unnecessary field/column from a table in MySQL, and now I would like to import it back into the table. Just before I dropped the column, I did a backup of the whole table. Since then, close to 1000 new rows have been added.
How can I add the column and information back to the table?
I have a sandbox that I can "play" with. I have tried but have yet to get the proper end result. Thank you in advance.
Not sure if it matters, but the system is Fedora 16.
What I would do is load your dump into another table (e.g. table_bak; you might need to do this in your sandbox and re-dump). Then add your column back into your live table with a sensible default value. Then you should be able to bring the old data back like this:
UPDATE `table`, `table_bak` SET `table`.restored_column = `table_bak`.restored_column WHERE `table`.pk_id = `table_bak`.pk_id
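The "add your column back" step could be something along these lines (the column type is an assumption):

ALTER TABLE `table` ADD COLUMN restored_column VARCHAR(255) DEFAULT NULL;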

The best way to update records in MySQL from an XML feed

I am thinking about the best way to update the records in my table (MySQL) from an XML feed. I have a database, and this database contains the daily offers from several sales portals.
So now, around midnight, I delete all records from my table (because I think it is better to delete everything than to compare the current data with what is being inserted) and, using a script run by cron, I download the new offers from the portals (via their XML feeds). This approach has one disadvantage: the interval between deleting the old records and inserting the new ones is about 5 minutes, and during those 5 minutes the table is empty.
I would like to ask if you could suggest a better way of updating the records in my table.
Thanks a lot for your time!
I would import the new XML feeds into a temporary table, keeping the old table active; then, when all imports are successful, you can simply drop the original table and rename the new one to the original name. This should result in less downtime, but more importantly it will give you the ability to abort the switch in the event the XML import fails on a given night.
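As a sketch, assuming the live table is called offers (and using an atomic RENAME so the table never disappears, rather than a literal drop-then-rename):

CREATE TABLE offers_new LIKE offers;
-- ... the cron script imports the XML feeds into offers_new here ...
RENAME TABLE offers TO offers_old, offers_new TO offers;  -- atomic swap, no empty-table window
DROP TABLE offers_old;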
hope that helps.

reflecting record deletions / additions between two datasets

I currently have a table of 3m records that needs updating nightly.
The data that populates this table comes from ~100 APIs that all get normalised into one jumbo table.
Problem:
How to reflect new records being added, and records being deleted at the source?
Facts:
I can't truncate the table every night and reinsert.
Each API provides a constant ID for each record (so I can keep track of what's what).
Some fields will be updated each night.
Solutions:
New records are easy: I just add them to my table with an AvailableFrom date.
Updates are also easy: for each record I check whether it exists and whether the data has changed (performance will suck).
Deleted records are where I'm stuck.
The APIs just dump a load of data on me; how do I tell if a record has "dropped off"?
I'm thinking a swap table of some sort - any ideas?
If the only way to tell whether a record has been deleted is to check whether the API still delivers it, without knowing exactly which record you are looking for, you will need to keep track of the imports. If you always do a full import:
Solution 1:
Set a flag for every row in the database, then do the import, updating the flag for every row you receive, and finally delete everything that hasn't been flagged.
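A sketch of Solution 1, with assumed table/column names:

UPDATE records SET seen = 0;               -- reset the flag before the import
-- ... the import sets seen = 1 on every row it inserts or updates ...
DELETE FROM records WHERE seen = 0;        -- rows the APIs no longer deliver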
Solution 2:
Set an import ID (bound to the date?) for every import and write it to the database entries, so you know which row originates from which import, overriding existing data with the import ID of the latest import.
Then you can work only with the data from the last import.
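A sketch of Solution 2, again with assumed names, using the import date as the ID:

-- every row written during tonight's import gets import_id = 20240601;
-- afterwards, anything still carrying an older ID has dropped off at the source:
DELETE FROM records WHERE import_id < 20240601;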
But if you always do a full import, dropping everything beforehand should be faster, shouldn't it?

partial restore from sql dump?

I have a table that has 7000 rows.
I added a new column to this table.
The table also has a MySQL DateTime column.
When I updated the table to fill in this new column, it also updated the DateTime.
I took an SQL dump just before I did the update, so now I need to use the dump to revert the DateTime (and only that column).
How do I do that?
There are a couple of ways I can think of to do this off the top of my head.
The first is to create another MySQL database and load the dump into that database (make sure it's not going to load into the first database via a USE command in the dump), and then use the data from that database to construct the update queries for the first.
The second, easier, more hackish way is to open the dump in a text editor, pull out just that table, and use find-and-replace to turn the inserts into UPDATE statements for just that column, keyed on the primary key. You'd need to be able to find and replace on patterns.
A third way would be to load the dump into an abstract SQL tool, letting it do the parsing for you, and write new queries from the data in the abstract syntax trees.
A fourth, again hackish, possibility, if this isn't a live system, is to roll back and re-perform the more recent transformations (only if they are simple).
Restore the dump to a second table. Select the ID and datetime from that table. Use those results to update the rows in the original table corresponding to the IDs you got.
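A sketch of that, assuming the dump was restored into mytable_backup, the key column is id and the column to revert is created_at:

UPDATE mytable AS t
  JOIN mytable_backup AS b ON b.id = t.id
   SET t.created_at = b.created_at;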