Using PySpark, I am updating a MySQL table. The schema has a unique key constraint on three fields.
My Spark job will run three times a day. Since one of the columns that makes up the unique key is 'date', I get a unique key constraint violation error whenever the job runs more than once in a day.
Is there a way, from Spark, to delete the already-existing rows and insert new ones?
I searched the web but could not find a solution.
You should handle the update on the database side. My suggestion is to create a temporary (staging) table in the MySQL database and have the Spark job insert its data into that table with overwrite mode.
Then write a MySQL script that updates the target table from the temporary table, and chain a job after the Spark job to run that script.
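A minimal sketch of that MySQL script, assuming a hypothetical target table my_table with a unique key on (k1, k2, dt) and a staging table my_table_staging that the Spark job overwrites on every run:

-- my_table_staging is fully rewritten by the Spark job (overwrite mode),
-- so re-running the job never violates the unique key on the target table.
INSERT INTO my_table (k1, k2, dt, val)
SELECT k1, k2, dt, val
FROM my_table_staging
ON DUPLICATE KEY UPDATE val = VALUES(val);
-- Alternatively, DELETE the affected date from my_table first and use a plain INSERT ... SELECT.

Either way, the conflict is resolved on the MySQL side rather than in Spark.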
Assuming df.write is being used, there isn't any upsert mode currently.
Related
I have an AWS RDS MySQL table which has a unique key constraint. Let us take the table name as user and the field name as username, for example. Suppose the table has a row with username='admin'. I am executing the queries below using Laravel.
delete from user where username='admin'
insert into user (username) values ('admin')
Once in a while, I see an 'Integrity constraint violation: 1062 Duplicate entry' error in the logs. It seems the row has not been deleted by the time the code executes the insert query. It works most of the time. I can change the code to use other logic, but I am wondering why this happens. Are there any AWS RDS specific scenarios related to this use case? I have not experienced this with my own MySQL installation. Thanks for your help!
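For illustration, one way to avoid the window between the delete and the insert is to collapse them into a single atomic statement; a sketch against the same user table (whether this fits depends on what other columns the row carries):

-- Single-statement alternatives to delete-then-insert:
REPLACE INTO user (username) VALUES ('admin');
-- or, keeping the existing row and updating it in place:
INSERT INTO user (username) VALUES ('admin')
ON DUPLICATE KEY UPDATE username = VALUES(username);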
I'm developing an Android application in which the data is stored in a SQLite database.
I have set up a sync with a MySQL database on the web, to which I am sending the data stored in SQLite on the device.
The problem is that I don't know how to maintain the relations between tables, because the primary keys are going to be reassigned by AUTO_INCREMENT while the foreign keys remain the same, breaking the relations between tables.
If this is a full migration, don't use AUTO_INCREMENT during the migration - create the tables with plain columns. Use ALTER TABLE to change the model after the import.
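For example (hypothetical user table), the key can stay a plain integer during the import and AUTO_INCREMENT can be added back afterwards:

-- Plain integer key during the import, so the SQLite ids are preserved as-is.
CREATE TABLE user (
  id INT NOT NULL PRIMARY KEY,
  name VARCHAR(100)
);
-- After the import, restore the auto-generated keys.
ALTER TABLE user MODIFY id INT NOT NULL AUTO_INCREMENT;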
For an incremental sync, the easiest way I see is an additional column in each MySQL table, called sqlite_id and filled with the original id. Then you can fix up the references using UPDATE with joins, as sketched below.
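A minimal sketch of that fix-up, assuming hypothetical parent and child tables where child.parent_id still holds the old SQLite id:

-- parent.sqlite_id holds the original SQLite primary key;
-- parent.id is the new AUTO_INCREMENT value assigned by MySQL.
UPDATE child c
JOIN parent p ON p.sqlite_id = c.parent_id
SET c.parent_id = p.id;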
Alternatives involve temporary tables for staging the data and an auxiliary table used for pairing old and new ids. That gets tedious for a bigger data model.
The approach I tend to use, if possible, is to avoid AUTO_INCREMENT in such situations. I usually have an auxiliary table with four columns, like this: t_import(tablename, operationid, sqlite_id, mysqlid).
The process is the following:
1. Import the primary keys into t_import. Use operationid to separate parallel imports if needed.
2. Generate new keys for the data tables and store them in the t_import table. This can be combined with step one.
3. Import the actual data, using t_import to set the new primary keys and restore the relations.
That should work for most scenarios I know about.
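A rough sketch of that process for a hypothetical parent/child pair imported from staging tables (names and types are assumptions):

CREATE TABLE t_import (
  tablename   VARCHAR(64),
  operationid INT,
  sqlite_id   INT,
  mysqlid     INT
);

-- Steps 1 and 2: register the SQLite keys and reserve new MySQL keys
-- by offsetting them past the current maximum.
INSERT INTO t_import (tablename, operationid, sqlite_id, mysqlid)
SELECT 'parent', 1, s.id, s.id + (SELECT COALESCE(MAX(id), 0) FROM parent)
FROM staging_parent s;

-- Step 3: import the data with the new keys and restore the relations.
INSERT INTO parent (id, name)
SELECT ti.mysqlid, s.name
FROM staging_parent s
JOIN t_import ti ON ti.tablename = 'parent' AND ti.operationid = 1 AND ti.sqlite_id = s.id;

INSERT INTO child (parent_id, value)
SELECT ti.mysqlid, sc.value
FROM staging_child sc
JOIN t_import ti ON ti.tablename = 'parent' AND ti.operationid = 1 AND ti.sqlite_id = sc.parent_id;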
Thanks for the help, you have given me some ideas.
I will try to add an id2 field to the tables that stores the same value as the primary key (_id).
When I send the information from SQLite to MySQL and the primary key is reassigned by AUTO_INCREMENT, the id2 field will still hold the original value of the primary key, so I can match it against the foreign keys of the other tables and update them.
Let’s see if it works.
Thanks
I want to save all rows that get deleted from the DB in a different table, so that once a day I can run a PHP command and delete all the related files from the server.
I have created a trigger to save deleted rows into a table and it is working fine, but rows that get deleted because of a foreign key relationship do not get saved.
I think triggers are not executed for rows deleted by a cascading foreign key constraint.
Please help.
You need to alter the trigger on your master table so that it backs up all the child records first and then the master record. Instead of having separate triggers on all the child tables, you can try to achieve it with a single trigger on the master table.
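A minimal sketch of such a trigger, assuming a hypothetical master table orders, a child table order_items with ON DELETE CASCADE, and matching backup tables:

-- Cascaded deletes do not fire the child table's own triggers,
-- so the master-table trigger backs up the child rows first.
DELIMITER $$
CREATE TRIGGER trg_orders_before_delete
BEFORE DELETE ON orders
FOR EACH ROW
BEGIN
  INSERT INTO deleted_order_items (item_id, order_id, file_path)
    SELECT item_id, order_id, file_path
    FROM order_items
    WHERE order_id = OLD.order_id;

  INSERT INTO deleted_orders (order_id, file_path)
    VALUES (OLD.order_id, OLD.file_path);
END$$
DELIMITER ;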
I'm trying to clean up a 7.5GiB table in MySQL by executing the following command:
DELETE FROM wp_commentmeta WHERE comment_id NOT IN (SELECT comment_id FROM wp_comments);
There is no foreign key between the two fields. Because of the size of (the second? both?) tables, attempting to execute this results in the following error:
Multi-statement transaction required more than 'max_binlog_cache_size'
bytes of storage; increase this mysqld variable and try again
The table is huge enough that I can't feasibly raise max_binlog_cache_size to accommodate this request. Short of dumping the two tables to disk and diffing their contents with a parser offline, is there some way to restructure the query to perform what I need more efficiently?
Some things I could do (though I would like to choose the correct/smart option):
Create a new table with a foreign key constraint between the two fields and insert into it, then delete the old and rename the new.
Use a MySQL derived/virtual table to create a view I could export and then re-import
Dump the two tables and compare w/ a parser to generate a list of IDs to delete
Suggestions welcome, please!
Try this one:
DELETE wcm
FROM wp_commentmeta wcm
LEFT JOIN wp_comments wc ON wc.comment_id = wcm.comment_id
WHERE wc.comment_id IS NULL;
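If a single pass still exceeds max_binlog_cache_size, the same anti-join can be run in chunks; a sketch, assuming meta_id is the table's primary key as in a standard WordPress schema (repeat until 0 rows are affected):

-- The derived table works around MySQL's restriction on re-reading the delete target.
DELETE FROM wp_commentmeta
WHERE meta_id IN (
  SELECT meta_id FROM (
    SELECT wcm.meta_id
    FROM wp_commentmeta wcm
    LEFT JOIN wp_comments wc ON wc.comment_id = wcm.comment_id
    WHERE wc.comment_id IS NULL
    LIMIT 10000
  ) AS doomed
);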
I have an app that has to import TONS of data from a remote source. From 500 to 1500 entries per call.
Sometimes some of the data coming in will need to replace data already stored in the DB. If I had to guess, I would say maybe one in every 300 or 400 entries needs to be replaced.
Each incoming entry has a unique ID. So I am trying to figure out if it is more efficient to always issue a delete command based on this ID or to check if there is already an entry THEN delete.
I found this SO post where it talks about the heavy work a dB has to do to delete something. But it is discussing a different issue so I'm not sure if it applies here.
Each incoming entry has a unique ID. So I am trying to figure out if it is more efficient to always issue a delete command based on this ID or to check if there is already an entry THEN delete.
Neither. Use INSERT ... ON DUPLICATE KEY UPDATE ....
Since you are using MySQL and you have a unique key, let MySQL do the work.
You can use
INSERT INTO..... ON DUPLICATE KEY UPDATE......
MySQL will try to insert a new record into the table; if the unique value already exists, MySQL will instead update all the fields that you have listed after ON DUPLICATE KEY UPDATE.
You can read more about the INSERT INTO ... ON DUPLICATE KEY UPDATE syntax at
http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html
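A minimal sketch, assuming a hypothetical entries table with a unique key on remote_id:

-- New remote_id: the row is inserted. Existing remote_id: the listed columns
-- are updated in place instead of raising a duplicate-key error.
INSERT INTO entries (remote_id, payload, updated_at)
VALUES (12345, 'fresh data', NOW())
ON DUPLICATE KEY UPDATE
  payload = VALUES(payload),
  updated_at = VALUES(updated_at);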