I have a table Provision with this structure:
ONT_ID varchar(12) PK
neID set('7360-1','7360-2','7360-3','5000-1','5000-2') not null
and some other crap
I have loaded a temporary table called tempTable that has the same structure and the same data. Before attempting this, the neID column in the Provision table was a varchar field; any values that did not match the set members were deleted. (I've done this before without a problem.)
Neither this:
UPDATE Provision P
INNER JOIN tempTable TT ON TT.ONT_ID = P.ONT_ID
SET P.neID = TT.NE_ID
Nor this (broken up for readability):
update Provision P
set P.neID = (
select TT.NE_ID from tempTable TT where TT.ONT_ID = P.ONT_ID
)
...accomplishes what it is supposed to. What is going on?
The Provision table has a record with the ONT_ID, and its neID is an empty string. The temp table has the same ONT_ID and a pertinent NE_ID. I'm trying to update the neID in the Provision table with the value from the temporary table.
It turned out the data in tempTable was wrong. After correcting the data, the update worked and the Provision table was updated as expected.
I want to update a MySQL table from matlab in bulk. The current logic that I use iterates over the array and inserts it one-by-one which takes way too long.
Here is my current implementation-
function update_table(customer_id_list, cluster_id_list, write_conn)
    % Updates rows one at a time; correct, but slow for large arrays.
    num_customers = size(customer_id_list, 1);
    for idx = 1:num_customers
        customer_id = customer_id_list(idx);
        cluster_id = cluster_id_list(idx);
        % Build each statement with sprintf; strcat would trim the
        % trailing spaces out of the SQL fragments.
        sql = sprintf('UPDATE table SET cluster_id = %d WHERE customer_id = %d', ...
            cluster_id, customer_id);
        exec(write_conn, sql);
    end
end
Tried to look for documentation to do bulk update/insert, but haven't found anything yet.
Do an "upjoin" using a temporary table.
Build your update specification as a Matlab table array with all the cluster_id and customer_id pairs that specify the new values.
Create a SQL temporary table that contains columns for the key columns you'll be matching on and the columns to update.
CREATE TEMPORARY TABLE my_temp_table SELECT customer_id, cluster_id FROM `table` WHERE 1 = 0
Batch-insert your update specification data from Matlab into the temporary table using Matlab Database Toolbox's datainsert or sqlwrite.
Update the target table en masse by joining it to the temp table (MySQL puts the SET after the join): UPDATE `table` AS targ INNER JOIN my_temp_table AS upd ON targ.customer_id = upd.customer_id SET targ.cluster_id = upd.cluster_id.
Drop the temp table.
Boom. If you're going to do this a lot, wrap it up in a generic upjoin() function.
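Putting the SQL side of those steps together, a minimal sketch under the question's placeholder names (a real table would not be called table, which is a reserved word and needs backticks here):
-- 1. Empty staging table with just the key and value columns
CREATE TEMPORARY TABLE my_temp_table
  SELECT customer_id, cluster_id FROM `table` WHERE 1 = 0;
-- 2. (batch-insert the customer_id/cluster_id pairs here from Matlab
--    with datainsert or sqlwrite)
-- 3. Apply every update in a single joined statement
UPDATE `table` AS targ
INNER JOIN my_temp_table AS upd ON targ.customer_id = upd.customer_id
SET targ.cluster_id = upd.cluster_id;
-- 4. Clean up
DROP TEMPORARY TABLE my_temp_table;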
See the Matlab documentation for datainsert and sqlwrite. Do not use fastinsert; despite its name, it is much slower than datainsert and sqlwrite.
I know, deleting duplicates from MySQL is often discussed here. But none of the solutions works in my case.
So, I have a DB with address data roughly like this:
ID; Anrede; Vorname; Nachname; Strasse; Hausnummer; PLZ; Ort; Nummer_Art; Vorwahl; Rufnummer
ID is the primary key and unique.
And I have entries like this, for example:
1;Herr;Michael;Müller;Testweg;1;55555;Testhausen;Mobile;012345;67890
2;Herr;Michael;Müller;Testweg;1;55555;Testhausen;Fixed;045678;877656
The different phone numbers are not the problem, because they are not relevant for me. I just want to delete the rows that are duplicates in last name, street, and zip code; in this case that means ID 1 or ID 2, and which of the two doesn't matter.
I actually tried it like this with DELETE:
DELETE db
FROM Import_Daten db,
Import_Daten dbl
WHERE db.id > dbl.id AND
db.Lastname = dbl.Lastname AND
db.Strasse = dbl.Strasse AND
db.PLZ = dbl.PLZ;
And insert into a copy table:
INSERT INTO Import_Daten_1
SELECT MIN(db.id),
db.Anrede,
db.Firstname,
db.Lastname,
db.Branche,
db.Strasse,
db.Hausnummer,
db.Ortsteil,
db.Land,
db.PLZ,
db.Ort,
db.Kontaktart,
db.Vorwahl,
db.Durchwahl
FROM Import_Daten db,
Import_Daten dbl
WHERE db.lastname = dbl.lastname AND
db.Strasse = dbl.Strasse And
db.PLZ = dbl.PLZ;
The complete table contains over 10 million rows. The size is actually my problem. MySQL runs on a MAMP server on a MacBook with 1.5 GHz and 4 GB RAM, so it's not really fast. The SQL statements run in phpMyAdmin; I have no other options.
You can write a stored procedure that selects a different chunk of data each time (for example by row number between two values) and deletes only from that range. This way you will slowly, bit by bit, delete your duplicates.
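A minimal sketch of that idea, assuming Import_Daten has a numeric id column to range over, and using the duplicate criteria from the question (Lastname, Strasse, PLZ):
DELIMITER //
CREATE PROCEDURE delete_duplicates_chunked(IN chunk_size INT)
BEGIN
  DECLARE max_id BIGINT;
  DECLARE lo BIGINT DEFAULT 0;
  SELECT MAX(id) INTO max_id FROM Import_Daten;
  WHILE lo <= max_id DO
    -- delete only duplicates whose id falls inside the current chunk
    DELETE db
    FROM Import_Daten db
    JOIN Import_Daten dbl
      ON  db.Lastname = dbl.Lastname
      AND db.Strasse  = dbl.Strasse
      AND db.PLZ      = dbl.PLZ
      AND db.id > dbl.id
    WHERE db.id BETWEEN lo AND lo + chunk_size - 1;
    SET lo = lo + chunk_size;
  END WHILE;
END //
DELIMITER ;

CALL delete_duplicates_chunked(100000);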
A more effective two-table solution can look like the following.
We can store only the data we really need to delete and only the fields that contain duplicate information.
Let's assume we are looking for duplicates in the Lastname, Branche, and Hausnummer fields.
First drop any previous copy of the helper table:
DROP TABLE IF EXISTS data_to_delete;
Then create it and populate it with the data we need to delete (I assume all fields have type VARCHAR(255)):
CREATE TABLE data_to_delete (
id BIGINT COMMENT 'this field will contain ID of row that we will not delete',
cnt INT,
Lastname VARCHAR(255),
Branche VARCHAR(255),
Hausnummer VARCHAR(255)
) AS SELECT
min(t1.id) AS id,
count(*) AS cnt,
t1.Lastname,
t1.Branche,
t1.Hausnummer
FROM Import_Daten AS t1
GROUP BY t1.Lastname, t1.Branche, t1.Hausnummer
HAVING count(*)>1 ;
Now let's delete the duplicate data, leaving only one record from each duplicate set:
DELETE Import_Daten
FROM Import_Daten LEFT JOIN data_to_delete
ON Import_Daten.Lastname=data_to_delete.Lastname
AND Import_Daten.Branche=data_to_delete.Branche
AND Import_Daten.Hausnummer = data_to_delete.Hausnummer
WHERE Import_Daten.id != data_to_delete.id;
DROP TABLE data_to_delete;
You can add a new column, e.g. uq, and make it UNIQUE.
ALTER TABLE Import_Daten
ADD COLUMN `uq` BINARY(16) NULL,
ADD UNIQUE INDEX `uq_UNIQUE` (`uq` ASC);
When this is done, you can execute an UPDATE query like this:
UPDATE IGNORE Import_Daten
SET
uq = UNHEX(
MD5(
CONCAT(
Import_Daten.Lastname,
Import_Daten.Street,
Import_Daten.Zipcode
)
)
)
WHERE
uq IS NULL;
Once all entries are updated and the query is executed again, any remaining duplicates will still have uq = NULL, because the UNIQUE index prevents them from receiving an already-used hash, and they can be removed.
The result then is:
0 row(s) affected, 1 warning(s): 1062 Duplicate entry...
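After that full pass, the rows that still have uq = NULL are exactly the duplicates, so a plain delete removes them:
DELETE FROM Import_Daten
WHERE uq IS NULL;  -- safe only after the UPDATE IGNORE pass has touched every row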
For newly added rows, always create the uq hash, and consider using this column as the primary key once all entries are unique.
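A sketch of that promotion, assuming the existing ID key can be dropped (it must not be AUTO_INCREMENT at that point) and every row now has a hash:
ALTER TABLE Import_Daten
  MODIFY uq BINARY(16) NOT NULL,  -- primary key columns must be NOT NULL
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (uq);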
We are currently importing very large CSV files into a MySQL data warehouse. A key part of the processing is to flag whether a record in the CSV file matches an existing record in the warehouse. The "match" is done by comparing specific fields in the new data against the previous version of the table. If the record is "new" or if there have been updates, we want to add it to the warehouse.
At the moment the processing plan is as follows:
~ read CSV file into mySQL table A
~ is the primary key on A also on old-A? If it isn't, set record status to "NEW"
~ if the key is on old-A, issue an update statement, JOINING old-A to A
~ if A.field1 <> old-A.field1 OR A.field2 <> old-A.field2 OR A.field3 <> old-A.field3 THEN flag record status as "UPDATE"
~ process NEW or UPDATEd records according to record status
Table size on A and old-A is currently on the order of 50M records. We would expect new records to be 1M and updates to be 5-10M.
Although we are currently using MySQL for this processing, I am wondering whether it would simply be better to do it with a scripting language. We are finding in particular that the step to flag the updates is very time consuming; essentially we have an UPDATE statement that is unable to use any index.
so
CREATE TABLE A (key1 bigint,
field1 varchar(50),
field2 varchar(50),
field3 varchar(50) );
LOAD DATA ...
... add field rec_status to table A
... then
UPDATE A
LEFT JOIN old-A ON A.key1 = old-A.key1
SET rec_status = 'NEW'
WHERE old-A.key1 IS NULL;
UPDATE A
JOIN old-A ON A.key1 = old-A.key1
SET rec_status = 'UPDATED'
WHERE A.field1 <> old-A.field1
OR A.field2 <> old-A.field2
OR A.field3 <> old-A.field3;
...
I would consider skipping the "flag" step. Process the CSV file with a script, or table A with MySQL statements: select a record from the old-A table based on whatever criteria you need, such as field1 and/or field2... of table A. If found, lock and update the old-A record and delete the processed record from the CSV or from table A; if not found, create the record in old-A with the data.
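In MySQL the found/not-found branch can be collapsed into a single INSERT ... ON DUPLICATE KEY UPDATE; a minimal sketch, assuming key1 is the primary key and writing old-A as old_A so the name is a valid identifier:
INSERT INTO old_A (key1, field1, field2, field3)
VALUES (?, ?, ?, ?)  -- bind one CSV record per execution
ON DUPLICATE KEY UPDATE
  field1 = VALUES(field1),
  field2 = VALUES(field2),
  field3 = VALUES(field3);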
I am working with a database table that stores, among other things, an AssociateID field and a DocumentName field. The table is used to keep records of which associates are missing which documents.
I am trying to write a stored procedure that will remove a row for 'Restrictive Covenant Letters' if an entry for 'Termination Statement' cannot be found for the same associate. Below is the body of what I have so far:
DELETE from tbl_HR_Auditing_Reports
WHERE DocumentName = 'Restrictive Covenant Letters'
AND NOT EXISTS
(
SELECT TS.AssociateID
FROM
(SELECT AssociateID FROM tbl_HR_Auditing_Reports WHERE DocumentName = 'Termination Statement (*)') TS,
(SELECT AssociateID FROM tbl_HR_Auditing_Reports WHERE DocumentName = 'Restrictive Covenant Letters') RCL
WHERE TS.AssociateID = RCL.AssociateID
)
I think I am close, but SQL isn't really my forte. If anyone could help, I would really appreciate it!
According to the MySQL manual:
Currently, you cannot delete from a table and select from the same table in a subquery.
To get around this you can use a temporary table, inserting the relevant rows into the temporary table, then referencing it in the delete statement.
CREATE TEMPORARY TABLE tmp (AssociateID INT);
INSERT INTO tmp (AssociateID)
SELECT AssociateID
FROM tbl_HR_Auditing_Reports
WHERE DocumentName = 'Termination Statement (*)';
DELETE
FROM tbl_HR_Auditing_Reports
WHERE DocumentName = 'Restrictive Covenant Letters'
AND NOT EXISTS
( SELECT 1
FROM tmp
WHERE tmp.AssociateID = tbl_HR_Auditing_Reports.AssociateID
)
Example on SQL Fiddle
I have a SSIS package that copies data from table A to table B and sets a flag in table A so that the same data is not copied subsequently. This works great by using the following as the SQL command text on the ADO Net Source object:
update transfer
set ProcessDateTimeStamp = GetDate(), LastUpdatedBy = 'legacy processed'
output inserted.*
where LastUpdatedBy = 'legacy'
and ProcessDateTimeStamp is not null
The problem I have is that I need to run a similar data copy, but from two source tables joined on a primary/foreign key: select from table A joined to table B, and update the flag in table A.
I don't think I can use the technique above because I don't know where I'd put the join!
Is there another way around this problem?
Thanks
Rob.
You can use a join in an update statement.
update m
set ProcessDateTimeStamp = GetDate(),
LastUpdatedBy = 'legacy processed',
somefield = t.someotherfield
output inserted.*
from transfer t
join mytable m
on t.id = m.id
where m.LastUpdatedBy = 'legacy'
and m.ProcessDateTimeStamp is null
and t.ProcessDateTimeStamp is not null
The key is to not alias the fields on the left side of the SET but to alias everything else, and to use the table alias for the table you are updating after the UPDATE keyword so it knows which table of the join to update.