MySQL cleanup table from duplicated entries AND relink FK in depending table


Here is my situation: I have 2 tables, patient and study.
Each table has its own PK using autoincrement.
In my case, pat_id should be unique. It is not declared as unique at the database level because it can be non-unique in some uses (it's not a home-made system). I found out how to configure the system to treat pat_id as unique, but now I need to clean up the database: find the duplicated patients AND relink their rows in the study table to the remaining unique patient, before deleting the duplicated patients.
Patient table:
CREATE TABLE `patient` (
`pk` BIGINT(20) NOT NULL AUTO_INCREMENT,
`pat_id` VARCHAR(250) COLLATE latin1_bin DEFAULT NULL,
...
`pat_name` VARCHAR(250) COLLATE latin1_bin DEFAULT NULL,
...
`pat_custom1` VARCHAR(250) COLLATE latin1_bin DEFAULT NULL,
...
PRIMARY KEY (`pk`)
)ENGINE=InnoDB;
Study table:
CREATE TABLE `study` (
`pk` BIGINT(20) NOT NULL AUTO_INCREMENT,
`patient_fk` BIGINT(20) DEFAULT NULL,
...
PRIMARY KEY (`pk`),
...
CONSTRAINT `patient_fk` FOREIGN KEY (`patient_fk`) REFERENCES `patient` (`pk`)
)ENGINE=InnoDB;
I found some similar questions, but none with exactly the same issue; in particular, they did not cover relinking the foreign keys to the remaining unique patient:
Cleanup Update for Duplicate Entries
Update only first record from duplicate entries in MySQL

This is how I did it.
I reused an unused field in the patient table to mark non-duplicated patients (N), the first of each set of duplicates (X), and the other duplicated patients (Y). You could also add a dedicated column for this (and drop it afterwards).
Here are the steps I followed to cleanup my database:
/*1: List duplicated */
select pk,pat_id, t.`pat_id_issuer`, t.`pat_name`, t.pat_custom1
from patient t
where pat_id in (
select pat_id from (
select pat_id, count(*)
from patient
group by 1
having count(*)>1
) xxx);
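/*2: Delete orphan patients (patient_fk is nullable; without the IS NOT NULL filter, a single NULL makes NOT IN match nothing) */
delete from patient where pk not in (select patient_fk from study where patient_fk is not null);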
/*3: Reset flag for duplicated (or not) patients*/
update patient t set t.`pat_custom1`='N';
/*4: Mark all duplicated */
update patient t set t.`pat_custom1`='Y'
where pat_id in (
select pat_id from (
select pat_id, count(*)
from patient
group by 1
having count(*)>1
) xxx) ;
/*5: Unmark the 1st of the duplicated*/
update patient t
join (select pk from (
select min(pk) as pk, pat_id from patient
where pat_custom1='Y'
group by pat_id
) xxx ) x
on (x.pk=t.pk)
set t.`pat_custom1`='X'
where pat_custom1='Y'
;
/*6: Verify update is correct*/
select pk, pat_id,pat_custom1
from `patient`
where pat_custom1!='N'
order by pat_id, pat_custom1;
/*7: Verify studies linked to duplicated patient */
select p.* from study s
join patient p on (p.pk=s.patient_fk)
where p.pat_custom1='Y';
/*8: Relink duplicated patients */
update study s
join patient p on (p.pk=s.patient_fk)
set patient_fk = (select pk from patient pp
where pp.pat_id=p.pat_id and pp.pat_custom1='X')
where p.pat_custom1='Y';
/*9: Delete newly orphaned patients (same NULL guard as step 2) */
delete from patient where pk not in (select patient_fk from study where patient_fk is not null);
/* 10: reset flag */
update patient t set t.`pat_custom1`=null;
/* 11: Commit changes (only has an effect if autocommit is off or the steps were started with START TRANSACTION) */
commit;
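A small end-to-end sketch of the same steps, using Python's built-in sqlite3 module as a stand-in so it runs without a MySQL server. Only pk, pat_id and pat_custom1 are modelled, and the sample rows are made up. SQLite has no MySQL-style multi-table UPDATE ... JOIN, so steps 5 and 8 are written as correlated subqueries, and the extra derived tables (xxx) that MySQL needs to avoid error 1093 ("You can't specify target table ... for update in FROM clause") are dropped. It also shows one pitfall: patient_fk is nullable, and pk NOT IN (SELECT patient_fk FROM study) matches nothing as soon as one study has a NULL patient_fk, so the sketch deletes orphans with NOT EXISTS.

```python
import sqlite3

# Stand-in schema: only the columns the cleanup touches (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient (pk INTEGER PRIMARY KEY, pat_id TEXT, pat_custom1 TEXT);
    CREATE TABLE study (pk INTEGER PRIMARY KEY, patient_fk INTEGER REFERENCES patient(pk));
    INSERT INTO patient (pk, pat_id) VALUES
        (1, 'A1'), (2, 'A1'), (3, 'B2'), (4, 'C3'), (5, 'C3'), (6, 'D4');
    -- patient 6 has no study; study 15 has no patient
    INSERT INTO study VALUES (10, 1), (11, 2), (12, 3), (13, 4), (14, 5), (15, NULL);
""")

# Pitfall: NOT IN over a list that contains NULL never matches, so this finds nothing.
naive = conn.execute(
    "SELECT pk FROM patient WHERE pk NOT IN (SELECT patient_fk FROM study)"
).fetchall()

ORPHANS = ("DELETE FROM patient WHERE NOT EXISTS "
           "(SELECT 1 FROM study s WHERE s.patient_fk = patient.pk)")
conn.execute(ORPHANS)  # step 2: removes patient 6

# Steps 3-5: N = unique, Y = duplicated, X = lowest pk of each duplicated pat_id.
conn.execute("UPDATE patient SET pat_custom1 = 'N'")
conn.execute("""
    UPDATE patient SET pat_custom1 = 'Y'
    WHERE pat_id IN (SELECT pat_id FROM patient GROUP BY pat_id HAVING COUNT(*) > 1)
""")
# MIN(pk) does not depend on the flag being changed, so row order does not matter.
conn.execute("""
    UPDATE patient SET pat_custom1 = 'X'
    WHERE pat_custom1 = 'Y'
      AND pk = (SELECT MIN(p2.pk) FROM patient p2 WHERE p2.pat_id = patient.pat_id)
""")
flags = dict(conn.execute("SELECT pk, pat_custom1 FROM patient"))

# Step 8: relink studies of 'Y' patients to the 'X' patient with the same pat_id.
conn.execute("""
    UPDATE study
    SET patient_fk = (SELECT keep.pk
                      FROM patient dup
                      JOIN patient keep ON keep.pat_id = dup.pat_id AND keep.pat_custom1 = 'X'
                      WHERE dup.pk = study.patient_fk)
    WHERE patient_fk IN (SELECT pk FROM patient WHERE pat_custom1 = 'Y')
""")
conn.execute(ORPHANS)  # step 9: removes the now-unreferenced 'Y' patients

links = dict(conn.execute("SELECT pk, patient_fk FROM study"))
remaining = [r[0] for r in conn.execute("SELECT pk FROM patient ORDER BY pk")]
```

After the run, studies 11 and 14 point at patients 1 and 4, and only patients 1, 3 and 4 are left.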
There is certainly a shorter way using smarter (more complicated?) SQL, but I personally prefer the simple way. It also lets me check that each step does what I expect.
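One possible shorter variant, sketched with Python's sqlite3 as a stand-in and made-up rows: skip the flag column, relink every study directly to the lowest pk that shares its patient's pat_id, then delete the unreferenced patients, all inside one transaction (on InnoDB the MySQL equivalent is to wrap the statements in START TRANSACTION; ... COMMIT;). Like steps 2 and 9, it also deletes patients that never had a study; patients with a NULL pat_id are left alone. The statements avoid SQLite-only syntax, but this sketch has only been run against SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient (pk INTEGER PRIMARY KEY, pat_id TEXT);
    CREATE TABLE study (pk INTEGER PRIMARY KEY, patient_fk INTEGER);
    INSERT INTO patient VALUES (1, 'A1'), (2, 'A1'), (3, 'B2'), (4, 'C3'), (5, 'C3'), (6, NULL);
    INSERT INTO study VALUES (10, 2), (11, 3), (12, 5), (13, 4), (14, 6);
""")

with conn:  # one transaction: committed on success, rolled back on any exception
    # Point each study at the lowest pk sharing its patient's pat_id.
    # Patients with a NULL pat_id are skipped, so their studies keep their link.
    conn.execute("""
        UPDATE study
        SET patient_fk = (SELECT MIN(keep.pk)
                          FROM patient cur
                          JOIN patient keep ON keep.pat_id = cur.pat_id
                          WHERE cur.pk = study.patient_fk)
        WHERE patient_fk IN (SELECT pk FROM patient WHERE pat_id IS NOT NULL)
    """)
    # Remove every patient that no study refers to any more.
    conn.execute("""
        DELETE FROM patient
        WHERE NOT EXISTS (SELECT 1 FROM study WHERE study.patient_fk = patient.pk)
    """)

links = dict(conn.execute("SELECT pk, patient_fk FROM study"))
remaining = [r[0] for r in conn.execute("SELECT pk FROM patient ORDER BY pk")]
```

The trade-off is the one the answer mentions: two statements instead of eleven, but no intermediate state to inspect between them.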

Related

Update table's column with values from another table's column

I have a table with random names (along with an id as primary key):
CREATE TABLE `people` (
`id` int(4) NOT NULL AUTO_INCREMENT,
`name` varchar(30) NOT NULL,
PRIMARY KEY (`id`)
);
I have inserted 100 random names into it, along with their ids. I also have another table with other names:
CREATE TABLE `names` (
`id` int(4) NOT NULL AUTO_INCREMENT,
`name` varchar(30) NOT NULL,
PRIMARY KEY (`id`)
);
This table also has 100 (different) random names along with their ids. I want to update the column name of table people with the names from the column name of table names.
I obviously have to use UPDATE and SET, but most of the examples I saw also use INNER JOIN. I am wondering whether there is a simpler way to do this (without INNER JOIN) that I am missing.

An update with an inner join, e.g.:
update people
inner join names on people.id = names.id
set people.name = names.name
is the simplest, and also the clearest, most compact and most expressive form.
Other methods normally need either a subquery or an implicit join with the join condition in the WHERE clause. In one case the query is more verbose; in the other it is more confusing and often less performant.
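A minimal sketch of the subquery alternative, using Python's built-in sqlite3 module as a stand-in so it runs without a MySQL server (SQLite does not accept MySQL's UPDATE ... INNER JOIN syntax). The sample names, and the missing id 3 in names, are made up for illustration. It shows why the subquery form is wordier: the matching condition has to be written twice, once to fetch the new value and once to restrict the update to matching rows, as the inner join does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    INSERT INTO people VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO names VALUES (1, 'Xavier'), (2, 'Yvonne');  -- no row for id 3
""")

# Subquery form of "UPDATE people INNER JOIN names ... SET people.name = names.name".
# The EXISTS filter mirrors the inner join: people rows without a match are left
# untouched instead of being set to NULL (which NOT NULL would reject anyway).
conn.execute("""
    UPDATE people
    SET name = (SELECT n.name FROM names n WHERE n.id = people.id)
    WHERE EXISTS (SELECT 1 FROM names n WHERE n.id = people.id)
""")
result = conn.execute("SELECT id, name FROM people ORDER BY id").fetchall()
```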

Linking two tables using a foreign key in where one table contains images and the other user info

I have two tables: TableA contains images and TableB contains user information.
TableA
CREATE TABLE `TableA` (
`picID` int(11) NOT NULL,
`logos` varchar(200) DEFAULT NULL,
`user_id` varchar(20) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
TableB
CREATE TABLE `TableB` (
`user_id` int(11) NOT NULL,
`user_name` varchar(12) NOT NULL,
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
The foreign key in TableA is user_id, and the logged-in user is identified by a session variable called members:
user_id=".$_SESSION['members']);
picID is the primary key in TableA and user_id is the primary key in TableB, both set to auto increment and not null.
I would like to know how to associate the logos in TableA with the user table, TableB, so that the rows holding a user's images in TableA are displayed only during that user's session when I use echo.
In other words, the user_id should be associated with the specific picID rows, so that every user has his or her own image rows with unique images.
But I can't figure out how to link the two tables using my foreign key user_id.

This query will return all images associated with each user.
SELECT b.user_id, b.user_name, a.picID, a.logos
FROM TableB b
LEFT JOIN TableA a ON a.user_id = b.user_id
WHERE b.user_id = $user_id
Or, if you only care about the contents of TableA in this context, use this simpler query:
SELECT a.picID, a.logos
FROM TableA a
WHERE a.user_id = $user_id
Your question beyond that is not clear. Still, I guess you want just one image if a user has multiple images associated with them.
So, you need to decide how you will choose among the images. One choice is this: choose the latest image for the user. In that case append this line to either query I just gave you.
ORDER BY a.picID DESC LIMIT 1
I define latest image here as image with the highest value of picID.
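The queries above splice $user_id into the SQL text; binding it as a parameter instead (in PHP, a PDO or mysqli prepared statement) keeps the session value out of the SQL. Below is a minimal sketch of the LEFT JOIN plus "latest image" query with a bound parameter, using Python's sqlite3 as a stand-in. The helper latest_logo and the sample rows are hypothetical, and user_id is given the same integer type in both tables (an assumption: the question declares it varchar(20) in TableA but int(11) in TableB). Also note that MyISAM, as declared in the question, does not enforce foreign key constraints; InnoDB does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableB (user_id INTEGER PRIMARY KEY, user_name TEXT NOT NULL);
    CREATE TABLE TableA (picID INTEGER PRIMARY KEY, logos TEXT,
                         user_id INTEGER NOT NULL REFERENCES TableB(user_id));
    INSERT INTO TableB VALUES (1, 'ann'), (2, 'ben'), (3, 'cid');
    INSERT INTO TableA VALUES (1, 'ann_old.png', 1), (2, 'ben.png', 2), (3, 'ann_new.png', 1);
""")

def latest_logo(connection, user_id):
    """Hypothetical helper: the user's newest image (highest picID), or NULLs if none."""
    # "?" is a bound parameter, so the user id is never pasted into the SQL text.
    return connection.execute(
        """
        SELECT b.user_name, a.picID, a.logos
        FROM TableB b
        LEFT JOIN TableA a ON a.user_id = b.user_id
        WHERE b.user_id = ?
        ORDER BY a.picID DESC
        LIMIT 1
        """,
        (user_id,),
    ).fetchone()
```

Because of the LEFT JOIN, a user without images still returns one row, with NULL picID and logos; an unknown user returns no row at all.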

MySQL update column with value from a different table

I have two tables with the following structure and example content. Table one has membership_no set to the correct values, but table two has some incorrect values in its membership_no column. I need to query both tables, check where the membership_no values differ, and then update table two's membership_no column with the value from table one.
Table One:
id membership_no
====================
800960 800960
800965 800965
Table Two:
id membership_no
====================
800960 800970
800965 800975
My update query so far; it is not catching all of the incorrect values in table two:
UPDATE
tabletwo
INNER JOIN
tableone ON tabletwo.id = tableone.id
SET
tabletwo.membership_no = tableone.membership_no;
EDIT: Including SHOW CREATE and SELECT queries for unmatched membership_no column values.
Table One SHOW:
CREATE TABLE `n2z7m3_kiduka_accounts_j15` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`membership_no` int(11) NOT NULL,
...
`membershipyear` varchar(100) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `user_id` (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=800987 DEFAULT CHARSET=utf8
Table Two SHOW:
CREATE TABLE `n2z7m3_kiduka_accounts` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`membership_no` int(11) NOT NULL,
...
`membershipyear` varchar(100) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `user_id` (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=801072 DEFAULT CHARSET=utf8
SELECT query for unmatched membership_no column values:
SELECT
u.name,
a.membership_no as 'Joomla 1.5 accounts table',
j.membership_no as 'Joomla 3.0 accounts table'
FROM
n2z7m3_kiduka_accounts_j15 AS a
INNER JOIN n2z7m3_users AS u ON a.user_id = u.id
INNER JOIN n2z7m3_kiduka_accounts AS j ON a.user_id = j.membership_no
and a.membership_no != j.membership_no
ORDER BY u.name;

While Tim's Answer is perfectly valid, another variation is to add the filter qualifier to the ON clause such that:
UPDATE tabletwo
INNER JOIN
tableone ON tabletwo.id = tableone.id AND tabletwo.membership_no <> tableone.membership_no
SET
tabletwo.membership_no = tableone.membership_no;
This way you don't need a WHERE clause: only row pairs that match on id and have differing membership_no values survive the join, so only those rows are updated. Because it is an INNER JOIN, a row either has a match in both tables or is skipped entirely.
EDIT:
If you still suspect a problem, what does MySQL respond? Do you get a specific error message? With around 80k rows, the command may take a while to complete, so are you giving it enough time, or is PHP or the system aborting it because the execution time limit expires? (Increase the execution time limits in PHP and MySQL and rerun the query to see whether it then completes successfully.)
Suggestion
As another suggestion, I think your unique key should be on the AUTO_INCREMENT column instead, so for both tables:
DROP INDEX `user_id` ON <table>; #removes the current unique index
then
CREATE UNIQUE INDEX `id` ON <table> (`id`); #adds a unique index on the AUTO_INCREMENT column (note: id is already the PRIMARY KEY, which is itself unique)

You just need to add a WHERE clause:
UPDATE
tabletwo
INNER JOIN
tableone
ON tabletwo.id = tableone.id
SET
tabletwo.membership_no = tableone.membership_no
WHERE tabletwo.membership_no <> tableone.membership_no
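Both variants update only the rows that have a partner with the same id and a different membership_no. A minimal sketch with Python's sqlite3 as a stand-in (SQLite has no UPDATE ... JOIN, so the join becomes correlated subqueries); the first two rows are the sample data from the question and the third, already-correct row is made up. The cursor's rowcount is a quick way to see how many rows were actually changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableone (id INTEGER PRIMARY KEY, membership_no INTEGER NOT NULL);
    CREATE TABLE tabletwo (id INTEGER PRIMARY KEY, membership_no INTEGER NOT NULL);
    INSERT INTO tableone VALUES (800960, 800960), (800965, 800965), (800970, 800970);
    INSERT INTO tabletwo VALUES (800960, 800970), (800965, 800975), (800970, 800970);
""")

# Touch only rows that have a partner in tableone AND a different membership_no.
cur = conn.execute("""
    UPDATE tabletwo
    SET membership_no = (SELECT o.membership_no FROM tableone o WHERE o.id = tabletwo.id)
    WHERE EXISTS (
        SELECT 1 FROM tableone o
        WHERE o.id = tabletwo.id AND o.membership_no <> tabletwo.membership_no
    )
""")
changed = cur.rowcount  # rows actually modified by this UPDATE
rows = conn.execute("SELECT id, membership_no FROM tabletwo ORDER BY id").fetchall()
```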

RANK conversion of MS SQL to MYSQL

I am converting our project database from SQL Server to MySQL; the database conversion itself is already done.
We have the code below to identify duplicate records based on hashcode and flag them as duplicates.
The answers to "Rank function in MySQL" compute a rank (based on age) that starts at 1 and increments by 1 for every record. But I need the rank to restart per hashcode: it should start at 1, increment by 1 for records with the same hashcode, and start again from 1 when a new hashcode comes.
update table set Duplicate=1
WHERE id IN
( SELECT id FROM (
select RANK() OVER (PARTITION BY Hashcode ORDER BY Date asc) R,*
from table )A where R!=1 )
Below is table structure
CREATE TABLE TBL (
id int(11) NOT NULL AUTO_INCREMENT,
FileName varchar(100) DEFAULT NULL,
date datetime DEFAULT NULL,
hashcode varchar(255) DEFAULT NULL,
FileSize varchar(25) DEFAULT NULL,
IsDuplicate bit(1) DEFAULT NULL,
IsActive bit(1) DEFAULT NULL,
PRIMARY KEY (`id`)
)
Please help me migrate this code to MySQL.

You don't need to use enumeration for this logic. You just want to set the duplicate flag on everything that is not the minimum date for the hashcode:
update table t join
(select hashcode, min(date) as mindate
from table t
group by hashcode
) tt
on t.hashcode = tt.hashcode and t.date > tt.mindate
set t.Duplicate = 1;
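The join against MIN(date) flags exactly the rows whose RANK() would not be 1, including ties: rows sharing the earliest date all keep rank 1 and none of them is flagged. A small sketch checking that equivalence with Python's sqlite3 (a stand-in for MySQL, so the join is written as a correlated subquery) against a plain-Python RANK(); the table name tbl, the sample rows, and the use of the schema's IsDuplicate column (the code above says Duplicate) are assumptions. MySQL 8.0 also added window functions such as RANK(), so on that version the original query may port with small changes.

```python
import sqlite3

rows = [  # (id, hashcode, date); ISO-8601 strings sort in date order
    (1, "h1", "2020-01-01"), (2, "h1", "2020-01-03"), (3, "h2", "2020-01-02"),
    (4, "h1", "2020-01-02"), (5, "h2", "2020-01-05"), (6, "h2", "2020-01-02"),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, hashcode TEXT, date TEXT, "
             "IsDuplicate INTEGER DEFAULT 0)")
conn.executemany("INSERT INTO tbl (id, hashcode, date) VALUES (?, ?, ?)", rows)

# Flag every row that is later than the earliest date of its hashcode.
conn.execute("""
    UPDATE tbl SET IsDuplicate = 1
    WHERE date > (SELECT MIN(t2.date) FROM tbl t2 WHERE t2.hashcode = tbl.hashcode)
""")
flagged = [r[0] for r in conn.execute("SELECT id FROM tbl WHERE IsDuplicate = 1 ORDER BY id")]

def rank_not_one(data):
    # Plain-Python RANK() OVER (PARTITION BY hashcode ORDER BY date):
    # rank = 1 + number of rows in the same hashcode with a strictly earlier date.
    return sorted(
        rid for rid, h, d in data
        if 1 + sum(1 for _, h2, d2 in data if h2 == h and d2 < d) != 1
    )
```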

MySQL features a rather unique way to delete duplicates (only up to MySQL 5.6: ALTER IGNORE TABLE was deprecated in 5.6 and removed in 5.7):
alter ignore table YourTable
add unique index ux_yourtable_hashcode (hashcode);
The trick here is in the ignore option:
If IGNORE is specified, only one row is used of rows with duplicates
on a unique key. The other conflicting rows are deleted.
But there are also other ways. Based on your comment, there is an auto_increment column called id. Since this column is unique and not null, you can use it to distinguish duplicates. You need a temporary table to work around the "You can't specify target table 'TBL' for update in FROM clause" error:
create temporary table tmp_originals (id int);
insert tmp_originals
(id)
select min(id)
from YourTable
group by
hashcode;
update YourTable
set Duplicate = 1
where id not in (select id from tmp_originals);
The group by query selects the lowest id per group of rows with the same hashcode.
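A minimal sketch of the temporary-table variant with Python's sqlite3 as a stand-in and made-up rows. SQLite would accept the subquery on the target table directly, so the temporary table is kept only to mirror the MySQL workaround. Note that it keeps the lowest id per hashcode, not the earliest date, which is the same thing only if rows were inserted in date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER PRIMARY KEY, hashcode TEXT, Duplicate INTEGER DEFAULT 0)")
conn.executemany("INSERT INTO YourTable (id, hashcode) VALUES (?, ?)",
                 [(1, "h1"), (2, "h2"), (3, "h1"), (4, "h3"), (5, "h2")])

# Keep the lowest id of each hashcode in a temporary table...
conn.execute("CREATE TEMPORARY TABLE tmp_originals (id INTEGER)")
conn.execute("INSERT INTO tmp_originals (id) SELECT MIN(id) FROM YourTable GROUP BY hashcode")
# ...and flag everything else. MIN(id) is never NULL because id is the primary key,
# so NOT IN is safe here.
conn.execute("UPDATE YourTable SET Duplicate = 1 WHERE id NOT IN (SELECT id FROM tmp_originals)")
flagged = [r[0] for r in conn.execute("SELECT id FROM YourTable WHERE Duplicate = 1 ORDER BY id")]
```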

MySQL multiple columns as unordered sets

I'm somewhat new to MySQL and SQL in general, so hopefully this isn't a simple question.
I have a table that represents items in a customer's basket at checkout. This table represents a situation in which a customer is limited to 3 items, so I currently have a column for each item in the basket. It looks like this:
+----+-------+-------+-------+-----+
| id | item1 | item2 | item3 | val |
+----+-------+-------+-------+-----+
where val is just some value associated with the basket. The ordering of the items means nothing in terms of my processing, so in theory I would like to have them represented as an unordered set. This means that a row of (i1,i2,i3,val) is functionally equivalent to (i2,i1,i3,val).
My question is, how do I implement this in my table and/or in SQL such that selecting (i3,i2,i1,val) will return the row for (i1,i2,i3,val)?
I also need something that enforces uniqueness when I'm inserting. For example, if I insert (i2,i3,i1,newval), I would want the table to update (i1,i2,i3,val) to (i1,i2,i3,newval).

You could normalise your model by using a zero-to-many relation between customer and item:
-- assuming the existing table to be named `yourtable`
-- assuming your customer's table to be named `customer`
-- assuming your customer's id in the customer's table to be named `id`
-- assuming innodb (remove fk constraint if not)
CREATE TABLE `customer_item` (
`id` INT(10) NOT NULL AUTO_INCREMENT,
`id_customer` INT(10) NOT NULL,
`item` VARCHAR(255) NOT NULL,
PRIMARY KEY (`id`)
)
ENGINE=innodb
SELECT NULL AS `id`, t.id AS `id_customer`, t.`item`
FROM (
SELECT id, item1 AS `item`
FROM
yourtable
UNION
SELECT id, item2 AS `item`
FROM
yourtable
UNION
SELECT id, item3 AS `item`
FROM
yourtable
) t
ORDER BY t.id ASC
;
CREATE UNIQUE INDEX `idx_customer_item_cust` ON `customer_item` (`id_customer`, `item`);
ALTER TABLE `customer_item` ADD CONSTRAINT `fk_customer_item_cust` FOREIGN KEY (`id_customer`) REFERENCES `customer` (`id`) ON DELETE CASCADE ON UPDATE CASCADE;
-- once you check the data is consistent:
DROP TABLE `yourtable`;
Once that is done, no item can be inserted twice for the same customer.
Please note:
the union select skips duplicates already at table creation, in case some items were repeated for some customers
your data is normalised, from the customer to item point of view
your data is still not normalised from the item point of view. You should have an item table, and the customer_item table should reference the id of the item in that table instead of storing item names or descriptions as varchars.
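With one row per item, the order of items no longer matters: the unique index rejects a repeated item, and a basket can be looked up as a set. A minimal sketch with Python's sqlite3 as a stand-in; the helper baskets_with and the sample rows are hypothetical, and the HAVING query (count all of a basket's items, count how many are in the requested set) is one common way to match an exact set, not something taken from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_item (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        id_customer INTEGER NOT NULL,
        item TEXT NOT NULL,
        UNIQUE (id_customer, item)
    );
    INSERT INTO customer_item (id_customer, item) VALUES
        (1, 'i1'), (1, 'i2'), (1, 'i3'),
        (2, 'i1'), (2, 'i4');
""")

# The unique (id_customer, item) pair rejects a repeated item for the same basket.
try:
    conn.execute("INSERT INTO customer_item (id_customer, item) VALUES (1, 'i2')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

def baskets_with(connection, items):
    """Hypothetical helper: ids of baskets holding exactly this set of items, in any order."""
    wanted = sorted(set(items))
    marks = ", ".join("?" for _ in wanted)  # placeholders only; the values are bound
    sql = f"""
        SELECT id_customer FROM customer_item
        GROUP BY id_customer
        HAVING COUNT(*) = ? AND SUM(item IN ({marks})) = ?
    """
    params = (len(wanted), *wanted, len(wanted))
    return [row[0] for row in connection.execute(sql, params)]
```

For example, baskets_with(conn, ["i3", "i2", "i1"]) finds basket 1 regardless of the order the items are given in.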
