Avoid duplicate data in mySQL table

I have the following table in my database:
CREATE TABLE subjects (
subject_id int(11) NOT NULL AUTO_INCREMENT,
subject text,
PRIMARY KEY (subject_id, subject)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 AUTO_INCREMENT=1;
This is an example of my table:
subject_id | subject
1          | test
2          | ICT
3          | ICT
The key (subject_id) is never duplicated since MySQL generates it automatically, but the last two rows repeat the same subject.
How can I avoid repeating the subject name?
I have read that it can be done with a 'constraint' like this:
ALTER TABLE subjects
ADD CONSTRAINT constraint_subject UNIQUE KEY(subject);
But I've tried it and I get an error every time.
I know this has been asked before, but my PHP code can still insert subjects with the same name, and nothing stops it.

Adding the constraint gives you an error because it wouldn't be satisfied since you already have duplicated data.
You have to delete the duplicates and then add the constraint, which would then work.
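The delete-then-constrain order can be tried end to end with a small sketch. This is only an illustration under assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the rule "keep the lowest subject_id per subject" is my choice, not something the question specifies. MySQL itself does not let a DELETE select from its own target table in a subquery, so there the delete step is usually written as a multi-table DELETE that joins subjects to itself.

```python
import sqlite3

# Stand-in for the MySQL table from the question (SQLite, in memory).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subjects ("
    " subject_id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " subject TEXT)"
)
conn.executemany(
    "INSERT INTO subjects (subject) VALUES (?)",
    [("test",), ("ICT",), ("ICT",)],
)

# Step 1: delete duplicates, keeping the lowest subject_id per subject
# (an assumed rule; pick whichever row you actually want to keep).
conn.execute(
    "DELETE FROM subjects WHERE subject_id NOT IN ("
    " SELECT MIN(subject_id) FROM subjects GROUP BY subject)"
)

# Step 2: the unique index can now be created without error.
conn.execute("CREATE UNIQUE INDEX constraint_subject ON subjects (subject)")

# Step 3: a further duplicate is rejected by the database itself.
try:
    conn.execute("INSERT INTO subjects (subject) VALUES ('ICT')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

remaining = conn.execute(
    "SELECT subject_id, subject FROM subjects ORDER BY subject_id"
).fetchall()
```

Once the index exists, the PHP side only has to handle the duplicate-key error instead of checking for duplicates itself.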
If you want to select only one row per subject right now, while the duplicates are still in your table, you can run the following, which keeps the row with the lowest subject_id for each subject:
SELECT
    *
FROM
    subjects AS s1
WHERE
    NOT EXISTS (
        SELECT
            s2.subject_id
        FROM
            subjects AS s2
        WHERE
            s1.subject = s2.subject
            AND s2.subject_id < s1.subject_id
    );
Or
SELECT
    s1.*
FROM
    subjects AS s1
LEFT JOIN
    subjects AS s2
    ON (s1.subject = s2.subject AND s2.subject_id < s1.subject_id)
WHERE
    s2.subject_id IS NULL;
Both give the same result, but I find the first one more explicit about what you're trying to achieve.

You can't create an index on a whole column of data type TEXT: MySQL requires a prefix length for an index on a TEXT or BLOB column, because the full value may be too long to index.
You can create an index, even a unique index, on a prefix of that column.
ALTER TABLE subjects
ADD CONSTRAINT constraint_subject UNIQUE KEY(subject(191));
This means two subjects cannot have exactly the same leading 191 characters.
I don't think you should declare the PRIMARY KEY including the subject. It's more typical to use the auto-increment integer column alone as the primary key.
So your table ends up with this definition:
CREATE TABLE `subjects` (
`subject_id` int(11) NOT NULL AUTO_INCREMENT,
`subject` text,
PRIMARY KEY (`subject_id`),
UNIQUE KEY `constraint_subject` (`subject`(191))
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
I chose the length 191 because it's the longest prefix that fits in the 767-byte limit that InnoDB places on an index key in its older row formats (REDUNDANT and COMPACT): utf8mb4 characters count as 4 bytes each, and 191 × 4 = 764.
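The arithmetic behind that number is easy to check. The following is just a worked calculation in Python; the 3072-byte figure is not from the answer above but is the limit InnoDB applies to the DYNAMIC and COMPRESSED row formats, included here for comparison.

```python
# Width of one utf8mb4 character as counted by MySQL for index sizing.
BYTES_PER_UTF8MB4_CHAR = 4


def max_prefix_chars(index_limit_bytes):
    """Longest utf8mb4 prefix (in characters) that fits in the byte limit."""
    return index_limit_bytes // BYTES_PER_UTF8MB4_CHAR


# 767 bytes: the limit quoted in the answer (REDUNDANT/COMPACT row formats).
older_formats = max_prefix_chars(767)
# 3072 bytes: the limit for DYNAMIC/COMPRESSED row formats (added for comparison).
newer_formats = max_prefix_chars(3072)
print(older_formats, newer_formats)
```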

Related

Update of primary key would cause duplicate entries in foreign table

I have two tables described by the following SQL Fiddle. My application needs to insert new records in tblA in between two already existing records. For example, if tblA has 6 records with AID ranging from 0 to 5 and I want to insert a new record with AID being 4, I increment the AID of tuple 4 and tuple 5 by one and then insert the new record. Thus, I use the following prepared statement to increment the value of the column AID of the tuples of both tblA and tblB (via cascading) by one:
update tblA set AID = (AID + 1) where AID >= ? order by AID desc;
On my test installation the above statement works fine. However, on our production system we get the following error message in some, but not all, cases:
Foreign key constraint for table 'tblA', record '4' would lead to a duplicate entry in table 'tblB'
Now, it is unclear to me what exactly causes the problem and how to solve the issue.
I appreciate any tips. Thanks in advance!

About tblB
This
create table if not exists tblB(
BID integer not null,
AID integer not null,
constraint fkB_A foreign key(AID) references tblA(AID),
primary key(AID, BID)
);
should probably be
create table if not exists tblB(
BID integer not null,
AID integer not null,
constraint fkB_A foreign key(AID) references tblA(AID)
on update cascade,
-- ^^^^^^^^^^^^^^^^
primary key(AID, BID)
);
Surrogate ID numbers in the relational model of data and in SQL databases are meaningless. Unless you know more than you've included in your question, AID and BID are meaningless. In a properly designed database, there's never a need to insert a row between two other rows based solely on their surrogate ID numbers.
If your real-world requirement is simply to insert a timestamp between "2015-12-01 23:07:00" and "2015-12-04 14:58:00", you don't need the ID number 4 to do that.
-- Use single quotes around timestamps.
insert into tblA values (-42, '2015-12-03 00:00:00');
select * from tblA order by RecordDate;
AID RecordDate
--
0 2015-11-07 16:55:00
1 2015-11-08 22:16:00
2 2015-11-10 14:26:00
3 2015-12-01 23:07:00
-42 2015-12-03 00:00:00
5 2015-12-04 14:58:00
6 2015-12-13 10:07:00
About tblA
This
create table if not exists tblA(
AID integer not null,
RecordDate varchar(25),
constraint pkA primary key(AID)
);
should probably be
create table if not exists tblA(
AID integer not null,
RecordDate varchar(25) not null,
-- ^^^^^^^^
constraint pkA primary key(AID)
);
Without that not null, you can insert data like this.
AID RecordDate
--
17 Null
18 Null
19 Null
Since surrogate ID numbers are meaningless, these rows are all essentially both identical and identically useless.
About the update statement
update tblA
set AID = (AID + 1)
where AID >= 4
order by AID desc;
Standard SQL doesn't permit order by in this position in an update statement. MySQL documents this as
If the ORDER BY clause is specified, the rows are updated in the order
that is specified.
The relational model and SQL are set-oriented. Updates are supposed to happen "all at once". IMHO, you'd be better off learning standard SQL and using a dbms that better supports standard SQL. (PostgreSQL springs to mind.) But adding on update cascade to tblB (above) will let your update statement succeed in MySQL.
update tblA
set AID = (AID + 1)
where AID >= 4 order by AID desc;
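To see the cascade at work, here is a rough sketch using Python's built-in sqlite3 module as a stand-in for MySQL. Several things are assumptions for illustration: the sample rows and date labels are made up, the helper name shift_up is hypothetical, and because a stock SQLite build has no ORDER BY on UPDATE, the descending order is imitated by updating one row at a time from the highest AID down.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only on request
conn.executescript("""
CREATE TABLE tblA (
  AID INTEGER NOT NULL PRIMARY KEY,
  RecordDate TEXT NOT NULL
);
CREATE TABLE tblB (
  BID INTEGER NOT NULL,
  AID INTEGER NOT NULL,
  PRIMARY KEY (AID, BID),
  FOREIGN KEY (AID) REFERENCES tblA (AID) ON UPDATE CASCADE
);
-- Made-up sample data: six parent rows, three child rows (BID, AID).
INSERT INTO tblA VALUES (0,'d0'),(1,'d1'),(2,'d2'),(3,'d3'),(4,'d4'),(5,'d5');
INSERT INTO tblB VALUES (1, 4), (1, 5), (2, 5);
""")


def shift_up(conn, start):
    """Increment every AID >= start, highest first, so no key collides."""
    aids = [row[0] for row in conn.execute(
        "SELECT AID FROM tblA WHERE AID >= ? ORDER BY AID DESC", (start,)
    )]
    for aid in aids:
        conn.execute("UPDATE tblA SET AID = AID + 1 WHERE AID = ?", (aid,))


shift_up(conn, 4)
conn.execute("INSERT INTO tblA VALUES (4, 'new')")

a_ids = [row[0] for row in conn.execute("SELECT AID FROM tblA ORDER BY AID")]
b_rows = conn.execute("SELECT AID, BID FROM tblB ORDER BY AID, BID").fetchall()
```

The child rows in tblB follow their parents to the new AID values without being touched directly, which is the behaviour the on update cascade clause is meant to provide.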

Adding on update cascade might solve your problem:
create table if not exists tblB(
BID integer not null,
AID integer not null,
constraint fkB_A foreign key(AID)
references tblA(AID)
on update cascade,
primary key(AID, BID));

MySql replace with multiple primary keys

I have a table which has three primary keys and references three other tables
Here is the table schema:
CREATE TABLE IF NOT EXISTS training_matrix_reference(
employee INT NOT NULL,
training_matrix INT NOT NULL,
training_record INT UNSIGNED NOT NULL,
PRIMARY KEY (employee, training_matrix,training_record),
FOREIGN KEY (employee) REFERENCES employees(id),
FOREIGN KEY (training_matrix) REFERENCES training_matrix_courses(id),
FOREIGN KEY (training_record) REFERENCES training_records(m_id)
)
I'm trying to craft a REPLACE statement which updates the training_record column or training_matrix column or both columns or creates a new row if not exists, but I also need to check that the employee belongs to the same company.
Here's what I tried so far:
REPLACE INTO `training_matrix_reference`
( employee, training_matrix, training_record ) (
SELECT id, '5', '100'
FROM employees
WHERE id =22
AND company =64
)
So my theory was that this should have replaced the first row in the table, updating training_record to 100 but in fact it actually created a new row:
22 | 5 | 100
My guess is that this happened because training_record is a primary key?
But I'm not sure that removing the primary keys/references is the right way to go as this table is used as a many to many table in other queries.
Effectively what I'm trying to do is:
REPLACE INTO `training_matrix_reference`
( employee, training_matrix, training_record )
VALUES
(22,33,18)
WHERE
employee = 22
and training_matrix = 5
and training_record = 2189
But obviously a replace statement doesn't have a where clause.
I did check out these similar questions:
MySQL REPLACE INTO on multiple keys?
mysql REPLACE query with multiple primary keys
But unfortunately MySql is not my strong suit and I could really use some help.
I hope I explained things clearly, Thanks

The PRIMARY KEY of the training_matrix_reference table is the combination of three columns. The table doesn't have multiple primary keys, it has a single PRIMARY KEY.
The REPLACE syntax you have is equivalent to performing:
DELETE FROM training_matrix_reference
WHERE employee = 22
AND training_matrix = 5
AND training_record = 100
;
INSERT INTO training_matrix_reference (employee, training_matrix, training_record)
VALUES (22, 5, 100);
The DELETE action only removes rows where the entire primary key is matched. Given the information you provided, we'd expect a row to be added to the table.
Did you have a question?
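The delete-plus-insert behaviour described above can be reproduced in a few lines. This sketch uses Python's built-in sqlite3 module, whose REPLACE INTO resolves key conflicts in a similar delete-then-insert way; it is a stand-in, not MySQL. The starting row (22, 5, 2189) is taken from the asker's last example and is assumed to be the existing row, the foreign keys are left out, and the helper names make_table and rows are hypothetical.

```python
import sqlite3


def make_table():
    """Fresh in-memory copy of the table holding one existing row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE training_matrix_reference ("
        " employee INTEGER NOT NULL,"
        " training_matrix INTEGER NOT NULL,"
        " training_record INTEGER NOT NULL,"
        " PRIMARY KEY (employee, training_matrix, training_record))"
    )
    conn.execute("INSERT INTO training_matrix_reference VALUES (22, 5, 2189)")
    return conn


def rows(conn):
    return conn.execute(
        "SELECT employee, training_matrix, training_record"
        " FROM training_matrix_reference"
        " ORDER BY employee, training_matrix, training_record"
    ).fetchall()


# REPLACE only removes a row whose whole key (22, 5, 100) already exists.
# None does, so the statement simply adds a second row.
replaced = make_table()
replaced.execute("REPLACE INTO training_matrix_reference VALUES (22, 5, 100)")
after_replace = rows(replaced)

# Changing a key column of an existing row is a job for UPDATE ... WHERE.
updated = make_table()
updated.execute(
    "UPDATE training_matrix_reference SET training_record = 100"
    " WHERE employee = 22 AND training_matrix = 5 AND training_record = 2189"
)
after_update = rows(updated)
```

If the goal is to change training_record on a known row, the UPDATE form is probably what was wanted; the company check could then go into the WHERE clause as a subquery on employees.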

You should make a joining table between (employee, training_matrix_reference),
or dispense with at least one relation.

add a value in table 1 from table 2 where multiple columns matches

I have a table 1 with 16 unique values considering columns A, B and C.
CREATE TABLE `monitor_periodo` (
`ano` varchar(255) DEFAULT 'NULL',
`mes` varchar(255) DEFAULT 'NULL',
`data` varchar(255) DEFAULT 'NULL',
`id_periodo` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id_periodo`),
UNIQUE KEY `UNIQUE` (`ano`,`mes`,`data`)
)
I have another table, Table 2, with millions of rows and the same column structure as Table 1 (except for id_periodo), so my 16 combinations from Table 1 repeat many times in Table 2. However, I do not have an id_periodo column in Table 2 to link it with Table 1.
I would like to add the id_periodo column to Table 2, filled according to the same "matches" as Table 1. Of course it is not going to be a unique index, since the numbers from 1 to 16 will repeat a lot, but my intention is to create a foreign key in Table 2 that references the primary key (and index) of Table 1.
Thank you in advance,
Gabriel

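Once table2 has an id_periodo column (the question says it does not yet, so add one first, for example with ALTER TABLE table2 ADD COLUMN id_periodo int(11);), you can fill it from monitor_periodo using the following statement: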
UPDATE
table2
LEFT JOIN
monitor_periodo
ON
monitor_periodo.ano = table2.ano
AND
monitor_periodo.mes = table2.mes
AND
monitor_periodo.data = table2.data
SET
table2.id_periodo = monitor_periodo.id_periodo
;
Then you can create the foreign key constraint with:
ALTER TABLE table2
ADD FOREIGN KEY (id_periodo) REFERENCES monitor_periodo(id_periodo)
;
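For anyone who wants to try the backfill on a toy data set first, here is a minimal sketch with Python's built-in sqlite3 module standing in for MySQL. The sample values are invented, and because MySQL's UPDATE with a JOIN is not portable, the sketch uses a correlated subquery instead, which should produce the same assignments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE monitor_periodo (
  id_periodo INTEGER PRIMARY KEY,
  ano TEXT, mes TEXT, data TEXT,
  UNIQUE (ano, mes, data)
);
CREATE TABLE table2 (ano TEXT, mes TEXT, data TEXT);
-- Invented sample rows: two periods, three facts that reuse them.
INSERT INTO monitor_periodo (id_periodo, ano, mes, data) VALUES
  (1, '2015', '01', 'A'),
  (2, '2015', '02', 'A');
INSERT INTO table2 (ano, mes, data) VALUES
  ('2015', '01', 'A'),
  ('2015', '02', 'A'),
  ('2015', '01', 'A');
-- table2 needs the column before it can be filled.
ALTER TABLE table2 ADD COLUMN id_periodo INTEGER;
""")

# Correlated subquery in place of MySQL's UPDATE ... JOIN.
conn.execute("""
UPDATE table2
SET id_periodo = (
  SELECT p.id_periodo
  FROM monitor_periodo AS p
  WHERE p.ano = table2.ano AND p.mes = table2.mes AND p.data = table2.data
)
""")

linked = conn.execute(
    "SELECT ano, mes, data, id_periodo FROM table2 ORDER BY rowid"
).fetchall()
```

Rows in table2 with no matching period end up with a NULL id_periodo, so it is worth counting those before adding the foreign key.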

Getting the affected rows when altering a table in mysql

I need to retrieve a report of the affected rows when a table has been altered with the following commands:
1.- Changing the engine:
ALTER TABLE <table> ENGINE=INNODB;
2.- Adding constraints:
ALTER TABLE nombre_tabla ADD PRIMARY KEY símbolo_clave_foránea;
ALTER TABLE nombre_tabla DROP PRIMARY KEY símbolo_clave_foránea;
ALTER TABLE nombre_tabla ADD FOREIGN KEY símbolo_clave_foránea;
ALTER TABLE nombre_tabla DROP FOREIGN KEY símbolo_clave_foránea;
3.- Adding a UNIQUE constraint.

A primary or unique key fails to be created when the column contains duplicates, so look for those; if you have NULLs in there, you'll need to sort them out first.
E.g. given MyTable(KeyField int not null), then
Select m.KeyField From MyTable m
inner join (Select KeyField, Count(*) as NumberOfTimes From MyTable Group By KeyField) Duplicates
On Duplicates.KeyField = m.KeyField
Where Duplicates.NumberOfTimes > 1
Then you'll have to come up with something to do with them: delete or rekey.
Foreign keys are just an outer join query with a where clause that checks for a null lookup key,
e.g. given MyTable (KeyField int not null, ForeignKeyField int not null) and
MyLookUpTable(LookUpKey int not null, Description VarChar(32) not null), then
Select m.KeyField From MyTable m
Left Join MyLookUpTable l On m.ForeignKeyField = l.LookUpKey
Where l.LookUpKey Is Null
Again you'll have to decide what to do with them. You could delete them, but this might help.
One way is to insert a "Missing" record in the lookup table, grab its key, then do an update with a join. So given that the key is 999:
Update MyTable m
Left Join MyLookUpTable l On m.ForeignKeyField = l.LookUpKey
Set m.ForeignKeyField = 999
Where l.LookUpKey Is Null
Now you can dig out the 999s and deal with them at your leisure.
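Both checks can be run before the ALTER TABLE to get a report of the rows that would block it. A small sketch follows, using Python's built-in sqlite3 module as a stand-in for MySQL and the example table names from this answer; the sample rows are made up, with one duplicated KeyField and one ForeignKeyField that has no lookup row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MyLookUpTable (LookUpKey INTEGER NOT NULL, Description TEXT NOT NULL);
CREATE TABLE MyTable (KeyField INTEGER NOT NULL, ForeignKeyField INTEGER NOT NULL);
INSERT INTO MyLookUpTable VALUES (1, 'one'), (2, 'two');
-- KeyField 11 appears twice; ForeignKeyField 7 has no lookup row.
INSERT INTO MyTable VALUES (10, 1), (11, 2), (11, 2), (12, 7);
""")

# Values that would break a PRIMARY KEY or UNIQUE constraint on KeyField.
duplicate_keys = [row[0] for row in conn.execute(
    "SELECT KeyField FROM MyTable GROUP BY KeyField HAVING COUNT(*) > 1"
)]

# Rows that would break a FOREIGN KEY from ForeignKeyField to LookUpKey.
orphans = [row[0] for row in conn.execute(
    "SELECT m.KeyField FROM MyTable AS m"
    " LEFT JOIN MyLookUpTable AS l ON m.ForeignKeyField = l.LookUpKey"
    " WHERE l.LookUpKey IS NULL"
)]
```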

MySql "INSERT … ON DUPLICATE KEY UPDATE" still inserting duplicate records. What am I missing?

I have a simple table set up with two columns, each of which is a key value. The values stored in each field are varchar(45), representing an email address and a keyword. The information collected may duplicate itself, as it comes from site browsing data collection. To avoid duplicate entries, I tried INSERT IGNORE INTO, REPLACE INTO, and finally the following:
insert into <table name> (user_email, key_token) values ('<email>@<this>.com', 'discountsupplies') on duplicate key update user_email='<email>@<this>.com',key_token='discountsupplies';
but I am still seeing duplicate records being inserted into the table.
The SQL that generated the table:
DROP TABLE IF EXISTS `<database name>`.`<table name>` ;
CREATE TABLE IF NOT EXISTS `<database name>`.`<table name>` (
`user_email` VARCHAR(45) NOT NULL ,
`key_token` VARCHAR(45) NOT NULL,
PRIMARY KEY (`user_email`, `key_token`) )
ENGINE = InnoDB;
While I saw several questions that were close to this one, I did not see any that addressed why this might be happening, and I'd like to figure out what I'm not understanding about this behavior. Any help is appreciated.
As an addendum, After adding the UNIQUE KEY statements, I went back and tried both REPLACE and INSERT IGNORE to achieve my goal, and none of these options is excluding duplicate entries.
Also adding: UNIQUE INDEX (user_email, key_token)
doesn't seem to help either.
I'm going to do this check via a manual look-up routine until I can figure this out. If I find an answer I'll be happy to update the post.
Added Unique Index lines below the original create table statement -
-- -----------------------------------------------------
-- Table `<db name>`.`<table name>`
-- -----------------------------------------------------
DROP TABLE IF EXISTS `<db name>`.`<table name>` ;
CREATE TABLE IF NOT EXISTS `<db name>`.`<table name>` (
`user_email` VARCHAR(45) NOT NULL ,
`key_token` VARCHAR(45) NOT NULL,
PRIMARY KEY (`user_email`, `key_token`),
UNIQUE KEY (user_email),
UNIQUE KEY (key_token)
)
ENGINE = InnoDB;
CREATE UNIQUE INDEX ix_<table name>_useremail on `<db name>`.`<table name>`(user_email);
CREATE UNIQUE INDEX ix_<table name>_keytoken on `<db name>`.`<table name>`(key_token);
it seems to be ok (no errors when creating tables during the source step), but I'm still getting duplicates when running the on duplicate query.

You have a composite primary key on both columns.
This means that it's the combination of the fields that is UNIQUE, not each field on its own.
These data are all possible in the table:
1@example.com 1
2@example.com 1
2@example.com 2
since no combination of (user_email, key_token) repeats in the table, while user_email and key_token on their own can repeat.
If you want each separate column to be UNIQUE, define the UNIQUE constraints on the fields:
CREATE TABLE IF NOT EXISTS `<database name>`.`<table name>` (
`user_email` VARCHAR(45) NOT NULL ,
`key_token` VARCHAR(45) NOT NULL,
PRIMARY KEY (`user_email`, `key_token`),
UNIQUE KEY (user_email),
UNIQUE KEY (key_token)
)
ENGINE = InnoDB;
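The difference between pair-level and column-level uniqueness is quick to confirm. This sketch uses Python's built-in sqlite3 module as a stand-in for MySQL; the table name email_tokens is hypothetical, and only the composite primary key from the question is declared, without the extra UNIQUE keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE email_tokens ("
    " user_email TEXT NOT NULL,"
    " key_token TEXT NOT NULL,"
    " PRIMARY KEY (user_email, key_token))"
)

attempts = [
    ("1@example.com", "1"),
    ("2@example.com", "1"),   # key_token repeats: allowed
    ("2@example.com", "2"),   # user_email repeats: allowed
    ("2@example.com", "2"),   # whole pair repeats: rejected
]

accepted = []
for pair in attempts:
    try:
        conn.execute("INSERT INTO email_tokens VALUES (?, ?)", pair)
        accepted.append(pair)
    except sqlite3.IntegrityError:
        pass  # the composite key refused an exact duplicate pair

row_count = conn.execute("SELECT COUNT(*) FROM email_tokens").fetchone()[0]
```

So if the "duplicates" in the real table are rows that share only an email or only a token, the composite key is behaving as declared.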
Update
Having duplicates in a column marked as UNIQUE would be a level 1 bug in MySQL.
Could you please run the following queries:
SELECT user_email
FROM mytable
GROUP BY
user_email
HAVING COUNT(*) > 1;
SELECT key_token
FROM mytable
GROUP BY
key_token
HAVING COUNT(*) > 1;
and see if they return anything?

PRIMARY KEY (user_email, key_token) means the combination of both will be unique, but if you also want individual emails and key_tokens to be unique you have to use UNIQUE separately for each column:
PRIMARY KEY (`user_email`, `key_token`),
UNIQUE KEY (user_email),
UNIQUE KEY (key_token)

final solution for now: query table to get list of key_tokens by user_email, test current key_token against list entries, if found don't insert.
Not optimal or pretty, but it works....

To me it looks like you selected composite Primary Key solely for performance reasons where it should be an index like so
CREATE TABLE IF NOT EXISTS `<database name>`.`<table name>` (
`user_email` VARCHAR(45) NOT NULL ,
`key_token` VARCHAR(45) NOT NULL,
PRIMARY KEY (`user_email`),
INDEX (`user_email`, `key_token`)
)
Of course if you are concerned about getting a duplicate key_token you'll still need a unique index.
Sorry I'm awfully late to reply, but perhaps someone will stumble on this like I have :)

We Keep Coding
