MySQL: ALTER IGNORE TABLE ADD UNIQUE, what will be truncated?

I have a table with 4 columns: ID, type, owner, description. ID is AUTO_INCREMENT PRIMARY KEY and now I want to:
ALTER IGNORE TABLE `my_table`
ADD UNIQUE (`type`, `owner`);
Of course I have a few records with type = 'Apple' and owner = 'Apple CO'. So my question is: which record will be the one that stays after that ALTER TABLE, the one with the smallest ID, or the one with the biggest ID as the latest inserted?

The first record will be kept, the rest deleted:
IGNORE is a MySQL extension to standard SQL. It controls how ALTER TABLE works if there are duplicates on unique keys in the new table or if warnings occur when strict mode is enabled. If IGNORE is not specified, the copy is aborted and rolled back if duplicate-key errors occur. If IGNORE is specified, only the first row is used of rows with duplicates on a unique key. The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value.
I am guessing 'first' here means the one with the smallest ID, assuming the ID is the primary key.
Also note:
As of MySQL 5.7.4, the IGNORE clause for ALTER TABLE is removed and its use produces an error.

It appears that your problem is one of the very reasons that ALTER IGNORE has been deprecated.
This is from the MySQL notes on the ALTER IGNORE deprecation:
"This feature is badly defined (what is the first row?), causes problems
for replication, disables online alter for unique index creation and has
caused problems with foreign keys (rows removed in parent table)."
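On servers where ALTER IGNORE is no longer available, one way to get a well-defined result is to delete the duplicates explicitly and then add the index. The following is only a minimal sketch: it uses the table and column names from the question and assumes you want to keep the row with the smallest ID in each (type, owner) group. Rows where type or owner is NULL are not matched by the join and are left alone.

```sql
-- Remove every row for which another row with the same (type, owner)
-- and a smaller ID exists, so the smallest ID of each group survives.
DELETE t1
FROM my_table AS t1
JOIN my_table AS t2
  ON  t1.`type`  = t2.`type`
  AND t1.`owner` = t2.`owner`
  AND t1.ID > t2.ID;

-- With the duplicates gone, the plain ALTER TABLE succeeds.
ALTER TABLE my_table
  ADD UNIQUE (`type`, `owner`);
```

Changing the comparison to `t1.ID < t2.ID` would keep the latest inserted row instead, which makes the choice explicit rather than leaving it to the server.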

Related

MySQL: Insert multiple values if they don't exist, but need a multiple column check

I have a simple query like so:
INSERT INTO myTable (col1, col2) VALUES
(1,2),
(1,3),
(2,2)
I need to check that no duplicate values are added, BUT the check needs to happen across both columns: if a row with the same values in col1 AND col2 already exists, I don't want to insert. If the value matches in only one of those columns but not both, then the insert should go through.
In other words let's say we have the following table:
+------+------+
| col1 | col2 |
+------+------+
|    1 |    2 |
|    1 |    3 |
|    2 |    2 |
+------+------+
Inserting values like (2,3) and (1,1) would be allowed, but (1,3) would not be allowed.
Is it possible to do a WHERE NOT EXISTS check a single time? I may need to insert 1000 values at one time and I'm not sure whether doing a WHERE check on every single insert row would be efficient.
EDIT:
To add to the question - if there's a duplicate value across both columns, I'd like the query to ignore this specific row and continue onto inserting other values rather than throwing an error.
What you might want to use is either a primary key or a unique index across those columns. Afterwards, you can use either replace into or just insert ignore:
create table myTable
(
a int,
b int,
primary key (a,b)
);
-- Variant 1
replace into myTable(a,b) values (1, 2);
-- Variant 2
insert ignore into myTable(a,b) values (1,2);
See Insert Ignore and Replace Into
Using the latter variant has the advantage that you don't change any record if it already exists (thus no need to rebuild any index) and would best match your needs regarding your question.
If, however, there are other columns that need to be updated when inserting a record violating a unique constraint, you can either use replace into or insert into ... on duplicate key update.
Replace into will perform a real deletion prior to inserting a new record, whereas insert into ... on duplicate key update will perform an update instead. One might think the result is the same and wonder why there is a statement for each operation; the answer can be found in the side effects:
Replace into will delete the old record before inserting the new one. This causes the index to be updated twice, delete and insert triggers get executed (if defined) and, most important, if you have a foreign key constraint (with on delete restrict or on delete cascade) defined, your constraint will behave exactly the same way as if you deleted the record manually and inserted the new version later on. This means: Either your operation fails because the restriction is in place or the delete operation gets cascaded to the target table (i.e. deleting related records there, although you just changed some column data).
On the other hand, when using on duplicate key update, update triggers will get fired, the indexes on changed columns will be rewritten once and, if a foreign key is defined on update cascade for one of the columns being changed, this operation is performed as well.
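The difference between the two statements might be sketched as follows. The table myTable2 and its extra column note are made up for illustration; they are a variant of the example table above, not something from the question.

```sql
create table myTable2
(
  a int,
  b int,
  note varchar(50),   -- hypothetical extra column
  primary key (a, b)
);

insert into myTable2 (a, b, note) values (1, 2, 'first');

-- Deletes the existing (1,2) row, then inserts a new one;
-- delete triggers and ON DELETE foreign key actions apply.
replace into myTable2 (a, b, note) values (1, 2, 'second');

-- Updates the existing (1,2) row in place; update triggers apply.
insert into myTable2 (a, b, note) values (1, 2, 'third')
on duplicate key update note = values(note);
```

Newer MySQL 8.0 releases deprecate values() in this position in favor of a row alias, so check the manual for your server version.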
To answer your question in the comments, as stated in the manual:
If you use the IGNORE modifier, errors that occur while executing the INSERT statement are ignored. For example, without IGNORE, a row that duplicates an existing UNIQUE index or PRIMARY KEY value in the table causes a duplicate-key error and the statement is aborted. With IGNORE, the row is discarded and no error occurs. Ignored errors may generate warnings instead, although duplicate-key errors do not.
So, with IGNORE, violations are treated as warnings rather than errors, and the insert completes. Without it, the statement aborts at the first violation, and the insert may be applied only partially (except when using transactions). According to the manual text quoted above, duplicate-key violations do not even produce such a warning. Either way, records violating a constraint won't get inserted at all, but IGNORE ensures all valid records get inserted (given that there is no system failure or out-of-memory condition).
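Applied to the original question, the whole batch can go in one statement. This is a sketch, assuming myTable already holds (1,2), (1,3) and (2,2) and has no unique key on the pair yet; the index name uq_col1_col2 is made up.

```sql
-- One-time setup: make the (col1, col2) pair unique.
ALTER TABLE myTable
  ADD UNIQUE KEY uq_col1_col2 (col1, col2);

-- (2,3) and (1,1) are new pairs and get inserted;
-- (1,3) already exists, so that row is skipped
-- without aborting the statement.
INSERT IGNORE INTO myTable (col1, col2) VALUES
  (2, 3),
  (1, 1),
  (1, 3);
```

Note that a unique index never treats rows with NULL in either column as duplicates, so this check only works reliably if both columns are NOT NULL.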

Why mysql autoincrement increments the last used id rather then the last existing id

I am using mysql, and am looking at a strange behavior.
Scenario :
I have a table having table_id as primary key, which is set to auto-increment.
table_id  more_columns
1         some value
2         others
Now if I delete row 2 and insert one more row, the table_id becomes 3 (expected is 2):
table_id  more_columns
1         some value
3         recent
Why is that so? Here I am losing some ids (I know they are not important). Please shed some light on this behavior.
If a row is deleted, the auto_increment value of that row will not be reassigned.
Please see here for more information.
For reasons why auto-increment doesn't reuse deleted values, you can refer here (mentioned in the comments by @AaronBlenkush).
The auto_increment value is a counter stored internally for each table. The counter is only increased, never decreased.
An INSERT against the table can advance this counter even when it is rolled back, and supplying an explicit value for the primary key that is higher than the counter moves the counter past that value.
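The scenario from the question can be reproduced with a throwaway table (the name demo is made up for this sketch):

```sql
CREATE TABLE demo (
  table_id     INT NOT NULL AUTO_INCREMENT,
  more_columns VARCHAR(20),
  PRIMARY KEY (table_id)
);

-- The counter hands out 1 and 2.
INSERT INTO demo (more_columns) VALUES ('some value'), ('others');

-- Deleting the row does not move the counter back.
DELETE FROM demo WHERE table_id = 2;

-- The next insert gets 3, not the freed 2.
INSERT INTO demo (more_columns) VALUES ('recent');

SELECT table_id, more_columns FROM demo ORDER BY table_id;
```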
A MySQL auto_increment column maintains a number internally, and will always increment it, even after deletions. If you need to fill in an empty space, you have to handle it yourself in PHP, rather than use the auto_increment keyword in the table definition.
Rolling back to fill in empty row ids can cause all sorts of difficulty if you have foreign key relationships to maintain, and it really isn't advised.
The auto_increment counter can be reset using a SQL statement, but this is not advised because it can cause duplicate key errors.
-- Doing this will cause problems!
ALTER TABLE tablename AUTO_INCREMENT = 12345;
EDIT
To enforce your foreign key relationships as described in the comments, you should add to your table definition:
FOREIGN KEY (friendid) REFERENCES registration_table (id) ON DELETE SET NULL;
Fill in the correct table and column names. Now, when a user is deleted from the registration, their friend association is nulled. If you need to reassociate with a different user, that has to be handled with PHP. mysql_insert_id() is no longer helpful.
If you need to find the highest numbered id still in the database after deletion to associate with friends, use the following.
SELECT MAX(id) FROM registration_table;
After the delete, run this query:
ALTER TABLE tablename AUTO_INCREMENT = 1

avoiding concurrent insert multiple same id in mysql

I am facing a problem in my application. I have a table with a field named registration_no. Before inserting a new record, I increment registration_no by 1 and then insert that incremented value into the table. The problem is that when several users insert data concurrently, some rows end up with the same registration_no. How can I prevent this?
You want to use a sequence.
Two caveats:
The AUTO_INCREMENT feature described in the article is non-standard and may give portability issues when moving to a different database.
If an INSERT is aborted, a number from the sequence is consumed still, so you may end up with holes in the sequence. If that is unacceptable, use an autogenerated sequence for the primary (surrogate) key, and add a separate map from that key to the "official" sequence number, enforcing uniqueness in the index of that table.
The alternative is to enforce UNIQUEness in the database, use an appropriate TRANSACTION ISOLATION LEVEL and add application logic to handle failure to INSERT.
You could have the database set the registration_no for you, and not do this in code. You can get the registration_no in the result of an insert statement and this will solve your concurrency problem.
alter table myTable modify column registration_no int auto_increment
The result of your query will be the index of the record. If registration_no is not the index, you will need to query the auto-generated registration_no based on the returned index id.
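A minimal sketch of letting the server assign the number and reading it back follows. The table registrations and the column applicant are hypothetical names, not taken from the question.

```sql
-- Hypothetical table; registration_no is generated by the server.
CREATE TABLE registrations (
  registration_no INT NOT NULL AUTO_INCREMENT,
  applicant       VARCHAR(50) NOT NULL,
  PRIMARY KEY (registration_no)
);

-- No registration_no is supplied; the server picks the next value.
INSERT INTO registrations (applicant) VALUES ('alice');

-- Returns the value generated by this connection's last INSERT,
-- unaffected by inserts made concurrently on other connections.
SELECT LAST_INSERT_ID();
```

Because the value is tracked per connection, two users inserting at the same time each read back their own number.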

How to get rid of duplicate results in a table

I have a table that has some duplicate results. For example:
`person_url` `movie_url`
1 2
1 2
2 3
Would become -->
`person_url` `movie_url`
1 2
2 3
I know how to do it by creating a new table,
create table tmp_credits (select distinct * from name);
However, it is a pretty large table and I have a couple indexes on it which will need to be re-created. How would I do this transformation in place, that is, without creating a new table?
On MySQL versions before 5.7.4 (where ALTER IGNORE is still available), you can add a UNIQUE index over your table's columns using the IGNORE keyword:
ALTER IGNORE TABLE name ADD UNIQUE INDEX (person_url, movie_url);
As stated in the manual:
IGNORE is a MySQL extension to standard SQL. It controls how ALTER TABLE works if there are duplicates on unique keys in the new table or if warnings occur when strict mode is enabled. If IGNORE is not specified, the copy is aborted and rolled back if duplicate-key errors occur. If IGNORE is specified, only the first row is used of rows with duplicates on a unique key. The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value.
This will also prevent duplicates from being added in the future.
create table temp
(col1 varchar(20), col2 varchar(20));
INSERT INTO temp VALUES
('1','one'),('2','two'),('2','two');
select col1, col2 from temp
union
select col1, col2 from temp;
Have you considered just putting a semantic layer/view on top of the table that de-dups?
select person_url, movie_url
from name
group by person_url, movie_url
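Wrapped in a view, that query might look like the sketch below; the view name name_dedup is made up.

```sql
-- The view hides duplicates at read time; the rows in `name` stay as they are.
CREATE VIEW name_dedup AS
SELECT person_url, movie_url
FROM name
GROUP BY person_url, movie_url;

-- Readers query the view instead of the base table.
SELECT person_url, movie_url FROM name_dedup;
```

This leaves the table and its indexes untouched, at the price of doing the grouping on every read.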

Which DB design is faster: a unique index and INSERT IGNORE, or using SELECT to find existing records?

I have a table with just one column: userid.
When a user accesses a certain page, his userid is being inserted to the table. Userids are unique, so there shouldn't be two of the same userids in that table.
I'm considering two designs:
Making the column unique and using INSERT commands every time a user accesses that page.
Checking if the user is already recorded in the table by SELECTing from the table, then INSERTing if no record is found.
Which one is faster?
Definitely create a UNIQUE index, or, better, make this column a PRIMARY KEY.
You need an index to make your checks fast anyway.
Why not make this index UNIQUE, so that you have a fallback (in case you forget to check with SELECT for some reason)?
If your table is InnoDB, it will have a PRIMARY KEY anyway, since all InnoDB tables are index-organized by design.
In case you didn't declare a PRIMARY KEY in your table, InnoDB will create a hidden column to be the primary key, thus making your table twice as large, and you will not have an index on your column.
Creating a PRIMARY KEY on your column is a win-win.
You can issue
INSERT IGNORE INTO mytable
VALUES (userid)
and check how many records were affected.
If 0, there was a key violation, but no exception.
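As a sketch of that check, assuming userid is an integer and using 4 as an arbitrary example value:

```sql
CREATE TABLE mytable (
  userid INT NOT NULL,
  PRIMARY KEY (userid)
);

INSERT IGNORE INTO mytable (userid) VALUES (4);
SELECT ROW_COUNT();   -- 1: the row was inserted

INSERT IGNORE INTO mytable (userid) VALUES (4);
SELECT ROW_COUNT();   -- 0: duplicate key, the row was skipped
```

Most client libraries expose the same number as the affected-rows count of the statement, so the extra SELECT is usually not needed in application code.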
How about using REPLACE?
If a user already exists it's being replaced, if it doesn't a new row is inserted.
What about doing an update, e.g.
UPDATE xxx SET x=x+1 WHERE userid=y
and if that fails (e.g. no matched rows), then do an insert for a new user?
SELECT is faster... but you'd prefer the SELECT check not because of this, but to avoid raising an error.
Or:
INSERT INTO xxx (`userid`) VALUES (4) ON DUPLICATE KEY UPDATE userid=VALUES(`userid`)
You should make it unique in any case.
Whether to check first using SELECT depends on which scenario is most common. If you have new users all the time and only occasionally existing users, it might be faster overall for the system to just insert and catch the exception on the rare occasions this happens. But an exception is slower than checking first and then inserting, so if an existing user is the common scenario, you should always check first with SELECT.