How to merge rows with duplicate unique key on update in MySQL - mysql

I have following table structure:
+------------------+        +--------------------------+
| Users            |        | Data                     |
+------------------+        +--------------------------+
| id  | uname_UK   |        | id | user_id_FK | data   |
+-----+------------+        +----+------------+--------+
| 1   | foobar     |        | 1  | 1          | aa     |
| 2   | bazqui     |<-------+ 2  | 3          | bb     |
| 3   | foobaz     |        | 3  | 2          | cc     |
+------------------+        | 4  | 2          | dd     |
                            +--------------------------+
The problem is that a typo crept in when the data was stored: the user named foobaz should be named foobar. The uname column has a unique constraint.
My question is how to fix this easily. When I update the users table, I get a duplicate unique key error, as expected. In the end I would like the foreign keys to be updated too.
My idea was to do some trigger magic, but I was hoping for a more elegant solution. Another constraint is that the update is initiated through the frontend, so I cannot use PHP.
An alternative would be to drop the unique constraint and run a cron job that periodically updates the database and removes the duplicate entries.
Thanks.

Why not just delete the record? Update all data to point to the user you want to keep, then delete the obsolete user.
In Oracle you can do this using the MERGE INTO statement. I don't know whether that is possible in one statement in MySQL, but you might as well execute a separate delete for it. You could do it with trigger magic, but I doubt it's a good decision to always automagically merge the users. The new username might be a typo too.
So in a normal application, if this happened often, I would build a 'merge users' feature that lets you do just this.

What you should do is figure out what it means for your data that two users are actually one. In this case, since there are two records in Data for user ID 2, it seems to be okay for users to have several records in Data, so you can just run:
UPDATE data
SET user_id_FK = 1
WHERE user_id_FK = 3;
DELETE FROM users
WHERE id = 3;
In general, you need to figure this out at an application level.
What if there's a foo counter for each user? You should probably add the value from the user you'll be deleting to the value you're keeping.
What if a user has an address?
What if a user can only have one e-mail address and your duplicate user has a different one? Which do you keep?
This is not an easy question with a general answer.
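For the simple case in the question, the two statements above can be wrapped in a transaction so the merge either happens completely or not at all. This is only a sketch: it assumes the tables use a transactional engine such as InnoDB, and the ids 1 and 3 are the ones from the example data.

```sql
-- Sketch: merge duplicate user 3 into user 1 atomically (assumes InnoDB).
START TRANSACTION;

-- Re-point the child rows first so no row in data references the duplicate.
UPDATE data
SET user_id_FK = 1
WHERE user_id_FK = 3;

-- The duplicate user now has no dependent rows and can be removed.
DELETE FROM users
WHERE id = 3;

COMMIT;
```

If either statement fails, issuing ROLLBACK instead of COMMIT leaves both tables as they were.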

Related

AUTO_INCREMENT for two tables

I have two tables in my database for users:
users
id|username|password|registration_date|
1 |bruce |****** |2017-03-04 |
2 |jason |***** |2017-03-06 |
3 |brad |******* |2017-03-12 |
google_users
id|username|password|registration_date|
1 |jimmy |***** |2017-03-05 |
2 |wade |******* |2017-03-08 |
I want to apply the same AUTO_INCREMENT index for both tables when a new user signs up with google.
Something like this:
users
id|username|password|registration_date|
1 |bruce |****** |2017-03-04 |
3 |jason |***** |2017-03-06 |
5 |brad |******* |2017-03-12 |
google_users
id|username|password|registration_date|
2 |jimmy |***** |2017-03-05 |
4 |wade |******* |2017-03-08 |
How can I do this?
I'm going to vote against this table design and recommend that you just maintain a single users table:
users (id, username, password, registration_date)
To keep track of the method by which they signed up, you may create a second table:
accounts (id, user_id, type_id)
The type_id can point to yet a third table, indicating whether Google or something else was the source of the signup. Note also that the accounts table can hold more than one signup relationship per user, if you need that.
The basic idea is that maintaining an auto increment column across two tables will either be impossible, or at the very least ugly. This is not a feature which is usually supported/needed in SQL. So if you find yourself having this need, you should first look closely at your database design.
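The three-table layout described above might look roughly like this. The column types, the table name account_types, and the 'local' type are assumptions made for illustration; only the column lists for users and accounts come from the answer.

```sql
-- One table for every user, however they signed up.
CREATE TABLE users (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    username VARCHAR(64) NOT NULL,
    password VARCHAR(255) NOT NULL,
    registration_date DATE NOT NULL
);

-- Lookup table for signup sources (hypothetical name and contents).
CREATE TABLE account_types (
    id TINYINT UNSIGNED NOT NULL PRIMARY KEY,
    name VARCHAR(32) NOT NULL UNIQUE
);

-- One row per (user, signup method) relationship.
CREATE TABLE accounts (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    user_id INT UNSIGNED NOT NULL,
    type_id TINYINT UNSIGNED NOT NULL,
    FOREIGN KEY (user_id) REFERENCES users (id),
    FOREIGN KEY (type_id) REFERENCES account_types (id)
);

INSERT INTO account_types (id, name) VALUES (1, 'local'), (2, 'google');
```

With this layout there is a single AUTO_INCREMENT counter on users.id, so the question of sharing a counter between two tables does not arise.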
Not suggested, but if you really want it to happen this way:
You can try to implement this setting in MySQL:
mysql> SHOW VARIABLES LIKE 'auto_inc%';
+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| auto_increment_increment | 2 |
and then for your tables, you can do:
ALTER TABLE users AUTO_INCREMENT = 1;
ALTER TABLE google_users AUTO_INCREMENT = 2;
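Now every auto-increment value advances by 2, so one table can take the odd ids and the other the even ids. Be aware that MySQL derives each value from auto_increment_offset as well as auto_increment_increment, so the two tables also need different offsets for this to work.
But as I said, this will impact your whole DB: every auto-increment column will step by 2 instead of 1.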
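A minimal sketch of one way to apply these settings per connection rather than globally, using the session-scoped auto_increment_increment and auto_increment_offset variables. It assumes the connection is allowed to change them (recent MySQL versions may require an extra privilege for that), and the inserted rows are just the sample data from the question.

```sql
-- Sign-up through the regular form: ids come from the series 1, 3, 5, ...
SET @@SESSION.auto_increment_increment = 2;
SET @@SESSION.auto_increment_offset = 1;
INSERT INTO users (username, password, registration_date)
VALUES ('bruce', '******', '2017-03-04');

-- Sign-up through Google: ids come from the series 2, 4, 6, ...
SET @@SESSION.auto_increment_offset = 2;
INSERT INTO google_users (username, password, registration_date)
VALUES ('jimmy', '*****', '2017-03-05');
```

This keeps the two id ranges disjoint, but it is not a truly shared counter: each table still advances on its own, so the ids will not interleave in signup order as in the desired output.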

How to update a column with specific data for each row? [duplicate]

I'm trying to update one MySQL table based on information from another.
My original table looks like:
id | value
------------
1 | hello
2 | fortune
3 | my
4 | old
5 | friend
And the tobeupdated table looks like:
uniqueid | id | value
---------------------
1 | | something
2 | | anything
3 | | old
4 | | friend
5 | | fortune
I want to update id in tobeupdated with the id from original based on value (strings stored in VARCHAR(32) field).
The updated table will hopefully look like:
uniqueid | id | value
---------------------
1 | | something
2 | | anything
3 | 4 | old
4 | 5 | friend
5 | 2 | fortune
I have a query that works, but it's very slow:
UPDATE tobeupdated, original
SET tobeupdated.id = original.id
WHERE tobeupdated.value = original.value
This maxes out my CPU and eventually leads to a timeout with only a fraction of the updates performed (there are several thousand values to match). I know matching by value will be slow, but this is the only data I have to match them together.
Is there a better way to update values like this? I could create a third table for the merged results, if that would be faster?
I tried the answers to "MySQL - How can I update a table with values from another table?", but they didn't really help. Any ideas?
UPDATE tobeupdated
INNER JOIN original ON (tobeupdated.value = original.value)
SET tobeupdated.id = original.id
That should do it, and really it's doing exactly what yours does. However, I prefer JOIN syntax for joins rather than multiple WHERE conditions; I think it's easier to read.
As for running slow, how large are the tables? You should have indexes on tobeupdated.value and original.value
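Adding those indexes could look like this; the index names are made up for the example.

```sql
-- Lets the join look up matching rows by value instead of scanning every row.
CREATE INDEX idx_original_value ON original (value);
CREATE INDEX idx_tobeupdated_value ON tobeupdated (value);
```

Without an index on at least one side, the join has to compare every row of one table against every row of the other, which would explain the CPU load and the timeout.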
EDIT:
We can also simplify the query:
UPDATE tobeupdated
INNER JOIN original USING (value)
SET tobeupdated.id = original.id
USING is shorthand for an equi-join on a column that has the same name in both tables, such as value here. See http://en.wikipedia.org/wiki/Join_(SQL)#Equi-join
It depends on how those tables are used, but you might consider putting a trigger on the original table for insert and update. When a row is inserted or updated, the trigger updates the second table based on just that one row from the original table, which will be quicker.
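A rough sketch of such triggers, assuming the table and column names from the question; the trigger names are invented. Each trigger body is a single statement, so no BEGIN ... END block or DELIMITER change is needed.

```sql
-- Propagate the id of a newly inserted row to matching rows in tobeupdated.
CREATE TRIGGER original_after_insert
AFTER INSERT ON original
FOR EACH ROW
    UPDATE tobeupdated
    SET id = NEW.id
    WHERE value = NEW.value;

-- Do the same when an existing row in original changes.
CREATE TRIGGER original_after_update
AFTER UPDATE ON original
FOR EACH ROW
    UPDATE tobeupdated
    SET id = NEW.id
    WHERE value = NEW.value;
```

The triggers only handle future changes. Rows that already exist still need the one-off UPDATE ... JOIN shown earlier.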

Keep certain rows as constant in a MySQL table

I have a situation where I have a table, for example:
| id | type     |
-----------------
| 0  | Complete |
| 1  | Zone     |
Now, I always want my database to be populated with these values, but additionally users should be able to CRUD their own custom types beyond these. For example, a user might decide they want a "Partial Zone" type:
| id | type         |
---------------------
| 0  | Complete     |
| 1  | Zone         |
| 2  | Partial Zone |
This is all fine. But I don't want anyone to be able to delete/modify the first and second rows.
This seems like it should be so simple, but is there a common strategy for handling this case that ensures that these rows go unaffected? Should I put a lock column on the table and only lock these two values when I initially populate the database on application setup? Is there something much more obvious and elegant that I am missing?
Unless I'm missing something, you should be able to just add a third column to your table for the user ID/owner of the record. For the Complete and Zone records, the owner could be e.g. user 0, which would correspond to an admin. In your deletion logic, just check the ID column and do not allow admin records to be deleted by anyone from the application.
If this won't work, you could also consider having two tables, one for system records which cannot be deleted, and another for user-created records. You would probably have to take a union of the two tables whenever you query.
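The owner-column approach could be sketched like this. The table name types, the column name owner_id, the convention that owner 0 means a system row, and user id 42 are all assumptions for the example.

```sql
CREATE TABLE types (
    id INT UNSIGNED NOT NULL PRIMARY KEY,
    type VARCHAR(64) NOT NULL,
    owner_id INT UNSIGNED NOT NULL DEFAULT 0  -- 0 = system/admin row (assumed convention)
);

INSERT INTO types (id, type, owner_id) VALUES
    (0, 'Complete', 0),
    (1, 'Zone', 0),
    (2, 'Partial Zone', 42);  -- 42 is a made-up user id

-- Application-side delete: the owner_id filter means rows owned by
-- user 0 can never match, whatever id the user passes in.
DELETE FROM types
WHERE id = 2
  AND owner_id = 42;
```

The protection here lives in the application's queries, as the answer suggests. It does not stop someone with direct database access from deleting the system rows.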

Keeping page changes history. A bit like SO does for revisions

I have a CMS system that stores data across tables like this:
Entries Table
+----+-------+------+--------+--------+
| id | title | text | index1 | index2 |
+----+-------+------+--------+--------+
Entries META Table
+----+----------+-------+-------+
| id | entry_id | value | param |
+----+----------+-------+-------+
Files Table
+----+----------+----------+
| id | entry_id | filename |
+----+----------+----------+
Entries-to-Tags Table
+----+----------+--------+
| id | entry_id | tag_id |
+----+----------+--------+
Tags Table
+----+-----+
| id | tag |
+----+-----+
I am trying to implement a revision system, a bit like SO has. If I were doing it just for the Entries table, I would simply keep a copy of all changes to that table in a separate table. As I have to do it for at least 4 tables (the Tags table doesn't need revisions), this doesn't seem like an elegant solution at all.
How would you guys do it?
Please notice that the Meta Tables are modeled in EAV (entity-attribute-value).
Thank you in advance.
Hi, I am currently working on a solution to a similar problem. I am solving it by splitting each table into two: a control table and a data table. The control table contains a primary key and a reference into the data table; the data table contains an auto-increment revision key and the control table's primary key as a foreign key.
Taking your entries table as an example:
Entries Table
+----+-------+------+--------+--------+
| id | title | text | index1 | index2 |
+----+-------+------+--------+--------+
becomes
entries entries_data
+----+----------+ +----------+----+--------+------+--------+--------+
| id | revision | | revision | id | title | text | index1 | index2 |
+----+----------+ +----------+----+--------+------+--------+--------+
to query
select * from entries join entries_data on entries.revision = entries_data.revision;
Instead of updating the entries_data table, you insert a new row into it and then update the entries table's revision with the new revision from entries_data.
The advantage of this system is that you can move to a different revision simply by changing the revision property in the entries table. The disadvantage is that you need to update your queries. I am currently integrating this into an ORM layer so the developers don't have to worry about writing SQL anyway. Another idea I am toying with is a centralised revision table which all the data tables use. This would allow you to describe the state of the database with a single revision number, similar to how Subversion revision numbers work.
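Saving an edit under this scheme might look like the following sketch. It assumes entries_data.revision is the AUTO_INCREMENT column described above, that the tables are transactional, and that entry 1 already exists; the inserted values are placeholders.

```sql
START TRANSACTION;

-- Store the edited content as a brand-new revision of entry 1.
INSERT INTO entries_data (id, title, `text`, index1, index2)
VALUES (1, 'New title', 'New text', 'a', 'b');

-- Point the control row at the revision that was just created.
-- LAST_INSERT_ID() returns the AUTO_INCREMENT value generated by the
-- previous INSERT on this connection.
UPDATE entries
SET revision = LAST_INSERT_ID()
WHERE id = 1;

COMMIT;
```

Rolling back to an older revision is then a single UPDATE of entries.revision, with no rows copied or deleted.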
Have a look at this question: How to version control a record in a database
Why not have a separate history table for each table (as per the accepted answer on the linked question)? It simply has a compound primary key made of the original table's PK and the revision number. You will still need to store the data somewhere, after all.
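For the Entries table, such a history table might be declared as below. The table name entries_history and all column types are assumptions; only the column names come from the question.

```sql
CREATE TABLE entries_history (
    entry_id INT UNSIGNED NOT NULL,   -- PK of the original entries row
    revision INT UNSIGNED NOT NULL,   -- 1, 2, 3, ... per entry
    title VARCHAR(255) NOT NULL,
    `text` TEXT NOT NULL,
    index1 VARCHAR(64),
    index2 VARCHAR(64),
    PRIMARY KEY (entry_id, revision)
);

-- Fetch one specific past revision of an entry.
SELECT title, `text`, index1, index2
FROM entries_history
WHERE entry_id = 1
  AND revision = 3;
```

The live entries table stays unchanged, so existing queries keep working; only the code that saves an entry has to write an extra history row.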
For one of our projects we went the following way:
Entries Table
+----+-----------+---------+
| id | date_from | date_to |
+----+-----------+---------+
EntryProperties Table
+----------+-----------+---------+-------+------+--------+--------+
| entry_id | date_from | date_to | title | text | index1 | index2 |
+----------+-----------+---------+-------+------+--------+--------+
It is fairly complicated, but it lets you track an object's full lifecycle. To query the active entries we used:
SELECT
entry_id, title, text, index1, index2
FROM
Entries INNER JOIN EntryProperties
ON Entries.id = EntryProperties.entry_id
AND Entries.date_to IS NULL
AND EntryProperties.date_to IS NULL
The only concern was the case of an entry being removed (so we set its date_to) and then restored by an admin. With this scheme there is no way to track that kind of trick.
The overall downside of any approach like this is obvious: you have to write tons of TSQL where a non-versioned DB would get by with something like select A join B.