I have a database table with about 1M records. I need to find all duplicate names in this table and make them unique.
Id Name
1 A
2 A
3 B
4 C
5 C
Should be changed to...
Id Name
1 A
2 A-1
3 B
4 C
5 C-1
Is there an efficient way of doing this with a MySQL query or procedure?
Thanks in advance!
I needed to do something similar to a table I was just working on. I needed a unique URL field, but the previous keepers of the data had not enforced that constraint. The key was to create a temp table.
I used this response here to help: MySQL Error 1093 - Can't specify target table for update in FROM clause
Take note that it doesn't perform well, but then again if you only need to run it once on a database to clean a table then it shouldn't be so bad.
UPDATE `new_article` `upd`
SET `upd`.`url` = CONCAT(`upd`.`url`, '-', `upd`.`id`)
WHERE `upd`.`url` IN(
SELECT `url` FROM (
SELECT `sel`.`url` FROM `new_article` `sel`
GROUP BY `sel`.`url` HAVING count(`sel`.`id`) > 1
) as `temp_table`
);
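The query above appends the id to every row whose value is duplicated, including the first one. If you want exactly the output from the question (first occurrence untouched, later ones suffixed -1, -2, ...), here is a minimal sketch of that numbering. It is written in Python against an in-memory SQLite database rather than MySQL so that it runs on its own; the table name `items` is made up for the example, and it does not check whether a generated name such as A-1 already exists in the table.

```python
import sqlite3
from collections import defaultdict


def make_names_unique(conn: sqlite3.Connection) -> None:
    """Suffix every repeat of a name with -1, -2, ...; the first row keeps its name."""
    seen = defaultdict(int)  # name -> number of rows with that name seen so far
    updates = []
    rows = conn.execute("SELECT Id, Name FROM items ORDER BY Id").fetchall()
    for row_id, name in rows:
        if seen[name] > 0:
            updates.append((f"{name}-{seen[name]}", row_id))
        seen[name] += 1
    conn.executemany("UPDATE items SET Name = ? WHERE Id = ?", updates)
    conn.commit()


def demo() -> list:
    # Hypothetical table holding the sample data from the question.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (Id INTEGER PRIMARY KEY, Name TEXT)")
    conn.executemany(
        "INSERT INTO items VALUES (?, ?)",
        [(1, "A"), (2, "A"), (3, "B"), (4, "C"), (5, "C")],
    )
    make_names_unique(conn)
    return conn.execute("SELECT Id, Name FROM items ORDER BY Id").fetchall()
```

For 1M rows you would probably want to batch the updates instead of building one large list, but the counting logic stays the same.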
I have a situation where my source table TABLE_A gets updated frequently, on an hourly basis, through a front-end consumer app.
I have another reporting table, TABLE_B, which gets some of its data from TABLE_A.
Initially the records were copied manually through an insert statement.
Now this needs to be automated so that a stored procedure/script runs every hour and copies the records. When this script runs a second time, it should not copy the records it already copied in its first run.
Please help me in how I can do this.
TABLE_A:
id date product type
11 2020-10-14 abc T
12 2020-10-16 def P
- - - - - - - - - -- - - - -- - - - -- - - --
13 2020-10-17 ghi K
14 2020-10-18 klm L
15 2020-10-19 abc T
TABLE_B
id date product type
hadgha 2020-10-14 abc T
gsggss 2020-10-16 def P
Suppose the script has copied the first 2 records in its first run, and after that records 13 / 14 / 15 have been added to TABLE_A. In the second run the script should then copy the last 3 records from TABLE_A to TABLE_B.
Assuming that id is a primary key in both tables, you can use on duplicate key:
insert into table_b (id, date, product, type)
select id, date, product, type
from table_a
on duplicate key update id = values(id)
Basically this attempts to insert everything from table_a to table_b; when an already-existing record is met, the query goes to the on duplicate key clause, that performs a "dummy" update (the values are the same already, so that's a no-op: MySQL skips the update).
For this to properly work, I would recommend:
setting id in table_a to int primary key auto_increment
setting id in table_b to an int primary key (not auto increment, since values will always be copied from the other table)
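To show the "insert everything, skip what already exists" idea end to end, here is a minimal sketch in Python with an in-memory SQLite database. SQLite has no ON DUPLICATE KEY clause, so the sketch swaps in SQLite's INSERT OR IGNORE, which has the same effect for this purpose. It assumes, as above, that id is the primary key in both tables.

```python
import sqlite3


def copy_new_rows(conn: sqlite3.Connection) -> int:
    """Copy rows from table_a to table_b, skipping ids that table_b already has."""
    before = conn.execute("SELECT COUNT(*) FROM table_b").fetchone()[0]
    # INSERT OR IGNORE is SQLite's stand-in for MySQL's ON DUPLICATE KEY no-op.
    conn.execute(
        "INSERT OR IGNORE INTO table_b (id, date, product, type) "
        "SELECT id, date, product, type FROM table_a"
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM table_b").fetchone()[0]
    return after - before  # number of rows actually copied in this run


def demo() -> tuple:
    conn = sqlite3.connect(":memory:")
    for table in ("table_a", "table_b"):
        conn.execute(
            f"CREATE TABLE {table} "
            "(id INTEGER PRIMARY KEY, date TEXT, product TEXT, type TEXT)"
        )
    insert_a = "INSERT INTO table_a VALUES (?, ?, ?, ?)"
    conn.executemany(insert_a, [(11, "2020-10-14", "abc", "T"),
                                (12, "2020-10-16", "def", "P")])
    first_run = copy_new_rows(conn)
    conn.executemany(insert_a, [(13, "2020-10-17", "ghi", "K"),
                                (14, "2020-10-18", "klm", "L"),
                                (15, "2020-10-19", "abc", "T")])
    second_run = copy_new_rows(conn)
    return first_run, second_run
```

The second run still reads all of table_a, so on a large source table you may want to add a filter on id or date to the SELECT.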
I would suggest filtering them out using a subquery and then using on duplicate key:
insert into table_b (id, date, product, type)
select id, date, product, type
from table_a a
where a.date = (select max(a2.date) from table_a a2 where a2.type = a.type and a2.product = a.product)
on duplicate key update id = values(id);
This assumes that you have a unique constraint on table_b(product, type).
The tables don't have any key or constraint in common.
The stored procedure being created uses a variable to capture the max id copied so far and updates it on every run. This ensures that it copies only the records that were not copied in the previous run.
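A minimal sketch of that "remember the max id" approach, again in Python with in-memory SQLite rather than a MySQL stored procedure. The one-row table `copy_state` and its column `last_id` are names invented for the example, and table_b is given its own auto-generated id since the two tables share no key.

```python
import sqlite3


def copy_since_watermark(conn: sqlite3.Connection) -> int:
    """Copy table_a rows with id above the stored watermark, then advance it."""
    last_id = conn.execute("SELECT last_id FROM copy_state").fetchone()[0]
    rows = conn.execute(
        "SELECT id, date, product, type FROM table_a WHERE id > ? ORDER BY id",
        (last_id,),
    ).fetchall()
    if rows:
        conn.executemany(
            "INSERT INTO table_b (date, product, type) VALUES (?, ?, ?)",
            [row[1:] for row in rows],  # drop table_a's id; table_b has its own
        )
        # rows are ordered by id, so the last one carries the new watermark
        conn.execute("UPDATE copy_state SET last_id = ?", (rows[-1][0],))
    conn.commit()
    return len(rows)


def demo() -> tuple:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_a "
                 "(id INTEGER PRIMARY KEY, date TEXT, product TEXT, type TEXT)")
    conn.execute("CREATE TABLE table_b "
                 "(id INTEGER PRIMARY KEY AUTOINCREMENT, "
                 "date TEXT, product TEXT, type TEXT)")
    conn.execute("CREATE TABLE copy_state (last_id INTEGER NOT NULL)")
    conn.execute("INSERT INTO copy_state VALUES (0)")  # nothing copied yet
    insert_a = "INSERT INTO table_a VALUES (?, ?, ?, ?)"
    conn.executemany(insert_a, [(11, "2020-10-14", "abc", "T"),
                                (12, "2020-10-16", "def", "P")])
    first_run = copy_since_watermark(conn)
    conn.executemany(insert_a, [(13, "2020-10-17", "ghi", "K"),
                                (14, "2020-10-18", "klm", "L"),
                                (15, "2020-10-19", "abc", "T")])
    second_run = copy_since_watermark(conn)
    total = conn.execute("SELECT COUNT(*) FROM table_b").fetchone()[0]
    return first_run, second_run, total
```

This only works if ids in table_a always increase and existing rows are never changed after they have been copied.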
Essentially, I have a table called Table1 with columns OrderNum and Book. There should never be duplicate records of any Book for a given OrderNum; if there are, they need to be identified and deleted.
For example:
OrderNum 1 should only have Book1 listed once so the query must identify the other 2 Book1 listed for OrderNum 1 and delete them.
OrderNum 4 should only have Book2 listed once so the query must identify the other Book2 listed for OrderNum 4 and delete it.
After the query runs Table1 Should look like this:
I am working with MS Access queries, but I am looking for a solution that could work as a MySQL query as well.
I don't know how to do this gracefully on either MySQL or Access, because your table doesn't have a primary key column, which it rightfully should have. On Access, you could try creating a new table, then populating it using the following query:
INSERT INTO yourNewTable (OrderNum, Book)
SELECT DISTINCT OrderNum, Book
FROM yourTable;
Then, delete yourTable after you are done with the above query.
If you had a primary key/auto increment column in your table, let's say id, then you could use the following delete statement directly:
DELETE
FROM yourTable t1
WHERE EXISTS (SELECT 1 FROM yourTable t2
WHERE t2.OrderNum = t1.OrderNum AND
t2.Book = t1.Book AND
t2.id < t1.id);
This would leave, for each (OrderNum, Book) combination, the single record among duplicates which happens to have the lowest id value.
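Here is a minimal runnable sketch of the "keep the lowest id" idea, using Python with in-memory SQLite. It uses an equivalent NOT IN (SELECT MIN(id) ...) formulation instead of the EXISTS form above, and the sample rows are made up to match the description in the question (three Book1 rows for OrderNum 1, two Book2 rows for OrderNum 4), since the original table was not shown.

```python
import sqlite3


def delete_duplicates(conn: sqlite3.Connection) -> None:
    """Keep only the lowest-id row for each (OrderNum, Book) combination."""
    conn.execute(
        """
        DELETE FROM Table1
        WHERE id NOT IN (
            SELECT MIN(id) FROM Table1 GROUP BY OrderNum, Book
        )
        """
    )
    conn.commit()


def demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Table1 "
        "(id INTEGER PRIMARY KEY AUTOINCREMENT, OrderNum INTEGER, Book TEXT)"
    )
    # Hypothetical sample data; ids 1..6 are assigned in insertion order.
    conn.executemany(
        "INSERT INTO Table1 (OrderNum, Book) VALUES (?, ?)",
        [(1, "Book1"), (1, "Book1"), (1, "Book1"),
         (2, "Book3"), (4, "Book2"), (4, "Book2")],
    )
    delete_duplicates(conn)
    return conn.execute(
        "SELECT id, OrderNum, Book FROM Table1 ORDER BY id"
    ).fetchall()
```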
I have a problem with my queries in MySQL. My table has 4 columns and it looks something like this:
id_users id_product quantity date
1 2 1 2013
1 2 1 2013
2 2 1 2013
1 3 1 2013
id_users and id_product are foreign keys from different tables.
What I want is to delete just one row:
1 2 1 2013
Which appears twice, so I just want to delete it.
I've tried this query:
delete from orders where id_users = 1 and id_product = 2
But it will delete both of them (since they are duplicated). Any hints on solving this problem?
Add a limit to the delete query
delete from orders
where id_users = 1 and id_product = 2
limit 1
All tables should have a primary key (consisting of one or more columns); duplicate rows don't make sense in a relational database. You can limit the number of deleted rows using LIMIT, though:
DELETE FROM orders WHERE id_users = 1 AND id_product = 2 LIMIT 1
But that just solves your current issue, you should definitely work on the bigger issue by defining primary keys.
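If you want to try the "delete just one of the identical rows" idea outside MySQL, here is a minimal sketch in Python with in-memory SQLite. Many SQLite builds are compiled without support for DELETE ... LIMIT, so the sketch swaps in SQLite's implicit rowid to pick a single matching row; in MySQL the LIMIT 1 form above is the direct way.

```python
import sqlite3


def delete_one(conn: sqlite3.Connection, id_users: int, id_product: int) -> None:
    """Delete exactly one row matching (id_users, id_product), if any exists."""
    conn.execute(
        """
        DELETE FROM orders
        WHERE rowid = (
            SELECT rowid FROM orders
            WHERE id_users = ? AND id_product = ?
            LIMIT 1
        )
        """,
        (id_users, id_product),
    )
    conn.commit()


def demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders "
        "(id_users INTEGER, id_product INTEGER, quantity INTEGER, date INTEGER)"
    )
    # Sample rows from the question, including the duplicated (1, 2, 1, 2013).
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [(1, 2, 1, 2013), (1, 2, 1, 2013), (2, 2, 1, 2013), (1, 3, 1, 2013)],
    )
    delete_one(conn, 1, 2)
    return conn.execute(
        "SELECT id_users, id_product, quantity, date FROM orders "
        "ORDER BY id_users, id_product"
    ).fetchall()
```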
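You need to specify the number of rows which should be deleted. In your case (and I assume that you only want to keep one) that number is COUNT(*)-1. MySQL's LIMIT does not accept a subquery, so compute the count first and pass it through a prepared statement:
SET @n = (SELECT COUNT(*)-1 FROM your_table WHERE id_users=1 AND id_product=2);
PREPARE stmt FROM 'DELETE FROM your_table WHERE id_users=1 AND id_product=2 LIMIT ?';
EXECUTE stmt USING @n;
DEALLOCATE PREPARE stmt;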
The best way to design the table is to add an auto-increment column and use it as the primary key. That way you avoid issues like the one above.
There are already answers for deleting a row with LIMIT. Ideally you should have a primary key in your table, but if you don't,
I will give some other ways:
By creating Unique index
I see id_users and id_product should be unique in your example.
ALTER IGNORE TABLE orders ADD UNIQUE INDEX unique_columns_index (id_users, id_product)
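This deletes duplicate rows with the same data. Note that ALTER IGNORE TABLE was removed in MySQL 5.7.4, so this approach only works on older versions.
But if you still get an error, even with the IGNORE clause, try this: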
ALTER TABLE orders ENGINE MyISAM;
ALTER IGNORE TABLE orders ADD UNIQUE INDEX unique_columns_index (id_users, id_product);
ALTER TABLE orders ENGINE InnoDB;
By creating table again
If there are multiple rows that have duplicate values, then you can also recreate the table
RENAME TABLE `orders` TO `orders2`;
CREATE TABLE `orders`
SELECT * FROM `orders2` GROUP BY id_users, id_product;
You must add an auto-increment id to each row; after that you can delete a row by its id.
Your table will then have a unique id for each row, along with id_users, id_product, etc.
I am currently writing a query. I want to select all records from a table based on multiple values of a foreign key, for example all records related to both 1 and 2.
eg. table might have
id name uid
1 bil 3
2 test 3
3 test 4
4 test 4
5 bil 5
6 bil 5
I want to select all records related to 3 that are also related to 4. In this case it is record number 2.
SELECT id
FROM `table`
WHERE uid = value1 AND like_id
IN (SELECT like_id
FROM likes
WHERE uid = uid2)
LIMIT 0 , 30
It's not at all clear where "value1" is coming from, or "uid2" is coming from, or where the column "like_id" is coming from. Those column names do not appear in your sample table. Your example query references two different table names (table and likes), yet you only show data for one example table, and that table does not have a column named like_id.
Let's assume that "value1" and "uid2" in your query are literals or bind parameters supplied to the query, which seems reasonable given that you mention values of 1, 2, 3 and 4 in various places. That still leaves the "like_id" column. Since it's referenced in the SELECT list of the IN subquery, we'll presume it's a column in the "likes" table, and since it's referenced in the outer query, we'll assume it's also a column in the (unfortunately named) table "table".
(Bottom line: it's not at all clear how your query is returning a "correct" result, given that you've made it impossible to replicate a working test case.)
Given a single table, as shown in your example data, e.g.
CREATE TABLE likes (id INT, name VARCHAR(4), uid INT);
INSERT INTO likes VALUES (1,'bil',3),(2,'test',3),(3,'test',4)
,(4,'test',4),(5,'bil',5),(6,'bil',5);
ALTER TABLE likes ADD PRIMARY KEY (id);
-- non-unique index: the sample data repeats (uid, name) pairs such as (4,'test')
ALTER TABLE likes ADD KEY likes_ix (uid, name);
Assuming that we're running a query against that single table, and that we're matching "likes" associated with uid=3 to "likes" associated with uid=4, and that the matching is done on the "name" column, then
SELECT t.id
FROM `likes` t
WHERE t.uid = 3
AND EXISTS
( SELECT 1
FROM `likes` s
WHERE s.name = t.name
AND s.uid = 4
)
That will return the id of the row from the likes table for uid=3 where we also find a row in the likes table for uid=4 with a matching name value.
Given a limited number of rows to be inspected from the likes table on the outer query, the correlated subquery only needs to be run a limited number of times, which should give reasonable performance.
For large sets, a join operation generally performs better to return an equivalent result:
SELECT t.id
FROM `likes` t
JOIN `likes` s
ON s.name = t.name
AND s.uid = 4
WHERE t.uid = 3
GROUP
BY t.id
The key to optimum performance for either query is going to be appropriate indexes.
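To check that the two forms agree on the sample data, here is a minimal sketch that runs both against an in-memory SQLite database from Python. SQLite stands in for MySQL here only so that the example is runnable on its own; the index name `likes_ix` is kept from above but created as a plain (non-unique) index.

```python
import sqlite3

# EXISTS form: rows for uid=3 that have a same-named row for uid=4.
EXISTS_QUERY = """
SELECT t.id
  FROM likes t
 WHERE t.uid = 3
   AND EXISTS (SELECT 1 FROM likes s WHERE s.name = t.name AND s.uid = 4)
 ORDER BY t.id
"""

# JOIN form: GROUP BY collapses the repeats caused by multiple uid=4 matches.
JOIN_QUERY = """
SELECT t.id
  FROM likes t
  JOIN likes s
    ON s.name = t.name
   AND s.uid = 4
 WHERE t.uid = 3
 GROUP BY t.id
 ORDER BY t.id
"""


def demo() -> tuple:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE likes (id INTEGER PRIMARY KEY, name TEXT, uid INTEGER)")
    conn.executemany(
        "INSERT INTO likes VALUES (?, ?, ?)",
        [(1, "bil", 3), (2, "test", 3), (3, "test", 4),
         (4, "test", 4), (5, "bil", 5), (6, "bil", 5)],
    )
    conn.execute("CREATE INDEX likes_ix ON likes (uid, name)")
    exists_ids = [row[0] for row in conn.execute(EXISTS_QUERY)]
    join_ids = [row[0] for row in conn.execute(JOIN_QUERY)]
    return exists_ids, join_ids
```

Both queries return only id 2, the "test" row for uid=3, which matches the record the question expects.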
I have a table which has a structure like as below.
create table test_table (id INT NOT NULL AUTO_INCREMENT,
name varchar(100),
primary key (id)) ENGINE=INNODB;
Select * from test_table;
id name
1 a
2 b
3 c
Now I want to increment the id by a number, let's say 2.
So the final results should be
Select * from test_table;
id name
3 a
4 b
5 c
The way I can do it is, first remove the PK and auto increment and then
update the table:
update test_table set id=id+2;
The other way is to make a temp table without the PK and auto increment and then
copy the result back into the main table.
Is there any other way to do this without destroying the table structure ?
I am using MySQL.
In your approach, the PK had to be removed first because the update would otherwise produce (temporary) duplicate ids partway through.
To avoid duplicates, you must perform an ordered update:
UPDATE test_table SET id = id + 2 ORDER BY id DESC;
This will update records with largest value of id first, hence avoiding collision.
Obviously, if you want to decrement the values of id, then use "ORDER BY id ASC".
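The ordering rule can be tried out with a minimal sketch in Python and in-memory SQLite. SQLite builds usually lack UPDATE ... ORDER BY, so the sketch emulates the ordered update by updating one row at a time in the chosen order; the helper name `shift_ids` is made up for the example.

```python
import sqlite3


def shift_ids(conn: sqlite3.Connection, delta: int) -> None:
    """Shift every id by delta without ever colliding with a not-yet-moved row."""
    # Growing ids: move the largest first. Shrinking ids: move the smallest first.
    order = "DESC" if delta > 0 else "ASC"
    ids = [row[0] for row in
           conn.execute(f"SELECT id FROM test_table ORDER BY id {order}")]
    for old_id in ids:
        conn.execute(
            "UPDATE test_table SET id = ? WHERE id = ?",
            (old_id + delta, old_id),
        )
    conn.commit()


def demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE test_table (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)"
    )
    conn.executemany("INSERT INTO test_table (name) VALUES (?)",
                     [("a",), ("b",), ("c",)])
    shift_ids(conn, 2)
    return conn.execute("SELECT id, name FROM test_table ORDER BY id").fetchall()
```

Going through the rows in the opposite order would try to move id 1 to 3 while the original row 3 is still there, which is exactly the collision the ORDER BY avoids.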
Here is the generic query to update a table in SQL:
UPDATE table_name SET column1=value, column2=value2 WHERE some_column=some_value;
Please follow the link for more information
Update Query
Thanks,
Pavan