I have a database table with one Index where the keyname is PRIMARY, Type is BTREE, Unique is YES, Packed is NO, Column is ID, Cardinality is 728, and Collation is A.
I have a script that runs on page load that adds entries to the MySQL database table and also removes duplicates from the Database Table.
Below is the script section that deletes the duplicates:
// Removes duplicates from the MySQL database table, based on the 'Entry_Date' field
mysql_query("ALTER IGNORE TABLE $TableName ADD UNIQUE KEY (Entry_Date)");
// Deletes the index added by the remove-duplicates statement above
mysql_query("ALTER TABLE $TableName DROP INDEX Entry_Date");
The remove-duplicates statement above adds an extra unique index to the table, and the next statement is supposed to delete that added index.
The problem is that sometimes the added index does not get deleted by the following statement, so more and more indexes accumulate on the table. These extra indexes prevent the script from adding new data to the database until I remove them by hand.
My Question:
Is there a command or short function that I can add to the script that will delete all indexes except the original index mentioned in the beginning of this post?
I did read the following post, but I don't know if this is the correct script to use:
How to drop all of the indexes except primary keys with single query
I don't think so. What you can do is create a copy of the table, but that wouldn't copy the indexes. For example, if you run
CREATE TABLE table1 AS (SELECT * FROM table_2);
it will make a copy, but without the indexes or the primary key.
After all the comments I think I realize what is happening.
You actually allow duplicates in the database; you just want to clean them out from time to time.
The problem is that the method you have chosen to clean them is to create a unique key with the IGNORE option, which causes the duplicate rows to be dropped instead of failing the key creation. You then drop the unique key so that duplicate rows can be added again, and your problem is that sometimes the unique key is not being dropped.
I suggest you delete the duplicates in another way. Supposing that your table name is "my_table" and your primary key is my_key_column, then:
delete from my_table where my_key_column not in (select min(my_key_column) from my_table group by Entry_Date)
Edit: the above won't work due to a limitation in MySQL, as pointed out by @a_horse_with_no_name.
Try the following three queries instead:
create temporary table if not exists tmp_posting_data select id from posting_data where 1=2
insert into tmp_posting_data(id) select min(id) from posting_data group by Entry_Date
delete from Posting_Data where id not in (select id FROM tmp_posting_data)
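Separately, if leftover indexes from the old approach are still piling up, you can generate the cleanup statements from information_schema. This is only a sketch (not the script from the linked post); 'your_db' is a placeholder for your schema name:
SELECT CONCAT('ALTER TABLE `', TABLE_NAME, '` DROP INDEX `', INDEX_NAME, '`;') AS drop_stmt
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = 'your_db'
  AND TABLE_NAME = 'Posting_Data'
  AND INDEX_NAME <> 'PRIMARY'
GROUP BY TABLE_NAME, INDEX_NAME;
Run the statements it returns to drop everything except the PRIMARY index.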
As a final note, reconsider whether you really need to allow duplicate rows at all, as also suggested by @a_horse_with_no_name. Instead of letting rows be entered and then deleted, you can create the unique key once in the database:
Alter table posting_data add unique key (Entry_Date)
Then, when you are inserting new data from the RSS feed, use REPLACE instead of INSERT; it will delete the old row if it is a duplicate on the primary key or any unique index:
replace into posting_data (......) values(.....)
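For illustration only, a fuller sketch of the same idea with made-up column names (Entry_Date plus a hypothetical Title column):
REPLACE INTO posting_data (Entry_Date, Title) VALUES ('2013-05-01 10:00:00', 'Example item');
Because Entry_Date carries the unique key, running the same statement twice replaces the earlier row instead of creating a duplicate.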
Related
I have a database like the following with 10K rows. How can I delete the duplicates where all fields are the same? I don't want to search for any specific company; is there a way to find any multiple entries whose fields are all the same and delete them? Thanks.
This command adds a unique key, and drops all rows that generate errors (due to the unique key). This removes duplicates.
ALTER IGNORE TABLE table ADD UNIQUE KEY idx1(title);
Note: This command may not work for InnoDB tables for some versions of MySQL. See this post for a workaround. (Thanks to "an anonymous user" for this information.)
OR
Simply create a new table without duplicates. Sometimes this is actually faster and easier than trying to delete all the offending rows. Just create a new table, insert the unique rows (I used min(id) for the id of the resulting row), rename the two tables, and (once you are satisfied that everything worked correctly) drop the original table.
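A minimal sketch of that procedure, assuming the table is called mytable with an auto-increment id column and the company_name, city, state, country fields from the question:
CREATE TABLE mytable_dedup LIKE mytable;
INSERT INTO mytable_dedup
  SELECT t.*
  FROM mytable AS t
  JOIN (SELECT MIN(id) AS id FROM mytable
        GROUP BY company_name, city, state, country) AS keep_ids
    ON keep_ids.id = t.id;
RENAME TABLE mytable TO mytable_old, mytable_dedup TO mytable;
-- DROP TABLE mytable_old;   -- drop the original only once you have checked the new table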
The query below can be used to find the duplicate entries using all fields:
Select * from Table group by company_name,city,state,country having count(*)>1;
I'm using SugarCRM, and a few weeks ago I executed a query on MySQL which created an index to prevent duplicate rows. Where can I see or find that index, and how can I edit or delete it? I can't remember the exact query, but the index needs to cover more columns. I've only been using MySQL for a few weeks.
MySQL error 1062: Duplicate entry 'example-dyplicate' for key 'idx_name'
To see the structure of a table, including all the indexes, use:
SHOW CREATE TABLE tablename;
You can delete an index with:
DROP INDEX indexname ON tablename;
There's no way to edit an index. If you want to change an index, you drop it and then add a new index with the new columns you want. However, you can do both in a single query using ALTER TABLE:
ALTER TABLE tablename DROP INDEX indexname, ADD INDEX indexname (col1, col2, ...);
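If you do not remember which table the index was created on or what it is called, you can also list every index from information_schema. A sketch, where 'your_db' and 'your_table' are placeholders:
SELECT INDEX_NAME, NON_UNIQUE,
       GROUP_CONCAT(COLUMN_NAME ORDER BY SEQ_IN_INDEX) AS index_columns
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = 'your_db'
  AND TABLE_NAME = 'your_table'
GROUP BY INDEX_NAME, NON_UNIQUE;
The entry named idx_name (from the error message) is the one to drop and recreate with the extra columns.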
I want to know whether it is possible to avoid duplicate entries or data without any keys or a GROUP BY statement.
Create a UNIQUE key constraint.
ALTER TABLE Comment ADD CONSTRAINT uc_Comment UNIQUE (CommentId, Comment)
In the above case, Comment duplication will not happen, as we are creating a unique combination of CommentId and Comment.
Hope this helps.
More info: http://www.w3schools.com/sql/sql_unique.asp OR
SQL Server 2005 How Create a Unique Constraint?
If you want to suppress duplicates when querying, use SELECT DISTINCT.
If you want to avoid putting duplicates into a table, just don't insert records that are already there. It doesn't matter whether you have a primary/unique key: those will make the database not allow duplicate records, but it's still up to you to avoid trying to insert duplicates (assuming you want your queries to succeed).
You can use SELECT to find whether a record already exists before trying to insert it. Or, if you want to be fancy, you can insert the new records into a temporary table, use DELETE to remove any that are already present in the real table, then use INSERT ... SELECT to copy the remaining records from the temporary table into the real one.
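A minimal sketch of that last approach, reusing the Comment table from the earlier answer (the values are made up):
CREATE TEMPORARY TABLE staging LIKE Comment;
INSERT INTO staging (CommentId, Comment) VALUES (1, 'first comment'), (2, 'second comment');
DELETE s FROM staging AS s
  JOIN Comment AS c ON c.CommentId = s.CommentId AND c.Comment = s.Comment;
INSERT INTO Comment SELECT * FROM staging;
DROP TEMPORARY TABLE staging;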
I'm able to display duplicates in my table; the table name is reportingdetail and the column name is ReportingDetailID.
SELECT DISTINCT ReportingDetailID from reportingdetail group by ReportingDetailID HAVING count(ReportingDetailID) > 1;
+-------------------+
| ReportingDetailID |
+-------------------+
| 664602311 |
+-------------------+
1 row in set (2.81 sec)
Does anyone know how I can go about deleting the duplicates and keeping only one record?
I tried the following:
DELETE FROM reportingdetail USING reportingdetail, reportingdetail AS vtable WHERE (reportingdetailID > vtable.id) AND (reportingdetail.reportingdetailID=reportingdetailID);
But it just deleted everything and only kept single duplicate records!
The quickest way (that I know of) to remove duplicates in MySQL is by adding an index.
E.g., assuming reportingdetailID is going to be the PK for that table:
mysql> ALTER IGNORE TABLE reportingdetail
-> ADD PRIMARY KEY (reportingdetailID);
From the documentation:
IGNORE is a MySQL extension to standard SQL. It controls how ALTER TABLE works if there are duplicates on unique keys in the new table or if warnings occur when strict mode is enabled. If IGNORE is not specified, the copy is aborted and rolled back if duplicate-key errors occur. If IGNORE is specified, only the first row is used of rows with duplicates on a unique key. The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value.
Adding this index will both remove duplicates and prevent any future duplicates from being inserted. If you do not want the latter behavior, just drop the index after creating it.
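Note that since the example above creates a primary key rather than a named index, the one-off cleanup variant would drop it with DROP PRIMARY KEY rather than DROP INDEX:
ALTER TABLE reportingdetail DROP PRIMARY KEY;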
The following MySQL commands create a temporary table and populate it with all columns GROUPED by one column name (the column that has duplicates), ordered by the primary key ascending. The second command creates a real table from the temporary table. The third command drops the table that is in use, and finally the last command renames the new table to the name of the table currently in use.
That's a really fast solution. Here are the four commands:
CREATE TEMPORARY TABLE videos_temp AS SELECT * FROM videos GROUP BY title ORDER BY videoid ASC;
CREATE TABLE videos_temp2 AS SELECT * FROM videos_temp;
DROP TABLE videos;
ALTER TABLE videos_temp2 RENAME videos;
This should give you duplicate entries.
SELECT `ReportingDetailID`, COUNT(`ReportingDetailID`) AS Number_of_Occurrences FROM reportingdetail GROUP BY `ReportingDetailID` HAVING ( COUNT(`ReportingDetailID`) > 1 );
I have a table with just one column: userid.
When a user accesses a certain page, his userid is being inserted to the table. Userids are unique, so there shouldn't be two of the same userids in that table.
I'm considering two designs:
Making the column unique and using INSERT commands every time a user accesses that page.
Checking if the user is already recorded in the table by SELECTing from the table, then INSERTing if no record is found.
Which one is faster?
Definitely create a UNIQUE index, or, better, make this column a PRIMARY KEY.
You need an index to make your checks fast anyway.
Why not make this index UNIQUE, so that you have another fallback option (in case you for some reason forget to check with SELECT)?
If your table is InnoDB, it will have a PRIMARY KEY anyway, since all InnoDB tables are index-organized by design.
In case you didn't declare a PRIMARY KEY in your table, InnoDB will create a hidden column to serve as the primary key, making your table twice as large, and you will not have an index on your column.
Creating a PRIMARY KEY on your column is a win-win.
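For example, a one-line sketch (the table name visited_pages is a placeholder):
ALTER TABLE visited_pages ADD PRIMARY KEY (userid);
-- or, if you prefer a plain unique index instead:
-- ALTER TABLE visited_pages ADD UNIQUE KEY uq_userid (userid);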
You can issue
INSERT IGNORE INTO mytable VALUES (userid)
and check how many records were affected.
If 0, there was a key violation, but no exception.
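In plain SQL, the affected-row count can be read back with ROW_COUNT() right after the insert (the table name and value below are placeholders):
INSERT IGNORE INTO mytable (userid) VALUES (42);
SELECT ROW_COUNT();   -- 1 if the row was inserted, 0 if the userid already existed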
How about using REPLACE?
If a user already exists it's being replaced, if it doesn't a new row is inserted.
What about doing an update, e.g.
UPDATE xxx SET x=x+1 WHERE userid=y
and if that fails (e.g. no matched rows), then do an insert for a new user?
SELECT is faster... but you'd prefer the SELECT check not because of that, but to avoid raising an error.
Or:
INSERT INTO xxx (`userid`) VALUES (4) ON DUPLICATE KEY UPDATE userid=VALUES(`userid`)
You should make it unique in any case.
Whether to check first using SELECT depends on which scenario is most common. If you get new users all the time and only occasionally an existing one, it might be faster overall for the system to just insert and catch the exception on the rare occasions it happens. But an exception is slower than checking first and then inserting, so if the common scenario is an existing user, you should always check first with SELECT.