One of my tables contains numeric data that I would like to copy to another table. The problem is that the data is not unique: there can be two or more rows with the same value, and I need to copy each number only once. The table has around 3 million records. Is there any efficient way to do this?
Would this work for you?
INSERT INTO destination_table (the_value_field)
SELECT DISTINCT the_value_field FROM origin_table;
Suppose there are two columns a, b in your table:
INSERT INTO new_table (a, b)
SELECT a, b
FROM old_table
GROUP BY a, b;
You can extend this with more columns.
This will be a slow process and may never complete with huge data. So, instead, copy all values into new_table using
INSERT INTO new_table SELECT * FROM old_table;
and then delete the duplicate records from the new table. This can be relatively faster, and completion is assured.
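A rough sketch of that delete step (untested; it assumes new_table has an auto-increment primary key id and that the copied number lives in a column val, both placeholder names for your actual schema):
DELETE t2
FROM new_table AS t1
JOIN new_table AS t2
  ON t2.val = t1.val  -- the same number appears in both rows
 AND t2.id > t1.id;   -- keep only the row with the smallest id per value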
You can use SELECT DISTINCT to select only the unique values.
https://www.w3schools.com/sql/sql_distinct.asp
SELECT DISTINCT `val` FROM `table_name`
I have a table containing more than 500 million records in a MySQL database, and I need to remove the duplicates from it. I tried the query below on a table containing 20 million records and it was fine, but for the 500 million it takes a very long time:
-- Create temporary table
CREATE TABLE temp_table LIKE names_tbles;
-- Add constraint
ALTER TABLE temp_table ADD UNIQUE(name , family);
-- Copy data
INSERT IGNORE INTO temp_table SELECT * FROM names_tbles;
Is there a better solution?
One option is aggregation rather than insert ignore. That way, there is no need for the database to manage rejected records:
insert into temp_table(id, name, family)
select min(id), name, family
from names_tbles
group by name, family;
I would take it one step further and suggest adding the unique constraint only after the table is populated, so there is no need for the database to check for duplicates (the query already guarantees that), which should speed up the insert statement.
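Put together, the reordered flow would look like this (a sketch using the table and column names from the question; any columns beyond id, name, family would need to be added to both lists):
CREATE TABLE temp_table LIKE names_tbles;
INSERT INTO temp_table (id, name, family)
SELECT MIN(id), name, family
FROM names_tbles
GROUP BY name, family;
ALTER TABLE temp_table ADD UNIQUE (name, family);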
I have a table with 60 columns in it. I would like to delete duplicate entries. It has to compare all 60 columns for a record to be considered a duplicate.
I tried setting all 60 columns to UNIQUE in MySQL, but I get this error
#1070 - Too many key parts specified; max 16 parts allowed
Any other solutions out there?
If your new table should have the exact same schema as the old one:
CREATE TABLE new_table LIKE old_table;
To INSERT all distinct rows into new_table use
INSERT INTO new_table
SELECT DISTINCT * FROM old_table;
Then you can DROP TABLE old_table and RENAME TABLE new_table TO old_table or whatever.
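For instance, the swap in full:
DROP TABLE old_table;
RENAME TABLE new_table TO old_table;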
I would suggest trying something like this:
SELECT col1, ..., col60
FROM mytable
GROUP BY col1, ..., col60
HAVING COUNT(*) > 1;
This will list all the duplicate rows. Once you have that, you can delete the duplicate rows, for example as sketched below.
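One hedged way to do that delete, assuming the table also has a unique id column (the question does not say whether it does; if not, the SELECT DISTINCT copy from the previous answer is the safer route):
DELETE t2
FROM mytable AS t1
JOIN mytable AS t2
  ON t1.col1 <=> t2.col1  -- <=> is null-safe equality, so NULLs also match
 AND t1.col2 <=> t2.col2
 -- ... and so on for the remaining columns ...
 AND t1.col60 <=> t2.col60
 AND t2.id > t1.id;       -- keep one row per duplicate group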
I am deleting rows on the order of hundreds of thousands from a remote DB. Each delete has its own target, e.g.
DELETE FROM tablename
WHERE (col1=c1val1 AND col2=c2val1) OR (col1=c1val2 AND col2=c2val2) OR ...
This has been almost twice as fast for me as individual queries, but I was wondering if there's a way to speed this up more, as I haven't been working with SQL very long.
Create a temporary table and fill it with all your value pairs, one per row. Name the columns the same as the matching columns in your table.
CREATE TEMPORARY TABLE donotwant (
col1 INT NOT NULL,
col2 INT NOT NULL,
PRIMARY KEY (col1, col2)
);
INSERT INTO donotwant VALUES (c1val1, c2val1), (c1val2, c2val2), ...
Then execute a multi-table delete based on the JOIN between these tables:
DELETE t1 FROM `tablename` AS t1 JOIN `donotwant` USING (col1, col2);
The USING clause is shorthand for ON t1.col1=donotwant.col1 AND t1.col2=donotwant.col2, assuming the columns are named the same in both tables, and you want the join condition where both columns are equal to their namesake in the joined table.
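Spelled out with an explicit ON clause, the same delete would read:
DELETE t1
FROM `tablename` AS t1
JOIN `donotwant` AS d
  ON t1.col1 = d.col1
 AND t1.col2 = d.col2;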
Generally speaking, the fastest way to do bulk DELETEs is to put the ids to be deleted into a temp table of some sort, then use that as part of the query:
DELETE FROM tablename
WHERE (col1, col2) IN (SELECT col1, col2
                       FROM tmp)
Inserting can be done via a standard:
INSERT INTO tmp VALUES (...), (...), ...;
statement, or by using the DB's bulk-load utility.
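For MySQL, the bulk-load route would be LOAD DATA INFILE, along these lines (the file name is a placeholder, and LOCAL requires the local_infile option to be enabled):
LOAD DATA LOCAL INFILE '/tmp/donotwant.csv'
INTO TABLE tmp
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(col1, col2);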
I doubt it makes much difference to performance, but you can write that kind of thing this way...
DELETE
FROM tablename
WHERE (col1, col2) IN (('c1val1','c2val1'), ('c1val2','c2val2'), ...);
I have a table with 28 million records, but now it has 56 million records because I assumed the LOAD DATA LOCAL INFILE command would ignore rows that were already in the table. Now I need a way to efficiently remove the duplicate rows. What is the best way to approach this?
If I do not want to touch my table, can I just select the unique rows with this statement:
select distinct (l1.lat, l2.lon) from A, B;
Select the originals into a new/temp table, delete the 56 million records, then insert your originals.
Example:
INSERT INTO new_fresh_table
SELECT a, b, c, d FROM table_with_dupes
GROUP BY a, b, c, d
If you've somehow duped your IDs (not sure how that's possible with a PK), you need to use GROUP BY on every single column. Write a SELECT against the metadata to generate that SELECT for you; see the sketch below.
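For example, this builds the comma-separated column list from information_schema ('your_db' is a placeholder for the actual schema name); paste the result into the SELECT and GROUP BY:
SELECT GROUP_CONCAT(COLUMN_NAME ORDER BY ORDINAL_POSITION SEPARATOR ', ')
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'your_db'
  AND TABLE_NAME = 'table_with_dupes';
-- raise group_concat_max_len if the list comes back truncated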
You didn't specify how the records are duped. Is it Primary Key? Name? What?
From O'Reilly's SQL Cookbook (highly recommended, even for SQL pros):
delete from dupes
where id not in ( select min_id
                  from ( select min(id) as min_id
                         from dupes
                         group by name ) as keepers );
(The derived table works around MySQL's restriction on selecting from the same table you are deleting from.)
If you cannot touch the table and have to use it, why don't you create a view which only shows you the distinct records?
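Something like this sketch (it assumes the lat/lon columns both live in one table A, which the question's query does not make entirely clear):
CREATE VIEW distinct_rows AS
SELECT DISTINCT lat, lon
FROM A;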
I would like to copy the structure and the content of one MySQL table to another, adding all of the columns and values of that table to the already existing ones in the other table.
I could do it manually, but since I'm talking about a large number of columns, it would be great if there were some sort of ALTER statement to help me do that.
EDIT:
To explain myself better:
I first need to add the columns contained in table B (column_name, data_type) to table A (which already has its own set of columns). Once that is done, I can copy the content, which is easy.
I guess the real question is: is there a way to add the columns contained in table B to another table (table A) which has columns of its own?
To build on flavianatill's second solution, it seems to me that the export/import step is not needed. If I understand the problem correctly, the following one-liner should do it.
CREATE TABLE IF NOT EXISTS merged_table AS (SELECT * FROM table1 LEFT JOIN table2 USING (id));
Sorry, I would have put this in a comment but I lack the reputation!
This will copy all data from a source table to a target table. You can specify which columns should go where by changing the names of targetColumn1 ... targetColumnN and sourceColumn1 ... sourceColumnN.
INSERT INTO targetTable (
    targetColumn1,
    targetColumn2,
    targetColumn3,
    ....
    targetColumnN
)
SELECT
    sourceColumn1,
    sourceColumn2,
    sourceColumn3,
    ....
    sourceColumnN
FROM sourceTable
You can also create targetTable by doing
CREATE TABLE targetTable LIKE sourceTable
EDIT: A method to pull all data from sourceTable into targetTable, removing targetTable first if it exists:
DROP TABLE IF EXISTS targetTable;
CREATE TABLE targetTable LIKE sourceTable;
INSERT INTO targetTable SELECT * FROM sourceTable;
EDIT: If you need to keep the old data, you may need to remap it, but you can merge in other tables:
CREATE TABLE targetTable LIKE sourceTable;
INSERT INTO targetTable SELECT * FROM sourceTable;
INSERT INTO targetTable ( fieldsToInsertTo ) SELECT fieldsToSelectFrom FROM oldTargetTable ON DUPLICATE KEY ......;
DROP TABLE IF EXISTS oldTargetTable;
RENAME TABLE targetTable TO oldTargetTable;
This will, however, potentially require either ON DUPLICATE KEY UPDATE ..... logic, or simply INSERT IGNORE on the second INSERT if you are happy throwing away any PRIMARY/UNIQUE key conflicting rows. This assumes you have a sourceTable you want to copy and merge with data from oldTargetTable. The table targetTable is just a temporary name.
If you want to prefer data from the old table, then just swap the order in which you perform the INSERTs, of course.
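As a sketch of what the duplicate handling on the second INSERT might look like (id and val are placeholder column names; here the incoming oldTargetTable row wins on key conflicts):
INSERT INTO targetTable (id, val)
SELECT id, val
FROM oldTargetTable
ON DUPLICATE KEY UPDATE val = VALUES(val);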