Merging duplicate rows that have one different value - mysql

This is a correction of this question, now that I properly understand what I need to do.
I have a table with films and the dates they were shown on along with other info in other columns, in MySQL.
So relevant columns are...
FilmID
FilmName
DateShown
The dates are stored as Unix timestamps.
I currently have multiple instances of films that were shown on different dates yet all other information is the same.
I need to copy the dates of the duplicate films into a new table matching them up to the film ID. Then I need to remove the duplicate film rows from the original table.
So I have created a new table, Film_Dates with the columns
FilmDateID
FilmID
Date
Can anyone help with the actual SQL to do this?
Thank you.

To start with:
insert into Film_Dates (FilmID, `Date`)
select filmid, dateshown
from films
and that should populate your new table.
alter ignore table films
add unique (filmid)
This will enforce uniqueness for filmid, and drop all duplicates, keeping just the one row. If this fails with a 'duplicate entry' error, you will need to run this command, and then try the alter again.
set session old_alter_table=1
This is needed because MySQL is moving away from supporting this approach: the IGNORE clause of ALTER TABLE is deprecated as of MySQL 5.6.17 and removed in MySQL 5.7.
Lastly, you need to get rid of your dateshown column.
alter table films
drop column dateshown
Please make sure you have a backup before you attempt any of this. Always best to be safe.
Since filmid is not duplicated, only filmname, there are some extra steps.
First, create the filmdates table:
create table filmdates as
select filmname, dateshown
from films;
Then add a filmid column:
alter table filmdates add column filmid integer;
And a unique index on (filmname, dateshown)
alter ignore table filmdates add unique(filmname, dateshown);
Then we add a unique index on films(filmname), since it's the only value that really gets duplicated.
alter ignore table films add unique(filmname);
Now that we're set up, we populate the filmid column in the new table with matching values from the old one.
update films f
inner join filmdates fd
on f.filmname = fd.filmname
set fd.filmid = f.filmid;
Now we just need to clean up and get rid of the redundant columns (films.dateshown and filmdates.filmname).
alter table filmdates drop column filmname;
alter table films drop column dateshown;
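The steps above can be tried end to end without touching the real database. What follows is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the sample rows are invented, and it swaps the ALTER IGNORE trick for a GROUP BY on filmname that keeps the lowest filmid per film. In MySQL itself the subquery in the DELETE would have to be wrapped in a derived table, because MySQL rejects a subquery that reads the table being deleted from.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE films (filmid INTEGER PRIMARY KEY, filmname TEXT, dateshown INTEGER);
    INSERT INTO films VALUES
        (1, 'Alien', 1000), (2, 'Alien', 2000), (3, 'Brazil', 3000), (4, 'Alien', 2000);
    CREATE TABLE filmdates (
        filmdateid INTEGER PRIMARY KEY,
        filmid INTEGER,
        date INTEGER
    );
""")

# Step 1: copy each distinct (film, date) pair, keyed by the lowest filmid
# that carries that film name.
conn.execute("""
    INSERT INTO filmdates (filmid, date)
    SELECT DISTINCT k.keep_id, f.dateshown
    FROM films f
    JOIN (SELECT filmname, MIN(filmid) AS keep_id
          FROM films GROUP BY filmname) k
      ON k.filmname = f.filmname
    ORDER BY k.keep_id, f.dateshown
""")

# Step 2: delete every film row except the one kept per name.
conn.execute("""
    DELETE FROM films
    WHERE filmid NOT IN (SELECT MIN(filmid) FROM films GROUP BY filmname)
""")

remaining_films = conn.execute(
    "SELECT filmid, filmname FROM films ORDER BY filmid").fetchall()
film_dates = conn.execute(
    "SELECT filmid, date FROM filmdates ORDER BY filmid, date").fetchall()
print(remaining_films)
print(film_dates)
```

Dropping the now-redundant dateshown column is left out of the sketch; in MySQL it is the last ALTER TABLE shown above.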

Related

Delete duplicate rows on a huge list

I have a huge list of roads and the place of that road, like below:
StreetName,PlaceName,xcoord,ycoord
Ovayok Road,Cambridge Bay,-104.99656,69.12876
Ovayok Road,Cambridge Bay,-104.99693,69.12865
Ovayok Road,Cambridge Bay,-104.99794,69.12842
Ovayok Road,Cambridge Bay,-104.99823,69.12835
Hikok Drive,Kugluktuk,-115.09433,67.82674
Hikok Drive,Kugluktuk,-115.09570,67.82686
Hikok Drive,Kugluktuk,-115.09593,67.82689
Hikok Drive,Kugluktuk,-115.09630,67.82695
Sivulliq Avenue,Rankin Inlet,-92.08252,62.81265
Sivulliq Avenue,Rankin Inlet,-92.08276,62.81265
Sivulliq Avenue,Rankin Inlet,-92.08461,62.81262
How do I delete rows that have duplicate data in the first and second columns? All the numbers (coordinates) are different.
If you don't have an ID or any other column that uniquely identifies each row,
then copy the unique records into a copy of the table and rename that temporary table to the original table's name.
The query for this is below:
CREATE TABLE street_details_temp LIKE street_details;
INSERT INTO street_details_temp SELECT DISTINCT * FROM street_details;
DROP TABLE street_details;
RENAME TABLE street_details_temp TO street_details;
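One caveat with SELECT DISTINCT * is that it only collapses rows that are identical in every column, and in the sample data the coordinates always differ. A possible alternative, sketched here with Python's sqlite3 module rather than MySQL, is to keep one row per (StreetName, PlaceName) pair. The sketch relies on SQLite's implicit rowid; a MySQL table would need its own AUTO_INCREMENT id column to play that role, which is an assumption and not something the question provides.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; rows are taken from the question's sample.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE street_details "
    "(StreetName TEXT, PlaceName TEXT, xcoord REAL, ycoord REAL)"
)
rows = [
    ("Ovayok Road", "Cambridge Bay", -104.99656, 69.12876),
    ("Ovayok Road", "Cambridge Bay", -104.99693, 69.12865),
    ("Hikok Drive", "Kugluktuk", -115.09433, 67.82674),
    ("Hikok Drive", "Kugluktuk", -115.09570, 67.82686),
    ("Sivulliq Avenue", "Rankin Inlet", -92.08252, 62.81265),
]
conn.executemany("INSERT INTO street_details VALUES (?, ?, ?, ?)", rows)

# Keep the first-inserted row of each (StreetName, PlaceName) group
# and delete the rest.
conn.execute("""
    DELETE FROM street_details
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM street_details
        GROUP BY StreetName, PlaceName
    )
""")

kept = conn.execute(
    "SELECT StreetName, PlaceName FROM street_details ORDER BY rowid"
).fetchall()
print(kept)
```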

Adding column from one table to the other table in mysql

How do I add or insert a specific column from one table into another? I tried writing it like this
ALTER TABLE info_apie_zaideja
ADD SELECT info_apie_match.Rank AFTER 'Nick'
FROM info_apie_match;
or this
UPDATE info_apie_zaideja
ADD COLUMN SELECT info_apie_match.Rank AFTER 'Nick'
FROM info_apie_match;
but that did not work. Oh, and the table I want to add the column to is a view, if that helps somehow. All answers will be appreciated.
You need to do this in two steps. First alter the table to add the new column:
ALTER TABLE info_apie_zaideja
ADD COLUMN Rank INT AFTER Nick;
Then fill it in by copying from corresponding rows in the other table:
UPDATE info_apie_zaideja AS z
JOIN info_apie_match AS m ON z.id = m.zaideja_id
SET z.Rank = m.Rank
I had to guess at the column that relates the two tables. Correct the ON clause to match your actual table relations.
Also, consider whether you really need the column in both tables. With this redundancy, you'll need to make sure that whenever you update one table, the other one is updated as well. Instead, you could just use a JOIN whenever you need the value from the other table.
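To see both options side by side, here is a small sketch using Python's sqlite3 module in place of MySQL. The id and zaideja_id columns are the same guess made in the answer above, and the sample rows are invented. SQLite has no AFTER clause and no UPDATE ... JOIN, so the copy is written as a correlated subquery; the last query shows the JOIN-only alternative that avoids storing Rank twice.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; id / zaideja_id are guessed link columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE info_apie_zaideja (id INTEGER PRIMARY KEY, Nick TEXT);
    CREATE TABLE info_apie_match (zaideja_id INTEGER, "Rank" INTEGER);
    INSERT INTO info_apie_zaideja VALUES (1, 'ana'), (2, 'bo'), (3, 'cy');
    INSERT INTO info_apie_match VALUES (1, 10), (2, 20);
""")

# Step 1: add the new, initially NULL column.
conn.execute('ALTER TABLE info_apie_zaideja ADD COLUMN "Rank" INTEGER')

# Step 2: fill it from the matching row in the other table.
conn.execute("""
    UPDATE info_apie_zaideja
    SET "Rank" = (SELECT m."Rank" FROM info_apie_match m
                  WHERE m.zaideja_id = info_apie_zaideja.id)
""")
copied = conn.execute(
    'SELECT id, Nick, "Rank" FROM info_apie_zaideja ORDER BY id').fetchall()

# Alternative: leave the table alone and join whenever the value is needed.
joined = conn.execute("""
    SELECT z.Nick, m."Rank"
    FROM info_apie_zaideja z
    LEFT JOIN info_apie_match m ON m.zaideja_id = z.id
    ORDER BY z.id
""").fetchall()
print(copied)
print(joined)
```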

Duplicate Primary Key while populating new table from old table

I've created multiple indexed tables that I want to tie into a new normalized version of an old table. I get everything indexed and the relations set and I get a "Duplicate entry '11' for key 'Primary' " error message.
Here's the code I'm using to populate the new table.
insert into dvdNormal(dvdId, dvdTitle, year, publicRating, dvdStudioId,
dvdStatusId, dvdGenreId)
(
select dvdId, dvdTitle, year, publicRating, studioId, statusId, genreId
from dvd d
join dvdStudio on d.studio = dvdStudio.studioName
join dvdStatus on d.status = dvdStatus.dvdStatus
join dvdGenre on d.genre = dvdGenre.genre);
I'm going to assume you were asking a question, and not just giving a status report.
The behavior you observe is (most likely) due to the insert statement attempting to insert a row that violates a UNIQUE (or PRIMARY KEY) constraint defined on the dvdId column in the target table (the table the statement is inserting rows into).
And either 1) the dvdId column is not unique in the table it's being retrieved from, or 2) there is more than one "matching" row in one of the other three tables.
For example, if dvdId is a column in dvd, and it's defined as UNIQUE, then case 1) doesn't apply.
But if that row from dvd has more than one "matching" row from one (or more) of the other three tables, then we'd expect the SELECT to generate "duplicate" values for dvdId.
For example, if the genre column is not unique in dvdGenre table, or studioName column is not unique in dvdStudio, we'd expect the query to return multiple copies of the row from dvd. The redundant data (duplicated values) is expected when we "denormalize" data.
If we want to get the table loaded from the query, there are a couple of options.
If we want to store every row returned by the query, we would remove the UNIQUE constraint from the dvdId column. (There may also be other UNIQUE constraints that need to be removed from the target table.)
If we only want to store one copy of the row from dvd, along with values from one matching row from each of the other tables, we could leave the UNIQUE constraint, and use an INSERT IGNORE statement to avoid throwing a "duplicate key error". Any rows where that error would have been thrown will be discarded, and won't be inserted into the target table.
Because the column references aren't qualified, we can't actually tell which table the dvdId column is being returned from. We can't tell which table any of the columns are returned from. We can "guess" that genreId is being returned from the dvdGenre table, but for us to figure that out, we'd need to investigate the schema definition. It's not a problem for MySQL; it can look up the table definitions a whole lot faster than we can.
We could aid the future reader of that SQL statement by qualifying the column references with the table name or a table alias.
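The fan-out described above is easy to reproduce on a toy schema. The sketch below uses Python's sqlite3 module rather than MySQL, trims the tables down to the genre lookup only, and invents the sample rows. SQLite spells the tolerant insert as INSERT OR IGNORE where MySQL uses INSERT IGNORE. Column references are qualified with table aliases, as suggested.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; schema is a cut-down, made-up version.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dvd (dvdId INTEGER PRIMARY KEY, dvdTitle TEXT, genre TEXT);
    CREATE TABLE dvdGenre (genreId INTEGER PRIMARY KEY, genre TEXT);
    CREATE TABLE dvdNormal (dvdId INTEGER PRIMARY KEY, dvdTitle TEXT,
                            dvdGenreId INTEGER);
    INSERT INTO dvd VALUES (11, 'Heat', 'Crime');
    -- The genre name is NOT unique in the lookup table.
    INSERT INTO dvdGenre VALUES (1, 'Crime'), (2, 'Crime');
""")

select_sql = """
    SELECT d.dvdId, d.dvdTitle, g.genreId
    FROM dvd d
    JOIN dvdGenre g ON d.genre = g.genre
    ORDER BY g.genreId
"""
# One dvd row matches two genre rows, so dvdId 11 comes back twice.
fanned_out = conn.execute(select_sql).fetchall()

# A plain INSERT ... SELECT hits the primary key on the second copy.
try:
    conn.execute(
        "INSERT INTO dvdNormal (dvdId, dvdTitle, dvdGenreId) " + select_sql)
    plain_insert_failed = False
except sqlite3.IntegrityError:
    plain_insert_failed = True

# The tolerant form keeps one row per dvdId and silently drops the rest.
conn.execute(
    "INSERT OR IGNORE INTO dvdNormal (dvdId, dvdTitle, dvdGenreId) " + select_sql)
loaded = conn.execute("SELECT dvdId, dvdTitle FROM dvdNormal").fetchall()
print(fanned_out, plain_insert_failed, loaded)
```

Silently discarding rows hides the underlying problem, so making the lookup columns unique is probably the cleaner long-term fix.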

MySQL creating a table which automatically updates with data from other tables

I have two tables:
Table1 - Sales
id, Timestamp, Notes, CSR, Date, ExistingCustomer, PubCode, Price, Pub, Expiry, Size, KeyCode, UpSell, CustName, CustAddress, CustCity, CustState, CustPostal, CustCountry, CustPhone, CustEmail, CustCardName, CustCardNumber, CustCardType, CustCardExpiry, CustCardCode
Table2 - Refunds
id,Timestamp,CSR,Date,OrderId,Refunded,Saved,Publication
Basically, I want to create a table (MySQL) which will have some columns that are the same between the two tables and which will update automatically with the values from these two columns.
ie. Table3
Timestamp, CSR, Date, Publication
And this table would automatically update whenever a new record is posted into either of the other two tables, so it would essentially be a merged table.
Because there's nothing to join these two tables, I don't think the JOIN function would work here. Is there any way I can do this?
You can use a trigger that fires on insert on both tables to make it update automatically.
As for combining tables with no common columns, see this question.
You need to use a stored procedure and a trigger on insert/update of the non-merged tables.
There's got to be some way to join it, and in fact you mention Timestamp, CSR, Date, Publication.
You could join on them in a view. You could add table three and then add triggers though that would be an awful mess.
Why do you want to denormalise in this way?
How about making Table3 hold a unique key to use as a surrogate, plus your 4 join fields, and then taking those fields out of Table 1 and 2 and replacing them with the surrogate key from Table 3?
Then it's a simple join query with no data duplication.
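For comparison, here is a rough sketch of the trigger approach next to a view, using Python's sqlite3 module as a stand-in for MySQL (MySQL trigger syntax differs slightly, for example it needs FOR EACH ROW). The tables are cut down to the shared columns, the trigger and view names are hypothetical, and mapping Sales.Pub to Publication is an assumption. The view is a UNION ALL rather than a join, which is a substitution on my part since the two tables have no linking key.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; only the shared columns are modelled.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sales   (id INTEGER PRIMARY KEY, Timestamp TEXT, CSR TEXT,
                          Date TEXT, Pub TEXT);
    CREATE TABLE Refunds (id INTEGER PRIMARY KEY, Timestamp TEXT, CSR TEXT,
                          Date TEXT, Publication TEXT);
    CREATE TABLE Table3  (Timestamp TEXT, CSR TEXT, Date TEXT, Publication TEXT);

    -- Option 1: triggers copy every new row into Table3.
    CREATE TRIGGER sales_to_table3 AFTER INSERT ON Sales
    BEGIN
        INSERT INTO Table3 VALUES (NEW.Timestamp, NEW.CSR, NEW.Date, NEW.Pub);
    END;
    CREATE TRIGGER refunds_to_table3 AFTER INSERT ON Refunds
    BEGIN
        INSERT INTO Table3
        VALUES (NEW.Timestamp, NEW.CSR, NEW.Date, NEW.Publication);
    END;

    -- Option 2: a view that is always current and stores nothing.
    CREATE VIEW merged_view AS
        SELECT Timestamp, CSR, Date, Pub AS Publication FROM Sales
        UNION ALL
        SELECT Timestamp, CSR, Date, Publication FROM Refunds;
""")

conn.execute("INSERT INTO Sales (Timestamp, CSR, Date, Pub) "
             "VALUES ('t1', 'amy', '2020-01-01', 'Mag A')")
conn.execute("INSERT INTO Refunds (Timestamp, CSR, Date, Publication) "
             "VALUES ('t2', 'bob', '2020-01-02', 'Mag B')")

from_triggers = conn.execute(
    "SELECT * FROM Table3 ORDER BY Timestamp").fetchall()
from_view = conn.execute(
    "SELECT * FROM merged_view ORDER BY Timestamp").fetchall()
print(from_triggers)
print(from_view)
```

Both give the same rows here, but the view needs no extra handling for updates and deletes, whereas the trigger version would need a trigger for each of those as well.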

Duplicate Entries in DB

I have a huge table of products, but there are a lot of duplicate entries. The table has more than 10 thousand entries, and I want to remove the duplicates without manually finding and deleting them. Please let me know if you can provide a solution for this.
You could use SELECT DISTINCT INTO TempTable, drop the original table, and then rename the temp one.
You should also add primary and unique keys to avoid this sort of thing in the future.
For full-row duplicates, try this:
select distinct * into mytable_tmp from mytable
drop table mytable
alter table mytable_tmp rename mytable
The statements below should help with your requirement.
If the table (foo) has a primary key field:
First step
Store the key values to keep in a temporary table, putting the columns that define uniqueness in the GROUP BY clause.
For example, to delete duplicate email ids, put the email id in the GROUP BY clause and select the primary key
as either min(primarykey) or max(primarykey).
CREATE TEMPORARY TABLE temptable AS SELECT min( primarykey ) FROM foo GROUP BY uniquefields;
Second step
Run the delete statement below, substituting your table name and primary key column.
DELETE FROM foo WHERE primarykey NOT IN (SELECT * FROM temptable );
Execute both queries together in your query analyser or DB tool.
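As a quick check of the two steps, here they are run against a throwaway table with Python's sqlite3 module (standing in for MySQL). The email column and the sample rows are invented, and the keep_id alias is only added so the kept keys can be selected by name.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; foo's contents are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (primarykey INTEGER PRIMARY KEY, email TEXT);
    INSERT INTO foo VALUES
        (1, 'a@x.test'), (2, 'b@x.test'), (3, 'a@x.test'),
        (4, 'a@x.test'), (5, 'c@x.test');

    -- First step: remember the lowest key of each group of equal emails.
    CREATE TEMPORARY TABLE temptable AS
        SELECT MIN(primarykey) AS keep_id FROM foo GROUP BY email;

    -- Second step: delete every row whose key was not remembered.
    DELETE FROM foo WHERE primarykey NOT IN (SELECT keep_id FROM temptable);
""")

survivors = conn.execute(
    "SELECT primarykey, email FROM foo ORDER BY primarykey").fetchall()
print(survivors)
```

Using MAX(primarykey) instead would keep the newest row of each group rather than the oldest.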
If the table (foo) doesn't have a primary key field:
step 1
CREATE TABLE temp_table AS SELECT * FROM foo GROUP BY field or fields;
step 2
DELETE FROM foo;
step 3
INSERT INTO foo select * from temp_table;
There are different solutions for removing duplicate rows, and which one to use depends on your scenario. The simplest method is to alter the table, adding a unique index on the product name field:
alter ignore table products add unique index `unique_index` (product_name);
You can remove the index after getting all the duplicate rows deleted:
alter table products drop index `unique_index`;
Please let me know if this resolves the issue. If not I can give you alternate solutions for that.
You can add more than one column to a GROUP BY, e.g.
SELECT * from tableName GROUP BY prod_name HAVING count(prod_name) > 1
That will show one row for each product that occurs more than once. Without the HAVING clause you get one row per product, which you can dump to a new table before dropping the existing one.
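Before deleting anything it can help to list the duplicate groups and how many copies each has. The following is a small sketch with Python's sqlite3 module standing in for MySQL; the brand column is hypothetical and is there only to show grouping on more than one column, and the rows are invented.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; brand is a hypothetical second column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableName (id INTEGER PRIMARY KEY, prod_name TEXT, brand TEXT);
    INSERT INTO tableName (prod_name, brand) VALUES
        ('lamp', 'Acme'), ('lamp', 'Acme'), ('lamp', 'Bolt'),
        ('desk', 'Acme'), ('desk', 'Acme'), ('desk', 'Acme');
""")

# Only groups with more than one row are reported; ('lamp', 'Bolt') is unique.
dupes = conn.execute("""
    SELECT prod_name, brand, COUNT(*) AS copies
    FROM tableName
    GROUP BY prod_name, brand
    HAVING COUNT(*) > 1
    ORDER BY prod_name, brand
""").fetchall()
print(dupes)
```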