Trouble deleting duplicate records in MySQL

I have a products table that contains duplicate products: several rows share the same id_str but have different id values. We use id_str to track each product. This is what I have tried so far:
I created a temp table, truncated it, and then ran the following query:
INSERT INTO products_temp SELECT DISTINCT id_str, id, title, url, image_url, long_descr, mp_seller_name, customer_rating, curr_item_price, base_item_price, item_num, rank, created_at, updated_at, published, publish_ready, categories, feed_id, category_names, last_published_at, canonical_url, is_curated, pr_attributes, gender, rating, stock_status, uploadedimage_file_name, updated_by, backfill_text, image_width, image_height, list_source, list_source_time, list_category, list_type, list_image, list_name, list_domain, notes, street_date, list_product_rank, created_by from products
This moved everything over. However, when I searched the new table for duplicate id_str values:
SELECT id_str, COUNT(*) C FROM PRODUCTS GROUP BY id_str HAVING C > 1
I get the same result as I do on the original table. What am I missing?

One or more of the other columns differ between the rows, so each inserted row is still unique as a whole.
Your count query only tests id_str, which is why it still reports duplicates.

Using SELECT DISTINCT only removes duplicated entire rows. It doesn't remove a row if only one of the values is the same and the others are different.
Assuming that id is unique, try this instead:
INSERT INTO products_temp
SELECT id_str, id, title, url, -- etc
FROM products
WHERE id IN (SELECT MIN(id) FROM products GROUP BY id_str)
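To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than MySQL. The three-column products table and its sample rows are made up for illustration; the real table has many more columns.

```python
import sqlite3


def dedupe_demo():
    """Compare SELECT DISTINCT with the MIN(id) approach on hypothetical data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (id INTEGER PRIMARY KEY, id_str TEXT, title TEXT);
        CREATE TABLE products_temp (id INTEGER PRIMARY KEY, id_str TEXT, title TEXT);
        INSERT INTO products (id, id_str, title) VALUES
            (1, 'A', 'Widget'),
            (2, 'A', 'Widget'),
            (3, 'B', 'Gadget');
    """)
    # DISTINCT compares whole rows. Rows 1 and 2 differ in id, so both survive.
    distinct_rows = conn.execute(
        "SELECT DISTINCT id_str, id, title FROM products"
    ).fetchall()
    # Keep only the row with the lowest id for each id_str.
    conn.execute("""
        INSERT INTO products_temp (id, id_str, title)
        SELECT id, id_str, title
        FROM products
        WHERE id IN (SELECT MIN(id) FROM products GROUP BY id_str)
    """)
    kept = conn.execute(
        "SELECT id, id_str FROM products_temp ORDER BY id"
    ).fetchall()
    # The duplicate check must run against the new table.
    dupes = conn.execute("""
        SELECT id_str, COUNT(*) FROM products_temp
        GROUP BY id_str HAVING COUNT(*) > 1
    """).fetchall()
    conn.close()
    return distinct_rows, kept, dupes
```

With this sample data, SELECT DISTINCT still returns all three rows, while the MIN(id) filter leaves one row per id_str.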

Try SELECT id_str, COUNT(*) C FROM products_temp GROUP BY id_str HAVING C > 1
Your check query selects from the original table again, not from the new one.

This is the simplest way I found to find and delete duplicates:
Note: Because of a bug with the InnoDB engine, for this to work you need to change your engine to MyISAM:
ALTER TABLE <table_name> ENGINE MyISAM
then add a unique index on the column you want to deduplicate, using IGNORE (note that ALTER IGNORE TABLE was removed in MySQL 5.7.4, so this only works on older versions):
ALTER IGNORE TABLE <table_name> ADD UNIQUE INDEX(`<column_name>`)
and change your db engine back:
ALTER TABLE <table_name> ENGINE InnoDB
If you want, you can then drop the index you just created, but I would suggest keeping it and also looking into what caused the duplicates in the first place.

Related

Insert into a table conditionally using values from another table

I have a good idea of the pseudo-logic of what I want to do - just struggling to think of the syntax to put it into practice.
I’ve got three tables:
product_images
product
product_import
At a high level, I want to insert a row into the product_images table with just two values (image_url, product_id). The image URL can be found in the product_import table along with a product_id. This product_id is the old ID of the product (we are migrating from another system), which is recorded as old_id in the product table.
So the retrieval of the image_url is conditional: the product_id in the product_import table must match an old_id value in the product table. If it matches, insert the matching image_url from the product_import table together with the new product id from the product table.
My guess at the SQL statement is something along the lines of:
INSERT INTO product_image(image_url, product_id)
SELECT product_import.image_url, product.id WHERE product.old_id = product_import.id;
When you want to write an INSERT INTO ... SELECT ... statement, the first rule is to start by designing the SELECT ... part.
In your case, the select query is syntactically invalid because it is missing a FROM clause.
So a valid select would be something like:
SELECT product_import.image_url, product.id
FROM product_import
INNER JOIN product ON product.old_id = product_import.id;
You may have to tweak it a bit based on your needs but it is the minimum query to start working. And of course you add the INSERT part only when you are satisfied with your SELECT query.
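As a rough illustration of that workflow (check the SELECT first, then prepend the INSERT), here is a sketch using Python's sqlite3 module. The table layouts and sample rows are assumptions reduced to the columns mentioned in the question.

```python
import sqlite3


def migrate_images():
    """Preview the join, then reuse the same SELECT for the INSERT."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE product (id INTEGER PRIMARY KEY, old_id INTEGER);
        CREATE TABLE product_import (id INTEGER, image_url TEXT);
        CREATE TABLE product_images (image_url TEXT, product_id INTEGER);
        INSERT INTO product (id, old_id) VALUES (10, 1), (11, 2);
        INSERT INTO product_import (id, image_url) VALUES
            (1, 'a.png'), (2, 'b.png'), (3, 'orphan.png');
    """)
    select_sql = """
        SELECT product_import.image_url, product.id
        FROM product_import
        INNER JOIN product ON product.old_id = product_import.id
    """
    # Step 1: run the SELECT on its own and inspect the rows.
    preview = conn.execute(select_sql + " ORDER BY product.id").fetchall()
    # Step 2: once the rows look right, put the INSERT in front of it.
    conn.execute(
        "INSERT INTO product_images (image_url, product_id) " + select_sql
    )
    inserted = conn.execute(
        "SELECT image_url, product_id FROM product_images ORDER BY product_id"
    ).fetchall()
    conn.close()
    return preview, inserted
```

The import row with no matching old_id ('orphan.png' here) is dropped by the INNER JOIN, so it never reaches product_images.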
This is what I needed in the end:
INSERT INTO product_images(image_url, product_id, account_id, is_thumbnail, created_at, modified_at)
SELECT `value`, id, account_id, "1", now(), now()
FROM product_import
JOIN product ON product_import.entity_id = product.old_id
WHERE product.account_id=1;

How do I write the condition in SQL to get the required result?

I have 2 tables (1) Updates (2) Companies
Updates Table Columns: ID, Title, inserted_at, updated_at, revisions, published_at, archived_at, versions
Companies Table columns: id, name, host, email, inserted_at, updated_at, features.
How do I write a query to show how many posts have been made by a company.
What I know so far is I need to use COUNT in the query but how can I get the no. of updates by a company using that?
SELECT COUNT (column_name)
FROM TABLE (table_name)
condition??
Thanks in advance.
You will have to add a company_id column to the Updates table that references the ID column in the Companies table. Without this column there is no way to tell which update was made by which company.
So the new structure will be as follows
Companies (ID, name, host, email, inserted_at, updated_at, features)
Updates (ID, title, inserted_at, updated_at, revisions, published_at, archived_at, company_id)
Then use the following command to select the count of updates
SELECT COUNT(*) FROM updates WHERE company_id = 1;
Or if you want for all the companies then use the following command
SELECT u.company_id, c.name, COUNT(u.company_id) FROM companies c, updates u WHERE c.id = u.company_id GROUP BY u.company_id, c.name;
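Here is a small sketch of the per-company count, run through Python's sqlite3 module with made-up sample data. It assumes the company_id column suggested above has been added, and it swaps the implicit join for a LEFT JOIN so that companies without any updates still show up with a count of 0.

```python
import sqlite3


def updates_per_company():
    """Count updates for every company, including companies with none."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE updates (
            id INTEGER PRIMARY KEY,
            title TEXT,
            company_id INTEGER REFERENCES companies(id)
        );
        INSERT INTO companies (id, name) VALUES
            (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
        INSERT INTO updates (title, company_id) VALUES
            ('first', 1), ('second', 1), ('third', 2);
    """)
    rows = conn.execute("""
        SELECT c.name, COUNT(u.id) AS update_count
        FROM companies c
        LEFT JOIN updates u ON u.company_id = c.id
        GROUP BY c.id, c.name
        ORDER BY c.id
    """).fetchall()
    conn.close()
    return rows
```

COUNT(u.id) counts only non-NULL values, which is why a company with no matching updates comes out as 0 rather than 1.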

Update Column Using ROW_Number() function. But it is failing. Could Any one suggest a solution?

This might be a silly question, but I have not found a solution so far, so I am posting it with all the inputs and outputs I have. Could anyone help?
What I want to do: a ParcelNo can have one or more invoice numbers. I want to find how many invoice numbers a parcel has and give each one a rank. The ranking is important because my further work depends on this column.
I have one table named TableA. It has three columns: Invoicenumber, which is the unique id; ParcelNo, which can be duplicated; and Ranking, which I want to update.
CREATE TABLE TableA
(
Invoicenumber varchar(5),
ParcelNo varchar(5),
Ranking bit,
IDate Datetime
)
INSERT INTO TableA (Invoicenumber, ParcelNo)
VALUES ('INV01', 'P0001'), ('INV02', 'P0001'),
('INV03', 'P0002'), ('INV04', 'P0002'),
('INV05', 'P0003'), ('INV06', 'P0003')
When I run the following query the output is as desired.
;WITH CTE AS
(
SELECT
*,
ROW_NUMBER() OVER(PARTITION BY PARCELNO ORDER BY INVOICENUMBER) AS RWNO
FROM
TableA
)
SELECT
T.*, C.RWNO
FROM CTE C
JOIN TableA T ON T.Invoicenumber = C.Invoicenumber
The output is below:
So, I tried to update the Ranking column in Table A.
I run this query to do so:
;WITH CTE AS
(
SELECT
*,
ROW_NUMBER() OVER(PARTITION BY PARCELNO ORDER BY INVOICENUMBER) AS RWNO
FROM
TableA
)
UPDATE T
SET Ranking = C.RWNO
FROM CTE C
JOIN TableA T ON T.Invoicenumber = C.Invoicenumber
But the output is wrong. The column is not updated as expected.
Below is the output of the updated column:
Why is the Ranking column updated incorrectly?
I want to update the column to prepare some data. This table is sample for the explanation.
I am elaborating my issue below:-
Below in the image are two tables:-
Table A and Table B both have an IDate column.
I want to update the IDate column in A from B, but the dates should be unique: the first date should not be repeated. These dates are associated with invoice numbers.
I think what you really want is a computed column (also called a calculated or generated column). I'm guessing that your parcel number should point to another table that stores information about the parcels. If that's the case, then go with:
-- First approach
CREATE TABLE Parcels (
id int IDENTITY (1,1) NOT NULL,
ParcelNo varchar(5),
Description varchar(max)
-- Ranking AS (SELECT COUNT(*) FROM Invoices i WHERE i.ParcelID = id)
);
CREATE TABLE Invoices (
id int IDENTITY (1,1) NOT NULL,
InvoiceNumber varchar(5),
ParcelID int FOREIGN KEY REFERENCES Parcels(id)
);
ALTER TABLE Parcels ADD Ranking AS (SELECT COUNT(*) FROM Invoices i WHERE i.ParcelID = id);
INSERT INTO Parcels
(ParcelNo)
VALUES
('P0001'),
('P0002'),
('P0003');
INSERT INTO Invoices
(InvoiceNumber, ParcelID)
VALUES
('INV01', (SELECT p.id FROM Parcels p WHERE p.ParcelNo = 'P0001')),
('INV02', (SELECT p.id FROM Parcels p WHERE p.ParcelNo = 'P0001')),
('INV03', (SELECT p.id FROM Parcels p WHERE p.ParcelNo = 'P0002')),
('INV04', (SELECT p.id FROM Parcels p WHERE p.ParcelNo = 'P0002')),
('INV05', (SELECT p.id FROM Parcels p WHERE p.ParcelNo = 'P0003')),
('INV06', (SELECT p.id FROM Parcels p WHERE p.ParcelNo = 'P0003'));
On the other hand, if you really want all the data in a single table, then try this:
-- Second approach
CREATE TABLE TableA (
Invoicenumber varchar(5),
ParcelNo varchar(5),
Ranking AS (SELECT COUNT(*) FROM TableA a WHERE a.ParcelNo = ParcelNo)
)
Some notes:
Both of my approaches assume that by ranking, you mean that you want a count of how many invoices are in a parcel.
My first approach has a circular reference, because the Invoices table has a foreign key into the Parcels table, but the Parcels table tabulates information from the Invoices table. That's why I commented out the calculated field in the first table, then added the calculated field back in after creating both tables.
Notice that I capitalized all SQL keywords (except the types such as varchar). It's easier to read SQL if you either go with all caps or no caps for an entire query.
Notice my semicolons at the end of each logical break. Semi-colons are technically optional, but a lot of folks consider using them to be good practice.
For my first approach, I'm using a foreign key. You can read more about those here.
Because my first approach split the table into 2 tables, I needed to somehow know the id of the Parcels table when populating the Invoices table, even though the ids are given by the database (so I can't know them ahead of time). Those select statements accomplish that.
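My syntax is aimed at SQL Server and will not necessarily work with any other DBMS, because computed columns are not ANSI standard. One caveat: SQL Server rejects a subquery placed directly in a computed column definition, so in practice the COUNT(*) expression has to be wrapped in a scalar user-defined function that the computed column calls.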
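Separately from the design above, it may be worth checking the column type in the question: Ranking is declared as bit, and a SQL Server bit column stores only 0, 1 or NULL, with any nonzero value saved as 1. That would explain a row number of 2 ending up as 1, although the output screenshots are not available to confirm it. Below is a sketch of the same ROW_NUMBER() update against an integer Ranking column. It runs on SQLite through Python's sqlite3 module, so it uses a correlated subquery in place of SQL Server's UPDATE ... FROM, and the INTEGER type is my substitution.

```python
import sqlite3


def rank_invoices():
    """Number the invoices inside each parcel and store the result."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE TableA (
            Invoicenumber TEXT PRIMARY KEY,
            ParcelNo TEXT,
            Ranking INTEGER  -- integer, not bit, so values above 1 survive
        );
        INSERT INTO TableA (Invoicenumber, ParcelNo) VALUES
            ('INV01', 'P0001'), ('INV02', 'P0001'),
            ('INV03', 'P0002'), ('INV04', 'P0002'),
            ('INV05', 'P0003'), ('INV06', 'P0003');
    """)
    # Window functions need SQLite 3.25 or newer.
    conn.execute("""
        UPDATE TableA
        SET Ranking = (
            SELECT c.rwno
            FROM (
                SELECT Invoicenumber,
                       ROW_NUMBER() OVER (
                           PARTITION BY ParcelNo ORDER BY Invoicenumber
                       ) AS rwno
                FROM TableA
            ) AS c
            WHERE c.Invoicenumber = TableA.Invoicenumber
        )
    """)
    rows = conn.execute(
        "SELECT Invoicenumber, ParcelNo, Ranking FROM TableA "
        "ORDER BY Invoicenumber"
    ).fetchall()
    conn.close()
    return rows
```

In SQL Server itself, the equivalent change would be to declare Ranking as int and keep the CTE-based UPDATE from the question.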

How to find duplicate fields using SQL?

I have a table person(id, iin, name, done) and a table err_person(id, iin, surname, name). How can I find duplicate values in the 'iin' field? If duplicates exist, I want to copy those rows to the err_person table and set the flag person.done=1 for them.
person table
desired results: err_person
To find duplicate values in a field:
SELECT iin
FROM person
GROUP BY iin
HAVING COUNT(*) > 1
You may wish to nest this query:
SELECT * FROM person WHERE iin IN (
SELECT iin
FROM person
GROUP BY iin
HAVING COUNT(*) > 1)
And so, to insert this into the err_person table you could do something like this (noting that the person table does not have a surname field):
INSERT INTO err_person (id, iin, name)
SELECT id, iin, name FROM person WHERE iin IN (
SELECT iin
FROM person
GROUP BY iin
HAVING COUNT(*) > 1)
Finally, a separate query has to be run to set the done field. The difficulty with a nested query here is that the UPDATE modifies the same table the subquery reads from, and some databases (MySQL, for example) reject an UPDATE whose subquery selects directly from the target table. Storing the duplicate iin values in a temporary table first is a safer option.
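A minimal sketch of the temporary-table route, using Python's sqlite3 module with invented sample rows. The temp table name dup_iin is hypothetical. SQLite would also accept the direct subquery, so the temp table here is only to mirror what a stricter DBMS would need.

```python
import sqlite3


def flag_duplicates():
    """Copy rows with a duplicated iin to err_person and mark them done."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE person (
            id INTEGER PRIMARY KEY, iin TEXT, name TEXT, done INTEGER DEFAULT 0
        );
        CREATE TABLE err_person (id INTEGER, iin TEXT, surname TEXT, name TEXT);
        INSERT INTO person (id, iin, name) VALUES
            (1, '111', 'Ann'), (2, '111', 'Anna'), (3, '222', 'Bob');

        -- Collect the duplicated iin values once, outside the person table.
        CREATE TEMP TABLE dup_iin AS
            SELECT iin FROM person GROUP BY iin HAVING COUNT(*) > 1;

        INSERT INTO err_person (id, iin, name)
            SELECT id, iin, name FROM person
            WHERE iin IN (SELECT iin FROM dup_iin);

        -- The UPDATE reads from dup_iin, not from person itself.
        UPDATE person SET done = 1
            WHERE iin IN (SELECT iin FROM dup_iin);
    """)
    copied = conn.execute("SELECT id, iin FROM err_person ORDER BY id").fetchall()
    flags = conn.execute("SELECT id, done FROM person ORDER BY id").fetchall()
    conn.close()
    return copied, flags
```

Because the duplicate list is computed once, the INSERT and the UPDATE are guaranteed to act on the same set of iin values.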

SELECT, Count & Insert in a single query?

I'm not entirely sure if this is possible, but I suspect it is.
I'm trying to gather some very basic statistics, so I have a 'tracker' table that stores info on an ongoing basis, like so:
ID, IP, itemid
Each time an item is viewed, the visitor's IP address and the item ID are logged.
On a daily basis, I'd like to summarize this data and insert it into another table, like so;
ID, itemid, views
The 'views' value should count unique visitors only, so duplicate IP addresses for the same item are counted once.
I know I could simply loop through them all and do it that way, but is it possible to do the entire process with just a single query?
I'm using MySQL
If you group the tracker table by itemid, the number of distinct IP addresses should be the number of views you want:
INSERT INTO newtable (itemid, views)
SELECT itemid, COUNT(DISTINCT IP)
FROM tracker
GROUP BY itemid;
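A quick way to sanity-check this is to run it on a few rows. The sketch below uses Python's sqlite3 module instead of MySQL, with made-up IP addresses and an auto-assigned id on the summary table as an assumption.

```python
import sqlite3


def summarize_views():
    """Insert one row per item with its number of distinct visitor IPs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tracker (id INTEGER PRIMARY KEY, ip TEXT, itemid INTEGER);
        CREATE TABLE newtable (
            id INTEGER PRIMARY KEY AUTOINCREMENT, itemid INTEGER, views INTEGER
        );
        INSERT INTO tracker (ip, itemid) VALUES
            ('10.0.0.1', 1),
            ('10.0.0.1', 1),  -- repeat visit, counted once
            ('10.0.0.2', 1),
            ('10.0.0.1', 2);
    """)
    conn.execute("""
        INSERT INTO newtable (itemid, views)
        SELECT itemid, COUNT(DISTINCT ip)
        FROM tracker
        GROUP BY itemid
    """)
    rows = conn.execute(
        "SELECT itemid, views FROM newtable ORDER BY itemid"
    ).fetchall()
    conn.close()
    return rows
```

Item 1 has three tracker rows but only two distinct IPs, so its views value is 2.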
In other RDBMSs it is possible in this manner:
insert into othertable (field_views, field_itemid)
select count(distinct t.ip), t.itemid from tracker t
group by t.itemid
See also http://dev.mysql.com/doc/refman/5.0/en/insert-select.html
Note that this solution assumes othertable.id is an auto-increment column.
Try this,
insert into newtable(itemid,views)
select itemid,count(*)
from (
select itemid
from tracker
group by itemid,ip
)
as a
group by a.itemid;