I am trying to identify and remove duplicates from a data extract.
I have set up a query that groups by contract_number and filters on count > 1, which identifies the duplicate cases. Each case has two contract_start_date values, of which I need to remove the earliest, so I have applied Min.
I am unable to run this as a delete query. I am fairly new to Access and SQL scripts.
SELECT Gas_Data.CONTRACT_NUMBER,
Count(Gas_Data.CONTRACT_NUMBER) AS CountOfCONTRACT_NUMBER,
Min(Gas_Data.CONTRACT_START_DATE) AS MinOfCONTRACT_START_DATE
FROM Gas_Data
GROUP BY Gas_Data.CONTRACT_NUMBER
HAVING (((Count(Gas_Data.CONTRACT_NUMBER))>1));
Try this approach where you, in the subquery, identify those records not to be deleted:
DELETE
*
FROM
Gas_Data
WHERE
Gas_Data.CONTRACT_START_DATE Not IN
(SELECT
Max(T.CONTRACT_START_DATE)
FROM
Gas_Data As T
WHERE
T.CONTRACT_NUMBER = Gas_Data.CONTRACT_NUMBER)
Of course, do make a backup first.
Consider the following:
delete from gas_data a
where exists
(
select top 1 * from gas_data b
where
a.contract_number = b.contract_number and
a.contract_start_date < b.contract_start_date
)
For every record, the above will test whether there is at least one other record in the dataset for which the contract number is equal and the start date is later. If such a record exists, the earlier record is deleted.
Always retain a backup of your data before running delete queries.
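The EXISTS test described above can be tried on throwaway data before touching the real table. This is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in for Access (the SQL dialects differ slightly, so the delete target is not aliased here), and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gas_data (contract_number TEXT, contract_start_date TEXT)")
conn.executemany(
    "INSERT INTO gas_data VALUES (?, ?)",
    [
        ("C001", "2019-01-01"),  # earlier duplicate, should be deleted
        ("C001", "2020-01-01"),
        ("C002", "2018-06-01"),  # no duplicate, should be kept
        ("C003", "2017-03-15"),  # earlier duplicate, should be deleted
        ("C003", "2021-03-15"),
    ],
)

# Delete every row for which a later row with the same contract number exists.
conn.execute(
    """
    DELETE FROM gas_data
    WHERE EXISTS (
        SELECT 1 FROM gas_data AS b
        WHERE b.contract_number = gas_data.contract_number
          AND gas_data.contract_start_date < b.contract_start_date
    )
    """
)

remaining = conn.execute(
    "SELECT contract_number, contract_start_date FROM gas_data ORDER BY contract_number"
).fetchall()
print(remaining)
```

The row with the latest start date never satisfies the EXISTS test, so one row per contract number always survives.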
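Try (an aggregate such as Count() cannot be used in a WHERE clause, so the comparison goes into a subquery):
DELETE FROM Gas_Data
WHERE CONTRACT_START_DATE <
(SELECT Max(T.CONTRACT_START_DATE) FROM Gas_Data AS T
WHERE T.CONTRACT_NUMBER = Gas_Data.CONTRACT_NUMBER)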
I have two tables, tbl_NTE and tbl_PAH. Some records in tbl_PAH are already available in tbl_NTE, so I created an append query to automatically transfer and update some records. As a result, duplicates are created every time I click the save button, because the save button triggers the append query.
I want to run a query where all the data with duplicates are deleted and just leave the original ones.
I created a delete query and typed the criteria:-
In (SELECT [CaseIDNo]
FROM [tbl_PAH] As Tmp GROUP BY [CaseIDNo]
HAVING Count(*)>1 )
I've also tried Last, First, Max and Group By as criteria, but all that does is delete all the records as well.
In (SELECT DISTINCTROW tbl_PAH.CaseIDNo
FROM tbl_PAH
GROUP BY tbl_PAH.CaseIDNo
HAVING (((tbl_PAH.CaseIDNo) In (SELECT Last(tbl_PAH.CaseIDNo) AS
LastOfCaseIDNo FROM tbl_PAH Group By tbl_PAH.CaseIDNo HAVING
(((Count(tbl_PAH.CaseIDNo))>1));)));)
Here is the other one I've tried, but it also deletes all the duplicate records without leaving the original one.
DELETE tbl_PAH.CaseIDNo
FROM tbl_PAH
WHERE (((tbl_PAH.CaseIDNo) In (SELECT DISTINCTROW tbl_PAH.CaseIDNo
FROM tbl_PAH
GROUP BY tbl_PAH.CaseIDNo;)));
When I run it, all the duplicates are deleted without leaving the original ones. Any idea how I can work this out?
I have already set Unique Records to Yes. I set the index to Yes (Duplicates OK) to avoid errors while automatically appending records to other tables, but as a result duplicates are created. I need help deleting the duplicates according to this rule: "When a record has duplicates in terms of CaseIDNo, the duplicates will be deleted leaving only the original record." I am using Microsoft Access 2010 and am still learning it. Thank you in advance to those who answer.
You can use the following query to delete all duplicate records where ID is not the minimal value of ID. Since ID is a unique column, that should leave the originals in place.
Note that I've refactored your first condition from an IN to an EXISTS, because those are often faster and more reliable.
DELETE t.*
FROM tbl_PAH t
WHERE EXISTS (SELECT 1 FROM tbl_PAH s WHERE s.CaseIDNo = t.CaseIDNo HAVING COUNT(s.CaseIDNo) > 1)
AND t.ID <> (SELECT Min(s2.ID) FROM tbl_PAH s2 WHERE t.CaseIDNo = s2.CaseIDNo)
I have this table:
Mytable(ID, IDGroupReference, Model ...)
I have many records in MyTable. They belong to groups, and all the records in the same group share the same IDGroupReference, which is the ID of one of the records in that group. So I can get all the records of a group with a single query:
select * from MyTable where IDGroupReference = 12345;
When I move one record from one group to another, I also want to move all the other records of its group. In other words, I want to merge two groups into one.
In this case I can use this query:
Update Mytable set IDGroupReference = myIDReferenceGroup1 where IDGroupReference = IDGroupReferenceGroup2
This sets the IDGroupReference of group 2 to the IDGroupReference of group 1.
My doubt is about concurrency, when two users try to change the group of two different records. Imagine that I have group 1 with 10,000 records and two users. User 1 tries to move record A from group 1 to group 2, and user 2 tries to move record B from group 1 to group 3.
Because the group has many records (10,000), I think that when I run the update query above, SQL Server updates them one by one, so it is possible that some records end up in group 2 and others in group 3. All of them must end up in the same group, either group 2 or group 3 depending on which user updates last, but never split.
So, when I use the update, how does it work? Is it a transaction, so that nobody can update any of the affected records, or can a second user update records in the middle of the first user's update?
I mean:
Group 1 has 10 records. User one executes the update, so the steps are:
SQL Server updates record 1.
SQL Server updates record 2.
Meanwhile, a second user executes the query.
Is it possible that the second user updates record 3 before it is updated by the first user's query? If that happens, group 1 is split in two: some records go to group 2 and some go to group 3.
How can I ensure that all the records of group 1 go to either group 2 or group 3?
Thanks.
The solution is to use SQL Server table hints.
The initial update:
Update Mytable set IDGroupReference = myIDReferenceGroup1 where IDGroupReference = IDGroupReferenceGroup2
It is modified to:
Update Mytable with(tablock) set IDGroupReference = myIDReferenceGroup1 where IDGroupReference = IDGroupReferenceGroup2
By default, an update in SQL Server locks only the rows it modifies, so the remaining rows can still be changed by another session. I therefore need to lock the whole table, so that no other update can modify records in the middle of this update.
Using "with(tablock)" does exactly that: it locks the table when the update begins, then finds all the records that match the where clause and updates them. While the table is locked, no other user can select or update records in it, which is what I need in my particular case.
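The effect of holding one lock for the whole update can be illustrated without a SQL Server instance. The sketch below is only an analogy, not a reproduction of TABLOCK: it uses Python's sqlite3 module, where a writer locks the entire database file, and BEGIN IMMEDIATE stands in for taking the lock up front. The table contents and the two "users" are invented for the example.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "groups.db")

    # isolation_level=None: BEGIN/COMMIT are issued by hand.
    # timeout=0: a blocked connection fails at once instead of waiting.
    user1 = sqlite3.connect(path, isolation_level=None, timeout=0)
    user2 = sqlite3.connect(path, isolation_level=None, timeout=0)

    user1.execute("CREATE TABLE MyTable (ID INTEGER PRIMARY KEY, IDGroupReference INTEGER)")
    user1.executemany("INSERT INTO MyTable (IDGroupReference) VALUES (?)", [(1,)] * 10)

    # User 1 takes the write lock and merges group 1 into group 2.
    user1.execute("BEGIN IMMEDIATE")
    user1.execute("UPDATE MyTable SET IDGroupReference = 2 WHERE IDGroupReference = 1")

    # User 2 tries to start merging group 1 into group 3 while user 1 holds the lock.
    try:
        user2.execute("BEGIN IMMEDIATE")
        user2_blocked = False
    except sqlite3.OperationalError:
        user2_blocked = True

    user1.execute("COMMIT")

    # After user 1 commits, user 2 can run, but group 1 no longer has any rows.
    user2.execute("BEGIN IMMEDIATE")
    moved = user2.execute(
        "UPDATE MyTable SET IDGroupReference = 3 WHERE IDGroupReference = 1"
    ).rowcount
    user2.execute("COMMIT")

    groups = [row[0] for row in user2.execute("SELECT DISTINCT IDGroupReference FROM MyTable")]
    user1.close()
    user2.close()

print(user2_blocked, moved, groups)
```

Because the second writer cannot start until the first has committed, the group is never split: all ten rows end up in the same group.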
I am a web developer so my knowledge of manipulating mass data is lacking.
A coworker is looking for a solution to our data problems. We have a table of about 400k rows with company names listed.
Whoever designed this didn't realize there needed to be some kind of unique identifier for a company, so there are duplicate entries for company names.
What method would one use to match all these records up based on company name, and delete the duplicates based on some kind of criterion (another column)?
I was thinking of writing a script to do this in php, but I really have a hard time believing that my script would be able to execute while making comparisons between so many rows. Any advice?
Answer:
1) delete from table1
2) USING table1, table1 as vtable
3) WHERE (table1.ID<vtable.ID)
4) AND (table1.field_name=vtable.field_name)
1) Here you tell MySQL that there is a table1.
2) Then you tell it that you will use table1 and a virtual table with the values of table1.
3) The strict comparison keeps MySQL from comparing a record with itself, so the row with the highest ID in each group of duplicates is kept.
4) Here you tell it that there shouldn't be records with the same field_name.
The way I've done this in the past is to write a query that returns only the set I want (usually using DISTINCT + a subquery to determine the right record based on other values), and insert that into a different table. You can then delete the old table and rename the new one to the old name.
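The copy-and-swap approach just described could look roughly like this. It is a sketch only, run against SQLite through Python's sqlite3 module rather than MySQL; the table companies_dedup is a hypothetical name, the sample rows are invented, and "keep the row with the highest col" is just one possible rule for picking the right record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (name TEXT, col INTEGER)")
conn.executemany(
    "INSERT INTO companies VALUES (?, ?)",
    [("Acme", 1), ("Acme", 5), ("Globex", 2), ("Initech", 7), ("Initech", 3)],
)

# 1. Build a new table holding one row per company (here: the highest col wins).
conn.execute(
    "CREATE TABLE companies_dedup AS "
    "SELECT name, MAX(col) AS col FROM companies GROUP BY name"
)
# 2. Drop the old table and give the new one the old name.
conn.execute("DROP TABLE companies")
conn.execute("ALTER TABLE companies_dedup RENAME TO companies")

rows = conn.execute("SELECT name, col FROM companies ORDER BY name").fetchall()
print(rows)
```

A table created from a SELECT does not carry over indexes or constraints, so those would need to be recreated after the swap.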
To find the list of companies that have duplicates in your table, you can use a query like this:
SELECT NAME
FROM companies
GROUP BY NAME
HAVING COUNT(*) > 1
The following will delete all duplicates except the row with the maximum value in the col column:
DELETE del
FROM companies AS del
INNER JOIN (
SELECT NAME, MAX(col) AS col
FROM companies
GROUP BY NAME
HAVING COUNT(*) > 1
) AS sub
ON del.NAME = sub.NAME AND del.col <> sub.col
I want to write one SQL statement that places the current transaction in a group. The transaction must have been made within the last 60 seconds.
The current transaction is grouped with other existing transactions by assigning it a group ID number (GRID) copied from another transaction that was also made within the last minute.
In other words:
When a purchase is made, the SQL script looks for other purchases made within the last minute. If it finds one, it takes the group number from that row and assigns it to the current purchase, so every purchase made within a minute ends up in the same group.
This is the update statement I have composed:
UPDATE TRANSACTIONS
SET GRID=(SELECT G FROM
(SELECT GRID AS G
FROM TRANSACTIONS
WHERE CUST_ID='123ID'
AND STAMP+60>UNIX_TIMESTAMP()
LIMIT 1)
AS t),
STAMP=UNIX_TIMESTAMP()
WHERE CUST_ID='123ID'
AND STAMP+60>UNIX_TIMESTAMP();
However, this always reports updated rows, even if the only row that exists is the one due to be updated, or the other row that was found has no group number assigned yet. That is expected, as it updates with whatever value the subquery found; if nothing was found, it updates with an empty value.
There are two solutions I am interested in:
I want this script to skip the update (by condition) if the value found by the subquery is empty.
or
I want to add a condition so that if the subquery returns an empty value, a fixed string of characters is inserted instead.
After exploring my issue for a while, I have come to a solution.
The following MySQL statement does what I want.
Please note the very interesting MySQL function IFNULL(). It can be very handy!
UPDATE TRANSACTIONS
SET GRID=(SELECT G FROM
(SELECT IFNULL(GRID, 'NO ID') AS G
FROM TRANSACTIONS
WHERE CUST_ID='123ID'
AND STAMP+60>UNIX_TIMESTAMP()
LIMIT 1)
AS t),
STAMP=UNIX_TIMESTAMP()
WHERE CUST_ID='123ID'
AND STAMP+60>UNIX_TIMESTAMP();
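For the second variant asked about (fall back to a fixed string), one possible shape is to wrap the whole subquery in COALESCE, so that both "no row found" and "row found but GRID is NULL" are covered. This is a sketch under assumptions, not the statement above: it runs on SQLite through Python's sqlite3 module, passes the current time in as a parameter instead of calling UNIX_TIMESTAMP(), and the helper assign_group and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TRANSACTIONS "
    "(ID INTEGER PRIMARY KEY, CUST_ID TEXT, GRID TEXT, STAMP INTEGER)"
)
conn.executemany(
    "INSERT INTO TRANSACTIONS (CUST_ID, GRID, STAMP) VALUES (?, ?, ?)",
    [
        ("A", "G7", 1000),  # recent row that already has a group
        ("A", None, 1030),  # recent row without a group
        ("B", None, 1035),  # recent row, no group anywhere for this customer
        ("C", "G1", 900),   # older than 60 seconds, must not be touched
    ],
)

def assign_group(conn, cust_id, now):
    """Give the customer's recent rows an existing group id, or 'NO ID' if none exists."""
    cur = conn.execute(
        """
        UPDATE TRANSACTIONS
        SET GRID = COALESCE(
                (SELECT GRID FROM TRANSACTIONS
                 WHERE CUST_ID = :cust AND STAMP + 60 > :now AND GRID IS NOT NULL
                 LIMIT 1),
                'NO ID'),
            STAMP = :now
        WHERE CUST_ID = :cust AND STAMP + 60 > :now
        """,
        {"cust": cust_id, "now": now},
    )
    return cur.rowcount

updated = {cust: assign_group(conn, cust, 1040) for cust in ("A", "B", "C")}
grids = conn.execute("SELECT CUST_ID, GRID FROM TRANSACTIONS ORDER BY ID").fetchall()
print(updated, grids)
```

COALESCE is also available in MySQL, but there the subquery on the updated table would still need the derived-table wrapper used in the statement above.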
Using mysql, how do I delete all rows from a table, but keep, say, 200 records?
The obvious approach is to count them, do some arithmetic, and delete the right number. But does MySQL have some built-in function that does it in one delete query?
You can delete using a condition:
delete from YourTable
where YourSequentialID > 200
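However, your sequential ID could have gaps, so you would not end up with exactly 200 rows. Instead, refine the condition.
Find the records you want to keep (say, the first 200) and delete everything else. MySQL does not accept LIMIT directly inside an IN subquery, nor a subquery that reads the table being deleted from, so the list of IDs is wrapped in a derived table:
delete from YourTable
where id not in
(
select ID from
(
select ID
from YourTable
order by ID
LIMIT 200
) as keep_rows
)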
I know, that can be slow. But that's not a production query, it's just a clean up query. You can live with having to run it only once.
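To see why the gaps matter, here is a small sketch on throwaway data. It uses Python's sqlite3 module instead of MySQL (SQLite accepts LIMIT inside the IN subquery directly), and the table contents are invented: 500 rows whose IDs are the odd numbers 1 to 999.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER PRIMARY KEY, payload TEXT)")
# 500 rows whose ids have gaps (only odd numbers: 1, 3, 5, ...).
conn.executemany(
    "INSERT INTO YourTable (id, payload) VALUES (?, ?)",
    [(i, f"row {i}") for i in range(1, 1000, 2)],
)

KEEP = 200
# Keep the KEEP rows with the lowest ids and delete everything else.
conn.execute(
    "DELETE FROM YourTable WHERE id NOT IN "
    "(SELECT id FROM YourTable ORDER BY id LIMIT ?)",
    (KEEP,),
)

count, max_id = conn.execute("SELECT COUNT(*), MAX(id) FROM YourTable").fetchone()
print(count, max_id)
```

With these IDs, the simpler condition id > 200 would have left only 100 rows, while the subquery version keeps exactly 200.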