MySQL deleting duplicates

I updated an old site a couple of months ago, moving it from a Joomla install to a bespoke system. To convert the data from the Joomla tables to the new format, I wrote various PHP scripts that stepped through the old records section by section, processed them and inserted them into the new table. All was fine until I recently discovered that I had forgotten to add the die() statement to the top of one of the scripts, and somehow a search bot has been merrily pinging that script over time, adding precisely 610 duplicates in one particular section.
So what I do know about the data is that the row with the lowest ID is the row I want to keep, and the duplication only exists in CATEGORY = 8. Duplicate rows can be identified by a matching ORIGINAL_ID.
Beyond SELECT, INSERT and DELETE I'm no MySQL expert, so I'm confused about how to approach this. What would the experts out there suggest?
Edit: Example data
ID  CATEGORY  TITLE  ORIGINAL_ID
1   7         A      1
2   8         A      2
3   8         A      2
4   8         B      3
5   8         C      4
6   8         A      2
In the above example, records 3 and 6 should be removed, because they are in CATEGORY = 8 and have a duplicate ORIGINAL_ID, while the row with the lowest ID (row 2) is kept.

So you want to identify records in category 8 for which there is another record with the same CATEGORY, TITLE and ORIGINAL_ID, and you also want to check that this other record has a lower ID.
So:
Select *
from MYTABLE T1
where T1.CATEGORY = 8
and EXISTS (
  -- an older row with the same keys exists, so T1 is a duplicate
  select 1
  from MYTABLE T2
  where T2.CATEGORY = T1.CATEGORY
  and T2.TITLE = T1.TITLE
  and T2.ORIGINAL_ID = T1.ORIGINAL_ID
  and T2.ID < T1.ID
)
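If you run this and it returns only the records you wish to delete, replace the "select *" with a "delete" and re-run. Note that MySQL refuses a DELETE whose subquery selects from the table being deleted from (error 1093), so in MySQL the delete has to be written as a self-join instead: DELETE T1 FROM MYTABLE T1 JOIN MYTABLE T2 ON T2.CATEGORY = T1.CATEGORY AND T2.TITLE = T1.TITLE AND T2.ORIGINAL_ID = T1.ORIGINAL_ID AND T2.ID < T1.ID WHERE T1.CATEGORY = 8.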
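A throwaway database is a cheap way to check a duplicate filter before touching real data. The sketch below is a minimal example, assuming Python's built-in sqlite3 module as a stand-in for MySQL and loading the example rows from the question into MYTABLE. It keeps the lowest ID by selecting only rows that have a same-category, same-TITLE, same-ORIGINAL_ID twin with a smaller ID, previews those rows, then runs the same filter as a DELETE. SQLite, unlike MySQL, accepts a DELETE whose subquery reads the table being deleted from, so in MySQL the same logic has to be written as a self-join DELETE.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MYTABLE (ID INTEGER PRIMARY KEY, CATEGORY INTEGER, "
    "TITLE TEXT, ORIGINAL_ID INTEGER)"
)
rows = [(1, 7, "A", 1), (2, 8, "A", 2), (3, 8, "A", 2),
        (4, 8, "B", 3), (5, 8, "C", 4), (6, 8, "A", 2)]
conn.executemany("INSERT INTO MYTABLE VALUES (?, ?, ?, ?)", rows)

# A row is a duplicate if an older row (lower ID) in the same category has
# the same TITLE and ORIGINAL_ID; the lowest-ID row of each group survives.
DUPLICATE_FILTER = """
    FROM MYTABLE
    WHERE CATEGORY = 8
      AND EXISTS (
        SELECT 1 FROM MYTABLE AS T2
        WHERE T2.CATEGORY = MYTABLE.CATEGORY
          AND T2.TITLE = MYTABLE.TITLE
          AND T2.ORIGINAL_ID = MYTABLE.ORIGINAL_ID
          AND T2.ID < MYTABLE.ID)
"""

# Step 1: preview the rows the filter would remove.
to_delete = [r[0] for r in conn.execute("SELECT ID " + DUPLICATE_FILTER + " ORDER BY ID")]
print("would delete:", to_delete)

# Step 2: run the same filter as a DELETE.
conn.execute("DELETE " + DUPLICATE_FILTER)
remaining = [r[0] for r in conn.execute("SELECT ID FROM MYTABLE ORDER BY ID")]
print("remaining:", remaining)
```

Building both statements from one filter string guarantees that the DELETE removes exactly the rows the preview showed.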

Related

Access: finding the value that corresponds to the maximum value

I have a database in which I perform an audit on a set of required documents, for several locations of those documents.
So I have a table named Locations and a table named Documents, which are linked through a many-to-many relationship.
Every document can have multiple versions. In my query I want to see only the most recent version of each document, i.e. the one with Max(Id).
In addition, every version can be audited (checked) multiple times, for example twice a year. Each audit is stored in its own record, and I want to show only the most recent audit for each document, again the one with Max(Id).
This is my select query:
SELECT [~Locations].Location, [+DocuProperties].Category, [~Documents].[Document name], Max([DocuVersion].Id) AS MaxDocuID, Max([Audit].Id) AS MaxAuditID, [Audit].Conclusion
FROM ([~Documents] INNER JOIN ([~Locations] INNER JOIN ([+DocuLocation] INNER JOIN [+DocuProperties] ON [+DocuLocation].Id = [+DocuProperties].DocuLocation) ON [~Locations].Id = [+DocuLocation].Location)
ON [~Documents].Id = [+DocuLocation].DocuName)
INNER JOIN ([DocuVersion] INNER JOIN [Audit] ON [DocuVersion].Id = [Audit].DocuVersion) ON [+DocuProperties].Id = [DocuVersion].DocuLocation
GROUP BY [~Locations].Location, [+DocuProperties].Category, [~Documents].[Document name], [Audit].Conclusion
However: I do not wish to Group on Audit Conclusion, I wish to show the Audit conclusion that corresponds to the Max(Id) of that Audit.
So for every most recent audit I want to show its Conclusion, for each document, grouped by Category and by Location.
I know I need to build a nested subquery of some form, but I just can't get any code to work.
I hope someone can help.
The basic idea is like this:
Table 1: DocuProperties
Id  Location  Category
1   15        1
2   15        1
3   14        2
(every location can have multiple document properties, a.k.a. objects)
Table 2: DocuVersion
Id  DocuProperty  DocumentEndDate
1   1             01-01-2022
2   1             20-07-2023
3   2             31-07-2023
4   3             01-10-2023
etc.
(every DocuProperty can have multiple versions; I have to check whether they are still valid, and also check them against some other criteria)
Table 3: Audit
Id  DocuVersion  Conclusion
1   1            Not Valid
2   1            Not Valid
3   2            Valid
4   4            Valid
(every version can be audited multiple times, and every audit can have a different conclusion)
Which I would like to translate into the following:
LASTAudit (a.k.a. the most recent audit of the most recent version of the most recent property)
Location  DocuPropertyId  DocuVersionId  AuditId  Conclusion
15        2               2              2        Not Valid
14        3               4              4        Valid
The IDs were easy to get right, as those were just Max(Id) functions. The problem is getting the Conclusion that corresponds to that audit of that version of that object.
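The usual way to get "the other columns of the row with the maximum Id" is to compute the maximum in a subquery and join back to the table on that value, so Conclusion comes from the same row and never has to appear in GROUP BY. A minimal sketch for the audit level, assuming Python's sqlite3 as a stand-in for Access and using only the sample Audit table above; the same join-back can be repeated for the version and property levels, and Access also accepts a subquery in the FROM clause in this shape (with its own bracket and parenthesis rules).

```python
import sqlite3

# Stand-in for the Access Audit table, loaded with the sample rows above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Audit (Id INTEGER PRIMARY KEY, DocuVersion INTEGER, Conclusion TEXT)")
conn.executemany("INSERT INTO Audit VALUES (?, ?, ?)", [
    (1, 1, "Not Valid"), (2, 1, "Not Valid"), (3, 2, "Valid"), (4, 4, "Valid"),
])

# Inner query: highest audit Id per version.
# Outer join: pull the rest of exactly that audit row, including Conclusion.
latest = conn.execute("""
    SELECT a.DocuVersion, a.Id, a.Conclusion
    FROM Audit AS a
    INNER JOIN (
        SELECT DocuVersion, MAX(Id) AS MaxAuditID
        FROM Audit
        GROUP BY DocuVersion
    ) AS m
      ON a.DocuVersion = m.DocuVersion AND a.Id = m.MaxAuditID
    ORDER BY a.DocuVersion
""").fetchall()
print(latest)
```

Each tuple is (DocuVersion, latest audit Id, that audit's Conclusion); versions without any audit (version 3 here) drop out because the join is an inner join.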

Select all values from one table, check another table to see related columns and fetch more values

I really don't know how to phrase my question, which is probably why Google is not giving me the results I need, but I am going to try.
I have two tables, files_required and submitted_files. I have a page where I want to display to a user all the files required for submission and show which of them he/she has submitted.
Required files table is as follows:
file_id  file_name                      mandatory
1        Registration Certificate       0
2        KRA Clearance                  1
3        3 Months Tax returns           0
4        Business Permit                1
5        Tour Permit                    1
6        Country Govt Operating License 0
7        Certificate of good Conduct    0
file_id is unique, and the mandatory column is a binary flag stating whether the file is mandatory before registration.
The submitted_files table is as follows:
file_id user_id file_required_id original_file_name file_name_on_server submission_date
1 2 2 KRA_Form.docx 0a10f5291e9bcb6a345ac7a8f5705b8a.docx 2016-11-01
2 2 3 Tax_returns.docx 9f04361013df7e25235a03c506f347ed.docx 2016-11-03
3 3 3 Taxes.docx 86aea74cc87fb669510d9d4c488cbcf8.docx 2016-11-04
file_id is a unique auto-increment (AI) value, user_id is the ID of the logged-in user who submitted the file, and file_required_id refers to the files_required.file_id column.
When fetching the values I already have a user_id (in this case, let's use user_id = 2). Now I want to fetch all rows of the files_required table and check the submitted_files table for files with user_id = 2, i.e. the files this user has submitted.
My SQL query is as follows:
SELECT files_required.*, submitted_files.* FROM submitted_files
RIGHT JOIN files_required ON files_required.file_id = submitted_files.file_required_id
WHERE submitted_files.user_id = 2
This gives me only the two rows where the user_id matched, but I want all the rows of files_required, showing which files the user has submitted. Could someone kindly assist?
In the meantime, I am fetching the files_required table first and then looping over the other table in a PHP script to look for the given user's submitted files. It works, but it's not what I wanted; it's cumbersome and a rookie move.
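Try putting the user_id condition in the RIGHT JOIN's ON clause itself, as in the query below. A condition on submitted_files in the WHERE clause discards the rows where submitted_files is NULL, which turns the outer join back into an inner join.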
SELECT files_required.*, submitted_files.*
FROM submitted_files
RIGHT JOIN files_required ON files_required.file_id = submitted_files.file_required_id
AND submitted_files.user_id = 2
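To see the difference between the two placements of the user filter, the sketch below loads the sample rows into Python's sqlite3 (assumed here as a stand-in for MySQL) and runs the filter once in the ON clause and once in WHERE. SQLite only gained RIGHT JOIN in version 3.39.0, so the sketch uses the equivalent LEFT JOIN with the two tables swapped; only the columns needed for the comparison are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files_required (file_id INTEGER PRIMARY KEY, file_name TEXT, mandatory INTEGER)")
conn.execute("CREATE TABLE submitted_files (file_id INTEGER PRIMARY KEY, user_id INTEGER, "
             "file_required_id INTEGER, original_file_name TEXT)")
conn.executemany("INSERT INTO files_required VALUES (?, ?, ?)", [
    (1, "Registration Certificate", 0), (2, "KRA Clearance", 1),
    (3, "3 Months Tax returns", 0), (4, "Business Permit", 1),
    (5, "Tour Permit", 1), (6, "Country Govt Operating License", 0),
    (7, "Certificate of good Conduct", 0),
])
conn.executemany("INSERT INTO submitted_files VALUES (?, ?, ?, ?)", [
    (1, 2, 2, "KRA_Form.docx"), (2, 2, 3, "Tax_returns.docx"), (3, 3, 3, "Taxes.docx"),
])

# Filter in ON: every required file survives; non-matches get NULL columns.
in_on = conn.execute("""
    SELECT r.file_id, r.file_name, s.original_file_name
    FROM files_required AS r
    LEFT JOIN submitted_files AS s
      ON s.file_required_id = r.file_id AND s.user_id = ?
    ORDER BY r.file_id
""", (2,)).fetchall()

# Filter in WHERE: NULL user_id fails the test, so unmatched files vanish.
in_where = conn.execute("""
    SELECT r.file_id, r.file_name, s.original_file_name
    FROM files_required AS r
    LEFT JOIN submitted_files AS s
      ON s.file_required_id = r.file_id
    WHERE s.user_id = ?
    ORDER BY r.file_id
""", (2,)).fetchall()

print(len(in_on), len(in_where))
for file_id, name, submitted in in_on:
    print(file_id, name, "submitted" if submitted else "missing")
```

The ON version yields one row per required file, which is what the page needs; the user's own submission for file 3 appears, while user 3's upload of the same file is excluded by the join condition.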
You want this:
SELECT submitted_files.user_id, files_required.*, submitted_files.*
FROM submitted_files
RIGHT JOIN files_required ON files_required.file_id = submitted_files.file_required_id
Don't put the condition on user_id in the WHERE clause, as it will filter the data down to that user's rows. You want all the records, with the user still visible, so just include user_id in the SELECT list.

MS-Access 2010 DELETE Query LEFT JOIN

There's a lot of these issues floating around the net with many solutions, but I'm really struggling with this one.
I have a table [BaseHrs] which looks a little like this -
p_ID b_Person WeekNos HrsRequired
1 A 2016-39 10
1 A 2016-40 10
1 A 2016-41 10
1 A 2016-42 10
1 B 2016-39 11
1 B 2016-40 11
1 B 2016-41 12
1 B 2016-42 09
The table continues with different p_ID, people & week numbers. There is no Primary Key and no indexing. This table also has no relationship with any other table.
It is populated from a Query connected to another table as well as a form for the [HrsRequired] field.
Scenario -
Project 1 (p_ID = 1) has now been brought forward by two weeks, and the BaseHrs table no longer needs the rows for [WeekNos] 2016-41 and 2016-42.
I first use a query (qry_SelectNewDates) to show which weeks the project now runs in.
I started my delete query by first creating a select query, which looks like this:
SELECT BaseHrs.*
FROM BaseHrs
LEFT JOIN qry_SelectNewDates
ON BaseHrs.WeekNos = qry_SelectNewDates.WeekNos
WHERE (((BaseHrs.p_ID)=[Forms]![frm_Projects]![p_ID])
AND ((BaseHrs.WeekNos) Not In ([qry_SelectNewDates].[WeekNos])));
This works as intended.
Converting that into a delete query produces an error, though. The delete query:
DELETE BaseHrs.*, BaseHrs.p_ID, BaseHrs.WeekNos
FROM BaseHrs
LEFT JOIN qry_SelectNewDates
ON BaseHrs.WeekNos = qry_SelectNewDates.WeekNos
WHERE (((BaseHrs.p_ID)=[Forms]![frm_Projects]![p_ID])
AND ((BaseHrs.WeekNos) Not In ([qry_SelectNewDates].[WeekNos])));
Error message -
Could not delete from specified tables.
I realise there is often an issue when trying to delete records this way. I've also tried using just 'DELETE.*' in the first line, without luck.
I have also made an attempt at a nested Query, but I just can't figure out how to construct it. Any guidance?
**********EDIT**********
With advice from #SunKnight0 I have added a primary key to my BaseHrs table and got this query -
DELETE *
FROM BaseHrs
WHERE b_pKey IN
(SELECT BaseHrs.b_pKey
FROM BaseHrs
LEFT JOIN qry_SelectNewDates
ON (BaseHrs.WeekNos = qry_SelectNewDates.WeekNos)
WHERE (((BaseHrs.p_ID)=[Forms]![frm_Projects]![p_ID])
AND ((BaseHrs.WeekNos) Not In ([qry_SelectNewDates].[WeekNos]))));
This query appears to work but takes a huge amount of time to run. Is that as good as it gets?
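Since the weeks to keep are already listed by qry_SelectNewDates, the delete may not need a join or a primary key at all: an uncorrelated NOT IN subquery names the rows directly. In Access that would be something like DELETE FROM BaseHrs WHERE p_ID = [Forms]![frm_Projects]![p_ID] AND WeekNos NOT IN (SELECT WeekNos FROM qry_SelectNewDates); it may or may not run faster in Access than the IN version. The sketch below checks the logic with Python's sqlite3, assuming a hypothetical table SelectNewDates in place of the saved query and a bound parameter in place of the form reference. One caveat: if the subquery ever returns a NULL, NOT IN matches nothing, hence the NOT NULL column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BaseHrs (p_ID INTEGER, b_Person TEXT, WeekNos TEXT, HrsRequired INTEGER)")
conn.executemany("INSERT INTO BaseHrs VALUES (?, ?, ?, ?)", [
    (1, "A", "2016-39", 10), (1, "A", "2016-40", 10),
    (1, "A", "2016-41", 10), (1, "A", "2016-42", 10),
    (1, "B", "2016-39", 11), (1, "B", "2016-40", 11),
    (1, "B", "2016-41", 12), (1, "B", "2016-42", 9),
])

# Hypothetical stand-in for qry_SelectNewDates: the weeks project 1 still runs on.
conn.execute("CREATE TABLE SelectNewDates (WeekNos TEXT NOT NULL)")
conn.executemany("INSERT INTO SelectNewDates VALUES (?)", [("2016-39",), ("2016-40",)])

project_id = 1  # plays the role of [Forms]![frm_Projects]![p_ID]

# No join and no key needed: delete this project's rows whose week is not kept.
conn.execute(
    "DELETE FROM BaseHrs WHERE p_ID = ? "
    "AND WeekNos NOT IN (SELECT WeekNos FROM SelectNewDates)",
    (project_id,),
)

remaining = sorted(conn.execute("SELECT b_Person, WeekNos FROM BaseHrs"))
print(remaining)
```

Only the 2016-41 and 2016-42 rows for project 1 are removed; the subquery does not depend on the outer row, so it only needs to be evaluated once rather than once per BaseHrs row.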

Conditional delete across multiple tables in mysql

I have two tables. One of them contains files, the other one actions:
|Files | |Actions |
|---------| |------------|
|FileID | |ActionID |
|Filename | |ActionDate |
|... | |... |
|---------| |------------|
One file can have several actions. Those actions happened at a certain date.
Every now and then I want to delete files together with their actions, but only if at least one of a file's actions is older than, say, one year.
For example:
File 1 has 2 actions: Both actions happened a week ago. Do not delete
File 2 has 2 actions: Both actions happened 10 years ago. Delete
File 3 has 2 actions: One of them happened 10 years ago, the other one half a year ago. Delete
I would love to do this without several steps (like selecting rows in my Perl script first and then iterating over them to delete them, or whatever).
If this is too easy, I can provide a further challenge:
There is another table, let's call it 'State'. One State can again have multiple actions, and I also want to delete all the states that are referenced by the actions that are going to be deleted.
Any hints on how to do this highly appreciated!
edit
Oh my, I just realized that deleting from multiple tables at once is highly discouraged, especially when dealing with large amounts of data.
I assume this means there is no decent way of doing this purely in SQL, correct?
For files and actions, you first need to find the files that have at least one action older than a year. This can be done with the query below:
select FileID,
sum(ActionDate < now() - interval 1 year) need_to_delete
from
Actions
group by FileID
having need_to_delete >0
This gives you the file IDs that need to be deleted from the database.
Second, you need a multi-table DELETE joined with the query above to delete from several tables in a single statement:
delete f.*,a.* from Files f
join Actions a
on(f.FileID = a.FileID)
join (
select FileID,
sum(ActionDate < now() - interval 1 year) need_to_delete
from
Actions
group by FileID
having need_to_delete >0
) fa
on(f.FileID = fa.FileID)
For deleting the states, the query above should help; I'll leave that part to you.
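A different way to remove files and their actions with one statement is to let the database cascade the delete: declare Actions.FileID as a foreign key with ON DELETE CASCADE, then delete only the flagged files. This is a swapped-in technique, not the answer's multi-table DELETE; MySQL's InnoDB engine supports ON DELETE CASCADE, but the sketch below uses Python's sqlite3 (which enforces foreign keys only after PRAGMA foreign_keys = ON) and assumes Actions has a FileID column, as the answer's queries do. Filenames and dates are invented to match the three example files.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE Files (FileID INTEGER PRIMARY KEY, Filename TEXT)")
conn.execute(
    "CREATE TABLE Actions (ActionID INTEGER PRIMARY KEY,"
    " FileID INTEGER NOT NULL REFERENCES Files(FileID) ON DELETE CASCADE,"
    " ActionDate TEXT)"  # ISO 'YYYY-MM-DD' strings sort chronologically
)

today = date.today()

def days_ago(n):
    return (today - timedelta(days=n)).isoformat()

conn.executemany("INSERT INTO Files VALUES (?, ?)",
                 [(1, "file1"), (2, "file2"), (3, "file3")])
conn.executemany("INSERT INTO Actions (FileID, ActionDate) VALUES (?, ?)", [
    (1, days_ago(7)), (1, days_ago(7)),          # File 1: keep
    (2, days_ago(3652)), (2, days_ago(3652)),    # File 2: delete
    (3, days_ago(3652)), (3, days_ago(182)),     # File 3: delete
])
cutoff = days_ago(365)

# Same idea as the answer's aggregate: count old actions per file.
FLAGGED = ("SELECT FileID FROM Actions GROUP BY FileID "
           "HAVING SUM(ActionDate < ?) > 0")
flagged = [r[0] for r in conn.execute(FLAGGED + " ORDER BY FileID", (cutoff,))]

with conn:  # commit on success, roll back on error
    # Deleting the parent rows removes their actions through the cascade.
    conn.execute(f"DELETE FROM Files WHERE FileID IN ({FLAGGED})", (cutoff,))

files_left = [r[0] for r in conn.execute("SELECT FileID FROM Files ORDER BY FileID")]
actions_left = [r[0] for r in conn.execute("SELECT FileID FROM Actions ORDER BY ActionID")]
print(flagged, files_left, actions_left)
```

Once the foreign key is in place, later cleanups only ever need to name the files; the database keeps Actions consistent.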

SQL query that keeps counting rows that have been deleted

I have a classifieds site, and I'm trying to write an SQL query that COUNTS the number of ads a user has posted in the last 7 days, but I have a problem.
I'm trying to show in user profile something like this for example: [Username] has posted 30 ads in last 7 days
Here is my SQL query:
SELECT COUNT(*)
FROM table_name
WHERE user_id = '[user_id]' AND created_date > NOW() - INTERVAL 7 DAY
In my case "table_name" contains ALL the ads from all users, and with "user_id = '[user_id]'" I show user A his number of ads, user B his number of ads, and so on.
This query counts the ads correctly, BUT if a user comes to the site and DELETEs 1, 2 or any number of his ads, that number is subtracted from "[Username] has posted 30 ads in last 7 days".
For example, say the user posted 20 ads in the last 5 days. The correct result is "[Username] has posted 20 ads in last 7 days".
Now the user comes to the site and deletes 4 ads. The result becomes "[Username] has posted 16 ads in last 7 days".
Can somebody help me? What can I add to the query so that the count still shows the correct number of ads (in my case 20), even if the ads were deleted?
Thank you
Cheers
Instead of deleting a row using a DELETE ... WHERE ... statement, add a deleted column and use an UPDATE statement:
UPDATE ... SET deleted = 1 WHERE ...
Then your counting function will work without modification.
Of course you will now have to fix all the rest of your code to not show deleted adverts. You can do this by adding WHERE NOT deleted to all your other queries. You could also create a view that only shows ads that are not deleted and update your code to query this view instead of the original table.
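A minimal sketch of the flag-plus-view approach, assuming Python's sqlite3 in place of MySQL and a hypothetical ads table standing in for table_name: twenty ads are posted over five days, four are "deleted" by setting the flag, and the 7-day count is compared with what the view still shows.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ads (id INTEGER PRIMARY KEY, user_id INTEGER,"
    " created_date TEXT, deleted INTEGER NOT NULL DEFAULT 0)"
)
# The rest of the site reads from this view, so flagged ads disappear there.
conn.execute("CREATE VIEW active_ads AS SELECT * FROM ads WHERE NOT deleted")

now = datetime.now()

def stamp(dt):
    return dt.isoformat(sep=" ", timespec="seconds")

# User 1 posts 20 ads spread over the last 5 days.
conn.executemany(
    "INSERT INTO ads (user_id, created_date) VALUES (?, ?)",
    [(1, stamp(now - timedelta(days=i % 5))) for i in range(20)],
)

# "Deleting" 4 ads only sets the flag; the rows stay in the table.
conn.execute("UPDATE ads SET deleted = 1 WHERE user_id = 1 AND id <= 4")

# The counting query is unchanged and still reads the base table.
cutoff = stamp(now - timedelta(days=7))
posted = conn.execute(
    "SELECT COUNT(*) FROM ads WHERE user_id = ? AND created_date > ?",
    (1, cutoff),
).fetchone()[0]
visible = conn.execute(
    "SELECT COUNT(*) FROM active_ads WHERE user_id = ?", (1,)
).fetchone()[0]
print(posted, visible)
```

The profile line keeps reporting 20 posted ads while listings built on the view show 16.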
Rather than deleting the ads from the system, add a "deleted" flag, or move them to a deleted table. This way you never lose the record of them.
There's nothing you can add to a query to find data that has been deleted. Instead of deleting the data, add a marker to the record the user wants to delete, indicating that it has been removed from display.
As well as allowing the record to be counted, it has the additional advantage that the user can be permitted to reinstate that advert, if he wants to.