Find and remove duplicates in a BibTeX library (BibDesk) using AppleScript

I have more than a thousand duplicates in my BibTeX library. The duplicates do not share Cite Keys, but they have identical titles.
I have tried both BibDesk and JabRef to remove the duplicates. However, neither manages to find them all, not even half of them.
I found one promising AppleScript here: http://se-server.ethz.ch/staff/af/bibdesk/
But since I am a total beginner with AppleScript, I couldn't adapt it to my needs.
Here is the AppleScript:
on run {}
CleanupDuplicates()
end run
-- IMPORTANT NOTE: The following routine is an identical copy as contained in files 'Cleanup Duplicates.scpt' and 'Fix PDF and URL Links.scpt'. Make sure the two copies are always kept identical.
on CleanupDuplicates()
set theBibDeskDocu to document 1 of application "BibDesk"
tell document 1 of application "BibDesk"
-- get all publications sorted by cite key ensuring that in any set of publications with the same cite key the youngest comes first and the oldest, typically the only one of the set that is still member of any static groups, comes last. To retain static group memberships we have to ensure that such "membership info" is copied from the last to the first publication of any set of publications with the same cite key (see vars 'aPub', 'prevPub', 'youngestPub').
set thePubs to (sort (get publications) by "Cite Key" subsort by "Date-Added" without ascending)
set theDupes to {}
set prevCiteKey to missing value
set prevPub to missing value
set youngestPub to missing value
repeat with aPub in thePubs
set aCiteKey to cite key of aPub
ignoring case
if aCiteKey is prevCiteKey then
set end of theDupes to aPub
-- we fix the static group membership redundantly in cases where aPub is also merely an obsolete duplicate, since we have possibly not yet advanced to the end of the set with the same cite key. But this is unavoidable with this algorithm looping simply through all publications. The end result will be that youngestPub (first in set of publications with same cite key) will be member of all static groups of the publications in the set (unification). The latter should be no big issue, since typically in multiple sets of publications it is only the last publication that matters. If this should be an issue, then we would need to first delete all static group membership info in 'youngestPub' in case we encounter a 3rd, or 4th etc. same cite key in 'aPub', and copy only those of 'aPub'. However, for the sake of efficiency I wish not to support this behavior.
my fixGroupMembership(theBibDeskDocu, aCiteKey, aPub, youngestPub)
else
-- remember in 'youngestPub' possible candiate for a new set of publications with the same cite key
set youngestPub to aPub
end if
end ignoring
set prevCiteKey to aCiteKey
set prevPub to aPub
end repeat
repeat with aPub in theDupes
delete aPub
end repeat
end tell
end CleanupDuplicates
on fixGroupMembership(theBibDeskDocu, theCiteKey, oldPub, newPub)
tell application "BibDesk"
tell theBibDeskDocu
set thePubsGroups to (get static groups whose publications contains oldPub)
if (count of thePubsGroups) is greater than 0 then
repeat with aGroup in thePubsGroups
add newPub to aGroup
end repeat
end if
end tell
end tell
end fixGroupMembership
So, what I want is to find the duplicates by Title and to delete the oldest of each set (that is, oldest by modification date).
Can you help me modify this script, please?

Use this script:
on run {}
    CleanupDuplicates()
end run
on CleanupDuplicates()
    script o
        property thePubs : {}
    end script
    tell document 1 of application "BibDesk"
        -- get all publications sorted by Title (same titles are sorted by Date-Modified, descending)
        set o's thePubs to (sort (get publications) by "Title" subsort by "Date-Modified" without ascending)
        set tc to count o's thePubs
        set i to 1
        repeat while i < tc
            set theTitle to title of item i of o's thePubs
            repeat with j from (i + 1) to tc -- check the next title
                considering case -- match the case, *** remove this if you want to ignore the case
                    if (title of item j of o's thePubs) is not theTitle then exit repeat --- not the same title, so exit this loop ---
                end considering
                delete item j of o's thePubs --- the title is the same, so remove this publication (a duplicate, oldest modification date) ---
            end repeat
            set i to j
        end repeat
    end tell
end CleanupDuplicates
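For readers who want to check the logic outside BibDesk, here is a minimal Python sketch of the same idea: sort by title, keep the most recently modified entry of each title, and collect the rest as duplicates. It works on plain dictionaries, not on BibDesk objects; the function name `remove_title_duplicates` and the keys `cite_key`, `title` and `date_modified` are hypothetical and chosen only for this illustration.

```python
from datetime import datetime


def remove_title_duplicates(pubs, ignore_case=False):
    """Return (kept, removed): per title, keep the most recently modified entry."""
    def key(pub):
        title = pub["title"]
        return title.lower() if ignore_case else title

    # Newest first, then a stable sort by title: within one title the newest stays first.
    ordered = sorted(pubs, key=lambda p: p["date_modified"], reverse=True)
    ordered.sort(key=key)

    kept, removed = [], []
    previous = object()  # sentinel that never equals a real title
    for pub in ordered:
        if key(pub) == previous:
            removed.append(pub)  # same title as the entry before it: an older duplicate
        else:
            kept.append(pub)
            previous = key(pub)
    return kept, removed


# Made-up sample data for illustration only.
pubs = [
    {"cite_key": "a1", "title": "On Sorting", "date_modified": datetime(2015, 1, 1)},
    {"cite_key": "a2", "title": "On Sorting", "date_modified": datetime(2016, 9, 10)},
    {"cite_key": "b1", "title": "Graphs", "date_modified": datetime(2014, 5, 5)},
]
kept, removed = remove_title_duplicates(pubs)
print([p["cite_key"] for p in kept])     # ['b1', 'a2']
print([p["cite_key"] for p in removed])  # ['a1']
```

Passing `ignore_case=True` corresponds to removing the `considering case` block in the AppleScript.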
Update
Caveat: some publications have no modification date.
To sort publications by modification date properly, you need to define the Date-Modified field on publications that have not been modified.
An AppleScript can't change the date property of a publication in BibDesk because these dates are read-only.
Here's a solution:
Close the document in BibDesk.
Open the ".bib" file in the "TextWrangler" application.
Run this script:
--
-- This script adds a modification date to publications that have no "Date-Modified" field; the date is copied from "Date-Added".
-- Open the ".bib" file in "TextWrangler", then run this script.
tell application "TextWrangler"
tell text document 1
select line 1 -- to start the search at the beginning of the document
repeat -- until not found
-- search "Date-Added" + (a blank line or the end of the document)
set r to find "(?s)^\\tDate-Added = {.+?(^$|\\z)" searching in it options {search mode:grep, wrap around:false} with selecting match
if found of r then
if "Date-Modified = {" is not in (found text of r) then -- the Date-Modified field is not in this publication
set x to startLine of found object of r
set t to text 12 thru -1 of (get contents of line x) -- get the value of the Date-Added field --> " = {2016.09.10 03:34}," as example
add suffix (line x) suffix "\\n\\tDate-Modified" & t -- append (a line break + a tab + "Date-Modified" + the value of the Date-Added) to this line
end if
else
exit repeat -- not found, or end of the document
end if
end repeat
end tell
end tell
From TextWrangler, Save or "Save as..." and close the document.
Open the ".bib" file in BibDesk.
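If TextWrangler is not available, the same text transformation can be sketched in Python. This is only an illustration under stated assumptions, not a tested replacement for the script above: it assumes entries are separated by blank lines, that each field sits on its own line, and that Date-Added is not the last field of an entry (otherwise the entry's closing brace would be copied too). The function name `add_missing_date_modified` and the sample entries are made up.

```python
import re

# Matches one "Date-Added = {...}," line and captures its indentation and value part.
DATE_ADDED = re.compile(r"^([ \t]*)Date-Added( = \{.*)$", re.MULTILINE)


def add_missing_date_modified(bib_text):
    """Copy Date-Added into a new Date-Modified line for entries that lack one."""
    parts = re.split(r"(\n\s*\n)", bib_text)  # keep the blank-line separators
    for i, entry in enumerate(parts):
        if "Date-Added = {" in entry and "Date-Modified = {" not in entry:
            parts[i] = DATE_ADDED.sub(
                r"\1Date-Added\2\n\1Date-Modified\2", entry, count=1
            )
    return "".join(parts)


sample = (
    "@article{a,\n"
    "\tAuthor = {X},\n"
    "\tDate-Added = {2016-09-10 03:34:00 +0000},\n"
    "\tTitle = {T}}\n"
    "\n"
    "@article{b,\n"
    "\tDate-Added = {2015-01-01 00:00:00 +0000},\n"
    "\tDate-Modified = {2015-02-02 00:00:00 +0000},\n"
    "\tTitle = {U}}\n"
)
result = add_missing_date_modified(sample)
print(result)
```

As with the TextWrangler route, work on a copy of the ".bib" file and keep the document closed in BibDesk while it is being edited.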

Related

How Can I Get a Row From a sheet and Use it to Add or Update a New Row on Another Sheet? (Python 3)

I have been looking for a solution to the situation I've briefly described in the title. I'm trying to use a smartsheet_client.Sheets.get_row() request from one sheet, take that data and either add it or update it to a new row on a sheet where I can use a location specifying attribute to place it in a spot that isn't just the bottom of the sheet. I know that I can copy rows from one sheet and paste them to another with code, but I am trying to bypass the "only copy at the end of the sheet" limitation. Is this even possible, or am I grasping at straws? Here is my code that I have been working with:
row_a = smartsheet_client.Sheets.get_row(
2896510686914436, # sheet_id
6830091038549892, # row_id
include='discussions,attachments,columns,columnType'
)
row_a.sibling_id = 3539932199446404
#if row_a.isinstance('parent_id',int)
#row_a.created_at = new_time
#row_a.modified_at = new_time
row_a.above = False
#row_a.row_number = None
#row_a.parent_id = None
row_a.id = 7015416612448132 #id of new row
# Add rows to sheet
response = smartsheet_client.Sheets.update_rows_with_partial_success(
731362710841220, # sheet_id of sheets we are adding to
[row_a]
)
print("Done!")
print(row_b)
There's a lot going on right now, but my original plan was to get_row then append the sibling_id and .above of where I want my new row to go, and then simply add a row of the row object I just built. Instead, I received parsing errors:
`{"response": {"statusCode": 400, "reason": "Bad Request", "content": {"errorCode": 1008, "message": "Unable to parse request. The following error occurred: Field \"createdAt\" was of unexpected type.", "refId": "1eyronnlz32sw"}}}`
My next thoughts were to append the created_at attribute to be the same as real time, but then modified_at started becoming the error. So I did the same thing again, and then the new error became "invalid row location: specify above or below with siblingId You cannot use other location specifiers in the same request."
No matter what I seem to do from this point, nothing works. Even if I set "other location specifiers" like row_number and parent_id to None, I'm just told that "The attribute(s) row.id, row.createdAt, row.modifiedAt, row.columns[], row.sheetId, row.version, row.accessLevel are not allowed for this operation."
Nothing seems to be just quite right for this operation. If anyone can offer any insight relating to my situation or just helpful tips in general, I am all ears.
Thank you!

I would not recommend trying to use the response from a Get Row operation to create a new row (Add Row) in a sheet. Reason being -- you're likely to encounter issues (as you've described) caused by the fact that not all row attributes can be set.
For example, createdAt and modifiedAt will be included in the Get Row response, but cannot be specified in an Add Row or Update Row request -- because they are read-only attributes that are set automatically by Smartsheet when a row is created or modified.
That's exactly what the error message "The attribute(s) row.id, row.createdAt, row.modifiedAt, row.columns[], row.sheetId, row.version, row.accessLevel are not allowed for this operation." is trying to tell you -- those are all read-only row attributes that are set automatically by Smartsheet -- trying to set them via an Add Row or Update Row request will always result in this error.
Copying a row from one sheet to a specified position in another sheet can be accomplished by a two-step process:
1) Issue a Copy Rows to another sheet request to append a copy of the specified row to the bottom of the other sheet. (Note the id of the newly created row that's included in the response, as you'll use it in step #2.)
2) Issue an Update Rows request -- containing attributes as described in Specify Row Location -- to move the newly created row to the desired location in the other sheet.
Here's some sample code that implements the 2-step process I've described.
# specify source info
source_sheet_id = 3932034054809476
source_row_id = 3812039265019780
# specify destination info
destination_sheet_id = 8428033158735748
'''
STEP 1:
Copy row from source sheet to (bottom of) destination sheet
'''
# copy row from source sheet to (bottom of) destination sheet
# (include everything -- i.e., attachments, children, and discussions)
response = smartsheet_client.Sheets.copy_rows(
    source_sheet_id,
    smartsheet.models.CopyOrMoveRowDirective({
        'row_ids': [source_row_id],
        'to': smartsheet.models.CopyOrMoveRowDestination({
            'sheet_id': destination_sheet_id
        })
    }),
    'all'
)
# get the id of the newly created row
destination_row_id = response.row_mappings[0].to
'''
STEP 2:
Move new row from the bottom of the destination sheet
to the desired location within that sheet. This example moves the row
to directly below the specified sibling row.
'''
# specify id of row that should appear directly above the row I'm moving
sibling_row_id = 3620387999115140
# build the row to update (move)
# 'id' specifies the id of the newly created row that I now want to move
# 'sibling_id' specifies the id of the sibling row that the row should be placed next to
# relative position (to sibling row) is not specified, so it will default to 'below'
row_to_move = smartsheet.models.Row()
row_to_move.id = destination_row_id
row_to_move.sibling_id = sibling_row_id
# update the row to change its location
updated_row = smartsheet_client.Sheets.update_rows(
    destination_sheet_id,
    [row_to_move]
)
This should get you headed in the right direction with things. If you have trouble with step 2 (successfully using location specifier attributes to move the row to the desired location), please post a new question here on Stack Overflow.

Access writing to wrong row number

4150
NRrows = RSNonResourceCosts.RecordCount ' Number of Rows in Non Resource Table
NRCols = RSNonResourceCosts.Fields.Count ' Number of Fields in NonResource Table
Dim CL(1 To 10) As Integer ' This is to count "filled rows" when spreadsheet is filled
Dim Header(1 To 10) As String
'-----------
'Find the Headers (Taken from Actual Table and not predefined as original)
For Each Recordsetfieldx In RSNonResourceCosts.Fields
If C > 0 Then
Header(C) = Recordsetfieldx.Name
End If
C = C + 1
Next Recordsetfieldx
4170
R = 0
'Write to worksheet
RSNonResourceCosts.MoveFirst
Do Until RSNonResourceCosts.EOF
For C = 1 To NRCols - 1
FieldName = RSNonResourceCosts.Fields(C).Value
If RSNonResourceCosts.Fields(Header(C)).Value <> "" Then
CL(C) = CL(C) + 1
WKS.Cells(200 + R, C) = RSNonResourceCosts.Fields(Header(C)).Value
End If
Next C
RSNonResourceCosts.MoveNext
R = R + 1
Loop
I have attached the code. I solved part of the original problem by defining the Recordset. The user can add a column to the table. The first part of the code determines the headers; the second part reads the values and writes them to the worksheet. The new rows appear first on the worksheet and in the wrong column. I tried attaching the worksheet, but it looked awful. Any help would be appreciated.

Two things:
1) The order of your records is the order they have in the recordset. If you want them in a particular order, sort them (perhaps with an ORDER BY in the underlying SQL statement).
2) For the column issue: in the first bit of code, I don't see where C is initialized. Keep in mind that the Fields collection starts at index 0, while your Header array is declared as 1 To 10. If you set Header(1) to the first field's name (index 0) but then copy the field data without shifting the index, everything will shift over by one column.
As an added note, you might want to consider what happens when you have more than 10 columns. Using fixed-length arrays means your code will break. You might want to read about using a dynamic array and ReDim.
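The two points about index alignment and fixed-size arrays can be illustrated outside VBA. The following Python sketch is not Access code; it only shows the pattern of building the header list once, letting it grow with the table, and then looking up every value by header name so that headers and values cannot drift apart by one column. The function name `build_grid`, the field names and the sample record are hypothetical.

```python
def build_grid(field_names, records, skip_first=True):
    """Return rows for a worksheet: one header row, then one row per record."""
    start = 1 if skip_first else 0           # optionally skip the first field, e.g. a key
    headers = list(field_names[start:])      # grows with the table; no fixed size of 10
    grid = [headers]
    for record in records:
        # Look each value up by name, so a column always lines up with its header.
        grid.append([record[name] for name in headers])
    return grid


# Made-up data for illustration only.
field_names = ["ID", "Item", "Cost", "Vendor"]
records = [{"ID": 1, "Item": "Laptop", "Cost": 900, "Vendor": "Acme"}]
grid = build_grid(field_names, records)
print(grid)  # [['Item', 'Cost', 'Vendor'], ['Laptop', 900, 'Acme']]
```

In VBA the equivalent of the growing list would be a dynamic array resized with ReDim, as suggested above.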

I don't feel like I have completely grasped the entirety of the problem yet, but let me take a stab at it. From what I do understand, data is being written from your recordset into Excel (good), but it is going into the 'wrong row' (question title) and the 'wrong column' (question text).
From what I see, I don't know the purpose of FieldName = RSNonResourceCosts.Fields(C).Value, but I want to make sure that you understand that RSNonResourceCosts.Fields(C).Value is not necessarily equivalent to RSNonResourceCosts.Fields(Header(C)).Value. More than that, you are likely missing at least one column altogether in your output, or at least skipping over it accidentally. rs.Fields(0).Name is the first 'column' in a recordset, but it is completely ignored in your code. Perhaps this is intentional, maybe it is a key field or something useless to you, but it is important that you are making that distinction intentionally. But, since I don't see where your code populates the headers in your worksheet, I wonder if 'wrong column' means every record has been shifted a column and your last column is sitting empty. That, coupled with the fact that C is never visibly initialized as 0 (not 1, or anything else) in your code above, makes me concerned that Header(3) could possibly be Fields(1), or Fields(4), or I don't know. That would certainly also confuse the columns in your output, or at least make dependence on FieldName frustrating.
Another thing, really a shot in the dark: NRrows. Depending on how I create my recordset, I have had issues with not getting the correct record count the first time. And if I base the population of a worksheet, array, etc. on the number of rows and each record's relative position in that number, my records get all sorts of wacky. Maybe you did this already, but since it isn't shown, I recommend a RSNonResourceCosts.MoveLast: RSNonResourceCosts.MoveFirst line before you define NRrows, just to be sure.
And last, if I am way off base here... then you really are going to have to show us the spreadsheet, even if it isn't your most beautiful work. We all know that if it were, you wouldn't be asking about it here... so set your pride aside, and be more specific as well as show us what the output looks like and how it should look.

In MS Access 2010 I'm trying to ignore a duplicate entry in a control

I want to ignore duplicate entries in specific text fields on a form. So, for example, I have 3 fields for seal entry. If the person scans a seal twice, it will ignore the duplicate and keep the focus on the field until a different number is entered. I cobbled together some code that works for the first and second entry, but not the third. When I debug it seems to be finding a duplicate number even though I'm entering in a different one.
Private Sub Seal2_AfterUpdate()
If Seal2.Value = Seal1.Value Or Seal3.Value Then
Seal2.Value = Null
Seal1.SetFocus
Seal2.SetFocus
End If
End Sub
Private Sub Seal3_AfterUpdate()
If Seal3.Value = Seal1.Value Or Seal2.Value Then
Seal3.Value = Null
Seal1.SetFocus
Seal3.SetFocus
End If
End Sub

You are treating the Or wrong. The comparison binds more tightly than Or, so the Or is evaluated after Seal3.Value = Seal1.Value.
So what you have written in that If statement is equivalent to
If (Seal3.Value = Seal1.Value) Or Seal2.Value Then
And since Seal2 has a value, the whole condition evaluates to True.
You want something more like:
If (Seal3.Value = Seal1.Value) Or (Seal3.Value = Seal2.Value) Then
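The same grouping pitfall exists in many languages. As a hedged illustration in Python (whose `or` is not identical to VBA's `Or`, but which groups the expression the same way), the seal numbers below are made up:

```python
seal1, seal2, seal3 = "A100", "B200", "C300"  # three distinct, made-up seal numbers

# `seal3 == seal1 or seal2` is grouped as `(seal3 == seal1) or seal2`.
wrong = (seal3 == seal1) or seal2
# Each comparison has to be spelled out in full.
right = seal3 == seal1 or seal3 == seal2

print(bool(wrong))  # True, although no two seals are equal
print(right)        # False
```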
Also, you seem to be setting focus to one field and then another. You probably want to go to only one, unless you are trying to fire off triggers/events when focus is gained or lost in each one; if you are doing that, however, you might be playing with fire.
In the end your If blocks probably want to look like this:
If (Seal2.Value = Seal1.Value) Or (Seal2.Value = Seal3.Value) Then
    Seal2.Value = Null
    Seal2.SetFocus
End If
and
If (Seal3.Value = Seal1.Value) Or (Seal3.Value = Seal2.Value) Then
    Seal3.Value = Null
    Seal3.SetFocus
End If
Finally, if you find that you have to expand beyond 3 fields, you may want to take a slightly different approach that would be more scalable, for example building up a list in memory, then sorting and filtering it. But for 3 or 4 items you can get away with what you have already.
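One possible shape for such a scalable check, sketched in Python and not in VBA: it swaps the sort-and-filter idea for a simpler membership test over the other fields, which is an assumption on my part about what would be good enough here. The function name `is_duplicate` and the sample values are hypothetical.

```python
def is_duplicate(values, index):
    """True if values[index] is non-empty and also appears in another field."""
    candidate = values[index]
    if candidate in (None, ""):
        return False                      # an empty field is never a duplicate
    others = values[:index] + values[index + 1:]
    return candidate in others


# Made-up seal entries; None stands for a field that has not been filled in yet.
print(is_duplicate(["A100", "B200", "A100", None], 2))  # True
print(is_duplicate(["A100", "B200", "C300", None], 2))  # False
```

With a helper like this, each field's AfterUpdate handler would make the same call with its own position, instead of one hand-written comparison per pair of fields.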

Can I set a datafield value twice in a single mysql command

Suppose there is a column education in my table profile. I want to set education='01' where education was previously 'BA', and similarly education='02' where education was 'MD'.
So I can do this task like this
update profile set education='01' where education='BA';
update profile set education='02' where education='MD';
My question is can I do this task in one command only like
update profile set education='01' where education='BA' and set education='02' where education='MD';
This syntax is wrong. Is this possible, and if so, how?
If it is not possible, please let me know that as well.

You can use a CASE expression in the SET clause, but be careful to include an ELSE branch that sets the column to its current value -- otherwise, the rows that aren't matched by the two WHEN branches will be set to NULL.
UPDATE profile
SET education =
    CASE
        WHEN education = 'BA' THEN '01'
        WHEN education = 'MD' THEN '02'
        /* MUST include an ELSE case to set to current value,
           otherwise the non-matching will be NULLed! */
        ELSE education
    END;
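To see the effect of the ELSE branch without touching a real database, here is a small self-contained check. It uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL, which is an assumption made only so the example can run anywhere; the `name` column and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profile (name TEXT, education TEXT)")
conn.executemany(
    "INSERT INTO profile VALUES (?, ?)",
    [("ann", "BA"), ("bob", "MD"), ("cy", "PhD")],  # made-up rows
)

# One UPDATE handles both mappings; ELSE keeps every other value unchanged.
conn.execute(
    """
    UPDATE profile
    SET education = CASE
        WHEN education = 'BA' THEN '01'
        WHEN education = 'MD' THEN '02'
        ELSE education
    END
    """
)

rows = conn.execute("SELECT name, education FROM profile ORDER BY name").fetchall()
print(rows)  # [('ann', '01'), ('bob', '02'), ('cy', 'PhD')]
```

Adding `WHERE education IN ('BA', 'MD')` to the UPDATE would restrict it to the matching rows, so the untouched rows would not be rewritten at all.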

MySQL triggers: move to trash

I want to create a trigger in MySQL that will do one of two things: if the forum topic is already in the trash or is hidden, delete it; otherwise, move the topic to the trash. The question is how to stop the delete action in a 'before delete' trigger?

I don't know if there's a way to prevent the delete once it has already been called without raising an exception of some sort.
I think a better solution might be instead of calling delete on the record you want to trash/delete you should update a field such as "IsTrashed" to TRUE. And then in the update trigger, see if it was already TRUE and being set to TRUE again (e.g. IF(OLD.IsTrashed && NEW.IsTrashed)). If so, delete it, otherwise move it to the trash.
The only problem that will arise from this method is if you update a different field (e.g. PostDate) of a trashed item: NEW.IsTrashed and OLD.IsTrashed will both be TRUE, so it might look like you are trying to delete it when you are only updating the PostDate. You can either check that this is the only field that was modified (e.g. by checking OLD.SomeField <> NEW.SomeField for every other field) or use a field that will always reset its value to NULL after an UPDATE statement, something like "TrashNow". That way, if TrashNow ever has a TRUE value, you know that you did intentionally want to trash the record.
However, that "resetting field" is just wasted space. I think the best solution for this problem is a stored procedure, something like:
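-- In the mysql command-line client, wrap this definition in
-- DELIMITER // ... // DELIMITER ; so the inner semicolons do not end it early.
CREATE PROCEDURE DeletePost (IN APostID INT)
BEGIN
    -- already in the trash: delete for good; otherwise move to the trash
    IF (SELECT InTrash FROM posts WHERE PostID = APostID LIMIT 1) THEN
        DELETE FROM posts WHERE PostID = APostID;
    ELSE
        UPDATE posts SET InTrash = TRUE WHERE PostID = APostID;
    END IF;
END;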
Assuming you have the table posts with the fields PostID (INT) and InTrash (any integer type).
You would call this like so to delete post with PostID 123:
CALL DeletePost(123);
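The trash-then-delete behaviour of the procedure can be tried out with a small stand-in. This sketch uses Python's built-in sqlite3 module and an in-memory SQLite table, not MySQL; the function `delete_post` and its return values are hypothetical and exist only to make the two-stage behaviour visible.

```python
import sqlite3


def delete_post(conn, post_id):
    """Move a post to the trash, or delete it if it is already there."""
    row = conn.execute(
        "SELECT InTrash FROM posts WHERE PostID = ?", (post_id,)
    ).fetchone()
    if row is None:
        return "missing"                  # no such post
    if row[0]:
        conn.execute("DELETE FROM posts WHERE PostID = ?", (post_id,))
        return "deleted"
    conn.execute("UPDATE posts SET InTrash = 1 WHERE PostID = ?", (post_id,))
    return "trashed"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (PostID INTEGER PRIMARY KEY, InTrash INTEGER DEFAULT 0)"
)
conn.execute("INSERT INTO posts (PostID) VALUES (123)")

first = delete_post(conn, 123)   # first call moves the post to the trash
second = delete_post(conn, 123)  # second call removes it for good
third = delete_post(conn, 123)   # nothing left to act on
print(first, second, third)  # trashed deleted missing
```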
