AppleScript parse HTML

Goal: I have a long list of purchase orders, and I need to refund the ones whose details contain the word "From:". Each purchase has an order number. I have been trying to parse (or loop over, I'm not sure of the right word) the HTML document: search it for each instance of "From:" and, if found, get the order number and print it to a text file.
So far I have learned, more or less, how to use text item delimiters, but when I tried making the script loop or repeat, I just got the same order number over and over. This is before even trying to implement the part that says: IF the order details include the word "From:", THEN get the order number.
set astid to AppleScript's text item delimiters
set x to 2
set startHere to "p class=\"webOrderNumber\">"
set stopHere to "</p>"
set entireText to (do shell script "curl file:///Users/Michael/Desktop/StorePurchases.webarchive")
set AppleScript's text item delimiters to startHere
set blurb1 to text item x of entireText
repeat with i from 2 to (count of blurb1)
set thisItem to item i of blurb1
set AppleScript's text item delimiters to stopHere
set blurb2 to text item 1 of blurb1
set AppleScript's text item delimiters to astid
set writeFile to ((path to desktop as text) & "test_output") --
set writeData to blurb2
set success to appendDataToFile(writeData, writeFile)
end repeat
on appendDataToFile(myData, aFile)
set OA to (open for access aFile with write permission)
try
write (myData & (ASCII character 10)) as text to OA starting at eof
close access OA
return true
on error
try
close access OA
end try
return false
end try
end appendDataToFile
Here is an excerpt of the HTML:
</td>
<td class="OrderNumber">
<p class="webNumber">BHWL123456</p>
</p>
<td class="OrderNumber">
<p class="details">
Full Version: Get all Activities & Remove ads! [From: Pony Resort: MakeverMagic] </p>
</td>
This is going to be used when there are hundreds of orders, which would take too long to search by hand for the word "From:" and write down each order number.

Try:
set resultNumbers to {}
set myText to read file ((path to temporary items as text) & "test.txt") as «class utf8»
set {TID, text item delimiters} to {text item delimiters, "<p class=\"webNumber\">"}
set webOrderNumbers to text items 2 thru -1 of myText
set AppleScript's text item delimiters to "</p>"
repeat with wNumber in webOrderNumbers
set end of resultNumbers to text item 1 of wNumber
end repeat
set text item delimiters to TID
return resultNumbers
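The script above collects every order number but does not yet check for "From:". As a rough sketch of that extra step (written in Python only to illustrate the logic, not as AppleScript), the same split-on-delimiter idea can keep just the orders whose details mention the word. The function name orders_with_from and the second order number BHWL654321 are made up for the example, and it assumes each order's details follow its number as in the excerpt.

```python
START = '<p class="webNumber">'
STOP = "</p>"

def orders_with_from(html, needle="From:"):
    """Return order numbers whose following details mention `needle`."""
    matches = []
    # Every chunk after the first one begins with an order number.
    for chunk in html.split(START)[1:]:
        number, _, rest = chunk.partition(STOP)
        # `rest` runs up to the next order number, so it holds this order's details.
        if needle in rest:
            matches.append(number.strip())
    return matches

# Sample based on the excerpt; the second order is hypothetical.
sample = """
<td class="OrderNumber">
<p class="webNumber">BHWL123456</p>
<p class="details">
Full Version: Get all Activities & Remove ads! [From: Pony Resort: MakeverMagic] </p>
</td>
<td class="OrderNumber">
<p class="webNumber">BHWL654321</p>
<p class="details">
Full Version: Get all Activities & Remove ads! </p>
</td>
"""

print(orders_with_from(sample))
```

In AppleScript the same test could be made inside the repeat loop with "if wNumber contains "From:" then", since each text item there also runs up to the next order number.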

As far as I know, this question was asked and solved on Macscripter.net using an AppleScript script library plus some additional AppleScript that I provided in that thread.
The script library uses NSRegularExpression to search between two text targets and return all occurrences in one go.
I can post the code here if anyone is interested, or you can visit the thread.
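The library code itself is not shown here, so the following is only a sketch of the general technique it is described as using: a regular expression that captures whatever lies between two literal targets and returns every occurrence at once. It is written in Python rather than with NSRegularExpression, and the helper name between and the second order number are assumptions for illustration.

```python
import re

def between(text, start, stop):
    """Return every substring found between `start` and `stop`, in order."""
    # Escape both targets so they are matched literally; (.*?) is non-greedy
    # so each match ends at the nearest `stop`.
    pattern = re.escape(start) + r"(.*?)" + re.escape(stop)
    return re.findall(pattern, text, flags=re.DOTALL)

html = '<p class="webNumber">BHWL123456</p><p class="webNumber">BHWL999999</p>'
print(between(html, '<p class="webNumber">', "</p>"))
```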

Related

Check for duplicates when using tab-separated txt file to generate iTunes playlist

I'm using the following AppleScript code to generate iTunes playlists from a tab-separated .txt file. Is there a way to have it check whether each track is already in the target playlist before adding it?
set thisTSVFile to (choose file with prompt "Select the CSV file")
readTabSeparatedValuesFile(thisTSVFile)
set theList to readTabSeparatedValuesFile(thisTSVFile)
tell application "iTunes"
set myPlaylist to playlist "InSLtxtLists"
set sourcePlaylist to playlist "Music"
end tell
repeat with i from 2 to number of items in readTabSeparatedValuesFile(thisTSVFile)
--gets first column
set theName to item 1 of item i of theList
--gets second
set theArtist to item 2 of item i of theList
tell application "iTunes"
try
duplicate (some track of sourcePlaylist whose name is theName and artist is theArtist) to myPlaylist
on error the errorMessage
set
end try
end tell
delay 0.1
end repeat
on readTabSeparatedValuesFile(thisTSVFile)
try
set dataBlob to (every paragraph of (read thisTSVFile))
set the tableData to {}
set AppleScript's text item delimiters to tab
repeat with i from 1 to the count of dataBlob
set the end of the tableData to (every text item of (item i of dataBlob))
end repeat
set AppleScript's text item delimiters to ""
return tableData
on error errorMessage number errorNumber
set AppleScript's text item delimiters to ""
error errorMessage number errorNumber
end try
end readTabSeparatedValuesFile
For reference, each line in the .txt file is formatted like this:
'Allo 'Allo! Theme (David Croft & Roy Moore) Jack Emblow 1982
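One way to think about the duplicate check is as a set-membership test on (name, artist) pairs: build the set from what the playlist already holds, then skip any row whose pair is already in it. The Python sketch below shows only that logic and does not talk to iTunes. The function name tracks_to_add, the header row, and the sample rows are assumptions; it takes name and artist from the first two columns, as the script does.

```python
import csv
import io

def tracks_to_add(tsv_text, already_in_playlist):
    """Return (name, artist) pairs from the TSV that the playlist lacks."""
    # Compare case-insensitively, as AppleScript's `is` does by default.
    seen = {(n.lower(), a.lower()) for n, a in already_in_playlist}
    to_add = []
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    next(reader, None)  # skip the header row, like starting the loop at item 2
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or short lines
        key = (row[0].lower(), row[1].lower())
        if key in seen:
            continue  # already in the playlist, or repeated in the file
        seen.add(key)
        to_add.append((row[0], row[1]))
    return to_add

# Hypothetical file contents and playlist state.
tsv = ("Name\tArtist\tYear\n"
       "'Allo 'Allo! Theme\tJack Emblow\t1982\n"
       "Some Song\tSome Artist\t1990\n"
       "'Allo 'Allo! Theme\tJack Emblow\t1982\n")
existing = [("Some Song", "Some Artist")]
print(tracks_to_add(tsv, existing))
```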

Splitting CSV column data into new CSV file using VBScript

I have a CSV file where 2 columns contain several different text values e.g.
Column 1: Reptiles, Health, Hygiene
Column 2: Purity
I need to use VBscript to split these columns into a new CSV file without changing the current file, expected output in new CSV file shown below:
Column 1 Column 2
Reptiles Reptiles
Health Health
Hygiene Hygiene
Purity Purity
Unfortunately(?) it must be done with VBScript and nothing else.
The data repeats consistently, with some extra entries, through the same two columns of the original file.
The output needs to continue until every unique entry from Columns 1 and 2 of the original file appears once in Column 1 of the new file, copied into Column 2 of the same row.
Examples in text format, as requested:
Original file:
Column 1,Column 2
"Reptiles, Health, Hygiene",Purity
New File:
Column 1,Column 2
Reptiles,Reptiles
Health,Health
Hygiene,Hygiene
Purity,Purity
I think this is a simple matter of using the FileSystemObject with Split function.
Assuming each input line is just one set of data you can remove the double quotes and process from there
Try this VB script out (edited to process header line separately):
Const Overwrite = True
Set ObjFso = CreateObject("Scripting.FileSystemObject")
Set ObjOutFile = ObjFso.CreateTextFile("My New File Path", Overwrite)
Set ObjInFile = ObjFso.OpenTextFile("My Old File Path")
' Skip processing first header line and just write it out as is
strLine = ObjInFile.ReadLine
ObjOutFile.WriteLine strLine
Do Until ObjInFile.AtEndOfStream
' Remove all double quotes to treat this as one set of data
strLine = Replace(ObjInFile.ReadLine, """","")
varData = Split(strLine,",")
' Write out each element twice into its own line
For i = 0 To UBound(varData)
' Trim the space left after each comma in the quoted list
ObjOutFile.WriteLine Trim(varData(i)) & "," & Trim(varData(i))
Next
Loop
ObjInFile.Close
ObjOutFile.Close
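The question also asks that only unique entries be written, which the script above does not enforce. As a sketch of that extra step (in Python purely to show the logic, since the real solution must stay in VBScript), the values can be tracked in a set as they are written. The function name split_columns is made up for the example; in VBScript a Scripting.Dictionary could play the role of the set.

```python
import csv
import io

def split_columns(src_text):
    """Split comma-separated cell values into one row per unique value."""
    reader = csv.reader(io.StringIO(src_text))
    rows = [next(reader)]          # keep the header line as is
    seen = set()
    for record in reader:
        for cell in record:
            for value in cell.split(","):
                value = value.strip()
                if value and value not in seen:
                    seen.add(value)
                    rows.append([value, value])  # same value in both columns
    return rows

original = 'Column 1,Column 2\n"Reptiles, Health, Hygiene",Purity\n'
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(split_columns(original))
print(out.getvalue())
```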

Convert Hyperlinks to HTML code in Excel

I have a column of hyperlinks in an Excel file and I want to convert them to their respective HTML code:
<a href="link">Link Name</a>
I found ways to extract the link only (as text), but I need the whole HTML code as text to replace the hyperlink in the cell.
I've searched and searched but no one needed this answer, I guess. Can someone help?
It is actually a fairly straightforward method to yank the .Address and optional .SubAddress from the Hyperlinks collection object. The .TextToDisplay property is simply the value or text of the cell.
Sub html_anchors()
Dim a As Range, u As String, l As String
Dim sANCHOR As String: sANCHOR = "<a href=""%U%"">%L%</a>"
For Each a In Selection
With a
If CBool(.Hyperlinks.Count) Then
l = .Text
u = .Hyperlinks(1).Address
If Right(u, 1) = Chr(47) Then u = Left(u, Len(u) - 1)
.Hyperlinks(1).Delete
.Value = Replace(Replace(sANCHOR, "%U%", u), "%L%", l)
End If
End With
Next a
End Sub
Select all of the cells you want to process and run the routine. If any cell in your selection does not contain a hyperlink, it will be ignored.
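The macro fills a text template with the link address and the cell text. A small sketch of that substitution outside Excel, in Python, is below; it assumes the intended output is a plain HTML anchor, and the function name anchor and the sample URL are hypothetical. Unlike the macro, it also escapes characters such as & that are special in HTML.

```python
from html import escape

def anchor(url, label):
    """Build an HTML anchor from a link address and its display text."""
    if url.endswith("/"):
        url = url[:-1]  # drop a trailing slash, as the macro does
    return '<a href="{}">{}</a>'.format(escape(url, quote=True), escape(label))

print(anchor("https://example.com/docs/", "Docs & Guides"))
```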

Issue with ruby parsing

I'm having a slight problem parsing a website with Nokogiri in Ruby.
Here is what the site looks like:
<div id="post_message_111112" class="postcontent">
Hee is text 1
here is another
</div>
<div id="post_message_111111" class="postcontent">
Here is text 2
</div>
Here is my code to parse it
doc = Nokogiri::HTML(open(myNewLink))
myPost = doc.xpath("//div[@class='postcontent']/text()").to_a()
ii=0
while ii!=myPost.length
puts "#{ii} #{myPost[ii].to_s().strip}"
ii+=1
end
My problem is in how it displays: because of the new line after "Hee is text 1", to_a splits the first div into two elements, like so:
myPost[0] = hee is text 1
myPost[1] = here is another
myPost[2] = here is text 2
I want each div to be its own message, like:
myPost[0] = hee is text 1 here is another
myPost[1] = here is text 2
How would I solve this? Thanks.
UPDATED
I tried
myPost = doc.xpath("//div[@class='postcontent']/text()").to_a()
myPost.each_with_index do |post, index|
puts "#{index} #{post.to_s().gsub(/\n/, ' ').strip}"
end
I used post.to_s().gsub because it complained that gsub is not a method of post. But I still have the same issue. I know I'm doing it wrong; it's just wrecking my head.
UPDATE 2
I forgot to mention that the new line is a <br />, and even with
doc.search('br').each do |n|
n.replace('')
end
or
doc.search('br').remove
The issue is still there
If you look at the myPost array, you will see that each div is in fact its own message. The first just happens to include a newline-character \n. To replace it with a space, use #gsub(/\n/, ' '). So your loop looks like this:
myPost.each_with_index do |post, index|
puts "#{index} #{post.to_s.gsub(/\n/, ' ').strip}"
end
Edit:
As far as I understand it, XPath can only find nodes. The <br /> elements are child nodes of the div, so either you get multiple text nodes (one for each run of text between them) or you include the div tag itself in your search. There is surely a way to join the texts between the <br /> nodes, but I don't know it.
Until you find it, here is something that works:
Replace your XPath match with "//div[@class='postcontent']".
Adjust your loop to delete the div tags:
myPost.each_with_index do |post, index|
post = post.to_s
post.gsub!(/\n/, ' ')
post.gsub!(/^<div[^>]*>/, '') # delete opening div tag
post.gsub!(%r|</\s*div[^>]*>|, '') # delete closing div tag
puts "#{index} #{post.strip}"
end
Here, let me clean that up for you:
doc.search('div.postcontent').each_with_index do |div, i|
puts "#{i} #{div.text.gsub(/\s+/, ' ').strip}"
end
# 0 Hee is text 1 here is another
# 1 Here is text 2
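The approach above takes the whole text of each div and collapses every run of whitespace into a single space. For comparison only, here is a rough sketch of the same idea using Python's standard-library HTML parser instead of Nokogiri; the class name PostCollector is made up, and the sample page assumes the <br /> mentioned in the question sits after the first line.

```python
from html.parser import HTMLParser

class PostCollector(HTMLParser):
    """Collect the whitespace-normalised text of each div.postcontent."""

    def __init__(self):
        super().__init__()
        self.posts = []
        self._depth = 0    # > 0 while inside a postcontent div
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "br" and self._depth:
            self._parts.append(" ")       # treat a line break as whitespace
        elif tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self._depth:
                self._depth += 1          # nested div inside a post
            elif "postcontent" in classes:
                self._depth = 1
                self._parts = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Join the pieces, then squeeze all whitespace to single spaces.
                self.posts.append(" ".join("".join(self._parts).split()))

    def handle_data(self, data):
        if self._depth:
            self._parts.append(data)

page = """
<div id="post_message_111112" class="postcontent">
Hee is text 1<br />
here is another
</div>
<div id="post_message_111111" class="postcontent">
Here is text 2
</div>
"""
collector = PostCollector()
collector.feed(page)
for i, post in enumerate(collector.posts):
    print(i, post)
```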

Importing CSV file into iCal

I get my teaching schedule and term plans as tables in a Word document. I would like to know if there is a way to get this data into iCal. Creating the events by hand in iCal would take me much longer than copying these tables into an Excel file and importing into iCal from there.
Some of the data are one-day events, e.g. House-Gala on Friday 22/02/2013; the rest are two- or three-week events, e.g. 3 Weeks - Gr.10 Maths - Topic: Exponents (a five-day event, Monday to Friday, repeated for three weeks).
This is a script I got from the internet, but the first error it gives me is that it can't convert the .csv file text into type Unicode.
Another issue later in the script will be getting those five-day events to repeat for two or three weeks.
Any help would be greatly appreciated. This is what I have thus far:
--Convert CSV file to iCal events
--Prompts for file, then processes
--expects date,start time,end time,event name,xxxx,calendar name
--eg 12/01/2006,20:30,22:00,Water Committee,,TestCal
--change the various text item ns if data order in a file line is different
--blank lines skipped
--if other data present (eg location, notes ...) add a line in the tell calendar Calno loop
--to include it eg set location to text item 5 of ThisLine
set OldDelimiters to AppleScript's text item delimiters
set LF to ASCII character 10
set theFile to choose file with prompt "Select CSV calendar file"
set theLines to read theFile
set AppleScript's text item delimiters to {LF}
set theLines to paragraphs of theLines
set AppleScript's text item delimiters to {","}
repeat with ThisLine in theLines
if (count of ThisLine) > 0 then --ignore blanks
set StartDate to date (text item 1 of ThisLine & " " & text item 2 of ThisLine)
set EndDate to date (text item 1 of ThisLine & " " & text item 3 of ThisLine)
set CalName to word 1 of text item 6 of ThisLine
tell application "Calendar"
set CalList to title of every calendar
if CalName is in CalList then
repeat with CalNo from 1 to count of CalList
if CalName is item CalNo of CalList then exit repeat
end repeat
else
set NewOne to make new calendar at end of calendars with properties {title:CalName}
set CalNo to 1 + (count of CalList)
end if
tell calendar CalNo
set newItem to make new event at end of events with properties {start date:StartDate}
set summary of newItem to text item 4 of ThisLine
set end date of newItem to EndDate
end tell --calendar
end tell --iCal
end if
end repeat
set AppleScript's text item delimiters to OldDelimiters
OK, thanks for the replies. This is what I have thus far; I can't get the recurrence to use NumberCount as the COUNT for the repeat of the event:
set text item delimiters to ";"
repeat with l in paragraphs of (read "/Users/pienaar0/Desktop/test.csv" as «class utf8»)
if contents of l is not "" then
set sd to date (text item 1 of l & " ")
set ed to date (text item 2 of l & " ")
set NumberWeeks to (text item 4 of l & " ")
set NumberCount to NumberWeeks - 1
tell application "Calendar" to tell calendar "Test"
make new event with properties {allday event:true, start date:sd, end date:ed, summary:text item 3 of l, recurrence:"FREQ=WEEKLY;COUNT=2 * NumberCount"}
end tell
end if
end repeat
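In the script above, the recurrence value is one quoted literal, so the text "2 * NumberCount" is passed along verbatim instead of a number. The rule string has to be assembled from the computed value; in AppleScript that is normally done with the & operator, for example "FREQ=WEEKLY;COUNT=" & NumberCount. The Python sketch below shows the same idea. The function name weekly_rule is hypothetical, and how the count should be derived from the number of weeks is left open, since the question does not settle it.

```python
def weekly_rule(count):
    """Build a weekly recurrence rule string from a computed repeat count."""
    # Accept text read from the CSV (possibly padded with spaces) or a number.
    count = int(str(count).strip())
    return "FREQ=WEEKLY;COUNT={}".format(count)

print(weekly_rule("3 "))
```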
set text item delimiters to ","
repeat with l in paragraphs of (read "/Users/username/Desktop/test.csv" as «class utf8»)
if contents of l is not "" then
set sd to date (text item 1 of l & " " & text item 2 of l)
set ed to date (text item 1 of l & " " & text item 3 of l)
tell application "Calendar" to tell calendar "Test"
make new event with properties {start date:sd, end date:ed, summary:text item 4 of l}
end tell
end if
end repeat
test.csv:
01/25/2013,10:30PM,11:00PM,test event
01/26/2013,00:00AM,01:00AM,test event2