Retrieving "var" values from an HTML file

I have an HTML file that contains many "var" declarations in a section delimited by "<!--" and "-->":
<!--
var g_stickyTableHeadersScrollVersion=1;... ;var g_priceListInfo={...,"arrProducts":[{"name":"...","type":"...","arrVariants":[{"name":"...","priceGroup":"..."},{"name":"...","priceGroup":"..."},...,{"name":"...","defaultSlabSize":[...,...],"priceGroup":"..."}],{"name":"...","price":...,"isSlabPricing":1}]}...}
-->
I'm at a loss as to how to get the values of the arrProducts array of the g_priceListInfo variable.
After many (really a lot of) different attempts, I thought I could use the querySelector method of HTMLDocument as follows:
Dim url As String
url = "C:\Users\...\myHTMLFile.html"
Dim oFSO As FileSystemObject
Dim oFS As Object, sText As String
Set oFSO = New FileSystemObject ' CreateObject("Scripting.FileSystemObject")
Set oFS = oFSO.OpenTextFile(url)
Do Until oFS.AtEndOfStream
sText = oFS.ReadAll()
Loop
Dim doc As HTMLDocument
Set doc = CreateObject("htmlfile")
doc.body.innerHTML = sText
Dim ele As Object
Set ele = doc.querySelector("g_priceListInfo.arrProducts.name")
But, assuming that is the right approach, I couldn't find the correct syntax to make it work.
Thanks in advance for any help.
EDIT: adding the relevant html page code view snapshots
EDIT 19/08/2022:
I finally made it work by means of brute-force string manipulation.
Then I found the no-ScriptControl, no-GitHub JSON parser solution at this link, which gave me the same results as my brute-force method.
I'd point everybody with the same need to that solution.
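For readers wondering what such string manipulation might look like: querySelector takes CSS selectors that match DOM elements, so it cannot address a JavaScript variable sitting inside a comment. The sketch below is not the asker's actual code. It is a hypothetical helper (ExtractJsonArray is an invented name) that assumes the key is followed directly by ":[" with no whitespace, as in the sample above, and returns the text of the matching bracketed array.

```vba
' Hypothetical helper: returns the text of the JSON array stored under "key",
' from its opening "[" to the matching "]", or "" if the key is not found.
' Assumes the source contains "key":[ with no whitespace, as in the sample.
Function ExtractJsonArray(ByVal sText As String, ByVal key As String) As String
    Dim startPos As Long, i As Long, depth As Long
    Dim inString As Boolean, ch As String

    startPos = InStr(1, sText, """" & key & """:[")
    If startPos = 0 Then Exit Function
    startPos = startPos + Len(key) + 3          ' position of the opening "["

    For i = startPos To Len(sText)
        ch = Mid$(sText, i, 1)
        If inString Then
            If ch = "\" Then
                i = i + 1                       ' skip the escaped character
            ElseIf ch = """" Then
                inString = False                ' closing quote of a string
            End If
        Else
            Select Case ch
                Case """"
                    inString = True             ' brackets inside strings are ignored
                Case "["
                    depth = depth + 1
                Case "]"
                    depth = depth - 1
                    If depth = 0 Then
                        ExtractJsonArray = Mid$(sText, startPos, i - startPos + 1)
                        Exit Function
                    End If
            End Select
        End If
    Next i
End Function
```

Calling ExtractJsonArray(sText, "arrProducts") on the file contents would return the array text, which can then be handed to whichever JSON parser you prefer.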

Related

Using VBA to Create Word Label Sheets, Getting Extra Unwanted Document

I am trying to use VBA to create a label form in Word using data provided from Access. This VBA code works:
Dim appwd As Object
Dim oDoc As Object
Set appwd = CreateObject("Word.Application")
With appwd
.Documents.Add
Set oDoc = .MailingLabel.CreateNewDocumentByID(LabelID:="1359804671")
.Visible = True
.Activate
End With
oDoc.Activate
'Remaining code that creates labels
However, it creates and opens the blank Document1 as well as the Labels2 document I want. How do I prevent it from creating the unwanted Document1, or at the very least close that document again without saving it?
If I comment out the .Documents.Add, then I get
Run-time error '4605':
This method or property is not available because a document window is not active.
Unfortunately, the .MailingLabel object requires a document to be open, so the best alternative is to just close that document as soon as you create your desired one:
Dim appwd As Object
Dim oDoc As Object
Set appwd = CreateObject("Word.Application")
With appwd
.Documents.Add
Set oDoc = .MailingLabel.CreateNewDocumentByID(LabelID:="1359804671")
.Documents(1).Close SaveChanges:=0 'wdDoNotSaveChanges, close the first document
.Visible = True
.Activate
End With
oDoc.Activate
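A variant that may be more robust is to keep a reference to the temporary document and close that reference, rather than relying on how Word orders the Documents collection. This is only a sketch: the Sub name and the oTemp variable are made up for illustration, and the LabelID is the one from the question.

```vba
Sub CreateLabelSheet()
    Dim appwd As Object
    Dim oTemp As Object   ' temporary blank document (hypothetical name)
    Dim oDoc As Object    ' the label document we actually want

    Set appwd = CreateObject("Word.Application")
    With appwd
        ' MailingLabel needs an open document, so create one and remember it
        Set oTemp = .Documents.Add
        Set oDoc = .MailingLabel.CreateNewDocumentByID(LabelID:="1359804671")
        ' Close exactly the temporary document, without saving
        oTemp.Close SaveChanges:=0   ' 0 = wdDoNotSaveChanges
        .Visible = True
        .Activate
    End With
    oDoc.Activate
    ' Remaining code that creates labels
End Sub
```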

VBA Access object code for DropDownList ContentControls

I am coding an Access database that will collect user input, then open a Word document and fill out various parts of the document.
The problem I am having is it will only work once for Drop Down Lists. Not sure why or where to look to fix this one. There are three types of items I am programmatically filling in. The first is bookmarks, no problem with this. Second is Content Control Checkboxes, these also work with no problems. The third is Content Control Drop Down Lists, this is where the problem is. First time I open the Access Database it works, but if I click the command button again, nothing (for Drop Downs). The main problem is that it doesn't produce an error message, so I am not sure where to look.
I am guessing it has something to do with the objects I am creating to do the drop-down updates? Any help would be great:
Dim WordApp As Word.Application
Dim strTemplateLocation As String
Dim dir As String
Dim path As String
Dim wDoc As Word.Document
path = Left(CurrentDb.Name, InStrRev(CurrentDb.Name, "\"))
strTemplateLocation = path & "UserDoc.docx"
On Error Resume Next
Set WordApp = GetObject(, "Word.Application")
If Err.Number <> 0 Then
Set WordApp = CreateObject("Word.Application")
End If
WordApp.Visible = True
WordApp.WindowState = wdWindowStateMaximize
WordApp.Documents.Add Template:=strTemplateLocation, newtemplate:=False
With WordApp
'Working Bookmark
.Selection.GoTo what:=wdGoToBookmark, Name:="COMPANY": .Selection.TypeText [fac]
'Working checkbox
If Me.RD = True Then: .ActiveDocument.ContentControls(9).Checked = True
'Works ONCE drop down
Dim objCC As ContentControl
Dim objCE As ContentControlListEntry
Dim ListSelection As String
ListSelection = Me.System_Type.ListIndex + 2
Set objCC = ActiveDocument.ContentControls(1): Set objCE = objCC.DropdownListEntries.Item(ListSelection): objCE.Select
End With
Should I be closing out the objCE and objCC at the end or something?
This is probably your problem:
Set objCC = ActiveDocument.ContentControls(1)
It should be
Set objCC = .ActiveDocument.ContentControls(1)
But much better would be:
Set wDoc = WordApp.Documents.Add(Template:=strTemplateLocation, newtemplate:=False)
and then always use wDoc instead of WordApp.ActiveDocument.
See here: VBA ActiveDocument Concerns / Alternatives?
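Put together, the suggestion might look roughly like the sketch below. The Sub name and its two parameters are hypothetical, and it assumes a reference to the Microsoft Word Object Library, as the question's early-bound declarations imply. It also restores normal error handling after the GetObject attempt, so later failures are no longer silently skipped.

```vba
' Requires a reference to the Microsoft Word Object Library.
Sub FillDropDown(ByVal templatePath As String, ByVal entryIndex As Long)
    Dim WordApp As Word.Application
    Dim wDoc As Word.Document

    ' Reuse a running Word instance if there is one, otherwise start Word
    On Error Resume Next
    Set WordApp = GetObject(, "Word.Application")
    On Error GoTo 0                      ' stop hiding errors from here on
    If WordApp Is Nothing Then Set WordApp = CreateObject("Word.Application")
    WordApp.Visible = True

    ' Keep an explicit reference to the new document
    Set wDoc = WordApp.Documents.Add(Template:=templatePath, NewTemplate:=False)

    ' Every Word object is reached through wDoc, never through a bare ActiveDocument
    wDoc.ContentControls(1).DropdownListEntries(entryIndex).Select
End Sub
```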

How to get rich text, which Access formats using HTML, into a Word doc

I am copying formatted text from a Word document into a rich text field in Access.
Later I want to use VBA to create a new Word document and write the text as formatted to it.
The problem is, Access saves rich text with HTML formatting. And when you try to write that to a doc or docx, you see the text and its HTML tags.
How do I write the text to a Word document so that it retains the intended formatting and doesn't show HTML codes?
The only way I have found that works (without modifying the input string) is to write the HTML to a temporary file, then use .InsertFile to load that file into the word document. Here's a sub that takes the input argument and puts it into a new Word document:
Sub WriteToWord(myHtmlFormattedText as String)
Dim objWord As Word.Application
Dim doc As Word.Document
Dim fso As Object ' FileSystemObject
Dim f As Object ' TextStream
Dim tempHtmlFile As String
' Write your HTML content to a temp file:
Set fso = CreateObject("Scripting.FileSystemObject")
tempHtmlFile = fso.GetSpecialFolder(2) & "\" & fso.GetTempName & ".htm"
Set f = fso.CreateTextFile(tempHtmlFile, True)
f.Write myHtmlFormattedText
f.Close
Set f = Nothing
Set fso = Nothing
' Set up word object
Set objWord = CreateObject("Word.Application")
With objWord
.Visible = True
Set doc = .Documents.Add
End With
'Add HTML file contents:
objWord.Selection.InsertFile tempHtmlFile
' Show the doc
doc.Activate
End Sub
Not sure this will help, but there are a few different paste options in Microsoft Word (in the upper left corner of the 'Home' tab) that many people overlook.
They let you paste with or without formatting. Not sure if it will solve your specific issue, but hopefully it does!

How do I extract text of a single HTML element by tag name using MSXML in VBA?

I'm trying to extract US Patent titles using MSXML6.
On the full-text html view of a patent document on the USPTO website, the patent title appears as the first and only "font" element that is a child of "body".
Here is my function that is not working (I get no error; the cell with the formula just stays blank).
Can somebody help me figure out what is wrong?
An example URL that I am feeding into the function is http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=6293874.PN.&OS=PN/6293874&RS=PN/6293874
Function getUSPatentTitle(url As String)
Static colTitle As New Collection
Dim title As String
Dim pageSource As String
Dim xDoc As MSXML2.DOMDocument
Dim xNode As IXMLDOMNode
On Error Resume Next
title = colTitle(url)
If Err.Number <> 0 Then
Set html_doc = CreateObject("htmlfile")
Set xml_obj = CreateObject("MSXML6.XMLHTTP60")
xml_obj.Open "GET", url, False
xml_obj.send
pageSource = xml_obj.responseText
Set xml_obj = Nothing
Set xDoc = New MSXML2.DOMDocument
If Not xDoc.LoadXML(pageSource) Then
Err.Raise xDoc.parseError.ErrorCode, , xDoc.parseError.reason
End If
Set xNode = xDoc.getElementsByTagName("font").Item(1)
title = xNode.Text
If Not title = "" Then colTitle.Add Item:=title, Key:=url
End If
On Error GoTo 0 ' I understand "GoTo" is dangerous coding but copied from somebody and so far haven't thought of a more natural substitute for a GoTo statement
getUSPatentTitle = title
End Function
Just a few points:
"On Error Goto 0" is not really a traditional Goto statement - it's just how you turn off user error handling in VBA. There were a few errors in your code but the "On Error Resume Next" skipped them so you saw nothing.
The data from the web page is in HTML format not XML.
There were a few "font" elements before the one with the title.
This should work:
Function getUSPatentTitle(url As String)
Static colTitle As New Collection
Dim title As String
Dim pageSource As String
Dim errorNumber As Long ' Err.Number is a Long
On Error Resume Next
title = colTitle(url)
errorNumber = Err.Number
On Error GoTo 0
If errorNumber <> 0 Then
Dim xml_obj As XMLHTTP60
Set xml_obj = CreateObject("MSXML2.XMLHTTP")
xml_obj.Open "GET", url, False
xml_obj.send
pageSource = xml_obj.responseText
Set xml_obj = Nothing
Dim html_doc As HTMLDocument
Set html_doc = CreateObject("HTMLFile")
html_doc.body.innerHTML = pageSource
Dim fontElement As IHTMLElement
Set fontElement = html_doc.getElementsByTagName("font").Item(3)
title = fontElement.innerText
If Not title = "" Then colTitle.Add Item:=title, Key:=url
End If
getUSPatentTitle = title
End Function
CSS selector:
You can rewrite what you described, which is in fact the first font tag directly within the body tag, as the CSS selector:
body > font
VBA:
As you want only the first match, you can use the querySelector method of the document to apply the selector and retrieve a single element.
Debug.Print html_doc.querySelector("body > font").innerText
You may need to add a reference to the Microsoft HTML Object Library and declare the document early bound (Dim html_doc As HTMLDocument) to access the method. A late-bound object may expose querySelector, but if it doesn't, use early binding.

Read Local HTML File into String With VBA

This feels like it should be simple. I have a .HTML file stored on my computer, and I'd like to read the entire file into a string. When I try the super straightforward
Dim FileAsString as string
Open "C:\Myfile.HTML" for input as #1
Input #1, FileAsString
Close #1
debug.print FileAsString
I don't get the whole file. I only get the first few lines (I know the immediate window cuts off, but that's not the issue. I'm definitely not getting the whole file into my string.) I also tried using an alternative method using the file system object, and got similar results, only this time with lots of weird characters and question marks thrown in. This makes me think it's probably some kind of encoding issue. (Although frankly, I don't fully understand what that means. I know there are different encoding formats and that this can cause issues with string parsing, but that's about it.)
So more generally, here's what I'd really like to know: How can I use VBA to open a file of any extension (that can be viewed in a text editor) and length (that doesn't exceed VBA's string limit), and be sure that whatever characters I would see in a basic text editor are what gets read into a string? (If that can't be (easily) done, I'd certainly appreciate being pointed towards a method that's likely to work with .html files.) Thanks so much for your help.
EDIT:
Here's an example of what happens when I use the suggested method. Specifically
Function FileToString(ByVal Path As String) As String
Dim oFSO As Object
Dim oFS As Object, sText As String
Set oFSO = CreateObject("Scripting.FileSystemObject")
Set oFS = oFSO.OpenTextFile(Path)
Do Until oFS.AtEndOfStream
sText = oFS.ReadAll()
Loop
FileToString = sText
Set oFSO = Nothing
Set oFS = Nothing
End Function
I'll show you both the beginning (via a message box) and the end (via the immediate window) because both are weird in different ways. In both cases I'll compare it to a screen capture of the html source displayed in chrome:
Beginning:
End:
This is one method
Option Explicit
Sub test()
Dim oFSO As Object
Dim oFS As Object, sText As String
Set oFSO = CreateObject("Scripting.FileSystemObject")
Set oFS = oFSO.OpenTextFile("C:\Users\osknows\Desktop\import-store.csv")
Do Until oFS.AtEndOfStream
' sText = oFS.ReadLine 'read line by line
sText = oFS.ReadAll()
Debug.Print sText
Loop
End Sub
EDIT:
Try changing the OpenTextFile line to one of the following 3 lines and see if it makes any difference. The fourth argument is the format: 0 opens the file as ASCII, -1 as Unicode, and -2 uses the system default. The third argument (create) is left as False.
http://msdn.microsoft.com/en-us/library/aa265347(v=vs.60).aspx
Set oFS = oFSO.OpenTextFile("C:\Users\osknows\Desktop\import-store.csv", 1, False, 0)
Set oFS = oFSO.OpenTextFile("C:\Users\osknows\Desktop\import-store.csv", 1, False, -1)
Set oFS = oFSO.OpenTextFile("C:\Users\osknows\Desktop\import-store.csv", 1, False, -2)
EDIT2:
Does this code work for you?
Function ExecuteWebRequest(ByVal url As String) As String
Dim oXHTTP As Object
Set oXHTTP = CreateObject("MSXML2.XMLHTTP")
oXHTTP.Open "GET", url, False
oXHTTP.send
ExecuteWebRequest = oXHTTP.responseText
Set oXHTTP = Nothing
End Function
Function OutputText(ByVal outputstring As String)
MyFile = ThisWorkbook.Path & "\temp.html"
'set and open file for output
fnum = FreeFile()
Open MyFile For Output As fnum
'use Print when you want the string without quotation marks
Print #fnum, outputstring
Close #fnum
End Function
Sub test()
Dim oFSO As Object
Dim oFS As Object, sText As String
Dim Uri As String, HTML As String
Uri = "http://www.forrent.com/results.php?search_type=citystate&page_type_id=city&seed=859049165&main_field=12345&ssradius=-1&min_price=%240&max_price=No+Limit&sbeds=99&sbaths=99&search-submit=Submit"
HTML = ExecuteWebRequest(Uri)
OutputText (HTML)
Set oFSO = CreateObject("Scripting.FileSystemObject")
Set oFS = oFSO.OpenTextFile(ThisWorkbook.Path & "\temp.html")
Do Until oFS.AtEndOfStream
' sText = oFS.ReadLine 'read line by line
sText = oFS.ReadAll()
Debug.Print sText
Loop
End Sub
Okay, so I finally managed to figure this out. By default, the VBA file system object reads files as ASCII, and I had saved mine as Unicode. Sometimes, as in my case, saving a file as ASCII can cause errors. You can get around this, however, by reading the file as binary and then converting it back to a string. The details are explained here http://bytes.com/topic/asp-classic/answers/521362-write-xmlhttp-result-text-file.
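For anyone who wants to name the encoding explicitly, one alternative (not necessarily what the linked thread does) is ADODB.Stream, which lets you set a character set before loading the file. The function name and the "utf-8" default below are assumptions for illustration. Pass "unicode" for UTF-16 files, or whatever charset the file was actually saved in.

```vba
' Sketch: read a whole text file using an explicit character set.
' ReadTextFile and its default charset are illustrative choices, not part of the original answer.
Function ReadTextFile(ByVal filePath As String, _
                      Optional ByVal charsetName As String = "utf-8") As String
    Dim stm As Object
    Set stm = CreateObject("ADODB.Stream")
    With stm
        .Type = 2                 ' adTypeText
        .Charset = charsetName    ' must match how the file was saved
        .Open
        .LoadFromFile filePath
        ReadTextFile = .ReadText(-1)   ' -1 = adReadAll
        .Close
    End With
    Set stm = Nothing
End Function
```

A call such as sText = ReadTextFile("C:\Myfile.HTML") would then return the full contents in a single string.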
A bit late to answer but I did this exact thing today (works perfectly):
Sub modify_local_html_file()
Dim url As String
Dim html As Object
Dim fill_a As Object
url = "C:\Myfile.HTML"
Dim oFSO As Object
Dim oFS As Object, sText As String
Set oFSO = CreateObject("Scripting.FileSystemObject")
Set oFS = oFSO.OpenTextFile(url)
Do Until oFS.AtEndOfStream
sText = oFS.ReadAll()
Debug.Print sText
Loop
Set html = CreateObject("htmlfile")
html.body.innerHTML = sText
oFS.Close
Set oFS = Nothing
'# grab some element #'
Set fill_a = html.getElementById("val_a")
MsgBox fill_a.innerText
'# change its inner text #'
fill_a.innerText = "20%"
MsgBox fill_a.innerText
'# open file this time to write to #'
Set oFS = oFSO.OpenTextFile(url, 2)
'# write it modified html #'
oFS.write html.body.innerHTML
oFS.Close
Set oFSO = Nothing
Set oFS = Nothing
End Sub