Read Local HTML File into String With VBA

This feels like it should be simple. I have a .HTML file stored on my computer, and I'd like to read the entire file into a string. When I try the super straightforward
Dim FileAsString as string
Open "C:\Myfile.HTML" for input as #1
Input #1, FileAsString
Close #1
debug.print FileAsString
I don't get the whole file. I only get the first few lines (I know the immediate window cuts off, but that's not the issue. I'm definitely not getting the whole file into my string.) I also tried using an alternative method using the file system object, and got similar results, only this time with lots of weird characters and question marks thrown in. This makes me think it's probably some kind of encoding issue. (Although frankly, I don't fully understand what that means. I know there are different encoding formats and that this can cause issues with string parsing, but that's about it.)
So more generally, here's what I'd really like to know: How can I use VBA to open a file of any extension (that can be viewed in a text editor) and length (that doesn't exceed VBA's string limit), and be sure that whatever characters I would see in a basic text editor are what get read into a string? (If that can't be (easily) done, I'd certainly appreciate being pointed towards a method that's likely to work with .html files.) Thanks so much for your help.
EDIT:
Here's an example of what happens when I use the suggested method. Specifically
Function FileToString(Path As String) As String
Dim oFSO As Object
Dim oFS As Object, sText As String
Set oFSO = CreateObject("Scripting.FileSystemObject")
Set oFS = oFSO.OpenTextFile(Path)
Do Until oFS.AtEndOfStream
sText = oFS.ReadAll()
Loop
FileToString = sText
Set oFSO = Nothing
Set oFS = Nothing
End Function
I'll show you both the beginning (via a message box) and the end (via the immediate window) because both are weird in different ways. In both cases I'll compare it to a screen capture of the html source displayed in chrome:
Beginning: [screenshot not preserved]
End: [screenshot not preserved]

This is one method
Option Explicit
Sub test()
Dim oFSO As Object
Dim oFS As Object, sText As String
Set oFSO = CreateObject("Scripting.FileSystemObject")
Set oFS = oFSO.OpenTextFile("C:\Users\osknows\Desktop\import-store.csv")
Do Until oFS.AtEndOfStream
' sText = oFS.ReadLine 'read line by line
sText = oFS.ReadAll()
Debug.Print sText
Loop
End Sub
EDIT:
Try changing the OpenTextFile line to one of the following three lines and see if it makes any difference. The fourth argument selects the format: 0 opens the file as ASCII, -1 as Unicode, and -2 uses the system default.
http://msdn.microsoft.com/en-us/library/aa265347(v=vs.60).aspx
Set oFS = oFSO.OpenTextFile("C:\Users\osknows\Desktop\import-store.csv", 1, False, 0)
Set oFS = oFSO.OpenTextFile("C:\Users\osknows\Desktop\import-store.csv", 1, False, -1)
Set oFS = oFSO.OpenTextFile("C:\Users\osknows\Desktop\import-store.csv", 1, False, -2)
EDIT2:
Does this code work for you?
Function ExecuteWebRequest(ByVal url As String) As String
Dim oXHTTP As Object
Set oXHTTP = CreateObject("MSXML2.XMLHTTP")
oXHTTP.Open "GET", url, False
oXHTTP.send
ExecuteWebRequest = oXHTTP.responseText
Set oXHTTP = Nothing
End Function
Function OutputText(ByVal outputstring As String)
Dim MyFile As String, fnum As Integer
MyFile = ThisWorkbook.Path & "\temp.html"
'set and open file for output
fnum = FreeFile()
Open MyFile For Output As fnum
'use Print when you want the string without quotation marks
Print #fnum, outputstring
Close #fnum
End Function
Sub test()
Dim oFSO As Object
Dim oFS As Object, sText As String
Dim Uri As String, HTML As String
Uri = "http://www.forrent.com/results.php?search_type=citystate&page_type_id=city&seed=859049165&main_field=12345&ssradius=-1&min_price=%240&max_price=No+Limit&sbeds=99&sbaths=99&search-submit=Submit"
HTML = ExecuteWebRequest(Uri)
OutputText (HTML)
Set oFSO = CreateObject("Scripting.FileSystemObject")
Set oFS = oFSO.OpenTextFile(ThisWorkbook.Path & "\temp.html")
Do Until oFS.AtEndOfStream
' sText = oFS.ReadLine 'read line by line
sText = oFS.ReadAll()
Debug.Print sText
Loop
End Sub

Okay, so I finally managed to figure this out. By default, the VBA FileSystemObject reads files as ASCII, and I had saved mine as Unicode. Sometimes, as in my case, saving the file as ASCII can cause errors. You can get around this, however, by reading the file as binary and then converting it back to a string. The details are explained here: http://bytes.com/topic/asp-classic/answers/521362-write-xmlhttp-result-text-file.
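One way to read a file with an explicit encoding is ADODB.Stream, which lets you name the character set instead of relying on a default. The following is a minimal sketch rather than the method from the linked thread; the function name ReadTextFile and the "utf-8" default are assumptions chosen for illustration.

```vba
' Sketch: read a whole text file into a string with an explicit charset.
' ReadTextFile is a hypothetical helper name; "utf-8" is an assumed default.
Public Function ReadTextFile(ByVal sPath As String, _
                             Optional ByVal sCharset As String = "utf-8") As String
    Dim oStream As Object
    Set oStream = CreateObject("ADODB.Stream")
    With oStream
        .Type = 2               ' 2 = adTypeText
        .Charset = sCharset     ' must match how the file was actually saved
        .Open
        .LoadFromFile sPath
        ReadTextFile = .ReadText    ' with no argument, reads the whole stream
        .Close
    End With
    Set oStream = Nothing
End Function
```

For example, ReadTextFile("C:\Myfile.HTML") would try UTF-8, while ReadTextFile("C:\Myfile.HTML", "unicode") would treat the file as UTF-16. If the result still shows odd characters, the charset probably does not match the file.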

A bit late to answer but I did this exact thing today (works perfectly):
Sub modify_local_html_file()
Dim url As String
Dim html As Object
Dim fill_a As Object
url = "C:\Myfile.HTML"
Dim oFSO As Object
Dim oFS As Object, sText As String
Set oFSO = CreateObject("Scripting.FileSystemObject")
Set oFS = oFSO.OpenTextFile(url)
Do Until oFS.AtEndOfStream
sText = oFS.ReadAll()
Debug.Print sText
Loop
Set html = CreateObject("htmlfile")
html.body.innerHTML = sText
oFS.Close
Set oFS = Nothing
'# grab some element #'
Set fill_a = html.getElementById("val_a")
MsgBox fill_a.innerText
'# change its inner text #'
fill_a.innerText = "20%"
MsgBox fill_a.innerText
'# open file this time to write to #'
Set oFS = oFSO.OpenTextFile(url, 2)
'# write it modified html #'
oFS.write html.body.innerHTML
oFS.Close
Set oFSO = Nothing
Set oFS = Nothing
End Sub

Related

Retrieving a "Var" values from an HTML file

I have an html file that contains many "var"s in a section delimited by "<!--";
<!--
var g_stickyTableHeadersScrollVersion=1;... ;var g_priceListInfo={...,"arrProducts":[{"name":"...","type":"...","arrVariants":[{"name":"...","priceGroup":"..."},{"name":"...","priceGroup":"..."},...,{"name":"...","defaultSlabSize":[...,...],"priceGroup":"..."}],{"name":"...","price":...,"isSlabPricing":1}]}...}
-->
I'm at a loss as to how to get the values of the arrProducts array of the g_priceListInfo variable.
After many (really a lot of) different attempts I thought I could use querySelector method of HTMLDocument as follows:
Dim url As String
url = "C:\Users\...\myHTMLFile.html"
Dim oFSO As FileSystemObject
Dim oFS As Object, sText As String
Set oFSO = New FileSystemObject ' CreateObject("Scripting.FileSystemObject")
Set oFS = oFSO.OpenTextFile(url)
Do Until oFS.AtEndOfStream
sText = oFS.ReadAll()
Loop
Dim doc As HTMLDocument
Set doc = CreateObject("htmlfile")
doc.body.innerHTML = sText
Dim ele As Object
Set ele = doc.querySelector("g_priceListInfo.arrProducts.name")
but, provided that is the right path, I couldn't find the correct syntax to make it work
thanks in advance for any help
EDIT: adding the relevant html page code view snapshots
EDIT 19/08/2022:
I finally made it work by means of brute-force string manipulation.
Then I found the no-ScriptControl and no-GitHub JSON parser solution at this link, which gave me the same results as my brute-force method.
I'd point everybody with the same need to that solution.
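Note that querySelector matches DOM elements by CSS selector, so it cannot reach into JavaScript variables inside a script or comment block. The asker's actual brute-force code is not shown; the sketch below is one possible version of that idea, assuming the variable is assigned an object literal in the form var name={...}. The function name ExtractJsVarObject is hypothetical.

```vba
' Sketch: return the {...} literal assigned to a JavaScript variable,
' found by counting braces. Assumes the text contains "var <name>={...}".
Public Function ExtractJsVarObject(ByVal sText As String, _
                                   ByVal sVarName As String) As String
    Dim startPos As Long, i As Long, depth As Long
    Dim ch As String, inString As Boolean

    startPos = InStr(1, sText, "var " & sVarName & "=", vbBinaryCompare)
    If startPos = 0 Then Exit Function          ' variable not found
    startPos = InStr(startPos, sText, "{")
    If startPos = 0 Then Exit Function          ' no object literal follows

    For i = startPos To Len(sText)
        ch = Mid$(sText, i, 1)
        If inString Then
            If ch = "\" Then
                i = i + 1                       ' skip the escaped character
            ElseIf ch = """" Then
                inString = False
            End If
        Else
            Select Case ch
                Case """"
                    inString = True
                Case "{"
                    depth = depth + 1
                Case "}"
                    depth = depth - 1
                    If depth = 0 Then           ' matching close brace reached
                        ExtractJsVarObject = Mid$(sText, startPos, i - startPos + 1)
                        Exit Function
                    End If
            End Select
        End If
    Next i
End Function
```

Calling ExtractJsVarObject(sText, "g_priceListInfo") would return the raw object text, which could then be handed to whichever JSON parser you prefer to walk arrProducts. Braces inside double-quoted strings are ignored; single-quoted strings are not handled.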

Export all tables to txt files with export specification

I have an Access DB containing several different tables, each with a different structure (number & names of fields, number of rows, title).
What I would like to do is to export all these tables into txt files, with a given separator ("|"), point as decimal separator, quotes for strings.
I have browsed the internet and what I got was:
use DoCmd.TransferText acExportDelim command
save a customized export specification and apply it
I get an error message ("object does not exist") and I think it is related to the fact that the export specification is "sheet-specific", i.e. does not apply to tables with different fields and field names.
Can you help me?
thanks!!
EDIT.
I'm also posting the original code I ran. As I said before, I am new to VBA, so I just looked for some code on the web, adapted it to my needs, and ran it.
Public Sub ExportDatabaseObjects()
On Error GoTo Err_ExportDatabaseObjects
Dim db As DAO.Database
Dim td As TableDef
Dim sExportLocation As String
Dim a As Long
Set db = CurrentDb()
sExportLocation = "C:\" 'Do not forget the closing back slash! ie: C:\Temp\
For a = 0 To db.TableDefs.Count - 1
If Not (db.TableDefs(a).Name Like "MSys*") Then
DoCmd.TransferText acExportDelim, "Export_specs", db.TableDefs(a).Name, sExportLocation & db.TableDefs(a).Name & ".txt", True
End If
Next a
Set db = Nothing
MsgBox "All database objects have been exported as a text file to " & sExportLocation, vbInformation
Exit_ExportDatabaseObjects:
Exit Sub
Err_ExportDatabaseObjects:
MsgBox Err.Number & " - " & Err.Description
Resume Exit_ExportDatabaseObjects
End Sub
Before running the code, I manually exported the first table saving the Export_specs to a file.
Consider a db with two tables, A and B.
When I run the code, A is properly exported, then I get the following error message: "3011 - The Microsoft Access database engine could not find the object 'B#txt'. Make sure the object exists and that you spell its name and the path name correctly. If 'B#txt' is not a local object, check your network connection or contact the server administration".
So, it's kind of complex. I've created a routine that imports files using ImportExport Specs, you should be able to easily adapt to your purpose. The basic operation is to create a spec that does exactly what you want to one file. Then, export this spec using this code:
Public Function SaveSpecAsXMltoTempDirectory(sSpecName As String)
Dim oFSO As FileSystemObject
Dim oTS As TextStream
Set oFSO = New FileSystemObject
Set oTS = oFSO.CreateTextFile("C:\Temp\" & sSpecName & ".xml", True)
oTS.Write CurrentProject.ImportExportSpecifications(sSpecName).XML
oTS.Close
Set oTS = Nothing
Set oFSO = Nothing
End Function
Then open this file in Notepad and replace the file name with some placeholder (I used "FILE_PATH_AND_NAME" in this sample). Then, import back into database using this code:
Public Function SaveSpecFromXMLinTempDirectory(sSpecName As String)
Dim oFSO As FileSystemObject
Dim oTS As TextStream
Dim sSpecXML As String
Dim oSpec As ImportExportSpecification
Set oFSO = New FileSystemObject
Set oTS = oFSO.OpenTextFile("C:\Temp\" & sSpecName & ".xml", ForReading)
sSpecXML = oTS.ReadAll
For Each oSpec In CurrentProject.ImportExportSpecifications
If oSpec.Name = sSpecName Then oSpec.Delete
Next oSpec
Set oSpec = CurrentProject.ImportExportSpecifications.Add(sSpecName, sSpecXML)
Set oSpec = Nothing
oTS.Close
Set oTS = Nothing
Set oFSO = Nothing
End Function
Now you can cycle thru the files and replace the placeholder in the spec with the filename then execute it using this code:
Public Function ImportFileUsingSpecification(sSpecName As String, sFile As String) As Boolean
Dim oSpec As ImportExportSpecification
Dim sSpecXML As String
Dim bReturn As Boolean
'initialize return variable as bad until function completes
bReturn = False
'export data using saved Spec
' first make sure no temp spec left by accident
For Each oSpec In CurrentProject.ImportExportSpecifications
If oSpec.Name = "Temp" Then oSpec.Delete
Next oSpec
sSpecXML = CurrentProject.ImportExportSpecifications(sSpecName).XML
If Not Len(sSpecXML) = 0 Then
sSpecXML = Replace(sSpecXML, "FILE_PATH_AND_NAME", sFile)
'now create temp spec to use, get template text and replace file path and name
Set oSpec = CurrentProject.ImportExportSpecifications.Add("Temp", sSpecXML)
oSpec.Execute
bReturn = True
Else
MsgBox "Could not locate correct specification to import that file!", vbCritical, "NOTIFY ADMIN"
GoTo ExitImport
End If
ExitImport:
On Error Resume Next
ImportFileUsingSpecification = bReturn
Set oSpec = Nothing
Exit Function
End Function
Obviously you'll need to find the table name in the spec XML and use a placeholder for it as well. Let me know if you can't get it to work and I'll update it for export.
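The "cycle through the files" step could look roughly like this. It is only a sketch that relies on the ImportFileUsingSpecification function above; the procedure name, the folder "C:\Temp\", and the "*.txt" filter are assumptions for illustration, and "Export_specs" is the spec name from the question.

```vba
' Sketch: run the saved spec once per text file in a folder.
' Requires ImportFileUsingSpecification (defined above) in the same project.
Public Sub ImportAllFilesInFolder()
    Dim sFolder As String, sFile As String
    Dim lFailed As Long

    sFolder = "C:\Temp\"                     ' assumed folder, with trailing backslash
    sFile = Dir(sFolder & "*.txt")           ' first matching file name, "" if none

    Do While Len(sFile) > 0
        If Not ImportFileUsingSpecification("Export_specs", sFolder & sFile) Then
            lFailed = lFailed + 1            ' count files the spec could not handle
        End If
        sFile = Dir()                        ' next matching file
    Loop

    If lFailed > 0 Then MsgBox lFailed & " file(s) could not be processed.", vbExclamation
End Sub
```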

VBA Access object code for DropDownList ContentControls

I am coding an Access database that will collect user input, then open a Word document and fill out various parts of the document.
The problem I am having is it will only work once for Drop Down Lists. Not sure why or where to look to fix this one. There are three types of items I am programmatically filling in. The first is bookmarks, no problem with this. Second is Content Control Checkboxes, these also work with no problems. The third is Content Control Drop Down Lists, this is where the problem is. First time I open the Access Database it works, but if I click the command button again, nothing (for Drop Downs). The main problem is that it doesn't produce an error message, so I am not sure where to look.
I am guessing it has something to do with the objects I am creating to do the drop-down updates? Any help would be great:
Dim WordApp As Word.Application
Dim strTemplateLocation As String
Dim dir As String
Dim path As String
Dim wDoc As Word.Document
path = Left(CurrentDb.Name, InStrRev(CurrentDb.Name, "\"))
strTemplateLocation = path & "UserDoc.docx"
On Error Resume Next
Set WordApp = GetObject(, "Word.Application")
If Err.Number <> 0 Then
Set WordApp = CreateObject("Word.Application")
End If
WordApp.Visible = True
WordApp.WindowState = wdWindowStateMaximize
WordApp.Documents.Add Template:=strTemplateLocation, newtemplate:=False
With WordApp
'Working Bookmark
.Selection.GoTo what:=wdGoToBookmark, Name:="COMPANY": .Selection.TypeText [fac]
'Working checkbox
If Me.RD = True Then: .ActiveDocument.ContentControls(9).Checked = True
'Works ONCE drop down
Dim objCC As ContentControl
Dim objCE As ContentControlListEntry
Dim ListSelection As String
ListSelection = Me.System_Type.ListIndex + 2
Set objCC = ActiveDocument.ContentControls(1): Set objCE = objCC.DropdownListEntries.Item(ListSelection): objCE.Select
End With
Should I be closing out the objCE and objCC at the end or something?
This is probably your problem:
Set objCC = ActiveDocument.ContentControls(1)
It should be
Set objCC = .ActiveDocument.ContentControls(1)
But much better would be:
Set wDoc = WordApp.Documents.Add(Template:=strTemplateLocation, newtemplate:=False)
and then always use wDoc instead of WordApp.ActiveDocument.
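A minimal sketch of that approach follows, written with late binding so it does not depend on a Word library reference. The procedure name and its parameters are hypothetical; the point is that every Word object is reached through wordApp or wDoc and never through an unqualified ActiveDocument, which Access may resolve against a different, implicit Word instance.

```vba
' Sketch: create a document from a template and select a drop-down entry,
' using only explicitly qualified Word objects.
Public Sub SelectDropDownEntry(ByVal templatePath As String, ByVal entryIndex As Long)
    Dim wordApp As Object    ' Word.Application
    Dim wDoc As Object       ' Word.Document

    On Error Resume Next
    Set wordApp = GetObject(, "Word.Application")   ' reuse a running instance if any
    On Error GoTo 0
    If wordApp Is Nothing Then Set wordApp = CreateObject("Word.Application")

    wordApp.Visible = True
    Set wDoc = wordApp.Documents.Add(Template:=templatePath, NewTemplate:=False)

    ' First content control is assumed to be the drop-down list, as in the question
    wDoc.ContentControls(1).DropdownListEntries(entryIndex).Select
End Sub
```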
See here: VBA ActiveDocument Concerns / Alternatives?

Exception "path not found" in access vba

I have the following code that throws an exception "Path not found".
Dim myfso As New FileSystemObject
Set myfso = CreateObject("Scripting.FileSystemObject")
Dim myoFile As Object
Set myoFile = myfso.CreateTextFile("C:\Users\myname\dropbox_folder\Dropbox\dropboxpath.txt")
myoFile.WriteLine "C:\Users\myname\dropbox_folder\Dropbox\"
myoFile.Close
Set myfso = Nothing
Set myoFile = Nothing
Dim strContents As String
Dim myfso1 As New FileSystemObject
Set myfso1 = CreateObject("Scripting.FileSystemObject")
Dim myoFile1 As Object
Dim mypath As String
Set myoFile1 = myfso1.OpenTextFile("C:\Users\myname\dropbox_folder\Dropbox\dropboxpath.txt", ForReading)
strContents = myoFile1.ReadAll
myoFile1.Close
Dim fso, oFolder, oSubfolder, oFile, queue As Collection
Set fso = CreateObject("Scripting.FileSystemObject")
Set queue = New Collection
queue.Add fso.GetFolder(strContents)
This last command throws the exception: Path not found. But the path "C:\Users\myname\dropbox_folder\Dropbox\" does exist; I am positive about that.
The strange thing is that if you hover the mouse over the variable you see this:
"C:\Users\myname\dropbox_folder\Dropbox\
without the second ". That is a bit strange for me.
Furthermore, if I run the previous command
queue.Add fso.GetFolder("C:\Users\myname\dropbox_folder\Dropbox\")
the code executes smoothly.
What is the problem in your opinion?
Change to:
myoFile.Write "C:\Users\myname\dropbox_folder\Dropbox\"
because WriteLine appends a vbCrLf (\r\n) to the file. When you subsequently call ReadAll, you end up with vbCrLf on the end of the path, invalidating it. (The CrLf is not displayed in the single-line tooltip but shows up as the missing closing ".)
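If the file has already been written with WriteLine, or may come from elsewhere, the reader side can also defend against this by trimming trailing line breaks. A small sketch, with ReadPathFromFile as a hypothetical helper name:

```vba
' Sketch: read a file's contents and strip any trailing CR/LF characters.
Public Function ReadPathFromFile(ByVal sFile As String) As String
    Dim fso As Object, ts As Object
    Dim s As String

    Set fso = CreateObject("Scripting.FileSystemObject")
    Set ts = fso.OpenTextFile(sFile, 1)          ' 1 = ForReading
    If Not ts.AtEndOfStream Then s = ts.ReadAll  ' ReadAll raises an error on an empty file
    ts.Close

    ' Remove trailing carriage returns and line feeds left by WriteLine
    Do While Len(s) > 0 And (Right$(s, 1) = vbCr Or Right$(s, 1) = vbLf)
        s = Left$(s, Len(s) - 1)
    Loop

    ReadPathFromFile = s
End Function
```

The result can then be passed to fso.GetFolder without the stray line break.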

Deleting words/strings containing a specific character in MS Access

I'm writing a query to extract text that was entered through a vendor-created word processor to an Oracle database and I need to export it to Word or Excel. The text is entered into a memo field and the text is intertwined with codes that the word processor uses for different functions (bold, indent, hard return, font size, etc.).
I've used the replace function to parse out a lot of the more common codes, but there are so many variations, it's nearly impossible to catch them all. Is there a way to do this? Unfortunately, I'm limited to using Microsoft Access 2010 to try and accomplish this.
The common thread I've found is that all the codes start with a back-slash and I'd like to be able to delete all strings that start with a back-slash up to the next space so all the codes are stripped out of the final text.
Here's a brief example of the text I'm working with:
{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Times New Roman;
\viewkind4\uc1\pard\f0\fs36 An abbreviated survey was conducted
on 02/02/15 to investigate complaint #OK000227. \par
No deficiencies were cited.\par
\fs20\par
}}
If your machine has Microsoft Word installed then you already have an RTF parser available so you don't have to "roll your own". You can just get Word to open the RTF document and save it as plain text like this:
Option Compare Database
Option Explicit
Public Function RtfToPlainText(rtfText As Variant) As Variant
Dim rtn As Variant
Dim tempFolder As String, rtfPath As String, txtPath As String
Dim fso As Object ' FileSystemObject
Dim f As Object ' TextStream
Dim wordApp As Object ' Word.Application
Dim wordDoc As Object ' Word.Document
Dim tempFileName As String
tempFileName = "~RtfToPlainText"
If IsNull(rtfText) Then
rtn = Null
Else
' save RTF text as file
Set fso = CreateObject("Scripting.FileSystemObject")
tempFolder = fso.GetSpecialFolder(2) ' Temporaryfolder
rtfPath = tempFolder & "\" & tempFileName & ".rtf"
Set f = fso.CreateTextFile(rtfPath)
f.Write rtfText
f.Close
Set f = Nothing
' open in Word and save as plain text
Set wordApp = CreateObject("Word.Application")
Set wordDoc = wordApp.Documents.Open(rtfPath)
txtPath = tempFolder & "\" & tempFileName & ".txt"
wordDoc.SaveAs2 txtPath, 2 ' wdFormatText
wordDoc.Close False
Set wordDoc = Nothing
wordApp.Quit False
Set wordApp = Nothing
fso.DeleteFile rtfPath
' retrieve plain text
Set f = fso.OpenTextFile(txtPath)
rtn = f.ReadAll
f.Close
Set f = Nothing
fso.DeleteFile txtPath
Set fso = Nothing
End If
RtfToPlainText = rtn
End Function
Then, if you had a table with two Memo fields - [rtfText] and [plainText] - you could extract the plain text into the second Memo field using the following query in Access:
UPDATE rtfTestTable SET plainText = RtfToPlainText([rtfText]);
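For comparison, the rule described in the question (delete everything from a backslash up to the next space) can be sketched with VBScript.RegExp. This is a crude filter, not an RTF parser: it leaves braces and font-table text such as "Times New Roman;" behind, so the Word-based function above is likely to give cleaner results. The function name StripBackslashCodes is hypothetical.

```vba
' Sketch: remove every run that starts with a backslash and continues
' up to the next whitespace or the next backslash.
Public Function StripBackslashCodes(ByVal sText As String) As String
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    With re
        .Global = True                   ' replace all matches, not just the first
        .Pattern = "\\[^\s\\]*\s?"       ' backslash, code characters, optional trailing space
    End With
    StripBackslashCodes = re.Replace(sText, "")
End Function
```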
The text you are working with is RTF. Here is a tutorial about the file format.
This link (on another site, registration required) may give you copy & paste code you can use to convert rtf fields to txt.
You may be able to copy the value of the field from the database and paste it into notepad and then save the notepad file as "test.rtf"...you could then double click the file icon and the document may open.
RTF is an old MS file format that allows formatting of text. See this Wikipedia page.