Convert HTML to string, then Find and Replace - html

I've searched extensively on this subject and haven't been able to find just what I've been looking for... so here I go!
Basically, I have 2 HTML Files. 1, I export from excel to HTML each day. The 2nd file, has addition code/CSS for table headers and formats as well as a scroll bar and search function. What I do, is copy the necessary bits out of the exported file into the 2nd file so it updates with the latest data. The 2nd file then links in to a larger web page on the company intranet for staff to see their results.
I have an entire automated system in place, and how I have been currently accomplishing this is to use VBA to open Notepad++ (used as my HTML editor) and then manually make these changes. I have recorded a macro within Notepad++ to do the changes automatically using "CTRL F1" as the command, but VBA doesn't work well with the Sendkeys function when I use the shell command to open Notepad++ so it hasn't been a viable solution for automation.
I then researched some more and came across the below code which I have amended to suit my needs, to bypass Notepad++ altogether and turn the HTML into a string. Problem is, it isn't just 1 word I am needing to find and replace, it's 2 entire and separate sections of code. I thought I could use wildcards but it doesn't seem to want to work. Any help that will allow me to replace an entire block of HTML code with another, using excel VBA would be an absolute lifesaver. Thanks in advance!
PS: The below code works as written because I remove the wildcard so it's just finding a couple of words on 1 line and replacing it with the entire code from the source file. I need to be able to replace and entire section, with a specified section from the source file as well
Sub Find_Replace2()
Dim sTempSource As String, sTempDest As String
'Dim sTemp As String
Dim sBuf As String
Dim iFileNum As Integer
Dim sFileName As String
'locations of html files, sourcefile goes into destfile
Dim htmlSourcefile As String: htmlSourcefile = "I:\The Hub\Pages\Statistics\Incentive\STB Incentive\STB League2.html"
Dim htmlDestfile As String: htmlDestfile = "I:\The Hub\Pages\Statistics\Incentive\STB League - Copy.html"
sFileName = htmlDestfile
'Opens the above files, and converts them to big long strings
iFileNum = FreeFile
Open htmlDestfile For Input As iFileNum
Do Until EOF(iFileNum)
Line Input #iFileNum, sBuf
sTempDest = sTempDest & sBuf & vbCrLf
Loop
Close iFileNum
iFileNum = FreeFile
Open htmlSourcefile For Input As iFileNum
Do Until EOF(iFileNum)
Line Input #iFileNum, sBuf
sTempSource = sTempSource & sBuf & vbCrLf
Loop
Close iFileNum
'find and replace on string
sTempDest = Replace(sTempDest, "<!--Start of VBA insert -->", "<!--Start of VBA insert -->" & sTempSource & "<!--End of VBA insert -->")
'saves string back off as original file
iFileNum = FreeFile
Open sFileName For Output As iFileNum
Print #iFileNum, sTempDest
Close iFileNum
End Sub

The replace function will not work as you have it. The following code gets dest file code before > "!--insert vba....>" here in one text string, then it gets everything after "!--end insert vba..." in another text string
And we get only the table from the source file. (given that the table end tag is
</table>.
I saved a table to html and that is how my Excel ended the table.)
So we add together Dest1 + source table + dest2 for the final page.
I have it saving the file to a tester.html file so it won't ruin your original file until you have had the chance to test it.
Sub Find_Replace2()
Dim sTempSource As String, sTempDest As String, sTempDest1 As String, sTempDest2 As String
Dim sSource1 As Long, sSource2 As Long, sTempSource2 As String
Dim point1 As Long, point2 As Long, point3 As Long, point4 As Long
Dim sBuf As String
Dim iFileNum As Integer
'locations of html files, sourcefile goes into destfile
Dim htmlSourcefile As String: htmlSourcefile = "I:\The Hub\Pages\Statistics\Incentive\STB Incentive\STB League2.html"
Dim htmlDestfile As String: htmlDestfile = "I:\The Hub\Pages\Statistics\Incentive\STB League - Copy.html"
'Opens the above files, and converts them to big long strings
iFileNum = FreeFile
Open htmlDestfile For Input As iFileNum
Do Until EOF(iFileNum)
Line Input #iFileNum, sBuf
sTempDest = sTempDest & sBuf & vbCrLf
Loop
Close iFileNum
iFileNum = FreeFile
Open htmlSourcefile For Input As iFileNum
Do Until EOF(iFileNum)
Line Input #iFileNum, sBuf
sTempSource = sTempSource & sBuf & vbCrLf
Loop
Close iFileNum
point3 = InStr(1, sTempSource, "<table") - 1
point4 = InStr(point3, sTempSource, "</table>") + 8
sTempSource2 = Mid(sTempSource, point3, point4 - point3)
point1 = InStr(1, sTempDest, "<!--Start of VBA insert -->") + 27
point2 = InStr(point1, sTempDest, "<!--End of VBA insert -->")
sTempDest1 = Mid(sTempDest, 1, point1)
sTempDest1 = sTempDest1 & sTempSource2
sTempDest2 = sTempDest1 & Mid(sTempDest, point2, Len(sTempDest) - point2)
'saves string back to a tester file
iFileNum = FreeFile
Open "I:\The Hub\Pages\Statistics\Incentive\tester.html" For Output As iFileNum
Print #iFileNum, sTempDest2
Close iFileNum
End Sub

Sub Find_Replace2()
Dim sTempSource As String, sTempDest As String, sTempDest1 As String, sTempDest2 As String, sTempDest3 As String
Dim sTempSource2 As String, sTempSource3 As String
Dim point1 As Long, point2 As Long, point3 As Long, point4 As Long, point5 As Long, point6 As Long, point7 As Long, point8 As Long
Dim sBuf As String
Dim iFileNum As Integer
'locations of html files, sourcefile goes into destfile
Dim htmlSourcefile As String: htmlSourcefile = "I:\The Hub\Pages\Statistics\Incentive\STB Incentive\STB League2.html"
Dim htmlDestfile As String: htmlDestfile = "I:\The Hub\Pages\Statistics\Incentive\STB League - Copy.html"
'Opens the above files, and converts them to big long strings
iFileNum = FreeFile
Open htmlDestfile For Input As iFileNum
Do Until EOF(iFileNum)
Line Input #iFileNum, sBuf
sTempDest = sTempDest & sBuf & vbCrLf
Loop
Close iFileNum
iFileNum = FreeFile
Open htmlSourcefile For Input As iFileNum
Do Until EOF(iFileNum)
Line Input #iFileNum, sBuf
sTempSource = sTempSource & sBuf & vbCrLf
Loop
Close iFileNum
point3 = InStr(1, sTempSource, "<!--table")
point4 = InStr(point3, sTempSource, "-->") + 3
sTempSource2 = Mid(sTempSource, point3, point4 - point3)
point7 = InStr(1, sTempSource, "</tr>")
point8 = InStr(point7, sTempSource, "<!--END OF OUTPUT FROM EXCEL PUBLISH AS WEB PAGE WIZARD-->") + 58
sTempSource3 = Mid(sTempSource, point7, point8 - point7)
point1 = InStr(1, sTempDest, "<!--Start of VBA insert -->") + 27
point2 = InStr(point1, sTempDest, "<!--End of VBA insert -->") - 1
point5 = InStr(1, sTempDest, "<!--Start of VBA insert2 -->") + 28
point6 = InStr(point5, sTempDest, "<!--End of VBA insert2 -->") - 1
sTempDest1 = Mid(sTempDest, 1, point1)
sTempDest1 = sTempDest1 & sTempSource2
sTempDest2 = sTempDest1 & Mid(sTempDest, point2, point5 - point2)
sTempDest2 = sTempDest2 & sTempSource3
sTempDest3 = sTempDest2 & Mid(sTempDest, point6, Len(sTempDest) - point6)
'saves string back to a tester file
iFileNum = FreeFile
Open "I:\The Hub\Pages\Statistics\Incentive\STB League - Copy.html" For Output As iFileNum
Print #iFileNum, sTempDest3
Close iFileNum
End Sub

Related

Error initializing PDF reDirect ProFinding to merge two files

I've found this code that should find two matching pdf-filenames and merges them into 1 pdf-file always in the same order. File 1 then File 2.
The code matches filenames based on the first part of the filename, before the AnotherWord 2014.pdf or before SomeWord.pdf.
Example document name1: John Doe SomeWord.pdf
Example document name2: John Doe AnotherWord 2014.pdf
I use PDF reDirect Pro v2.5.2 (freeware) and a reference to the program.
The problem I have is that the line
Dim oPDF As New PDF_reDirect_v25002.Batch_RC_AXD
That gives me the error:
User-defined type not defined
How can I fix this?
This is the whole code:
Private Sub Knop0_Click()
'Only works with PDF reDirect Pro v2.5.2
'And needs to have a reference to PDF_reDirect_v2500 and PDF reDirect Pro Remote Control
Dim fs As Object
Dim fld As Object
Dim fld2 As Object
Dim objFile As Object
Dim objFile2 As Object
Set fs = CreateObject("Scripting.FileSystemObject")
Dim oPDF As New PDF_reDirect_v25002.Batch_RC_AXD
Dim TempBool As Boolean
Dim Files_to_Merge(1) As String
Dim ObjFileName() As String
Dim CellNameValue() As String
Dim ofn As String
Dim cnv As String
Dim i As Integer
Set fld = fs.GetFolder("C:\pdf")
Set fld2 = fs.GetFolder("C:\pdf\merged")
i = 1
For Each objFile In fld.files
For Each objFile2 In fld.files
CellNameValue() = Split(objFile.Name, " SomeWord.pdf")
cnv = CellNameValue(0)
ObjFileName() = Split(objFile2.Name, " AnotherWord 2014.pdf")
ofn = ObjFileName(0)
Files_to_Merge(1) = fld & "\" & ofn & " AnotherWord 2014.pdf"
Files_to_Merge(0) = fld & "\" & cnv & " SomeWord.pdf"
If StrComp(ofn, cnv) = 0 Then
With oPDF
TempBool = .Utility_Merge_PDF_Files(fld2 & "\" & ofn & " AnotherWord 2014.pdf", Files_to_Merge) 'The file merges here unless it generates an error and goes to If Not TempBool Then...
If Not TempBool Then
MsgBox "An Error Occured: etc."
Else
'Optional
End If
End With
End If
i = i + 1
Next objFile2
i = i + 1
Next objFile
Set oPDF = Nothing
End Sub
As I said in my comments this should work assuming your trial Pro version will still allow this feature.
You just have to make sure you're using the correct version of the object that has been registered on your system.
Can you delete the line Dim oPDF As New PDF_reDirect_v25002.Batch_RC_AXD and start retyping it from scratch - not copy/pasting? Does the Object for the tool show up in Intellisense as you start typing PDF_Re
Put in the object that it finds PDF_reDirect_v2500 if that's what it is - then type the . and start typing Batch to fill in the last part. You have to use the current version of the object reference.
Compile your code and see if gets past that line

Combining multiple files in to one text using ms access

I have 6 text files in one folder.
I want combine selected files in to one text using access.
I have tried this code without success, because the one text file is created but is empty
Can any one help me on this?
Thanks in advance, my code below.
Lines in the text file:
xN;xDate;xNode;xCO;
100;2017-09-26 00:00:00;Valley;D6;
101;2017-09-25 00:00:00;Valley;D3;
...
...
Code:
Dim xPath
Function xExtract()
Dim xArray() As Variant
Dim I As Integer
Dim StrFileName As String
xPath = CurrentProject.Path
PDS:
xArray = Array("D1", "D2", "D3", "D4", "D5", "D6")
new_file = "" & xPath & "\PDS.txt"
fn = FreeFile
Open new_file For Output As fn
Close
For I = 0 To UBound(xArray)
StrFileName = "\\myserver\inetpub\ftproot\PDS_" & xArray(I) & ".txt"
fn = FreeFile
Open StrFileName For Input As fn
Open new_file For Append As fn + 1
Line Input #fn, dato
Do While Not EOF(fn)
Line Input #fn, dato
dati = Split(dato, Chr(9))
For d = 0 To UBound(dati)
If d = 0 Then
dato = Trim(dati(d))
Else
dato = dato & ";" & Trim(dati(d))
End If
Next
Print #fn + 1, dato
Loop
Close
Next I
Application.Quit
End Function
Here's code that works for concatenating comma delimited text files (probably would work for any text files). Pretty crude. Needs error handler and would benefit from common dialog to select output folder and file name. Also I don't like using non-typed variables, but I don't know what type of object some of them are and can't figure it out from Microsoft help. Warning, don't put output in same folder - might result in endless loop - trust me I tried it
Public Function CFiles(Filepath As String) As String
Dim FSO As Scripting.FileSystemObject
Dim SourceFolder As Folder
Dim Filein As Object
Dim fileout As Object
Dim strText As String
Dim TheInputfile As Object
Dim filename As String
Set FSO = New Scripting.FileSystemObject
Set SourceFolder = FSO.GetFolder(Filepath)
Set fileout = FSO.CreateTextFile("c:\InvestmentsPersonal\files\backup\output.txt", ForAppending, False)
For Each Filein In SourceFolder.Files
filename = Filein.Name
Set TheInputfile = FSO.OpenTextFile(Filepath & filename, ForReading)
strText = TheInputfile.ReadAll
TheInputfile.Close
fileout.WriteLine strText
Next
fileout.Close
Set fileout = Nothing
Set Filein = Nothing
Set SourceFolder = Nothing
Set FSO = Nothing
CFiles = "c:\InvestmentsPersonal\files\backup\output.txt"
End Function
As your code works for files with windows EOL format (CR (Carriage Return) + LF (Line Feed)), I guess your files are UNIX EOL format (just LF, no CR), check this with a texteditor like e.g. Notepad++ (View->Show Symbol->Show End of Line). This causesLine Inputto read the whole file in one line as it breaks on CR. Then you skip the first line and nothing is inserted, because all text is in this line.
You can useFileSystemObjectto avoid this as it breaks on LF.
Function xExtract()
Const ForReading = 1, ForWriting = 2, ForAppending = 8 'iomode constants
Const TristateUseDefault = -2, TristateTrue = -1, TristateFalse = 0 'format constants
Dim xArray As Variant, dati As Variant
Dim i As Long, d As Long
Dim xPath As String, new_file As String, dato As String, StrFileName As String
Dim FSO As Object, TextStreamIn As Object, TextStreamOut As Object
xPath = CurrentProject.Path
new_file = xPath & "\PDS.txt"
xArray = Array("D1", "D2", "D3", "D4", "D5", "D6")
Set FSO = CreateObject("Scripting.FileSystemObject")
Set TextStreamOut = FSO.OpenTextFile(new_file, ForWriting, True, TristateUseDefault) 'open textstream to write
For i = 0 To UBound(xArray) 'loop through files
StrFileName = "\\myserver\inetpub\ftproot\PDS_" & xArray(i) & ".txt"
Set TextStreamIn = FSO.OpenTextFile(StrFileName, ForReading) ' open textstream to read
TextStreamIn.SkipLine 'skip first line with headers
Do Until TextStreamIn.AtEndOfStream 'loop through lines
dati = Split(TextStreamIn.Readline, Chr(9))
For d = 0 To UBound(dati)
If d = 0 Then
dato = Trim(dati(d))
Else
dato = dato & ";" & Trim(dati(d))
End If
Next
TextStreamOut.WriteLine dato 'write line to file
Loop
TextStreamIn.Close 'close textstream
Next i 'next file
TextStreamOut.Close
Set TextStreamOut = Nothing
Set TextStreamIn = Nothing
Set FSO = Nothing
Application.Quit
End Function
If you want to stay withOpen fileyou can split the first (and only) line on LF (Split(dato,vbLf) and ignore the first element, but you have to check the file is UNIX EOL format, FSO covers both.

Importing csv file with hidden characters into Excel using vba

With the help of this forum have been able to solve most problem but this has me stuck.
I have a comma delimited csv file ("xxxx","zzz",) that has hidden chr(10) and chr(13) in the file. If I use a script to replace both these characters, I lose the end of record chr(10) so only imports as one record.
In notepad the file shows perfect - one record per line. If I open as a an excel file it is ok, its only when I import as a csv delimited file
With thanks to other contributors, below is what I have been using.
Ideally what I would like to do is:
select the csv file
copy the file to keep the original <<< extra function
clean up all hidden characters that would affect the import
then import
-
Private Sub CSV_ImportRepl()
Dim strFile As String
Dim strBuffer As String
Dim ff As Integer
Dim strFileName As String
Dim ws As Worksheet
'ENTRIES CSV FILE ----------------------------------
' ---open file ----------
strFile = Application.GetOpenFilename("Text Files (*.csv),*.*", _
, "SELECT ENTRIES csv FILE")
strFileName = strFile
MsgBox strFileName
' ---start cleaning file ----------
strBuffer = Space(FileLen(strFile))
ff = FreeFile
Open strFile For Binary Access Read As #ff
Get #ff, , strBuffer
Close #ff
strBuffer = Replace(strBuffer, Chr(13), "")
Kill strFile
Open strFile For Binary Access Write As #ff
Put #ff, , strBuffer
Close #ff
' --- clear contents & import ----------
Sheets("Entries").Cells.ClearContents
Set ws = ActiveWorkbook.Sheets("Entries") 'set to current worksheet name
strFile = strFileName
With ws.QueryTables.Add(Connection:="TEXT;" & strFile, _
Destination:=ws.Range("A1"))
.TextFileParseType = xlDelimited
.TextFileCommaDelimiter = True
.Refresh
End With
End Sub
Any help would be most appreciated.
Sub opencsv()
strFile = Application.GetOpenFilename("Text Files (*.csv),*.*", , "Please selec text file...")
strFileName = strFile
Set src = Workbooks.Open(Filename:=strFile, Local:=True)
Cells.Copy
ThisWorkbook.Activate
Sheets("Entries").Activate
Range("A1").Select
ActiveSheet.Paste
Application.CutCopyMode = False
src.Close
End Sub

Is it possible to batch convert csv to xls using a macro?

I have a large amount of csv files that I need in .xls format. Is it possible to run a batch conversion with a macro or best done with another language?
I have used this code http://www.ozgrid.com/forum/showthread.php?t=71409&p=369573#post369573 to reference my directory but I'm not sure of the command to open each file and save them. Here's what I have:
Sub batchconvertcsvxls()
Dim wb As Workbook
Dim CSVCount As Integer
Dim myVar As String
myVar = FileList("C:\Documents and Settings\alistairw\My Documents\csvxlstest")
For i = LBound(myVar) To UBound(myVar)
With wb
Application.Workbooks.OpenText 'How do I reference the myvar string ?
wb.SaveAs '....
End With
Next
End Sub
Function FileList(fldr As String, Optional fltr As String = "*.*") As Variant
Dim sTemp As String, sHldr As String
If Right$(fldr, 1) <> "\" Then fldr = fldr & "\"
sTemp = Dir(fldr & fltr)
If sTemp = "" Then
FileList = Split("No files found", "|") 'ensures an array is returned
Exit Function
End If
Do
sHldr = Dir
If sHldr = "" Then Exit Do
sTemp = sTemp & "|" & sHldr
Loop
FileList = Split(sTemp, "|")
End Function
Edit: The files are .txt files formatted as csv
By combining the code given by Scott Holtzman and 'ExcelFreak', the conversion works quite well. The final code looks something like this:
Sub CSV_to_XLS()
Dim wb As Workbook
Dim strFile As String, strDir As String
strDir = "U:\path\"
strFile = Dir(strDir & "*.csv")
Do While strFile <> ""
Set wb = Workbooks.Open(Filename:=strDir & strFile, Local:=True)
wb.SaveAs Replace(wb.FullName, ".csv", ".xls"), 50 'UPDATE:
wb.Close True
Set wb = Nothing
strFile = Dir
Loop
End Sub
Opening the converted .xls file throws a warning everytime:
"The file you are trying to open, 'filename', is in a different format than specified by the file extension. Verify that the file is not corrupted and is from a trusted source before opening the file. Do you want to open the file now?"
Clicking Yes then opens the .xls file.
Is there a way to get rid of this warning message? Excel throws a warning everytime the .xls file is opened.
In a lot less lines of code, this should get you what you want. However, I will say this may not be the fastest way to get it done, because you are opening, saving, and closing the workbook every time. I will look for a faster way, but I forget the method off the top of my head.
Sub batchconvertcsvxls()
Dim wb As Workbook
Dim strFile As String, strDir As String
strDir = "C:\"
strFile = Dir(strDir & "*.csv")
Do While strFile <> ""
Set wb = Workbooks.Open(strDir & strFile)
With wb
.SaveAs Replace(wb.FullName, ".csv", ".xls"), 50 'UPDATE:
.Close True
End With
Set wb = Nothing
Loop
End Sub
** UPDATE **
you need the proper fileformat enumeration for a .xls file. I think its 50, but you can check here Excel File Type Enumeration, if it's not.
The Code of Scott Holtzman nearly did it for me. I had to make two changes to get it to work:
He forgot to add the line that makes our loop continue with the next file. The last line before the Loop should read
strFile = Dir
The Workbooks.Open method did not read my CSV files as expected (the whole line ended up to be text in the first cell). When I added the parameter Local:=True it worked:
Set wb = Workbooks.Open(Filename:=strDir & strFile, Local:=True)
This works properly at least on Excel 2013. Using FileFormat:=xlExcel8 parameter instead of the filetype tag 50 creates files that open without security nags.
Sub CSV_to_XLS()
Dim wb As Workbook
Dim strFile As String, strDir As String
strDir = "C:\temp\"
strFile = Dir(strDir & "*.csv")
Do While strFile <> ""
Set wb = Workbooks.Open(Filename:=strDir & strFile, Local:=True)
wb.SaveAs Replace(wb.FullName, ".csv", ".xls"), FileFormat:=xlExcel8
wb.Close True
Set wb = Nothing
strFile = Dir
Loop
End Sub
This was a good question and I have found in the internet several answers. Just making very small changes (I couldn't edit any of the codes already published) I could make things work a bit better:
Sub CSV_to_XLSX()
Dim wb As Workbook
Dim strFile As String, strDir As String
strDir = "C:\Users\acer\OneDrive\Doctorado\Study 1\data\Retest Bkp\Day 1\Sart\"
strFile = Dir(strDir & "*.csv")
Do While strFile <> ""
Set wb = Workbooks.Open(Filename:=strDir & strFile, Local:=True)
With wb
.SaveAs Replace(wb.FullName, ".csv", ".xlsx"), 51
.Close True
End With
Set wb = Nothing
strFile = Dir
Loop
End Sub

Escaping non-ASCII characters (or how to remove the BOM?)

I need to create an ANSI text file from an Access recordset that outputs to JSON and YAML. I can write the file, but the output is coming out with the original characters, and I need to escape them. For example, an umlaut-O (ö) should be "\u00f6".
I thought encoding the file as UTF-8 would work, but it doesn't. However, having looked at the file coding again, if you write "UTF-8 without BOM" then everything works.
Does anyone know how to either
a) Write text out as UTF-8 without BOM, or
b) Write in ANSI but escaping the non-ASCII characters?
Public Sub testoutput()
Set db = CurrentDb()
str_filename = "anothertest.json"
MyFile = CurrentProject.Path & "\" & str_filename
str_temp = "Hello world here is an ö"
fnum = FreeFile
Open MyFile For Output As fnum
Print #fnum, str_temp
Close #fnum
End Sub
... ok .... i found some example code on how to remove the BOM. I would have thought it would be possible to do this more elegantly when actually writing the text in the first place. Never mind. The following code removes the BOM.
(This was originally posted by Simon Pedersen at http://www.imagemagick.org/discourse-server/viewtopic.php?f=8&t=12705)
' Removes the Byte Order Mark - BOM from a text file with UTF-8 encoding
' The BOM defines that the file was stored with an UTF-8 encoding.
Public Function RemoveBOM(filePath)
' Create a reader and a writer
Dim writer, reader, fileSize
Set writer = CreateObject("Adodb.Stream")
Set reader = CreateObject("Adodb.Stream")
' Load from the text file we just wrote
reader.Open
reader.LoadFromFile filePath
' Copy all data from reader to writer, except the BOM
writer.Mode = 3
writer.Type = 1
writer.Open
reader.Position = 5
reader.CopyTo writer, -1
' Overwrite file
writer.SaveToFile filePath, 2
' Return file name
RemoveBOM = filePath
' Kill objects
Set writer = Nothing
Set reader = Nothing
End Function
It might be useful for someone else.
Late to the game here, but I can't be the only coder who got got fed up with my SQL imports being broken by text files with a Byte Order Marker. There are very few 'Stack questions that touch on the problem - this is one of closest - so I'm posting an overlapping answer here.
I say 'overlapping' because the code below is solving a slightly different problem to yours - the primary purpose is writing a Schema file for a folder with a heterogeneous collection of files - but the BOM-handling segment is clearly marked.
The key functionality is that we iterate through all the '.csv' files in a folder, and we test each file with a quick nibble of the first four bytes: and we only only strip out the Byte Order Marker if we see one.
After that, we're working in low-level file-handling code from the primordial C. We have to, all the way down to using byte arrays, because everything else that you do in VBA will deposit the Byte Order Markers embedded in the structure of a string variable.
So, without further adodb, here's the code:
BOM-Disposal code for text files in a schema.ini file:
Public Sub SetSchema(strFolder As String)
On Error Resume Next
' Write a Schema.ini file to the data folder.
' This is necessary if we do not have the registry privileges to set the
' correct 'ImportMixedTypes=Text' registry value, which overrides IMEX=1
' The code also checks for ANSI or UTF-8 and UTF-16 files, and applies a
' usable setting for CharacterSet ( UNICODE|ANSI ) with a horrible hack.
' OEM codepage-defined text is not supported: further coding is required
' ...And we strip out Byte Order Markers, if we see them - the OLEDB SQL
' provider for textfiles can't deal with a BOM in a UTF-16 or UTF-8 file
' Not implemented: handling tab-delimited files or other delimiters. The
' code assumes a header row with columns, specifies 'scan all rows', and
' imposes 'read the column as text' if the data types are mixed.
Dim strSchema As String
Dim strFile As String
Dim hndFile As Long
Dim arrFile() As Byte
Dim arrBytes(0 To 4) As Byte
If Right(strFolder, 1) <> "\" Then strFolder = strFolder & "\"
' Dir() is an iterator function when you call it with a wildcard:
strFile = VBA.FileSystem.Dir(strFolder & "*.csv")
Do While Len(strFile) > 0
hndFile = FreeFile
Open strFolder & strFile For Binary As #hndFile
Get #hndFile, , arrBytes
Close #hndFile
strSchema = strSchema & "[" & strFile & "]" & vbCrLf
strSchema = strSchema & "Format=CSVDelimited" & vbCrLf
strSchema = strSchema & "ImportMixedTypes=Text" & vbCrLf
strSchema = strSchema & "MaxScanRows=0" & vbCrLf
If arrBytes(2) = 0 Or arrBytes(3) = 0 Then ' this is a hack
strSchema = strSchema & "CharacterSet=UNICODE" & vbCrLf
Else
strSchema = strSchema & "CharacterSet=ANSI" & vbCrLf
End If
strSchema = strSchema & "ColNameHeader = True" & vbCrLf
strSchema = strSchema & vbCrLf
' BOM disposal - Byte order marks confuse OLEDB text drivers:
If arrBytes(0) = &HFE And arrBytes(1) = &HFF _
Or arrBytes(0) = &HFF And arrBytes(1) = &HFE Then
hndFile = FreeFile
Open strFolder & strFile For Binary As #hndFile
ReDim arrFile(0 To LOF(hndFile) - 1)
Get #hndFile, , arrFile
Close #hndFile
BigReplace arrFile, arrBytes(0) & arrBytes(1), ""
hndFile = FreeFile
Open strFolder & strFile For Binary As #hndFile
Put #hndFile, , arrFile
Close #hndFile
Erase arrFile
ElseIf arrBytes(0) = &HEF And arrBytes(1) = &HBB And arrBytes(2) = &HBF Then
hndFile = FreeFile
Open strFolder & strFile For Binary As #hndFile
ReDim arrFile(0 To LOF(hndFile) - 1)
Get #hndFile, , arrFile
Close #hndFile
BigReplace arrFile, arrBytes(0) & arrBytes(1) & arrBytes(2), ""
hndFile = FreeFile
Open strFolder & strFile For Binary As #hndFile
Put #hndFile, , arrFile
Close #hndFile
Erase arrFile
End If
strFile = ""
strFile = Dir
Loop
If Len(strSchema) > 0 Then
strFile = strFolder & "Schema.ini"
hndFile = FreeFile
Open strFile For Binary As #hndFile
Put #hndFile, , strSchema
Close #hndFile
End If
End Sub
Public Sub BigReplace(ByRef arrBytes() As Byte, ByRef SearchFor As String, ByRef ReplaceWith As String)
On Error Resume Next
Dim varSplit As Variant
varSplit = Split(arrBytes, SearchFor)
arrBytes = Join$(varSplit, ReplaceWith)
Erase varSplit
End Sub
The code's easier to understand if you know that a Byte Array can be assigned to a VBA.String, and vice versa. The BigReplace() function is a hack that sidesteps some of VBA's inefficient string-handling, especially allocation: you'll find that large files cause serious memory and performance problems if you do it any other way.