Reading and parsing large delimited text files in VB.net - mysql

I'm working on an application that reads space-delimited log files ranging from 5 MB to over 1 GB in size, then stores this information in a MySQL database for later use when printing reports based on the information contained in the files. The methods I've tried or found work, but they are very slow.
Am I doing something wrong, or is there a better way to handle very large text files?
I've tried using TextFieldParser as follows:
Using parser As New TextFieldParser("C:\logfiles\testfile.txt")
    parser.TextFieldType = FieldType.Delimited
    parser.CommentTokens = New String() {"#"}
    parser.Delimiters = New String() {" "}
    parser.HasFieldsEnclosedInQuotes = False
    parser.TrimWhiteSpace = True
    While Not parser.EndOfData
        Dim input As String() = parser.ReadFields()
        If input.Length = 10 Then
            'add this to a datatable
        End If
    End While
End Using
This works but is very slow for the larger files.
I then tried using an OleDB connection to the text file as per the following function in conjunction with a schema.ini file I write to the directory beforehand:
Function GetSquidData(ByVal logfile_path As String) As System.Data.DataTable
    Dim myData As New DataSet
    Dim strFilePath As String = ""
    If logfile_path.EndsWith("\") Then
        strFilePath = logfile_path
    Else
        strFilePath = logfile_path & "\"
    End If
    Dim mySelectQry As String = "SELECT * FROM testfile.txt WHERE Client_IP <> """""
    Dim myConnection As New System.Data.OleDb.OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & strFilePath & ";Extended Properties=""text;HDR=NO;""")
    Dim dsCmd As New System.Data.OleDb.OleDbDataAdapter(mySelectQry, myConnection)
    dsCmd.Fill(myData, "logdata")
    If Not myConnection.State = ConnectionState.Closed Then
        myConnection.Close()
    End If
    Return myData.Tables("logdata")
End Function
The schema.ini file:
[testfile.txt]
Format=Delimited( )
ColNameHeader=False
Col1=Timestamp text
Col2=Elapsed text
Col3=Client_IP text
Col4=Action_Code text
Col5=Size double
Col6=Method text
Col7=URI text
Col8=Ident text
Col9=Hierarchy_From text
Col10=Content text
Anyone have any ideas how to read these files faster?
-edit-
Corrected a typo in the code above

There are two potentially slow operations there:
File reading
Inserting lots of data into the db
Separate them and test which is taking the most time. I.e. write one test program that simply reads the file, and another test program that just inserts loads of records. See which one is slowest.
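To make that comparison concrete, here is a rough sketch of the two test programs, written in Python only to keep it short. The names time_read and time_insert are made up for illustration, and an in-memory SQLite database stands in for MySQL, so the absolute numbers will differ from the real setup; the point is only to measure each stage on its own.

```python
import sqlite3
import time


def time_read(path):
    """Read and split every line without touching the database."""
    start = time.perf_counter()
    rows = 0
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        for line in f:
            # Same acceptance rule as the question: exactly 10 fields.
            if len(line.split()) == 10:
                rows += 1
    return rows, time.perf_counter() - start


def time_insert(n_rows):
    """Insert n_rows dummy records without touching the log file."""
    conn = sqlite3.connect(":memory:")  # assumption: stand-in for MySQL
    conn.execute("CREATE TABLE logdata (client_ip TEXT, size REAL)")
    start = time.perf_counter()
    with conn:  # a single transaction around all inserts
        conn.executemany(
            "INSERT INTO logdata VALUES (?, ?)",
            (("10.0.0.1", float(i)) for i in range(n_rows)),
        )
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM logdata").fetchone()[0]
    conn.close()
    return count, elapsed
```

Running time_read on a real log file and time_insert with a similar row count should show which side dominates before any optimisation is attempted.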
One problem could be that you are reading the whole file into memory?
Try reading it line by line with a Stream. Here is a code example copied from MSDN
Imports System
Imports System.IO
Class Test
    Public Shared Sub Main()
        Try
            ' Create an instance of StreamReader to read from a file.
            ' The using statement also closes the StreamReader.
            Using sr As New StreamReader("TestFile.txt")
                Dim line As String
                ' Read and display lines from the file until the end of
                ' the file is reached.
                Do
                    line = sr.ReadLine()
                    If Not (line Is Nothing) Then
                        Console.WriteLine(line)
                    End If
                Loop Until line Is Nothing
            End Using
        Catch e As Exception
            ' Let the user know what went wrong.
            Console.WriteLine("The file could not be read:")
            Console.WriteLine(e.Message)
        End Try
    End Sub
End Class
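Reading line by line also makes it easy to hand the rows to the database in fixed-size batches instead of building one huge table in memory. A minimal sketch of that idea follows, again in Python for brevity; parse_lines and batches are hypothetical helper names, and the 10-field rule and "#" comment marker are taken from the question.

```python
from itertools import islice


def parse_lines(lines, expected_fields=10, comment="#"):
    """Yield the fields of each well-formed line, one line at a time."""
    for line in lines:
        if line.lstrip().startswith(comment):
            continue  # skip comment lines
        fields = line.split()  # splits on runs of whitespace
        if len(fields) == expected_fields:
            yield fields


def batches(rows, size=1000):
    """Group an iterable of rows into lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk
```

A file object can be passed straight in as `lines`, so only one batch is held in memory at a time; each batch would then go to whatever bulk insert the database driver offers.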

Off the top of my head, I'd say try to implement some kind of threading to spread the workload.

Related

create custom text file from Microsoft Access

I'm trying to create a text file from an Access Database that looks exactly like this:
CADWorx P&ID Drop Down List Configuration File.
Notes:
-This file contains information on what
appears in the drop down list in the
CEDIT Additional Data Dialog
-Entries should be separated by a semi-colon (;)
-If a value is not set, an edit box will appear
in the CEDIT Additional Data Dialog instead
of a drop down list.
-Example: AREA_=031;032;033;034A;034B;
Example: SERVICE_=AEC;HW;LH;CCH;
[DOCUMENTATION]
TYPE_=
DATESUB_=
DATEAPR_=
CREATEBY_=
APRBY_=
[LINE]
SERVICE_=OIL;FUEL GAS;
AREA_=
UNIT_=
COUNT_=
TYPE_=
RATING_=
FLGFACE_=
DESIGNPSI_=
DESIGNDEG_=
LINE_NUM_=
OPERPSI_=
OPERDEG_=
SPECPRESS_=
SPECTEMP_=
MINDEG_=
TESTPSI_=
INSULATE_=
HEATTRACE_=
XRAY_=
CODE_=
JOINTEFF_=
WELDPROC_=
INSPECT_=
MATPIPE_=
COMPNOTE_=
NOTE_=
USER1_=
All the fields on the left (those that end with '_=') are field titles in my database. Then, as explained above, values for those fields must be added and separated by semicolons. I've been researching for over a week and keep hitting dead ends with text file customization in Access. Can someone tell me if this is the way to go? Or should I export the data to Excel and create the text file from there?
Your help is very much appreciated. Thanks in advance.
Here are the basics to write records to a file:
Dim dbs As DAO.Database
Dim rst As DAO.Recordset
Dim intFileDesc As Integer 'File number for the output file
Dim strOutput As String 'Output string for entry
Dim strRecordSource As String 'Source for recordset, can be SQL, table, or saved query
Dim strOutfile As String 'Full path to output file
'strRecordSource and strOutfile must be assigned before this point
If Dir(strOutfile) <> "" Then Kill strOutfile 'Delete the output file if it already exists.
'Not necessary, but ensures you have a clean copy every time
intFileDesc = FreeFile 'Get a free file number
Open strOutfile For Output As #intFileDesc 'Print # needs Output (or Append) mode, not Binary
Set dbs = CurrentDb
Set rst = dbs.OpenRecordset(strRecordSource) 'Open the recordset based on our source string
With rst 'make things easier for ourselves
    While Not .EOF
        strOutput = !Field1 & ";" & !Field2 & ";" & !Field3
        Print #intFileDesc, strOutput 'Print output string to file
        .MoveNext 'Advance to next record in recordset
    Wend
    .Close 'Close this recordset
End With
Close #intFileDesc 'Close output file
Set rst = Nothing
Set dbs = Nothing 'Release object references before we exit the function
The essential line is this:
Print #intFileDesc, strOutput 'Print output string to file
by which you can write one line at a time.
Thus, build a function that expands on this, creating your output line by line (including empty lines for spacing) until done. That's it.
It takes a lot of code but really is quite trivial. And there is no other way for outputs like these.
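The "build it line by line" step can be sketched independently of Access. The Python below is only an illustration of the layout logic: build_config is a made-up name, and the nested dictionary stands in for whatever the recordset returns. The same loop structure would be written with Print # in VBA.

```python
def build_config(header_lines, sections):
    """Return the file text: header, then [SECTION] blocks of FIELD_=a;b; lines."""
    out = list(header_lines)
    for name, fields in sections.items():
        out.append(f"[{name}]")
        for field, values in fields.items():
            # Every value is followed by a semicolon; no values leaves "FIELD_=".
            out.append(f"{field}_=" + "".join(f"{v};" for v in values))
    return "\n".join(out) + "\n"


# Example data (hypothetical, mirroring the [LINE] block in the question)
sections = {"LINE": {"SERVICE": ["OIL", "FUEL GAS"], "AREA": []}}
text = build_config(["Notes:"], sections)
```

Fields without values come out as a bare `AREA_=` line, which matches the sample file in the question.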

Deleting words/strings containing a specific character in MS Access

I'm writing a query to extract text that was entered through a vendor-created word processor to an Oracle database and I need to export it to Word or Excel. The text is entered into a memo field and the text is intertwined with codes that the word processor uses for different functions (bold, indent, hard return, font size, etc.).
I've used the replace function to parse out a lot of the more common codes, but there are so many variations, it's nearly impossible to catch them all. Is there a way to do this? Unfortunately, I'm limited to using Microsoft Access 2010 to try and accomplish this.
The common thread I've found is that all the codes start with a back-slash and I'd like to be able to delete all strings that start with a back-slash up to the next space so all the codes are stripped out of the final text.
Here's a brief example of the text I'm working with:
{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Times New Roman;
\viewkind4\uc1\pard\f0\fs36 An abbreviated survey was conducted
on 02/02/15 to investigate complaint #OK000227. \par
No deficiencies were cited.\par
\fs20\par
}}
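For reference, the rule described above (drop everything from a backslash up to the next space) can be sketched as a single regular expression. This is a Python illustration only, with strip_codes as a made-up name; it is not a real RTF parser, and on the sample above it would still leave braces and header fragments such as the font name behind.

```python
import re

# A backslash, any run of non-whitespace, and at most one following whitespace character.
CODE_PATTERN = re.compile(r"\\\S*\s?")


def strip_codes(text):
    """Remove every backslash code up to (and including) the next whitespace."""
    return CODE_PATTERN.sub("", text)


cleaned = strip_codes(r"\fs36 An abbreviated survey \par done")
```

The leftover braces and header text are the reason a proper RTF reader is usually the safer route.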
If your machine has Microsoft Word installed then you already have an RTF parser available so you don't have to "roll your own". You can just get Word to open the RTF document and save it as plain text like this:
Option Compare Database
Option Explicit
Public Function RtfToPlainText(rtfText As Variant) As Variant
Dim rtn As Variant
Dim tempFolder As String, rtfPath As String, txtPath As String
Dim fso As Object ' FileSystemObject
Dim f As Object ' TextStream
Dim wordApp As Object ' Word.Application
Dim wordDoc As Object ' Word.Document
Dim tempFileName As String
tempFileName = "~RtfToPlainText"
If IsNull(rtfText) Then
rtn = Null
Else
' save RTF text as file
Set fso = CreateObject("Scripting.FileSystemObject")
tempFolder = fso.GetSpecialFolder(2) ' Temporaryfolder
rtfPath = tempFolder & "\" & tempFileName & ".rtf"
Set f = fso.CreateTextFile(rtfPath)
f.Write rtfText
f.Close
Set f = Nothing
' open in Word and save as plain text
Set wordApp = CreateObject("Word.Application")
Set wordDoc = wordApp.Documents.Open(rtfPath)
txtPath = tempFolder & "\" & tempFileName & ".txt"
wordDoc.SaveAs2 txtPath, 2 ' wdFormatText
wordDoc.Close False
Set wordDoc = Nothing
wordApp.Quit False
Set wordApp = Nothing
fso.DeleteFile rtfPath
' retrieve plain text
Set f = fso.OpenTextFile(txtPath)
rtn = f.ReadAll
f.Close
Set f = Nothing
fso.DeleteFile txtPath
Set fso = Nothing
End If
RtfToPlainText = rtn
End Function
Then, if you had a table with two Memo fields - [rtfText] and [plainText] - you could extract the plain text into the second Memo field using the following query in Access:
UPDATE rtfTestTable SET plainText = RtfToPlainText([rtfText]);
The text you are working with is RTF. Here is a tutorial about the file format.
This link (on another site, registration required) may give you copy & paste code you can use to convert rtf fields to txt.
You may be able to copy the value of the field from the database and paste it into notepad and then save the notepad file as "test.rtf"...you could then double click the file icon and the document may open.
RTF is an old MS file format that allows formatting of text. See this wikipedia page.

How to copy the contents of an attached file from an MS Access DB into a VBA variable?

Background Information:
I am not very savvy with VBA, or Access for that matter, but I have a VBA script that creates a file (a KML to be specific, but this won't matter much for my question) on the user's computer and writes to it using variables that link to records in the database. As such:
Dim MyDB As Database
Dim MyRS As Recordset
Dim QryOrTblDef As String
Dim TestFile As Integer
QryOrTblDef = "Table1"
Set MyDB = CurrentDb
Set MyRS = MyDB.OpenRecordset(QryOrTblDef)
TestFile = FreeFile
Open "C:\Testing.txt" For Output As #TestFile
Print #TestFile, "Generic Stuff"
Print #TestFile, MyRS.Fields(0)
etc.
My Situation:
I have a very large string (a text document with a large list of polygon vertex coordinates) that I want to add to a variable to be printed to another file (a KML file, noted in the above example). I was hoping to add this text file containing coordinates as an attachment data type to the Access database and copy its contents into a variable to be used in the above script.
My Question:
Is there a way I can access and copy the data from an attached text file (attached as an attachment data type within a field of an MS Access database) into a variable so that I can use it in a VBA script?
What I have found:
I am having trouble finding information on this topic, I think mainly because I do not know which keywords to search for, but I was able to find someone's code on a forum, "ozgrid", that seems to be close to what I want to do. However, it just pulls from a text file on disk rather than one attached to the database.
Code from above mentioned forum that creates a function to access data in a text file:
Sub Test()
    Dim strText As String
    strText = GetFileContent("C:\temp\x.txt")
    MsgBox strText
End Sub
Function GetFileContent(Name As String) As String
    Dim intUnit As Integer
    On Error GoTo ErrGetFileContent
    intUnit = FreeFile
    Open Name For Input As intUnit
    GetFileContent = Input(LOF(intUnit), intUnit)
ErrGetFileContent:
    Close intUnit
    Exit Function
End Function
Any help here is appreciated. Thanks.
I am a little puzzled as to why a memo data type does not suit if you are storing pure text, or even a table for organized text. That being said, one way is to output to disk and read into a string.
''Ref: Windows Script Host Object Model
Dim fs As New FileSystemObject
Dim ts As TextStream
Dim rs As DAO.Recordset, rsA As DAO.Recordset
Dim sFilePath As String
Dim sFileText As String
sFilePath = "z:\docs\"
Set rs = CurrentDb.OpenRecordset("maintable")
Set rsA = rs.Fields("aAttachment").Value
''Save the attachment to disk only if the file is not already there
If Not fs.FileExists(sFilePath & rsA.Fields("FileName").Value) Then
    ''It will save with the existing FileName, but you can assign a new name
    rsA.Fields("FileData").SaveToFile sFilePath
End If
Set ts = fs.OpenTextFile(sFilePath _
    & rsA.Fields("FileName").Value, ForReading)
sFileText = ts.ReadAll
See also: http://msdn.microsoft.com/en-us/library/office/ff835669.aspx

How to Modify/Update a .CSV file through VB 6.0

I have a .CSV file with 5 values in a row. I want to modify the file so that one more value is added at the beginning, end, or middle of the row.
How do I add a new row with a set of values to the .CSV file?
How can I do this in a simple way?
There is no magic way to insert things into the middle of a stream file (such as any text file including CSV files).
So this means you need to read the old file and modify it as you go writing a new file out.
There are many ways to do this though:
Read the input file into memory as a blob and work on it there then write out the modified data.
Read/write it with changes line by line.
Use Jet Text IISAM, Log Parser's COM API, etc. which allow SQL and SQL-like operations on text data in tabular formats such as CSV.
The simplest and most general way is line by line read/modify/write. This can be slower than the "blob" approach for small to middling files but doesn't risk the headaches that may result when a large file must be processed.
For very large files this can be optimized by reading, parsing, modifying, then writing in "chunks" to minimize I/O costs. But this can also be more complex to program correctly.
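As an illustration of the line-by-line read/modify/write approach, here is a small Python sketch (not VB 6.0; add_column and its parameters are names invented for this example). It reads one row at a time from the old file, inserts the extra value, and writes the row to a new file.

```python
import csv


def add_column(src_path, dst_path, value, position=None):
    """Copy src to dst, inserting `value` into every row.

    position=None appends at the end; 0 inserts at the beginning;
    any other index inserts in the middle.
    """
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            index = len(row) if position is None else position
            row.insert(index, value)
            writer.writerow(row)
```

Adding a whole new row works the same way: write the extra row to the new file at the point in the loop where it belongs.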
This piece of code may help. It is not a complete answer, but it shows the idea:
Dim line As String
Dim arrayOfElements() As String
Dim linenumber As Long ' Long avoids overflow on files with more than 32767 lines
Dim i As Integer
Dim opLine As String
Dim fso As New FileSystemObject
Dim ts As TextStream
' strPath is assumed to hold the full path of the file
line = ""
Open strPath For Input As #1 ' Open file for input
Do While Not EOF(1) ' Loop until end of file
    linenumber = linenumber + 1
    Line Input #1, line
    arrayOfElements = Split(line, "|")
    If Not linenumber = 1 Then
        ' Data rows: only rows with exactly 3 fields are extended and kept
        If UBound(arrayOfElements) = 2 Then
            line = line & "|x|y"
            opLine = opLine & line & vbCrLf
        End If
    Else
        ' Header row: add the two new column names
        line = line & "|col4|col5"
        opLine = opLine & line & vbCrLf
    End If
Loop
Close #1 ' Close file.
Set ts = fso.CreateTextFile(strPath, True) ' Overwrite the original file
ts.Write opLine ' opLine already ends with vbCrLf
ts.Close
The fso reference should also be released:
Set fso = Nothing

How to write to a specific line in a text file using vb

I have a text file with this sample data
abduct test|1
chip test|2
hatter test|3
evil test|4
I would like to know how I could loop through it to find a user and remove that line.
This is what I have so far:
Public Sub RemoveMember(member As String)
    Dim u As String, strdata() As String
    Open (App.Path & "\Membership.txt") For Input As #1
    Do
        Input #1, u
        strdata = Split(u, "|")
        If strdata(0) = member Then
            'figure out a way to remove this line from the text file'
        End If
    Loop Until EOF(1)
    Close #1
End Sub
I would say:
Read in the lines one by one
Check if the line contains the member to be deleted
Write back the line to a new file if it does not
After reading all the lines delete the original file
Rename the new file to the original file
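The steps above can be sketched as follows. This is a Python illustration rather than VB, remove_member is a made-up name, and the "|" separator with the member name in the first field is taken from the sample data in the question. Here os.replace covers the last two steps (delete the original, rename the new file) in one call.

```python
import os


def remove_member(path, member, sep="|"):
    """Rewrite the file without the lines whose first field equals `member`."""
    tmp_path = path + ".tmp"
    removed = 0
    with open(path, "r", encoding="utf-8") as src, \
         open(tmp_path, "w", encoding="utf-8") as dst:
        for line in src:
            name = line.rstrip("\r\n").split(sep)[0]
            if name == member:
                removed += 1      # skip this line
            else:
                dst.write(line)   # keep every other line unchanged
    os.replace(tmp_path, path)    # new file takes the place of the original
    return removed
```

Comparing the first field exactly, rather than searching the whole line, avoids removing a member whose name merely contains the one being deleted.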
Stefan's method is probably faster but will use a lot of memory if the file grows very large.
I am not familiar with VB's native file methods. Using FileSystemObject (reference Microsoft Scripting Runtime) you will get:
Dim clsOriginalFile As TextStream
Dim clsNewFile As TextStream
Dim FSO As New FileSystemObject
Dim strLine As String
' Parentheses are required because the return value is assigned
Set clsOriginalFile = FSO.OpenTextFile("members.txt", ForReading)
Set clsNewFile = FSO.OpenTextFile("temp.txt", ForWriting, True)
Do While Not clsOriginalFile.AtEndOfStream
    strLine = clsOriginalFile.ReadLine
    ' Keep every line that does not contain the member name
    If InStr(strLine, member) = 0 Then
        clsNewFile.WriteLine strLine
    End If
Loop
clsOriginalFile.Close
clsNewFile.Close
' No parentheses here: these calls are statements, not expressions
FSO.DeleteFile "members.txt"
FSO.MoveFile "temp.txt", "members.txt"
Written without the help of the IDE, so there may be a few typos in the code.