How can I read in a TXT file in Access that is over 255 char/line and contains control char? - ms-access

I am running Access 2010. I need to read a TXT file into a string. Each line can be anywhere from 40 to 320 characters long, ending in a CR. The biggest problem is that the lines in the TXT file contain commas (,) and quotation marks (") as part of the data.
Is there a trick to doing this? Even if it means reading each character and testing whether it is a CR...

To accomplish this task, you will need to write your own import code that reads directly from the file. The Microsoft Access import features will not handle a file like this very well, and since you want to analyze each line in code, it is better to handle the reading yourself.
There are many approaches you can take, and all of them involve opening the file and working with a file handle. The best approach, though, is to use a class that does the dirty work for you.
One such class is the LargeTextFile class that can be found in any of the Microsoft Access Developer's Handbooks (Volume 1) for Access 97, 2000, 2002 or 2003, written by Getz, Litwin, and Gilbert (Sybex), if you have access to one of them.
Another option would be the clsReadTextFile class, available for free on The Access Web (the Access MVP site):
http://www.theaccessweb.com/downloads/clsReadTextFile.txt
Using clsReadTextFile you can process your file, line by line using code similar to this:
Dim file As New clsReadTextFile
Dim line As String

file.FileName = "C:\MyFile.txt"
file.cfOpenFile

Do While Not file.EndOfFile
    file.csGetALine          'read the next line from the file
    line = file.Text         'the text of the line just read
    If InStr(line, "MySearchText") Then
        'Do something
    End If
Loop

file.cfCloseFile
The line string variable will contain the text of the line just read, and you can write code to parse it how you need and process it appropriately. Then the loop will go on to read the next line. This will allow you to process each line of the file manually in your code.

It is not clear from your post whether you can use - or have tried - the tools available in the product for this task. Access 2010 offers linking to a .txt file as well as appending a .txt file to a table. These are standard features on the External Data tab of the ribbon.
The Large Text (formerly Memo) field type allows ~4K characters. It is not clear whether you want to bring all of the txt data into a single field - if so, this limit is important.
If the CRs in the text document mark a new record/row of data (rather than the whole document being one continuous string), and if there is a consistent structure across all rows, then the import wizard can use either character counts or delimiter symbols (e.g. commas, if they exist) to split each line of data into separate fields in a single table row.


PsychoPy: how to avoid storing variables in the CSV file?

When I run my PsychoPy experiment, PsychoPy saves a CSV file that contains my trials and the values of my variables.
Among these, there are some variables I would like NOT to be included. There are some variables I chose to include in the CSV, but many others ended up in it automatically.
Is there a way to manually force (from the code block) the exclusion of some variables from the CSV?
Is there a way to decide the order of the saved columns/variables in the CSV?
It is not really important - I know I could just create my own output file without using the one from PsychoPy, or easily clean it up afterwards - but I was just curious.
PsychoPy spits out all the variables it thinks you could need. If you want to drop some of them, that is a task for the analysis stage, and is easily done in any processing pipeline. Unless you are analysing data in a spreadsheet (which you really shouldn't), the number of columns in the output file shouldn't really be an issue. The philosophy is that you shouldn't back yourself into a corner by discarding data at the recording stage - what about the reviewer who asks about the influence of a variable that you didn't think was important?
If you are using the Builder interface, the saving of onset & offset times for each component is optional, and is controlled in the "data" tab of each component dialog.
The order of variables is also not under direct control of the user, but again, can be easily manipulated at the analysis stage.
As you note, you can of course write code to save custom output files of your own design.
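For example, here is a minimal pandas sketch of that analysis-stage clean-up; the file name and the column names are purely illustrative placeholders, not anything PsychoPy guarantees:
import pandas as pd

# Load the PsychoPy output file (name is a placeholder).
df = pd.read_csv("my_experiment.csv")

# Drop columns you don't care about (these names are illustrative).
df = df.drop(columns=["frameRate", "expName"], errors="ignore")

# Put the columns you do care about first and keep everything else after them.
preferred = [c for c in ("participant", "response", "rt") if c in df.columns]
rest = [c for c in df.columns if c not in preferred]
df[preferred + rest].to_csv("my_experiment_clean.csv", index=False)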
There is a special block called session_variable_order: [var1, var2, var3] in the experiment_config.yaml file, which you probably should be using; also, consider these methods:
from psychopy import data

# Shown on the classes just to illustrate the signatures; in practice call these
# on your own ExperimentHandler / TrialHandler instances.
data.ExperimentHandler.saveAsWideText(fileName='exp_handler.csv', delim='\t', sortColumns=False, encoding='utf-8')
data.TrialHandler.saveAsText(fileName='trial_handler.txt', delim=',', encoding='utf-8', dataOut=('n', 'all_mean', 'all_raw'), summarised=False)
Notice the sortColumns and dataOut params.

How to get SSIS to select specific files in a directory and assign their names to variables (File System Task)

I have the following scenario:
I have a remote server that gets loaded with 2 files every week. These files have the following name format:
"FINAL_NAME06Apr16.txt" and
"FINAL_NAME_F106Apr16.txt"
The FINAL_NAME / FINAL_NAME_F1 part is fixed every time, but the date changes. I need to pick up these files, copy them to another directory and rename them, but I'm not sure how to get the file names into variables to operate on them, as I need to give each file a different name.
How can I proceed? I'm pretty sure it has to be done by building a variable with an expression, but I don't know how to do that part.
I think I need some function to work out the rest of the filename. I believe one approach could be to first rename the "FINAL_NAME_F1" file and then the "FINAL_NAME" one, since a wildcard would pick up both if I don't do it that way?
Cheers.
You can calculate the date but why go through that complexity?
A Foreach (File) Loop Container, FELC, will handle this just fine. Add two of them to your control flow.
The first one will use a file mask of FINAL_NAME_F1*.txt. Inside that FELC, use a File System task to copy/move/rename the file to your new location.
That first FELC will run, find the target file and move it. It will then look for the next file, find none and go on to the next task.
Create a second FELC, but this one will operate on FINAL_NAME*.txt. It's crucial that the first FELC runs first, as this file mask will match both FINAL_NAME_f1-2019-01-01.txt and FINAL_NAME-2019-01-01.txt. By ordering our operations like this, we reduce the complexity of the logic required.
Sample answer with a FELC to show where to plumb the various bits
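For what it's worth, the same two-pass masking idea outside of SSIS looks like this - a minimal Python sketch (folder paths and target names are assumptions) just to illustrate why the F1 mask must be processed first:
import glob
import os
import shutil

src = r"C:\incoming"   # assumed source folder
dst = r"C:\processed"  # assumed destination folder

# Pass 1: the narrower mask, so the F1 file is moved out of the way first.
for path in glob.glob(os.path.join(src, "FINAL_NAME_F1*.txt")):
    shutil.move(path, os.path.join(dst, "renamed_F1.txt"))

# Pass 2: the broader mask now matches only the remaining file.
for path in glob.glob(os.path.join(src, "FINAL_NAME*.txt")):
    shutil.move(path, os.path.join(dst, "renamed.txt"))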

Converting CSV to ARFF

I am working on a school project for data mining, where we were given CSV data from Kaggle (this is how the data looks - 2 lines out of 6970):
4,1970,Female,150,DomesticPartnersKids,Bachelor's Degree,Democrat,,Yes,No,No,No,Yes,Public,No,Yes,No,Yes,No,No,Yes,Science,Study first,Yes,Yes,No,No,Receiving,No,No,Pragmatist,No,No,Cool headed,Standard hours,No,Happy,Yes,Yes,Yes,No,A.M.,No,End,Yes,No,Me,Yes,Yes,No,Yes,No,Mysterious,No,No,,,,,,,,,,Mac,Yes,Cautious,No,Umm...,No,Space,Yes,In-person,No,Yes,Yes,No,Yay people!,Yes,Yes,Yes,Yes,Yes,No,Yes,,,,,,,,,,,,,,,,,No,No,No,Only-child,Yes,No,No
5,1997,Male,75,Single,High School Diploma,Republican,,Yes,Yes,No,,Yes,Private,No,No,No,Yes,No,No,Yes,Science,Study first,,Yes,No,Yes,Receiving,No,Yes,Pragmatist,No,Yes,Cool headed,Odd hours,No,Right,Yes,No,No,Yes,A.M.,Yes,Start,Yes,Yes,Circumstances,No,Yes,No,Yes,Yes,Mysterious,No,No,Tunes,Technology,Yes,Yes,Yes,Yes,No,Supportive,No,PC,No,Cautious,No,Umm...,No,Space,No,In-person,No,No,Yes,Yes,Grrr people,Yes,No,No,No,No,No,No,Yes,No,No,Yes,No,Own,Pessimist,Mom,No,No,No,No,Nope,Yes,No,No,No,Yes,No,Yes,No,Yes,No
We have to convert this to .arff format for use in Weka. I manually typed the header (107 attributes):
@ATTRIBUTE user_id NUMERIC
@ATTRIBUTE yob NUMERIC
@ATTRIBUTE gender {Male,Female}
@ATTRIBUTE income {150,100,75,50,25,10}
@ATTRIBUTE householdstatus {MarriedKids,Married,DomesticPartnersKids,DomesticPartners,Single,SingleKids}
@ATTRIBUTE educationlevel {Bachelor's Degree,High School Diploma,Current K-12,Current Undergraduate,Master's Degree,Associate's Degree,Doctoral Degree}
@ATTRIBUTE party {Democrat,Republican}
@ATTRIBUTE Q124742 {Yes,No}
@ATTRIBUTE Q124122 {Yes,No}
and I get this error:
} expected at end of enumeration read token eol
Then I tried to use the Weka converter, but it gave me an error:
Wrong number of values.Read 2,expected 1,read Token[EOL],line 4 Problem encountered at line:3
Here's what I did:
From Kaggle, I downloaded train.csv (5568 instances, highest ID number 6960).
I didn't use the converter -- just loaded it into the Weka Explorer as a CSV file. Some problems and their solutions:
Line 3: First instance of "Bachelor's Degree". It did NOT like that single quote ("line 3, read 7, expected 108"). Got rid of all single quotes (using a global replace in a text editor). Then I tried to load it into Weka again.
The file doesn't have a CR (the Enter key on the keyboard) at the end of the last line, which caused an error ("null on line 5569"). I added one, again in a text editor. Then I loaded it into Weka, and took a look at the variables.
YOB (Year of Birth) is missing for about 300 instances, with "NA" filled in. So, it didn't evaluate as either string or numeric. Edited these to be empty cells instead. Then I loaded it into Weka.
And, of course, moved Party to be the class variable (at the end). I did this in Weka.
Saved this as train.arff
Loaded it back in, and it seems to work OK. I generated 51% accuracy with a OneR classifier, but you wouldn't expect a OneR classifier to work well here. I'm sure you can do better.
Note I didn't do any manual typing of headers. That must have taken a while!
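If you would rather script those cleanup steps than do them in a text editor, here is a rough Python sketch; the file names train.csv and train_clean.csv are just the ones used above, and blanking every "NA" cell (not only YOB) is an assumption you may want to narrow:
import csv

# Rough sketch of the cleanup: strip single quotes, blank out "NA" cells,
# and let the csv writer guarantee a newline after the last row.
with open("train.csv", newline="") as src, \
     open("train_clean.csv", "w", newline="") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    for row in reader:
        row = [field.replace("'", "") for field in row]           # drop single quotes
        row = ["" if field == "NA" else field for field in row]   # NA -> empty cell
        writer.writerow(row)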
Good luck!

Why is SSIS complaining that "There is a partial row at the end of the file"?

I'm importing a flat file into a database using a Data Flow Task in SSIS. The file is very simple: it contains three comma-separated values per row. Whenever I run this task, however, I receive a warning from the Flat File component:
Warning: 0x8020200F: There is a partial row at the end of the file.
This warning seems to happen regardless of the size of the file: even with only a handful of rows in the file, visually validated (with extended characters and whatnot visible) I still receive it. Moreover, it doesn't seem to matter whether I have a blank row at the end of the file or I just end it without a trailing CR+LF.
How can I get rid of this warning so I can run my package with WarnAsError enabled?
(BTW, it seems someone else may have had a similar problem in There is a partial row at the end of the file, though it wasn't much of a question.)
I have found three things to try if you encounter this problem. In at least two out of the three cases, SSIS was ignoring rows of my input file with only the above warning to show for it. Because of that, I do not recommend ignoring this warning!
Step 1: verify that your flat file is valid
This error will appear when you have an invalid input file. This can be especially hard to detect if your input file has millions of lines, as mine do, but it's vital that you discover file format violations because SSIS will happily give you this warning and continue on its way without importing the offending lines or, in some cases, the lines after the offending lines. The easiest way I found to discover a problem with the source file is to check the number of rows that are being imported successfully. If it's vastly different than the number you expect in your flat file, something may have gone wrong in the middle somewhere.
Step 2: try a dummy line at the end (fixed-width only)
If you are using a fixed-width format input file, Microsoft may have a helpful KB article for you. Basically, they suggest that you add a dummy line at the end of the file.
I am not using fixed-width files, so I can't say how useful this technique is.
Step 3: turn off text qualification for non-text
This is the tricky one, because I believe the TextQualified property is True by default. If your input file uses non-text fields (integers, etc.), then you must tell SSIS that it should not expect those columns to be qualified as text. Otherwise, SSIS will treat your input file as invalid even though it looks perfectly valid.
TextQualified is a property of the columns in your Flat File Connection Manager.
To change it, open up your connection manager, click "Advanced", and then click on a non-text column. Make sure the TextQualified property is set to False. You will need to do this for all of your non-text columns.
If the byte width of a line in the file is known, you can always double check that the total byte size of the file can be divided by the expected line size to give you a nice round line count number (as opposed to a decimal).
It helps also to know from your source just how many records are expected, but if you don't have this you can at least double check the resultant loaded tables record count against the calculation of line count while loading the file.
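A quick Python sketch of that sanity check, assuming a fixed-width file; the path and the line width are placeholders:
import os

line_bytes = 322  # assumed width of one line, including the CR+LF
size = os.path.getsize(r"C:\data\input.txt")  # assumed path

lines, remainder = divmod(size, line_bytes)
if remainder == 0:
    print(f"{lines} complete lines - the file divides evenly")
else:
    print(f"{lines} complete lines plus {remainder} stray bytes - there is a partial row")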
I've seen this error often when a source flat text file is missing its last \r\n at the end of the file.
Running on 64-bit Windows worked fine and no rows went missing, but I lost the last row when running on Windows 2008.
My workaround was:
1. Open the SSIS package in BIDS on the Windows 2008 machine.
2. Open the file connection manager and make sure the Text Qualifier is set to
3. Rebuild it.
After that, everything worked fine on both Windows 7 and Windows 2008.

Reporting Services 2008R2 throws "native compiler error [BC30494] Line is too long."

An unexpected error occurred while compiling expressions. Native compiler return value: '[BC30494] Line is too long.'.
When RS throws this error, the typical scenario appears to be that there are too many text boxes in a specific data region, and the only known remedy seems to be to 'minify' text box names (i.e. rename TextBox345 to T345).
My report is not that large (<100 text boxes); but I make extensive use of the Lookup() function to set many of the textbox style properties from a styles dataset (>2500 Lookup() calls).
So my guess is that the VB code-behind that gets generated for the Lookup() function is quite verbose and therefore breaks the 64K limit for a generated VB code block per data region.
Can I test my hypothesis? Ie. is there a way I can inspect the generated VB code?
Any suggestions as to how to fix/dodge this problem? Needless to say that using abbreviated names in my case didn't cut it.
Quite a delayed response but for the sake of posterity:
The .vb source file of the generated code exists on disk temporarily within the directory C:\Users\{RS Service Account Name}\AppData\Local\Temp. As you mentioned, if any line goes past 65535 characters, it will fail compilation due to a VB limitation.
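If you want to test the hypothesis yourself, a hedged Python sketch along these lines could scan that temp directory and report the longest line in each generated .vb file; the directory is the one mentioned above (substitute the actual service account), everything else is an assumption:
import glob
import os

# Replace {RS Service Account Name} with the account the RS service actually runs under.
temp_dir = r"C:\Users\{RS Service Account Name}\AppData\Local\Temp"

for path in glob.glob(os.path.join(temp_dir, "*.vb")):
    with open(path, encoding="utf-8", errors="replace") as f:
        longest = max((len(line.rstrip("\r\n")) for line in f), default=0)
    flag = "  <-- exceeds the 65535-character limit" if longest > 65535 else ""
    print(f"{path}: longest line = {longest} characters{flag}")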
This issue was just fixed in Reporting Services 2012 SP2 CU5; the KB article is located here. Unfortunately mainstream support for SQL 2008 R2 has ended so the fix is unlikely to be backported. As for workarounds:
Shorten the textbox names to the absolute minimum possible (e.g., use all possible single character names, then all possible two character names)
Try using subreports to split up the report
Try to rework your dataset query so that you can reduce the Lookup() calls