Access Export to CSV, trouble keeping leading zeros, 55-60 Million Records - ms-access

Let me preface this by saying yes, I do need all of the records. It's part of a FOIL request. Not necessarily in one worksheet or file.
I'm having trouble figuring this out. I am currently able to export about 500k records at a time without timing out or exceeding the Access file size limit (I think this is due to working with the state system's legacy data) or worrying about the Excel row limit. I can preserve the column headers, but I lose the leading zeros in one field.
This is done using the Export Wizard to a text file. I change the destination file name ending from .txt to .csv, which gives me the option to keep headers. In the wizard's preview of the .csv file, and when it is opened in Notepad, the field shows the leading zeros correctly with double quotes around it, for example "00123456", but when the file is opened in Excel it shows as 123456. If I change the column from General format to Text, the contents remain the same.
I have tried the VBA method of DoCmd.TransferSpreadsheet, but when I try to run it I am prompted with a Macros box. And honestly, I am less familiar with VBA than I am with SQL. Overall I consider myself a novice.
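For what it's worth, a minimal sketch of driving the same export from plain VBA in a standard module follows (running a Sub like this directly avoids the macro prompt). The query name qryBatch01, the saved export specification "CSV Export Spec", and the output path are placeholders for illustration, not names from the original database.

' Minimal sketch: export one batch of records to a CSV file with headers.
' Assumes a query named qryBatch01 and a saved export specification
' "CSV Export Spec" (created once in the Export Wizard via Advanced... > Save As)
' that uses comma as the delimiter and " as the text qualifier.
Public Sub ExportBatchToCsv()
    Dim strFile As String
    strFile = "C:\FOIL\batch_01.csv"    ' hypothetical output path

    ' acExportDelim writes a delimited text file; the final True keeps the headers.
    DoCmd.TransferText acExportDelim, "CSV Export Spec", "qryBatch01", strFile, True
End Sub

Note that the quoted values such as "00123456" survive in the file itself either way; the zeros only disappear when Excel opens the CSV directly, so a safer viewing path is to pull the file in through Excel's Data > From Text/CSV import (or the legacy Text Import Wizard) with that column set to Text.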

Related

SSIS - Exporting data with commas to a csv file

I am trying to export a list of fields to a csv file from a database.
It keeps putting all the data into one column and doesn't separate it. When checking the preview it seems to be okay, but on export it's not working. I am currently trying the following settings. Any help would be appreciated.
SSIS settings
Excel file output issue
Actually it seems to work; Excel is just too dumb to recognize it.
Mark the whole table, then go to Data -> Text to Columns
and configure the wizard (Delimited, with semicolon as the separator):
Now you have separated the rows into cells:

What is the fastest way to import a text file to MS Access table by using VBA?

I am just wondering what is the fastest way to import data from text file to Microsoft Access table via VBA.
So far as I know, there are 3 ways.
Use the DoCmd.TransferText method to upload the whole file. Deal with the errors later if there are any.
Use the Line Input statement to read the text line by line, then use the Recordset.AddNew method to add the records one at a time.
Create a new Excel.Application object, open the file in Excel, do all the reformatting, and save it as a temporary spreadsheet. Then use the DoCmd.TransferSpreadsheet method to upload it to the table.
Is there any other better way to upload text to MS Access?
What is the fastest way?
The built in command to transfer the text file will be the fastest.
However, if one needs some increased flexibility, then the line input would be next best (and next fastest).
Launching and automating a whole copy of Excel is quite heavy. However, to be fair, once it is loaded the speed would be OK.
The issue is not really the speed of the load, but what kind of code you need for re-formatting of the data.
If you use TransferText, then it is very fast. However, if you then have to re-loop and re-process that data, you are going over the data a second time. So the additional time here is NOT the import speed, but the additional processing.
The advantage of line input, is you can then re-format, and deal with processing of that one line, and then send it off to a table. This means you ONLY loop and touch each row of data one time.
So transfertext is the fastest, but now if you have to re-loop and touch every row of data again, then you touched the data two times.
So the transfer speed is likely not the real thing to "center" on, but rather what kind of processing, and how much of it, is required once you have grabbed the data.
Line Input would be the ONLY approach that touches each row of data one time, as you pull it from the file, process it, and then send it to the table.
All other approaches involve reading in the whole data set, and THEN processing the data – so you touching the data a second time.
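As a rough illustration of that one-pass Line Input approach (the table name tblImport, the file path, the comma split, and the field names are all placeholders, not details from the original posts):

' Minimal sketch of the one-pass approach: read each line, reformat it,
' and append it to the table as you go.
Public Sub ImportViaLineInput()
    Dim db As DAO.Database
    Dim rs As DAO.Recordset
    Dim f As Integer
    Dim strLine As String
    Dim parts() As String

    Set db = CurrentDb
    Set rs = db.OpenRecordset("tblImport", dbOpenDynaset)   ' placeholder table

    f = FreeFile
    Open "C:\Data\input.txt" For Input As #f                ' placeholder file

    Do While Not EOF(f)
        Line Input #f, strLine
        parts = Split(strLine, ",")      ' reformat/parse the one line here
        rs.AddNew
        rs!Field1 = Trim(parts(0))       ' placeholder field names
        rs!Field2 = Trim(parts(1))
        rs.Update
    Loop

    Close #f
    rs.Close
    Set rs = Nothing
    Set db = Nothing
End Sub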
Thank you to Albert D. Kallal.
Just to share some testing results here. I uploaded 508481 records into an Access table.
It took 14 mins and 30 secs to complete the upload via the Line Input method, and 3 mins and 12 secs to complete it via the TransferText method plus reformatting the output with VBA code.
The DoCmd.TransferText method is a lot faster than Line Input, even though it needs to read the data twice to reformat the output.
The only downside I couldn't resolve with DoCmd.TransferText is that the order of records in the Access table cannot be kept the same as the record order in the source file if the source file doesn't contain any obvious sorting ID or sorting logic.
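For comparison, the bulk path those timings refer to is essentially a single call plus a second pass over the loaded data; the specification, table, and file names below are placeholders:

' Minimal sketch of the bulk import: one TransferText call, then whatever
' reformatting you need as a second pass over the loaded table.
Public Sub ImportViaTransferText()
    DoCmd.TransferText acImportDelim, "Import Spec", "tblImport", _
                       "C:\Data\input.txt", True
    ' Second pass, e.g. an UPDATE query that reformats columns:
    ' CurrentDb.Execute "UPDATE tblImport SET ...", dbFailOnError
End Sub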

SSIS Flat files to retain commas from SQL data

I'm very new to SSIS.
I have this issue similar to a post: Flat File to retain commas from SQL data
But unfortunately I have not gotten a resolution yet.
Basically, in my SSIS package I have:
OLEDB (source)
Fuzzy lookup
Conditional split to flat files (I'm using these flat files as source in another data flow)
While populating the records to the specified flat files, the commas in the address field are being treated as delimiters, hence all the values are populated wrongly.
Questions:
I have gone through some recommendations on quoting the string so the commas within one field can be escaped. If this is the resolution, then in which SSIS step should I quote my address string?
How SSIS reads vs. how SSIS adds delimiters (might cause brain damage)
SSIS wants us to specify how it should read the output from my conditional split. But think about it: shouldn't it already know, since through my steps (OLEDB to Fuzzy Lookup to Conditional Split) the data is already formatted into columns properly? Shouldn't it be asking how you want your values to be delimited instead? Just like when you have an Excel file and you want to convert it to a flat file, it prompts you whether you want it comma delimited or tab delimited. When you specify CSV, it simply adds a comma between all your cell values (regardless of whether you have a comma in a cell).
Please bear with my stupidity.
While populating the records to the specified flat files, the commas in the address field are being treated as delimiters, hence all the values are populated wrongly.
For this reason you should avoid using a comma as the delimiter and use something that does not normally appear in free text. The next most commonly used delimiter is TAB, and you are far less likely to face this issue if you use it.
If you really want to use comma then you can use a 'text qualifier' of quotes but that just means you have two characters that will cause problems: commas and quotes!
In which SSIS step should I quote my address string?
Don't do it; just use a tab delimiter (or something even less common like ~ or `). If you really want to do it, it should be in the output component somewhere.
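To make the delimiter vs. text-qualifier point concrete (sketched in VBA only because that is the language used elsewhere on this page; the values and path are made up):

' Illustration only: the same record written comma-delimited with a text
' qualifier, and tab-delimited without one.
Public Sub WriteSampleLines()
    Dim f As Integer
    f = FreeFile
    Open "C:\Temp\sample.txt" For Output As #f    ' hypothetical path

    ' Comma-delimited: the address must be wrapped in quotes (the text qualifier),
    ' or the embedded comma will be read as a column break.
    Print #f, "123,""1 Main St, Suite 5"",Springfield"

    ' Tab-delimited: no qualifier needed, because free text rarely contains tabs.
    Print #f, "123" & vbTab & "1 Main St, Suite 5" & vbTab & "Springfield"

    Close #f
End Sub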
How SSIS reads vs. how SSIS adds delimiters (might cause brain damage) SSIS wants us to specify how it should read the output from my conditional split. But think about it: shouldn't it already know, since through my steps (OLEDB to Fuzzy Lookup to Conditional Split) the data is already formatted into columns properly? Shouldn't it be asking how you want your values to be delimited instead? Just like when you have an Excel file and you want to convert it to a flat file, it prompts you whether you want it comma delimited or tab delimited. When you specify CSV, it simply adds a comma between all your cell values (regardless of whether you have a comma in a cell).
I'm not familiar with those components but yes, internally you should never have to worry about delimiters - it already knows which are individual columns
The real misunderstanding here is double-clicking a CSV file and having it open in Excel. Excel guesses the delimiters. When you open your tab-delimited CSV file in Excel, it has guessed the delimiter as comma, opened it, and thinks everything is in one column.
If you use "Text to Columns" in Excel and tell it to use tab, it will display the file correctly. Not only that, it will remember next time.
In actual fact the file you have exported is a text file that can be opened in Notepad. If you open your tab-delimited file in Notepad you'll see everything neatly lined up in columns. Of course Notepad is not the best data analysis tool.
What purpose do you have for exporting these files? If the purpose is to upload data into other systems, then use a tab delimiter.
If the purpose is to have an end user open the file and inspect / analyze it, then exporting as excel might be a less confusing solution
Just be aware that excel format is a bad choice for uploading into other systems, as the excel driver has some very strange behaviour.
Just to reorganize this entire thread:
My issue is that when I load data from SQL to a flat file and I have a comma (,) in certain columns, my values go off the rails and end up in the wrong columns. I kept going back and forth setting the column delimiter, and somehow it doesn't get saved when I reload the data flow.
My mistake was assuming that SQL loads data into flat files as CSV, but in fact it doesn't. It relies on the destination file you have created. Correct me if I'm wrong, but creating a tab-delimited (txt) file in Excel is different from a normal text file created in Notepad. My resolution was to create the tab-delimited file in Excel. SSIS can somehow recognize that and doesn't default to loading as comma delimited.
One thing I never tried, suggested by Nick, is to not create any output file at all and just define it in SSIS and let it manage the file itself.
I've run through a lot of threads, and many people are facing this issue but have no answer. I hope this helps!

Invalid field count in CSV input on line 1 phpmyadmin

I have read many threads but can't find the right specific answer. I get this error message when I try to import additional data into an existing table. The field names are all aligned correctly, but not every row has data in every field. For example, although I have a field named middle_name, not every row has a middle name in it. During the import process, is this blank field not counted as a field and thus throwing off the field count?
I have managed to get most of the data to import by making sure I had a blank column to allow for the auto-increment of ID, as well as leaving the header row in the file but choosing 1 row to skip on the import.
Now the problem is the last row won't import; I get the error message Invalid format of CSV input on line 19. When I copy the file to Text Wrangler, the last row ends with ,,,,,. This accounts for the last 5 columns, which are blank. I need to know what the trick is to get the last row to import.
Here are the settings I have been using:
I’ve had similar problems (with a tab-separated file) after upgrading from an ancient version of phpMyAdmin. The following points might be helpful:
phpMyAdmin must be given the correct number of columns in every row. In older versions of phpMyAdmin you could get away with not supplying empty values for columns at the end of a row, but this is no longer the case.
If you export an Excel file to text and the columns at the start or end of rows are completely empty, Excel will not export blanks for those columns. You need to put something in, or leave them blank and then edit the resulting file in a text editor with regular expressions, e.g. to add a blank first column, search for ^ and replace with , (CSV file) or \t (tab file); to add two blank columns to the end, search for $ and replace with ,, or \t\t, etc.
Add a blank line to the bottom of the file to avoid the error message referring to the last line of data. This seems to be a bug that has been fixed in newer versions.
Whilst in the text editor, also check the file encoding as Excel sometimes saves as things like UTF-16 with BOM which phpMyAdmin doesn’t like.
I saw the same error while trying to import a csv file with 20,000 rows into a custom table in Drupal 7 using phpmyadmin. My csv file didn't have headers and had one column with many blanks in it.
What worked for me: I copied the data from Excel into Notepad (a Windows plain text editor) and then back into a new Excel spreadsheet and re-saved it as a csv. I didn't have to add headers and left the blank rows blank. All went fine after I did that.
You'll never have this problem if you keep your first row as the header row, even if your table already has a header. You can delete the extra header row later.
This helps because MySQL then knows how many cells can possibly contain data, and you won't have to fill in dummy data, edit the CSV, or any of those things.
I’ve had a similar problem, Then I have tried in Mysql Workbench.
table data import through CSV file easily and I have done my work perfectly in MySQL Workbench.
As long as the CSV file you're importing has the proper number of empty columns, it should be no problem for phpMyAdmin (and that looks like it's probably okay based on the group of commas you pasted from your last line).
How was the CSV file generated? Can you open it in a spreadsheet program to verify the line count of row 19?
Is the file exceptionally large? (At 19 rows, I can't imagine it is, but if so you could be hitting PHP resource limits causing early termination of the phpMyAdmin import).
Make sure you are running the import at the database level and not at the database > table level.
I had this same issue, tried all listed and finally realized I needed to go up a level to the database

When SQL Server 2008 query results are exported to CSV file extra rows are added

When I am exporting my query results from SQL Server 2008 to CSV or Tab Delimited txt format I always end up seeing extra records (that are not blank) when I open the exported file in Excel or import it into Access.
The SQL query results return 116623 rows,
but when I export to CSV and open it with Excel I see 116640 records. I tried importing the CSV file into Access and I also see extra records.
The weird thing is that if I add up the totals in Excel up to row 116623 I get the correct totals, meaning I have the right data up to that point, but the extra 17 records after that are bad data, and I don't know how they are being added.
Does anyone know what might be causing these extra records/rows to appear at the end of my CSV file?
The way I am exporting is by right-clicking on the results and exporting to CSV (comma delimited) or TXT (tab delimited) files, and both are causing the problem.
I would bet that in that huge number of rows you have some data with a carriage return internal to the record (such as an address record that includes a line break). Look for rows that have empty data in some of the columns you would expect data in. I usually re-import the file to a work table (with an identity column so you can identify which rows are near the bad ones) and then run queries on it to find the ones that are bad.
Actually, there is a bug in the export-results feature. After exporting the results, open the CSV file in a hex editor and look up the unique key of the last record. You will find it towards the end of the file. Find the 0D 0A (CR LF) for that record and delete everything else that follows. It's not Excel or Access. For some reason SQL Server just can't export a CSV without corrupting the end of the file.