Humanly Read Tab Delimited TXT Files - csv

Normally I work with Comma Delimited CSV files, and whenever I need to scan over the data, I just open it up in Excel, and there it is, organised neatly into columns.
However, on this occasion I have been given a tab delimited text file. If I open it up in MS Notepad, because the values and headings are all variable and different widths, the data is very disjointed. Easy for a computer to read. Not easy for me to read!
So far I have tried Notepad, Excel and Sublime Text.
Is there a program that will display a tab delimited text file, organised into humanly readable columns?
Thanks

I hope notepad++ can help you here.
Please try:
http://notepad-plus-plus.org/

OK, so it just required a few tweaks in Excel for it to format correctly.
I first changed the file extension from .txt to .csv and followed this guide:
http://www.makeuseof.com/tag/how-to-convert-delimited-text-files-into-excel-spreadsheets/
In short:
Open the file in MS Excel.
Highlight the first column (which is where all your data will be)
Select the Data tab in the File menu.
Select Text to Columns and follow the Wizard

Related

String gets separated into different rows in SSIS

I am currently working on an input file and I do have a column which contains 3 different values in one cell itself. Although this data is not being used in the transformation , I need to input this data from the source and then ignore when it is loaded into the staging table.
But the issue I face is that it gets loaded into separate rows rather than 1 cell.
This particular column is input as a string datatype. what change do I need to make to resolve this issue. Please let me know If more details are needed to answer the question.
I have uploaded a sample file to google drive https://drive.google.com/file/d/17hn8xmRd4CWsgKBzHgdwnR9W4jTJ9lTn/view?usp=sharing
The following is a screenshot of the csv data as opened in a text editor
Having downloaded sample.csv from your link, the first thing I did was open it in a text editor (Notepad++, TextPad, Visual Studio, etc) and just looked at what you have.
Row 1 is column headers
Encoded in UFT-8 with BOM (byte order marker)
Line Endings are CR/LF (Carriage Return & Line Feed)
Column delimiter appears to be a comma ,
Double Quote, ", is used as the text qualifier but only when needed
There are CR/LF characters in the actual data
I then define my flat file connection manager based on that data
Finally, I have a data flow with a Flat File Source to a Derived Column and drop a Data Viewer between them
As you can see, configuring your Flat File Connection Manager as I show will allow all the data to flow into your table as expected.
What is happening now is the CRLF, which is our row delimiter, is having precedence over the embedded CRLF in the column data. By setting the double quote as the Text Qualifier, the data reader correctly "skips" the embedded CRLF until it is encountered outside of the quotes.

Pipeline unable to read field of plain text file

Using Apache Hop latest version I'm trying to read in a plain text file. This text file is old and basically only structured by its lines (it has no delimiter, no seperator, no enclosure, etc.). I would like to read and process the lines of this file as rows in my transformation.
I use the "Text file input" transformation to read the file. Apparently reading it works, but I seem have no field available when trying to retrieve the fields. It simply states that no fields were found.
When I run the "preview records" I do get empty records equal to the number if lines in the file, so that is good. However there is no data shown as there is no field detected.
Curiously enough, when I press "Show file content" I DO get the desired content, nicely structured in the rows as desired, so I know the file is being read correctly.
Does anyone know how to best read these kind of files?
PS: The files can be anywhere from 10 to 100000 lines.
When there is no header row with field names or Hop is not able to detect any fields you can also create a field in the fields tab and it will put content in there.
As we just use a position based approach and split the content using the specified delimiter everything should go in "field1" when no delimiter is found in the data.
Figured it out. The naming is a bit misleading, but you can use the "CSV File input" and then set a TAB as delimited. Then use preview on your file and you should find that the lines are actually being parsed.

Paste CSV or Tab-Delimited data to excel with NO formatting

I'm pasting Tab Delimited data from Notepad++ to excel (about 50k rows and 3 columns). No matter how many different ways I try it, Excel wants to convert a cell containing one " to the next instance of " into one cell content.
For Example, if my data looked like this:
"Apple 1.0 Store
Banana 1.3 Store
"Cherry" 2.5 Garden
Watermelon 4.0 Field
The excel file looks like this:
Apple1.0StoreBanana1.3Store
Cherry 2.5GardenWatermelon4.0Field
One way to get around this is to open the file as a CSV in excel, however this leads to Excel formatting the number values to simplified ones using Excel's "General" format. So the data would look like the following:
"Apple 1 Store
Banana 1.3 Store
"Cherry" 2.5 Garden
Watermelon 4 Field
The data I'm getting is coming from SQL Server Studio so my options for file formats are:
.CSV
.Txt (Tab-delimited)
Copy Pasting from Query results
The solution I'm looking for is to have the data represented in Excel with no excel processing taking place on the quotations, numbers or any other cell contents.
Don't open the file directly in excel. Instead import it and control the data types and file layout.
Open a new excel document:
Select Data menu:
Select From Text in get External Data section.
Select file to import
On step 1 of import wizard select delimited
Click next
Select tab checkbox and change text qualifier to {none}.
Click next
Set column data types to general, text, text
Click finish.
Excel auto imports the data the best it can when you open directly in excel. You lose flexibility/control when this happens. better to import and control yourself to get the fine adjustments you're looking for.
You end up with something like this:
By treating the numbers like text, the zero's don't get messed up.
By setting the text qualifier to none, the quotes don't get messed up.
Have you tried opening it via Text Import?
Got to Data tab > From Text (third form left on default)
You will have window similar to Text To Columns.
Select correct delimiter, remember to remove the quote sign from TExt Qualifier and mark all columns as text to avoid Excel autoformatting.
Step 1:
Step 2:
Step 3:
EXCEL TIP: TIME SAVING IN IMPORTING CSV FILES INTO EXCEL: If u pre-set your Text-To-Columns delimiter parameters correctly in EXCEL (eg specify tabs as the delimiter) and then copy and paste the CSV data, Excel will import the CSV paste directly into the correct columns without u having to going through the Text-To-Columns rigmarole. This was particularly time saving when i had to import hundreds of bank statements into Excel.
However if your Text-To-Columns delimiters are pre-specified incorrectly as e.g. comma and you are importing tab delimited files then excel will dump all the data into one column, and u will have to go through the time consuming process of converting Text-To-Columns for each statement.
EXCEL LOOKS TO THE EXISTING Text-To-Columns delimiters TO SEE IF IT CAN USE THOSE TO MAKE YOUR LIFE EASIER WHEN PASTING DATA
Hope that tip helps (It saved me several hours)

Parse tab separated text file in Google Sheets

I have a txt file available on the web which contains tab separated values (TSV/CSV) like this:
Product_IdtabColortabPricetabQuantityItem1 tabRed tab$5.2 tab5Item2 tabBlue tab$7.5 tab10
I imported the txt file into a Google Spreadsheet using the IMPORTDATA(url) formula. The problem is that now I need to split the text to columns. I tried the following formulas without success:
Split(A1,"\t")
Split(A1," ")
Split(A1,"<tab>")
another thing I tried is to to use the Substitute function, but I just can't figure out how to match the Tab character in Google Spreadsheets?
Pages strips tabs by default when you paste text using a standard paste. Tab delimited data can be pasted and automatically parsed using:
Right Click -> Paste special -> Paste values only
IMPORTDATA(url) seems to handle tabs automatically, as others have mentioned before, if the URL ends in ".tsv".
I had trouble trying to import a file from Dropbox even though the file was named "something.tsv", because the url was
"https://www.dropbox.com/s/xxxxxxx/something.tsv?dl=1"
I managed to solve the problem by adding a dummy query parameter to the url:
"https://www.dropbox.com/s/xxxxxxx/something.tsv?dl=1&x=.tsv"
NOTE: I know this question was asked back in 2014 and I am answering this question some 5 years later. I am posting the answer here in hopes that someone else who googles their way here will be saved the headache and can be helped by how I devised a solution.
SUMMARY OF THE ISSUE: By default the IMPORTDATA() function will properly process a tab-delimited file only if the file name ends with the extension .TSV
UPDATE Nov 14, 2019:
In a comment below, Poul shared that he has found an undocumented parameter for the IMPORTDATA() function by which you can specify the delimiter to split the data. As of writing this, the official documentation makes no reference to this delimiter.
In effect the documentation should look something like the following:
IMPORTDATA("url","delimiter")
So, if you wanted to force a file to be split on the TAB character, it would look something like
IMPORTDATA("url","\t")
PRIOR ANSWER:
UPDATE: I am leaving my original answer just in case it might be helpful if the answer above, which includes undocumented functionality, does not continue to work.
ORIGINAL ANSWER: After seemingly countless attempts, I figured out how to coax Google Sheets into importing a tab-delimited file regardless of the extension.
For those looking for the quick and dirty answer, copy the following into a cell of a Google Sheet to give it a try:
=ARRAYFORMULA(IFERROR(SPLIT(IMPORTDATA("https://iso639-3.sil.org/sites/iso639-3/files/downloads/iso-639-3_Latin1.tab"),CHAR(9),FALSE,FALSE)))
For those that want to know a bit more, I will try to explain how each of the nested functions are helping to create the final solution:
=ARRAYFORMULA( IFERROR( SPLIT( IMPORTDATA(URL-HERE) ,CHAR(9),FALSE,FALSE) ) )
IMPORTDATA() - the primary function that pulls in the data file from the web
SPLIT - split the row by tab, note the use of char(09) to generate the tab character; also note the use of FALSE for the last parameter which was required in my case to ensure empty cells were not collapsed together
IFERROR - used to catch situations where an import might fail, the error will be trapped and not returned to the spreadsheet
ARRAYFORMULA - this function ensures that every line in the file is parsed; without this, only the first line of the file would be returned to the spreadsheet
It turns out that IMPORTDATA(url) can import a tab separated file, but it expects the file name to have the .tsv extension. This is inconsistent with Excel, where a tab-separated export results in *.txt.
If you can ensure that you use a .tsv extension, then your problem is solved.
You can also use the Sheets UI to import the file (into a new Spreadsheet). Select File > Import..., then Upload > Select a file from your computer. When the file selection dialog opens, paste the URL into the file name field, and click Open. The file will be downloaded to your PC then uploaded to Drive, through the Import dialog that will let you choose the delimiter.
(Validated on Windows 8.1 with Chrome; I don't know how this will behave on other OSes or browsers.)
Edit: See this gist.
importFromCSV(string fileName, string sheetName)
Populates a sheet with contents read from a CSV file located in the user's GDrive. If either parameter is not provided, the function will open inputBoxes to obtain them interactively.
Automatically detects tab or comma delimited input.
I had luck using split() and indicating only a single space as the delimiter, even though the data i pasted in had tabs separating each "column": =SPLIT(A1, " ", True) where A1 had data separated by 1 or more spaces. It seems that pasting in TSV data results in conversion from tabs to spaces.
This could be done in two steps leveraging the fact that tab is essentially multiple spaces.
Steps are as follows:
Select the columns which have tab separated data. Then trim tab to single space by using Data -> Data cleanup -> Trim whitespaces.
Now usual Data -> Split text to columns should work out of the box or after selecting space as separator.

Formatting csv files in Excel

Win XP, Excel 2007
I know there are various other posts on csv formatting but couldn't quite find what i needed.
Some of our data is held off site by another company and they send us a csv file every morning with the previous days data.
The problem is this data has come from web input forms that may have drop-down lists.
For example there may be a drop down list of Number of Employees with options like 1-10, 11-25, 26-50 etc
When we open the csv file in Excel certain options like 1-10 has been turned into Oct-01 date format which we do not want.
Is there an easy way to change these back OR reformat the cells and do a find...replace? (This didn't seem to work terribly well as it kept reverting back to the date)
Indeed is there a better way of opening the csv file to keep the formatting intact? and save us doing lots of find...replaces.
Ultimately we will need to open the csv in Excel though.
Grateful for any hints
Isn't that SO annoying? Here's how I deal with this issue:
When you open the CSV file in Excel, you should get a dialog with parsing options. First you select delimited or fixed then you get a screen that previews the data parsing.
It's easy to miss, but in the upper right corner of the dialog box there's an option to set a specific data format for each column. Select the column you want to protect and set the format to text. (This keeps Excel from dropping the leading zeros in ZIP codes for New England too!)
Once you get it into Excel, you can do a vlookup or replace to reset the values to your own codes.
Hope this helps. Good luck.