Parse tab separated text file in Google Sheets

I have a txt file available on the web which contains tab separated values (TSV/CSV) like this:
Product_Id<tab>Color<tab>Price<tab>Quantity
Item1 <tab>Red <tab>$5.2 <tab>5
Item2 <tab>Blue <tab>$7.5 <tab>10
I imported the txt file into a Google Spreadsheet using the IMPORTDATA(url) formula. The problem is that now I need to split the text to columns. I tried the following formulas without success:
Split(A1,"\t")
Split(A1," ")
Split(A1,"<tab>")
Another thing I tried was the SUBSTITUTE function, but I can't figure out how to match the tab character in Google Spreadsheets.

Pages strips tabs by default when you paste text using a standard paste. Tab delimited data can be pasted and automatically parsed using:
Right Click -> Paste special -> Paste values only

IMPORTDATA(url) seems to handle tabs automatically, as others have mentioned before, if the URL ends in ".tsv".
I had trouble importing a file from Dropbox even though the file was named "something.tsv", because the URL ended in "?dl=1" rather than ".tsv":
"https://www.dropbox.com/s/xxxxxxx/something.tsv?dl=1"
I managed to solve the problem by adding a dummy query parameter to the url:
"https://www.dropbox.com/s/xxxxxxx/something.tsv?dl=1&x=.tsv"

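If you build such URLs in a script, the dummy-parameter trick from the Dropbox answer can be sketched as below. This is only a sketch in plain JavaScript: the helper name withTsvSuffix and the parameter name "x" are made up for illustration, and it assumes the server ignores query parameters it does not know.

```javascript
// Hypothetical helper (not part of Google Sheets): make the URL text end in ".tsv"
// by appending a dummy query parameter.
function withTsvSuffix(url) {
  // Nothing to do if the URL already ends with the extension.
  if (url.toLowerCase().endsWith(".tsv")) {
    return url;
  }
  // Use "&" if a query string is already present, otherwise start one with "?".
  const separator = url.includes("?") ? "&" : "?";
  // "x" is an arbitrary name; the server is assumed to ignore it.
  return url + separator + "x=.tsv";
}

console.log(withTsvSuffix("https://www.dropbox.com/s/xxxxxxx/something.tsv?dl=1"));
// https://www.dropbox.com/s/xxxxxxx/something.tsv?dl=1&x=.tsv
```

The resulting string is what you would pass to IMPORTDATA(url).
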
NOTE: I know this question was asked back in 2014 and I am answering this question some 5 years later. I am posting the answer here in hopes that someone else who googles their way here will be saved the headache and can be helped by how I devised a solution.
SUMMARY OF THE ISSUE: By default the IMPORTDATA() function will properly process a tab-delimited file only if the file name ends with the extension .TSV
UPDATE Nov 14, 2019:
In a comment below, Poul shared that he has found an undocumented parameter for the IMPORTDATA() function by which you can specify the delimiter to split the data. As of writing this, the official documentation makes no reference to this delimiter.
In effect the documentation should look something like the following:
IMPORTDATA("url","delimiter")
So, if you wanted to force a file to be split on the TAB character, it would look something like
IMPORTDATA("url","\t")
PRIOR ANSWER:
UPDATE: I am leaving my original answer just in case it might be helpful if the answer above, which includes undocumented functionality, does not continue to work.
ORIGINAL ANSWER: After seemingly countless attempts, I figured out how to coax Google Sheets into importing a tab-delimited file regardless of the extension.
For those looking for the quick and dirty answer, copy the following into a cell of a Google Sheet to give it a try:
=ARRAYFORMULA(IFERROR(SPLIT(IMPORTDATA("https://iso639-3.sil.org/sites/iso639-3/files/downloads/iso-639-3_Latin1.tab"),CHAR(9),FALSE,FALSE)))
For those who want to know a bit more, here is how each of the nested functions contributes to the final solution:
=ARRAYFORMULA( IFERROR( SPLIT( IMPORTDATA(URL-HERE) ,CHAR(9),FALSE,FALSE) ) )
IMPORTDATA() - the primary function; it pulls the data file in from the web
SPLIT() - splits each row on the tab character, which CHAR(9) generates; the final FALSE (the remove_empty_text argument) was required in my case to ensure empty cells were not collapsed together
IFERROR() - catches situations where an import might fail, so the error is trapped and not returned to the spreadsheet
ARRAYFORMULA() - ensures that every line in the file is parsed; without it, only the first line of the file would be returned to the spreadsheet
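
For readers who think in code rather than formulas, the same line-by-line splitting can be sketched in plain JavaScript. This is only a loose analogy to the nested formula, not how Sheets implements it; the function name splitTabDelimited is made up, and the sample text is the question's data with one Color cell emptied to show that empty cells survive.

```javascript
// Split raw tab-delimited text into rows of cells, keeping empty cells.
function splitTabDelimited(text) {
  return text
    .split(/\r?\n/)                     // one entry per line, like ARRAYFORMULA visiting each row
    .filter((line) => line.length > 0)  // skip blank lines such as a trailing newline
    .map((line) => line.split("\t"));   // "\t" is the character that CHAR(9) produces
}

// Sample adapted from the question; the Color cell of Item2 is deliberately empty.
const sample = "Product_Id\tColor\tPrice\tQuantity\nItem1\tRed\t$5.2\t5\nItem2\t\t$7.5\t10\n";
const rows = splitTabDelimited(sample);
console.log(rows.length);     // 3
console.log(rows[2].length);  // 4, the empty Color cell is preserved
```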

It turns out that IMPORTDATA(url) can import a tab separated file, but it expects the file name to have the .tsv extension. This is inconsistent with Excel, where a tab-separated export results in *.txt.
If you can ensure that you use a .tsv extension, then your problem is solved.
You can also use the Sheets UI to import the file (into a new Spreadsheet). Select File > Import..., then Upload > Select a file from your computer. When the file selection dialog opens, paste the URL into the file name field, and click Open. The file will be downloaded to your PC then uploaded to Drive, through the Import dialog that will let you choose the delimiter.
(Validated on Windows 8.1 with Chrome; I don't know how this will behave on other OSes or browsers.)
Edit: See this gist.
importFromCSV(string fileName, string sheetName)
Populates a sheet with contents read from a CSV file located in the user's GDrive. If either parameter is not provided, the function will open inputBoxes to obtain them interactively.
Automatically detects tab or comma delimited input.

I had luck using SPLIT() with a single space as the delimiter, even though the data I pasted in had tabs separating each "column": =SPLIT(A1, " ", TRUE), where A1 had data separated by one or more spaces. It seems that pasting in TSV data converts the tabs to spaces.

This can be done in two steps, relying on the pasted tabs ending up as runs of whitespace.
The steps are as follows:
Select the columns that contain the tab-separated data, then collapse the whitespace to single spaces using Data -> Data cleanup -> Trim whitespace.
The usual Data -> Split text to columns should now work, either out of the box or after selecting Space as the separator.
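
One caveat with any space-based split: it only works cleanly when the values themselves contain no spaces. A small JavaScript sketch with a made-up row (not data from the question) shows the difference:

```javascript
// Made-up row in which two values contain a space.
const row = "Item 1\tDark Red\t$5.2";

// Splitting on the tab character keeps multi-word values intact.
const byTab = row.split("\t");

// Once tabs have become single spaces, the field boundaries are ambiguous.
const bySpace = row.replace(/\t/g, " ").split(" ");

console.log(byTab.length);    // 3
console.log(bySpace.length);  // 5
```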

Related

Pipeline unable to read field of plain text file

Using the latest version of Apache Hop, I'm trying to read in a plain text file. This text file is old and basically only structured by its lines (it has no delimiter, no separator, no enclosure, etc.). I would like to read and process the lines of this file as rows in my transformation.
I use the "Text file input" transformation to read the file. Reading apparently works, but no fields are available when I try to retrieve them; it simply states that no fields were found.
When I run "preview records" I do get empty records equal to the number of lines in the file, so that is good. However, no data is shown because no field is detected.
Curiously enough, when I press "Show file content" I DO get the desired content, nicely structured in rows, so I know the file is being read correctly.
Does anyone know how best to read this kind of file?
PS: The files can be anywhere from 10 to 100000 lines.

When there is no header row with field names, or Hop is not able to detect any fields, you can create a field yourself in the Fields tab and the content will be put there.
Because we use a position-based approach and split the content on the specified delimiter, everything should go into "field1" when no delimiter is found in the data.
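
To make the position-based idea concrete, here is a small JavaScript sketch. It is an illustration only, not Apache Hop code; the function name splitToFields and the sample line are made up.

```javascript
// Hypothetical illustration: split a line on a delimiter and assign the
// pieces to the configured field names by position.
function splitToFields(line, delimiter, fieldNames) {
  const parts = line.split(delimiter);
  const row = {};
  fieldNames.forEach((name, index) => {
    // Fields beyond the number of pieces stay empty (null here).
    row[name] = index < parts.length ? parts[index] : null;
  });
  return row;
}

// No ";" occurs in the line, so the whole line lands in the first field.
console.log(splitToFields("an old line with no structure", ";", ["field1"]));
```

With a single manually created field and a delimiter that never occurs in the file, each line of the file becomes the value of that field.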

Figured it out. The naming is a bit misleading, but you can use the "CSV File input" and set TAB as the delimiter. Then preview your file and you should find that the lines are actually being parsed.

Google sheets' IMPORTHTML() fails to keep the original format of the data

I want to use IMPORTHTML() to import data from a table at http://www.cophieu68.vn/atbottom.php, but IMPORTHTML() fails to keep the original format of the data, which produces misleading information.
Specifically, I enter =IMPORTHTML("http://www.cophieu68.vn/atbottom.php";"table"; 2) into cell A1 of Google Sheets. Google Sheets successfully imports Table 2, but wrongly adds the symbol * to my data. Consequently, SVD becomes *SVD* (see cell B2 in my screenshot below) and 9.3 becomes *9.3* (see cell C2).
Moreover, 401,300 becomes 401,3 (see cell H2).
Besides, 29.1 is interpreted as 29/01/2021 (see the selected cell K2 in my screenshot below).
Here is the screenshot of the original table that I want to import:
https://drive.google.com/file/d/1wgJcO4O-ivpsXk0XetXzwdhSs4d_8Cjy/view?usp=sharing
and here is the screenshot of the table which is imported into my Google sheets
https://drive.google.com/file/d/1DLipA3o85MTGo2ktumTU_0A45vhuLSA6/view?usp=sharing
Could anyone tell me what is wrong with my formula and how can I fix this error?
Thank you very much for your help.
Cao

Explanation:
The source page HTML is controlled by JavaScript, so the raw output of =IMPORTHTML cannot be changed whatsoever. However, you can use this formula on a new sheet to remove the asterisks from the original table.
If your IMPORTHTML is in Sheet1 you can use this:
=ARRAYFORMULA(SUBSTITUTE(Sheet1!A2:M,"*",""))

I suggest using regular expressions to solve the table import with a single formula:
=arrayformula(regexreplace(to_text(IMPORTHTML("http://www.cophieu68.vn/atbottom.php", "table", 2)),"\*(.*\/?)\*","$1"))
(I have added the function to_text to avoid losing the two zeros at the end of figures that are in units of thousands)
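
To see what the pattern does to a single cell, here is the same regular expression in plain JavaScript. This is only a sketch: Sheets uses its own regex engine, but for this simple pattern the behaviour should be the same. The helper name stripAsterisks is made up, and the sample values are the ones mentioned in the question.

```javascript
// Same pattern as in the formula: capture whatever lies between the first
// and the last asterisk, then keep only the captured text.
const pattern = /\*(.*\/?)\*/;

function stripAsterisks(cell) {
  return cell.replace(pattern, "$1");
}

console.log(stripAsterisks("*SVD*"));    // SVD
console.log(stripAsterisks("*9.3*"));    // 9.3
console.log(stripAsterisks("401,300"));  // 401,300 (no asterisks, unchanged)
```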

What is wrong with the function i have used [duplicate]

This question already has an answer here:
Google Sheets importXML Returns Empty Value
(1 answer)
Closed 2 years ago.
I am trying to use the IMPORTXML function to get data from the following website:
https://fantasy.espn.com/basketball/league/standings?leagueId=1878319. I want to get the table titled "Final Standings" into a Google Sheet. The function I am using is listed below:
=IMPORTXML("https://fantasy.espn.com/basketball/league/standings?leagueId=1878319","//*[@id='espn-analytics']/div/div[5]/div[2]/div[1]/div/div/div[4]/section/div/div/div[2]/table/tbody")
The function returns an #N/A error and says the import is empty. How do I fix it to get the data set I need?

Unfortunately, as more sites move to dynamically loaded content, the IMPORTXML function is losing some of its usefulness, as it can't read such content. Depending on how the site loads the content, you might be able to analyze the script and find the source, but it might be a true pain to do so, and you may have to parse the format to make it work. No fun.
Since the page you referenced is a "Final Standings" - I assume you don't need this to be auto-updating since it won't change, in which case, rather than a messy copy-paste, you might want to try a Chrome extension like "Instant Web Scraper" which will analyze the tables even within dynamic content and let you export it as a CSV which you can then quickly bring into Google Sheets.
Sorry that doesn't fix the IMPORTXML issue in this case, but I hope it helps.
Edit: Here is that top table in a CSV format (copy and save to a text file and name the text file a .csv and you can then upload it to Google Sheets):
jsx-2810852873,Image src,teamName,jsx-2302882246,Table__TD,jsx-2810852873 2,jsx-2810852873 3,jsx-2810852873 4,jsx-2810852873 5,jsx-2810852873 6,dn src
1,https://g.espncdn.com/lm-static/logo-packs/core/CatsAndDogs/cats_dogs-3.svg,Kevin Manning Show,(Kevin Manning),16-3-1,20328.5,17509.5,1016.4,875.5,+140.9,
2,,los angeles lebrons,(Zack Woodard),15-4-1,20909.5,17702.5,1045.5,885.1,+160.3,https://larrybrownsports.com/wp-content/uploads/2013/11/lebron-james-face.jpg
3,,BasketBall Chimps,(Jacob Woodard),13-6-1,19189.0,17317.5,959.5,865.9,+93.6,https://www.kimballstock.com/pix/CHI/03/CHI_03_RK0299_01_P.JPG
4,https://g.espncdn.com/lm-static/logo-packs/core/DIS_Avengers_EndGame/DIS_Avengers_EndGame_Capt_America.svg,Mr.Clean ICE,(Kenil Prajapati),12-7-1,21134.0,17640.5,1056.7,882.0,+174.7,
5,https://g.espncdn.com/lm-static/logo-packs/core/OldTimeMickeyAndFriends/Hockey_Donald.svg,Yonkers Yoinkers,(Einar H),11-8-1,17317.5,16704.5,865.9,835.2,+30.6,
6,,Yogurt Slingers,(Allan Perez),8-11-1,15821.5,16717.5,791.1,835.9,-44.8,https://g.espncdn.com/lm-app/lm/img/shell/shield-FBA.svg
7,https://g.espncdn.com/lm-static/logo-packs/core/TeamMascots-RobbHarskamp/Team_Mascots-04.svg,TAMU Shauced Shnacks,(Enrique Baqueiro),10-9-1,19733.5,17396.0,986.7,869.8,+116.9,
8,https://g.espncdn.com/lm-static/fba/images/default_logos/1.svg,Htown 🍆💦 Dal,(sheshu chandrasekar),3-16-1,13393.5,18560.5,669.7,928.0,-258.4,
9,https://g.espncdn.com/lm-static/logo-packs/fba/DreamTeam-ESPN/dreamTeam-4.svg,Original Gayngster,(Lee Nguyen),7-12-1,14462.0,17812.0,723.1,890.6,-167.5,
10,https://g.espncdn.com/lm-static/logo-packs/fba/Jerseys-ESPN/fba-jerseys-10.svg,Musty Burger FC Juan Prado,(Juan Prado),0-19-1,13300.5,18229.0,665.0,911.5,-246.4,

Adwords csv file in attachment is not parsing properly

I am trying to use Google Apps Script to extract data from an email attachment which is basically an Adwords report as csv file.
Here is the gist of the code
var dataTest3 = Utilities.parseCsv(msg.getAttachments()[0].getDataAsString());
SpreadsheetApp.getActive().getSheetByName("Sheet1").getRange(1, 1, dataTest3.length, dataTest3[0].length).setValues(dataTest3);
msg is the GmailMessage object.
The result that I am getting is an array with a strange format.
The data shows OK, but its value is strange.
Any idea how I can make it parse into the spreadsheet like a normal CSV? It opens like a normal CSV when downloaded.
Thanks

The description "basically an Adwords report as csv file" needs to be investigated: what exactly is the file format? With only pictures of your problem, the best I can do is guess that your file uses some custom delimiter rather than commas.
"CSV" stands for Comma Separated Values, but in practice it applies to text files with a number of different field delimiters - call them Delimiter-separated values. Common delimiters include commas (,), tabs (\t), colons (:), v-bars (|), and sometimes just spaces (usually between quote-enclosed text fields).
Instead of using the version of Utilities.parseCsv(csv) that assumes a comma delimiter, you can use Utilities.parseCsv(csv, delimiter) to specify a custom delimiter. You should be able to determine what the delimiter is by reviewing the attachment in the debugger.
You could also try adapting importFromCSV() from How to Import tab-delimited "CSV", which automatically detects tab or comma delimiters.
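
If you would rather guess the delimiter in code than inspect the attachment by hand, a rough heuristic is to count the candidate delimiters in the first line. The sketch below is plain JavaScript and makes several assumptions: the helper name guessDelimiter is made up, the candidate list is taken from the delimiters named above, the sample header is invented, and quoted fields that contain a delimiter are not handled.

```javascript
// Candidate delimiters named in the answer: comma, tab, colon, vertical bar.
const CANDIDATES = [",", "\t", ":", "|"];

// Hypothetical helper: return the candidate that occurs most often in the
// first line, falling back to a comma when none occurs at all.
function guessDelimiter(text) {
  const firstLine = text.split(/\r?\n/)[0];
  let best = ",";
  let bestCount = 0;
  for (const candidate of CANDIDATES) {
    const count = firstLine.split(candidate).length - 1;
    if (count > bestCount) {
      best = candidate;
      bestCount = count;
    }
  }
  return best;
}

// Invented sample: a tab-separated header and one data row.
console.log(JSON.stringify(guessDelimiter("Campaign\tClicks\tCost\nBrand\t12\t3.40")));
// "\t"
```

In Apps Script, the guessed value could then be passed as the second argument of Utilities.parseCsv(csv, delimiter).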

Ms-Access trying to use "transfer text" to create a csv file with a unique filename

I am trying to use an automated macro to export a Ms-Access table to a csv file. I want the destination file to have a unique name, and I reckoned that using now()yyyymmddhhnn would be a good way to achieve this.
I have got transfer text working ok from my macro, and I have set up an export file spec for the transfer.
I am using ="C:\batchfile_" & Format(Now(),"yyyymmddhhnn") & ".csv" in the filename argument in the macro. This bit works.
But when I try to run the macro, it tells me that the filename doesn't exist and then the export doesn't complete. I am not sure why this is, but I think it is because the export file specification is expecting the destination file to have the same filename and column structure as the source table.
Does anyone know a way around this?
Eric

This is a very old thread, but I am posting my solution in case it is useful to someone else.
TransferText works fine as long as its arguments are supplied properly, so check the arguments other than the file name, such as the data source. Alternatively, create the file yourself with a file Open statement:
open a text file and write the recordset data to it in CSV format.
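
For comparison, the timestamped file name from the question (yyyymmddhhnn, where nn is minutes) can be built like this in plain JavaScript. This is only a sketch of the naming scheme, not Access code, and the function name timestampedCsvName is made up.

```javascript
// Build a name like C:\batchfile_yyyymmddhhnn.csv from a Date, using local time.
function timestampedCsvName(date) {
  const pad = (n) => String(n).padStart(2, "0");
  const stamp =
    date.getFullYear() +
    pad(date.getMonth() + 1) +  // months are zero-based in JavaScript
    pad(date.getDate()) +
    pad(date.getHours()) +
    pad(date.getMinutes());
  return "C:\\batchfile_" + stamp + ".csv";
}

// 29 January 2021, 09:05 local time.
console.log(timestampedCsvName(new Date(2021, 0, 29, 9, 5)));
// C:\batchfile_202101290905.csv
```

Note that a name with minute resolution is only unique if no two exports run within the same minute.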
