Does this text-based file format similar to csv/tsv that seems to contain multiple sheets have a name? - csv

I have a text-based file format that is similar to csv/tsv with separators that are pipes |, and the first column of each row seems to be the "sheet/table" name, but there are no headers.
See the example below. I'd like to put a name to it so that I can import it into a tool and work with the data.
TABLEORSHEET1|Lar|Lafard|113 North Dakota Ln.|Johnstown|PA|15905
TABLEORSHEET1|Nancy|Lafard|114 North Dakota Ln.|Johnstown|PA|15905
TABLEORSHEET1|Tommy|Lafard|115 North Dakota Ln.|Johnstown|PA|15905
TABLEORSHEET2|1|Tea Cup|1.42|0
TABLEORSHEET2|1|Coffee Cup|3.42|1
TABLEORSHEET3|1|EDIT|LNAME|Laffer|Lafard
TABLEORSHEET3|1|EDIT|FNAME|Larry|Lar
I've seen this file format used twice before and both were either an import or an export to/from an Oracle system.
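Even without a name for the format, a file like this can be loaded by treating the first pipe-separated field as a record-type tag and grouping the remaining fields under it. The following is a minimal sketch under that assumption; the function name `split_by_record_type` is made up for illustration, and the sample rows are taken from the example above.

```python
import csv
import io
from collections import defaultdict


def split_by_record_type(lines):
    """Group pipe-delimited rows by their first field (the table/sheet tag)."""
    tables = defaultdict(list)
    for row in csv.reader(lines, delimiter="|"):
        if not row:  # skip blank lines
            continue
        tag, *fields = row
        tables[tag].append(fields)
    return dict(tables)


sample = io.StringIO(
    "TABLEORSHEET1|Lar|Lafard|113 North Dakota Ln.|Johnstown|PA|15905\n"
    "TABLEORSHEET2|1|Tea Cup|1.42|0\n"
    "TABLEORSHEET2|1|Coffee Cup|3.42|1\n"
)
tables = split_by_record_type(sample)
for tag, rows in tables.items():
    print(tag, len(rows), "row(s)")
```

Each group can then be handed to whatever tool expects an ordinary single-table file. Since there are no headers, column names still have to come from somewhere else.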

Related

Importing CSV file to Google maps format

I built software that generates trails for my own use.
I would like to test it, so I created a CSV file that contains the longitude and latitude of the trail points.
What format does a CSV file need to have so that it can be imported into Google Maps?
The documentation isn't very specific about CSV files, so I just tried a bunch of formats.
Option 1 is to have separate latitude and longitude columns. You will be able to specify columns in the upload wizard.
lon,lat,title
-20.0390625,53.27835301753182,something
-17.841796875,53.27835301753182,something
Option 2 is to have a single coordinate column with the coordinates separated by a space. You will be able to choose the order of the coordinate pair in the upload wizard.
lonlat,title
-20.0390625 53.27835301753182,something
-17.841796875 53.27835301753182,something
You'll also need one column that acts as the description for your points. It is, again, selectable in the wizard.
There seems to be no way to import CSVs as line geometries and no way to convert points to lines later on. Well-known-text (WKT) in the coordinate column fails to import.
The separator needs to be comma ,. Semicolons ;, spaces   and tabs don't work.
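For the trail-generating software, Option 1 could be produced along these lines. This is only a sketch: the function name `trail_to_csv` and the `(lon, lat, title)` tuple layout are assumptions for illustration, and the column names simply mirror the example above.

```python
import csv
import io


def trail_to_csv(points, out):
    """Write (lon, lat, title) tuples as comma-separated rows with a header."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["lon", "lat", "title"])
    for lon, lat, title in points:
        writer.writerow([lon, lat, title])


points = [
    (-20.0390625, 53.27835301753182, "something"),
    (-17.841796875, 53.27835301753182, "something"),
]
buf = io.StringIO()
trail_to_csv(points, buf)
print(buf.getvalue())
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` in place of the `StringIO` buffer.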

Is there any technical difference between CSV, a TSV or a TXT file?

I use these files constantly in my application, but aren't CSV, TSV or TXT files all flat files?
The content is:
"sample","sample"
They are all text files following the same "guidelines". The difference, as long as the creator followed the "rules", is this:
A csv file will have comma-separated values and a tsv file will have tab-separated values.
For .txt files, there is no formatting specified.
.csv stands for comma separated values, .tsv stands for tab separated values.
As the names suggest, different elements in the file are separated by ',' and '\t' respectively.
The type is chosen depending on the data. If we have, say, numbers larger than 3 digits, we might need commas as part of the content, and it would be better to use a tsv in that case so that those commas cannot be mistaken for separators.
Both are types of text files and are increasingly used for classification and data mining purposes.
They do not have any other technical distinguishing factor.
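To make the "only the separator differs" point concrete, here is a small sketch that reads the same two records once as csv and once as tsv with Python's `csv` module. The helper name `parse_delimited` and the sample data are invented for illustration. Note that the value containing a comma has to be quoted in the csv variant but not in the tsv one.

```python
import csv
import io


def parse_delimited(text, delimiter):
    """Parse delimiter-separated text into a list of rows."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


csv_text = 'name,amount\nwidget,"1,250"\n'
tsv_text = "name\tamount\nwidget\t1,250\n"

# Both inputs describe the same table once the delimiter is taken into account.
assert parse_delimited(csv_text, ",") == parse_delimited(tsv_text, "\t")
print(parse_delimited(tsv_text, "\t"))
```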
A text file (which might have a txt file extension) will have lines separated by a platform specific line separator (CRLF on Windows, LF on Linux, and so on), and it will tend to contain characters human readable as text in some encoding. Apart from that human readability expectation this allows pretty much any file content on some platforms, so this is more of a content classification than a specific file format.
The other two formats are usually considered special cases of a text file intended to allow easy automated processing; tsv, a "tab separated values" file is simpler than csv, a "comma separated values" file.
csv will have commas as field separators, and it may use quoting and escaping especially to handle commas and quotes occurring in those fields. It may also include a header line as the first line in the file. The last line in the file may or may not end with a line separator.
(Details.)
tsv simply disallows tabs in the values; the header line is mandatory, and so is the final line separator.
(Details.)
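The difference in complexity can be sketched in a few lines of Python. The csv side relies on the standard `csv` module to add quotes and double any embedded quote characters; the tsv side is written by hand and just refuses values that contain a tab or a newline. Both function names are made up for this example, and the tsv check is an assumption about how strictly one wants to enforce the rule.

```python
import csv
import io


def to_csv_line(fields):
    """Serialize one record as csv; the csv module quotes fields when needed."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(fields)
    return buf.getvalue()


def to_tsv_line(fields):
    """Serialize one record as tsv, rejecting values that contain a tab or newline."""
    for field in fields:
        if "\t" in field or "\n" in field:
            raise ValueError(f"tab or newline not allowed in tsv value: {field!r}")
    return "\t".join(fields) + "\n"


record = ["Smith, John", 'said "hi"', "plain"]
print(to_csv_line(record), end="")
print(to_tsv_line(record), end="")
```

The csv line comes out with the first two fields wrapped in double quotes, while the tsv line needs no escaping at all.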
A "flat file", in connection with databases, is a text file as opposed to a machine optimized storage method (such as a fixed size record file or a compressed backup file or a file using more elaborate markup language supporting data validation); a flat file tends to be csv or tsv or similar.
This answer benefited from a comment by Alex Shpilkin.

csv-like format that is compatible in both US and Europe

Countries that use the point as a decimal mark (US, UK, China, India, etc.) use this format as csv:
value,value,value
Countries that use the comma as a decimal mark (Germany, Russia, France, South America, etc.) use this format as csv:
value;value;value
I've had some problems with (IIRC) MS Office 2003 in this regard.
So the question:
Is there a format that is as simple to create and parse as csv, that does not suffer from an incompatibility between major world regions, and that can be read by MS Office and LibreOffice?
Edit:
I noticed that LibreOffice assumes tabs as separators when importing from csv:
value<tab>value<tab>value
Is that format usable in MS Office for US and Europe?
If you care only about Excel, then you can use this trick: add this line at the beginning of the CSV file:
sep=;
It can be another character, but I think ; is the most intuitive and doesn't cause confusion with decimal separators.
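A file with that hint line could be generated roughly like this. It is a sketch only: the function name `write_excel_hinted_csv` is hypothetical, and the sample rows are invented. Tools that do not recognise the hint will presumably show `sep=;` as an ordinary first row.

```python
import csv
import io


def write_excel_hinted_csv(rows, out, sep=";"):
    """Write rows preceded by a 'sep=' line, which Excel reads as a delimiter hint."""
    out.write(f"sep={sep}\n")
    writer = csv.writer(out, delimiter=sep, lineterminator="\n")
    writer.writerows(rows)


buf = io.StringIO()
write_excel_hinted_csv([["item", "price"], ["Tea Cup", "1,42"]], buf)
print(buf.getvalue())
```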
The best answer for this problem is still XML, which is the solution adopted by both the LibreOffice and Excel file formats. They contain headers that declare the encoding, collation and other locale settings.

Parsing csv file with vim

I have a large CSV file structured as follows:
CHINESE TRANSLATION
我去上学。 Wǒ qù shàngxué. I am going to school. 上 ♦ on, on top of ♦ go to
我去过北京。 Wǒ qùguò Běijīng. I've been to Beijing. 京 -- ♦ national capital ♦ Beijing
....
The TRANSLATION column blends together three different pieces of information: the pinyin, the English translation and additional information. These three pieces are always present, always in the same order, and separated by a dot.
What I want to achieve is to create three different columns from the TRANSLATION column, i.e. to get:
CHINESE PINYIN TRANSLATION ADDITIONAL
我去上学。 Wǒ qù shàngxué. I am going to school. 上 ♦ on, on top of ♦ go to
....
Using a vim macro, how can I do this?
I think vim macros can handle this job, but executing a vim macro several thousand times on a big file is very slow. So if you just want the job done, I have written a Python script that I think gives you what you want.
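import csv

# Change 'in.csv' and 'out.csv' to your actual file names.
with open('in.csv', newline='', encoding='utf-8') as infile, \
        open('out.csv', 'w', newline='', encoding='utf-8') as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    # Assumes every row has exactly two fields: CHINESE and TRANSLATION.
    for chinese, translation in reader:
        # Split on the first two dots only: pinyin, English, additional info.
        parts = [p.strip() for p in translation.split('.', 2)]
        # csv.writer quotes fields that contain commas, e.g. "on, on top of".
        writer.writerow([chinese] + parts)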

Culture independent CSV

I wonder if there is any way to generate a culture-neutral CSV file, or at least to specify the data format of certain columns in the file.
For example, I generated a CSV file that contains numbers with a point (.) as the decimal separator. After I
pass it to a client in a country where the decimal separator is a comma (,), the client opens it with Excel and sees all the values changed.
Is there any way to resolve this issue, or should I just not use a CSV file in this case?
Thank you in advance.
What you want is a "quoted CSV file".
That is, as well as separating your values with commas, you also enclose them in (usually) double quotes.
Like so:
"first","second","3,00","Some other text, etc."
This format is quite common and supported by Excel.
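In Python, for instance, a fully quoted line like the one above can be produced with the standard `csv` module and `csv.QUOTE_ALL`. This is a sketch rather than a fix for the locale problem itself: quoting protects the embedded commas, but how Excel interprets a value such as 3,00 presumably still depends on the regional settings of the machine opening the file.

```python
import csv
import io

buf = io.StringIO()
# QUOTE_ALL wraps every field in double quotes, whether or not it needs them.
writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
writer.writerow(["first", "second", "3,00", "Some other text, etc."])
print(buf.getvalue(), end="")
```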
Two ways I came up with to avoid the decimal separator altogether:
1) Use scientific notation, so 1.25 would be: 125E-2
2) Make it a formula, so 1.25 would be: =125/100
Both pretty crappy, depending on your target audience, but at least Excel sees them as numbers and can calculate with them.
A CSV file will be separated by commas (the 'C' in CSV), but you can output text with any delimiter and qualifier and still open it in Excel: you specify them in step 2 of the Text Import Wizard.
A common choice for situations like this is to use tabs (TSV).
You can use Tab Separated Values, which do not vary between cultures and are supported by Microsoft Excel. Common file extensions are .tsv and .tab.
http://en.wikipedia.org/wiki/Tab-separated_values
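A tab-separated file can be written with the `excel-tab` dialect that ships with Python's `csv` module. The sample rows below are invented for illustration. Keep in mind that this only removes the ambiguity about the field separator; whether the spreadsheet then reads 1.42 as a number may still depend on the client's locale.

```python
import csv
import io

buf = io.StringIO()
# "excel-tab" is the built-in dialect that uses a tab as the delimiter.
writer = csv.writer(buf, dialect="excel-tab", lineterminator="\n")
writer.writerow(["item", "price"])
writer.writerow(["Tea Cup", "1.42"])
print(buf.getvalue(), end="")
```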