Change .xls to .csv without losing columns

I have a folder with around 400 .txt files that I need to convert to .csv. When I batch rename them to .csv, all the columns get smushed together into one. The same thing happens when I convert to .xls and then to .csv, even though the columns are fine in .xls. If I open the .xls file and use Save As to save it as .csv, it's fine, but that would require opening all 400 files.
I am working with sed in the Mac terminal. After navigating in the terminal to the folder that contains the files, here is some code that did not work:
for file in *.csv; do sed 's/[[:blank:]]+/,/g'
for file in *.csv; do sed -e "s/ /,/g"
for file in *.csv; do s/[[:space:]]/,/g
for file in *.csv; do sed 's/[[:space:]]{1,}/,/g'
Any advice on how to restore the column structure to the csv files would be much appreciated. And it's probably already apparent but I'm a coding newb so please go easy. Thanks!
Edit: here is an example of how the xls columns look, and how they should look in csv format:
Dotsc.exe 2/12/15 1:17 PM 0 Nothing 1 Practice
Everything that is separated by spaces here (except the space between 1:17 and PM) is in a separate column in the file. Here is what it looks like when I batch rename the file to .csv:
Dotsc.exe 2/12/15 1:17 PM 0 Nothing 1 Practice
Columns have now turned into spaces, and all data is in the same column. Hope that clarifies things.

I think that what you are trying to do purely in batch is not possible. Renaming only changes the file extension, not the bytes inside, so a renamed .xls is still a binary spreadsheet rather than text. I suggest you use a library in Java.
Take a look here: http://poi.apache.org/spreadsheet/
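As an aside, the sed attempts above fail for unrelated reasons: the loops never pass a file to sed or redirect its output, and BSD sed (the macOS default) does not support + or {1,} in basic regular expressions without the -E flag. If the original .txt files are tab-delimited, as text exported from Excel usually is, a plain shell loop may be enough. A minimal sketch, assuming tab-separated input with no commas or tabs inside the data itself:
# for each .txt file, turn every tab into a comma and write a .csv alongside it
for file in *.txt; do
  tr '\t' ',' < "$file" > "${file%.txt}.csv"
done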

Related

file "(...).csv" not Stata file error in using merge command

I use Stata 12.
I want to add some country code identifiers from file df_all_cities.csv onto my working data.
However, this line of code:
merge 1:1 city country using "df_all_cities.csv", nogen keep(1 3)
Gives me the error:
. run "/var/folders/jg/k6r503pd64bf15kcf394w5mr0000gn/T//SD44694.000000"
file df_all_cities.csv not Stata format
r(610);
This was an attempted solution to my previous problem: the file was a .dta file that would not open in this version of Stata, so I used R to convert it to .csv, but that doesn't work either. I assume it's because "using" itself doesn't work with csv files, but how should I write it instead?
Your intuition is right. The command merge cannot read a .csv file directly. (using is technically not a command here; it is a common syntax tag indicating that a file path follows.)
You need to read the .csv file with the command insheet. You can use it like this.
* Preserve saves a snapshot of your data which is brought back at "restore"
preserve
* Read the csv file. clear can safely be used as data is preserved
insheet using "df_all_cities.csv", clear
* Create a tempfile where the data can be saved in .dta format
tempfile country_codes
save `country_codes'
* Bring back into working memory the snapshot saved at "preserve"
restore
* Merge your country codes from the tempfile to the data now back in working memory
merge 1:1 city country using `country_codes', nogen keep(1 3)
Note how insheet also takes a using clause, and that command does accept .csv files.

How do I preserve the leading 0 of a number using Unoconv when converting from a .csv file to a .xls file?

I have a 3 column csv file. The 2nd column contains numbers with a leading zero. For example:
044934343
I need to convert the .csv file into an .xls, and to do that I'm using the command-line tool 'unoconv'.
It converts as expected; however, when I load the .xls in Excel, instead of showing '044934343' the cell shows '44934343' (the leading 0 has been removed).
I have tried surrounding the number in the .csv file with a single quote and a double quote however the leading 0 is still removed after conversion.
Is there a way to tell unoconv that a particular column should be of a TEXT type? I've tried reading the man page of unoconv, but the options are a little confusing.
Any help would be greatly appreciated.
Perhaps I'm arriving late on the scene, but in case someone is looking for the answer to a similar question, this is how to do it:
unoconv -i FilterOptions=44,34,76,1,1/1/2/2/3/1 --format xls <csvFileName>
The key here is the "1/1/2/2/3/1" part, which tells unoconv that the second column's type should be "TEXT", leaving the first and third as "Standard".
You can find more info here: https://wiki.openoffice.org/wiki/Documentation/DevGuide/Spreadsheets/Filter_Options#Token_7.2C_csv_import
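For reference, a breakdown of the filter tokens in that command, based on the filter-options documentation linked above (the 1/1/2/2/3/1 column list comes from this question; adjust it to your own file):
# 44           field separator: ASCII code 44, a comma
# 34           text delimiter: ASCII code 34, a double quote
# 76           character set: 76 = UTF-8
# 1            number of the first line to import
# 1/1/2/2/3/1  column/format pairs: column 1 Standard, column 2 Text, column 3 Standard
unoconv -i FilterOptions=44,34,76,1,1/1/2/2/3/1 --format xls <csvFileName>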

Merging PDF's with special characters in file name with Sejda Console via .CSV

I'm new to this forum and I'm not a programmer, so forgive me if I'm asking stupid questions...
I'm trying to merge some PDFs into one PDF with Sejda Console using a .csv file, but when the .csv contains special characters (e.g. ø), Sejda returns with:
Invalid filename found: File 'Something � something.pdf'...
So it changed ø into �.
I've tried saving the .csv with different encodings (via Notepad's Save As: ANSI, Unicode, and UTF-8) and none of them work (though each has its own unique way of screwing up the file name...).
Without this kind of characters it works fine.
It also works fine when the file names with ø are given directly in the syntax, like:
sejda-console-3.1.3/bin/sejda-console merge -f first.pdf second.pdf -o merged.pdf
A second problem also occurred: when a comma exists in the file name, the file name gets cut off at the comma. That would be logical if the list separator were a comma, but on my PC the list separator is a semicolon (Regional and Language Options). Adding quotes around the file name doesn't work either...
I call the batch of Sejda with:
call "C:\sejda-console-3.0.13\bin\sejda-console.bat" merge -l 28.csv -o 28.pdf
And for this test 28.csv contains:
1700050.1_0060764-CROSS TRACK SKATE AXLE.pdf,
1700050.1_0060792-ø32 ATK10K6 FIXING PLATE.pdf,
1700050.1_0060798-CROSS TRACK SKATE NUTPLATE.pdf,
What is the proper way to get Sejda to merge correctly?

How to prepend a line to a file inside a zip file

Is there an efficient command-line tool for prepending lines to a file inside a ZIP archive?
I have several large ZIP files containing CSV files missing their header, and I need to insert the header line. It's easy enough to write a script to extract them, prepend the header, and then re-compress, but the files are so large, it takes about 15 minutes to extract each one. Is there some tool that can edit the ZIP in-place without extracting?
Short answer: no.
A zip file contains 1 to N file entries, and each of them works as an unsplittable unit, meaning that if you want to do something to an entry, you need to process that entry completely (i.e. extract it).
The only fast operation you can do is adding a new file to the archive. That creates a new entry and appends it to the file, but it is probably not what you need.
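What you can avoid, though, is round-tripping the whole archive: Info-ZIP's zip command updates a single named entry, so only that one file is extracted and re-compressed while the other entries are copied as-is. A minimal sketch, assuming a hypothetical archive big.zip, an entry data.csv, and a known header line:
# extract just the one entry, not the whole archive
unzip big.zip data.csv
# prepend the header line
{ echo "col1,col2,col3"; cat data.csv; } > data.tmp && mv data.tmp data.csv
# write it back; zip replaces the existing entry of the same name
zip big.zip data.csv
This still re-compresses the one entry you touch, so it is faster than a full extract/re-zip, not instant.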

Text column issues while converting xlsx to csv

I am having trouble converting an xlsx file to csv format. Somehow it does not copy the contents of the columns that contain text.
I tried: python xlsx2csv-0.20/xlsx2csv.py -s 2 -d ';' 'testin.xlsx' 'testout.csv'
The result should look like:
"www.vistaheads.com";"http://www.vistaheads.com/forums/microsoft-public-windows-vista-general/200274-vista-mbr-vs-xp-mbr-4.html";"YahooBossAPIv2";;"eng";"ie";;9/8/2010;TRUE;FALSE;;0;-8.2666666667;0;0;0;0
"www.drpletsch.com";"http://www.drpletsch.com/elos-acne-treatment.html";"Oxyme.Searchv3.0.0";;"eng";;;7/31/2012;TRUE;FALSE;;;;0;0;0;0
"www.charterhouse-aquatics.co.uk";"http://www.charterhouse-aquatics.co.uk/catalog/elos-systemmini-marine-litre-aquarium-black-p-7022.html";"YahooBossAPIv2";;"eng";"us";;7/11/2012;TRUE;FALSE;;1;5.6666666667;0;0;0;0
"www.proz.com";"http://www.proz.com/kudoz/latin_to_english/religion/4794760-concio_melos_tinnulo.html";"YahooBossAPIv2";;"eng";"in";;5/7/2012;TRUE;FALSE;;1;3;0;0;0;0
"schoee.blogspot.co.uk";"http://schoee.blogspot.co.uk/2010/08/review-body-shop-vitamin-c-facial.html";"YahooBossAPIv2";;"eng";;;8/1/2010;TRUE;FALSE;;1;1;0;0;0;0
But instead I get:
;;;;;;;09-08-10;TRUE;FALSE;;0.0;-8.266666666666666;0.0;0.0;0.0;0.0;
;;;;;;;07-31-12;TRUE;FALSE;;;;0.0;0.0;0.0;0.0;
;;;;;;;07-11-12;TRUE;FALSE;;1.0;5.666666666666667;0.0;0.0;0.0;0.0;
;;;;;;;05-07-12;TRUE;FALSE;;1.0;3.0;0.0;0.0;0.0;0.0;
;;;;;;;08-01-10;TRUE;FALSE;;1.0;1.0;0.0;0.0;0.0;0.0;
;;;;;;;09-08-10;TRUE;FALSE;;0.0;0.033333333333333354;0.0;0.0;0.0;0.0;
;;;;;;;07-03-12;TRUE;FALSE;;1.0;2.0;0.0;0.0;0.0;0.0;
;;;;;;;10-18-11;TRUE;FALSE;;1.0;4.666666666666667;0.0;0.0;0.0;0.0;
I also tried using ssconvert, but there I get a similar outcome, i.e.:
ssconvert -S 'testin.xlsx' testout2.csv
Here too, the textual contents have somehow vanished:
2010/09/08,TRUE,FALSE,,0,-8.26666666666667,0,0,0,0
"2012/07/31 09:58:39.823",TRUE,FALSE,,,,0,0,0,0
"2012/07/11 13:35:09.220",TRUE,FALSE,,1,5.66666666666667,0,0,0,0
2012/05/07,TRUE,FALSE,,1,3,0,0,0,0
2010/08/01,TRUE,FALSE,,1,1,0,0,0,0
2010/09/08,TRUE,FALSE,,0,0.03333333333333,0,0,0,0
"2012/07/03 22:24:03.467",TRUE,FALSE,,1,2,0,0,0,0
2011/10/18,TRUE,FALSE,,1,4.66666666666667,0,0,0,0
"2012/07/22 02:10:58.313",TRUE,FALSE,,1,2,0,0,0,0
"2012/08/02 17:01:39.637",TRUE,FALSE,,1,1,0,0,0,0
2010/06/05,TRUE,FALSE,,1,4,0,0,0,0
"2012/07/25 16:11:47.843",TRUE,FALSE,,1,2,0,0,0,0
2012/09/26,TRUE,TRUE,1,,,1,0,0,1
2012/04/29,TRUE,TRUE,2,,,8,3,1,4
2012/07/22,TRUE,FALSE,,0,0.03333333333333,0,0,0,0
2012/05/01,TRUE,FALSE,,1,14,0,0,0,0
"2012/08/07 06:17:39.647",TRUE,FALSE,,1,1,0,0,0,0
"2012/07/18 15:15:19.283",TRUE,FALSE,,1,3,0,0,0,0
2012/07/27,TRUE,FALSE,,1,0.33333333333333,0,0,0,0
2010/09/08,TRUE,FALSE,,1,0.33333333333333,0,0,0,0
"2012/07/21 18:10:57.700",TRUE,FALSE,,1,0.33333333333333,0,0,0,0
The Excel file looks fine to me. Any ideas what could be going wrong?
The Excel file is generated using Apache POI; maybe that's a clue?
Kind regards,
Rianne
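One possible lead, offered as an assumption rather than a confirmed cause: Apache POI can write text cells as inline strings instead of entries in the shared-strings table, and a converter that only reads xl/sharedStrings.xml would then drop exactly the text columns. Since an .xlsx is just a ZIP of XML parts, this is easy to check from the shell (xl/worksheets/sheet2.xml is a guess based on the -s 2 option above):
# list the parts; a missing or tiny xl/sharedStrings.xml despite plenty of text is suspicious
unzip -l testin.xlsx
# check whether the sheet stores its text as inline strings
unzip -p testin.xlsx xl/worksheets/sheet2.xml | grep -c 'inlineStr'
A non-zero count points at inline strings; a converter that handles them (or a newer xlsx2csv) should then recover the text.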