Text column issues while converting xlsx to csv

I am having trouble converting an xlsx file to csv format. Somehow the contents of the columns that contain text are not copied over.
I tried: python xlsx2csv-0.20/xlsx2csv.py -s 2 -d ';' 'testin.xlsx' 'testout.csv'
The result should look like:
"www.vistaheads.com";"http://www.vistaheads.com/forums/microsoft-public-windows-vista-general/200274-vista-mbr-vs-xp-mbr-4.html";"YahooBossAPIv2";;"eng";"ie";;9/8/2010;TRUE;FALSE;;0;-8.2666666667;0;0;0;0
"www.drpletsch.com";"http://www.drpletsch.com/elos-acne-treatment.html";"Oxyme.Searchv3.0.0";;"eng";;;7/31/2012;TRUE;FALSE;;;;0;0;0;0
"www.charterhouse-aquatics.co.uk";"http://www.charterhouse-aquatics.co.uk/catalog/elos-systemmini-marine-litre-aquarium-black-p-7022.html";"YahooBossAPIv2";;"eng";"us";;7/11/2012;TRUE;FALSE;;1;5.6666666667;0;0;0;0
"www.proz.com";"http://www.proz.com/kudoz/latin_to_english/religion/4794760-concio_melos_tinnulo.html";"YahooBossAPIv2";;"eng";"in";;5/7/2012;TRUE;FALSE;;1;3;0;0;0;0
"schoee.blogspot.co.uk";"http://schoee.blogspot.co.uk/2010/08/review-body-shop-vitamin-c-facial.html";"YahooBossAPIv2";;"eng";;;8/1/2010;TRUE;FALSE;;1;1;0;0;0;0
But instead I get:
;;;;;;;09-08-10;TRUE;FALSE;;0.0;-8.266666666666666;0.0;0.0;0.0;0.0;
;;;;;;;07-31-12;TRUE;FALSE;;;;0.0;0.0;0.0;0.0;
;;;;;;;07-11-12;TRUE;FALSE;;1.0;5.666666666666667;0.0;0.0;0.0;0.0;
;;;;;;;05-07-12;TRUE;FALSE;;1.0;3.0;0.0;0.0;0.0;0.0;
;;;;;;;08-01-10;TRUE;FALSE;;1.0;1.0;0.0;0.0;0.0;0.0;
;;;;;;;09-08-10;TRUE;FALSE;;0.0;0.033333333333333354;0.0;0.0;0.0;0.0;
;;;;;;;07-03-12;TRUE;FALSE;;1.0;2.0;0.0;0.0;0.0;0.0;
;;;;;;;10-18-11;TRUE;FALSE;;1.0;4.666666666666667;0.0;0.0;0.0;0.0;
I also tried ssconvert, but I get similar results there, i.e.:
ssconvert -S 'testin.xlsx' testout2.csv
Here too, the textual contents have somehow vanished:
2010/09/08,TRUE,FALSE,,0,-8.26666666666667,0,0,0,0
"2012/07/31 09:58:39.823",TRUE,FALSE,,,,0,0,0,0
"2012/07/11 13:35:09.220",TRUE,FALSE,,1,5.66666666666667,0,0,0,0
2012/05/07,TRUE,FALSE,,1,3,0,0,0,0
2010/08/01,TRUE,FALSE,,1,1,0,0,0,0
2010/09/08,TRUE,FALSE,,0,0.03333333333333,0,0,0,0
"2012/07/03 22:24:03.467",TRUE,FALSE,,1,2,0,0,0,0
2011/10/18,TRUE,FALSE,,1,4.66666666666667,0,0,0,0
"2012/07/22 02:10:58.313",TRUE,FALSE,,1,2,0,0,0,0
"2012/08/02 17:01:39.637",TRUE,FALSE,,1,1,0,0,0,0
2010/06/05,TRUE,FALSE,,1,4,0,0,0,0
"2012/07/25 16:11:47.843",TRUE,FALSE,,1,2,0,0,0,0
2012/09/26,TRUE,TRUE,1,,,1,0,0,1
2012/04/29,TRUE,TRUE,2,,,8,3,1,4
2012/07/22,TRUE,FALSE,,0,0.03333333333333,0,0,0,0
2012/05/01,TRUE,FALSE,,1,14,0,0,0,0
"2012/08/07 06:17:39.647",TRUE,FALSE,,1,1,0,0,0,0
"2012/07/18 15:15:19.283",TRUE,FALSE,,1,3,0,0,0,0
2012/07/27,TRUE,FALSE,,1,0.33333333333333,0,0,0,0
2010/09/08,TRUE,FALSE,,1,0.33333333333333,0,0,0,0
"2012/07/21 18:10:57.700",TRUE,FALSE,,1,0.33333333333333,0,0,0,0
The Excel file looks fine to me. Any ideas what could be going wrong?
The Excel file is generated using Apache POI; maybe that's a clue?
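In case it helps to narrow things down: if POI wrote the text cells as inline strings rather than shared strings, some converters may skip them. As a fallback I could dump the sheet myself with openpyxl (which reads both) and the csv module. A minimal sketch, matching the -s 2 sheet selection above (date and number formatting will differ from Excel's display):

import csv
from openpyxl import load_workbook

# Read the second worksheet (same sheet as "-s 2" above), with cell
# values rather than formulas.
wb = load_workbook("testin.xlsx", read_only=True, data_only=True)
ws = wb.worksheets[1]

with open("testout.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, delimiter=";")
    for row in ws.iter_rows(values_only=True):
        writer.writerow(["" if v is None else v for v in row])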
Kind regards,
Rianne

Related

How do I preserve the leading 0 of a number using Unoconv when converting from a .csv file to a .xls file?

I have a 3-column csv file. The 2nd column contains numbers with a leading zero. For example:
044934343
I need to convert a .csv file into a .xls and to do that I'm using the command line tool called 'unoconv'.
It's converting as expected; however, when I load up the .xls in Excel, instead of showing '044934343' the cell shows '44934343' (the leading 0 has been removed).
I have tried surrounding the number in the .csv file with a single quote and a double quote, but the leading 0 is still removed after conversion.
Is there a way to tell unoconv that a particular column should be of a TEXT type? I've tried to read the man page of unoconv, but the options are a little confusing.
Any help would be greatly appreciated.
Perhaps I came too late to the scene, but in case someone is looking for an answer to a similar question, this is how to do it:
unoconv -i FilterOptions=44,34,76,1,1/1/2/2/3/1 --format xls <csvFileName>
The key here is the "1/1/2/2/3/1" part, which is a list of column-number/format-code pairs: it tells unoconv that the second column's type should be "TEXT" (format code 2), leaving the first and third as "Standard" (format code 1).
You can find more info here: https://wiki.openoffice.org/wiki/Documentation/DevGuide/Spreadsheets/Filter_Options#Token_7.2C_csv_import
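As far as I can tell from that page, the leading tokens in the command are the field separator (44 = comma), the text delimiter (34 = double quote), the character set (76 = UTF-8) and the first line to import (1). Building the column-format token by hand gets fiddly for wide files, so here is a small hypothetical helper, just to illustrate the encoding:

def column_format_token(formats):
    """Build the OpenOffice/unoconv column-format token.

    formats maps a 1-based column number to a format code
    (1 = Standard, 2 = Text).
    """
    return "/".join(f"{col}/{fmt}" for col, fmt in sorted(formats.items()))

# Second column as Text, first and third left as Standard:
token = column_format_token({1: 1, 2: 2, 3: 1})  # -> "1/1/2/2/3/1"
print(f"unoconv -i FilterOptions=44,34,76,1,{token} --format xls file.csv")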
BTW this is my first post here...

convert CSV to JSON using Python

I need to convert a CSV file to a JSON file using Python. I used this,
variable = csv.DictReader(open("file.csv"))
It throws this ERROR
csv.Error: line contains NULL byte
I checked the CSV file in Excel and it shows no NUL characters, but when I printed the data from the CSV file using Python, there was data like SOHNULNULHG (here the last 2 letters, HG, are the data displayed in Excel). I need to remove these control characters from the CSV file while converting to JSON (i.e. I need only the HG from the string above).
I just ran into the same issue. I converted my csv file to CSV UTF-8 and ran it again without any errors. That seemed to fix the control-character issue.
To convert the csv type, I just opened my file in Excel, did Save As, then selected CSV UTF-8 (Comma delimited) (*.csv) as the Save as type.
Hope that helps.
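If re-saving by hand isn't an option, the same fix can be done in code: NUL bytes interleaved with the visible characters often mean the file is really UTF-16, so decoding with the right codec makes them disappear. A minimal sketch, assuming a UTF-16 input with a header row (file names are placeholders):

import csv
import json

# NUL bytes between visible characters usually indicate UTF-16 text;
# decoding with the correct codec removes them.
with open("file.csv", encoding="utf-16") as f:
    rows = list(csv.DictReader(f))

with open("file.json", "w", encoding="utf-8") as out:
    json.dump(rows, out, indent=2)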

split big json files into small pieces without breaking the format

I'm using spark.read() to read a big json file on Databricks, and it failed due to "spark driver has stopped unexpectedly and is restarting" after a long time of running. I assumed it is because the file is too big, so I decided to split it, using the command:
split -b 100m -a 1 test.json
This actually split my file into small pieces and I can now read them on Databricks. But then I found that what I got is a set of null values. I think that is because I split the file only by size, and some of the resulting files are not in valid json format. For example, I might get something like this at the end of a file:
{"id":aefae3,......
Then it can't be read by spark.read.format("json"). So is there any way I can separate the json file into small pieces without breaking the json format?
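One hedged suggestion: if the file is line-delimited JSON (one object per line, which is the layout Spark's json reader handles best for large inputs), then splitting on whole lines instead of byte counts, the same idea as split -l, keeps every record intact. A minimal sketch, with the chunk size as an assumed parameter:

# Split a line-delimited JSON file into chunks of whole records,
# so no record is ever cut in half (assumes one JSON object per line).
CHUNK = 100_000  # records per output file (tune to taste)

with open("test.json") as src:
    out, part = None, 0
    for i, line in enumerate(src):
        if i % CHUNK == 0:
            if out:
                out.close()
            out = open(f"test_part{part}.json", "w")
            part += 1
        out.write(line)
    if out:
        out.close()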

Search and Replace Text in CSV file using Python

I just started with Python 3.4.2 and am trying to find and replace text in a csv file.
In detail, the Input.csv file contains lines like these:
0,0,0,13,.\New_Path-1.1.12\Impl\Appli\Library\Module_RM\Code\src\Exception.cpp
0,0,0,98,.\Old_Path-1.1.12\Impl\Appli\Library\Prof_bus\Code\src\Wrapper.cpp
0,0,0,26,.\New_Path-1.1.12\Impl\Support\Custom\Vital\Code\src\Interface.cpp
0,0,0,114,.\Old_Path-1.1.12\Impl\Support\Custom\Cust\Code\src\Config.cpp
I keep the strings to be searched for in another file named List.csv:
Module_RM
Prof_bus
Vital
Cust
Now I need to go through each line of Input.csv and replace the last column with the matched string.
So my end result should look like this:
0,0,0,13,Module_RM
0,0,0,98,Prof_bus
0,0,0,26,Vital
0,0,0,114,Cust
I read each line of the input file as a list, so the text I need to replace comes in line[4]. I read each module name from the List.csv file and check whether there is any match for it in line[4], but I am not able to make that if condition true. Please let me know if it is not a proper search.
import csv
import re
with open("D:\\My_Python\\New_Python_Test\\Input.csv") as source, open("D:\\My_Python\\New_Python_Test\\List.csv") as module_names, open("D:\\My_Python\\New_Python_Test\\Final_File.csv","w",newline="") as result:
    reader=csv.reader(source)
    module=csv.reader(module_names)
    writer=csv.writer(result)
    #lines=source.readlines()
    for line in reader:
        for mod in module_names:
            if any([mod in s for s in line]):
                line.replace(reader[4],mod)
                print ("YES")
                writer.writerow("OUT")
            print (mod)
        module_names.seek(0)
    lines=reader
Please guide me to complete this task.
Thanks for your support!
At last I succeeded in solving this problem!
The code below works well:
import csv
with open("D:\\My_Python\\New_Python_Test\\Input.csv") as source, open("D:\\My_Python\\New_Python_Test\\List.csv") as module_names, open("D:\\My_Python\\New_Python_Test\\Final_File.csv","w",newline="") as result:
    reader=csv.reader(source)
    module=csv.reader(module_names)
    writer=csv.writer(result)
    flag=False
    for row in reader:
        i=row[4]                      # the path column to be replaced
        for s in module_names:
            k=s.strip()
            if i.find(k)!=-1 and flag==False:
                row[4]=k              # replace the path with the matched module name
                writer.writerow(row)
                flag=True             # only take the first match per row
        module_names.seek(0)          # rewind List.csv for the next row
        flag=False
Thanks to everyone who tried to solve it! If you have any better coding practices, please do share!
Good luck!
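Since better practices were invited: here is a slightly more idiomatic sketch of the same logic (same file paths assumed). It reads the module names once up front instead of rewinding List.csv for every row, and, unlike the version above, it also writes rows with no match through unchanged:

import csv

# Read the search strings once instead of re-scanning the file per row.
with open("D:\\My_Python\\New_Python_Test\\List.csv") as module_names:
    modules = [line.strip() for line in module_names if line.strip()]

with open("D:\\My_Python\\New_Python_Test\\Input.csv") as source, \
     open("D:\\My_Python\\New_Python_Test\\Final_File.csv", "w", newline="") as result:
    writer = csv.writer(result)
    for row in csv.reader(source):
        # Replace the path column with the first module name it contains.
        for module in modules:
            if module in row[4]:
                row[4] = module
                break
        writer.writerow(row)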

Change .xls to .csv without losing columns

I have a folder with around 400 .txt files that I need to convert to .csv. When I batch rename them to .csv, all the columns get smushed together into one. The same thing happens when I convert to .xls and then to .csv, even though the columns are fine in .xls. If I open an .xls file and Save As to .csv, it's fine, but that would require opening all 400 files.
I am working with sed from the Mac terminal. After navigating to the folder that contains the files, here is some code that did not work:
for file in *.csv; do sed 's/[[:blank:]]+/,/g'
for file in *.csv; do sed -e "s/ /,/g"
for file in *.csv; do s/[[:space:]]/,/g
for file in *.csv; do sed 's/[[:space:]]{1,}/,/g'
Any advice on how to restore the column structure to the csv files would be much appreciated. And it's probably already apparent but I'm a coding newb so please go easy. Thanks!
Edit: here is an example of how the xls columns look, and how they should look in csv format:
Dotsc.exe 2/12/15 1:17 PM 0 Nothing 1 Practice
Everything that is separated by spaces here (except the space between the 7 and PM) is in a separate column in the file. Here is what it looks like when I batch rename the file to .csv:
Dotsc.exe 2/12/15 1:17 PM 0 Nothing 1 Practice
The columns have now turned into spaces, and all the data is in one column. Hope that clarifies things.
I don't think what you are trying to do is possible in batch alone. I suggest you use a library in Java.
Take a look here: http://poi.apache.org/spreadsheet/
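For what it's worth, if the original .txt files are tab-delimited (likely, given that the columns survive in .xls), plain Python can also do the batch conversion without opening anything in Excel. A minimal sketch, assuming tab delimiters and that the files sit in the current folder:

import csv
import glob

# Convert every tab-delimited .txt file in the folder to a real .csv,
# quoting as needed so embedded spaces (like "1:17 PM") stay in one column.
for path in glob.glob("*.txt"):
    with open(path, newline="") as src, \
         open(path[:-4] + ".csv", "w", newline="") as dst:
        csv.writer(dst).writerows(csv.reader(src, delimiter="\t"))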