I have a JSON file separating data entries with multiple, different separators. How can I unpack the column with many string values into separate columns using Excel? (This is what it looks like (Screenshot).)
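If a code route is acceptable, here is a minimal pandas sketch that splits the packed column and writes the result back out for Excel. The file name, column name, and the assumed separators ';' and '|' are all placeholders, since the actual separators in the screenshot aren't visible here:

import pandas as pd

# 'data.json' and 'packed' are hypothetical names; the regex character
# class lists the assumed separators - adjust both to match your file
df = pd.read_json('data.json')
parts = df['packed'].astype(str).str.split(r'[;|]', expand=True)

# append the split pieces as new columns and save in an Excel-readable format
df.join(parts.add_prefix('col_')).to_excel('unpacked.xlsx', index=False)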
I am trying to store my PySpark output as CSV, but when I save it, the output does not look the same. I have the output in this form:
When I try to convert this to CSV, the Concat tasks column does not display properly because of the size of the data. Given my requirements, I need to store the data in CSV format. Is there a way around this? (P.S. I also see columns showing nonsensical values, even though the PySpark output shows the correct values.)
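One hedged guess, since the exact output isn't shown: if the Concat tasks column contains embedded commas or newlines, the file will look broken (and neighboring columns will look nonsensical) unless every field is quoted and escaped. A minimal sketch using Spark's CSV writer options (the output path is a placeholder):

# quote every field and escape embedded quotes so a large text column
# with commas/newlines survives the round trip
(df.write
   .option("header", "true")
   .option("quoteAll", "true")
   .option("escape", '"')
   .mode("overwrite")
   .csv("/tmp/output_csv"))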
I have encountered issues when converting CSV files to parquet in PySpark. When multiple files with the same schema were converted, the resulting parquet files did not share a schema, because a string of numbers is sometimes read as float, sometimes as integer, etc. There also seems to be an issue with column order: when dataframes with the same columns, arranged in different orders, are written to parquet, the resulting files cannot be loaded in a single statement.
How can I write dataframes to parquet so that all columns are stored as string type? And how should I handle column order? Should I rearrange the columns into the same order for all dataframes before writing to parquet?
If you want to sort the columns and cast them all to string type, you can do:
from pyspark.sql import functions as F

# select the columns in sorted (alphabetical) order, casting each one to string
out_df = df.select([F.col(c).cast('string') for c in sorted(df.columns)])
out_df.write.parquet(...)
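Not part of the original answer, but related: if you already have parquet files whose schemas or column orders differ, Spark can reconcile them by column name at read time with the mergeSchema option (the paths below are placeholders):

# merge schemas by column name across files, regardless of column order
df = spark.read.option("mergeSchema", "true").parquet("/data/part1", "/data/part2")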
I'm trying to convert an Excel file to a JSON file without losing any data.
I have a column in the Excel file that contains values which are both text and numbers (but both are stored as text).
First, I set the column's format to Text.
Then I tried to solve this problem in two ways:
I tried converting the Excel file to JSON using an online converter, but it didn't work well since the Excel file contains text in a foreign language.
I tried converting the Excel file to CSV, and then the CSV to JSON (also using an online converter), and it worked, but for some reason the numbers (which were stored as text) became numbers again.
If there is a solution that involves code, I'm fine with that too.
Thanks
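Since code is welcome: a minimal pandas sketch (the file names are placeholders). Reading with dtype=str keeps numbers stored as text as strings, and force_ascii=False preserves the foreign-language text in the JSON output:

import pandas as pd

# read every cell as a string so numbers stored as text stay text
df = pd.read_excel('input.xlsx', dtype=str)

# force_ascii=False writes non-ASCII characters as-is instead of \u escapes
df.to_json('output.json', orient='records', force_ascii=False)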
What's the syntax for writing an array for Solr in a CSV file? I need to update a multivalued field, but when I upload the file, all the data ends up in the array as a single element, like this:
multiField:["data1,data2,data3"]
instead of this:
multiField:["data1", "data2" , "data3"]
How can I write this in the CSV file so it is split like this by default?
You can use the split and separator parameters to split a single field into multiple values:
&f.multiField.split=true&f.multiField.separator=,
... should do what you want.
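For example, posting the file through the CSV update handler (the collection name and file name are placeholders; %2C is the URL-encoded comma):

curl 'http://localhost:8983/solr/mycollection/update?commit=true&f.multiField.split=true&f.multiField.separator=%2C' --data-binary @data.csv -H 'Content-Type: text/csv; charset=utf-8'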
I have a CSV file that opens perfectly in Excel 2012. When I try to set up the metadata for this CSV file in Talend, the fields (columns) are not split the same way as Excel splits them. I suspect I am not setting the metadata properly.
The specific issue is that I have a column with string data which may contain commas within the string. For example, suppose I have a CSV file with three columns, ID, Name, and Age, which looks like this:
ID,Name,Age
1,Ralph,34
2,Sue,14
3,"Smith, John", 42
When Excel reads this CSV file, it treats the second element of the third row ("Smith, John") as a single token and places it into a cell by itself.
Talend tries to break this same token into two, since there is a comma within it. Apparently Excel ignores all delimiters within a quoted string, while Talend by default does not.
My question is: how do I get Talend to behave the same as Excel?
If you use the tFileInputDelimited component to read this CSV file, set the field delimiter to "," and, under the component's CSV Options properties, enable the Text Enclosure option and set it to the double-quote character ("). Likewise, if you use metadata, there is an option to define the string/text enclosure; set it to " there to resolve the problem.
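For reference, this quote-aware parsing is the same behavior Excel applies; a small illustration with Python's standard csv module:

import csv, io

sample = 'ID,Name,Age\n1,Ralph,34\n2,Sue,14\n3,"Smith, John", 42\n'

# quotechar='"' tells the parser to ignore delimiters inside quoted fields,
# which is what the Text Enclosure setting does in Talend
for row in csv.reader(io.StringIO(sample), delimiter=',', quotechar='"'):
    print(row)
# the last row parses as ['3', 'Smith, John', ' 42']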