How to write on the second sheet of a CSV file using TDI/SDI?

I want to write some data to the second sheet of a CSV file using the FileConnector in IBM TDI/SDI.
The first sheet of the same file has data which should not be overwritten.
Is it possible to do so?
Any leads will be appreciated! Thank you

CSV files do not have 'sheets'.
A CSV file is plain text holding tabular data with a single structure for the whole file, so it can only ever represent a single table.
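If you need multiple sheets, the usual options are to write each "sheet" to its own CSV file, or to switch to an Excel workbook, which does support sheets. As a sketch of the latter (not TDI/SDI-specific; the file and sheet names are assumptions), here is how a second sheet can be added to an existing .xlsx workbook in Python with openpyxl, leaving the first sheet untouched:

    # a minimal sketch, assuming an existing workbook named data.xlsx
    # whose first sheet must be preserved
    from openpyxl import load_workbook

    wb = load_workbook('data.xlsx')      # loads all existing sheets intact
    ws = wb.create_sheet('SecondSheet')  # appended after the existing sheets
    ws.append(['col1', 'col2'])          # header row for the new sheet
    ws.append(['value1', 'value2'])      # one data row
    wb.save('data.xlsx')                 # the first sheet's data is not overwritten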

Related

How to add a row in a CSV file in Pentaho Data Integration

I need to add a row of data to a CSV file using Pentaho Data Integration.
I've tried with this transformation:
This is my CSV file input configuration
and this is the CSV file output configuration (with the "append" check activated ...)
My constant definition
and this is my CSV file sample
I'd like to have this
Any suggestion will be appreciated!
You can use the Data grid step to create your constant data and the Append streams step to merge the two streams into one in your desired order (the data types in the two streams must match and be in the same order), and then write the data to a CSV file. If you don't need a header in the CSV file, you can uncheck the "Header" option in the Content tab.
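Outside Pentaho, the effect of appending a constant row to an existing CSV can be sketched in a few lines of Python; this is just an illustration (the file name and row values are assumptions), not the PDI implementation:

    import csv

    # append one constant row to an existing CSV, leaving the existing rows untouched
    with open('sample.csv', 'a', newline='') as f:
        csv.writer(f).writerow(['constant1', 'constant2', 'constant3'])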

How to save a dataframe as CSV in PySpark

I am trying to save a dataframe to HDFS.
It gets saved as part-0000 and into multiple parts.
I want to save it as an Excel sheet or as just one part file.
How can we achieve this?
Code used so far:
df1.write.csv('/user/gtree/tree.csv')
Your dataframe is being saved based on its partitions (multiple partitions = multiple files). You can coalesce your partitions down to 1 so that only one file is written.
Link: https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame.coalesce
df1.coalesce(1).write.csv('/user/gtree/tree.csv')
You can also use .repartition(1) to set the number of partitions to 1:
df.repartition(1).write.csv(filePath)
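For completeness, a minimal sketch of the single-file pattern with a header row (the paths are assumptions; write.csv accepts header and mode options in PySpark):

    # a minimal sketch, assuming an existing SparkSession named spark
    df1 = spark.read.csv('/user/gtree/input.csv', header=True)
    # coalesce(1) narrows to a single partition, so a single part file is written
    df1.coalesce(1).write.csv('/user/gtree/tree.csv', header=True, mode='overwrite')

Note that Spark still writes a directory named tree.csv containing a single part file; if you need one flat file, move or rename that part file afterwards.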

Missing rows when exporting more than 1 million records to a CSV file via SSIS

Task: I need to export 1.1 million records to a CSV file.
I loaded them via an SSIS Data Flow.
As you can see, 1,100,800 rows are loaded from a table (the source) to the flat-file location, which is a CSV file.
My Flat File destination filename is Test.csv.
Now when I open the CSV file I get the error
"file not loaded completely"
When I look at the records at the very end of my CSV file (sorry, I cannot attach the CSV file due to data integrity), I only see records up to row 1,048,578, but I loaded 1,100,800 rows, so some rows are missing, and I cannot add them manually either: at the end of the CSV it does not let me type into the next row.
Any idea why?
As a workaround I loaded the data into separate CSV files: 1 million rows in one CSV and the rest in another.
But I really want to know why it is doing this.
Thank you in advance for looking at this.
It's Excel's fault. Excel only supports 1,048,576 rows.
https://support.office.com/en-us/article/excel-specifications-and-limits-1672b34d-7043-467e-8e27-269d656771c3
The error you're getting is because you're trying to open a .csv with more rows than Excel accepts. The rows are still in the file; Excel just can't display them. Try opening the file in a different app, like Notepad++.
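To confirm that nothing was actually lost, the row count can be checked without Excel. A minimal sketch in Python (the filename Test.csv comes from the question):

    import csv

    # count the rows of a large CSV without opening it in Excel;
    # csv.reader handles quoted fields that contain embedded newlines
    with open('Test.csv', newline='', encoding='utf-8') as f:
        row_count = sum(1 for _ in csv.reader(f))
    print(row_count)  # includes the header row, if the export wrote one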

Need help validating 50 fields in JSON

I have a JSON file with almost 50 fields in it, and I need to validate each one of them through automation. I also have the test data for all 50 fields in an Excel sheet.
The problem I am stuck on is that I cannot place all 50 fields' data in the Excel cells, since it would make my sheet look bulky. Any suggestions on how I can validate this?
Can you use Python/Java/C# etc. to read the Excel sheet directly (or export the Excel sheet to a CSV file) and then write code to validate the JSON against the data you read? You don't have to put the validation result back into the Excel sheet, do you?
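A minimal sketch of that idea in Python, assuming the sheet has been exported to expected.csv with two hypothetical columns, field and expected_value (the file names and layout are assumptions):

    import csv
    import json

    # the JSON document under test (file name is an assumption)
    with open('data.json') as f:
        actual = json.load(f)

    # one row per field to check: field,expected_value (hypothetical layout)
    with open('expected.csv', newline='') as f:
        for row in csv.DictReader(f):
            field, expected = row['field'], row['expected_value']
            got = actual.get(field)
            assert str(got) == expected, f"{field}: expected {expected!r}, got {got!r}"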

How to Get Data from a CSV File and Send It to Excel Using Pentaho?

I have a tabular CSV file that has seven columns and contains the following data:
ID,Gender,PatientPrefix,PatientFirstName,PatientLastName,PatientSuffix,PatientPrefName
2 ,M ,Mr ,Lawrence ,Harry , ,Larry
I am new to Pentaho and I want to design a transformation that moves the data (the values of the 7 columns) to an empty Excel sheet. The Excel sheet has different column names, but should carry the same data, as shown:
prefix_name,first_name,middle_name,last_name,maiden_name,suffix_name,Gender,ID
I tried to design a transformation using the following series of steps, but it gives me errors at the end that I could not interpret.
What is the proper design for moving the data from the CSV file to the Excel sheet in this case? Any ideas to solve this problem?
As #Brian.D.Myers mentioned in the comment, you can use the Select values step. Here is a step-by-step explanation:
1. Select all the fields in the CSV file input step.
2. Configure the Select values step as follows.
3. In the Content tab of the Excel writer step, click the Get fields button and fill in the fields. Alternatively, you can use the Excel output step as well.
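For comparison, the same move can be sketched outside Pentaho with pandas (a sketch only; the file names are assumptions, and the mapping of PatientPrefName is left out because the question does not say which target column it belongs to):

    import pandas as pd

    df = pd.read_csv('patients.csv')  # the seven-column file from the question
    df = df.rename(columns={
        'PatientPrefix': 'prefix_name',
        'PatientFirstName': 'first_name',
        'PatientLastName': 'last_name',
        'PatientSuffix': 'suffix_name',
    })
    # middle_name and maiden_name have no source column, so they start empty
    df['middle_name'] = ''
    df['maiden_name'] = ''
    df = df[['prefix_name', 'first_name', 'middle_name', 'last_name',
             'maiden_name', 'suffix_name', 'Gender', 'ID']]
    df.to_excel('patients.xlsx', index=False)  # requires openpyxl to be installed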