How to Get Data from a CSV File and Send It to Excel Using Pentaho?

I have a tabular csv file that has seven columns and contains the following data:
ID,Gender,PatientPrefix,PatientFirstName,PatientLastName,PatientSuffix,PatientPrefName
2 ,M ,Mr ,Lawrence ,Harry , ,Larry
I am new to Pentaho and I want to design a transformation that moves the data (the values of the 7 columns) to an empty Excel sheet. The Excel sheet has different column names but should carry the same data, as shown:
prefix_name,first_name,middle_name,last_name,maiden_name,suffix_name,Gender,ID
I tried to design a transformation using the following series of steps, but it gives me errors at the end that I could not interpret.
What is the proper design to move the data from the csv file to the excel sheet in this case? Any ideas to solve this problem?

As #Brian.D.Myers mentioned in the comments, you can use the Select values step. Here is how to do it, step by step:
Select all the fields in the CSV file input step.
Configure the Select values step to rename and reorder the fields to match the target sheet (a pandas sketch of the same mapping follows below).
In the Content tab of the Excel writer step, click the Get fields button and fill in the fields. Alternatively, you can use the Excel output step.
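For reference, what the Select values step does here is just a field-name mapping. Here is a minimal pandas sketch of the same logic (the source-to-target mapping is my assumption from the column names in the question; middle_name and maiden_name have no source field, and PatientPrefName has no obvious target, so it is dropped):

import pandas as pd

# Assumed mapping from the CSV field names to the target Excel headers
rename_map = {
    'PatientPrefix': 'prefix_name',
    'PatientFirstName': 'first_name',
    'PatientLastName': 'last_name',
    'PatientSuffix': 'suffix_name',
}

# skipinitialspace handles the stray spaces after the commas in the sample data
df = pd.read_csv('patients.csv', skipinitialspace=True)
df = df.rename(columns=rename_map)

# middle_name and maiden_name are created empty; PatientPrefName is dropped
target_cols = ['prefix_name', 'first_name', 'middle_name', 'last_name',
               'maiden_name', 'suffix_name', 'Gender', 'ID']
df = df.reindex(columns=target_cols)
df.to_excel('patients.xlsx', index=False)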

Related

Importing specific columns from a CSV into Excel

I am trying to do what the title says and also do it for new records. I cannot link the CSV file because it exceeds the 255 limit. So I am attempting to split up the table.
I have the table below in Access:

DateOfTest | Time | PromptTime | TestSequence | PATResults | Logs | Serial Number
1          | 2    | 3          | 4            | 5          | 6    | 7
Obviously, where the numbers are, I want the data from the CSV to be inserted.
I have created a form with a button so I can run some VBA, but I cannot find the right information online for my case, and as I am new to VBA it is all a bit confusing.
I have attempted some random code, but I was just spraying and praying at that point.
I am not sure I understood your question. In the import tool you can choose columns, but if you want to do it with a script, I would suggest a pre-processing phase with plain Python and pandas: read the CSV file, remove any unwanted columns, and save the result as an Excel file.
Something like this:
import pandas as pd

# Read the CSV, drop the unwanted column, and write the rest to Excel
# (writing .xlsx requires the openpyxl package)
df = pd.read_csv('csvfile.csv')
df.drop('column_name', inplace=True, axis=1)
df.to_excel('filename.xlsx', index=False, header=True)
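If you only need certain columns in the first place, read_csv can also select them up front rather than dropping them afterwards (the column names here come from the table in the question; the file names are placeholders):

df = pd.read_csv('csvfile.csv', usecols=['DateOfTest', 'Time', 'PromptTime'])
df.to_excel('filename.xlsx', index=False, header=True)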

When creating a Copy Activity in an Azure Data Factory pipeline, how do I map a CSV sheet with 5 columns to a CSV sheet with 20 columns?

So I have an input CSV sheet that I want to copy into an output CSV sheet. The output CSV sheet has all the columns in the input sheet, plus a bunch of other columns. (I will be copying data into those from other input sheets later.)
When I run the pipeline containing my Copy Activity, the only columns present in the new output sheet are the 5 columns from the input sheet, I assume because those are the only ones in the mapping. However, I've also tried creating 15 "Additional Columns" in the "source" section of the Copy Activity --- just trying out things like "test", \"test\", test, #test, #pipeline().DataFactory, $$FILEPATH, etc. --- but when I debug the pipeline and go back to my container and look at the output sheet, still only the 5 columns from the input sheet are present there!
How do I get the output sheet to contain columns that are not present in the input sheet? Do I need to create an ARM template?
I am doing this entirely via the Azure Portal, btw.
This will be much easier to design in ADF's Data Flows instead, by creating Derived Columns to append the new columns to your output sink schema.
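If it helps to see the target shape, here is the same idea as a pandas sketch (the file names and the extra column names are placeholders): read the 5-column source, append the missing columns empty, and write the 20-column output.

import pandas as pd

df = pd.read_csv('input.csv')  # the 5-column source sheet

# The 15 columns that exist only in the output schema; they start empty
# and can be filled from the other input sheets later
extra_cols = [f'extra_{i}' for i in range(1, 16)]
df = df.reindex(columns=list(df.columns) + extra_cols)

df.to_csv('output.csv', index=False)  # the 20-column output sheet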
This works fine on my side. Is there any difference on yours?

Paste CSV or tab-delimited data into Excel with NO formatting

I'm pasting tab-delimited data from Notepad++ into Excel (about 50k rows and 3 columns). No matter how many different ways I try it, Excel wants to merge everything from a cell containing one " through the next instance of " into a single cell.
For Example, if my data looked like this:
"Apple 1.0 Store
Banana 1.3 Store
"Cherry" 2.5 Garden
Watermelon 4.0 Field
The Excel file looks like this:
Apple1.0StoreBanana1.3Store
Cherry 2.5GardenWatermelon4.0Field
One way to get around this is to open the file as a CSV in Excel; however, this leads to Excel reformatting the number values into simplified ones using Excel's "General" format, so the data would look like the following:
"Apple 1 Store
Banana 1.3 Store
"Cherry" 2.5 Garden
Watermelon 4 Field
The data I'm getting comes from SQL Server Management Studio, so my options for file formats are:
.CSV
.Txt (Tab-delimited)
Copy Pasting from Query results
The solution I'm looking for is to have the data represented in Excel with no excel processing taking place on the quotations, numbers or any other cell contents.
Don't open the file directly in Excel. Instead, import it and control the data types and file layout (a scripted equivalent follows below).
Open a new Excel document.
Select the Data menu.
Select From Text in the Get External Data section.
Select the file to import.
On step 1 of the import wizard, select Delimited.
Click Next.
Select the Tab checkbox and change the Text qualifier to {none}.
Click Next.
Set the column data types to General, Text, Text.
Click Finish.
Excel auto-imports the data as best it can when you open a file directly, and you lose flexibility and control when that happens. It's better to import and control the settings yourself to get the fine adjustments you're looking for.
By treating the numbers as text, the trailing zeros don't get messed up.
By setting the text qualifier to none, the quotes don't get messed up.
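If you need to do this often, the same settings can be scripted; here is a minimal pandas sketch (assuming a tab-delimited file named data.txt with no header row):

import csv
import pandas as pd

# quoting=csv.QUOTE_NONE is the script equivalent of Text qualifier = {none};
# dtype=str keeps values like 1.0 from being simplified to 1
df = pd.read_csv('data.txt', sep='\t', quoting=csv.QUOTE_NONE,
                 dtype=str, header=None)
df.to_excel('data.xlsx', index=False, header=False)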
Have you tried opening it via Text Import?
Go to the Data tab > From Text (third from the left by default).
You will get a window similar to Text To Columns.
Select the correct delimiter, remember to remove the quote sign from Text qualifier, and mark all columns as text to avoid Excel's autoformatting.
EXCEL TIP: SAVING TIME WHEN IMPORTING CSV FILES INTO EXCEL: If you pre-set your Text-To-Columns delimiter parameters correctly in Excel (e.g. specify tabs as the delimiter) and then copy and paste the CSV data, Excel will import the paste directly into the correct columns without you having to go through the Text-To-Columns rigmarole. This was particularly time-saving when I had to import hundreds of bank statements into Excel.
However, if your Text-To-Columns delimiters are pre-specified incorrectly as, say, comma, and you are importing tab-delimited files, then Excel will dump all the data into one column, and you will have to go through the time-consuming process of converting Text-To-Columns for each statement.
Excel looks at the existing Text-To-Columns delimiters to see if it can use those to make your life easier when pasting data.
Hope that tip helps (it saved me several hours).

PowerBI - Excel datasource contains JSON in column

I have an Excel sheet that has three columns: A, B, and C.
A and B contain regular text, a first name and last name, if you will. The third column, C, contains JSON data.
Is there a way I can read this file into Power BI and have it automatically parse the JSON data out into additional columns? In Power BI Desktop I can use an Excel sheet as the data source, and it loads my data into the client, but it naturally treats column C as just text. I've had a look at the Advanced Editor, and I'm thinking I might have to include something there to help parse that out.
Any ideas?
I figured it out. In the Query Editor, right-click on the column that contains the JSON, go to Transform, and select JSON. It will parse out the data, allowing you to add the fields in as additional columns.
Extremely handy!
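Outside Power BI, the same parse is a few lines of pandas if you ever need it in a script (a sketch, assuming the sheet's headers are A, B, C and that column C holds one JSON object per row):

import json
import pandas as pd

df = pd.read_excel('source.xlsx')  # columns A, B, C; C contains JSON text

# Parse each JSON string, then spread its keys into their own columns
parsed = pd.json_normalize(df['C'].map(json.loads).tolist())
df = pd.concat([df.drop(columns=['C']), parsed], axis=1)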

Export 2 CSV files to an Excel file using SSIS

I'm trying to read the data from 2 CSV files and export it into a new Excel file, but I'm not able to export the data in the Excel destination. While mapping the columns, there are 4 columns in the input columns, but only 1 column, F1, shows in the available output columns. Please let me know how to resolve this issue.
If I understand the question correctly, you are unable to map columns to a 'new' Excel file.
If this is the case, the metadata for the mappings is probably the issue.
Try first creating a new Excel file with the column headings and column types you want, then map to this.
Alternatively, right-click on the Excel destination, use the 'Show Advanced Editor' option, and adjust the columns under 'External Columns' in the 'Input and Output Properties' tab.
You may then need to set the ValidateExternalMetadata option to False for the Excel destination component in order to allow creation of new files from scratch.
Open an Excel workbook and, in the first row, enter the column headings. Columns A, B, C, and D should have the same names as your source columns. After doing so, save the Excel file and close the workbook. Go to SSIS and open the Excel destination mappings. You should be able to map them now.
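If the destination metadata keeps fighting you, the two-CSVs-into-one-workbook move is also a short pandas script; here is a sketch with placeholder file names, one sheet per CSV (writing .xlsx requires openpyxl):

import pandas as pd

df1 = pd.read_csv('first.csv')
df2 = pd.read_csv('second.csv')

# Write each CSV to its own sheet in a single new workbook
with pd.ExcelWriter('combined.xlsx') as writer:
    df1.to_excel(writer, sheet_name='first', index=False)
    df2.to_excel(writer, sheet_name='second', index=False)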