KNIME manually modify node settings - knime

I have a wide table filled with ID numbers (starting with a variable number of zeros) and I want to import it into KNIME but the columns are automatically detected as Integer. I tried to manually modify the settings.xml file corresponding to the import node in order to enforce a String type import without spending my afternoon clicking on each column, every time I get a new file. The entry is now:
<entry key="cell_class" type="xstring" value="org.knime.core.data.def.StringCell"/>
I get an error when re-opening the workflow. So I also modified the MissValuePattern entry to:
<entry key="MissValuePattern" type="xstring" value="?"/>
Still getting an error when re-opening the workflow. I don't see any difference between a string and an integer column so I'm a bit stuck.

Use the Line Reader node to read each line into a single column. Then attach a Cell Splitter node and set the delimiter to a space character (or whatever separates the columns). Select the "as new columns" radio button, and the new columns will have the same type as the original column, i.e., String.
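As a rough illustration of why reading lines as text and splitting them keeps the leading zeros, here is a minimal sketch in plain Python rather than KNIME. The sample rows, the space delimiter, and the helper name split_line are assumptions made for the example.

```python
# Hypothetical sample: two rows of space-separated IDs with leading zeros.
raw_lines = ["00123 0456 7", "00001 0002 03"]


def split_line(line, delimiter=" "):
    """Split one raw text line into cells; every cell stays a string."""
    return line.split(delimiter)


# Splitting text never changes the cell type, so the zeros survive.
rows = [split_line(line) for line in raw_lines]

# Contrast: converting to int, as type auto-detection would, drops the zeros.
as_int = [[int(cell) for cell in row] for row in rows]

print(rows[0])
print(as_int[0])
```

The cells in rows remain strings exactly as they appear in the file, while as_int shows the information that is lost once a column is treated as Integer.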

You can execute arbitrary Java code to create a new column or to replace an existing one using the Java Snippet (simple) or Java Snippet node. For example, you can concatenate the values of the Integer columns Col0, Col1, Col2 as
String myCol = $Col0$ + " " + $Col1$ + " " + $Col2$;
return myCol;
//or return $Col0$ + " " + $Col1$ + " " + $Col2$;
In general, it is a really useful method for creating new parameters of a dataset.

Use the Number To String node. You can select the "always include all columns" option, and it should automatically select all columns every time you import a new file.

Probably it would work better if you were using a Line Reader and split the columns based on your preferred delimiter.

You can specify the column type for each column in the dialog of the File Reader node. To do so, open the dialog and double-click the header of the column you want to change. A small window will open in which you can specify the type of the column. Change it from Integer to String.

Related

Remove last blank row in CSV using Logic App

I have a CSV file stored in SFTP where the last row is a blank, so the data looks like this in text:
a,b,c
d,e,f
,,
How can I use Logic App to remove that final row and then save it in BLOB? I have the following but will need some extra steps before the BLOB creation I think.
Considering the same sample here is my Logic app
In Compose_2 it takes the index of the last empty item. Below is the expression that I used to retrieve the lastIndex.
lastIndexOf(variables('Sample'),'\n')
Then in Compose_3 I'm selecting the one which I wanted
substring(variables('Sample'),0,outputs('Compose_2'))
Here is the Final Result
NOTE:
Make sure you remove the extra '\' that gets attached to '\n' in the code view of Compose_2.
The final Compose_2 then looks like
lastIndexOf(variables('Sample'),'
')
Updated Answer
If the received data comes from a CSV file, you can use the take() expression to retrieve the wanted rows.
Below is the expression in the compose connector
take(outputs('Split_To_Get_Rows'),sub(length(outputs('Split_To_Get_Rows')),1))
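For comparison, both approaches from this answer can be sketched in plain Python. This is only an analogy to the Logic App expressions, assuming the rows are separated by a single '\n' and the file has no trailing newline after the blank row; the variable names are made up for the example.

```python
# The sample from the question: the last row is blank (",,").
sample = "a,b,c\nd,e,f\n,,"

# First approach: cut the text at the last newline
# (mirrors lastIndexOf followed by substring).
trimmed_text = sample[:sample.rfind("\n")]

# Second approach: split into rows and keep all but the last one
# (mirrors take(rows, sub(length(rows), 1))).
rows = sample.split("\n")
kept_rows = rows[:len(rows) - 1]

print(trimmed_text)
print(kept_rows)
```

If the file used '\r\n' line endings or ended with an extra newline, the split and the index would need to be adjusted accordingly.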

Apache NiFi: Creating new column using a condition

I have asked a similar question. Yet I wasn't able to find a solution for my problem through that approach. I have a csv which looks like this:
studentID,regger,age,number
123,west,12,076392367
456,nort,77,098123124
231,west,33,076346325
I want to add a new column and set its values according to the data in the number field. This is the logic:
If the first 4 digits of the value in the number column equal "0763", then the new column (named status) must be set to INSIDE; for any other value it must be set to OUTSIDE.
As mentioned in the logic the output must look like this:
studentID,regger,age,number,status
123,west,12,076392367,INSIDE
456,nort,77,098123124,OUTSIDE
231,west,33,076346325,INSIDE
My Approach
I tried to achieve this by first duplicating the number column into the status column, and then trying to take the first 4 digits and work with them.
I hope you can suggest a NiFi workflow that makes this possible.
I used the UpdateRecord processor twice and got the results that you want.
Input
I started with your input data.
studentID,regger,age,number
123,west,12,076392367
456,nort,77,098123124
231,west,33,076346325
Process
First, set the UpdateRecord processor as follows:
Record Reader: CSVReader
Record Writer: CSVRecordSetWriter
Replacement Value Strategy: Record Path Value
/status: /number
This creates the new column status with the value of the number column.
Second, the first output should go to another UpdateRecord processor with the options
Record Reader: CSVReader
Record Writer: CSVRecordSetWriter
Replacement Value Strategy: Literal Value
/status: ${field.value:substring(0,4):equals('0763'):ifElse(${field.value:replace(${field.value},'INSIDE')},${field.value:replace(${field.value},'OUTSIDE')})}
and this will give you the final results.
Be aware that the number column is not an integer column, so you have to set the Schema Access Strategy option of the CSVReader record reader to Use String Fields From Header.
Output
studentID,regger,age,number,status
123,west,12,076392367,INSIDE
456,nort,77,098123124,OUTSIDE
231,west,33,076346325,INSIDE
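To make the rule itself easy to check outside NiFi, here is a minimal Python sketch of the same logic. It is not a NiFi configuration; the function name add_status and its prefix parameter are hypothetical. Every field is read as a string, which plays the same role as Use String Fields From Header and keeps the leading zero of the number column.

```python
import csv
import io

# Input data from the question.
csv_text = """studentID,regger,age,number
123,west,12,076392367
456,nort,77,098123124
231,west,33,076346325
"""


def add_status(text, prefix="0763"):
    """Append a status column: INSIDE if number starts with prefix, else OUTSIDE."""
    reader = csv.DictReader(io.StringIO(text))  # all values are read as strings
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames + ["status"], lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        row["status"] = "INSIDE" if row["number"][:4] == prefix else "OUTSIDE"
        writer.writerow(row)
    return out.getvalue()


result = add_status(csv_text)
print(result)
```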
You can try the logic below:
SplitText ->
ExtractText Processor ->
RouteOnAttribute (add a condition that checks whether the first four digits are 0763)
-----Match Relation--> ReplaceText(Extracted Attribute from file + "INSIDE") -> PutFile
-----Unmatch Relation--> ReplaceText(Extracted Attribute from file + "OUTSIDE") -> PutFile
Hope this will help you.

Derived column Transformation Editor - I need add number at the end of order number in increment order

I have a CSV file with many lines, each with a different order number.
I need to change them with the SSIS Derived Column Transformation Editor so that I get a transformed output.
I need to write an expression that appends a number to the end of each order number, but the number must be different for each order, so it should increment.
Derived column Name Derived Column Expression Data Type
OrderNumber <add as new column> ?
Derived column Name Derived Column Expression Data Type
OrderNumber <add as new column> OrderNumber+"-"+"1" unicode string
I don't think you can add an incremental number using a Derived Column transformation; you have to use a script component to achieve that.
Simply add a script component, go to the Inputs and Outputs tab, and add an output column of type DT_STR. Inside the script editor, use a script similar to this:
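// Running counter shared across all rows processed by this component.
int intOrder = 1;

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    if (!Row.OrderNumber_IsNull && !String.IsNullOrEmpty(Row.OrderNumber))
    {
        // Append the current counter value, then advance it for the next row.
        Row.outOrderNumber = Row.OrderNumber + "-" + intOrder.ToString();
        intOrder++;
    }
    else
    {
        Row.outOrderNumber_IsNull = true;
    }
}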
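The behaviour of that script can be sketched in plain Python to show what the output column would contain. This is only an illustration of the counter logic, not SSIS code; the function name number_orders and the sample order numbers are made up.

```python
def number_orders(order_numbers):
    """Append an incrementing suffix to each non-empty order number."""
    counter = 1
    result = []
    for order in order_numbers:
        if order:  # skips None and empty strings, like the IsNull/IsNullOrEmpty check
            result.append(f"{order}-{counter}")
            counter += 1
        else:
            result.append(None)  # corresponds to setting outOrderNumber_IsNull
    return result


numbered = number_orders(["A100", "", "B200", None, "C300"])
print(numbered)
```

Note that the counter only advances for rows that actually receive a suffix, so empty order numbers do not leave gaps in the sequence.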
There is no stock method to do what you are trying to achieve. What would rather be easier is to write a script component and have that generate the numbers for you. Once you get the new column out of that script component, it is easy enough to concatenate that with your existing order number.
Just curious, why can't you do this in the database itself? It would be much easier to implement and control, IMO.
Here is a link to generating the numbers: Generating Surrogate keys in SSIS

Write into Excel Destination from SSIS variables

I have 3 SSIS variables namely name, age, gender with initial values set. I want to write these values into excel sheet in one row. Later I will extend this to Array of records.
To do this I have created Excel connection attaching the excel sheet where I want to write.
I added a control flow task, double-clicked it, and then added a Derived Column component to create derived columns for each of the 3 variables above. Inside the Derived Column editor I selected the variables above as new derived columns.
Then I connected an Excel Destination component and mapped the sheet columns to the derived columns. I executed the SSIS package and it succeeded, but the variables are not written to the Excel sheet.
What am I doing wrong?
Again, you need a source. I gave you an "easy" solution. This is probably the best solution to your problem:
This time the source will be a script component (select Source).
Steps after you add Script Component:
Select Source
Go to Inputs and Outputs
Add your Output Columns (Don't forget about data types)
Go back to Script
Add your variables (Gender, Name and Age)
Go into Script
Add the following code
public override void CreateNewOutputRows()
{
    // Emit a single row whose columns are filled from the package variables.
    Output0Buffer.AddRow();
    Output0Buffer.Age = Variables.Age;
    Output0Buffer.Gender = Variables.Gender;
    Output0Buffer.Name = Variables.Name;
}
You need a source. The easiest would be to use a SQL connection.
Use a variable of type string named SQL.
Set SQL = "Select '" + name + "' as name, " + age + " as age, '" + gender + "' as Gender"
Set your source to SQL variable.
Connect this Source to Destination and you should have 1 row with 3 columns
Listing the steps clearly as suggested by @KeithL
Create an SSIS variable selectQueryVariables with string datatype.
Assign variable expression as
"SELECT '"+@[User::name]+"' as Name,'"+@[User::gender]+"' as Gender,"+(DT_WSTR,4 )@[User::age]+" as Age"
Add OLE DB Source component and set data access mode as SQL command from variable and select the variable selectQueryVariables in dropdown. Now the source is ready with 3 columns Name, Age and Gender.
Connect this to the Excel Destination and map the source columns to the destination columns.
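To see what query text such a variable expression would evaluate to, here is a small Python sketch of the same string concatenation. The function name build_select and the sample values are hypothetical, and the sketch assumes the values contain no single quotes, which would otherwise break the generated statement.

```python
def build_select(name, gender, age):
    """Assemble a one-row SELECT from three values, mirroring the concatenation above."""
    # age is numeric, so it is converted to text and left unquoted.
    return (
        "SELECT '" + name + "' as Name,'" + gender + "' as Gender,"
        + str(age) + " as Age"
    )


query = build_select("Alice", "F", 30)
print(query)
```

The resulting statement returns a single row with the three columns, which is what gives the data flow a source to read from.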

Can I use SpreadsheetGear to read from a CSV file without it formatting the cells?

Given a simple CSV file that consists of a string of digit characters and a date in UK format:
"00000000","01/01/2014"
and code to get the used cells:
IWorkbookSet workbookSet = SpreadsheetGear.Factory.GetWorkbookSet();
IWorkbook workbook = workbookSet.Workbooks.Open(@"C:\file.csv");
IRange cells = workbook.Worksheets[0].UsedRange;
when I access cells[0,0].Text it gives it as 0, because it's treating it as numeric and therefore the leading 0s are meaningless. It will do the same for the date. I'm trying to manually construct a DataTable from the cells, but I need the original values in the file.
I tried:
SpreadsheetGear.Advanced.Cells.IValues cells = (SpreadsheetGear.Advanced.Cells.IValues)workbook.Worksheets[0];
var sb = new StringBuilder();
cells[0,0].GetText(sb);
but nothing is appended to the string builder.
How can I get access to the original file values?
SpreadsheetGear does not make available the original values as found in the CSV file (such as "00000000" in your case). You would only be able to access cell data after it has been parsed and processed by SpreadsheetGear (i.e., converting the above to a double value of 0). If you need the CSV's original values, then you'll need to open the file yourself and manually process and parse it.
It sounds like you ultimately want a DataTable, but if you still need to create a workbook file from your CSV data, once you've created a routine to manually open and parse each "cell" in your CSV file, you could enter each value into a spreadsheet as Text, so that it is preserved as it is found in the CSV file. You can go about this in two ways:
1) Set IRange.NumberFormat to "@", which will treat any future input into that IRange as Text. Example:
worksheet.Cells["A1"].NumberFormat = "@";
worksheet.Cells["A1"].Value = "00000000";
2) Prepend your inputted value with a single apostrophe, which indicates that you want the input to be treated as text. Example:
worksheet.Cells["A1"].Value = "'00000000";
If you still need a DataTable at this point, you could use the IRange.GetDataTable(...) method to accomplish this. Because the cell data is stored as Text, your DataTable values should also reflect these same values. Example:
DataTable dt = worksheet.Cells["A1"].GetDataTable(GetDataFlags.None);
(There is a GetDataFlags.FormattedText option, but this isn't really relevant for your case since the cell data is stored as text anyway and so won't be formatted)
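The "open the file yourself and parse it" step is independent of SpreadsheetGear. As a minimal sketch of the idea, shown here in Python with the standard csv module rather than in C#, reading the sample line as text keeps every value exactly as written. The in-memory string stands in for the file and is an assumption of the example.

```python
import csv
import io

# The sample line from the question, held in memory instead of C:\file.csv.
csv_text = '"00000000","01/01/2014"\n'

# csv.reader returns every field as a string and performs no numeric or
# date conversion, so the original characters are preserved.
rows = list(csv.reader(io.StringIO(csv_text)))

print(rows)
```

Values obtained this way could then be written into cells formatted as Text, as described above.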