How can I write a condition in SSIS to verify that a string column doesn't contain numbers?

I have an Excel source feeding a database, but before I transfer the data in the product-name column I have to verify that each value is a string without any numbers in it (example: "lion" is correct but "lion124" is wrong), using a Conditional Split.
After the verification, I have to write a message to an Excel file telling the user that the value they entered is not correct; if the value is correct, I will send the row to the database.
How can I check the column, and how can I write the rejected rows to an Excel file?

I would run it through a Script transformation and use C#.
Add a boolean column to your data and set it like this (requires using System.Linq;):
Row.NewColForIntTest = Row.YourStringColumn.Any(char.IsDigit);
Then use a Conditional Split on the new column to route valid and invalid rows.
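For context, a minimal sketch of the whole Script Component method; NewColForIntTest and YourStringColumn are placeholder names, so substitute your own output and input columns:

using System.Linq;

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Flag the row when any character in the value is a digit:
    // "lion" -> false (valid), "lion124" -> true (invalid).
    Row.NewColForIntTest = Row.YourStringColumn.Any(char.IsDigit);
}

In the Conditional Split, route rows where NewColForIntTest == TRUE to an Excel destination holding the rejected values, and the rest to your database destination.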

Related

Advanced mapping of JSON in Azure Data Factory - some guidance requested

I'm trying to map a JSON document (sensor data) into a more meaningful representation using Mapping Data Flows. However, I'm having a hard time getting this to work and would really appreciate some insight/recommendations on how to solve the following:
The input is
What I would like to end up with is the following:
Any pointers as to how this can be implemented are more than welcome.
This can be accomplished using the Copy activity followed by the split() function in a Derived Column transformation in Azure Data Factory.
Use the Copy activity to read the JSON file as the source and, in the sink, use a SQL database to store the data as a table. In the Mapping tab, import the schema and map the JSON records to the corresponding column names. Refer to this third-party tutorial for guidance - https://sqlkover.com/dynamically-map-json-to-sql-in-azure-data-factory/
Finally, use the Data Flow activity and choose as its source the SQL table you used as the sink above.
Select the Derived Column transformation.
Use the split() function.
Add a new column to hold the split values.
Use split(<column_name_to_split>, '_') to split the column on the _ delimiter, changing <column_name_to_split> to the name of the column you want to split.
Preview the data to check the result.
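As a concrete sketch, assuming a hypothetical column named SensorKey holding underscore-delimited values such as device_temp_01, the Derived Column entries might look like this (array indexes in the Data Flow expression language are 1-based):

Device = split(SensorKey, '_')[1]
Metric = split(SensorKey, '_')[2]
Sequence = split(SensorKey, '_')[3]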

PowerBI - Excel datasource contains JSON in column

I have an excel sheet that has three columns A, B and C.
A and B contain regular text. A firstname and lastname, if you will. The third column C contains JSON data.
Is there a way I can read this file into Power BI and have it automatically parse the JSON data out into additional columns? In the Power BI Desktop client, I can use an Excel sheet as the data source, and it loads my data into the client; however, it naturally treats column C as just text. I've had a look at the Advanced Editor and I'm thinking I might have to include something in there to help parse that out.
Any ideas?
I figured it out. In the Query Editor, right-click on the column that contains the JSON, go to Transform and select JSON. It will parse out the data, allowing you to add the fields in as additional columns.
Extremely handy!
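For reference, the Power Query (M) this generates looks roughly like the sketch below; the file path, sheet name, JSON column name C, and field names age and city are placeholder assumptions:

let
    Source = Excel.Workbook(File.Contents("C:\data\people.xlsx"), null, true),
    Sheet = Source{[Item = "Sheet1", Kind = "Sheet"]}[Data],
    Promoted = Table.PromoteHeaders(Sheet, [PromoteAllScalars = true]),
    // Parse the JSON text in column C into records
    Parsed = Table.TransformColumns(Promoted, {{"C", Json.Document}}),
    // Expand the record fields into columns of their own
    Expanded = Table.ExpandRecordColumn(Parsed, "C", {"age", "city"}, {"age", "city"})
in
    Expanded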

SSIS Need Flat File output with 2 column headers the same

I am trying to use the SSIS Flat File Destination, but cannot come up with a workaround for getting the output file to have two columns named the same thing.
I have a requirement for the output file to have the column headers:
first1, last1, email, shortname, email
Whenever I try to map the source data, I get error messages saying things like "This column name already exists" and "There is more than one data source column with the name "email"".
What's the best workaround?
Thanks
Assuming I understand the problem correctly, you need to have the same column name in the output file twice. It doesn't matter whether it's the same data or not; just the header needs to be repeated.
It's a little hokey, but in your connection manager, uncheck "Column Names in the first data row" and redefine the columns as email and email1. This will allow you to connect the columns to the right places in the file.
In your Flat File Destination, you have the ability to define header row(s). It's very limited (you can't put useful things in there like dynamic checksums and such), but in your case, paste in first1, last1, email, shortname, email and run the package. Data will be extracted to the correct columns, and a header row will be prepended to the file with all the "right" field names.
There are two downsides to this approach. First, the connection manager becomes output-only, since reading the file back with it would treat the manual header row as data. Second, any changes to the layout will not be kept in sync with the manual header row.

How to create dynamic number of output files with SSIS?

I will be creating flat files and, based on the data in the batch, it might be necessary to split the data into an undetermined number of files.
I can make the connection string dynamic with an expression, but that is only evaluated when the package starts. I'd like to change that expression to include a '-a' or '-b' in the filename.
Alternatively, if I have to create new connection manager objects at run time on demand, how do I go about that?
First, determine your naming scheme for the output files and come up with an expression formula in your head.
Put the Data Flow Task in a loop.
Within this Data Flow Task, define the source and the destination, the destination being a Flat File Destination. Read the source and add a Derived Column that sets a value in another variable that you'll later use in the filename expression.
Connect the Flat File Destination to a connection manager. First define some path, but then add an Expression to define the ConnectionString based on your file name scheme (path + filename + extension). The filename is the tricky part: you'll have to build conditional (? :) expressions based on the values you got from the source, as in the sketch below.
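As a sketch, the ConnectionString expression on the connection manager might look like the following, where OutputPath, BaseName, and BatchFlag are hypothetical package variables (the last one set from the data as described above):

@[User::OutputPath] + "\\" + @[User::BaseName] + (@[User::BatchFlag] == "A" ? "-a" : "-b") + ".txt"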
1) Create a global variable (a variable created at the scope of the package) and assign it to the file name property via an expression.
2) Change the variable's value during the looping.
You can access the data set in a script (in the script component) and write out to a set of files based on your criteria.
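As a rough sketch of that approach, a Script Component destination could hold one writer per distinct key and route each row to its file; the BatchCode and LineText columns and the output folder are hypothetical placeholders:

using System.Collections.Generic;
using System.IO;

public class ScriptMain : UserComponent
{
    // One StreamWriter per distinct key, created on first use.
    private Dictionary<string, StreamWriter> writers = new Dictionary<string, StreamWriter>();

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        string key = Row.BatchCode;   // hypothetical column that drives the split
        StreamWriter writer;
        if (!writers.TryGetValue(key, out writer))
        {
            // e.g. C:\Output\extract-a.txt, C:\Output\extract-b.txt, ...
            writer = new StreamWriter(Path.Combine(@"C:\Output", "extract-" + key + ".txt"));
            writers.Add(key, writer);
        }
        writer.WriteLine(Row.LineText);   // hypothetical pre-formatted output row
    }

    public override void PostExecute()
    {
        base.PostExecute();
        foreach (StreamWriter writer in writers.Values)
            writer.Close();
    }
}

This sidesteps the connection manager limitation entirely, since the files are created on demand inside the component.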

SSIS - Is there a Data Flow Source component that will handle CSV files where the column order may change?

We have written a number of SSIS packages that import data from CSV files using the Flat File Source.
It now seems that after these packages are deployed into production, the providers of these files may deliver files in which the column order changes (don't ask!). Currently, if this happens, our packages will fail.
For example, an additional column is inserted at the beginning of each row. In this case, the flat file source continues to use the existing column order, which obviously has a detrimental effect on the transformation!
Using a trivial example, the original file has the following content:
OurReference,Client,Amount
235,ClientA,20000.00
236,ClientB,30000.00
The output from the flat file source is:
OurReference Client Amount
235 ClientA 20000.00
236 ClientB 30000.00
Subsequently, the file delivered changes to:
OurReference,ClientReference,Client,Amount
235,A244,ClientA,20000.00
236,B222,ClientB,30000.00
When the existing unchanged package is run against this file, the output from the flat file source is:
OurReference Client Amount
235 A244 ClientA,20000.00
236 B222 ClientB,30000.00
Ideally, we would like to use a data source that will cope with this problem, i.e. one which produces output based on the column names instead of the column order.
Any suggestions would be welcomed!
Not that I know of.
One possibility for checking for the problem in advance is to set up two different connection managers, one of which reads the whole row as a single column. That one can read the first row, tell whether it's OK or not, and abort.
If you want to do the work, you can take it a step further and make that flat one-field row the only connection manager, and use a Script Component in your flow to parse the row and assign the values to the columns you need later in the flow, as sketched below.
As far as I know, there is no way to dynamically add columns to the flow at runtime, so all the columns you need will have to be added to the Script Component's output. Whether they can be found and parsed from each line is up to you. Any "new" (i.e. unanticipated) columns cannot be used. Columns which are missing you could default or throw an exception for.
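A minimal sketch of that header-driven parsing, assuming the whole row arrives in a single input column named Line and the Script Component has an asynchronous output carrying the three anticipated columns; all names here are placeholders:

using System;

public class ScriptMain : UserComponent
{
    // Position of each anticipated column in the current file, found from the header.
    private int refIndex = -1, clientIndex = -1, amountIndex = -1;
    private bool headerSeen = false;

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        string[] fields = Row.Line.Split(',');
        if (!headerSeen)
        {
            // Locate each anticipated column by name, so order no longer matters.
            refIndex = Array.IndexOf(fields, "OurReference");
            clientIndex = Array.IndexOf(fields, "Client");
            amountIndex = Array.IndexOf(fields, "Amount");
            if (refIndex < 0 || clientIndex < 0 || amountIndex < 0)
                throw new Exception("An expected column is missing from the header");
            headerSeen = true;
            return;   // the header row itself produces no output row
        }
        Output0Buffer.AddRow();
        Output0Buffer.OurReference = fields[refIndex];
        Output0Buffer.Client = fields[clientIndex];
        Output0Buffer.Amount = fields[amountIndex];
    }
}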
A final possibility is to use the SSIS object model to modify the package before running to alter the connection manager - or even to write the entire package dynamically using the object model based on an inspection of the input file. I have done quite a bit of package generation in C# using templates and then adding information based on metadata I obtained from master files describing the mainframe files.
The best approach would be to run a check before the SSIS package imports the CSV data. This may have to be an external script/application, because I don't think you can manipulate the data this way within Business Intelligence Development Studio.
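A sketch of such an external pre-check as a small console application, run from an Execute Process Task before the Data Flow; the file path and expected header line are hypothetical:

using System;
using System.IO;
using System.Linq;

class HeaderCheck
{
    // Returns 0 when the header matches the expected layout, 1 otherwise,
    // so the Execute Process Task can fail the package before the import runs.
    static int Main()
    {
        const string expected = "OurReference,Client,Amount";
        string header = File.ReadLines(@"C:\Inbox\clients.csv").FirstOrDefault() ?? "";
        if (!header.Trim().Equals(expected, StringComparison.OrdinalIgnoreCase))
        {
            Console.Error.WriteLine("Unexpected header: " + header);
            return 1;
        }
        return 0;
    }
}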
Here is a rough approach. I will write down the limitations at the end.
Create a flat file source. Put the entire row in one column.
Do not check "Column names in the first data row".
Create a Script Component
Code:
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    string sRow = Row.Column0;
    string sManipulated = string.Empty;

    // Split the raw line on commas and pad each field to a fixed width.
    string[] columns = sRow.Split(',');
    foreach (string column in columns)
    {
        sManipulated = string.Format("{0}{1}", sManipulated, column.PadRight(15, ' '));
    }
    /* Note: for the sake of demonstration I am padding to 15 chars. */
    Row.Column0 = sManipulated;
}
Create a flat file destination
Map Column0 to Column0
Limitation: I have arbitrarily padded each field to 15 characters. Points to consider:
1. Do we need each field to be the same size?
2. If yes, what is that size?
A generic way to handle that would be to create a table to store the file name, the field names, and the field sizes.
Use the file name to dynamically create the source and destination connection managers.
Use the field name and the corresponding field size to decide the padding, as sketched below. I'm not sure whether you need this much flexibility. If you have any questions, please respond.
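As an illustrative sketch of that metadata-driven padding, with the sizes hard-coded as a stand-in for values read from the lookup table:

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Stand-in for the per-field sizes loaded from the metadata table.
    int[] fieldSizes = { 12, 20, 15 };

    string[] columns = Row.Column0.Split(',');
    string sManipulated = string.Empty;
    for (int i = 0; i < columns.Length && i < fieldSizes.Length; i++)
    {
        // Pad each field to the width defined for its position.
        sManipulated += columns[i].PadRight(fieldSizes[i], ' ');
    }
    Row.Column0 = sManipulated;
}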