How to remove prefixed numbers from a varchar in SSIS? - ssis

I have an SSIS package that copies data from a text file into a SQL Server table.
It uses three components: 1) Flat File Source, 2) Derived Column, 3) SQL Destination.
In the third component I specify the table into which the data is copied.
That destination table has a varchar column named DESC, and the flat file contains values for this column such as "01 planA", "04 plan C", and "PlanJ".
I need to remove these numeric prefixes. I have the expression below, but I can't use it in the Derived Column transformation because the SSIS expression language does not support PATINDEX.
SUBSTRING([DESC], PATINDEX('%[a-zA-Z]%',[DESC]), LEN([DESC])- PATINDEX('%[a-zA-Z]%',[DESC])+1)
Please help me.

You could do this in a Script Component, using regular expressions. Reference System.Text.RegularExpressions, then use the .NET Regex class in the Input0_ProcessInputRow method to reproduce the logic of the expression above.
Alternatively, load the raw data into a SQL staging table first, extract from there using PATINDEX as in your example, and then push the cleaned version to your destination table.
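The regular-expression approach can be sketched as follows. This is only an illustration of the pattern, written in Python for brevity; inside an SSIS Script Component the equivalent would be written in C# with the .NET Regex class. The function name strip_numeric_prefix is hypothetical. It mirrors the T-SQL expression from the question: find the first ASCII letter and keep everything from that position on, leaving the value unchanged when it contains no letter.

```python
import re

# Same character class as PATINDEX('%[a-zA-Z]%', ...) in the question.
_FIRST_LETTER = re.compile(r"[A-Za-z]")


def strip_numeric_prefix(desc: str) -> str:
    """Return desc starting at its first ASCII letter (hypothetical helper)."""
    match = _FIRST_LETTER.search(desc)
    if match is None:
        # No letter found: keep the value as it is.
        return desc
    return desc[match.start():]


if __name__ == "__main__":
    for value in ["01 planA", "04 plan C", "PlanJ"]:
        print(repr(value), "->", repr(strip_numeric_prefix(value)))
```

Searching for the first letter, rather than deleting leading digits, keeps the behavior close to the original PATINDEX expression, which also drops the space between the number and the text.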

Related

Azure Data Factory remove csv extension name during copy activity to SQL DB

This is my use case:
I have several CSV files in my data lake and want to copy them to my Azure SQL DB. A typical CSV file name looks like this: Sale-Internet-Header.csv.
In the sink dataset for Azure SQL DB I used the expression @replace(item().name, '-','_') for the table name.
After running the copy pipeline, the SQL table has the following name: dbo.sales_internet_header.csv
I would like to change the expression in the sink dataset to remove the ".csv", so that the SQL table name looks like this: dbo.sales_internet_header
Any suggestions?
Many thanks
You can use replace() in dynamic content.
@replace(variables('cc'),'.csv', '') removes the ".csv".
Use this dynamic content for the table name in the SQL dataset, or inside the ForEach. In my demonstration I used a Set Variable activity to hold the name before removing the ".csv". The two calls can also be nested, for example @replace(replace(item().name, '-', '_'), '.csv', '').
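To check the renaming logic outside Azure Data Factory, the two replace() calls can be mimicked with a small Python sketch. The function name table_name_from_file is hypothetical and nothing here calls ADF. Like the ADF replace() function, str.replace substitutes every occurrence of the search text, not only a trailing extension.

```python
def table_name_from_file(file_name: str) -> str:
    """Mimic replace(replace(name, '-', '_'), '.csv', '') (hypothetical helper)."""
    # First swap hyphens for underscores, as in the question's expression.
    with_underscores = file_name.replace("-", "_")
    # Then drop every occurrence of ".csv", as in the answer's expression.
    return with_underscores.replace(".csv", "")


if __name__ == "__main__":
    print(table_name_from_file("Sale-Internet-Header.csv"))
```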

ADF Copy Activity Fails CSV to Parquet when CSV has space in header column

When using a copy activity in Azure Data Factory to copy a typical CSV file with a header row into a Parquet sink, the sink fails with the following error because the column names in the CSV header contain spaces.
The column name is invalid. Column name cannot contain these character:[,;{}()\n\t=]
The CSV is pipe delimited and displays correctly in the dataset's preview feature with the first row marked as the header. I see no options to handle this case on the Parquet (sink) side of the copy activity. I realize this can probably be addressed by using a data flow to remove the spaces from the column names, but does that mean the native copy activity cannot handle a header row that contains a space?
EDIT: I should have added that dataset uses default mappings so that we can use the same dataset for any CSV to PARQUET copy. The answer provided will work for explicit mappings, but we don't see any resolution for folks who use default/dynamic mappings since we do not have access to the column names to remove spaces.
The official documentation describes this error as follows:
Error code: ParquetInvalidColumnName
Message: The column name is invalid. Column name cannot contain these character:[,;{}()\n\t=]
Cause: The column name contains invalid characters.
Resolution: Add or modify the column mapping to make the sink column name valid.
If you would like to keep using the copy activity, there are a few workarounds:
1. Make sure you have selected Pipe (|) as the column delimiter.
2. If feasible, go to the mapping settings, import the schema, and rename the destination columns so that they contain no spaces.
This is still an open issue or feature request.
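The renaming rule behind the second workaround can be sketched in Python. This is an illustration only: it assumes you are able to rewrite the header line yourself before the copy, which the answer above does not propose, and it does not call ADF. The names sanitize_column_name and sanitize_header are hypothetical. The character set is the one listed in the error message plus the space reported in the question.

```python
import re

# Characters from the error message, plus the space that triggered it.
_INVALID_CHARS = re.compile(r"[ ,;{}()\n\t=]")


def sanitize_column_name(name: str) -> str:
    """Replace each invalid character with an underscore (hypothetical helper)."""
    return _INVALID_CHARS.sub("_", name.strip())


def sanitize_header(header_line: str, delimiter: str = "|") -> str:
    """Sanitize every column name in a delimited header line."""
    columns = header_line.split(delimiter)
    return delimiter.join(sanitize_column_name(col) for col in columns)


if __name__ == "__main__":
    print(sanitize_header("Order ID|Customer Name|Total"))
```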

SSIS Choosing Wrong Data Type

I've created a table in SQL Server 2016 with this definition:
CREATE TABLE Stage.Test (f1 DATE NULL);
INSERT INTO Stage.Test (f1) VALUES ('1/1/2000');
Notice the f1 column uses the DATE data type.
Then I created a Data Flow Task in SQL Server Data Tools (VS 2019) to load data from that table. I created an OLE DB Source component and set the source table to Stage.Test. But when I examine the data type of the "f1" column (in both the 'External Column' and 'Output Column' lists), it is shown as a Unicode string.
Why is it choosing a Unicode string instead of DT_DATE?
I haven't seen this exact case with date fields, but SSIS converts data automatically when it detects a field to be of a certain type. Perhaps the '/' in your date value causes it. That date format is not used in my part of the world, so I've never run into the problem. The same behavior is especially visible when you import Excel files with SSIS. I usually hit it with strings, where Unicode strings sometimes become non-Unicode strings.
Some ways to fix it:
1. Edit the SQL query to explicitly cast the field as date.
2. Add a conversion step in the data flow after the source (such as a Derived Column that puts the parts of the string in the right order).
3. Change the output data type: right-click the source in the data flow, open the Advanced Editor, and edit the data type of the output column.
I'm not sure whether option 3 works for this date format issue, since I have no experience with the format myself, but it works fine for my Unicode/non-Unicode problem.

SSIS package writing to CRM 2011 Data type error

We are trying to push a single order into MS CRM (dev instance) via an SSIS package.
Most of the columns coming from the source (a staging table) are of data type 'DT_STR', and the CRM fields they map to are of data type 'DT_WSTR'.
I have already looked for a solution on this site, but every question I found is about converting wstr to str. In my case I need to convert str to wstr. When I run the package I get an error saying:
Column xxxx cannot convert between unicode and non unicode string data type
I have already tried two solutions:
1. Right-clicking the OLE DB Source and changing the data type to wstr
2. Using a 'Data Conversion' transformation
In both cases the error remains the same. Has anyone else had a similar issue?
Don't change the data types in the OLE DB Source properties. If you need a different type, change it in one of these places:
the SELECT statement in the OLE DB Source
a 'Data Conversion' transformation
a Derived Column transformation
In the Derived Column transformation, the expression is:
(DT_WSTR, 50)([YourString])
Don't replace the existing column; add the result as a new column in the Derived Column transformation.
If the conversion still fails, something else is wrong. You haven't given the real error message (or a picture of your design); the real error message appears in the Output window when you execute the package.

CAST or CONVERT in Excel Query

How can I use CAST or CONVERT in an Excel query, i.e. in the Excel Source? I want to convert a number to text in the Excel Source itself, using the SQL Command option.
The best option is to use a Data Conversion transformation between the source and the destination.
In the Excel source or destination connection manager we cannot use CAST or CONVERT as we can in SQL.
Add a Data Conversion transformation after the Excel source.
The destination after the transformation can be an Excel destination or any other destination.
There are many problems connected with Excel files and data types. Sometimes it's very painful.
Check this out:
http://social.msdn.microsoft.com/forums/en-US/adodotnetdataproviders/thread/355afd19-8128-4673-a9d1-9a5658b72e84
or this:
http://www.connectionstrings.com/excel
When I work with xls files I often read everything as text by setting IMEX=1, and then I use a Data Conversion transformation.
So I suggest changing your connection string to:
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\MyExcel.xls;Extended Properties="Excel 8.0;HDR=Yes;IMEX=1";
Then you can do whatever you want with the data in SSIS.