SSIS Flat File add trailing spaces to columns - csv

I am developing an SSIS package that concatenates 3 columns and then outputs the result to a flat file.
The 1st column is emp_number, which has a length of 10.
The values I get are "12345", "123456" or "1234567".
The output I want is "12345     ", "123456    " or "1234567   ".
The requirement is that columns have a fixed size (10), so if a value is shorter than the expected length I need to pad it with spaces until the length matches.
Can you please help?

Add a Derived Column transformation that takes the column value, concatenates it to a padding string of 10 characters (or whatever the total length after padding should be; the example below pads with zeros, use spaces for space padding) and then takes the rightmost 10 characters using an expression:
RIGHT("0000000000" + yourcol, 10)

Similar to iamdave's answer, but you need the reverse:
LEFT(yourcolumn + "          ", 10)
There are 10 spaces between the quotes.
If your column is not a string, you need to cast it:
LEFT((DT_WSTR,10)yourcolumn + "          ", 10)
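For readers who want to check the padding logic outside SSIS, here is a minimal Python sketch of the same idea: append the filler, then keep the leftmost 10 characters. The function name pad_right and its parameters are hypothetical and only serve the illustration; this is not SSIS expression syntax.

```python
def pad_right(value, width=10, fill=" "):
    """Right-pad value to exactly `width` characters, truncating if it is longer."""
    text = str(value)  # plays the role of the (DT_WSTR,10) cast for non-string columns
    # Append `width` fill characters, then keep the leftmost `width` characters,
    # which is what LEFT(col + "          ", 10) does.
    return (text + fill * width)[:width]


for emp_number in ("12345", "123456", 1234567):
    print(repr(pad_right(emp_number)))
```

Every result is exactly 10 characters long, so the concatenated output stays aligned in a fixed-width file.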

Related

Pyspark How to Ignore Double quotes from the data present in the CSV files

I have a " (double quote) in my data, and all the following column values get merged into one column even though I have specified the delimiter. In my case '|' is the delimiter.
Actual Data:
a|"b|c|d|
Expected Output:
a|"b|c|d
Actual output:
a|"b|c|d|null|null| (here the 3rd and 4th columns come through as part of a single column, and in place of the actual 3rd and 4th columns I get null values)
I have tried below approach:
Approach 1:
df=spark.read.csv(filepath,header=True,sep='|',quote='')
The above approach reads that particular column correctly, but empty columns come through as """" whereas we need them to stay empty.
Approach 2:
df=spark.read.csv(filepath,header=True,sep='|',quote='',escape='\"')
The above approach merges the values into a single column, as in the actual output.
After some trial and error I found a solution:
read the file with both of the options below:
quote='',escape='\"'
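The underlying effect, an unmatched quote character swallowing the delimiters that follow it, can be reproduced without Spark. The sketch below uses Python's standard csv module rather than the Spark reader, so it only illustrates the parsing behaviour and says nothing about which Spark options are correct; the sample line is the one from the question.

```python
import csv

line = 'a|"b|c|d|'

# Default dialect: '"' opens a quoted field, so the delimiters after it
# are read as part of that field instead of splitting it.
with_quotes = next(csv.reader([line], delimiter="|"))

# QUOTE_NONE: '"' is treated as ordinary data and every '|' splits a field.
without_quotes = next(csv.reader([line], delimiter="|", quoting=csv.QUOTE_NONE))

print(with_quotes)
print(without_quotes)
```

With quote handling disabled the line splits at every '|', and the stray quote stays in the second field as data.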

Converting 9 digit string to number and adding comma

I need to convert the strings in "Column1" to numeric values.
The input is a 9-character string such as 000011500, 000002151 or 000000000.
The result should be 115,00 - 21,51 - 0
First I used this expression in a Derived Column to remove the leading zeros:
(DT_WSTR,50)(DT_I8)[Column1]
So my output was 11500, 2151, 0.
Now I want to insert a comma before the last two digits.
I tried the following expression in a Derived Column: LEFT([COLUMN1], LEN([COLUMN1]-2)+
"," + RIGHT([COLUMN1],2)
However, I keep getting errors.
Could you please help me?
Thank you!
Why not just convert it to a decimal and divide by 100?
e.g.
((DT_DECIMAL, 2) "000002151") / 100
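The arithmetic behind that suggestion can be checked with a short Python sketch using the standard decimal module. The helper names to_amount and with_comma are hypothetical, and the comma formatting is an assumption about the desired display, not something the SSIS cast produces by itself.

```python
from decimal import Decimal


def to_amount(raw):
    """Interpret a zero-padded digit string as an amount with two implied decimals."""
    return Decimal(raw) / 100  # Decimal() ignores the leading zeros


def with_comma(amount):
    """Render with two decimals and a comma as the decimal separator."""
    return f"{amount:.2f}".replace(".", ",")


for raw in ("000011500", "000002151", "000000000"):
    print(raw, "->", with_comma(to_amount(raw)))
```

Note that this formatting renders the all-zero input as 0,00 rather than the bare 0 shown in the question; that case would need separate handling if a bare 0 is required.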

SSIS remove Left zero from csv Flat File Source

I have a csv file that I want to process in SSIS. The file contains a column of type Unicode string [DT_WSTR], for example: ColumnA -> ("00000123400").
I want to remove the leading zeroes in front of 123400 and also remove the quotes, so that the result is 123400.
For the quotation marks I found the following solution via a Derived Column: REPLACE(ColumnA, "\"", ""), which gives me the following result: 00000123400.
How do I remove the leading zeroes?
After removing the quotation marks, I tried to convert my string to an integer [DT_I4], but that does not remove the zeroes.
Do you have an answer for my case? Thanks in advance.
The solution to one part of the case is:
in the Derived Column, put the expression:
REPLACE(LTRIM(REPLACE(ColumnA,"0"," "))," ","0")
It removes only the leading zeros;
see the link: Removing left padding zero in SSIS
It works perfectly, but is it possible to trim the leading zeros and also remove the quotation marks in the same expression?
Example: I have Column1 which is string with quotation marks and left zero - "0000123400"
I try this expression:
REPLACE(REPLACE(LTRIM(REPLACE(column1, "0", " ")), " ", "0"),"\""," ")
but it doesn't work, it deletes all zeros and returns 1234.
The solution that I want is to get 123400.
Should I do it one step at a time? Create a Derived Column that removes the quotation marks first, and then another Derived Column for the leading zeros?
Thanks in advance.
It looks like you want the output to be in numeric form? If so, the following expression will remove the quotes and leading zeros while preserving the trailing zeros from Unicode text. This can be done in a single operation, with one Derived Column that will create a new column (add a new column option) with an integer output data type in the data flow.
(DT_I4)REPLACE(ColumnA,"\"","")
If you want to keep this as the Unicode data type the expression below will do this, also in a single Derived Column. Just adjust the length according to your columns.
(DT_WSTR, 50)(DT_I4)REPLACE(ColumnA,"\"","")
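Both approaches from this thread can be compared side by side in a small Python sketch. The function names are hypothetical; the first mirrors the cast-to-integer answer, the second mirrors the REPLACE/LTRIM/REPLACE trick and assumes the value itself contains no spaces.

```python
def strip_via_int(raw):
    """Remove the quotes, then let integer conversion drop the leading zeros."""
    return int(raw.replace('"', ""))


def strip_via_replace(raw):
    """String-only variant: zeros -> spaces, trim the left side, spaces -> zeros."""
    unquoted = raw.replace('"', "")
    return unquoted.replace("0", " ").lstrip().replace(" ", "0")


value = '"00000123400"'
print(strip_via_int(value))
print(strip_via_replace(value))
```

Removing the quotes before trimming matters: while the leading quote is still in place, the left trim has nothing to remove, which is one reason to strip the quotes first.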

Finding number of occurence of a specific string in MYSQL

Consider the string "55,33,255,66,55"
I am looking for a way to count the number of occurrences of a specific value ("55" in this case) in this string using a MySQL select query.
Currently I am using the logic below to count:
select CAST((LENGTH("55,33,255,66,55") - LENGTH(REPLACE("55,33,255,66,55", "55", ""))) / LENGTH("55") AS UNSIGNED)
The issue with this is that it counts every occurrence of 55, including the one inside 255, so the result is 3,
but the desired output is 2.
Is there any way I can make this work correctly? Please suggest.
NOTE: "55" is the input we are giving, and the value "55,33,255,66,55" comes from a database field.
Regards,
Balan
You want to match on ',55,', but there's the first and last position to worry about. You can use the trick of adding commas to the front and back of the input to get around that:
select LENGTH('55,33,255,66,55') + 2 -
LENGTH(REPLACE(CONCAT(',', '55,33,255,66,55', ','), ',55,', 'xxx'))
Returns 2
I've used CONCAT to pre- and post-pend the commas (rather than adding a literal into the text) because I assume you'll be using this on a column not a literal.
Note also these improvements:
Removal of the cast - it is already numeric
By replacing with a string one less in length (ie ',55,' length 4 to 'xxx' length 3), the result doesn't need to be divided - it's already the correct result
2 is added to the length because of the two commas added front and back (no need to use CONCAT to calculate the pre-replace length)
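The length-difference trick is easy to try outside MySQL. The Python sketch below mirrors it step by step and compares it with a plain split-and-count; the function names are hypothetical. One caveat worth knowing: because replacement is non-overlapping, two adjacent matches share a comma, so the trick undercounts an input such as "55,55" (it reports 1, while splitting reports 2).

```python
def count_token(csv_text, token):
    """Count whole comma-separated items equal to `token` via length difference."""
    padded = "," + csv_text + ","        # same idea as CONCAT(',', col, ',')
    marker = "x" * (len(token) + 1)      # one character shorter than ',token,'
    # Each replacement shortens the string by exactly one character.
    return len(padded) - len(padded.replace("," + token + ",", marker))


def count_token_split(csv_text, token):
    """Reference implementation: split on commas and count exact matches."""
    return csv_text.split(",").count(token)


print(count_token("55,33,255,66,55", "55"))
print(count_token_split("55,33,255,66,55", "55"))
```

For the sample value both functions agree, and neither counts the 55 inside 255.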
Try this:
select CAST((LENGTH("55,33,255,66,55") + 2 - LENGTH(REPLACE(concat(",","55,33,255,66,55",","), ",55,", ",,"))) / LENGTH("55") AS UNSIGNED)
I would use a subselect: in that subselect I would replace every 255 with some other unique token, and then count the new tokens and the remaining 55s.
If(row = '255') then '1337'
for example.

Derived Column Editor

I need to assign a formatted date to a column in a data flow. I have added a Derived shape and entered the following expression for a NEW column - Derived Column = "add as new column":
"BBD" + SUBSTRING((DT_WSTR,4)DATEADD("Day",30,GETDATE()),1,4) +
SUBSTRING((DT_WSTR,2)DATEADD("Day",30,GETDATE()),6,2) +
SUBSTRING((DT_WSTR,2)DATEADD("Day",30,GETDATE()),9,2)
The problem is that the Derived Column Transformation Editor automatically assigns a Data Type of Unicode string [DT_WSTR] and a length of "7". However, the length of the string is 11, so the following exception is thrown each time:
[Best Before Date [112]] Error: The "component "Best Before Date" (112)" failed
because truncation occurred, and the truncation row disposition on "output column
"Comments" (132)" specifies failure on truncation. A truncation error occurred
on the specified object of the specified component.
Does anyone know why the editor is insisting on a length of 7? I don't seem to be able to change this.
Many thanks,
Rob.
I can't understand why SSIS is measuring that column as only resulting in a seven character field - but to force it to provide an 11 character column for it, modify your expression slightly to this:
(DT_WSTR, 11)("BBD" + SUBSTRING((DT_WSTR,4)DATEADD("Day",30,GETDATE()),1,4) + SUBSTRING((DT_WSTR,2)DATEADD("Day",30,GETDATE()),6,2) + SUBSTRING((DT_WSTR,2)DATEADD("Day",30,GETDATE()),9,2))
What you want is:
"BBD" + (DT_WSTR,4)YEAR(DATEADD("Day",30,GETDATE()))
+ RIGHT("0" + (DT_WSTR,2)MONTH(DATEADD("Day",30,GETDATE())),2)
+ RIGHT("0" + (DT_WSTR,2)DAY(DATEADD("Day",30,GETDATE())),2)
The issue is in how you are converting your dates to a string. The calls to DATEADD return a full date and time. You then use either (DT_WSTR,4) or (DT_WSTR,2) to convert that date into a 4 or 2 character string. On my system, converting a datetime to a string defaults to "Aug 24 2011 4:18PM". So the first 4 characters get you "Aug " and the first 2 characters get you "Au". Then you extract substrings using SUBSTRING. For your last two calls to SUBSTRING, you are starting the substring past the end of the 2 character string you converted the date into. This is why SSIS displays 7 characters:
"BBD" + "Aug " + "" + ""
3 + 4 + 0 + 0 = 7
It is better to use the built in functions to extract the Year, Month and Day from a datetime rather than converting to a string and then grabbing substrings. If you really wanted to use substrings, you would need to add a call to CONVERT to get the datetime to a specific string format, otherwise you will get whatever the default is for your locale setting in Windows. This could be different on each PC.
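The difference between the two approaches can be sketched in Python. This is an illustration of the reasoning above, not SSIS code: best_before builds the value from the year, month and day parts, while truncated_version imitates the cast-then-substring sequence on the sample string from this answer. Both function names are hypothetical.

```python
from datetime import date, timedelta


def best_before(today):
    """Build 'BBD' + yyyymmdd from the date parts, 30 days ahead."""
    d = today + timedelta(days=30)
    return f"BBD{d.year:04d}{d.month:02d}{d.day:02d}"


def truncated_version(date_text):
    """Imitate casting to 4 or 2 characters first, then taking substrings."""
    year = date_text[:4][0:4]    # SUBSTRING(<4-char string>, 1, 4)
    month = date_text[:2][5:7]   # SUBSTRING(<2-char string>, 6, 2) -> empty
    day = date_text[:2][8:10]    # SUBSTRING(<2-char string>, 9, 2) -> empty
    return "BBD" + year + month + day


print(best_before(date(2011, 8, 24)), len(best_before(date(2011, 8, 24))))
print(repr(truncated_version("Aug 24 2011 4:18PM")))
```

The first function always yields 11 characters, while the second yields only "BBD" plus the first four characters of the locale-dependent date text, which matches the length of 7 described above.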
What release and service pack of SQL are you using?
I just tried this on my machine and had no problems changing the result size from 7 to 11. Is it possible that you have not installed all the service packs?
Are you replacing your existing field, and is that field possibly 7 chars long? The thing with the Derived Column Transform is that you can't change the field types (including length) of the existing fields.
Try to add a new field instead.
If that's not working, try adding an explicit cast around the whole expression.
(DT_WSTR,11)("BBD" + SUBSTRING((DT_WSTR,4)DATEADD("Day",30,GETDATE()),1,4) + SUBSTRING((DT_WSTR,2)DATEADD("Day",30,GETDATE()),6,2) + SUBSTRING((DT_WSTR,2)DATEADD("Day",30,GETDATE()),9,2))
Right-click on "Derived Column", open "Show Advanced Editor" and select the "Input and Output Properties" tab.
Go to "Derived Column Output" => "Output Columns" => "Derived Column 1" (the column you added).
In the right-side panel, go to the "Data Type Properties" section => DataType.
Select "String [DT_STR]".
Click OK.
This will solve your problem.