How do I tweak columns in a Flat File Destination in SSIS?

I have an OLE DB Data source and a Flat File Destination in the Data Flow of my SSIS Project. The goal is simply to pump data into a text file, and it does that.
Where I'm having problems is with the formatting. I need to be able to rtrim() a couple of columns to remove trailing spaces, and I have a couple more that need their leading zeros preserved. The current process is losing all the leading zeros.
The rtrim() can be done by simple truncation and ignoring the truncation errors, but that's very inelegant and error prone. I'd like to find a better way, like actually doing the rtrim() function where needed.
Exploring similar SSIS questions & answers on SO, the thing to do seems to be "Use a Script Task", but that's usually just thrown out there with no details, and it's not at all an intuitive thing to set up.
I don't see how to use scripting to do what I need. Do I use a Script Task on the Control Flow, or a Script Component in the Data Flow? Can I do rtrim() and pad strings where needed in a script? Anybody got an example of doing this or similar things?
Many thanks in advance.

With SSIS, there are many possible solutions! From what you describe, you could use a Derived Column transformation within the Data Flow to do the trimming and padding; you would use an expression for each column, which is relatively straightforward. For example,
RTRIM([ColumnName])
to remove the trailing spaces, and something along the lines of
RIGHT("0000" + [ColumnName], 6)
to pad with leading zeros (this is off the top of my head, so the syntax may not be exact).
As for the scripting method, that is also valid. You would use a Script Component (transformation) in the Data Flow and use VB.NET or C# (if you have 2008) string manipulation methods, e.g. strVariable.TrimEnd() to drop trailing spaces and strVariable.PadLeft() to restore leading zeros; a sketch follows.
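A minimal C# sketch of what the row-processing method of such a Script Component might look like; AccountCode and RefNumber are hypothetical column names, and both would need to be marked ReadWrite on the component's Input Columns page:
public class ScriptMain : UserComponent
{
    // Called once for every row passing through the Data Flow.
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        // rtrim(): drop trailing spaces from a hypothetical AccountCode column
        if (!Row.AccountCode_IsNull)
            Row.AccountCode = Row.AccountCode.TrimEnd();

        // Preserve/restore leading zeros on a hypothetical RefNumber column
        // by left-padding to a fixed width of 6 characters
        if (!Row.RefNumber_IsNull)
            Row.RefNumber = Row.RefNumber.PadLeft(6, '0');
    }
}
Because the component runs inside the Data Flow, the modified values go straight on to the Flat File Destination with no extra staging.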

Related

SSIS reading CSV produced by MySQL, code page conflict

I've got a third-party file coming in: UTF-8 encoded, 56 columns, a CSV export from MySQL. My intent is to load it into a SQL Server 2019 instance, into a table layout I do not have control over.
The SQL Server Import Wizard will automatically do the code page conversions to Latin-1 (and a couple of string-to-int conversions), but it will not handle the MySQL "\N" convention for NULL, so I thought I'd try my hand at SSIS to see if I could get the data cleaned up on ingestion.
I got a number of components set up to do various filtering and transforming (like the "\N" stuff) and that was all working fine. Then I tried to save the data using an OLE DB Destination, and the wheels kinda fell off the cart.
SSIS appears to drop all of the automatic conversions Import Wizard would do and force you to make the conversions explicit.
I added a Data Conversion Transformation component into the flow and edited all 56 columns to be explicit about the various conversions. The trouble is, while it lets me edit the code pages on the "Copy of" output columns, it will not save them, in either the Editor or the Advanced Editor.
I saw another article here saying "Use the Derived Column Transformation" but that seems to be on a column-by-column basis (so I'd have to add 56 of them).
It seems kinda crazy that SSIS is such a major step backwards in this regard from Import Wizard, bcp, or BULK INSERT.
Is there a way to work through the code page switch in SSIS using SSIS components? None of the components I've seen recommended seem to work, and all of the other articles say "make another table using different code pages or NVARCHAR and then copy one table to the other", which kinda defeats the purpose.
It took synthesizing a number of different posts on tangentially related issues, but I think I've finally gotten SSIS to do a lot of what Import Wizard and BULK INSERT gave for free.
It seems that reading a UTF-8 CSV file with SSIS and processing it all the way through to a table that uses code page 1252 (and not NVARCHAR) involves the following:
1) Create a Flat File Source component and set the incoming encoding to 65001 (UTF-8). In the Advanced Editor, convert all string columns from DT_STR/65001 to DT_WSTR (essentially NVARCHAR). It's easier to work with those outputs the rest of the way through your workflow, and (most importantly) a Data Conversion transform component won't let you convert from 65001 to any other code page, but it will let you convert from DT_WSTR to DT_STR in a different code page.
1a) SSIS is pretty annoying about putting a default length of 50 on everything, and about not carrying lengths through as defaults from one component/transform to the next. So you have to go through and set the appropriate lengths on all the "Column 0" input columns from the Flat File Source and on all the DT_WSTR conversions you create in that component.
1b) If your input file contains, as mine apparently does, invalid UTF-8 encoding now and then, choose "RD_RedirectRow" as the Truncation error handling for every column. Then add a Flat File Destination to your workflow and attach the red line coming out of your Flat File Source to it; that's if you want to see which rows were bad. You can just choose "RD_IgnoreError" if you don't care about bad input, but leaving the default means your whole package will blow up if it hits any bad data.
2) Create a Script Component (transformation), and in that script check each column for the MySQL "\N" and change it to NULL (a minimal sketch appears after these steps).
3) Create a Data Conversion transformation component and add it to your workflow. Because of the DT_WSTR in step 1, you can now change that output back to DT_STR in a different code page here. If you don't change to DT_WSTR from the get-go, the Data Conversion component will not let you change the code page at this step. 99% of the data I'm getting has just Latin characters, UTF-8 encoded (the accents). There is a smattering of kanji characters in a small subset of the data, so to reproduce what Import Wizard does for you, you must change the Truncation error handling on every column here that might be impacted to RD_IgnoreError. Contrary to some documentation I read, RD_IgnoreError does not put NULL in the column; it puts in the text with the non-mapping characters replaced with "?", like we're all used to.
4) Add your OLE DB Destination component and map all of the output columns from step 3 to the columns of your database.
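For step 2, a minimal C# sketch of the Script Component, assuming a hypothetical ReadWrite string column named SomeColumn (repeat the check for each string column that can carry "\N"):
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // MySQL exports NULL as the literal two-character sequence \N.
    // SomeColumn is a hypothetical column marked ReadWrite on the
    // Input Columns page; the same pattern applies to the other columns.
    if (!Row.SomeColumn_IsNull && Row.SomeColumn == @"\N")
    {
        // Setting the _IsNull flag sends a real NULL downstream
        // instead of the two-character text.
        Row.SomeColumn_IsNull = true;
    }
}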
So, it's a lot of work just to get back to where Import Wizard starts, before you get to the extra things SSIS can do for you. And SSIS can be kind of annoying about snapping column widths back to the default 50 when you change something; if you've got a lot of columns, this can get pretty tedious.

Reading tag from XML and conversion failure in SSIS package

I'm having a problem reading an XML tag from an XML structure and later inserting it into a database. The problem occurs with one tag: "<UV>". I have several files with the same structure; inside this tag some of them have the value "11" and some of them have the value "5.5". That is just an example, but the numbers have at most 2 decimal places. The problem is that the SSIS package fails to read this tag when the value has decimal places. In the database, this column is set to Numeric(10,2) precision. In the SSIS package I tried different column data types, but without success. Can anyone help me read this tag correctly? What should I change to make it work? I have the following structure:
In case anyone is interested: I manually edited the types in the generated XSD file, changing the existing type of the variable/tag to the correct one: "DECIMAL". That solved my problem. This might help other, less experienced developers and speed up solving similar problems in the future. Regards, Darek.
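For reference, the kind of XSD edit being described looks roughly like this; the UV element name comes from the question, but the originally inferred type is an assumption, and in XSD syntax the decimal type is spelled xs:decimal:
<!-- before (assumed): the generated XSD inferred an integer type for UV -->
<xs:element name="UV" type="xs:int" />
<!-- after: widen the type so values such as 5.5 are read correctly -->
<xs:element name="UV" type="xs:decimal" />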

Editing JSON - Add Attribute

I have a slew of JSON files I'm getting dumps of, with data from the day/period it was pulled. Most of the JSON files I'm dealing with are a lot larger than this, but I figured a smaller one would be easier to work with.
{"playlists":[{"uri":"spotify:user:11130196075:playlist:1Ov4b3NkyzIMwfY9E8ixpE","listeners":366,"streams":386,"dateAdded":"2016-02-24","newListeners":327,"title":"#Covers","owner":"Saga Prommeedet"},{"uri":"spotify:user:mickeyrose30:playlist:2Ov4b3NkyzIMwfY9E8ixpE","listeners":229,"streams":263,"dateAdded":"removed","newListeners":154,"title":"bestcovers2016","owner":"Mickey Rose"}],"top":2,"total":53820}
What I'm essentially trying to do is add a date attribute to each line of data, so that when I combine multiple JSON files to put through an analytical tool, the right row of data is associated with the correct date. My first thought was to write it as such:
{"playlists":[{"uri":"spotify:user:11130196075:playlist:1Ov4b3NkyzIMwfY9E8ixpE","listeners":366,"streams":386,"dateAdded":"2016-02-24","newListeners":327,"title":"#Covers","owner":"Saga Prommeedet"},{"uri":"spotify:user:mickeyrose30:playlist:2Ov4b3NkyzIMwfY9E8ixpE","listeners":229,"streams":263,"dateAdded":"removed","newListeners":154,"title":"bestcovers2016","owner":"Mickey Rose"}],"top":2,"total":53820,"date":072617}
since the "top" and "total" attributes are showing up on each row of data (with the associated values also showing up on each row) when I put it through an analytical tool like Tableau.
Also, I have been editing and saving files with Brackets, and testing things through this converter (https://konklone.io/json/).
In JavaScript:
var m = JSON.parse(json_string);  // parse the original JSON text
m["date"] = "20170804";           // add the date attribute
var updated = JSON.stringify(m);  // serialize back to a JSON string
This will work for you, and it's very simple; write the resulting string back out to your file.

How to read a file and write to another file in Tcl, replacing values

I have three files: Conf.txt, Temp1.txt and Temp2.txt. I have used a regex to fetch some values from the config.txt file. I want to place those values (which appear under the same names in Temp1.txt and Temp2.txt) and create another two files, say Temp1_new.txt and Temp2_new.txt.
For example: in config.txt I have a value, say IP1, and the same name appears in Temp1.txt and Temp2.txt. I want to create Temp1_new.txt and Temp2_new.txt, replacing IP1 with, say, 192.X.X.X in Temp1.txt and Temp2.txt.
I would appreciate it if someone could help me with Tcl code to do this.
Judging from the information provided, there basically are two ways to do what you want:
File-semantics-aware;
Brute-force.
The first way is to read the source file, parse it to produce a structured in-memory representation of its content, then serialize this content to the new file after replacing the relevant value(s) in that representation.
The brute-force method means treating the contents of the source file as plain text (or a series of text strings) and running something like regsub or string replace on this text to produce the new text, which you then save to the new file.
The first way should generally be favoured, especially for complex cases, as it removes any chance of replacing irrelevant bits of text. The brute-force way may be simpler to code (if there's no handy library to do this, see below) and is therefore good for throw-away scripts.
Note that for certain file formats there are ready-made libraries which can be used to automate what you need. For instance, the XSLT facilities of the tdom package can be used to manipulate XML files, INI-style files can be modified using the appropriate library, and so on.

Sanitizing database inputs in Matlab?

Does Matlab's database toolbox have a function to sanitize inputs? I can't find any mention of one in the documentation.
I have a bunch of strings that I'd like to write to a MySQL database. Some of the strings contain apostrophes, and these are causing errors. I'm looking for a simple way to preprocess the strings to make them database-friendly.
Also, it's not necessary in my application to be able to reconstruct the original strings exactly. The preprocessing step never needs to be "undone".
In the end I used Matlab's genvarname function to preprocess my strings. This function doesn't do database sanitization, per se, and it's not invertible, but it does remove apostrophes. It met my needs.