FINDSTRING multiple criteria for derived column in SSIS

I am new to SSIS and I am having an issue populating a derived column based on more than one criterion matched against strings in a column.
I have managed to get it working with a single criteria as an example:
FINDSTRING(OS,"Server",1) > 0 ? "Server" : "Desktop"
The above works and maps anything with "Server" in the OS to "Server" and everything else to "Desktop", but I have other strings that can identify a server. What I have tried as an example is:
FINDSTRING(OS,"Server", "Red Hat", "AIX",1) > 0 ? "Server" : "Desktop"
I basically have about 10 keywords that identify a server in the OS column, so I want to output "Server" for these in the derived column and "Desktop" for anything without those strings.
Is that possible? I thought about doing 10 different FINDSTRINGs, but I assumed the outputs would overwrite each other.
Thank you.

That's not valid syntax, so no, what you're directly attempting won't work.
If you have a constrained list of values, i.e. a static list of server keywords, then I would take the approach of adding 10 Derived Columns to your data flow.
They will all follow the same pattern: each adds a new boolean (true/false) column to the data flow. You do not need to bring those values into your final table, but you will use them to compute the final value of whether this is a desktop or server operating system.
DER HPUX
Add a Derived Column to the data flow and name it DER HPUX, where the final word(s) are the server keyword. You'll then add a new column to the data flow using the FINDSTRING syntax and name it along the lines of your component name:
hasHPUX: FINDSTRING(LOWER([OS]), "hpux") > 0
Note that I have explicitly cast the OS to lower case here and done the same with my argument to FINDSTRING, as I don't know for certain whether Red Hat will always be Red Hat and not red hat or RED HAT. You know your data better, but that may help if the data is inconsistently formatted.
Repeat this pattern for all your keywords
DER IsServer
Here I'll create another intermediate column called IsServer, and all I'm going to do is OR together all the preceding has* columns:
hasHPUX || hasRedHat || hasAix || etc
DER Determine Server or Desktop
Finally, we're ready to use the newly created IsServer column to populate the column OSClass, or whatever you want to call it:
(isServer) ? "Server" : "Desktop"
Wow, that's a lot of work, why in the world would you do all of that?
Testing.
You can't debug an expression. By breaking all of the complex logic down into tiny nibbles, you can put a Data Viewer in the package after the DER isServer and at a glance determine why something is or isn't setting the appropriate flag value.
Sure, you could do this in a single expression like the syntax you are trying but help your future self by not doing that. When your expression is so long you can't see all of it in the editor, it's time to break it into smaller units.
(FINDSTRING(LOWER([OS]), "hpux") > 0 || FINDSTRING(LOWER([OS]), "red hat") > 0 || FINDSTRING(LOWER([OS]), "aix") > 0 || etc) ? "Server" : "Desktop"
References
FINDSTRING
|| aka OR
?: aka ternary operator

Related

Is there a way to store database modifications with a versioning feature (for eventual version comparison)?

I'm working on a project where users can upload Excel files into a MySQL database. Those files are the main source of our data, as they come directly from the contractors working with the company. They contain a large number of rows (23,000 on average per file) and 100 columns per row!
The problem I am facing currently is that the same file could be changed by someone (either the contractor or the company), and when it is re-uploaded my system should detect the changes, update the actual data, and save the action (the fact that a cell went from one value to another :: oldValue -> newValue) so we can go back and run a version comparison (e.g. 3 re-uploads === 3 versions), such as oldValue in Version 1 vs newValue in Version 5.
I developed a tiny mechanism for saving the changes: I have a table to save import data (each time a user imports a file, a new row is inserted into this table) and another table for saving the actual changes.
Versioning data
I save the id of the row that has some changes, as well as the id and the table where the actual data was modified (uploading a file results in insertions into multiple tables, so whenever a change occurs I need to know in which table it happened). I also save the new value and the old value, which will help me with restoring the archived data.
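For illustration, here is a rough sketch of what the two tables described above might look like. The column names idImport, rowId, tableName, oldValue and newValue come from the queries and code below; the remaining names and all the types are assumptions, not the exact schema:
-- Sketch only: fileName, importedAt, fieldName and all types are assumed
CREATE TABLE Imports (
    idImport   INT AUTO_INCREMENT PRIMARY KEY,   -- one row per upload = one "version"
    fileName   VARCHAR(255),
    importedAt DATETIME
);
CREATE TABLE Archive (
    idArchive  INT AUTO_INCREMENT PRIMARY KEY,
    idImport   INT,            -- which upload/version produced the change
    tableName  VARCHAR(64),    -- target table in which the changed row lives
    rowId      INT,            -- id of the changed row in that table
    fieldName  VARCHAR(64),    -- which column changed
    oldValue   TEXT,
    newValue   TEXT,
    FOREIGN KEY (idImport) REFERENCES Imports (idImport)
);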
To restore a version: SELECT * FROM `Archive` WHERE idImport = ${versionNumber}
To restore a version for one row: SELECT * FROM `Archive` WHERE idImport = ${versionNumber} AND rowId = ${rowId}
To restore all versions for one row: SELECT * FROM `Archive` WHERE rowId = ${rowId}
To restore versions for one table: SELECT * FROM `Archive` WHERE tableName = ${table}
Etc.
Now with this structure, I'm struggling to restore a version or to run a comparison between two versions, which makes me think that I've come up with the wrong approach, since it makes it hard to do the job! I am trying to find out if anyone has done this before, or what a good approach would look like.
Cases when things get really messy:
The rows that have changed in one version might not have changed in the other version (I am working on a "time machine" to search through the other versions when this happens).
The rows have changed in both versions, but not in the same fields. (Say we have a user table, and the data of the user with id 15 has changed in the 2nd and 5th uploads. In the second version only the name was changed, but in the fifth version the address was changed! When comparing these two versions, we run into a problem constructing our data array: name went from "some" -> NULL (but the name was never null; there was simply no name change in the 5th version) and address went from NULL -> "some", which is obviously wrong.)
My actual approach (PHP)
<?php
// Join record sets and compare them
foreach ($firstRecord as $frecord) {
    // Retrieve the first record's fields that have changed
    $fFields = $frecord->fieldName;
    // Check if the same record has changed in the second version as well
    $sId = array_search($frecord->idRecord, $secondRecord);
    if ($sId !== false) {
        $srecord = $secondRecord[$sId];
        // Retrieve the second record's fields that have changed
        $sFields = $srecord->fieldName;
        // Compare the two records' fields
        foreach ($fFields as $fField) {
            $sfId = array_search($fField, $sFields);
            // The same field for the same record was changed in both versions (perfect case)
            if ($sfId !== false) {
                $deltaRow[$fField]["oldValue"] = $frecord->deltaValue;
                $deltaRow[$fField]["newValue"] = $srecord->deltaValue;
                // Remove the checked field from the second version to avoid re-checking
                unset($sFields[$sfId]);
            }
            // The changed field in V1 was not found in V2 -> look up a value
            else {
                $deltaRow[$fField]["oldValue"] = $frecord->deltaValue;
                $deltaRow[$fField]["newValue"] = $this->valueLookUp();
            }
        }
        $dataArray[] = $deltaRow;
        // Remove the checked record from the second version set to avoid re-checking
        unset($secondRecord[$sId]);
    }
}
I don't know how to deal with that. As I said, I'm working on a value lookup algorithm, so when no data is found in a version I will try to find it in the versions between these two so I can construct my data array. I would be very happy if anyone could give some hints, ideas, or improvements so I can go further with this.
Thank you!
Is there a way to store database modifications with a versioning feature (for eventual version comparison)?
What constitutes versioning depends on the database itself and how you make use of it.
As far as a relational database is concerned (e.g. MariaDB), this boils down to the so-called normal forms, which come in numbered levels.
On Database Normalization: 5th Normal Form and Beyond you can find the following guidance:
Beyond 5th normal form you enter the heady realms of domain key normal form, a kind of theoretical ideal. Its practical use to a database designer os [sic!] similar to that of infinity to a bookkeeper - i.e. it exists in theory but is not going to be used in practice. Even the most demanding owner is not going to expect that of the bookkeeper!
One strategy to step into these realms is to reach the 5th normal form first (do this just in theory, by going through all the normal forms, and study database normalization).
Additionally, you can implement versioning outside of, and in addition to, the database itself, e.g. by creating your own versioning system. Reading about what you can do with normalization will help you find better ways to decide how to structure and handle the database data for your versioning needs.
However, as written, it depends on what you want and need, so no straightforward "code" answer can be given to such a general question.

SSIS: how to handle variable length in Derived Column to avoid truncation

I have this setup like in the illustration attached; I hope it is clear enough. I load all xls files in a loop from a given folder.
While implementing a new Derived Column box to record the FileName from my loop, I got a truncation error. My variable was initially set to CCS.xls (Len=7, the shortest name).
I tried to increase Length in the Derived Column Editor but failed to do this, as it's not active and I cannot type anything there. I then tracked down that the original Length came from the variable's value. In the Variables window I have DataType = String and no option to set a length.
So for now I made a dummy empty file with the long name CCS____1.xls to avoid this problem, and it works OK. But I want to learn another, better way to avoid this problem; it looks like in this setup the data connection needs to use the file with the longest name(?)
You can change the Length property to 50 or larger manually in the Advanced Editor:
Right-Click on the Derived column->Show Advanced Editor->Input and Output Properties->Derived Column Output->Output Columns->the new Column->Data Type Properties->Length

Reshape the dataset into a more relational format (transpose SOME rows and assign them to a data subset)

I have a spreadsheet/csv:
Code:,101,Course Description:,"Introduction to Rocket Science",
Student Name,Lecture Hours,Labs Hours,Test Score,Status
John Galt,48,120,4.7,Passed
James Taggart,50,120,4.9,Passed
...
I need to reshape it to the following view:
Code:,Course Description:,Students,Lecture Hours,Labs Hours,Average Test Score,Teaching Staff
101,"Introduction to Rocket Science",John Galt,48,120,4.7,Passed
101,"Introduction to Rocket Science",James Taggart,50,120,4.9,Passed
...
Believe it or not, I cannot get the right idea of how to do that, despite it seeming to be a very primitive transformation. Is there any silver bullet for this?
The original records (csv) have, in a way, a JSON-like structure, so my first approach was to represent the original data as a vector and then transpose it (but in this case my resulting table looks like a sparse matrix - the rows I have transposed are blank in the rest of their values).
Another way I'm considering is to serialize it into JSON and then de-serialize it into a new spreadsheet (jsonize()); in this case, I'm having problems with merging them properly.
In both ways I have it "half-working".
Can anyone suggest a simple and reliable algorithm for this?
Any language, regex, tools, or code snippets are very much appreciated.
Assuming that the pattern you've described here is consistent throughout, there are quite a few different approaches you could take, I think, but in all cases you can basically use the fact that the 'Course' rows start with "Code:", which is never going to be a student name.
You can take advantage of this either by a regular expression find/replace, or within OpenRefine.
Example:
Open the file in a text editor that supports regular expressions in find/replace.
Search for lines starting with 'Code:' and add additional commas to the start of the row to shift the course data columns to the right, e.g. search for: ^Code: and replace with: ,,,,,Code:
If you now import the file into OpenRefine, you'll have a project with 10 columns (the 10th column is caused by the trailing comma at the end of the course data row).
You can now use Transpose (or just rename) on the right-most columns, which contain the course data, while leaving the left-most columns, which contain the student details.
Isolate the rows that contain the phrase 'Student Name' in the first column and remove them (via a filter or facet).
Move the Course Code/Description columns to the beginning of the project, and use the 'Edit Cells->Fill Down' option on each column to get the values repeated on all the relevant lines.
Finally, rename the columns as you want and remove any extraneous columns.

MySQL - return one row from 2 rows in the same table, overwrite the contents of the first 'default' with the populated fields of the second 'override'

I am trying to make use of the mobile device lookup data in the WURFL database at http://wurfl.sourceforge.net/smart.php, but I'm having problems getting my head around the MySQL code needed (I use ColdFusion for the server backend). To be honest, it's really doing my head in, but I'm sure there is a straightforward approach to this.
The WURFL is supplied as XML (approx. 15,200 records to date); I have already written the method that saves the data to a MySQL database. Now I need to get the data back out in a useful way!
Basically it works like this: firstly run a select using the userAgent data from a CGI pull to match against a known mobile device (row 1) using LIKE; if found then use the resultant fallback field to look up the default data for the mobile device's 'family root' (row 2). The two rows need to be combined by overwriting the contents of (row 2) with the specific mobile device's features of (row 1). Both rows contain NULL entries and not all the features are present in (row 1).
I just need the fully populated row of data returned if a match is found. I hope that makes sense, I would provide what I think the SQL should look like but I will probably confuse things even more.
Really appreciate any assistance!
This would be my shot at it in SQL Server. You would need to use IFNULL instead of ISNULL:
SELECT
    ISNULL(row1.Feature1, row2.Feature1) AS Feature1
    , ISNULL(row1.Feature2, row2.Feature2) AS Feature2
    , ISNULL(row1.Feature3, row2.Feature3) AS Feature3
FROM
    featureTable row1
    LEFT OUTER JOIN featureTable row2 ON row1.fallback = row2.familyroot
WHERE row1.userAgent LIKE '%Some User Agent String%'
This should accomplish the same thing in MySQL:
SELECT
    IFNULL(row1.Feature1, row2.Feature1) AS Feature1
    , IFNULL(row1.Feature2, row2.Feature2) AS Feature2
    , IFNULL(row1.Feature3, row2.Feature3) AS Feature3
FROM
    featureTable AS row1
    LEFT OUTER JOIN featureTable AS row2 ON row1.fallback = row2.familyroot
WHERE row1.userAgent LIKE '%Some User Agent String%'
So what this does is take your feature table and alias it as row1 to get your specific model's features. We then join it back to itself as row2 to get the family features. The ISNULL function then says "if there is no Feature1 value in row1 (it's null), then get the Feature1 value from row2".
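For a quick illustration of that override behaviour with purely hypothetical values (not taken from the WURFL data):
SELECT IFNULL('640', NULL);   -- returns '640': row1 has a value, so it wins
SELECT IFNULL(NULL, '320');   -- returns '320': row1 is NULL, so we fall back to row2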
Hope that helps.

SSIS handling NULL and blank spaces

Hello, I am new to SSIS and I am receiving a text file created by SSIS. I am using the wizard to load it into an Oracle table, but in the text file there are columns containing the string NULL and others containing a blank string instead of a zero-length column. Is there an automatic way to make these values become actual NULL values in the table, or do I have to create a derived column for each one of these cases?
Thank you.
Within an SSIS project in the SQL Server Data Tools for Visual Studio 2015/SQL Server 2016, the way to address the handling of empty columns seems to be via a property of the Flat File Source component (not certain whether space-only columns qualify):
Right-click the Flat File Source and choose Show Advanced Editor....
Select the Component Properties tab.
Set RetainNulls property to True (default is False).
If you want to convert the value into NULL when your input value is empty/blank, then you can try the following (under the assumption the data type is string/varchar):
LEN(TRIM([ColumnName]))==0 ? NULL(DT_WSTR, 10) : [ColumnName]
I faced the same issue. You can use a Script Component and add the code below to loop through all the columns and replace each text "NULL" with an actual null value...
// Note: requires "using System.Reflection;" at the top of the script
foreach (PropertyInfo dataColumn in Row.GetType().GetProperties())
{
    // Skip the generated *_IsNull helper properties and any non-string columns
    if (dataColumn.Name.ToLower().EndsWith("_isnull") == false && dataColumn.PropertyType == typeof(string))
    {
        object objValue = dataColumn.GetValue(Row, null);
        if (objValue != null && objValue.ToString() == "NULL")
        {
            dataColumn.SetValue(Row, null, null);
        }
    }
}
If you're using SSIS 2008, there's also the Null Manager component from Tactek Data Systems. It isn't free, but it's pretty cheap - like $10 bucks. (www.tactek.com). You can convert empty strings to nulls, nulls to empty strings, and nulls to "filler" values like "Unknown" or "NA".
I don't think there is any way to do this using the standard Flat File Source SSIS provides. To do this I make use of a custom component called Delimited File Source, which can be downloaded here: http://ssisdfs.codeplex.com/. As its name indicates, it's also much better at handling delimited files, plus it has the option of treating empty strings as NULL.