Skipping unwanted rows while reading a flat file in SSIS - ssis

I have an SSIS package that reads data from a text file. The problem is that the file's contents are not straightforward: it contains special characters that cause trouble.
For example, right after the header row there is a row full of hyphens, something like -----------------------------------------------------------------------------------------
SSIS reads this as the first value of the first column, which causes the package to fail. How do I get rid of this without actually removing the row from the file itself?
Later in the file there are also some unwanted rows that I would like to ignore. The format of the file is something like this:
Header
Data
Random Rows
Same header row as above
Data
and so on.....
I would like to know if there's a way to handle this with a Script Task, or in any other way, before or while the 'Flat File Source' runs, without making changes to the original file.

I don't know of any way to filter these rows on input using the Flat File Source component, but you can definitely do some filtering if you read the file in with a Script Component.
If you add a reference to Microsoft.VisualBasic, you can use the function below to read your CSV into a DataTable (it needs using directives for System, System.Data, and Microsoft.VisualBasic.FileIO):
public static DataTable ReadInDataFromCSV(string fileName, string delimiter)
{
DataTable dtOutput = new DataTable();
//How many lines to read in. 0 for unlimited
int numberOfLines = 0;
using (TextFieldParser parser = new TextFieldParser(fileName))
{
parser.TextFieldType = FieldType.Delimited;
parser.SetDelimiters(delimiter);
//Are column names in first row?
bool columnNamesInFirstRow = true;
int rowCounter = 0;
string[] currentRow;
while (!parser.EndOfData && rowCounter <= numberOfLines)
{
try
{
currentRow = parser.ReadFields();
/*****************************
Add some kind of logic here to skip over rows you don't
want to read in
*****************************/
if (columnNamesInFirstRow == true)
{
foreach (string column in currentRow)
{
dtOutput.Columns.Add(column);
}
columnNamesInFirstRow = false;
}
else
{
DataRow dr;
dr = dtOutput.NewRow();
dr.ItemArray = currentRow;
dtOutput.Rows.Add(dr);
columnNamesInFirstRow = false;
}
}
catch (Exception e)
{
Console.WriteLine(e.Message);
}
rowCounter += (numberOfLines == 0) ? 0 : 1;
}
}
return dtOutput;
}
By default, the above code will read a flat file into a DataTable by calling something like:
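DataTable myInputData = ReadInDataFromCSV(@"Path to file", ",");
If you replace the comment I added inside the try/catch, you can filter out the rows you aren't interested in. Note that currentRow is a string array, so test one of its fields rather than the array itself. For example, to skip the rows of hyphens, you can add a simple check on the first field like:
if (currentRow.Length > 0 && currentRow[0].StartsWith("-----"))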
{
continue;
}
else
{
//If/else statement from the original code that adds the data to a DataRow and then adds it to the DataTable
}
Then you can simply add more similar checks to include/not include certain rows in your file. Good luck!
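The filtering rules themselves don't depend on C#. As a rough, language-neutral sketch in Python of how the checks could fit together for the Header / Data / Random Rows / repeated Header layout: the function name filter_rows, the sample lines, and the rule that a valid data row has the same number of fields as the header are all assumptions made for illustration, so adjust them to whatever your real file looks like.

```python
import csv
import io


def filter_rows(lines, delimiter=","):
    """Return (header, rows), skipping separator lines, repeated headers,
    and rows whose field count does not match the header.

    Assumption: the first usable line is the header row.
    """
    header = None
    rows = []
    for fields in csv.reader(lines, delimiter=delimiter):
        # Skip blank lines and lines whose fields contain only hyphens.
        if not fields or all(set(f.strip()) <= {"-"} for f in fields):
            continue
        if header is None:
            header = fields
            continue
        # Skip a repeated copy of the header further down the file.
        if fields == header:
            continue
        # Assumed rule: unwanted rows have a different number of fields.
        if len(fields) != len(header):
            continue
        rows.append(fields)
    return header, rows


# Hypothetical sample input, not taken from the original file.
sample = io.StringIO(
    "id,name\n"
    "-------\n"
    "1,alpha\n"
    "some random footer text\n"
    "id,name\n"
    "2,beta\n"
)
header, rows = filter_rows(sample)
print(header, rows)
```

The same three checks (separator line, repeated header, wrong field count) can be placed where the comment block sits in the C# function above.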

Related

SSIS Script: how to append text to the end of each row in a flat file?

I currently have a flat file with around 1 million rows.
I need to add a text string to the end of each row in the file.
I've been trying to adapt the following code, but without any success:
public void Main()
{
// TODO: Add your code here
var lines = System.IO.File.ReadAllLines(@"E:\SSISSource\Source\Source.txt");
foreach (string item in lines)
{
var str = item.Replace("\n", "~20221214\n");
var subitems = str.Split('\n');
foreach (var subitem in subitems)
{
// write the data back to the file
}
}
Dts.TaskResult = (int)ScriptResults.Success;
}
I can't seem to get the code to recognise the newline "\n", and I'm not sure how to write the row back to the file so that it replaces the existing row rather than adding a new one. Or is the above code sending me down a rabbit hole, and is there an easier method?
Many thanks for any pointers and/or assistance.
ReadAllLines is likely stripping the \n from each record, so your Replace won't find anything to replace.
Simply append your string to each line, and otherwise use @billinKC's solution.
BONUS:
I think DateTime.Now.ToString("yyyyMMdd"); is what you are trying to append to each line
Thanks @billinKC & @KeithL
KeithL, you were correct that the \n was stripped off. So I used a slightly amended version of @billinKC's code to get what I wanted:
string origFile = @"E:\SSISSource\Source\Sourcetxt";
string fixedFile = @"E:\SSISSource\Source\Source.fixed.txt";
// Make a blank file
System.IO.File.WriteAllText(fixedFile, "");
var lines = System.IO.File.ReadAllLines(@"E:\SSISSource\Source\Source.txt");
foreach (string item in lines)
{
var str = item + "~20221214\n";
System.IO.File.AppendAllText(fixedFile, str);
}
As an aside KeithL - thanks for the DateTime code however the text that I am appending is obtained from a header row in the source file which is being read into a variable in an earlier step.
I read your code as
For each line in the file, replace the existing newline character with ~20221214 newline
At that point, the value of str is what you need, just write that! Instead, you split based on the new line which gets you an array of values which could be fine but why do the extra operations?
string origFile = @"E:\SSISSource\Source\Sourcetxt";
string fixedFile = @"E:\SSISSource\Source\Source.fixed.txt";
// Make a blank file
System.IO.File.WriteAllText(fixedFile, "");
var lines = System.IO.File.ReadAllLines(@"E:\SSISSource\Source\Source.txt");
foreach (string item in lines)
{
var str = item.Replace("\n", "~20221214\n");
System.IO.File.AppendAllText(fixedFile, str);
}
Something like this ought to be what you're looking for.
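One thing worth considering with around a million rows: File.AppendAllText opens and closes the output file on every call, and ReadAllLines loads the whole input into memory first. A single streaming pass avoids both. Here is a minimal sketch of that idea in Python rather than C#, just to show the shape of the loop; the function name append_suffix and the temporary file names are made up for the example, and the suffix is passed in since yours comes from a variable.

```python
import os
import tempfile


def append_suffix(src_path, dst_path, suffix):
    """Copy src_path to dst_path, adding suffix to the end of every line."""
    # newline="" keeps the original line endings untouched on read and write.
    with open(src_path, "r", newline="") as src, \
            open(dst_path, "w", newline="") as dst:
        for line in src:
            # Separate the text from its line ending (if it has one).
            body = line.rstrip("\r\n")
            ending = line[len(body):]
            dst.write(body + suffix + ending)


# Small demonstration using throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "Source.txt")
    dst = os.path.join(tmp, "Source.fixed.txt")
    with open(src, "w", newline="") as f:
        f.write("row1\r\nrow2\r\nrow3")
    append_suffix(src, dst, "~20221214")
    with open(dst, "r", newline="") as f:
        result = f.read()
print(repr(result))
```

In C# the equivalent shape would be one reader and one writer held open for the whole loop, instead of reopening the target file for each line.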

Get Row Count for Data Transferred using EzAPI SSIS

I am transferring some data from one table to another using SSIS with EzAPI. How can I get the number of rows that were transferred?
My setup is as follows
EzPackage package = new EzPackage();
EzOleDbConnectionManager srcConn;
EzOleDbSource src;
EzOleDbConnectionManager destConn;
EzOleDbDestination dest;
EzDataFlow dataFlow;
destConn = new EzOleDbConnectionManager(package); //set connection string
srcConn = new EzOleDbConnectionManager(package);
dataFlow = new EzDataFlow(package);
src = Activator.CreateInstance(typeof(EzOleDbSource), new object[] { dataFlow }) as EzOleDbSource;
src.Connection = srcConn;
src.SqlCommand = odbcImport.Query;
dest = Activator.CreateInstance(typeof(EzOleDbDestination), new object[] { dataFlow }) as EzOleDbDestination;
dest.Connection = destConn;
dest.AttachTo(src, 0, 0);
dest.AccessMode = AccessMode.AM_OPENROWSET_FASTLOAD;
DTSExecResult result = package.Execute();
Where in this can I add something to get the number of rows? It needs to work for all versions of SQL Server from 2008 R2 up.
The quick answer is that the Row Count Transformation isn't included out of the box. I had a brief post about that: Row Count with EzAPI
I downloaded the source project from CodePlex and then edited EzComponents.cs (in EzAPI\src) and added the following code
[CompID("{150E6007-7C6A-4CC3-8FF3-FC73783A972E}")]
public class EzRowCountTransform : EzComponent
{
public EzRowCountTransform(EzDataFlow dataFlow) : base(dataFlow) { }
public EzRowCountTransform(EzDataFlow parent, IDTSComponentMetaData100 meta) : base(parent, meta) { }
public string VariableName
{
get { return (string)Meta.CustomPropertyCollection["VariableName"].Value; }
set { Comp.SetComponentProperty("VariableName", value); }
}
}
The component id above is only for 2008.
For 2012, it's going to be E26997D8C-70DA-42B2-8208-A19CE3A9FE41. I don't have a 2012 installation at the moment to confirm I didn't transpose a value there, but you can drop a Row Count component onto a data flow, right-click it, and look at the properties. The component/class ID is what that value needs to be. Similar story if you're dealing with 2005.
So, once you have the ability to use EzRowCountTransform, you can simply patch it into your existing script.
// Create an instance of our transform
EzRowCountTransform newRC = null;
// Create a variable to use it
Variable newRCVariable = null;
newRCVariable = package.Variables.Add("RowCountNew", false, "User", 0);
// ...
src.SqlCommand = odbcImport.Query;
// New code here too
newRC = new EzRowCountTransform(dataFlow);
newRC.AttachTo(src);
newRC.Name = "RC New Rows";
newRC.VariableName = newRCVariable.QualifiedName;
// Continue old code
I have a presentation on various approaches I've used over time and what I like/don't like about them. Type more, click less: a programmatic approach to building SSIS. It contains sample code for creating the EzRowCountTransform and usage.

Warning messages with EZAPI EzDerivedColumn and input columns

When adding a derived column to a data flow with ezAPI, I get the following warnings
"Add stuff here.Inputs[Derived Column Input].Columns[ad_zip]" on "Add
stuff here" has usage type READONLY, but is not referenced by an
expression. Remove the column from the list of available input
columns, or reference it in an expression.
I've tried to delete the input columns, but either the method is not working or I'm doing it wrong:
foreach (Microsoft.SqlServer.Dts.Pipeline.Wrapper.IDTSInputColumn100 col in derFull.Meta.InputCollection[0].InputColumnCollection)
{
Console.WriteLine(col.Name);
derFull.DeleteInputColumn(col.Name);
}
I have the following piece of code that fixes the problem.
I got it from a guy called Daniel Otykier, so he is probably the one who should be credited for it... unless he got it from someone else :-)
static public void RemoveUnusedInputColumns(this EzDerivedColumn component)
{
var usedLineageIds = new HashSet<int>();
// Parse all expressions used in new output columns, to determine which input lineage ID's are being used:
foreach (IDTSOutputColumn100 column in component.GetOutputColumns())
{
AddLineageIdsFromExpression(column.CustomPropertyCollection, usedLineageIds);
}
// Parse all expressions in replaced input columns, to determine which input lineage ID's are being used:
foreach (IDTSInputColumn100 column in component.GetInputColumns())
{
AddLineageIdsFromExpression(column.CustomPropertyCollection, usedLineageIds);
}
var inputColumns = component.GetInputColumns();
// Remove all input columns not used in any expressions:
for (var i = inputColumns.Count - 1; i >= 0; i--)
{
if (!usedLineageIds.Contains(inputColumns[i].LineageID))
{
inputColumns.RemoveObjectByIndex(i);
}
}
}
static private void AddLineageIdsFromExpression(IDTSCustomPropertyCollection100 columnProperties, ICollection<int> lineageIds)
{
int lineageId = 1;
var expressionProperty = columnProperties.Cast<IDTSCustomProperty100>().FirstOrDefault(p => p.Name == "Expression");
if (expressionProperty != null)
{
// Input columns used in expressions are always referenced as "#xxx" where xxx is the integer lineage ID.
var expression = expressionProperty.Value.ToString();
var expressionTokens = expression.Split(new[] { ' ', ',', '(', ')' });
foreach (var c in expressionTokens.Where(t => t.Length > 1 && t.StartsWith("#") && int.TryParse(t.Substring(1), out lineageId)))
{
if (!lineageIds.Contains(lineageId)) lineageIds.Add(lineageId);
}
}
}
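To make the parsing step easier to follow outside of SSIS, here is a small Python sketch of the same idea: pull every "#<number>" reference out of an expression string and collect the numbers as the set of used lineage IDs. It uses a regular expression instead of splitting on spaces, commas, and parentheses, so it is a swapped-in technique rather than a port of the code above. The function name used_lineage_ids and the sample expression are invented for the example.

```python
import re


def used_lineage_ids(expression):
    """Return the set of integer lineage IDs referenced as #<digits>."""
    return {int(m) for m in re.findall(r"#(\d+)", expression)}


# Hypothetical expression text, only for illustration.
ids = used_lineage_ids("UPPER(#42) + #7 + SUBSTRING(#42,1,3)")
print(sorted(ids))
```

One difference to be aware of: a regex like this also picks up references that sit directly next to an operator (for example "#42+#7"), which a split on only space, comma, and parentheses would miss. It would also match a "#123" that happens to appear inside a string literal, so test it against your own expressions.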
Simple but not 100% Guaranteed Method
Call ReinitializeMetaData on the base component that EzApi is extending:
dc.Comp.ReinitializeMetaData();
This doesn't always respect some of the customizations and logic checks that EzAPI has, so test it carefully. For most vanilla components, though, this should work fine.
100% Guaranteed Method But Requires A Strategy For Identifying Columns To Ignore
You can set the UsageType property of those VirtualInputColumns to the enumerated value DTSUsageType.UT_IGNORED using EzApi's SetUsageType wrapper method.
But! You have to do this after you're done modifying any of the other metadata of your component (attaching other components, adding new input or output columns, etc.) since each of these triggers the ReinitializeMetaData method on the component, which automatically sets (or resets) all UT_IGNORED VirtualInputColumn's UsageType to UT_READONLY.
So some sample code:
// define EzSourceComponent with SourceColumnToIgnore output column, SomeConnection for destination
EzDerivedColumn dc = new EzDerivedColumn(this);
dc.AttachTo(EzSourceComponent);
dc.Name = "Errors, Go Away";
dc.InsertOutputColumn("NewDerivedColumn");
dc.Expression["NewDerivedColumn"] = "I was inserted!";
// Right here, UsageType is UT_READONLY
Console.WriteLine(dc.VirtualInputCol("SourceColumnToIgnore").UsageType.ToString());
EzOleDbDestination d = new EzOleDbDestination(f);
d.Name = "Destination";
d.Connection = SomeConnection;
d.Table = "dbo.DestinationTable";
d.AccessMode = AccessMode.AM_OPENROWSET_FASTLOAD;
d.AttachTo(dc);
// Now we can set usage type on columns to remove them from the available inputs.
// Note the false boolean at the end.
// That's required to not trigger ReinitializeMetadata for usage type changes.
dc.SetUsageType(0, "SourceColumnToIgnore", DTSUsageType.UT_IGNORED, false);
// Now UsageType is UT_IGNORED and if you saved the package and viewed it,
// you'll see this column has been removed from the available input columns
// ... and the warning for it has gone away!
Console.WriteLine(dc.VirtualInputCol("SourceColumnToIgnore").UsageType.ToString());
I was having exactly your problem and found a way to solve it. The problem is that EzDerivedColumn does not have PassThrough defined in its class.
You just need to add this to the class:
private PassThroughIndexer m_passThrough;
public PassThroughIndexer PassThrough
{
get
{
if (m_passThrough == null)
m_passThrough = new PassThroughIndexer(this);
return m_passThrough;
}
}
And alter ReinitializeMetaDataNoCast() to this:
public override void ReinitializeMetaDataNoCast()
{
try
{
if (Meta.InputCollection[0].InputColumnCollection.Count == 0)
{
base.ReinitializeMetaDataNoCast();
LinkAllInputsToOutputs();
return;
}
Dictionary<string, bool> cols = new Dictionary<string, bool>();
foreach (IDTSInputColumn100 c in Meta.InputCollection[0].InputColumnCollection)
cols.Add(c.Name, PassThrough[c.Name]);
base.ReinitializeMetaDataNoCast();
foreach (IDTSInputColumn100 c in Meta.InputCollection[0].InputColumnCollection)
{
if (cols.ContainsKey(c.Name))
SetUsageType(0, c.Name, cols[c.Name] ? DTSUsageType.UT_READONLY : DTSUsageType.UT_IGNORED, false);
else
SetUsageType(0, c.Name, DTSUsageType.UT_IGNORED, false);
}
}
catch { }
}
That is the strategy used by other components. If you want to see all the code you can check my EzApi2016@GitHub. I'm updating the original code from Microsoft to SQL Server 2016.

Create a Non-Database-Driven Lookup

Lots of references for creating lookups out there, but all seem to draw their values from a query.
I want to add a lookup to a field that will add items from a list of values that do not come from a table, query, or any other data source.
Such as from a string: "Bananas, Apples, Oranges"
..or a container ["Bananas", "Apples", "Oranges"]
Assume the string/container is a dynamic object. Drawing from an static enum is not a choice.
Is there a way to create lookups on the fly from something other than a data source?
Example code would be a great help, but I'll take hints as well.
There is the color picker.
Also in the Global you will find pickXxxx such as pickList.
There are others, pickUser, pickUserGroup etc.
Take a look at the implementation. I guess they build a temporary table and then display that. Tables are great!
Update:
To go on your own, follow the rules.
For the advanced user, see also: Lookup form returning more than one value.
public void lookup()
{
SysTableLookup sysTableLookup;
TmpTableFieldLookup tmpTableFieldLookup;
Enumerator en;
List entitylist = new list(types::String);
entitylist.addend("Banana");
entitylist.addend("Apple");
en = entityList.getEnumerator();
while (en.moveNext())
{
tmpTableFieldLookup.TableName = en.current();
tmpTableFieldLookup.insert();
}
sysTableLookup = SysTableLookup::newParameters(tableNum(tmpTableFieldLookup), this);
sysTableLookup.addLookupfield(fieldNum(TmpTableFieldLookup, TableName));
//BP Deviation documented
sysTableLookup.parmTmpBuffer(tmpTableFieldLookup);
sysTableLookup.performFormLookup();
}
The above code helps in displaying strings as lookup.
I'm also guessing there's no way to perform a lookup without a table. I say that because a lookup is simply a form with one or more datasources that is displayed in a different way.
I've also blogged about this, so you can get some info on how to perform a lookup, even with a temporary table, here:
http://devexpp.blogspot.com.br/2012/02/dynamics-ax-custom-lookup.html
Example from global::PickEnumValue:
static int pickEnumValue(EnumId _enumId, boolean _omitZero = false)
{
Object formRun;
container names;
container values;
int i,value = -1,valueIndex;
str name;
#ResAppl
DictEnum dictEnum = new DictEnum(_enumId);
;
if (!dictEnum)
return -1;
for (i=1;i<=dictEnum.values();i++)
{
value = dictEnum.index2Value(i);
if (!(_omitZero && (value == 0)))
{
names += dictEnum.index2Label(i);
values += value;
}
}
formRun = classfactory.createPicklist();
formRun.init();
formRun.choices(names, #ImageClass);
formRun.caption(dictEnum.label());
formRun.run();
formRun.wait();
name = formRun.choice();
value = formRun.choiceInt();
if (value>=0) // the picklist form returns -1 if a choice has not been made
{
valueIndex = -1;
for (i=1;i<=conLen(names);i++)
{
if (name == conPeek(names,i))
{
valueIndex = i;
break;
}
}
if (valueIndex>=0)
return conPeek(values,valueIndex);
}
return value;
}
It isn't the most graceful solution, but this does work, and it doesn't override or modify any native AX 2012 objects:
Copy the sysLookup form from AX2009 (rename it) and import it into AX 2012.
We'll call mine myLookupFormCopy.
I did a find/replace of "sysLookup" in the XPO file to rename it.
Create this class method:
public static client void lookupList(FormStringControl _formStringControl, List _valueList, str _columnLabel = '')
{
Args args;
FormRun formRun;
;
if (_formStringControl && _valueList && _valueList.typeId() == Types::String)
{
args = new Args(formstr(myLookupFormCopy));
args.parmObject(_valueList);
args.parm(_columnLabel);
formRun = classFactory.formRunClass(args);
_formStringControl.performFormLookup(formRun);
}
}
In the lookup method for your string control, use:
public void lookup()
{
List valueList = new List(Types::String);
;
...build your valueList here...
MyClass::lookupList(this, valueList, "List Title");
super();
}

What's the fastest way to search a very long list of words for a match in actionscript 3?

So I have a list of words (the entire English dictionary).
For a word matching game, when a player moves a piece I need to check the entire dictionary to see if the word that the player made exists in the dictionary. I need to do this as quickly as possible. Simply iterating through the dictionary is way too slow.
What is the quickest algorithm in AS3 to search a long list like this for a match, and what datatype should I use? (i.e. Array, Object, Dictionary, etc.)
I would first go with an Object, which is a hash table (at least, storage-wise).
So, for every word in your list, make an entry in your dictionary Object and store true as its value.
Then, you just have to check if a given word is a key in your dictionary to know whether the word the user has chosen is valid or not.
This works really fast in this simple test (with 10,000,000 entries):
var dict:Object = {};
for(var i:int = 0; i < 10000000; i++) {
dict[i] = true;
}
var btn:Sprite = new Sprite();
btn.graphics.beginFill(0xff0000);
btn.graphics.drawRect(0,0,50,50);
btn.graphics.endFill();
addChild(btn);
btn.addEventListener(MouseEvent.CLICK,checkWord);
var findIt:Boolean = true;
function checkWord(e:MouseEvent):void {
var word:String;
if(findIt) {
word = "3752132";
} else {
word = "9123012456";
}
if(dict[word]) {
trace(word + " found");
} else {
trace(word + " not found");
}
findIt = !findIt;
}
It takes a little longer to build the dictionary, but lookup is almost instantaneous.
The only caveat is that you will have to consider certain keys that will pass the check and not necessarily be part of your words list. Words such as toString, prototype, etc. There are just a few of them, but keep that in mind.
I would try something like this with your real data set. If it works fine, then you have a really easy solution. Go have a beer (or whatever you prefer).
Now, if the above doesn't really work after testing it with real data (notice I've built the list with numbers cast as strings for simplicity), then here are a couple of options, off the top of my head:
1) Partition the first dict into a set of dictionaries. So, instead of having all the words in dict, have a dictionary for words that begin with 'a', another for 'b', etc. Then, before looking up a word, check the first char to know where to look it up.
Something like:
var word:String = "hello";
var dictKey:String = word.charAt(0);
// actual check
if(dict[dictKey][word]) {
trace("found");
} else {
trace("not found");
}
You can eventually repartition if necessary. I.e, make dict['a'] point to another set of dictionaries indexed by the first two characters. So, you'll have dict['a']['b'][wordToSearch]. There are a number of possible variations on this idea (you'd also have to come up with some strategy to cope with words of two letters, such as "be", for instance).
2) Try a binary search. The problem with it is that you'll first have to sort the list up front. You have to do it just once, as it doesn't make sense to remove words from your dict. But with millions of words, it might be rather intensive.
3) Try some fancy data structures from open source libraries such as:
http://sibirjak.com/blog/index.php/collections/as3commons-collections/
http://lab.polygonal.de/ds/
But again, as I said above, I'd first try the easiest and simpler solution and check if it works against the real data set.
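For option 2, the lookup itself is short once the list is sorted. A minimal sketch in Python (not AS3) using the standard bisect module; the function name contains and the word list are made up for the example, and the words are lower-cased before sorting so that checks can be case-insensitive.

```python
from bisect import bisect_left


def contains(sorted_words, word):
    """Binary search: True if word is present in the sorted list."""
    i = bisect_left(sorted_words, word)
    return i < len(sorted_words) and sorted_words[i] == word


# Sort once up front; every lookup afterwards is O(log n).
words = sorted(w.lower() for w in ["hello", "World", "foo", "bar", "constructor"])
found = contains(words, "world")
missing = contains(words, "asdfds")
print(found, missing)
```

Because the list holds plain strings rather than object keys, a word like "constructor" needs no special handling here.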
Added
A simple way to deal with keywords used for Object's built-in properties:
var dict:Object = {};
var keywordsInDict:Array = [];
function buildDictionary():void {
// let's assume this is your original list, retrieved
// from XML or other external means
// it contains "constructor", which should be dealt with
// separately, as it's a built-in prop of Object
var sourceList:Array = ["hello","world","foo","bar","constructor"];
var len:int = sourceList.length;
var word:String;
// just a dummy vanilla object, to test if a word in the list
// is already in use internally by Object
var dummy:Object = {};
for(var i:int = 0; i < len; i++) {
// also, lower-casing is a good idea
// do that when you check words as well
word = sourceList[i].toLowerCase();
if(!dummy[word]) {
dict[word] = true;
} else {
// it's a keyword, so store it separately
keywordsInDict.push(word);
}
}
}
Now, just add an extra check for built-in props in the checkWord function:
function checkWord(e:MouseEvent):void {
var word:String;
if(findIt) {
word = "Constructor";
} else {
word = "asdfds";
}
word = word.toLowerCase();
var dummy:Object = {};
// check first if the word is a built-in prop
if(dummy[word]) {
// if it is, check if that word was in the original list
// if it was present, we've stored it in keywordsInDict
if(keywordsInDict.indexOf(word) != -1) {
trace(word + " found");
} else {
trace(word + " not found");
}
// not a built-in prop, so just check if it's present in dict
} else {
if(dict[word]) {
trace(word + " found");
} else {
trace(word + " not found");
}
}
findIt = !findIt;
}
This isn't specific to ActionScript, but a Trie is a suitable data structure for storing words.
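For reference, a trie stores one node per character, so looking up a word costs time proportional to the word's length, and you can also ask whether any word starts with a given prefix. A minimal sketch in Python (the class and method names are my own, and the sample words are arbitrary); the same structure could be built in AS3 with nested Objects or a Dictionary per node.

```python
class Trie:
    """A minimal trie: each node maps a character to a child node."""

    def __init__(self):
        self.children = {}
        self.is_word = False  # True if a stored word ends at this node

    def insert(self, word):
        node = self
        for ch in word:
            child = node.children.get(ch)
            if child is None:
                child = Trie()
                node.children[ch] = child
            node = child
        node.is_word = True

    def _walk(self, text):
        """Follow text character by character; None if the path breaks."""
        node = self
        for ch in text:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        node = self._walk(word)
        return node is not None and node.is_word

    def starts_with(self, prefix):
        return self._walk(prefix) is not None


trie = Trie()
for w in ["hello", "help", "be"]:
    trie.insert(w)
print(trie.contains("help"), trie.contains("hel"), trie.starts_with("hel"))
```

For a plain "is this word valid" check a hash lookup is usually simpler; the trie mostly pays off if the game also needs prefix queries.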