SSIS: How to read WebSphere MQ, transform, and write to flat file?

I have data on a WebSphere MQ queue. I've written a script task to read the data, and I can output it to a variable or a text file. But I want to use that as input to a dataflow step and transform the data. The ultimate destination is a flat file.
Is there a way to read the variable as a source into a dataflow step? I could write the MQ data to a text file and read the text file in the dataflow, but that seems like a lot of overhead. Or I could skip the dataflow altogether and write all the transformations in a script (but then why bother with SSIS in the first place?)
Is there a way to write a Raw File out of the script step, to pass into the dataflow component?
Any ideas appreciated!

If you've got the script that consumes the webservice, you can skip all the intermediary outputs and simply use it as a source in your dataflow.
Drag a Data Flow Task onto the canvas and then add a Script Component. Instead of selecting Transformation (the last option), select Source.
Double-click the Script Component and go to the Input and Output Properties. Under Output 0, select Output Columns and click Add Column once for each column the web service returns. Name them appropriately and be certain to correctly define their metadata.
Once the columns are defined, click back to the Script tab, select your language, and edit the script. Take all of your existing code that consumes the service; we'll use it here.
In the CreateNewOutputRows method, you will need to iterate through the results of the Websphere MQ request. For each row that is returned, you would apply the following pattern.
public override void CreateNewOutputRows()
{
    // TODO: Add code here or in PreExecute to fill the iterable object, mqcollection
    foreach (var row in mqcollection)
    {
        // Adds a new row into the downstream buffer
        Output0Buffer.AddRow();

        // Assign all the data to the correct locations
        Output0Buffer.Column = row.Column;
        Output0Buffer.Column1 = row.Column1;

        // Handle nulls appropriately
        if (string.IsNullOrEmpty(row.Column2))
        {
            Output0Buffer.Column2_IsNull = true;
        }
        else
        {
            Output0Buffer.Column2 = row.Column2;
        }
    }
}
You must handle nulls via the _IsNull attribute or your script will blow up. It's tedious work compared to a normal source, but it will be faster and consume fewer resources than dumping to disk or some other staging mechanism.

Since I ran into some additional "gotchas", I thought I'd post my final solution.
The script I am using does not call a webservice, but directly connects to and reads the WebSphere MQ queue. However, in order to do this, I have to add a reference to amqmdnet.dll.
You can add a reference to a Script Task (which sits on the Control Flow canvas), but not to a Script Component (which is part of the Data Flow).
So I have a Script Task, with the reference and code to read the contents of the queue. Each line in the queue is just a fixed-width record, and each is added to a List. At the end, the List is put into a Read/Write object variable declared at the package level.
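For reference, here is a minimal sketch of what that Script Task's Main method might look like. The queue manager name, queue name, and the User::MQRecords variable are placeholders for whatever your package actually uses, and it assumes the messages arrive as plain text:

// Script Task (Control Flow); requires a project reference to amqmdnet.dll.
// "MY.QMGR", "MY.QUEUE" and User::MQRecords are placeholder names.
using System.Collections.Generic;
using IBM.WMQ;

public void Main()
{
    var lines = new List<string>();

    MQQueueManager queueManager = new MQQueueManager("MY.QMGR");
    MQQueue queue = queueManager.AccessQueue(
        "MY.QUEUE",
        MQC.MQOO_INPUT_AS_Q_DEF | MQC.MQOO_FAIL_IF_QUIESCING);

    try
    {
        while (true)
        {
            MQMessage message = new MQMessage();
            queue.Get(message);

            // Each message is one fixed-width record; read it as text
            lines.Add(message.ReadString(message.MessageLength));
        }
    }
    catch (MQException ex)
    {
        // MQRC_NO_MSG_AVAILABLE (2033) just means the queue is empty
        if (ex.ReasonCode != MQC.MQRC_NO_MSG_AVAILABLE) throw;
    }
    finally
    {
        queue.Close();
        queueManager.Disconnect();
    }

    // Hand the whole list to the data flow via the package-level object variable
    Dts.Variables["User::MQRecords"].Value = lines;
    Dts.TaskResult = (int)ScriptResults.Success;
}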
The Script Task feeds into a Data Flow task. The first component of the Data Flow is a Script Component, created as a Source, as billinkc describes above. This script casts the object variable back to a List, then parses each item in the list into fields in the Output Buffer.
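A rough sketch of that source Script Component follows. It assumes the object variable (User::MQRecords here) is listed under ReadOnlyVariables; the output column names and fixed-width offsets are only illustrative, so substitute the real record layout:

// Script Component configured as a Source; User::MQRecords is listed in
// ReadOnlyVariables. Column names and substring offsets are illustrative.
using System.Collections.Generic;

public override void CreateNewOutputRows()
{
    // Cast the object variable back to the List built by the Script Task
    var lines = (List<string>)Variables.MQRecords;

    foreach (string line in lines)
    {
        Output0Buffer.AddRow();

        // Parse the fixed-width record into the output columns
        Output0Buffer.Field1 = line.Substring(0, 10).Trim();
        Output0Buffer.Field2 = line.Substring(10, 5).Trim();
    }
}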
Various split and transform tasks take over from there.

Try using the Q program available in the MA01 MQ SupportPac instead of your script.

Related

Foundry writebacks - is it possible to restore an edited record to its unedited version (BaseVersion)

Palantir-Foundry - We have a workflow that needs updates from the backing dataset of an object with a writeback to persist in the writeback, but this fails on rows that have previously been edited. Due to the "edits win" model, the writeback will always choose the edited version of the row, which makes sense. Short of re-architecting the entire app, I am looking into ways to take care of this by using the Foundry REST API.
Is it possible to revert an edited row in Foundry writebacks to the original unedited version? I found some API documentation in our instance for phonograph2 BaseVersion, but I have not been able to find/understand anything that would restore a row to BaseVersion. I would need to be able to do this from a functions repository using TypeScript, on certain events.
One way to overwrite the edits with the values from the backing dataset is to build a transform off of the backing dataset that makes a new, identical dataset. Then you can use the new dataset as the backing dataset for a new object.
Transform using a simple code repo:
from transforms.api import transform_df, Input, Output

@transform_df(
    Output(".../static_guests"),
    source_df=Input("<backing dataset RID>"),
)
def compute(source_df):
    return source_df
You can then build up the ontology of a static object that will always equal the writeback dataset.
Then create an action that will modify your edited object (in my example that is Test Guest) by reverting a value to equal a value in the static object type.
You can then use the Apply Action API to automatically apply this action to certain values on a schedule or based on a certain condition. Documentation for the API is here.

Making a custom reporter for JSCS results in Gulp4

Please correct me where I'm wrong (still learning Gulp, Streams, etc.). I'd like to create a custom reporter for my gulp-jscs results. For example, let's say I have 3 files in my gulp.src() stream. To my knowledge, each is piped one at a time through jscs, which attaches a .jscs object onto the file with its results; one such variable in that object is .errorCount.
What I'd like to do is have a variable I create, e.g. maxErrors, which I set to, say, 5. Since we're processing 3 files, let's say the first file passes with 0 errors, but the next has 3 errors. I don't want to prematurely stop processing since the maxErrors tally has not been reached (3/5 currently). So it should continue to process the next file, which let's say has 3 errors as well, putting us over our max. At that point jscs should be interrupted from processing more files and instead fail out, letting my custom reporter function gain access to the files that have been processed so I can look at their .jscs objects and customize some output.
My problem here is that I don't understand the docs when they say: .pipe(jscs.reporter('name-of-reporter')). How does a string value invoke my reporter (which currently exists as a function I've imported called libs.reporters.myJSCSReporter)? I know pipe() expects Stream objects, so I can't just put a function in the .pipe() call.
I hope I've explained myself well enough (please ask for clarifications otherwise).

SSIS best way to load configuration to custom script

I am using SSIS 2012 and I need to figure out the best way to load multiple configuration files to be used in a custom script.
This is the way it goes:
I need to use a custom script to access a NoSQL database
In this case, the NoSQL database has no rigid schema, so the attributes change from document to document
I want to use configuration files to specify how the columns are supposed to be renamed and to configure other basic rules there
The above task is easily done in C#; however, if possible I would like to read the configuration files using an SSIS component (to read a flat file, Excel file, or database rules). Therefore I want to know how I can feed the custom script with the data from the stream: the script consumes the stream (the stream contains the configuration), and after consuming the entire stream, the script component generates rows.
An example case would be:
the script reads an entire stream of numbers.
the script orders the numbers in the stream.
the script discards duplicates.
the script outputs the ordered sequence of numbers without duplicates.
If I understood correctly, the NoSQL database and configuration files are just background to the problem, and what you really need is an asynchronous script component to read everything from the pipeline, then do something, and finally send the results back to the pipeline?
If so, then what you need is to create a script component with its output buffer's SynchronousInputId property set to None.
The example you posted of the numbers to be deduped and sorted could then be solved with the following code (assume you create an output column called NumberOut in the output buffer of the script component, and that the output buffer property SynchronousInputId is set to None):
...
private List<int> numbers;

public override void PreExecute()
{
    base.PreExecute();
    // Create the collection that will hold the numbers read from the pipeline
    numbers = new List<int>();
}

public override void PostExecute()
{
    base.PostExecute();
    // Sort, dedupe, and send one row per remaining number downstream
    numbers.Sort();
    int? previous = null;
    foreach (int n in numbers)
    {
        if (previous.HasValue && n == previous.Value)
        {
            continue; // skip duplicates
        }
        Output0Buffer.AddRow();
        Output0Buffer.NumberOut = n;
        previous = n;
    }
}

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Add the incoming value to the collection ("Number" is the input column)
    numbers.Add(Row.Number);
}

Logging different project libraries, with a single logging library

I have a project in Apps Script that uses several libraries. The project needed a more complex logger (logging levels, color coding), so I wrote one that outputs to Google Docs. All is fine and dandy if I immediately print the output to the Google Doc, importing the logger in each of the libraries separately. However, I noticed that when doing a lot of logging it takes much longer than without. So I am looking for a way to write all of the output in a single go at the end, when the main script finishes.
This would require either:
Being able to define the logging library once (in the main file) and somehow accessing it in the attached libs. I can't seem to find a way to get the main project's closure from within the libraries, though.
Some sort of singleton logger object. Not sure if this is possible from within a library; I have trouble figuring it out either way.
Extending the built-in Logger to suit my needs, not sure though...
My project looks as follows:
Main Project
Library 1
Library 2
Library 3
Library 4
This is how I use my current logger:
var logger = new BetterLogger(/* logging level */);
logger.warn('this is a warning');
Thanks!
Instead of writing to the file at each logged message (which is the source of your slowdown), you could write your log messages to the Logger library's ScriptDB instance and add a .write() method to your logger that outputs the messages in one go. Your logger constructor can take a messageGroup parameter, which can serve as a unique identifier for the lines you would like to write. This would also allow you to use different files for logging output.
As you build your messages into proper output to write to the file (don't write each line individually, batch operations are your friend), you might want to remove the message from the ScriptDB. However, it might also be a nice place to pull back old logs.
Your message object might look something like this:
{
  message: "My message",
  color: "red",
  messageGroup: "groupName",
  level: 25,
  timeStamp: new Date().getTime(), // ScriptDB won't take Date objects natively
  loggingFile: "Document Key"
}
The query would look like:
var db = ScriptDb.getMyDb();
var results = db.query({messageGroup: "groupName"}).sortBy("timeStamp",db.NUMERIC);

How to log SQL queries to a log file with CakePHP

I have a CakePHP 1.2 application that makes a number of AJAX calls using the AjaxHelper object. The AjaxHelper makes a call to a controller function which then returns some data back to the page.
I would like to log the SQL queries that are executed by the AJAX controller functions. Normally, I would just turn the debug level to 2 in config/core.php, however, this breaks my AJAX functionality because it causes the output SQL queries to be appended to the output that is returned to the client side.
To get around this issue, I would like to be able to log any SQL queries performed to a log file. Any suggestions?
I found a nice way of adding this logging functionality at this link:
http://cakephp.1045679.n5.nabble.com/Log-SQL-queries-td1281970.html
Basically, in your cake/libs/model/datasources/dbo/ directory, you can make a subclass of the dbo that you're using. For example, if you're using the dbo_mysql.php database driver, then you can make a new class file called dbo_mysql_with_log.php. The file would contain some code along the lines of the following:
App::import('Core', array('Model', 'datasource', 'dbosource', 'dbomysql'));

class DboMysqlWithLog extends DboMysql {
    function _execute($sql) {
        $this->log($sql);
        return parent::_execute($sql);
    }
}
In a nutshell, this class modifies (i.e. overrides) the _execute function of the superclass to log the SQL query before doing whatever logic it normally does.
You can modify your app/config/database.php configuration file to use the new driver that you just created.
DebugKit is also a fantastic way to debug things like this: https://github.com/cakephp/debug_kit