SSIS Processing money fields with what looks like signs over the last digit

I have a fixed-length flat file input file. The records look like this:
40000003858172870114823 0010087192017092762756014202METFORMIN HCL ER 500 MG 0000001200000300900000093E00000009E00000000{0000001{00000104{JOHN DOE 196907161423171289 2174558M2A2 000 xxxx YYYYY 100000000000 000020170915001 00010000300 000003zzzzzz 000{000000000{000000894{ aaaaaaaaaaaaaaa P2017092700000000{00000000{00000000{00000000{ 0000000{00000{ F89863 682004R0900001011B2017101109656 500 MG 2017010100000000{88044828665760
If you look just before the JOHN DOE you will see a field that represents a money value. It looks like 00000104{.
This looks like the type of field I used to process from a mainframe many years ago. How do I handle this in SSIS? If the { on the end is in fact a 0, then I want the field to be a string that reads 0000010.40.
I have other money fields, e.g. 00000159E. If my memory serves me correctly, that would be 00000015.95.
I can't find anything on how to do this transform.
Thanks,
Dick Rosenberg

Import the values as strings:
00000159E
00000104{
In a Derived Column, do your transforms with REPLACE:
REPLACE(REPLACE(col,"E","5"),"{","0")
In another Derived Column, cast to money and divide by 100:
(DT_CY)(drvCol) / 100
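
If you have more sign characters than just E and {, it may be easier to decode the whole overpunch convention in one place instead of chaining REPLACEs. Here is a minimal C# sketch for a Script Component, assuming the standard EBCDIC overpunch mapping ({, A-I for +0 to +9 and }, J-R for -0 to -9) and two implied decimal places; check the mapping against your mainframe's spec before relying on it:

using System;

public static class Overpunch
{
    // Standard EBCDIC overpunch codes: { = +0, A..I = +1..+9,
    // } = -0, J..R = -1..-9. Plain trailing digits are treated as positive.
    const string Positive = "{ABCDEFGHI";
    const string Negative = "}JKLMNOPQR";

    public static decimal Decode(string raw)
    {
        char last = raw[raw.Length - 1];
        string body = raw.Substring(0, raw.Length - 1);

        int sign = 1;
        int digit = Positive.IndexOf(last);
        if (digit < 0)
        {
            digit = Negative.IndexOf(last);
            if (digit >= 0)
                sign = -1;
            else if (char.IsDigit(last))
                digit = last - '0';
            else
                throw new FormatException("Unexpected sign digit: " + last);
        }

        // Two implied decimal places, per the examples in the question.
        return sign * decimal.Parse(body + digit) / 100m;
    }
}

With that helper, Overpunch.Decode("00000104{") returns 10.40 and Overpunch.Decode("00000159E") returns 15.95, matching the expected values above.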

I think you will need to use either a Script Component source in the data flow, or a Derived Column transformation or Script Component transformation. I'd recommend a Script Component either way, as it sounds like your custom logic will be fairly complex.
I have written a few detailed answers about how to implement a Script component source:
SSIS import a Flat File to SQL with the first row as header and last row as a total
How can I load in a pipe (|) delimited text file that has columns that sometimes contain line breaks?
Essentially, you need to locate the string ("00000104{", for example) and then convert it into decimal/money form before adding it into the data flow (or during it, if you're using a Derived Column transformation).
This could also be done in a Script Component transformation, which would function in a similar way to the Derived Column transformation, only you'd perhaps have a bit more scope for complex logic. Also in a Script Component transformation (as opposed to a source), you'd already have all of your other fields in place from the Flat File Source.
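To make that concrete, a Script Component transformation built around a decoder like the sketch above can stay very small. The column names here (RawAmount, Amount) are my assumptions, not from the question:

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Decode the raw overpunch string into a currency value.
    if (!Row.RawAmount_IsNull)
        Row.Amount = Overpunch.Decode(Row.RawAmount.Trim());
}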

How can I read in a TXT file in Access that is over 255 char/line and contains control char?

I am running Access 2010. I need to read a TXT file into a string. Each line can be anywhere from 40 to 320 characters long, ending in a CR. The biggest problem is that the TXT file's various lines contain commas (,) and quotation marks (") as part of the data.
Is there a trick to doing this? Even if it means getting each character and testing to see if it is a CR....
To accomplish this task, you will need to write your own import code that will read directly from the file. The Microsoft Access import features will not handle a file like this very well, and since you want to analyze each line in code, it is better to handle reading it yourself.
There are many approaches you can take, and all will involve File handles and Opening the file. But, the best approach is to use a class that does all of the dirty work for you.
One such class is the LargeTextFile class that can be found in any of the Microsoft Access Developer's Handbooks (Volume 1) for Access 97, 2000, 2002 or 2003, written by Getz, Litwin, and Gilbert (Sybex), if you have access to one of them.
Another option would be the clsReadTextFile class, available for free on the Access MVP Site (The Access Web) site:
http://www.theaccessweb.com/downloads/clsReadTextFile.txt
Using clsReadTextFile you can process your file, line by line using code similar to this:
Dim file As New clsReadTextFile
Dim line As String

file.FileName = "C:\MyFile.txt"
file.cfOpenFile

Do While Not file.EndOfFile
    file.csGetALine
    line = file.Text
    If InStr(line, "MySearchText") Then
        'Do something
    End If
Loop

file.cfCloseFile
The line string variable will contain the text of the line just read, and you can write code to parse it how you need and process it appropriately. Then the loop will go on to read the next line. This will allow you to process each line of the file manually in your code.
It is not clear from your post as to whether or not you can - or have tried - to use the tools available in the product for this task. Access 2010 offers linking to a .txt file as well as appending a .txt file to a table. These are standard features in the External tab of the ribbon.
The Large Text (formerly Memo) field type allows roughly 64K characters. Not sure if you wish to attempt to bring all the txt data into a single field; if so, then this limit is important.
If the CRs of the text document imply a new record/row of data, rather than a continuous string for the entire document, and if there is any consistent structure within all rows of data, then the import wizard can use either character count or symbols (e.g. commas, if they exist) as the means to separate each individual row of data into separate fields in a single row of a table.

How to split a camel message body into rows in order to iterate over them in Talend ESB

So, like the title says, I'm using Talend ESB to handle Camel messaging. In my case, I'm sending the contents of a file as the message body to the child Talend job. In some scenarios the contents of the file may have 2+ rows. All I need is to be able to iterate over each of those rows independently within the child job itself.
I guess my question is twofold: 1. If possible, how do I do this? And 2. Is the iteration process better suited to the route level, or to the child job the route calls?
Right now, the files I'm handling are | delimited. To handle this, I have the tRouteInput_1 going directly to a tExtractDelimitedFields and use those values to set variables globally, like so.
The problem with this is that it only reads the first row of the file and moves on. I need to be able to iterate over each row within the file/camel message.
Thanks,
Alex
First you need to split your file on the row delimiter using a tNormalize.
In my example, I simulate your tRouteInput by using a tFixedFlowInput containing the whole file as a single line, with rows separated by \n. Then for each resulting row returned by tNormalize, extract the fields you want (in tExtractDelimitedFields, create the schema corresponding to your row structure):
And the result:
.--------+--------.
|    tLogRow_1    |
|=-------+-------=|
|field1  |field2  |
|=-------+-------=|
|field1.1|field1.2|
|field2.1|field2.2|
|field3.1|field3.2|
'--------+--------'
You need to escape "|" by using "\\|" inside tExtractDelimitedFields, as the component accepts regex, and the pipe has special meaning.
As for your 2nd question, I think it's better to do this inside the child job and not the route, as there are dedicated components for this not available in the routing perspective.
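
For what it's worth, the row/field split logic itself is easy to see outside of Talend. This is just a plain C# illustration of what the tNormalize + tExtractDelimitedFields pair does (the sample data is an assumption); it also shows why the pipe needs escaping in Talend, where the field separator is treated as a regex:

using System;

class SplitDemo
{
    static void Main()
    {
        string body = "field1.1|field1.2\nfield2.1|field2.2\nfield3.1|field3.2";

        foreach (string row in body.Split('\n'))   // the tNormalize step
        {
            string[] fields = row.Split('|');      // the tExtractDelimitedFields step
            Console.WriteLine(fields[0] + " / " + fields[1]);
        }
    }
}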

Generating truth tables for basic logic circuits

Let's say I have a text file that looks like this:
<number> <name> <type> <inputs...>
1 XOR1 XOR A B
2 SUM XOR 1 C
What would be the best approach to generate the truth table for this circuit?
That depends on what you have available, and how big your file is.
Perl is optimized for reading files and generating simple text output. It doesn't have a library of boolean operators, but they're easy enough to write. I'd use that if I just wanted text-in, text-out.
If I wanted to display the data online AND generate a results file, I'd use PHP to read the data and write the table to a CSV file that could either be opened in Excel, or posted online in an HTML table.
If your data is in a REALLY BIG data file, I'd use SQL.
If your data is in a really huge file that you want to be accessible to authorized users online, and you want THEM to be able to create truth tables, I'd use Oracle's APEX to create an easy interface for them to build their own truth tables and play around with the data without altering it.
If you're in an electrical engineering environment, use the tools designed for your problem -- Verilog or similar.
Whatcha got? Whatcha wanna do with it?
-- Ada
I prefer using C#. I already have the code to 'parse' the input text file. I just don't know where to start in terms of actually 'simulating' it. The output can simply be a text file with inputs and output values – Don
How many inputs and how many outputs in the circuit you want to simulate?
The size of the simulation determines how it can most easily be run. If the circuit is small(ish), you can enter the inputs and circuit values into vector arrays, then cross them to get the output matrix.
Matlab is ideal for this, as it was written for processing arrays.
Again: Whatcha got, and whatcha wanna do with it?
-- Ada
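
Since you mentioned C#, here is a minimal sketch of the simulation step (my reading of the file format above; the gate list is hard-coded rather than parsed): enumerate every combination of the primary inputs A, B, C, then evaluate each gate in file order, storing signals by name.

using System;
using System.Collections.Generic;

class TruthTable
{
    static void Main()
    {
        string[] inputs = { "A", "B", "C" };
        // (output name, input names), in the order they appear in the file.
        var gates = new (string Name, string[] Ins)[]
        {
            ("1",   new[] { "A", "B" }),   // 1 XOR1 XOR A B
            ("SUM", new[] { "1", "C" }),   // 2 SUM XOR 1 C
        };

        Console.WriteLine("A B C | SUM");
        for (int mask = 0; mask < (1 << inputs.Length); mask++)
        {
            var signals = new Dictionary<string, bool>();
            for (int i = 0; i < inputs.Length; i++)
                signals[inputs[i]] = ((mask >> i) & 1) == 1;

            foreach (var g in gates)   // every gate in this example is an XOR
                signals[g.Name] = signals[g.Ins[0]] ^ signals[g.Ins[1]];

            Console.WriteLine("{0} {1} {2} | {3}",
                Convert.ToInt32(signals["A"]),
                Convert.ToInt32(signals["B"]),
                Convert.ToInt32(signals["C"]),
                Convert.ToInt32(signals["SUM"]));
        }
    }
}

Plugging your parser's output into the gates array instead of hard-coding it gives you the generic version.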

SSIS Derived Column - Parse Text between break returns

I have a text field from a SQL Server Source. It is a phone number field that typically has this format:
Home: 555-555-1212
Work: 555-555-1212
Cell: 555-555-1212
Emergency: 555-555-1212
I'm trying to split this among fields so that only 555-555-1212 is displayed.
I am then taking this field and converting it to a string. There are literal line breaks (\r\n) between the labels here. The goal is to have this data split among multiple fields (home, work, cell, emergency, etc.). I was researching how to split text among fields and I made some progress. In the case of home numbers, I used this logic:
SUBSTRING(Phone_converted,FINDSTRING(Phone_converted,"Home:",1) + 5,FINDSTRING(Phone_converted,"\n",1) - FINDSTRING(Phone_converted,"Home:",1) - 5)
This works great as it parses up to the text return and I get 555-555-1212.
Now I experience an issue when searching for text between line breaks. I tried the same logic for work numbers:
SUBSTRING(Phone_converted,FINDSTRING(Phone_converted,"Work:",1) + 5,FINDSTRING(Phone_converted,"\n",1) - FINDSTRING(Phone_converted,"Work:",1) - 5)
But that won't work, and it results in rows being written to my error redirection file. I then tried to include a leading line break to find the text at the beginning:
SUBSTRING(Phone_converted,FINDSTRING(Phone_converted,"\nWork:",1) + 5,FINDSTRING(Phone_converted,"\n",1) - FINDSTRING(Phone_converted,"\nWork:",1) - 5)
No luck there either. Any ideas on how I can address this? Also, I would appreciate an idea of how I can handle the Emergency label at the end. There won't be a line break after it, but I still want to parse the text.
I look at your data and I see
Home:|555-555-1212|Work:|555-555-1212|Cell:|555-555-1212|Emergency:|555-555-1212
I'm using the pipe character, |, as a placeholder for where I would segment that string, which is basically wherever you have whitespace (space, tab, newline, etc).
There are two approaches to this. I'll start with the easy one.
Script Component
String.Split is your friend here. Look at what it did with that source data
I added a new Script Component, acting as a transformation, and created 4 output columns, all strings of length 12, codepage 1252: Home, Work, Cell, and Emergency. I populate them like so:
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Split() with no arguments splits on any whitespace; each \r\n pair
    // yields an empty entry, which is why the numbers land at 1, 4, 7, 10.
    string[] split = Row.PhoneData.Split();
    Row.Home = split[1];
    Row.Work = split[4];
    Row.Cell = split[7];
    Row.Emergency = split[10];
}
Derived Column
I'm not going to build out a full-blown implementation of this. The above is much too simple, but I run into situations where ETL devs say they aren't allowed to use Script Tasks/Components, and that's usually because people reached for them first instead of last.
The approach here is to have lots of Derived Column components on your Data Flow. It won't hurt performance, and in fact it can make things easier. It definitely will make your debugging easier, as you'll have lots of it to do.
DER Find Colons
This would add 4 columns into the data flow: HomeColonPosition, WorkColonPosition, etc. You've already started down this path, but build it out in the actual data flow, as you'll need to reference these positions. Again, it's easier to fix the calculation that populates a column than to fix a calculation that's wrong and used everywhere. You're likely to find that 4 separate Derived Column components are useful here, as you'd want to use the previous colon's position as the starting point for the third argument to FINDSTRING.
Thus, instead of Work being
FINDSTRING(PhoneData, ":", FINDSTRING(PhoneData, ":", 1) + 1)
it would just be
FINDSTRING(PhoneData, ":", HomeColonPosition + 1)
Just knowing the position of the 4 colons in that string, I can figure out where the phone numbers are (maybe). The position of the colon + 2 (colon and the space) is the starting point and then go out 12 characters.
Where this approach gets ugly, much as it did with the script approach, is when the data isn't consistent.
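
One way to defend against inconsistent data (my own variation, not part of the answer above) is to match label/number pairs with a regex in the Script Component, so missing or reordered labels don't shift the indexes the way a bare String.Split() would:

using System.Text.RegularExpressions;

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Find each "Label: number" pair wherever it occurs in the string.
    foreach (Match m in Regex.Matches(Row.PhoneData,
        @"(Home|Work|Cell|Emergency):\s*(\d{3}-\d{3}-\d{4})"))
    {
        string number = m.Groups[2].Value;
        switch (m.Groups[1].Value)
        {
            case "Home":      Row.Home = number; break;
            case "Work":      Row.Work = number; break;
            case "Cell":      Row.Cell = number; break;
            case "Emergency": Row.Emergency = number; break;
        }
    }
}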

Counting the number of passes through a CSV file in JMeter

Am I missing an easy way to do this?
I have a CSV file with a number of params in it, and in my test I want to be able to make some of the fields unique across CSV repetitions with a suffix determined by the number of times I've looped through the file.
So suppose my CSV (simplified) had:
abc
def
ghi
I want to generate in the test:
abc_1
def_1
ghi_1 <hit EOF>
abc_2
def_2
ghi_2 <hit EOF>
abc_3
def_3
ghi_3
I thought I could set up a counter to run parallel to my CSV loop, but that won't work unless I increment it by 1/n each iteration, where n is the number of lines in my CSV file. Which you can't do because counters are integers.
I'm going to go flail around and see if I can come up with a solution, but in case I'm not successful, has anyone got any suggestions?
I've used an EOF marker row (an index column with something like "EOF" or "END", etc.) together with an If Controller and either a non-resetting counter or user variables incremented via JavaScript in a BSF element (a BSF Assertion or whatever, just a mechanism to run the script).
Unfortunately, it's the best solution I've come up with without putting too much effort into it.
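
To make the mechanics concrete, here is the marker-row idea reduced to a C# sketch (not JMeter code; the file contents and marker value are assumptions). The counter only advances when the marker row comes around again, so every data row gets the current pass number as its suffix:

using System;
using System.Collections.Generic;

class PassCounterDemo
{
    static void Main()
    {
        // Simulate a CSV Data Set that recycles on EOF; the last row
        // is the marker ("EOF") rather than real data.
        var rows = new List<string> { "abc", "def", "ghi", "EOF" };

        int pass = 1;
        for (int i = 0; i < rows.Count * 3; i++)   // three passes over the file
        {
            string value = rows[i % rows.Count];
            if (value == "EOF") { pass++; continue; }   // marker row bumps the counter
            Console.WriteLine(value + "_" + pass);      // abc_1 ... ghi_3
        }
    }
}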