Creating / Appending a Flat File Destination based on date. - ssis

The Backstory:
I have a process that loads physician demographic data into our system. This data can come in at any time and at any interval between updates. The data is what we call "Term-by-Exclusion", meaning that the source file takes precedence, and any physician record in the db that is not in the source file is marked as "Termed" or Inactive.
The Problem:
I need to output the source data to a flat file destination as a daily report for a companion COBOL system. The source data is loaded into an ETL.PhysicianLoad table prior to processing, and that ETL table is wiped before each new processing transaction, so as things stand the only way to retain a full day's records is the output file.
Example: ProcessOutput_10152013.txt
Ideally, the output file should be a comprehensive record of the entire day's processing. I want to keep appending to that day's file until the end of the day, then email a notification stating that the file is ready for pickup. Any data that comes in after the turn of the day should go into a newly created file.
Output should look like this (no header row):
BatchID | LastName | FirstName | MiddleInitial | Date
0001 | Smith | John | A | 10/15/13
0001 | Smith | Sue | R | 10/15/13
0001 | Zeller | Frank | L | 10/15/13
0002 | Peters | Paula | D | 10/15/13
0002 | Rivers | Patrick | E | 10/15/13
0002 | Waters | Oliver | G | 10/15/13
What I am thinking:
I am thinking about using a CurrentDate variable that holds the current date, and an expression-based variable called FileName that inserts the current date (mmddyyyy) into the name, giving ProcessOutput_<mmddyyyy>.txt. My thinking is that I should be able to look for a file with that name in the destination folder and, if it exists, write to it; otherwise I will have to create a new file. I can then set the Flat File Destination's file name via an expression bound to the FileName variable.
Can anyone see a better way of doing this or any issues that may arise from this solution I am not seeing?

My thought process was in the right place, but flawed.
Here is how I solved the problem.
After trying to build my control and data flows using the logic in the original question, I discovered that I was working myself into a corner.
That got me thinking again: how can I do this in the easiest possible way?
First, do I have the correct variables defined? No.
CurrentDate - has to be there to define the date portion of the file name.
FileName - has to be present for obvious reasons.
So what did I miss?
FileExists (Type: boolean) - Something that will identify the existence of the file.
PlaceholderFile (Type: String) - a generic file name, assigned to whichever destination is not in use.
Now what to do with it?
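Add a Script Task (Visual Basic) to the control flow that sets the FileExists flag:
' Requires Imports System.IO in ScriptMain; list User::FileName as a ReadOnlyVariable and User::FileExists as a ReadWriteVariable.
' Check whether ProcessOutput_<currentdate>.txt exists.
Dts.Variables("User::FileExists").Value = File.Exists(Dts.Variables("User::FileName").Value.ToString())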
Now that the existence of the destination file is known, create the Data Flow with the source table as its input. Add a Conditional Split that checks the FileExists variable, separating the data flow into two branches. Create two Flat File Destinations called "Existing" and "New", pointing both at the same flat file location for the time being.
If you attempt to run the package at this point, you will receive validation errors from one of the two destinations, because the first one holds the file and will not allow the second to validate it.
To fix this, use expressions to swap the actual FileName value back and forth between the two flat file connections.
For the Existing flat file connection's ConnectionString property, use the following expression:
@[User::FileExists] == True ? @[User::FileName] : @[User::PlaceholderFile]
For the New flat file connection's ConnectionString property, use the following expression:
@[User::FileExists] == True ? @[User::PlaceholderFile] : @[User::FileName]
Finally, right-click each Flat File Destination in the Data Flow, open its properties, and set Overwrite to True on the New destination and False on the Existing destination. This ensures that rows are appended to the existing file rather than replacing it.
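For comparison, the append-or-create decision the package makes can be sketched outside SSIS. This is a minimal Python sketch, not part of the package: it assumes the ProcessOutput_mmddyyyy.txt naming and the header-less, pipe-delimited layout from the question, and the helper names daily_file_name and write_batch are hypothetical. The end-of-day email is not covered.

```python
import os
import tempfile
from datetime import date
from typing import Iterable, Optional, Sequence


def daily_file_name(folder: str, day: Optional[date] = None) -> str:
    """Return e.g. <folder>/ProcessOutput_10152013.txt for 15 Oct 2013."""
    day = day or date.today()
    return os.path.join(folder, f"ProcessOutput_{day:%m%d%Y}.txt")


def write_batch(folder: str, rows: Iterable[Sequence[str]], day: Optional[date] = None) -> bool:
    """Append one batch to the day's file, creating it first if needed.

    Returns True if the file already existed (the FileExists flag)."""
    path = daily_file_name(folder, day)
    existed = os.path.exists(path)
    # Mode "a" appends to an existing file and creates a missing one, so a single
    # open() covers both the "Existing" and the "New" destination.
    with open(path, "a", newline="") as f:
        for row in rows:
            f.write(" | ".join(row) + "\n")  # pipe-delimited, no header row
    return existed


if __name__ == "__main__":
    folder = tempfile.mkdtemp()
    day = date(2013, 10, 15)
    write_batch(folder, [("0001", "Smith", "John", "A", "10/15/13")], day)
    write_batch(folder, [("0002", "Peters", "Paula", "D", "10/15/13")], day)
    with open(daily_file_name(folder, day)) as f:
        print(f.read())
```

Because the date is part of the file name, the first batch after midnight naturally lands in a new file; no separate rollover step is needed.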

Related

PySpark Extract Substring from Free Form Text

I am trying to extract data from a JSON file using PySpark, and the data I need is stored in a free-form text field. Each record contains data similar to the sample below.
I need the corresponding values under VAL, STAGE, ID, DATE and TIME. The section of text I need starts with "Audit Information" and ends just before the word "NOTES". Each line ends with a pipe character, and the section is usually buried in a large amount of other text.
Here's how the data looks unformatted:
---------- Audit Information -------|
TEXT1: TEXT2: TEXT3: |
TEXT4: TEXT5: TEXT6: |
INDICATOR: |
VAL STAGE ID DATE TIME |
310 000 F11 220925 0110440 |
315 001 F14 200926 0110440 |
347 001 220926 0112310 |
NOTES: |
|
---------- Next Section ------------|
And here's how it would appear formatted: [image]
My initial thought was to get the position of "Audit Information" and use it as the starting point, then get the position of "NOTES" to close the section off, but I'm not sure how to proceed from there.
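One way to build on the start/end-marker idea is a regular expression that captures everything between "Audit Information" and "NOTES:", then splits the capture on the pipe characters. The sketch below is plain Python (no Spark) so it can be tested on its own; it assumes the pipe marks the end of each line and, as a guess from the sample, that a four-value row such as "347 001 220926 0112310" is missing its ID. The name extract_audit_rows is hypothetical.

```python
import re
from typing import Dict, List, Optional

COLUMNS = ["VAL", "STAGE", "ID", "DATE", "TIME"]


def extract_audit_rows(text: str) -> List[Dict[str, Optional[str]]]:
    """Return the rows under the VAL/STAGE/ID/DATE/TIME header in the Audit Information section."""
    # Start marker to end marker; DOTALL lets '.' cross newlines if the field contains them
    match = re.search(r"Audit Information(.*?)NOTES:", text, re.DOTALL)
    if match is None:
        return []
    rows: List[Dict[str, Optional[str]]] = []
    in_table = False
    # Each logical line ends with '|', so split on the pipe rather than on newlines
    for line in match.group(1).split("|"):
        tokens = line.split()
        if tokens == COLUMNS:
            in_table = True  # header found; the lines after it are data
            continue
        if not in_table or not tokens:
            continue
        if len(tokens) == 5:
            rows.append(dict(zip(COLUMNS, tokens)))
        elif len(tokens) == 4:
            # Assumption: a four-value row is missing its ID, as in "347 001 220926 0112310"
            val, stage, day, time = tokens
            rows.append({"VAL": val, "STAGE": stage, "ID": None, "DATE": day, "TIME": time})
    return rows
```

To apply this in PySpark, one option is to wrap it with pyspark.sql.functions.udf using a return type such as ArrayType(MapType(StringType(), StringType())) and then explode the result; staying with built-ins such as regexp_extract is an alternative if a UDF is undesirable.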

Complex Mail Merge (CSV to Word, CSV to PDF, or Other)

QUESTION:
How do you write an IF statement in Word or in a PDF to handle multiple rows per matching result?
USAGE:
What I am trying to do seems fairly straightforward and was very easy when I could use MS Access 15 years ago, but since Access is no longer an option, I am hoping somebody has a reasonable solution.
The WHAT:
I am trying to generate Statements/Invoices from a CSV (or spreadsheet of any format) into a nice report layout. Let's say the columns look like this:
First Name | Last Name | Account | Address | Item | Description | Item Total
Jane | Smith | 123 | 111 Main St | Ice Cream | it's really cold | $100.00
This is super easy; I can do it in Word within 10 minutes and make it "pretty".
BUT what if there are multiple Items per invoice?
So maybe the CSV looks like:
First Name | Last Name | Account | Address | Item | Description | Item Total
Jane | Smith | 123 | 111 Main St | Ice Cream | it's really cold | $100.00
Jane | Smith | 123 | 111 Main St | Hot Dogs | all beef, all the time | $200.00
I still want only one invoice per person, but I'm not sure how to write an IF statement in Word that says "If there are multiple items per person, put each on a new row, then total them all together".
I would be glad to have the CSV go into a PDF fillable form if I could get the multiple rows to work - I just cannot figure that portion out.
Other options: I looked at OpenOffice "Base" but couldn't get a nice form for a very custom report. I briefly researched how to do something like this on AWS, but without any luck. I don't think Microsoft has anything like Access anymore.
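The grouping described above (one invoice per person, one row per item, then a total) is the part Word's field codes make awkward. If a small script in front of the merge is acceptable, here is a minimal Python sketch of just that grouping step. It assumes a comma-separated file with the column headers shown above and uses Account as the grouping key; group_invoices is a hypothetical name, and rendering into Word or a PDF form is not covered.

```python
import csv
import io
from decimal import Decimal
from typing import Dict


def group_invoices(csv_text: str) -> Dict[str, dict]:
    """Collapse item rows into one invoice per Account, keeping every item and a running total."""
    invoices: Dict[str, dict] = {}  # dicts keep insertion order, so invoices follow file order
    for row in csv.DictReader(io.StringIO(csv_text)):
        invoice = invoices.setdefault(row["Account"], {
            "name": f'{row["First Name"]} {row["Last Name"]}',
            "address": row["Address"],
            "items": [],
            "total": Decimal("0"),
        })
        # "$1,200.00" -> Decimal("1200.00"); Decimal avoids float rounding on money
        amount = Decimal(row["Item Total"].replace("$", "").replace(",", ""))
        invoice["items"].append((row["Item"], row["Description"], amount))
        invoice["total"] += amount
    return invoices
```

Each value in the result corresponds to one invoice: its "items" list supplies the table rows and "total" the final line.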
You can use Word's Catalogue/Directory Mailmerge facility for this (the terminology depends on the Word version). To see how to do so with any mailmerge data source supported by Word, check out my Microsoft Word Catalogue/Directory Mailmerge Tutorial at:
http://www.msofficeforums.com/mail-merge/38721-microsoft-word-catalogue-directory-mailmerge-tutorial.html
or:
http://www.gmayor.com/Zips/Catalogue%20Mailmerge.zip
The tutorial covers everything from list creation to the insertion & calculation of values in multi-record tables in letters. Do read the tutorial before trying to use the mailmerge document included with it.
Depending on what you're trying to achieve, the field coding for this can be complex. However, since the tutorial document includes working field codes for all of its examples, most of the hard work has already been done for you - you should be able to do little more than copy/paste the relevant field codes into your own mailmerge main document, substitute/insert your own field names and adjust the formatting to get the results you desire. For some worked examples, see the attachments to the posts at:
http://www.msofficeforums.com/mail-merge/9180-mail-merge-duplicate-names-but-different-dollar.html#post23345
http://www.msofficeforums.com/mail-merge/11436-access-word-creating-list-multiple-records.html#post30327
Another option would be to use a DATABASE field in a normal ‘letter’ mailmerge main document and a macro to drive the process. An outline of this approach can be found at: http://answers.microsoft.com/en-us/office/forum/office_2010-word/many-to-one-email-merge-using-tables/8bce1798-fbe8-41f9-a121-1996c14dca5d
Conversely, if you're using a relational database or an Excel workbook with a separate table containing just a single instance of each grouping criterion, a DATABASE field in a normal ‘letter’ mailmerge main document could be used without the need for a macro. An outline of this approach can be found at:
https://answers.microsoft.com/en-us/msoffice/forum/msoffice_word-mso_winother-mso_2010/mail-merge-to-a-word-table-on-a-single-page/4edb4654-27e0-47d2-bd5f-8642e46fa103
For a working example, see:
http://www.msofficeforums.com/mail-merge/37844-mail-merge-using-one-excel-file-multiple.html
The problem with the DATABASE field, though, is that it won't provide the totals you're after. Nevertheless, if you're going down the macro route, it wouldn't take too much more code to append a totals row to the resulting table.
Alternatively, you may want to try one of the Many-to-One Mail Merge add-ins, from:
Graham Mayor at http://www.gmayor.com/ManyToOne.htm; or
Doug Robbins at https://onedrive.live.com/?cid=5AEDCB43615E886B&id=5AEDCB43615E886B!566
PS: While I'm cognisant of StackOverflow's preference for the substance of answers to be posted here rather than linked to, the complexity in this case is far too great to deal with that way, besides which, one can't post the actual field codes or a document containing them here.

How to split table data into separately named Excel files using an SSIS package?

I'm working with a set of data from SQL Server that I'd like to get into a group of Excel files. This task needs to be automated to run on a monthly basis. The data looks like
Site ID FirstName LastName
------ ------- --------- ---------
North 111 Jim Smith
North 112 Tim Johnson
North 113 Sachin Tedulkar
South 201 Horatio Alger
South 205 Jimi Hendrix
South 215 Bugs Bunny
I'd like the results to look like
In Excel file named **North.xls**
ID FirstName LastName
111 Jim Smith
112 Tim Johnson
113 Sachin Tedulkar
In Excel file named **South.xls**
ID FirstName LastName
201 Horatio Alger
205 Jimi Hendrix
215 Bugs Bunny
There are between 70 and 100 values in the Site column that I'd like to split on. I'm using SSIS to perform this task, but I'm getting stuck after pulling the data from SQL Server with an OLE DB Source. What should come next? If there is an easier way to do this with other tools, I'm open to that too.
You can create an Execute SQL Task that executes a SELECT DISTINCT on the "Site" column and stores the values in an Object variable.
In the next step you build a Foreach Loop Container, which iterates the object variable.
The Foreach Loop Container contains a Data Flow Task. In the Data Flow, use an ADO.NET Source and build an expression for its SQL statement.
In that expression, build a dynamic SELECT whose WHERE clause restricts the rows to the current iteration's site.
Send the Data Flow output to a Flat File Destination. In the expression for the Flat File Destination, you can name the file after the current iteration.
Do you have any questions? Do you need screenshots?
Update:
A more detailed explanation with screenshots:
Create an Execute SQL Task:
Set it to return a full result set and, in the SQLStatement property, write the SELECT DISTINCT query on your Site column.
On the Result Set page, set the Result Name to "0" and map it to a variable of type Object.
Create a Foreach Loop Container:
Set the Enumerator to "Foreach ADO Enumerator" and, in the "ADO object source variable" combo box, select the Object variable you defined in the first step.
On the Variable Mappings page, map a new variable of type String (sIterator in the expressions below) to index 0. This variable holds the current Site value on each iteration of the loop.
Now place a Data Flow Task in the Foreach Loop Container.
You can either use an "OLE DB Source" or an "ADO NET Source" as your data source.
I will explain the "ADO NET Source":
Add an ADO NET Source connected to a Flat File Destination to your Data Flow.
Configure the ADO.NET Source's connection to your SQL Server database.
Add an expression for the ADO.NET Source's query:
Select the Data Flow Task, open the expression editor from its Expressions property, and select the property [ADO NET Source].[SQLCommand]. In this editor you can build dynamic SQL queries.
Expressions are very powerful. Here is the documentation: https://learn.microsoft.com/en-us/sql/integration-services/expressions/integration-services-ssis-expressions?view=sql-server-2017
The expression should look something like this:
"SELECT [Site]
,[ID]
,[FirstName]
,[LastName]
FROM [Test].[dbo].[Sites]
WHERE [Site] = '" + @[User::sIterator] + "'"
Now, on each pass through the loop, the SQL query selects a different site.
Make the file name dynamic with expressions.
Create a connection manager for your Flat File Destination.
Select the Expressions property of the connection manager, as we did earlier for the Data Flow Task.
Now build your expression for the ConnectionString property. The ConnectionString is the full path, including the file name.
"E:\\" + @[User::sIterator] + ".csv"
Don't forget that a backslash must be escaped in expressions: always write "\\", not "\".
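Since the question is open to other tools, the same loop can be sketched in Python: one SELECT DISTINCT for the sites, then one filtered query and one file per site. This is a minimal sketch, not the SSIS package above: sqlite3 stands in for SQL Server (in practice a driver such as pyodbc would supply the connection), the table and column names follow the answer's example, and the output is CSV like the Flat File Destination rather than .xls. export_sites and build_demo_db are hypothetical names.

```python
import csv
import os
import sqlite3
import tempfile
from typing import List


def export_sites(conn: sqlite3.Connection, out_dir: str) -> List[str]:
    """Write one <Site>.csv per distinct Site value and return the file paths."""
    # Equivalent of the Execute SQL Task: the distinct values to iterate over
    sites = [row[0] for row in conn.execute("SELECT DISTINCT Site FROM Sites ORDER BY Site")]
    paths = []
    for site in sites:
        # Equivalent of the dynamic SELECT, but parameterized instead of concatenated
        rows = conn.execute(
            "SELECT ID, FirstName, LastName FROM Sites WHERE Site = ? ORDER BY ID", (site,)
        ).fetchall()
        # Equivalent of the ConnectionString expression: file named after the current site
        path = os.path.join(out_dir, f"{site}.csv")
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["ID", "FirstName", "LastName"])
            writer.writerows(rows)
        paths.append(path)
    return paths


def build_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for the SQL Server table, using the sample rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Sites (Site TEXT, ID INTEGER, FirstName TEXT, LastName TEXT)")
    conn.executemany(
        "INSERT INTO Sites VALUES (?, ?, ?, ?)",
        [("North", 111, "Jim", "Smith"), ("North", 112, "Tim", "Johnson"),
         ("North", 113, "Sachin", "Tedulkar"), ("South", 201, "Horatio", "Alger"),
         ("South", 205, "Jimi", "Hendrix"), ("South", 215, "Bugs", "Bunny")],
    )
    return conn


if __name__ == "__main__":
    print(export_sites(build_demo_db(), tempfile.mkdtemp()))
```

The ? placeholder keeps the site value out of the SQL text, which the string-concatenation expression does not; for genuine Excel output, a library such as openpyxl could replace the csv writer.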

Crystal Reports XI - export shared variable string to csv truncates the string

I have two databases that store information on customer appointments:
AppointmentMaster has one record for each appointment:
Customer Name ApptDate ApptID
------------------------------------------------
2554 Smith,Bob 20140301 100
2468 Jones, Grace 20140301 101
2795 Roberts, Sam 20140302 102
2408 Harris, Chuck 20140305 103
AppointmentDetails holds a record for each operation performed at the appointment (sometimes none, sometimes dozens):
ApptID Operation OpDescription
------------------------------------------------
100 A10 Corrected the A10 unit.
100 IA Resolved issues with internal account.
100 C5 Brief consult with client.
101 A10C Replaced cage on A10 unit.
101 U1 Updated customer account.
103 C5 Brief consult with client.
My client needs a CSV file that contains 1 line per appointment. One of the fields in the CSV is a pipe separated listing of any and all operation codes performed at the appointment. The CSV file would look like this:
"2554", "Smith,Bob", "20140301", "A10|IA|C5|"
"2468", "Jones, Grace", "20140301", "A10C|U1|"
"2795", "Roberts, Sam", "20140302", ""
"2408", "Harris, Chuck", "20140305", "C5|"
I have created a Crystal report that displays the fields correctly; however, when I export to CSV I get a file like this:
"2554", "Smith,Bob", "20140301", "C5|"
"2468", "Jones, Grace", "20140301", "U1|"
"2795", "Roberts, Sam", "20140302", ""
"2408", "Harris, Chuck", "20140305", "C5|"
Only the last Operation is getting exported into CSV even though all of them display.
If I export as PDF, Excel or Record Style, the file has all of the operations. Unfortunately I need a CSV. I am trying to avoid building multiple reports and stitching them together with a script if possible; the client wants to be able to run and export this themselves, easily and on demand.
I created three formula fields to initialize, update and display a shared variable that concatenates the operations together.
My report is grouped by the ApptID and looks like this:
Group Header #1 (suppressed)
{@InitializeOperations}:
WhilePrintingRecords;
shared StringVar Operations := "";
Details (suppressed)
{@UpdateOperations}:
WhilePrintingRecords;
shared StringVar Operations := Operations + {AppointmentDetails.Operation} + "|";
Group Footer #1
{AppointmentMaster.Customer}
{AppointmentMaster.Name}
{AppointmentMaster.ApptDate}
{@DisplayOperations}:
WhilePrintingRecords;
shared StringVar Operations;
I have tried using EvaluateAfter({@UpdateOperations}) instead of WhilePrintingRecords in {@DisplayOperations}, and have even tried removing the evaluation-time statement from it altogether, but I still can't get the desired result in the CSV file, even though the report looks correct on screen and in every other export format I have tried.
Any help you can provide is appreciated.
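To pin down what the CSV should contain, here is a minimal Python sketch of the same accumulate-then-display logic the three formulas implement: reset per ApptID, append Operation plus "|" for each detail row, and output one line per appointment. It is only a reference for checking the expected output, not a fix for the Crystal export; the tuple layouts of master and details and the name build_appointment_csv are assumptions. Python's csv module writes "," without the space shown in the sample.

```python
import csv
import io
from collections import defaultdict
from typing import Iterable, Tuple


def build_appointment_csv(
    master: Iterable[Tuple[str, str, str, int]],
    details: Iterable[Tuple[int, str, str]],
) -> str:
    """One quoted CSV line per appointment; each operation code is followed by '|'."""
    ops = defaultdict(str)
    for appt_id, operation, _description in details:
        ops[appt_id] += operation + "|"  # same accumulation as {@UpdateOperations}
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for customer, name, appt_date, appt_id in master:
        # .get() returns "" for appointments with no operations (e.g. ApptID 102)
        writer.writerow([customer, name, appt_date, ops.get(appt_id, "")])
    return out.getvalue()
```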

Maintaining test users in Cucumber steps

In my tests I have to work with different types of users and environments. At the moment I am manually updating the users since we don't have many features. However we will be adding many new features that will make it very difficult to update all files manually. Most of these are needed in the Given step. Example:
Scenario:
Given I am signed in as "user1@example.com"
I would like to change this to:
Scenario:
Given I am signed in as "user1"
"user1" could be stored in a CSV file or in a DB. Can either of these be done? If so, which is the recommended method?
The CSV file would have something like:
user1,user1@example.com
user2,user2@example.com
user3,user3@example.com
A table in a db:
| id | user | email |
| 1 | user1 | user1@example.com |
| 2 | user2 | user2@example.com |
It seems the DB might be easier to maintain, if it can be done. As always, your help is appreciated.
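The question doesn't say which language the step definitions use, so this is only a language-neutral sketch in Python of the CSV option: load the alias,email rows once, then resolve the alias inside the "Given I am signed in as" step. load_users and resolve_user are hypothetical names; a database-backed version would replace load_users with a query.

```python
import csv
import io
from typing import Dict, TextIO


def load_users(source: TextIO) -> Dict[str, str]:
    """Read alias,email rows (e.g. 'user1,user1@example.com') into a lookup table."""
    users: Dict[str, str] = {}
    for row in csv.reader(source):
        if not row:
            continue  # skip blank lines
        alias, email = (field.strip() for field in row)
        users[alias] = email
    return users


def resolve_user(users: Dict[str, str], alias: str) -> str:
    """What a 'Given I am signed in as "user1"' step might call before signing in."""
    try:
        return users[alias]
    except KeyError:
        raise KeyError(f"Unknown test user alias: {alias!r}") from None
```

In a real suite the file would be opened once per run, e.g. with open("users.csv", newline="") as f: users = load_users(f), where users.csv is a hypothetical path.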
The usual way to abstract test case details in Cucumber is through the use of "Scenario Outlines":
https://github.com/cucumber/cucumber/wiki/Scenario-Outlines
Using a Scenario Outline is equivalent to storing test case data in a CSV file, but it has the advantage of keeping the test case info right there in the .feature file.
If you follow this convention, all parts of the test workflow can be edited in the same place - this actually makes maintenance of the test cases easier than if the test outline and the individual test cases are segregated into separate text files (or segregated between a .feature file and a database instance).