MySQL summary status table to store results of operations

I have a MySQL (5.6) database on my local workstation into which I routinely pull large datasets to perform analysis on. I have a separate SQL script for each dataset that imports the data and reformats it when needed (notably to convert date formats). In addition, I have other scripts that perform detailed analysis on the data.
For quality assurance, I would like to have a table named ImportLog that stores a record to capture the result of each import that is run. This table would look like the following:
ImportName     DateRun      RowsImported
----------     ----------   ------------
ImportASR      2015-08-29   12902
ImportEAD      2015-08-30   18023
ImportHRData   2015-08-30   122376
The column definitions for ImportLog are as follows:
ImportName // the name of the script that is run
DateRun // the date that the script is run
RowsImported // the count of records imported in the run.
At the very end of each script would be the code to write one line to this table with the relevant data. For example, let's say that I ran the script named ImportASR on 8/29/2015 and it imported 12,902 records. At the end of the script, I want to append one record to ImportLog (like the first record in the table above) using something like this:
INSERT INTO ImportLog
VALUES ('ImportASR', $DateRun, $RowCount);
Every time I run one of the import scripts, it would add a row to the ImportLog table with the appropriate data.
My question is: How do I populate the $DateRun variable with the current date and the $RowCount variable with the row count of the newly imported ASR dataset? Or am I trying to approach this from the wrong angle?

First thing this morning I stumbled upon the answer to my problem; it was amazingly simple, and to my surprise it didn't require any variables at all. The code to put at the end of each import script is something like:
INSERT INTO ImportLog
SELECT 'ImportASR',
       NOW(),
       (SELECT COUNT(*) FROM ASR_Full);
The ImportLog table is initially defined like so:
CREATE TABLE ImportLog (
    ImportName VARCHAR(25),
    DateRun DATETIME,
    RowsImported INT
);
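If you would rather keep the variable-based style from the original question, a sketch using MySQL user-defined variables also works (assuming ASR_Full is the table the script just loaded):
SET @DateRun = NOW();
SELECT COUNT(*) INTO @RowCount FROM ASR_Full;
INSERT INTO ImportLog (ImportName, DateRun, RowsImported)
VALUES ('ImportASR', @DateRun, @RowCount);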
Hope this helps someone else!

How to update multiple records in same table using .AfterUpdate data macro without error "A data macro resource limit was hit."

I have a table tblItems with a list of inventory items. The table has many columns to describe these items, including columns for SupplierName, SupplierOrderNumber and PredictedArrivalDate.
If I order several new items from a supplier, I will record each item separately in the table with the same supplier name, order number and a predicted arrival date.
I would like to add a data macro, so that if I update the PredictedArrivalDate for one record, the value will be copied to the PredictedArrivalDate column of other records/items with the same SupplierName AND SupplierOrderNumber.
The closest I've got is:
SetLocalVar (MySupplierName, [SupplierName])
SetLocalVar (MySupplierOrderNumber, [SupplierOrderNumber])
SetLocalVar (MyPredictedArrivalDate, [PredictedArrivalDate])
For Each Record in tblItems
    Where Condition = [SupplierOrderNumber] Like [MySupplierOrderNumber] And [SupplierName] Like [MySupplierName] And [PredictedArrivalDate]<>[MyPredictedArrivalDate]
    Alias OtherRecords
    EditRecord
        SetField ([OtherRecords].[PredictedArrivalDate], [MyPredictedArrivalDate])
    End EditRecord
However, when I run this, only 5 records update, and the error log reports error -20341:
"A data macro resource limit was hit. This may be caused by a data
macro recursively calling itself. The Updated() function may be
used to detect which field in a record has been updated to help
prevent recursive calls."
How can I get this working?
I'm not one for using macros for anything, so I'd use VBA with recordsets or an action query to do the updating.
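As a rough sketch of that action-query route, assuming the edit is made on a form named frmItems bound to tblItems (the form name is an assumption; adjust to your design):
UPDATE tblItems
SET PredictedArrivalDate = Forms!frmItems!PredictedArrivalDate
WHERE SupplierName = Forms!frmItems!SupplierName
  AND SupplierOrderNumber = Forms!frmItems!SupplierOrderNumber
  AND ID <> Forms!frmItems!ID;
You could run a saved query like this from the form's AfterUpdate event with DoCmd.OpenQuery, which sidesteps the data-macro recursion issue.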
You can call a user-defined function inside a data macro by setting a local var equal to its result.
Access doesn't like data macros triggering themselves (which is what you're doing: an On Update macro that updates fields on other records in the same table), because of the risk of accidentally creating endless loops. It looks like you hit a safeguard designed to prevent exactly that. I'd avoid this pattern as much as possible.
Note: using user-defined functions inside data macros can cause problems when you're linking to the table from outside of Access (via ODBC for example).
This isn't a good solution (it's not a data macro), but it does work as a temporary fix.
I created an update query called "updatePredictedArrivalDate":
PARAMETERS
ItemID Long,
MyPredictedArrivalDate DateTime,
MySupplierName Text ( 255 ),
MySupplierOrderNumber Text ( 255 );
UPDATE tblItems
SET tblItems.PredictedArrivalDate = [MyPredictedArrivalDate]
WHERE (((tblItems.SupplierName) = [MySupplierName])
AND ((tblItems.SupplierOrderNumber) = [MySupplierOrderNumber])
AND ((tblItems.ID) <> [ItemID]));
On the PredictedArrivalDate form field .AfterUpdate event, I then added this macro:
IF [PredictedArrivalDate].[OldValue]<>[PredictedArrivalDate] Or [PredictedArrivalDate]<>""
OpenQuery (updatePredictedArrivalDate, Datasheet, Edit, [ID], [PredictedArrivalDate], [SupplierName], [SupplierOrderNumber])
I now have to remember to add this .AfterUpdate event to any other forms I create that amend that particular field.
If anyone has a better solution, please let me know.

How to implement logging at the end of each job in Talend?

I am new to Talend Open Studio.
However, I received a task:
1) Create delimited .csv file metadata (one for Lead and one for Opportunity).
2) Move the files to your repository on the AWS server (the etl_process1 login).
3) Create two tables, sfdc_leads_reporting_raw and sfdc_opp_reporting_raw.
4) Load the data from the files into the tables. Ensure the correct data types are used when creating the metadata schemas and tables.
I am done up to step 4.
Now the problem is:
How do I implement logging at the end of each job to report the number of leads (count of distinct IDs in the leads table) and the number of opportunities created (count of opportunity IDs) by stage (how many converted, qualified, closed won, and dead)?
Help would be appreciated.
You can get this data using global variables, in a subjob at the end of your job. Most components provide a global variable called tComponent_NB_LINE (or _NB_LINE_INSERTED for database components) that gives you the number of lines output by the component.
For instance tFileOutputDelimited_1_NB_LINE or tOracleOutput_1_NB_LINE_INSERTED.
Using these variables, you can log to the console or to a file.
Here is a simple example. If you have a tOracleOutput_1 in your job you can do:
tPostJob -- OnComponentOk -- tFixedFlowInput -- Main -- tLogRow
Inside tFixedFlowInput you retrieve the variable with:
(Integer)globalMap.get("tOracleOutput_1_NB_LINE_INSERTED")
If you need to log aggregated info, you can append a tAggregateRow to your output flow and use tSetGlobalVar to capture counts by certain criteria.
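Since the data ends up in the two raw tables anyway, another option is to run the report queries with a database input component at the end of the job and log their results; a sketch, assuming the columns are named id, opportunity_id and stage:
SELECT COUNT(DISTINCT id) AS lead_count
FROM sfdc_leads_reporting_raw;

SELECT stage, COUNT(opportunity_id) AS opportunity_count
FROM sfdc_opp_reporting_raw
GROUP BY stage;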

Capturing runtime for each task within a dataflow in SSIS 2012

In my SSIS package I have a dataflow that looks something like this.
My requirement is to log the end time of each flat file destination (or the time when each of the flat files is created) in a SQL Server table. To be clearer, there will be one row per flat file in the log table. Is there any simple way (preferably) to accomplish this? Thanks in advance.
Update: I ended up using a script task after the dataflow to read the creation time of each of the files created in the dataflow. I also used the same script task to insert the logs into the table, just to keep things in one place. For details, refer to the post marked as answer.
In order to capture the accurate date and timestamp of each flat file created as a destination, you'll need to create three new global variables and set up a For Each Loop container in the control flow following your current data flow task. Inside the loop, add a script task that reads the date/time information from one flat file at a time and saves it to one of the new global variables; a second SQL task (also in the loop) can then write that information to a database table.
The following link provides a good example of the steps you'll need to apply. There are a few extra steps that aren't applicable here and can easily be skipped.
http://microsoft-ssis.blogspot.com/2011/01/use-filedates-in-ssis.html
Hope this helps.
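For the SQL task inside the loop, the statement can be as simple as a parameterized insert; the table and column names below are placeholders, with the ? parameters mapped to the loop's file-name and file-date variables:
INSERT INTO dbo.FileCreationLog (FileName, FileCreatedAt)
VALUES (?, ?);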
After looking more closely at the toolbox, I think the best way to do this is to move each source/destination pairing into its own dataflow and use the OnPostExecute event of each dataflow to write to the SQL table.
Wanted to provide more detail on @TabAlleman's approach.
For each control flow task with a name like Bene_hic, you will have a source file and a destination file.
On the 'Event Handlers' tab for that executable (use the drop-down list), you can create the OnPostExecute event.
In that event, I have two SQL tasks. One generates the SQL to execute for this control flow task, the second executes the SQL.
These SQL tasks are dependent on two user variables scoped in the OnPostExecute event. The EvaluateAsExpression property for both is set to True. The first one, Variable1, is used as a template for the SQL to execute and has a value like:
"SELECT execSQL FROM db.Ssis_onPostExecute
where stgTable = '" + #[System::SourceName] + "'"
#[System::SourceName] is an SSIS system variable containing the name of the control flow task.
I have a table in my database named Ssis_onPostExecute with two fields, an execSQL field with values like:
DELETE FROM db.TableStats WHERE TABLENAME = 'Bene_hic';
INSERT INTO db.TableStats
SELECT CreatorName, t.tname, CURRENT_TIMESTAMP, rcnt
FROM (SELECT databasename, TABLENAME AS tname, CreatorName
      FROM dbc.TablesV) t
INNER JOIN (SELECT 'Bene_hic' AS tname, COUNT(*) AS rcnt
            FROM db.Bene_hic) u
    ON t.tname = u.tname
WHERE t.databasename = 'db' AND t.tname = 'Bene_hic';
and a stgTable field with the name of the corresponding control flow task in the package (case-sensitive!) like Bene_hic
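For reference, a minimal sketch of that helper table (the column types and sizes are assumptions; size execSQL to hold your longest statement):
CREATE TABLE db.Ssis_onPostExecute (
    stgTable VARCHAR(128) NOT NULL,
    execSQL  VARCHAR(4000) NOT NULL
);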
In the first SQL task (named SQL), I have the SourceVariable set to a user variable (User::Variable1) and the ResultSet property set to 'single row'. The Result Set detail includes a Result Name of 0 and the second user variable (User::Variable2) as the Variable Name.
In the second SQL task (named exec), I have the SQLSourceType property set to Variable and the SourceVariable property set to User::Variable2.
Then the package is able to copy the data in the source object to the destination, and whether it fails or not, enter a row in a table with the timestamp and number of rows copied, along with the table name and anything else you want to track.
Also, when debugging, you have to run the whole package, not just one task in the event. The variables won't be set correctly otherwise.
HTH, it took me forever to figure all this stuff out, working from examples on several web sites. I'm using code to generate the SQL in the execSQL field for each of the 42 control flow tasks, meaning I created 84 user variables.
-Beth
The easy solution would be:
1) Drag an OLE DB Command from the toolbox after the Flat File destination.
2) Set its statement to update your log table with the current date when the Flat File destination succeeds (see the sketch below).
3) You can create a variable (project scope) whose value is the system date/time.
4) Depending on how your package is constructed, you might need another variable to record success or failure.
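The statement behind that command can be a simple parameterized insert; the table and column names here are assumptions, with the parameters mapped to your file-name and date/time variables:
INSERT INTO dbo.FlatFileLog (FileName, LoadedAt)
VALUES (?, ?);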

SSIS 2008R2 Data Driven Variable Values

I am fairly new to SSIS and have set myself a challenging first project, creating a data driven package framework. My current challenge is that I want to store the values of variables for my various packages in a table and then load them. So for instance, the SSIS package might be processing records between 2 dates. I would have two records in a parameters table:
ParmName   ParmValue
--------   ----------
DateFrom   2013-01-01
DateTo     2013-01-31
These variable names will exist in the package; I just need to load them. In a false start, I tried using an Execute SQL Task, but this didn't work. I assume I need a C# Script Task to do this, but I don't know C#. I'm wondering if anyone could give me a pointer to some code similar to what I am trying to do. Just to make it a bit clearer, in pseudo code I envision a process like:
Dataset = Select * from PkgParms where PckID = ?
FOR EACH DataSet.Record
SET (DataSet.Record.ParmName.Value) = (DataSet.Record.ParmValue.Value)
If this is not doable or I am in over my head please just let me know
Thanks
Steve
This is usually done with SSIS Package Configurations. Follow the wizard and choose the SQL Server configuration type.
You can find tutorials on how to do it, but in general you'll need two new columns:
packagepath, with values like:
\Package.Variables[User::DateFrom].Properties[Value]
configurationfilter, which will have the same value for both dates, e.g. Dates
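For reference, the SQL Server configuration type stores its entries in a table the wizard creates (named [SSIS Configurations] by default), so the two date variables end up as rows roughly like this (the filter name and values just follow the example above):
INSERT INTO [SSIS Configurations]
    (ConfigurationFilter, ConfiguredValue, PackagePath, ConfiguredValueType)
VALUES
    ('Dates', '2013-01-01', '\Package.Variables[User::DateFrom].Properties[Value]', 'DateTime'),
    ('Dates', '2013-01-31', '\Package.Variables[User::DateTo].Properties[Value]', 'DateTime');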

How to export a flat file with different rows using SSIS?

I have three tables, Customer, Invoice and InvoiceRow, with the standard relations.
I have to export these into one fixed-field-length file, with the first two characters of each row identifying the row type. The row types have different specifications.
I could probably do it with a nested loop in a script block, but this is my first ever SSIS package and that solution feels wrong.
edit:
The output has to have:
Customer
Invoice
Rows
Customer
Invoice
Rows
and so on
Your gut feeling about doing this with a Script Destination component is correct. Unfortunately, this scenario doesn't jibe well with SSIS. I don't consider this a beginner package. If you must use SSIS, then I'd start by inner joining all the data so there is one row for each InvoiceRow, containing the data needed from all three tables.
CustomerCols, InvoiceCols, RowCols
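A sketch of that join; the key column names are assumptions based on the 'standard relations' mentioned in the question:
SELECT c.*, i.*, r.*
FROM Customer c
INNER JOIN Invoice i
    ON i.CustomerID = c.CustomerID
INNER JOIN InvoiceRow r
    ON r.InvoiceID = i.InvoiceID
ORDER BY c.CustomerID, i.InvoiceID;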
Then, in the script destination component you'll need to keep track of the customer and invoice values; as they change, you'll need to write extra rows to the output.
See Creating a Destination with the Script Component for more information on script destination.
My experience shows that script destinations can have good performance.
I would avoid writing Script Destination, and use just Script Transform + Flat File Destination. This way, you concentrate on the logical output (strings of data), while allowing SSIS to do actual writing to the file (it might be a bit more efficient, plus you concentrate on your business, not on writing to files).
First, you'll need to get denormalized data. You can do the joins and sorts in the DBMS, but if you don't want to put too much pressure on the DBMS, just get sorted data out of it and merge it using two SSIS Merge Join transforms.
Then do the script: keep running values of current Customer and Invoice, output them when they change, output InvoiceRow on every input. Something like this:
// Inside the Script transform's per-row method; CustomerID is a class-level
// field holding the running value of the current customer.
if (this.CustomerID != InputBuffer.CustomerID)
{
    this.CustomerID = InputBuffer.CustomerID;
    OutputBuffer.AddRow();
    OutputBuffer.OutputColumn = "Customer: " + InputBuffer.CustomerID + " " + InputBuffer.CustomerName;
}
// repeat the same pattern for Invoice
OutputBuffer.AddRow();
OutputBuffer.OutputColumn = "InvoiceRow: " + InputBuffer.InvoiceRowPrice;
Finally, add a Flat File Destination with a single column (OutputColumn created by the script) to write this to the file.
Process your three tables so that the outputs are all appropriate for your output file (including the row type designator). You'll have to do this in three separate flow paths in your data flow, then bring the rows together in a Union All data flow element. From there, process them as needed to create your output file.
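As a sketch of what one of those three paths might produce before the Union All (the 'CU' row-type code, field widths, and column names are placeholders for your real spec):
SELECT 'CU'
     + LEFT(CustomerName + REPLICATE(' ', 30), 30)
     + LEFT(CAST(CustomerNumber AS VARCHAR(10)) + REPLICATE(' ', 10), 10) AS OutputLine
FROM Customer;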