dbt Snapshot using Variable as source & DAG lineage - jinja2

I've got 3 main questions;
I've created a Snapshot, but instead of referencing a Source using the Source() function I have used variables that can be passed from the command line.
(My plan is to get Azure Data Factory to run a dbt Snapshot on the end of importing a source).
At the moment I'm testing in Visual Studio Code running from Powershell using the command
dbt run --vars '{"SourceSch": "SourceExampleSchema", "SourceTble": "SrcTbl"}'
In the snapshot I've got the following;
{% snapshot SrcTbl %}
At the moment I can't get "SrcTbl" to populate from the variable so I've had to hard code it.
I've tried the following results in an object called "var".
{% snapshot var("SourceTble") %}
Can anyone tell me how to reference the variable in the snapshot name?
When the docs are served only the snapshot table is a node, the source isn't listed.
Is this the default behaviour or is it because I'm not calling the Source() function?
How would I get the DAG to pick up the source going into the Snapshot?
Then I'm looking to expand this through Jinja so that it creates a snapshot for each "SourceTble".
3. How would I expand it if I had the following structure;
Database
-> Schema 1
-> Table 1
-> Schema 2
-> Table 2
-> Table 3
I think it would be something like;
dbt run --vars '{"SourceSch": ["Schema1", "SourceTble": "Table1", "Schema2", "SourceTble": ["Table2", "Table3"]]}'
But I'm confused on how to use Jinja to create a .sql file for each table using the one as a base.
Any help is appreciated.
Thanks,
Dan

Related

Comparing lists in DBT jinja macro inside for loop

I am storing output of each column in a list for the below query
select table,
columns,
flag
from {{ model }}
for each table I need to compare the flag. how do i get flag in list for the corresponding table
{% for n in table %}
{%- if flag[n] == 'False' %}
I tried as above its not working its returning null value
It's important to understand that jinja is a templating language. In dbt, we use jinja to automate the creation of sql that then gets executed in our data warehouse. Accordingly, data from your data warehouse does not (typically) flow through your jinja code.
flag is a reference to a column in your database, so to evaluate the contents of flag, you need to write sql to do that job.
The only exception is if you are using run_query or a similar macro to pull the data from your database into your jinja context at model-compile time. But I don't see any reference to that in your code (and frankly that's a complicating detail that's only relevant if you are trying to write SQL dynamically). If you are just trying to compare the value from a record in your database as part of your model, just do it in sql.

How to Implement logging at the end of each job In talend?

I am new to Talend os.
However, I received a task:
Create file delimited .csv metadata (one for Lead & Opportunity).
Move files to your repository on the AWS server (the etl_process1 login).
Create two tables sfdc_leads_reporting_raw and sfdc_opp_reporting_raw.
Load the data from the files into the tables. Ensure the data types are correctly used when creating metadata schemas & tables.
Till step 4 I am done.
Now the problem is:
How to Implement logging at the end of each job to report the number of leads (count of distinct id in leads table) and number of opportunities created (count of opportunity id) by stages (how many converted, qualified, closed won, and dead)?
Help would be appreciated.
You can get this data using global variables, in a subjob at the end of your job. Most components provide a global variable called tComponent_NB_LINE (or _NB_LINE_INSERTED for database components) that gives you the number of lines output by the component.
For instance tFileOutputDelimited_1_NB_LINE or tOracleOutput_1_NB_LINE_INSERTED.
Using these variables you can log into console or file.
Here is a simple example. If you have a tOracleOutput_1 in your job you can do:
tPostJob -- OnComponentOk -- tFixedFlowInput -- Main -- tLogRow
Inside tFixedFlowInput you retrieve the variable
(Integer)globalMap.get("tOracleOutput_1_NB_LINE_INSERTED")`.
If you need to log aggregated info, you can append a tAggregateRow to your output components, and use tSetGlobalVar to get count by certain criteria.

processing multiple files in business objects data services

I am new to the Business Objects Data services.
I have to run a dataflow reading from a file. Filename should be read based on wild chars like Platform. And I want to run the dataflow only if the file exists, if file is not present , it should not error out or should not do anything but it should just move on to the next dataflow or workflow in the job.
I tried below code to check if the file exists as built_in function File_Exists cannot check the file based on wild chars.
*$FILEEXISTSFLAG= exec('/bin/ksh',' "ls xxxxxx/Platform.csv',8);*
My intention is based on the value assigned to $FILEEXISTSFLAG from above code, I will decide whether to execute the data flow or not (if $FILEEXISTSFLAG is null do nothing otherwise execute the data flow ) but its retrieving below output.
*ls: cannot access /xxxxxx/Platform.csv: No such file*
Is there any other way to achieve this?
I was able to solve the above problem by using the index function.
$FILEEXISTSFLAG is containing a value like "ls: cannot access Platform: No such file or directory ". So, I have used the index function as below. So if the output is not null for below index function, it will execute the dataflow, otherwise it will do nothing.
index( $FILEEXISTSFLAG , 'No such file',1)
Thanks,
Phani.

Batch job to export data into CSV

I'm doing my first ABAP job and I don't have much experience so I need a little help.
What I want to do to create a batch job that runs every morning at a specific time, fetches data from different tables and exports it as a csv file. To create that batch job I can use transaction code SM36 or SM37.
But I need some help how to fetch the data?
Has anyone an example code that I can use or take a look at?
TheG is right, it sounds like you're trying to learn ABAP from scratch with no guidance. That's difficult but here are some basics:
There are three parts to this:
1. create a program
2. generate a file
3. schedule the job
For 1,
If you go to SE38, you can create a new report. You'll have to check with your colleagues about the namespace, but usually you just start the program with Z (which puts it in the 'customer' namespace).
In the entry box of SE38, you can type DEMO to pull up lots of sap-provided demo reports. The names usually give you a hint about what they demo and you can probably find one that mentions creating a file.
Once you create your own report through SE38 by typing in the name and hitting enter, you can use SELECT...INTO TABLE or SELECT ... ENDSELECT to query the database tables. Highlight select and click the blue i icon to pull up SAP's internal documentation.
At it's most basic, you can use the WRITE statement to print out the rows and columns of your data.
Once you have your report running, then scheduling it with SM36 will be more self explanatory.
This is very basic ABAP reporting program stuff. Making the report run as a background/batch job is the least of the concerns. Let us help you walk through this.
-> Have you done any reporting programming before ?
-> Do you have the list of tables from which you want the data and do you know how they are linked ?
-> Do you know how often this report would be run and what would be the selection criteria required ?
-> Did you check with the functional team whether you want 'delta pull' or 'full pull' every time you run the report ?
-> Do you have the file share where you want to output the file ? Is it on the presentation server or the application server ? If not presentation server can you reason out why not ?
-> Did you confirm on the file name and how it should look ?
-> Do you know how to generate a CSV file ? If this is a 'production requirement' ,are there reusable frameworks for handling file operations in your company ?
-> Do you have the final format of how the CSV file would look ?
-> Did you verify with the functional team whether they want the output data in external format for some fields ?
-> Did you check if there are date fields in your output and what format you want it to be for consistency ?
If you are familiar with ABAP a little bit, explore answers for above, write a report and getting it running in dialog mode. Then revert back to us and we will help you on how to run it as a batch job.

SSIS - Load flat files, save file names to SQL Table

I have a complex task that I need to complete. It worked well before since there was only one file but this is now changing. Each file has one long row that is first bulk inserted into a staging table. From here I'm supposed to save the file name into another table and then insert the the broken up parts of the staging table data. This is not the problem. We might have just one file or even multiple files to load at once. What needs to happen is this:
The first SSIS task is a script task that does some checks. The second task prepares the file list.
The staging table is truncated.
The third task is currently a Foreach loop container task that uses the files from the file list and processes it:
File is loaded into table using Bulk Insert task.
The file name needs to be passed as a variable to the next process. This was done with a C# task before but it is now a bit more complex since there could be more than one file and each file name needs to be saved separately.
The last task is a SQL task that executes a stored procedure with the file name as input variable.
My problem is that before it was only one file. This was easy enough. What would the best way be to go about it now?
In Data Flow Task which imports your file create a derrived column. Populate it with system variable value of filename. Load filename into the same table.
Use a Execute SQL task to retrieve distinc list of filenames into a recordset (Object type variable).
Use For Each Loop container to loop through the recordset. Place your code inside the container. Code will recieve filename from the loop as a value of a variable and process the file.
Use Execute SQL task in For Each Loop container to call SP. Pass filename as a parameter like:
Exec sp_MyCode param1, param2, ?
Where ? will pass filename INPUT as a string
EDIT
To make Flat File Connection to pick up the file specified by a variable - use Connection String property of the Flat File Connection
Select FF Connection, right click and select Properties
Click on empty field for Expressions and then click ellipsis that appears. With Expressions you can define every property of the object listed there using variables. Many objects in SSIS can have Expressions specified.
Add an Expression, select Connection String Property and define an expression with absolute path to the file (just to be on a safe side, it can be a UNC path too).
All the above can be accomplished using C# code in the script task itself. You can loop through all the files one by one and for each file :
1. Bulk Copy the data to the staging
2. Insert the filename to the other table
You can modify the logic as per your requirement and desired execution flow.
Add a colunm to your staging table - FileName
Capture the filename in a SSIS Variable (using expressions) then run something like this each loop:
UPDATE StagingTable SET FileName=? WHERE FileName IS NULL
Why are you messing about with C#? From your description it's totally unnecessary.