I'm currently working on a project where I'm using Google BigQuery to pull data from spreadsheets. I'm VERY new to SQL, so I apologize. I'm currently using the following code:
Select *
From my_data
Where T1 > 1000
And T2 > 2000
So, keeping the Select and From the same, I want to be able to run multiple queries where I just keep changing the values I'm filtering on for T1 and T2 (around 50 different value pairs). I'd like BigQuery to run through these 50 variations back to back. Is there a way to do this? Thanks!
I'm VERY new to SQL
... and, I assume, new to BigQuery as well ..., so
Below is one of the options for new users who are not yet familiar with the BigQuery API and/or clients other than the BigQuery Web UI.
BigQuery Mate adds a parameters feature to the BigQuery Web UI.
What you need to do is:
Save your query as below using the Save Query button.
Notice <var_t1> and <var_t2>
Those are the parameters identifiable by BigQuery Mate.
Now you can set those parameters
Click QB Mate and then Parameters to get to the form below.
Now you can set the parameters to whatever values you want to run with.
Click the Replace Parameters OK button and those values will appear in the editor. For example:
After OK is clicked, you get:
So now you can run your query
To run another round with new parameters, you need to load your saved query into the editor again by clicking the Edit Query button,
and then repeat setting the parameters, and so on.
You can find the BigQuery Mate Chrome extension here.
Disclaimer: I am the author and the only developer of this tool.
You may be interested in running parameterized queries. The idea would be to have a single query string, e.g.:
SELECT *
FROM YourTable
WHERE t1 > @t1_min
  AND t2 > @t2_min;
You would execute this multiple times, where each time you bind different values of the t1_min and t2_min parameters. The exact logic would depend on the API through which you are using the client libraries, and there are language-specific examples in the first link that I provided.
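If the goal is simply to run the same SELECT back to back for each of the ~50 value pairs, another option worth mentioning is BigQuery scripting, which lets you do the looping entirely in SQL (this is a separate feature from parameterized queries and was added to BigQuery later; the table and column names below come from the question, and the thresholds are made-up examples):
-- declare the ~50 threshold pairs once, then loop over them
DECLARE value_pairs ARRAY<STRUCT<t1_min INT64, t2_min INT64>> DEFAULT [
  STRUCT(1000 AS t1_min, 2000 AS t2_min),
  STRUCT(1500 AS t1_min, 2500 AS t2_min)
  -- ...add the remaining pairs here
];

FOR pair IN (SELECT t1_min, t2_min FROM UNNEST(value_pairs)) DO
  SELECT *
  FROM my_data
  WHERE T1 > pair.t1_min
    AND T2 > pair.t2_min;
END FOR;
Each statement inside the script runs as its own child query job, so the 50 result sets appear back to back in the job history.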
If you are not concerned about SQL injection and just want to iteratively swap out parameters in queries, you might want to look into the mustache templating language (available in R as 'whisker').
If you are using R, you can iterate/automate this type of query with the condusco R package. Here's a complete R script that will accomplish this kind of iterative query using both whisker and condusco:
library(bigrquery)
library(condusco)
library(whisker)
# create a simple function that will create a query
# using {{{mustache}}} placeholders for any parameters
create_results_table <- function(params){
destination_table <- '{{{dataset_id}}}.{{{table_prefix}}}_results_{{{year_low}}}_{{{year_high}}}'
query <- '
SELECT *
FROM `bigquery-public-data.samples.gsod`
WHERE year > {{{year_low}}}
AND year <= {{{year_high}}}
'
# use whisker to swap out {{{mustache}}} placeholders with parameters
query_exec(
whisker.render(query,params),
project=whisker.render('{{{project}}}', params),
destination_table = whisker.render(destination_table,params),
use_legacy_sql = FALSE
)
}
# create an invocation query to provide sets of parameters to create_results_table
invocation_query <- '
SELECT
"<YOUR PROJECT HERE>" as project,
"<YOUR DATASET_ID HERE>" as dataset_id,
"<YOUR TABLE PREFIX HERE>" as table_prefix,
num as year_low,
num+1 as year_high
FROM `bigquery-public-data.common_us.num_999999`
WHERE num BETWEEN 1992 AND 1995
'
# call condusco's run_pipeline_gbq to iteratively run create_results_table over invocation_query's results
run_pipeline_gbq(
create_results_table,
invocation_query,
project = '<YOUR PROJECT HERE>',
use_legacy_sql = FALSE
)
Related
I am trying to accomplish something that is pretty easy to do in SQL, but seemingly very challenging to do in SSIS without using SQL. Basically, I need to consolidate and concatenate a field of a many-to-one relationship.
Given entities: [Contract Item] (many) to (one) [Account]
There is a field [ari_productsummary] that contains the product listed on the Contract Item entity. We want to write that value to the Account as [ari_activecontractitems]. However, an Account may have more than one Contract Item record associated with it, in which case we want to concatenate those values. We also only want the distinct values to be concatenated (distinct rows are already handled within my data flow).
This can be accomplished by writing to a temporary table and then using a query or view to obtain the summarized results, as follows. I created a SQL table called TESTTABLE that contains the [ari_productsummary] from the Contract Item entity along with the referring [accountid] to map it back to Account. I then wrote the following query as a view:
SELECT distinct accountid,
(SELECT TT2.ari_productsummary + '; '
FROM TESTTABLE TT2
WHERE TT2.accountid = TT.accountid
FOR XML PATH ('')
) AS 'ari_activecontractitems'
FROM TESTTABLE TT
Executing that query provides the results that I want, which I can then use for importing into the Account entity.
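As an aside, on SQL Server 2017 and later the same summary could be written more compactly with STRING_AGG; this is shown only to illustrate the result the data flow needs to reproduce, not something I'm using:
SELECT accountid,
       STRING_AGG(ari_productsummary, '; ') AS ari_activecontractitems
FROM TESTTABLE
GROUP BY accountid;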
But how do I do this in an SSIS data flow without writing to a SQL table as a temporary placeholder for the data? I want to do the entire process inside one data flow container, without using a temporary SQL table/view. The whole summarization process needs to be done on the fly.
Does anyone have a solution that doesn't require a temporary SQL table/view/query, but is contained entirely within the data flow?
I am using VS 2017 and the KingswaySoft Dynamics CRM 365 ETL toolset to develop my solution/package.
Spitballing here, as I don't know Dynamics nor do I have the custom components.
Data Flow 1 - Contract aggregation
The purpose of this data flow is to replicate your logic in the elegant query you provided and shove that into a Cache Connection Manager (see Notes for 2008+ at the end)
KingswaySoft Dynamics Source -> Script Component -> Cache Transform
If you want to keep the sort in there, do it before the script component. The approach I'll take with the Script Component is that it's fully blocking: all the rows must arrive before it can send any on. Transformations like the Merge Join are only partially blocking, because the requirement of sorted data means that once you no longer have a match for the current item, you can send it on down the pipeline.
The Script Component is going to be an asynchronous transformation. You'll have two output columns: your key accountid and your new derived column ari_activecontractitems. That column might need to be big; you'll know your data best, but if it's a blob type in Dynamics (> 4k Unicode or > 8k ASCII characters) then you'll have to define the data type as DT_TEXT/DT_NTEXT.
As inputs, you'll select accountid and ari_productsummary from your source.
The code should be pretty easy. We're going to accumulate the inbound data into a Dictionary.
// member variable
Dictionary<string, List<string>> accumulator;
In the PreExecute method, we'll tack this in to initialize our variable:
// initialize in the PreExecute method
accumulator = new Dictionary<string, List<string>>();
In the per-row processing method (Input0_ProcessInputRow in the generated code):
// simulate the inbound queue
// row_id would be something like Row.accountid in the generated code
if (!accumulator.ContainsKey(row_id))
{
// Create an empty list for this key
accumulator.Add(row_id, new List<string>());
}
// add it if we don't have it
if (!accumulator[row_id].Contains(invoice))
{
accumulator[row_id].Add(invoice);
}
Once you get the signal that no more data is available, that's when you start buffering output data. The auto-generated code will have placeholders for all of this.
// This is how we shove data out the pipe
foreach(var kvp in accumulator)
{
// approximately thus
OutputBuffer1.AddRow();
OutputBuffer1.accountid = kvp.Key;
OutputBuffer1.ari_activecontractitems = string.Join("; ", kvp.Value);
}
We have an upcoming release that comes with a component that does exactly what you are trying to achieve without the need to write custom code. The feature is currently in preview; please reach out to us for private access to it. You can find our contact information on our website.
UPDATE (June 5, 2020): we have made the components available for public access at https://www.kingswaysoft.com/products/ssis-productivity-pack/ as part of our 2020 Release Wave 1. We have two components available that serve this kind of purpose. The Composition component takes input values and transforms them into a composite value in an SSIS column. The Decomposition component does the opposite: it takes an input value and splits it into multiple rows using either delimiter-based text splitting or XML/JSON array splitting.
To comply with regulations, I'm trying to download the purchase invoice documents (as PDF files) from some of my divisions and save them on disk for archiving purposes.
I use the Invantive Query Tool to do this. I'd like to know which table to use and how to export only the attachments that belong to purchase invoice documents.
You can indeed do this by using the export options in Invantive Query Tool or Invantive Data Hub.
What you need is a query that hooks up the document information of type 20 (purchase invoices) with the actual attachment files. You can find a list of types and their description in the DocumentTypes view. You can find the document attachment files in the DocumentAttachmentFiles table.
When you have retrieved that, you can export the documents from that query to disk using a local export documents statement.
The full query is here:
use 123456
select /*+ join_set(dae, document, 10000) */ attachmentfromurl
, dct.division || '/' || dae.id || '-' || filename
filepath
from exactonlinerest..documents dct
join DocumentAttachmentFiles dae
on dae.division = dct.division
and dae.document = dct.id
where dct.Type = 20
order
by dct.division
, dae.id
local export documents in attachmentfromurl to "c:\temp\docs" filename column Filepath
Make sure to set the ID of the division correctly in the use statement (this is the technical ID, not the 'division number', which can contain duplicates). You can find it in the top menu bar under Partitions. Or simply use 'use all' to get the documents from all divisions (this might take a while).
Also set the file path correctly where it now says c:\temp\docs. Then hit F5 in the Query Tool to execute, or run the script from Data Hub.
The SQL engine hides away all the nifty details of what API calls are being made. However, some cloud solutions have pricing per API call.
For instance:
select *
from transactionlines
retrieves all Exact Online transaction lines of the current company, but:
select *
from transactionlines
where financialyear = 2016
effectively filters it on the REST API of Exact Online to just that year, reducing data volume. And:
select *
from gltransactionlines
where year_attr = 2016
retrieves all data since the where-clause is not forwarded to this XML API of Exact.
Of course I can attach Fiddler or Wireshark and try to analyze the data volume, but is there an easier way to analyze the data volume of API calls with Invantive SQL?
First of all, all calls handled by Invantive SQL are logged in the Invantive Cloud together with:
the time
data volume in both directions
duration
to enable consistent API use monitoring across all supported cloud platforms. The actual data is not logged and travels directly.
You can query the same numbers from within your session, for instance:
select * from exactonlinerest..projects where code like 'A%'
retrieves all projects with a code starting with 'A'. And then:
select * from sessionios#datadictionary
shows you the API calls made.
You can also put a query like the following at the end of your session before logging off:
select main_url
, sum(bytes_received) bytes_received
, sum(duration_ms) duration_ms
from ( select regexp_replace(url, '\?.*', '') main_url
, bytes_received
, duration_ms
from sessionios#datadictionary
)
group
by main_url
with a result showing the total bytes received and duration per main URL.
I want to accomplish a fairly simple task (I'd think).
I have one table with a shiftid (INT), shiftstart (datetime), shiftend (datetime).
I'd like to query that table, then run a query (on an entirely different database) that asks for production (which is calculated in an odd way - requiring three separate queries) using the start and end times, and store that in the original database with the shiftid and a production amount for the shift.
I've tried to do this using a Foreach Loop and a script task that builds a variable that would contain the query, but I'm continually hitting a brick wall there.
Dts.Variables("User::SQLshiftstart").Value = "SELECT value FROM[dbo].[AnalogHistory] WHERE TagName = 'Z_HISTFMZ_P2_0004' AND DateTime = '" & Dts.Variables("User::shiftstart").ToString
I keep getting an error - "Command text was not set for the command object". And googling that error doesn't push me any further down the path.
Help!
Well, I decided to go a different way instead of using a script object to build a variable. I actually created the variable in my SELECT:
SELECT (CONCAT
('SELECT CAST(value AS DECIMAL(10,4)) AS beg FROM [dbo].[AnalogHistory] WHERE TagName = ''Z_HISTDATA_P1_0007'' AND DateTime = '' ',
DateAdd(hh,-6,shiftstart),
' '' AND wwTimeZone = ''UTC'' '))
This way, I avoid having to build an intermediate script object and can directly query based on the variable name in my FOREACH loop.
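For a shiftstart of, say, 2019-01-01 06:00:00, the CONCAT expression above would produce a string roughly like the following (the exact datetime text depends on SQL Server's implicit datetime-to-string conversion; this expansion is only for illustration):
SELECT CAST(value AS DECIMAL(10,4)) AS beg
FROM [dbo].[AnalogHistory]
WHERE TagName = 'Z_HISTDATA_P1_0007'
  AND DateTime = ' Jan  1 2019 12:00AM '
  AND wwTimeZone = 'UTC'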
In my SSIS package I have a dataflow that looks something like this.
My requirement is to log the end time of each flat file destination (or the time when each of the flat files is created) in a SQL Server table. To be more clear, there will be one row per flat file in the log table. Is there any simple way (preferably) to accomplish this? Thanks in advance.
Update: I ended up using a script task after the data flow and reading the creation time of each of the files created in the data flow. I also used the same script task to insert the logs into the table, just to keep things in one place. For details, refer to the post marked as answer.
In order to get the accurate date and timestamp of each flat file created as the destination, you'll need to create three new global variables and set up a Foreach Loop container in the control flow following your current data flow task. Then add to the Foreach Loop container a script task that will read the date/time information from one flat file at a time. That information is then saved to one of the new global variables, which can then be used in a second SQL task (also in the Foreach Loop) to write the information to a database table.
The following link provides a good example of the steps you'll need to apply. There are a few extra steps not applicable that you can easily exclude.
http://microsoft-ssis.blogspot.com/2011/01/use-filedates-in-ssis.html
Hope this helps.
After looking more closely at the toolbox, I think the best way to do this is to move each source/destination pairing into its own dataflow and use the OnPostExecute event of each dataflow to write to the SQL table.
Wanted to provide more detail on @TabAlleman's approach.
For each control flow task with a name like Bene_hic, you will have a source file and a destination file.
On the 'Event Handlers' tab for that executable (use the drop-down list), you can create the OnPostExecute event.
In that event, I have two SQL tasks. One generates the SQL to execute for this control flow task, the second executes the SQL.
These SQL tasks are dependent on two user variables scoped in the OnPostExecute event. The EvaluateAsExpression property for both is set to True. The first one, Variable1, is used as a template for the SQL to execute and has a value like:
"SELECT execSQL FROM db.Ssis_onPostExecute
where stgTable = '" + @[System::SourceName] + "'"
@[System::SourceName] is an SSIS system variable containing the name of the control flow task.
I have a table in my database named Ssis_onPostExecute with two fields, an execSQL field with values like:
DELETE FROM db.TableStats WHERE TABLENAME = 'Bene_hic';
INSERT INTO db.TableStats
SELECT CreatorName ,t.tname, CURRENT_TIMESTAMP ,rcnt FROM
(SELECT databasename, TABLENAME AS tname, CreatorName FROM dbc.TablesV) t
INNER JOIN
(SELECT 'Bene_hic' AS tname,
COUNT(*) AS rcnt FROM db.Bene_hic) u ON
t.tname = u.tname
WHERE t.databasename = 'db' AND t.tname = 'Bene_hic';
and a stgTable field with the name of the corresponding control flow task in the package (case-sensitive!), like Bene_hic.
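For reference, a minimal sketch of what that helper table could look like (the column types and sizes here are assumptions, not from my actual setup):
CREATE TABLE db.Ssis_onPostExecute
(
    stgTable VARCHAR(128) NOT NULL,   -- control flow task name, e.g. 'Bene_hic' (case-sensitive)
    execSQL  VARCHAR(8000) NOT NULL   -- logging SQL to run for that task
);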
In the first SQL task (named SQL), I have the SourceVariable set to a user variable (User::Variable1) and the ResultSet property set to 'single row.' The Result Set detail includes Result Name = 0 and the Variable Name set to the second user variable (User::Variable2).
In the second SQL task (exec), I have the SQLSourceType property set to Variable and the SourceVariable property set to User::Variable2.
Then the package is able to copy the data in the source object to the destination, and whether it fails or not, enter a row in a table with the timestamp and number of rows copied, along with the table name and anything else you want to track.
Also, when debugging, you have to run the whole package, not just one task in the event. The variables won't be set correctly otherwise.
HTH, it took me forever to figure all this stuff out, working from examples on several web sites. I'm using code to generate the SQL in the execSQL field for each of the 42 control flow tasks, meaning I created 84 user variables.
-Beth
The easy solution would be:
1) Drag the OLE DB Command from the toolbox after the Flat File destination.
2) Update the script to update the table with the current date when the Flat File destination succeeds (see the sketch after this list).
3) You can create a variable (scoped to the project) with the value of the system datetime.
4) You might have to create another variable, depending on your package construct, to capture success or failure.
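As a rough sketch of what the statement behind the OLE DB Command could look like (the log table and column names below are assumptions; the ? placeholder gets mapped to an input column in the component's column mappings, and you could equally pass the datetime variable from step 3 as a second ? parameter instead of using GETDATE()):
-- hypothetical log table; map the ? parameter to the input column
-- holding the flat file name in the OLE DB Command editor
INSERT INTO dbo.FlatFileLoadLog (FlatFileName, LoadCompletedAt)
VALUES (?, GETDATE());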