I'm new to Azure Stream Analytics and its query language. I have an ASA job which reads JSON data coming from my IoT Hub and feeds it to different functions based on one of the values. This is what I have now:
SELECT *
INTO storage
FROM iothub

SELECT *
INTO storageQueueFunction
FROM iothub
WHERE recType LIKE '3'

SELECT *
INTO deviceTwinD2CFunctionApp
FROM iothub
WHERE recType LIKE '50'

SELECT *
INTO heartbeatD2CFunctionApp
FROM iothub
WHERE recType LIKE '51'

SELECT *
INTO ackC2D
FROM iothub
WHERE recType LIKE '54'
I'm pretty sure this could be done more efficiently but it's working for now.
My problem is that when a large number of events come in with recType 54, I think it is overloading my Function App "ackC2D".
My idea is to batch these types of events into a JSON array using something like a rolling window of 5 seconds, then send that array to the output where I can parse through the array event by event.
I haven't been able to find anything like this online; the closest I can find is aggregating data and then outputting a calculation on the aggregate.
Is what I'm trying to do possible?
Thanks!
When configuring the Azure Function output, you have the ability to specify 'Max batch size' and 'Max batch count' properties. If a lot of input events arrive rapidly, keeping a high value for these properties will result in fewer calls to your Azure Function output (by automatically batching many output events in a single HTTP request).
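If you also want to assemble the recType 54 events into an array at the query level, as the question describes, the Collect() aggregate combined with a tumbling window is worth trying; a minimal sketch, assuming the same iothub input and ackC2D output, in place of the last statement in the question (a tumbling window gives non-overlapping 5-second batches rather than a rolling window):
SELECT Collect() AS events
INTO ackC2D
FROM iothub
WHERE recType LIKE '54'
GROUP BY TumblingWindow(second, 5)
Each output event then carries an events array covering one 5-second window, which the function can iterate over instead of being invoked once per device message.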
I am using the Google API Client for Fitness to retrieve user fitness data using the PHP client library.
My use case: to get a user's total run time for the day, and also the distance.
I am using the following code (sample code with key steps):
$service = new Google_Service_Fitness($google_client);
$dataSets = $service->users_dataset;
..
$aggBy->setDataTypeName("com.google.activity.segment");
..
$aggregates = $dataSets->aggregate('me',$aggReq);
I am using com.google.activity.segment as the dataTypeName.
I get a response for the running activity type with:
Activity type: 8
Runtime: value in msec
No. of segments: 2
Now I want to get the total distance travelled during the running sessions. From the above response I can make out that the user had 2 separate running activities, so how can I get the combined distance for that particular activity?
Also, how can I retrieve the details of those 2 running segments?
As said, I am using the PHP client API.
I am trying to accomplish something that is pretty easy to do in SQL, but seemingly very challenging to do in SSIS without using SQL. Basically, I need to consolidate and concatenate a field of a many-to-one relationship.
Given entities: [Contract Item] (many) to (one) [Account]
There is a field [ari_productsummary] that contains the product listed on the Contract Item entity. We want to write that value to the Account as [ari_activecontractitems]. However, an Account may have more than one Contract Item record associated to it, in which case, we want to concatenate those values. We also only want the distinct values to be concatenated (distinct rows already solved within my data flow).
This can be accomplished by writing to a temporary table, and then using a query or view to obtain the summarized results as follows. I created a SQL table called TESTTABLE that contains the [ari_productsummary] from the Contract Item entity along with the referring [accountid] to map it back to Account. I then wrote the following query as a view:
SELECT distinct accountid,
(SELECT TT2.ari_productsummary + '; '
FROM TESTTABLE TT2
WHERE TT2.accountid = TT.accountid
FOR XML PATH ('')
) AS 'ari_activecontractitems'
FROM TESTTABLE TT
Executing that Query provides me the results that I want, which I can then use for importing into the Account entity as shown below:
But how do I do this in an SSIS data flow without writing to a SQL table as a temporary placeholder for the data? I want to do the entire process inside one data flow container, without using a temporary SQL table/view. The whole summarization process needs to be done on the fly:
Does anyone have a solution that doesn't require a temporary SQL table/view/query, but is contained entirely within the data flow?
I am using VS 2017 and the KingswaySoft Dynamic CRM 365 ETL toolset to develop my solution/package.
Spitballing here, as I don't have Dynamics nor the custom components.
Data Flow 1 - Contract aggregation
The purpose of this data flow is to replicate your logic in the elegant query you provided and shove that into a Cache Connection Manager (see Notes for 2008+ at the end)
KingswaySoft Dynamics Source -> Script Component -> Cache Transform
If you want to keep the sort in there, do it before the script component. The implementation I'll take with the script component is that it's fully blocking - that is, all the rows must arrive before it can send any on. Transformations like the Merge Join are only partially blocking, because the requirement of sorted data means that once you no longer have a match for the current item, you can send it on down the pipeline.
The script component is going to be an asynchronous transformation. You'll have two output columns: your key accountid and your new derived column ari_activecontractitems. That column might need to be big - you'll know your data best, but if it's a blob type in Dynamics (> 4k Unicode or > 8k ASCII characters) then you'll have to define the data type as DT_TEXT/DT_NTEXT.
As inputs, you'll select accountid and ari_productsummary from your source.
The code should be pretty easy. We're going to accumulate the inbound data into a Dictionary.
// member variable
Dictionary<string, List<string>> accumulator;
In the PreExecute method, we'll tack this in there to initialize our variable:
// initialize in PreExecute
accumulator = new Dictionary<string, List<string>>();
In the per-row processing method, Input0_ProcessInputRow (exact name depends on your input name):
// process each inbound row
// row_id stands in for your key column, e.g. Row.accountid
// invoice stands in for your value column, e.g. Row.ari_productsummary
if (!accumulator.ContainsKey(row_id))
{
    // create an empty list for this key
    accumulator.Add(row_id, new List<string>());
}

// only add the value if we haven't seen it yet (keeps it distinct)
if (!accumulator[row_id].Contains(invoice))
{
    accumulator[row_id].Add(invoice);
}
Once you get the signal that no more data is available (end of rowset), that's when you start sending the buffered output data. The auto-generated code will have placeholders for all of this.
// This is how we shove data out the pipe
foreach(var kvp in accumulator)
{
// approximately thus
OutputBuffer1.AddRow();
OutputBuffer1.row_id = kvp.Key;
OutputBuffer1.ari_productsummary = string.Join("; ", kvp.Value);
}
We have an upcoming release that comes with a component that does exactly what you are trying to achieve without the need to write custom code. The feature is currently in preview; please reach out to us for private access to it. You can find our contact information on our website.
UPDATE - June 5, 2020: we have made the components available for public access at https://www.kingswaysoft.com/products/ssis-productivity-pack/ as a result of our 2020 Release Wave 1. We have two components available that serve this kind of purpose. The Composition component will take input values and transform them into a composite value in an SSIS column. The Decomposition component does the opposite: it takes an input value and splits it into multiple rows using either delimiter-based text splitting or XML/JSON array splitting.
I want to draw a chart using Zabbix data and my own chart library, but I have a problem. The Zabbix history.get API doesn't return any data when the monitored target is off. As an example, I want to use this data for drawing my chart:
data [null, 2, null, 5, 6]
time [t1, t2, t3, t4, t5]
But the Zabbix API returns the data like this:
data [2, 5, 6]
time [t2, t4, t5]
I don't know how to transform the data returned from Zabbix into the data my chart needs, OR how to get the data from Zabbix in the format that I want.
How can I achieve this?
That's the correct behavior of the Zabbix API.
You have to decide how to handle the "voids" within your application.
The easiest (and imho correct) way is to ignore the missing values and plot the existing ones, like Grafana does.
To achieve your goal you can do something like this:
query the item first (item.get) and get the interval value
query the history (history.get)
using interval as reference, search for "voids" in the history and replace them with anything you want: a zero or a specific object to plot a big red interval that says "missing data"
This is actually ugly :) and of course it works only for items with a simple interval value: if you have custom intervals you need to check for them as well.
I'm currently working on a project where I'm using Google BigQuery to pull data from spreadsheets. I'm VERY new to SQL, so I apologize. I'm currently using the following code:
Select *
From my_data
Where T1 > 1000
And T2 > 2000
So keeping the Select and From the same, I want to be able to run multiple queries where I can just keep changing the values I'm filtering on for T1 and T2 - around 50 different sets of values. I'd like BigQuery to run through these 50 different values back to back. Is there a way to do this? Thanks!
I'm VERY new to SQL
... and, I assume, to BigQuery as well ..., so
below is one of the options for new users who are not yet familiar with the BigQuery API and/or clients other than the BigQuery Web UI.
BigQuery Mate adds a parameters feature to the BigQuery Web UI.
What you need to do is:
Save your query as below using the Save Query button.
Notice <var_t1> and <var_t2>
Those are the parameters identifiable by BigQuery Mate.
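The saved query would presumably look something like the original one, with the placeholders in place of the literal values (table and column names taken from the question):
Select *
From my_data
Where T1 > <var_t1>
And T2 > <var_t2>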
Now you can set those parameters
Click QB Mate and then Parameters to get to the form below.
Now you can set parameters with whatever values you want to run with;
Click on the Replace Parameters OK button and those values will appear in the editor. For example:
After OK is clicked you get
So now you can run your query
To run another round with new parameters, you need to load your saved query into the editor again by clicking the Edit Query button,
and then repeat setting the parameters, and so on.
You can find the BigQuery Mate Chrome extension here.
Disclaimer: I am the author and the only developer of this tool.
You may be interested in running parameterized queries. The idea would be to have a single query string, e.g.:
SELECT *
FROM YourTable
WHERE t1 > @t1_min AND
  t2 > @t2_min;
You would execute this multiple times, where each time you bind different values of the t1_min and t2_min parameters. The exact logic would depend on the API through which you are using the client libraries, and there are language-specific examples in the first link that I provided.
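If you would rather stay entirely in SQL, BigQuery also supports scripting, which lets a single job loop over a list of threshold pairs; a rough sketch, with the dataset/table name and the threshold values assumed for illustration:
-- extend this list to your ~50 pairs of values
FOR pair IN (
  SELECT 1000 AS t1_min, 2000 AS t2_min UNION ALL
  SELECT 1500, 2500 UNION ALL
  SELECT 3000, 4000
)
DO
  SELECT *
  FROM my_dataset.my_data
  WHERE T1 > pair.t1_min
    AND T2 > pair.t2_min;
END FOR;
Each iteration runs as its own query, so the result sets come back one after another within the same script job.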
If you are not concerned about sql-injection and just want to iteratively swap out parameters in queries, you might want to look into the mustache templating language (available in R as 'whisker').
If you are using R, you can iterate/automate this type of query with the condusco R package. Here's a complete R script that will accomplish this kind of iterative query using both whisker and condusco:
library(bigrquery)
library(condusco)
library(whisker)
# create a simple function that will create a query
# using {{{mustache}}} placeholders for any parameters
create_results_table <- function(params){
destination_table <- '{{{dataset_id}}}.{{{table_prefix}}}_results_{{{year_low}}}_{{{year_high}}}'
query <- '
SELECT *
FROM `bigquery-public-data.samples.gsod`
WHERE year > {{{year_low}}}
AND year <= {{{year_high}}}
'
# use whisker to swap out {{{mustache}}} placeholders with parameters
query_exec(
whisker.render(query,params),
project=whisker.render('{{{project}}}', params),
destination_table = whisker.render(destination_table,params),
use_legacy_sql = FALSE
)
}
# create an invocation query to provide sets of parameters to create_results_table
invocation_query <- '
SELECT
"<YOUR PROJECT HERE>" as project,
"<YOUR DATASET_ID HERE>" as dataset_id,
"<YOUR TABLE PREFIX HERE>" as table_prefix,
num as year_low,
num+1 as year_high
FROM `bigquery-public-data.common_us.num_999999`
WHERE num BETWEEN 1992 AND 1995
'
# call condusco's run_pipeline_gbq to iteratively run create_results_table over invocation_query's results
run_pipeline_gbq(
create_results_table,
invocation_query,
project = '<YOUR PROJECT HERE>',
use_legacy_sql = FALSE
)
The SQL engine hides away all the details of which API calls are being made. However, some cloud solutions have pricing per API call.
For instance:
select *
from transactionlines
retrieves all Exact Online transaction lines of the current company, but:
select *
from transactionlines
where financialyear = 2016
filters effectively on the REST API of Exact Online to just that year, reducing data volume. And:
select *
from gltransactionlines
where year_attr = 2016
retrieves all data since the where-clause is not forwarded to this XML API of Exact.
Of course I can attach Fiddler or Wireshark and try to analyze the data volume, but is there an easier way to analyze the data volume of API calls with Invantive SQL?
First of all, all calls handled by Invantive SQL are logged in the Invantive Cloud together with:
the time
data volume in both directions
duration
to enable consistent API use monitoring across all supported cloud platforms. The actual data is not logged and travels directly.
You can query the same numbers from within your session, for instance:
select * from exactonlinerest..projects where code like 'A%'
retrieves all projects with a code starting with 'A'. And then:
select * from sessionios#datadictionary
shows you the API calls made:
You can also put a query like the following at the end of your session before logging off:
select main_url
,      sum(bytes_received) bytes_received
,      sum(duration_ms) duration_ms
from   ( select regexp_replace(url, '\?.*', '') main_url
       ,        bytes_received
       ,        duration_ms
       from     sessionios#datadictionary
       )
group by main_url
with a result such as: