Azure Application Insights Kusto Language Summarize by where TimeGenerated Value - mysql

Is there a way with the Kusto language to have each where clause end up in a different column? I am aware of the "pivot" syntax that is also used in SQL to create columns based on unique values, but I don't think it will help in my case. There is also another SO question that asks almost the same thing, but its solution didn't work either.
Context of my query: this query gets the runtime of each machine for every month. You may be wondering why I used such a long query to achieve this; any opinions and adjustments are welcome, as I am very new to the language. I already use the top query to get the start and stop times of each VM in another project.
Original query:
AzureActivity
| where ResourceProvider == "Microsoft.Compute"
and ActivityStatus == "Succeeded"
and OperationName == "Deallocate Virtual Machine"
| project DeallocateResource=Resource
,DeallocatedDate=format_datetime(EventSubmissionTimestamp, 'yyyy-MM-dd')
,DeallocatedTime=format_datetime(EventSubmissionTimestamp, 'HH:mm:ss')
| join kind=fullouter (AzureActivity
| where ResourceProvider == "Microsoft.Compute"
and ActivityStatus == "Succeeded"
and OperationName == "Start Virtual Machine"
| project StartupResource=Resource
,StartDate=format_datetime(EventSubmissionTimestamp, 'yyyy-MM-dd')
,StartTime=format_datetime(EventSubmissionTimestamp, 'HH:mm:ss')
) on $right.StartupResource == $left.DeallocateResource
| where StartDate == DeallocatedDate
| project Resource=coalesce(StartupResource, DeallocateResource) ,
Runtime = round(todouble(datetime_diff('minute', todatetime(strcat(StartDate , " " , DeallocatedTime )) , todatetime(strcat(StartDate , " " , StartTime )))) / 60)
| summarize sum(Runtime) by Resource
Now, the query above gets the sum of the running time for the time range you set in the portal.
To get the sum of the running time for each month (Log Analytics retention is set to 90 days, so 3 months back), I add these where statements in 3 different queries. The work gets done, and I get 3 different tables with the running time of each month (month1, month2, month3).
| where TimeGenerated > ago(30d)
| where TimeGenerated between(ago(30d) .. ago(60d) )
| where TimeGenerated between(ago(60d) .. ago(90d) )
But these are 3 different queries and 3 different tables. My goal is to get one table that contains the results of the 3 different TimeGenerated where statements.
I tried the SO question's solution, but that didn't go as planned (I got a "Failed to resolve scalar expression named 'TimeGenerated'" error when adding these lines to my original query):
| summarize sum(Runtime) by Resource , bin(TimeGenerated, 1m)
| summarize Fistmonth = TimeGenerated > ago(30d),
SecondMonth = TimeGenerated between(ago(30d) .. ago(60d)) ,
ThirdMonth = Runtime_,TimeGenerated between(ago(60d) .. ago(90d) ) by Resource
Does anyone know what I am missing or overlooking here? Is this possible with Kusto?
And am I using an overly long query for something that can be done in a couple of lines?

If I understand your scenario correctly, you could potentially achieve that using sumif(), assuming you know the months you're targeting in advance.
Here's an example:
datatable(Resource:string, Runtime:double, TimeGenerated:datetime)
[
"A", 13.4, datetime(2019-01-01 11:11:11),
"B", 1.34, datetime(2019-01-01 10:10:10),
"C", 0.13, datetime(2019-01-01 12:12:12),
"A", 12.4, datetime(2019-02-01 11:11:11),
"B", 1.24, datetime(2019-02-01 09:09:09),
"B", 2.24, datetime(2019-02-01 09:10:09),
"B", 3.24, datetime(2019-02-01 09:11:09),
"C", 0.12, datetime(2019-02-01 08:08:08),
"A", 14.4, datetime(2019-03-01 07:07:07),
"B", 1.44, datetime(2019-03-01 05:05:05),
"C", 0.14, datetime(2019-03-01 06:06:06),
]
| summarize Month1 = sumif(Runtime, TimeGenerated between(datetime(2019-01-01)..datetime(2019-02-01))),
Month2 = sumif(Runtime, TimeGenerated between(datetime(2019-02-01)..datetime(2019-03-01))),
Month3 = sumif(Runtime, TimeGenerated between(datetime(2019-03-01)..datetime(2019-04-01)))
by Resource

Related

How to fetch data from MySQL in parallel with Sequel Pro in R

I want to fetch data from MySQL with Sequel Pro in R, but when I run the query it takes ages.
Here is my code:
old_value<- data.frame()
new_value<- data.frame()
counter<- 0
for (i in 1:length(short_list$id)) {
mydb = OpenConn(dbname = '**', user = '**', password = '**', host = '**')
query <- paste0("select * from table where id IN (",short_list$id[i],") and country IN ('",short_list$country[i],"') and date >= '2019-04-31' and `date` <= '2020-09-1';", sep = "" )
temp_old <- RMySQL::dbFetch(RMySQL::dbSendQuery(mydb, query), n = -1)
query <- paste0("select * from table2 where id IN (",short_list$id[i],") and country IN ('",short_list$country[i],"') and date >= '2019-04-31' and `date` <= '2020-09-1';", sep = "" )
temp_new <- RMySQL::dbFetch(RMySQL::dbSendQuery(mydb, query), n = -1)
RMySQL::dbDisconnect(mydb)
new_value<- rbind(temp_new,new_value)
old_value<- rbind(temp_old,old_value)
counter=counter+1
base::print(paste("completed for ",counter),sep="")
}
Is there any way I can write this more efficiently and run the queries faster? I have around 5000 rows that go into the loop. The query works, but it takes time.
I have tried this, but it still gives me an error:
#parralel computing
clust <- makeCluster(length(6))
clusterEvalQ(cl = clust, expr = lapply(c('data.table',"RMySQL","dplyr","plyr"), library, character.only = TRUE))
clusterExport(cl = clust, c('config','short_list'), envir = environment())
new_de <- parLapply(clust, short_list, function(id,country) {
for (i in 1:length(short_list$id)) {
mydb = OpenConn(dbname = '*', user = '*', password = '*', host = '**')
query <- paste0("select * from table1 where id IN (",short_list$id[i],") and country IN ('",short_list$country[i],"') and source_event_date >= date >= '2019-04-31' and `date` <= '2020-09-1';", sep = "" )
temp_data <- RMySQL::dbFetch(RMySQL::dbSendQuery(mydb, query), n = -1) %>% data.table::data.table()
RMySQL::dbDisconnect(mydb)
return(temp_data)}
})
stopCluster(clust)
gc(reset = T)
new_de <- data.table::rbindlist(new_de, use.names = TRUE)
I have also defined short_list as follows:
short_list<- as.list(short_list)
and inside short_list is:
id | country
2 | US
3 | UK
... | ...
However it gives me this error:
Error in checkForRemoteErrors(val) :
one node produced an error: object 'i' not found
However, when I remove i from id[i] and country[i], it only gives me the result for the first row, not the results for all ids and countries.
I think an alternative is to upload the ids you need into a temporary table, and query for everything at once.
tmptable <- "mytemptable"
dbWriteTable(conn, tmptable, short_list, create = TRUE)
alldat <- dbGetQuery(conn, paste("
select t1.*
from ", tmptable, " tmp
left join table1 t1 on tmp.id=t1.id and tmp.country=t1.country
where t1.`date` >= '2019-04-31' and t1.`date` <= '2020-09-1'"))
dbExecute(conn, paste("drop table", tmptable))
(Many DBMSes use a leading # to indicate a temporary table that is only visible to the local user, is much less likely to clash in the schema namespace, and is automatically cleaned up when the connection is closed. I generally encourage the use of temp tables here; check with your DB docs, schema, and/or DBA for more info.)
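Since the question uses MySQL, here is a minimal sketch of what that looks like there; MySQL has no leading-# convention and instead uses an explicit TEMPORARY keyword (the column definitions below are assumptions based on the question's data):
-- Session-scoped temporary table: visible only to the current connection
-- and dropped automatically when that connection closes.
CREATE TEMPORARY TABLE mytemptable (
    id      INT,
    country VARCHAR(8)
);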
The order of tables is important: by pulling all from mytemptable and then left join table1 onto it, we are effectively filtering out any data from table1 that does not include a matching id and country.
This doesn't solve the speed of data download, but some thoughts on that:
Each time you iterate through the queries, you have not-insignificant overhead; if there's a lot of data then this overhead should not be huge, but it's still there. Using a single query will reduce this overhead significantly.
Query time can also be affected by any index(ices) on the tables. Outside the scope of this discussion, but might be relevant if you have a large-ish table. If the table is not indexed efficiently (or the query is not structured well to use those indices), then each query will take a finite amount of time to "compile" and return data. Again, overhead that will be reduced with a single more-efficient query.
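For example, if table1 is routinely filtered on id, country, and date (as in the question's query), a composite index covering those columns can help. This is a hedged sketch only, since the real table and column names may differ:
-- One composite index can serve both the per-row queries and the temp-table join above.
CREATE INDEX idx_table1_id_country_date ON table1 (id, country, `date`);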
Large queries might benefit from using the command-line tool mysql; it is about as fast as you're going to get, and might iron over any issues in RMySQL and/or DBI. (I'm not saying they are inefficient, but ... it is unlikely that a free open-source driver will be faster than MySQL's own command-line utility.)
As for doing this in parallel ...
You're using parLapply incorrectly. It accepts a single vector/list and iterates over each object in that list. You might use it to iterate over the indices of a frame, but you cannot use it to iterate over multiple columns within that frame. This is exactly like base R's lapply.
Let's show what is going on when you do your call. I'll replace it with lapply (because debugging in multiple processes is difficult).
# parLapply(clust, mtcars, function(id, country) { ... })
lapply(mtcars, function(id, country) { browser(); 1; })
# Called from: FUN(X[[i]], ...)
debug at #1: [1] 1
id
# [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2
# [24] 13.3 19.2 27.3 26.0 30.4 15.8 19.7 15.0 21.4
country
# Error: argument "country" is missing, with no default
Because the argument (mtcars here, short_list in yours) is a data.frame, and a data.frame is a list-like object, lapply (and parLapply) operates on one column at a time. You were hoping that it would "unzip" the data, applying the first column's value to id and the second column's value to country. In fact, there is a function that does this: Map (and parallel's clusterMap, as I suggested in my comment). More on that later.
The intent of parallelizing things is to not use the for loop inside the parallel function. If short_list has 10 rows, and if your use of parLapply were correct, then you would be querying all rows 10 times, making your problem significantly worse. In pseudo-code, you'd be doing:
parallelize for each row in short_list:
# this portion is run simultaneously in 10 difference processes/threads
for each row in short_list:
query for data related to this row
Two alternatives:
Provide a single argument to parLapply representing the rows of the frame.
new_de <- parLapply(clust, seq_len(NROW(short_list)), function(rownum) {
mydb = OpenConn(dbname = '*', user = '*', password = '*', host = '**')
on.exit({ DBI::dbDisconnect(mydb) })
tryCatch(
DBI::dbGetQuery(mydb, "
select * from table1
where id=? and country=?
and source_event_date >= date >= '2019-04-31' and `date` <= '2020-09-1'",
params = list(short_list$id[rownum], short_list$country[rownum])),
error = function(e) e)
})
Use clusterMap for the same effect.
new_de <- clusterMap(clust, function(id, country) {
mydb = OpenConn(dbname = '*', user = '*', password = '*', host = '**')
on.exit({ DBI::dbDisconnect(mydb) })
tryCatch(
DBI::dbGetQuery(mydb, "
select * from table1
where id=? and country=?
and source_event_date >= date >= '2019-04-31' and `date` <= '2020-09-1'",
params = list(id, country)),
error = function(e) e)
}, short_list$id, short_list$country)
If you are not familiar with Map, it is like "zipping" together multiple vectors/lists. For example:
myfun1 <- function(i) paste(i, "alone")
lapply(1:3, myfun1)
### "unrolls" to look like
list(
myfun1(1),
myfun1(2),
myfun1(3)
)
myfun3 <- function(i,j,k) paste(i, j, k, sep = '-')
Map(f = myfun3, 1:3, 11:13, 21:23)
### "unrolls" to look like
list(
myfun3(1, 11, 21),
myfun3(2, 12, 22),
myfun3(3, 13, 23)
)
Some liberties I took in that adapted code:
I shifted from the dbSendQuery/dbFetch double-tap to a single call to dbGetQuery.
I'm using DBI functions, since DBI functions provide a superset of what each driver's package provides. (You're likely using some of it anyway, perhaps without realizing it.) You can switch back with no issue.
I added tryCatch, since sometimes errors can be difficult to deal with in parallel processes. This means you'll need to check the return value from each of your processes to see if either inherits(ret, "error") (problem) or is.data.frame (normal).
I used on.exit so that even if there's a problem, the connection closure should still occur.

Translate T-SQL JSON query to U-SQL

Hi, I am trying to translate this logic from a T-SQL JSON query to U-SQL, but I am not able to achieve the same results. Any help will be appreciated.
select N'{
"rowid": "1",
"freeresponses": {
"fr1": "1.1",
"fr2": "1.1",
"fr3": "1.3",
"fr4": "1.4",
"fr5": "1.4"
}
}'as jsontext
into dbo.tmp#
SELECT convert (int, JSON_VALUE(jsontext,'$.rowid' )) as rowid,
c.[key], c.[value]
FROM dbo.tmp#
cross apply openjson(json_query(jsontext,'$.freeresponses') )
as c;
The result will be like this in SQL Server:
rowid | key | value
1     | fr1 | 1.1
1     | fr2 | 1.1
1     | fr3 | 1.3
1     | fr4 | 1.4
1     | fr5 | 1.4
To achieve the same result in U-SQL, I have tried the below and get errors.
REFERENCE ASSEMBLY Staging.[Newtonsoft.Json];
REFERENCE ASSEMBLY Staging.[Microsoft.Analytics.Samples.Formats];
USING Microsoft.Analytics.Samples.Formats.Json;
DECLARE @inputmasterfileset string = "/sourcedata/samplefreeresponse.txt";
DECLARE @outputfreeresponse string = "/sourcedata/samplefreeresponseoutput.txt";
@freeresponse1 =
EXTRACT rowid string,
freeresponses string
FROM @inputmasterfileset
USING new JsonExtractor("");
@freeresponse2 =
SELECT rowid,
JsonFunctions.JsonTuple(freeresponses).Values AS freeresponses
FROM @freeresponse1;
@freeresponse3 =
SELECT rowid
,JsonFunctions.JsonTuple(free) AS values
FROM @freeresponse2
CROSS APPLY
EXPLODE(freeresponses) AS c(free);
OUTPUT @freeresponse3
TO @outputfreeresponse
USING Outputters.Text('|', outputHeader:true,quoting:false);
The catch is, I don't know how the keys are named in the JSON document, so I cannot specify JsonFunctions.JsonTuple(free)["fr1"] in stage 3 of the code, and I want the same result as I got in T-SQL.
Much appreciated.
I have resolved it myself. It was confusion between SQL.MAP and SQL.ARRAY.

SSRS Report Custom Sort

I need to implement a custom sort on an SSRS report for a Payment-Range field obtained from one of the datasets.
Payment-Range appears like this:
$0 - $200
$200.01 - $1000
$1,000.01 - $10,000
$10,000.01 - $20,000
$20,000.01 - $30,000
$30,000.01 - $40,000
$40,000.01 - $50,000
$50,000.01 - $60,000
I have used nested IIFs in order to implement this:
=IIF(Fields!netPaymentRange.Value= "$0 - $200", "A",
IIF(Fields!netPaymentRange.Value= "$200.01 - $1000", "B",
IIF(Fields!netPaymentRange.Value= "$1,000.01 - $10,000", "C",
IIF(Fields!netPaymentRange.Value= "$20,000.01 - $30,000", "D",
IIF(Fields!netPaymentRange.Value= "$30,000.01 - $40,000", "E",
IIF(Fields!netPaymentRange.Value= "$40,000.01 - $50,000", "F",
IIF(Fields!netPaymentRange.Value= "$50,000.01 - $60,000", "G","")))))))
but it is not working for me. Please suggest a fix.
I would create a CTE with a SELECT-from-VALUES query to define the sort order for your list of payment ranges. Then you can join it to the source table/view for the report dataset (a sketch of that join follows the example below). I would still suggest storing the payment ranges as a real table.
Example SQL
WITH
payment_range
AS
(
SELECT tbl.* FROM (VALUES
( '$0 - $200', 1)
, ( '$200.01 - $1000', 2)
, ( '$1,000.01 - $10,000', 3)
, ( '$20,000.01 - $30,000', 4)
, ( '$30,000.01 - $40,000', 5)
, ( '$40,000.01 - $50,000', 6)
, ( '$50,000.01 - $60,000', 7)
) tbl ([netPaymentRange], [netPaymentRangeSortOrder])
)
SELECT
*
FROM
payment_range --join to your source table here
ORDER BY
[netPaymentRangeSortOrder]
Results
netPaymentRange | netPaymentRangeSortOrder
$0 - $200 | 1
$200.01 - $1000 | 2
$1,000.01 - $10,000 | 3
$20,000.01 - $30,000 | 4
$30,000.01 - $40,000 | 5
$40,000.01 - $50,000 | 6
$50,000.01 - $60,000 | 7
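To make the "join to your source table here" step concrete, here is a hedged sketch; vw_Payments and its netPaymentRange column are assumptions standing in for whatever table or view actually feeds the report dataset:
WITH
payment_range
AS
(
    SELECT tbl.* FROM (VALUES
    ( '$0 - $200', 1)
    , ( '$200.01 - $1000', 2)
    , ( '$1,000.01 - $10,000', 3)
    -- ... remaining ranges as above
    ) tbl ([netPaymentRange], [netPaymentRangeSortOrder])
)
SELECT
    src.*
FROM
    vw_Payments src -- hypothetical source view for the report dataset
    INNER JOIN payment_range pr ON pr.[netPaymentRange] = src.[netPaymentRange]
ORDER BY
    pr.[netPaymentRangeSortOrder]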

Available Filters With Specified Ranges In SSRS

I am working on a Chart in my report.
As I have too many records where CountId = 1, I have set up a filter showing an available values list like this:
CountId :
1
2
3
Between 4 to 6
Between 7 to 9
Above 10
If I set the available value to 1 or 2 or 3 it shows results, but I don't know how to set a filter for "between" and "above".
I want a filter something like this; the available filters are:
1
2
3
4
Above 5 or greater than equal to 5
You've got a mix of operators, so maybe you should look at an expression-based filter to try and handle these different cases, something like:
Expression (type Text):
=Switch(Parameters!Count.Value = "1" and Fields!Count.Value = 1, "Include"
, Parameters!Count.Value = "2" and Fields!Count.Value = 2, "Include"
, Parameters!Count.Value = "3" and Fields!Count.Value = 3, "Include"
, Parameters!Count.Value = "4 to 6" and Fields!Count.Value >= 4 and Fields!Count.Value <= 6, "Include"
, Parameters!Count.Value = "7 to 9" and Fields!Count.Value >= 7 and Fields!Count.Value <= 9, "Include"
, Parameters!Count.Value = "Above 10" and Fields!Count.Value >= 10, "Include"
, true, "Exclude")
Operator:
=
Value:
Include
This assumes a string parameter Count populated with the above values.
This works by calculating the parameter and field combinations to produce a constant, either Include or Exclude, then displaying all rows that return Include.
As mentioned in a comment, it's difficult to follow exactly what you're asking here. I've done my best but if you have more questions it would be best to update the question with some sample data and how you'd like this data displayed.

Split column string into multiple column strings

I have an entry in the table that is a string delimited by semicolons. Is it possible to split the string into separate columns? I've been looking online and on Stack Overflow, and I couldn't find anything that does the splitting into columns.
The entry in the table looks something like this (anything in brackets [] is not actually in my table; it's just there to make things clearer):
sysinfo [column]
miscInfo ; vendor: aaa ; bootr: bbb; revision: ccc; model: ddd [string a]
miscInfo ; vendor: aaa ; bootr: bbb; revision: ccc; model: ddd [string b]
...
There are a little over one million entries with strings that look like this. Is it possible in MySQL for the query to return the following
miscInfo, Vendor, Bootr, Revision , Model [columns]
miscInfo_a, vendor_a, bootr_a, revision_a, model_a
miscInfo_b, vendor_b, bootr_b, revision_b, model_b
...
for all of the rows in the table, where the comma indicates a new column?
Edit:
Here's some input and output as Bohemian requested.
sysinfo [column]
Modem <<HW_REV: 04; VENDOR: Arris ; BOOTR: 6.xx; SW_REV: 5.2.xxC; MODEL: TM602G>>
<<HW_REV: 1; VENDOR: Motorola ; BOOTR: 216; SW_REV: 2.4.1.5; MODEL: SB5101>>
Thomson DOCSIS Cable Modem <<HW_REV: 4.0; VENDOR: Thomson; BOOTR: 2.1.6d; SW_REV: ST52.01.02; MODEL: DCM425>>
Some can be longer entries but they all have similar format. Here is what I would like the output to be:
miscInfo, vendor, bootr, revision, model [columns]
04, Arris, 6.xx, 5.2.xxC, TM602G
1, Motorola, 216, 2.4.1.5, SB5101
4.0, Thomson, 2.1.6d, ST52.01.02, DCM425
You could make use of string functions (particularly substr) in MySQL: http://dev.mysql.com/doc/refman/5.0/en/string-functions.html
Please take a look at how I've split my coordinates column into 2 lat/lng columns:
UPDATE shops_locations L
LEFT JOIN shops_locations L2 ON L2.id = L.id
SET L.coord_lat = SUBSTRING(L2.coordinates, 1, LOCATE('|', L2.coordinates) - 1),
L.coord_lng = SUBSTRING(L2.coordinates, LOCATE('|', L2.coordinates) + 1)
Overall I followed the UPDATE JOIN advice from "MySQL - UPDATE query based on SELECT Query" and the STR_SPLIT question "Split value from one field to two".
Yes, I'm only splitting into 2, and SUBSTRING might not work well for you, but anyway, hope this helps :)
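For the specific format shown in the question, nested SUBSTRING_INDEX calls can pull each labelled value into its own column. This is a hedged sketch: it assumes each label (HW_REV:, VENDOR:, BOOTR:, SW_REV:, MODEL:) appears at most once per row, and your_table is a placeholder for the real table name:
SELECT
    TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(sysinfo, 'HW_REV:', -1), ';', 1))  AS miscInfo,
    TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(sysinfo, 'VENDOR:', -1), ';', 1))  AS vendor,
    TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(sysinfo, 'BOOTR:',  -1), ';', 1))  AS bootr,
    TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(sysinfo, 'SW_REV:', -1), ';', 1))  AS revision,
    TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(sysinfo, 'MODEL:',  -1), '>>', 1)) AS model
FROM your_table;
-- SUBSTRING_INDEX(str, delim, -1) keeps everything after the label;
-- the outer call then cuts at the next ';' (or at the closing '>>' for MODEL).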