Need documentation for *.analysis.windows.net/public/reports/querydata - json

I am reverse engineering an app that sends queries to
SOMESERVERNAME.analysis.windows.net/public/reports/querydata via an HTTP POST of an JSON-structured query.
Some initial lines of a sample query are at the end of this message.
I can't find any documentation on this anywhere. I don't know if this is some secret API or what. I ultimately would like to just ignore the aggregations altogether and just dump the raw data, which seems to sit in some flat-file type container on the back-end, but without some API documentation I'm stuck with just re-running the super basic handful of queries I've been able to intercept.
Note: this app is an embedded analytics page created with PowerBI, but the only REST API I can find for PowerBI has nothing to do with querying, but just basic object management.
Thanks!
{
"version": "1.0.0",
"queries": [
{
"Query": {
"Commands": [
{
"SemanticQueryDataShapeCommand": {
"Query": {
"Version": 2,
"From": [
{
"Name": "s",
"Entity": "Sheet1"
}
],
"Select": [
{
"Aggregation": {
"Expression": {
"Column": {
"Expression": {
"SourceRef": {
"Source": "s"
}
},
"Property": "Total"
}
},
"Function": 0
},
"Name": "Sum(Sheet1.Total)"
}
],
"Where": [
{
"Condition": {
"In": {
"Expressions": [
{
"Column": {
"Expression": {
"SourceRef": {
"Source": "s"
}
},
"Property": "Year"
}
}
],
"Values": [
[
{
"Literal": {
"Value": "'2018'"
}
}
]
]
}
}
},
............

I have built a client that scrapes data off a specific Power BI report using the same API, but probably you'll be able to adapt it to your use case. Maybe we can even abstract the code into a more generalized Power BI client!
Having tinkered with the API for two days, I realised that there are many ways the data can be formatted:
"nested"/multidimensional data can be unflattened, flattened by 1 degree, etc.
a primary "table" of a result dataset (in data.PH) can reference others (in data.SH)
The basics are as follows:
A dataset is structured like a multidimensional table, with cells containing values.
In a set of cells, the first always has a field S that contains the schema of its and all subsequent cells.
The schema maps a field of each cell's object with a selection from your query, e.g. the G0 field with the queried column age.
My client seems to work only with a specific type of query (SemanticQueryDataShapeCommand), a specific nr of dimensions and a specific column marked as primary (via Binding.Primary). But maybe that helps! https://github.com/derhuerst/fetch-bvg-occupancy/blob/1ebb864b1ff7130f9d2f0ab031c6d78bcabdd633/lib/parse-dataset.js

The only documented way to use this API is through the ADOMD.NET or OleDb provider.
If you want to send a DAX/MDX query and retrieve data programmatically, there's a sample of how to front-end the service with a simple REST API here.

Related

Azure Data Factory - attempting to add params to dynamic content in the body of a REST API request

In Azure Data Factory, I'm attempting to add params to the body of a copy task (connected to a REST API post request as the source). I'm wanting to use dynamic content to do so, but I'm struggling trying to find the real solution for the proper nomenclature. Here's what I have so far.
copy task
dynamic content
{
"datatable":
{
"start":0,
"length": 10000,
"filters": [
{
"name": "Arrival Dates",
"start": "pipeline().parameters.pDate1",
"end": "pipeline().parameters.pDate2"
}
],
"sort": [
{
"name": "start_date",
"order": "ASC"
}
]
}
}
You'll notice that I've added params for dates. Is this the correct nomenclature for trying to add dynamic content? The autocorrect tried to add the # sign in the beginning of the code block, which will cause the entire thing to error out. I've tried adding it before each parameter, but that isn't actually reading the dynamic values either.
This is not correct. You need to use concat to concatenate the different variables. Something like this :
#concat('{ "datatable": { "start":0, "length": 10000, "filters": [ { "name": "Arrival Dates", "start": "',pipeline().parameters.pDate1,'", "end": "',pipeline().parameters.pDate2,'" } ], "sort": [ { "name": "start_date", "order": "ASC" } ] } }')
This is also documented in the SO question.

Elasticsearch dynamic mapping for object within attribute

Wondering if I can create a "dynamic mapping" within an elasticsearch index. The problem I am trying to solve is the following: I have a schema that has an attribute that contains an object that can differ greatly between records. I would like to mirror this data within elasticsearch if possible but believe that automatic mapping may get in the way.
Imagine a scenario where I have a schema like the following:
{
name: string
origin: string
payload: object // can be of any type / schema
}
Is it possible to create a mapping that supports this? I do not need to query the records by this payload attribute, but it would be great if I can.
Note that I have checked the documentation but am confused on if what elastic calls dynamic mapping is what I am looking for.
It's certainly possible to specify which queryable fields you expect the payload to contain and what those fields' mappings should be.
Let's say each doc will include the fields payload.livemode and payload.created_at. If these are the only two fields you'll want to perform queries on, and you'd like to disable dynamic, index-time mappings autogenerated by Elasticsearch for the rest of the fields, you can use dynamic templates like so:
PUT my-payload-index
{
"mappings": {
"dynamic_templates": [
{
"variable_payload": {
"path_match": "payload",
"mapping": {
"type": "object",
"dynamic": false,
"properties": {
"created_at": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
},
"livemode": {
"type": "boolean"
}
}
}
}
}
],
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"origin": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
Then, as you ingest your docs:
POST my-payload-index/_doc
{
"name": "abc",
"origin": "web.dev",
"payload": {
"created_at": "2021-04-05 08:00:00",
"livemode": false,
"abc":"def"
}
}
POST my-payload-index/_doc
{
"name": "abc",
"origin": "web.dev",
"payload": {
"created_at": "2021-04-05 08:00:00",
"livemode": true,
"modified_at": "2021-04-05 09:00:00"
}
}
and verify with
GET my-payload-index/_mapping
no new mappings will be generated for the fields payload.abc nor payload.modified_at.
Not only that — the new fields will also be ignored, as per the documentation:
These fields will not be indexed or searchable, but will still appear in the _source field of returned hits.
Side note: if fields are neither stored nor searchable, they're effectively the opposite of enabled.
The Big Picture
Working with variable contents of a single, top-level object is quite standard. Take for instance the stripe event object — each event has an id, an api_version and a few other shared params. Then there's the data object that's analogous to your payload field.
Now, all is fine, until you need to aggregate on the contents of your payload. See, since the content is variable, so are the data paths / accessors. But wildcards in aggregation paths don't work in Elasticsearch. Scripts do but are onerous to maintain.
Back to stripe. They partially solved it through what they call polymorphic, typed hashes — as discussed in their blog on API design:
A pretty neat approach that's worth emulating.
P.S. I discuss dynamic templates in more detail in the chapter "Mapping Automation" of my ES Handbook.

Logstash json field removal

We have a heavily nested json document containing server metrcs, the document contains > 1000 fields some of which are completely irrelevant to us for analytic purposes so i would like to remove them before indexing the document in Elastic.
However i am unable to find the correct filter to use as the fields i want to remove have common names in multiple different objects within the document.
The source document looks like this ( reduced in size for brevity)
[
{
"server": {
"is_master": true,
"name": "MYServer",
"id": 2111
},
"metrics": {
"Server": {
"time": {
"boundary": {},
"type": "TEXT",
"display_name": "Time",
"value": "2018-11-01 14:57:52"
}
},
"Mem_OldGen": {
"used": {
"boundary": {},
"display_name": "Used(mb)",
"value": 687
},
"committed": {
"boundary": {},
"display_name": "Committed(mb)",
"value": 7116
}
"cpu_count": {
"boundary": {},
"display_name": "Cores",
"value": 4
}
}
}
}
]
The data is loaded into logstash using the http_poller input plugin and needs to be processed before sending to Elastic for indexing.
I am trying to remove the fields that are not relevant for us to track for analytical purposes, these include the "display_name" and "boundary" fields from each json object in the different metrics.
I have tried using the mutate filter to remove the fields but because they exist in so many different objects it requires to many coded paths to be added to the logstash config.
I have also looked at the ruby filter, which seems promising as it can look the event, but i am unable to get it to crawl the entire json document, or more importantly actually remove the fields.
Here is what i was trying as a test
filter {
split{
field => "message"
}
ruby {
code => '
event.get("[metrics][Mem_OldGen][used]").to_hash.keys.each { |k|
logger.info("field is:", k)
if k.include?("display_name")
event.remove(k)
end
if k.include?("boundary")
event.remove(k)
end
}
'
}
}
It first splits the input at the message level to create one event per server, then tries to remove the fields from a specific metric.
Any help you be greatly appreciated.
If I get the point, you want to keep just the value key.
So, considering the response hash:
response = {
"server": {
"is_master": true,
"name": "MYServer",
"id": 2111
},
"metrics": {
...
You could do:
response[:metrics].transform_values { |hh| hh.transform_values { |h| h.delete_if { |k,v| k != :value } } }
#=> {:server=>{:is_master=>true, :name=>"MYServer", :id=>2111}, :metrics=>{:Server=>{:time=>{:value=>"2018-11-01 14:57:52"}}, :Mem_OldGen=>{:used=>{:value=>687}, :committed=>{:value=>7116}, :cpu_count=>{:value=>4}}}}

Microsoft Academic API, Knowledge graph search -- retrieved paper empty

I'm using the graph search method of the Microsoft Academic API to retrieve papers by ID using the following query:
{
"path": "/paper",
"paper": {
"type": "Paper",
"id": [2557283755],
"select": [
"PublishYear",
"CitationCount",
"OriginalTitle",
"NormalizedTitle",
"DOI"
]
}
}
The issue I'm having is that for some papers the response is completely empty, even though when I lookup the paper through the user interface, it has full metadata. For example, trying to retrieve this paper through the API yields
{
"Results": [
[
{
"CellID": 2557283755,
"PublishYear": "",
"CitationCount": "",
"OriginalTitle": "",
"NormalizedTitle": "",
"DOI": ""
}
]
]
}
while for a different paper the response is correct:
{
"Results": [
[
{
"CellID": 2112796928,
"PublishYear": "1998",
"CitationCount": "135",
"OriginalTitle": "Gradient-based learning applied to document recognition",
"NormalizedTitle": "gradient based learning applied to document recognition",
"DOI": "10.1109/5.726791"
}
]
]
}
Does anyone have any experience with this? To me it looks like an error in the database, but I wanted to make sure it's not something related to my query. Thanks!
The issue is caused by data version difference. The version of the academic graph data set used by graph search method might not be the same as that of Microsoft Academic portal https://academic.microsoft.com. We are deploying a new data pipeline to make the version difference as small as possible.

Obtain a different JSON object structure in AngularJS

I'm Working on AngularJS.
In this part of the project my goal is to obtain a JSON structure after filling a form with some particulars values.
Here's the fiddle of my simple form: Fiddle
With the form I will do a query to KairosDB, that is my NoSql Database, I will query data from it by a JSON object. The form is structured in this way:
a Name
a certain Number of Tags, with Tag Id ("ch" for example) and tag value ("932" for example)
a certain Number of Aggregators to manipulate data coming from DB
Start Timestamp and End Timestamp (now they are static and only included in the final JSON Object)
After filling this form, with my code I'll obtain for example this JSON object:
{
"metrics": [
{
"tags": [
{
"id": "ch",
"value": "932"
},
{
"id": "ch",
"value": "931"
}
],
"aggregators": {
"name": "sum",
"sampling": [
{
"value": "1",
"unit": "milliseconds",
"type": "SUM"
}
]
}
}
],
"cache_time": 0,
"start_absolute": 123,
"end_absolute": 1234
}
Unfortunately, KairosDB accepts a different structure, and as you could see, Tag id "ch" doesn't hase an "id" string before, or for example, Tag values coming from the same tag id are grouped together
{
"metrics": [
{
"tags": {
"ch": [
"932",
"931"
]
},
"name": "AIENR",
"aggregators": [
{
"name": "sum",
"sampling": {
"value": "1",
"unit": "milliseconds"
}
}
]
}
],
"cache_time": 0,
"start_absolute": 1367359200000,
"end_absolute": 1386025200000
}
My question is: Is there a way to obtain the JSON structure like the one accepted by Kairos DB with an Angular JS form?. Thanks to everyone.
I've seen this topic as the one more similar to mine but it isn't in AngularJS.
Personally, I'd do the refactoring work in the backend - Have what ever server interfaces sends and receives data do the manipulation - Otherwise you'll end up needing to refactor your data inside Angular anywhere you want to use that dataset.
Where as doing it in the backend would put it in a single access point.
Of course, you could do it in Angular, just replace userString in the submitData method with a copy of the array and replace the tags section with data in the new format, and likewise refactor the returned result to the correct format when you get a reply.