Converting JSON Table Array Data in Azure Data Factory from Log Analytics REST API JSON Response to Multiple JSON Documents in same file

I'm currently trying to extract data out of Log Analytics through its REST API. I have been successful at using a Copy Data activity to store the response in an Azure Data Lake Gen 2 account.
The format is roughly similar to the example from the Log Analytics API Reference Page.
{
  "tables": [
    {
      "name": "PrimaryResult",
      "columns": [
        {
          "name": "Category",
          "type": "string"
        },
        {
          "name": "count_",
          "type": "long"
        }
      ],
      "rows": [
        [
          "Administrative",
          20839
        ],
        [
          "Recommendation",
          122
        ],
        [
          "Alert",
          64
        ],
        [
          "ServiceHealth",
          11
        ]
      ]
    }
  ]
}
My dataset is much larger, with more columns, more values, etc., but the principle is the same.
What I am trying to do is generate a new JSON file that holds the same table data as multiple documents in the same file, e.g.
[
  {
    "Category": "Administrative",
    "count_": 20839
  },
  {
    "Category": "Recommendation",
    "count_": 122
  },
  {
    "Category": "Alert",
    "count_": 64
  },
  {
    "Category": "ServiceHealth",
    "count_": 11
  }
]
The output of this would be stored back into the data lake and then ideally could be used as a source for a copy activity to go into an Azure SQL Database.
I have tried to accomplish this using the Flatten transformation in Data Flows, but I haven't been successful so far: when trying to map the column names, it doesn't see the individual column names, only the level of the document where the column names are defined.
How would I go about flattening the dataset so it appears as desired? Is this an unrealistic expectation of Data Flows, or is this task more suitable for something like Azure Databricks?
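For reference, if the Azure Databricks (or an Azure Function) route is taken, the reshaping itself is only a few lines of Python. A minimal sketch, assuming the tables/columns/rows shape shown above; the file names here are illustrative:
import json

# Minimal sketch: turn the Log Analytics tables/columns/rows response into a
# list of {column: value} documents. File names are placeholders.
with open("loganalytics_response.json") as f:
    response = json.load(f)

records = []
for table in response["tables"]:
    names = [c["name"] for c in table["columns"]]
    records.extend(dict(zip(names, row)) for row in table["rows"])

with open("flattened.json", "w") as f:
    json.dump(records, f, indent=2)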

Related

pandas json_normalize nested json where dictionary only exists on some records

I am trying to run pandas.json_normalize on a data file that has highly varied, nested json, where the content of the records can vary considerably.
I am processing a house listing file and trying to pull out prices. The prices data is stored as follows, and 'prices' is at the first nesting level within the json file:
"prices": [
  {
    "amountMax": 420000,
    "amountMin": 420000,
    "availability": "false",
    "currency": "USD",
    "dateSeen": [
      "2020-12-21T11:57:17.190Z",
      "2020-12-25T02:35:41.009Z"
    ],
    "isSale": "false",
    "isSold": "true",
    "pricePerSquareFoot": 235,
    "sourceURLs": [
      "https://www.redfin.com/FL/Coconut-Creek/.../home/4146834"
    ]
  },  # followed by additional entries
I am using the following line of code, which works if I edit the input file down to a single record that includes a 'prices' section:
df3 = pd.json_normalize(df['records'], record_path='prices',
                        meta=['id'],
                        errors='ignore')
However, the full file includes many records that do not have a prices section. If I run the code against a file with two records (one with, one without), it fails with KeyError: 'prices'.
Clearly, errors='ignore' in json_normalize is not enough to handle this.
What can I do? I would just like to skip the records without prices entirely.
A list comprehension on your JSON will do it. I've synthesized some JSON to match your description of input data.
js = {
    "records": [
        {
            "prices": [
                {
                    "amountMax": 420000,
                    "amountMin": 420000,
                    "availability": "false",
                    "currency": "USD",
                    "dateSeen": [
                        "2020-12-21T11:57:17.190Z",
                        "2020-12-25T02:35:41.009Z"
                    ],
                    "isSale": "false",
                    "isSold": "true",
                    "pricePerSquareFoot": 235,
                    "sourceURLs": [
                        "https://www.redfin.com/FL/Coconut-Creek/.../home/4146834"
                    ]
                }
            ],
            "id": 1
        },
        {"id": 2}
    ]
}
pd.json_normalize(
    {"records": [r for r in js["records"] if "prices" in r.keys()]}["records"],
    record_path="prices",
    meta="id"
)

Need documentation for *.analysis.windows.net/public/reports/querydata

I am reverse engineering an app that sends queries to
SOMESERVERNAME.analysis.windows.net/public/reports/querydata via an HTTP POST with a JSON-structured query.
Some initial lines of a sample query are at the end of this message.
I can't find any documentation on this anywhere. I don't know if this is some secret API or what. I ultimately would like to just ignore the aggregations altogether and just dump the raw data, which seems to sit in some flat-file type container on the back-end, but without some API documentation I'm stuck with just re-running the super basic handful of queries I've been able to intercept.
Note: this app is an embedded analytics page created with Power BI, but the only REST API I can find for Power BI has nothing to do with querying; it only covers basic object management.
Thanks!
{
  "version": "1.0.0",
  "queries": [
    {
      "Query": {
        "Commands": [
          {
            "SemanticQueryDataShapeCommand": {
              "Query": {
                "Version": 2,
                "From": [
                  {
                    "Name": "s",
                    "Entity": "Sheet1"
                  }
                ],
                "Select": [
                  {
                    "Aggregation": {
                      "Expression": {
                        "Column": {
                          "Expression": {
                            "SourceRef": {
                              "Source": "s"
                            }
                          },
                          "Property": "Total"
                        }
                      },
                      "Function": 0
                    },
                    "Name": "Sum(Sheet1.Total)"
                  }
                ],
                "Where": [
                  {
                    "Condition": {
                      "In": {
                        "Expressions": [
                          {
                            "Column": {
                              "Expression": {
                                "SourceRef": {
                                  "Source": "s"
                                }
                              },
                              "Property": "Year"
                            }
                          }
                        ],
                        "Values": [
                          [
                            {
                              "Literal": {
                                "Value": "'2018'"
                              }
                            }
                          ]
                        ]
                      }
                    }
                  },
............
I have built a client that scrapes data off a specific Power BI report using the same API; you'll probably be able to adapt it to your use case. Maybe we can even abstract the code into a more generalized Power BI client!
Having tinkered with the API for two days, I realised that there are many ways the data can be formatted:
- "nested"/multidimensional data can be unflattened, flattened by one degree, etc.
- a primary "table" of a result dataset (in data.PH) can reference others (in data.SH)
The basics are as follows:
- A dataset is structured like a multidimensional table, with cells containing values.
- In a set of cells, the first always has a field S that contains the schema of its and all subsequent cells.
- The schema maps a field of each cell's object to a selection from your query, e.g. the G0 field to the queried column age.
My client seems to work only with a specific type of query (SemanticQueryDataShapeCommand), a specific number of dimensions, and a specific column marked as primary (via Binding.Primary). But maybe that helps! https://github.com/derhuerst/fetch-bvg-occupancy/blob/1ebb864b1ff7130f9d2f0ab031c6d78bcabdd633/lib/parse-dataset.js
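As a purely illustrative sketch of that schema-mapping idea in Python (the simplified cell layout and field names below are assumptions, not the real wire format; see the linked client for the actual parsing):
def parse_cell_set(cells):
    # Assumed toy layout: the first cell carries the schema under "S" as a
    # mapping of cell fields (G0, G1, ...) to the queried columns; every cell
    # stores its values under those G* fields.
    schema = cells[0]["S"]
    return [{column: cell.get(field) for field, column in schema.items()}
            for cell in cells]

# parse_cell_set([{"S": {"G0": "age"}, "G0": 42}, {"G0": 37}])
# -> [{"age": 42}, {"age": 37}]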
The only documented way to use this API is through the ADOMD.NET or OleDb provider.
If you want to send a DAX/MDX query and retrieve data programmatically, there's a sample of how to front-end the service with a simple REST API here.

Copy JSON Array data from REST data factory to Azure Blob as is

I have used REST to get data from an API, and the JSON output contains arrays. When I try to copy the JSON as-is to BLOB using a copy activity, I only get the first object's data and the rest is ignored.
The documentation says we can copy JSON as-is by skipping the schema section on both the dataset and the copy activity. I followed that, and I am getting the output below.
https://learn.microsoft.com/en-us/azure/data-factory/connector-rest#export-json-response-as-is
I tried the copy activity without a schema, using the header as the first row, and output the files to BLOB as .json and .txt.
Sample REST output:
{
  "totalPages": 500,
  "firstPage": true,
  "lastPage": false,
  "numberOfElements": 50,
  "number": 0,
  "totalElements": 636,
  "columns": {
    "dimension": {
      "id": "variables/page",
      "type": "string"
    },
    "columnIds": [
      "0"
    ]
  },
  "rows": [
    {
      "itemId": "1234",
      "value": "home",
      "data": [
        65
      ]
    },
    {
      "itemId": "1235",
      "value": "category",
      "data": [
        92
      ]
    }
  ],
  "summaryData": {
    "totals": [
      157
    ],
    "col-max": [
      123
    ],
    "col-min": [
      1
    ]
  }
}
The BLOB output as text is below; it contains only the first object's data:
totalPages,firstPage,lastPage,numberOfElements,number,totalElements
500,True,False,50,0,636
If you want to write the JSON response as-is, you can use the HTTP connector. However, please note that the HTTP connector doesn't support pagination.
If you want to keep using the REST connector and write a CSV file as output, can you please specify how you want the nested objects and arrays to be written?
In CSV files, we cannot write arrays. You could always use a custom activity or an Azure Function activity to call the REST API, parse it the way you want, and write to a CSV file.
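For example, a minimal Python sketch of that parse-and-write step, assuming the sample response shown above (the file names and the choice to join the data array with "|" are illustrative):
import csv
import json

# Sketch only: flatten the "rows" array of the REST response into a CSV file.
with open("rest_response.json") as f:
    response = json.load(f)

with open("rows.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["itemId", "value", "data"])
    for row in response["rows"]:
        # "data" is an array in the response, so join its values into one cell
        writer.writerow([row["itemId"], row["value"],
                         "|".join(str(v) for v in row["data"])])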
Hope this helps.

Is there any data returned from the Forge Data Management Search api to indicate a model is deleted?

When using GET projects/:project_id/folders/:folder_id/search in the Forge Data Management API on a model whose last version is deleted, is there any information in the "attributes" or other returned data that indicates the file is deleted?
Currently, a second call to GET projects/:project_id/items/:item_id/versions is used to determine if the latest version is deleted (below), but it would be preferable not to make another request to get this information.
Returned JSON from /versions (with some data removed):
"data": [
  {
    "type": "versions",
    "id": "urn:adsk.wipprod:fs.file:vf.w0cwXPUwQziKIHtKBtYRaA?version=3",
    "attributes": {
      "versionNumber": 3,
      "extension": {
        "type": "versions:autodesk.core:Deleted",
        "version": "1.0",
        "schema": {
          "href": "https://developer.api.autodesk.com/schema/v1/versions/versions:autodesk.core:Deleted-1.0"
        },
        "data": {
          "originalName": "**.rvt"
        }
      }
    }
  }
]
The JSON attribute hidden = true seems to indicate a deleted item; it can be queried via filter[hidden]=true. I'm closing this as the correct answer.
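A rough Python sketch of using that filter (the endpoint path, header handling, and placeholder IDs are illustrative, not taken from the question):
import requests

project_id = "YOUR_PROJECT_ID"
folder_id = "YOUR_FOLDER_ID"
token = "YOUR_ACCESS_TOKEN"

# Illustrative only: ask the folder search endpoint for hidden (deleted) items.
url = (f"https://developer.api.autodesk.com/data/v1/projects/{project_id}"
       f"/folders/{folder_id}/search")
resp = requests.get(url,
                    headers={"Authorization": f"Bearer {token}"},
                    params={"filter[hidden]": "true"})
hidden_ids = [item["id"] for item in resp.json().get("data", [])]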

How to upload multiple documents with multiple JSON files to Cloudant DB via node js?

Currently, I have a requirement to reprocess the failure records that sit in a Cloudant DB (say, a "fail" DB). I need to take the records from there for a particular day, say 20 records, and place them in a "reprocess" DB. Can you please help me with how to bulk insert 20 failure records, stored as 20 different JSON files, using Node.js?
Sample request:
{
  "docs": [
    {
      "_id": "XXX",
      "_rev": "1-XXX",
      "timestamp": "2018-01-06T14:36:09.834Z",
      "DocType": "CustFail",
      "RequestPayload": {},
      "CustID": "4",
      "Response": "Fail"
    },
    {
      "_id": "XXX",
      "_rev": "1-XXX",
      "timestamp": "2018-01-06T14:36:09.834Z",
      "DocType": "CustFail",
      "RequestPayload": {},
      "CustID": "42",
      "Response": "Fail"
    }
  ]
}
Thanks!!
If you are using the nodejs-cloudant library, you should be able to call bulk(), passing in an object with a docs array of the JSON docs to be inserted (the same shape as the sample request above).