Azure Data Factory GetMetadata Activity - JSON

I have a Get Metadata activity in one of my Azure Data Factory pipelines, and it is connected to a data lake to get the files. Is there any method available in Azure Data Factory to sort the files returned by the Get Metadata activity based on the file name?
Sample output from the Get Metadata activity is given below:
"childitems" :[
{
"name":"File_20200101.csv",
"type": "File"
},
{
"name":"File_20200501.csv",
"type": "File"
},
{
"name":"File_20200301.csv",
"type": "File"
},
{
"name":"File_20200201.csv",
"type": "File"
}
]
I need to get the files in the order given below:
"childitems" :[
{
"name":"File_20200101.csv",
"type": "File"
},
{
"name":"File_20200201.csv",
"type": "File"
},
{
"name":"File_20200301.csv",
"type": "File"
},
{
"name":"File_20200501.csv",
"type": "File"
}
]

I used a SQL Server table to store the array values, and then used a Lookup activity with an ORDER BY file-name query inside another loop to get the sorted file names. This solved the sorting problem for me; a sketch of the idea is below.
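A minimal sketch of that workaround, assuming a hypothetical staging table dbo.FileList that the pipeline fills with the childItems names (for example from a ForEach loop) before the Lookup runs:

-- Hypothetical staging table for the file names returned by Get Metadata.
CREATE TABLE dbo.FileList
(
    FileName NVARCHAR(260) NOT NULL
);

-- Query used by the Lookup activity: returns the names sorted,
-- so the subsequent ForEach processes the files in name order.
SELECT FileName
FROM dbo.FileList
ORDER BY FileName;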

Based on the Get Metadata activity documentation, there is no sort feature for childItems, so I'm afraid you have to sort the childItems yourself.
In the ADF environment, you could use an Azure Function activity after the Get Metadata activity: pass the childItems as an array parameter into the Azure Function. Inside the function, sorting an array by one of its elements' properties is a common requirement and easy to do, so you can write the code however you want.

Related

Storing JSON blob in an Avro Field

I have inherited a project where an Avro file is being consumed by Snowflake. The schema of the Avro file is as follows:
{
    "name": "TableName",
    "namespace": "sqlserver",
    "type": "record",
    "fields": [
        {
            "name": "hAccount",
            "type": "string"
        },
        {
            "name": "hTableName",
            "type": "string"
        },
        {
            "name": "hRawJSON",
            "type": "string"
        }
    ]
}
The hRawJSON field is a blob of JSON itself. The previous dev typed it as a string, and this is where I believe the problem lies.
The application takes a JSON object (the JSON is variable, so I never know its contents) and populates the hRawJSON field in the Avro record. But it contains escape characters for the double quotes in the string:
hAccount:"H11122"
hTableName:"Departments"
hRawJSON:"{\"DepartmentID\":1,\"ModelID\":0,\"Description\":\"P Medicines\",\"Margin\":\"3.300000000000000e+001\",\"UCSVATRateID\":0,\"References\":719,\"HeadOfficeID\":1,\"DividendID\":0}"
As a result the JSON blob is staged into Snowflake as a VARIANT field but still retains the escape characters:
[Snowflake screenshot omitted]
This means when querying the data in the JSON I constantly have to use this:
PARSE_JSON(RAW_FILE:hRawJSON):DepartmentID
I can't help feeling that the string field type in the Avro file is causing the issue and that a different type should be used. I've tried record, but without fields it's unusable; doc didn't work either.
The other alternative is that this behavior is correct and when moving the hRawJSON from staging into "proper" tables I should use something like:
INSERT INTO DATA.PUBLIC.DEPARTMENTS
SELECT
RAW_FILE:hAccount::VARCHAR(4) as Account,
PARSE_JSON(RAW_FILE:hRawJSON) as JsonRaw
FROM DATA.STAGING.AVRO_RAW WHERE RAW_FILE:hTableName::STRING = 'Department';
So if this is the correct approach and I'm overthinking this, I'd appreciate guidance.
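For reference, a minimal sketch of a variant of that approach, assuming a view is acceptable (the view name is hypothetical): parse the string once so downstream queries don't need PARSE_JSON everywhere.

-- Hypothetical view that applies PARSE_JSON once,
-- so queries can use plain path notation on JSON_RAW.
CREATE OR REPLACE VIEW DATA.PUBLIC.AVRO_PARSED AS
SELECT
    RAW_FILE:hAccount::VARCHAR(4) AS ACCOUNT,
    RAW_FILE:hTableName::STRING   AS TABLE_NAME,
    PARSE_JSON(RAW_FILE:hRawJSON) AS JSON_RAW
FROM DATA.STAGING.AVRO_RAW;

-- Downstream queries then become, e.g.:
SELECT JSON_RAW:DepartmentID
FROM DATA.PUBLIC.AVRO_PARSED
WHERE TABLE_NAME = 'Departments';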

Can I import table in a JSON Schema validation?

I am writing a JSON Schema validation. I have an ID field whose values are imported from a table in SQL Server. There are many of these values and they are frequently updated, so is there a way to dynamically connect to this table in the server and validate the JSON? Below is an example of my schema:
{
    "type": "object",
    "required": ["employees"],
    "properties": {
        "employees": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "id": { "type": "integer", "enum": [134, 2123, 3213, 444, 5525, 6234, 7532, 825, 9342] }
                }
            }
        }
    }
}
In place of 'enum' I want to connect to a table so the ID values are updated when the table is updated.
As Greg said, there is nothing in JSON Schema which allows you to do this.
Some implementations have created their own extensions to allow external sources. Many implementations allow custom keywords. You would have to check your documentation.
You should also consider the cost of querying the database at the same time as checking structural correctness. It may be beneficial to run the ID check that hits your database only after you've confirmed the data has the correct format and structure, as sketched below.
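For illustration, a minimal sketch of that two-step idea on SQL Server, assuming a hypothetical lookup table dbo.ValidEmployeeIds: structural validation runs first in the schema validator, then a single query flags any unknown IDs.

-- Hypothetical lookup table, kept up to date elsewhere:
-- CREATE TABLE dbo.ValidEmployeeIds (id INT PRIMARY KEY);

-- @payload is a document that already passed structural validation.
DECLARE @payload NVARCHAR(MAX) = N'{"employees":[{"id":134},{"id":999}]}';

-- Any rows returned are IDs that do not exist in the table.
SELECT j.id
FROM OPENJSON(@payload, '$.employees')
     WITH (id INT '$.id') AS j
LEFT JOIN dbo.ValidEmployeeIds AS v
    ON v.id = j.id
WHERE v.id IS NULL;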

How can I access this JSON data in flutter?

I do know how to access JSON data, but this file is available to me locally rather than online. Should I upload this JSON file to Firebase Storage and access it from there? And one more thing: how would I access data from this JSON? Its format is:
[
    [
        {
            "nos": 0,
            "name": "nnnnnnnasdfnnn",
            "unique_id": "adfadfd",
            "reg_details": [
                {
                    "registered_with": "adfasdfasdf"
                },
                {
                    "type_of_ngo": "zzzzzzzz"
                },
                {
                    "registration_no": "zzzzzxxx"
                },
Any example of accessing this kind of JSON data would be appreciated. Thanks.
I was also stuck on this type of complex JSON. This article on Medium helped me.

Create JSON file from a query with FOR JSON clause result in ADF

I need to create a JSON file from an Azure SQL database and store the file in Azure Blob Storage.
In ADF, I created a simple pipeline with one Copy Data activity to achieve this.
I used a T-SQL query with the FOR JSON clause to get data from the database:
SELECT * FROM stage.Employee FOR JSON AUTO, ROOT ('main_root')
Here are my source and sink settings on the Copy Data activity (screenshots omitted). After executing the pipeline, the created file is not what I expected; I want to get a normal JSON file with this structure:
{
    "main_root": [
        {
            "Employee_No": "1000",
            "Status": "Employee",
            ...
        },
        {
            "Employee_No": "1000",
            "Status": "Employee",
            ...
        },
        {
            "Employee_No": "1000",
            "Status": "Employee",
            ...
Any help would be appreciated.
You are building a hierarchical structure from a relational source, so you'll want to build your relational-to-hierarchical (R2H) logic in Mapping Data Flows to accommodate this transformation.
Set the SQL DB table as your source, build the hierarchical structure in a Derived Column transformation with sub-columns for the hierarchies, and collect the data into arrays using an Aggregate transformation with the collect() function.
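As a side note, if the symptom is that the copy activity writes the FOR JSON result split across several rows (SQL Server chunks large FOR JSON output when it is returned as a result set), one commonly cited workaround is to wrap the query in a scalar subquery so it comes back as a single value; whether that matches what the omitted screenshot showed is an assumption.

-- Wrapping FOR JSON in a scalar subquery returns one NVARCHAR(MAX) value
-- instead of the chunked rows the driver otherwise streams back.
SELECT (
    SELECT *
    FROM stage.Employee
    FOR JSON AUTO, ROOT('main_root')
) AS JsonOutput;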

Unable to parse JSON list in Azure Data Factory ADF

In my Data Factory pipeline I have a Web activity which returns the JSON response below. In the next Stored Procedure activity I am unable to parse the output parameter; I tried a few methods.
I have set Content-Type to application/json in the Web activity.
Sample JSON (the Web activity output):
{
    "Response": "[{\"Message\":\"Number of barcode(s) found:1\",\"Status\":\"Success\",\"CCS Office\":[{\"Name\":\"Woodstock\",\"CCS Description\":null,\"BranchType\":\"Sub CFS Office\",\"Status\":\"Active\",\"Circle\":\"NJ\"}]}]"
}
For the parameter in the Stored Procedure activity, I tried:
@json(first(activity('Web1').output.Response))
output - System.Collections.Generic.List`1[System.Object]
@json(activity('Web1').output.Response[0])
output - cannot be evaluated because property '0' cannot be selected. Property selection is not supported on values of type 'String'
@json(activity('Web1').output.Response.Message)
output - cannot be evaluated because property 'Message' cannot be selected. Property selection is not supported on values of type 'String'
Here is what I did:
I created a new pipeline, and created a parameter of type 'object' using your 'output' in its entirety:
{ "Response": "[{\"Message\":\"Number of barcode(s) found:1\",\"Status\":\"Success\",\"CCS Office\":[{\"Name\":\"Woodstock\",\"CCS Description\":null,\"BranchType\":\"Sub CFS Office\",\"Status\":\"Active\",\"Circle\":\"NJ\"}]}]" }
I created a variable and a Set Variable activity. The variable is of type string. The dynamic expression I used is:
@{json(pipeline().parameters.output.response)[0]}
Let me break it down and explain. The {curly braces} were necessary because the variable is of type string; you may not want or need them.
json(....)
was necessary because the data type of the value of 'Response' was left as a string. Whether it being a string is correct behavior is a different discussion. By converting from string to JSON, I can do the final piece:
[0]
now works because Data Factory sees the contents as objects rather than a string literal. The conversion seems to apply to the nested contents as well: without the encapsulating {curly braces} to convert back to a string, I would get a type error from the Set Variable activity, since the variable is of type string.
Entire pipeline code:
{
    "name": "pipeline11",
    "properties": {
        "activities": [
            {
                "name": "Set Variable1",
                "type": "SetVariable",
                "dependsOn": [],
                "userProperties": [],
                "typeProperties": {
                    "variableName": "thing",
                    "value": {
                        "value": "@{json(pipeline().parameters.output.response)[0]}",
                        "type": "Expression"
                    }
                }
            }
        ],
        "parameters": {
            "output": {
                "type": "object",
                "defaultValue": {
                    "Response": "[{\"Message\":\"Number of barcode(s) found:1\",\"Status\":\"Success\",\"CCS Office\":[{\"Name\":\"Woodstock\",\"CCS Description\":null,\"BranchType\":\"Sub CFS Office\",\"Status\":\"Active\",\"Circle\":\"NJ\"}]}]"
                }
            }
        },
        "variables": {
            "thing": {
                "type": "String"
            }
        },
        "annotations": []
    }
}
I had a similar problem, and this is how I resolved the issue.
I passed the value of Response as a string to a Lookup activity which calls a stored procedure in Azure SQL. The stored procedure parses the string using JSON_VALUE and returns the individual keys and values as rows. The output of the Lookup activity can then be accessed directly by subsequent activities; a sketch is below.
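A minimal sketch of such a stored procedure, assuming the hypothetical name dbo.ParseResponse and the Response payload shown in the question:

-- Hypothetical procedure: shreds the Response JSON array into rows.
CREATE PROCEDURE dbo.ParseResponse
    @Response NVARCHAR(MAX)
AS
BEGIN
    SET NOCOUNT ON;

    -- OPENJSON turns each array element into a row;
    -- JSON_VALUE extracts the scalar keys the pipeline needs.
    SELECT
        JSON_VALUE(a.value, '$.Message') AS [Message],
        JSON_VALUE(a.value, '$.Status')  AS [Status]
    FROM OPENJSON(@Response) AS a;
END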