How to merge a dynamically named record with a static one in Dhall? - json

I'm creating an AWS Step Function definition in Dhall. However, I don't know how to create a common structure they use for Choice states such as the example below:
{
"Not": {
"Variable": "$.type",
"StringEquals": "Private"
},
"Next": "Public"
}
The Not is pretty straightforward using mapKey and mapValue. If I define a basic Comparison:
{ Type =
{ Variable : Text
, StringEquals : Optional Text
}
, default =
{ Variable = "foo"
, StringEquals = None Text
}
}
And the types:
let ComparisonType = < And | Or | Not >
And adding a helper function to render the type as Text for the mapKey:
let renderComparisonType = \(comparisonType : ComparisonType )
-> merge
{ And = "And"
, Or = "Or"
, Not = "Not"
}
comparisonType
Then I can use them in a function to generate the record halfway:
let renderRuleComparisons =
\( comparisonType : ComparisonType ) ->
\( comparisons : List ComparisonOperator.Type ) ->
let keyName = renderComparisonType comparisonType
let compare = [ { mapKey = keyName, mapValue = comparisons } ]
in compare
If I run that using:
let rando = ComparisonOperator::{ Variable = "$.name", StringEquals = Some "Cow" }
let comparisons = renderRuleComparisons ComparisonType.Not [ rando ]
in comparisons
Using dhall-to-json, she'll output the first part:
{
"Not": {
"Variable": "$.name",
"StringEquals": "Cow"
}
}
... but I've been struggling to merge that with "Next": "Sup". I've used all the record merges like /\, //, etc. and it keeps giving me various type errors I don't truly understand yet.

First, I'll include an approach that does not type-check as a starting point to motivate the solution:
let rando = ComparisonOperator::{ Variable = "$.name", StringEquals = Some "Cow" }
let comparisons = renderRuleComparisons ComparisonType.Not [ rando ]
in comparisons # toMap { Next = "Public" }
toMap is a keyword that converts records to key-value lists, and # is the list concatenation operator. The Dhall CheatSheet has a few examples of how to use both of them.
The above solution doesn't work because # cannot merge lists with different element types. The left-hand side of the # operator has this type:
comparisons : List { mapKey : Text, mapValue : Comparison.Type }
... whereas the right-hand side of the # operator has this type:
toMap { Next = "Public" } : List { mapKey : Text, mapValue : Text }
... so the two Lists cannot be merged as-is due to the different types for the mapValue field.
There are two ways to resolve this:
Approach 1: Use a union whenever there is a type conflict
Approach 2: Use a weakly-typed JSON representation that can hold arbitrary values
Approach 1 is the simpler solution for this particular example and Approach 2 is the more general solution that can handle really weird JSON schemas.
For Approach 1, dhall-to-json will automatically strip non-empty union constructors (leaving behind the value they were wrapping) when translating to JSON. This means that you can transform both arguments of the # operator to agree on this common type:
List { mapKey : Text, mapValue : < State : Text | Comparison : Comparison.Type > }
... and then you should be able to concatenate the two lists of key-value pairs and dhall-to-json will render them correctly.
There is a second solution for dealing with weakly-typed JSON schemas that you can learn more about here:
Dhall Manual - How to convert an existing YAML configuration file to Dhall
The basic idea is that all of the JSON/YAML integrations recognize and support a weakly-typed JSON representation that can hold arbitrary JSON data, including dictionaries with keys of different shapes (like in your example). You don't even need to convert the entire the expression to this weakly-typed representation; you only need to use this representation for the subset of your configuration where you run into schema issues.
What this means for your example, is that you would change both arguments to the # operator to have this type:
let Prelude = https://prelude.dhall-lang.org/v12.0.0/package.dhall
in List { mapKey : Text, mapValue : Prelude.JSON.Type }
The documentation for Prelude.JSON.Type also has more details on how to use this type.

Related

How to read field from nested json?

this is my test json file.
{
"item" : {
"fracData" : [ ],
"fractimeData" : [ {
"number" : "1232323232",
"timePeriods" : [ {
"validFrom" : "2021-08-03"
} ]
} ],
"Module" : [ ]
}
}
This is how I read the json file.
starhist_test_df = spark.read.json("/mapr/xxx/yyy/ttt/dev/rawdata/Test.json", multiLine=True)
starhist_test_df.createOrReplaceTempView("v_test_df")
This query works.
df_test_01 = spark.sql("""
select item.fractimeData.number from v_test_df""")
df_test_01.collect();
Result
[Row(number=['1232323232'])]
But this query doesn't work.
df_test_01 = spark.sql("""
select item.fractimeData.timePeriods.validFrom from v_test_df""")
df_test_01.collect();
Error
cannot resolve 'v_test_df.`item`.`fractimeData`.`timePeriods`['validFrom']' due to data type mismatch: argument 2 requires integral type, however, ''validFrom'' is of string type.; line 3 pos 0;
What do I have to change, to read the validFrom field?
dot notation to access values works with struct or array<struct> types.
The schema for field number in item.fractimeData is string and accessing it via dot notation returns an array<string> since fractimeData is an array.
Similarly, the schema for field timePeriods in item.fractimeData is <array<struct<validFrom>>, and accessing it via dot notation wraps it into another array, resulting in final schema of array<array<struct<validFrom>>>.
The error you get is because the dot notation can work on array<struct> but not on array<array>.
Hence, flatten the result from item.fractimeData.timePeriods to get back an array<struct<validFrom>> and then apply the dot notation.
df_test_01 = spark.sql("""
select flatten(item.fractimeData.timePeriods).validFrom as validFrom from v_test_df""")
df_test_01.collect()
"""
[Row(validFrom=['2021-08-03', '2021-08-03'])]
"""

Azure tables unable to store flattened JSON

I am using the npm flat package, and arrays/objects are flattened, but object/array keys are surrounded by '' , like in 'task_status.0.data' using the object below.
These specific fields do not get stored into AzureTables - other fields go through, but these are silently ignored. How would I fix this?
var obj1 = {
"studentId": "abc",
"task_status": [
{
"status":"Current",
"date":516760078
},
{
"status":"Late",
"date":1516414446
}
],
"student_plan": "n"
}
Here is how I am using it - simplified code example: Again, it successfully gets written to the table, but does not write the properties that were flattened (see further below):
var flatten = require('flat')
newObj1 = flatten(obj1);
var entGen = azure.TableUtilities.entityGenerator;
newObj1.PartitionKey = entGen.String(uniqueIDFromMyDB);
newObj1.RowKey = entGen.String(uniqueStudentId);
tableService.insertEntity(myTableName, newObj1, myCallbackFunc);
In the above example, the flattened object would look like:
var obj1 = {
studentId: "abc",
'task_status.0.status': 'Current',
'task_status.0.date': 516760078,
'task_status.1.status': 'Late',
'task_status.1.date': 516760078,
student_plan: "n"
}
Then I would add PartitionKey and RowKey.
all the task_status fields would silently fail to be inserted.
EDIT: This does not have anything to do with the actual flattening process - I just checked a perfectly good JSON object, with keys that had 'x.y.z' in it, i.e. AzureTables doesn't seem to accept these column names....which almost completely destroys the value proposition of storing schema-less data, without significant rework.
. in column name is not supported. You can use a custom delimiter to flatten your objects instead.
For example:
newObj1 = flatten(obj1, {delimiter: '__'});

Regular expression to extract a JSON array

I'm trying to use a PCRE regular expression to extract some JSON. I'm using a version of MariaDB which does not have JSON functions but does have REGEX functions.
My string is:
{"device_types":["smartphone"],"isps":["a","B"],"network_types":[],"countries":[],"category":["Jebb","Bush"],"carriers":[],"exclude_carriers":[]}
I want to grab the contents of category. I'd like a matching group that contains 2 items, Jebb and Bush (or however many items are in the array).
I've tried this pattern but it only matches the first occurrence: /(?<=category":\[).([^"]*).*?(?=\])/g
Does this match your needs? It should match the category array regardless of its size.
"category":(\[.*?\])
regex101 example
JSON not a regular language. Since it allows arbitrary embedding of balanced delimiters, it must be at least context-free.
For example, consider an array of arrays of arrays:
[ [ [ 1, 2], [2, 3] ] , [ [ 3, 4], [ 4, 5] ] ]
Clearly you couldn't parse that with true regular expressions.
See This Topic:
Regex for parsing single key: values out of JSON in Javascript
Maybe Helpful for you.
Using a set of non-capturing group you can extract a predefined json array
regex answer: (?:\"category\":)(?:\[)(.*)(?:\"\])
That expression extract "category":["Jebb","Bush"], so access the first group
to extract the array, sample java code:
Pattern pattern = Pattern.compile("(?:\"category\":)(?:\\[)(.*)(?:\"\\])");
String body = "{\"device_types\":[\"smartphone\"],\"isps\":[\"a\",\"B\"],\"network_types\":[],\"countries\":[],\"category\":[\"Jebb\",\"Bush\"],\"carriers\":[],\"exclude_carriers\":[]}";
Matcher matcher = pattern.matcher(body);
assertThat(matcher.find(), is(true));
String[] categories = matcher.group(1).replaceAll("\"","").split(",");
assertThat(categories.length, is(2));
assertThat(categories[0], is("Jebb"));
assertThat(categories[1], is("Bush"));
There are many ways. One sloppy way to do it is /([A-Z])\w+/g
Please try it on your console like
var data = '{"device_types":["smartphone"],"isps":["a","B"],"network_types":[],"countries":[],"category":["Jebb","Bush"],"carriers":[],"exclude_carriers":[]}',
res = [];
data.match(/([A-Z])\w+/g); // ["Jebb", "Bush"]
OK the above was pretty sloppy however a solid single regex solution to extract every single element regardless of the number, one by one and to place them in an array (res) is the following...
var rex = /[",]+(\w*)(?=[",\w]*"],"carriers)/g,
str = '{"device_types":["smartphone"],"isps":["a","B"],"network_types":[],"countries":[],"category":["Jebb","Bush","Donald","Trump"],"carriers":[],"exclude_carriers":[]}',
arr = [],
res = [];
while ((arr = rex.exec(str)) !== null) {
res.push(arr[1]); // <- ["Jebb", "Bush", "Donald", "Trump"]
}
Check it out # http://regexr.com/3d4ee
OK lets do it. I have come up with a devilish idea. If JS had look-behinds this could have been done simply by reversing the applied logic in the previous example where i had used a look-forward. Alas, there aren't... So i decided to turn the world the other way around. Check this out.
String.prototype.reverse = function(){
return this.split("").reverse().join("");
};
var rex = /[",]+(\w*)(?=[",\w]*"\[:"yrogetac)/g,
str = '{"device_types":["smartphone"],"isps":["a","B"],"network_types":[],"countries":[],"category":["Jebb","Bush","Donald","Trump"],"carriers":[],"exclude_carriers":[]}',
rev = str.reverse();
arr = [],
res = [];
while ((arr = rex.exec(rev)) !== null) {
res.push(arr[1].reverse()); // <- ["Trump", "Donald", "Bush", "Jebb"]
}
res.reverse(); // <- ["Jebb", "Bush", "Donald", "Trump"]
Just use your console to confirm.
In c++ you can do it like this
bool foundmatch = false;
try {
std::regex re("\"([a-zA-Z]+)\"*.:*.\\[[^\\]\r\n]+\\]");
foundmatch = std::regex_search(subject, re);
} catch (std::regex_error& e) {
// Syntax error in the regular expression
}
If the number of items in the array is limited (and manageable), you could define it with a finite number of optional items. Like this one with a maximum of 5 items:
"category":\["([^"]*)"(?:,"([^"]*)"(?:,"([^"]*)"(?:,"([^"]*)"(?:,"([^"]*)")?)?)?)?
regex101 example here.
Regards.

MongoDB - Dynamically update an object in nested array

I have a document like this:
{
Name : val
AnArray : [
{
Time : SomeTime
},
{
Time : AnotherTime
}
...arbitrary more elements
}
I need to update "Time" to a Date type (right now it is string)
I would like to do something psudo like:
foreach record in document.AnArray { record.Time = new Date(record.Time) }
I've read the documentation on $ and "dot" notation as well as a several similar questions here, I tried this code:
db.collection.update({_id:doc._id},{$set : {AnArray.$.Time : new Date(AnArray.$.Time)}});
And hoping that $ would iterate the indexes of the "AnArray" property as I don't know for each record the length of it. But am getting the error:
SyntaxError: missing : after property id (shell):1
How can I perform an update on each member of the arrays nested values with a dynamic value?
There's no direct way to do that, because MongoDB doesn't support an update-expression that references the document. Moreover, the $ operator only applies to the first match, so you'd have to perform this as long as there are still fields where AnArray.Time is of $type string.
You can, however, perform that update client side, in your favorite language or in the mongo console using JavaScript:
db.collection.find({}).forEach(function (doc) {
for(var i in doc.AnArray)
{
doc.AnArray[i].Time = new Date(doc.AnArray[i].Time);
}
db.outcollection.save(doc);
})
Note that this will store the migrated data in a different collection. You can also update the collection in-place by replacing outcollection with collection.

What is "compressed JSON"?

I see a lot of references to "compressed JSON" when it comes to different serialization formats. What exactly is it? Is it just gzipped JSON or something else?
Compressed JSON removes the key:value pair of json's encoding to store keys and values in seperate parallel arrays:
// uncompressed
JSON = {
data : [
{ field1 : 'data1', field2 : 'data2', field3 : 'data3' },
{ field1 : 'data4', field2 : 'data5', field3 : 'data6' },
.....
]
};
//compressed
JSON = {
data : [ 'data1','data2','data3','data4','data5','data6' ],
keys : [ 'field1', 'field2', 'field3' ]
};
This method of usage i found here
Content from link (http://www.nwhite.net/?p=242)
rarely find myself in a place where I am writing javascript applications that use AJAX in its pure form. I have long abandoned the ‘X’ and replaced it with ‘J’ (JSON). When working with Javascript, it just makes sense to return JSON. Smaller footprint, easier parsing and an easier structure are all advantages I have gained since using JSON.
In a recent project I found myself unhappy with the large size of my result sets. The data I was returning was tabular data, in the form of objects for each row. I was returning a result set of 50, with 19 fields each. What I realized is if I augment my result set I could get a form of compression.
// uncompressed
JSON = {
data : [
{ field1 : 'data1', field2 : 'data2', field3 : 'data3' },
{ field1 : 'data4', field2 : 'data5', field3 : 'data6' },
.....
]
};
//compressed
JSON = {
data : [ 'data1','data2','data3','data4','data5','data6' ],
keys : [ 'field1', 'field2', 'field3' ]
};
I merged all my values into a single array and store all my fields in a separate array. Returning a key value pair for each result cost me 8800 byte (8.6kb). Ripping the fields out and putting them in a separate array cost me 186 bytes. Total savings 8.4kb.
Now I have a much more compressed JSON file, but the structure is different and now harder to work with. So I implement a solution in Mootools to make the decompression transparent.
Request.JSON.extend({
options : {
inflate : []
}
});
Request.JSON.implement({
success : function(text){
this.response.json = JSON.decode(text, this.options.secure);
if(this.options.inflate.length){
this.options.inflate.each(function(rule){
var ret = ($defined(rule.store)) ? this.response.json[rule.store] : this.response.json[rule.data];
ret = this.expandData(this.response.json[rule.data], this.response.json[rule.keys]);
},this);
}
this.onSuccess(this.response.json, text);
},
expandData : function(data,keys){
var arr = [];
var len = data.length; var klen = keys.length;
var start = 0; var stop = klen;
while(stop < len){
arr.push( data.slice(start,stop).associate(keys) );
start = stop; stop += klen;
}
return arr;
}
});
Request.JSON now has an inflate option. You can inflate multiple segments of your JSON object if you so desire.
Usage:
new Request.JSON({
url : 'url',
inflate : [{ 'keys' : 'fields', 'data' : 'data' }]
onComplete : function(json){}
});
Pass as many inflate objects as you like to the option inflate array. It has an optional property called ’store’ If set the inflated data set will be stored in that key instead.
The ‘keys’ and ‘fields’ expect strings to match a location in the root of your JSON object.
Based in Paniyar's answer, we can convert a List of Objects in "compressed" Json format using C# like this:
var JsonString = serializer.Serialize(
new
{
cols = new[] { "field1", "field2", "field3"},
items = data.Select(x => new object[] {x.field1, x.field2, x.field3})
});
I used an array of object because the fields can be int, bool, string...
More Reduction:
If the field is repeated very often and it is a string type, you can get compressed a little be more if you add a distinct list of that field... for instance, a field name job position, city, etc are excellent candidate for this. You can add a distinct list of this items and in each item change the value for a reference number. That will make your Json more lite.
Compressed:
[["KeyA", "KeyB", "KeyC", "KeyD", "KeyE", "KeyF"],
["ValA1", "ValB1", "ValC1", "ValD1", "ValE1", "ValF1"],
["ValA2", "ValB2", "ValC2", "ValD2", "ValE2", "ValF2"],
["ValA3", "ValB3", "ValC3", "ValD3", "ValE3", "ValF3"],
["ValA4", "ValB4", "ValC4", "ValD4", "ValE4", "ValF4"]]
Uncompressed:
[{KeyA: "ValA1", KeyB: "ValB1", KeyC: "ValC1", KeyD: "ValD1", KeyE: "ValE1", KeyF: "ValF1"},
{KeyA: "ValA2", KeyB: "ValB2", KeyC: "ValC2", KeyD: "ValD2", KeyE: "ValE2", KeyF: "ValF2"},
{KeyA: "ValA3", KeyB: "ValB3", KeyC: "ValC3", KeyD: "ValD3", KeyE: "ValE3", KeyF: "ValF3"},
{KeyA: "ValA4", KeyB: "ValB4", KeyC: "ValC4", KeyD: "ValD4", KeyE: "ValE4", KeyF: "ValF4"}]
The most likely answer is that it really is just gzipped JSON. There is no other standard meaning to this phrase.
Re-organizing a homogenous array of JSON objects into a pair of arrays is a very useful technique to make the payload smaller and to speed up encoding and decoding, it is not commonly called "compressed JSON". I haven't run across it ever in open source or any open API, but we use this technique internally and call it "jsontable".