Flattening JSON document in Cosmos DB

I have a nested JSON document with arrays. The raw JSON is stored in Cosmos DB as-is.
Now I have to unnest some of the arrays within this JSON and fetch specific fields. Some of these fields will form header-level details and the rest will form line-item-level details.
With separate queries (each flattening the required arrays) I can fetch the header and line-item fields separately. For example, below are some of the required header fields, which I get as the output of one query after flattening one of the arrays within the JSON.
[
{
"identifier": "639",
"owner": "ABC",
"recipient": "XYZ"
},
{
"identifier": "640",
"owner": "TESTOWNER",
"recipient": "TESTrecipient"
}
]
If I combine the unnesting of different arrays in a single query, the result becomes a cartesian product.
I am not sure how to combine all these fields (header and line item) and pass them to the consuming application as one JSON response. Is a Cosmos DB stored procedure an option for combining these fields by unnesting the required arrays within the JSON?
Looking forward to hearing some suggestions.

You can always use the JOIN operator to get into the arrays, and I do not think you need a Cosmos DB stored procedure to achieve what you need.

Related

Processing a Kafka message using KSQL that has a field that can be either an ARRAY or a STRUCT

I'm consuming a Kafka topic published by another team (so I have very limited influence over the message format). The message has a field that holds an ARRAY of STRUCTS (an array of objects), but if the array has only one value then it just holds that STRUCT (no array, just an object). I'm trying to transform the message using Confluent KSQL. Unfortunately, I cannot figure out how to do this.
For example:
{ "field": {...} } <-- STRUCT (single element)
{ "field": [ {...}, {...} ] } <-- ARRAY (multiple elements)
{ "field": [ {...}, {...}, {...} ] } <-- ARRAY (multiple elements)
If I configure the field in my message schema as a STRUCT then all messages with multiple values error. If I configure the field in my message schema as an ARRAY then all messages with a single value error. I could create two streams and merge them, but then my error log will be polluted with irrelevant errors.
I've tried capturing this field as a STRING/VARCHAR which is fine and I can split the messages into two streams. If I do this, then I can parse the single value messages and extract the data I need, but I cannot figure out how to parse the multivalue messages. None of the KSQL JSON functions seem to allow parsing of JSON Arrays out of JSON Strings. I can use EXTRACTJSONFIELD() to extract a particular element of the array, but not all of the elements.
Am I missing something? Is there any way to handle this reasonably?
In my experience, this is one use-case where KSQL just doesn't work. You would need to use Kafka Streams or a plain consumer to deserialize the event as a generic JSON type, then check object.get("field").isArray() or isObject(), and handle accordingly.
Even if you used a UDF in KSQL, the STREAM definition would be required to know ahead of time if you have field ARRAY<?> or field STRUCT<...>
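The "check the type, then handle accordingly" step can be sketched in a few lines. This is a minimal illustration in Python rather than Kafka Streams, assuming the message value has already been read as a JSON string; the function name `field_as_list` and the sample payloads are hypothetical.

```python
import json

def field_as_list(raw_message: str, key: str = "field") -> list:
    """Return the value under `key` as a list, whether it arrived as an object or an array."""
    event = json.loads(raw_message)
    value = event.get(key)
    if value is None:
        return []            # field missing or null: nothing to process
    if isinstance(value, list):
        return value         # ARRAY variation: already a list of objects
    if isinstance(value, dict):
        return [value]       # STRUCT variation: wrap the single object
    raise ValueError(f"unexpected type for {key!r}: {type(value).__name__}")

# Hypothetical sample payloads for the two variations
single = '{"field": {"id": 1}}'
multiple = '{"field": [{"id": 1}, {"id": 2}]}'
print(field_as_list(single))
print(field_as_list(multiple))
```

After this normalization, downstream code only ever sees a list, so both variations share one processing path.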
I finally solved this in a roundabout way...
First, I created an initial stream that reads the message as raw bytes, using the KAFKA format instead of the JSON format. This lets me put a conditional filter on the data so I can fork the stream into one version for the single (STRUCT) variation and one for the multiple (ARRAY) variation.
The initial stream looks like:
CREATE OR REPLACE STREAM `my-topic-stream` (
id STRING KEY,
data BYTES
)
WITH (
KAFKA_TOPIC='my-topic',
VALUE_FORMAT='KAFKA'
);
Forking that stream looks like this for the single version; a second stream for the multiple version uses the same query with IS NOT NULL in the filter:
CREATE OR REPLACE STREAM `my-single-stream`
WITH (
kafka_topic='my-single-topic'
) AS
SELECT *
FROM `my-topic-stream`
WHERE JSON_ARRAY_LENGTH(EXTRACTJSONFIELD(FROM_BYTES(data, 'utf8'), '$.field')) IS NULL;
At this point I can create a schema for both variations, explode field, and merge the two streams back together. I don't know if this can be refined to be more efficient, but this successfully processes the transactions as I wanted.

Create Nested JSON data using U-SQL Json Outputter

I have to output the table data as nested JSON (making the address, state, and city columns children of an Address object), something like the following:
[
{
"name": "Country",
"size": 0,
"children": [
{
"name": "America",
"size": 0,
"children": [
{
"name": "SouthAmerica",
"size": 2,
"children": []
}
]
}
]
}
]
But by default the JSON outputter only creates a flat JSON file like the one below:
[
{
"name": "Europe",
"size": 1
}
]
How can I create nested JSON using a U-SQL custom outputter? Please suggest some samples.
Thanks in Advance!
The sample JSON outputter that is provided on the U-SQL GitHub page does not support nested output. I am afraid you have to write your own nested outputter.
One complication is that you will need to keep nesting correlations intact and decide whether you need to support sibling nestings (e.g., A contains B and C at the same non-leaf level) or only single-path nestings (e.g., A contains B, which in turn contains C).
You have a couple of options for writing such an outputter. For inspiration, I would look at SQL Server's FOR XML capabilities:
1. If you only want single-path nesting, look at the FOR XML AUTO mode semantics for how to decompose a rowset into nesting levels. You would probably need to pass parameters into the outputter that identify how columns map to levels, to mimic the AUTO mode's lineage heuristic.
2. If you want sibling support, you can either look at FOR XML EXPLICIT's model, where users write a SQL query that generates a universal table, which the outputter can then transform in a streaming fashion, or
3. You can generate some hierarchy using SQL.MAP and SQL.ARRAY typed columns and then write a custom outputter that produces the nesting.
4. You can write JSONifier functions that compose smaller JSON documents, which can then be nested as strings containing JSON fragments, building up the nesting with several SELECTs (a bit like FOR XML PATH in SQL Server, but probably not easily done at the rowset level).
Alternatively, produce the flat JSON and find a post processing tool to reshape the JSON into the structure you need.
I would currently look into first trying approach #3 (with SQL.MAP and SQL.ARRAY).
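For the post-processing alternative mentioned above, a reshaping step could look roughly like the following Python sketch. It assumes the flat output carries a `parent` column naming each row's parent and that names are unique; both are assumptions for illustration, since the question does not show how the hierarchy is encoded in the table.

```python
import json

def nest_rows(rows):
    """Build name/size/children trees from flat rows with a hypothetical 'parent' column."""
    # One node per row, keyed by name (names are assumed to be unique)
    nodes = {r["name"]: {"name": r["name"], "size": r["size"], "children": []} for r in rows}
    roots = []
    for r in rows:
        parent = r.get("parent")
        if parent is None:
            roots.append(nodes[r["name"]])        # top-level node
        else:
            nodes[parent]["children"].append(nodes[r["name"]])
    return roots

# Flat rows as a flat outputter might produce them, plus the assumed 'parent' column
flat = [
    {"name": "Country", "size": 0, "parent": None},
    {"name": "America", "size": 0, "parent": "Country"},
    {"name": "SouthAmerica", "size": 2, "parent": "America"},
]
nested = nest_rows(flat)
print(json.dumps(nested, indent=2))
```

Because nodes are looked up by name, the rows can arrive in any order, and siblings under the same parent are supported without extra work.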

Is Apache Ignite suited for NoSQL schema

Is JSON within a JSON supported within Apache Ignite?
Example:
{
"stuff": {
"onetype": [
{"id":1,"name":"John Doe"},
{"id":2,"name":"Don Joeh"}
],
"othertype": {"id":2,"company":"ACME"}
},
"otherstuff": {
"thing": [[1,42],[2,2]]
}
}
The goal is to be able to query on any field in the JSON. So far with Apache Ignite I have seen that by creating a class and storing objects of it, it is possible to add indexes and query the first level of key/value pairs, but I did not see any example for nested JSON.
Is it maybe better to use MongoDB or Cassandra for that kind of need (to index and query any nested field within a JSON)?
JSON is treated as a regular string when it's put into a cache.
When a JSON has only a single level, then it's possible to represent it as either POJO or BinaryObject, put it into a cache and benefit from all the querying capabilities, but nested objects cannot be indexed and queried properly so far.
As an option, you could use ScanQueries.
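The idea behind a scan query is that every entry is visited and a predicate decides which ones match, so no index on the nested field is needed. The Python sketch below only illustrates that idea with a plain dictionary standing in for the cache; it does not use the Ignite API, and the `scan` helper and the sample entries are hypothetical.

```python
import json

def scan(cache, predicate):
    """Yield (key, document) pairs whose parsed JSON satisfies the predicate."""
    for key, raw in cache.items():
        doc = json.loads(raw)      # values are stored as JSON strings
        if predicate(doc):
            yield key, doc

# A dict standing in for a cache of JSON strings (illustrative data only)
cache = {
    "a": json.dumps({"stuff": {"onetype": [{"id": 1, "name": "John Doe"}],
                               "othertype": {"id": 2, "company": "ACME"}}}),
    "b": json.dumps({"stuff": {"onetype": [],
                               "othertype": {"id": 3, "company": "Other"}}}),
}

# Filter on a nested field; every entry is parsed and checked
matches = [k for k, _ in scan(cache, lambda d: d["stuff"]["othertype"]["company"] == "ACME")]
print(matches)
```

The trade-off is cost: a full scan touches every entry, so it suits occasional queries better than hot paths.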

Source of documentation for a standard JSON document structure?

I am working on a (.NET) REST API which returns some JSON data. The consumer of the API is an embedded client. We have been trying to establish the structure of the JSON we will be working with. The format the embedded client wants to use is something I have not seen before in working with JSON. I suggested that it is not "typical" JSON. I was met with the question "Where is the 'typical' JSON format documented?"
As an example of JSON I "typically" see:
{
"item" : {
"users": [ ... list of user objects ... ],
"times": [ ... list of time objects ...]
}
}
An example of the non-typical JSON:
{
"item" : [
{
"users": [ ... list of user objects ... ]
},
{
"times": [ ... list of time objects ...]
}
]
}
In the second example, item contains an array of objects, which each contain a property whose value is an array of entities. This is valid JSON. However, I have not encountered another instance of JSON that is structured this way when it is not an arbitrary array of objects but is in fact a set list of properties on the "item" object.
In searching json.org, stackoverflow.com and other places on the interwebs I have not found any guidelines on why the structure of JSON follows the "typical" example above rather than the second example.
Can you provide links to documentation that would provide recommendations for one format or the other above?
Not a link, but just a straightforward answer: Items are either indexed (0, 1, 2, ...) or keyed (users, times). No matter what software you use, you can get at indexed or keyed data equally easily and quickly. But not with what you call "non-typical" JSON: To get at the users, I have to iterate through the array and find one dictionary that has a key "users". But there might be two or more dictionaries with that key. So what am I supposed to do then? If you use JSON schema, the "non-typical" JSON is impossible to check. In iOS, in the typical case I write
NSArray* users = itemDict [@"users"];
For the non-typical JSON I have to write
NSArray* users = nil;
for (NSDictionary* dict in itemArray)
if (dict [@"users"] != nil)
users = dict [@"users"];
but that still has no error checking for multiple dicts with the key "users", which is an error that isn't even possible in the first case. So just tell them that what they are asking for is rubbish and creates nothing but unnecessary work. For other software, you probably have the same problems.
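The same contrast, including the duplicate-key check that the loop above lacks, can be sketched in Python. The function names and sample data are hypothetical and only illustrate the two shapes discussed in the question.

```python
def users_keyed(item: dict) -> list:
    """Typical shape: 'users' is a key on the item object, so lookup is direct."""
    return item["users"]

def users_from_array(item_array: list) -> list:
    """Non-typical shape: search the array and reject missing or duplicate 'users' entries."""
    matches = [d["users"] for d in item_array if "users" in d]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one 'users' entry, found {len(matches)}")
    return matches[0]

typical = {"users": ["u1", "u2"], "times": ["t1"]}
non_typical = [{"users": ["u1", "u2"]}, {"times": ["t1"]}]

print(users_keyed(typical))
print(users_from_array(non_typical))
```

The keyed version cannot contain two "users" entries at all, while the array version needs an explicit check and a decision about what to do when it fails.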

json data format in firebase - are arrays supported? And/Or, if only objects are supported, can dictionaries be numbered with integers?

I am tinkering with firebase and curious about the data structure. Browsing to my database, firebase allows me to modify the structure and data in my database. But it seems that firebase only supports objects (and dictionaries for lists).
I want to know if arrays are supported. I would also like to know if dictionary items can be named with integers - the firebase interface only inserts strings as names which makes me concerned about ordering records.
Here is a sample of json created through firebase interface:
{
"dg":{
"users":{
"rein":{
"searches":{
"0":{
"urls":"http://reinpetersen.com,http://www.reinpetersen.com",
"keyphrases":"rein petersen,reinsbrain,programmer turned kitesurfer"
}
}
},
"jacqui":{
"searches":{
"0":{
"urls":"http://www.diving-fiji.com,http://diving-fiji.com",
"keyphrases":"diving marine conservation, diving fiji"
}
}
}
},
"crawl_list":{
"1":{
"urls":"http://www.diving-fiji.com,http://diving-fiji.com",
"keyphrases":"diving marine conservation, diving fiji"
},
"0":{
"urls":"http://reinpetersen.com,http://www.reinpetersen.com",
"keyphrases":"rein petersen,reinsbrain,programmer turned kitesurfer"
}
}
}
}
Obviously, for my lists, I want the dictionary item names to be integers so I can ensure sorting is correct.
You can save arrays into Firebase. For example:
var data = new Firebase(...);
data.set(['x', 'y', 'z']);
JavaScript arrays are essentially just objects with numeric keys. When retrieving data, we automatically detect when a Firebase object has only numeric keys, and we return an array if that is the case.
Note that for storing a list of data to which many people can append, an array is not a good choice, as multiple people writing to the same index in the array can cause conflicts. Instead, we have a "push" function which creates a chronologically-ordered unique ID for your data.
Also, if you're intending to use the array as a way of ordering data, there's a better way to do that using our priorities. See the docs.
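The numeric-key detection described above can be pictured with a small Python sketch. It implements only the rule as stated in this answer (all keys numeric means array) and is not Firebase's actual implementation, which may treat gaps or sparse keys differently; the helper name `to_array_if_numeric` is hypothetical.

```python
def to_array_if_numeric(node):
    """Return a list when every key of a dict is a non-negative integer string."""
    if isinstance(node, dict) and node and all(k.isascii() and k.isdigit() for k in node):
        # Order values by numeric key, so "10" sorts after "9"
        return [node[k] for k in sorted(node, key=int)]
    return node   # mixed or non-numeric keys: leave the object unchanged

# Shaped like the crawl_list in the question, with keys stored out of order
crawl_list = {"1": {"urls": "b"}, "0": {"urls": "a"}}
as_array = to_array_if_numeric(crawl_list)
mixed = to_array_if_numeric({"0": "x", "rein": "y"})
print(as_array)
print(mixed)
```

Note that the keys themselves remain strings in storage; the array view is produced only when the data is read back.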
The Firebase docs have a pretty good section on how to order your data: Ordered Data.
Just like JSON fields, Firebase fields can only be named with strings. It sounds like what you're looking for is setWithPriority(), which attaches sortable priority data to your fields, or push(), which is guaranteed to give your fields unique names, ordered chronologically. (More on lists and push() here.)
You can also push() or set() arrays. For example,
new Firebase("http://gamma.firebase.com/MyUser").push(["cakes","bulldozers"]);
results in a tree like you'd expect, with MyUser receiving a uniquely named child who has children "0":"cakes" and "1":"bulldozers".