Manipulating (nested) JSON keys and their values, using NiFi - json

I am currently facing an issue where I have to read a JSON file that has mostly the same structure, has about 10k+ lines, and is nested.
I thought about creating my own custom processor which reads the JSON and replaces several matching keys/values with the ones needed. As I am trying to use NiFi, I assume that there should be a more comfortable way, since the JSON structure itself is mostly consistent.
I already tried using the ReplaceText processor as well as the JoltTransformJson processor, but I could not figure it out. How can I transform both keys and values, if needed? For example, if there is something like this:
{
"id": "test"
},
{
"id": "14"
}
It might be necessary to turn the "id" into "Number" and map "test" to "3", as I am using different keys/values in my jsonfiles/database, so they need to fit those. Is there a way of doing so without having to create my own processor?
Regards,
Steve
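To pin down the transformation being asked for, here is a minimal sketch in plain Python. It is not a NiFi configuration, only the logic that any processor or script would have to implement. The two mapping tables and the sample records are assumptions taken from the example above.

```python
import json

# Hypothetical mappings; the real ones depend on the target files/database.
KEY_MAP = {"id": "Number"}
VALUE_MAP = {"id": {"test": "3"}}  # keyed by the ORIGINAL key name


def remap(node):
    """Recursively rename keys and translate scalar values in parsed JSON."""
    if isinstance(node, dict):
        result = {}
        for key, value in node.items():
            new_key = KEY_MAP.get(key, key)
            if isinstance(value, (dict, list)):
                result[new_key] = remap(value)
            else:
                # Unmapped values are passed through unchanged.
                result[new_key] = VALUE_MAP.get(key, {}).get(value, value)
        return result
    if isinstance(node, list):
        return [remap(item) for item in node]
    return node


records = json.loads('[{"id": "test"}, {"id": "14", "nested": {"id": "test"}}]')
remapped = remap(records)
print(json.dumps(remapped))
```

Keeping the mappings in data rather than in code means the same function can serve several slightly different input files.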

Related

How to easily change a recurring property name in multiple schemas?

To be able to deserialize polymorphic types, I use a type discriminator across many of my JSON objects. E.g., { "$type": "SomeType", "otherProperties": "..." }
For the JSON schemas of concrete types, I specify a const value for type.
{
"type": "object",
"properties": {
"$type": { "const": "SomeType" },
"otherProperties": { "type": "string" }
}
}
This works, but distributes the chosen "$type" property name throughout many different JSON schemas. In fact, we are considering renaming it to "__type" to play more nicely with BSON.
Could I have prevented having to rename this property in all affected schemas?
I tried searching for a way to load the property name from elsewhere. As far as I can tell $ref only works for property values.
JSON Schema has no ability to dynamically load keys from another location in the way you are asking, specifically because the value will be different and you want only the key to be loaded from elsewhere.
While you can't do this with JSON Schema, you could use a templating tool such as Jsonnet. I've seen this work well at scale.
This would require you to have a pre-processing step, but it sounds like that's something you're planning for already, creating some sort of pipeline to generate your schemas.
A word of warning, watch out for existing schema generation tooling. It is often only good for scaffolding, and requires lots of modifications. It sounds like you're building your own, which is likely a better approach.
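The answer suggests Jsonnet; as a rough illustration of the same pre-processing idea, here is a sketch in Python instead. The helper name concrete_schema and the constant TYPE_KEY are made up for this example and are not part of any schema tooling.

```python
import json

# Single place where the discriminator property name is defined.
TYPE_KEY = "$type"  # change to "__type" here to rename it in every schema


def concrete_schema(type_name, properties):
    """Build the JSON Schema for one concrete type (hypothetical helper)."""
    return {
        "type": "object",
        "properties": {TYPE_KEY: {"const": type_name}, **properties},
    }


schema = concrete_schema("SomeType", {"otherProperties": {"type": "string"}})
print(json.dumps(schema, indent=2))
```

The generated schemas would be written out by the build step, so the property name only ever appears in one source location.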

Create Nested JSON data using U-SQL Json Outputter

I have to output the table data into a nested json (make the address, state, city columns as children object for Address) something like below,
[
{
"name": "Country",
"size": 0,
"children": [
{
"name": "America",
"size": 0,
"children": [
{
"name": "SouthAmerica",
"size": 2,
"children": []
}
]
}
]
}
]
But by default JSON outputter is only creating normal json file like below,
[
{
"name": "Europe",
"size": 1
}
]
How can I create nested JSON using a U-SQL custom outputter? Could you suggest some samples?
Thanks in Advance!
The sample JSON outputter that is provided on the U-SQL GitHub page does not support nested output. You have to write your own nested outputter I am afraid.
One of the complications will be that you need to keep the nesting correlations intact and decide whether you need to support sibling nestings (e.g., A contains B and C at the same non-leaf level) or only single-path nestings (e.g., A contains B, which in turn contains C).
You have a couple of options for writing such an outputter. For inspiration, I would look at SQL Server's FOR XML capabilities:
1. If you only want single-path nesting, look at the FOR XML AUTO mode semantics for how to decompose a rowset into nesting levels. You would probably need to pass parameters into the outputter that identify how columns map to levels, to mimic the AUTO mode's lineage heuristic.
2. If you want sibling support, you can either look at FOR XML EXPLICIT's model, where users write a SQL query that generates a universal table which the outputter can then transform in a streaming fashion, or
3. generate some hierarchy using SQL.MAP and SQL.ARRAY typed columns and then write a custom outputter that produces the nesting.
4. You can write JSONifier functions that compose smaller JSON documents, which can then be nested as strings containing JSON fragments, and build up the nesting with several SELECTs (a bit like FOR XML PATH in SQL Server, but probably not easily done at the rowset level).
5. Alternatively, produce the flat JSON and find a post-processing tool to reshape the JSON into the structure you need.
I would currently look into first trying approach #3 (with SQL.MAP and SQL.ARRAY).
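For the post-processing route (reshaping flat JSON after the fact), a minimal sketch in Python might look like this. It runs outside U-SQL, and the "parent" column is an assumption: the flat output would need some column that links each row to its parent.

```python
import json

# Hypothetical flat rows, as a flat outputter might produce them.
# The "parent" column is assumed; it is not part of the question's data.
rows = [
    {"name": "Country", "size": 0, "parent": None},
    {"name": "America", "size": 0, "parent": "Country"},
    {"name": "SouthAmerica", "size": 2, "parent": "America"},
]


def nest(flat_rows):
    """Turn parent-linked rows into a list of nested root nodes."""
    nodes = {
        row["name"]: {"name": row["name"], "size": row["size"], "children": []}
        for row in flat_rows
    }
    roots = []
    for row in flat_rows:
        node = nodes[row["name"]]
        if row["parent"] is None:
            roots.append(node)
        else:
            nodes[row["parent"]]["children"].append(node)
    return roots


nested = nest(rows)
print(json.dumps(nested, indent=2))
```

Because every node is created before any linking happens, sibling nestings work as well as single-path ones, as long as names are unique.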

Is it possible to use an if-else statement inside a JSON file? [duplicate]

This question already has answers here:
How to use if statement inside JSON?
(6 answers)
Closed 5 years ago.
I want to include an if-else condition in JSON based on which I need to set an attribute in the JSON file.
For example like this:
"identifier": "navTag",
"items": [{
"label": "abc",
"url": "yxz.com",
},
{
"label": "abc1",
"url": "yxz1.com",
},
{
"label": "abc2",
"url": "yxz2.com",/*I need to change this value on certain
condition like if condition is true then
"url": xyz2.com if false "url":xyz3.com*/
}
]
Is this possible?
JSON is a structure for storing data so that it can be retrieved much faster compared to other data structures, so we cannot put conditions in it. If you want to retrieve some data according to an if-else condition, there are two possible ways:
1. Create different JSON files for the different conditions.
2. Create two fields in your JSON structure called if and else. If the condition is satisfied, fetch the if field's value; otherwise, fetch the else field's value.
e.g.:
{
"if":"if-value",
"else":"else-value"
}
JSON is only a data representation (unrelated to any programming language, even if early JavaScript implementations remotely inspired it). There is no notion of "execution" or "conditional" (or of "behavior" or of "semantics") in it.
Read carefully the (short) JSON definition. It simply defines what sequence of characters (e.g. the content of a file) is valid JSON. It does not define the "meaning" of JSON data.
JSON data is parsed by some program, and emitted by some program (often different ones, but could be the same).
The program handling JSON can of course use conditions and give some "meaning" (whatever is the definition of that word) to it. But JSON is only "data syntax".
You could (easily) write your own JSON transformer (using some existing JSON library, and there are many of them), and that is really simple. Some programs (notably jq) claim to be more or less generic JSON processors.
Since JSON is a textual format, you could even use some editor (such as emacs, vim or many others) to manually change parts of it. You had better validate the result with some existing JSON parser (to be sure you did not add any mistakes).
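To illustrate the point that the condition lives in the program and not in the JSON, here is a small sketch of such a transformer in Python. The function name set_url and the way the condition is passed in are assumptions; the data is the example from the question with its syntax cleaned up.

```python
import json

document = json.loads("""
{
  "identifier": "navTag",
  "items": [
    {"label": "abc", "url": "yxz.com"},
    {"label": "abc1", "url": "yxz1.com"},
    {"label": "abc2", "url": "yxz2.com"}
  ]
}
""")


def set_url(doc, label, condition):
    """Choose the url of one item based on a condition the program evaluates."""
    for item in doc["items"]:
        if item["label"] == label:
            item["url"] = "xyz2.com" if condition else "xyz3.com"
    return doc


updated = set_url(document, "abc2", condition=False)
print(json.dumps(updated, indent=2))
```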

JSON interface with UNIX

I am very new to working with JSON and have a few basic questions about it:
1) How can we open a JSON file from Unix? Suppose I have a metadata file in JSON; how should I fetch and update values in that file?
2) I need an example of how to interact with it.
3) How compatible is the Unix shell with JSON, and is there any other technology, language, or tool that is better suited than a shell script?
Thanks,
Nikhil
JSON is just text following a specific format.
Use a text editor and follow the rules. Some editors with "JSON modes" will help with [invalid] syntax highlighting, indenting, brace matching..
A "Unix Shell" has nothing directly to do with JSON - how does a shell relate to text or XML files?
There are some utilities for dealing with JSON which might be of use such as jq - but it really depends on what needs to be done with the JSON (which is ultimately just text).
Json is a format to store strings, bools, numbers, lists, and dicts and combinations thereof (dicts of numbers pointing to lists containing strings etc.). Probably the data you want to store has some kind of structure which fits into these types. Consider that and think of a valid representation using the types given above.
For example, if your text configuration looks something like this:
Section
Name=Marsoon
Size=34
Contents
foo
bar
bloh
EndContents
EndSection
Section
Name=Billition
Size=103
Contents
one
two
three
EndContents
EndSection
… then this looks like a list of dicts which contain some strings and numbers and one list of strings. A valid representation of that in Json would be:
[
{
"Name": "Marsoon",
"Size": 34,
"Contents": [
"foo", "bar", "bloh"
]
},
{
"Name": "Billition",
"Size": 103,
"Contents": [
"one", "two", "three"
]
}
]
But in case you know that each such dictionary has a different Name and always the same fields, you don't have to store the field names and can use the Name as a key of a dictionary; so you can also represent it as a dict of strings pointing to lists containing numbers and lists of strings:
{
"Marsoon": [
34, [ "foo", "bar", "bloh" ]
],
"Billition": [
103, [ "one", "two", "three" ]
]
}
Both are valid representations of your original text configuration. How you'd choose depends mainly on the question whether you want to stay open for later changes of the data structure (the first solution is better then) or if you want to avoid bureaucratic overhead.
Such a Json can be stored as a simple text file. Use any text editor you like for this. Notice that all whitespace is optional. The last example could also be written in one line:
{"Marsoon":[34,["foo","bar","bloh"]],"Billition":[103,["one","two","three"]]}
So sometimes a computer-generated Json might be hard to read and would need an editor at least capable of handling very long lines.
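If you end up with such a one-line file, any JSON library can convert between the compact and the readable form. A short sketch with Python's standard json module, using the data from the example above:

```python
import json

data = {
    "Marsoon": [34, ["foo", "bar", "bloh"]],
    "Billition": [103, ["one", "two", "three"]],
}

# No optional whitespace at all: everything on one line.
compact = json.dumps(data, separators=(",", ":"))
# Same content, indented for human readers.
pretty = json.dumps(data, indent=2)

print(compact)
print(pretty)
```

Both strings parse back to the same data; only the whitespace differs.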
Handling such a Json file in a shell script will not be easy, simply because the shell has no notion of such complex types. The most complicated thing it can handle properly is a dict of strings pointing to strings (bash associative arrays). So I propose looking for a more suitable language, e.g. Python. In Python you can handle all these structures quite efficiently and with very readable code:
import json

with open('myfile.json') as jsonFile:
    data = json.load(jsonFile)

print(data[0]['Contents'][2])  # will print "bloh" in the first example
# or:
print(data['Marsoon'][1][2])  # will print "bloh" in the second example
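The question also asks about updating values. A minimal sketch of a fetch, update, and write-back cycle with the same json module could look like this; the file name metadata.json and its contents are made up, and the file is created in a temporary directory only so the example runs on its own.

```python
import json
import os
import tempfile

# Hypothetical metadata file, created here only so the sketch is runnable.
path = os.path.join(tempfile.mkdtemp(), "metadata.json")
with open(path, "w") as json_file:
    json.dump({"Marsoon": [34, ["foo", "bar", "bloh"]]}, json_file)

# Fetch a value.
with open(path) as json_file:
    data = json.load(json_file)
print(data["Marsoon"][0])

# Update it in memory, then write the whole document back.
data["Marsoon"][0] = 35
with open(path, "w") as json_file:
    json.dump(data, json_file, indent=2)

# Read it again to confirm the change was stored.
with open(path) as json_file:
    reloaded = json.load(json_file)
print(reloaded["Marsoon"][0])
```

Note that the file is always rewritten as a whole; JSON has no notion of updating a single value in place.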

Efficient Portable Database for Hierarchical Dataset - Json, Sqlite or?

I need to make a file that contains a hierarchical dataset. The dataset in question is a file-system listing (directory names, file name/sizes in each directory, sub-directories, ...).
My first instinct was to use Json and flatten the hierarchy using paths so the parser doesn't have to recurse so much. As seen in the example below, each entry is a path ("/", "/child01", "/child01/gchild01",...) and its files.
{
"entries":
[
{
"path":"/",
"files":
[
{"name":"File1", "size":1024},
{"name":"File2", "size":1024}
]
},
{
"path":"/child01",
"files":
[
{"name":"File1", "size":1024},
{"name":"File2", "size":1024}
]
},
{
"path":"/child01/gchild01",
"files":
[
{"name":"File1", "size":1024},
{"name":"File2", "size":1024}
]
},
{
"path":"/child02",
"files":
[
{"name":"File1", "size":1024},
{"name":"File2", "size":1024}
]
}
]
}
Then I thought that repeating the keys over and over ("name", "size") for each file kind of sucks. So I found this article about how to use Json as if it were a database - http://peter.michaux.ca/articles/json-db-a-compressed-json-format
Using that technique I'd have a Json table like "Entry" with columns "Id", "ParentId", "EntryType", "Name", "FileSize" where "EntryType" would be 0 for Directory and 1 for File.
So, at this point, I'm wondering if sqlite would be a better choice. I'm thinking that the file size would be a LOT smaller than a Json file, but it might only be negligible if I use Json-DB-compressed format from the article. Besides size, are there any other advantages that you can think of?
I think a JavaScript object as the data source, loaded as a file stream into the browser and then used in JavaScript logic there, would take the least time and perform well, but only up to a limited hierarchy size.
Also, keeping the hierarchy only as a JSON file and not storing it anywhere else badly limits your data source's use in your project to client-side technologies, or forces conversions to other technologies.
If you are building a pure JavaScript application (HTML, JS, and CSS only), then you could keep it as a JSON object alone and limit your hierarchy sizes. You could split bigger hierarchies into multiple files of linked JSON objects.
If you will have server-side code such as PHP in your project, then, considering manageability of code and scaling, you should ideally store the data in a SQLite DB and create your JSON hierarchies for a limited number of levels at runtime, as Ajax loads from your page.
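As a rough sketch of that approach, here it is in Python with the standard sqlite3 module rather than PHP. The Entry table and its columns follow the layout described in the question; the sample rows and the helper children_as_json are assumptions for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for a portable database
conn.execute(
    """CREATE TABLE Entry (
           Id INTEGER PRIMARY KEY,
           ParentId INTEGER REFERENCES Entry(Id),
           EntryType INTEGER NOT NULL,   -- 0 = directory, 1 = file
           Name TEXT NOT NULL,
           FileSize INTEGER
       )"""
)
conn.executemany(
    "INSERT INTO Entry (Id, ParentId, EntryType, Name, FileSize) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (1, None, 0, "/", None),
        (2, 1, 1, "File1", 1024),
        (3, 1, 1, "File2", 1024),
        (4, 1, 0, "child01", None),
        (5, 4, 1, "File1", 1024),
    ],
)


def children_as_json(parent_id):
    """Return one level of the hierarchy as a JSON string."""
    rows = conn.execute(
        "SELECT Name, EntryType, FileSize FROM Entry "
        "WHERE ParentId = ? ORDER BY Id",
        (parent_id,),
    ).fetchall()
    return json.dumps(
        [{"name": n, "type": t, "size": s} for n, t, s in rows]
    )


level = children_as_json(1)
print(level)
```

Each request fetches a single level, so the size of the JSON sent to the page stays bounded no matter how deep the stored hierarchy is.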
If this is the only data your application stores then you can do something really simple like just store the data in an easy to parse/read text file like this:
File1:1024
File2:1024
child01
  File1:1024
  File2:1024
  gchild01
    File1:1024
    File2:1024
child02
  File1:1024
  File2:1024
Files get File:Size and directories get just their name. Indentation gives structure. For something slightly more standard but just as easy to read, use yaml.
http://www.yaml.org/
Both can benefit from decreased file size (but decreased user readability) by gzipping the file.
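To show how little code the plain-text format needs, here is a minimal parser sketch in Python. It assumes two spaces of indentation per level and that only file lines contain a colon; the function name parse_listing and the shape of the resulting dictionaries are made up for this example.

```python
sample = """\
File1:1024
child01
  File1:1024
  gchild01
    File2:2048
child02
"""


def parse_listing(text, indent=2):
    """Parse an indentation-structured listing into nested dictionaries."""
    root = {"name": "/", "files": {}, "dirs": []}
    stack = [root]  # stack[d] is the directory that owns lines at depth d
    for line in text.splitlines():
        if not line.strip():
            continue
        depth = (len(line) - len(line.lstrip(" "))) // indent
        del stack[depth + 1:]  # leave any directories deeper than this line
        entry = line.strip()
        if ":" in entry:
            name, size = entry.rsplit(":", 1)
            stack[depth]["files"][name] = int(size)
        else:
            directory = {"name": entry, "files": {}, "dirs": []}
            stack[depth]["dirs"].append(directory)
            stack.append(directory)
    return root


tree = parse_listing(sample)
print(tree["dirs"][0]["dirs"][0]["files"])
```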
And if you have more data to store, then use SQLite. SQLite is great.
Don't use JSON for data persistence. It's wasteful.