I have JSON in the following format:
{
"nm_questionario":{"isEmpty":"MSGE1 - Nome do Questionário"},
"ds_questionario":{"isEmpty":"MSGE1 - Descrição do Questionário"},
"dt_inicio_vigencia":{"isEmpty":"MSGE1 - Data de Vigência"}
}
How can I print the names of the properties using NiFi? I want to retrieve the names nm_questionario, dt_inicio_vigencia and ds_questionario. I have tried many things already, but to no avail.
You can use a LogAttribute processor with Log Payload set to true to print the full contents to your $NIFI_HOME/logs/nifi-app.log file. You can also use a PutFile processor to write the contents to a flat file on disk. If you need to do something programmatic with those values, you can use the EvaluateJsonPath processor to extract various pieces of content into named attributes, which you can then manage with UpdateAttribute or LogAttribute again.
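If what you actually need is the property names themselves (nm_questionario, ds_questionario, dt_inicio_vigencia), note that EvaluateJsonPath extracts values by JSON path rather than key names. One option, sketched here as an ExecuteScript processor with a Jython script, is to read the keys and store them in an attribute; the attribute name json.keys is just an example, and this assumes a single JSON object per flowfile:

import json
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import InputStreamCallback

class ReadKeys(InputStreamCallback):
    def __init__(self):
        self.keys = []
    def process(self, inputStream):
        text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
        self.keys = list(json.loads(text).keys())

flowFile = session.get()
if flowFile is not None:
    callback = ReadKeys()
    session.read(flowFile, callback)
    # store the top-level property names as a comma-separated attribute
    flowFile = session.putAttribute(flowFile, "json.keys", ",".join(callback.keys))
    session.transfer(flowFile, REL_SUCCESS)

The resulting json.keys attribute can then be logged with LogAttribute as described above.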
Background: I want to store a dict object in JSON format that has, say, 2 entries:
(1) Some object that describes the data in (2). This is small data: mostly definitions, controlling parameters, and other things (call it metadata) that one would like to read before using the actual data in (2). In short, I want good human readability for this portion of the file.
(2) The data itself, a large chunk that only needs to be machine readable (no need for a human to pore over it when opening the file).
Problem: How can I specify a custom indent, say 4, for (1) and None for (2)? If I use something like json.dump(data, trig_file, indent=4) where data = {'meta_data': small_description, 'actual_data': big_chunk}, the large data also gets a lot of whitespace, making the file large.
Assuming you can append JSON to a file:
Write {"meta_data":\n to the file.
Append the JSON for small_description, formatted appropriately, to the file.
Append ,\n"actual_data":\n to the file.
Append the JSON for big_chunk, formatted appropriately, to the file.
Append \n} to the file.
The idea is to do the JSON formatting of the outer "container" object by hand, using your JSON formatter as appropriate for each of the contained objects.
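A minimal sketch of those steps in Python (the values of small_description and big_chunk here are placeholders):

import json

# placeholder data standing in for the real objects
small_description = {"version": 2, "units": "seconds", "notes": "human-readable metadata"}
big_chunk = list(range(10000))

with open("output.json", "w") as trig_file:
    trig_file.write('{"meta_data":\n')
    json.dump(small_description, trig_file, indent=4)  # pretty-printed for humans
    trig_file.write(',\n"actual_data":\n')
    json.dump(big_chunk, trig_file)                     # compact: indent=None
    trig_file.write('\n}')

The result is still one valid JSON document; only the two inner values are formatted differently.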
Consider a different file format, interleaving keys and values as distinct documents concatenated together within a single file:
{"next_item": "meta_data"}
{
"description": "human-readable content goes here",
"split over": "several lines"
}
{"next_item": "actual_data"}
["big","machine-readable","unformatted","content","here","....."]
That way you can pass any indent parameters you want to each write, and you aren't doing any serialization by hand.
See How do I use the 'json' module to read in one JSON object at a time? for how one would read a file in this format. One of its answers wisely suggests the ijson library, which accepts a multiple_values=True argument.
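A rough sketch of both the writing and reading sides, assuming the third-party ijson library is installed (pip install ijson):

import json
import ijson

# writing: one dump call per document, each with its own formatting
with open("mixed.json", "w") as f:
    json.dump({"next_item": "meta_data"}, f)
    f.write("\n")
    json.dump({"description": "human-readable content", "split over": "several lines"}, f, indent=4)
    f.write("\n")
    json.dump({"next_item": "actual_data"}, f)
    f.write("\n")
    json.dump(["big", "machine-readable", "unformatted", "content"], f)
    f.write("\n")

# reading: ijson yields each concatenated top-level document in turn
with open("mixed.json", "rb") as f:
    for doc in ijson.items(f, "", multiple_values=True):
        print(doc)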
I'm trying to understand the code for reading JSON files in Synapse Analytics. Here's the code provided by the Microsoft documentation:
Query JSON files using serverless SQL pool in Azure Synapse Analytics
select top 10 *
from openrowset(
bulk 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.jsonl',
format = 'csv',
fieldterminator ='0x0b',
fieldquote = '0x0b'
) with (doc nvarchar(max)) as rows
go
I wonder why the format = 'csv'. Is it trying to convert JSON to CSV to flatten the file?
Why they didn't just read the file as a SINGLE_CLOB I don't know
When you use SINGLE_CLOB, the entire file is imported as one value, and the content of the file in doc is not well formed as a single JSON document. Using SINGLE_CLOB would mean more work after the openrowset before we could use the content as JSON (since it is not valid JSON, we would need to parse the value). It can be done, but would probably require more work.
The format of the file is multiple JSON-like strings, each on a separate line: "line-delimited JSON", as the document calls it.
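To picture the format, here is a small Python sketch (the local file name is only illustrative); each line parses on its own:

import json

# each line of a .jsonl file is a complete JSON document
with open("ecdc_cases.jsonl") as f:
    for line in f:
        record = json.loads(line)
        print(record)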
By the way, if you check the history of the document on GitHub, you will find that originally this was not the case. As far as I remember, the file originally contained a single JSON document with an array of objects (wrapped with [] once loaded). Someone named "Ronen Ariely" in fact found this issue in the document, which is why you can see my name in the list of authors of the document :-)
I wonder why the format = 'csv'. Is it trying to convert json to csv to flatten the hierarchy?
(1) JSON is not a data type in SQL Server. There is no data type named JSON. What we have in SQL Server are tools, such as functions, that work on text and provide support for strings in a JSON-like format. Therefore, we do not CONVERT to JSON or from JSON.
(2) The format parameter has nothing to do with JSON. It specifies that the content of the file is comma-separated values. You can (and should) use it whenever your file is well formed as a comma-separated values file (commonly known as a CSV file).
In this specific sample from the document, the values in the CSV file are strings, each of which is in valid JSON format. Only after the file has been read using openrowset do we start to parse the content of the text as JSON.
Notice that only after the heading "Parse JSON documents" does the document start to speak about parsing the text as JSON.
Using Apache NiFi, I'd like to process a zip that contains a category.json file and a number of data files, as illustrated.
somefile.zip
├──category.json
├──datafile-1
├──datafile-2
├──...
├──datafile-n
Example category.json
{
"category": "history",
"rating" : 5
}
What I'd like to do is unpack the files and apply the category.json data as attributes to each datafile.
What would be the best way to handle this problem?
Maybe not the best one, but here is a way to do it:
1) Unzip.
2) Use RouteOnAttribute based on the category.json filename.
3) Retrieve the category as an attribute on the category.json flowfile.
4) Zip all the files again, keeping the attributes.
5) Unzip again, keeping the attributes; all your flowfiles will have the category attribute.
I'd recommend starting with a combination of ListFile and FetchFile (or GetFile on its own) to retrieve the archive, CompressContent to extract the component files, RouteOnAttribute using the flowfile filename attribute to separate out the flowfile containing category.json, and the EvaluateJsonPath processor to read the JSON content of that flowfile and populate selected values into attributes.
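For the EvaluateJsonPath step, the configuration might look roughly like this, using the fields from the example category.json, with Destination set to flowfile-attribute (the attribute names on the left are arbitrary):
category = $.category
rating = $.rating
The category.json flowfile then carries category and rating as attributes.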
From there, it's unclear if your question is how to update the NiFi flowfile attributes for each flowfile containing one of the data files from that archive, or apply the extracted JSON to the data files on disk somewhere.
Assuming the former, you could write the extracted JSON into a variable or parameter (using ExecuteScript to do so) and then use UpdateAttribute to apply those attributes to the other flowfiles produced by the CompressContent processor.
After writing a few files for saving in my JSON file in Godot, I saved the information in a variable called LData, and it is working. LData looks like this:
{
"ingredients":[
"[KinematicBody2D:1370]"
],
"collected":[
{
"iname":"Pineapple",
"collected":true
},{
"iname":"Banana",
"collected":false
}
]
}
What does it mean when the file says KinematicBody2D:1370? I understand that it is saving the node in the file - or is it just saving a string? Is it saving the node's properties as well?
I then tried retrieving the data - a variable that is assigned to the saved KinematicBody2D.
Code:
for ingredient in LData.ingredients:
    print(ingredient.iname)
Error:
Invalid get index name 'iname' (on base: 'String')
I am assuming that the data is stored as a String and that I need to add some code to get the exact node it saved. Using get_node also throws an error.
Code:
for ingredient in LData.ingredients:
    print(get_node(ingredient).iname)
Error:
Invalid get index 'iname' (on base: 'null instance')
What information exactly is it storing when it says [KinematicBody2D:1370]? How do I access the variable iname and any other variables - variables that are assigned to the node when the game is loaded and are not changed during the entire game?
[KinematicBody2D:1370] is just the string representation of a Node, which comes from Object.to_string:
Returns a String representing the object. If not overridden, defaults to "[ClassName:RID]".
If you truly want to serialize an entire Object, you could use Marshalls.variant_to_base64 and put that string in your JSON file. However, this will likely bloat your JSON file and contain much more information than you actually need to save a game. Do you really need to save an entire KinematicBody, or can you figure out the few properties that need to be saved (position, type of object, etc.) and reconstruct the rest at runtime?
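For example, the save data could hold just the fields needed to rebuild each ingredient on load, rather than the node itself (the field names here are only illustrative):
{
"ingredients":[
{"iname":"Pineapple", "position":[120, 80]},
{"iname":"Banana", "position":[200, 40]}
]
}
With that shape, each entry is a dictionary rather than a string, so print(ingredient.iname) in the existing loop would work.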
You can also save objects as Resources, which is more powerful and flexible than a JSON file, but tends to be better suited to game assets than save games. However, you could read the Resource docs and see if saving Resources seems like a more appropriate solution to you.
I have a simple pipeline using Apache NiFi, and I want to publish some messages to a Kafka topic using the existing Kafka publisher processor.
The problem is how to specify the Kafka key using the Apache NiFi expression language.
I tried something like ${message:jsonPath('$.key')}, but, of course, I got an error because the object message does not exist.
I also tried to use the filename object, which is something like a default object name for input messages, but it didn't help.
With another Kafka publisher processor this is possible by setting the Message Key Field property, but what about the PublishKafka processor?
NiFi expression language can only reference flow file attributes, and cannot directly reference the content (this is done on purpose).
So if you want to use the value of a field from your json document as the key, then you need to first use another processor like EvaluateJsonPath to extract the value of that field into a flow file attribute.
Let's say you have a field "foo" in your JSON document. You might use EvaluateJsonPath with Destination set to "flowfile-attribute" and then add a dynamic property like:
foo = $.foo
Then in PublishKafka set the key property to ${foo}.
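For example, if a flowfile's content is {"foo": "order-123"} (a made-up value), EvaluateJsonPath adds an attribute foo with the value order-123, and PublishKafka then uses order-123 as the message key for that flowfile.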
Keep in mind this only makes sense if you have a single JSON document per flow file. If you have multiple documents, it is unclear what the key should be, since you can only have one "foo" attribute for the flow file but many "foo" fields in the content of the flow file.