NiFi: Extract Content of FlowFile and Add that Content to the Attributes - json

I am generating random data from the following JSON/AVRO schema:
{
  "type" : "record",
  "namespace" : "test",
  "name" : "metro_data",
  "fields" : [
    {
      "name" : "PersonID",
      "type" : "int"
    },
    {
      "name" : "TripStartStation",
      "type" : {
        "type" : "enum",
        "name" : "StartStation",
        "symbols" : ["WIEHLE_RESTON_EAST", "SPRING_HILL", "GREENSBORO"]
      }
    },
    {
      "name" : "TripEndStation",
      "type" : {
        "type" : "enum",
        "name" : "EndStation",
        "symbols" : ["WIEHLE_RESTON_EAST", "SPRING_HILL", "GREENSBORO"]
      }
    }
  ]
}
The above schema generates this, for example:
[ {
  "PersonID" : -1089196095,
  "TripStartStation" : "WIEHLE_RESTON_EAST",
  "TripEndStation" : "SPRING_HILL"
} ]
I want to take the PersonID number from that output and add it to the FlowFile's attributes. E.g., the blank in this screenshot needs to hold the actual PersonID number generated by the flow:
I have tried to use EvaluateJsonPath with the following configuration, and that is how I end up with an empty string set under PersonID:
Is my next processor UpdateAttribute? I'm not sure how to pull that content. Thanks!

You have an array of JSON messages (e.g. [...]), so you first need to split the array into individual flowfiles using the SplitJson processor with a JsonPath expression of $.*
Then use an EvaluateJsonPath processor to extract the PersonID value into an attribute.
Flow:
--> SplitJson --> EvaluateJsonPath --> other processors
For more details, refer to this link regarding the same issue.
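As a sketch, the EvaluateJsonPath processor after the split could be configured as follows (the dynamic property name becomes the attribute name; the exact property values here are assumptions based on the schema above):

```
EvaluateJsonPath
  Destination              : flowfile-attribute
  Return Type              : auto-detect
  PersonID (dynamic prop.) : $.PersonID
```

After the split, each flowfile holds a single JSON object, so $.PersonID resolves against that object and its value lands in the PersonID attribute.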

Related

JSON example of nullable complex type in AVRO

I created a avro schema to validate the messages that I publish on a kafka topic. These messages consist of complex types that contain other complex types. Not all fields are required, so I figured out I need to use union types in avro for that.
So basically at some point my avro schema looks like this:
"type" : "record",
"name" : "RecordA",
"fields" : [ {
"name" : "fieldX",
"type" : [ "null", {
"type" : "record",
"name" : "RecordB",
"fields" : [ {
"name" : "gemeente",
"type" : "string"
}, {
"name" : "nummer",
"type" : "string"
}, {
"name" : "postcode",
"type" : "string"
}, {
"name" : "straat",
"type" : "string"
} ]
} ]
Can someone give me an example of how a json message that adheres to this schema would look like? All the examples I found refer to simple union types that consist of primitive values.
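For what it's worth, Avro's JSON encoding wraps a non-null union value in a single-key object whose key is the type name, so a message for this schema would look like one of the following (field values are made up for illustration):

```
{
  "fieldX" : {
    "RecordB" : {
      "gemeente" : "Gent",
      "nummer" : "12",
      "postcode" : "9000",
      "straat" : "Veldstraat"
    }
  }
}
```

or, when the null branch of the union is used:

```
{ "fieldX" : null }
```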

Extract specific key and value in Logstash

I'm using Logstash to collect data from MySQL. The JSON result looks like this:
"_source" : {
"username" : "room_test#localhost",
"timestamp" : 1481785195811703,
"peer" : "user#localhost/1596084304715518942270426",
"bare_peer" : "user#localhost",
"xml" : "<message to='room_test1481784717020#localhost' type='groupchat' from='user#localhost'><body>msg</body><jid>456-345</jid></message>",
"txt" : "msg",
"id" : 6452,
"kind" : "groupchat",
"nick" : "user",
"created_at" : "2016-12-15T06:59:55.000Z",
"#version" : "1",
"#timestamp" : "2017-02-25T12:17:52.043Z"
}
I need to extract the value of the "jid" element inside the "xml" key as a separate field, like this: "jid": "456-345".
Thank you
Grok can handle this.
grok {
  match => {
    "xml" => "<jid>(?<jid>[-0-9]{4,9})</jid>"
  }
}
This will create a jid field equal to the value you supplied. The value in the JID tag in the XML can be between 4 and 9 characters long. Adjust as you need to.
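The capture itself can be spot-checked outside Logstash. A quick sketch with Python's re module, using the sample document above (Python spells named groups (?P<name>...) where grok accepts (?<name>...)):

```python
import re

# Sample "xml" value from the document above
xml = ("<message to='room_test1481784717020@localhost' type='groupchat' "
       "from='user@localhost'><body>msg</body><jid>456-345</jid></message>")

# Same pattern as the grok filter, with Python's named-group syntax
match = re.search(r"<jid>(?P<jid>[-0-9]{4,9})</jid>", xml)
print(match.group("jid"))  # -> 456-345
```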

Update same field in multiple documents with data from json

I have a MongoDB collection that looks like this:
[
  {
    "status" : 0,
    "name" : "Yaknow",
    "email" : "yaknow#not.this",
    "_id" : "5875a42ea469f40c684de385"
  },
  {
    "status" : 1,
    "name" : "johnk",
    "email" : "johnk#not#this",
    "_id" : "586e31c6ce07af6f891f80fd"
  }
]
Meanwhile, all the emails have changed, and I got a JSON document with the new ones:
[
  {
    "email" : "yaknow#gmai.new",
    "_id" : "5875a42ea469f40c684de385"
  },
  {
    "email" : "johnk#gmail.new",
    "_id" : "586e31c6ce07af6f891f80fd"
  }
]
How do I update all the emails?
There is no operator in MongoDB that modifies a string value by replacing part of it. You should fetch the documents, then for each document prepare the updated value locally and update it:
db.collection.find({}).forEach(function(doc){
  var newEmail = doc.email.substr(0, doc.email.indexOf('#')) + "#gmail.new";
  db.collection.update({_id: doc._id}, {$set: {email: newEmail}});
});
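Since the question already has the new emails keyed by _id, another option (a sketch, assuming the mapping JSON has been loaded into newEmails) is to build one bulk operation per entry instead of rewriting the domain generically:

```javascript
// New-emails JSON from the question, assumed loaded into memory
const newEmails = [
  { email: "yaknow#gmai.new", _id: "5875a42ea469f40c684de385" },
  { email: "johnk#gmail.new", _id: "586e31c6ce07af6f891f80fd" }
];

// One updateOne operation per entry, matched on _id
const ops = newEmails.map(doc => ({
  updateOne: {
    filter: { _id: doc._id },
    update: { $set: { email: doc.email } }
  }
}));

// db.collection.bulkWrite(ops) would then apply every update in one command
console.log(ops.length); // 2
```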

JSON schema for an object whose value is an array of objects

I am writing software that reads JSON data from a file. The file contains "person" - a property whose value is an array of objects. I plan to use a JSON Schema validation library to validate the contents instead of writing the code myself. What is the correct schema, conforming to JSON Schema Draft-4, that represents the data below?
{
  "person" : [
    {
      "name" : "aaa",
      "age" : 10
    },
    {
      "name" : "ddd",
      "age" : 11
    },
    {
      "name" : "ccc",
      "age" : 12
    }
  ]
}
The schema that I wrote down is given below. I am not sure whether it is correct, or whether there is another form?
{
  "person" : {
    "type" : "object",
    "properties" : {
      "type" : "array",
      "items" : {
        "type" : "object",
        "properties" : {
          "name" : {"type" : "string"},
          "age" : {"type" : "integer"}
        }
      }
    }
  }
}
You actually only have one line in the wrong place, but that one line breaks the whole schema. "person" is a property of the object and thus must be under the properties keyword. By putting "person" at the top, JSON Schema interprets it as a keyword instead of a property name. Since there is no person keyword, JSON Schema ignores it and everything below it. Therefore, it is the same as validating against the empty schema {} which places no restrictions on what a JSON document can contain. Any valid JSON is valid against the empty schema.
{
  "type" : "object",
  "properties" : {
    "person" : {
      "type" : "array",
      "items" : {
        "type" : "object",
        "properties" : {
          "name" : {"type" : "string"},
          "age" : {"type" : "integer"}
        }
      }
    }
  }
}
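As a sanity check of what the corrected schema enforces, here is a stdlib-only sketch (not a real Draft-4 validator - a library such as jsonschema would do this properly) that the sample document matches the intended constraints:

```python
import json

# Sample document from the question
doc = json.loads("""
{ "person" : [ {"name" : "aaa", "age" : 10},
               {"name" : "ddd", "age" : 11},
               {"name" : "ccc", "age" : 12} ] }
""")

# What the corrected schema asserts: "person" is an array of objects
# whose "name" is a string and whose "age" is an integer
ok = isinstance(doc.get("person"), list) and all(
    isinstance(p, dict)
    and isinstance(p.get("name"), str)
    and isinstance(p.get("age"), int)
    for p in doc["person"]
)
print(ok)  # -> True
```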
By the way, there are several online JSON Schema testing tools out there that can help you when crafting your schemas. This one is my go-to: http://jsonschemalint.com/draft4/#
Also, here is a great JSON Schema reference that might help you out as well: https://spacetelescope.github.io/understanding-json-schema/

How to add JSON schema optional Enum item with default value?

I need to add an optional property to a JSON schema. This property is of enum type, and I need to set a default value for the case where the user does not specify this field.
// schema
"properties" : {
  "Param" : {
    "type" : "string",
    "enum" : [ " p1", "p2" ],
    "optional" : true,
    "default" : "p2",
    "required" : true
  }
}
If the user does not specify the "Param" field, it should be recognized as "p2".
Add null to the enum array
More: https://json-schema.org/understanding-json-schema/reference/generic.html#enumerated-values
"properties" : {
"Param" : {
"type" : "string",
"enum" : [ " p1", "p2", null ], // <--
"default" : "p2", // <--
"required" : true
}
}
As you have put in your example, "default" is a valid JSON Schema keyword, but its use is up to the schema consumer.
Take into account that JSON Schema is concerned with data-structure definition and validation. In fact, this keyword was added after much discussion because it is so common to want to give clients a hint of what the default value should be in case they do not set one. But, again, it is up to the client to make use of this value or not.
Another way to approach your particular case would be to use "oneOf", splitting the enum values.
"required" : ["Param"],
"oneOf" : [{
"properties" : {
"Param" : {
"enum" : ["p2"]
}
}
}, {
"properties" : {
"Param" : {
"enum" : ["p1", "p3"]
}
}
}
]
In this case you are telling the client: at the least, you must send me "Param", and "p2" is the value singled out in its own branch.
Finally, you could also add a pre-process step in your server side where you take all missing properties with default value, and add them to json message before validation.
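That pre-processing step can be sketched in a few lines (a minimal illustration, assuming only top-level properties with a "default" keyword need filling):

```python
# Schema fragment from the question, with the default declared on "Param"
schema = {
    "properties": {
        "Param": {"type": "string", "enum": ["p1", "p2"], "default": "p2"}
    }
}

def apply_defaults(doc, schema):
    """Copy each missing top-level property's "default" into the document."""
    for name, subschema in schema.get("properties", {}).items():
        if name not in doc and "default" in subschema:
            doc[name] = subschema["default"]
    return doc

print(apply_defaults({}, schema))               # -> {'Param': 'p2'}
print(apply_defaults({"Param": "p1"}, schema))  # -> {'Param': 'p1'}
```

After this step the document always carries an explicit "Param", so validation and downstream code never have to special-case the missing field.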
The solution is not in the schema but in the parser/compiler: unspecified fields should take the value 0 when transferred to a variable.
In this case it would be:
"enum" : [ "p2", "p1" ],
and the equivalent in C would be:
enum {
  p2 = 0,
  p1 = 1
};
Hope this helps.
"properties" : {
"Param" : {
"type" : "string",
"enum" : ["p1", "p2"],
"default" : "p2"
}
},
"required" : ["Param"]