JOLT transformation remove all fields except one - json

I want to remove all fields from a json except the one named foo. I used transformation spec as given below:
[
  {
    "operation": "remove",
    "spec": {
      "^(?!foo).*$": ""
    }
  }
]
I tried executing this on http://jolt-demo.appspot.com/#inception but it doesn't work and it outputs the input json, untransformed. Am I doing something wrong?

Yeah so, "remove" does not support any "regex" matching other than "*", so "^(?!foo).*$" is not going to work.
I think you are better off using "shift" to match "foo" and copy it across to the output. Anything not matched by the "shift" spec does not get copied across to the output.
Spec
[
  {
    "operation": "shift",
    "spec": {
      // matches top-level key "foo" in the input, and copies the
      // value at that location to the output map with key "foo".
      "foo": "foo"
    }
  }
]
Shift copies data from the input to a new output; all the other operations (default, remove, cardinality, etc.) modify the input.
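The effect of that shift spec can be sketched in plain Python (the input data here is hypothetical, just to show the copy-not-mutate behavior):

```python
# Sketch of what the "shift" spec does: copy only the "foo" key
# from the input map into a brand-new output map.
input_doc = {"foo": "bar", "other": 1, "more": [1, 2]}

# Anything not matched by the spec simply never reaches the output.
output_doc = {"foo": input_doc["foo"]} if "foo" in input_doc else {}

print(output_doc)  # {'foo': 'bar'}
```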

Related

JSONPath how to filter out objects that contain lists of empty objects?

I'm trying to use JSONPath (the Jayway implementation) to get objects that contain a values array with at least one non-empty value. This is simplified data I get from some service:
[
  {
    "feature": "city",
    "values": [
      {
        "value": "cba"
      },
      {},
      {
        "value": "abc"
      }
    ]
  },
  {
    "feature": "country",
    "values": [
      {},
      {}
    ]
  }
]
I've tried dozens of approaches.
Get the objects with the value attribute:
$.[?(@..value)]
but this filters nothing; I can still see both objects instead of only one.
I've also tried to filter by the length of the filtered values array, but it doesn't work either and returns an empty list:
$.[?(@.values[?(@.value)].length() > 1)]
I'm trying to work it out using online evaluator from jsonpath.herokuapp.com
I've been searching a lot and it seems that getting the length of the filtered array isn't possible, but the first approach looks like something that should work. Any ideas?
With Jayway-JSONPath you can use the empty Filter Operator
Filter the object
$.[?(@..value empty false)]
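The same filter can be sketched in plain Python, which makes the intent explicit: keep only objects whose values array holds at least one non-empty entry (an empty dict is falsy in Python):

```python
# Simplified copy of the data from the question.
data = [
    {
        "feature": "city",
        "values": [{"value": "cba"}, {}, {"value": "abc"}],
    },
    {
        "feature": "country",
        "values": [{}, {}],
    },
]

# Keep only objects whose "values" array contains at least one
# non-empty entry ({} is falsy, so any() skips it).
result = [obj for obj in data if any(obj.get("values", []))]

print([obj["feature"] for obj in result])  # ['city']
```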

Apache NiFi transform JSON field to timestamp

I have JSON with a unix-timestamp field, and I'd like to extract the year from it.
So, for example:
{
  "eventno": "event1",
  "unixtimestamp": 1589379890
}
Expected result:
{
  "eventno": "event1",
  "unixtime": 2020
}
I tried to do this using JoltTransformJSON and the NiFi Expression Language, but my attempts failed. One of them:
[
  {
    "operation": "shift",
    "spec": {
      "unixtime": "${unixtimestamp:multiply(1000):format('yyyy', 'GMT')}"
    }
  }
]
How can I transform it?
@GrigorySkvortsov
The Expression Language syntax should be:
${attribute:expressionLanguage():functions()}
If what you have above isn't just a typo, retest after removing the } after unixtimestamp.
Unit-test outside of the Jolt transform with an UpdateAttribute processor to dial in the correct Expression Language chain. (The original answer included screenshots of an example UpdateAttribute configuration and the four resulting attribute values.)
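Outside of NiFi, the conversion itself is a one-liner; a Python sketch using the timestamp from the question (note the value is in seconds, so no multiply-by-1000 step is needed here):

```python
from datetime import datetime, timezone

# The unix timestamp from the question (seconds, not milliseconds).
ts = 1589379890

# Convert to a UTC datetime and pull out the year.
year = datetime.fromtimestamp(ts, tz=timezone.utc).year

print(year)  # 2020
```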

is there any way to write a null json transformation (passes through orig document) using Jolt?

You know how XSLT and other XML processing languages support the "null transformation", which passes a document through unmodified?
I would like to do the same thing for Jolt (a very nice JSON transformation library used in Apache Camel and other places).
I could use JOLT's "insert default" feature and stick some harmless JSON tag and value at the top level of the document, which is almost what I want. But I couldn't figure out how to pass the document through JOLT and leave it untouched.
Why do I want to do this, you ask? We are developing a streaming data pipeline and I have to validate incoming strings as valid JSON. Jolt does that for me for free, but in some cases I don't want to monkey with the document. So, I want to use JOLT as a step in the pipeline, but (in some cases) have it do nothing to the input JSON doc.
Another option is to create a custom transformer.
package com.example;

import com.bazaarvoice.jolt.Transform;

// Identity transform: returns the hydrated JSON input unchanged.
public class NullTransform implements Transform {
    @Override
    public Object transform(Object input) {
        return input;
    }
}
then reference it from the Jolt chainr spec as below:
[
  {
    "operation": "com.example.NullTransform"
  }
]
You'll still incur the deserialization/serialization overhead, but no other code is run.
OOTB Jolt contains 5 "operations" that can be applied to the input hydrated JSON. 4 of those (default, remove, sort, cardinality) are mutation operations: they modify the supplied hydrated JSON. If you gave those 4 an empty "spec", they would do nothing, and your data would "pass thru".
The "shift" operation does not mutate the input it is given. Instead it "copies" data from the "input" to a new "output" map/list. If you don't give "shift" a spec, then it copies nothing across.
Thus, from your question, it sounds like you are talking about "shift". With shift you have to explicitly pass thru all the things you "want to keep".
Depending on your data this may be terrible or easy, as you can have shift copy very large chunks of data across.
Example, for the "inception" example on the jolt demo site. http://jolt-demo.appspot.com/#inception
This spec basically passes thru the input, copying the whole nested map that is "rating" thru to the output.
[
  {
    "operation": "shift",
    "spec": {
      "rating": "rating"
    }
  }
]
It can be generalized with wildcards to :
Spec
[
  {
    "operation": "shift",
    "spec": {
      "*": "&"
    }
  }
]
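Outside of Jolt, the validate-but-don't-modify step amounts to a parse/serialize round trip; a minimal Python sketch (the field names below are hypothetical):

```python
import json

# A hypothetical incoming string from the pipeline.
raw = '{"eventno": "event1", "rating": {"primary": 3}}'

# json.loads raises json.JSONDecodeError on malformed input,
# which is the "validate" step for free; round-tripping through
# dumps leaves the document's content untouched.
doc = json.loads(raw)
passed_through = json.dumps(doc)
```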

How to modify a nested object with jq

Given this
{
  "some": "property",
  "nested": {
    "hello": "world"
  }
}
I'd like to get this result with jq
{
  "some": "property",
  "nested": {
    "hello": "world",
    "freshly": "added"
  }
}
So how can I add the "freshly" field? I don't know how many properties are at root level (and I want to keep them all); I only know the name of the nested object (here "nested"), the name of the property I'd like to add (here "freshly"), and its value.
Just assign the new value to the nested object.
.nested.freshly = "added"
Well I found out myself how to do it. If you have a better solution, you're more than welcome to give it here.
jq '.nested=(.nested + {"freshly": "added"})'
You can also do simply
.nested += {freshly: "added"}
Then you can add multiple nested keys at once
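The same in-place update is straightforward in plain Python, which may help if you are post-processing the JSON in code rather than with jq:

```python
doc = {"some": "property", "nested": {"hello": "world"}}

# Equivalent of jq's `.nested += {freshly: "added"}`: mutate the
# nested object in place, leaving all root-level keys untouched.
doc["nested"]["freshly"] = "added"

print(doc)
```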

What is the logical relationship between keywords in a JSON schema?

According to the specification (http://json-schema.org/schema) there is no mutual exclusion among schema keywords.
For example I could create the following schema:
{
  "properties": {
    "foo": {"type": "string"}
  },
  "items": [
    {"type": "integer"},
    {"type": "number"}
  ]
}
Would this schema validate against both objects and arrays?
If so, it would imply an "OR" relationship between keywords.
But if we consider the following schema:
{
  "anyOf": [
    {"type": "string"},
    {"type": "integer"}
  ],
  "not": {
    "type": "string",
    "maxLength": 5
  }
}
The most practical way to interpret it would be an "AND" relationship between anyOf and not keywords.
I could not find any indication in the draft v4 on how keywords logically interact. Can anyone point me to a documentation/standard that would answer this question?
Keywords are always an "AND" relationship. Data must satisfy all keywords from a schema.
The properties and items keywords don't specify the type of the object (you have to use type for that). Instead, they only have meaning for particular types, and are ignored otherwise. So properties actually means:
If the data is an object, then the following property definitions apply...
This means that {"properties":{...}} will match any string, because properties is ignored for values that aren't objects. And items actually means:
If the data is an array, then the following item definitions apply...
So the AND combination looks like:
(If the data is an object, then properties applies) AND (if the data is an array, then items applies)
As the spec clearly dictates, some keywords are only relevant for one particular type of JSON value, while others apply to all types.
So, for instance, properties only applies if the JSON value you validate is a JSON Object. On any JSON value which is NOT an object, it will not apply (another way to understand it is that if the JSON value to validate is not a JSON Object, validation against this keyword will always succeed).
Similarly, items will only apply if the JSON value is a JSON Array.
Now, some other keywords apply for all types; among these are enum, allOf, anyOf, oneOf, type. In each case, the validation rules are clearly defined in the specification.
In short: you should consider what type of value is expected. The easiest way to force a value to be of a given type in a schema is to use type, as in:
"type": "integer"
BUT this keyword will nevertheless be applied INDEPENDENTLY of all others in the validation process. So, this is a legal schema:
{
"type": "integer",
"minItems": 1
}
If an empty JSON Array is passed for validation, it will fail for both keywords:
type because the value is not an array;
minItems because the value is an array but it has zero elements, which is illegal for this particular keyword since it expects at least one element in the array.
Note that the result of validation is totally independent of the order in which you evaluate keywords. That is a fundamental property of JSON Schema. And it is pretty much a requirement that it be so, since the order of members in a JSON Object is irrelevant ({ "a": 1, "b": 2 } is the same JSON Object as { "b": 2, "a": 1 }).
And of course, if only ONE keyword causes validation to fail, the whole JSON value is invalid against the schema.
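The independent-keywords-ANDed-together model can be mimicked with a toy checker; this is only a sketch of the two keywords discussed above (use a real validator library such as jsonschema for actual work):

```python
# Toy validator for just two keywords, to illustrate that every
# keyword is checked independently and the results are ANDed.
def check(instance, schema):
    errors = []
    type_map = {"integer": int, "string": str, "array": list}
    # "type" applies to every instance, regardless of its kind.
    if "type" in schema and not isinstance(instance, type_map[schema["type"]]):
        errors.append("type")
    # "minItems" only applies when the instance is an array;
    # for any other kind of value it is ignored (i.e. it succeeds).
    if "minItems" in schema and isinstance(instance, list):
        if len(instance) < schema["minItems"]:
            errors.append("minItems")
    return errors

# The empty array from the example fails both keywords at once.
print(check([], {"type": "integer", "minItems": 1}))  # ['type', 'minItems']
```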