NIFI: get value from json - minify

I have a QueryCassandra processor which generates JSON like this:
{"results":[{"term":"term1"},{"term":"term2"}..]}
Now I want to extract all the term values into a single string, separated by some delimiter, e.g.:
term1,term2,term3
So I can pass this list as a string parameter to a Java main program which I've already set up.
(I only need the transformation, not the Java program execution.)
Thank you!

You can easily get those values with the following flow:
GetFile --> EvaluateJsonPath --> PutFile
In GetFile, specify the location of the JSON file.
In EvaluateJsonPath, configure the following properties:
Destination: flowfile-attribute
Return Type: json
input.term1: $.results[0].term   (to get the first term)
input.term2: $.results[1].term
After EvaluateJsonPath you have two attributes holding those values.
Result attributes:
input.term1: term1
input.term2: term2
The above configuration works for me.
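Note that this approach extracts each term into its own attribute. To build the single comma-separated string the question asks for, one option not covered in this answer (so treat it as a sketch) is to follow EvaluateJsonPath with an UpdateAttribute processor whose property concatenates them with NiFi Expression Language:
terms: ${input.term1},${input.term2}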

As a variant, use ExecuteScript with the Groovy language:
import groovy.json.*
import org.apache.nifi.processor.io.OutputStreamCallback

// get the input flow file
def flowFile = session.get()
if (!flowFile) return
// parse the json content into map/list objects
def content = session.read(flowFile).withStream { stream -> new JsonSlurper().parse(stream) }
// transform: collect all term values and join them with a comma
content = content.results.collect { it.term }.join(',')
// write the new content
flowFile = session.write(flowFile, { stream ->
    stream << content.getBytes("UTF-8")
} as OutputStreamCallback)
session.transfer(flowFile, REL_SUCCESS)
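With the sample content from the question, this script rewrites the flow file content from {"results":[{"term":"term1"},{"term":"term2"}]} to term1,term2, ready to be passed downstream as the parameter string.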

Related

Read a JSON file in a JSR223 sampler in JMeter and extract data

import com.jayway.jsonpath.JsonPath

def idCSV = new File('id.csv')
def index = ['fileOne.json', 'fileTwo.json']
def jsonString

index.each { file ->
    jsonString = ________
    def ids = JsonPath.read(jsonString, '$..id')
    ids.each { id ->
        idCSV << id << newLine
    }
}
How should I fill in jsonString = ____ so that I can read a JSON file into a string, then parse that string to extract the ids and some other information from it?
And I don't want to do it via an HTTP Request -> GET -> file.
Previously I extracted jsonString from the HTTP response and it worked well; now I want to do it this way.
Use JsonSlurper:
def jsonString = new groovy.json.JsonSlurper().parseText(new File("json.txt").text)
My expectation is that you're looking for the File.getText() function:
jsonString = file.text
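Putting the two suggestions together, a minimal sketch of the filled-in loop could look like this (the file names and id.csv come from the question; newLine is assumed to be the platform line separator):

import com.jayway.jsonpath.JsonPath

def idCSV = new File('id.csv')
def newLine = System.getProperty('line.separator')

['fileOne.json', 'fileTwo.json'].each { name ->
    def jsonString = new File(name).text          // read the whole file into a String
    def ids = JsonPath.read(jsonString, '$..id')  // extract all id values via JsonPath
    ids.each { id ->
        idCSV << id << newLine                    // append each id to the CSV file
    }
}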
I don't fully see why you need to store the values from the JSON in a CSV file, but there is an alternative way of achieving this which doesn't require scripting. Note that your approach will only work with 1 concurrent thread; if you add more users writing to the same file, you'll run into a race condition:
You can read the files from the folder into JMeter Variables via the Directory Listing Config
The file can be read using an HTTP Request sampler
The values can be fetched using a JSON Extractor; they will be automatically stored into JMeter Variables so you will be able to use them later on
If you need the values to be present in a file (although I wouldn't recommend this approach because it will cause massive disk IO and can potentially ruin your test) you can go for the Flexible File Writer

GCP Proto Datastore encode JsonProperty in base64

I store a blob of JSON in the Datastore using JsonProperty.
I don't know the structure of the JSON data.
I am using endpoints-proto-datastore in order to retrieve my data.
The problem is that the JSON property is encoded in base64 and I want a plain JSON object.
For the example, the json data will be:
{
first: 1,
second: 2
}
My code looks something like:
import endpoints
from google.appengine.ext import ndb
from protorpc import remote
from endpoints_proto_datastore.ndb import EndpointsModel


class Model(EndpointsModel):
    data = ndb.JsonProperty()


@endpoints.api(name='myapi', version='v1', description='My Sample API')
class DataEndpoint(remote.Service):

    @Model.method(path='mymodel2', http_method='POST',
                  name='mymodel.insert')
    def MyModelInsert(self, my_model):
        my_model.data = {"first": 1, "second": 2}
        my_model.put()
        return my_model

    @Model.method(path='mymodel/{entityKey}',
                  http_method='GET',
                  name='mymodel.get')
    def getMyModel(self, model):
        print(model.data)
        return model


API = endpoints.api_server([DataEndpoint])
When I call the API to get a model, I get:
POST /_ah/api/myapi/v1/mymodel2
{
"data": "eyJzZWNvbmQiOiAyLCAiZmlyc3QiOiAxfQ=="
}
where eyJzZWNvbmQiOiAyLCAiZmlyc3QiOiAxfQ== is the base64 encoding of {"second": 2, "first": 1}
And the print statement gives me: {u'second': 2, u'first': 1}
So, in the method, I can explore the JSON blob data as a Python dict.
But in the API call, the data is encoded in base64.
I expected the API call to give me:
{
'data': {
'second': 2,
'first': 1
}
}
How can I get this result?
After the discussion in the comments of your question, let me share a code sample that you can use to store a JSON object in Datastore (it will be stored as a string), and later retrieve it in such a way that:
It will show as plain JSON after the API call.
You will be able to parse it again into a Python dict using eval.
I hope I understood your issue correctly, and that this helps you with it.
import endpoints
from google.appengine.ext import ndb
from protorpc import remote
from endpoints_proto_datastore.ndb import EndpointsModel


class Sample(EndpointsModel):
    column1 = ndb.StringProperty()
    column2 = ndb.IntegerProperty()
    column3 = ndb.StringProperty()


@endpoints.api(name='myapi', version='v1', description='My Sample API')
class MyApi(remote.Service):

    # URL: .../_ah/api/myapi/v1/mymodel - POSTS A NEW ENTITY
    @Sample.method(path='mymodel', http_method='GET', name='Sample.insert')
    def MyModelInsert(self, my_model):
        dict = {'first': 1, 'second': 2}
        dict_str = str(dict)
        my_model.column1 = "Year"
        my_model.column2 = 2018
        my_model.column3 = dict_str
        my_model.put()
        return my_model

    # URL: .../_ah/api/myapi/v1/mymodel/{ID} - RETRIEVES AN ENTITY BY ITS ID
    @Sample.method(request_fields=('id',), path='mymodel/{id}', http_method='GET', name='Sample.get')
    def MyModelGet(self, my_model):
        if not my_model.from_datastore:
            raise endpoints.NotFoundException('MyModel not found.')
        dict = eval(my_model.column3)
        print("This is the Python dict recovered from a string: {}".format(dict))
        return my_model


application = endpoints.api_server([MyApi], restricted=False)
I have tested this code using the development server, but it should work the same in production using App Engine with Endpoints and Datastore.
After querying the first endpoint, it will create a new entity, which you will be able to find in Datastore, and which contains a property column3 with your JSON data in string format.
Then, if you use the ID of that entity to retrieve it, your browser will show the string without any strange encoding, just plain JSON.
And in the console, you will see that this string can be converted to a Python dict (or also to JSON, using the json module if you prefer).
I hope I have not missed any point of what you want to achieve, but I think all the most important points are covered by this code: a property holding a JSON object, storing it in Datastore, retrieving it in a readable format, and being able to use it again as JSON/dict.
Update:
I think you should have a look at the list of available Property Types yourself, in order to find which one fits your requirements better. However, as an additional note, I have done a quick test working with a StructuredProperty (a property inside another property), by adding these modifications to the code:
import json  # needed for the json.dumps call below (not shown in the original snippet)


# Define the nested model (your JSON object)
class Structured(EndpointsModel):
    first = ndb.IntegerProperty()
    second = ndb.IntegerProperty()


# Here I added a new property for simplicity; remember, StackOverflow does not write code for you :)
class Sample(EndpointsModel):
    column1 = ndb.StringProperty()
    column2 = ndb.IntegerProperty()
    column3 = ndb.StringProperty()
    column4 = ndb.StructuredProperty(Structured)


# Modify this endpoint definition to add the new property
@Sample.method(request_fields=('id',), path='mymodel/{id}', http_method='GET', name='Sample.get')
def MyModelGet(self, my_model):
    if not my_model.from_datastore:
        raise endpoints.NotFoundException('MyModel not found.')
    # Add the new nested property here
    dict = eval(my_model.column3)
    my_model.column4 = dict
    print(json.dumps(my_model.column3))
    print("This is the Python dict recovered from a string: {}".format(dict))
    return my_model
With these changes, column4 appears in the response of the call to the endpoint as a JSON object itself (although it is not printed inline, I do not think that should be a problem).
I hope this helps too. If this is not the exact behavior you want, maybe you should play around with the available Property Types, but I do not think there is one that can hold a Python dict (or JSON object) without previously converting it to a string.

How to properly merge multiple FlowFiles?

I use MergeContent 1.3.0 in order to merge FlowFiles from 2 sources: 1) from ListenHTTP and 2) from QueryElasticsearchHTTP.
The problem is that the merging result is a list of JSON strings. How can I convert them into a single JSON string?
{"event-date":"2017-08-08T00:00:00"}{"event-date":"2017-02-23T00:00:00"}{"eid":1,"zid":1,"latitude":38.3,"longitude":2.4}
I would like to get this result:
{"event-date":["2017-08-08T00:00:00","2017-02-23T00:00:00"],"eid":1,"zid":1,"latitude":38.3,"longitude":2.4}
Is it possible?
UPDATE:
After changing the data structure in Elasticsearch, I was able to come up with the following output result of MergeContent. Now I have a common field eid in both JSON strings. I would like to merge these strings by eid in order to get a single JSON file. Which processor should I use?
{"eid":"1","zid":1,"latitude":38.3,"longitude":2.4}{"eid":"1","dates":{"event-date":["2017-08-08","2017-02-23"]}}
I need to get the following output:
{"eid":"1","zid":1,"latitude":38.3,"longitude":2.4,"dates":{"event-date":["2017-08-08","2017-02-23"]}}
It was suggested to use ExecuteScript to merge files. However I cannot figure out how to do this. This is what I tried:
import json
import java.io
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback

class ModJSON(StreamCallback):
    def __init__(self):
        pass
    def process(self, inputStream, outputStream):
        text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
        obj = json.loads(text)
        newObj = {
            "eid": obj['eid'],
            "zid": obj['zid'],
            ...
        }
        outputStream.write(bytearray(json.dumps(newObj, indent=4).encode('utf-8')))

flowFile1 = session.get()
flowFile2 = session.get()
if (flowFile1 != None and flowFile2 != None):
    # WHAT SHOULD I PUT HERE??
    flowFile = session.write(flowFile, ModJSON())
    flowFile = session.putAttribute(flowFile, "filename", flowFile.getAttribute('filename').split('.')[0] + '_translated.json')
    session.transfer(flowFile, REL_SUCCESS)
    session.commit()
Here is an example of how to read multiple files from the incoming queue using filtering.
Assume you have multiple pairs of flow files with following content:
{"eid":"1","zid":1,"latitude":38.3,"longitude":2.4}
and
{"eid":"1","dates":{"event-date":["2017-08-08","2017-02-23"]}}
The same value of the eid field links the pairs together.
Before merging, we have to extract the value of the eid field and put it into an attribute of the flow file for fast filtering.
Use the EvaluateJsonPath processor with properties:
Destination : flowfile-attribute
eid : $.eid
After this, you'll have a new eid attribute on the flow file.
Then use an ExecuteScript processor with the Groovy language and the following code:
import org.apache.nifi.processor.FlowFileFilter
import org.apache.nifi.flowfile.FlowFile
import org.apache.nifi.processor.io.OutputStreamCallback
import groovy.json.JsonSlurper
import groovy.json.JsonBuilder

// get the first flow file
def ff0 = session.get()
if (!ff0) return

def eid = ff0.getAttribute('eid')

// try to find files with the same attribute in the incoming queue
def ffList = session.get(new FlowFileFilter() {
    public FlowFileFilterResult filter(FlowFile ff) {
        if (eid == ff.getAttribute('eid')) return FlowFileFilterResult.ACCEPT_AND_CONTINUE
        return FlowFileFilterResult.REJECT_AND_CONTINUE
    }
})

// let's assume you require one additional file in the queue with the same attribute
if (!ffList || ffList.size() < 1) {
    // if there are fewer than required,
    // roll back the current session and penalize the retrieved files so they go to the end of the incoming queue
    // with the pre-configured penalty delay (default 30 sec)
    session.rollback(true)
    return
}

// let's put all files in one list to simplify later iterations
ffList.add(ff0)

if (ffList.size() > 2) {
    // for example, an unexpected situation: you have more files than expected
    // redirect all of them to failure
    session.transfer(ffList, REL_FAILURE)
    return
}

// create an empty map (aka json object)
def json = [:]

// iterate through the files, parse their content and merge it into one map
ffList.each { ff ->
    session.read(ff).withStream { rawIn ->
        def fjson = new JsonSlurper().parse(rawIn)
        json.putAll(fjson)
    }
}

// create a new flow file and write the merged json as its content
def ffOut = session.create()
ffOut = session.write(ffOut, { rawOut ->
    rawOut.withWriter("UTF-8") { writer ->
        new JsonBuilder(json).writeTo(writer)
    }
} as OutputStreamCallback)

// set the mime-type
ffOut = session.putAttribute(ffOut, "mime.type", "application/json")

session.remove(ffList)
session.transfer(ffOut, REL_SUCCESS)
Joining together two different types of data is not really what MergeContent was made to do.
You would need to write a custom processor, or custom script, that understood your incoming data formats and created the new output.
If you have ListenHttp connected to QueryElasticSearchHttp, meaning that you are triggering the query based on the flow file coming out of ListenHttp, then you may want to make a custom version of QueryElasticSearchHttp that takes the content of the incoming flow file and joins it together with any of the outgoing results.
Here is where the query result is currently written to a flow file:
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-elasticsearch-bundle/nifi-elasticsearch-processors/src/main/java/org/apache/nifi/processors/elasticsearch/QueryElasticsearchHttp.java#L360
Another option is to use ExecuteScript and write a script that could take multiple flow files and merge them together in the way you described.

How to match json response in 2 requests?

I have a JSON response in the 1st request like this:
{"total":1,"page":1,"records":2,"rows":[{"id":1034,"item_type_val":"Business Requirement","field_name":"Assigned To","invalid_value":"Jmeter System","dep_value":"","dep_field":""},{"id":1033,"item_type_val":"Risk","field_name":"Category","invalid_value":"Energy","dep_value":"Logged User","dep_field":"Assigned To"}]}
and in the 2nd request like this:
{"total":1,"page":1,"records":2,"rows":[{"id":1100,"item_type_val":"Business Requirement","field_name":"Assigned To","invalid_value":"Jmeter System","dep_value":"","dep_field":""},{"id":1111,"item_type_val":"Risk","field_name":"Category","invalid_value":"Energy","dep_value":"Logged User","dep_field":"Assigned To"}]}
Both are the same except for the ids. I need to compare the 1st JSON response with the 2nd and verify whether they are the same; here they are the same but have different ids, which should be acceptable. How can I do this with a regex so that I can ignore the ids and match the rest of the content?
Not sure if you can do it with a single regex, but another way out is to create a map of it and then compare everything except 'id'.
I believe the easiest way would be to simply discard these id entries using a JSR223 PostProcessor and the Groovy language, which comes with JSON support.
Add a JSR223 PostProcessor as a child of the sampler which returns your JSON
Put the following code into the JSR223 PostProcessor's "Script" area
import groovy.json.JsonBuilder
import groovy.json.JsonSlurper
def slurper = new JsonSlurper()
def jsonResponse = slurper.parseText(prev.getResponseDataAsString())
jsonResponse.rows.findAll { it.remove("id") }
def newResponse = new JsonBuilder(jsonResponse).toPrettyString()
//depending on what you need
vars.put("responseWithoutId", newResponse) // store response withou ID into a JMeter Variable
prev.setResponseData(new String(newResponse)) // overwrite parent sampler response data
log.info(newResponse) // just print the new value to jmeter.log file
So you have the following choices:
vars.put("responseWithoutId", newResponse) - stores the new JSON (without these id) into a ${responseWithoutId} JMeter Variable
prev.setResponseData(new String(newResponse)) - after this line execution parent sampler data won't contain any "id"
log.info(newResponse) - just prints JSON without "id" to jmeter.log file
I don't know your test plan design; personally, I would store the responses from the 2 requests into 2 different JMeter Variables, i.e. ${response1} and ${response2}, using the above approach and compare them with a Response Assertion.
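If you prefer to do the final comparison in code rather than in a Response Assertion, a minimal JSR223 Assertion sketch along these lines could work (it assumes both variables were populated with the id-stripped JSON as described above):

// JSR223 Assertion (Groovy): compare the two id-stripped responses
def first = vars.get('response1')
def second = vars.get('response2')
if (first != second) {
    AssertionResult.setFailure(true)
    AssertionResult.setFailureMessage('Responses differ after removing the id entries')
}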

Return a JsonSlurper result from SoapUI

I'm using SoapUI to mock out a web API service. I'm reading the contents of a static JSON response file, but changing the contents of a couple of the nodes based on what the user has passed through in the request.
I can't create multiple responses because the surcharge is calculated from the amount passed.
The toString() method of the object that gets returned by the slurper is replacing { with [, which invalidates my JSON response. I've included the important bits of the code below; has anyone got a way around this, or is JsonSlurper not the right thing to use here?
def json=slurper.parseText(new File(path).text)
// set the surcharge to the two credit card nodes
// these are being set fine
json.AvailableCardTypeResponse.PaymentCards[0].Surcharge="${sur_charge}"
json.AvailableCardTypeResponse.PaymentCards[1].Surcharge="${sur_charge}"
response.setContentType("application/json;charset=utf-8" );
response.setContentLength(length);
Tools.readAndWrite( new ByteArrayInputStream(json.toString().getBytes("UTF-8")), length,response.getOutputStream() )
return new com.eviware.soapui.impl.wsdl.mock.WsdlMockResult(mockRequest)
You're slurping the json into a List/Map structure, then writing this List/Map structure out.
You need to convert your lists and maps back to json.
Change the line:
Tools.readAndWrite( new ByteArrayInputStream(json.toString().getBytes("UTF-8")), length,response.getOutputStream() )
to
Tools.readAndWrite( new ByteArrayInputStream( new JsonBuilder( json ).toString().getBytes("UTF-8")), length,response.getOutputStream() )
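For completeness, a minimal sketch of the adjusted mock script, assuming the same path, sur_charge, Tools helper and mockRequest from the question (JsonBuilder must be imported or fully qualified):

import groovy.json.JsonBuilder
import groovy.json.JsonSlurper

def json = new JsonSlurper().parseText(new File(path).text)
// set the surcharge on the two credit card nodes, as in the original script
json.AvailableCardTypeResponse.PaymentCards[0].Surcharge = "${sur_charge}"
json.AvailableCardTypeResponse.PaymentCards[1].Surcharge = "${sur_charge}"

// serialize the map/list structure back to JSON before writing it out
def body = new JsonBuilder(json).toString().getBytes("UTF-8")
response.setContentType("application/json;charset=utf-8")
response.setContentLength(body.length)
Tools.readAndWrite(new ByteArrayInputStream(body), body.length, response.getOutputStream())
return new com.eviware.soapui.impl.wsdl.mock.WsdlMockResult(mockRequest)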