I would like to use NiFi to encrypt the attributes in a JSON but not the keys, as I would like to upload the data to a MongoDB server. Is there a way to do this? For the project I am using Twitter data as a proof of concept. So far I have used the EvaluateJsonPath processor to extract only the text of the tweet, and I can encrypt this text; however, the resulting JSON no longer has a key. Can NiFi recreate a JSON that attaches a key to this attribute that I extracted? Is there a better way to do this?
Unfortunately, this workflow isn't well supported by existing Apache NiFi processors. You could probably fashion a workflow that split the JSON content into attributes, split each attribute into the content of an individual flowfile, encrypted that content, merged the flowfiles back, and reconstituted the now-encrypted content into attributes via UpdateAttribute.
I have created a Jira for a new NiFi processor to make this much simpler. My recommendation, until such time as that is available, is to use the ExecuteScript processor to achieve this. I have provided a template with an example, which you can import directly into your NiFi instance and connect to your flow. The body of the ExecuteScript processor is provided below (you can see how I initialized the AES/GCM cipher, and you can change the algorithm, key, and IV to your desired values).
import javax.crypto.Cipher
import javax.crypto.SecretKey
import javax.crypto.spec.IvParameterSpec
import javax.crypto.spec.SecretKeySpec
import java.nio.charset.StandardCharsets
FlowFile flowFile = session.get()
if (!flowFile) {
    return
}

try {
    // Get the raw values of the attributes
    String normalAttribute = flowFile.getAttribute('Normal Attribute')
    String sensitiveAttribute = flowFile.getAttribute('Sensitive Attribute')

    // Instantiate an encryption cipher
    // Lots of additional code could go here to generate a random key, derive a key from a password, read from a file or keyring, etc.
    String keyHex = "0123456789ABCDEFFEDCBA9876543210" // 32 ASCII characters -> 32 bytes -> a 256-bit AES key
    SecretKey key = new SecretKeySpec(keyHex.getBytes(StandardCharsets.UTF_8), "AES")
    IvParameterSpec iv = new IvParameterSpec(keyHex[0..<16].getBytes(StandardCharsets.UTF_8))
    Cipher aesGcmEncCipher = Cipher.getInstance("AES/GCM/NoPadding", "BC")
    aesGcmEncCipher.init(Cipher.ENCRYPT_MODE, key, iv)

    // Encrypt each attribute value and Base64-encode the resulting cipher text
    String encryptedNormalAttribute = Base64.encoder.encodeToString(aesGcmEncCipher.doFinal(normalAttribute.bytes))
    String encryptedSensitiveAttribute = Base64.encoder.encodeToString(aesGcmEncCipher.doFinal(sensitiveAttribute.bytes))

    // Add a new attribute with the encrypted normal attribute
    flowFile = session.putAttribute(flowFile, 'Normal Attribute (encrypted)', encryptedNormalAttribute)

    // Replace the sensitive attribute inline with the cipher text
    flowFile = session.putAttribute(flowFile, 'Sensitive Attribute', encryptedSensitiveAttribute)

    session.transfer(flowFile, REL_SUCCESS)
} catch (Exception e) {
    log.error("There was an error encrypting the attributes: ${e.getMessage()}")
    session.transfer(flowFile, REL_FAILURE)
}
Related
I'm using Flink to process the data coming from some data source (such as Kafka, Pravega etc).
In my case, the data source is Pravega, which provides me with a Flink connector.
My data source is sending me some JSON data as below:
{"key": "value"}
{"key": "value2"}
{"key": "value3"}
...
...
Here is my piece of code:
PravegaDeserializationSchema<ObjectNode> adapter = new PravegaDeserializationSchema<>(ObjectNode.class, new JavaSerializer<>());
FlinkPravegaReader<ObjectNode> source = FlinkPravegaReader.<ObjectNode>builder()
.withPravegaConfig(pravegaConfig)
.forStream(stream)
.withDeserializationSchema(adapter)
.build();
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<ObjectNode> dataStream = env.addSource(source).name("Pravega Stream");
dataStream.map(new MapFunction<ObjectNode, String>() {
        @Override
        public String map(ObjectNode node) throws Exception {
            return node.toString();
        }
    })
    .keyBy("word") // ERROR
    .timeWindow(Time.seconds(10))
    .sum("count");
As you see, I used the FlinkPravegaReader and a proper deserializer to get the JSON stream coming from Pravega.
Then I try to transform the JSON data into Strings, keyBy them, and count them.
However, I get an error:
The program finished with the following exception:
Field expression must be equal to '*' or '_' for non-composite types.
org.apache.flink.api.common.operators.Keys$ExpressionKeys.<init>(Keys.java:342)
org.apache.flink.streaming.api.datastream.DataStream.keyBy(DataStream.java:340)
myflink.StreamingJob.main(StreamingJob.java:114)
It seems that KeyBy threw this exception.
Well, I'm not a Flink expert so I don't know why. I've read the source code of the official WordCount example. In that example, there is a custom splitter, which is used to split the String data into words.
So I'm wondering whether I need to use some kind of splitter in this case too? If so, what kind of splitter should I use? Can you show me an example? If not, why did I get such an error and how can I solve it?
I guess you have read the documentation about how to specify keys:
Specify keys
The example code uses keyBy("word") because word is a field of the POJO type WC.
// some ordinary POJO (Plain old Java Object)
public class WC {
public String word;
public int count;
}
DataStream<WC> words = // [...]
DataStream<WC> wordCounts = words.keyBy("word").window(/*window specification*/);
In your case, you put a map operator before keyBy, and the output of this map operator is a String, so there is obviously no word field. If you actually want to group this String stream, you need to write it like this: .keyBy(String::toString)
Or you can even implement a customized keySelector to generate your own key.
Customized Key Selector
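For illustration only (this snippet is not from the linked documentation), a minimal key selector that assumes each record in the String stream is a JSON object like {"key": "value"}, as in your sample data, and groups by the value of its "key" field could look like this:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.api.java.functions.KeySelector;

// Hypothetical key selector: groups each JSON string by the value of its "key" field.
public class JsonFieldKeySelector implements KeySelector<String, String> {

    // ObjectMapper is safe to share for reading; a static field avoids serializing it with the function.
    private static final ObjectMapper MAPPER = new ObjectMapper();

    @Override
    public String getKey(String value) throws Exception {
        JsonNode node = MAPPER.readTree(value);
        // adjust the field name to whatever field you actually want to group by
        return node.get("key").asText();
    }
}

You would then replace the failing line with .keyBy(new JsonFieldKeySelector()).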
import com.jayway.jsonpath.JsonPath

def idCSV = new File('id.csv')
def index = ['fileOne.json', 'fileTwo.json']
def jsonString

index.each { file ->
    jsonString = ________
    def ids = JsonPath.read(jsonString, '$..id')
    ids.each { id ->
        idCSV << id << newLine
    }
}
How do I fill in jsonString = ____ so that I can read the JSON file into a string and then parse that string to extract the ids and some other information from it?
And I don't want to do it via an HTTP Request -> GET -> file approach.
Previously I extracted jsonString from an HTTP response and it worked well; now I want to do it this way.
Use JsonSlurper:
def jsonString = new groovy.json.JsonSlurper().parseText(new File("json.txt").text)
My expectation is that you're looking for the File.getText() function:
jsonString = file.text
I don't have full visibility into why you need to store the values from the JSON in a CSV file; however, there is an alternative way of achieving this which doesn't require scripting. Your approach will work with one concurrent thread only; if you add more users attempting to write into the same file, you'll run into a race condition:
You can read the files from the folder into JMeter Variables via Directory Listing Config
The file can be read using HTTP Request sampler
The values can be fetched using the JSON Extractor; they will be automatically stored into JMeter Variables so you will be able to use them later on
If you need the values to be present in the file (although I wouldn't recommend this approach because it will cause massive disk IO and can potentially ruin your test) you can go for the Flexible File Writer
I store a blob of JSON in the Datastore using a JsonProperty.
I don't know the structure of the JSON data.
I am using Endpoints Proto Datastore in order to retrieve my data.
The problem is that the JSON property is encoded in base64 and I want a plain JSON object.
For the example, the json data will be:
{
first: 1,
second: 2
}
My code looks something like:
import endpoints
from google.appengine.ext import ndb
from protorpc import remote
from endpoints_proto_datastore.ndb import EndpointsModel


class Model(EndpointsModel):
    data = ndb.JsonProperty()


@endpoints.api(name='myapi', version='v1', description='My Sample API')
class DataEndpoint(remote.Service):

    @Model.method(path='mymodel2', http_method='POST',
                  name='mymodel.insert')
    def MyModelInsert(self, my_model):
        my_model.data = {"first": 1, "second": 2}
        my_model.put()
        return my_model

    @Model.method(path='mymodel/{entityKey}',
                  http_method='GET',
                  name='mymodel.get')
    def getMyModel(self, model):
        print(model.data)
        return model


API = endpoints.api_server([DataEndpoint])
When I call the api for getting a model, I get:
POST /_ah/api/myapi/v1/mymodel2
{
"data": "eyJzZWNvbmQiOiAyLCAiZmlyc3QiOiAxfQ=="
}
where eyJzZWNvbmQiOiAyLCAiZmlyc3QiOiAxfQ== is the base64 encoding of {"second": 2, "first": 1}
And the print statement gives me: {u'second': 2, u'first': 1}
So, in the method, I can explore the JSON blob data as a Python dict.
But in the API call, the data is encoded in base64.
I expected the API call to give me:
{
'data': {
'second': 2,
'first': 1
}
}
How can I get this result?
After the discussion in the comments of your question, let me share with you some sample code that you can use in order to store a JSON object in Datastore (it will be stored as a string) and later retrieve it in such a way that:
It will show as plain JSON after the API call.
You will be able to parse it again to a Python dict using eval.
I hope I understood correctly your issue, and this helps you with it.
import endpoints
from google.appengine.ext import ndb
from protorpc import remote
from endpoints_proto_datastore.ndb import EndpointsModel


class Sample(EndpointsModel):
    column1 = ndb.StringProperty()
    column2 = ndb.IntegerProperty()
    column3 = ndb.StringProperty()


@endpoints.api(name='myapi', version='v1', description='My Sample API')
class MyApi(remote.Service):

    # URL: .../_ah/api/myapi/v1/mymodel - POSTS A NEW ENTITY
    @Sample.method(path='mymodel', http_method='GET', name='Sample.insert')
    def MyModelInsert(self, my_model):
        dict = {'first': 1, 'second': 2}
        dict_str = str(dict)
        my_model.column1 = "Year"
        my_model.column2 = 2018
        my_model.column3 = dict_str
        my_model.put()
        return my_model

    # URL: .../_ah/api/myapi/v1/mymodel/{ID} - RETRIEVES AN ENTITY BY ITS ID
    @Sample.method(request_fields=('id',), path='mymodel/{id}', http_method='GET', name='Sample.get')
    def MyModelGet(self, my_model):
        if not my_model.from_datastore:
            raise endpoints.NotFoundException('MyModel not found.')
        dict = eval(my_model.column3)
        print("This is the Python dict recovered from a string: {}".format(dict))
        return my_model


application = endpoints.api_server([MyApi], restricted=False)
I have tested this code using the development server, but it should work the same in production using App Engine with Endpoints and Datastore.
After querying the first endpoint, it will create a new Entity which you will be able to find in Datastore, and which contains a property column3 with your JSON data in string format.
Then, if you use the ID of that entity to retrieve it, your browser will show the string without any strange encoding, just plain JSON.
And in the console, you will be able to see that this string can be converted to a Python dict (or also to JSON, using the json module if you prefer).
I hope I have not missed any point of what you want to achieve, but I think all the most important points are covered with this code: storing a property that is a JSON object in Datastore, retrieving it in a readable format, and being able to use it again as a JSON/dict.
Update:
I think you should have a look at the list of available Property Types yourself, in order to find which one fits your requirements better. However, as an additional note, I have done a quick test working with a StructuredProperty (a property inside another property), by adding these modifications to the code:
import json  # needed for the json.dumps call below

# Define the nested model (your JSON object)
class Structured(EndpointsModel):
    first = ndb.IntegerProperty()
    second = ndb.IntegerProperty()

# Here I added a new property for simplicity; remember, StackOverflow does not write code for you :)
class Sample(EndpointsModel):
    column1 = ndb.StringProperty()
    column2 = ndb.IntegerProperty()
    column3 = ndb.StringProperty()
    column4 = ndb.StructuredProperty(Structured)

# Modify this endpoint definition to add the new property
@Sample.method(request_fields=('id',), path='mymodel/{id}', http_method='GET', name='Sample.get')
def MyModelGet(self, my_model):
    if not my_model.from_datastore:
        raise endpoints.NotFoundException('MyModel not found.')
    # Add the new nested property here
    dict = eval(my_model.column3)
    my_model.column4 = dict
    print(json.dumps(my_model.column3))
    print("This is the Python dict recovered from a string: {}".format(dict))
    return my_model
With these changes, the response of the call to the endpoint now shows column4 as a JSON object itself (although it is not printed in-line, I do not think that should be a problem).
I hope this helps too. If this is not the exact behavior you want, maybe you should play around with the available Property Types, but I do not think there is one to which you can assign a Python dict (or JSON object) without previously converting it to a string.
We are building a service. It has to read config from a file. We are currently using YAML and Jackson for deserializing the YAML. We have a situation where our YAML file needs to inherit/extend one or more other YAML files. E.g., something like:
extends: base.yaml
appName: my-awesome-app
...
thus part of the config is stored in base.yaml. Is there any library that has support for this? Bonus points if it allows inheriting from more than one file. We could change to using JSON instead of YAML.
Neither JSON nor YAML has the ability to include files. Whatever you do will be a pre-processing step where you put base.yaml and your actual file together.
A crude way of doing this would be:
#include base.yaml
appName: my-awesome-app
Let this be your file. Upon loading, you first read the first line, and if it starts with #include, you replace it with the content of the included file. You need to do this recursively. This is basically what the C preprocessor does with C files and includes.
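A minimal, untested sketch of that pre-processing step (the class name and path handling here are just placeholders, not an existing library) could look like this:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: recursively replaces every line that starts with
// "#include <file>" with the content of the referenced file, before the
// resulting text is handed to the YAML parser.
public class CrudeYamlInclude {

    public static String expand(Path file) throws IOException {
        StringBuilder out = new StringBuilder();
        for (String line : Files.readAllLines(file)) {
            if (line.startsWith("#include ")) {
                // resolve the included file relative to the including file and recurse
                Path included = file.resolveSibling(line.substring("#include ".length()).trim());
                out.append(expand(included));
            } else {
                out.append(line).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(expand(Paths.get("my-awesome-app.yaml")));
    }
}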
Drawbacks are:
even if both files are valid YAML, the result may not be.
if either file includes a directive end or document end marker (--- or ...), you will end up with two separate documents in one file.
you cannot replace any values from base.yaml inside your file.
So an alternative would be to actually operate on the YAML structure. For this, you need the API of the YAML parser (SnakeYAML in your case) and parse your file with that. You should use the compose API:
private Node preprocess(final Reader myInput) throws IOException {
    final Yaml yaml = new Yaml();
    final Node node = yaml.compose(myInput);
    processIncludes(node);
    return node;
}

private void processIncludes(final Node node) throws IOException {
    if (node instanceof MappingNode) {
        final List<NodeTuple> values = ((MappingNode) node).getValue();
        for (final NodeTuple tuple : values) {
            if ("!include".equals(tuple.getKeyNode().getTag().getValue())) {
                final String includedFilePath =
                        ((ScalarNode) tuple.getValueNode()).getValue();
                final Node content = preprocess(new FileReader(includedFilePath));
                // now merge the content in your preferred way into the values list.
                // that will change the content of the node.
            }
        }
    }
}

public String executePreprocessor(final Reader source) throws IOException {
    final Node node = preprocess(source);
    final StringWriter writer = new StringWriter();
    final DumperOptions dOptions = new DumperOptions();
    final Serializer ser = new Serializer(new Emitter(writer, dOptions),
            new Resolver(), dOptions, null);
    ser.open();
    ser.serialize(node);
    ser.close();
    return writer.toString();
}
This code would parse includes like this:
!include : base.yaml
appName: my-awesome-app
I used the private tag !include so that there will not be name clashes with any normal mapping key. Mind the space after !include. I didn't give code to merge the included file because I did not know how you want to handle duplicate mapping keys. It should not be hard to implement, though. Be aware of bugs; I have not tested this code.
The resulting String can be the input to Jackson.
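For the merge step that was deliberately left open above, one possible (equally untested) sketch, under the assumption that keys defined in the including file should win over keys coming from the included file, would be a helper like this in the same class (it uses the same SnakeYAML node types plus java.util.ArrayList, HashSet, List, and Set):

// Possible merge strategy: keys from the including file win over keys from the included file.
private void mergeInclude(final MappingNode target, final NodeTuple includeTuple, final Node includedContent) {
    if (!(includedContent instanceof MappingNode)) {
        return; // only a mapping root can be merged into a mapping
    }

    final List<NodeTuple> merged = new ArrayList<>(target.getValue());
    merged.remove(includeTuple); // drop the "!include : base.yaml" tuple itself

    // remember which keys the including file already defines
    final Set<String> existingKeys = new HashSet<>();
    for (final NodeTuple tuple : merged) {
        if (tuple.getKeyNode() instanceof ScalarNode) {
            existingKeys.add(((ScalarNode) tuple.getKeyNode()).getValue());
        }
    }

    // append tuples from the included file whose keys are not overridden
    for (final NodeTuple tuple : ((MappingNode) includedContent).getValue()) {
        final Node keyNode = tuple.getKeyNode();
        if (!(keyNode instanceof ScalarNode)
                || !existingKeys.contains(((ScalarNode) keyNode).getValue())) {
            merged.add(tuple);
        }
    }

    // setting a new list leaves the iteration inside processIncludes untouched
    target.setValue(merged);
}

Inside processIncludes you would then call mergeInclude((MappingNode) node, tuple, content) right after computing content.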
Probably out of the same desire, I have created this tool: jq-front.
You can do it with the following syntax, combining it with the yq command.
extends: [ base.yaml ]
appName: my-awesome-app
...
$ yq -j . your.yaml | jq-front | yq -y .
Note that you need to place file names to be extended in an array since the tool supports multiple inheritance.
Points you potentially won't like are:
It's quite a bit slow. (But for configuration information it might be OK, since you can convert it to an expanded file once and you will never need the original one after that for your system.)
Objects inside an array cannot behave as expected, since the tool relies on the * operator of jq.
I want to parse a JSON file into objects and save them to a database. I just created a Groovy script that runs in the Grails console (typing grails console on the command line). I did not create a Grails app or a domain class. Inside this small script, when I call save, I get
groovy.lang.MissingMethodException: No signature of method: Blog.save()
is applicable for argument types: () values: []
Possible solutions: wait(), any(), wait(long), isCase(java.lang.Object),
sleep(long), any(groovy.lang.Closure)
Am I missing something?
I'm also confused: if I do save, is it going to save the data to a table called Blog? Should I set up any database connection here? (Because with a Grails domain class we don't need to, but is it different using pure Groovy?)
Many Thanks!
import grails.converters.*
import org.codehaus.groovy.grails.web.json.*

class Blog {
    String title
    String body

    static mapping = {
        body type: "text"
        attachment type: "text"
    }

    Blog(title, body, slug) {
        this.title = title
        this.body = body
    }
}
Here I parse the JSON:
// parse json
List parsedList = JSON.parse(new FileInputStream("c:/ning-blogs.json"), "UTF-8")
def blogs = parsedList.collect { JSONObject jsonObject ->
    new Blog(jsonObject.get("title"), jsonObject.get("description"), "N/A");
}
Then I loop over the blogs and save each object:
for (i in blogs) {
    // println i.title; I'll get the information needed.
    i.save();
}
I don't have much experience with Grails, but from a quick googling it seems that for a class to be treated like a model class, it needs either to live in the conventional package/directory or to come from a legacy JAR with Hibernate mapping/JPA annotations. Thus your example can't work. Why not define that model in your model package?