How to split the data of ObjectNode in Apache Flink

I'm using Flink to process data coming from a data source (such as Kafka, Pravega, etc.).
In my case, the data source is Pravega, which provides a Flink connector.
My data source sends me JSON data like this:
{"key": "value"}
{"key": "value2"}
{"key": "value3"}
...
...
Here is my piece of code:
PravegaDeserializationSchema<ObjectNode> adapter = new PravegaDeserializationSchema<>(ObjectNode.class, new JavaSerializer<>());
FlinkPravegaReader<ObjectNode> source = FlinkPravegaReader.<ObjectNode>builder()
    .withPravegaConfig(pravegaConfig)
    .forStream(stream)
    .withDeserializationSchema(adapter)
    .build();
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<ObjectNode> dataStream = env.addSource(source).name("Pravega Stream");
dataStream.map(new MapFunction<ObjectNode, String>() {
        @Override
        public String map(ObjectNode node) throws Exception {
            return node.toString();
        }
    })
    .keyBy("word") // ERROR
    .timeWindow(Time.seconds(10))
    .sum("count");
As you see, I used the FlinkPravegaReader and a proper deserializer to get the JSON stream coming from Pravega.
Then I try to transform the JSON data into Strings, key them with keyBy, and count them.
However, I get an error:
The program finished with the following exception:
Field expression must be equal to '*' or '_' for non-composite types.
org.apache.flink.api.common.operators.Keys$ExpressionKeys.<init>(Keys.java:342)
org.apache.flink.streaming.api.datastream.DataStream.keyBy(DataStream.java:340)
myflink.StreamingJob.main(StreamingJob.java:114)
It seems that keyBy threw this exception.
I'm not a Flink expert, so I don't know why. I've read the source code of the official WordCount example. In that example, there is a custom splitter, which is used to split the String data into words.
So I'm wondering whether I need some kind of splitter in this case too. If so, what kind of splitter should I use? Can you show me an example? If not, why did I get this error and how do I solve it?

I guess you have read the documentation on how to specify keys ("Specify keys").
The example code there uses keyBy("word") because word is a field of the POJO type WC.
// some ordinary POJO (Plain old Java Object)
public class WC {
    public String word;
    public int count;
}
DataStream<WC> words = // [...]
DataStream<WC> wordCounts = words.keyBy("word").window(/*window specification*/);
In your case, you put a map operator before keyBy, and the output of this map operator is a String. A String is not a composite type, so it has no word field, which is why keyBy("word") fails with the "Field expression must be equal to '*' or '_' for non-composite types" error. If you actually want to group this String stream, write .keyBy(String::toString) instead.
Or you can implement a custom KeySelector to generate your own key; see the "Customized Key Selector" documentation.

GSON | Extract JSON's Root Name | JsonPath Or JsonPointer

I am looking to extract the root element name of a JSON document. It looks like this is possible with neither JsonPointer nor JsonPath, as my attempts to find such an expression have been unsuccessful. Any tips would be appreciated. TIA.
Sample document:
{
"MESSAGE1_ROOT_INPUT": {
"CTRL_SEG": "test"
}
}
The below using gson 2.9.0:
$.*~
produces:
{"CTRL_SEG": "test"}
while JSONPath Online produces this:
[
"MESSAGE1_ROOT_INPUT"
]
The goal is to get the text "MESSAGE1_ROOT_INPUT" using JsonPath/JsonPointer expression(s). Note that extracting it the traditional way (substring or regex on stringified JSON text) would be my last resort.
Background: We are building an API service that accepts JSON documents with different roots, such as MESSAGE2_ROOT_INPUT, MESSAGE3_ROOT_INPUT, etc. Further routing of a message is based on this root name.
Supported/Employed Languages: Java/GSON Library/RegEx
Gson does not natively support JSONPath or JSON Pointer. However, you can quite efficiently obtain the name of the first property using JsonReader:
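import com.google.gson.stream.JsonReader;
import java.io.IOException;
import java.io.Reader;
public static String getFirstPropertyName(Reader reader) throws IOException {
    // Don't have to call JsonReader.close(); that would just close the provided reader
    JsonReader jsonReader = new JsonReader(reader);
    // Enter the top-level object, then return the name of its first property
    jsonReader.beginObject();
    return jsonReader.nextName();
}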
There are, however, two things to keep in mind:
This only reads the beginning of the JSON document; it neither verifies that the complete JSON document has valid syntax, nor checks whether there are more top-level properties.
This consumes some data from the Reader; to process the data further you have to buffer it so it can be re-read (you can also first store the JSON in a String and pass a StringReader to JsonReader).

GCP Proto Datastore encode JsonProperty in base64

I store a blob of Json in the datastore using JsonProperty.
I don't know the structure of the json data.
I am using endpoints proto datastore in order to retrieve my data.
The problem is that the JSON property is encoded in base64 and I want a plain JSON object.
For example, the JSON data will be:
{
    first: 1,
    second: 2
}
My code looks something like:
import endpoints
from google.appengine.ext import ndb
from protorpc import remote
from endpoints_proto_datastore.ndb import EndpointsModel
class Model(EndpointsModel):
    data = ndb.JsonProperty()
@endpoints.api(name='myapi', version='v1', description='My Sample API')
class DataEndpoint(remote.Service):
    @Model.method(path='mymodel2', http_method='POST',
                  name='mymodel.insert')
    def MyModelInsert(self, my_model):
        my_model.data = {"first": 1, "second": 2}
        my_model.put()
        return my_model
    @Model.method(path='mymodel/{entityKey}',
                  http_method='GET',
                  name='mymodel.get')
    def getMyModel(self, model):
        print(model.data)
        return model
API = endpoints.api_server([DataEndpoint])
When I call the api for getting a model, I get:
POST /_ah/api/myapi/v1/mymodel2
{
    "data": "eyJzZWNvbmQiOiAyLCAiZmlyc3QiOiAxfQ=="
}
where eyJzZWNvbmQiOiAyLCAiZmlyc3QiOiAxfQ== is the base64 encoding of {"second": 2, "first": 1}
And the print statement gives me: {u'second': 2, u'first': 1}
So, in the method, I can explore the JSON blob data as a Python dict.
But, in the API call, the data is encoded in base64.
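To make the encoding concrete, here is a minimal sketch (standard library only, not part of the original code) that decodes the data value from the response above by hand and parses it back into a dict. It only shows what the base64 string contains; it is a client-side illustration, not a change to how the endpoint serializes the property.

```python
import base64
import json

# The "data" value copied from the API response above.
encoded = "eyJzZWNvbmQiOiAyLCAiZmlyc3QiOiAxfQ=="

# base64 text -> bytes -> JSON text -> Python dict
decoded_text = base64.b64decode(encoded).decode("utf-8")
data = json.loads(decoded_text)

print(decoded_text)
print(data["first"], data["second"])
```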
I expected the API call to give me:
{
    'data': {
        'second': 2,
        'first': 1
    }
}
How can I get this result?
After the discussion in the comments of your question, let me share with you a sample code that you can use in order to store a JSON object in Datastore (it will be stored as a string), and later retrieve it in such a way that:
It will show as plain JSON after the API call.
You will be able to parse it again to a Python dict using eval.
I hope I understood your issue correctly and that this helps you with it.
import endpoints
from google.appengine.ext import ndb
from protorpc import remote
from endpoints_proto_datastore.ndb import EndpointsModel
class Sample(EndpointsModel):
    column1 = ndb.StringProperty()
    column2 = ndb.IntegerProperty()
    column3 = ndb.StringProperty()
@endpoints.api(name='myapi', version='v1', description='My Sample API')
class MyApi(remote.Service):
    # URL: .../_ah/api/myapi/v1/mymodel - POSTS A NEW ENTITY
    @Sample.method(path='mymodel', http_method='GET', name='Sample.insert')
    def MyModelInsert(self, my_model):
        dict = {'first': 1, 'second': 2}
        dict_str = str(dict)
        my_model.column1 = "Year"
        my_model.column2 = 2018
        my_model.column3 = dict_str
        my_model.put()
        return my_model
    # URL: .../_ah/api/myapi/v1/mymodel/{ID} - RETRIEVES AN ENTITY BY ITS ID
    @Sample.method(request_fields=('id',), path='mymodel/{id}', http_method='GET', name='Sample.get')
    def MyModelGet(self, my_model):
        if not my_model.from_datastore:
            raise endpoints.NotFoundException('MyModel not found.')
        dict = eval(my_model.column3)
        print("This is the Python dict recovered from a string: {}".format(dict))
        return my_model
application = endpoints.api_server([MyApi], restricted=False)
I have tested this code using the development server, but it should work the same in production using App Engine with Endpoints and Datastore.
After querying the first endpoint, it will create a new Entity which you will be able to find in Datastore, and which contains a property column3 with your JSON data in string format.
Then, if you use the ID of that entity to retrieve it, your browser will show the string without any strange encoding, just plain JSON.
And in the console, you will be able to see that this string can be converted to a Python dict (or also to JSON, using the json module if you prefer).
I hope I have not missed any point of what you want to achieve, but I think all the most important points are covered with this code: a property being a JSON object, store it in Datastore, retrieve it in a readable format, and being able to use it again as JSON/dict.
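As a side note on the eval step, here is a hedged sketch, using only the standard library, of two alternatives for the string round trip. json.dumps/json.loads stores real JSON text, while ast.literal_eval parses the output of str(dict) but only accepts Python literals, so it does not run arbitrary code the way eval can. The variable names are illustrative and not part of the code above.

```python
import ast
import json

original = {'first': 1, 'second': 2}

# Option 1: store JSON text in the string property and parse it back.
as_json = json.dumps(original)
from_json = json.loads(as_json)

# Option 2: keep str(dict) as in the code above, but parse it with
# ast.literal_eval, which only accepts Python literals.
as_repr = str(original)
from_repr = ast.literal_eval(as_repr)

# str(dict) uses single quotes, so it is not valid JSON text.
try:
    json.loads(as_repr)
    repr_is_json = True
except ValueError:
    repr_is_json = False

print(as_json)
print(from_json == original, from_repr == original, repr_is_json)
```

With option 1, the stored string is itself valid JSON, so any JSON client can parse it without knowing it came from Python.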
Update:
I think you should have a look at the list of available Property Types yourself, in order to find which one fits your requirements better. However, as an additional note, I have done a quick test working with a StructuredProperty (a property inside another property), by adding these modifications to the code:
import json  # needed for json.dumps below
# Define the nested model (your JSON object)
class Structured(EndpointsModel):
    first = ndb.IntegerProperty()
    second = ndb.IntegerProperty()
# Here I added a new property for simplicity; remember, StackOverflow does not write code for you :)
class Sample(EndpointsModel):
    column1 = ndb.StringProperty()
    column2 = ndb.IntegerProperty()
    column3 = ndb.StringProperty()
    column4 = ndb.StructuredProperty(Structured)
# Modify this endpoint definition (inside MyApi) to add a new property
@Sample.method(request_fields=('id',), path='mymodel/{id}', http_method='GET', name='Sample.get')
def MyModelGet(self, my_model):
    if not my_model.from_datastore:
        raise endpoints.NotFoundException('MyModel not found.')
    # Add the new nested property here
    dict = eval(my_model.column3)
    my_model.column4 = dict
    print(json.dumps(my_model.column3))
    print("This is the Python dict recovered from a string: {}".format(dict))
    return my_model
With these changes, the response of the call to the endpoint looks like:
Now column4 is a JSON object itself (although it is not printed in-line, I do not think that should be a problem).
I hope this helps too. If this is not the exact behavior you want, maybe you should play around with the available Property Types, but I do not think there is a type that lets you print a Python dict (or JSON object) without first converting it to a String.

Is there an XPath with schema equivalent for JSON? [duplicate]

I have JSON as a string and a JSONPath as a string. I'd like to query the JSON with the JSON path, getting the resulting JSON as a string.
I gather that Jayway's json-path is the standard. The online API, however, doesn't have much relation to the actual library you get from Maven. GrepCode's version roughly matches up, though.
It seems like I ought to be able to do:
String originalJson; //these are initialized to actual data
String jsonPath;
String queriedJson = JsonPath.<String>read(originalJson, jsonPath);
The problem is that read returns whatever it feels most appropriate based on what the JSONPath actually finds (e.g. a List<Object>, String, double, etc.), thus my code throws an exception for certain queries. It seems pretty reasonable to assume that there'd be some way to query JSON and get JSON back; any suggestions?
Java JsonPath API found at jayway JsonPath might have changed a little since all the above answers/comments. Documentation too. Just follow the above link and read that README.md, it contains some very clear usage documentation IMO.
Basically, as of current latest version 2.2.0 of the library, there are a few different ways of achieving what's been requested here, such as:
Pattern:
--------
String json = "{...your JSON here...}";
String jsonPathExpression = "$...your jsonPath expression here...";
YourRequestedClass result = JsonPath.parse(json).read(jsonPathExpression, YourRequestedClass.class);
Example:
--------
// For better readability: {"store": { "books": [ {"author": "Stephen King", "title": "IT"}, {"author": "Agatha Christie", "title": "The ABC Murders"} ] } }
String json = "{\"store\": { \"books\": [ {\"author\": \"Stephen King\", \"title\": \"IT\"}, {\"author\": \"Agatha Christie\", \"title\": \"The ABC Murders\"} ] } }";
String jsonPathExpression = "$.store.books[?(@.title=='IT')]";
JsonNode jsonNode = JsonPath.parse(json).read(jsonPathExpression, JsonNode.class);
And for reference, calling 'JsonPath.parse(..)' returns an object of class 'JsonContext', which implements interfaces such as 'ReadContext'. That interface contains several different 'read(..)' operations, such as the one demonstrated above:
/**
 * Reads the given path from this context
 *
 * @param path path to apply
 * @param type expected return type (will try to map)
 * @param <T>
 * @return result
 */
<T> T read(JsonPath path, Class<T> type);
Hope this helps someone.
There definitely exists a way to query Json and get Json back using JsonPath.
See example below:
String jsonString = "{\"delivery_codes\": [{\"postal_code\": {\"district\": \"Ghaziabad\", \"pin\": 201001, \"pre_paid\": \"Y\", \"cash\": \"Y\", \"pickup\": \"Y\", \"repl\": \"N\", \"cod\": \"Y\", \"is_oda\": \"N\", \"sort_code\": \"GB\", \"state_code\": \"UP\"}}]}";
String jsonExp = "$.delivery_codes";
JsonNode pincodes = JsonPath.read(jsonExp, jsonString, JsonNode.class);
System.out.println("pincodesJson : "+pincodes);
The output of the above will be inner Json.
[{"postal_code":{"district":"Ghaziabad","pin":201001,"pre_paid":"Y","cash":"Y","pickup":"Y","repl":"N","cod":"Y","is_oda":"N","sort_code":"GB","state_code":"UP"}}]
Now each individual name/value pair can be parsed by iterating over the list (JsonNode) we got above.
for (int i = 0; i < pincodes.size(); i++) {
    JsonNode node = pincodes.get(i);
    String pin = JsonPath.read("$.postal_code.pin", node, String.class);
    String district = JsonPath.read("$.postal_code.district", node, String.class);
    System.out.println("pin :: " + pin + " district :: " + district);
}
The output will be:
pin :: 201001 district :: Ghaziabad
Depending upon the Json you are trying to parse, you can decide whether to fetch a List or just a single String/Long value.
Hope it helps in solving your problem.
For those of you wondering why some of these years-old answers aren't working, you can learn a lot from the test cases.
As of September 2018, here's how you can get Jackson JsonNode results:
Configuration jacksonConfig = Configuration.builder()
    .mappingProvider( new JacksonMappingProvider() )
    .jsonProvider( new JacksonJsonProvider() )
    .build();
JsonNode node = JsonPath.using( jacksonConfig ).parse(jsonString);
// If you already have a JSON object, there is no need to create the jsonObject
JSONObject jsonObject = new JSONObject();
String jsonString = jsonObject.toString();
String path = "$.rootObject.childObject";
// Only returning the child object
JSONObject j = JsonPath.read(jsonString, path);
// Returning the array of String type from the child object, e.g.
// {"root": {"child": [x, y, z]}}
List<String> values = JsonPath.read(jsonString, path);
Check out the jpath API. It is an XPath equivalent for JSON data. You can read data by providing the jpath, which traverses the JSON data and returns the requested value.
This Java class is the implementation, and it also has example code showing how to call the APIs.
https://github.com/satyapaul/jpath/blob/master/JSONDataReader.java
Readme -
https://github.com/satyapaul/jpath/blob/master/README.md

How can I define a ReST endpoint that allows json input and maps it to a JsonSlurper

I want to write a REST API endpoint, using Spring 4.0 and Groovy, such that the @RequestBody parameter can be any generic JSON input, and it will be mapped to a Groovy JsonSlurper so that I can simply access the data via the slurper.
The benefit here being that I can send various JSON documents to my endpoint without having to define a DTO object for every format that I might send.
Currently my method looks like this (and works):
@RequestMapping(value = "/{id}", method = RequestMethod.PUT, consumes = MediaType.APPLICATION_JSON_VALUE)
ResponseEntity<String> putTest(@RequestBody ExampleDTO dto) {
    def json = new groovy.json.JsonBuilder()
    json(
        id: dto.id,
        name: dto.name
    );
    return new ResponseEntity(json.content, HttpStatus.OK);
}
But what I want, is to get rid of the "ExampleDTO" object, and just have any JSON that is passed in get mapped straight into a JsonSlurper, or something that I can input into a JsonSlurper, so that I can access the fields of the input object like so:
def json = new JsonSlurper().parseText(input);
String exampleName = json.name;
I initially thought I could just accept a String instead of ExampleDTO, and then slurp the String, but then I have been running into a plethora of issues in my AngularJS client, trying to send my JSON objects as strings to the API endpoint. I'm met with an annoying need to escape all of the double quotes and surround the entire JSON string with double quotes. Then I run into issues if any of my data has quotes or various special characters in it. It just doesn't seem like a clean or reliable solution.
I am open to anything that will cleanly translate my AngularJS JSON objects into valid Strings, or anything I can do in the REST method that will allow JSON input without mapping it to a specific object.
Thanks in advance!
Tonya

Grails: Easy and efficient way to parse JSON from a Request

Please pardon me if this is a repeat question. I have been through some of the questions/answers with a similar requirement but somehow got a bit overwhelmed and confused at the same time. My requirement is:
I get a JSON string/object as a request parameter. ( eg: params.timesheetJSON )
I then have to parse/iterate through it.
Here is the JSON that my grails controller will be receiving:
{
    "loginName":"user1",
    "timesheetList":
    [
        {
            "periodBegin":"2014/10/12",
            "periodEnd":"2014/10/18",
            "timesheetRows":[
                {
                    "task":"Cleaning",
                    "description":"cleaning description",
                    "paycode":"payCode1"
                },
                {
                    "task":"painting",
                    "activityDescription":"painting description",
                    "paycode":"payCode2"
                }
            ]
        }
    ],
    "overallStatus":"SUCCESS"
}
Questions:
How can I retrieve the whole JSON string from the request? Would request.JSON be fine here? If so, will request.JSON.timesheetJSON yield the actual JSON that I want as a JSONObject?
What is the best way to parse the JSON object that I got from the request? Is it grails.converters.JSON? Or is there an easier way, such as an API that returns the JSON as a collection of objects and takes care of the parsing automatically? Or is programmatically parsing the JSON object the only way?
Like I said, please pardon me if the question sounds vague. Any good references on JSON parsing with Grails would also be helpful here.
Edit: There's a change in the way I get the JSON string now. I get the JSON string as a request parameter.
String saveJSON // This holds the above JSON string.
def jsonObject = grails.converters.JSON.parse(saveJSON) // No problem here. Returns a JSONObject. I checked the class type.
def jsonArray = jsonArray.timesheetList // No problem here. Returns a JSONArray. I checked the class type.
println "*** Size of jsonArray1: " + jsonArray1.size() // Returns size 1. It seemed fine as the above JSON string had only one timesheet in timesheetList
def object1 = jsonArray[1] // This throws the JSONException, JSONArray[1] not found. I tried jsonArray.getJSONObject(1) and that throws the same exception.
Basically, I am looking to seamlessly iterate through the JSON string now.
I have written some code, shown below, that explains how this can be done. But to be clear, first the answers to your questions:
Your JSON String as written above will be the contents of your POST payload to the REST controller. Grails will use its data binding mechanism to bind the incoming data to a command object that you should prepare. It has to have fields corresponding to the parameters in your JSON String (see below). After you bind your command object to your actual domain object, you can get all the data you want by simply operating on fields and lists.
The way to parse through the JSON object is shown in my example below. The incoming request is essentially a nested map, which can simply be accessed with a dot.
Now some code that illustrates how to do it.
In your controller create a method that accepts "YourCommand" object as input parameter:
def yourRestServiceMethod (YourCommand comm){
    YourClass yourClass = new YourClass()
    comm.bindTo(yourClass)
    // do something with yourClass
    // println yourClass.timeSheetList
}
The command looks like this:
class YourCommand {
    String loginName
    List<Map> timesheetList = []
    String overallStatus
    void bindTo(YourClass yourClass){
        yourClass.loginName=loginName
        yourClass.overallStatus=overallStatus
        timesheetList.each { sheet ->
            TimeSheet timeSheet = new TimeSheet()
            timeSheet.periodBegin = sheet.periodBegin
            timeSheet.periodEnd = sheet.periodEnd
            sheet.timesheetRows.each { row ->
                TimeSheetRow timeSheetRow = new TimeSheetRow()
                timeSheetRow.task = row.task
                timeSheetRow.description = row.description
                timeSheetRow.paycode = row.paycode
                timeSheet.timesheetRows.add(timeSheetRow)
            }
            yourClass.timeSheetList.add(timeSheet)
        }
    }
}
Its "bindTo" method is the key piece of logic that understands how to get parameters from the incoming request and map them to a regular object. That object is of type "YourClass" and it looks like this:
class YourClass {
    String loginName
    Collection<TimeSheet> timeSheetList = []
    String overallStatus
}
All other classes that are part of that class:
class TimeSheet {
    String periodBegin
    String periodEnd
    Collection<TimeSheetRow> timesheetRows = []
}
and the last one:
class TimeSheetRow {
    String task
    String description
    String paycode
}
Hope this example is clear enough for you and answers your question
Edit: Extending the answer according to the new requirements
Looking at your new code, I see that you probably made some typos when writing that post:
def jsonArray = jsonArray.timesheetList
should be:
def jsonArray = jsonObject.timesheetList
but you obviously have it right in your actual code, since otherwise it would not work. The same goes for the line with "println":
jsonArray1.size()
should be:
jsonArray.size()
and the essential fix:
def object1 = jsonArray[1]
should be:
def object1 = jsonArray[0]
Your array is of size 1 and indexing starts at 0, so the only valid index is 0. // Can it be that easy? ;)
Then "object1" is again a JSONObject, so you can access the fields with a "." or as a map, for example like this:
object1.get('periodEnd')
I see your example contains errors, which led you to implement more complex JSON parsing solutions.
I rewrote your sample into a working version (at least for Grails 3.x):
String saveJSON // This holds the above JSON string.
def jsonObject = grails.converters.JSON.parse(saveJSON)
println jsonObject.timesheetList // output timesheetList structure
println jsonObject.timesheetList[0].timesheetRows[1] // output second element of timesheetRows array: [paycode:payCode2, task:painting, activityDescription:painting description]