Get JSON as input in Apache Flink - json

I am trying to receive and access JSON data from a Kafka topic in Flink. What works is producing data, sending it to a Kafka topic, and receiving it in Flink as a String. But I want to access the data in an object-oriented way (e.g. extract a specific attribute from every message).
Therefore I have a Kafka Producer which sends data (e.g. every 1s) to a Kafka Topic:
ObjectMapper test = new ObjectMapper();
ObjectNode jNode= test.createObjectNode();
jNode.put("LoPos", longPos)
.put("LaPos", latPos)
.put("Timestamp", timestamp.toString());
ProducerRecord<String, ObjectNode> rec = new ProducerRecord<String, ObjectNode>(topicName, jNode);
producer.send(rec);
so the JSON data looks like this:
{"LoPos":10.5,"LaPos":2.5,"Timestamp":"2022-10-31 12:45:19.353"}
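For reference, a payload like this can be parsed field-by-field with plain Jackson, independent of Flink. This is only a sketch; the class and method names are illustrative:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class PayloadParseSketch {
    // Reads one JSON message and pulls out a single attribute.
    static double extractLoPos(String json) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        JsonNode node = mapper.readTree(json);
        return node.get("LoPos").asDouble();
    }
}
```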
What works is receiving the data and printing it as a string:
DataStream<String> input =
    env.fromSource(
        KafkaSource.<String>builder()
            .setBootstrapServers("localhost:9092")
            .setBounded(OffsetsInitializer.latest())
            .setValueOnlyDeserializer(new SimpleStringSchema())
            .setTopics(topicName)
            .build(),
        WatermarkStrategy.noWatermarks(),
        "kafka-source");
Printing the data as a string:
DataStream<String> parsed = input.map(new MapFunction<String, String>() {
    private static final long serialVersionUID = -6867736771747690202L;

    @Override
    public String map(String value) {
        System.out.println(value);
        return "test";
    }
});
How can I receive the data in Flink and access it in an object-oriented way (e.g. extract LoPos from every message)? Which approach would you recommend? I tried it with JSONValueDeserializationSchema, but without success...
Thanks!
Update1:
I updated to Flink 1.16 to use JsonDeserializationSchema.
Then I created a Flink Pojo Event like this:
public class Event {
    public double LoPos;
    public double LaPos;
    public Timestamp timestamp;

    public Event() {}

    public Event(final double LoPos, final double LaPos, final Timestamp timestamp) {
        this.LaPos = LaPos;
        this.LoPos = LoPos;
        this.timestamp = timestamp;
    }

    @Override
    public String toString() {
        return String.valueOf(LaPos);
    }
}
To read the JSON data, I implemented the following:
KafkaSource<Event> source = KafkaSource.<Event>builder()
    .setBootstrapServers("localhost:9092")
    .setBounded(OffsetsInitializer.earliest())
    .setValueOnlyDeserializer(new JsonDeserializationSchema<>(Event.class))
    .setTopics("testTopic2")
    .build();
DataStream<Event> test = env.fromSource(source, WatermarkStrategy.noWatermarks(), "test");
System.out.println(source.toString());
System.out.println(test.toString());
//test.sinkTo(new PrintSink<>());
test.print();
env.execute();
So I would expect that, when using source.toString(), the value of LaPos would be returned. But all I get is:
org.apache.flink.connector.kafka.source.KafkaSource#510f3d34
What am I doing wrong?

This topic is covered in one of the recipes in the Immerok Apache Flink Cookbook.
In the examples below, I'm assuming Event is a Flink POJO.
With Flink 1.15 or earlier, you should use a custom deserializer:
KafkaSource<Event> source =
    KafkaSource.<Event>builder()
        .setBootstrapServers("localhost:9092")
        .setTopics(TOPIC)
        .setStartingOffsets(OffsetsInitializer.earliest())
        .setValueOnlyDeserializer(new EventDeserializationSchema())
        .build();
The deserializer can be something like this:
public class EventDeserializationSchema extends AbstractDeserializationSchema<Event> {

    private static final long serialVersionUID = 1L;

    private transient ObjectMapper objectMapper;

    /**
     * For performance reasons it's better to create one ObjectMapper in this open
     * method rather than creating a new ObjectMapper for every record.
     */
    @Override
    public void open(InitializationContext context) {
        // JavaTimeModule is needed for Java 8 date/time (Instant) support
        objectMapper = JsonMapper.builder().build().registerModule(new JavaTimeModule());
    }

    /**
     * If our deserialize method needed access to the information in the Kafka headers of a
     * KafkaConsumerRecord, we would have implemented a KafkaRecordDeserializationSchema instead of
     * extending AbstractDeserializationSchema.
     */
    @Override
    public Event deserialize(byte[] message) throws IOException {
        return objectMapper.readValue(message, Event.class);
    }
}
We've made this easier in Flink 1.16, where we've added a proper JsonDeserializationSchema you can use:
KafkaSource<Event> source =
    KafkaSource.<Event>builder()
        .setBootstrapServers("localhost:9092")
        .setTopics(TOPIC)
        .setStartingOffsets(OffsetsInitializer.earliest())
        .setValueOnlyDeserializer(new JsonDeserializationSchema<>(Event.class))
        .build();
Disclaimer: I work for Immerok.


Flink Kafka - Custom Class Data is always null

Custom Class
Person
class Person
{
    private Integer id;
    private String name;
    //getters and setters
}
Kafka Flink Connector
TypeInformation<Person> info = TypeInformation.of(Person.class);
TypeInformationSerializationSchema schema = new TypeInformationSerializationSchema(info, new ExecutionConfig());
DataStream<Person> input = env.addSource( new FlinkKafkaConsumer08<>("persons", schema , getKafkaProperties()));
Now if I send the below JSON
{ "id" : 1, "name": Synd }
through the Kafka console producer, the Flink code throws a NullPointerException.
But if I use SimpleStringSchema instead of the custom schema defined above, the stream gets printed.
What is wrong in the above setup?
The TypeInformationSerializationSchema is a de-/serialization schema which uses Flink's serialization stack and, thus, also its serializers. Therefore, when using this SerializationSchema, Flink expects that the data has been serialized with Flink's serializer for the Person type.
Given the excerpt of the Person class, Flink will most likely use its PojoTypeSerializer. JSON input data won't be understood by this serializer.
If you want to use JSON as the input format, then you have to define your own DeserializationSchema which can parse JSON into Person.
Answer for those who have the same question:
Custom Serializer
class PersonSchema implements DeserializationSchema<Person> {

    private ObjectMapper mapper = new ObjectMapper(); // com.fasterxml.jackson.databind.ObjectMapper

    @Override
    public Person deserialize(byte[] bytes) throws IOException {
        return mapper.readValue(bytes, Person.class);
    }

    @Override
    public boolean isEndOfStream(Person person) {
        return false;
    }

    @Override
    public TypeInformation<Person> getProducedType() {
        return TypeInformation.of(new TypeHint<Person>(){});
    }
}
Using the schema
DataStream<Person> input = env.addSource( new FlinkKafkaConsumer08<>("persons", new PersonSchema() , getKafkaProperties()));

JSON Patch Request validation in Java

In my Spring Boot service, I'm using https://github.com/java-json-tools/json-patch for handling PATCH requests.
Everything seems to be OK except for a way to avoid modifying immutable fields like object IDs, creation_time, etc. I have found a similar question on GitHub (https://github.com/java-json-tools/json-patch/issues/21) for which I could not find the right example.
This blog seems to give some interesting solutions about validating JSON Patch requests, with a solution in Node.js. It would be good to know if something similar already exists in Java.
Under many circumstances you can just patch an intermediate object which only has fields that the user can write to. After that you could quite easily map the intermediate object to your entity, using some object mapper or just manually.
The downside of this is that if you have a requirement that fields must be explicitly nullable, you won’t know if the patch object set a field to null explicitly or if it was never present in the patch.
What you can do too is abuse Optionals for this, e.g.
public class ProjectPatchDTO {
    private Optional<@NotBlank String> name;
    private Optional<String> description;
}
Although Optionals were not intended to be used like this, it's the most straightforward way to implement patch operations while maintaining typed input. When the Optional field itself is null, the field was never passed by the client; when the Optional is non-null but empty, the client explicitly set the value to null.
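The three-state semantics can then be applied when merging the patch into the entity, for example with a small helper like this (a sketch; PatchMerge and its usage are illustrative, not part of any library):

```java
import java.util.Optional;
import java.util.function.Consumer;

public class PatchMerge {
    // null Optional  -> field absent from the patch: keep the current value
    // empty Optional -> client explicitly sent null: clear the field
    // present        -> client sent a new value: set it
    public static <T> void apply(Optional<T> patchField, Consumer<T> setter) {
        if (patchField == null) {
            return; // field was never passed by the client
        }
        setter.accept(patchField.orElse(null));
    }
}
```

Calling something like `PatchMerge.apply(dto.getName(), entity::setName)` per field preserves the absent-vs-explicit-null distinction without touching fields the client omitted.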
Instead of receiving a JsonPatch directly from the client, define a DTO to handle the validation, and then later convert the DTO instance to a JsonPatch.
Say you want to update an instance of User.class; you can define a DTO such as:
public class UserDTO {

    @Email(message = "The provided email is invalid")
    private String username;

    @Size(min = 2, max = 10, message = "firstname should have at least 2 and a maximum of 10 characters")
    private String firstName;

    @Size(min = 2, max = 10, message = "lastname should have at least 2 and a maximum of 10 characters")
    private String lastName;

    @Override
    public String toString() {
        return new Gson().toJson(this);
    }

    //getters and setters
}
The custom toString method ensures that fields that are not included in the update request are not prefilled with null values.
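This works because Gson omits null fields by default, so only the fields the client actually sent survive the round trip. A quick illustration (a sketch; the Pojo class is made up for the example):

```java
import com.google.gson.Gson;

public class NullSkipSketch {
    // A made-up DTO with one field left null.
    static class Pojo {
        String firstName = "Ada";
        String lastName;           // never set: stays null
    }

    // Gson's default configuration skips null fields entirely.
    static String serialize() {
        return new Gson().toJson(new Pojo());
    }
}
```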
Your PATCH request can be as follows (for simplicity, I didn't cater for exceptions):
@PatchMapping("/{id}")
ResponseEntity<Object> updateUser(@RequestBody @Valid UserDTO request,
        @PathVariable String id) throws ParseException, IOException, JsonPatchException {
    User oldUser = userRepository.findById(id);
    String detailsToUpdate = request.toString();
    User newUser = applyPatchToUser(detailsToUpdate, oldUser);
    userRepository.save(newUser);
    return userService.updateUser(request, id);
}
The following method returns the patched User which is updated above in the controller.
private User applyPatchToUser(String detailsToUpdate, User oldUser) throws IOException, JsonPatchException {
    ObjectMapper objectMapper = new ObjectMapper();
    // Parse the patch to JsonNode
    JsonNode patchNode = objectMapper.readTree(detailsToUpdate);
    // Create the patch
    JsonMergePatch patch = JsonMergePatch.fromJson(patchNode);
    // Convert the original object to JsonNode
    JsonNode originalObjNode = objectMapper.valueToTree(oldUser);
    // Apply the patch
    JsonNode patchedObjNode = patch.apply(originalObjNode);
    // Convert the patched node to an updated object
    return objectMapper.treeToValue(patchedObjNode, User.class);
}
Another solution would be to imperatively deserialize and validate the request body.
So your example DTO might look like this:
public class CatDto {

    @NotBlank
    private String name;

    @Min(0)
    @Max(100)
    private int laziness;

    @Max(3)
    private int purringVolume;
}
And your controller can be something like this:
@RestController
@RequestMapping("/api/cats")
@io.swagger.v3.oas.annotations.parameters.RequestBody(
        content = @Content(schema = @Schema(implementation = CatDto.class)))
// ^^ this passes your CatDto model to swagger (you must use springdoc to get it to work!)
public class CatController {

    @Autowired
    SmartValidator validator; // we'll use this to validate our request

    @PatchMapping(path = "/{id}", consumes = "application/json")
    public ResponseEntity<String> updateCat(
            @PathVariable String id,
            @RequestBody Map<String, Object> body
            // ^^ no @Valid annotation, no declarative DTO binding here!
    ) throws MethodArgumentNotValidException {
        CatDto catDto = new CatDto();
        WebDataBinder binder = new WebDataBinder(catDto);
        BindingResult bindingResult = binder.getBindingResult();
        binder.bind(new MutablePropertyValues(body));
        // ^^ imperatively bind to DTO
        body.forEach((k, v) -> validator.validateValue(CatDto.class, k, v, bindingResult));
        // ^^ imperatively validate user input
        if (bindingResult.hasErrors()) {
            throw new MethodArgumentNotValidException(null, bindingResult);
            // ^^ this can be handled by your regular exception handler
        }
        // Here you can do normal stuff with your cat DTO.
        // Map it to cat model, send to cat service, whatever.
        return ResponseEntity.ok("cat updated");
    }
}
No need for Optionals, no extra dependencies, your normal validation just works, your swagger looks good. The only problem is you don't get a proper merge patch on nested objects, but in many use cases that's not even required.

Query for JSON String using JdbcTemplate to neo4j?

I want to use a JdbcTemplate and the Neo4j JDBC driver to query my neo4j database and return a JSON string.
Is there an existing method to do this?
I've googled and I can't find one.
It otherwise looks like a matter of creating a home-cooked RowMapper, as per here.
The query:
MATCH (s:Site) - [r] - (ss:SiteState) return s,ss;
It returns JSON, but for my use case I map it to an object:
public class SiteRowMapper implements RowMapper<Site> {

    @Override
    public Site mapRow(ResultSet rs, int rowNum) throws SQLException {
        Gson json = new Gson();
        Site site = json.fromJson(rs.getString("s"), Site.class);
        SiteState siteState = json.fromJson(rs.getString("ss"), SiteState.class);
        site.setName(siteState.getName());
        return site;
    }
}

Jersey / JAXB: Unmarshaling of empty json array results in a list with one item where all fields are set to null

I have a really simple REST web service returning a list of questions. This code works as expected when the number of questions returned is greater than zero. But if the server returns an empty JSON array like [], JAXB creates a list with one Question instance where all fields are set to null!
I'm new to both Jersey and JAXB so I don't know whether I haven't configured it correctly or whether this is a known problem. Any tips?
Client configuration:
DefaultApacheHttpClientConfig config = new DefaultApacheHttpClientConfig();
config.getProperties().put(DefaultApacheHttpClientConfig.PROPERTY_HANDLE_COOKIES, true);
config.getClasses().add(JAXBContextResolver.class);
//config.getClasses().add(JacksonJsonProvider.class); // <- Jackson causes other problems
client = ApacheHttpClient.create(config);
JAXBContextResolver:
@Provider
public final class JAXBContextResolver implements ContextResolver<JAXBContext> {

    private final JAXBContext context;
    private final Set<Class> types;
    private final Class[] cTypes = { Question.class };

    public JAXBContextResolver() throws Exception {
        this.types = new HashSet(Arrays.asList(cTypes));
        this.context = new JSONJAXBContext(JSONConfiguration.natural().build(), cTypes);
    }

    @Override
    public JAXBContext getContext(Class<?> objectType) {
        return (types.contains(objectType)) ? context : null;
    }
}
Client code:
public List<Question> getQuestionsByGroupId(int id) {
    return digiRest.path("/questions/byGroupId/" + id).get(new GenericType<List<Question>>() {});
}
The Question class is just a simple pojo.
I know this is not exactly an answer to your question, but I chose to use GSON on top of Jersey for my current projects (and I try to avoid JAXB as much as possible), and I found it very easy and resilient.
You just have to declare
@Consumes(MediaType.TEXT_PLAIN)
or
@Produces(MediaType.TEXT_PLAIN)
or both, use the GSON marshaller/unmarshaller, and work with plain Strings. Very easy to debug and unit test, too.
Using Jackson may help.
See org.codehaus.jackson.map.ObjectMapper and org.codehaus.jackson.map.annotate.JsonSerialize.Inclusion.NON_EMPTY
import org.codehaus.jackson.map.ObjectMapper;
import org.codehaus.jackson.map.annotate.JsonSerialize;
public class SampleContextResolver implements ContextResolver<ObjectMapper>
{
    @Override
    public ObjectMapper getContext(Class<?> type)
    {
        ObjectMapper mapper = new ObjectMapper();
        mapper.setSerializationConfig(mapper.getSerializationConfig()
            .withSerializationInclusion(JsonSerialize.Inclusion.NON_EMPTY));
        return mapper;
    }
}

How to reuse Jersey's JSON/JAXB for serialization?

I have a JAX-RS REST service implemented using Jersey. One of the cool features of JAX-RS/Jersey is how easily a POJO can be turned into a REST service, simply by sprinkling a few Java annotations... including a trivially easy mechanism for translating POJOs to JSON - using JAXB annotations.
Now, I'd like to be able to take advantage of this cool JSON-ifying functionality for non-REST purposes - I'd love to be able to just serialize some of these objects to disk, as JSON text. Here's an example JAXB object that I'd want to serialize:
@XmlRootElement(name = "user")
public class UserInfoImpl implements UserInfo {

    public UserInfoImpl() {}

    public UserInfoImpl(String user, String details) {
        this.user = user;
        this.details = details;
    }

    public String getUser() { return user; }
    public void setUser(String user) { this.user = user; }
    public String getDetails() { return details; }
    public void setDetails(String details) { this.details = details; }

    private String user;
    private String details;
}
Jersey can turn one of these into json with no additional info. I'm wondering if Jersey has exposed this functionality in the API for needs like mine? I've had no luck finding it so far...
Thanks!
UPDATE 2009-07-09: I have learned that I can use the Providers object to do almost exactly what I want:
@Context Providers ps;
MessageBodyWriter uw = ps.getMessageBodyWriter(UserInfoImpl.class, UserInfoImpl.class, new Annotation[0], MediaType.APPLICATION_JSON_TYPE);
uw.writeTo(....)
...This writes the object as JSON to any OutputStream, which would be perfect for me, but I can only get at the Providers object using @Context from a @Component object. Does anyone know how to access it from a regular, un-annotated POJO? Thanks!
Jersey uses a couple of different frameworks depending on whether you use mapped(), badgerfish(), or natural() notation. Natural is usually the one people want, and it's implemented using the very good (and very fast) standalone Jackson JSON processor, I believe, which goes from Object->JAXB->JSON. However, Jackson also provides its own JAX-RS provider to go directly Object->JSON.
In fact, they even added support for JAXB annotations. Have a look at
http://wiki.fasterxml.com/JacksonJAXBAnnotations
I think that's ultimately what you are looking for. Jackson does Object<->JSON processing...Jersey just makes the calls for you
Here's a simple brief example of using JAXB to map objects to JSON (using Jackson):
http://ondra.zizka.cz/stranky/programovani/java/jaxb-json-jackson-howto.texy
ObjectMapper mapper = new ObjectMapper();
String str = mapper.writeValueAsString(pojoObject);
JAXB annotations work fine when serializing to XML.
The main problem is that JAXB does not support empty arrays. So when serializing something like this...
List<String> myArray = new ArrayList<>();
...to JSON via JAXB annotations, all your empty arrays become null instead of [].
To solve this you can just serialize your pojos directly to json via jackson.
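A minimal illustration of the difference (a sketch using plain Jackson, which serializes an empty collection as [] rather than null):

```java
import java.util.ArrayList;
import java.util.List;
import com.fasterxml.jackson.databind.ObjectMapper;

public class EmptyArraySketch {
    // With plain Jackson, an empty list round-trips as [], not null.
    static String serializeEmptyList() throws Exception {
        List<String> myArray = new ArrayList<>();
        return new ObjectMapper().writeValueAsString(myArray);
    }
}
```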
Take a look at this from Jersey's user guide:
http://jersey.java.net/nonav/documentation/latest/user-guide.html#d0e1959
This is the best way to use the Jackson provider without JAXB. Moreover, you can always use the latest version of Jackson by downloading jackson-all-x.y.z.jar from its website.
This method will not interfere with your JAXB annotations, so I would suggest giving it a try!
Since Jersey is a reference implementation of JAX-RS, and JAX-RS is focused completely on providing a standard way of implementing the endpoint for the REST service, the issue of serializing the payload is left to other standards.
I think that if they had included object serialization in the JAX-RS standard, it would quickly become a large multi-headed beast that would be difficult to implement and lose some of its focus.
I appreciate how focused Jersey is on delivering clean and simple to use REST endpoints. In my case I've just subclassed a parent that has all the JAXB plumbing in it so marshalling objects between binary and XML is very clean.
With a little Jersey specific bootstrapping, you can use it to create the necessary JSON objects for you. You need to include the following dependencies (you can use bundle, but it will cause problems if you are using Weld for testing):
<dependency>
    <groupId>com.sun.jersey</groupId>
    <artifactId>jersey-json</artifactId>
    <version>1.12</version>
</dependency>
<dependency>
    <groupId>com.sun.jersey</groupId>
    <artifactId>jersey-client</artifactId>
    <version>1.12</version>
</dependency>
From there you can create a JAXB annotated class. The following is an example:
@XmlRootElement
public class TextMessage {

    private String text;

    public String getText() { return text; }
    public void setText(String s) { this.text = s; }
}
Then you can create the following unit test:
TextMessage textMessage = new TextMessage();
textMessage.setText("hello");
// Jersey specific start
final Providers ps = new Client().getProviders();
// Jersey specific end
final MultivaluedMap<String, Object> responseHeaders = new MultivaluedMap<String, Object>() {

    @Override
    public void add(final String key, final Object value) {
    }

    @Override
    public void clear() {
    }

    @Override
    public boolean containsKey(final Object key) {
        return false;
    }

    @Override
    public boolean containsValue(final Object value) {
        return false;
    }

    @Override
    public Set<java.util.Map.Entry<String, List<Object>>> entrySet() {
        return null;
    }

    @Override
    public List<Object> get(final Object key) {
        return null;
    }

    @Override
    public Object getFirst(final String key) {
        return null;
    }

    @Override
    public boolean isEmpty() {
        return false;
    }

    @Override
    public Set<String> keySet() {
        return null;
    }

    @Override
    public List<Object> put(final String key, final List<Object> value) {
        return null;
    }

    @Override
    public void putAll(
            final Map<? extends String, ? extends List<Object>> m) {
    }

    @Override
    public void putSingle(final String key, final Object value) {
    }

    @Override
    public List<Object> remove(final Object key) {
        return null;
    }

    @Override
    public int size() {
        return 0;
    }

    @Override
    public Collection<List<Object>> values() {
        return null;
    }
};
final MessageBodyWriter<TextMessage> messageBodyWriter = ps
        .getMessageBodyWriter(TextMessage.class, TextMessage.class,
                new Annotation[0], MediaType.APPLICATION_JSON_TYPE);
final ByteArrayOutputStream baos = new ByteArrayOutputStream();
Assert.assertNotNull(messageBodyWriter);
messageBodyWriter.writeTo(textMessage, TextMessage.class,
        TextMessage.class, new Annotation[0],
        MediaType.APPLICATION_JSON_TYPE, responseHeaders, baos);
final String jsonString = new String(baos.toByteArray());
Assert.assertTrue(jsonString.contains("\"text\":\"hello\""));
The advantage to this approach is it keeps everything within the JEE6 API, no external libraries are explicitly needed except for testing and getting the providers. However, you need to create an implementation of MultivaluedMap since there is nothing provided in the standard and we don't actually use it. It may also be slower than GSON, and a lot more complicated than necessary.
I understand XML views but it would have shown some foresight to require JSON support for POJOs as standard equipment. Having to doctor up JSON identifiers with special characters makes no sense if your implementation is JSON and your client is a JavaScript RIA.
Also, note that Java Beans are NOT POJOs. I would like to use something like this on the outer surface of my web tier:
public class Model
{
    @Property height;
    @Property weight;
    @Property age;
}
No default constructor, no getter/setter noise, just a POJO with my own annotations.