Kafka Serializer JSON [duplicate]

This question already has answers here:
Writing Custom Kafka Serializer
I am new to Kafka, serialization and JSON.
What I want is for the producer to send a JSON file via Kafka and for the consumer to consume and work with the JSON file in its original form.
I was able to get it so the JSON is converted to a string and sent via a StringSerializer, and then the consumer would parse the string and recreate a JSON object, but I am worried that this isn't efficient or the correct method (it might lose the field types of the JSON).
So I looked into making a JSON serializer and setting that in my producer's configurations.
I used the JsonEncoder here : Kafka: writing custom serializer
But when I try to run my producer now, it seems that in the toBytes function of the encoder the try block never returns anything like I want it to:
try {
    bytes = objectMapper.writeValueAsString(object).getBytes();
} catch (JsonProcessingException e) {
    logger.error(String.format("Json processing failed for object: %s", object.getClass().getName()), e);
}
It seems objectMapper.writeValueAsString(object).getBytes() takes my JSON object ({"name":"Kate","age":25}) and converts it to nothing.
This is my producer's run function:
List<KeyedMessage<String,JSONObject>> msgList=new ArrayList<KeyedMessage<String,JSONObject>>();
JSONObject record = new JSONObject();
record.put("name", "Kate");
record.put("age", 25);
msgList.add(new KeyedMessage<String, JSONObject>(topic, record));
producer.send(msgList);
What am I missing? Would my original method (convert to a string, send it, and then rebuild the JSON object) be okay, or is it just not the correct way to go?
Thanks!

Hmm, why are you afraid that a serialize/deserialize step would cause data loss?
One option you have is to use the Kafka JSON serializer that's included in Confluent's Schema Registry, which is free and open source software (disclaimer: I work at Confluent). Its test suite provides a few examples to get you started, and further details are described at serializers and formatters. The benefit of this JSON serializer and the schema registry itself is that they provide transparent integration with producer and consumer clients for Kafka. Apart from JSON there's also support for Apache Avro if you need that.
IMHO this setup is one of the best options in terms of developer convenience and ease of use when talking to Kafka in JSON -- but of course YMMV!
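For reference, here is a minimal producer sketch along those lines, assuming the Confluent kafka-json-serializer artifact is on the classpath; the topic name and the User POJO are only illustrations, and the serializer class name is worth double-checking against the Confluent docs for your version.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class JsonProducerSketch {

    // Hypothetical POJO standing in for {"name":"Kate","age":25}
    public static class User {
        public String name;
        public int age;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // Confluent's JSON serializer: serializes any Jackson-compatible object
        props.put("value.serializer",
                "io.confluent.kafka.serializers.KafkaJsonSerializer");

        try (KafkaProducer<String, User> producer = new KafkaProducer<>(props)) {
            User user = new User();
            user.name = "Kate";
            user.age = 25;
            producer.send(new ProducerRecord<>("my-topic", user));
        }
    }
}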

I would suggest converting your event string, which is JSON, to a byte array, like:
byte[] eventBody = event.getBody();
This will improve your performance, and the Kafka consumer also provides a JSON parser which will help you get your JSON back.
Please let me know if any further information is required.
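For what it's worth, a minimal sketch of that round trip, assuming the event arrives as a JSON string and the topic is configured with ByteArraySerializer/ByteArrayDeserializer; the field names are illustrative.

import java.nio.charset.StandardCharsets;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class ByteArrayRoundTrip {
    public static void main(String[] args) throws Exception {
        String eventJson = "{\"name\":\"Kate\",\"age\":25}";

        // Producer side: send the raw UTF-8 bytes (pair with ByteArraySerializer)
        byte[] eventBody = eventJson.getBytes(StandardCharsets.UTF_8);

        // Consumer side: parse the bytes back into a JSON tree (pair with ByteArrayDeserializer)
        ObjectMapper mapper = new ObjectMapper();
        JsonNode event = mapper.readTree(eventBody);
        System.out.println(event.get("name").asText()); // prints Kate
    }
}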

Related

Kafka stream append to JSON as event enrichment

I have a producer that writes a JSON file to the topic to be read by a Kafka consumer stream. It's a simple key-value pair.
I want to stream the topic and enrich the event by adding/concatenating more JSON key-value rows and publish to another topic.
None of the values or keys have anything in common by the way.
I am probably overthinking this, but how would I get around this logic?
I suppose you want to decode the JSON message at the consumer side.
If you are not concerned about a schema but just want to deal with the JSON as a Map, you can use the Jackson library to read the JSON string as a Map<String,Object>. You can then add the fields that you want, convert it back to a JSON string, and push it to the new topic.
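A small sketch of that Map approach with Jackson; the enrichment keys and values are made up for illustration.

import java.util.Map;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;

public class EnrichAsMap {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        String incoming = "{\"name\":\"Kate\",\"age\":25}"; // value read from the source topic

        // Read the JSON string as a generic Map
        Map<String, Object> event =
                mapper.readValue(incoming, new TypeReference<Map<String, Object>>() {});

        // Add the enrichment fields (illustrative keys/values)
        event.put("source", "orders-service");
        event.put("enrichedAt", System.currentTimeMillis());

        // Serialize back to a JSON string and send it to the output topic
        String outgoing = mapper.writeValueAsString(event);
        System.out.println(outgoing);
    }
}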
If you want to have a schema, you need to store information about which class the message maps to, or the JSON schema, or some id that maps to it; then the following could work.
Store the schema info in headers
For example, you can store the JSON schema or Java class name in the headers of the message while producing and write a deserializer to extract that information from the headers and decode it.
Deserializer#deserialize() has a Headers argument:
default T deserialize(java.lang.String topic,
                      Headers headers,
                      byte[] data)
and you can do something like:
objectMapper.readValue(data,
    Class.forName(
        new String(headers.lastHeader("classname").value())));
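Putting that together, a sketch of what such a deserializer could look like; the "classname" header and the class name here are illustrative, not an existing serializer.

import java.util.Map;
import org.apache.kafka.common.header.Headers;
import org.apache.kafka.common.serialization.Deserializer;
import com.fasterxml.jackson.databind.ObjectMapper;

public class HeaderAwareJsonDeserializer implements Deserializer<Object> {

    private final ObjectMapper objectMapper = new ObjectMapper();

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
    }

    @Override
    public Object deserialize(String topic, byte[] data) {
        // No headers available, so no type information; fall back to a generic tree
        try {
            return objectMapper.readTree(data);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public Object deserialize(String topic, Headers headers, byte[] data) {
        try {
            String className = new String(headers.lastHeader("classname").value());
            return objectMapper.readValue(data, Class.forName(className));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void close() {
    }
}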
Use schema registry
Apart from these, there is also a schema registry from Confluent which can maintain different versions of the schema. You would need to run another process for that, though. If you are going to use this, you may want to look at the subject naming strategy and set it to RecordNameStrategy since you have multiple schemas in the same topic.
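For a schema-registry-aware serializer, the relevant configuration is roughly the following; the property names come from Confluent's serializer configuration and are worth verifying against the docs for your version.

import java.util.Properties;

public class SubjectStrategyConfig {
    public static Properties registryProps() {
        Properties props = new Properties();
        // Point the serializer at the schema registry
        props.put("schema.registry.url", "http://localhost:8081");
        // Name subjects after the record type rather than the topic,
        // so several schemas can live in the same topic
        props.put("value.subject.name.strategy",
                "io.confluent.kafka.serializers.subject.RecordNameStrategy");
        return props;
    }
}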

Kafka Streams API - deserialize dynamically generated JSON

I've been trying to find a solution to the following situation, to no avail:
I have a Kafka Streams application which should read from a single input topic a series of JSON objects, all not exactly the same as one another. Practically speaking, each JSON is a representation of an HTTP request object, thus not all JSON records have the same headers, request parameters, cookies and so forth. Furthermore, the JSON objects are written
Is there any way to achieve this? I'm not expecting any detailed how-to solutions, only some leads on how I can achieve this, as my search over the internet has turned up nothing so far.
Here's an idea: use Jackson's tree model to dynamically parse your JSON into a JsonNode, and then use this tree representation in your Kafka Streams topology to process the requests.
ObjectMapper objectMapper = new ObjectMapper();
JsonNode rootNode = objectMapper.readTree(json);
...
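For instance, a rough topology sketch along those lines; the topic names and the probed fields are made up, the point is only that readTree copes with whatever shape arrives.

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

public class DynamicJsonTopology {
    public static StreamsBuilder build() {
        ObjectMapper mapper = new ObjectMapper();
        StreamsBuilder builder = new StreamsBuilder();

        // Read the raw JSON as plain strings
        KStream<String, String> requests =
                builder.stream("http-requests", Consumed.with(Serdes.String(), Serdes.String()));

        requests
            .mapValues(value -> {
                try {
                    // Parse whatever shape arrives into a generic tree
                    JsonNode root = mapper.readTree(value);
                    // Fields that may or may not exist can be probed with path()
                    return root.path("headers").path("User-Agent").asText("unknown");
                } catch (Exception e) {
                    return "unparseable";
                }
            })
            .to("user-agents", Produced.with(Serdes.String(), Serdes.String()));

        return builder;
    }
}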

How to write the output from DB to a JSON file using Spring batch?

I am new to Spring Batch and there is a requirement for me to read data from a DB and write it out in JSON format. What's the best way to do this? Are there any APIs, or do we need to write a custom writer, or do I need to use JSON libraries such as Gson or Jackson? Please guide me.
To read data from a relational database, you can use one of the database readers. You can find an example in the spring-batch-samples repository.
To write JSON data, Spring Batch 4.1.0.RC1 provides the JsonFileItemWriter that allows you to write JSON data to a file. It collaborates with a JsonObjectMarshaller to marshal an object to JSON format. Spring Batch provides support for both Gson and Jackson libraries (you need to have the one you want to use in the classpath). You can find more details here.
Hope this helps.
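As an illustration, a writer along those lines might look like this; the Person item type and the file path are placeholders.

import org.springframework.batch.item.json.JacksonJsonObjectMarshaller;
import org.springframework.batch.item.json.JsonFileItemWriter;
import org.springframework.batch.item.json.builder.JsonFileItemWriterBuilder;
import org.springframework.core.io.FileSystemResource;

public class JsonWriterConfig {

    // Hypothetical item type produced by the database reader
    public static class Person {
        public String name;
        public int age;
    }

    public static JsonFileItemWriter<Person> jsonFileItemWriter() {
        return new JsonFileItemWriterBuilder<Person>()
                .name("personJsonWriter")
                .resource(new FileSystemResource("output/persons.json"))
                // Jackson-based marshaller; a Gson-based one is also available
                .jsonObjectMarshaller(new JacksonJsonObjectMarshaller<>())
                .build();
    }
}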
You do not need the Gson or Jackson libraries if your DB supports JSON.
Example: in SQL Server there is an option to get data out of the DB as a JSON string instead of a result set.
Reference - https://learn.microsoft.com/en-us/sql/relational-databases/json/format-query-results-as-json-with-for-json-sql-server?view=sql-server-2017
https://learn.microsoft.com/en-us/sql/relational-databases/json/format-nested-json-output-with-path-mode-sql-server?view=sql-server-2017
Example - select (select * from tableName for json path) as jsonString;
This already gives you the output as a JSON string, which you can write to a file.
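If you still want to run this through Spring Batch, a rough sketch is a reader that returns the JSON string and a pass-through writer; the table name and output path are placeholders.

import javax.sql.DataSource;
import org.springframework.batch.item.database.JdbcCursorItemReader;
import org.springframework.batch.item.database.builder.JdbcCursorItemReaderBuilder;
import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.batch.item.file.builder.FlatFileItemWriterBuilder;
import org.springframework.batch.item.file.transform.PassThroughLineAggregator;
import org.springframework.core.io.FileSystemResource;
import org.springframework.jdbc.core.SingleColumnRowMapper;

public class ForJsonStep {

    // Reads the JSON string that SQL Server builds with FOR JSON
    public static JdbcCursorItemReader<String> reader(DataSource dataSource) {
        return new JdbcCursorItemReaderBuilder<String>()
                .name("forJsonReader")
                .dataSource(dataSource)
                .sql("select (select * from tableName for json path) as jsonString")
                .rowMapper(new SingleColumnRowMapper<>(String.class))
                .build();
    }

    // Writes each JSON string straight to the output file
    public static FlatFileItemWriter<String> writer() {
        return new FlatFileItemWriterBuilder<String>()
                .name("jsonStringWriter")
                .resource(new FileSystemResource("output/result.json"))
                .lineAggregator(new PassThroughLineAggregator<>())
                .build();
    }
}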

Kafka Connect transforming JSON string to actual JSON

I'm trying to figure out whether it's possible to transform JSON values that are stored as strings into actual JSON structures using Kafka Connect.
I tried looking for such a transformation but couldn't find one. As an example, this could be the source:
{
  "UserID": 2105058535,
  "DocumentID": 2105058535,
  "RandomJSON": "{\"Tags\":[{\"TagID\":1,\"TagName\":\"Java\"},{\"TagID\":2,\"TagName\":\"Kafka\"}]}"
}
And this is my goal:
{
  "UserID": 2105058535,
  "DocumentID": 2105058535,
  "RandomJSON": {
    "Tags": [
      {
        "TagID": 1,
        "TagName": "Java"
      },
      {
        "TagID": 2,
        "TagName": "Kafka"
      }
    ]
  }
}
I'm trying to make these transformations for Elasticsearch sink connector if it makes a difference.
I know I can use Logstash together with JSON filter in order to do this, but I'd like to know whether there's a way to do it using just Kafka Connect.
Sounds like this would be a Single Message Transform (thus applicable to any connector, not just ES), but there aren't any out of the box doing what you describe. The API is documented here.
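If you do decide to write your own, a skeleton for such a transform could look roughly like this; it handles schemaless (Map) values only, and the class name and hard-coded field are illustrative, this is not an existing transform.

import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.ConnectRecord;
import org.apache.kafka.connect.transforms.Transformation;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;

public class ExpandJsonField<R extends ConnectRecord<R>> implements Transformation<R> {

    private static final String FIELD = "RandomJSON";
    private final ObjectMapper mapper = new ObjectMapper();

    @Override
    @SuppressWarnings("unchecked")
    public R apply(R record) {
        if (!(record.value() instanceof Map)) {
            return record; // only handle schemaless map values
        }
        Map<String, Object> value = new HashMap<>((Map<String, Object>) record.value());
        Object raw = value.get(FIELD);
        if (raw instanceof String) {
            try {
                // Replace the JSON string with its parsed structure
                value.put(FIELD, mapper.readValue((String) raw,
                        new TypeReference<Map<String, Object>>() {}));
            } catch (Exception e) {
                // leave the field as a string if it is not valid JSON
            }
        }
        return record.newRecord(record.topic(), record.kafkaPartition(),
                record.keySchema(), record.key(), null, value, record.timestamp());
    }

    @Override
    public ConfigDef config() {
        return new ConfigDef();
    }

    @Override
    public void configure(Map<String, ?> configs) {
    }

    @Override
    public void close() {
    }
}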
I had a similar issue but in reverse. I had the data in Json and I needed to convert some of it into a Json string representation to store it in Cassandra using the Cassandra Sink. I ended up creating a Kafka stream app that reads from the topic and then output the Json object to another topic that is read by the connector.
topic document -> read by your Kafka Streams app with a call to mapValues (or by creating a Jackson POJO that serializes the way you want), then write the new value to -> topic document.elasticsearch
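A rough sketch of that streams app, using Jackson to replace the embedded string with its parsed structure; error handling is kept minimal.

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

public class DocumentEnricherTopology {

    public static StreamsBuilder build() {
        ObjectMapper mapper = new ObjectMapper();
        StreamsBuilder builder = new StreamsBuilder();

        builder.stream("document", Consumed.with(Serdes.String(), Serdes.String()))
               .mapValues(value -> {
                   try {
                       ObjectNode root = (ObjectNode) mapper.readTree(value);
                       if (root.has("RandomJSON") && root.get("RandomJSON").isTextual()) {
                           // Replace the embedded JSON string with the parsed structure
                           root.set("RandomJSON", mapper.readTree(root.get("RandomJSON").asText()));
                       }
                       return mapper.writeValueAsString(root);
                   } catch (Exception e) {
                       return value; // pass the record through unchanged if parsing fails
                   }
               })
               .to("document.elasticsearch", Produced.with(Serdes.String(), Serdes.String()));

        return builder;
    }
}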
You can use the FromJson transformation from the kafka-connect-json-schema project.
Please check this link for more details:
https://jcustenborder.github.io/kafka-connect-documentation/projects/kafka-connect-json-schema/transformations/examples/FromJson.inline.html

Grails - Error saving JSONObject to MongoDB

I am having trouble saving a JSONObject to a MongoDB database using the MongoDB plugin.
I receive the message:
Can't find a codec for class org.codehaus.groovy.grails.web.json.JSONObject..
This is very frustrating because I am using the JSON parser to load JSON data but can't persist this JSON data to MongoDB, which should be straightforward.
Is there a built-in way to convert a JSONObject to a normal Map? I've tried casting it using asType(Map), (Map), and even using toString() and then trying to convert back from string to object. I've seen that other vanilla Java questions involve using Jackson, but I'm hoping there is a Groovier way to do this rather than importing a whole new library for just two lines of code.
This is what I'm doing for now:
Converting the JSONObject to a string and then using com.mongodb.util.JSON.parse() to convert that string to a DBObject that Mongo can use.
It's not the best but it works for now.
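In code, the workaround boils down to something like this (shown in plain Java; the helper class is only for illustration):

import com.mongodb.DBObject;
import com.mongodb.util.JSON;
import org.codehaus.groovy.grails.web.json.JSONObject;

public class JsonObjectToDbObject {

    // JSONObject -> JSON text -> DBObject that the Mongo driver can persist
    public static DBObject toDbObject(JSONObject jsonObject) {
        return (DBObject) JSON.parse(jsonObject.toString());
    }
}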
I'm not going to accept this answer because I don't think it's the right answer.
Not saying this is the correct answer, but I was able to convert the JSONObject to a HashMap. For my situation I had a Domain object with an ArrayList (converted from JSONArray by a previous JSONTranslationService) and I was able to convert each of the internal JSONObjects using something like this:
static final UNMARSHAL = { thing ->
    thing.objects.collect {
        it as HashMap
    }
}
I'm only experiencing this issue after an upgrade from mongodb:3.0.2 to 6.1.2 to support MongoDB 3.4. Are you also running this version of the plugin? If so, I think it's fair to say that there's either a bug in the plugin (I'm already aware of one) or something changed with the default behavior and wasn't documented.