Kafka Connect XML source and JSON sink

Is there a way in Kafka Connect to consume an XML source, convert it to JSON, and send the JSON data to a sink?
I have seen Avro and Protobuf converters in Kafka Connect. Are they capable of converting XML to JSON, or would they convert to Avro- and Protobuf-specific formats rather than JSON?

Kafka Connect can work with any data format. However, there is no built-in converter for XML, so you can just use StringConverter to treat the payload as a plain string.
Then you can use a transform to parse the XML into the internal format Connect works with, known as Struct. See, for example: https://jcustenborder.github.io/kafka-connect-documentation/projects/kafka-connect-transform-xml/transformations/examples/FromXml.books.html
(The JSON "input" and "output" shown there come from a REST proxy request; only look at the value field.)
When the data is then written out for a Connect sink, you can use JsonConverter (or any other converter) to serialize the internal Struct object, so the sink receives JSON.
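As a rough sketch, a source connector configured this way could look like the following (standalone .properties form). The connector class, topic, and XSD path are placeholders, and the schema.path property name should be verified against the kafka-connect-transform-xml documentation:

name=xml-source
# Placeholder: any source that emits the XML payload as a String value
connector.class=<your source connector class>
topic=xml-as-json
# Parse the XML string into a Connect Struct (assumes an XSD describing the documents)
transforms=xml
transforms.xml.type=com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value
transforms.xml.schema.path=file:///path/to/books.xsd
# Serialize the resulting Struct as JSON when writing to the topic
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false

Any sink connector consuming that topic with JsonConverter will then see plain JSON.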

Related

How to deserialize the message if the data format is not known?

I am trying to consume messages from a Kafka topic, but I don't know whether the data format is JSON or Avro, so I don't know which KEY_DESERIALIZER_CLASS_CONFIG and VALUE_DESERIALIZER_CLASS_CONFIG to use.
Is there any way to find out which format is used?
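The thread doesn't include an answer, but one common workaround (my suggestion, not from the original posts) is to consume the raw bytes with ByteArrayDeserializer and probe them: data serialized via the Confluent Schema Registry always starts with magic byte 0x00 followed by a 4-byte schema ID, while JSON text usually starts with { or [. A minimal sketch, with a hypothetical topic name:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class FormatProbe {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "format-probe");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        try (KafkaConsumer<byte[], byte[]> consumer =
                 new KafkaConsumer<>(props, new ByteArrayDeserializer(), new ByteArrayDeserializer())) {
            consumer.subscribe(Collections.singletonList("mystery-topic")); // hypothetical topic name
            for (ConsumerRecord<byte[], byte[]> rec : consumer.poll(Duration.ofSeconds(5))) {
                byte[] v = rec.value();
                if (v == null || v.length == 0) continue;
                if (v[0] == 0x00 && v.length > 5) {
                    // Confluent wire format: magic byte 0x00 + 4-byte schema ID + Avro payload
                    System.out.println("Likely Schema Registry Avro, schema id in bytes 1-4");
                } else if (v[0] == '{' || v[0] == '[') {
                    System.out.println("Likely plain JSON text");
                } else {
                    System.out.println("Unknown format, inspect manually");
                }
            }
        }
    }
}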

How to serialize an AVRO schema-compliant JSON message?

Given an AVRO schema, I create a JSON string which conforms to this schema.
How can I serialize the JSON string using AVRO to pass it to a Kafka producer which expects an AVRO-encoded message?
All examples I find don't have JSON as input.
BTW, the receiver will then deserialize the message to a POJO - we are working in different tech stacks. So, basically it's JSON -> Kafka -> POJO.
"All examples I find don't have JSON as input"
It's unclear which examples you've found, but kafka-avro-console-producer does exactly what you want (assuming you're using the Confluent Schema Registry).
Otherwise, you want Avro's JsonDecoder, which is what Confluent uses internally.
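As a sketch of the JsonDecoder route using plain Apache Avro (the helper class and method names here are made up for illustration):

import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonDecoder;

public class JsonToAvro {
    public static byte[] jsonToAvro(String json, Schema schema) throws Exception {
        // Read the schema-compliant JSON string into a GenericRecord
        JsonDecoder decoder = DecoderFactory.get().jsonDecoder(schema, json);
        GenericRecord record = new GenericDatumReader<GenericRecord>(schema).read(null, decoder);

        // Serialize the record to Avro binary, ready for a producer using a byte[] serializer
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(schema).write(record, encoder);
        encoder.flush();
        return out.toByteArray();
    }
}

Note that Avro's JSON encoding is strict, e.g. a nullable union field must be written as {"string": "value"} rather than plain "value", so "schema-compliant" does a lot of work here.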

JSON Parsing in an Event-Driven System

I have the following use case:
parse the JSON from a stream (Kafka topic)
extract some fields (likely 35 out of 100)
build a new JSON document out of those fields
publish it to Pub/Sub for further processing
My implementation is very much bound to Java. Can anyone suggest an optimal solution for this, and explain why it is optimal?
For JSON parsing, I am thinking of https://bolerio.github.io/mjson/
Kafka ships with the Jackson JSON library and includes its own JSON deserializer that returns a Jackson JsonNode.
Alternatively, as noted in the comments, you can use higher-level frameworks such as Spring, Vert.x, or Quarkus to build Kafka consumers.
For the listed use case, I would opt for Spark, Flink, or NiFi for the integration with Pub/Sub. Each also offers JSON processing, with NiFi being more advanced in that it supports JSONPath.
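For the plain-Java route, here is a minimal sketch with kafka-clients and Jackson; the topic name, field list, and the publish() hand-off are placeholders:

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class FieldExtractor {
    // In practice this would be the ~35 fields you need
    private static final List<String> KEEP = List.of("UserID", "DocumentID");

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "field-extractor");

        ObjectMapper mapper = new ObjectMapper();
        try (KafkaConsumer<String, String> consumer =
                 new KafkaConsumer<>(props, new StringDeserializer(), new StringDeserializer())) {
            consumer.subscribe(List.of("input-topic")); // placeholder topic
            while (true) {
                for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofMillis(500))) {
                    JsonNode in = mapper.readTree(rec.value());
                    ObjectNode out = mapper.createObjectNode();
                    for (String field : KEEP) {
                        if (in.has(field)) out.set(field, in.get(field)); // copy selected fields
                    }
                    publish(out.toString());
                }
            }
        }
    }

    private static void publish(String json) {
        // Placeholder for the Pub/Sub publish call of your choice
        System.out.println(json);
    }
}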

Kafka Connect transforming JSON string to actual JSON

I'm trying to figure out whether it's possible to transform JSON values that are stored as strings into actual JSON structures using Kafka Connect.
I tried looking for such a transformation but couldn't find one. As an example, this could be the source:
{
  "UserID": 2105058535,
  "DocumentID": 2105058535,
  "RandomJSON": "{\"Tags\":[{\"TagID\":1,\"TagName\":\"Java\"},{\"TagID\":2,\"TagName\":\"Kafka\"}]}"
}
And this is my goal:
{
  "UserID": 2105058535,
  "DocumentID": 2105058535,
  "RandomJSON": {
    "Tags": [
      {
        "TagID": 1,
        "TagName": "Java"
      },
      {
        "TagID": 2,
        "TagName": "Kafka"
      }
    ]
  }
}
I'm trying to make these transformations for the Elasticsearch sink connector, if that makes a difference.
I know I can use Logstash together with its JSON filter to do this, but I'd like to know whether there's a way to do it using just Kafka Connect.
Sounds like this would be a Single Message Transform (thus applicable to any connector, not just Elasticsearch), but there aren't any out of the box that do what you describe. The Transformation API is documented here.
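Writing one is not much code, though. Below is a hedged sketch of such a transform; it only handles schemaless (Map) values, and the class name and package are made up:

package example; // hypothetical package

import java.util.HashMap;
import java.util.Map;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.ConnectRecord;
import org.apache.kafka.connect.transforms.Transformation;

public class ExpandJsonString<R extends ConnectRecord<R>> implements Transformation<R> {
    private static final ObjectMapper MAPPER = new ObjectMapper();
    private String fieldName; // which string field holds the embedded JSON

    @Override
    @SuppressWarnings("unchecked")
    public R apply(R record) {
        if (!(record.value() instanceof Map)) return record; // only handles schemaless values
        Map<String, Object> value = new HashMap<>((Map<String, Object>) record.value());
        Object raw = value.get(fieldName);
        if (raw instanceof String) {
            try {
                value.put(fieldName, MAPPER.readValue((String) raw, Map.class)); // parse the string
            } catch (Exception e) {
                return record; // leave the record untouched if the field isn't valid JSON
            }
        }
        return record.newRecord(record.topic(), record.kafkaPartition(), record.keySchema(),
            record.key(), null, value, record.timestamp());
    }

    @Override
    public ConfigDef config() {
        return new ConfigDef().define("field", ConfigDef.Type.STRING, ConfigDef.Importance.HIGH,
            "Name of the string field that contains embedded JSON");
    }

    @Override
    public void configure(Map<String, ?> configs) {
        fieldName = (String) configs.get("field");
    }

    @Override
    public void close() {}
}

It would then be configured on the connector with transforms=expand, transforms.expand.type=example.ExpandJsonString, transforms.expand.field=RandomJSON.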
I had a similar issue, but in reverse: I had data as JSON and needed to convert some of it into a JSON string representation to store it in Cassandra using the Cassandra sink connector. I ended up creating a Kafka Streams app that reads from the topic and outputs the transformed JSON object to another topic, which the connector then reads.
topic document -> read by your Kafka Streams app, which calls mapValues (or uses a Jackson POJO that serializes the way you want) and writes the new value to -> topic document.elasticsearch
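A minimal sketch of that Streams approach, written in this question's direction (expanding the embedded string into real JSON); the topic names match the arrow diagram above and the RandomJSON field comes from the example, the rest is assumption:

import java.util.Properties;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class ExpandJsonStringApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "expand-json-string");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        ObjectMapper mapper = new ObjectMapper();
        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("document")
            .mapValues(value -> {
                try {
                    ObjectNode node = (ObjectNode) mapper.readTree(value);
                    // Replace the embedded JSON string with the parsed structure
                    node.set("RandomJSON", mapper.readTree(node.get("RandomJSON").asText()));
                    return node.toString();
                } catch (Exception e) {
                    return value; // pass through records that aren't valid JSON
                }
            })
            .to("document.elasticsearch");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
    }
}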
You can use the FromJson transform from the kafka-connect-json-schema project.
Please check this link for more details:
https://jcustenborder.github.io/kafka-connect-documentation/projects/kafka-connect-json-schema/transformations/examples/FromJson.inline.html
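From memory of that example page (verify the exact property names against the linked documentation), the transform takes an inline JSON Schema describing the JSON to be parsed, along these lines:

# Hedged sketch; property names as I recall them from the linked docs
transforms=fromJson
transforms.fromJson.type=com.github.jcustenborder.kafka.connect.json.FromJson$Value
transforms.fromJson.json.schema.location=Inline
transforms.fromJson.json.schema.inline={"type":"object","properties":{"Tags":{"type":"array"}}}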

Convert JSON to Avro schema in Node.js

I want to convert CSV to an Avro schema in Node.js. I was able to convert CSV to JSON and am now trying to convert the JSON to Avro. Is there any package available in Node.js? Thanks in advance.
This link might be helpful. It lets you encode/decode the Avro binary format to/from JSON, supports both deflate and snappy compression, and supports Node streams:
https://www.npmjs.com/package/node-avro-io