Kafka Connect transforming JSON string to actual JSON - json

I'm trying to figure out whether it's possible to transform JSON values that are stored as strings into actual JSON structures using Kafka Connect.
I tried looking for such a transformation but couldn't find one. As an example, this could be the source:
{
"UserID":2105058535,
"DocumentID":2105058535,
"RandomJSON":"{\"Tags\":[{\"TagID\":1,\"TagName\":\"Java\"},{\"TagID\":2,\"TagName\":\"Kafka\"}]}"
}
And this is my goal:
{
"UserID":2105058535,
"DocumentID":2105058535,
"RandomJSON":{
"Tags":[
{
"TagID":1,
"TagName":"Java"
},
{
"TagID":2,
"TagName":"Kafka"
}
]
}
}
I'm trying to make these transformations for Elasticsearch sink connector if it makes a difference.
I know I can use Logstash together with JSON filter in order to do this, but I'd like to know whether there's a way to do it using just Kafka Connect.

Sounds like this would be a Single Message Transform (thus applicable to any connector, not just ES), but there aren't any out of the box doing what you describe. The API is documented here.

I had a similar issue but in reverse. I had the data in Json and I needed to convert some of it into a Json string representation to store it in Cassandra using the Cassandra Sink. I ended up creating a Kafka stream app that reads from the topic and then output the Json object to another topic that is read by the connector.
topic document <- read by your kafka stream with a call to mapValues or create a Jackson POJO that serializes as you want, and then write value to -> topic document.elasticsearch

You can use FromJson converter.
Please check this link for more details
https://jcustenborder.github.io/kafka-connect-documentation/projects/kafka-connect-json-schema/transformations/examples/FromJson.inline.html

Related

How to serialize an AVRO schema-compliant JSON message?

Given an AVRO schema, I create a JSON string which conforms to this schema.
How can I serialize the JSON string using AVRO to pass it to a Kafka producer which expects an AVRO-encoded message?
All examples I find don't have JSON as input.
BTW, the receiver will then deserialize the message to a POJO - we are working in different tech stacks. So, basically it's JSON -> Kafka -> POJO.
All examples I find don't have JSON as input
Unclear what examples you've found, but kafka-avro-console-producer does exactly what you want (assuming you're using the Confluent Schema Registry).
Otherwise, you want a jsonDecoder, which is what Confluent uses

Kafka stream append to JSON as event enrichment

I have a producer that writes a json file to the topic to be read by a kafka consumer stream. Its simple key-value pair.
I want to stream the topic and enrich the event by adding/concatenating more JSON key-value rows and publish to another topic.
None of the values or keys have anything in common by the way.
I am probably overthinking this, but how would I get around this logic?
I suppose you want to decode JSON message at the consumer side.
If you are not concerned about schema and but just want to deal with JSON as a Map, you can use Jackson library to read the JSON string as a Map<String,Object>. For this you can add the fields that you want, convert it back to a JSON string and push it to the new topic.
If you want to have a schema, you need to store the information as to which class it is mapping to or the JSON schema or some id that maps to this, then the following could work.
Store the schema info in headers
For example, you can store the JSON schema or Java class name in the headers of the message while producing and write a deserializer to extract that information from the headers and decode it.
The Deserializer#deserialize() has the Headers argument.
default T deserialize(java.lang.String topic,
Headers headers,
byte[] data)
and you can do something like..
objectMapper.readValue(data,
new Class.forName(
new String(headers.lastHeader("classname").value()
))
Use schema registry
Apart from these, there is also a schema registry from Confluent which can maintain different versions of the schema. You would need to run another process for that, though. If you are going to use this, you may want to look at the subject naming strategy and set it to RecordNameStrategy since you have multiple schemas in the same topic.

How to write the output from DB to a JSON file using Spring batch?

I am new to spring batch and there is a requirement for me to read the data from DB and write in to JSON format. whats the best way to do this ? Any api's are there? or we need to write custom writer ? or i need to user JSON libraries such as GSON or JACKSON ? Please guide me...
To read data from a relational database, you can use one of the database readers. You can find an example in the spring-batch-samples repository.
To write JSON data, Spring Batch 4.1.0.RC1 provides the JsonFileItemWriter that allows you to write JSON data to a file. It collaborates with a JsonObjectMarshaller to marshal an object to JSON format. Spring Batch provides support for both Gson and Jackson libraries (you need to have the one you want to use in the classpath). You can find more details here.
Hope this helps.
You do not need GSON or Jackson Libraries if you DB support JSON.
Example : In SQL Server there is an option to get data out of DB as JSON String instead of resultset.
Reference - https://learn.microsoft.com/en-us/sql/relational-databases/json/format-query-results-as-json-with-for-json-sql-server?view=sql-server-2017
https://learn.microsoft.com/en-us/sql/relational-databases/json/format-nested-json-output-with-path-mode-sql-server?view=sql-server-2017
Example - select (select * from tableName for json path) as jsonString;
This will already give you output in JsonString which you can write to a file.

Kafka Serializer JSON [duplicate]

This question already has answers here:
Writing Custom Kafka Serializer
(3 answers)
Closed 2 years ago.
I am new to Kafka, Serialization and JSON
WHat I want is the producer to send a JSON file via kafka and the consumer to consume and work with the JSON file in its original file form.
I was able to get it so JSON is converter to a string and sent via a String Serializer and then the consumer would parse the String and recreate a JSON object but I am worried that this isnt efficient or the correct method (might lose the field types for JSON)
So I looked into making a JSON serializer and setting that in my producer's configurations.
I used the JsonEncoder here : Kafka: writing custom serializer
But when I try to run my producer now, it seems that in the toBytes function of the encoder the try block is never returning anything like i want it to
try {
bytes = objectMapper.writeValueAsString(object).getBytes();
} catch (JsonProcessingException e) {
logger.error(String.format("Json processing failed for object: %s", object.getClass().getName()), e);
}
Seems objectMapper.writeValueAsString(object).getBytes(); takes my JSON obj ({"name":"Kate","age":25})and converts it to nothing,
this is my producer's run function
List<KeyedMessage<String,JSONObject>> msgList=new ArrayList<KeyedMessage<String,JSONObject>>();
JSONObject record = new JSONObject();
record.put("name", "Kate");
record.put("age", 25);
msgList.add(new KeyedMessage<String, JSONObject>(topic, record));
producer.send(msgList);
What am I missing? Would my original method(convert to string and send and then rebuild the JSON obj) be okay? or just not the correct way to go?
THanks!
Hmm, why are you afraid that a serialize/deserialize step would cause data loss?
One option you have is to use the Kafka JSON serializer that's included in Confluent's Schema Registry, which is free and open source software (disclaimer: I work at Confluent). Its test suite provides a few examples to get you started, and further details are described at serializers and formatters. The benefit of this JSON serializer and the schema registry itself is that they provide transparent integration with producer and consumer clients for Kafka. Apart from JSON there's also support for Apache Avro if you need that.
IMHO this setup is one of the best options in terms of developer convenience and ease of use when talking to Kafka in JSON -- but of course YMMV!
I would suggest to convert your event string which is JSON to byte array like:
byte[] eventBody = event.getBody();
This will increase your performance and Kafka Consumer also provides JSON parser which will help you to get your JSON back.
Please let me know if any further information required.

SpringXD JSON parser to Oracle DB

I am trying to use SpringXD to stream some JSON metrics data to a Oracle database.
I am using this example from here: SpringXD Example
Http call being made: EarthquakeJsonExample
My shell cmd.
stream create earthData --definition "trigger|usgs| jdbc --columns='mag,place,time,updated,tz,url,felt,cdi,mni,alert,tsunami,status,sig,net,code,ids,souces,types,nst,dmin,rms,gap,magnitude_type' --driverClassName=driver --username=username --password --url=url --tableName=Test_Table" --deploy
I would like to capture just the properties portion of this JSON response into the given table columns. I got it to the point where it doesn't give me a error on the hashing but instead just deposits a bunch of nulls into the column.
I think my problem is the parsing of the JSON itself. Since really the properties is in the Features array. Can SpringXD distinguish this for me out of the box or will I need to write a custom processor?
Here is a look at what the database looks like after a successful cmd.
Any advice? Im new to parsing JSON in this fashion and im not really sure how to find more documentation or examples with SpringXD itself.
Here is reference to the documentation: SpringXD Doc
The transformer in the JDBC sink expects a simple document that can converted to a map of keys/values. You would need to add a transformer upstream, perhaps in your usgs processor or even a separate processor. You could use a #jsonPath expression to extract the properties key and make it the payload.