Kafka Streams: append to JSON as event enrichment

I have a producer that writes a JSON file to a topic, to be read by a Kafka consumer stream. It's a simple key-value pair.
I want to stream the topic, enrich each event by adding/concatenating more JSON key-value pairs, and publish to another topic.
None of the values or keys have anything in common, by the way.
I am probably overthinking this, but how would I go about this logic?

I suppose you want to decode the JSON message on the consumer side.
If you are not concerned about a schema and just want to deal with the JSON as a Map, you can use the Jackson library to read the JSON string as a Map<String, Object>. You can then add the fields that you want, convert it back to a JSON string and push it to the new topic, as in the sketch below.
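A minimal sketch of that Map-based approach with Jackson (the extra field names here are just made-up examples for illustration):

import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.Map;

public class JsonEnricher {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Takes the incoming JSON value, adds extra fields and returns the enriched JSON
    public static String enrich(String incomingJson) throws Exception {
        Map<String, Object> event =
                MAPPER.readValue(incomingJson, new TypeReference<Map<String, Object>>() {});

        // Illustrative extra fields; use whatever your enrichment needs
        event.put("enrichedAt", System.currentTimeMillis());
        event.put("source", "my-enricher");

        return MAPPER.writeValueAsString(event);
    }
}

You can call this from a plain consumer/producer loop, or from a Kafka Streams mapValues() before writing to the output topic.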
If you want to have a schema, you need to store information about which class the JSON maps to, or the JSON schema itself, or some id that points to it. Then the following could work.
Store the schema info in headers
For example, you can store the JSON schema or the Java class name in the headers of the message while producing, and write a deserializer that extracts that information from the headers and decodes the payload.
Deserializer#deserialize() has an overload that takes a Headers argument:
default T deserialize(java.lang.String topic,
Headers headers,
byte[] data)
and you can do something like:
objectMapper.readValue(data,
        Class.forName(
                new String(headers.lastHeader("classname").value())));
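Putting that together, a rough sketch of such a deserializer might look like this (the "classname" header key is just the convention from the example above; the producer has to set it when sending):

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.common.header.Headers;
import org.apache.kafka.common.serialization.Deserializer;

public class HeaderAwareJsonDeserializer implements Deserializer<Object> {
    private final ObjectMapper objectMapper = new ObjectMapper();

    @Override
    public Object deserialize(String topic, byte[] data) {
        // Without headers we cannot know the target class
        throw new IllegalStateException("This deserializer needs record headers");
    }

    @Override
    public Object deserialize(String topic, Headers headers, byte[] data) {
        try {
            String className = new String(headers.lastHeader("classname").value());
            return objectMapper.readValue(data, Class.forName(className));
        } catch (Exception e) {
            throw new RuntimeException("Could not deserialize record from topic " + topic, e);
        }
    }
}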
Use schema registry
Apart from these, there is also a schema registry from Confluent which can maintain different versions of the schema. You would need to run another process for that, though. If you are going to use this, you may want to look at the subject naming strategy and set it to RecordNameStrategy since you have multiple schemas in the same topic.
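For the subject naming strategy, a sketch of the producer config, assuming Confluent's serializer config keys (the URLs are placeholders; double-check the class name against your Confluent version):

import java.util.Properties;

public class ProducerConfigExample {
    static Properties producerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");           // placeholder
        props.put("schema.registry.url", "http://localhost:8081");  // placeholder
        // Derive the subject from the record name rather than the topic name,
        // so a single topic can carry multiple schemas
        props.put("value.subject.name.strategy",
                "io.confluent.kafka.serializers.subject.RecordNameStrategy");
        return props;
    }
}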

Related

Processing JSON data from Kafka using Structured Streaming

I want to convert incoming JSON data from Kafka into a dataframe.
I am using Structured Streaming with Scala 2.12.
Most people add a hard-coded schema, but if the JSON can have additional fields, that requires changing the code base every time, which is tedious.
One approach is to write it into a file and infer the schema from that, but I would rather avoid doing that.
Is there any other way to approach this problem?
Edit: I found a way to turn a JSON string into a dataframe, but I can't extract the string from the stream source. Is it possible to extract it?
One way is to store the schema itself in the message headers (not in the key or value).
Though this increases the message size, it makes it easy to parse the JSON value without the need for any external resource like a file or a schema registry.
New messages can have new schemas while old messages can still be processed using their old schemas, because each schema travels within its own message.
Alternatively, you can version the schemas and include an id for every schema in the message headers (or a magic byte in the key or value) and resolve the schema from there.
This approach is what the Confluent Schema Registry follows. It lets you go through different versions of the same schema and see how it has evolved over time.
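A rough sketch of the producer side of this idea, using the plain Kafka clients (the header names, topic and schema string are illustrative):

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class SchemaInHeaderProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Illustrative JSON value and a (simplified) schema describing it
        String value = "{\"id\":\"42\",\"name\":\"Kate\"}";
        String schema = "{\"id\":\"string\",\"name\":\"string\"}";

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("input-topic", "42", value);
            // Either ship the whole schema...
            record.headers().add("schema", schema.getBytes(StandardCharsets.UTF_8));
            // ...or just a version/id that the consumer can resolve
            record.headers().add("schema-version", "1".getBytes(StandardCharsets.UTF_8));
            producer.send(record);
        }
    }
}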
Read the data as a string and then convert it to a Map[String, String]; this way you can process any JSON without even knowing its schema.
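In Spark terms, that could look something like this (a Java sketch; the topic and column names are placeholders, and nested JSON would need further handling):

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.from_json;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;

public class JsonAsMapStream {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder().appName("json-as-map").getOrCreate();

        Dataset<Row> raw = spark.readStream()
                .format("kafka")                                   // needs spark-sql-kafka
                .option("kafka.bootstrap.servers", "localhost:9092")
                .option("subscribe", "input-topic")
                .load();

        // Treat every JSON value as a Map<String, String>; no fixed schema required
        Dataset<Row> parsed = raw
                .selectExpr("CAST(value AS STRING) AS json")
                .select(from_json(col("json"),
                        DataTypes.createMapType(DataTypes.StringType, DataTypes.StringType))
                        .alias("kv"));

        parsed.writeStream().format("console").start().awaitTermination();
    }
}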
Based on JavaTechnical's answer, the best approach would be to use a schema registry and
Avro data instead of JSON; there is no getting around hard-coding a schema (for now).
Include your schema name and id as a header and use them to read the schema from the schema registry.
Use the from_avro function to turn that data into a dataframe!
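Roughly, that last step could look like this (a Java sketch using the spark-avro package; it assumes the Avro schema JSON has already been fetched from the registry using the name/id from the headers, and that the payload is raw Avro without Confluent's 5-byte wire-format prefix; if the Confluent serializer was used, strip those bytes first):

import static org.apache.spark.sql.avro.functions.from_avro;
import static org.apache.spark.sql.functions.col;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class AvroToDataframe {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder().appName("avro-to-df").getOrCreate();

        Dataset<Row> raw = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "localhost:9092")
                .option("subscribe", "input-topic")
                .load();

        // Illustrative schema; in practice this JSON comes from the schema registry,
        // looked up with the schema name/id carried in the message headers
        String avroSchemaJson =
                "{\"type\":\"record\",\"name\":\"Event\",\"fields\":"
                + "[{\"name\":\"id\",\"type\":\"string\"},{\"name\":\"name\",\"type\":\"string\"}]}";

        Dataset<Row> df = raw
                .select(from_avro(col("value"), avroSchemaJson).alias("data"))
                .select("data.*");

        df.writeStream().format("console").start().awaitTermination();
    }
}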

Kafka Streams API - deserialize dynamically generated JSON

I've been trying to find a solution to the following situation, to no avail:
I have a Kafka Streams application which should read from a single input topic a series of JSON objects, all not exactly the same as one another. Practically speaking, each JSON is a representation of an HTTP request object, thus not all JSON records have the same headers, request parameters, cookies and so forth. Furthermore, the JSON objects are written
Is there any way to achieve this? I'm not expecting any detailed how-to solutions, only some leads on how I can achieve this, as my searching over the internet has turned up nothing so far.
Here's an idea: use Jackson's tree model to dynamically parse your JSON into a JsonNode, and then use this tree representation in your Kafka Streams topology to process the requests.
ObjectMapper objectMapper = new ObjectMapper();
JsonNode rootNode = objectMapper.readTree(json);
...
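For context, here's a rough sketch of how that could sit inside a Streams topology (the topic names and the routing logic are made up; values are consumed as plain strings and parsed on the fly):

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

public class DynamicJsonTopology {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    public static StreamsBuilder build() {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> requests =
                builder.stream("http-requests", Consumed.with(Serdes.String(), Serdes.String()));

        requests
                .mapValues(DynamicJsonTopology::parse)
                // Work on the tree: e.g. keep only requests that actually have a "cookies" field
                .filter((key, node) -> node != null && node.has("cookies"))
                .mapValues(JsonNode::toString)
                .to("requests-with-cookies", Produced.with(Serdes.String(), Serdes.String()));

        return builder;
    }

    private static JsonNode parse(String json) {
        try {
            return MAPPER.readTree(json);
        } catch (Exception e) {
            return null;   // or route to a dead-letter topic
        }
    }
}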

JSON parsing without using Java objects

I want to parse JSON data from a RESTful service.
Unlike a SOAP-based service, where a service consumer can create stubs and skeleton from WSDL, in the case of the RESTful service, the service consumer gets a raw JSON string.
Since the service consumer does not have a Java object matching the JSON structure, we are not able to use JSON-to-Java mappers like GSON, Jackson, etc.
Another way is to use parsers like JsonPath, minimal-json, etc., which help traverse the JSON structure and read the data.
Is there any better way of reading JSON data?
The official docs for Jackson mention 3 different ways to parse a JSON doc from Java. The first 2 do not require a "Java object matching the JSON structure". In summary:
Streaming API (aka "Incremental parsing/generation") reads and writes JSON content as discrete events.
Tree Model provides a mutable in-memory tree representation of a JSON document. ObjectMapper can build trees that consist of JsonNode nodes.
Data Binding converts JSON to and from POJOs based either on property accessor conventions or annotations.
With simple data binding you convert to and from Java Maps, Lists, Strings, Numbers, Booleans and nulls
With full data binding you convert to and from any Java bean type (as well as "simple" types mentioned above)
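As a tiny illustration of the first option, the Streaming API walks the document token by token without any target class (the JSON literal here is just an example):

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

public class StreamingParseExample {
    public static void main(String[] args) throws Exception {
        String json = "{\"name\":\"Kate\",\"age\":25}";

        JsonFactory factory = new JsonFactory();
        try (JsonParser parser = factory.createParser(json)) {
            while (parser.nextToken() != null) {
                // Print every field name / value pair as we stream past it
                if (parser.getCurrentToken() == JsonToken.FIELD_NAME) {
                    String field = parser.getCurrentName();
                    parser.nextToken();                 // move to the value
                    System.out.println(field + " = " + parser.getText());
                }
            }
        }
    }
}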
Another option is to generate Java Beans from JSON documents. Your mileage may vary, and you may (probably will) have to modify the generated files. There are at least 5 online tools for that purpose that you can try:
http://www.jsonschema2pojo.org/
http://pojo.sodhanalibrary.com/
https://timboudreau.com/blog/json/read
http://jsongen.byingtondesign.com/
http://json2java.azurewebsites.net/
There are also IDE plugins that you can use, for instance this one for IntelliJ: https://plugins.jetbrains.com/idea/plugin/7678-jackson-generator-plugin
GSON supports working without objects, too. Something like this:
JsonObject propertiesWrapper = new JsonParser().parse(responseContent).getAsJsonObject();
assertNotNull(propertiesWrapper);
propertiesWrapper = propertiesWrapper.getAsJsonObject("properties");
assertNotNull(propertiesWrapper);
JsonArray propertiesArray = propertiesWrapper.getAsJsonArray("property");
assertNotNull(propertiesArray);
assertTrue(propertiesArray.size()>0, "The list of properties should not be empty. ");
The problem is that working this way is so inconvenient that it is really better to create objects instead.
Jackson has exactly the same problems, and to an even greater extent: it is extremely inconvenient for direct JSON reading and creation. All its tutorials advise using POJOs instead, too.
The only really convenient way is to use Groovy. Groovy works as an envelope around Java: you can simply write Java code and use Groovy operators where needed. For reading and creating JSON or XML, Groovy is incomparably more powerful than Java with all of its libraries combined! It is even much more convenient than a ready-made tree structure of POJOs prepared by somebody else.

Kafka Serializer JSON [duplicate]

I am new to Kafka, serialization and JSON.
What I want is for the producer to send a JSON file via Kafka and for the consumer to consume and work with the JSON file in its original form.
I was able to get it so the JSON is converted to a string, sent via a String serializer, and then the consumer parses the String and recreates a JSON object, but I am worried that this isn't efficient or the correct method (it might lose the field types of the JSON).
So I looked into making a JSON serializer and setting that in my producer's configurations.
I used the JsonEncoder here: Kafka: writing custom serializer
But when I try to run my producer now, it seems that in the toBytes function of the encoder the try block never returns anything like I want it to:
try {
bytes = objectMapper.writeValueAsString(object).getBytes();
} catch (JsonProcessingException e) {
logger.error(String.format("Json processing failed for object: %s", object.getClass().getName()), e);
}
It seems objectMapper.writeValueAsString(object).getBytes() takes my JSON object ({"name":"Kate","age":25}) and converts it to nothing.
This is my producer's run function:
List<KeyedMessage<String,JSONObject>> msgList=new ArrayList<KeyedMessage<String,JSONObject>>();
JSONObject record = new JSONObject();
record.put("name", "Kate");
record.put("age", 25);
msgList.add(new KeyedMessage<String, JSONObject>(topic, record));
producer.send(msgList);
What am I missing? Would my original method (convert to string, send, then rebuild the JSON object) be okay, or is it just not the correct way to go?
Thanks!
Hmm, why are you afraid that a serialize/deserialize step would cause data loss?
One option you have is to use the Kafka JSON serializer that's included in Confluent's Schema Registry, which is free and open source software (disclaimer: I work at Confluent). Its test suite provides a few examples to get you started, and further details are described at serializers and formatters. The benefit of this JSON serializer and the schema registry itself is that they provide transparent integration with producer and consumer clients for Kafka. Apart from JSON there's also support for Apache Avro if you need that.
IMHO this setup is one of the best options in terms of developer convenience and ease of use when talking to Kafka in JSON -- but of course YMMV!
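If you would rather roll your own instead, a minimal sketch with the newer org.apache.kafka.common.serialization.Serializer interface and Jackson could look like this (a hypothetical class, not the Confluent serializer mentioned above):

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.common.errors.SerializationException;
import org.apache.kafka.common.serialization.Serializer;

public class JsonNodeSerializer implements Serializer<JsonNode> {
    private final ObjectMapper objectMapper = new ObjectMapper();

    @Override
    public byte[] serialize(String topic, JsonNode data) {
        if (data == null) {
            return null;
        }
        try {
            // writeValueAsBytes avoids the intermediate String of the original approach
            return objectMapper.writeValueAsBytes(data);
        } catch (Exception e) {
            throw new SerializationException("Failed to serialize JSON for topic " + topic, e);
        }
    }
}

Set it as the value.serializer in the producer config, and mirror it on the consumer side with a deserializer that calls objectMapper.readTree(data).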
I would suggest converting your event string, which is JSON, to a byte array, like:
byte[] eventBody = event.getBody();
This will improve your performance, and the Kafka consumer can then use a JSON parser to get your JSON back.
Please let me know if any further information is required.

Importing a JSON file into Cassandra

Hi, is it possible to import any random JSON file into Cassandra?
The JSON file is not exported from sstable2json. The JSON file is from a different website and needs to be imported into Cassandra. Could anyone please advise whether this is possible?
JSON support won't be introduced until Cassandra 3.0 (see CASSANDRA-7970), and in this case you still need to define a schema for your JSON data to map to. You do have some other options:
Use maps, which sort of map to JSON. Maps can be indexed as of Cassandra 2.1 (CASSANDRA-4511). There is also a good Stack Exchange post about this.
You mention 'any random JSON file'. You could just have a string column that contains the raw JSON, but then you lose any queryability of that data.
Come up with some kind of schema for your JSON data, map it to a CQL table, and write some code that parses the JSON and writes it to that table (a rough sketch follows at the end of this answer). This doesn't sound like an option for you, since you want to be able to import any random JSON file.
If you are looking to do only JSON document storage, you might want to look at more document-oriented solutions instead of a column-oriented solution like Cassandra.
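As a rough sketch of the third option (everything here, including the keyspace, table, column names and the JSON shape, is made up for illustration, and it assumes the DataStax Java driver plus Jackson):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.List;
import java.util.Map;

public class JsonToCassandra {
    public static void main(String[] args) throws Exception {
        // Illustrative input: a JSON array of flat objects
        String json = "[{\"id\":\"1\",\"name\":\"foo\"},{\"id\":\"2\",\"name\":\"bar\"}]";

        ObjectMapper mapper = new ObjectMapper();
        List<Map<String, String>> rows =
                mapper.readValue(json, new TypeReference<List<Map<String, String>>>() {});

        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_keyspace")) {
            PreparedStatement insert =
                    session.prepare("INSERT INTO my_table (id, name) VALUES (?, ?)");
            for (Map<String, String> row : rows) {
                session.execute(insert.bind(row.get("id"), row.get("name")));
            }
        }
    }
}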