What is Kafka Connect JSON Schema good for?

I wonder what the benefit is of adding the JSON schema to your message, given that Kafka Connect supports it?

Schemas are an important part of data pipelines. Kafka Connect supports embedding the schema in the JSON message itself, or you can use another option (Avro, Protobuf). If you don't have a schema you make life more difficult for consumers of the data, and some will insist on one; for example, the JDBC Sink connector requires a schema and will fail if there isn't one.
So to answer your question: if you don't want to use Avro or Protobuf (and if you like having large messages with lots of redundant, repeating data ;-) ), then you can use the Kafka Connect JSON-with-schema format.
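To make that concrete, this is roughly what the embedded schema looks like on the wire. A minimal Java sketch using Connect's JsonConverter (from the connect-json artifact) with schemas.enable=true; the topic and field names are made up:

```java
// Minimal sketch: serialise one record through Connect's JsonConverter with
// schemas.enable=true to see the {"schema": ..., "payload": ...} envelope that
// gets embedded in every message. Topic and field names are invented.
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaBuilder;
import org.apache.kafka.connect.data.Struct;
import org.apache.kafka.connect.json.JsonConverter;

import java.nio.charset.StandardCharsets;
import java.util.Map;

public class JsonSchemaEnvelopeDemo {
    public static void main(String[] args) {
        JsonConverter converter = new JsonConverter();
        converter.configure(Map.of("schemas.enable", "true"), false); // false = value converter

        Schema schema = SchemaBuilder.struct().name("user")
                .field("id", Schema.INT32_SCHEMA)
                .field("name", Schema.STRING_SCHEMA)
                .build();
        Struct value = new Struct(schema).put("id", 42).put("name", "alice");

        byte[] bytes = converter.fromConnectData("users", schema, value);
        // Prints something like:
        // {"schema":{"type":"struct","fields":[...],"name":"user"},"payload":{"id":42,"name":"alice"}}
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

Every message carries that schema block, which is exactly the redundancy the wink above refers to; Avro and Protobuf avoid it by keeping the schema out of the message payload.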

Related

Converting REST API JSON schema into a CQL Cassandra schema

I want to download data from a REST API into a database. The data I want to save are typed objects, like Java objects. I have chosen Cassandra because it supports collection types (lists, maps), unlike standard SQL databases (MySQL, SQLite, ...), which makes it better suited to serialising Java objects.
First, I need to create the CQL tables from the JSON schema of the REST API. How is it possible to generate CQL tables from the JSON schema of a REST API?
I know openapi-generator can generate a MySQL schema from a JSON schema, but it doesn't support CQL for the moment, so I need to look for an alternative solution.
I haven't used off-the-shelf packages extensively to manage Cassandra schemas, but there may be open-source projects or software like Hackolade that can do it for you.
https://cassandra.link/cassandra.toolkit/ managed by Anant (I don't have any affiliation) has an extensive list of resources you might be interested in. Cheers!
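If nothing off the shelf fits, hand-rolling the mapping is also an option, since a JSON schema's properties translate fairly mechanically to CQL columns. A rough Java sketch, assuming Jackson for parsing; the type mapping, the choice of primary key, and all of the names are simplified and made up for illustration:

```java
// Rough sketch: derive a CQL CREATE TABLE statement from the "properties"
// block of a JSON Schema. The type mapping and primary-key choice are
// deliberately simplistic; table and field names are invented.
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.util.Iterator;
import java.util.Map;

public class JsonSchemaToCql {

    // Very rough JSON Schema type -> CQL type mapping
    private static final Map<String, String> TYPES = Map.of(
            "string", "text",
            "integer", "int",
            "number", "double",
            "boolean", "boolean",
            "array", "list<text>",      // a real mapping would inspect "items"
            "object", "map<text, text>" // or a user-defined type
    );

    static String toCreateTable(String table, JsonNode schema, String keyColumn) {
        StringBuilder cql = new StringBuilder("CREATE TABLE " + table + " (\n");
        Iterator<Map.Entry<String, JsonNode>> fields = schema.get("properties").fields();
        while (fields.hasNext()) {
            Map.Entry<String, JsonNode> field = fields.next();
            String cqlType = TYPES.getOrDefault(field.getValue().path("type").asText(), "text");
            cql.append("  ").append(field.getKey()).append(" ").append(cqlType).append(",\n");
        }
        cql.append("  PRIMARY KEY (").append(keyColumn).append(")\n);");
        return cql.toString();
    }

    public static void main(String[] args) throws Exception {
        String jsonSchema = """
                {"type":"object","properties":{
                  "id":{"type":"integer"},
                  "name":{"type":"string"},
                  "tags":{"type":"array","items":{"type":"string"}}}}""";
        JsonNode schema = new ObjectMapper().readTree(jsonSchema);
        System.out.println(toCreateTable("api_objects", schema, "id"));
    }
}
```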

Is it possible to convert XML data into JSON format in Kafka

I have a requirement where the publisher sends XML data into an Apache Kafka cluster, and my consumer needs the data in JSON format.
Is it possible for Apache Kafka to convert the XML data into JSON?
Apache Kafka can convert [...]
No. Kafka itself stores binary data without conversion.
Your client code is responsible for deserialization, parsing, reformatting, and re-serializing into other formats.
If you have consumers and producers that do not agree on a uniform format, it would be the responsibility of one of the parties (probably the consumer or the Kafka administrators) to come up with a standard way to provide a conversion service.
There is a similar question, but KSQL doesn't support XML last I checked, so you'd have to at least use Kafka Streams. You could borrow some logic from this project, although it stops short of actually providing JSON output.
You might be able to use that project with MirrorMaker 2 since it's built on Kafka Connect as part of the same cluster, but YMMV since that's not a recommended pattern.
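If you do write that conversion yourself, a small Kafka Streams topology is one natural place to put it. A rough Java sketch; the topic names are made up, and the XML parsing uses Jackson's XmlMapper (jackson-dataformat-xml), which is an assumption rather than a requirement:

```java
// Rough sketch: a Kafka Streams topology that reads XML strings from one topic,
// re-serialises them as JSON, and writes them to another topic.
// Topic names are invented; error handling is reduced to dropping bad records.
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.xml.XmlMapper;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

import java.util.Properties;

public class XmlToJsonStream {
    public static void main(String[] args) {
        XmlMapper xmlMapper = new XmlMapper();
        ObjectMapper jsonMapper = new ObjectMapper();

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("xml-input", Consumed.with(Serdes.String(), Serdes.String()))
               .mapValues(xml -> {
                   try {
                       JsonNode tree = xmlMapper.readTree(xml);     // parse the XML into a tree
                       return jsonMapper.writeValueAsString(tree);  // re-serialise the tree as JSON
                   } catch (Exception e) {
                       return null; // in real code: route bad records to a dead-letter topic
                   }
               })
               .filter((key, value) -> value != null)
               .to("json-output", Produced.with(Serdes.String(), Serdes.String()));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "xml-to-json");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        new KafkaStreams(builder.build(), props).start();
    }
}
```

Bear in mind that XML attributes, namespaces, and repeated elements don't map one-to-one onto JSON, so the tree a generic converter produces may still need reshaping for your consumers.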

Convert MongoDB collection to MySQL database

I created my project in Spring 4 MVC + Hibernate with MongoDB. Now I have to convert it to Hibernate with MySQL. My problem is that I have many collections in MongoDB, in BSON/JSON format. How can I convert them into MySQL table format? Is that possible?
MongoDB is a non-relational database, while MySQL is relational. The key difference is that the non-relational database contains documents (JSON objects) which can have a hierarchical structure, whereas the relational database expects the objects to be normalised and broken down into tables. It is therefore not possible to simply convert the BSON data from MongoDB into something which MySQL will understand. You will need to write some code that reads the data from MongoDB and then writes it into MySQL.
The documents in your MongoDB collections represent serialised forms of some classes (POJOs, domain objects, etc.) in your project. Presumably you read this data from MongoDB, deserialise it into its class form and use it in your project, e.g. display it to end users, use it in calculations, generate reports from it, etc.
Now you'd prefer to host that data in MySQL, so you'd like to know how to migrate it from MongoDB to MySQL, but since the persistence formats are radically different you are wondering how to do that.
Here are two options:
Use your application code to read the data from MongoDB, deserialise it into your classes and then write that data into MySQL using JDBC or an ORM mapping layer, etc. (a rough sketch of this approach follows at the end of this answer).
Use mongoexport to export the data from MongoDB (in JSON format) and then write some kind of adapter which is capable of mapping this data into the desired format for your MySQL data model.
The non-functionals (especially for the read and write aspects) will differ between these approaches, but fundamentally both are quite similar; they both (1) read from MongoDB, (2) map the document data to the relational model, and (3) write the mapped data into MySQL. The trickiest aspect of this flow is step 2, and since only you understand your data and your relational model, there is no tool which can magically do this for you. How would a third-party tool be sufficiently aware of your document model and your relational model to be able to perform this transformation for you?
You could investigate a MongoDB JDBC driver or use something like Apache Drill to facilitate JDBC queries onto your MongoDB. Since these can return a java.sql.ResultSet you would be dealing with a result format which is more suited for writing to MySQL, but it's likely that this still wouldn't match your target relational model, and hence you'd still need some form of transformation code.
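To make option 1 concrete, here is a rough Java sketch that reads documents with the MongoDB driver and inserts rows over plain JDBC. The connection strings, database, collection, table, and column names are invented, and the flat one-document-to-one-row mapping is the simplest possible case; nested documents or arrays would need tables of their own:

```java
// Rough sketch of option 1: read documents from MongoDB and insert them into
// MySQL over JDBC. All names and connection strings are invented; a real
// migration would batch the inserts and break nested structures into
// separate, normalised tables.
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class MongoToMysql {
    public static void main(String[] args) throws Exception {
        try (MongoClient mongo = MongoClients.create("mongodb://localhost:27017");
             Connection mysql = DriverManager.getConnection(
                     "jdbc:mysql://localhost:3306/appdb", "user", "password")) {

            MongoCollection<Document> customers =
                    mongo.getDatabase("appdb").getCollection("customers");

            PreparedStatement insert = mysql.prepareStatement(
                    "INSERT INTO customers (mongo_id, name, email) VALUES (?, ?, ?)");

            for (Document doc : customers.find()) {
                insert.setString(1, doc.getObjectId("_id").toHexString());
                insert.setString(2, doc.getString("name"));
                insert.setString(3, doc.getString("email"));
                insert.executeUpdate(); // in real code: addBatch()/executeBatch()
            }
        }
    }
}
```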

How to parse JSON data from a Kafka server with Spark Streaming?

I managed to connect Spark Streaming to my Kafka server, which holds data in JSON format. I want to parse this data in order to use the groupBy function as explained here: Can Apache Spark merge several similar lines into one line?
In fact, in that link the JSON data is imported from a file, which is clearly easier to handle. I didn't find something similar for a Kafka server.
Do you have any idea about it?
Thanks and regards
It's really hard to understand what you're asking because we can't see where you are now without code. Maybe this general guidance is what you need.
Your StreamingContext can be given a foreachRDD block, in which you'll get an RDD. Then you can call sqlContext.read.json(inputRDD) and you will have a DataFrame which you can process however you like.
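A rough Java sketch of that shape, using Spark 2.x and the spark-streaming-kafka-0-10 integration; the broker address, topic, group id, and the "category" column used in the groupBy are all assumptions:

```java
// Rough sketch: pull JSON strings from Kafka, and inside foreachRDD turn each
// micro-batch into a DataFrame so that groupBy and the rest of the SQL API
// become available. Broker, topic, group id and column names are invented.
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.spark.SparkConf;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class KafkaJsonStreaming {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("kafka-json").setMaster("local[2]");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));
        SparkSession spark = SparkSession.builder().config(conf).getOrCreate();

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "localhost:9092");
        kafkaParams.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        kafkaParams.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        kafkaParams.put("group.id", "json-demo");

        JavaDStream<String> jsonLines = KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(Arrays.asList("events"), kafkaParams))
                .map(ConsumerRecord::value);

        jsonLines.foreachRDD(rdd -> {
            if (rdd.isEmpty()) return;
            // Parse this micro-batch of JSON strings into a DataFrame
            Dataset<Row> df = spark.read().json(spark.createDataset(rdd.rdd(), Encoders.STRING()));
            df.groupBy("category").count().show(); // "category" is an assumed field name
        });

        jssc.start();
        jssc.awaitTermination();
    }
}
```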

JSON validation against a schema (Java EE application)

I have a use case where I need to validate JSON objects against a schema that can change in real time.
Let me explain my requirements:

1. I persist JSON objects (in MongoDB).
2. Before persisting, I MUST validate the data types of some of the fields of the JSON objects (mentioned in #1) against a schema.
3. I persist the schema in MongoDB.
4. I always validate the JSON objects against the latest schema available in the DB (so I don't think it matters much that the schema can change in real time; for me it is effectively static).
5. I am using a J2EE stack (Spring Framework).
Can anyone guide me here..?
Another way of doing it is to use an external library https://github.com/fge/json-schema-validator to do the work for you. The one I proposed supports draft 4 of JSON Schema.
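A minimal sketch with that library (com.github.fge:json-schema-validator, draft-04); the schema and instance are inline strings here, whereas in your case the schema would first be loaded from MongoDB:

```java
// Minimal sketch: validate a JSON instance against a JSON Schema using the
// fge json-schema-validator library. The schema and instance are hard-coded
// here; in the use case above the schema would come from MongoDB.
import com.fasterxml.jackson.databind.JsonNode;
import com.github.fge.jackson.JsonLoader;
import com.github.fge.jsonschema.core.report.ProcessingMessage;
import com.github.fge.jsonschema.core.report.ProcessingReport;
import com.github.fge.jsonschema.main.JsonSchema;
import com.github.fge.jsonschema.main.JsonSchemaFactory;

public class SchemaValidationDemo {
    public static void main(String[] args) throws Exception {
        JsonNode schemaNode = JsonLoader.fromString(
                "{\"type\":\"object\",\"properties\":{\"age\":{\"type\":\"integer\"}},\"required\":[\"age\"]}");
        JsonNode instance = JsonLoader.fromString("{\"age\":\"not a number\"}");

        JsonSchema schema = JsonSchemaFactory.byDefault().getJsonSchema(schemaNode);
        ProcessingReport report = schema.validate(instance);

        if (report.isSuccess()) {
            System.out.println("valid");
        } else {
            for (ProcessingMessage message : report) {
                System.out.println(message.getMessage()); // one message per violation
            }
        }
    }
}
```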
The IBM DataPower appliance has JSON Schema validation support. This will allow you to offload validation to an appliance that is designed for it, along with routing of data within the enterprise.