I am developing an application (B) which uses the Schema Registry and allows clients (A) to send data to any other platform (C). General design:
Client A -> Platform B -> Destination C
So I am not necessarily publishing messages to Kafka.
Is there any documentation on how to integrate and use the Schema Registry with a custom application which is not publishing to Kafka? All the documentation I could find is around using the registry with Kafka.
For this setup, I am finding it difficult to finalize the serialization format to use. If I use Avro/Protobuf, it requires a corresponding class/message in application B. This means clients will need to update B every time a new schema is added or an old one is updated. For JSON, such a class should not be needed (I guess?) since it is schemaless.
What is the intended schema format for your solution? If it is a binary structured format, i.e. you serialize keys/values from Java objects to byte arrays, then you definitely need a Schema Registry integrated into your application.
Refer to: https://docs.confluent.io/platform/current/schema-registry/index.html
Regarding your second point, the Schema Registry gives you better schema management in terms of schema evolution and compatibility checks, so you need not worry much about updating your Kafka clients whenever a schema evolves.
Answer to your questions:
Documentation is in the Schema Registry API Reference. Here is an example of how to add key/value schemas for Platform B.
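A minimal sketch of such a registration call in plain Java (no Kafka client involved); the subject name, the Avro schema, and the registry URL below are placeholders, and the key schema would be registered the same way under a "-key" subject:

// Registers a value schema for subject "platform-b-value" via the Schema Registry REST API.
// Requires Java 11+ (java.net.http). Subject, schema, and URL are placeholders.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterSchema {
    public static void main(String[] args) throws Exception {
        String registryUrl = "http://schema-registry:8081";
        String subject = "platform-b-value";
        // The schema itself is sent as an escaped string inside a JSON envelope.
        String body = "{\"schema\": \"{\\\"type\\\": \\\"record\\\", \\\"name\\\": \\\"Payload\\\", "
                    + "\\\"fields\\\": [{\\\"name\\\": \\\"id\\\", \\\"type\\\": \\\"string\\\"}]}\"}";

        HttpRequest request = HttpRequest.newBuilder(
                URI.create(registryUrl + "/subjects/" + subject + "/versions"))
            .header("Content-Type", "application/vnd.schemaregistry.v1+json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // e.g. {"id":1}
    }
}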
Once the key/value schemas are set up in the Schema Registry, you configure the connector properties to serialize/deserialize the messages.
Connector properties to serialize/deserialize Avro messages (connector sink example):
{
  "name": "eraseme_connector_schema_test_v01",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "key.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "topics": "eraseme_topic_schema_test_v05",
    "connection.url": "jdbc:mysql://xxx.xxxx.xxxx",
    "connection.user": "USER",
    "connection.password": "PASSWORD",
    "insert.mode": "upsert",
    "delete.enabled": "true",
    "pk.mode": "record_key",
    "pk.fields": "YOUR_KEY_1, YOUR_KEY_2,...,YOUR_KEY_N",
    "auto.create": "true",
    "auto.evolve": "true",
    "value.converter.schema.registry.url": "http://schema-registry:8081",
    "key.converter.schema.registry.url": "http://schema-registry:8081"
  }
}
All the documentation I could find is around using registry with kafka
Because the Confluent Registry stores its schemas in a Kafka topic, you will need a Kafka installation even if your services do not otherwise use it. Note: there are other Schema Registry implementations.
finding it difficult to finalize the serialization format to use
It doesn't really matter; the Confluent Registry also supports defining your own formats. What actually matters is how well that format is supported by your clients, e.g. Avro support in Golang or .NET isn't the greatest.
requires a corresponding class/message in application B. This means clients will need to update B every time a new schema is added or an old one updated
So? Do you never upgrade other third-party libraries for new methods or fixes? Why would you treat your schemas any differently?
The Registry enforces backward-compatible changes by default, so schema upgrades in clients aren't required unless you actually need the new data being sent.
For JSON, such a class should not be needed (I guess?) since it is schemaless
The Confluent Schema Registry uses JSON Schema, so it is not schemaless.
Kafka itself supports a form of JSON in the Connect API that has a schema, but it is not integrated with any Registry, and as such cannot guarantee any schema evolution rules.
any documentation on how to integrate and use Schema registry with a custom application which is not publishing to kafka
You'd use the REST API directly.
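For example (a sketch only; the subject name and URL are placeholders), Platform B could fetch the latest registered schema for a subject over plain HTTP and use it to validate or serialize incoming data:

// Fetches the latest schema for a subject so a non-Kafka service can use it for
// validation/serialization. Subject and URL are placeholders; Java 11+ HttpClient.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class FetchLatestSchema {
    public static void main(String[] args) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://schema-registry:8081/subjects/platform-b-value/versions/latest"))
            .GET()
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        // The response JSON contains "subject", "version", "id" and the "schema" string.
        System.out.println(response.body());
    }
}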
Please find below a question on the Couchbase product roadmap.
CONTEXT
I am currently working on a feature to turn an existing system (a backend app and a mobile app) into a multi-tenant app.
The stack would be :
Couchbase 7.0
Spring Boot 2.5.5
Spring Data Couchbase : 4.2.5
SyncGateway : 2.8
PROBLEM
As far as I have read in the documentation, a good practice would be to use the new Couchbase 7.0 feature: Scopes and Collections.
However, not everything seems to be ready:
The theory:
https://blog.couchbase.com/scopes-and-collections-for-modern-multi-tenant-applications-couchbase-7-0/
https://blog.couchbase.com/how-to-migrate-to-scopes-and-collections-in-couchbase-7-0/
In practice:
1- Spring Data Couchbase (v4.2.5) has not yet been released with support for named scopes or collections:
https://docs.spring.io/spring-data/couchbase/docs/4.2.5/reference/html/#reference
The 4.3.0-M3 GitHub example uses the @Collection annotation, which works but is not mentioned in the documentation.
2- Couchbase Sync Gateway 2.8 is not compatible, and the 3.0 documentation does not mention compatibility:
Sync Gateway offers support for Couchbase Server’s default scopes and collections (Default Collections).
It does not currently support named scopes or collections (Named Collections).
https://docs.couchbase.com/sync-gateway/current/server-compatibility-collections.html#using-collections
QUESTIONS
To plan the project, or to change the data structure (for example, just using buckets), do you know when the full stack will be compatible with named scopes and collections?
Do you know when Spring Data Couchbase 4.3.0 will be released (the 4.3.0-M3 version seems to be OK)?
Do you know when a Sync Gateway version compatible with named scopes and collections will be released?
Thank you very much for your help.
Regards,
Matthieu.
Spring Data Couchbase 4.3.0 with Collections support is scheduled to be released on November 12. The try-cb-spring sample uses collections. I can't speak for Sync Gateway.
Mike
Named scopes and collections support on Sync Gateway is on our radar. We do not have a publicly available timeline at this point.
Note that until that is supported, there are alternate ways to deploy multi-tenant applications using Sync Gateway. You can point sync gateway databases to separate buckets or use the type value prefix pattern to segregate documents by tenant.
The new configuration approach in Sync Gateway 3.0 (beta) simplifies the administration of multi-tenant deployments by allowing the Sync Gateway database configuration for each tenant to be managed independently of the others.
@MattMatt - I'm not familiar with the Sync Gateway API, but all that is needed for collection support in N1QL/query is an HTTP parameter query_context=default:<bucket-name>.<scope-name> (bucket-name and scope-name escaped separately with backticks); then, instead of specifying the bucket name in the query, specify the collection name. If Sync Gateway has an API to add 'raw' HTTP parameters, that could be used for N1QL/query without specific collection support.
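To illustrate (host, credentials, and bucket/scope/collection names are placeholders; query_context is passed in the JSON request body here, though it can equally be sent as a regular HTTP parameter):

// Runs a N1QL query against a named collection by passing query_context.
// Host, credentials, and names are placeholders; the query service listens on port 8093.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class QueryWithContext {
    public static void main(String[] args) throws Exception {
        String body = "{\"statement\": \"SELECT * FROM `my-collection` LIMIT 10\","
                    + " \"query_context\": \"default:`my-bucket`.`my-scope`\"}";
        String auth = Base64.getEncoder().encodeToString("user:password".getBytes());

        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://couchbase-host:8093/query/service"))
            .header("Content-Type", "application/json")
            .header("Authorization", "Basic " + auth)
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}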
I am trying to produce JSON records from my Scala producer code to a Kafka topic. The records are produced successfully; however, I am not able to register the schema and do schema-evolution compatibility checks.
I am not able to find any proper code/doc references. How do I register my JSON schema, consume by connecting to a Schema Registry client, and check for compatibility?
Any suggestions, please? (More about what I am hitting: Class io.confluent.kafka.serializers.json.KafkaJsonSchemaSerializer could not be found.)
Compatibility is checked server-side automatically upon producing, which in turn registers the schemas, by default.
You provide schema.registry.url in the producer and consumer properties when using the Kafka clients with the JSON Schema (de)serializer classes.
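A rough sketch of the producer side (shown in Java, but the same properties apply from Scala). It assumes the io.confluent:kafka-json-schema-serializer artifact is on the classpath (the "class could not be found" error usually means that dependency is missing) and uses MyEvent as a stand-in for your own value class:

// Producer configuration for JSON Schema + Schema Registry; the schema is derived
// from the value class and auto-registered on first produce (auto.register.schemas
// defaults to true), at which point compatibility is checked server-side.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class JsonSchemaProducer {

    // Placeholder value class; the serializer derives a JSON schema from it.
    public static class MyEvent {
        public String id;
        public MyEvent() { }
        public MyEvent(String id) { this.id = id; }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "io.confluent.kafka.serializers.json.KafkaJsonSchemaSerializer");
        props.put("schema.registry.url", "http://schema-registry:8081");

        try (KafkaProducer<String, MyEvent> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "key-1", new MyEvent("id-1")));
        }
    }
}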
I want to download data from a REST API into a database. The data I want to save are typed objects, like Java objects. I have chosen Cassandra because, unlike standard SQL databases (MySQL, SQLite, ...), it supports array and map types, which makes it better for serializing Java objects.
First, I need to create the CQL tables from the JSON schema of the REST API. How is it possible to generate CQL tables from the JSON schema of a REST API?
I know openapi-generator can generate a MySQL schema from a JSON schema, but it does not support CQL for the moment, so I need to look for an alternative solution.
I haven't used off-the-shelf packages extensively to manage Cassandra schemas, but open-source projects or software like Hackolade might do it for you.
https://cassandra.link/cassandra.toolkit/ managed by Anant (I don't have any affiliation) has an extensive list of resources you might be interested in. Cheers!
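If you end up writing your own generator instead, the core of it is a type mapping from JSON Schema to CQL. A very rough sketch (the flat-schema assumption, type choices, and key handling are illustrative only, not a definitive mapping):

// Builds a CREATE TABLE statement from a flat JSON Schema's top-level "properties".
// Illustrative only: nested objects, arrays of objects, and key selection need real design.
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class JsonSchemaToCql {
    static String cqlType(String jsonType) {
        switch (jsonType) {
            case "string":  return "text";
            case "integer": return "bigint";
            case "number":  return "double";
            case "boolean": return "boolean";
            case "array":   return "list<text>";      // element type would need inspection
            case "object":  return "map<text, text>"; // or a user-defined type
            default:        return "text";
        }
    }

    static String createTable(String table, Map<String, String> properties, String keyColumn) {
        String columns = properties.entrySet().stream()
            .map(e -> e.getKey() + " " + cqlType(e.getValue()))
            .collect(Collectors.joining(", "));
        return "CREATE TABLE " + table + " (" + columns + ", PRIMARY KEY (" + keyColumn + "));";
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("id", "string");
        props.put("price", "number");
        props.put("tags", "array");
        System.out.println(createTable("products", props, "id"));
        // CREATE TABLE products (id text, price double, tags list<text>, PRIMARY KEY (id));
    }
}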
I publish messages to a Kafka topic (outputTopic) in a format which my subscribers can understand. I now wish to modify the format of these messages in a way which will break the existing topic consumers.
For example, I post objects, serialised in json format, but need to change the objects, and therefore the schema.
What is the best way to manage this type of change? Should I alter the producer so that it publishes to a new topic (outputTopic2)? Are there any better ways to manage this?
Avro schemas will not solve the problem. You could update the consumers to handle both old and new versions whether or not there are schemas.
Instead, keep your producer as it is. Deploy an updated version that reads from the same data source and publishes data to a new topic with the new updated format.
Allow consumers to migrate from the old version to the new version before finally killing the old one.
One clean way to do this, in my opinion, is to use a Schema Registry with Apache Avro. Depending on how you use it, it will help you guarantee backward/forward compatibility.
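For instance (a sketch; the subject name, candidate schema, and registry URL are placeholders), you can ask the Registry whether a changed schema is still compatible with the latest registered version before rolling it out:

// Asks the Schema Registry whether a candidate schema is compatible with the latest
// version registered under a subject. Subject, schema, and URL are placeholders.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CompatibilityCheck {
    public static void main(String[] args) throws Exception {
        // Candidate schema adds a "note" field with a default, which is backward compatible.
        String body = "{\"schema\": \"{\\\"type\\\": \\\"record\\\", \\\"name\\\": \\\"Payload\\\", "
                    + "\\\"fields\\\": [{\\\"name\\\": \\\"id\\\", \\\"type\\\": \\\"string\\\"}, "
                    + "{\\\"name\\\": \\\"note\\\", \\\"type\\\": \\\"string\\\", \\\"default\\\": \\\"\\\"}]}\"}";

        HttpRequest request = HttpRequest.newBuilder(URI.create(
                "http://schema-registry:8081/compatibility/subjects/outputTopic-value/versions/latest"))
            .header("Content-Type", "application/vnd.schemaregistry.v1+json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // e.g. {"is_compatible":true}
    }
}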
I have a use case where I need to validate JSON objects against a schema that can change in real time.
Let me explain my requirements:
1. I persist JSON objects (MongoDB).
2. Before persisting, I MUST validate the data types of some of the fields of the JSON objects (mentioned in #1) against a schema.
3. I persist the schema in MongoDB.
4. I always validate the JSON objects against the latest schema available in the DB (so I don't think it matters much even if the schema can change in real time; for me it is kind of static).
5. I am using a J2EE stack (Spring Framework).
Can anyone guide me here?
Another way of doing it is to use an external library https://github.com/fge/json-schema-validator to do the work for you. The one I proposed supports draft 4 of JSON Schema.
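A small sketch of how that library can be wired in (the schema and document strings below are inline placeholders; in your case they would come from MongoDB):

// Validates a JSON document against a JSON Schema (draft 4) using
// com.github.fge:json-schema-validator. Schema and document are inline placeholders.
import com.fasterxml.jackson.databind.JsonNode;
import com.github.fge.jackson.JsonLoader;
import com.github.fge.jsonschema.core.report.ProcessingReport;
import com.github.fge.jsonschema.main.JsonSchema;
import com.github.fge.jsonschema.main.JsonSchemaFactory;

public class ValidateJson {
    public static void main(String[] args) throws Exception {
        JsonNode schemaNode = JsonLoader.fromString(
            "{\"type\": \"object\", \"properties\": {\"age\": {\"type\": \"integer\"}}, \"required\": [\"age\"]}");
        JsonNode document = JsonLoader.fromString("{\"age\": 42}");

        JsonSchema schema = JsonSchemaFactory.byDefault().getJsonSchema(schemaNode);
        ProcessingReport report = schema.validate(document);

        if (report.isSuccess()) {
            System.out.println("valid"); // safe to persist the document to MongoDB
        } else {
            report.forEach(message -> System.out.println(message.getMessage()));
        }
    }
}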
The IBM DataPower appliance has JSON Schema validation support. This will allow you to offload validation to an appliance that is designed for it, along with routing of data within the enterprise.