Kafka Topic Message Versioning - json

I publish messages to a Kafka topic (outputTopic) in a format which my subscribers can understand. I now wish to modify the format of these messages in a way that will break the existing topic consumers.
For example, I post objects serialised as JSON, but need to change the objects, and therefore the schema.
What is the best way to manage this type of change? Should I alter the producer so that it publishes to a new topic (outputTopic2)? Are there any better ways to manage this?

Avro schemas by themselves will not solve the problem: with or without schemas, you could update the consumers to handle both the old and the new version.
Instead, keep your existing producer as it is and deploy an updated version alongside it that reads from the same data source and publishes to a new topic in the new format.
Allow consumers to migrate from the old version to the new one before finally retiring the old producer and topic.
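A minimal sketch of what the second producer could look like with the plain Java client; the topic name outputTopic2, the bootstrap address and the payload fields are only illustrative, and the JSON is sent as a plain string here:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class NewFormatProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");      // adjust for your cluster
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // New (v2) shape of the payload; the old producer keeps writing v1 to "outputTopic"
            String v2Json = "{\"orderId\":\"42\",\"amount\":{\"value\":9.99,\"currency\":\"EUR\"}}";
            producer.send(new ProducerRecord<>("outputTopic2", "42", v2Json));
        }
    }
}
```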

One clean way to handle this, in my opinion, is to use a Schema Registry together with Apache Avro. Depending on how you configure it, it will help you guarantee backward/forward compatibility.
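To make the compatibility point concrete, here is a hedged sketch, using Avro's SchemaBuilder and SchemaCompatibility helpers, of why adding a field with a default keeps a schema backward compatible; the record and field names are made up for illustration:

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.SchemaCompatibility;

public class CompatibilityCheck {
    public static void main(String[] args) {
        Schema v1 = SchemaBuilder.record("Order").fields()
                .requiredString("id")
                .requiredDouble("amount")
                .endRecord();

        // v2 adds "currency" with a default, so readers using v2 can still read v1 data
        Schema v2 = SchemaBuilder.record("Order").fields()
                .requiredString("id")
                .requiredDouble("amount")
                .name("currency").type().stringType().stringDefault("USD")
                .endRecord();

        SchemaCompatibility.SchemaPairCompatibility result =
                SchemaCompatibility.checkReaderWriterCompatibility(v2, v1);
        System.out.println(result.getType());   // COMPATIBLE
    }
}
```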

Related

JSON and Schema Registry

I am trying to produce JSON records from my Scala producer code to a Kafka topic. The records are produced successfully, however I am not able to register the schema or run schema evolution compatibility checks.
I am not able to find any proper code/documentation references. How do I register my JSON schema, consume by connecting to a Schema Registry client, and check for compatibility?
Any suggestions please? (More on what I am hitting: Class io.confluent.kafka.serializers.json.KafkaJsonSchemaSerializer could not be found.)
By default, compatibility is checked server-side automatically when you produce, which in turn registers the schemas.
You provide schema.registry.url in the producer and consumer properties when using the clients with the JSON Schema (de)serializer classes.
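The class in your error comes from Confluent's kafka-json-schema-serializer artifact, so that dependency has to be on the classpath. Here is a minimal sketch of the producer wiring in Java; the registry URL, topic name and the Payment POJO are assumptions for illustration, and a Scala producer would set the same properties:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import io.confluent.kafka.serializers.json.KafkaJsonSchemaSerializer;

public class JsonSchemaProducer {
    // Plain POJO: the serializer derives a JSON Schema from it and registers it
    public static class Payment {
        public String id;
        public double amount;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", KafkaJsonSchemaSerializer.class.getName());
        props.put("schema.registry.url", "http://localhost:8081"); // your registry
        // auto.register.schemas defaults to true, so producing registers the schema,
        // and the registry rejects the write if it breaks the subject's compatibility setting

        try (KafkaProducer<String, Payment> producer = new KafkaProducer<>(props)) {
            Payment p = new Payment();
            p.id = "p-1";
            p.amount = 12.5;
            producer.send(new ProducerRecord<>("payments-json", p.id, p));
        }
    }
}
```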

Is there any best way to transfer bulk data from Mysql to Mongodb?

I am using MongoDB for the first time here. Is there a good way to transfer bulk data from MySQL into MongoDB? I have searched in different ways but did not find anything.
You would have to map all your MySQL tables to documents in MongoDB, but you can use an existing tool for that.
You can try: Mongify (http://mongify.com/)
It's a super simple way to move your data from MySQL to MongoDB. It has a ton of support for changing your existing schema into one that works better with MongoDB.
Mongify will read your MySQL database and build a translation file for you; all you have to do is map how you want your data transformed.
It supports:
Updating internal IDs (to BSON ObjectID)
Updating referencing IDs
Typecasting values
Embedding Tables into other documents
Before filters (to change data manually before import)
and much much more...
There is also a short 5 min video on the homepage that shows you how easy it is.
Try it out and tell me what you think.
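Mongify is a Ruby tool; if it doesn't fit your stack, a hand-rolled bulk copy is also straightforward. Here is a hedged sketch using plain JDBC and the MongoDB Java driver, where the connection strings, the customers table and the target collection name are placeholders:

```java
import java.sql.*;
import java.util.ArrayList;
import java.util.List;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class MySqlToMongoBulkCopy {
    public static void main(String[] args) throws Exception {
        try (Connection mysql = DriverManager.getConnection(
                     "jdbc:mysql://localhost:3306/shop", "user", "secret");
             MongoClient mongo = MongoClients.create("mongodb://localhost:27017")) {

            MongoCollection<Document> target =
                    mongo.getDatabase("shop").getCollection("customers");

            try (Statement stmt = mysql.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT * FROM customers")) {

                ResultSetMetaData meta = rs.getMetaData();
                List<Document> batch = new ArrayList<>();

                while (rs.next()) {
                    // Copy every column of the row into one MongoDB document
                    Document doc = new Document();
                    for (int i = 1; i <= meta.getColumnCount(); i++) {
                        doc.append(meta.getColumnLabel(i), rs.getObject(i));
                    }
                    batch.add(doc);

                    if (batch.size() == 1000) {     // insert in batches of 1000
                        target.insertMany(batch);
                        batch.clear();
                    }
                }
                if (!batch.isEmpty()) {
                    target.insertMany(batch);
                }
            }
        }
    }
}
```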

Querying unpredictable JSON objects from Elasticsearch using Spring Boot

I am creating a Spring Boot application which will interact with Elasticsearch using Spring Data. The problem is that my data in Elasticsearch is unpredictable: there can be slight changes in the fields, such as additional fields or entirely new fields arriving in the JSON. Please guide me to a solution that addresses this. Using a normal repository does not seem to work because I don't have a fixed JSON format. Your guidance would be highly appreciated.
You need to provide a bit more data on your case.
Normally, when you use @Field annotations, or introduce/drop a simple or object field, this should not be a problem at all, since spring-data-elasticsearch updates the mapping when you save to an ElasticsearchRepository. In some cases, e.g. introducing a parent-child relationship, you would need to drop and recreate the index, but this can also be done programmatically if needed.
If you need an advanced mapping that also changes dynamically, then you need to build and execute a mapping update request from your code on save (custom repository).
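As a hedged sketch of one way to tolerate fields you cannot predict (not the only option): model the fields you do know about and collect everything else in a Map that is persisted as a nested object, so a normal repository keeps working. The index, class and field names below are illustrative:

```java
import java.util.HashMap;
import java.util.Map;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

@Document(indexName = "events")
class Event {

    @Id
    private String id;

    @Field(type = FieldType.Keyword)
    private String type;

    // Everything that has no dedicated property goes in here and is persisted as a
    // nested "extra" object; Elasticsearch's dynamic mapping picks up its sub-fields
    // as they appear.
    @Field(type = FieldType.Object)
    private Map<String, Object> extra = new HashMap<>();

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
    public String getType() { return type; }
    public void setType(String type) { this.type = type; }
    public Map<String, Object> getExtra() { return extra; }
    public void setExtra(Map<String, Object> extra) { this.extra = extra; }
}

// A normal repository still works, because the typed part of the document is stable
interface EventRepository extends ElasticsearchRepository<Event, String> {
}
```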

ETL between a MySQL primary Data Store and a MongoDB secondary Data Store

We have a Rails app with a MySQL backend; each client has one DB and the schema is identical. We use a custom gem to switch the DB based on the URL of the request (this is legacy code that we are trying to move away from).
We need to capture some changes from those MySQL databases (changes in inventory, some order information, etc.), transform them, and store them in a single MongoDB database (a multitenant data store). This data will be used for analytics at first, but our idea is to eventually move everything there.
There was something in place to do this, using AR callbacks and Rabbit, but to be honest it wasn't working correctly and it looked like it was more trouble to fix it than to start over with a fresh approach.
We did some research and found some tools to do ETL but they are overkill for our needs.
Does anyone have some experience with a similar problem?
Recommendations on how to architect and implement this simple ETL
Pentaho provides a change-data-capture option which can solve data-synchronization problems.
If by overkill you mean setup and configuration, then yes, that is a common problem with ETL tools, and Pentaho is the easiest among them.
If you can provide more details, I'll be glad to provide an elaborate answer.
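If a full ETL tool really is overkill, the "simple ETL" can be as small as a scheduled job that polls each tenant's MySQL database for rows changed since the last run and upserts them into one multitenant MongoDB collection. Below is a hedged sketch, assuming an updated_at column exists; all connection strings, table/column names and the tenant id are placeholders:

```java
import java.sql.*;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.ReplaceOptions;
import org.bson.Document;

public class InventorySync {

    public static void syncTenant(String tenantId, String mysqlUrl,
                                  MongoCollection<Document> inventory,
                                  Timestamp lastRun) throws SQLException {
        try (Connection conn = DriverManager.getConnection(mysqlUrl, "etl", "secret");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT sku, quantity, updated_at FROM inventory WHERE updated_at > ?")) {
            ps.setTimestamp(1, lastRun);

            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    Document doc = new Document("tenant", tenantId)
                            .append("sku", rs.getString("sku"))
                            .append("quantity", rs.getInt("quantity"))
                            .append("updatedAt", rs.getTimestamp("updated_at"));

                    // Upsert keyed on (tenant, sku) so re-runs are idempotent
                    inventory.replaceOne(
                            Filters.and(Filters.eq("tenant", tenantId),
                                        Filters.eq("sku", doc.getString("sku"))),
                            doc,
                            new ReplaceOptions().upsert(true));
                }
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        try (MongoClient mongo = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> inventory =
                    mongo.getDatabase("analytics").getCollection("inventory");
            // In a real job the watermark would be persisted between runs
            Timestamp lastRun = Timestamp.valueOf("2024-01-01 00:00:00");
            syncTenant("client-a", "jdbc:mysql://localhost:3306/client_a", inventory, lastRun);
        }
    }
}
```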

Mix of MySQL and Mongodb in an application

I'm writing a web application using PHP/Symfony2/Doctrine2 and am just finishing up the database design. We have to import objects (for example Projects and Vendors) into our database that come from different customers with a variety of fields. Some customers have 2 fields in the project object and some have 20, so I was thinking about storing them in MongoDB since it seems like a good use for it.
Symfony2 supports both ORM and ODM, so that shouldn't be a problem. Now my question is how to ensure the integrity of the data across both databases, because objects in my MySQL DB need to somehow be linked to the objects in MongoDB.
Are there any better solutions out there? Any help/thoughts would be appreciated.
Bulat implemented a Doctrine extension while we were at OpenSky for handling references between MongoDB documents and MySQL records, which is currently sitting in their (admittedly outdated) fork of the DoctrineExtensions project. You'll want to look at either the orm2odm_references or openskyfork branches. For this to be usable in your project, you'll probably want to port it over to a fresh fork of DoctrineExtensions, or simply incorporate the code into your application. Unfortunately, there is no documentation apart from the code itself.
Thankfully, there is also a cookbook article on the Doctrine website that describes how to implement this from scratch. Basically, you rely on an event listener to replace your property with a reference (i.e. an uninitialized Proxy object) from the other object manager, and the natural behavior of Proxy objects to lazily load themselves takes care of the rest. Provided the event listener is a service, you can easily inject both the ORM and ODM object managers into it.
The only integrity guaranteed by this model is that you'll receive exceptions when trying to hydrate a bad reference, which is probably more than you'd get by simply storing an ID of the other database and querying manually.
So the way we solved this problem was by moving to Postgres. Postgres has a datatype called hstore that acts like a NoSQL column. Works pretty sweet.
UPDATE
Now that I'm looking back, go with jsonb instead of json or hstore as it allows you to have more of a data structure than a key-value store.
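For what it's worth, here is a hedged sketch of what the jsonb approach looks like, shown with plain JDBC rather than Doctrine since the idea carries over: the variable, per-customer fields live in one jsonb column next to the fixed ones. The table and column names are made up for illustration.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class JsonbExample {
    public static void main(String[] args) throws Exception {
        try (Connection pg = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/app", "app", "secret")) {

            // Fixed columns plus one jsonb column for whatever fields this customer sends
            try (PreparedStatement ins = pg.prepareStatement(
                    "INSERT INTO projects (name, attrs) VALUES (?, ?::jsonb)")) {
                ins.setString(1, "Project X");
                ins.setString(2, "{\"vendor_code\": \"V-17\", \"budget\": 25000}");
                ins.executeUpdate();
            }

            // jsonb stays queryable: ->> extracts a field as text
            try (PreparedStatement q = pg.prepareStatement(
                    "SELECT name FROM projects WHERE attrs->>'vendor_code' = ?")) {
                q.setString(1, "V-17");
                try (ResultSet rs = q.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString("name"));
                    }
                }
            }
        }
    }
}
```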