Can Apache Kafka/NiFi convert data into a JSON file?

Let's say you have a business application producing and storing enriched Product Master Data in its own environment, and once the enrichment is completed you want to make that data available in a Couchbase database.
To get that data from the business application's environment into Couchbase, let's assume I want to use Kafka to broadcast the changes and NiFi to distribute them to the final destination (Couchbase).
But Couchbase takes documents in JSON format. Can I use Kafka or NiFi to convert the pulled data into JSON? I know I could, for instance, put a solution such as Attunity between the business application and Kafka to replicate the data in real time. But let's assume there is no budget to implement Attunity, so we will temporarily expose a REST API on the business application side and pull the changed data from it with Kafka. Can I then convert that data into JSON with Kafka, or with NiFi?
EDIT:
The reason I want to know whether NiFi can do this is that our landscape is a bit more complex than I described. Between the business application and Couchbase, we have:
[Business App] - [ X ] - [Kafka] - [NiFi] - [DC/OS with KONG API Layer] - [Couchbase Cluster].
And I want to know whether I should implement a new data-replication solution in place of the X, or whether I should just use the business app's REST API, pull data from it with Kafka, and convert the data to JSON in NiFi.

There is a Couchbase sink connector for Kafka Connect. It will enable you to do exactly what you want, with a simple configuration-file-based approach.
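For illustration, a minimal sink configuration might look roughly like the sketch below (a standalone-worker .properties file). The Couchbase-specific property names vary between connector versions, and the topic, bucket, host, and credential values here are placeholders, so check the Couchbase connector documentation for the exact keys.

```properties
# Sketch of a Kafka Connect sink configuration for the Couchbase connector.
# Property names and values are illustrative placeholders; consult the
# connector documentation for the keys used by your connector version.
name=couchbase-sink
connector.class=com.couchbase.connect.kafka.CouchbaseSinkConnector
tasks.max=1

# Topic carrying the enriched Product Master Data
topics=product-master-data

# Couchbase connection details (placeholders)
couchbase.seed.nodes=couchbase1.example.com
couchbase.bucket=product-master
couchbase.username=connect_user
couchbase.password=secret

# The messages are assumed to be schemaless JSON
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false
```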

Related

NiFi: how to produce, via Kafka, Avro files with multiple records per file

I created a pipeline that handles a single JSON file (a vector of 5890 elements, each one a record) and sends it via Kafka in Avro format. The producer works fine, but when I read the topic with a consumer I get one FlowFile (one Avro file) per record: 5890 Avro files. How can I put or merge more records into a single Avro file?
I simply use PublishKafkaRecord_0_10 1.5.0 (with JsonTreeReader 1.5.0 and AvroRecordSetWriter 1.5.0) and ConsumeKafka_0_10 1.5.0.
Firstly, NiFi 1.5.0 is from January 2018. Please consider upgrading as this is terribly out of date. NiFi 1.15.3 is the latest as of today.
Secondly, the *Kafka_0_10 processors are geared toward very old versions of Kafka - are you really using v0.10 of Kafka? You have the following processors for later Kafka versions:
*Kafka_1_0 for Kafka 1.0+
*Kafka_2_0 for Kafka 2.0+
*Kafka_2_6 for Kafka 2.6+
It would be useful if you provided examples of your input and desired output, and explained what you are actually trying to achieve.
If you are looking to consume those messages in NiFi and you want a single FlowFile with many messages, you should use ConsumeKafkaRecord rather than ConsumeKafka. This will let you control how many records you'd like to see per 'file'.
If your consumer is not NiFi, then either it needs to merge the messages on its end, or you need to bundle all your records into one larger message when producing. However, that is not really the point of Kafka, as it's not geared towards large messages/files.

Is it possible to convert XML data into JSON format in Kafka

I have a requirement where a publisher sends XML data into an Apache Kafka cluster, and my consumer needs the data in JSON format.
Is it possible for Apache Kafka to convert XML data into JSON?
Apache Kafka can convert [...]
No. Kafka itself stores binary data without conversion.
Your client code is responsible for deserializing, parsing, reformatting, and re-serializing the data into other formats.
If you have consumers and producers that do not agree on a uniform format, it would be the responsibility of one of the parties (probably the consumer or the Kafka administrators) to come up with a standard way to provide a conversion service.
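For illustration, here is a rough sketch of that kind of client-side conversion: a plain Kafka consumer reads XML strings from one topic, converts them to JSON with Jackson, and republishes them to another topic. The broker address, topic names, and the choice of Jackson's XmlMapper are assumptions made for this example only.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.xml.XmlMapper;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class XmlToJsonBridge {

    public static void main(String[] args) {
        String brokers = "localhost:9092";   // placeholder broker address
        String xmlTopic = "xml-input";       // placeholder source topic
        String jsonTopic = "json-output";    // placeholder target topic

        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", brokers);
        consumerProps.put("group.id", "xml-to-json-bridge");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", brokers);
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        XmlMapper xmlMapper = new XmlMapper();        // parses the XML payload into a tree
        ObjectMapper jsonMapper = new ObjectMapper(); // writes that tree back out as JSON

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {

            consumer.subscribe(Collections.singletonList(xmlTopic));

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    try {
                        // Client-side conversion: XML in, JSON out
                        JsonNode tree = xmlMapper.readTree(record.value());
                        String json = jsonMapper.writeValueAsString(tree);
                        producer.send(new ProducerRecord<>(jsonTopic, record.key(), json));
                    } catch (Exception e) {
                        // A real application would route unparsable messages to a dead-letter topic
                        e.printStackTrace();
                    }
                }
            }
        }
    }
}
```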
Similar question, but KSQL doesn't support XML last I checked, so you'd have to at least use Kafka Streams. You could borrow some logic from this project; however, it stops short of actually providing JSON output.
You might be able to use that project with MirrorMaker 2, since it's built on Kafka Connect, as part of the same cluster, but YMMV since that's not a recommended pattern.

Spring Boot API - POST complete data from client

I have the task of implementing an API with Spring Boot and a relational database to save the data from a client (a mobile app) and synchronize it.
So far no problem. I have some endpoints to post and get the stored data.
Now I have the task of providing an endpoint that returns the complete data in a GET request, and another that saves the complete data of the client via a POST request.
My problem is:
How do I store the complete data in one POST request (JSON)?
The database has multiple entities with many-to-many relationships, and if I just POST them I have problems with the relations between the entities.
My approach to GET the complete data was to just create a new entity containing every other entity. Is this the best solution?
And is it even a good solution to POST the complete data instead of using the other endpoints to get the entities one by one? Or is there another approach to store and restore the complete data between server and client? I think that posting the complete data makes less sense.
is it even a good solution to POST the complete data instead of using the other endpoints to get the entities one by one
In some scenarios you may want to force an update or synchronize the client data with the server, for example WhatsApp's 'back up now' option.
How do I store the complete data in one POST request (JSON)?
You can make one POST endpoint that extracts the data sent from the client and internally uses all your repositories or DAOs for each property.
My approach to GET the complete data was to just create a new entity containing every other entity. Is this the best solution?
Either by doing as you mentioned, or by handling it manually in the endpoint like this.
Also check this one, which uses Apache Camel to aggregate multiple endpoints.
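To make the wrapper idea concrete, here is a rough sketch assuming Spring Boot 3 with Spring Data JPA: a single controller that returns and accepts the complete data set through one wrapper object. The Customer and PurchaseOrder entities, the repositories, and the /sync endpoint are all hypothetical placeholders, not taken from the original question; in a real project each top-level class would live in its own file with public visibility.

```java
import java.util.List;

import jakarta.persistence.Entity;
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.Id;

import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.transaction.annotation.Transactional;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

// Hypothetical entities standing in for the real domain model
@Entity
class Customer {
    @Id @GeneratedValue public Long id;
    public String name;
}

@Entity
class PurchaseOrder {
    @Id @GeneratedValue public Long id;
    public String description;
}

interface CustomerRepository extends JpaRepository<Customer, Long> {}
interface PurchaseOrderRepository extends JpaRepository<PurchaseOrder, Long> {}

// Wrapper DTO holding every entity type that makes up the "complete data"
class CompleteData {
    public List<Customer> customers;
    public List<PurchaseOrder> orders;
}

@RestController
@RequestMapping("/sync")
class SyncController {

    private final CustomerRepository customers;
    private final PurchaseOrderRepository orders;

    SyncController(CustomerRepository customers, PurchaseOrderRepository orders) {
        this.customers = customers;
        this.orders = orders;
    }

    // GET: assemble the complete data set into the wrapper and return it as one JSON document
    @GetMapping
    public CompleteData getCompleteData() {
        CompleteData data = new CompleteData();
        data.customers = customers.findAll();
        data.orders = orders.findAll();
        return data;
    }

    // POST: accept the complete data set in one request body and persist each part
    // through its own repository inside a single transaction
    @PostMapping
    @Transactional
    public void saveCompleteData(@RequestBody CompleteData data) {
        customers.saveAll(data.customers);
        orders.saveAll(data.orders);
    }
}
```

Whether posting everything at once is a good idea still depends on data volume; for large data sets, the existing per-entity endpoints are usually the more practical path.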

Bulk loading data into MarkLogic via external RESTful services

I have a series of documents that I need to migrate into MarkLogic. The documents are available to me via RESTful services in JSON. What I want to know is whether there is any way, such as through MLCP or Query Console, to call those RESTful services and pull in the data; otherwise I have to create a small Java app, dump the files to a share, and then pick them up from MarkLogic.
mlcp is designed to source data from the file system or a MarkLogic database. Take a look at the Java Client API to perform ingestion from other sources. For example, you can fire up your favorite HTTP client in Java and add the results to a DocumentWriteSet. The write set acts like a buffer, allowing you to batch requests to MarkLogic for efficiency. You can then send that DocumentWriteSet to MarkLogic with one of the DocumentManager.write() methods. Take a look at the documentation for many more details or the "Bulk Writes" section of the getting started cookbook.
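For example, assuming a recent (4.x or later) version of the MarkLogic Java Client API, a minimal sketch of that approach might look like the following. The host, port, credentials, batch size, document URIs, and the external service URL are placeholders, and the HTTP call uses Java's built-in java.net.http client for brevity.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

import com.marklogic.client.DatabaseClient;
import com.marklogic.client.DatabaseClientFactory;
import com.marklogic.client.document.DocumentWriteSet;
import com.marklogic.client.document.JSONDocumentManager;
import com.marklogic.client.io.Format;
import com.marklogic.client.io.StringHandle;

public class JsonIngest {

    public static void main(String[] args) throws Exception {
        // Placeholder MarkLogic connection details
        DatabaseClient client = DatabaseClientFactory.newClient(
                "localhost", 8000,
                new DatabaseClientFactory.DigestAuthContext("admin", "admin"));

        HttpClient http = HttpClient.newHttpClient();
        JSONDocumentManager docMgr = client.newJSONDocumentManager();
        DocumentWriteSet batch = docMgr.newWriteSet();

        // Pull documents from the external RESTful service (placeholder URL and batch size)
        for (int i = 1; i <= 100; i++) {
            HttpRequest request = HttpRequest.newBuilder(
                    URI.create("https://example.com/api/documents/" + i)).build();
            String json = http.send(request, HttpResponse.BodyHandlers.ofString()).body();

            // Buffer each document in the write set under a URI of our choosing
            batch.add("/migrated/doc-" + i + ".json",
                      new StringHandle(json).withFormat(Format.JSON));
        }

        // One round trip to MarkLogic for the whole batch
        docMgr.write(batch);
        client.release();
    }
}
```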

Node.js: connect to a database or a REST service

I have two choices for outputting the info in the database (MySQL) in JSON format:
directly connect to the database, fetch the data, and output JSON
connect to a REST service to get the data and output JSON
Which is better and why?
directly connect to the database, fetch the data, and output JSON
If you are connecting to the database (it doesn't matter whether it's MySQL or something else) directly through a binary-based protocol, it should be faster than going through a REST-based protocol.
connect to a REST service to get the data and output JSON
REST-based protocols, on the other hand, are generally simpler, more straightforward, and easier to implement from the client's point of view than binary ones.
Which is better and why?
It depends on whether you need speed or simplicity of use. With a binary connection you would additionally have to convert the fetched data to JSON yourself, whereas a REST service can usually give you just what you need, already in the desired JSON format. However, if speed is crucial for you, then the binary protocol is the better choice, I would say.