Is it possible to schedule and monitor MR and Spark jobs using Apache NiFi?

I would like to use Apache NiFi for complete data-pipeline scheduling and monitoring, covering both batch jobs (MapReduce, Hive) and streaming jobs (Spark).
Is it possible to schedule and monitor MR and Spark jobs using Apache NiFi?
If so, what are the implementation steps?

Related

Publish Data via Kafka with Palantir Foundry

We would like to publish data (records of a dataset) via Kafka to our enterprise service bus. We are on-prem. I know that it is possible to consume data via Kafka, but I have not found any documentation on how to publish it. Is this possible?

MySQL Trigger to RabbitMQ Communication

I have an application on SQL Server that sends change data to a target database using SQL Service Broker. I capture the data in a trigger and push it into the Service Broker queue. Now I want to make my application compatible with MySQL. The problem is how to achieve exactly the same implementation in MySQL, since Service Broker is not supported there. If I use an external message broker like RabbitMQ, how can a MySQL table trigger communicate directly with RabbitMQ?
Thanks in advance
If you can use Kafka as an alternative to RabbitMQ, there are some tutorials on how to accomplish a similar goal:
Using Kafka Connect
Using GCP services
Change data capture (CDC) is the term used for this pattern of reacting to data changes and then delivering those changes in real time to a downstream process.
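To make the Kafka Connect route concrete, here is a minimal sketch that registers a Debezium MySQL source connector through the Connect REST API. It assumes a Connect worker at localhost:8083 with the Debezium plugin installed; the hostnames, credentials, and table names are hypothetical, and the property names follow Debezium 2.x (older versions use database.server.name and database.history.* instead).

    import json
    import requests  # assumes the Kafka Connect REST API is reachable

    CONNECT_URL = "http://localhost:8083/connectors"  # hypothetical worker address

    # Debezium reads the MySQL binlog directly, so no table triggers are
    # needed; every row change is published to a Kafka topic.
    connector = {
        "name": "mysql-cdc-source",
        "config": {
            "connector.class": "io.debezium.connector.mysql.MySqlConnector",
            "database.hostname": "mysql.example.com",
            "database.port": "3306",
            "database.user": "debezium",
            "database.password": "secret",
            "database.server.id": "184054",   # unique binlog client ID
            "topic.prefix": "inventory",      # topics become inventory.<db>.<table>
            "table.include.list": "inventory.orders",
            "schema.history.internal.kafka.bootstrap.servers": "localhost:9092",
            "schema.history.internal.kafka.topic": "schema-changes.inventory",
        },
    }

    resp = requests.post(CONNECT_URL, json=connector)
    resp.raise_for_status()
    print(json.dumps(resp.json(), indent=2))

A downstream consumer (or a RabbitMQ bridge, if you still need one) can then subscribe to the change topics instead of talking to MySQL directly.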

Kafka JDBC Sink connector with json messages without schema

I am trying to load JSON messages into a Postgres database using the Postgres sink connector. I have been reading online and have only found the option of having the schema in the JSON message itself; ideally, I would like not to include the schema in the message. Is there a way to register the JSON schema in the Schema Registry and use it like it's done with Avro?
Also, I'm currently running Kafka by downloading the bin, as I had several problems running Kafka Connect with Docker due to ARM compatibility issues. Is there a similar install for Schema Registry? I'm only finding the option of downloading it through Confluent and running it on Docker. Is it possible to run only Schema Registry with Docker, keeping my current setup?
Thanks
JSON without schema
The JDBC sink connector requires a schema.
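One way to satisfy that requirement without embedding the schema in every message is to let the converter fetch it from the Schema Registry. Below is a minimal sketch of a JDBC sink registered via the Connect REST API; the connection URL, credentials, and topic are hypothetical placeholders.

    import requests  # assumes a Connect worker at this (hypothetical) address

    CONNECT_URL = "http://localhost:8083/connectors"

    # The JsonSchemaConverter pulls the schema from the registry, so the
    # JSON payloads on the topic stay schema-free.
    sink = {
        "name": "postgres-sink",
        "config": {
            "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
            "topics": "users",
            "connection.url": "jdbc:postgresql://localhost:5432/mydb",
            "connection.user": "postgres",
            "connection.password": "secret",
            "auto.create": "true",
            "value.converter": "io.confluent.connect.json.JsonSchemaConverter",
            "value.converter.schema.registry.url": "http://localhost:8081",
        },
    }

    requests.post(CONNECT_URL, json=sink).raise_for_status()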
Is there a way to register the JSON schema in the schema registry and use that like it's done with Avro?
Yes, the Registry supports JSONSchema (as well as Protobuf), in addition to Avro. This requires you to use a specific serializer; you cannot just send plain JSON to the topic.
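For the producing side, a minimal sketch with the confluent-kafka Python client looks like this; the broker and registry addresses, topic, and schema are hypothetical. The serializer registers the JSON Schema with the registry and prefixes each message with the schema ID, which is what lets the JsonSchemaConverter above decode it.

    from confluent_kafka import Producer
    from confluent_kafka.schema_registry import SchemaRegistryClient
    from confluent_kafka.schema_registry.json_schema import JSONSerializer
    from confluent_kafka.serialization import SerializationContext, MessageField

    # A toy JSON Schema for the record value.
    schema_str = """
    {
      "$schema": "http://json-schema.org/draft-07/schema#",
      "title": "User",
      "type": "object",
      "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"}
      },
      "required": ["name"]
    }
    """

    sr_client = SchemaRegistryClient({"url": "http://localhost:8081"})
    serializer = JSONSerializer(schema_str, sr_client)

    producer = Producer({"bootstrap.servers": "localhost:9092"})
    value = {"name": "alice", "age": 30}
    producer.produce(
        topic="users",
        value=serializer(value, SerializationContext("users", MessageField.VALUE)),
    )
    producer.flush()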
currently running kafka by downloading the bin... Is there a similar install for schema registry?
The Confluent Schema Registry is not distributed as a standalone package outside of Docker. You'd have to download Confluent Platform in place of Kafka, copy your existing zookeeper.properties and server.properties into it, and then run Schema Registry. Otherwise, compile it from source and build a standalone distribution of it with mvn -Pstandalone package.
There are other registries as well, such as Apicurio.

RESTful API for Cosmos map-reduce

I'm working with the FIWARE Big Data Analysis GE (Cosmos), and I need to use RESTful APIs.
I know that there are RESTful APIs for HDFS (e.g. WebHDFS), but can I also run MapReduce jobs that way? How?
Thanks
There is no REST API for running MapReduce jobs directly, but there is an Oozie server running on the Cosmos instance in FIWARE Lab that addresses this need.
With Oozie you can describe data-analysis workflows; you can see a workflow as a sequence of actions executed to process data, where those actions can be MapReduce jobs, Hive queries, shell scripts, etc.
Thus, you can describe a single-action workflow for a MapReduce job.
Oozie can be used in several ways, one of them being its REST API. All the details can be found here.
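As an illustration of the REST route, here is a minimal sketch that submits and monitors a workflow through Oozie's web services API. The Oozie host, user, and HDFS path are hypothetical; the workflow.xml at that path would define the single MapReduce action.

    import requests  # assumes an Oozie server at this (hypothetical) address

    OOZIE = "http://oozie.example.com:11000/oozie/v1"

    # Job properties as Hadoop configuration XML; oozie.wf.application.path
    # points at the HDFS directory containing workflow.xml.
    config = """<configuration>
      <property><name>user.name</name><value>myuser</value></property>
      <property><name>oozie.wf.application.path</name>
        <value>hdfs:///user/myuser/workflows/mr-wordcount</value></property>
    </configuration>"""

    # Submit and start the workflow in one call.
    resp = requests.post(f"{OOZIE}/jobs", params={"action": "start"},
                         data=config,
                         headers={"Content-Type": "application/xml"})
    resp.raise_for_status()
    job_id = resp.json()["id"]

    # Poll the job; "status" will be RUNNING, SUCCEEDED, KILLED, etc.
    info = requests.get(f"{OOZIE}/job/{job_id}", params={"show": "info"}).json()
    print(job_id, info["status"])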

What is the difference between FIWARE Cosmos and Apache Spark?

What is the difference between FIWARE Cosmos and Apache Spark?
As I told you by email, Cosmos is not a new Big Data system but a manager of Hadoop clusters; it allows for cluster creation on demand on top of the FIWARE infrastructure, among other management services. It is quite similar to OpenStack Sahara; in fact, we are thinking of moving to Sahara in the medium term. Sahara is quite interesting because it also allows the deployment of Spark clusters.
So, maybe in a few months you will be able to use our platform to ask for a Spark cluster.