RESTful API for Cosmos map-reduce - fiware

I'm working with the FIWARE BigData Analysis GE - Cosmos, and I need to use RESTful APIs.
I know that there are RESTful APIs for HDFS (e.g. WebHDFS), but can I also run MapReduce jobs through a REST API? How?
Thanks

There is no REST API for running MapReduce jobs directly, but there is an Oozie server running on the Cosmos instance in FIWARE Lab that addresses this need.
By using Oozie you can describe data analysis workflows; you can see these workflows as a sequence of actions executed to process the data, where each action can be a MapReduce job, a Hive query, a shell script, etc.
Thus, you can describe a single-action workflow for a MapReduce job.
Oozie can be used in several ways; one of them is through its REST API. All the details can be found here.
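For illustration only, here is a rough Java sketch (not from the original answer) of starting such a single-action workflow through Oozie's REST API; the host name, user, and HDFS path are placeholders, and port 11000 is just Oozie's usual default.

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class OozieSubmit {
    public static void main(String[] args) throws Exception {
        // Placeholder host/port: Oozie's REST API usually listens on port 11000.
        URL url = new URL("http://cosmos-oozie-host:11000/oozie/v1/jobs?action=start");

        // Job configuration: points Oozie at a workflow.xml stored in HDFS
        // (that workflow would contain a single map-reduce action).
        String config =
            "<configuration>" +
            "  <property><name>user.name</name><value>your-cosmos-user</value></property>" +
            "  <property><name>oozie.wf.application.path</name>" +
            "    <value>hdfs:///user/your-cosmos-user/workflows/mr-wf</value></property>" +
            "</configuration>";

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/xml;charset=UTF-8");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(config.getBytes(StandardCharsets.UTF_8));
        }

        // Oozie replies with a small JSON document containing the job id,
        // e.g. {"id":"..."}; the job can then be polled for its status.
        System.out.println("HTTP " + conn.getResponseCode());
    }
}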

Related

How to call an Informatica workflow running in a different integration service

I have 2 workflows: workflow 1 on Integration Service 1 and workflow 2 on Integration Service 2.
How do I call workflow 2 from workflow 1? I am currently trying to call it using the command prompt, but it didn't work.
Just to let you know, Integration Service 1 is on Informatica 9.2
and Integration Service 2 is on Informatica version 10.2.
PowerCenter does not provide support for cross-workflow dependencies, regardless of whether the workflows are configured to use the same or a different Integration Service.
The best way to solve this kind of challenge is to use a separate scheduling tool, such as Airflow, Control-M, Autosys - or any other.
It is also possible to expose a workflow as a web service and call it from a different workflow, if needed. Not really convenient, but possible.
Lastly, it's possible to use the command line interface pmcmd startworkflow in a Command task of one workflow to have the other one started.
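As a hedged illustration of that last option (not part of the original answer), the Command task would contain a pmcmd call along these lines; the same invocation could also be fired from an external scheduler, sketched here in Java via ProcessBuilder. The service, domain, folder, and workflow names are placeholders.

// Hypothetical example: launching workflow 2 on Integration Service 2 via pmcmd.
// In PowerCenter itself you would put the same command string in a Command task;
// from an external scheduler (Airflow, Control-M, ...) you could shell out like this.
import java.util.Arrays;

public class StartOtherWorkflow {
    public static void main(String[] args) throws Exception {
        ProcessBuilder pb = new ProcessBuilder(Arrays.asList(
            "pmcmd", "startworkflow",
            "-sv", "IntegrationService2",   // placeholder Integration Service name
            "-d",  "Domain_Informatica",    // placeholder domain name
            "-u",  "repo_user",             // repository user
            "-p",  "repo_password",         // consider environment variables instead
            "-f",  "FolderName",            // placeholder folder
            "-wait",                        // block until the workflow finishes
            "wf_workflow2"));               // placeholder workflow name
        pb.inheritIO();
        int exitCode = pb.start().waitFor();
        System.out.println("pmcmd exit code: " + exitCode);
    }
}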
I have done something similar this way:
The other WF is a web service one, or is executed along with a web service.
Add an application connection.
The Web Services Hub (WSH) where your WF runs should be the endpoint of that connection.
Add this WF inside the mapping of the other one as a Web Service transformation.

How to publish AWS SNS data to MySql database

I am new to AWS/Database.
Since I am a complete beginner at this, any suggestions will be appreciated.
Currently the plan in the project is that data from the AWS database will be pushed, using SNS HTTP fanout, to an external MySQL database.
NOTE:
1. The data will be pushed by the client using AWS SNS.
2. We have no access to the AWS account, nor are we planning to have an AWS account.
3. The external MySQL database is a private database running on a Linux server.
I have gone through the official documentation of AWS SNS, and also some websites. This is all I found:
Use external applications like Zapier to map the data.
Develop some application to map the data.
Is it like using a servlet application on the receiver side to update the table, or are there any other methods?
AWS DB -----> SNS -----> _________ -----> External MySql DB
Thanks
If you cannot have an AWS Account, you can have your own web server consume the SNS Messages. SNS can deliver messages to an HTTP/HTTPS endpoint in a predefined structure. Read more details here. You can enable such an endpoint on your own server and share your server URL with the AWS Account owner. They can create a subscription from their SNS topic to your endpoint.
For setting up this endpoint, there are many options. ExpressJS is one such popular framework to quickly implement HTTP APIs.
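Purely as an illustration (the original answer suggests ExpressJS; this sketch uses Spring Boot instead, to match the Java used elsewhere in this thread), such an endpoint could handle the SNS subscription handshake and store notifications roughly like this; the table and column names are hypothetical.

// Minimal sketch of an HTTP endpoint that consumes SNS messages.
// Assumes the Spring Boot web and JDBC starters and a MySQL datasource are configured.
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.http.ResponseEntity;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestHeader;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class SnsEndpointController {

    private final JdbcTemplate jdbcTemplate;
    private final ObjectMapper mapper = new ObjectMapper();

    public SnsEndpointController(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    // SNS posts its JSON document with a text/plain content type,
    // so read the raw body and parse it ourselves.
    @PostMapping("/sns")
    public ResponseEntity<Void> receive(@RequestHeader("x-amz-sns-message-type") String type,
                                        @RequestBody String body) throws Exception {
        JsonNode message = mapper.readTree(body);

        if ("SubscriptionConfirmation".equals(type)) {
            // Confirm the subscription once by visiting the SubscribeURL SNS sends us.
            String subscribeUrl = message.get("SubscribeURL").asText();
            new java.net.URL(subscribeUrl).openStream().close();
        } else if ("Notification".equals(type)) {
            // Store the raw payload; "sns_events" and its columns are hypothetical.
            jdbcTemplate.update(
                "INSERT INTO sns_events (message_id, payload) VALUES (?, ?)",
                message.get("MessageId").asText(),
                message.get("Message").asText());
        }
        return ResponseEntity.ok().build();
    }
}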
Probably option two would be better suited, or at least the first to be considered. For that option you would have to develop a Lambda function which receives the data from SNS, re-formats it if needed, and uploads it to MySQL. So your architecture would look like:
Data--->SNS--->Lambda function---> MySQL
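A minimal sketch of such a Lambda handler (my own illustration, assuming the aws-lambda-java-events library, a reachable MySQL instance, and a hypothetical sns_events table):

// Sketch of a Lambda that receives SNS records and writes them to MySQL.
// Requires aws-lambda-java-core, aws-lambda-java-events and the MySQL JDBC driver.
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.SNSEvent;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class SnsToMySqlHandler implements RequestHandler<SNSEvent, Void> {

    @Override
    public Void handleRequest(SNSEvent event, Context context) {
        // Placeholder connection details; in practice read them from environment variables.
        String url = System.getenv("JDBC_URL");       // e.g. jdbc:mysql://host:3306/mydb
        String user = System.getenv("DB_USER");
        String password = System.getenv("DB_PASSWORD");

        try (Connection conn = DriverManager.getConnection(url, user, password);
             PreparedStatement ps = conn.prepareStatement(
                 "INSERT INTO sns_events (message_id, payload) VALUES (?, ?)")) {
            for (SNSEvent.SNSRecord record : event.getRecords()) {
                // Re-format the message here if needed before persisting it.
                ps.setString(1, record.getSNS().getMessageId());
                ps.setString(2, record.getSNS().getMessage());
                ps.addBatch();
            }
            ps.executeBatch();
        } catch (Exception e) {
            throw new RuntimeException("Failed to persist SNS batch", e);
        }
        return null;
    }
}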
Depending on the amount of data coming into SNS, you may add SQS queues to the mix as well, to buffer the records and enable a fan-out architecture. For example:

              /---> SQS queue 1 ---> Lambda function 1 ---> MySQL
Data --> SNS --
              \---> SQS queue 2 ---> Lambda function 2, EC2 instance, container ---> other destination
Other solutions are possible. But I would first consider the above, before looking into other ways.

Sync RDBMS with Apache Directory Ldap

Currently, I have a requirement to sync data from Apache Directory LDAP to one of the RDBMS databases (MySQL, PostgreSQL). The directory holds approximately a few million records for now and may grow in the future. The LDAP directory is the primary data source for now, but the motive is to have real-time data both in LDAP and in the RDBMS, since we have a plan to use the RDBMS for real-time analytics purposes.
Option1:
Thinking of using Spring Cloud Data Flow: a source Spring Boot app reads the LDAP data that has changed since the last sync run and pushes it to a queue (RabbitMQ for now). The sink would be another Spring Boot app that collects the data from the queue and persists it into the RDBMS. We would be able to better track and manage the sync jobs using the Spring Cloud Data Flow dashboard offerings.
Option2:
Spring's LdapTemplate helps us talk to the LDAP directory in our application. One approach would be to intercept the LdapTemplate calls wherever applicable and push the data to a queue; an intermediate app then reads the data from the queue (RabbitMQ) and converts the LDAP response to the required format that can be written into the RDBMS.
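For illustration, the intermediate app in Option 2 might look roughly like the sketch below; the queue name, table, and payload format are hypothetical, and it assumes spring-boot-starter-amqp plus a configured JDBC datasource.

// Sketch of the intermediate consumer from Option 2: read LDAP change events
// from RabbitMQ and upsert them into the RDBMS.
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.amqp.rabbit.annotation.RabbitListener;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Component;

@Component
public class LdapChangeConsumer {

    private final JdbcTemplate jdbcTemplate;
    private final ObjectMapper mapper = new ObjectMapper();

    public LdapChangeConsumer(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @RabbitListener(queues = "ldap.changes")   // hypothetical queue name
    public void onLdapChange(String payload) throws Exception {
        // The payload is assumed to be a JSON document built from the LDAP entry.
        JsonNode entry = mapper.readTree(payload);
        // Upsert keyed on the entry's DN (MySQL syntax; adjust for PostgreSQL).
        jdbcTemplate.update(
            "INSERT INTO ldap_users (dn, cn, mail) VALUES (?, ?, ?) " +
            "ON DUPLICATE KEY UPDATE cn = VALUES(cn), mail = VALUES(mail)",
            entry.get("dn").asText(),
            entry.get("cn").asText(),
            entry.get("mail").asText());
    }
}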
I am new to LDAP and Spring Cloud Data Flow. So far I have got only these 2 approaches, considering my project's existing technology and system landscape. Any other suggestions/approaches are really appreciated. Thanks in advance.
Another approach, if the LDAP server is Microsoft AD: create a Windows service in C# which connects to your LDAP server, fetches the data every day, and sends the data to your RDBMS through a socket connection. This is reliable and consistent.

Spring batch deployed on openshift using several pods

I deploy an application on OpenShift and I use at least 2 pods.
My WAR contains a Spring Batch application, scheduled by a Spring cron.
Of course, each pod starts the same batch at the same time, and that's my problem/question.
Is there a way to avoid this behaviour? I would like to start only one batch instance (or is there a way to configure Spring Batch to check whether a batch is already running?).
Thanks in advance.
Assuming you use Deployment, it's not trivial, but here are some ideas that can help you.
Use ScheduledJobs/CronJobs from Kubernetes. This means you would ditch controlling the batch launch from your app completely and have a dedicated pod launched to perform the batch job and die.
Use a master elector sidecar for establishing the right to execute the batch (https://github.com/kubernetes/contrib/tree/master/election).
Implement some locking mechanism on your own.
Use a StatefulSet and bind the batch to run only on a particular hostname (i.e. by a config var passed to the pods, like BATCH_HOSTNAME). StatefulSets have deterministic names, so you could say that the batch should run only on my-pods-0.
It sounds like you need leader election in your situation. Spring Integration provides leader election functionality you can use to determine who the master is. That master would be the one that actually launches the jobs; the others would just ignore the scheduled event. You can read more about Spring Integration's leader election in the documentation here: https://docs.spring.io/spring-integration/api/org/springframework/integration/support/leader/LockRegistryLeaderInitiator.html
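As a rough sketch of that idea (my own, not from the answer): guard the scheduled launch with a LockRegistryLeaderInitiator backed by a shared lock registry. The JDBC-based registry, the cron expression, and the job wiring below are assumptions, not prescribed by the answer.

// Sketch of leader election guarding the scheduled batch launch.
// Assumes spring-integration-jdbc on the classpath, its INT_LOCK table created
// in a database shared by all pods, and @EnableScheduling already configured.
import javax.sql.DataSource;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.jdbc.lock.DefaultLockRepository;
import org.springframework.integration.jdbc.lock.JdbcLockRegistry;
import org.springframework.integration.support.leader.LockRegistryLeaderInitiator;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Configuration
class LeaderElectionConfig {

    @Bean
    DefaultLockRepository lockRepository(DataSource dataSource) {
        return new DefaultLockRepository(dataSource);
    }

    @Bean
    JdbcLockRegistry lockRegistry(DefaultLockRepository lockRepository) {
        return new JdbcLockRegistry(lockRepository);
    }

    @Bean
    LockRegistryLeaderInitiator leaderInitiator(JdbcLockRegistry lockRegistry) {
        // Only one pod at a time holds the shared lock and is therefore the leader.
        return new LockRegistryLeaderInitiator(lockRegistry);
    }
}

@Component
class ScheduledBatchLauncher {

    private final LockRegistryLeaderInitiator leaderInitiator;
    private final JobLauncher jobLauncher;
    private final Job job;   // your existing batch job bean

    ScheduledBatchLauncher(LockRegistryLeaderInitiator leaderInitiator,
                           JobLauncher jobLauncher, Job job) {
        this.leaderInitiator = leaderInitiator;
        this.jobLauncher = jobLauncher;
        this.job = job;
    }

    @Scheduled(cron = "0 0 2 * * *")   // the same cron fires on every pod
    public void launchIfLeader() throws Exception {
        // Non-leader pods simply ignore the scheduled event.
        if (!leaderInitiator.getContext().isLeader()) {
            return;
        }
        jobLauncher.run(job, new JobParametersBuilder()
                .addLong("run.ts", System.currentTimeMillis())
                .toJobParameters());
    }
}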

Spring Batch integration with spring cloud data flow on local server to add spring admin capabilities

I have a basic Spring Batch app that runs on embedded Apache Tomcat in Spring Boot. I need to add Spring Batch Admin capabilities to it. As per the latest Spring docs, I need to use Spring Cloud Data Flow to do this (https://docs.spring.io/spring-batch-admin/). So now I need to use Spring Cloud Data Flow and integrate my Spring Batch app on a local server. I just want it to run on my local machine under Tomcat, without deploying it to any cloud environment like Cloud Foundry or OpenShift. Is that possible? I am sure it's possible. I would like to get some references/examples of this type of integration and a starter guide for integrating a Spring Batch app. Do I need to create tasks in Spring Cloud Data Flow to run my Spring Batch app? If there are any sample examples/pseudocode to guide me, that would make it easy.
As described in the migration-guide, you can use the "local" variant of Spring Cloud Data Flow (SCDF) as a replacement for Spring Batch Admin (SBA).
SCDF is a simple Spring Boot application that you can run as a standalone Java process, similar to how you're running your application today.
Also, as described in the migration-steps, you'd have to port your existing batch workload to the Spring Cloud Task model, and that should be a straightforward process - use this Spring Batch sample. For the most part, you'd copy/paste the business logic into a Spring Cloud Task application, and all the infrastructure, including schemas, the repository, and other batch goodies, will continue to work. There are a few more complex implementations in the task-app-starters, which can be used as a reference, too.
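For reference, the skeleton of such a ported app is small. The sketch below is only illustrative; the annotations are the standard Spring Cloud Task and Spring Batch ones, and the actual job beans would come from your existing code.

// Minimal sketch of the ported app: the existing Spring Batch job wrapped as a
// Spring Cloud Task so SCDF can launch and track it.
// Assumes the spring-cloud-starter-task dependency is on the classpath.
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.task.configuration.EnableTask;

@EnableTask                 // registers task execution tracking used by SCDF
@EnableBatchProcessing      // keeps the existing Spring Batch infrastructure
@SpringBootApplication
public class BatchTaskApplication {
    public static void main(String[] args) {
        SpringApplication.run(BatchTaskApplication.class, args);
    }
}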
Lastly, you can use SCDF's dashboard for monitoring and management.