Apache Hop Kafka Consumer throws error when I run pipeline using the Flink Engine - apache-hop

I am trying to test a simple Kafka consumer using Apache Hop v1.2. When I run the pipeline using the local runner, it works fine. But if I run it using the Flink runner I get the following error:
You can only have one copy of the injector transform 'output' to
accept the Kafka messages
I have tried debugging the Hop code and it looks like the root cause is the initSubPipeline() method being invoked multiple times when using the Flink runner. That's not the case when I use the local runner. Am I missing something here?

For others bumping into this problem, I discussed it with Mono on another communication platform.
In this case the problem lies with the transform. Not all transforms in Hop are able to run in a clustered environment such as Flink. For Apache Kafka we have two sets of transforms: one is the "Beam Kafka Consumer/Producer", which can be used in combination with one of our engines that use Apache Beam (Flink, Spark, Google Dataflow); the other is the "regular" Kafka producer/consumer, which will only work using the Hop engine.
There is a ticket on our backlog (HOP-3863) that will add a system to throw an error or warning when using a transform that is not made to run on one of our engines.

Related

Restoring to a known state

The Couchbase CLI comes with the cbbackup and cbrestore commands, which I had hoped would let me back up a database in a known state and then restore it somewhere else where only a newly installed instance exists. Unfortunately it appears that the target database must already have all the right buckets set up and (possibly) that the restore command requires each bucket name to be mentioned explicitly.
This wouldn't pose too much of a problem if I were hand-holding the process, but the goal is to start a new environment in a fully automated fashion, and I'm wondering if someone has a working method of achieving this goal.
If it were me, I'd use the CLI, REST API, or one of the Couchbase SDKs to write something to automate the creation of the target bucket, then do the restore.
REST API:
http://docs.couchbase.com/couchbase-manual-2.5/cb-rest-api/#creating-and-editing-buckets
CLI:
http://docs.couchbase.com/couchbase-manual-2.5/cb-cli/#couchbase-cli-commands
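For instance, a rough sketch of the REST approach in Java might look like this (the cluster address, credentials and bucket settings are placeholders; check the manual linked above for the exact parameters your Couchbase version expects):

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    import java.util.Base64;

    /** Creates a Couchbase bucket through the cluster REST API before running cbrestore. */
    public class CreateBucket {
        public static void main(String[] args) throws Exception {
            String cluster = "http://127.0.0.1:8091";          // assumed cluster address
            String credentials = Base64.getEncoder()
                    .encodeToString("Administrator:password".getBytes(StandardCharsets.UTF_8));

            // Form-encoded bucket definition; adjust name/quota/replicas to match the backup.
            String body = "name=mybucket&bucketType=couchbase&ramQuotaMB=256"
                    + "&authType=sasl&saslPassword=&replicaNumber=1";

            HttpURLConnection conn = (HttpURLConnection)
                    new URL(cluster + "/pools/default/buckets").openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Authorization", "Basic " + credentials);
            conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
            conn.setDoOutput(true);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body.getBytes(StandardCharsets.UTF_8));
            }
            System.out.println("HTTP " + conn.getResponseCode()); // expect 202 Accepted on success
        }
    }

Once the bucket exists (it takes a moment to warm up), cbrestore should be able to target it.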
Another option you might look into is to use these same kinds of methods to automate the setup of uni-directional XDCR from the source to the target cluster.

Getting historical data in FI-WARE using Cosmos

I'm trying to get all the historical information about a sensor in FI-WARE.
I've seen that Orion uses Cygnus to store historical data in Cosmos. Is that information accessible, or is it only possible to use IDAS to get it?
Where could I get more info about this?
There are several ways to consume the data, listed here in an incremental approach from the learning-curve point of view:
Working with the raw data, either "locally" (i.e. logging into the Head Node of the cluster) by using the Hadoop commands, or "remotely" by using the WebHDFS/HttpFS REST API (a minimal sketch of a remote call is shown after this list). Please observe that within this approach you have to implement whatever analysis logic you need, since Cosmos only allows you to manage, as said, raw data.
Working with Hive in order to query the data in a SQL-like fashion. Again, you can do it locally by invoking the Hive CLI, or remotely by implementing your own Hive client in Java (some other languages are available) using the Hive libraries.
Working with MapReduce (MR) in order to implement heavier analysis. For this, you'll have to create your own MR-based application (typically in Java) and run it locally. Once you are done with the local run of the MR app, you can go with Oozie, which allows you to run such MR apps remotely.
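To illustrate the "remote raw data" option from the first step, a minimal WebHDFS call could look roughly like this (host, port, path and user name are placeholders; the exact authentication scheme depends on your Cosmos instance, and HttpFS exposes the same API, usually on a different port):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    /** Lists an HDFS directory remotely through the WebHDFS REST API. */
    public class WebHdfsList {
        public static void main(String[] args) throws Exception {
            // Placeholder NameNode host/port, path and user.
            String url = "http://cosmos.example.org:50070/webhdfs/v1/user/myuser/mydata"
                    + "?op=LISTSTATUS&user.name=myuser";
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setRequestMethod("GET");
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream()))) {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line); // JSON FileStatuses document
                }
            }
        }
    }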
My advice is to start with Hive (step 1 is easy but does not provide any analysis capabilities): first locally, trying to execute some Hive queries, then remotely, implementing your own client. If this kind of analysis is not enough for you, then move to MapReduce and Oozie.
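A remote Hive client could look something along these lines (the driver class, host, port, credentials and table name are placeholders and depend on the Hive server version your Cosmos instance exposes):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    /** Minimal remote Hive client: connects to the Hive server and runs one query. */
    public class CosmosHiveClient {
        public static void main(String[] args) throws Exception {
            // HiveServer2 driver and URL; older deployments use
            // org.apache.hadoop.hive.jdbc.HiveDriver and jdbc:hive://host:10000/default instead.
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            String url = "jdbc:hive2://cosmos.example.org:10000/default"; // placeholder host
            try (Connection conn = DriverManager.getConnection(url, "myuser", "mypassword");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(
                         "SELECT * FROM sensor_data LIMIT 10")) {        // hypothetical table
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }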
All the documentation regarding Cosmos can be found in the FI-WARE Catalogue of enablers. Within this documentation, I would highlight:
Quick Start for Programmers.
User and Programmer Guide (functionality described in sections 2.1 and 2.2 is not currently available in FI-LAB).

What's the appropriate way to test code that uses MySQL-specific queries internally

I am collecting data and storing it in a MySQL database using Java. Additionally, I use Maven for building the project, TestNG as a test framework, and Spring-Jdbc for accessing the database. I've implemented a DAO layer that encapsulates the access to the database. Besides adding data using the DAO classes, I want to execute some queries which aggregate the data and store the results in some other tables (like materialized views).
Now, I would like to write some test cases which check whether the DAO classes are working as they should. Therefore, I thought of using an in-memory database which would be populated with some test data. Since I am also using MySQL-specific SQL queries for aggregating data, I ran into some trouble:
Firstly, I thought of simply using the embedded-database functionality provided by Spring-Jdbc to instantiate an embedded database. I decided to use the H2 implementation. There I ran into trouble because of the aggregation queries, which use MySQL-specific syntax (e.g. time-manipulation functions like DATE()). Another disadvantage of this approach is that I need to maintain two DDL files: the actual DDL file defining the tables in MySQL (here I define the encoding and add comments to tables and columns, both features being MySQL-specific), and the test DDL file that defines the same tables but without comments etc., since H2 does not support comments.
I found a description for using MySQL as an embedded database which I can use within the test cases (http://literatitech.blogspot.de/2011/04/embedded-mysql-server-for-junit-testing.html). That sounded really promising to me. Unfortunately, it didn't work: a MissingResourceException occurred, "Resource '5-0-21/Linux-amd64/mysqld' not found". It seems that the driver is not able to find the database daemon on my local machine. But I don't know what I have to look for to find a solution to that issue.
Now, I am a little bit stuck and I am wondering if I should have created the architecture differently. Does someone have tips on how I should set up an appropriate system? I have two other options in mind:
Instead of using an embedded database, I'll go with a native MySQL instance and set up a database that is only used for the test cases. This option sounds slow. Actually, I might want to set up a CI server later on, and I thought that using an embedded database would be more appropriate since the tests run faster.
I erase all the MySQL-specific stuff out of the SQL queries and use H2 as an embedded database for testing. If this option is the right choice, I would need to find another way to test the SQL queries that aggregate the data into materialized views.
Or is there a 3rd option which I don't have in mind?
I would appreciate any hints.
Thanks,
XComp
I've created a Maven plugin exactly for this purpose: jcabi-mysql-maven-plugin. It starts a local MySQL server in the pre-integration-test phase and shuts it down in post-integration-test.
If it is not possible to get the in-memory MySQL database to work, I suggest using the H2 database for the "simple" tests and a dedicated MySQL instance to test the MySQL-specific queries.
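A bare-bones sketch of the H2 variant with Spring-Jdbc and TestNG could look like this (the schema/data scripts, table name and expected count are placeholders for your own DDL, test data and DAO calls):

    import org.springframework.jdbc.core.JdbcTemplate;
    import org.springframework.jdbc.datasource.embedded.EmbeddedDatabase;
    import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseBuilder;
    import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseType;
    import org.testng.Assert;
    import org.testng.annotations.AfterClass;
    import org.testng.annotations.BeforeClass;
    import org.testng.annotations.Test;

    /** "Simple" DAO tests against an embedded H2 database built by Spring-Jdbc. */
    public class SensorDaoH2Test {

        private EmbeddedDatabase db;

        @BeforeClass
        public void setUp() {
            // schema-h2.sql / test-data.sql are hypothetical scripts on the test classpath.
            db = new EmbeddedDatabaseBuilder()
                    .setType(EmbeddedDatabaseType.H2)
                    .addScript("classpath:schema-h2.sql")
                    .addScript("classpath:test-data.sql")
                    .build();
        }

        @Test
        public void countsInsertedRows() {
            JdbcTemplate jdbc = new JdbcTemplate(db);
            int rows = jdbc.queryForObject("SELECT COUNT(*) FROM measurements", Integer.class);
            Assert.assertEquals(rows, 3); // matches whatever test-data.sql inserts
        }

        @AfterClass
        public void tearDown() {
            db.shutdown();
        }
    }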
Additionally, the tests for the real MySQL database can be configured as integration tests in a separate Maven profile so that they are not part of the regular Maven build. On the CI server you can create an additional job that runs the MySQL tests periodically, e.g. daily or every few hours. With such a setup you can keep and test your MySQL-specific queries while your regular build does not slow down. You can also run a normal build even if the test database is not available.
There is a nice Maven plugin for integration tests called maven-failsafe-plugin. It provides pre- and post-integration-test steps that can be used to set up the test data before the tests and to clean up the database after the tests.

Hadoop JUnit testing writing/reading to/from the hdfs

I have written a class (or classes) that writes to and reads from HDFS. Under certain conditions that occur when these classes are instantiated, they create a specific path and file and write to it (or they go to a previously created path and file and read from it). I have tested this by running a few Hadoop jobs, and it appears to be functioning correctly.
However, I would like to be able to test this in the JUnit framework, but I have not found a good solution for testing reading and writing to HDFS in JUnit. I would appreciate any helpful advice on the matter. Thanks.
I haven't tried this myself yet, but I believe what you are looking for is org.apache.hadoop.hdfs.MiniDFSCluster.
It is in hadoop-test-*.jar, NOT hadoop-core-*.jar. I guess the Hadoop project uses it internally for testing.
Here it is:
http://svn.apache.org/viewvc/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/MiniDFSCluster.java?revision=1127823&view=markup&pathrev=1130381
I think there are plenty of uses of it in that same directory, but here is one:
http://svn.apache.org/viewvc/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/TestWriteRead.java?revision=1130381&view=markup&pathrev=1130381
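A rough sketch of such a test might look like this (using the old-style constructor from hadoop-test; the path and content are placeholders, and you would call your own read/write classes instead of the direct FileSystem calls):

    import static org.junit.Assert.assertEquals;

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.MiniDFSCluster;
    import org.junit.After;
    import org.junit.Before;
    import org.junit.Test;

    /** Spins up an in-process HDFS cluster so the write/read classes can be tested in JUnit. */
    public class HdfsReadWriteTest {

        private MiniDFSCluster cluster;
        private FileSystem fs;

        @Before
        public void startCluster() throws Exception {
            Configuration conf = new Configuration();
            // Old-style constructor; newer Hadoop versions use
            // new MiniDFSCluster.Builder(conf).numDataNodes(1).build() instead.
            cluster = new MiniDFSCluster(conf, 1, true, null);
            fs = cluster.getFileSystem();
        }

        @Test
        public void writesAndReadsBack() throws Exception {
            Path file = new Path("/test/data.txt");           // placeholder path
            try (FSDataOutputStream out = fs.create(file)) {
                out.writeBytes("hello hdfs\n");
            }
            try (BufferedReader in = new BufferedReader(new InputStreamReader(fs.open(file)))) {
                assertEquals("hello hdfs", in.readLine());
            }
        }

        @After
        public void stopCluster() throws Exception {
            if (cluster != null) {
                cluster.shutdown();
            }
        }
    }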

MySQL User Defined Function to send a windows message

G'Day,
I want to use the Windows API PostMessage() call from inside a MySQL UDF on MySQL 5.1.51 (XP SP3). I know the UDF (written in Delphi 2006) is working, having verified it by setting a bogus result for the UDF.
The syntax of the UDF takes two integer params, one for a window handle and the other for a message number. However a call to PostMessage() from inside my UDF causes an exception in mysqld and the service stops.
Any ideas or pointers? Alternatively, if anyone can tell me how to simulate IB Events for MySQL via AnyDAC and Delphi, OR an alternate approach to getting a notification when a record has changed in the database, then please show me the light.
--Donovan
You're going to run into problems with this approach, mainly due to the fact that messaging will only work at the same access level, and within the same session on the same computer. You would be better served by using other methods, such as TCP/IP sockets, MailSlots, memory-mapped files, etc.
To best simulate a posted message, I would use UDP over TCP/IP. A good lightweight library that I have used in the past is Synapse. The Synapse library from SVN runs quite well against the latest versions of Delphi. The class you will want to use for this is TUDPBlockSocket.
As an alternative to Windows messages or TCP/IP, you might want to consider named pipes; see the answer to this question on sending information between two Delphi programs and this question on what named pipes are.
--jeroen
While I have had success via the UDF / Windows pipe route, I had another idea: leveraging the ability to tap into the information message framework(?) in MySQL. See https://stackoverflow.com/q/3992779/223742