Merge JSON after several splits in NiFi

I split my JSON several times to avoid OOM errors. I added a Wait processor to hold the flow files until all of my records have arrived, so that I can then use MergeContent. Each flow file (FF) has been assigned an attribute holding the number of lines in the original file.
The Wait processor should hold each FF until Notify has increased the counter to the total number of lines.
However, it seems that my Wait processor is not putting my FFs in the wait queue (the queue is not shown, but it exists).
Is there anything wrong in this piece of the flow?

You can do multiple merges by using UpdateAttribute after each Split to save the fragment.* attributes as something different, perhaps fragment1.*, fragment2.*, etc. Then you can restore each of them in reverse order with UpdateAttribute before each Merge, setting fragment.* to the fragment2.* attributes, then MergeContent, then set fragment.* to the fragment1.* attributes, then MergeContent, and so on.
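The save-and-restore bookkeeping described above can be sketched in plain Java. This is only a model of the attribute shuffling, not NiFi code: the class and method names are made up for illustration, a flow file's attributes are represented by a Map, and fragment.identifier, fragment.index and fragment.count are assumed to be the fragment.* attributes in question.

```java
import java.util.HashMap;
import java.util.Map;

final class FragmentAttributeStash {
    // Suffixes of the fragment.* attributes assumed for this illustration.
    static final String[] SUFFIXES = {"identifier", "index", "count"};

    // Copies <from>.<suffix> to <to>.<suffix>, roughly what one UpdateAttribute step would do.
    static void copyPrefix(Map<String, String> attributes, String from, String to) {
        for (String suffix : SUFFIXES) {
            String value = attributes.get(from + "." + suffix);
            if (value != null) {
                attributes.put(to + "." + suffix, value);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> ff = new HashMap<>();

        // After the first split: save fragment.* as fragment1.*
        ff.put("fragment.identifier", "split-A");
        ff.put("fragment.index", "0");
        ff.put("fragment.count", "2");
        copyPrefix(ff, "fragment", "fragment1");

        // The second split overwrites fragment.*: save the new values as fragment2.*
        ff.put("fragment.identifier", "split-B");
        ff.put("fragment.index", "3");
        ff.put("fragment.count", "5");
        copyPrefix(ff, "fragment", "fragment2");

        // Before the first merge: restore the values of the last split
        copyPrefix(ff, "fragment2", "fragment");
        System.out.println(ff.get("fragment.identifier")); // split-B

        // Before the second merge: restore the values of the first split
        copyPrefix(ff, "fragment1", "fragment");
        System.out.println(ff.get("fragment.identifier")); // split-A
    }
}
```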

Related

How to analyze/calculate a lot of csv-logfiles automatically

I have a bunch of logfiles from machines at my work, and I want to analyze them. These files are named like #increasingnumer_date_time.csv, and altogether there are roughly 10,000 files (0.5-2 MB each) that contain information about temperatures, pressures, and the status of actuators such as vents, pumps, etc.
I want a script to do some calculations (forming integrals of the pressures when special conditions are met) on each of these CSV files and store the result of every calculation in a CSV/Excel file, so that I have a list of the logfile names and the corresponding calculation results.
So there is no need to put all these files into one huge file; it is totally fine if the files are opened and processed one after another and only the result of each calculation is written to the result file, so that I can match each result to its CSV file.
How can I do this? I am no programming expert, but I use Python/pandas sometimes.
Thank you
I tried to do it file by file using Excel :-))
I expect that this is a batch job that can be automated (but I do not know how) :))

How to detect a redundant piece of code containing an array?

The lecture for my Java class has this piece of code:
for (int i=0; i<arr.length; i=i+10){
    if(i%10 == 0){
        System.out.println(arr[i]);
    }
}
If you start at 0 and then go to 10, 20, etc., why do you need the if condition? Naturally, all of these numbers are divisible by 10.

It's redundant. The only way it could have an effect is when the array length is close to the Integer max value and adding 10 causes an overflow, but then your code would loop infinitely anyway (or crash when accessing negative array indices).
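A small sketch can confirm both points. The class and method names are illustrative and not part of the lecture code: the first method counts how many elements the loop would print with and without the check, and the second shows what the index becomes once i + 10 overflows.

```java
final class RedundantCheckDemo {
    // Counts the elements the lecture's loop would print, with or without the if.
    static int countVisited(int[] arr, boolean useCheck) {
        int visited = 0;
        for (int i = 0; i < arr.length; i = i + 10) {
            if (!useCheck || i % 10 == 0) {
                visited++;
            }
        }
        return visited;
    }

    // 2147483640 is the largest multiple of 10 that fits in an int;
    // adding 10 wraps around to a negative value.
    static int indexAfterOverflow() {
        int i = 2147483640;
        i = i + 10;
        return i;
    }

    public static void main(String[] args) {
        int[] arr = new int[95];
        System.out.println(countVisited(arr, true));    // 10
        System.out.println(countVisited(arr, false));   // 10
        System.out.println(indexAfterOverflow());       // -2147483646
        System.out.println(indexAfterOverflow() % 10);  // -6
    }
}
```

Only after the wrap-around is i no longer a multiple of 10, which is the one case where the check changes what the loop does.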

To me, the if condition might be there for two reasons:
1. It is a way to monitor the progress of the function (although, since the for loop steps with i=i+10 instead of i++, it is less meaningful in this case). This is very common when a script executes a task that deals with a lot of data (normally in a single process, and taking some time). By printing the progress periodically, we can know (or estimate) how much data has been read or written, or, as in this case, how many times the code in the loop has been executed.
2. More code might be added to the for loop later, and it might modify i. In that case, i%10 == 0 would be meaningful.
In other words, without any more context, the if condition does seem redundant in this case.
To answer the question in the title, here is what we usually do. First, have the code reviewed by someone else before you merge your branch. Having a colleague review your code is good practice, as they can bring a fresh perspective on correctness and code style. Second, if you find something that looks suspicious but you are not sure (for example, the "redundant code" here), write unit tests to cover the part of the code that you would like to change, make the change, rerun the unit tests, and see whether you still get what is expected.
Personally, I haven't heard of any tool that is able to detect "redundant code" like the example here, as "redundant" code might not be redundant at all under different circumstances.

How can I save intermediate results from a KNIME loop?

I am running a KNIME workflow:
It runs over every row of my data. The problem is that I planned to run 7000 iterations, and at 6800 it gets stuck. Is there a way to save the CSV file? There is a problem with one row, and I want to save the result at this point in time.

If there is a problem with a single input row, then the easiest way to debug this in KNIME is often to run the input in a chunk loop. In your case I would set the outer chunk loop to run 1 row at a time, and remove the inner parallel chunk loop until you find the row causing the problem.
Unfortunately, this might take quite some time to run. As an alternative, try as above, but set the chunk size to, say, 100. Once you know which block of rows causes the error, use a row filter before the chunk loop to filter the input table down to just that block of 100 rows, and then set the chunk size to 1 to see which row is the problem.
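The two-stage narrowing can be sketched outside KNIME in plain Java. This is only a model of the idea, not KNIME code: findFailingRow and the chunkFails predicate are hypothetical names, and the predicate stands in for "running this block of rows through the loop body produces the error".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class FailingRowFinder {
    // Returns the index of the first failing row, or -1 if no chunk fails.
    static int findFailingRow(List<String> rows, int chunkSize,
                              Predicate<List<String>> chunkFails) {
        for (int start = 0; start < rows.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, rows.size());
            // Stage 1: test a whole block at once.
            if (chunkFails.test(rows.subList(start, end))) {
                // Stage 2: retest only this block, one row at a time.
                for (int i = start; i < end; i++) {
                    if (chunkFails.test(rows.subList(i, i + 1))) {
                        return i;
                    }
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            rows.add("row" + i);
        }
        // Pretend that row137 is the one that breaks the loop body.
        int bad = findFailingRow(rows, 100, chunk -> chunk.contains("row137"));
        System.out.println(bad); // 137
    }
}
```

With 250 rows and a chunk size of 100, this runs two block tests plus 38 single-row tests, instead of 138 single-row tests.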

Place a CSV Writer node inside the loop, i.e. connected to the output of your Parallel Chunk End (keeping this also connected to the Loop End).
Configure the If file exists… setting of this CSV Writer to Append.
That should save all the data that is successfully processed by the loop.
When you say there is a problem with one row though, do you know what that problem is? Presumably you'd rather get the whole loop working.
You could also consider using Try and Catch nodes from the Workflow Control > Error Handling section to skip a chunk that causes an error.

Why is the amqsput command dividing my data before pushing it to the message queue?

While testing my application, I tried a string of about 200KB, but amqsput divided my request into multiple chunks. I am not sure why this is happening. If I reduce the size to about 100KB, it works fine.
I am using the following command to push data into the message queue:
amqsput MESSAGE_QUEUE MQM < /home/usr/sampleRequest.xml
This sampleRequest.xml contains XML formatted as a single line. I don't know much about MQ administration or configuration and would like an idea of what's wrong.
Why is it dividing my data before pushing it to the queue when the file size is greater than a certain value?

amqsput & amqsget are simple applications for putting and getting small messages to and from a queue. If you look at the code for amqsput (i.e. amqsput0.c), you will see that the buffer size used is 65535 (64KB).
There are lots of programs that are better suited for your type of testing. There is a long list of C sample MQ applications here. The 2 that you might want to use are file2msg and msg2file. There is also Paul Clarke's QLoad program (it used to be a SupportPac).
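The effect of a fixed-size buffer on one long line can be sketched as follows. This is a simplified model and not the amqsput source: it assumes that every buffer-full read from the input is put as its own message, and it ignores details such as a terminator byte that the real sample may reserve. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

final class FixedBufferSplit {
    // Returns the sizes of the pieces produced when a single line of lineLength
    // bytes is read through a buffer that holds at most bufferSize bytes per read.
    static List<Integer> pieceSizes(int lineLength, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        List<Integer> sizes = new ArrayList<>();
        int remaining = lineLength;
        while (remaining > 0) {
            int piece = Math.min(remaining, bufferSize);
            sizes.add(piece);
            remaining -= piece;
        }
        return sizes;
    }

    public static void main(String[] args) {
        // A 200KB single-line file read through a 65535-byte buffer.
        System.out.println(pieceSizes(200 * 1024, 65535)); // [65535, 65535, 65535, 8195]
    }
}
```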

Counting the number of passes through a CSV file in JMeter

Am I missing an easy way to do this?
I have a CSV file with a number of params in it, and in my test I want to be able to make some of the fields unique across CSV repetitions with a suffix determined by the number of times I've looped through the file.
So suppose my CSV (simplified) had:
abc
def
ghi
I want to generate in the test
abc_1
def_1
ghi_1 <hit EOF>
abc_2
def_2
ghi_2 <hit EOF>
abc_3
def_3
ghi_3
I thought I could set up a counter to run parallel to my CSV loop, but that won't work unless I increment it by 1/n each iteration, where n is the number of lines in my CSV file. Which you can't do because counters are integers.
I'm going to go flail around and see if I can come up with a solution, but in case I'm not successful, has anyone got any suggestions?

I've used an EOF marker row (an index column with something like "EOF" or "END", etc.) and an IF controller with either a non-resetting counter OR user variables incremented via JavaScript in a BSF element (a BSF assertion or whatever, just a mechanism to run the script).
Unfortunately, it's the best solution I've come up with without putting too much effort into it.
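The logic of the EOF-marker approach can be sketched in plain Java. This is not JMeter code: the class name, the method name and the "EOF" marker value are assumptions made for illustration. The list plays the role of the CSV file being recycled, and pass is the non-resetting counter that is bumped each time the marker row is read.

```java
import java.util.ArrayList;
import java.util.List;

final class CsvPassCounter {
    static final String EOF_MARKER = "EOF"; // hypothetical marker value in the last row

    // Cycles through the rows until `wanted` values have been produced,
    // suffixing each data row with the current pass number.
    static List<String> suffixWithPass(List<String> rowsWithMarker, int wanted) {
        if (rowsWithMarker.stream().allMatch(EOF_MARKER::equals)) {
            throw new IllegalArgumentException("need at least one data row");
        }
        List<String> out = new ArrayList<>();
        int pass = 1; // non-resetting counter
        int next = 0;
        while (out.size() < wanted) {
            String row = rowsWithMarker.get(next);
            next = (next + 1) % rowsWithMarker.size(); // start over after the last row
            if (EOF_MARKER.equals(row)) {
                pass++; // marker row: count the pass and produce no value
            } else {
                out.add(row + "_" + pass);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> csv = List.of("abc", "def", "ghi", EOF_MARKER);
        System.out.println(suffixWithPass(csv, 9));
        // [abc_1, def_1, ghi_1, abc_2, def_2, ghi_2, abc_3, def_3, ghi_3]
    }
}
```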
