I am running a KNIME workflow that iterates over every row of my data. The problem is that I planned to run 7000 iterations, and at 6800 it gets stuck. Is there a way to save the CSV file at this point? There is a problem with one row, and I want to save the results produced so far.
If there is a problem with a single input row, then the easiest way to debug this in KNIME is often to run the input through a chunk loop. In your case I would set the outer chunk loop to run 1 row at a time, and remove the inner parallel chunk loop, until you find the row causing the problem.
Unfortunately, this might take quite some time to run. As an alternative, do as above but set the chunk size to, say, 100. Once you know which block of 100 rows causes the error, use a row filter before the chunk loop to reduce the input table to just that block, and then set the chunk size to 1 to see which row is the problem.
Place a CSV Writer node inside the loop, i.e. connected to the output of your Parallel Chunk End (keeping this also connected to the Loop End).
Configure the If file exists… setting of this CSV Writer to Append.
That should save all the data that is successfully processed by the loop.
When you say there is a problem with one row though, do you know what that problem is? Presumably you'd rather get the whole loop working.
You could also consider using Try and Catch nodes from the Workflow Control > Error Handling section to skip a chunk that causes an error.
I have an Octave script that reads a text file with the fopen, fseek, and fread functions. The file contains binary data.
First, I read the file in a loop like this:
fid = fopen('myfile.txt', 'rb');   % open the file in binary read mode
fseek(fid, 0);                     % position at the start of the file
for i = 1:5
  data = fread(fid, 1000);         % read the next 1000 bytes
  ......
  ...<operations I want to do>
  ......
endfor
fclose(fid);
It reads 1000 bytes in each iteration and computes the results for every 1000-byte block.
Then I read the file from a different position by changing the fseek line as below:
fseek(fid, 1000);
But it still gives the same result as it did for the first slot when I read the file from the beginning (even though I am no longer reading the first slot).
Then I did the same thing on my other computer, and there it worked the first time, but on the second attempt it showed the same behavior. At first I thought there might be a problem with my script or the generated file, but since it worked the first time on the other computer, I now think there is some kind of problem with Octave. Maybe I need to clear the memory or something.
Has anyone ever faced this type of problem?
What I am trying to accomplish: run 50 threads in parallel using a CSV file as the dataset.
Here's what the CSV looks like (let's say there are 50 records):
Username,Password
user1,password1
user2,password2
...,...
user50,password50
In JMeter, when I run my test case, each thread should consume one record of the CSV file in parallel. By that I mean Thread 1 takes the first record (user1,password1), Thread 2 takes the second record (user2,password2), and so on until the last record (50 in this example). And all of that happens at the same time.
I am still new to JMeter and I would like to know if this is something that is "doable" through this tool. If it is possible, your help is greatly appreciated! :)
With the default CSV Data Set Config setup:
each thread (virtual user) will take the next line of the CSV file on each loop (iteration)
when the last line of the CSV file is reached, it will start over from the beginning
With regard to your "at the same time" requirement: the load pattern is controlled by the Thread Group settings (number of threads, loops, ramp-up period). Depending on your setup you will have anywhere from 0 to 50 concurrently active users, which you can observe using, for example, the Active Threads Over Time listener.
If you want to send all 50 requests at exactly the same moment, consider using a Synchronizing Timer.
This is possible using the CSV Data Set Config element. It reads the data row by row, and a username and password can be assigned to each thread. You can use the values with the following syntax:
Username: ${Username}
Password: ${Password}
Also note that you do not have to define the variable names in the CSV Data Set Config element, since they are already in the first row of the CSV file.
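If you need to generate a test file in that shape, here is a quick Python sketch (the file name and row count are just examples, not anything JMeter requires):

import csv

# write a Username,Password header followed by 50 generated rows
with open('users.csv', 'w', newline='') as f:      # example file name
    writer = csv.writer(f)
    writer.writerow(['Username', 'Password'])
    for i in range(1, 51):
        writer.writerow(['user%d' % i, 'password%d' % i])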
I split my JSON several times to avoid OOM errors. I've put a Wait processor in place to wait for all of my records before using a MergeContent processor. Each FlowFile has been assigned an attribute holding the original file's number of lines.
The Wait processor should hold the FlowFiles until Notify increases the counter to the total number of lines.
However, it seems that my Wait processor is not putting my FlowFiles into the wait queue (it is not shown here, but it does exist).
Is there anything wrong with this piece of the flow?
You can do multiple merges by using UpdateAttribute after each Split to save the fragment.* attributes as something different, perhaps fragment1.*, fragment2.*, etc. Then you can restore each of them in reverse order with UpdateAttribute before each Merge, setting fragment.* to the fragment2.* attributes, then MergeContent, then set fragment.* to the fragment1.* attributes, then MergeContent, and so on.
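For example (the fragment1.* names here are only illustrative, not anything NiFi defines), the UpdateAttribute after the first split might save the outer fragment attributes:

fragment1.identifier    ${fragment.identifier}
fragment1.count         ${fragment.count}
fragment1.index         ${fragment.index}

and the UpdateAttribute placed before the final MergeContent would restore them:

fragment.identifier     ${fragment1.identifier}
fragment.count          ${fragment1.count}
fragment.index          ${fragment1.index}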
Am I missing an easy way to do this?
I have a CSV file with a number of params in it, and in my test I want to be able to make some of the fields unique across CSV repetitions with a suffix determined by the number of times I've looped through the file.
So suppose my CSV (simplified) had:
abc
def
ghi
I want to generate in the test
abc_1
def_1
ghi_1 <hit EOF>
abc_2
def_2
ghi_2 <hit EOF>
abc_3
def_3
ghi_3
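To make the pattern concrete, a rough Python sketch of the output I'm after (the suffix is simply the number of the current pass through the file):

rows = ['abc', 'def', 'ghi']
for pass_no in range(1, 4):                    # three passes through the CSV
    for value in rows:
        print('%s_%d' % (value, pass_no))      # abc_1 ... ghi_3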
I thought I could set up a counter to run in parallel with my CSV loop, but that won't work unless I increment it by 1/n on each iteration, where n is the number of lines in my CSV file, which you can't do because counters are integers.
I'm going to go flail around and see if I can come up with a solution, but in case I'm not successful, has anyone got any suggestions?
I've used an EOF marker row (an index column with something like "EOF" or "END") and an If Controller, together with either a non-resetting counter or user variables incremented via JavaScript in a BSF element (a BSF Assertion or whatever, just a mechanism to run the script).
Unfortunately, it's the best solution I've come up with without putting too much effort into it.
I'm importing a flat file into a database using a Data Flow Task in SSIS. The file is very simple: it contains three comma-separated values per row. Whenever I run this task, however, I receive a warning from the Flat File component:
Warning: 0x8020200F: There is a partial row at the end of the file.
This warning seems to happen regardless of the size of the file: even with only a handful of rows in the file, visually validated (with extended characters and whatnot made visible), I still receive it. Moreover, it doesn't seem to matter whether I have a blank row at the end of the file or just end it without a trailing CR+LF.
How can I get rid of this warning so I can run my package with WarnAsError enabled?
(BTW, it seems someone else may have had a similar problem in There is a partial row at the end of the file, though it wasn't much of a question.)
I have found three things to try if you encounter this problem. In at least two out of the three cases, SSIS was ignoring rows of my input file with only the above warning to show for it. Because of that, I do not recommend ignoring this warning!
Step 1: verify that your flat file is valid
This error will appear when you have an invalid input file. This can be especially hard to detect if your input file has millions of lines, as mine do, but it's vital that you discover file format violations because SSIS will happily give you this warning and continue on its way without importing the offending lines or, in some cases, the lines after the offending lines. The easiest way I found to discover a problem with the source file is to check the number of rows that are being imported successfully. If it's vastly different than the number you expect in your flat file, something may have gone wrong in the middle somewhere.
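For example, one quick way to get an independent line count to compare against the number of rows SSIS reports as imported (a rough Python sketch; the file path is just a placeholder):

# count the physical lines in the flat file, reading as bytes to avoid encoding surprises
with open(r'C:\data\input.csv', 'rb') as f:    # placeholder path
    line_count = sum(1 for _ in f)
print(line_count)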
Step 2: try a dummy line at the end (fixed-width only)
If you are using a fixed-width format input file, Microsoft may have a helpful KB article for you. Basically, they suggest that you add a dummy line at the end of the file.
I am not using fixed-width files, so I can't say how useful this technique is.
Step 3: turn off text qualification for non-text
This is the tricky one, because I believe the TextQualified property is True by default. If your input file uses non-text fields (integers, etc.), then you must tell SSIS that it should not expect those columns to be qualified as text. Otherwise, SSIS treats your input file as invalid in spite of it looking perfectly valid.
TextQualified is a property of the columns in your Flat File Connection Manager.
To change it, open up your connection manager, click "Advanced", and then click on a non-text column. Make sure the TextQualified property is set to False. You will need to do this for all of your non-text columns.
If the byte width of a line in the file is known, you can always double-check that the total byte size of the file divides evenly by the expected line size, giving you a whole line count (as opposed to a fractional one).
It also helps to know from your source how many records are expected, but if you don't have this, you can at least double-check the loaded table's record count against the line count calculated while loading the file.
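A rough Python sketch of that divisibility check, assuming rows of a known byte length (the path and row width are placeholders):

import os

ROW_BYTES = 25                                    # expected bytes per row, including the CR+LF (placeholder)
size = os.path.getsize(r'C:\data\input.txt')      # placeholder path
full_rows, leftover = divmod(size, ROW_BYTES)
print(full_rows, 'full rows,', leftover, 'leftover bytes')   # leftover != 0 suggests a partial row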
I've often seen this error when a source flat text file is missing its last \r\n at the end of the file.
Running on 64-bit Windows was fine and no rows went missing, but I lost the last row when running on Windows 2008.
My workaround is:
1. Open the SSIS package in BIDS on the Windows 2008 machine.
2. Open the Flat File Connection Manager and make sure the Text Qualifier is set to
3. Rebuild the package.
Now everything works fine on both Windows 7 and Windows 2008.