How do Java 8 parallel streams behave on a thrown exception?

How do Java 8 parallel streams behave when an exception is thrown in the consuming clause, for example inside a forEach handler? Consider the following code:
final AtomicBoolean throwException = new AtomicBoolean(true);
IntStream.range(0, 1000)
    .parallel()
    .forEach(i -> {
        // Throw only on one of the threads.
        if (throwException.compareAndSet(true, false)) {
            throw new RuntimeException("One of the tasks threw an exception. Index: " + i);
        }
    });
Does it stop handling elements immediately? Does it wait for the already-started elements to finish? Does it wait for the whole stream to finish? Does it keep starting new elements after the exception is thrown?
When does it return? Immediately after the exception? After all, or only some, of the elements were handled by the consumer?
Do elements continue being handled after the parallel stream threw the exception? (I found a case where this happened.)
Is there a general rule here?
EDIT (15-11-2016)
Trying to determine if the parallel stream returns early, I found that it's not deterministic:
@Test
public void testParallelStreamWithException() {
    AtomicInteger overallCount = new AtomicInteger(0);
    AtomicInteger afterExceptionCount = new AtomicInteger(0);
    AtomicBoolean throwException = new AtomicBoolean(true);
    try {
        IntStream.range(0, 1000)
            .parallel()
            .forEach(i -> {
                overallCount.incrementAndGet();
                afterExceptionCount.incrementAndGet();
                try {
                    System.out.println(i + " Sleeping...");
                    Thread.sleep(1000);
                    System.out.println(i + " After Sleeping.");
                }
                catch (InterruptedException e) {
                    e.printStackTrace();
                }
                // Throw only on one of the worker threads, not on the main thread.
                if (!Thread.currentThread().getName().equals("main") && throwException.compareAndSet(true, false)) {
                    System.out.println("Throwing exception - " + i);
                    throw new RuntimeException("One of the tasks threw an exception. Index: " + i);
                }
            });
        Assert.fail("Should not get here.");
    }
    catch (Exception e) {
        System.out.println("Caught exception. Resetting afterExceptionCount to zero.");
        afterExceptionCount.set(0);
    }
    System.out.println("Overall count: " + overallCount.get());
    System.out.println("After exception count: " + afterExceptionCount.get());
}
Late return when the exception is thrown from a thread other than the main thread. Many new elements were handled well after the exception was thrown; on my machine, about 200. But not all 1000 elements were handled. So what is the rule here? Why were more elements handled even though the exception had already been thrown?
Early return when the not (!) sign is removed, so the exception is thrown in the main thread. Only the already-started elements finished processing and no new ones were handled. Returning early was the case here, which is not consistent with the previous behavior.
What am I missing here?

When an exception is thrown in one of the stages, the stream does not wait for the other operations to finish; the exception is re-thrown to the caller.
That is how ForkJoinPool handles it.
In contrast, findFirst, for example, when run in parallel presents the result to the caller only after ALL operations have finished processing (even if the result is known before they have all finished).
In other words: on an exception the stream returns early, but it leaves the already-running tasks to finish.
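A minimal sketch of that behavior (the index and counts here are arbitrary): the RuntimeException surfaces at the forEach call site, while workers that had already picked up elements may still finish them.
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

public class ParallelExceptionDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicInteger processed = new AtomicInteger();
        try {
            IntStream.range(0, 1000).parallel().forEach(i -> {
                processed.incrementAndGet();
                if (i == 500) {
                    throw new RuntimeException("boom at " + i);
                }
            });
        } catch (RuntimeException e) {
            // Re-thrown to the caller here; in-flight tasks may still be running.
            System.out.println("Caught: " + e.getMessage());
        }
        Thread.sleep(1000); // let any in-flight worker tasks drain
        System.out.println("Processed: " + processed.get()); // may be less than 1000
    }
}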
EDIT to answer the last comment
This is very much explained by Holger's answer (link in comments), but here are some details.
1) When you kill all threads BUT the main thread, you also kill all the tasks that those threads were supposed to handle. So that number should actually be closer to 250, as there are 1000 tasks and 4 threads; I assume this returns 3?
int result = ForkJoinPool.getCommonPoolParallelism();
Theoretically there are 1000 tasks and 4 threads, each supposed to handle 250 tasks; you then kill 3 of them, meaning 750 tasks are lost.
That leaves 250 tasks to execute, and ForkJoinPool will spawn 3 new threads to execute those remaining 250 tasks.
A few things you can try. First, change your stream like this (making the stream not SIZED):
IntStream.generate(random::nextInt).limit(1000).parallel().forEach
This time, many more operations would finish, because the initial split index is unknown and is chosen by some other strategy. What you could also try is changing this:
if (!Thread.currentThread().getName().equals("main") && throwException.compareAndSet(true, false)) {
to this:
if (!Thread.currentThread().getName().equals("main")) {
This time you would always kill all threads besides main, up to a certain point where ForkJoinPool creates no new threads because the task is too small to split, so no other threads are needed. In this case even fewer tasks would finish. A runnable sketch of the unsized-stream variant follows.
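For reference, here is one way the unsized-stream variant could look (a sketch; the Random source and the exception message are assumptions, and exact counts will vary per machine and run):
import java.util.Random;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

public class UnsizedStreamDemo {
    public static void main(String[] args) {
        Random random = new Random();
        AtomicBoolean throwException = new AtomicBoolean(true);
        AtomicInteger processed = new AtomicInteger();
        try {
            // IntStream.generate is not SIZED, so the pool cannot pre-split
            // the work into fixed index ranges as it does for IntStream.range.
            IntStream.generate(random::nextInt)
                     .limit(1000)
                     .parallel()
                     .forEach(i -> {
                         processed.incrementAndGet();
                         if (!Thread.currentThread().getName().equals("main")
                                 && throwException.compareAndSet(true, false)) {
                             throw new RuntimeException("Killing one worker. Value: " + i);
                         }
                     });
        } catch (RuntimeException e) {
            System.out.println("Caught: " + e.getMessage());
        }
        System.out.println("Processed: " + processed.get());
    }
}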
2) In your second example, where you actually kill the main thread, the way the code is written you will not see the other threads still running. Change it:
} catch (Exception e) {
    System.out.println("Caught exception. Resetting afterExceptionCount to zero.");
    afterExceptionCount.set(0);
}
// Give the other threads time to finish their work. Comment and uncomment
// this line to see a big difference in the results.
TimeUnit.SECONDS.sleep(60);
System.out.println("Overall count: " + overallCount.get());
System.out.println("After exception count: " + afterExceptionCount.get());

Related

C++/Cli Destructors are not called

I am investigating a memory leak problem and found that destructors in my library are never called. I have the following code:
PPCamNET::Native::PpQueue::PpQueue(int capacity) : m_capacity(capacity), m_array(nullptr), m_head(0), m_tail(0)
{
    // Quick fix for the case when capacity is 1 (single snap)
    // and the Push function crashes on the 1st frame
    if (m_capacity == 1)
        m_capacity = 2;

    m_array = new FrameData[m_capacity];
    m_pushes = 0;
    m_pops = 0;
}
The corresponding destructor should be called when the PpQueue is destroyed, but execution never stops at the breakpoint.
PPCamNET::Native::PpQueue::~PpQueue()
{
    delete[] m_array; // <==== here I set a breakpoint
}
The PpQueue instance is created by the AcqCache constructor.
PPCamNET::Internal::AcqCache::AcqCache(AcqBuffer^ acqBuffer)
{
    //m_stopWatchPush = Stopwatch::StartNew();
    //m_stopWatchPop = Stopwatch::StartNew();
    m_acqBuffer = acqBuffer;
    m_cacheLock = gcnew Object();
    m_processFrameRunning = true;
    try
    {
        m_frameDataCache = new PpQueue(acqBuffer->BufferSize / 2 + 1);
The AcqCache destructor deletes m_frameDataCache, the PpQueue.
PPCamNET::Internal::AcqCache::~AcqCache()
{
    m_processFrameRunning = false;
    delete m_frameDataCache;  // <== another breakpoint here, never hit
    delete[] m_frameInfoBuffer;
}
Finally, the Acquisition constructor creates an instance of m_acqCache with gcnew.
PPCamNET::Internal::Acquisition::Acquisition(AcqBuffer^ acqBuffer,
                                             CameraSettings^ camSettings)
{
    m_eofEvent = gcnew AutoResetEvent(false);
    m_acqCache = gcnew AcqCache(acqBuffer);
I am puzzled why these destructors aren't called. Is it because the GC never kicked in to clean up m_acqCache, causing the other destructors not to be called?
Thanks,
The GC will not call a destructor. The C++/CLI compiler translates a destructor into an implementation of IDisposable::Dispose(), and has a separate syntax (ClassName::!ClassName) that gets translated into Finalize.
C# muddies this issue by using the name "destructor" for the CLR Finalize method. Finalizers are not deterministic, and the Dispose() function is never called automatically by the GC. It is not difficult once you know what's going on, but the terminology certainly leads to confusion.
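For readers who know Java better than C++/CLI, the same deterministic-versus-GC-driven split can be sketched there (an analogy only, not C++/CLI semantics): AutoCloseable.close() plays the role of Dispose(), and finalization plays the role of Finalize().
public class Resource implements AutoCloseable {
    @Override
    public void close() {
        // Deterministic cleanup, like the C++/CLI destructor ~Resource() / Dispose().
        System.out.println("close() called deterministically");
    }

    @SuppressWarnings("deprecation")
    @Override
    protected void finalize() {
        // GC-driven, like the C++/CLI finalizer !Resource() / Finalize(); may never run.
        System.out.println("finalize() called by the GC, if ever");
    }

    public static void main(String[] args) {
        try (Resource r = new Resource()) {
            System.out.println("using the resource");
        } // close() runs exactly here
        new Resource(); // unreferenced: finalize() may run at some later GC, or never
        System.gc();
    }
}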

How to transfer a float array (without serializing/deserializing) from Scala (JeroMQ) to C (ZMQ)?

Currently, I am using a JSON library to serialize the data at the sender (JeroMQ), and deserialize at the receiver (C, ZMQ). But, while parsing, the JSON library starts to consume a lot of memory and the OS kills the process. So, I want to send the float array as it is, i.e. without using JSON.
The existing sender code is below (syn0 and syn1 are Double arrays). If syn0 and syn1 are around 100 MB each, the process is killed while parsing the received arrays, i.e. the last line of the snippet below:
import org.zeromq.ZMQ
import com.codahale.jerkson
socket.connect("tcp://localhost:5556")
socket.send(json.JSONObject(Map("syn0"->json.JSONArray(List.fromArray(syn0Global)))).toString())
println("SYN0 Request sent”)
val reply_syn0 = socket.recv(0)
println("Response received after syn0: " + new String(reply_syn0))
logInfo("Sending Syn1 request … , size : " + syn1Global.length )
socket.send(json.JSONObject(Map("syn1"->json.JSONArray(List.fromArray(syn1Global)))).toString())
println("SYN1 Request sent")
val reply_syn1 = socket.recv(0)
socket.send(json.JSONObject(Map("foldComplete"->"Done")).toString())
println("foldComplete sent")
// Get the reply.
val reply_foldComplete = socket.recv(0)
val processedSynValuesJson = new String(reply_foldComplete)
val processedSynValues_jerkson = jerkson.Json.parse[Map[String,List[Double]]](processedSynValuesJson)
Can these arrays be transferred without using JSON?
Here I am transferring a float array between two C programs:
//client.c
#include <stdio.h>
#include <zmq.h>

int main (void)
{
    printf ("Connecting to hello world server…\n");
    void *context = zmq_ctx_new ();
    void *requester = zmq_socket (context, ZMQ_REQ);
    zmq_connect (requester, "tcp://localhost:5555");

    int request_nbr;
    float send_buffer[10];
    float recv_buffer[10];
    for (int i = 0; i < 10; i++)
        send_buffer[i] = i;

    for (request_nbr = 0; request_nbr != 10; request_nbr++) {
        printf ("Sending Hello %d…\n", request_nbr);
        zmq_send (requester, send_buffer, 10*sizeof(float), 0);
        zmq_recv (requester, recv_buffer, 10*sizeof(float), 0);
        printf ("Received World %.3f\n", recv_buffer[5]);
    }
    zmq_close (requester);
    zmq_ctx_destroy (context);
    return 0;
}
//server.c
#include <assert.h>
#include <stdio.h>
#include <zmq.h>

int main (void)
{
    // Socket to talk to clients
    void *context = zmq_ctx_new ();
    void *responder = zmq_socket (context, ZMQ_REP);
    int rc = zmq_bind (responder, "tcp://*:5555");
    assert (rc == 0);

    float recv_buffer[10];
    float send_buffer[10];
    while (1) {
        zmq_recv (responder, recv_buffer, 10*sizeof(float), 0);
        printf ("Received Hello\n");
        for (int i = 0; i < 10; i++)
            send_buffer[i] = recv_buffer[i]+5;
        zmq_send (responder, send_buffer, 10*sizeof(float), 0);
    }
    return 0;
}
Finally, my unsuccessful attempt at doing something similar using Scala (below is the client code):
def main(args: Array[String]) {
  val context = ZMQ.context(1)
  val socket = context.socket(ZMQ.REQ)

  println("Connecting to hello world server…")
  socket.connect("tcp://localhost:5555")

  val msg: Array[Float] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
  val bbuf = java.nio.ByteBuffer.allocate(4 * msg.length)
  bbuf.asFloatBuffer.put(java.nio.FloatBuffer.wrap(msg))

  for (request_nbr <- 1 to 10) {
    socket.sendByteBuffer(bbuf, 0)
  }
}
SER/DES? Size? No: an underlying transport-philosophy constraint is what matters.
You started with a 0.1 GB transport payload and reported that the JSON library's allocations caused your O/S to kill the process. Next, in another post, you asked about a 0.762 GB transport payload.
But there is a more important issue in ZeroMQ transport orchestration than the choice of an external SER/DES policy.
No one forbids you from trying to send as big a BLOB as possible, but the JSON-decorated string has already shown you the dark side of such approaches, and there are other reasons not to proceed this way.
ZeroMQ is without question a great and powerful toolbox. Still, it takes some time to gain the insight necessary for a smart and highly performant deployment that makes the most of this powerful workhorse.
One side effect of the feature-rich internal ecosystem "under the hood" is a little-known policy hidden in the message-delivery concept.
One may send any reasonably sized message, but delivery is not guaranteed: a message is either delivered completely, or nothing gets out at all.
Ouch?!
Yes, not guaranteed.
Based on this core Zero-Guarantee philosophy, you must decide on your steps and measures with due care, all the more so if you plan to move gigabyte BEASTs back and forth.
In this very sense, real SUT testing can quantitatively confirm that segmenting the whole volume of data into small messages, with due re-assembly measures on the receiving side, yields a much faster and much safer end-to-end solution than trying to brute-force the code into dumping about a GB of data onto whatever resources happen to be available (the Zero-Copy principle of ZeroMQ cannot and will not per se save you in these efforts). This applies if you indeed still need to move GBs (refer to the comment above, under the OP) and have no other choice.
For details on another hidden trap, related to the not-fully-Zero-Copy implementation, read the remarks of Martin SUSTRIK, co-father of ZeroMQ, on Zero-Copy "till-kernel-boundary-only" (so at least double the memory-space allocations are to be expected...).
Solution:
Redesign the architecture so as to propagate small-sized messages, if not to keep the original data structure "mirrored" in the remote process(es), rather than attempting to make one-shot giga-transfers survivable.
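A hedged sketch of that redesign using JeroMQ (the chunk size, the endpoint, and the ack-per-chunk REQ/REP protocol are all assumptions; the receiver must reassemble the pieces in order):
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import org.zeromq.ZMQ;

public class ChunkedFloatSender {
    public static void main(String[] args) {
        float[] syn0 = new float[25_000_000];  // roughly a 100 MB payload
        int chunkFloats = 256 * 1024;          // about 1 MB per message

        ZMQ.Context context = ZMQ.context(1);
        ZMQ.Socket socket = context.socket(ZMQ.REQ);
        socket.connect("tcp://localhost:5556");

        for (int off = 0; off < syn0.length; off += chunkFloats) {
            int len = Math.min(chunkFloats, syn0.length - off);
            ByteBuffer buf = ByteBuffer.allocate(4 * len).order(ByteOrder.LITTLE_ENDIAN);
            for (int i = 0; i < len; i++) {
                buf.putFloat(syn0[off + i]);
            }
            socket.send(buf.array(), 0); // one small message...
            socket.recv(0);              // ...acknowledged before sending the next
        }
        socket.close();
        context.term();
    }
}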
The best next step?
While it will not solve your trouble in a few SLOCs, the best thing to do, if you are serious about investing your intellectual powers in distributed processing, is to read Pieter HINTJENS' lovely book "Code Connected, Vol. 1".
Yes, it takes some time to build your own insight, but it will raise you in many respects to another level of professional code design. Worth the time. Worth the effort.
You'll need to serialize the data in some form or fashion: ultimately you're taking a structure in memory on one side and instructing the other side on how to rebuild that structure (bonus points for using two separate languages, where the in-memory structure is likely different anyway). I'd suggest you use a new JSON library, as that appears to be where the problem lies, but there are more efficient protocols you could be using. Protocol Buffers enjoy good support across many languages; that might be where I'd start.
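If the fixed float-array layout is acceptable as the wire format, JeroMQ's raw byte-array messages are already enough. Below is a sketch of a client matching the C server above (assuming both ends are little-endian x86, since the C side just copies raw floats):
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import org.zeromq.ZMQ;

public class RawFloatClient {
    public static void main(String[] args) {
        ZMQ.Context context = ZMQ.context(1);
        ZMQ.Socket socket = context.socket(ZMQ.REQ);
        socket.connect("tcp://localhost:5555");

        float[] msg = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        ByteBuffer bbuf = ByteBuffer.allocate(4 * msg.length).order(ByteOrder.LITTLE_ENDIAN);
        for (float f : msg) {
            bbuf.putFloat(f);
        }

        for (int requestNbr = 0; requestNbr < 10; requestNbr++) {
            socket.send(bbuf.array(), 0);
            // A REQ socket must receive a reply before it may send again;
            // the Scala attempt above sends in a loop without ever receiving.
            byte[] replyBytes = socket.recv(0);
            ByteBuffer reply = ByteBuffer.wrap(replyBytes).order(ByteOrder.LITTLE_ENDIAN);
            System.out.println("Received World " + reply.getFloat(5 * 4));
        }
        socket.close();
        context.term();
    }
}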

Accuracy of Task.Delay

I'm developing a Windows 10 Universal App in C#/XAML. I'm using await Task.Delay(delayInMilliseconds) to suspend a method for a given time. My scenario is somewhat real-time, so it's very sensitive to timing, and I need to make sure that when I suspend a method for, say, 8 milliseconds, it is suspended for 8 milliseconds. I have noticed that the actual time span for which Task.Delay suspends the method differs from the delay parameter by 1 up to 30 ms, and the size of the deviation differs on each call. So when I want to sleep for 8 milliseconds, my system sleeps for 9 to 39 milliseconds, which completely ruins my scenario.
So my question is: what is the best way to replace Task.Delay and achieve good precision? For the time being I use this method:
public static void Delay1(int delay)
{
    long mt = delay * TimeSpan.TicksPerMillisecond;
    Stopwatch s = Stopwatch.StartNew();
    while (true)
    {
        if (s.Elapsed.TotalMilliseconds > delay)
        {
            return;
        }
    }
}
but I guess it consumes a lot of resources, namely 100% of one processor core. If a user has only a few cores, this would be very inefficient.
According to MSDN it's not possible to achieve better accuracy, due to the system clock resolution:
This method depends on the system clock. This means that the time
delay will approximately equal the resolution of the system clock if
the millisecondsDelay argument is less than the resolution of the
system clock, which is approximately 15 milliseconds on Windows
systems.
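For comparison, the same resolution limit and the spin-wait trade-off can be reproduced in Java (the language used for added examples in this document; Thread.onSpinWait() needs Java 9+). A rough sketch:
public class SleepGranularity {
    // Busy-wait: accurate to microseconds, but it burns a CPU core for the
    // whole interval, the same cost as the Stopwatch loop above.
    static void spinWaitMillis(long millis) {
        long deadline = System.nanoTime() + millis * 1_000_000L;
        while (System.nanoTime() < deadline) {
            Thread.onSpinWait();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 5; i++) {
            long t0 = System.nanoTime();
            Thread.sleep(3);   // bounded below by the OS timer resolution
            long t1 = System.nanoTime();
            spinWaitMillis(3); // lands much closer to 3 ms
            long t2 = System.nanoTime();
            System.out.printf("sleep: %.2f ms, spin: %.2f ms%n",
                    (t1 - t0) / 1e6, (t2 - t1) / 1e6);
        }
    }
}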
Seems like I've found a satisfactory solution: using new ManualResetEvent(false).WaitOne(delay) instead of await Task.Delay(delayInMilliseconds).
I've used the following code to test both methods:
async void Probe()
{
    for (int i = 0; i < 1000; i++)
    {
        // METHOD 1
        await Task.Delay(3);

        // METHOD 2
        new System.Threading.ManualResetEvent(false).WaitOne(3);
    }
}
Running one method at a time, the code above should take exactly 3 seconds to execute. With METHOD 2 it took around 3300 ms, so the error is about 0.3 ms per call; acceptable for me. But METHOD 1 took around 15 seconds to execute (as mentioned by others and explained above), which gives a totally unacceptable error for my scenario.
EDIT: WaitOne() is a blocking call, so you'll probably want to run it as a task to get it off the UI thread (or any other thread with a message pump). @Abel has pointed out another high-res timer approach that is already baked into a task and will run on an alternate thread, shown here: https://stackoverflow.com/a/62588903/111575. That approach makes CPU-intensive calls to Thread.SpinWait() and Thread.Sleep(0) for small intervals.
You should use multimedia timers. Those are much more accurate.
Take a look here:
https://social.msdn.microsoft.com/Forums/vstudio/en-US/2024e360-d45f-42a1-b818-da40f7d4264c/accurate-timer
After working hard, I found a solution that sleeps the thread for a specific time, with an error of just 3 to 4 microseconds on my Core 2 Duo CPU at 3.00 GHz.
Here is the code (a C# version is given at the end). Use "ThreadSleepHandler":
Public Class ThreadSleepHandler
    Private Stopwatch As New Stopwatch
    Private SpinStopwatch As New Stopwatch
    Private Average As New TimeSpan
    Private Spinner As Threading.SpinWait

    Public Sub Sleep(Time As TimeSpan)
        Stopwatch.Restart()
        If Average.Ticks = 0 Then Average = GetSpinTime()
        While Stopwatch.Elapsed < Time
            If Stopwatch.Elapsed + Average < Time Then
                Average = TimeSpan.FromTicks((Average + GetSpinTime()).Ticks / 2)
            End If
        End While
    End Sub

    Public Function GetSpinTime() As TimeSpan
        SpinStopwatch.Restart()
        Spinner.SpinOnce()
        SpinStopwatch.Stop()
        Return SpinStopwatch.Elapsed
    End Function
End Class
Here is example code:
Sub Main()
    Dim handler As New ThreadSleepHandler
    Dim stopwatch As New Stopwatch
    Do
        stopwatch.Restart()
        handler.Sleep(TimeSpan.FromSeconds(1))
        stopwatch.Stop()
        Console.WriteLine(stopwatch.Elapsed)
    Loop
End Sub
For C# programmers, here is the code (I have converted it, but I am not sure it is exact):
using System;
using System.Diagnostics;
using System.Threading;

// Class renamed from "Main" to "Program": in C# a method cannot share
// its enclosing class's name.
static class Program
{
    public static void Main()
    {
        ThreadSleepHandler handler = new ThreadSleepHandler();
        Stopwatch stopwatch = new Stopwatch();
        do
        {
            stopwatch.Restart();
            handler.Sleep(TimeSpan.FromSeconds(1));
            stopwatch.Stop();
            Console.WriteLine(stopwatch.Elapsed);
        } while (true);
    }
}

public class ThreadSleepHandler
{
    private Stopwatch Stopwatch = new Stopwatch();
    private Stopwatch SpinStopwatch = new Stopwatch();
    private TimeSpan Average = new TimeSpan();
    private SpinWait Spinner;

    public void Sleep(TimeSpan Time)
    {
        Stopwatch.Restart();
        if (Average.Ticks == 0)
            Average = GetSpinTime();
        while (Stopwatch.Elapsed < Time)
        {
            if (Stopwatch.Elapsed + Average < Time)
            {
                Average = TimeSpan.FromTicks((Average + GetSpinTime()).Ticks / 2);
            }
        }
    }

    public TimeSpan GetSpinTime()
    {
        SpinStopwatch.Restart();
        Spinner.SpinOnce();
        SpinStopwatch.Stop();
        return SpinStopwatch.Elapsed;
    }
}
Note:
"ThreadSleepHandler" is thread unsafe. You cannot use a single "ThreadSleepHandler" for multiple threads.
The first sleep time will not be enough accurate.
In my tests, I found that DispatcherTimer at 20ms intervals will deliver sufficiently smooth animation if you must do it in code. Doing it in XAML is another alternative as already mentioned.

Atmel Studio Dummy_Handler

Occasionally I will get an unexpected interrupt, and my code will hang inside Dummy_Handler() in exceptions.c of the Atmel Studio Framework (ASF). I am using the ATSAM3X8E microcontroller of the Arduino Due.
void Dummy_Handler(void)
{
    while (1) {
    }
}
Any ideas how to determine which interrupt it was?
Of course I could replace this single handler with unique dummy handlers, one for each exception. (There are about fifty of them.) For example, change each line in the same exceptions.c file:
void HardFault_Handler ( void ) __attribute__ ((weak, alias("Dummy_Handler")));
to this
void HardFault_Handler ( void ) __attribute__ ((weak, alias("Dummy_HardFault_Handler")));
Etc. Or I could try to reason out which interrupt my code could have generated. But who has that kind of time?
This MCU has an Interrupt Program Status Register (IPSR) that gives some clue as to the source. ASF wraps it in the function __get_IPSR() in core_cmFunc.h:
uint32_t phantomISR = 9999;

void Dummy_Handler(void)
{
    while (1) {
        phantomISR = __get_IPSR();
    }
}
Then this global variable can be monitored at runtime. (In my case I paused the assembly code for this loop-of-death and saw the value 3 in the R3 register.) The Atmel MCU doc explains its value:
ISR_NUMBER
This is the number of the current exception:
0 = Thread mode
1 = Reserved
2 = NMI
3 = Hard fault
4 = Memory management fault
5 = Bus fault
6 = Usage fault
7-10 = Reserved
11 = SVCall
12 = Reserved for Debug
13 = Reserved
14 = PendSV
15 = SysTick
16 = IRQ0
...
45 = IRQ29
Both times this happened to me it was the Hard Fault, a kind of blue-screen-of-death for the Arduino Due. So I also installed a Hard Fault handler of my own.
ISR(HardFault_Handler)
{
    while (1) {
    }
}
This too is detectable in debug mode by pausing. Of course the sequel question is: what causes a Hard Fault? I'm guessing a memory wipe or infinite recursion.

Without using recursion how can a stack overflow exception be thrown?

Without using recursion how can a stack overflow exception be thrown?
Since no one else has mentioned it:
throw new System.StackOverflowException();
You might do this when testing or doing fault-injection.
Declare an ENORMOUS array as a local variable.
If you call enough methods, a stack overflow can occur at any time, although if you get stack overflow errors without using recursion, you may want to rethink how you're doing things. It's just so easy with recursion because an infinite recursion calls a ton of methods.
The following applies to Windows, but most OSs implement this in a similar fashion.
The short answer is: if you touch the last guard page, it will throw.
An exception of type EXCEPTION_STACK_OVERFLOW (0xC00000FD) is raised when your application touches the bottom page of the stack, which is marked with the PAGE_GUARD protection flag, and there is no room left to grow the stack (commit one more page); see How to trap stack overflow in a Visual C++ application.
The typical cases are: the stack has grown through many function frames on the stack (i.e. out-of-control recursion); the stack has grown through fewer frames with very large frame sizes (functions with very large locally scoped objects); or the stack was explicitly allocated from with _alloca.
Another way to cause the exception is to simply touch the guard page intentionally, e.g. by dereferencing a pointer that points into that page. This can happen due to a variable initialization bug.
Stack overflows can occur on valid execution paths if the input causes a very deep nesting level. For instance see Stack overflow occurs when you run a query that contains a large number of arguments inside an IN or a NOT IN clause in SQL Server.
Every method call that has not yet returned consumes some stack space. (Methods with more local variables consume more space.) A very deep call stack can result in stack overflow.
Note that on systems with limited memory (mobile devices and such) you don't have much stack space and will run out sooner.
Short answer: if you have an object which calls an internal object, you increase the stack depth by 1. So if you have thousands of objects nested inside one another, each calling its internal object, eventually you'll get a stack overflow.
Here's a demonstration of how to generate primes using nested iterators:
using System;
using System.Collections.Generic;
using System.Text;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            Program p = new Program();
            IEnumerator<int> primes = p.AllPrimes().GetEnumerator();
            int numberOfPrimes = 1000;
            for (int i = 0; i <= numberOfPrimes; i++)
            {
                primes.MoveNext();
                if (i % 1000 == 0)
                {
                    Console.WriteLine(primes.Current);
                }
            }
            Console.ReadKey(true);
        }

        IEnumerable<int> FilterDivisors(IEnumerator<int> seq, int num)
        {
            while (true)
            {
                int current = seq.Current;
                if (current % num != 0)
                {
                    yield return current;
                }
                seq.MoveNext();
            }
        }

        IEnumerable<int> AllIntegers()
        {
            int i = 2;
            while (true)
            {
                yield return i++;
            }
        }

        IEnumerable<int> AllPrimes()
        {
            IEnumerator<int> nums = AllIntegers().GetEnumerator();
            while (true)
            {
                nums.MoveNext();
                int prime = nums.Current;
                yield return prime;
                // nested iterator makes a big boom
                nums = FilterDivisors(nums, prime).GetEnumerator();
            }
        }
    }
}
There's no recursion, but the program will throw a stack overflow exception after around 150,000 primes.
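The same effect distills into a few lines of Java (a sketch; the wrap count is arbitrary): no method in the source calls itself, yet invoking the outermost wrapper descends one stack frame per nested object.
public class NestedCallsOverflow {
    public static void main(String[] args) {
        Runnable r = () -> System.out.println("bottom");
        // Wrap the Runnable many times; each wrapper only calls the one inside it.
        for (int i = 0; i < 1_000_000; i++) {
            Runnable inner = r;
            r = inner::run;
        }
        r.run(); // throws java.lang.StackOverflowError long before reaching "bottom"
    }
}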
If you're talking about C++ with a reasonable standard library, I imagine that this would work:
while (true) {
    alloca(1024 * 1024); // arbitrary - 1M per iteration.
}
Details on alloca.
int main()
{
    // something on the stack
    int foo = 0;
    for (
        // pointer to an address on the stack
        int* p = &foo;
        // forever
        ;
        // ever lower on the stack (assuming that the stack grows downwards)
        --p)
    {
        // write to the stack
        *p = 42;
    }
}
You can also overflow by allocating directly on the stack:
static void Main(string[] args)
{
    Span<byte> b = stackalloc byte[1024 * 1024 * 1024]; // Process is terminating due to StackOverflowException.
}
The easiest way to produce a StackOverflowException is the following:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ConsoleApplication2
{
    class Program
    {
        static void Main(string[] args)
        {
            SomeClass instance = new SomeClass();
            string name = instance.Name;
        }
    }

    public class SomeClass
    {
        public string Name
        {
            get
            {
                // The getter returns the property itself rather than a backing
                // field, so reading Name recurses until the stack overflows.
                return Name;
            }
        }
    }
}