Is STL empty() thread-safe?

I have multiple threads modifying an STL vector and an STL list.
I want to avoid having to take a lock when the container is empty.
Would the following code be thread-safe? What if items were a list or a map?
class A
{
    vector<int> items;
    void DoStuff()
    {
        if (!items.empty())
        {
            AcquireLock();
            DoStuffWithItems();
            ReleaseLock();
        }
    }
};

It depends what you expect. The other answers are right that in general, standard C++ containers are not thread-safe, and furthermore, that in particular your code doesn’t ward against another thread modifying the container between your call to empty and the acquisition of the lock (but this matter is unrelated to the thread safety of vector::empty).
So, to ward off any misunderstandings: Your code does not guarantee items will be non-empty inside the block.
But your code can still be useful, since all you want to do is avoid redundant lock acquisitions. It gives no guarantees, but it may skip an unnecessary lock: other threads can still empty the container between your check and the lock, yet in many cases the check will correctly tell you there is nothing to do. And if all you're after is an optimization that omits a redundant lock, then your code accomplishes that goal.
Just make sure that any actual access to the container is protected by locks.
By the way, the above is strictly speaking undefined behaviour: an STL implementation is theoretically allowed to modify mutable members inside the call to empty. This would mean that the apparently harmless (because read-only) call to empty can actually cause a conflict. Unfortunately, you cannot rely on the assumption that read-only calls are safe with STL containers.
In practice, though, I am pretty sure that vector::empty will not modify any members. But already for list::empty I am less sure. If you really want guarantees, then either lock every access or don’t use the STL containers.

There is no thread-safety guarantee on anything in the containers and algorithms of the STL.
So, no.

Regardless of whether or not empty is thread safe, your code will not, as written, accomplish your goal.
class A
{
    vector<int> items;
    void DoStuff()
    {
        if (!items.empty())
        {
            // Another thread can empty items here.
            AcquireLock();
            DoStuffWithItems();
            ReleaseLock();
        }
    }
};
A better solution is to lock every time you work with items (when iterating, getting items, adding items, checking count/emptiness, etc.), thus providing your own thread safety. So, acquire the lock first, then check if the vector is empty.
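For illustration, here is a minimal sketch of that lock-first approach using std::mutex and std::lock_guard. The itemsMutex member and the DoStuffWithItems body are assumptions made for the example, not part of the original code.
#include <mutex>
#include <vector>

class A
{
    std::vector<int> items;
    std::mutex itemsMutex; // assumed: one mutex guarding every access to items

    void DoStuffWithItems() {} // placeholder for the real work

    void DoStuff()
    {
        std::lock_guard<std::mutex> lock(itemsMutex); // take the lock first...
        if (!items.empty())                           // ...then check under the lock
        {
            DoStuffWithItems();
        }
    } // lock released automatically when 'lock' goes out of scope
};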

As already answered, the code above is not thread-safe, and locking is mandatory before actually doing anything with the container.
But the following should perform better than always locking, and I can't think of a reason why it would be unsafe.
The idea is that locking can be expensive, so we avoid it whenever it isn't really needed.
class A
{
    vector<int> items;
    void DoStuff()
    {
        if (!items.empty())          // unsynchronized pre-check
        {
            AcquireLock();
            if (!items.empty())      // re-check under the lock
            {
                DoStuffWithItems();
            }
            ReleaseLock();
        }
    }
};
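The same double-checked idea can be sketched with std::mutex and std::lock_guard, so the lock is released even if the work throws. As before, the itemsMutex member and the DoStuffWithItems body are assumptions for the example.
#include <mutex>
#include <vector>

class A
{
    std::vector<int> items;
    std::mutex itemsMutex; // assumed member guarding items

    void DoStuffWithItems() {} // placeholder for the real work

    void DoStuff()
    {
        if (!items.empty())                               // cheap, unsynchronized pre-check
        {
            std::lock_guard<std::mutex> lock(itemsMutex); // RAII lock
            if (!items.empty())                           // re-check under the lock
            {
                DoStuffWithItems();
            }
        } // lock released automatically, even if DoStuffWithItems throws
    }
};
Keep the caveat from the first answer in mind: the unsynchronized pre-check is, strictly speaking, a data race; it is used here only as a best-effort optimization.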

The STL is not thread-safe, and that includes empty(). If you want to make a container thread-safe, you have to guard all of its methods with a mutex or some other synchronization primitive.
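As an illustration of that advice, here is a minimal sketch of a mutex-guarded wrapper. The class name and the small set of methods are made up for the example; a real wrapper would forward whatever operations you actually need.
#include <mutex>
#include <vector>

// Hypothetical wrapper: every operation takes the same mutex.
class LockedIntVector
{
    std::vector<int> items;
    mutable std::mutex m;

public:
    void push_back(int value)
    {
        std::lock_guard<std::mutex> lock(m);
        items.push_back(value);
    }

    bool empty() const
    {
        std::lock_guard<std::mutex> lock(m);
        return items.empty();
    }

    // Note: a "check then act" sequence of separate calls is still racy;
    // compound operations need their own locked member function.
    bool pop_front_if_any(int& out)
    {
        std::lock_guard<std::mutex> lock(m);
        if (items.empty())
            return false;
        out = items.front();
        items.erase(items.begin());
        return true;
    }
};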

Related

Is a return in the middle of a conditional or a loop correct? [duplicate]

This question already has answers here:
Should a function have only one return statement?
I have sometimes been told that you cannot put a return in the middle of a conditional or a loop, because it breaks the flow. However, now I have been told that you can, and that it is even better. I'm confused. This would usually come up inside a function.
Can you put a return there or not? Why? Or does it make no difference?
Example:
if (i == 0)
{
//other code
return true;
}
else
{
//other code
return false;
}
or
if (i == 0)
{
//other code
b= true;
}
else
{
//other code
b= false;
}
return b;
Your two examples are basically equivalent in functionality, and either will work. In fact, an optimizing compiler may easily turn your second example into your first.
Most programmers would likely prefer the first as the intent is clearer.
It's better to have a single return at the bottom. That way, you have only one point of entry and one point of exit. It is much easier to debug code when you don't have to worry about where it will exit. This is not a big deal with very short methods, but for long ones that go on for a few hundred lines, it is much cleaner.
I don't see any practical implication of returning in the middle of a loop. If you hear people saying you shouldn't, it must be on the basis of code readability. Multiple exit points from a function can make some code ugly. Also, most of the time you have to do some cleanup before exiting the routine, so programmers generally tend to keep the cleanup in one place and always exit through that path. If you have multiple exit points, you have to add the cleanup in all of those places, which duplicates code and again hurts readability. I have seen code with returns spread all over the place that eventually failed to do the cleanup properly and caused memory leaks.
The bigger problem is that, most of the time, the code you write now lives for a long time and the maintainers keep changing, and at some point people no longer understand the intent of every line of code. That adds to the confusion.
All that said, I have seen a lot of very beautifully written code with returns in the middle of loops.
This is a choice of style rather than a rule or a matter of performance. The second code example follows the "single entry, single exit" approach, where the code within the function only enters from the top and only exits from the bottom. The idea is that this is "safer" and makes the code flow easier to follow. The safety comes into play when you manage dynamic storage manually: with a single point of return, you can ensure that you free all the memory. Of course, languages like Java and C# manage storage for you, so this isn't really an issue. Also, if you're exiting multiple times in the middle of a function (particularly if it's very long), it might be hard to keep track of what causes the function to return.
However, choosing to exit only at the bottom of a function can create its own problems, as you may sometimes need to keep track of more state by setting and checking flags.
As for your original question, it certainly does not break anything in modern programming languages; it's all up to you. Go with the way you find easier to follow.

Which way is better to define functions in AS3?

From the performance point of view, which function definition is better/faster? Making an object and adding functions to that, or making functions one by one?
var myCollection:Object = {
first: function(variable:int):void {
},
second: function(variable:int):void {
}
}
myCollection.first(1);
or
private function first(variable:int):void {
}
private function second(variable:int):void {
}
first(1);
The latter. The performance hit will be negligible, except on a massive scale, but the second one will be slightly faster.
Basically it boils down to scope. To get a function from an object, you have to find the memory reference to the object within the scope of the class and then find the memory reference to the function within the scope of the object. With the second, you just have to find the Function object (all functions are objects) memory reference within the scope of the class.
The second method cuts out the middle man, essentially, in identifying the correct function. Now, each one will be found in less than a millisecond. As far as you are concerned, it is instant. But if you are doing this 100k times in a row? Yeah, you might see a bit of a performance boost by declaring within the class.
As an additional note, you are also adding another object to memory with the first one. Add enough of these (again, it needs to be on a massive scale) and you will see a slowdown just from the superfluous objects stored in memory.
You should also look at usability and readability, though. Declaring in an object means the functions are not available as soon as the class is instantiated, so you have to be careful not to call a function before the object is instantiated. Additionally, you would lose code hinting, and it is not the common way to write your code (meaning another dev, or even yourself a year from now, would have to figure out how it works without any help from a hinter or from the standards they have already learned before they could make any modifications).

Is it still thread safe if I do first() then pop_front()?

Consider the following code in a multithread program:
QString target = remaining.first(); // remaining is a QVector<QString>
remaining.pop_front();
Would it be safe? It looks like multiple threads may use the same "target" simultaneously. Or what's the safe way to retrieve and erase the first value?
Without a mutex protecting that code, no, it's not at all safe.
I don't know QVector in detail but I believe it's OK for two threads to both do:
QString target = remaining.first();
This simply copies an element of the vector, so each thread has its own QString object called target, and they are independent objects (behind the scenes they use implicit sharing, so they are not fully independent, but you should be able to treat them as independent).
But this line modifies the QVector:
remaining.pop_front();
This means two threads modify the same object without any synchronisation. If the first thread is still accessing the vector by calling remaining.first() when the second thread calls pop_front() then there is a data race, with undefined behaviour.
Similarly, if both threads call pop_front() concurrently they will both try to remove the first element, and what happens then is completely unpredictable. You might erase one element, or two, or none, or crash the entire program immediately. As another possibility, consider what happens if the vector only has one element: both threads check it's not empty, copy the first() element, then call pop_front(), which tries to remove two elements when there's only one. Your program is broken.
The safe way to do it is protect the code with a mutex, where mutex is some global or otherwise shared variable that is visible to both threads:
QString target;
{
QMutexLocker locker(&mutex);
if (!remaining.empty())
{
target = remaining.first();
remaining.pop_front();
}
}
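If this check-and-pop pattern appears in several places, it can be packaged into a small helper; here is a sketch under the same assumptions (a shared QMutex named mutex guarding remaining), with the helper name invented for the example.
#include <QMutex>
#include <QMutexLocker>
#include <QString>
#include <QVector>

// Hypothetical helper: atomically take the first element, if any.
// Returns true and fills 'out' when an element was removed.
bool tryTakeFirst(QVector<QString>& remaining, QMutex& mutex, QString& out)
{
    QMutexLocker locker(&mutex);   // hold the lock for the whole check + copy + erase
    if (remaining.isEmpty())
        return false;
    out = remaining.first();
    remaining.pop_front();
    return true;
}
Each thread then calls the helper and only uses target when it returns true, so the check, copy, and erase always happen under one lock.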

Programming style: should you return early if a guard condition is not satisfied?

One thing I've sometimes wondered is which is the better style out of the two shown below (if any)? Is it better to return immediately if a guard condition hasn't been satisfied, or should you only do the other stuff if the guard condition is satisfied?
For the sake of argument, please assume that the guard condition is a simple test that returns a boolean, such as checking to see if an element is in a collection, rather than something that might affect the control flow by throwing an exception. Also assume that methods/functions are short enough not to require editor scrolling.
// Style 1
public SomeType aMethod() {
SomeType result = null;
if (!guardCondition()) {
return result;
}
doStuffToResult(result);
doMoreStuffToResult(result);
return result;
}
// Style 2
public SomeType aMethod() {
SomeType result = null;
if (guardCondition()) {
doStuffToResult(result);
doMoreStuffToResult(result);
}
return result;
}
I prefer the first style, except that I wouldn't create a variable when there is no need for it. I'd do this:
// Style 3
public SomeType aMethod() {
if (!guardCondition()) {
return null;
}
SomeType result = new SomeType();
doStuffToResult(result);
doMoreStuffToResult(result);
return result;
}
Having been trained in Jackson Structured Programming in the late '80s, my ingrained philosophy was always "a function should have a single entry-point and a single exit-point"; this meant I wrote code according to Style 2.
In the last few years I have come to realise that code written in this style is often overcomplex and hard to read/maintain, and I have switched to Style 1.
Who says old dogs can't learn new tricks? ;)
Style 1 is what the Linux kernel indirectly recommends.
From https://www.kernel.org/doc/Documentation/process/coding-style.rst, chapter 1:
Now, some people will claim that having 8-character indentations makes
the code move too far to the right, and makes it hard to read on a
80-character terminal screen. The answer to that is that if you need
more than 3 levels of indentation, you're screwed anyway, and should fix
your program.
Style 2 adds levels of indentation, ergo, it is discouraged.
Personally, I like style 1 as well. Style 2 makes it harder to match up closing braces in functions that have several guard tests.
I don't know if guard is the right word here. Normally an unsatisfied guard results in an exception or assertion.
But besides this, I'd go with style 1, because it keeps the code cleaner in my opinion. You have a simple example with only one condition, but what happens with many conditions and style 2? It leads to a lot of nested ifs or huge if-conditions (with ||, &&). I think it is better to return from a method as soon as you know that you can.
But this is certainly very subjective ^^
Martin Fowler refers to this refactoring as :
"Replace Nested Conditional with Guard Clauses"
If/else statements also bring cyclomatic complexity, and hence harder test cases: in order to cover all the if/else blocks you may need a lot of input combinations.
Whereas if there are guard clauses, you can test them first and deal with the real logic inside the if/else clauses in a clearer fashion.
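A minimal sketch of that refactoring, written in C++ for concreteness; the condition names and return values are made up for the example.
// Nested conditionals...
int processNested(bool loggedIn, bool hasPermission)
{
    int result = 0;
    if (loggedIn)
    {
        if (hasPermission)
        {
            result = 42; // the real work
        }
    }
    return result;
}

// ...replaced with guard clauses: each precondition exits early,
// leaving the real work unindented at the bottom.
int processGuarded(bool loggedIn, bool hasPermission)
{
    if (!loggedIn)
        return 0;
    if (!hasPermission)
        return 0;
    return 42; // the real work
}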
If you dig through the .NET Framework using .NET Reflector, you will see that the .NET programmers use style 1 (or maybe style 3, already mentioned by unbeli).
The reasons are already mentioned in the answers above, and maybe one other reason is to make the code more readable, concise, and clear.
This style is used most when checking input parameters; you always have to do this if you write any kind of framework/library/DLL.
First check all input parameters, then work with them.
It sometimes depends on the language and what kinds of "resources" that you are using (e.g. open file handles).
In C, Style 2 is definitely safer and more convenient because a function has to close and/or release any resources that it obtained during execution. This includes allocated memory blocks, file handles, handles to operating system resources such as threads or drawing contexts, locks on mutexes, and any number of other things. Delaying the return until the very end or otherwise restricting the number of exits from a function allows the programmer to more easily ensure that s/he properly cleans up, helping to prevent memory leaks, handle leaks, deadlock, and other problems.
In C++ using RAII-style programming, both styles are equally safe, so you can pick one that is more convenient. Personally I use Style 1 with RAII-style C++. C++ without RAII is like C, so, again, Style 2 is probably better in that case.
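For illustration, here is a small sketch of why RAII makes the early-return style safe in C++. The mutex, file, and function names are invented for the example; the point is only that every return path runs the same automatic cleanup.
#include <cstdio>
#include <memory>
#include <mutex>

std::mutex gMutex; // assumed shared state, just for the example

struct FileCloser { void operator()(std::FILE* f) const { std::fclose(f); } };

bool writeRecord(const char* path, bool guardCondition)
{
    std::lock_guard<std::mutex> lock(gMutex);                           // released on every return path
    std::unique_ptr<std::FILE, FileCloser> file(std::fopen(path, "a")); // closed on every return path

    if (!file)
        return false;   // early return: nothing to clean up by hand
    if (!guardCondition)
        return false;   // early return: file and lock cleaned up automatically

    std::fputs("record\n", file.get());
    return true;        // normal return: same automatic cleanup
}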
In languages like Java with garbage collection, the runtime helps smooth over the differences between the two styles because it cleans up after itself. However, there can be subtle issues with these languages, too, if you don't explicitly "close" some types of objects. For example, if you construct a new java.io.FileOutputStream and do not close it before returning, then the associated operating system handle will remain open until the runtime garbage collects the FileOutputStream instance that has fallen out of scope. This could mean that another process or thread that needs to open the file for writing may be unable to until the FileOutputStream instance is collected.
Although it goes against the best practices I have been taught, I find it much better to reduce the nesting of if statements when I have a condition such as this. I think it is much easier to read, and although it exits in more than one place, it is still very easy to debug.
I would say that Style 1 became more widely used because it is the best practice if you combine it with small methods.
Style 2 looks like a better solution when you have big methods. When you have them... you have some common code that you want to execute no matter how you exit. But the proper solution is not to force a single exit point; it is to make the methods smaller.
For example, if you want to extract a sequence of code from a big method and that method has two exit points, you start to have problems; it is hard to do automatically. When I have a big method written in Style 1, I usually transform it into Style 2, then I extract methods, and in each of them I should end up with Style 1 code.
So Style 1 is best, but it goes hand in hand with small methods.
Style 2 is not so good, but it is recommended if you have big methods that you don't want to, or don't have time to, split.
I prefer to use method #1 myself; it is logically easier to read and also closer to what we are trying to do (if something bad happens, exit the function NOW; do not pass go, do not collect $200).
Furthermore, most of the time you would want to return a value that is not a logically possible result (ie -1) to indicate to the user who called the function that the function failed to execute properly and to take appropriate action. This lends itself better to method #1 as well.
I would say "It depends on..."
In situations where I have to perform a cleanup sequence with more than 2 or 3 lines before leaving a function/method I would prefer style 2 because the cleanup sequence has to be written and modified only once. That means maintainability is easier.
In all other cases I would prefer style 1.
Number 1 is typically the easy, lazy and sloppy way. Number 2 expresses the logic cleanly. What others have pointed out is that yes it can become cumbersome. This tendency though has an important benefit. Style #1 can hide that your function is probably doing too much. It doesn't visually demonstrate the complexity of what's going on very well. I.e. it prevents the code from saying to you "hey this is getting a bit too complex for this one function". It also makes it a bit easier for other developers that don't know your code to miss those returns sprinkled here and there, at first glance anyway.
So let the code speak. When you see long conditions appearing or nested if statements it is saying that maybe it would be better to break this stuff up into multiple functions or that it needs to be rewritten more elegantly.

Successive success checks

Most of you have probably bumped into a situation where multiple things must check out, in a certain order, before the application can proceed, for example in the very simple case of creating a listening socket (socket, bind, listen, accept, etc.). There are at least two obvious ways (don't take this 100% verbatim):
if (1st_ok)
{
if (2nd_ok)
{
...
or
if (!1st_ok)
{
return;
}
if (!2nd_ok)
{
return;
}
...
Have you ever thought of anything smarter? Do you prefer one of the above over the other, or do you (if the language provides for it) use exceptions?
I prefer the second technique. The main problem with the first one is that it increases the nesting depth of the code, which is a significant issue when you've got a substantial number of preconditions/resource-allocs to check since the business part of the function ends up deeply buried behind a wall of conditions (and frequently loops too). In the second case, you can simplify the conceptual logic to "we've got here and everything's OK", which is much easier to work with. Keeping the normal case as straight-line as possible is just easier to grok, especially when doing maintenance coding.
It depends on the language - e.g. in C++ you might well use exceptions, while in C you might use one of several strategies:
if/else blocks
goto (one of the few cases where a single goto label for "exception" handling might be justified)
use break within a do { ... } while (0) loop
Personally I don't like multiple return statements in a function - I prefer to have a common clean up block at the end of the function followed by a single return statement.
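A small sketch of the break-out-of-do { ... } while (0) strategy combined with a single cleanup block and a single return, as just described. The resource and check names are invented for the example.
#include <cstdio>

bool setup()
{
    std::FILE* config = nullptr;
    bool ok = false;

    do {
        config = std::fopen("settings.conf", "r");                    // 1st step
        if (!config)
            break;                                                    // jump to the common cleanup

        char header[16];
        if (std::fgets(header, sizeof header, config) == nullptr)     // 2nd step
            break;

        ok = true;                                                    // all checks passed
    } while (0);

    // Single cleanup block, single return.
    if (config)
        std::fclose(config);
    return ok;
}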
This tends to be a matter of style. Some people only like returning at the end of a procedure, others prefer to do it wherever needed.
I'm a fan of the second method, as it allows for clean and concise code as well as ease of adding documentation on what it's doing.
// Checking for llama integration
if (!1st_ok)
{
return;
}
// Llama found, loading spitting capacity
if (!2nd_ok)
{
return;
}
// Etc.
I prefer the second version.
In the normal case, all code between the checks executes sequentially, so I like to see them at the same level. Normally none of the if branches are executed, so I want them to be as unobtrusive as possible.
I use the 2nd because I think it reads better and the logic is easier to follow. Also, they say exceptions should not be used for flow control, but for exceptional and unexpected cases. I'd like to see what the pros say about this.
What about
if (1st_ok && 2nd_ok) { }
or if some work must be done, like in your example with sockets
if (1st_ok() && 2nd_ok()) { }
I avoid the first solution because of nesting.
I avoid the second solution because of corporate coding rules which forbid multiple return in a function body.
Of course coding rules also forbid goto.
My workaround is to use a local variable:
bool isFailed = false; // or whatever is available for bool/true/false
if (!check1) {
log_error();
try_recovery_action();
isFailed = true;
}
if (!isFailed) {
if (!check2) {
log_error();
try_recovery_action();
isFailed = true;
}
}
...
This is not as beautiful as I would like but it is the best I've found to conform to my constraints and to write a readable code.
For what it is worth, here are some of my thoughts and experiences on this question.
Personally, I tend to prefer the second case you outlined. I find it easier to follow (and debug) the code. That is, as the code progresses, it becomes "more correct". In my own experience, this has seemed to be the preferred method.
I don't know how common it is in the field, but I've also seen condition testing written as ...
error = foo1 ();
if ((error == OK) && (test1)) {
error = foo2 ();
}
if ((error == OK) && (test2)) {
error = foo3 ();
}
...
return (error);
Although readable (always a plus in my books) and avoiding deep nesting, it always struck me as using a lot of unnecessary testing to achieve those ends.
The first method I see used less frequently than the second. Of those times, the vast majority were because there was no nice way around it. For the remaining few instances, it was justified on the basis of extracting a little more performance in the success case. The argument was that the processor would predict a forward branch as not taken (corresponding to the else clause). This depended upon several factors, including the architecture, compiler, language, need, .... Obviously most projects (and most aspects of the project) did not meet those requirements.
Hope this helps.