Should I worry about unused variables? - language-agnostic

I am working in a large C++ code base, totaling approximately 8 million lines of code. In my application I have seen thousands of unused variables, which were reported by g++ but ignored by my team. I want to take the initiative to clean up these variables, but I need some information before working on this issue.
Will there be any issues or disadvantages of having thousands of unused variables?
The compiler by default treats this as an ignored warning, but I believe we should treat warnings as errors. Is there any disaster which can occur if we ignore this warning?
Should we make the effort to rectify this problem or would it just be wasted effort?

Assuming your variables are POD types like ints, floats etc, they are unlikely to have an effect on performance. But they have a huge effect on code quality. I suggest as you update your code to add new features, you remove the unused variables as you go. You MUST be using version control software in order to do this safely.
This is a not uncommon problem. As a consultant, I once reviewed a large FORTRAN codebase that contained hundreds of unused variables. When I asked the team who wrote it why they were there, their answer was "Well, we might need them in the future..."
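The POD caveat above matters because a variable that is never read can still do work if its type has a constructor or destructor. The following is a minimal sketch of that case, not code from the question; `ScopeCounter`, `g_live`, `g_created` and `compute` are hypothetical names invented for illustration.

```cpp
#include <cassert>

int g_live = 0;     // ScopeCounter objects currently alive
int g_created = 0;  // ScopeCounter objects ever constructed

// Hypothetical helper type whose constructor and destructor have side effects.
struct ScopeCounter {
    ScopeCounter() { ++g_live; ++g_created; }
    ~ScopeCounter() { --g_live; }
};

int compute(int x) {
    // int leftover;      // a plain unused int like this is what -Wunused-variable reports
    ScopeCounter guard;   // never read afterwards, yet its constructor already ran
    assert(g_live > 0);   // the "unused" object is alive for the rest of this scope
    return x * 2;         // guard's destructor runs on the way out
}
```

Deleting `leftover` is safe, but deleting `guard` changes what the program does, so variables of class type (lock guards, timers, RAII handles) deserve a second look before removal.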

If you compile with optimizations on, the compiler will most likely simply remove the variables, just as if they weren't there. If you don't use optimizations, then your program will occupy extra storage space for the variables without using it.
It's good practice to not declare variables then not use them, because they might take up space and, more importantly, they clutter up your code, making it less readable.
If you have, say, 1000 unused ints, and an integer on your platform is 32 bits long, then you will, in total, use up 4K of extra stack space, with optimizations turned off.
If the unused variables are not arguments, then there should be nothing stopping you from removing them, as there's nothing you could break. You will gain readability and you will be able to see the other, more serious warnings that the compiler might produce.

Unless the compiler optimizes them away, unused variables are still allocated in memory. Removing them will free up that memory.

Related

Is there a Tcl extension to deal with huge lists and work around the 2GB allocation limit?

Based on a previous post on this site, "TCL max size of array",
it would appear that Tcl cannot handle lists of more than 256M elements. Is there an extension or a future plan to overcome this limitation? Otherwise, I would assume that for the time being, and for the foreseeable future, if one needs to handle larger indexed arrays and/or dictionaries than that, one must resort to a different language.
Is that true?
I plan to fix the limitation as part of Tcl 9.0; I've done a few dry runs, so I know that it's a large but mostly mechanical change.
If you're dealing with very large amounts of data, consider putting it in a database. SQLite is recommended; it has an excellent Tcl API, and should be shipped as part of Tcl 8.6 (though that does depend on the packager; Linux distributions might make it be separate).

Purpose of abstraction

What is the purpose of abstraction in coding:
Programmer's efficiency or program's efficiency?
Our professor said that it is used merely for helping the programmer comprehend and modify programs faster to suit different scenarios. He also contended that it adds an extra burden on the program's performance. I am not clear on what this means.
Could someone kindly elaborate?
I would say he's about half right.
The biggest purpose is indeed to help the programmer. The computer couldn't care less how abstracted your program is. However, there is a related, but different, benefit: code reuse. This isn't just about readability, though; abstraction is what lets us plug various components into our programs that were written by others. If everything were just mixed together in one code file, with absolutely no abstraction, you would never be able to write anything even moderately complex, because you'd be starting with the bare metal every single time. Just writing text on the screen could be a week-long project.
About performance, that's a questionable claim. I'm sure it depends on the type and depth of the abstraction, but in most cases I don't think the system will notice a hit. This is especially true of modern compiled languages, whose compilers sometimes "un-abstract" the code for you (through things like loop unrolling and function inlining) to make it easier on the system.
Your professor is correct; abstraction in coding exists to make it easier to do the coding, and it increases the workload of the computer in running the program. The trick, though, is to make the (hopefully very tiny) increase in computer workload be dwarfed by the increase in programmer efficiency.
For example, at an extremely low level, object-oriented code is an abstraction that helps the programmer, but adds some overhead to the program in the end, in the form of extra 'stuff' in memory and extra function calls.
Since Abstraction is really the process of pulling out common pieces of functionality into re-usable components (be it abstract classes, parent classes, interfaces, etc.) I would say that it is most definitely a Programmer's efficiency.
Saying that Abstraction comes at the cost of performance is treading on unstable ground at best, though. With most modern languages, abstraction (and thus enhanced flexibility) can be had at little to no cost to the performance of the application.
What abstraction is is effectively outlined in the link Tesserex posted. To your professor's point about adding an additional burden on the program, this is actually fairly true. However, the burden in modern systems is negligible. Think of it in terms of what actually happens when you call a method: each additional method you call requires adding a number of additional data structures to the stack and then handling the return values also placed on the stack. So for instance, calling
c = add(a, b);
which looks something like
public int add(int a, int b){
    return a + b;
}
requires pushing two integers onto the stack for the parameters and then pushing an additional one onto the stack for the return value. By contrast, evaluating a + b inline needs no memory access if both values are already in registers; it is a single add instruction. Given that memory operations are much slower than register operations, you can see where the notion of a performance hit comes from.
Ultimately, every method call you make is going to increase the overhead of your program a little bit. However, as @Tesserex points out, it's minute in most modern computer systems, and as @Andrew Barber points out, that compromise is usually totally dwarfed by the increase in programmer efficiency.
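As a rough illustration of why the call overhead described above often disappears in practice, here is a minimal C++ sketch of the same add function. The name `sum_of_three` is hypothetical, and whether a given call is actually inlined depends on the compiler and its optimization settings.

```cpp
// Same shape as the add() method in the text, written as a C++ function.
// constexpr lets the compiler evaluate calls with constant arguments at compile time.
constexpr int add(int a, int b) {
    return a + b;
}

// Checked during compilation: no call and no stack traffic at run time.
static_assert(add(2, 3) == 5, "add is usable in constant expressions");

// With runtime arguments, an optimizing compiler will typically inline both
// calls into plain additions, although the language does not guarantee it.
int sum_of_three(int a, int b, int c) {
    return add(add(a, b), c);
}
```

The abstraction (a named function) is still there for the programmer, but the generated code can end up identical to writing a + b + c by hand.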
Abstraction is a tool to make it easier for the programmer. The abstraction may or may not have an effect on the runtime performance of the system.
For an example of an abstraction that doesn't alter performance, consider assembly. The mnemonics like mov and add are an abstraction that makes opcodes easier to remember, as compared to remembering byte codes and other instruction encoding details. However, given the 1 to 1 mapping, I'd suggest it's clear that this abstraction has 0 effect on final performance.
There's not a clear-cut situation that abstraction makes life easier for the programmer at the expense of more work for the computer.
Although a higher level of abstraction typically adds at least a small amount of overhead to executing a discrete unit of code, it's also what allows the programmer to think about a problem in larger "units" so he can do a better job of understanding an entire problem, and avoid executing many (or at least some) of those discrete units of code.
Therefore, a higher level of abstraction will often lead to faster-executing programs as long as you avoid adding too much overhead. The problem, of course, is that there's no easy or simple definition of how much overhead is too much. That stems largely from the fact that the amount of overhead that's acceptable depends heavily on the problem being solved, and the degree to which working at a higher level of abstraction allows the programmer to recognize operations that are truly unnecessary, and eliminate them.

Why don't managed languages offer the ability to manually delete objects?

Let's say you want to write a high performance method which processes a large data set.
Why shouldn't developers have the ability to turn on manual memory management instead of being forced to move to C or C++?
void Process()
{
    unmanaged
    {
        Byte[] buffer;
        while (true)
        {
            buffer = new Byte[1024000000];
            // process
            delete buffer;
        }
    }
}
Because allowing you to manually delete a memory block while there may still be references to it (and the runtime has no way of knowing that without doing a GC cycle) can produce dangling pointers, and thus break memory safety. GC languages are generally memory-safe by design.
That said, in C#, in particular, you can do what you want already:
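// Needs: using System; using System.Runtime.InteropServices;
// and compilation with unsafe code enabled.
void Process()
{
    unsafe
    {
        byte* buffer;
        while (true)
        {
            // AllocHGlobal returns an IntPtr, so cast to and from byte*.
            buffer = (byte*)Marshal.AllocHGlobal(1024000000);
            // process
            Marshal.FreeHGlobal((IntPtr)buffer);
        }
    }
}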
Note that, as in C/C++, you have full pointer arithmetic for raw pointer types in C# - so buffer[i] or buffer+i are valid expressions.
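The dangling-pointer risk mentioned at the start of this answer can be shown safely in C++ by using `std::weak_ptr` as the second reference. This is only a sketch of the hazard, with a hypothetical function name; a raw pointer in the observer's place would give undefined behavior instead of a clean "expired" answer.

```cpp
#include <memory>

// Returns true if the second reference can still reach the object
// after the owner has explicitly released it.
bool observer_survives_delete() {
    auto owner = std::make_shared<int>(42);  // stands in for the manually managed buffer
    std::weak_ptr<int> observer = owner;     // a second reference to the same object
    owner.reset();                           // explicit release while observer still exists
    // A raw pointer copy would now dangle; weak_ptr reports the loss instead.
    return !observer.expired();
}
```

The function returns false: the object is gone even though something still refers to it, which is exactly the situation a memory-safe runtime has to rule out.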
If you need high performance and detailed control, maybe you should write what you're doing in C or C++. Not all languages are good for all things.
Edited to add: A single language is not going to be good for all things. If you add up all the useful features in all the good programming languages, you're going to get a really big mess, far worse than C++, even if you can avoid inconsistency.
Features aren't free. If a language has a feature, people are likely to use it. You won't be able to learn C# well enough without learning the new C# manual memory management routines. Compiler teams are going to implement it, at the cost of other compiler features that are useful. The language is likely to become difficult to parse like C or C++, and that leads to slow compilation. (As a C++ guy, I'm always amazed when I compile one of our C# projects. Compilation seems almost instantaneous.)
Features conflict with each other, sometimes in unexpected ways. C90 can't do as well as Fortran at matrix calculations, since the possibility that C pointers are aliased prevents some optimizations. If you allow pointer arithmetic in a language, you have to accept its consequences.
You're suggesting a C# extension to allow manual memory management, and in a few cases that would be useful. That would mean that memory would have to be allocated in separate ways, and there would have to be a way to tell manually managed memory from automatically managed memory. Suddenly, you've complicated memory management, there's more chance for a programmer to screw up, and the memory manager itself is more complicated. You're gaining a little performance that matters in a few cases in exchange for more complication and slower memory management in all cases.
It may be that at some time we'll have a programming language that's good for almost all purposes, from scripting to number crunching, but there's nothing popular that's anywhere near that. In the meantime, we have to be willing to accept the limitations of using only one language, or the challenge of learning several and switching between them.
In the example you posted, why not just erase the buffer and re-use it?
The .NET garbage collector is very very good at working out which objects aren't referenced anymore and freeing the associated memory in a timely manner. In fact, the garbage collector has a special heap (the large object heap) in which it puts large objects like this, that is optimized to deal with them.
On top of this, not allowing references to be explicitly freed simply removes a whole host of bugs with memory leaks and dangling pointers, that leads to much safer code.
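The "erase the buffer and re-use it" suggestion above can be sketched as follows. The example is in C++ rather than C#, and `process_reusing_buffer` and its parameters are hypothetical names; the placeholder write stands in for the real processing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Runs `rounds` processing passes over one buffer that is allocated once
// and cleared at the start of each pass.
// Returns how many times the buffer's storage moved (expected: 0).
std::size_t process_reusing_buffer(std::size_t buffer_size, int rounds) {
    assert(buffer_size > 0);
    std::vector<unsigned char> buffer(buffer_size);  // single allocation
    const unsigned char* original = buffer.data();
    std::size_t reallocations = 0;
    for (int i = 0; i < rounds; ++i) {
        // "Erase" the previous contents instead of freeing and reallocating.
        std::fill(buffer.begin(), buffer.end(), static_cast<unsigned char>(0));
        buffer[0] = static_cast<unsigned char>(i);   // placeholder for real work
        if (buffer.data() != original) ++reallocations;
    }
    return reallocations;
}
```

Because the allocation happens outside the loop, there is nothing to free per iteration in either a managed or an unmanaged language.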
Freeing each unused block individually, as done in a language with explicit memory management, can be more expensive than letting a garbage collector do it, because the GC can use copying schemes that spend time linear in the number of blocks left alive (or the number of recent blocks left alive) instead of having to handle each dead block.
The same reason most kernels won't let you schedule your own threads. Because 99.99+% of the time you don't really need to, and exposing that functionality the rest of the time will only tempt you to do something potentially stupid/dangerous.
If you really need fine-grained memory control, write that section of code in something else.

What are some tricks that a processor does to optimize code?

I am looking for things like reordering of code that could even break the code in the case of a multiple processor.
The most important one would be memory access reordering.
Absent memory fences or serializing instructions, the processor is free to reorder memory accesses. Some processor architectures have restrictions on how much they can reorder; Alpha is known for being the weakest (i.e., the one which can reorder the most).
A very good treatment of the subject can be found in the Linux kernel source documentation, at Documentation/memory-barriers.txt.
Most of the time, it's best to use locking primitives from your compiler or standard library; these are well tested, should have all the necessary memory barriers in place, and are probably quite optimized (optimizing locking primitives is tricky; even the experts can get them wrong sometimes).
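As a small sketch of the advice above, the following C++ function relies on `std::mutex` from the standard library rather than hand-written barriers. The function name and parameters are hypothetical.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Increments a shared counter from several threads, taking the mutex for
// every access. Locking and unlocking also supply the required memory ordering.
long locked_count(int threads, int increments_per_thread) {
    long counter = 0;
    std::mutex m;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {
                std::lock_guard<std::mutex> lock(m);
                ++counter;
            }
        });
    }
    for (auto& w : workers) w.join();  // all increments are visible after join
    return counter;
}
```

Calling `locked_count(4, 10000)` returns 40000 regardless of how the processor reorders memory accesses inside each thread. On some toolchains, code using `std::thread` must be built with `-pthread`.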
Wikipedia has a fairly comprehensive list of optimization techniques here.
Yes, but what exactly is your question?
However, since this is an interesting topic: tricks that compilers and processors use to optimize code should not break code, even with multiple processors, in the absence of race conditions in that code. This is called the guarantee of sequential consistency: if your program does not have any race conditions, and all data is correctly locked before accessing, the code will behave as if it were executed sequentially.
There is a really good video of Herb Sutter talking about this here:
http://video.google.com/videoplay?docid=-4714369049736584770
Everyone should watch this :)
DavidK's answer is correct; however, it is also very important to be aware of the memory model for your language/runtime. Even without race conditions and with sequential consistency and mutex usage, your code can still break when data is being cached by different threads running on different cores of the CPU. Some languages, Java is one example, ensure the state of data between threads when a mutex lock is used, but it is rarely enough to simply ensure that no two threads can access the data at the same time. You need to use the mutex in a correct way to ensure that the language runtime synchronizes the data state between the two threads. In Java this is done by having the two threads synchronize on the same object.
Here is a good page explaining the problem and how it's dealt with in Java's memory model.
http://gee.cs.oswego.edu/dl/cpj/jmm.html

Techniques to Get rid of low level Locking

I'm looking for strategies that can be applied to reduce low-level locking.
However the catch here is that this is not new code (with tens of thousands of lines of C++ code) for a server application, so I can't just rewrite the whole thing.
I fear there might not be a solution to this problem by now (too late). However I'd like to hear about good patterns others have used.
Right now there are too many locks and not as many conflicts, so it's a paranoia-induced hardware performance issue.
The best way to describe the code is as single threaded code suddenly getting peppered with locks.
Why do you need to eliminate the low-level locking? Do you have deadlock issues? Do you have performance problems? Or scaling issues? Are the locks generally contended or uncontended?
What environment are you using? The answers in C++ will be different to the ones in Java, for example. E.g. uncontended synchronization blocks in Java 6 are actually relatively cheap in performance terms, so simply upgrading your JRE might get you past whatever problem you are trying to solve. There might be similar performance boosts available in C++ by switching to a different compiler or locking library.
In general, there are several strategies that allow you to reduce the number of mutexes you acquire.
First, anything only ever accessed from a single thread doesn't need a mutex.
Second, anything immutable is safe provided it is 'safely published' (i.e. created in such a way that a partially constructed object is never visible to another thread).
Third, most platforms now support atomic writes - which can help when a single primitive type (including a pointer) is all that needs protecting. These work very similarly to optimistic locking in a database. You can also use atomic writes to create lock-free algorithms to replace more complex types, including Map implementations. However, unless you are very, very good, you are much better off borrowing somebody else's debugged implementation (the java.util.concurrent package contains lots of good examples) - it is notoriously easy to accidentally introduce bugs when writing your own algorithms.
Fourth, widening the scope of the mutex can help - either simply holding open a mutex for longer, rather than constantly locking and unlocking it, or taking a lock on a 'larger' item - the object rather than one of its properties, for example. However, this has to be done extremely carefully; you can easily introduce problems this way.
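The fourth strategy, taking one lock around a larger unit of work, might look like the following in C++. `SharedLog` and its members are hypothetical names, and the acquisition counter exists only to make the difference visible.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical shared container used only for illustration.
class SharedLog {
public:
    // Fine-grained: one lock/unlock per item.
    void add(int value) {
        std::lock_guard<std::mutex> lock(mutex_);
        ++lock_acquisitions_;
        items_.push_back(value);
    }

    // Coarser: one lock/unlock for the whole batch.
    void add_batch(const std::vector<int>& values) {
        std::lock_guard<std::mutex> lock(mutex_);
        ++lock_acquisitions_;
        items_.insert(items_.end(), values.begin(), values.end());
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return items_.size();
    }

    int lock_acquisitions() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return lock_acquisitions_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<int> items_;
    int lock_acquisitions_ = 0;  // counts add/add_batch calls only, for demonstration
};
```

Adding 100 items through `add` takes the mutex 100 times, while one `add_batch` call takes it once. The trade-off is that other threads wait longer while a batch is in progress, which is why this has to be done carefully.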
The threading model of your program has to be decided before a single line is written. Any module, if inconsistent with the rest of the program, can crash, corrupt, or deadlock the application.
If you have the luxury of starting fresh, try to identify large functions of your program that can be done in parallel and use a thread pool to schedule the tasks. The trick to efficiency is to avoid mutexes wherever possible and (re)code your app to avoid contention for resources at a high level.
You may find some of the answers here and here helpful as you look for ways to atomically update shared state without explicit locks.
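One common way to update shared state atomically without an explicit lock is a compare-and-swap retry loop, which behaves much like the optimistic locking mentioned earlier. The sketch below assumes C++11 `std::atomic`; `atomic_store_max` is a hypothetical helper, not something from the linked answers.

```cpp
#include <atomic>

// Raises `target` to at least `candidate` without taking a mutex.
// Returns true if this call changed the stored value.
bool atomic_store_max(std::atomic<int>& target, int candidate) {
    int seen = target.load();
    while (candidate > seen) {
        // Succeeds only if target still holds `seen`. On failure, `seen` is
        // refreshed with the current value and the loop decides whether to retry.
        if (target.compare_exchange_weak(seen, candidate)) {
            return true;
        }
    }
    return false;  // another value at least as large was already stored
}
```

This only protects a single primitive value. For anything larger, a debugged library implementation is the safer choice, as noted above.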