How much space on disk do software comments take compared to the rest of the project? - language-agnostic

A question on The Workplace mentioned a company that explicitly banned code comments: https://workplace.stackexchange.com/questions/140843/my-current-job-follows-worst-practices-how-can-i-talk-about-my-experience-in with the explanation being:
my PM says it's not worth the disk space.
This made me really curious: disk space right now is cheap, on the order of pennies per gigabyte. Would you really save all that much money on disk space if you completely removed all comments from your code?

In view of the wide variety of circumstances in which coding is done, I suppose that the cost of a byte of storage must vary quite a bit. There is, however, no doubt that it must be extremely low for something like comments. On the other hand, the cost of uncommented code is huge: it is bug-prone, a nightmare to understand and nearly impossible to maintain.
Anyway, I Googled "cost of disk storage" and got $0.025 per GB for a 4 TB Seagate ST4000DM005...
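To put a rough number on it, here is a back-of-the-envelope sketch in C++. All the figures are assumptions (a million comment lines at roughly 60 bytes each, priced at the ~$0.025/GB quoted above), so treat it as an order-of-magnitude estimate only:

#include <cstdio>

int main() {
    // All figures below are assumptions, not measurements.
    const double comment_lines  = 1e6;     // assumed number of comment lines
    const double bytes_per_line = 60.0;    // assumed average comment length in bytes
    const double dollars_per_gb = 0.025;   // drive price quoted above

    double gigabytes = comment_lines * bytes_per_line / 1e9;
    double cost      = gigabytes * dollars_per_gb;

    std::printf("~%.3f GB of comments, costing roughly $%.4f of disk\n", gigabytes, cost);
    return 0;
}

On those assumptions, the comments of a million-line project occupy about 0.06 GB and cost a fraction of a cent in storage.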

Related

Couchbase compression free disk space requirement

According to old documentation:
If the amount of available disk space is less than twice the current database size, the compaction process does not take place and a warning is issued in the log.
Assuming this is still relevant for Couchbase 5.x (I couldn't find it in the latest docs), I'd like to know whether this requirement is truly for the entire bucket size (or even the entire database) - or rather per vBucket that's being compacted at a given point in time (since the compaction process happens per vBucket, with only 3 working in parallel by default).
If it's per compacting vBucket, I'd be less worried about having my single bucket take more than 50% of disk size, which right now I'm wary of and so I keep a very large margin of disk unutilized.
This question was also asked on the Couchbase forums. I have copied my answer from there:
Assuming this is still relevant for Couchbase 5.x
This is still correct for Couchbase Server 5.X.
The recommendation is on the size of the whole bucket as you noted.
The calculation itself to check if compaction has enough space to complete successfully is done per vBucket. In other words compaction will run as long as there is enough space for that one vBucket to be compacted.
since the compaction process happens per vBucket, with only 3 working in parallel by default.
The default setting has been changed to one - for more details see MB-18426
I can't seem to find documentation in 5.x either, but again, assuming that old documentation still holds true, I think it's probably at the vBucket level as you suspect (some notes from Don Pinto in this old blog post seem to confirm). However, while documents are distributed relatively evenly between vBuckets, the actual size can vary, so I wouldn't make any assumptions about vBucket size if you're looking at disk space. Indexes also take up disk space.
But also note that if you're concerned about running into disk size limits, you can add another node, which will redistribute the vbuckets, and should free up more space on every node.
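To illustrate the distinction, here is a sketch only (made-up sizes, not Couchbase code) of the old "free space must be at least twice the data being compacted" rule applied to the whole bucket versus a single vBucket, assuming the usual 1024 vBuckets per bucket:

#include <cstdio>

// Illustrative only: the 2x free-space rule from the old docs,
// applied per whole bucket vs. per compacting vBucket.
bool can_compact(double free_gb, double data_gb) {
    return free_gb >= 2.0 * data_gb;
}

int main() {
    double free_gb    = 40.0;                 // assumed free disk space
    double bucket_gb  = 100.0;                // assumed total bucket data
    double vbucket_gb = bucket_gb / 1024.0;   // rough average per vBucket (1024 vBuckets per bucket)

    std::printf("whole-bucket check: %s\n", can_compact(free_gb, bucket_gb)  ? "ok" : "skipped");
    std::printf("per-vBucket check:  %s\n", can_compact(free_gb, vbucket_gb) ? "ok" : "skipped");
    return 0;
}

Even so, as noted above, vBucket sizes are not perfectly even and indexes need space too, so don't budget disk right up to that line.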

An example of dormant fault?

I have been thinking about dormant faults and cannot figure out an example. By definition, a dormant fault is a fault (a defect in the code) that does not cause an error and thus does not cause a failure. Can anyone give me an example? The only thing that crossed my mind was unused buggy code...
Thanks
Dormant faults are much more common than one might think. Most programmers have experienced moments of thinking "What was I thinking? How could that ever run correctly?", even though the code didn't show erroneous behaviour. A classic case is faulty corner-case handling, e.g. on failed memory allocation:
char *foo = malloc(42);   /* no check for a NULL return: the dormant fault */
strcpy(foo, "BarBaz");
The above code will work fine in most situations and pass tests just fine; however, when malloc fails due to memory exhaustion, it will fail miserably. The fault is there, but dormant.
Dormant faults are simply ones that don't get revealed until you send the right input [edit: or circumstances] to the system.
A classic example is from the Therac-25. The race condition triggered by an unlikely sequence of keystrokes didn't occur until technicians became "fluent" with the system. They memorized the keystrokes for common treatments, which meant they could enter them very quickly.
Some other ones that come to my mind:
Y2K bugs were all dormant faults, until the year 2000 came around...
Photoshop 7 still runs OK on my Windows 7 machine, yet it thinks my 1TB disks are full. An explanation is that the datatype used to hold free space was not designed to account for such high amounts of free space, and there's an overflow causing the free space to appear insufficient.
Transferring a file greater than 32 MB with TFTP (the block counter can only go to 65535 in 16 bits) can reveal a dormant bug in a lot of old implementations.
In this last set of examples, one could argue that there was no specification requiring these systems to support such instances, and so they're not really faults. But that gets into completeness of specifications.
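The TFTP case is easy to make concrete. This is only a sketch of the arithmetic, not a real TFTP implementation: with 512-byte data blocks and a 16-bit block counter, the counter silently wraps at roughly 32 MB, and that is exactly where the dormant fault in old clients wakes up.

#include <cstdint>
#include <cstdio>

int main() {
    std::uint16_t block = 0;                  // 16-bit block counter, as in classic TFTP
    const std::size_t block_size = 512;       // default TFTP data block size

    for (std::size_t sent = 0; sent < 64u * 1024 * 1024; sent += block_size) {
        ++block;                              // wraps silently back to 0 after 65535
        if (block == 0) {
            std::printf("block counter wrapped after %zu bytes (~32 MB)\n", sent);
            break;                            // an old implementation may abort or loop here
        }
    }
    return 0;
}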

When referring to 'Number Crunching', how intensive is 'intensive'?

I am currently reading / learning Erlang, and it is often noted that it is not (really) suitable for 'heavy number crunching'. Now I often come across this phrase or similar, but never really know what 'heavy' exactly means.
How does one decide if an operation is computationally intensive? Can it be quantified before testing?
Edit:
Is there a difference between the quantity of calculations, the complexity of the algorithm, or the size of the input values?
For example, 1000 computations of 28303 / 4 vs. 100 computations of 239847982628763482 / 238742.
Talking about Erlang specifically, I doubt you would generally want to develop applications that require intensive number crunching with it. That is, you don't learn Erlang to code a physics engine in it. So don't worry about Erlang being too slow for you.
Moving from Erlang to the question in general, these things almost always come down to relativity. Let's ignore number crunching and ask a general question about programming: How fast is fast enough?
Well, fast enough depends on:
what you want to do with the application
how often you want to do it
how fast your users expect it to happen
If reading a file in some program takes 1 ms or 1000 ms - is 1000 ms to be considered "too slow"?
If ten files have to be read in quick succession - yes, probably way too slow. Imagine an XML parser that takes 1 second to simply read an XML file from disk - horrible!
If, on the other hand, a file only has to be read when a user manually clicks a button every 15 minutes or so, then it's not a problem, e.g. in Microsoft Word.
The reason nobody says exactly what "too slow" means is that it doesn't really matter. The same goes for your specific question. A language should rarely, if ever, be shunned for being "slow".
And last but not least, if you develop some monstrous project in Erlang and, down the road, realise that dagnabbit! you really need to crunch those numbers - then you do your research, find good libraries and implement algorithms in the language best suited for it, and then interop with that small library.
With this sort of thing you'll know it when you see it! Usually this refers to situations where it matters whether you pick an int, float, or double: things like physical simulations or Monte Carlo methods, where you want to do millions of calculations.
To be honest, in reality you just write those bits in C and use your favourite other language to run them.
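For a feel of what that kind of workload looks like, here is a small Monte Carlo estimate of pi as a sketch (in C++, since that is where such kernels usually end up; the sample count is arbitrary). The defining feature is millions of tight floating-point operations with no I/O in between:

#include <cstdio>
#include <random>

int main() {
    std::mt19937_64 rng(42);                                // fixed seed, purely illustrative
    std::uniform_real_distribution<double> uni(0.0, 1.0);

    const long samples = 10000000;                          // 10 million samples, arbitrary
    long inside = 0;
    for (long i = 0; i < samples; ++i) {
        double x = uni(rng), y = uni(rng);
        if (x * x + y * y <= 1.0)
            ++inside;                                       // point lands inside the unit quarter-circle
    }
    std::printf("pi ~= %f\n", 4.0 * static_cast<double>(inside) / samples);
    return 0;
}

If a loop like this dominates your application's run time, you are number crunching; if it runs once when a user clicks a button, you probably aren't.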
I once asked a question about number crunching in CouchDB MapReduce: CouchDB Views: How much processing is acceptable in map reduce?
What's interesting in one of the answers is this:
Suppose you had 10,000 documents and they take 1 second each to process (which is way higher than I have ever seen). That is 10,000 seconds or 2.8 hours to completely build the view. However once the view is complete, querying any row (?key=...) or row slice (?startkey=...&endkey=...) takes the same time as querying for documents directly. Lookup time is O(log n) for the document count. In other words, even if it takes 1 second per document to execute the map, it will take a few milliseconds to fetch the result. (Of course, the view must build first, since it is actually an index.)
I think that is an interesting angle from which to consider your question. On the topic of the language's speed / optimization:
How does one decide if an operation is computationally intensive?
Facebook asked this question about PHP, and ended up writing HipHop to solve the problem -- it compiles PHP into C++. They said the reason PHP is much slower than C++ is that the PHP language is all dynamic lookup, and therefore much processing is required to do anything with variables, arrays, dynamic typing (which is a source of slowdown), etc.
So, a question you can ask is: does Erlang rely on dynamic lookup? Is it statically typed? Is it compiled?
Is there a difference between the quantity of calculations, the complexity of the algorithm or the size of the input values? For example, 1000 computations of 28303 / 4 vs. 100 computations of 239847982628763482 / 238742.
So, with that said, the fact that you can even assign specific types to numbers of different kinds means you SHOULD be using the right types, and that will definitely improve performance.
Suitability for number crunching depends on the library support and the inherent nature of the language. For example, a pure functional language will not allow any mutable variables, which makes implementing equation-solving type problems rather "interesting". Erlang probably falls into this category.

Should I worry about unused variables?

I am working in a large C++ code base, totaling approximately 8 million lines of code. In my application I have seen thousands of unused variables, which were reported by g++ but ignored by my team. I want to take the initiative to clean up these variables, but I need some information before working on this issue.
Will there be any issues or disadvantages of having thousands of unused variables?
The compiler by default treats this as a warning that gets ignored, but I believe we should treat warnings as errors. Is there any disaster that can occur if we ignore this warning?
Should we make the effort to rectify this problem or would it just be wasted effort?
Assuming your variables are POD types like ints, floats etc, they are unlikely to have an effect on performance. But they have a huge effect on code quality. I suggest as you update your code to add new features, you remove the unused variables as you go. You MUST be using version control software in order to do this safely.
This is a not uncommon problem. As a consultant, I once reviewed a large FORTRAN codebase that contained hundreds of unused variables. When I asked the team who wrote it why they were there, their answer was "Well, we might need them in the future..."
If you compile with optimizations on, the compiler will most likely simply remove the variables, as if they weren't there. If you don't use optimizations, your program will occupy additional storage space for the variables without using it.
It's good practice not to declare variables and then leave them unused, because they might take up space and, more importantly, they clutter up your code, making it less readable.
If you have, say, 1000 unused ints, and an integer on your platform is 32 bits long, then you will, in total, use up 4K of extra stack space, with optimizations turned off.
If the unused variables are not arguments, then there should be nothing stopping you from removing them, as there's nothing you could break. You will gain readability and you will be able to see the other, more serious warnings that the compiler might produce.
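As a concrete illustration (the file and variable names are made up), g++ can promote just this one warning to an error, so the clean-up is enforced without turning on -Werror globally:

// unused.cpp -- a minimal, made-up illustration.
// Compile with:  g++ -Wall -Werror=unused-variable -c unused.cpp
// -Werror=unused-variable promotes only this diagnostic to an error,
// so the rule can be enforced without making every warning fatal.
int compute(int x) {
    int unused_temp = x * 2;   // flagged: declared and initialized, but never used
    return x + 1;
}

Deleting unused_temp changes neither behaviour nor, in an optimized build, the generated code; the gain is readability and a quieter warning log, as the answers above point out.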
In unoptimized builds, unused variables may still be allocated in memory; removing them will free up that memory.

"Work stealing" vs. "Work shrugging"?

Why is it that I can find lots of information on "work stealing" and nothing on "work shrugging" as a dynamic load-balancing strategy?
By "work-shrugging" I mean pushing surplus work away from busy processors onto less loaded neighbours, rather than have idle processors pulling work from busy neighbours ("work-stealing").
I think the general scalability should be the same for both strategies. However I believe that it is much more efficient, in terms of latency & power consumption, to wake an idle processor when there is definitely work for it to do, rather than having all idle processors periodically polling all neighbours for possible work.
Anyway, a quick Google search didn't turn up anything under the heading of "Work Shrugging" or similar, so any pointers to prior art and the jargon for this strategy would be welcome.
Clarification
I actually envisage the work submitting processor (which may or may not be the target processor) being responsible for looking around the immediate locality of the preferred target processor (based on data/code locality) to decide if a near neighbour should be given the new work instead because they don't have as much work to do.
I don't think the decision logic would require much more than an atomic read of the immediate (typically 2 to 4) neighbours' estimated queue length. I do not think this implies any more coupling than the thieves polling & stealing from their neighbours. (I am assuming "lock-free, wait-free" queues in both strategies.)
Resolution
It seems that what I meant (but only partially described!) by the "Work Shrugging" strategy is in the domain of "normal" upfront scheduling strategies that happen to be smart about processor, cache & memory locality, and scalable.
I find plenty of references searching on these terms and several of them look pretty solid. I will post a reference when I identify one that best matches (or demolishes!) the logic I had in mind with my definition of "Work Shrugging".
Load balancing is not free; it carries the cost of a context switch (to the kernel), of finding the idle processors, and of choosing work to reassign. Especially on a machine where tasks switch all the time, dozens of times per second, this cost adds up.
So what's the difference? Work-shrugging means you further burden already-loaded resources (busy processors) with the overhead of load balancing. Why interrupt a busy processor with administrivia when there's a processor next door with nothing to do? Work stealing, on the other hand, lets the idle processors run the load balancer while busy processors get on with their work. Work-stealing saves time.
Example
Consider: Processor A has two tasks assigned to it. They take time a1 and a2, respectively. Processor B, nearby (the distance of a cache bounce, perhaps), is idle. The processors are identical in all respects. We assume the code for each task and the kernel is in the i-cache of both processors (no added page fault on load balancing).
A context switch of any kind (including load-balancing) takes time c.
No Load Balancing
The time to complete the tasks will be a1 + a2 + c. Processor A will do all the work, and incur one context switch between the two tasks.
Work-Stealing
Assume B steals a2, incurring the context switch time itself. The work will be done in max(a1, a2 + c) time. Suppose processor A begins working on a1; while it does that, processor B will steal a2 and avoid any interruption in the processing of a1. All the overhead on B is free cycles.
If a2 + c ≤ a1 (roughly, if a2 was the shorter task), you have effectively hidden the cost of the context switch in this scenario; the total time is just a1.
Work-Shrugging
Assume B completes a2, as above, but A incurs the cost of moving it ("shrugging" the work). The work in this case will be done in max(a1, a2) + c time; the context switch is now always in addition to the total time, instead of being hidden. Processor B's idle cycles have been wasted, here; instead, a busy processor A has burned time shrugging work to B.
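The three formulas are easy to put side by side. Here is a tiny sketch with made-up numbers (a1 = 5, a2 = 3, c = 1, in arbitrary time units), purely to illustrate the example above rather than measure anything:

#include <algorithm>
#include <cstdio>

int main() {
    // Made-up task times and context-switch cost, matching the example above.
    double a1 = 5.0, a2 = 3.0, c = 1.0;

    double none      = a1 + a2 + c;            // no load balancing: A runs both tasks
    double stealing  = std::max(a1, a2 + c);   // B pays the switch cost out of its idle cycles
    double shrugging = std::max(a1, a2) + c;   // A pays the switch cost before doing real work

    std::printf("none: %.1f   stealing: %.1f   shrugging: %.1f\n", none, stealing, shrugging);
    return 0;
}

With these numbers the stealing case finishes in 5 units (the switch cost is completely hidden behind a1), while shrugging always pays the extra unit on top.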
I think the problem with this idea is that it makes the threads with actual work to do waste their time constantly looking for idle processors. Of course there are ways to make that faster, like have a queue of idle processors, but then that queue becomes a concurrency bottleneck. So it's just better to have the threads with nothing better to do sit around and look for jobs.
The basic advantage of 'work stealing' algorithms is that the overhead of moving work around drops to 0 when everyone is busy. So there's only overhead when some processor would otherwise have been idle, and that overhead cost is mostly paid by the idle processor with only a very small bus-synchronization related cost to the busy processor.
Work stealing, as I understand it, is designed for highly-parallel systems, to avoid having a single location (single thread, or single memory region) responsible for sharing out the work. In order to avoid this bottleneck, I think it does introduce inefficiencies in simple cases.
If your application is not so parallel that a single point of work distribution causes scalability problems, then I would expect you could get better performance by managing it explicitly as you suggest.
No idea what you might google for though, I'm afraid.
Some issues... if a thread is busy, wouldn't you want it spending its time processing real work instead of speculatively looking for idle threads to offload onto?
How does your thread decide when it has so much work that it should stop doing that work to look for a friend that will help?
How do you know that the other threads don't have just as much work and you won't be able to find a suitable thread to offload onto?
Work stealing seems more elegant, because it solves the same problem (contention) in a way that guarantees that the threads doing the load balancing do so only while they would otherwise have been idle.
It's my gut feeling that what you've described will not only be much less efficient in the long run, but will require lots of tweaking per system to get acceptable results.
Though in your edit you suggest that you want the submitting processor to handle this, not the worker threads as you suggested earlier and in some of the comments here. If the submitting processor is searching for the lowest queue length, you're potentially adding latency to the submit, which isn't really desirable.
But more importantly, it's a supplementary technique to work-stealing, not a mutually exclusive one. You've potentially alleviated some of the contention that work-stealing was invented to control, but you still have a number of things to tweak before you'll get good results; those tweaks won't be the same for every system, and you still risk running into situations where work-stealing would help you.
I think your edited suggestion, with the submission thread doing "smart" work distribution, is potentially a premature optimization against work-stealing. Are your idle threads slamming the bus so hard that your non-idle threads can't get any work done? Then it's time to optimize work-stealing.
So, in contrast to "Work Stealing", what is really meant here by "Work Shrugging" is a normal upfront work scheduling strategy that is smart about processor, cache & memory locality, and scalable.
Searching on combinations of the terms / jargon above yields many substantial references to follow up. Some address the added complication of machine virtualisation, which wasn't in fact a concern of the questioner, but the general strategies are still relevant.