Comparing speeds of different tasks in different languages - language-agnostic

If I wanted to measure how long certain tasks take, would it matter what language I wrote the test in? The task could be any job a programmer might want to perform: simple jobs such as sorting, or more complicated jobs involving the signing and verification of files.
We all know that some languages run faster than others, so the measurements will depend on the languages and on how well their compilers / runtimes are optimised. But these will obviously all be different.
So is it best to rely on a language with less abstraction, such as C, or is it OK to test jobs and tasks in higher-level languages and rely on them being implemented well enough that any possible inefficiencies can be ignored? I hope my question is clear.

It doesn't matter which real-world language you use for your tests, as long as you design the tests correctly. There is always overhead from the language, but there is also overhead from the operating system, the thread scheduler, I/O, RAM, even the speed of electricity as it varies with the current temperature, and so on.
But to compare anything, you don't want to measure how many nanoseconds it took to do one assignment statement. Instead, you want to measure how many hours it took to do billions of assignment statements. Then all the overheads mentioned become negligible.
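A minimal sketch of that idea in Java, assuming the "task" under test is a trivial assignment; the iteration count, the use of System.nanoTime(), and the class name are illustrative choices, not a rigorous benchmarking harness (a real one would also deal with JIT warm-up):

public class RoughBenchmark {
    public static void main(String[] args) {
        final long iterations = 1_000_000_000L; // large enough that fixed overheads are noise
        long x = 0;
        long start = System.nanoTime();
        for (long i = 0; i < iterations; i++) {
            x = i; // the operation under test
        }
        long elapsed = System.nanoTime() - start;
        // Print x so the JIT cannot discard the loop as dead code.
        System.out.println("x=" + x + ", total=" + (elapsed / 1_000_000) + " ms");
        System.out.println("per op ~" + ((double) elapsed / iterations) + " ns");
    }
}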

Related

Stress test cases for web application

What other stress-test cases are there, besides finding the maximum number of users who can log in to the web application before performance degrades and the application eventually crashes?
This question is hard to answer thoroughly since it's too broad.
Anyway, many stress tests depend on the type and execution flow of your workload. There's an entire subject (taught as a graduate course) dedicated to queueing theory and resource optimization. Most of it can be summarized as follows:
If you have a resource (be it a GPU, CPU, memory bank, mechanical or solid-state disk, etc.), it can serve some number of users/requests per second and takes an X amount of time to complete one unit of work. Make sure you don't exceed its limits.
Some systems can also be studied with a probabilistic approach (Little's Law is one of the most fundamental rules in these cases).
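To make the probabilistic angle concrete, Little's Law states L = λ × W: the average number of requests in the system (L) equals the arrival rate (λ) times the average time each request spends in the system (W). For example, at 100 requests per second with 0.5 seconds spent per request, you should expect about 50 requests in flight at any moment; a quick sanity check for sizing thread pools and connection pools.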
There are a lot of reasons for load/performance testing, many of which may not be important to your project goals. For example:
- What is the performance of a system at a given load? (load test)
- How many users can the system handle while still meeting a specific set of performance goals? (load test)
- How does the performance of a system change over time under a certain load? (soak test)
- When will the system crash under increasing load? (stress test)
- How does the system respond to hardware or environment failures? (stress test)
I've got a post on some common motivations for performance testing that may be helpful.
You should also check out your web analytics data and see what people are actually doing.
It's not enough to simply simulate X users logging in. Find the scenarios that represent the most common user activities (anywhere between 2 and 20 scenarios).
Also, make sure you're not just hitting your cache on reads. Add some randomness / diversity to the requests.
I've seen stress tests where all the users requested the same data, which won't give you real-world results.
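A minimal sketch of that diversity idea in Java; the endpoint /product/{id}, the host, and the ID range are hypothetical, made up for illustration. The point is simply that each iteration asks for a different key instead of re-reading one cached response:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.ThreadLocalRandom;

public class DiverseLoad {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        for (int i = 0; i < 1000; i++) {
            // Spread requests over many keys so reads don't all hit the same cache entry.
            int id = ThreadLocalRandom.current().nextInt(1, 100_000);
            HttpRequest req = HttpRequest.newBuilder(
                    URI.create("http://localhost:8080/product/" + id)).build();
            client.send(req, HttpResponse.BodyHandlers.discarding());
            // A real load test would record per-request latency here.
        }
    }
}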

When referring to 'number crunching', how intensive is 'intensive'?

I am currently reading about / learning Erlang, and it is often noted that it is not (really) suitable for 'heavy number crunching'. I often come across this phrase or similar ones, but never really know what exactly 'heavy' means.
How does one decide if an operation is computationally intensive? Can it be quantified before testing?
Edit:
Is there a difference between the quantity of calculations, the complexity of the algorithm, and the size of the input values?
For example, 1000 computations of 28303 / 4 vs. 100 computations of 239847982628763482 / 238742.
Talking about Erlang specifically, I doubt that you would generally want to develop applications that require intensive number crunching with it. That is, you don't learn Erlang to code a physics engine in it. So don't worry about Erlang being too slow for you.
Moving from Erlang to the question in general, these things almost always come down to relativity. Let's ignore number crunching and ask a general question about programming: How fast is fast enough?
Well, fast enough depends on:
- what you want to do with the application
- how often you want to do it
- how fast your users expect it to happen
If reading a file in some program takes either 1 ms or 1000 ms, is 1000 ms to be considered "too slow"?
If ten files have to be read in quick succession, then yes, probably way too slow. Imagine an XML parser that takes 1 second simply to read an XML file from disk: horrible!
If, on the other hand, a file only has to be read when a user manually clicks a button every 15 minutes or so, then it's not a problem, e.g. in Microsoft Word.
The reason nobody says exactly what too slow is, is because it doesn't really matter. The same goes for your specific question. A language should rarely, if ever, be shunned for being "slow".
And last but not least, if you develop some monstrous project in Erlang and, down the road, realise that dagnabbit! you really need to crunch those numbers - then you do your research, find good libraries and implement algorithms in the language best suited for it, and then interop with that small library.
With this sort of thing you'll know it when you see it! Usually it refers to situations where it matters whether you pick an int, a float, or a double: things like physical simulations or Monte Carlo methods, where you want to do millions of calculations.
To be honest, in reality you just write those bits in C and use your favourite other language to run them.
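To make "millions of calculations" concrete, here is a minimal Monte Carlo sketch (in Java purely for illustration; the sample count is arbitrary) that estimates pi by sampling random points:

import java.util.concurrent.ThreadLocalRandom;

public class MonteCarloPi {
    public static void main(String[] args) {
        final int samples = 10_000_000;
        long inside = 0;
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        for (int i = 0; i < samples; i++) {
            double x = rnd.nextDouble();
            double y = rnd.nextDouble();
            if (x * x + y * y <= 1.0) inside++; // point falls inside the quarter circle
        }
        // The fraction of hits approximates pi/4.
        System.out.println("pi ~ " + 4.0 * inside / samples);
    }
}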
I once asked a question about number crunching in CouchDB map/reduce: CouchDB Views: How much processing is acceptable in map reduce?
What's interesting in one of the answers is this:
Suppose you had 10,000 documents and they take 1 second each to process (which is way higher than I have ever seen). That is 10,000 seconds, or 2.8 hours, to completely build the view. However, once the view is complete, querying any row (?key=...) or row slice (?startkey=...&endkey=...) takes the same time as querying for documents directly. Lookup time is O(log n) for the document count. In other words, even if it takes 1 second per document to execute the map, it will take a few milliseconds to fetch the result. (Of course, the view must build first, since it is actually an index.)
Thinking about your current question in those terms is an interesting angle. On the topic of the language's speed / optimization:
How does one decide if an operation is computationally intensive?
Facebook asked this question about PHP, and ended up writing HipHop to solve the problem: it compiles PHP into C++. They said the reason PHP is much slower than C++ is that the PHP language is all dynamic lookup, and therefore much processing is required to do anything with variables, arrays, dynamic typing (which is a source of slowdown), etc.
So, a question you can ask is: is erlang dynamic-lookup? static typing? compiled?
Is there a difference between the quantity of calculations, the complexity of the algorithm, and the size of the input values? For example, 1000 computations of 28303 / 4 vs. 100 computations of 239847982628763482 / 238742.
So, with that said: the fact that you can even assign specific types to numbers of different sizes means you SHOULD be using the right types, and that will definitely improve performance.
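As an illustrative sketch of why operand size matters (Java is just the example language here; note that 239847982628763482 actually still fits in a signed 64-bit long, so BigInteger stands in for what happens once values outgrow machine words):

import java.math.BigInteger;

public class DivisionCost {
    public static void main(String[] args) {
        int small = 28303 / 4; // one hardware divide instruction
        BigInteger big = new BigInteger("239847982628763482")
                .divide(new BigInteger("238742")); // software loop over multi-word digits
        System.out.println(small + " " + big);
    }
}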
Suitability for number crunching depends on the library support and the inherent nature of the language. For example, a pure functional language will not allow any mutable variables, which makes implementing equation-solving type problems "extremely interesting". Erlang probably falls into this category.

Purpose of abstraction

What is the purpose of abstraction in coding:
Programmer's efficiency or program's efficiency?
Our professor said that it is used merely to help the programmer comprehend and modify programs faster to suit different scenarios. He also contended that it adds an extra burden on the program's performance. I am not exactly clear on what this means.
Could someone kindly elaborate?
I would say he's about half right.
The biggest purpose is indeed to help the programmer. The computer couldn't care less how abstracted your program is. However, there is a related but different benefit: code reuse. This isn't just about readability; abstraction is what lets us plug components written by others into our programs. If everything were just mixed together in one code file, with absolutely no abstraction, you would never be able to write anything even moderately complex, because you'd be starting from the bare metal every single time. Just writing text on the screen could be a week-long project.
As for performance, that's a questionable claim. I'm sure it depends on the type and depth of the abstraction, but in most cases I don't think the system will notice a hit. Modern compiled languages, especially, actually "un-abstract" the code for you (things like loop unrolling and function inlining), sometimes to make it easier on the system.
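An illustrative sketch of what that "un-abstracting" looks like in Java (the method and class names are made up for the example): the abstraction is the helper method, and the JIT can replace each call with its body, so the tidier code costs nothing at runtime.

public class InlineDemo {
    static int square(int n) {
        return n * n; // small, hot methods like this are prime inlining candidates
    }

    public static void main(String[] args) {
        long sum = 0;
        for (int i = 0; i < 10_000; i++) {
            sum += square(i); // after JIT inlining this behaves like sum += i * i;
        }
        System.out.println(sum);
    }
}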
Your professor is correct; abstraction in coding exists to make it easier to do the coding, and it increases the workload of the computer in running the program. The trick, though, is to make the (hopefully very tiny) increase in computer workload be dwarfed by the increase in programmer efficiency.
For example, at an extremely low level, object-oriented code is an abstraction that helps the programmer but adds some overhead to the final program: extra 'stuff' in memory and extra function calls.
Since abstraction is really the process of pulling common pieces of functionality out into reusable components (be they abstract classes, parent classes, interfaces, etc.), I would say that it most definitely serves the programmer's efficiency.
Saying that abstraction comes at the cost of performance is treading on unstable ground at best, though. With most modern languages, abstraction (and thus enhanced flexibility) can be had at little to no cost to the performance of the application.
What abstraction is is effectively outlined in the link Tesserex posted. As for your professor's point about adding an additional burden on the program, this is actually fairly true. However, the burden in modern systems is negligible. Think of it in terms of what actually happens when you call a method: each additional call may require pushing data onto the stack for the parameters and then handling the return value placed there as well. So, for instance, calling
c = add(a, b);
which looks something like
public int add(int a, int b) {
    return a + b;
}
requires pushing two integers onto the stack for the parameters and then an additional one for the return value. However, if both values are already in registers, no memory interaction is required; it's a simple, one-instruction call. Given that memory operations are much slower than register operations, you can see where the notion of a performance hit comes from.
Ultimately, every method call you make is going to increase the overhead of your program a little bit. However, as @Tesserex points out, it's minute in most modern computer systems, and as @Andrew Barber points out, that compromise is usually totally dwarfed by the increase in programmer efficiency.
Abstraction is a tool to make it easier for the programmer. The abstraction may or may not have an effect on the runtime performance of the system.
For an example of an abstraction that doesn't alter performance, consider assembly. Mnemonics like mov and add are an abstraction that makes opcodes easier to remember, as compared to remembering byte codes and other instruction-encoding details. However, given the 1-to-1 mapping, I'd suggest it's clear that this abstraction has zero effect on final performance.
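To make the 1-to-1 mapping concrete with an x86 example: the mnemonic form mov eax, 1 is just a readable spelling of the five bytes B8 01 00 00 00; the assembler's translation is purely mechanical, so nothing changes at runtime.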
It's not a clear-cut situation in which abstraction simply makes life easier for the programmer at the expense of more work for the computer.
Although a higher level of abstraction typically adds at least a small amount of overhead to executing a discrete unit of code, it's also what allows the programmer to think about a problem in larger "units", so he can do a better job of understanding an entire problem and avoid executing many (or at least some) of those discrete units of code.
Therefore, a higher level of abstraction will often lead to faster-executing programs as long as you avoid adding too much overhead. The problem, of course, is that there's no easy or simple definition of how much overhead is too much. That stems largely from the fact that the amount of overhead that's acceptable depends heavily on the problem being solved, and the degree to which working at a higher level of abstraction allows the programmer to recognize operations that are truly unnecessary, and eliminate them.

"Work stealing" vs. "Work shrugging"?

Why is it that I can find lots of information on "work stealing" and nothing on "work shrugging" as a dynamic load-balancing strategy?
By "work-shrugging" I mean pushing surplus work away from busy processors onto less loaded neighbours, rather than have idle processors pulling work from busy neighbours ("work-stealing").
I think the general scalability should be the same for both strategies. However I believe that it is much more efficient, in terms of latency & power consumption, to wake an idle processor when there is definitely work for it to do, rather than having all idle processors periodically polling all neighbours for possible work.
Anyway a quick google didn't show up anything under the heading of "Work Shrugging" or similar so any pointers to prior-art and the jargon for this strategy would be welcome.
Clarification
I actually envisage the work-submitting processor (which may or may not be the target processor) being responsible for looking around the immediate locality of the preferred target processor (based on data/code locality) to decide whether a near neighbour should be given the new work instead, because it has less work to do.
I don't think the decision logic would require much more than an atomic read of the estimated queue lengths of the immediate (typically 2 to 4) neighbours. I do not think this is any more coupling than is implied by the thieves polling and stealing from their neighbours. (I am assuming "lock-free, wait-free" queues in both strategies.)
Resolution
It seems that what I meant (but only partially described!) as a "work shrugging" strategy falls within the domain of "normal" upfront scheduling strategies that happen to be smart about processor, cache and memory locality, and scalable.
I find plenty of references searching on these terms, and several of them look pretty solid. I will post a reference when I identify the one that best matches (or demolishes!) the logic I had in mind with my definition of "work shrugging".
Load balancing is not free; it has a cost of a context switch (to the kernel), finding the idle processors, and choosing work to reassign. Especially in a machine where tasks switch all the time, dozens of times per second, this cost adds up.
So what's the difference? Work-shrugging means you further burden over-provisioned resources (busy processors) with the overhead of load-balancing. Why interrupt a busy processor with administrivia when there's a processor next door with nothing to do? Work stealing, on the other hand, lets the idle processors run the load balancer while busy processors get on with their work. Work-stealing saves time.
Example
Consider: Processor A has two tasks assigned to it. They take time a1 and a2, respectively. Processor B, nearby (the distance of a cache bounce, perhaps), is idle. The processors are identical in all respects. We assume the code for each task and the kernel is in the i-cache of both processors (no added page fault on load balancing).
A context switch of any kind (including load-balancing) takes time c.
No Load Balancing
The time to complete the tasks will be a1 + a2 + c. Processor A will do all the work, and incur one context switch between the two tasks.
Work-Stealing
Assume B steals a2, incurring the context switch time itself. The work will be done in max(a1, a2 + c) time. Suppose processor A begins working on a1; while it does that, processor B will steal a2 and avoid any interruption in the processing of a1. All the overhead on B is free cycles.
If a2 was the shorter task (more precisely, if a2 + c ≤ a1), you have effectively hidden the cost of the context switch in this scenario; the total time is just a1.
Work-Shrugging
Assume B completes a2, as above, but A incurs the cost of moving it ("shrugging" the work). The work in this case will be done in max(a1, a2) + c time; the context switch is now always in addition to the total time, instead of being hidden. Processor B's idle cycles have been wasted, here; instead, a busy processor A has burned time shrugging work to B.
I think the problem with this idea is that it makes the threads with actual work to do waste their time constantly looking for idle processors. Of course there are ways to make that faster, like have a queue of idle processors, but then that queue becomes a concurrency bottleneck. So it's just better to have the threads with nothing better to do sit around and look for jobs.
The basic advantage of 'work stealing' algorithms is that the overhead of moving work around drops to 0 when everyone is busy. So there's only overhead when some processor would otherwise have been idle, and that overhead cost is mostly paid by the idle processor with only a very small bus-synchronization related cost to the busy processor.
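For what it's worth, Java's ForkJoinPool is a production example of this pattern: idle workers steal subtasks from the deques of busy workers, so the balancing overhead lands on threads that had nothing better to do. A minimal sketch (the summing task is just a stand-in workload):

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class StealingSum extends RecursiveTask<Long> {
    private final long from, to;

    StealingSum(long from, long to) { this.from = from; this.to = to; }

    @Override
    protected Long compute() {
        if (to - from <= 1_000) { // small enough: just do the work
            long sum = 0;
            for (long i = from; i < to; i++) sum += i;
            return sum;
        }
        long mid = (from + to) / 2;
        StealingSum left = new StealingSum(from, mid);
        left.fork(); // pushed onto this worker's deque; an idle worker may steal it
        StealingSum right = new StealingSum(mid, to);
        return right.compute() + left.join();
    }

    public static void main(String[] args) {
        long sum = new ForkJoinPool().invoke(new StealingSum(0, 10_000_000));
        System.out.println(sum);
    }
}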
Work stealing, as I understand it, is designed for highly-parallel systems, to avoid having a single location (single thread, or single memory region) responsible for sharing out the work. In order to avoid this bottleneck, I think it does introduce inefficiencies in simple cases.
If your application is not so parallel that a single point of work distribution causes scalability problems, then I would expect you could get better performance by managing it explicitly as you suggest.
No idea what you might google for though, I'm afraid.
Some issues... if a busy thread is busy, wouldn't you want it spending its time processing real work instead of speculatively looking for idle threads to offload onto?
How does your thread decide when it has so much work that it should stop doing that work to look for a friend that will help?
How do you know that the other threads don't have just as much work and you won't be able to find a suitable thread to offload onto?
Work stealing seems more elegant because it solves the same problem (contention) in a way that guarantees the threads doing the load balancing are only doing it while they would otherwise have been idle.
It's my gut feeling that what you've described will not only be much less efficient in the long run, but will also require lots of per-system tweaking to get acceptable results.
Though in your edit you suggest that you want the submitting processor to handle this, not the worker threads as you suggested earlier and in some of the comments here. If the submitting processor is searching for the lowest queue length, you're potentially adding latency to the submit, which isn't really desirable.
But more importantly, it's a supplementary technique to work stealing, not a mutually exclusive one. You've potentially alleviated some of the contention that work stealing was invented to control, but you still have a number of things to tweak before you'll get good results; those tweaks won't be the same for every system, and you still risk running into situations where work stealing would help you.
I think your edited suggestion, with the submission thread doing "smart" work distribution, is potentially a premature optimization against work stealing. Are your idle threads slamming the bus so hard that your non-idle threads can't get any work done? Then it's time to optimize work stealing.
So, by contrast with "work stealing", what is really meant here by "work shrugging" is a normal upfront work-scheduling strategy that is smart about processor, cache and memory locality, and scalable.
Searching on combinations of the terms / jargon above yields many substantial references to follow up. Some address the added complication of machine virtualisation, which wasn't in fact a concern of the questioner, but the general strategies are still relevant.

How well do common programming tasks translate to GPUs?

I have recently begun working on a project to establish how best to leverage the processing power available in modern graphics cards for general programming. It seems that the field of general-purpose GPU programming (GPGPU) has a large bias towards scientific applications with a lot of heavy math, as this fits well with the GPU computational model. This is all well and good, but most people don't spend all their time running simulation software and the like, so we figured it might be possible to create a common foundation for easily building GPU-enabled software for the masses.
This leads to the question I would like to pose: what are the most common types of work performed by programs? It is not a requirement that the work translate extremely well to GPU programming, as we are willing to accept modest performance improvements (better a little than nothing, right?).
There are a couple of subjects we have in mind already:
- Data management: manipulation of large amounts of data, from databases and otherwise.
- Spreadsheet-type programs (somewhat related to the above).
- GUI programming (though it might be impossible to get access to the relevant code).
- Common algorithms, like sorting and searching.
- Common collections (and integrating them with data-manipulation algorithms).
Which other coding tasks are very common? I suspect a lot of the code being written falls into the category of inventory management and other tracking of real 'objects'.
As I have no industry experience, I figured there might be a number of basic types of code which are written more often than I realize but which just don't materialize as external products.
Both high level programming tasks as well as specific low level operations will be appreciated.
General programming translates terribly to GPUs. GPUs are dedicated to performing fairly simple tasks on streams of data at a massive rate, with massive parallelism. They do not deal well with the rich data and control structures of general programming, and there's no point trying to shoehorn that into them.
This isn't too far away from my impression of the situation, but at this point we are not concerning ourselves too much with that. We are starting out by getting a broad picture of which options we have to focus on. After that is done, we will analyse them a bit more deeply and find out which, if any, are plausible options. If we end up determining that it is impossible to do anything within the field, and that we would only be increasing everybody's electricity bill, then that is a valid result as well.
Things that modern computers do a lot of, where a little benefit could go a long way? Let's see...
- Data management: relational database management could benefit from faster relational joins (especially joins involving a large number of relations). Involves massive homogeneous data sets.
- Tokenising, lexing, parsing text.
- Compilation, code generation.
- Optimisation (of queries, graphs, etc.).
- Encryption, decryption, key generation.
- Page layout, typesetting.
- Full-text indexing.
- Garbage collection.
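As a toy illustration of the data-parallel shape these jobs share, consider sorting a large homogeneous array. The Java sketch below uses Arrays.parallelSort, which fans the work out across CPU cores; a GPU version would have the same structure with far more lanes, though actual offload would go through something like OpenCL or CUDA bindings, which is beyond this sketch.

import java.util.Arrays;
import java.util.Random;

public class ParallelSortDemo {
    public static void main(String[] args) {
        // A big homogeneous data set: ten million pseudo-random ints.
        int[] data = new Random(42).ints(10_000_000).toArray();
        Arrays.parallelSort(data); // divide-and-conquer sort across available cores
        System.out.println("min=" + data[0] + " max=" + data[data.length - 1]);
    }
}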
I do a lot of simplifying of configuration. That is, I wrap the generation/management of configuration values inside a UI. The primary benefit is that I can control the workflow and presentation to make it simpler for non-techie users to configure apps/sites/services.
The other thing to consider when using a GPU is bus speed. Most graphics cards are designed to have higher bandwidth when transferring data from the CPU out to the GPU, as that's what they do most of the time. The bandwidth from the GPU back to the CPU, which is needed to return results etc., isn't as fast. So they work best in a pipelined mode.
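To put rough, purely illustrative numbers on it: shipping 100 MB of results back over a return path that manages, say, 1 GB/s costs about 100 ms, which can easily dwarf the time the GPU spent computing them. That is why you want to keep intermediate data on the card between pipeline stages rather than round-tripping to the CPU at every step.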
You might want to take a look at the March/April issue of ACM's Queue magazine, which has several articles on GPUs and how best to use them (besides doing graphics, of course).