Using functions in VHDL for synthesis

Using functions in VHDL for synthesis - function

I do use functions in VHDL now and then, mostly in testbenches and seldom in synthesized projects, and I'm quite happy with that.
However, I was wondering if for projects that will be synthesized, it really is a smart move (in terms of LE use mostly?) I've read quite a lot of things about that online, however I can't find anything satisfying.
For instance, I've read something like that : "The function is synthesized each time it's called !!". Is it really so? (I thought of it more like a component instantiated once but whose inputs and output and accessed from various places in the design but I guess that may be incorrect).
In the case of a once-used function, what would change between that and writing the VHDL directly in the process for example? (In terms of LE use?).

A circuit in hardware, for example a FPGA, executes everywhere all the time, where in compare a program for an CPU executes only one place at a time. This allows a program on a CPU to reuse program code for different data, where a circuit in hardware must have sufficient resources to process all the data all the time.
So a circuit written in VHDL is generally translated by the synthesis tool as massive parallel construction that allows concurrent operation of all of the design all the time. The VHDL language is created with the purpose of concurrent execution, and this is a major different from ordinary programming languages.
As a consequence, a design that implements an algorithm with functions vs. a design that implements the same algorithm with separate logic, will have the exact same size and speed since the synthesis tool will expand the functions to dedicated logic in order to make the required hardware available.
That being said, it is possible to reuse the same hardware for different data, but the designer must generally explicitly create the design to support this, and thereby interleave different data sets when timing allows it.
And finally, as scary_jeff also points out, it is a smart move to use functions since there is nothing to loose in terms of size or speed, but all the advantages of creating a manageable design. But be aware, that functions can't contain state, so it is only possible to create functions for combinatorial logic between flip-flops, which usually limits the possible complexity in order to meet timing.

Yes, you should use functions and procedures.
Many people and companies use functions and procedures in synthesizable code. Some coding styles disallow functions for no good reason. If you feel uncertain about a certain construct in VHDL (in this case: functions), just type up a small example and inspect the synthesis result.
Functions are really powerful and they can help you create better hardware with less effort. As with all powerful things, you can create really bad code (and bad synthesis results) with functions too.

Related

Normal Cuda Vs CuBLAS?

Just of curiosity. CuBLAS is a library for basic matrix computations. But these computations, in general, can also be written in normal Cuda code easily, without using CuBLAS. So what is the major difference between the CuBLAS library and your own Cuda program for the matrix computations?

We highly recommend developers use cuBLAS (or cuFFT, cuRAND, cuSPARSE, thrust, NPP) when suitable for many reasons:
We validate correctness across every supported hardware platform, including those which we know are coming up but which maybe haven't been released yet. For complex routines, it is entirely possible to have bugs which show up on one architecture (or even one chip) but not on others. This can even happen with changes to the compiler, the runtime, etc.
We test our libraries for performance regressions across the same wide range of platforms.
We can fix bugs in our code if you find them. Hard for us to do this with your code :)
We are always looking for which reusable and useful bits of functionality can be pulled into a library - this saves you a ton of development time, and makes your code easier to read by coding to a higher level API.
Honestly, at this point, I can probably count on one hand the number of developers out there who actually implement their own dense linear algebra routines rather than calling cuBLAS. It's a good exercise when you're learning CUDA, but for production code it's usually best to use a library.
(Disclosure: I run the CUDA Library team)

There's several reasons you'd chose to use a library instead of writing your own implementation. Three, off the top of my head:
You don't have to write it. Why do work when somebody else has done it for you?
It will be optimised. NVIDIA supported libraries such as cuBLAS are likely to be optimised for all current GPU generations, and later releases will be optimised for later generations. While most BLAS operations may seem fairly simple to implement, to get peak performance you have to optimise for hardware (this is not unique to GPUs). A simple implementation of SGEMM, for example, may be many times slower than an optimised version.
They tend to work. There's probably less chance you'll run up against a bug in a library then you'll create a bug in your own implementation which bites you when you change some parameter or other in the future.
The above isn't just relevent to cuBLAS: if you have a method that's in a well supported library you'll probably save a lot of time and gain a lot of performance using it relative to using your own implementation.

Purpose of abstraction

What is the purpose of abstraction in coding:
Programmer's efficiency or program's efficiency?
Our professor said that it is used merely for helping the programmer comprehend & modify programs faster to suit different scenarios. He also contended that it adds an extra burden on the program's performance. I am not exactly clear by what this means.
Could someone kindly elaborate?

I would say he's about half right.
The biggest purpose is indeed to help the programmer. The computer couldn't care less how abstracted your program is. However, there is a related, but different, benefit - code reuse. This isn't just for readability though, abstraction is what lets us plug various components into our programs that were written by others. If everything were just mixed together in one code file, and with absolutely no abstraction, you would never be able to write anything even moderately complex, because you'd be starting with the bare metal every single time. Just writing text on the screen could be a week long project.
About performance, that's a questionable claim. I'm sure it depends on the type and depth of the abstraction, but in most cases I don't think the system will notice a hit. Especially modern compiled languages, which actually "un-abstract" the code for you (things like loop unrolling and function inlining) sometimes to make it easier on the system.

Your professor is correct; abstraction in coding exists to make it easier to do the coding, and it increases the workload of the computer in running the program. The trick, though, is to make the (hopefully very tiny) increase in computer workload be dwarfed by the increase in programmer efficiency.
For example, on an extremely low-level; object-oriented code is an abstraction that helps the programmer, but adds some overhead to the program in the end in extra 'stuff' in memory, and extra function calls.

Since Abstraction is really the process of pulling out common pieces of functionality into re-usable components (be it abstract classes, parent classes, interfaces, etc.) I would say that it is most definitely a Programmer's efficiency.
Saying that Abstraction comes at the cost of performance is treading on unstable ground at best though. With most modern languages, abstraction (thus enhanced flexibility) can be had a little to no cost to the performance of the application.

What abstraction is is effectively outlined in the link Tesserex posted. To your professor's point about adding an additional burden on the program, this is actually fairly true. However, the burden in modern systems is negligible. Think of it in terms of what actually happens when you call a method: each additional method you call requires adding a number of additional data structures to the stack and then handling the return values also placed on the stack. So for instance, calling
c = add(a, b);
which looks something like
public int add(int a, int b){
return a + b;
}
requires pushing two integers onto the stack for the parameters and then pushing an additional one onto the stack for the return value. However, no memory interaction is required if both values are already in registers -- it's a simple, one-instruction call. Given that memory operations are much slower than register operations, you can see where the notion of a performance hit comes from.
Ultimately, every method call you make is going to increase the overhead of your program a little bit. However as #Tesserex points out, it's minute in most modern computer systems and as #Andrew Barber points out, that compromise is usually totally dwarfed by the increase in programmer efficiency.

Abstraction is a tool to make it easier for the programmer. The abstraction may or may not have an effect on the runtime performance of the system.
For an example of an abstraction that doesn't alter performance consider assembly. The pneumonics like mov and add are an abstraction that makes opcodes easier to remember, as compared to remembering byte-codes and other instruction encoding details. However, given the 1 to 1 mapping I'd suggest its clear that this abstraction has 0 effect on final performance.

There's not a clear-cut situation that abstraction makes life easier for the programmer at the expense of more work for the computer.
Although a higher level of abstraction typically adds at least a small amount of overhead to executing a discrete unit of code, it's also what allows the programmer to think about a problem in larger "units" so he can do a better job of understanding an entire problem, and avoid executing mane (or at least some) of those discrete units of code.
Therefore, a higher level of abstraction will often lead to faster-executing programs as long as you avoid adding too much overhead. The problem, of course, is that there's no easy or simple definition of how much overhead is too much. That stems largely from the fact that the amount of overhead that's acceptable depends heavily on the problem being solved, and the degree to which working at a higher level of abstraction allows the programmer to recognize operations that are truly unnecessary, and eliminate them.

Why is memoization not a language feature?

I was wondering: why is memoization not provided natively as a language feature in any language I know about?
Edit: to clarify, what I mean is that the language provides a keyword to specify a given function as memoizable, not that every function is automatically memoized "by default" unless specified otherwise. For example, Fortran provides the keyword PURE to specify a specific function as such. I guess that the compiler can take advantage of this information to memoize the call, but I ignore what happens if you declare PURE a function with side effects.

What YOU want from memoization may not be the same as what the compiler memoization option would provide.
You may know that it is only profitable to memoize the last 10 or so distinct values computed, because you know how the function will be used.
You may know that it only makes sense to memoize the last 2 or 3 values, because you will never use values older than that. (Fibonacci's Sequence comes to mind.)
You may be generating a LOT of values on some runs, and just a few on others.
You may want to "throw away" some of the memoized values and start over. (I memoized a random number generator this way, so I could replay the sequence of random numbers that built a certain structure, while some other parameters of the structure had been changed.)
Memoization as an optimization depends on the search for the memoized value being a lot cheaper than recomputation of the value. This in turn depends on the ordering of the input requests. This has implications for the memoization database: Does it use a stack, an array of all possible input values (which may be very large), a bucket hash, or a b-tree?
The memoizing compiler has to either provide a "one size fits all" memoization, or it has to provide lots of possible alternatives, and parameters to control the alternatives. At some point, it becomes easier for everyone to require the user to provide his own memoization.

Because compilers have to emit semantically correct programs. You can't memoize a function without changing program semantics unless it is referentially transparent. In most programming languages not all functions are referentially transparent (pure functional programming languages are an exception) so you can't memoize everything. But then a mechanism is needed for detecting referential transparency and that is too hard.

In Haskell, memoization is automatic for (pure) functions you've defined that take no arguments. And the Fibonacci example in that Wiki is really about the simplest demonstrable example I would be able to think of either.
Haskell can do this because your pure functions are defined to produce the same results every time; of course, monadic functions that depend on side effects won't be memoized.
I'm not sure what the upper limits are -- obviously, it won't memoize more than the available memory. And I'm also not sure offhand if the memoization occurs at compile-time (if the values can be determined at compile-time), or if it always occurs the first time the function is called.

Clojure has a memoize function (http://richhickey.github.com/clojure/clojure.core-api.html#clojure.core/memoize):
memoize
function
Usage: (memoize f)
Returns a memoized version of a referentially transparent function. The
memoized version of the function keeps a cache of the mapping from arguments
to results and, when calls with the same arguments are repeated often, has
higher performance at the expense of higher memory use.

A) Memoization trades space for time. I imagine that this can turn out to a fairly unbound property, in the sense, that the amount of data programs or libraries would have to store could consume large parts of memory really quick.
For a couple of languages, memoization is easy to implement and easy to customize for the given requirements.
As an example take some natural language processing on large bodies of text, where you don't want to compute basic properties of texts (word count, frequency, cooccurrences, ...) over and over again. In that case a memoization in combination with object serialization can be useful as opposed to memory caching, since you may run your application multiple times on unchanged corpora.
B) Another aspect: It's not true, that all functions or methods yield the same output for a same given input. Anyway some keyword or syntax for memoization would be necessary, along with configuration (memory limits, invalidation policy, ...) ...

Because you shouldn't implement something as a language feature when it can easily be implemented in the language itself. A memoization feature belongs in a library, which is exactly where most languages put it.

Your question also leaves open the solution of your learning more languages. I think that Lisp supports memoization, and I know that Mathematica does.

In order for memoization to work as a language feature there would be a couple requirements.
The compiler would need to be identify valid functions for memoization (e.g. they are referentially transparent).
The run-time would have to be able to intelligently select candidates for memoization without slowing down the overall performance.
There are some assumptions in the other language, but if we can have performance gains by just-in-time compilation of hot-spots in a Java VM, then one can surely write an automated memoziation system.
While non-trivial I think this is all theoretically possible to get performance gains in a language (especially an interpreted one) and is a worthwhile area for research.

Not all the languages natively support function decorators. I guess it would be a more general approach to support rather than supporting just memoization.

Reverse the question. Why it should? As someone has said, it can be put in a library so no need of add syntax to the language, it's only usable on pure functions which are hard to identify automatically(unless you force the programmer to annotate them). It's also very hard to determine if memoization is going to speed up things or not. I don't think it's a desirable feature for a programming language.

I really think such an option should be.
In data processing tasks there is an immutable input data (as time series, for example, where for a given time as soon as a value is known, it can never change). Taking in mind today RAM affordability, if a function result only depends on such immutable data, it is rational to memoize it rather than reread every time it's needed. Currently I have (in Scala and C#) to manually introduce an in-memory storage table and write 3 functions instead of one - one reading a value from file/db/ws, one storing it into an in-memory table, one to wrap them and read from memory if available or call the raw function if not. I think this could and should be implemented as a keyword and done behind the scenes.

Performances evaluation with Message Passing

I have to build a distributed application, using MPI.
One of the decision that I have to take is how to map instances of classes into process (and then into machines), in order to take maximum advantages from a distributed environment.
My question is: there is a model that let me choose the better mapping? I mean, some arrangements are surely wrong (for ex., putting in two different machines two objects that should process together a fairly large amount of data, in a sequential manner, without a stream of tokens to process), but there's a systematically way to determine such wrong arrangements, determined by flow of execution, message complexity, time taken by the computation done by the algorithmic components?

Well, there are data flow diagrams. Those can help identify parallelism's opportunities and pitfalls. The references on the wikipedia page might give you some more theoretical grounding.
When I worked at Lockheed Martin, I was exposed to CSIM, a tool they developed for modeling algorithm mapping to processing blocks.

Another thing you might try is the Join Calculus. I've found examples of programming with it to be surprisingly intuitive, and I think it's well grounded in theory. I'm not sure why it hasn't caught on more.
The other approach is the Pi Calculus, and I think that might be more popular, though it seems harder to understand.

A practical solution to this would be using a different model of distributed-memory parallel programming, that directly addresses your concerns. I work on the Charm++ programming system, whose model is that of individual objects sending messages from one to another. The runtime system facilitates automatic mapping of these objects to available processors, to account for issues of load balance and communication locality.

How well do common programming tasks translate to GPUs?

I have recently begun working on a project to establish how best to leverage the processing power available in modern graphics cards for general programming. It seems that the field general purpose GPU programming (GPGPU) has a large bias towards scientific applications with a lot of heavy math as this fits well with the GPU computational model. This is all good and well, but most people don't spend all their time running simulation software and the like so we figured it might be possible to create a common foundation for easily building GPU-enabled software for the masses.
This leads to the question I would like to pose; What are the most common types of work performed by programs? It is not a requirement that the work translates extremely well to GPU programming as we are willing to accept modest performance improvements (Better little than nothing, right?).
There are a couple of subjects we have in mind already:
Data management - Manipulation of large amounts of data from databases
and otherwise.
Spreadsheet type programs (Is somewhat related to the above).
GUI programming (Though it might be impossible to get access to the
relevant code).
Common algorithms like sorting and searching.
Common collections (And integrating them with data manipulation
algorithms)
Which other coding tasks are very common? I suspect a lot of the code being written is of the category of inventory management and otherwise tracking of real 'objects'.
As I have no industry experience I figured there might be a number of basic types of code which is done more often than I realize but which just doesn't materialize as external products.
Both high level programming tasks as well as specific low level operations will be appreciated.

General programming translates terribly to GPUs. GPUs are dedicated to performing fairly simple tasks on streams of data at a massive rate, with massive parallelism. They do not deal well with the rich data and control structures of general programming, and there's no point trying to shoehorn that into them.

General programming translates terribly to GPUs. GPUs are dedicated to performing fairly simple tasks on streams of data at a massive rate, with massive parallelism. They do not deal well with the rich data and control structures of general programming, and there's no point trying to shoehorn that into them.
This isn't too far away from my impression of the situation but at this point we are not concerning ourselves too much with that. We are starting out by getting a broad picture of which options we have to focus on. After that is done we will analyse them a bit deeper and find out which, if any, are plausible options. If we end up determining that it is impossible to do anything within the field, and we are only increasing everybody's electricity bill then that is a valid result as well.

Things that modern computers do a lot of, where a little benefit could go a long way? Let's see...
Data management: relational database management could benefit from faster relational joins (especially joins involving a large number of relations). Involves massive homogeneous data sets.
Tokenising, lexing, parsing text.
Compilation, code generation.
Optimisation (of queries, graphs, etc).
Encryption, decryption, key generation.
Page layout, typesetting.
Full text indexing.
Garbage collection.

I do a lot of simplifying of configuration. That is I wrap the generation/management of configuration values inside a UI. The primary benefit is I can control work flow and presentation to make it simpler for non-techie users to configure apps/sites/services.

The other thing to consider when using a GPU is the bus speed, Most Graphics cards are designed to have a higher bandwidth when transferring data from the CPU out to the GPU as that's what they do most of the time. The bandwidth from the GPU back up to the CPU, which is needed to return results etc, isn't as fast. So they work best in a pipelined mode.

You might want to take a look at the March/April issue of ACM's Queue magazine, which has several articles on GPUs and how best to use them (besides doing graphics, of course).

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008