function hashing/fingerprinting as a built-in feature? [closed]

How can one get a stable hash of a function at runtime?
This means the hash changes if the implementation changes, and it should be recursive: if the function calls another function, the callee's hash affects the caller's hash.
I'm not sure which language has this feature; I'm looking for a practical programming language where this is trivial and performant.
I guess functional languages, perhaps Lisp or Haskell, are the usual suspects, but I'm not sure what this looks like.
function myFunction() {
... // Some code, possibly using names from other files/modules/libraries
}
// Prints the hash which changes if anything in the implementation of `myFunction` changes, stable across runs.
print(hash(myFunction))
Is there such a language? If so, an example of how this is written and why it works would be appreciated.
Non-examples would be JS, Python, Java...

If you want a general-purpose programming language, Unison does exactly this pervasively:
Each Unison definition is some syntax tree, and by hashing this tree in a way that incorporates the hashes of all that definition's dependencies, we obtain the Unison hash which uniquely identifies that definition.
— https://www.unisonweb.org/docs/tour
Every Unison definition is identified by a 512-bit SHA3 hash, and is immutable—you cannot modify a definition, only create a new one. Moreover, names are stored separately from definitions, so renaming is a trivial operation, and if two people write structurally the same code with only different variable & function names, their code will share the same hash and thus be identified as the same code.
As for configuration languages, Dhall does as well:
Use Dhall's support for semantic hashes to guarantee that many types of refactors are behavior-preserving
— https://dhall-lang.org/
Since Dhall is non–Turing complete, every expression has a normal form, and the hash of this normal form can be used to identify it, so when you refactor your Dhall code, you can have strong assurance that it produces identical results.
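Neither of those may count as mainstream, but the tree-hashing idea itself is easy to demonstrate. Below is a toy sketch in Python (one of the question's non-examples, so this is purely illustrative and nothing here is built in; the helper and my_function names are invented): it hashes a function's AST and recursively folds in the hashes of the module-level functions it calls, so a change in a callee changes the caller's hash. Unlike Unison, it is sensitive to variable names, not just structure.
import ast
import hashlib
import inspect
import textwrap

def definition_hash(func, _seen=None):
    """Toy only: hash a function's syntax tree, folding in the hashes
    of the module-level Python functions it calls."""
    _seen = set() if _seen is None else _seen
    tree = ast.parse(textwrap.dedent(inspect.getsource(func)))
    digest = hashlib.sha3_512(ast.dump(tree).encode())
    for node in ast.walk(tree):
        # Fold in direct callees that resolve to plain functions in this module.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            callee = func.__globals__.get(node.func.id)
            if inspect.isfunction(callee) and callee is not func and node.func.id not in _seen:
                _seen.add(node.func.id)
                digest.update(definition_hash(callee, _seen).encode())
    return digest.hexdigest()

def helper(x):
    return x * 2

def my_function(x):
    return helper(x) + 1

# Changes if the implementation of my_function OR helper changes.
print(definition_hash(my_function))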

I've implemented this in a commercial application written in Common Lisp, deployed using CCL (Clozure Common Lisp).
You can do it by obtaining a representation of the compiled image of the function using disassemble, and then hashing that with a suitable digest function, like the ones in Ironclad.
What I actually used in the CCL solution was the function ccl::%function-code-words to obtain how many words there are in the function, and the accessor ccl::%function-code-byte to get at the bytes (there are four times as many as the number of words).
Obviously, a hash based on the code bytes will not reflect differences in captured lexical environment between instances of functions created at the same point in the program at different times.
The implementation wasn't recursive. Rather, the overall solution iterated over a known list of functions.
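For comparison, here is a rough CPython analog of the same disassemble-and-digest idea (an illustration only; the code-object attributes used are CPython implementation details, much as the %function-code accessors above are CCL internals):
import hashlib

def code_hash(func):
    # Hash the compiled bytecode plus the names it references.
    # Like the CL version, this sees neither captured lexical state nor
    # changes in the functions it calls; a fuller version would also fold
    # in co_consts and recurse into callees.
    code = func.__code__
    return hashlib.sha256(code.co_code + repr(code.co_names).encode()).hexdigest()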


Why does Swift make a distinction between designated and convenience initializers? [closed]

Swift does make a distinction between designated and convenience initializers. The documentation, however, never states why this distinction is made.
From a programmer's point of view, it seems like an extra burden: I have to think about whether some initialization mechanism is "designated" or "convenience", and there are even some practical inconveniences, such as not being able to call a convenience constructor of the superclass, which might sometimes be totally appropriate. There have to be some advantages to this concept in return; otherwise Apple would not have introduced this feature. So what is the reason for introducing this distinction in Swift? A reference to an official statement would be nice.
The distinction between designated initializers and convenience initializers is not new — it's been part of the Cocoa environment for a long time. The reason for this convention is to ensure that all the internal state managed by a class is configured as intended before a subclass, or code external to the class, starts trying to use it.
Explicit initialization is one of the "safety" features of Swift. Using values before they're initialized can lead to undefined behavior (read: bugs) in C. If you declare a variable in Swift, it has to have a value. (Optionals let you fudge this a bit while still being explicit about it.) Making Cocoa's initializer chaining convention into a requirement is how Swift ensures initialization safety on top of class inheritance.
But don't take my word for it. </LeVar> There's a great explanation of this in the series of Swift talks from WWDC; check the videos.
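As a rough analogy only (this is Python, not Swift, nothing is enforced by a compiler, and the Temperature class is made up), the convention amounts to funnelling every convenience constructor through a single designated initializer, so validation and setup live in exactly one place:
class Temperature:
    def __init__(self, kelvin):
        # The "designated" initializer: the only place state is set and checked.
        if kelvin < 0:
            raise ValueError("temperature below absolute zero")
        self.kelvin = kelvin

    @classmethod
    def from_celsius(cls, c):
        # A "convenience" constructor: converts, then delegates.
        return cls(c + 273.15)

    @classmethod
    def from_fahrenheit(cls, f):
        return cls.from_celsius((f - 32) * 5 / 9)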
Totally agree on the practical inconvenience; not being able to call a convenience constructor of the superclass is a total PITA...
Though I don't have any "official" answer, the conclusion I've reached is that calling a designated constructor is the only future-proof way to do so.
Convenience constructors always have the possibility of being changed in future API releases, so making the initialization of your class depend on one is highly unsafe, while the designated initializer is always going to work, no matter what changes the API goes through along the way...

What are the benefits and drawbacks of using header files? [closed]

I have some experience with programming languages like Java, C#, and Scala, as well as with lower-level programming languages like C, C++, and Objective-C.
My observation is that lower-level languages tend to separate header files from implementation files, while higher-level languages never do; instead they use modifiers like public, private, and protected to do the job of header files. C++ has both the modifiers and header files.
One benefit of header files I have seen mentioned (in books like Code Complete) is that other people never need to look at our implementation files, which helps with encapsulation.
A drawback, for me, is that it creates too many files, and it sometimes looks verbose.
That is just my impression, and I don't know what other benefits and drawbacks people who work with header files see.
This question may not relate directly to programming, but I think understanding it will help me program to an interface and design software better.
They allow you to distribute the API of a library so the compiler can compile correct code.
As in C, rather than including the whole implementation, you just include the declarations of what is in the library you link against.
In this sense, the benefits are mainly for the compiler. Hence you install a binary library into, say, /lib and the headers into your include search path; you are saying: at run time, expect these symbols with this calling convention to be available.
When headers are not required by the compiler/linker/interpreter, the convention for that language is the best way to do it, because that's what other programmers expect to find. Conventional is expected.
Languages such as C# include the ability to inspect libraries for information from the binary blob, hence in many of these languages you don't require headers. Tools such as Cecil for C# also allow you to inspect a library yourself (and even modify it).
In short, some languages remove the benefits of headers and allow a library to be inspected for all the compile-time information required to ensure linking code meets the same interface/api specs.
I'm not sure exactly what question you are asking, so I will try to rephrase it:
What is the benefit of putting public information in a separate (header or interface) file, as opposed to simply marking information as public or private wherever it appears?
The main benefit of having a separate interface or header file is that it reduces the cognitive load on the reader. If you are a trying to understand a large system, you can tackle one implementation file at a time, and you need to read only the interfaces of the other implementations/classes/modules it depends on. This is a major benefit, and languages that do not require separate interface files (such as Java) or cannot even express interfaces in separate files (such as Haskell) often provide tools such as Doxygen or Haddock so that a separate interface, for people to read, is generated from the implementation.
I strongly prefer languages like Standard ML, Objective Caml, and Modula-2/3, where there is a separate interface file available for scrutiny. Having separate header files in C is also good, but not quite as good because in general, the header files cannot be checked independently by the compiler. (C++ header files are less good because they allow private information, such as private fields or the implementations of inline methods, to leak out into the header files, and so the public information becomes diluted.)
It's folklore in the language-design world that for typical statically typed languages, only about 10% of the information in a module is public (measured by lines of code). By putting this information in a separate header file, you reduce the reader's workload by roughly a factor of ten.
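Even languages without headers sometimes grow an analogous artifact. Python's optional .pyi stub files, for instance, describe only the public interface of a module, with no bodies at all (a sketch; the geometry module below is made up):
# geometry.pyi -- interface only: names and signatures, no implementation.
class Circle:
    radius: float
    def __init__(self, radius: float) -> None: ...
    def area(self) -> float: ...

def unit_circle() -> Circle: ...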
They use some identifiers like public, private, protected to do the jobs of header files.
I think you're wrong there: C++, for instance, still has public, private, and protected, but it's common to split the implementation from the interface with a header file (although that doesn't go for function templates).
I think it's in general a good idea to separate interface from implementation when you're creating libraries, since you then never expose the inner workings of anything to the client, and thus the client can never deliberately write code that depends on the implementation. Whether you want to split it is up to you. I must admit that for my own code (small programs I write for myself), I don't use it often.
In the context of C, header files bring a lot of readability to a program and make it easier to understand. They are a way to write code systematically, and they bring abstraction, standardization, and loose coupling between our main file (.c) and the other .c files we use.

Is Method Overloading considered polymorphism? [closed]

Is Method Overloading considered part of polymorphism?
There are different types of polymorphism:
overloading polymorphism (also called Ad-hoc polymorphism)
overriding polymorphism
So yes, it is part of polymorphism.
"Polymorphism" is just a word and doesn't have a globally agreed-upon, precise definition. You will not be enlightened by either a "yes" or a "no" answer to your question because the difference will be in the chosen definition of "polymorphism" and not in the essence of method overloading as a feature of any particular language. You can see the evidence of that in most of the other answers here, each introducing its own definition and then evaluating the language feature against it.
Strictly speaking, polymorphism, according to Wikipedia,
is the ability of one type, A, to appear as and be used like another type, B.
So method overloading as such is not considered part of this definition of polymorphism, as the overloads are defined as part of one type.
If you are talking about inclusion polymorphism (normally thought of as overriding), that is a different matter and then, yes it is considered to be part of polymorphism.
inclusion polymorphism is a concept in type theory wherein a name may denote instances of many different classes as long as they are related by some common super class.
There are two types of polymorphism:
static
dynamic
Overloading is static polymorphism; overriding comes under dynamic (or run-time) polymorphism.
See http://en.wikipedia.org/wiki/Polymorphism_(computer_science), which describes it in more detail.
No, overloading is not. Maybe you mean method overriding, which is indeed part of polymorphism.
To further clarify, from Wikipedia:
Polymorphism is not the same as method overloading or method overriding. Polymorphism is only concerned with the application of specific implementations to an interface or a more generic base class.
So I'd say method overriding and method overloading are convenient features of some languages related to polymorphism, but not the main concern of polymorphism (in object-oriented programming), which is only about the capability of an object to act as if it were another object in its hierarchy chain.
Method overriding or overloading is not polymorphism.
The right way to put it is that polymorphism can be implemented using method overriding or overloading, as well as in other ways.
In order to implement polymorphism using method overriding, you override the behaviour of a method in a subclass.
In order to implement polymorphism using method overloading, you write many methods with the same name but with different parameter types or counts, and implement different behaviours in these methods. That is also polymorphism.
Other ways to implement polymorphism are operator overloading and implementing interfaces.
Wikipedia pedantics aside, one way to think about polymorphism is: the ability for a single line of code / single method call to do different things at runtime depending on the type of the object instance used to make the call.
Method overloading does not change behaviors at runtime. Overloading gives you more choices for argument lists on the same method name when you're writing and compiling the code, but when it's compiled the choice is fixed in code forever.
Not to be confused with method overriding, which is part of polymorphism.
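To make the run-time half concrete, here is a small Python sketch (Python has no compile-time overloading at all, so only the overriding side is shown; the Shape classes are invented for illustration). The single call shape.area() does different things depending on the runtime type:
class Shape:
    def area(self):
        raise NotImplementedError

class Circle(Shape):
    def __init__(self, r):
        self.r = r
    def area(self):
        # Overriding: which body runs is decided at run time.
        return 3.14159 * self.r ** 2

class Square(Shape):
    def __init__(self, s):
        self.s = s
    def area(self):
        return self.s ** 2

def total_area(shapes):
    # One call site, different behaviour per runtime type.
    return sum(shape.area() for shape in shapes)

print(total_area([Circle(1.0), Square(2.0)]))  # 7.14159 (3.14159 + 4.0)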
It's a necessary evil that should only be used as a complement. In the end, overloads should only convert and eventually forward to the main method. Overloading is necessary because most VMs for statically dispatched environments don't know how to convert one type to another so that the parameter fits the target, and this is where one uses overloads to help out.
StringBuilder
Append(String) // main
Append(Boolean) // converts and calls append(String)

What does “Data is just dumb code, and code is just smart data” mean? [closed]

I just came across an idea in The Structure And Interpretation of Computer Programs:
Data is just dumb code, and code is just smart data
I fail to understand what it means. Can someone help me understand it better?
This is one of the fundamental lessons of SICP and one of the most powerful ideas of computer science. It works like this:
What we think of as "code" doesn't actually have the power to do anything by itself. Code defines a program only within a context of interpretation -- outside of that context, it is just a stream of characters. (Really a stream of bits, which is really a stream of electrical impulses. But let's keep it simple.) The meaning of code is defined by the system within which you run it -- and this system just treats your code as data that tells it what you wanted to do. C source code is interpreted by a C compiler as data describing an object file you want it to create. An object file is treated by the loader as data describing some machine instructions you want to queue up for execution. Machine instructions are interpreted by the CPU as data defining the sequence of state transitions it should undergo.
Interpreted languages often contain mechanisms for treating data as code, which means you can pass code into a function in some form and then execute it -- or even generate code at run time:
#!/usr/bin/perl
# Note that the above line explicitly defines the interpretive context for the
# rest of this file. Without the context of a Perl interpreter, this script
# doesn't do anything.
sub foo {
my ($expression) = @_;
# $expression is just a string that happens to be valid Perl
print "$expression = " . eval("$expression") . "\n";
}
foo("1 + 1 + 2 + 3 + 5 + 8"); # sum of first six Fibonacci numbers
foo(join(' + ', map { $_ * $_ } (1..10))); # sum of first ten squares
Some languages like Scheme have a concept of "first-class functions", which means that you can treat a function as data and pass it around without evaluating it until you really want to.
The upshot is that the division between "code" and "data" is pretty much arbitrary, a function of perspective only. The lower the level of abstraction, the "smarter" the code has to be: it has to contain more information about how it should be executed. On the other hand, the more information the interpreter supplies, the more dumb the code can be, until it starts to look like data with no smarts at all.
One of the most powerful ways to write code is as a simple description of what you need: Data which will be turned into code describing how to get you what you need by the interpretive context. We call this "declarative programming".
For a concrete example, consider HTML. HTML does not describe a Turing-complete programming language. It is merely structured data. Its structure contains some smarts that let it control the behavior of its interpretive context -- but not a lot of smarts. On the other hand, it contains more smarts than the paragraphs of text that appear on an average web page: Those are pretty dumb data.
In the context of security: Due to buffer overflows, what you thought of as data and thus harmless (such as an image) can become executed as code and p0wn your machine.
In the context of software development: Many developers are very afraid of "hardcoding" things and very keen on extracting parameters that might have to change into configuration files. This is often based on the idea that config files are just "data" and thus can be changed easily (perhaps by customers) without raising the issues (compilation, deployment, testing) that changing anything in the code would.
What these developers don't realize is that since this "data" influences the behaviour of the program, it really is code; it could break the program and the only reason not to require complete testing after such a change is that, if done correctly, the configurable values have a very specific, well-documented effect and any invalid value or a broken file structure will be caught by the program.
However, what all too often happens is that the config file structure becomes a programming language in its own right, complete with control flow and everything - one that's badly documented, has a quirky syntax and parser and which only the most experienced developers in the team can touch without breaking the application completely.
So, in a language like Scheme, even code is treated as first-class data. You can treat functions and lambda expressions much like you treat other data, say passing them into other functions and lambda expressions. I recommend continuing with the text, as this will all become quite clear.
This is something you come to understand from writing a compiler.
One common step in compilers is to transform the program into an abstract syntax tree. The representation will often be a tree such as [+, 2, 3], where + is the root and 2 and 3 are the children.
Lisp languages simply treat this as data, so there is no separation between data and code: both are lists that look like AST trees.
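The same point can be made in Python (illustration only): the ast module hands your code back to you as an ordinary data structure, which you can inspect, transform, and then execute again.
import ast

# The expression as text, then as a tree of plain data, then executed as code.
tree = ast.parse("2 + 3", mode="eval")
print(ast.dump(tree))   # e.g. Expression(body=BinOp(left=Constant(value=2), op=Add(), right=Constant(value=3)))
print(eval(compile(tree, "<ast>", "eval")))   # 5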
Code is definitely data, but data is definitely not always code. Let's take a basic example - customer name. It's nothing to do with code, it's a functional (essential), as opposed to a technical (accidental) aspect of an application.
You could probably say that any technical/accidental data is code and that functional/essential data is not.

What do you wish was automatic in your favorite programming language? [closed]

As a programmer, I often look at some features of the language I'm currently using and think to myself "This is pretty hard to do for a programmer, and could be taken care of automatically by the machine".
One example of such a feature is memory management, which has been automatic for a while in a variety of languages. While memory management is not that hard to do manually most of the time, doing it perfectly all the way through your application without leaking memory is extremely hard. Automation has made it easy again so that we programmers could concentrate on more critical questions.
Are there any features that you think programming languages should automate because the reward/difficulty ratio is just too low (say, for example concurrency)?
This question is intended to be a brainstorm about what the future of programming could be like, and what languages could do for us to let us focus on more important tasks, so please post your wishes even if you don't think automation is practical/feasible. Good answers will point to stuff that is genuinely hard to do in many languages, as opposed to single-language pet-peeves.
Whatever the language can do for me automatically, I will want a way of doing it for myself.
Concurrent programming/parallelism that is (semi-)automated, as opposed to having to mess around with threads, callbacks, and synchronisation. Being able to parallelise for loops, such as:
Parallel.ForEach(fooList, item =>
{
    item.PerformLongTask();
});
is just made of win.
Certain languages already support such functionality to a degree, however. Notably, F# has asynchronous workflows. Coming with the release of .NET 4.0, the Parallel Extensions library will make concurrency much easier in C# and VB.NET. I believe Python also has some sort of concurrency library, though I personally haven't used it.
What would also be cool is fully automated parallelism in purely functional languages, i.e. not having to change your code even slightly and automatically have it run near optimally across multiple cores. Note that this can only be done with purely functional languages (such as Haskell, but not CAML/F#). Still, constructs such as example given above would be very handy for automating parallelism in object-oriented and other languages.
I would imagine that libraries, design patterns, and even entire programming languages oriented towards simple and high-level support for parallelism will become increasingly widespread in the near future, as desktop computers start to move from 2 cores to 4 and then 8 cores and the advantage of automated concurrency becomes much more evident.
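For what it's worth, Python's standard library covers the loop-parallelising case too, though not the fully automatic kind. A rough sketch of the Parallel.ForEach analog (perform_long_task and foo_list are made up to mirror the C# snippet; for CPU-bound work you would swap in ProcessPoolExecutor because of the GIL):
from concurrent.futures import ThreadPoolExecutor

def perform_long_task(item):
    ...  # some slow, independent piece of work per item

foo_list = [1, 2, 3, 4]

# Rough analog of Parallel.ForEach: the executor decides how items are
# spread across worker threads.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(perform_long_task, foo_list))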
exec("Build a system to keep the customer happy, based on requirements.txt");
In Java, create beans less verbosely.
For example:
bean Student
{
String name;
int id;
type1 property1;
type2 property2;
}
and this would create a bean with private fields, default accessors, toString, hashCode, equals, etc.
In Java I would like a keyword that would make the entire class immutable.
E.g.
public immutable class Xyz {
}
And the compiler would warn me if any conditions of immutability were broken.
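Both of the last two wishes map fairly closely onto Python's dataclasses (shown only as a point of comparison, not as a Java answer): field declarations generate the constructor, repr, equality, and hashing, and frozen=True rejects mutation after construction, albeit at run time rather than compile time.
from dataclasses import dataclass

@dataclass(frozen=True)
class Student:
    name: str
    id: int

s = Student("Ada", 1)
print(s)                        # Student(name='Ada', id=1)  (generated __repr__)
print(s == Student("Ada", 1))   # True  (generated __eq__, plus __hash__ since frozen)
# s.id = 2                      # would raise dataclasses.FrozenInstanceError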
Concurrency. That was my main idea when asking this question. This is going to get more and more important with time, since current CPUs already have up to 8 logical cores (4 cores + hyperthreading), and 12 logical cores will appear in a few months. In the future, we are going to have a hell of a lot of cores at our disposal, but most programing languages only make it easy for us to use one at a time.
The Threads + Synchronization model that is exposed by most programming languages is extremely low level, and very close to what the CPU does. To me, the current level of concurrency language support is roughly equivalent to the memory management support in C: Not integrated, but some things can be delegated to the OS (malloc, free).
I wish some language would come up with a suitable abstraction that either makes the Threads + Synchronization model easier, or simply hides it from us completely (just as automatic memory management makes good old malloc/free obsolete in Java).
Some functional languages such as Erlang have a reputation of having good multithreading support, but the brain-switch required to do functional programming doesn't really make the whole ordeal much easier.
.NET:
A warning when manipulating strings with methods such as Replace without assigning the result (a new string) to a variable, because if you don't know that strings are immutable, this issue will frustrate you.
In C++, type inference for variable declarations, so that I don't need to write
for (vector<some_longwinded_type>::const_iterator i = v.begin(); i != v.end(); ++i) {
...
}
Luckily this is coming in C++11 in the form of auto:
for (auto i = v.begin(); i != v.end(); ++i) {
...
}
Coffee. I mean, the language is called Java, so it should be able to make my coffee! I hate getting up from programming, going to the coffee pot, and finding out someone from marketing has taken the last cup and not made another pot.
Persistence. It seems to me we write far too much code to deal with persistence, when it really should be a configuration problem.
In C++, enum-to-string.
In Ada, the language defines the 'image attribute of an enumerated type as a function that returns a string corresponding to the textual representation of an enumeration value.
C++ provides no such clean facility. It takes several lines of very arcane preprocessor macro black magic to get a rough equivalent.
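For contrast, Python's enum module is one example of getting this for free (illustrative only; the Color enum is made up):
from enum import Enum

class Color(Enum):
    RED = 1
    GREEN = 2

print(Color.RED.name)     # "RED"  (enum-to-string, no macros required)
print(Color["GREEN"])     # Color.GREEN  (and string-to-enum back again)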
For languages that provide operator overloading, provide automatically generated overloads for symmetric operations when only one operation is defined. For example, if the programmer provides an equality operator but not an inequality operator, the language could easily generate one. In C++, the same could be done for copy constructors and assignment operators.
I think that automatically generating one-side of a symmetric operation would be nice. Of course, I would definitely want to be able to explicitly say don't do that when needed. I guess providing the implementation of both sides with one of them being private and empty could do the job.
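Python does a limited version of exactly this: != is derived from __eq__ by default, and functools.total_ordering fills in the missing comparison operators from __eq__ plus one ordering operator (a sketch; the Version class is invented):
from functools import total_ordering

@total_ordering   # generates >, >=, <= from __eq__ and __lt__
class Version:
    def __init__(self, major, minor):
        self.major, self.minor = major, minor

    def __eq__(self, other):
        return (self.major, self.minor) == (other.major, other.minor)

    def __lt__(self, other):
        return (self.major, self.minor) < (other.major, other.minor)

print(Version(1, 2) >= Version(1, 1))   # True, using the generated __ge__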
Everything that LINQ does. C# has spoiled me and I now find it hard to do anything with collections in any other language. In Python I use list comprehensions a lot, but they are not quite as powerful as LINQ. I haven't found any other language that makes working with collections as easy as in C#.
In the Visual Studio environment, I want "Remove unused usings" to run across all files in the project. I find it a significant waste of time to have to manually open each individual file and invoke this operation on a per-file basis.
From a dynamic languages perspective, I'd like to see better tool support. Steve Yegge has a great post on this. For instance, there are lots of cases where a tool could look inside various code paths and determine if the methods or functions existed and provide the equivalent of the compiler smoke test. Obviously, if you're using lots and lots of truly dynamic code, this won't work, but the fact is, you probably aren't, so it would be pretty nice if Python, for instance, would tell you that .ToLowerCase() wasn't a valid function at compile time, rather than waiting until you hit the else clause.
s = "a Mixed Case String"
if True:
s = s.lower()
else:
s = s.ToLowerCase()
Easy: initialize variables in C/C++ just like C# does. It would have saved me multiple sessions of debugging in other people's code.
Of course, there would be a keyword for when you specifically do not want to init a var.
noinit float myVal; // undefined
float my2ndVal; // 0.0f