Semantic Diff Utilities [closed] - language-agnostic

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 7 years ago.
I'm trying to find some good examples of semantic diff/merge utilities. The traditional paradigm of comparing source code files works by comparing lines and characters, but are there any utilities out there (for any language) that actually consider the structure of code when comparing files?
For example, existing diff programs will report "difference found at character 2 of line 125. File x contains v-o-i-d, where file y contains b-o-o-l". A specialized tool should be able to report "Return type of method doSomething() changed from void to bool".
I would argue that this type of semantic information is actually what the user is looking for when comparing code, and should be the goal of next-generation programming tools. Are there any examples of this in available tools?

We've developed a tool that is able to precisely deal with this scenario. Check http://www.semanticmerge.com
It merges (and diffs) based on code structure rather than text-based algorithms, which lets you deal with cases like the following, involving heavy refactoring. It is also able to render both the differences and the merge conflicts, as you can see below:
And instead of getting confused with the text blocks being moved, since it parses first, it is able to display the conflicts on a per method basis (per element in fact). A case like the previous won't even have manual conflicts to solve.
It is a language-aware merge tool and it has been great to be finally able to answer this SO question :-)

Eclipse has had this feature for a long time. It's called "Structure Compare", and it's very nice. Here is a sample screenshot for Java, followed by another for an XML file:
(Note the minus and plus icons on methods in the upper pane.)

To do "semantic comparisons" well, you need to compare the syntax trees of
the languages, and take into account the meaning of symbols. A really
good semantic diff would understand the language semantics, and realize
when one block of code was equivalent in function to another. Going
this far requires a theorem prover, and while it would be extremely
cute, isn't presently practical for a real tool.
A workable approximation of this is simply comparing syntax trees, and reporting
changes in terms of structures inserted, deleted, moved, or changed.
Getting somewhat closer to a "semantic comparison", one could report
when an identifier is changed consistently across a block of code.
See our http://www.semanticdesigns.com/Products/SmartDifferencer/index.html
for a syntax tree-based comparison engine that works with many languages, that does
the above approximation.
EDIT Jan 2010: Versions available for C++, C#, Java, PHP, and COBOL.
The website shows specific examples for most of these.
EDIT May 2010: Python and JavaScript added.
EDIT Oct 2010: EGL added.
EDIT Nov 2010: VB6, VBScript, VB.net added
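As a rough illustration of the syntax-tree approximation described above, here is a minimal Python sketch that parses two versions of a file with the standard ast module and reports changes in terms of functions rather than characters. It is not how SmartDifferencer or any other tool mentioned in this thread works. The names function_signatures and semantic_diff are made up for this example, and it only looks at top-level functions and their return annotations (it needs Python 3.9 or later for ast.unparse).

```python
import ast


def function_signatures(source):
    """Map each top-level function name to its return annotation text (or None)."""
    tree = ast.parse(source)
    signatures = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            signatures[node.name] = ast.unparse(node.returns) if node.returns else None
    return signatures


def semantic_diff(old_source, new_source):
    """Describe differences between two sources in terms of functions."""
    old = function_signatures(old_source)
    new = function_signatures(new_source)
    report = []
    for name in sorted(old.keys() - new.keys()):
        report.append(f"Function {name}() removed")
    for name in sorted(new.keys() - old.keys()):
        report.append(f"Function {name}() added")
    for name in sorted(old.keys() & new.keys()):
        if old[name] != new[name]:
            report.append(
                f"Return type of {name}() changed from {old[name]} to {new[name]}"
            )
    return report


if __name__ == "__main__":
    old = "def do_something() -> None:\n    pass\n"
    new = "def do_something() -> bool:\n    return True\n"
    for line in semantic_diff(old, new):
        print(line)
```

A real tool would also have to handle nested scopes, moved code, renamed identifiers and so on; this only shows why working on the parsed tree makes the report read like the one the question asks for.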

What you're groping for is a "tree diff". It turns out that this is much harder to do well than a simple line-oriented textual diff, which is really just the comparison of two flat sequences.
"A Fine-Grained XML Structural Comparison Approach" concludes, in part, with:
Our theoretical study as well as our experimental evaluation
showed that the proposed method yields improved structural similarity results with
respect to existing alternatives, while having the same time complexity (O(N^2))
(emphasis mine)
Indeed, if you're looking for more examples of tree differencing I suggest focusing on XML since that's been driving practical developments in that area.
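To make the contrast with a flat, line-oriented diff concrete, here is a deliberately naive sketch of a tree comparison over XML using Python's standard xml.etree.ElementTree. It is not the algorithm from the paper quoted above: it compares children strictly by position, does not detect moves, and does not compute a minimal edit script. The function name tree_diff and the report wording are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def tree_diff(a, b, path=""):
    """Compare two elements position by position and list the differences."""
    if a.tag != b.tag:
        return [f"{path or '/'}: <{a.tag}> replaced by <{b.tag}>"]
    here = f"{path}/{a.tag}"
    changes = []
    if a.attrib != b.attrib:
        changes.append(f"{here}: attributes changed")
    if (a.text or "").strip() != (b.text or "").strip():
        changes.append(f"{here}: text changed")
    children_a, children_b = list(a), list(b)
    # Recurse into the children that exist on both sides.
    for child_a, child_b in zip(children_a, children_b):
        changes.extend(tree_diff(child_a, child_b, here))
    # Whatever is left over on one side was deleted or inserted.
    for extra in children_a[len(children_b):]:
        changes.append(f"{here}: <{extra.tag}> deleted")
    for extra in children_b[len(children_a):]:
        changes.append(f"{here}: <{extra.tag}> inserted")
    return changes


if __name__ == "__main__":
    old = ET.fromstring("<a><b>1</b><c/></a>")
    new = ET.fromstring("<a><b>2</b><c/><d/></a>")
    for change in tree_diff(old, new):
        print(change)
```

Inserting a child at the front of a long list makes this positional version report every later sibling as changed, which is exactly the kind of problem that makes a good tree diff hard.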

Shameless plug for my own project:
HTML Tree Diff, written in Python, does structure-aware comparison of XML and HTML documents.
http://pypi.python.org/pypi/html-tree-diff/0.1.0

The solution to this would be on a per-language basis. That is, unless it's designed with a plugin architecture that defers much of the parsing of the code into a tree and the semantic comparison to a language-specific plugin, it will be very difficult to support multiple languages. What language(s) are you interested in having such a tool for? Personally, I'd love one for C#.
For C# there is an assembly diff add-in to Reflector but it only does a diff on the IL not the C#.
You can download the diff add-in here [zip] or go to the project on the codeplex site here.

A company called Zynamics offers a binary-level semantic diff tool. It uses a meta-assembly language called REIL to perform graph-theoretic analysis of 2 versions of a binary, and produces a color-coded graph to illustrate differences between them. I am not sure of the price, but I doubt it is free.

http://prettydiff.com/
Pretty Diff minifies each input to remove comments and unnecessary white space and then beautifies the code prior to running the diff algorithm. I cannot think of any way to be more code-semantic than this. And it's written in JavaScript, so it runs directly in the browser.
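The normalize-then-diff idea can be sketched in a few lines of Python. This is not Pretty Diff's implementation (which is JavaScript); it only applies the same idea to Python source, using ast.parse followed by ast.unparse (Python 3.9 or later) as a stand-in for "minify, then beautify", and difflib for the textual diff. The function names are made up for this example.

```python
import ast
import difflib


def normalize(source):
    """Drop comments and formatting differences by round-tripping through the AST."""
    return ast.unparse(ast.parse(source))


def normalized_diff(old_source, new_source):
    """Return a unified diff of the two sources after normalization."""
    old_lines = normalize(old_source).splitlines()
    new_lines = normalize(new_source).splitlines()
    return list(difflib.unified_diff(old_lines, new_lines, lineterm=""))


if __name__ == "__main__":
    # Differs only in spacing and a comment, so nothing is reported.
    print(normalized_diff("x=1  # set x\n", "x   =   1\n"))
    # A real change still shows up.
    print("\n".join(normalized_diff("x=1\n", "x=2\n")))
```

The diff itself is still line-based; normalization only removes differences that do not matter, so moved or restructured code is still reported as plain text changes.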


I want to write a tool without usage entry barriers. Do I have to write it in C? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I want to write an open-source tool for use by developers. I want to eliminate entry barriers, so if they like the idea, they just get the tool and start playing with it.
In particular, I don't want an "Oh, should I also install 200Mb of ThatLanguage runtime libraries? Oh, so they don't build on my latest version of Linux?" entry barrier.
Should I write this tool in C, then? Or is Python, or Java, or whatever, already sufficiently widespread to not worry about this sort of things altogether (everyone already has them installed)?
Well, of course I know that they are freaking hugely widespread, but still - are there any major benefits to writing a super-lightweight zero-dependency tool, or am I being too much of a perfectionist?
Just write it first. If it is worth it people will use it.
Beyond that, (almost) everyone has Java, Python, and Ruby installed (especially devs). Some languages are still esoteric enough that it might not be worth it for 'that one app' (Erlang, Haskell, etc.).
Just write it though, that's the important part. From there it can be ported, rewritten, adopted, but none of that can happen if the tool isn't written first.
It won't help if people don't know C.
If you write your own DSL, you can have people use that API and not worry about which language you choose.
Write it in whatever common language you like. Everybody has installed .NET framework or JVM. The only difference between your C approach and Java or C# is, that you would link additional libraries directly to your program (opposed to standard libraries).
On the other hand, I would hesitate to write it in some exotic language, for example Smalltalk, because a normal user does not know what Squeak or Smalltalk itself is and could be worried about installing the weird thing :-).
I also think that you should be more concerned about developers, because you write that you want it to be open source. I don't know anyone who wants to write his own Swing, Spring or any other framework just to be independent of something. Also, it's (usually) much faster and easier to write it in a JIT language than to code it in assembler...
I'm going to suggest what Reese suggested but take a slightly different approach: write it first, preferably in a language that allows you to quickly prototype and develop your program. Then, and this is the most important part, document the protocol you've developed.
I'm giving this advice because you mentioned that your "application" may later have bindings in lots of different languages and it is a client/server architecture. Well, two of the biggest applications in the world started out like this.
Bittorrent started out as Python code. This allowed very quick prototyping of the concept to get it working. The main thing that it had going for it was that the original code was well written and well documented. This later on allowed other people to port the protocol to other languages.
HTTP and HTML are an even bigger success story and started out with an even less popular language at the time they were written: Objective-C. Even better than BitTorrent, the protocol itself is very simple and very well documented. People didn't care that the original implementation was in a language that they'd never seen before that uses square brackets in strange ways on a NeXT cube. The concept and execution were good and people quickly ported it to their favourite programming languages. Again, Objective-C was chosen to aid in quick prototyping. Legend has it that the original implementation was written in just a couple of days.
I would say yes, you have to write it in C. If it were written in any language other than C (except perhaps C++ or Perl), I would definitely stop to consider whether the necessary build tools, runtime tools, and/or interpreter for that language would be available everywhere I might need the tool before getting myself dependent upon it. If the tool were meant for use in build scripts, I would consider it a complete show-stopper, since I can't expect anyone who wants to build my software to have random arbitrary language environments installed.
The reason I mentioned C++ and Perl as exceptions is that they're both largely portable in a formal sense. They have implementations that work without significant ties to the host implementation, and can be built not just on any current popular system but on any system that remotely adheres to standards. Python is quite the opposite, with strong dependencies on the underlying system's dynamic loader; I've been completely unable to get Python to work on various systems that only support static linking.
OCaml is another possible choice that has a very portable implementation, but it's not widely installed and people who aren't familiar with it tend to frown on it for no good reason.
If you write your program in C, then you will have the dependency of the platform (Windows != Linux != AIX, etc). If you are talking only about writing this tool for one OS, or rather THE OS (Linux;-), then I think that you can have a reasonable amount of confidence that your app will work on almost any system, especially if you use an Open Source language. If you want to run the app on Windows, I wouldn't count on any of those languages being installed on the host system. Your highest confidence across platforms will be with Java.
If possible you could use the lightest weight framework possible and put it online, where it can be viewed in a browser. What does your app do? Would it work as a web app?
I would suggest going with Delphi. If you want to make it portable, you can, since most Delphi code is Kylix compatible.

Documenting an Access Application for Developers [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
I need to document a MS-Access application that was created, developed and maintained completely by a power-user over 10 years.
This is an interesting situation because what they want is a manual so that a future developer can come in without prior domain knowledge and make changes to the frontend or the backend in a timely manner.
There are a few questions on my mind for this little project:
What is a good manual design creating application? Microsoft Word doesn't quite cut it.
What kind of things would you, the developer, need to know in order to make changes to things like forms, reports, tables or other Access objects?
Anything else I missed? Any pitfalls?
You could start with generating some automatic code documentation using MZ-Tools add-in for VBA. The same add-in can help you clean unused variable declarations, generate line numbers, reorder procedures within a module, etc.
Documenting forms is more difficult. My proposal would be to keep a screenshot together with a .txt file obtained through the undocumented Application.SaveAsText method.
In my experience, Access- and VB6-based programs are plagued by more code replication and technical debt than programs in mainstream languages. I'm not sure why. Maybe it's the nature of Access as a "prototype" or "toy" database (though it can be quite powerful when wielded correctly).
If I had to choose between expending time on documentation and expending time on reducing technical debt, for example by remodularizing, eliminating repeated code, splitting long functions, etc., I would choose the latter. The improvement to maintainability and readability would be greater.
I know this has been closed for a long time, but I can't refrain from adding my 2 cents:
In the case mentioned, I think the most useful doc to produce is FUNCTIONAL documentation (which, in an ideal world, should have existed before development started).
Second is within the code itself, and that includes the VBA but also the field descriptions which can be set in Access and SQL Server.
Third is a (or a set of) nice database diagram.
Once you have that, all the rest can be generated by the new developer using HIS favorite tools.
Speaking about tools, I particularly like and recommend:
MZ Tools: especially to easily find which routines call the one you're looking at
Smart Indent: to properly indent code. Trying to read badly indented code makes me sick
SqlSpec: (not free) generates HTML doc of the database itself for most database engines
Have you tried using the built-in Database Documenter? It will print out all tables, indexes, forms, controls, and each property of the controls, as well as the code, the SQL used, and just about anything else. This results in huge, just massive printouts. However, while it will kill a few trees in the process, it sure is a great way to impress the boss.

Programming == Configuring? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 13 years ago.
I hear a couple of people using the term 'programming' rather than configuring, for example:
Have you already programmed Apache's
Virtual Hosts configuration correctly, with
ServerName named FOO?
Program your .vimrc first before
starting Vim the first time.
The last is a word-by-word citation from my teacher, but I didn't dare to correct him. Is it OK to use 'programming' instead of 'configuring'?
IMHO this sounds very ugly.
Well.. ordinary people "program" their VCR, Tivo etc. So for ordinary people program == configure. Note that even programmers don't say "program the javascript". Instead people use words like "develop" or "write" for writing programs in the programming sense.
A definition I like for programming is:
creating a sequence of instructions to enable the computer to do something
So, if you configure anything you are indirectly creating a sequence of instructions. Which IMHO would "qualify" configuring as an indirect type of programming.
EDIT:
Also, computer development is far more than computer programming. To develop, you need to do much more than write instructions; you also need:
Requirements definition
Write specifications
Planning
a lot more
I generally tend to prefer the terms 'coding' and the verb 'to code' rather than programming. It's just that bit less fuzzy and has fewer alternative meanings.
Configuration is just a form of (usually declarative rather than procedural) scripting, i.e., programming against an API.
In most cases, what we call configuration is not sophisticated enough to be worthy of the name "scripting" or "programming", but some systems based on Ruby, Python, or Lisp -- e.g., Emacs -- use the programming language as a configuration language, and then configuration really does blend into programming.
If I told you what kind of things I've heard... For example, during a network security class, we had to generate SSH certificates, and one girl said that the tool that generated the keys "wasn't compiling" (of course it was already compiled and installed; she just had to use it to generate the certificates... but I suspect that for her, anything done in the console was "compiling").
So in brief, people will always speak and write badly, just don't follow them.
I completely agree with slebetman, but I'll also add that there might be some age and/or regional issues here.
As a military brat, having lived in the US south, and now working with a bunch of Europeans, I frequently run into words used in different ways than I expected. Some of it might be slang to us, but it's completely normal to the person using it, and frequently, when I look up the words in a dictionary, I find an alternate definition that makes perfect sense.
In this particular case, from dictionary.com, the last verb definition for 'program' is :
to set, regulate, or modify so as to produce a specific response
or reaction: Program your eating habits to eliminate sweets.
Other times, I'll find that more recent generations have taken words and used them in more limited ways, but the term has a more general meaning. (casket comes to mind, which originally just meant 'small box', but now has death connotations)
I'd say that these are incorrect usages of the term 'programming' - as you say this is simply configuration/setup.
In a sense, configuration is programming. It is a set of instructions for a computing device that has a very limited language - the set of allowable values for the parameters of the device/software.
One could view the apache server, for example, as a language interpreter, and the parameter values as the source code for that interpreter.
However, the devices are not Turing-equivalent in general (exceptions are things like emacs, where definitely it is) and I would personally reserve "programming" for cases where the language is Turing-equivalent.
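The "server as interpreter, parameter values as source code" view can be sketched as a toy in Python. This is not how Apache parses its configuration. ServerName comes from the question, while the Port directive, the line format and the function name interpret are assumptions made for this illustration.

```python
def interpret(config_text):
    """Interpret 'Directive value' lines; the only 'instruction' is assignment."""
    # The whole "language": which directives exist and what type each value has.
    allowed = {"ServerName": str, "Port": int}
    settings = {}
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        directive, _, value = line.partition(" ")
        if directive not in allowed:
            raise ValueError(f"unknown directive: {directive}")
        settings[directive] = allowed[directive](value.strip())
    return settings


if __name__ == "__main__":
    print(interpret("ServerName FOO\nPort 8080\n"))
```

There are no loops, conditionals or definitions in this "language", which is the sense in which such a configuration is not Turing-equivalent.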

Looking for (c)lisp examples of mini-languages, that is, DSLs [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 8 years ago.
Reading well-written code seems to help me learn a language. (At least it worked with C.) [deleting the 'over-specified' part of the question]
I'm interested in particular in lisp's reputation as a language suited to creating a mini-language or DSL specific to a problem. The program ought to be open-source, of course, and available over the web, preferably.
I've Googled and found this example:
http://lispm.dyndns.org/news?ID=NEWS-2005-07-08-1
Anybody have another? (And, yes, I will continue reading "Practical Common Lisp".)
After 11 hours (only 11 hours!): Thanks, everyone. What a wonderful site, and what a bunch of good answers and tips!
I feel your constraints are over-specified:
small enough to comprehend, varied
enough to show off most of (c)lisp's
tricks and features without being
opaque (the 'well-written' part of the
wish), and independent of other
packages.
Common Lisp is a huge language, and the power set that emerges when you combine the language elements is much larger. You can't have a small program showing "most tricks" in CL.
There are also many concepts that you will find alien when you learn CL coming from another language. As such CL is less about tricks but more about its fundamental paradigms.
My suggestion is to read up on it a bit first and then start building your own programs or looking into open source code.
Edi Weitz for example usually writes good code. Check out his projects at http://www.weitz.de/.
And now go read PCL. :)
I'm kind of lazy to find the links, but you should be able to 'Google'/'Bing' it. The following list mentions very different ways to embed languages and very different embedded languages.
ITERATE for iterations
System/Module/File description in 'defsystem's, an example would be ASDF
infix readmacro
define-application-frame in CLIM for specifying user interfaces
embedded Lispified SQL queries in LispWorks and CLSQL
Knowledgeworks of LispWorks: logic language with rules, queries, ...
embedded Prolog in Allegro CL
embedded HTML in various forms
XMLisp, integrates XML and Lisp
Screamer for non-deterministic programming
PWGL, visual programming for composing music
Note that there are simple embedded languages and really complex ones that are providing whole new paradigms like Prolog, Screamer, CORBA, ...
If you haven't taken a look at it yet, the book Practical Common Lisp is available free online and has several example projects.
The LOOP macro is an almost perfect example of a DSL embedded in Common Lisp. However, since it's already part of the standard, it may not be what you're after.
CL's format function has a mini DSL.
http://cybertiggyr.com/fmt/
I think that dsl for printing strings will compile to machine code.
(format nil "~{~A~#[~:;, ~]~}" lst)
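For readers who don't know the format directives: as far as I can tell, the control string above walks the list, prints each element with ~A, and emits ", " only while elements remain. A rough Python equivalent, given purely for comparison (str() only approximates what ~A prints, and the function name is made up):

```python
def join_with_commas(items):
    """Print each item as text, with ", " between items but not after the last."""
    return ", ".join(str(item) for item in items)


if __name__ == "__main__":
    print(join_with_commas([1, 2, 3]))
```

The point of the Lisp version is that the iteration and the conditional separator are both expressed inside the control string itself, which is what makes format a small embedded language.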
CLSQL provides a Lispy notation for SQL queries, which it compiles to SQL, and just about all Lisp HTML and XML generation libraries qualify. Metabang bind is a DSL for lexically binding variables. You probably didn't know you needed one, but it turns out to be amazingly useful.
SERIES is kind of a DSL, depending on your definition. It's in an appendix to CLTL2, though it's not actually part of the language.

What tools do you use for outlining projects? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
Whenever I start working on projects that are complex enough that I can't keep it all in my head at once I like to outline how the app should work... I usually hack something like this out in a text editor:
# Program is run
# check to see if database exists
# create database
# complain on error, exit
# ensure database is writable
# complain to user, exit
# check to see if we have stored user credentials
# present dialog asking for credentials
# verify credentials and reshow dialog if they're invalid
# show currently stored data
# start up background thread to check for new data
# update displayed data if new data becomes available
# ...
#
# Background service
# Every 15min update data from server
# Every 24 hours do a full sync w/ server
Et cetera (note: this is commented so SO won't parse it, not because I include it as comments in code).
What I'm wondering is how you guys do this. Are there any tools for outlining a program's flow? How do you describe complex projects so that when it comes time to code you can concentrate on the code and not the design/architecture of all the little pieces?
I use GraphViz if I need to sketch out such simple diagrams - the DOT language is lightweight and diffs very nicely when I compare versions of the diagrams.
I blogged about this a few months ago, with an example showing a more complex architecture diagram.
I've also just added a blog post with a zoomed-out diagram that shows a large program flow, to give an idea of how a GraphViz flow might be composed. I haven't the time to obfuscate all the text so just put it up there as a picture at low res to give the impression of the architecture without being able to zoom in to see readable details.
This diagram was composed by hand after a bunch of grepping to get launches. To avoid taunting you too much, here are some excerpts of the DOT text that generates the diagram.
digraph windows {
rankdir=LR
label="Windows Invoked\nby controls and menu items"
node[fontsize=12]
/* ENTRY POINTS */
wndMainMenu[shape=box color=red fontcolor=red]
DEFAULT_WINDOW[label="DEFAULT\nWINDOW" shape=box color=red fontcolor=red]
/* WINDOWS */
node[shape=box color=black fontcolor=black style=solid]
App
wndAddBill [label="Add Payable\nwndAddBill"]
wndAddCustomer [label="Add a Customer\nwndAddCustomer"]
...
/* WINDOW INVOCATION */
node[shape=oval color=blue fontcolor=blue style=normal]
edge[fontsize=10 style=normal color=blue fontcolor=blue]
wndPayBills_bvlNewBill -> wndAddBill
wndAddCustomer -> wndAddCustomer_save001
wndManageDrivers_bvlNewCustomer -> wndAddCustomer
Image: http://www.aussiedesignedsoftware.com/img/WindowLaunchesZoomedOut.png
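If the launch relationships come out of grepping, as described above, the DOT text does not have to be typed by hand. Here is a minimal Python sketch that turns a list of (source, target) pairs into a digraph. The function name to_dot is made up, and it assumes every node name is already a valid unquoted DOT identifier, as in the excerpt above.

```python
def to_dot(name, edges, rankdir="LR"):
    """Build DOT source for a directed graph from (source, target) pairs."""
    lines = [f"digraph {name} {{", f"  rankdir={rankdir}"]
    for source, target in edges:
        lines.append(f"  {source} -> {target}")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    launches = [
        ("wndPayBills_bvlNewBill", "wndAddBill"),
        ("wndManageDrivers_bvlNewCustomer", "wndAddCustomer"),
    ]
    print(to_dot("windows", launches))
```

The output can be saved to a file and rendered with the dot command from GraphViz. Because it is plain text with one edge per line, it also diffs cleanly between versions.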
Emacs M-x outline-mode
Or, paper.
p.s. this is a serious answer.
Basically, what you are trying to do is extract the information and use cases in Given-When-Then format. See http://wiki.github.com/aslakhellesoy/cucumber/given-when-then. This approach solves two problems:
comprehension of domain and edge cases
outlining of the solution so you know what to work on next in addition to where to start
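As a small illustration, one line of the outline from the question ("check to see if database exists, create database") could be written down in Given-When-Then form as a plain Python test. This is not Cucumber syntax. The helper ensure_database and the set standing in for the file system are hypothetical and exist only so the example runs.

```python
def ensure_database(existing, name):
    """Hypothetical helper: create the database if it is missing."""
    if name in existing:
        return "exists"
    existing.add(name)
    return "created"


def test_database_is_created_when_missing():
    # Given no database exists
    databases = set()
    # When the program is run
    outcome = ensure_database(databases, "app.db")
    # Then the database is created
    assert outcome == "created"
    assert "app.db" in databases


if __name__ == "__main__":
    test_database_is_created_when_missing()
    print("ok")
```

Each remaining bullet of the outline becomes another scenario, so the list of unwritten tests doubles as the list of what to work on next.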
Are there any tools for outlining a program's flow?
Your top comments ("Program is run") could be expressed using a "flow chart".
Your bottom comments ("Background service") could be expressed using a "data flow diagram".
I don't use flow charts (I don't find they add value compared to the corresponding pseudo-code/text, as you wrote it), but I do like data flow diagrams for showing a top-level view of a system (i.e. the data stores/formats/locations, and the data processing stages/IO). Data flow diagrams predate UML, though, so there aren't very many descriptions of them on the 'net.
For anything related to documentation: Wikis, wikis and more wikis!
Easy to read and most important, easy for anyone to update.
My favourite one: Trac (much more than just a wiki anyway)
I like sequence diagrams for anything in the OO realm. There are several nice ways to create sequence diagrams without spending all your time pushing polygons around.
First, there are some online sequence diagram generators that take textual input. For one example, see WebSequenceDiagrams.com.
There's also a nice Java based tool that takes textual input and creates diagrams. This is well-suited for integration into your build process, because it can be invoked directly from ant.
If something is complex I like pictures, but I tend to do these by hand on paper, so I can visualize it better. Whiteboards are great for this.
I break the large, or complex app, into smaller parts, and design those out on paper, so I can better understand the flow between the parts.
Once I have the flow between parts done, then I can better design each part separately, as each part is its own subsystem, so I can change languages or platforms if I desire.
At that point, I just start working on the application, and just work on one subsystem at a time, even though the subsystem may need to be decomposed, until I have a part that I can keep in my head.
Use Cases
Activity Diagrams
Sequence Diagrams
State Machine Diagrams
Class Diagrams
Database Diagrams
Finally, after those are done and the project is looking well defined, into Microsoft Project.
I like to keep this flow as it keeps things well documented, well defined and easily explainable, not to mention, it's simply a good process. If you are unsure on what these are, look at my answer in here giving more information, as well as some links out.
I recommend using UML
There are various depths you can go into when designing. If you take UML far enough, most UML applications can auto generate the basic framework of your code for you.
Typically I rely on loose UML, generating use cases, use case diagram, class diagram, component diagram, and have started using sequence diagrams more.
Depending on the project a whiteboard or notepad works, but for a project of reasonable size and time, I'll do everything using ArgoUML
I have enjoyed StarUML in the past, but it's Win32 only, which is now useless to me.
A great book on the subject is
Applying UML and Patterns: An Introduction to Object-Oriented Analysis and Design and Iterative Development (3rd Edition) - [978-0131489066]
I had to pick it up for a college course which did a crummy job teaching UML, but kept it and have read it a time or two since.
This is also worth checking out: Learning UML 2.0 - O'Reilly - [978-0596009823]