Checkstyle and Findbugs for changed files only on Jenkins (and/or Hudson)

We work with a lot of legacy code and are thinking about introducing some metrics for new code. Is it possible to let Findbugs and Checkstyle run on changed files only instead of on the complete project?
It would be nice to ensure that only files meeting a minimum level of quality are checked in, while the existing code base is not (yet) touched or evaluated, so that people are not confused by thousands of issues.

In theory, it is possible. You would use a shell script to parse the SVN (or other SCM) change logs after a given start date, identify the .java files in those change sets, and build two patterns from them:
The Findbugs Maven Plugin expects a comma-separated list of class (or package) names for the parameter onlyAnalyze, so you'll have to translate file names to fully qualified class names (this gets tricky when you're dealing with inner classes).
The Maven Checkstyle Plugin is even worse: it expects a configuration file for its packageNamesLocation parameter. Unfortunately, only packages are allowed, not individual files, so you'll have to translate file names to packages.
In the above examples I assume that you are using Maven. I am pretty sure that similar things can be done with Ant, but I wouldn't know how.
I myself would probably use a Groovy script instead of a shell script to achieve the above results.
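The file-name translation described above could be sketched roughly as follows. This is only an illustration in Python, not a tested integration with either plugin: the `src/main/java` source root and the helper names are assumptions, the list of changed files is expected to come from your SCM log, and inner classes are not handled.

```python
from pathlib import PurePosixPath

# Assumption: sources follow the standard Maven layout under this root.
SOURCE_ROOT = PurePosixPath("src/main/java")


def to_class_name(changed_file):
    """Translate a changed .java path into a fully qualified class name."""
    relative = PurePosixPath(changed_file).relative_to(SOURCE_ROOT)
    return ".".join(relative.with_suffix("").parts)


def to_package_name(changed_file):
    """Translate a changed .java path into its package name."""
    relative = PurePosixPath(changed_file).relative_to(SOURCE_ROOT)
    return ".".join(relative.parent.parts)


def build_patterns(changed_files):
    """Return (comma-separated class list, sorted package list)."""
    prefix = str(SOURCE_ROOT) + "/"
    java_files = [
        f for f in changed_files
        if f.startswith(prefix) and f.endswith(".java")
    ]
    classes = sorted({to_class_name(f) for f in java_files})
    packages = sorted({to_package_name(f) for f in java_files})
    return ",".join(classes), packages
```

The first value is in the comma-separated form that onlyAnalyze expects; the package list would still have to be written into whatever file format packageNamesLocation requires.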

Findbugs has Ant tasks that can diff two sets of Findbugs results to show just the deltas, so that only new bugs are reported. See
http://findbugs.sourceforge.net/manual/datamining.html

Related

Is it possible to write a dual pass checkstyle check?

I have two situations I need a checkstyle check for. Let's say I have a bunch of objects with the annotation @BusinessLogic. I want to do a first pass through all *.java files, creating a Set with the full class names of these objects. Let's say ONE of the classes here is MyBusinessLogic. NEXT, as part of a custom checkstyle checker, I want to go through and fail the build if there are any lines of code that say "new MyBusinessLogic()" anywhere in the code. We want to force DI when objects are annotated with @BusinessLogic. Is this possible with checkstyle? I am not sure checkstyle does a dual pass.
Another option I am considering is a gradle plugin that scans all Java files and writes the list of classes annotated with @BusinessLogic to a file, and then running checkstyle after that, with my checker reading in the file.
My next situation is that I have a library delivered as a jar. In that jar I also have classes annotated with @BusinessLogic, and I need to make sure those are also added to my list of classes that should not be newed up manually and are only created with dependency injection.
Follow up question from the previous question here after reading through checkstyle docs:
How to enforce this pattern via gradle plugins?
thanks,
Dean
Is it possible to write a dual pass checkstyle check?
Possible, yes, but not officially supported. Support would come at https://github.com/checkstyle/checkstyle/issues/3540 but it hasn't been agreed on.
Multi-file validation is possible with FileSets (still not officially supported), but it becomes harder with TreeWalker checks. This is because TreeWalker doesn't chain finishProcessing to the checks. You can implement your own TreeWalker that will chain this finishProcessing to implementation of AbstractChecks.
You will have to do everything in one pass with this method. Log every new XXX expression and every class annotated with @YYY. In the finishProcessing method, correlate the two sets of information and report a violation when you have a match.
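The collect-then-correlate idea can be illustrated outside Checkstyle with a small Python sketch. It is not Checkstyle code: the class name, the regular expressions, and the `finish_processing` method are hypothetical stand-ins for what a real check would do on the AST, and the annotation name is taken from the question.

```python
import re

# Hypothetical, simplified patterns; a real check would inspect the AST.
ANNOTATED_CLASS = re.compile(
    r"@BusinessLogic\s+(?:public\s+)?(?:final\s+)?class\s+(\w+)"
)
NEW_EXPRESSION = re.compile(r"\bnew\s+(\w+)\s*\(")


class SinglePassCollector:
    """Records annotated classes and 'new' sites while files stream past."""

    def __init__(self):
        self.annotated = set()
        self.instantiations = []  # (file name, line number, class name)

    def process_file(self, file_name, source):
        for match in ANNOTATED_CLASS.finditer(source):
            self.annotated.add(match.group(1))
        for number, line in enumerate(source.splitlines(), start=1):
            for match in NEW_EXPRESSION.finditer(line):
                self.instantiations.append((file_name, number, match.group(1)))

    def finish_processing(self):
        """Correlate both lists once every file has been seen."""
        return [
            site for site in self.instantiations
            if site[2] in self.annotated
        ]
```

Deferring the correlation to the end matters because a file that instantiates a class may be processed before the file that declares the annotation.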
I have a library delivered as a jar
Checkstyle does not support reading JARs or bytecode. You can always create a hard-coded list as an alternative. The only other way is to build your own reader into Checkstyle.

How do I find where a function is declared in Tcl?

I think this is more of a Tcl configuration question rather than a Tcl coding question...
I inherited a whole series of Tcl scripts that are used within a simulation tool that my company built in-house. In my scripts, I'm finding numerous instances where there are function calls to functions that don't seem to be declared anywhere. How can I trace the path to these phantom functions?
For example, rather than use source, someone built a custom include function that they named INCLUDE. Tclsh obviously balks when I try to run it there, but with my simulation software, it runs fine.
I've tried grep-ing through the entire simulation software for INCLUDE, but I'm not having any luck. Are there any other obvious locations outside the simulation software where a Tcl function might be defined?
The possibilities:
Within your software (you have checked for this).
Within some other package included by the software. Check whether the environment variable TCLLIBPATH is set, and also whether the simulation software sets TCLLIBPATH. This will be a list of directories to search for Tcl packages, and you will need to search the packages that are located outside of the main source tree. Another possibility is that the locations are specified in a pkgIndex.tcl file. Check any pkgIndex.tcl files and look for locations outside the main source tree.
Within an unknown command handler. This could be in your software or within some other package. You should be able to find some code that processes the INCLUDE statement.
Within a binary package. These are shared libraries that are loaded by Tcl. If this is the case, there should be some C code used to build the shared library that can be searched.
Since you say there are numerous instances of unknown functions, my first guess is that you have not found all the directories where packages are loaded from. But an ''unknown'' command handler is also a possibility.
Edit:
One more possibility I forgot. Check whether your software sets the auto_path variable. Check any directories added to the auto_path for other packages.
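As a rough aid for the directory search described above, here is a small Python sketch that scans a set of directories for script-level definitions of a command. The function names are made up for illustration, TCLLIBPATH is treated as a plain whitespace-separated list (so braced entries containing spaces are not handled), and only `proc`, `interp alias` and `rename` forms are recognised; commands implemented in C or by an unknown handler will not show up.

```python
import re
from pathlib import Path


def library_dirs(tcllibpath):
    """Split a TCLLIBPATH value, assumed to be a whitespace-separated list."""
    return [Path(entry) for entry in tcllibpath.split()]


def find_definitions(command, roots):
    """Return (path, line number, line) for lines that appear to define `command`."""
    pattern = re.compile(
        r"(?:\bproc|\binterp\s+alias\s+\S+|\brename\s+\S+)\s+(?:::)?"
        + re.escape(command)
        + r"(?![\w:])"
    )
    hits = []
    for root in roots:
        if not Path(root).is_dir():
            continue  # entries may point at directories that no longer exist
        for path in sorted(Path(root).rglob("*.tcl")):
            text = path.read_text(errors="replace")
            for number, line in enumerate(text.splitlines(), start=1):
                if pattern.search(line):
                    hits.append((path, number, line.strip()))
    return hits
```

A typical call would pass `library_dirs(os.environ.get("TCLLIBPATH", ""))` (after importing `os`) plus any directories you find added to auto_path.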
This isn't a great answer for you, but I suspect it is the best you're going to get...
The procedure could be defined in a great many places. Your best bet for finding it is to use a tool like findstr (on Windows) or grep -R (on POSIX platforms) to search across all the relevant source files. But that still might not help! It might not be a procedure but instead a general command, which could be implemented in C and not as a procedure, or it could be defined in a packaged application archive (which are usually awkward to look inside). There are also other types of script-implemented command too, which could make things awkward. Generally searching and investigating is your best bet, but it might not work.
Tcl doesn't really differentiate strongly between different types of command except in some introspection operations. If you're lucky, you could find that info body tells you the definition of the procedure (and info args and info default tell you about the arguments) but that won't help with other command types at all. Tcl 8.7 will include a command (info cmdtype) that would help a lot with narrowing down what to do next, but that's no use to you now and it definitely doesn't exist in older versions.

What are the output files of the VxWorks Workbench kernel configuration GUI

I'm trying to generate a VxWorks 6.9.4.8 kernel configuration that is identical to another kernel workbench project. The Workbench 3.3.6 only allows GUI configuration.
Is there an underlying kernel configuration file, produced by the GUI, which can be replaced?
After updating the kernel configuration using the Workbench GUI, I see the following files have changed:
linkSyms.c,
prjComps.h,
prjConfig.c, and
prjParams.h
I guess my question is: which one, if any, uniquely identifies the kernel as built?
prjComps.h will contain the names of all the components you have chosen in the kernel configuration GUI.
The first step in creating a new kernel configuration based on another one is to use the GUI configurator to add the components missing from prjComps.h. It helps to use a diff tool such as 'beyond compare' and keep reducing the differences by adding or removing components. Remember not to edit this file directly, but only via the GUI configurator, because the tool calculates the dependent components and adds or removes them.
The second step is to create the new prjParams.h in the same way.
Workbench also lets you edit the kernel configuration from the command line via the vxprj tool in VxWorks 6.9 (this tool has been replaced by "wrtool" in VxWorks 7). Right-click the image project and choose 'Open Wind River VxWorks 6.9 Development Shell'.
If you want to add a component, e.g. the telnet client (INCLUDE_TELNET_CLIENT), you can use the following command
vxprj component add INCLUDE_TELNET_CLIENT
To remove a component
vxprj component remove INCLUDE_TELNET_CLIENT
For more on the vxprj tool, you can look up the documentation in the Workbench itself.
The project configuration is held in a handful of files in the kernel project directory.
These are:
.project
.cproject
.wrproject
projectname.wpj
Files such as prjComps.h, prjParams.h and prjConfig.c are all generated by the configuration tool; however, they are not configuration files themselves. Instead, they are generated C code that contains, amongst other things, a list of selected components.
These files are also re-generated, I believe, when you rebuild the project.
As such, these are not really the authoritative source you are interested in.
For this, you need to look at the project files. In terms of a list of components, the most interesting is the .wpj file, which contains amongst other things a list of explicitly and implicitly included components.
The explicitly included components are those you manually selected in the Kernel Configuration GUI, the implicitly included are those that were then included to satisfy dependencies.
This distinction can sometimes make comparing kernel configurations tricky. In that case you may want to fall back on the generated files, e.g. prjComps.h; however, you should always remember that these are a representation of the configuration, not the source.
The .project etc. configuration files are big and complex, but a decent diff tool such as BeyondCompare can make comparisons of the project directories fairly easy.
Thanks for the clue, @endTunnel. I looked at that file, and noticed that a few files get modified when I save my GUI selections.
prjComps.h - all the components #included in the kernel build
prjParams.h - the additional parameters set for the enabled components
prjConfig.c - the configuration and initialization calls for each module included.
'linkSyms.c' also gets modified. Not sure how that is used, yet.
I can now use diff to compare kernel configurations, and perhaps even duplicate a configuration (haven't tried that yet).
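A component-level comparison of two prjComps.h files could be sketched like this in Python. The sketch assumes that each selected component appears in the header as a `#define INCLUDE_...` line; that format is an assumption here, so check it against your own generated files. The function names are illustrative only.

```python
import re

# Assumption: each selected component appears as a "#define INCLUDE_..." line.
COMPONENT_DEFINE = re.compile(r"^\s*#\s*define\s+(INCLUDE_\w+)", re.MULTILINE)


def components(header_text):
    """Extract the set of component names from the text of a prjComps.h."""
    return set(COMPONENT_DEFINE.findall(header_text))


def compare(reference_text, candidate_text):
    """Return (missing from candidate, extra in candidate), both sorted."""
    reference = components(reference_text)
    candidate = components(candidate_text)
    return sorted(reference - candidate), sorted(candidate - reference)
```

The two lists correspond to components you would still need to add to, or remove from, the new project through the configuration tool rather than by editing the generated header.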

How to edit built in command behavior

I want to edit find_under_expand (ctrl+d) to treat hyphenated words as single words. So when I try to replace all instances of var a, it shouldn't match substrings of "a" in words like a-b, which it currently does.
I'm assuming find_under_expand wraps your current selection in regex boundaries like this: \ba\b
I need it to wrap in something like this: \b(?<!-)a(?!-)\b
Is the find_under_expand command's source available to edit? Or do I have to rewrite the whole thing? I'm not sure where to begin.
Sublime's commands are implemented in one of several ways: as macros, as plugins, and internally as part of the compiled program (probably as C++). The default macros and plugins can be found in the Packages/Default directory in ST2 (where Packages is the directory opened when selecting Preferences -> Browse Packages...), or zipped in the Installed Packages/Default.sublime-package file in ST3, extractable using @skuroda's excellent PackageResourceViewer plugin, available via Package Control. Macros have .sublime-macro extensions, while plugins are written in Python and have .py extensions.
I searched all through the Default package in ST3 (things are generally the same as in ST2), and was unable to find a macro or .py file that included the find_under_expand command, or FindUnderExpand, which is the naming convention for command classes in plugins. Therefore, I strongly suspect that this command is internal to Sublime, probably written in C++ and linked into the executable or into a .dll|.dylib|.so library.
So, it doesn't look like there's an existing file that you could easily modify to adjust for your negative lookahead/lookbehind patterns (I assume that's what those are, my regex is a bit rusty...). Instead, you'll have to implement your own plugin from scratch that reads the "word_separators" value in your settings file, which the current implementation of find_under_expand doesn't seem to be doing, judging from your previous question and my own testing. Theoretically, this shouldn't be too terribly difficult - you can just open up a quick panel where the user enters the pattern/regex to be searched for, and you can just iterate through the current view looking for matches and highlighting/selecting them.
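Since plugins are written in Python, the matching part of such a plugin can be tried out with the standard `re` module first. The sketch below only demonstrates the hyphen-aware pattern from the question on a plain string; the function names are made up, it assumes the searched word starts and ends with a word character, and wiring the results into a view's selection is left out.

```python
import re


def hyphen_aware_pattern(word):
    """Whole-word pattern that also rejects a hyphen on either side."""
    return re.compile(r"\b(?<!-)" + re.escape(word) + r"(?!-)\b")


def find_occurrences(text, word):
    """Return the start offsets of every hyphen-aware match of `word`."""
    return [m.start() for m in hyphen_aware_pattern(word).finditer(text)]


sample = "a = a-b + b-a + a"
print(find_occurrences(sample, "a"))                       # hyphen-aware
print([m.start() for m in re.finditer(r"\ba\b", sample)])  # plain \b...\b
```

On the sample string the plain word-boundary search also reports the "a" inside a-b and b-a, while the hyphen-aware pattern reports only the two standalone occurrences.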
Good luck!

What should NOT be under source control?

It would be nice to have a more or less complete list of the files and/or directories that shouldn't (in most cases) be under source control. What do you think should be excluded?
Suggestion so far:
In general
Config files with sensitive information (passwords, private keys etc.)
Thumbs.db, .DS_Store and desktop.ini
Editor backups: *~ (emacs)
Generated files (for instance DoxyGen output)
C#
bin\*
obj\*
*.exe
Visual Studio
*.suo
*.ncb
*.user
*.aps
*.cachefile
*.backup
_UpgradeReport_Files
Java
*.class
Eclipse
I don't know, and this is what I'm looking for right now :-)
Python
*.pyc
Temporary files
- .*.sw?
- *~
Anything that is generated. Binary, bytecode, code/documents generated from XML.
From my commenters, exclude:
Anything generated by the build, including code documentations (doxygen, javadoc, pydoc, etc.)
But include:
3rd party libraries that you don't have the source for OR don't build.
FWIW, at my work for a very large project, we have the following under ClearCase:
All original code
Qt source AND built debug/release
(Terribly outdated) specs
We do not have built modules for our software. A complete binary is distributed every couple weeks with the latest updates.
OS specific files, generated by their file browsers such as
Thumbs.db and .DS_Store
Some other Visual Studio typical files/folders are
*.cachefile
*.backup
_UpgradeReport_Files
My tortoise global ignore pattern for example looks like this
bin obj *.suo *.user *.cachefile *.backup _UpgradeReport_Files
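To get a feel for what such a pattern list catches, it can be approximated in Python with `fnmatch`. This is only an approximation for experimenting: matching every path component against every pattern is an assumption of the sketch, not a description of how TortoiseSVN evaluates its global ignore list.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# The ignore pattern quoted above, split into its individual globs.
GLOBAL_IGNORE = (
    "bin obj *.suo *.user *.cachefile *.backup _UpgradeReport_Files".split()
)


def is_ignored(path, patterns=GLOBAL_IGNORE):
    """True if any component of `path` matches one of the ignore globs."""
    return any(
        fnmatchcase(part, pattern)
        for part in PurePosixPath(path).parts
        for pattern in patterns
    )
```

For example, `is_ignored("MyApp/bin/Debug/MyApp.exe")` is true because of the `bin` directory, while `is_ignored("MyApp/Program.cs")` is false.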
files that get built should not be checked in
I would approach the problem a different way; what things should be included in source control? You should only source control those files that:
( need revision history OR are created outside of your build but are part of the build, install, or media ) AND
can't be generated by the build process you control AND
are common to all users that build the product (no user config)
The list includes things like:
source files
make, project, and solution files
other build tool configuration files (not user related)
3rd party libraries
pre-built files that go on the media like PDFs & documents
documentation
images, videos, sounds
description files like WSDL, XSL
Sometimes a build output can be a build input. For example, an obfuscation rename file may be an output and an input to keep the same renaming scheme. In this case, use the checked-in file as the build input and put the output in a different file. After the build, check out the input file and copy the output file into it and check it in.
The problem with using an exclusion list is that you will never know all the right exclusions and might end up source controlling something that shouldn't be source controlled.
As Corey D has said, anything that is generated, specifically anything generated by the build process or the development environment, is a good candidate. For instance:
Binaries and installers
Bytecode and archives
Documents generated from XML and code
Code generated by templates and code generators
IDE settings files
Backup files generated by your IDE or editor
Some exceptions to the above could be:
Images and video
Third party libraries
Team specific IDE settings files
Take third party libraries: if you need to ship one, or your build depends on one, it wouldn't be unreasonable to put it under source control, especially if you don't have the source. Also consider that some source control systems aren't very efficient at storing binary blobs, and you probably will not be able to take advantage of the system's diff tools for those files.
Paul also makes a great comment about generated files and you should check out his answer:
Basically, if you can't reasonably expect a developer to have the exact version of the exact tool they need, there is a case for putting the generated files in version control.
With all that being said ultimately you'll need to consider what you put under source control on a case by case basis. Defining a hard list of what and what not to put under it will only work for some and only probably for so long. And of course the more files you add to source control the longer it will take to update your working copy.
Anything that can be generated by the IDE, build process or binary executable process.
An exception:
4 or 5 different answers have said that generated files should not go under source control. That's not quite true.
Files generated by specialist tools may belong in source control, especially if particular versions of those tools are necessary.
Examples:
parsers generated by bison/yacc/antlr,
autotools files such as configure or Makefile.in, created by autoconf, automake, libtool etc,
translation or localization files,
files generated by expensive tools, where it might be cheaper to install the tools on only a few machines.
Basically, if you can't reasonably expect a developer to have the exact version of the exact tool they need, there is a case for putting the generated files in version control.
This exception is discussed by the svn guys in their best practices talk.
Temp files from editors.
.*.sw?
*~
etc.
desktop.ini is another windows file I've seen sneak in.
Config files that contain passwords or any other sensitive information.
Actual config files such as web.config in ASP.NET, because people can have different settings. Usually the way I handle this is by having a web.config.template that is in SVN. People get it, make the changes they want, and rename it to web.config.
Aside from this and what you said, be careful of sensitive files containing passwords (for instance).
Avoid all the annoying files generated by Windows (Thumbs.db) or Mac OS (.DS_Store).
*.bak produced by WinMerge.
additionally:
Visual Studio
*.ncb
The best way I've found to think about it is as follows:
Pretend you've got a brand-new, store-bought computer. You install the OS and updates; you install all your development tools including the source control client; you create an empty directory to be the root of your local sources; you do a "get latest" or whatever your source control system calls it to fetch out clean copies of the release you want to build; you then run the build (fetched from source control), and everything builds.
This thought process tells you why certain files have to be in source control: all of those necessary for the build to work on a clean system. This includes .designer.cs files, the outputs of T4 templates, and any other artifact that the build will not create.
Temp files, config for anything other than global development and sensitive information
Things that don't go into source control come in 3 classes
Things totally unrelated to the project (obviously)
Things that can be found on installation media, and are never changed (eg: 3rd-party APIs).
Things that can be mechanically generated, via your build process, from things that are in source control (or from things in class 2).
Whatever the language :
cache files
generally, imported files should not either (like images uploaded by users, on a web application)
temporary files ; even the ones generated by your OS (like thumbs.db under windows) or IDE
config files with passwords ? Depends on who has access to the repository
And for those who don't know about it : svn:ignore is great!
If you have a runtime environment for your code (e.g. dependency libraries, specific compiler versions etc.), do not put the packages into source control. My approach is brutal, but effective: I commit a makefile whose role is to download (via wget) the stuff, unpack it, and build my runtime environment.
I have a particular .c file that does not go in source control.
The rule is nothing in source control that is generated during the build process.
The only known exception is if a tool requires an older version of itself to build (bootstrap problem). In that case you will need a known good bootstrap copy in source control so you can build from blank.
Going out on a limb here, but I believe that if you use task lists in Visual Studio, they are kept in the .suo file. This may not be a reason to keep them in source control, but it is a reason to keep a backup somewhere, just in case...
A lot of time has passed since this question was asked, and I think a lot of the answers, while relevant, don't have hard details on .gitignore on a per language or IDE level.
GitHub came out with a very useful, community-maintained list of .gitignore files for all sorts of projects and IDEs that is worth taking a look at.
Here's a link to that git repo: https://github.com/github/gitignore
To answer the question, here are the related examples for:
C# -> see Visual Studio
Visual Studio
Java
Eclipse
Python
There are also OS-specific .gitignore files, as follows:
Windows
OS X
Linux
So, assuming you're running Windows and using Eclipse, you can just concatenate Eclipse.gitignore and Windows.gitignore to a .gitignore file in the top level directory of your project. Very nifty stuff.
Don't forget to add the .gitignore to your repo and commit it!
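The concatenation step can be done by hand, but a small Python sketch shows the idea. It assumes the template files have already been downloaded locally; the function name and the comment banner written before each section are choices made for this example.

```python
from pathlib import Path


def combine_templates(template_paths, destination):
    """Concatenate .gitignore templates, labelling each section."""
    sections = []
    for template in template_paths:
        template = Path(template)
        body = template.read_text(encoding="utf-8").rstrip("\n")
        # '#' starts a comment line in .gitignore, so the banner is harmless.
        sections.append(f"# ---- {template.name} ----\n{body}\n")
    Path(destination).write_text("\n".join(sections), encoding="utf-8")
```

Calling `combine_templates(["Eclipse.gitignore", "Windows.gitignore"], ".gitignore")` in the top level directory of the project would produce the combined file described above.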
Chances are, your IDE already handles this for you. Visual Studio does anyway.
As for the .gitignore files, if you see any files or patterns missing in a particular .gitignore, you can open a PR on that file with the proposed change. Take a look at the commit and pull request trackers for ideas.
I always use www.gitignore.io to generate a proper .gitignore file.
Opinion: everything can be in source control, if you need to, unless it brings significant repository overhead such as frequently changing or large blobs.
3rd party binaries, hard-to-generate (in terms of time) generated files to speed up your deployment process, all are ok.
The main purpose of source control is to match one coherent system state to a revision number. If it would be possible, I'd freeze the entire universe with the code - build tools and the target operating system.