We are open-sourcing a Java software system that was previously proprietary. We have loosely followed semantic versioning, as defined by Tom Preston-Werner, where:
Bugfixes imply a patch update (for example, 1.0.X)
Changes to your Public API that are backwards-compatible imply a minor update (for example, 1.X.0)
Changes to your Public API that are backwards-incompatible imply a major update (for example, X.0.0)
The task of open-sourcing the system required us to rename packages. We also felt we should consolidate many of the modules that existed previously.
The restructuring tasks do not alter the Public APIs, but they do change the dependencies for API users.
Where does restructuring/package renaming fit into semantic versioning? How is a restructuring like this handled in well-known open-source projects?
If client code has to be changed to work with a new version, that's an incompatible API change.
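For illustration only (these package names are invented, not the project's real ones), the rename plays out like this from a client's point of view:

// before open-sourcing: client code compiled against the proprietary package name
import com.mycompany.internal.reporting.ReportBuilder;

// after the rename: the same class now lives under the new package name
import org.myproject.reporting.ReportBuilder;

The classes and methods themselves may be unchanged, but every fully qualified name the client compiles against has changed, so existing client code breaks until its imports (and any build-level dependency coordinates) are updated. Under semantic versioning, that makes the restructuring a major (X.0.0) release, even though the "shape" of the Public API is the same.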
What is "vendoring" exactly? How would you define this term?
Does it mean the same thing in different programming languages? Conceptually speaking, not looking at the exact implementation.
Based on this answer
Defined here for Go as:
Vendoring is the act of making your own copy of the 3rd party packages your project is using. Those copies are traditionally placed inside each project and then saved in the project repository.
The context of this answer is in the Go language, but the concept still applies.
If your app depends on certain third-party code being available, you can declare a dependency and let your build system install it for you.
If, however, the source of the third-party code is not very stable, you can "vendor" that code: you take the third-party code and add it to your application in a more or less isolated way. If you take this isolation seriously, you should "release" this code internally to your organization/working environment.
Another reason for vendoring is if you want to use certain third-party code but change it a little (in other words, a fork). You can copy the code, change it, release it internally, and then let your build system install this piece of code.
Vendoring means putting a dependency into your project folder (vs. depending on it globally) AND committing it to the repo.
For example, running cp /usr/local/bin/node ~/yourproject/vendor/node and committing it to the repo would "vendor" the Node.js binary – all devs on the project would use this exact version. This is not commonly done for Node itself, but e.g. Yarn 2 ("Berry") is used like this (and only like this; they don't even install the binary globally).
The act of committing is important. For example, node_modules are already installed in your project, but only committing them makes them "vendored". Almost nobody does that for node_modules, but e.g. PnP + Zero-Installs in Yarn 2 are actually built around vendoring – you commit .yarn/cache, with its many ZIP files, into the repo.
"Vendoring" inherently brings tradeoffs between repo size (longer clone times, more data transferred, local storage requirements etc.) and reliability / reproducibility of installs.
Summarizing other, (too?) long answers:
Vendoring is hard-coding an (often forked) version of a dependency.
This typically involves static linking or some other form of copying, but it doesn't have to.
Right or wrong, the term "hard-coding" has an old and bad reputation, so you won't find it near projects that openly vendor; however, I can't think of a more accurate term.
As far as I know the term comes from Ruby on Rails.
It describes a convention to keep a snapshot of the full set of dependencies in source control, in directories that contain package name and version number.
The earliest occurrence of vendor as a verb I found is the vendor everything post on err the blog (2007, a bit before the author co-founded GitHub). That post explains the motivation and how to add dependencies. As far as I understand the code and commands, there was no special tool support for calling the directory vendor at that time (patches and code snippets were floating around).
The err blog post links to earlier ones with the same convention, like this fairly minimal way to add vendor subdirectories to the Rails import path (2006).
Earlier articles referenced from the err blog, like this one (2005), seemed to use the lib directory, which didn't make the distinction between own code and untouched snapshots of dependencies.
The goal of vendoring is more reproducibility and better deployment (the kind of things people currently use containers for), as well as better transparency through source control.
Other languages seem to have picked up the concept as is; one related concept is lockfiles, which define the same set of dependencies in a more compact form, involving hashes and remote package repositories. Lockfiles can be used to recreate the vendor directory and detect any alterations. The lockfile concept may have come from the Ruby gems community, but don't quote me on that.
The solution we’ve come up with is to throw every Ruby dependency in vendor. Everything. Savvy? Everyone is always on the same page: we don’t have to worry about who has what version of which gem. (we know) We don’t have to worry about getting everyone to update a gem. (we just do it once) We don’t have to worry about breaking the build with our libraries. […]
The goal here is simple: always get everyone, especially your production environment, on the same page. You don’t want to guess at which gems everyone does and does not have. Right.
There’s another point lurking subtly in the background: once all your gems are under version control, you can (probably) get your app up and running at any point of its existence without fuss. You can also see, quite easily, which versions of what gems you were using when. A real history.
I am working on a project with developers around the globe and we are using Mercurial for our source control solution. Currently, we communicate our changesets by creating bundles and posting them to a mailing list. A disagreement has arisen about best practices, and we have not been able to find an answer in the Mercurial documentation.
When creating a bundle, is there any sort of internal integrity check that occurs? Or should we be sending digests along with the change set to ensure file integrity?
A bundle contains exactly the same data as is transferred by the wire protocol. Because of the way Mercurial works, there is a recursive hashing scheme going on (each revision's identifier is a hash covering its content and its parents), so every revision must be uncorrupted in order to be usable; corruption would therefore be detected when the bundle is applied, so there is no need to send separate digests.
Pre-warning: there are some other questions similar to this one, but they don't quite answer the question (these include: Alternatives to Windows Workflow Foundation?, Can anyone recommend a .Net open source alternative to Windows Workflow?)
We are developing a system that is an event-based state machine; we are currently investigating Windows Workflow. Our system needs to be low-latency in its response to events from a multitude of sources (XMPP, HTTP, SMS, phone calls, email, etc.) coming into the system, scalable, resilient and, most importantly, customisable. For a variety of reasons (and due diligence) I am looking for open workflow engines that support functions similar to Windows Workflow Foundation (and more, if possible), mainly the following (though it doesn't matter too much if some engines don't support every feature):
Persistence of long running tasks, and resumption of tasks on external events
High performance, low latency
Ability to develop custom actions
The ability to specify workflows dynamically
Tracking and tracing
I am not constrained to a platform or language, and I would love some help and tips from you guys so that I can start to investigate the engines more closely, as well as any experiences you have had with them.
Paul.
I invite you to examine Stateless further, as suggested in the answer to my SO question can-anyone-recommend-a-net-open-source-alternative-to-windows-workflow. Achieving the goal of a long-running state machine is very simple: you can store the current state of your state machine in a database and re-sync the state machine when needed. Consider the following code from the Stateless site:
Stateless has been designed with encapsulation within an ORM-ed domain model in mind. Some ORMs place requirements upon where mapped data may be stored. To this end, the StateMachine constructor can accept function arguments that will be used to read and write the state values:
var stateMachine = new StateMachine<State, Trigger>(
() => myState.Value,
s => myState.Value = s);
With very little effort you can persist your state, then retrieve that state easily later on.
With respect to updating the workflow dynamically, if you configure a state machine such as
var stateMachine = new StateMachine<string, int>();
and maintain a separate file of states and triggers in XML, you can perform the configuration at runtime by looping through the string/int value pairs.
"Java side":
Apache ODE (Orchestration Director Engine) executes business processes written following the WS-BPEL standard. It talks to web services, sending and receiving messages, handling data manipulation and error recovery as described by your process definition. It supports both long and short living process executions to orchestrate all the services that are part of your application.
http://ode.apache.org/
OSWorkflow can be considered a "low level" workflow implementation. Situations like "loops" and "conditions" that might be represented by a graphical icon in other workflow systems must be "coded" in OSWorkflow.
http://www.opensymphony.com/osworkflow/
Shark is an extendable workflow engine framework, including a standard implementation completely based on WfMC specifications, using XPDL (without any proprietary extensions!) as its native workflow process definition format and the WfMC "ToolAgents" API for server-side execution of system activities.
http://www.enhydra.org/workflow/shark/index.html
Python side:
http://bika.sourceforge.net/
http://www.vivtek.com/wftk/
I hope this will help you :-)
You might consider implementing your flow as an actual state machine. Tools like the State Machine Compiler and Ragel can help with this. State machines, in many circumstances, are just what you need to implement insanely complex behavior that is testable and rock-solid. I don't claim to be a Windows Workflow expert, but from what I have seen, I question its superiority over coding your own state machine, either by hand or using a tool.
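As a rough hand-rolled sketch of that idea in Java (the states, events and class name are invented for illustration, not taken from any of the tools or engines mentioned in this thread), a small enum-based transition table is often enough, and it is trivial to unit-test:

import java.util.EnumMap;
import java.util.Map;

public class OrderFlow {

    enum State { NEW, IN_PROGRESS, DONE }
    enum Event { START, FINISH }

    // Transition table: current state -> (event -> next state).
    private static final Map<State, Map<Event, State>> TRANSITIONS = new EnumMap<>(State.class);
    static {
        TRANSITIONS.put(State.NEW, new EnumMap<>(Event.class));
        TRANSITIONS.get(State.NEW).put(Event.START, State.IN_PROGRESS);
        TRANSITIONS.put(State.IN_PROGRESS, new EnumMap<>(Event.class));
        TRANSITIONS.get(State.IN_PROGRESS).put(Event.FINISH, State.DONE);
    }

    private State current = State.NEW;

    // Apply an event; undefined transitions are rejected, which keeps behaviour explicit and testable.
    public State fire(Event event) {
        State next = TRANSITIONS.getOrDefault(current, Map.of()).get(event);
        if (next == null) {
            throw new IllegalStateException("No transition from " + current + " on " + event);
        }
        current = next;
        return current;
    }

    public static void main(String[] args) {
        OrderFlow flow = new OrderFlow();
        System.out.println(flow.fire(Event.START));   // IN_PROGRESS
        System.out.println(flow.fire(Event.FINISH));  // DONE
    }
}

Because the current state is a single enum value, persisting a long-running instance amounts to saving that value (plus whatever context you need) to a database and reconstructing the object later, much as the Stateless example earlier in this thread does.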
You might want to check out Simple State Machine.
If you feel like you want to have more control over things and want to roll your own it might be helpful to check out the Saga support that projects like NServiceBus and MassTransit use. Sagas look to be very similar to WF workflows but are POCO objects and I believe both projects just use NHibernate for Saga persistence.
I'm going to recommend you take a few hours to look at the book Open-Source ESBs in Action. "Orchestration" and "Choreography" are the key buzzwords to look at when dealing with "enterprise service busses." The systems for .NET are quite expensive (BizTalk is in the price range of a decent car, the price of Tibco is in the price range of a decent house).
Other links:
Open ESB project
Comparison of OpenESB and ServiceMix (both of which are the subject of the "In Action" book above).
Try Drools for Java. I personally have never tried it, but I know several commercial applications are based on Drools.
http://www.jboss.org/drools/
You could also upgrade to .NET 4.0; there are major improvements to Workflow in the new framework. I know that if I were writing a new workflow application, I would jump to 4.0.
Good Luck
JBoss JBPM
Consider Workflow Engine, a lightweight all-in-one component that enables you to add custom executable workflows of any complexity to any .NET or Java software, be it your own creation or a third-party solution, with minimal changes to existing code. It supports custom actions and commands, has timers and supports parallel workflows. And there's a free version.
You can take a look at Imixs-Workflow, which is an event-driven state machine approach based on BPMN 2.0. It focuses especially on human-centric, long-running tasks.
I'm using Mercurial for personal use and am contemplating it for some distributed projects as an alternative to SVN, for various reasons.
I'm getting comfortable with using it for self-contained projects and can see various options for sharing; however, I haven't yet found any guidance on managing common libraries to be included in multiple projects, in a manner similar to that provided by externals in Subversion.
The most obvious shared lump of code is error handling and reporting - we want this to be pretty much the same in all projects (it's fairly well evolved). There is also utility code, control libraries and the like that we find better to have as projects built with each solution than to pull in as compiled classes (not least because this ensures they are kept up to date; continuous integration helps us address breaking changes).
Thoughts? (I hate open-ended questions, but I want to know what, if anything, others are doing.)
Mercurial 1.3 now includes nested repository support, which can be used to express dependencies. The other option is to let your build system handle the download and tracking of dependencies, using something like Ivy or Maven, though those are more focused on pulling down compiled code.
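If you go the nested-repository (subrepo) route, the wiring is roughly as follows (the path and URL below are placeholders): a .hgsub file at the root of the parent repository maps a local path to the shared library's repository, e.g.

libs/errorhandling = https://hg.example.com/shared/errorhandling

Once .hgsub is committed (and the subrepository exists at that path), each commit of the parent records the exact subrepo revision in an auto-generated .hgsubstate file, so every clone of the parent project checks out a pinned revision of the common code rather than whatever happens to be at the tip.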
The world has changed since I asked that question and the solution I now use is different.
The simple answer is now to use packages (specifically NuGet as I do .NET) to deliver the common code instead of nesting repos and including the projects in a solution.
So I have common code built into NuGet packages by, and hosted using, TeamCity; where previously I would have used an external and included the project/source, I now just reference the package.
Use the Forest Extension; it emulates svn externals for Hg, to some extent that is.
Subrepositories (with a good guide) or Guestrepo ("to overcome ... limitations" of subrepos) are today's language-agnostic answer.
I am currently working on automating/improving the release process for packaging my shop's entire product. Currently the product is a combination of:
Java server-side codebase
XML configuration and application files
Shell and batch scripts for administrators
Statically served HTML pages
and some other stuff, but that's most of it
All or most of these contain various versioning information, used for varying purposes. Part of the release packaging process involves doing a lot of finding, grep'ing and sed'ing (in scripts) to update that information. This glue that packages the product seems to have been cobbled together in an organic, just-in-time manner, and is pretty horrible to maintain. For example, some Java methods create Date objects for the time of release, the arguments for which are updated by textual replacement, without compiler validation... just, urgh.
I'm trying to avoid giving examples of actual software used (i.e. CVS, SVN, Ant, etc.) because I'd like to avoid "use xyz's feature to do this" answers and concentrate more on general practices. I'd like to blame shoddy design for the problem, but if I had to start again, still using varying technologies, I'd be unsure how best to go about handling this, beyond laying down conventions.
My question is: are there any best practices, or hints and tips, for maintaining and updating versioning information across different technologies, file types, platforms and version control systems?
Create a properties file that contains the version number and have all of the different components reference that properties file:
Java files can read the properties at runtime (for example, via java.util.Properties; see the sketch after this list)
XML can use includes?
HTML can use JavaScript to write the version number from the properties into the page
Shell scripts can read the file in
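As a rough sketch of the Java side (the file name, location and key below are placeholders, not a prescribed layout), the shared file might contain a single line such as product.version=1.4.2, and any Java component can read it at startup:

import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class ProductVersion {

    // Reads the shared version.properties from the classpath;
    // the file name and property key are illustrative only.
    public static String get() throws IOException {
        Properties props = new Properties();
        try (InputStream in = ProductVersion.class.getResourceAsStream("/version.properties")) {
            if (in == null) {
                throw new IOException("version.properties not found on the classpath");
            }
            props.load(in);
        }
        return props.getProperty("product.version", "unknown");
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Running product version " + ProductVersion.get());
    }
}

With the number living in one data file, the shell scripts, the build, and the HTML generation can all read the same source of truth instead of being patched by grep/sed at release time.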
Indeed, to complete Craig Angus's answer, the rule of thumb here should be not to include any meta-information in your normal delivery files, but to record that meta-data (version number, release date, and so on) in one special file included in the release.
That helps when you use one VCS (Version Control System) tool from development through homologation to pre-production.
That means that whenever you load a workspace (whether for developing, testing, or preparing a release for production), it is the versioning tool that gives you all the details.
When you prepare a delivery (a set of packaged files), you should ask that VCS tool for every piece of meta-information you want to keep, and write it into a special file that is itself included in the said set of files.
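A minimal sketch of that special-file step in Java (the property names and values are invented; in a real pipeline they would come from querying your VCS tool at packaging time):

import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class WriteReleaseInfo {

    public static void main(String[] args) throws IOException {
        // In practice these values would be obtained from the VCS tool at packaging time;
        // they are hard-coded here only to keep the sketch self-contained.
        String version = "2.3.1";
        String releaseDate = "2009-11-05";
        String vcsLabel = "RELEASE_2_3_1";

        Properties meta = new Properties();
        meta.setProperty("product.version", version);
        meta.setProperty("release.date", releaseDate);
        meta.setProperty("vcs.label", vcsLabel);

        // The packaging scripts then simply drop release-info.properties into the delivery.
        try (OutputStream out = Files.newOutputStream(Path.of("release-info.properties"))) {
            meta.store(out, "Release meta-data generated at packaging time");
        }
    }
}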
That delivery should be packaged in an external directory (outside any workspace) and:
copied to a shared directory (or a Maven repository) if it is a non-official release (just a quick packaging to help the team next door who is waiting for your delivery). That way you can make 10 or 20 deliveries a day; it does not matter, as they are easily disposable.
imported into the VCS in order to serve as official deliveries, and in order to be deployed easily, since all you need is to ask the versioning tool for the right version of the right delivery, and you can begin to deploy it.
Note: I just described a release management process mostly used for many inter-dependent projects. For one small single project, you can skip the import into the VCS tool and store your deliveries elsewhere.
In addition to Craig Angus's points, include the versions of the tools used.