How to upgrade OSS to a new version with local changes?

I'm working on a project where we are using the Orchard CMS.
We've made some modules and themes of our own for the CMS, and we also had to make some code changes in the Orchard sources themselves. This isn't good practice, of course, because those changes will be lost when the CMS is upgraded. However, making them was unavoidable at the time, because the default functionality didn't work the way our customer wanted.
All of these changes were made against an older version (1.5.1), and now we need to move to the current version (1.6 at the moment). By doing this we will lose all the changes we've made to the Orchard sources.
I've already bundled all the changes we've made to the default sources into one changeset, which makes merging them into the new version of the CMS a bit easier. However, this still isn't a good workflow in my opinion.
We could choose to fork the project and make the changes in the fork, but we will have the same difficulties when upgrading, as we still need to merge all of our changes when we upgrade the fork.
Creating pull requests isn't really an option either, because it's likely the changes won't be accepted into the master repository; not every user will want them.
Is there some other common practice we can use in this project?
Note: At the moment I'm asking this for the project I'm working on, but I'd like to know in general how to work with OSS projects where you need to change the sources of a project and still want to get the latest version once in a while.

As long as you cloned the Orchard repo, you can pull changes from it at any time. You may have to resolve some conflicts with a compare tool wherever a file you changed was also changed in the Orchard repo since your latest merge. It's relatively painless though.
I keep a copy of my repo on Bitbucket and just pull from the Orchard repo when they release new versions. Typically I need to resolve conflicts for the solution file and a couple of things here and there. Once these conflicts are resolved, you don't need to deal with them again each time you update; that only happens with files that both you and Orchard keep changing (like the solution file).
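A minimal sketch of that workflow, assuming you started from a clone of the official Orchard repository (the URL below is a placeholder for wherever the upstream repo lives):

    # One-time setup: clone the upstream Orchard repository
    hg clone https://hg.example.org/orchard orchard-custom
    cd orchard-custom

    # ...commit your own modules, themes and source tweaks here...

    # When a new Orchard version is released, pull it in and merge
    hg pull https://hg.example.org/orchard
    hg merge          # resolve conflicts (e.g. the solution file) with your compare tool
    hg commit -m "Merge Orchard 1.6 into our customized sources"

The key point is that your customizations live as ordinary changesets on top of the upstream history, so each upgrade is just another pull and merge.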

Related

Storing PCB files *and* software files in the same Mercurial repo

I have my Banana Pi set up as my Mercurial server. It works well for me for my software, as generally speaking I have firmware and that's about it in my repositories, and I can access it via OpenVPN from anywhere in the world. However, I have now started to use version control for my PCB files as well, due to a new CAD system which complicates my old, crude but effective way of doing my PCB archiving and backup. (Also, everything in my new CAD system, all the PCBs and schematics, is stored as text files, which makes version control work nicely.)
So, with Mercurial I started doing as I did with software and creating a new repo for my PCB for one of the boards I'm updating for a customer, and immediately came across an issue that svn seems to cope with easily and I was wondering whether Mercurial can do the same.
I have my BH0001 project repository which has all the embedded C in it and I have started creating a new issue of the PCB for which the C code is used. I had to create a new Mercurial repo called BH0001_pcb to differentiate between code and PCB. With svn you can have a project repo and then Hardware and Software directories within the project number, but still be able to check out the two different types of files to different places independently.
I could, of course, clone the BH0001 software repository to a local machine, add the PCB info in a new folder in the local Mercurial repo, and push it all back to the server, and it would be perfectly happy. The problem then comes when checking out, because I would be cloning both firmware and PCB onto a machine when I might only want one or the other.
Also, this goes against how I store stuff locally. In my /username/home directory I have a Software directory and a CAD directory and within those I have projects. So I would have:
home/CAD/CustomerName/BH0001
and
home/Software/CustomerName/BH0001.
If I'm to carry on using my current method, do I have to:
1. Change my local directory structures to be something like
   home/Projects/CustomerName/BH0001/CAD
   and
   home/Projects/CustomerName/BH0001/Software?
2. Suck it up and use things like ProjectName_pcb for separate repos?
3. Find some other way I can't think of / can't find / am unaware of? e.g. is there a way of checking out part of a Mercurial repository to one directory and a different part of the repo to a different directory?
Or should I just use svn if I really want to carry on as I have?
With default Mercurial you currently cannot do partial repository clones the way you can with SVN, so your approach of using separate repositories is a good choice.
However, there are ways to achieve a similar result: sub-repositories. In your case I'd create a parent repository which contains your two current repositories as sub-repositories. Mind though, sub-repositories have some rough edges, so read the linked page carefully. I'd especially stress that it's good practice to have a parent repo which basically only contains the 'real' repos and not much of its own.
There are ideas like a thin or narrow clone (which would be roughly equivalent to what SVN does), but I haven't seen them in production.
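A rough sketch of that setup, using made-up server paths and assuming the firmware repo is BH0001 and the PCB repo is BH0001_pcb as described above:

    # A thin parent repo whose only job is to glue the two real repos together
    hg init BH0001-project
    cd BH0001-project

    # Bring the existing repos in as working copies (paths/URLs are placeholders)
    hg clone ssh://server/hg/BH0001      firmware
    hg clone ssh://server/hg/BH0001_pcb  pcb

    # Declare them as subrepositories
    printf 'firmware = ssh://server/hg/BH0001\npcb = ssh://server/hg/BH0001_pcb\n' > .hgsub
    hg add .hgsub
    hg commit -m "Parent repo with firmware and pcb subrepos"

Cloning either repo on its own still works exactly as before; the parent is only there for anyone who wants both checked out together.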

Dealing with customized local source files in Mercurial

Our team of 4 currently uses svn, but I am testing out Mercurial. We code mostly in C++ and C# on .NET. I have installed TortoiseHg and use hgsubversion to push/pull against the existing svn repo. Other developers on the team will remain on svn for now.
I tend to have a few source files that I have customized (butchered would be a better term) in some specific way that I don't want to push to other developers. But I do want these custom changes compiled into my version of the project. The ratio of normal files to customized files is about 100:1.
What is the best way to deal with these butchered files in mercurial? I could simply uncheck butchered files during commits, but eventually, I will forget this step. I have taken a brief look at shelving and ignoring.
Ignoring doesn't seem right, because these are tracked files. I do want to pull changes to butchered files from others. Shelving was close, but it removes the butchered code from my working copy, so that isn't correct either.
I can't be the only code butcher out there. Let me know how you deal with this. Git users, we are happy to hear from you too, we haven't entirely committed to using hg.
If they're files most everyone will be modifying (per-developer database account info, etc.), the right way to do it is by tracking a xxxxx.sample file. For example, have the build script use local-database.config, which is in the .hgignore, and provide a local-database.config.sample file in the repository. You can either include "copy the sample to the actual file" in your instructions or have the build script do it automatically if the actual file doesn't already exist. We have our config do an include of settings.local if and only if it exists, and allow it to override the main settings. Then no one needs a local config, but it can optionally be used to alter anything without the risk of committing it.
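A hedged sketch of that pattern in a shell build script; the file names (local-database.config and its .sample twin) are just illustrative:

    # .hgignore (glob syntax) keeps the real config out of the repo:
    #     local-database.config
    # while local-database.config.sample stays tracked.

    # In the build script: create the real config from the sample the first
    # time only, so each developer's local edits are never overwritten.
    if [ ! -f local-database.config ]; then
        cp local-database.config.sample local-database.config
        echo "Created local-database.config from the sample; edit it for your machine."
    fi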
If they're files that only you will be modifying, the best way is to learn Mercurial Queues (mq). Then you can pop the changeset with your you-only changes, push to the central repo, and then apply it again for your builds. There are various extensions that do similar things (shelve, for example), but mq is preferable because you can version the overlay you're popping and applying.
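A minimal sketch of that mq round-trip (the patch name is made up):

    # Enable the extension once in ~/.hgrc:
    #     [extensions]
    #     mq =

    hg qnew local-hacks        # start an empty patch for your private changes
    # ...edit the butchered files...
    hg qrefresh                # fold the edits into the local-hacks patch

    hg qpop                    # take the private patch off before sharing work
    hg push                    # only the clean changesets go to the central repo
    hg qpush                   # put the private patch back on for your own builds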

Advice on the structure of my repository

I am an applied mathematician and I have recently joined a project that involves the development of production code for our scientific application. The code base is not small and it's deployed as part of a web application.
When I joined, the code was miraculously maintained without a revision control system. There was a central folder on a server, and researchers would copy from it when they needed to work with the code. Inside this root directory there was a set of directories with different versions of the code, so people would start working from the latest version they found and create a new one with their modifications.
I created a Mercurial repository, added all the code versions to it and convinced everyone to use it. However, since moving to Mercurial, we have felt little if any need to update version numbers, even though using hg copy allows us to keep the revision history.
Here's where I need your advice on best practices of maintaining this code base. Does it make sense under a RCS to keep folders with different versions in a repo? If we keep a single copy of our code in the repo, what's the most common way to track versions? The README files? Should we keep snapshots of the code outside the repo specifying versions? Does it make sense to keep things as they are? What strategies do you use?
Our team is a bunch of scientists and no one has experience on how to maintain such a repo, so I'm interested in what is commonly done.
If you are going to use a version control system, forget about those version folders. Completely. Mercurial will do that for you; the repository is a complete history of every file in the project.
A common way to track version numbers is with tags. You assign a tag with the version number to a changeset.
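For example (the version number is illustrative):

    hg tag 1.0.0            # tag the current changeset as version 1.0.0
    hg tags                 # list all versions
    hg update -r 1.0.0      # recreate exactly that version in the working copy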
As a "getting started guide" to version control, I suggest this book: Version Control by Example. It's free and it starts from the beginning: it covers CVCSs, DVCSs, the fundamentals, what a repository is, basic commands, and so on. It also has some interesting analogies, like the 3D file system: directories x files x time. The book is fun and easy to understand, and I highly recommend it.
I also recommend a GUI tool like TortoiseHg. In daily usage I spend most of my time in the console, but the GUI is very handy, especially in the beginning when you don't know all the commands yet. And the best part is the graph: you get visual feedback on what is going on.
This is a good and quick introduction to Mercurial, it even starts out by talking about how using folders to keep different versions is not so great.
I think you're probably on the wrong track if you are using the hg copy command; I've never needed it ;)
The tutorial teaches the command line version of hg, which I personally prefer. When you need a better overview of your repository, you can run "hg serve" and open localhost:8000 in your web browser. I prefer that over TortoiseHG, but I realize that many people want a pure GUI tool.

Mercurial & Web Development: How to handle publishing/deployment?

At work we're moving from no SCM to Mercurial. It's a bit of a learning curve, but after messing with it for two days I definitely feel more comfortable with it.
I still have one big, unresolved question though in my mind: Once code is finished, how do we handle the actual deployment?
Should we be running a copy of Mercurial on the production (live) server? Or should we set rsync or something up to sync from the repo to the web directory? What's the best practice here?
If we do go w/ just pointing Apache at the repo, I assume this is okay as long as we're careful not to hg update to a different, non-stable branch? That still seems a little dangerous to me though. Is there some way to force it to only switch to certain builds?
Or is pointing Apache at the repo just a terrible idea, and should I be doing something else instead?
On a related topic, I've also heard some talk about putting any upgrade scripts (such as schema changes for MySQL) under version control so they can be run when the version is deployed. But how would that even work as part of the workflow? I wouldn't want to keep it w/ everything else, because it's a temporary, one-time-use script...
Thanks for any advice you guys can give.
I recently discovered the hg archive command, so I think we'll go w/ that instead. I've written a bash script that updates to the head of the 'production' branch and then archives it to a predetermined destination. Seems to work.
I'd still appreciate any feedback you guys have as to whether this is a good idea or not.
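For what it's worth, a rough sketch of what such a script might look like, assuming a branch literally named 'production' and placeholder paths:

    #!/bin/bash
    # Export a clean snapshot (no .hg metadata) of the production branch.
    set -e
    cd /path/to/repo
    DEST=/var/www/releases/$(date +%Y%m%d-%H%M%S)

    hg update -r production                     # move to the head of the production branch
    hg archive -r production -t files "$DEST"   # export only the tracked files
    echo "Deployed snapshot to $DEST"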
I think pointing Apache at the repo is definitely a bad idea; hg archive is fine if all you want is to take a snapshot of the dev files.
I find my development source files and a deployed application (even for a web app that doesn't need compiling) are usually very different, the latter being derived from a subset of the former.
I tend to use a shell script or even a Makefile to "build" a deployed application in a subdirectory of the development directory. This could be as simple as creating a directory tree and copying the necessary files, or it could include compressing scripts, etc.
This way you have to make a conscious decision whether or not to include a file in the deployed version, thus helping prevent accidentally leaving development utility files in an online application that could cause a security risk.
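A hedged sketch of that kind of build step; the directory layout and file names are invented for illustration:

    #!/bin/bash
    # Assemble a deployable subset of the working copy under ./deploy
    set -e
    rm -rf deploy
    mkdir -p deploy/css deploy/js

    # Only files listed here end up on the live server; everything else
    # (tests, dev utilities, local configs) stays out by default.
    cp index.php deploy/
    cp css/*.css deploy/css/
    cp js/*.js   deploy/js/
    # Optionally minify or compress assets here before shipping deploy/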
The only part Mercurial plays: for a major release I create a new named branch (e.g. 1.5), while development continues on the default branch. Subsequent bug fixes or patches can be transplanted to the release branch if necessary, and if a bug-fix release is made I tag the release branch with the new version (e.g. 1.5.1).
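As a sketch of that branching scheme (revision and version numbers are illustrative, and transplant is an extension that has to be enabled):

    # Cut a release branch for a major version; development stays on default
    hg update default
    hg branch 1.5
    hg commit -m "Open the 1.5 release branch"

    # Later, move a bug fix from default onto the release branch
    hg update 1.5
    hg transplant <rev-of-the-fix>       # hg graft also works on newer Mercurial
    hg tag 1.5.1                         # tag the bug-fix release
    hg update default                    # back to normal development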

To keep my own versioned app or not

I need some opinions here.
I'm working on a Django project using buildout to get the dependencies, etc...
I use Mercurial as my DVCS.
Now... I need to customize one of the dependencies, so I can do one of the following:
(* The changes may not be useful for everyone else.)
1- Fork the project (on GitHub, Bitbucket, etc.), maintain my own version, and get the dependency with the (mercurial or git) buildout recipe.
2- Clone the project, put it on the PYTHONPATH, erase the DVCS directories and add it to my project's own version control, so every change stays private. Here I'd need to erase all of their DVCS info or something.
Any other option you can think of?
Am I missing something? Am I way off?
Thanks!
Esteban, take these steps (I'll talk in Mercurial-speak, but this is all doable in Git too):
clone their project
make your clone of their project a subrepo in your project
That gives you the best of all worlds. You can edit code in your project and their project without paying attention to which is which, and when you commit, the changes to your code go into your repo along with a pointer to a new changeset in your clone of their project. Then, when you want to update your clone of their project, you can do so in place and merge simply.
So this is pretty much what you said in '1', but there's no need to do a fork or host that repo publicly. Just edit your clone of their project as a subrepo of your project and never push (which wouldn't work anyway, since you don't have write access to their repo).
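A rough sketch of those steps, with a made-up dependency name and URL:

    # Inside your project's repository
    hg clone https://bitbucket.example/author/somedep deps/somedep
    echo 'deps/somedep = https://bitbucket.example/author/somedep' >> .hgsub
    hg add .hgsub
    hg commit -m "Track somedep as a subrepo"

    # Customize the dependency: edit under deps/somedep, commit there first,
    # then commit in the parent so it records which subrepo changeset you use.

    # Later, pull upstream improvements and merge them with your edits
    cd deps/somedep
    hg pull https://bitbucket.example/author/somedep
    hg merge
    hg commit -m "Merge upstream into customized copy"
    cd ../.. && hg commit -m "Point at the updated somedep"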
Your option two's primary drawback is that, as they modify and improve the project you depend on, you'll have a hard time pulling their improvements in and merging them with yours.
Well, if you're using a DVCS then all your commits are kept as changesets, and people can choose to apply your changeset or not. So as long as you comment that change, people can choose to apply it or not as they see fit. What's more, if they don't want that change but do want your other changes, they can pick and choose. So the truth is the DVCS takes care of the problem for you (provided the people pulling from you are using the DVCS properly).
Personally, I recommend forking, but like I said, it doesn't really matter.
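For example, a sketch of how someone could pick and choose individual changesets (the revision number is a placeholder):

    # Share a single change as a patch
    hg export -r 1234 > useful-change.patch

    # The other side applies just that change
    hg import useful-change.patch

    # Or, after pulling from your repo, cherry-pick one changeset directly
    hg graft 1234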
You ask this question in a rather confusing way, and I don't know if you really understand the point of a DVCS.
The whole point of a DVCS is to allow you to have your own private repository. You do not need to publish your repository on GitHub or Bitbucket or any of those places unless you want to, but I certainly would not erase the DVCS information.
If the upstream project makes changes you do want in addition to your own private changes, you will have a devil of a time merging them unless you keep the DVCS information around.
Using Mercurial, you can include a project in yours by using the Mercurial subrepo feature.