I'm working on a new feature branch. It's necessary to keep all the history, but for someone scouring over the history at a later date, much of it is over verbose.
For example I may have 5 commits taking through the steps of adding a new database table, its business logic, its validation and some experiments that I change my mind about etc etc. But for a co-developer all they might need to know is "this fixed bug X".
Is it possible to somehow group a set of commits, so that an overview is shown in the log but still being able to view all the history. Not only only my local repo, but the remote repo as well.
I'm guessing that I could have separate sub-branches and merge them as I go along. But I'll only know that I want to group a set of commit retrospectively. So I don't think that is a good route, as I'll have to keep going back and forth.
I can see that there is a group extension but it's unmaintained. And my experience with unmaintained plugins means that usually I'm going the wrong way about it and that there is a perhaps a better technique.
Is there any best practice around achieving this sort of thing?
For what it's worth, I think you're going down the correct route when you say you want to keep all of your history available. You could use the MQ extension to collapse your changesets into a single commit, but - although this would give you a 'clean' commit - you would lose all that juicy detail.
My way of handling this is to develop on a branch or in a separate clone, and when it's going into Production I describe the whole group of changes in the commit message of the merge, i.e. don't just use "Merge" for the commit message :).
I understand your point about only knowing if you need to group retrospectively, but I think as long as you have some rigour around your dev/test/release process then this shouldn't be too much of a limitation.
You want the collapse extension.
Related
I need to collaborate on a Mercurial repository (let's call it "foo") with some people who are novices at version control in general, Mercurial in particular.
I am trying to come up with a workflow that will enable us to use Mercurial without a lot of extra effort on either their end (confusion) or my end (cleanup).
My concern is that as novices I need to expect them to make errors, and I need to allow them to do so in a controlled way, otherwise they won't use the tool at all because they're too scared. But I don't want a bad change to pollute the repository unnecessarily.
I do not expect them to be able to merge properly or to use the mq extension. This is not
a matter of underestimating them, instead it is a realistic assessment given past experience with SVN and my own experience with Hg.
Which of the following approaches would make the most sense? Or if there's a better approach, what is it?
We have a repository foo-submit, read/writable by all, and a repository foo-trunk, readable by all but writable by admins. Users pull from foo-trunk, and push changes to foo-submit. Cleanup: If I find a good change, I let it through as is; if I find a bad change, I "bypass" it by merging with the previous version.
We have a repository foo-trunk readable by all, writable by admins. Each user is responsible for maintaining their own clone which is read-accessible to the rest of the team. When someone wants to push a change, they let me know and I pull it from their repository, with proper cleanup as necessary (same as in #1)
We have a repository foo-dev, read/writable by all, and a repository foo-trunk, readable by all but writable by admins. Users pull/push to foo-dev, and work in named branches if they need to do extensive development. I am responsible for performing merges and cleanup. The foo-trunk repository is merely for having a "clean" copy that has branches where the tip is always in a good state.
Good question, and one that I've never seen a great answer to.
That said, I like option 2. This is the "Pull Request" model used by the Linux Kernel and made popular by GitHub. It allows the admins to act as gatekeepers / reviewers, only allowing good change-sets to get past them when they're happy. If they decide a developer hasn't delivered something worthy, then the pull request is rejected (with reasons). Then the developer can go away, fix up their code / repo, and submit another pull request.
Running a server with something like RhodeCode on it can help keep on top of pull requests. As things grow you can have lower level gatekeepers that deal with subsystems, and higher level gatekeepers that deal with the whole project.
The bit I've never quite got my head around is what should happen to change-sets that are rejected, and that the developer decides to abandon rather than fix up and try again. They could be closed, but then could possibly appear by mistake as part of a future pull request. They'd be harmless, but possibly confusing. The alternative is stripping them, but that sounds like giving people tools they'll cut themselves on.
The other 2 options you give deserve a little comment.
1 is similar to 2. You're still doing a "Pull Request" type flow, but now you have server side branches which mirror the developer's clones. There's little difference and this is how a RhodeCode, GitHub, BitBucket server would let you work, except you don't have to go searching for changes. The server would tell you they're waiting for you to look at.
3 has the problem that everyone's changes are all merged together on foo-dev before you get to them. They would start becoming inter-dependent, and cherry-picking is going to be messy. You'd probably end up grafting change-sets on to foo-trunk which means you're creating new change-sets with new hashes. When the developers pull those they'll now have the change in two places; their original foo-dev version and your grafted foo-trunk version. This doesn't sound sustainable to me.
Best way i can think of if you don't want to use mq (understand with the least hassle for you) is to have your dev
create their own branch for the current feature being developped
merge it back to the main dev branch (or graft/transplant) when it's completed and validated
and then close the branch.
In the long term see for them to learn mq, it's not too hard to grasp.
3a - foo-dev has protected default branch (only some admins can push/merge-to this branch), users use named branches
I'm very new to SCM, and I'm aware that there are some guidlines and recomendations to follow , but I'm not aware of any of them. There are several things that keeps me confused about SCM. For example:
I know that it's a best practice to commit as soon as possible and as often as possible, but what should I do, if I'm working on a change/feature that requires several days or even weeks? I could split the task but, mercurial says that one should never commit change with future change in mind. Every change in commit should be in final stage.
In what situations are branches useful? except splitting different releases in SCM.
Why and when should I clone a repository?
Sorry for those dumb questions and my broken English, I read many articles about SCM on the net, but every of them contains conflicting information for each other.
Thanks
Commit when something logical is done or you need to perform an
action on the branch. Push when you have confirmed the code is good
via unit tests. Commits are local, pushes are public.
Branch when you are about to start something that requires
several days or even weeks :-)
Clone when you need to, there are no best practice rules around it.
The mindset isn't about committing often, it is more about merging often. If you are on a branch, merge with the mainline frequently. Smaller chunks are easier to digest and you can keep visibility on what is developing (to adjust your code accordingly).
I'm struggling to find the mercurial workflow that fits the way that we work.
I'm currently favouring a clone per feature but that is quite a change in mindset moving from Subversion. We'll also have issues with the current expense we have in setting up environments.
Using hg pull --rebase seems to give us more of a Subversion-like workflow but from reading around I'm wary of using it.
I think I understand the concepts and I can see that rewriting the history is not ideal but I can't seem to come up with any scenarios which I personally would consider unacceptable.
I'd like to know what are the 'worst' scenarios that hg pull --rebase could create either theoretical or from experience. I'd like concrete examples rather than views on whether you 'should' rewrite history. Not that I'm against people having opinions, just that there already seem to be a lot of them expressed on the internet without many examples to back them up ;)
The first thing new Mercurial converts need to learn is to get comfortable committing incomplete code. Subversion taught us that you shouldn't commit broken code. Now it's time to unlearn that habit. Committing frequently gives you a lot more flexibility in your workflow.
The main problem I see with hg pull --rebase is the ability to break a merge without any way to undo. The DVCS model is based on the idea of tracking history explicitly, and rebasing subverts that idea by saying that all of my changes came after all of your changes, even though we were really working on them at the same time. And because I don't know what your changes are (because I was basing my code off of earlier changesets) it's harder for me to know that my code, on top of yours, won't break something. You also lose the branching capabilities by rebasing, which is really the whole idea behind DVCSs.
Our workflow (which we've built an entire Mercurial hosting system around) is based on keeping multiple clones, or branch repositories, as we call them. Each dev or small team has their own branch repository, which is just a clone of the "central" repository. All of my new features and large bug fixes go into my personal branch repo. I can get that code peer reviewed, and once it's deemed ready, I can merge it into the central repo.
This gives me a few nice benefits. First, I won't be breaking the build, as all of my changes are in their own repo until they're "ready". Second, I can make another branch repo if I need to do a separate feature, or if I have something longer-running, like for the next major version. And third, I can easily get a change into the central repo if there's a bug that needs to be fixed quickly.
That said, there are a couple different ways you can use this workflow. The most simple, and the one I started with, is just keeping separate clones. So I'll have website-central, website-tghw, etc. It works well, especially since you can push and pull between them locally. More recently, I've started keeping multiple heads in the same repo, using the remotebranches extension to help manage them and hg nudge to keep from pushing everything at once.
Of course, some people don't like this workflow as much, usually because their Mercurial server makes it hard to make server-side clones. In that case, you can also look at using named branches to help keep your features straight. Unfortunately, they're not quite as flexible as Git branches (which is why we prefer branch repos) but they work well once you understand how to close branches, and why you can't really get rid of them once you start one.
This is getting a bit long, so I'll wrap it up by encouraging you to embrace the superior branching and merging that Mercurial provides (over SVN). There is definitely a learning curve, but once you get the hang of it, it really does make things easier.
From the question comments, your root issue is that you have developers working on several features/bug fixes/issues at one time and having uncommitted work in their working directory along with some completed work that is ready to be pushed back to the central repository.
There's a really nice exchange that covers the issue well and leads on to a number of ways forward.
http://thread.gmane.org/gmane.comp.version-control.mercurial.general/19704
There are ways you can get around keeping your uncommitted changes, e.g. by having a separate clone to handle merges, but my advice would be to embrace the distributed way of working and commit as often as you like - if you really feel the need you can combine the last few local commits into a single changeset (using MQ, for example) before pushing.
I'm working in a team of 3 developers and we have recently switched from CVS to Mercurial. We are using Mercurial by having local repositories on each of our workstations and pulling/pushing to a development server. I'm not sure this is the best workflow, as it is easy to forget to Push after a Commit, and 3 way merge conflicts can cause a real headache. Is there a better workflow we could use, as I think the complexity of distributed VC is outweighing the benefits at the moment.
Thanks
If you are running into a lot of 3 way merges it might be because you have too much overlap in what you and your team members are working on. Mercurial is pretty good at handling merges itself, so long as you all aren't editing the exact same lines of a file. If possible, you could divide up the work more clearly and avoid some of the headaches of large merges. Also note that this would still be a problem with CVS since it's arguably worse at merging than mercurial.
You also don't need to push after every commit. Your workflow could look something like this:
Commit part of some feature.
Commit some more of some feature.
Commit last part of feature.
Commit bug fixes for stupid mistakes.
Push full feature to repo.
To an extent, this looks like Going Dark, but that can be alleviated by making sure that the features in the above example are smallish in scope.
Forget all you know about CVS. Mercurial is nothing like it even if some commands feel somewhat similar.
Read http://hginit.com/. Follow the examples.
Forget all you know about CVS.
I mean it. This is the hardest part. Learn to trust your tool.
It sounds like you're all making your changes to the same branch. This has the unsatisfying side-effect that you're merging each others' changes on almost every single commit, which would be fine except that manually intervening for conflicts isn't something you want to do every time you push.
Here's the workflow I would suggest. The idea is to use branching more heavily, so you need to merge to the master branch less often.
Have every developer develop every feature in a separate branch. This way:
you avoid constantly merging changes from other people, and
you are free of the pressure to push incomplete work before the next guy, "makes it hard to merge."
When a feature is "done" and if the changes would appear to apply cleanly (a judgement call), merge the feature branch directly into the master branch and delete the feature branch.
If a feature falls way behind the master branch (many features merged), or if the merge otherwise appears difficult:
merge master into the feature branch.
Find and fix any bugs in contented isolation from other developers.
Assuming the feature is ready to go, merge it into master (notice: now the merge in this direction will be clean by definition). If not, you can just continue developing.
We are using Mercurial by having local repositories on each of our workstations and pulling/pushing to a development server.
That sounds fine to me. My team is about double the size and it works great.
I'm not sure this is the best workflow, as it is easy to forget to Push after a Commit,
You don't have to push after every commit; you push when you want to push. That's the big idea about DVCS: that Commit and Push are distinct!
and 3 way merge conflicts can cause a real headache.
Are you working on the same lines of code a lot? On my team of 5-6 programmers, pushing/pulling a few times a day, and committing up to a couple dozen times a day, I can't remember the last time I've had to manually resolve merge conflicts. Certainly not in the past month or two.
Is there a better workflow we could use, as I think the complexity of distributed VC is outweighing the benefits at the moment.
Perhaps you should describe your workflow in more detail, because the only complexity over centralized version control that I encounter on a typical workday is maybe one command, and the benefits are huge. Doing "hg blame" just once saves me more time over the centralized version than all the "hg push"es I've had to type all year!
For what it's worth, we're a similar size team working with Mercurial for the first time and we started with the same problem.
We persisted and things are now significantly better. I think most of the problems occurred when the codebase was tiny and people were all trying to work on the same thing. Now that it's a little more established people aren't treading on each others' toes quite so much and the Paris much reduced.
Hope you get it sorted!
I have several projects with a very large over-lapping code-base. We've just recently started using SVN so I'm trying to figure out how I should be using it.
The problem is that as I'm finishing a task on one project, I'm starting a task on another, with some overlap. Often there's a lot of interrupt driven development as well. So, my code is never really in a completely stable state that I feel comfortable checking in.
The result is that we're not really using the VC system, which is a VERY bad thing, we all know... so, suggestions?
Check out a personal branch of the code and merge in changes. At least you will have some version control for your own changes, in case you need to roll back. Once you are comfortable with the state that your branch is in, merge that branch back into the trunk.
You can also check out a branch for each task, instead of one for each individual. You can also merge changes to your branch from the trunk if someone changes the trunk, and you want your branch to reflect the changes.
This is a common way to use SVN, although there are other workflows. I have worked on projects where I was afraid to commit(I would break the build possibly) because we did not effectively use branching.
Branching is really powerful in helping your workflow, use it until you're comfortable with the idea of merging.
Edit: 'Checking out a branch' refers to creating branch in your branches folder, and then checking out that branch. The standard svn repository structure consists of the folders trunk, tags, and branches at the root.
So, my code is never really in a completely stable state that I feel comfortable checking in.
Why is that ?
If your branch is appropriate for your work (with a good naming convention for instance), everyone will know its HEAD is not always stable.
In this kind of "working" branch, just put some tag along the way to indicate some "stable code points" (which can then be queried by any tester to be deployed).
Any other version on that working branch is just made to record changes, even though the current state is not stable.
Then later you merge all on a branch supposed to represent a stable state.
In TFS, you are able to create 'Shelf Sets' (I'm not sure what they'd be called in other source control providers). When you shelve some code, you are saving it to your repository, but not checking it in.
The reason this is important is that if you are working on Bug XXXX, and you fix half of the code, but it's not stable and not 'check-in-able', but you get assigned to NewFeature YYYY, you SHOULD NOT continue working with the same code base. You should 'Shelf' your Bug XXXX code, then return your local codebase to the latest checked-in code, and implement NewFeature YYYY.
This way you are keeping your check-ins atomic. You don't have to worry about losing your work, because it is still held by the repository (so if your computer bursts into flames, you don't have to burst into tears), and you aren't mixing your fixes for XXXX with your new code for YYYY.
Then, once you are asked to go back to XXXX (assuming you've checked in YYYY) you can just unshelve your 'shelf set' and jump right back into it where you left off.
Either accept that the code in SVN is not in a completely stable state and check it in anyway (and reserve time for stabilization and refactoring every X days/weeks so the code doesn't degrade too much).
Or force your team to work in a more structured way with minimal interruption based development so you can check in good code.
The first option is not ideal (but better then no source control), the second is probably impossible - there is no third option.
If you don't have time to get the code to a stable state you defiantly don't have the time to branch and merge all the time.
In distributed sourcecontrol systems like GIT, you commit to your local repository. Only when you push your code, it's 'committed' to the remote repository.
In this way, its much easier to 'safe' your work in between.