I'm working on a big project that was started many years ago and has evolved over time.
I'm going to clone its repository on my new computer, but I would like to avoid importing the whole history, as it's quite unlikely that I will ever work on very old revisions.
I would like to avoid copying all the data related to the beginning of the project history.
So, is it possible to perform a kind of "lazy clone", that is, clone only the last part of the history, let's say 6 months, and later get the missing parts only when needed? How?
Thanks!
I have an hg (Mercurial) repo located at, say:
http://myhg:5000/projects/fizzbuzz
This fizzbuzz directory has the following basic structure:
fizzbuzz/
    src/
        ... thousands of source files
    docs/
        ... lots of docs
    tests/
        ... lots of tests
I am now completely re-engineering the fizzbuzz app. The new app's project structure will be completely different (from the top down) than the existing one:
fizzbuzz/
    herps/
        foo/
            ... thousands of foos
        bar/
            ... thousands of bars
    derps/
        ... lots of derps
It's essentially a brand new app. I guess one solution would be to delete the fizzbuzz repo and then create a new one and add my code to the new one. But I was wondering if there's a way to basically tell hg to erase everything in a repo (but not delete the repo), and then add in the new, re-engineered, content. Or some other way to elegantly swap out the new code base for the old. Ideas? Thanks in advance!
Sure, you can wipe a repository by deleting everything and committing all the changes (all the deletions). If you ever need to restore or review the old code, it will still be available by checking out the revisions prior to the deletion. Unless your repository is very large on disk, this is probably the way to go. Alternatively, you can start a new repository for the new version and leave the current one as-is.
In either case deleting all the code and its history is typically unwise.
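The "delete everything and commit the deletions" approach could be scripted roughly as follows. This is only a sketch: `replace_working_copy` and both paths are hypothetical names, and it assumes the new code lives in a separate directory that does not itself contain a `.hg` folder.

```python
import shutil
from pathlib import Path


def replace_working_copy(repo: Path, new_tree: Path) -> None:
    """Swap the files in `repo` for those in `new_tree`, keeping `.hg` intact."""
    for entry in repo.iterdir():
        if entry.name == ".hg":
            continue  # the history lives here, so leave it alone
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
    # Copy the re-engineered code in; dirs_exist_ok requires Python 3.8+.
    shutil.copytree(new_tree, repo, dirs_exist_ok=True)


# Afterwards, from inside the repository:
#   hg addremove    # records every deletion and every new file
#   hg commit -m "Replace code base with the re-engineered version"
```

The old revisions stay reachable because nothing under `.hg` is touched; only the working copy changes.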
Take a look at the Convert extension. While it is, by design, not possible to change the history of your current repository, you can, via hg convert, construct a new repository based on the history of an existing repository. This is very useful for scenarios like you describe, where you need to refactor the file-structure to such a degree that the old history is no longer useful.
That said, consider just making the changes directly in your current repository. What actual benefit do you get by rewriting history? Make the changes now, and Mercurial will continue doing its job of tracking where you came from.
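If you do go the hg convert route, the restructuring is usually described in a filemap. Here is a minimal sketch that writes such a file and builds the command line. The helper names and the example mapping (src to herps/foo, dropping docs) are hypothetical, and paths are assumed to contain no spaces.

```python
from pathlib import Path
from typing import List, Mapping, Sequence


def write_filemap(path: Path, renames: Mapping[str, str],
                  excludes: Sequence[str]) -> str:
    """Write an `hg convert` filemap and return its text."""
    lines = ["exclude " + p for p in excludes]
    lines += ["rename {} {}".format(old, new) for old, new in renames.items()]
    text = "\n".join(lines) + "\n"
    path.write_text(text)
    return text


def convert_command(source: str, dest: str, filemap: Path) -> List[str]:
    # --config extensions.convert= enables the bundled extension for this call only.
    return ["hg", "--config", "extensions.convert=", "convert",
            "--filemap", str(filemap), source, dest]


if __name__ == "__main__":
    fm = Path("filemap.txt")
    print(write_filemap(fm, {"src": "herps/foo"}, ["docs"]), end="")
    print(" ".join(convert_command("fizzbuzz", "fizzbuzz-new", fm)))
```

The original repository is left untouched; the converted history ends up in the destination repository.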
I'm very new to SCM, and I'm aware that there are some guidelines and recommendations to follow, but I don't know any of them. There are several things that keep me confused about SCM. For example:
I know that it's a best practice to commit as early and as often as possible, but what should I do if I'm working on a change or feature that takes several days or even weeks? I could split the task, but Mercurial says that one should never commit a change with a future change in mind; every committed change should be in its final state.
Besides separating different releases, in what situations are branches useful?
Why and when should I clone a repository?
Sorry for these dumb questions and my broken English. I have read many articles about SCM on the net, but they often contradict each other.
Thanks
Commit when something logical is done or you need to perform an
action on the branch. Push when you have confirmed the code is good
via unit tests. Commits are local, pushes are public.
Branch when you are about to start something that requires
several days or even weeks :-)
Clone when you need to, there are no best practice rules around it.
The mindset isn't about committing often, it is more about merging often. If you are on a branch, merge with the mainline frequently. Smaller chunks are easier to digest and you can keep visibility on what is developing (to adjust your code accordingly).
I currently use SVN for a number of things that aren't exactly code, for instance xml files, report templates, miscellaneous files, etc. I have several non-developers who are comfortable using TortoiseSVN for this. They typically work as follows:
Person A - does an SVN Update on the folder of interest to them. Or perhaps just on a single file.
Person A - edits whichever file(s) they're working on. Perhaps add or remove files.
Person B - someone else is probably working on different files at this point
Person A - does an SVN Commit to save their changes to the repository.
Very occasionally they'll hit conflicts where more than one person has edited a file. Almost always this is just because they forgot step #1. Because they're always working on separate files, there are (almost) never real conflicts. As long as they do step #1 first everything works fine.
I'd like to move to Mercurial, but something holding me back is the prospect of having to do a 'merge' all the time, because Mercurial looks at the state of the entire repository, not just the files of interest at a particular time. For example, the workflow would be like this:
Person A - does a pull and update on the repository. (let's assume there are no local changes so this is straightforward).
Person A - edits whichever file(s) they're working on. Perhaps add or remove files.
Person B - someone else edits, commits, and pushes a different file at this point
Person A - commits changes. Tries to push. Gets an error about multiple heads.
Person A - does a pull and update. update doesn't work: merge required.
Person A - does a merge. If using TortoiseHg it's a bit confusing working out what to click on to do the merge. I guess this is simpler on the command line, provided there are no complications.
Person A - commits the merge.
Person A - pushes the changes.
My resistance is that there are more steps, and the merge step is somewhat hard to get your head around if you're not a developer. Is there a way I can put these steps together to make the process nice and simple?
"Very occasionally they'll hit conflicts where more than one person has edited a file. Almost always this is just because they forgot step #1. Because they're always working on separate files, there are (almost) never real conflicts. As long as they do step #1 first everything works fine."
If this is the case, why do you want to use a DVCS? Mercurial is great, but the benefits of a DVCS come from the ability to merge and fork and the ease of doing either. If your workflow requires neither, why would you want to switch toolsets?
Sounds like the rebase extension might work for you. The workflow becomes:
hg clone
make changes
hg commit
hg pull --rebase
hg push
The local revisions get "rebased" onto the latest tip on pull, which avoids the merge.
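For non-developers, the commit / pull --rebase / push sequence could be wrapped in a single script. The sketch below is one possible shape, not an established tool: the function names are made up, and it enables the bundled rebase extension per call instead of relying on each user's configuration.

```python
import subprocess
from typing import Callable, List, Sequence

# Enabling rebase on the command line keeps the sketch independent of hgrc.
HG = ["hg", "--config", "extensions.rebase="]


def save_and_share_steps(message: str) -> List[List[str]]:
    """The three commands from the workflow above, in order."""
    return [
        HG + ["commit", "-m", message],
        HG + ["pull", "--rebase"],
        HG + ["push"],
    ]


def run_steps(steps: Sequence[Sequence[str]],
              run: Callable[[Sequence[str]], int] = subprocess.call) -> bool:
    """Run each command; stop at the first non-zero exit code."""
    for argv in steps:
        if run(argv) != 0:
            print("Stopped at:", " ".join(argv))
            return False
    return True


# Typical use, from inside the working copy:
#   run_steps(save_and_share_steps("Update report templates"))
```

Stopping on any non-zero exit code is deliberately cautious: `hg commit` exits with 1 when there is nothing to commit, and a rebase that hits a real conflict also fails, which is the point where a non-developer should probably ask for help.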
One possible approach is to have a point person who does all the real work of merging. I'm not a big fan of letting everyone push to one shared repository, especially if they don't know what they are doing. An alternative approach is that A has local repository A, B has local repository B, and there is a repository S, which combines A and B. Then, don't let A or B push to S. Instead, let an expert pull from A and B and do the merging in S. Then A and B never have to push to S. If they coordinate with the expert, then he/she will already have merged their changes into S by the time they pull updates from S, and so A and B will not have to merge when pulling either. This is actually the default mode in which a DVCS works, since by default all repositories are read-only except by their owner.
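The integrator's routine in S might look like the command sequence below. It is a sketch under simplifying assumptions: the repository paths are hypothetical, everyone works on the same named branch, and the pulls leave exactly two heads to merge.

```python
from typing import List, Sequence


def integration_steps(contributor_repos: Sequence[str]) -> List[List[str]]:
    """Commands for the integrator to run inside the shared repository S."""
    steps = [["hg", "pull", repo] for repo in contributor_repos]
    # Move the working copy to the newest head so a bare `hg merge`
    # can pick the single other head on its own.
    steps.append(["hg", "update"])
    steps.append(["hg", "merge"])
    steps.append(["hg", "commit", "-m", "Merge contributor changes"])
    return steps


if __name__ == "__main__":
    for argv in integration_steps(["../A", "../B"]):
        print(" ".join(argv))
```

With more than two heads, the merge and commit pair would have to be repeated once per extra head.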
I'm working in a team of 3 developers and we have recently switched from CVS to Mercurial. We are using Mercurial by having local repositories on each of our workstations and pulling/pushing to a development server. I'm not sure this is the best workflow, as it is easy to forget to Push after a Commit, and 3 way merge conflicts can cause a real headache. Is there a better workflow we could use, as I think the complexity of distributed VC is outweighing the benefits at the moment.
Thanks
If you are running into a lot of 3-way merges, it might be because you have too much overlap in what you and your team members are working on. Mercurial is pretty good at handling merges itself, so long as you aren't all editing the exact same lines of a file. If possible, you could divide up the work more clearly and avoid some of the headaches of large merges. Also note that this would still be a problem with CVS, since it's arguably worse at merging than Mercurial.
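To see why edits to different lines merge without help, here is a deliberately simplified three-way merge. It is not Mercurial's actual algorithm: it assumes both sides only changed lines in place (no insertions or deletions), and `merge_lines` is a made-up name.

```python
from typing import List, Sequence, Tuple


def merge_lines(base: Sequence[str], ours: Sequence[str],
                theirs: Sequence[str]) -> Tuple[List[str], List[int]]:
    """Return (merged lines, 1-based numbers of conflicting lines)."""
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")
    merged, conflicts = [], []
    for number, (b, o, t) in enumerate(zip(base, ours, theirs), start=1):
        if o == t:      # both sides agree, or neither changed the line
            merged.append(o)
        elif o == b:    # only their side changed it
            merged.append(t)
        elif t == b:    # only our side changed it
            merged.append(o)
        else:           # both changed the same line differently
            merged.append(o)
            conflicts.append(number)
    return merged, conflicts


if __name__ == "__main__":
    base = ["a", "b", "c"]
    print(merge_lines(base, ["A", "b", "c"], ["a", "b", "C"]))  # (['A', 'b', 'C'], [])
    print(merge_lines(base, ["x", "b", "c"], ["y", "b", "c"]))  # (['x', 'b', 'c'], [1])
```

Only the last case needs a human, which is why dividing the work so that people rarely touch the same lines keeps merges quiet.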
You also don't need to push after every commit. Your workflow could look something like this:
Commit part of some feature.
Commit some more of some feature.
Commit last part of feature.
Commit bug fixes for stupid mistakes.
Push full feature to repo.
To an extent, this looks like Going Dark, but that can be alleviated by making sure that the features in the above example are smallish in scope.
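If forgetting to push is the worry, a small check at the end of the day can help. This sketch relies on `hg outgoing` exiting with 0 when there are changesets to push and 1 when there are none; the function name is hypothetical, and any other exit code (for example an unreachable server) is reported as "nothing to push", which a real script would want to handle separately.

```python
import subprocess
from typing import Callable, Sequence


def has_unpushed_changes(
        run: Callable[[Sequence[str]], int] = subprocess.call) -> bool:
    """True when `hg outgoing` finds changesets missing from the default remote."""
    return run(["hg", "outgoing", "--quiet"]) == 0


# Typical use, from inside the working copy:
#   if has_unpushed_changes():
#       print("You have local commits that are not on the server yet.")
```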
Forget all you know about CVS. Mercurial is nothing like it even if some commands feel somewhat similar.
Read http://hginit.com/. Follow the examples.
Forget all you know about CVS.
I mean it. This is the hardest part. Learn to trust your tool.
It sounds like you're all making your changes to the same branch. This has the unsatisfying side-effect that you're merging each others' changes on almost every single commit, which would be fine except that manually intervening for conflicts isn't something you want to do every time you push.
Here's the workflow I would suggest. The idea is to use branching more heavily, so you need to merge to the master branch less often.
Have every developer develop every feature in a separate branch. This way:
you avoid constantly merging changes from other people, and
you are free of the pressure to push incomplete work before the next guy "makes it hard to merge."
When a feature is "done" and if the changes would appear to apply cleanly (a judgement call), merge the feature branch directly into the master branch and delete the feature branch.
If a feature falls way behind the master branch (many features merged), or if the merge otherwise appears difficult:
merge master into the feature branch.
Find and fix any bugs in contented isolation from other developers.
Assuming the feature is ready to go, merge it into master (notice: now the merge in this direction will be clean by definition). If not, you can just continue developing.
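Expressed with Mercurial named branches, the steps above might translate into the commands below. This is a sketch under assumptions: the "master branch" is taken to be Mercurial's `default` branch, "deleting" a feature branch is approximated by closing it (named branches are permanent), and the function names are made up.

```python
from typing import List


def start_feature(name: str) -> List[List[str]]:
    # `hg branch` only marks the working copy; the branch exists after the next commit.
    return [["hg", "update", "default"], ["hg", "branch", name]]


def sync_feature(name: str) -> List[List[str]]:
    """Bring a feature that has fallen behind up to date with default."""
    return [
        ["hg", "update", name],
        ["hg", "merge", "default"],
        ["hg", "commit", "-m", "Merge default into " + name],
    ]


def finish_feature(name: str) -> List[List[str]]:
    """Close the feature branch, then merge it into default."""
    return [
        ["hg", "update", name],
        ["hg", "commit", "--close-branch", "-m", "Close " + name],
        ["hg", "update", "default"],
        ["hg", "merge", name],
        ["hg", "commit", "-m", "Merge " + name],
    ]


if __name__ == "__main__":
    for argv in start_feature("feature-x") + finish_feature("feature-x"):
        print(" ".join(argv))
```

Teams that want branches they can truly delete may prefer bookmarks over named branches for the same workflow.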
We are using Mercurial by having local repositories on each of our workstations and pulling/pushing to a development server.
That sounds fine to me. My team is about double the size and it works great.
I'm not sure this is the best workflow, as it is easy to forget to Push after a Commit,
You don't have to push after every commit; you push when you want to push. That's the big idea about DVCS: that Commit and Push are distinct!
and 3 way merge conflicts can cause a real headache.
Are you working on the same lines of code a lot? On my team of 5-6 programmers, pushing/pulling a few times a day, and committing up to a couple dozen times a day, I can't remember the last time I've had to manually resolve merge conflicts. Certainly not in the past month or two.
Is there a better workflow we could use, as I think the complexity of distributed VC is outweighing the benefits at the moment.
Perhaps you should describe your workflow in more detail, because the only complexity over centralized version control that I encounter on a typical workday is maybe one command, and the benefits are huge. Doing "hg blame" just once saves me more time over the centralized version than all the "hg push"es I've had to type all year!
For what it's worth, we're a similar size team working with Mercurial for the first time and we started with the same problem.
We persisted and things are now significantly better. I think most of the problems occurred when the codebase was tiny and people were all trying to work on the same thing. Now that it's a little more established, people aren't treading on each other's toes quite so much and the pain is much reduced.
Hope you get it sorted!
Just started working with Mercurial a few days ago and there's something I don't understand.
I have an experimental thing I want to do, so the normal thing to do would be to clone my repository, work on the clone and if eventually I want to keep those changes, I'll push them to my main repository.
The problem is that cloning my repository takes a lot of time (we have a lot of code), and just compiling the cloned copy would take up to an hour.
So I need to somehow work on a different repository but still in my original working copy.
Enter local branches.
The problem is that just creating a local branch takes forever, and working with them isn't all that fun either. Because moving between local branches doesn't "revert" to the target branch state, I have to issue an hg purge (to remove files that were added in the branch I'm leaving) and then hg update -C (to revert files modified in the branch I'm leaving). (Note: I did try PK11's fork of the local branch extension, but even a simple local branch creation crashes with an exception.)
At the end of the day, this is just too complex. What are my options?
There are several ways of working with local branches besides cloning:
Bookmarks
Named branches
Anonymous branches
You may be interested in reading a very insightful guide to branching in Mercurial. I guess the bookmarks extension is the most appropriate way of branching in the context you've described.
Probably not the answer you want to hear, but I would say maybe you should consider splitting your repository into a little more manageable chunks :). E.g. does that documentation and those design files really need to be included in the same repository, does that editor tool not deserve its own repository, etc.
Because cloning creates hard links, it's pretty much already as fast as it can get; if cloning takes 5-10 minutes already, making a file system copy must be even worse. (Tip: keep in mind hg clone -U if you do not need a working copy; it will be much, much faster.)
Otherwise, yeah, anonymous branches with bookmarks is the usual way to do in-place switching.
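To make the idea of in-place switching concrete, here is a toy model of how bookmarks behave. It is not Mercurial code: revisions are plain integers, the class and bookmark names are invented, and each method only mirrors the command named in its comment.

```python
class ToyRepo:
    """Toy model: revisions are integers, bookmarks are movable pointers."""

    def __init__(self):
        self.parents = {0: None}  # revision -> parent revision
        self.current = 0          # revision the working copy is based on
        self.bookmarks = {}
        self.active = None

    def bookmark(self, name):     # like `hg bookmark NAME`
        self.bookmarks[name] = self.current
        self.active = name

    def commit(self):             # like `hg commit`
        new = len(self.parents)
        self.parents[new] = self.current
        self.current = new
        if self.active is not None:
            self.bookmarks[self.active] = new  # only the active bookmark moves
        return new

    def update(self, name):       # like `hg update NAME`
        self.current = self.bookmarks[name]
        self.active = name


if __name__ == "__main__":
    repo = ToyRepo()
    repo.bookmark("mainline")     # remember where the experiment started
    repo.bookmark("experiment")
    repo.commit()
    repo.commit()
    repo.update("mainline")       # back to the pre-experiment state, same directory
    print(repo.current, repo.bookmarks)  # 0 {'mainline': 0, 'experiment': 2}
```

In a real repository the equivalent would be `hg bookmark mainline`, `hg bookmark experiment`, some commits, and `hg update mainline`, all in the one working copy that has already been compiled.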
How long does cloning a local repo take?
It sounds to me like your build process might be the weak link. Perhaps you need something like ccache, so that you can clone and build quickly?
Couldn't you make your experiments on top of your existing clone and, when you want to make some changes to the main line, go back and update to the last revision before you started the experiments?