SCM for configuration statement versioning

My request probably seems to be a bit strange, but let me try to explain what I want to do.
So, first of all, what I want to version in the SCM isn't really source code. It's JSON files containing statements that configure a specific server software (the software saves its configuration in a database) through the provided API (there's a separate deployment tool). The API is the only way to configure this software. So, the JSON looks something like this:
[{
    "command": "command1",
    "options": {
        "option1": "value1"
    }
}, {
    "command": "command2",
    "options": {
        "option2": "value2"
    }
}]
and so on. Now, the configuration of this software is developed in Scrum, and the result of each sprint needs to be a set of configuration commands that change the software accordingly. This means the release package must contain only the commands that were not already part of the last release. What I'm currently thinking about (doing it in git) is the following:
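If master always holds the complete command list, the sprint delta could be derived from two full lists instead of being maintained by hand. This is a minimal Python sketch, assuming each command is a self-contained JSON object and that an unchanged command is identical once its keys are sorted; `release_delta` and `command_key` are hypothetical names. A command whose options changed simply shows up as a new command, and removed commands are not detected.

```python
import json
from typing import Dict, List

Command = Dict[str, object]  # one {"command": ..., "options": {...}} entry


def command_key(cmd: Command) -> str:
    # Canonical JSON text, so key order inside "options" does not matter.
    return json.dumps(cmd, sort_keys=True)


def release_delta(previous: List[Command], current: List[Command]) -> List[Command]:
    """Return the commands of `current` that do not appear in `previous`, in order."""
    seen = {command_key(cmd) for cmd in previous}
    return [cmd for cmd in current if command_key(cmd) not in seen]


if __name__ == "__main__":
    previous = [{"command": "command1", "options": {"option1": "value1"}}]
    current = previous + [{"command": "command2", "options": {"option2": "value2"}}]
    print(json.dumps(release_delta(previous, current), indent=4))
```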
When a new sprint starts, I create a new branch in the repository and clear out all configuration files (there are several). I develop the configuration changes in the JSON syntax shown above and everything is fine. At the end of the sprint, the contents of the branch are the release package (containing only the delta of configuration options between the previous release and this one). Then I need to manually merge the branch back into master to get the complete set of configuration options (e.g. to deploy a new server, or to rebuild a server after it crashed). This is a manual task, but I don't know how it could be done better.
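Because the JSON files are plain lists of commands, the manual merge back into master could, under the assumption that commands are only ever appended, be reduced to appending the sprint's commands to the full list. A minimal Python sketch; `fold_delta`, `fold_file` and the script name `fold_delta.py` are hypothetical, and exact duplicates are skipped so that folding the same sprint twice is harmless.

```python
import json
import sys
from pathlib import Path
from typing import Dict, List

Command = Dict[str, object]


def _key(cmd: Command) -> str:
    # Canonical text so that key order does not affect duplicate detection.
    return json.dumps(cmd, sort_keys=True)


def fold_delta(full: List[Command], delta: List[Command]) -> List[Command]:
    """Append the delta commands to the full list, skipping exact duplicates."""
    known = {_key(cmd) for cmd in full}
    merged = list(full)  # copy, so the caller's list is left untouched
    for cmd in delta:
        key = _key(cmd)
        if key not in known:
            merged.append(cmd)
            known.add(key)
    return merged


def fold_file(full_path: Path, delta_path: Path) -> None:
    """Rewrite full_path so it also contains the commands from delta_path."""
    full = json.loads(full_path.read_text(encoding="utf-8"))
    delta = json.loads(delta_path.read_text(encoding="utf-8"))
    merged = fold_delta(full, delta)
    full_path.write_text(json.dumps(merged, indent=4) + "\n", encoding="utf-8")


if __name__ == "__main__":
    # usage: python fold_delta.py master/config1.json sprint1/config1.json
    fold_file(Path(sys.argv[1]), Path(sys.argv[2]))
```

This only works if ordering by "previous release first, then sprint commands" is what the server expects; if a sprint edits or removes earlier commands, a human merge is still needed.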
So, what I really want to ask is:
Does anyone know a better way to manage these configuration files? The goal is to have both a delta of configuration options from the previous release, which can be used to update the configuration of an existing server, and a release package containing all configuration statements (master). I would really love to see a better solution, but I don't know of one.
Thanks in advance for any help! If you have questions about what I'm asking, feel free to comment :)
EDIT 1:
Based on @Marina - MSFT's answer, I thought about this a bit more. In git, something like this would probably work:
Let's assume a master like this:
|
C Another commit with changes to config2.json
|
B Some other commit with changes to config1.json
|
A First commit
|
So, currently the master tree contains two files, config1.json and config2.json, both with JSON content like the example above.
Now the next sprint (called "Sprint 1" as an example) starts, and someone creates a new branch (for example with git checkout -b dev). This person also needs to delete all files using git rm * and commit this change as the first commit on the branch, resulting in this graph:
---A---B---C---
            \
             D
D is the commit that deletes all files. Now I commit the changes that have to be done in this sprint (the configuration files will always contain only these changes). At the end of the sprint, I probably have a graph like this one:
---A---B---C---
            \
             D---E---F---G---H
So, because I only want E, F, G and H in master, I don't merge the branch; instead, I cherry-pick all changes except D onto master. Because I always edit the same files (config1.json and config2.json), git will ask me to merge these files manually (which is totally fine; I don't expect any tool to support me in merging the files the way I need to). After merging, the graph should look like this:
---A---B---C---E'---F'---G'---H' <--- master branch
            \
             D---E---F---G---H <--- dev branch
Now, I could rename the dev branch to Sprint 1 (git branch -m sprint1) or something like that and would have the delta release there and a full release in master. This should work, right?

If you want to version-control files, git is a very popular choice. Here is how your requirement could be handled in git (if I've misunderstood your requirement, please correct me):
Treat master as the main branch and merge each sprint into master once it's finished. master then holds the previous version and the branch you are working on holds the current one, so you can use git merge and resolve the conflicting files with the delta configuration.
As shown in the graph below, when you start a new sprint, create a dev1 branch (git checkout -b dev1), then make and commit the changes to the config files. Then merge dev1 into master (git checkout master, then git merge dev1), resolve the conflicts in the files so that they incorporate the delta changes, and finish the merge with git add . and git commit. The next sprint works the same way.
      ______C_____ dev1
     /            \
A---B--new commit--D---E--new commit--G--H master
                        \            /
                         _____F______ dev2
Note: when you create a new branch from master, the config files aren't cleared automatically; you need to delete them yourself, e.g. with git rm *.
New solution based on your edit:
A---B---C master
         \
          D---E---F---G---H dev
If you use cherry-pick, you need four steps to bring the changes from E, F, G and H to master. Of course that works, but there are two ways to make it easier:
Rebase commits E, F, G and H onto master with a single command:
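git rebase --onto master <commit id for D> dev
This rewrites dev so that it consists of E', F', G' and H' on top of master; afterwards, fast-forward master to it with git checkout master and git merge dev. If you want to keep the original delta branch, copy it first (e.g. git branch sprint1 dev).
Alternatively, because commit H's files already contain the changes from E, F and G, you only need the sprint's final state on master. Note, though, that git cherry-pick H on its own applies only the diff between G and H; to apply E through H as a single commit, use git cherry-pick --no-commit D..H followed by git commit. This keeps the master branch containing only the final edition for each sprint.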

Related

Mercurial Repo Living Archive

We have an Hg repo that is over 6GB and 150,000 changesets. It has 8 years of history on a large application. We have used a branching strategy over the last 8 years. In this approach, we create a new branch for a feature and when finished, close the branch and merge it to default/trunk. We don't prune branches after changes are pushed into default.
As our repo grows, it is getting more painful to work with. We love having the full history on each file and don't want to lose that, but we want to make our repo size much smaller.
One approach I've been looking into would be to have two separate repos, a 'Working' repo and an 'Archive' repo. The Working repo would contain the last 1 to 2 years of history and would be the repo developers cloned and pushed/pulled from on a daily basis. The Archive repo would contain the full history, including the new changesets pushed into the working repo.
I cannot find the right Hg commands to enable this. I was able to create a Working repo using hg convert <src> <dest> --config convert.hg.startrev=<rev>. However, Mercurial sees this as a completely different repo, breaking any association between our Working and Archive repos. I'm unable to find a way to merge/splice changesets pushed to the Working repo into the Archive repo while maintaining a unified file history. I tried hg transplant -s <src>, but that resulted in several 'skipping emptied changeset' messages. It's not clear to me why hg transplant considered those changesets empty. Also, if I were to get this working, does anyone know whether it maintains a file's history, or will my repo see the transplanted portion as separate, maybe showing up as a delete/create or something?
Does anyone have a solution that enables this Working/Archive approach, or a different approach that may work for us? It is critical that we maintain full file history to keep historical research simple.
Thanks

You might be hitting a known bug in the underlying storage compression: 6GB for 150,000 revisions is a lot.
This storage issue usually shows up on very branchy repositories, in an internal data structure that stores the content of each revision. The current fix for this bug can reduce repository size by up to a factor of ten.
Possible Quick Fix
You can blindly try to apply the current fix for the issue and see if it shrinks your repository.
upgrade to Mercurial 4.7,
add the following to your repository configuration:
[format]
sparse-revlog = yes
run hg debugupgraderepo --optimize redeltaall --run (this will take a while)
Some other improvements are also enabled by default in 4.7, so upgrading to 4.7 and running debugupgraderepo should help in any case.
Finer Diagnostic
Can you tell us the size of the .hg/store/00manifest.d file compared to the full size of .hg/store?
In addition, can you provide us with the output of hg debugrevlog -m?
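To answer the size question quickly, the two numbers can be read straight from the file system. A minimal Python sketch (`store_sizes` is a hypothetical helper); it assumes a standard .hg/store layout and reports 0 for the manifest if its data is stored inline in 00manifest.i, as happens for small revlogs.

```python
import sys
from pathlib import Path
from typing import Tuple


def store_sizes(repo_root: str) -> Tuple[int, int]:
    """Return (size of 00manifest.d, total size of .hg/store) in bytes."""
    store = Path(repo_root) / ".hg" / "store"
    total = sum(p.stat().st_size for p in store.rglob("*") if p.is_file())
    manifest = store / "00manifest.d"
    # Small revlogs are stored inline in the .i file, so the .d file may be absent.
    manifest_size = manifest.stat().st_size if manifest.is_file() else 0
    return manifest_size, total


if __name__ == "__main__":
    manifest_size, total = store_sizes(sys.argv[1] if len(sys.argv) > 1 else ".")
    share = 100.0 * manifest_size / total if total else 0.0
    print(f"00manifest.d: {manifest_size} bytes of {total} bytes in .hg/store ({share:.1f}%)")
```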
Other reasons?
Another reason for repository growth is large files (usually binaries) being committed to it. Do you have any?
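One rough way to spot such files is to look for the biggest per-file revlogs under .hg/store/data, since each tracked file's whole history lives there. A minimal Python sketch (`largest_revlogs` is a hypothetical name); note that the names it prints are in Mercurial's store encoding (e.g. uppercase letters escaped, very long paths hashed), so they may need a little decoding by eye.

```python
import heapq
import sys
from pathlib import Path
from typing import List, Tuple


def largest_revlogs(repo_root: str, count: int = 10) -> List[Tuple[int, str]]:
    """Return the `count` largest files under .hg/store/data as (bytes, relative path)."""
    data_dir = Path(repo_root) / ".hg" / "store" / "data"
    sizes = (
        (p.stat().st_size, p.relative_to(data_dir).as_posix())
        for p in data_dir.rglob("*")
        if p.is_file()
    )
    return heapq.nlargest(count, sizes)


if __name__ == "__main__":
    for size, name in largest_revlogs(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(f"{size:>14,}  {name}")
```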

The problem is that the hash id for each revision is calculated based on a number of items including the parent id. So when you change the parent you change the id.
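To make the parent dependency concrete, here is a sketch of how, as far as I know, Mercurial derives revlog node IDs: a SHA-1 over the two parent IDs (smaller first) followed by the revision's text. A changeset's text includes its manifest ID, user, date, file list and message, so a new parent yields a new ID, and every descendant's ID changes in turn. `revlog_node` is a hypothetical name; this is an illustration, not Mercurial's own code.

```python
import hashlib

NULL_ID = b"\0" * 20  # the all-zero ID used for a missing parent


def revlog_node(text: bytes, p1: bytes = NULL_ID, p2: bytes = NULL_ID) -> bytes:
    """SHA-1 over the sorted parent IDs followed by the revision text."""
    low, high = sorted((p1, p2))
    digest = hashlib.sha1(low)
    digest.update(high)
    digest.update(text)
    return digest.digest()


if __name__ == "__main__":
    squashed_root = revlog_node(b"squashed history")
    archive_tip = revlog_node(b"old archive tip")
    same_text = b"second Working commit"
    print(revlog_node(same_text, squashed_root).hex())
    print(revlog_node(same_text, archive_tip).hex())  # differs: only the parent changed
```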
As far as I'm aware there is no nice way to do this, but I have done something similar with several of my repos. The bad news is that it required a chain of repos, batch files and splice maps to get it done.
The bulk of the work I'm describing is ideally done one time only and then you just run the same scripts against the same existing repos every time you want to update it to pull in the latest commits.
The way I would do it is to have three repos:
Working
Merge
Archive
The first commit of Working is a squash of all the original commits in Archive, so you'll be throwing that commit away when you pull your Working code into the Archive, and reparenting the second Working commit onto the old tip of Archive.
STOP: If you're going to do this, back up your existing repos, especially the Archive repo before trying it, it might get trashed if you run this over the top of it. It might also be fine, but I'm not having any problems on my conscience!
Pull both Working and Archive into the Merge repo.
You now have a Merge repo with two completely independent trees in it.
Create a splicemap. This is just a text file giving the hash of a child node and the hash of its proposed parent node, separated by a space.
So your splicemap would just be something like:
hash-of-working-commit-2 hash-of-archive-old-tip
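If you script the splicemap rather than write it by hand, a small check helps catch copy-paste mistakes. A minimal Python sketch; `splicemap_lines`, `write_splicemap` and `make_splicemap.py` are hypothetical, the "child parent[,parent2]" line shape follows the description of splicemaps in hg help convert (one or two comma-separated parents), and requiring full 40-character hashes is an extra safety assumption of this sketch.

```python
import re
import sys
from pathlib import Path
from typing import Dict, List

FULL_HASH = re.compile(r"[0-9a-f]{40}")


def splicemap_lines(entries: Dict[str, List[str]]) -> List[str]:
    """Build 'child parent[,parent2]' lines; every ID must be a full 40-char hash."""
    lines = []
    for child, parents in entries.items():
        if not 1 <= len(parents) <= 2:
            raise ValueError(f"{child}: a splicemap entry takes one or two parents")
        for node in (child, *parents):
            if not FULL_HASH.fullmatch(node):
                raise ValueError(f"not a full 40-character hash: {node!r}")
        lines.append(f"{child} {','.join(parents)}")
    return lines


def write_splicemap(path: str, entries: Dict[str, List[str]]) -> None:
    Path(path).write_text("\n".join(splicemap_lines(entries)) + "\n", encoding="utf-8")


if __name__ == "__main__":
    # usage: python make_splicemap.py <hash-of-working-commit-2> <hash-of-archive-old-tip>
    write_splicemap("splicemapPath.txt", {sys.argv[1]: [sys.argv[2]]})
```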
Then run hg convert with the splicemap option to do the reparenting of the second commit of Working onto the old tip of the Archive. E.g.
hg convert --splicemap splicemapPath.txt --config convert.hg.saverev=true Merge Archive
You might want to try writing it to a differently named repo rather than Archive the first time, or you could try writing it over a copy of the existing Archive. I'm not sure that will work, but if it does it would probably be quicker.
Once you've run this setup once, you can just run the same scripts over the existing repos again and again to update with the latest Working revisions. Just pull from Working to Merge and then run the hg convert to put it into Archive.
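The repeated update can be wrapped in a small script. A minimal Python sketch with placeholder repo paths and hypothetical function names; it relies on hg convert recording already-converted revisions in the destination's revision map, so reruns only copy new changesets. It defaults to a dry run that just prints the commands.

```python
import subprocess
from typing import List


def update_commands(working: str, merge: str, archive: str, splicemap: str) -> List[List[str]]:
    """The two steps from the answer: pull Working into Merge, then convert into Archive."""
    return [
        ["hg", "-R", merge, "pull", working],
        [
            "hg", "convert",
            "--splicemap", splicemap,
            "--config", "convert.hg.saverev=true",
            merge, archive,
        ],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    # Paths are placeholders; switch dry_run off once the printed commands look right.
    run(update_commands("Working", "Merge", "Archive", "splicemapPath.txt"), dry_run=True)
```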

Mercurial per feature work flow, single developer

I've read similar posts on SO, the official Hg guide, and many articles and guides, and it's still unclear to me what the best Hg workflow is for developing by feature. Maybe some of the articles on the web are years old and don't cover Hg's latest features. Obviously there are also a lot of options for how to approach it.
I'm a solo developer working on a project where a request for a fix or feature will be submitted to me as a task, like "Task #546 - Change whatever". Some of these tasks take a few days, and some tasks are open for months and there's often up to a dozen going at one time. A task is shipped to the final site after it's approved by the requestor.
The Hg guide seems to recommend having a clone per feature. But having a dozen full copies of the site on my drive seems... wasteful? I'm up for trying it, but I've seen other suggestions that make more sense. Do people really have a dozen copies of each site on their dev machine at a time?
Named branches at first sound like what I'd want: I'd name a branch "task 546", work on it, then merge it back in when it ships. I see a lot of discussion about the permanence of the names and about having so many branches (though they can be closed). Some people seem to care about that and some don't. I don't know Hg well enough to know whether I care, or what the downsides really mean.
Finally, bookmarks seem to be popular in the more recent articles, and it would seem that the best way to use them is to set a bookmark like "task 546" and then, when merging it back into the main branch, use a commit message that contains the task number to keep a reference to what was done. I know you can delete bookmarks, but it's unclear whether I'd need to do this after the final merge.
So my thought for a combined approach is to have:
one repo
three named branches:
"default" which holds the released version of the site
"dev" on which I do feature development
"test" which would hold all of the tasks being reviewed by the client
on the "dev" branch I would use bookmarks for each of the tasks that I'm working on, so I'd have a head for each task
My workflow for a task/feature would be to:
Update to the main line of the "dev" named branch
Start a new branch using a bookmark for the task "task #123"
Commit changes until I'm ready for the client to review
Merge "task #123" into the "test" branch
Deploy "test" to the test server
Repeat the commit, merge, deploy until ready for production
When approved, merge with the main line of the "dev" branch with a commit message that includes the task name
Merge "dev" into the "default" branch.
Deploy the "default" branch to the live server
Merge "default" into the open feature branches
Thoughts? Would I be better off just having a clone for each feature, and a "live" and "test" repo that I push to?
Edit: I see from some links that I should be doing the development off of "default", so my first change to the process above would be to use a named "production" branch instead of a named "dev" branch.

Bookmark-style branching (Git-like "branches") works poorly in at least two common cases:
Cross-task merges during development
Time travel, when you want to see "the whole history of changes for task #123" (you can do it visually and, with some grimacing and hoop-jumping, using revsets)
Named branches don't have these problems and, by the way, a workflow with named branches (with only the default branch as the aggregation point) is less complex and more logical:
Default contains only merge changesets from task branches, so the head of default is always the "stable version"
Heads of named branches are WIP; branches merged into default are finished work (accepted by the customer, see below)
Merging default into a task branch (after developing the task, before merging the task branch into default) is the equivalent of your "test": without affecting the mainline, you can test the final state of the feature integrated into your stable app and show the results to the customer
Accepted work is added to the stable mainline by merging the named branch into default
The full history of changes for any past task can easily be restored with a single, short, memorable revset for log: -r "branch(TASK-ID)"
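The named-branch workflow above boils down to a handful of hg commands per task. A minimal Python sketch that only builds (and prints) the command lines; the function names and the task ID "TASK-123" are hypothetical, and the "merge default into the task" step will be refused by hg merge if default has nothing new relative to the task branch.

```python
from typing import List

Cmd = List[str]


def start_task(task_id: str) -> List[Cmd]:
    return [
        ["hg", "update", "default"],
        ["hg", "branch", task_id],  # the new branch name takes effect with the next commit
    ]


def integrate_default(task_id: str) -> List[Cmd]:
    """The "test" step: bring the stable line into the task branch."""
    return [
        ["hg", "update", task_id],
        ["hg", "merge", "default"],
        ["hg", "commit", "-m", f"Merge default into {task_id}"],
    ]


def accept_task(task_id: str) -> List[Cmd]:
    """After customer approval: merge the task branch into default."""
    return [
        ["hg", "update", "default"],
        ["hg", "merge", task_id],
        ["hg", "commit", "-m", f"Merge {task_id}"],
    ]


def task_history(task_id: str) -> Cmd:
    # Quoting the name keeps characters such as "-" from being read as revset operators.
    return ["hg", "log", "-r", f'branch("{task_id}")']


if __name__ == "__main__":
    for cmd in start_task("TASK-123") + integrate_default("TASK-123") + accept_task("TASK-123"):
        print(" ".join(cmd))
    print(" ".join(task_history("TASK-123")))
```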

I like it. +1. This is the way I'd do it.

Mercurial bookmarks - utilization for development and stable versions

This is NOT another what are bookmarks/what are branches question - I have read all of these posts and now want to clarify some things about correct usage.
I am developing a website. I want a stable version, and a development version.
So I create two bookmarks 'stable', and 'development'.
If I want to create a new feature, I update to the development bookmark and create my feature.
If I want to correct a typo, I do it directly in the stable version.
My confusion is as follows.
I have a central repository at bitbucket.
If I use hg push, my bookmark data is not pushed. If I do hg push -B stable or hg push -B development respectively, then my bookmark data is pushed.
I then have two servers, a testing server and a live server.
If I ssh onto a server and do an hg pull from Bitbucket, given that the bookmarks are not present on the server, what is pulled, and what is the working copy then updated to when I use hg update?
The correct usage for what I want, I believe, is as follows: a local repository with my two bookmarks 'stable' and 'development'. I switch between the two as required and push them to Bitbucket with hg push -B bookmark-name. Then I log in to my testing/live server respectively and pull the correct bookmarked version.
Once I have tested my development bookmark I can merge it with my stable one and pull it onto the live server.
My concern, and so my question, is: what happens if I accidentally forget to specify the bookmark when pulling to the live server, for example?
Thanks

Pulling
From Mercurial 2.3, pulling gets the remote repository's bookmarks as well. Before, you had to specify -B <bookmark> to get bookmarks as well as changesets. So your server repositories will have the right bookmarks after pulling.
If you're using an earlier version, you'll have to pull -B <bookmark> to get the bookmark as well. Of course, you can do that anyway, if you'd prefer not to pull all development changesets onto your live server.
Updating
Using hg update with no arguments will get you tip, which is always the last changeset added to the repository, whether that's stable, development or accidentally un-bookmarked (actually, it'll get you the last changeset added to the current branch, but it sounds like you're not using named branches). To get consistent results when updating, I'd recommend you be explicit about which bookmark you want each server repository to update to. If you're worried about forgetting to specify, use scripts to automate your update process.
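A deploy script that always names the bookmark removes the "forgot to specify it" risk. A minimal Python sketch; the role-to-bookmark mapping, `deploy_commands` and the `deploy.py` usage line are hypothetical. As I understand it, pull -B fetches only the changesets the bookmark needs plus the bookmark itself, which matches the point above about not pulling development changesets onto the live server.

```python
import subprocess
import sys
from typing import List

# Hypothetical mapping from server role to the bookmark it should run.
BOOKMARK_FOR = {"testing": "development", "live": "stable"}


def deploy_commands(repo: str, role: str) -> List[List[str]]:
    bookmark = BOOKMARK_FOR[role]  # unknown roles fail loudly with KeyError
    return [
        # Pull the bookmark and the changesets it points to.
        ["hg", "-R", repo, "pull", "-B", bookmark],
        # Update explicitly, instead of relying on what a bare "hg update" picks.
        ["hg", "-R", repo, "update", bookmark],
    ]


if __name__ == "__main__":
    # usage: python deploy.py /srv/site live
    for cmd in deploy_commands(sys.argv[1], sys.argv[2]):
        subprocess.run(cmd, check=True)
```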

The correct usage IMHO is to use named branch instead of bookmark.

I treat bookmarks as local tags and nothing more. So if I want to push tagging information, I use actual tags to mark stable releases. Every time I do a release I tag it, e.g. "rel-2.4".
Then on live I can update to the latest release and know that it is the last good release. My "dev" is simply the head of the default branch, and I keep adding new development to it. This way you can just push without worrying about bookmarks.
This might not be what you want or you envisage but it is a workable solution for the situation you describe.
Should we need to make a fix (a typo in your example), I can update to the last release, correct the typo, test, and if happy, tag it as the next release (rel-2.41). Then I merge that new branch back into default so that my dev line has the fix too, jump on the live server, and pull/update to rel-2.41.
Is that any good to you?

How to merge in mercurial by TortoiseHG?

I have two machines: Windows 7 with TortoiseHg 2.6, and Linux with TortoiseHg 1.5. I use Bitbucket to host my centralized repository. The Windows machine is associated with the Bitbucket user account Cassie-win, and the Linux machine with the Bitbucket user account Cassie-linux. Here are the steps I have performed:
Created an empty centralized repository on Bitbucket under the Cassie-linux account
Created two files locally on the Linux machine, file1.txt and file2.txt, and pushed them to the Cassie-linux account
Added user Cassie-win to the repository
Cloned the repository to the Windows machine, made changes to file2.txt there, and pushed them back to the centralized repository
Meanwhile, also made some changes to file2.txt on the Linux machine and committed them locally
Then I pulled the changes from the centralized repository under the Cassie-linux account to my local Linux machine. Now my local Linux repository has two heads because of the different contents of file2.txt, and I would like to merge them.
I used the "merge with" tool of TortoiseHg on my Linux machine to merge these two heads, but it kept failing. I tried about a hundred times and don't know what I'm doing wrong. Both file1 and file2 are test files with only three lines. I also used the command "hg resolve file2.txt" to check the error, but it only said that merging failed and didn't show much information. Does anyone know how to use the TortoiseHg merge tool to merge two heads? And what could be preventing TortoiseHg from merging them?
I embedded the screenshot which I took on the Linux machine which has TortoiseHg 1.5.
I right-clicked "from Cassie-win account" and it has a "merge with" option. Then I clicked the "merge" option with the "merge" tool, but it failed with the error messages below.
Thank you very much,

By "merge with" in TortoiseHg, do you mean you tried "Merge with Local" from the context menu of one head after updating your working directory to the other head? If so, you should have seen a wizard taking you through the steps of a merge. TortoiseHg will expect you to intervene if it cannot automatically merge the two heads. Once it asks you to do that, you have a few options including "Mercurial Resolve" and "Tool Resolve."
If you could post more details about the errors you see when the merge "fails," we might be able to be more helpful.

First things first: A merge just creates a new file version. You must commit it before it can be seen or pushed to your other repo clones.
Now, if you're doing a merge and the same line, or adjacent lines, have been modified in both merge parents, the automatic merge won't succeed and you'll be asked to choose what to do at each conflict point. KDiff3, the tool that TortoiseHg uses for merging files on Windows, is not terribly intuitive, so here's an overview:
If there are multiple files to merge, you'll see a directory control. Navigate to one of the files and open it so you see two file versions side by side.
Depending on the circumstances, you may now have to activate "Merge this file" (in the Merge menu and on the toolbar), which will show you a third version of the file (the merge result) below the other two.
You can now navigate from change to change, and click on the A and B buttons to select which change to use. Note that the buttons are toggles, and it is possible to activate both together (to import both versions into the merge). Note also that you'll see diffs that could be merged automatically. In general you can leave them alone (there are navigation buttons that go directly to the next unresolved diff).
Once you've figured out how to work all this and chosen a version for each diff, you'll be able to save and go on to the next file. When you're done, your merge is ready to commit.
Edit:
So you need to do this on the Linux side, and you've got no Kdiff3. Ok, then do it the old-fashioned way: Using the commandline and a regular editor.
When a merge fails, the conflicted file is rewritten to contain both revisions, with each conflicting region delimited by conflict markers (<<<<<<<, =======, >>>>>>>). Open it with your favorite editor, look at it carefully, and clean up each marked region until you have a clean, usable file exactly as you want it to be.
Exit, drop to the command line and type hg resolve -m file2.txt. This removes the file from the list of conflicts.
When you've done this with all conflicted files (you can list them with hg resolve -l) you'll be allowed to commit, and your troubles are over.
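Before running hg resolve -m, a small check can confirm that no conflict markers are left in the file. A minimal Python sketch (`conflict_regions` is a hypothetical name), assuming the standard <<<<<<< / >>>>>>> markers written by Mercurial's internal merge; it exits non-zero while markers remain, so a wrapper script could refuse to mark the file resolved.

```python
import sys
from typing import List, Tuple


def conflict_regions(lines: List[str]) -> List[Tuple[int, int]]:
    """Return 1-based (first, last) line numbers of each <<<<<<< ... >>>>>>> region."""
    regions = []
    start = None
    for number, line in enumerate(lines, 1):
        if line.startswith("<<<<<<<"):
            start = number
        elif line.startswith(">>>>>>>") and start is not None:
            regions.append((start, number))
            start = None
    return regions


if __name__ == "__main__":
    path = sys.argv[1]  # e.g. file2.txt
    with open(path, encoding="utf-8") as handle:
        regions = conflict_regions(handle.read().splitlines())
    for first, last in regions:
        print(f"{path}: unresolved conflict on lines {first}-{last}")
    # Non-zero exit while markers remain.
    sys.exit(1 if regions else 0)
```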
PS. If you don't like the merge tools you've got, consider installing KDiff3 (it's available for Linux, though I have no idea how well it works there), or p4merge as @LazyBadger suggests.

Side notes
You can exchange data between your hosts without "Bitbucket-in-the-middle": just run hg serve on one host and hg pull <PARTY> on the other
You could use a single BB account from both of your hosts (less management on the BB side) and tell the source of each changeset apart in the Bitbucket interface by the usernames in the changesets
To the question of merges
When (in any SCM) you try to merge two diverged lines of code, there are two possible cases:
Independent changes do not overlap and can be combined into a common descendant without user intervention. In this case the merge "just happens"
Changes intersect and some lines are in conflict, i.e. there are two different changes to the same old data. In this case the user's choice is needed, and the SCM either stores the merge result with the conflicted parts marked (noting the merge as unsuccessful and unfinished until the user acts) or runs a user-defined merge resolver (read up on "Visual Merge Tools")
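The two cases can be illustrated with a toy three-way merge. This Python sketch is deliberately simplified and is not how Mercurial or any real merge tool works: it assumes all three versions have the same number of lines (only in-place edits), and it compares line by line, so edits to adjacent lines merge cleanly here, whereas real tools usually group nearby changes into one conflicting hunk. `merge3_same_length` is a hypothetical name.

```python
from typing import List, Tuple


def merge3_same_length(base: List[str], ours: List[str], theirs: List[str]) -> Tuple[List[str], List[int]]:
    """Toy three-way merge for versions that only edit lines in place.

    Returns the merged lines and the 1-based line numbers that conflicted.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this toy only handles versions with the same number of lines")
    merged: List[str] = []
    conflicts: List[int] = []
    for number, (b, o, t) in enumerate(zip(base, ours, theirs), 1):
        if o == t:      # unchanged, or changed identically on both sides
            merged.append(o)
        elif o == b:    # only "theirs" touched this line: case 1, merge just happens
            merged.append(t)
        elif t == b:    # only "ours" touched this line: case 1 as well
            merged.append(o)
        else:           # both sides changed it differently: case 2, needs the user
            merged += ["<<<<<<< ours", o, "=======", t, ">>>>>>> theirs"]
            conflicts.append(number)
    return merged, conflicts


if __name__ == "__main__":
    base = ["line 1", "line 2", "line 3"]
    print(merge3_same_length(base, ["LINE 1", "line 2", "line 3"], ["line 1", "line 2", "LINE 3"]))
    print(merge3_same_length(base, ["ours", "line 2", "line 3"], ["theirs", "line 2", "line 3"]))
```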
In your situation, obviously, you have the second case: some of the 3 lines in the merge sources were in conflict, and you haven't configured Diff/Merge tools in TortoiseHg (TortoiseHg - Global Settings - TortoiseHg)
Installing and adding these tools will be the best choice for the future. For now, you have to edit the file in conflict by hand and pick the correct data in the conflicted lines, mark the conflict as resolved (see the TortoiseHg context menu for the file), remove the temp files, and finally commit the merge

Mercurial precommit script to change a file

Despite the decentralized nature of Mercurial, we have a centralized server that we all push to and that does nightly builds, packaging, etc...
Here's what we want to achieve: one of the files under source control contains the major+minor version numbers, which ideally would be increased with every commit. Since centralized numbering is not possible on developers' machines, we were thinking of a precommit script on the main server that would write a new minor version number to that file for each commit that is pushed. The questions are:
since it's precommit, can this file change be part of the same commit?
if not, can precommit cause another commit and how do you prevent it from cascading/recursing?
how would one do that?
is there a better solution?

A "precommit" script is triggered only at commit time. By the time users are pushing to the "central" server they have already committed, and it's too late for a precommit hook to do anything at all. You can have changegroup and incoming hooks that are triggered to run on the "central" server when the developers push, but those can't modify the commits -- the commits are already committed/baked/done at that point, they can only react to them.
As a suggestion don't actually put the version string in the file -- having a file that changes with every commit just makes merging a pain. Instead do one or more of these:
have a CI server (like Jenkins) do builds on every push and use the Jenkins build number, which can be passed into your build script
use the Mercurial nodeid (hash) as part of your version string so you can always know exactly which revision is in a build -- and don't put it in a file, just query for it in your build (or deploy) script
use a changegroup hook to automatically tag-on-push, which applies a pretty (possibly sequential) name to the commits (note, this pretty much doubles your number of commits since every tag is a commit)
Personally, I use something like this in my build script:
build.sh --version_string=$(hg log -r . --template '{latesttag}.{latesttagdistance}-{node|short}')
That gets me version strings that look like "1.0.3-5fd8ed67272e", which can roughly be read as "built from the changeset three commits after version 1.0 was tagged, with nodeid 5fd8ed67272e", which is pretty darn good -- and it's never saved into a file: it's either baked into the compile (for compiled languages) or written into a VERSION file when my deploy script uploads it to the server.
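If a deploy script needs the pieces of such a version string back (for example to compare the distance since the last tag), it can split on the template's last "-" and last ".". A minimal Python sketch using the template from the answer; `current_version` and `parse_version` are hypothetical helpers, and the parser assumes the tag itself may contain dots and dashes but that the final "." and "-" are the template's separators.

```python
import subprocess
from typing import Tuple

TEMPLATE = "{latesttag}.{latesttagdistance}-{node|short}"


def current_version(repo: str = ".") -> str:
    """Ask Mercurial for the version string of the working directory's parent."""
    result = subprocess.run(
        ["hg", "-R", repo, "log", "-r", ".", "--template", TEMPLATE],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def parse_version(version: str) -> Tuple[str, int, str]:
    """Split '1.0.3-5fd8ed67272e' into ('1.0', 3, '5fd8ed67272e')."""
    rest, _, node = version.rpartition("-")
    tag, _, distance = rest.rpartition(".")
    return tag, int(distance), node


if __name__ == "__main__":
    print(parse_version(current_version()))
```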

See this page in the Mercurial documentation for some comments and ideas about this issue. See also How to expand some version keywords in Mercurial? and the other SO answers referenced there.
