Case-insensitive diffs in Mercurial - mercurial

I'm using Mercurial (specifically TortoiseHg on Windows) to do version control of VBA code. Anybody who's tried this knows that VBA changes the case of every variable throughout a project whenever any declaration of that variable is changed anywhere in the project (regardless of scope). It makes version control a nightmare.
I would like to ignore case changes in my source code when performing diffs. What is the easiest way to do this? (some option for diff that I'm missing, an external diff utility, something else?)
NOTE: I am not talking about dealing with 'case-insensitive filenames' (yes, I'm talking to you Google...)

You can do that when diffing for your on-screen consumption using the ExtDiff Extension.
[extensions]
hgext.extdiff =
[extdiff]
# add new command that runs GNU diff(1) in case-insensitive mode
cmd.mydiff = diff
opts.mydiff = -i
Then you'd run hg mydiff from the command line. That, of course, requires that you have a diff binary installed, whether GNU's or another implementation that supports -i.
However, that's not going to be as helpful as you might like because internally, of course, Mercurial can't ignore case -- it's taking the cryptographic hash of the file contents, and those don't allow for wiggle room. So if you get this set up you'll do hg mydiff, and see no changes, and then do hg commit and see changes all over the place.
So you can make this work on-screen, but not fundamentally.
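To illustrate why the stored hashes leave no wiggle room, here is a minimal Python sketch. Mercurial identifies file revisions with SHA-1; the real node hash also mixes in the parent revisions, so hashing only the content, as the hypothetical content_digest helper does here, is a simplification. The VBA lines are made-up sample data.

```python
import hashlib


def content_digest(text: str) -> str:
    """Return the SHA-1 hex digest of the text (simplified: content only)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


before = "Dim myValue As Long\n"
after = "Dim MyValue As Long\n"  # same code, only the case changed

# A case-insensitive comparison sees no difference...
same_ignoring_case = before.lower() == after.lower()
# ...but the digests differ, so the file still counts as modified.
same_digest = content_digest(before) == content_digest(after)

print(same_ignoring_case, same_digest)
```

The first value is True and the second is False, which is the mismatch between hg mydiff and hg commit described above.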
One option would be to find a Visual Basic code cleaner, similar to indent for C-like languages, that normalizes variable case, and run it in a Mercurial commit hook. Then at least all the code going into source control will be consistent and you can diff across revisions accurately.

If you are okay with having your code in all lower-case, say, then you could employ the encode/decode filters for this. It would work like this:
[encode]
*.vba = tr A-Z a-z
This will encode the file content in lower-case whenever you do a commit. The diffs are also computed based on the encoded (repository) version of the files.
Consider a file that contains
hello
Changing it in your working copy to
Hello World
will give a diff of
% hg diff
diff --git a/a.txt b/a.txt
--- a/a.txt
+++ b/a.txt
@@ -1,1 +1,1 @@
-hello
+hello world
Notice how the capital "H" and "W" have been ignored.
I don't really know anything about VBA code, so I'm not 100% sure this solution works for you. But I hope it can be a starting point.
One drawback is that you'll need to set this encode rule for all your repositories. The reposettings extension can help you here.

Here's the solution I have settled on. It is far from ideal, but better than the other alternatives I've considered.
I created an Autohotkey script that does the following:
reverts MS Access files in a repository with detected changes (to .orig files)
reads in the .orig file (the one with the changes)
reads in the existing file (the one already in the repository)
converts the text of both files to lower case
compares the lower case contents of the files
if the files still differ, the .orig file is restored so it may be committed to the repository
if the files are the same (i.e., they differ only in case), the .orig file is deleted because we don't care about those changes
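The comparison step in the list above can be sketched in Python (the actual script is AutoHotkey). This is a minimal sketch, not the author's code: the function name same_ignoring_case is hypothetical, and it uses str.lower() to mirror the lower-casing step.

```python
def same_ignoring_case(repo_text: str, changed_text: str) -> bool:
    """Return True when the two texts are equal after lower-casing both."""
    return repo_text.lower() == changed_text.lower()


repo_version = "Dim rowCount As Long\nrowCount = 0\n"
case_only_change = "Dim RowCount As Long\nRowCount = 0\n"
real_change = "Dim RowCount As Long\nRowCount = 1\n"

# Case-only change: the .orig file would be deleted (change discarded).
print(same_ignoring_case(repo_version, case_only_change))
# Real change: the .orig file would be restored so it can be committed.
print(same_ignoring_case(repo_version, real_change))
```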
For files that have actual changes that we care about, I still see the case changes that were made as well. If that results in a lot of noise, I open the file in a comparison tool that allows case-insensitive compares (e.g., kdiff).
It's not a perfect solution, but it removes about 90% of the frustration for me.
Here's my script. Note that the script includes another Autohotkey script, ConsoleApp.ahk, which provides a function named ConsoleApp_RunWait(). This is a third-party script that no longer works very well with 64-bit AHK, so I'm not including it as part of my answer. Any AHK function that executes a command line and returns the output as a string will suffice.
; This script checks an MS Access source directory and reverts all files whose only modifications are to the
; case of the characters within the file.
#Include %A_ScriptDir%\ConsoleApp.ahk
#NoEnv ; Recommended for performance and compatibility with future AutoHotkey releases.
SendMode Input ; Recommended for new scripts due to its superior speed and reliability.
SetWorkingDir %A_ScriptDir% ; Ensures a consistent starting directory.
; Allow for custom path to hg (support for moving to TortoiseHg 2.0)
IniRead hg, %A_ScriptDir%\LocalSettings\Settings.cfg, TortoiseHg, hg_path, hg
if 0 < 1 ; The left side of a non-expression if-statement is always the name of a variable.
{
    MsgBox Usage:`n`HgIgnoreCase DirectoryWithFilesToScrub
    ExitApp
}
SrcDir = %1%
StringReplace SrcDir, SrcDir, ", , All
StringRight test, SrcDir, 1 ; add trailing slash if necessary
ifnotequal test, \
    SrcDir = %SrcDir%\
RestoreOriginals(SrcDir)
RevertCaseChangeModifiedFiles(SrcDir)
RevertCaseChangeModifiedFiles(SrcDir) {
    global hg
    includes = -I "*.form" -I "*.bas" -I "*.report" -I "*.table"
    cmdline = %hg% revert --all %includes%
    ;Don't revert items that have been removed completely
    Loop 3
    {
        Result := ConsoleApp_RunWait(hg . " status -nrd " . includes, SrcDir)
        If (Result)
            Break
    }
    Loop parse, Result, `n, `r
    {
        if (A_LoopField)
            cmdline = %cmdline% -X "%A_LoopField%"
    }
    Result =
    ;msgbox %cmdline%
    ;revert all modified forms, reports, and code modules
    Loop 3
    {
        Result := ConsoleApp_RunWait(cmdline, SrcDir)
        If (Result)
            Break
    }
    ;MsgBox %Result%
    Loop parse, Result, `n, `r
    {
        StringLeft FileStatus, A_LoopField, 9
        If (FileStatus = "reverting")
        {
            StringMid FName, A_LoopField, 11
            FullPath = %SrcDir%%FName%
            ToolTip Checking %FullPath%
            RestoreIfNotEqual(FullPath, FullPath . ".orig")
        }
    }
    ToolTip
}
RestoreIfNotEqual(FName, FNameOrig) {
    FileRead File1, %FName%
    FileRead File2, %FNameOrig%
    StringLower File1, File1
    StringLower File2, File2
    ;MsgBox %FName%`n%FNameOrig%
    If (File1 = File2)
        FileDelete %FNameOrig%
    Else
        FileMove %FNameOrig%, %FName%, 1
}
RestoreOriginals(SrcDir) {
    Loop %SrcDir%*.orig
    {
        ;MsgBox %A_LoopFileLongPath%`n%NewName%
        NewName := SubStr(A_LoopFileLongPath, 1, -5)
        FileMove %A_LoopFileLongPath%, %NewName%, 1
    }
    while FileExist(SrcDir . "*.orig")
        Sleep 10
}

Related

Ignore JSON ordering

In a project, we use two IDEs. The project contains hundreds of code files and hundreds of special JSON-format files that are constantly reread and rewritten by these IDEs. When we used a single IDE, this was not a problem, because the files were always written the same way. Unfortunately, the different IDEs save JSON with different key ordering, which produces dozens of spurious changes in Git and a uselessly bloated diff. These files are important and must not be excluded via .gitignore, but they rarely change in substance, so real changes could probably be handled manually.
So, is there a terminal command to quickly undo or unstage changes for a specific file extension? Or is it possible for Git to track changes to the JSON files without considering the order?
I also had the idea of using a custom script to reorder the JSON files, but it would consume too much CPU and also trigger a reread by the IDE, which is bad as well.
Update
I found the following command from another SO question:
git checkout main -- $(git ls-files -- "*.yy")
This workaround isn't handy, but it basically solves the problem. If anybody knows how to make Git ignore JSON ordering, that would be great!
One way to temporarily ignore changes to the json files is to tell git to assume they haven't changed:
git update-index --assume-unchanged file-to-ignore.json
And only when you want to commit, tell git to really look at the file again:
git update-index --no-assume-unchanged file-to-ignore.json
Another option would be to use a pre-commit-hook to sort the json only when committing.
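As a rough illustration of what "sorting the JSON" in such a hook could mean, here is a minimal Python sketch. It assumes the files are strict JSON that Python's json module can parse; the function name canonical_json and the sample objects are hypothetical. Only object keys are reordered, and array order is preserved because it can be meaningful.

```python
import json


def canonical_json(text: str) -> str:
    """Re-serialize JSON text with sorted keys and fixed indentation."""
    data = json.loads(text)
    return json.dumps(data, sort_keys=True, indent=2, ensure_ascii=False) + "\n"


# The same object as two different IDEs might write it.
from_ide_a = '{"name": "sprite", "width": 64, "tags": ["a", "b"]}'
from_ide_b = '{"tags": ["a", "b"], "width": 64, "name": "sprite"}'

# After canonicalization the two serializations are byte-identical.
print(canonical_json(from_ide_a) == canonical_json(from_ide_b))
```

A hook would apply such a function to each staged JSON file, so that reordered-but-equal files produce no diff.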
I'd make a Git pre-commit hook to make sure all JSON files are always formatted the same way. For example, in .git/hooks/pre-commit put
#!/bin/sh
php git/precommit_hook.php
exit $?
and if you're on a Unix system, make sure the hook is executable: chmod +x .git/hooks/pre-commit
and in git/precommit_hook.php put
<?php
declare (strict_types = 1);
if (PHP_VERSION_ID < 70300) {
    fwrite(STDERR, "PHP 7.3 or higher is required to run this script");
    exit(1);
}
$changed_files = explode("\x00", rtrim(shell_exec("git diff --name-only --cached -z"), "\x00"));
foreach ($changed_files as $file) {
    if (!file_exists($file)) {
        // File was deleted, skip it
        continue;
    }
    $ext = pathinfo($file, PATHINFO_EXTENSION);
    if ($ext === "json") {
        $json = json_decode(file_get_contents($file), true);
        if (json_last_error() !== JSON_ERROR_NONE) {
            fwrite(STDERR, "JSON Error: " . json_last_error_msg() . " in $file, will not format it\n");
            continue;
        }
        $json = json_encode($json, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE | JSON_THROW_ON_ERROR);
        file_put_contents($file, $json, LOCK_EX);
    }
}
Now all *.json files will be formatted with the PHP JSON encoder flags JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE | JSON_THROW_ON_ERROR,
no matter what IDE you use :)

Recursively Replace One Windows Path w/ Another in Text Files

I have a large number of text files stored on a Red Hat server that contain explicit Windows paths. Today, that path has changed and I would like to change the text files to reflect the new path. As they are Windows paths, they all contain single backslashes. I would like to maintain the single backslashes if possible.
I wanted to ask what the best method to perform this string replacement would be. I have made backups of folders so that I may test on a smaller scale before applying to the larger scale that will affect my group members.
Example:
Change $oldPath to $newPath in all *.py files recursively contained in current directory.
i.e. $oldPath\common\file_referenced should become $newPath\common\file_referenced
Robustly using any awk in any shell on every Unix box and regardless of which characters your old or new directory paths contain and whether or not the final directory in either old or new could be a substring of another existing directory name:
$ cat file
\old\fashioned\common\file_referenced
$ oldPath='\old\fashioned'
$ newPath='\new\fangled\etc'
$ awk '
BEGIN { old=ARGV[1]; new=ARGV[2]; ARGV[1]=ARGV[2]="" }
index($0"\\",old"\\")==1 { $0=new substr($0,length(old)+1) }
1' "$oldPath" "$newPath" file
\new\fangled\etc\common\file_referenced
To update all .py files in a directory you could use GNU awk for -i inplace, or you could do for i in *.py; do awk '...' old new "$i" > tmp && mv tmp "$i"; done, or you could use find and/or xargs, etc. - any of the common Unix ways to process multiple files with any command.
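For comparison, the same matching rule can be sketched in Python. This is a minimal sketch rather than a drop-in replacement: the function name replace_path_prefix is hypothetical, and, like the awk script above, it only rewrites a path that starts the line. Appending a backslash to both sides before comparing ensures that \old\fashioned does not match \old\fashioned2.

```python
def replace_path_prefix(line: str, old: str, new: str) -> str:
    """Replace old with new when the line starts with old as a whole path component."""
    sep = "\\"
    # Compare with a trailing separator so "\old\fashioned" matches
    # "\old\fashioned\x" and "\old\fashioned" but not "\old\fashioned2\x".
    if (line + sep).startswith(old + sep):
        return new + line[len(old):]
    return line


old_path = r"\old\fashioned"
new_path = r"\new\fangled\etc"

print(replace_path_prefix(r"\old\fashioned\common\file_referenced", old_path, new_path))
print(replace_path_prefix(r"\old\fashioned2\common\file_referenced", old_path, new_path))
```

Plain string operations are used instead of a regular expression, so the backslashes in the paths never need escaping beyond the raw-string literals.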

HG command line input receives unwanted default input automatically in THG

I've written an Hg hook (in Python) to check the validity of the committed files according to our team rules. One of these rules prohibits files larger than XX kB, unless agreed with the team. If a large file is committed, I would like the hook to ask the user whether to allow it.
I implemented it like this:
import re, os, sys, mercurial
MAX_SIZE_KB = 500
def check_committed_files(ui, repo, **kwargs):
    changelog = repo.changelog._cache
    lines = changelog[2].splitlines()
    ui.status("Checking files...\n")
    for line in lines[3:-2]:
        fn = line
        ui.status(" " + fn)
        # check file size
        file_size_kb = float(os.stat(line).st_size) / 1024
        if file_size_kb > MAX_SIZE_KB:
            if ui.prompt(" Allow file [%s] of %g kB?" % (fn, file_size_kb)).lower() not in ['y', 'yes']:
                ui.warn(" Not allowed by user\n")
                return 1
        ui.flush()
    return 0
It all works well if I use Hg CLI. But when I use TortoiseHg, the prompt is automatically yes-ed, so I get this in console:
Allow file [test.txt] of 2573.49 kB? y
and the hook goes on. I would like TortoiseHg to show a dialogue with Yes/No buttons. Is it possible? I'd like to have the solution as portable as possible, so e.g. no external Python modules that users need to install.
Since this is my first attempt with Hg hooks, any other comments on my implementation are also much appreciated.

auto-accepting a Mercurial change chunk

I have a very large repo with thousands of files that can regularly get updated by automatic processes that are out of my control (this is for Unity 3D, for what it's worth).
For example, if I upgrade Unity to a new version, it will reimport all textures and maybe add a line in thousands of .meta files that correspond to a new serialized data that didn't exist previously.
Obviously reviewing thousands of files is terrible. Most of the time though, I can quickly identify a particular diff, and would just like to automatically check all the files that have the same diff, commit to get them out of the way, and see what's left: other diffs that I might not know about.
For example, I just committed 4000+ files that all contained the same diff, so the pattern would be easy to find:
- textureFormat: -5
+ textureFormat: -1
I suppose I could write a script, or a TortoiseHg tool to do that, I just have no idea where to begin. I'd need to iterate over all changed files/chunks, match a pattern, commit the chunks...
I know of no tool to do exactly what you want. However I believe it's relatively easy to write a small bash script for such or use the command line:
hg diff --nodates --noprefix -U 0 | grep '^+' | grep -v '+++' | sort | uniq -c | sort -rn
will list the inserted lines of the current diff in descending order of the number of occurrences, that is, the most frequently occurring inserted line first.
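The same counting can be sketched in Python, which may be a more convenient starting point for a script. This is a minimal sketch under assumptions: the function name count_added_lines and the sample diff text are made up, and the input is expected to be unified diff text such as the output of hg diff. Like the grep filter, it skips any line starting with +++.

```python
from collections import Counter


def count_added_lines(diff_text: str) -> list[tuple[str, int]]:
    """Count inserted lines in a unified diff, most frequent first."""
    counts = Counter(
        line[1:]  # drop the leading "+" marker
        for line in diff_text.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    )
    return counts.most_common()


sample_diff = """\
--- a.meta
+++ a.meta
@@ -3 +3 @@
-  textureFormat: -5
+  textureFormat: -1
--- b.meta
+++ b.meta
@@ -3 +3 @@
-  textureFormat: -5
+  textureFormat: -1
@@ -9 +9 @@
+  newSetting: 1
"""

for text, count in count_added_lines(sample_diff):
    print(count, text)
```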
With that list you get a list of files which match the newly inserted pattern, for instance
hg files "set:grep('^ textureFormat: -1')"
should give you all files with that pattern (whether it's new or not, though). You probably want to check those files, whether their diff contains anything else:
hg diff "set:grep('^ textureFormat: -1')"
Now you can make use of the results and even exclude single files, if the diff output didn't suit you:
hg commit "set:grep('^ textureFormat: -1') and not 'unwantedFilename.cpp'"
In the above commands I made use of the fileset capability and of hg grep which accepts regular expressions. Check hg help grep, hg help fileset and hg help patterns for a more in-depth explanation.

how do I open files with conflicts during git/mercurial merge in textmate/sublime

How do I open, from a terminal window, only the files with conflicts during a git/mercurial merge in the TextMate or Sublime Text 2 editors?
You can use the following to open all files with git merge conflicts in sublime text:
git diff --name-only | uniq | xargs subl
I wanted to add another answer. git diff --name-only will give you all files that have diffs. This is why it sometimes yields duplicate entries: it lists a file as "modified" as well as in a merge-conflict state. Piping it into uniq is a good solution for this, but git diff --name-only will also include files you might have purposely changed, so it doesn't actually filter only files with merge conflicts. When you are in the middle of rebasing, this is probably not going to happen often, so I would say in most cases @StephanRodemeier's answer works.
However, what you can do is leverage the --diff-filter option, which selects files by status. See more in the docs:
--diff-filter=[(A|C|D|M|R|T|U|X|B)…[*]]
Select only files that are Added (A), Copied (C), Deleted (D), Modified (M), Renamed (R), have their type (i.e. regular file, symlink, submodule, …) changed (T), are Unmerged (U), are Unknown (X), or have had their pairing Broken (B). Any combination of the filter characters (including none) can be used. When * (All-or-none) is added to the combination, all paths are selected if there is any file that matches other criteria in the comparison; if there is no file that matches other criteria, nothing is selected.
It seems that when files are in the "both modified" state, the diff status is set to both U (Unmerged) and M (Modified), so you can filter for only Unmerged files.
git diff --diff-filter=U --name-only | xargs subl
This should work without needing to pipe into uniq.
Another thing you can consider is simply setting your editor as the difftool. For example, the VS Code documentation specifies how to do this by adding the following to your .gitconfig:
[diff]
tool = default-difftool
[difftool "default-difftool"]
cmd = code --wait --diff $LOCAL $REMOTE