Creating a diff which ignores differences between sentinel lines - language-agnostic

I'm looking for a possible way of getting around some merge conflicts when working through different branches.
It's not unlikely that some information in some files (especially version numbers) is NOT meant to spread across branches, so I'm looking for a way to output a diff that ignores the text between well-defined sentinel lines, and I'd like to know if anything already exists so that I don't have to code my own solution.
This is what I'd like: suppose two source files that look like
some text
DIFF_IGNORE_START
foo bar
DIFF_IGNORE_END
some other text
one
and
some text
DIFF_IGNORE_START
different text
DIFF_IGNORE_END
some other text
two
I want the diff to be
--- original 2011-04-04 15:34:06.000000000 +0200
+++ modified 2011-04-04 15:35:13.000000000 +0200
@@ -3,4 +3,4 @@
foo bar
DIFF_IGNORE_END
some other text
-one
+two
I'd need a solution that allows the ignored blocks to be of a different size as well.

One way to implement this would be through a custom diff driver, declaring a special diff script in a .gitattributes file, which would:
remove every DIFF_IGNORE_xxx section from the root, source and destination versions, replacing each with dummy content (always identical across the three versions)
perform the diff on the modified versions
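The two steps above can be sketched in Python with the standard difflib module. This is only an illustration under stated assumptions: the sentinel names are taken from the question, the `<ignored>` placeholder and the function names are made up for the example, and it diffs two versions rather than wiring itself into Git as a diff driver. Note that the context lines show the placeholder instead of the original block content.

```python
import difflib

START = "DIFF_IGNORE_START"
END = "DIFF_IGNORE_END"


def mask_ignored(lines, placeholder="<ignored>"):
    """Collapse everything between the sentinel lines into one placeholder line."""
    out = []
    skipping = False
    for line in lines:
        stripped = line.strip()
        if skipping:
            if stripped == END:
                skipping = False
                out.append(line)  # keep the closing sentinel itself
            continue              # drop the ignored content
        out.append(line)
        if stripped == START:
            skipping = True
            out.append(placeholder)  # identical dummy content in every version
    return out


def diff_ignoring_blocks(a_text, b_text):
    """Unified diff of two texts after masking the ignored blocks."""
    a = mask_ignored(a_text.splitlines())
    b = mask_ignored(b_text.splitlines())
    return list(difflib.unified_diff(a, b, "original", "modified", lineterm=""))
```

Because each block collapses to a single placeholder line, ignored blocks of different sizes no longer produce a difference.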

Formatting wide output via 'column' (or similar) command(s)

This question actually asks for the inverse of the solution here: I would like to wrap the long column (column 4) over multiple lines. In effect, the output should look like:
cat test.csv | column -s"," -t -c5
col1  col2  col3  col4             col5
1     2     3     longLineOfText   5
                  ThatIWantTo
                  InspectAndWould
                  LikeToWrap
(excuse the u.u.o.c. duplicated over here :) )
The solution would ideally :
make use of standard *nix text processing utilities (e.g. column, paste, pr which usually are present on any modern Linux machine nowadays, usually coming from the core-utils package);
avoid jq as it is not necessarily present on every (production) system;
don't overheat the brain: yes... am looking mainly at you awk & co. gurus :). "Normal" awk / perl / sed is fine.
as a special bonus , a solution using vim would be even more welcome (again, no brain smoke please), since that would allow for syntax-coloring as well.
The background: I want to be able to make sense of the output of docker history, so as a last resort even some Go Template-magic would suit, as would using jq.
In extreme cases, if the benefits of ease of remembering and use outweigh the inconvenience, downloading a new utility (preferably self-contained / statically linked) onto the server is OK, as is using JSON processing commands (in which case Python's json module would be preferred).
Thanks !
LE:
Please keep in mind that docker's output has its columns separated by several spaces, which unfortunately confuses most commands :(
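Since Python was mentioned as an acceptable fallback, here is a minimal sketch of the wrapping using only the standard library. It assumes that columns are separated by runs of two or more spaces and that no cell contains such a run itself, which may not hold for every command shown by docker history. The function name and parameters are made up for this example.

```python
import re
import textwrap


def wrap_column(lines, col, width):
    """Split rows on runs of 2+ spaces and wrap column `col` (0-based) at `width`."""
    rows = [re.split(r" {2,}", line.strip()) for line in lines if line.strip()]
    ncols = max(len(r) for r in rows)
    rows = [r + [""] * (ncols - len(r)) for r in rows]  # pad short rows
    # Every cell becomes a list of physical lines; only `col` is wrapped.
    cells = [
        [(textwrap.wrap(c, width) or [""]) if i == col else [c]
         for i, c in enumerate(r)]
        for r in rows
    ]
    widths = [max(len(part) for row in cells for part in row[i])
              for i in range(ncols)]
    out = []
    for row in cells:
        height = max(len(cell) for cell in row)
        for k in range(height):
            parts = [(cell[k] if k < len(cell) else "").ljust(widths[i])
                     for i, cell in enumerate(row)]
            out.append("  ".join(parts).rstrip())
    return out
```

Feeding it the lines read from standard input and printing the result gives continuation lines aligned under the wrapped column, with the other columns left blank on those lines.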

What does 'Multiline strings are different' mean in RIDE (Robot Framework) output?

I am trying to compare the data in two CSV files and followed the process below in RIDE:
${csvA} = Get File ${filePathA}
${csvB} = Get File ${filePathB}
Should Be Equal As Strings ${csvA} ${csvB}
Here are my two csv contents -
csvA data
Harshil,45,8.03,DMJ
Divy,55,8,VVN
Parth,1,9,vvn
kjhjmb,44,0.5,bugg
csvB data
Harshil,45,8.03,DMJ
Divy,55,78,VVN
Parth,1,9,vvnbcb
acc,5,6,afafa
Since some of the data does not match, the result is FAIL when I run the code in RIDE. But the log shows the data below:
Multiline strings are different:
--- first
+++ second
@@ -1,4 +1,4 @@
Harshil,45,8.03,DMJ
-Divy,55,8,VVN
-Parth,1,9,vvn
-kjhjmb,44,0.5,bugg
+Divy,55,78,VVN
+Parth,1,9,vvnbcb
+acc,5,6,afafa
I would like to know the meaning of the --- first, +++ second and @@ -1,4 +1,4 @@ content.
Thanks in advance!
When Robot compares multiline strings (data that has newlines in it), it reports the differences in the format produced by the standard unix tool diff. Those characters are all part of what's called a unified diff. Even though you pass in raw data, it treats the data as two files and shows the differences between the two in a format familiar to most programmers.
Here are two references to read more about the format:
What does "@@ -1 +1 @@" mean in Git's diff output?. (stackoverflow)
the diff man page (gnu.org)
In short, the @@ line tells you which line numbers the differences cover, the - marks lines that appear only in the first string, and the + marks lines that appear only in the second.
In your specific example it's telling you that three lines were different between the two strings: the line beginning with Divy, the line beginning with Parth, and the last line (beginning with kjhjmb in the first string and with acc in the second). Since the line beginning with Harshil does not show a + or -, that means it was identical between the two strings.
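To see the format outside of Robot, the same kind of report can be produced with Python's standard difflib module. This is just an illustration using the two CSV contents from the question; the labels first and second are passed in by hand, and the helper name is made up for the example.

```python
import difflib

csv_a = "Harshil,45,8.03,DMJ\nDivy,55,8,VVN\nParth,1,9,vvn\nkjhjmb,44,0.5,bugg"
csv_b = "Harshil,45,8.03,DMJ\nDivy,55,78,VVN\nParth,1,9,vvnbcb\nacc,5,6,afafa"


def unified(first, second):
    """Return the unified diff of two multiline strings as a list of lines."""
    return list(difflib.unified_diff(
        first.splitlines(), second.splitlines(),
        "first", "second", lineterm=""))


for line in unified(csv_a, csv_b):
    print(line)
```

In the @@ -1,4 +1,4 @@ header, -1,4 means the hunk covers 4 lines of the first string starting at line 1, and +1,4 means the same for the second string.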

Hidden character in Pages and MySql

I have a text that seems to have a hidden character.
The original text was written with Apple Pages, the word processor, and copied and pasted into a MySQL database. The headings are h2 elements written in Markdown. I noticed the hidden character when I ran a SELECT against the database to match ## (.*) and convert each match to an h2 tag. Some of them work and some do not. For instance, if I use the regex /## / (with a space after the #), it only finds ## Brand:
## Brand
## New Tech
I tested that in different regex tools, for instance http://regexr.com/3f660. With /## / they all find only ## Brand.
I can solve the problem if I use ##\s, or if I delete that space and type a space again. I have many cases like that in a big database, and I would like to understand the problem first and clean it up later. If I go to Apple Pages > Show Invisibles it shows : between # and N in ## New Tech. What is that character and how can I find it to delete it in a MySQL database?
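One way to find out what the character is would be to print the Unicode code point and name of every character in an affected heading. The sketch below assumes, purely for illustration, that the culprit is a no-break space (U+00A0); that is a guess consistent with \s matching it, not something the question confirms. The function names and the sample string are made up.

```python
import unicodedata


def describe(text):
    """Return (character, code point, Unicode name) for every character."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
            for ch in text]


def normalize_spaces(text):
    """Replace every Unicode space separator (category Zs) with an ASCII space."""
    return "".join(" " if unicodedata.category(ch) == "Zs" else ch
                   for ch in text)


# Hypothetical sample: assumes the hidden character is a no-break space.
sample = "##\u00a0New Tech"
for ch, code, name in describe(sample):
    print(repr(ch), code, name)
print(normalize_spaces(sample))
```

On the database side, MySQL's HEX() function can show the raw bytes of a stored value, which helps confirm which character is actually there before cleaning it up.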

Mercurial - List out the modified lines of code in a file with line number

I'm new to Mercurial. I have a certain revision, and I would like to switch to that revision and save the changeset of a particular file together with line numbers. Thank you.
I don't know a simple way to do this. Mercurial has a method where it calculates the diff between changesets and then it applies a formatter to this to print the data.
But your requirement is more complex than it looks. Imagine you have two changes in a file. In version 2, a couple of lines at the beginning have been deleted and then a line near the end has been changed.
Questions:
How do you plan to assign line numbers to the deleted lines? Omit them or use the original line numbers from version 1?
How about the lines after the deleted lines? Do you want to show the new line numbers or the original ones?
Which line numbers are you going to show for the changes near the end?
Of course, you could show both but that would need a lot of parsing in your head.
Some HTML-based changeset viewers use this approach: https://bitbucket.org/digulla/ts-html/commits/62fc23841ff7e7cce95eefa85244a2b821f92ba2
But I haven't seen anything similar for the command line, since it would waste 15-20 columns of text.
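If showing both numbers is acceptable, a small script can derive them from the hunk headers of a unified diff. The following is a minimal sketch, assuming the input is the unified diff of a single file (for example, the saved output of hg diff for one file); the function name is made up for this example.

```python
import re

HUNK = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def number_diff_lines(diff_lines):
    """Return (old_no, new_no, text) tuples; a missing side is None."""
    old = new = None
    result = []
    for line in diff_lines:
        m = HUNK.match(line)
        if m:
            # A hunk header resets both counters.
            old, new = int(m.group(1)), int(m.group(2))
            continue
        if old is None or line.startswith("\\"):
            continue  # file headers, or "\ No newline at end of file"
        if line.startswith("-"):
            result.append((old, None, line[1:]))  # only in the old version
            old += 1
        elif line.startswith("+"):
            result.append((None, new, line[1:]))  # only in the new version
            new += 1
        else:
            result.append((old, new, line[1:]))   # unchanged context line
            old += 1
            new += 1
    return result
```

Deleted lines keep their original number, added lines get their new number, and context lines carry both, which answers the three questions above at the cost of two extra columns.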

Script to adjust history in an RCS/CVS ,v file

In preparation for a migration to Mercurial, I would like to make some systematic changes to many thousands of ,v files. (I'll be editing copies of the originals, I hasten to add.)
Examples of the sorts of changes I'm after:
For each revision whose message begins with some text that indicates a known username (e.g. [Fred Bloggs]), if the username in the comment matches the Author in the ,v file, then delete the unnecessary username text from the commit message
If the ,v contains a useful description, append it to the commit message for revision 1.1 (cvs2hg ignores the description - but lots of our CVS files actually came from RCS, where it was easy to put the initial commit message into the description field by mistake)
For edits made from certain shared user accounts, adjust the author, depending on the contents of the commit message.
Things I've considered:
Running 'cvs log' on each individual ,v file - parsing the output, and using rcs -m to change this history. Problems with this include:
there doesn't seem to be a way to pass a text file to rcs -m, so if the revision message contained single and/or double quotes, or spanned multiple lines, it would be quite a challenge to quote it correctly in the script
I can't see an rcs or cvs facility to change the author name associated with a revision
less importantly, it would be likely to start a huge number of processes - which I think could get slow
Writing Python to parse the ,v file, and adjust the contents. Problems with this include:
we have a mixture of line-endings in our ,v files - including some binary files that should have been text, and vice-versa - so great care would be needed to not corrupt the files
care would be needed for quoting of the @ character in any commit messages, if it fell on the start of the line in a multi-line comment
care would also be needed on revisions where the last line of the committed file was changed, and doesn't have a newline - meaning that the ,v has a @ at the very end of a line, instead of being preceded by \n
Clone the version of cvs2hg that we are using, and try to adjust its code to make the desired edits in-place
Are there any other approaches that would be less work, or any existing code that implements this kind of functionality?
Your first approach may be the best one. I know that in Perl, handling quotation marks and multiple lines wouldn't be a problem. For example:
my $revision = ...;
my $log_message = ...;
system('rcs', "-m$revision:$log_message", $filename);
where $log_message can contain any arbitrary text. Since the string doesn't go through the shell, newlines and other metacharacters won't be reinterpreted. I'm sure you can do the same thing in Python.
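A rough Python equivalent of the Perl snippet, as a sketch: subprocess.run with an argument list (and without shell=True) also bypasses the shell. The two helper names are made up for this example, and the revision, message and filename are whatever your script supplies.

```python
import subprocess


def rcs_message_command(revision, log_message, filename):
    """Build the argument list for `rcs -mREV:MSG FILE` with no shell quoting."""
    return ["rcs", f"-m{revision}:{log_message}", filename]


def set_log_message(revision, log_message, filename):
    # Passing a list (and no shell=True) hands each argument to rcs unchanged,
    # so quotes and newlines in the message need no escaping.
    subprocess.run(rcs_message_command(revision, log_message, filename),
                   check=True)
```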
(As for your second approach, I wouldn't expect line endings to be a problem. If you have Unix-style \n endings and Windows-style \r\n endings, you can just treat the trailing \r as part of the line, and everything should stay consistent. I'm making some assumptions here about the layout of ,v files.)
I wrote a Python library, EditRCS (PyPi) that implements the RCS format so the user can load an RCS file as a tree of Python objects, modify it programmatically and save to a new RCS file.
You can apply a function to every revision using mapDeltas(), for example to change an author's name; or walk the tree using getNext() for something more complicated such as joining two file histories together.