Can GPG change the contents of an encrypted file? - csv

Our company has a vendor that sends a csv in which some fields contain commas as part of the text. This causes columns to drift to the right. They claim that they are enclosing those fields in quotation marks (which would resolve the issue), but when we decrypt the files using gpg, the quotation marks are lost.
Is this claim nonsense?
The file is delivered encrypted as a .pgp.
This is the template for the batch file we use to invoke gpg to perform the decryption.
gpg --batch --yes --passphrase {PASSPHRASE} --pinentry-mode loopback -d -o "{OUTPUT}" "{TARGET}"

Yes, the claim is nonsense: OpenPGP encryption is lossless, so the file before encryption and the file after decryption are byte-for-byte identical. If the quotation marks are missing after decryption, they were never in the file the vendor encrypted.
If you want assurance that the files are unchanged, have the vendor create a hash (e.g., SHA-256) of the file before encryption and include this hash when they send you the file.
For example, something like sha256sum FILE > SHA256SUM.txt && gpg -r USER -e FILE would produce a SHA256SUM.txt file containing the SHA-256 hash of FILE and also encrypt FILE with USER's key. The vendor can then send you SHA256SUM.txt along with the encrypted file so you can compare it to the hash of the decrypted file.
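On your side, the matching check could look something like this (a sketch; it assumes the decrypted output is written to the same file name, FILE, that the vendor hashed):
gpg --batch --yes --passphrase {PASSPHRASE} --pinentry-mode loopback -d -o FILE FILE.pgp
sha256sum -c SHA256SUM.txt
sha256sum -c prints "FILE: OK" when the decrypted file matches the vendor's hash; a mismatch would point to the file itself having changed, not to gpg.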

Related

Recursively Replace One Windows Path w/ Another in Text Files

I have a large amount of text files stored on a Red Hat server that contain explicit Windows paths. Today, that path has changed and I would like to change the text files to reflect the new path. As they are Windows paths, they all contain single backslashes. I would like to maintain the single backslashes if possible.
I wanted to ask what the best method to perform this string replacement would be. I have made backups of folders so that I may test on a smaller scale before applying to the larger scale that will affect my group members.
Example:
Change $oldPath to $newPath in all *.py files recursively contained in current directory.
i.e. $oldPath\common\file_referenced should become $newPath\common\file_referenced
Robustly, using any awk in any shell on every Unix box, regardless of which characters your old or new directory paths contain, and whether or not the final directory in either path could be a substring of another existing directory name:
$ cat file
\old\fashioned\common\file_referenced
$ oldPath='\old\fashioned'
$ newPath='\new\fangled\etc'
$ awk '
# stash the paths from ARGV so their backslashes are not mangled the way
# they would be with -v or var=value assignments, then blank out ARGV so
# awk does not try to read the paths as input files
BEGIN { old=ARGV[1]; new=ARGV[2]; ARGV[1]=ARGV[2]="" }
# appending "\" to both sides makes the old path match only as a whole
# directory prefix, never as a substring like \old\fashioned2
index($0"\\",old"\\")==1 { $0=new substr($0,length(old)+1) }
1' "$oldPath" "$newPath" file
\new\fangled\etc\common\file_referenced
To update all .py files in a directory you could use GNU awk for -i inplace, or you could do for i in *.py; do awk '...' old new "$i" > tmp && mv tmp "$i"; done, or you could use find and/or xargs, etc. - any of the common Unix ways to process multiple files with any command.
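For the recursive case in the question, one way to combine that with find (a sketch, assuming none of the file names contain newlines) is:
find . -type f -name '*.py' | while IFS= read -r f; do
  awk '
  BEGIN { old=ARGV[1]; new=ARGV[2]; ARGV[1]=ARGV[2]="" }
  index($0"\\",old"\\")==1 { $0=new substr($0,length(old)+1) }
  1' "$oldPath" "$newPath" "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done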

Merging PDFs with special characters in file names with Sejda Console via .CSV

I'm new to this forum and I'm not a programmer, so forgive me if I'm asking stupid questions...
I'm trying to merge some PDFs into one PDF with Sejda Console using a .csv file, but when the .csv contains special characters (e.g. ø), Sejda returns:
Invalid filename found: File 'Something � something.pdf"...
So it changed ø into �.
I've tried saving the .csv with different encodings (via Notepad's Save As: ANSI, Unicode and UTF-8) and none of them works (though each has its own unique way of mangling the filename...).
Without these kinds of characters it works fine.
It also works fine when the file names with ø are given directly in the syntax, like:
sejda-console-3.1.3/bin/sejda-console merge -f first.pdf second.pdf -o merged.pdf
And a second problem occurred: when a comma exists in the file name, the file name is cut off at the comma. That would be logical if the list separator were a comma, but on my PC the list separator is a semicolon (Regional and Language Options). Adding quotes around the file name doesn't work either...
I call the batch of Sejda with:
call "C:\sejda-console-3.0.13\bin\sejda-console.bat" merge -l 28.csv -o 28.pdf
And for this test 28.csv contains:
1700050.1_0060764-CROSS TRACK SKATE AXLE.pdf,
1700050.1_0060792-ø32 ATK10K6 FIXING PLATE.pdf,
1700050.1_0060798-CROSS TRACK SKATE NUTPLATE.pdf,
What is the proper way to get Sejda to merge correctly?

CVS -- Need command line to change status of file from Binary to allow keyword substitution

I am coming into an existing project after several years of use. I have been attempting to add the nice keywords $Header$ and $Id$ so that I can identify the file versions in use.
I have come across several text files where these keywords did not expand at all. Investigation has determined that CVS thinks these files are BINARY and will not expand the keywords.
Is there any way, from a Linux command-line invocation, to permanently change the status of these files in the repository so that keyword expansion happens? I'd be appreciative if you could tell me. Several attempts that I have tried have not succeeded.
cvs admin -kkv filename
will restore the file to the default text mode so keywords are expanded.
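One caveat (an assumption worth verifying in your setup): working copies checked out while the file was binary may carry a sticky -kb option, so you may also need to reset that before the keywords expand, e.g.:
cvs update -A filename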
If you type
cvs log -h filename
(to show just the header and not the entire history), a binary file will show
keyword substitution: b
which indicates that keyword substitution is never done, while a text file will show
keyword substitution: kv
The CVSROOT/cvswrappers file can be used to specify the default keyword-substitution mode for new files you add, based on their names.
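For example, cvswrappers entries look something like this (a sketch; the patterns here are hypothetical):
*.gif -k 'b'
*.zip -k 'b'
*.txt -k 'kv'
Files matching a -k 'b' pattern are added as binary; files matching -k 'kv' get normal keyword expansion.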

How to load triplets from a csv-file into MarkLogic?

What I am starting with is the postcode table for the Netherlands. I split it up into a couple of csv files, each containing, for instance, the city as subject, PartOf as predicate and municipality as object. That gives you this in a file:
city,PartOf,municipality
Meppel,PartOf,Meppel
Nijeveen,PartOf,Meppel
Rogat,PartOf,Meppel
Now I would like to get this data into MarkLogic. I can import csv files and I can import triples, but I can't figure out the combination.
I would suggest rewriting it slightly so it conforms to the N-Triples format, giving it the .nt extension, and then using MLCP to load it with -input_file_type rdf.
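For example, the rows above could become something like this (a sketch; the example.com IRIs are invented and you would substitute your own vocabulary):
<http://example.com/city/Meppel> <http://example.com/PartOf> <http://example.com/municipality/Meppel> .
<http://example.com/city/Nijeveen> <http://example.com/PartOf> <http://example.com/municipality/Meppel> .
<http://example.com/city/Rogat> <http://example.com/PartOf> <http://example.com/municipality/Meppel> .
Save it as, say, postcodes.nt and MLCP will load it as triples.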
HTH!
You can use Google Refine to convert CSV data to RDF. After that, MLCP can be used to push that data. You can do something like this:
$ mlcp.sh import -username user -password password -host localhost \
-port 8000 -input_file_path /my/data -mode local \
-input_file_type rdf
For more on loading triples using MLCP, you can refer to this MarkLogic Community Page.

Script to adjust history in an RCS/CVS ,v file

In preparation for a migration to Mercurial, I would like to make some systematic changes to many thousands of ,v files. (I'll be editing copies of the originals, I hasten to add.)
Examples of the sorts of changes I'm after:
For each revision whose message begins with some text that indicates a known username (e.g. [Fred Bloggs]), if the username in the comment matches the Author in the ,v file, then delete the unnecessary username text from the commit message
If the ,v contains a useful description, append it to the commit message for revision 1.1 (cvs2hg ignores the description - but lots of our CVS files actually came from RCS, where it was easy to put the initial commit message into the description field by mistake)
For edits made from certain shared user accounts, adjust the author, depending on the contents of the commit message.
Things I've considered:
Running 'cvs log' on each individual ,v file, parsing the output, and using rcs -m to change the history. Problems with this include:
there doesn't seem to be a way to pass a text file to rcs -m - so if the revision message contained single and/or double quotes, or spanned multiple lines, it would be quite a challenge to quote it correctly in the script
I can't see an rcs or cvs facility to change the author name associated with a revision
less importantly, it would be likely to start a huge number of processes - which I think could get slow
Writing Python to parse the ,v file, and adjust the contents. Problems with this include:
we have a mixture of line-endings in our ,v files - including some binary files that should have been text, and vice-versa - so great care would be needed to not corrupt the files
care would be needed for quoting of the # character in any commit messages, if it fell at the start of a line in a multi-line comment
care would also be needed on revisions where the last line of the committed file was changed, and doesn't have a newline - meaning that the ,v has a # at the very end of a line, instead of being preceded by \n
Clone the version of cvs2hg that we are using, and try to adjust its code to make the desired edits in-place
Are there any other approaches that would be less work, or any existing code that implements this kind of functionality?
Your first approach may be the best one. I know that in Perl, handling quotation marks and multiple lines wouldn't be a problem. For example:
my $revision = ...;
my $log_message = ...;
system('rcs', "-m$revision:$log_message", $filename);
where $log_message can contain any arbitrary text. Since the string doesn't go through the shell, newlines and other metacharacters won't be reinterpreted. I'm sure you can do the same thing in Python.
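A rough Python equivalent (a sketch; the revision, message, and file name below are hypothetical):
import subprocess

revision = "1.4"                                  # hypothetical revision
log_message = 'said "fix it"\nacross two lines'   # quotes and newlines are fine
filename = "foo.c,v"                              # hypothetical ,v file

# Each list element is passed as a single argv entry, so nothing is
# reinterpreted by a shell.
subprocess.run(["rcs", f"-m{revision}:{log_message}", filename], check=True)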
(As for your second approach, I wouldn't expect line endings to be a problem. If you have Unix-style \n endings and Windows-style \r\n endings, you can just treat the trailing \r as part of the line, and everything should stay consistent. I'm making some assumptions here about the layout of ,v files.)
I wrote a Python library, EditRCS (on PyPI), that implements the RCS format so the user can load an RCS file as a tree of Python objects, modify it programmatically, and save it to a new RCS file.
You can apply a function to every revision using mapDeltas(), for example to change an author's name, or walk the tree using getNext() for something more complicated, such as joining two file histories together.