Recursively Replace One Windows Path w/ Another in Text Files - language-agnostic

I have a large amount of text files stored on a Red Hat server that contain explicit Windows paths. Today, that path has changed and I would like to change the text files to reflect the new path. As they are Windows paths, they all contain single backslashes. I would like to maintain the single backslashes if possible.
I wanted to ask what the best method to perform this string replacement would be. I have made backups of folders so that I may test on a smaller scale before applying to the larger scale that will affect my group members.
Example:
Change $oldPath to $newPath in all *.py files recursively contained in current directory.
i.e. $oldPath\common\file_referenced should become $newPath\common\file_referenced

Robustly using any awk in any shell on every Unix box and regardless of which characters your old or new directory paths contain and whether or not the final directory in either old or new could be a substring of another existing directory name:
$ cat file
\old\fashioned\common\file_referenced
$ oldPath='\old\fashioned'
$ newPath='\new\fangled\etc'
$ awk '
BEGIN { old=ARGV[1]; new=ARGV[2]; ARGV[1]=ARGV[2]="" }
index($0"\\",old"\\")==1 { $0=new substr($0,length(old)+1) }
1' "$oldPath" "$newPath" file
\new\fangled\etc\common\file_referenced
To update all .py files in a directory you could use GNU awk for -i inplace, or you could do for i in *.py; do awk '...' old new "$i" > tmp && mv tmp "$i"; done, or you could use find and/or xargs, etc. - any of the common Unix ways to process multiple files with any command.

Related

Algorithm to delete every files in a directory, except some in a given list

Assume we have a directory with structure like this, I marked directories as (+) and files as (-)
rootdir
+a
+a1
-f1
-f2
+a2
-f3
+b
+b1
+b2
-f4
-f5
-f6
+b3
-f7
-f8
and a given list of files like
/a/a1/f1
/b/b1/b2/f5
/b/b3/f7
I am struggling to find the way to remove every files inside root, except the one in the given list. So after the program executed, the root directory should look like this:
rootdir
+a
+a1
-f1
+b
+b1
+b2
-f5
+b3
-f7
This example just for easier to understand the problem. In reality, the given list include around 4 thousands of files. And the root directory has the size of ~15GB with a hundreds of thousands files inside.
That would be easy to search inside a folder, and to remove files that matched in a given list. Let just say we solve the revert issue, to keep files that matched in a given list.
Programs written in Perl/Python are prefer.
First, store your list of files you want to keep inside an associative container like a Python dict or a map of some kind.
Second, simply iterate (in Python, os.walk) over the entire directory structure, and every time you see a file, check if it is in the associative container of paths to keep. If not, delete it (in Python, os.unlink).
Alternatively:
First, create a temporary directory on the same filesystem.
Second, move (os.renames, which generates new subdirectories as needed) all the "keep" files to the temporary directory, with the same structure.
Third, overwrite (os.removedirs followed by os.rename, or just shutil.move) the original directory with the temporary one.
The os.walk path:
import os
keep = set(['/a/a1/f1', '/b/b1/b2/f5', '/b/b3/f7'])
for dirpath, dirnames, filenames in os.walk('./'):
for name in filenames:
path = os.path.join(dirpath, name).lstrip('.')
print('check ' + path)
if path not in keep:
print('delete ' + path)
else:
print('keep ' + path)
It doesn't do anything except inform you.
It don't think os.walk is too slow, and it gives you the option of keeping by regex patterns or any other criteria.
This is a working code for your problem.
import os
def list_files(directory):
for root, dirs, files in os.walk(directory):
for name in files:
yield os.path.join(root, name)
files_to_delete = {'/home/vedang/Desktop/a.out', '/home/vedang/Desktop/ABC/temp.txt'} #Keep a set instead of list for faster lookups
for f in list_files('/home/vedang/Desktop'):
if f in files_to_delete:
os.unlink(f)
Here is a function which accepts a set of files you wish to keep and the root directory from which you wish to begin deleting files.
It's a classic recursive Depth-First-Search that will remove empty directories after deleting all the unwanted files
import os
def delete_files(keep_list:set, curr_dir):
files = os.listdir(curr_dir)
for f in files:
path = f"{curr_dir}/{f}"
if os.path.isfile(path):
if path not in keep_list:
os.remove(path)
elif os.path.islink(path):
os.unlink(path)
elif os.path.isdir(path):
delete_files(keep_list, path)
files = os.listdir(curr_dir)
if not files:
os.rmdir(curr_dir)
here i got a solution in a different aspect,
suppose we are at linux environment,
first,
find .
to get a long list with all file path/folder explained
second, suppose we got the exclude path list, in order to exclude at your volume ( say thousands ) , we could just append these to the previous list, and
| sort | uniq - c |grep -v "^2"
to get the to delete list,
and third
| xargs rm
to actually do the deletion

Using arrays in a for loop, in bash [duplicate]

This question already has answers here:
bash script, create array of all files in a directory
(3 answers)
Closed 7 years ago.
I am currently working on a bash script where I must download files from our mySQL database, host them somewhere different, then update the database with the new location for the image. The last portion is my problem area, creating the array full of filenames and iterating through them, replacing the file names in the database as we go.
For whatever reason I keep getting these kinds of errors:
not found/X2b6qZP.png: 1: /xxx/images/X2b6qZP.png: ?PNG /xxx/images/X2b6qZP.png: 2: /xxx/images/X2b6qZP.png: : not found
/xxx/images/X2b6qZP.png: 1: /xxx/images/X2b6qZP.png: Syntax error: word unexpected (expecting ")")
files=$($DOWNLOADDIRECTORY/*)
files=$(${files[#]##*/})
# Iterate through the file names in the download directory, and assign the new values to the detail table.
for file in "${files[#]}"
do
mysql -h ${HOST} -u ${USER} -p${PASSWORD} ${DBNAME} "UPDATE crm_category_detail SET detail_value = 'http://xxx.xxx.x.xxx/img/$file' WHERE detail_value LIKE '%imgur.com/$file'"
done
You are trying to execute a glob as a command. The syntax to use arrays is array=(tokens):
files=("$DOWNLOADDIRECTORY"/*)
files=("${files[#]##*/}")
You are also trying to run your script with sh instead of bash.
Do not run sh file or use #!/bin/sh. Arrays are not supported in sh.
Instead use bash file or #!/bin/bash.
whats going on right here?
files=$($DOWNLOADDIRECTORY/*)
I dont think this is doing what you think it is doing.
According to this answer, you want to omit the first $ to get an array of files.
files=($DOWNLOADDIRECTORY/*)
I just wrote a sample script
#!/bin/sh
alist=(/*)
printf '%s\n' "${alist[#]}"
Output
/bin
/boot
/data
/dev
/dist
/etc
/home
/lib
....
Your assignments are not creating arrays. You need arrayname=( values for array ) as the notation. Hence:
files=( "$DOWNLOADDIRECTORY"/* )
files=( "${files[#]##*/}" )
The first line will give you all the names in the directory specified by $DOWNLOADDIRECTORY. The second carefully removes the directory prefix.
I've used spaces after ( and before ) for clarity; the shell neither requires nor objects to them. I used double quotes around the variable name and expansions to keep things sane when name do contain spaces etc.
Although it isn't immediately obvious why you might do this, its advantage over many alternatives is that it preserves spaces etc in file names.
You could just loop directly over the files:
for file in "$DOWNLOADDIRECTORY"/*; do
file="${file##*/}" # or file=$(basename "$file")
# MySQL stuff
done
Some quoting added in case of spaces in paths.

Managing a pair of files as one in Mercurial

I'm working with small binary files in Mercurial as posted.
This binary files can be dumped as text to make a diff between versions, but the problem is that the files comes in pairs (eg: Form.scx / Form.sct), and I cannot found a way to tell Mercurial to "make a snapshot" (copy to a temporary location) of the other corresponding file when I do an hg ediff.
Just make a quick script and set that as the tool for extdiff. I'm guessing you're on Windows, but whatever the powershell equivalent to this is:
#!/bin/sh
binary-to-text $1 /tmp/$1.sct
binary-to-text $2 /tmp/$2.sct
diff /tmp/$1.sct /tmp/$2.sct
rm /tmp/$1.sct /tmp/$2.sct
That creates, compares, and then deletes the text versions. You'd want to be careful to not overwrite, deal with multiple concurrent invocations, etc.
Then configure a new command to run your script:
[extdiff]
cmd.mydiff = that_script_above.sh
Then you can do things like:
hg mydiff
Ideally you have only the "source" bonary format in your respository, not the text format, as you shouldn't keep generated items in the repo -- because if you update one but not the other you have an inconsistent state. Generating the comparable text files on demand is a better way to go.
As suggested by #Ryan, I ended up with a small batch previous to the diff program:
#echo off
set f1=%1
set f2=%2
::Temporary dir created by hg to copy the snapshot file
set tdir=%~dp1
::Original repository dir
set repo=%~dp2
::Filename extension
set ext=%~x1
::The binary files comes in pairs: scx/sct \ vcx/vct ...
set ex2=%ext:~0,-1%t
::Check if "dumpable" extension
echo %ext% | grep -iE "(vcx|vct|scx|sct|pjx|pjt|frx|frt)" > nul && goto DumpFile
goto diff
:DumpFile
set f1="%tdir%\_Dump1.prg"
set f2="%tdir%\_Dump2.prg"
::Get the pair file from the repository
hg cat %repo%\%~n1%ex2% -o "%~dpn1%ex2%" -R %repo%
::Do the dump, then the diff
MyDumpProgram.exe %1 %f1%
MyDumpProgram.exe %2 %f2%
goto diff
:diff
ExamDiff.exe %f1% %f2%
pause
and then config the batch in %UserProfile%\.hgrc
[extdiff]
cmd.ediff = d:\Utiles\diff2.bat

How to create a PATCH file for the binary difference output file

I want to know how to create a PATCH for the difference file I got by comparing two binary files.
$cmp -l > output file name
I checked for text files 'diff" can be used to compare and generate a PATCH file
$ diff -u oldFile newFile > mods.diff # -u tells diff to output unified diff format
I want to apply the PATCH on the old binary image file to get my new binary image file.
Diff and Patch are designed to work with text files, not arbitrary binary data. You should use something like bsdiff instead.
If your repository, or package is using git you can make binary diff with
git diff --patch --binary old_dir patched_dir
Of course you can also use it with commits
git diff --patch --binary commit1 commit2
JDIFF is a program that outputs the differences between two (binary) files.
Also you can use therdiff command.
If you still want to use diff & patch. Here is a way...
Write a c program yourself to insert a newline character at the end of every 512/1024/your_choice bytes (this is just to fool the diff as it compares the files line by line). Run this script on your two input files.
Then run 'diff -au file1 file2 > mod.diff (you will get the patch here)'
Patching is simple 'patch < mod.diff'
Than again write a program to remove the newlines from the binary file. That is all...

DIFF utility works for 2 files. How to compare more than 2 files at a time?

So the utility Diff works just like I want for 2 files, but I have a project that requires comparisons with more than 2 files at a time, maybe up to 10 at a time. This requires having all those files side by side to each other as well. My research has not really turned up anything, vimdiff seems to be the best so far with the ability to compare 4 at a time.
My question: Is there any utility to compare more than 2 files at a time, or a way to hack diff/vimdiff so it can do multiple comparisons? The files I will be comparing are relatively short so it should not be too slow.
Displaying 10 files side-by-side and highlighting differences can be easily done with Diffuse. Simply specify all files on the command line like this:
diffuse 1.txt 2.txt 3.txt 4.txt 5.txt 6.txt 7.txt 8.txt 9.txt 10.txt
Vim can already do this:
vim -d file1 file2 file3
But you're normally limited to 4 files. You can change that by modifying a single line in Vim's source, however. The constant DB_COUNT defines the maximum number of diffed files, and it's defined towards the top of diff.c in versions 6.x and earlier, or about two thirds of the way down structs.h in versions 7.0 and up.
diff has built-in option --from-file and --to-file, which compares one operand to all others.
--from-file=FILE1
Compare FILE1 to all operands. FILE1 can be a directory.
--to-file=FILE2
Compare all operands to FILE2. FILE2 can be a directory.
Note: argument name --to-file is optional.
e.g.
# this will compare foo with bar, then foo with baz .html files
$ diff --from-file foo.html bar.html baz.html
# this will compare src/base-main.js with all .js files in git repo,
# that has 'main' in their filename or path
$ git ls-files :/*main*.js | xargs diff -u --from-file src/base-main.js
Checkout "Beyond Compare": http://www.scootersoftware.com/
It lets you compare entire directories of files, and it looks like it runs on Linux too.
if your running multiple diff's based off one file you could probably try writing a script that has a for loop to run through each directory and run the diff. Although it wouldn't be side by side you could at least compare them quickly. hope that helped.
Not answering the main question, but here's something similar to what Benjamin Neil has suggested but diffing all files:
Store the filenames in an array, then loop over the combinations of size two and diff (or do whatever you want).
files=($(ls -d /path/of/files/some-prefix.*)) # Array of files to compare
max=${#files[#]} # Take the length of that array
for ((idxA=0; idxA<max; idxA++)); do # iterate idxA from 0 to length
for ((idxB=idxA + 1; idxB<max; idxB++)); do # iterate idxB + 1 from idxA to length
echo "A: ${files[$idxA]}; B: ${files[$idxB]}" # Do whatever you're here for.
done
done
Derived from #charles-duffy's answer: https://stackoverflow.com/a/46719215/1160428
There is a simple an good way to do this = GREP.
Depending on the size of the text you can copy and paste it, or you can redirect the input of the file to the grep command. If you make a grep -vir /path to make a reverse search or a grep -ir /path. This is my way for certification exams.