I have a directory structure like this:
dir/
└── subdir
My code:
import os

for d in os.walk('dir'):
    print(d)
I get the output:
('dir', ['subdir'], [])
('dir/subdir', [], [])
My question is: what are those trailing []s?
There is one in the first tuple and two in the second, which confuses me.
It's worth checking the Python docs for questions like this, as they tend to be pretty solid: https://docs.python.org/2/library/os.html#os.walk
Generate the file names in a directory tree by walking the tree either top-down or bottom-up. For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).
So it will always return a 3-tuple.
Your first directory, 'dir', contains one directory called 'subdir' and no files, so dirnames is ['subdir'] and filenames is an empty list.
It then yields another tuple for 'dir/subdir'. 'subdir' doesn't have any directories or files under it, so you get empty lists for both dirnames and filenames. The key thing is that it always yields a 3-tuple, and the last two elements are always lists; if there are no subdirectories or files, those lists are simply empty.
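Unpacking the 3-tuple makes the three parts explicit; here is a minimal sketch against the same 'dir' layout:

import os

# os.walk yields (dirpath, dirnames, filenames) for every directory it visits.
for dirpath, dirnames, filenames in os.walk('dir'):
    print('directory: ' + dirpath)
    print('  subdirs: ' + str(dirnames))
    print('  files:   ' + str(filenames))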
My understanding is that when we do a diff using Mercurial it creates two temporary folders, one for each revision being compared:
Mercurial/extdiff not changing to temp dir (as I THINK it's supposed to)
I know how to set up an external diff tool (say EXTDIFFTOOL) using the Mercurial .ini file. My problem is: how can I make EXTDIFFTOOL take two arguments, one for each of the temp folders?
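For reference, a minimal sketch of an extdiff configuration; the command name and path below are assumptions for illustration. By default extdiff appends the two snapshot directories as the last two arguments when it invokes the tool:

[extensions]
extdiff =

[extdiff]
; extdiff runs: EXTDIFFTOOL <opts> <snapshot-of-rev1> <snapshot-of-rev2>
cmd.mydiff = C:\Tools\EXTDIFFTOOL.exe
; opts.mydiff = --optional-arguments

With this, "hg mydiff" invokes EXTDIFFTOOL with the two temporary folders as its final two arguments.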
Is there an efficient command-line tool for prepending lines to a file inside a ZIP archive?
I have several large ZIP files containing CSV files missing their header, and I need to insert the header line. It's easy enough to write a script to extract them, prepend the header, and then re-compress, but the files are so large, it takes about 15 minutes to extract each one. Is there some tool that can edit the ZIP in-place without extracting?
Short answer: no.
A ZIP file contains 1 to N file entries, and each of them works as an unsplittable unit, meaning that if you want to do something to an entry, you need to process it completely (i.e. extract it).
The only fast operation you can do is add a new file to your archive. It will create a new entry and append it to the file, but this is probably not what you need.
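For completeness, appending a new entry looks like this with Python's zipfile module (the archive name, entry name and header text here are made up); note it adds a new member rather than modifying an existing one:

import zipfile

# Opening in append mode adds new entries without rewriting the existing ones.
with zipfile.ZipFile('archive.zip', 'a', compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr('header.csv', 'col1,col2,col3\n')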
Assume we have a directory with a structure like this; I marked directories with (+) and files with (-):
rootdir
  +a
    +a1
      -f1
      -f2
    +a2
      -f3
  +b
    +b1
      +b2
        -f4
        -f5
        -f6
    +b3
      -f7
      -f8
and a given list of files like
/a/a1/f1
/b/b1/b2/f5
/b/b3/f7
I am struggling to find a way to remove every file inside rootdir except the ones in the given list. After the program has executed, the root directory should look like this:
rootdir
  +a
    +a1
      -f1
  +b
    +b1
      +b2
        -f5
    +b3
      -f7
This example is just to make the problem easier to understand. In reality, the given list includes around four thousand files, and the root directory is ~15 GB with hundreds of thousands of files inside.
It would be easy to search a folder and remove the files that match a given list; what I need is the reverse: keep the files that match the list and remove everything else.
Programs written in Perl/Python are preferred.
First, store the list of files you want to keep in an associative container with fast membership tests, such as a Python set or dict (or a map of some kind).
Second, simply iterate (in Python, os.walk) over the entire directory structure, and every time you see a file, check if it is in the associative container of paths to keep. If not, delete it (in Python, os.unlink).
Alternatively:
First, create a temporary directory on the same filesystem.
Second, move (os.renames, which generates new subdirectories as needed) all the "keep" files to the temporary directory, with the same structure.
Third, delete what is left of the original directory (shutil.rmtree; os.removedirs only removes empty directories) and rename the temporary one into its place (os.rename or shutil.move). A sketch of this alternative is below.
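A minimal sketch of the move-and-swap alternative; the temporary directory name and the relative keep paths are illustrative assumptions:

import os
import shutil

root = 'rootdir'
tmp = 'rootdir_tmp'  # temporary directory on the same filesystem (illustrative name)
keep = ['a/a1/f1', 'b/b1/b2/f5', 'b/b3/f7']  # paths relative to root

# Move each "keep" file into the temporary tree; os.renames creates the
# intermediate directories as needed.
for rel in keep:
    os.renames(os.path.join(root, rel), os.path.join(tmp, rel))

# Drop whatever is left of the original tree and put the temp tree in its place.
shutil.rmtree(root)
os.rename(tmp, root)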
The os.walk path:
import os

keep = set(['/a/a1/f1', '/b/b1/b2/f5', '/b/b3/f7'])

for dirpath, dirnames, filenames in os.walk('./'):
    for name in filenames:
        # Strip the leading '.' so the path matches the form used in keep.
        path = os.path.join(dirpath, name).lstrip('.')
        print('check ' + path)
        if path not in keep:
            print('delete ' + path)
        else:
            print('keep ' + path)
As written, it doesn't delete anything; it only reports what it would do. Once you are happy with the output, replace the 'delete' print with os.unlink(os.path.join(dirpath, name)).
I don't think os.walk is too slow, and it gives you the option of keeping files by regex patterns or any other criteria.
Here is working code for your problem: walk the tree and delete every file that is not in the keep set.
import os

def list_files(directory):
    for root, dirs, files in os.walk(directory):
        for name in files:
            yield os.path.join(root, name)

# Keep a set instead of a list for faster lookups.
files_to_keep = {'/home/vedang/Desktop/a.out', '/home/vedang/Desktop/ABC/temp.txt'}

for f in list_files('/home/vedang/Desktop'):
    if f not in files_to_keep:
        os.unlink(f)
Here is a function which accepts a set of files you wish to keep and the root directory from which you wish to begin deleting files.
It's a classic recursive depth-first search that also removes directories left empty after the unwanted files have been deleted.
import os

def delete_files(keep_list: set, curr_dir):
    files = os.listdir(curr_dir)
    for f in files:
        path = f"{curr_dir}/{f}"
        if os.path.isfile(path):
            if path not in keep_list:
                os.remove(path)
        elif os.path.islink(path):
            os.unlink(path)
        elif os.path.isdir(path):
            delete_files(keep_list, path)
    # If the directory is now empty, remove it as well.
    files = os.listdir(curr_dir)
    if not files:
        os.rmdir(curr_dir)
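A hypothetical call might look like this; the paths are purely illustrative and must be written in the same curr_dir/name form that the function builds:

keep = {
    'rootdir/a/a1/f1',
    'rootdir/b/b1/b2/f5',
    'rootdir/b/b3/f7',
}
delete_files(keep, 'rootdir')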
Here is a solution from a different angle; suppose we are in a Linux environment.
First,
find . -type f
to get a list of every file path under the directory.
Second, suppose we have the list of paths to keep (even a few thousand entries). Append that list to the find output, then sort and keep only the lines that appear once; the paths that appear twice are the ones to keep, so what remains is the list of files to delete:
| sort | uniq -u
And third,
| xargs rm
to actually do the deletion. Put together (where keep.txt is the keep list, one path per line, in exactly the same form as the find output, e.g. ./a/a1/f1):
( find . -type f; cat keep.txt ) | sort | uniq -u | xargs rm
This assumes the paths contain no spaces or other characters that xargs would mishandle.
I have this directory structure:
foo
  -1.txt
  -1.notxt
  -bar
    -2.txt
    -3.notxt
    -sub
      -1.txt
      -2.txt
another-folders-and-files
I want to exclude all non-'.txt' files in folder foo and its subfolders.
The closest pattern I have found is this:
^foo/(?!.*\.txt$)
But it excludes not only the 1.notxt file in foo, but all the subfolders too.
I think it's because bar matches my exclusion pattern, but I don't understand how to tell hg not to ignore bar.
Any ideas?
Unfortunately Mercurial ignore patterns don't distinguish between files and directories, so if you ignore every name that doesn't end in .txt, you'll ignore directories too. But since directory names don't usually have a suffix, what you can do is ignore every name that has a suffix other than .txt, like this:
^foo/.*[^/]\.[^/]*$(?<!\.txt)
Breakdown:
^foo/.* any path in foo/; followed by
[^/]\. a period preceded by (at least) one non-slash character; followed by
[^/]*$, a path-final suffix; finally:
(?<!\.txt) check that the suffix immediately before the end of the line was not .txt.
This lets through names that begin with a period (.hgignore) and names containing no period at all (README). If you have files with no suffix you'll have to find another way to exclude them, but this should get you most of the way there. If you have directory names with dots in the middle, this will ignore them too, and you'll need to work harder to exempt them, or change your approach.
(Incidentally, it's probably safer to have a long list of ignored suffixes, and add to it as necessary; soon enough the list will stabilize, and you won't risk ignoring something that shouldn't be.)
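As a quick sanity check, here is a small Python sketch that runs the pattern (hgignore regexp patterns follow Python regular-expression syntax) against a few illustrative paths:

import re

pattern = re.compile(r'^foo/.*[^/]\.[^/]*$(?<!\.txt)')

paths = [
    'foo/1.txt',          # kept (not ignored)
    'foo/1.notxt',        # ignored
    'foo/bar',            # kept: directory name with no suffix
    'foo/bar/3.notxt',    # ignored
    'foo/bar/sub/2.txt',  # kept
    'foo/.hgignore',      # kept: leading period, no real suffix
    'foo/README',         # kept: no period at all
]
for p in paths:
    print(p, '->', 'ignored' if pattern.search(p) else 'kept')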