What are the return values of os.walk() in Python?

I have a directory structure like this:
dir/
└── subdir
My code:
import os
for d in os.walk('dir'):
    print(d)
I get the output:
('dir', ['subdir'], [])
('dir/subdir', [], [])
My question is: what are those trailing []s?
There is one in the first tuple and two in the second, which confuses me.

It's worth checking the Python docs for questions like this, as they tend to be pretty solid: https://docs.python.org/2/library/os.html#os.walk
Generate the file names in a directory tree by walking the tree either top-down or bottom-up. For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).
So it always yields a 3-tuple.
For your first directory, 'dir': it contains one directory called 'subdir' and no files, so you get an empty list for filenames.
It then yields another entry for 'dir/subdir'. 'subdir' contains no directories or files, so you get empty lists for both dirnames and filenames. The key thing is that it always yields a 3-tuple whose last two elements are always lists; if there are no subdirectories or files, those lists are simply empty.
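For illustration, here is a minimal sketch with one file added at each level (file1.txt and file2.txt are made-up names, not from the question):

import os

# Assumed layout:
# dir/
# ├── file1.txt
# └── subdir/
#     └── file2.txt
for dirpath, dirnames, filenames in os.walk('dir'):
    print(dirpath, dirnames, filenames)

# Expected output:
# dir ['subdir'] ['file1.txt']
# dir/subdir [] ['file2.txt']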

Related

How to copy or move multiple files with the same extension?

So I am trying to move a bunch of files with similar extensions from /home/ to /root/
Code I tried is
file copy /home/*.abc.xyz /root/
Also tried
set infile [glob -nocomplain /home/*.abc.xyz]
if { [llength $infile] > 0 } {
    file copy $infile /root/
}
No success.
Your two attempts fail for different reasons:
There is no wildcard expansion in the arguments to file copy, or to any Tcl command for that matter: file copy /home/*.abc.xyz /root/ looks for a single source file with a literal * in its name.
glob -nocomplain /home/*.abc.xyz is fine for collecting the sources, but glob returns a list. file copy requires each source to be passed as a separate argument, not as a single list value. To expand one list value into multiple separate arguments, use the Tcl expansion operator {*}.
Therefore:
set infiles [glob -nocomplain /home/*.abc.xyz]
if {[llength $infiles]} {
    file copy {*}$infiles /root/
}
As a one-line answer:
file copy {*}[glob /home/*.abc.xyz] /root/
The file copy (and file rename) commands have two forms (hence the reference to the manual page in the comment). The first form copies a single file to a new target. The second form copies all of the file name arguments into a target directory; this form insists that the directory name be the last argument, and you may have an arbitrary number of source file names preceding it. Also, file copy does not do glob expansion on its arguments, so as you rightly surmised, you also need the glob command to obtain the list of files to copy. The problem is that glob returns a list of file names, and you passed that list as a single argument, i.e.
file copy $infile /root/
passes the list as a single argument, so the file copy command thinks it is dealing with the first form and attempts to find a file whose name matches the entire list. That file almost certainly doesn't exist. Including the error message in your question would have helped us know for sure.
So what you want to do is take the list of files in the infile variable and expand it into separate argument words. Since this is a common situation, Tcl has syntax to help (assuming you are not using some ancient version of Tcl; {*} requires Tcl 8.5 or later). Try using the command:
file copy {*}$infile /root/
in place of your first attempt and see if that helps the situation.

How to use Webpack to combine JSON files from all subdirectories into one?

Say I have a directory structured like this:
folder A
  some_file.js
  data.json
folder B
  folder B1
    other_file.json
    data.json
  folder B2
    data.json
output.json
Is there a webpack loader that can combine data.json in all subfolders, and output it to output.json?
I've found https://www.npmjs.com/package/json-files-merge-loader that seems to do something similar, but it seems to ask for each path to data.json, while I need something that goes through all folders and subfolders for data.json. All data.json keys are unique, and I want output.json to be one JSON object containing all key/value pair from all data.json.
webpack is not really suited for the use case you have. Many people think that webpack can be a replacement for a build system, but it's just a module bundler, it doesn't handle every task. Specifically:
webpack traverses require() and import statements, meaning it needs the modules to be statically defined. It's possible to get around this by writing a plugin or using a file generated from a template, but even if you did that...
webpack would have a hard time combining the JSON files in the way you want. webpack is good at bundling files into some sort of modules system (CommonJS or AMD). What you want is not a bundle containing modules, but a file containing arbitrary contents.
webpack might be able to do these things using some fancy plugins, but it's likely not worth it. In your case, you probably just want to write a Node script. If you want, you can use a plugin like this to run the code before build.
const fs = require('fs');
// https://github.com/isaacs/node-glob
const glob = require('glob');

const output = {};
// Find every .json file under src/, merge their top-level keys into one
// object, and write the result out as output.json.
glob('src/**/*.json', (error, files) => {
  if (error) throw error;
  files.forEach((filename) => {
    const contents = JSON.parse(fs.readFileSync(filename, 'utf8'));
    Object.assign(output, contents);
  });
  fs.writeFileSync('output.json', JSON.stringify(output));
});
Is there a webpack loader that can combine data.json in all subfolders, and output it to output.json?
Use the merge-webpack-plugin.
This differs from running a script before the build: the deep webpack integration lets it follow file changes and update the result immediately while hot reloading:
const MergePlugin = require("merge-webpack-plugin");

module.exports = {
  module: {
    rules: [
      {
        test: /\.(json)$/i,
        use: [
          MergePlugin.loader()
        ]
      }
    ]
  },
  plugins: [
    new MergePlugin({
      search: './src/**/*.json',
    })
  ]
}
If you need only one target file output.json for all folders in your project, describe them all in the search param, and that is all.
If you need separate joined files for each top-level folder (output_A.json, output_B.json, ...), then:
if you do not need to look through subfolders, try playing with the group param set to [path] (read more about grouping);
if you need to join all files through each folder and its subfolders, create a separate plugin instance for each top-level folder; you can also group files in each plugin instance by name or by ext (see grouping);
and so on.
Please check the merge-jsons-webpack-plugin:
https://www.npmjs.com/package/merge-jsons-webpack-plugin
You can pass an array of files/patterns as input and it will emit a single file as JSON.
My multi-json-loader might be helpful here. It accepts a glob and combines the files into a single JSON blob while also retaining relative paths in the output object.
I.e., dirA/a.json, dirB/b.json, and dirB/c.json get output as
{
  "dirA": {
    "a": /*parsed contents of a.json*/
  },
  "dirB": {
    "b": /*parsed contents of b.json*/,
    "c": /*parsed contents of c.json*/
  }
}

Algorithm to delete every file in a directory, except some in a given list

Assume we have a directory with a structure like this; directories are marked (+) and files (-):
rootdir
  +a
    +a1
      -f1
      -f2
    +a2
      -f3
  +b
    +b1
      +b2
        -f4
        -f5
        -f6
    +b3
      -f7
      -f8
and a given list of files like
/a/a1/f1
/b/b1/b2/f5
/b/b3/f7
I am struggling to find a way to remove every file inside the root, except the ones in the given list. After the program executes, the root directory should look like this:
rootdir
  +a
    +a1
      -f1
  +b
    +b1
      +b2
        -f5
    +b3
      -f7
This example is just to make the problem easier to understand. In reality, the given list includes around four thousand files, and the root directory is ~15 GB with hundreds of thousands of files inside.
It would be easy to search inside a folder and remove the files that match a given list; it is the inverse problem, keeping only the files that match the list, that I need to solve.
Programs written in Perl/Python are preferred.
First, store the list of files you want to keep inside an associative container like a Python dict or set, or a map of some kind.
Second, simply iterate (in Python, os.walk) over the entire directory structure, and every time you see a file, check whether it is in the container of paths to keep. If not, delete it (in Python, os.unlink).
Alternatively:
First, create a temporary directory on the same filesystem.
Second, move (os.renames, which creates new subdirectories as needed) all the "keep" files to the temporary directory, with the same structure.
Third, delete the original directory tree (shutil.rmtree) and rename the temporary directory into its place (os.rename).
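A minimal sketch of that alternative, assuming the keep list holds paths relative to the root and the temporary tree lives on the same filesystem (the names here are illustrative):

import os
import shutil

root = 'rootdir'                                  # assumed root directory
tmp = 'rootdir.tmp'                               # temporary tree, same filesystem
keep = ['a/a1/f1', 'b/b1/b2/f5', 'b/b3/f7']       # paths relative to root

# Move every "keep" file into the temporary tree; os.renames creates
# the intermediate directories as needed.
for rel in keep:
    os.renames(os.path.join(root, rel), os.path.join(tmp, rel))

# Delete the original tree (unwanted files and all), then move the
# temporary tree into its place.
shutil.rmtree(root)
os.rename(tmp, root)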
The os.walk path:
import os

keep = {'/a/a1/f1', '/b/b1/b2/f5', '/b/b3/f7'}
for dirpath, dirnames, filenames in os.walk('./'):
    for name in filenames:
        # os.path.join gives './a/a1/f1'; drop the leading '.' so the
        # result matches the entries in keep.
        path = os.path.join(dirpath, name)[1:]
        print('check ' + path)
        if path not in keep:
            print('delete ' + path)
        else:
            print('keep ' + path)
It doesn't do anything except inform you; replace the prints with os.remove(path) to actually delete.
I don't think os.walk is too slow, and it gives you the option of keeping files by regex patterns or any other criteria, as in the sketch below.
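For example, a sketch of keeping by pattern instead of by exact path (the patterns are invented for illustration):

import os
import re

# Hypothetical rule: keep every .txt file and everything under /b/b3/.
keep_patterns = [re.compile(r'\.txt$'), re.compile(r'^/b/b3/')]

for dirpath, dirnames, filenames in os.walk('./'):
    for name in filenames:
        path = os.path.join(dirpath, name)[1:]    # './x/y' -> '/x/y'
        if not any(p.search(path) for p in keep_patterns):
            print('delete ' + path)               # or os.remove(path)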
This is working code for your problem:
import os

def list_files(directory):
    for root, dirs, files in os.walk(directory):
        for name in files:
            yield os.path.join(root, name)

# Keep a set instead of a list for faster lookups.
files_to_keep = {'/home/vedang/Desktop/a.out', '/home/vedang/Desktop/ABC/temp.txt'}

for f in list_files('/home/vedang/Desktop'):
    if f not in files_to_keep:
        os.unlink(f)
Here is a function which accepts a set of files you wish to keep and the root directory from which you wish to begin deleting files.
It's a classic recursive depth-first search that removes directories once they have become empty:
import os

def delete_files(keep_list: set, curr_dir):
    for f in os.listdir(curr_dir):
        path = f"{curr_dir}/{f}"
        # Test for symlinks first: os.path.isfile() and os.path.isdir()
        # follow links, so a link would otherwise be misclassified.
        if os.path.islink(path):
            if path not in keep_list:
                os.unlink(path)
        elif os.path.isfile(path):
            if path not in keep_list:
                os.remove(path)
        elif os.path.isdir(path):
            delete_files(keep_list, path)
    # Remove this directory itself if it is now empty.
    if not os.listdir(curr_dir):
        os.rmdir(curr_dir)
Here is a solution from a different angle. Suppose we are in a Linux environment.
First,

find . -type f

gives a list of every file path under the current directory.
Second, concatenate the keep list onto that output (keep_list.txt here stands for a file holding the paths to keep, one per line, in the same ./path form that find prints); any path that then appears twice is one we want to keep, so filter out the duplicated lines:

find . -type f | cat - keep_list.txt | sort | uniq -c | grep -v '^ *2 ' | sed 's/^ *1 //'

This leaves only the paths that appeared once, i.e. the to-delete list.
Third, pipe that into

xargs rm

to actually do the deletion.

How to find a part of a path using Tcl?

I am fairly new to Tcl. I am trying to write a Tcl script that will perform a few things on certain files in a tree structure, but not on all files.
I have in my tree a number of files ending with .xci.
Now I want to filter out all .xci files except the ones in a part of my tree (i.e. /src/ps/<a number of directories>/<a number of files>.xci) that contains the path part "/ps/".
I have done this:
foreach xci_file [get_files *.xci] {
    if {<ps is found in the path of $xci_file>} {
        generate_target simulation [get_files $xci_file]
    }
}
The foreach loops through all files in my project and returns each filename (including the full path). How do I write the if statement so that target generation only runs for the files whose paths include "/ps/"?
Is there a nice soul out there who could share some light on this?
You want:
if {"ps" in [file split $xci_file]} {
The quotes are required here: expr (which handles if's first argument) needs literal strings quoted.

Mercurial .hgignore: exclude all files not '.txt' in all subfolders of a specified folder

Having this directory structure:
foo
  1.txt
  1.notxt
  bar
    2.txt
    3.notxt
    sub
      1.txt
      2.txt
another-folders-and-files
Want to exclude all non-'.txt' files in folder foo and its subfolders.
The closest I have found is this pattern:
^foo/(?!.*\.txt$)
But it ignores not only the 1.notxt file in foo, but all the subfolders too.
I think that is because bar matches my exclusion pattern, but I do not understand how to tell hg not to ignore bar.
Any ideas?
Unfortunately Mercurial ignore patterns don't distinguish between files and directories, so if you ignore every name that doesn't end in .txt, you'll ignore directories too. But since directory names don't usually have a suffix, what you can do is ignore every name that has a suffix other than .txt, like this:
^foo/.*[^/]\.[^/]*$(?<!\.txt)
Breakdown:
^foo/.* any path in foo/; followed by
[^/]\. a period preceded by (at least) one non-slash character; followed by
[^/]*$, a path-final suffix; finally:
(?<!\.txt) check that there was no .txt immediately before the end of the line (so suffixes like .notxt are still ignored).
This lets through names that begin with a period (.hgignore) and names containing no period at all (README). If you have files with no suffix, you'll have to find another way to exclude them, but this should get you most of the way there. If you have directory names with dots in the middle, this pattern will ignore them too, and you'll need to work harder to exclude them, or change your approach.
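If you want to sanity-check the pattern outside Mercurial, here is a minimal sketch using Python's re module (whose syntax hgignore regexps follow); the sample paths come from the question:

import re

pattern = re.compile(r'^foo/.*[^/]\.[^/]*$(?<!\.txt)')

paths = ['foo/1.txt', 'foo/1.notxt', 'foo/bar',
         'foo/bar/2.txt', 'foo/bar/3.notxt', 'foo/bar/sub/1.txt']
for path in paths:
    print(path, '->', 'ignored' if pattern.search(path) else 'kept')

# Expected: only the .notxt files are ignored; directories and
# .txt files are kept.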
(Incidentally, it's probably safer to keep a long list of ignored suffixes and add to it as necessary; soon enough the list will stabilize, and you won't risk ignoring something that shouldn't be ignored.)