What are all the ways to traverse directory trees? - language-agnostic

How do you traverse a directory tree in your favorite language?
What do you need to know to traverse a directory tree in different operating systems? On different filesystems?
What's your favorite library/module for aiding in traversing a directory tree?

In Python:
If you're looking for a quick, clean, and portable solution try:
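import os
base_dir = '.'
def foo(arg, curr_dir, files):
    # Called once per directory; 'files' lists the names inside curr_dir
    print(curr_dir)
    print(files)
os.path.walk(base_dir, foo, None)  # os.path.walk exists in Python 2 only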
Note that you can modify foo to do something else instead of just printing the names. Furthermore, os.path.walk() was removed in Python 3, so if you're interested in migrating you will have to use os.walk() instead.
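For Python 3, the same traversal can be sketched with os.walk(). This is a minimal example rather than a drop-in replacement; the function name list_tree is made up for illustration.

```python
import os

def list_tree(base_dir):
    """Return (directory, sorted file names) pairs for every directory under base_dir."""
    result = []
    # os.walk yields (dirpath, dirnames, filenames), top-down by default
    for curr_dir, dirs, files in os.walk(base_dir):
        dirs.sort()  # sorting in place makes the visiting order deterministic
        result.append((curr_dir, sorted(files)))
    return result
```

Looping over list_tree('.') and printing each pair gives the same kind of output as the callback version above.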

In Java:
Recursion is useful here. Following is a Java code snippet that's been all over the Internet for ages. Not sure who deserves the credit for it.
// Process all files and directories under dir
public static void visitAllDirsAndFiles(File dir) {
    process(dir); // do something useful with the file or dir
    if (dir.isDirectory()) {
        String[] children = dir.list();
        if (children == null) return; // list() returns null on an I/O error
        for (int i = 0; i < children.length; i++) {
            visitAllDirsAndFiles(new File(dir, children[i]));
        }
    }
}

bash:
#!/bin/bash
function walk_tree {
    echo "Directory: $1"
    local directory="$1"
    local i
    for i in "$directory"/*; do
        echo "File: $i"
        if [ "$i" = . -o "$i" = .. ]; then
            continue
        elif [ -d "$i" ]; then # Process directory and / or walk down into directory
            # add command here to process all files in directory (e.g. ls -l "$i/"*)
            walk_tree "$i" # DO NOT COMMENT OUT THIS LINE!!
        else
            continue # replace continue to process individual file (e.g. echo "$i")
        fi
    done
}
walk_tree "$HOME"
(adapted from http://ubuntuforums.org/showthread.php?t=886272 Comment #4)

In C#:
Stack<DirectoryInfo> dirs = new Stack<DirectoryInfo>();
dirs.Push(new DirectoryInfo("C:\\"));
while (dirs.Count > 0) {
    DirectoryInfo current = dirs.Pop();
    // Do something with 'current' (if you want)
    Array.ForEach(current.GetFiles(), delegate(FileInfo f)
    {
        // Do something with 'f'
    });
    Array.ForEach(current.GetDirectories(), delegate(DirectoryInfo d)
    {
        dirs.Push(d);
    });
}

C++
#include <utility>
#include <boost/filesystem.hpp>
#include <boost/foreach.hpp>
#define foreach BOOST_FOREACH
namespace fs = boost::filesystem;
fs::recursive_directory_iterator it(top), eod;
foreach (fs::path const & p, std::make_pair(it, eod)) {
    if (is_directory(p)) {
        ...
    } else if (is_regular_file(p)) {
        ...
    } else if (is_symlink(p)) {
        ...
    }
}

On Linux with GNU tools
find . -type f -print0 | xargs -0 md5sum
or
find . -print0 | xargs -0 -I ASD echo 'this file "ASD" should be dealt with like this (ASD)'
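Where GNU find and xargs are not available, the first pipeline can be approximated in Python. This is only a sketch, assuming that every regular file under the starting directory should be hashed; md5_tree is a hypothetical name.

```python
import hashlib
import os

def md5_tree(base_dir):
    """Map each regular file under base_dir to its MD5 hex digest."""
    digests = {}
    for curr_dir, _dirs, files in os.walk(base_dir):
        for name in files:
            path = os.path.join(curr_dir, name)
            if not os.path.isfile(path):
                continue  # skip broken symlinks, sockets and the like
            h = hashlib.md5()
            with open(path, "rb") as fh:
                # read in chunks so large files are not loaded into memory at once
                for chunk in iter(lambda: fh.read(65536), b""):
                    h.update(chunk)
            digests[path] = h.hexdigest()
    return digests
```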

mmmm, C# with a dose of recursion.....
public static List<string> CrawlPath(string path, bool IncludeSubFolders)
{
    List<string> fileList = new List<string>();
    try
    {
        Stack<string> filez = new Stack<string>(Directory.GetFiles(path));
        while (filez.Count > 0)
        {
            fileList.Add(filez.Pop());
        }
        if (IncludeSubFolders)
        {
            filez = new Stack<string>(Directory.GetDirectories(path));
            while (filez.Count > 0)
            {
                string curDir = filez.Pop();
                fileList.AddRange(CrawlPath(curDir, IncludeSubFolders));
            }
        }
    }
    catch (System.Exception err)
    {
        Console.WriteLine("Error: " + err.Message);
    }
    return fileList;
}

Related

Delete unused key:value properties in JSON file

I have a key:value JSON object that is used in my JavaScript project. Each value is a string, and the object looks like this:
{
    key1:{
        someKey: "Some text",
        someKey2: "Some text2"
    },
    key2:{
        someKey3:{
            someKey4: "Some text3",
            someKey5: "Some text4"
        }
    }
}
I use it in the project like this: key1.someKey and key2.someKey3.someKey4. Do you have any idea how to delete unused properties? Let's say we don't use key2.someKey3.someKey5 in any file in the project, so I want it to be deleted from the JSON file. To the people in the comments: I didn't say I want to use JavaScript for this. I don't want to run it in a browser or on a server. I just want a script that can do this on my local computer.
If you live within javascript and node, you can use something like this to get all the paths:
Using some modified code from here: https://stackoverflow.com/a/70763473/999943
var lodash=require('lodash') // use this if calling from the node REPL
// import lodash from 'lodash'; // use this if calling from a script
const allPaths = (o, prefix = '', out = []) => {
  if (lodash.isObject(o) || lodash.isArray(o)) Object.entries(o).forEach(([k, v]) => allPaths(v, prefix === '' ? k : `${prefix}.${k}`, out));
  else out.push(prefix);
  return out;
};
let j = {
  key1: { someKey: 'Some text', someKey2: 'Some text2' },
  key2: { someKey3: { someKey4: 'Some text3', someKey5: 'Some text4' } }
}
allPaths(j)
[
  'key1.someKey',
  'key1.someKey2',
  'key2.someKey3.someKey4',
  'key2.someKey3.someKey5'
]
That's all well and good, but now you want to take that list and look through your codebase for usage.
The main choices available are text searching with grep or awk or ag, or parse the language and look through the symbolic representation of the language after it's loaded into your project. Tree-shaking can do this for libraries... I haven't looked into how to do tree-shaking for dictionary keys, or some other undefined reference check like a linter may do for a language.
Once you have found all the instances, you either modify your list manually or use a JSON library to modify it.
My weapons of choice in this instance are:
jq and bash and grep
It's not infallible. But it's a start. (use with caution).
setup_test.sh
#!/usr/bin/env bash
mkdir src
echo "key2.someKey3.someKey4" > src/a.js
echo "key1.someKey2" > src/b.js
echo "key3.otherKey" > src/c.js
test.json
{
"key1":{
"someKey": "Some text",
"someKey2": "Some text2"
},
"key2":{
"someKey3":{
"someKey4": "Some text3",
"someKey5": "Some text4"
}
}
}
check_for_dict_references.sh
#!/usr/bin/env bash
json_input=$1
code_path=$2
cat << HEREDOC
json_input=$json_input
code_path=$code_path
HEREDOC
echo "Paths found in json"
paths="$(cat "$json_input" | jq -r 'paths | join(".")')"
no_refs=
for path in $paths; do
    escaped_path=$(echo "$path" | sed -e "s|\.|\\\\.|g")
    if ! grep -r "$escaped_path" "$code_path" ; then
        no_refs="$no_refs $path"
    fi
done
echo "Missing paths..."
echo "$no_refs"
echo "Creating a new json file without the unused paths"
del_paths_list=
for path in $no_refs; do
    del_paths_list+=".$path, "
done
del_paths_list=${del_paths_list%, } # remove trailing comma space
# quote the list so that several paths are passed to jq as one filter
cat "$json_input" | jq -r 'del('"$del_paths_list"')' > "${json_input}.new.json"
Running the setup_test.sh, then we can test the jq + grep solution
$ ./check_for_dict_references.sh test.json src
json_input=test.json
code_path=src
Paths found in json
src/b.js:key1.someKey2
src/b.js:key1.someKey2
src/b.js:key1.someKey2
src/a.js:key2.someKey3.someKey4
src/a.js:key2.someKey3.someKey4
src/a.js:key2.someKey3.someKey4
Missing paths...
key2.someKey3.someKey5
Creating a new json file without the unused paths
If you look closely you would want it to also print key1.someKey, but this got "found" in the middle of the name key1.someKey2. There are some more fancy regex things you can do, but for the purpose of this script it may be enough.
Now look in your directory for the new json file:
$ cat test.json.new.json
{
"key1": {
"someKey": "Some text",
"someKey2": "Some text2"
},
"key2": {
"someKey3": {
"someKey4": "Some text3"
}
}
}
Hope that helps.
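The false match on key1.someKey noted above can be avoided by requiring that a path is not followed by another identifier character. Here is a rough Python sketch of that idea. The names leaf_paths and unused_paths are invented for this example, and it is still plain text matching, so dynamic access such as key1["someKey"] will be missed.

```python
import re

def leaf_paths(obj, prefix=""):
    """Return dotted paths to every non-dict leaf of a nested dict."""
    paths = []
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            paths.extend(leaf_paths(value, path))
        else:
            paths.append(path)
    return paths

def unused_paths(obj, sources):
    """Return the leaf paths that occur in none of the source strings."""
    unused = []
    for path in leaf_paths(obj):
        # The lookarounds stop key1.someKey from matching inside key1.someKey2
        pattern = re.compile(r"(?<![\w.])" + re.escape(path) + r"(?!\w)")
        if not any(pattern.search(text) for text in sources):
            unused.append(path)
    return unused
```

Passing the parsed test.json and the contents of the three files under src reports both key1.someKey and key2.someKey3.someKey5 as unused.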

Read a CSV file inside awk script (CLOSED)

I want to run an AWK script without typing the CSV file name in the terminal; instead, I want to reference that file from inside my code.
Current Input terminal:
./script.awk file.csv
Desired Input Terminal:
./script.awk
On the other hand, here is the script I have done so far:
#!/usr/bin/awk -f
BEGIN{print"Filtered Elements:"}
BEGIN{FS=","}
{ if ($8~/.*5.*/ && $2~/.*Sh.*/ && ($3~/.*i.*/ || $4~/.*s.*/)) { print } }
{ if ($3~/.*ra.*/ && $7~/.*18.*/ && $13~/.*r.*/) { print } }
{ if ($5~/.*7.*/ && $2~/.*l.*/ && ($4~/.*Fi.*/ || $12~/20.*/)) { print } }
} **file.csv**
I also tried to do this:
#!/usr/bin/awk -f
BEGIN{print"Filtered Elements:"}
BEGIN{FS=","}
BEGIN{
while (getline < file.csv > 0) {
{ if ($8~/.*5.*/ && $2~/.*Sh.*/ && ($3~/.*i.*/ || $4~/.*s.*/)) { print } }
{ if ($3~/.*ra.*/ && $7~/.*18.*/ && $13~/.*r.*/) { print } }
{ if ($5~/.*7.*/ && $2~/.*l.*/ && ($4~/.*Fi.*/ || $12~/20.*/)) { print } }
}
But either way an error occurred.
Thank you in advance!
Your second example is a correct getline loop, except that the file path should be quoted to be treated as a string (and not a variable): while (getline < "file.csv" > 0) #....
Alternatively, you can set the script arguments (including input files and variables) by manipulating ARGV and ARGC in a BEGIN block:
BEGIN {
    ARGV[1] = "file.csv"
    ARGC = 2
}
{
    # commands here process file.csv as normal
}
Running this as ./script is the same as if you set the argument with the shell (like ./script file.csv).
An awk script isn't a command you call, it's a set of instructions interpreted by awk where awk IS a command you call. What you're trying to do apparently is write a Unix command that's implemented as a shell script which includes a call to awk, e.g.:
#!/usr/bin/env bash
awk '
{ print "foo", $0 }
' 'file.csv'
Store that in a file named stuff (not stuff.awk or stuff.sh or anything else with a suffix), and then call it as ./stuff or just stuff if the current directory is in your PATH.
Though you technically can use a shebang to call awk directly, don't do it - see https://stackoverflow.com/a/61002754/1745001.

Delete duplicate JSON file based on one of the attributes

I have two directories in my linux system, /dir and /dir2
Both have more than 4000 JSON files. The JSON content of every file is like
{
"someattribute":"someValue",
"url":[
"https://www.someUrl.com/xyz"
],
"someattribute":"someValue"
}
Note that url is an array, but it always contains one element (the url).
The url makes the file unique. If there is a file with the same url in /dir and /dir2 then it's a duplicate and it needs to be deleted.
I want to automate this operation, preferably using a shell command. Any suggestions on how I should go about it?
Use jq to get a list of duplicates:
jq -nrj '[
  foreach inputs.url as [$url] ({};
    .[$url] += 1;
    if .[$url] > 1 then input_filename
    else empty end
  )
] | join("\u0000")' /{dir1,dir2}/*.json
And to remove them, pipe above command's output to xargs:
xargs -0 rm --
Here's a quick and dirty bash script that uses jq to extract the URL from the json files, and awk to detect and delete duplicates:
#!/bin/bash
rm -f urls-dir1.txt urls-dir2.txt
for file in dir1/*.json; do
    printf "%s\t%s\n" "$file" "$(jq '.url[0]' "$file")" >> urls-dir1.txt
done
for file in dir2/*.json; do
    printf "%s\t%s\n" "$file" "$(jq '.url[0]' "$file")" >> urls-dir2.txt
done
awk -F $'\t' 'FNR == NR { urls[$2] = 1; next }
    $2 in urls { system("rm -f \"" $1 "\"") }' urls-dir1.txt urls-dir2.txt
rm -f urls-dir1.txt urls-dir2.txt
It assumes that dir2 has the files that are to be deleted as duplicates and the ones in dir1 should be untouched.
You can use the following Java approach to achieve this:
Set<String> urls = new HashSet<>();
try (Stream<Path> paths = Files.list(Paths.get("/path/to/your/folder"))) {
    paths
        .map(path -> new FileInfo(path, extractUrl(path)))
        .filter(info -> info.getUrl() != null)
        .filter(info -> !urls.add(info.getUrl()))
        .forEach(info -> {
            try {
                Files.delete(info.getPath());
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
} catch (IOException e) {
    e.printStackTrace();
}
This uses the following FileInfo class:
public class FileInfo {
    private Path path;
    private String url;
    // constructor and getter
}
First, it reads all files in the given directory and extracts the URL from each. It then filters out the duplicates with the help of the HashSet: add() returns false for a URL that has already been seen. Finally, every file whose URL is a duplicate is deleted.
There are multiple options to extract the url from each file:
Quick and dirty using a regex:
private String extractUrl(Path path) {
    try {
        String content = String.join("\n", Files.readAllLines(path));
        Pattern pattern = Pattern.compile("\"url\".+\\s+\"(?<url>[^\\s\"]+)\"");
        Matcher matcher = pattern.matcher(content);
        if (matcher.find()) {
            return matcher.group("url");
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return null;
}
A better solution would be to use a JSON parser library like Jackson:
private String extractUrl(Path path) {
    try (BufferedReader reader = Files.newBufferedReader(path)) {
        ObjectMapper mapper = new ObjectMapper();
        MyObject object = mapper.readValue(reader, MyObject.class);
        return object.getUrls().stream().findFirst().orElse(null);
    } catch (IOException e) {
        e.printStackTrace();
    }
    return null;
}
This uses an Object representation of the file content:
public class MyObject {
    @JsonProperty("url")
    private List<String> urls;
    // getter and setter
}
But at the end, the most performant solution probably would be to use a shell script.
Here is a quick and simple awk script that does all the work from base dir.
The awk script named script1.awk
/https/{
    if ($1 in urlArr) {
        cmd = "rm " FILENAME;
        print cmd;
        # system(cmd);
    } else {
        urlArr[$1] = FILENAME;
    }
}
Initially run the script with the following command:
awk -f script1.awk dir{1,}/*.json
When ready to remove the duplicate json files, just uncomment the 5th line (line containing system(cmd)). And run again.
Here are some explanations:
The awk command runs script1.awk on all JSON files in the subdirectories dir and dir1.
The script traverses each file and, on lines containing https, takes the URL text from field $1.
If $1 already exists in the associative array urlArr, the file is printed/removed.
Otherwise the current file is added to the associative array urlArr.
Hope you like this simple solution.
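For comparison, the same idea can be sketched in Python with a real JSON parser. Like the bash answer above, it assumes that the files in one directory are kept and duplicates are looked for only in the other; find_duplicates and its parameter names are made up for this example.

```python
import json
from pathlib import Path

def find_duplicates(keep_dir, prune_dir):
    """Return files in prune_dir whose first URL also occurs in a file in keep_dir."""
    def first_url(path):
        with open(path, encoding="utf-8") as fh:
            urls = json.load(fh).get("url") or []
        return urls[0] if urls else None

    seen = {first_url(p) for p in Path(keep_dir).glob("*.json")}
    seen.discard(None)  # files without a url never count as duplicates
    return sorted(p for p in Path(prune_dir).glob("*.json") if first_url(p) in seen)
```

Print the returned paths first to check them; once the list looks right, calling unlink() on each path removes the duplicates.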

cannot call function in powershell

Function.psm1
function split-release {
    Param
    (
        [string]$Release
    )
    # Regex to match semantic versioning
    if($release -notmatch '\d+[.]\d+[.]\d+')
    {
        Write-Error "Invalid Release Number"
    }
    # Split the version string into an array
    $RelVersion=$release.split(".")
    #{"Major"=$RelVersion[0];"Minor"=$RelVersion[1];"Patch"=$RelVersion[2]}
}
split.psm1
Import-Module .\Function.psm1
split-release
I call the function as
PS c:\ > .\split.psm1 1.2.3
It doesn't print any output or errors out.
Seems to print to the console when I test importing just that function in a psm1 file, and in a separate file import the module and then pass in "0.0.0" to split-release.
The .\ syntax indicates that the desired file is in the same directory as the caller. Is that the case with your files? Is there any additional code that may be obscuring output?
Other minor points:
Write-Host will not write to your output stream. In PowerShell, echo is an alias for Write-Output.
You can use a hash table to return these as a single object with properties.
Modified function:
function split-release {
    Param
    (
        [string]$Release
    )
    # Regex to match semantic versioning
    if($release -notmatch '\d+[.]\d+[.]\d+')
    {
        Write-Error "Invalid Release Number"
    }
    # Split the version string into an array
    $RelVersion=$release.split(".")
    # Return the parts as a hash table
    @{"Major"=$RelVersion[0];"Minor"=$RelVersion[1];"Patch"=$RelVersion[2]}
}
Output:
PS C:/ > $release = "1.2.3"
PS C:/ > $result = split-release -Release $release
PS C:/ > $result.Major
1
PS C:/ > $result.Minor
2
PS C:/ > $result.Patch
3
More Info:
about_functions
I tried this finally and it appears to work.
File1.psm1
function split-release ($release) {
    $RelVersion=$release.split(".")
    $Relmajor=$RelVersion[0]
    $Relminor=$RelVersion[1]
    $Relpatch=$RelVersion[2]
    write-host $Relmajor $Relminor $Relpatch
}
File2.ps1
param(
    [string]$release = $(throw "Release number required as script parameter")
)
Import-Module ./File1.psm1
split-release "$release"
Finally run it as PS C:\ > ./file2.ps1

How do I remove spaces from all the file-names in the current directory

As the title suggests how do I remove spaces from all the files in the current directory ?
Example
file name.mp3 should become filename.mp3
Note:
I am open to an answer in any language.
I am a big fan of python, so here is a python script for doing the same
import os
for f in os.listdir("."):
    r = f.replace(" ", "")
    if r != f:
        os.rename(f, r)
with sh
for file in *' '*; do [ -f "$file" ] && mv "$file" "$(echo "$file" | tr -d '[:space:]')"; done
with perl 5.14 (replace y/ //dr by do{($x=$_)=~y/ //d;$x} for older versions)
# Linux/Unix
perl -e 'rename$_,y/ //drfor<"* *">'
# Windows
perl -e "rename$_,y/ //drfor<'* *'>"
with Java
import java.io.File;
public class M {
    public static void main(String[] args) {
        String o,n;
        for (File old : new File(".").listFiles()) {
            o=old.getName();
            if (!o.contains(" ")) continue;
            n=o.replaceAll(" ", "");
            old.renameTo(new File(n));
        }
    }
}
With bash:
for i in * ; do
    if [ "$i" != "${i//[[:space:]]}" ] ;
    then
        mv "$i" "${i//[[:space:]]}"
    fi
done
${i//[[:space:]]} removes all whitespace characters from a string.
Since you're language agnostic, here's a ruby one-liner:
ruby -e 'Dir.foreach(".") {|f| f.count(" \t") > 0 and File.rename(f, f.delete(" \t"))}'
ls -1 | awk '/ /{a=$0;gsub(/ /,"");b="mv \""a"\" "$0;system(b);}'
This renames an old file only if the old file name contained a space
and the new file doesn't already exist.
for old in *; do
    new="${old//[[:space:]]}"
    [[ $old = $new || -f $new ]] || mv "$old" "$new"
done
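The same "don't overwrite an existing file" check can be added to the Python approach. A minimal sketch: strip_spaces is a hypothetical name, and it removes only plain spaces (not tabs or other whitespace).

```python
import os

def strip_spaces(directory="."):
    """Remove spaces from file names in directory; return the (old, new) pairs renamed."""
    renamed = []
    for name in sorted(os.listdir(directory)):
        new_name = name.replace(" ", "")
        if new_name == name or not new_name:
            continue  # nothing to change, or the name would become empty
        old = os.path.join(directory, name)
        new = os.path.join(directory, new_name)
        if os.path.exists(new):
            continue  # never overwrite an existing file
        os.rename(old, new)
        renamed.append((name, new_name))
    return renamed
```

Returning the renamed pairs makes it easy to log what changed or to undo the operation.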