Download all files listed in a CSV using gsutil

I'm trying to figure out a way to download multiple files from a single bucket.
Assume I have a bucket with hundreds of files and I want to download 65 of those files.
I can obviously go to the console and download each file I need individually, but this is not very efficient.
One other option would be to download an entire folder using
gsutil -m cp -r gs://bucket/folder [destination folder]
However, that will download ALL files, which isn't convenient either.
Can I somehow include all the filenames I want in a CSV and have gsutil iterate that CSV file?

You can't point gsutil at a CSV file directly, but you can pipe a list of URLs into gsutil using the -I flag, which makes cp read the objects to copy from stdin:
cat list_of_urls.txt | gsutil -m cp -I ./download_dir
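For example, assuming the object names live in the first column of a file called files.csv (a name used here only for illustration) and sit under gs://bucket/folder, a rough sketch would be:
# Build full gs:// URLs from the first CSV column and feed them to gsutil on stdin
cut -d, -f1 files.csv | sed 's|^|gs://bucket/folder/|' | gsutil -m cp -I ./download_dir
If the CSV already contains full gs:// URIs, you can drop the sed step and pipe the cut output straight into gsutil.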
Perhaps that is good enough.

Related

How do I get a list of the gsutil URIs of all the images in a folder in google cloud storage bucket?

I have a bucket containing multiple folders, and those folders in turn contain images. I want to make a CSV with the gsutil URIs of all these images. How can I do that?
I could not find a way to get the gsutil URIs of all images at once.
You need to add the -R parameter so it recurses through all the folders:
gsutil ls -R gs://bucket-name
and then you can pipe the output through grep -v ':$' to discard the folder entries.
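Putting the two together, a sketch that writes the object URIs to a CSV (uris.csv is just an assumed output name; the extra grep drops the blank separator lines that ls -R prints between folders):
gsutil ls -R gs://bucket-name | grep -v ':$' | grep -v '^$' > uris.csv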

Why don't Mercurial filesets work when adding files?

I'm trying to use Mercurial filesets to add all the files in a directory tree, excluding very large files and any binary files. Cribbing from the Mercurial documentation, this command should do it:
hg init
hg add 'set: size("<1M") and not binary()'
However, this returns a status code of 0 and hasn't added anything to my new, empty repo. I've tried just 'set: not binary()' and that didn't work either.
The frustrating thing is that although I can Google for Mercurial filesets and find lots of examples, I can't find anything to help troubleshoot when it doesn't work!
I don't have a .hgignore file, and it's a fresh, empty repo. Mercurial 4.2.2.
The directory where I'm testing this has a couple of artificially created files for the purpose of testing. In my real use case, I inherit a multi-gigabyte tarball of assorted sources and binaries from a client, and I want to get all the sources into Mercurial before I start hacking to fix their problems, hence the need to exclude the binaries and large files that would otherwise choke Mercurial.
Here's my little test script:
#!/bin/sh -ex
dd if=/dev/urandom of=binary_1k bs=1 count=1024
dd if=/dev/urandom of=binary_2M bs=1 count=2097152
echo "This. Is, a SMALL text file." > text_small
hexdump binary_1k > text_1k
hexdump binary_2M > text_2M
ls -lh
file binary_1k
file binary_2M
file text_1k
file text_2M
hg init
hg add 'set: size("<1M") and not binary()'
hg status -a
hg add 'set: not binary()'
hg status -a
hg add 'set: size("<1M")'
hg status -a
At the end of this, each status command reports no files in the repo, and the add commands report no errors.
The problem is that filesets query Mercurial's repository database, which knows only about files that are already part of the repository or have been added.
One solution is to add everything first and then get rid of the files that you don't want, e.g.:
hg forget 'set:size(">1M") or binary()'
This works because the query also covers recently added files, even if they haven't been committed yet.
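Applied after the hg init in the test script above, the workaround would look roughly like this (a sketch; the expected survivors are deduced from the file sizes in the script, not verified on every Mercurial version):
hg add .                                    # add everything first, so the files are known to Mercurial
hg forget 'set:size(">1M") or binary()'     # then drop the large and binary files again
hg status -a                                # expected to list only text_small and text_1k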

How can you list all file objects loaded in IPFS?

I can add recursively a bunch of files within IPFS with
$ ipfs add -r data/
How can I get a list back of all loaded file objects [in a specific directory]? Similar to aws s3 listObjects...
The ipfs files ls command does not seem to be recursive. I understand that I can call the API a thousand times, but that does not seem to be very efficient.
I must be missing something here.
IPFS is based on a Merkle tree, so you can display all the elements under your root resource. You can use:
web ui: http://localhost:8080/ipfs/<your_root_resource_hash>
graphmd: https://ipfs.io/ipfs/QmNZiPk974vDsPmQii3YbrMKfi12KTSNM7XMiYyiea4VYZ/example#/ipfs/QmRFTtbyEp3UaT67ByYW299Suw7HKKnWK6NJMdNFzDjYdX/graphmd/README.md
shell commands:
ipfs ls <your_root_resource_hash>
ipfs refs -r <your_root_resource_hash>
Docs for ipfs files ls
Edit: More importantly, your directory name is not persisted into IPFS. You access your resource by its hash, which is the one you get when you add it with ipfs add -r <your_dir>.
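For example, assuming <root_hash> stands for the hash printed on the last line of ipfs add -r data/, the two shell commands listed above behave like this:
ipfs ls <root_hash>        # immediate children of the root: hash, size and name per entry
ipfs refs -r <root_hash>   # every object hash reachable from the root, recursively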
You can use this command to list all objects
ipfs files ls
If you are interested in local files added with something like ipfs add -r --nocopy /files, what you want is
ipfs filestore ls
Unfortunately, this currently lists blocks instead of whole files; see https://github.com/ipfs/go-ipfs/issues/5293
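A minimal sketch of that workflow, assuming the experimental filestore has been switched on in the node's config first:
ipfs config --json Experimental.FilestoreEnabled true   # enable the experimental filestore
ipfs add -r --nocopy /files                             # add by reference instead of copying blocks into the repo
ipfs filestore ls                                       # list the filestore entries (one per block, per the issue above)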

Detect if a file is versioned

For example, I have an hg-versioned project at this path: C:\src\sample_project
Now, let's say this project has subfolders, and I'm editing a file inside it: C:\src\sample_project\docs\index.rst.
Given the path of this file, C:\src\sample_project\docs\index.rst, what is the easiest and most effective way to check whether the file is versioned by hg, using either Windows shell commands, hg.exe, or TortoiseHg (thg.exe)?
I'll post my own idea as an answer.
Command to check whether a file is versioned: hg status <path>; if the first character of its output is ? or a (from abort: no repository found in ...), I should assume the file is not versioned.
What you stated works, but there is a cleaner way, IMO. You can use:
hg status -u
which lists all unknown (read: not tracked) files in your repository.
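For instance, to check just the single file from the question rather than the whole repository, something along these lines should do:
hg status -u C:\src\sample_project\docs\index.rst
A line starting with ? means the file is not tracked; no output for an existing file means it is either tracked or ignored (hg status -A <path> prints the state explicitly, with C marking clean tracked files).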

Mercurial: get contents of a specific revision of a file

I need to get contents of a specific revision/node of a file in a local repository and write it to a temporary file.
I know it is possible to do through the internal Mercurial API.
Is there a built-in command or an extension?
You can use hg cat:
hg cat -r revisionid filename > tmpfile
The fastest way, and the friendliest for large and/or binary files, is:
hg cat -r revisionid repoRelativeFilePath -o tempFilePath
Unless tempFilePath is an absolute path (e.g. rooted at 'C:\'), it will be interpreted relative to the repo's root.
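For instance, assuming revision 42 and a tracked file src/main.c (both hypothetical), either form writes the old contents to a temporary file:
hg cat -r 42 src/main.c > /tmp/main_r42.c       # shell redirection
hg cat -r 42 src/main.c -o /tmp/main_r42.c      # built-in output option; a relative path here resolves against the repo root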