Unstable number of listed files by Google Drive API after deleting a bunch of files and shared folders - google-drive-api

I'm testing scripts for a client. For this I created ~10k files, which I uploaded to a test folder using the web UI. Then I trashed and deleted this folder.
Then I added a shared folder from the client and listed all the files using the /v3/files endpoint with the proper query parameters to include files from other drives.
I noticed my script was not functioning well due to a lot of 404 responses. It turns out that deleting 10k files is not instantaneous for Google Drive, at least from the API's point of view. The listing also still included files that I had just deleted, and those only disappeared later.
From what I've seen, Google Drive is able to process about 200 files/s.
I could just wait, but then I found another problem after I deleted the shared folder and replaced it with another shared folder from my client, both of which had tens of thousands of files. As expected, it took some time for the number of files to go down. But then I saw the number increase slowly before decreasing again.
I suspect the newly added folder increases the count at the same time as the deletion of the other folder decreases it, but I am not sure.
Am I the only one who has experienced this? Is there something in the API that I've missed that could mitigate this, or at least tell me when Google Drive has finished processing all operations?
Edit: steps to reproduce:
Code that I used to create a bunch of files:
#!/usr/bin/env bash
# create_lots_of_files.sh
mkdir -p lots_of_files
cd lots_of_files || exit 1
for i in $(seq 10000); do
  FOLDER=$((i % 10))
  FILE="file_$i.txt"
  mkdir -p "$FOLDER"
  echo "$FILE" > "$FOLDER/$FILE"
done
Then upload this folder to your drive. Grab a coffee; this will take time.
Code to fetch the file IDs using the API:
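If you prefer the command line over the web UI, a tool like rclone could do the upload; just a sketch, assuming you have a configured Google Drive remote named, say, gdrive:
# Hypothetical rclone remote name 'gdrive'; raise --transfers to parallelize uploads
rclone copy lots_of_files gdrive:lots_of_files --transfers 8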
#!/usr/bin/env bash
# list_ids.sh <output file path> <bearer token>
set -e
# shellcheck disable=SC2128
SCRIPTDIR="$(dirname "$(realpath "$BASH_SOURCE")")"
PAGE_SIZE=1000
if [[ -z $1 ]]; then
  echo "First argument must specify a path to store the file IDs"
  exit 1
fi
IDS_FILE="$1"
if [[ -z $2 ]]; then
  echo "Second argument must be the bearer token"
  exit 1
fi
ACCESS_TOKEN="$2"
if ! command -v jq &> /dev/null; then
  echo "error: need to install jq: sudo apt-get install jq"
  exit 1
fi
cd "$SCRIPTDIR"
BASE_QUERY_STRING="https://www.googleapis.com/drive/v3/files\
?corpora=allDrives\
&includeItemsFromAllDrives=true\
&supportsAllDrives=true\
&pageSize=$PAGE_SIZE\
"
true > "$IDS_FILE"
while true; do
  # If pageToken is empty then it defaults to the first page
  QUERY_STRING="$BASE_QUERY_STRING&pageToken=$NEXT_PAGE_TOKEN"
  RESPONSE="$(curl \
    --silent \
    --fail \
    -H 'GData-Version: 3.0' \
    -H "Authorization: Bearer $ACCESS_TOKEN" \
    --request GET \
    "$QUERY_STRING" \
  )"
  jq -r '.files | map(select(.mimeType != "application/vnd.google-apps.folder")) | .[].id' <<<"$RESPONSE" | tee -a "$IDS_FILE"
  NEXT_PAGE_TOKEN="$(jq -r '.nextPageToken' <<< "$RESPONSE")"
  if [[ -z "$NEXT_PAGE_TOKEN" || "$NEXT_PAGE_TOKEN" = 'null' ]]; then
    break
  fi
done
Keep track of the number of files with:
while true; do date; ./list_ids.sh ids.txt '<bearer token>' | wc -l; sleep 5; done
Delete lots_of_files on your drive and watch the file count.

I can tell you, you are not the first one. In my experience, the behaviour you are reporting is expected: changes need to be replicated across all Google Workspace servers, and that replication has a delay usually referred to as 'propagation', as mentioned in this Help Center article: https://support.google.com/drive/answer/7166529
If you share or unshare folders with a lot of files or subfolders, it might take time before all permissions change. If you change a lot of edit or view permissions at once, it might take time before you see the changes.
Although the task you are doing is different from just sharing files, due to the high volume of files and folders you are working with and the fact that you are working with shared folders, you will experience a delay. As outlined in this other Help Center article, https://support.google.com/a/answer/7514107, you can expect changes to be fully applied within 24 hours.
I have assisted multiple data migrations with Google Workspace admins and this is also expected when working with large amounts of data.
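If you need a practical signal that propagation has settled rather than waiting the full 24 hours, one crude workaround (not an official API feature, just a sketch building on the list_ids.sh script from the question) is to poll the file count until it stops changing for several consecutive readings:
#!/usr/bin/env bash
# wait_until_stable.sh <bearer token> -- hypothetical helper built on list_ids.sh
PREV=-1
STABLE=0
while (( STABLE < 3 )); do   # require 3 identical readings in a row
  COUNT="$(./list_ids.sh ids.txt "$1" | wc -l)"
  if [[ "$COUNT" -eq "$PREV" ]]; then
    STABLE=$((STABLE + 1))
  else
    STABLE=0
  fi
  PREV="$COUNT"
  date; echo "current count: $COUNT (stable readings: $STABLE)"
  sleep 30
done
echo "File count appears to have settled at $COUNT"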

Related

How can I run a program on my computer via web html server?

I want to remotely run some command-line program (no need for GUI) on a computer with a web server (Apache in my case) installed.
Some simple example program to start:
MyScript.sh -f file.txt
How could I do this?
As this could be considered too broad a question, I suggest narrowing it to (rather) simple programs, but allowing multiple ways to implement it, like HTML, JavaScript, CRON tweaks... etc.
Possible useful (but not strictly required) features:
View outputted results.
Parameter specification.
Further notes:
Indeed, some methods could involve a bigger security risk. That is assumed, but it is still a good idea to keep security in mind.
A very basic method:
Create some HTML file on the web server to order your program startup:
# mkdir -p /usr/local/www/apache24/data/StartScripts
# echo "Program start requested" > /usr/local/www/apache24/data/StartScripts/index.html
Create this simple script to check if somebody did access that HTML file:
$ cat CheckScriptRequest.sh
tail /var/log/httpd-access.log | grep "GET /StartScripts/"
ScriptStartupRequest=$?
if (( ScriptStartupRequest == 0 ))
then
    MyScript.sh -f file.txt
fi
Program it as a CRON entry:
$ crontab -l
* * * * * CheckScriptRequest.sh
Now, when you navigate to http://yourdomain.com/StartScripts , you will get the message "Program start requested", and your computer will run the program MyScript.sh -f file.txt (well, with a delay of up to one minute until the CRON job performs the check).
Of course, this simple proof of concept has multiple flaws and could be enhanced a lot.
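For instance, one possible enhancement (just a sketch, reusing the log path from above plus a state file of my own choosing) is to remember how many matching requests have already been handled, so the script fires only when a new request arrives instead of re-firing while the same line sits in the log tail:
#!/bin/sh
# CheckScriptRequest.sh -- only react to requests newer than the last run
STATE=/var/tmp/startscripts.count      # hypothetical state file
LOG=/var/log/httpd-access.log
LAST=$(cat "$STATE" 2>/dev/null || echo 0)
CURRENT=$(grep -c "GET /StartScripts/" "$LOG")   # total matching requests so far
if [ "$CURRENT" -gt "$LAST" ]; then
    MyScript.sh -f file.txt
fi
echo "$CURRENT" > "$STATE"   # note: log rotation will reset the counter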
An example of the CGI way (extracted from here) that shows the classic "Hello World" and today's date (thanks to Vivek Gite):
Assuming the path to CGI executables (it could change between versions) is /usr/lib/cgi-bin:
$ cd /usr/lib/cgi-bin
$ cat first.cgi
#!/bin/bash
echo "Content-type: text/html"
echo ""
echo "<html><head><title>Bash as CGI"
echo "</title></head><body>"
echo "<h1>Hello world</h1>"
echo "Today is $(date)"
echo "</body></html>"
Set execute permission on the script:
$ chmod +x first.cgi
Fire up your web browser and test the script navigating to:
http://localhost/cgi-bin/first.cgi
or remotely (open port, blah blah blah...):
http://your-ip/cgi-bin/first.cgi
Of course, you need CGI support enabled on your web server.
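On a Debian/Ubuntu-style Apache install (an assumption; package and service names differ per distro), enabling CGI usually boils down to something like:
sudo a2enmod cgi        # enables cgid automatically on threaded MPMs
sudo systemctl restart apache2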

Limiting the number of times fswatch runs

I have fswatch set up on a directory, which triggers a script to refresh my browser every time a file is changed. It works, but if a bunch of files are added or deleted in a single shot, the browser can keep refreshing for very long periods before it stops.
Looking at the documentation, it looks like --batch-marker might be what I need, but it's not clear from the documentation how I might use it to limit how many times my script is triggered.
UPDATE: here is my current fswatch command:
fswatch -v -o . | xargs -n1 -I{} ~/bin/refresh.sh
UPDATE: I'm on a mac using the FSEvents monitor.
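Not a definitive answer, but one common way to tame bursts with the FSEvents monitor is to let fswatch coalesce events using -o together with a latency window (-l), and then trigger exactly one refresh per emitted batch; a sketch, assuming refresh.sh is safe to call repeatedly:
# -o prints one line (an event count) per batch; -l 5 waits up to 5 seconds to coalesce changes
fswatch -o -l 5 . | while read -r count; do
  ~/bin/refresh.sh   # one refresh per batch, no matter how many files changed
done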

How to move from gitlab source base to gitlab omnibus?

I am trying to move gitlab-ce 8.5 source base to gitlab-ce 8.15 omnibus. We were using MySQL with the source-based install, but now we have to use psql with gitlab-ce omnibus. When I tried to take a backup, it failed due to some empty repos.
Question: Is there any alternative way to move from source base to omnibus with a full backup?
I have moved gitlab from source base to omnibus. You can use the link below to convert the db dump from MySQL to psql.
https://gitlab.com/gitlab-org/gitlab-ce/blob/master/doc/update/mysql_to_postgresql.md
I created a zip file of the repos manually, copied it to the gitlab omnibus server, and restored it to /var/opt/gitlab/git-data/repositories/.
After these steps, copy the script below to /var/opt/gitlab/git-data/xyz.sh and execute it to update the hooks.
#!/bin/bash
for i in repositories/* ; do
  if [ -d "$i" ]; then
    for o in "$i"/* ; do
      if [ -d "$o" ]; then
        rm "$o/hooks"
        # change the paths if required
        ln -s "/opt/gitlab/embedded/service/gitlab-shell/hooks" /var/opt/gitlab/git-data/"$o"/hooks
        echo "HOOKS CHANGED ($o)"
      fi
    done
  fi
done
Note: The repos' ownership should be git:git.
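For example, assuming the default omnibus repository location (adjust the path if your git-data directory differs), ownership can be fixed with:
sudo chown -R git:git /var/opt/gitlab/git-data/repositories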
Some useful commands during the migration:
sudo gitlab-ctl start postgres (to start the Postgres service only)
sudo gitlab-psql (to use the GitLab-bundled Postgres)
Feel free to comment if you face 5xx error codes on the GitLab page.

What will happen with a gsutil command if a DRA bucket's contents are unavailable?

I'm on a DRA (Durable Reduced Availability) bucket and I run the gsutil rsync command quite often to upload/download files to/from the bucket.
Since file(s) could be unavailable (because of the DRA), what exactly will happen during a gsutil rsync session when such a scenario is hit?
Will gsutil just wait until the unavailable files become available and complete the task, thus always downloading everything from the bucket?
Or will gsutil exit with a warning about a certain file not being available, and if so, exactly what output is produced (so that I can make a script to look for this type of message)?
What will the return code be of the gsutil command in a session where files are found to be unavailable?
I need to be 100% sure that I download everything from the bucket, which I'm guessing can be difficult to keep track of when downloading hundreds of gigabytes of data. In case gsutil rsync completes without downloading unavailable files, is it possible to construct a command which retries the unavailable files until all such files have been successfully downloaded?
If your files exceed the resumable threshold (as of 4.7, this is 8MB), any availability issues will be retried with exponential backoff according to the num_retries and max_retry_delay configuration variables. If the file is smaller than the threshold, it will not be retried (this will be improved in 4.8 so small files also get retries).
If any file(s) fail to transfer successfully, gsutil will halt and output an exception depending on the failure encountered. If you are using gsutil -m rsync or gsutil rsync -C, gsutil will continue on errors and at the end, you'll get a CommandException with the message 'N file(s)/object(s) could not be copied/removed'
If retries are exhausted and/or either of the failure conditions described in #2 occur, the exit code will be nonzero.
In order to ensure that you download all files from the bucket, you can simply rerun gsutil rsync until it exits with a zero (success) exit code.
Note that gsutil rsync relies on listing objects. Listing in Google Cloud Storage is eventually consistent. So if you upload files to the bucket and then immediately run gsutil rsync, it is possible you will miss newly uploaded files, but the next run of gsutil rsync should pick them up.
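A minimal sketch of that rerun-until-success loop (the bucket name and local path are placeholders):
# Keep retrying until gsutil rsync exits with status 0 (success)
until gsutil -m rsync -r gs://your-bucket /local/destination; do
  echo "rsync failed, retrying in 60s..." >&2
  sleep 60
done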
I did some tests on a project and could not get gsutil to throw any errors. AFAIK, gsutil operates at the directory level; it is not looking for a specific file.
When you run, for example, $ gsutil rsync local_dir gs://bucket , gsutil is not expecting any particular file; it just takes whatever you have in "local_dir" and uploads it to gs://bucket, so:
gsutil will not wait, it will complete.
You will not get any errors - the only errors I got were when the local directory or the bucket was missing entirely.
If, let's say, a file is missing in local_dir but it is available in the bucket and you then run $ gsutil rsync -r local_dir gs://bucket, then nothing will change in the bucket. With the "-d" option, the file will be deleted on the bucket side.
As a suggestion, you could just add a crontab entry to rerun the gsutil command a couple of times a day or at night.
Another way is to create a simple script and add it to your crontab to run every hour or so. It will check whether a given file exists and, if it does not, run the gsutil command:
#!/bin/bash
FILE=/home/user/test.txt
if [ -f "$FILE" ]; then
  echo "file exists..or something"
else
  gsutil rsync /home/user gs://bucket
fi
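And a hypothetical crontab entry (the script name and path are assumptions) to run that check every hour:
# crontab -e
0 * * * * /home/user/check_and_sync.sh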
UPDATE:
I think this may be what you need. In ~/ you should have a .boto file.
~$ more .boto | grep max
# num_retries = <integer value>
# max_retry_delay = <integer value>
Uncomment those lines and add your own numbers. The default is 6 retries, so you could do something like 24 retries and put 3600s in between. In theory this should keep retrying more or less indefinitely.
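For example, the relevant lines in ~/.boto might end up looking like this (using the numbers suggested above):
[Boto]
num_retries = 24
max_retry_delay = 3600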
Hope this helps!

Prevent creation of conffiles

I'm trying to build a package which has some files under /etc that are not configuration. They are included in the conffiles automatically even if I create an empty package.conffiles in the debian directory.
How can I stop dh_installdeb from doing that?
I’m not sure I understand rafl’s answer, but dh_installdeb as of debhelper=9.20120115ubuntu3 adds everything below /etc to conffiles nearly unconditionally: debian/conffiles adds conffiles but does not override them.
It’s possible to override manually in debian/rules. For example, in order to prevent any files from being registered as conffiles:
override_dh_installdeb:
    dh_installdeb
    find ${CURDIR}/debian/*/DEBIAN -name conffiles -delete
(of course, indentation must be hard tab)
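To double-check the result after building, you can inspect the conffiles control file inside the generated .deb (the package file name here is just an example):
dpkg-deb --info ../mypackage_1.0-1_amd64.deb conffiles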
It's possible to define an upgrade rule in the preinst script, debian/<package-name>.preinst, using dpkg-maintscript-helper.
#!/bin/sh
# preinst script for <package-name>
set -e
case "$1" in
  install|upgrade)
    if dpkg-maintscript-helper supports rm_conffile 2>/dev/null; then
      dpkg-maintscript-helper rm_conffile /etc/foo/conf.d/bar <Previous package version> -- "$@"
    fi
    ;;
  abort-upgrade)
    ;;
  *)
    echo "preinst called with unknown argument \`$1'" >&2
    exit 1
    ;;
esac
exit 0
More info:
The right way to remove an obsolete conffile in a Debian package
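As an alternative to writing the preinst snippet by hand, newer debhelper versions can generate the dpkg-maintscript-helper calls from a debian/<package-name>.maintscript file; a sketch, keeping the same placeholders as above:
# debian/<package-name>.maintscript
rm_conffile /etc/foo/conf.d/bar <Previous package version>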
Here is what I came up with as an extension of Vasiliy's answer. It effectively does what dh_installdeb does but without automatically adding /etc files. This way you regain full control again over what files are considered conffiles and what are not.
override_dh_installdeb:
    dh_installdeb
    @echo "Recreating conffiles without auto-adding /etc files"
    @for dir in ${CURDIR}/debian/*/DEBIAN; do \
        PKG=$$(basename $$(dirname $$dir)); \
        FILES=""; \
        if [ -f ${CURDIR}/debian/conffiles ]; then \
            FILES="${CURDIR}/debian/conffiles"; \
        fi; \
        if [ -f ${CURDIR}/debian/$${PKG}.conffiles ]; then \
            FILES="$$FILES ${CURDIR}/debian/$${PKG}.conffiles"; \
        fi; \
        if [ -n "$$FILES" ]; then \
            cat $$FILES | sort -u > $$dir/conffiles; \
        elif [ -f $$dir/conffiles ]; then \
            rm $$dir/conffiles; \
        fi; \
    done
(Of course, use REAL tabs if pasting into your rules file).
This answer uses BASH (or /bin/sh which is either symlinked to BASH or is a variant of it). There may be a way to achieve this by using only makefile internal commands, but I'm not that good with those.
This should work even when building multiple binary packages from the same source and it respects the plain debian/conffiles as well as the package-specific debian/${pkg}.conffiles.
Originally, this answer suggested providing your own debian/conffiles files only listing actual configuration files to be installed. Apparently that only serves to add more configuration files but won't override the whole conffiles file.
However, I can't quite see why you'd even want that. If the files are not configuration files, the user won't edit them, so none of the automatic conffile handling will get in your way on upgrades. Also, if they're not actually config files, I'd highly recommend simply installing them to a place other than /etc, avoiding your issue as well.
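For example, a hypothetical debian/<package-name>.install entry that ships such a file under /usr/share instead of /etc:
# debian/<package-name>.install (file and package names are made up)
data/not-really-config.conf usr/share/mypackage/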