Google Drive: maximum number of files in a directory - google-drive-api

Is there any maximum number of files that can reside in a Google Drive folder? Are there performance hits when a lot of files (for instance, a million of them) sit in the same folder?
From what I understand (mostly from reading how the API works), Google Drive has no real concept of a "folder". Folders are represented by a specific kind of file, and folder membership is just described in each file's metadata; the files themselves are one long unstructured list of blobs with metadata. This would suggest that having a large number of files in the same directory should not be a big problem.
But I would like to have more expert opinions on the matter.
(Of course folders with a lot of files are going to hurt if I synchronize them with my disk; but I am just going to query them with the API.)
EDIT I am not going to use the web UI. The only operations I am going to perform are posting a file into this giant folder and retrieving a file given its name. Basically, I am using this folder as a hash table. So I guess the actual question is: if I make a query like
'big_folder_id' in parents and title = 'some_key'
(assuming that there is just one file named some_key in the folder), will the performance impact of having a lot of files in the folder identified by big_folder_id be bearable?
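For concreteness, here is roughly how I would issue that lookup from Apps Script's Advanced Drive service (v2; just an assumption for illustration, since any Drive API client sends the same q filter, and big_folder_id / some_key are placeholders as above):

function findByKey(key) {
  // Assumes the Advanced Drive service (v2) is enabled for the script.
  var result = Drive.Files.list({
    q: "'big_folder_id' in parents and title = '" + key + "'",
    maxResults: 1
  });
  var items = result.items || [];
  return items.length > 0 ? items[0] : null;
}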

The performance hit will be on the UI side. Scrolling to the bottom of a long file list will take a very long time. Also, in a folder's web view (i.e. if you share it with 'anyone with a link can view' permission), only the first 500 files will be displayed, with no way to see the rest.
From an API access perspective, it depends on what you are doing with the API. For example, if you try to get a list of all files in a folder that contains a lot of them, you will likely run into a script execution timeout (6 minutes max in Apps Script).
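To see why: a full listing has to be paged through token by token, so a folder with millions of files means thousands of sequential requests. A minimal sketch, assuming the Apps Script Advanced Drive service (v2):

function listFolder(folderId) {
  var files = [];
  var pageToken = null;
  do {
    // Each call returns at most one page; the token links to the next page.
    var result = Drive.Files.list({
      q: "'" + folderId + "' in parents",
      maxResults: 1000,
      pageToken: pageToken
    });
    files = files.concat(result.items || []);
    pageToken = result.nextPageToken;
  } while (pageToken);
  return files;
}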

I think Google recently started limiting this. There is now a 500,000-item limit per folder (the root folder is exempt from this limit): https://developers.google.com/drive/api/v3/handle-errors#resolve_a_403_error_number_of_items_in_folder
I designed my system thinking there was no limit, and my logs indicate they started enforcing it on my account at 2020-06-15T17:13:37.020232715Z. At the time I had reached 3,232,458 files in a single folder. The limit is 500k, so this is further evidence that the quota was added retroactively and enforcement started without warning, which brought my system down.
More proof is that this error code (numChildrenInNonRootLimitExceeded) first appeared in that document sometime between 2020-04-12 and 2020-06-11:
https://web.archive.org/web/20200412153122/https://developers.google.com/drive/api/v3/handle-errors
=> not present
https://web.archive.org/web/20200611105741/https://developers.google.com/drive/api/v3/handle-errors
=> present
Also, a web search for that error code turns up very few links. The only non-Google result I can find is dated 2020-06-11: https://scrapbox.io/ci7lus/Error:_The_limit_for_this_folder's_number_of_children_(files_and_folders)_has_been_exceeded.#5eeb086bae0f140000d5c509
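If you hit this limit, the only real workaround seems to be sharding files across multiple folders. A rough sketch of what that might look like (Apps Script Advanced Drive service v2 assumed; the error-string matching is approximate and just for illustration):

function insertWithSharding(resource, blob, folderId) {
  try {
    resource.parents = [{id: folderId}];
    return Drive.Files.insert(resource, blob);
  } catch (e) {
    // Only handle the folder-children limit; rethrow anything else.
    if (String(e).indexOf('numChildrenInNonRootLimitExceeded') === -1) {
      throw e;
    }
    // The folder is full: create a fresh shard folder and retry there.
    var shard = Drive.Files.insert({
      title: 'shard-' + Date.now(),
      mimeType: 'application/vnd.google-apps.folder'
    });
    resource.parents = [{id: shard.id}];
    return Drive.Files.insert(resource, blob);
  }
}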

Related

What is the difference (and similarity if any) between LocalCache and TempState application data folders?

A UWP app uses a number of data folders located under a unique folder identified by its package ID. Specifically, how do the LocalCache and TempState data folders compare and contrast?
On the desktop, data files for Windows Store apps are stored under the folder %USERPROFILE%\AppData\Local\Packages\{PackageId}, where {PackageId} corresponds to the Windows Store app package identifier (a slightly different but similarly unique folder is used on Windows Mobile).
There are about half a dozen folders, each with a specific purpose, including LocalCache, intended for caching app data, and TempState, intended as a temp folder.
Here is how the two data folders compare and contrast.
Both are excluded from backup/restore operations. In contrast, app data folders such as LocalState, Settings and RoamingState are always backed up.
Both can be deleted at any time within an app: wholesale, using the all-clearing [and dangerous!] method ApplicationData.Current.ClearAsync(), or selectively, using ApplicationData.Current.ClearAsync(ApplicationDataLocality.Temporary) for the TempState folder and ApplicationData.Current.ClearAsync(ApplicationDataLocality.LocalCache) for the LocalCache folder.
The LocalCache folder can be relied upon until you delete it, whereas the TempState folder cannot be relied upon at a later time, as it is subject to deletion by external factors such as disk clean-up, or by the operating system when it runs low on storage space.
If you have data that doesn't need to be backed up, but that you want to use at a later time and delete only when done with it, use the LocalCache folder.
If you have data that doesn't need to be backed up and is only needed for the current app session (leaving the cleaning job to external tools such as storage clean-up), use the TempState folder.
You may like to implement automatic clearing of the TempState folder upon exiting the app, as sketched below. Likewise, monitoring the LocalCache folder and clearing data that is no longer needed is an important point to bear in mind.
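For instance, in a JavaScript UWP app (an assumption for illustration; the C# calls above are the equivalent), clearing TempState on close might look like:

// Selectively clear only the TempState folder; LocalCache is untouched.
var appData = Windows.Storage.ApplicationData.current;
appData.clearAsync(Windows.Storage.ApplicationDataLocality.temporary)
    .done(function () {
        // TempState is now empty.
    });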

Writing large files and suspend

In our Windows Store app we save the files that users create into an EPUB file, which is a zip archive with the file extension .epub.
The app is written primarily in HTML and JS, but to handle the writing to the zip archive we use some C# in a helper.
This all works, but I have found that the zip archive can become corrupted if the app suspends while writing to it: when adding a particularly large file, say a 100 MB video, the operation sometimes does not complete within the 5 seconds allowed from oncheckpoint.
Are there any ways that I could avoid this problem? As far as I can see there is just no way to write a large file to a zip and be 100% sure that it won't get corrupted if the app suspends.
I agree with you that there is just no way to write a large file to a zip and be 100% sure that it won't get corrupted if the app suspends.
As far as I know, when an app is suspended its memory is not released, so you don't need to worry about losing in-memory data during suspension.
What you do need to worry about is the user quitting the app before the data has been persisted.
Some extra design work can improve the user experience and avoid data loss:
auto-save
For example, persist changes whenever the user modifies the document (see the sketch after this list).
show user saving progress
Use a progress UI to let the user know that saving is in progress and that they will lose data if they quit the app.
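A minimal auto-save sketch in JS (saveToEpub is a hypothetical function wrapping your C# zip helper, and the debounce interval is arbitrary):

var saveTimer = null;

function onDocumentChanged() {
    // Restart the timer on every edit, so we persist after a quiet period
    // instead of relying on oncheckpoint's 5-second window.
    if (saveTimer) {
        clearTimeout(saveTimer);
    }
    saveTimer = setTimeout(function () {
        saveToEpub().done(function () {
            // Everything written up to this point survives a suspend.
        });
    }, 2000);
}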

Google Drive Sync + Read-only access

Here's the situation:
We use Google Apps for Business. We have one Google Drive folder -- "Folder A" -- that contains about 30 sub-folders, each of which contains hundreds of files and folders. You can assume that I am the owner of all files and folders on Google Drive. I am also the Google Apps superadmin. Folder A has a very well thought-out structure, with as many as eight levels of folders in the hierarchy.
We need to share Folder A with 40 different computers -- folder structure, files, everything. These 40 computers are display terminals, so each is used by dozens of people every day. It's crucial for us that all 40 computers have exactly the same folder structure, because people frequently move from one display computer to another and have to make presentations in which every second matters, so we can't have them spend 5 to 10 minutes each time figuring out the folder structure of the computer they are standing at. For business reasons and to avoid delays, we can't have people sign in using their individual Google accounts.
Here's what I did:
created a new account ("display@domain.com")
shared Folder A with display@domain.com (at "can view" permission level)
on all 40 computers, logged in to display@domain.com's Google Drive and synced everything
My problem is:
For some reason, Google Drive allows users to move, delete, or do pretty much whatever they want to folders and files -- even if they have only "can view" access. Yes, this doesn't affect the original shared folder/file, but it is still a huge problem because:
If any random user goes to any of the 40 computers and accidentally deletes a file or moves it, then this affects the other 39 computers as well (because Google Drive syncs across all 40 computers)
Even if I share Folder A ("can view" access only) with 40 different new accounts (display1@domain.com, display2@domain.com, ...), a user can still mess up the folder structure by going to -- let's say -- computer 17 and moving or deleting folders. Everyone who uses computer 17 from that point onwards will struggle because the folder structure has been tampered with. Yes, the original Folder A, owned by me, will still be in perfect condition, so there is no data loss. But I have no way of knowing that computer 17's folder structure has been messed up, so to make sure that every computer matches my original Folder A, I would need to manually check or re-sync each of the 40 computers every day. That's going to be crazy!
So ideally we need some way to make Folder A read-only, i.e., users can access the content but can't tamper with the overall folder structure or delete files. We're open to getting creative solutions and happy to do as much work as required, as long as it's one-time work.
Your problem is that the Drive Sync app is bi-directional. If I understand you correctly, you want uni-directional sync. My recommendation would be to replace Drive Sync with your own app that implements the behaviour you're looking for.
I'm responding very late, but thought I'd share what I found (for future users with a similar problem).
Short version: there is no solution here. Google Drive will allow users to tamper with folder structures, even if they've been given only view access. Philosophically, Google probably wants each user to create his/her own folder structure.
Creating our own Google Drive lookalike, as pinoyyid suggested, wasn't really an option for business reasons (we're completely entrenched in the Google ecosystem, so it makes sense to stick to Drive). So what I ended up doing is looking through change activity in Google Drive (online, on my computer) on a daily basis, keeping an eye out for any changes to the folder structure. I then:
- undo that change
- approach the person who made the change and tell them where they went wrong
This takes about 15 minutes per day.
I will also eventually get around to automating this (using Apps Script, I guess), but that's for later.
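A rough sketch of what that automation might look like (FOLDER_A_ID and the alert address are placeholders; for a very large tree the snapshot would need to live in a Drive file rather than script properties, which cap each value at a few KB):

function checkFolderStructure() {
  // Walk the shared folder and record every subfolder path.
  var paths = [];
  collectPaths(DriveApp.getFolderById('FOLDER_A_ID'), '', paths);
  var snapshot = paths.sort().join('\n');

  var props = PropertiesService.getScriptProperties();
  var previous = props.getProperty('structureSnapshot');
  if (previous !== null && previous !== snapshot) {
    MailApp.sendEmail('admin@domain.com', 'Folder A structure changed',
        'The folder tree no longer matches the stored snapshot.');
  }
  props.setProperty('structureSnapshot', snapshot);
}

function collectPaths(folder, prefix, paths) {
  var it = folder.getFolders();
  while (it.hasNext()) {
    var sub = it.next();
    var path = prefix + '/' + sub.getName();
    paths.push(path);
    collectPaths(sub, path, paths); // recurse into subfolders
  }
}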
Thank you to all those who thought about the problem. Hopefully, Google will allow for a variety of use-cases at a later time.
I know this is an old question, but in case anyone else comes here looking for a solution, try using a different Google Drive client. I've tried the programs WebDrive and RaiDrive, and both offer the ability to mount Google Drive as a virtual drive and to set the drive to read-only mode in the settings.
How about changing user permissions on the local filesystems of all 40 computers to "read only"? That should achieve the desired result.

Spreadsheet or script properties for simple index in Google apps script?

I'm programming a Google Apps Script store for TiddlyWiki (tiddlywiki.com). It receives files and stores them in Google Drive. It never overwrites any file; it just creates a new one on every upload. For performance reasons I want to maintain an index of the latest version of each file and its ID. Currently I'm using the Properties service to achieve this: when a file is uploaded, I store its Name:ID. That way, retrieving a file by name requires neither searching the full folder nor checking which version is the latest. I'm worried about how many entries I can store in the script properties store. I'm thinking about using a spreadsheet to save the index, but I don't know how it compares to the Properties service in terms of performance.
Here is the question: can I stick with the Properties service for this task, or should I switch to Google Spreadsheets? Will the performance be much worse? Any alternative for the index?
Thanks in advance.
EDIT:
Since this will store only a few hundred entries, what about using a JSON file as the index? Will it take much time to load and parse the text?
It depends on the number of files you're expecting. For a few thousand files, the Properties service might do, and it is surely easier to use than a spreadsheet, but it has a tighter limitation of 500 KB per store.
If you think you'll have more files, then it's probably best not to index at all and do a fast Google Drive search to retrieve your latest file, as sketched below. The search criteria can be very specific and fast (filter by title only, or by any timestamp criteria). I think it'll be much less trouble in your script than trying to build a big index and fit it into memory, etc.
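A minimal sketch of that lookup with DriveApp (FOLDER_ID is a placeholder, and this assumes files sharing a name are the versions you described):

function getLatestVersion(name) {
  var folder = DriveApp.getFolderById('FOLDER_ID');
  var files = folder.getFilesByName(name);
  var latest = null;
  while (files.hasNext()) {
    // Keep whichever copy was updated most recently.
    var file = files.next();
    if (latest === null || file.getLastUpdated() > latest.getLastUpdated()) {
      latest = file;
    }
  }
  return latest; // null if no version exists yet
}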

Website Admin Rights: Database vs. File Structure

Background:
I am making a website where I want modular administrative rights for read/write/edit privileges. My intent is to allow for any number of access-level types, and to base them on the folder structure.
As an example, root admins would have read/write/edit for all site pages. Group A may have read/write/edit to all files in the path www.example.com/section1/ (including subfolders), Group B would have read/write/edit to all files in www.example.com/section2/, and so on.
I have considered two options to implement this. The first is to create a MySQL database that would hold:
Group Name (reference name for the access group)
Read (comma-separated list of folders the group can read)
Write (comma-separated list of folders the group can write new content to)
Edit (comma-separated list of folders where the group can change existing information)
The other option I considered is creating a 'GroupAccess.txt' file somewhere and hand-jamming the information into that to reference.
Question: What are the advantages of each of these systems? Specifically, what do I gain from putting admin access information in a database versus a text file, and vice versa? (I'm looking for information on potential speed issues, ease of maintainability, and ease of editing/changing the stored information.)
Note: I'm not looking for a 'which is better', I want to know specific advantages so I can make a better informed decision on what's best for me.
The first thing that comes to mind is that the database would be more secure than a text file, for the simple reason that a text file can be read over the internet (most web servers serve .txt files by default). This would let users with restricted access, and even non-users of the site, see the whole structure of your site, which in turn leaves certain areas more open to attack.
Another benefit of using a database is that you can easily use a join to check whether a user has access to some content in the database, whereas with a file you'd need to read the file, extract the permissions, and only then build the SQL and fetch the data from the database.
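For example, a rough sketch of such a check (table and column names are hypothetical, and a normalized layout replaces the comma-separated folder lists):

-- Groups and their per-folder permissions, one row per grant.
CREATE TABLE access_group (
  id   INT PRIMARY KEY AUTO_INCREMENT,
  name VARCHAR(64) NOT NULL
);

CREATE TABLE group_permission (
  group_id   INT NOT NULL,              -- foreign key to access_group.id
  folder     VARCHAR(255) NOT NULL,     -- e.g. '/section1/'
  permission ENUM('read', 'write', 'edit') NOT NULL
);

-- Does 'Group A' have edit rights anywhere above this page's path?
SELECT 1
FROM group_permission gp
JOIN access_group g ON g.id = gp.group_id
WHERE g.name = 'Group A'
  AND gp.permission = 'edit'
  AND '/section1/page.php' LIKE CONCAT(gp.folder, '%');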
Those are just two of the things that stuck out from reading your question; hope it helps.