Get all files in box account - box-api

I need to fetch a list of all the files in a user's box account, such that the list of files can then be displayed in a table view (iOS).
I have successfully implemented this by recursively calling /folders/{folder id}/items on all the folders in my user's Box account.
However, while this works, it's kind of dirty, since a request is made for each of the user's folders, which could be quite a large number.
Is there any way to get a list of all the available files? (It's no issue if folders are included; I can filter those out myself.)
I tried implementing this using search, but I couldn't identify a value for the query parameter that returned everything.
Any help would be appreciated.
Help me, Obi-Wan Kenobi. You're my only hope.
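The folder-by-folder walk described in the question can be sketched in Python to make its cost concrete. This version uses an explicit stack instead of recursion. `fetch_items` is a hypothetical callback standing in for the `/folders/{folder id}/items` request, the root folder ID is assumed to be "0", and Box's paging is ignored to keep it short. The point is that the number of requests equals the number of folders.

```python
from typing import Callable, Dict, List

Item = Dict[str, str]


def list_all_files(fetch_items: Callable[[str], List[Item]], root_id: str = "0") -> List[Item]:
    """Walk every folder below root_id, making one fetch_items() call per folder.

    fetch_items is a hypothetical stand-in for GET /folders/{folder_id}/items;
    paging (limit/offset) is deliberately ignored in this sketch.
    """
    files: List[Item] = []
    pending = [root_id]  # explicit stack instead of recursion
    while pending:
        folder_id = pending.pop()
        for item in fetch_items(folder_id):
            if item["type"] == "folder":
                pending.append(item["id"])  # each folder costs one more request
            else:
                files.append(item)
    return files
```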

What you are looking for (a recursive listing of an entire Box account) is not available. We have enterprise customers with bajillions of files and millions of folders. Recursively asking for everything would take too long.
What we generally recommend is that you ask for as little as you can, and that you use multiple threads and anticipate what you'll need just a little bit, so that you can deliver a high-performance user-interface to your end-users.
For example, ?fields=item_collection is expensive to retrieve and can add a lot to a payload. It can double the time it takes to get a payload back from the Box API, or even make it ten times longer. Most UIs don't need to show all the items inside every folder, so they are better off asking for ?fields=.
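A minimal Python sketch of asking only for what a table view needs: the helper builds a folder-items URL that requests a short, explicit field list instead of item_collection. The helper name, the field list, and the page size are illustrative assumptions; check the Box API reference for the fields your UI actually needs.

```python
from typing import List
from urllib.parse import urlencode

BOX_API = "https://api.box.com/2.0"


def folder_items_url(folder_id: str, fields: List[str], limit: int = 100, offset: int = 0) -> str:
    """Build a folder-items URL that asks only for the listed fields (hypothetical helper)."""
    query = urlencode(
        {"fields": ",".join(fields), "limit": limit, "offset": offset},
        safe=",",  # keep the comma-separated field list unescaped
    )
    return f"{BOX_API}/folders/{folder_id}/items?{query}"
```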
You can make your application responsive to the user if you make the smallest possible call. Of course, there is a balance. Mobile networks have high latency, and sometimes that next API call to show some extra thing is slow. But for a folder tree, you can get high performance by retrieving only the current level, displaying that, and then starting to fetch one level down while the user is looking at the first level.
Same goes for displaying thumbnails. If a user drills into a folder and starts looking at thumbnails for pictures, there's a good chance they'll want to see other thumbnails in that same folder. Your app should anticipate that, and start to pull one or two extras down in the background. Yes, it means more API calls, but your users will give your app a higher rating for being fast.
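A minimal Python sketch of the "show the current level, then fetch one level down in the background" pattern (the same idea applies to thumbnails). `fetch_items` is a hypothetical stand-in for the folder-items request, and an iOS app would use its platform's own concurrency tools rather than a Python thread pool. Each folder is requested at most once, so drilling into an already-prefetched folder costs no extra call.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List

Item = Dict[str, str]


class PrefetchingBrowser:
    """Show one folder level at a time and fetch the next level in the background."""

    def __init__(self, fetch_items: Callable[[str], List[Item]], max_workers: int = 4):
        self._fetch = fetch_items  # hypothetical stand-in for the folder-items request
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures: Dict[str, Future] = {}
        self._lock = threading.Lock()

    def _request(self, folder_id: str) -> Future:
        # Each folder is fetched at most once; later callers reuse the same Future.
        with self._lock:
            future = self._futures.get(folder_id)
            if future is None:
                future = self._pool.submit(self._fetch, folder_id)
                self._futures[folder_id] = future
            return future

    def open_folder(self, folder_id: str) -> List[Item]:
        items = self._request(folder_id).result()  # wait only for the level being shown
        for item in items:
            if item["type"] == "folder":
                self._request(item["id"])  # start fetching one level down, don't wait
        return items

    def close(self) -> None:
        self._pool.shutdown(wait=True)
```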

Related

Counting views of any element on website

I am using the following MySQL query to count views:
UPDATE content SET views=views+1 WHERE id='$id'
For example, if I want to check how many times a single page has been viewed, I just put it at the top of the page code. Unfortunately, I always get counts about 5-10x larger than the results in Google Analytics.
If I am correct, one refresh should increase the value in my database by 1. Doesn't "Views" in Google Analytics work the same way?
For example, if Google Analytics says a single page has been viewed 100 times and my database says 450 times, how could such a simple query generate 350 additional views? And I don't mean visits or unique visits, just regular views.
Is it possible that Google Analytics interprets such data a little differently and my database result is correct?
There are quite a few reasons why this could be occurring. The most common culprits are bots and spiders. As soon as you use a third-party API like Google Analytics or Facebook's API, you'll get their bots hitting your page.
You need to examine each request in more detail. The user agent is a good place to start, although I do recommend researching this area further - discriminating between human and bot traffic is quite a deep subject.
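As a first pass at the user-agent check, a crude substring heuristic already separates self-identified crawlers such as Googlebot from ordinary browsers. This is a minimal Python sketch: the pattern list is an illustrative assumption and far from complete, and bots that fake a browser user agent will pass it.

```python
import re
from typing import Optional

# Illustrative, deliberately incomplete list of substrings seen in crawler user agents.
_BOT_HINTS = re.compile(r"bot|crawl|spider|slurp|facebookexternalhit", re.IGNORECASE)


def is_probable_bot(user_agent: Optional[str]) -> bool:
    """Rough first-pass check; real bot detection needs much more than this."""
    if not user_agent:
        return True  # a missing User-Agent header is treated as suspicious
    return _BOT_HINTS.search(user_agent) is not None
```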
In Google Analytics, the data is sent by the user's browser. For example:
A user views a page on your domain, and their browser is then responsible for reporting the pageview to Google. If something fails along the way, the data will not be included in the reports.
Your SQL system, on the other hand, is log-based analytics: the data is collected by your own server, which reduces data-collection failures.
Seen this way, Google Analytics can miss data from slow connections, from clients that don't execute JavaScript (ad blockers or bots), or when the HTML page is not properly rendered***.
Still, 5x more is a huge discrepancy; in my experience it should be around 8-25% (tested at the transaction level; for pageviews it may be more).
What I recommend:
Save the device, browser information, IP address, and any other useful metadata, and don't forget the timestamp. That way you can isolate the problem: maybe it's robots or users with ad blockers, or, in the worst case, your tracking code is not properly implemented (placed in the footer, for example).
*** I added this because I once had a huge discrepancy caused by a server error: the HTML was not properly rendered, and the user was shown an empty HTTP response. MySQL was not fast enough to save the information and also process the HTML. I noticed it when a stress test (via Screaming Frog) showed a lot of 5xx errors (WordPress blog with no cache).
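A minimal sketch of the recommendation to log metadata with each view, using Python's built-in sqlite3 so it runs standalone (the question uses MySQL, and placeholder syntax differs between drivers). The table and column names are hypothetical. It also swaps the question's '$id' string interpolation for a parameterized query, which avoids SQL injection. Grouping by user agent is then one query away.

```python
import sqlite3
from datetime import datetime, timezone
from typing import List, Optional, Tuple


def open_view_log(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS page_views (
               content_id TEXT NOT NULL,
               viewed_at  TEXT NOT NULL,  -- ISO-8601 UTC timestamp
               ip         TEXT,
               user_agent TEXT
           )"""
    )
    return conn


def record_view(conn: sqlite3.Connection, content_id: str, ip: Optional[str],
                user_agent: Optional[str], when: Optional[datetime] = None) -> None:
    when = when or datetime.now(timezone.utc)
    # Placeholders keep request data out of the SQL text (no '$id' interpolation).
    conn.execute(
        "INSERT INTO page_views (content_id, viewed_at, ip, user_agent) VALUES (?, ?, ?, ?)",
        (content_id, when.isoformat(), ip, user_agent),
    )
    conn.commit()


def views_by_user_agent(conn: sqlite3.Connection, content_id: str) -> List[Tuple[str, int]]:
    """Count views of one item per user agent, most frequent first."""
    return conn.execute(
        """SELECT user_agent, COUNT(*) AS n FROM page_views
           WHERE content_id = ? GROUP BY user_agent ORDER BY n DESC""",
        (content_id,),
    ).fetchall()
```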

Syncing File Name for Drive Realtime Document

My real-time document allows the user to edit the file name within the editor (much like Google's own apps). I represent this as a collaborative string so all collaborators see the file renames as soon as possible.
I'm trying to determine the best and most efficient way to keep this collaborative string in sync with the actual file name. There are two scenarios to consider:
In-Editor Changes
A user edits the document name within the editor. In this case we need to use the Drive API to push that change out to the file on Google Drive. To avoid race conditions, it is best if only one of the collaborators pushes the change out. The easiest way to do this seems to be checking whether the rename event was local.
I also found it best to add a delay so we are not pushing the rename out to the Drive API with every character change. If a few seconds pass with no more name changes at that point it pushes the change out. This all seems to work well.
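A minimal Python sketch of the "push only after a few seconds with no more name changes" rule, with the clock passed in explicitly so the logic is deterministic. The class and method names are hypothetical; the caller is assumed to feed in only local rename events, to call poll() periodically, and to issue the actual Drive API rename (not shown) when poll() returns a name.

```python
from typing import Optional


class RenameDebouncer:
    """Hold back local title edits until the user has stopped typing for a while."""

    def __init__(self, quiet_seconds: float):
        self.quiet_seconds = quiet_seconds
        self._pending: Optional[str] = None
        self._last_change = 0.0

    def on_local_change(self, new_name: str, now: float) -> None:
        # Only renames made in this client are fed in, so only one collaborator pushes.
        self._pending = new_name
        self._last_change = now

    def poll(self, now: float) -> Optional[str]:
        """Return the name to push once the quiet period has elapsed, else None."""
        if self._pending is not None and now - self._last_change >= self.quiet_seconds:
            name, self._pending = self._pending, None
            return name
        return None
```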
External Changes
The harder case, and the one I'd like advice on, is when the file name is changed externally. For example, the user might rename the file within the Drive interface itself. We want this change to update our collaborative string to match.
My application is entirely client-side so I can't use webhook push notifications. So my only solution is to poll the file name every X seconds (currently set to 10). But this presents the following problems:
It is API intensive. If 4 collaborators keep the screen open for 8 hours, that is 11520 API calls. If my app has lots of users with lots of documents, I could see how this might push me past my API limits.
To avoid race conditions (and reduce API calls), we only want one collaborator to check for changes and update the collaborative string if the file name has changed. But how do we pick that collaborator when collaborators might join or leave at any time? Currently, whenever the set of collaborators changes, each collaborator checks whether it is the "leader", defined as the collaborator with the highest session id. This seems to work, but it all seems fairly hacky. Also, if collaborators join close together, I wonder whether a race condition could cause multiple collaborators to think they are the leader.
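The "highest session id wins" rule can be written as a pure function that every collaborator re-evaluates on each join or leave event. This Python sketch assumes session ids are strings compared lexicographically. It does not remove the brief window in which two clients with different views of the membership list might both act as leader; it only guarantees a single leader once their lists agree.

```python
from typing import Iterable


def is_leader(my_session_id: str, collaborator_session_ids: Iterable[str]) -> bool:
    """True if this client has the highest session id among all current collaborators.

    Every client applies the same rule to the same membership list, so once the
    lists agree, exactly one client is the leader.
    """
    everyone = set(collaborator_session_ids)
    everyone.add(my_session_id)  # the list may or may not already include us
    return my_session_id == max(everyone)
```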
Is there an easier way? A Realtime API function I am missing?
It would be ideal if the Realtime API just provided a method that stored the document name. Any time the Realtime API checks for mutations, it could grab the latest document name.
I think you've identified the options. There isn't any built in functionality currently to sync it via the Realtime API specifically.
Personally, I'd back off the polling interval a lot. It's probably not critical that the title is always exactly up to date, so asking every few minutes is probably sufficient and would greatly reduce your QPS (queries per second).
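The numbers in the question are easy to check, and the same arithmetic shows how much the two ideas (a single polling leader, a longer interval) save. A small Python helper; the 180-second interval below is only an example of "every few minutes".

```python
def polling_calls(pollers: int, hours: float, interval_seconds: float) -> int:
    """Number of requests made by `pollers` clients each polling for `hours`."""
    polls_per_client = int(hours * 3600 // interval_seconds)
    return pollers * polls_per_client
```

For instance, polling_calls(4, 8, 10) reproduces the question's 11520; a single leader at the same interval makes 2880 calls, and a single leader polling every 180 seconds makes 160.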
In terms of identifying a "leader", I can't think of anything better than something deterministic based on the session id. So long as each rechecks on each session join/leave event, I don't think there should be any issues.

'web_link' types in box folder items requests

Recently, objects with the type 'web_link' started showing up in the items for some of the users that we work with. This currently breaks our application because we expect a 'size' field in all of the entries that Box returns, and the 'web_link' type apparently doesn't have a size. First, I was wondering why this is happening; I think it might be part of some older API that was exposed recently. I am also not sure how to replicate it, since the Box API documentation doesn't mention them. Right now our workaround will be to just filter the response on our end, but it would be nice to let our users know how they could find and clean up these old objects if they don't need them, so is there a way to specifically search for them?
Our webapp allows users to create "weblinks" that are links to any URL they might come across on the internet.
They only show up in the folder listing API, and they are only used by a small % of users. We may remove them sometime in the near future, which is why they are not included in the documentation.
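The filtering workaround mentioned in the question can stay small: split web links out of each listing, and read `size` defensively instead of assuming every entry has it. A minimal Python sketch over already-decoded JSON entries; the function names are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Entry = Dict[str, Any]


def split_web_links(entries: List[Entry]) -> Tuple[List[Entry], List[Entry]]:
    """Separate 'web_link' entries from the rest of a folder listing."""
    regular: List[Entry] = []
    web_links: List[Entry] = []
    for entry in entries:
        (web_links if entry.get("type") == "web_link" else regular).append(entry)
    return regular, web_links


def total_size(entries: List[Entry]) -> int:
    # .get() tolerates entries without a 'size' field, such as web links.
    return sum(entry.get("size", 0) for entry in entries)
```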

How can I summarize the updates to a table on a page I browse?

I am a student at a University. With the placement process going on, we have an internal placement website that shows updates and status about various companies I have applied to. Since the number of companies is too large it becomes cumbersome to scroll through the complete list to find information. Sometimes, I just miss some things. Now, to tackle this problem, here is what I want to do:
The data is in an HTML table. Each row shows information about one company: Some dates, Status(Not/Shortlisted/Applied), Some yes/no options etc. each in a different column. Once I open the page I want to be able to extract information about which companies I got shortlisted in, and in which ones I didn't make it.
What is the right technology to do this ? I am thinking of writing a Greasemonkey user script (I have never actually written any, but how hard could it be ?). What other options do I have?
Edit: I don't quite understand why this question has been voted to be closed.
I just described a use case for something general: on opening a web page, automatically extracting information from the page and displaying it to the user. What is the easiest and sufficiently powerful way to achieve this?
Since you can't get access to the website's database, Greasemonkey would be your best automation approach. However, this task is likely to be over before you can get a decent script up from scratch.
Your best practical approach is to save the pages and/or copy and summarize the data in MS Excel, or equivalent.
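If the pages are saved, a short script can do the summarizing instead of a spreadsheet. A minimal Python sketch using only the standard library's html.parser; it assumes the first row is a header and that the company name and status sit in hypothetical columns 0 and 1, and it does not handle cells whose closing </td> tag is omitted. A Greasemonkey script would do the same walk over the live page in JavaScript.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._row is not None and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def summarize(html: str, company_col: int = 0, status_col: int = 1) -> Dict[str, List[str]]:
    """Group company names by status; column positions are assumptions about the page."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    summary: Dict[str, List[str]] = {}
    for row in parser.rows[1:]:  # skip the header row
        if len(row) > max(company_col, status_col):
            summary.setdefault(row[status_col], []).append(row[company_col])
    return summary
```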
~~~~~~~~~
Here at SO, we will not develop any but the simplest Greasemonkey scripts for you from scratch (unless they are fun somehow ;) ). But you can sometimes get such help in the "Script requests forum" at userscripts.org.
In order for someone to help you, they will need:
A clear idea of exactly what data gets manipulated, and how.
Access to the target site. Or access to saved snapshots of the target pages. GM scripts are extremely dependent on the details of the target page.
"other option":
ctrl + F
enter shortlisted
enter
ctrl + G <--repeat last search

Web displays: Paging vs. long tables

It seems that the trend in web design is to provide paged output, where long tables are displayed a page at a time. My customers don't like that, and have requested that the web sites I design for them show all entries in long tables. The arguments for paging seem to be mostly based on the performance hit of displaying long tables, and this is less of a concern in a high-bandwidth corporate intranet. Arguments against paging include the ability to print the entire table, do string searches against the entire table, select arbitrary ranges from the entire table for copying, etc. I've pointed out that these features can easily be added to paged web designs (e.g. a print button that prints the entire table, or a button that creates a CSV file of the table), but the paged output still seems inconvenient to them. Our typical table is about 100 to 600 items. Obviously tables that would be significantly larger would probably have to be paged.
Questions:
What is your experience with personal or customer preferences for paged vs. full output in long tables?
Web design tools seem to be pushing the paging paradigm. Are they out of touch, or are my customers unusual?
If you're thinking "It depends on the length of the table", what threshold would you use?
I love long one-page listings.
One of the few reasons I can see for paged listings is the one you point out: performance.
I think your customers are very usual and in-touch.
The threshold would be about page loading times: when the server can't produce the full lists fast enough, or when the lists get so long that the browser slows down. (The latter can happen with quite short lists if your CSS has :hover rules on elements other than <a> and the browser is IE.)
Give the users a powerful search function and they'll narrow down their page lists themselves.
Why not simply make it a user-configurable option? It sounds like you plan to essentially implement both anyway.
To be honest I think that no matter which you choose someone will complain. At least with it being user configurable you have the ability to put it back on the user.
Provide a default page length, and a configurable parameter (e.g. in the query string for programmatic use, and/or a form on the webpage for interactive use) to control how many listings are in a page.
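A minimal Python sketch of a default page length that a query-string parameter can override. The parameter name `per_page`, the default of 100, and the upper bound of 100000 are illustrative assumptions; invalid values fall back to the default and out-of-range values are clamped.

```python
from typing import List, TypeVar
from urllib.parse import parse_qs, urlparse

T = TypeVar("T")

DEFAULT_PAGE_SIZE = 100
MAX_PAGE_SIZE = 100_000


def page_size_from_url(url: str, default: int = DEFAULT_PAGE_SIZE, maximum: int = MAX_PAGE_SIZE) -> int:
    """Read a hypothetical ?per_page=N parameter, falling back to the default."""
    values = parse_qs(urlparse(url).query).get("per_page")
    if not values:
        return default
    try:
        size = int(values[0])
    except ValueError:
        return default  # non-numeric input: ignore it
    return max(1, min(size, maximum))  # clamp to a sane range


def page_slice(items: List[T], page: int, size: int) -> List[T]:
    """Return the items on 1-based page number `page`."""
    start = (page - 1) * size
    return items[start:start + size]
```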
User flexibility is good. Texas Instruments has a parametric search tool for electrical engineers to find ICs that meet certain technical characteristics, and they include a link both to "show all" in a webpage and "download all" as a .csv file. That's a good model, kudos to TI. Ditto to Flickr; their API lets you control (to a large extent) how many results show up on a web service call.
I personally HATE websites that default to 10 listings per page with no way to increase it. It takes FOREVER to browse them, & I'm willing to wait longer if I can get all the stuff at once.
If it's an interactive webpage, I would consider going to an AJAX solution that downloads 100 at a time so there's an indication of progress (and the user can stop it if there are 20000 results).
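The "download 100 at a time with visible progress" idea can be sketched independently of the browser: a generator that requests fixed-size batches, reports how many rows have arrived, and stops when a cancel flag is set. It is written in Python here; in the page itself this would be JavaScript making the requests. `fetch_batch(offset, limit)` is a hypothetical stand-in for the server endpoint.

```python
from typing import Callable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def load_in_batches(
    fetch_batch: Callable[[int, int], List[T]],
    total: int,
    batch_size: int = 100,
    cancelled: Callable[[], bool] = lambda: False,
) -> Iterator[Tuple[List[T], int]]:
    """Yield (batch, rows_loaded_so_far) until everything is loaded or the user cancels."""
    loaded = 0
    while loaded < total and not cancelled():
        batch = fetch_batch(loaded, batch_size)  # one request: offset, limit
        if not batch:
            break  # server returned fewer rows than expected
        loaded += len(batch)
        yield batch, loaded
```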
I agree with PEZ, it's all about responsiveness.
Best solution: Don't provide lists with more than 100 items.
Usually your user doesn't want to read more than 100 or even 600 items. They just don't care. They are searching for one (or possibly a few). Make sure that there's a way for them to get to those items without visual-grep-ing through the list.
And if your client insists on displaying all items, then provide paging with a configurable page size and let him enter "100000 items per page" if he wants to.
One of the seminal books on web design (sorry, I forget which one) used to say not to count on your users scrolling down because most of them don't know how or can't be bothered. I think a more recent update says that while this is true for the general public, certain sectors of more technical users can be expected to scroll down, and you can make pages that require scrolling IFF (if and only if) you know your users can handle it.
I understand your situation extremely well; I have been in a similar situation. I moved a business workflow from being manually managed to an automated one. Initially it was carried out using Excel spreadsheets. The stakeholders for my software were in the 55+ age group, and they didn't like anything AJAX-y or any of the UI patterns you are talking about. In such cases, the data retrieval logic can be optimized. Any table that reaches the 1K mark, or has items like image blobs, should be shown in parts from a performance point of view.
Long outputs slow down rendering and will be a performance drain.
Customers usually don't want change, and the customer is always right unless you can convince them otherwise.
I have put forth my threshold, but it also depends on the content of the rows.
Happy Coding!