Error 10008 OneNote API

I have a OneNote notebook that is shared in a OneDrive library. When trying to get the sections via the REST API, I get the 10008 error message explaining that I have more than 5000 items and the query cannot be completed. I know that this notebook has far less than 5000 sections, but the OneDrive library has more than 5000 items.
My query is as follows:
https://www.onenote.com/api/v1.0/users/{user id}/notes/notebooks/{notebook id}/sections
I would have expected this kind of error if the query were going to return 5000+ items, but in this case I'm expecting somewhere in the neighborhood of 10-20 sections.
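For reference, a minimal sketch of how this call is being issued (TypeScript against Node 18+'s global fetch; the access token and IDs are placeholders):

```typescript
// Minimal sketch of the failing call. ACCESS_TOKEN, USER_ID and NOTEBOOK_ID
// are placeholders for the real values obtained elsewhere in the app.
const ACCESS_TOKEN = "<access token>";
const USER_ID = "<user id>";
const NOTEBOOK_ID = "<notebook id>";

async function getSections(): Promise<void> {
  const url =
    `https://www.onenote.com/api/v1.0/users/${USER_ID}` +
    `/notes/notebooks/${NOTEBOOK_ID}/sections`;

  const res = await fetch(url, {
    headers: { Authorization: `Bearer ${ACCESS_TOKEN}` },
  });

  if (!res.ok) {
    // For this notebook the error body reports code 10008 ("more than 5000
    // items"), even though the notebook itself only has 10-20 sections.
    console.error(res.status, await res.text());
    return;
  }

  const body = await res.json();
  console.log(`Got ${body.value.length} sections`);
}

getSections();
```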
I have two questions I'd like answered by the OneNote product group:
Is there a way around this without moving the notebook?
Can I get an explanation as to why this is necessary?

Is there a way around this without moving the notebook?
Splitting notebooks across multiple lists should solve this problem. You should make sure that no single list contains more than 5000 notebooks or sections.
Can I get an explanation as to why this is necessary?
Although a given notebook contains only 10-20 sections, the SharePoint indexing mechanism considers all of the sections available in the list when filtering sections for that notebook, so the API will fail with this error message when your list contains more than 5000 notebooks or sections.

Related

What is the best way to Paginate Forge/BIM360 Docs files lists?

I am currently in the process of implementing pagination, sort and search functionalities in the project files/plans/sheets views of BIM 360 Docs integration.
Since I couldn't find any best practices regarding these features, I thought I would reach out so that I don't get stuck reinventing the wheel.
Background:
Most of the implementation uses https://github.com/Autodesk-Forge/forge-api-dotnet-client/ SDK.
Based on what I saw, pagination in the Autodesk API is very basic and does not play well with filtered views. Please correct me if I am wrong, but it looks like there is no way to get the number of items in the view and/or calculate the total number of pages in the result set.
If one uses filtering to limit the types of items returned by the API (e.g. documents, sheets, project files), the API applies pagination first and filters second. That causes holes in the returned result sets: for example, one might request page 1 sized at 5 items and get 3 items back, then request a similarly sized page 2 and get no items back, while page 3 yields 2 items.
The above-mentioned issues force us to use dynamic lazy-loading paging, similar to how it's currently done in the BIM 360 Docs UI.
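To make the hole problem concrete, here is a rough TypeScript sketch of the lazy-loading paging we ended up with: keep requesting API pages and appending whatever filtered items come back until the UI page is full or the API runs out of pages. The fetchApiPage callback is hypothetical; it stands in for whatever SDK/endpoint call returns one API page plus a flag indicating whether more pages exist.

```typescript
// Hypothetical shape of one API page after server-side filtering.
interface ApiPage<T> {
  items: T[];        // may contain fewer items than the requested page size
  hasMore: boolean;  // whether another API page can be requested
}

// Accumulate API pages until one UI page is full, because the API paginates
// first and filters second, leaving "holes" in individual pages.
async function loadUiPage<T>(
  fetchApiPage: (pageNumber: number, pageSize: number) => Promise<ApiPage<T>>,
  startApiPage: number,
  uiPageSize: number
): Promise<{ items: T[]; nextApiPage: number; exhausted: boolean }> {
  const items: T[] = [];
  let apiPage = startApiPage;
  let exhausted = false;

  while (items.length < uiPageSize && !exhausted) {
    const page = await fetchApiPage(apiPage, uiPageSize);
    items.push(...page.items);
    exhausted = !page.hasMore;
    apiPage += 1;
  }

  return { items, nextApiPage: apiPage, exhausted };
}
```

Note that the last API page fetched may overshoot the UI page size; a real implementation would hold the extra items back for the next UI page rather than displaying them all at once.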
Question:
Is there a different, better way to paginate? Or do we have to lazy-load results while scrolling, never knowing how many records the next page will return?
Unfortunately, as far as I know, pagination is not currently available for the Forge MD API of BIM 360. Apologies for any inconvenience caused.
However, it was logged as request id FDM-1769 a few days ago, and I saw your name on the request list, so I think it will be supported in the future. In the meantime, a workaround is to fetch all the data from the API and then paginate on the client side via JavaScript.
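A minimal TypeScript sketch of that fetch-everything-then-paginate workaround (the fetchPage callback is a placeholder for the real API/SDK call; once everything is in memory, total counts and page slices become trivial):

```typescript
// Fetch every page from the API up front, then paginate in memory on the
// client. fetchPage is a placeholder for the real API/SDK call.
async function fetchAll<T>(
  fetchPage: (pageNumber: number) => Promise<{ items: T[]; hasMore: boolean }>
): Promise<T[]> {
  const all: T[] = [];
  let pageNumber = 1;
  let hasMore = true;

  while (hasMore) {
    const page = await fetchPage(pageNumber);
    all.push(...page.items);
    hasMore = page.hasMore;
    pageNumber += 1;
  }
  return all;
}

// With the full result set in memory, page counts are exact.
function paginate<T>(all: T[], page: number, pageSize: number) {
  const totalPages = Math.ceil(all.length / pageSize);
  const items = all.slice((page - 1) * pageSize, page * pageSize);
  return { items, totalPages, totalItems: all.length };
}
```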

Trying to find connectivity in the Wikipedia Graph: Should I use the API or the relevant SQL Dump?

I apologize for the slightly obnoxious nature of my question.
A few years ago, I came across a game that could be played on Wikipedia. The goal is to start from a random page and try to get to the 'Adolf Hitler' page by following internal wikilinks within 5 clicks (six degrees of separation). I found this to be an interesting take on the Small-world experiment and tried it a few times, and I was able to reach the target article within 5-6 clicks almost every time (however, there was no way for me to know whether that was the shortest path or not).
As a project, I want to find out the degree of separation between a few (maybe hundreds, or thousands, if feasible) random Wikipedia pages and Adolf Hitler's page, in order to create a histogram of sorts. My intention is to do an exhaustive search in a DFS manner from the root page (restricting the 'depth' of the search to 10, in order to ensure that the search terminates in case it has picked a bad path or is running in cycles). So the program would visit every article reachable within 10 clicks of the root article and find the shortest way in which the target article is reachable.
I realise that the algorithm described above would certainly take too long to run. I have ideas for optimizations which I will play around with.
Perhaps I will use a single-source shortest path BFS-based approach, which seems more feasible considering that the degree of the graph would be quite high (mentioned later).
I will not mention all of my algorithm ideas here, as they are not relevant to the question: in any possible implementation, I will have to query (either directly, by downloading the relevant tables onto my machine, or through the API) the:
pagelinks table, which contains information about all internal links in all pages, in the form of the 'id' of the page containing the wikilink and the 'title' of the page being linked to
page table, which contains the relevant information for mapping a page 'title' to its 'id'. This mapping is needed because of the way data is stored in the pagelinks table.
Before I knew Wikipedia's schema, naturally, I started exploring the Wikipedia API and quickly found out that the following API query returns the list of all internal links on a given page, 500 at a time:
https://en.wikipedia.org/w/api.php?action=query&format=jsonfm&prop=links&titles=Mahatma%20Gandhi&pllimit=max
Running this on MediaWiki's API sandbox a few times gives a request time of about 30 ms for 500 link results returned. This is not ideal, as even 100 queries of this nature would end up taking 3 seconds, which means I would certainly have to reduce the scope of my search somehow.
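For completeness, a rough sketch of paging through that endpoint's continuation to collect every outgoing link of a page (TypeScript, Node 18+ global fetch; format=json rather than the HTML-formatted jsonfm, and the response-shape handling reflects my reading of the API docs):

```typescript
// Collect all internal links on one page via prop=links, following the
// API's continuation token (plcontinue) until no batches remain.
async function getLinks(title: string): Promise<string[]> {
  const endpoint = "https://en.wikipedia.org/w/api.php";
  const links: string[] = [];
  let plcontinue: string | undefined;

  do {
    const params = new URLSearchParams({
      action: "query",
      format: "json",   // jsonfm is only the pretty-printed debug variant
      prop: "links",
      titles: title,
      pllimit: "max",   // up to 500 links per request for normal users
    });
    if (plcontinue) params.set("plcontinue", plcontinue);

    const res = await fetch(`${endpoint}?${params}`);
    const body = await res.json();

    for (const page of Object.values(body.query?.pages ?? {}) as any[]) {
      for (const link of page.links ?? []) links.push(link.title);
    }
    plcontinue = body.continue?.plcontinue;
  } while (plcontinue);

  return links;
}

getLinks("Mahatma Gandhi").then((l) => console.log(l.length, "links"));
```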
On the other hand, I could download the relevant SQL dumps for the two tables here and use them with MySQL. English Wikipedia has around 5.5 million articles (which can be considered the vertices of the graph). The compressed sizes of the tables are around 5 GB for the pagelinks table and 1 GB for the page table. A single tuple of the page table is a lot bigger than one of the pagelinks table, however (max sizes of around 630 and 270 bytes respectively, by my estimate). Sorry I could not provide the number of tuples in each table, as I haven't yet downloaded and imported the database.
Downloading the database seems appealing because, since I have the entire list of pages in the page table, I could resort to a single-source shortest path BFS approach from Adolf Hitler's page by tracking all the internal backlinks. This would end up finding the degree of separation of every page in the database. I would also imagine that eliminating the bottleneck (the internet connection) would speed up the process.
Also, I would not be overusing the API.
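If I do go the dump route, the search itself looks manageable once the link data is in memory; here is a sketch of the single-source BFS over backlinks, assuming the page/pagelinks tables have already been imported and reduced (separately) to a map from page id to the ids of pages that link to it:

```typescript
// Single-source BFS over backlinks: computes the click-distance from every
// reachable page to the target article (e.g. Adolf Hitler's page).
// `backlinks` maps a page id to the ids of pages linking TO it; building it
// from the imported page/pagelinks tables is assumed to happen elsewhere.
function distancesToTarget(
  targetId: number,
  backlinks: Map<number, number[]>,
  maxDepth = 10
): Map<number, number> {
  const dist = new Map<number, number>([[targetId, 0]]);
  let frontier = [targetId];

  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: number[] = [];
    for (const pageId of frontier) {
      for (const src of backlinks.get(pageId) ?? []) {
        if (!dist.has(src)) {
          dist.set(src, depth); // shortest number of clicks from src to target
          next.push(src);
        }
      }
    }
    frontier = next;
  }
  return dist; // the histogram comes from counting dist.values() by depth
}
```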
However, I'm not sure that my desktop would be able to perform even on par with the API, considering the size of the database.
What would be a better approach to pursue?
More generally, what kind of workload requires the use of an offline copy of the database rather than just the API?
While we are on the topic, in your experience, is such a problem even feasible without the use of supercomputing, or perhaps a server to run the SQL queries?

Summing from large public dataset

I'm working on a web app that will let users explore some data from a public API. The idea is to let the user select a U.S. state and some other parameters, and I'll give them a line chart showing, for example, what percentage of home loans in that state were approved or denied over time.
I can make simple queries along these lines work with a small number of rows, but these are rather large datasets, so I'm only seeing a sliver of the whole. Asking for all the data produces an error. I think the solution is to aggregate the data, but that's where I start getting 400 Bad Request responses from the server.
For example, this is an attempt to summarize 2008 California data to give the total number of applications per approval category:
https://api.consumerfinance.gov/data/hmda/slice/hmda_lar.json?$where=as_of_year=2008,state_abbr="CA"&$select=action_taken,SUM(action_taken)&group=action_taken
All summary variations produce a 400 error. I can't find an example that is very similar to what I'm trying to do, so any help is appreciated.
Publisher's information is here:
http://cfpb.github.io/api/hmda/queries.html
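For anyone reproducing this, here is a sketch of issuing the same query with the clause values URL-encoded (TypeScript, Node 18+ global fetch). The clause text is copied verbatim from the URL above; whether that syntax is valid for the underlying query engine at all is a separate question, as the answer below notes.

```typescript
// Same query as above, but with the $where/$select/group values URL-encoded.
// The clause syntax itself is unchanged and may still be rejected.
async function fetchSummary(): Promise<void> {
  const base = "https://api.consumerfinance.gov/data/hmda/slice/hmda_lar.json";
  const params = new URLSearchParams({
    $where: 'as_of_year=2008,state_abbr="CA"',
    $select: "action_taken,SUM(action_taken)",
    group: "action_taken",
  });

  const res = await fetch(`${base}?${params}`);
  if (!res.ok) {
    console.error(res.status, await res.text()); // currently 400 Bad Request
    return;
  }
  console.log(await res.json());
}

fetchSummary();
```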
Sorry for the delay, but it's worth noting that that API is based on Qu, the CFPB's open-source query platform, and not Socrata, so you can't use the same SoQL query language on that API.
It's unclear how you can engage them for help, but there is a "Contact Us" email link on the docs page and an issues forum on GitHub.

Get all files in box account

I need to fetch a list of all the files in a user's box account, such that the list of files can then be displayed in a table view (iOS).
I have successfully implemented this by recursively calling /folders/{folder id}/items on all the folders in my user's Box account.
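Roughly, the recursive approach looks like this (a TypeScript sketch against the /folders/{folder id}/items endpoint with Node 18+'s global fetch; the real app uses the iOS SDK, the access token is a placeholder, and offset/limit paging inside very large folders is omitted):

```typescript
// The "dirty" recursive approach: list a folder, recurse into subfolders,
// collect files. One request per folder.
const ACCESS_TOKEN = "<access token>"; // placeholder

interface BoxItem {
  type: "file" | "folder" | "web_link";
  id: string;
  name: string;
}

async function listAllFiles(folderId = "0"): Promise<BoxItem[]> {
  const res = await fetch(
    `https://api.box.com/2.0/folders/${folderId}/items?fields=id,name,type&limit=1000`,
    { headers: { Authorization: `Bearer ${ACCESS_TOKEN}` } }
  );
  const body = await res.json();

  const files: BoxItem[] = [];
  for (const entry of body.entries as BoxItem[]) {
    if (entry.type === "file") {
      files.push(entry);
    } else if (entry.type === "folder") {
      files.push(...(await listAllFiles(entry.id))); // one more request per folder
    }
  }
  return files;
}
```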
However, while this works, it's kind of dirty, seeing as how a request is made for each of the user's folders, which could be quite a large number.
Is there any way to get a list of all the files available (it's no issue if folders are included; I can ignore those manually)?
I tried implementing this using search, but I couldn't identify a value for the query parameter that returned everything.
Any help would be appreciated.
Help me, Obi-Wan Kenobi. You're my only hope.
What you are looking for (a recursive call through a Box account) is not available. We have enterprise customers with bajillions of files and millions of folders. Recursively asking for everything would take too long.
What we generally recommend is that you ask for as little as you can, and that you use multiple threads and anticipate what you'll need just a little bit, so that you can deliver a high-performance user interface to your end users.
For example, ?fields=item_collection is expensive to retrieve and can add a lot to a payload. It can double, or even 10x, the time it takes to get back a payload from the Box API. Most UIs don't need to show all the items inside every folder, so they are better off asking for only the ?fields= they actually need.
You can make your application responsive to the user if you make the smallest possible call. Of course there is a balance. Mobile networks have high latency, and sometimes that next API call to show some extra thing is slow. But for a folder tree, you can get high performance by retrieving only the current level, displaying that, and then starting to fetch one-level down while the user is looking at the first level.
Same goes for displaying thumbnails. If a user drills into a folder and starts looking at thumbnails for pictures, there's a good chance they'll want to see other thumbnails in that same folder. Your app should anticipate that, and start to pull one or two extras down in the background. Yes, it means more API calls, but your users will give your app a higher rating for being fast.
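A sketch of that pattern in TypeScript (fetch only the current level with minimal fields, render it, then warm the next level down in the background; the in-memory cache and render function are stand-ins for whatever the app actually uses):

```typescript
// Level-by-level browsing: fetch one folder with minimal fields, display it,
// then prefetch its subfolders so the next tap feels instant.
const ACCESS_TOKEN = "<access token>"; // placeholder
const levelCache = new Map<string, Promise<any>>(); // naive prefetch cache

function fetchLevel(folderId: string): Promise<any> {
  if (!levelCache.has(folderId)) {
    levelCache.set(
      folderId,
      fetch(
        `https://api.box.com/2.0/folders/${folderId}/items?fields=id,name,type`,
        { headers: { Authorization: `Bearer ${ACCESS_TOKEN}` } }
      ).then((r) => r.json())
    );
  }
  return levelCache.get(folderId)!;
}

async function showFolder(folderId: string): Promise<void> {
  const level = await fetchLevel(folderId);
  render(level.entries); // show the current level immediately

  // Warm the cache one level down while the user is looking at this level.
  for (const entry of level.entries) {
    if (entry.type === "folder") void fetchLevel(entry.id);
  }
}

function render(entries: Array<{ type: string; name: string }>): void {
  for (const e of entries) console.log(`${e.type}: ${e.name}`);
}
```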

How to take into account empty groups in Report Builder

I have asset conditions that I am graphing for my Council using Microsoft Report Builder and I can’t figure out a way to show categories that have no data against them. So below for instance, I have two examples of graphs of different asset types. The top graph has all of the categories I require (Excellent, Good, Fair, Poor and Failed), but the bottom graph is missing any examples of Failed assets. Is there any way that I can specify what categories must be included, so that blank ones are covered?
Example images: