What is the reason why the OneNote APIs won't return all the pages in a notebook? - onenote

I am reading around here and I am seeing multiple messages about the /pages endpoint that is not working a expected
It seems that the OneNote APIs (MS Graph or Office365) are not returning all the pages that the user can see. In particular recent pages are not shown as available.
This message is for those of you who work for Microsoft and who keep an eye on this forum. Please if you have any explanation or workaround for this we would like to hear about it.
If this is work in progress we would also like to know when the APIs can be considered stable and reliable enough to consider them OK for production use
Update:
Permissions or scopes
scopes=[
"Notes.Read",
"Notes.Read.All",
"Notes.ReadWrite",
]
This is for a device authorization flow, the device is acting as a Microsoft Online account. The app is registered to Azure as personal app but the enterprise one does the same
The authorization process is described here
What type of app/authentication flow should I select to read my cloud OneNote content using a Python script and a personal Microsoft account?
After that I am using this endpoint to get the notebooks
https://graph.microsoft.com/v1.0/users/user-id/onenote/notebooks
from the returned json I pick the endpoint for the notebook I want to read and I access the endpoint the link stored in notebook['sectionsUrl']. This call returns a sections json
From this I pick the section I want and I access the link stored in section['pagesUrl']
Each call returns the expected info excepting the last one, when I get an arbitrary low number of pages in the section I want to explore. There is nothing wrong with the format of the info, it is just incomplete or not up to date
Not sure if this is related but when I try to access the pages in a section from MS Graph Explored I am seeing the same behavior (not all the pages are reported). This is a shared notebook and I am using the owner account for all the above so it should not be a permission problem
from msal import PublicClientApplication
import requests
endpoint= "https://graph.microsoft.com/v1.0/me/onenote"
authority = "https://login.microsoftonline.com/consumers"
app=PublicClientApplication(client_id=client_id, authority=authority)
flow = app.initiate_device_flow(scopes=scopes)
# there is an interactive part here that I automated using selenium, you
# are supposed to ouse a link to enter a code and then autorize the
# device; code not shown
result = app.acquire_token_by_device_flow(flow)
token= result['access_token']
headers={'Authorization': 'Bearer ' + token}
endpoint= "https://graph.microsoft.com/v1.0/users/c5af8759-4785-4abf-9434-xxxxxxxxx/onenote/notebooks"
notebooks = requests.get(endpoint,headers=headers).json()
for notebook in notebooks['value']:
print(notebook['displayName'])
print(notebook['sectionsUrl'])
print(notebook['sectionGroupsUrl'])
# I pick a certain notebook
section=[section for section in sections if section['displayName']=="Test"][0]
endpoint=notebook['sectionsUrl']
pages=requests.get(endpoint,headers=headers).json()
for page in pages['value']:
print(page['title'])
Update2
If I use this endpoint
https://graph.microsoft.com/v1.0/users/user-id/onenote/sections/section-id/pages
I would expect to get the complete list of pages for that section.
That is not working
After reading again and again the docs I my understanding is that the approach is to
call https://graph.microsoft.com/v1.0/users/user-id/onenote/pages$fiter or search etc etc
I this correct?
Also I vaguely remember there is a way to search for a section and have it expanded so that the search returs the children too.
Am I close to understanding this?
Thank you
MM

Related

Facebook Graph API returns a different response on a different script but same -not invalid- tokens

I am trying to make a simple python script that posts a text message to a facebook page using requests.
I actually managed to succeed this feat, however, when I add the same logic to a bigger project of mine, a certain request returns a different json.
According to this page https://developers.facebook.com/docs/pages/access-tokens I can exchange the short lived user token I generate in the graph explorer tool for a long lived one that lasts 60 days. This worked for me until now. When I run the same functions, same variables on another .py file that includes other logic as well the request does not return this line:
"expires_in": SECONDS-UNTIL-TOKEN-EXPIRES
And of course later on if I continue the logic and use the token it returns (which is the same) for, let's say, a make_post function the request prints
{'error': {'message': '(#200) If posting to a group, requires app being installed in the group, and \\\n either publish_to_groups permission with user token, or both manage_pages \\\n and publish_pages permission with page token; If posting to a page, \\\n requires both manage_pages and publish_pages as an admin with \\\n sufficient administrative permission', 'type': 'OAuthException', 'code': 200, 'fbtrace_id': 'AqYMMeOcOniWAGgEEtsEURs'}
Why does it not successfully return, the user token had not expired and it has the requires rights. Furthermore I tested this in a smaller .py file and it worked.
Another thing I found out here https://developers.facebook.com/support/bugs/523165725596520/?join_id=f1ff8392b49675c here is that other people have actually reported the same issue but it has been closed as 'intended by design' however there is no information of a solution.
Running the request in my browser also does not work correctly.
Do you have any ideas? I am completely clueless.
Thank you very much in advance
As #CBroe in a comment said, the expires_in didn't have anything to do with my error. The token it returns if valid. The issue I had later on had to do with the url I was parsing

How to restrict fields returned by stackexchange api, and turn off paging?

I'd like to have a list of just the current titles for all questions in one of the smaller (less than 10,000 questions) stackexchange site. I tried the interactive utility here: https://api.stackexchange.com/docs/questions and it both reports the result as a json at the bottom, and produces the requesting url at the top. For example:
https://api.stackexchange.com/2.2/questions?order=desc&sort=activity&tagged=apples&site=cooking
returns this JSON in my browser:
{"items":[{"tags":["apples","crumble"],"owner":{ ...
...
...],"has_more":true,"quota_max":300,"quota_remaining":252}
What is quota? It was 10,000 on one search on one site, but suddenly it's only 300 here.
I won't be doing this very often, what I'd like is the quickest way to edit that (or similar of course) url so I can get a list of all of the titles on a small site. I don't understand how to use paging, and I don't need any of the other fields. I don't care if I get them, but I'm thinking if I exclude them I can have more at once.
If I need to script it, python (2.7) is my preferred (only) language.
quota_max is the number of requests your application is allowed per day. 300 is the default for an unregistered application. This used to be mentioned directly on the page describing throttles, but seems to have been removed. Here is historical information describing the default.
To increase this to 10,000, you need to register an application and then authenticate by passing an access token in your script.
To get all titles on a site, you can use a Python library to help:
StackAPI. The answer below will use this library. DISCLAIMER: I wrote this library
Py-StackExchange
SEAPI
StackPy
Assuming you have registered your application and authenticated we can proceed.
First, install StackAPI (documentation):
pip install stackapi
This code will then grab the 10,000 most recent questions (max_pages * page_size) for the site hardwarerecs. Each page costs you one API hit, so the more items per page, the few API calls.
from stackapi import StackAPI
SITE = StackAPI('hardwarerecs')
SITE.page_size = 100
SITE.max_pages = 100
# Filter to only get question title and link
filter = '!BHMIbze0EQ*ved8LyoO6rNjkuLgHPR'
questions = SITE.fetch('questions', filter=filter)
In the questions variable is a dictionary that looks very similar to the API output, except that the library did all the paging for you. Your data is in questions['data'] and, in this case, contains a list of dictionaries that look like this:
[
...
{u'link': u'http://hardwarerecs.stackexchange.com/questions/29/sound-board-to-replace-a-gl2200-in-a-house-of-worship-foh-setting',
u'title': u'Sound board to replace a GL2200 in a house-of-worship FOH setting?'},
{ u'link': u'http://hardwarerecs.stackexchange.com/questions/31/passive-gps-tracker-logger',
u'title': u'Passive GPS tracker/logger'}
...
]
This result set is limited to only the title and the link because of the filter we applied. You can find the appropriate filter by adjusting what fields you want in the web UI and copying the filter field.
The hardwarerecs parameter that is passed when creating the SITE parameter is the first part of the site's domain URL. Alternatively, you can find it by looking at the api_site_parameter for your site when looking at the /sites end point.

Get JSON from Google Apps Script URL via Erlang

Good Evening!
I've been looking into the possibility of using GAS(Google Apps Script) to host a small bit of javascript that lets me use the new Google finance apps api. The intention being that I'll be using the stock information for a project which involves the use of stock data. I know that there are a few ways to get stock information from Google, but the data that the finanace app returns is more in-line with other sources we are using. (One constraint on this project is that we have multiple sources).
I've written the javascript and I can call a httpc:request to the URL for the script given to me from Google. In the browser the JS returns the json object as I want it, however when the call is made from Erlang I'm getting it in a list of ascii. From checking the values it appears to be a document starting like:
Below is the javascript and the url to see the json:
https://script.google.com/macros/s/AKfycbzEvuuQl4jkrbPCz7hf9Zv4nvIOzqAkBxL1ixslLBxmSEhksQM/exec
function doGet() {
var stock = FinanceApp.getStockInfo('LON:TSCO');
return ContentService.createTextOutput(JSON.stringify(stock))
.setMimeType(ContentService.MimeType.JSON);
}
For the erlang, it's a simple request but I've not been doing erlang long, so perhaps I've messed something up here (The URL being the one mentioned above). I've got crypto / ssl / inets when I'm testing this on the command line.
{ok, {Version, Headers, Body}} = httpc:request(get, URL, []}, [], []).
I think it's also worth mentioning that when i curl it from Cygwin, I get a massive load of HTML also, I've included it below, but if you see it you'll thank me for not posting it in here! http://pastebin.com/UtJHXjRm
I've been updating the script as I go with the new versions but I'm at a bit of a loss as to why it's not returning correctly.
If anyone can give me any pointers I'd be very grateful! I get the feeling that it's not intended to be used this way, perhaps only within other Google products and such.
Cheers!
It would be necessary to review how are you deploying the Web App, specifically the Who has access to the app, to access without authentication should be configured as shown in the image:
See Deploying Your Script as a Web App from the documentation.
In my test, by running:
curl -L https://script.google.com/macros/s/************/exec
Get the following result:
{
"priceopen":358,
"change":2.199981689453125,
"high52":388.04998779296875,
"tradetime":"2013-10-11T15:35:18.000Z",
"currency":"GBX",
"timezone":"Europe/London",
"low52":307,
"quote":357.8999938964844,
"name":"Tesco PLC",
"exchange":"LON",
"marketcap":28929273763,
"symbol":"TSCO",
"volumedelay":0,
"shares":8083060703,
"pe":23.4719295501709,
"eps":0.15248000621795654,
"price":357.8999938964844,
"has_stock_data":true,
"volumeavg":14196534,
"volume":8885809,
"changepct":0.6184935569763184,
"high":359.5,
"datadelay":0,
"low":355.8999938964844,
"closeyest":355.70001220703125
}
Possibly your GET is not following the REDIRECT that happens when you use contentService. Look at the html returned there is a redirect in there.

Google Drive/OAuth - Can't figure out how to get re-usable GoogleCredentials

I've successfully installed and run the Google Drive Quick Start application called DriveCommandLine. I've also adapted it a little to GET file info for one of the files in my Drive account.
What I would like to do now is save the credentials somehow and re-use them without the user having to visit a web page each time to get an authorization code. I have checked out this page with instructions to Retrieve and Use OAuth 2.0 credentials. In order to use the example class (MyClass), I have modified the line in DriveCommandLine where the Credential object is instantiated:
Credential credential = MyClass.getCredentials(code, "");
This results in the following exception being thrown:
java.lang.NullPointerException
at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:187)
at com.google.api.client.json.jackson.JacksonFactory.createJsonParser(JacksonFactory.java:84)
at com.google.api.client.json.JsonFactory.fromInputStream(JsonFactory.java:247)
at com.google.api.client.googleapis.auth.oauth2.GoogleClientSecrets.load(GoogleClientSecrets.java:168)
at googledrive.MyClass.getFlow(MyClass.java:145)
at googledrive.MyClass.exchangeCode(MyClass.java:166)
at googledrive.MyClass.getCredentials(MyClass.java:239)
at googledrive.DriveCommandLine.<init>(DriveCommandLine.java:56)
at googledrive.DriveCommandLine.main(DriveCommandLine.java:115)
I've been looking at these APIs (Google Drive and OAuth) for 2 days now and have made very little progress. I'd really appreciate some help with the above error and the problem of getting persistent credentials in general.
This whole structure seems unnecessarily complicated to me. Anybody care to explain why I can't just create a simple Credential object by passing in my Google username and password?
Thanks,
Brian O Carroll, Dublin, Ireland
* Update *
Ok, I've just gotten around the above error and now I have a new one.
The way I got around the first problem was by modifying MyClass.getFlow(). Instead of creating a GoogleClientServices object from a json file, I have used a different version of GoogleAuthorizationCodeFlow.Builder that allows you to enter the client ID and client secret directly as Strings:
flow = new GoogleAuthorizationCodeFlow.Builder(httpTransport, jsonFactory, "<MY CLIENT ID>", "<MY CLIENT SECRET>", SCOPES).setAccessType("offline").setApprovalPrompt("force").build();
The problem I have now is that I get the following error when I try to use flow (GoogleAuthorizationCodeFlow object) to exchange the authorization code for the Credentials object:
An error occurred: com.google.api.client.auth.oauth2.TokenResponseException: 400 Bad Request
{
"error" : "invalid_scope"
}
googledrive.MyClass$CodeExchangeException
at googledrive.MyClass.exchangeCode(MyClass.java:185)
at googledrive.MyClass.getCredentials(MyClass.java:262)
at googledrive.DriveCommandLine.<init>(DriveCommandLine.java:56)
at googledrive.DriveCommandLine.main(DriveCommandLine.java:115)
Is there some other scope I should be using for this? I am currently using the array of scopes provided with MyClass:
private static final List<String> SCOPES = Arrays.asList(
"https://www.googleapis.com/auth/drive.file",
"https://www.googleapis.com/auth/userinfo.email",
"https://www.googleapis.com/auth/userinfo.profile");
Thanks!
I feel your pain. I'm two months in and still getting surprised.
Some of my learnings...
When you request user permissions, specify "offline=true". This will ("sometimes" sic) return a refreshtoken, which is as good as a password with restricted permissions. You can store this and reuse it at any time (until the user revokes it) to fetch an access token.
My feeling is that the Google SDKs are more of a hinderence than a help. One by one, I've stopped using them and now call the REST API directly.
On your last point, you can (just) use the Google clientlogin protocol to access the previous generation of APIs. However this is totally deprecated and will shortly be turned off. OAuth is designed to give fine grained control of authorisation which is intrinsically complex. So although I agree it's complicated, I don't think it's unnecessarily so. We live in a complicated world :-)
Your and mine experiences show that the development community is still in need of a consolidated document and recipes to get this stuff into our rear-view mirrors so we can focus on the task at hand.
Oath2Scopes is imported as follows:
import com.google.api.services.oauth2.Oauth2Scopes;
You need to have the jar file 'google-api-services-oauth2-v2-rev15-1.8.0-beta.jar' in your class path to access that package. It can be downloaded here.
No, I don't know how to get Credentials without having to visit the authorization URL at least once and copy the code. I've modified MyClass to store and retrieve credentials from a database (in my case, it's a simple table that contains userid, accesstoken and refreshtoken). This way I only have to get the authorization code once and once I get the access/refresh tokens, I can reuse them to make a GoogleCredential object. Here's how Imake the GoogleCredential object:
GoogleCredential credential = new GoogleCredential.Builder().setJsonFactory(jsonFactory)
.setTransport(httpTransport).setClientSecrets(clientid, clientsecret).build();
credential.setAccessToken(accessToken);
credential.setRefreshToken(refreshToken);
Just enter your clientid, clientsecret, accessToken and refreshToken above.
I don't really have a whole lot of time to separate and tidy up my entire code to post it up here but if you're still having problems, let me know and I'll see what I can do. Although, you are effectively asking a blind man for directions. My understanding of this whole system is very sketchy!
Cheers,
Brian
Ok, I've finally solved the second problem above and I'm finally getting a working GoogleCredential object with an access token and a refresh token.
I kept trying to solve the scopes problem by modifying the list of scopes in MyClass (the one that manages credentials). In the end I needed to adjust the scopes in my modified version of DriveCommandLine (the one that's originally used to get an authorization code). I added 2 scopes from Oauth2Scopes:
GoogleAuthorizationCodeFlow flow = new GoogleAuthorizationCodeFlow.Builder(
httpTransport, jsonFactory, CLIENT_ID, CLIENT_SECRET,
Arrays.asList(DriveScopes.DRIVE, Oauth2Scopes.USERINFO_EMAIL, Oauth2Scopes.USERINFO_PROFILE))
.setAccessType("offline").setApprovalPrompt("force").build();
Adding the scopes for user information allowed me to get the userid later in MyClass. I can now use the userid to store the credentials in a database for re-use (without having to get the user to go to a URL each time). I also set the access type to "offline" as suggested by pinoyyid.

Crawling and Scraping iTunes App Store

I noticed that iTunes preview allows you to crawl and scrape pages via the http:// protocol. However, many of the links are trying to be opened in iTunes rather than the browser. For example, when you go to the iBooks page, it immediately tries opening a url with an itms:// protocol.
Are there any other methods of crawling the App Store or is this the only way?
Can the itms:// protocol links themselves be crawled somehow?
I would have a decent look at the iTunes Search API and the iTunes Enterprise Partner API
Search API -
http://www.apple.com/itunes/affiliates/resources/blog/introduction---search-api.html
Enterprise Partner API -
http://www.apple.com/itunes/affiliates/resources/documentation/itunes-enterprise-partner-feed.html
You might get most/all of the information you need in a nice JSON file format.
If you can't get the information you need with the API, I would be interested what it is :)
As phillipp mentioned, the iTunes search API is an easy way to retrieve data about your App Store listings in JSON format.
Simply query for this with your app id (you can find the app id by viewing the web listing for your app at itunes.apple.com), ex:
http://itunes.apple.com/lookup?id=INSERT_YOUR_APP_ID_HERE
then, parse the resulting JSON to your heart's content.
The only difference between http:// links and itms:// links is that you need to set your User-Agent to an iTunes user-agent, and depending on the version you may also have to include a verification code based on some not-so-secret algorithm.
For example this is the code for iTunes 9:
# Some magic. Generates a seed we use for X-Apple-Validation. Adapted from LWP::UserAgent::iTMS_Client.
function comp_seed($url, $user_agent) {
$random = sprintf( "%04X%04X", rand(0,0x10000), rand(0,0x10000) );
$static = base64_decode("ROkjAaKid4EUF5kGtTNn3Q==");
$url_end = ( preg_match("|.*/.*/.*(/.+)$|",$url,$matches)) ? $matches[1] : '?';
$digest = md5(join("",array($url_end, $user_agent, $static, $random)) );
return $random . '-' . strtoupper($digest);
}
However if you are only scraping, iTunes preview should work for your purposes, the link you gave us to the iBooks page had more than enough information to scrape.
We tried scraping ourselves too about a year ago and it just became too much of a headache. Philipp's comment is a good one as the enterprise feed from apple (need to apply for it with a legitimate use) does have a good amount of useful info that you might be after in scraping.
There are a few companies that offer data as a service too - abto and AppMonsta are two I heard of when I was looking. I can't seem to find abto anymore but http://appmonsta.com seems to be. The search API looks ok (never experimented) but limited.
Good luck!