MediaWiki not returning continue parameter - mediawiki

I'm using this API request:
https://en.wikipedia.org/w/api.php?action=query&list=geosearch&gsradius=10000&gscoord=51.540951897949|-0.051086739997922&format=json&gslimit=50&continue=
which delivers 50 results. I want to use the 'continue' parameter to get the next page of results. According to the documentation I should get a continue field back in the results. I don't get any such result so can't get the next page.
Does anyone have any suggestions?

Dave, as #svick says, it seems list=geosearch (which is part of extension:GeoData) does not support continuation; indeed, it actually returns a "batchcomplete" element to indicate no more results (see in human-readable form).
I think you should either just get the maximum number of results (500 for users, 5000 for bots on Wikipedia), or if that's not satisfactory for your use case (which is?), pipe in at task T78703.
(Or, if you believe it to be a separate issue, report a new bug.

Related

Request.post arguments in python

Is there a way to determine the max value of an argument in a requests.post command to a website if we don't know the amount of data in the website's dataset? I'm trying to execute the following code to get specific information on all daycares from this website, but don't know the value of the last argument (length). Currently, I'm assuming this value is 20, but it is subject to change from time to time. How do I keep it open ended so I don't have to guess the max value for lenth? Code as follows:
data_requested = requests.post("https://data.nj.gov/views/INLINE/rows.json?"
"accessType=WEBSITE&method=getByIds&asHashes=true&start=0&length=20",
json=data)
njcc_data = data_requested.json()
Notice that this has nothing to do with requests.post - the range of values length can take is determined by the creator of that API and is an unknown quantity both to you and to requests.
You can try to reason about what possible values it could take, is it the length of a person? If yes, it's probably not going to be more than 250cm.
You can also use trial and error and see how high you can make it before the API endpoint gives back an error, but I guess this is what you were trying to avoid.
If length is the number of items returned (the length of the returned json array) then you could just try setting it to a high number like 1000 and see if you can get away with it.

mediawiki-api - iterating through continue to get all results

I'm trying to create a list of all the subcategories in a category, and for all those subcategories, the basic categoryinfo for them. (Number of files, subcategories, etc.)
I'm very close - just getting hung up on handling the continue process.
This gets me the first 100 results:
http://en.wikipedia.org/w/api.php?action=query&format=xml&generator=categorymembers&gcmtitle=Category:Google%20Art%20Project%20works%20by%20artist&gcmlimit=100&gcmprop=ids|title&prop=categoryinfo&continue=
But, there are thousands of subcategories.
The result includes an xml node continue with gcmcontinue and continue attributes.
If I use that in my second request, this gives me the next 100 results:
http://en.wikipedia.org/w/api.php?action=query&format=xml&generator=categorymembers&gcmtitle=Category:Google%20Art%20Project%20works%20by%20artist&gcmlimit=100&gcmprop=ids|title&prop=categoryinfo&continue=gcmcontinue||&gcmcontinue=subcat|4c41555245c380204241525241550a474f4f474c45204152542050524f4a45435420574f524b53204259204c41555245c38020424152524155|38370707
BUT, that's where I'm having the problem. These (second) set of results no longer have a continue xml node, so I'm not sure how to access the third page and so on.
(As a side note, I'm aware that if I wanted to - that I'd have to handle sub-sub-categories - but I don't need those, just the first level is fine.)
James' own answer: So, it helps to make sure you hit "commons.wikimedia.org" instead of "en.wikipedia.org" if you want the results from commons! That was the issue.

Using $skip with the SharePoint 2013 REST API

Forgive me, I'm very new to using REST.
Currently I'm using SP2013 Odata (_api/web/lists/getbytitle('<list_name>')/items?) to get the contents of a list. The list has 199 items in it so I need to call it twice and each time ask for a different set of items. I figured I could do this by calling:
_api/web/lists/getbytitle('<list_name>')/items?$skip=100&$top=100
each time changing however many I need to skip. The problem is this only ever returns the first 100 items. Is there something I'm doing wrong or is $skip broken in the OData service?
Is there a better way to iterate through REST calls, assuming this way doesn't work or isn't practical?
I'm using the JSon protocol with the Accept Header equaling application/json;odata=verbose
I suppose the $top=100 isn't really necessary
Edit: I've looked it up more and, I'm not entirely sure of the terms here, but using $skip works fine if you're using the method introduced with SharePoint 2010, i.e., _vti_bin/ListData.svc/<list_name>?$skip=100
Actually, funny enough, the old way doesn't set a 100 item limit on returns. So skip isn't even necessary. But, if you'd like to only return a certain segment of data, you'd have to do something like:
_vti_bin/ListData.svc/<list_name>?$skip=x&$top=(x+y)
where each time through the loop you would have something like x+=y
You can either use the old method which I described above, or check out my answer below for an explanation of how to do this using SP2013 OData
Alright, I've figured it out. $skip isn't a command which is meant to be used at the items? level. It works only at the lists? level. But, there's a way to do this, actually much easier than what I wanted to do.
If you just want all the data
In the returned data, assuming the list you are calling holds more than 100 items, there will be a __next at d/__next (assuming you are using json). This __next (it is a double underscorce, keep that in mind. I had a few problems at first because I was trying to get d/_next which never returned anything) is the right URL to get the next set of items. __next will only ever be a value if there is another set of items available to get.
I ended up creating a RequestURL variable which was initially set to to original request, but was changed to d/__next at the end of the loop. Then the loop went and checked if the RequestURL was not empty before going inside the loop.
Forgive my lack of code, I'm using SharePoint Designer 2013 to make this, and the syntax isn't horribly descriptive.
If you'd only like a small set of data
There's probably a few situations where you would only want x amount of rows from your list each time you go through the loop and that's real easy to do as well.
if you just add a $top=x parameter to your request, the __next URL that comes back with the response will give you the next x rows from your list. Eventually when there are no rows left to return __next won't be returned with the response.
Don't forget that in order to use __next you need to have a
$skiptoken=Paged=TRUE
in the url as well.

Drive API files.list query with 'not' parameter returns empty pages

I'm using the Drive API to list files from a collection which do not contain a certain string in their title.
My query looks something like this:
files().list(q="'xxxxx' in parents and not title contains 'toto'")
In my drive collection, I have 100 files, all contain the string "toto" in their title except for let's say 10 files.
I'm using pagination to retrieve the results 20 by 20, so I'm expecting to get only one page with the 10 files corresponding to my request. Surprisingly, the API returns 5 pages, with the first 4 having no results but with a nextToken page, and the files which are compliant with my request only come with the fifth page.
I'm still trying some use-cases here but it seems that it has something to do with the "not" operator. Like if the request was made without it, therefore returning 5 pages, but the results not corresponding to the request being removed from the response. It's very disturbing for me as I'm looking for the best performance here, and obviously having to make 5 requests to Drive instead of one single is not good for me. I'm also noticing that the results don't always come in the last page. I made the test with another collection, the results show up in the second page, but I still get 3 empty pages after that.
Am I missing something here ? Is this kind of behaviour "normal" ? I mean imagine if I had 1000 documents in my collection, having to make 50 requests to find only a few is not what I expect.
I have similar problem in files.list API. I tried to receive all three folders under root folder. I received result only on 342nd page. After several hours of researching I found some regularity in this strange behavior.
As I understood, the Drive API works in this way:
Detects something like index that best match your query
Selects first 20 records using index from step 1
Applies your filter: removes records that do not match your query
Rest is returned to you (maybe empty) with next page token.
The nextPageToken is looks like just OFFSET for the first record on next page in decided index, maybe it contains some information about query or index.
After base64 decode this token I found appropriate record number for next result in 121st position in decoded token.
Previously I built index of tokens using maxResults=1.
This is crazy, but I have no other explanation for observable behavior.
It is very useful for server because server do a very small work for search. From other side this algorithm must produce a lot of requests for pagenate whole list. But limitation for requests per second solve this problem.
Only You can do is pagenage and skip empty results. Do not forget about limitation of number of requests.
Do not try to find errors on your side. This is how Google Drive API works.
contains operator is working as a prefix matcher at the moment.title contains 'toto' will match "totolong" and "toto", but not "blahtoto".

Catch Mathematica warnings/errors without displaying them

I have a problem involving NDSolve in Mathematica, which I run multiple times with different values of the parameters. For some of these values, the solution results in singularities and NDSolve warns with NDSolve::ndsz or other related warnings.
I would simply like to catch these warnings, suppress their display, and just keep track of the fact that a problem occurred for these particular values of the parameters. I thought of the following options (neither of which really do the trick):
I know I can determine whether a command has resulted in a warning or error by using Check. However, that will still display the warning. If I turn it off with Off the Check fails to report the warning too.
It is possible to stop NDSolve using the EventLocator method, so I could check for very large values of the function or its derivatives and stop evaluation in that case. However, in practice, this still produces warnings from time to time, presumably because the step size can sometimes be so large that the NDSolve warning triggers before my Event has taken place.
Any other suggestions?
If you wrap the Check with Quiet then I believe that everything should work as you want. For example, you can suppress the specific message Power::indet
In[1]:= Quiet[Check[0^0,err,Power::indet],Power::indet]
Out[1]= err
but other messages are still displayed
In[2]:= Quiet[Check[Sin[x,y],err,Power::indet],Power::indet]
During evaluation of In[2]:= Sin::argx: Sin called with 2 arguments; 1 argument is expected. >>
Out[2]= Sin[x,y]
Using Quiet and Check together works:
Quiet[Check[Table[1/Sin[x], {x, 0, \[Pi], \[Pi]}], $Failed]]
Perhaps you wish to redirect messages? This is copied almost verbatim from that page.
stream = OpenWrite["msgtemp.txt"];
$Messages = {stream};
1/0
FilePrint["msgtemp.txt"]