Squarespace Change pageSize limit on getting blog posts - json

I'm working on a Squarespace website. I have a blog with several posts, currently 36. Using an AJAX call, I parse all the posts with the following URL. The problem is that SQS returns only 20 items, and the remaining items have to be fetched again using the offset returned:
"pagination": {
"nextPage": true,
"nextPageOffset": 1518167880210,
"nextPageUrl": "/timeline-list-v7/?offset=1518167880210",
"pageSize": 20
},
So if I had 100 or 500 posts, I would have to make one AJAX call per 20 posts (5 or 25 calls)? The SQS forums don't offer a solution for this. Is there any parameter I can add to the URL that would return more than 20 items?
Thanks.

I know of no parameter that can return more results than what the collection's pagesize property is set to.
However, there are ways to get more than 20 results, both of which require developer mode to be enabled.
The first option is to set the collection's pageSize property in the .conf file to a number higher than 20. That should cause your requests to return up to that number of items.
"pageSize" : 999,
"forcePageSize" : true
Keep in mind that increasing the pageSize in this way may increase page load times within that collection.
The second option is to use a query tag (<squarespace:query>) and embed a <script> within its scope. Within the query, you can set the limit as high as 100. The script then has access to the collection data and could, for example, store it on the global window object for use by another script outside that context. But this will only get you up to 100 results, not 500.
If neither of those works (both require developer mode), then I think you are left with a recursive AJAX approach as your only option: one that continues to pull item data 20 at a time, following the returned offset, until all items are gathered.
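As a rough illustration, here is a minimal Python sketch of that loop (Python rather than browser-side JS for brevity; the collection URL is hypothetical, and it assumes the endpoint accepts ?format=json and returns the pagination fields shown above):
import requests

# Hypothetical collection URL for illustration.
BASE = "https://example.squarespace.com/timeline-list-v7/"

items = []
offset = None
while True:
    params = {"format": "json"}
    if offset:
        params["offset"] = offset
    data = requests.get(BASE, params=params).json()
    # "items" is an assumption about the JSON key holding the posts.
    items.extend(data.get("items", []))
    pagination = data.get("pagination", {})
    if not pagination.get("nextPage"):
        break
    offset = pagination["nextPageOffset"]

print(len(items))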
Hope those ideas help.

Related

Pagination yields no results in Google Fit

I am using the REST API of Google Fit. I want to list sessions with the fitness.users.sessions.list method. This gives me a few dozen results.
Now I would like to get more results, so I set the pageToken to the value I got from the previous response. But the new results do not contain any data points, just yet another pageToken:
{
    "session": [],
    "deletedSession": [],
    "nextPageToken": "1541027616563"
}
The same happens when I use the pagination function of the Google Python API Client: I iterate on results but never get any new data.
request = self.service.users().sessions().list(userId='me')
while request is not None:
    response = request.execute()
    for ds in response['session']:
        yield ds
    request = self.service.users().sessions().list_next(request, response)
I am sure there is much(!) more session data in Google Fit for my account. Am I missing something regarding pagination?
Thanks
I think that the description of the pageToken parameter is actually rather confusing in the documentation (this answer was written prior to the documentation being updated).
The continuation token, which is used to page through large result sets. To get the next page of results, set this parameter to the value of nextPageToken from the previous response.
This is conflating two concepts: continuation, and paging. There isn't actually any paging in the implementation of Users.sessions.
Sessions are indexed by their modification timestamp. There are two (or three, depending on how you count) ways to interact with the API:
Pass a start and/or end time. Omitted start and end times are taken to be the start and end of time respectively. In this case, you will get back all sessions falling between those times.
Pass neither start nor end times. In this case, you will receive all sessions between some time in the past and now. That time is:
pageToken, if provided
Otherwise, it's 7 days ago (this doesn't actually appear in the documentation, but it is the behavior)
In any of these cases, you receive a nextPageToken back that points just after the most recent session in the results. As such, nextPageToken is really a continuation token: it says that you have been told about all sessions modified up to now, and passing it back will tell you about anything modified between then and the current time.
As such, if you issue a request that fetches all sessions for the last 7 days (no start/end time, no page token) and get a nextPageToken, you will only get something back in a request using that nextPageToken if any sessions have been modified in between the first and second requests.
So, if you're making these requests in quick succession, it is expected that you won't see anything in the second response.
In terms of the validity of the startTime you were passing in your comment, that's a bug. RFC3339 defines that fractional seconds should be optional.
I'll see about getting that fixed; but in the interim, just make sure you pass a fractional number of seconds (even if it is just .0, e.g. 2018-10-18T00:00:00.0+00:00).
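Putting that together, a minimal sketch (assuming the authorized service object from the question): fetch an explicit window with RFC3339 timestamps, fractional seconds included, and only use the token later to pick up updates:
# 'service' is the authorized Fit client from the question.
response = service.users().sessions().list(
    userId='me',
    startTime='2018-01-01T00:00:00.000Z',  # explicit window, fractional seconds included
    endTime='2018-10-18T00:00:00.000Z',
).execute()
for session in response.get('session', []):
    print(session.get('name'), session.get('startTimeMillis'))

# Later: pass the token back to be told only about sessions
# modified since the first request.
token = response.get('nextPageToken')
updates = service.users().sessions().list(userId='me', pageToken=token).execute()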
It may be because the format of the URL you're using is different from the example in the documentation.
You are using:
startTime=2018-10-18T00:00:00+00:00
whereas the one in the documentation has it as:
startTime=2014-04-01T00:00:00.00Z
The documentation also states that both the startTime and endTime query parameters are required.

aws s3 in boto3 - how to move to the next page with the s3 paginator

I'm trying to get S3 objects with a paginator or with buckets.objects.all(), but I can't figure out how to pass a next page or next token to move through the results.
I would like to show S3 images in HTML with pagination.
This is the S3 paginator code; there is nothing here for passing a next page:
http://boto3.readthedocs.io/en/latest/guide/paginators.html
paginator = client.get_paginator('list_objects')
page_iterator = paginator.paginate(Bucket='my-bucket',
                                   PaginationConfig={'MaxItems': 10})
(Update: I removed the previous contents, which only applied to a plain list_objects call, not the paginator.)
To complement @HelloV's answer: if you need precise pagination control, you can try boto3.client('s3').list_objects_v2 instead of list_objects.
Currently, paginator('list_objects') returns a "Marker" element, which allows you to use a JMESPath expression to jump to a specific item:
filtered_iterator = page_iterator.search("Contents[?Marker == `<a_marker_key>`]")
for key_data in filtered_iterator:
    print(key_data)
However, you still need to loop through the whole paginator iterator to store the marker keys and do the page manipulation. With list_objects_v2, you get a "ContinuationToken" (like Marker) and a "NextContinuationToken", which let you traverse the pages more easily compared to list_objects.
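A minimal sketch of that manual page control with list_objects_v2 (the bucket name is hypothetical; the token is whatever you persisted from the previous page, e.g. in the page URL):
import boto3

s3 = boto3.client('s3')

kwargs = {'Bucket': 'my-bucket', 'MaxKeys': 20}  # hypothetical bucket name
token = None  # e.g. a NextContinuationToken saved from the previous page

while True:
    if token:
        kwargs['ContinuationToken'] = token
    response = s3.list_objects_v2(**kwargs)
    for obj in response.get('Contents', []):
        print(obj['Key'])
    if not response.get('IsTruncated'):
        break
    # Persist this token (e.g. in the page URL) to resume at the next page.
    token = response['NextContinuationToken']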
You are confusing MaxItems with PageSize.
MaxItems
Limits the maximum number of total items returned while paginating.
PageSize
Controls the number of items returned per page of each result.
No need to track the next page or token; the iterator does it for you. The following code returns 2 objects per page, subject to a maximum of 10 objects, which means there can be at most 5 iterations. Both MaxItems and PageSize are optional. Is it clear now?
paginator = client.get_paginator('list_objects')
page_iterator = paginator.paginate(Bucket='my-bucket',
                                   PaginationConfig={'PageSize': 2, 'MaxItems': 10})
for page in page_iterator:
    print(page['Contents'])

How to change AJAX code on a webpage so that the change remains after refreshing?

I wanted to crawl metal-archives.com and put the info about metal bands in a database. After looking at the code for a good 20 minutes, I figured out they keep the data in a JSON file that can be accessed with this URL. The only problem is that the AJAX code is set to show only 200 entries per page:
$(document).ready(function() {
    createGrid(
        "#searchResults", 200,
At the top of the file I can see there are more than 11,000 bands, but only 200 are shown. Also, when I click through the different pages, AJAX fetches the data dynamically without changing the URL in the address bar, so I couldn't see the rest of the bands.
Then I tried changing the code above to "#searchResults", 1000, hoping the change would survive a refresh, but, alas, no luck. Any idea how I could do that, essentially making it possible to parse the entire JSON into a Python dictionary and create a DB?
Since the URL always returns at most 200 records, you can call it in a loop until you have all the records.
Step 1:
Using the URL below, pass iDisplayStart=0 to get the first 200 records:
http://www.metal-archives.com/search/ajax-band-search/?iDisplayStart=0&iDisplayLength=200
Step 2:
Parse the JSON, read the value of iTotalRecords, and keep calling the URL in a loop until you have all the records.
Increment iDisplayStart by 200 each time (iDisplayStart += 200) to fetch the next 200 records, as below; a short sketch of the full loop follows the example URLs:
http://www.metal-archives.com/search/ajax-band-search/?iDisplayStart=200&iDisplayLength=200
and then,
http://www.metal-archives.com/search/ajax-band-search/?iDisplayStart=400&iDisplayLength=200
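A minimal Python sketch of that loop; note that "aaData" as the payload key is an assumption based on the DataTables-style fields this endpoint uses:
import requests

BASE = "http://www.metal-archives.com/search/ajax-band-search/"

records = []
offset = 0
while True:
    data = requests.get(
        BASE, params={"iDisplayStart": offset, "iDisplayLength": 200}
    ).json()
    # "aaData" is the usual DataTables payload key; an assumption here.
    records.extend(data.get("aaData", []))
    offset += 200
    if offset >= int(data["iTotalRecords"]):
        break

print(len(records))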
Hope it helps you.

MediaWiki not returning continue parameter

I'm using this API request:
https://en.wikipedia.org/w/api.php?action=query&list=geosearch&gsradius=10000&gscoord=51.540951897949|-0.051086739997922&format=json&gslimit=50&continue=
which delivers 50 results. I want to use the 'continue' parameter to get the next page of results. According to the documentation, I should get a continue field back in the results. I don't get any such field, so I can't get the next page.
Does anyone have any suggestions?
Dave, as @svick says, it seems list=geosearch (which is part of Extension:GeoData) does not support continuation; indeed, it actually returns a "batchcomplete" element to indicate there are no more results (see it in human-readable form).
I think you should either just get the maximum number of results (500 for users, 5000 for bots on Wikipedia), or if that's not satisfactory for your use case (which is?), pipe in at task T78703.
(Or, if you believe it to be a separate issue, report a new bug.)
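For the first option, a minimal sketch: one request with the user-level maximum, since there is no continuation to follow:
import requests

# A single request with the user-level maximum (gslimit=500);
# geosearch offers no continuation, so this one call is all you get.
resp = requests.get(
    "https://en.wikipedia.org/w/api.php",
    params={
        "action": "query",
        "list": "geosearch",
        "gsradius": 10000,
        "gscoord": "51.540951897949|-0.051086739997922",
        "gslimit": 500,
        "format": "json",
    },
)
results = resp.json()["query"]["geosearch"]
print(len(results))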

Drive API files.list query with 'not' parameter returns empty pages

I'm using the Drive API to list files from a collection which do not contain a certain string in their title.
My query looks something like this:
files().list(q="'xxxxx' in parents and not title contains 'toto'")
In my Drive collection I have 100 files, all of which contain the string "toto" in their title except for, let's say, 10 files.
I'm using pagination to retrieve the results 20 at a time, so I expected to get a single page with the 10 files matching my request. Surprisingly, the API returns 5 pages: the first 4 have no results but do carry a nextPageToken, and the files matching my request only come with the fifth page.
I'm still trying some use cases here, but it seems to have something to do with the "not" operator: as if the request were run without it (hence the 5 pages), and the results not matching the request were then removed from each response. This is very disturbing, as I'm looking for the best performance here, and obviously having to make 5 requests to Drive instead of a single one is not good. I'm also noticing that the results don't always come in the last page. I ran the test with another collection: the results show up in the second page, but I still get 3 empty pages after that.
Am I missing something here? Is this kind of behaviour "normal"? I mean, imagine if I had 1000 documents in my collection: having to make 50 requests to find only a few is not what I expect.
I had a similar problem with the files.list API. I tried to retrieve all three folders under the root folder and only received a result on the 342nd page. After several hours of research I found some regularity in this strange behavior.
As I understand it, the Drive API works this way:
It detects something like an index that best matches your query.
It selects the first 20 records using the index from step 1.
It applies your filter, removing records that do not match your query.
The rest is returned to you (maybe empty) with a next page token.
The nextPageToken looks like just an OFFSET to the first record of the next page in the chosen index; maybe it also contains some information about the query or index.
After base64-decoding the token, I found the record number for the next result at the 121st position in the decoded token.
(I had previously built an index of tokens using maxResults=1.)
This is crazy, but I have no other explanation for the observed behavior.
It is very convenient for the server, because the server does very little work per search. On the other hand, this algorithm produces a lot of requests when paginating the whole list, but the limit on requests per second keeps that problem in check for the server.
All you can do is paginate and skip the empty results; do not forget about the limit on the number of requests.
Do not try to find errors on your side. This is simply how the Google Drive API works.
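In practice that means a loop like this minimal sketch (assuming the authorized Drive v2 service object from the question); empty pages simply contribute nothing:
files = []
page_token = None
while True:
    response = service.files().list(
        q="'xxxxx' in parents and not title contains 'toto'",
        maxResults=20,
        pageToken=page_token,
    ).execute()
    files.extend(response.get('items', []))  # empty pages add nothing
    page_token = response.get('nextPageToken')
    if not page_token:
        break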
The contains operator works as a prefix matcher at the moment: title contains 'toto' will match "totolong" and "toto", but not "blahtoto".