Stream Position Returned By Box API Cannot Be Used To Track Events - box-api

Thanks for your reply to my question: Is this a bug in Box API v2 when getting events?
This is a new problem related to that one: I cannot reliably use the next_stream_position I got from previous calls to track events.
Consider the following two GET HTTP queries:
1. GET https://api.box.com/2.0/events?stream_position=1336039062458
This call returns a JSON response containing one file entry for myfile.pdf and next_stream_position = 1336039062934.
2. GET https://api.box.com/2.0/events?stream_position=1336039062934
This call uses the stream position I got from the first call. However, it returns JSON containing exactly the same file entry for myfile.pdf as the first call.
I think that if the first call returns a stream position, it should serve as a marker for that exact time (say, Time A). If I use that stream position in subsequent queries, no events from before Time A should be returned.
Is this a bug? Or did I use the API in the wrong way?
Many thanks.

Box’s /events endpoint is focused on delivering to you a highly reliable list of all the events relevant to your Box account. Events are registered against a time-sequenced list we call the stream_position. When you hit the /events API and pass in a stream_position we respond to you with the events that happened slightly before that stream position, up to the current stream_position, or the chunk_size, whichever is lesser. Due to timing lag and our preference to make sure you don’t miss some event, you may receive duplicate events when you call the /events API. You may also receive events that look like they are ‘before’ events that you’ve already received. Our philosophy is that it is better for you to know what has happened, than to be in the dark and miss something important.

Box events currently give you a window roughly 5 seconds into the past, so that you don't miss some event.
We have considered just delaying the events we send you by about 5 seconds and de-duplicating the events on our side, but at this point we've turned the dial more towards real-time. Let us know if you'd prefer a fully de-duped stream that was slower.
For now (in beta), if you write your client to check for duplicate events and discard them, that will be best. We are about to add an event_id to the payload so you can de-duplicate on that. Until then, you'll have to look at a bunch of fields, depending on the event type... It's probably more challenging than it is worth.

To help you figure out whether an event is a duplicate, we have now added a unique event_id to each event. Our intention is that the event_id will allow you to de-duplicate the responses you receive from subsequent GET /events calls.
You can see this reflected in the updated documentation here, including example payloads.
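Building on that, a minimal client-side sketch of de-duplicating by event_id could look like the following; fetchEvents and handleEvent are hypothetical helpers standing in for your own HTTP call and processing logic:
var seenEventIds = {};
function pollEvents(streamPosition) {
  // GET /2.0/events?stream_position=... via your own HTTP helper.
  var response = fetchEvents(streamPosition);
  response.entries.forEach(function (event) {
    if (seenEventIds[event.event_id]) {
      return; // duplicate delivered by the overlapping window, skip it
    }
    seenEventIds[event.event_id] = true;
    handleEvent(event); // your own processing
  });
  // Feed this into the next poll.
  return response.next_stream_position;
}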

Related

Data Studio connector making multiple calls to API when it should only be making 1

I'm finalizing a Data Studio connector and noticing some odd behavior with the number of API calls.
Where I'm expecting to see a single API call, I'm seeing multiple calls.
In my Apps Script I'm keeping a simple tally which increments by 1 on every URL fetch, and that gives me the correct number I expect to see with getData().
However, in my API monitoring logs (using Runscope) I'm seeing multiple API requests for the same endpoint, and varying numbers for different endpoints in a single getData() call (they should all be the same). E.g.
I can't post the code here (client project) but it's substantially the same framework as the Data Connector code on Google's docs. I have caching and backoff implemented.
Looking for any ideas or if anyone has experienced something similar?
Thanks
Per this reference, GDS will also perform semantic type detection if you aren't explicitly defining the semantic type for your fields. If the query is for semantic type detection, the request will feature sampleExtraction: true.
When Data Studio executes the getData function of a community connector for the purpose of semantic detection, the incoming request will contain a sampleExtraction property which will be set to true.
If the GDS report includes multiple widgets with different dimensions/metrics configurations, then GDS might fire a separate getData call for each of them.
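A minimal sketch of short-circuiting the semantic-detection requests inside getData might look like this; getRequestedSchema and fetchRowsFromApi are hypothetical helpers, and since the quoted docs describe the flag's location loosely, the sketch checks both the request root and scriptParams:
function getData(request) {
  // Semantic type detection probe: GDS only needs a tiny sample here,
  // so skip the real API call entirely.
  var isSampleExtraction =
    request.sampleExtraction === true ||
    (request.scriptParams && request.scriptParams.sampleExtraction === true);
  if (isSampleExtraction) {
    return {
      schema: getRequestedSchema(request), // hypothetical helper
      rows: []                             // empty (or tiny canned) sample
    };
  }
  // Normal report request: go to the third-party API as usual.
  return {
    schema: getRequestedSchema(request),
    rows: fetchRowsFromApi(request)        // hypothetical helper
  };
}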
Kind of a late answer but this might help others who are facing the same problem.
The widgets / search filters attached to a graph issue getData calls of their own. If your custom connector retrieves data from third-party services via API calls, and that data is agnostic to the request.fields property sent by GDS, then those API calls are multiplied by N+1 (where N = the number of widgets / search filters your report implements).
I could not find an official solution for this either, so I invented a workaround using cache.
The graph's request for getData (typically requesting more fields than the search filters) will be the only one allowed to query the API endpoint. Before it starts doing so, it stores a key in the cache, "cache_{hashOfReportParameters}_building" => true:
if (enableCache) {
  // "cache" is the cache wrapper; {hashOfReportParameters} stands for the
  // hash computed from the report's parameters (see below).
  cache.putString("cache_{hashOfReportParameters}_building", 'true');
  Logger.log("Cache is being built...");
}
It will retrieve API responses, paginating in a loop, and buffer the results.
Once it has finished, it will delete the cache key "cache_{hashOfReportParameters}_building" and will cache the final merged results it buffered inside "cache_{hashOfReportParameters}_final".
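A minimal sketch of that finishing step, assuming the same cache wrapper as above; bufferedRows and getReportParametersHash are illustrative assumptions:
if (enableCache) {
  var cacheKeyPrefix = 'cache_' + getReportParametersHash(request); // hypothetical helper
  // Publish the merged, buffered API results for the filters to pick up...
  cache.putString(cacheKeyPrefix + '_final', JSON.stringify(bufferedRows));
  // ...then release the lock so waiting getData calls can proceed
  // (the exact removal method depends on your cache wrapper).
  cache.remove(cacheKeyPrefix + '_building');
  Logger.log("Cache built and published.");
}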
When it comes to filters, they also invoke getData, but typically with only up to 3 requested fields. The first thing we want to do is make sure they cannot start executing before the primary getData call, so we add a small delay for requests that look like search filters / widgets chasing the same data set:
if (enableCache) {
  var countRequestedFields = requestedFields.asArray().length;
  Logger.log("Total requested fields: " + countRequestedFields);
  if (countRequestedFields <= 3) {
    Logger.log('This seems to be a search filter.');
    Utilities.sleep(1000);
  }
}
After that we compute a hash of all the moving parts of the report (the date range, plus all of the other parameters you have set up that could influence the data retrieved from your API endpoints):
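The original post doesn't show the hashing code; a possible sketch using Apps Script's built-in digest utilities follows, where the exact set of inputs depends on your connector:
function getReportParametersHash(request) {
  // Serialize everything that can change the data set: date range,
  // connector config, and any other parameters your API calls depend on.
  var hashInput = JSON.stringify({
    dateRange: request.dateRange,
    configParams: request.configParams
  });
  var digestBytes = Utilities.computeDigest(Utilities.DigestAlgorithm.MD5, hashInput);
  // Convert the signed byte array into a hex string usable in cache keys.
  return digestBytes
    .map(function (b) { return ('0' + ((b + 256) % 256).toString(16)).slice(-2); })
    .join('');
}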
Now the best part: as long as the main graph is still building the cache, we make these getData calls wait:
while (cache.getString('cache_{hashOfReportParameters}_building') === 'true') {
  Logger.log('A similar request is already executing, please wait...');
  Utilities.sleep(2000);
}
After this loop we attempt to retrieve the contents of "cache_{hashOfReportParameters}_final" -- and in case we fail, it's always a good idea to have a backup plan, which is to allow the call to traverse the API again. We have encountered roughly a 2% error rate when retrieving data we cached...
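A sketch of that retrieval with a fallback, again with getReportParametersHash and fetchRowsFromApi as hypothetical helpers:
var cacheKeyPrefix = 'cache_' + getReportParametersHash(request);
var cachedPayload = cache.getString(cacheKeyPrefix + '_final');
var rows;
if (cachedPayload) {
  rows = JSON.parse(cachedPayload);
} else {
  // Backup plan: the cache read failed (expired, evicted, or the ~2% error
  // case), so fall back to querying the API directly.
  Logger.log('Cache miss after waiting, falling back to the API.');
  rows = fetchRowsFromApi(request);
}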
With the cached result (or the buffered API responses), you just transform your response into the schema GDS needs (which differs between graphs and filters).
As you start implementing this, you'll notice yet another problem: the Apps Script cache is limited to roughly 100KB per key. There is, however, no limit on the number of keys you can cache, and fortunately others have run into similar needs in the past and came up with a smart solution: split the big chunk you need cached into multiple cache keys, and glue them back together into one object when retrieving it.
See: https://github.com/lwbuck01/GASs/blob/b5885e34335d531e00f8d45be4205980d91d976a/EnhancedCacheService/EnhancedCache.gs
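The linked EnhancedCache does exactly this; a simplified sketch of the idea on top of the standard CacheService (not the library's actual API) could look like:
var CHUNK_SIZE = 90 * 1024; // stay under the ~100KB per-key limit; use less if values contain multi-byte characters
function putLargeString(cache, key, value, ttlSeconds) {
  var chunkCount = Math.ceil(value.length / CHUNK_SIZE);
  for (var i = 0; i < chunkCount; i++) {
    cache.put(key + '_' + i, value.substr(i * CHUNK_SIZE, CHUNK_SIZE), ttlSeconds);
  }
  cache.put(key + '_count', String(chunkCount), ttlSeconds);
}
function getLargeString(cache, key) {
  var chunkCount = Number(cache.get(key + '_count'));
  if (!chunkCount) return null;
  var parts = [];
  for (var i = 0; i < chunkCount; i++) {
    var part = cache.get(key + '_' + i);
    if (part === null) return null; // a chunk expired, treat the whole value as a miss
    parts.push(part);
  }
  return parts.join('');
}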
I cannot share the final solution we have implemented with you as it is too specific to a client - but I hope that this will at least give you a good idea on how to approach the problem.
Caching the full API result is a good idea in general: it avoids needless round trips and server load when near-real-time data is good enough for your needs.

Avoiding Envelope charges on duplicate document if a timeout was reached.

When I create an embedded document for signing, if the document times out after the 5 minutes allowed, how should I handle this? Should I just resend and basically create a new one? I've done this and it seems to duplicate envelopes... Is there a way to just renew the timeout and redirect the user to the same envelope? I've found similar posts but can't seem to find the exact answer to this. The goal, of course, is to avoid a second envelope charge because the signer took longer than 5 minutes to sign the document. When creating the new envelope I'm sending the exact same document id, user, etc., but it still seems to duplicate it on the back end.
The "short lived URL" that you generate from the API does expire, and it is only good for a single use. If you need another URL due to timeout or 'finish later', you can use the API to request another URL. You do not need to create another Envelope.
https://www.docusign.com/p/RESTAPIGuide/RESTAPIGuide.htm#Basic Scenarios/Embedded Signing.htm?Highlight=embedded
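A hedged sketch of requesting a fresh signing URL for the same envelope via the REST API; the endpoint path, base URL, and field names follow the recipient-view request as documented in recent API versions, so adjust them to the version you are on:
async function renewSigningUrl(baseUrl, accountId, envelopeId, accessToken, signer) {
  // POST a recipient-view request for the EXISTING envelope instead of
  // creating a new one; the response contains a fresh short-lived URL.
  const response = await fetch(
    baseUrl + '/restapi/v2.1/accounts/' + accountId + '/envelopes/' + envelopeId + '/views/recipient',
    {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer ' + accessToken,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        returnUrl: 'https://example.com/signing-complete',
        authenticationMethod: 'none',
        email: signer.email,              // must match the envelope's recipient
        userName: signer.name,
        clientUserId: signer.clientUserId // same value used when the envelope was created
      })
    }
  );
  const json = await response.json();
  return json.url; // redirect the signer to this new URL
}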

box.com TAG_ITEM_CREATE event

In the Box docs, it states that the TAG_ITEM_CREATE event occurs when 'A Tag was added to a file or folder'. Is there any way to find out which folder/file the tag was added to, without iterating over them all?
If you're fetching events for a Box enterprise you can have Box perform server-side filtering of events. However, when fetching events for a standard Box account you must do the filtering in your application. The next_stream_position parameter can be used to set a lower time bound for the events that you receive, which can significantly reduce the amount of metadata that you have to sort through.
EDIT: Answering questions from comment.
Q: Not sure how filtering events helps me work out which folder has been tagged?
The event object that's returned to you will look like this:
{
  "next_stream_position": 1348790499819,
  "entries": [
    {
      "event_type": "TAG_ITEM_CREATE",
      "source": {
        "type": "folder",
        "id": "11446498",
        ... more event info ...
      }
    },
    ... more events ...
  ]
}
In your application you can look for those events whose event_type is TAG_ITEM_CREATE. You can then use the source to determine which particular resource was tagged.
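For example, a minimal client-side filter over the parsed response above (here called eventsResponse) might look like:
// Keep only the tagging events and pull out which item was tagged.
var taggedItems = eventsResponse.entries
  .filter(function (event) { return event.event_type === 'TAG_ITEM_CREATE'; })
  .map(function (event) {
    return {
      itemType: event.source.type, // "file" or "folder"
      itemId: event.source.id
    };
  });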
Q: Are you saying that I can ask for all folders changed since a given TAG_ITEM_CREATE event?
No. Think of the next_stream_position property as a proxy for a timestamp. You can use it to tell Box, "Tell me about all the events that occurred after this 'position' in time." But this will still give you all types of events that occurred. You'll have to select the events of interest in your application.
This is one of the big differences in the Enterprise- and User-facing APIs. In an Enterprise you can tell Box, "I want info on all the TAG_ITEM_CREATE events that occurred in the Enterprise between yesterday and today." In the User-facing API the best you can tell Box is effectively, "I want info on all events that occurred in this User's box since yesterday."

How to implement a time wait before html form resubmission?

I have an HTML form which inserts data into a database. I just built it; it's very basic, as I'm just doing this to learn. In doing this, I see that I can hit the browser's back button and post again.. and again.. and again.. and it keeps writing to the db.
I've seen sites where I try to resubmit info and it tells me I must wait 60 seconds (or whatever). Is this the preferred method to solve this problem? If so, how does one go about implementing it?
Or maybe you would handle it a different way?
When you insert a row, store the submission time in the table, or in the user's session.
Whenever you process the form, compare that time to the current time. If it's within 60 seconds, display an error instead of inserting a row.
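The question doesn't name a backend, so here is an illustrative Node/Express sketch of that idea, with insertRowIntoDatabase standing in for your own database code:
const express = require('express');
const session = require('express-session');
const app = express();
app.use(express.urlencoded({ extended: false }));
app.use(session({ secret: 'change-me', resave: false, saveUninitialized: true }));
app.post('/submit', (req, res) => {
  const now = Date.now();
  const last = req.session.lastSubmitTime || 0;
  // Reject repeat submissions that arrive within 60 seconds.
  if (now - last < 60 * 1000) {
    return res.status(429).send('Please wait 60 seconds before submitting again.');
  }
  req.session.lastSubmitTime = now;
  insertRowIntoDatabase(req.body); // hypothetical helper for your own DB code
  res.send('Saved.');
});
app.listen(3000);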
There are two methods:
i) Simple client-side JavaScript:
Store the time of the last submission in a JavaScript variable;
when the user submits again too soon, show an alert message about the timing (see the sketch after this list).
(This method can be fooled, though, by users who know JavaScript.)
ii) Store the time of the last submission in your database on the backend when the form post is done. When the same form post is done again, check the time: if it is allowed, do the processing; otherwise reply with a message about the timing.
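A minimal sketch of the client-side variant (method i); it assumes your existing form has id="myForm" and, as noted, is easily bypassed, so keep a server-side check as well:
// Remember the time of the last submission and block re-submits within 60 seconds.
var lastSubmit = 0;
document.getElementById('myForm').addEventListener('submit', function (e) {
  var now = Date.now();
  if (now - lastSubmit < 60 * 1000) {
    e.preventDefault();
    alert('Please wait 60 seconds before submitting again.');
    return;
  }
  lastSubmit = now;
});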

What is the difference between the BU and ZK OK codes in SAP macro

I am trying to post an invoice to SAP using the F-47 transaction, and I am using SHDB to record the transaction and learn how it works. I see there that sometimes the BU and ZK BDC OK codes are used. I would like to understand the difference between them, but could not find any official documentation. Could you please explain the difference between the two?
I found the meaning of some of the status codes. I'm posting it here so I can remember them:
/00. Enter
/AB Go to overview
=ZK Go to additional information
=ENTE Enter (not sure exactly how it differs from /00)
=PI select cursor location
=STER Go to taxes
=DELZ delete cursor
=GO continue
=BU post (save)
/EEND end processing
=Yes select "yes" from message box
=BP park (save)
=ENTR Enter (not sure exactly how it differs from =ENTE or /00)
=AE save when changing document
=BK change document header (parking or posting parked document)
=P+ next page
=BL delete parked document
A BDC_OKCODE indicates which action will be executed on a screen (things like save, back, exit, etc.). The BU code is used for the SAVE function (as in the MM01 transaction). Sorry, but I cannot recall which function ZK maps to. Obviously the difference between them lies in the fact that they map to different functions. You can still find out which function each button uses via System -> Status -> GUI status.
By the way, BTCI (batch input) transactions are not fully robust: minor changes in the GUI flow can break your program, and error handling / analysis is tedious. Did you have a look at other, preferable posting methods, e.g. BAPI_* function modules? With the help of LSMW you can browse the different input methods and use them later standalone, or you can use transaction BAPI directly.