Subscriber never gets the entire queue

We are working with Pub/Sub to integrate several systems with each other. Some systems push data to Pub/Sub as JSON, while others pull that data and use it. (Note: we have to pull from Pub/Sub instead of pushing to the app due to other restrictions with the receiving application.) Every pulling application gets its own subscription to each topic.
I have noticed that a Pub/Sub pull does not return all data currently in the queue if it is triggered too frequently. The problem originally occurred in a Java Spring app with the corresponding client library, but the gcloud command in Cloud Shell exhibits the same behaviour, so I am just going to use that as the example. I removed the ack IDs and borders to make it fit this window. Note that I don't use the '--auto-ack' flag, so the queue should stay the same; no other system is pulling from that subscription.
First pull (complete content):
max_binnewies@cloudshell:~ $ gcloud pubsub subscriptions pull testSubscriber --limit=100
│ DATA │ MESSAGE_ID │
│ 4 - FOUR │ 189640873208084 │
│ 5 - FIVE │ 189636274179799 │
│ 2 - TWO │ 189638666587304 │
│ 3 - THREE │ 189627470480903 │
│ 1 - ONE │ 189639207684195 │
Second pull (only one):
max_binnewies@cloudshell:~ $ gcloud pubsub subscriptions pull testSubscriber --limit=100
│ DATA │ MESSAGE_ID │
│ 1 - ONE │ 189639207684195 │
Third pull (two different ones):
max_binnewies@cloudshell:~ $ gcloud pubsub subscriptions pull testSubscriber --limit=100
│ DATA │ MESSAGE_ID │
│ 4 - FOUR │ 189640873208084 │
│ 5 - FIVE │ 189636274179799 │
Fourth pull (first one again):
max_binnewies@cloudshell:~ $ gcloud pubsub subscriptions pull testSubscriber --limit=100
│ DATA │ MESSAGE_ID │
│ 1 - ONE │ 189639207684195 │
That behaviour is confusing to me. Is this normal Pub/Sub behaviour, or am I doing something wrong? The only thing I found is this page, which says that Pub/Sub uses load balancing for the pull method:
https://cloud.google.com/pubsub/docs/subscriber
So my guess is that Pub/Sub thinks multiple clients are pulling from the subscription and spreads the data across them if calls come in too quickly. Is that correct? What exactly is happening here?
If I wait a little while, I get more data again, but I never seem to get everything, even if I wait five minutes... It is very confusing.
Can that cause a problem for the consuming application? How do I make sure all the data arrives at the receiving application even if it pulls very frequently? Is there a way to turn this off?

There are a couple of things that result in you not receiving all messages every time:
With pull requests, there is no guarantee that all messages will be returned in a particular request, even if there are fewer messages available than max messages. This is because Pub/Sub tries to balance returning more messages with minimizing end-to-end latency.
Messages have an ack deadline, which is specified at subscription creation time (and defaults to 10 seconds). This means that when you pull messages and don't ack or nack them, they will not be redelivered for the duration of the ack deadline, essentially giving the process that pulled them a lease on those messages. If you want messages to be redelivered immediately, you need to nack them if you are using the Java client library (the preferred way to interact with Cloud Pub/Sub), or send a ModifyAckDeadline request with ack_deadline_seconds set to 0.
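For illustration, here is a minimal sketch of the nack approach, assuming the Node.js client library (@google-cloud/pubsub) rather than the Java one mentioned above; handle() is a hypothetical placeholder:

// Minimal sketch, assuming the Node.js client (@google-cloud/pubsub).
// nack() requests immediate redelivery; it is the client-library shorthand
// for ModifyAckDeadline with ack_deadline_seconds set to 0.
import {PubSub, Message} from "@google-cloud/pubsub";

const subscription = new PubSub().subscription("testSubscriber");

subscription.on("message", (message: Message) => {
  try {
    handle(message);  // hypothetical application handler
    message.ack();    // done: the message will not be redelivered
  } catch (err) {
    message.nack();   // not processed: make it eligible for redelivery now
  }
});

// Hypothetical handler, included only to keep the sketch self-contained.
function handle(m: Message): void {
  console.log(m.data.toString());
}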

Related

Subscribe for instances list update in GCE autoscaled group

Is there a way to get, or subscribe to, the list of running instances in a GCE autoscaled group?
Via the gcloud tool we can periodically poll for the list, but I would like to subscribe to list updates.
I doubt that such an API exists in GCE for now (except project metadata), but I need this functionality in my application so that I can build logic on top of it.
Maybe someone has experience with a similar case or knows a "hack" for this?
To the best of my knowledge there is no method to subscribe to the list of instances in a managed instance group.
You will need to poll the managed instance group manually to determine the list of current instances.
gcloud compute instance-groups managed list <NAME>
This is a task that could be done very easily in Cloud Functions: at fixed intervals, scan the group and, for example, email you the list as JSON. The possibilities are endless.
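For illustration, a sketch of that polling idea as a scheduled Cloud Function, assuming the @google-cloud/compute Node.js client; the project, zone, and group names are placeholders:

// Sketch only: poll a managed instance group on a fixed schedule (e.g. via
// a Cloud Scheduler trigger) and log the current instance list as JSON.
import {InstanceGroupManagersClient} from "@google-cloud/compute";

const client = new InstanceGroupManagersClient();

export async function scanGroup(): Promise<void> {
  const [managedInstances] = await client.listManagedInstances({
    project: "my-project",             // placeholder project id
    zone: "us-central1-a",             // placeholder zone
    instanceGroupManager: "my-group",  // placeholder group name
  });
  // Each entry carries the instance URL plus its current status.
  console.log(JSON.stringify(managedInstances.map(i => i.instance)));
}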
You can build this easily using (1) a Pub/Sub topic "instance-group-changes" and (2) publishing events to it from your startup and shutdown scripts.
(1) Create the "instance-group-changes" topic
gcloud init
gcloud pubsub topics create instance-group-changes
(2) Modify the startup script for the instance group to send an addInstance event
Note: be sure to add the Cloud Pub/Sub API access scope in the instance template.
Use the metadata service to obtain the instance ID, hostname, etc.
TOPIC=instance-group-changes
# Use the v1 metadata endpoint (the legacy 0.1 endpoint is deprecated).
instance_id=$(curl -s -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/id)
gcloud pubsub topics publish "$TOPIC" \
--attribute 'event=addInstance' \
--message "instance_id=$instance_id"
(3) Modify the shutdown script to send a removeInstance event
TOPIC=instance-group-changes
# Use the v1 metadata endpoint (the legacy 0.1 endpoint is deprecated).
instance_id=$(curl -s -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/id)
gcloud pubsub topics publish "$TOPIC" \
--attribute 'event=removeInstance' \
--message "instance_id=$instance_id"
Testing
Create the subscription
gcloud pubsub subscriptions create sub-instance-group-changes --topic=instance-group-changes
Pull from the subscription
gcloud pubsub subscriptions pull --limit 5 sub-instance-group-changes
│ DATA │ MESSAGE_ID │ ATTRIBUTES │ ACK_ID │
│ instance_id=5396233750823583338 │ 407816607936940 │ event=addInstance │ XkASTD4HRElTK0MLKlgRTgQhIT4wPkVTRFAGFixdRkhRNxkIaFEOT14jPzUgKEUaC1MTUVx1Hk4Qb1gzdQdRDRlze2hxO1kaAFMTUHRdURsfWVx-SgNRChFze2d1bVMQBwtBU1b55f_L9q0zZhs9XBJLLD5-NTJFQQ │
│ instance_id=5396233750823583338 │ 407816742842477 │ event=removeInstance │ XkASTD4HRElTK0MLKlgRTgQhIT4wPkVTRFAGFixdRkhRNxkIaFEOT14jPzUgKEUaC1MTUVx1Hk4Qb1gzdQdRDRlze2hxO1kaAFMTUHRcURsfWVx-SgNRChFze2ZxaFIXAwZCVFb55f_L9q0zZhs9XBJLLD5-NTJFQQ │

Issuer Scripts on Verifone Vx PIN pads

Does anyone know how the issuer script processing flow is supposed to work on VeriFone PIN pads? As I understand it, the card processor sends back the script(s) in a 9F18 tag. Scripts marked with tag 71 are to be processed prior to the second Generate AC, and ones marked with tag 72 are to be processed after. My question is: what is the sequence of commands, C34 and C25, in each case? I suppose you can have one or more 71s and 72s at the same time. The VeriFone API specification says this:
Re C25: "This command contains the scripts that are received from the host. The script results are returned in the C34 response."
Also, "All scripts need to be initialized by sending a C34 to the PINpad"
So it's not clear whether you send all the C25s, one for each script, and then a C34, or perhaps the 71s before and the 72s after the C34.
Send multiple C25s as needed; C25 only supports one script at a time. Do not try to distinguish the 71 and 72 scripts, just send them. After all the scripts, send the C34.
From the Integrators Guide FAQ section:
Q: When receiving a 72 script when do we send the C34 to the pinpad?
A: C34 is always sent after the C25. The PIN pad will process each script before or after the second Generate AC based on its tag.

HTML5 worker farm and updating UI progress counters

Is a small sub-worker farm feasible?
Let's say we have 100K URLs that we must test to see which are still active and which are dead, and we are trying to do this as quickly as we can using JavaScript, reporting the tally of results in the UI as the work progresses:
Total URLs processed: ###### Dead URLs Found: ###### Timeouts: #######
Can we create a master worker, pass it the 100K URLs, and have that worker in turn create an array of 100 minion sub-workers, sending each minion an array of only 1K items? Each minion would do HEAD requests for its list of URLs, reporting the request status (good, 404, etc.) back to the master with each request; the master in turn would periodically post a message back to the main window, where the UI progress counters would get incremented.
Would the master worker be able to listen to messages from its 100 minions and successfully update its local variables with the progress counts as they are reported (total processed, total dead, total timed out) without things getting clobbered? And then, say with every 100 URLs processed, the master would post these tallies back to the UI?
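A minimal sketch of the architecture described above, assuming hypothetical master.js / minion.js file names and a tsconfig lib of ["webworker"]. Since each worker processes incoming messages one at a time on its own event loop, the master's counter updates cannot clobber one another:

// master.ts (hypothetical) -- spawned from the page, then sent the 100K URLs.
const CHUNK = 1000;        // URLs handed to each minion
const REPORT_EVERY = 100;  // post tallies to the page every N results

let processed = 0, dead = 0, timeouts = 0;

self.onmessage = (e: MessageEvent<string[]>) => {
  const urls = e.data;
  for (let i = 0; i < urls.length; i += CHUNK) {
    const minion = new Worker("minion.js");  // sub-worker (hypothetical file)
    // Messages from all minions are queued on this worker's single event
    // loop, so these updates run one at a time and cannot interleave.
    minion.onmessage = (m: MessageEvent<"ok" | "dead" | "timeout">) => {
      processed++;
      if (m.data === "dead") dead++;
      if (m.data === "timeout") timeouts++;
      if (processed % REPORT_EVERY === 0) {
        self.postMessage({processed, dead, timeouts});  // to the main window
      }
    };
    minion.postMessage(urls.slice(i, i + CHUNK));
  }
};

// minion.ts (hypothetical) -- HEAD-checks its slice, reporting each result.
// Cross-origin URLs are subject to CORS; AbortSignal.timeout needs a recent
// browser.
self.onmessage = async (e: MessageEvent<string[]>) => {
  for (const url of e.data) {
    try {
      const res = await fetch(url, {method: "HEAD", signal: AbortSignal.timeout(10_000)});
      self.postMessage(res.ok ? "ok" : "dead");
    } catch {
      self.postMessage("timeout");  // aborted or network failure
    }
  }
};

One caveat: sub-workers spawned from inside a worker are not supported in every browser, so check support or fall back to spawning the minions from the page.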

How to get more than one stock's information per call on Google Financials?

I'm using Google Apps Script and Google Financials to get information for a list of stocks I have in a text file. The problem is that the FinanceApp class only seems to be able to get one stock at a time, and since I have to do this for more than 250 stocks I reach the maximum call limit.
Is there a better way to do this?
Since there are limitations and you are making repeated tests, I suggest using a cache: you can then repeat the test without hitting the limit (assuming you always request the same data for the same date, i.e. using StockInfoSnapshot).
You do it by wrapping FinanceApp.getHistoricalStockInfo() so that it serves from the cache when possible, and adds to the cache when the info is not yet there.
The cache could conveniently reside in the "script-related storage": https://developers.google.com/apps-script/script_user_properties
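For illustration, a minimal sketch of such a wrapper; the getHistoricalStockInfo(symbol, start, end, frequency) signature and the key scheme are assumptions here:

// Sketch of a caching wrapper: serves from script-related storage when
// possible, otherwise calls FinanceApp and stores the result.
// Assumes the snapshot serializes cleanly to JSON.
function getStockInfoCached(symbol: string, start: Date, end: Date, freq: number) {
  const key = [symbol, start.toISOString(), end.toISOString(), freq].join("|");
  const hit = ScriptProperties.getProperty(key);  // script-related storage
  if (hit !== null) {
    return JSON.parse(hit);  // cache hit: no FinanceApp call, no quota spent
  }
  const info = FinanceApp.getHistoricalStockInfo(symbol, start, end, freq);
  ScriptProperties.setProperty(key, JSON.stringify(info));  // add to cache
  return info;
}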
Good luck!

page and page_size parameters are ignored for get_groups, get_group_folders, and get_group_users

I'm working on an application that uses the Box v1 "enterprise" APIs for user and group management (the v2 API doesn't have these methods yet). Specifically, I'm enumerating groups and their associated folders and users using get_groups, get_group_folders, and get_group_users.
I have a large number of groups and folders in my organization, and I'm unable to page through the results; I only get 20 items at a time from each of these APIs. I've tried variations on the page and page_size parameters listed in the API docs, but they don't seem to do anything.
Specifically, each of these three requests gives me the same 20 groups back:
https://www.box.net/api/1.0/rest?api_key=XXX&auth_token=YYY&action=get_groups
https://www.box.net/api/1.0/rest?api_key=XXX&auth_token=YYY&action=get_groups&page=2
https://www.box.net/api/1.0/rest?api_key=XXX&auth_token=YYY&action=get_groups&page_size=50
The same goes for get_group_folders and get_group_users.
For optional parameters, you need to wrap them in params[]. For example, when changing the page_size, your request would be:
http://box.net/api/1.0/rest?action=get_groups&api_key=API_KEY&auth_token=AUTH_TOKEN&params[page_size]=VALUE
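For example, a sketch of requesting one page of groups with both parameters wrapped in params[]; the paging semantics are taken from the v1 API docs, and the XML handling is left out:

// Sketch: fetch one page of get_groups results from the Box v1 REST API
// using the params[] wrapping shown above.
async function fetchGroupsPage(apiKey: string, authToken: string,
                               page: number, pageSize: number): Promise<string> {
  const url = "https://www.box.net/api/1.0/rest" +
    "?action=get_groups" +
    "&api_key=" + encodeURIComponent(apiKey) +
    "&auth_token=" + encodeURIComponent(authToken) +
    "&params[page]=" + page +
    "&params[page_size]=" + pageSize;
  const res = await fetch(url);
  return res.text();  // the v1 API responds with XML, not JSON
}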