DialogFlow Testing Cloud Function concurrency - google-cloud-functions

I have a Google Assistant action with fulfilment through Firebase Cloud Functions. I understand that Cloud Functions may share instances between invocations, and that you can use the global scope to do heavy lifting and preparation. My function instantiates a global class that has serialised some JSON and handles returning data and other tasks in my function. I have variables in this class that are set when the function is called, and I have been careful to make sure that the variables are all set using the conv.data session-data object that is unique to the current conversation. The hope is that although the class instance may persist between different invocations, possibly by different users, it will still be contextualised to the local scope, and I won't see any variables being overwritten by other sessions.
Which brings me to the question: how can I test this? I have tried testing on my mobile device using the Google Assistant app at the same time as testing in the browser console. I witnessed the two sessions getting merged together, and it was an unholy mess, but I am not sure whether that was down to the global scope or just to the fact that I was testing two sessions with the same user account.
Can anyone enlighten me on whether it is possible to run two sessions of the same action using the same user account? It looked like the conv.data object had a mix of the two different sessions I was running, which suggests it was using the same conversation token for both sessions.
Another question would be: do you think using a global class to store state across invocations is going to be an issue with different users? The docs do state that a function instance only handles one invocation at a time, so there shouldn't be any race-condition-type scenarios.

Dialogflow should keep the data in conv.data isolated to a single session, even for sessions from the same user. When you're using Dialogflow, this data is stored in a Context, which is session-specific.
You can verify this by turning StackDriver logging on, which will let you examine the exact request and response that Dialogflow is using with your fulfillment, and this will include the session ID for tracking. (And if you think it is mixing the two, posting the request and response details would help figure out what is going on.)
Very roughly, it sounds like you're getting something mixed into your global, or possibly something set in one session that isn't cleared or overwritten by a different one. Again - seeing the exact requests and responses should help you (and/or us) figure that out.
My attitude is that a global such as this should be treated as read-only. If you want some environment object that contains the relevant information for just this session, I'd keep that separate, purely as a matter of design philosophy.
I certainly wouldn't use this global state to store information between sessions. While an instance will only handle one invocation at a time, I'm not sure how that interacts with Promises, which you'll need once you start any async operations. It also runs the risk that subsequent invocations might land on different instances.
My approach, in short (which I make pretty firm in multivocal):
Store all state in a Context (which is what conv.data does); see the sketch after this list.
Access this via the request, conv, or some other request-specific object that you create.
Global information / configuration should be read-only.
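To make that concrete, here is a minimal sketch of the pattern using the Node.js actions-on-google v2 client library with Firebase Cloud Functions. The "Add Favorite" intent, its item parameter, and catalog.json are hypothetical; the point is simply that per-conversation state goes through conv.data while anything in global scope stays read-only.
const { dialogflow } = require('actions-on-google');
const functions = require('firebase-functions');

// Global scope: loaded once per instance and shared across invocations.
// Treat it as read-only configuration; never store per-user state here.
const catalog = Object.freeze(require('./catalog.json')); // hypothetical static data

const app = dialogflow();

app.intent('Add Favorite', (conv, { item }) => {
  // conv.data is backed by a session-specific context, so these writes
  // are isolated to the current conversation, even for the same user.
  conv.data.favorites = conv.data.favorites || [];
  conv.data.favorites.push(item);
  conv.ask(`Added ${item}. You now have ${conv.data.favorites.length} favorites.`);
});

exports.dialogflowFulfillment = functions.https.onRequest(app);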

Querying a live deployment (modeling objectives) from workshop or functions, but not samplewise (like with scenarios)

I have an optimization algorithm deployed as a live deployment. It takes a set of objects and returns a set of objects of a potentially different size. This works just fine when I'm using the REST API.
The problem is, I want to let the user of the Workshop app query the model with a set of objects. The returned objects need to be written back to the ontology.
I looked into an action-backed function, but it seems like I can't query the model from a function?!
I looked into webhooks, but it doesn't seem to fit the purpose, and I would also need to handle the API key and can't write back to the ontology?!
I know how to query the model with scenarios, but it is per sample and that does not fit the purpose, plus I can't write back to the ontology.
My questions:
Is there any way to call the model from a function and write the return back to the ontology?
Is there any way to call a model from workshop with a set of objects and write back to the ontology?
Is modeling objectives just the wrong place for this use case?
Do I need to implement the optimization in Functions itself?
I've answered the questions below, and also tried to address some of the earlier points.
Q: "I looked into an action backed function but it seems like I can't query the model from a function?!"
A: That is correct, at this time you can't query a model from a function. However there are javascript based linear optimization libraries which can be used in a function.
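For illustration only, here is the kind of in-function optimization this refers to, using the open-source jsLPSolver package (javascript-lp-solver on npm). Whether that package can be imported depends on the dependencies available in your Functions repository, and the model below is made up.
const solver = require('javascript-lp-solver');

// A toy linear program: maximize profit subject to two resource constraints.
const model = {
  optimize: 'profit',
  opType: 'max',
  constraints: {
    machineHours: { max: 100 },
    material: { max: 300 },
  },
  variables: {
    widgetA: { machineHours: 2, material: 10, profit: 30 },
    widgetB: { machineHours: 5, material: 6, profit: 45 },
  },
};

const result = solver.Solve(model);
// result.feasible indicates whether a solution exists; result.widgetA and
// result.widgetB hold the chosen quantities, which you could then apply to
// the ontology through an ontology edit.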
Q: "I looked into webhooks but it seems to not fit the purpose and I also would need to handle the API key and can't write back to the ontology?!"
A: Webhooks are for hitting resources on networks where a magritte agent are installed. So if you have like a flask app on your corporate network you could hit that app to conduct the optimization. Then set the webhook as "writeback" on an action and use the webhook outputs as inputs for a ontology edit function.
Q: "I know how to query the model with scenarios but it is per sample and that does not fit the purpose, plus I cant write back to the ontology."
A: When querying a model via workshop you can pass in a single object as well as any objects linked in a 1:1 relationship with that object. This linking is defined in the modeling objective modeling api. You are correct to understand you can't pass in an arbitrary collection of objects. You can write back to the ontology however, you have to set up an action to apply the scenario back to the ontology (https://www.palantir.com/docs/foundry/workshop/scenarios-apply/).
Q: "Is there any way to call the model from a function and write the return back to the ontology?"
A: Not from an ontology edit function.
Q: "Is there any way to call a model from workshop with a set of objects and write back to the ontology?"
A: Only object sets where the objects have 1:1 links within the ontology. You can write back by applying the scenario (https://www.palantir.com/docs/foundry/workshop/scenarios-apply/).
Q: "Is modeling objectives just the wrong place for this usecase? Do I need to implement the optimization in Functions itself?"
A: If you can write the optimization in an ontology edit function it will be quite a bit more straightforward. The main limitation of this is you have to use Typescript which is not as commonly used for this kind of thing as Python. There are some basic linear optimization libraries available for JS/TS.

Reflected SQLAlchemy metadata in celery tasks?

For better testability and other reasons, it is good to keep the SQLAlchemy database session configuration non-global, as described very well in the following question:
how to setup sqlalchemy session in celery tasks with no global variable (and also discussed in https://github.com/celery/celery/issues/3561 )
Now the question is: how to handle metadata elegantly? If my understanding is correct, metadata can be obtained once, e.g.:
from sqlalchemy import MetaData, create_engine

engine = create_engine(DB_URL, encoding='utf-8', pool_recycle=3600,
                       pool_size=10)
# db_session = get_session()  # this was the old global session
meta = MetaData()
meta.reflect(bind=engine)
Reflecting on each task execution is not good for performance reasons; metadata is a more or less stable and thread-safe structure (as long as we only read it).
However, the metadata sometimes changes (Celery is not the "owner" of the DB schema), causing errors in workers.
What could be an elegant way to deal with meta in a testable way, while still being able to react to underlying DB changes? (Alembic is in use, if that is relevant.)
I was thinking of using an Alembic version change as a signal to re-reflect, but I'm not quite sure how to make it work nicely in Celery. For instance, if more than one worker senses a change at once, the global meta may be handled in a non-thread-safe way.
If it matters, Celery is used standalone in this case; no web framework modules/apps/whatever are present in the Celery app. The problem is also simplified because only SQLAlchemy Core is in use, not the object mapper.
This is only a partial solution, and it's for the SQLAlchemy ORM (but I guess something similar is easy to implement for Core).
Main points:
The engine is at module level, but its config (access URL, parameters) comes from os.environ.
The session is created in its own factory function.
At module level: BaseModel = automap_base(); table classes then use that BaseModel as a superclass, usually with just one attribute, __tablename__, but arbitrary relationships and attributes can be added there (very similar to normal ORM use).
At module level: BaseModel.prepare(ENGINE, reflect=True).
Tests (using pytest) inject the environment variable (e.g. DB_URL) in conftest.py at module level.
One important point: database_session is always initialized (that is, the factory function is called) in the task function and propagated into all other functions explicitly. This allows units of work to be controlled naturally, usually one transaction per task. It also simplifies testing, because all database-using functions can be given either a fake or a real (test) database session.
"Task function" above means the function that is called inside the function decorated with task; this way the task function can be tested without the task machinery.
This is only a partial solution, because re-running reflection is not covered. If task workers can be stopped for a moment (the database experiences downtime during schema changes anyway), this does not pose a problem, since these are usually background tasks. Workers can also be restarted by some external watchdog that monitors database changes. This can be made convenient by using supervisord or some other way to control Celery workers running in the foreground.
All in all, after solving the problem as described above, I value the "explicit is better than implicit" philosophy even more. All those magical "app"s and "request"s, be it in Celery or Flask, may shorten function signatures a little, but I'd rather pass some kind of context down the call chain for improved testability and better understanding and management of that context.

Tutorial for persisting data somewhere else (in a non-Ethereum app) and using dapps to interact with it and verify data integrity

When building a blockchain application, what is the best practice for persisting the data in a non-Ethereum app (such as a backing Postgres database) for the purpose of displaying state on a website?
Some specific questions:
Why is bulk access to lists/arrays/etc. described as being painful in Solidity? (relevant)
What is the best way to capture blockchain events and update an off-chain database?
What is the best way to verify the integrity of your off-chain database (and how frequently, etc.)?
Can you avoid using a backing database and query the blockchain directly?
In the Truffle pet shop tutorial they have a view that returns the entire adopters array:
function getAdopters() public view returns (address[16])
http://truffleframework.com/tutorials/pet-shop
Why is bulk access to lists/arrays/etc. described as being painful in Solidity?
There are a few reasons for this:
There is no built-in mechanism for iterating through collections in Solidity. You can do a standard for loop over an array using its length, but you can't do this with mappings.
Returning complex objects is messy. If your list contains a struct, you can't simply return the struct itself to your client. You have to return the item decomposed. Converting the structs into decomposed arrays is ugly (a client-side sketch of putting the pieces back together follows this list):
struct MyStruct {
    uint256 id;
    bytes32 name;
}

MyStruct[] _structs;

function getAllStructs() public constant returns (uint256[], bytes32[]) {
    uint256[] memory ids = new uint256[](_structs.length);
    bytes32[] memory names = new bytes32[](_structs.length);
    for (uint i = 0; i < _structs.length; i++) {
        ids[i] = _structs[i].id;
        names[i] = _structs[i].name;
    }
    return (ids, names);
}
Iterating through arrays still requires gas. Even if you're not paying for the gas (when performing this in a constant function), you can still have out of gas exceptions when your arrays become very large. The Truffle pet shop example gets away with this because it explicitly limits the array to 16 elements.
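To show what the decomposed return in the second point looks like on the client side, here is a rough sketch using web3.js 1.x; the abi and address variables are placeholders for your contract's artifacts.
const Web3 = require('web3');
const web3 = new Web3('http://localhost:8545'); // your own node

const contract = new web3.eth.Contract(abi, address); // abi/address assumed available

async function fetchStructs() {
  // Multiple return values come back keyed by position: { 0: ids, 1: names }.
  const { 0: ids, 1: names } = await contract.methods.getAllStructs().call();
  // Zip the parallel arrays back into objects; bytes32 names arrive hex-encoded.
  return ids.map((id, i) => ({
    id,
    name: web3.utils.hexToUtf8(names[i]),
  }));
}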
What is the best way to capture blockchain events and update an off-chain database?
"Best" way depends on your business goals and tolerance for stale data. But, probably the most common way is to set up a watch on contract events. When you receive an event, you update your DB with the custom data you stuff into the event. However, you have to make sure to handle orphaned blocks as well (there is a field in the metadata called removed which tells you if an event is no longer valid because of a block being orphaned). When you receive an orphaned event, remove it from the DB. After 12 subsequent block verifications come in, you can safely assume the event will not be orphaned.
What is the best way to verify the integrity of your off-chain database (and how frequently, etc.)?
Again, this depends on the tolerance levels of your business requirements. If you can delay using information from the DB until you're certain of block verifications, then you would simply persist block or timestamp information that lets your app know that the data in your DB mirrors what is verified on the blockchain. If you're concerned that the client process responsible for watching events may have failed, you need to have failover watch clients, or allow duplicate persistence (with subsequent deduping), or track verified block numbers as they come in (or some combination of the three). I'm sure there are plenty of other options you can architect for this as well.
Can you avoid using a backing database and query the blockchain directly?
Yes, it is possible, assuming you can avoid the gas limit issues mentioned above with constant functions and you don't have to do any complex post-processing of your data inside your contract for your application. Since constant functions run within the local EVM, you just need to make sure your dedicated node is up and running. To do this, you'd most likely want multiple servers running as fully synced nodes.
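As a small sketch of reading directly from your own synced node instead of a backing database, again with web3.js 1.x and the pet shop contract's ABI and address assumed to be at hand:
const Web3 = require('web3');
const web3 = new Web3('http://localhost:8545'); // a node you control
const adoption = new web3.eth.Contract(abi, address);

async function getAdopters() {
  // A view/constant call runs locally on the node: no gas is paid, but the
  // node's limits still apply, which is why the array is capped at 16.
  return adoption.methods.getAdopters().call();
}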

Is it better to handle MessagingEntityNotFoundException or should I check for topic existence before sending a message?

In the code sample in the documentation for Microsoft Service Bus, the following code is used to make sure that the topic exists.
// Create the topic if it does not exist already
string connectionString = CloudConfigurationManager.GetSetting("Microsoft.ServiceBus.ConnectionString");

var namespaceManager =
    NamespaceManager.CreateFromConnectionString(connectionString);

if (!namespaceManager.TopicExists("TestTopic"))
{
    namespaceManager.CreateTopic("TestTopic");
}
But I want to know how expensive the TopicExists call will be if I put this code before sending each message. (Assume that I don't want separate initialization code.)
An alternative approach is to be optimistic and send the message without checking for topic existence, handling MessagingEntityNotFoundException instead. In case of the exception, we can create the topic and retry sending the message.
The second approach seems better to me, but I couldn't find any reference supporting it. So I want to know: is there a particular reason that Microsoft, in their documentation and samples, chose the first approach rather than handling the exception?
One thing to bear in mind is that you need Manage permissions to the bus to create a topic. You may not want to grant this level of permission to all your clients as this can be a bit of a security risk, e.g. a client could create a subscription to read messages it's not supposed to see.
Calling TopicExists() before opening a client connection isn't very expensive and will give rise to more graceful code. If you wait for an exception to be tripped before creating anything then you may find you have a slew of failed messages on your hands.
I normally have a separate process for creating and updating the structure of the bus. How practical this is depends on how many topics and queues you are creating.

Design question: How can I access an IPC mechanism transparently?

I want to do this (no particular language):
print(foo.objects.bookdb.books[12].title);
or this:
book = foo.objects.bookdb.book.new();
book.title = 'RPC for Dummies';
book.save();
Here foo is actually a service connected to my program via some IPC mechanism, and to access its methods and objects, some layer sends and receives messages over the network.
Now, I'm not really looking for an IPC mechanism, as there are plenty to choose from. It's likely not to be XML-based, but rather something like Google's protocol buffers, D-Bus or CORBA. What I'm unsure about is how to structure the application so I can access the IPC layer just like I would any object.
In other words, how can I have OOP that maps transparently over process boundaries?
Note that this is a design question and I'm still working at a pretty high level of the overall architecture. So I'm still pretty agnostic about which language this is going to be in. C#, Java and Python are all likely to get used, though.
I think the way to do what you are requesting is to have all object communication regarded as message passing. This is how object methods are handled in Ruby and Smalltalk, among others.
With message passing (rather than method calling) as your object communication mechanism, operations such as calling a method that didn't exist when you wrote the code become sensible, as the object can do something sensible with the message anyway (check for a remote procedure, return a value for a field with the same name from a database, throw a 'method not found' exception, or anything else you can think of).
It's important to note that for languages that don't use this as the default mechanism, you can do message passing anyway (every object has a 'handleMessage' method), but you won't get the syntax niceties, and you won't be able to get IDE help without some extra effort on your part to get the IDE to parse your handleMessage method and check for valid inputs. A small sketch of this pattern follows.
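In plain JavaScript it might look like this; the remoteCall hook standing in for your IPC layer is hypothetical.
class MessageReceiver {
  constructor(remoteCall) {
    this.remoteCall = remoteCall; // hypothetical function that speaks your IPC protocol
  }

  handleMessage(name, ...args) {
    // Known message: dispatch to a local method of the same name.
    if (name !== 'handleMessage' && typeof this[name] === 'function') {
      return this[name](...args);
    }
    // Unknown message: forward it to the remote side, or fail explicitly.
    if (this.remoteCall) {
      return this.remoteCall(name, args);
    }
    throw new Error(`method not found: ${name}`);
  }

  title() {
    return 'RPC for Dummies';
  }
}

// receiver.handleMessage('title')          -> handled locally
// receiver.handleMessage('save', {id: 12}) -> forwarded to remoteCall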
Read up on Java's RMI -- the introductory material shows how you can have a local definition of a remote object.
The trick is to have two classes with identical method signatures. The local version of the class is a facade over some network protocol. The remote version receives requests over the network and does the actual work of the object.
You can define a pair of classes so a client can have
foo = NonLocalFoo("http://host:port")
foo.this = "that"
foo.save()
And the server receives set_this() and save() method requests from a client connection. The server side is (generally) non-trivial because you have a bunch of discovery and instance management issues.
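A rough JavaScript sketch of such a client-side facade, mirroring the pseudocode above: a Proxy turns property writes into set_<name>() requests and method calls into plain requests. The /invoke endpoint and its JSON shape are invented, and a real implementation would pick a concrete IPC/RPC mechanism and deal with errors, discovery and instance management.
function NonLocalFoo(baseUrl) {
  const send = (method, args) =>
    fetch(`${baseUrl}/invoke`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ method, args }),
    }).then((res) => res.json());

  return new Proxy({}, {
    // foo.save(...) becomes a remote call named "save".
    get: (target, name) => (...args) => send(String(name), args),
    // foo.this = "that" becomes a remote call named "set_this".
    set: (target, name, value) => {
      send(`set_${String(name)}`, [value]);
      return true; // report the assignment as handled locally
    },
  });
}

// const foo = NonLocalFoo('http://host:port');
// foo.this = 'that';
// await foo.save();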
You shouldn't do it! It is very important for programmers to see and feel the difference between an IPC/RPC call and a local method call in the code. If you make it so that they don't have to think about it, they won't think about it, and that will lead to very poorly performing code.
Think of:
foreach o, o.isGreen in someList {
    o.makeBlue;
}
The programmer assumes that the loop takes a few nanoseconds to complete; instead it takes close to a second if someList happens to be remote.